Technology

Microsoft 365 Copilot Researcher Gains Multi-Model AI Features for Enterprise Deep Research

Microsoft's Copilot Researcher now pits OpenAI's models against Anthropic's Claude to cross-check answers, posting a 13.8-point DRACO benchmark gain over rivals.

Marcus Williams·3/31/2026·2 min read

Published 03:54 PM

Listen to this article•0:00 min

Share this article:

Microsoft 365 Copilot Researcher Gains Multi-Model AI Features for Enterprise Deep Research — AI-generated illustration

Microsoft 365 Copilot's deep research agent took a structural turn on March 30 when the company unveiled two features, Critique and Council, that abandon the single-model answer in favor of a deliberative, multi-model workflow built explicitly for enterprise accountability.

Critique pairs two AI partners with distinct responsibilities: one model handles generation, planning the research task, iterating through retrieval, and producing an initial draft, while a second model acts as an expert reviewer, focusing on validation and refinement before a final report is delivered. Microsoft measured the result on the DRACO benchmark, a 100-task suite spanning 10 domains that grades outputs on factual accuracy, breadth and depth of analysis, presentation quality, and citation quality. Researcher with Critique posted a +7.0-point improvement in the aggregated DRACO score, a 13.88 percent gain over Perplexity Deep Research, which had held the top position in the February 2026 paper that introduced the benchmark.

Critique separates generation from evaluation and draws on models from frontier labs including Anthropic and OpenAI, with Anthropic's Claude reviewing responses drafted by OpenAI's models. Critique becomes the default when a user selects "auto" in Researcher's model picker, meaning most enterprise users will encounter cross-model scrutiny without any additional configuration.

Council brings multiple model responses side-by-side inside the Researcher experience, and a cover letter surfaces where the models agree, where they diverge, and the unique insights each contributes. The distinction between the two features reflects a deliberate design choice: Critique operates invisibly in the background to improve a single output, while Council hands the divergence directly to the user, turning model disagreement into a visible audit trail rather than a liability.

The governance logic underpinning both features tracks a specific pressure point in enterprise AI procurement. Businesses in finance, healthcare, and legal services have increasingly demanded tools that can explain and justify their reasoning, and Microsoft explicitly frames Critique and Council as responses for teams that need defensible, documented answers for regulatory processes. By exposing provenance and model-level disagreement, Microsoft is positioning Copilot not just as a drafting assistant but as a documented reasoning layer capable of withstanding compliance review.

The rollout is part of Microsoft's Wave 3 updates for Microsoft 365 Copilot, with Critique and Council broadly available through the company's Frontier program as of the March 30 announcement. The same wave also brought Copilot Cowork, an Anthropic Claude-powered agent designed for long-running, multi-step workflows inside Microsoft 365 apps, to Frontier availability on the same day.

Industry analysts characterized Microsoft's approach as a bet that differentiation in the AI market will ultimately be won at the context and orchestration layer rather than through raw model performance. The compute cost of running parallel models at scale remains an open question, but for regulated enterprises weighing the organizational risk of a single hallucinated answer in a legal brief or financial memo, Microsoft's architecture makes a direct case that the overhead is the point.

Know something we missed? Have a correction or additional information?

Submit a Tip