Microsoft unveils ASSERT, open-source framework for AI evaluations

Microsoft’s ASSERT turns plain-language AI policies into test suites, aiming to catch when agents stray on refunds, fraud and out-of-policy requests.

Sarah Chen·6/2/2026·2 min read

Published 07:17 PM

Microsoft unveils ASSERT, open-source framework for AI evaluations — Source: techcrunch.com

Microsoft on Tuesday moved to close one of enterprise AI’s most persistent trust gaps with ASSERT, an open-source framework that turns natural-language behavior requirements into executable evaluations for models, applications and agents. The system, short for Adaptive Spec-driven Scoring for Evaluation and Regression Testing, automatically generates test scenarios, datasets, metrics and scorecards, then runs them against a target system so developers can see where behavior drifts.

The pitch goes beyond generic model scoring. Microsoft said broad metrics such as helpfulness, relevance, groundedness, toxicity and faithfulness can still miss application-specific failures, especially when an agent has to follow business rules. A support bot may need to approve a refund in one case, escalate suspected fraud in another, and reject a request that falls outside policy. ASSERT is designed to make those boundaries explicit by treating the behavior specification as the starting point for evaluation, converting it into an inspectable taxonomy, generating stratified test cases and scoring each failure against the policy that produced it.
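The spec-first idea can be illustrated with a short Python sketch. This is not ASSERT's API, which the announcement does not detail: every name here (`PolicyRule`, `EvalCase`, `toy_agent`, `scorecard`) is hypothetical, the 30-day refund window is an invented example rule, and the test cases are written by hand rather than generated. It only shows the general shape of the approach: each test case is tied to the rule it came from, cases are spread evenly across rules, and failures are counted per rule.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is an illustration, not ASSERT's API.

@dataclass(frozen=True)
class PolicyRule:
    rule_id: str
    description: str      # the plain-language requirement
    expected_action: str  # "approve", "escalate" or "reject"

@dataclass(frozen=True)
class EvalCase:
    rule_id: str          # the rule this case was derived from
    request: dict

# Invented example policy for a support bot.
RULES = [
    PolicyRule("refund-ok", "Approve refunds within 30 days of purchase.", "approve"),
    PolicyRule("fraud", "Escalate requests flagged as suspected fraud.", "escalate"),
    PolicyRule("out-of-policy", "Reject refunds after 30 days.", "reject"),
]

# Stratified by rule: two hand-written cases per rule.
CASES = [
    EvalCase("refund-ok", {"days_since_purchase": 5, "fraud_flag": False}),
    EvalCase("refund-ok", {"days_since_purchase": 30, "fraud_flag": False}),
    EvalCase("fraud", {"days_since_purchase": 5, "fraud_flag": True}),
    EvalCase("fraud", {"days_since_purchase": 90, "fraud_flag": True}),
    EvalCase("out-of-policy", {"days_since_purchase": 31, "fraud_flag": False}),
    EvalCase("out-of-policy", {"days_since_purchase": 400, "fraud_flag": False}),
]

def toy_agent(request):
    """Stand-in for the system under test; it deliberately ignores the fraud flag."""
    return "approve" if request["days_since_purchase"] <= 30 else "reject"

def scorecard(agent, rules, cases):
    """Count passes and failures per rule so each failure maps back to a policy."""
    expected = {rule.rule_id: rule.expected_action for rule in rules}
    card = {rule.rule_id: {"passed": 0, "failed": 0} for rule in rules}
    for case in cases:
        outcome = "passed" if agent(case.request) == expected[case.rule_id] else "failed"
        card[case.rule_id][outcome] += 1
    return card

CARD = scorecard(toy_agent, RULES, CASES)
for rule_id, counts in CARD.items():
    print(rule_id, counts)
```

In this toy run the stand-in agent passes both refund rules but fails every fraud case, the kind of application-specific gap that a broad helpfulness or relevance score would not surface on its own.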

Microsoft said the framework was built on Microsoft Research work and was announced in a June 2, 2026 Foundry blog post as part of a broader Build 2026 trust framework push. The company said ASSERT works across LangChain, CrewAI, LiteLLM, OpenAI and other stacks. The GitHub repository describes it as a requirement-driven evaluation harness for AI agents and LLM applications that can generate behavior-specific test cases and inspect local-first artifacts. That makes the tool less about a single benchmark and more about standardizing how teams test the rules that matter inside their own products.

The release also fits into a larger governance effort. Microsoft paired ASSERT with Agent Control Specification, a portable runtime control standard in its Agent Governance Toolkit, and the release follows the open-source RAMPART and Clarity safety tools Microsoft released on May 20, 2026. Together, the projects show a strategy aimed at making agent testing and runtime control more repeatable. The limit is also clear: text-defined tests can make policies legible, but they still only catch the failures developers can think to describe, leaving live systems exposed to edge cases, emergent behavior and shifts that no static test set will fully cover.

This article was produced by Prism’s automated news system from verified source data, official records, and press releases, then run through automated quality and moderation checks before publishing. The system is built and supervised by the people who set the standards it runs under. Read our full AI policy.