The Microsoft 365 Copilot Agent Evaluations CLI (@microsoft/m365-copilot-eval) helps you test, measure, and improve the quality of your agents. It runs structured evaluations, scores responses with AI-based metrics, and produces detailed result reports.
Note
The Agent Evaluations CLI is currently in preview. Features and functionality are subject to change.
What you can do
The evaluation tool provides the following capabilities:
- Run batch and interactive evaluations.
- Automatically score responses using Azure AI and machine learning evaluation metrics.
- Test using JSON datasets, inline prompts, or interactive input.
- Generate reports in HTML, JSON, or CSV formats.
Evaluation metrics
Each response is scored using standard evaluation metrics.
| Evaluator | Type | Scale | Default threshold | Enabled by default |
|---|---|---|---|---|
| Relevance | LLM-based | 1-5 | 3 | Yes |
| Coherence | LLM-based | 1-5 | 3 | Yes |
| Groundedness | LLM-based | 1-5 | 3 | No |
| Similarity | LLM-based | 1-5 | 3 | No |
| Citations | Count-based | >= 0 | 1 | No |
| ExactMatch | String match | boolean | N/A | No |
| PartialMatch | String match | 0.0-1.0 | 0.5 | No |
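To make the thresholds concrete, the following TypeScript sketch shows how a per-response pass/fail verdict could be derived from the default thresholds in the table. It's a hypothetical illustration, not code from @microsoft/m365-copilot-eval: the `passes` function and `DEFAULT_THRESHOLDS` map are assumed names, and the sketch assumes that a score passes when it's greater than or equal to its threshold, which might not match the CLI's exact rule.

```typescript
// Hypothetical pass/fail check built from the default thresholds in the table.
// ASSUMPTION: a numeric score passes when it is >= its threshold.
// This is an illustration only, not part of @microsoft/m365-copilot-eval.

type Evaluator =
  | "Relevance"
  | "Coherence"
  | "Groundedness"
  | "Similarity"
  | "Citations"
  | "ExactMatch"
  | "PartialMatch";

// ExactMatch has no threshold (N/A in the table), so it is excluded here.
const DEFAULT_THRESHOLDS: Record<Exclude<Evaluator, "ExactMatch">, number> = {
  Relevance: 3, // LLM-based, 1-5
  Coherence: 3, // LLM-based, 1-5
  Groundedness: 3, // LLM-based, 1-5
  Similarity: 3, // LLM-based, 1-5
  Citations: 1, // count-based, >= 0
  PartialMatch: 0.5, // string match, 0.0-1.0
};

function passes(evaluator: Evaluator, score: number | boolean): boolean {
  if (evaluator === "ExactMatch") {
    // ExactMatch yields a boolean, which is the verdict itself.
    return score === true;
  }
  if (typeof score !== "number") {
    throw new TypeError(`${evaluator} expects a numeric score`);
  }
  // Here `evaluator` is narrowed to the evaluators that have a threshold.
  return score >= DEFAULT_THRESHOLDS[evaluator];
}
```

Under this assumption, `passes("Relevance", 3)` returns `true`, while `passes("Citations", 0)` returns `false` because at least one citation is expected by default.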
How the evaluation workflow works
Evaluations follow a consistent workflow:
- Install and configure the CLI.
- Provide environment configuration and credentials.
- Create a dataset of test prompts.
- Run evaluations against your agent.
- Review results and iterate.
Required environment variables
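The evaluation tool uses environment variables to authenticate and to connect to your tenant and your Azure OpenAI in Foundry Models resource. TENANT_ID, AZURE_AI_OPENAI_ENDPOINT, and AZURE_AI_API_KEY have no default value and must be set. The remaining variables are either optional or fall back to the default shown.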
| Variable | Description | Default |
|---|---|---|
| TENANT_ID | Microsoft Entra tenant ID where your agent is deployed. | None |
| AZURE_AI_OPENAI_ENDPOINT | Azure OpenAI endpoint URL. | None |
| AZURE_AI_API_KEY | Azure OpenAI API key. | None |
| M365_TITLE_ID (optional) | Title ID used to auto-detect the Microsoft 365 agent ID for evaluation. | None |
| M365_AGENT_ID (optional) | Explicit agent ID for evaluation. | Auto-detected from M365_TITLE_ID |
| AZURE_AI_API_VERSION | Azure OpenAI REST API version. | 2024-12-01-preview |
| AZURE_AI_MODEL_NAME | Model deployment name in your Azure OpenAI in Foundry Models resource. | gpt-4o-mini |
These values enable authentication and allow the tool to run LLM-based evaluation scoring. For details about how to get these values, see Get values for environment variables.
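The variable table can also be read as a resolution rule: some variables must be present, and the others are optional or fall back to a documented default. The following TypeScript sketch applies that rule. It's a hypothetical helper for a wrapper script, not part of the CLI; the names `loadEvalConfig` and `EvalConfig` are assumptions. It takes the environment as a plain object so it runs without Node.js type definitions; in a Node.js script you would pass `process.env`.

```typescript
// Hypothetical helper that mirrors the environment variable table.
// Not part of @microsoft/m365-copilot-eval; names are illustrative only.

type Env = Record<string, string | undefined>;

interface EvalConfig {
  tenantId: string;
  endpoint: string;
  apiKey: string;
  titleId?: string;
  // Left undefined when not set; per the table, the CLI then
  // auto-detects the agent ID from M365_TITLE_ID.
  agentId?: string;
  apiVersion: string;
  modelName: string;
}

// Variables with no default that are not marked optional in the table.
const REQUIRED = ["TENANT_ID", "AZURE_AI_OPENAI_ENDPOINT", "AZURE_AI_API_KEY"] as const;

function loadEvalConfig(env: Env): EvalConfig {
  // Report every missing required variable at once, not just the first.
  const missing = REQUIRED.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  return {
    tenantId: env.TENANT_ID!,
    endpoint: env.AZURE_AI_OPENAI_ENDPOINT!,
    apiKey: env.AZURE_AI_API_KEY!,
    titleId: env.M365_TITLE_ID || undefined,
    agentId: env.M365_AGENT_ID || undefined,
    // Documented defaults from the table.
    apiVersion: env.AZURE_AI_API_VERSION || "2024-12-01-preview",
    modelName: env.AZURE_AI_MODEL_NAME || "gpt-4o-mini",
  };
}
```

Collecting all missing names before throwing is a design choice for faster setup: a new user sees every variable they still need in a single error message.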