# AgentEval — .NET Toolkit for Agent Evaluation

> AgentEval is a .NET evaluation toolkit for AI agents that validates tool usage, scores RAG quality, compares models, and exports regression-ready reports.

## Install

Add the NuGet package to your test project:

```bash
dotnet add package AgentEval --prerelease
```

## Quick Use

1. Install / run:

   ```bash
   dotnet add package AgentEval --prerelease
   ```

2. Start / smoke test:

   ```bash
   dotnet test
   ```

3. Verify:
   - Run the Getting Started guide; confirm at least one evaluation asserts tool usage and produces a report artifact in CI.

## Intro

AgentEval is a .NET evaluation toolkit for AI agents that validates tool usage, scores RAG quality, compares models, and exports regression-ready reports.

- **Best for:** .NET teams building tool-using agents who want evaluation code that lives next to unit tests
- **Works with:** .NET 8+ apps; integrates with agent frameworks and CI pipelines; ships as a NuGet package
- **Setup time:** 15 minutes

## Practical Notes

- Setup time is about 15 minutes (add the NuGet package and run one starter eval).
- Runs alongside tests: the fastest regression check is `dotnet test` with evaluation assertions enabled.
- GitHub stars and forks (verified): see Source & Thanks.

AgentEval is most useful when you treat tool usage as a contract. Instead of only judging final text, assert that:

- The agent called the expected tools (and did not call forbidden ones).
- The tool inputs are well-formed and minimally scoped.
- Retrieval answers are grounded (your RAG checks pass consistently).

Minimal sketches of these assertions appear in Example Sketches at the end of this page.

Because this repo is explicitly labeled as preview/experimental, pin versions in CI (for example, set an explicit `Version` on the `PackageReference` rather than floating to the latest prerelease) and keep an upgrade checklist (baseline scores plus golden traces) before bumping.

### FAQ

**Q: Is this production-ready?**
A: The repo warns it is preview/experimental. Use it in CI with pinned versions and your own validation before shipping.

**Q: Can I evaluate tool calls, not just text?**
A: Yes — tool usage validation is a first-class goal in the project description.

**Q: How do I start fast?**
A: Add the NuGet package, follow the Getting Started guide, and turn one high-risk workflow into an eval test.

## Source & Thanks

> Source: https://github.com/AgentEvalHQ/AgentEval
> License: MIT
> GitHub stars: 89 · forks: 8
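---

## Example Sketches

Below is a minimal sketch of the tool-usage contract described in Practical Notes. AgentEval's actual assertion API is not documented on this page, so the agent interface (`IAgent`), the trace types (`AgentRun`, `ToolCall`), the fake agent, and the tool names are illustrative placeholders rather than the library's real surface; only the xUnit calls are standard. The pattern is what matters: capture which tools ran, then assert on the trace like any other unit test.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Xunit;

// Hypothetical trace types; substitute your framework's equivalents.
public sealed record ToolCall(string Name, IReadOnlyDictionary<string, string> Arguments);
public sealed record AgentRun(string FinalAnswer, IReadOnlyList<ToolCall> ToolCalls);

public interface IAgent
{
    Task<AgentRun> RunAsync(string prompt);
}

// Stand-in agent so the sketch compiles; replace with your real agent under test.
public sealed class FakeRefundAgent : IAgent
{
    public Task<AgentRun> RunAsync(string prompt) =>
        Task.FromResult(new AgentRun(
            FinalAnswer: "Refund issued for order 1234.",
            ToolCalls: new[]
            {
                new ToolCall("lookup_order",
                    new Dictionary<string, string> { ["order_id"] = "1234" }),
            }));
}

public class ToolContractTests
{
    [Fact]
    public async Task RefundWorkflow_CallsExpectedTools_AndNothingForbidden()
    {
        IAgent agent = new FakeRefundAgent();
        AgentRun run = await agent.RunAsync("Refund order 1234");

        var toolNames = run.ToolCalls.Select(c => c.Name).ToList();

        // Contract 1: the expected tool was called.
        Assert.Contains("lookup_order", toolNames);

        // Contract 2: forbidden tools were never called.
        Assert.DoesNotContain("delete_order", toolNames);

        // Contract 3: tool inputs are well-formed and minimally scoped.
        var lookup = run.ToolCalls.Single(c => c.Name == "lookup_order");
        Assert.True(lookup.Arguments.ContainsKey("order_id"));
        Assert.Single(lookup.Arguments); // no extra, over-scoped parameters
    }
}
```

Because these assertions run under plain `dotnet test`, they join the same CI gate as your unit tests, which is exactly the workflow the Quick Use steps describe.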
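The grounding bullet can be sketched the same way. The `IsGrounded` helper below is a deliberately naive word-overlap placeholder, not AgentEval's scorer; a real groundedness check (embedding similarity, NLI, or an LLM judge) would replace it, but the test shape stays the same.

```csharp
using System;
using System.Linq;
using Xunit;

public static class Grounding
{
    // Naive placeholder: treat an answer as grounded if some retrieved
    // passage shares a long-enough word overlap with it. Swap in a real
    // groundedness scorer here; only the boolean contract matters.
    public static bool IsGrounded(string answer, string[] passages, int minSharedWords = 4)
    {
        var answerWords = answer
            .Split(' ', StringSplitOptions.RemoveEmptyEntries)
            .Select(w => w.Trim('.', ',').ToLowerInvariant())
            .ToHashSet();

        return passages.Any(p =>
            p.Split(' ', StringSplitOptions.RemoveEmptyEntries)
             .Select(w => w.Trim('.', ',').ToLowerInvariant())
             .Count(answerWords.Contains) >= minSharedWords);
    }
}

public class GroundingTests
{
    [Fact]
    public void Answer_IsSupported_ByRetrievedPassages()
    {
        string[] retrieved =
        {
            "Orders can be refunded within 30 days of delivery.",
            "Refunds are issued to the original payment method.",
        };
        string answer = "Your order can be refunded within 30 days of delivery.";

        Assert.True(Grounding.IsGrounded(answer, retrieved));
    }
}
```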