# Future AGI — Evals + Tracing for Agents

> Future AGI is an open-source platform for self-improving agents: tracing, evals, simulations, guardrails, and an OpenAI-compatible gateway you can self-host.

## Quick Use

**Cloud (fastest):**

```bash
pip install ai-evaluation
# Sign up: https://app.futureagi.com/auth/jwt/register
```

**Self-host (Docker):**

```bash
git clone https://github.com/future-agi/future-agi.git
cd future-agi
./bin/install
```

Then open `http://localhost:3000`.

## Intro

Future AGI is an open-source evaluation and observability platform for agents: OTel-native tracing, 50+ evaluation metrics, conversation simulations, injection/jailbreak guardrail scanners, and an OpenAI-compatible gateway, all self-hostable for a closed feedback loop.

- **Best for:** LLM/agent teams that want evals, tracing, and guardrails in one feedback loop
- **Works with:** Docker; Python 3.11+; OpenTelemetry; an OpenAI-compatible gateway layer
- **Setup time:** 15–40 minutes

## Practical Notes

- Per README: the gateway benchmarks at ~29k req/s on a t3.xlarge, with P99 latency ≤ 21 ms with guardrails enabled.
- Per README: 50+ framework instrumentors (OTel-native tracing) and 50+ evaluation metrics.
- Per README: built-in scanners for injection, jailbreak, and PII, plus a self-hostable data loop.

## Main

A practical rollout plan:

1. **Instrument first, optimize later.** Turn on tracing before you tune prompts, so every change has before/after evidence.
2. **Build a small eval suite (10–30 cases).** Mix happy-path inputs, edge cases, tool failures, and policy-sensitive inputs.
3. **Route all traffic through the gateway.** Keep routing, guardrails, and logging in one place; treat it as your agent "control plane".
4. **Close the loop weekly.** Use traces and eval failures to pick the next prompt, tool, or fallback improvement.

If you already use OpenTelemetry elsewhere, align service names, environments, and trace IDs so you can correlate agent spans with API/database spans.
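Step 2 of the plan above (a small eval suite) can be sketched without any framework. Everything here is a hypothetical illustration — `EvalCase`, `run_suite`, and the stub agent are not part of Future AGI's SDK; in practice its evaluation metrics would replace the hand-written `check` functions:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class EvalCase:
    name: str                        # e.g. "happy-path", "tool-failure", "policy-sensitive"
    prompt: str
    check: Callable[[str], bool]     # returns True when the output is acceptable
    tags: List[str] = field(default_factory=list)

def run_suite(agent: Callable[[str], str], cases: List[EvalCase]) -> dict:
    """Run every case against the agent; report pass rate and failing case names."""
    failures = []
    for case in cases:
        try:
            ok = case.check(agent(case.prompt))
        except Exception:
            ok = False               # a crash counts as a failure, like a tool error would
        if not ok:
            failures.append(case.name)
    total = len(cases)
    return {
        "pass_rate": (total - len(failures)) / total if total else 0.0,
        "failures": failures,
    }

# Usage: a stub echo-agent and two toy cases standing in for a 10-30 case suite.
cases = [
    EvalCase("happy-path", "say hello", lambda out: "hello" in out.lower()),
    EvalCase("edge-empty", "", lambda out: out != ""),
]
report = run_suite(lambda p: p or "", cases)
```

The point of the harness is the report shape: a single pass rate for trend lines, plus named failures you can map back to traces when closing the weekly loop.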
### FAQ

**Q: Is it only for evals?**
A: No. Per the README it also includes tracing/observability, simulations, guardrails, and a gateway, so you can run an end-to-end feedback loop.

**Q: How do I start small?**
A: Self-host, then instrument one agent and run a tiny eval suite (10–30 cases). Expand only after you trust the data.

**Q: What should I track first?**
A: Latency, token/cost proxies, tool-call success rate, and your top failure modes (hallucination, injection, unsafe outputs).

## Source & Thanks

> Source: https://github.com/future-agi/future-agi
> License: Apache-2.0
> GitHub stars: 938 · forks: 179

---

Source: https://tokrepo.com/en/workflows/future-agi-evals-tracing-for-agents
Author: AI Open Source