# Future AGI — Evals + Tracing for Agents

> Future AGI is an open-source platform for self-improving agents: tracing, evals, simulations, guardrails, and an OpenAI-compatible gateway you can self-host.

## Quick Use

**Cloud (fastest):**

```bash
pip install ai-evaluation
# Sign up: https://app.futureagi.com/auth/jwt/register
```

**Self-host (Docker):**

```bash
git clone https://github.com/future-agi/future-agi.git
cd future-agi
./bin/install
```

Then open `http://localhost:3000`.

## Intro

Future AGI is an open-source evaluation and observability platform for agents: OTel-native tracing, 50+ evaluation metrics, conversation simulations, injection/jailbreak guardrail scanners, and an OpenAI-compatible gateway, all self-hostable for a closed feedback loop.

- **Best for:** LLM/agent teams that want evals, tracing, and guardrails in one feedback loop
- **Works with:** Docker; Python 3.11+; OpenTelemetry; an OpenAI-compatible gateway layer
- **Setup time:** 15–40 minutes

## Practical Notes

- Per README: the gateway benchmarks at ~29k req/s on a t3.xlarge, with P99 latency ≤ 21 ms with guardrails enabled.
- Per README: 50+ framework instrumentors (OTel-native tracing) and 50+ evaluation metrics.
- Per README: built-in scanners for injection, jailbreak, and PII, plus a self-hostable data loop.

## Main

A practical rollout plan:

1. **Instrument first, optimize later.** Turn on tracing before you tune prompts, so every change has before/after evidence.
2. **Build a small eval suite (10–30 cases).** Mix happy-path inputs, edge cases, tool failures, and policy-sensitive inputs.
3. **Route all traffic through the gateway.** Keep routing, guardrails, and logging in one place; treat it as your agent "control plane".
4. **Close the loop weekly.** Use traces and eval failures to pick the next prompt, tool, or fallback improvement.

If you already use OpenTelemetry elsewhere, align service names, environments, and trace IDs so you can correlate agent spans with API/database spans.
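Step 2 of the plan above (a small eval suite) can be sketched without any framework. Everything here is a hypothetical illustration — `EvalCase`, `run_suite`, and the stub agent are not part of Future AGI's SDK; in practice its evaluation metrics would replace the hand-written `check` functions:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class EvalCase:
    name: str                        # e.g. "happy-path", "tool-failure", "policy-sensitive"
    prompt: str
    check: Callable[[str], bool]     # returns True when the output is acceptable
    tags: List[str] = field(default_factory=list)

def run_suite(agent: Callable[[str], str], cases: List[EvalCase]) -> dict:
    """Run every case against the agent; report pass rate and failing case names."""
    failures = []
    for case in cases:
        try:
            ok = case.check(agent(case.prompt))
        except Exception:
            ok = False               # a crash counts as a failure, like a tool error would
        if not ok:
            failures.append(case.name)
    total = len(cases)
    return {
        "pass_rate": (total - len(failures)) / total if total else 0.0,
        "failures": failures,
    }

# Usage: a stub echo-agent and two toy cases standing in for a 10-30 case suite.
cases = [
    EvalCase("happy-path", "say hello", lambda out: "hello" in out.lower()),
    EvalCase("edge-empty", "", lambda out: out != ""),
]
report = run_suite(lambda p: p or "", cases)
```

The point of the harness is the report shape: a single pass rate for trend lines, plus named failures you can map back to traces when closing the weekly loop.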
### FAQ

**Q: Is it only for evals?**
A: No. Per the README it also includes tracing/observability, simulations, guardrails, and a gateway, so you can run an end-to-end feedback loop.

**Q: How do I start small?**
A: Self-host, then instrument one agent and run a tiny eval suite (10–30 cases). Expand only after you trust the data.

**Q: What should I track first?**
A: Latency, token/cost proxies, tool-call success rate, and your top failure modes (hallucination, injection, unsafe outputs).

## Source & Thanks

> Source: https://github.com/future-agi/future-agi
> License: Apache-2.0
> GitHub stars: 938 · forks: 179

---

Source: https://tokrepo.com/en/workflows/future-agi-evals-tracing-for-agents
Author: AI Open Source