WorkflowsMay 12, 2026·2 min read

Future AGI — Evals + Tracing for Agents

Future AGI is an open-source platform for self-improving agents: tracing, evals, simulations, guardrails, and an OpenAI-compatible gateway to self-host.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Native · 94/100Policy: allow
Agent surface
Any MCP/CLI agent
Kind
Cli
Install
Manual
Trust
Trust: Established
Entrypoint
./bin/install
Universal CLI install command
npx tokrepo install 8c7b4d7d-b353-52bf-adc8-654c43f36edf
Intro

Future AGI is an open-source platform for self-improving agents: tracing, evals, simulations, guardrails, and an OpenAI-compatible gateway to self-host.

  • Best for: LLM/agent teams that want eval + tracing + guardrails in one feedback loop
  • Works with: Docker; Python 3.11+; OpenTelemetry; OpenAI-compatible gateway layer
  • Setup time: 15–40 minutes

Practical Notes

  • Per README: gateway benchmarks ~29k req/s on t3.xlarge; P99 ≤ 21 ms with guardrails on.
  • Per README: 50+ framework instrumentors (OTel-native tracing) and 50+ evaluation metrics.
  • Per README: built-in scanners for injection/jailbreak/PII plus a self-hostable data loop.

Main

A practical rollout plan:

  1. Instrument first, optimize later. Turn on tracing before you tune prompts, so every change has before/after evidence.
  2. Build a small eval suite (10–30 cases). Mix: happy-path, edge cases, tool failures, and policy-sensitive inputs.
  3. Route all traffic through the gateway. Keep routing, guardrails, and logging in one place; treat it like your agent “control plane”.
  4. Close the loop weekly. Use traces + eval failures to pick the next prompt/tool/fallback improvements.

If you already use OpenTelemetry elsewhere, align service names, environments, and trace IDs so you can correlate agent spans with API/database spans.

FAQ

Q: Is it only for evals? A: No—per README it includes tracing/observability, simulations, guardrails, and a gateway so you can run an end-to-end feedback loop.

Q: How do I start small? A: Self-host, then instrument one agent and run a tiny eval suite (10–30 cases). Expand only after you trust the data.

Q: What should I track first? A: Latency, token/cost proxies, tool-call success rate, and top failure modes (hallucination, injection, unsafe outputs).

🙏

Source & Thanks

Source: https://github.com/future-agi/future-agi > License: Apache-2.0 > GitHub stars: 938 · forks: 179

Discussion

Sign in to join the discussion.
No comments yet. Be the first to share your thoughts.

Related Assets