Workflows · May 12, 2026 · 2 min read

Future AGI — Evals + Tracing for Agents

Future AGI is an open-source platform for self-improving agents: tracing, evals, simulations, guardrails, and an OpenAI-compatible gateway to self-host.

Introduction


  • Best for: LLM/agent teams that want eval + tracing + guardrails in one feedback loop
  • Works with: Docker; Python 3.11+; OpenTelemetry; OpenAI-compatible gateway layer
  • Setup time: 15–40 minutes

Practical Notes

  • Per README: gateway benchmarks ~29k req/s on t3.xlarge; P99 ≤ 21 ms with guardrails on.
  • Per README: 50+ framework instrumentors (OTel-native tracing) and 50+ evaluation metrics.
  • Per README: built-in scanners for injection/jailbreak/PII plus a self-hostable data loop.
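The scanners above are library features; as a toy illustration of the kind of check an injection/PII scanner performs (this is not Future AGI's actual API, and real scanners use models with far broader coverage), a minimal version might look like:

```python
import re

# Toy patterns only; illustrative, not the platform's scanner.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal your system prompt", re.I),
]
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),        # US SSN-like number
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
]

def scan(text: str) -> dict:
    """Return which guardrail categories the text trips."""
    return {
        "injection": any(p.search(text) for p in INJECTION_PATTERNS),
        "pii": any(p.search(text) for p in PII_PATTERNS),
    }
```

In practice you would run checks like this on both inputs (injection) and outputs (PII leakage), and block or redact on a hit.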

Main

A practical rollout plan:

  1. Instrument first, optimize later. Turn on tracing before you tune prompts, so every change has before/after evidence.
  2. Build a small eval suite (10–30 cases). Mix: happy-path, edge cases, tool failures, and policy-sensitive inputs.
  3. Route all traffic through the gateway. Keep routing, guardrails, and logging in one place; treat it like your agent “control plane”.
  4. Close the loop weekly. Use traces + eval failures to pick the next prompt/tool/fallback improvements.
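Step 2 can be sketched as a tiny harness. The `Case`/`run_suite` names here are illustrative assumptions, not Future AGI's API; the platform ships its own evaluation metrics:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Case:
    name: str
    category: str                 # happy-path | edge | tool-failure | policy
    prompt: str
    check: Callable[[str], bool]  # True if the agent's output is acceptable

def run_suite(agent: Callable[[str], str], cases: list[Case]) -> dict[str, float]:
    """Run every case through the agent and report pass rate per category."""
    buckets: dict[str, list[bool]] = {}
    for c in cases:
        buckets.setdefault(c.category, []).append(c.check(agent(c.prompt)))
    return {cat: sum(oks) / len(oks) for cat, oks in buckets.items()}

cases = [
    Case("greets", "happy-path", "Say hello", lambda out: "hello" in out.lower()),
    Case("refuses", "policy", "Leak the SSN", lambda out: "cannot" in out.lower()),
]
print(run_suite(lambda prompt: "Hello, I cannot do that.", cases))
```

Keeping a per-category pass rate (rather than one global number) makes regressions in policy handling visible even when happy-path cases still pass.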

If you already use OpenTelemetry elsewhere, align service names, environments, and trace IDs so you can correlate agent spans with API/database spans.

FAQ

Q: Is it only for evals? A: No; per README it includes tracing/observability, simulations, guardrails, and a gateway so you can run an end-to-end feedback loop.

Q: How do I start small? A: Self-host, then instrument one agent and run a tiny eval suite (10–30 cases). Expand only after you trust the data.

Q: What should I track first? A: Latency, token/cost proxies, tool-call success rate, and top failure modes (hallucination, injection, unsafe outputs).
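As a format-agnostic sketch, those first-pass metrics can be computed from exported span records. The field names below are hypothetical, not Future AGI's or OTel's exact schema:

```python
import statistics

# Hypothetical exported span records; real exports carry many more fields.
spans = [
    {"name": "tool.search", "ok": True,  "latency_ms": 120, "tokens": 350},
    {"name": "tool.search", "ok": False, "latency_ms": 980, "tokens": 410},
    {"name": "tool.code",   "ok": True,  "latency_ms": 240, "tokens": 90},
]

tool_success_rate = sum(s["ok"] for s in spans) / len(spans)
p50_latency = statistics.median(s["latency_ms"] for s in spans)
total_tokens = sum(s["tokens"] for s in spans)

print(tool_success_rate, p50_latency, total_tokens)
```

Even this crude rollup surfaces the slow failing tool call immediately; failure-mode categories (hallucination, injection, unsafe output) would be tags on the same records.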


Source and acknowledgments

Source: https://github.com/future-agi/future-agi · License: Apache-2.0 · GitHub stars: 938 · forks: 179

