Cette page est affichée en anglais. Une traduction française est en cours.
WorkflowsMay 12, 2026·2 min de lecture

Future AGI — Evals + Tracing for Agents

Future AGI is an open-source platform for self-improving agents: tracing, evals, simulations, guardrails, and an OpenAI-compatible gateway to self-host.

Prêt pour agents

Cet actif peut être lu et installé directement par les agents

TokRepo expose une commande CLI universelle, un contrat d'installation, le metadata JSON, un plan selon l'adaptateur et le contenu raw pour aider les agents à juger l'adaptation, le risque et les prochaines actions.

Native · 94/100Policy : autoriser
Surface agent
Tout agent MCP/CLI
Type
Cli
Installation
Manual
Confiance
Confiance : Established
Point d'entrée
./bin/install
Commande CLI universelle
npx tokrepo install 8c7b4d7d-b353-52bf-adc8-654c43f36edf
Introduction

Future AGI is an open-source platform for self-improving agents: tracing, evals, simulations, guardrails, and an OpenAI-compatible gateway to self-host.

  • Best for: LLM/agent teams that want eval + tracing + guardrails in one feedback loop
  • Works with: Docker; Python 3.11+; OpenTelemetry; OpenAI-compatible gateway layer
  • Setup time: 15–40 minutes

Practical Notes

  • Per README: gateway benchmarks ~29k req/s on t3.xlarge; P99 ≤ 21 ms with guardrails on.
  • Per README: 50+ framework instrumentors (OTel-native tracing) and 50+ evaluation metrics.
  • Per README: built-in scanners for injection/jailbreak/PII plus a self-hostable data loop.

Main

A practical rollout plan:

  1. Instrument first, optimize later. Turn on tracing before you tune prompts, so every change has before/after evidence.
  2. Build a small eval suite (10–30 cases). Mix: happy-path, edge cases, tool failures, and policy-sensitive inputs.
  3. Route all traffic through the gateway. Keep routing, guardrails, and logging in one place; treat it like your agent “control plane”.
  4. Close the loop weekly. Use traces + eval failures to pick the next prompt/tool/fallback improvements.

If you already use OpenTelemetry elsewhere, align service names, environments, and trace IDs so you can correlate agent spans with API/database spans.

FAQ

Q: Is it only for evals? A: No—per README it includes tracing/observability, simulations, guardrails, and a gateway so you can run an end-to-end feedback loop.

Q: How do I start small? A: Self-host, then instrument one agent and run a tiny eval suite (10–30 cases). Expand only after you trust the data.

Q: What should I track first? A: Latency, token/cost proxies, tool-call success rate, and top failure modes (hallucination, injection, unsafe outputs).

🙏

Source et remerciements

Source: https://github.com/future-agi/future-agi > License: Apache-2.0 > GitHub stars: 938 · forks: 179

Fil de discussion

Connectez-vous pour rejoindre la discussion.
Aucun commentaire pour l'instant. Soyez le premier à partager votre avis.

Actifs similaires