Practical Notes
- Per README: gateway benchmarks ~29k req/s on t3.xlarge; P99 ≤ 21 ms with guardrails on.
- Per README: 50+ framework instrumentors (OTel-native tracing) and 50+ evaluation metrics.
- Per README: built-in scanners for injection/jailbreak/PII plus a self-hostable data loop.
Main
A practical rollout plan:
- Instrument first, optimize later. Turn on tracing before you tune prompts, so every change has before/after evidence.
- Build a small eval suite (10–30 cases). Mix: happy-path, edge cases, tool failures, and policy-sensitive inputs.
- Route all traffic through the gateway. Keep routing, guardrails, and logging in one place; treat it like your agent “control plane”.
- Close the loop weekly. Use traces + eval failures to pick the next prompt/tool/fallback improvements.
If you already use OpenTelemetry elsewhere, align service names, environments, and trace IDs so you can correlate agent spans with API/database spans.
FAQ
Q: Is it only for evals? A: No—per README it includes tracing/observability, simulations, guardrails, and a gateway so you can run an end-to-end feedback loop.
Q: How do I start small? A: Self-host, then instrument one agent and run a tiny eval suite (10–30 cases). Expand only after you trust the data.
Q: What should I track first? A: Latency, token/cost proxies, tool-call success rate, and top failure modes (hallucination, injection, unsafe outputs).