[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"pack-detail-llm-observability-en":3,"seo:pack:llm-observability:en":78},{"code":4,"message":5,"data":6},200,"Operation successful",{"pack":7},{"slug":8,"icon":9,"tone":10,"status":11,"status_label":12,"title":13,"description":14,"items":15,"install_cmd":77},"llm-observability","📊","#EA580C","stable","Stable","LLM Observability","Langfuse, AgentOps, LangSmith, Phoenix — the dashboards that catch token blow-ups before your CFO does.",[16,28,36,46,54,62,69],{"id":17,"uuid":18,"slug":19,"title":20,"description":21,"author_name":22,"view_count":23,"vote_count":24,"lang_type":25,"type":26,"type_label":27},288,"49a8eb0b-b44b-46c2-b3c8-b54e55fb224f","langfuse-open-source-llm-observability-49a8eb0b","Langfuse — Open Source LLM Observability","Langfuse is an open-source LLM engineering platform for tracing, prompt management, evaluation, and debugging AI apps. 24.1K+ GitHub stars. Self-hosted or cloud. MIT.","Langfuse",300,0,"en","skill","Skill",{"id":29,"uuid":30,"slug":31,"title":32,"description":33,"author_name":34,"view_count":35,"vote_count":24,"lang_type":25,"type":26,"type_label":27},236,"d570c84f-4e22-4723-806a-d23710686a5c","agentops-observability-ai-agents-d570c84f","AgentOps — Observability for AI Agents","Python SDK for AI agent monitoring. LLM cost tracking, session replay, benchmarking, and error analysis. Integrates with CrewAI, LangChain, AutoGen, and more. 5.4K+ stars.","Script Depot",240,{"id":37,"uuid":38,"slug":39,"title":40,"description":41,"author_name":42,"view_count":43,"vote_count":24,"lang_type":25,"type":44,"type_label":45},768,"4d9432ea-330f-44b6-a629-5b29627f746a","langsmith-prompt-debugging-llm-observability-4d9432ea","LangSmith — Prompt Debugging and LLM Observability","Debug, test, and monitor LLM applications in production. 
LangSmith provides trace visualization, prompt playground, dataset evaluation, and regression testing for AI.","Prompt Lab",305,"prompt","Prompt",{"id":47,"uuid":48,"slug":49,"title":50,"description":51,"author_name":52,"view_count":53,"vote_count":24,"lang_type":25,"type":26,"type_label":27},303,"42fa8573-760e-4a07-a19f-43422546e9f5","phoenix-open-source-ai-observability-42fa8573","Phoenix — Open Source AI Observability","Phoenix is an AI observability platform for tracing, evaluating, and debugging LLM apps. 9.1K+ stars. OpenTelemetry, evals, prompt management.","Arize AI",269,{"id":55,"uuid":56,"slug":57,"title":58,"description":59,"author_name":60,"view_count":61,"vote_count":24,"lang_type":25,"type":26,"type_label":27},442,"13e3c714-032f-4323-b9ee-69f38e613f45","openlit-opentelemetry-llm-observability-13e3c714","OpenLIT — OpenTelemetry LLM Observability","Monitor LLM costs, latency, and quality with OpenTelemetry-native tracing. GPU monitoring and guardrails built in. 2.3K+ stars.","AI Open Source",255,{"id":63,"uuid":64,"slug":65,"title":66,"description":67,"author_name":60,"view_count":68,"vote_count":24,"lang_type":25,"type":26,"type_label":27},730,"a53444d6-2d55-4f59-ba6f-3b672d7ec458","langtrace-open-source-ai-observability-platform-a53444d6","Langtrace — Open Source AI Observability Platform","Open-source observability for LLM apps. Trace OpenAI, Anthropic, and LangChain calls with OpenTelemetry-native instrumentation and a real-time dashboard.",248,{"id":70,"uuid":71,"slug":72,"title":73,"description":74,"author_name":75,"view_count":76,"vote_count":24,"lang_type":25,"type":26,"type_label":27},92,"aa41279c-0695-4fd6-a8ec-f70e0f255cff","gemini-cli-extension-observability-monitoring-logs-aa41279c","Gemini CLI Extension: Observability — Monitoring & Logs","Gemini CLI extension for Google Cloud observability. 
Set up monitoring, analyze logs, create dashboards, and configure alerts.","Google · Gemini Team",322,"tokrepo install pack\u002Fllm-observability",{"pageType":79,"pageKey":8,"locale":25,"title":80,"metaDescription":81,"h1":13,"tldr":82,"bodyMarkdown":83,"faq":84,"schema":100,"internalLinks":109,"citations":122,"wordCount":135,"generatedAt":136},"pack","LLM Observability: Langfuse, AgentOps, LangSmith, Phoenix","Catch token blow-ups before your CFO does. Langfuse, AgentOps, LangSmith, Phoenix — the dashboards every shipping LLM team runs. Install via TokRepo.","Seven LLM observability assets — open-source (Langfuse, Phoenix, AgentOps) plus hosted (LangSmith) — to trace prompts, score outputs, and alert on cost spikes before they hit your billing dashboard.","## What's in this pack\n\nYou can't fix what you can't see. The day a prompt regression silently 3x's your token bill is the day you wish you'd installed an observability layer last quarter. This pack collects the **seven assets** that turn an opaque LLM black box into a debuggable, alertable, optimizable system.\n\n| # | Asset | Tier | What it does |\n|---|---|---|---|\n| 1 | Langfuse | open-source | full traces, eval, prompt management — self-host or cloud |\n| 2 | AgentOps | open-source | agent-specific observability with session replay |\n| 3 | Arize Phoenix | open-source | OpenInference traces with built-in evaluators |\n| 4 | LangSmith | hosted | LangChain's first-party tracing & dataset platform |\n| 5 | Token cost dashboards | pattern | per-user, per-feature, per-prompt-version breakdown |\n| 6 | Latency budget alerts | pattern | p95 \u002F p99 with PagerDuty wiring |\n| 7 | Prompt version diffs | pattern | side-by-side trace replay across two prompt versions |\n\n## Why this matters\n\nThree production failure modes that observability catches and intuition misses:\n\n1. **Silent token inflation.** A \"minor\" prompt edit adds a 200-token reminder. 
At 1M requests\u002Fday that is 200M extra input tokens every day, roughly $2-6k\u002Fmo you didn't budget for. Langfuse's per-prompt-version cost view surfaces it on day one.\n2. **The 95th-percentile tail.** Average latency looks fine, but the 5% of queries hitting a cold cache, retry loops, or oversized RAG payloads tank user experience. p95 and p99 latency dashboards from Phoenix or LangSmith make the tail visible.\n3. **Quality regression invisible at the unit level.** Each individual response looks plausible. Compare aggregate evaluator scores (LLM-as-judge, retrieval recall, hallucination rate) over the last 24h against the previous 7d, and the regression jumps out.\n\n## Install in one command\n\n```bash\n# Install the entire pack\ntokrepo install pack\u002Fllm-observability\n\n# Or pick the platform you want to start with\ntokrepo install langfuse\ntokrepo install agentops\ntokrepo install phoenix\n```\n\nThe TokRepo CLI drops the SDK config and dashboard scaffolding into your project so traces start flowing on the next request, with no manual instrumentation walk-through required.\n\n## Common pitfalls\n\n- **Logging full prompts and PII to a third-party SaaS.** If your prompts include user data, self-host Langfuse or Phoenix; don't ship raw payloads to LangSmith Cloud without redaction. All three open-source options run on a single VM under 4GB RAM for typical loads.\n- **No sampling on high-volume endpoints.** Tracing 100% of requests at 1M\u002Fday will overwhelm both your storage and your wallet. Sample 10% by default, 100% on errors. Langfuse and Phoenix both support this natively.\n- **Tracking tokens but not dollars.** Different models price differently per token. Configure model pricing in your platform once; track cost in dollars, not just token counts. CFOs care about dollars.\n- **One generic dashboard for everyone.** Build one dashboard per persona: eng (latency, error rate), product (cost per feature), exec (cost per active user, week-over-week trend). 
Generic dashboards get ignored.\n- **No alert on prompt-version cost delta.** Add an alert that fires when a new prompt version's average cost per call deviates by more than 20% from the previous version. This is the single highest-ROI alert you'll set up.\n\n## Relationship to other packs\n\nLLM Observability is the **runtime telemetry layer**. The complementary LLM Eval & Guardrails pack (DeepEval, Promptfoo, Ragas) is the **offline scoring layer**. You want both: observability shows you what's happening in production, eval tells you whether a proposed change is better *before* you ship.\n\nMulti-Agent Frameworks (CAMEL, LangGraph, DeepAgents) are the *systems being instrumented*. If you're running a LangGraph workflow and can't see which node failed, you don't have observability; you have a print-statement debugger. Pair the framework pack with this one from day one.",[85,88,91,94,97],{"q":86,"a":87},"Is this stuff free?","Langfuse, Phoenix, and AgentOps publish their source, can be self-hosted, and run on a single VM; Langfuse and AgentOps are MIT-licensed, and you should check each repository for the exact license terms. Self-hosted is free; you only pay for storage and compute. LangSmith is a hosted service metered per trace: the free tier covers small teams, and prices scale to enterprise. For most teams the right answer is to start with self-hosted Langfuse and switch to LangSmith only if you're already deep in the LangChain ecosystem and want first-party integration.",{"q":89,"a":90},"How does Langfuse compare to LangSmith?","Langfuse is open-source, self-hostable, and framework-agnostic: it works with LangChain, LlamaIndex, the raw OpenAI SDK, and custom code. LangSmith is closed-source, hosted, and tightly coupled to LangChain. Feature-wise they're roughly equivalent on tracing and prompt management; LangSmith has a slight edge on LangChain-specific features, while Langfuse has a stronger evaluator framework and self-host story. 
Pick Langfuse if data sovereignty matters, LangSmith if you want zero-ops and are LangChain-native.",{"q":92,"a":93},"Will this work with Cursor or Codex CLI?","Observability is at the API call level, not the editor level — so any tool that hits an LLM API can be instrumented. The TokRepo install adds SDK init code to your project. If you're proxying through Claude Code, Cursor, or Codex CLI, instrument the agent backend (the framework or service that calls the LLM), not the editor. Each platform's SDK is a 5-line import.",{"q":95,"a":96},"What's the difference vs the LLM Eval pack?","Eval is offline scoring — given a prompt and a reference answer, how good is the output. Observability is runtime telemetry — what happened in production: latency, cost, errors, traces. Eval feeds CI; observability feeds dashboards and alerts. You need both. A common pattern: eval scores from your golden set get logged into your observability platform so quality, cost, and latency live on the same dashboard.",{"q":98,"a":99},"How much instrumentation overhead does this add?","Async batched logging adds ~1-3ms p50 latency to LLM calls — negligible compared to the model latency itself (often 500-3000ms). All four platforms ship async SDKs that batch traces in the background. Set sampling to 10% on high-volume endpoints to keep storage costs sane. 
The actual hot-path overhead is so low that there's no good reason to ship without observability.",{"@context":101,"@type":102,"name":13,"description":103,"numberOfItems":104,"publisher":105},"https:\u002F\u002Fschema.org","CollectionPage","Langfuse, AgentOps, LangSmith, Phoenix and the dashboards that catch token blow-ups before your CFO does.",7,{"@type":106,"name":107,"url":108},"Organization","TokRepo","https:\u002F\u002Ftokrepo.com",[110,114,118],{"url":111,"anchor":112,"reason":113},"\u002Fen\u002Fpacks\u002Fllm-eval-guardrails","LLM Eval & Guardrails","complementary offline scoring layer",{"url":115,"anchor":116,"reason":117},"\u002Fen\u002Fpacks\u002Fmulti-agent-frameworks","Multi-Agent Frameworks","the systems these dashboards instrument",{"url":119,"anchor":120,"reason":121},"\u002Fen\u002Ftools\u002Fclaude-code","Claude Code","the agent surface that emits the traces",[123,127,131],{"claim":124,"source_name":125,"source_url":126},"Langfuse open-source LLM engineering platform with tracing, evaluations, and prompt management","langfuse\u002Flangfuse","https:\u002F\u002Fgithub.com\u002Flangfuse\u002Flangfuse",{"claim":128,"source_name":129,"source_url":130},"Arize Phoenix open-source AI observability and evaluation library","Arize-ai\u002Fphoenix","https:\u002F\u002Fgithub.com\u002FArize-ai\u002Fphoenix",{"claim":132,"source_name":133,"source_url":134},"AgentOps SDK for monitoring, debugging and benchmarking AI agents","AgentOps-AI\u002Fagentops","https:\u002F\u002Fgithub.com\u002FAgentOps-AI\u002Fagentops",615,"2026-05-02T15:10:00Z"]