Observability

The Best AI Monitoring & Observability Tools for 2026

AI observability platforms, LLM evaluation tools, runtime monitoring, and agent debugging dashboards. Gain deep insight into your AI systems.

30 tools

LangSmith — Prompt Debugging and LLM Observability

Debug, test, and monitor LLM applications in production. LangSmith provides trace visualization, prompt playground, dataset evaluation, and regression testing for AI.

Prompt Lab · 13 Prompts

Langfuse — Open Source LLM Observability & Tracing

Trace, evaluate, and monitor LLM applications in production. Open-source alternative to LangSmith with prompt management, cost tracking, and evaluation pipelines.

AI Open Source · 11 Configs

Latitude — AI Agent Engineering Platform

Open-source platform for building, evaluating, and monitoring AI agents in production. Observability, prompt playground, LLM-as-judge evals, experiment comparison. LGPL-3.0, 4,000+ stars.

AI Open Source · 35 Scripts

Opik — Debug, Evaluate & Monitor LLM Apps

Trace LLM calls, run automated evaluations, and monitor RAG and agent quality in production. By Comet. 18K+ GitHub stars.

AI Open Source · 31 Configs

Agenta — Open-Source LLMOps Platform

Prompt playground, evaluation, and observability in one platform. Compare prompts, run evals, trace production calls. 4K+ stars.

Agent Toolkit · 26 Workflows

Grafana — Open Source Data Visualization & Observability

Grafana is the leading open-source platform for monitoring and observability. Visualize metrics, logs, and traces from Prometheus, Loki, Elasticsearch, and 100+ data sources.

Script Depot · 15 Scripts

Gemini CLI Extension: Observability — Monitoring & Logs

Gemini CLI extension for Google Cloud observability. Set up monitoring, analyze logs, create dashboards, and configure alerts.

Skill Factory · 28 Skills

AgentOps — Observability Dashboard for AI Agents

Python SDK for monitoring AI agent sessions with real-time dashboards, token tracking, cost analysis, and error replay. Two lines of code to instrument any framework. 4,500+ GitHub stars.

Agent Toolkit · 23 Scripts

Langfuse — Open Source LLM Observability

Langfuse is an open-source LLM engineering platform for tracing, prompt management, evaluation, and debugging AI apps. 24.1K+ GitHub stars. Self-hosted or cloud. MIT.

AI Open Source · 23 Configs

Axiom MCP — Log Search and Analytics for AI Agents

MCP server that gives AI agents access to Axiom log analytics. Query logs, traces, and metrics through natural language for AI-powered observability and incident response.

MCP Hub · 21 MCP Configs

OpenLIT — OpenTelemetry LLM Observability

Monitor LLM costs, latency, and quality with OpenTelemetry-native tracing. GPU monitoring and guardrails built in. 2.3K+ stars.

AI Open Source · 19 Configs

AgentOps — Observability for AI Agents

Python SDK for AI agent monitoring. LLM cost tracking, session replay, benchmarking, and error analysis. Integrates with CrewAI, LangChain, AutoGen, and more. 5.4K+ stars.

Script Depot · 19 Scripts

Netdata — Real-Time Infrastructure Monitoring & Observability

Netdata is an open-source monitoring agent that collects thousands of metrics per second with zero configuration. Beautiful dashboards, ML-powered alerts, and instant deployment.

Script Depot · 18 Scripts

Phoenix — Open Source AI Observability

Phoenix is an AI observability platform for tracing, evaluating, and debugging LLM apps. 9.1K+ stars. OpenTelemetry, evals, prompt management.

AI Open Source · 15 Configs

Evidently — ML & LLM Monitoring with 100+ Metrics

Evaluate, test, and monitor AI systems with 100+ built-in metrics for data drift, model quality, and LLM output. 7.3K+ stars.

AI Open Source · 14 Workflows

OpenAI Agents SDK — Build Multi-Agent Systems in Python

Official OpenAI Python SDK for building multi-agent systems with handoffs, guardrails, and tracing. Agents delegate to specialists, enforce safety rules, and produce observable traces. 8,000+ stars.

Agent Toolkit · 13 Scripts

Sentry MCP — Error Monitoring Server for AI Agents

MCP server that connects AI agents to Sentry for real-time error monitoring. Query issues, analyze stack traces, track regressions, and resolve bugs with full crash context. 2,000+ stars.

MCP Hub · 12 MCP Configs

Langtrace — Open Source AI Observability Platform

Open-source observability for LLM apps. Trace OpenAI, Anthropic, and LangChain calls with OpenTelemetry-native instrumentation and a real-time dashboard.

AI Open Source · 11 Configs

SigNoz — Open Source APM & Observability Platform

SigNoz is an open-source Datadog/New Relic alternative with logs, traces, and metrics in one platform. Native OpenTelemetry support, ClickHouse backend, and powerful dashboards.

AI Open Source · 10 Configs

Sentry — Open Source Error Tracking & Performance Monitoring

Sentry is the developer-first error tracking and performance monitoring platform. Capture exceptions, trace performance issues, and debug production errors across all languages.

AI Open Source · 8 Configs

VoltAgent — TypeScript AI Agent Framework

Open-source TypeScript framework for building AI agents with built-in Memory, RAG, Guardrails, MCP, Voice, and Workflow support. Includes LLM observability console for debugging.

Script Depot · 44 Scripts

Claude Code Agent: K8s Specialist — Kubernetes Operations

Claude Code agent for Kubernetes. Deployment configs, helm charts, troubleshooting, scaling, monitoring, and cluster management.

Skill Factory · 35 Skills

Uptime Kuma — Self-Hosted Uptime Monitoring

Monitor HTTP, TCP, DNS, Docker services with notifications to 90+ channels. Beautiful dashboard. 84K+ GitHub stars.

MCP Hub · 32 MCP Configs

Gemini CLI Extension: Vertex AI — Model Management

Gemini CLI extension for Vertex AI. Deploy models, manage endpoints, run predictions, and monitor ML pipelines.

Skill Factory · 32 Skills

DeepEval — LLM Testing Framework with 30+ Metrics

DeepEval is a pytest-like testing framework for LLM apps with 30+ metrics. 14.4K+ GitHub stars. RAG, agent, multimodal evaluation. Runs locally. MIT.

Script Depot · 30 Scripts

Phidata — Build & Deploy AI Agents at Scale

Framework for building, running, and managing AI agents at scale. Memory, knowledge, tools, reasoning, and team workflows. Monitoring dashboard included. 39K+ stars.

Script Depot · 26 Scripts

mcp-agent — Build AI Agents with MCP Patterns

mcp-agent is a Python framework for building AI agents using the Model Context Protocol. 8.2K+ GitHub stars. Implements composable workflow patterns (orchestrator, map-reduce, evaluator-optimizer, router).

MCP Hub · 26 MCP Configs, Scripts

Google ADK — Official AI Agent Dev Kit

Google's open-source Agent Development Kit for building, evaluating, and deploying AI agents in Python. 18.7K+ stars. Multi-agent, tool use, eval. Apache 2.0.

Script Depot · 25 Scripts

Ragas — Evaluate RAG & LLM Applications

Ragas evaluates LLM applications with objective metrics, test data generation, and data-driven insights. 13.2K+ GitHub stars. RAG evaluation, auto test generation. Apache 2.0.

Script Depot · 25 Scripts

Glama — MCP Server Discovery and Management

Directory and management platform for MCP servers. Discover, install, and monitor Model Context Protocol servers for Claude Code, Cline, and other AI coding tools.

MCP Hub · 24 MCP Configs

AI Observability

As AI moves from prototypes to production, observability becomes critical. You need to know what your AI is doing, why it made a decision, how much it costs, and when it fails.

LLM Observability — Opik, Langfuse, and AgentOps provide tracing, logging, and analytics for LLM applications. See every prompt, completion, tool call, and token cost in a unified dashboard.
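To make the tracing pattern concrete, here is a minimal sketch using Langfuse's @observe decorator. The import path matches the v2 Python SDK (v3 moved it to the top-level langfuse package), the helper functions are hypothetical placeholders, and the SDK expects LANGFUSE_PUBLIC_KEY and LANGFUSE_SECRET_KEY in the environment:

from langfuse.decorators import observe

@observe()  # records this call as a trace with inputs, outputs, and latency
def retrieve_context(query: str) -> str:
    # hypothetical retrieval step; swap in your vector store lookup
    return "relevant documents for: " + query

@observe()  # nested calls show up as child spans under the parent trace
def answer_question(query: str) -> str:
    context = retrieve_context(query)
    # hypothetical LLM call; in practice Langfuse attaches token counts
    # and cost to generation spans created here
    return "Answer based on: " + context

print(answer_question("What is AI observability?"))

Opik and AgentOps follow a similar decorator-or-init pattern, so instrumenting an existing app is usually a few added lines rather than a rewrite.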

Agent Debugging — Multi-step AI agents are hard to debug. Observability tools capture the full execution trace — every reasoning step, tool invocation, and decision point — so you can replay and diagnose failures.

Evaluation Frameworks — DeepEval, Ragas, and custom eval pipelines measure AI quality systematically. Track accuracy, hallucination rates, latency, and cost across model versions.
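As a sketch of the pytest-style workflow DeepEval supports (the input and output strings are invented, and AnswerRelevancyMetric defaults to an LLM judge model, so an OPENAI_API_KEY is assumed):

from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What are your shipping times?",
        actual_output="Standard orders ship within 3 to 5 business days.",
    )
    # fails the test if the judged relevancy score falls below 0.7
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])

Running cases like this on every model or prompt change turns quality regressions into failing tests instead of user complaints.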

Infrastructure Monitoring — Uptime Kuma and Grafana integrations monitor your AI endpoints, alert on degradation, and track SLAs. Essential for production AI services where downtime or quality drops directly impact users.

You can't improve what you can't measure — and AI systems are notoriously hard to measure.

FAQ

What is AI observability?

AI observability is the practice of monitoring, tracing, and analyzing AI system behavior in production. It goes beyond traditional monitoring (is the server up?) to answer AI-specific questions: Is the model hallucinating? Are responses getting slower? Which prompts produce the best results? How much does each query cost? Tools like Opik and AgentOps provide dashboards that answer these questions in real-time.

How do I debug AI agent failures?

Use tracing tools that capture the full agent execution: every LLM call, tool invocation, memory access, and decision point. AgentOps and Langfuse visualize these traces as timelines, letting you pinpoint exactly where an agent went wrong. For intermittent failures, set up automated evaluation that flags quality drops before users report them.
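As a sketch of how little instrumentation this usually takes, AgentOps advertises a two-line setup; the init call below is the documented entry point, and an AGENTOPS_API_KEY environment variable is assumed:

import agentops

# auto-instruments supported frameworks (LangChain, CrewAI, AutoGen, ...);
# agent sessions, LLM calls, and tool invocations then appear as a
# replayable timeline in the AgentOps dashboard
agentops.init()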

What metrics should I track for LLM applications?

Essential metrics: latency (time to first token, total response time), cost (tokens per request, cost per user), quality (eval scores, hallucination rate, user feedback), and reliability (error rate, timeout rate, retry rate). Advanced: track these metrics per prompt template, per model version, and per user segment to identify regressions quickly.
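A minimal sketch of what per-request tracking can look like in plain Python; the field names and the summarize helper are illustrative, not any particular tool's schema:

from dataclasses import dataclass

@dataclass
class LLMRequestMetrics:
    time_to_first_token_ms: float
    total_latency_ms: float
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    error: bool = False

def summarize(requests: list[LLMRequestMetrics]) -> dict:
    # median latency, average cost, and error rate across a window
    n = len(requests)
    latencies = sorted(r.total_latency_ms for r in requests)
    return {
        "p50_latency_ms": latencies[n // 2],
        "avg_cost_usd": sum(r.cost_usd for r in requests) / n,
        "error_rate": sum(r.error for r in requests) / n,
    }

Segmenting the same numbers per prompt template, model version, and user cohort is what makes regressions surface quickly.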

Explore more categories