Observability

Best AI Tools for Monitoring and Observability (2026)

AI observability platforms, LLM evaluation tools, uptime monitoring, and agent debugging dashboards. See what is happening inside your AI systems.

30 tools

LangSmith — Prompt Debugging and LLM Observability

Debug, test, and monitor LLM applications in production. LangSmith provides trace visualization, prompt playground, dataset evaluation, and regression testing for AI.

Prompt Lab 259Prompts

Opik — Debug, Evaluate & Monitor LLM Apps

Trace LLM calls, run automated evaluations, and monitor RAG and agent quality in production. By Comet. 18K+ GitHub stars.

AI Open Source 249Skills

Grafana — Open Source Data Visualization & Observability

Grafana is the leading open-source platform for monitoring and observability. Visualize metrics, logs, and traces from Prometheus, Loki, Elasticsearch, and 100+ data sources.

Grafana Labs 240Skills

Arize Phoenix — Open Source AI Observability and Evaluation

Arize Phoenix is an open-source platform for monitoring, evaluating, and debugging AI applications, providing tracing, experiment tracking, and automated evaluation for LLM and ML pipelines.

Script Depot 164Skills

Coze Loop — Agent Prompt, Eval, and Observability Hub

Coze Loop unifies prompt iteration, evaluation, and trace observability, helping agent teams debug workflows without jumping across separate tools.

Agent Toolkit 112Prompts

Gemini CLI Extension: Observability — Monitoring & Logs

Gemini CLI extension for Google Cloud observability. Set up monitoring, analyze logs, create dashboards, and configure alerts.

Google · Gemini Team 270Skills

Langfuse — Open Source LLM Observability

Langfuse is an open-source LLM engineering platform for tracing, prompt management, evaluation, and debugging AI apps. 24.1K+ GitHub stars. Self-hosted or cloud. MIT.

Langfuse 247Skills

TensorZero — Open-Source LLMOps Platform in Rust

TensorZero is an open-source LLMOps platform that unifies an LLM gateway, observability, evaluation, optimization, and experimentation into a single performant system written in Rust.

Script Depot 235Skills

SigNoz — Open Source APM & Observability Platform

SigNoz is an open-source Datadog/New Relic alternative with logs, traces, and metrics in one platform. Native OpenTelemetry support, ClickHouse backend, and powerful dashboards.

AI Open Source 232Skills

Sentry — Open Source Error Tracking & Performance Monitoring

Sentry is the developer-first error tracking and performance monitoring platform. Capture exceptions, trace performance issues, and debug production errors across all languages.

AI Open Source 221Skills

Sentry MCP — Error Monitoring Server for AI Agents

MCP server that connects AI agents to Sentry for real-time error monitoring. Query issues, analyze stack traces, track regressions, and resolve bugs with full crash context. 2,000+ stars.

MCP Hub 220MCP Configs

Phoenix — Open Source AI Observability

Phoenix is an AI observability platform for tracing, evaluating, and debugging LLM apps. 9.1K+ stars. OpenTelemetry, evals, prompt management.

Arize AI 220Skills

AgentOps — Observability Dashboard for AI Agents

Python SDK for monitoring AI agent sessions with real-time dashboards, token tracking, cost analysis, and error replay. Two lines of code to instrument any framework. 4,500+ GitHub stars.

Agent Toolkit 218Skills

Netdata — Real-Time Infrastructure Monitoring & Observability

Netdata is an open-source monitoring agent that collects thousands of metrics per second with zero configuration. Beautiful dashboards, ML-powered alerts, and instant deployment.

Script Depot 207Skills

Pixie — eBPF-Based Auto-Instrumentation for Kubernetes Observability

CNCF observability platform that uses eBPF to capture metrics, traces, and logs from every pod with zero code changes.

AI Open Source 205Skills

OpenLIT — OpenTelemetry LLM Observability

Monitor LLM costs, latency, and quality with OpenTelemetry-native tracing. GPU monitoring and guardrails built in. 2.3K+ stars.

AI Open Source 201Skills

AgentOps — Observability for AI Agents

Python SDK for AI agent monitoring. LLM cost tracking, session replay, benchmarking, and error analysis. Integrates with CrewAI, LangChain, AutoGen, and more. 5.4K+ stars.

Script Depot 201Skills

OpenObserve — Rust-Based Petabyte-Scale Observability Platform

All-in-one Rust observability platform that ingests logs, metrics, traces and RUM into Parquet on object storage for 140x cheaper retention.

AI Open Source 199Skills

Langtrace — Open Source AI Observability Platform

Open-source observability for LLM apps. Trace OpenAI, Anthropic, and LangChain calls with OpenTelemetry-native instrumentation and a real-time dashboard.

AI Open Source 198Skills

Vector — High-Performance Observability Data Pipeline

Vector collects, transforms, and routes logs, metrics, and traces from any source to any destination. Written in Rust, it handles 100x more throughput than Logstash/Fluentd on the same hardware with a unified config language.

AI Open Source 196Skills

HyperDX — Open Source Full-Stack Observability Platform

A self-hosted observability platform that unifies logs, metrics, traces, and session replays in one interface powered by ClickHouse and OpenTelemetry.

Script Depot 189Skills

Coroot — Open Source Observability with AI Root Cause Analysis

Coroot is a self-hosted observability and APM tool that combines metrics, logs, traces, and continuous profiling with eBPF-based auto-instrumentation and AI-powered root cause analysis in predefined dashboards.

AI Open Source 183Skills

Evidently — ML & LLM Monitoring with 100+ Metrics

Evaluate, test, and monitor AI systems with 100+ built-in metrics for data drift, model quality, and LLM output. 7.3K+ stars.

AI Open Source 179Skills

PostHog LLM Observability — Track AI Agents in Production

PostHog LLM Observability traces every LLM call from your app — model, latency, cost, errors. Auto-detects via SDK wrapper. Free up to 100K events/month.

PostHog 156Knowledge

SigNoz MCP Server — Query Traces, Logs & Alerts

SigNoz MCP Server connects MCP clients to your SigNoz instance: query traces/logs, inspect alerts, and automate observability workflows using an API key.

MCP Hub 145MCP Configs

Datadog LLM Observability — Trace Cost, Latency, Drift

Datadog LLM Observability traces OpenAI / Anthropic / Bedrock calls, tracks per-user cost, surfaces drift. Dashboards and span-level prompt view.

Datadog 131Knowledge

DeepFlow — eBPF Observability for Cloud & AI

DeepFlow offers zero-code eBPF observability for Kubernetes/VMs—flows, metrics, traces, profiling—with OpenTelemetry support and a Docker Compose deploy.

Script Depot 130Skills

Judgeval — Tracing + Evaluation for Agent Apps

Judgeval adds tracing and evaluation to agent apps, helping teams score behavior and monitor live traffic with a small SDK and dashboard workflow.

Agent Toolkit 129Skills

Highlight.io — Open Source Full-Stack Application Monitoring

A self-hostable observability platform that combines session replay, error monitoring, log management, and tracing in one tool with OpenTelemetry-native ingestion.

Script Depot 124Skills

Nightingale — Cloud-Native Monitoring and Alerting Platform

An open-source observability platform that complements Grafana with alerting, dashboards, and metric management.

AI Open Source 124Skills

AI Observability

As AI moves from prototypes to production, observability becomes critical. You need to know what your AI is doing, why it made a decision, how much it costs, and when it fails.

LLM Observability: Opik, Langfuse, and AgentOps provide tracing, logging, and analytics for LLM applications. See every prompt, completion, tool call, and token cost in a unified dashboard.
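To make the idea of per-call tracing concrete, here is a minimal sketch in plain Python. It is not the API of Opik, Langfuse, or AgentOps; the `LLMCallRecord` class, the `traced_call` helper, the `fake_llm` stub, and the prices in `PRICE_PER_1K` are all hypothetical names and values chosen for illustration.

```python
import time
from dataclasses import dataclass, field

# Hypothetical per-1K-token prices in USD; real prices vary by provider and model.
PRICE_PER_1K = {"example-model": {"input": 0.50, "output": 1.50}}


@dataclass
class LLMCallRecord:
    """One traced LLM call: what was sent, what came back, and what it cost."""
    model: str
    prompt: str
    completion: str
    input_tokens: int
    output_tokens: int
    latency_s: float
    tool_calls: list = field(default_factory=list)

    @property
    def cost_usd(self) -> float:
        price = PRICE_PER_1K[self.model]
        return (self.input_tokens * price["input"]
                + self.output_tokens * price["output"]) / 1000


def traced_call(model, prompt, call_fn, log):
    """Run call_fn(prompt), time it, and append an LLMCallRecord to log.

    call_fn stands in for a real client and must return
    (completion_text, input_tokens, output_tokens).
    """
    start = time.perf_counter()
    completion, tokens_in, tokens_out = call_fn(prompt)
    latency = time.perf_counter() - start
    log.append(LLMCallRecord(model, prompt, completion, tokens_in, tokens_out, latency))
    return completion


def fake_llm(prompt):
    # Stub that pretends to be a model so the sketch runs offline.
    return "ok", 200, 100


trace_log = []
traced_call("example-model", "Summarize this ticket.", fake_llm, trace_log)
total_cost = sum(record.cost_usd for record in trace_log)
```

A dashboard is then, roughly, a view over many such records: filter by model, sum `cost_usd`, and chart `latency_s` over time.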

Agent Debugging: Multi-step AI agents are hard to debug. Observability tools capture the full execution trace (every reasoning step, tool invocation, and decision point) so you can replay and diagnose failures.

Evaluation Frameworks: DeepEval, Ragas, and custom eval pipelines measure AI quality systematically. Track accuracy, hallucination rates, latency, and cost across model versions.
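As a rough illustration of tracking quality across model versions, the sketch below summarizes a handful of eval results and lists the metrics that got worse. It uses only the standard library, not DeepEval or Ragas; the `results` rows, the field names, and the `summarize` and `regressions` helpers are made up for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical eval results: one row per test case per model version.
results = [
    {"version": "v1", "correct": True,  "hallucinated": False, "latency_s": 0.8},
    {"version": "v1", "correct": True,  "hallucinated": False, "latency_s": 1.0},
    {"version": "v2", "correct": True,  "hallucinated": True,  "latency_s": 0.5},
    {"version": "v2", "correct": False, "hallucinated": False, "latency_s": 0.7},
]


def summarize(rows):
    """Group eval rows by model version and compute average metrics."""
    by_version = defaultdict(list)
    for row in rows:
        by_version[row["version"]].append(row)
    return {
        version: {
            "accuracy": mean([int(r["correct"]) for r in group]),
            "hallucination_rate": mean([int(r["hallucinated"]) for r in group]),
            "avg_latency_s": mean([r["latency_s"] for r in group]),
        }
        for version, group in by_version.items()
    }


def regressions(summary, baseline, candidate):
    """List quality metrics where the candidate version is worse than the baseline."""
    worse = []
    if summary[candidate]["accuracy"] < summary[baseline]["accuracy"]:
        worse.append("accuracy")
    if summary[candidate]["hallucination_rate"] > summary[baseline]["hallucination_rate"]:
        worse.append("hallucination_rate")
    return worse


summary = summarize(results)
worse_in_v2 = regressions(summary, baseline="v1", candidate="v2")
```

In this toy data the newer version is faster but less accurate, which is the kind of trade-off a single latency chart would hide.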

Infrastructure Monitoring: Uptime Kuma and Grafana integrations monitor your AI endpoints, alert on degradation, and track SLAs. This is essential for production AI services, where downtime or quality drops directly affect users.
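The SLA tracking described here boils down to simple arithmetic over health-check results. The sketch below assumes a hypothetical `Probe` record and illustrative thresholds (99% availability, 2 seconds); it does not reflect how Uptime Kuma or Grafana implement their checks.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    ok: bool          # did the endpoint answer successfully?
    latency_s: float  # how long the check took


def availability(probes):
    """Fraction of probes that succeeded (0.0 to 1.0)."""
    return sum(p.ok for p in probes) / len(probes)


def check_sla(probes, min_availability=0.99, max_latency_s=2.0):
    """Return a list of human-readable alerts; an empty list means healthy."""
    alerts = []
    avail = availability(probes)
    if avail < min_availability:
        alerts.append(f"availability {avail:.1%} below target {min_availability:.1%}")
    slow = [p for p in probes if p.ok and p.latency_s > max_latency_s]
    if slow:
        alerts.append(f"{len(slow)} successful probe(s) slower than {max_latency_s}s")
    return alerts


# 100 synthetic probes: 97 healthy, 2 failed, 1 successful but slow.
probes = [Probe(True, 0.4)] * 97 + [Probe(False, 5.0)] * 2 + [Probe(True, 3.1)]
alerts = check_sla(probes)
```

Counting slow-but-successful responses separately matters for AI endpoints, where a request can return a valid answer and still be too slow to be useful.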

You can't improve what you can't measure — and AI systems are notoriously hard to measure.

Frequently Asked Questions

What is AI observability?

AI observability is the practice of monitoring, tracing, and analyzing the behavior of AI systems in production. It goes beyond traditional monitoring (is the server up?) to answer AI-specific questions: Is the model hallucinating? Are responses getting slower? Which prompts produce the best results? How much does each query cost? Tools such as Opik and AgentOps offer dashboards that answer these questions in real time.

How do I debug AI agent failures?

Use tracing tools that capture the agent's entire execution: every LLM call, tool invocation, memory access, and decision point. AgentOps and Langfuse visualize these traces as timelines, letting you pinpoint exactly where the agent went off course. For intermittent failures, set up automated evaluation that flags quality drops before users report them.
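As a simplified picture of what such a timeline contains, the sketch below models an agent run as a list of timed steps and finds the earliest one that failed. The `Step` fields, the step names, and the helper functions are hypothetical; real tools record far richer data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    kind: str                    # e.g. "llm_call", "tool_call", "decision"
    name: str
    start_s: float               # offset from the start of the run
    duration_s: float
    error: Optional[str] = None  # None means the step succeeded


def first_failure(steps):
    """Return the earliest step that recorded an error, or None."""
    failed = [s for s in steps if s.error is not None]
    return min(failed, key=lambda s: s.start_s) if failed else None


def render_timeline(steps):
    """Format steps in chronological order, one line per step."""
    lines = []
    for s in sorted(steps, key=lambda s: s.start_s):
        status = f"ERROR: {s.error}" if s.error else "ok"
        lines.append(f"{s.start_s:6.2f}s  {s.kind:<10} {s.name:<16} {status}")
    return "\n".join(lines)


run = [
    Step("llm_call", "plan", 0.0, 1.2),
    Step("tool_call", "search_orders", 1.2, 0.3, error="timeout"),
    Step("decision", "retry_or_abort", 1.5, 0.1),
    Step("llm_call", "final_answer", 1.6, 0.9, error="empty tool result"),
]
culprit = first_failure(run)
print(render_timeline(run))
```

Here the final answer fails, but the earliest error is the tool timeout two steps before it, which is usually the more useful place to start looking.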

What metrics should I track for LLM applications?

Essential metrics: latency (time to first token, total response time), cost (tokens per request, cost per user), quality (eval scores, hallucination rate, user feedback), and reliability (error rate, timeout rate, retry rate). Advanced: track these metrics per prompt template, per model version, and per user segment to identify regressions quickly.
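One possible way to slice a few of these metrics per prompt template and model version is sketched below. The request log format, the field names, and the nearest-rank `percentile` helper are assumptions for illustration, not the schema of any tool listed on this page.

```python
import math
from collections import defaultdict


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def report(rows):
    """Aggregate latency, token, and reliability metrics per (template, model)."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["template"], row["model"])].append(row)
    out = {}
    for key, group in groups.items():
        n = len(group)
        latencies = [r["total_ms"] for r in group]
        out[key] = {
            "p50_total_ms": percentile(latencies, 50),
            "p95_total_ms": percentile(latencies, 95),
            "avg_tokens": sum(r["tokens"] for r in group) / n,
            "error_rate": sum(r["status"] == "error" for r in group) / n,
            "timeout_rate": sum(r["status"] == "timeout" for r in group) / n,
        }
    return out


# Hypothetical request log with one row per request.
rows = [
    {"template": "summarize", "model": "v1", "total_ms": 900,  "tokens": 400, "status": "ok"},
    {"template": "summarize", "model": "v1", "total_ms": 1100, "tokens": 600, "status": "ok"},
    {"template": "summarize", "model": "v2", "total_ms": 700,  "tokens": 500, "status": "ok"},
    {"template": "summarize", "model": "v2", "total_ms": 4000, "tokens": 300, "status": "timeout"},
]
metrics = report(rows)
```

Keying the report on `(template, model)` is what lets a regression show up as a change in one row rather than a small shift in a global average.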

Explore related categories