Knowledge · May 8, 2026 · 4 min read

Datadog LLM Observability — Trace Cost, Latency, Drift

Datadog LLM Observability traces OpenAI / Anthropic / Bedrock calls, tracks per-user cost, surfaces drift. Dashboards and span-level prompt view.

Agent-ready

This asset can be read and installed directly by agents.

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and raw content so agents can evaluate compatibility, risk, and next steps.

Stage only · 15/100
Agent surface: Any MCP/CLI agent
Type: Knowledge
Installation: Stage only
Trust: New
Input: Asset

Universal CLI command
npx tokrepo install 563a1aea-506a-4568-a5e6-583c28d5f242
Introduction

Datadog LLM Observability (formerly LLM Monitoring) is a turnkey tracing layer for AI apps that already live in Datadog. Drop in the ddtrace SDK and every OpenAI / Anthropic / Bedrock / LangChain call generates a span with the prompt, completion, cost, latency, model name, user, and session ID. Built-in dashboards cover top-cost users, p95 latency by model, error rate, and drift detection. Best for: teams with Datadog APM/logs already wired into the product, and enterprise security reviews where prompt logging needs central retention. Works with: Python ddtrace, Node dd-trace, or an OpenTelemetry exporter for any language. Setup time: about 10 minutes.


Python install

pip install ddtrace

Auto-instrument OpenAI

import os

# Configure LLM Observability before patching so ddtrace picks up the env vars
os.environ["DD_LLMOBS_ENABLED"] = "1"
os.environ["DD_LLMOBS_ML_APP"]  = "my-rag-app"
os.environ["DD_API_KEY"]        = "..."
os.environ["DD_SITE"]           = "datadoghq.com"

from ddtrace import patch
patch(openai=True)  # auto-instruments the openai client library

# Now use OpenAI normally — every call gets traced
from openai import OpenAI
OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain BPE tokenization"}],
)

Tag traces with user / session

from ddtrace.llmobs import LLMObs

with LLMObs.workflow(name="support_chat", session_id=session_id, user_id=user_id):
    # All LLM calls inside this block carry the session_id and user_id tags
    answer = run_my_rag_pipeline(question)
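
Beyond IDs, arbitrary context can be attached to the active span with LLMObs.annotate. A minimal sketch, assuming the annotate() parameters documented for recent ddtrace releases; the plan_tier tag is illustrative:

from ddtrace.llmobs import LLMObs

# Inside the workflow block above: record I/O and custom tags on the current span
LLMObs.annotate(
    input_data=question,
    output_data=answer,
    tags={"plan_tier": "enterprise"},
)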

Custom span for non-instrumented call

@LLMObs.llm(name="custom-call", model_name="gpt-4o", model_provider="openai")
def call_my_proxy(prompt):
    return my_internal_proxy.complete(prompt)
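
The decorator creates the LLM span, but it knows nothing about your proxy's wire format, so attach the prompt and response yourself if you want them in the trace view. A hedged sketch using LLMObs.annotate as above (verify capture behavior against your ddtrace version):

from ddtrace.llmobs import LLMObs

@LLMObs.llm(name="custom-call", model_name="gpt-4o", model_provider="openai")
def call_my_proxy(prompt):
    response = my_internal_proxy.complete(prompt)
    LLMObs.annotate(input_data=prompt, output_data=response)  # surface I/O on the span
    return response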

Built-in views (LLM Observability tab)

  • Traces — every call with prompt, completion, cost, latency
  • Topology — agent graph showing tools called per request
  • Quality — eval scores attached to spans (hallucination, toxicity); a submission sketch follows this list
  • Cost — by user / model / session, top spenders
  • Drift — input topic distribution shift over time
  • Errors — rate, by model, by application
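
Scores in the Quality view can come from Datadog's managed evaluations or from your own pipeline. A minimal sketch of pushing a custom score, assuming the LLMObs.export_span / submit_evaluation API in recent ddtrace releases; the label and score_answer evaluator are ours, not part of the SDK:

from ddtrace.llmobs import LLMObs

span_ctx = LLMObs.export_span(span=None)  # None → the current active LLMObs span
LLMObs.submit_evaluation(
    span_context=span_ctx,
    label="faithfulness",        # hypothetical metric name
    metric_type="score",         # "score" (numeric) or "categorical"
    value=score_answer(answer),  # your own evaluator
)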

OpenTelemetry alternative

If you don't want ddtrace, send OTLP traces to Datadog with the OpenInference semantic conventions — Datadog renders them in the same LLM Observability views.
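
A minimal sketch of that path, assuming a local Datadog Agent accepting OTLP on the default gRPC port and the OpenInference attribute names (openinference.span.kind, llm.model_name, input.value, output.value); the tracer name and prompt are illustrative:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

# Export spans to the Datadog Agent's OTLP endpoint (default gRPC port 4317)
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-rag-app")

with tracer.start_as_current_span("chat") as span:
    span.set_attribute("openinference.span.kind", "LLM")
    span.set_attribute("llm.model_name", "gpt-4o")
    span.set_attribute("input.value", "Explain BPE tokenization")
    completion = "..."  # call your model here
    span.set_attribute("output.value", completion)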


FAQ

Q: How does pricing work? A: LLM Observability is billed per million spans — a few cents per million. Existing Datadog APM customers can reuse the same agent infra. The first 100M spans/month are typically included in Pro plans.

Q: Will prompts and completions be stored long-term? A: By default yes, with configurable retention (15 / 30 / 90 days). For PII-sensitive prompts, enable scrubbing rules at SDK level (DD_LLMOBS_SAMPLE_RATE + custom redactor) so PII is masked before it leaves the host.
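
The exact redaction hook varies by SDK version, so here is an illustrative app-level fallback that works regardless: mask PII in the prompt before it is sent, so the traced span only ever sees the masked text. The patterns and the redact helper are ours, not a ddtrace API:

import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN   = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    # Mask obvious PII before the text reaches the model (and the trace)
    return SSN.sub("[SSN]", EMAIL.sub("[EMAIL]", text))

prompt = redact(user_input)  # the span now carries only the masked prompt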

Q: Datadog vs Phoenix vs Langfuse? A: Datadog wins if your stack already lives there — same dashboards, alerts, on-call workflows. Phoenix wins for OTel-native portability and free self-host. Langfuse wins for prompt management + cheap self-host.


Quick Use

  1. pip install ddtrace
  2. Set DD_LLMOBS_ENABLED=1, DD_LLMOBS_ML_APP, DD_API_KEY, DD_SITE
  3. patch(openai=True) — every call now traces to Datadog


Source & Thanks

Built by Datadog. Docs at docs.datadoghq.com/llm_observability.

DataDog/dd-trace-py — ⭐ 700+

🙏
