Knowledge · May 8, 2026 · 4 min read

Datadog LLM Observability — Trace Cost, Latency, Drift

Datadog LLM Observability traces OpenAI / Anthropic / Bedrock calls, tracks per-user cost, and surfaces drift. Includes dashboards and a span-level prompt view.

Datadog
Datadog · Community
Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes the CLI command, JSON metadata, install plan, and raw content to help agents judge fit, risk, and next actions.

Stage only · 15/100
Target
Claude Code, Codex, Gemini CLI
Type
Knowledge
Installation
Stage only
Trust
Trust: New
Entry point
Asset
Install CLI command
npx tokrepo install 563a1aea-506a-4568-a5e6-583c28d5f242 --target codex
Introduction

Datadog LLM Observability (formerly LLM Monitoring) is a turn-key tracing layer for AI apps that already live in Datadog. Drop the ddtrace SDK in and every OpenAI / Anthropic / Bedrock / LangChain call generates a span with prompt, completion, cost, latency, model name, user, and session ID. Built-in dashboards cover top-cost users, p95 latency by model, error rate, and drift detection. Best for: teams with Datadog APM/logs already wired into their product; enterprise security reviews where prompt logging needs central retention. Works with: Python ddtrace, Node dd-trace, or an OpenTelemetry exporter for any language. Setup time: 10 minutes.


Python install

pip install ddtrace
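
If you'd rather not touch application code, ddtrace also ships the ddtrace-run launcher, which applies the same instrumentation from the outside. A sketch, assuming the same environment variables as in the next snippet:

DD_LLMOBS_ENABLED=1 DD_LLMOBS_ML_APP=my-rag-app \
DD_API_KEY=... DD_SITE=datadoghq.com \
ddtrace-run python app.py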

Auto-instrument OpenAI

import os

# ddtrace reads these at import time, so set them before importing it
os.environ["DD_LLMOBS_ENABLED"] = "1"
os.environ["DD_LLMOBS_ML_APP"]  = "my-rag-app"
os.environ["DD_API_KEY"]        = "..."
os.environ["DD_SITE"]           = "datadoghq.com"

from ddtrace import patch

patch(openai=True)  # auto-instrument the openai package

# Now use OpenAI normally — every call gets traced
from openai import OpenAI
OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain BPE tokenization"}],
)

Tag traces with user / session

from ddtrace.llmobs import LLMObs

with LLMObs.workflow(name="support_chat", session_id=session_id, user_id=user_id):
    # All LLM calls inside this block carry the session_id and user_id tags
    answer = run_my_rag_pipeline(question)
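
Arbitrary tags can also be attached to the active span with LLMObs.annotate; a minimal sketch (the tag keys here are examples, not required names):

from ddtrace.llmobs import LLMObs

with LLMObs.workflow(name="support_chat", session_id=session_id):
    answer = run_my_rag_pipeline(question)
    # Tag the workflow span with app-specific context
    LLMObs.annotate(tags={"plan_tier": "pro", "region": "eu-west-1"})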

Custom span for non-instrumented call

from ddtrace.llmobs.decorators import llm

@llm(name="custom-call", model_name="gpt-4o", model_provider="openai")
def call_my_proxy(prompt):
    return my_internal_proxy.complete(prompt)
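
The decorator opens the span but does not record the prompt or response on its own; a minimal sketch of attaching them with LLMObs.annotate (the message shapes are illustrative):

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

@llm(name="custom-call", model_name="gpt-4o", model_provider="openai")
def call_my_proxy(prompt):
    response = my_internal_proxy.complete(prompt)
    # Record prompt and completion on the active LLM span
    LLMObs.annotate(
        input_data=[{"role": "user", "content": prompt}],
        output_data=[{"role": "assistant", "content": response}],
    )
    return response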

Built-in views (LLM Observability tab)

  • Traces — every call with prompt, completion, cost, latency
  • Topology — agent graph showing tools called per request
  • Quality — eval scores attached to spans (hallucination, toxicity)
  • Cost — by user / model / session, top spenders
  • Drift — input topic distribution shift over time
  • Errors — rate, by model, by application

OpenTelemetry alternative

If you don't want ddtrace, send OTLP traces to Datadog with the OpenInference semantic conventions — Datadog renders them in the same LLM Observability views.
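
A minimal sketch of that path with the standard OpenTelemetry Python SDK, assuming a Datadog Agent accepting OTLP/HTTP on localhost:4318 and OpenInference-style span attributes (the attribute keys follow the OpenInference spec; consult it for the full set):

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Export OTLP traces to the local Datadog Agent
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4318/v1/traces"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-rag-app")

with tracer.start_as_current_span("llm-call") as span:
    # OpenInference semantic-convention attributes (illustrative subset)
    span.set_attribute("openinference.span.kind", "LLM")
    span.set_attribute("llm.model_name", "gpt-4o")
    span.set_attribute("input.value", "Explain BPE tokenization")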


FAQ

Q: How does pricing work? A: LLM Observability is billed per million spans, at a few cents per million. Existing Datadog APM customers can reuse the same Agent infrastructure. The first 100M spans/month are typically included in Pro plans.

Q: Will prompts and completions be stored long-term? A: By default yes, with configurable retention (15 / 30 / 90 days). For PII-sensitive prompts, enable scrubbing rules at SDK level (DD_LLMOBS_SAMPLE_RATE + custom redactor) so PII is masked before it leaves the host.
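
An illustrative app-level complement to those scrubbing rules: mask obvious PII patterns before a prompt is sent, so the traced span only ever carries the scrubbed text. The regexes below are deliberately simple examples, not a complete PII strategy:

import re

# Example patterns only; real deployments need a proper PII library
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def scrub(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

prompt = scrub(user_input)  # spans now carry the masked prompt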

Q: Datadog vs Phoenix vs Langfuse? A: Datadog wins if your stack already lives there — same dashboards, alerts, on-call workflows. Phoenix wins for OTel-native portability and free self-host. Langfuse wins for prompt management + cheap self-host.


Quick Use

  1. pip install ddtrace
  2. Set DD_LLMOBS_ENABLED=1, DD_LLMOBS_ML_APP, DD_API_KEY, DD_SITE
  3. patch(openai=True) — every call now traces to Datadog

Source & Thanks

Built by Datadog. Docs at docs.datadoghq.com/llm_observability.

DataDog/dd-trace-py — ⭐ 700+
