What is Datadog LLM Observability — Trace Cost, Latency, Drift?

Datadog LLM Observability traces OpenAI, Anthropic, and Bedrock calls, tracks cost per user, and surfaces drift. It includes dashboards and a span-level view of each prompt.

Is Datadog LLM Observability — Trace Cost, Latency, Drift free to use?

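The TokRepo listing is free to view and copy, and the ddtrace SDK it relies on is open source. LLM Observability itself is a paid Datadog product that is billed by span volume (see the billing question in the FAQ). Check the Source & Thanks section on the asset page for the specific open-source license.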

How do I install Datadog LLM Observability — Trace Cost, Latency, Drift?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Datadog LLM Observability — Trace Cost, Latency, Drift

Name: Datadog LLM Observability — Trace Cost, Latency, Drift
Author: Datadog

Overview

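Datadog LLM Observability (formerly LLM Monitoring) is an out-of-the-box tracing layer for AI applications whose telemetry already lives in Datadog. Once you install the ddtrace SDK, every OpenAI, Anthropic, Bedrock, or LangChain call produces a span that records the prompt, completion, cost, latency, model name, user, and session ID. Built-in dashboards show the highest-cost users, p95 latency by model, error rate, and drift detection. The product suits teams whose product is already wired into Datadog APM and logs, and enterprises whose security and compliance rules require prompt logs to be retained centrally. It works with Python ddtrace, Node dd-trace, and OpenTelemetry exporters in any language. Setup takes about 10 minutes.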
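The dashboard metrics named above are ordinary aggregations over span fields. The following stdlib-only sketch shows what "p95 latency by model", "highest-cost users", and "error rate" compute over a batch of spans. `LLMSpan` is a hypothetical record type that mirrors the listed fields. It is not ddtrace's span class, and the sketch does not claim to reproduce how Datadog computes its dashboards internally. The percentile uses the nearest-rank method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class LLMSpan:
    # Hypothetical record mirroring the fields listed above (not a ddtrace type).
    model: str
    user_id: str
    session_id: str
    latency_ms: float
    cost_usd: float
    error: bool = False


def p95(values):
    """95th percentile using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def p95_latency_by_model(spans):
    by_model = defaultdict(list)
    for s in spans:
        by_model[s.model].append(s.latency_ms)
    return {model: p95(latencies) for model, latencies in by_model.items()}


def top_cost_users(spans, n=3):
    totals = defaultdict(float)
    for s in spans:
        totals[s.user_id] += s.cost_usd
    # Highest total spend first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def error_rate_by_model(spans):
    calls, errors = defaultdict(int), defaultdict(int)
    for s in spans:
        calls[s.model] += 1
        errors[s.model] += int(s.error)
    return {model: errors[model] / calls[model] for model in calls}
```

The sketch recomputes everything from raw spans on each call. A real backend would pre-aggregate over time windows, but the definitions are the same.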

Python installation

pip install ddtrace openai

Auto-instrumenting OpenAI

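import os

from ddtrace import patch
from ddtrace.llmobs import LLMObs

patch(openai=True)  # instrument the openai client library

# Enable LLM Observability in code before the first call. Setting DD_LLMOBS_* env
# vars after ddtrace is already loaded is not picked up; use ddtrace-run for the env-var route.
LLMObs.enable(
    ml_app="my-rag-app",
    api_key=os.environ["DD_API_KEY"],
    site="datadoghq.com",
    agentless_enabled=True,  # send directly to Datadog without a local Datadog Agent
)

# From here on, use OpenAI as usual: every call is traced
from openai import OpenAI
OpenAI().chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Explain BPE tokenization"}],
)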
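Each traced call carries a cost, which is usually derived from token usage multiplied by the per-token price. OpenAI chat completion responses expose `response.usage.prompt_tokens` and `response.usage.completion_tokens`. The sketch below shows only that arithmetic. Datadog's own cost derivation is not described on this page. `PRICE_PER_MTOK` and `estimate_cost_usd` are hypothetical names, and the prices are illustrative placeholders that you should replace with your provider's current list prices.

```python
# Illustrative USD prices per 1M tokens (placeholders, not authoritative list prices).
PRICE_PER_MTOK = {
    "gpt-4o": {"input": 2.50, "output": 10.00},
}


def estimate_cost_usd(model, prompt_tokens, completion_tokens, prices=PRICE_PER_MTOK):
    """Cost of one call: tokens in each direction times that direction's price."""
    price = prices[model]
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1_000_000
```

Usage with a real response would look like `estimate_cost_usd("gpt-4o", resp.usage.prompt_tokens, resp.usage.completion_tokens)`. Input and output tokens are priced separately because most providers charge more for output.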

Tagging traces with user and session

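from ddtrace.llmobs import LLMObs

# session_id, user_id, question and run_my_rag_pipeline are your own (hypothetical) names
with LLMObs.workflow(name="support_chat", session_id=session_id) as span:
    # session_id propagates to the LLM spans created inside this block;
    # user_id is attached as a tag on the workflow span
    LLMObs.annotate(span=span, tags={"user_id": user_id})
    answer = run_my_rag_pipeline(question)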
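Session and user tags make it possible to roll individual calls up into conversations. The stdlib-only sketch below shows the grouping that per-session views rely on. It does not reproduce Datadog's implementation. The dict keys `session_id`, `user_id`, and `cost_usd` are hypothetical field names used for illustration.

```python
from collections import defaultdict


def rollup_by_session(spans):
    """Group tagged spans into per-session summaries.

    Each span is a dict with hypothetical keys: session_id, user_id, cost_usd.
    """
    sessions = defaultdict(lambda: {"user_ids": set(), "calls": 0, "cost_usd": 0.0})
    for span in spans:
        summary = sessions[span["session_id"]]
        summary["user_ids"].add(span["user_id"])
        summary["calls"] += 1
        summary["cost_usd"] += span["cost_usd"]
    return dict(sessions)
```

Spans without a session tag would all collapse into one bucket (or fail on the missing key). That is why the tag is set once on the enclosing workflow, so it reaches every call inside it.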

Custom spans (for calls that are not auto-instrumented)

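from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

@llm(model_name="gpt-4o", model_provider="openai", name="custom-call")
def call_my_proxy(prompt):
    completion = my_internal_proxy.complete(prompt)  # your own proxy client
    LLMObs.annotate(input_data=prompt, output_data=completion)  # LLM spans need explicit I/O
    return completion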
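Conceptually, a span decorator times the wrapped call and records its name, model, input, output, and any error. The following toy stand-in uses only the stdlib to make that concrete. `toy_llm_span` and `RECORDED_SPANS` are hypothetical names, not ddtrace APIs. Instead of exporting anything, the decorator appends a dict to an in-memory list.

```python
import functools
import time

RECORDED_SPANS = []  # in-memory stand-in for a trace exporter


def toy_llm_span(name, model_name, model_provider):
    """Hypothetical decorator that records one span-like dict per call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(prompt):
            start = time.perf_counter()
            output, error = None, None
            try:
                output = fn(prompt)
                return output
            except Exception as exc:
                error = repr(exc)
                raise  # tracing must not swallow the caller's exception
            finally:
                # Runs on success and on failure, so errored calls are recorded too.
                RECORDED_SPANS.append({
                    "name": name,
                    "model_name": model_name,
                    "model_provider": model_provider,
                    "input": prompt,
                    "output": output,
                    "error": error,
                    "duration_ms": (time.perf_counter() - start) * 1000,
                })
        return wrapper
    return decorator
```

Recording the span in `finally` matters because failed calls are exactly the ones the Errors view needs to count.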

Built-in views (LLM Observability tab)

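Traces: every call, with its prompt, completion, cost, and latency
Topology: the agent graph, showing which tools each request called
Quality: eval scores (hallucination, toxicity) attached to spans
Cost: top spend by user, model, and session
Drift: how the distribution of input topics shifts over time
Errors: error rates, broken down by model and by application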
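A shift in topic distribution can be quantified by comparing a baseline window with a current window. The sketch below uses Jensen-Shannon divergence over topic frequencies. This is a generic technique chosen for illustration, not necessarily what Datadog's Drift view uses. It assumes each input has already been assigned a topic label by some classifier. The 0.1 threshold is an arbitrary placeholder.

```python
import math
from collections import Counter


def topic_distribution(labels):
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 = identical, 1 = disjoint topics."""
    topics = set(p) | set(q)
    m = {t: 0.5 * (p.get(t, 0.0) + q.get(t, 0.0)) for t in topics}

    def kl_to_m(dist):
        # KL(dist || m); m[t] > 0 wherever dist[t] > 0, so the log is defined.
        return sum(v * math.log2(v / m[t]) for t, v in dist.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def drifted(baseline_labels, current_labels, threshold=0.1):
    """Flag drift when the divergence exceeds a (hypothetical) threshold."""
    return js_divergence(topic_distribution(baseline_labels),
                         topic_distribution(current_labels)) > threshold
```

Jensen-Shannon is used here instead of plain KL divergence because it is symmetric, bounded in [0, 1] with base-2 logs, and stays finite when a topic appears in only one window.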

OpenTelemetry alternative

If you would rather not use ddtrace, push OTLP traces that follow the OpenInference semantic conventions to Datadog. They render in the same LLM Observability views.
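For the OpenTelemetry route, an LLM span carries its data as span attributes. The helper below builds an attribute dict using OpenInference attribute names (`openinference.span.kind`, `llm.model_name`, `input.value`, `output.value`, `llm.token_count.*`, `session.id`, `user.id`). With the OpenTelemetry Python API you would pass that dict to `span.set_attributes(...)`. The helper name is hypothetical. Whether Datadog maps every one of these keys into its views is an assumption you should verify against Datadog's documentation.

```python
def openinference_llm_attributes(model, prompt, completion,
                                 prompt_tokens, completion_tokens,
                                 session_id=None, user_id=None):
    """Hypothetical helper: OpenInference-style attributes for one LLM span."""
    attrs = {
        "openinference.span.kind": "LLM",
        "llm.model_name": model,
        "input.value": prompt,
        "output.value": completion,
        "llm.token_count.prompt": prompt_tokens,
        "llm.token_count.completion": completion_tokens,
        "llm.token_count.total": prompt_tokens + completion_tokens,
    }
    # Only emit session/user attributes when they are known.
    if session_id is not None:
        attrs["session.id"] = session_id
    if user_id is not None:
        attrs["user.id"] = user_id
    return attrs
```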

FAQ

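Q: How is it billed? A: LLM Observability is billed per million spans, at a few cents per million. Existing Datadog APM customers can reuse the same Agent infrastructure. Pro plans typically include the first 100 million spans each month.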
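Because billing scales with span volume, and because one request in a RAG or agent pipeline often emits several spans (workflow, retrieval, LLM), it helps to estimate volume before committing. The sketch below does that arithmetic. The per-million rate and the included quota are parameters you supply. The function name and defaults are hypothetical, and the example values are not Datadog's actual prices.

```python
def estimate_monthly_bill(requests_per_day, spans_per_request,
                          usd_per_million_spans, included_spans=0, days=30):
    """Return (total spans, billable cost in USD) for one month."""
    spans = requests_per_day * spans_per_request * days
    billable = max(0, spans - included_spans)  # spans beyond any included quota
    return spans, billable * usd_per_million_spans / 1_000_000
```

For example, 100,000 requests a day at 5 spans each is 15 million spans over 30 days. At a hypothetical $0.05 per million spans, that comes to about $0.75.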

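Q: Are prompts and completions stored long-term? A: Yes, by default, with configurable retention (15, 30, or 90 days). For prompts that contain PII, enable scrubbing at the SDK level so PII is masked before it leaves the host. Combine sampling (DD_LLMOBS_SAMPLE_RATE), which limits how much is sent, with a custom redactor, which does the actual masking.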
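A custom redactor can be as simple as a list of regex substitutions. The sketch below masks email addresses and 13 to 16 digit card-like numbers. `PII_PATTERNS` and `redact` are hypothetical names, and the patterns are illustrative only, so extend them for your own PII types. Where you call `redact` is an assumption that depends on your setup. One option is on text before passing it to `LLMObs.annotate(input_data=..., output_data=...)` on custom spans. Redacting a prompt before the model call also changes what the model sees.

```python
import re

# Illustrative patterns only; real PII detection needs broader coverage.
PII_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL]"),
    # 13-16 digits, optionally separated by spaces or dashes; starts and ends on a digit.
    (re.compile(r"\b\d(?:[ -]?\d){12,15}\b"), "[CARD]"),
]


def redact(text):
    """Replace every match of each PII pattern with its placeholder."""
    for pattern, placeholder in PII_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text
```

The card pattern deliberately begins and ends on a digit, so a separator next to the number is never swallowed into the replacement.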

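Q: Datadog vs Phoenix vs Langfuse? A: If your stack is already on Datadog, Datadog wins: same dashboards, alerting, and on-call workflows. Choose Phoenix for OTel-native portability and free self-hosting. Choose Langfuse for prompt management and low-cost self-hosting.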

Related assets

PostHog LLM Observability — Track AI Agents in Production

Helicone Sessions — Group LLM Calls by User Conversation

LiteLLM Cost Tracking — Per-Project LLM Spend Dashboard

Weave — Trace and Debug LLM Apps