Scripts · May 8, 2026 · 4 min read

Datadog APM Auto-Instrumentation for LangChain Pipelines

ddtrace auto-instruments LangChain chains, agents, tools — every step gets a span, parent-child preserved, latency and tokens recorded.

Agent ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, install contract, metadata JSON, adapter-aware plan, and raw content links so agents can judge fit, risk, and next actions.

Stage only · 17/100
Agent surface: Any MCP/CLI agent
Kind: Skill
Install: Stage only
Trust: New
Entrypoint: Asset

Universal CLI install command:
npx tokrepo install 842d84f8-86e6-408e-ae4f-21cee48dba1c
Intro

Datadog's ddtrace SDK auto-instruments LangChain — every chain run, agent step, retriever call, and tool execution becomes a span in your service flame graph with proper parent-child relationships. You see exactly which retrieval step took 800ms, which tool returned an error, which prompt template hit the model. Best for: LangChain or LlamaIndex pipelines you can't easily decompose; debugging slow agents; surfacing the long tail of agent failures. Works with: ddtrace ≥ 2.10 patched against LangChain ≥ 0.1, LlamaIndex ≥ 0.10. Setup time: 5 minutes.


Enable LangChain instrumentation

import os

# Configure LLM Observability before patching so ddtrace reads the settings at startup
os.environ["DD_LLMOBS_ENABLED"] = "1"
os.environ["DD_LLMOBS_ML_APP"] = "my-langchain-rag"
os.environ["DD_API_KEY"] = "..."

from ddtrace import patch_all
patch_all(langchain=True)

# Now LangChain runs are auto-traced
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "{question}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o")
chain.invoke({"question": "Explain BERT in 50 words"})

Multi-step agent (RAG + tools)

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_community.tools import TavilySearchResults
from langchain_core.prompts import ChatPromptTemplate

# Tool-calling agents require an agent_scratchpad placeholder in the prompt
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "{question}"),
    ("placeholder", "{agent_scratchpad}"),
])

tools = [TavilySearchResults(max_results=3)]
agent = create_tool_calling_agent(ChatOpenAI(model="gpt-4o"), tools, agent_prompt)
executor = AgentExecutor(agent=agent, tools=tools)
executor.invoke({"question": "What's the latest GPT release?"})
# Datadog flame graph shows: agent → tool(tavily_search) → llm(gpt-4o) → llm(gpt-4o, final)

Span hierarchy in Datadog

agent.run               (1.8s, total)
├─ retrieve.documents   (320ms)
├─ tool.tavily_search   (640ms)
└─ llm.openai           (820ms, 1247 tokens, $0.012)
    ├─ prompt.template  (12ms)
    └─ http.request     (798ms)

Attributes captured per span

  • langchain.request.type — chain | agent | retriever | tool | llm
  • langchain.request.model_name — gpt-4o, claude-3-5-sonnet, etc.
  • langchain.tokens.prompt, langchain.tokens.completion
  • langchain.cost.usd
  • error.type, error.message if the step failed
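
If you want attributes of your own alongside these, say a session or tenant ID, you can tag the span yourself. A minimal sketch using ddtrace's tracer API; the span name and tag names here are illustrative, not Datadog conventions:

from ddtrace import tracer

def traced_answer(question: str, session_id: str):
    # Open a parent span so the chain's auto-spans nest under it, then tag it
    with tracer.trace("rag.request") as span:
        span.set_tag("session_id", session_id)  # illustrative tag name
        return chain.invoke({"question": question})  # `chain` from the first example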

Combine with logs and metrics

# datadog.yaml — enable log collection on the Agent
logs_enabled: true

# In the instrumented app, turn on trace-ID injection:
# export DD_LOGS_INJECTION=true

Now any log line emitted during a chain step joins the trace in the LLM Observability view — search "session_id:abc-123" and see logs + spans in one timeline.
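
On the Python side the injection shows up as dd.trace_id and dd.span_id attributes on every stdlib log record once DD_LOGS_INJECTION is on, so a plain formatter can surface them. A minimal sketch:

import logging

# With DD_LOGS_INJECTION=true, ddtrace adds dd.trace_id / dd.span_id
# attributes to log records; reference them like any other record field
FORMAT = "%(asctime)s %(levelname)s [dd.trace_id=%(dd.trace_id)s dd.span_id=%(dd.span_id)s] %(message)s"
logging.basicConfig(format=FORMAT)

logging.getLogger(__name__).info("retrieved 5 documents")  # correlates to the active trace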


FAQ

Q: Does LangGraph work too? A: Yes — ddtrace ≥ 2.18 instruments LangGraph node executions. Each graph node becomes a span; the supergraph run is the parent. Cycle detection keeps repeated nodes distinct.
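
A minimal graph to see those node spans, assuming langgraph is installed and ddtrace is patched as above (the node name and state fields are illustrative):

from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    answer: str

def retrieve(state: State) -> dict:
    # Each node execution becomes its own span under the graph-run parent
    return {"answer": f"stub answer for {state['question']}"}

g = StateGraph(State)
g.add_node("retrieve", retrieve)
g.set_entry_point("retrieve")
g.add_edge("retrieve", END)
g.compile().invoke({"question": "What is BERT?", "answer": ""})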

Q: What if I use LangServe? A: LangServe runs over FastAPI; ddtrace's patch(fastapi=True) plus patch(langchain=True) gives you HTTP request → chain run → LLM call as one continuous trace. Apply both patches together at startup: patch(fastapi=True, langchain=True).
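
A minimal wiring sketch, assuming langserve is installed and reusing the `chain` from the first example:

from ddtrace import patch
patch(fastapi=True, langchain=True)  # patch before constructing the app

from fastapi import FastAPI
from langserve import add_routes

app = FastAPI()
add_routes(app, chain, path="/rag")  # `chain` from the first example
# A POST to /rag/invoke now traces HTTP request → chain run → LLM call end to end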

Q: Performance overhead? A: Tiny — ddtrace's hooks add <1% latency on tested LangChain workloads. The exporter batches and ships async. Disable on hot paths only if you hit measured regressions.
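
If a hot path does show a measured regression, sampling is usually a better lever than disabling the integration. A sketch using ddtrace's standard head-sampling knob:

import os

# Keep the integration on but ship ~10% of traces; set before ddtrace initializes
os.environ["DD_TRACE_SAMPLE_RATE"] = "0.1"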


Quick Use

  1. pip install "ddtrace>=2.10"
  2. Call patch_all(langchain=True) before importing LangChain
  3. Set DD_LLMOBS_ENABLED=1, DD_LLMOBS_ML_APP, and DD_API_KEY
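
Alternatively, skip the in-code patch and wrap the process with ddtrace's CLI runner; a sketch assuming your entrypoint is app.py:

DD_LLMOBS_ENABLED=1 DD_LLMOBS_ML_APP=my-langchain-rag DD_API_KEY=... ddtrace-run python app.py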


Source & Thanks

Built by Datadog. LangChain integration in DataDog/dd-trace-py.

Apache-2.0 + Datadog API ToS

