Skills · May 8, 2026 · 4 min read

Datadog APM Auto-Instrumentation for LangChain Pipelines

ddtrace auto-instruments LangChain chains, agents, and tools: every step gets a span, parent-child relationships are preserved, and latency and token counts are recorded.

Safe staging command
npx -y tokrepo@latest install 842d84f8-86e6-408e-ae4f-21cee48dba1c --target codex

Stages files first; activation requires review of the staged README and plan.

Intro

Datadog's ddtrace SDK auto-instruments LangChain: every chain run, agent step, retriever call, and tool execution becomes a span in your service flame graph, with parent-child relationships preserved. You can see exactly which retrieval step took 800ms, which tool returned an error, and which prompt template reached the model.

Best for: LangChain or LlamaIndex pipelines you can't easily decompose; debugging slow agents; surfacing the long tail of agent failures.

Works with: ddtrace ≥ 2.10 patched against LangChain ≥ 0.1, LlamaIndex ≥ 0.10.

Setup time: 5 minutes.


Enable LangChain instrumentation

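import os

# Set DD_* settings before importing ddtrace so its configuration picks them up.
os.environ["DD_LLMOBS_ENABLED"] = "1"
os.environ["DD_LLMOBS_ML_APP"] = "my-langchain-rag"
os.environ["DD_API_KEY"] = "..."  # placeholder; prefer exporting this outside the code

from ddtrace import patch_all

# Patch LangChain before it is imported below.
patch_all(langchain=True)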

# Now LangChain runs are auto-traced
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "{question}"),
])
chain = prompt | ChatOpenAI(model="gpt-4o")
chain.invoke({"question": "Explain BERT in 50 words"})
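
Because the setup depends on three environment variables, a startup check can fail fast when one is missing. The sketch below uses only the standard library and the variable names from the snippet above; `missing_dd_settings` is a hypothetical helper for illustration, not part of ddtrace.

```python
import os
from typing import List, Mapping, Optional

# Variable names come from the setup snippet; the helper itself is hypothetical.
REQUIRED_DD_SETTINGS = ("DD_LLMOBS_ENABLED", "DD_LLMOBS_ML_APP", "DD_API_KEY")


def missing_dd_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required DD_* names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_DD_SETTINGS if not env.get(name, "").strip()]


missing = missing_dd_settings()
if missing:
    # Report rather than raise, so the application can decide how strict to be.
    print("Missing Datadog settings: " + ", ".join(missing))
```

Passing a plain dict instead of relying on `os.environ` keeps the check easy to exercise in unit tests.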

Multi-step agent (RAG + tools)

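from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_community.tools import TavilySearchResults

# ChatOpenAI and ChatPromptTemplate are imported in the previous block.
# create_tool_calling_agent requires an agent_scratchpad placeholder in the prompt.
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant"),
    ("user", "{question}"),
    ("placeholder", "{agent_scratchpad}"),
])
tools = [TavilySearchResults(max_results=3)]  # needs TAVILY_API_KEY in the environment
agent = create_tool_calling_agent(ChatOpenAI(model="gpt-4o"), tools, agent_prompt)
executor = AgentExecutor(agent=agent, tools=tools)
executor.invoke({"question": "What's the latest GPT release?"})
# Expected flame graph: agent → llm(gpt-4o, picks tool) → tool(tavily_search) → llm(gpt-4o, final)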

Span hierarchy in Datadog

agent.run               (1.8s, total)
├─ retrieve.documents   (320ms)
├─ tool.tavily_search   (640ms)
└─ llm.openai           (820ms, 1247 tokens, $0.012)
    ├─ prompt.template  (12ms)
    └─ http.request     (798ms)
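
To read a hierarchy like this one, it helps to separate a span's total duration from its self time (the part not covered by its children). The following is a minimal sketch using the durations from the example tree. The `Span` class and `slowest_leaf` function are illustrative only, not ddtrace types, and the arithmetic assumes children run one after another without overlapping.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Span:
    name: str
    duration_ms: int
    children: List["Span"] = field(default_factory=list)

    def self_time_ms(self) -> int:
        """Time spent in this span outside of its direct children."""
        return self.duration_ms - sum(c.duration_ms for c in self.children)


def slowest_leaf(span: Span) -> Span:
    """Walk the tree and return the leaf span with the longest duration."""
    if not span.children:
        return span
    return max((slowest_leaf(c) for c in span.children), key=lambda s: s.duration_ms)


# Durations copied from the example hierarchy above.
trace = Span("agent.run", 1800, [
    Span("retrieve.documents", 320),
    Span("tool.tavily_search", 640),
    Span("llm.openai", 820, [
        Span("prompt.template", 12),
        Span("http.request", 798),
    ]),
])

print(trace.self_time_ms())      # 1800 - (320 + 640 + 820) = 20
print(slowest_leaf(trace).name)  # http.request
```

In this example almost all of the agent's 1.8s is spent in its children, and the single largest leaf is the model's HTTP round trip.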

Attributes captured per span

  • langchain.request.type — chain | agent | retriever | tool | llm
  • langchain.request.model_name — gpt-4o, claude-3-5-sonnet, etc.
  • langchain.tokens.prompt, langchain.tokens.completion
  • langchain.cost.usd
  • error.type, error.message if the step failed
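
These attributes make it straightforward to roll up usage per model. As a rough sketch, assume spans have been exported as flat dictionaries keyed by the attribute names listed above. The export format, the sample values, and the `usage_by_model` function are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def usage_by_model(spans: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Sum token counts and cost per model across LLM spans only."""
    totals: Dict[str, Dict[str, float]] = defaultdict(lambda: {"tokens": 0, "cost_usd": 0.0})
    for span in spans:
        if span.get("langchain.request.type") != "llm":
            continue  # skip chain, agent, retriever, and tool spans
        model = str(span.get("langchain.request.model_name", "unknown"))
        prompt = int(span.get("langchain.tokens.prompt", 0))
        completion = int(span.get("langchain.tokens.completion", 0))
        totals[model]["tokens"] += prompt + completion
        totals[model]["cost_usd"] += float(span.get("langchain.cost.usd", 0.0))
    return dict(totals)


# Made-up sample spans; only the keys follow the attribute list above.
sample_spans = [
    {"langchain.request.type": "llm", "langchain.request.model_name": "gpt-4o",
     "langchain.tokens.prompt": 1000, "langchain.tokens.completion": 247,
     "langchain.cost.usd": 0.012},
    {"langchain.request.type": "tool"},
    {"langchain.request.type": "llm", "langchain.request.model_name": "gpt-4o",
     "langchain.tokens.prompt": 200, "langchain.tokens.completion": 50,
     "langchain.cost.usd": 0.003},
]

summary = usage_by_model(sample_spans)
print(summary["gpt-4o"]["tokens"])  # 1497
```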

Combine with logs and metrics

# datadog.yaml log → trace correlation
logs_enabled: true
apm_config:
  trace_id_injection: true

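With log collection enabled and trace IDs injected into log records (in Python, ddtrace does this when DD_LOGS_INJECTION=true is set), any log line emitted during a chain step joins the trace in the LLM Observability view. Search "session_id:abc-123" to see logs and spans in one timeline.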
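
The session search above only works if log records actually carry a session_id. One way to attach it, sketched here with the standard logging module only, is a filter that annotates every record. `SessionFilter` and `build_logger` are hypothetical names, and the log format is an assumption; how Datadog parses the attribute depends on your log pipeline.

```python
import io
import logging


class SessionFilter(logging.Filter):
    """Attach a session_id attribute to every record passing through."""

    def __init__(self, session_id: str) -> None:
        super().__init__()
        self.session_id = session_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.session_id = self.session_id
        return True  # never drop the record, only annotate it


def build_logger(session_id: str, stream) -> logging.Logger:
    """Create a logger whose lines include session_id:<value>."""
    logger = logging.getLogger(f"rag.{session_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s session_id:%(session_id)s %(message)s")
    )
    handler.addFilter(SessionFilter(session_id))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger("abc-123", buffer)
log.info("retriever returned 3 documents")
print(buffer.getvalue(), end="")  # INFO session_id:abc-123 retriever returned 3 documents
```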


FAQ

Q: Does LangGraph work too? A: Yes — ddtrace ≥ 2.18 instruments LangGraph node executions. Each graph node becomes a span; the supergraph run is the parent. Cycle detection keeps repeated nodes distinct.

Q: What if I use LangServe? A: LangServe runs on FastAPI, so ddtrace's patch(fastapi=True) plus patch(langchain=True) gives you HTTP request → chain run → LLM call as one continuous trace. Enable both together, for example with patch(fastapi=True, langchain=True).

Q: Performance overhead? A: Tiny — ddtrace's hooks add <1% latency on tested LangChain workloads. The exporter batches and ships async. Disable on hot paths only if you hit measured regressions.
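
To check the overhead on your own workload rather than rely on a general figure, you can time the same call with and without instrumentation and compare medians. This is a minimal standard-library sketch; `median_ms` and `overhead_pct` are hypothetical helpers, and the callable you pass in stands in for your own chain invocation.

```python
import statistics
import time
from typing import Callable


def median_ms(fn: Callable[[], object], runs: int = 30) -> float:
    """Median wall-clock time of fn over several runs, in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def overhead_pct(baseline_ms: float, traced_ms: float) -> float:
    """Relative slowdown of the traced run versus the baseline, in percent."""
    return (traced_ms - baseline_ms) / baseline_ms * 100.0
```

For example, a baseline median of 200.0 ms and a traced median of 201.5 ms works out to roughly 0.75% overhead. Measuring the two variants in separate processes avoids one run's patching affecting the other.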


Quick Use

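  1. pip install "ddtrace>=2.10" (quote the requirement so the shell does not treat > as a redirect)
  2. Call from ddtrace import patch_all; patch_all(langchain=True) before importing LangChain
  3. Set DD_LLMOBS_ENABLED=1, DD_LLMOBS_ML_APP, and DD_API_KEY in the environment before starting the process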

Source & Thanks

Built by Datadog. LangChain integration in DataDog/dd-trace-py.

Apache-2.0 + Datadog API ToS
