LLM Observability
Langfuse, AgentOps, LangSmith, Phoenix — the dashboards that catch token blow-ups before your CFO does.
What's in this pack
You can't fix what you can't see. The day a prompt regression silently triples your token bill is the day you wish you'd installed an observability layer last quarter. This pack collects the seven assets that turn an opaque LLM black box into a debuggable, alertable, optimizable system.
| # | Asset | Tier | What it does |
|---|---|---|---|
| 1 | Langfuse | open-source | full traces, eval, prompt management — self-host or cloud |
| 2 | AgentOps | open-source | agent-specific observability with session replay |
| 3 | Arize Phoenix | open-source | OpenInference traces with built-in evaluators |
| 4 | LangSmith | hosted | LangChain's first-party tracing & dataset platform |
| 5 | Token cost dashboards | pattern | per-user, per-feature, per-prompt-version breakdown |
| 6 | Latency budget alerts | pattern | p95 / p99 with PagerDuty wiring |
| 7 | Prompt version diffs | pattern | side-by-side trace replay across two prompt versions |
Why this matters
Three production failure modes that observability catches and intuition misses:
- Silent token inflation. A "minor" prompt edit adds a 200-token reminder. At 1M requests/day that's 200M extra tokens daily — roughly $2-6k/mo you didn't budget for, depending on model pricing. Langfuse's per-prompt-version cost view surfaces it on day one.
- The 95th-percentile tail. Average latency looks fine — but the 5% of queries hitting cold cache, retry loops, or oversized RAG payloads tank user experience. p99 dashboards from Phoenix or LangSmith make the tail visible.
- Quality regression invisible at the unit level. Each individual response looks plausible. Aggregate evaluator scores (LLM-as-judge, retrieval recall, hallucination rate) over the last 24h vs the previous 7d, and the regression jumps out.
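The tail math in the second failure mode is cheap to verify yourself. A minimal sketch (pure Python, no platform SDK assumed) of why the mean hides what p99 exposes:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: smallest value >= p% of the samples."""
    ranked = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[idx]

# 95 fast requests plus the 5% slow tail described above.
latencies_ms = [120.0] * 95 + [4800.0] * 5
mean = sum(latencies_ms) / len(latencies_ms)   # 354.0 — looks tolerable
p99 = percentile(latencies_ms, 99)             # 4800.0 — the tail is visible
```

The dashboards in this pack compute exactly this kind of aggregate continuously, so you don't have to pull raw traces to see the tail.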
Install in one command
```shell
# Install the entire pack
tokrepo install pack/llm-observability

# Or pick the platform you want to start with
tokrepo install langfuse
tokrepo install agentops
tokrepo install phoenix
```
The TokRepo CLI drops the SDK config and dashboard scaffolding into your project so traces start flowing on the next request — no manual instrumentation walk-through required.
Common pitfalls
- Logging full prompts and PII to a third-party SaaS. If your prompts include user data, self-host Langfuse or Phoenix; don't ship raw payloads to LangSmith Cloud without redaction. All three open-source options run on a single VM with under 4 GB of RAM for typical loads.
- No sampling on high-volume endpoints. Tracing 100% of requests at 1M/day will overwhelm both your storage and your wallet. Sample 10% by default, 100% on errors. Langfuse and Phoenix both support this natively.
- Tracking tokens but not dollars. Different models price differently per token. Configure model-pricing in your platform once; track cost in dollars, not just token counts. CFOs care about dollars.
- One generic dashboard for everyone. Build one dashboard per persona — eng (latency, error rate), product (cost per feature), exec (cost per active user, week-over-week trend). Generic dashboards get ignored.
- No alert on prompt-version cost delta. Add an alert that fires when a new prompt version's avg-cost-per-call deviates >20% from the previous version. This is the single highest-ROI alert you'll set up.
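The last two pitfalls combine into one check. A hedged sketch (pure Python; the prices and function names are illustrative placeholders, not any platform's API or current list prices) of dollar-cost tracking plus the version-delta alert:

```python
# Illustrative per-million-token prices — placeholders, configure real ones.
PRICING_PER_M = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    "gpt-4o": {"input": 2.50, "output": 10.00},
}

def call_cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Convert token counts to dollars via the model's price table."""
    p = PRICING_PER_M[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

def cost_delta_alert(prev_avg: float, new_avg: float, threshold: float = 0.20) -> bool:
    """Fire when a new prompt version's avg cost-per-call deviates >20%."""
    return abs(new_avg - prev_avg) / prev_avg > threshold

old = call_cost_usd("gpt-4o-mini", 800, 300)    # previous prompt version
new = call_cost_usd("gpt-4o-mini", 1000, 500)   # longer prompt, longer replies
```

Here the "minor" edit grows input and output tokens enough that `cost_delta_alert(old, new)` fires, which is the whole point: the check is trivial, you just have to wire it to your platform's per-version cost aggregate.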
Relationship to other packs
LLM Observability is the runtime telemetry layer. The complementary LLM Eval & Guardrails pack is the offline scoring layer — DeepEval, Promptfoo, Ragas. You want both: observability shows you what's happening in production, eval tells you whether a proposed change is better before you ship.
Multi-Agent Frameworks (CAMEL, LangGraph, DeepAgents) are the systems being instrumented. If you're running a LangGraph workflow and can't see which node failed, you don't have observability — you have a print-statement debugger. Pair the framework pack with this one from day one.
Frequently asked questions
Is this stuff free?
Langfuse, Phoenix, and AgentOps are open-source under MIT/Apache 2.0 and run on a single VM. Self-hosted is free; you only pay for storage and compute. LangSmith is hosted-only and metered per trace — the free tier covers small teams, and prices scale to enterprise. For most teams the right answer is to start with self-hosted Langfuse, switching to LangSmith only if you're already deep in the LangChain ecosystem and want first-party integration.
How does Langfuse compare to LangSmith?
Langfuse is open-source, self-hostable, and framework-agnostic — it works with LangChain, LlamaIndex, raw OpenAI SDK, custom code. LangSmith is closed-source, hosted, and tightly coupled to LangChain. Feature-wise they're roughly equivalent on tracing and prompt management; LangSmith has a slight edge on LangChain-specific features, Langfuse has a stronger evaluator framework and self-host story. Pick Langfuse if data sovereignty matters, LangSmith if you want zero-ops and are LangChain-native.
Will this work with Cursor or Codex CLI?
Observability is at the API call level, not the editor level — so any tool that hits an LLM API can be instrumented. The TokRepo install adds SDK init code to your project. If you're proxying through Claude Code, Cursor, or Codex CLI, instrument the agent backend (the framework or service that calls the LLM), not the editor. Each platform's SDK is a 5-line import.
What's the difference vs the LLM Eval pack?
Eval is offline scoring — given a prompt and a reference answer, how good is the output. Observability is runtime telemetry — what happened in production: latency, cost, errors, traces. Eval feeds CI; observability feeds dashboards and alerts. You need both. A common pattern: eval scores from your golden set get logged into your observability platform so quality, cost, and latency live on the same dashboard.
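That "same dashboard" pattern just means eval results and runtime traces share one record shape keyed by trace ID. A hypothetical sketch (field names are made up for illustration, not any platform's schema):

```python
from datetime import datetime, timezone

def runtime_record(trace_id: str, latency_ms: float, cost_usd: float) -> dict:
    """Production telemetry: what actually happened on a live request."""
    return {
        "trace_id": trace_id,
        "source": "production",
        "latency_ms": latency_ms,
        "cost_usd": cost_usd,
        "ts": datetime.now(timezone.utc).isoformat(),
    }

def eval_record(trace_id: str, judge_score: float, retrieval_recall: float) -> dict:
    """Offline golden-set eval scores, logged in the same shape."""
    return {
        "trace_id": trace_id,
        "source": "golden-set-eval",
        "judge_score": judge_score,
        "retrieval_recall": retrieval_recall,
        "ts": datetime.now(timezone.utc).isoformat(),
    }

# Both land in one store, so quality, cost, and latency share a dashboard.
records = [
    runtime_record("t-1", 840.0, 0.0031),
    eval_record("t-1", 0.92, 0.88),
]
```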
How much instrumentation overhead does this add?
Async batched logging adds ~1-3ms p50 latency to LLM calls — negligible compared to the model latency itself (often 500-3000ms). All four platforms ship async SDKs that batch traces in the background. Set sampling to 10% on high-volume endpoints to keep storage costs sane. The actual hot-path overhead is so low that there's no good reason to ship without observability.
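The batching pattern behind that ~1-3ms figure is simple: the hot path only enqueues, and a background thread ships batches. A minimal sketch (pure Python stdlib, not any vendor's SDK; the network export is stubbed with a list):

```python
import queue
import threading

class TraceBatcher:
    """Hot path enqueues in O(1); a daemon thread batches and ships."""

    def __init__(self, batch_size: int = 10):
        self.q: queue.Queue = queue.Queue()
        self.batch_size = batch_size
        self.shipped: list[list[dict]] = []   # stand-in for a network export
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, trace: dict) -> None:
        self.q.put(trace)                     # microseconds on the hot path

    def _run(self) -> None:
        batch: list[dict] = []
        while True:
            item = self.q.get()
            if item is None:                  # shutdown sentinel: flush and exit
                if batch:
                    self.shipped.append(batch)
                return
            batch.append(item)
            if len(batch) >= self.batch_size:
                self.shipped.append(batch)
                batch = []

    def shutdown(self) -> None:
        self.q.put(None)
        self._worker.join()

batcher = TraceBatcher(batch_size=10)
for i in range(25):
    batcher.log({"trace_id": i, "latency_ms": 120})
batcher.shutdown()   # 25 traces → two full batches of 10, one flush of 5
```

The real SDKs add retries, size limits, and sampling on top, but the structure — cheap enqueue, async ship — is the same, which is why the hot-path cost stays in single-digit milliseconds.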