Configs · Apr 1, 2026 · 1 min read

Phoenix — Open Source AI Observability

Phoenix is an open-source AI observability platform for tracing, evaluating, and debugging LLM apps (9.1K+ GitHub stars). It provides OpenTelemetry-based tracing, evals, and prompt management.

TL;DR
Phoenix provides tracing, evaluation, and prompt management for LLM apps via OpenTelemetry.
§01

What it is

Phoenix is an open-source AI observability platform by Arize AI. It traces LLM application calls, evaluates output quality, and helps debug issues in retrieval-augmented generation (RAG) pipelines, agents, and chat applications. It uses OpenTelemetry for instrumentation and provides a web UI for exploring traces.

Phoenix targets ML engineers and developers building production LLM applications who need visibility into what their AI is doing, why it fails, and how to improve it.

§02

How it saves time or tokens

Phoenix shows you exactly which prompts, retrievals, and tool calls happened in each request. When an LLM produces a bad answer, you trace the root cause (wrong documents retrieved, poor prompt, hallucination) without adding debug logging manually.

The evaluation framework lets you score outputs automatically, catching quality regressions before users report them.

§03

How to use

  1. Install Phoenix: pip install arize-phoenix
  2. Start the Phoenix server: phoenix serve
  3. Instrument your LLM application with the Phoenix OpenTelemetry integration
  4. Open the web UI at http://localhost:6006 to explore traces
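The steps above, as copy-pasteable shell commands (6006 is Phoenix's default port):

```shell
# Install Phoenix and start the local server
pip install arize-phoenix
phoenix serve

# The trace explorer is then served at http://localhost:6006
```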
§04

Example

import phoenix as px
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Start Phoenix in-process (an alternative to running `phoenix serve` separately)
px.launch_app()

# Register OpenTelemetry tracer
tracer_provider = register(project_name='my-app')

# Instrument OpenAI calls
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# Your LLM calls are now traced automatically
from openai import OpenAI
client = OpenAI()
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'What is RAG?'}]
)
§05

§06

Common pitfalls

  • Phoenix traces can grow large for high-throughput apps; configure sampling rates for production
  • OpenTelemetry instrumentation adds slight latency; benchmark before deploying to latency-sensitive endpoints
  • The evaluation framework requires labeled data or LLM-as-judge setup; plan your eval strategy before instrumenting
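To make the first pitfall concrete, here is a minimal sketch of the sampling idea in plain Python (not Phoenix's or OpenTelemetry's sampler API): deterministic trace-ID sampling keeps a fixed fraction of traces while guaranteeing that every span of one trace gets the same keep/drop decision.

```python
import hashlib

def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministic head sampling: hash the trace ID into [0, 1)
    and keep the trace when the hash falls below the sample rate.
    All spans sharing a trace ID share the decision."""
    digest = hashlib.sha256(trace_id.encode()).hexdigest()
    bucket = int(digest, 16) % 10_000
    return bucket / 10_000 < sample_rate
```

With `sample_rate=1.0` everything is kept; `0.1` keeps roughly 10% of traces. Production OTel setups express the same idea via a sampler configured on the tracer provider.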

Frequently Asked Questions

How does Phoenix compare to LangSmith?

Both provide LLM tracing and evaluation. Phoenix is open source and self-hosted. LangSmith is a managed service by LangChain. Phoenix uses standard OpenTelemetry; LangSmith uses proprietary instrumentation. Choose Phoenix for self-hosting and vendor independence.

Does Phoenix support RAG tracing?

Yes. Phoenix traces retrieval steps including document chunks, similarity scores, and reranking. You can see exactly which documents were retrieved and whether they were relevant to the query, helping debug RAG quality issues.
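As a rough illustration of what a retrieval span carries, here is the shape of such a record in plain Python (field names are illustrative, not the exact OpenInference attribute schema):

```python
def retrieval_span(query, scored_docs):
    """Build an illustrative retrieval-span record: the query plus
    each retrieved chunk with its similarity score."""
    return {
        "span.kind": "retriever",
        "input.query": query,
        "retrieval.documents": [
            {"content": doc, "score": round(score, 3)}
            for doc, score in scored_docs
        ],
    }

span = retrieval_span("What is RAG?", [("RAG combines retrieval with generation...", 0.91)])
```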

Can I use Phoenix with any LLM provider?

Yes. Phoenix supports OpenAI, Anthropic, Google, and any provider through OpenTelemetry-compatible instrumentation libraries. The openinference library provides auto-instrumentors for popular frameworks.

Does Phoenix store trace data permanently?

By default, Phoenix stores traces in memory for the session. For persistence, configure a database backend like PostgreSQL. The managed Arize platform provides long-term storage and additional features.
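For example, pointing Phoenix at PostgreSQL is done via an environment variable before starting the server (the variable name here is taken from the Phoenix docs; verify it against your version):

```shell
# Use PostgreSQL instead of the default in-memory/session store
export PHOENIX_SQL_DATABASE_URL="postgresql://user:password@localhost:5432/phoenix"
phoenix serve
```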

What evaluations does Phoenix support?

Phoenix supports relevance scoring, hallucination detection, toxicity checks, and custom evaluations. You can use LLM-as-judge evaluations where a model scores the output, or write custom evaluation functions.
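A minimal LLM-as-judge sketch in plain Python (the injected `ask` callable, standing in for any chat-completion call, is an assumption for illustration, not Phoenix's evaluator API):

```python
def judge_relevance(ask, question: str, answer: str) -> bool:
    """Score an answer with a judge model. `ask` is any callable that
    sends a prompt string to an LLM and returns its text reply."""
    prompt = (
        "Grade the answer for relevance to the question.\n"
        f"Question: {question}\n"
        f"Answer: {answer}\n"
        "Reply with exactly one word: 'relevant' or 'irrelevant'."
    )
    return ask(prompt).strip().lower() == "relevant"
```

Any backend can be plugged in as `ask`, e.g. a wrapper around an OpenAI chat call, which keeps the grading logic testable without network access.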

Citations (3)
  • Phoenix GitHub — Phoenix is an AI observability platform with 9.1K+ GitHub stars
  • Phoenix Docs — OpenTelemetry-based instrumentation for LLM applications
  • OpenTelemetry — OpenTelemetry observability framework

Source & Thanks

Arize-ai/phoenix — 9,100+ GitHub stars
