Skills · Apr 1, 2026 · 1 min read

Phoenix — Open Source AI Observability

Phoenix is an AI observability platform for tracing, evaluating, and debugging LLM apps. 9.1K+ stars. OpenTelemetry, evals, prompt management.

Arize AI
Arize AI · Community
Agent ready

Agent install ready

This asset can be installed after you choose a runtime, verify the plan, and run the matching command.

Native · 98/100 · Policy: allow
Agent surface: Any MCP/CLI agent
Type: Skill
Installation: Single
Trust: Community
Entry point: phoenix.md
Direct install command:
npx -y tokrepo@latest install 42fa8573-760e-4a07-a19f-43422546e9f5 --target codex

Run this after confirming the plan in a dry run.

TL;DR
Phoenix provides tracing, evaluation, and prompt management for LLM apps via OpenTelemetry.
§01

What it is

Phoenix is an open-source AI observability platform by Arize AI. It traces LLM application calls, evaluates output quality, and helps debug issues in retrieval-augmented generation (RAG) pipelines, agents, and chat applications. It uses OpenTelemetry for instrumentation and provides a web UI for exploring traces.

Phoenix targets ML engineers and developers building production LLM applications who need visibility into what their AI is doing, why it fails, and how to improve it.

§02

How it saves time or tokens

Phoenix shows you exactly which prompts, retrievals, and tool calls happened in each request. When an LLM produces a bad answer, you trace the root cause (wrong documents retrieved, poor prompt, hallucination) without adding debug logging manually.
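
As a rough illustration of why a trace shortens root-cause analysis, the sketch below models one request as a tree of spans and walks it to find the innermost failing step. The `Span` class, its fields, and `first_problem` are hypothetical names chosen for this example; they are not Phoenix's data model or API.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Span:
    """One traced step (hypothetical shape, not Phoenix's schema)."""
    name: str
    kind: str                  # e.g. "chain", "retriever", "llm", "tool"
    status: str = "ok"         # "ok" or "error"
    note: str = ""
    children: List["Span"] = field(default_factory=list)


def walk(span: Span) -> Iterator[Span]:
    """Yield spans depth-first, parents before children."""
    yield span
    for child in span.children:
        yield from walk(child)


def first_problem(root: Span) -> Optional[Span]:
    """Return the first erroring span that has no erroring child."""
    for span in walk(root):
        if span.status == "error" and not any(
            child.status == "error" for child in span.children
        ):
            return span
    return None


# A request whose retrieval step failed, which made the whole chain fail.
trace = Span(
    "answer_question",
    "chain",
    status="error",
    children=[
        Span("retrieve_docs", "retriever", status="error",
             note="0 documents returned"),
        Span("generate", "llm"),
    ],
)

culprit = first_problem(trace)
print(culprit.name, "-", culprit.note)
```

In a real deployment the Phoenix UI plays the role of `first_problem`: you open the failing request and inspect its spans instead of adding log statements.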

The evaluation framework lets you score outputs automatically, catching quality regressions before users report them.
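
One way to turn automatic scores into a regression check is to compare the mean score of a candidate run against a baseline run. The sketch below assumes each evaluation yields a number between 0 and 1; `has_regressed` and the `tolerance` value are hypothetical and not part of Phoenix.

```python
from statistics import mean
from typing import Sequence


def has_regressed(
    baseline_scores: Sequence[float],
    candidate_scores: Sequence[float],
    tolerance: float = 0.05,
) -> bool:
    """True if the candidate mean is more than `tolerance` below the baseline mean."""
    if not baseline_scores or not candidate_scores:
        raise ValueError("both score lists must be non-empty")
    return mean(candidate_scores) < mean(baseline_scores) - tolerance


# Scores for the same test questions before and after a prompt change.
baseline = [1.0, 1.0, 0.0, 1.0]    # mean 0.75
candidate = [1.0, 0.0, 0.0, 1.0]   # mean 0.50

regressed = has_regressed(baseline, candidate)
print("regression detected" if regressed else "no regression")
```

A check like this could run in CI against scores exported from an evaluation run; the threshold is a judgment call that depends on how noisy your evaluator is.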

§03

How to use

  1. Install Phoenix: pip install arize-phoenix
  2. Start the Phoenix server: phoenix serve
  3. Instrument your LLM application: register a tracer with phoenix.otel.register and attach an OpenInference instrumentor such as OpenAIInstrumentor
  4. Open the web UI at http://localhost:6006 to explore traces
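
When the application and the Phoenix server run as separate processes, the application needs to know where to send spans. The helper below is a minimal sketch that resolves that address from the `PHOENIX_COLLECTOR_ENDPOINT` environment variable, falling back to the local address from step 4. It assumes the server accepts OTLP over HTTP at the conventional `/v1/traces` path; `otlp_traces_endpoint` is a hypothetical helper, not a Phoenix function.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlsplit

DEFAULT_BASE_URL = "http://localhost:6006"  # local address from step 4
OTLP_HTTP_PATH = "/v1/traces"               # conventional OTLP/HTTP traces path


def otlp_traces_endpoint(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the trace export URL from the environment (or a dict for testing)."""
    env = os.environ if env is None else env
    base = env.get("PHOENIX_COLLECTOR_ENDPOINT", DEFAULT_BASE_URL).rstrip("/")
    parts = urlsplit(base)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a valid http(s) URL: {base!r}")
    if base.endswith(OTLP_HTTP_PATH):
        return base
    return base + OTLP_HTTP_PATH


print(otlp_traces_endpoint({}))
```

The resulting string is the kind of value you would pass as the `endpoint` argument of `phoenix.otel.register`; check the Phoenix documentation for the exact behavior in your version.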
§04

Example

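# Requires: pip install arize-phoenix openinference-instrumentation-openai openai
# The OpenAI client reads its API key from the OPENAI_API_KEY environment variable.
import phoenix as px
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
from phoenix.otel import register

# Start a local Phoenix instance from this process (UI at http://localhost:6006 by default)
px.launch_app()

# Register an OpenTelemetry tracer provider that exports spans to Phoenix
tracer_provider = register(project_name='my-app')

# Patch the OpenAI SDK so each call emits a span
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)

# LLM calls made from here on are traced automatically
client = OpenAI()
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'What is RAG?'}]
)
print(response.choices[0].message.content)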
§06

Common pitfalls

  • Phoenix traces can grow large for high-throughput apps; configure sampling rates for production
  • OpenTelemetry instrumentation adds slight latency; benchmark before deploying to latency-sensitive endpoints
  • The evaluation framework requires labeled data or LLM-as-judge setup; plan your eval strategy before instrumenting
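
To make the sampling advice concrete, here is a minimal sketch of ratio-based head sampling: the keep-or-drop decision is derived from the trace ID, so every span of a trace gets the same decision. It mirrors the idea behind the OpenTelemetry SDK's `TraceIdRatioBased` sampler but is a standalone illustration; `should_sample` is a hypothetical name, and how a sampler is wired into a Phoenix setup should be checked against the docs.

```python
import random

_LOW_64_BITS = (1 << 64) - 1


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace if the low 64 bits of its ID fall below ratio * 2**64."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0.0 and 1.0")
    bound = int(ratio * (1 << 64))
    return (trace_id & _LOW_64_BITS) < bound


# Simulate 10,000 random 128-bit trace IDs at a 10% sampling ratio.
rng = random.Random(0)
trace_ids = [rng.getrandbits(128) for _ in range(10_000)]
kept = sum(should_sample(tid, 0.1) for tid in trace_ids)
print(f"kept {kept} of {len(trace_ids)} traces")
```

Because the decision depends only on the trace ID, services that share a trace agree on whether to keep it without coordinating.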

Frequently asked questions

How does Phoenix compare to LangSmith?

Both provide LLM tracing and evaluation. Phoenix is open source and can be self-hosted; LangSmith is a managed service from LangChain. Phoenix instruments applications through standard OpenTelemetry, while LangSmith relies primarily on its own SDK. Choose Phoenix if self-hosting and vendor independence matter most to you.

Does Phoenix support RAG tracing?

Yes. Phoenix traces retrieval steps including document chunks, similarity scores, and reranking. You can see exactly which documents were retrieved and whether they were relevant to the query, helping debug RAG quality issues.
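
As an illustration of what a retrieval span can carry, the sketch below rebuilds the list of retrieved documents from flattened span attributes and flags low-scoring ones. The attribute keys follow the OpenInference-style `retrieval.documents.<index>.document.<field>` pattern, which is an assumption to verify against your instrumentor's output; `retrieved_documents` and `low_score_documents` are hypothetical helpers.

```python
import re
from collections import defaultdict
from typing import Any, Dict, List, Mapping

# Assumed key pattern, e.g. "retrieval.documents.0.document.score"
_DOC_KEY = re.compile(r"^retrieval\.documents\.(\d+)\.document\.(\w+)$")


def retrieved_documents(attributes: Mapping[str, Any]) -> List[Dict[str, Any]]:
    """Group flattened document attributes by index, in retrieval order."""
    docs: Dict[int, Dict[str, Any]] = defaultdict(dict)
    for key, value in attributes.items():
        match = _DOC_KEY.match(key)
        if match:
            docs[int(match.group(1))][match.group(2)] = value
    return [docs[i] for i in sorted(docs)]


def low_score_documents(
    attributes: Mapping[str, Any], threshold: float
) -> List[Dict[str, Any]]:
    """Return documents whose similarity score is below `threshold`."""
    return [
        doc for doc in retrieved_documents(attributes)
        if doc.get("score", 0.0) < threshold
    ]


# Made-up attributes for one retriever span.
span_attributes = {
    "input.value": "What is RAG?",
    "retrieval.documents.0.document.content": "RAG combines retrieval with generation.",
    "retrieval.documents.0.document.score": 0.91,
    "retrieval.documents.1.document.content": "Phoenix is a mythical bird.",
    "retrieval.documents.1.document.score": 0.32,
}

docs = retrieved_documents(span_attributes)
weak = low_score_documents(span_attributes, threshold=0.5)
print([doc["content"] for doc in weak])
```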

Can I use Phoenix with any LLM provider?

Yes. Phoenix supports OpenAI, Anthropic, Google, and any other provider for which an OpenTelemetry-compatible instrumentation library exists. The OpenInference project provides auto-instrumentors for popular SDKs and frameworks, for example openinference-instrumentation-openai.

Does Phoenix store trace data permanently?

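Recent Phoenix versions persist traces to a local SQLite database by default, although notebook sessions started with launch_app may use temporary storage depending on version and settings. For production, configure a database backend such as PostgreSQL, for example through the PHOENIX_SQL_DATABASE_URL environment variable. The managed Arize platform provides long-term storage and additional features.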

What evaluations does Phoenix support?

Phoenix supports relevance scoring, hallucination detection, toxicity checks, and custom evaluations. You can use LLM-as-judge evaluations where a model scores the output, or write custom evaluation functions.
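
The LLM-as-judge pattern can be sketched in a few lines: format a grading prompt, send it to a model, and map the reply to a label and a score. The code below uses a stub in place of a real model call so it runs offline; `JUDGE_TEMPLATE`, `judge_answer`, and `stub_model` are hypothetical names and do not reflect Phoenix's evaluation API.

```python
from typing import Callable, Dict, Union

JUDGE_TEMPLATE = (
    "You are grading an answer.\n"
    "Question: {question}\n"
    "Reference: {reference}\n"
    "Answer: {answer}\n"
    "Reply with exactly one word: factual or hallucinated."
)


def judge_answer(
    question: str,
    reference: str,
    answer: str,
    ask_model: Callable[[str], str],
) -> Dict[str, Union[str, float]]:
    """Ask a judge model for a label and convert it to a 0/1 score."""
    prompt = JUDGE_TEMPLATE.format(
        question=question, reference=reference, answer=answer
    )
    raw = ask_model(prompt).strip().lower()
    label = raw if raw in ("factual", "hallucinated") else "unparseable"
    return {"label": label, "score": 1.0 if label == "factual" else 0.0}


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM call: approves only answers mentioning Paris."""
    answer_part = prompt.split("Answer:", 1)[1]
    return "Factual" if "Paris" in answer_part else "hallucinated"


question = "What is the capital of France?"
reference = "Paris is the capital of France."
good = judge_answer(question, reference, "Paris", stub_model)
bad = judge_answer(question, reference, "Lyon", stub_model)
print(good, bad)
```

Passing the model as a callable keeps the scoring logic testable; in practice you would swap `stub_model` for a function that calls your provider, and treat "unparseable" replies as a signal to tighten the judge prompt.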

Cited sources (3)
  • Phoenix GitHub: Phoenix is an AI observability platform with 9.1K+ GitHub stars
  • Phoenix Docs: OpenTelemetry-based instrumentation for LLM applications
  • OpenTelemetry: OpenTelemetry observability framework

Source and acknowledgments

Arize-ai/phoenix — 9,100+ GitHub stars
