Configs · Apr 3, 2026 · 2 min read

Opik — Debug, Evaluate & Monitor LLM Apps

Trace LLM calls, run automated evaluations, and monitor RAG and agent quality in production. By Comet. 18K+ GitHub stars.

Quick Use

Use it first, then decide how deep to go

Copy the block below to install, configure, and trace your first call — evaluate the tool hands-on before deciding how deep to go.

pip install opik
opik configure  # point the SDK at your self-hosted instance or Comet account

import opik

# One-line tracing for any LLM call
@opik.track
def my_llm_call(prompt: str) -> str:
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

result = my_llm_call("What is retrieval augmented generation?")
# Trace captured: input, output, latency, tokens, cost

Self-host the dashboard:

docker compose up -d  # from the opik repo

Intro

Opik is an open-source LLM evaluation and observability platform by Comet with 18,600+ GitHub stars. It provides end-to-end tracing for LLM calls, automated evaluation with 20+ built-in metrics, dataset management for regression testing, and production monitoring dashboards. A single @opik.track decorator captures everything — inputs, outputs, latency, token usage, and costs. Opik integrates with LangChain, LlamaIndex, OpenAI, Anthropic, and major agent frameworks, giving teams full visibility into their AI application quality.

Works with: OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, Haystack, Bedrock. Best for teams running LLM apps in production who need evaluation and monitoring. Setup time: under 3 minutes.


Opik Features

Tracing

import opik

@opik.track
def rag_pipeline(query: str):
    # retrieve, format_docs, and generate are your own helper functions
    docs = retrieve(query)             # traced as a child span
    context = format_docs(docs)       # traced as a child span
    answer = generate(query, context)  # traced as a child span
    return answer

# Dashboard shows the full trace tree:
# rag_pipeline (2.3s)
#   ├─ retrieve (0.5s) - 8 docs found
#   ├─ format_docs (0.1s)
#   └─ generate (1.7s) - 342 tokens, $0.005
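The per-span cost shown in the tree is simple arithmetic over token counts and model pricing. A minimal sketch of that calculation — the function name and the per-1K-token prices below are illustrative placeholders, not real OpenAI rates:

```python
def span_cost(prompt_tokens: int, completion_tokens: int,
              in_price_per_1k: float, out_price_per_1k: float) -> float:
    """Estimate the dollar cost of one LLM span from its token counts."""
    return (prompt_tokens / 1000) * in_price_per_1k \
        + (completion_tokens / 1000) * out_price_per_1k

# e.g. 342 total tokens split 300 prompt / 42 completion,
# at hypothetical $0.01 / $0.03 per 1K tokens
cost = span_cost(300, 42, in_price_per_1k=0.01, out_price_per_1k=0.03)
print(f"${cost:.4f}")
```

Opik does this lookup per model automatically; the sketch only shows where the dollar figure comes from.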

Automated Evaluation (20+ Metrics)

import opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination, AnswerRelevance, ContextPrecision

# Evaluate your RAG pipeline against a stored dataset
dataset = opik.Opik().get_or_create_dataset("qa-test-set")
results = evaluate(
    dataset=dataset,
    task=rag_pipeline,
    scoring_metrics=[
        Hallucination(),
        AnswerRelevance(),
        ContextPrecision(),
    ],
)
# Example summary: Hallucination: 0.12 | Relevance: 0.89 | Precision: 0.85
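A summary row like the one above is just a per-metric mean over the evaluated items. A pure-Python sketch of that aggregation (the function and variable names are illustrative, not Opik internals):

```python
from collections import defaultdict
from statistics import mean

def summarize(scores: list[dict[str, float]]) -> dict[str, float]:
    """Average per-item metric scores into one value per metric."""
    by_metric: dict[str, list[float]] = defaultdict(list)
    for item in scores:
        for metric, value in item.items():
            by_metric[metric].append(value)
    return {metric: round(mean(values), 2) for metric, values in by_metric.items()}

per_item = [
    {"Hallucination": 0.10, "AnswerRelevance": 0.90},
    {"Hallucination": 0.14, "AnswerRelevance": 0.88},
]
print(summarize(per_item))  # {'Hallucination': 0.12, 'AnswerRelevance': 0.89}
```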

Built-in metrics:

  • Hallucination — Detects fabricated information
  • Answer Relevance — Does the answer match the question?
  • Context Precision — Is retrieved context relevant?
  • Faithfulness — Is the answer supported by context?
  • Moderation — Toxicity, bias, PII detection
  • Custom — Write your own Python scoring functions
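The "Custom" bullet means any Python function that maps an output to a numeric score. A sketch of the kind of scorer you could plug in — keyword overlap here is an illustrative example, not one of Opik's built-in metrics:

```python
def keyword_overlap(output: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords that appear in the model output."""
    text = output.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 0.0

score = keyword_overlap(
    "RAG retrieves documents and augments the prompt before generation.",
    ["retrieves", "augments", "generation"],
)
print(score)  # 1.0
```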

Dataset Management

# Create evaluation datasets (for example, from production traces)
client = opik.Opik()
dataset = client.get_or_create_dataset(name="regression-tests")
dataset.insert([
    {"input": "What is RAG?", "expected": "Retrieval Augmented Generation..."},
    {"input": "How does fine-tuning work?", "expected": "Fine-tuning adjusts..."},
])

# Run evaluations on every deployment
from opik.evaluation import evaluate
results = evaluate(dataset=dataset, task=my_pipeline)
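Running evaluations "on every deployment" usually means gating the build on the aggregate scores. A minimal CI-style sketch — the function name and threshold values are arbitrary examples, not part of Opik:

```python
def passes_gate(summary: dict[str, float],
                max_hallucination: float = 0.15,
                min_relevance: float = 0.80) -> bool:
    """Return True when aggregate eval scores clear the regression thresholds."""
    return (summary.get("Hallucination", 1.0) <= max_hallucination
            and summary.get("AnswerRelevance", 0.0) >= min_relevance)

summary = {"Hallucination": 0.12, "AnswerRelevance": 0.89}
if not passes_gate(summary):
    raise SystemExit("Eval regression detected - blocking deploy")
print("Eval gate passed")
```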

Framework Integrations

# LangChain
from opik.integrations.langchain import OpikTracer
callbacks = [OpikTracer()]
chain.invoke(input, config={"callbacks": callbacks})

# LlamaIndex
from opik.integrations.llama_index import LlamaIndexCallbackHandler
handler = LlamaIndexCallbackHandler()

# OpenAI directly
from openai import OpenAI
from opik.integrations.openai import track_openai
client = track_openai(OpenAI())

FAQ

Q: What is Opik? A: Opik is an open-source LLM evaluation and observability platform by Comet with 18,600+ GitHub stars. It provides tracing, 20+ automated evaluation metrics, dataset management, and production monitoring for LLM applications.

Q: How is Opik different from Langfuse? A: Both provide LLM tracing and observability. Opik has stronger evaluation features (20+ built-in metrics and automated eval pipelines), while Langfuse focuses more on prompt management. Opik is backed by Comet, an established MLOps company.

Q: Is Opik free? A: Yes, open-source under Apache-2.0. Self-host for free. Comet also offers a managed cloud version.



Source & Thanks

Created by Comet ML. Licensed under Apache-2.0.

opik — ⭐ 18,600+
