Opik Features
Tracing
```python
import opik

@opik.track
def rag_pipeline(query: str):
    docs = retrieve(query)             # Traced as child span
    context = format(docs)            # Traced as child span
    answer = generate(query, context)  # Traced as child span
    return answer

# Dashboard shows the full trace tree:
# rag_pipeline (2.3s)
# ├─ retrieve (0.5s) - 8 docs found
# ├─ format (0.1s)
# └─ generate (1.7s) - 342 tokens, $0.005
```

Automated Evaluation (20+ Metrics)
```python
from opik.evaluation.metrics import Hallucination, AnswerRelevance, ContextPrecision

# Evaluate your RAG pipeline
results = opik.evaluate(
    dataset="qa-test-set",
    task=rag_pipeline,
    scoring_metrics=[
        Hallucination(),
        AnswerRelevance(),
        ContextPrecision(),
    ],
)

print(results.summary())
# Hallucination: 0.12 | Relevance: 0.89 | Precision: 0.85
```

Built-in metrics:
- Hallucination — Detects fabricated information
- Answer Relevance — Does the answer match the question?
- Context Precision — Is retrieved context relevant?
- Faithfulness — Is the answer supported by context?
- Moderation — Toxicity, bias, PII detection
- Custom — Write your own Python scoring functions
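To illustrate the last bullet: the scoring logic of a custom metric is plain Python. To plug it into `opik.evaluate` you would wrap it in a metric class (in Opik's SDK, a subclass of `base_metric.BaseMetric` whose `score()` returns a `score_result.ScoreResult` — check your SDK version). A minimal sketch, with illustrative names:

```python
def keyword_coverage(output: str, required_keywords: list[str]) -> float:
    """Fraction of required keywords that appear in the model output.

    Illustrative scoring logic only; wrap it in an Opik metric class
    (e.g. a BaseMetric subclass whose score() returns a ScoreResult)
    to use it as a scoring_metric in opik.evaluate.
    """
    if not required_keywords:
        return 1.0
    found = sum(1 for kw in required_keywords if kw.lower() in output.lower())
    return found / len(required_keywords)
```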
Dataset Management
```python
# Create evaluation datasets from production traces
dataset = opik.Dataset(name="regression-tests")
dataset.insert([
    {"input": "What is RAG?", "expected": "Retrieval Augmented Generation..."},
    {"input": "How does fine-tuning work?", "expected": "Fine-tuning adjusts..."},
])

# Run evaluations on every deployment
results = opik.evaluate(dataset=dataset, task=my_pipeline)
```

Framework Integrations
```python
# LangChain
from opik.integrations.langchain import OpikTracer

callbacks = [OpikTracer()]
chain.invoke(input, config={"callbacks": callbacks})

# LlamaIndex
from opik.integrations.llama_index import LlamaIndexCallbackHandler

handler = LlamaIndexCallbackHandler()

# OpenAI directly
from openai import OpenAI
from opik.integrations.openai import track_openai

client = track_openai(OpenAI())
```

FAQ
Q: What is Opik?
A: Opik is an open-source LLM evaluation and observability platform by Comet with 18,600+ GitHub stars. It provides tracing, 20+ automated evaluation metrics, dataset management, and production monitoring for LLM applications.

Q: How is Opik different from Langfuse?
A: Both provide LLM tracing and observability. Opik has stronger evaluation features (20+ built-in metrics, automated eval pipelines), while Langfuse focuses more on prompt management. Opik is backed by Comet, an established MLOps company.

Q: Is Opik free?
A: Yes. It is open source under the Apache-2.0 license, and you can self-host it for free. Comet also offers a managed cloud version.
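For the self-hosted route, the Python SDK is pointed at your local instance with a one-time configuration call. A minimal sketch based on Opik's quickstart docs — verify the `use_local` flag against your SDK version:

```python
import opik

# One-time setup: direct the SDK to a locally self-hosted Opik instance
# instead of the Comet-hosted cloud.
opik.configure(use_local=True)
```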