# Opik — Debug, Evaluate & Monitor LLM Apps

> Trace LLM calls, run automated evaluations, and monitor RAG and agent quality in production. By Comet. 18K+ GitHub stars.

## Install

```bash
pip install opik
opik configure
```

## Quick Use

```python
import opik

# One-line tracing for any LLM call
@opik.track
def my_llm_call(prompt: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

result = my_llm_call("What is retrieval augmented generation?")
# Trace captured: input, output, latency, tokens, cost
```

Self-host the dashboard:

```bash
docker compose up -d  # from the opik repo
```

---

## Intro

Opik is an open-source LLM evaluation and observability platform by Comet with 18,600+ GitHub stars. It provides end-to-end tracing for LLM calls, automated evaluation with 20+ built-in metrics, dataset management for regression testing, and production monitoring dashboards. A single `@opik.track` decorator captures everything: inputs, outputs, latency, token usage, and costs. Opik integrates with LangChain, LlamaIndex, OpenAI, Anthropic, and major agent frameworks, giving teams full visibility into their AI application quality.

Works with: OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, Haystack, Bedrock.

Best for teams running LLM apps in production who need evaluation and monitoring. Setup time: under 3 minutes.
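To make concrete what `@opik.track` records, here is a minimal plain-Python sketch of a tracing decorator that captures input, output, and latency. This illustrates the idea only and is not Opik's implementation (Opik additionally records token usage and cost, and sends spans to its dashboard rather than a local list):

```python
import functools
import time

def track(func):
    """Toy tracing decorator: records input, output, and latency per call."""
    traces = []  # in Opik, spans are sent to the dashboard instead

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        traces.append({
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result

    wrapper.traces = traces  # expose captured traces for inspection
    return wrapper

@track
def my_llm_call(prompt: str) -> str:
    return f"echo: {prompt}"  # stand-in for a real model call

my_llm_call("What is RAG?")
```

After the call, `my_llm_call.traces[0]` holds the function name, input, output, and elapsed time, which is the same shape of data the Opik dashboard displays per span.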
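The evaluation workflow mentioned above can be pictured as a loop: run the task over each dataset item, score every output, and average the scores per metric. The sketch below shows that shape in plain Python with a toy exact-match metric; `run_evaluation` and `exact_match` are illustrative names, not Opik's API:

```python
def run_evaluation(dataset, task, scoring_metrics):
    """Toy evaluation loop: score each task output, average per metric."""
    totals = {name: 0.0 for name in scoring_metrics}
    for item in dataset:
        output = task(item["input"])
        for name, metric in scoring_metrics.items():
            totals[name] += metric(output, item["expected"])
    return {name: total / len(dataset) for name, total in totals.items()}

# Toy metric: 1.0 if the output matches the expected answer exactly
def exact_match(output, expected):
    return 1.0 if output == expected else 0.0

dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
answers = {"2+2": "4", "capital of France": "Lyon"}  # one right, one wrong
scores = run_evaluation(dataset, answers.get, {"exact_match": exact_match})
print(scores)  # {'exact_match': 0.5}
```

Opik's built-in metrics replace `exact_match` with LLM-judged scorers such as `Hallucination` and `AnswerRelevance`, but the dataset-task-metrics loop is the same.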
---

## Opik Features

### Tracing

```python
import opik

@opik.track
def rag_pipeline(query: str):
    docs = retrieve(query)             # Traced as child span
    context = format_docs(docs)        # Traced as child span
    answer = generate(query, context)  # Traced as child span
    return answer

# Dashboard shows full trace tree:
# rag_pipeline (2.3s)
# ├─ retrieve (0.5s) - 8 docs found
# ├─ format_docs (0.1s)
# └─ generate (1.7s) - 342 tokens, $0.005
```

### Automated Evaluation (20+ Metrics)

```python
from opik.evaluation.metrics import Hallucination, AnswerRelevance, ContextPrecision

# Evaluate your RAG pipeline
results = opik.evaluate(
    dataset="qa-test-set",
    task=rag_pipeline,
    scoring_metrics=[
        Hallucination(),
        AnswerRelevance(),
        ContextPrecision(),
    ],
)
print(results.summary())
# Hallucination: 0.12 | Relevance: 0.89 | Precision: 0.85
```

Built-in metrics:

- **Hallucination** — Detects fabricated information
- **Answer Relevance** — Does the answer match the question?
- **Context Precision** — Is the retrieved context relevant?
- **Faithfulness** — Is the answer supported by the context?
- **Moderation** — Toxicity, bias, PII detection
- **Custom** — Write your own Python scoring functions

### Dataset Management

```python
# Create evaluation datasets from production traces
dataset = opik.Dataset(name="regression-tests")
dataset.insert([
    {"input": "What is RAG?", "expected": "Retrieval Augmented Generation..."},
    {"input": "How does fine-tuning work?", "expected": "Fine-tuning adjusts..."},
])

# Run evaluations on every deployment
results = opik.evaluate(dataset=dataset, task=my_pipeline)
```

### Framework Integrations

```python
# LangChain
from opik.integrations.langchain import OpikTracer

callbacks = [OpikTracer()]
chain.invoke(input, config={"callbacks": callbacks})

# LlamaIndex
from opik.integrations.llama_index import LlamaIndexCallbackHandler

handler = LlamaIndexCallbackHandler()

# OpenAI directly
from opik.integrations.openai import track_openai

client = track_openai(OpenAI())
```

---

## FAQ

**Q: What is Opik?**

A: Opik is an open-source LLM evaluation and observability platform by Comet with 18,600+ GitHub stars. It provides tracing, 20+ automated evaluation metrics, dataset management, and production monitoring for LLM applications.

**Q: How is Opik different from Langfuse?**

A: Both provide LLM tracing and observability. Opik has stronger evaluation features (20+ built-in metrics, automated eval pipelines), while Langfuse focuses more on prompt management. Opik is backed by Comet, an established MLOps company.

**Q: Is Opik free?**

A: Yes. Opik is open source under Apache-2.0 and free to self-host. Comet also offers a managed cloud version.

---

## Source & Thanks

> Created by [Comet ML](https://github.com/comet-ml). Licensed under Apache-2.0.
>
> [opik](https://github.com/comet-ml/opik) — ⭐ 18,600+