# Opik — Debug, Evaluate & Monitor LLM Apps

> Trace LLM calls, run automated evaluations, and monitor RAG and agent quality in production. By Comet. 18K+ GitHub stars.

## Install

```bash
pip install opik
opik configure
```

## Quick Use

```python
import opik

# One-line tracing for any LLM call
@opik.track
def my_llm_call(prompt: str) -> str:
    from openai import OpenAI

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

result = my_llm_call("What is retrieval augmented generation?")
# Trace captured: input, output, latency, tokens, cost
```

Self-host the dashboard:

```bash
docker compose up -d  # from the opik repo
```

---

## Intro

Opik is an open-source LLM evaluation and observability platform by Comet with 18,600+ GitHub stars. It provides end-to-end tracing for LLM calls, automated evaluation with 20+ built-in metrics, dataset management for regression testing, and production monitoring dashboards. A single `@opik.track` decorator captures everything: inputs, outputs, latency, token usage, and costs. Opik integrates with LangChain, LlamaIndex, OpenAI, Anthropic, and major agent frameworks, giving teams full visibility into their AI application quality.

Works with: OpenAI, Anthropic, LangChain, LlamaIndex, CrewAI, Haystack, Bedrock.

Best for teams running LLM apps in production who need evaluation and monitoring. Setup time: under 3 minutes.
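To make concrete what `@opik.track` records, here is a minimal plain-Python sketch of a tracing decorator that captures input, output, and latency. This illustrates the idea only and is not Opik's implementation (Opik additionally records token usage and cost, and sends spans to its dashboard rather than a local list):

```python
import functools
import time

def track(func):
    """Toy tracing decorator: records input, output, and latency per call."""
    traces = []  # in Opik, spans are sent to the dashboard instead

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        traces.append({
            "name": func.__name__,
            "input": {"args": args, "kwargs": kwargs},
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result

    wrapper.traces = traces  # expose captured traces for inspection
    return wrapper

@track
def my_llm_call(prompt: str) -> str:
    return f"echo: {prompt}"  # stand-in for a real model call

my_llm_call("What is RAG?")
```

After the call, `my_llm_call.traces[0]` holds the function name, input, output, and elapsed time, which is the same shape of data the Opik dashboard displays per span.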
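The evaluation workflow mentioned above can be pictured as a loop: run the task over each dataset item, score every output, and average the scores per metric. The sketch below shows that shape in plain Python with a toy exact-match metric; `run_evaluation` and `exact_match` are illustrative names, not Opik's API:

```python
def run_evaluation(dataset, task, scoring_metrics):
    """Toy evaluation loop: score each task output, average per metric."""
    totals = {name: 0.0 for name in scoring_metrics}
    for item in dataset:
        output = task(item["input"])
        for name, metric in scoring_metrics.items():
            totals[name] += metric(output, item["expected"])
    return {name: total / len(dataset) for name, total in totals.items()}

# Toy metric: 1.0 if the output matches the expected answer exactly
def exact_match(output, expected):
    return 1.0 if output == expected else 0.0

dataset = [
    {"input": "2+2", "expected": "4"},
    {"input": "capital of France", "expected": "Paris"},
]
answers = {"2+2": "4", "capital of France": "Lyon"}  # one right, one wrong
scores = run_evaluation(dataset, answers.get, {"exact_match": exact_match})
print(scores)  # {'exact_match': 0.5}
```

Opik's built-in metrics replace `exact_match` with LLM-judged scorers such as `Hallucination` and `AnswerRelevance`, but the dataset-task-metrics loop is the same.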
---

## Opik Features

### Tracing

```python
import opik

@opik.track
def rag_pipeline(query: str):
    docs = retrieve(query)             # Traced as child span
    context = format_docs(docs)        # Traced as child span
    answer = generate(query, context)  # Traced as child span
    return answer

# Dashboard shows full trace tree:
# rag_pipeline (2.3s)
# ├─ retrieve (0.5s) - 8 docs found
# ├─ format_docs (0.1s)
# └─ generate (1.7s) - 342 tokens, $0.005
```

### Automated Evaluation (20+ Metrics)

```python
from opik.evaluation.metrics import Hallucination, AnswerRelevance, ContextPrecision

# Evaluate your RAG pipeline
results = opik.evaluate(
    dataset="qa-test-set",
    task=rag_pipeline,
    scoring_metrics=[
        Hallucination(),
        AnswerRelevance(),
        ContextPrecision(),
    ],
)
print(results.summary())
# Hallucination: 0.12 | Relevance: 0.89 | Precision: 0.85
```

Built-in metrics:

- **Hallucination** — Detects fabricated information
- **Answer Relevance** — Does the answer match the question?
- **Context Precision** — Is the retrieved context relevant?
- **Faithfulness** — Is the answer supported by the context?
- **Moderation** — Toxicity, bias, PII detection
- **Custom** — Write your own Python scoring functions

### Dataset Management

```python
# Create evaluation datasets from production traces
dataset = opik.Dataset(name="regression-tests")
dataset.insert([
    {"input": "What is RAG?", "expected": "Retrieval Augmented Generation..."},
    {"input": "How does fine-tuning work?", "expected": "Fine-tuning adjusts..."},
])

# Run evaluations on every deployment
results = opik.evaluate(dataset=dataset, task=my_pipeline)
```

### Framework Integrations

```python
# LangChain
from opik.integrations.langchain import OpikTracer

callbacks = [OpikTracer()]
chain.invoke(input, config={"callbacks": callbacks})

# LlamaIndex
from opik.integrations.llama_index import LlamaIndexCallbackHandler

handler = LlamaIndexCallbackHandler()

# OpenAI directly
from opik.integrations.openai import track_openai

client = track_openai(OpenAI())
```

---

## FAQ

**Q: What is Opik?**

A: Opik is an open-source LLM evaluation and observability platform by Comet with 18,600+ GitHub stars. It provides tracing, 20+ automated evaluation metrics, dataset management, and production monitoring for LLM applications.

**Q: How is Opik different from Langfuse?**

A: Both provide LLM tracing and observability. Opik has stronger evaluation features (20+ built-in metrics, automated eval pipelines), while Langfuse focuses more on prompt management. Opik is backed by Comet, an established MLOps company.

**Q: Is Opik free?**

A: Yes. Opik is open source under Apache-2.0 and free to self-host. Comet also offers a managed cloud version.

---

## Source & Thanks

> Created by [Comet ML](https://github.com/comet-ml). Licensed under Apache-2.0.
>
> [opik](https://github.com/comet-ml/opik) — ⭐ 18,600+