Configs · Apr 3, 2026 · 2 min read

Opik — Debug, Evaluate & Monitor LLM Apps

Trace LLM calls, run automated evaluations, and monitor RAG and agent quality in production. By Comet. 18K+ GitHub stars.

TL;DR
Opik traces LLM calls, runs evals, and monitors RAG quality in production.
§01

What it is

Opik is an open-source LLM observability platform by Comet that provides tracing, evaluation, and production monitoring for AI applications. It instruments LLM calls with a single decorator, runs automated quality evaluations on your outputs, and monitors RAG retrieval quality and agent behavior in production.

It targets AI engineers building production LLM applications who need to debug issues, measure output quality systematically, and catch regressions before users report them.

§02

How it saves time or tokens

Opik surfaces the root cause of quality issues faster than manual debugging. The tracing view shows exactly which step in a multi-step chain produced a bad output, with token counts, latency, and cost for each step. Automated evaluations run continuously, so you know when prompt changes improve or degrade quality. For RAG applications, Opik evaluates retrieval relevance and generation faithfulness, identifying where tokens are wasted on irrelevant context.
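
A minimal sketch of how that per-step visibility falls out of nested tracing (the function bodies here are placeholders):

import opik

@opik.track()
def retrieve(query: str) -> str:
    # Placeholder retrieval step; each traced function becomes its
    # own span with inputs, outputs, and latency in the trace view
    return 'retrieved context for: ' + query

@opik.track()
def answer(query: str) -> str:
    context = retrieve(query)  # nested call appears as a child span
    return f'Answer grounded in: {context}'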

§03

How to use

  1. Install and configure:
pip install opik
opik configure
  2. Add tracing with one decorator:
import opik

@opik.track()
def generate_answer(question: str):
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': question}]
    )
    return response.choices[0].message.content
  3. Run evaluations on your dataset:
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination, AnswerRelevance

# Opik's evaluate passes each dataset item to the task as a dict and
# expects a dict of outputs back, so wrap generate_answer accordingly
def evaluation_task(item: dict) -> dict:
    return {'output': generate_answer(item['input'])}

results = evaluate(
    experiment_name='qa-v2',
    dataset=my_dataset,
    task=evaluation_task,
    scoring_metrics=[Hallucination(), AnswerRelevance()]
)
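The call above assumes my_dataset already exists. A hedged sketch of creating one with the Opik client (field names are illustrative; check the SDK docs for your version):

import opik

client = opik.Opik()
dataset = client.get_or_create_dataset(name='qa-dataset')
# Items are plain dicts; the 'input' key is what evaluation_task reads
dataset.insert([
    {'input': 'What is retrieval-augmented generation?'},
    {'input': 'How does Opik trace nested LLM calls?'},
])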
§04

Example

import opik
from opik.evaluation import evaluate
from opik.evaluation.metrics import Hallucination, AnswerRelevance

@opik.track()
def rag_pipeline(query: str):
    # Retrieve relevant documents (vector_store is any pre-built store
    # with a similarity_search method, e.g. a LangChain vector store)
    docs = vector_store.similarity_search(query, k=3)
    context = '\n'.join([d.page_content for d in docs])

    # Generate answer with context
    from openai import OpenAI
    client = OpenAI()
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[
            {'role': 'system', 'content': f'Context: {context}'},
            {'role': 'user', 'content': query}
        ]
    )
    return response.choices[0].message.content

# Opik passes each dataset item to the task as a dict and expects
# a dict of outputs back, so wrap the pipeline accordingly
def evaluation_task(item: dict) -> dict:
    return {'output': rag_pipeline(item['input'])}

# Evaluate the RAG pipeline (test_questions is an Opik dataset
# whose items have an 'input' field)
results = evaluate(
    experiment_name='rag-eval-v1',
    dataset=test_questions,
    task=evaluation_task,
    scoring_metrics=[Hallucination(), AnswerRelevance()]
)
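
Once the run finishes, per-item scores and experiment-level aggregates appear in the Opik dashboard under rag-eval-v1, making it straightforward to compare against later experiment runs.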
§05

Common pitfalls

  • Evaluation metrics like Hallucination use an LLM judge, which adds API costs. Run evaluations on representative samples rather than entire datasets to control costs.
  • Tracing in production generates significant data volume. Configure sampling rates for high-traffic applications to keep storage and costs manageable.
  • Custom metrics require understanding of the scoring API. Start with built-in metrics (Hallucination, AnswerRelevance, Moderation) before writing custom evaluators; see the sketch below.
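
When you do outgrow the built-ins, the custom-metric pattern looks roughly like this (a sketch assuming the BaseMetric/ScoreResult API from the Opik docs; the AnswerLength heuristic is purely illustrative):

from opik.evaluation.metrics import base_metric, score_result

class AnswerLength(base_metric.BaseMetric):
    """Toy heuristic metric: rewards concise answers."""

    def __init__(self, name: str = 'answer_length'):
        super().__init__(name=name)

    def score(self, output: str, **kwargs) -> score_result.ScoreResult:
        # 1.0 for answers under 500 characters, scaled down beyond that
        value = min(1.0, 500 / max(len(output), 1))
        return score_result.ScoreResult(name=self.name, value=value)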

Frequently Asked Questions

How does Opik compare to LangSmith?

Both provide LLM tracing and evaluation. Opik is open-source and self-hostable, while LangSmith is a commercial product. Opik works with any LLM library and does not require LangChain. Both offer trace visualization, evaluation frameworks, and production monitoring.

Can I self-host Opik?

Yes. Opik is open-source and can be self-hosted using Docker. The self-hosted version includes the full tracing, evaluation, and dashboard functionality. Comet also offers a cloud-hosted version with additional features and managed infrastructure.
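
A rough sketch of local startup (the launch script and paths change between releases, so treat the repo README as authoritative):

git clone https://github.com/comet-ml/opik.git
cd opik
./opik.sh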

What evaluation metrics does Opik provide?

Opik includes built-in metrics for Hallucination, AnswerRelevance, ContextPrecision, ContextRecall, and Moderation. These use LLM-as-judge patterns to score outputs. You can also define custom metrics using Python functions for domain-specific quality criteria.

Does Opik work with RAG applications?

Yes. Opik is designed with RAG in mind. It traces both the retrieval and generation steps, evaluates retrieval relevance (ContextPrecision, ContextRecall), and checks generation faithfulness (Hallucination). This gives you end-to-end visibility into RAG pipeline quality.

Which LLM providers does Opik support?

Opik works with any LLM provider. The @opik.track() decorator wraps your existing code regardless of provider. It also provides direct integrations with LangChain, LlamaIndex, and OpenAI for automatic tracing without decorators.
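
For example, the OpenAI integration wraps the client so completion calls are traced without any decorator (a minimal sketch):

from openai import OpenAI
from opik.integrations.openai import track_openai

# The wrapped client logs every chat.completions call as a trace,
# including token usage and model metadata
client = track_openai(OpenAI())
response = client.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Hello'}]
)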

Source & Thanks

Created by Comet ML. Licensed under Apache-2.0.

opik — ⭐ 18,600+
