Is DeepEval — LLM Testing Framework with 30+ Metrics free to use?

Yes. DeepEval — LLM Testing Framework with 30+ Metrics is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install DeepEval — LLM Testing Framework with 30+ Metrics?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Scripts · Apr 1, 2026 · 2 min read

DeepEval — LLM Testing Framework with 30+ Metrics

Name: DeepEval — LLM Testing Framework with 30+ Metrics
Author: TokRepo Featured

DeepEval is a pytest-like testing framework for LLM apps with 30+ metrics covering RAG, agent, and multimodal evaluation. It has 14.4K+ GitHub stars, runs locally, and is MIT licensed.

TokRepo Featured · Community

Quick Use

Use it first, then decide how deep to go

# Install
pip install -U deepeval

# Create a test (test_llm.py)
cat > test_llm.py << 'EOF'
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What is DeepEval?",
        actual_output="DeepEval is an LLM testing framework.",
        retrieval_context=["DeepEval provides 30+ metrics for LLM evaluation."]
    )
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
EOF

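# Run with deepeval (pytest-compatible)
# Note: AnswerRelevancyMetric is scored by a judge LLM. By default this is an
# OpenAI model, so OPENAI_API_KEY must be set unless another judge is configured.
deepeval test run test_llm.py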
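
To make the pass/fail rule behind threshold=0.7 concrete, here is a minimal, library-free sketch of threshold gating. It does not use or reproduce DeepEval's internals: ToyCase, keyword_overlap_score, and passes are hypothetical names, and the word-overlap score is only a stand-in for the LLM-judged score that a metric such as AnswerRelevancyMetric would produce. It assumes a higher-is-better score in [0, 1] that passes when it meets the threshold.

```python
from dataclasses import dataclass


@dataclass
class ToyCase:
    """Hypothetical stand-in for a test case: a prompt and the model's reply."""
    input: str
    actual_output: str


def _words(text: str) -> set[str]:
    # Lowercase and strip surrounding punctuation so "DeepEval?" matches "deepeval".
    return {w.strip(".,!?").lower() for w in text.split()} - {""}


def keyword_overlap_score(case: ToyCase) -> float:
    """Toy score in [0, 1]: fraction of input words that reappear in the output."""
    asked = _words(case.input)
    if not asked:
        return 0.0
    return len(asked & _words(case.actual_output)) / len(asked)


def passes(case: ToyCase, threshold: float) -> bool:
    """A case passes when its score meets or exceeds the threshold."""
    return keyword_overlap_score(case) >= threshold


case = ToyCase(
    input="What is DeepEval?",
    actual_output="DeepEval is an LLM testing framework.",
)
score = keyword_overlap_score(case)
print(f"score={score:.2f} passes@0.5={passes(case, 0.5)} passes@0.7={passes(case, 0.7)}")
```

Two of the three input words reappear in the output, so this toy case clears a 0.5 threshold but not 0.7. The same reading applies to the real test: raising the threshold makes the assertion stricter without changing how the score is computed.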

Intro

DeepEval is an open-source testing framework for LLM applications that works like pytest but is specialized for AI evaluation. It has 14,400+ GitHub stars and an MIT license, and provides 30+ evaluation metrics, including G-Eval, RAG metrics (answer relevancy, faithfulness, contextual precision), agentic metrics (task completion, tool correctness), and multimodal evaluations. DeepEval supports component-level testing via the @observe decorator, integrates with OpenAI, LangChain, LlamaIndex, CrewAI, and Anthropic, and runs all evaluations locally on your machine.

Best for: Teams who want pytest-style testing for their LLM applications with comprehensive metrics
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Integrations: OpenAI, LangChain, LlamaIndex, CrewAI, Anthropic

Key Features

30+ metrics: G-Eval, RAG, agentic, multimodal, custom metrics
pytest-compatible: deepeval test run works like pytest
Component tracing: @observe decorator for per-component evaluation
Benchmark suite: MMLU, HellaSwag, DROP, and more in minimal code
Local execution: All metrics run on your machine
Framework support: OpenAI, LangChain, LlamaIndex, CrewAI, Anthropic
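
The "custom metrics" item above can be sketched as a small subclass. This assumes the BaseMetric pattern described in DeepEval's custom-metric documentation (a measure method, an async a_measure, is_successful, and a __name__ property), so check it against your installed version. MaxLengthMetric and its max_chars parameter are hypothetical, and the fallback base class exists only so the sketch also runs where deepeval is not installed.

```python
from types import SimpleNamespace

try:
    from deepeval.metrics import BaseMetric  # real base class when deepeval is installed
except ImportError:
    class BaseMetric:  # fallback stand-in so the sketch runs without deepeval
        pass


class MaxLengthMetric(BaseMetric):
    """Hypothetical deterministic metric: passes when the output is short enough."""

    def __init__(self, max_chars: int = 200):
        self.threshold = max_chars
        self.score = None
        self.success = None

    def measure(self, test_case) -> float:
        # Only actual_output is read, so any object with that attribute works.
        self.success = len(test_case.actual_output) <= self.threshold
        self.score = 1.0 if self.success else 0.0
        return self.score

    async def a_measure(self, test_case) -> float:
        # No I/O here, so the async variant just reuses the sync logic.
        return self.measure(test_case)

    def is_successful(self) -> bool:
        return bool(self.success)

    @property
    def __name__(self):
        return "Max Length"


# SimpleNamespace stands in for an LLMTestCase in this standalone demo.
metric = MaxLengthMetric(max_chars=40)
reply = SimpleNamespace(actual_output="DeepEval is an LLM testing framework.")
metric.measure(reply)
print(metric.__name__, metric.score, metric.is_successful())
```

With deepeval installed, an instance like this would typically be passed in the metrics list given to assert_test, alongside built-in metrics. A deterministic check of this kind needs no judge model, which makes it a cheap first gate before LLM-scored metrics.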

FAQ

Q: What is DeepEval? A: DeepEval is a pytest-like LLM testing framework with 14.4K+ GitHub stars. It offers 30+ metrics for RAG, agent, and multimodal evaluation, runs locally, and is MIT licensed.

Q: How do I install DeepEval? A: Run pip install -U deepeval, write test cases with LLMTestCase, then run them with deepeval test run.

Source & Thanks

Created by Confident AI. Licensed under MIT. confident-ai/deepeval — 14,400+ GitHub stars

Related Assets

Void — Open-Source Cursor Alternative

AI Shell — Natural Language to Shell Commands

LLM — CLI Tool for 100+ Language Models