# DeepEval — LLM Testing Framework with 30+ Metrics

> DeepEval is a pytest-like testing framework for LLM apps with 30+ metrics. 14.4K+ GitHub stars. RAG, agent, multimodal evaluation. Runs locally. MIT.

## Install & Quick Use

Install the package, save the test case below as `test_llm.py` (the heredoc writes it for you), and run it with DeepEval's pytest-compatible runner:

```bash
# Install
pip install -U deepeval

# Create a test (test_llm.py)
cat > test_llm.py << 'EOF'
from deepeval import assert_test
from deepeval.test_case import LLMTestCase
from deepeval.metrics import AnswerRelevancyMetric

def test_answer_relevancy():
    test_case = LLMTestCase(
        input="What is DeepEval?",
        actual_output="DeepEval is an LLM testing framework.",
        retrieval_context=["DeepEval provides 30+ metrics for LLM evaluation."]
    )
    metric = AnswerRelevancyMetric(threshold=0.7)
    assert_test(test_case, [metric])
EOF

# Run with deepeval (pytest-compatible)
deepeval test run test_llm.py
```

---

## Intro

DeepEval is an open-source testing framework for LLM applications that works like pytest but is specialized for AI evaluation. With 14,400+ GitHub stars and an MIT license, it provides 30+ evaluation metrics, including G-Eval, RAG metrics (answer relevancy, faithfulness, contextual precision), agentic metrics (task completion, tool correctness), and multimodal evaluations. DeepEval supports component-level testing via the `@observe` decorator, integrates with OpenAI, LangChain, LlamaIndex, CrewAI, and Anthropic, and runs all evaluations locally on your machine.
**Best for**: Teams who want pytest-style testing for their LLM applications with comprehensive metrics
**Works with**: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
**Integrations**: OpenAI, LangChain, LlamaIndex, CrewAI, Anthropic

---

## Key Features

- **30+ metrics**: G-Eval, RAG, agentic, multimodal, custom metrics
- **pytest-compatible**: `deepeval test run` works like pytest
- **Component tracing**: `@observe` decorator for per-component evaluation
- **Benchmark suite**: MMLU, HellaSwag, DROP, and more in minimal code
- **Local execution**: All metrics run on your machine
- **Framework support**: OpenAI, LangChain, LlamaIndex, CrewAI, Anthropic

---

### FAQ

**Q: What is DeepEval?**
A: DeepEval is a pytest-like LLM testing framework with 14.4K+ stars. It offers 30+ metrics for RAG, agent, and multimodal evaluation, runs locally, and is MIT licensed.

**Q: How do I install DeepEval?**
A: `pip install -U deepeval`. Write test cases with `LLMTestCase`, then run them with `deepeval test run`.

---

## Source & Thanks

> Created by [Confident AI](https://github.com/confident-ai). Licensed under MIT.
> [confident-ai/deepeval](https://github.com/confident-ai/deepeval) — 14,400+ GitHub stars

---

Source: https://tokrepo.com/en/workflows/a4d57f88-3711-4032-8ad5-f2040ae03178
Author: Script Depot