Key Features
- 30+ metrics: G-Eval, RAG, agentic, multimodal, custom metrics
- pytest-compatible: `deepeval test run` works like pytest
- Component tracing: `@observe` decorator for per-component evaluation
- Benchmark suite: MMLU, HellaSwag, DROP, and more in minimal code
- Local execution: All metrics run on your machine
- Framework support: OpenAI, LangChain, LlamaIndex, CrewAI, Anthropic
FAQ
Q: What is DeepEval?
A: DeepEval is a pytest-like LLM evaluation framework (14.4K+ GitHub stars). It provides 30+ metrics covering RAG, agents, and multimodal use cases, runs entirely on your machine, and is MIT licensed.
Q: How do I install DeepEval?
A: `pip install -U deepeval`. Write test cases with `LLMTestCase` and run them with `deepeval test run`.