Scripts · March 31, 2026 · 1 min read

Ragas — Evaluate RAG & LLM Applications

Ragas evaluates LLM applications with objective metrics, test data generation, and data-driven insights. 13.2K+ GitHub stars, RAG evaluation, automatic test generation, Apache 2.0 license.

Introduction

Ragas is a toolkit for supercharging LLM application evaluations with objective metrics, intelligent test data generation, and data-driven insights. With 13,200+ GitHub stars and an Apache 2.0 license, Ragas provides LLM-based and traditional evaluation metrics for RAG pipelines (faithfulness, answer relevancy, context precision/recall), automatic test dataset generation from your documents, seamless integration with LangChain and observability tools, and feedback loops that leverage production data for improvement.
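To make the metrics concrete: faithfulness asks what fraction of the statements in a generated answer are actually supported by the retrieved context. Ragas computes this with an LLM judge; the deterministic token-overlap sketch below (all names and sample strings are invented for illustration) only demonstrates the underlying idea, not Ragas's actual implementation.

```python
import re

def toy_faithfulness(answer: str, contexts: list[str]) -> float:
    """Fraction of answer sentences whose content words all appear in the context.

    A crude stand-in for Ragas's LLM-judged faithfulness metric.
    """
    context_words = set(re.findall(r"\w+", " ".join(contexts).lower()))
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        # Crude content-word filter: ignore short function words.
        words = [w for w in re.findall(r"\w+", sentence.lower()) if len(w) > 3]
        if words and all(w in context_words for w in words):
            supported += 1
    return supported / len(sentences)

score = toy_faithfulness(
    answer="Ragas is under the Apache license. Ragas was written in Rust.",
    contexts=["Ragas is an open-source Python toolkit under the Apache 2.0 license."],
)
print(score)  # -> 0.5: the first sentence is grounded in the context, the second is not
```

The real metric decomposes the answer into claims and asks an LLM whether each claim is inferable from the context, which handles paraphrase far better than word overlap.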

Best for: Teams building RAG pipelines who need systematic evaluation and quality measurement
Works with: Claude Code, OpenAI Codex, Cursor, Gemini CLI, Windsurf
Integrations: LangChain, LlamaIndex, Langfuse, Phoenix


Key Features

  • RAG metrics: Faithfulness, answer relevancy, context precision/recall
  • Test data generation: Auto-create evaluation datasets from documents
  • LLM + traditional metrics: Both AI-judged and deterministic scoring
  • Production feedback loops: Use real data to improve quality
  • LangChain integration: Evaluate chains and retrievers directly
  • Async scoring: Fast parallel evaluation with any LLM provider
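The metrics above consume evaluation records that pair a question with the pipeline's answer, the retrieved contexts, and a reference answer. A plain-dict sketch of one such record (field values are invented for illustration; consult the Ragas docs for the exact schema of the version you use):

```python
# One evaluation record in the general shape Ragas-style RAG metrics expect.
# Different metrics read different fields: faithfulness needs answer + contexts,
# context recall compares contexts against the ground-truth reference, etc.
sample = {
    "question": "What license does Ragas use?",
    "answer": "Ragas is released under the Apache 2.0 license.",
    "contexts": [
        "Ragas is an open-source evaluation toolkit.",
        "The project is licensed under Apache 2.0.",
    ],
    "ground_truth": "Apache 2.0",
}

# Sanity-check that all fields needed by the full metric suite are present.
required = {"question", "answer", "contexts", "ground_truth"}
missing = required - sample.keys()
print(sorted(missing))  # -> []
```

Test data generation automates producing many such records from your own documents, so you don't have to hand-write questions and references.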

FAQ

Q: What is Ragas? A: Ragas is an LLM evaluation toolkit with 13.2K+ GitHub stars. It provides objective metrics for RAG (faithfulness, relevancy), automatic test generation, and LangChain integration, under the Apache 2.0 license.

Q: How do I install Ragas? A: Run pip install ragas. For a quick start, scaffold a project with ragas quickstart rag_eval -o ./my-project.



Credits & Thanks

Created by Exploding Gradients under the Apache 2.0 license. Repository: explodinggradients/ragas — 13,200+ GitHub stars.
