# Evidently — ML & LLM Monitoring with 100+ Metrics

> Evaluate, test, and monitor AI systems with 100+ built-in metrics for data drift, model quality, and LLM output. 7.3K+ stars.

## Quick Use

```bash
pip install evidently
```

```python
from evidently.report import Report
from evidently.metric_preset import TextEvals
import pandas as pd

# Evaluate LLM outputs
data = pd.DataFrame({
    "question": ["What is RAG?", "Explain fine-tuning"],
    "answer": ["RAG is Retrieval Augmented Generation...", "Fine-tuning adjusts..."],
    "context": ["RAG combines retrieval with generation...", "Fine-tuning is a process..."],
})

report = Report(metrics=[TextEvals()])
report.run(current_data=data, reference_data=None)
report.save_html("llm_eval_report.html")
```

Launch the monitoring dashboard:

```bash
evidently ui --workspace ./my_workspace
```

---

## Intro

Evidently is an open-source ML and LLM observability framework with 7,300+ GitHub stars, providing 100+ built-in metrics for evaluating, testing, and monitoring any AI-powered system. It covers the full lifecycle, from offline evaluation (testing LLM outputs before deployment) to production monitoring (detecting data drift and quality degradation in real time). Evidently generates rich HTML reports and dashboards, integrates with CI/CD for automated testing, and works with both traditional ML models and LLM applications.

Works with: any ML model, LLM applications (OpenAI, Claude, etc.), Pandas DataFrames, MLflow, Airflow, Grafana.

Best for ML/AI teams who need comprehensive model and data monitoring. Setup time: under 3 minutes.
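The `TextEvals` preset above reduces each LLM answer to row-level statistics such as length and pattern matches. As a rough, library-free illustration of what such text descriptors measure (a hand-rolled sketch in plain Python, not Evidently's implementation; the field names are made up for this example):

```python
import re

def text_descriptors(text: str) -> dict:
    """Per-row text statistics similar in spirit to Evidently's built-in
    descriptors (TextLength, NonLetterCharacterPercentage, RegExp).

    Illustration only; keys and formulas are this sketch's own, not
    the library's API."""
    letters = sum(ch.isalpha() for ch in text)
    length = len(text)
    return {
        "length": length,
        "non_letter_pct": 100.0 * (length - letters) / max(length, 1),
        "declines_to_answer": bool(re.search(r"I don't know", text)),
    }

stats = text_descriptors("I don't know.")
# stats["length"] == 13, stats["declines_to_answer"] is True
```

Computed over a whole column of answers, statistics like these feed quality and drift checks, e.g. flagging a rising share of "I don't know" responses in production.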
---

## Evidently Capabilities

### LLM Evaluation

```python
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import (
    TextLength, Sentiment, NonLetterCharacterPercentage,
    OOVWordsPercentage, RegExp,
)

# Descriptors are computed row by row for the chosen text column
report = Report(metrics=[
    TextEvals(column_name="answer", descriptors=[
        TextLength(),
        Sentiment(),
        NonLetterCharacterPercentage(),
        OOVWordsPercentage(),
        RegExp(reg_exp=r"I don't know"),
    ]),
])
report.run(current_data=production_data, reference_data=None)
```

### Data Drift Detection

```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=training_data, current_data=production_data)
# Detects: feature distribution shifts, new categories, outliers
```

### Test Suites for CI/CD

```python
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestColumnDrift, TestShareOfMissingValues, TestMeanInNSigmas
)

suite = TestSuite(tests=[
    TestColumnDrift(column_name="prediction"),
    TestShareOfMissingValues(column_name="input", lt=0.05),
    TestMeanInNSigmas(column_name="score", n_sigmas=2),
])
suite.run(reference_data=ref, current_data=curr)

# Fail the pipeline if any test fails
assert suite.as_dict()["summary"]["all_passed"]
```

### 100+ Built-in Metrics

| Category | Metrics |
|----------|---------|
| **Data Quality** | Missing values, duplicates, outliers, data types |
| **Data Drift** | Distribution shift, feature importance drift |
| **Classification** | Accuracy, precision, recall, F1, AUC, confusion matrix |
| **Regression** | MAE, RMSE, MAPE, residuals |
| **Text/LLM** | Length, sentiment, toxicity, regex patterns, embedding drift |
| **Ranking** | NDCG, MAP, MRR |

### Monitoring Dashboard

```python
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.ui.workspace import Workspace

ws = Workspace.create("my_workspace")
project = ws.create_project("LLM App")
# Dashboard panels (e.g. text quality over time) are configured via
# project.dashboard; see the Evidently docs for the available panel types

# Add snapshots over time
for batch in daily_batches:
    report = Report(metrics=[TextEvals()])
    report.run(current_data=batch, reference_data=None)
    ws.add_report(project.id, report)
```

---

## FAQ

**Q:
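Under the hood, drift detection like `DataDriftPreset` compares the reference and current distribution of each column and flags the column when a score crosses a threshold. As a library-free sketch of one common drift score, the Population Stability Index (hand-rolled for numeric samples; Evidently itself selects statistical tests such as K-S, chi-squared, or PSI per column type):

```python
import math

def psi(reference, current, bins=10):
    """Population Stability Index between two numeric samples.

    Rule of thumb: < 0.1 is stable, > 0.25 is a significant shift.
    Illustration only, not Evidently's implementation."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant column

    def bin_shares(sample):
        counts = [0] * bins
        for x in sample:
            # Clamp out-of-range current values into the edge bins
            idx = min(max(int((x - lo) / width), 0), bins - 1)
            counts[idx] += 1
        total = len(sample)
        # Smooth empty bins so log() stays defined
        return [max(c / total, 1e-6) for c in counts]

    ref_pct, cur_pct = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r)
               for r, c in zip(ref_pct, cur_pct))
```

A PSI near 0 means the two distributions match; large values indicate the kind of feature shift that the drift preset surfaces per column in its report.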
What is Evidently?**
A: Evidently is an open-source ML/LLM monitoring framework with 7,300+ GitHub stars, providing 100+ metrics for evaluation, testing, and production monitoring of AI systems.

**Q: How is Evidently different from MLflow?**
A: MLflow tracks experiments and models. Evidently monitors model and data quality in production, detecting drift, degradation, and LLM output issues. They complement each other: MLflow for experiment tracking, Evidently for production monitoring.

**Q: Is Evidently free?**
A: Yes, it is open-source under Apache-2.0. Evidently also offers a managed cloud product.

---

## Source & Thanks

> Created by [Evidently AI](https://github.com/evidentlyai). Licensed under Apache-2.0.
>
> [evidently](https://github.com/evidentlyai/evidently) — ⭐ 7,300+

---

Source: https://tokrepo.com/en/workflows/1aa244dc-3770-4626-b1f7-26ad63e0ee0b
Author: AI Open Source