# Evidently Capabilities

## LLM Evaluation
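For intuition about what per-row text descriptors compute, here is a minimal, library-free sketch of two of them (text length and an "I don't know" regex flag). Function names here are illustrative, not Evidently's API.

```python
import re

def text_length(texts):
    """Per-row character count, analogous to a TextLength descriptor."""
    return [len(t) for t in texts]

def regexp_match(texts, pattern=r"I don't know"):
    """Per-row flag: does the text match the pattern? (cf. a RegExp descriptor)."""
    rx = re.compile(pattern)
    return [bool(rx.search(t)) for t in texts]

answers = ["Paris is the capital.", "I don't know, sorry."]
lengths = text_length(answers)    # [21, 20]
refusals = regexp_match(answers)  # [False, True]
```

Each descriptor yields one value per row, which can then be summarized or monitored over time.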
```python
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import (
    TextLength, Sentiment, NonLetterCharacterPercentage,
    OOVWordsPercentage, RegExp,
)

# Text descriptors attach per-row scores to a text column
report = Report(metrics=[
    TextEvals(column_name="answer", descriptors=[
        TextLength(),
        Sentiment(),
        NonLetterCharacterPercentage(),
        OOVWordsPercentage(),
        RegExp(reg_exp=r"I don't know"),
    ]),
])
report.run(reference_data=None, current_data=production_data)
```

## Data Drift Detection
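Under the hood, drift detection compares each feature's distribution in the current data against the reference data. As a rough illustration of one common drift score (not Evidently's implementation), here is a population stability index (PSI) over pre-binned frequency counts:

```python
import math

def psi(ref_counts, cur_counts, eps=1e-6):
    """Population stability index between two histograms over the same bins."""
    ref_total = sum(ref_counts)
    cur_total = sum(cur_counts)
    score = 0.0
    for r, c in zip(ref_counts, cur_counts):
        p = max(r / ref_total, eps)  # reference share of this bin
        q = max(c / cur_total, eps)  # current share of this bin
        score += (q - p) * math.log(q / p)
    return score

# Identical distributions -> PSI near 0; a shifted distribution -> larger score
same = psi([50, 30, 20], [500, 300, 200])
shifted = psi([50, 30, 20], [20, 30, 50])
```

A common rule of thumb treats PSI above roughly 0.2 as meaningful drift; Evidently supports several such statistical tests per column.
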
```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=training_data, current_data=production_data)
# Detects feature distribution shifts and new categories
```

## Test Suites for CI/CD
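The point of a test suite is to reduce many checks to a single pass/fail signal that can gate a deployment. Here is a minimal sketch of that gating pattern over a hypothetical summary dict (the shape is illustrative, loosely modeled on a suite's dict output):

```python
def gate(summary):
    """Return exit code 0 if all tests passed, 1 otherwise, printing failures."""
    failed = [t["name"] for t in summary["tests"] if t["status"] != "SUCCESS"]
    for name in failed:
        print(f"FAILED: {name}")
    return 1 if failed else 0

# Hypothetical result payload for illustration
summary = {
    "tests": [
        {"name": "TestColumnDrift", "status": "SUCCESS"},
        {"name": "TestShareOfMissingValues", "status": "FAIL"},
    ],
}
exit_code = gate(summary)  # prints "FAILED: TestShareOfMissingValues"
# In CI, end the job with: sys.exit(exit_code)
```

A nonzero exit code fails the pipeline step, blocking the release until the data issue is resolved.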
```python
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestColumnDrift, TestShareOfMissingValues,
    TestMeanInNSigmas,
)

suite = TestSuite(tests=[
    TestColumnDrift(column_name="prediction"),
    TestShareOfMissingValues(column_name="input", lt=0.05),
    TestMeanInNSigmas(column_name="score", n_sigmas=2),
])
suite.run(reference_data=ref, current_data=curr)

# Use in CI/CD: fail the pipeline if any test fails
assert suite.as_dict()["summary"]["all_passed"]
```

## 100+ Built-in Metrics
| Category | Metrics |
|---|---|
| Data Quality | Missing values, duplicates, outliers, data types |
| Data Drift | Distribution shift, feature importance drift |
| Classification | Accuracy, precision, recall, F1, AUC, confusion matrix |
| Regression | MAE, RMSE, MAPE, residuals |
| Text/LLM | Length, sentiment, toxicity, regex patterns, embedding drift |
| Ranking | NDCG, MAP, MRR |
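For a concrete sense of the ranking metrics in the table, here is a small self-contained computation of MRR (mean reciprocal rank). This mirrors the standard definition rather than Evidently's internals:

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """MRR: average of 1/rank of the first relevant item per query (0 if none)."""
    total = 0.0
    for results, rel in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_results)

queries = [
    (["d1", "d2", "d3"], {"d2"}),  # first relevant at rank 2 -> 1/2
    (["d4", "d5", "d6"], {"d4"}),  # first relevant at rank 1 -> 1
]
ranked, rels = zip(*queries)
mrr = mean_reciprocal_rank(ranked, rels)  # (0.5 + 1.0) / 2 = 0.75
```

NDCG and MAP follow the same per-query-then-average pattern with different per-query scores.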
## Monitoring Dashboard
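Conceptually, dashboard monitoring is just metric snapshots appended over time and read back as a series. A library-free sketch of that pattern (illustrative names, not Evidently's API):

```python
from datetime import date

class SnapshotStore:
    """Append per-batch metric snapshots; read any metric back as a time series."""
    def __init__(self):
        self.snapshots = []

    def add(self, day, metrics):
        self.snapshots.append({"date": day, **metrics})

    def series(self, metric):
        return [(s["date"], s[metric]) for s in self.snapshots]

store = SnapshotStore()
store.add(date(2024, 1, 1), {"mean_text_length": 120.5})
store.add(date(2024, 1, 2), {"mean_text_length": 98.2})
lengths = store.series("mean_text_length")
```

Evidently's workspace plays the role of the store below: each report added to a project becomes one snapshot on the dashboard.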
```python
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.ui.workspace import Workspace

ws = Workspace.create("my_workspace")
project = ws.create_project("LLM App")
# Dashboard panels (e.g., text quality over time) are configured
# on project.dashboard with panel objects

# Add snapshots over time; each report becomes a point on the dashboard
for batch in daily_batches:
    report = Report(metrics=[TextEvals(column_name="answer")])
    report.run(reference_data=None, current_data=batch)
    ws.add_report(project.id, report)
```

## FAQ
Q: What is Evidently?
A: Evidently is an open-source ML/LLM monitoring framework (7,300+ GitHub stars) providing 100+ metrics for evaluation, testing, and production monitoring of AI systems.

Q: How is Evidently different from MLflow?
A: MLflow tracks experiments and models. Evidently monitors model and data quality in production, detecting drift, degradation, and LLM output issues. They complement each other: MLflow for experiment tracking, Evidently for production monitoring.

Q: Is Evidently free?
A: Yes, it is open source under the Apache-2.0 license. Evidently also offers a managed cloud product.