Workflows · Apr 3, 2026 · 2 min read

Evidently — ML & LLM Monitoring with 100+ Metrics

Evaluate, test, and monitor AI systems with 100+ built-in metrics for data drift, model quality, and LLM output. 7.3K+ stars.

Introduction

Evidently is an open-source ML and LLM observability framework with 7,300+ GitHub stars, providing 100+ built-in metrics for evaluating, testing, and monitoring any AI-powered system. It covers the full lifecycle — from offline evaluation (test LLM outputs before deployment) to production monitoring (detect data drift and quality degradation in real-time). Evidently generates rich HTML reports and dashboards, integrates with CI/CD for automated testing, and works with both traditional ML models and LLM applications.

Works with: Any ML model, LLM applications (OpenAI, Claude, etc.), Pandas DataFrames, MLflow, Airflow, Grafana. Best for ML/AI teams who need comprehensive model and data monitoring. Setup time: under 3 minutes.


Evidently Capabilities

LLM Evaluation

from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import TextLength, Sentiment, RegExp

# Text descriptors compute per-row scores for a text column
report = Report(metrics=[
    TextEvals(column_name="answer", descriptors=[
        TextLength(),
        Sentiment(),
        RegExp(reg_exp=r"I don't know"),
    ])
])
report.run(reference_data=None, current_data=production_data)
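Under the hood, each descriptor reduces to a simple per-row or per-column score over the text. As a rough illustration in plain Python (not Evidently's actual implementation), the length and regex descriptors amount to:

```python
import re

def text_length(answers):
    """Character count per answer, like a text-length descriptor."""
    return [len(a) for a in answers]

def regex_match_share(answers, pattern):
    """Share of answers matching a pattern, like a regex descriptor."""
    matches = sum(1 for a in answers if re.search(pattern, a))
    return matches / len(answers)

answers = ["Paris is the capital.", "I don't know.", "42"]
print(text_length(answers))                         # [21, 13, 2]
print(regex_match_share(answers, r"I don't know"))  # 0.333...
```

A rising match share for refusal phrases like "I don't know" is a typical signal that LLM output quality is degrading.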

Data Drift Detection

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=training_data, current_data=production_data)
# Detects: feature distribution shifts, new categories, outliers
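For numeric columns, drift detection comes down to comparing the reference and current distributions with a two-sample statistical test (Evidently uses tests like Kolmogorov-Smirnov for smaller numeric samples). A bare-bones sketch of that comparison in plain Python, not Evidently's implementation:

```python
def ks_statistic(reference, current):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    cdf = lambda data, x: sum(1 for v in data if v <= x) / len(data)
    points = sorted(set(reference) | set(current))
    return max(abs(cdf(reference, x) - cdf(current, x)) for x in points)

training = [1.0, 1.2, 0.9, 1.1, 1.0]
production = [2.0, 2.1, 1.9, 2.2, 2.0]
stat = ks_statistic(training, production)
print(stat > 0.5)  # True: the distributions barely overlap
```

Identical samples yield a statistic of 0; fully separated samples yield 1. Evidently runs such a test per feature and flags columns whose statistic (or p-value) crosses a threshold.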

Test Suites for CI/CD

from evidently.test_suite import TestSuite
from evidently.tests import (
    TestColumnDrift, TestShareOfMissingValues,
    TestMeanInNSigmas
)

suite = TestSuite(tests=[
    TestColumnDrift(column_name="prediction"),
    TestShareOfMissingValues(column_name="input", lt=0.05),
    TestMeanInNSigmas(column_name="score", n_sigmas=2),
])
suite.run(reference_data=ref, current_data=curr)

# Use in CI/CD: fail the pipeline if any test fails
assert suite.as_dict()["summary"]["all_passed"]
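Each test encodes an explicit, assertable condition. The missing-values gate above, for example, boils down to the following check (a plain-Python illustration, not Evidently's code):

```python
def share_of_missing(values):
    """Fraction of None entries in a column."""
    return sum(1 for v in values if v is None) / len(values)

def missing_values_gate(column, lt):
    """Pass only if the missing share is strictly below the threshold."""
    return share_of_missing(column) < lt

inputs = ["ok", None, "ok", "ok", "ok"]  # 20% missing
print(missing_values_gate(inputs, lt=0.05))  # False -> CI fails
```

Because the outcome is a plain boolean, the same pattern works inside pytest, a GitHub Actions step, or an Airflow task: run the suite, assert, and let a failure block the deploy.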

100+ Built-in Metrics

| Category       | Metrics                                                       |
|----------------|---------------------------------------------------------------|
| Data Quality   | Missing values, duplicates, outliers, data types              |
| Data Drift     | Distribution shift, feature importance drift                  |
| Classification | Accuracy, precision, recall, F1, AUC, confusion matrix        |
| Regression     | MAE, RMSE, MAPE, residuals                                    |
| Text/LLM       | Length, sentiment, toxicity, regex patterns, embedding drift  |
| Ranking        | NDCG, MAP, MRR                                                |
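Most of these are standard formulas; Evidently's value is computing them consistently over reference and current data and rendering the comparison. For orientation, two of the table's entries in plain Python:

```python
def accuracy(y_true, y_pred):
    """Classification: share of exact matches."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Regression: mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(mae([2.0, 3.0], [2.5, 2.0]))           # 0.75
```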

Monitoring Dashboard

from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.ui.workspace import Workspace

ws = Workspace.create("my_workspace")
project = ws.create_project("LLM App")
# Dashboard panels (e.g. a "Text Quality Over Time" plot) are configured
# via project.dashboard.add_panel(...) with DashboardPanel objects

# Add snapshots over time
for batch in daily_batches:
    report = Report(metrics=[TextEvals(column_name="answer")])
    report.run(reference_data=None, current_data=batch)
    ws.add_report(project.id, report)

FAQ

Q: What is Evidently? A: Evidently is an open-source ML/LLM monitoring framework with 7,300+ GitHub stars providing 100+ metrics for evaluation, testing, and production monitoring of AI systems.

Q: How is Evidently different from MLflow? A: MLflow tracks experiments and models. Evidently monitors model and data quality in production — detecting drift, degradation, and LLM output issues. They complement each other: MLflow for experiment tracking, Evidently for production monitoring.

Q: Is Evidently free? A: Yes, open-source under Apache-2.0. Evidently also offers a managed cloud product.



Source and acknowledgments

Created by Evidently AI. Licensed under Apache-2.0.

evidently — ⭐ 7,300+
