# Evidently Capabilities

## LLM Evaluation
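For intuition about what per-row text descriptors compute, here is a minimal, library-free sketch of two of them (text length and an "I don't know" regex flag). Function names here are illustrative, not Evidently's API.

```python
import re

def text_length(texts):
    """Per-row character count, analogous to a TextLength descriptor."""
    return [len(t) for t in texts]

def regexp_match(texts, pattern=r"I don't know"):
    """Per-row flag: does the text match the pattern? (cf. a RegExp descriptor)."""
    rx = re.compile(pattern)
    return [bool(rx.search(t)) for t in texts]

answers = ["Paris is the capital.", "I don't know, sorry."]
lengths = text_length(answers)    # [21, 20]
refusals = regexp_match(answers)  # [False, True]
```

Each descriptor yields one value per row, which can then be summarized or monitored over time.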
```python
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.descriptors import (
    TextLength, Sentiment, NonLetterCharacterPercentage,
    OOVWordsPercentage, RegExp,
)

# Text descriptors attach per-row scores to a text column
report = Report(metrics=[
    TextEvals(column_name="answer", descriptors=[
        TextLength(),
        Sentiment(),
        NonLetterCharacterPercentage(),
        OOVWordsPercentage(),
        RegExp(reg_exp=r"I don't know"),
    ]),
])
report.run(reference_data=None, current_data=production_data)
```

## Data Drift Detection
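Under the hood, drift detection compares each feature's distribution in the current data against the reference data. As a rough illustration of one common drift score (not Evidently's implementation), here is a population stability index (PSI) over pre-binned frequency counts:

```python
import math

def psi(ref_counts, cur_counts, eps=1e-6):
    """Population stability index between two histograms over the same bins."""
    ref_total = sum(ref_counts)
    cur_total = sum(cur_counts)
    score = 0.0
    for r, c in zip(ref_counts, cur_counts):
        p = max(r / ref_total, eps)  # reference share of this bin
        q = max(c / cur_total, eps)  # current share of this bin
        score += (q - p) * math.log(q / p)
    return score

# Identical distributions -> PSI near 0; a shifted distribution -> larger score
same = psi([50, 30, 20], [500, 300, 200])
shifted = psi([50, 30, 20], [20, 30, 50])
```

A common rule of thumb treats PSI above roughly 0.2 as meaningful drift; Evidently supports several such statistical tests per column.
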
```python
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=training_data, current_data=production_data)
# Detects feature distribution shifts and new categories
```

## Test Suites for CI/CD
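The point of a test suite is to reduce many checks to a single pass/fail signal that can gate a deployment. Here is a minimal sketch of that gating pattern over a hypothetical summary dict (the shape is illustrative, loosely modeled on a suite's dict output):

```python
def gate(summary):
    """Return exit code 0 if all tests passed, 1 otherwise, printing failures."""
    failed = [t["name"] for t in summary["tests"] if t["status"] != "SUCCESS"]
    for name in failed:
        print(f"FAILED: {name}")
    return 1 if failed else 0

# Hypothetical result payload for illustration
summary = {
    "tests": [
        {"name": "TestColumnDrift", "status": "SUCCESS"},
        {"name": "TestShareOfMissingValues", "status": "FAIL"},
    ],
}
exit_code = gate(summary)  # prints "FAILED: TestShareOfMissingValues"
# In CI, end the job with: sys.exit(exit_code)
```

A nonzero exit code fails the pipeline step, blocking the release until the data issue is resolved.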
```python
from evidently.test_suite import TestSuite
from evidently.tests import (
    TestColumnDrift, TestShareOfMissingValues,
    TestMeanInNSigmas,
)

suite = TestSuite(tests=[
    TestColumnDrift(column_name="prediction"),
    TestShareOfMissingValues(column_name="input", lt=0.05),
    TestMeanInNSigmas(column_name="score", n_sigmas=2),
])
suite.run(reference_data=ref, current_data=curr)

# Use in CI/CD: fail the pipeline if any test fails
assert suite.as_dict()["summary"]["all_passed"]
```

## 100+ Built-in Metrics
| Category | Metrics |
|---|---|
| Data Quality | Missing values, duplicates, outliers, data types |
| Data Drift | Distribution shift, feature importance drift |
| Classification | Accuracy, precision, recall, F1, AUC, confusion matrix |
| Regression | MAE, RMSE, MAPE, residuals |
| Text/LLM | Length, sentiment, toxicity, regex patterns, embedding drift |
| Ranking | NDCG, MAP, MRR |
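For a concrete sense of the ranking metrics in the table, here is a small self-contained computation of MRR (mean reciprocal rank). This mirrors the standard definition rather than Evidently's internals:

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """MRR: average of 1/rank of the first relevant item per query (0 if none)."""
    total = 0.0
    for results, rel in zip(ranked_results, relevant):
        for rank, item in enumerate(results, start=1):
            if item in rel:
                total += 1.0 / rank
                break
    return total / len(ranked_results)

queries = [
    (["d1", "d2", "d3"], {"d2"}),  # first relevant at rank 2 -> 1/2
    (["d4", "d5", "d6"], {"d4"}),  # first relevant at rank 1 -> 1
]
ranked, rels = zip(*queries)
mrr = mean_reciprocal_rank(ranked, rels)  # (0.5 + 1.0) / 2 = 0.75
```

NDCG and MAP follow the same per-query-then-average pattern with different per-query scores.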
## Monitoring Dashboard
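Conceptually, dashboard monitoring is just metric snapshots appended over time and read back as a series. A library-free sketch of that pattern (illustrative names, not Evidently's API):

```python
from datetime import date

class SnapshotStore:
    """Append per-batch metric snapshots; read any metric back as a time series."""
    def __init__(self):
        self.snapshots = []

    def add(self, day, metrics):
        self.snapshots.append({"date": day, **metrics})

    def series(self, metric):
        return [(s["date"], s[metric]) for s in self.snapshots]

store = SnapshotStore()
store.add(date(2024, 1, 1), {"mean_text_length": 120.5})
store.add(date(2024, 1, 2), {"mean_text_length": 98.2})
lengths = store.series("mean_text_length")
```

Evidently's workspace plays the role of the store below: each report added to a project becomes one snapshot on the dashboard.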
```python
from evidently.report import Report
from evidently.metric_preset import TextEvals
from evidently.ui.workspace import Workspace

ws = Workspace.create("my_workspace")
project = ws.create_project("LLM App")
# Dashboard panels (e.g., text quality over time) are configured
# on project.dashboard with panel objects

# Add snapshots over time; each report becomes a point on the dashboard
for batch in daily_batches:
    report = Report(metrics=[TextEvals(column_name="answer")])
    report.run(reference_data=None, current_data=batch)
    ws.add_report(project.id, report)
```

## FAQ
Q: What is Evidently?
A: Evidently is an open-source ML/LLM monitoring framework (7,300+ GitHub stars) providing 100+ metrics for evaluation, testing, and production monitoring of AI systems.

Q: How is Evidently different from MLflow?
A: MLflow tracks experiments and models. Evidently monitors model and data quality in production, detecting drift, degradation, and LLM output issues. They complement each other: MLflow for experiment tracking, Evidently for production monitoring.

Q: Is Evidently free?
A: Yes, it is open source under the Apache-2.0 license. Evidently also offers a managed cloud product.