Skills · May 10, 2026 · 3 min read

ydata-profiling — Automated Data Quality Profiling for DataFrames

ydata-profiling (formerly pandas-profiling) generates comprehensive HTML reports from pandas or Spark DataFrames, covering statistics, correlations, missing values, duplicates, and data type analysis.

Universal CLI command
npx tokrepo install 6c8a5473-4c49-11f1-9bc6-00163e2b0d79

Introduction

ydata-profiling generates detailed exploratory data analysis reports from a single function call. Originally known as pandas-profiling, it automates the repetitive first step of any data science workflow by surfacing statistics, distributions, correlations, and data quality issues in an interactive HTML report.
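As a minimal sketch of that single call, assuming ydata-profiling and pandas are installed: the helper name `write_profile`, the default file name, and the default title are illustrative choices, not part of the library.

```python
def write_profile(df, out_path="report.html", title="Data Profile"):
    """Profile a pandas DataFrame and save the report to out_path."""
    # Imported lazily so this module still loads where the library is absent.
    from ydata_profiling import ProfileReport

    report = ProfileReport(df, title=title)
    report.to_file(out_path)  # a .html path produces the standalone HTML report
    return out_path
```

A typical call would be `write_profile(pd.read_csv("data.csv"))`, where `data.csv` is a placeholder for your own file.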

What ydata-profiling Does

  • Computes descriptive statistics for every column including mean, median, quantiles, and unique counts
  • Detects missing values, zeros, and constant columns with visual indicators
  • Calculates pairwise correlations using Pearson, Spearman, Kendall, and Phik methods
  • Identifies exact duplicate rows and reports how often each one occurs
  • Renders an interactive HTML report with collapsible sections and histograms
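To make the per-column checks in the list above concrete, here is a conceptual sketch in plain Python. It is not the library's code: the function names and the choice of `None` as the missing-value marker are assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median


def summarize_numeric(values):
    """Summarize one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    distinct = len(set(present))
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "zeros": sum(1 for v in present if v == 0),
        "distinct": distinct,
        "mean": mean(present) if present else None,
        "median": median(present) if present else None,
        "constant": distinct <= 1,  # a column with at most one distinct value
    }


def count_duplicate_rows(rows):
    """Count rows that repeat an earlier row exactly."""
    counts = Counter(tuple(row) for row in rows)
    return sum(n - 1 for n in counts.values() if n > 1)
```

The real report computes far more per column (quantiles, histograms, alerts); the sketch only shows the kind of bookkeeping involved.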

Architecture Overview

The library iterates over DataFrame columns, infers types (numeric, categorical, datetime, text, image), and dispatches type-specific analysis routines. Results are collected into a description dictionary, then rendered through a Jinja2 HTML template. For large datasets, it supports minimal mode to skip expensive computations and Spark DataFrames for distributed profiling.
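The infer-then-dispatch flow described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: the type rules, the `ANALYZERS` table, and all function names are hypothetical, and the template rendering step is omitted.

```python
from collections import Counter
from datetime import datetime
from statistics import mean


def infer_type(values):
    """Guess a coarse column type from the non-missing values."""
    present = [v for v in values if v is not None]
    if present and all(isinstance(v, (int, float)) for v in present):
        return "numeric"
    if present and all(isinstance(v, datetime) for v in present):
        return "datetime"
    return "categorical"


def describe_numeric(values):
    present = [v for v in values if v is not None]
    return {"min": min(present), "max": max(present), "mean": mean(present)}


def describe_datetime(values):
    present = [v for v in values if v is not None]
    return {"earliest": min(present), "latest": max(present)}


def describe_categorical(values):
    present = [v for v in values if v is not None]
    top = Counter(present).most_common(1)
    return {"distinct": len(set(present)), "top": top[0][0] if top else None}


# Dispatch table: inferred type -> type-specific analysis routine.
ANALYZERS = {
    "numeric": describe_numeric,
    "datetime": describe_datetime,
    "categorical": describe_categorical,
}


def describe(table):
    """Build a description dictionary from a {column name: list of values} mapping."""
    description = {}
    for name, values in table.items():
        kind = infer_type(values)
        summary = ANALYZERS[kind](values)
        description[name] = {"type": kind, "missing": values.count(None), **summary}
    return description
```

Keeping the analyzers in a lookup table means a new column type only needs one inference rule and one routine, which mirrors the per-type dispatch the paragraph describes.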

Self-Hosting & Configuration

  • Install via pip; optional extras include Spark support and image analysis
  • Pass a config object to control which analyses run and set thresholds
  • Use minimal=True for datasets with more than one million rows to reduce runtime
  • Export reports as HTML, JSON, or inline widgets in Jupyter notebooks
  • Integrate with Great Expectations by converting profiles to expectation suites (check that your installed release still provides this conversion)
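The options above might be combined as follows. This is a sketch assuming ydata-profiling is installed: `choose_options`, `export_profile`, the file stem, and the one-million-row cutoff taken from the list are illustrative, not library defaults.

```python
def choose_options(n_rows, large_threshold=1_000_000):
    """Pick ProfileReport keyword arguments based on dataset size."""
    options = {"title": "Data Profile"}
    if n_rows > large_threshold:
        # Minimal mode skips the most expensive computations.
        options["minimal"] = True
    return options


def export_profile(df, stem="profile"):
    """Write both an HTML and a JSON version of the report."""
    # Imported lazily so choose_options stays usable without the library.
    from ydata_profiling import ProfileReport

    report = ProfileReport(df, **choose_options(len(df)))
    report.to_file(f"{stem}.html")
    report.to_file(f"{stem}.json")
    return report
```

Inside a Jupyter notebook, calling `to_widgets()` or `to_notebook_iframe()` on the returned report displays it inline instead of writing files.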

Key Features

  • One-line report generation from any pandas or Spark DataFrame
  • Type inference engine adapts analysis to numeric, categorical, datetime, boolean, text, and URL columns
  • Comparison mode diffs two datasets side by side for drift detection
  • Sensitive mode (sensitive=True) keeps individual values of categorical and text columns out of the report, which helps when sharing profiles of private data
  • Time-series mode adds autocorrelation and seasonality analysis for temporal data
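Comparison mode and time-series mode from the list above could be used roughly like this. The sketch assumes ydata-profiling is installed; the wrapper names, report titles, and output path are hypothetical.

```python
def compare_datasets(reference_df, current_df, out_path="comparison.html"):
    """Profile two DataFrames and write a side-by-side comparison report."""
    from ydata_profiling import ProfileReport

    reference = ProfileReport(reference_df, title="Reference")
    current = ProfileReport(current_df, title="Current")
    comparison = reference.compare(current)
    comparison.to_file(out_path)
    return comparison


def profile_time_series(df, time_column):
    """Profile a DataFrame in time-series mode, ordered by time_column."""
    from ydata_profiling import ProfileReport

    return ProfileReport(df, tsmode=True, sortby=time_column)
```

For drift checks, the reference frame would usually be the training data and the current frame a more recent batch with the same columns.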

Comparison with Similar Tools

  • pandas describe() — basic statistics only; ydata-profiling adds visualizations, correlations, and alerts
  • Sweetviz — similar HTML reports; ydata-profiling supports Spark and offers more configuration
  • D-Tale — interactive DataFrame browser; ydata-profiling produces static shareable reports
  • Great Expectations — focuses on validation rules; ydata-profiling focuses on exploratory analysis
  • Lux — auto-visualization in Jupyter; ydata-profiling generates comprehensive standalone reports

FAQ

Q: How long does a report take to generate? A: A few seconds for datasets under 100K rows. Use minimal=True or sampling for million-row datasets.

Q: Can I customize which sections appear in the report? A: Yes. Pass configuration options to ProfileReport, either as keyword arguments or through a configuration file, to disable correlations, duplicates, or specific variable analyses.

Q: Does it work with PySpark DataFrames? A: Yes. Install the spark extra and pass a Spark DataFrame directly. Profiling runs distributed across the cluster.

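Q: Can I embed the report in a web application? A: Export to HTML and serve it statically, or call to_html() and embed the returned markup in your own page, for example through an HTML component in Streamlit. Inside a Jupyter notebook, to_widgets() renders the report inline.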
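One way to embed a report in an existing page is to place its HTML in an iframe. The sketch below assumes ydata-profiling is installed for `report_iframe`; both function names and the default height are illustrative.

```python
import html


def iframe_for(report_html, height=800):
    """Wrap a full HTML document in an iframe using the srcdoc attribute."""
    # Escaping keeps quotes and angle brackets from breaking the attribute.
    escaped = html.escape(report_html, quote=True)
    return f'<iframe srcdoc="{escaped}" width="100%" height="{height}"></iframe>'


def report_iframe(df, height=800):
    """Render a profile of df and return it as an embeddable iframe tag."""
    from ydata_profiling import ProfileReport

    return iframe_for(ProfileReport(df).to_html(), height=height)
```

An iframe keeps the report's own styles and scripts isolated from the host page, which is usually simpler than merging the markup directly.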
