Scripts · April 9, 2026 · 1 min read

Great Expectations — Data Validation for AI Pipelines

Test your data like you test code. Validate data quality in AI/ML pipelines with expressive assertions, auto-profiling, and data docs. Apache-2.0, 11,400+ stars.

Introduction

Great Expectations is an open-source data validation framework with 11,400+ GitHub stars. It lets you write data tests the way you write unit tests, so data quality issues are caught before they break your AI/ML pipelines. It ships 300+ built-in expectations, auto-profiling, and auto-generated data docs, and works with Pandas, Spark, SQL databases, and more. It is aimed at data engineers and ML practitioners building production data pipelines.
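To make the "data tests as unit tests" idea concrete, here is a minimal, library-free sketch in plain Python. The function names are borrowed from the names of Great Expectations' built-in expectations, but the signatures, the `ExpectationResult` class, and the sample rows are assumptions made for illustration. This is not the library's API.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ExpectationResult:
    """Outcome of one check: pass/fail plus indices of offending rows."""
    expectation: str
    success: bool
    unexpected_rows: list = field(default_factory=list)


def expect_column_values_to_not_be_null(rows, column):
    # A row fails when the column is missing or set to None.
    bad = [i for i, row in enumerate(rows) if row.get(column) is None]
    return ExpectationResult(f"{column} is not null", not bad, bad)


def expect_column_values_to_be_between(rows, column, min_value, max_value):
    # Nulls are skipped here; the not-null check covers them separately.
    bad = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None
        and not (min_value <= row[column] <= max_value)
    ]
    return ExpectationResult(
        f"{column} between {min_value} and {max_value}", not bad, bad
    )


def expect_column_values_to_match_regex(rows, column, regex):
    pattern = re.compile(regex)
    bad = [
        i for i, row in enumerate(rows)
        if row.get(column) is not None
        and pattern.search(row[column]) is None
    ]
    return ExpectationResult(f"{column} matches {regex}", not bad, bad)


# Hypothetical sample batch with three deliberate defects.
rows = [
    {"user_id": 1, "age": 34, "email": "ana@example.com"},
    {"user_id": 2, "age": -5, "email": "bob@example.com"},
    {"user_id": 3, "age": 51, "email": "not-an-email"},
    {"user_id": None, "age": 28, "email": "dee@example.com"},
]

results = [
    expect_column_values_to_not_be_null(rows, "user_id"),
    expect_column_values_to_not_be_null(rows, "age"),
    expect_column_values_to_be_between(rows, "age", 0, 120),
    expect_column_values_to_match_regex(
        rows, "email", r"^[^@\s]+@[^@\s]+\.[^@\s]+$"
    ),
]

for result in results:
    status = "PASS" if result.success else "FAIL"
    print(f"{status}  {result.expectation}  rows={result.unexpected_rows}")
```

Each check returns a structured result rather than raising, so a pipeline step can collect all failures for a batch and decide afterwards whether to stop or only report.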


Great Expectations — Test Data Like You Test Code

Core Features

  • 300+ built-in expectations — null checks, range validation, regex matching, statistical tests
  • Auto-profiling — generate expectations from sample data
  • Data docs — auto-generated HTML data quality reports
  • Multi-backend — Pandas, Spark, PostgreSQL, BigQuery, and more
  • Pipeline integration — Airflow, Dagster, Prefect, dbt
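The auto-profiling feature listed above can be pictured as "look at a sample, then propose checks that the sample already satisfies". The sketch below is a simplified stand-in written in plain Python, not the profiler Great Expectations ships: the `profile` and `profile_column` helpers, the dictionary format of the suggestions, and the sample rows are all hypothetical.

```python
def profile_column(values):
    """Propose simple expectations that hold for the observed sample."""
    present = [v for v in values if v is not None]
    suggestions = []

    # Suggest a not-null check only if the sample had no missing values.
    if values and len(present) == len(values):
        suggestions.append({"type": "expect_column_values_to_not_be_null"})

    # Suggest a range check for purely numeric columns (bools excluded).
    is_numeric = present and all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in present
    )
    if is_numeric:
        suggestions.append({
            "type": "expect_column_values_to_be_between",
            "min_value": min(present),
            "max_value": max(present),
        })
    return suggestions


def profile(rows):
    """Profile every column that appears in any row of the sample."""
    columns = sorted({key for row in rows for key in row})
    return {
        column: profile_column([row.get(column) for row in rows])
        for column in columns
    }


# Hypothetical sample used to bootstrap a suite of checks.
sample = [
    {"age": 34, "country": "PT"},
    {"age": 51, "country": None},
    {"age": 28, "country": "BR"},
]

suggested = profile(sample)
for column, checks in suggested.items():
    print(column, checks)
```

Suggestions derived this way only describe the sample, so they are best treated as a starting draft to review and loosen or tighten by hand before they gate a production pipeline.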

FAQ

Q: What is Great Expectations? A: An open-source data validation framework that lets you write data assertions like unit tests to catch issues before they impact AI/ML pipelines.

Q: Is it free? A: The open-source core is free (Apache-2.0); a paid cloud version is also available.


Source and Acknowledgements

Created by Great Expectations. Licensed under Apache-2.0.

great_expectations — ⭐ 11,400+
