Scripts · April 12, 2026 · 1 min read

Polars — Blazingly Fast DataFrame Library in Rust

Polars is an extremely fast DataFrame library written in Rust with bindings for Python, Node.js, and R. It uses the Apache Arrow columnar format, lazy evaluation, and multi-threaded query execution, and is positioned as a modern alternative to pandas for data engineering and analytics.

Script Depot · Community
Quick Start


# Install from a shell first: pip install polars
import polars as pl

# Create DataFrame
df = pl.DataFrame({
    "repo": ["react", "vue", "svelte", "angular", "solid"],
    "stars": [230000, 210000, 82000, 98000, 35000],
    "language": ["JS", "JS", "JS", "TS", "TS"],
})

# Query (eager)
result = (
    df.filter(pl.col("stars") > 50000)
    .sort("stars", descending=True)
    .select("repo", "stars")
)
print(result)

# Lazy evaluation (optimized)
lazy_result = (
    pl.scan_parquet("assets.parquet")
    .filter(pl.col("stars") > 10000)
    .group_by("language")
    .agg([
        pl.col("stars").mean().alias("avg_stars"),
        pl.col("repo").count().alias("count"),
    ])
    .sort("avg_stars", descending=True)
    .collect()
)

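# Read various formats (the file names here are placeholders)
df = pl.read_csv("data.csv")
df = pl.read_parquet("data.parquet")
df = pl.read_json("data.json")
# `connection` must be an existing database connection, e.g. a SQLAlchemy engine
df = pl.read_database("SELECT * FROM assets", connection)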
Introduction

Polars is an extremely fast DataFrame library written in Rust with first-class Python, Node.js, and R bindings. It uses the Apache Arrow columnar memory format, lazy evaluation with query optimization, and multi-threaded execution, and is designed as a modern, high-performance alternative to pandas. It was created by Ritchie Vink.

What Polars Does

  • Eager and lazy evaluation — choose per query
  • Query optimization — predicate pushdown, projection pushdown, common subexpression elimination
  • Multi-threaded — parallel execution on all cores
  • Arrow-native — Apache Arrow columnar format, zero-copy
  • Streaming — process larger-than-RAM datasets
  • Expressions — composable, type-safe column expressions
  • IO — CSV, Parquet, JSON, Arrow IPC, Avro, databases, cloud storage (S3, GCS, Azure)
  • SQL interface — pl.SQLContext for SQL queries on DataFrames
  • Group by — fast aggregation with rich expression API
  • Window functions — rolling, expanding, partition-based

Architecture

The core is written in Rust, with Python bindings via PyO3. In lazy mode, a query is built into a logical plan, rewritten by the optimizer, turned into a physical plan, and then executed in parallel. Data is stored in Apache Arrow chunked arrays for cache-friendly, SIMD-accelerated operations.

Comparison

Library          Language        Speed               Lazy eval  Memory model
Polars           Rust + Python   Fastest             Yes        Arrow
pandas           Python (C ext)  Slow                No         NumPy
Spark DataFrame  Scala/Python    Fast (distributed)  Yes        JVM
DuckDB           C++             Very fast           Yes        Columnar
Vaex             C++ + Python    Fast                Yes        Memory-mapped

FAQ

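Q: Polars vs pandas? A: Polars is 5-100x faster on almost all benchmarks, depending on the workload (multi-threaded Rust versus largely single-threaded pandas). The APIs are not compatible, but Polars' expression API is more consistent and has fewer pitfalls. Polars is recommended for new projects.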

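Q: How much data can it handle? A: Lazy mode combined with streaming can process datasets far larger than memory. TB-scale Parquet files on a single machine are workable.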

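Q: How does it compare with DuckDB? A: Polars is a DataFrame library (used mainly through its Python API), while DuckDB is a SQL database engine. Both are fast, and they can be used together.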
