Apr 12, 2026

Polars — Blazingly Fast DataFrame Library in Rust

Polars is an extremely fast DataFrame library written in Rust, with bindings for Python, Node.js, and R. It uses the Apache Arrow columnar format, lazy evaluation, and multi-threaded query execution, and is positioned as a modern alternative to pandas for data engineering and analytics.

Script Depot · Community
Quick Use

```shell
pip install polars
```

```python
import sqlite3

import polars as pl

# Create a DataFrame
df = pl.DataFrame({
    "repo": ["react", "vue", "svelte", "angular", "solid"],
    "stars": [230000, 210000, 82000, 98000, 35000],
    "language": ["JS", "JS", "JS", "TS", "TS"],
})

# Query (eager): runs immediately
result = (
    df.filter(pl.col("stars") > 50000)
    .sort("stars", descending=True)
    .select("repo", "stars")
)
print(result)

# Lazy evaluation (optimized): scan_parquet only builds a query plan;
# nothing is read until collect() runs it
lazy_result = (
    pl.scan_parquet("assets.parquet")
    .filter(pl.col("stars") > 10000)
    .group_by("language")
    .agg([
        pl.col("stars").mean().alias("avg_stars"),
        pl.col("repo").count().alias("count"),
    ])
    .sort("avg_stars", descending=True)
    .collect()
)

# Read various formats
df = pl.read_csv("data.csv")
df = pl.read_parquet("data.parquet")
df = pl.read_json("data.json")

# read_database needs a connection object, e.g. a DB-API 2.0
# connection (sqlite3 here) or a SQLAlchemy engine
connection = sqlite3.connect("assets.db")
df = pl.read_database("SELECT * FROM assets", connection)
```
Intro

Polars is an extremely fast DataFrame library written in Rust, with first-class Python, Node.js, and R bindings. It uses the Apache Arrow columnar memory format, lazy evaluation with query optimization, and multi-threaded execution, and is designed as a modern, high-performance alternative to pandas. It was created by Ritchie Vink.

What Polars Does

  • Eager and lazy evaluation — choose per query
  • Query optimization (lazy mode) — predicate pushdown, projection pushdown, common subexpression elimination
  • Multi-threaded — parallel execution on all cores
  • Arrow-native — Apache Arrow columnar format, with zero-copy interchange with other Arrow-based tools where possible
  • Streaming — process larger-than-RAM datasets
  • Expressions — composable, type-safe column expressions
  • IO — CSV, Parquet, JSON, Arrow IPC, Avro, databases, cloud storage (S3, GCS, Azure)
  • SQL interface — pl.SQLContext for SQL queries on DataFrames
  • Group by — fast aggregation with rich expression API
  • Window functions — rolling, expanding, partition-based

Architecture

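Polars has a Rust core with Python bindings built on PyO3. In lazy mode, a query is first built as a logical plan, rewritten by the optimizer, converted to a physical plan, and then executed in parallel. Data is stored in Apache Arrow-format chunked arrays, which keeps memory access cache-friendly and allows SIMD-accelerated operations.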

Comparison

| Library | Language | Relative speed | Lazy | Memory model |
|---|---|---|---|---|
| Polars | Rust + Python | Fastest | Yes | Arrow |
| pandas | Python (C ext) | Slow | No | NumPy |
| Spark DataFrame | Scala/Python | Fast (distributed) | Yes | JVM |
| DuckDB | C++ | Very fast | Yes | Columnar |
| Vaex | C++ + Python | Fast | Yes | Memory-mapped |

FAQ

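Q: Polars vs pandas? A: On most benchmarks Polars is 5-100x faster than pandas, largely because its Rust engine is multi-threaded while pandas executes mostly on a single thread. The APIs are not compatible, but Polars' expression API is more consistent and has fewer pitfalls. Polars is recommended for new projects.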

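Q: How much data can it handle? A: In lazy + streaming mode, Polars can process datasets far larger than available memory; TB-scale Parquet files on a single machine are workable.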

Q: How does it compare with DuckDB? A: Polars is a DataFrame library (used mainly through its Python API), while DuckDB is a SQL database engine. Both are fast and can be used together.
