# Polars — Blazingly Fast DataFrame Library in Rust

> Polars is a fast DataFrame library written in Rust with bindings for Python, Node.js, and R. It uses the Apache Arrow columnar format, lazy evaluation, and multi-threaded query execution, and is positioned as a modern alternative to pandas for data engineering and analytics.

## Install

```bash
pip install polars
```

## Quick Use

Save as a script file and run:

```python
import polars as pl

# Create DataFrame
df = pl.DataFrame({
    "repo": ["react", "vue", "svelte", "angular", "solid"],
    "stars": [230000, 210000, 82000, 98000, 35000],
    "language": ["JS", "JS", "JS", "TS", "TS"],
})

# Query (eager)
result = (
    df.filter(pl.col("stars") > 50000)
    .sort("stars", descending=True)
    .select("repo", "stars")
)
print(result)

# Lazy evaluation (optimized); assumes assets.parquet exists
# with "repo", "stars", and "language" columns
lazy_result = (
    pl.scan_parquet("assets.parquet")
    .filter(pl.col("stars") > 10000)
    .group_by("language")
    .agg([
        pl.col("stars").mean().alias("avg_stars"),
        pl.col("repo").count().alias("count"),
    ])
    .sort("avg_stars", descending=True)
    .collect()
)

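# Read various formats (file names are placeholders)
df = pl.read_csv("data.csv")
df = pl.read_parquet("data.parquet")
df = pl.read_json("data.json")
# `connection` is an already-open database connection, not defined in this snippet
df = pl.read_database("SELECT * FROM assets", connection)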
```

## Intro

Polars is a fast DataFrame library written in Rust with first-class Python, Node.js, and R bindings. It uses the Apache Arrow columnar memory format, lazy evaluation with query optimization, and multi-threaded execution, and is designed as a modern, high-performance alternative to pandas. It was created by Ritchie Vink.

- **Repo**: https://github.com/pola-rs/polars
- **Stars**: 38K+
- **Language**: Rust
- **License**: MIT

## What Polars Does

- **Eager and lazy evaluation** — choose per query
- **Query optimization** — predicate pushdown, projection pushdown, common subexpression elimination
- **Multi-threaded** — parallel execution on all cores
- **Arrow-native** — Apache Arrow columnar format, zero-copy
- **Streaming** — process larger-than-RAM datasets
- **Expressions** — composable, type-safe column expressions
- **IO** — CSV, Parquet, JSON, Arrow IPC, Avro, databases, cloud storage (S3, GCS, Azure)
- **SQL interface** — `pl.SQLContext` for SQL queries on DataFrames
- **Group by** — fast aggregation with rich expression API
- **Window functions** — rolling, expanding, partition-based

## Architecture

The core is written in Rust, with Python bindings via PyO3. In lazy mode, a query is first built as a logical plan, rewritten by the optimizer, translated into a physical plan, and then executed in parallel. Data is stored in Apache Arrow chunked arrays, which allows cache-friendly, SIMD-accelerated operations.

## Comparison

| Library | Implementation language | Relative speed | Lazy evaluation | Memory model |
|---|---|---|---|---|
| Polars | Rust + Python | Fastest | Yes | Arrow |
| pandas | Python (C ext) | Slow | No | NumPy |
| Spark DataFrame | Scala/Python | Fast (distributed) | Yes | JVM |
| DuckDB | C++ | Very fast | Yes | Columnar |
| Vaex | C++ + Python | Fast | Yes | Memory-mapped |

## FAQ

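**Q: Polars vs pandas?**
A: Polars is 5-100x faster on almost all benchmarks (multi-threaded Rust versus largely single-threaded pandas). The APIs are not compatible, but Polars' expression API is more consistent and has fewer pitfalls. Polars is recommended for new projects.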

**Q: How much data can it handle?**
A: Lazy plus streaming mode can process datasets far larger than memory. TB-scale Parquet files on a single machine are workable.

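**Q: How does it compare with DuckDB?**
A: Polars is a DataFrame library (primarily used through its Python API), while DuckDB is a SQL database engine. Both are fast, and they can be used together.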

## Sources

- Docs: https://docs.pola.rs
- GitHub: https://github.com/pola-rs/polars
- License: MIT

---
Source: https://tokrepo.com/en/workflows/903325aa-3649-11f1-9bc6-00163e2b0d79
Author: Script Depot