# Polars: Blazingly Fast DataFrame Library in Rust

> Polars is an extremely fast DataFrame library written in Rust with bindings for Python, Node.js, and R. It uses the Apache Arrow columnar format, lazy evaluation, and multi-threaded query execution. The modern alternative to pandas for data engineering and analytics.

## Install

Save the content below to `.claude/skills/` or append it to your `CLAUDE.md`.

## Quick Use

```bash
pip install polars
```

```python
import polars as pl

# Create a DataFrame
df = pl.DataFrame({
    "repo": ["react", "vue", "svelte", "angular", "solid"],
    "stars": [230000, 210000, 82000, 98000, 35000],
    "language": ["JS", "JS", "JS", "TS", "TS"],
})

# Query (eager): each step runs immediately
result = (
    df.filter(pl.col("stars") > 50000)
    .sort("stars", descending=True)
    .select("repo", "stars")
)
print(result)

# Lazy evaluation (optimized): nothing runs until .collect()
# Assumes assets.parquet exists with "repo", "stars", and "language" columns
lazy_result = (
    pl.scan_parquet("assets.parquet")
    .filter(pl.col("stars") > 10000)
    .group_by("language")
    .agg([
        pl.col("stars").mean().alias("avg_stars"),
        pl.col("repo").count().alias("count"),
    ])
    .sort("avg_stars", descending=True)
    .collect()
)

# Read various formats (the files below are placeholders)
df = pl.read_csv("data.csv")
df = pl.read_parquet("data.parquet")
df = pl.read_json("data.json")
# `connection` is an existing database connection (for example a SQLAlchemy engine); it is not defined here
df = pl.read_database("SELECT * FROM assets", connection)
```

## Intro

Polars is an extremely fast DataFrame library written in Rust with first-class Python, Node.js, and R bindings. It uses the Apache Arrow columnar memory format, lazy evaluation with query optimization, and multi-threaded execution. It is designed as a modern, high-performance alternative to pandas. Created by Ritchie Vink.

- **Repo**: https://github.com/pola-rs/polars
- **Stars**: 38K+
- **Language**: Rust
- **License**: MIT

## What Polars Does

- **Eager and lazy evaluation**: choose per query
- **Query optimization**: predicate pushdown, projection pushdown, common subexpression elimination
- **Multi-threaded**: parallel execution across all available cores
- **Arrow-native**: Apache Arrow columnar format, zero-copy
- **Streaming**: process larger-than-RAM datasets
- **Expressions**: composable, type-safe column expressions
- **IO**: CSV, Parquet, JSON, Arrow IPC, Avro, databases, cloud storage (S3, GCS, Azure)
- **SQL interface**: `pl.SQLContext` for SQL queries on DataFrames
- **Group by**: fast aggregation with a rich expression API
- **Window functions**: rolling, expanding, partition-based

## Architecture

Polars has a Rust core with Python bindings built on PyO3. In lazy mode, a query is first turned into a logical plan, which the optimizer rewrites into a physical plan that is then executed in parallel. Data is stored in Apache Arrow chunked arrays for cache-friendly, SIMD-accelerated operations.

## Comparison

| Library | Language | Speed | Lazy | Memory |
|---|---|---|---|---|
| Polars | Rust + Python | Fastest | Yes | Arrow |
| pandas | Python (C ext) | Slow | No | NumPy |
| Spark DataFrame | Scala/Python | Fast (distributed) | Yes | JVM |
| DuckDB | C++ | Very fast | Yes | Columnar |
| Vaex | C++ + Python | Fast | Yes | Memory-mapped |

## FAQ

**Q: Polars vs pandas?**
A: Polars is often 5-100x faster, depending on the workload, largely because its Rust engine is multi-threaded while most pandas operations run on a single thread. The APIs are not compatible, but Polars' expression API is more consistent and has fewer pitfalls. Polars is recommended for new projects.

**Q: How large of a dataset can it handle?**
A: Lazy + streaming mode can process datasets far larger than memory. This makes TB-scale Parquet data workable on a single machine, provided the query can run in streaming fashion.

**Q: Compared to DuckDB?**
A: Polars is a DataFrame library (Python-first API); DuckDB is a SQL database engine. Both are fast and can complement each other.

## Sources

- Docs: https://docs.pola.rs
- GitHub: https://github.com/pola-rs/polars
- License: MIT

---

Source: https://tokrepo.com/en/workflows/polars-blazingly-fast-dataframe-library-rust-903325aa
Author: Script Depot