Scripts · Apr 12, 2026 · 2 min read

Polars — Blazingly Fast DataFrame Library in Rust

Polars is an extremely fast DataFrame library written in Rust, with bindings for Python, Node.js, and R. It uses the Apache Arrow columnar format, lazy evaluation, and multi-threaded query execution, and is positioned as the modern alternative to pandas for data engineering and analytics.

Introduction

Polars is an extremely fast DataFrame library written in Rust, with first-class Python, Node.js, and R bindings. It uses the Apache Arrow columnar memory format, lazy evaluation with query optimization, and multi-threaded execution, and is designed as a modern, high-performance alternative to pandas. It was created by Ritchie Vink.

What Polars Does

  • Eager and lazy evaluation — choose per query
  • Query optimization — predicate pushdown, projection pushdown, common subexpression elimination
  • Multi-threaded — parallel execution on all cores
  • Arrow-native — Apache Arrow columnar format, zero-copy
  • Streaming — process larger-than-RAM datasets
  • Expressions — composable, type-safe column expressions
  • IO — CSV, Parquet, JSON, Arrow IPC, Avro, databases, cloud storage (S3, GCS, Azure)
  • SQL interface via pl.SQLContext for running SQL queries on DataFrames
  • Group by — fast aggregation with rich expression API
  • Window functions — rolling, expanding, partition-based

Architecture

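Polars has a Rust core with Python bindings built on PyO3. In lazy mode, a query is first recorded as a logical plan, rewritten by the optimizer, converted into a physical plan, and then executed in parallel. Data is stored in Apache Arrow chunked arrays, which keeps operations cache-friendly and amenable to SIMD acceleration.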
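To make the "logical plan, then optimizer" step concrete, here is a deliberately tiny toy model in plain Python, with no Polars involved. The node names `Scan`, `Filter`, and `Project`, the single "column > threshold" predicate form, and the in-memory `TABLES` dict are hypothetical simplifications, not Polars' internal types. The sketch only shows what predicate pushdown and projection pushdown do to a plan: the optimized scan reads fewer columns and materializes fewer rows while producing the same result.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union

# Toy in-memory "storage" standing in for a file on disk (hypothetical data).
TABLES = {
    "sales": [
        {"region": "north", "units": 10, "price": 2.5, "note": "x"},
        {"region": "south", "units": 3, "price": 2.5, "note": "y"},
        {"region": "south", "units": 9, "price": 10.0, "note": "z"},
    ]
}

@dataclass(frozen=True)
class Scan:
    table: str
    columns: Optional[tuple] = None    # None means "read every column"
    predicate: Optional[tuple] = None  # (column, threshold): keep rows with column > threshold

@dataclass(frozen=True)
class Filter:
    input: "Plan"
    column: str
    threshold: float                   # keep rows where column > threshold

@dataclass(frozen=True)
class Project:
    input: "Plan"
    columns: tuple

Plan = Union[Scan, Filter, Project]

def optimize(plan: Plan) -> Plan:
    """Toy optimizer: push filters and projections down into the scan."""
    if isinstance(plan, Filter):
        child = optimize(plan.input)
        if isinstance(child, Scan) and child.predicate is None:
            return replace(child, predicate=(plan.column, plan.threshold))  # predicate pushdown
        return Filter(child, plan.column, plan.threshold)
    if isinstance(plan, Project):
        child = optimize(plan.input)
        if isinstance(child, Scan):
            return replace(child, columns=plan.columns)  # projection pushdown
        return Project(child, plan.columns)
    return plan

def execute(plan: Plan, stats: dict) -> list:
    """Toy executor; records how much work the scan did in `stats`."""
    if isinstance(plan, Scan):
        rows = TABLES[plan.table]
        out_cols = list(plan.columns) if plan.columns is not None else list(rows[0])
        read_cols = set(out_cols)
        if plan.predicate is not None:
            read_cols.add(plan.predicate[0])  # the scan still needs the filter column
        result = []
        for row in rows:
            if plan.predicate is not None and not row[plan.predicate[0]] > plan.predicate[1]:
                continue
            result.append({c: row[c] for c in out_cols})
        stats["columns_read"] = sorted(read_cols)
        stats["rows_materialized"] = len(result)
        return result
    if isinstance(plan, Filter):
        return [r for r in execute(plan.input, stats) if r[plan.column] > plan.threshold]
    return [{c: r[c] for c in plan.columns} for r in execute(plan.input, stats)]

logical = Project(Filter(Scan("sales"), "units", 5), ("region", "price"))
optimized = optimize(logical)

naive_stats, opt_stats = {}, {}
naive_rows = execute(logical, naive_stats)
opt_rows = execute(optimized, opt_stats)
print(optimized)
print(naive_stats, opt_stats)
```

Both plans return the same rows; the optimized one never reads the unused `note` column and drops the non-matching row inside the scan. Real optimizers handle many more node types and predicate forms.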

Comparison

Library | Language | Speed | Lazy evaluation | Memory model
Polars | Rust + Python | Fastest | Yes | Arrow
pandas | Python (C extensions) | Slow | No | NumPy
Spark DataFrame | Scala/Python | Fast (distributed) | Yes | JVM
DuckDB | C++ | Very fast | Yes | Columnar
Vaex | C++ + Python | Fast | Yes | Memory-mapped

FAQ

Q: Polars vs pandas? A: Polars is commonly reported as 5-100x faster than pandas on typical benchmarks, mainly because it runs multi-threaded native Rust code with query optimization, whereas most pandas operations run on a single thread. The APIs are not compatible, but Polars' expression API is more consistent and has fewer pitfalls. Polars is recommended for new projects.

Q: How large a dataset can it handle? A: In lazy mode with the streaming engine, Polars can process datasets far larger than available memory, so TB-scale Parquet files on a single machine are feasible for queries the streaming engine supports.

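Q: How does it compare to DuckDB? A: Polars is a DataFrame library with a Python-first API; DuckDB is an in-process SQL database engine. Both are fast, and they complement each other: both work with Apache Arrow data, so DuckDB can query a Polars DataFrame and return results as one.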

