Introduction
Ibis provides a unified Python dataframe API that compiles analytics expressions to SQL or native query plans for many different backends. You write your data transformations once in Python, and Ibis translates them to run on DuckDB, Polars, PostgreSQL, Spark, BigQuery, Snowflake, or other engines without changing your code.
What Ibis Does
- Provides a pandas-like API that produces deferred expressions instead of eager computation
- Compiles expressions to optimized SQL or query plans for the target backend engine
- Supports 20+ backends including DuckDB, Polars, PostgreSQL, Spark, BigQuery, and Snowflake
- Enables interactive exploration with the same code that runs in production pipelines
- Offers a consistent type system across backends for predictable behavior
Architecture Overview
Ibis uses a two-layer architecture. The top layer is the expression API, which builds a lazy computation graph of dataframe operations. The bottom layer is a compiler that translates the expression graph into the target backend's query language (SQL for relational databases, native API calls for Polars or DataFusion). Backends implement a standard interface so that new engines can be added as plugins. No data moves between systems until the user explicitly materializes results.
Getting Started
- Install with pip, including your backend extra, e.g., `pip install 'ibis-framework[duckdb]'`
- Connect to a backend with `ibis.<backend>.connect()`, passing connection parameters
- Read data from files, tables, or existing database schemas
- Chain operations using the expression API (filter, select, group_by, join, mutate)
- Materialize results with `.to_pandas()`, `.to_polars()`, or `.execute()`
Key Features
- Backend portability: switch from DuckDB in development to BigQuery in production with no code changes
- Lazy evaluation: operations build an expression tree and execute only when results are requested
- SQL output: call `.compile()` on any expression to see the generated SQL for debugging
- Type-safe expressions: operations are validated at expression-build time, catching errors before execution
- Composable: build reusable transformation functions that work across any backend
Comparison with Similar Tools
- pandas — eager in-memory computation; Ibis is lazy and pushes computation to the backend engine
- Polars — single fast engine; Ibis is a multi-backend API layer that can target Polars as one of many backends
- SQLAlchemy — ORM and SQL toolkit for application development; Ibis is an analytics-focused dataframe API
- dbt — SQL-based transformation layer; Ibis provides Python-native analytics with SQL compilation
- PySpark — Spark-specific dataframe API; Ibis supports Spark and 20+ other backends with one API
FAQ
Q: Is Ibis a replacement for pandas?
A: Ibis can replace pandas for analytics workflows where you want backend portability or to work with data that does not fit in memory. For small in-memory data manipulation, pandas remains a fine choice.
Q: Can I see the SQL that Ibis generates?
A: Yes. Call ibis.to_sql(expression) or expression.compile() to inspect the generated SQL for any relational backend.
Q: Does Ibis load all data into memory?
A: No. Ibis pushes computation to the backend engine. Data stays in the database or engine until you explicitly call .execute() or .to_pandas() to fetch results.
Q: Which backends are supported?
A: DuckDB, Polars, PostgreSQL, MySQL, SQLite, Spark, BigQuery, Snowflake, Trino, ClickHouse, DataFusion, Impala, MSSQL, Oracle, Exasol, Flink, and others. The list grows as the community adds backends.