# Apache Calcite — Dynamic SQL Query Planning and Optimization Framework

> Modular SQL query planning framework used as the query optimizer inside Apache Hive, Druid, Flink, and dozens of other data systems.

## Quick Use
```xml
<!-- Maven dependency -->
<dependency>
  <groupId>org.apache.calcite</groupId>
  <artifactId>calcite-core</artifactId>
  <version>1.37.0</version>
</dependency>
```
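```java
import java.sql.*;

// Query CSV files as tables; model.json must reference a CSV-capable adapter
// (e.g. Calcite's file adapter, which ships separately from calcite-core).
try (Connection conn = DriverManager.getConnection("jdbc:calcite:model=model.json");
     ResultSet rs = conn.createStatement().executeQuery("SELECT * FROM EMPS WHERE age > 30")) {
  while (rs.next()) System.out.println(rs.getString(1)); // print first column of each row
}
```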

## Introduction
Apache Calcite is a foundational framework for building databases and data management systems. Rather than storing data itself, it provides a SQL parser, validator, query optimizer, and JDBC driver that other systems plug into. Projects like Apache Hive, Druid, Flink, and Phoenix all rely on Calcite for SQL processing.

## What Apache Calcite Does
- Parses and validates SQL statements against user-defined schemas
- Optimizes query plans using cost-based and rule-based transformations
- Provides a JDBC driver that turns any data source into a SQL-queryable endpoint
- Supports federated queries across multiple heterogeneous data sources
- Offers adapters for CSV files, JSON, JDBC databases, Elasticsearch, and more

## Architecture Overview
Calcite processes queries in stages: the SQL parser produces a syntax tree, the validator checks types and resolves names against a schema, the validated tree is converted into a relational algebra tree, and the optimizer (called the planner) rewrites that tree using pluggable rules. The planner supports both heuristic, rule-based optimization (HepPlanner) and Volcano-style, cost-based optimization (VolcanoPlanner). Adapters translate optimized plans into operations on the underlying data source, whether that is an in-memory collection, a file, or a remote database.
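The first three stages can be driven directly through the `Frameworks` / `Planner` API in calcite-core. The sketch below is a minimal illustration, assuming the calcite-core dependency from Quick Use; the classes `PlanStages`, `Emp`, and `Hr` and their data are hypothetical. `ReflectiveSchema` exposes each public array field as a table. The SQL quotes identifiers because Calcite's default parser upper-cases unquoted names, while the Java field names here are lowercase.

```java
import org.apache.calcite.adapter.java.ReflectiveSchema;
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.schema.SchemaPlus;
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.Planner;

class PlanStages {
  // Hypothetical row class: each public field becomes a column.
  public static class Emp {
    public final int empid;
    public final String name;
    public final int age;

    public Emp(int empid, String name, int age) {
      this.empid = empid;
      this.name = name;
      this.age = age;
    }
  }

  // Hypothetical schema: the public array field "emps" becomes a table.
  public static class Hr {
    public final Emp[] emps = {new Emp(1, "Ann", 34), new Emp(2, "Bob", 28)};
  }

  /** Runs parse, validate, and SQL-to-relational conversion; returns the unoptimized plan. */
  static String logicalPlan(String sql) throws Exception {
    SchemaPlus root = Frameworks.createRootSchema(true);
    SchemaPlus hr = root.add("hr", new ReflectiveSchema(new Hr()));
    FrameworkConfig config = Frameworks.newConfigBuilder().defaultSchema(hr).build();
    try (Planner planner = Frameworks.getPlanner(config)) {
      SqlNode parsed = planner.parse(sql);            // stage 1: SQL text -> syntax tree
      SqlNode validated = planner.validate(parsed);   // stage 2: resolve names, check types
      RelNode rel = planner.rel(validated).project(); // stage 3: syntax tree -> relational algebra
      return RelOptUtil.toString(rel);                // printable logical plan
    }
  }
}
```

Calling `PlanStages.logicalPlan("SELECT \"name\" FROM \"emps\" WHERE \"age\" > 30")` returns a tree of logical operators (a project over a filter over a table scan), which is the input the planner would then optimize.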

## Self-Hosting & Configuration
- Add calcite-core as a Maven or Gradle dependency in your Java project
- Define a model.json file describing schemas and their adapter types
- Implement the Schema and Table interfaces to expose custom data sources
- Register optimization rules with the planner for domain-specific transformations
- Use the JDBC driver (jdbc:calcite:) for SQL access from any Java application
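A custom data source can be exposed by implementing the Schema and Table interfaces, here through calcite-core's convenience base classes `AbstractSchema` and `AbstractTable`, and registering the schema on a JDBC connection's root schema. This is a minimal sketch, assuming calcite-core on the classpath; `CustomTableDemo`, `CityTable`, `GeoSchema`, and the fixed rows are hypothetical stand-ins for a real source.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import org.apache.calcite.DataContext;
import org.apache.calcite.jdbc.CalciteConnection;
import org.apache.calcite.linq4j.Enumerable;
import org.apache.calcite.linq4j.Linq4j;
import org.apache.calcite.rel.type.RelDataType;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.schema.ScannableTable;
import org.apache.calcite.schema.Table;
import org.apache.calcite.schema.impl.AbstractSchema;
import org.apache.calcite.schema.impl.AbstractTable;
import org.apache.calcite.sql.type.SqlTypeName;

class CustomTableDemo {
  /** Hypothetical data source exposed as a two-column table (ID, CITY). */
  static class CityTable extends AbstractTable implements ScannableTable {
    @Override public RelDataType getRowType(RelDataTypeFactory typeFactory) {
      return typeFactory.builder()
          .add("ID", SqlTypeName.INTEGER)
          .add("CITY", SqlTypeName.VARCHAR)
          .build();
    }

    @Override public Enumerable<Object[]> scan(DataContext root) {
      // A real adapter would read files, call an API, etc.; this returns fixed rows.
      return Linq4j.asEnumerable(Arrays.asList(
          new Object[] {1, "Paris"},
          new Object[] {2, "Lima"}));
    }
  }

  /** Hypothetical schema that publishes CityTable under the name CITIES. */
  static class GeoSchema extends AbstractSchema {
    @Override protected Map<String, Table> getTableMap() {
      return Collections.singletonMap("CITIES", new CityTable());
    }
  }

  static long countCitiesWithIdAbove(int minId) throws SQLException {
    try (Connection conn = DriverManager.getConnection("jdbc:calcite:")) {
      // Register the custom schema on this connection's root schema.
      conn.unwrap(CalciteConnection.class).getRootSchema().add("GEO", new GeoSchema());
      try (Statement stmt = conn.createStatement();
           ResultSet rs = stmt.executeQuery(
               "SELECT COUNT(*) FROM GEO.CITIES WHERE ID > " + minId)) {
        rs.next();
        return rs.getLong(1);
      }
    }
  }
}
```

`ScannableTable` is the simplest table contract: Calcite pulls every row and applies the filter itself. Calcite also defines richer interfaces (such as `FilterableTable`) for sources that can evaluate predicates on their own side.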

## Key Features
- Pluggable adapter architecture lets you query any data source through standard SQL
- Cost-based optimizer with extensible statistics and cost model for choosing the cheapest among equivalent plans
- Materialized view rewriting automatically routes queries to precomputed results
- Streaming SQL extensions support continuous queries over event streams
- Lattice and star-schema optimizations accelerate OLAP-style aggregate queries

## Comparison with Similar Tools
- **Apache DataFusion** — Rust-based query engine; embeddable like Calcite but focused on single-process execution rather than framework reuse
- **Substrait** — Cross-language query plan specification; Calcite can produce Substrait plans but also includes its own optimizer and execution
- **Presto/Trino** — Distributed SQL engines that embed their own optimizers; Calcite is a library others embed rather than a standalone engine
- **DuckDB** — Embedded analytical database with its own parser and optimizer; Calcite is a framework for building such systems
- **Apache Drill** — SQL query engine for multiple data sources; built on top of Calcite for parsing and optimization

## FAQ
**Q: Is Calcite a database?**
A: No. Calcite is a framework that provides SQL parsing, optimization, and JDBC connectivity. It does not store data. Systems like Hive, Druid, and Flink use Calcite as their SQL processing layer.

**Q: Which projects use Calcite?**
A: Apache Hive, Druid, Flink, Phoenix, Beam, Kylin, and Storm all use Calcite for query parsing and optimization. Many commercial data products also embed it.

**Q: Can I use Calcite to query CSV or JSON files?**
A: Yes. Calcite includes built-in adapters for CSV and JSON files. Define a model.json pointing to your files and query them with standard SQL via the JDBC driver.
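As a sketch of that workflow, the model can also be passed inline instead of as a model.json file. The example assumes the calcite-file artifact (the file adapter) is on the classpath alongside calcite-core, that its `org.apache.calcite.adapter.file.FileSchemaFactory` accepts a `directory` operand, and that the directory holds a hypothetical `EMPS.csv` whose header declares column types (for example `ID:int,NAME:string,AGE:int`). `CsvQuery` and `countOlderThan` are illustrative names.

```java
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;

class CsvQuery {
  /** Counts rows of SALES.EMPS (read from <dir>/EMPS.csv) whose AGE exceeds minAge. */
  static long countOlderThan(Path dir, int minAge) throws SQLException {
    // Forward slashes keep the path valid inside the JSON string on any OS.
    String directory = dir.toAbsolutePath().toString().replace("\\", "/");
    String model = "{"
        + "\"version\": \"1.0\", \"defaultSchema\": \"SALES\","
        + "\"schemas\": [{\"name\": \"SALES\", \"type\": \"custom\","
        + "\"factory\": \"org.apache.calcite.adapter.file.FileSchemaFactory\","
        + "\"operand\": {\"directory\": \"" + directory + "\"}}]}";
    Properties info = new Properties();
    info.setProperty("model", "inline:" + model); // inline model instead of a model.json file
    try (Connection conn = DriverManager.getConnection("jdbc:calcite:", info);
         Statement stmt = conn.createStatement();
         ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM EMPS WHERE AGE > " + minAge)) {
      rs.next();
      return rs.getLong(1);
    }
  }
}
```

Each CSV file in the directory becomes a table named after the file, so `EMPS.csv` is queried as `EMPS` in the default `SALES` schema.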

**Q: How do I add custom optimization rules?**
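A: Subclass `RelRule` (or the older `RelOptRule` abstract class), declare an operand pattern that matches the relational tree nodes you want to transform, implement `onMatch` to build the rewritten tree and hand it to `call.transformTo`, and register the rule with the planner (for example with `HepProgramBuilder.addRuleInstance` for a HepPlanner, or `addRule` on a VolcanoPlanner). Calcite fires matching rules during optimization.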
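The registration step can be sketched with a HepPlanner and one of Calcite's built-in rules, `CoreRules.FILTER_MERGE`; a custom rule would be added to the `HepProgram` the same way. This assumes calcite-core on the classpath; `RuleDemo` and the input relation (two stacked filters over an inline `VALUES` built with `RelBuilder`) are hypothetical.

```java
import org.apache.calcite.plan.RelOptUtil;
import org.apache.calcite.plan.hep.HepPlanner;
import org.apache.calcite.plan.hep.HepProgram;
import org.apache.calcite.plan.hep.HepProgramBuilder;
import org.apache.calcite.rel.RelNode;
import org.apache.calcite.rel.rules.CoreRules;
import org.apache.calcite.tools.FrameworkConfig;
import org.apache.calcite.tools.Frameworks;
import org.apache.calcite.tools.RelBuilder;

class RuleDemo {
  /** Builds Filter(Filter(Values)) and lets a HepPlanner merge the two filters. */
  static String mergeFilters() {
    FrameworkConfig config = Frameworks.newConfigBuilder()
        .defaultSchema(Frameworks.createRootSchema(true))
        .build();
    RelBuilder b = RelBuilder.create(config);
    RelNode input = b.values(new String[] {"A", "B"}, 1, 2, 3, 4) // rows (1, 2) and (3, 4)
        .filter(b.greaterThan(b.field("A"), b.literal(0)))
        .filter(b.lessThan(b.field("B"), b.literal(10)))
        .build();

    // Register rules in a HepProgram; the planner applies them until no rule matches.
    HepProgram program = new HepProgramBuilder()
        .addRuleInstance(CoreRules.FILTER_MERGE)
        .build();
    HepPlanner planner = new HepPlanner(program);
    planner.setRoot(input);
    return RelOptUtil.toString(planner.findBestExp());
  }
}
```

HepPlanner applies rules heuristically in program order, which suits deterministic rewrites like this one; cost-based exploration of alternatives is VolcanoPlanner's job.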

## Sources
- https://github.com/apache/calcite
- https://calcite.apache.org/docs/

---
Source: https://tokrepo.com/en/workflows/fd0fa862-3d3a-11f1-9bc6-00163e2b0d79
Author: Script Depot