# Presto — Distributed SQL Engine for Interactive Analytics > Facebook-born distributed SQL engine for running fast, interactive queries against data lakes, warehouses, and federated sources. ## Install Save as a script file and run: # Presto — Distributed SQL for Federated, Interactive Analytics ## Quick Use ```bash # Docker quickstart — a single-node Presto cluster docker run -p 8080:8080 --name presto prestodb/presto:latest # Connect with the Presto CLI presto --server localhost:8080 --catalog hive --schema default # Example federated query across Hive and MySQL catalogs SELECT h.event_id, m.customer_name FROM hive.events.logs h JOIN mysql.app.customers m ON m.id = h.customer_id WHERE h.event_date = DATE '2026-04-16'; ``` ## Introduction Presto (PrestoDB) is a distributed SQL query engine originally developed at Facebook for running fast, interactive analytical queries against petabyte-scale datasets. Unlike traditional warehouses, Presto separates compute from storage and federates many sources — Hive, Iceberg, MySQL, Postgres, Cassandra, Kafka — through pluggable connectors. ## What Presto Does - Executes ANSI SQL across heterogeneous data stores in one query. - Runs interactive, low-latency queries on HDFS/S3 data lakes. - Federates warehouses and operational databases using 20+ connectors. - Supports CTAS/INSERT into lakehouse tables (Iceberg, Hudi, Delta via extensions). - Exposes REST + JDBC/ODBC endpoints for BI tools like Tableau and Superset. ## Architecture Overview A Presto cluster has one coordinator and many workers. The coordinator parses SQL, plans distributed execution, and splits scans across workers that stream data pages through a pipelined execution model. Connectors adapt external sources to Presto's type system, while the discovery service tracks active workers. Memory is managed with per-node and query-level pools to keep the cluster stable under heavy concurrency. ## Self-Hosting & Configuration - Packaged as `presto-server` tarball; deploy via Ansible, Kubernetes, or EMR. - Key config files: `config.properties`, `node.properties`, `jvm.config`, and per-catalog `*.properties`. - Tune `query.max-memory`, `query.max-memory-per-node`, and `task.concurrency` per workload. - Use `hive.metastore-uri` to point at a Hive Metastore, AWS Glue, or REST catalog. - Secure with Kerberos, TLS, LDAP/OAuth2, and system + Ranger authorizers. ## Key Features - Interactive SQL latency on lake-scale data (seconds to minutes). - Rich connector ecosystem for federated analytics without ETL. - Cost-based optimizer with statistics from Hive, Iceberg, and Glue. - Materialized views and caching (with plugins like Presto cache and Alluxio). - Strong SQL: window functions, lambdas, geospatial, and approximate aggregates. ## Comparison with Similar Tools - **Trino** — A community fork of Presto with faster feature iteration; PrestoDB stays aligned with Meta/LinkedIn. - **Apache Spark SQL** — Batch-centric with streaming; Presto is tuned for interactive latency. - **Apache Drill** — Schema-on-read file queries; Presto offers richer connectors and optimizer. - **ClickHouse** — Columnar OLAP DB you own; Presto federates across many existing stores instead. - **BigQuery / Athena** — Managed cloud analytics; Presto is the open engine you can self-host everywhere. ## FAQ **Q:** What is the difference between Presto and Trino? A: Same origin; Trino (formerly PrestoSQL) is an active community fork, while PrestoDB evolves at Meta, LinkedIn, and the Linux Foundation's Presto Foundation. **Q:** Does Presto store data? A: No — it is a query engine only. Storage lives in S3, HDFS, Hive, Iceberg, RDBMS, and other catalogs. **Q:** How many nodes do I need? A: A small cluster of 3–5 workers handles GB-scale analytics; large deployments run hundreds of workers. **Q:** What clients can connect? A: JDBC, ODBC, Python (`presto-python-client`), Node.js, Go drivers, and the `presto` CLI. ## Sources - https://github.com/prestodb/presto - https://prestodb.io/docs/current --- Source: https://tokrepo.com/en/workflows/5f0c50a5-3931-11f1-9bc6-00163e2b0d79 Author: Script Depot