Apr 16, 2026 · 3 min read

Presto — Distributed SQL Engine for Interactive Analytics

Facebook-born distributed SQL engine for running fast, interactive queries against data lakes, warehouses, and federated sources.

TL;DR
Presto runs fast interactive SQL queries against data lakes, warehouses, and federated sources at petabyte scale.
§01

What it is

Presto is an open-source distributed SQL query engine originally developed at Facebook (now Meta). It runs interactive analytical queries against data sources ranging from HDFS and S3 data lakes to relational databases, Kafka streams, and Elasticsearch indexes. A single Presto query can join data across multiple sources without moving it.

Presto targets data engineers, analysts, and platform teams who need fast ad-hoc queries on large datasets. Because it separates compute from storage, you do not have to load data into a separate analytical database before querying it.

§02

How it saves time or tokens

Without Presto, querying across multiple data sources requires ETL pipelines to move data into a single warehouse. Presto federates queries directly against the source systems, eliminating data movement. Queries that would take minutes in batch-oriented engines like Hive return results in seconds through Presto's in-memory execution model.

§03

How to use

  1. Deploy Presto with a coordinator and one or more worker nodes.
  2. Configure connectors for your data sources (S3, MySQL, PostgreSQL, etc.).
  3. Run SQL queries via the Presto CLI, JDBC, or any SQL client.
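Connectors are configured as catalog property files on the coordinator and workers. A minimal sketch for a MySQL catalog, assuming a server at db.example.internal (the file name, host, and credentials are placeholders):

```properties
# etc/catalog/mysql.properties
connector.name=mysql
connection-url=jdbc:mysql://db.example.internal:3306
connection-user=presto
connection-password=secret
```

After restarting the cluster, the tables become queryable as mysql.&lt;schema&gt;.&lt;table&gt;.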
# Start Presto CLI
presto --server localhost:8080 --catalog hive --schema default

-- Query data in S3 via Hive connector
SELECT date, count(*) as events
FROM hive.analytics.events
WHERE date >= DATE '2026-04-01'
GROUP BY date
ORDER BY date;

-- Federated query: join S3 data with MySQL
SELECT u.name, count(e.event_id)
FROM hive.analytics.events e
JOIN mysql.app.users u ON e.user_id = u.id
GROUP BY u.name
ORDER BY count(e.event_id) DESC
LIMIT 10;
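To see how Presto will distribute a query before running it, you can prefix it with EXPLAIN. A sketch against the same hypothetical hive.analytics.events table used above:

```sql
-- Show the distributed plan: stages, exchanges, and
-- which predicates are pushed down to the connector
EXPLAIN (TYPE DISTRIBUTED)
SELECT date, count(*) AS events
FROM hive.analytics.events
WHERE date >= DATE '2026-04-01'
GROUP BY date;
```

The output lists the plan fragments each worker executes, which is useful for checking that partition pruning and predicate pushdown are actually happening.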
§04

Example

-- Create a table backed by S3 Parquet files
CREATE TABLE hive.analytics.pageviews (
  page_url VARCHAR,
  user_id BIGINT,
  timestamp TIMESTAMP
)
WITH (
  format = 'PARQUET',
  external_location = 's3://my-bucket/pageviews/'
);

-- Query it immediately
SELECT page_url, count(*) as views
FROM hive.analytics.pageviews
WHERE timestamp >= TIMESTAMP '2026-04-01 00:00:00'
GROUP BY page_url
ORDER BY views DESC
LIMIT 20;
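The table above is unpartitioned; for large datasets, the Hive connector also supports partitioned tables, which let Presto skip irrelevant data entirely. A hedged variant of the same example (the ds partition column is an assumption, not part of the original schema):

```sql
-- Partition by date string so queries filtering on ds
-- only read the matching S3 prefixes
CREATE TABLE hive.analytics.pageviews_partitioned (
  page_url VARCHAR,
  user_id BIGINT,
  ds VARCHAR
)
WITH (
  format = 'PARQUET',
  partitioned_by = ARRAY['ds'],
  external_location = 's3://my-bucket/pageviews_partitioned/'
);
```

Note that the Hive connector requires partition columns to appear last in the column list.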
§05

Common pitfalls

  • Presto is not designed for ETL or long-running batch jobs. Queries that process terabytes of data may hit memory limits on workers. Use appropriate partitioning and predicate pushdown.
  • The Hive connector requires a Hive Metastore service for table metadata, even if you do not use Hive for processing. This is an extra component to deploy and maintain.
  • Federated queries across slow connectors (e.g., REST APIs) can bottleneck the entire query. Materialize slow sources into a fast store before joining.
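For the Hive Metastore dependency mentioned above, the catalog configuration is a small properties file. A minimal sketch, assuming a metastore reachable at metastore.internal:9083 (the host and file path are placeholders):

```properties
# etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.internal:9083
```

Depending on your Presto version, managed alternatives such as AWS Glue can also serve as the metastore.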

Frequently Asked Questions

What is the difference between Presto and Trino?

Trino is a fork of Presto created by the original Presto creators after they left Facebook. Both projects share the same core architecture. Trino has a more active open-source community and faster release cadence. Presto continues to be developed by Meta. Feature sets are similar but diverging over time.

Can Presto replace my data warehouse?

Presto can serve as a query layer on top of your data lake, reducing the need for a separate warehouse for ad-hoc analytics. However, it does not provide built-in storage, indexing, or data management features that warehouses like Snowflake or BigQuery offer. It complements rather than replaces a warehouse.

What data sources can Presto connect to?

Presto has connectors for HDFS, S3, MySQL, PostgreSQL, MongoDB, Elasticsearch, Kafka, Redis, Cassandra, Google Sheets, and many more. Custom connectors can be built using the Presto SPI (Service Provider Interface).

How does Presto handle large queries?

Presto distributes query execution across worker nodes. Each worker processes a partition of the data in parallel. For very large queries, you can add more workers to increase parallelism. Memory-intensive queries may require spill-to-disk configuration to avoid out-of-memory failures.
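The spill-to-disk behavior mentioned above is enabled through cluster configuration. A hedged sketch of the relevant config.properties entries (the path is a placeholder, and exact property names have varied across Presto versions, so check the documentation for yours):

```properties
# etc/config.properties
experimental.spill-enabled=true
experimental.spiller-spill-path=/mnt/presto-spill
experimental.max-spill-per-node=100GB
```

With spill enabled, memory-intensive operators such as aggregations and joins write intermediate state to local disk instead of failing with an out-of-memory error, at the cost of slower execution.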

Is Presto suitable for real-time queries?

Presto is designed for interactive (seconds-level) latency on analytical queries. It is not a real-time streaming engine. For sub-second latency, consider a streaming database. Presto excels at ad-hoc exploratory queries where response times of 1-30 seconds are acceptable.
