Configs · Apr 15, 2026 · 3 min read

Trino — Fast Distributed SQL Query Engine for Data Lakes

The federated SQL engine formerly known as PrestoSQL. Query S3, HDFS, Iceberg, Delta Lake, Hudi, MySQL, PostgreSQL, Kafka, Cassandra, and dozens more sources with ANSI SQL, interactively and at petabyte scale.

TL;DR
Federated SQL engine that queries S3, Iceberg, Delta, MySQL, Postgres, Kafka, and dozens more data sources with ANSI SQL at scale.
§01

What it is

Trino (formerly PrestoSQL) is a fast distributed SQL query engine designed for interactive analytics on data lakes and federated data sources. It executes ANSI SQL queries against Amazon S3, Apache Iceberg, Delta Lake, Hudi, MySQL, PostgreSQL, Kafka, Cassandra, and dozens of other systems, each reached through its own connector.

Trino targets data engineers and analysts who need to query data wherever it lives without moving it into a single warehouse. A single Trino query can join a table stored on S3 with a table in PostgreSQL, without first copying either one into the other system.

§02

How it saves time or tokens

Trino reduces the need for ETL pipelines that copy data between systems just so it can be queried together. You query data in place using standard SQL, which cuts data movement costs and latency. Interactive queries typically return results in seconds rather than the minutes or hours typical of batch ETL.

For AI data pipelines, Trino provides a single SQL interface to all data sources. LLMs can generate standard SQL without needing to know the underlying storage format.
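As a rough illustration of that single interface, the sketch below submits SQL over Trino's HTTP API using only the Python standard library: the statement is POSTed to /v1/statement and each nextUri in the reply is followed until none is returned. It assumes a coordinator at localhost:8080 with no authentication; the user name and the names run_query and http_fetch are made up for this example. In practice a maintained client library is usually the better choice.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional


def http_fetch(url: str, body: Optional[bytes], user: str) -> dict:
    """Send one request to the coordinator and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=body,  # urllib sends POST when a body is given, GET otherwise
        headers={"X-Trino-User": user},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_query(
    sql: str,
    server: str = "http://localhost:8080",  # assumed local coordinator
    user: str = "example-user",             # hypothetical user name
    fetch: Callable[[str, Optional[bytes], str], dict] = http_fetch,
) -> Iterator[list]:
    """Yield result rows for one statement, following nextUri page by page."""
    reply = fetch(f"{server}/v1/statement", sql.encode("utf-8"), user)
    while True:
        if "error" in reply:
            raise RuntimeError(reply["error"].get("message", "query failed"))
        # A page may carry no rows yet while the query is still queued or running.
        yield from reply.get("data", [])
        next_uri = reply.get("nextUri")
        if next_uri is None:
            return
        reply = fetch(next_uri, None, user)
```

Calling list(run_query("SELECT 1")) against a running server collects the rows. The fetch parameter exists only so the paging logic can be exercised without a live cluster.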

§03

How to use

  1. Deploy Trino with Docker:
docker run -d -p 8080:8080 --name trino trinodb/trino
  2. Connect with the Trino CLI:
trino --server localhost:8080 --catalog hive --schema default
  3. Query data across sources:
SELECT o.order_id, c.name, o.total
FROM hive.sales.orders o
JOIN postgresql.public.customers c ON o.customer_id = c.id
WHERE o.created_at > DATE '2026-01-01'
ORDER BY o.total DESC
LIMIT 10;
  4. Configure a catalog for each data source by adding a properties file to the etc/catalog/ directory. The hive and postgresql catalogs used above must be defined there before the CLI command and the query will work.
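To make the catalog step concrete, here is a minimal sketch that writes one properties file per catalog; the file name becomes the catalog name used in queries. The property keys are the ones the Hive and PostgreSQL connectors read, but every host name, database name, and credential below is a placeholder, and the helper write_catalogs is made up for this example. Object storage settings for the Hive catalog are left out.

```python
from pathlib import Path
from typing import Dict, List

# Placeholder values: replace hosts, database, and credentials with your own.
CATALOGS: Dict[str, Dict[str, str]] = {
    "hive": {
        "connector.name": "hive",
        "hive.metastore.uri": "thrift://metastore.example.internal:9083",
    },
    "postgresql": {
        "connector.name": "postgresql",
        "connection-url": "jdbc:postgresql://db.example.internal:5432/shop",
        "connection-user": "trino_reader",
        "connection-password": "change-me",
    },
}


def write_catalogs(catalog_dir: Path, catalogs: Dict[str, Dict[str, str]]) -> List[Path]:
    """Write <catalog name>.properties files into catalog_dir and return their paths."""
    catalog_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, props in catalogs.items():
        path = catalog_dir / f"{name}.properties"
        lines = [f"{key}={value}" for key, value in props.items()]
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Running write_catalogs(Path("etc/catalog"), CATALOGS) produces hive.properties and postgresql.properties. Trino reads catalog files at startup, so restart the server after adding them.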
§04

Example

-- Query Iceberg tables on S3
SELECT date_trunc('month', event_time) AS month,
       count(*) AS events,
       count(DISTINCT user_id) AS users
FROM iceberg.analytics.events
WHERE event_time >= DATE '2026-01-01'
GROUP BY 1
ORDER BY 1;
§05


Common pitfalls

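  • Not configuring memory limits properly. Trino queries can consume large amounts of memory for joins and aggregations. Set per-query limits such as query.max-memory and query.max-memory-per-node so that Trino fails a runaway query instead of letting it starve the rest of the cluster.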
  • Using Trino for transactional workloads. Trino is designed for analytical queries, not OLTP. Support for UPDATE, DELETE, and MERGE depends on the connector, and even where it is available it is not meant for high-frequency row-level writes.
  • Ignoring connector-specific optimizations. Each connector has different pushdown capabilities. Learn which filters and aggregations each connector can handle natively for better performance.
  • Failing to review release notes and community discussions before upgrading. Breaking changes in new releases can disrupt existing workflows. Pin versions in production and test upgrades in staging first.
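For the memory-limit pitfall, the sketch below renders the relevant lines of etc/config.properties and rejects one common misconfiguration: a per-node query limit plus heap headroom that does not fit inside the JVM heap. The property names are Trino's; the helper memory_config, the whole-gigabyte units, and the example sizes are assumptions made for illustration, not recommended values.

```python
def memory_config(heap_gb: int, per_node_gb: int, cluster_gb: int, headroom_gb: int) -> str:
    """Return config.properties lines for per-query memory limits.

    heap_gb is the JVM heap (-Xmx) configured for each node in jvm.config.
    """
    if per_node_gb + headroom_gb > heap_gb:
        raise ValueError(
            f"per-node limit ({per_node_gb}GB) plus headroom ({headroom_gb}GB) "
            f"exceeds the {heap_gb}GB heap"
        )
    lines = [
        f"query.max-memory={cluster_gb}GB",            # limit for one query across the cluster
        f"query.max-memory-per-node={per_node_gb}GB",  # limit for one query on a single node
        f"memory.heap-headroom-per-node={headroom_gb}GB",  # heap reserved for untracked allocations
    ]
    return "\n".join(lines) + "\n"
```

For example, memory_config(heap_gb=32, per_node_gb=10, cluster_gb=50, headroom_gb=8) yields three lines ready to paste, while a 28 GB per-node limit on the same heap raises an error before the setting ever reaches a server.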

Frequently Asked Questions

What is the difference between Trino and Presto?

Trino is the continuation of PrestoSQL, the fork started by the original Presto creators after they left Facebook; the project was renamed Trino in December 2020. PrestoDB is the original codebase, now governed by the Presto Foundation under the Linux Foundation. Both share the same origins but have diverged and are developed independently. Trino has an active community and ships releases frequently.

Can Trino replace a data warehouse?

Trino can serve as a query engine for lakehouse architectures, querying data on S3 with Iceberg or Delta Lake. For some workloads, this replaces traditional data warehouses. However, Trino does not manage data storage or optimize table layouts. Pair it with Iceberg for a complete lakehouse.

How does Trino handle federation?

Trino uses a connector architecture. Each data source has a connector that translates SQL into source-native operations. A single query can reference tables from different connectors, and Trino handles the join across sources in memory.
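The idea can be pictured with a toy model. This is not Trino's code, only a conceptual sketch: each "connector" scans its own source and applies any filter handed to it (a stand-in for pushdown), and the "engine" joins the two row streams with an in-memory hash join. All class, function, and column names are invented for the example.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Optional

Row = Dict[str, object]


class ListConnector:
    """Toy connector: holds rows and applies a pushed-down filter while scanning."""

    def __init__(self, rows: Iterable[Row]) -> None:
        self.rows = list(rows)

    def scan(self, predicate: Optional[Callable[[Row], bool]] = None) -> Iterator[Row]:
        for row in self.rows:
            if predicate is None or predicate(row):
                yield row


def hash_join(left: Iterable[Row], right: Iterable[Row], left_key: str, right_key: str) -> List[Row]:
    """Build a hash table on the right side, then probe it with each left row."""
    table: Dict[object, List[Row]] = {}
    for r in right:
        table.setdefault(r[right_key], []).append(r)
    joined: List[Row] = []
    for l in left:
        for r in table.get(l[left_key], []):
            joined.append({**l, **r})
    return joined


# Two independent "sources", standing in for a lake table and a database table.
orders = ListConnector([
    {"order_id": 1, "customer_id": 10, "total": 50.0},
    {"order_id": 2, "customer_id": 11, "total": 5.0},
    {"order_id": 3, "customer_id": 10, "total": 70.0},
])
customers = ListConnector([
    {"id": 10, "name": "Ada"},
    {"id": 11, "name": "Bob"},
])

# The filter runs inside the orders "connector"; the join runs in the "engine".
result = hash_join(
    orders.scan(lambda row: row["total"] > 20),
    customers.scan(),
    left_key="customer_id",
    right_key="id",
)
```

The point of the split is that only rows surviving the source-side filter cross the boundary into the engine, which is why the pushdown capabilities of each connector matter for performance.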

Does Trino support real-time data?

Trino has connectors for Kafka and other streaming systems. You can query real-time data alongside historical data in the same SQL query. However, Trino is not a streaming engine; it executes point-in-time queries.

What scale can Trino handle?

Trino scales horizontally by adding worker nodes. Production deployments handle petabytes of data and thousands of concurrent queries. Companies like Netflix, LinkedIn, and Lyft use Trino for large-scale analytics.
