Trino — Fast Distributed SQL Query Engine for Data Lakes
The federated SQL engine formerly known as PrestoSQL. Query S3/HDFS/Iceberg/Delta/Hudi, MySQL, Postgres, Kafka, Cassandra and dozens more with ANSI SQL — in seconds, at petabyte scale.
What it is
Trino (formerly PrestoSQL) is a fast distributed SQL query engine designed for interactive analytics on data lakes and federated data sources. It executes ANSI SQL queries across Amazon S3, Apache Iceberg, Delta Lake, Hudi, MySQL, PostgreSQL, Kafka, Cassandra, and dozens of other connectors.
Trino targets data engineers and analysts who need to query data wherever it lives without moving it into a single warehouse. A single Trino query can join tables from S3 and PostgreSQL in real time.
How it saves time or tokens
Trino reduces the need for ETL pipelines that copy data between systems. Query data in place using standard SQL, cutting data movement costs and latency. Interactive queries can return results in seconds rather than the minutes or hours typical of batch ETL.
For AI data pipelines, Trino provides a single SQL interface to all data sources. LLMs can generate standard SQL without needing to know the underlying storage format.
How to use
- Deploy Trino with Docker:
docker run -d -p 8080:8080 --name trino trinodb/trino
- Connect with the Trino CLI:
trino --server localhost:8080 --catalog hive --schema default
- Query data across sources:
SELECT o.order_id, c.name, o.total
FROM hive.sales.orders o
JOIN postgresql.public.customers c ON o.customer_id = c.id
WHERE o.created_at > DATE '2026-01-01'
ORDER BY o.total DESC
LIMIT 10;
- Configure catalogs for each data source in the etc/catalog/ directory.
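As a rough sketch of the catalog step above, the commands below create one properties file per data source. The file name (minus .properties) becomes the catalog name used in queries. The hostnames, database name, user, and password are placeholders invented for illustration, and the Iceberg example assumes a Hive metastore is used as the Iceberg catalog.

```shell
# Sketch: create two catalog files under a local etc/catalog/ directory.
# All hostnames and credentials below are hypothetical placeholders.
mkdir -p etc/catalog

# Catalog "postgresql": queried as postgresql.<schema>.<table>
cat > etc/catalog/postgresql.properties <<'EOF'
connector.name=postgresql
connection-url=jdbc:postgresql://pg.example.internal:5432/shop
connection-user=trino_reader
connection-password=change-me
EOF

# Catalog "iceberg": assumes a Hive metastore serves as the Iceberg catalog
cat > etc/catalog/iceberg.properties <<'EOF'
connector.name=iceberg
iceberg.catalog.type=hive_metastore
hive.metastore.uri=thrift://metastore.example.internal:9083
EOF
```

With the Docker deployment shown earlier, these files would need to be mounted into the container's catalog directory, and Trino restarted, before the catalogs appear.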
Example
-- Query Iceberg tables on S3
SELECT date_trunc('month', event_time) AS month,
count(*) AS events,
count(DISTINCT user_id) AS users
FROM iceberg.analytics.events
WHERE event_time >= DATE '2026-01-01'
GROUP BY 1
ORDER BY 1;
Common pitfalls
- Not configuring memory limits properly. Trino queries can consume large amounts of memory for joins and aggregations. Set query memory limits and kill queries that exceed thresholds.
- Using Trino for transactional workloads. Trino is designed for analytical queries, not OLTP. Support for UPDATE, DELETE, and MERGE varies by connector, so check the connector documentation before relying on row-level writes.
- Ignoring connector-specific optimizations. Each connector has different pushdown capabilities. Learn which filters and aggregations each connector can handle natively for better performance.
- Failing to review community discussions and changelogs before upgrading. Breaking changes in major versions can disrupt existing workflows. Pin versions in production and test upgrades in staging first.
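For the memory-limit pitfall above, one possible starting point is to cap per-query memory in etc/config.properties. The sketch below appends two limits to a local copy of that file. The 20GB and 4GB values are illustrative placeholders, not sizing recommendations; suitable values depend on the cluster's node memory and workload.

```shell
# Sketch: append query memory limits to a local etc/config.properties.
# The values are placeholders chosen for illustration only.
mkdir -p etc

cat >> etc/config.properties <<'EOF'
# Upper bound on the memory a single query may use across the whole cluster
query.max-memory=20GB
# Upper bound on the memory a single query may use on any one worker
query.max-memory-per-node=4GB
EOF
```

A query that exceeds either limit is failed by Trino rather than being allowed to exhaust the cluster, which covers the "kill queries that exceed thresholds" advice without extra tooling.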
Frequently Asked Questions
Trino is the continuation of PrestoSQL, the fork created by the original Presto developers after they left Facebook; the project was renamed Trino in December 2020. PrestoDB is the original project, now governed by the Presto Foundation. Both share the same origins but have diverged. Trino has a more active community and faster release cadence.
Trino can serve as a query engine for lakehouse architectures, querying data on S3 with Iceberg or Delta Lake. For some workloads, this replaces traditional data warehouses. However, Trino does not manage data storage or optimize table layouts. Pair it with Iceberg for a complete lakehouse.
Trino uses a connector architecture. Each data source has a connector that translates SQL into source-native operations. A single query can reference tables from different connectors, and Trino handles the join across sources in memory.
Trino has connectors for Kafka and other streaming systems. You can query real-time data alongside historical data in the same SQL query. However, Trino is not a streaming engine; it executes point-in-time queries.
Trino scales horizontally by adding worker nodes. Production deployments handle petabytes of data and thousands of concurrent queries. Companies like Netflix, LinkedIn, and Lyft use Trino for large-scale analytics.
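As a minimal sketch of what adding a worker node involves, the commands below write a worker's config.properties into a local worker/etc/ directory. The coordinator hostname is a hypothetical placeholder, and the directory name is chosen only to keep the example separate from any coordinator configuration.

```shell
# Sketch: minimal config.properties for an additional worker node.
# coordinator.example.internal is a placeholder for the real coordinator host.
mkdir -p worker/etc

cat > worker/etc/config.properties <<'EOF'
# This node only executes tasks; it does not plan or coordinate queries
coordinator=false
http-server.http.port=8080
# Address where the worker registers itself with the coordinator
discovery.uri=http://coordinator.example.internal:8080
EOF
```

Each node also needs its own node.properties with a unique node.id, and the same catalog files as the rest of the cluster.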