Skills · Apr 15, 2026 · 3 min read

Trino — Fast Distributed SQL Query Engine for Data Lakes

The federated SQL engine formerly known as PrestoSQL. Query S3/HDFS/Iceberg/Delta/Hudi, MySQL, Postgres, Kafka, Cassandra and dozens more with ANSI SQL — in seconds, at petabyte scale.


Direct install command

npx -y tokrepo@latest install 976e6a2f-3920-11f1-9bc6-00163e2b0d79 --target codex

Run after dry-run confirms the install plan.

TL;DR

Federated SQL engine that queries S3, Iceberg, Delta, MySQL, Postgres, Kafka, and dozens more data sources with ANSI SQL at scale.

§01

What it is

Trino (formerly PrestoSQL) is a fast distributed SQL query engine designed for interactive analytics on data lakes and federated data sources. It executes ANSI SQL queries across Amazon S3, Apache Iceberg, Delta Lake, Hudi, MySQL, PostgreSQL, Kafka, Cassandra, and dozens of other connectors.

Trino targets data engineers and analysts who need to query data wherever it lives without moving it into a single warehouse. A single Trino query can join tables from S3 and PostgreSQL at query time, without first copying either dataset into the other system.

§02

How it saves time or tokens

Trino reduces the need for ETL pipelines that copy data between systems. Query data in place using standard SQL, reducing data movement costs and latency. Interactive queries often return results in seconds rather than the minutes or hours typical of batch ETL.

For AI data pipelines, Trino provides a single SQL interface to all data sources. LLMs can generate standard SQL without needing to know the underlying storage format.

§03

How to use

Deploy Trino with Docker:

docker run -d -p 8080:8080 --name trino trinodb/trino

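Connect with the Trino CLI. The catalog passed here must already be configured on the server; the stock Docker image only ships sample catalogs such as tpch and memory: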

trino --server localhost:8080 --catalog hive --schema default

Query data across sources:

SELECT o.order_id, c.name, o.total
FROM hive.sales.orders o
JOIN postgresql.public.customers c ON o.customer_id = c.id
WHERE o.created_at > DATE '2026-01-01'
ORDER BY o.total DESC
LIMIT 10;

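Configure a catalog for each data source by adding a properties file to the etc/catalog/ directory. The file name, minus the .properties extension, becomes the catalog name used in queries.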
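As a sketch of what such a file might contain, the fragment below defines the postgresql catalog referenced in the join above. The host, database name, and credentials are placeholders rather than values from this guide; the property names are those of the Trino PostgreSQL connector.

```properties
# etc/catalog/postgresql.properties
# The file name makes this catalog addressable as "postgresql" in SQL.
connector.name=postgresql
# Placeholder JDBC URL: replace host, port, and database with your own.
connection-url=jdbc:postgresql://example.net:5432/appdb
# Placeholder credentials: prefer a secrets mechanism in production.
connection-user=trino_reader
connection-password=change-me
```

With the Docker image, one way to supply this file is to mount a local catalog directory into the container's Trino configuration directory, then restart the container so the catalog is loaded.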

§04

Example

-- Query Iceberg tables on S3
SELECT date_trunc('month', event_time) AS month,
       count(*) AS events,
       count(DISTINCT user_id) AS users
FROM iceberg.analytics.events
WHERE event_time >= DATE '2026-01-01'
GROUP BY 1
ORDER BY 1;

§05

Related on TokRepo

AI Tools for Database — Database query tools and engines
AI Tools for Automation — Data pipeline automation tools

§06

Common pitfalls

Not configuring memory limits properly. Trino queries can consume large amounts of memory for joins and aggregations. Set query.max-memory and query.max-memory-per-node so that Trino fails queries that exceed them instead of starving the rest of the cluster.
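Using Trino for transactional workloads. Trino is designed for analytical queries, not OLTP. Row-level UPDATE, DELETE, and MERGE work only on connectors that implement them, such as Iceberg and Delta Lake, and are not suited to high-frequency single-row writes.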
Ignoring connector-specific optimizations. Each connector has different pushdown capabilities. Learn which filters and aggregations each connector can handle natively for better performance.
Upgrading without reading the release notes. Breaking changes between releases can disrupt existing workflows. Pin versions in production and test upgrades in staging first.
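The memory and pushdown pitfalls can both be probed from an ordinary SQL session. The sketch below assumes the postgresql catalog and customers table from the earlier join; the limit values and the customer id are arbitrary illustrations. query_max_memory and query_max_execution_time are system session properties and apply only to the current session.

```sql
-- Cap distributed memory and wall-clock time for queries in this session.
-- The values are illustrative, not recommendations.
SET SESSION query_max_memory = '2GB';
SET SESSION query_max_execution_time = '10m';

-- Inspect the plan to see how much work the connector takes on.
-- When a predicate is pushed down, it usually appears inside the table scan
-- instead of as a separate filter step executed by Trino.
EXPLAIN
SELECT c.id, c.name
FROM postgresql.public.customers c
WHERE c.id = 42;
```

Comparing the EXPLAIN output for the same filter against different catalogs is a quick way to learn what each connector handles natively.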

Frequently Asked Questions

What is the difference between Trino and Presto?

Trino is the continuation of PrestoSQL, the fork started by Presto's original creators after they left Facebook; the project was renamed Trino in December 2020. PrestoDB is the original codebase, now governed by the Presto Foundation. Both share the same origins but have diverged in features and connectors. Trino tends to ship releases more frequently.

Can Trino replace a data warehouse?

Trino can serve as a query engine for lakehouse architectures, querying data on S3 with Iceberg or Delta Lake. For some workloads, this replaces traditional data warehouses. However, Trino does not store data itself; table layout, snapshots, and schema evolution come from the table format. Pair it with Iceberg for a complete lakehouse.

How does Trino handle federation?

Trino uses a connector architecture. Each data source has a connector that translates SQL into source-native operations. A single query can reference tables from different connectors, and Trino handles the join across sources in memory.
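To see which connectors a cluster exposes before writing a federated query, the metadata statements below can help. The postgresql catalog and the public.customers table are carried over from the earlier example and are assumed to exist.

```sql
-- List every catalog (one per configured connector instance).
SHOW CATALOGS;

-- Drill into one catalog: schemas, then tables, then columns.
SHOW SCHEMAS FROM postgresql;
SHOW TABLES FROM postgresql.public;
DESCRIBE postgresql.public.customers;
```

The same statements work against any catalog, which is what lets a tool or an LLM discover table and column names without knowing the underlying storage system.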

Does Trino support real-time data?

Trino has connectors for Kafka and other streaming systems. You can query real-time data alongside historical data in the same SQL query. However, Trino is not a streaming engine; it executes point-in-time queries.
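A minimal sketch of mixing fresh and historical data in one statement follows. It assumes a Kafka catalog named kafka with a topic exposed as default.events, both hypothetical, and reuses the Iceberg table from the example section. _timestamp is an internal column provided by the Kafka connector.

```sql
-- Messages that arrived on the (hypothetical) Kafka topic in the last hour.
SELECT 'kafka_last_hour' AS source, count(*) AS events
FROM kafka.default.events
WHERE _timestamp >= current_timestamp - INTERVAL '1' HOUR
UNION ALL
-- Historical events already landed in the Iceberg table.
SELECT 'iceberg_since_2026' AS source, count(*) AS events
FROM iceberg.analytics.events
WHERE event_time >= DATE '2026-01-01';
```

Each run reads whatever is in the topic at that moment, so the counts are a snapshot rather than a continuously updated result.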

What scale can Trino handle?

Trino scales horizontally by adding worker nodes. Production deployments handle petabytes of data and thousands of concurrent queries. Companies like Netflix, LinkedIn, and Lyft use Trino for large-scale analytics.

