Configs · April 15, 2026 · 1 min read

Trino — Fast Distributed SQL Query Engine for Data Lakes

The federated SQL engine formerly known as PrestoSQL. Query S3/HDFS/Iceberg/Delta/Hudi, MySQL, Postgres, Kafka, Cassandra and dozens more with ANSI SQL — in seconds, at petabyte scale.

Introduction

Trino is a high-performance distributed SQL engine built for interactive analytics across heterogeneous data sources. The project was forked from Presto as PrestoSQL in 2019 and renamed Trino in December 2020. It is used at Netflix, LinkedIn, Pinterest, Shopify and many others to query petabytes of data with ANSI SQL.

What Trino Does

  • Executes standard ANSI SQL across dozens of connectors (Hive, Iceberg, Delta Lake, Hudi, Kafka, Kudu, MySQL, PostgreSQL, Cassandra, Redis, Elasticsearch, Oracle, MongoDB and more), with data on S3, GCS or HDFS reached through the lake connectors.
  • Pushes predicates, projections and aggregates into source systems when possible.
  • Runs distributed joins, window functions, CTEs, recursive queries and SQL-native JSON.
  • Supports row- and column-level access control with Ranger, OPA or file rules.
  • Provides a REST API and JDBC/ODBC/Python/Go clients for BI and apps.
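
As a rough illustration of the REST API mentioned above, the sketch below submits a statement to the coordinator's `/v1/statement` endpoint and follows `nextUri` until the result is exhausted. It uses only the Python standard library. The catalog, schema and table names in the query, the `analyst` user and the helper names `http_json` and `run_query` are assumptions made for this example; a production client would normally be one of the official drivers.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

# Hypothetical catalog, schema and table names, for illustration only.
FEDERATED_SQL = """
SELECT c.name, sum(o.total) AS revenue
FROM iceberg.sales.orders AS o
JOIN postgres.public.customers AS c ON o.customer_id = c.id
GROUP BY c.name
"""


def http_json(url: str, body: Optional[bytes] = None, user: str = "analyst") -> dict:
    """POST when a body is given, otherwise GET; decode the JSON reply."""
    req = urllib.request.Request(url, data=body, headers={"X-Trino-User": user})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def run_query(base_url: str, sql: str, send: Callable[..., dict] = http_json) -> Iterator[list]:
    """Submit SQL to /v1/statement and yield rows while following nextUri."""
    reply = send(f"{base_url}/v1/statement", sql.encode("utf-8"))
    while True:
        if "error" in reply:
            raise RuntimeError(reply["error"].get("message", "query failed"))
        # Not every page carries rows; early pages often only report state.
        yield from reply.get("data", [])
        next_uri = reply.get("nextUri")
        if next_uri is None:
            return  # no nextUri means the query has finished
        reply = send(next_uri)
```

Assuming a coordinator is reachable at `http://localhost:8080`, iterating over `run_query("http://localhost:8080", FEDERATED_SQL)` would stream the joined rows. The `send` parameter exists so the paging loop can be exercised without a live cluster.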

Architecture Overview

A coordinator parses and plans each query and breaks it into stages. Each stage runs as tasks on a cluster of workers, which exchange pages of columnar data with one another. Each connector provides metadata, splits and page sources so the engine can stream data in parallel. Trino is purely an execution engine: it stores no data itself, so you can spin it up and down freely and scale horizontally on Kubernetes, EC2 or bare metal.
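
The division of labour described above can be mimicked with a toy model. This is not Trino code: the function names are invented for the example, threads stand in for worker nodes, and the final merge runs inside the `coordinator` function purely for brevity (in Trino the final aggregation stage also runs on workers).

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, split_size):
    """Connector role: cut a table into independently readable splits."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def worker_task(split):
    """Worker role: scan one split and emit a partial aggregate."""
    partial = defaultdict(int)
    for key, value in split:
        partial[key] += value
    return dict(partial)


def coordinator(rows, workers=3, split_size=2):
    """Coordinator role: schedule splits on workers, then merge partials."""
    final = defaultdict(int)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(worker_task, make_splits(rows, split_size)):
            for key, value in partial.items():
                final[key] += value
    return dict(final)
```

Calling `coordinator([("us", 10), ("eu", 5), ("us", 7)])` sums the values per key, which corresponds to a `GROUP BY` with a partial and a final aggregation step.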

Self-Hosting & Configuration

  • Launch with Helm (trino/trino), Docker, or tarball install of one coordinator + N workers.
  • Declare connectors in etc/catalog/*.properties — e.g. iceberg.properties, postgres.properties.
  • Tune query.max-memory-per-node, task.writer-count and spill-to-disk for big joins.
  • Hook up auth: password file, LDAP, OAuth2, Kerberos or JWT; add impersonation rules.
  • Deploy Trino Gateway to route queries across multiple clusters or versions.
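
To make the catalog step concrete, here is a small sketch that writes the two example catalog files into `etc/catalog/`. The hostnames, database name, user and the `PG_PASSWORD` environment variable are placeholders, and the Iceberg entry assumes a REST catalog; a Hive Metastore or Glue setup would use different properties. The helper names are invented for this example.

```python
from pathlib import Path


def render_properties(props: dict) -> str:
    """Render key=value lines in the properties format Trino reads."""
    return "".join(f"{key}={value}\n" for key, value in props.items())


# Placeholder endpoints and credentials; replace with your own.
CATALOGS = {
    "iceberg": {
        "connector.name": "iceberg",
        "iceberg.catalog.type": "rest",
        "iceberg.rest-catalog.uri": "http://rest-catalog.example:8181",
    },
    "postgres": {
        "connector.name": "postgresql",
        "connection-url": "jdbc:postgresql://db.example:5432/app",
        "connection-user": "trino",
        # Resolved by Trino from the environment at startup.
        "connection-password": "${ENV:PG_PASSWORD}",
    },
}


def write_catalogs(etc_dir: Path, catalogs: dict = CATALOGS) -> list:
    """Write one <name>.properties file per catalog under etc/catalog/."""
    catalog_dir = etc_dir / "catalog"
    catalog_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, props in catalogs.items():
        path = catalog_dir / f"{name}.properties"
        path.write_text(render_properties(props))
        written.append(path)
    return written
```

The file name becomes the catalog name in SQL, so `postgres.properties` is queried as `postgres.<schema>.<table>`.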

Key Features

  • Interactive latency (seconds) over PB-scale lakes thanks to columnar pipelined execution.
  • First-class Apache Iceberg and Delta Lake support: time-travel, schema evolution, MERGE INTO.
  • Fault-tolerant execution: with a retry policy and an exchange manager configured, failed tasks or queries are retried automatically, which suits queries that run for hours.
  • Full ANSI SQL — window functions, CTEs, geospatial, JSON, array and map types.
  • Dynamic filtering, cost-based optimiser and adaptive join reordering.
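
Dynamic filtering is easiest to see in miniature. The toy hash join below collects the join keys from the small build side and uses them to discard probe-side rows before they reach the join. It is a simplified model rather than Trino's implementation: in Trino the filter is handed to the probe-side table scan, where a connector may use it to skip whole files or partitions. The function name and row layout are assumptions for the example.

```python
def join_with_dynamic_filter(build_rows, probe_rows, key):
    """Toy hash join that prunes probe rows using build-side keys."""
    # Build phase: hash the (small) dimension side by join key.
    hash_table = {}
    for row in build_rows:
        hash_table.setdefault(row[key], []).append(row)

    # The dynamic filter is simply the set of keys seen on the build side.
    dynamic_filter = set(hash_table)

    scanned, joined = 0, []
    for row in probe_rows:
        if row[key] not in dynamic_filter:
            continue  # pruned before the join, as if never read
        scanned += 1
        for match in hash_table[row[key]]:
            joined.append({**match, **row})
    return joined, scanned
```

The second return value counts the probe rows that survived the filter, which makes the saving visible when the build side is selective.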

Comparison with Similar Tools

  • Presto (PrestoDB): the sibling project that continued at Meta; a similar engine under separate governance (the Presto Foundation, hosted by the Linux Foundation).
  • Apache Spark SQL: strong for batch ETL; Trino typically gives lower latency for interactive BI.
  • Dremio: commercial lakehouse query engine with Arrow and reflections.
  • StarRocks / Doris: MPP databases; they store data, Trino does not.
  • BigQuery / Snowflake: managed warehouses; Trino is the open-source federated alternative.

FAQ

Q: Trino vs Presto?
A: Both descend from the same codebase. Trino is the original team's fork; Presto is Meta's continuation.

Q: Can Trino handle ETL as well as BI?
A: Yes, especially with fault-tolerant execution enabled for long queries.

Q: Do I need Hive Metastore?
A: For Hive and some Iceberg setups, yes. Alternatively, use an Iceberg REST catalog, Glue, Nessie or Unity.

Q: Is Trino free for commercial use?
A: Yes. It is licensed under Apache 2.0, and commercial support is available from Starburst and others.
