# Apache SeaTunnel — High-Performance Data Integration Engine

> Fast, distributed, cloud-native data integration tool for batch and streaming data synchronization across 100+ sources and sinks.

## Quick Use

```bash
# Download SeaTunnel (Zeta engine)
wget https://dlcdn.apache.org/seatunnel/2.3.5/apache-seatunnel-2.3.5-bin.tar.gz
tar xzf apache-seatunnel-2.3.5-bin.tar.gz && cd apache-seatunnel-2.3.5

# Install connector plugins (example: jdbc, kafka, clickhouse)
sh bin/install-plugin.sh 2.3.5

# Run a job with the Zeta engine
./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local
```

Example config:

```hocon
env {
  parallelism = 2
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:mysql://db/app"
    query = "SELECT id, name FROM users"
  }
}

transform {
  Sql {
    sql = "select id, upper(name) as name from source"
  }
}

sink {
  Console {}
}
```

## Introduction

Apache SeaTunnel is a high-performance, distributed data integration platform that moves huge amounts of data between heterogeneous systems — databases, data lakes, message queues, SaaS APIs, and file stores — for batch or streaming workloads. Its pluggable connector architecture and Zeta engine make it a modern alternative to Sqoop, DataX, and traditional ETL tools.

## What SeaTunnel Does

- Synchronizes data across 100+ sources/sinks: MySQL, Postgres, Kafka, Iceberg, Hudi, S3, ClickHouse, MongoDB, Elasticsearch, and more.
- Runs batch and streaming jobs with exactly-once semantics.
- Supports CDC ingestion from MySQL, Postgres, SQL Server, MongoDB, and Oracle.
- Executes on its own "Zeta" engine or on Spark and Flink for big-data workloads.
- Declares jobs with HOCON config — no code required for most sync scenarios.

## Architecture Overview

A SeaTunnel job is a DAG of Source → Transform → Sink plugins. The job manager compiles the config, assigns tasks to task managers, and coordinates checkpoints.
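The same Source → Transform → Sink shape drives streaming jobs as well; only the job mode and source change. A minimal CDC sketch follows (option names such as `base-url` and `table-names` are taken from the MySQL-CDC connector and may vary by SeaTunnel version; the credentials and file name are placeholders):

```hocon
env {
  parallelism = 1
  # STREAMING keeps the job running and enables periodic checkpoints
  job.mode = "STREAMING"
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://db:3306/app"
    username = "seatunnel"
    password = "example-password"   # placeholder; prefer an include or env var
    table-names = ["app.users"]
  }
}

sink {
  Console {}
}
```

Launched the same way as the batch example, e.g. `./bin/seatunnel.sh --config ./config/mysql-cdc.streaming.conf -e local` (the config path here is illustrative).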
The Zeta engine provides native distributed execution with its own scheduler and KV state; alternatively, jobs can run on Flink or Spark. Connectors implement the Connector V2 API with parallel splits, schema inference, and exactly-once sinks.

## Self-Hosting & Configuration

- Packaged as a tarball; run standalone, in a cluster, or on Kubernetes via Helm.
- Use Zeta mode (`-e local` or `cluster`) for lightweight deployments, Flink/Spark for scale-out.
- Add connectors with `install-plugin.sh`; plugin jars load from the `connectors/` directory.
- Provide credentials via HOCON includes or environment variables, avoiding plaintext in Git.
- Monitor jobs via the SeaTunnel Web UI, REST API, Prometheus metrics, and OpenTelemetry.

## Key Features

- Connector V2 API with unified batch + stream + CDC semantics.
- Exactly-once state via checkpointing across all supported engines.
- Schema evolution, dynamic routing, and conditional splits in the transform stage.
- Pluggable engines: Zeta, Flink, and Spark — reuse existing cluster investments.
- Full CDC suite with Debezium-powered connectors for major databases.

## Comparison with Similar Tools

- **Airbyte** — Great SaaS connector catalog and UI; SeaTunnel optimizes for huge DB/lake throughput.
- **Apache NiFi** — Flow-based GUI; SeaTunnel is config-first with stronger CDC and lakehouse support.
- **Apache Gobblin** — LinkedIn's ingestion tool; SeaTunnel is newer and Flink/Spark-native.
- **DataX** (Alibaba) — Batch only; SeaTunnel adds streaming, CDC, and cluster execution.
- **Debezium** — Pure CDC; SeaTunnel embeds Debezium and adds transforms and many sinks.

## FAQ

**Q:** Which engine should I pick?
A: Zeta for lightweight, self-contained clusters. Flink for streaming at scale. Spark for giant batch jobs reusing Spark infra.

**Q:** Does it support CDC from Postgres?
A: Yes — via the `postgres-cdc` connector backed by Debezium, with snapshot and streaming phases.

**Q:** Can I write custom connectors?
A: Yes — implement the Connector V2 interfaces in Java/Scala; connectors load as plugins.

**Q:** Is there a UI for non-engineers?
A: The SeaTunnel Web sub-project offers a UI for creating and scheduling jobs.

## Sources

- https://github.com/apache/seatunnel
- https://seatunnel.apache.org/docs

---

Source: https://tokrepo.com/en/workflows/b9625074-3931-11f1-9bc6-00163e2b0d79
Author: Script Depot