# Apache SeaTunnel — High-Performance Data Integration Engine

> Fast, distributed, cloud-native data integration tool for batch and streaming data synchronization across 100+ sources and sinks.

## Quick Use

```bash
# Download SeaTunnel (Zeta engine)
wget https://dlcdn.apache.org/seatunnel/2.3.5/apache-seatunnel-2.3.5-bin.tar.gz
tar xzf apache-seatunnel-2.3.5-bin.tar.gz && cd apache-seatunnel-2.3.5

# Install connector plugins (example: jdbc, kafka, clickhouse)
sh bin/install-plugin.sh 2.3.5

# Run a job with the Zeta engine
./bin/seatunnel.sh --config ./config/v2.batch.config.template -e local
```

Example config:

```hocon
env {
  parallelism = 2
  job.mode = "BATCH"
}

source {
  Jdbc {
    url = "jdbc:mysql://db/app"
    query = "SELECT id, name FROM users"
  }
}

transform {
  Sql {
    sql = "select id, upper(name) as name from source"
  }
}

sink {
  Console {}
}
```

## Introduction

Apache SeaTunnel is a high-performance, distributed data integration platform that moves huge amounts of data between heterogeneous systems — databases, data lakes, message queues, SaaS APIs, and file stores — for batch or streaming workloads. Its pluggable connector architecture and Zeta engine make it a modern alternative to Sqoop, DataX, and traditional ETL tools.

## What SeaTunnel Does

- Synchronizes data across 100+ sources/sinks: MySQL, Postgres, Kafka, Iceberg, Hudi, S3, ClickHouse, MongoDB, Elasticsearch, and more.
- Runs batch and streaming jobs with exactly-once semantics.
- Supports CDC ingestion from MySQL, Postgres, SQL Server, MongoDB, and Oracle.
- Executes on its own "Zeta" engine or on Spark and Flink for big-data workloads.
- Declares jobs with HOCON config — no code required for most sync scenarios.

## Architecture Overview

A SeaTunnel job is a DAG of Source → Transform → Sink plugins. The job manager compiles the config, assigns tasks to task managers, and coordinates checkpoints.
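The same Source → Transform → Sink shape drives streaming jobs as well; only the job mode and source change. A minimal CDC sketch follows (option names such as `base-url` and `table-names` are taken from the MySQL-CDC connector and may vary by SeaTunnel version; the credentials and file name are placeholders):

```hocon
env {
  parallelism = 1
  # STREAMING keeps the job running and enables periodic checkpoints
  job.mode = "STREAMING"
  checkpoint.interval = 10000
}

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://db:3306/app"
    username = "seatunnel"
    password = "example-password"   # placeholder; prefer an include or env var
    table-names = ["app.users"]
  }
}

sink {
  Console {}
}
```

Launched the same way as the batch example, e.g. `./bin/seatunnel.sh --config ./config/mysql-cdc.streaming.conf -e local` (the config path here is illustrative).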
The Zeta engine provides native distributed execution with its own scheduler and KV state; alternatively, jobs can run on Flink or Spark. Connectors implement the Connector V2 API with parallel splits, schema inference, and exactly-once sinks.

## Self-Hosting & Configuration

- Packaged as a tarball; run standalone, in a cluster, or on Kubernetes via Helm.
- Use Zeta mode (`-e local` or `cluster`) for lightweight deployments, Flink/Spark for scale-out.
- Add connectors with `install-plugin.sh`; plugin jars load from the `connectors/` directory.
- Provide credentials via HOCON includes or environment variables, avoiding plaintext in Git.
- Monitor jobs via the SeaTunnel Web UI, REST API, Prometheus metrics, and OpenTelemetry.

## Key Features

- Connector V2 API with unified batch + stream + CDC semantics.
- Exactly-once state via checkpointing across all supported engines.
- Schema evolution, dynamic routing, and conditional splits in the transform stage.
- Pluggable engines: Zeta, Flink, and Spark — reuse existing cluster investments.
- Full CDC suite with Debezium-powered connectors for major databases.

## Comparison with Similar Tools

- **Airbyte** — Great SaaS connector catalog and UI; SeaTunnel optimizes for huge DB/lake throughput.
- **Apache NiFi** — Flow-based GUI; SeaTunnel is config-first with stronger CDC and lakehouse support.
- **Apache Gobblin** — LinkedIn's ingestion tool; SeaTunnel is newer and Flink/Spark-native.
- **DataX** (Alibaba) — Batch only; SeaTunnel adds streaming, CDC, and cluster execution.
- **Debezium** — Pure CDC; SeaTunnel embeds Debezium and adds transforms and many sinks.

## FAQ

**Q:** Which engine should I pick?
A: Zeta for lightweight, self-contained clusters. Flink for streaming at scale. Spark for giant batch jobs reusing Spark infra.

**Q:** Does it support CDC from Postgres?
A: Yes — via the `postgres-cdc` connector backed by Debezium, with snapshot and streaming phases.

**Q:** Can I write custom connectors?
A: Yes — implement the Connector V2 interfaces in Java/Scala; connectors load as plugins.

**Q:** Is there a UI for non-engineers?
A: The SeaTunnel Web sub-project offers a UI for creating and scheduling jobs.

## Sources

- https://github.com/apache/seatunnel
- https://seatunnel.apache.org/docs

---

Source: https://tokrepo.com/en/workflows/b9625074-3931-11f1-9bc6-00163e2b0d79
Author: Script Depot