Configs2026年5月19日·1 分钟阅读

Flink CDC — Real-Time Change Data Capture for Apache Flink

Flink CDC is a streaming data integration framework built on Apache Flink. It captures row-level changes from databases like MySQL, PostgreSQL, and MongoDB in real time and delivers them as Flink DataStreams for processing, transformation, and synchronization.

Agent 就绪

这个资产可以被 Agent 直接读取和安装

TokRepo 同时提供通用 CLI 命令、安装契约、metadata JSON、按适配器生成的安装计划和原始内容链接,方便 Agent 判断适配度、风险和下一步动作。

Needs Confirmation · 64/100策略:需确认
Agent 入口
任意 MCP/CLI Agent
类型
Skill
安装
Single
信任
信任等级:Established
入口
Flink CDC Overview
通用 CLI 安装命令
npx tokrepo install 9ef19846-5318-11f1-9bc6-00163e2b0d79

Introduction

Flink CDC brings change-data-capture into the Apache Flink ecosystem. It reads database transaction logs (binlog, WAL, oplog) and converts them into Flink streams, enabling real-time ETL, data lake ingestion, and cross-database synchronization without custom glue code.

What Flink CDC Does

  • Reads binlog/WAL/oplog from MySQL, PostgreSQL, MongoDB, Oracle, SQL Server, and more
  • Delivers insert, update, and delete events as structured Flink DataStreams
  • Supports full snapshot followed by continuous incremental capture in a single job
  • Provides a YAML-based pipeline definition for codeless database-to-database sync
  • Handles schema evolution by propagating DDL changes downstream automatically

Architecture Overview

Flink CDC connectors embed Debezium engines within Flink source operators. On startup a snapshot reader performs a parallel chunked scan of existing data, then hands off to a binlog reader for ongoing changes. Events are checkpointed using Flink exactly-once semantics so no data is lost or duplicated, even across restarts.

Self-Hosting & Configuration

  • Deploy Apache Flink 1.18+ and add the appropriate CDC connector JARs to the lib directory
  • Configure source database credentials and binlog/WAL access permissions
  • Define a pipeline in YAML or write a Flink job in Java specifying source tables and sink targets
  • Tune parallelism and checkpoint intervals for throughput and latency requirements
  • Monitor via the Flink Web UI or integrate with Prometheus metrics

Key Features

  • Exactly-once processing semantics for reliable data delivery
  • Parallel snapshot reading using table chunk splitting for fast initial loads
  • Schema evolution support propagates ALTER TABLE changes to downstream sinks
  • Codeless YAML pipeline mode for common sync scenarios
  • Compatible with the full Apache Flink ecosystem including SQL, Table API, and DataStream API

Comparison with Similar Tools

  • Debezium — Standalone CDC platform using Kafka Connect; Flink CDC embeds Debezium inside Flink for tighter integration
  • Airbyte — General ELT platform with CDC connectors, but batch-oriented rather than continuous streaming
  • AWS DMS — Managed CDC service locked to AWS, not open source
  • Canal — Alibaba MySQL binlog reader focused on MySQL-only use cases
  • Maxwell — Lightweight MySQL-only binlog reader that writes to Kafka; no built-in transformation

FAQ

Q: Do I need Kafka to use Flink CDC? A: No. Flink CDC reads database logs directly without requiring an intermediate message queue.

Q: Which databases are supported? A: MySQL, PostgreSQL, MongoDB, Oracle, SQL Server, Db2, OceanBase, TiDB, and Vitess, with the list growing.

Q: Can Flink CDC handle schema changes automatically? A: Yes. The pipeline mode can propagate DDL changes like column additions to supported sinks.

Q: What is the difference between Flink CDC and Debezium? A: Flink CDC uses Debezium internally but runs inside the Flink runtime, giving you access to Flink SQL, exactly-once checkpointing, and the full Flink ecosystem.

Sources

讨论

登录后参与讨论。
还没有评论,来写第一条吧。

相关资产