Configs · May 19, 2026 · 3 min read

Flink CDC — Real-Time Change Data Capture for Apache Flink

Flink CDC is a streaming data integration framework built on Apache Flink. It captures row-level changes from databases like MySQL, PostgreSQL, and MongoDB in real time and delivers them as Flink DataStreams for processing, transformation, and synchronization.

Universal CLI install command
npx tokrepo install 9ef19846-5318-11f1-9bc6-00163e2b0d79

Introduction

Flink CDC brings change-data-capture into the Apache Flink ecosystem. It reads database transaction logs (binlog, WAL, oplog) and converts them into Flink streams, enabling real-time ETL, data lake ingestion, and cross-database synchronization without custom glue code.

What Flink CDC Does

  • Reads binlog/WAL/oplog from MySQL, PostgreSQL, MongoDB, Oracle, SQL Server, and more
  • Delivers insert, update, and delete events as structured Flink DataStreams
  • Supports full snapshot followed by continuous incremental capture in a single job
  • Provides a YAML-based pipeline definition for codeless database-to-database sync
  • Handles schema evolution by propagating DDL changes downstream automatically
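
To make the insert, update, and delete events above concrete, here is a minimal sketch of how a consumer might apply them to keep a replica in sync. The `ChangeEvent` class and `apply_event` function are hypothetical names invented for illustration; they are not Flink CDC types, and the real event format carries more metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical row-level change record (not the Flink CDC type)."""
    op: str                        # "insert", "update", or "delete"
    key: int                       # primary key of the affected row
    before: Optional[dict] = None  # row image before the change
    after: Optional[dict] = None   # row image after the change


def apply_event(table: dict, event: ChangeEvent) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    if event.op in ("insert", "update"):
        table[event.key] = dict(event.after)
    elif event.op == "delete":
        table.pop(event.key, None)
    else:
        raise ValueError(f"unknown op: {event.op}")


events = [
    ChangeEvent("insert", 1, after={"id": 1, "name": "ada"}),
    ChangeEvent("insert", 2, after={"id": 2, "name": "bob"}),
    ChangeEvent("update", 1,
                before={"id": 1, "name": "ada"},
                after={"id": 1, "name": "ada l."}),
    ChangeEvent("delete", 2, before={"id": 2, "name": "bob"}),
]

replica = {}
for event in events:
    apply_event(replica, event)

print(replica)  # {1: {'id': 1, 'name': 'ada l.'}}
```

Applying events strictly in log order is what keeps the replica consistent with the source table.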

Architecture Overview

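Flink CDC connectors embed Debezium engines within Flink source operators. On startup, a snapshot reader performs a parallel, chunked scan of the existing data, then hands off to a log reader (the binlog reader, in the MySQL case) for ongoing changes. Source positions are checkpointed with Flink's exactly-once state semantics, so a restarted job resumes from the last checkpoint without losing changes; whether downstream output is also free of duplicates depends on the sink being transactional or idempotent.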
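
The snapshot-then-log handoff can be sketched in a few lines. This is a simplified model, not the connector's algorithm: it assumes integer primary keys, a snapshot that is consistent at one known log offset, and no writes during the scan (the real connector has to reconcile changes that arrive while chunks are being read). The names `split_into_chunks` and `snapshot_then_stream` are hypothetical.

```python
def split_into_chunks(min_key: int, max_key: int, chunk_size: int) -> list:
    """Split the inclusive key range [min_key, max_key] into half-open chunks."""
    chunks = []
    start = min_key
    while start <= max_key:
        end = min(start + chunk_size, max_key + 1)
        chunks.append((start, end))  # covers start <= key < end
        start = end
    return chunks


def snapshot_then_stream(rows: dict, log: list, snapshot_offset: int,
                         chunk_size: int) -> dict:
    """Copy existing rows chunk by chunk, then replay log entries past the offset."""
    replica = {}
    if rows:
        for lo, hi in split_into_chunks(min(rows), max(rows), chunk_size):
            # In a real job each chunk could go to a separate parallel reader.
            for key in range(lo, hi):
                if key in rows:
                    replica[key] = rows[key]
    # Handoff: replay only the changes recorded after the snapshot position.
    for offset, op, key, value in log:
        if offset <= snapshot_offset:
            continue
        if op == "delete":
            replica.pop(key, None)
        else:
            replica[key] = value
    return replica


chunks = split_into_chunks(1, 10, 4)
print(chunks)  # [(1, 5), (5, 9), (9, 11)]

# Table contents as of log offset 4, plus the full change log.
rows_at_snapshot = {1: "a", 2: "b", 5: "e", 10: "j"}
change_log = [
    (1, "upsert", 1, "a"),
    (2, "upsert", 2, "b"),
    (3, "upsert", 5, "e"),
    (4, "upsert", 10, "j"),
    (5, "upsert", 2, "B"),    # happens after the snapshot
    (6, "delete", 5, None),   # happens after the snapshot
]

result = snapshot_then_stream(rows_at_snapshot, change_log,
                              snapshot_offset=4, chunk_size=4)
print(result)  # {1: 'a', 2: 'B', 10: 'j'}
```

Skipping log entries at or before the snapshot offset is what prevents the initial load from being applied twice.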

Self-Hosting & Configuration

  • Deploy Apache Flink 1.18+ and add the appropriate CDC connector JARs to the lib directory
  • Configure source database credentials and binlog/WAL access permissions
  • Define a pipeline in YAML or write a Flink job in Java specifying source tables and sink targets
  • Tune parallelism and checkpoint intervals for throughput and latency requirements
  • Monitor via the Flink Web UI or integrate with Prometheus metrics
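
As a rough illustration of the YAML pipeline step above, the sketch below builds a pipeline definition as a Python dict and renders it as block-style YAML using only the standard library. The top-level `source`, `sink`, and `pipeline` sections and the option names follow the layout of the Flink CDC quickstart as best recalled; treat every key, the Doris sink choice, and all hostnames and credentials as assumptions to check against the documentation for your release. The `to_yaml` helper is hypothetical and handles only nested dicts of plain scalars.

```python
def to_yaml(mapping: dict, indent: int = 0) -> str:
    """Render a nested dict of scalars as simple block-style YAML.

    Deliberately minimal: no lists, no quoting, no escaping.
    """
    lines = []
    pad = " " * indent
    for key, value in mapping.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 2))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


# All values below are placeholders (assumptions), not working settings.
pipeline_definition = {
    "source": {
        "type": "mysql",
        "hostname": "localhost",
        "port": 3306,
        "username": "cdc_user",
        "password": "change-me",
        "tables": "app_db.orders",
    },
    "sink": {
        "type": "doris",
        "fenodes": "127.0.0.1:8030",
        "username": "root",
        "password": "change-me",
    },
    "pipeline": {
        "name": "orders-sync",
        "parallelism": 2,
    },
}

rendered = to_yaml(pipeline_definition)
print(rendered)
```

Generating the file from code is optional; the same definition can be written by hand. Keeping it as data is convenient when many similar pipelines differ only in table names.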

Key Features

  • Exactly-once processing semantics for reliable data delivery
  • Parallel snapshot reading using table chunk splitting for fast initial loads
  • Schema evolution support propagates ALTER TABLE changes to downstream sinks
  • Codeless YAML pipeline mode for common sync scenarios
  • Compatible with the full Apache Flink ecosystem including SQL, Table API, and DataStream API
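
The exactly-once guarantee listed above rests on saving the read position and the derived state together. The toy model below shows the idea under simplifying assumptions: a single reader, an in-memory log, and a hypothetical `CountingJob` class that has nothing to do with Flink's actual checkpoint API.

```python
import copy


class CountingJob:
    """Toy job: reads a change log and keeps a per-key event count as state."""

    def __init__(self, log):
        self.log = log
        self.offset = 0   # next log position to read
        self.counts = {}  # state derived from the events consumed so far

    def step(self):
        key = self.log[self.offset]
        self.counts[key] = self.counts.get(key, 0) + 1
        self.offset += 1

    def checkpoint(self):
        # Offset and state are captured together, so they always agree.
        return {"offset": self.offset, "counts": copy.deepcopy(self.counts)}

    def restore(self, snapshot):
        self.offset = snapshot["offset"]
        self.counts = copy.deepcopy(snapshot["counts"])


log = ["a", "b", "a", "c", "a", "b"]

job = CountingJob(log)
for _ in range(3):
    job.step()
saved = job.checkpoint()  # checkpoint taken after three events
job.step()                # a fourth event is processed, then the job "crashes"

recovered = CountingJob(log)
recovered.restore(saved)  # roll back to the checkpointed offset and state
while recovered.offset < len(log):
    recovered.step()      # the fourth event is replayed against restored state

print(recovered.counts)   # {'a': 3, 'b': 2, 'c': 1}
```

The fourth event is read twice overall, but because the state is rolled back along with the offset, it is reflected only once in the final counts.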

Comparison with Similar Tools

  • Debezium — Standalone CDC platform using Kafka Connect; Flink CDC embeds Debezium inside Flink for tighter integration
  • Airbyte — General ELT platform with CDC connectors, but batch-oriented rather than continuous streaming
  • AWS DMS — Managed CDC service locked to AWS, not open source
  • Canal — Alibaba MySQL binlog reader focused on MySQL-only use cases
  • Maxwell — Lightweight MySQL-only binlog reader that writes to Kafka; no built-in transformation

FAQ

Q: Do I need Kafka to use Flink CDC? A: No. Flink CDC reads database logs directly without requiring an intermediate message queue.

Q: Which databases are supported? A: MySQL, PostgreSQL, MongoDB, Oracle, SQL Server, Db2, OceanBase, TiDB, and Vitess, with the list growing.

Q: Can Flink CDC handle schema changes automatically? A: Yes. The pipeline mode can propagate DDL changes like column additions to supported sinks.

Q: What is the difference between Flink CDC and Debezium? A: Flink CDC uses Debezium internally but runs inside the Flink runtime, giving you access to Flink SQL, exactly-once checkpointing, and the full Flink ecosystem.
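
The schema-change answer above can be pictured as DDL events travelling in the same ordered stream as row changes. The sketch below is a conceptual model only: the event dictionaries, the `apply_stream` function, and the starting schema are all hypothetical, and real sinks differ in which changes they accept.

```python
def apply_stream(events):
    """Apply data and schema-change events, in order, to a toy sink table."""
    columns = ["id", "name"]  # initial sink schema (an assumption)
    rows = {}
    for event in events:
        kind = event["kind"]
        if kind == "add_column":
            # Propagate the DDL: extend the schema and backfill existing rows.
            columns.append(event["column"])
            for row in rows.values():
                row[event["column"]] = event.get("default")
        elif kind == "upsert":
            row = event["row"]
            unknown = set(row) - set(columns)
            if unknown:
                raise ValueError(
                    f"row has columns missing from sink: {sorted(unknown)}")
            # Columns the row does not supply are stored as None.
            rows[row["id"]] = {c: row.get(c) for c in columns}
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return columns, rows


events = [
    {"kind": "upsert", "row": {"id": 1, "name": "ada"}},
    {"kind": "add_column", "column": "email", "default": None},
    {"kind": "upsert",
     "row": {"id": 2, "name": "bob", "email": "bob@example.com"}},
]

columns, rows = apply_stream(events)
print(columns)  # ['id', 'name', 'email']
print(rows[1])  # {'id': 1, 'name': 'ada', 'email': None}
```

Because the column addition arrives before the first row that uses it, the sink never sees a row it cannot store.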
