Configs · May 19, 2026 · 3 min read

Flink CDC — Real-Time Change Data Capture for Apache Flink

Flink CDC is a streaming data integration framework built on Apache Flink. It captures row-level changes from databases like MySQL, PostgreSQL, and MongoDB in real time and delivers them as Flink DataStreams for processing, transformation, and synchronization.

Universal CLI command
npx tokrepo install 9ef19846-5318-11f1-9bc6-00163e2b0d79

Introduction

Flink CDC brings change-data-capture into the Apache Flink ecosystem. It reads database transaction logs (binlog, WAL, oplog) and converts them into Flink streams, enabling real-time ETL, data lake ingestion, and cross-database synchronization without custom glue code.

What Flink CDC Does

  • Reads binlog/WAL/oplog from MySQL, PostgreSQL, MongoDB, Oracle, SQL Server, and more
  • Delivers insert, update, and delete events as structured Flink DataStreams
  • Supports full snapshot followed by continuous incremental capture in a single job
  • Provides a YAML-based pipeline definition for codeless database-to-database sync
  • Handles schema evolution by propagating DDL changes, such as added columns, to sinks that support them
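
To make the insert, update, and delete events in the list above concrete, here is a minimal sketch of replaying such events onto a keyed replica. The `ChangeEvent` class and its fields are hypothetical names chosen for illustration; they are not Flink CDC's actual record types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical row-level change event (not a Flink CDC class)."""
    op: str                       # "insert", "update", or "delete"
    key: int                      # primary key of the affected row
    after: Optional[dict] = None  # row image after the change; None for deletes


def apply_event(table: dict, event: ChangeEvent) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    if event.op in ("insert", "update"):
        table[event.key] = event.after
    elif event.op == "delete":
        table.pop(event.key, None)
    else:
        raise ValueError(f"unknown op: {event.op}")


def replay(events) -> dict:
    """Build a replica by applying events in log order."""
    replica = {}
    for event in events:
        apply_event(replica, event)
    return replica


events = [
    ChangeEvent("insert", 1, {"name": "ada"}),
    ChangeEvent("insert", 2, {"name": "bob"}),
    ChangeEvent("update", 1, {"name": "ada l."}),
    ChangeEvent("delete", 2),
]
replica = replay(events)
```

Because every event carries the primary key, applying the stream in order is enough to keep the replica in step with the source table.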

Architecture Overview

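Flink CDC connectors embed Debezium engines within Flink source operators. On startup, a snapshot reader performs a parallel, chunked scan of existing data, then hands off to a log reader (binlog, WAL, or oplog, depending on the database) for ongoing changes. Read positions are stored in Flink checkpoints, so after a restart the job resumes from the last completed checkpoint and no events are lost. Whether the output is also free of duplicates end to end depends on the sink being idempotent or transactional.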
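
The snapshot-then-log handoff can be sketched in a few lines. This is a simplified model, not the connector's implementation: it assumes integer primary keys, a snapshot that is already consistent with a known log offset, and hypothetical function names. Real connectors must also reconcile changes that arrive while the snapshot is running, which this sketch ignores.

```python
def split_into_chunks(min_key: int, max_key: int, chunk_size: int) -> list:
    """Split the inclusive key range [min_key, max_key] into contiguous chunks."""
    chunks = []
    start = min_key
    while start <= max_key:
        end = min(start + chunk_size - 1, max_key)
        chunks.append((start, end))
        start = end + 1
    return chunks


def snapshot_then_stream(rows: dict, log: list, snapshot_offset: int,
                         chunk_size: int) -> dict:
    """Scan existing rows chunk by chunk, then apply the log from snapshot_offset.

    rows: table contents at snapshot time, keyed by integer primary key
    log:  ordered (op, key, value) entries; entries before snapshot_offset
          are assumed to be reflected in rows already
    """
    result = {}
    if rows:
        for low, high in split_into_chunks(min(rows), max(rows), chunk_size):
            # Chunks do not overlap, so a real job could scan them in parallel.
            for key in range(low, high + 1):
                if key in rows:
                    result[key] = rows[key]
    for op, key, value in log[snapshot_offset:]:
        if op == "delete":
            result.pop(key, None)
        else:  # insert or update
            result[key] = value
    return result


rows = {1: "a", 2: "b", 5: "e"}
log = [
    ("insert", 1, "a"), ("insert", 2, "b"), ("insert", 5, "e"),  # in snapshot
    ("update", 2, "B"), ("delete", 5, None), ("insert", 7, "g"),  # after snapshot
]
chunks = split_into_chunks(1, 5, 2)
final = snapshot_then_stream(rows, log, snapshot_offset=3, chunk_size=2)
```

The key point is the offset: the log reader starts exactly where the snapshot left off, so no change is skipped and none is applied twice.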

Self-Hosting & Configuration

  • Deploy Apache Flink 1.18+ and add the appropriate CDC connector JARs to the lib directory
  • Configure source database credentials and binlog/WAL access permissions
  • Define a pipeline in YAML, or write a Flink job in SQL or Java, specifying source tables and sink targets
  • Tune parallelism and checkpoint intervals for throughput and latency requirements
  • Monitor via the Flink Web UI or integrate with Prometheus metrics
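
One way to reason about the checkpoint interval mentioned above: after a failure, the source resumes from the last completed checkpoint, so everything read since that checkpoint is processed again. The simulation below is a sketch under simplifying assumptions (checkpoints counted in events rather than time, a single crash, and an upsert sink keyed by primary key); `run_with_failure` is a hypothetical helper, not a Flink API.

```python
def run_with_failure(log: list, checkpoint_every: int, fail_after: int):
    """Simulate a source that checkpoints its log offset, crashes once, and resumes.

    log: ordered (key, value) upserts
    Returns (sink, replayed): the sink contents after recovery and the number
    of events that were processed a second time.
    """
    sink = {}
    checkpointed_offset = 0
    processed = 0
    # First attempt: runs until the simulated crash.
    for offset, (key, value) in enumerate(log):
        if processed == fail_after:
            break
        sink[key] = value  # upsert by primary key, so repeating it is harmless
        processed += 1
        if processed % checkpoint_every == 0:
            checkpointed_offset = offset + 1  # next offset to read on recovery
    # Recovery: resume from the last checkpoint, not from the crash point.
    replayed = processed - checkpointed_offset
    for key, value in log[checkpointed_offset:]:
        sink[key] = value
    return sink, replayed


log = [(1, "v0"), (2, "v1"), (1, "v2"), (3, "v3"), (2, "v4"), (1, "v5"), (4, "v6")]
sink, replayed = run_with_failure(log, checkpoint_every=3, fail_after=5)
```

A shorter interval reduces the amount of replayed work after a failure at the cost of more frequent checkpoints. The final sink contents are the same either way because the sink is idempotent.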

Key Features

  • Exactly-once processing semantics for reliable data delivery
  • Parallel snapshot reading using table chunk splitting for fast initial loads
  • Schema evolution support propagates ALTER TABLE changes to downstream sinks
  • Codeless YAML pipeline mode for common sync scenarios
  • Compatible with the full Apache Flink ecosystem including SQL, Table API, and DataStream API
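
Schema evolution can be pictured as a second kind of event flowing alongside row changes. The sketch below models this with plain dictionaries. The event shapes (`kind`, `name`, `new_name`) are assumptions made for illustration, not Flink CDC's schema-change event format.

```python
def apply_schema_change(columns: list, ddl: dict) -> list:
    """Return a new column list after applying one hypothetical schema-change event."""
    kind = ddl["kind"]
    if kind == "add_column":
        if ddl["name"] in columns:
            return list(columns)  # already applied, so repeating it is a no-op
        return columns + [ddl["name"]]
    if kind == "drop_column":
        return [c for c in columns if c != ddl["name"]]
    if kind == "rename_column":
        return [ddl["new_name"] if c == ddl["name"] else c for c in columns]
    raise ValueError(f"unsupported schema change: {kind}")


def project(row: dict, columns: list) -> dict:
    """Shape a row to the sink's current columns, filling missing values with None."""
    return {c: row.get(c) for c in columns}


sink_columns = ["id", "name"]
sink_columns = apply_schema_change(sink_columns, {"kind": "add_column", "name": "email"})

# A row captured before the ALTER TABLE and one captured after it.
old_row = project({"id": 1, "name": "ada"}, sink_columns)
new_row = project({"id": 2, "name": "bob", "email": "bob@example.com"}, sink_columns)
```

Applying the schema change before any row that depends on it is what keeps the sink from rejecting the newer rows.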

Comparison with Similar Tools

  • Debezium — Standalone CDC platform using Kafka Connect; Flink CDC embeds Debezium inside Flink for tighter integration
  • Airbyte — General ELT platform with CDC connectors, but batch-oriented rather than continuous streaming
  • AWS DMS — Managed CDC service locked to AWS, not open source
  • Canal — Alibaba MySQL binlog reader focused on MySQL-only use cases
  • Maxwell — Lightweight MySQL-only binlog reader that writes to Kafka; no built-in transformation

FAQ

Q: Do I need Kafka to use Flink CDC?
A: No. Flink CDC reads database logs directly without requiring an intermediate message queue.

Q: Which databases are supported?
A: MySQL, PostgreSQL, MongoDB, Oracle, SQL Server, Db2, OceanBase, TiDB, and Vitess, with the list growing.

Q: Can Flink CDC handle schema changes automatically?
A: Yes. The pipeline mode can propagate DDL changes like column additions to supported sinks.

Q: What is the difference between Flink CDC and Debezium?
A: Flink CDC uses Debezium internally but runs inside the Flink runtime, giving you access to Flink SQL, exactly-once checkpointing, and the full Flink ecosystem.
