Apr 16, 2026

Debezium — Real-Time Change Data Capture Platform

A distributed platform for streaming database changes into event logs, capturing row-level inserts, updates, and deletes from MySQL, PostgreSQL, MongoDB, and more.

TL;DR
Debezium captures row-level database changes and streams them to Kafka for real-time data pipelines.
§01

What it is

Debezium is a distributed platform for change data capture (CDC). It monitors database transaction logs and streams row-level inserts, updates, and deletes into Apache Kafka topics. Debezium supports MySQL, PostgreSQL, MongoDB, SQL Server, Oracle, Cassandra, and Db2.

Debezium targets data engineers and platform teams building real-time data pipelines, event-driven architectures, cache invalidation systems, and data warehouse synchronization.

§02

How it saves time or tokens

Debezium eliminates polling-based data synchronization. Instead of querying databases on an interval to detect changes, Debezium reads the transaction log and emits changes as they happen. This reduces database load, eliminates missed changes between poll intervals, and provides sub-second latency. The Kafka Connect architecture means you configure connectors declaratively without writing code.
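To make the "missed changes" point concrete, here is a small, self-contained Python sketch (purely illustrative, not Debezium code): a poller that samples the current row state sees only the value at poll time, while replaying the change log captures every intermediate update.

```python
# Illustrative: three updates to one row commit between two polls.
change_log = [
    {"op": "u", "after": {"id": 1, "email": "a@example.com"}},
    {"op": "u", "after": {"id": 1, "email": "b@example.com"}},
    {"op": "u", "after": {"id": 1, "email": "c@example.com"}},
]

# Polling: query the table after all three updates have committed.
# Only the final state is visible; the two intermediate values are gone.
polled_events = [change_log[-1]["after"]]

# CDC: replay the log, one event per change, in commit order.
cdc_events = [entry["after"] for entry in change_log]

print(len(polled_events))  # 1
print(len(cdc_events))     # 3
```

The poller also had to issue a query to learn this much; a log reader imposes no query load on the source tables.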

§03

How to use

  1. Start the required infrastructure:
docker run -d --name zookeeper -p 2181:2181 quay.io/debezium/zookeeper
docker run -d --name kafka -p 9092:9092 \
  --link zookeeper quay.io/debezium/kafka
docker run -d --name connect -p 8083:8083 \
  --link kafka --link zookeeper quay.io/debezium/connect
  2. Register a MySQL connector:
curl -X POST http://localhost:8083/connectors -H 'Content-Type: application/json' -d '{
  "name": "mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "mysql",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.server.id": "1",
    "topic.prefix": "dbserver1",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes"
  }
}'
  3. Consume change events from Kafka topics named dbserver1.<database>.<table>.
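Once the connector is registered, change events can be read like any other Kafka topic. A minimal consumer sketch in Python, assuming the kafka-python package and the broker and topic prefix from the steps above (the `topic_name` helper is our own convenience, not part of Debezium):

```python
import json

def topic_name(prefix: str, database: str, table: str) -> str:
    """Debezium's default naming: <topic.prefix>.<database>.<table>."""
    return f"{prefix}.{database}.{table}"

def handle(event: dict) -> None:
    """Print a one-line summary of a change event."""
    print(event.get("op"), event.get("after") or event.get("before"))

if __name__ == "__main__":
    # Requires `pip install kafka-python` and the broker from step 1.
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        topic_name("dbserver1", "inventory", "customers"),
        bootstrap_servers="localhost:9092",
        value_deserializer=lambda b: json.loads(b.decode("utf-8")),
        auto_offset_reset="earliest",
    )
    # Note: with the default Connect JSON converter (schemas enabled),
    # the value is wrapped as {"schema": ..., "payload": ...} and the
    # envelope shown in the Example section lives under "payload".
    for message in consumer:
        handle(message.value)
```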
§04

Example

A Debezium change event JSON structure:

{
  "before": {"id": 1, "name": "Alice", "email": "alice@old.com"},
  "after": {"id": 1, "name": "Alice", "email": "alice@new.com"},
  "source": {"db": "inventory", "table": "customers"},
  "op": "u",
  "ts_ms": 1713000000000
}
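The envelope is straightforward to work with programmatically. A small Python helper (our own sketch, not a Debezium API) that classifies the operation and computes which columns changed:

```python
def describe_change(event: dict) -> dict:
    """Summarize a Debezium change event envelope."""
    ops = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}
    before = event.get("before") or {}
    after = event.get("after") or {}
    # Columns whose value differs between the before and after images.
    changed = {
        key: (before.get(key), after.get(key))
        for key in set(before) | set(after)
        if before.get(key) != after.get(key)
    }
    return {
        "operation": ops.get(event["op"], event["op"]),
        "table": f'{event["source"]["db"]}.{event["source"]["table"]}',
        "changed": changed,
    }

event = {
    "before": {"id": 1, "name": "Alice", "email": "alice@old.com"},
    "after": {"id": 1, "name": "Alice", "email": "alice@new.com"},
    "source": {"db": "inventory", "table": "customers"},
    "op": "u",
    "ts_ms": 1713000000000,
}
summary = describe_change(event)
print(summary["operation"], summary["changed"])
# update {'email': ('alice@old.com', 'alice@new.com')}
```

For deletes (`"op": "d"`) the `after` field is null, so the helper falls back to an empty dict and reports every column of `before` as changed.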
§05


Common pitfalls

  • MySQL requires binlog_format=ROW and binlog_row_image=FULL. Without these settings, Debezium cannot capture complete change events.
  • Initial snapshots of large tables can take hours and put load on the source database. Schedule initial snapshots during low-traffic periods.
  • Kafka topic retention must outlast your downstream consumer lag. If consumers fall behind beyond the retention window, events are deleted before they are read.

Frequently Asked Questions

Does Debezium require Kafka?

The primary deployment uses Kafka Connect. However, Debezium Server provides a standalone runtime that can send events to Amazon Kinesis, Google Pub/Sub, Apache Pulsar, and other messaging systems without Kafka.

Which databases does Debezium support?

Debezium supports MySQL, PostgreSQL, MongoDB, SQL Server, Oracle, Db2, Cassandra, and Vitess. Each database has a dedicated connector that reads its specific transaction log format.

How does CDC differ from polling?

CDC reads the database transaction log to capture every change in order with sub-second latency. Polling queries the database on an interval, missing changes between polls and adding query load to the database.

Can Debezium handle schema changes?

Yes. Debezium tracks schema changes through the transaction log and records them in a schema history topic. Downstream consumers can detect when columns are added, removed, or modified.

What happens if the connector goes down?

Debezium stores its position in the transaction log as Kafka Connect offsets. When the connector restarts, it resumes from the last committed offset, so no events are missed. Delivery is at-least-once: events emitted after the last offset commit may be re-sent, so downstream consumers should tolerate duplicates.

