Configs · Apr 16, 2026 · 3 min read

Debezium — Real-Time Change Data Capture Platform

A distributed platform for streaming database changes into event logs, capturing row-level inserts, updates, and deletes from MySQL, PostgreSQL, MongoDB, and more.

Introduction

Debezium is an open-source distributed platform for change data capture (CDC). It monitors database transaction logs and streams every row-level change as an event to Apache Kafka or other messaging systems, enabling real-time data pipelines without modifying application code.

What Debezium Does

  • Captures row-level INSERT, UPDATE, and DELETE events from database transaction logs
  • Supports MySQL, PostgreSQL, MongoDB, SQL Server, Oracle, Cassandra, and Db2
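  • Streams change events to Kafka topics with at-least-once delivery by default; exactly-once delivery is possible when Kafka Connect's exactly-once source support is enabled
  • Provides the before and after state of each changed row in the event, where the database makes the prior state available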
  • Handles initial snapshots of existing data before switching to streaming mode

Architecture Overview

Debezium runs as a set of Kafka Connect source connectors. Each connector reads the database's transaction log (the WAL in PostgreSQL, the binlog in MySQL) and converts changes into structured events with a consistent envelope format. Kafka Connect stores the connector's offsets in Kafka so it can resume from the last recorded log position after a failure. For connectors that parse DDL themselves, such as MySQL, a schema history topic stores DDL changes so that row data is interpreted correctly as schemas evolve over time.
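
To make the envelope format concrete, here is a minimal Python sketch of a consumer applying change events to an in-memory replica. It relies on the envelope's `op`, `before`, and `after` fields; the `orders` rows, the `id` key, and the `apply_change` helper are hypothetical, and the events are shown as bare payloads (with the JSON converter and schemas enabled, the envelope would sit under a `payload` field).

```python
import json

def apply_change(replica, event, key_field="id"):
    """Apply one change event to an in-memory replica keyed by `key_field`."""
    op = event["op"]
    if op in ("c", "u", "r"):
        # create, update, and snapshot read all carry the new row state in "after"
        row = event["after"]
        replica[row[key_field]] = row
    elif op == "d":
        # deletes carry the last known row state in "before"
        replica.pop(event["before"][key_field], None)
    else:
        # other op codes (for example truncate) are out of scope for this sketch
        raise ValueError(f"unhandled op: {op!r}")
    return replica

# Hypothetical events for a made-up "orders" table.
raw_events = [
    '{"op": "r", "before": null, "after": {"id": 1, "status": "new"}}',
    '{"op": "c", "before": null, "after": {"id": 2, "status": "new"}}',
    '{"op": "u", "before": {"id": 1, "status": "new"}, "after": {"id": 1, "status": "paid"}}',
    '{"op": "d", "before": {"id": 2, "status": "new"}, "after": null}',
]

replica = {}
for raw in raw_events:
    apply_change(replica, json.loads(raw))

print(replica)  # {1: {'id': 1, 'status': 'paid'}}
```

Because each event carries the full row state, replaying the same event twice leaves the replica unchanged, which is useful when delivery is at-least-once.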

Self-Hosting & Configuration

  • Deploy as Kafka Connect connectors in an existing Kafka cluster
  • Use Debezium Server for standalone operation without Kafka Connect infrastructure
  • Configure database connection, topic routing, and snapshot mode per connector
  • Set up schema registry (Confluent or Apicurio) for Avro or Protobuf serialization
  • Use signal tables and incremental snapshots for re-snapshotting without downtime
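
As an illustration of per-connector configuration, the sketch below builds (but does not send) a request that would register a PostgreSQL connector through the Kafka Connect REST API. The host names, credentials, connector name, and table list are placeholders, and the `build_connector_request` helper is hypothetical. The property names follow Debezium 2.x conventions (`topic.prefix` replaced the older `database.server.name`), so check them against the version you run.

```python
import json
import urllib.request

def build_connector_request(connect_url, name, config):
    """Build a POST /connectors request for the Kafka Connect REST API."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder values throughout; adjust to your environment.
postgres_config = {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",              # logical decoding plugin
    "database.hostname": "db.example.internal",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.dbname": "inventory",
    "topic.prefix": "inventory",            # topics become inventory.<schema>.<table>
    "table.include.list": "public.orders",
    "snapshot.mode": "initial",             # snapshot existing rows, then stream
}

request = build_connector_request(
    "http://localhost:8083", "inventory-connector", postgres_config
)
# urllib.request.urlopen(request) would submit it; omitted so the sketch runs offline.
print(request.get_method(), request.full_url)
```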

Key Features

  • Log-based CDC with no polling, no triggers, and no application code changes
  • Exactly-once delivery when paired with Kafka Connect's exactly-once source support, which builds on Kafka transactions; at-least-once otherwise
  • Schema evolution tracking with automatic topic schema updates
  • Single Message Transforms (SMTs) for filtering, routing, and reshaping events
  • Debezium UI for visual connector management and monitoring
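
As a small example of reshaping events with an SMT, the sketch below adds Debezium's `ExtractNewRecordState` transform to a connector configuration and approximates its core idea in Python. The `unwrap` alias is an arbitrary name, both helpers are hypothetical, and `approximate_unwrap` is only a rough stand-in: the real SMT has additional options (for example for delete handling) that are not modeled here.

```python
def with_unwrap_transform(connector_config):
    """Return a copy of the config with the ExtractNewRecordState SMT enabled.

    Note: this sketch overwrites any existing "transforms" entry.
    """
    updated = dict(connector_config)
    updated["transforms"] = "unwrap"
    updated["transforms.unwrap.type"] = "io.debezium.transforms.ExtractNewRecordState"
    return updated

def approximate_unwrap(event):
    """Rough stand-in for the flattening idea: keep only the new row state."""
    return event["after"]

base = {"connector.class": "io.debezium.connector.mysql.MySqlConnector"}
config = with_unwrap_transform(base)

flat = approximate_unwrap(
    {"op": "u", "before": {"id": 1, "qty": 1}, "after": {"id": 1, "qty": 2}}
)
print(flat)  # {'id': 1, 'qty': 2}
```

Flattening suits sinks that expect plain rows, at the cost of dropping the before state and most of the envelope metadata.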

Comparison with Similar Tools

  • Maxwell — MySQL-only CDC; Debezium supports 8+ database types
  • Canal — Alibaba MySQL binlog parser; Debezium provides a broader connector ecosystem
  • AWS DMS — managed service with CDC; Debezium is self-hosted and open source
  • Airbyte — batch-first ELT platform; Debezium is real-time stream-first
  • Fivetran — managed SaaS CDC; Debezium gives full control over infrastructure

FAQ

Q: Does Debezium require Kafka? A: Not necessarily. Debezium Server can send events directly to Redis, Pulsar, Kinesis, or HTTP endpoints without Kafka.

Q: How does CDC differ from triggers or polling? A: Log-based CDC reads the transaction log that the database already writes, so it adds little overhead to the write path. Triggers add latency to every write, and polling misses intermediate states that occur between polling intervals.
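
The difference between polling and log-based capture can be shown with a toy simulation. This is an illustrative sketch only: the status values and both helper functions are made up, and a real poller samples on a timer rather than after a fixed number of writes.

```python
def poll_snapshots(writes, poll_every):
    """Return the states a poller sees when it samples after every `poll_every` writes."""
    seen = []
    for count, value in enumerate(writes, start=1):
        if count % poll_every == 0:
            seen.append(value)  # only the state at poll time is visible
    return seen

def log_events(writes):
    """A log-based reader sees every committed change, in order."""
    return list(writes)

# Hypothetical sequence of status updates to a single row.
writes = ["new", "paid", "shipped", "delivered"]

print(poll_snapshots(writes, 2))  # ['paid', 'delivered']
print(log_events(writes))         # ['new', 'paid', 'shipped', 'delivered']
```

The poller never observes the "new" or "shipped" states, while the log reader sees all four.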

Q: Can Debezium handle schema changes? A: Yes. Connectors that parse DDL, such as MySQL, record schema changes in a schema history topic and use them to interpret and serialize events correctly as tables evolve.

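Q: What happens if the connector falls behind? A: Debezium resumes from its stored offsets and catches up by reading the log. If the required part of the log has already been purged, the connector cannot resume from that position; a new snapshot (for example, an incremental snapshot once streaming is running again) can re-capture the current data, but the individual changes made in the gap are not replayed.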
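
An incremental snapshot is typically requested by inserting a row into the connector's signal table. The sketch below builds such a row in Python. The table name `debezium_signal`, the `%s` placeholder style, and the `incremental_snapshot_signal` helper are assumptions for illustration; the actual table is whatever the connector's `signal.data.collection` property points to.

```python
import json
import uuid

def incremental_snapshot_signal(tables, signal_id=None):
    """Return (id, type, data) values for an execute-snapshot signal row."""
    data = {"data-collections": list(tables), "type": "incremental"}
    return (signal_id or str(uuid.uuid4()), "execute-snapshot", json.dumps(data))

# Assumed table name and driver placeholder style; adjust to your setup.
INSERT_SQL = "INSERT INTO debezium_signal (id, type, data) VALUES (%s, %s, %s)"

row = incremental_snapshot_signal(["public.orders"], signal_id="resnapshot-orders-1")
print(INSERT_SQL)
print(row)
```

Executing the statement with a database driver, passing `row` as parameters, would ask a running connector to re-read the listed tables in chunks while it continues streaming.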
