Introduction
Canal is an open-source incremental data subscription and consumption platform developed by Alibaba. It parses MySQL binlog events in real time, acting as a MySQL replica, and delivers row-level change events to downstream consumers for cache invalidation, search index updates, data warehousing, and cross-database synchronization.
What Canal Does
- Captures MySQL row-level changes (INSERT, UPDATE, DELETE) by parsing binary logs in real time
- Simulates a MySQL slave to subscribe to binlog streams, reading from the source database without writing to it
- Delivers change events to message queues such as Kafka, RocketMQ, and RabbitMQ, or to stores such as Elasticsearch through Canal Adapter
- Supports filtering by database, table, and column to reduce unnecessary event processing
- Provides a client API for building custom change data capture consumers in Java
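The database and table filtering listed above can be pictured as matching each event's `schema.table` name against a list of patterns. The following is a minimal Python sketch under that assumption. The `matches_filter` helper, the `FILTER` value, and the sample table names are hypothetical, and Canal's own matching rules may differ in detail.

```python
import re


def matches_filter(schema: str, table: str, patterns: str) -> bool:
    """Return True if 'schema.table' fully matches any comma-separated regex.

    Assumption: patterns are separated by commas, so a pattern that itself
    contains a comma (for example a {1,3} quantifier) is not supported here.
    """
    full_name = f"{schema}.{table}"
    for pattern in patterns.split(","):
        pattern = pattern.strip()
        if pattern and re.fullmatch(pattern, full_name):
            return True
    return False


# Hypothetical filter: every table in `shop`, plus one table in `audit`.
FILTER = r"shop\..*,audit\.login_log"

events = [("shop", "orders"), ("audit", "login_log"), ("audit", "debug"), ("hr", "staff")]

# Keep only the events whose source table passes the filter.
kept = [name for name in events if matches_filter(name[0], name[1], FILTER)]
```

Filtering early like this means consumers never see events for tables they do not care about, which is the point of the filtering feature.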
Architecture Overview
Canal Server connects to MySQL as a replication slave, receiving binlog events through the MySQL replication protocol. The server parses these binary events into structured row-change objects. Canal instances are organized by destination (one per source database). A Canal Client or Canal Adapter connects to the server and consumes parsed events. The Admin component provides a web UI for managing instances and monitoring lag.
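To make the idea of a structured row-change object concrete, here is a small Python sketch of what a consumer might receive and how it could act on it. The `RowChange` class and its field names are simplified stand-ins invented for illustration, not Canal's actual event types, and the cache-key format is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RowChange:
    """Simplified stand-in for one parsed row-level change event."""
    destination: str                          # instance the event came from
    schema: str
    table: str
    event_type: str                           # "INSERT", "UPDATE", or "DELETE"
    before: Optional[Dict[str, str]] = None   # row image before the change
    after: Optional[Dict[str, str]] = None    # row image after the change


def changed_columns(change: RowChange) -> List[str]:
    """List columns whose values differ between the before and after images."""
    if change.event_type != "UPDATE":
        return []
    before = change.before or {}
    after = change.after or {}
    return sorted(col for col, value in after.items() if before.get(col) != value)


def cache_key(change: RowChange, key_column: str = "id") -> str:
    """Build the cache key a consumer would invalidate for this change."""
    # A DELETE only carries the old row; INSERT and UPDATE carry the new one.
    row = change.before if change.event_type == "DELETE" else change.after
    if row is None or key_column not in row:
        raise ValueError(f"no {key_column!r} value in {change.event_type} event")
    return f"{change.schema}.{change.table}:{row[key_column]}"


update = RowChange(
    destination="example", schema="shop", table="orders", event_type="UPDATE",
    before={"id": "7", "status": "NEW"}, after={"id": "7", "status": "PAID"},
)
```

For the `update` event above, `changed_columns(update)` returns `["status"]` and `cache_key(update)` returns `"shop.orders:7"`.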
Self-Hosting & Configuration
- Enable binlog on MySQL with ROW format and create a replication user for Canal
- Deploy Canal Server and configure instance.properties with MySQL connection details
- Use Canal Admin for web-based instance management and monitoring
- Configure Canal Adapter to sink changes directly to Elasticsearch, HBase, or RDB targets
- Deploy Canal in cluster mode with ZooKeeper for high availability and failover
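The first two steps above can be sketched as a pre-flight check plus a generated configuration file. This is an illustrative Python sketch, not an official tool: `check_mysql_settings` and `render_instance_properties` are hypothetical helpers, the MySQL values are assumed to come from `SHOW VARIABLES`, and the property keys are the ones shipped in Canal's sample `instance.properties`, so confirm them against the release in use.

```python
from typing import Dict, List


def check_mysql_settings(variables: Dict[str, str]) -> List[str]:
    """Return the problems found in MySQL settings that binlog capture relies on."""
    problems = []
    if variables.get("log_bin", "OFF").upper() not in ("ON", "1"):
        problems.append("log_bin must be enabled")
    if variables.get("binlog_format", "").upper() != "ROW":
        problems.append("binlog_format must be ROW")
    if variables.get("server_id", "0") == "0":
        problems.append("server_id must be set to a non-zero value")
    return problems


def render_instance_properties(host: str, port: int, username: str,
                               password: str, table_filter: str) -> str:
    """Render a minimal instance.properties body for one source database."""
    # Backslashes are doubled because the Java properties format treats a
    # single backslash as an escape character.
    escaped_filter = table_filter.replace("\\", "\\\\")
    lines = [
        f"canal.instance.master.address={host}:{port}",
        f"canal.instance.dbUsername={username}",
        f"canal.instance.dbPassword={password}",
        f"canal.instance.filter.regex={escaped_filter}",
    ]
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
settings = {"log_bin": "ON", "binlog_format": "ROW", "server_id": "1"}
if not check_mysql_settings(settings):
    print(render_instance_properties("127.0.0.1", 3306, "canal", "change-me", r".*\..*"))
```

The replication user referenced here needs replication privileges on the source server, which is why a dedicated account is preferable to reusing an application login.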
Key Features
- Low-latency change data capture, with binlog events typically parsed within a second of being written
- Cluster mode with ZooKeeper-based HA for automatic failover between Canal instances
- Built-in adapters for Elasticsearch, RDB, and HBase, plus server-side delivery to Kafka, all without custom code
- Position tracking and resumption to handle restarts without data loss
- Support for MySQL, MariaDB, and PolarDB-X as source databases
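Position tracking and resumption can be illustrated with a small checkpoint sketch in Python. It assumes a binlog position is a pair of file name and byte offset, and that file names are zero-padded so they sort in order. `CheckpointStore`, `consume`, and the JSON file layout are hypothetical and do not reflect how Canal itself stores positions.

```python
import json
import os
import tempfile
from typing import Iterable, List, Optional, Tuple

Position = Tuple[str, int]  # (binlog file name, byte offset)


class CheckpointStore:
    """Persists the last processed position in a small JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Optional[Position]:
        if not os.path.exists(self.path):
            return None
        with open(self.path, encoding="utf-8") as fh:
            data = json.load(fh)
        return (data["file"], data["offset"])

    def save(self, position: Position) -> None:
        # Write to a temporary file and rename it, so a crash cannot leave
        # a half-written checkpoint behind.
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump({"file": position[0], "offset": position[1]}, fh)
        os.replace(tmp_path, self.path)


def consume(events: Iterable[Tuple[Position, str]], store: CheckpointStore) -> List[str]:
    """Handle events after the stored checkpoint, saving progress after each one."""
    checkpoint = store.load()
    handled = []
    for position, payload in events:
        if checkpoint is not None and position <= checkpoint:
            continue  # already handled before the restart
        handled.append(payload)
        store.save(position)
    return handled


log = [
    (("mysql-bin.000001", 120), "a"),
    (("mysql-bin.000001", 300), "b"),
    (("mysql-bin.000002", 4), "c"),
]
store = CheckpointStore(os.path.join(tempfile.mkdtemp(), "checkpoint.json"))
first_run = consume(log[:2], store)   # a run that stops after two events
second_run = consume(log, store)      # a restart that replays the whole log
```

Because the checkpoint is saved only after an event is handled, a crash between the two steps replays that event on restart. Consumers built this way should therefore tolerate duplicates.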
Comparison with Similar Tools
- Debezium: Kafka Connect-based CDC; Canal runs standalone and is lighter for MySQL-only workloads
- Maxwell: MySQL CDC to Kafka; Canal offers more sinks, clustering, and a management UI
- MySQL Replication: Native replication keeps another MySQL server in sync; Canal enables selective, event-driven consumption by arbitrary downstream systems
- AWS DMS: Managed migration service; Canal is self-hosted with no cloud vendor dependency
- Flink CDC: Stream-processing CDC; Canal focuses on capture and delivery and pairs well with Flink downstream
FAQ
Q: Does Canal modify the source MySQL database? A: No. Canal connects as a read-only replication slave. It only reads binlog events and never writes to the source.
Q: What MySQL binlog format does Canal require? A: Canal requires the binlog in ROW format (binlog_format=ROW). STATEMENT and MIXED formats do not provide the row-level detail Canal needs.
Q: Can Canal handle schema changes (DDL)? A: Yes. Canal parses DDL events and updates its internal schema cache. Consumers receive DDL events alongside DML changes.
Q: How does Canal compare to Debezium for MySQL CDC? A: Canal is lighter and standalone for MySQL-focused use cases. Debezium supports more databases and integrates deeply with Kafka Connect for broader ecosystems.