Change Data Capture
How systems listen to a database's internal changelog to react to every insert, update, and delete without polling the database directly.
The Polling Problem
Suppose you have a users table in your primary Postgres database, and you need to keep an Elasticsearch search index in sync. The naive approach is to poll: every 10 seconds, query for rows updated in the last 10 seconds and re-index them.
This works until it doesn't. You miss updates that happen between polls. Deletes are invisible. You add load to your database continuously. And at high volume, a 10-second lag is often too much.
Change Data Capture (CDC) solves this by reading the database's own internal record of every write, the same log the database already uses internally for crash recovery.
How Databases Record Writes
Before any write commits, most production databases append it to a Write-Ahead Log (WAL). The WAL is append-only and sequential, which is why it is fast. If the database crashes mid-write, it replays the WAL on restart to recover. Replication between a primary and its replicas also works by streaming this log.
CDC tools tap into this same stream. Instead of querying the table, they read the log and emit an event for every change: an insert with the new row's data, an update with before and after values, or a delete with the deleted row's identifier.
What a CDC Event Looks Like
Each event carries the operation type and the full row data before and after the change. A Debezium event for an updated user might look like:
{
"op": "u",
"before": { "id": 42, "email": "alice@old.com" },
"after": { "id": 42, "email": "alice@new.com" },
"source": { "table": "users", "lsn": 190234 }
}
Consumers subscribe to the topic for their table and react however they need: reindex the document, invalidate a cache key, update a replica in another region.
Common Use Cases
Search index sync: Keep Elasticsearch or Algolia in sync with your database without dual-writes or polling. Every row change flows into the search index automatically.
Data warehouse replication: Stream production database changes into a data warehouse (Snowflake, BigQuery) for analytics. No nightly batch jobs.
Cache invalidation: When a row updates, immediately evict the corresponding cache key. More reliable than TTL-based expiry.
Audit logs: Record every change to sensitive data (user records, payment data) in a tamper-evident log, without modifying the application code.
Cross-service sync: In a microservices architecture, one service's database changes can trigger updates in another service's own database, without the two services coupling directly to each other.
The Transactional Outbox Pattern
A related pattern worth knowing: sometimes you want to write to your database and publish a message to a queue atomically. If the service crashes between the two, you end up in an inconsistent state.
The Transactional Outbox pattern solves this by writing the message into an outbox table in the same database transaction as your main write. A CDC consumer then reads the outbox table and publishes to the message queue. Since everything lives in the same database transaction, it is either all committed or all rolled back.
Tools
Debezium is the most widely used open-source CDC connector, built on top of Kafka Connect. AWS DMS (Database Migration Service) supports CDC for migrations. PlanetScale's Vitess has built-in CDC. For Postgres specifically, the pg_logical extension and pgoutput plugin power most CDC implementations.
Further Reading
- Change Data Capture explained (Confluent) — Confluent's overview of CDC patterns and Kafka integration
- Debezium documentation — how to set up CDC from Postgres, MySQL, or MongoDB
- The Transactional Outbox pattern — atomically combining a database write with an event publication
- Postgres WAL explained — Crunchy Data's explanation of how Postgres's write-ahead log works
Prerequisites
Continue learning
ACID & Isolation Levels
Deep dive into database transaction guarantees, isolation levels, concurrency anomalies like write skew, and control mechanisms such as MVCC, 2PL, and SSI.
API Gateways
Understand the API Gateway pattern as the central ingress point for microservices, handling routing, auth, rate limiting, and protocol translation.
API Security & OAuth 2.0
Understand API authentication and authorization mechanisms, JWT security, and the OAuth 2.0 framework including Authorization Code Flow with PKCE.