Message Queues
Message queues decouple producers from consumers in time, enabling async communication, load leveling, and fault isolation across distributed services.
The core problem
Imagine your order service calls the inventory service directly over HTTP. If inventory is slow, the order service waits. If inventory is down, the order call fails outright. The two services are coupled in time: neither can operate independently of the other.
A message queue breaks this coupling. Instead of calling inventory directly, the order service drops a message into a queue and immediately gets an acknowledgement from the broker. The inventory service picks up that message whenever it is ready. The caller is free to move on; the callee is free to process at its own pace.
The restaurant analogy
Think of a restaurant order-ticket system. When a customer orders, the waiter writes a ticket and clips it to the rail above the kitchen window. The waiter walks away and takes the next table. The kitchen picks up tickets when a cook is free. No one is blocked waiting at the kitchen window.
In software terms: the waiter is the producer, the ticket rail is the queue, and the kitchen is the consumer. The queue stores messages durably so that a brief kitchen slowdown (consumer lag) or even a restart doesn't lose any orders.
Architecture overview
Notice the key detail: the producer gets its acknowledgement from the broker (the queue), not from any consumer. This is what makes the decoupling real. Even if all three consumers are offline, the producer keeps publishing successfully and the messages pile up safely in the queue.
Delivery guarantees
Every messaging system makes a choice about what it promises when delivering a message.
- At-most-once: The broker fires the message and forgets it. If the consumer crashes before processing, the message is gone. Fast and simple, used for metrics or telemetry where some loss is acceptable.
- At-least-once: The broker retries delivery until the consumer sends an acknowledgement. The message will not be lost, but it may be delivered more than once if the ack is delayed. This is the default in Kafka, SQS, and RabbitMQ. Consumers must be idempotent: processing the same message twice should have the same net effect as processing it once.
- Exactly-once: The message is processed exactly once, no loss and no duplication. This requires coordination between the broker and the consumer's storage (e.g. transactional commits), and is the hardest guarantee to implement. Kafka supports it via
enable.idempotence=truecombined with transactional producers, but it adds overhead.
In practice, most teams design for at-least-once delivery and build idempotent consumers. Exactly-once is reserved for financial or inventory-critical paths.
Point-to-point vs pub/sub
Point-to-point (work queue): a message is delivered to exactly one consumer. Multiple consumers compete for messages from the same queue, which gives you horizontal scaling. A payment processing queue with three worker instances is a classic example — each payment goes to only one worker.
Pub/sub (publish/subscribe): every subscriber receives a copy of each message. The producer publishes to a topic; any service that cares subscribes. When an order is placed, both the inventory service and the email service need to know — pub/sub is the natural fit.
Kafka supports both models through consumer groups. Within a group, each message goes to one member (point-to-point scaling). Across groups, every group gets a copy (pub/sub broadcast).
Push vs pull
In a push model (RabbitMQ), the broker decides when to send messages to consumers. This reduces latency but can overwhelm a slow consumer.
In a pull model (Kafka), the consumer asks the broker for the next batch of messages at its own pace. This naturally implements backpressure: a slow consumer just reads slowly, and the queue depth grows rather than the consumer crashing. Pull is generally preferred in high-throughput pipelines.
Dead Letter Queues
When a consumer fails to process a message after N retries, it would be a mistake to keep retrying forever or to silently discard the message. Instead, move it to a Dead Letter Queue (DLQ).
A DLQ holds messages that could not be processed. Engineers can inspect them, fix the underlying bug, and replay them into the main queue. Without a DLQ, bad messages either poison the queue (blocking all progress) or disappear silently.
Configure DLQs from day one. Set a maxReceiveCount (SQS terminology) or a dead-letter exchange (RabbitMQ) and alert on DLQ depth.
Broker comparison
| Broker | Model | Delivery | Throughput | Notable strength |
|---|---|---|---|---|
| Kafka | Log-based, pull | At-least-once / exactly-once | Very high | Replay, retention, consumer groups |
| RabbitMQ | Broker-mediated, push | At-least-once | High | Flexible routing (exchanges), AMQP |
| AWS SQS | Managed queue, pull | At-least-once | High | Zero ops, native DLQ, FIFO option |
| Redis Streams | Log-based, pull | At-least-once | Medium | Lightweight, co-located with cache |
Backpressure and queue depth
When consumers are slower than producers, messages accumulate in the queue. This is normal up to a point. What matters is the trend.
Monitor queue depth (number of pending messages) and consumer lag (how far behind the consumer offset is, in Kafka terms). A queue depth that grows without bound means you need more consumer instances or a faster consumer implementation. Add auto-scaling policies that trigger on queue depth so the system self-corrects before the queue becomes a problem.
Code example
import redis
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
# Producer: publish an order event
msg_id = r.xadd(
"orders",
{"order_id": "ord-9912", "item": "keyboard", "qty": "1"},
)
print(f"Published message: {msg_id}")
# Consumer: read new messages (blocking, 2-second timeout)
messages = r.xread({"orders": "$"}, count=10, block=2000)
for stream, entries in (messages or []):
for entry_id, fields in entries:
print(f"[{entry_id}] Received: {fields}")
xadd appends the event to the orders stream and returns a unique monotonic ID. xread with "$" as the starting ID means "give me only new messages from this point forward". The block=2000 argument makes the call wait up to 2 seconds for a message before returning, avoiding a busy-poll loop.
Further Reading
Prerequisites
Code Examples
Continue learning
ACID & Isolation Levels
Deep dive into database transaction guarantees, isolation levels, concurrency anomalies like write skew, and control mechanisms such as MVCC, 2PL, and SSI.
API Gateways
Understand the API Gateway pattern as the central ingress point for microservices, handling routing, auth, rate limiting, and protocol translation.
API Security & OAuth 2.0
Understand API authentication and authorization mechanisms, JWT security, and the OAuth 2.0 framework including Authorization Code Flow with PKCE.