Webhooks

Learn how to build secure, reliable server-to-server callbacks with HMAC signatures, retries with exponential backoff, and asynchronous ingestion.

Level: Intermediate | Chapter: API Design | 12 min read

The Concept

Modern distributed applications rely heavily on real-time event updates. While traditional client-server communications use polling where the client repeatedly queries the server for changes, this model is highly inefficient. It wastes database cycles and network bandwidth when no new data is available.

Webhooks invert this design by establishing a callback pattern. Instead of the client asking the server for status updates, the server proactively sends an HTTP POST request to a client-registered URL immediately when an event occurs. This server-to-server callback allows downstream systems to process events in real time without the overhead of continuous polling.

Figure: Secure Webhook Handshake and Event Processing Flow. The Source Server (Dispatch) POSTs the webhook payload with X-Signature (HMAC-SHA256), X-Timestamp, and X-Event-ID headers. The Client API (Receiver) verifies the HMAC, enqueues a task, and returns 202 Accepted, immediately releasing the socket. The Client Worker (Async) then processes the event.

Practical Analogy

Consider two ways of waiting for a mail-order package:

  • Polling is like running down to your letterbox outside your house every five minutes to check if your package has arrived. You spend all day walking back and forth, wasting physical energy, even though the mail delivery only occurs once a day.
  • A Webhook is like installing a doorbell on your door. You go about your normal day doing work inside your house. When the delivery driver arrives with the package, they press the doorbell button. The ring triggers your immediate attention to collect the package, eliminating unnecessary checks.

Message Integrity and HMAC Signatures

Because webhook receivers are public HTTP endpoints exposed to the internet, they are highly vulnerable to attack. Malicious users can easily spoof POST requests to trigger fraudulent business logic (such as marking an unpaid order as paid).

To establish authenticity and message integrity, webhook publishers cryptographically sign payloads using a shared secret key. This process employs the Hash-based Message Authentication Code (HMAC) algorithm:

  1. The server serializes the request payload into a raw string.
  2. The server generates a signature by computing an HMAC over the payload, keyed with the shared secret and using SHA-256 as the underlying hash function.
  3. The server transmits the signature, along with a timestamp, inside custom HTTP headers (such as X-Signature and X-Timestamp).
  4. The client receiver calculates the HMAC signature of the incoming request body using its local copy of the shared secret.
  5. The receiver performs a constant-time string comparison to match its calculated signature with the header's signature. If they match, the receiver knows the message is authentic and has not been altered in transit.
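
The five steps above can be sketched in Python with the standard library's hmac module. This is a minimal illustration, not a definitive implementation: the secret, the payload, and the function names sign_payload and verify_signature are assumptions made for the example, and the signature is assumed to travel as a hex string in the X-Signature header.

```python
import hashlib
import hmac


def sign_payload(secret: bytes, raw_body: bytes) -> str:
    """Publisher side: hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, raw_body, hashlib.sha256).hexdigest()


def verify_signature(secret: bytes, raw_body: bytes, header_signature: str) -> bool:
    """Receiver side: recompute the signature and compare in constant time."""
    expected = sign_payload(secret, raw_body)
    # compare_digest does not short-circuit on the first differing character,
    # so response timing does not reveal how much of the signature matched.
    return hmac.compare_digest(expected.encode(), header_signature.encode())


# Hypothetical secret and payload, for illustration only.
secret = b"shared-secret"
body = b'{"event":"order.paid","order_id":42}'
signature = sign_payload(secret, body)  # sent in the X-Signature header

print(verify_signature(secret, body, signature))         # True
print(verify_signature(secret, body + b" ", signature))  # False
```

The receiver should verify against the exact bytes it received. Parsing the JSON and serializing it again before hashing can change whitespace or key order, which would make a valid signature fail.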

Preventing Replay Attacks

Attackers who capture a valid, correctly signed webhook request can resend it later (a replay attack) to trigger the same business logic again or to overload client systems. To mitigate this:

  • The publisher signs a combined string of the timestamp and payload: timestamp + "." + payload.
  • The receiver extracts the timestamp header and compares it against its local system clock.
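  • If the difference exceeds a threshold (for example, 5 minutes), the receiver discards the request as stale. Because the timestamp is covered by the signature, an attacker cannot refresh it, so a captured request is only replayable within that window.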
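
A rough sketch of the timestamp check, assuming the X-Timestamp header carries Unix seconds as a decimal string and the signature is a hex HMAC-SHA256. The names sign_with_timestamp, verify_fresh, and TOLERANCE_SECONDS are hypothetical.

```python
import hashlib
import hmac
import time

TOLERANCE_SECONDS = 300  # 5 minutes, the example threshold used above


def sign_with_timestamp(secret: bytes, timestamp: int, raw_body: bytes) -> str:
    """Publisher side: sign the string timestamp + "." + payload."""
    signed = str(timestamp).encode() + b"." + raw_body
    return hmac.new(secret, signed, hashlib.sha256).hexdigest()


def verify_fresh(secret, timestamp_header, raw_body, signature_header, now=None) -> bool:
    """Receiver side: reject stale timestamps, then verify the signature."""
    try:
        timestamp = int(timestamp_header)
    except (TypeError, ValueError):
        return False  # missing or malformed header
    now = time.time() if now is None else now
    if abs(now - timestamp) > TOLERANCE_SECONDS:
        return False  # outside the accepted window
    expected = sign_with_timestamp(secret, timestamp, raw_body)
    return hmac.compare_digest(expected.encode(), signature_header.encode())


# Hypothetical values, for illustration only.
sig = sign_with_timestamp(b"shared-secret", 1_700_000_000, b'{"event":"order.paid"}')
print(verify_fresh(b"shared-secret", "1700000000", b'{"event":"order.paid"}', sig,
                   now=1_700_000_060))  # True: 60 seconds old
```

The optional now parameter exists only so the check can be exercised with a fixed clock. In production the receiver would rely on its system time, which needs to be reasonably synchronized with the publisher's.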

Egress Security and Network Validation

Beyond signature verification, webhook senders and receivers must apply network security filters to protect their infrastructure:

  • Mutual TLS (mTLS): Webhook endpoints can require mTLS, forcing the publishing server to present a client certificate that the receiver verifies against its trusted certificate authorities, so only known publishers can connect.
  • Egress Firewalls and IP Blocklists: Source servers that send webhooks to arbitrary client-registered URLs must configure egress firewalls and block private, loopback, and link-local address ranges. Otherwise a malicious registered URL could make the dispatcher send requests into the publisher's own internal network (server-side request forgery).
  • IP Whitelisting: Receivers can verify the publisher's egress IP address range against a published static list, blocking requests originating from unlisted IP addresses.
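
Both IP checks can be approximated with Python's ipaddress module. The sketch below is illustrative only: PUBLISHER_RANGES is a made-up range from the documentation address space, and the function names are assumptions. It does not replace a real firewall.

```python
import ipaddress

# Hypothetical published egress range of the webhook publisher.
PUBLISHER_RANGES = [ipaddress.ip_network("203.0.113.0/24")]


def is_allowed_source(remote_ip: str) -> bool:
    """Receiver side: accept only requests from the published IP ranges."""
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in PUBLISHER_RANGES)


def is_safe_destination(resolved_ip: str) -> bool:
    """Publisher side: refuse to dispatch to internal or special-purpose addresses."""
    address = ipaddress.ip_address(resolved_ip)
    return not (
        address.is_private
        or address.is_loopback
        or address.is_link_local
        or address.is_multicast
        or address.is_reserved
        or address.is_unspecified
    )


print(is_allowed_source("203.0.113.7"))        # True
print(is_safe_destination("10.0.0.5"))         # False: private network
print(is_safe_destination("169.254.169.254"))  # False: link-local address
```

On the publisher side the check should run against the IP address the hostname actually resolves to at connection time, since a hostname that passed validation at registration can later be pointed at an internal address.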

Recipient Optimizations

When a webhook receiver gets a valid payload, it must avoid performing long-running tasks (such as image processing or database reports) inline with the request.

Doing so holds the HTTP socket open, leading to:

  • Socket Timeouts: The publishing server will assume the delivery failed and drop the connection or initiate retries.
  • Worker Depletion: The receiver's server thread pool will rapidly drain, blocking new incoming webhook requests.

To optimize ingestion, receivers validate the signature, enqueue the raw payload to an internal message queue (such as Redis, RabbitMQ, or SQS), and then return a 202 Accepted status code right away. Background workers consume and process the events asynchronously, so the HTTP socket is freed within milliseconds.
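
The verify, enqueue, and respond sequence might look like the following sketch. It leaves out the web framework and uses an in-process queue.Queue as a stand-in for Redis, RabbitMQ, or SQS. SECRET, handle_webhook, worker, and the processed list are all hypothetical names.

```python
import hashlib
import hmac
import queue
import threading

SECRET = b"shared-secret"   # hypothetical shared secret
task_queue = queue.Queue()  # stand-in for an external message queue
processed = []              # records what the worker has handled


def handle_webhook(raw_body: bytes, signature_header: str) -> int:
    """Return the HTTP status code the receiver would send back."""
    expected = hmac.new(SECRET, raw_body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), signature_header.encode()):
        return 401  # reject unsigned or tampered requests
    task_queue.put(raw_body)  # enqueue first, so an accepted event is never lost
    return 202                # respond without doing the slow work inline


def worker() -> None:
    """Background consumer: the long-running processing happens here."""
    while True:
        raw_body = task_queue.get()
        try:
            processed.append(raw_body)  # placeholder for the real processing
        finally:
            task_queue.task_done()


threading.Thread(target=worker, daemon=True).start()

body = b'{"event":"order.paid"}'
signature = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
print(handle_webhook(body, signature))  # 202
task_queue.join()                       # wait for the worker (demo only)
```

With an in-process queue, events are lost if the process crashes, which is one reason to prefer a durable external queue in a real deployment.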


Server Dispatch Optimization

Publishing servers must also optimize their outgoing webhooks. Sending HTTP requests inline with main application transactions slows down user interactions and, if the webhook receiver is offline or slow, risks holding database transactions and their locks open while the request waits.

Publishers delegate dispatching to message queues:

  1. The application transaction records an event and enqueues a dispatch task.
  2. Background dispatch workers read the event queue, compile payloads, compute HMAC signatures, and execute the POST request.
  3. The outgoing queue rate limits requests destined for individual client domains, preventing the publisher's workers from accidentally overwhelming a client with what amounts to a denial-of-service attack.
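
One possible way to enforce the per-domain limit in step 3 is a sliding-window counter that the dispatch worker consults before each POST. This sketch makes several assumptions: the class DomainRateLimiter, the drain helper, and the example URLs are hypothetical, and send is whatever function performs the signed HTTP request.

```python
import time
from collections import defaultdict, deque
from urllib.parse import urlsplit


class DomainRateLimiter:
    """Allow at most `limit` dispatches per `window` seconds for each domain."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent = defaultdict(deque)  # domain -> timestamps of recent sends

    def try_acquire(self, url: str) -> bool:
        domain = urlsplit(url).hostname or ""
        now = self.clock()
        recent = self.sent[domain]
        while recent and now - recent[0] >= self.window:
            recent.popleft()  # forget sends that left the window
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True


def drain(tasks: deque, limiter: DomainRateLimiter, send) -> deque:
    """Send every task the limiter allows; return the rest for a later pass."""
    deferred = deque()
    while tasks:
        url, payload = tasks.popleft()
        if limiter.try_acquire(url):
            send(url, payload)
        else:
            deferred.append((url, payload))
    return deferred


limiter = DomainRateLimiter(limit=2, window=1.0)
tasks = deque([("https://a.example/hook", b"1"), ("https://a.example/hook", b"2"),
               ("https://a.example/hook", b"3"), ("https://b.example/hook", b"4")])
leftover = drain(tasks, limiter, lambda url, payload: None)
print(len(leftover))  # 1: the third request to a.example is deferred
```

Deferred tasks go back to the queue rather than being dropped, so a busy client receives its events more slowly but still receives all of them.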

Retry Policies, Exponential Backoff, and DLQs

Since client servers can experience outages, publishers must implement reliable retry policies:

  • Exponential Backoff: If a delivery fails, the worker waits for a period that doubles with each attempt (for example, 1s, 2s, 4s, 8s, 16s).
  • Jitter: Random delays (jitter) are added to the backoff times to prevent retrying workers from hitting client servers in synchronized spikes.
  • Dead Letter Queues (DLQ): If an event fails to deliver after multiple retry attempts (e.g. 5 to 10 attempts), the system moves the event to a Dead Letter Queue. Administrators inspect the DLQ to identify broken endpoints and notify clients.
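
The three policies can be combined in a small delivery loop. The sketch below is one possible arrangement under stated assumptions: backoff_delay and deliver_with_retries are hypothetical names, send is assumed to return True on a successful delivery, jitter adds up to 50% on top of the doubled delay, and a plain list stands in for the Dead Letter Queue.

```python
import random
import time


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Delay after failed attempt `attempt` (0-based): doubled, capped, plus jitter."""
    delay = min(cap, base * (2 ** attempt))  # 1s, 2s, 4s, 8s, 16s, ...
    return delay + delay * 0.5 * rng()       # add up to 50% random jitter


def deliver_with_retries(event, send, dead_letters: list,
                         max_attempts: int = 5, sleep=time.sleep) -> bool:
    """Try to deliver `event`; park it in `dead_letters` if every attempt fails."""
    for attempt in range(max_attempts):
        try:
            if send(event):
                return True
        except OSError:
            pass  # network errors count as a failed delivery
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    dead_letters.append(event)  # left for administrators to inspect
    return False


# With jitter disabled, the delays follow the doubling schedule from the text.
print([backoff_delay(n, rng=lambda: 0.0) for n in range(5)])
# [1.0, 2.0, 4.0, 8.0, 16.0]
```

The cap keeps late retries from growing without bound. A production dispatcher would typically reschedule the task on the queue with a delay instead of sleeping inside the worker.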

Further Reading

Core Literature References

Designing Quality Webhooks

by Tom Snyders — HMAC signing standards and async delivery design
