Idempotency

Making repeated operations safe by ensuring the same request produces the same result no matter how many times it's executed.


What It Is

An operation is idempotent if performing it multiple times has the same effect as performing it once.

Under the HTTP specification:

  • GET, HEAD, OPTIONS, TRACE are safe and idempotent by definition because they only retrieve data.
  • PUT is idempotent: replacing a resource entirely with the same payload repeatedly leaves the resource in the exact same state.
  • DELETE is idempotent: deleting a resource multiple times leaves it deleted. The first call removes it (typically returning 200 or 204), and later calls typically return 404. The responses differ, but the server state is identical.
  • POST is not idempotent: sending the same POST request multiple times can create multiple resources or trigger multiple transactions.
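Idempotency is about server state, not response codes. The following minimal Python sketch uses a hypothetical in-memory `ResourceStore` (not part of any framework) to show repeated PUT and DELETE calls converging on the same state while repeated POST calls keep adding resources:

```python
import itertools


class ResourceStore:
    """Hypothetical in-memory store used to compare method semantics."""

    def __init__(self):
        self.items = {}
        self._ids = itertools.count(1)

    def put(self, rid, body):
        # Replace the whole resource; repeating this leaves the same state.
        created = rid not in self.items
        self.items[rid] = dict(body)
        return 201 if created else 200

    def delete(self, rid):
        # The first call removes the resource; later calls find nothing to remove.
        if rid in self.items:
            del self.items[rid]
            return 204
        return 404

    def post(self, body):
        # Each call allocates a fresh id, so repeating it creates duplicates.
        rid = next(self._ids)
        self.items[rid] = dict(body)
        return 201, rid


store = ResourceStore()
store.put("a", {"name": "x"})
store.put("a", {"name": "x"})
print(len(store.items))  # 1

print(store.delete("a"), store.delete("a"))  # 204 404
print(len(store.items))  # 0

store.post({"amount": 10})
store.post({"amount": 10})
print(len(store.items))  # 2
```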

Why It Matters: Network Failures

In distributed systems, networks are unreliable. From the client's point of view, a failed request can mean any of three things:

  1. The request never reaches the server.
  2. The server processes the request but the connection drops before sending the response.
  3. The response is lost on the way back to the client.

If a client retries POST /payments after a timeout, it cannot tell whether the failure happened at step 1, 2, or 3. In cases 2 and 3 the charge already went through, so without idempotency a retry could charge the customer's card twice.
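The double-charge risk can be reproduced in a few lines. This is a simulation under stated assumptions: `PaymentServer` and `send` are hypothetical names, and dropping the reply after the server has done its work stands in for failure modes 2 and 3:

```python
class PaymentServer:
    """Hypothetical payment server with no idempotency protection."""

    def __init__(self):
        self.charges = []

    def create_payment(self, amount):
        self.charges.append(amount)  # the card is charged here
        return {"status": 201, "amount": amount}


def send(server, amount, drop_response):
    # The server does the work; if drop_response is set, the reply never arrives.
    response = server.create_payment(amount)
    if drop_response:
        raise TimeoutError("response lost")
    return response


server = PaymentServer()
try:
    send(server, 10, drop_response=True)
except TimeoutError:
    # The client cannot tell which step failed, so it blindly retries.
    send(server, 10, drop_response=False)

print(server.charges)  # [10, 10]
```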


Idempotency Key Lifecycle & Cache Flow

To make POST requests safe, clients generate a unique identifier (typically a UUIDv4) called an Idempotency Key and send it in a custom header: Idempotency-Key: 9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d.
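On the client side, the key is generated once per logical operation and sent unchanged on every retry. A minimal sketch, assuming a hypothetical `send(path, headers, body)` transport that raises `TimeoutError` when no response arrives:

```python
import uuid


def post_with_retries(send, path, body, max_attempts=3):
    """Send a POST, reusing one Idempotency-Key across every attempt."""
    # Generated once, before the first attempt, never per attempt.
    headers = {"Idempotency-Key": str(uuid.uuid4())}
    for attempt in range(1, max_attempts + 1):
        try:
            return send(path, headers, body)
        except TimeoutError:
            if attempt == max_attempts:
                raise  # give up and surface the timeout to the caller


seen_keys = []


def flaky_send(path, headers, body):
    # Hypothetical transport: times out twice, then succeeds.
    seen_keys.append(headers["Idempotency-Key"])
    if len(seen_keys) < 3:
        raise TimeoutError("no response")
    return {"status": 201}


print(post_with_retries(flaky_send, "/payments", {"amount": 10}))  # {'status': 201}
print(len(seen_keys), len(set(seen_keys)))  # 3 1
```

A production client would usually also wait with backoff between attempts; the sketch omits that to keep the key-reuse behavior visible.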

Figure: Idempotency key lifecycle and cache flow between the Client, API Gateway / Server, Cache (Redis), and Database / Bank. 1. POST (Key=xyz) → 2. Lookup key → 3. Miss (not seen) → 4. Charge card → 5. Save key + result → 6. Success (201). A retry with Key=xyz returns the cached result.

The Step-by-Step Flow:

  1. Key Generation: The client generates a unique token before sending the request.
  2. First Request: The server receives the key and searches its storage. Since it is a cache miss, the server marks the key as IN_PROGRESS (often using an atomic lock) to avoid race conditions.
  3. Execution: The server executes the downstream banking or write operation.
  4. Storage: Once finished, the server stores the response payload, status code, and headers in the cache or database, and sets the key's state to COMPLETED.
  5. Return: The response goes back to the client.
  6. Subsequent Retries: If the client retries the request with the same key, the server finds the key in the cache and returns the saved response immediately. The downstream operation is not triggered a second time.
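The steps above can be sketched as a small handler. This is illustrative rather than production-ready: `IdempotencyStore`, `handle_payment`, and `charge_card` are hypothetical names, a lock-guarded Python dict stands in for Redis or a database, and only the status code and body are stored (headers are omitted):

```python
import threading

IN_PROGRESS = "IN_PROGRESS"
COMPLETED = "COMPLETED"


class IdempotencyStore:
    """Hypothetical in-memory stand-in for the cache or database."""

    def __init__(self):
        self._lock = threading.Lock()
        self._records = {}

    def begin(self, key):
        """Atomically claim `key`; return the existing record if it was already seen."""
        with self._lock:
            if key in self._records:
                return self._records[key]
            self._records[key] = {"state": IN_PROGRESS}  # step 2: mark IN_PROGRESS
            return None

    def complete(self, key, status, body):
        with self._lock:
            # Step 4: store the response and mark the key COMPLETED.
            self._records[key] = {"state": COMPLETED, "status": status, "body": body}


def handle_payment(store, key, amount, charge_card):
    existing = store.begin(key)
    if existing is not None:
        if existing["state"] == COMPLETED:
            return existing["status"], existing["body"]  # step 6: replay saved response
        return 409, {"error": "a request with this key is still in progress"}
    result = charge_card(amount)  # step 3: the downstream call runs once
    store.complete(key, 201, result)
    return 201, result  # step 5


charges = []


def charge_card(amount):
    # Hypothetical downstream call that records every real charge.
    charges.append(amount)
    return {"charge_id": len(charges), "amount": amount}


store = IdempotencyStore()
print(handle_payment(store, "xyz", 10, charge_card))
print(handle_payment(store, "xyz", 10, charge_card))  # same response, replayed
print(len(charges))  # 1
```

If `charge_card` raises, this sketch leaves the key stuck in `IN_PROGRESS`; the downstream-failure point under Common Mistakes covers that case.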

Production Concerns

1. Handling Concurrent Requests

What happens if two identical requests with the same key arrive at the server at the exact same millisecond? This is a classic race condition.

To prevent this, the database or cache must claim the key with an atomic set-if-absent operation. For example, Redis's SET key val NX PX 10000 stores the key only if it does not already exist and expires it after 10,000 ms (10 seconds). If a second request tries to claim the same key while the first is still processing, the server either returns 409 Conflict or waits for the lock to be released.
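The key property of SET ... NX PX is that the existence check and the write happen as one atomic step. The sketch below emulates that behavior in memory and races two threads for the same key. `AtomicKV` is a hypothetical class, not a Redis client; with the redis-py client the corresponding call would be `r.set(key, value, nx=True, px=10000)`, which returns a truthy value only when the key was actually set:

```python
import threading
import time


class AtomicKV:
    """Hypothetical in-memory emulation of Redis `SET key val NX PX ttl_ms`."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (value, expires_at in monotonic seconds)

    def set_nx_px(self, key, value, ttl_ms):
        with self._lock:  # the check and the write happen as one atomic step
            now = time.monotonic()
            entry = self._data.get(key)
            if entry is not None and entry[1] > now:
                return False  # key exists and has not expired: lock not acquired
            self._data[key] = (value, now + ttl_ms / 1000)
            return True


kv = AtomicKV()
results = []


def handle(request_id):
    # Both requests carry the same idempotency key "xyz".
    acquired = kv.set_nx_px("idem:xyz", request_id, 10_000)
    results.append(acquired)  # a real server would return 409 when not acquired


threads = [threading.Thread(target=handle, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results.count(True), results.count(False))  # 1 1
```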

2. Payload Consistency

Should the server verify if the request body matches the first request for a given key?

If a client sends POST /payments with amount=10 and key xyz, and then sends POST /payments with amount=100 and the same key xyz, this indicates a client bug or a malicious request. Best practice is to hash the request body and store the hash alongside the key. If the hash does not match on a retry, return 400 Bad Request with an error explaining the payload mismatch.
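One way to implement this check is to hash a canonical form of the request. A minimal sketch, assuming JSON bodies; hashing the method and path together with the body, and the `fingerprint` and `check_payload` names, are illustrative choices rather than anything mandated by a specification:

```python
import hashlib
import json


def fingerprint(method, path, body):
    # Canonical JSON (sorted keys, no extra whitespace) so that key order
    # in the client's payload does not change the hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(f"{method} {path} {canonical}".encode()).hexdigest()


def check_payload(stored_hash, method, path, body):
    """Return None if the retry matches the original request, else a 400 response."""
    if fingerprint(method, path, body) != stored_hash:
        return 400, {"error": "Idempotency-Key reused with a different request body"}
    return None


original = fingerprint("POST", "/payments", {"amount": 10, "currency": "usd"})
print(check_payload(original, "POST", "/payments", {"currency": "usd", "amount": 10}))  # None
print(check_payload(original, "POST", "/payments", {"amount": 100, "currency": "usd"})[0])  # 400
```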

3. Expiration (TTL)

Idempotency keys do not need to be stored forever; doing so would bloat your database. A TTL of 24 to 72 hours is common and is usually long enough to cover any realistic client retry window.
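A sketch of TTL-based expiry, assuming a hypothetical `ExpiringKeyStore` with an injectable clock so expiry can be shown without waiting a day. In Redis, setting an expiry on the key (for example with SET ... EX seconds) handles this natively; with a SQL table, a scheduled delete of expired rows plays the role of `purge_expired`:

```python
import time

TTL_SECONDS = 24 * 60 * 60  # 24 hours, the low end of the range above


class ExpiringKeyStore:
    """Hypothetical key store where every record carries an expiry time."""

    def __init__(self, ttl=TTL_SECONDS, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self.records = {}  # key -> (response, expires_at)

    def save(self, key, response):
        self.records[key] = (response, self.clock() + self.ttl)

    def get(self, key):
        entry = self.records.get(key)
        if entry is None or entry[1] <= self.clock():
            return None  # unknown or expired: treat as a brand-new request
        return entry[0]

    def purge_expired(self):
        # Periodic sweep so expired rows do not accumulate forever.
        now = self.clock()
        for key in [k for k, (_, exp) in self.records.items() if exp <= now]:
            del self.records[key]


now = [0.0]
store = ExpiringKeyStore(clock=lambda: now[0])
store.save("xyz", {"status": 201})
now[0] = TTL_SECONDS - 1
print(store.get("xyz"))  # {'status': 201}
now[0] = TTL_SECONDS
print(store.get("xyz"))  # None
store.purge_expired()
print(len(store.records))  # 0
```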


Common Mistakes

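  • Using Non-Unique Keys: Reusing a customer ID or order ID as the idempotency key, so unrelated operations collide. Keys must be unique per logical operation: generate a new key for each distinct operation and send that same key on every retry of it.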
  • No TTL: Forgetting to set a TTL on idempotency records, causing unbounded table growth.
  • Mishandling Downstream Failures: If the downstream bank call fails with a 500 error, storing that error permanently means every retry replays the failure. When an operation fails due to a transient issue, the idempotency record should be cleared or updated so that a genuine retry can run.
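The downstream-failure point can be sketched as follows: definitive outcomes are stored and replayed, but a transient failure removes the record so the next retry with the same key runs the operation again. `TransientError`, `process`, and the 503 response are hypothetical choices for illustration, and a plain dict stands in for the idempotency table (a real server would also need the atomic claim described under Production Concerns):

```python
IN_PROGRESS = "IN_PROGRESS"


class TransientError(Exception):
    """Hypothetical marker for retryable downstream failures (e.g. a bank 500)."""


def process(records, key, operation):
    """Run `operation` for a key until it yields a definitive outcome, then replay it.

    Single-threaded sketch: `records` is a plain dict and the check-and-claim
    below is not atomic.
    """
    record = records.get(key)
    if record == IN_PROGRESS:
        return 409, "request with this key is still in progress"
    if record is not None:
        return record  # replay a stored, definitive outcome
    records[key] = IN_PROGRESS
    try:
        result = operation()
    except TransientError:
        del records[key]  # clear the record so a genuine retry can run
        return 503, "downstream unavailable; retry with the same key"
    records[key] = result
    return result


attempts = []


def flaky_charge():
    # Hypothetical bank call: fails transiently once, then succeeds.
    attempts.append(1)
    if len(attempts) == 1:
        raise TransientError("bank returned 500")
    return 201, "charged"


records = {}
print(process(records, "xyz", flaky_charge)[0])  # 503
print(process(records, "xyz", flaky_charge))  # (201, 'charged')
print(process(records, "xyz", flaky_charge))  # (201, 'charged')
print(len(attempts))  # 2
```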
