CAP Theorem

Why distributed databases cannot simultaneously guarantee consistency, availability, and partition tolerance during a network split.

IntermediateReliability & ScaleChapter: Reliability & Scalability12 min read

The Foundations of CAP: Three Distributed Guarantees

The CAP Theorem states that any distributed data store can only guarantee two out of three system properties at the same time:

  • Consistency (C): Specifically linearizability. Every read request receives the most recent write or returns an error. The system acts as if there is only a single copy of the data, even if it is replicated across multiple geographic regions.
  • Availability (A): Every non-failing node in the cluster returns a non-error response for every request. It does not guarantee that the returned data is the most recent write, only that the node is active and responsive.
  • Partition Tolerance (P): The system continues to function despite any number of communication drops or delayed messages between nodes in the network.

The Tradeoff: Partitions are Inevitable

In a perfect physical network where fibers never break and routers never drop packets, you could have all three properties. However, physical hardware is imperfect. Network partitions are a reality of distributed systems. A network partition occurs when communication between two or more nodes in a cluster fails while both groups remain individually functional.

Therefore, the choice is not "choose two out of three". The actual tradeoff is: when a network partition (P) occurs, will your system choose Consistency (CP) or Availability (AP)?

Node A Val = "X" Node B Val = "X" Node C Val = "X" Write "Y" Network Partition (P) Blocked AP Choice (Availability) Accept write local. Reads from B/C return stale "X" CP Choice (Consistency) Reject write since quorum is unreachable. Return error.

CP Systems: Prioritizing Consistency Over Availability

In a CP system, when a network partition cuts off communication, the system halts writes or returns errors on the isolated nodes.

If the client attempts to write Value Y to Node A, but Node A cannot reach Node B or Node C to replicate the update, the system rejects the write. This prevents data divergence. If a client subsequently queries Node B or Node C, they will read the correct, albeit old, value of the system (Value X), and the system preserves consistency.

This model is common in systems that manage financial accounts or inventory ledgers where duplicate or diverging records are unacceptable.

  • Examples: Google Spanner, Etcd, and ZooKeeper. These databases use consensus protocols (like Paxos or Raft) requiring a majority quorum (W > N/2) to accept any writes.

AP Systems: Prioritizing Availability Over Consistency

In an AP system, nodes accept writes locally even when partitioned from the rest of the cluster.

If a partition isolates Node A, it will accept Value Y and immediately return a success response to the client. During the partition, if a different client queries Node B or Node C, they will get the stale Value X. The system is fully available, but it is inconsistent.

Once the partition heals, the nodes resolve conflicts using strategies like:

  • Last-Write-Wins (LWW): Using timestamp metadata to overwrite older data, though this is sensitive to clock drift.

  • Vector Clocks: Tracking version histories to detect and flag conflicts for application-level resolution.

  • CRDTs (Conflict-free Replicated Data Types): Merging sets or registers mathematically (such as shopping carts) without manual intervention.

  • Examples: Apache Cassandra, Amazon DynamoDB, and Couchbase.


Beyond CAP: The PACELC Theorem

The CAP Theorem only focuses on how a system behaves during network failures. However, network splits are rare. The PACELC Theorem extends CAP by describing how a database operates under normal, healthy network conditions:

  • If there is a Partition (P), how does the system trade off Availability (A) and Consistency (C)?
  • Else (E) (when the network is running normally), how does the system trade off Latency (L) and Consistency (C)?

For instance, MongoDB is a PC/EC system. During a partition, it sacrifices availability to preserve consistency. When running normally, it forces clients to wait for replication acknowledgments, prioritizing consistency over low latency. Cassandra is a PA/EL system. It prioritizes availability during partitions, and trades off consistency for low latency by returning local read results immediately during normal operations.


Real-World Implementation and Consensus Limits

A common misconception is that a database is strictly CP or AP. In practice, modern databases are highly configurable.

For example, in Cassandra, developers can tune consistency levels on a per-query basis. A client can write using QUORUM consistency, requiring a majority of nodes to acknowledge, and read using QUORUM consistency. This configuration provides consistency (R + W > N) at the cost of latency. If the client writes using ONE and reads using ONE, the system operates with low latency but exhibits eventual consistency.


Further Reading

Code Examples

Core Literature References

Towards Robust Distributed Systems

by Eric A. Brewer — Proceedings of the Annual ACM Symposium on Principles of Distributed Computing (PODC), pp. 7-10

View source

Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-tolerant Web Services

by Seth Gilbert & Nancy Lynch — ACM SIGACT News, pp. 51-59

View source