Consensus Under Network Partitions

Why CAP trade-offs look different in modern cloud topologies, and what that means for system design.

The CAP theorem is the most invoked, least understood result in distributed systems. It's used to justify availability over consistency, consistency over availability, and occasionally both in the same architecture diagram. This paper revisits the trade-off in light of two changes since Brewer's original conjecture: cloud networks are now much more reliable than they were in 2000, and the cost of inconsistency has grown sharply for many workloads.

Background

Brewer's CAP theorem, formalized by Gilbert and Lynch (2002), states that a distributed system cannot simultaneously provide:

Consistency — every read returns the most recent write
Availability — every request receives a non-error response
Partition tolerance — the system continues to function despite arbitrary message loss

When a partition occurs, the system must sacrifice either C or A. This is the famous "pick two" framing, though that framing is misleading: partitions happen, so you're really picking between C and A, with P as a given.

What's changed since 2000

Three things, mainly:

Network reliability. Inside a single cloud region in 2024, partition rates are measured in basis points per year. Cross-region links are less reliable, but most systems are not cross-region.
Workload shifts. Financial transactions, inventory management, and authentication systems pay much higher costs for inconsistency than the early-web workloads (blogs, social posts) that shaped Dynamo and Cassandra.
Consensus performance. Raft (2014) and EPaxos (2013) brought consensus latency from tens of milliseconds to single-digit milliseconds inside a region. The performance argument against CP systems is weaker than it was.

Core argument

The CAP framing treats availability and consistency as binary properties. They are not. Both are continuous, both can be tuned, and the right operating point depends on the cost of each kind of failure.

A more useful model

Consider four quantities:

p — probability of network partition per unit time
C(c) — cost of an inconsistent read (depends on workload class c)
A(c) — cost of unavailability during a partition
t — expected partition duration

Total expected cost is:

E[cost] = p · t · A(c)         (AP system, pays availability cost during partition)
        + p · C(c)              (CP system, pays inconsistency cost per stale read)

For workloads where C(c) ≫ A(c) — payments, leader election, inventory — the CP system wins even at high partition rates. For workloads where A(c) ≫ C(c) — feed ranking, recommendations, telemetry — the AP system wins.

The middle ground

Most real systems are neither pure CP nor pure AP. They use bounded staleness: reads may be stale by up to N seconds or M writes, and the system tracks both. This is what Spanner, CockroachDB, and FaunaDB do under the hood.

Implementation patterns

Three patterns appear repeatedly in production systems that navigate CAP well:

Discussion

Two limitations of this analysis:

The cost functions are hard to measure. "Cost of an inconsistent read" is workload-specific and often political — engineering teams understate it, compliance teams overstate it. The framework is useful for structuring the conversation, not for producing a single number.

Partition probability is a moving target. Cloud providers have made networks dramatically more reliable, but they've also added abstraction layers (load balancers, service meshes, sidecars) that introduce their own failure modes. The "network partition" of 2024 is more likely to be a misconfigured Envoy than a severed fiber.

Conclusion

CAP is a useful starting point, not an architecture decision. The interesting questions are: how often do partitions happen in your environment, how long do they last, and what does each kind of failure actually cost the business? Answering those three questions gives you a defensible position on the CAP trade-off; invoking CAP without them is cargo-culting.

References

Gilbert and Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" (2002)
DeCandia et al., "Dynamo: Amazon's Highly Available Key-Value Store" (SOSP 2007)
Ongaro and Ousterhout, "In Search of an Understandable Consensus Algorithm" (USENIX ATC 2014)
Corbett et al., "Spanner: Google's Globally Distributed Database" (OSDI 2012)
Bailis et al., "Highly Available Transactions: Virtues and Limitations" (VLDB 2014)

How is this guide?

Consensus Under Network Partitions

Hedged reads

Quorum tunability

Causal consistency

On this page