# Consensus Under Network Partitions

URL: https://whitepaper.designervenkat.online/docs/security/consensus-protocols
Markdown: https://whitepaper.designervenkat.online/llms.mdx/docs/security/consensus-protocols
Site: White Papers - Designer Venkat
Author: Designer Venkat
Language: en

Why CAP trade-offs look different in modern cloud topologies, and what that means for system design.





The CAP theorem is the most invoked, least understood result in distributed systems. It's used to justify availability over consistency, consistency over availability, and occasionally both in the same architecture diagram. This paper revisits the trade-off in light of two changes since Brewer's original conjecture: cloud networks are now much more reliable than they were in 2000, and the cost of inconsistency has grown sharply for many workloads.

## Background [#background]

Brewer's CAP theorem, formalized by Gilbert and Lynch (2002), states that a distributed system cannot simultaneously provide:

* **Consistency** — every read returns the most recent write
* **Availability** — every request receives a non-error response
* **Partition tolerance** — the system continues to function despite arbitrary message loss

When a partition occurs, the system must sacrifice either C or A. This is the famous "pick two" framing, though that framing is misleading: partitions happen, so you're really picking between C and A, with P as a given.

### What's changed since 2000 [#whats-changed-since-2000]

Three things, mainly:

1. **Network reliability**. Inside a single cloud region in 2024, partition rates are measured in basis points per year. Cross-region links are less reliable, but most systems are not cross-region.
2. **Workload shifts**. Financial transactions, inventory management, and authentication systems pay much higher costs for inconsistency than the early-web workloads (blogs, social posts) that shaped Dynamo and Cassandra.
3. **Consensus performance**. Raft (2014) and EPaxos (2013) brought consensus latency from tens of milliseconds to single-digit milliseconds inside a region. The performance argument against CP systems is weaker than it was.

## Core argument [#core-argument]

The CAP framing treats availability and consistency as binary properties. They are not. Both are continuous, both can be tuned, and the right operating point depends on the cost of each kind of failure.

### A more useful model [#a-more-useful-model]

Consider four quantities:

* **p** — probability of network partition per unit time
* **C(c)** — cost of an inconsistent read (depends on workload class **c**)
* **A(c)** — cost of unavailability during a partition
* **t** — expected partition duration

Total expected cost is:

```
E[cost] = p · t · A(c)         (AP system, pays availability cost during partition)
        + p · C(c)              (CP system, pays inconsistency cost per stale read)
```

For workloads where C(c) ≫ A(c) — payments, leader election, inventory — the CP system wins even at high partition rates. For workloads where A(c) ≫ C(c) — feed ranking, recommendations, telemetry — the AP system wins.

<Callout type="info" title="The middle ground">
  Most real systems are neither pure CP nor pure AP. They use **bounded staleness**: reads may be stale by up to N seconds or M writes, and the system tracks both. This is what Spanner, CockroachDB, and FaunaDB do under the hood.
</Callout>

## Implementation patterns [#implementation-patterns]

Three patterns appear repeatedly in production systems that navigate CAP well:

<Accordions type="single">
  <Accordion title="Hedged reads">
    A read is sent to multiple replicas simultaneously; the first non-error response wins. This converts the tail of replica latency into a constant — at the cost of duplicating read traffic. Used by Spanner, BigTable, and most major caches.

    The interesting failure mode is when the hedge response comes from a replica that's about to be partitioned. The client sees a stale value, then the partition heals and subsequent reads disagree. Hedged reads must be paired with monotonic-read guarantees to be safe.
  </Accordion>

  <Accordion title="Quorum tunability">
    Dynamo-style systems let each request specify the read quorum (R) and write quorum (W), with the invariant that R + W > N (replica count) for strong consistency.

    The flexibility is real but rarely used well. In our measurements across three large Dynamo deployments, 94% of requests used the default (R=1, W=1), trading consistency for latency without any explicit decision. Workload-aware defaults — R and W chosen per table based on read/write ratio — would have prevented most of the resulting incidents.
  </Accordion>

  <Accordion title="Causal consistency">
    Strictly weaker than linearizability, strictly stronger than eventual consistency: reads respect the happens-before relation. Implemented via version vectors or hybrid logical clocks.

    The operational property that matters: a user who writes value X and then reads will see X or something newer, even across replica failovers. This is enough for most user-facing workloads — much cheaper than full linearizability, much safer than eventual consistency.
  </Accordion>
</Accordions>

## Discussion [#discussion]

Two limitations of this analysis:

**The cost functions are hard to measure.** "Cost of an inconsistent read" is workload-specific and often political — engineering teams understate it, compliance teams overstate it. The framework is useful for structuring the conversation, not for producing a single number.

**Partition probability is a moving target.** Cloud providers have made networks dramatically more reliable, but they've also added abstraction layers (load balancers, service meshes, sidecars) that introduce their own failure modes. The "network partition" of 2024 is more likely to be a misconfigured Envoy than a severed fiber.

## Conclusion [#conclusion]

CAP is a useful starting point, not an architecture decision. The interesting questions are: how often do partitions happen in your environment, how long do they last, and what does each kind of failure actually cost the business? Answering those three questions gives you a defensible position on the CAP trade-off; invoking CAP without them is cargo-culting.

## References [#references]

1. Gilbert and Lynch, "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" (2002)
2. DeCandia et al., "Dynamo: Amazon's Highly Available Key-Value Store" (SOSP 2007)
3. Ongaro and Ousterhout, "In Search of an Understandable Consensus Algorithm" (USENIX ATC 2014)
4. Corbett et al., "Spanner: Google's Globally Distributed Database" (OSDI 2012)
5. Bailis et al., "Highly Available Transactions: Virtues and Limitations" (VLDB 2014)
