The bad The good The ugly Messages References
The CAP theorem
The bad, the good and the ugly
Michael Pfeiffer
Advanced Networking Technologies
FG Telematik/Rechnernetze
TU Ilmenau
2017-05-15
1 The bad: The CAP theorem’s proof
2 The good: A different perspective
3 The ugly: CAP and SDN
Section 1
The bad: The CAP theorem’s proof
The CAP theorem
Central proposition
In a distributed system, it is impossible to provide
• Consistency,
• Availability, and
• Partition tolerance
all at once, i.e. at least one of them has to be sacrificed.
• Suggested by Brewer in 1999/2000, proven by Gilbert and Lynch in 2002 [1]
• In many networks, the absence of partitions cannot be guaranteed (firmware bugs, administrative errors, . . . )
→ choice between CP and AP
Formal model
Network partition
All messages between nodes in different components are lost.
Availability: Available data objects
• Every request received by a non-failing node must result in a response.
• No time boundary, but a network partition can last ‘forever’, thus a strong availability requirement.
Consistency: Atomic data objects
• ∃ total order on all operations such that each operation looks as if it were completed at a single instant.
• Equivalent: Requests must act as if they were processed on a single node, one at a time.
Proof
Proof by contradiction. Assume there is a CAP system:
[Figure: replicas G1 and G2, connected by link E (now partitioned); clients C1 and C2.
1. C1 writes x ← 42 to G1; 2. G1 answers ‘success!’; 3. C2 asks G2 for x; 4. G2 must answer (availability) but cannot know the new value (partition): inconsistency.]
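The proof scenario can also be sketched in a few lines of illustrative Python (all names are invented for this sketch, not from any real system): two replicas that keep answering during a partition necessarily diverge.

```python
# Sketch of the Gilbert/Lynch proof scenario: replicas G1 and G2 that
# stay available while partitioned cannot stay consistent.

class Replica:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.peers = []        # replicas reachable over the network

    def write(self, key, value):
        self.store[key] = value
        for peer in self.peers:     # replicate to reachable peers only
            peer.store[key] = value
        return "success"            # availability: always answer

    def read(self, key):
        return self.store.get(key)  # availability: always answer

g1, g2 = Replica("G1"), Replica("G2")
g1.peers, g2.peers = [g2], [g1]

g1.peers, g2.peers = [], []   # partition: messages between G1 and G2 are lost

g1.write("x", 42)             # 1./2.: C1 writes to G1, gets "success"
print(g2.read("x"))           # 3./4.: C2 reads from G2 -> None, not 42
```

Both nodes answered every request, so the system was ‘available’, yet the read on G2 contradicts the acknowledged write on G1.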
Classical strategies for CP and AP
CP systems
• Delay the acknowledgement of a write operation until the new value has been propagated to all nodes
• Examples:
  • Relational database with synchronous replication
  • 2PC (two-phase commit protocol)

AP systems

• Answer with the (possibly stale) last known value
• Examples:
  • Slave DNS servers
  • NoSQL databases
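A toy sketch of the two strategies (the store classes are made up for illustration, not a real database API): the CP write refuses to acknowledge unless every replica was reached, while the AP read always answers with whatever local value it has.

```python
# CP: block the write until all replicas acknowledged; fail during a partition.
class CPStore:
    def __init__(self, replicas):
        self.replicas = replicas

    def write(self, key, value):
        acks = 0
        for r in self.replicas:
            if r.get("reachable", True):
                r[key] = value
                acks += 1
        if acks < len(self.replicas):   # some replica was unreachable
            raise TimeoutError("partition: cannot reach all replicas")
        return "committed"

# AP: answer immediately with the last known (possibly stale) value.
class APStore:
    def __init__(self, replica):
        self.replica = replica

    def read(self, key):
        return self.replica.get(key)    # possibly stale, but always answers

replicas = [{"reachable": True}, {"reachable": False}]  # one side partitioned
cp = CPStore(replicas)
try:
    cp.write("x", 42)
except TimeoutError as e:
    print(e)                  # CP sacrifices availability during the partition

ap = APStore({"x": 41})       # stale copy from before the partition
print(ap.read("x"))           # AP answers, but the value may be outdated
```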
Section 2
The good: A different perspective
A different perspective (by Brewer [2])
The partition decision
If a partition occurs during the processing of an operation, each node can decide to
• cancel the operation (favour C over A), or
• proceed, but risk inconsistencies (favour A over C).

But: It is possible to decide differently every time, based on the circumstances.

This means:
• No partition → No problem
• But during a partition, all systems must decide eventually
• Permanently retrying is in fact a choice for C over A
Mitigation strategies
• Generally: To keep consistency, some operations must be forbidden during a partition
• Others are okay (e.g. read queries)
• Often: Guarantee consistency only to a certain degree
• Example: Read-your-own-writes consistency
  • Facebook: A user’s timeline is stored at a master copy and cached at slaves
  • Usually users see (potentially stale) copies at slaves
  • But when they post something, their reads are redirected to the respective master for a certain time
• Different strategies are possible on different levels, e.g. inside a single site and between sites (latency!)
• Often: Progress is possible in one component; multiple consensus algorithms available (e.g. dynamic voting)
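A minimal sketch of read-your-own-writes routing as in the Facebook example, assuming a hypothetical master/slave pair and a fixed sticky window (all names and timings are invented for illustration):

```python
import time

MASTER = {"timeline": ["old post"]}
SLAVE = {"timeline": ["old post"]}    # replicated asynchronously, may lag
STICKY_SECONDS = 60.0
recent_writers = {}                   # user -> time of their last write

def post(user, entry):
    MASTER["timeline"].append(entry)  # writes always go to the master
    recent_writers[user] = time.time()

def read_timeline(user):
    wrote_at = recent_writers.get(user, 0.0)
    if time.time() - wrote_at < STICKY_SECONDS:
        return MASTER["timeline"]     # read-your-own-writes: use the master
    return SLAVE["timeline"]          # others may see a stale copy

post("alice", "new post")
print(read_timeline("alice"))   # alice sees her own write immediately
print(read_timeline("bob"))     # bob may still see the stale slave copy
```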
Partition recovery
What if we still want to continue service during partition?
1 Detect partition
2 Enter a special partition mode
3 Continue service
4 After partition: Recovery
The small problem: Partition detection
• Nodes can disagree whether a partition exists
• Consensus about the partition state is not possible
• Nodes may enter the partition mode at different times
• A distributed commit protocol is required (2PC, Paxos, . . . )
The big problem: Partition recovery
A (very) simple example:
• Users register on a web site
• Every user is assigned a unique ID (SQL: serial, auto_increment)
• During partition: The same ID might be assigned twice
• Recovery: Recreate the uniqueness of IDs

Partition recovery: It’s about invariants

• In a consistent system, invariants are guaranteed
  • Even when the system’s designer does not know them
• In an available system, invariants must be explicitly restored after a partition
  • The system’s designer must know the invariants and how to restore them
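The recovery step for the user-ID example could look roughly like the following sketch (the merge function is hypothetical, one of many possible repair policies):

```python
def merge_users(side_a, side_b):
    """Merge two (id, name) lists from both sides of a healed partition,
    renumbering colliding IDs to restore the uniqueness invariant."""
    merged = list(side_a)
    used_ids = {uid for uid, _ in side_a}
    next_id = max(used_ids, default=0) + 1
    for uid, name in side_b:
        if uid in used_ids:      # invariant was violated during the partition
            uid = next_id        # explicit repair: assign a fresh ID
            next_id += 1
        used_ids.add(uid)
        merged.append((uid, name))
    return merged

# During the partition, both sides assigned ID 3 independently:
side_a = [(1, "ann"), (2, "bob"), (3, "cat")]
side_b = [(3, "dan")]
print(merge_users(side_a, side_b))   # dan is moved to a fresh unique ID
```

Note that even this trivial repair changes user-visible data (dan’s ID), which is exactly why the designer must know the invariant and decide on a policy.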
CRDTs
• Commutative/Conflict-free Replicated Data Types (CRDTs) are data types that provably converge
• Example: Google Docs serialises edits into a series of insert and delete operations

  Original: On Monday, the ANT lecture is at 13:00.
  Edit 1 (Monday → Thursday): On Thursday, the ANT lecture is at 13:00.
  Edit 2 (13:00 → 17:00): On Monday, the ANT lecture is at 17:00.
  Merged: On Thursday, the ANT lecture is at 17:00.

→ Application-specific invariants are not ensured automatically
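As a concrete instance of a provably convergent type, here is a minimal G-Counter, a standard CRDT (this particular type is an added illustration, not from the slides): each replica only increments its own slot, and merging takes the per-slot maximum, so replicas converge regardless of the order in which they exchange state.

```python
class GCounter:
    def __init__(self, replica_id, n_replicas):
        self.id = replica_id
        self.slots = [0] * n_replicas

    def increment(self):
        self.slots[self.id] += 1      # only ever touch our own slot

    def merge(self, other):
        # element-wise maximum: commutative, associative, idempotent
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]

    def value(self):
        return sum(self.slots)

a, b = GCounter(0, 2), GCounter(1, 2)
a.increment(); a.increment()   # two increments on replica A
b.increment()                  # one increment on replica B, partitioned away
a.merge(b); b.merge(a)         # after the partition heals
print(a.value(), b.value())    # both converge to 3
```

Convergence is guaranteed, but as the Google Docs example shows, converging to *some* common state says nothing about application-level invariants.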
More on partition recovery
• Recovery is tedious and error-prone
• Brewer: Similar to going from single-threaded to multi-threaded programming
• Sometimes the only possibility: Ask the user (e.g. git merge)
• Balance between availability and consistency:
  • ATMs: When partitioned, limit withdrawals to an amount X
  • Invariant: No more withdrawn than allowed
  • Manual correction afterwards
• Usual tools:
  • Version vectors (vector clocks)
  • Logging, replay and rollback
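The ATM compromise can be sketched as follows (the limit and balances are made up): the machine stays available during a partition but caps withdrawals, so any violation of the balance invariant is bounded until reconciliation.

```python
PARTITION_LIMIT = 200          # the "amount X" from the slide, chosen here

def withdraw(balance, amount, partitioned):
    """Return (new_balance, message) for a withdrawal attempt."""
    if partitioned and amount > PARTITION_LIMIT:
        return balance, "denied: partition limit"   # bound the damage
    if amount > balance:
        return balance, "denied: insufficient funds"
    return balance - amount, "dispensed"

balance = 1000
balance, msg = withdraw(balance, 500, partitioned=True)
print(msg)                      # large withdrawal refused while partitioned
balance, msg = withdraw(balance, 150, partitioned=True)
print(msg, balance)             # small withdrawal still served: available
```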
Section 3
The ugly: CAP and SDN
SDN and CAP
• So far, we have talked about distributed systems on the application layer (databases, web services, . . . )
• SDN is much more basic (layer 2/3)
• Network functionality is essential → pure CP is not really an option
• AP means partition recovery is required
SDN and partition recovery
• Possible without the network up and running?
• Beware of dependency loops. . .
• Is falling back to non-SDN networking possible?
• Even if SDN has been used to replace features like VLANs?
• Relying on user input is rather unrealistic. . .
• Is it possible to figure out all the invariants?
• Most SDN publications ignore the issue. . .
• BGP does not stabilise in all cases [3]. . .
Wrapping up
1 The CAP theorem is proven and holds.
2 Do not think about CP or AP systems, but about the partition decision.
3 Many possibilities to fine-tune the balance between consistency and availability, and to recover from partitions.
4 But systems tend to become very complex.
5 Can we stomach this amount of complexity for building services as basic as network connectivity?
[1] Seth Gilbert and Nancy Lynch. ‘Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services’. In: ACM SIGACT News 33 (2 June 2002), pp. 51–59. DOI: 10.1145/564585.564601.
[2] Eric Brewer. ‘CAP twelve years later: How the “rules” have changed’. In: Computer 45 (2 Feb. 2012), pp. 23–29. DOI: 10.1109/MC.2012.37.
[3] Timothy G. Griffin and Gordon Wilfong. ‘An analysis of BGP convergence properties’. In: ACM SIGCOMM Computer Communication Review 29 (4 Oct. 1999), pp. 277–288. DOI: 10.1145/316194.316231.