Consistency without consensus: Linearizable Resilient Data Types (LRDT)

Kaushik Rajan, Sagar Chordia, Kapil Vaswani, Ganesan Ramalingam, Sriram Rajamani

Consistency & consensus

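[Figure: processes issue Add(The Hobbit), Add(Kindle) and GetCart() operations on a shared cart]

• Consensus: processes agree on ordering of operations
• No deterministic algorithm solves consensus in an asynchronous system in the presence of failures [FLP]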

Commuting updates
• What if all update operations commute?
– Ordering of updates doesn’t matter!
– Eventual consistency reduces to eventual message delivery
– Single round trip latency
• What if we desire linearizability?
– Updates don’t commute with arbitrary reads
– Reads must be consistently ordered with updates
– Semantics of queries like the current top(k) elements are well understood
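The first point above can be illustrated with a minimal sketch. It assumes a cart whose only update is Add, so every pair of updates commutes; the helper name `deliver` and the item list are illustrative only.

```python
from itertools import permutations

# Assumption: the only update operation is Add(item), which commutes with itself.
updates = ["The Hobbit", "Kindle", "Lumia"]

def deliver(order):
    """Apply the Add updates to an empty cart in the given delivery order."""
    cart = set()
    for item in order:
        cart.add(item)
    return frozenset(cart)

# Every delivery order leads to the same final cart, so replicas converge
# as soon as all messages have been delivered.
final_states = {deliver(order) for order in permutations(updates)}
```

Because `final_states` contains a single element, the order in which updates reach a replica does not matter. What this sketch does not cover is a read issued while updates are still in flight, which is the linearizability question raised in the second point.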

Commuting updates

[Figure: concurrent Add(The Hobbit) and Add(Kindle); one GetCart() returns {} and another GetCart() returns {The Hobbit, Kindle}]

• Reads must observe comparable sets of operations

Linearizable resilient data types

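[Figure: data types classified as Possible, Impossible or Don’t know]

P1 : commutes(s,op1,op2)
[Figure: from state S, applying op1 then op2 and applying op2 then op1 lead to the same state S’]

P2 : nullify(s,op1,op2)
[Figure: state diagrams over S, S1 and S2 with transitions labelled op1 and op2]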

Examples
• Read write register : every pair of writes nullify
• Read write memory : writes to the same location nullify, writes to different locations commute
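The read write memory example can be checked mechanically. The sketch below is not taken from the slides: it assumes that two operations nullify at a state when applying one after the other gives the same state as applying it alone (one overwrites the other), and it models memory as a Python dict. The helper `write` is hypothetical.

```python
def commutes(s, op1, op2):
    """P1: both application orders lead to the same state."""
    return op2(op1(s)) == op1(op2(s))

def nullify(s, op1, op2):
    """P2 (assumed reading): one operation overwrites the effect of the other."""
    return op1(op2(s)) == op1(s) or op2(op1(s)) == op2(s)

def write(loc, val):
    """Hypothetical memory write: returns a new memory with loc set to val."""
    return lambda mem: {**mem, loc: val}

mem = {"x": 0, "y": 0}

# Writes to the same location: the later write overwrites the earlier one.
same_location_nullify = nullify(mem, write("x", 1), write("x", 2))
same_location_commute = commutes(mem, write("x", 1), write("x", 2))

# Writes to different locations: the order does not matter.
different_locations_commute = commutes(mem, write("x", 1), write("y", 2))
```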

Examples
• Set : add, remove and read the whole set
– Add(u), Remove(v) commute (for u ≠ v)
– Add(u), Remove(u) nullify
– Add(*), Add(*) commute
– Remove(*), Remove(*) commute
• Counter : IncrBy(x), DecrBy(x), SetTo(v), Read()
– SetTo(v) nullifies all other operations
– Other pairs of updates commute
• Other examples : heaps, union-find, atomic snapshot objects…
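The set and counter classifications above can be reproduced with a small sketch. It assumes, as a working definition not spelled out on the slide, that a pair nullifies when one operation overwrites the effect of the other. The function `classify` and the operation constructors are illustrative names.

```python
def classify(s, op1, op2):
    """Classify a pair of updates at state s as commute, nullify or neither."""
    if op2(op1(s)) == op1(op2(s)):
        return "commute"
    # Assumed meaning of nullify: one operation overwrites the other.
    if op1(op2(s)) == op1(s) or op2(op1(s)) == op2(s):
        return "nullify"
    return "neither"

# Set operations on an immutable set.
def add(u):
    return lambda s: s | {u}

def remove(u):
    return lambda s: s - {u}

# Counter operations on an integer.
def incr_by(x):
    return lambda c: c + x

def decr_by(x):
    return lambda c: c - x

def set_to(v):
    return lambda c: v

cart = frozenset({"u"})
set_results = {
    "Add(u), Remove(v)": classify(cart, add("u"), remove("v")),
    "Add(u), Remove(u)": classify(cart, add("u"), remove("u")),
    "Add(u), Add(v)": classify(cart, add("u"), add("v")),
    "Remove(u), Remove(v)": classify(cart, remove("u"), remove("v")),
}
counter_results = {
    "SetTo(5), IncrBy(2)": classify(0, set_to(5), incr_by(2)),
    "SetTo(5), SetTo(7)": classify(0, set_to(5), set_to(7)),
    "IncrBy(2), DecrBy(3)": classify(0, incr_by(2), decr_by(3)),
}
```

The check is performed at one state only; the slide's claims are about all states, so this is a spot check rather than a proof.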

Lattice agreement
• Consistency reduces to lattice agreement
– Weaker problem than consensus
– Solvable in an asynchronous distributed system
• Assumptions
– t < n/2 failures
– Eventual message delivery

Lattice agreement
• n processes, each process starts with a value belonging to a join semilattice
• Each non-faulty process outputs a value
– (Validity) Each process’ output is a join of one or more input values including its own
– (Consistency) Any two output values are comparable
– (Liveness) Every correct process eventually outputs a value
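The validity and consistency conditions above can be written as a checker. This is a minimal sketch that assumes the power-set lattice, where join is set union and the order is set inclusion. The function names and the process labels are illustrative, and liveness is not checked because it is not a property of a single finished run.

```python
from functools import reduce

def join(values):
    """Join in the power-set lattice: the union of all given sets."""
    return reduce(frozenset.union, values, frozenset())

def satisfies_lattice_agreement(inputs, outputs):
    """Check validity and consistency for one run (inputs/outputs keyed by process)."""
    for p, out in outputs.items():
        # Validity: the output contains the process' own input and equals
        # the join of the input values that lie below it.
        below = [v for v in inputs.values() if v <= out]
        if not (inputs[p] <= out and join(below) == out):
            return False
    # Consistency: any two outputs are comparable.
    outs = list(outputs.values())
    return all(x <= y or y <= x for x in outs for y in outs)

inputs = {"p1": frozenset("a"), "p2": frozenset("b"), "p3": frozenset("c")}

chain = {"p1": frozenset("a"), "p2": frozenset("ab"), "p3": frozenset("abc")}
incomparable = {"p1": frozenset("ab"), "p2": frozenset("bc"), "p3": frozenset("abc")}

chain_ok = satisfies_lattice_agreement(inputs, chain)
incomparable_ok = satisfies_lattice_agreement(inputs, incomparable)
```

The second run is rejected because {a, b} and {b, c} are not comparable, even though each output is a valid join.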

Lattice agreement

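[Figure: power-set lattice over {a, b, c}, from {} at the bottom through {a}, {b}, {c} and {a, b}, {b, c}, {a, c} to {a, b, c} at the top, annotated with processes p1, p2 and p3]

a = Add(The Hobbit)
b = Add(Kindle)
c = Add(Lumia)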

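[Flowchart: lattice agreement algorithm]

PROPOSERS
1. Send $v_i$ to all acceptors
2. Wait for majority of acceptors to respond
3. All Acks?
– Y : Output $v_i$
– N : $v_i \leftarrow \bigvee_{\forall Nack(a_j)} a_j$, then send to all acceptors again

ACCEPTORS
• Initially: [initial value not recoverable from the slide]
• On receiving $v_j$, test $a_i \le v_j$
– Y : $a_i = a_i \vee v_j$, send Ack
– N : $a_i = a_i \vee v_j$, send Nack($a_i$)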
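The proposer and acceptor steps can be exercised in a small single-process simulation. This is a sketch under stated assumptions, not the authors' implementation: values are sets in the power-set lattice, acceptors start from the empty set, no process fails, proposals are delivered in a random order, and each reply reaches the proposer immediately. The function `run_lattice_agreement` and its parameters are hypothetical names.

```python
import random

def run_lattice_agreement(inputs, n_acceptors=3, seed=0):
    """Simulate one lattice agreement instance; returns each proposer's output."""
    rng = random.Random(seed)
    majority = n_acceptors // 2 + 1
    accepted = [frozenset() for _ in range(n_acceptors)]  # assumed initial value
    value = {p: frozenset(v) for p, v in inputs.items()}
    round_no = {p: 0 for p in inputs}
    replies = {p: [] for p in inputs}
    output = {}
    # In-flight proposals: (proposer, round, acceptor index, proposed value).
    network = [(p, 0, j, value[p]) for p in inputs for j in range(n_acceptors)]
    while network:
        p, r, j, v = network.pop(rng.randrange(len(network)))
        # Acceptor j: Ack if its value is below the proposal, then join.
        ack = accepted[j] <= v
        accepted[j] = accepted[j] | v
        if p in output or r != round_no[p]:
            continue  # reply to a finished proposer or an old round is ignored
        replies[p].append((ack, accepted[j]))
        if len(replies[p]) < majority:
            continue  # proposer keeps waiting for a majority
        if all(a for a, _ in replies[p]):
            output[p] = value[p]  # all Acks: output the proposed value
        else:
            # Some Nacks: join the returned values and propose again.
            for a, acceptor_value in replies[p]:
                if not a:
                    value[p] = value[p] | acceptor_value
            round_no[p] += 1
            replies[p] = []
            network += [(p, round_no[p], k, value[p]) for k in range(n_acceptors)]
    return output

inputs = {"p1": {"a"}, "p2": {"b"}, "p3": {"c"}}
outputs = run_lattice_agreement(inputs, seed=1)
comparable = all(x <= y or y <= x
                 for x in outputs.values() for y in outputs.values())
```

Different seeds give different interleavings and possibly different outputs, but the outputs of any run should form a chain and each should contain the proposer's own input.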

Safety and liveness
• Safety always guaranteed
• Lattice agreement is t-resilient
– Liveness guaranteed if quorum of processes are non-faulty and communication is reliable
– Processes output value in at most n round trips, where n is the number of processes

Generalized lattice agreement
• Generalization of lattice agreement
– Processes receive sequence of values
– Values belong to an infinite lattice
• Processes output a sequence of values
– (Validity) Every output value is a join of some received values
– (Consistency) Any two output values are comparable (i.e. output values form a chain)
– (Liveness) Every value received by a correct process is eventually included in an output value

GLA algorithm
• Liveness (t-resilient)
– Every received value is eventually included in some output in n round trips
– Adaptive, complexity depends on contention
• Fast path
– Received values output in one round trip
• Reconfigurable
– Replicas can be added/removed dynamically

From GLA to linearizability
• Update commands form power set lattice
• Updates return once majority of processes have learnt a command set that includes the update command
• Read performed by (ABD style algorithm)
1. Reading the learnt command set from a quorum of processes
2. Writing back the largest among these to a quorum
3. Constructing state corresponding to the largest command set by exploiting commutativity and nullification
• Multi-master replication
– Does not require a single primary/leader
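The three read steps can be sketched for the simplest case. The sketch assumes in-memory replicas, a quorum given as a list of replica indices, learnt command sets that already form a chain, and a cart with commuting Add commands only. How nullifying pairs are resolved when building the state is not shown on the slide and is left out here. The names `abd_style_read` and `apply_commands` are hypothetical.

```python
def apply_commands(commands):
    """Build the cart state from a command set; Adds commute, so any order works."""
    cart = set()
    for kind, item in sorted(commands):
        if kind == "Add":
            cart.add(item)
    return cart

def abd_style_read(replicas, quorum):
    """Read from a quorum, write back the largest learnt set, return the state."""
    # Step 1: read the learnt command set from each replica in the quorum.
    learnt = [replicas[i] for i in quorum]
    # Learnt sets are assumed to form a chain, so the biggest set is the top.
    largest = max(learnt, key=len)
    # Step 2: write the largest set back to the quorum.
    for i in quorum:
        replicas[i] = replicas[i] | largest
    # Step 3: construct the state for the largest command set.
    return apply_commands(largest)

replicas = [
    frozenset({("Add", "The Hobbit")}),
    frozenset({("Add", "The Hobbit"), ("Add", "Kindle")}),
    frozenset(),
]
cart = abd_style_read(replicas, quorum=[0, 1])
```

After the read, both replicas in the quorum hold the larger command set, so a later read through any overlapping quorum cannot observe a smaller one.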

Impossibility

• Consensus reduction
• Pair of idempotent update operations that neither commute nor nullify at some state S0

Consensus(b)
    if (b) then op1 else op2
    s = read()
    if (s = S1 or s = S12) return true
    else return false

[Figure: the initial state Si reaches S0 via Op*; from S0, op1 leads to S1 and then op2 leads to S12; from S0, op2 leads to S2 and then op1 leads to S21]
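The reduction can be illustrated by enumerating interleavings. The sketch below assumes a hypothetical data type, a sequence with an "append once" update, whose two updates are idempotent and neither commute nor nullify at the empty sequence. It also models the shared object as a single variable, which stands in for a linearizable implementation. Process P proposes true and runs op1, process Q proposes false and runs op2.

```python
from itertools import permutations

def append_once(x):
    """Hypothetical idempotent update: append x unless it is already present."""
    return lambda s: s if x in s else s + (x,)

op1, op2 = append_once(1), append_once(2)
S0 = ()
S1, S12 = op1(S0), op2(op1(S0))

def run(schedule):
    """Run Consensus(b) for P (b = true) and Q (b = false) under one interleaving."""
    state = S0
    decision = {}
    pending = {"P": ["update", "read"], "Q": ["update", "read"]}
    for proc in schedule:
        step = pending[proc].pop(0)
        if step == "update":
            state = op1(state) if proc == "P" else op2(state)
        else:
            # s = read(); return true iff op1 was applied first.
            decision[proc] = state in (S1, S12)
    return decision

# All interleavings of the two steps of P with the two steps of Q.
schedules = sorted(set(permutations("PPQQ")))
decisions = [run(s) for s in schedules]
agreement = all(d["P"] == d["Q"] for d in decisions)
```

In every interleaving both processes return the same value, and that value is the input of whichever process updated first. This is why a linearizable, resilient implementation of such a data type would solve consensus.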

Implications for designing ADTs

Most commands commute

Implications for designing ADTs

• […] neither commute nor nullify at […]

The Gap : Open problems
Doubly saturating counter

[Figure: state machine with states 0, 1, 2, …, n; Incr() moves one state up and Decr() moves one state down; Decr() at 0 and Incr() at n leave the state unchanged]

• Incr() and Decr() commute at 1 … n-1
• Incr() and Decr() nullify at 0 and n
• Don’t know if this is possible or impossible
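The per-state behaviour of the doubly saturating counter can be tabulated with a short sketch. The bound N = 4 is an arbitrary choice for illustration, and nullify is taken to mean that one operation overwrites the effect of the other, which is an assumed reading of the slides.

```python
N = 4  # illustrative upper bound; the slide uses a generic n

def incr(s):
    """Increment, saturating at N."""
    return min(s + 1, N)

def decr(s):
    """Decrement, saturating at 0."""
    return max(s - 1, 0)

def classify(s):
    """Classify the pair Incr(), Decr() at state s."""
    if incr(decr(s)) == decr(incr(s)):
        return "commute"
    # Assumed meaning of nullify: one operation overwrites the other.
    if incr(decr(s)) == incr(s) or decr(incr(s)) == decr(s):
        return "nullify"
    return "neither"

table = {s: classify(s) for s in range(N + 1)}
```

The pair commutes at the interior states and nullifies at the two boundaries, so no state falls into the "neither" case. The sketch only confirms these per-state properties; it says nothing about whether a linearizable resilient implementation exists.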

Summary

• Possible : graph, RW mem…
• ?? : Saturating counter
• Impossible : queues, sequences
