Distributed Computing 8. Impossibility of consensus Shmuel Zaks zaks@cs.technion.ac.il ©


Consensus

Input: each processor receives 0 or 1.

Output:
- Agreement: all processors decide the same value, 0 or 1
- Termination: all processors eventually decide
- Validity: if all inputs are x, then the decision is x

The result: No completely asynchronous consensus protocol can tolerate even a single unannounced process death.

E.g., the stopping of a single process at an inappropriate time can cause any distributed commit protocol to fail to reach agreement.

Motivation

This problem plays a role similar to that of “the halting problem” in computability theory.

Many problems are equivalent to consensus (or reduce to it).

Protocols in the industry

How do commit protocols in the industry deal with this outcome? They weaken an assumption. For example:
- Computation model: e.g., assume a bounded-delay network
- Fault model: e.g., assume faults occur only at the start

The Model

Message system: reliable
- Delivers all messages correctly
- Delivers each message exactly once

Processing model: completely asynchronous
- No assumptions about relative speeds
- Unbounded time in delivering a message

Weak Consensus

Every process starts with initial value in {0,1}

A nonfaulty process decides on a value in {0,1} by entering an appropriate decision state.

All nonfaulty processes that decide are required to choose the same value (note: termination is not required).

Both 0 and 1 are possible decision values, although perhaps for different initial configurations.

(Trivial solutions, e.g., always deciding “0”, are ruled out.)

System Model

Processes communicate by means of one global message buffer.

Atomic step:
- Attempt to receive a message
- Perform local computation
- Send an arbitrary but finite set of messages

Consensus Protocol

N processes (N > 1). Each process has:
- xp: a one-bit input register
- yp: an output register with values in {b,0,1}
- An unbounded amount of internal storage
- PC: a program counter

[Figure: process p, with input register xp (0/1), output register yp (0/1/b), unbounded memory, and a program counter.]

- The memory starts with fixed values (except for the input register).
- The output register starts with b.
- The output register is “write once”: when a value is written to the output register, the process is “in a decision state”.
- A process acts deterministically according to a transition function.

Communication System

A message is a pair (p,m):
- p is the name of the destination process
- m is a “message value”

The message buffer maintains the messages that have been sent but not yet delivered.

We assume a clique topology.

A process can perform two operations:
- send(p,m): place (p,m) in the message buffer (“message (p,m) is sent to process p”).
- receive(p): either delete some message (p,m) from the message buffer and return m (“message (p,m) is received”), or return the null marker ∅ and leave the message buffer unchanged.

The message system is nondeterministic. For fairness, we will require in the proof: if receive(p) is performed infinitely many times, then each message (p,m) in the message buffer is eventually delivered.

Without this requirement, the message system can return ∅ an infinite number of times in response to receive(p), even though a message (p,m) is in the message buffer.
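The two buffer operations can be sketched in Python as follows. This is only an illustration under stated assumptions: the class name `MessageBuffer`, the `NULL` stand-in for the null marker, the seeded random generator, and the 50% chance of returning `NULL` are all hypothetical choices, not part of the model. Random choice only makes delivery very likely; it does not by itself enforce the fairness requirement.

```python
import random

NULL = None  # stands in for the null marker returned when nothing is delivered


class MessageBuffer:
    """Multiset of (destination, value) pairs that were sent but not yet delivered."""

    def __init__(self, seed=0):
        self.pending = []                # messages in the order they were sent
        self.rng = random.Random(seed)   # assumed source of nondeterminism

    def send(self, p, m):
        """Place (p, m) in the buffer."""
        self.pending.append((p, m))

    def receive(self, p):
        """Either deliver some message addressed to p, or return NULL."""
        candidates = [i for i, (dest, _) in enumerate(self.pending) if dest == p]
        if not candidates or self.rng.random() < 0.5:
            return NULL                  # the buffer is left unchanged
        _, m = self.pending.pop(self.rng.choice(candidates))
        return m


buf = MessageBuffer(seed=1)
buf.send(1, "M")
buf.send(0, "M'")

delivered = NULL
while delivered is NULL:                 # process 0 keeps calling receive(0)
    delivered = buf.receive(0)
```

Process 0 may see `NULL` several times before its message arrives, which is exactly why a receiver cannot tell a slow message from one that was never sent.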


[Figure: three processes (Process 0, Process 1, Process 2) and a message buffer holding (P1,M), (P0,M’), (P2,M’’), (P1,M’’’). Process 0 performs receive(0) and gets (P0,M’). Process 1 then performs receive(1) and send(2,m), which places (P2,m) in the buffer.]

Configurations

A configuration consists of:
- The internal state of each process
- The contents of the message buffer

Initial configuration:
- Each process p starts with xp = 0 or xp = 1
- The message buffer is empty

A step is a primitive step by a single process p:
- Phase 1: receive(p) is performed (p receives m)
- Phase 2: p enters a new internal state and sends a finite set of messages

A step is completely determined by the pair e = (p,m), called an event (“receipt of m by p”).

Event and step: the event is the syntax; the step is the semantics.

Events and Schedules

e(C) denotes the configuration that results from applying event e to configuration C (“e can be applied to C”).

The event (p,∅) can always be applied.

A schedule from C is a finite or infinite sequence σ of events that can be applied, in turn, starting from C.

The associated sequence of steps is called a run.
- one: event, step
- many: schedule, run

If a schedule σ is finite, σ(C) denotes the resulting configuration C’, which is “reachable from C”.

C’ is accessible if it is reachable from an initial configuration.
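To make the notation concrete, here is a minimal Python sketch of configurations, events, and finite schedules. The three-process `transition` function is a made-up toy protocol (count receipts and forward the message to the next process), and the names `applicable`, `apply_event`, and `apply_schedule` are hypothetical. The starting configuration below is not an initial configuration, since its buffer is not empty.

```python
from collections import Counter

NULL = None  # stands in for the null marker


def transition(p, state, m):
    """Hypothetical deterministic protocol: count receipts, forward the message."""
    if m is NULL:
        return state, []                     # nothing received: no change, no sends
    return state + 1, [((p + 1) % 3, m)]     # record the receipt, pass m on


def applicable(config, event):
    """(p, NULL) is always applicable; (p, m) needs (p, m) in the buffer."""
    _, buffer = config
    p, m = event
    return m is NULL or buffer[(p, m)] > 0


def apply_event(config, event):
    """Return e(C): phase 1 removes (p, m); phase 2 updates p and sends."""
    states, buffer = config
    p, m = event
    assert applicable(config, event)
    new_buffer = Counter(buffer)
    if m is not NULL:
        new_buffer[(p, m)] -= 1
    new_state, sent = transition(p, states[p], m)
    for msg in sent:
        new_buffer[msg] += 1
    new_states = states[:p] + (new_state,) + states[p + 1:]
    return new_states, +new_buffer           # unary + drops zero counts


def apply_schedule(config, schedule):
    """Return sigma(C) for a finite schedule sigma."""
    for event in schedule:
        config = apply_event(config, event)
    return config


C = ((0, 0, 0), Counter({(0, "tok"): 1}))
sigma = [(0, "tok"), (2, NULL), (1, "tok")]
C_prime = apply_schedule(C, sigma)           # C_prime is reachable from C
```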

Lemma 1 (‘commutativity’)

Lemma 1: Suppose that from some configuration C, the schedules σ1, σ2 lead to configurations C1, C2, respectively.

If the sets of processes taking steps in σ1 and σ2, respectively, are disjoint, then σ2 can be applied to C1, and σ1 can be applied to C2, and both lead to the same configuration C3.

Proof:
- When σ1 and σ2 each contain a single event, (P1,M1) and (P2,M2): the claim holds.
- When σ1 and σ2 are arbitrary schedules: use induction.

[Figure: commutativity for the single events (P1,M1) and (P2,M2). P1 moves from internal state A to B, and P2 moves from X to Y. Applying the two events in either order gives the same configuration: P1 in state B, P2 in state Y, and the same message buffer. The states of all other processes are unchanged.]
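Lemma 1 can be checked on a small example. The sketch below assumes a made-up four-process protocol in which a process that receives m increments its state and forwards m to process p + 2; the names `step` and `run` are hypothetical. It illustrates the lemma on one instance and is not a proof.

```python
from collections import Counter


def step(config, event):
    """Apply event (p, m): p consumes m, increments its state, sends m to p + 2."""
    states, buffer = config
    p, m = event
    assert buffer[(p, m)] > 0, "event not applicable"
    buffer = Counter(buffer)
    buffer[(p, m)] -= 1
    buffer[((p + 2) % 4, m)] += 1
    states = states[:p] + (states[p] + 1,) + states[p + 1:]
    return states, +buffer                   # unary + drops zero counts


def run(config, schedule):
    """Apply a finite schedule event by event."""
    for event in schedule:
        config = step(config, event)
    return config


C = ((0, 0, 0, 0), Counter({(0, "a"): 1, (0, "b"): 1, (1, "c"): 1}))
sigma1 = [(0, "a"), (0, "b")]                # only process 0 takes steps
sigma2 = [(1, "c")]                          # only process 1 takes steps

C1 = run(C, sigma1)
C2 = run(C, sigma2)
C3_via_C1 = run(C1, sigma2)                  # sigma2 applied to C1
C3_via_C2 = run(C2, sigma1)                  # sigma1 applied to C2
```

Both orders end in the same configuration because each event touches only the state of its own process and adds to or removes from the shared buffer messages that the other schedule does not consume.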

A configuration C has a decision value v if some process p is in a decision state with yp = v (v = 0 or v = 1).

A consensus protocol is partially correct if it satisfies two conditions:

1. No accessible configuration has more than one decision value.

2. For each v ∈ {0,1}, some accessible configuration has decision value v.

Good news:
- it is nontrivial
- sometimes it decides
- it never decides incorrectly

Bad news:
- termination is not guaranteed
- what about delivering all messages?
- what about failures?

A process p is nonfaulty in a run if it takes infinitely many steps. It is faulty otherwise.

Bad news: a process can be declared faulty only at infinity.

A run is admissible if:
- at most one process is faulty, and
- all messages sent to nonfaulty processes are eventually received.

A run is deciding if some process reaches a decision state.

A consensus protocol is totally correct in spite of one fault if:
- it is partially correct, and
- every admissible run is a deciding run.

Theorem: No consensus protocol is totally correct in spite of one fault.

Sketch of proof: Assume that P is totally correct in spite of one fault.
- Show an initial configuration from which each decision is still possible (Lemma 2).
- Show that from such a configuration one can always reach another similar configuration (Lemma 3).
- Conclude, by induction, with an admissible run that never decides: a contradiction.

Let C be a configuration and let V be the set of decision values of configurations reachable from C.
- C is bivalent if |V| = 2
- C is univalent if |V| = 1: if V = {0} then C is 0-valent; if V = {1} then C is 1-valent

(Note: |V| ≠ 0, since P is totally correct.)
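For intuition, valency can be computed by exhaustive search when the set of reachable configurations is finite. The sketch below assumes a hand-written finite graph: the `successors` and `decision` dictionaries and the configuration names are hypothetical. A real protocol usually has infinitely many reachable configurations, so this is only an illustration of the definition.

```python
def valency(config, successors, decision):
    """Return V: the decision values of configurations reachable from config."""
    seen, stack, values = {config}, [config], set()
    while stack:
        c = stack.pop()
        if decision.get(c) is not None:
            values.add(decision[c])
        for nxt in successors.get(c, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return values


def classify(config, successors, decision):
    """Name the valency of config according to the size and content of V."""
    v = valency(config, successors, decision)
    if v == {0, 1}:
        return "bivalent"
    if v == {0}:
        return "0-valent"
    if v == {1}:
        return "1-valent"
    return "no decision reachable"   # |V| = 0, excluded when P is totally correct


# Hypothetical toy graph: from C, one step commits the run to 0 or to 1.
successors = {"C": ["A", "B"], "A": ["A0"], "B": ["B1"]}
decision = {"A0": 0, "B1": 1}

labels = {c: classify(c, successors, decision) for c in ["C", "A", "B"]}
```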

Theorem: No consensus protocol is totally correct in spite of one fault.

Proof: Assume that P is totally correct in spite of one fault. We will reach a contradiction.

[Legend for the figures from now on: 0-valent configuration; unknown.]

Lemma 2: P has a bivalent initial configuration.

Proof: Assume there is no bivalent initial configuration. But P is partially correct. So there are both 0-valent and 1-valent initial configurations.

[Figure: the initial configurations, with 0-valent and 1-valent configurations marked.]

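Two initial configurations are called adjacent if they differ only in the initial value of a single process. For example, these two configurations differ only in x4:

x0 x1 x2 x3 x4
 0  1  0  1  1
 0  1  0  1  0

Claim: There exists a 0-valent initial configuration C0 adjacent to a 1-valent initial configuration C1.

Proof by example: change one input at a time, going from a 0-valent initial configuration to a 1-valent one:

x0 x1 x2 x3 x4
 0  1  0  1  1   (0-valent)
 1  1  0  1  1
 1  1  0  1  0
 1  1  0  0  0
 1  0  0  0  0   (1-valent)

Every initial configuration is univalent by assumption, so somewhere along this chain a 0-valent configuration is followed by a 1-valent one.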

So: There exists a 0-valent initial configuration C0 adjacent to a 1-valent initial configuration C1.
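The chain argument can be sketched in Python: walk from one initial configuration to another by changing one input at a time, and stop at the first place where the valence changes. The function `toy_valence` is a hypothetical stand-in (majority of the inputs); in a real protocol the valence of an initial configuration is not given by a formula. The names `find_adjacent_pair`, `start`, and `target` are also illustrative.

```python
def find_adjacent_pair(start, target, valence):
    """Flip differing inputs one at a time; return the first adjacent pair
    of configurations whose valences differ, with the index that changed."""
    current = list(start)
    for i, bit in enumerate(target):
        if current[i] != bit:
            nxt = current.copy()
            nxt[i] = bit
            if valence(tuple(current)) != valence(tuple(nxt)):
                return tuple(current), tuple(nxt), i
            current = nxt
    return None                      # start and target have the same valence


def toy_valence(inputs):
    """Hypothetical labeling: 1 if a strict majority of the inputs is 1, else 0."""
    return 1 if 2 * sum(inputs) > len(inputs) else 0


start = (0, 0, 0, 0, 0)              # labeled 0 by toy_valence
target = (1, 1, 1, 1, 1)             # labeled 1 by toy_valence
C0, C1, p = find_adjacent_pair(start, target, toy_valence)
```

Here p is the process whose initial value distinguishes C0 from C1, which is the process that the rest of the proof lets take no steps.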

Let p be the process in whose initial value they differ.

P is a consensus protocol that is totally correct in spite of one fault.

Consider an admissible deciding run (with schedule σ) from C0 in which process p takes no steps.

σ can also be applied to C1.

The two corresponding configurations are identical, except for the internal state of p.

Both runs reach the same decision x:
- x = 1 ⇒ C0 is bivalent
- x = 0 ⇒ C1 is bivalent

Contradiction.

[Figure: σ applied to the 0-valent C0 and to the 1-valent C1; both runs reach the decision x.]

So, we proved Lemma 2: P has a bivalent initial configuration.

Lemma 3: Let
- C be a bivalent configuration of P,
- e = (p,m) be an event that is applicable to C,
- S be the set of configurations reachable from C without applying e, and
- D = e(S) = { e(E) | E ∈ S and e is applicable to E }.

Then D contains a bivalent configuration.

[Figure: the bivalent configuration C; the set S, reached from C by events ei ≠ e; and the set D = e(S).]

Need to prove: D contains a bivalent configuration.

Note: e = (p,m) is applicable to C, so message (p,m) is in the message buffer, so e is applicable to every E ∈ S.

Proof by contradiction: assume that D contains no bivalent configuration.

So: every configuration d ∈ D is 0-valent or 1-valent.

The proof has three steps.

Step 1

Claim: D contains both 0-valent and 1-valent configurations.

Proof of the claim of Step 1:

C is bivalent ⇒ there exist Ei, i = 0,1, i-valent configurations reachable from C. We show the case i = 0; the case i = 1 is symmetric.

Case 1: E0 ∈ S.
Let F0 = e(E0). Then F0 ∈ D.

Case 2: E0 ∉ S.
e was applied in reaching E0, so either E0 is in D, or there exists F0 ∈ D from which E0 is reachable.

In both cases, one of Ei and Fi is reachable from the other, and Fi is not bivalent. So: Fi is i-valent.

So, we know that D contains both 0-valent and 1-valent configurations.

[Figure: C, the set S (events ei ≠ e), and D = e(S), with 0-valent and 1-valent configurations marked.]

Step 2

Claim: There exist C0, C1 ∈ S such that:
- C0 and C1 are neighbors (C1 = e’(C0), e’ = (p’,m’))
- D0 = e(C0) is 0-valent
- D1 = e(C1) is 1-valent

(Two configurations are neighbors if one results from the other in a single step.)

[Figure: C0 and C1 = e’(C0) in S, and their images under e = (p,m) in D = e(S).]

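Proof of the claim of Step 2:

e(C) is 0-valent or 1-valent. Suppose it is 0-valent (the other case is symmetric).

There is a 1-valent configuration in D. It has a predecessor in S.

Consider the path in S from C to the predecessor of the 1-valent configuration.

Applying e to each configuration on this path, we get a configuration in D, which is 0-valent or 1-valent.

So we get two configurations C0 and C1 that are neighbors in S; i.e., there is e’ s.t. C1 = e’(C0), with D0 = e(C0) 0-valent and D1 = e(C1) 1-valent.

[Figure: the path in S from C, with e(C), D0, and D1 in D = e(S), and C1 = e’(C0).]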

So, we proved the claim: there exist C0, C1 ∈ S such that C0 and C1 are neighbors (C1 = e’(C0), e’ = (p’,m’)), D0 = e(C0) is 0-valent, and D1 = e(C1) is 1-valent.

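Step 3: get to a contradiction

Recall: e = (p,m), e’ = (p’,m’), C1 = e’(C0), D0 = e(C0) is 0-valent, and D1 = e(C1) is 1-valent.

Case 1: p’ ≠ p
D1 = e’(D0) by Lemma 1. So a 1-valent configuration is reachable from a 0-valent one: contradiction.

Case 2: p’ = p
Let σ be the schedule of a deciding run from C0 in which p takes no steps, and let A = σ(C0).
By Lemma 1, e(A) = σ(D0) and e(e’(A)) = σ(D1). So both a 0-valent and a 1-valent configuration are reachable from A.
A is reached by a deciding run, so it must be univalent. But it cannot be 0-valent and it cannot be 1-valent: a contradiction!

[Figure: C0, C1 = e’(C0), D0, and D1, with the deciding run σ applied to C0, D0, and D1.]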

So, we proved:

Lemma 3: Let
- C be a bivalent configuration of P,
- e = (p,m) be an event that is applicable to C,
- S be the set of configurations reachable from C without applying e, and
- D = e(S) = { e(E) | E ∈ S and e is applicable to E }.

Then D contains a bivalent configuration.

Any deciding run from a bivalent initial configuration goes to a univalent configuration, so there must be some single step that goes from a bivalent to a univalent configuration. We construct a run that avoids such a step.

End of proof: we construct an infinite non-deciding run.

[Figure: a deciding run, which reaches a univalent configuration, versus an infinite non-deciding run.]

Start with a bivalent initial configuration (Lemma 2).

The run is constructed in stages. Every stage starts with a bivalent configuration and ends with a bivalent configuration.

We maintain:
- A queue of processes, initially in arbitrary order
- The message buffer, ordered according to the time the messages were sent

In each stage:
- C is the bivalent configuration that the stage starts with.
- Suppose that process p heads the queue.
- Suppose that m is the earliest message to p in the message buffer, if any (or ∅ otherwise).
- Let e = (p,m).
- By Lemma 3, there is a bivalent configuration C’ reachable from C by a schedule in which e is the last event.
- After applying this schedule, move p to the back of the queue.

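[Figure: the process queue (head on the right) and the message buffer at the start of successive stages]

Queue          Message buffer
P3 P2 P1 P0    (P1,M) (P0,M) (P2,M) (P3,M)
P0 P3 P2 P1    (P1,M) (P2,M) (P3,M)
P1 P0 P3 P2    (P2,M) (P3,M)
P2 P1 P0 P3    (P3,M)
P3 P2 P1 P0    (empty)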
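The bookkeeping of the stages (which event e = (p,m) is forced at the end of each stage) can be simulated as below. This is a simplified sketch: it does not compute the schedule promised by Lemma 3, which may deliver other messages and cause new ones to be sent, and the names `pick_event` and `run_stages` are hypothetical. Here the head of the queue is the first element of the deque, and `None` stands for the null marker.

```python
from collections import deque


def pick_event(queue, buffer):
    """Return (p, m): p heads the queue, m is the earliest message to p, or None."""
    p = queue[0]
    for dest, m in buffer:                # buffer is ordered by sending time
        if dest == p:
            return p, m
    return p, None


def run_stages(processes, buffer, stages):
    """Simulate the queue and buffer bookkeeping for a number of stages."""
    queue = deque(processes)
    buffer = list(buffer)
    history = []
    for _ in range(stages):
        p, m = pick_event(queue, buffer)
        # Lemma 3 would supply a schedule ending in (p, m) that leads to a
        # bivalent configuration; only the final event is simulated here.
        if m is not None:
            buffer.remove((p, m))
        queue.rotate(-1)                  # move p to the back of the queue
        history.append((p, m))
    return history, list(queue), buffer


history, queue, remaining = run_stages(
    [0, 1, 2, 3],
    [(1, "M"), (0, "M"), (2, "M"), (3, "M")],
    4,
)
```

After four stages every process has taken a step and received its earliest message, and the queue is back in its original order, which is the pattern that repeats forever in the constructed run.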

In any infinite sequence of stages:
- every process takes infinitely many steps
- every process receives every message sent to it

Therefore, the constructed run is admissible. It never reaches a univalent configuration, so:
- The protocol never reaches a decision.
- The protocol is not totally correct in spite of one fault.

Contradiction.

Conclusion

Theorem: No consensus protocol is totally correct in spite of one fault.

Proof: Construct the run that was shown before, which is an admissible run that never reaches a univalent configuration.
- The protocol never reaches a decision.
- The protocol is not totally correct in spite of one fault.

HW: Which process fails in the infinite run that was constructed for the proof?

Main lesson: In an asynchronous system, there is no way to distinguish between a faulty process and a slow process.

Many extensions and uses

Other tasks not solvable with one faulty processor:
- Input graph: connected
- Output graph: disconnected

Extensions

1 fault → t faults

Non-asynchronous models:
- Synchronous: f+1 rounds if there are f failures
- Asynchronous plus eventual synchrony:
  - eventually synchronized clocks
  - eventual message delivery bound d
  - some communication links are good
  - consensus terminates O((f+4)*d) after stabilization
- Synchronous model with f stopping failures and n nodes: 2f+1 ≤ n

Asynchronous consensus:
- 1 stop failure: impossible
- Initially crashed failures: possible

Other consensus problems:
- Weak consensus
- k-set consensus
- Approximate consensus
- Byzantine failures: what if some nodes lie?

Failure detectors:
- Assume total asynchrony
- Assume a failure detector service: notifies node i when node j fails, eventually…
- Allows solving consensus
- Weakest failure detector? Leader-election failure detector

References

M. Fischer, N. Lynch, M. Paterson, Impossibility of distributed consensus with one faulty process, 1985.

O. Biran, S. Moran and S. Zaks, A Combinatorial Characterization of the Distributed Tasks Which Are Solvable in the Presence of One Faulty Processor, J. of Algorithms, 1990.
