Consensus
Input: 1 or 0 to each processor
Output:
- Agreement: all processors decide the same value, 0 or 1
- Termination: all processors eventually decide
- Validity: if all inputs are x, then the decision is x
The result: No completely asynchronous consensus protocol can tolerate even a single unannounced process death.
E.g., the stopping of a single process at an inappropriate time can cause any distributed commit protocol to fail to reach agreement.
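The three output conditions can be checked mechanically on one finished execution. The sketch below is only an illustration: the function name `check_consensus` and the dictionary encoding (a decision of `None` meaning "never decided") are assumptions made here, not part of the slides.

```python
from typing import Dict, Optional


def check_consensus(inputs: Dict[str, int],
                    decisions: Dict[str, Optional[int]]) -> Dict[str, bool]:
    """Evaluate agreement, termination and validity on one execution.

    inputs[p] is the 0/1 input of processor p (hypothetical encoding);
    decisions[p] is its decision, or None if p never decided.
    """
    decided = [v for v in decisions.values() if v is not None]
    # Agreement: no two processors decided different values.
    agreement = len(set(decided)) <= 1
    # Termination: every processor decided.
    termination = all(v is not None for v in decisions.values())
    # Validity only constrains executions in which all inputs are equal.
    input_values = set(inputs.values())
    if len(input_values) == 1:
        (x,) = input_values
        validity = all(v == x for v in decided)
    else:
        validity = True
    return {"agreement": agreement,
            "termination": termination,
            "validity": validity}


ok = check_consensus({"p0": 1, "p1": 1}, {"p0": 1, "p1": 1})
stuck = check_consensus({"p0": 0, "p1": 1}, {"p0": 0, "p1": None})
```

In `stuck`, agreement and validity hold but termination fails, which is the kind of execution the impossibility result is about.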
Motivation
This problem plays a role similar to the one that “the halting problem” plays in computability theory.
Many problems are equivalent to consensus (or reduce to it).
Protocols in the industry
How do commit protocols in the industry deal with this result? They weaken an assumption. For example:
- Computation model: e.g., assume a bounded-delay network
- Fault model: e.g., assume faults occur only at the start
The Model
Message System: Reliable
- Delivers all messages correctly
- Exactly once
Processing Model: Completely Asynchronous
- No assumptions about relative speeds
- Unbounded time in delivering a message
Weak Consensus
Every process starts with an initial value in {0,1}.
A nonfaulty process decides on a value in {0,1} by entering an appropriate decision state.
All nonfaulty processes that decide are required to choose the same value (note: termination is not required).
Both 0 and 1 are possible decision values, although perhaps for different initial configurations.
(Trivial solutions, e.g., always deciding “0”, are ruled out.)
System Model
Processes communicate by means of one global message buffer.
Atomic step:
- Attempt to receive a message
- Perform local computation
- Send an arbitrary but finite set of messages
Consensus Protocol
N processes (N > 1). Each process has:
- xp: a one-bit input register
- yp: an output register with values in {b,0,1}
- An unbounded amount of internal storage
- PC: a program counter
[Figure: process p with input register xp (0/1), output register yp (0/1/b), unbounded memory, and program counter PC.]
All memory starts with fixed values (except the input register).
The output register starts with b.
The output register is “write once”: when a value is written to the output register, the process is “in a decision state”.
Each process acts deterministically according to a transition function.
Communication System
A message is a pair (p,m):
- p is the name of the destination process
- m is a “message value”
The message buffer maintains messages that have been sent but not yet delivered.
We assume a clique topology.
Two operations by a process:
- send(p,m): place (p,m) in the message buffer (“message (p,m) is sent to process p”)
- receive(p): either delete a message (p,m) from the message buffer and return m (“message (p,m) is received”), or return the null marker ∅ (message buffer unchanged)
The message system is nondeterministic. For fairness we will require in the proof: if receive(p) is performed infinitely many times, then each message (p,m) in the message buffer is eventually delivered.
For now: the message system can return ∅ an infinite number of times in response to receive(p), even though a message (p,m) is in the message buffer.
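The send and receive operations can be sketched as a small class. This is a minimal illustration, not the model itself: the class name `MessageBuffer`, the seeded random choice that stands in for nondeterminism, and the `allow_null` switch are all assumptions introduced here, and Python's `None` plays the role of the null marker.

```python
import random
from collections import Counter
from typing import Hashable, Optional

NULL = None  # stands in for the null marker returned when nothing is delivered


class MessageBuffer:
    """Multiset of (destination, value) pairs sent but not yet delivered."""

    def __init__(self, seed: int = 0) -> None:
        self._pending: Counter = Counter()
        # A seeded RNG models the nondeterministic message system (assumption).
        self._rng = random.Random(seed)

    def send(self, p: Hashable, m: Hashable) -> None:
        """Place (p, m) in the buffer: message m is sent to process p."""
        self._pending[(p, m)] += 1

    def receive(self, p: Hashable, allow_null: bool = True) -> Optional[Hashable]:
        """Deliver some pending message addressed to p, or return NULL.

        With allow_null=True the buffer may return NULL even though a
        message for p is pending, as the model permits.
        """
        candidates = sorted((pm for pm in self._pending if pm[0] == p), key=repr)
        options = list(candidates)
        if allow_null or not candidates:
            options.append(NULL)
        choice = self._rng.choice(options)
        if choice is NULL:
            return NULL  # buffer unchanged
        self._pending[choice] -= 1
        if self._pending[choice] == 0:
            del self._pending[choice]
        return choice[1]

    def pending(self) -> int:
        """Number of messages sent but not yet delivered."""
        return sum(self._pending.values())


buf = MessageBuffer(seed=1)
buf.send("p1", "a")
buf.send("p0", "b")
```

The fairness requirement of the proof is not enforced by this class; it is a property of whole runs, not of a single receive call.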
[Figure: message buffer example with processes 0, 1, 2. The buffer holds (P1,M), (P0,M’), (P2,M’’), (P1,M’’’). receive(0) removes (P0,M’) from the buffer; receive(1) removes (P1,M’’’), after which process 1 performs send(2,m), placing (P2,m) in the buffer.]
Configurations
A configuration consists of:
- The internal state of each process
- The contents of the message buffer
Initial configuration: each process p starts with xp=0 or xp=1, and the message buffer is empty.
A step consists of a primitive step by a single process p:
- Phase 1: receive(p) is performed
- Phase 2: p enters a new internal state and sends a finite set of messages
A step is completely determined by the pair e = (p,m), called an event.
Event e = (p,m) (“receipt of m by p”).
Step of a single process p:
- receive(p) is performed (p receives m)
- p enters a new internal state
- p sends a finite set of messages
Event and step: the event is the syntax; the step is the semantics.
Events and Schedules
e(C) denotes the configuration that results from applying event e to C (“e can be applied to C”).
The event (p,∅) can always be applied.
A schedule from C is a finite or infinite sequence σ of events that can be applied, in turn, starting from C.
The associated sequence of steps is called a run.
One: event and step. Many: schedule and run.
If a schedule σ is finite, σ(C) denotes the resulting configuration C’, which is “reachable from C”.
C’ is accessible if it is reachable from an initial configuration.
Lemma 1 (“commutativity”)
Lemma 1: Suppose that from some configuration C, the schedules σ1, σ2 lead to configurations C1, C2, respectively.
If the sets of processes taking steps in σ1 and σ2, respectively, are disjoint, then σ2 can be applied to C1, and σ1 can be applied to C2, and both lead to the same configuration C3.
[Figure: commutativity diamond. From C0, σ1 leads to C1 and σ2 leads to C2; applying σ2 to C1 and σ1 to C2 both give C3.]
[Figure: single-event case with σ1 = (P1,M1) and σ2 = (P2,M2). The message buffer of C3 is the same in either order, P1 moves from internal state A to B and P2 from X to Y, and the states of all other processes are unchanged.]
When σ1 and σ2 each contain a single event (p,m), the lemma holds directly. When σ1 and σ2 are arbitrary schedules, use induction.
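The commutativity lemma can be tried out on a toy system. The sketch below is an illustration under stated assumptions: the three process names, the `transition` rule (append the received value to the local state and forward a primed copy to the next process), and the tuple encoding of configurations are all hypothetical; only the shape of events, schedules and configurations follows the definitions above.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, Optional[str]]   # (process, message value); None is the null marker
Config = Tuple[tuple, tuple]        # (sorted process states, sorted message buffer)

PROCS = ["p0", "p1", "p2"]          # hypothetical process names


def transition(p: str, state: str, m: Optional[str]):
    """Hypothetical deterministic transition: (new state, messages to send)."""
    if m is None:
        return state, []
    nxt = PROCS[(PROCS.index(p) + 1) % len(PROCS)]
    return state + m, [(nxt, m + "'")]


def apply_event(config: Config, event: Event) -> Config:
    """Apply e = (p, m) to a configuration, i.e. compute e(C)."""
    states, buffer = config
    p, m = event
    pending = list(buffer)
    if m is not None:
        pending.remove((p, m))      # raises ValueError if e is not applicable
    local = dict(states)
    local[p], sent = transition(p, local[p], m)
    return tuple(sorted(local.items())), tuple(sorted(pending + sent))


def apply_schedule(config: Config, schedule: List[Event]) -> Config:
    """Apply a finite schedule event by event."""
    for e in schedule:
        config = apply_event(config, e)
    return config


C = ((("p0", ""), ("p1", ""), ("p2", "")), (("p0", "a"), ("p1", "b")))
sigma1 = [("p0", "a"), ("p0", None)]   # only p0 takes steps
sigma2 = [("p1", "b")]                 # only p1 takes steps
c3_via_1 = apply_schedule(apply_schedule(C, sigma1), sigma2)
c3_via_2 = apply_schedule(apply_schedule(C, sigma2), sigma1)
```

Because the two schedules touch disjoint processes, each process sees the same sequence of messages in either order, so both orders end in the same configuration.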
A configuration C has decision value v if some process p is in a decision state with yp = v (v = 0 or v = 1).
A consensus protocol is partially correct if it satisfies two conditions:
1. No accessible configuration has more than one decision value.
2. For each v ∈ {0,1}, some accessible configuration has decision value v.
Good news:
- it is nontrivial
- sometimes it decides
- it never decides incorrectly
Bad news:
- termination is not guaranteed
- what about delivering all messages?
- what about failures?
A process p is nonfaulty in a run if it takes infinitely many steps. It is faulty otherwise.
Bad news: a process can be declared faulty only at infinity!
A run is admissible if:
- at most one process is faulty, and
- all messages sent to nonfaulty processes are eventually received.
A run is deciding if some process reaches a decision state.
A consensus protocol is totally correct in spite of one fault if it is partially correct and every admissible run is a deciding run.
Theorem: No consensus protocol is totally correct in spite of one fault.
Sketch of Proof: Assume that P is totally correct in spite of one fault.
- Show an initial configuration from which each decision is still possible (Lemma 2).
- Show that from such a configuration one can always reach another similar configuration (Lemma 3).
- Conclude, by induction, with an admissible run that never decides: a contradiction.
Let C be a configuration and let V be the set of decision values of configurations reachable from C.
- C is bivalent if |V| = 2
- C is univalent if |V| = 1: if V = {0} then C is 0-valent; if V = {1} then C is 1-valent
(Note: |V| ≠ 0, since P is totally correct.)
Theorem: No consensus protocol is totally correct in spite of one fault.
Proof: Assume that P is totally correct in spite of one fault. We will reach a contradiction.
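For intuition, valency can be computed by a plain graph search when the configurations form a small finite graph. This is only a toy: the real configuration space of a protocol is infinite, and the names `decision_values`, `valency`, the `successors` map and the `decided` map are hypothetical choices made for this sketch.

```python
from typing import Dict, Hashable, List, Set


def decision_values(start: Hashable,
                    successors: Dict[Hashable, List[Hashable]],
                    decided: Dict[Hashable, int]) -> Set[int]:
    """Return V: the decision values of configurations reachable from start."""
    seen = {start}
    stack = [start]
    values: Set[int] = set()
    while stack:
        c = stack.pop()
        if c in decided:
            values.add(decided[c])
        for n in successors.get(c, []):
            if n not in seen:
                seen.add(n)
                stack.append(n)
    return values


def valency(start: Hashable,
            successors: Dict[Hashable, List[Hashable]],
            decided: Dict[Hashable, int]) -> str:
    """Classify a configuration by the size and content of V."""
    v = decision_values(start, successors, decided)
    if v == {0, 1}:
        return "bivalent"
    if v == {0}:
        return "0-valent"
    if v == {1}:
        return "1-valent"
    return "no decision reachable"  # excluded when P is totally correct


# Toy graph (hypothetical): C can move to A or B; A leads to a 0-decision,
# B leads to a 1-decision.
toy_successors = {"C": ["A", "B"], "A": ["A0"], "B": ["B1"]}
toy_decided = {"A0": 0, "B1": 1}
```

Here `valency("C", toy_successors, toy_decided)` is bivalent, while A and B are already univalent: the step out of C is the kind of critical step the proof shows how to avoid forever.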
[Figure legend, used from now on: markers for 0-valent, 1-valent, bivalent, and unknown configurations.]
Lemma 2: P has a bivalent initial configuration.
Proof: Assume there is no bivalent initial configuration. But P is partially correct. So, there are both 0-valent and 1-valent initial configurations.
[Figure: the set of initial configurations. Lemma 2 claims that it contains a bivalent configuration C; under the assumption of the proof it contains only 0-valent configurations such as C0 and 1-valent configurations such as C1.]
Two initial configurations are called adjacent if they differ only in the initial value of a single process.
Example (inputs x0 x1 x2 x3 x4): 0 1 0 1 1 and 0 1 0 1 0 are adjacent.
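Claim: There exists a 0-valent initial configuration C0 adjacent to a 1-valent initial configuration C1.
Proof by example: change one input at a time to get from a 0-valent initial configuration to a 1-valent one.
x0 x1 x2 x3 x4
0  1  0  1  1   (0-valent)
1  1  0  1  1
1  1  0  1  0
1  1  0  0  0
1  0  0  0  0   (1-valent)
By assumption every configuration in the chain is univalent, so somewhere along it a 0-valent configuration C0 is followed by a 1-valent configuration C1.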
So: There exists a 0-valent initial configuration C0 adjacent to a 1-valent initial configuration C1.
Let p be the process in whose initial value they differ.
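The chain argument behind the claim can be sketched in a few lines: flip the differing inputs one at a time and stop at the first place where the label changes. The function name `find_adjacent_pair` and the example labelling `toy_label` are hypothetical; in the proof the label is the valency of the initial configuration, which this sketch simply takes as a given function.

```python
from typing import Callable, Tuple

Inputs = Tuple[int, ...]


def find_adjacent_pair(c0: Inputs, c1: Inputs,
                       label: Callable[[Inputs], int]) -> Tuple[Inputs, Inputs, int]:
    """Walk from c0 (label 0) to c1 (label 1), changing one input per step.

    Returns (a, b, i): adjacent input vectors that differ only at index i
    and have different labels.
    """
    assert label(c0) == 0 and label(c1) == 1
    current = c0
    for i, (a, b) in enumerate(zip(c0, c1)):
        if a == b:
            continue
        nxt = current[:i] + (b,) + current[i + 1:]
        if label(nxt) != label(current):
            return current, nxt, i
        current = nxt
    raise AssertionError("unreachable: the labels at the two ends differ")


def toy_label(c: Inputs) -> int:
    """Hypothetical labelling, chosen only so that the two ends differ."""
    return 1 if sum(c) <= 1 else 0


pair_a, pair_b, index = find_adjacent_pair((0, 1, 0, 1, 1), (1, 0, 0, 0, 0), toy_label)
```

The returned index plays the role of the process p whose initial value distinguishes C0 from C1.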
P is a consensus protocol that is totally correct in spite of one fault.
Consider an admissible deciding run (with schedule σ) from C0 in which process p takes no steps.
σ can also be applied to C1.
The two corresponding configurations are identical, except for the internal state of p.
Both runs reach the same decision x.
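[Figure: σ applied to C0 (0-valent) gives C’ and σ applied to C1 (1-valent) gives C’’; both have decision x.]
x = 1 ⇒ C0 can also reach decision 1, so C0 would be bivalent.
x = 0 ⇒ C1 can also reach decision 0, so C1 would be bivalent.
Contradiction.
So, we proved:
Lemma 2: P has a bivalent initial configuration.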
Lemma 3:
Let C be a bivalent configuration of P, and let e = (p,m) be an event that is applicable to C.
Let S be the set of configurations reachable from C without applying e, and let D = e(S) = {e(E) | E ∈ S and e is applicable to E}.
Then, D contains a bivalent configuration.
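The two sets in Lemma 3 can be restated in set notation. This is only a rewriting of the definitions in the lemma; the symbol σ for a finite schedule and the shorthand e ∉ σ (the event e does not occur in σ) are notation assumed here.

```latex
\begin{align*}
S &= \{\, \sigma(C) \;:\; \sigma \text{ is a finite schedule applicable to } C,\ e \notin \sigma \,\},\\
D &= e(S) = \{\, e(E) \;:\; E \in S \text{ and } e \text{ is applicable to } E \,\}.
\end{align*}
```

The empty schedule is allowed, so C itself belongs to S and e(C) belongs to D.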
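[Figure: S is the set of configurations reachable from the bivalent configuration C by events e1, e2, ... with ei ≠ e; applying e to each configuration E in S gives D = e(S).]
Need to prove: D contains a bivalent configuration.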
Note: e = (p,m) is applicable to C, so message (p,m) is in the message buffer, and it is not delivered on the way to any E ∈ S. So e is applicable to every E ∈ S.
Proof by contradiction: assume that D contains no bivalent configuration.
So: every configuration d ∈ D is 0-valent or 1-valent.
The proof has three steps.
Step 1:
Claim: D contains both 0-valent and 1-valent configurations.
[Figure, Step 1: e = (p,m) maps S into D = e(S); D0 and D1 denote a 0-valent and a 1-valent configuration in D.]
C is bivalent ⇒ there exist i-valent configurations Ei, i = 0,1, reachable from C.
For each i there are two cases:
1. Ei ∈ S
2. Ei ∉ S
Case 1 (shown for i = 0): E0 ∈ S.
Let F0 = e(E0). Then F0 ∈ D, and F0 is reachable from the 0-valent configuration E0, so F0 is 0-valent.
So: D contains a 0-valent configuration.
Case 2 (shown for i = 0): E0 ∉ S.
Then e was applied in reaching E0, so either E0 is in D, or there exists F0 ∈ D from which E0 is reachable.
F0 is not bivalent and the 0-valent configuration E0 is reachable from it, so F0 is 0-valent.
So: D contains a 0-valent configuration.
So: Fi is i-valent, because it is not bivalent and one of Ei and Fi is reachable from the other.
So, we know that D contains both 0-valent and 1-valent configurations.
End of step 1. Start of step 2.
Step 2
Claim: There exist C0, C1 ∈ S such that:
- C0 and C1 are neighbors (C1 = e’(C0), e’ = (p’,m’))
- D0 = e(C0) is 0-valent
- D1 = e(C1) is 1-valent
(Two configurations are neighbors if one results from the other in a single step.)
[Figure, Step 2: neighbors C0 and C1 = e’(C0) in S, with D0 = e(C0) and D1 = e(C1) in D, where e = (p,m) and e’ = (p’,m’).]
e(C) is 0-valent or 1-valent. Suppose it is 0-valent.
There is a 1-valent configuration in D, and it has a predecessor in S.
Consider the path in S from C to the predecessor of that 1-valent configuration.
Applying e to each configuration on this path, we get a configuration in D, which is 0-valent or 1-valent.
The first configuration on the path maps to the 0-valent e(C) and the last one maps to a 1-valent configuration, so somewhere along the path the valency changes.
So we get two configurations C0 and C1 that are neighbors in S; i.e., there is e’ s.t. C1 = e’(C0), with D0 = e(C0) 0-valent and D1 = e(C1) 1-valent.
So, we proved the claim:
There exist C0, C1 ∈ S such that:
- C0 and C1 are neighbors (C1 = e’(C0), e’ = (p’,m’))
- D0 = e(C0) is 0-valent
- D1 = e(C1) is 1-valent
End of step 2. Start of step 3.
Step 3: get to a contradiction
Recall: e = (p,m), e’ = (p’,m’).
Case 1: p’ ≠ p
By Lemma 1, D1 = e’(D0). So the 1-valent configuration D1 is reachable from the 0-valent configuration D0: a contradiction.
[Figure, recall: neighbors C0 and C1 = e’(C0) in S with D0 = e(C0) and D1 = e(C1), where e = (p,m) and e’ = (p’,m’).]
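Case 2: p’ = p
Let σ be the schedule of a deciding run from C0 in which p takes no steps, and let A = σ(C0).
σ involves no steps of p, so by Lemma 1 it commutes with e and with e’:
- E0 = σ(D0) = e(A), so the 0-valent configuration E0 is reachable from A
- E1 = σ(D1) = e(e’(A)), so the 1-valent configuration E1 is reachable from A
So A is bivalent. But the run to A is a deciding run, so A is univalent: it cannot be 0-valent and it cannot be 1-valent. A contradiction!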
So, we proved:
Lemma 2: P has a bivalent initial configuration.
Lemma 3: Let C be a bivalent configuration of P, and let e = (p,m) be an event that is applicable to C. Let S be the set of configurations reachable from C without applying e, and let D = e(S) = {e(E) | E ∈ S and e is applicable to E}. Then D contains a bivalent configuration.
End of proof:
Any deciding run from a bivalent initial configuration goes to a univalent configuration, so there must be some single step that goes from a bivalent to a univalent configuration. We construct a run that avoids such a step.
[Figure: a deciding run passes through bivalent configurations and then reaches a univalent configuration; the non-deciding run passes through bivalent configurations forever.]
We construct an infinite non-deciding run.
Start with a bivalent initial configuration (Lemma 2).
The run is constructed in stages. Every stage starts with a bivalent configuration and ends with a bivalent configuration.
Maintain a queue of processes, initially in arbitrary order.
The message buffer is ordered according to the time the messages were sent.
In each stage:
C is the bivalent configuration that the stage starts with.
Suppose that process p heads the queue.
Suppose that m is the earliest message to p in the message buffer, if any (or ∅ otherwise).
Let e = (p,m).
By Lemma 3 there is a bivalent configuration C’ reachable from C by a schedule in which e is the last event.
After applying this schedule, move p to the back of the queue.
[Figure: four stages with processes P0 to P3. The process at the head of the queue receives its earliest message, (P0,M), then (P1,M), (P2,M), (P3,M), and moves to the back of the queue each time, until the queue returns to its initial order.]
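The bookkeeping of the stages (the process queue and the time-ordered buffer) can be sketched as follows. This is a simplification made for illustration: the function name `run_stages` is hypothetical, the schedule supplied by Lemma 3 is not computed (only its last event e is recorded), and messages sent during a stage are not added to the buffer.

```python
from collections import deque
from typing import List, Optional, Tuple

Event = Tuple[str, Optional[str]]   # (process, message value); None is the null marker


def run_stages(processes: List[str],
               buffer: List[Tuple[str, str]],
               stages: int) -> List[Event]:
    """Return the event e = (p, m) that each stage forces to happen last.

    processes: initial queue order, head first.
    buffer: pending (destination, value) pairs, ordered by send time.
    """
    queue = deque(processes)
    pending = list(buffer)
    forced: List[Event] = []
    for _ in range(stages):
        p = queue[0]
        # Earliest message addressed to p, or the null marker if there is none.
        m = next((msg for dest, msg in pending if dest == p), None)
        # In the proof, Lemma 3 extends the run to a bivalent configuration
        # by a schedule whose last event is e; this sketch only records e.
        if m is not None:
            pending.remove((p, m))
        forced.append((p, m))
        queue.rotate(-1)            # move p to the back of the queue
    return forced


events = run_stages(["P0", "P1", "P2", "P3"],
                    [("P1", "M"), ("P0", "M"), ("P2", "M"), ("P3", "M")],
                    stages=5)
```

With the four-process example, the first four stages deliver one message to each process in queue order, and the fifth stage gives P0 a null step because nothing is pending for it.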
In any infinite sequence of stages:
- every process takes infinitely many steps
- every process receives every message sent to it
Therefore, the constructed run is admissible, and it never reaches a univalent configuration.
- The protocol never reaches a decision.
- The protocol is not totally correct in spite of one fault: a contradiction.
Conclusion
Theorem: No consensus protocol is totally correct in spite of one fault.
Proof: Construct the run shown before, which is an admissible run that never reaches a univalent configuration. The protocol never reaches a decision, so it is not totally correct in spite of one fault.
Hw: which process fails in the infinite run that was constructed for the proof?
Main lesson: In an asynchronous system, there is no way to distinguish between a faulty process and a slow process.
Other tasks not solvable with one faulty processor:
Input graph: connected. Output graph: disconnected.
Many extensions and uses.
Extensions
From 1 fault to t faults
Non-Asynchronous Models
- Synchronous: f+1 rounds if f failures
- Asynchronous plus eventual synchrony: eventually synchronized clocks, eventual message delivery bound d, some communication links good
  Consensus terminates O((f+4)*d) after stabilization
Asynchronous Consensus
- 1 stop failure: impossible
- Initial crash failures: possible
Other Consensus Problems
- Weak Consensus
- k-set Consensus
- Approximate Consensus
- Byzantine failures: what if some nodes lie?
(Non-Asynchronous Models)
- Synchronous model
- f stopping failures, n nodes
- 2f+1 ≤ n
Failure Detectors
- Assume total asynchrony
- Assume a failure detector service: notifies node i when node j fails. Eventually…
- Allows solving consensus
- Weakest failure detector? Leader-election failure detector
(Non-Asynchronous Models)
References
M. Fischer, N. Lynch, M. Paterson, Impossibility of distributed consensus with one faulty process, 1985.
O. Biran, S. Moran and S. Zaks, A Combinatorial Characterization of the Distributed Tasks Which Are Solvable in the Presence of One Faulty Processor, J. of Algorithms, 1990.