Two Techniques for Proving Lower Bounds
Hagit Attiya, Technion
Goal of this Presentation
• Describe two common techniques for proving lower bounds in distributed computing:
▫ Information theory arguments
▫ Covering
• Variations
• Applications
My always first slide…
[Figure: "nicer" vs. "real" system architecture, relating problem, algorithm, and implementation]
Part I: Information Theory Arguments
Overview
•Bound the flow of information among processes (and memory)
•Show that information takes long to be acquired
•Argue that solving a particular problem requires information about many processes
• Usually applies to:
▫ Shared memory systems
▫ Synchronous executions (these imply lower bounds also for asynchronous executions)
• Details depend on the primitives used
Single-writer registers: Possible argument
• Need to read from each process
• The state of a process can be found only in its own register
• Hence, the first process must read n registers
Not really
When processes take steps together
First process doubles information in 2nd step
But can’t do better than that
More Refined Argument
• Consider synchronized executions
▫ Processes take steps in rounds
▫ All reads appear before all writes
• INF(pi,t-1): the set of inputs influencing process pi at the start of round t
▫ For t = 1, INF(pi,0) = {pi}
▫ For t ≥ 1, if pi reads in round t a value written by pj, INF(pi,t) = INF(pi,t-1) ∪ INF(pj,t-1)
▫ For t ≥ 1, if pi writes in round t, INF(pi,t) = INF(pi,t-1)
INF determines the state
Proof by case analysis
Lemma: If the states of processes in INF(pi,t-1) are the same in configurations C and C’, then pi takes the same steps in a t-round execution from C and from C’
Size of INF
• I(t) = max over all processes pi of |INF(pi,t)|
$I(t) \le 2^t$
Lemma: $I(0) = 1$, and $I(t) \le 2\,I(t-1)$
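The doubling lemma can be checked on small runs. The sketch below is a hypothetical simulation (the name `simulate_inf` and the random choice of steps are illustrative assumptions, not part of the argument): in each round every process either writes its influence set to its own single-writer register or reads some process's register, with all reads of a round placed before its writes. It reports I(0), …, I(T).

```python
import random


def simulate_inf(n, rounds, seed=0):
    """Return [I(0), I(1), ..., I(rounds)] for one random synchronous run.

    Illustrative model: n processes, one single-writer register each
    (initially carrying no input information). In every round each process
    either writes its influence set to its own register or reads the
    register of a randomly chosen process; reads precede writes.
    """
    rng = random.Random(seed)
    inf = [{i} for i in range(n)]        # INF(p_i, 0) = {p_i}
    reg = [set() for _ in range(n)]      # contents of the registers
    sizes = [max(len(s) for s in inf)]
    for _ in range(rounds):
        steps = [("w", None) if rng.random() < 0.5 else ("r", rng.randrange(n))
                 for _ in range(n)]
        new_inf = [set(s) for s in inf]
        for i, (kind, j) in enumerate(steps):
            if kind == "r":              # reads see values from earlier rounds
                new_inf[i] |= reg[j]
        for i, (kind, _) in enumerate(steps):
            if kind == "w":              # a write leaves INF unchanged
                reg[i] = set(inf[i])
        inf = new_inf
        sizes.append(max(len(s) for s in inf))
    return sizes
```

For example, `simulate_inf(64, 8)` returns nine sizes, each at most $2^t$ for its round t.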
Simple application: Computing OR
• Consider the input configuration C0 = (0, 0, …, 0, …, 0)
• In all rounds t < log n, the influence set of a process has size at most $2^t < n$
• Hence some process pi is not in INF(p1, log n − 1)
• By the lemma, p1 returns the same value in C0 and in C1 = (0, 0, …, 1, …, 0), where only pi's input is 1
• A contradiction, since OR(C0) = 0 and OR(C1) = 1
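To see the indistinguishability step concretely, here is a minimal sketch assuming a full-information protocol whose read/write schedule is fixed in advance (real algorithms may choose their steps adaptively; the helpers `run_full_information` and `p1_cannot_tell` are hypothetical). After fewer than log n rounds it finds a process pi outside p1's view, flips pi's input, and compares p1's final states.

```python
import math
import random


def run_full_information(inputs, schedule):
    """Run a full-information protocol with single-writer registers.

    Illustrative assumptions: schedule[t][i] is "w" (p_i writes its state to
    its own register) or an index j (p_i reads p_j's register) in round t;
    the schedule does not depend on the inputs. A process's state is the set
    of (process, input) pairs it has learned. Reads precede writes.
    """
    n = len(inputs)
    state = [frozenset({(i, inputs[i])}) for i in range(n)]
    reg = [frozenset() for _ in range(n)]  # registers start with no inputs
    for steps in schedule:
        new_state = list(state)
        for i, step in enumerate(steps):   # reads see earlier rounds' writes
            if step != "w":
                new_state[i] = state[i] | reg[step]
        for i, step in enumerate(steps):   # then the writes of this round
            if step == "w":
                reg[i] = state[i]
        state = new_state
    return state


def p1_cannot_tell(n, seed=0):
    """Return (i, same) for n >= 2: p_i's input is flipped, and `same` says
    whether p_1 (index 0) ends in the same state from C0 and from C1."""
    rng = random.Random(seed)
    T = math.ceil(math.log2(n)) - 1        # T < log n, so 2**T < n
    schedule = [["w" if rng.random() < 0.5 else rng.randrange(n)
                 for _ in range(n)] for _ in range(T)]
    c0 = [0] * n
    view0 = run_full_information(c0, schedule)[0]
    known = {p for p, _ in view0}
    i = next(p for p in range(n) if p not in known)  # exists: |known| <= 2**T < n
    c1 = list(c0)
    c1[i] = 1
    view1 = run_full_information(c1, schedule)[0]
    return i, view0 == view1
```

Because p1's view never contains pi's input, any decision p1 makes is the same in C0 and C1, which is the contradiction used for OR.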
Application: Approximate agreement
For a small ε > 0:
• Processes start with inputs in [0,1]
• Each must decide on an output in [0,1] such that
▫ All outputs are within ε of each other (agreement)
▫ If all inputs are v, the output is v (validity)
The system is asynchronous, and a process must decide even if it runs by itself (solo termination)
Application: Approximate agreement
[Attiya, Shavit, Lynch]
• Consider the input configuration C0 = (0, 0, …, 0)
• Run all processes to completion from C0: by validity, they must decide 0
• If the number of rounds T < log n ⇒ I(T) < n ⇒ ∃ process pi ∉ INF(p1,T)
Approximate agreement (cont.)
• Consider two input configurations C0 = (0, …, 0) and C1 = (0, …, 1, …, 0), which differ only in pi's input
• From C1, run pi to completion: it must decide 1 (its solo run is indistinguishable from a solo run in which all inputs are 1, so validity applies)
• pi ∉ INF(p1,T) ⇒ p1 still decides 0 when running from this configuration, contradicting agreement
Theorem: Solo-terminating approximate agreement requires Ω(log n) rounds in a synchronous failure-free run
This is an overhead of solo termination in "nice" runs, since otherwise a synchronous algorithm can solve the problem in one round.
With multi-writer registers
• Previous theorem does not hold
• A wait-free approximate agreement algorithm that takes O(1) rounds in "nice" executions
[Schenk]
•Even simpler: An O(1) OR algorithm
•Only a few initial configurations to distinguish between
Can you find it?
Overhead of single-writer registers: Separates single-writer and multi-writer registers
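One possible answer to the puzzle, sketched for synchronous failure-free runs and assuming a single multi-writer register R initialized to 0 (the function name `or_two_rounds` is illustrative): processes with input 1 all write to the same location, and everyone reads it one round later. The round count does not depend on n, in contrast with the log n bound for single-writer registers.

```python
def or_two_rounds(inputs):
    """Synchronous, failure-free run of a hypothetical 2-round OR algorithm.

    Assumed model: one multi-writer register R, initially 0; each process
    takes one step per round; within a round all reads precede all writes.
      Round 1: a process with input 1 writes 1 to R; a process with input 0
               reads R (it sees the initial 0 and ignores it).
      Round 2: every process reads R and decides the value it reads.
    Returns the list of decisions and the number of rounds used.
    """
    R = 0
    # Round 1, reads: input-0 processes read the initial value 0.
    # Round 1, writes: all input-1 processes write 1 to the same location.
    for x in inputs:
        if x == 1:
            R = 1
    # Round 2: only reads, so every process sees the value left by round 1.
    decisions = [R for _ in inputs]
    return decisions, 2
```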
Information flow with multi-writer registers
The previous argument does not hold
Instead, consider how learning more information allows a process to differentiate between input configurations
Capture as a partitioning of process states and memory values
[Beame]
[Figure: example input configurations such as (0, …, 1, …, 0), (0, …, 0), (1, …, 1, …, 0) and (0, …, 0, …, 1), partitioned into classes]
Multi-writer registers: Ordering events
Within each round:
• Put all reads, then
• Put all writes
⇒ Reads obtain the value written at the end of the previous round
Partitioning into equivalence classes
For process p and round t, two input configurations are in the same equivalence class of P(p,t) if p is in the same state after t rounds from both (in a synchronous failure-free execution)
P(t): the number of classes after t rounds (max over p)
V(R,t), V(t) defined similarly for locations R
$P(t), V(t) \le (4n+2)^{2^{t-2}}$
Lemma: $P(t) \le P(t-1)\,V(t-1)$ and $V(t) \le n\,P(t-1) + V(t-1)$
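The closed form can be sanity-checked by iterating the lemma's recurrence with equality. The starting values P(0) = 2 (a process knows its own binary input) and V(0) = 1 (memory starts in a fixed state) are assumptions for this sketch, as are the helper names. With them, P(2) = 4n+2, the base of the closed form, and the bound holds from t = 2 on.

```python
def class_bounds(n, t_max, p0=2, v0=1):
    """Iterate the lemma's recurrence with equality (its worst case).

    p0 and v0 are assumed starting values: 2 classes for a process (its own
    binary input) and 1 class for a location (memory starts in a fixed state).
    Returns the lists P[0..t_max] and V[0..t_max].
    """
    P, V = [p0], [v0]
    for t in range(1, t_max + 1):
        P.append(P[t - 1] * V[t - 1])        # P(t) <= P(t-1) V(t-1)
        V.append(n * P[t - 1] + V[t - 1])    # V(t) <= n P(t-1) + V(t-1)
    return P, V


def closed_form(n, t):
    """(4n+2)^(2^(t-2)), the stated bound (used here for t >= 2)."""
    return (4 * n + 2) ** (2 ** (t - 2))
```

Python's arbitrary-precision integers keep the doubly exponential values exact.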
Application: The collect problem
• update(v) stores v as the latest value of a process
• collect() returns a set of values (one per process)
• When each process initially stores one of two values ⇒ there are $2^n$ possible input configurations, each leading to a different output
• The previous lemma implies $(4n+2)^{2^{t-2}} \ge P(t) \ge 2^n$
⇒ Must have Ω(log n) rounds
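To turn the inequality into a round count, the hypothetical helper below finds the smallest t ≥ 2 with $(4n+2)^{2^{t-2}} \ge 2^n$, using exact integer arithmetic. Since this needs $2^{t-2}\log_2(4n+2) \ge n$, the resulting t grows logarithmically in n.

```python
def min_rounds_for_collect(n):
    """Smallest t >= 2 with (4n+2)^(2^(t-2)) >= 2^n, in exact integers.

    For any t >= 2 below this value, the counting bound leaves fewer than
    2^n classes, too few to tell all input configurations apart.
    """
    t = 2
    while (4 * n + 2) ** (2 ** (t - 2)) < 2 ** n:
        t += 1
    return t
```

For instance, going from n = 16 to n = 1024 (64 times more processes) raises the count only from 4 to 9.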
Also for other primitives (CAS)
Non-reading CAS
Reading CAS returns the old value (can be handled, but we won’t do that)
Can also extend to non-reading kCAS
CAS(R, old, new) {
  if R == old then
    R = new
    return success
  else return fail
}
Careful with CAS
More information flow in a sequence of steps
Initially, R == 0: cas(R,0,1) cas(R,1,2) … cas(R,n−1,n), and every CAS succeeds
On the other hand:
cas(R,n−1,n) cas(R,n−2,n−1) … cas(R,0,1), and only the last CAS succeeds
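A direct simulation of the two orders on one register makes the difference visible. This is a sketch: `run_cas_sequence` is a hypothetical helper, and each CAS is non-reading, returning only success or failure.

```python
def run_cas_sequence(ops, initial=0):
    """Apply non-reading CAS(R, old, new) operations to one register, in order.

    Returns the final value of R and, for each operation, whether it succeeded.
    """
    r = initial
    succeeded = []
    for old, new in ops:
        if r == old:          # CAS succeeds only if R holds the expected value
            r = new
            succeeded.append(True)
        else:
            succeeded.append(False)
    return r, succeeded


n = 5
forward = [(k, k + 1) for k in range(n)]   # cas(R,0,1) cas(R,1,2) ... cas(R,n-1,n)
backward = list(reversed(forward))         # cas(R,n-1,n) ... cas(R,0,1)
print(run_cas_sequence(forward))   # every CAS succeeds; R ends at n
print(run_cas_sequence(backward))  # only the last CAS succeeds; R ends at 1
```

In the forward order the final value of R reflects all n operations; in the backward order it reflects only one.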
Ordering events within a round
• Put all reads first
• Put all writes last
• For every register R whose current value is v, consider all CAS events on R:
▫ Put all events with old ≠ v first: all fail
▫ Then put all events with old == v: only the first succeeds (assuming operations are non-degenerate)
This allows proving a lemma analogous to the one for multi-writer registers (with different constants)
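The ordering rule for the CAS events on one register can be sketched as follows, assuming non-degenerate operations (old ≠ new); `order_cas_round` is an illustrative name. The failing events come first, and the first matching CAS moves R away from v, so at most one CAS per register succeeds in a round.

```python
def order_cas_round(value, events):
    """Order the CAS events on one register within a round.

    value:  current value v of the register
    events: list of (process_id, old, new), assumed non-degenerate (old != new)
    Events with old != v go first (they all fail), then events with old == v
    (only the first of them succeeds). Returns the order, the final value and
    the set of processes whose CAS succeeded.
    """
    failing = [e for e in events if e[1] != value]
    matching = [e for e in events if e[1] == value]
    ordered = failing + matching
    r, winners = value, set()
    for pid, old, new in ordered:
        if r == old:
            r = new
            winners.add(pid)
    return ordered, r, winners
```

Without this ordering, the events (old 0 → 1) and (old 1 → 2) on a register holding 0 could both succeed one after the other; with it, only the first succeeds.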
Information Flow with Bounded Fan-In
• Arbitrary objects, but bounded contention
▫ Not too many processes access the same base object simultaneously
• Isolate processes in a Q-independent execution
▫ Only processes in Q take steps
▫ They access only objects not modified by other processes in Q
• For a process p ∈ Q, a Q-independent execution is indistinguishable from a p-solo execution
Constructing independent executions
Lemma: For any algorithm using only objects with contention ≤ w and every t ≥ 0, there is a t-round Qt-independent execution with $|Q_t| \ge n/(w+2)^t$
Proof by induction, with a trivial base case.
Induction step: consider a Qt-independent execution. Look at the next steps that processes in Qt are about to perform, and construct an undirected graph (V,E). We use the following result from graph theory.
Turán's theorem: Any graph (V,E) has an independent set of size at least $|V|^2/(|V|+2|E|)$
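One standard constructive way to obtain an independent set of at least this size is the greedy min-degree rule: repeatedly take a vertex of minimum remaining degree and delete it together with its neighbors. It is known to reach the Caro–Wei bound $\sum_v 1/(\deg(v)+1)$, which is at least $|V|^2/(|V|+2|E|)$. The sketch below is illustrative (the helper name is hypothetical, and the source does not say how the independent set is found); vertices stand for the processes in Qt.

```python
def greedy_independent_set(vertices, edges):
    """Greedy min-degree independent set (vertices: iterable, edges: pairs).

    Each chosen vertex has minimum degree in the remaining graph; it and its
    neighbors are then deleted. The result is an independent set whose size
    is at least sum_v 1/(deg(v)+1) >= |V|^2 / (|V| + 2|E|).
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    independent = []
    while adj:
        v = min(adj, key=lambda x: len(adj[x]))  # minimum remaining degree
        independent.append(v)
        removed = adj[v] | {v}
        for u in removed:
            adj.pop(u, None)
        for u in adj:                            # drop edges to deleted vertices
            adj[u] -= removed
    return independent
```

For a star with one center and five leaves, the rule picks all five leaves, well above the bound of 36/16.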
Induction step: The graph
• V = Qt
• E contains an edge {pi, pj} if ▫pi and pj access the same object, or
▫pi is about to read an object modified by pj, or
▫pj is about to read an object modified by pi
$|E| \le |Q_t|(w+1)/2$
By Turán's theorem and the inductive hypothesis, there is an independent set Qt+1 of size at least $|Q_t|/(w+2) \ge n/(w+2)^{t+1}$
Omit all steps of Qt – Qt+1 from the execution to get a Qt+1-independent execution
Application: Weak Test&Set
Weak test&set: Like test&set but at most one success
Take t such that $(w+2)^t < n$ ⇒ the lemma gives a t-round {pi,pj}-independent execution
• Each of pi and pj seems to be running solo ⇒ each must succeed: contradiction
Theorem: The solo step complexity of weak test&set is Ω(log n / log w)
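The choice of t in this argument is easy to compute. The hypothetical helper below returns the largest t with $(w+2)^t < n$. By the lemma, at least two processes are still isolated after t rounds, so a weak test&set algorithm whose solo runs finish within t steps would let both succeed. This t is roughly $\log n / \log(w+2)$.

```python
def isolated_rounds(n, w):
    """Largest t with (w+2)^t < n (requires n >= 2).

    By the lemma, a t-round Q_t-independent execution then has
    |Q_t| >= n/(w+2)^t > 1, i.e. it isolates at least two processes.
    """
    t = 0
    while (w + 2) ** (t + 1) < n:
        t += 1
    return t
```

For example, with n = 100 processes and contention w = 2, the lemma still isolates two processes after 3 rounds.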
Part II: Covering
Covering: The basic idea
• Several processes write to the same location
▫ Writes by early processes are lost if there is no read in between
⇒ Must write to distinct locations
⇒ Other processes must read these locations
Max Register
• WriteMax(v,R) operation
• ReadMax operation op returns the maximal value written by a WriteMax operation that
▫ completed before op started, or
▫ overlaps op
• Special case of a linearizable object
Lower bound for ReadMax operation
[Jayanti, Tan, Toueg]
The proof is constructive
Theorem: ReadMax must read n different registers.
Construction for the lower bound
[Figure: execution αk βk γk. In αk, p1 … pk perform WriteMax operations; βk consists of writes by p1 … pk to R1 … Rk; in γk, pn performs a ReadMax operation, which reads R1 … Rk]
Proof by induction on k = 0, …, n
Base case is simple
Taking k = n yields the result
Inductive Step
[Figure: execution αk πk βk γk, where πk consists of steps of pk+1]
• After αk, let pk+1 perform WriteMax operations
• pk+1 must write to some register R ∉ {R1, …, Rk}: otherwise βk overwrites all its writes, and pn's ReadMax in γk does not observe pk+1
• pn's ReadMax in γk must read such a register R ∉ {R1, …, Rk}
• Let πk be the steps of pk+1 until it is about to write to the first such register, Rk+1
Claim follows with R1 … Rk Rk+1 and αk+1 = αk πk
Swap objects
Theorem holds for other primitives and objects, e.g., (register-to-memory) swap
Need some care in constructing πk, γk
swap(R, v) {
  tmp = R
  R = v
  return tmp
}
Result holds also for other objects
• E.g., counters
• Constructed execution contains many increment operations
• Better algorithms when
▫ Few increment operations
▫ Max register holds bounded values
[Aspnes, Attiya, Censor-Hillel]
Counters with CAS
Counters can be implemented with a single location R and a single CAS per operation:
• To increment, simply:
▫ Read the previous value from R
▫ CAS +1 to R
• To read the counter, simply read R
Lots of contention on R! This is inevitable
The memory stalls measure [Dwork, Herlihy, Waarts]
If k processes access (or modify) the same location at the same configuration:
▫ The first process incurs one step and no stalls
▫ The second process incurs one step and one stall
▫ …
▫ The k'th process incurs one step and k−1 stalls
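The stall accounting can be written as a small function. This is a sketch with a hypothetical interface: the input lists, in scheduling order, the location each process accesses from one configuration, and the output gives the stalls each process incurs (each also takes one step). For k processes on one location the total is k(k−1)/2.

```python
from collections import defaultdict


def memory_stalls(accesses):
    """Stalls per process when processes access locations from one configuration.

    accesses[i] is the location accessed by the i-th scheduled process; a
    process incurs one stall for every process that reached the same
    location before it (in addition to its own single step).
    """
    earlier = defaultdict(int)  # location -> processes that already accessed it
    stalls = []
    for loc in accesses:
        stalls.append(earlier[loc])
        earlier[loc] += 1
    return stalls
```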
Lower bound on number of stalls
Theorem: ReadCounter must incur n stalls + steps.
[Figure: p1 … pk perform Increment operations and are left poised on R1 … Rm, m ≤ k; pn then performs a ReadCounter operation, which accesses R1 … Rm and incurs k stalls + steps]
Similar construction as in previous theorem
Wrap-up
• There are many lower bound results, but fewer techniques…
• Some results & techniques are relevant to questions asked in Transform
• Material is based on a monograph-in-writing with Faith Ellen
▫ Let me know if you want to proofread it!