CPSC 668 Set 16: Distributed Shared Memory 1 CPSC 668 Distributed Algorithms and Systems Fall 2006 Prof. Jennifer Welch



Distributed Shared Memory
• A model for inter-process communication
• Provides the illusion of shared variables on top of message passing
• Shared memory is often considered a more convenient programming platform than message passing
• Formally, we give a simulation of the shared memory model on top of the message passing model
• We'll consider the special case of
  – no failures
  – only read/write variables to be simulated

Shared Memory Issues
• A process will invoke a shared memory operation at some time
• The simulation algorithm running on the same node will execute some code, possibly involving exchanges of messages
• Eventually the simulation algorithm will inform the process of the result of the shared memory operation
• So shared memory operations are not instantaneous!
  – Operations (invoked by different processes) can overlap
• What should be returned by operations that overlap other operations?
  – defined by a memory consistency condition

Sequential Specifications
• Each shared object has a sequential specification: it specifies the behavior of the object in the absence of concurrency.
• The object supports operations
  – invocations
  – matching responses
• The specification is the set of sequences of operations that are legal

Sequential Spec for R/W Registers
• Operations are reads and writes
• Invocations are read_i(X) and write_i(X,v)
• Responses are return_i(X,v) and ack_i(X)
• A sequence of operations is legal iff each read returns the value of the latest preceding write.
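The legality rule above is easy to mechanize. A minimal sketch (my illustration, not part of the slides) that checks whether a sequence of completed operations on a single register is legal:

```python
# Check the sequential spec for one read/write register: a sequence of
# completed operations is legal iff every read returns the value of the
# latest preceding write (or the initial value if no write precedes it).

def is_legal(ops, initial=0):
    """ops: list of ('write', v) or ('read', v) pairs, in order."""
    current = initial
    for kind, value in ops:
        if kind == 'write':
            current = value
        elif value != current:      # a read must see the latest write
            return False
    return True
```

For example, `is_legal([('write', 1), ('read', 1)])` holds, while `is_legal([('write', 1), ('read', 0)])` does not.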

Memory Consistency Conditions
• Consistency conditions tie the sequential specification together with what happens in the presence of concurrency.
• We will study two well-known conditions:
  – linearizability
  – sequential consistency
• We will only consider read/write registers, in the absence of failures.

Definition of Linearizability
• Suppose σ is a sequence of invocations and responses.
  – an invocation is not necessarily immediately followed by its matching response
• σ is linearizable if there exists a permutation π of all the operations in σ (in which each invocation is immediately followed by its matching response) such that
  – π|X is legal (satisfies the sequential spec) for every object X, and
  – if the response of operation O1 occurs in σ before the invocation of operation O2, then O1 occurs in π before O2 (π respects the real-time order of non-concurrent operations in σ).
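For small complete histories this definition can be checked directly by brute force. A sketch (my own illustration; the operation encoding `(obj, kind, value, inv, resp)` with numeric invocation/response times is an assumption, not the slides' notation):

```python
# Brute-force linearizability check for a complete history of read/write
# registers: search for a permutation that is legal per object and keeps
# every pair of non-overlapping operations in real-time order.
from itertools import permutations

def legal_per_object(seq, initial=0):
    state = {}
    for obj, kind, val, _, _ in seq:
        if kind == 'write':
            state[obj] = val
        elif state.get(obj, initial) != val:
            return False
    return True

def is_linearizable(ops, initial=0):
    n = len(ops)
    for order in permutations(range(n)):
        pos = {i: k for k, i in enumerate(order)}
        # real-time order: resp(O1) < inv(O2) forces O1 before O2
        real_time_ok = all(not (ops[i][4] < ops[j][3] and pos[i] > pos[j])
                           for i in range(n) for j in range(n))
        if real_time_ok and legal_per_object([ops[i] for i in order],
                                             initial):
            return True
    return False
```

The factorial search is only feasible for tiny histories, but it matches the definition one-for-one: the permutation is the linearization order.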

Linearizability Examples
Suppose there are two shared variables, X and Y, both initially 0. In the original timing diagram:
• p0: write(X,1) … ack(X), then read(Y) … return(Y,1)
• p1: write(Y,1) … ack(Y), then read(X) … return(X,1)
Is this sequence linearizable? Yes — the green triangles in the diagram mark linearization points (one valid order: write(X,1), write(Y,1), read(Y), read(X)).
What if p1's read of X returns 0? No — see the arrow: p0's write(X,1) completes before p1's read of X is invoked, so that read must return 1.

Definition of Sequential Consistency
• Suppose σ is a sequence of invocations and responses.
• σ is sequentially consistent if there exists a permutation π of all the operations in σ such that
  – π|X is legal (satisfies the sequential spec) for every object X, and
  – if the response of operation O1 occurs in σ before the invocation of operation O2 at the same process, then O1 occurs in π before O2 (π respects the order of operations by the same process in σ).
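The only change from linearizability is which pairs of operations must stay ordered. A brute-force sketch (again my illustration; operations are `(proc, obj, kind, value)` listed in program order, since real time is now irrelevant):

```python
# Brute-force sequential-consistency check: search for an interleaving
# that is legal per object and preserves each process's program order.
from itertools import permutations

def legal_per_object(seq, initial=0):
    state = {}
    for _, obj, kind, val in seq:
        if kind == 'write':
            state[obj] = val
        elif state.get(obj, initial) != val:
            return False
    return True

def is_sequentially_consistent(ops, initial=0):
    n = len(ops)
    for order in permutations(range(n)):
        pos = {i: k for k, i in enumerate(order)}
        # operations of the same process keep their relative order
        program_order_ok = all(pos[i] < pos[j]
                               for i in range(n) for j in range(i + 1, n)
                               if ops[i][0] == ops[j][0])
        if program_order_ok and legal_per_object([ops[i] for i in order],
                                                 initial):
            return True
    return False
```

This accepts a history where p1's read of X returns 0 although p0's write(X,1) already completed in real time, which linearizability would reject.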

Sequential Consistency Examples
Suppose there are two shared variables, X and Y, both initially 0. In the original timing diagram:
• p0: write(X,1) … ack(X), then read(Y) … return(Y,1)
• p1: write(Y,1) … ack(Y), then read(X) … return(X,0)
Is this sequence sequentially consistent? Yes — the green numbers give one valid order: write(Y,1), read(X), write(X,1), read(Y).
What if p0's read of Y also returns 0? No — see the arrows: each read returning 0 would have to precede the other process's write, which together with program order creates a cycle.

Specification of Linearizable Shared Memory Comm. System
• Inputs are invocations on the shared objects
• Outputs are responses from the shared objects
• A sequence σ is in the allowable set iff
  – Correct Interaction: each process alternates invocations and matching responses
  – Liveness: each invocation has a matching response
  – Linearizability: σ is linearizable

Specification of Sequentially Consistent Shared Memory
• Inputs are invocations on the shared objects
• Outputs are responses from the shared objects
• A sequence σ is in the allowable set iff
  – Correct Interaction: each process alternates invocations and matching responses
  – Liveness: each invocation has a matching response
  – Sequential Consistency: σ is sequentially consistent

Algorithm to Implement Linearizable Shared Memory
• Uses totally ordered broadcast as the underlying communication system.
• Each process keeps a replica of each shared variable.
• When a read request arrives:
  – send a bcast msg containing the request
  – when own bcast msg arrives, return the value in the local replica
• When a write request arrives:
  – send a bcast msg containing the request
  – upon receipt, each process updates its replica's value
  – when own bcast msg arrives, respond with ack
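As a toy sketch of these steps (my illustration: the broadcast layer is collapsed into a synchronous loop that delivers every message to every replica in one global order — exactly the total-order guarantee the algorithm relies on):

```python
# Replica-based linearizable shared memory over a (simulated) totally
# ordered broadcast. Reads are broadcast too; a process answers a read
# from its local replica only when its own broadcast is delivered.
class Replica:
    def __init__(self):
        self.copy = {}                      # local copy of each variable

    def deliver(self, kind, var, val):
        if kind == 'write':                 # every replica applies writes
            self.copy[var] = val

def to_bcast(replicas, kind, var, val=None):
    for r in replicas:                      # one global delivery order
        r.deliver(kind, var, val)

def do_write(replicas, pid, var, val):
    to_bcast(replicas, 'write', var, val)   # own msg back => respond
    return 'ack'

def do_read(replicas, pid, var):
    to_bcast(replicas, 'read', var)         # changes no state, just delays
    return replicas[pid].copy.get(var, 0)
```

A write followed by a read at any other process returns the written value, because every replica applied the write at the same point in the global order.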

The Simulation
(Architecture from the original diagram:) The user of read/write shared memory sits at the top. Each node i runs a module alg_i (alg_0, …, alg_{n-1}) that accepts read/write invocations and issues return/ack responses at its top interface, and uses to-bc-send and to-bc-recv at its bottom interface to the Totally Ordered Broadcast layer. Together, the alg_i modules and the broadcast layer implement the Shared Memory.

Correctness of Linearizability Algorithm
• Consider any admissible execution α of the algorithm:
  – the underlying totally ordered broadcast behaves properly
  – users interact properly
• Show that σ, the restriction of α to the events of the top interface, satisfies Liveness and Linearizability.

Correctness of Linearizability Algorithm
• Liveness (every invocation has a response): by the Liveness property of the underlying totally ordered broadcast.
• Linearizability: define the permutation π of the operations to be the order in which the corresponding broadcasts are received.
  – π is legal: all the operations are consistently ordered by the totally ordered broadcast.
  – π respects the real-time order of operations: if O1 finishes before O2 begins, O1's bcast is ordered before O2's bcast.

Why is the Read Bcast Needed?
• The bcast done for a read causes no changes to any replicas; it just delays the response to the read.
• Why is it needed?
• Let's see what happens if we remove it.

Why the Read Bcast is Needed
(Counterexample from the original diagram, with the read bcast removed:) p0 invokes write(1) and does a to-bc-send for it. p1 has already received the broadcast, so p1's read returns 1. p2's read is invoked after p1's read returns, but p2 has not yet received the broadcast, so its read returns 0. A read of 0 strictly after a read of 1 cannot be put in any legal order that respects real time, so the execution is not linearizable.

Algorithm for Sequential Consistency
The linearizability algorithm, without doing a bcast for reads:
• Uses totally ordered broadcast as the underlying communication system.
• Each process keeps a replica of each shared variable.
• When a read request arrives:
  – immediately return the value stored in the local replica
• When a write request arrives:
  – send a bcast msg containing the request
  – upon receipt, each process updates its replica's value
  – when own bcast msg arrives, respond with ack
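To see why local reads are still sequentially consistent but not linearizable, here is a sketch with explicit per-process inboxes (my illustration): a broadcast write is queued everywhere in the same order, but a process applies it only when it "receives" it, so a local read can be stale in real time.

```python
# SC algorithm with local reads: writes go through totally ordered
# broadcast (modelled as identical per-process FIFO queues); reads
# answer immediately from the local replica, possibly returning a
# stale value -- allowed by SC, forbidden by linearizability.
from collections import deque

class Node:
    def __init__(self):
        self.copy = {}
        self.inbox = deque()

    def read(self, var):                    # local, no communication
        return self.copy.get(var, 0)

    def receive_one(self):                  # apply the next broadcast
        if self.inbox:
            var, val = self.inbox.popleft()
            self.copy[var] = val

def bcast_write(nodes, var, val):
    for n in nodes:                         # same order at every node
        n.inbox.append((var, val))

nodes = [Node() for _ in range(3)]
bcast_write(nodes, 'X', 1)
nodes[0].receive_one()                      # writer's own msg arrives
after_write = nodes[0].read('X')            # 1
stale = nodes[2].read('X')                  # still 0: msg not received
nodes[2].receive_one()
fresh = nodes[2].read('X')                  # now 1
```

The stale read happens after the write's ack in real time, which is precisely the behavior the linearizability algorithm's read bcast exists to prevent.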

Correctness of SC Algorithm
Lemma (9.3): The local copies at each process take on all the values appearing in write operations, in the same order, which preserves the per-process order of writes.
Lemma (9.4): If p_i writes Y and later reads X, then p_i's update of its local copy of Y (on behalf of that write) precedes its read of its local copy of X (on behalf of that read).

Correctness of the SC Algorithm
(Theorem 9.5) Why does SC hold?
• Given any admissible execution α, we must come up with a permutation π of the shared memory operations that is
  – legal, and
  – respects the per-process ordering of operations

The Permutation π
• Insert all writes into π in their to-bcast order.
• Consider each read R in σ in the order of invocation:
  – suppose R is a read by p_i of X
  – place R in π immediately after the later of
    • the operation by p_i that immediately precedes R in σ, and
    • the write that R "read from" (the write causing the latest update of p_i's local copy of X preceding the response for R)
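A sketch of this construction (my illustration; operations carry a per-process sequence number so "the operation by p_i that immediately precedes R" can be located, and write tuples are assumed distinct so list lookup is unambiguous):

```python
# Build the permutation pi: all writes first, in to-bcast order; each
# read R (taken in invocation order) is then inserted immediately after
# the later of (a) R's process's previous operation already in pi and
# (b) the write that R read from.
def build_permutation(writes, reads, read_from):
    """writes: (proc, seq, var, val) tuples in to-bcast order.
    reads:  (proc, seq, var, val) tuples in invocation order;
            seq numbers each process's operations in program order.
    read_from[i]: index into writes of the write reads[i] read from,
            or None if it read the initial value."""
    pi = [('write',) + w for w in writes]
    for i, (p, s, v, x) in enumerate(reads):
        anchor = -1
        for k, op in enumerate(pi):         # p's previous operation
            if op[1] == p and op[2] == s - 1:
                anchor = k
        if read_from[i] is not None:        # the write R read from
            target = ('write',) + writes[read_from[i]]
            anchor = max(anchor, pi.index(target))
        pi.insert(anchor + 1, ('read', p, s, v, x))
    return pi
```

For instance, if p0 writes X, p1's later write to X is ordered after it by to-bcast, and p0 reads its own value before receiving p1's broadcast, the read lands between the two writes, keeping π legal.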

Permutation Example
(From the original timing diagram:) One process performs write(1) and another performs write(2); each write is followed by a to-bc-send and eventually an ack. Two reads return 2 and 1 respectively — the read returning 1 occurs at a process that has not yet received write(2)'s broadcast. The permutation π is given by the green numbers 1–4 in the diagram, which place each read immediately after the write it read from.

Permutation Respects Per-Process Ordering
For a specific process:
• The relative ordering of two writes is preserved by Lemma 9.3.
• The relative ordering of two reads is preserved by the construction of π.
• If write W precedes read R in execution σ, then W precedes R in π by construction.
• Suppose read R precedes write W in σ. Show the same is true in π.

Permutation Respects Ordering
• Suppose R and W are swapped in π:
  – There is a read R' by p_i that equals or precedes R in σ
  – There is a write W' that equals W or follows W in the to-bcast order
  – And R' "reads from" W'.
• But:
  – R' finishes before W starts in σ, and
  – updates are done to local replicas in to-bcast order (Lemma 9.3), so the update for W' does not precede the update for W
  – so R' cannot read from W'.

σ|p_i: … R' … R … W …
π: … W … W' … R' … R …

Permutation is Legal
• Consider some read R by p_i and some write W such that R reads from W.
• Suppose, in contradiction, some other write W' falls between W and R in π:

π: … W … W' … R …

• Why does R follow W' in π?

Permutation is Legal
Case 1: R follows W' in π because W' is also by p_i and R follows W' in σ.
• The update for W at p_i precedes the update for W' at p_i (Lemma 9.3).
• Thus R does not read from W, contradiction.

Permutation is Legal
Case 2: R follows W' in π due to some operation O by p_i such that
  – O precedes R in σ, and
  – O is placed between W' and R in π

π: … W … W' … O … R …

Case 2.1: O is a write.
• The update for W' at p_i precedes the update for O at p_i (Lemma 9.3).
• The update for O at p_i precedes p_i's local read for R (Lemma 9.4).
• So R does not read from W, contradiction.

Permutation is Legal

π: … W … W' … O' … O … R …

Case 2.2: O is a read.
• A recursive argument shows that there exists a read O' by p_i (which might equal O) that
  – reads from W', and
  – appears in π between W' and O.
• The update for W at p_i precedes the update for W' at p_i (Lemma 9.3).
• The update for W' at p_i precedes the local read for O' at p_i (otherwise O' would not read from W').
• Recall that O' equals or precedes O (from above) and O precedes R (by the assumption for Case 2) in σ.
• Thus R cannot read from W, contradiction.

Performance of SC Algorithm
• Read operations are implemented "locally", without requiring any inter-process communication.
• Thus reads can be viewed as "fast": the time between invocation and response is that needed for some local computation.
• The time for a write is the time for delivery of one totally ordered broadcast (it depends on how to-bcast is implemented).

Alternative SC Algorithm
• It is possible to have an algorithm that implements sequentially consistent shared memory on top of totally ordered broadcast with the reverse performance:
  – writes are local/fast (bcasts are still sent, but the writer does not wait for them to be received)
  – reads can require waiting for some bcasts to be received
• Like the previous SC algorithm, this one does not implement linearizable shared memory.

Time Complexity for DSM Algorithms
• One complexity measure of interest for DSM algorithms is how long it takes for operations to complete.
• The linearizability algorithm required D time for both reads and writes, where D is the maximum time for a totally ordered broadcast message to be received.
• The sequential consistency algorithm required D time for writes and C time for reads, where C is the time for doing some local computation.
• Can we do better? To answer this question, we need some kind of timing model.

Timing Model
• Assume the underlying communication system is the point-to-point message passing system (not totally ordered broadcast).
• Assume that every message has delay in the range [d-u, d].
• Claim: totally ordered broadcast can be implemented in this model so that D, the maximum time for delivery, is O(d).

Time and Clocks in the Layered Model
• Timed execution: associate an occurrence time with each node input event.
• Times of other events are "inherited" from the time of the triggering node input
  – recall the assumption that local processing time is negligible.
• Model hardware clocks as before: they run at the same rate as real time, but are not synchronized.
• The notions of view, timed view, and shifting are the same:
  – the Shifting Lemma still holds (it relates h/w clocks and msg delays between the original and shifted executions).

Lower Bound for SC

Let T_read = worst-case time for a read to complete.

Let T_write = worst-case time for a write to complete.

Theorem (9.7): In any simulation of sequentially consistent shared memory on top of point-to-point message passing, T_read + T_write ≥ d.

SC Lower Bound Proof
• Consider any SC simulation with T_read + T_write < d.
• Let X and Y be two shared variables, both initially 0.
• Let α0 be an admissible execution whose top layer behavior is write0(X,1) ack0(X) read0(Y) return0(Y,0)
  – the write begins at time 0, the read ends before time d
  – every msg has delay d
• Why does α0 exist?
  – The algorithm must respond correctly to any sequence of invocations.
  – Suppose the user at p0 wants to do a write, immediately followed by a read.
  – By SC, the read must return 0.
  – By assumption, the total elapsed time is less than d.

SC Lower Bound Proof
• Similarly, let α1 be an admissible execution whose top layer behavior is write1(Y,1) ack1(Y) read1(X) return1(X,0)
  – the write begins at time 0, the read ends before time d
  – every msg has delay d
• α1 exists for a similar reason.
• Now merge p0's timed view in α0 with p1's timed view in α1 to create an admissible execution α'. (Since every message has delay d and everything finishes before time d, neither process receives a message before its read ends, so neither can distinguish α' from its original execution.)
• But α' is not SC, contradiction!

SC Lower Bound Proof
(Timing diagrams from the original slides, all within the interval [0, d):)
• α0 — p0: write(X,1) then read(Y) returning 0
• α1 — p1: write(Y,1) then read(X) returning 0
• α' — the merge: p0 does write(X,1) then read(Y) returning 0, while p1 does write(Y,1) then read(X) returning 0
α' exhibits exactly the pattern shown earlier to violate sequential consistency.

Linearizability Write Lower Bound
Theorem (9.8): In any simulation of linearizable shared memory on top of point-to-point message passing, T_write ≥ u/2.
Proof: Consider any linearizable simulation with T_write < u/2.
• Let α be an admissible execution whose top layer behavior is: p1 writes 1 to X, p2 writes 2 to X, p0 reads 2 from X.
• Shift α to create an admissible execution in which p1's and p2's writes are swapped, causing p0's read to violate linearizability.

Linearizability Write Lower Bound
(Timing diagram for α, over [0, u]:) p1's write of 1 occupies [0, u/2), p2's write of 2 occupies [u/2, u), and p0 then reads 2. The delay pattern (from the original table) sets most delays to d - u/2, with one direction at d and its reverse at d - u, so α is admissible.

Linearizability Write Lower Bound
(Shifted execution:) Shift p1 later by u/2 and p2 earlier by u/2. Each delay of d - u/2 becomes either d or d - u (see the table in the original slide), so all delays stay in [d - u, d] and the shifted execution is admissible. But now p2's write of 2 occupies [0, u/2) and p1's write of 1 occupies [u/2, u): the writes are swapped, yet p0 still reads 2 after both complete, although 1 was written last — the shifted execution is not linearizable.

Linearizability Read Lower Bound
• The approach is similar to the write lower bound.
• Assume in contradiction there is an algorithm with T_read < u/4.
• Identify a particular execution:
  – fix a pattern of read and write invocations, occurring at particular times
  – fix the pattern of message delays
• Shift this execution to get one that is
  – still admissible
  – but not linearizable

Linearizability Read Lower Bound
Original execution:
• p1 reads X and gets 0 (the old value).
• Then p0 starts writing 1 to X.
• When the write is done, p0 reads X and gets 1 (the new value).
• Also, during the write, p1 and p2 alternate reading X.
• At some point, the reads stop getting the old value (0) and start getting the new value (1).

Linearizability Read Lower Bound
• Set all delays in this execution to be d - u/2.
• Now shift p2 earlier by u/2.
• Verify that the result is still admissible (every delay either stays the same or becomes d or d - u).
• But in the shifted execution, the sequence of values read is 0, 0, …, 0, 1, 0, 1, 1, …, 1
• A read of 0 strictly after a read of 1 cannot be linearized with the single write, so the shifted execution is not linearizable.

Linearizability Read Lower Bound
(Timing diagrams from the original slides:) In the original execution, p0 performs write 1 while p1 and p2 alternate non-overlapping reads, and the values read change from 0 to 1 exactly once. After shifting p2 earlier by u/2, p2's reads slide relative to p1's, producing the read sequence 0, 0, …, 0, 1, 0, 1, 1, …, 1 — a read of 0 after a read of 1, which is not linearizable.