24
Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder

Time and Global States Part 3 ECEN5053 Software Engineering of Distributed Systems University of Colorado, Boulder

Embed Size (px)

Citation preview

Time and Global StatesPart 3

ECEN5053 Software Engineering of Distributed Systems

University of Colorado, Boulder

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

2

Topics Clock synchronization Logical clocks Vector timestamps Global State

                                           

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

3

Vector Timestamps With Lamport logical clocks, nothing can be said about

the relationship between a and b simply by comparing their timestamps C(a) and C(b). Just because C(a) < C(b), doesn’t mean a happened

before b (remember concurrent events) Consider network news where processes post articles and

react to posted articles Postings are multicast to all members Want reactions delivered after associated postings

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

4

Will totally-ordered multicasting work?

That scheme does not mean that if msg B is delivered after msg A, B is a reaction to msg A. They may be completely independent.

What’s missing? If causal relationships are maintained within a group of

processes, then receipt of a reaction to an article should always follow the receipt of the article. If two items are independent, their order of delivery

should not matter at all

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

5

Vector Timestamps capture causality VT(a) < VT(b) means event a causally precedes event b. Let each process Pi maintain vector Vi such that

Vi[i] is the number of events that have occurred so far at Pi

If Vi[j] = x then Pi knows that x events have occurred at Pj

We increment Vi[i] at the occurrence of each new event that happens at process Pi

Piggyback vectors with msgs that are sent. When Pi sends msg m, it sends its current vector along as a timestamp vt.

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

6

For example P1 V1 = [3,1,2]

P2 V2 = [2,4,1]

P3 V3 = [0,2,5]

With a new event at P2, V2[2] = V2[2] + 1

If P3 receives a msg a from P1, V3[1] = vt(a)[1] + 1 where vt(a) = V1

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

7

Receiver thus learns the number of events that have occurred at Pi

Receiver is also told how many events at other processes have taken place before Pi sent message m. timestamp vt of m tells the receiver how many events in

other processes have preceded m and on which m may causally depend

When Pj receives m, it adjusts its own vector by setting each entry Vj[k] to max{Vj[k], vt[k]}

The vector now reflects the # of msgs that Pj must receive to have at least seen the same msgs that preceded the sending of m.

Vj[i] is incremented by 1 representing the event of receiving msg m as the next message from Pi

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

8

When are messages delivered?

Vector timestamps are used to deliver msgs when no causality constraints are violated.

When process Pi posts an article, it multicasts that article as a msg a with timestamp vt(a) set equal to Vi.

When another process Pj receives a, it will have adjusted its own vector such that Vj[i] > vt(a)[i]

Now suppose Pj posts a reaction by multicasting msg r with timestamp vt(r) equal to Vj. vt(r)[i] > vt(a)[i].

Both msg a and msg r will arrive at Pk in some order

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

9

When receiving r, Pk inspects timestamp vt(r) and decides to postpone delivery until all msgs that causally precede r have been received as well.

In particular, r is delivered only if the following conditions are met

1. vt(r)[j] = Vk[j] + 1

2. vt(r)[i] <= Vk [i] for all i not equal to j

1. says r is the next msg Pk was expecting from Pj

2. says Pk has seen no msg not seen by Pj when Pj sent r. In particular, Pk has already seen message a.

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

10

Controversy There has been some debate about

whether support for totally-ordered and causally-ordered multicasting should be provided as part of the message-communication layer or

whether applications should handle ordering Comm layer doesn’t know what it contains, only potential

causality2 msgs from same sender will always be marked as

causally related even if they are not Application developer may not want to think about it

Global State

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

12

Global state of a distributed system

Local state of each process and The messages that are currently in transit (sent but not

received)

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

13

Who cares, globally speaking?

When it is known that local computations have stopped and that there are no more messages in transit, the system has obviously entered a state in which no more progress will be made.deadlocked?correctly terminated?

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

14

How to record the global state Distributed snapshot

reflects a state in which the distributed system might have been

reflects a consistent global state If we have recorded that process P has received a msg

from another process Q, then we should also have recorded that process Q had actually sent the msg

The reverse condition (Q has sent a msg that P has not yet received) is allowed.

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

15

Cut! A cut represents the last event that has been recorded for

each of several processes.All recorded msg receipts have a corresponding

recorded send event An inconsistent cut would have a receipt of a msg but no

corresponding send event

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

16

The algorithm (Chandy & Lamport)

Assume the distributed system can be represented as a collection of processes connected to each other through uni-directional point-to-point communication channels.

Any process may initiate the algorithm.P records its own local state It sends a marker along each of its outgoing channels,

indicating that the receiver should participate in recording the global state

...

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

17

Chandy & Lamport algorithm, cont When process Q receives the marker through an incoming

channel C, its action depends on whether or not it has already saved its local state

If it has not it first records its local state and also sends a marker

along its own outgoing channels If it has

the marker on channel C is an indicator that Q should record the state of the channel, namely, the sequence of messages received by Q since the last time it recorded its own local state and before it received the marker.

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

18

Chandy & Lamport algorithm, cont A process has finished its part of the algorithm when it has

received a marker along each of its incoming channels and processed each one.

Its recorded local state as well as the state it recorded for each incoming channel, can be collected and sent to the process that initiated the snapshot

The initiator can subsequently analyze the current state Meanwhile, the distributed system as a whole can

continue to run normally

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

19

Photo album Because any process can initiate the algorithm, the

construction of several snapshots may be in progress at the same time

A marker is tagged with the identifier and possibly also a version number of the process that initiated the snapshot

Only after a process has received that marker through each of its incoming channels, can it finish its part in the construction of that marker’s associated snapshot

Application of a snapshot

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

21

Termination Detection If a process Q receives the marker requesting a snapshot

for the first time,considers the process that sent that marker as its

predecessor When Q completes its part of the snapshot, it sends its

predecessor a DONE msg. By recursion, when the initiator of the distributed snapshot

has received a DONE msg from all of its successors, it knows the snapshot has been completely taken

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

22

What if msgs still in transit? A snapshot may show a global state in which msgs are still

in transit Suppose a process records that it had rec’d msgs along one

of its incoming channels between the point where it had recorded its local state and the point where it received the marker through that

channel Cannot conclude the distributed computation is completed Termination requires a snapshot in which all channels are

empty

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

23

Modified algorithm When a process Q finishes its part of a snapshot, it either

returns DONE or CONTINUE to its predecessor A DONE msg is returned only when

All of Q’s successors have returned a DONE msgQ has not received any msg between the point it

recorded its own local state and the point it had received the marker along each of its incoming channels

In all other cases, Q sends a CONTINUE msg to its predecessor

November 2, 2005 ECEN5053 SW Eng of Distributed Systems, Time and Global States, Univ of Colorado

24

Modified algorithm, continued

The original initiator of the snapshot will either receive at least one CONTINUE or only DONE msgs from its successors

When only DONE messages are received, it is known that no regular msgs are in transit

Conclusion? The computation has terminated. If a CONTINUE appears, P initiates another snapshot and

continues to do so until only DONE msgs are returned.

(There are lots of other algorithms, too.)