Operating Systems, 2015, Meni Adler, Danny Hendler & Roie Zivan 1

Distributed Synchronization: outline

Introduction

Causality and time

o Lamport timestamps

o Vector timestamps

o Causal communication

Snapshots

This presentation is based on the book: “Distributed operating-systems & algorithms” by Randy Chow and Theodore Johnson

Distributed systems

A distributed system is a collection of independent computational nodes, communicating over a network, that is abstracted as a single coherent system

o Grid computing

o Cloud computing (“infrastructure as a service”, “software as a service”)

o Peer-to-peer computing

o Sensor networks

o …

A distributed operating system allows sharing of resources and coordination of distributed computation in a transparent manner

[Figure: (a), (b) – a distributed system; (c) – a multiprocessor]

Distributed synchronization

Distributed synchronization underlies distributed operating systems and algorithms

Processes communicate via message passing (no shared memory)

Inter-node coordination in distributed systems is challenged by

o Lack of global state

o Lack of global clock

o Communication links may fail

o Messages may be delayed or lost

o Nodes may fail

Distributed synchronization supports correct coordination in distributed systems

o Shared-memory based locks and semaphores can no longer be used

Distributed computation model

Events

o Sending a message

o Receiving a message

o Timeout, internal interrupt

Processors send control messages to each other

o send(destination, action; parameters)

Processes may declare that they are waiting for events:

o Wait for A1, A2, …, An
    A1(source; parameters)
        code to handle A1
    …
    An(source; parameters)
        code to handle An

Causality and events ordering

A distributed system has no global state nor global clock ⇒ no global order on all events may be determined

Each processor knows a total order on the events occurring in it

There is a causal relation between the sending of a message and its receipt

Lamport's happened-before relation H

1. e1 <p e2 ⇒ e1 <H e2 (events within the same processor are ordered)

2. e1 <m e2 ⇒ e1 <H e2 (each message m is sent before it is received)

3. e1 <H e2 AND e2 <H e3 ⇒ e1 <H e3 (transitivity)

Leslie Lamport (1978): “Time, clocks, and the ordering of events in a distributed system”
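
The three rules of the happened-before relation can be sketched as a small transitive-closure computation. This is only an illustration: the function name happened_before and the events a1, a2, b1, b2, c1 are hypothetical and are not the events of the slides' figures.

```python
from itertools import product

def happened_before(process_events, messages):
    """Return the set of pairs (a, b) such that a <H b."""
    hb = set()
    # Rule 1: events within the same processor are ordered.
    for events in process_events.values():
        hb.update(zip(events, events[1:]))
    # Rule 2: each message is sent before it is received.
    hb.update(messages)
    # Rule 3: transitivity, applied until no new pair appears.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb

# Hypothetical events, listed per processor in local order.
process_events = {"P1": ["a1", "a2"], "P2": ["b1", "b2"], "P3": ["c1"]}
messages = [("a1", "b2"), ("b2", "c1")]  # (send event, receive event)
hb = happened_before(process_events, messages)

print(("a1", "c1") in hb)                          # True (by transitivity)
print(("a2", "b1") in hb or ("b1", "a2") in hb)    # False (concurrent events)
```

Two events that are unrelated in either direction, such as a2 and b1 here, are concurrent.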

Causality and events ordering (cont'd)

[Figure: space-time diagram of processors P1, P2, P3 along a time axis, with events e1–e8]

e1 <H e7 ? Yes. e1 <H e3 ? Yes. e1 <H e8 ? Yes.

e5 <H e7 ? No.

Lamport's timestamp algorithm

Initially my_TS=0

Upon event e,
    if e is the receipt of message m
        my_TS=max(m.TS, my_TS)
    my_TS++
    e.TS=my_TS
    if e is the sending of message m
        m.TS=my_TS

To create a total order, ties are broken by process ID
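
A minimal Python sketch of the timestamp rule above, assuming integer process IDs. The class name LamportClock and the method event are illustrative choices, not part of the slides.

```python
class LamportClock:
    def __init__(self, pid):
        self.pid = pid
        self.my_ts = 0  # "Initially my_TS=0"

    def event(self, received_ts=None):
        """Timestamp one event; pass received_ts when the event is a message receipt."""
        if received_ts is not None:
            self.my_ts = max(received_ts, self.my_ts)
        self.my_ts += 1
        # (timestamp, process ID): tuple comparison breaks ties by process ID.
        return (self.my_ts, self.pid)

p1, p2 = LamportClock(1), LamportClock(2)
a = p1.event()                     # local event on P1
send = p1.event()                  # P1 sends m
m_ts = p1.my_ts                    # m.TS is the sender's timestamp after the send
b = p2.event()                     # local event on P2, concurrent with a
recv = p2.event(received_ts=m_ts)  # P2 receives m

print(a, send, b, recv)  # (1, 1) (2, 1) (1, 2) (3, 2)
```

Events a and b carry the same counter value, and the pair comparison orders them by process ID, which is the tie-breaking rule stated above.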

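Lamport's timestamps (cont'd)

[Figure: the space-time diagram of P1, P2, P3 with events e1–e8, each labeled with its Lamport timestamp (written as timestamp.process ID); labels shown: 1.1, 2.1, 3.1, 1.3, 2.2, 3.2, 4.3, 1.2]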

Vector timestamps - motivation

Lamport's timestamps define a total order

o e1 <H e2 ⇒ e1.TS < e2.TS

o However, the converse, e1.TS < e2.TS ⇒ e1 <H e2, does not hold in general (concurrent events are ordered arbitrarily)

Definition: Message m1 causally precedes message m2 (written as m1 <c m2) if s(m1) <H s(m2) (sending m1 happens before sending m2)

Definition: A causality violation occurs if m1 <c m2 but r(m2) <p r(m1).

In other words, m1 is sent to processor p before m2 but is received after it.

Lamport's timestamps do not allow us to detect (and hence to prevent) causality violations.

Causality violation – an example

[Figure: message diagram between P1, P2, P3 with messages M1, M2, M3; labels shown: “Migrate O to P2”, “Where is O?”, “On p2”, “Where is O?”, “I don’t know”, “ERROR!”]

Causality violation between… M1 and M3.

Vector timestamps

Initially my_VT=[0,…,0]

Upon event e,
    if e is the receipt of message m
        for i=1 to M
            my_VT[i]=max(m.VT[i], my_VT[i])
    my_VT[self]++
    e.VT=my_VT
    if e is the sending of message m
        m.VT=my_VT

e1.VT ≤V e2.VT if and only if e1.VT[i] ≤ e2.VT[i], for every i=1,…,M

e1.VT <V e2.VT if and only if e1.VT ≤V e2.VT and e1.VT ≠ e2.VT

For vector timestamps it does hold that: e1.VT <V e2.VT ⇒ e1 <H e2
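
The update rule can be sketched in Python as follows. The class name VectorClock and the method event are illustrative, and processes are numbered from 0 here rather than from 1 as on the slides.

```python
class VectorClock:
    def __init__(self, pid, m):
        self.pid = pid          # index of this process, 0-based in this sketch
        self.my_vt = [0] * m    # "Initially my_VT=[0,...,0]"

    def event(self, received_vt=None):
        """Timestamp one event; pass received_vt when the event is a message receipt."""
        if received_vt is not None:
            # Component-wise maximum with the timestamp carried by the message.
            self.my_vt = [max(a, b) for a, b in zip(received_vt, self.my_vt)]
        self.my_vt[self.pid] += 1
        return tuple(self.my_vt)  # e.VT; a sent message carries this value as m.VT

p0 = VectorClock(0, 3)
p2 = VectorClock(2, 3)
first = p0.event()                    # local or send event on process 0
query = p2.event()                    # process 2 sends a message
after = p0.event(received_vt=query)   # process 0 receives it

print(first, query, after)  # (1, 0, 0) (0, 0, 1) (2, 0, 1)
```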

An example of a vector timestamp

[Figure: space-time diagram of processors P1–P4 with events numbered per processor; the marked event e has e.VT=(3,6,4,2)]

Comparison of vector timestamps

[Figure: the space-time diagram of processors P1–P4 with events numbered per processor and three marked events e1, e2, e3]

e1.VT=(5,4,1,3) e2.VT=(3,6,4,2) e3.VT=(0,0,1,3)
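
The three timestamps above can be compared mechanically. The sketch below applies the ≤V and <V definitions; the helper names vt_leq, vt_less and concurrent are illustrative.

```python
def vt_leq(a, b):
    """a <=V b: every component of a is at most the matching component of b."""
    return all(x <= y for x, y in zip(a, b))

def vt_less(a, b):
    """a <V b: a <=V b and a != b."""
    return vt_leq(a, b) and tuple(a) != tuple(b)

def concurrent(a, b):
    """Neither timestamp is smaller than the other."""
    return not vt_less(a, b) and not vt_less(b, a)

e1 = (5, 4, 1, 3)
e2 = (3, 6, 4, 2)
e3 = (0, 0, 1, 3)

print(vt_less(e3, e1))     # True: e3 precedes e1
print(concurrent(e1, e2))  # True: 5 > 3 but 4 < 6
print(concurrent(e2, e3))  # True: 3 > 0 but 2 < 3
```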

VTs can be used to detect causality violations

[Figure: the causality-violation example between P1, P2, P3 (messages M1, M2, M3), with each event labeled by its vector timestamp; labels shown: (1,0,0), (0,0,1), (2,0,1), (3,0,1), (3,0,2), (3,0,3), (3,1,3), (3,2,3), (3,3,3), (3,2,4)]

Causality violation between… M1 and M3.
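
One way to sketch the detection: a processor keeps the send timestamps of the messages it has received, in arrival order, and flags any message whose timestamp is smaller than that of a message that arrived earlier. The function name causality_violations is illustrative, and the pairing of M1 with (1,0,0) and M3 with (3,0,3) is an assumption reconstructed from the timestamps listed for the figure, not something the extracted text states.

```python
def vt_less(a, b):
    """a <V b: component-wise <= and not equal."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)

def causality_violations(received):
    """received: (name, send timestamp) pairs in the order one processor received them.

    Returns pairs (m1, m2) where m1 causally precedes m2 but was received after it.
    """
    violations = []
    for i, (late_name, late_vt) in enumerate(received):
        for early_name, early_vt in received[:i]:
            if vt_less(late_vt, early_vt):
                violations.append((late_name, early_name))
    return violations

# Assumed arrival order at P2: M3 arrives before M1.
at_p2 = [("M3", (3, 0, 3)), ("M1", (1, 0, 0))]
violations = causality_violations(at_p2)
print(violations)  # [('M1', 'M3')]
```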

Preventing causality violations

A processor cannot control the order in which it receives messages…

But it may control the order in which they are delivered to applications

[Figure: protocol stack with three layers (Application, FIFO ordering, Message passing); labels on the FIFO ordering layer: “Deliver in FIFO order”, “Assign timestamps”]

Message arrival order and the resulting action:

o x1: deliver x1

o x2: deliver x2

o x4: buffer x4

o x5: buffer x5

o x3: deliver x3, x4, x5

Protocol for FIFO message delivery (as in, e.g., TCP)
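
The buffering behaviour in this example can be sketched with per-sender sequence numbers. This is a simplified illustration for a single sender; the class name FifoReceiver and the numbering from 1 are assumptions.

```python
class FifoReceiver:
    def __init__(self):
        self.next_seq = 1   # sequence number of the next message to deliver
        self.buffer = {}    # out-of-order messages, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one arriving message and return the messages delivered as a result."""
        self.buffer[seq] = payload
        delivered = []
        while self.next_seq in self.buffer:
            delivered.append(self.buffer.pop(self.next_seq))
            self.next_seq += 1
        return delivered

receiver = FifoReceiver()
arrivals = [1, 2, 4, 5, 3]  # arrival order from the example above
deliveries = [receiver.receive(seq, f"x{seq}") for seq in arrivals]
print(deliveries)  # [['x1'], ['x2'], [], [], ['x3', 'x4', 'x5']]
```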

Preventing causality violations (cont'd)

Senders attach a timestamp to each message

Destination delays the delivery of out-of-order messages

Hold back a message m until we are assured that no message m’ <H m will be delivered

o For every other process p, maintain the earliest timestamp of a message m that may be delivered from p

o Do not deliver a message if an earlier message may still be delivered from another process

Algorithm assumes multicast communication

Algorithm for preventing causality violations

earliest[1..M] initially [ <1,0,…,0>, <0,1,…,0>, …, <0,0,…,1> ]
blocked[1..M] initially [ {}, …, {} ]

Upon the receipt of message m from processor p
    delivery_list = {}
    if (blocked[p] is empty)
        earliest[p] = m.timestamp
    add m to the tail of blocked[p]
    while (∃k such that blocked[k] is non-empty AND ∀i ∈ {1,…,M} (except k and self): not_earlier(earliest[i], earliest[k]))
        remove the message at the head of blocked[k], put it in delivery_list
        if blocked[k] is non-empty
            earliest[k] ← m'.timestamp, where m' is at the head of blocked[k]
        else
            increment the k'th element of earliest[k]
    end while
    deliver the messages in delivery_list, in causal order

earliest[p]: for each processor p, the earliest timestamp with which a message from p may still be delivered

blocked[p]: for each process p, the messages from p that were received but not yet delivered

delivery_list: the list of messages to be delivered as a result of m’s receipt

Upon receipt of m: if there are no blocked messages from p, update earliest[p] to m's timestamp

After m is added to the tail of blocked[p], m is the most recent message from p not yet delivered

While condition: there is a process k with delayed messages for which it is now safe to deliver its earliest message

Loop body: remove k's earliest message from the blocked queue and make sure it is delivered (put it in delivery_list)

If there are additional blocked messages of k, update earliest[k] to be the timestamp of the earliest such message

Otherwise, the earliest message that may still be delivered from k would have the previous timestamp with the k'th component incremented

Finally, deliver the set of messages that will not cause a causality violation (if there are any)

Execution of the algorithm as multicast

[Figure: space-time diagram of P1, P2, P3; m1 is multicast with timestamp (0,1,0) and m2 is multicast with timestamp (1,1,0); P3 receives m2 before m1]

Since the algorithm is “interested” only in the causal order of sending events, the vector timestamp is incremented only upon send events.

p3 state:

                      earliest[1]  earliest[2]  earliest[3]
Initially             (1,0,0)      (0,1,0)      (0,0,1)
Upon receipt of m2    (1,1,0)      (0,1,0)      (0,0,1)
Upon receipt of m1    (1,1,0)      (0,1,0)      (0,0,1)
Deliver m1            (1,1,0)      (0,2,0)      (0,0,1)
Deliver m2            (2,1,0)      (0,2,0)      (0,0,1)
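
The receiver side of the algorithm can be sketched in Python and replayed on the P3 execution above. Several points are assumptions of this sketch: the class name CausalReceiver, 0-based process indices (P1 is 0, P2 is 1, P3 is 2), the reading of not_earlier(a, b) as "a is not smaller than b in the <V order", and the reading that m1 comes from P2 with timestamp (0,1,0) and m2 from P1 with timestamp (1,1,0).

```python
from collections import deque

def vt_less(a, b):
    """a <V b: component-wise <= and not equal."""
    return all(x <= y for x, y in zip(a, b)) and tuple(a) != tuple(b)

def not_earlier(a, b):
    # Assumed meaning: timestamp a does not precede timestamp b.
    return not vt_less(a, b)

class CausalReceiver:
    def __init__(self, self_id, m):
        self.self_id = self_id
        self.m = m
        # earliest[i]: earliest timestamp with which a message from i may still be delivered.
        self.earliest = [tuple(1 if j == i else 0 for j in range(m)) for i in range(m)]
        # blocked[i]: messages from i that were received but not yet delivered.
        self.blocked = [deque() for _ in range(m)]

    def _deliverable(self):
        """Return some k whose head message is safe to deliver, or None."""
        for k in range(self.m):
            if self.blocked[k] and all(
                not_earlier(self.earliest[i], self.earliest[k])
                for i in range(self.m) if i not in (k, self.self_id)
            ):
                return k
        return None

    def receive(self, p, timestamp, payload):
        """Handle the receipt of a message from p; return the messages delivered."""
        delivery_list = []
        if not self.blocked[p]:
            self.earliest[p] = tuple(timestamp)
        self.blocked[p].append((tuple(timestamp), payload))
        while (k := self._deliverable()) is not None:
            _, msg = self.blocked[k].popleft()
            delivery_list.append(msg)
            if self.blocked[k]:
                self.earliest[k] = self.blocked[k][0][0]
            else:
                bumped = list(self.earliest[k])
                bumped[k] += 1
                self.earliest[k] = tuple(bumped)
        return delivery_list

p3 = CausalReceiver(self_id=2, m=3)
after_m2 = p3.receive(0, (1, 1, 0), "m2")   # m2 arrives first and is held back
state_after_m2 = list(p3.earliest)
after_m1 = p3.receive(1, (0, 1, 0), "m1")   # m1 arrives; both are delivered in causal order
print(after_m2, after_m1, p3.earliest)
```

Under these assumptions the sketch holds m2 back until m1 arrives, and the earliest vectors after each receipt match the rows of the p3 state listing.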

Snapshots motivation – phantom deadlock

Assume we would like to implement a distributed deadlock-detector

We record process states to check if there is a waits-for cycle

[Figure: message diagram between P1, r1, r2 and P2 (request, OK and release messages) with observation points 1–4, shown next to two waits-for graphs over p1, p2, r1, r2: the actual waits-for graph and the observed waits-for graph]

What is a snapshot?

Global system state:

o S=(s1, s2, …, sM) – local processor states

o The contents Li,j=(m1, m2, …, mk) of each communication channel Ci,j (channels are assumed to be FIFO). These are messages sent but not yet received

Global state must be consistent:

o If we observe in state si that pi received message m from pk, then in observation sk, pk must have sent m.

o Each Li,j must contain exactly the set of messages sent by pi but not yet received by pj, as reflected by si, sj.

• Snapshot state must be consistent: one that might have existed during the computation.

• Observations must be mutually concurrent, that is, no observation causally precedes another observation (a consistent cut)

Snapshot algorithm – informal description

Upon joining the algorithm, a process records its local state

The process that initiates the snapshot sends snapshot tokens to its neighbors (before sending any other messages)

o Neighbors send them to their neighbors – broadcast

Upon receiving a snapshot token:

o a process records its state prior to sending/receiving additional messages

o Must then send tokens to all its other neighbors

How shall we record sets Lp,q?

o q receives a token from p and that is the first time q receives a token: Lp,q={ }

o q receives a token from p but q received a token before: Lp,q={all messages received by q from p since q received the token}

Snapshot algorithm – data structures

The algorithm supports multiple ongoing snapshots – one per process

Different snapshots from the same process are distinguished by version number

Per process variables

    integer my_version initially 0
    integer current_snap[1..M] initially [0,…,0]
    integer tokens_received[1..M]
    processor_state S[1..M]
    channel_state L[1..M][1..M]

my_version: the version number of my current snapshot

current_snap[r] contains the version number of the snapshot initiated by processor r

tokens_received[r] contains the number of tokens received for the snapshot initiated by processor r

S[r] (of type processor_state) contains the state recorded for the snapshot initiated by processor r

L[r][q] (of type channel_state) records the state of the channel from q to the current processor for the snapshot initiated by processor r

Snapshot algorithm – pseudo-code

execute_snapshot()

Wait for a snapshot request or a token

Snapshot request:
    my_version++, current_snap[self]=my_version
    S[self] ← my_state
    for each outgoing channel q, send(q, TOKEN; self, my_version)
    tokens_received[self]=0

TOKEN(q; r, version):
    if current_snap[r] < version
        S[r] ← my_state
        current_snap[r]=version
        L[r][q] ← empty, send TOKEN(r, version) on each outgoing channel
        tokens_received[r] ← 1
    else
        tokens_received[r]++
        L[r][q] ← all messages received from q since first receiving TOKEN(r, version)
    if tokens_received[r] = #incoming channels, local snapshot for (r, version) is finished
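
A small simulation of the pseudo-code, given as a sketch under several assumptions that are not in the slides: the class name Process, the helper send_amount, an application state that is a single integer balance, FIFO channels modelled as deques in a shared dictionary, identical incoming and outgoing neighbor sets, and at most one snapshot version per initiator.

```python
from collections import defaultdict, deque

class Process:
    def __init__(self, pid, neighbors, channels):
        self.pid = pid
        self.neighbors = neighbors       # assumed: same set for incoming and outgoing
        self.channels = channels         # shared dict: (src, dst) -> FIFO deque
        self.state = 10                  # hypothetical application state (a balance)
        self.my_version = 0
        self.current_snap = defaultdict(int)
        self.tokens_received = defaultdict(int)
        self.S = {}                      # S[r]: local state recorded for r's snapshot
        self.L = defaultdict(dict)       # L[r][q]: recorded contents of channel q -> self
        self.since_token = defaultdict(lambda: defaultdict(list))
        self.finished = set()

    def _send(self, dst, msg):
        self.channels[(self.pid, dst)].append(msg)

    def send_amount(self, dst, amount):
        self.state -= amount
        self._send(dst, ("MSG", amount))

    def start_snapshot(self):
        self.my_version += 1
        self.current_snap[self.pid] = self.my_version
        self.S[self.pid] = self.state
        for q in self.neighbors:
            self._send(q, ("TOKEN", self.pid, self.my_version))
        self.tokens_received[self.pid] = 0

    def receive(self, src):
        msg = self.channels[(src, self.pid)].popleft()
        if msg[0] == "MSG":
            self.state += msg[1]
            for r in self.S:                     # snapshots this process has joined
                if src not in self.L[r]:         # channel src -> self still open for r
                    self.since_token[r][src].append(msg[1])
            return
        _, r, version = msg
        if self.current_snap[r] < version:       # first token of this snapshot
            self.S[r] = self.state
            self.current_snap[r] = version
            self.L[r][src] = []
            for q in self.neighbors:
                self._send(q, ("TOKEN", r, version))
            self.tokens_received[r] = 1
        else:
            self.tokens_received[r] += 1
            self.L[r][src] = list(self.since_token[r][src])
        if self.tokens_received[r] == len(self.neighbors):
            self.finished.add((r, version))

channels = defaultdict(deque)
p0, p1 = Process(0, [1], channels), Process(1, [0], channels)
p1.send_amount(0, 3)    # 3 units are in flight on channel 1 -> 0
p0.start_snapshot()     # p0 records 10 and sends a token to p1
p1.receive(0)           # token: p1 records 7 and sends a token back, behind the 3 units
p0.receive(1)           # the 3 units arrive after p0 recorded its state
p0.receive(1)           # token: channel 1 -> 0 is recorded as [3]
total = p0.S[0] + p1.S[0] + sum(p0.L[0][1]) + sum(p1.L[0][0])
print(total)            # 20
```

The recorded local states (10 and 7) plus the recorded channel contents (3) add up to the 20 units that exist in the system, which is what a consistent global state requires in this toy setting.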


Lines 1-2: increment the version number of this snapshot and record my local state.


Lines 3-4: send the snapshot token on all outgoing channels and initialize the number of tokens received for my snapshot to 0.


TOKEN(q; r,version): executed upon receipt, from q, of a TOKEN for snapshot (r,version).


Line 5: check whether this is the first token for snapshot (r,version).


Lines 6-7: record the local state for the snapshot initiated by r and update the version number of r's snapshot.


Lines 8-9: the set of messages on the channel from q is empty; send the token on all outgoing channels and initialize the number of received tokens to 1.


Line 10: this is not the first token for (r,version).


Line 11: count yet another token for snapshot (r,version).


Line 12: these messages are the state of the channel from q for snapshot (r,version).


Line 13: once all tokens of snapshot (r,version) have arrived, the local snapshot computation is over.


Sample execution of the snapshot algorithm

A token passing system

Each processor receives, in its turn, a privilege token, uses it to perform privileged operations and then passes it on

Assume processor p has not received the token for a long time, so it invokes the snapshot algorithm to find out why

[Figure: processors p and q]


1. p requests a snapshot: State(p)={}

2. q sends the privilege token t: State(p)={}

3. Snapshot token arrives at q: State(p)={}; State(q)={}, Lp,q={}

4. Snapshot token arrives at p: State(p)={}, Lq,p={t}; State(q)={}, Lp,q={}

Observed state: State(p)={}, State(q)={}, Lp,q={}, Lq,p={t}
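The four steps can be replayed by hand in Python to see where the privilege token ends up. This is a sketch under assumptions rather than part of the original slides: channels are modelled as FIFO queues, the string `"SNAP"` stands for the snapshot token, q is assumed to hold the privilege token `t` at the start, and the names `replay`, `chan`, `holds` and `seen_by_p` are hypothetical.

```python
from collections import deque


def replay():
    # FIFO channels between the two processors.
    chan = {("p", "q"): deque(), ("q", "p"): deque()}
    holds = {"p": set(), "q": {"t"}}  # assumption: q holds privilege token t initially
    S, L = {}, {}                     # recorded local states and channel states

    # 1. p requests a snapshot: record State(p), send the snapshot token to q.
    S["p"] = set(holds["p"])
    chan[("p", "q")].append("SNAP")
    seen_by_p = []                    # messages p receives from q after recording

    # 2. q sends the privilege token to p.
    holds["q"].remove("t")
    chan[("q", "p")].append("t")

    # 3. The snapshot token arrives at q (its first token): record State(q),
    #    mark channel p -> q as empty, forward the snapshot token.
    assert chan[("p", "q")].popleft() == "SNAP"
    S["q"] = set(holds["q"])
    L[("p", "q")] = []
    chan[("q", "p")].append("SNAP")

    # 4. p drains channel q -> p in FIFO order until the snapshot token arrives.
    while True:
        msg = chan[("q", "p")].popleft()
        if msg == "SNAP":
            L[("q", "p")] = seen_by_p
            break
        holds["p"].add(msg)
        seen_by_p.append(msg)
    return S, L


S, L = replay()
print(S)
print(L)
```

Under these assumptions neither recorded local state contains the privilege token; it is recorded as a message in transit on the channel from q to p, which tells p that the token was not lost.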