
1

Efficient Dependency Tracking for Relevant Events in Shared Memory Systems

Anurag Agarwal (anurag@cs.utexas.edu)

Vijay K. Garg (garg@ece.utexas.edu)

PDS Lab, University of Texas at Austin

2

Outline

Motivation

Background

Chain Clock

Instances of Chain Clock

Experimental Results

Conclusion

3

Motivation

Dependency between events is required for global state information

Applications like monitoring and debugging

Vector clocks [Fidge 88, Mattern 89]

O(N) operations for a system with N processes

Dynamic creation of processes


5

Relevant Events

Events “useful” for the application

Predicate detection

“There are no messages in the channel”

[Figure: processes p1, p2, p3, p4]

6

Vector Clocks [Fidge 88, Mattern 89]

Assigns an N-tuple (V) to every relevant event

e → f iff e.V < f.V (clock condition)

Process Pi: V = (0, …, 0)

On an event e:

I. If e is a receive of message m: V = max(V, m.V)

II. If e is a relevant event: V[i] = V[i] + 1

III. If e is a send of message m: m.V = V
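The three rules above can be sketched in Python as follows. This is a minimal illustration, not code from the talk: the class name VectorClockProcess, the helper vc_less, and the 0-based process indices are assumptions made for the example.

```python
class VectorClockProcess:
    """Process Pi in a system of n processes (indices 0..n-1, an assumption)."""

    def __init__(self, i, n):
        self.i = i
        self.V = [0] * n          # V = (0, ..., 0)

    def receive(self, m_V):
        # Rule I: component-wise maximum with the message timestamp.
        self.V = [max(a, b) for a, b in zip(self.V, m_V)]

    def relevant_event(self):
        # Rule II: increment own component; the result stamps the event.
        self.V[self.i] += 1
        return tuple(self.V)

    def send(self):
        # Rule III: the message carries a copy of V.
        return tuple(self.V)


def vc_less(u, v):
    """u < v: u <= v in every component and u != v."""
    return all(a <= b for a, b in zip(u, v)) and u != v


p1, p2 = VectorClockProcess(0, 2), VectorClockProcess(1, 2)
e = p1.relevant_event()           # (1, 0)
g = p2.relevant_event()           # (0, 1), concurrent with e
p2.receive(p1.send())
f = p2.relevant_event()           # (1, 2)
print(vc_less(e, f), vc_less(e, g))   # True False
```

Every operation touches all N components, which is the O(N) cost noted in the motivation.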


8

Key Idea

Any chain in the computation poset can function as a process

[Figure: events a through h on processes p1 to p4, regrouped into the chains a b c d and e f g h]

9

Chain Clocks

A component in the timestamp corresponds to a chain

Change “Rule II” in the vector clock algorithm:

II. If e is a relevant event: V[e.c] = V[e.c] + 1, where e.c is the component (chain) assigned to e

Theorem: Chain clocks guarantee the “clock condition”

Goal: Online decomposition of the poset into as few chains as possible

10

Outline

Motivation

Background

Chain Clock

Instances of Chain Clock: DCC, ACC, VCC

Experimental Results

Conclusion

11

Dynamic Chain Clocks (DCC)

Shared vector Z maintains up-to-date values of all components

Each process starts with an empty vector

Rule II:

e.c = j such that Z[j] = V[j], giving preference to the component last updated by Pi

V[e.c] = V[e.c] + 1

12

DCC: Example

I. If e is a receive of message m: V = max(V, m.V)

II. If e is a relevant event:

e.c = i s.t. Z[i] = V[i]

V[e.c] = V[e.c] + 1

Z[e.c] = Z[e.c] + 1

III. If e is a send of message m: m.V = V
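A minimal Python sketch of these DCC rules follows. Several details are assumptions rather than statements from the slides: the names SharedZ and DCCProcess, the lock guarding Z, treating missing components as 0, breaking ties by lowest index, and opening a new component when no i with Z[i] = V[i] exists (the last one is suggested by the example, where timestamps grow from (1) to (0,1)).

```python
import threading


class SharedZ:
    """Shared vector Z: Z[j] is the latest value of component j."""

    def __init__(self):
        self.Z = []
        self.lock = threading.Lock()


class DCCProcess:
    def __init__(self, shared):
        self.shared = shared
        self.V = []        # each process starts with an empty vector
        self.last = None   # component this process last incremented

    def _pad(self, n):
        # Assumption: components a process has not seen yet count as 0.
        self.V.extend([0] * (n - len(self.V)))

    def receive(self, m_V):
        # Rule I: component-wise maximum.
        self._pad(len(m_V))
        for j, x in enumerate(m_V):
            self.V[j] = max(self.V[j], x)

    def relevant_event(self):
        # Rule II, performed atomically with respect to Z.
        with self.shared.lock:
            Z = self.shared.Z
            self._pad(len(Z))
            free = [j for j in range(len(Z)) if Z[j] == self.V[j]]
            if self.last in free:
                c = self.last        # prefer the component last updated here
            elif free:
                c = free[0]          # assumed tie-break: lowest index
            else:
                Z.append(0)          # assumed: open a new component
                self.V.append(0)
                c = len(Z) - 1
            self.V[c] += 1
            Z[c] += 1
            self.last = c
            return tuple(self.V)

    def send(self):
        # Rule III: the message carries a copy of V.
        return tuple(self.V)


shared = SharedZ()
p1, p2 = DCCProcess(shared), DCCProcess(shared)
print(p1.relevant_event())   # (1,)
print(p2.relevant_event())   # (0, 1)
p1.receive(p2.send())        # V1 = max{(1), (0,1)} = (1, 1)
print(p1.relevant_event())   # (2, 1)
```

With a different tie-break the later timestamps can differ, so this run matches only the first steps of the example.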

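[Figure: DCC example run with processes p1, p2, p3, their vectors V1, V2, V3, and the shared vector Z. The first event of p1 is stamped (1) and that of p2 is stamped (0,1); a receive computes (1,1) = max{(1),(0,1)}; later events are stamped (2,1), (3,1), and (3,2).]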

13

Problem

Number of processes can be much larger than minimal number of chains

[Figure: example with processes p1 to p4 and timestamps (1); (0,1), (1,2); (0,1,1), (1,2,2); (0,1,1,1), (1,2,2,2)]

14

Optimal Chain Decomposition

Antichain: Set of pairwise concurrent elements

Width: Maximum size of an antichain

Dilworth’s Theorem [1950]: A poset of width k can be partitioned into k chains and no fewer.

Requires knowledge of the complete poset
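To make the definitions concrete, here is a brute-force Python sketch that computes the width of a small poset. It is an illustration only: the function name width, the less callback, and the example poset of points ordered component-wise are assumptions, and the exhaustive search is exponential, so it suits toy inputs only.

```python
from itertools import combinations


def width(elements, less):
    """Size of the largest antichain, found by exhaustive search."""

    def concurrent(a, b):
        return not less(a, b) and not less(b, a)

    best = 0
    for r in range(1, len(elements) + 1):
        found = any(
            all(concurrent(a, b) for a, b in combinations(s, 2))
            for s in combinations(elements, r)
        )
        if not found:
            break          # no antichain of size r, so none larger either
        best = r
    return best


def less(a, b):
    # Component-wise order on pairs (hypothetical example poset).
    return a != b and a[0] <= b[0] and a[1] <= b[1]


points = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(width(points, less))   # 2
```

Here (1,0) and (0,1) form an antichain of size 2, and the poset splits into the two chains (0,0) (1,0) (1,1) and (0,1), as Dilworth’s theorem predicts. The whole poset has to be available up front, which is the drawback noted on the slide.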

15

Online Chain Decomposition

Elements of the poset are presented in a total order consistent with the poset

Assign elements to chains as they arrive

Can be modeled as a game between

Bob: Presents elements

Alice: Assigns them to chains

Felsner [1997]: For a poset of width k, Bob can force Alice to use k(k+1)/2 chains

16

Chain Partitioning Algorithm (ACC)

Felsner gave an algorithm which meets the k(k+1)/2 bound

Our algorithm is simpler and more efficient

B1 … Bk: |Bi| = i

For an element z:

Insert z into the first queue q in Bi with head < z

Swap queues in Bi and Bi-1, leaving q in its place

[Figure: queue sets B1, B2, B3 and an incoming element z]
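The queue-swapping step is terse on the slide, so here is one possible reading of it as a Python sketch. It rests on assumptions that the slide does not state: an empty queue accepts any element, the sets are scanned in the order B1, B2, …, Bk, and "swap" means that Bi-1 and Bi exchange their queues except for q, which stays in Bi. The names ACCDecomposer, insert, and chains are hypothetical.

```python
class ACCDecomposer:
    def __init__(self, k, less):
        self.less = less
        # B[i] holds i + 1 queues, so |B1| = 1, ..., |Bk| = k.
        # Each queue is a list whose last item is its head.
        self.B = [[[] for _ in range(i + 1)] for i in range(k)]

    def insert(self, z):
        for i, Bi in enumerate(self.B):
            for q in Bi:
                # Assumption: an empty queue accepts any element.
                if not q or self.less(q[-1], z):
                    q.append(z)
                    if i > 0:
                        # Exchange Bi-1 with Bi minus q; q stays in Bi.
                        rest = [r for r in Bi if r is not q]
                        self.B[i - 1], self.B[i] = rest, self.B[i - 1] + [q]
                    return q
        raise ValueError("no queue accepts z (width larger than k?)")

    def chains(self):
        return [q for Bi in self.B for q in Bi if q]


def less(a, b):
    # Component-wise order on pairs (hypothetical example poset, width 2).
    return a != b and a[0] <= b[0] and a[1] <= b[1]


acc = ACCDecomposer(2, less)
for z in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]:
    acc.insert(z)
print(acc.chains())
# [[(1, 1), (2, 1)], [(0, 0), (1, 0)], [(0, 1)]]
```

On this input the sketch uses 3 chains for a poset of width 2, which stays within k(k+1)/2 = 3.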

17

Drawback of DCC and ACC

Require a shared data structure

Monitoring applications generally need a central server

Hybrid clocks

Multiple servers, each responsible for a subset of processes

Finds chains within a process group

18

Shared Memory System

Accesses to shared variables induce dependencies

Observation: Access events for a shared variable form a chain

Variable-based Chain Clocks (VCC)

Associate a component with every variable
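One way to realize this idea is sketched below in Python. The slide only states that each variable gets a component, so the rest is an assumption made for illustration: each shared variable stores the timestamp of its last relevant access, a thread merges that timestamp into its own before incrementing the variable's component, and clocks are dictionaries keyed by variable name. The names SharedVar, VCCThread, and happened_before are hypothetical.

```python
import threading


class SharedVar:
    def __init__(self, name, value=0):
        self.name = name
        self.value = value
        self.clock = {}                 # timestamp of the last relevant access
        self.lock = threading.Lock()


class VCCThread:
    def __init__(self):
        self.V = {}                     # component per variable, missing = 0

    def write(self, var, value):
        with var.lock:
            # Accesses to var form a chain: inherit the previous access.
            for k, x in var.clock.items():
                self.V[k] = max(self.V.get(k, 0), x)
            # Advance the component that belongs to this variable.
            self.V[var.name] = self.V.get(var.name, 0) + 1
            var.value = value
            var.clock = dict(self.V)
            return dict(self.V)


def happened_before(u, v):
    keys = set(u) | set(v)
    return (all(u.get(k, 0) <= v.get(k, 0) for k in keys)
            and any(u.get(k, 0) < v.get(k, 0) for k in keys))


x, y = SharedVar("x"), SharedVar("y")
t1, t2 = VCCThread(), VCCThread()
e1 = t1.write(x, 1)      # {'x': 1}
e2 = t2.write(y, 1)      # {'y': 1}, concurrent with e1
e3 = t2.write(x, 2)      # {'y': 1, 'x': 2}
print(happened_before(e1, e3), happened_before(e1, e2))   # True False
```

The timestamp size here depends on the number of variables written, not on the number of threads.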

19

VCC Application: Predicate Detection

Predicate: (x = 1) and (y = 1)

Only events changing x and y are relevant

Associate one component of the VCC with x and the other with y

Initially: x = 0, y = 0

[Figure: computation with the assignment events x = 0, x = 1, x = 2, x = 1, y = 1, y = 2]


21

Experiments

Setup

A multithreaded application

Each thread generates a sequence of events

Parameters:

Number of processes

Number of events

Probability of a relevant event

Metrics

Number of components used

Execution time

22

Components Used

Events = 100, probability of relevant event = 1%

23

Execution Time

Events = 100, probability of relevant event = 1%

24

Effect of Relevancy

Threads = 100, Events = 100

25

Conclusion

Generalized vector clocks to a class of algorithms called Chain Clocks

Dynamic Chain Clocks (DCC) can provide tremendous speedup and reduce memory requirements for applications

Antichain-based Chain Clocks (ACC) meet the k(k+1)/2 lower bound for online chain decomposition

26

Questions?


28

Example: Poset of width 2

For a poset of width 2, Bob can force Alice to use 3 chains

[Figure: poset of width 2 whose elements are labeled with their assigned chains: 1, 2, 1, 3]


32

Happened Before Relation (→) [Lamport 78]

Distributed computation with N processes

Every process executes a series of events

Internal, send or receive event

[Figure: processes p1 and p2]

e → f if there is a path from e to f

e ║ f if there is no path between e and f

33

Future work

Lower bound for online chain decomposition when a decomposition into N chains is already known

Other chain decomposition strategies

34

Distributed System: Time vs Threads

Events = 100, probability of relevant event = 1%

35

Distributed System: Events vs Time

Threads = 100, probability of relevant event = 1%

36

Effect of Number of Events

Threads = 100, probability of relevant event = 1%


40

Example for DCC: is it appropriate?

Is the content a bit too much for this amount

Where can I reduce it? Remove VCC or ACC?

Chain clock

Generalizes vector clocks

Reduces the time and memory overhead

Elegantly handles dynamic process creation
