Cilk NOW Based on a paper by Robert D. Blumofe & Philip A. Lisiecki



Organization

1. Introduction

2. The Cilk language & work-stealing scheduler

3. Cilk-NOW job architecture

4. Adaptive parallelism

5. Fault tolerance

6. Cilk-NOW macro-scheduling

7. Conclusion


Introduction: Cilk-NOW features

• Ease of use
  – Standard command-line interface for running Cilk-NOW programs.

• Adaptive parallelism
  – Machines join & retreat from a job without users having to be aware of it.

• Fault tolerance
  – Cilk programs are oblivious to:
    • Check-pointing
    • Failure detection & recovery


Introduction: Cilk-NOW features …

• Flexibility
  – Sovereignty of the workstation’s owner is preserved: the owner defines “idle”.

• Security
  – Customary Unix user security: users must have a Unix login on the system.

• Guaranteed performance
  – Uses Cilk’s thread scheduler:
    • Work-stealing
    • Provably efficient, predictable performance.


Introduction: Cilk-NOW features …

• No distributed shared memory

• No fault tolerance for I/O

• All workstations share a file system.

• Work focuses on:

– Adaptive parallelism

– Fault tolerance


Organization

1. Introduction

2. The Cilk language & work-stealing scheduler

3. Cilk-NOW job architecture

4. Adaptive parallelism

5. Fault tolerance

6. Cilk-NOW macro-scheduling

7. Conclusion


The Cilk language & work-stealing scheduler

• Cilk-NOW uses the same language and work-stealing scheduler as Cilk.

• The standard Fibonacci example follows.


Compute the nth Fibonacci Number

thread fib ( cont int k, int n )
{
    if ( n < 2 )
        send_argument ( k, n );
    else
    {
        cont int x, y;
        spawn_next sum ( k, ?x, ?y );
        spawn fib ( x, n - 1 );
        spawn fib ( y, n - 2 );
    }
}

thread sum ( cont int k, int x, int y )
{
    send_argument ( k, x + y );
}
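The continuation-passing style of the Fibonacci program can be easier to follow when the runtime pieces are spelled out. The following is a minimal single-worker sketch in Python, not the Cilk runtime: the names `Closure`, `MISSING`, `cont`, `sum_thread`, and `run` are assumptions made for illustration, and the ready pool is simplified to one LIFO deque with no stealing.

```python
from collections import deque

MISSING = object()   # marks an empty argument slot (the "?x" in Cilk)

class Closure:
    """A thread plus its argument slots; ready once every slot is filled."""
    def __init__(self, thread, args):
        self.thread = thread
        self.args = list(args)
        self.missing = sum(a is MISSING for a in self.args)  # join counter

ready = deque()      # ready closures of the single simulated worker

def spawn(thread, *args):
    """Create a child closure whose arguments are all present."""
    ready.append(Closure(thread, args))

def spawn_next(thread, *args):
    """Create a successor closure that may still wait for arguments."""
    c = Closure(thread, args)
    if c.missing == 0:
        ready.append(c)
    return c

def cont(closure, slot):
    """A continuation: a reference to one argument slot of a closure."""
    return (closure, slot)

def send_argument(k, value):
    """Fill the slot named by k; the closure becomes ready when full."""
    closure, slot = k
    closure.args[slot] = value
    closure.missing -= 1
    if closure.missing == 0:
        ready.append(closure)

def fib(k, n):
    if n < 2:
        send_argument(k, n)
    else:
        s = spawn_next(sum_thread, k, MISSING, MISSING)
        spawn(fib, cont(s, 1), n - 1)   # result goes to slot x of sum
        spawn(fib, cont(s, 2), n - 2)   # result goes to slot y of sum

def sum_thread(k, x, y):
    send_argument(k, x + y)

def run(n):
    """Run fib(n) to completion and return its value."""
    out = []
    root = spawn_next(out.append, MISSING)   # waits for the final answer
    spawn(fib, cont(root, 0), n)
    while ready:
        c = ready.pop()
        c.thread(*c.args)
    return out[0]
```

Calling `run(10)` drives the loop until no ready closure remains. In this sketch a waiting closure is reachable only through the continuations that point at it, which is the property that makes moving a waiting closure between workers delicate.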


Organization

1. Introduction

2. The Cilk language & work-stealing scheduler

3. Cilk-NOW job architecture

4. Adaptive parallelism

5. Fault tolerance

6. Cilk-NOW macro-scheduling

7. Conclusion


Cilk-NOW job architecture

• A Cilk-NOW job consists of:
  – A clearinghouse process
  – 1 or more worker processes

• Begin a job by typing the command
      CilkChouse -- pfold 3 7
  This starts a worker, which forks a clearinghouse process that:
  – Sends the job description to the macro-scheduler
  – Waits for messages from its workers.


(b) An idle machine joins the job

• Another machine’s node manager goes “idle”
• It sends a job request to the macro-scheduler
• The macro-scheduler returns the pfold job
• The node manager forks a new worker with no associated clearinghouse
• The worker registers with the pfold clearinghouse
• The clearinghouse gives the worker:
  – Its name (worker names are integers, starting from 0)
  – A list of other workers on this job
• The worker steals a closure from another worker.
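The registration and first steal can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cilk-NOW's code: the class and method names are hypothetical, direct object references stand in for network messages, the victim is picked at random, and the thief takes from the opposite end of the deque to the one the victim would work from.

```python
import random
from collections import deque

class Clearinghouse:
    """Hands out integer worker names and the current worker list."""
    def __init__(self):
        self.workers = {}     # name -> Worker
        self.next_name = 0    # names are integers, starting from 0

    def register(self, worker):
        name = self.next_name
        self.next_name += 1
        others = list(self.workers)   # names already on this job
        self.workers[name] = worker
        return name, others

class Worker:
    def __init__(self, clearinghouse):
        self.ready = deque()          # this worker's ready closures
        self.house = clearinghouse
        self.name, self.others = clearinghouse.register(self)

    def steal(self, rng=random):
        """Try to take one closure from a randomly chosen other worker."""
        if not self.others:
            return None
        victim = self.house.workers[rng.choice(self.others)]
        if not victim.ready:
            return None               # nothing to steal; caller may retry
        closure = victim.ready.popleft()
        self.ready.append(closure)
        return closure
```

A worker that joins later learns about earlier workers at registration; earlier workers only learn about the newcomer through a later update from the clearinghouse.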


(c) A no-longer idle machine retreats

• The machine’s owner touches the keyboard

• Node manager sends kill signal to its worker

• Worker catches signal:

– Offloads closures to other workers

– Un-registers from clearinghouse

– Terminates
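The retreat steps can be sketched as a plain function. This is a hypothetical Python illustration: the `registry` dictionary stands in for the clearinghouse's worker list, closures are handed out round-robin (an assumption; the slides do not say how they are distributed), and the case where no other worker remains is not covered by the slides, so the sketch simply raises.

```python
from collections import deque

class Worker:
    def __init__(self, name):
        self.name = name
        self.ready = deque()   # closures held by this worker

def retreat(worker, registry):
    """Offload the worker's closures to its peers, then un-register it."""
    peers = [w for n, w in sorted(registry.items()) if n != worker.name]
    if not peers:
        raise RuntimeError("no worker left to take the closures")
    i = 0
    while worker.ready:
        # Round-robin distribution: an assumption made for this sketch.
        peers[i % len(peers)].ready.append(worker.ready.popleft())
        i += 1
    del registry[worker.name]  # un-register; the caller then terminates
```

In a real worker this function would run from the handler for the node manager's kill signal, and each hand-off would be a message rather than a local append.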


Maintaining the worker lists

• Each worker checks in with the clearinghouse every 2 seconds.
  – If a worker’s “lease” expires (no check-in for 30 sec.), then the clearinghouse removes it from its list.

• The clearinghouse returns a list of revisions:
  – Workers to add to & delete from the local list.
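The lease and revision mechanism can be sketched as follows. This is a hypothetical Python model, not the Cilk-NOW protocol: time is passed in explicitly so the behaviour is deterministic, and the idea of an append-only revision log indexed by how many entries a worker has already applied is an assumption made for illustration. Only the 2-second and 30-second figures come from the slide.

```python
CHECK_IN_PERIOD = 2.0   # seconds between check-ins (from the slide)
LEASE = 30.0            # seconds without a check-in before removal

class Clearinghouse:
    def __init__(self):
        self.last_seen = {}   # worker name -> time of last check-in
        self.log = []         # ordered revisions: ("add" | "delete", name)

    def _expire(self, now):
        """Drop every worker whose lease has run out."""
        for name, seen in list(self.last_seen.items()):
            if now - seen > LEASE:
                del self.last_seen[name]
                self.log.append(("delete", name))

    def check_in(self, name, now, known):
        """Renew name's lease and return the revisions it has not seen.

        known is the number of log entries the worker already applied.
        """
        self._expire(now)
        if name not in self.last_seen:
            self.log.append(("add", name))
        self.last_seen[name] = now
        return self.log[known:]

def apply_revisions(local, revisions):
    """Update a worker's local set of worker names in place."""
    for op, name in revisions:
        if op == "add":
            local.add(name)
        else:
            local.discard(name)
```

Sending only revisions keeps each reply small when the worker list is long and changes rarely; a worker that crashes without un-registering is removed once its lease lapses.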


UDP

• UDP between:

– Workers

– Clearinghouse & worker

• Faster than TCP for the common case.

• No pretense of reliability when none exists.


Organization

1. Introduction

2. The Cilk language & work-stealing scheduler

3. Cilk-NOW job architecture

4. Adaptive parallelism

5. Fault tolerance

6. Cilk-NOW macro-scheduling

7. Conclusion


Adaptive parallelism

• What happens when a waiting closure gets offloaded to another worker?
  – How do send_argument invocations get their info to the moved waiting closure?

• The paper describes a notion of sub-computation, and uses this notion to handle this situation.

• To be continued …


A simple way?

• Have the waiting closure’s unfilled arguments refer to the continuations that refer to them.
  – When the waiting closure is offloaded to a new worker, the waiting closure informs its continuations of its new address.
  – For this to work, when a continuation is passed to another closure, the waiting closure is informed.
    • This may be a lot of work.

• To be continued …
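The back-pointer idea on this slide can be made concrete with a small Python sketch. It is an illustration of the "simple way" only, not the sub-computation mechanism from the paper; the class names, the `(worker, closure id)` address format, and the `note_copy` method are all hypothetical, and a direct field update stands in for what would be a message per continuation.

```python
class Continuation:
    """Names one argument slot of a waiting closure at some address."""
    def __init__(self, address, slot):
        self.address = address   # hypothetical (worker name, closure id)
        self.slot = slot

class WaitingClosure:
    def __init__(self, address, nslots):
        self.address = address
        # For each unfilled slot, the continuations that refer to it.
        self.back = [[] for _ in range(nslots)]

    def make_continuation(self, slot):
        k = Continuation(self.address, slot)
        self.back[slot].append(k)
        return k

    def note_copy(self, k):
        """Must be called whenever k is passed on to another closure."""
        copy = Continuation(k.address, k.slot)
        self.back[k.slot].append(copy)
        return copy

    def migrate(self, new_address):
        """Move the closure and inform every continuation; return the count."""
        self.address = new_address
        notified = 0
        for continuations in self.back:
            for k in continuations:
                k.address = new_address   # one message each in practice
                notified += 1
        return notified
```

The count returned by `migrate` grows with every copy of a continuation, and every copy also costs a notification back to the waiting closure, which is the overhead the slide warns about.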


Organization

1. Introduction

2. The Cilk language & work-stealing scheduler

3. Cilk-NOW job architecture

4. Adaptive parallelism

5. Fault tolerance

6. Cilk-NOW macro-scheduling

7. Conclusion


Fault tolerance

• To be continued, based on a fuller understanding of closure migration under worker retreat.


Organization

1. Introduction

2. The Cilk language & work-stealing scheduler

3. Cilk-NOW job architecture

4. Adaptive parallelism

5. Fault tolerance

6. Cilk-NOW macro-scheduling

7. Conclusion


Cilk-NOW macro-scheduling


Organization

1. Introduction

2. The Cilk language & work-stealing scheduler

3. Cilk-NOW job architecture

4. Adaptive parallelism

5. Fault tolerance

6. Cilk-NOW macro-scheduling

7. Conclusion