Cilk-NOW

Based on a paper by Robert D. Blumofe & Philip A. Lisiecki


Organization

1. Introduction

2. The Cilk language & work-stealing scheduler

3. Cilk-NOW job architecture

4. Adaptive parallelism

5. Fault tolerance

6. Cilk-NOW macro-scheduling

7. Conclusion


Introduction: Cilk-NOW features

• Ease of use: a standard command-line interface for running Cilk-NOW programs.

• Adaptive parallelism: machines join and retreat from a job without the user needing to be aware of it.

• Fault tolerance: Cilk programs are oblivious to:

– Check-pointing

– Failure detection & recovery


Introduction: Cilk-NOW features …

• Flexibility: the sovereignty of each workstation’s owner is preserved; the owner defines “idle”.

• Security: customary Unix user security; users must have a Unix login on the system.

• Guaranteed performance: uses Cilk’s thread scheduler:

– Work-stealing

– Provably efficient, predictable performance.
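The work-stealing idea named above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the Cilk scheduler itself: the function name `simulate`, the range-splitting workload, and the round-based loop are all hypothetical. It only shows the general discipline in which a worker runs its own newest task and an idle worker steals the oldest task from a randomly chosen victim.

```python
import random
from collections import deque

def simulate(num_workers, n, seed=0):
    """Split the range [0, n) into unit tasks (n >= 1) and count who ran them."""
    rng = random.Random(seed)
    pools = [deque() for _ in range(num_workers)]
    pools[0].append((0, n))              # all work starts on worker 0
    done = [0] * num_workers             # unit tasks finished per worker
    steals = 0
    while any(pools):                    # stop once every pool is empty
        for w in range(num_workers):
            if pools[w]:
                lo, hi = pools[w].pop()  # owner takes its newest task
                if hi - lo == 1:
                    done[w] += 1         # unit task: just count it
                else:
                    mid = (lo + hi) // 2
                    pools[w].append((lo, mid))
                    pools[w].append((mid, hi))
            else:
                victim = rng.randrange(num_workers)  # thief picks a random victim
                if victim != w and pools[victim]:
                    # steal the victim's oldest task
                    pools[w].append(pools[victim].popleft())
                    steals += 1
    return done, steals

done, steals = simulate(num_workers=4, n=64)
print(done, steals)
```

Whatever the steal pattern, the unit tasks are each executed exactly once, so `sum(done)` equals `n`.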


Introduction: Cilk-NOW features …

• No distributed shared memory

• No fault tolerance for I/O

• All workstations share a file system.

• Work focuses on:

– Adaptive parallelism

– Fault tolerance


Cilk language & work-stealing scheduler

• Cilk-NOW uses the same language and work-stealing scheduler as Cilk.

• The standard Fibonacci example follows.


Compute the nth Fibonacci Number

thread fib ( cont int k, int n )
{
    if ( n < 2 )
        send_argument ( k, n );          /* base case: send n to continuation k */
    else
    {
        cont int x, y;
        spawn_next sum ( k, ?x, ?y );    /* successor waits until x and y arrive */
        spawn fib ( x, n - 1 );
        spawn fib ( y, n - 2 );
    }
}

thread sum ( cont int k, int x, int y )
{
    send_argument ( k, x + y );
}
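To make the continuation-passing style of the Fibonacci example concrete, here is a minimal single-worker sketch in Python. It is an illustration under assumptions, not the Cilk runtime: the names `Closure`, `Cont`, `EMPTY`, `ready`, and `run` are hypothetical, and `spawn` stands in for both `spawn` and `spawn_next`. A closure becomes ready only when all of its argument slots are filled, and `send_argument` fills one slot through a continuation.

```python
from collections import deque

EMPTY = object()     # marks an argument slot that is still unfilled
ready = deque()      # ready closures of a single simulated worker

class Closure:
    """A thread plus its argument slots; ready when no slot is EMPTY."""
    def __init__(self, thread, *args):
        self.thread = thread
        self.args = list(args)
        self.missing = sum(a is EMPTY for a in args)

class Cont:
    """A continuation: refers to one argument slot of a waiting closure."""
    def __init__(self, closure, slot):
        self.closure, self.slot = closure, slot

def spawn(thread, *args):
    c = Closure(thread, *args)
    if c.missing == 0:
        ready.append(c)              # all arguments present: runnable now
    return c

def send_argument(k, value):
    c = k.closure
    c.args[k.slot] = value
    c.missing -= 1
    if c.missing == 0:
        ready.append(c)              # last missing argument arrived

def fib(k, n):
    if n < 2:
        send_argument(k, n)
    else:
        s = spawn(sum_, k, EMPTY, EMPTY)   # like: spawn_next sum(k, ?x, ?y)
        spawn(fib, Cont(s, 1), n - 1)      # x is slot 1 of the sum closure
        spawn(fib, Cont(s, 2), n - 2)      # y is slot 2 of the sum closure

def sum_(k, x, y):
    send_argument(k, x + y)

def run(n):
    out = []
    sink = spawn(out.append, EMPTY)        # waiting closure that records the result
    spawn(fib, Cont(sink, 0), n)
    while ready:
        c = ready.pop()
        c.thread(*c.args)
    return out[0]

print(run(10))
```

Note that no thread ever blocks: `fib` returns immediately after spawning, and `sum_` runs only once both of its arguments have been sent.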


Cilk-NOW job architecture

• A Cilk-NOW job consists of:

– A clearinghouse process

– 1 or more worker processes

• Begin a job by typing the command

    CilkChouse -- pfold 3 7

This starts a worker that forks a clearinghouse process, which:

– Sends the job description to the macro-scheduler

– Waits for messages from its workers.


(b) An idle machine joins the job

• Another machine’s node manager goes “idle”.

• It sends a job request to the macro-scheduler.

• The macro-scheduler returns the pfold job.

• The node manager forks a new worker with no associated clearinghouse.

• The worker registers with the pfold clearinghouse.

• The clearinghouse gives the worker:

– Its name (worker names are integers, starting from 0)

– A list of other workers on this job

• The worker steals a closure from another worker.
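The registration step above can be sketched as a small in-memory model. This is a sketch under assumptions rather than the Cilk-NOW implementation: the class name `Clearinghouse`, its methods, and the `(host, port)` addresses are hypothetical. It shows only the two facts stated on the slide: names are integers handed out from 0, and a new worker receives the list of workers already on the job.

```python
class Clearinghouse:
    """Hypothetical in-memory model of a job's clearinghouse."""
    def __init__(self):
        self.next_name = 0
        self.workers = {}                  # worker name -> address

    def register(self, address):
        """Assign the next integer name and return it with the other workers."""
        name = self.next_name
        self.next_name += 1
        others = dict(self.workers)        # snapshot taken before adding the newcomer
        self.workers[name] = address
        return name, others

    def unregister(self, name):
        self.workers.pop(name, None)

ch = Clearinghouse()
first, _ = ch.register(("hostA", 5000))        # the worker that started the job
second, peers = ch.register(("hostB", 5000))   # an idle machine joining later
print(first, second, peers)
```

With the list of peers in hand, the new worker has somewhere to send its first steal request.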


(c) A no-longer idle machine retreats

• The machine’s owner touches the keyboard

• Node manager sends kill signal to its worker

• Worker catches signal:

– Offloads closures to other workers

– Unregisters from the clearinghouse

– Terminates
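The retreat sequence above can be sketched as follows. This is an illustration only: `Worker`, `retreat`, the round-robin offloading policy, and the `registry` set standing in for the clearinghouse are all assumptions, and the sketch assumes at least one peer remains on the job.

```python
class Worker:
    """Hypothetical worker holding a list of closures."""
    def __init__(self, name):
        self.name = name
        self.closures = []
        self.running = True

def retreat(worker, peers, registry):
    """Offload closures to peers, unregister, then terminate.

    peers must be non-empty; registry is a set of registered worker names.
    """
    for i, closure in enumerate(worker.closures):
        peers[i % len(peers)].closures.append(closure)   # spread closures round-robin
    worker.closures.clear()
    registry.discard(worker.name)                        # unregister from the clearinghouse
    worker.running = False                               # terminate

a, b, c = Worker(0), Worker(1), Worker(2)
c.closures = ["c1", "c2", "c3"]
registry = {0, 1, 2}
retreat(c, [a, b], registry)
print(a.closures, b.closures, registry)
```

The order matters: closures are handed off before the worker unregisters, so no work is lost when the process exits.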


Maintaining the worker lists

• Each worker checks in with the clearinghouse every 2 seconds.

• If a worker’s “lease” expires (no check-in for 30 seconds), then the clearinghouse removes it from its list.

• On each check-in, the clearinghouse returns a list of revisions: workers to add to & delete from the worker’s local list.
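The lease scheme above can be sketched with explicit timestamps. This is a minimal sketch under assumptions: the class `LeaseTable`, the revision log, and the `seen_upto` cursor are hypothetical ways to model "a list of revisions"; only the 2-second check-in interval and the 30-second lease come from the slide.

```python
CHECK_IN_INTERVAL = 2.0    # seconds between check-ins (from the slide; not enforced here)
LEASE = 30.0               # seconds without a check-in before a worker is dropped

class LeaseTable:
    """Hypothetical clearinghouse-side membership table with leases."""
    def __init__(self):
        self.last_seen = {}    # worker name -> time of last check-in
        self.log = []          # ordered revisions: ("add" | "delete", name)

    def expire(self, now):
        for name, t in list(self.last_seen.items()):
            if now - t > LEASE:
                del self.last_seen[name]
                self.log.append(("delete", name))

    def check_in(self, name, now, seen_upto):
        """Renew the lease; return (revisions after seen_upto, new cursor)."""
        self.expire(now)
        if name not in self.last_seen:
            self.log.append(("add", name))
        self.last_seen[name] = now
        return self.log[seen_upto:], len(self.log)

table = LeaseTable()
_, cur0 = table.check_in(0, now=0.0, seen_upto=0)
table.check_in(1, now=1.0, seen_upto=0)          # worker 1 then goes silent
revs, cur0 = table.check_in(0, now=20.0, seen_upto=cur0)
print(revs)
revs, cur0 = table.check_in(0, now=32.0, seen_upto=cur0)
print(revs)
```

Each worker applies the returned revisions to its local list, so it learns about joins and expirations without the clearinghouse resending the full membership.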


UDP

• UDP between:

– Workers

– Clearinghouse & worker

• Faster than TCP for the common case.

• No pretense of reliability when none exists.
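Because UDP gives no delivery guarantee, a sender has to decide for itself what to do when no reply arrives. The following is a minimal sketch of one common approach (timeout and resend), offered as an assumption about how such traffic can be handled and not as the Cilk-NOW protocol; the function name `request`, the payloads, and the timeout values are hypothetical. It uses only Python's standard `socket` module.

```python
import socket

def request(sock, peer_addr, payload, timeout=0.5, retries=3):
    """Send payload over UDP and wait for one reply datagram.

    Either datagram may be lost silently, so the caller resends after a
    timeout and returns None once the retries are used up.
    """
    sock.settimeout(timeout)
    for _ in range(retries):
        sock.sendto(payload, peer_addr)
        try:
            reply, _sender = sock.recvfrom(1024)
            return reply
        except socket.timeout:
            continue               # no reply in time: resend
    return None

# Usage on the loopback interface: the peer never answers, so the call gives up.
peer = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
peer.bind(("127.0.0.1", 0))
client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))
print(request(client, peer.getsockname(), b"steal", timeout=0.05, retries=2))
```

A `None` result is exactly the case the lease mechanism is designed to absorb: a peer that stays silent long enough is eventually dropped from the worker list.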


Adaptive parallelism

• What happens when a waiting closure gets offloaded to another worker?

– How do send_argument invocations get their info to the moved waiting closure?

• The paper describes a notion of sub-computation, and uses this notion to handle this situation.

• To be continued …


A simple way?

• Have the waiting closure’s unfilled arguments refer to the continuations that refer to them.

– When the waiting closure is offloaded to a new worker, the waiting closure informs its continuations of its new address.

– For this to work, when a continuation is passed to another closure, the waiting closure is informed.

• This may be a lot of work.

• To be continued …
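The "simple way" on this slide can be sketched with back-references. This is an illustration of the slide's proposal only, not the sub-computation mechanism the paper uses; the names `Continuation`, `WaitingClosure`, `pass_continuation`, and `offload`, and the string worker addresses, are hypothetical.

```python
class Continuation:
    """Names one argument slot of a waiting closure and where that closure lives."""
    def __init__(self, owner_address, slot):
        self.owner_address = owner_address
        self.slot = slot

class WaitingClosure:
    def __init__(self, address, num_slots):
        self.address = address
        # For each unfilled slot, the continuations that currently refer to it.
        self.referrers = [[] for _ in range(num_slots)]

    def new_continuation(self, slot):
        k = Continuation(self.address, slot)
        self.referrers[slot].append(k)
        return k

    def pass_continuation(self, k):
        """Copy k for another closure; the waiting closure records the copy."""
        return self.new_continuation(k.slot)

    def offload(self, new_address):
        """Move to a new worker and inform every continuation; return message count."""
        self.address = new_address
        messages = 0
        for slot_refs in self.referrers:
            for k in slot_refs:
                k.owner_address = new_address
                messages += 1
        return messages

w = WaitingClosure("worker-0", num_slots=2)
kx = w.new_continuation(0)
ky = w.new_continuation(1)
kx_copy = w.pass_continuation(kx)     # continuation handed to another closure
print(w.offload("worker-3"))
```

The returned count makes the slide's concern visible: every copy of every continuation costs one notification per migration, and every hand-off costs one more to keep the back-references current.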


Fault tolerance

• To be continued, based on a fuller understanding of closure migration under worker retreat.


Cilk-NOW macro-scheduling
