Cilk NOW
Based on a paper by
Robert D. Blumofe & Philip A. Lisiecki
2
Organization
1. Introduction
2. The Cilk language & work-stealing scheduler
3. Cilk-NOW job architecture
4. Adaptive parallelism
5. Fault tolerance
6. Cilk-NOW macro-scheduling
7. Conclusion
3
Introduction: Cilk-NOW features
• Ease of use: standard command-line interface for running Cilk-NOW programs.
• Adaptive parallelism: workstations joining and retreating from a job is transparent to users.
• Fault tolerance: Cilk programs are oblivious to:
– Check-pointing
– Failure detection & recovery
4
Introduction: Cilk-NOW features …
• Flexibility
– Sovereignty of the workstation’s owner is preserved: the owner defines “idle”.
• Security
– Customary Unix user security: users must have a Unix login on the system.
• Guaranteed performance: uses Cilk’s thread scheduler:
– Work-stealing
– Provably efficient, predictable performance.
5
Introduction: Cilk-NOW features …
• No distributed shared memory
• No fault tolerance for I/O
• All workstations share a file system.
• Work focuses on:
– Adaptive parallelism
– Fault tolerance
7
Cilk language & work stealing scheduler
• This is the same as Cilk.
• The standard Fibonacci example follows.
8
Compute the nth Fibonacci Number
thread fib ( cont int k, int n )
{
    if ( n < 2 )
        send_argument ( k, n );
    else
    {
        cont int x, y;
        spawn_next sum ( k, ?x, ?y );   /* successor waits for x and y */
        spawn fib ( x, n - 1 );         /* children fill x and y */
        spawn fib ( y, n - 2 );
    }
}

thread sum ( cont int k, int x, int y )
{
    send_argument ( k, x + y );
}
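To make the continuation-passing style above concrete, here is a minimal single-worker sketch in Python. It is not the Cilk runtime: the `Closure` class, the `EMPTY` marker, the `collect` thread, and `run_fib` are hypothetical names introduced only for illustration, and there is no work stealing, just one ready queue.

```python
from collections import deque

EMPTY = object()          # marks an argument slot that is not yet filled
ready = deque()           # closures whose arguments are all present
result = []               # hypothetical sink for the final answer


class Closure:
    """A thread plus its argument slots; ready once every slot is filled."""
    def __init__(self, thread, args):
        self.thread = thread
        self.args = list(args)
        # Join counter: number of slots still waiting for a value.
        self.missing = sum(a is EMPTY for a in self.args)


def spawn(thread, *args):
    """Create a closure; post it to the ready queue if nothing is missing."""
    c = Closure(thread, args)
    if c.missing == 0:
        ready.append(c)
    return c


def send_argument(k, value):
    """A continuation k is a (closure, slot index) pair; fill that slot."""
    closure, slot = k
    closure.args[slot] = value
    closure.missing -= 1
    if closure.missing == 0:
        ready.append(closure)


def fib(k, n):
    if n < 2:
        send_argument(k, n)
    else:
        s = spawn(sum_thread, k, EMPTY, EMPTY)   # spawn_next sum(k, ?x, ?y)
        spawn(fib, (s, 1), n - 1)                # spawn fib(x, n - 1)
        spawn(fib, (s, 2), n - 2)                # spawn fib(y, n - 2)


def sum_thread(k, x, y):
    send_argument(k, x + y)


def collect(value):
    result.append(value)


def run_fib(n):
    ready.clear()
    result.clear()
    root = spawn(collect, EMPTY)     # waiting closure for the final value
    spawn(fib, (root, 0), n)
    while ready:                     # a single worker draining its own queue
        c = ready.pop()
        c.thread(*c.args)
    return result[0]
```

Calling `run_fib(10)` runs the same thread structure as the Cilk code: each `sum` closure waits until both of its slots have been filled by `send_argument`.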
11
Cilk-NOW job architecture
• A Cilk-NOW job consists of:
– A clearinghouse process
– 1 or more worker processes
• Begin a job by typing the command
    CilkChouse -- pfold 3 7
• This starts a worker that forks a clearinghouse process that:
– Sends the job description to the macro-scheduler
– Waits for messages from its workers.
12
13
(b) An idle machine joins the job
• Another machine’s node manager goes “idle”
• It sends a job request to the macro-scheduler
• The macro-scheduler returns the pfold job
• The node manager forks a new worker with no associated clearinghouse
• The worker registers with the pfold clearinghouse
• The clearinghouse gives the worker:
– Its name (worker names are integers, starting from 0)
– A list of other workers on this job
• The worker steals a closure from another worker.
14
(c) A no-longer idle machine retreats
• The machine’s owner touches the keyboard
• Node manager sends kill signal to its worker
• Worker catches signal:
– Offloads closures to other workers
– Un-registers from clearinghouse
– Terminates
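The retreat steps above can be sketched as a small in-process simulation. This is an illustration under stated assumptions, not Cilk-NOW's implementation: `Clearinghouse`, `Worker`, and `retreat` are hypothetical names, closures are opaque values, offloading is round-robin (the slides do not say how closures are distributed), and a method call stands in for the kill signal.

```python
class Clearinghouse:
    """Hands out integer worker names and tracks registered workers."""
    def __init__(self):
        self.workers = {}      # name -> Worker (stands in for a network address)
        self.next_name = 0     # worker names are integers, starting from 0

    def register(self, worker):
        name = self.next_name
        self.next_name += 1
        others = sorted(self.workers)    # names of workers already on the job
        self.workers[name] = worker
        return name, others

    def unregister(self, name):
        self.workers.pop(name, None)


class Worker:
    def __init__(self, clearinghouse):
        self.clearinghouse = clearinghouse
        self.closures = []               # closures currently held by this worker
        self.name, self.peers = clearinghouse.register(self)
        self.running = True

    def retreat(self):
        """Stand-in for the handler that catches the node manager's signal."""
        targets = [self.clearinghouse.workers[n] for n in self.peers
                   if n in self.clearinghouse.workers]
        if self.closures and not targets:
            raise RuntimeError("no peer available to take the closures")
        # 1. Offload closures to other workers (round-robin is an assumption).
        for i, closure in enumerate(self.closures):
            targets[i % len(targets)].closures.append(closure)
        self.closures = []
        # 2. Un-register from the clearinghouse.
        self.clearinghouse.unregister(self.name)
        # 3. Terminate.
        self.running = False
```

For example, with three registered workers, calling `retreat()` on the third moves all of its closures to the first two and removes its name from the clearinghouse's table.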
15
Maintaining the worker lists
• Each worker checks in with the clearinghouse every 2 seconds.
• If a worker’s “lease” expires (no check-in for 30 seconds), then the clearinghouse removes it from its list.
• The clearinghouse returns a list of revisions:
– workers to add to & delete from the worker’s local list.
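The lease bookkeeping described above might look roughly like the following sketch. The 2-second and 30-second values come from the slide; everything else (the `LeaseTable` class, passing the current time in explicitly, computing revisions as a difference against what each worker was last told) is an assumption made for illustration.

```python
CHECK_IN_INTERVAL = 2.0    # seconds between worker check-ins (from the slide)
LEASE_DURATION = 30.0      # seconds without a check-in before a lease expires


class LeaseTable:
    """Clearinghouse-side view of which workers currently hold a lease."""
    def __init__(self):
        self.last_seen = {}    # worker name -> time of its last check-in
        self.told = {}         # worker name -> peer names it was last told about

    def expire(self, now):
        """Drop every worker whose lease has run out."""
        expired = [n for n, t in self.last_seen.items()
                   if now - t > LEASE_DURATION]
        for name in expired:
            del self.last_seen[name]
            self.told.pop(name, None)

    def check_in(self, name, now):
        """Renew a lease and return (workers to add, workers to delete)."""
        self.expire(now)
        self.last_seen[name] = now
        current = set(self.last_seen) - {name}
        previous = self.told.get(name, set())
        self.told[name] = current
        return sorted(current - previous), sorted(previous - current)
```

A worker would call `check_in` every `CHECK_IN_INTERVAL` seconds and apply the two returned lists to its local list of peers.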
16
UDP
• UDP between:
– Workers
– Clearinghouse & worker
• Faster than TCP for the common case.
• No pretense of reliability when none exists.
18
Adaptive parallelism
• What happens when a waiting closure gets offloaded to another worker?
– How do send_argument invocations get their info to the moved waiting closure?
• The paper describes a notion of sub-computation, and uses this notion to handle this situation.
• To be continued …
19
A simple way?
• Have the waiting closure’s unfilled arguments refer to the continuations that refer to them.
– When the waiting closure is offloaded to a new worker, the waiting closure informs its continuations of its new address.
– For this to work, when a continuation is passed to another closure, the waiting closure is informed.
• This may be a lot of work.
• To be continued …
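The naive back-pointer scheme floated on this slide can be sketched as follows. This illustrates only the slide's "simple way", not the sub-computation mechanism from the paper; `WaitingClosure`, `Continuation`, `pass_along`, and the integer worker addresses are all hypothetical.

```python
class WaitingClosure:
    """A closure with unfilled slots that knows every continuation pointing at it."""
    def __init__(self, worker, slots):
        self.worker = worker           # name of the worker currently holding it
        self.args = [None] * slots
        self.holders = []              # back-pointers to referring continuations

    def migrate(self, new_worker):
        """Offload to a new worker and inform every continuation of the move."""
        self.worker = new_worker
        for k in self.holders:
            k.address = new_worker


class Continuation:
    """Refers to one argument slot of a waiting closure."""
    def __init__(self, target, slot):
        self.target = target
        self.slot = slot
        self.address = target.worker   # where send_argument should deliver
        target.holders.append(self)    # the waiting closure is informed

    def pass_along(self):
        """Passing the continuation to another closure creates a new referrer."""
        return Continuation(self.target, self.slot)


def send_argument(k, value):
    """Deliver value to the slot; return the worker address that was used."""
    k.target.args[k.slot] = value
    return k.address
```

Every copy of a continuation costs one extra registration, and every migration costs one update per referrer, which is the overhead the slide warns about.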
21
Fault tolerance
• To be continued, based on a fuller
understanding of closure migration
under worker retreat.
23
Cilk-NOW macro-scheduling