Art of Multiprocessor Programming
Futures, Scheduling, and Work Distribution
Companion slides for The Art of Multiprocessor Programming
by Maurice Herlihy & Nir Shavit
Modified by Rajeev Alur for CIS 640, University of Pennsylvania
How to write Parallel Apps?
• How to
– Split a program into parallel parts
– In an effective way
– Manage the threads
Matrix Multiplication
(Figure: matrix product C = A · B)
Matrix Multiplication

c[i][j] = Σ_{k=0}^{N-1} a[i][k] · b[k][j]
Matrix Multiplication

  class Worker extends Thread {      // a thread
    int row, col;                    // which matrix entry to compute
    Worker(int row, int col) {
      this.row = row; this.col = col;
    }
    public void run() {              // actual computation
      double dotProduct = 0.0;
      for (int i = 0; i < n; i++)
        dotProduct += a[row][i] * b[i][col];
      c[row][col] = dotProduct;
    }
  }
Matrix Multiplication

  void multiply() {
    Worker[][] worker = new Worker[n][n];
    for (int row …)                  // create n x n threads
      for (int col …)
        worker[row][col] = new Worker(row, col);
    for (int row …)                  // start them
      for (int col …)
        worker[row][col].start();
    for (int row …)                  // wait for them to finish
      for (int col …)
        worker[row][col].join();
  }

What's wrong with this picture?
Thread Overhead
• Threads require resources
– Memory for stacks
– Setup, teardown
• Scheduler overhead
• Worse for short-lived threads
Thread Pools
• More sensible to keep a pool of long-lived threads
• Threads are assigned short-lived tasks; each thread
– Runs the task
– Rejoins the pool
– Waits for the next assignment
Thread Pool = Abstraction
• Insulates the programmer from the platform
– Big machine, big pool
– And vice versa
• Portable code
– Runs well on any platform
– No need to mix algorithm and platform concerns
ExecutorService Interface
• In java.util.concurrent
• Task = Runnable object
– If no result value is expected
– Executor calls its run() method
• Task = Callable<T> object
– If a result value of type T is expected
– Executor calls its T call() method
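A minimal sketch of both task kinds in action (class and variable names here are our own, not from the slides):

```java
import java.util.concurrent.*;

public class ExecutorDemo {
    // Submit a Callable<Integer>: the executor calls call(), and the
    // Future's get() blocks until the Integer result is available.
    static int compute() throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        Callable<Integer> sum = () -> 1 + 2 + 3;
        Future<Integer> future = executor.submit(sum);
        int value = future.get();            // blocks until the value is ready

        // Submit a Runnable: no result value; the executor calls run(),
        // and get() on its Future<?> just waits for completion.
        Runnable hello = () -> System.out.println("task ran");
        executor.submit(hello).get();

        executor.shutdown();
        return value;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(compute());       // prints "task ran" then 6
    }
}
```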
Future<T>

  Callable<T> task = …;
  …
  // submitting a Callable<T> task returns a Future<T> object
  Future<T> future = executor.submit(task);
  …
  // the Future's get() method blocks until the value is available
  T value = future.get();
Future<?>

  Runnable task = …;
  …
  // submitting a Runnable task returns a Future<?> object
  Future<?> future = executor.submit(task);
  …
  // the Future's get() method blocks until the computation is complete
  future.get();
Matrix Addition

  | C00  C01 |   | A00 + B00   A01 + B01 |
  | C10  C11 | = | A10 + B10   A11 + B11 |

4 parallel additions
Matrix Addition Task

  class AddTask implements Runnable {
    Matrix a, b; // add this!
    public void run() {
      if (a.dim == 1) {
        c[0][0] = a[0][0] + b[0][0];   // base case: add directly
      } else {
        // partition a, b into half-size matrices aij and bij
        // (a constant-time operation)
        Future<?> f00 = exec.submit(add(a00, b00));   // submit 4 tasks
        …
        Future<?> f11 = exec.submit(add(a11, b11));
        f00.get(); …; f11.get();                      // let them finish
        …
      }
    }
  }
Dependencies
• The matrix example is not typical: its tasks are independent
– Don't need the results of one task …
– … to complete another
• Often tasks are not independent
Fibonacci
  F(n) = 1                   if n = 0 or 1
  F(n) = F(n-1) + F(n-2)     otherwise

• Note
– Potential parallelism
– Dependencies
Disclaimer
• This Fibonacci implementation is egregiously inefficient
– So don't deploy it!
– But it illustrates our point: how to deal with dependencies
• Exercise: make this implementation efficient!
Multithreaded Fibonacci

  class FibTask implements Callable<Integer> {
    static ExecutorService exec = Executors.newCachedThreadPool();
    int arg;
    public FibTask(int n) { arg = n; }
    public Integer call() throws Exception {
      if (arg > 2) {
        // parallel calls
        Future<Integer> left  = exec.submit(new FibTask(arg-1));
        Future<Integer> right = exec.submit(new FibTask(arg-2));
        // pick up & combine results
        return left.get() + right.get();
      } else {
        return 1;
      }
    }
  }
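The slide code compiles and runs once the checked exceptions from Future.get() are declared on call(); here is a self-contained variant (renamed Fib to keep it independent of the slide's class):

```java
import java.util.concurrent.*;

// Same logic as the slide's FibTask: F(1) = F(2) = 1, and larger
// arguments fork two subtasks and combine their results.
class Fib implements Callable<Integer> {
    // A cached pool grows on demand, so a task blocked in get()
    // cannot starve its own children of threads.
    static ExecutorService exec = Executors.newCachedThreadPool();
    int arg;
    Fib(int n) { arg = n; }
    public Integer call() throws Exception {
        if (arg > 2) {
            Future<Integer> left  = exec.submit(new Fib(arg - 1));
            Future<Integer> right = exec.submit(new Fib(arg - 2));
            return left.get() + right.get();   // pick up & combine results
        } else {
            return 1;                          // base case
        }
    }
    public static void main(String[] args) throws Exception {
        System.out.println(new Fib(10).call()); // prints 55
        exec.shutdown();
    }
}
```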
Dynamic Behavior
• A multithreaded program is
– A directed acyclic graph (DAG)
– That unfolds dynamically
• Each node is a single unit of work
Fib DAG

(Figure: the DAG unfolded by fib(4). fib(4) submits fib(3) and fib(2); fib(3) submits fib(2) and fib(1); each fib(2) submits two fib(1) tasks. "submit" edges fan out to subtasks, and "get" edges return their results.)

Arrows reflect dependencies.
How Parallel is That?
• Define work: total time on one processor
• Define critical-path length: the longest dependency path
– Can't beat that!
Fib Work

(Figure: the fib(4) DAG with its nodes numbered 1 through 17.)

Work is 17.
Fib Critical Path

(Figure: the fib(4) DAG with the nodes of its longest dependency path numbered 1 through 8.)

Critical-path length is 8.
Notation Watch
• T_P = time on P processors
• T_1 = work (time on 1 processor)
• T_∞ = critical-path length (time on ∞ processors)
Simple Bounds
• T_P ≥ T_1 / P
– In one step, can't do more than P units of work
• T_P ≥ T_∞
– Can't beat infinite resources
More Notation Watch
• Speedup on P processors
– Ratio T_1 / T_P
– How much faster with P processors
• Linear speedup
– T_1 / T_P = Θ(P)
• Max speedup (average parallelism)
– T_1 / T_∞
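As a worked instance of these definitions, take the fib(4) DAG analyzed above, which has work 17 and critical-path length 8:

```latex
T_1 = 17, \qquad T_\infty = 8, \qquad
\text{max speedup} = \frac{T_1}{T_\infty} = \frac{17}{8} \approx 2.1
```

So even with unboundedly many processors, this fib(4) computation can run at most about 2.1 times faster than on one processor.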
Matrix Addition

  | C00  C01 |   | A00 + B00   A01 + B01 |
  | C10  C11 | = | A10 + B10   A11 + B11 |

4 parallel additions
Addition
• Let A_P(n) be the running time
– For an n x n matrix
– On P processors
• For example
– A_1(n) is the work
– A_∞(n) is the critical-path length
Addition

• Work is

  A_1(n) = 4 A_1(n/2) + Θ(1)
         = Θ(n²)

(4 spawned additions, plus Θ(1) to partition, synch, etc.)
Same as double-loop summation.
Addition

• Critical-path length is

  A_∞(n) = A_∞(n/2) + Θ(1)
         = Θ(log n)

(The spawned additions run in parallel, so only one half-size term appears; Θ(1) to partition, synch, etc.)
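Both closed forms follow by unrolling the recurrences (a standard divide-and-conquer expansion, with c standing in for the Θ(1) term):

```latex
A_1(n) = 4\,A_1(n/2) + c
       = 4^2\,A_1(n/4) + 4c + c
       = \dots
       = 4^{\log_2 n}\,A_1(1) + c\sum_{i=0}^{\log_2 n - 1} 4^i
       = \Theta\!\left(4^{\log_2 n}\right) = \Theta(n^2)
```

```latex
A_\infty(n) = A_\infty(n/2) + c
            = c\,\log_2 n + A_\infty(1)
            = \Theta(\log n)
```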
Matrix Multiplication Redux

(Figure: matrix product C = A · B)
Matrix Multiplication Redux

Split each matrix into four half-size blocks:

  | C11  C12 |   | A11  A12 |   | B11  B12 |
  | C21  C22 | = | A21  A22 | · | B21  B22 |
First Phase …

Compute the eight half-size block products:

  C11 = A11·B11 + A12·B21    C12 = A11·B12 + A12·B22
  C21 = A21·B11 + A22·B21    C22 = A21·B12 + A22·B22

8 multiplications
Second Phase …

Combine the eight block products pairwise into the four blocks of C:

4 additions
Multiplication

• Work is

  M_1(n) = 8 M_1(n/2) + A_1(n)
         = 8 M_1(n/2) + Θ(n²)
         = Θ(n³)

(8 half-size multiplications, plus the final addition.)
Same as the serial triple-nested loop.
Multiplication

• Critical-path length is

  M_∞(n) = M_∞(n/2) + A_∞(n)
         = M_∞(n/2) + Θ(log n)
         = Θ(log² n)

(The half-size multiplications run in parallel, followed by the final addition.)
Parallelism
• M_1(n) / M_∞(n) = Θ(n³ / log² n)
• To multiply two 1000 x 1000 matrices
– 1000³ / 10² = 10⁹ / 10² = 10⁷
• Much more than the number of processors on any real machine
Work Distribution
Work Dealing
The Problem with Work Dealing
Work Stealing

(Cartoon: a thread with no work steals a task from a busy one.)
Lock-Free Work Stealing
• Each thread has a pool of ready work
• Remove work without synchronizing
• If you run out of work, steal someone else’s
• Choose victim at random
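The bullets above can be sketched as a worker loop. This is a policy-only illustration with hypothetical names: the per-thread pools are stood in by plain ArrayDeques (pollLast plays popBottom, pollFirst plays popTop), not by the lock-free DEQueue developed below, so it shows the steal-from-a-random-victim strategy rather than the synchronization.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Random;

public class StealLoop {
    // Run thread `me`'s loop to completion; returns how many tasks it executed.
    static int run(List<Deque<Runnable>> pools, int me, Random rnd) {
        int executed = 0;
        while (true) {
            Runnable task = pools.get(me).pollLast();       // take own work (bottom end)
            if (task == null) {
                int victim = rnd.nextInt(pools.size());     // choose victim at random
                task = pools.get(victim).pollFirst();       // steal from the top end
            }
            if (task == null) {
                boolean allEmpty = true;                    // no work anywhere: stop
                for (Deque<Runnable> p : pools) allEmpty &= p.isEmpty();
                if (allEmpty) return executed;
                continue;                                   // otherwise retry a victim
            }
            task.run();
            executed++;
        }
    }

    public static void main(String[] args) {
        List<Deque<Runnable>> pools = List.of(new ArrayDeque<>(), new ArrayDeque<>());
        for (int i = 0; i < 5; i++) pools.get(1).addLast(() -> {}); // thread 1 has all the work
        int done = run(pools, 0, new Random());  // thread 0 starts empty and must steal
        System.out.println(done);                 // prints 5
    }
}
```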
Local Work Pools
Each work pool is a Double-Ended Queue
Work DEQueue¹

(Figure: a queue of work items; the owner operates at the bottom end via pushBottom and popBottom.)

1. Double-Ended Queue
Obtain Work
• Obtain work with popBottom
• Run the task until it blocks or terminates
New Work
• New work arises when a node is unblocked or spawned
• Add it to the pool with pushBottom
Whatcha Gonna do When the Well Runs Dry?
(The local DEQueue is empty.)
Steal Work from Others

Pick a random thread's DEQueue.
Steal this Task!

Steal from the top of the victim's DEQueue with popTop.
Task DEQueue

• Methods
– pushBottom
– popBottom
– popTop
• pushBottom and popBottom are called only by the owner thread, so they never happen concurrently with each other
• They are also the most common operations — make them fast (minimize use of CAS)
Ideal
• Wait-free
• Linearizable
• Constant time
Compromise
• Method popTop may fail if
– A concurrent popTop succeeds, or
– A concurrent popBottom takes the last task
• Blame the victim!
Dreaded ABA Problem

(Figure sequence: a thief reads top and prepares its CAS. Meanwhile the owner pops every task and pushes new ones, so top returns to its old value. The thief's CAS then succeeds on the stale index — "Yes!" — taking a task other than the one it read. Uh-oh …)
Fix for Dreaded ABA

(Figure: the DEQueue with a stamp attached to the top index; bottom marks the other end.)

Pair top with a stamp that is incremented on every change, so a stale top value can no longer match.
Bounded DEQueue

  public class BDEQueue {
    AtomicStampedReference<Integer> top;  // index & stamp (synchronized)
    volatile int bottom;   // index of bottom task: no need to synchronize,
                           // but do need a memory barrier (hence volatile)
    Runnable[] tasks;      // array holding the tasks
    …
  }
pushBottom()

  public class BDEQueue {
    …
    void pushBottom(Runnable r) {
      tasks[bottom] = r;   // bottom is the index at which to store the new task
      bottom++;            // adjust the bottom index
    }
    …
  }
Steal Work

  public Runnable popTop() {
    int[] stamp = new int[1];
    int oldTop = top.get(stamp), newTop = oldTop + 1;   // read top (value & stamp)
    int oldStamp = stamp[0], newStamp = oldStamp + 1;   // compute new value & stamp
    if (bottom <= oldTop)           // quit if queue is empty
      return null;
    Runnable r = tasks[oldTop];
    if (top.compareAndSet(oldTop, newTop, oldStamp, newStamp))  // try to steal the task
      return r;
    return null;                    // give up if a conflict occurs
  }
Take Work

  Runnable popBottom() {
    if (bottom == 0)         // make sure queue is non-empty
      return null;
    bottom--;                // prepare to grab the bottom task
    Runnable r = tasks[bottom];
    int[] stamp = new int[1];
    int oldTop = top.get(stamp), newTop = 0;    // read top, & prepare new values
    int oldStamp = stamp[0], newStamp = oldStamp + 1;
    if (bottom > oldTop)     // if top & bottom are 1 or more apart, no conflict
      return r;
    if (bottom == oldTop) {  // at most one task left
      // Try to take the last task. Always reset bottom, because the
      // DEQueue will be empty even if the CAS is unsuccessful (why?)
      bottom = 0;
      if (top.compareAndSet(oldTop, newTop, oldStamp, newStamp))
        return r;            // I win the CAS: the task is mine
      // If I lose the CAS, a thief must have won and taken the task
    }
    top.set(newTop, newStamp);  // failed to get the last task: must still reset top
    return null;
  }
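Putting the pieces together, here is a compilable version of the slide code with a single-threaded sanity check. The constructor and main method are our additions. One subtlety: AtomicStampedReference.compareAndSet compares the Integer by reference, which is safe here because we pass back the exact Integer object we read.

```java
import java.util.concurrent.atomic.AtomicStampedReference;

public class BDEQueue {
    AtomicStampedReference<Integer> top = new AtomicStampedReference<>(0, 0);
    volatile int bottom = 0;
    Runnable[] tasks;

    public BDEQueue(int capacity) { tasks = new Runnable[capacity]; }

    void pushBottom(Runnable r) {            // owner only
        tasks[bottom] = r;
        bottom++;
    }

    public Runnable popTop() {               // called by thieves
        int[] stamp = new int[1];
        Integer oldTop = top.get(stamp);     // keep the exact reference we read
        int oldStamp = stamp[0];
        if (bottom <= oldTop) return null;   // empty
        Runnable r = tasks[oldTop];
        if (top.compareAndSet(oldTop, oldTop + 1, oldStamp, oldStamp + 1))
            return r;
        return null;                         // lost a race: give up
    }

    public Runnable popBottom() {            // owner only
        if (bottom == 0) return null;
        bottom--;
        Runnable r = tasks[bottom];
        int[] stamp = new int[1];
        Integer oldTop = top.get(stamp);
        int oldStamp = stamp[0], newStamp = oldStamp + 1;
        if (bottom > oldTop) return r;       // no conflict possible
        if (bottom == oldTop) {              // last task: race with thieves
            bottom = 0;
            if (top.compareAndSet(oldTop, 0, oldStamp, newStamp))
                return r;
        }
        top.set(0, newStamp);                // lost the race: still reset top
        return null;
    }

    public static void main(String[] args) {
        BDEQueue q = new BDEQueue(8);
        Runnable a = () -> {}, b = () -> {}, c = () -> {};
        q.pushBottom(a); q.pushBottom(b); q.pushBottom(c);
        System.out.println(q.popTop() == a);     // thief steals the oldest task
        System.out.println(q.popBottom() == c);  // owner takes the newest
        System.out.println(q.popBottom() == b);  // last task, via the CAS path
        System.out.println(q.popBottom());       // empty: null
    }
}
```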
Variations
• Stealing is expensive
– Pay a CAS
– Only one task is taken
• What if we randomly balance loads instead?
Work Stealing & Balancing
• Clean separation between application & scheduling layers
• Works well when the number of processors fluctuates
• Works on "black-box" operating systems