INSTITUTE OF COMPUTING
TECHNOLOGY
An Adaptive Task Creation Strategy for Work-Stealing Scheduling
Lei Wang, Huimin Cui, Yuelu Duan, Fang Lu, Xiaobing Feng, Pen-Chung Yew
ICT, Chinese Academy of Sciences, China
University of Minnesota, U.S.A.
Forecast
Adaptive task granularity
fine-grained parallelism
tasks
Multi-cores
An adaptive task creation strategy
Work-stealing
Outline
An adaptive task creation strategy
A new data attribute -- taskprivate
Evaluations
Conclusions
Background
Cilk, Cilk++, X10, OpenMP 3.0, TBB, TPL …
  Parallel programming languages and libraries that support task-level parallelism
  Programmer: divides work into tasks instead of threads
  Runtime system: maps and schedules tasks onto physical threads
Key technique: work-stealing scheduling
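To make the work-stealing idea concrete, here is a minimal single-threaded sketch of a per-worker task deque in C. It is an illustration only, not the Cilk or AdaptiveTC runtime: the names `Deque`, `deque_push`, `deque_pop`, and `deque_steal` are hypothetical, tasks are plain integer ids, and the synchronization a real runtime needs is omitted. The owner works at the bottom (newest tasks) and a thief takes from the top (oldest tasks).

```c
#include <assert.h>
#include <stddef.h>

#define DEQUE_CAP 64

/* Hypothetical per-worker deque of task ids (no locking: sketch only). */
typedef struct {
    int items[DEQUE_CAP];
    int top;    /* index of the oldest task, the end thieves steal from */
    int bottom; /* one past the newest task, the end the owner uses */
} Deque;

static void deque_init(Deque *d) {
    assert(d != NULL);
    d->top = 0;
    d->bottom = 0;
}

/* Owner: push a newly created task. Returns 0 if the deque is full. */
static int deque_push(Deque *d, int task) {
    if (d->bottom == DEQUE_CAP) return 0;
    d->items[d->bottom++] = task;
    return 1;
}

/* Owner: take the newest task. Returns 0 if the deque is empty. */
static int deque_pop(Deque *d, int *task) {
    if (d->bottom == d->top) return 0;
    *task = d->items[--d->bottom];
    return 1;
}

/* Thief: take the oldest task. Returns 0 if there is nothing to steal. */
static int deque_steal(Deque *d, int *task) {
    if (d->bottom == d->top) return 0;
    *task = d->items[d->top++];
    return 1;
}
```

In a recursive computation the oldest task is usually the one closest to the root, so a successful steal tends to hand the thief a large piece of work.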
Granularity
  Too fine: scheduling overhead dominates
  Too coarse: potential parallelism is lost, causing starvation
[Figure: task trees with cut-off = 3 and cut-off = 1]
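The effect of a depth-based cut-off on granularity can be sketched by counting how many tasks it would create. The C function below is a hypothetical illustration, assuming a binary recursion shaped like fib(n) and assuming that every call at depth less than or equal to the cut-off becomes a task while deeper calls are ordinary function calls. The slides do not define the cut-off this precisely, so treat the exact rule as an assumption.

```c
#include <assert.h>

/* Count the tasks created for a fib(n)-shaped recursion.
   Assumption: calls at depth <= cutoff are tasks; deeper calls are
   plain function calls and create no tasks. */
static long count_tasks(int n, int depth, int cutoff) {
    assert(n >= 0 && depth >= 0);
    if (depth > cutoff) return 0; /* past the cut-off: plain calls only */
    if (n < 2) return 1;          /* a leaf task */
    return 1 + count_tasks(n - 1, depth + 1, cutoff)
             + count_tasks(n - 2, depth + 1, cutoff);
}
```

For n = 10 this gives 3 tasks with a cut-off of 1, 15 tasks with a cut-off of 3, and 177 tasks when the cut-off is deeper than the whole tree. A small cut-off keeps scheduling overhead low but leaves few tasks to steal, which is the trade-off named above.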
An unbalanced computation tree
P0: red, P1: blue, P2: green, P3: yellow
A cut-off strategy
P0: red, P1: blue, P2: green, P3: yellow
Load imbalance
An adaptive task creation strategy -- AdaptiveTC
A special task
P0: red, P1: blue, P2: green, P3: yellow
AdaptiveTC
When executing a spawn statement, the runtime creates one of:
  a task
  a fake task (a plain function call)
  a special task
Adaptively switching between tasks and fake tasks to get better performance:
  a task -> a fake task: cut-off
  a fake task -> a task: a special task
Goals:
  Keeping idle threads busy
  Improving performance
  Good load balancing
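The choice made at each spawn can be pictured as a small decision function. The sketch below is a guess at the shape of that decision, not the AdaptiveTC implementation: the names `SpawnMode` and `choose_spawn_mode`, the depth-based cut-off test, and the `idle_thread_waiting` flag are all assumptions introduced for illustration.

```c
#include <assert.h>

/* The three outcomes of a spawn named on the slide. */
typedef enum {
    SPAWN_TASK,        /* a real, stealable task */
    SPAWN_FAKE_TASK,   /* a plain function call */
    SPAWN_SPECIAL_TASK /* switches a fake task back to creating tasks */
} SpawnMode;

/* Hypothetical policy:
   - above the cut-off, create real tasks;
   - past the cut-off, fall back to plain function calls;
   - past the cut-off, if some thread is idle, create a special task
     so that stealable work becomes available again. */
static SpawnMode choose_spawn_mode(int depth, int cutoff,
                                   int idle_thread_waiting) {
    assert(depth >= 0 && cutoff >= 0);
    if (depth < cutoff) return SPAWN_TASK;
    if (idle_thread_waiting) return SPAWN_SPECIAL_TASK;
    return SPAWN_FAKE_TASK;
}
```

Under this assumed policy, most spawns below the cut-off cost no more than a function call, and the more expensive paths are taken only near the root or when a thread has run out of work.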
N-queen problem: which Cilk programs are correct?

(1)
    cilk int nqueens(int depth, int n, char x[]) {
        ...
        tmpx = (char *)malloc(n * sizeof(char));
        memcpy(tmpx, x, n * sizeof(char));
        sn += spawn nqueens(depth + 1, n, tmpx);
        free(tmpx);
        ...
        sync;
        return sn;
    }

(2)
    cilk int nqueens(int depth, int n, char x[]) {
        ...
        tmpx = (char *)malloc(n * sizeof(char));
        memcpy(tmpx, x, n * sizeof(char));
        sn += spawn nqueens(depth + 1, n, tmpx);
        ...
        sync;
        free(x);
        return sn;
    }

(3)
    cilk int nqueens(int depth, int n, char x[]) {
        ...
        tmpx = Cilk_alloca(n * sizeof(char));
        memcpy(tmpx, x, n * sizeof(char));
        sn += spawn nqueens(depth + 1, n, tmpx);
        ...
        sync;
        return sn;
    }
A new data attribute -- taskprivate
Workspace copying
  Not easy to program
  Overhead is high
taskprivate
  Introduced for workspace variables
An AdaptiveTC program for nqueens

    cilk int nqueens(int depth, int n, char x[])
    taskprivate: (x[]) (n * sizeof(char));
    {
        int sn = 0;
        if (depth >= n) { sn++; return sn; }
        for (j = 0; j < n; j++) {
            if (place(depth, j, x)) {
                x[depth] = j;
                sn += spawn nqueens(depth + 1, n, x);
            }
        }
        sync;
        return sn;
    }

In a fake task (a function call):

    x[depth] = j;
    sn += nqueens(depth + 1, n, x);

In a task:

    x[depth] = j;
    tmpx = Cilk_alloca(n * sizeof(char));
    memcpy(tmpx, x, n * sizeof(char));
    sn += nqueens(depth + 1, n, tmpx);
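The two expansions above can be compared in plain sequential C. The sketch below is not AdaptiveTC code: there is no `spawn`, `sync`, or `taskprivate`, the `place` helper is a hypothetical implementation (the slides do not show it), and `malloc`/`free` stand in for `Cilk_alloca`. It only illustrates that the in-place path (the fake task) and the copying path (the task) compute the same count, while the copying path pays for one allocation and one `memcpy` per placement.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: can a queen go in row `depth`, column `j`,
   given the columns x[0..depth-1] already chosen? */
static int place(int depth, int j, const char *x) {
    for (int i = 0; i < depth; i++) {
        int d = x[i] - j;
        if (d == 0 || d == depth - i || d == i - depth) return 0;
    }
    return 1;
}

/* Fake-task path: a plain call, the child reuses the workspace x. */
static int nqueens_inplace(int depth, int n, char *x) {
    if (depth >= n) return 1;
    int sn = 0;
    for (int j = 0; j < n; j++) {
        if (place(depth, j, x)) {
            x[depth] = (char)j;
            sn += nqueens_inplace(depth + 1, n, x);
        }
    }
    return sn;
}

/* Task path: each child receives a private copy of the workspace. */
static int nqueens_copy(int depth, int n, char *x) {
    if (depth >= n) return 1;
    int sn = 0;
    for (int j = 0; j < n; j++) {
        if (place(depth, j, x)) {
            x[depth] = (char)j;
            char *tmpx = malloc((size_t)n);
            assert(tmpx != NULL);
            memcpy(tmpx, x, (size_t)n);
            sn += nqueens_copy(depth + 1, n, tmpx);
            /* Freeing here is safe only because this call is synchronous;
               with a real spawn the child could still be running. */
            free(tmpx);
        }
    }
    return sn;
}
```

Calling either function with a zero-initialized array of n bytes and depth 0 returns the number of solutions, for example 92 for n = 8.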
Test system and test cases
  8 cores: two quad-core Intel Xeon E5520 processors (2.26 GHz, 8 GB memory)
  8 test cases: 6 are backtracking search programs, 2 are divide-and-conquer programs
  Compared systems: Cilk-5.4.6, Tascell (PPoPP'09), AdaptiveTC
  Compiler: gcc -O3
Test case 1 -- performance: Nqueen-array(16)
[Figure: speedup vs. number of threads (1 to 8) for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC]

Execution time (seconds)
System         1 thread   8 threads
C              61         61
Cilk           198        24.57
Cilk-SYNCHED   184        22.41
Tascell        85         14.24
AdaptiveTC     66         8.27
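The times in the table can be turned into speedups with one division. The slide does not say which baseline its speedup axis uses, so the small C helper below (a hypothetical `speedup` function) simply divides a chosen baseline time by a parallel time, and the numbers that follow are worked out for two possible baselines.

```c
#include <assert.h>

/* Speedup of a run that takes t_run seconds relative to a baseline
   that takes t_base seconds. */
static double speedup(double t_base, double t_run) {
    assert(t_base > 0.0 && t_run > 0.0);
    return t_base / t_run;
}
```

Against the sequential C time of 61 s, the 8-thread speedups are about 2.48 for Cilk (61 / 24.57), 2.72 for Cilk-SYNCHED, 4.28 for Tascell, and 7.38 for AdaptiveTC (61 / 8.27). Against each system's own 1-thread time, Cilk reaches about 8.06 (198 / 24.57) and AdaptiveTC about 7.98 (66 / 8.27), which shows how much the choice of baseline matters when 1-thread overhead is high.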
Test case 1 -- analysis

Breakdown of overhead
[Figure: stacked bars (0% to 100%) for Tascell, Cilk, Cilk-SYNCHED, and AdaptiveTC; legend: working, taskprivate variable]
System         Overhead
Tascell        28.7%
Cilk           69.2%
Cilk-SYNCHED   67%
AdaptiveTC     7.9%

The usage of cores with 8 threads (load balance)
System       Busy    Idle
Tascell      83.3%   16.7%
Cilk         99.9%   0.1%
AdaptiveTC   99.0%   1.0%
Test case 2 -- performance: Nqueen-compute(16)
[Figure: speedup vs. number of threads (1 to 8) for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC]

Execution time (seconds)
System         1 thread   8 threads
C              554        554
Cilk           669        85
Cilk-SYNCHED   661        88
Tascell        627        114
AdaptiveTC     612        77
Test case 2 -- analysis

Breakdown of overhead
[Figure: stacked bars (0% to 100%) for Tascell, Cilk, Cilk-SYNCHED, and AdaptiveTC; legend: working, taskprivate variable, deque/nested function]
System         Overhead
Tascell        11.7%
Cilk           17.2%
Cilk-SYNCHED   16.2%
AdaptiveTC     9.5%

The usage of cores with 8 threads (load balance)
System       Busy    Idle
Tascell      79.2%   20.8%
Cilk         99.9%   0.1%
AdaptiveTC   99.1%   0.9%
Experimental results
[Figures: speedup vs. number of threads (1 to 8) for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC on Sudoku (input_balance tree), Knight's tour (6*6), Strimko, and Pentomino(13)]
Experimental results (cont'd)
[Figures: speedup vs. number of threads (1 to 8) for Cilk, Tascell, and AdaptiveTC on Comp(60000) and Fib(45)]
[Figure: Speedup with 8 threads, baseline is Cilk's execution time, for Cilk, Cilk-SYNCHED, Tascell, and AdaptiveTC on Nqueen_array(16), Nqueen_compute(16), Strimko, Knight's Tour(6*6), Sudoku (balance_tree), Pentomino(13), Fib(45), Comp(60000), and the average]

System         Speedup
Cilk           1
Cilk-SYNCHED   1.07
Tascell        1.5
AdaptiveTC     2.24
Conclusions -- AdaptiveTC
An adaptive task creation strategy controls task granularity.
  Reduces system overhead
  Achieves good load balancing
A new data attribute, taskprivate, is introduced for workspace variables.
  Improves programmability
  Reduces the cost of workspace copying with an adaptive task creation strategy
Thanks!