View
7
Download
0
Category
Preview:
Citation preview
Speedup for Multi-Level Parallel Computing
School of Computer Engineering
Nanyang Technological University
21st May 2012
Shanjiang Tang, Bu-Sung Lee, Bingsheng He
OutLine
• Background & Motivation • Multi-Level Parallel Speedup • Evaluation • Conclusion
Multi-Level Computing Architecture and Paradigm
Multi-Level Computing Architecture and Paradigm
• MPI+OpenMP • MPI+CUDA • MPI+OpenMP+CUDA …..
Multi-Level Parallel Computing Model
L Lm L3 L2 L1
LLLLLLLL
Notes: Sequential Part Parallel Part
PE2,
2
PE1,
1 PE2,
1
PE3,
1
PE3,
2
PE3,
3
PE3,
4
PE3,
5
PE3,
6
PE3,
7
PE3,
8
Parallel Speedup
• Definition • Classification
Ø Absolute Speedup
Ø Relative Speedup
SequentialExecutionTimeSpeedupParallelExecutionTime
=
BestSequentialALGExecutionTimeSpeedupParallelALGExecutionTime
=
ParallelALGSequentialExecutionTimeSpeedupParallelALGExecutionTime
=
Relative Speedup Model
• Fixed-size Speedup
Ø Amdahl’s Law
where is parallel fraction workload of the program, is the
number of processors.
• Fixed-time Speedup
Ø Gustafson’s Law
pmeparallelTiTimesequentialSpeedup
αα +−
==1
1
ppmeparallelTiTimesequentialSpeedup αα
αααα
+−=+−
+−== 111
α p
Motivation Example—NAS Benchmark (MPI+OpenMP)
Motivation Example—NAS Benchmark (MPI+OpenMP)
Amdahl’s Law is UNSUITABLE for Multi-Level Parallel Computing
OutLine
• Background & Motivation • Multi-Level Parallel Speedup • Evaluation • Conclusion
E-Amdahl’s Law
• Awareness of Different Grained-Level Parallelism
L Lm L3 L2 L1
LLLLLLLL
Notes:
Sequential Part
Parallel Part
PE2,
2
PE1,
1 PE2,
1
PE3,
1
PE3,
2
PE3,
3
PE3,
4
PE3,
5
PE3,
6
PE3,
7
PE3,
8
⎪⎪⎪⎪
⎩
⎪⎪⎪⎪
⎨
⎧
<≤
++−
=+−
=
)1(
)1()()()(1
1
)(
)()()(1
1
)(
mi
ispipifif
mi
mpmfmf
isp
E-Amdahl’s Law
• Two-Level Parallelism Speedup Model (MPI+OpenMP)
where is the parallel fraction of coarse-grained (MPI-level) parallelism. is the parallel fraction of fine-grained (OpenMP-level) parallelism. is the number of processes spawned. is the number of threads spawned per process.
pt
pt
tpsp)1(
1
1),,,(β
βαα
βα+−
+−
=
αβ
E-Gustafson’s Law
• Awareness of Different Grained-Level Parallelism
L Lm L3 L2 L1
LLLLLLLL
Notes:
Sequential Part
Parallel Part
PE2,
2
PE1,
1 PE2,
1
PE3,
1
PE3,
2
PE3,
3
PE3,
4
PE3,
5
PE3,
6
PE3,
7
PE3,
8
⎩⎨⎧
<≤++−
=+−=
)1()1()()()(1)()()()(1
)(miispipififmimpmfmf
isp
OutLine
• Background & Motivation • Multi-Level Parallel Speedup • Evaluation • Conclusion
Experiment Setup
• Platform and Configuration
Ø A linux cluster consisting of eight computing nodes each with two quad-core chips Ø Configuration: One thread per CPU core
• Benchmarks
NAS Parallel Benchmark (NPB) Multi-Zone (MZ) Version: Ø BT-MZ (Unbalanced Workload Partitioning) Ø SP-MZ (balanced Workload Partitioning) Ø LU-MZ (balanced Workload Partitioning)
Performance Prediction
Prediction Result Comparison
OutLine
• Background & Motivation • Multi-Level Parallel Speedup • Evaluation • Conclusion
Conclusion
• Traditional speedup models are unsuitable for multi-level parallelism
– Unable to be awareness of different granularities of parallelism for multi-level parallel computing.
• Multi-level Parallelism Model
– A guidance model for multi-level optimization. – A prediction model for multi-level parallelism.
Argument Estimation
Speedup Under E-Amdahl’s Law
Speedup Under E-Gustafson’s Law
Recommended