Presented by
Modeling Assertions: Symbolic Model Representation of Application Performance
Jeffrey S. Vetter (PI)Sadaf R. Alam, Nikhil Bhatia
Future Technologies Group Computer Science and Mathematics Division
2 Vetter_Modeling_SC07
Modeling assertions
What? Portable and extensible workload and performance
modeling framework
Why? Provide an integrated solution to develop, validate, and
experiment with symbolic performance models of large-scale applications
Provide an efficient mechanism for sensitivity analysis and workload project growth rates for future problem and system configurations
Components A portable API for code annotation
Post-processing tools
Synthetic MPI trace generation with OTF compatible format
2 Vetter_Modeling_SC07
3 Vetter_Modeling_SC07
Application instrumentation
Application instrumentation
PMPIPMPIPAP
IPAP
I
appapp
pomppomp
Controlflow
model
Controlflow
model
Symbolicmodel
Symbolicmodel
C-stylemodel
C-stylemodel
Modeling workflow
Build Execute
Validateoffline/online
Model generation
MPI taskMPI taskMPI taskMPI task
MPI taskMPI task
MPI taskMPI task
MPI taskMPI task
MPI taskMPI task
MPI taskMPI taskMPI taskMPI task
MPI taskMPI task
MPI taskMPI task
MPI taskMPI task
MPI taskMPI task
MPI taskMPI task
MPI taskMPI task
MPI taskMPI task
MPI taskMPI task
Modelvalidation
4 Vetter_Modeling_SC07
Toward an integrated framework
Online validation, analysis and prediction
Pragma-based instrumentation
Realtime, efficient error reporting mechanisms
Application instrumentation
Application instrumentation
PMPIPMPIPAP
IPAP
I
appapp
pomppomp
Build
ExecuteValidateAnalyze
Controlflow
model
Controlflow
model
Symbolicmodel
Symbolicmodel
C-stylemodel
C-stylemodel Report
statisticsReport
statistics
Modelvalidation
5 Vetter_Modeling_SC07
Sensitivity analysis of DOE applications
Performance and scaling bottlenecks due to• Application parameters
• System parameters
Goal: Identify workload sensitivity at scale before system and application deployment
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Message count per simulation
time step
2x2 2x4 4x2 4x4 4x8 8x4 8x8 8x16 16x8 16x16 16x32 32x16 32x32
Processor grid
Message size (bytes)
0 1 2 4 8 16 32 64128 256 512 1024 2048 4096 8192 1638432768 65536 131072 262144 524288 1048576
0%
10%
20%
30%
40%
50%
60%
70%
80%
90%
100%
Message count per simulation
time step
2x2 2x4 4x2 4x4 4x8 8x4 8x8 8x16 16x8 16x16 16x32 32x16 32x32
Processor grid
Message size (bytes)
0 1 2 4 8 16 32 64128 256 512 1024 2048 4096 8192 1638432768 65536 131072 262144 524288 1048576
Message size distribution in the key calculation phases as the MPI grid topology is scaled
6 Vetter_Modeling_SC07
Modeling on emerging architectures
Incorporate “application aware” and “architecture aware” parameter in the model
Goal: Investigate how performance-enhancing features of emerging architecture benefit scientific calculations
23. call maf_vec_loop_start(1,“tzetar”,“size^2*(size-1)*26”, (size**2)*(size-1)*26,” (size^2)*(size-1)*16”, (size**2)*(size-1)*16,”(size-1)/(size/64+1)”, (size-1)/(size/64+1),3) 24. M-------< do k = start(3,c), cell_size(3,c)-end(3,c)-1 25. M 2-----< do j = start(2,c), cell_size(2,c)-end(2,c)-1 26. M 2 Vs--< do i = start(1,c), cell_size(1,c)-end(1,c)-1 27. M 2 Vs
Vector FP OPSVector FP OPSVector LS OPSVector LS OPS AVLAVLMVScoreMVScore
Tv = Tvm + Tvc
Tvc =
Ts = Tsm + Tsc
SFLOPS
0.565GHz/2.26GHz
VFLOPS
4.5GHz/18GHz Tsc =
Tvm =(VLOADS + VSTORES )*8*64
BW *109 * AVLTsm =
(SLOADS + SSTORES )*8
BW *109
7 Vetter_Modeling_SC07
Performance projections
Average vector length prediction with different
Problem configurations
MPI task
Experimented with memory bandwidth values: maximum (stream), minimum (random) and mean of max and min
Performance prediction
Err
or
(%)
Problem class (MPI Tasks)
86420
-2-4-6-8
-10-12-14
W (1) A (1) B (1) C (1) W (4) A (4) B (4) C (4)
W (1) A (1) B (1) C (1) W (4) A (4) B (4) C (4)
Problem class (MPI Tasks)
350300250200150100
500
-50-100
Err
or
(%)
Memory Bandwidth
MinimumAverageMaximum
AVLV FP LSV FP OPS
8 Vetter_Modeling_SC07
Synthetic trace file generation
Prototype for future network design at scale infeasible Issues:
Realistic and representative workload for petascale and exascale systems.
Problem configurations; for example, input decks for petascale and exascale problems do not exist.
Solution Generate parameterized network models and traces to drive
network simulators.
Support common MPI trace file formats like OTF.
Identify workload scaling behavior and patterns.
Investigate alternate algorithms and implementation.
Goal: Generate input traces for future problem and system configurations that do not exist
9 Vetter_Modeling_SC07
Workload sensitivity results
Sensitivity analysis for workload requirements of the PME implementation in SANDER for petaflops scale systems
Number of cores
1.E+00
1.E+01
1.E+02
1.E+03
1.E+04
1.E+05
1.E+06
1.E+07
1.E+08
1.E+09
1 2 4 8 16 32 64 128
256
512
1024
2048
4096
8192
1638
4
3276
8
6553
6
1310
72
2621
44
5242
88
1048
576
Memory
Message size
Message count
20 K atomsMemory
Message size
Message count
200 K atomsMemory
Message size
Message count
2 M atoms
10 Vetter_Modeling_SC07
Contacts
Jeffrey S. VetterPrinciple InvestigatorFuture Technologies Group Computer Science and Mathematics Division (865) [email protected]
Sadaf R. Alam(865) [email protected]
Nikhil Bhatia(865) [email protected]
10 Vetter_Modeling_SC07