Presented by
The HPC Challenge (HPCC) Benchmark Suite
Piotr Luszczek
The MathWorks, Inc.
http://icl.cs.utk.edu/hpcc/
HPCC: Components
1. HPL (High Performance LINPACK)
2. STREAM
3. PTRANS: A ← A^T + B
4. RandomAccess
5. FFT
6. Matrix-matrix multiply (DGEMM)
7. b_eff (effective bandwidth/latency; a minimal ping-pong sketch follows this list)
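For b_eff, the latency and bandwidth figures come from message exchanges such as ping-pong between pairs of processes. The following is only an illustrative sketch (not the actual b_eff code); the message size, repetition count, and two-rank setup are arbitrary choices for this example, and it must be run with at least two MPI ranks:

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Minimal ping-pong between ranks 0 and 1: rank 0 sends a buffer,
   rank 1 echoes it back, and the averaged round-trip time yields
   rough latency and bandwidth estimates. */
int main(int argc, char **argv)
{
  const int reps  = 1000;
  const int bytes = 1 << 20;   /* 1 MiB per message (arbitrary choice) */
  int rank;
  char *buf = malloc(bytes);
  double t0, t1;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  MPI_Barrier(MPI_COMM_WORLD);
  t0 = MPI_Wtime();
  for (int i = 0; i < reps; ++i) {
    if (rank == 0) {
      MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
      MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    } else if (rank == 1) {
      MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
      MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
    }
  }
  t1 = MPI_Wtime();

  if (rank == 0) {
    double rtt = (t1 - t0) / reps;   /* average round-trip time */
    printf("latency ~ %g s, bandwidth ~ %g MB/s\n",
           rtt / 2.0, 2.0 * bytes / rtt / 1e6);
  }
  free(buf);
  MPI_Finalize();
  return 0;
}

In practice, latency is measured with very small messages and bandwidth with large ones; this sketch uses a single size purely for brevity.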
-------------------------------------------------------
 name    kernel                 bytes/iter  FLOPS/iter
-------------------------------------------------------
 COPY:   a(i) = b(i)            16          0
 SCALE:  a(i) = q*b(i)          16          1
 SUM:    a(i) = b(i) + c(i)     24          1
 TRIAD:  a(i) = b(i) + q*c(i)   24          2
-------------------------------------------------------
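As a point of reference for the STREAM kernel table above, a minimal serial sketch of the TRIAD kernel might look like the following; the function name is illustrative only, and real STREAM runs time the loop over arrays much larger than cache:

#include <stddef.h>

/* STREAM TRIAD kernel: a(i) = b(i) + q*c(i).
   Each iteration reads two 8-byte doubles, writes one, and performs
   2 floating-point operations, matching the 24 bytes / 2 FLOPS per
   iteration counted in the table. */
void stream_triad(double *a, const double *b, const double *c,
                  double q, size_t n)
{
  for (size_t i = 0; i < n; ++i)
    a[i] = b[i] + q * c[i];
}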
Kernel operations shown for each test:
• HPL: solve Ax = b
• Matrix-matrix multiply: C ← s*C + t*A*B
• RandomAccess: T[k] ← T[k] XOR a_i on 64-bit words
• FFT: z_k = Σ_j x_j exp(-2π√-1 jk/n)
• b_eff: ping-pong message exchanges
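Of these, the RandomAccess update is the simplest to state and the hardest to make fast, because each update touches an essentially random 64-bit word. A hypothetical serial sketch follows; the shift-and-XOR recurrence is only a stand-in for the benchmark's official pseudo-random stream, and the table length must be a power of two:

#include <stdint.h>
#include <stddef.h>

/* RandomAccess-style updates: T[k] <- T[k] XOR a_i, where the index k
   is taken from the low bits of the 64-bit pseudo-random value itself. */
void random_access_updates(uint64_t *T, unsigned log2_size, size_t n_updates)
{
  uint64_t mask = ((uint64_t)1 << log2_size) - 1;
  uint64_t a = 1;                       /* arbitrary nonzero seed */

  for (size_t i = 0; i < n_updates; ++i) {
    /* shift-and-XOR recurrence (stand-in for the official sequence) */
    a = (a << 1) ^ ((a >> 63) ? 7 : 0);
    T[a & mask] ^= a;                   /* the measured XOR update */
  }
}

Performance of this loop is what GUPS reports: giga (10^9) updates per second.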
HPCC: Motivation and measurement
[Figure, "Concept" and "Measurement": two plots of temporal locality (vertical axis) versus spatial locality (horizontal axis). The concept panel sketches where the kernels sit: DGEMM and HPL at high temporal and high spatial locality, PTRANS and STREAM at high spatial but low temporal locality, FFT in between, RandomAccess at low spatial and low temporal locality, with mission-partner applications spread across the middle. The measurement panel, generated by PMaC @ SDSC, plots the HPC Challenge benchmarks together with selected applications (Test3D, CG, Overflow, Gamess, AVUS, OOCore, RFCTH2, HYCOM) on measured locality scores from 0.0 to 1.0.]
Spatial and temporal data locality here is for one node/processor, i.e., locally or "in the small."
HPCC: Scope and naming conventions
[Diagram: nodes drawn as memory-processor (M/P) pairs joined by a network; benchmark labels shown include HPL, STREAM, FFT, ..., RandomAccess (1 m), and HPL (25%).]
Each scope is matched to a level of computational resources and the software modules used to program it:
• G (Global): the whole system computes and communicates, programmed with MPI
• EP (Embarrassingly Parallel): each CPU (or its core(s)) runs an independent copy, programmed with OpenMP
• S (Single): a single thread runs the kernel, relying on vectorization
HPCC: Hardware probes
[Figure: memory hierarchy from registers through cache, local memory, and remote memory to disk, with data moving as instruction operands, cache blocks, messages, and pages; bandwidth and latency are probed at different levels. Each HPC Challenge benchmark is matched to the level of the hierarchy it exercises and to an HPCS performance target.]
Each benchmark, its core operation, and the HPCS performance target (improvement factor in parentheses):
• Top500/HPL: solves a system Ax = b; target 2 Petaflops (8x)
• STREAM: vector operations A = B + s × C; target 6.5 Petabytes/s (40x)
• FFT: 1-D fast Fourier transform Z = FFT(X); target 0.5 Petaflops (200x)
• RandomAccess: random updates T(i) = XOR(T(i), r); target 64,000 GUPS (2000x)
• The HPCS program has developed a new suite of benchmarks (HPC Challenge).
• Each benchmark focuses on a different part of the memory hierarchy.
• The HPCS program performance targets will flatten the memory hierarchy, improve real application performance, and make programming easier.
HPCC: Official submission process

Prerequisites:
• C compiler
• BLAS
• MPI

1. Download
2. Install
3. Run
4. Upload results
5. Confirm via @email@
Optional:
6. Tune
7. Run
8. Upload results
9. Confirm via @email@

Tuning rules:
• Only some routines can be replaced.
• Data layout needs to be preserved.
• Multiple languages can be used.

Provide detailed installation and execution environment.

Results are immediately available on the Web site:
• Interactive HTML
• XML
• MS Excel
• Kiviat charts (radar plots)
HPCC: Submissions over time
[Figure: four log-scale time series of HPCC submissions, covering November 2003 (July 2004 for two of the panels) through October 2006, one panel per benchmark: HPL [Tflop/s], STREAM [GB/s], FFT [Gflop/s], and RandomAccess [GUPS]. Each panel shows both the sum over all submitted systems and the #1 (best single) result.]
HPCC: Comparing three interconnects
Kiviat chart (radar plot):
• 3 AMD Opteron clusters
  − Clock: 2.2 GHz
  − 64 processors per cluster
• Interconnect types:
  A. Vendor
  B. Commodity
  C. GigE
• The three systems cannot be differentiated based on:
  − G-HPL
  − Matrix-matrix multiply
• Available on the HPCC Web site:
  − http://icl.cs.utk.edu/hpcc/
HPCC: Analysis of sample results

[Figure: effective bandwidth in words/second (log scale, kilo through peta) across the memory hierarchy for systems listed in Top500 order, annotated HPCS ~10^2, HPC ~10^4, clusters ~10^6.]

• All results in words/second
• Highlights the memory hierarchy
• Clusters: hierarchy steepens
• HPC systems: hierarchy constant
• HPCS goals: hierarchy flattens, easier to program
HPCC: Augmenting June TOP500
[Table: TOP500 rating alongside data provided by the HPCC database.]