A Survey of HPC Systems and Applications in Europe
Dr Mark Bull, Dr Jon Hill and Dr Alan Simpson
EPCC, University of Edinburgh
email: [email protected], [email protected]
Overview
• Background
• Survey design
• Survey results
– HPC Systems
– HPC Applications
• Selecting a benchmark suite
Background to the survey
• The PRACE project is working towards the installation of Petaflop/s scale systems in Europe.
• Requirement for a set of benchmark applications to assess the performance of systems before and during the procurement process
• Benchmark applications should be representative of HPC usage by PRACE partners
• To understand current applications usage, we conducted a survey of PRACE partners’ current HPC systems
• We took the opportunity to gather other interesting data as well
• We also devised a method for selecting (and weighting) a set of applications which can be considered representative of the current usage
– we wanted to do this in a quantifiable way
– we wanted to avoid political considerations
– …but it was not entirely successful!
Survey Design
• We asked the PRACE centres to complete:
– a systems survey for their largest system, and any other system over 10 Tflop/s Linpack
– an application survey for all applications which consumed more than 5% of the utilised cycles on each system
• We collected data for 24 systems and 69 applications
• Survey was conducted in April 2008
– data relates to 2007/8
Systems surveyed
System | Centre | Manufacturer | Model | Architecture | Rpeak (Gflop/s) | Rmax (Gflop/s) | Cores
Jugene | FZJ | IBM | Blue Gene/P | MPP | 222822 | 167300 | 65536
MareNostrum | BSC | IBM | JS21 cluster | TNC | 94208 | 63830 | 10240
HLRB II | BADW-LRZ | SGI | Altix 4700 | FNC | 62259 | 56520 | 9728
HECToR | EPSRC | Cray | XT4 | MPP | 63437 | 54648 | 11328
Neolith | SNIC | HP | Cluster 3000 DL140 | TNC | 59648 | 44460 | 6440
Platine | GENCI | Bull | 3045 | TNC | 49152 | 42130 | 7680
Hexagon | SIGMA | Cray | XT4 | MPP | 51700 | 42000 | 5552
Galera | PSNC | Supermicro | X7DBT-INF | TNC | 50104 | 38170 | 5376
Jubl | FZJ | IBM | Blue Gene/L | MPP | 45875 | 37330 | 16384
BCX | CINECA | IBM | BladeCenter Cluster LS21 | TNC | 53248 | 19910 | 5120
Stallo | SIGMA | HP | BL460c | TNC | 59900 | 15000 | 5632
Palu | ETHZ | Cray | XT3 | MPP | 17306 | 14220 | 3328
HPCx | EPSRC | IBM | p575 cluster | FNC | 15360 | 12940 | 2560
Huygens | NCF | IBM | p575 cluster | FNC | 14592 | 11490 | 1920
Legion | EPSRC | IBM | Blue Gene/P | MPP | 13926 | 11110 | 4096
hww SX-8 | USTUTT-HLRS | NEC | SX8 | VEC | 9216 | 8923 | 576
Louhi | CSC | Cray | XT4 | MPP | 10525 | 8883 | 2024
murska.csc.fi | CSC | HP | CP400 BL ProLiant SuperCluster | TNC | 10649 | 8200 | 2176
Jump | FZJ | IBM | p690 cluster | FNC | 8921 | 5568 | 1312
ZAHIR | GENCI | IBM | p690/p690+/p655 cluster | FNC | 6550 | 3900 | 1024
HERA | GENCI | IBM | p690/p575 cluster | FNC | 3000 | 3700 | 384
XC5 | CINECA | HP | HS21 cluster | TNC | - | 2400 | 256
Milipeia | UC-LCA | SUN | x4100 cluster | TNC | 2200 | 1600 | 520
TNC | PSNC | IBM, Sun | e325/v40z/x4600 cluster | TNC | 1577 | 1182 | 330
Totals | | | | | 926176 | 675415 | 169522
Compute power by architecture type
MPP: 50%
Thin-node Cluster: 35%
Fat-node Cluster: 14%
Vector: 1%
LEFs
• The measure of computational power and consumed cycles we use is the Linpack Equivalent Flop (LEF).
• A system which has a Linpack Rmax of 50 Tflop/s is said to have a power of 50T LEFs
• An application which uses 10% of the time on that system is said to consume 5T LEFs
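A small worked version of the example above (a sketch only; the numbers simply restate the definition):

```python
# LEF arithmetic as defined above: a system's Linpack Rmax gives its LEF
# rating, and an application's share of the utilised cycles on that system
# gives the LEFs it consumes.
rmax_tflops = 50.0          # system Linpack Rmax: 50 Tflop/s -> 50T LEFs
app_share_of_cycles = 0.10  # application used 10% of the time on the system

app_lefs = rmax_tflops * app_share_of_cycles
print(f"Application consumes {app_lefs:.0f}T LEFs")  # -> 5T LEFs
```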
Distribution of LEFs by job size
< 32: 14.1%
33 - 128: 22.8%
129 - 512: 19.4%
513 - 2048: 17.6%
> 2048: 26.1%
Mean job size as % of machine
[Bar chart: mean job size as a percentage of machine size for each system surveyed]
Job size distribution by system
[Stacked bar chart: percentage of each system's LEFs by job-size band (<32, 33-128, 129-512, 513-2048, >2048) for each system surveyed]
Distribution of LEFs by scientific area
Particle Physics: 23.5%
Computational Chemistry: 22.1%
Condensed Matter Physics: 14.2%
CFD: 8.6%
Earth & Climate: 7.8%
Astronomy & Cosmology: 5.8%
Life Sciences: 5.3%
Computational Engineering: 3.7%
Plasma Physics: 3.3%
Other: 5.8%
No. of users and Rmax per user
[Chart: number of users and Rmax per user for each system surveyed]
Parallelisation techniques
• Of the 69 applications, all but two use MPI for parallelisation
– exceptions are Gaussian (OpenMP) and BLAST (sequential)
• Of the 67 MPI applications, six also have standalone OpenMP versions and three have standalone SHMEM versions.
• 13 applications have hybrid implementations
– 10 MPI+OpenMP, 2 MPI+SHMEM, 1 MPI+POSIX threads
• Only one application was reported as using MPI-2 single-sided communication.
Languages
• 16 applications mix Fortran with C/C++
Language | No. of applications
Fortran90 | 50
C90 | 22
Fortran77 | 15
C++ | 10
C99 | 7
Python | 3
Perl | 2
Mathematica | 1
Distribution of LEFs by dwarves
Map reduce methods: 45.1%
Spectral methods: 18.4%
Dense linear algebra: 14.4%
Structured grids: 9.0%
Particle methods: 7.2%
Sparse linear algebra: 3.4%
Unstructured grids: 2.4%
Distribution of LEFs by dwarf and area
Area | Map reduce methods | Unstructured grids | Particle methods | Sparse linear algebra | Structured grids | Spectral methods | Dense linear algebra
Astronomy and Cosmology | 0.00 | 2.99 | 5.43 | 3.26 | 4.58 | 0.62 | 0.00
Computational Chemistry | 12.98 | 0.49 | 7.49 | 2.79 | 1.14 | 24.89 | 15.09
Computational Engineering | 2.80 | 0.53 | 0.00 | 0.53 | 0.53 | 0.00 | 0.00
Computational Fluid Dynamics | 0.00 | 1.01 | 0.32 | 1.06 | 7.09 | 1.70 | 0.00
Condensed Matter Physics | 5.70 | 0.28 | 1.76 | 0.06 | 0.96 | 14.33 | 9.02
Earth and Climate Science | 0.00 | 0.22 | 0.00 | 0.00 | 3.31 | 0.70 | 0.00
Life Science | 3.46 | 0.28 | 0.94 | 0.13 | 0.94 | 4.72 | 0.00
Particle Physics | 89.27 | 0.00 | 0.10 | 0.92 | 4.32 | 0.00 | 12.50
Plasma Physics | 0.63 | 0.42 | 2.22 | 0.00 | 0.00 | 0.00 | 0.00
Other | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Choosing a benchmark suite
• Want to choose a set of applications to form a benchmark suite
– to be used in the procurement process for Petaflop/s systems
• Suggested process: find the set of applications that best fits the area/dwarf table, in the sense that it minimises the norm ||Uw - v||, where
– v is a vector containing the linearised table entries
– U is a matrix describing the area/dwarf combinations satisfied by the applications
– w is a vector of weights
• In principle, one could search all possible lists of applications up to a certain length and find the list with the smallest residual (a sketch of this idea is given at the end of this slide)
– in practice, we do a manual search
– we want to include other criteria, such as usage of applications, geographical spread, etc.
• Gives a quantitative measure of how well a benchmark suite represents current usage
• Also gives a weighting for the applications which could be used to weight benchmark results
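As an illustration of the selection process described above, the sketch below casts it as a small non-negative least-squares problem. This is not the authors' implementation: the application names, coverage vectors and table values are invented, U is assumed to hold simple 0/1 indicators of which area/dwarf cells each application exercises, and scipy.optimize.nnls is used as the solver.

```python
# A minimal sketch of the suite-selection idea: flatten the area/dwarf table
# into a vector v, describe each candidate application by a column of U, and
# pick the subset of applications whose non-negative weights w minimise
# ||U w - v||.  All names and numbers below are hypothetical.
import itertools

import numpy as np
from scipy.optimize import nnls  # non-negative least squares


def best_fit(candidates, table_vector):
    """Fit weights for one candidate subset; return (residual norm, weights)."""
    U = np.column_stack([coverage for _, coverage in candidates])
    weights, residual = nnls(U, table_vector)  # minimise ||U w - v|| with w >= 0
    return residual, weights


# Flattened area/dwarf usage (v): here just four cells of the table.
v = np.array([89.3, 12.5, 24.9, 15.1])

# Hypothetical candidate applications and the table cells each one covers.
apps = {
    "app_qcd": np.array([1.0, 1.0, 0.0, 0.0]),
    "app_dft": np.array([0.0, 0.0, 1.0, 1.0]),
    "app_cfd": np.array([0.0, 1.0, 1.0, 0.0]),
}

# Exhaustive search over all subsets up to a given size (done manually in
# practice, as noted above); keep the subset with the smallest residual.
best = None
for size in range(1, len(apps) + 1):
    for subset in itertools.combinations(apps.items(), size):
        residual, weights = best_fit(list(subset), v)
        if best is None or residual < best[0]:
            best = (residual, [name for name, _ in subset], weights)

print("best subset:", best[1])
print("weights:", np.round(best[2], 2))
print("residual norm:", round(best[0], 2))
```

The residual norm is the quantitative measure of fit mentioned above, and the fitted weights correspond to the per-application weighting that could be applied to benchmark results.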
Problems with this approach
• Classification of codes into dwarves (and to some extent, areas) is somewhat arbitrary
– some applications use more than one dwarf: we split the LEFs equally between dwarves
• Bias to recently acquired systems
– high LEFs
– recently acquired systems may have atypical usage by early users
• Reflects past, rather than future usage
Current status
• We used the above process as a starting point, then swapped some applications to meet some of the concerns
• 12 core applications, plus 8 additional applications
• Core apps: NAMD, CPMD, VASP, QCD, GADGET, Code_Saturne, TORB, NEMO, ECHAM5, CP2K, GROMACS, N3D
• Additional apps: AVBP, HELIUM, TRIPOLI_4, GPAW, ALYA, SIESTA, BSIT, PEPC
• We have undertaken work to port these applications to the PRACE prototype systems, and optimise them for sequential performance and scalability.
• We are currently collecting benchmark data from the prototype systems which have been installed so far.
• Based on this data, we are reviewing the list of applications to ensure that the final benchmark suite contains scalable codes and avoids licensing problems.
Acknowledgements
• The authors would like to acknowledge all those who contributed by filling in survey forms and taking part in subsequent discussions.
• A full report is available from: http://www.prace-project.eu/documents/
Availability and utilisation
[Bar chart: availability % and utilisation % for each system surveyed]
Top 30 applications by usage
Application Name | LEFs Used (Gflop/s) | Number of systems
overlap and wilson fermions | 54923 | 2
vasp | 35766 | 9
lqcd (twisted mass) | 25007 | 2
lqcd (two flavor) | 12393 | 2
namd | 10335 | 4
dalton | 9975 | 3
cpmd | 9680 | 5
gadget | 8412 | 2
dynamical fermions | 7947 | 1
spintronics | 5206 | 2
materials with strong correlations | 4846 | 2
dl_poly | 4779 | 2
casino | 4223 | 1
quantum-espresso | 3982 | 1
cactus | 3798 | 1
trio_u | 3202 | 1
smmp | 3181 | 2
tfs/piano | 3092 | 1
gromacs | 2903 | 3
pepc | 2857 | 2
tripoli4 | 2802 | 1
chroma | 2745 | 1
wien2k | 2713 | 1
bam | 2713 | 1
trace | 2713 | 1
bqcd | 2713 | 1
cp2k | 2525 | 1
helium | 2249 | 1
magnum | 1398 | 1
pdkgrav-gasoline | 1233 | 1