Extreme-scaling on Omni-Path fabric: performance for computational astrochemistry
David M. Benoit
E.A. Milne Centre for Astrophysics, Department of Physics and Mathematics, University of Hull, Cottingham Road, Kingston upon Hull HU6 7RX, [email protected]
@dbenoit1
CIUK18 | Manchester | December 2018
Landing on a comet – ESA Rosetta Mission
• ESA space probe launched in 2004
• Lander module Philae explored comet 67P/Churyumov-Gerasimenko in 2014
…pivotal role in its chemistry [21,22]. Dust grains are thought to be comprised of silicates, oxides and carbonaceous materials [23,24]. In dense molecular clouds, molecules and atoms from the gas phase condense onto the grains to form molecular ices [16,21,25,26], the composition of which does not reflect gas phase abundances. During the lifetime of a molecular cloud (10^6–10^8 years) the ices undergo significant physical and chemical changes depending on the astrophysical environment. A summary of some of the typical processing routes for interstellar ices is shown in Fig. 1. The grain acts as a third body, catalysing chemical reactions at the surface. At very low temperatures (<20 K) atomic species such as H, D, N, C and O form simple molecules via thermal hopping or quantum tunnelling on grain surfaces [16]. In regions characterised by high atomic H abundances, hydrogenation forms molecules such as H2O, NH3, CH3OH and CH4 [13,16–19,27]. Ices comprised of these molecules are often referred to as polar ices and are dominated by H2O [16,28,29]. Conversely, in regions where the gas phase atomic and molecular hydrogen ratios are much lower, surface bound C, N and O atoms form so called apolar ices consisting of more volatile species including CO, CO2, N2 and O2 [27,30,31]. The presence of multiply deuterated species in the ISM has also been ascribed to grain surface chemistry [32].

Interstellar ices undergo further processing via exposure to ultraviolet (UV) [20,33,34], X-ray or cosmic radiation [35], in addition to thermal processing. Laboratory studies of ice grain mimics have shown that UV irradiation gives rise to the formation of complex organics such as NH2CHO, OCN− and C2H5OH [34,36,37]. Bombardment of the ices by low energy protons and ions also leads to the formation of new species, in addition to altering the morphology of the ices [35,38–41]. Heat generated by new born stars stimulates further surface chemistry prior to evaporation of these icy mantles during cloud collapse. In regions of massive star formation known as hot cores, where temperatures rise to in excess of 100 K, sublimation of the ices gives rise to enhanced gas phase abundances of molecules. Once the molecules have been released into the gas phase, they drive a rich chemistry leading to the formation of larger organics such as methyl formate (HCOOCH3) and dimethyl ether ((CH3)2O) [13,18,42].

Studies have shown that the sublimation of interstellar ices is not instantaneous [43]. Hence a detailed understanding of the thermal desorption of interstellar ices from dust grains is essential for the accurate modelling of star formation. This information can be obtained from surface science techniques such as temperature programmed desorption (TPD) studies of model interstellar ices on dust grain analogue surfaces [44–57]. Numerous studies have shown that the desorption of astrophysically relevant species from H2O-rich ices occurs over a range of temperatures, instead of a single temperature as assumed in many astrophysical models [46,48,58–69]. Several authors [70,71] have demonstrated the importance of using experimentally determined kinetic parameters to describe the sublimation of interstellar ices from grains. These data can be incorporated into astrophysical models to extrapolate desorption events on timescales relevant to real astrophysical processes. Furthermore, the data can be used to calculate residence times of molecules on grain surfaces as a function of temperature, in addition to providing a more accurate method of estimating total column density (an estimate of the thickness of the ice) in interstellar ices [55,72,73].

This perspective aims to provide an overview of the current level of understanding of the adsorption, and particularly desorption, of astrophysically relevant molecular ices from a range of dust grain analogue surfaces. Whilst the perspective focuses on the adsorption and desorption of ices in an interstellar context, particularly with respect to star-forming regions and to hot cores, much of the data is equally relevant to discussions of the desorption of cometary and planetary ices, although these are not specifically described here. The main body of the perspective discusses results obtained for a range of different interstellar ices, using ultra-high vacuum (UHV) surface science techniques to study adsorption on and desorption from a range of dust grain analogues. We provide a more general review of some of the experiments that have been undertaken to study ice desorption by several different groups,…

Fig. 1 Schematic showing the main routes of interstellar ice processing that takes place in astrophysical environments. The molecular species labelled within the inner layer highlight the main constituents detected in interstellar ices.
Phys. Chem. Chem. Phys., 2010, 12, 5947–5969. This journal is © the Owner Societies 2010. Published on 23 February 2010.
Surface chemistry on dust grains
• Dust grains act as cosmic “bench tops” for chemical reactions
• They help transform basic chemicals into more complex molecules
• Their study is key to understanding low-temperature chemistry in the Universe
• But how strongly do molecules stick to ice-covered grains/comets?
From: D.J. Burke et al., PCCP 12 (2010) 5947
Adsorption / sticking energies?
• Adsorption energies are hard to measure accurately
• Experimental database estimates date from the ’90s, with little experimental/theoretical validation
• Quantum chemical calculations can provide a reliable estimate
Penteado et al., ApJ 844 (2017) 71
Scaling quantum chemistry on HPC
VIPER technical profile
• 5040 Intel Broadwell E5-2680v4 (2.4 GHz) cores in 180 compute nodes
• Intel X16 100Gb/s Omni-Path interconnect
• 4 x 1 TB high memory nodes
• 2 x visualisation nodes (2 x Nvidia GeForce GTX 980 Ti per node)
• 4 x accelerator nodes (4 x Nvidia Tesla K40M GPUs per node)
• 500 TB of user storage running BeeGFS
• Each node runs a Docker container
TESSERACT technical profile (SGI 8600)
• 20,256 Intel Skylake Silver 4116 (2.1 GHz) cores in 844 compute nodes
• 96 GB NUMA per node (48 GB/processor)
• Intel X16 100Gb/s Omni-Path interconnect
• 3 PB of Lustre file system
• DiRAC’s Extreme Scaling facility
• Hosted at EPCC/Edinburgh
Application: CPMD – Large-scale electronic structure code
• Well established, freely available, density functional theory (DFT) code
• Developed by MPI-FKF Stuttgart & IBM
• Demonstrated scaling on large HPC
• Hybrid MPI/OpenMP parallelisation
• Bottlenecks: FFT, memory footprint and all-to-all comms
• Our system: fullerene dimer (C120), 480 electrons, triplet state, B3LYP functional and a 500 Ry cutoff
[Figure: Li/Air Batteries (~700 atoms), PBE0 (SCF performance), BG/Q. Source: cpmd.org]
[Figure: Fullerene dimer, C120]
Large-scale density functional theory on OPA
[Figure: speedup (24 cores = 1) against number of cores (0–576). Series: linear scaling; Omni-Path (VIPER); EDR – Reference. Annotations: "576 FFT planes"; "Maximum theoretical limit".]
Conditions: Omni-Path HFI Silicon 100 Series; Intel 100 Series 48 port unmanaged switches; CentOS 7.2.1511; dockerised nodes; OPA 10.4.1.0-1; CPMD V4.5 compiled with ifort 2017 (-O2 -ipo -xHOST), ATLAS (single thread) and scaLAPACK
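The scaling plots in this talk report speedup normalised to a baseline run (24 cores = 1) and compare it with linear scaling. As a minimal sketch of how such numbers could be derived from iteration timings, the Python below computes speedup and parallel efficiency. The function names and the timings in the usage section are hypothetical and are not taken from the talk.

```python
def speedup(time_base, time_n):
    """Speedup of a run relative to the baseline run (baseline = 1)."""
    return time_base / time_n


def parallel_efficiency(speedup_value, cores, base_cores=24):
    """Fraction of linear scaling achieved: 1.0 is ideal, 0.5 is '50% scaling'."""
    ideal_speedup = cores / base_cores
    return speedup_value / ideal_speedup


if __name__ == "__main__":
    # Hypothetical seconds per iteration, for illustration only.
    timings = {24: 120.0, 288: 12.0, 576: 8.0}
    for cores, seconds in timings.items():
        s = speedup(timings[24], seconds)
        print(cores, round(s, 2), round(parallel_efficiency(s, cores), 2))
```

With a 24-core baseline, linear scaling at 576 cores corresponds to a speedup of 24, which matches the upper bound of the plotted axis.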
Scaling past the maximum theoretical limit
[Figure: speedup (24 cores = 1) against number of cores (0–2304). Series: linear scaling; Omni-Path (VIPER); EDR – Reference; 2 FFT groups (VIPER); 4 FFT groups (VIPER). Annotations: "1/2 MPI FFT ranks"; "1/4 MPI FFT ranks".]
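One way to read the "maximum theoretical limit" is that a distributed FFT over 576 planes cannot usefully employ more than 576 MPI ranks, and that splitting the ranks into FFT groups (or adding OpenMP threads per rank) raises that ceiling. The sketch below encodes this simplified model. It assumes at most one FFT plane per rank within a group, which is an illustrative assumption and not a description of CPMD internals; the function names are hypothetical.

```python
def ranks_per_fft(total_ranks, fft_groups=1):
    """MPI ranks taking part in each FFT when ranks are split evenly into groups."""
    if total_ranks % fft_groups != 0:
        raise ValueError("total_ranks must be divisible by fft_groups")
    return total_ranks // fft_groups


def max_useful_cores(fft_planes, fft_groups=1, omp_threads=1):
    """Core-count ceiling in this simplified model: one plane per rank per group."""
    return fft_planes * fft_groups * omp_threads


if __name__ == "__main__":
    planes = 576  # number of FFT planes quoted in the talk
    for groups, threads in [(1, 1), (2, 1), (4, 1), (2, 2)]:
        print(groups, threads, max_useful_cores(planes, groups, threads))
```

In this model, 2 groups at 1152 ranks leave each FFT with half of the ranks (576) and 4 groups at 2304 ranks leave each with a quarter, in line with the "1/2" and "1/4 MPI FFT ranks" annotations.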
Locality and OpenMP (kitchen sink approach…)
[Figure: speedup (24 cores = 1) against number of cores (0–2304). Series: linear scaling; MPI only; local data (MPI only); 2 FFT groups; local data (2 FFT groups); local data (OMP2); 4 FFT groups; local data (OMP2, 2 FFT groups); local data (OMP4).]
Locality reduces number of all-to-all comms (~30%)
OpenMP comms slightly faster than FFT group comms
OpenMP & FFT groups faster than pure OMP or pure FFT groups
ITAC MPI trace – What happens at 1152 cores?
[Figure: ITAC MPI traces for three configurations. Labels, in extraction order: "2 x 576 MPI ranks", "1 iteration = 4.4 s"; "1152 MPI ranks"; "576 MPI ranks with 2 OpenMP threads"; "1 iteration = 4.2 s"; "1 iteration = 13.5 s". Legend: CPMD; OpenMP; MPI All-reduce; MPI all-to-all.]
More cores better? – Tesseract scaling
[Figure: speedup (24 cores = 1) against number of cores (0–9216). Series: linear scaling; Tesseract; best Viper; 50% scaling.]
Are we running out of data?
Sticky space ice?
Space ice model: Low density ice
• Amorphous low density ice model with a single benzene molecule, 1512 atoms, 4030 electrons, PBE functional and a 200 Ry cutoff
• 10-fold size increase from C120
http://www.nims.go.jp/water/hda_lda_tr.html
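As a quick consistency check on the quoted system sizes, the sketch below counts atoms and valence electrons. It assumes valence-only electron counts (core electrons handled by pseudopotentials) and an ice cell of 500 water molecules plus one benzene. The 500 is inferred from the atom count, (1512 − 12) / 3, and is not stated in the slides.

```python
# Valence electrons per atom, assuming core electrons are not treated explicitly.
VALENCE = {"H": 1, "C": 4, "O": 6}

WATER = {"H": 2, "O": 1}
BENZENE = {"C": 6, "H": 6}
C120 = {"C": 120}


def system_size(parts):
    """Return (atoms, valence electrons) for a list of (composition, copies)."""
    n_atoms = sum(sum(comp.values()) * copies for comp, copies in parts)
    n_electrons = sum(
        sum(VALENCE[element] * count for element, count in comp.items()) * copies
        for comp, copies in parts
    )
    return n_atoms, n_electrons


if __name__ == "__main__":
    n_water = 500  # inferred from the atom count, not stated in the talk
    ice = system_size([(WATER, n_water), (BENZENE, 1)])
    dimer = system_size([(C120, 1)])
    print("ice model:", ice)
    print("C120 dimer:", dimer)
    print("atom ratio:", ice[0] / dimer[0])
    print("electron ratio:", round(ice[1] / dimer[1], 1))
```

Under these assumptions the counts reproduce the 1512 atoms and 4030 electrons of the ice model and the 480 electrons of C120. The atom ratio is 12.6 and the electron ratio about 8.4, which brackets the quoted 10-fold increase.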
Tesseract scaling
[Figure: speedup (384 cores = 1) against number of cores (0–12288). Series: linear scaling; Tesseract; 50% scaling.]
Adsorption energy estimation
[Figure, two panels: (i) binding energy [K] (axis 3250–3400) against cutoff energy (50–350); (ii) average iteration time [min] (axis 0–4) against number of cores (432–1024). Labels: "TPD: 4775 K"; "UMIST: 7900 K"; "CP2K".]
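To illustrate why the value of the binding energy matters, recall that under first-order thermal desorption the residence time of a molecule on a grain is t = exp(E_b / T) / ν, with E_b expressed in kelvin. The sketch below compares the two reference values shown on the slide (TPD: 4775 K, UMIST: 7900 K). The attempt frequency ν = 10^12 s^-1 and the temperatures are assumed illustrative values, not taken from the talk, and this is not necessarily the analysis behind the presented results.

```python
import math

# Assumed attempt frequency in s^-1 (illustrative value, not from the talk).
ATTEMPT_FREQUENCY = 1.0e12


def residence_time(binding_energy_k, temperature_k, nu=ATTEMPT_FREQUENCY):
    """First-order residence time in seconds: t = exp(E_b / T) / nu."""
    return math.exp(binding_energy_k / temperature_k) / nu


def residence_ratio(e_high_k, e_low_k, temperature_k):
    """Ratio of residence times for two binding energies; independent of nu."""
    return math.exp((e_high_k - e_low_k) / temperature_k)


if __name__ == "__main__":
    e_tpd, e_umist = 4775.0, 7900.0  # binding energies in K, as labelled on the slide
    for temperature in (100.0, 150.0, 200.0):  # assumed grain temperatures in K
        print(
            temperature,
            residence_time(e_tpd, temperature),
            residence_time(e_umist, temperature),
            residence_ratio(e_umist, e_tpd, temperature),
        )
```

Because the binding energy enters exponentially, a difference of a few thousand kelvin changes the predicted residence time by many orders of magnitude at these temperatures.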
Why is it important?
[Figure, modified from ESA/Rosetta/RPC-ICA. Labels, in extraction order: "Unlikely"; "Standard UMIST database"; "3-times more likely"; "Our results".]
Conclusions
• Omni-Path performance is similar to EDR in the conventional scaling region but better in the non-scalable regime
• Combination of enhanced locality, reduced MPI ranks and FFT grouping leads to predictable scalability
• However, understanding the application and the problem size is key to scaling at large core counts
Acknowledgements
» VIPER HPC support team
» The University of Hull for funding
» DiRAC / STFC / EPCC for CPU time