Accelerator-enabled quantum chemistry: a viable path to high-throughput HPC?

David M. Benoit
E.A. Milne Centre for Astrophysics & G.W. Gray Centre for Advanced Materials
Chemistry Building, School of Physical and Mathematical Sciences
The University of Hull, Cottingham Road, Kingston upon Hull HU6 7RX, UK
[email protected] | @dbenoit1
VIPER @ Hull
Computing Insight UK 2016 | Manchester | 14th December 2016
HPC @ Hull – Implications for the institution
• No previous institutional experience in HPC
• Research-community-driven process that convinced University management
• £2.1M capital investment
• Strong partnerships with:
  – Red Oak for project management
  – Clustervision for hardware
• HPC@Hull focussing on future HPC technologies
Building VIPER in 50 days
Photo captions: Factory testing @ Clustervision; Shipping to Hull; New cooling installation; Empty room…; Compute nodes arrived!; Omni-Path installed; First compute rack; VIPER!
Making it work…
• Project management is vital
• Delivery @ Hull: 13 May 2016
• Go live: 28 June 2016
• Time from delivery to first job: 47 days
VIPER technical profile
• 5040 Intel Broadwell E5-2680 v4 (2.4 GHz) cores in 180 compute nodes
• Intel 100 Gb/s Omni-Path (x16) interconnect
• 4 x 1 TB high-memory nodes
• 2 x visualisation nodes (2 x Nvidia GeForce GTX 980 Ti per node)
• 4 x accelerator nodes (4 x Nvidia Tesla K40M GPUs per node)
• 500 TB of user storage running BeeGFS
VIPER uses Broadwell-based compute nodes
• Memory: 128 GB RAM
• Internal storage: 120 GB SSD
• Form factor: 2U chassis with 4 Intel compute modules
• Node performance (28 cores):
  – Theoretical performance: 1075.2 GF
  – Observed average performance (HPL): 1017.6 GF
• Memory bandwidth:
  – HPCC EP-STREAM triad: 17.2 GB/s
• Full system performance (180 nodes, see the check below):
  – Theoretical (100% efficiency): 194 TF
  – Observed average performance (SAT-HPL): 172 TF
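These figures are consistent with the standard peak estimate (assuming 16 double-precision FLOPs per cycle per Broadwell core, i.e. two 256-bit FMA units, an assumption not spelled out on the slide):

\[
P_\text{node} = 28 \times 2.4\,\text{GHz} \times 16\ \text{FLOP/cycle} = 1075.2\ \text{GF},
\qquad
P_\text{system} = 180 \times P_\text{node} \approx 193.5\ \text{TF} \approx 194\ \text{TF}.
\]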
VIPER runs containerised HPC
• Docker containers on each node
• Improves resilience
• Greater flexibility
• Potential to spin up/store containers for different configurations or applications
• Performance vs bare metal (1 node, a scripted comparison is sketched below):
  – HPL in Docker: 992.5 GF
  – HPL bare metal: 993.5 GF
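A minimal sketch of how such a bare-metal vs container comparison could be scripted, assuming Docker and a prebuilt single-node HPL run are available; the working directory, image name and core count below are hypothetical placeholders, not the actual VIPER setup:

import subprocess, time

WORKDIR = "/opt/hpl"                    # hypothetical directory holding xhpl and HPL.dat
HPL_CMD = ["mpirun", "-np", "28", "./xhpl"]

def timed(cmd):
    """Run a command from WORKDIR and return its wall-clock time in seconds."""
    t0 = time.time()
    subprocess.run(cmd, cwd=WORKDIR, check=True)
    return time.time() - t0

# Same binary, once on the host and once inside a container sharing the host network.
bare_metal = HPL_CMD
container = ["docker", "run", "--rm", "--network=host",
             "-v", f"{WORKDIR}:{WORKDIR}", "-w", WORKDIR,
             "hpl-benchmark:latest"] + HPL_CMD    # hypothetical image name

print("bare metal:", timed(bare_metal), "s")
print("container :", timed(container), "s")
# For the real comparison, use the GFLOPS figure printed by HPL itself rather than wall time.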
VIPER uses Omni-Path as its communication fabric
• HPCC random ring latency is low (a minimal latency probe is sketched below):
  – Omni-Path (4 nodes): 0.92 µs
  – 100 Gb/s InfiniBand: ~1.25 µs
• Application performance equal to or better than 100 Gb/s InfiniBand
• Network capacity (HPCC, 4 nodes):
  – Average G-PTRANS: 35 Gb/s
  – Average RandomAccess: 1.10 GUPS
• Still a very “new” fabric that is likely under-utilised by current applications
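The random ring figures above come from the HPCC suite, but the latency idea can be illustrated with a small point-to-point probe. A minimal mpi4py ping-pong sketch (an illustration, not the HPCC benchmark itself), assuming mpi4py and numpy are installed and the script is launched across two nodes with something like "mpirun -np 2 python pingpong.py":

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

buf = np.zeros(1, dtype='b')   # 1-byte message: exposes latency rather than bandwidth
n_iter = 10000

comm.Barrier()
t0 = MPI.Wtime()
for _ in range(n_iter):
    if rank == 0:
        comm.Send(buf, dest=1, tag=0)
        comm.Recv(buf, source=1, tag=0)
    elif rank == 1:
        comm.Recv(buf, source=0, tag=0)
        comm.Send(buf, dest=0, tag=0)
t1 = MPI.Wtime()

if rank == 0:
    # One iteration is a full round trip, so half of it approximates the one-way latency.
    print(f"one-way latency: {(t1 - t0) / n_iter / 2 * 1e6:.2f} us")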
VIPER’s 500TB parallel filesystem runs on BeeGFS
• Parallel filesystem focussing on performance, similar to Lustre
• Simplicity of the filesystem makes it easy to manage
• VIPER implements a high-resilience design:
  – 2 BeeGFS storage server nodes
  – multiple JBOD RAID6 arrays with multiple global hot spares per file server
  – each node can mount storage attached to the other node
• Performance (IOZONE, a simple single-client probe is sketched below):
  – Average read: 8.89 GB/s
  – Average write: 9.16 GB/s
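The numbers above are IOZONE averages; for a rough single-client check, a crude streaming probe can be scripted as below. The mount point and file size are hypothetical placeholders, and the read figure is only meaningful if the file exceeds the client's page cache (or caches are dropped between the two phases):

import os, time

PATH = "/beegfs/scratch/io_probe.bin"   # hypothetical BeeGFS mount point
SIZE_GB = 8
BLOCK = bytes(64 * 1024 * 1024)         # one 64 MiB block of zeros

# Sequential write
t0 = time.time()
with open(PATH, "wb") as f:
    for _ in range(SIZE_GB * 1024 // 64):
        f.write(BLOCK)
    f.flush()
    os.fsync(f.fileno())                # make sure the data actually reached the servers
write_s = time.time() - t0

# Sequential read
t0 = time.time()
with open(PATH, "rb") as f:
    while f.read(len(BLOCK)):
        pass
read_s = time.time() - t0

os.remove(PATH)
print(f"write: {SIZE_GB / write_s:.2f} GB/s   read: {SIZE_GB / read_s:.2f} GB/s")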
Our HPC support team: [email protected]
VIPER stats so far…
• Live operation: 5 months
• Completed jobs: 150,000
• CPU hours: 5.6 million
• Jobs started within 1 min: 78%
• Users: 90
Accelerators & quantum chemistry
Looking for organic molecules in space

• Identification of organic molecules in the interstellar medium relies on recognising their vibrational signatures
• This requires both lab-based measurements and theoretical models
• Accurate spectral models require large-scale quantum chemical calculations
• A pyrene model would need half a million energy evaluations (see the estimate below)
• A high-throughput approach is crucial
Images: Red Rectangle Nebula (HD 44179); Halley’s Comet (1P/Halley)
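To see where a number of that order comes from, here is a hedged back-of-the-envelope count; the two-mode-coupling expansion of the potential and the 16-point grids are illustrative assumptions, not figures from the talk. Pyrene (C16H10, 26 atoms) has 3N − 6 = 72 normal modes, so a grid-based potential energy surface needs roughly

\[
N_\text{points} \approx M\,n + \binom{M}{2}\,n^{2}
= 72 \times 16 + 2556 \times 16^{2} \approx 6.6 \times 10^{5}
\]

single-point energies (with M = 72 modes and n = 16 grid points per mode), i.e. on the order of the half a million evaluations quoted above.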
Quantum chemistry as an HPC application
• Quantum chemistry codes use linear algebra to solve the electronic Schrödinger equation (made explicit below)
• Solutions of this equation give the electronic wave function (molecular orbitals) and the total energy of the system
• Computationally intense problem that grows quickly with problem size and required accuracy
• Typical workloads require several GB of memory and runtimes of hours to days

Image: an unoccupied molecular orbital of pyrene
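To make the linear-algebra connection explicit, the simplest case is Hartree–Fock theory: expanding the molecular orbitals in a finite basis turns the electronic Schrödinger equation into a generalised eigenvalue problem (the Roothaan–Hall equations):

\[
\hat{H}_\text{el}\,\Psi = E\,\Psi
\quad\longrightarrow\quad
\mathbf{F}\mathbf{C} = \mathbf{S}\mathbf{C}\,\boldsymbol{\varepsilon},
\]

where F is the Fock matrix, S the basis-function overlap matrix, C the molecular-orbital coefficients and ε the orbital energies. Because F depends on C, the equations are solved iteratively (SCF), each iteration being dominated by dense matrix algebra; correlated methods such as CCSD(T) add tensor contractions that scale as roughly the seventh power of system size.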
CalDSu: requested number of processors reduced to: 28 ShMem 1 Linda.
PrsmSu: requested number of processors reduced to: 7 ShMem 1 Linda.
DoSDTr: NPSUse= 13
JobTyp=1 Pass 1: I= 1 to 5 NPSUse= 1 ParTrn=F ParDer=F DoDerP=T.
Errors with %Mem

• %Mem too high:
  – immediate crash with the message:
    Insufficient virtual memory
• %Mem too low or not suited to the type of calculation:
  – crash after a calculation phase, with an explicit message such as:
    Insufficient memory for …
  – reduction of the number of cores used for the calculation, signalled by the message:
    GetIJB would need an additional 45990731 words of memory to use all 32 processors.
4. Specific usage

In rare cases, use of the large-memory nodes may be necessary. These nodes are accessed with the LoadLeveler directive:

# @ as_limit = x Gb, with x = 7.0 Gb × n

In the formula used to compute %Mem, the value 3.5 must then be replaced by 7.
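As a worked example of that directive (the core count is illustrative), a job using n = 4 cores on the large-memory nodes would request

\[
x = 7.0\ \text{Gb} \times 4 = 28.0\ \text{Gb},
\]

i.e. # @ as_limit = 28.0 Gb.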
5. Performance

The following graphs show the performance of two systems as a function of the number of cores used on the Ada machine: Gaussian test case no. 397 and a zeolite supercell (MOR).

The system being treated and the type of calculation have a direct impact on performance, so it is recommended to run scaling tests to determine the maximum number of cores worth using (efficiency > 50%).
[Plots: speedup vs number of cores for both test cases, each showing the ideal curve, the 50% efficiency curve and the measured values]
Quantum chemistry on HPC…
• Codebase mainly single-core, retrofitted to be parallel
• Computational scaling far from ideal (see the Amdahl note below)
• Memory requirements often cause a bottleneck

[Plot: Gaussian09 speedup vs number of cores, against ideal speedup and 50% efficiency lines]
[Plot: ORCA speedup vs number of cores for an RI-TPSS-D3/def2-tzvpp energy calculation, against the same reference lines]
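The shape of these curves is what Amdahl's law predicts for a code with a residual serial fraction (a standard textbook relation, not an analysis presented in the talk): with a fraction f of the work parallelised over p cores,

\[
S(p) = \frac{1}{(1 - f) + f/p},
\]

so even f = 0.95 caps the speedup at 20 no matter how many cores are added, which is why the measured curves fall away from the ideal line so quickly.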
Leading scalable option for large HPC: NWChem
• Open-source quantum chemistry program
• Implements a number of very accurate wave-function solvers
• Proven scalable parallel performance
• Hardware acceleration for both GPUs and Xeon Phi
• nwchem-sw.org
Timings of pyrene CCSD(T)/cc-pVDZ total energy

[Bar chart: total CPU time (0–280 min) for 24 CPU, 24 CPU + 1xGPU, 24 CPU + 4xGPU and 24 CPU + Xeon Phi (KNC) runs on Iden – Hartree (Ivy Bridge) and Viper – Hull (Broadwell, 2.4 GHz); relative to the 24 CPU baseline the accelerated runs are 21%, 35% and 21% faster, respectively]
Accelerator-enabled part of CCSD(T) calculation

[Bar chart: non-iterative (T) CPU time (0–240 min) for 24 CPU, 24 CPU + 1xGPU, 24 CPU + 4xGPU and 24 CPU + Xeon Phi (KNC) runs on Iden – Hartree (Ivy Bridge) and Viper – Hull (Broadwell, 2.4 GHz); relative to the 24 CPU baseline the accelerated runs are 25%, 42% and 28% faster, respectively]
Computational scaling of CCSD(T) on Xeon Phi

[Plot: non-iterative (T) CPU speedup (0–14) vs number of CPU cores (one Xeon Phi per 24 cores, up to 336 cores), comparing the measured CPU speedup with the ideal speedup]

Iden – Hartree (Ivy Bridge)
Computational scaling of CCSD(T) on Nvidia K40m

[Plot: non-iterative (T) CPU speedup (0–5) vs number of CPU cores (up to 96), comparing 1xK40m per 24 cores and 4xK40m per 24 cores with the ideal speedup]

Viper – Hull (Broadwell, 2.4 GHz)
Conclusions
• VIPER is a unique 172 TF containerised HPC system that combines Broadwell cores, an Omni-Path interconnect and a BeeGFS filesystem
• Containers do not significantly impact HPC performance
• Accelerator cards show excellent computational scaling properties and significantly speed up quantum chemistry codes (20% – 40% in our tests)
Acknowledgements
• Generous time allocation on Iden at the Hartree Centre (STFC) in the framework of the Xeon Phi access programmes
• VIPER HPC support team
• The University of Hull for funding
• Technical queries about VIPER: [email protected]
THANK YOU FOR YOUR ATTENTION