Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
YIS@ORNL USA, 2008
Software Engineering,Molecular Dynamics, ChallengesMolecular Dynamics, Challenges
I.T. Todorov
Advanced Research Computing GroupComputational Science and Engineering Department
STFC Daresbury Laboratory Warrington UKSTFC Daresbury Laboratory, Warrington, UK
Software Engineering for Legacy Codes
YIS@ORNL USA, 2008
Legacy requires infrastructure, documentation, portability, ll b i i d d di d !collaborative community and dedicated support!
Software Engineering
Software engineering is an engineering discipline that is • Software engineering is an engineering discipline that is concerned with all aspects of software production.
Software engineering and computational scienceSoftware engineering and computational science
• Computational science is concerned with implementing, utilising and testing scientific theories and fundamentals numerically (complementing experiment); software engineering numerically (complementing experiment); software engineering is concerned with the practicalities of designing, developing and delivering “useful” software.
• “HPC useful” software must be able to run correctly and consistently on commodity clusters at high processor counts, have few defects (there always are), handle abnormal situation nicely, and need little installation effort.
Molecular Dynamics
YIS@ORNL USA, 2008
• Theoretical tool for modelling the detailed microscopic b h i f diff f i l di behaviour of many different types of systems, including gases, liquids, solids, surfaces and clusters.
• In an MD simulation, the classical equations of motion governing the microscopic time evolution of a many body governing the microscopic time evolution of a many body system are solved numerically, subject to the boundary conditions appropriate for the geometry or symmetry of the system.
• Can be used to monitor the microscopic mechanisms of energy and mass transfer in chemical processes, and dynamical properties such as absorption spectra, rate constants and transport properties can be calculatedtransport properties can be calculated.
• Can be employed as a means of sampling from a statistical mechanical ensemble and determining equilibrium properties. These properties include average thermodynamic quantities These properties include average thermodynamic quantities (pressure, volume, temperature, etc.), structure, and free energies along reaction paths.
DL_POLY Project Background
YIS@ORNL USA, 2008
General purpose parallel MD code to meet needs of CCP5 (academic ll b i ) d l d STFC D b L b b W S i h collaboration), developed at STFC Daresbury Laboratory by W. Smith,
T.R. Forester & I.T. Todorov, available free of charge (under licence) to University researchers
1420
885DL_POLY_3
DL_POLY_2
first release
196
e
DL_POLY_3 Development Statistics
YIS@ORNL USA, 2008
65
60
1000
] v08-Sep-07v09-Jan-08
50
55
s of
Cod
e [
v06-Mar-06
v07-Dec-06
45
50
ishe
d Li
nes
v02-Mar-04
v03-Sep-04
v05-Oct-05
3
40Pub
l
v01-Mar-03
p
v04-Mar-05
0 1 2 3 4 535
Development Time [Years]
v04 Mar 05
DL_POLY Versions
YIS@ORNL USA, 2008
Written in modularised free formatted F90 (+MPI) with rigorous d t (FORCHECK d NAGW ifi d) t l lib code syntax (FORCHECK and NAGWare verified), no external library
dependencies. Detailed, self-referencing, interactive PDF (LaTeX) user manuals.• DL_POLY_2 (version 19)
R li d D ll li i li i 30 000 – Replicated Data parallelisation, limits up to ≈30,000 atoms with good parallelisation up to 32 (system dependent) processors (running on any processor count)
– Full force field and molecular descriptionF f di i h h i id i– Free format reading with somewhat rigid semantics
• DL_POLY_3 (version 09)– Domain Decomposition (DD) parallelisation, limits up to ≈2.1×109 atoms with inherent parallelisation (to any high
2k f SPME)processor count, 2k for SPME)– Long-ranged Coulomb interactions are handled by SPM Ewald
employing 3D FFTs, for k-space evaluation, implemented to respect DD space partitioning (DaFT by Ian J. Bush)
ll f f ld d l l d b d b d– Full force field and molecular description but no rigid body description
– Free format semantically approached reading with some fail-safe features and basic reporting (but not fully fool-proofed)
Replicated Data
YIS@ORNL USA, 2008
AA BB CC DD
InitializeInitialize InitializeInitialize InitializeInitialize InitializeInitialize
ForcesForces ForcesForces ForcesForces ForcesForces
MotionMotion
SS
MotionMotion MotionMotion MotionMotion
StatisticsStatistics
SummarySummary
StatisticsStatistics
SummarySummary
StatisticsStatistics
SummarySummary
StatisticsStatistics
SummarySummarySummarySummary SummarySummary SummarySummary SummarySummary
Domain Decomposition
YIS@ORNL USA, 2008
AA BB
CC DD
8
Force Field
YIS@ORNL USA, 2008
• Constraint dynamics: constraint bonds (local) and PMF (global) constraints using iterative solvers (SHAKE for LFV and RATTLE for VV integration)
• Restrained dynamics: tethered and frozen sites
• Intra-molecular interactions: chemical bonds, bond angles, dihedral angles, improper dihedral angles, inversionsg g
• Inter-molecular interactions: van der Waals, metal (EAM, Gupta, Finnis-Sinclair, Sutton-Chen), Tersoff, three-body, four-body
• Electrostatics: SPM Ewald (3D FFTs DaFT by Ian J Bush) Force• Electrostatics: SPM Ewald (3D FFTs – DaFT by Ian J. Bush), Force-Shifted Coulomb, Reaction Field, Fennell damped FSC+RF, Distance dependent dielectric constant
• Ion polarisation via Dynamic (Adiabatic) or Relaxed shell model• Ion polarisation via Dynamic (Adiabatic) or Relaxed shell model
• External fields
Supported Molecular Entities
YIS@ORNL USA, 2008
Rigidmolecules
Point ionsand atoms
Flexiblylinked rigidmolecules
Polarisableions (core+shell) Rigid bondshell) Flexible
moleculesRigidbonds
Rigid bondlinked rigidmoleculesTAD, FES
Functionality
YIS@ORNL USA, 2008
Integration:g
couched in velocity Verlet (VV) or leapfrog Verlet (LFV) manner generating flavours of the following ensembles
• NVE
• NVT (Ekin) Evans
• NVT Andersen, Langevin, Berendsen, Nosé-Hoover, Martyna-Tuckerman-Klein
• NPT Langevin, Berendsen, Nosé-Hoover, Martyna-Tuckerman-Klein
• NσT Langevin, Berendsen, Nosé-Hoover, Martyna-Tuckerman-Klein
Options: Options:
• Statistics, Trajectories, RDF & Z-density profiles, ...
• Variable timestep, Boundary thermostats, Defects detection, infrequent k-space Ewald evaluation, ...infrequent k space Ewald evaluation, ...
• Zero K optimisation, CGM minimisation, ...
• Temperature scaling, Temperature re-Gauss, ...
DL_POLY_3 Weak Scaling on IBM p575
YIS@ORNL USA, 2008
1000 Solid Ar (32'000 atoms per CPU)
800
1000 Solid Ar (32 000 atoms per CPU) NaCl (27'000 ions per CPU) SPC Water (20'736 ions per CPU)
33 million atomst parallelisa
tion
60028 million atomsperfe
ct p
d G
ain
400 21 million atoms
Spe
ed
0
200
max load 700'000 atoms per 1GB/CPUmax load 220'000 ions per 1GB/CPUmax load 210'000 ions per 1GB/CPU
good parallelisation
0 200 400 600 800 1000
Processor Count
Proof of Concept on IBM p575
YIS@ORNL USA, 2008
300,763,000 NaCl with full SPME electrostatics evaluation on 1024 CPU cores
2005 2008
Start-up time ≈ 1 hour (40 min)
Timestep time ≈ 68 seconds (~-30%)Timestep time 68 seconds ( 30%)
FFT evaluation ≈ 55 seconds (~-15%)
In theory the system can be seen by the eye Although you would In theory ,the system can be seen by the eye. Although you would need a very good microscope – the MD cell size for this system is 2μm along the side and as the wavelength of the visible light is 0.5μm so it should be theoretically possible.
Benchmarking BG/L Jülich, 2006
YIS@ORNL USA, 2008
16000 Perfect
14000
16000 Perfect MD step total Link cells van der WaalsE ld l
10000
12000
Gai
n
Ewald real Ewald k-space
6000
8000
Spe
ed
2000
4000
14.6 million particle Gd2Zr2O7 system
2000 4000 6000 8000 10000 12000 14000 16000
Processor count
Radiation Damage Modelling
YIS@ORNL USA, 2008
Atomistic Simulations of Resistance to Amorphization
b R di i D
Simulation of Radiation Damage in Gadolinuim
P hl by Radiation Damage
Kostya Trachenko UoC
Pyrochlores
Ilian Todorov DL/UoCKostya Trachenko UoC
Ilian Todorov UoC/DL
Martin Dove UoC
Ilian Todorov DL/UoC
Neil Allan UoB
John Harding UoLMartin Dove UoC
Emilio Artacho UoC
William Smith DL
John Harding UoL
John Purton DL
William Smith DL
Radiation Damage Motivation
YIS@ORNL USA, 2008
Interim storage ofInterim storage of processed nuclear waste at Sellafield
• WE ARE TO GO NUCLEAR – RISK BALANCED BY SAFETY
• DON’T FORGET SURPLUS PLUTONIUM
• TRADITIONAL GLASS MADE WATE FORMS ARE BELIEVED TO BE VERY UNRELIABLE IN THE LONG TERM
• HAUL STORAGE IS A POTENTIAL RISK AND BECOMES EXPENSIVEHAUL STORAGE IS A POTENTIAL RISK AND BECOMES EXPENSIVE
• BNFL/NEXUS HAS BEEN SUGGESTED AND NOW RECOMMENDS CRYSTALLINE CERAMIC OXIDES AS A LONG-TERM WASTE FORM SOLUTION
Types of Relaxation and Timescales
YIS@ORNL USA, 2008
• Elastic relaxation - reversible
• Relaxation and recovery (or non• Relaxation and recovery (or non-recovery) of the true structural damage
• Both happen on the few ps timescale• Both happen on the few ps timescale
17 PHYSICAL REVIEW B 73, 174207 2006
Biochemical Modelling - I
YIS@ORNL USA, 2008 S.L. Chan Canada, P.-L. Chau France/UK
Biochemical Modelling - II
YIS@ORNL USA, 2008
Biochemical Modelling - III
YIS@ORNL USA, 2008
Biochemical Modelling - IV
YIS@ORNL USA, 2008
Acknowledgements
YIS@ORNL USA, 2008
• Bill Smith, Mike Ashworth – STFC, UK
• Neil Allan – University of Bristol, UKy
• Ian Bush – NAG Ltd., UK
• P.-L. Chau – Pasteur Institute, France
• Kostya Trachenko University of Cambridge UK• Kostya Trachenko – University of Cambridge, UK
http://www.ccp5.ac.uk/DL_POLY/
HPC Changes Lead to Challenges
YIS@ORNL USA, 2008
Change of components’ performance in commodity clustersChange of components performance in commodity clusters
• INTEL PIII 0.8 GHz 0.8 GF (year 2000)
• INTEL Xeon DC Woodcrest 3.0 GHz 24.0 GF (year 2008)
• AMD64 DC Opteron 2 8 GHz 5 2 GF (year 2008)• AMD64 DC Opteron 2.8 GHz 5.2 GF (year 2008)
~ 15-30 fold performance per core increase in 8 years, which follows Moore’s Law.
• Myrinet 18 μs 0 14 GB/s (year 2000)• Myrinet 18 μs 0.14 GB/s (year 2000)
• Infiniband 2 μs 0.90 GB/s (year 2008)
• Sea Star 2+ 4 μs 2.15 GB/s (year 2008)
6 15 f ld i i P2P i h h I~ 6-15 fold improvement in P2P with the Interconnect.
Memory and RAID card performance have also increased by a factor of ~ 8-10 over the same period of time.
A f l i h di l Appearance of: multi-core, heterogeneous commodity clusters (IBM Roadrunner=DC Opteron+PowerXCell+IB), GPGPU, Clearspeed, FPGAs.
HPC Challenges Lead to Changes
YIS@ORNL USA, 2008
• Raw computational power has increased more than any other system parameters thus application scaling and comparison to system parameters – thus application scaling and comparison to old benchmarks is more difficult today (if valid at all).
• With memory and I/O speed performance lagging much behind that of power per core and interconnect speed, clearly there is a that of power per core and interconnect speed, clearly there is a new great contention in the HPC world. (MPI-I/O, NetCDF, HDF5)
• Not all computational problems can be efficiently memory distributed as even DD MD has its limits. Regardless, scaling and comparison is still done only on algorithm performance without I/O included.
• On average the product MD (problem size x simulated time x range) have increased 1000 over the last 8 years Therefore range) have increased ~ 1000 over the last 8 years. Therefore, quick access storage and visualisation under contention too.
• Not only have architectures become multi-core but also heterogeneous commodity clusters have appeared Software heterogeneous commodity clusters have appeared. Software developers are under stress to explore multi-level, hierarchical-heuristic parallelism (as well as memory on disk) in order to take advantage.