47
NIH BTRC for Macromolecular Modeling and Bioinformatics http://www.ks.uiuc.edu/ Beckman Institute, UIUC Petascale Molecular Dynamics with NAMD James Phillips Beckman Institute, University of Illinois http://www.ks.uiuc.edu/Research/namd/ Extreme Scaling Workshop 2012

Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

  • Upload
    buianh

  • View
    220

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Petascale Molecular Dynamics with NAMD

James PhillipsBeckman Institute, University of Illinois

http://www.ks.uiuc.edu/Research/namd/

Extreme Scaling Workshop 2012

Page 2: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUCBTRC for Macromolecular Modeling and Bioinformaticshttp://www.ks.uiuc.edu/

Beckman Institute, UIUC

Beckman Institute

University of Illinois at

Urbana-Champaign

Page 3: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Physics of in vivo Molecular SystemsBiomolecular interactions span many orders of magnitude in space and time.

Center software provides multi-scale computational modeling.femtoseconds hours

Ångstrom microns

!"#!"#$%&'(&!)*&(+',- $"%&#!./(,0$&12'34-'$&

+(-5/#!/64(0$&

Page 4: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Collaborative Driving Projects

1. Ribosome

R. Beckmann (U. Munich)

J. Frank (Columbia U.)

T. Ha(UIUC)

K. Fredrick (Ohio state U.)

R. Gonzalez (Columbia U.)

2. Blood Coagulation

Factors

J. Morrissey (UIUC)

S. Sligar (UIUC)

C. Rienstra (UIUC)

G. Gilbert (Harvard)

3. Whole Cell

Behavior

W. Baumeister (MPI Biochem.)

J. Xiao (Johns Hopkins U.)

C.N. Hunter (U. Sheffield)

N. Price (U. Washington)

4. Biosensors

R. Bashir (UIUC)

J. Gundlach (U. Washington)

G. Timp (U. Notre Dame)

M. Wanunu (Northeastern U.)

L. Liu (UIUC)

5. Viral Infection

Process

J. Hogle (Harvard U.)

P. Ortoleva (Indiana U.)

A. Gronenborn (U. Pittsburgh)

6. IntegrinT. Ha (UIUC)

T. Springer (Harvard U.)

7. Membrane

Transporters

H. Mchaourab (Vanderbilt U.)

R. Nakamoto (U. Virginia)

D.-N. Wang (New York U.)

H. Weinstein (Cornell U.)

Page 5: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Ribosome Driving Project

Target of over 50%

of antibiotics

Many related diseases. e.g. Alzheimer’s

disease due to dysfunctional ribosome

(J. Neuroscience 2005, 25:9171-9175)

Localization failure of nascent chain

lead to neurodegenerative disease

(Mol. Bio. of the Cell 2005, 16:279-291)

Page 6: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUCBTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/ Beckman Institute, UIUC

Viral Infection Driving Projects

Poliovirus Human Immunodeficiency Virus 1

representative of enveloped viruses

need for novel HIV therapies

Poliovirus is a model system for

understanding how non-enveloped

viruses bind to and enter a host cell.

Knowledge of HIV capsid atomic structure

may reveal disassembly mechanism and

guide novel therapies.

Briggs et al. Structure (2006) 14:15-20.

Page 7: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Molecular Mechanics Force Field

Page 8: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Classical Molecular Dynamics

Newton’s equation represents a set of N second order differential

equations which are solved numerically via the Verlet integrator

at discrete time steps to determine the trajectory of each atom.

Energy function:

used to determine the force on each atom:

Small terms added to control temperature and pressure.

Page 9: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Biomolecular Time Scales

Motion Time Scale(sec)

Bond stretching 10-14 to 10-13

Elastic vibrations 10-12 to 10-11

Rotations of surfacesidechains

10-11 to 10-10

Hinge bending 10-11 to 10-7

Rotation of buried sidechains

10-4 to 1 sec

Allosteric transistions 10-5 to 1 sec

Local denaturations 10-5 to 10 sec

Max Timestep: 1 fs

Page 10: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Sizes of Simulations Over Time

BPTI

3K atoms

Estrogen Receptor

36K atoms (1996)

ATP Synthase

327K atoms

(2001)

Page 11: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Our Solution: Parallel Computing

HP 735 cluster

14 processors

(1994)

SGI Origin 2000

128 processors (1997)

PSC Lemieux AlphaServer SC

3000 processors (2002)

Page 12: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Parallel Programming LabUniversity of Illinois at Urbana-Champaign

Siebel Center for Computer Science

http://charm.cs.illinois.edu/

Page 13: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Parallel Objects,

Adaptive Runtime System

Libraries and Tools

The enabling CS technology of parallel objects and intelligent

Runtime systems has led to several collaborative applications in CSE

Crack Propagation

Space-time meshes

Computational Cosmology

Rocket Simulation

Protein Folding

Dendritic Growth

Quantum Chemistry

(QM/MM)

Develop abstractions in context of full-scale applications

NAMD: Molecular Dynamics

STM virus simulation

Page 14: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

NAMD: Scalable Molecular Dynamics51,000 Users, 2900 Citations

Blue Waters Target Application

2002 Gordon Bell Award

Illinois Petascale Computing Facility

PSC LemieuxATP synthase Computational Biophysics Summer School

GPU Acceleration

NCSA LincolnNVIDIA Tesla

Page 15: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Computing research drives NAMD

• Parallel Programming Lab – (Laxmikant Kale)

– Charm++ is an Adaptive Parallel Runtime System

• Gordon Bell Prize 2002

• Three publications at Supercomputing 2011

• Four panels discussing the future necessity of our ideas

• 20 years of co-design for NAMD performance, portability,

and productivity, adaptivity

• Recent example: Implicit Solvent deployed in NAMD by 1 RA

in 6 months. 4x more scalable than similar codes

• Yesterday’s supercomputer is tomorrow’s desktop

Page 16: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

NAMD 2.8 Highly Scalable Implicit Solvent Model

NAMD Implicit Solvent is 4x more scalable than

Traditional Implicit Solvent for all system sizes,

implemented by one GRA in 6 months.

traditional

NAMD

138,000 Atoms

65M Interactions

Processors

Spee

d [

pai

rs/s

ec]

27,600 Atoms

29,500 Atoms

2,016 Atoms

2,412 Atoms

149,000 Atoms

Tanner et al., J. Chem. Theory and Comp., 7:3635-3642, 2011

Page 17: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Charm++ Used by NAMD

• Parallel C++ with data driven objects.

• Asynchronous method invocation.

• Prioritized scheduling of messages/execution.

• Measurement-based load balancing.

• Portable messaging layer.

Page 18: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

• Spatially decompose

data and communication.

• Separate but related

work decomposition.

• “Compute objects”

facilitate iterative,

measurement-based load

balancing system.

NAMD Hybrid Decomposition

Kale et al., J. Comp. Phys. 151:283-312, 1999.

Page 19: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

NAMD Overlapping Execution

Objects are assigned to processors and queued as data arrives.

Phillips et al., SC2002.

Page 20: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Larger machines

enable larger

simulations

Page 21: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

NAMD impact is broad and deep

• Comprehensive, industrial-quality software– Integrated with VMD for simulation setup and analysis

– Portable extensibility through Tcl scripts (also used in VMD)

– Consistent user experience from laptop to supercomputer

• Large user base – 51,000 users– 9,100 (18%) are NIH-funded; many in other countries

– 14,100 have downloaded more than one version

• Leading-edge simulations– “most-used software” on NICS Cray XT5 (largest NSF machine)

– “by far the most used MD package” at TACC (2nd and 3rd largest)

– NCSA Blue Waters early science projects and acceptance test

– Argonne Blue Gene/Q early science project

Page 22: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Outside researchers choose NAMD and succeedCorringer, et al., Nature, 2011

• M. Koeksal, et al., Taxadiene synthase structure and evolution of modular architecture in terpene biosynthesis. (2011)

• C.-C. Su, et al., Crystal structure of the CusBA heavy-metal efflux complex of Escherichia coli. (2011)

• D. Slade, et al., The structure and catalytic mechanism of a poly(ADP-ribose) glycohydrolase. (2011)

• F. Rose, et al., Mechanism of copper(II)-induced misfolding of Parkinson’s disease protein. (2011)

• L. G. Cuello, et al., Structural basis for the coupling between activation and inactivation gates in K(+) channels. (2010)

• S. Dang, et al.,, Structure of a fucose transporter in an outward-open conformation. (2010)

• F. Long, et al., Crystal structures of the CusA efflux pump suggest methionine-mediated metal transport. (2010)

• R. H. P. Law, et al., The structural basis for membrane binding and pore formation by lymphocyte perforin. (2010)

• P. Dalhaimer and T. D. Pollard, Molecular Dynamics Simulations of Arp2/3 Complex Activation. (2010)

• J. A. Tainer, et al., Recognition of the Ring-Opened State of Proliferating Cell Nuclear Antigen by Replication Factor C

Promotes Eukaryotic Clamp-Loading. (2010)

• D. Krepkiy, et al.,, Structure and hydration of membranes embedded with voltage-sensing domains. (2009)

• N. Yeung, et al.,, Rational design of a structural and functional nitric oxide reductase. (2009)

• Z. Xia, et al., Recognition Mechanism of siRNA by Viral p19 Suppressor of RNA Silencing: A Molecular Dynamics Study. (2009)

Recent NAMD Simulations in Nature Bare actin Cofilactin

Voth, et al., PNAS, 2010

180K-atom 30 ns study of anesthetic binding to

bacterial ligand-gated ion channel provided

“complementary interpretations…that could not

have been deduced from the static structure alone.”

500K-atom 500 ns investigation of effect

of actin depolymerization factor/cofilin on

mechanical properties and conformational

dynamics of actin filament.

Bound Propofol Anesthetic

2100 external citations since 2007

Page 23: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Our Goal: Practical Acceleration

• Broadly applicable to scientific computing

– Programmable by domain scientists

– Scalable from small to large machines

• Broadly available to researchers

– Price driven by commodity market

– Low burden on system administration

• Sustainable performance advantage

– Performance driven by Moore’s law

– Stable market and supply chain

Page 24: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

• Outlook in 2005-2006:

– FPGA reconfigurable computing (with NCSA)

• Difficult to program, slow floating point, expensive

– Cell processor (NCSA hardware)

• Relatively easy to program, expensive

– ClearSpeed (direct contact with company)

• Limited memory and memory bandwidth, expensive

– MDGRAPE

• Inflexible and expensive

– Graphics processor (GPU)

• Program must be expressed as graphics operations

Acceleration Options for NAMD

Page 25: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

CUDA: Practical Performance

• CUDA makes GPU acceleration usable:

– Developed and supported by NVIDIA.

– No masquerading as graphics rendering.

– New shared memory and synchronization.

– No OpenGL or display device hassles.

– Multiple processes per card (or vice versa).

• Resource and collaborators make it useful:

– Experience from VMD development

– David Kirk (Chief Scientist, NVIDIA)

– Wen-mei Hwu (ECE Professor, UIUC)

November 2006: NVIDIA announces CUDA for G80 GPU.

Fun to program (and drive)

Stone et al., J. Comp. Chem. 28:2618-2640, 2007.

Page 26: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

NAMD Overlapping Execution

Objects are assigned to processors and queued as data arrives.

Phillips et al., SC2002.

Offload to GPU

Page 27: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Nonbonded Forces on CUDA GPU• Start with most expensive calculation: direct nonbonded interactions.

• Decompose work into pairs of patches, identical to NAMD structure.

• GPU hardware assigns patch-pairs to multiprocessors dynamically.

16kB Shared MemoryPatch A Coordinates & Parameters

32kB RegistersPatch B Coords, Params, & Forces

Texture UnitForce Table

Interpolation

ConstantsExclusions

8kB cache8kB cache

32-way SIMD Multiprocessor

32-256 multiplexed threads

768 MB Main Memory, no cache, 300+ cycle latency

Force computation on single multiprocessor (GeForce 8800 GTX has 16)

Stone et al., J. Comp. Chem. 28:2618-2640, 2007.

Page 28: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

texture<float4> force_table;

__constant__ unsigned int exclusions[];

__shared__ atom jatom[];

atom iatom; // per-thread atom, stored in registers

float4 iforce; // per-thread force, stored in registers

for ( int j = 0; j < jatom_count; ++j ) {

float dx = jatom[j].x - iatom.x; float dy = jatom[j].y - iatom.y; float dz = jatom[j].z - iatom.z;

float r2 = dx*dx + dy*dy + dz*dz;

if ( r2 < cutoff2 ) {

float4 ft = texfetch(force_table, 1.f/sqrt(r2));

bool excluded = false;

int indexdiff = iatom.index - jatom[j].index;

if ( abs(indexdiff) <= (int) jatom[j].excl_maxdiff ) {

indexdiff += jatom[j].excl_index;

excluded = ((exclusions[indexdiff>>5] & (1<<(indexdiff&31))) != 0);

}

float f = iatom.half_sigma + jatom[j].half_sigma; // sigma

f *= f*f; // sigma^3

f *= f; // sigma^6

f *= ( f * ft.x + ft.y ); // sigma^12 * fi.x - sigma^6 * fi.y

f *= iatom.sqrt_epsilon * jatom[j].sqrt_epsilon;

float qq = iatom.charge * jatom[j].charge;

if ( excluded ) { f = qq * ft.w; } // PME correction

else { f += qq * ft.z; } // Coulomb

iforce.x += dx * f; iforce.y += dy * f; iforce.z += dz * f;

iforce.w += 1.f; // interaction count or energy

}

}Stone et al., J. Comp. Chem. 28:2618-2640, 2007.

Nonbonded Forces

CUDA Code

Force Interpolation

Exclusions

Parameters

Accumulation

Page 29: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Overlapping GPU and CPU

with Communication

Remote Force Local ForceGPU

CPU

Other Nodes/Processes

LocalRemote

x

f f

f

f

Localx

x

Update

One Timestep

x

Phillips et al., SC2008

Page 30: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Actual Timelines from NAMDGenerated using Charm++ tool “Projections” http://charm.cs.uiuc.edu/

Remote Force Local Force

x

f f

x

GPU

CPU

f

f

x

x

Page 31: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Tsubame (Tokyo) Application of GPU Accelerated

NAMD (fall 2011)

8400

cores

AFM image of flat chromatophore

membrane (Scheuring 2009)

20 million atom proteins + membrane

Page 32: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

NAMD 2.9 on Keeneland ID(12 Intel cores and 3 Telsa M2070 GPUs per node)

nodes

STMV (1M atoms) s/step

1.62 0.82

GPUs = 8x nodes

Page 33: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Trends Affecting Performance

• GPU performance increasing

– Performance limit will be code on CPU

– Most highly tuned CPU code moved to GPU

– Remaining CPU code is also less efficient

– Therefore CPU must run serial code well

• CPU serial performance static

• CPU core counts increasing

Page 34: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Suggested Strategy

• Focus on CPU-side code

– Port to GPU or optimize/paralellize on CPU

– Stream results off GPU to increase overlap

– Use CPUs with best single-thread performance

• Focus on communication

– Reduce communication overhead on CPU

– Deal with multithreaded MPI issues

– General parallel scalablity improvements

Page 35: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Streaming GPU Results to CPU

• Allows incremental results from a single grid to beprocessed on CPU before grid finishes on GPU

• GPU side:

– Write results to host-mapped memory

– __threadfence_system() and __syncthreads()

– Atomic increment for next output queue location

– Write result index to output queue

• CPU side:

– Poll end of output queue (int array) in host memory

Page 36: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Cray Gemini Optimization

• The new Cray machine has a better network (called Gemini)

• MPI-based NAMD scaled poorly

• BTRC implemented direct port of Charm++ to Cray

• uGNI is the lowest level interface for the Cray Gemini network

– Removes MPI from NAMD call stack

Gemini provides at least 2x increase

in usable nodes for strong scaling

ApoA1 (92,000 atoms) on Cray Blue Waters prototype

ns/

day

nodes

Page 37: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

100M Atoms on Titan vs Jaguar

5x5x4 STMV grid

PME every 4 steps

New Optimizations

Charm++ uGNI port

Node-aware optimizations

Priority messages in critical path

Persistent FFT messages for PME

Shared-memory parallelism for PME

Accepted to SC12

Page 38: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

1M Atom Virus on TitanDev GPU

Single STMV

PME every 4 steps

Page 39: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

TitanDev Strong Scaling4 timesteps = 4982ms = 1.24 s / step

1.83 s for PME step1.01 s for non-PME step

100 stmv

32 nodes

4 timesteps = 336ms = 0.084 s / step

0.185 s for PME step0.046 s for non-PME step

100 stmv

768 nodes

comm

thread

comm

Page 40: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

TitanDev Weak Scaling4 timesteps = 231 ms= 0.057 s / step

0.076 s for PME step 0.049s for non-PME step

4 stmv

30 nodes

4 timesteps = 336ms = 0.084 s / step

0.185 s for PME step0.046 s for non-PME step

100 stmv

768 nodes

comm

thread

comm

Page 41: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

PME delays – tracing data needed for one ungrid calculation

4 timesteps = 336ms = 0.084 s / step

0.185 s for PME step0.046 s for non-PME step

0.042s delay

for 10 KB message

0.046s delay

for 100KB message

comm

comm

comm

comm

comm

commcomm

Page 42: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Strategy to improve scalability

• Fix issues with communication

– 23x16x2 topology limits bisection bandwidth

• Coarsen PME grid

– Reduces communication

– Increases short-range work for GPU

• Push PME work to the GPU

– Charge gridding overlaps coordinate receive

• Start GPU work sooner

– Currently waiting for all coordinate receives

– Use streams to launch work as data arrives

Page 43: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Multilevel Summation Method

• Fast algorithm for N-body electrostatics

• Calculates sum of smoothed pairwise potentials interpolated from ahierarchal nesting of grids

• Advantages over PME (particle-mesh Ewald) and/or FMM (fastmultipole method):

– Algorithm has linear time complexity

– Allows non-periodic or periodic boundaries

– Produces continuous forces for dynamics (advantage over FMM)

– Avoids 3D FFTs for better parallel scaling (advantage over PME)

– Permits polynomial splittings (no erfc() evaluation, as used by PME)

– Spatial separation allows use of multiple time steps

– Extends to other types of pairwise interactions (e.g., dispersion forces)

R. Skeel, I. Tezcan, D. Hardy. J. Comp. Chem. 23:673-684, 2002.

Page 44: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

MSM Main Ideas

=

+

+

atoms

h-grid

2h-grid

Split the 1/r potential Interpolate the smoothed potentials

a 2a

.

.

.

.

.

.

• Split the 1/r potential into a short-range cutoff part plus smoothed parts that aresuccessively more slowly varying. All but the top level potential are cut off.

• Smoothed potentials are interpolated from successively coarser grids.

• Finest grid spacing h and smallest cutoff distance a are doubled at each successivelevel.

1/r

r0

Page 45: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

MSM Calculation

forceexact

short-rangepart

interpolatedlong-range

part

+=

Computational Steps

short-range cutoff

interpolationanterpolation

h-grid cutoff

2h-grid cutoff

4h-grid

restriction

restriction

prolongation

prolongationlong-rangeparts

positions,charges

potentials,forces

Page 46: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

NAMD 2.9 Scalable Replica Exchange• Easier to use and more efficient:

– Eliminates complex, machine-specific launch scripts

– Scalable pair-wise communication between replicas

– Fast communication via high-speed network

• Basis for many enhanced sampling methods:

– Parallel tempering (temperature exchange)

– Umbrella sampling for free-energy calculations

– Hamiltonian exchange (alchemical or conformational)

– Finite Temperature String method

– Nudged elastic band

• Great power and flexibility:

– Enables petascale simulations of modestly sized systems

– Leverages features of Collective Variables module

– Tcl scripts can be highly customized and extended

Released in

NAMD 2.9

Page 47: Petascale Molecular Dynamics with NAMD - Home - XSEDE · Petascale Molecular Dynamics with NAMD ... S. Sligar (UIUC) C. Rienstra ... Develop abstractions in context of full-scale

NIH BTRC for Macromolecular Modeling and Bioinformatics

http://www.ks.uiuc.edu/

Beckman Institute, UIUC

Thanks to: NIH, NSF, DOE,

NVIDIA (Sarah Tariq, Sky Wu,

Justin Luitjens, Nikolai Sakharnykh),

Cray (Sarah Anderson, Ryan Olson),

PPL (Eric Bohm, Yanhua Sun, Gengbin Zheng)

and 17 years of NAMD and Charm++

developers and users.

James PhillipsBeckman Institute, University of Illinois

http://www.ks.uiuc.edu/Research/namd/