
Page 1: Scalability and interoperable libraries in NAMD

Scalability and interoperable libraries in NAMD

Laxmikant (Sanjay) Kale
Theoretical Biophysics group

and

Department of Computer Science

University of Illinois at Urbana-Champaign

Page 2: Scalability and interoperable libraries in NAMD

Contributors

• PIs:
  – Laxmikant Kale, Klaus Schulten, Robert Skeel
• NAMD 1:
  – Robert Brunner, Andrew Dalke, Attila Gursoy, Bill Humphrey, Mark Nelson
• NAMD2:
  – M. Bhandarkar, R. Brunner, A. Gursoy, J. Phillips, N. Krawetz, A. Shinozaki, K. Varadarajan, Gengbin Zheng, ...

Page 3: Scalability and interoperable libraries in NAMD

Middle layers

Applications
     |
"Middle Layers": Languages, Tools, Libraries
     |
Parallel Machines

Page 4: Scalability and interoperable libraries in NAMD

Molecular Dynamics

• Collection of [charged] atoms, with bonds
• Newtonian mechanics
• At each time-step:
  – Calculate forces on each atom
    • bonded
    • non-bonded: electrostatic and van der Waals
  – Calculate velocities and advance positions
• 1 femtosecond time-step, millions needed!
• Thousands to hundreds of thousands of atoms (1,000 - 100,000)
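As a concrete illustration, a minimal sketch of that time-step loop in plain C++ (the Atom layout and the computeBonded/computeNonbonded kernels are hypothetical stand-ins, not NAMD's actual code):

    #include <vector>

    struct Vec3 { double x = 0, y = 0, z = 0; };
    struct Atom { Vec3 pos, vel, force; double mass = 1, charge = 0; };

    // Hypothetical force kernels; these dominate the cost in practice.
    void computeBonded(std::vector<Atom>& atoms);     // bonds, angles, dihedrals
    void computeNonbonded(std::vector<Atom>& atoms);  // electrostatic + van der Waals

    // One integration step of length dt (~1 femtosecond).
    void step(std::vector<Atom>& atoms, double dt) {
        for (Atom& a : atoms) a.force = {};      // clear force accumulators
        computeBonded(atoms);                    // calculate forces on each atom
        computeNonbonded(atoms);
        for (Atom& a : atoms) {                  // calculate velocities, advance positions
            a.vel.x += dt * a.force.x / a.mass;
            a.vel.y += dt * a.force.y / a.mass;
            a.vel.z += dt * a.force.z / a.mass;
            a.pos.x += dt * a.vel.x;
            a.pos.y += dt * a.vel.y;
            a.pos.z += dt * a.vel.z;
        }
    }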

Page 5: Scalability and interoperable libraries in NAMD

Cut-off radius

• Use of a cutoff radius to reduce work
  – 8 - 14 Å
  – Faraway charges ignored!
• 80-95% of the work is non-bonded force computation
• Some simulations need faraway contributions
  – Periodic systems: Ewald, Particle-Mesh Ewald (PME)
  – Aperiodic systems: FMA
• Even so, cutoff-based computations are important:
  – near-atom calculations are part of the above
  – multiple time-stepping is used: k cutoff-only steps, then 1 PME/FMA step
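A sketch of both ideas, reusing the Atom type from the earlier sketch (addPairForce, addLongRangeForces, and integrate are hypothetical):

    #include <cstddef>
    #include <vector>

    void addPairForce(Atom& a, Atom& b, double r2);     // hypothetical kernel
    void addLongRangeForces(std::vector<Atom>& atoms);  // PME or FMA
    void integrate(std::vector<Atom>& atoms, double dt);

    // Pairs beyond the cutoff rc are ignored; compare squared distances
    // to avoid a sqrt per pair.
    void computeCutoffForces(std::vector<Atom>& atoms, double rc) {
        const double rc2 = rc * rc;
        for (std::size_t i = 0; i < atoms.size(); ++i)
            for (std::size_t j = i + 1; j < atoms.size(); ++j) {
                double dx = atoms[j].pos.x - atoms[i].pos.x;
                double dy = atoms[j].pos.y - atoms[i].pos.y;
                double dz = atoms[j].pos.z - atoms[i].pos.z;
                double r2 = dx * dx + dy * dy + dz * dz;
                if (r2 <= rc2) addPairForce(atoms[i], atoms[j], r2);
            }
    }

    // Multiple time-stepping: k cheap cutoff steps per PME/FMA step.
    void simulate(std::vector<Atom>& atoms, double dt, double rc,
                  int steps, int k) {
        for (int s = 0; s < steps; ++s) {
            computeCutoffForces(atoms, rc);
            if (s % k == 0) addLongRangeForces(atoms);
            integrate(atoms, dt);
        }
    }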

Page 6: Scalability and interoperable libraries in NAMD

Scalability

• The program should scale up to use a large number of processors.
  – But what does that mean?
• An individual simulation isn't truly scalable
• Better definition of scalability:
  – If I double the number of processors, I should be able to retain parallel efficiency by increasing the problem size

Page 7: Scalability and interoperable libraries in NAMD

Isoefficiency

• Quantify scalability
  – (work of Vipin Kumar, U. Minnesota)
• How much increase in problem size is needed to retain the same efficiency on a larger machine?
• Efficiency: sequential time / (P · parallel time)
  – parallel time = computation + communication + idle time
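Stated symbolically (a sketch of the standard formulation, with T_seq and T_par the sequential and parallel times for problem size N on P processors):

    E(N, P) \;=\; \frac{T_{\mathrm{seq}}(N)}{P \, T_{\mathrm{par}}(N, P)},
    \qquad
    \text{isoefficiency: the growth of } N(P) \text{ required to keep } E(N(P), P) \text{ constant.}

The slower N(P) must grow, the more scalable the algorithm.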

Page 8: Scalability and interoperable libraries in NAMD

Traditional Approaches

• Replicated Data:
  – All atom coordinates stored on each processor
  – Non-bonded forces distributed evenly
  – Analysis: assume N atoms, P processors
    • Computation: O(N/P)
    • Communication: O(N log P)
    • Communication/Computation ratio: O(P log P)
• Fraction of communication increases with the number of processors, independent of problem size!
  – So, not scalable by this definition
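The ratio quoted above follows directly from those two terms:

    \frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
      \;=\; \frac{O(N \log P)}{O(N/P)}
      \;=\; O(P \log P)

Since N cancels, growing the problem size cannot restore efficiency: the communication fraction depends on P alone.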

Page 9: Scalability and interoperable libraries in NAMD

Atom decomposition

• Partition the atoms array across processors
  – Nearby atoms may not be on the same processor
  – Communication: O(N) per processor
  – Communication/Computation: O(N)/(N/P) = O(P)
  – Again, not scalable by our definition

Page 10: Scalability and interoperable libraries in NAMD

Force Decomposition

• Distribute the force matrix to processors
  – Matrix is sparse, non-uniform
  – Each processor has one block
  – Communication: O(N/√P)
  – Ratio: O(√P)
• Better scalability in practice
  – (can use 100+ processors)
  – Plimpton:
  – Hwang, Saltz, et al:
    • 6% on 32 PEs, 36% on 128 processors
  – Yet not scalable in the sense defined here!
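The counts above follow if, as in Plimpton's scheme, each processor owns an (N/√P) × (N/√P) block of the force matrix and must receive the two corresponding strips of coordinates:

    T_{\mathrm{comm}} = O\!\left(\tfrac{N}{\sqrt{P}}\right),
    \qquad
    \frac{T_{\mathrm{comm}}}{T_{\mathrm{comp}}}
      = \frac{N/\sqrt{P}}{N/P}
      = O\!\left(\sqrt{P}\right)

The ratio still grows with P, just more slowly than before, which matches the observation: usable at 100+ processors, yet not isoefficient.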

Page 11: Scalability and interoperable libraries in NAMD

Spatial Decomposition

• Allocate close-by atoms to the same processor
• Three variations possible:
  – Partitioning into P boxes, 1 per processor
    • Good scalability, but hard to implement
  – Partitioning into fixed-size boxes, each a little larger than the cutoff distance
  – Partitioning into smaller boxes
• Communication: O(N/P)
  – so, scalable in principle
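A sketch of the second variation: with box edges at least the cutoff distance, every within-cutoff pair falls in the same box or one of its 26 neighbors. The PatchGrid type is invented for illustration:

    #include <array>
    #include <cmath>

    struct PatchGrid {
        double origin[3];  // corner of the simulation box
        double edge;       // box edge, a little larger than the cutoff

        // Map a coordinate to the (integer) index of its patch.
        std::array<int, 3> patchOf(const double pos[3]) const {
            std::array<int, 3> idx;
            for (int d = 0; d < 3; ++d)
                idx[d] = static_cast<int>(std::floor((pos[d] - origin[d]) / edge));
            return idx;
        }
    };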

Page 12: Scalability and interoperable libraries in NAMD

Spatial Decomposition in NAMD

• NAMD 1 used spatial decomposition
• Good theoretical isoefficiency, but load balancing problems for a fixed-size system
• For midsize systems, got good speedups up to 16 processors...
• Use the symmetry of Newton's 3rd law to facilitate load balancing

Page 13: Scalability and interoperable libraries in NAMD

Spatial Decomposition

But the load balancing problems are still severe:

Page 14: Scalability and interoperable libraries in NAMD

Page 15: Scalability and interoperable libraries in NAMD

FD + SD

• Now, we have many more objects to load balance:
  – Each diamond can be assigned to any processor
  – Number of diamonds (3D): 14 · number of patches
    • (each patch computes with half of its 26 neighbors, plus itself)

Page 16: Scalability and interoperable libraries in NAMD

Bond Forces

• Multiple types of forces:
  – Bonds (2 atoms), angles (3), dihedrals (4), ...
  – Luckily, each involves atoms in neighboring patches only
• Straightforward implementation:
  – Send message to all neighbors,
  – receive forces from them
  – 26 × 2 messages per patch!

Page 17: Scalability and interoperable libraries in NAMD

Bonded Forces

• Assume one patch per processor:
  – an angle force involving atoms in patches
    • (x1,y1,z1), (x2,y2,z2), (x3,y3,z3)
  – is calculated in patch (max{xi}, max{yi}, max{zi})

[Figure: an angle force spanning atoms in patches labeled A, B, and C]
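A sketch of that rule, using patch indices like those produced by the grid sketch earlier; picking the componentwise maximum gives every bonded term exactly one owner:

    #include <algorithm>
    #include <array>

    using PatchIdx = std::array<int, 3>;

    // Owner of an angle term spanning patches a, b, c: the patch at
    // (max xi, max yi, max zi). Only the owner evaluates the force,
    // so no term is computed twice and no coordination is needed.
    PatchIdx ownerPatch(const PatchIdx& a, const PatchIdx& b, const PatchIdx& c) {
        PatchIdx o;
        for (int d = 0; d < 3; ++d)
            o[d] = std::max({a[d], b[d], c[d]});
        return o;
    }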

Page 18: Scalability and interoperable libraries in NAMD

Implementation

• Multiple objects per processor
  – Different types: patches, pairwise forces, bonded forces, ...
  – Each may have its data ready at different times
  – Need ability to map and remap them
  – Need prioritized scheduling
• Charm++ supports all of these

Page 19: Scalability and interoperable libraries in NAMD

Charm++

• Parallel C++ with data-driven objects
• Object groups:
  – global object with a "representative" on each PE
• Asynchronous method invocation
• Prioritized scheduling
• Mature, robust, portable
• http://charm.cs.uiuc.edu
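For flavor, a minimal Charm++-style sketch (generic, not NAMD's actual classes): entry methods are declared in a .ci interface file, and invoking one through a proxy queues a message rather than calling synchronously:

    // patch.ci -- interface file, processed by the Charm++ translator
    module patch {
      array [1D] Patch {
        entry Patch();
        entry void addForces(int n, double f[n]);  // asynchronous entry method
      };
    };

    // In C++ code, invocation through the proxy returns immediately;
    // the runtime delivers the message to patch k wherever it lives:
    //   CProxy_Patch patches = CProxy_Patch::ckNew(numPatches);
    //   patches[k].addForces(n, f);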

Page 20: Scalability and interoperable libraries in NAMD

Data driven execution

[Diagram: two processors, each with its own message queue and a scheduler that picks the next message from the queue and invokes the target object]
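Conceptually, each processor runs a loop of roughly this shape (a sketch, not the actual Charm++/Converse scheduler):

    #include <functional>
    #include <queue>

    // A message carries the work it triggers: invoke an entry method on
    // a local object once its data has arrived.
    struct Message {
        int priority;
        std::function<void()> invoke;  // bound object + method + arguments
        bool operator<(const Message& rhs) const { return priority < rhs.priority; }
    };

    void schedulerLoop(std::priority_queue<Message>& q, const bool& done) {
        while (!done) {
            if (q.empty()) continue;  // idle until a message arrives
            Message m = q.top();      // highest-priority ready message
            q.pop();
            m.invoke();               // run the target object's method
        }
    }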

Page 21: Scalability and interoperable libraries in NAMD

Load Balancing

• Is a major challenge for this application
  – especially for a large number of processors
• Unpredictable workloads
  – Each diamond (force object) and patch encapsulates a variable amount of work
  – Static estimates are inaccurate
• Measurement-based load balancing framework
  – Robert Brunner's recent Ph.D. thesis
  – Very slow variations across timesteps

Page 22: Scalability and interoperable libraries in NAMD

Bipartite graph balancing

• Background (non-migratable) load:
  – Patches (integration, ...) and bond-related forces
• Migratable load:
  – Non-bonded forces
  – bond-related forces involving atoms of the same patch
• Bipartite communication graph
  – between migratable and non-migratable objects
• Challenge:
  – Balance load while minimizing communication

Page 23: Scalability and interoperable libraries in NAMD

Load balancing

• Collect timing data for several cycles
• Run heuristic load balancer
  – Several alternative ones
• Re-map and migrate objects accordingly
  – Registration mechanisms facilitate migration
• Needs a separate talk!
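The simplest such heuristic is a greedy one, sketched below: assign the heaviest measured object to the currently least-loaded processor. (The real strategies also account for the communication graph; the names here are illustrative.)

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // cost[i]: measured time of migratable object i over recent cycles.
    // load[p]: each processor's non-migratable background load.
    // Returns map[i] = processor assigned to object i.
    std::vector<int> greedyRemap(const std::vector<double>& cost,
                                 std::vector<double> load) {
        std::vector<int> order(cost.size()), map(cost.size());
        for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
        std::sort(order.begin(), order.end(),
                  [&](int a, int b) { return cost[a] > cost[b]; });
        for (int i : order) {  // heaviest object first
            auto p = std::min_element(load.begin(), load.end()) - load.begin();
            map[i] = static_cast<int>(p);
            load[p] += cost[i];
        }
        return map;
    }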

Page 24: Scalability and interoperable libraries in NAMD

[Two bar charts: per-processor time, split into migratable and non-migratable work]

Page 25: Scalability and interoperable libraries in NAMD

Performance: size of system

Procs                      1       2       4       8      16      32      64     128     160
bR            Time       1.14    0.58    .315    .158    .086    .048
3,762 atoms   Speedup     1.0    1.97    3.61    7.20    13.2    23.7
ER-ERE        Time               6.115   3.099   1.598    .810    .397   0.212   0.123   0.098
36,573 atoms  Speedup           (1.97)    3.89    7.54    14.9    30.3    56.8    97.9     123
ApoA-I        Time                       10.76    5.46    2.85    1.47   0.729   0.382   0.321
92,224 atoms  Speedup                   (3.88)    7.64    14.7    28.4    57.3     109     130

Performance data on Cray T3E. (Speedups in parentheses appear where no one-processor time is listed.)

Page 26: Scalability and interoperable libraries in NAMD

Performance: various machines

Procs                 1      2      4      8     16     32     64    128    160    192
T3E        Time             6.12   3.10   1.60  0.810  0.397  0.212  0.123  0.098
           Speedup         (1.97)  3.89   7.54   14.9   30.3   56.8   97.9    123
Origin     Time      8.28   4.20   2.17   1.07  0.542  0.271  0.152
2000       Speedup    1.0   1.96   3.80   7.74   15.3   30.5   54.3
ASCI-Red   Time      28.0   13.9   7.24   3.76   1.91   1.01  0.500  0.279  0.227  0.196
           Speedup    1.0   2.01   3.87   7.45   14.7   27.9   56.0    100    123    143
NOWs       Time      24.1   12.4   6.39   3.69
HP735/125  Speedup    1.0   1.94   3.77   6.54

Page 27: Scalability and interoperable libraries in NAMD

Speedup

[Plot: measured speedup vs. perfect speedup, up to 240 processors]

Page 28: Scalability and interoperable libraries in NAMD

Recent Speedup Results: ASCI Red

[Plot: "Speedup on ASCI Red: Apo-A1" — speedup (0-700) vs. processors (0-1200)]

Page 29: Scalability and interoperable libraries in NAMD

Recent Results on Linux Cluster

[Plot: "Speedup on Linux Cluster" — speedup (0-80) vs. processors (0-120)]

Page 30: Scalability and interoperable libraries in NAMD

Recent Results on Origin 2000

[Plot: "Performance on Origin 2000" — speedup (0-90) vs. processors (0-120)]

Page 31: Scalability and interoperable libraries in NAMD

Multi-paradigm programming

• Long-range electrostatic interactions
  – Some simulations require this
  – Contributions of faraway atoms can be calculated infrequently
  – PVM-based library, DPMTA
    • developed at Duke by John Board et al.
• Patch life cycle
  – Better expressed as a thread

Page 32: Scalability and interoperable libraries in NAMD

Converse

• Supports multi-paradigm programming
• Provides portability
• Makes it easy to implement runtime systems (RTS) for new paradigms
• Several languages/libraries:
  – Charm++, threaded MPI, PVM, Java, md-perl, pc++, Nexus, Path, Cid, CC++, DP, Agents, ...

Page 33: Scalability and interoperable libraries in NAMD

Namd2 with Converse

Page 34: Scalability and interoperable libraries in NAMD

NAMD2

• In production use
  – Internally for about a year
  – Several simulations completed/published
• Fastest MD program? We think so
• Modifiable/extensible
  – Steered MD
  – Free energy calculations

Page 35: Scalability and interoperable libraries in NAMD

Real Application for CS research?

• Benefits
  – Subtle and complex research problems uncovered only with a real application
  – Satisfaction of a "real," concrete contribution
  – With careful planning, you can truly enrich the "middle layers"
  – Bring back a rich variety of relevant CS problems
  – Apply to other domains: Rockets? Casting?