
Page 1:

Octopus: A scalable AMR toolkit for astrophysics

Bryce Adelstein-Lelbach[1], Zach Byerly[3], Dominic Marcello[3]

PI: Hartmut Kaiser[1][2], Co-PI: Geoffrey Clayton[3]

[1]: Center for Computation and Technology
[2]: LSU Department of Computer Science
[3]: LSU Department of Physics

Page 2:

Overview of Research

• NSF STAR project: a cross-disciplinary collaboration between LSU computer scientists and astrophysicists.

– The primary goal is to facilitate a highly realistic simulation of the merger of two white dwarfs.

• The study of these binaries is important because they are possible progenitors of a number of astrophysically important objects, such as Type Ia supernovae.


Page 3:

Overview of Research

• Development of new adaptive mesh refinement (AMR) codes utilizing HPX, a framework for message-driven computation, instead of traditional HPC programming mechanisms (MPI, OpenMP, PGAS).

– Existing unigrid codes are too slow:

• 0.2 orbits/day running on 1,032 cores.

• We want to run hundreds, if not thousands, of orbits.

– AMR codes can be many orders of magnitude faster (10^4 – 10^6×).

• Doing AMR with MPI is difficult and can face scalability problems, due to the inherently inhomogeneous nature of AMR.


Page 4:

Numerical Methods

• Our group uses 3D Eulerian hydrodynamics codes:

– Explicit advection scheme (Kurganov and Tadmor, 2000, Journal of Computational Physics, 160, 241).

– Finite-volume method.

– Adaptive mesh refinement.

– Multigrid method for solving Poisson's equation (Martin and Cartwright, 1996).

– Interpolation via PPM (Colella and Woodward, 1984, Journal of Computational Physics).

– Angular momentum conservation.

• Other references: Dominic Marcello's Ph.D. thesis.
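To make the finite-volume update concrete, here is a minimal 1D sketch, assuming linear advection and a simple Rusanov (local Lax–Friedrichs) flux; this illustrates the finite-volume idea only, not the full Kurganov–Tadmor scheme or the group's 3D code:

```cpp
// Minimal 1D finite-volume sketch: explicit advection of cell averages
// with a Rusanov (local Lax-Friedrichs) interface flux. An illustration
// only -- not the Kurganov-Tadmor scheme or the Octopus/OpenMP codes.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const int    N  = 100;                      // number of cells
    const double a  = 1.0;                      // constant advection speed
    const double dx = 1.0 / N;                  // cell width
    const double dt = 0.5 * dx / std::fabs(a);  // CFL-limited timestep

    // Cell averages: initialize a square pulse.
    std::vector<double> u(N, 0.0);
    for (int i = N / 4; i < N / 2; ++i) u[i] = 1.0;

    auto flux = [&](double ul, double ur) {
        // Central average plus dissipation scaled by the local wave speed.
        return 0.5 * (a * ul + a * ur) - 0.5 * std::fabs(a) * (ur - ul);
    };

    for (int step = 0; step < 100; ++step) {
        std::vector<double> f(N + 1);           // one flux per interface
        for (int i = 0; i <= N; ++i) {
            double ul = u[(i - 1 + N) % N];     // periodic boundaries
            double ur = u[i % N];
            f[i] = flux(ul, ur);
        }
        // Conservative update: each cell average changes by its net flux.
        for (int i = 0; i < N; ++i)
            u[i] -= dt / dx * (f[i + 1] - f[i]);
    }
    std::printf("u[N/2] after 100 steps: %g\n", u[N / 2]);
}
```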


Page 5:


[Image] Source: Dominic Marcello, LSU Department of Physics

Page 6:

What’s HPX?

• A general-purpose C++ runtime system for parallel and distributed applications of any scale.

• The HPX paradigm prefers:

– Asynchronous communication to hide latencies and contention instead of avoiding them.

– Fine-grained parallelism and an active global address space to enable dynamic and heuristic load balancing instead of statically partitioning work.

– Local, dependency-driven synchronization instead of explicit global barriers.

– Sending work to data instead of sending data to work (see the sketch below).
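As a minimal illustration of this style, the sketch below uses hpx::async and hpx::future, whose interfaces mirror C++11's std::async/std::future; header names vary across HPX versions, and square() and its argument are invented for the example:

```cpp
// A minimal sketch of the asynchronous, future-based style HPX
// encourages. hpx::async and hpx::future are real HPX facilities
// (modeled on std::async/std::future); this is not Octopus code.
#include <hpx/hpx_main.hpp>
#include <hpx/future.hpp>
#include <iostream>

int square(int x) { return x * x; }

int main()
{
    // Launch work asynchronously: the caller immediately receives a
    // future and can overlap other useful work with the computation.
    hpx::future<int> f = hpx::async(square, 7);

    // ... other work could happen here, hiding the latency ...

    // get() suspends only this lightweight HPX thread; the worker
    // core running it picks up another ready task in the meantime.
    std::cout << f.get() << "\n";   // prints 49
    return 0;
}
```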


Page 7:

Hiding Latency and Contention (pull model)

[Diagram: Thread A on Locality 1 requests a value through a future whose work (thread B) executes on Locality 2. While the result is pending, thread A is suspended and Locality 1 executes another thread; when the result is returned, thread A resumes. Compare the async/future sketch above.]

Page 8:

Hiding Latency and Contention (push model)

[Diagram: When thread A0 on Locality 1 finishes, it asynchronously starts thread B on Locality 2, passing it thread A1 as an argument; Locality 1 immediately executes another thread. The result of thread B is then passed directly to thread A1, so no thread ever blocks waiting for it.]
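A hedged sketch of this push pattern using a future continuation (future::then, part of HPX's future API) follows; produce() and consume() are invented stand-ins for threads B and A1:

```cpp
// A minimal sketch of the push model via a continuation: rather than
// blocking on get(), follow-up work is attached with future::then()
// so the result is pushed into it when ready. produce()/consume()
// are illustrative placeholders, not Octopus code.
#include <hpx/hpx_main.hpp>
#include <hpx/future.hpp>
#include <iostream>

int produce() { return 42; }              // plays the role of thread B

void consume(int value)                   // plays the role of thread A1
{
    std::cout << "got " << value << "\n";
}

int main()
{
    // Attach the continuation up front; no thread sits blocked.
    hpx::future<void> done =
        hpx::async(produce)
            .then([](hpx::future<int> f) { consume(f.get()); });

    done.get();   // only to keep main() alive until the chain finishes
    return 0;
}
```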

Page 9:

Asynchronous vs Synchronous


A phone call is a form of synchronous communication. Texting is a form of asynchronous communication. A text message is a future.

Page 10:

What’s Octopus?

• A general-purpose HPX AMR framework:

– Based heavily on ideas drawn from an existing LSU OpenMP AMR code.

– Octree-based AMR.

– Primarily designed for high-resolution, high-accuracy astrophysics hydrodynamics simulations.

• Octopus design:

– Multi-tiered software architecture to maintain abstraction while supporting domain-specific physics.

– Policy-driven genericity (see the sketch below).

– Powerful optimizations applied to the generic layers:

• Timestep size prediction.

• Time-based refinement.

• Localization of dependencies.

• Eager computation of fluxes.
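To illustrate what policy-driven genericity can look like, here is a hypothetical C++ sketch in which a generic driver is parameterized on a physics policy; every name here (State, SodShockTube, amr_driver, predict_dt) is invented for the example and is not Octopus's actual API:

```cpp
// Hypothetical sketch of policy-driven genericity: the generic layer
// owns control flow (e.g. CFL timestep prediction), while a physics
// policy supplies problem-specific hooks. Not Octopus's real API.
#include <algorithm>
#include <vector>

struct State { double rho, e; };   // conserved quantities per cell

// A physics policy bundles the problem-specific pieces the generic
// layers need: initial data and the local wave-speed estimate.
struct SodShockTube {
    static State initial(double x) {
        return x < 0.5 ? State{1.0, 2.5} : State{0.125, 0.25};
    }
    static double max_wave_speed(State const&) { return 1.0; }
};

template <typename Physics>
struct amr_driver {
    std::vector<State> cells;

    explicit amr_driver(int n) {
        for (int i = 0; i < n; ++i)
            cells.push_back(Physics::initial((i + 0.5) / n));
    }

    // Generic timestep size prediction: only the wave speeds come
    // from the physics policy; the CFL logic is shared by all setups.
    double predict_dt(double dx) const {
        double s = 0.0;
        for (auto const& c : cells)
            s = std::max(s, Physics::max_wave_speed(c));
        return 0.5 * dx / s;
    }
};

int main() {
    amr_driver<SodShockTube> sim(100);        // generic driver, Sod physics
    double dt = sim.predict_dt(1.0 / 100);
    (void) dt;
}
```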


Page 11:

Octopus Architecture

[Architecture diagram, bottom tier to top tier:

Bottom tier: HPX.

Middle tier: AMR, load balancer, and I/O (PDB, NetCDF, HDF5, checkpointing).

Top tier: GR Hydrodynamics, Eulerian Hydrodynamics, Massless Self Gravity, and problem setups: 3D torus, Sod shock tube, Rayleigh–Taylor, Sedov blast wave, DWD.]

Page 12:

Universal Scalability Law


S = N / (1 + α(N − 1) + βN(N − 1))

S: speedup
α: contention coefficient
β: synchronization delay coefficient
N: number of processors

Set β = 0, and you get Amdahl's law:

S = N / (1 + α(N − 1))

Maximum scaling point:

Nmax = √((1 − α)/β)
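As a quick numerical check of these formulas, the sketch below plugs in the 7-LOR Octopus fit from the USL Modeling table two slides ahead (α = 0.05069, β = 6.00E-06); the computed peak lands at the tabulated Nmax of roughly 397:

```cpp
// Evaluate the Universal Scalability Law and its maximum scaling
// point for one fitted (alpha, beta) pair from the USL Modeling slide.
#include <cmath>
#include <cstdio>

double usl_speedup(double N, double alpha, double beta) {
    return N / (1.0 + alpha * (N - 1.0) + beta * N * (N - 1.0));
}

int main() {
    const double alpha = 0.05069, beta = 6.00e-06;  // Octopus, 7 LOR

    // Maximum scaling point: Nmax = sqrt((1 - alpha) / beta).
    double nmax = std::sqrt((1.0 - alpha) / beta);
    std::printf("Nmax       = %.1f\n", nmax);       // ~397.8 (table: 397)
    std::printf("S(Nmax)    = %.1f\n", usl_speedup(nmax, alpha, beta));

    // With beta = 0 the model reduces to Amdahl's law.
    std::printf("Amdahl S(1024) = %.1f\n", usl_speedup(1024.0, alpha, 0.0));
}
```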

Page 13:

[Figure: Scaling of OpenMP 3D Eulerian Code vs Octopus 3D Eulerian Code. X-axis: cores (log-scaled, 1–1024); y-axis: speedup (scaled to 1 core, relative to each code), 0–20. Curves show measured and modeled scaling for 4–7 levels of refinement (LOR) with Octopus and for 4–6 LOR with OpenMP.]

Page 14:

USL Modeling

Octopus:

LOR    α        β         Nmax
4      0.04840  3.00E-04  56
5      0.03790  1.00E-04  98
6      0.04084  6.00E-05  126
7      0.05069  6.00E-06  397

OpenMP:

LOR    α        β         Nmax
4      0.40890  1.89E-02  5
5      0.37560  1.83E-02  5
6      0.37470  1.62E-02  6


Page 15:

Grain Size

