Enzo-P / Cello

Formation of the First Galaxies

James Bordner (1), Michael L. Norman (1), Brian O’Shea (2)

(1) University of California, San Diego, San Diego Supercomputer Center

(2) Michigan State University, Department of Physics and Astronomy

Blue Waters Symposium 2013

National Center for Supercomputing Applications

University of Illinois at Urbana-Champaign

Blue Waters Symposium 2013 Enzo-P / Cello 21-22 May 2013 1 / 19

Introducing Enzo-P / Cello

Our group actively develops two related parallel applications:

Enzo: astrophysics / cosmology application

patch-based adaptive mesh refinement (AMR)

MPI or MPI/OpenMP

almost 20 years of development

Enzo-P / Cello: “Petascale” fork of the Enzo code

“forest of octrees” AMR

Charm++ or MPI

≈ 3 years of development

work in progress: AMR just coming online

Enzo’s strengths

[ John Wise ]

Spans multiple application domains

astrophysical fluid dynamics

hydrodynamic cosmology

Rich multi-physics capabilities

fluid, particle, gravity, radiation, . . .

Extreme resolution range

34 levels of refinement by 2!

Active global development community

≈ 25 developers
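
To put "34 levels of refinement by 2" in perspective, the spatial dynamic range it implies can be worked out directly. This is a small arithmetic sketch; the function name is illustrative and not part of Enzo.

```python
def refinement_dynamic_range(levels: int, factor: int = 2) -> int:
    """Ratio of coarsest to finest cell width after `levels` refinements by `factor`."""
    return factor ** levels


ratio = refinement_dynamic_range(34)
print(f"{ratio:.2e}")  # 2**34 = 17179869184, printed as 1.72e+10
```

That is, the finest cells are roughly ten orders of magnitude smaller than the root-grid cells along each axis.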

Enzo’s struggles

Memory usage

≈ 1.5 KB/patch (MPI/OpenMP helps)

memory fragmentation

Mesh quality

2-to-1 constraint can be violated

asymmetric mesh for symmetric problem

Load balancing

difficulty maintaining parent-child locality

Parallel scaling

AMR overhead dominates computation

[ Tom Abel, John Wise, Ralf Kaehler ]

Enzo’s pursuit of scalability

Enzo was born in the early 1990s

“Extreme” meant 100 processors

Continual scalability improvements

MPI/OpenMP parallelism

“neighbor-finding” algorithm

I/O optimizations

Further improvement getting harder

increasing scalability requirements

easy improvements made already

Motivates concurrent rewriting

Enzo-P “Petascale” Enzo fork

Cello AMR framework

[ Sam Skillman, Matt Turk ]

Enzo-P / Cello design overview

Charm++ parallelism

asynchronous, data-driven

latency tolerant

dynamic load balancing

checkpoint / restart

Octree-based AMR

“forest” for root mesh

easier to implement

scalability advantages

fast neighbor-finding

Some advantages of patch-based AMR

Flexible patch size and shape

improved refinement efficiency

Larger patches

smaller surface/volume ratio

reduced communication

amortized loop overhead

Fewer patches

reduced AMR meta-data
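
The surface/volume argument for larger patches can be made concrete by counting ghost cells per interior cell. The sketch below assumes cubic patches and a ghost-zone depth of 3 cells; both are assumptions chosen for illustration, not values stated on the slides.

```python
def ghost_to_interior_ratio(n: int, ghost_depth: int = 3) -> float:
    """Ghost cells per interior cell for a cubic n^3 patch with ghost layers on all faces."""
    interior = n ** 3
    total = (n + 2 * ghost_depth) ** 3  # interior plus the surrounding ghost shell
    return (total - interior) / interior


# Larger patches carry proportionally fewer ghost cells to store and communicate.
for n in (8, 16, 32, 64):
    print(n, round(ghost_to_interior_ratio(n), 2))
```

The ratio falls steadily as the patch edge grows, which is the sense in which larger patches reduce communication relative to computation.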

Some advantages of octree-based AMR

Fixed block size and shape

simplified load balancing

dynamic memory reuse

More blocks

more parallelism available

Smaller nodes

reduced AMR meta-data

Compute only on leaf nodes

less communication

Enzo’s AMR data structure

[Figure: root grid, root patch, refinement patch]

Patches assigned to MPI processes

Refinement patches created on root patch process

Load balancing relocates refinement patches

Patch data (grid, particle) are distributed

Replicated AMR hierarchy structure
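
A rough estimate shows why a replicated hierarchy becomes a memory problem at scale. The sketch combines the ≈ 1.5 KB/patch figure quoted earlier with hypothetical patch and process counts; those counts are assumptions for illustration only.

```python
BYTES_PER_PATCH = 1.5e3  # ≈ 1.5 KB of metadata per patch (from the slides; decimal KB assumed)


def replicated_metadata_bytes(num_patches: int, num_processes: int) -> tuple[float, float]:
    """Return (bytes per process, bytes summed over all processes) for a replicated hierarchy."""
    per_process = num_patches * BYTES_PER_PATCH  # every process stores every patch's metadata
    return per_process, per_process * num_processes


# Hypothetical run: one million patches on ten thousand MPI processes.
per_proc, total = replicated_metadata_bytes(num_patches=1_000_000, num_processes=10_000)
print(f"{per_proc / 1e9:.1f} GB per process, {total / 1e12:.1f} TB in aggregate")
```

Because the per-process cost does not shrink as processes are added, running fewer MPI processes per node (as in hybrid MPI/OpenMP mode) reduces the per-node footprint.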

Enzo-P / Cello’s AMR data structure

[Figure: Forest, Tree, Block]

Each block is a Charm++ chare

Blocks initially mapped to root node process

Charm++ load balances

AMR hierarchy structure is fully distributed

Charm++ program structure

[Figure: A Charm++ Program, showing chares Main, ChareA, ChareB, and ChareC with entry methods Main(), pV(), pW(), pX(), pY(), and pZ()]

Charm++ program

Charm++ objects are chares

invoke remote entry methods

communicate via messages

Charm++ runtime system

schedules entry methods

maps chares to processors

migrates chares to balance

Additional scalability features

checkpoint / restart

sophisticated DLB strategies

Charm++ collections of chares

Chare Array

distributed array of chares

migratable elements

flexible indexing

Chare Group

one chare per processor (non-migratable)

Chare Nodegroup

one chare per node (non-migratable)

Cello implementation options using Charm++

1. Singleton chares

unlimited hierarchy depth

tedious to program

limited Charm++ support

[Figure: individual chares labeled B]

2. Chare array

efficient: single access

restricted hierarchy depth

[Figure: array of chares labeled B, with a Hierarchy labeled H]

Charm++ entities in Enzo-P / Cello

[Figure: Main, Simulation, and Block chares laid out across processes 0, 1, 2, 3, ..., P−1]

Main “mainchare” called at program startup

Simulation chare group holds global data

Block chare array defines forest of octrees

Control flow in Enzo-P / Cello

Current Enzo-P / Cello control flow

1. Startup

2. Initialize

3. Mesh creation

4. Ghost refresh

5. Computation

6. Mesh adaptation

[Figure: control-flow steps 1 to 6 drawn among the Main, Simulation, and Block chares]

Block chare array indexing

[Figure: Block index bit layout for x, y, z (bits 0 to 31): Forest indices, Octree encoding, Block depth]

Charm++ supports user-defined array indices

Default array indices are 3 integers

Cello indexing for Block arrays:

10 × 3 bits for forest indices

20 × 3 bits for the octree encoding

6 bits for the block depth

Up to 1024^3 array of octrees

Up to 21 octree levels
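
The bit budget above (30 + 60 + 6 = 96 bits) fits exactly into three 32-bit integers. The sketch below shows one way such an index could be packed and unpacked. The placement of fields within each word, the splitting of the depth into 2 bits per axis, and the function names are all assumptions for illustration; the slides do not specify Cello's actual layout.

```python
FOREST_BITS = 10  # per axis: up to 1024 root blocks along x, y, or z
TREE_BITS = 20    # per axis: one child-selection bit per refinement level
DEPTH_BITS = 2    # per axis: 3 x 2 = 6 bits of block depth in total (assumed split)


def pack_index(forest, path, depth):
    """Pack forest indices, octree path bits, and depth into three 32-bit words.

    forest: 3 ints in [0, 1024); path: 3 ints using only the low `depth` bits;
    depth: int in [0, 20], giving 21 possible levels including the root.
    """
    if not 0 <= depth <= TREE_BITS:
        raise ValueError("depth out of range")
    words = []
    for axis in range(3):
        if not 0 <= forest[axis] < (1 << FOREST_BITS):
            raise ValueError("forest index out of range")
        if not 0 <= path[axis] < (1 << depth):
            raise ValueError("path uses more bits than depth allows")
        depth_part = (depth >> (DEPTH_BITS * axis)) & ((1 << DEPTH_BITS) - 1)
        words.append(forest[axis]
                     | (path[axis] << FOREST_BITS)
                     | (depth_part << (FOREST_BITS + TREE_BITS)))
    return tuple(words)


def unpack_index(words):
    """Inverse of pack_index: recover (forest, path, depth) from three words."""
    forest, path, depth = [], [], 0
    for axis, word in enumerate(words):
        forest.append(word & ((1 << FOREST_BITS) - 1))
        path.append((word >> FOREST_BITS) & ((1 << TREE_BITS) - 1))
        depth |= (word >> (FOREST_BITS + TREE_BITS)) << (DEPTH_BITS * axis)
    return tuple(forest), tuple(path), depth


index = pack_index(forest=(1023, 0, 5), path=(0b101, 0b011, 0b000), depth=3)
print(unpack_index(index))
```

With 20 path bits per axis, depths 0 through 20 are representable, which matches the limit of 21 octree levels.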

Cello mesh generation

Begins with the forest root grid

Proceeds level-by-level

Blocks evaluate refinement criteria

if refine, create child blocks

if coarsen, notify parent block

Refine can violate 2-1 constraint

tell coarse neighbors to refine

may recurse

Quiescence detection between steps

Keep track of neighbors and children
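
The rule that refinement must tell coarser neighbors to refine, possibly recursively, can be illustrated with a deliberately simplified sequential sketch. It uses a 1D mesh (binary trees rather than octrees) and plain method calls in place of asynchronous Charm++ messages; the class and method names are hypothetical and are not Cello's API.

```python
class Mesh1D:
    """Leaves of a forest of binary trees; block (level, i) covers [i, i + 1) / 2**level."""

    def __init__(self, num_root_blocks):
        self.num_root = num_root_blocks
        self.leaves = {(0, i) for i in range(num_root_blocks)}

    def find_leaf(self, level, index):
        """Return the leaf at (level, index) or its closest leaf ancestor, else None."""
        while level >= 0:
            if (level, index) in self.leaves:
                return (level, index)
            level, index = level - 1, index // 2
        return None  # the region is covered by finer leaves

    def refine(self, level, index):
        """Split a leaf into two children, first refining any coarser neighbor."""
        if (level, index) not in self.leaves:
            return
        for neighbor_index in (index - 1, index + 1):
            if not 0 <= neighbor_index < self.num_root * 2 ** level:
                continue  # outside the domain
            neighbor = self.find_leaf(level, neighbor_index)
            # A coarser neighbor would end up two levels above the new children,
            # violating the 2-to-1 constraint, so refine it first (may recurse).
            while neighbor is not None and neighbor[0] < level:
                self.refine(*neighbor)
                neighbor = self.find_leaf(level, neighbor_index)
        self.leaves.remove((level, index))
        self.leaves.update({(level + 1, 2 * index), (level + 1, 2 * index + 1)})

    def is_balanced(self):
        """True if adjacent leaves differ by at most one level."""
        ordered = sorted(self.leaves, key=lambda b: b[1] / 2 ** b[0])
        return all(abs(a[0] - b[0]) <= 1 for a, b in zip(ordered, ordered[1:]))


mesh = Mesh1D(num_root_blocks=2)
mesh.refine(0, 0)
mesh.refine(1, 1)  # forces the coarse root block (0, 1) to refine first
mesh.refine(2, 3)  # forces block (1, 2) to refine first
print(sorted(mesh.leaves), mesh.is_balanced())
```

In the distributed setting each block only knows its own neighbors and children, so the same cascade is carried by messages, and quiescence detection is what tells the application that no further refinement requests are in flight.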

Cello AMR ghost zone refresh

Intra-level refresh

1. FaceBlock loads face cells

2. Charm++ entry method send

3. FaceBlock stores ghost cells

[Figure: copy, send, copy]

Fine-to-coarse refresh

1. FaceBlock coarsens face cells

2. Charm++ entry method send

3. FaceBlock stores ghost cells

[Figure: restrict, send, copy]

Coarse-to-fine refresh

1. FaceBlock loads face cells

2. Charm++ entry method send

3. FaceBlock interpolates ghost cells

[Figure: copy, send, prolong]
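
The three refresh cases differ only in which side transforms the data. The following 1D sketch assumes simple averaging for restriction and piecewise-constant interpolation for prolongation, with a function return standing in for the Charm++ entry method send; the actual operators and the FaceBlock interface in Cello are not given on the slides, so every name here is illustrative.

```python
def restrict(fine_cells):
    """Average each pair of fine cells into one coarse cell."""
    return [(a + b) / 2 for a, b in zip(fine_cells[0::2], fine_cells[1::2])]


def prolong(coarse_cells):
    """Piecewise-constant interpolation: each coarse cell fills two fine cells."""
    return [value for value in coarse_cells for _ in range(2)]


def refresh_same_level(sender, ghost_depth):
    """Intra-level: copy face cells, send, store as the receiver's ghost cells."""
    message = list(sender[-ghost_depth:])  # 1. load face cells (copy)
    return message                         # 2-3. send, then store (copy)


def refresh_fine_to_coarse(fine_sender, ghost_depth):
    """Fine-to-coarse: the sender coarsens before sending, so the message is smaller."""
    message = restrict(fine_sender[-2 * ghost_depth:])  # 1. coarsen face cells
    return message                                      # 2-3. send, then store (copy)


def refresh_coarse_to_fine(coarse_sender, ghost_depth):
    """Coarse-to-fine: the sender copies coarse cells, the receiver interpolates."""
    count = (ghost_depth + 1) // 2                # coarse cells needed to cover the ghosts
    message = list(coarse_sender[-count:])        # 1. load face cells (copy)
    return prolong(message)[-ghost_depth:]        # 3. interpolate into ghost cells


fine = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
coarse = [1.0, 2.0, 3.0, 4.0]
print(refresh_same_level(fine, 2))
print(refresh_fine_to_coarse(fine, 2))
print(refresh_coarse_to_fine(coarse, 3))
```

In both inter-level cases the data crosses the network at coarse resolution, which keeps message sizes small.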

Summary

                  Enzo                 Enzo-P / Cello
Parallelization   MPI/OpenMP           Charm++
AMR               patch-based          tree-based
AMR structure     replicated           distributed
Block sizes       ×1000 variation      constant
Task scheduling   level-parallel       dependency-driven
Load balancing    patch migration      Charm++
Fault tolerance   checkpoint/restart   Charm++

http://cello-project.org

NSF PHY-1104819, AST-0808184
