
Page 1: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion

Louis Howell
Center for Applied Scientific Computing / AX Division
Lawrence Livermore National Laboratory

Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion

May 18, 2005

Page 2: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Raptor Code: Overview

Block-structured Adaptive Mesh Refinement (AMR)
Multifluid Eulerian representation
Explicit Godunov hydrodynamics
Timestep varies with refinement level
Single-group radiation diffusion (implicit, multigrid)
Multi-group radiation diffusion under development
Heat conduction, also implicit

Now adding discrete ordinate (Sn) transport solvers
The AMR timestep requires both single-level and multilevel Sn solves

Parallel implementation and scaling issues

Page 3: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Raptor Code: Core Algorithm Developers

Rick Pember

Jeff Greenough

Sisira Weeratunga

Alex Shestakov

Louis Howell

Page 4: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Radiation Diffusion Capability

Single-group radiation diffusion is coupled with multi-fluid Eulerian hydrodynamics on a regular grid using block-structured adaptive mesh refinement (AMR).

Page 5: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Radiation Diffusion Contrasted with Discrete Ordinates

All three calculations conserve energy by using multilevel coarse-fine synchronization at the end of each coarse timestep. Fluid energy is shown (overexposed to bring out detail). Transport uses step characteristic discretization.

[Panels: Flux-limited Diffusion | S16 (144 ordinates) | 144 equally-spaced ordinates]

Page 6: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Coupling of Radiation with Fluid Energy

Advection and Conduction:

\[
\frac{(\rho E)^{n+1/2} - (\rho E)^{n}}{\Delta t}
  + \nabla\cdot\big[(\rho E + p)\,\mathbf{u}\big]^{n+1/2}
  - \nabla\cdot\big(\kappa\,\nabla T\big)^{n+1/2} = 0
\]

Implicit Radiation Diffusion (gray, flux-limited):

\[
\frac{E_R^{n+1} - E_R^{n+1/2}}{\Delta t}
  = \nabla\cdot\big(D_R^{n+1}\,\nabla E_R^{n+1}\big)
  + \kappa_P^{n+1}\big(4\pi B^{n+1} - c\,E_R^{n+1}\big)
\]

\[
\frac{(\rho e)^{n+1} - (\rho e)^{n+1/2}}{\Delta t}
  = -\,\kappa_P^{n+1}\big(4\pi B^{n+1} - c\,E_R^{n+1}\big)
\]

Page 7: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Coupling of Radiation with Fluid Energy

Advection and Conduction:

\[
\frac{(\rho E)^{n+1/2} - (\rho E)^{n}}{\Delta t}
  + \nabla\cdot\big[(\rho E + p)\,\mathbf{u}\big]^{n+1/2}
  - \nabla\cdot\big(\kappa\,\nabla T\big)^{n+1/2} = 0
\]

Implicit Radiation Transport (gray, isotropic scattering):

\[
\frac{1}{c}\,\frac{I^{n+1} - I^{n}}{\Delta t}
  + \mathbf{\Omega}\cdot\nabla I^{n+1}
  + \big(\kappa_a^{n+1} + \kappa_s^{n+1}\big)\,I^{n+1}
  = \kappa_a^{n+1}\,B^{n+1}
  + \frac{\kappa_s^{n+1}}{4\pi}\int I^{n+1}\,d\Omega'
\]

\[
\frac{(\rho e)^{n+1} - (\rho e)^{n+1/2}}{\Delta t}
  = \kappa_a^{n+1}\Big(\int I^{n+1}\,d\Omega - 4\pi\,B^{n+1}\Big)
\]

Page 8: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Implicit Radiation Update

Extrapolate Emission to New Temperature:

\[
B^{n+1} \;\approx\; B^{*} + \frac{\partial B}{\partial T}\bigg|^{*}\big(T^{n+1} - T^{*}\big),
\qquad
T^{n+1} - T^{*} = \frac{e^{n+1} - e^{*}}{c_v^{*}}
\]

so the emission at the new time can be written in terms of the new fluid energy:

\[
4\pi\,\kappa_a^{*}\,B^{n+1}
  \;\approx\; 4\pi\,\kappa_a^{*}\,B^{*}
  + \frac{4\pi\,\kappa_a^{*}}{c_v^{*}}\,\frac{\partial B}{\partial T}\bigg|^{*}\big(e^{n+1} - e^{*}\big)
\]

Page 9: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Implicit Radiation Update

Iterative Form of Diffusion Update:

\[
\frac{E_R^{n+1} - E_R^{n+1/2}}{\Delta t}
  = \nabla\cdot\big(D_R^{*}\,\nabla E_R^{n+1}\big)
  + \kappa_P^{*}\big(4\pi B^{n+1} - c\,E_R^{n+1}\big)
\]

\[
\frac{(\rho e)^{n+1} - (\rho e)^{n+1/2}}{\Delta t}
  = -\,\kappa_P^{*}\big(4\pi B^{n+1} - c\,E_R^{n+1}\big)
\]

Here B^{n+1} is extrapolated to the new temperature as on the previous slide, and the starred coefficients are re-evaluated from the latest iterate, so each pass is a linear diffusion solve; the passes are repeated until the update converges.
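A minimal sketch of this kind of lagged-coefficient implicit update on a single 1D grid, in Python. It is illustrative only: the zero-flux boundaries, the dense linear solve, and the assumption that D, kappa_P, and B have already been evaluated at the lagged iterate are simplifications, not Raptor's actual discretization.

    import numpy as np

    def fld_update_1d(E_n, rho_e_n, D, kappa_P, B, c, dt, dx, n_pass=5):
        """One backward-Euler gray diffusion update on a single 1D grid.

        The lagged quantities (D, kappa_P, B) are held fixed here; in the
        full scheme they would be re-evaluated from the latest iterate
        between passes.  Zero-flux (reflecting) boundaries for simplicity.
        """
        N = len(E_n)
        E, rho_e = E_n.copy(), rho_e_n.copy()
        for _ in range(n_pass):
            # (1/dt + c*kappa_P)*E - d/dx(D dE/dx) = E_n/dt + 4*pi*kappa_P*B
            A = np.zeros((N, N))
            rhs = E_n / dt + 4.0 * np.pi * kappa_P * B
            for i in range(N):
                A[i, i] = 1.0 / dt + c * kappa_P[i]
                Dl = 0.0 if i == 0 else 0.5 * (D[i] + D[i - 1]) / dx**2
                Dr = 0.0 if i == N - 1 else 0.5 * (D[i] + D[i + 1]) / dx**2
                A[i, i] += Dl + Dr
                if i > 0:
                    A[i, i - 1] = -Dl
                if i < N - 1:
                    A[i, i + 1] = -Dr
            E = np.linalg.solve(A, rhs)
            # Fluid energy loses what the radiation gains (emission minus absorption).
            rho_e = rho_e_n - dt * kappa_P * (4.0 * np.pi * B - c * E)
            # A real iteration would now refresh B, kappa_P, and D from rho_e.
        return E, rho_e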

Page 10: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Implicit Radiation Update

Iterative Form of Transport Update:

\[
\frac{1}{c}\,\frac{I^{n+1} - I^{n}}{\Delta t}
  + \mathbf{\Omega}\cdot\nabla I^{n+1}
  + \big(\kappa_a^{*} + \kappa_s^{*}\big)\,I^{n+1}
  = \kappa_a^{*}\,B^{n+1}
  + \frac{\kappa_s^{*}}{4\pi}\int I^{n+1}\,d\Omega'
\]

\[
\frac{(\rho e)^{n+1} - (\rho e)^{n+1/2}}{\Delta t}
  = \kappa_a^{*}\Big(\int I^{n+1}\,d\Omega - 4\pi\,B^{n+1}\Big)
\]

Again B^{n+1} is extrapolated to the new temperature and the starred opacities are lagged at the previous iterate.

Page 11: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Simplified Transport Equation

Gather Similar Terms:

The time-derivative term, the lagged opacities, and the extrapolated emission are folded into an effective total cross section, an effective scattering cross section, and a fixed source:

\[
\sigma_t = \kappa_a^{*} + \kappa_s^{*} + \frac{1}{c\,\Delta t},
\qquad
\sigma_s \approx \kappa_s^{*},
\qquad
S \approx \kappa_a^{*}\,B^{*} + \frac{I^{n}}{c\,\Delta t}
\]

Simplified Gray Semi-discrete Form:

\[
\mathbf{\Omega}\cdot\nabla I(\mathbf{r},\mathbf{\Omega})
  + \sigma_t(\mathbf{r})\,I(\mathbf{r},\mathbf{\Omega})
  = \frac{\sigma_s(\mathbf{r})}{4\pi}\int I(\mathbf{r},\mathbf{\Omega}')\,d\Omega'
  + S(\mathbf{r},\mathbf{\Omega})
\]

Page 12: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Discrete Ordinate Discretization

Angular Discretization:

\[
\mathbf{\Omega}_m\cdot\nabla I_m + \sigma_t\,I_m
  = \frac{\sigma_s}{4\pi}\sum_{m'} w_{m'}\,I_{m'} + S_m
\]

Spatial Discretization in 2D Cartesian Coordinates:

\[
\frac{\mu_m}{\Delta x}\big(I_{m,i+1/2,j} - I_{m,i-1/2,j}\big)
  + \frac{\eta_m}{\Delta y}\big(I_{m,i,j+1/2} - I_{m,i,j-1/2}\big)
  + \sigma_{t,i,j}\,I_{m,i,j}
  = \frac{\sigma_{s,i,j}}{4\pi}\sum_{m'} w_{m'}\,I_{m',i,j} + S_{m,i,j}
\]

Other Coordinate Systems: 1D & 3D Cartesian, 1D Spherical, 2D Axisymmetric (RZ)
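To make the cell-balance form above concrete, here is a minimal single-grid sweep for one ordinate, closing the balance with the first-order "step" (upwind) choice described on the next slide. This is a sketch only: it assumes mu, eta > 0 (one quadrant), a constant inflow intensity, and illustrative array names.

    import numpy as np

    def step_sweep_2d(mu, eta, sigma_t, source, dx, dy, I_inflow=0.0):
        """Sweep one ordinate (mu, eta > 0, i.e. toward +x, +y) with the step
        (first-order upwind) discretization on a single rectangular grid.

        sigma_t and source are cell-centered arrays of shape (nx, ny);
        the inflow boundary intensity is a constant for simplicity.
        """
        nx, ny = sigma_t.shape
        I = np.zeros((nx, ny))
        Ix = np.full(ny, I_inflow)          # upwind x-face values for the current column
        for i in range(nx):
            Iy = I_inflow                   # upwind y-face value for the current cell
            for j in range(ny):
                # Balance: mu*(I_out_x - I_in_x)/dx + eta*(I_out_y - I_in_y)/dy
                #          + sigma_t*I = source, with step closure I_out = I_cell.
                num = source[i, j] + mu / dx * Ix[j] + eta / dy * Iy
                den = sigma_t[i, j] + mu / dx + eta / dy
                I[i, j] = num / den
                Iy = I[i, j]                # outgoing y-face = cell value (step closure)
                Ix[j] = I[i, j]             # outgoing x-face = cell value
        return I

Ordinates in the other quadrants simply reverse the loop directions; on an AMR hierarchy the inflow values come from neighboring grids or interpolated coarse data rather than a constant.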

Page 13: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Spatial Transport Discretizations

1. Step: first-order upwind; positive; inaccurate in both thick and thin limits

2. Diamond Difference: second order, but very vulnerable to oscillations

3. Simple Corner Balance (SCB): more accurate in the thick limit; groups cells in 2x2 blocks, each of which requires a 4x4 matrix inversion (8x8 in 3D)

4. Upstream Corner Balance: attempts to improve on SCB in the streaming limit, but breaks conjugate gradient acceleration (implemented in 2D Cartesian only)

5. Step Characteristic: gives sharp rays in the thin streaming limit; positive; inaccurate in the thick diffusion limit (implemented in 2D Cartesian only)
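For reference, the two simplest closures differ only in how the cell-edge (half-integer) and cell-center intensities are related; in 1D slab form, for \(\mu_m > 0\) (standard textbook statements, not Raptor-specific notation):

\[
\text{Step:}\quad I_{m,i+1/2} = I_{m,i}
\qquad\qquad
\text{Diamond difference:}\quad I_{m,i} = \tfrac{1}{2}\big(I_{m,i-1/2} + I_{m,i+1/2}\big)
\]

The step closure keeps intensities positive but is only first-order accurate; eliminating the edge values with the diamond closure gives second-order accuracy but allows the oscillations noted above.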

Page 14: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Axisymmetric Crooked Pipe Problem

[Panels: Diffusion | S2 Step | S8 Step | S2 SCB | S8 SCB; radiation energy density shown]

Page 15: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Axisymmetric Crooked Pipe Problem

[Panels: Diffusion | S2 Step | S8 Step | S2 SCB | S8 SCB; fluid temperature shown]

Page 16: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


AMR Timestep

Advance Coarse (L0) over Δt0

Advance Finer (L1) over Δt1

Advance Finest (L2) over Δt2

Page 17: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


AMR Timestep

Synchronize L1 and L2 (Multilevel solve)

Repeat the L1 and L2 advance for the second Δt1 step

Synchronize L0 and L1 (Multilevel solve) at the end of the coarse step Δt0
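A schematic of the subcycled advance sketched on these two slides, as recursive pseudocode in Python. The refinement ratio of 2 and the stub functions are illustrative; in the real code the synchronization step is a multilevel radiation solve over the level and everything finer.

    def advance_single_level(level, t, dt):
        # Stand-in for the single-level hydro + radiation advance.
        print(f"advance L{level}: t = {t:.3f} -> {t + dt:.3f}")

    def synchronize(level, t):
        # Stand-in for the multilevel coarse-fine synchronization solve.
        print(f"synchronize L{level} with finer levels at t = {t:.3f}")

    def advance(level, t, dt, max_level, ratio=2):
        """Advance one AMR level by dt, recursively subcycling finer levels,
        then synchronize with them at the end of this level's step."""
        advance_single_level(level, t, dt)
        if level < max_level:
            dt_fine = dt / ratio
            for k in range(ratio):
                advance(level + 1, t + k * dt_fine, dt_fine, max_level, ratio)
            synchronize(level, t + dt)

    # Reproduces the L0/L1/L2 pattern above:
    advance(0, 0.0, 1.0, max_level=2)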

Page 18: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Requirements for Radiation Package

Features controlled by the package:
— Nonlinear implicit update with fluid energy coupling

— Single level transport solver (for advancing each level)

— Multilevel transport solver (for synchronization)

Features not directly controlled by the package:
— Refinement criteria

— Grid layout

— Load balancing

— Timestep size

Parallel support provided by BoxLib:
— Each refinement level is distributed grid-by-grid over all processors

— Coarse and fine grids in same region may be on different processors

Page 19: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Multilevel Transport Sweeps

Page 20: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Sources Updated Iteratively

Three “sources” must be recomputed after each sweep, and iterated to convergence:

Scattering source
Reflecting boundaries
AMR refluxing source

The AMR source converges most quickly, while the scattering source is often so slow that convergence acceleration is required.
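A sketch of the outer iteration this slide describes, with the three source updates supplied as callbacks (the function names and the convergence test are illustrative):

    def iterate_sources(sweep_all_ordinates, update_scattering,
                        update_reflecting, update_refluxing,
                        tol=1e-8, max_iter=200):
        """Sweep, then recompute the lagged sources, until nothing changes.

        Each update callback recomputes its source from the latest sweep and
        returns a norm of the change it made.  As noted above, the refluxing
        source tends to converge fastest and the scattering source slowest,
        which is what motivates the acceleration discussed later.
        """
        for it in range(1, max_iter + 1):
            sweep_all_ordinates()
            change = max(update_scattering(),
                         update_reflecting(),
                         update_refluxing())
            if change < tol:
                return it
        return max_iter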

Page 21: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Parallel Communication

Four different communication operations are required:

1. From grid to grid on the same level

2. From coarse level to upstream edges of fine level

3. From coarse level to downstream edges of fine level (to initialize flux registers)

4. From fine level back to coarse as a refluxing source

Operations 2 and 3 are only needed when preparing to transfer control from the coarse level to the fine level

Operation 3 could be eliminated, and operation 4 reduced, if a data structure existed on the coarse processor to hold the information

Page 22: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Parallel Grid Sequencing

To sweep a single ordinate, a grid needs information from the grids on its upstream faces

Different grids sweep different ordinates at the same time

2D Cartesian, first quadrant only of S4 ordinate set: 13 stages for 3 ordinates
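A sketch of how stages could be assigned to grids for a single ordinate: a grid is ready once every grid touching its upstream faces has been assigned a stage, and its own stage is one more than the latest of those neighbors. The box representation and the brute-force neighbor test are illustrative, not the BoxLib data structures.

    def sweep_stages(grids, direction):
        """Assign a sweep stage to each grid for one ordinate direction.

        grids: dict name -> (lo, hi) integer index bounds, e.g.
               {"A": ((0, 0), (63, 31)), ...}
        direction: signs of the ordinate, e.g. (+1, +1) for +x, +y.
        """
        ndim = len(direction)

        def upstream(a, b):
            """True if grid a touches an upstream face of grid b."""
            (alo, ahi), (blo, bhi) = grids[a], grids[b]
            for d, sgn in enumerate(direction):
                # a abuts b's upstream side in dimension d ...
                touches = (ahi[d] + 1 == blo[d]) if sgn > 0 else (bhi[d] + 1 == alo[d])
                # ... and the two boxes overlap in every other dimension.
                overlap = all(alo[e] <= bhi[e] and blo[e] <= ahi[e]
                              for e in range(ndim) if e != d)
                if touches and overlap:
                    return True
            return False

        deps = {g: {u for u in grids if u != g and upstream(u, g)} for g in grids}
        stage, remaining = {}, set(grids)
        while remaining:
            ready = [g for g in remaining if deps[g].issubset(stage)]
            if not ready:
                raise RuntimeError("dependency loop: grids must be split (3D case)")
            for g in ready:
                stage[g] = 1 + max((stage[u] for u in deps[g]), default=0)
            remaining -= set(ready)
        return stage

Different ordinates use different direction signs, so in practice the per-ordinate schedules are interleaved, as the next slide describes.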

Page 23: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Parallel Grid Sequencing

In practice, ordinates from all four quadrants are interleaved as much as possible

Execution begins at the four corners of the domain and moves toward the center

2D Cartesian, all quadrants of S4 ordinate set: 22 stages for 12 ordinates

Page 24: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Parallel Grid Sequencing: RZ

In axisymmetric (RZ) coordinates, angular differencing transfers energy from ordinates directed inward towards the axis into more outward ordinates. The inward ordinates must therefore be swept first.

2D RZ, S4 ordinate set requires 26 stages for 12 ordinates, up from 22 for Cartesian

Page 25: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Parallel Grid Sequencing: AMR

43 level 1 grids, 66 stages for 40 ordinates (S8) (20 waves in each direction):

[Snapshots of the sweep schedule at stages 4, 15, 34, and 62]

Page 26: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Parallel Grid Sequencing: 3D AMR

In 2D, grids are sorted for each ordinate direction

In 3D, sorting isn’t always possible—loops can form

The solution is to split grids to break the loops

Communication with split grids is implemented

So is a heuristic for determining which grids to split

It is possible to always choose splits in the z direction only
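A sketch of how such a loop could be detected for one ordinate, given the upstream-dependency sets between grids (a depth-first search for a back edge; the actual splitting heuristic is not shown, and the data layout is illustrative):

    def find_cycle(deps):
        """deps: dict grid -> set of grids it depends on (upstream neighbors).
        Returns a list of grids forming a dependency loop, or None if the
        grids can be ordered (always possible in 2D, not always in 3D)."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {g: WHITE for g in deps}
        stack = []

        def visit(g):
            color[g] = GRAY
            stack.append(g)
            for u in deps[g]:
                if color[u] == GRAY:            # back edge: loop found
                    return stack[stack.index(u):]
                if color[u] == WHITE:
                    cycle = visit(u)
                    if cycle:
                        return cycle
            color[g] = BLACK
            stack.pop()
            return None

        for g in deps:
            if color[g] == WHITE:
                cycle = visit(g)
                if cycle:
                    return cycle
        return None

Any grid on a returned loop is a candidate for splitting; as noted above, the splits can always be restricted to the z direction.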

Page 27: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Acceleration by Conjugate Gradient

A strong scattering term may make iterated transport sweeps slow to converge

Conjugate gradient acceleration speeds up convergence dramatically

The parallel operations required are then:
— Transport sweeps
— Inner products

A diagonal preconditioner may be used or, for larger ordinate sets, an approximate solve of a related problem on a minimal S2 ordinate set

— No new parallel building blocks are required
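A minimal preconditioned conjugate gradient written so that the operator is applied only through a user-supplied function (standing in for a transport sweep) and inner products; the operator must be symmetric positive definite in the inner product used. This is a generic sketch, not the code's actual CG routine.

    import numpy as np

    def pcg(apply_A, b, apply_M=lambda r: r, tol=1e-10, max_iter=500):
        """Solve A x = b by preconditioned CG using only apply_A (the
        sweep-based operator), apply_M (the preconditioner), and dot products."""
        x = np.zeros_like(b)
        r = b - apply_A(x)
        z = apply_M(r)
        p = z.copy()
        rz = np.dot(r, z)
        for _ in range(max_iter):
            Ap = apply_A(p)
            alpha = rz / np.dot(p, Ap)
            x = x + alpha * p
            r = r - alpha * Ap
            if np.linalg.norm(r) < tol:
                break
            z = apply_M(r)
            rz_new = np.dot(r, z)
            p = z + (rz_new / rz) * p
            rz = rz_new
        return x

The diagonal and S2 choices mentioned above would be supplied as apply_M; the point for parallelism is that both apply_A and the dot products are operations the transport module already provides.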

Page 28: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


AMR Scaling: 2D Grid Layout
Case 1: Separate Clusters of Fine Grids

To investigate scaling in AMR problems, I need to be able to generate “similar” problems of different sizes.

I use repetitions of a unit cell of 4 coarse and 18 fine grids.

Each processor gets 1 coarse grid. Due to load balancing, different processors get different numbers of fine grids.

Page 29: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


AMR Scaling: 2D Grid Layout Case 2: Coupled Fine Grids

The decoupled groups of fine grids in the previous AMR problem give the transport algorithms an advantage, since groups do not depend on each other.

This new problem couples fine grids across the entire width of the domain.

Note the minor variations in grid layout from one tile to the next, due to the sequential nature of the regridding algorithm.

Page 30: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


2D Fine Scaling (MCR Linux Cluster) Case 1: Separate Clusters of Fine Grids

[Plot: wall-clock seconds (0 to 0.6) vs. processors (0 to 200); series: Fine Step Sweep, Fine SCB Sweep, Fine SMG Setup, Fine SMG V-cycle, Fine PFMG Setup, Fine PFMG V-cycle]

Grids arranged in a square array, with 4 coarse grids and 18 fine grids for every four processors; each coarse grid is 256x256 cells, giving 41984 fine cells per processor. Sn transport sweeps are over all 40 ordinates of an S8 ordinate set.

Page 31: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


2D Fine Scaling (MCR Linux Cluster) Case 2: Coupled Fine Grids

[Plot: wall-clock seconds (0 to 0.8) vs. processors (0 to 200); series: Fine Step Sweep, Fine SCB Sweep, Fine SMG Setup, Fine SMG V-cycle, Fine PFMG Setup, Fine PFMG V-cycle]

Grids arranged in a square array, with one coarse grid and 5-6 fine grids for every processor; each coarse grid is 256x256 cells, giving ~51000 fine cells per processor. Sn transport sweeps are over all 40 ordinates of an S8 ordinate set.

Page 32: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


3D Fine Scaling (MCR Linux Cluster)
Case 1: Separate Clusters of Fine Grids

[Plot: wall-clock seconds (0 to 3) vs. processors (0 to 400); series: Fine Step Sweep, Fine SCB Sweep, Fine SMG Setup, Fine SMG V-cycle, Fine PFMG Setup, Fine PFMG V-cycle]

Grids arranged in a cubical array, with 8 coarse grids and 58 fine grids for every eight processors; each coarse grid is 32x32x32 cells, giving 28800 fine cells per processor. Sn transport sweeps are over all 80 ordinates of an S8 ordinate set.

Page 33: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


3D Fine Scaling (MCR Linux Cluster) Case 2: Coupled Fine Grids

[Plot: wall-clock seconds (0 to 5) vs. processors (0 to 200); series: Fine Step Sweep, Fine SCB Sweep, Fine SMG Setup, Fine SMG V-cycle, Fine PFMG Setup, Fine PFMG V-cycle]

Grids arranged in a cubical array, with one coarse grid and ~33 fine grids for every processor; each coarse grid is 32x32x32 cells, giving ~47600 fine cells per processor. Sn transport sweeps are over all 80 ordinates of an S8 ordinate set.

Page 34: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


2D AMR Scaling (MCR Linux Cluster) Case 2: Coupled Fine Grids

[Plot: wall-clock seconds (0 to 1.6) vs. processors / coarse grids (0 to 200); series: ML Step Sweep, ML SCB Sweep, Fine Step Sweep, Fine SCB Sweep, Fine Wave Setup, Fine Stage Setup]

Grids arranged in a square array, with one coarse grid and 5-6 fine grids for every processor; each coarse grid is 256x256 cells, giving ~51000 fine cells per processor. Sn transport sweeps are over all 40 ordinates of an S8 ordinate set.

Page 35: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


2D AMR Scaling (MCR Linux Cluster) Case 2: Coupled Fine (Optimized Setup)

[Plot: wall-clock seconds (0 to 1.6) vs. processors / coarse grids (0 to 200); series: ML Step Sweep, ML SCB Sweep, Fine Step Sweep, Fine SCB Sweep, Fine Wave Setup, Fine Stage Setup]

In this version the neighbor calculation in wave setup is implemented with an O(n) bin sort, and waves are built by depth-first traversal (which makes little difference). In stage setup, wave intersections are computed once and stored. All optimizations are serial.

Page 36: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


3D AMR Scaling (MCR Linux Cluster)
Case 1: Separate Clusters of Fine Grids

[Plot: wall-clock seconds (0 to 7) vs. processors / coarse grids (0 to 400); series: ML Step Sweep, ML SCB Sweep, Fine Step Sweep, Fine SCB Sweep, Fine Wave Setup, Fine Stage Setup]

Grids arranged in a cubical array, with 8 coarse grids and 58 fine grids for every eight processors; each coarse grid is 32x32x32 cells, giving 28800 fine cells per processor. Sn transport sweeps are over all 80 ordinates of an S8 ordinate set.

Page 37: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


3D AMR Scaling (MCR Linux Cluster)
Case 1: Separate Clusters (Optimized)

[Plot: wall-clock seconds (0 to 7) vs. processors / coarse grids (0 to 400); series: ML Step Sweep, ML SCB Sweep, Fine Step Sweep, Fine SCB Sweep, Fine Wave Setup, Fine Stage Setup]

In this version the neighbor calculation in wave setup is implemented with an O(n) bin sort. In stage setup, wave intersections are computed once and stored. All optimizations are serial.

Page 38: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


3D AMR Scaling (MCR Linux Cluster) Case 2: Coupled Fine Grids (Optimized)

Grids arranged in a cubical array, with one coarse grid and ~33 fine grids for every processor; each coarse grid is 32x32x32 cells, giving ~47600 fine cells per processor. Sn transport sweeps are over all 80 ordinates of an S8 ordinate set.

[Plot: wall-clock seconds (0 to 8) vs. processors / coarse grids (0 to 200); series: ML Step Sweep, ML SCB Sweep, Fine Step Sweep, Fine SCB Sweep, Fine Wave Setup, Fine Stage Setup]

Page 39: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Transport Scaling Conclusions

A sweep through an S8 ordinate set and a multigrid V-cycle take similar amounts of time, and scale in similar ways on up to 500 processors.

Setup costs for transport are amortized over several sweeps. This is the code that determines the communication patterns between grids, including such things as the grid-splitting algorithm in 3D.

So far, optimized scalar setup code has given acceptable performance, even in 3D.

Page 40: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Acceleration by Conjugate Gradient

Solve by sweeps, holding the right-hand side fixed:

\[
\mathbf{\Omega}_m\cdot\nabla I_m^{\mathrm{new}} + \sigma_t\,I_m^{\mathrm{new}}
  = S_m + \frac{\sigma_s}{4\pi}\sum_{m'} w_{m'}\,I_{m'}^{\mathrm{old}}
\]

Solve homogeneous problem by conjugate gradient:

\[
\mathbf{\Omega}_m\cdot\nabla I_m^{\mathrm{corr}} + \sigma_t\,I_m^{\mathrm{corr}}
  - \frac{\sigma_s}{4\pi}\sum_{m'} w_{m'}\,I_{m'}^{\mathrm{corr}}
  = \frac{\sigma_s}{4\pi}\sum_{m'} w_{m'}\,\big(I_{m'}^{\mathrm{new}} - I_{m'}^{\mathrm{old}}\big)
\]

Matrix form:

\[
x^{\mathrm{new}} = A\,x^{\mathrm{old}} + b,
\qquad
(I - A)\,x^{\mathrm{corr}} = x^{\mathrm{new}} - x^{\mathrm{old}}
\]

Page 41: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


Acceleration by Conjugate Gradient

Inner product:

\[
\langle u, v\rangle
  = \sum_{\text{cells } i} \sigma_{s,i}\,\Delta x_i\,\Delta y_i\; u_i\,v_i
\]

Preconditioners:

— Diagonal

— Solution of smaller (S2) system by DPCG

This system can be solved to a weak (inaccurate) tolerance without spoiling the accuracy of the overall iteration

Page 42: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


“Clouds” Test Problem: Acceleration

Scheme    Res      Set  Accel  Iter   Sweeps  PreCon  Time
SCB       128      S2   SI     18472  18472           58.12
                        CG     290    876             3.283
                        DPCG   112    342             1.433
SCB       128      S8   SI     18674  18674           560.3
                        CG     290    1752            52.88
                        DPCG   111    678             20.92
                        S2PCG  12     84      836     6.583
SCB       128      S16  CG     290    2615            319.4
                        DPCG   111    1017            125.2
                        S2PCG  12     126     828     19.98
SCB       128,512  S16  CG     263    2891            1304.
                        DPCG   163    1809            824.3
                        S2PCG  17     208     3570    144.9

Page 43: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


“Clouds” Test Problem: Acceleration

Scheme    Res      Set  Accel  Iter   Sweeps  PreCon  Time
SCB       128,512  S2   SI     19398  19398           227.4
                        CG     260    1053            14.03
                        DPCG   168    683             9.333
SCB       128,512  S8   CG     263    2119            264.1
                        DPCG   164    1327            166.4
                        S2PCG  16     143     3353    67.00
StepChar  128,512  S8   CG     197    1392            398.9
                        DPCG   158    1119            323.0
                        S2PCG  15     118     2892    121.0
SCB       128,512  S16  CG     263    2891            1304.
                        DPCG   163    1809            824.3
                        S2PCG  17     208     3570    144.9
Step      128,512  S16  S2PCG  11     129     1889    106.0
Diamond   128,512  S16  S2PCG  19     274     5244    237.7
StepChar  128,512  S16  S2PCG  16     163     3108    265.3

Page 44: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


“Clouds” Test Problem

1 km square domain

No absorption or emission

400,000 erg/cm²/s isotropic flux incoming at top

Specular reflection at sides

Absorbing bottom

κs = 10⁻² cm⁻¹ inside clouds

κs = 10⁻⁶ cm⁻¹ elsewhere

S2 uses DPCG

S8 uses S2PCG

Serial timings on GPS (1GHz Alpha EV6.8)

Page 45: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


“Clouds” Test Problem: SCB Fluxes

Resolutions   Total Cells   S2 (4 ordinates)     S8 (40 ordinates)
                            Flux      Time       Flux      Time
32            1024          17742     0.183      20115     0.950
64            4096          13825     0.433      15842     1.783
128           16384         18632     1.433      22677     6.583
32,128        7168          18633     1.233      22678     6.433
256           65536         19804     6.833      26568     33.87
64,256        20480         19819     3.416      26571     19.00
512           262144        20032     35.18      28644     209.2
128,512       57344         20057     9.333      28651     67.00
32,128,512    48128         20059     10.38      28654     70.55
1024          1048576       19994     162.7      29651     1035.
256,1024      212992        20014     45.87      29658     313.4
64,256,1024   167936        20031     47.13      29644     286.3

Page 46: Parallel Adaptive Mesh Refinement for Radiation Transport and Diffusion


This work was performed under the auspices of the U. S. Department of Energy by the University of California Lawrence Livermore National Laboratory under Contract W-7405-Eng-48.

UCRL-PRES-212183