Towards achieving GPU-native adaptive mesh reﬁnement€¦ · Daino (2016) The AMR algorithm Initialize data structures Create halo regions Update patch values Reﬁne/coarsen patches

Towards achieving GPU-native adaptive mesh refinement

Ania Brown

Prof Takayuki Aoki

Why AMR?

AMR is not GPU friendly

• Complicated, time varying data structures

• Can you use AMR and keep GPU performance?My conclusion: yes, but it’s messy

Contents

• Introduction to the algorithm + data structures

• The challenges

• Optimisation possibilities

• The RSE perspective — some lessons learnt

Problem domain

• Stencil calculations on a square structured mesh

• Cell centre values

i, j+1

i+1, ji-1, j i, j

i, j-1

1) Block structured AMR



2) Tree based AMRSimulation mesh Refinement representation

Simulation mesh Refinement representation

2) Tree based AMR


2) Tree based AMR


… …

2) Tree based AMR


… …… …

2) Tree based AMR

3) Patches +Tree AMR

Block-structured Enzo CHOMBO

Octree RAMSES

Octree + patches FLASH, using PARAMESH NIRVANA

CPU GPU

Octree + patches GAMER (2011) Daino (2016)

The AMR algorithmInitialize data structures

Create halo regions

Update patch values

Refine/coarsen patches

Output values for visualisation

Main loop:

Update neighbour relations

Initialize data structures

Main loop:

CPU GPU

Create halo regions

Update patch values




Update step

1 CUDA block

Update step

• Tune block size for coalesced access

• Zmarching


Main loop:

CPU GPU

Create halo regions

Update patch values





Main loop:

CPU GPU

Create halo regions

Update patch values




Ordering leaves

Hilbert curve

At each step:

• Divide space into 4

• Replace each quadrant with rotated or reflected versions of the original curve

• Connect such that that start and end points remain the same

Rules for refinement

Neighbour relations

Leaf nodes:Neighbour indices in each direction Parent index

Parent nodes: Child indices

Create halo regionsFind correct neighbour node



Create halo regionsCopy halo values

Interpolating halo values





Reducing halo values

Reducing halo values

Main loop:

CPU GPU

Refine/coarsen patch values


Defragment value array

Find patches to coarsen/refine

Coarsen/refine step

Defragment value array refine node



Defragment value array coarsen node



Calculate new defragmented position

• Input: for all nodes, flag whether that node is to be refined, coarsened or unchanged

• Refined: +3Coarsened: -3Unchanged: 0

• For each element, sum all preceding elements in the array

• For n nodes, requires n/2 threads and O(log2(n)) serial steps

Multi-GPULoad balancing

Boundaries between subdomains

Node 0

Node 1

Node 2


Node 0

Node 1

Node 2


How to distribute tree

How to distribute tree

Software design

• Code framework — allow user to edit/add functions for initialisation, resolution criterion, stencil calc

• Code generation — annotated regular data structures

• How much to offer? — cell/node centre, interpolation level, stencil type

Software development process

• Unit testing

• Verification

• Profiling

Code by: T.Shimokawabe, T.Takaki (2011)

Phase field model of dendritic solidification in a binary alloy

7 refinement levels in quad-tree

Regular mesh

Adaptive mesh

Performance testing for dendritic solidification model

256 x 256

L = 1.5⇥ 10�3m

R = 4.5⇥ 10�4m

�xmin = 6⇥ 10�6m

�x

max

= 1.2⇥ 10�5m

8192 x 8192

L = 1.5⇥ 10�3m

R = 4.5⇥ 10�4m

�x

max

= 1.2⇥ 10�5m

�xmin = 1.9⇥ 10�7m

Performance testing for dendritic solidification model

Exec

utio

n tim

e pe

r tim

este

p (m

s)

0

62.5

125

187.5

250

Resolution

1024

2048

3072

4096

5120

6144

7168

8192AMR Regular Mesh Worst case AMR

256512

In summary

• Patch based tree-AMR

• For quick gains, offload update step to GPU

• GPU-native version possible — values on GPU, neighbour relations on CPU

• Likely won’t be a one size fits all fix

Governing PDEs for phase field model

�(x, y, t)

c = (1� �)cL + �cSc(x, y, t)

cLcS

: phase

: liquid concentration: solid concentration

@c@t = r · [DS�rcS +DL(1� �)rcL]

Diffusion Interface anisotropy

Chemical driving force Phase change

@�

@t

= �M

�

r · (a2r�) +

@

@x

(a@a

@�

x

|r�|2) + @

@y

(a@a

@�

y

|r�|2)

+@

@z

(a@a

@�

z

|r�|2)� S�T

dp(�)

d�

�W

dq(�)

d�

�

M�

ap(�), q(�)DS , DL

SWT

: mobility: interface anisotropy

: interpolating functionM�

ap(�), q(�)DS , DL

SWT

q(�) : double well function: diffusion in solid, liquid

: entropy of fusion

: temperature: height of double well potential

Documents

Towards achieving GPU-native adaptive mesh reﬁnement€¦ · Daino (2016) The AMR algorithm Initialize data structures Create halo regions Update patch values Reﬁne/coarsen patches