
Towards achieving GPU-native adaptive mesh refinement

Ania Brown

Prof Takayuki Aoki


Why AMR?


AMR is not GPU friendly

• Complicated, time varying data structures

• Can you use AMR and keep GPU performance?

My conclusion: yes, but it’s messy


Contents

• Introduction to the algorithm + data structures

• The challenges

• Optimisation possibilities

• The RSE perspective — some lessons learnt


Problem domain

• Stencil calculations on a square structured mesh

• Cell centre values

Stencil cell indices (from the figure):

            (i, j+1)

(i-1, j)    (i, j)    (i+1, j)

            (i, j-1)
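A minimal sketch of one stencil update on such a mesh. The slides do not give the update formula, so this assumes an explicit diffusion-style update with a hypothetical coefficient `alpha` and boundary cells that are left unchanged; the name `stencil_step` is illustrative only.

```python
def stencil_step(u, alpha):
    """Apply one five-point stencil update to the interior cells of grid u.

    u is a list of rows, indexed u[j][i]. Boundary cells are copied unchanged
    (an assumption made for this sketch).
    """
    ny, nx = len(u), len(u[0])
    new = [row[:] for row in u]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            neighbours = u[j][i - 1] + u[j][i + 1] + u[j - 1][i] + u[j + 1][i]
            new[j][i] = u[j][i] + alpha * (neighbours - 4.0 * u[j][i])
    return new


# A single non-zero cell in the middle of a 5 x 5 grid spreads to its neighbours.
grid = [[0.0] * 5 for _ in range(5)]
grid[2][2] = 1.0
updated = stencil_step(grid, 0.1)
```

Each new cell value depends only on the old values of the cell and its four neighbours, which is why every cell can be updated independently.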


1) Block structured AMR


2) Tree based AMR

[Figure panels: Simulation mesh, Refinement representation]


3) Patches + Tree AMR


CPU codes:

• Block-structured: Enzo, CHOMBO

• Octree: RAMSES

• Octree + patches: FLASH (using PARAMESH), NIRVANA

GPU codes:

• Octree + patches: GAMER (2011), Daino (2016)


The AMR algorithm

Initialize data structures

Main loop:

• Create halo regions

• Update patch values

• Refine/coarsen patches

• Output values for visualisation

• Update neighbour relations

[Figure: the same flowchart, drawn with CPU and GPU columns]


Update step

1 CUDA block


• Tune block size for coalesced access

• Z marching


Ordering leaves


Hilbert curve

At each step:

• Divide space into 4

• Replace each quadrant with rotated or reflected versions of the original curve

• Connect them such that the start and end points remain the same
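One common way to compute a cell's position along a Hilbert curve is the iterative bit-manipulation form sketched below. It is a standard conversion and not necessarily the one used in the talk; `hilbert_index` and `rotate` are illustrative names, and the grid size `n` is assumed to be a power of two.

```python
def rotate(n, x, y, rx, ry):
    """Rotate or reflect a quadrant so that the sub-curve has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # which quadrant, counted along the curve
        x, y = rotate(n, x, y, rx, ry)
        s //= 2
    return d


# Order the cells of a 4 x 4 grid along the curve.
cells = [(x, y) for x in range(4) for y in range(4)]
ordered = sorted(cells, key=lambda c: hilbert_index(4, *c))
```

Consecutive cells in `ordered` are always edge neighbours, so cells that are close in the ordering are also close in space.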


Rules for refinement


Neighbour relations

Leaf nodes:

• Neighbour indices in each direction

• Parent index

Parent nodes:

• Child indices
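A possible layout for these relations, as a sketch only. It assumes a 2D quadtree stored as a flat list of nodes addressed by integer index; the `Node` class, the `NO_NODE` sentinel, the direction and child orderings and the `refine` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List

NO_NODE = -1                          # sentinel: no node in that slot
WEST, EAST, SOUTH, NORTH = range(4)   # neighbour directions
SW, SE, NW, NE = range(4)             # child ordering (an arbitrary choice)


@dataclass
class Node:
    parent: int = NO_NODE
    children: List[int] = field(default_factory=list)   # empty for a leaf
    neighbours: List[int] = field(default_factory=lambda: [NO_NODE] * 4)

    @property
    def is_leaf(self) -> bool:
        return not self.children


def refine(nodes: List[Node], idx: int) -> None:
    """Turn leaf idx into a parent with four new leaves and link the siblings."""
    first = len(nodes)
    kids = [first + k for k in range(4)]
    nodes.extend(Node(parent=idx) for _ in range(4))
    nodes[idx].children = kids
    # Links between siblings only; links to nodes outside the parent are not set here.
    nodes[kids[SW]].neighbours[EAST] = kids[SE]
    nodes[kids[SW]].neighbours[NORTH] = kids[NW]
    nodes[kids[SE]].neighbours[WEST] = kids[SW]
    nodes[kids[SE]].neighbours[NORTH] = kids[NE]
    nodes[kids[NW]].neighbours[EAST] = kids[NE]
    nodes[kids[NW]].neighbours[SOUTH] = kids[SW]
    nodes[kids[NE]].neighbours[WEST] = kids[NW]
    nodes[kids[NE]].neighbours[SOUTH] = kids[SE]


nodes = [Node()]   # a single root leaf
refine(nodes, 0)   # root becomes a parent of nodes 1 to 4
```

Storing indices rather than pointers keeps the structure easy to copy between host and device, at the cost of having to rewrite indices whenever nodes move.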


Create halo regions

Find correct neighbour node


Create halo regions

Copy halo values


Interpolating halo values


Reducing halo values
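A sketch of the two transfers between refinement levels. The slides show them only as figures, so the specific schemes here are assumptions: reduction is taken to be a plain average of each 2 x 2 block of fine cells, and interpolation is taken to be piecewise constant (each coarse value copied into the four fine cells it covers). The function names are illustrative.

```python
def reduce_2x2(fine):
    """Average each 2 x 2 block of fine cells into one coarse cell."""
    ny, nx = len(fine) // 2, len(fine[0]) // 2
    return [
        [
            0.25 * (fine[2 * j][2 * i] + fine[2 * j][2 * i + 1]
                    + fine[2 * j + 1][2 * i] + fine[2 * j + 1][2 * i + 1])
            for i in range(nx)
        ]
        for j in range(ny)
    ]


def interpolate_constant(coarse):
    """Copy each coarse cell into the 2 x 2 fine cells it covers."""
    return [
        [coarse[j // 2][i // 2] for i in range(2 * len(coarse[0]))]
        for j in range(2 * len(coarse))
    ]


coarse_halo = reduce_2x2([[1.0, 3.0], [5.0, 7.0]])   # one coarse cell from four fine cells
fine_halo = interpolate_constant([[2.0]])            # four fine cells from one coarse cell
```

A higher-order interpolation would use more coarse cells per fine value; the constant version is shown only because it is the simplest correct instance.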


Coarsen/refine step

Main loop (the figure has CPU and GPU columns):

• Refine/coarsen patch values

• Update neighbour relations

• Defragment value array

• Find patches to coarsen/refine


Defragment value array: refine node


Defragment value array: coarsen node


Calculate new defragmented position

• Input: for all nodes, flag whether that node is to be refined, coarsened or unchanged

• Refined: +3, coarsened: -3, unchanged: 0

• For each element, sum all preceding elements in the array

• For n nodes, requires n/2 threads and O(log2(n)) serial steps
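The sum of all preceding elements is an exclusive prefix sum (scan). The thread and step counts on the slide match a work-efficient, Blelloch-style scan, which the sketch below simulates serially; whether the talk's code uses exactly this variant is an assumption. The +3 for a refined node is read here as one quadtree leaf being replaced by four children. How the -3 of a coarsened group is attached to individual nodes is not stated, so the example uses refinement only.

```python
def exclusive_scan(values):
    """Exclusive prefix sum using an up-sweep and a down-sweep.

    Every pass of an inner loop corresponds to one parallel step; its
    iterations are independent and would each be handled by one thread.
    """
    n = len(values)
    size = 1
    while size < n:
        size *= 2
    a = list(values) + [0] * (size - n)   # pad to a power of two

    step = 1
    while step < size:                    # up-sweep: build partial sums
        for i in range(2 * step - 1, size, 2 * step):
            a[i] += a[i - step]
        step *= 2

    a[size - 1] = 0
    step = size // 2
    while step >= 1:                      # down-sweep: distribute the sums
        for i in range(2 * step - 1, size, 2 * step):
            left = a[i - step]
            a[i - step] = a[i]
            a[i] += left
        step //= 2
    return a[:n]


flags = [0, 3, 0, 0]                      # node 1 is refined
offsets = exclusive_scan(flags)           # [0, 0, 3, 3]
new_positions = [i + off for i, off in enumerate(offsets)]   # [0, 1, 5, 6]
```

In the example, the refined node keeps position 1 and its children occupy positions 1 to 4, so the nodes that followed it move from positions 2 and 3 to positions 5 and 6.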


Multi-GPU

Load balancing


Boundaries between subdomains

[Figure labels: Node 0, Node 1, Node 2]

How to distribute tree
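The slides do not say how the tree is split between GPUs. One common approach, sketched here purely as an assumption, is to keep the leaves in space-filling-curve order and give each GPU a contiguous, near-equal slice of that list. The name `partition` is hypothetical.

```python
def partition(num_leaves, num_gpus):
    """Split an ordered list of leaves into contiguous half-open index ranges.

    Range sizes differ by at most one leaf. Returns a list of (start, end).
    """
    base, extra = divmod(num_leaves, num_gpus)
    ranges = []
    start = 0
    for rank in range(num_gpus):
        end = start + base + (1 if rank < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


ranges = partition(10, 3)   # [(0, 4), (4, 7), (7, 10)]
```

Because neighbouring leaves in a curve ordering tend to be neighbours in space, contiguous slices keep most halo exchanges within a single GPU.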


Software design

• Code framework: allow the user to edit or add functions for initialisation, the resolution criterion and the stencil calculation

• Code generation: annotated regular data structures

• How much to offer? Cell-centred or node-centred values, interpolation level, stencil type
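To make the framework option concrete, here is a hypothetical sketch of the three user-supplied functions bundled into one object. None of these names or signatures come from the talk; the example functions are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class UserFunctions:
    # (x, y) -> initial cell value
    initialise: Callable[[float, float], float]
    # patch values -> True if the patch should be refined
    needs_refinement: Callable[[List[float]], bool]
    # (centre, west, east, south, north) -> new cell value
    stencil: Callable[[float, float, float, float, float], float]


hooks = UserFunctions(
    initialise=lambda x, y: 1.0 if x * x + y * y < 0.25 else 0.0,
    needs_refinement=lambda values: max(values) - min(values) > 0.5,
    stencil=lambda c, w, e, s, n: c + 0.1 * (w + e + s + n - 4.0 * c),
)
```

The framework would call these functions at the matching stages of the main loop, so the user never touches the tree or the halo code directly.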


Software development process

• Unit testing

• Verification

• Profiling


Code by: T. Shimokawabe, T. Takaki (2011)

Phase field model of dendritic solidification in a binary alloy


7 refinement levels in quad-tree


Regular mesh

Adaptive mesh


Performance testing for dendritic solidification model

256 x 256

• $L = 1.5 \times 10^{-3}$ m

• $R = 4.5 \times 10^{-4}$ m

• $\Delta x_{\min} = 6 \times 10^{-6}$ m

• $\Delta x_{\max} = 1.2 \times 10^{-5}$ m

8192 x 8192

• $L = 1.5 \times 10^{-3}$ m

• $R = 4.5 \times 10^{-4}$ m

• $\Delta x_{\min} = 1.9 \times 10^{-7}$ m

• $\Delta x_{\max} = 1.2 \times 10^{-5}$ m
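A quick consistency check on these numbers, assuming the finest cell size scales inversely with the quoted resolution and that each refinement level halves the cell size. Only values from the slides are used; the function name is illustrative.

```python
import math

DX_MIN_256 = 6e-6   # finest cell size at 256 x 256, from the slides (m)
DX_MAX = 1.2e-5     # coarsest cell size, from the slides (m)


def finest_cell_size(resolution):
    """Finest cell size at a given resolution, scaled from the 256 x 256 case."""
    return DX_MIN_256 * 256 / resolution


dx_min_8192 = finest_cell_size(8192)                 # 1.875e-7 m, quoted as 1.9e-7 m
halvings = round(math.log2(DX_MAX / dx_min_8192))    # 6 halvings from coarsest to finest
```

Six halvings correspond to seven distinct cell sizes, which agrees with the seven refinement levels quoted for the quad-tree.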

[Figure: execution time per timestep (ms, axis from 0 to 250) against resolution (256 to 8192) for three series: AMR, Regular Mesh, Worst case AMR]


In summary

• Patch based tree-AMR

• For quick gains, offload update step to GPU

• GPU-native version possible — values on GPU, neighbour relations on CPU

• Likely won’t be a one-size-fits-all fix


Governing PDEs for phase field model

\[
\frac{\partial \phi}{\partial t} = M_\phi \left[
\nabla \cdot (a^2 \nabla \phi)
+ \frac{\partial}{\partial x}\left( a \frac{\partial a}{\partial \phi_x} |\nabla \phi|^2 \right)
+ \frac{\partial}{\partial y}\left( a \frac{\partial a}{\partial \phi_y} |\nabla \phi|^2 \right)
+ \frac{\partial}{\partial z}\left( a \frac{\partial a}{\partial \phi_z} |\nabla \phi|^2 \right)
- S \Delta T \frac{dp(\phi)}{d\phi}
- W \frac{dq(\phi)}{d\phi}
\right]
\]

Term labels: diffusion, interface anisotropy, chemical driving force, phase change

\[
\frac{\partial c}{\partial t} = \nabla \cdot \left[ D_S \phi \nabla c_S + D_L (1 - \phi) \nabla c_L \right]
\]

\[
c = (1 - \phi) c_L + \phi c_S
\]

• $\phi(x, y, t)$: phase

• $c(x, y, t)$: concentration, defined by the relation above

• $c_L$: liquid concentration

• $c_S$: solid concentration

• $M_\phi$: mobility

• $a$: interface anisotropy

• $p(\phi)$: interpolating function

• $q(\phi)$: double well function

• $D_S$, $D_L$: diffusion in solid, liquid

• $S$: entropy of fusion

• $T$: temperature

• $W$: height of double well potential