20
1/20 Efficient multigrid solvers for mixed finite element discretisations in NWP models Colin Cotter , David Ham , Lawrence Mitchell , Eike Hermann M ¨ uller * , Robert Scheichl * * University of Bath, Imperial College London 16th ECMWF Workshop on High Performance Computing in Meteorology Oct 2014 Eike Mueller FEM multigrid solvers in NWP

Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

1/20

Efficient multigrid solvers for mixed finite elementdiscretisations in NWP models

Colin Cotter†, David Ham†, Lawrence Mitchell†,Eike Hermann Muller*, Robert Scheichl*

*University of Bath, †Imperial College London

16th ECMWF Workshop on High Performance Computing inMeteorology Oct 2014

Eike Mueller FEM multigrid solvers in NWP

Page 2: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

2/20

Motivation

Equations, Algorithms

Parallel Optimised

Code

Equations, Algorithms

High level code

Parallel Optimised

Code

PETSc firedrakePyOP2 *

Computational ScientistDomain specialist

discretisation

Use case: complex iterative solver for pressure correction eqn.* All credit for developing firedrake/PyOP2 goes to David Ham and his group at

Imperial, I’m giving this talk as a “user”

Eike Mueller FEM multigrid solvers in NWP

Separation of concerns and high-level abstractions

Page 3: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

3/20

Motivation

Computational challenge in numerical forecasts:

Solver for elliptic pressure correction equation

(algorithmic + parallel) scalability & performance

1 4 16 64 256 1024 4096 16384Number of GPUs

0.1

1

10

Tota

l sol

utio

n tim

e [s

]

16 64 256 1024 4096 16384 65536Number of CPU cores

256512768

CGMultigrid

Titan [nVidia Kepler]Hector [AMD Opteron]

1 4 16 64 256 1024 4096 16384Number of GPUs

109

1010

1011

1012

1013

1014

1015

1016

FLOP

s

GigaFLOP

TeraFLOP

PetaFLOP

single GPU peak

Titan theoretical peak

256512768

CGMultigrid

Weak scaling, total solution time Absolute performance

0.5 · 1012 dofs on 16384 GPUs of TITAN, 0.78 PFLOPs, 20%-50% BWFinite volume discretisation, simpler geometry

Eike Mueller FEM multigrid solvers in NWP

Page 4: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

4/20

Motivation

ChallengesFinite element discretisation (Met Office next generation dyn. core),unstructured grids⇒ more complicatedDuplicate effort for CPU, GPU (Xeon Phi,. . . ) implementation& optimisation, reinvent the wheel?Mix two issues:

algorithmic (domain specialist/numerical analyst)parallelisation & low level optimisation (computational scientist)

GoalSeparation of concerns (algorithmic↔ computational)⇒ rapid algorithm development and testing at highabstraction levelPerformance portabilityReuse existing, tested and optimised tools and libraries

⇒ Choice of correct abstraction(s)Eike Mueller FEM multigrid solvers in NWP

Page 5: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

5/20

This talk

Iterative solver for Helmholtz equation:

φ + ω∇ (φ∗u) = rφ−u − ω∇φ = ru

Mixed finite element discretisation, icosahedral grid

Matrix-free multigrid preconditioner for pressure correction

Performance portable firedrake/PyOP2 toolchain[Rathgeber et al., (2012 & 2014)]

Python⇒ automatic generation of optimised C code

Collaboration between

Numerical algorithm developers (Colin, Rob, Eike)

Computational scientists, Library developers (David,Lawrence, PETSc,PyOP2,firedrake teams)

Eike Mueller FEM multigrid solvers in NWP

Page 6: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

6/20

Shallow water equations

Model system: shallow water equations

∂φ

∂t+ ∇ (φu) = 0

∂u∂t

+ (u · ∇) u + g∇φ = 0

Implicit timestepping⇒ linear system for velocity u and height perturbation φ

φ + ω∇ (φ∗u) = rφ−u − ω∇φ = ru

reference state φ∗ ≡ 1

ω = ch∆t =√

gφ∗∆t

Eike Mueller FEM multigrid solvers in NWP

Page 7: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

7/20

Mixed system

Finite element discretisation: φ ∈ V2, u ∈ V1

⇒ Matrix system for dof-vectors Φ ∈ RnV2 , U ∈ RnV1

A(ΦU

)=

(Mφ ωBωBT −Mu

) (ΦU

)=

(Rφ

Ru

)

Mixed finite elements

DG0 + RT1 (lowest order)

DG1 + BDFM1 (higher order)

(Mφ)ij ≡

∫Ω

βi(x)βj(x) dx, (Mu)ef ≡

∫Ω

ve(x) · v f (x) dx (mass matrix)

Bie ≡

∫Ω

∇ve(x)βi(x) dx (derivative matrix)

Eike Mueller FEM multigrid solvers in NWP

Page 8: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

8/20

Iterative solver

Iterative solver

1 Outer iteration (mixed system):GMRES, Richardson, BiCGStab, . . .

Preconditioner:

P ≡(

Mφ ωBωBT −M∗u

)lumped velocity mass matrix

P−1 =

(1 0

(M∗u)−1ωBT1

) (H−1 00 −(M∗u)−1

) (1 ωB(M∗u)−1

0 1

)Elliptic positive definite Helmholtz operator

H = Mφ + ω2B (M∗u)−1 BT

2 Inner iteration (pressure system Hφ′ = R ′φ):CG with geometric multigrid preconditioner

Eike Mueller FEM multigrid solvers in NWP

Page 9: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

9/20

Multigrid

Multigrid V-cycle

smooth

smooth

smooth

smooth

smooth

smooth

smooth x 1

x 1/4

x 1/16

x 1/64

Computational bottleneckSmoother and operator application

φ′ 7→ φ′ + ρrelaxD−1H

(R ′φ − Hφ′

)H = Mφ + ω2B (M∗u)−1 BT

Intrinsic length scale

Λ/∆x = cs∆t/∆x ∼ νCFL = O(10) grid spacings (∀ resolutions ∆x)

⇒ small number of multigrid levels/coarse smoother sufficient

Eike Mueller FEM multigrid solvers in NWP

Page 10: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

10/20

Grid iteration in FEM codes

Computational bottlenecks in finite element codes

1

23

4

6

7 8

9

513

14

1516

12

10 11

5

4

6

7 8

10

1112 5

51913

16

18

2021

1715

14

9

um (1)

um (3) um (2)

ɸee

e

e

Example: Weak divergenceφ = ∇ · u→ Φ = BU

φe 7→ φe + b(e)1 ume (1)

+ b(e)2 ume (2) + b(e)

3 ume (3)

Loop over topological entities (e.g. cells). In each cellAccess local dofs (e.g. pressure and velocity)Execute local kernelWrite data back

Manage halo exchanges, thread parallelismEike Mueller FEM multigrid solvers in NWP

Page 11: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

11/20

Firedrake/PyOP2 framework

PyOP2Parallelunstructuredmeshcomputationframework

FiredrakePerformance-portableFinite-elementcomputationframework

Unified FormLanguage (UFL)

PyOP2Interface

modifiedFFC

Parallel scheduling, code generation

CPU(OpenMP/OpenCL)

GPU(PyCUDA /PyOpenCL)

Futurearch.

Problem definitionin FEM weak form

Local assemblykernels (AST)

Parallel loops: kernelsexecuted over mesh

Explicitlyparallelhardware-specificimplemen-tation

Meshes,matrices,vectors

PETSc4py (KSP,SNES, DMPlex)

FiredrakeInterface

MPI

Geometry,(non)linearsolves

assembly,compiledexpressions

FIAT

parallelloop

parallelloop

COFFEEAST optimizer

data structures(Set, Map, Dat)

Algorithmsdevelopers,num. analysts

Eike, Rob, Colin

Librarydevelopers,comp. scientists

Lawrence, David,PyOP2firedrake,PETScdevelopers

[Florian Rathgeber, PhD thesis, Imperical College (2014)]

Eike Mueller FEM multigrid solvers in NWP

Page 12: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

12/20

UFL

Unified Form Language [Alnæs et al. (2014)]Example: Bilinear form

a(u, v) =

∫Ω

u(x)v(x) dx

Python code

from firedrake import *

mesh = UnitSquareMesh(8,8)

V = FunctionSpace(mesh,’CG’,1)

u = Function(V)

v = Function(V)

a_uv = assemble(u*v*dx)

Eike Mueller FEM multigrid solvers in NWP

Page 13: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

13/20

Generated code

C-Code generated byFiredrake/PyOP2

Kernel (below)execution over grid (right)

OpenMP backend

Eike Mueller FEM multigrid solvers in NWP

Page 14: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

14/20

Putting it all together

≈ 1, 600 lines of highly modular python code (incl. comments)1 Use UFL wherever possible

∫φ(∇·u) dx 7→ assemble(phi*div(u)*dx) (weak divergence operator)

2 Use PETSc for iterative solversprovide operators/preconditioners as callbacks

3 Escape hatch: Write direct PyOP2 parallel loops

lumped mass matrix division: U 7→ (M∗u)−1 Ukernel_code = ’’’void lumped_mass_divide(double **m,double **U)

for (int i=0; i<4; ++i) U[i][0] /= m[0][i];

’’’

kernel = op2.Kernel(kernel_code,"lumped_mass_divide")

op2.par_loop(kernel,facetset,

m.dat(op2.READ,self.facet2dof_map_facets),

u.dat(op2.RW,self.facet2dof_map_BDFM1))

Eike Mueller FEM multigrid solvers in NWP

Page 15: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

15/20

Helmholtz operator

Helmholtz operator Hφ = Mφφ + ω2B(M∗u)−1BTφ

Operator applicationclass Operator(object):[...]

def apply(self,phi):

psi = TestFunction(V_pressure)

w = TestFunction(V_velocity)

B_phi = assemble(div(w)*phi*dx)

self.velocity_mass_matrix.divide(B_phi)

BT_B_phi = assemble(psi*div(B_phi)*dx)

M_phi = assemble(psi*phi*dx)

return assemble(M_phi + self.omega**2*BT_B_phi)

Interface with PETScpetsc_op = PETSc.Mat().create()

petsc_op.setPythonContext(Operator(omega))petsc_op.setUp()

ksp = PETSc.KSP()

ksp.create()

ksp.setOperators(petsc_op)

Eike Mueller FEM multigrid solvers in NWP

Page 16: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

16/20

Algorithmic performance

Convergence historyInner Solve: 3 CG iterations with multigrid preconditioner

Icosahedral grid, 327,680 cells

4 multigrid levels

0 5 10 15 20iteration

10-6

10-5

10-4

10-3

10-2

10-1

100

rela

tive r

esi

dual ||r|| 2/||r

0|| 2

DG0 +RT0 Richardson

DG0 +RT0 GMRES

DG1 +BDFM1 Richardson

DG1 +BDFM1 GMRES

Eike Mueller FEM multigrid solvers in NWP

Page 17: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

17/20

Efficiency

Single node run on ARCHER, lowest order, 5.2 · 106 cells[preliminary]

0 1 2 3 4 5 6 7time [s]

17.1GB/s

10.6GB/s

4.7GB/s

1.3GB/s

Total solve

Pressure solve

Helmholtz operator

STREAM triad: 74.1GB/s per node⇒ up to ≈ 23% of peak BW

Eike Mueller FEM multigrid solvers in NWP

Page 18: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

18/20

Parallel scalability

Weak scaling on ARCHER [preliminary]

6 24 96 384

1536

Number of cores

100

101

102

Solu

tion t

ime [

s]Number of cells

1.3

mio

5.2

mio

21.0

mio

83.9

mio

335.

5mio

hypre BoomerAMG

matrix-free MGDG1 +BDFM1

DG0 +RT0

Eike Mueller FEM multigrid solvers in NWP

Page 19: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

19/20

Conclusion

Summary

Matrix-free multigrid-preconditioner for pressure correctionequationImplementation in firedrake/PyOP2/PETSc framework:

Performance portabilityCorrect abstractions⇒ Separation of concerns

Outlook

Test on GPU backend

Extend to 3d: Regular vertical grid⇒ Improved caching⇒Improved memory BW

Improve and extend parallel scaling

Full 3d dynamical core (Colin Cotter)

Eike Mueller FEM multigrid solvers in NWP

Page 20: Efficient multigrid solvers for mixed finite element ... · Eike Mueller FEM multigrid solvers in NWP. 2/20 Motivation Equations, Algorithms Parallel Optimised Code Equations, Algorithms

20/20

References

The firedrake project: http://firedrakeproject.org/

F. Rathgeber et al.: Firedrake: automating the finite elementmethod by composing abstractions. in preparation

F. Rathgeber et al.: PyOP2: A High-Level Framework forPerformance-Portable Simulations on Unstructured Meshes.HPC, Networking Storage and Analysis, SC Companion, p 1116-1123, Los

Alamitos, CA, USA, 2012. IEEE Computer Society

E. Muller et al.: Petascale elliptic solvers for anisotropic PDEson GPU clusters.Submitted to Parallel Computing (2014) [arxiv:1402.3545]

E. Muller, R. Scheichl: Massively parallel solvers for ellipticPDEs in Numerical Weather- and Climate Prediction.QJRMS (2013) [arxiv:1307.2036]

Helmholtz solver code on githubhttps://github.com/firedrakeproject/firedrake-helmholtzsolver

Eike Mueller FEM multigrid solvers in NWP