

Hierarchical Hybrid Grids Comparison on Different HPC-Clusters Stokes Solver within HHG

Peta-Scale Multigrid for the Stokes System with Finite Elements

9th International Conference on Large-Scale Scientific Computations

Björn Gmeiner, Ulrich Rüde

June, 2013

Department for Computer Science 10 (System Simulation)


Contents

Hierarchical Hybrid Grids

Comparison on Different HPC-Clusters

Stokes Solver within HHG


Hierarchical Hybrid Grids


HHG - Combining finite element and multigrid methods

FE mesh may be unstructured.

What nodes to remove for coarsening? Not straightforward!

Why not start from the coarse grid?


Properties of the HHG approach

Advantages

• Multigrid is straightforward

• Very memory efficient

• Stencil-based for fast execution

• Supports parallelization

⇒ 3 · 10^12 unknowns are possible

Limitation

• Coarse input grid needed

• Adaptivity


Performance Comparison on Different HPC-Clusters


Benchmark Problem and Discretization

Scalar elliptic equation:

\[
\int_\Omega \varepsilon(x)\, \nabla u(x) \cdot \nabla v(x)\, dx = \int_\Omega f(x)\, v(x)\, dx \tag{1}
\]

Finite elements settings:

• linear (tetrahedral) finite elements

• constant stencil for constant coefficients ε(x)

• on-the-fly assembly for variable coefficients inside the macro-elements, storage of the stencils on the interfaces
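To illustrate what on-the-fly assembly involves for linear tetrahedral elements, the following Python sketch computes the 4 × 4 local stiffness matrix of a single tetrahedron for equation (1). The function name `p1_tet_stiffness` and the treatment of ε as one constant per element are assumptions of this sketch; it is not taken from the HHG code.

```python
import numpy as np

def p1_tet_stiffness(vertices, eps=1.0):
    """Local stiffness matrix of one linear (P1) tetrahedron.

    vertices: (4, 3) array of corner coordinates.
    eps: coefficient, assumed constant on the element (assumption of
         this sketch, e.g. a midpoint value of eps(x)).
    """
    v = np.asarray(vertices, dtype=float)
    # Columns of J are the edge vectors from vertex 0 to vertices 1..3.
    J = (v[1:] - v[0]).T
    volume = abs(np.linalg.det(J)) / 6.0
    # Gradients of the reference basis functions, one row per function:
    # phi0 = 1 - x - y - z, phi1 = x, phi2 = y, phi3 = z.
    grad_ref = np.array([[-1.0, -1.0, -1.0],
                         [1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0],
                         [0.0, 0.0, 1.0]])
    # Physical gradients as rows: grad_ref @ J^{-1} equals (J^{-T} grad)^T.
    grads = grad_ref @ np.linalg.inv(J)
    # Gradients are constant on the element, so the integral is volume * dot product.
    return eps * volume * (grads @ grads.T)

# Reference tetrahedron and an arbitrary, hypothetical element.
K_ref = p1_tet_stiffness([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
K_any = p1_tet_stiffness([[0.1, 0.0, 0.2], [1.3, 0.2, 0.0],
                          [0.0, 0.9, 0.1], [0.2, 0.1, 1.1]], eps=2.5)
```

Every row of the local matrix sums to zero because the basis functions sum to one. For constant ε and a regularly refined macro-element, the assembled rows repeat from node to node, which is why a single constant stencil suffices in that case.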


Multigrid Setup

Multigrid sub-algorithms and parameters:

• three Gauss-Seidel steps each for pre- and post-smoothing

• linear interpolation

• six multigrid levels

• parallel conjugate gradient algorithm on the coarsest grid

• direct coarse grid approximation with coefficient averaging
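The components listed above can be sketched on a small model problem. The Python code below runs V-cycles for the 1D Poisson equation with three Gauss-Seidel sweeps before and after the coarse-grid correction, linear interpolation, six levels, re-discretization on each coarse grid, and a conjugate gradient solve on the coarsest grid. The 1D setting, the full-weighting restriction, the sequential CG, and all function names are assumptions chosen for brevity; this is not the HHG implementation.

```python
import numpy as np

def apply_A(u, h):
    """Matrix-free 1D stencil [-1, 2, -1] / h^2 with zero Dirichlet boundaries."""
    Au = 2.0 * u
    Au[1:] -= u[:-1]
    Au[:-1] -= u[1:]
    return Au / h**2

def gauss_seidel(u, f, h, sweeps):
    """Lexicographic Gauss-Seidel sweeps, updating u in place."""
    n = len(u)
    for _ in range(sweeps):
        for i in range(n):
            left = u[i - 1] if i > 0 else 0.0
            right = u[i + 1] if i < n - 1 else 0.0
            u[i] = 0.5 * (h**2 * f[i] + left + right)
    return u

def restrict(r):
    """Full weighting onto every second interior point."""
    return 0.25 * r[:-2:2] + 0.5 * r[1:-1:2] + 0.25 * r[2::2]

def prolong(e, n_fine):
    """Linear interpolation from the coarse grid to the fine grid."""
    fine = np.zeros(n_fine)
    fine[1:-1:2] = e
    fine[2:-2:2] = 0.5 * (e[:-1] + e[1:])
    fine[0] = 0.5 * e[0]
    fine[-1] = 0.5 * e[-1]
    return fine

def conjugate_gradient(f, h, tol=1e-12, max_iter=200):
    """Plain CG for the coarsest-grid system."""
    u = np.zeros_like(f)
    r = f.copy()
    p = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) < tol:
            break
        Ap = apply_A(p, h)
        alpha = rr / (p @ Ap)
        u += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return u

def v_cycle(u, f, h, level, nu=3):
    """One V(nu, nu) cycle; level 0 is the coarsest grid."""
    if level == 0:
        return conjugate_gradient(f, h)
    u = gauss_seidel(u, f, h, nu)
    r = f - apply_A(u, h)
    n_coarse = (len(u) - 1) // 2
    e = v_cycle(np.zeros(n_coarse), restrict(r), 2.0 * h, level - 1, nu)
    u = u + prolong(e, len(u))
    return gauss_seidel(u, f, h, nu)

# Six levels: 127, 63, 31, 15, 7, and 3 interior points.
n = 127
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
f = np.pi**2 * np.sin(np.pi * x)      # exact solution is sin(pi x)
u = np.zeros(n)
r0 = np.linalg.norm(f - apply_A(u, h))
for _ in range(10):
    u = v_cycle(u, f, h, level=5)
reduction = np.linalg.norm(f - apply_A(u, h)) / r0
```

The operator is applied as a stencil and never stored as a matrix, which mirrors the stencil-based, memory-efficient design described earlier in the talk.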


Benchmark-Machine Overview

                                        JUGENE            JUQUEEN           SuperMUC
System                                  IBM BlueGene/P    IBM BlueGene/Q    IBM System x iDataPlex
Processor                               IBM PowerPC 450   IBM PowerPC A2    Intel Xeon E5-2680 8C
Clock Frequency                         0.85 GHz          1.6 GHz           2.8 GHz
Number of Nodes                         73 728            28 672            9 216
HW Threads per Node (cores x threads)   4                 16x4              16x2
Memory per HW Thread                    0.5 GB            0.25 GB           1 GB
Network Topology                        3D Torus          5D Torus          Tree
GFlops per Watt                         0.44              2.54              0.94

Table: System overview of the supercomputers JUGENE, JUQUEEN, and SuperMUC


Performance comparison at ≈ 0.8 PFlop peak

                                              JUGENE    JUQUEEN   SuperMUC
Single Node
  Peak FLOP/s (const. coeff. ε)               6%        7%        12%
  Peak FLOP/s (var. coeff. ε)                 9%        10%       13%
  Peak bandwidth eq. (const. coeff. ε)        11%       53%       60%
Parallel Efficiencies (at ≈ 0.8 PFlop peak)
  Scaling (const. coeff. ε)                   65%       64%       72%
  Scaling (var. coeff. ε)                     94%       93%       96%
  Number of processes                         262 144   262 144   32 768
Energy Efficiency
  Energy improvement compared to JUGENE
  (const. coeff. ε)                           1         6.6       4.7
  Energy improvement compared to JUGENE
  (var. coeff. ε)                             1         6.4       3.2

Table: Single-node performance, parallel performance, and power consumption on the different clusters.


Parallel Efficiencies of HHG on Different Clusters

[Figure: parallel efficiency (axis from 0.5 to 1) versus problem size (axis from 1.0E+07 to 1.0E+13) for JUGENE, JUQUEEN, and SuperMUC. Annotations in the plot: 1 Node Card, 1 Midplane, 1 Island, Hybrid Parallel, 262 144 cores, 393 216 cores, 131 072 cores.]


Stokes Solver within HHG


Boussinesq Model for Mantle Convection Problems

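\[
\begin{aligned}
-\nabla \cdot \left( 2\mu\, \varepsilon(u) \right) + \nabla p &= \mathrm{Ra}\, T\, e_r, \\
\nabla \cdot u &= 0, \\
\frac{\partial T}{\partial t} + u \cdot \nabla T - \nabla \cdot (\nabla T) &= \gamma.
\end{aligned}
\]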

Helps to understand geophysical phenomena such as

• Continental drift

• Plate tectonics

• Earthquakes


Stokes System

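\[
-\nabla \cdot (2\mu \nabla u) + \nabla p = \mathrm{Ra}\, T\, e_r, \tag{2}
\]

\[
\nabla \cdot (\rho u) = 0. \tag{3}
\]

Decoupling with a pressure correction approach using the Schur complement leads to

\[
\begin{pmatrix} A & B^T \\ 0 & -B A^{-1} B^T \end{pmatrix}
\begin{pmatrix} U \\ P \end{pmatrix}
=
\begin{pmatrix} F \\ -B A^{-1} F \end{pmatrix}
\]

with the matrices

\[
A_{ij} = \left( \nabla \phi^u_i,\; 2\mu \nabla \phi^u_j \right), \tag{4}
\]

\[
B_{ij} = -\left( \phi^q_i,\; \nabla \cdot \phi^u_j \right), \tag{5}
\]

\[
F_i = \left( \phi^u_i,\; \mathrm{Ra}\, T\, e_r \right). \tag{6}
\]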
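The decoupling can be checked on a small dense example. Starting from the coupled saddle-point system with blocks A, B^T, B, and 0, block elimination yields a pressure equation with the Schur complement B A^{-1} B^T, followed by a velocity solve with A. The Python sketch below uses random stand-in matrices (an assumption, not a finite element discretization) and forms the Schur complement densely, which is only feasible at this toy size; a large-scale solver would apply it iteratively, for example with a multigrid solve in place of A^{-1}.

```python
import numpy as np

rng = np.random.default_rng(0)
n_u, n_p = 6, 3                        # hypothetical numbers of velocity / pressure unknowns

M = rng.standard_normal((n_u, n_u))
A = M @ M.T + n_u * np.eye(n_u)        # symmetric positive definite stand-in for A
B = rng.standard_normal((n_p, n_u))    # full-rank stand-in for the divergence block
F = rng.standard_normal(n_u)

# Schur complement S = B A^{-1} B^T, formed explicitly only because the example is tiny.
A_inv_Bt = np.linalg.solve(A, B.T)
A_inv_F = np.linalg.solve(A, F)
S = B @ A_inv_Bt

# Pressure first: S P = B A^{-1} F. Then velocity: A U = F - B^T P.
P = np.linalg.solve(S, B @ A_inv_F)
U = np.linalg.solve(A, F - B.T @ P)

# Reference: direct solve of the coupled system [[A, B^T], [B, 0]] [U; P] = [F; 0].
K = np.block([[A, B.T], [B, np.zeros((n_p, n_p))]])
UP = np.linalg.solve(K, np.concatenate([F, np.zeros(n_p)]))
```

Both routes give the same velocity and pressure, and the computed velocity satisfies the discrete constraint B U = 0.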


Spherical refinement of an icosahedron (0)


Discretization with prismatic elements


Spherical refinement of an icosahedron (0)


Spherical refinement of an icosahedron (1)


Spherical refinement of an icosahedron (2)


Spherical refinement of an icosahedron (3)


Spherical refinement of an icosahedron (4)
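The refinement sequence from level 0 to level 4 can be reproduced for the spherical surface with a standard subdivision scheme: each triangle is split into four, and the new edge midpoints are projected onto the unit sphere. The Python sketch below is a generic illustration of that scheme and covers the surface mesh only; the function names are hypothetical and the radial extension into a shell is not included.

```python
import numpy as np

def icosahedron():
    """Vertices on the unit sphere and the 20 triangular faces of an icosahedron."""
    t = (1.0 + np.sqrt(5.0)) / 2.0
    verts = np.array([
        [-1, t, 0], [1, t, 0], [-1, -t, 0], [1, -t, 0],
        [0, -1, t], [0, 1, t], [0, -1, -t], [0, 1, -t],
        [t, 0, -1], [t, 0, 1], [-t, 0, -1], [-t, 0, 1]], dtype=float)
    verts /= np.linalg.norm(verts, axis=1, keepdims=True)
    faces = [(0, 11, 5), (0, 5, 1), (0, 1, 7), (0, 7, 10), (0, 10, 11),
             (1, 5, 9), (5, 11, 4), (11, 10, 2), (10, 7, 6), (7, 1, 8),
             (3, 9, 4), (3, 4, 2), (3, 2, 6), (3, 6, 8), (3, 8, 9),
             (4, 9, 5), (2, 4, 11), (6, 2, 10), (8, 6, 7), (9, 8, 1)]
    return verts, faces

def refine(verts, faces):
    """Split every triangle into four; project new midpoints onto the unit sphere."""
    verts = [np.asarray(v) for v in verts]
    midpoint_index = {}

    def midpoint(a, b):
        # Each edge is shared by two triangles, so cache its midpoint.
        key = (min(a, b), max(a, b))
        if key not in midpoint_index:
            m = verts[a] + verts[b]
            verts.append(m / np.linalg.norm(m))
            midpoint_index[key] = len(verts) - 1
        return midpoint_index[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (b, bc, ab), (c, ca, bc), (ab, bc, ca)]
    return np.array(verts), new_faces

verts, faces = icosahedron()
counts = [(len(verts), len(faces))]
for level in range(1, 5):
    verts, faces = refine(verts, faces)
    counts.append((len(verts), len(faces)))
```

At refinement level k the surface mesh has 10 · 4^k + 2 vertices and 20 · 4^k triangles, so the vertex count roughly quadruples with every level.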


Icosahedral Finite Element Mesh


Run-time breakdown of the parallel Stokes solver on JUQUEEN with 15 360 threads

[Figure: bar chart with two series (legend: Computation, Communication) for the categories Overall; Pressure correction; Multigrid; Multigrid, w.o. finest level; Multigrid, CL; Multigrid, CL, scalar product. Bar labels in extraction order: 15.8 s, 12.2 s, 8.5 s, 4.6 s, 1.6 s, 1.0 s and 42.6 s, 38.2 s, 28.2 s, 8.8 s, 4.5 s, 0.1 s.]


JUQUEEN: Stokes - Weak Scaling

Nodes     Threads    Grid points    Resolution   Time (A)   Time (B)
1         30         2.1 · 10^7     32 km        30 s       89 s
4         240        1.6 · 10^8     16 km        38 s       114 s
30        1 920      1.3 · 10^9     8 km         40 s       121 s
240       15 360     1.1 · 10^10    4 km         44 s       133 s
1 920     122 880    8.5 · 10^10    2 km         48 s       153 s
15 360    983 040    6.9 · 10^11    1 km         54 s       170 s

Table: Weak scaling of the Stokes equation on the mantle geometry on JUQUEEN. The pressure residual is reduced by at least three orders of magnitude for (A), and eight orders of magnitude for (B).
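One common way to summarize such a weak scaling series is the ratio of the smallest run's time to each larger run's time. The Python sketch below applies that definition to the times in the table; choosing the 30-thread run as the baseline and the function name `weak_scaling_efficiency` are assumptions of this sketch, and the resulting numbers are not reported in the talk.

```python
# Threads and run-times in seconds, copied from the weak scaling table.
threads = [30, 240, 1920, 15360, 122880, 983040]
time_a = [30.0, 38.0, 40.0, 44.0, 48.0, 54.0]       # case (A)
time_b = [89.0, 114.0, 121.0, 133.0, 153.0, 170.0]  # case (B)

def weak_scaling_efficiency(times):
    """Efficiency relative to the first (smallest) run: T_first / T_n."""
    return [times[0] / t for t in times]

eff_a = weak_scaling_efficiency(time_a)
eff_b = weak_scaling_efficiency(time_b)

for n, ea, eb in zip(threads, eff_a, eff_b):
    print(f"{n:>7d} threads: efficiency (A) {ea:.2f}, (B) {eb:.2f}")
```

Under this definition the largest run, with 983 040 threads and a problem about 32 768 times larger than the baseline, keeps slightly more than half of the baseline efficiency in both cases.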


SuperMUC: Stokes - Strong Scaling

[Figure: bar chart of run-time [s] (axis from 0 to 80) versus number of cores, with the parallel efficiency for case A as a line on a second axis (0 to 1). Bar labels, assuming the legend order Time (case A), Time (case B):]

Cores     Time (case A)   Time (case B)
1 920     27 s            80 s
3 840     15 s            44 s
7 680     8.3 s           25 s
15 360    4.7 s           14 s
30 720    3.0 s           8.9 s

Figure: Run-times of strong scaling experiments on SuperMUC from 1 920 to 30 720 cores.
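For a strong scaling series, parallel efficiency relative to the smallest run is usually computed as (p_0 · T_0) / (p · T_p). The Python sketch below applies that formula to the run-times printed on the bars. Assigning the smaller set of values to case A and the larger set to case B is an assumption based on the legend order, and the efficiencies computed here are derived values, not numbers read from the plotted efficiency curve.

```python
# Core counts and bar labels (seconds) from the strong scaling figure.
cores = [1920, 3840, 7680, 15360, 30720]
time_a = [27.0, 15.0, 8.3, 4.7, 3.0]    # assumed to be case A
time_b = [80.0, 44.0, 25.0, 14.0, 8.9]  # assumed to be case B

def strong_scaling_efficiency(cores, times):
    """Efficiency relative to the smallest run: (p0 * T0) / (p * Tp)."""
    base = cores[0] * times[0]
    return [base / (p * t) for p, t in zip(cores, times)]

eff_a = strong_scaling_efficiency(cores, time_a)
eff_b = strong_scaling_efficiency(cores, time_b)

for p, ea, eb in zip(cores, eff_a, eff_b):
    print(f"{p:>6d} cores: efficiency (A) {ea:.2f}, (B) {eb:.2f}")
```

With these inputs, a 16-fold increase in cores shortens the run-time by a factor of about 9 in both cases.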


Thermal advection in a thick spherical shell, simulated using our prototype of Terra-Neo


Conclusions and Outlook

• Comparison of a geometric multigrid on three different large-scale clusters

• First extensions of HHG to solve the Stokes system

• Utilize HHG as a test-bed for a new Earth mantle convection code (TERRA-NEO)


Number of Threads   Number of Unknowns   Time per V-cycle
64                  1.33 · 10^8          2.34 s
128                 2.67 · 10^8          2.41 s
256                 5.35 · 10^8          2.80 s
512                 1.07 · 10^9          2.82 s
1 024               2.14 · 10^9          2.82 s
2 048               4.29 · 10^9          2.84 s
4 096               8.58 · 10^9          2.96 s
8 192               1.72 · 10^10         3.09 s
16 384              3.43 · 10^10         3.15 s
32 768              6.87 · 10^10         3.28 s
65 536              1.37 · 10^11         3.39 s
131 072             2.75 · 10^11         3.56 s
262 144             5.50 · 10^11         3.68 s
524 288             1.10 · 10^12         3.76 s
1 048 576           2.20 · 10^12         4.07 s
1 572 864           3.29 · 10^12         4.03 s

Table: Weak scaling experiment on JUQUEEN solving a problem on the full machine with up to 3.29 · 10^12 unknowns.
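Two derived quantities make this table easier to read: the number of unknowns per thread, which stays roughly constant in a weak scaling experiment, and the aggregate number of unknowns processed per second of V-cycle time. The Python sketch below computes both from the table entries; the variable names are hypothetical and the derived numbers are not stated in the talk.

```python
# (threads, unknowns, seconds per V-cycle), copied from the weak scaling table.
rows = [
    (64, 1.33e8, 2.34), (128, 2.67e8, 2.41), (256, 5.35e8, 2.80),
    (512, 1.07e9, 2.82), (1024, 2.14e9, 2.82), (2048, 4.29e9, 2.84),
    (4096, 8.58e9, 2.96), (8192, 1.72e10, 3.09), (16384, 3.43e10, 3.15),
    (32768, 6.87e10, 3.28), (65536, 1.37e11, 3.39), (131072, 2.75e11, 3.56),
    (262144, 5.50e11, 3.68), (524288, 1.10e12, 3.76),
    (1048576, 2.20e12, 4.07), (1572864, 3.29e12, 4.03),
]

# Work per thread: should be nearly constant for a weak scaling series.
unknowns_per_thread = [n / t for t, n, _ in rows]

# Aggregate throughput: unknowns handled per second of V-cycle time.
throughput = [n / s for _, n, s in rows]

for (t, n, s), load, rate in zip(rows, unknowns_per_thread, throughput):
    print(f"{t:>9d} threads: {load:.3g} unknowns/thread, {rate:.3g} unknowns/s")
```

Each thread holds about 2.1 million unknowns throughout the series, while the time per V-cycle grows from 2.34 s to about 4 s as the machine size increases by a factor of 24 576.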
