Peta-Scale Multigrid for the Stokes System with Finite Elements
9th International Conference on Large-Scale Scientific Computations
Björn Gmeiner, Ulrich Rüde
June 2013
Department for Computer Science 10 (System Simulation)
Contents
• Hierarchical Hybrid Grids
• Comparison on Different HPC-Clusters
• Stokes Solver within HHG
Hierarchical Hybrid Grids
HHG - Combining finite element and multigrid methods
The FE mesh may be unstructured. Which nodes should be removed for coarsening? Not straightforward! Why not start from the coarse grid instead?
Properties of the HHG approach
Advantages
• Multigrid is straightforward
• Very memory efficient
• Stencil-based for fast execution
• Well suited to parallelization
⇒ 3 · 10^12 unknowns are possible (see the estimate after this list)
Limitations
• Coarse input grid needed
• Limited adaptivity (refinement is uniform within each macro-element)
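A rough back-of-envelope estimate (not from the slides; it assumes about five double-precision vectors stored per fine-grid unknown and takes JUQUEEN's node count and per-node memory from the system table later in the talk) indicates why unknown counts of this order fit into memory:

```latex
\[
  \underbrace{28\,672 \times 16\,\mathrm{GB}}_{\text{aggregate memory} \;\approx\; 459\,\mathrm{TB}}
  \;\gg\;
  \underbrace{5 \times 8\,\mathrm{B} \times 3.3 \cdot 10^{12}}_{\text{vector storage} \;\approx\; 132\,\mathrm{TB}}
\]
```

This headroom is exactly what the stencil-based representation preserves: an explicitly assembled sparse matrix with roughly 15 nonzeros per row would by itself require several hundred additional terabytes.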
Performance Comparison on Different HPC-Clusters
Benchmark Problem and Discretization
Scalar elliptic equation (weak form):

∫_Ω ε(x) ∇u(x) · ∇v(x) dx = ∫_Ω f(x) v(x) dx    (1)
Finite element settings:
• linear (tetrahedral) finite elements
• constant stencil for constant coefficients ε(x)
• on-the-fly assembly for variable coefficients inside the macro-elements; stencils are stored only on the macro-element interfaces (see the sketch after this list)
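As a concrete illustration, the following is a minimal, self-contained C++ sketch (not HHG source code) of stencil-based operator application in the interior of a regularly refined macro-element. It uses a 7-point stencil on a Cartesian index space for brevity, whereas HHG's regular tetrahedral refinement yields 15-point stencils; the grid size, the nodal coefficient array eps, and the helper assembleStencil are illustrative assumptions.

```cpp
#include <array>
#include <cstddef>
#include <iostream>
#include <vector>

// Illustrative 7-point stencil on an n^3 index space. HHG's regular
// tetrahedral refinement leads to 15-point stencils; the structure is the same.
using Stencil = std::array<double, 7>;  // center, -x, +x, -y, +y, -z, +z

// Hypothetical on-the-fly "assembly": the stencil is simply scaled by a nodal
// coefficient eps(x). A real FE assembly would integrate over adjacent elements.
Stencil assembleStencil(double eps) {
  return {6.0 * eps, -eps, -eps, -eps, -eps, -eps, -eps};
}

// Apply v = A*u in the interior. A non-null constStencil selects the
// constant-coefficient path (one stencil reused for the whole macro-element);
// nullptr rebuilds the stencil at every node from the coefficient field.
void apply(const std::vector<double>& u, std::vector<double>& v,
           const std::vector<double>& eps, std::size_t n,
           const Stencil* constStencil) {
  auto idx = [n](std::size_t i, std::size_t j, std::size_t k) {
    return (k * n + j) * n + i;
  };
  for (std::size_t k = 1; k + 1 < n; ++k)
    for (std::size_t j = 1; j + 1 < n; ++j)
      for (std::size_t i = 1; i + 1 < n; ++i) {
        const std::size_t c = idx(i, j, k);
        const Stencil s = constStencil ? *constStencil : assembleStencil(eps[c]);
        v[c] = s[0] * u[c]
             + s[1] * u[idx(i - 1, j, k)] + s[2] * u[idx(i + 1, j, k)]
             + s[3] * u[idx(i, j - 1, k)] + s[4] * u[idx(i, j + 1, k)]
             + s[5] * u[idx(i, j, k - 1)] + s[6] * u[idx(i, j, k + 1)];
      }
}

int main() {
  const std::size_t n = 64;  // assumed interior size of one macro-element
  std::vector<double> u(n * n * n, 1.0), v(n * n * n, 0.0), eps(n * n * n, 1.0);
  const Stencil constant = assembleStencil(1.0);
  apply(u, v, eps, n, &constant);  // constant coefficients: one stored stencil
  apply(u, v, eps, n, nullptr);    // variable coefficients: on-the-fly assembly
  std::cout << "v at the center: " << v[(n / 2 * n + n / 2) * n + n / 2] << "\n";
}
```

The distinction drawn on the slide is mainly about storage: with constant coefficients a single stencil serves the whole macro-element, while with variable coefficients the stencil is recomputed on the fly in the interior and stored only on the macro-element interfaces.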
Multigrid Setup
Multigrid sub-algorithms and parameters (a schematic V-cycle sketch follows this list):
• three Gauss-Seidel sweeps for pre- and post-smoothing
• linear interpolation
• six multigrid levels
• parallel conjugate gradient algorithm on the coarsest grid
• direct coarse-grid approximation with coefficient averaging
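These components combine into the standard V-cycle recursion. The following is a minimal 1D Poisson sketch (an assumed toy problem, not the HHG implementation) showing where the three Gauss-Seidel sweeps, the linear interpolation, the six levels, and the coarsest-grid solve enter; a brute-force smoothing loop stands in for the parallel conjugate gradient solver, and the full-weighting restriction is an assumption since the slides only name the prolongation.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// 1D Poisson model problem -u'' = f on (0,1) with homogeneous Dirichlet
// boundaries; purely illustrative, HHG works on 3D tetrahedral hierarchies.
using Vec = std::vector<double>;

void gaussSeidel(Vec& u, const Vec& f, double h, int sweeps) {
  for (int s = 0; s < sweeps; ++s)
    for (std::size_t i = 0; i < u.size(); ++i) {
      const double left  = i > 0 ? u[i - 1] : 0.0;
      const double right = i + 1 < u.size() ? u[i + 1] : 0.0;
      u[i] = 0.5 * (left + right + h * h * f[i]);
    }
}

Vec residual(const Vec& u, const Vec& f, double h) {
  Vec r(u.size());
  for (std::size_t i = 0; i < u.size(); ++i) {
    const double left  = i > 0 ? u[i - 1] : 0.0;
    const double right = i + 1 < u.size() ? u[i + 1] : 0.0;
    r[i] = f[i] - (2.0 * u[i] - left - right) / (h * h);
  }
  return r;
}

// One V-cycle with three pre-/post-smoothing sweeps and linear transfer operators.
void vcycle(Vec& u, const Vec& f, double h, int level) {
  if (level == 0) {               // coarsest grid: stands in for the parallel CG
    gaussSeidel(u, f, h, 200);
    return;
  }
  gaussSeidel(u, f, h, 3);        // pre-smoothing
  Vec r = residual(u, f, h);
  const std::size_t nc = (r.size() - 1) / 2;
  Vec fc(nc), ec(nc, 0.0);
  for (std::size_t i = 0; i < nc; ++i)    // full-weighting restriction (assumed)
    fc[i] = 0.25 * r[2 * i] + 0.5 * r[2 * i + 1] + 0.25 * r[2 * i + 2];
  vcycle(ec, fc, 2.0 * h, level - 1);
  for (std::size_t i = 0; i < nc; ++i) {  // linear interpolation of the correction
    u[2 * i]     += 0.5 * ec[i];
    u[2 * i + 1] += ec[i];
    u[2 * i + 2] += 0.5 * ec[i];
  }
  gaussSeidel(u, f, h, 3);        // post-smoothing
}

int main() {
  const int levels = 6;                                        // six multigrid levels
  const std::size_t n = (std::size_t(1) << (levels + 2)) - 1;  // fine-grid points
  Vec u(n, 0.0), f(n, 1.0);
  for (int cycle = 0; cycle < 5; ++cycle) vcycle(u, f, 1.0 / (n + 1), levels - 1);
  std::cout << "u at the midpoint: " << u[n / 2] << "\n";      // exact value is 0.125
}
```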
Benchmark-Machine Overview
                                    JUGENE            JUQUEEN           SuperMUC
System                              IBM BlueGene/P    IBM BlueGene/Q    IBM System x iDataPlex
Processor                           IBM PowerPC 450   IBM PowerPC A2    Intel Xeon E5-2680 8C
Clock frequency                     0.85 GHz          1.6 GHz           2.8 GHz
Number of nodes                     73 728            28 672            9 216
Cores per node × HW threads/core    4 × 1             16 × 4            16 × 2
Memory per HW thread                0.5 GB            0.25 GB           1 GB
Network topology                    3D Torus          5D Torus          Tree
GFlop/s per Watt                    0.44              2.54              0.94

Table: System overview of the supercomputers JUGENE, JUQUEEN, and SuperMUC.
Performance comparison at ≈ 0.8 PFlop peak
                                               JUGENE     JUQUEEN    SuperMUC
Single node
  Peak FLOP/s (const. coeff. ε)                6%         7%         12%
  Peak FLOP/s (var. coeff. ε)                  9%         10%        13%
  Peak bandwidth eq. (const. coeff. ε)         11%        53%        60%
Parallel efficiencies (at ≈ 0.8 PFlop peak)
  Scaling (const. coeff. ε)                    65%        64%        72%
  Scaling (var. coeff. ε)                      94%        93%        96%
  Number of processes                          262 144    262 144    32 768
Energy efficiency
  Improvement over JUGENE (const. coeff. ε)    1          6.6        4.7
  Improvement over JUGENE (var. coeff. ε)      1          6.4        3.2

Table: Single-node performance, parallel performance, and power consumption on the different clusters.
Parallel Efficiencies of HHG on Different Clusters
[Figure: Parallel efficiency of HHG versus problem size (≈ 10^7 to 10^13 unknowns) on JUGENE, JUQUEEN, and SuperMUC. Annotations mark runs on one node card, one midplane, one island, and hybrid-parallel runs with 131 072, 262 144, and 393 216 cores.]
Stokes Solver within HHG
Boussinesq Model for Mantle Convection Problems
−∇ · (2µ ε(u)) + ∇p = Ra T e_r,
∇ · u = 0,
∂T/∂t + u · ∇T − ∇ · (∇T) = γ,

where u is the velocity, p the pressure, T the temperature, µ the viscosity, ε(u) the strain-rate tensor, Ra the Rayleigh number, e_r the radial unit vector, and γ a source term.
The model helps in understanding geophysical phenomena such as
• Continental drift
• Plate tectonics
• Earthquakes
Stokes System
∇ · (2µ∇u) − ∇p = Ra T e_r,    (2)
∇ · (ρu) = 0.    (3)

Decoupling with a pressure correction approach using the Schur complement leads to

( A    Bᵀ         ) ( U )   ( F )
( 0   −B A⁻¹ Bᵀ   ) ( P ) = ( 0 )

with the matrices

A_ij = (∇φ^u_i, 2µ ∇φ^u_j),    (4)
B_ij = −(φ^q_i, ∇ · φ^u_j),    (5)
F_i = (φ^u_i, Ra T e_r).    (6)
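The block-triangular system above can be read off from the standard block LU factorization of the Stokes saddle-point matrix; the identity below is textbook linear algebra rather than something specific to HHG:

```latex
\[
  \begin{pmatrix} A & B^{T} \\ B & 0 \end{pmatrix}
  =
  \begin{pmatrix} I & 0 \\ B A^{-1} & I \end{pmatrix}
  \begin{pmatrix} A & B^{T} \\ 0 & -B A^{-1} B^{T} \end{pmatrix}.
\]
```

The lower-triangular factor is trivial to invert, so the work concentrates in solves with A, handled by the multigrid described earlier, and in the Schur complement S = B A⁻¹ Bᵀ, which pressure correction schemes typically apply only implicitly through products with B and Bᵀ and (approximate) solves with A rather than assembling it.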
Spherical refinement of an icosahedron (0)
Discretization with prismatic elements
Spherical refinement of an icosahedron (0)
Spherical refinement of an icosahedron (1)
Spherical refinement of an icosahedron (2)
Spherical refinement of an icosahedron (3)
Spherical refinement of an icosahedron (4)
Icosahedral Finite Element Mesh
Run-time breakdown of the parallel Stokes solver on JUQUEEN with 15 360 threads
                                  Computation    Communication
Overall                           42.6 s         15.8 s
Pressure correction               38.2 s         12.2 s
Multigrid                         28.2 s          8.5 s
Multigrid, w/o finest level        8.8 s          4.6 s
Multigrid, coarsest level (CL)     4.5 s          1.6 s
Multigrid, CL scalar product       0.1 s          1.0 s
JUQUEEN: Stokes - Weak Scaling
Nodes     Threads    Grid points     Resolution    Time (A)    Time (B)
1         30         2.1 · 10^7      32 km         30 s        89 s
4         240        1.6 · 10^8      16 km         38 s        114 s
30        1 920      1.3 · 10^9      8 km          40 s        121 s
240       15 360     1.1 · 10^10     4 km          44 s        133 s
1 920     122 880    8.5 · 10^10     2 km          48 s        153 s
15 360    983 040    6.9 · 10^11     1 km          54 s        170 s

Table: Weak scaling of the Stokes equation on the mantle geometry on JUQUEEN. The pressure residual is reduced by at least three orders of magnitude for (A) and eight orders of magnitude for (B).
SuperMUC: Stokes - Strong Scaling
Cores            1 920    3 840    7 680    15 360    30 720
Time (case A)    27 s     15 s     8.3 s    4.7 s     3.0 s
Time (case B)    80 s     44 s     25 s     14 s      8.9 s

Figure: Run-times of strong scaling experiments on SuperMUC from 1 920 to 30 720 cores, together with the parallel efficiency for case A.
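The plotted efficiency presumably follows the standard strong-scaling definition relative to the 1 920-core baseline; this is an assumption, since the slides do not define it. For case A it gives, for example:

```latex
\[
  E(p) = \frac{p_0 \, T(p_0)}{p \, T(p)}, \qquad p_0 = 1\,920, \qquad
  E(30\,720) \approx \frac{1\,920 \cdot 27\,\mathrm{s}}{30\,720 \cdot 3.0\,\mathrm{s}} \approx 0.56 .
\]
```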
Thermal advection in a thick spherical shell, simulated using our prototype of Terra-Neo
Conclusions and Outlook
• Comparison of a geometric multigrid solver on three different large-scale clusters
• First extensions of HHG to solve the Stokes system
• Utilize HHG as a test-bed for a new Earth mantle convection code (TERRA-NEO)
Number of Threads    Number of Unknowns    Time per V-cycle
64                   1.33 · 10^8           2.34 s
128                  2.67 · 10^8           2.41 s
256                  5.35 · 10^8           2.80 s
512                  1.07 · 10^9           2.82 s
1 024                2.14 · 10^9           2.82 s
2 048                4.29 · 10^9           2.84 s
4 096                8.58 · 10^9           2.96 s
8 192                1.72 · 10^10          3.09 s
16 384               3.43 · 10^10          3.15 s
32 768               6.87 · 10^10          3.28 s
65 536               1.37 · 10^11          3.39 s
131 072              2.75 · 10^11          3.56 s
262 144              5.50 · 10^11          3.68 s
524 288              1.10 · 10^12          3.76 s
1 048 576            2.20 · 10^12          4.07 s
1 572 864            3.29 · 10^12          4.03 s

Table: Weak scaling experiment on JUQUEEN solving a problem on the full machine with up to 3.29 · 10^12 unknowns.