SIAM PP 14: Harald Köstler and Ulrich Ruede
The waLBerla Framework: Multi-physics Simulations on Heterogeneous Parallel Platforms
Harald Köstler, Uli Rüde (LSS Erlangen, [email protected])
Lehrstuhl für Simulation, Universität Erlangen-Nürnberg
www10.informatik.uni-erlangen.de
SIAM PP 14, Portland, February 2014
Outline
• 3D printing process as motivating example
• waLBerla
• performance driven co-design
• scalability
• GPU acceleration
• performance engineering
• conclusions
waLBerla: an HPC Multiphysics Framework
• Focus on the lattice Boltzmann method
• Written in C++
• Hybrid parallelization (MPI + OpenMP)
• Painstakingly optimized machine-specific kernels for maximum performance
• Generic, easily adaptable kernels for prototyping
• All data structures exa-scalable: from desktop to multi-petascale machines (and beyond)
• Portable (compiler/OS)
• Will go open source soon
Motivating Example: Simulation of Electron Beam Melting Process (Additive Manufacturing)
EU-Project Fast-EBM: ARCAM (Sweden), TWI (Cambridge), WTM (FAU), ZISC (FAU)
• Generation of powder bed
• Energy transfer by electron beam
• Modeling penetration depth
• Heat transfer
• Flow dynamics
• Melting/solidification
• Phase transition
• Surface tension
• Fluid flow
• Wetting, capillary forces
Joint work with C. Körner, M. Markl, R. Ammer
Simulation of Electron Beam Melting
Lattice Boltzmann Method
Lattice Boltzmann equation (single-relaxation time)
Equilibrium distribution function
Macroscopic quantities
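The formulas for these three items are not reproduced in this transcript. As a rough illustration, the sketch below uses the standard textbook single-relaxation-time (BGK) form for one lattice cell. The D2Q9 lattice, the relaxation time, the perturbation value, and all function names are assumptions chosen for brevity; none of them is taken from the slides or from waLBerla.

```python
# D2Q9 lattice: discrete velocities c_i and weights w_i (standard textbook values).
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def macroscopic(f):
    """Macroscopic quantities: rho = sum_i f_i, u = sum_i c_i f_i / rho."""
    rho = sum(f)
    ux = sum(fi * c[0] for fi, c in zip(f, C)) / rho
    uy = sum(fi * c[1] for fi, c in zip(f, C)) / rho
    return rho, ux, uy


def equilibrium(rho, ux, uy):
    """Second-order equilibrium with c_s^2 = 1/3:
    f_i^eq = w_i * rho * (1 + 3 c.u + 4.5 (c.u)^2 - 1.5 u.u)."""
    uu = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * uu))
    return feq


def collide(f, tau):
    """BGK collision: f_i* = f_i - (f_i - f_i^eq) / tau."""
    rho, ux, uy = macroscopic(f)
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]


# Example: perturb an equilibrium state (hypothetical values) and relax it once.
f0 = equilibrium(1.0, 0.05, -0.02)
f0[1] += 0.01
f1 = collide(f0, tau=0.8)
```

The collision leaves density and momentum of the cell unchanged. A full time step would follow it with a streaming step that moves each post-collision value f_i* to the neighbouring cell in direction c_i.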
Geometry Initialization
Complex geometry given by surface
Add regular block partitioning
Discard empty blocks
Allocate block data
Load balancing
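The initialization steps listed above can be sketched on a toy problem. This is a minimal illustration under stated assumptions: the geometry is a hypothetical ball given as an inside/outside predicate rather than a surface mesh, the block and domain sizes are made up, and the greedy heaviest-first balancer is a stand-in, since the slides do not say which load balancing scheme waLBerla uses.

```python
def make_blocks(domain, block):
    """Regular block partitioning: return the (x, y, z) origin of every block."""
    nx, ny, nz = domain
    bx, by, bz = block
    return [(x, y, z)
            for x in range(0, nx, bx)
            for y in range(0, ny, by)
            for z in range(0, nz, bz)]


def fluid_cells(origin, block, inside):
    """Count the cells of one block that lie inside the geometry."""
    ox, oy, oz = origin
    bx, by, bz = block
    return sum(1
               for x in range(ox, ox + bx)
               for y in range(oy, oy + by)
               for z in range(oz, oz + bz)
               if inside(x, y, z))


def balance(workloads, num_procs):
    """Greedy balancing: heaviest block first, onto the least-loaded process."""
    loads = [0] * num_procs
    owner = {}
    for origin, work in sorted(workloads.items(), key=lambda kv: -kv[1]):
        p = loads.index(min(loads))
        owner[origin] = p
        loads[p] += work
    return owner, loads


def inside_ball(x, y, z):
    """Hypothetical geometry: a ball of radius 12 centred in a 32^3 domain."""
    return (x - 15.5) ** 2 + (y - 15.5) ** 2 + (z - 15.5) ** 2 <= 12 ** 2


domain, block = (32, 32, 32), (8, 8, 8)
blocks = make_blocks(domain, block)                        # regular block partitioning
work = {b: fluid_cells(b, block, inside_ball) for b in blocks}
work = {b: n for b, n in work.items() if n > 0}            # discard empty blocks
data = {b: [0.0] * (8 * 8 * 8) for b in work}              # allocate block data
owner, loads = balance(work, num_procs=4)                  # load balancing
```

Only blocks that survive the discard step receive memory, which is what keeps the footprint proportional to the geometry rather than to its bounding box.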
Two Multi-PetaFlops Supercomputers
JUQUEEN
• Blue Gene/Q architecture
• 458,752 PowerPC A2 cores
• 16 cores (1.6 GHz) per node
• 16 GiB RAM per node
• 5D torus interconnect
• Europe's fastest supercomputer
SuperMUC
• Intel Xeon architecture
• 147,456 cores
• 16 cores (2.7 GHz) per node
• 32 GiB RAM per node
• Pruned tree interconnect
• World's fastest x86-based supercomputer
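A few derived figures follow directly from the numbers listed for the two machines. The short sketch below computes node counts, aggregate memory, and memory per core, assuming all nodes are identical and the core totals are exact; the function and variable names are illustrative.

```python
MACHINES = {
    # name: (total cores, cores per node, RAM per node in GiB), as listed above
    "JUQUEEN": (458_752, 16, 16),
    "SuperMUC": (147_456, 16, 32),
}


def derived(total_cores, cores_per_node, gib_per_node):
    """Return (node count, total RAM in TiB, RAM per core in GiB)."""
    nodes = total_cores // cores_per_node
    total_tib = nodes * gib_per_node / 1024
    return nodes, total_tib, gib_per_node / cores_per_node


for name, spec in MACHINES.items():
    nodes, tib, per_core = derived(*spec)
    print(f"{name}: {nodes} nodes, {tib:.0f} TiB RAM, {per_core:.0f} GiB per core")
```

Memory per core is the quantity that limits how many lattice cells each core can hold in a weak scaling run.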
Single Node Performance
[Figure: single node performance on SuperMUC and JUQUEEN]
Weak scaling (Lid Driven Cavity), TRT
• JUQUEEN: 16 processes per node, 4 threads per process
• SuperMUC: 4 processes per node, 4 threads per process
• 837 billion cell updates per second (GLups)
• 1.93 trillion cell updates per second (TLups)
Summary of Performance Evaluation on Coronary Geometry
• Weak scaling on JUQUEEN with over a trillion (10^12) fluid lattice cells
• Cell sizes of 1.27 µm (diameter of red blood cells about 7 µm)
• Strong scaling at cell sizes of 0.1 and 0.05 mm
• In excess of 2000 time steps per second
Project co-financed by Siemens Health Care Division
Paper at Supercomputing '13 with C. Godenschwager, M. Bauer, F. Schornbaum
See also: talk by Florian Schornbaum in MS 23, Wed.
waLBerla on Tsubame 2.0 at Tokyo Tech
• Compute nodes: 1442
• Processor: Intel Xeon X5670
• GPU: 3 x Nvidia Tesla M2050
• LINPACK performance: 1.2 Petaflops
• Power consumption: 1.4 MW
• Interconnect: QDR Infiniband
with C. Feichtinger, J. Habich, G. Wellein, T. Aoki (Tokyo Tech)
waLBerla with GPU acceleration
Overlapping computation and communication
Performance Model II
• Single node performance on Tsubame
• Machine balance: $B_m = \dfrac{\text{sustainable bandwidth}}{\text{peak performance}}$
• Code balance: $B_c = \dfrac{\text{no. of bytes loaded and stored}}{\text{no. of FLOPS executed}} = \dfrac{304}{200}$
• Lightspeed estimate: $l = \min\left(1, \dfrac{B_m}{B_c}\right)$
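The performance model on this slide can be evaluated with a few lines of arithmetic. The sketch below takes the per-cell figures on the slide to mean 304 bytes moved and 200 FLOPs executed per cell update, which is an interpretation of the extracted text. The bandwidth and peak values are hypothetical round numbers, not measurements of Tsubame hardware, and the function names are illustrative.

```python
def machine_balance(sustainable_bandwidth, peak_performance):
    """B_m in bytes per FLOP: sustainable bandwidth divided by peak performance."""
    return sustainable_bandwidth / peak_performance


def code_balance(bytes_moved, flops):
    """B_c in bytes per FLOP: bytes loaded and stored per FLOP executed."""
    return bytes_moved / flops


def lightspeed(b_m, b_c):
    """Attainable fraction of peak performance: l = min(1, B_m / B_c)."""
    return min(1.0, b_m / b_c)


# Per-cell cost as read from the slide (an interpretation, see above).
BYTES_PER_CELL = 304
FLOPS_PER_CELL = 200

# Hypothetical device figures, chosen as round numbers for illustration only.
bandwidth = 100e9   # sustainable memory bandwidth in bytes/s
peak = 500e9        # peak performance in FLOP/s

b_m = machine_balance(bandwidth, peak)
b_c = code_balance(BYTES_PER_CELL, FLOPS_PER_CELL)
l = lightspeed(b_m, b_c)

# Upper bound on throughput in million cell updates per second.
max_mlups = l * peak / FLOPS_PER_CELL / 1e6
```

Whenever B_m is smaller than B_c the kernel is memory bound, and the bound reduces to the sustainable bandwidth divided by the bytes moved per cell.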
Single Compute Node Performance
Single Compute Node Performance II
Performance Model Driven Single Compute Node Optimization
Weak scaling, 3 GPUs per node
Heterogeneous CPU-GPU Simulation
with C. Feichtinger, H. Köstler, J. Habich, G. Wellein, T. Aoki (Tokyo Tech)
Fluidized Beds:
• Direct numerical simulation
• Fully resolved particles
• Fluid-structure interaction
• 4-way coupling
Particles: 31250, Domain: 400x400x200, Timesteps: 400 000
Devices: 2 x M2070 + 1 Intel "Westmere", Runtime: 17.5 h
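The run parameters quoted for the fluidized bed imply an average throughput. The sketch below derives it, assuming that every cell of the 400x400x200 domain is updated in each time step and that the 17.5 h runtime covers the whole run including particle handling; the function name is illustrative.

```python
def average_mlups(domain, timesteps, runtime_hours):
    """Average cell updates per second, in millions, over the whole run."""
    nx, ny, nz = domain
    cell_updates = nx * ny * nz * timesteps
    return cell_updates / (runtime_hours * 3600.0) / 1e6


rate = average_mlups(domain=(400, 400, 200), timesteps=400_000, runtime_hours=17.5)
print(f"{rate:.0f} MLUPS")  # about 203 MLUPS
```

This is an end-to-end average for the coupled simulation, so it is not directly comparable to pure fluid kernel rates.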
Fluid-Structure Interaction: direct simulation of Particle Laden Flows (4-way coupling)
Tumbling Fibers
with D. Bartuschat and K. Gustavsson (KTH Stockholm): validation against integral equation / slender body approximation in Stokes flow
Thank you for your attention!
Questions?
Animation by S. Bogner. Slides, reports, thesis, animations available for download at: www10.informatik.uni-erlangen.de