The waLBerla Framework: Multi-physics Simulations on Heterogeneous Parallel Platforms

Harald Köstler, Uli Rüde (LSS Erlangen, [email protected])
Lehrstuhl für Simulation, Universität Erlangen-Nürnberg
www10.informatik.uni-erlangen.de

SIAM PP 14, Portland, February 2014


Outline

• 3D printing process as motivating example
• waLBerla
• performance-driven co-design
• scalability
• GPU acceleration
• performance engineering
• conclusions


waLBerla: an HPC Multiphysics Framework

• Focus on lattice Boltzmann method
• written in C++
• Hybridly parallelized (MPI + OpenMP)
• painstakingly optimized machine-specific kernels for maximum performance
• generic, easily adaptable kernels for prototyping
• all data structures exa-scalable
  • from desktop to multi-petascale machines (and beyond)
• portable (Compiler/OS)
• will go open source soon


Motivating Example: Simulation of Electron Beam Melting Process (Additive Manufacturing)

EU project Fast-EBM

• ARCAM (Sweden)
• TWI (Cambridge)
• WTM (FAU)
• ZISC (FAU)

• Generation of powder bed
• Energy transfer by electron beam
  • modeling penetration depth
  • heat transfer
• Flow dynamics
• Melting/solidification
  • phase transition
  • surface tension
  • fluid flow
  • wetting, capillary forces

Joint work with C. Körner, M. Markl, R. Ammer


Simulation of Electron Beam Melting


Lattice Boltzmann Method

Lattice Boltzmann equation (single-relaxation time)

Equilibrium distribution function

Macroscopic quantities
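
For reference, the single-relaxation-time (BGK) lattice Boltzmann model named in the three headings above is commonly written in the following textbook form. The notation is an assumption and is not taken from the slides: f_i are the particle distribution functions, c_i the discrete lattice velocities, w_i the lattice weights, τ the relaxation time, and c_s the lattice speed of sound.

```latex
\begin{align}
% Lattice Boltzmann equation (single-relaxation time)
f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\, t + \Delta t) - f_i(\mathbf{x}, t)
  &= -\frac{\Delta t}{\tau}
     \left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\rho, \mathbf{u}) \right] \\
% Equilibrium distribution function
f_i^{\mathrm{eq}}(\rho, \mathbf{u})
  &= w_i\, \rho \left[ 1
     + \frac{\mathbf{c}_i \cdot \mathbf{u}}{c_s^2}
     + \frac{(\mathbf{c}_i \cdot \mathbf{u})^2}{2 c_s^4}
     - \frac{\mathbf{u} \cdot \mathbf{u}}{2 c_s^2} \right] \\
% Macroscopic quantities
\rho &= \sum_i f_i, \qquad
\rho\, \mathbf{u} = \sum_i \mathbf{c}_i f_i
\end{align}
```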

   


Geometry Initialization

Complex geometry given by surface

Add regular block partitioning

Discard empty blocks

Allocate block data

Load balancing
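
The initialization steps listed above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not waLBerla's actual data structures or API: the function names, the sphere used as geometry, and the greedy load balancer are all hypothetical, and the per-block dictionary merely stands in for allocated block data.

```python
from itertools import product


def partition_blocks(domain, block):
    """Yield (origin, size) for a regular block grid covering the domain."""
    ranges = [range(0, d, b) for d, b in zip(domain, block)]
    for origin in product(*ranges):
        size = tuple(min(b, d - o) for o, b, d in zip(origin, block, domain))
        yield origin, size


def count_fluid_cells(origin, size, is_fluid):
    """Count the cells of one block that lie inside the geometry."""
    return sum(
        1
        for offset in product(*(range(s) for s in size))
        if is_fluid(tuple(o + k for o, k in zip(origin, offset)))
    )


def initialize(domain, block, is_fluid, num_procs):
    # Partition the bounding box and discard blocks without fluid cells.
    blocks = []
    for origin, size in partition_blocks(domain, block):
        n = count_fluid_cells(origin, size, is_fluid)
        if n > 0:
            # This record stands in for the allocated block data.
            blocks.append({"origin": origin, "size": size, "fluid": n})

    # Load balancing (assumed strategy): heaviest block first,
    # always onto the currently least loaded process.
    loads = [0] * num_procs
    for b in sorted(blocks, key=lambda b: b["fluid"], reverse=True):
        rank = loads.index(min(loads))
        b["rank"] = rank
        loads[rank] += b["fluid"]
    return blocks, loads


def inside_sphere(p, center=(8, 8, 8), radius=6):
    """Hypothetical geometry: a sphere instead of a surface mesh."""
    return sum((a - c) ** 2 for a, c in zip(p, center)) <= radius ** 2


blocks, loads = initialize((16, 16, 16), (4, 4, 4), inside_sphere, num_procs=4)
print(len(blocks), "of 64 blocks kept; fluid cells per process:", loads)
```

Weighting blocks by their fluid cell count rather than by their volume reflects that only fluid cells cost compute time, which matters for sparse geometries where most blocks are partly empty.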


Two Multi-PetaFlops Supercomputers

JUQUEEN
• Blue Gene/Q architecture
• 458,752 PowerPC A2 cores
• 16 cores (1.6 GHz) per node
• 16 GiB RAM per node
• 5D torus interconnect
• Europe's fastest supercomputer

SuperMUC
• Intel Xeon architecture
• 147,456 cores
• 16 cores (2.7 GHz) per node
• 32 GiB RAM per node
• Pruned tree interconnect
• World's fastest x86-based supercomputer
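
The node counts and aggregate memory implied by these figures follow from simple arithmetic. The sketch below assumes that every listed core sits in an identical compute node, which the slide does not state explicitly; the dictionary layout and function name are illustrative only.

```python
# Figures copied from the slide; the structure is an illustrative assumption.
machines = {
    "JUQUEEN": {"cores": 458_752, "cores_per_node": 16, "gib_per_node": 16},
    "SuperMUC": {"cores": 147_456, "cores_per_node": 16, "gib_per_node": 32},
}


def node_summary(spec):
    """Return (number of nodes, aggregate RAM in TiB) for one machine."""
    nodes = spec["cores"] // spec["cores_per_node"]
    total_tib = nodes * spec["gib_per_node"] / 1024
    return nodes, total_tib


for name, spec in machines.items():
    nodes, tib = node_summary(spec)
    print(f"{name}: {nodes} nodes, {tib:.0f} TiB aggregate RAM")
```

Under this assumption JUQUEEN has about three times as many nodes as SuperMUC but only about 1.6 times the aggregate memory, so the same weak-scaling problem size per node is not available on both machines.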


Single Node Performance

[Figure: single-node performance on SuperMUC and JUQUEEN]


Weak scaling (Lid Driven Cavity) TRT

JUQUEEN: 16 processes per node, 4 threads per process

SuperMUC: 4 processes per node, 4 threads per process

837 billion cell updates per second (GLups)

1.93 trillion cell updates per second (TLups)


Summary of Performance Evaluation on Coronary Geometry

Weak scaling on JUQUEEN with over a trillion (10^12) fluid lattice cells

Cell sizes of 1.27 µm (diameter of red blood cells about 7 µm)

Strong scaling at cell sizes of 0.1 and 0.05 mm

In excess of 2000 time steps per second

Project co-financed by Siemens Health Care Division
Paper at Supercomputing '13 with C. Godenschwager, M. Bauer, F. Schornbaum
See also: talk by Florian Schornbaum in MS 23, Wed.


waLBerla on Tsubame 2.0 at Tokyo Tech

Compute nodes: 1442

Processor: Intel Xeon X5670

GPU: 3 x Nvidia Tesla M2050

LINPACK performance: 1.2 Petaflops

Power consumption: 1.4 MW

Interconnect: QDR InfiniBand

with C. Feichtinger, J. Habich, G. Wellein, T. Aoki (Tokyo Tech)


waLBerla with GPU acceleration


Overlapping computation and communication


Performance Model II

Single node performance on Tsubame

Machine balance

$$B_m = \frac{\text{sustainable bandwidth}}{\text{peak performance}}$$

Code balance

$$B_c = \frac{\text{no. of bytes loaded and stored}}{\text{no. of FLOPS executed}} = \frac{304}{200}$$

Lightspeed estimate

$$l = \min\left(1, \frac{B_m}{B_c}\right)$$
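
The performance model on this slide (machine balance, code balance, lightspeed estimate) can be evaluated numerically as in the following sketch. The 304 bytes and 200 FLOPS per cell update are the values from the slide; the bandwidth and peak performance are hypothetical round numbers chosen for illustration, not measurements of a Tsubame node, and the function names are made up for this example.

```python
BYTES_PER_CELL_UPDATE = 304  # bytes loaded and stored per cell update (slide)
FLOPS_PER_CELL_UPDATE = 200  # FLOPS executed per cell update (slide)


def machine_balance(bandwidth_bytes_per_s, peak_flops_per_s):
    """B_m: bytes the memory system can deliver per peak FLOP."""
    return bandwidth_bytes_per_s / peak_flops_per_s


def code_balance(bytes_moved, flops_executed):
    """B_c: bytes the kernel needs per FLOP it executes."""
    return bytes_moved / flops_executed


def lightspeed(b_m, b_c):
    """l: attainable fraction of peak performance, capped at 1."""
    return min(1.0, b_m / b_c)


# Hypothetical device, NOT measured values.
bandwidth = 100e9  # sustainable memory bandwidth in bytes/s (assumed)
peak = 500e9       # peak performance in FLOP/s (assumed)

b_m = machine_balance(bandwidth, peak)
b_c = code_balance(BYTES_PER_CELL_UPDATE, FLOPS_PER_CELL_UPDATE)
l = lightspeed(b_m, b_c)

# Upper bound on lattice cell updates per second for this device.
max_lups = l * peak / FLOPS_PER_CELL_UPDATE

print(f"B_m = {b_m:.3f} bytes/FLOP, B_c = {b_c:.3f} bytes/FLOP, l = {l:.3f}")
print(f"at most {max_lups / 1e6:.1f} million cell updates per second")
```

Whenever B_m is smaller than B_c the kernel is memory bound, and the bound on the update rate reduces to the bandwidth divided by the bytes moved per cell update, independent of the peak FLOP rate.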


Single Compute Node Performance


Single Compute Node Performance II


Performance Model Driven Single Compute Node Optimization


Weak scaling, 3 GPUs per node


Heterogeneous CPU-GPU Simulation
with C. Feichtinger, H. Köstler, J. Habich, G. Wellein, T. Aoki (Tokyo Tech)

Particles: 31250, Domain: 400x400x200, Timesteps: 400 000
Devices: 2 x M2070 + 1 Intel "Westmere", Runtime: 17.5 h

Fluidized Beds:

Direct numerical simulation
fully resolved particles

Fluid-structure-interaction

4-way-coupling
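
The run parameters quoted for the fluidized bed imply an average throughput that is easy to work out. The sketch below is back-of-the-envelope arithmetic under loud assumptions: it treats every cell of the 400x400x200 domain as updated in every time step (cells covered by particles included) and attributes the full 17.5 h to time stepping. Neither assumption is confirmed by the slides.

```python
# Values from the slide.
nx, ny, nz = 400, 400, 200
timesteps = 400_000
runtime_hours = 17.5

cells = nx * ny * nz                  # lattice cells in the domain
cell_updates = cells * timesteps      # assumes all cells update every step
runtime_seconds = runtime_hours * 3600

average_rate = cell_updates / runtime_seconds  # cell updates per second
steps_per_second = timesteps / runtime_seconds

print(f"{cells:,} cells, {cell_updates:.3e} cell updates in total")
print(f"average rate: {average_rate / 1e6:.1f} million cell updates per second")
print(f"time steps per second: {steps_per_second:.2f}")
```

Because the figure includes the particle dynamics and the coupling, it describes the whole coupled simulation and should not be compared directly with pure fluid kernel rates.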


Fluid-Structure Interaction
direct simulation of Particle Laden Flows (4-way coupling)


Tumbling Fibers

with D. Bartuschat and K. Gustavsson (KTH Stockholm):
validation against an integral-equation/slender-body approximation in Stokes flow


Thank you for your attention!

Questions?

Animation by S. Bogner. Slides, reports, thesis, animations available for download at: www10.informatik.uni-erlangen.de
