1
Nils Thürey, Thomas Pohl, Ulrich Rüde
Institute for System Simulation
University of Erlangen-Nürnberg
Parallelization Techniques for LBM Free Surface Flows using MPI and OpenMP
2
Overview
• Introduction
• OpenMP Parallelization
• MPI Parallelization
• Validation Experiment
• Conclusions
3
Overview
• Introduction
• OpenMP Parallelization
• MPI Parallelization
• Validation Experiment
• Conclusions
4
Introduction
Simulations with Free Surface Flows
• Applications, e.g.:
  – Computer graphics: special effects
  – Engineering: metal foam simulations
• Lattice Boltzmann method:
  – D3Q19 lattice
  – BGK collision with Smagorinsky turbulence model (update rule below)
  – Grid compression to reduce memory requirements
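For reference, the standard BGK collision-and-streaming update used by such D3Q19 models; in the Smagorinsky variant, the relaxation time τ is additionally adjusted per cell from the local strain rate (that formula is not given on the slide):

    f_i(\mathbf{x} + \mathbf{e}_i \Delta t,\; t + \Delta t) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\rho, \mathbf{u}) \right]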
5
Free Surface Flows with LBM
• Similar to Volume-of-Fluid:
  – Track a fill fraction for each cell
  – Compute the mass transfer between cells (sketched below)
  – Represent the interface as a closed layer of interface cells
• Extension for adaptive grids:
  – Coarse grids for large fluid volumes
  – Adapt to the movement of the surface
Details in, e.g.: Lattice Boltzmann Model for Free Surface Flow for Modeling Foaming; C. Körner, M. Thies, T. Hofmann, N. Thürey and U. Rüde; J. Stat. Phys. 121, 2005
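A minimal sketch of the mass transfer between two interface cells in the free-surface scheme of the cited paper: the net transported distribution function is weighted by the average fill fraction of the two cells. All names here are illustrative, not from the slides:

    // Mass exchanged between an interface cell and one neighbor along a
    // lattice direction: the net distribution-function transfer, weighted
    // by the mean fill fraction of the two cells.
    double massExchange(double fIn,          // f streamed in from the neighbor
                        double fOut,         // f streamed out towards it
                        double fillHere,     // fill fraction of this cell
                        double fillNeighbor) // fill fraction of the neighbor
    {
        return (fIn - fOut) * 0.5 * (fillHere + fillNeighbor);
    }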
6
Free Surface Flow Example
7
Example with Moving Objects
8
Overview
• Introduction
• OpenMP Parallelization
• MPI Parallelization
• Validation Experiment
• Conclusions
9
OpenMP Parallelization
• OpenMP for shared memory architectures
• Partition along y-axis, synchronize layers
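A minimal sketch of what this y-axis partitioning could look like with OpenMP; the loop structure and updateCell are placeholder assumptions, not the authors' code:

    #include <omp.h>

    // Stream + collide for one cell (placeholder body).
    void updateCell(int x, int y, int z) { /* ... */ }

    void sweep(int nx, int ny, int nz)
    {
        // Static scheduling gives each thread a contiguous block of
        // y-layers; the implicit barrier at the end of the parallel loop
        // synchronizes the layers before the next time step.
        #pragma omp parallel for schedule(static)
        for (int y = 0; y < ny; ++y)
            for (int z = 0; z < nz; ++z)
                for (int x = 0; x < nx; ++x)
                    updateCell(x, y, z);
    }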
10
OpenMP and Grid Compression
• Problem: data dependencies between cell updates
• Use a boundary layer: write offset of (0,0,+2) instead of (+1,+1,+1) (see the sketch below)
• Slightly increased memory requirements
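A hedged illustration of the two write offsets: with grid compression, each updated cell is written at a shifted position so the single grid can be updated in place. That the diagonal offset couples neighboring y-blocks, while the z-only offset keeps the OpenMP y-partition independent, is our reading of the slide:

    struct Index { int x, y, z; };

    // Diagonal offset: writes touch y+1, i.e. the next thread's y-block,
    // which creates update dependencies across the OpenMP partition.
    Index writeTargetDiagonal(Index c) { return { c.x + 1, c.y + 1, c.z + 1 }; }

    // z-only offset: writes stay inside the same y-layer; costs two extra
    // z-layers of memory (the boundary layer).
    Index writeTargetZOnly(Index c)    { return { c.x, c.y, c.z + 2 }; }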
11
OpenMP Performance
• Measurements on 2- and 4-way Opterons:
12
Overview
• Introduction
• OpenMP Parallelization
• MPI Parallelization
• Validation Experiment
• Conclusions
13
MPI Parallelization
• MPI for distributed memory architectures
• Partition along x-axis
• Transfer boundary layers over the network (sketched below)
• No coarsening of the boundary layer allowed
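A minimal sketch of the boundary-layer exchange between x-neighbors; buffer layout, tags, and rank handling are assumptions, not taken from the slides:

    #include <mpi.h>

    // Exchange one boundary layer with each x-neighbor. Passing
    // MPI_PROC_NULL for a missing neighbor turns the call into a no-op.
    void exchangeBoundary(double* sendLeft,  double* recvLeft,
                          double* sendRight, double* recvRight,
                          int slabSize, int leftRank, int rightRank)
    {
        MPI_Sendrecv(sendLeft,  slabSize, MPI_DOUBLE, leftRank,  0,
                     recvRight, slabSize, MPI_DOUBLE, rightRank, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(sendRight, slabSize, MPI_DOUBLE, rightRank, 1,
                     recvLeft,  slabSize, MPI_DOUBLE, leftRank,  1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }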
14
MPI Performance
• Measurements on 4-way Opterons with InfiniBand interconnect:
15
MPI Performance with Adaptive Grids
• Same Opteron cluster, with and without adaptive coarsening:
16
Example Simulation
• Resolution: 880×880×336 (≈260M cells), 6.5M active on average
17
Overview
• Introduction
• OpenMP Parallelization
• MPI Parallelization
• Validation Experiment
• Conclusions
18
Numerical Experiment: Single Rising Bubble
• Implementation using MPI only
• Validation for metal foams or, e.g., bubble reactors
• Comparison to a 2D level-set volume-of-fluid method
• Parameter varied in the simulations: surface tension
19
Numerical Experiment: Single Rising Bubble
20
Parallel MPI Performance
21
Overview
• Introduction
• OpenMP Parallelization
• MPI Parallelization
• Validation Experiment
• Conclusions
22
Conclusions
• High performance by combining OpenMP and MPI
• Grid compression requires modifications
• Adaptive coarsening is problematic
23
24
• Unused slides:
25
Cell LBM Simulations
• Goal: …
• Available Cell systems:
  – Blades
  – PlayStation 3
26
Cell Architecture
27
Cell Performance Measurements
• Implementation issues:
  – do as much work as possible on the SPUs
  – SIMD vectorization, also of the bounce-back step
  – memory layout must support alignment restrictions for DMAs and SIMD (see the sketch after the table)
MLSUPS        1 node   full blade   PS3
1 SPU/CPU         40           64    41
2 SPUs/CPU        78          111    78
3 SPUs/CPU        97          143    85
4 SPUs/CPU        98          164    85
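A hedged sketch of the alignment requirement mentioned above: on Cell, DMA transfers must be at least 16-byte aligned, and SIMD loads want the same alignment; the 128-byte value below (which also matches full cache lines) is an assumption, not from the slides:

    #include <cstdlib>

    // Allocate the distribution functions of 'cells' D3Q19 cells with
    // 128-byte alignment so DMA and SIMD constraints are satisfied.
    float* allocateDistributions(std::size_t cells)
    {
        void* p = nullptr;
        if (posix_memalign(&p, 128, cells * 19 * sizeof(float)) != 0)
            return nullptr;
        return static_cast<float*>(p);
    }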
28
Performance
(Chart: free surface LBM code vs. standard LBM code)
• Performance is lousy on a single node! Conditionals per cell update: 2.9 for the standard LBM, 51 for the free surface LBM
• Pentium 4: almost no degradation (~10%)
• SR 8000: enormous degradation (pseudo-vector processing, predictable jumps)
29
Numerical Experiment: Single Rising Bubble (UNUSED)