Slide 1
A GPU Accelerated Explicit Finite-volume Euler Equation Solver with Ghost-cell Approach
F.-A. Kuo 1,2, M.R. Smith 3, and J.-S. Wu 1*
1 Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan
2 National Center for High-Performance Computing, NARL, Hsinchu, Taiwan
3 Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan
*E-mail: [email protected]
2013 IWCSE, Taipei, Taiwan, October 14-17, 2013
Session: Supercomputer/GPU and Algorithms (GPU-2)
Slide 4
Parallel CFD
Computational fluid dynamics (CFD) has played an important role in accelerating the progress of aerospace/space and other technologies. For several challenging 3D flow problems, parallel computing of CFD becomes necessary to greatly shorten the very lengthy computational time. Over the past two decades, parallel computing of CFD has evolved from SIMD-type vectorized processing to SPMD-type distributed-memory processing, mainly because the latter offers much lower hardware cost and easier programming.
Slide 5
SIMD vs. SPMD
SIMD (Single Instruction, Multiple Data) is a class of parallel computing that performs the same operation on multiple data points simultaneously at the instruction level, e.g., SSE/AVX instructions on CPUs and GPU computation such as CUDA. SPMD (Single Program, Multiple Data) is a higher-level abstraction in which copies of a program run across multiple processors and operate on different subsets of the data, e.g., message-passing programming (MPI) on distributed-memory computer architectures. A minimal sketch contrasting the two models is given below.
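
As a concrete illustration of the SIMD model (not part of the original slides), the CUDA vector-add kernel below applies the same operation to a different array element in every thread; an SPMD/MPI code would instead run whole copies of the program on different partitions of the array.

#include <cstdio>
#include <cuda_runtime.h>

// Every thread executes the same instruction stream on a different element.
__global__ void vecAdd(const float* a, const float* b, float* c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // same operation, different data point
}

int main()
{
    const int n = 1 << 20;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();
    printf("c[0] = %f\n", c[0]);     // expected: 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}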
Slide 6
MPI vs. CUDA
Most well-known parallel CFD codes adopt SPMD parallelism using MPI, e.g., Fluent (Ansys) and CFL3D (NASA), to name a few. Recently, because of the potentially very high cost/performance (C/P) ratio of graphics processing units (GPUs), parallelization of CFD codes on GPUs has become an active research area, based on CUDA, developed by Nvidia. However, a redesign of the numerical scheme may be necessary to take full advantage of the GPU architecture.
Slide 7
Split HLL Scheme on GPUs
The split Harten-Lax-van Leer (SHLL) scheme (Kuo et al., 2011) is a highly local numerical scheme, modified from the original HLL scheme, on a Cartesian grid. Its explicit implementation achieves roughly 60 times speedup (Nvidia C1060 GPU vs. Intel X5472 Xeon CPU). However, it is difficult to treat objects with complex geometry accurately on a Cartesian grid, especially for high-speed gas flow; one example is given on the next slide. Thus, how to retain the easy implementation of a Cartesian grid on GPUs while improving the capability of treating objects with complex geometry becomes important in further extending the applicability of the SHLL scheme in CFD simulations.
Slide 8
Staircase-like vs. IBM
Spurious waves are often generated by a staircase-like representation of the solid surface in high-speed gas flows. (Figure: shock direction; staircase-like boundary vs. IBM.)
Slide 9
Immersed Boundary Method
The immersed boundary method (IBM) (Peskin, 1972; Mittal & Iaccarino, 2005) offers easy treatment of objects with complex geometry on a Cartesian grid: grid computation near the objects becomes automatic or very easy, and moving objects can be treated in the computational domain without remeshing. The major idea of the IBM is simply to enforce the boundary conditions at computational grid points through interpolation among the fluid grid points and the boundary conditions at the solid boundaries. The stencil of the IBM operation is local in general, enabling an efficient use of the original numerical scheme (e.g., SHLL) and an easy parallel implementation.
Slide 10
Objectives
Slide 11
Goals
To develop and validate an explicit cell-centered finite-volume solver for the Euler equations, based on the SHLL scheme, on a Cartesian grid with a cubic-spline IBM on multiple GPUs.
To study the parallel performance of the code on single and multiple GPUs.
To demonstrate the capability of the code with several applications.
Slide 12
Split HLL Scheme
Slide 13
SHLL Scheme - 1
(Figure: SIMD model for 2D flux computation across cells i-1, i, i+1, with +Flux and -Flux contributions.)
Original HLL flux:
F = (S_R F_L - S_L F_R + S_L S_R (U_R - U_L)) / (S_R - S_L)
Introducing local approximations, the new S_R and S_L terms are approximated without involving neighbor-cell data, so the final form (SHLL) is a highly local flux computation scheme: great for GPUs!
Slide 14
SHLL Scheme - 2
Final form (SHLL): the flux computation is perfect for GPU application, almost the same as the vector-addition case. More than 60 times speedup is possible using a single Tesla C1060 GPU device, compared to a single thread of a high-performance CPU (Intel Xeon X5472). (Figure: SIMD model for 2D flux computation across cells i-1, i, i+1, with +Flux and -Flux contributions.)
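
The slides contain no source code, so the following is a hedged sketch of such a split-flux kernel for the 1D Euler equations, assuming per-cell wave-speed approximations S_L ~ u - a and S_R ~ u + a; the struct layout and names are illustrative, not the authors' code. Each cell computes its own F+ and F- contributions from purely local data, so the interface flux F_{i+1/2} = F+(i) + F-(i+1) needs no neighbor data inside the kernel.

#include <cuda_runtime.h>
#include <math.h>

struct State { double rho, mom, E; };   // conserved variables U
struct Flux  { double mass, mom, en; }; // flux vector F

__global__ void shllSplitFlux(const State* U, Flux* Fp, Flux* Fm,
                              int n, double gamma)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Primitive variables from the local cell only.
    double rho = U[i].rho;
    double u   = U[i].mom / rho;
    double p   = (gamma - 1.0) * (U[i].E - 0.5 * rho * u * u);
    double a   = sqrt(gamma * p / rho);   // local sound speed
    double sL  = u - a, sR = u + a;       // local wave-speed approximations

    // Physical flux F(U) for the 1D Euler equations.
    double f0 = rho * u;
    double f1 = rho * u * u + p;
    double f2 = (U[i].E + p) * u;

    // Split contributions: F+ = S_R (F - S_L U) / (S_R - S_L),
    //                      F- = -S_L (F - S_R U) / (S_R - S_L).
    double inv = 1.0 / (sR - sL);         // = 1 / (2a) with local speeds
    Fp[i].mass =  sR * (f0 - sL * U[i].rho) * inv;
    Fp[i].mom  =  sR * (f1 - sL * U[i].mom) * inv;
    Fp[i].en   =  sR * (f2 - sL * U[i].E)   * inv;
    Fm[i].mass = -sL * (f0 - sR * U[i].rho) * inv;
    Fm[i].mom  = -sL * (f1 - sR * U[i].mom) * inv;
    Fm[i].en   = -sL * (f2 - sR * U[i].E)   * inv;
    // A later pass forms the interface flux: F_{i+1/2} = Fp[i] + Fm[i+1].
}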
Slide 15
Cubic-spline IBM
Slide 16
Two Critical Issues of IBM
1. How to approximate solid boundaries? Local cubic splines reconstruct the solid boundaries with far fewer points and make the calculation of surface normals/tangents easier.
2. How to apply the IBM in a cell-centered FVM framework? Ghost-cell approach: obtain ghost-cell properties by interpolating data among neighboring fluid cells, and enforce the BCs at the solid boundaries on the ghost cells through data mapping from image points.
Slide 17
Cell Identification
1. Define a cubic-spline function for each segment of boundary data to best fit the solid boundary geometry.
2. Identify all the solid cells, fluid cells, and ghost cells.
3. Locate the image points corresponding to the ghost cells.
(Figure: solid cells, fluid cells, ghost cells, and the solid boundary curve.)
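
A hedged sketch of the classification step (step 2 above), assuming the spline boundary is available as a signed-distance function with phi < 0 inside the body; here a placeholder circle stands in for the cubic-spline distance, and all names are illustrative rather than the authors' API.

enum CellType { FLUID, SOLID, GHOST };

// Placeholder boundary: a circle of radius r centered at (cx, cy); in the
// real solver this would come from the cubic-spline reconstruction.
__device__ double signedDistance(double x, double y)
{
    const double cx = 0.5, cy = 0.5, r = 0.2;
    return sqrt((x - cx) * (x - cx) + (y - cy) * (y - cy)) - r;
}

__global__ void classifyCells(CellType* type, int nx, int ny,
                              double dx, double dy)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i >= nx || j >= ny) return;
    double x = (i + 0.5) * dx, y = (j + 0.5) * dy;   // cell center
    type[j * nx + i] = (signedDistance(x, y) < 0.0) ? SOLID : FLUID;
}

__global__ void markGhostCells(CellType* type, int nx, int ny)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i < 1 || j < 1 || i >= nx - 1 || j >= ny - 1) return;
    int c = j * nx + i;
    if (type[c] != SOLID) return;
    // A solid cell adjacent to any fluid cell is promoted to a ghost cell.
    if (type[c - 1] == FLUID || type[c + 1] == FLUID ||
        type[c - nx] == FLUID || type[c + nx] == FLUID)
        type[c] = GHOST;
}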
Slide 18
Cubic-Spline Reconstruction (Solid Boundary)
The cubic-spline method provides several advantages:
1. A high-order curve fit of the boundary.
2. Ghost cells are found easily.
3. The vector normal to the body surface is calculated easily.
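
As a hedged sketch of advantage 3, the routine below evaluates a parametric cubic-spline boundary segment and its unit normal (the tangent rotated by 90 degrees). The segment coefficients are assumed precomputed on the host by any standard spline fit; the struct and names are illustrative.

#include <math.h>

struct CubicSeg { double ax, bx, cx, dx, ay, by, cy, dy; };

__host__ __device__ void splineEval(const CubicSeg& s, double t,
                                    double& x, double& y,
                                    double& nxOut, double& nyOut)
{
    // Position: (x(t), y(t)), each coordinate a cubic in t on [0, 1].
    x = s.ax + t * (s.bx + t * (s.cx + t * s.dx));
    y = s.ay + t * (s.by + t * (s.cy + t * s.dy));
    // Tangent: first derivative of the cubics.
    double tx = s.bx + t * (2.0 * s.cx + 3.0 * t * s.dx);
    double ty = s.by + t * (2.0 * s.cy + 3.0 * t * s.dy);
    double len = sqrt(tx * tx + ty * ty);
    // Unit normal: tangent rotated 90 degrees (the sign convention depends
    // on the orientation of the boundary points).
    nxOut =  ty / len;
    nyOut = -tx / len;
}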
Slide 19
BCs of the Euler Eqns.
At the solid wall, the boundary conditions of the inviscid Euler equations are enforced using the unit normal of the body surface, together with an approximated form for the remaining flow variables (the slide's equations did not survive extraction; a reconstruction follows).
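
Based on the standard ghost-cell treatment for inviscid flow (Mittal & Iaccarino, 2005), the enforced conditions and the mirroring from image point I to ghost cell G are most likely the following; this is a reconstruction, not a verbatim copy of the slide:

\begin{gather}
\mathbf{V}\cdot\hat{\mathbf{n}} = 0, \qquad
\frac{\partial p}{\partial n} \approx 0, \qquad
\frac{\partial \rho}{\partial n} \approx 0 \\
\rho_G = \rho_I, \qquad p_G = p_I, \qquad
\mathbf{V}_G = \mathbf{V}_I - 2\,(\mathbf{V}_I\cdot\hat{\mathbf{n}})\,\hat{\mathbf{n}}
\end{gather}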
Slide 20
IBM Procedures
Approximate the properties of the image points using bilinear interpolation among the neighboring fluid cells. (Figure: image point, ghost point, and the interpolation stencil of fluid and solid cells.)
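
A hedged sketch of this interpolation step: the image point lies inside a 2x2 stencil of cell centers, and its value is the bilinear blend of those centers. Index math assumes cell centers at ((i+0.5)dx, (j+0.5)dy); in the full method, stencil cells that are solid would be replaced using the boundary conditions, which is omitted here. All names are illustrative.

#include <math.h>

__host__ __device__ double bilinear(const double* q, int nx,
                                    double xi, double yi,
                                    double dx, double dy)
{
    // Lower-left cell center of the 2x2 interpolation stencil.
    int i = (int)floor(xi / dx - 0.5);
    int j = (int)floor(yi / dy - 0.5);
    double fx = xi / dx - 0.5 - i;   // fractional position within the stencil
    double fy = yi / dy - 0.5 - j;
    return (1 - fx) * (1 - fy) * q[j * nx + i]
         + fx       * (1 - fy) * q[j * nx + i + 1]
         + (1 - fx) * fy       * q[(j + 1) * nx + i]
         + fx       * fy       * q[(j + 1) * nx + i + 1];
}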
Slide 21
SHLL/IBM Scheme on GPU
Slide 22
Nearly All-Device Computation
(Flowchart) Host: start; set the GPU device ID and the target flow time T. Device: initialize, then loop over {flux calculation, IBM, state calculation, CFL calculation, flowtime += dt} while T > flowtime. Host: output the result.
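
A hedged sketch of this driver loop: the host only selects the device, launches kernels, and copies the final result back, while all per-step work stays on the GPU. The kernels are empty placeholders standing in for the earlier sketches; names and signatures are illustrative, not the authors' code.

#include <cuda_runtime.h>

__global__ void initialize(double* U) { /* set initial condition */ }
__global__ void fluxCalc(double* U)   { /* SHLL split fluxes */ }
__global__ void applyIBM(double* U)   { /* ghost-cell IBM update */ }
__global__ void stateCalc(double* U)  { /* conservative update */ }
double cflTimeStep(const double* U)   { return 1.0e-4; /* device reduction in practice */ }

void run(double* dU, double* hU, size_t bytes, double T, int deviceId)
{
    cudaSetDevice(deviceId);                       // host: set GPU device ID
    dim3 grid(128), block(256);
    initialize<<<grid, block>>>(dU);
    double flowtime = 0.0;
    while (flowtime < T) {                         // flowchart test: T > flowtime
        fluxCalc<<<grid, block>>>(dU);             // flux calculation
        applyIBM<<<grid, block>>>(dU);             // IBM
        stateCalc<<<grid, block>>>(dU);            // state calculation
        flowtime += cflTimeStep(dU);               // CFL calculation -> dt
    }
    cudaMemcpy(hU, dU, bytes, cudaMemcpyDeviceToHost);  // output the result
}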
Slide 23
Results & Discussion (Parallel Performance)
Slide 24
Parallel Performance - 1
Also known as Schardin's problem. Test conditions: moving shock with Mach 1.5; resolution 2000x2000 cells; CFL_max = 0.2; physical time 0.35 s, taking 9843 time-steps using one GPU. (Figure: domain with L = 1, H = 1; moving shock at x = 0.2 at t = 0.)
Slide 27
Shock over a finite wedge - 1
In the case of 400x400 cells without IBM, the staircase solid boundary generates spurious waves, which destroy the accuracy of the surface properties. By comparison, the case with IBM shows greatly improved surface properties. (Figure: results w/ IBM vs. w/o IBM.)
Slide 28
Shock over a finite wedge - 2
Density contour comparison at t = 0.35 s, with IBM vs. without IBM: all important physical phenomena are well captured by the solver with IBM, without spurious wave generation.
Slide 29
Transonic Flow past a NACA Airfoil
Pressure contours: staircase boundary without IBM (left) vs. IBM result (right). In the left case spurious waves appear near the solid boundary; in the right case the boundary is treated with the IBM and the spurious waves disappear.
Slide 30
Transonic Flow past a NACA Airfoil
Distribution of pressure around the surface of the airfoil (upper and lower surfaces): ghost-cell method (J. Liu et al., 2009) vs. the new approach. The two results are very close; the right result is from Liu et al. (2009) and the left result is from the present cubic-spline IBM.
Slide 31
Transonic Flow past a NACA Airfoil
Top-side shock wave comparison: new approach vs. Furmánek (2008)*.
*Petr Furmánek, Numerical Solution of Steady and Unsteady Compressible Flow, Czech Technical University in Prague, 2008.
Slide 32
Transonic Flow past a NACA Airfoil
Bottom-side shock wave comparison: new approach vs. Furmánek (2008)*.
*Petr Furmánek, Numerical Solution of Steady and Unsteady Compressible Flow, Czech Technical University in Prague, 2008.
Slide 33
Conclusion & Future Work
Slide 34
Summary
A cell-centered 2D finite-volume solver for the inviscid Euler equations, which can easily treat objects with complex geometry on a Cartesian grid by using the cubic-spline IBM on multiple GPUs, has been completed and validated.
The addition of the cubic-spline IBM increases the computational time by only about 3%, which is negligible.
The GPU/CPU speedup generally exceeds 60 times on a single GPU (Nvidia Tesla C1060) compared to a single thread of an Intel X5472 Xeon CPU.
The multi-GPU speedup reaches 3.6 on 4 GPUs (GeForce) for a simulation with 2000x2000 cells.
Slide 35
Future Work
To extend the Cartesian grid to an adaptive mesh.
To simulate moving-boundary problems and real-life problems with this immersed boundary method.
To replace the SHLL solver with a true-direction finite-volume solver, such as QDS.