
A GPU Accelerated Explicit Finite-volume Euler Equation Solver with Ghost-cell Approach

F.-A. Kuo (1,2), M.R. Smith (3), and J.-S. Wu (1,*)
(1) Department of Mechanical Engineering, National Chiao Tung University, Hsinchu, Taiwan
(2) National Center for High-Performance Computing, NARL, Hsinchu, Taiwan
(3) Department of Mechanical Engineering, National Cheng Kung University, Tainan, Taiwan
*E-mail: [email protected]

2013 IWCSE, Taipei, Taiwan, October 14-17, 2013
Session: Supercomputer/GPU and Algorithms (GPU-2)

Outline
- Background & Motivation
- Objectives
- Split HLL (SHLL) Scheme
- Cubic-Spline Immersed Boundary Method (IBM)
- Results & Discussion
  - Parallel Performance
  - Demonstrations
- Conclusion and Future Work

Background & Motivation

Parallel CFD
- Computational fluid dynamics (CFD) has played an important role in accelerating the progress of aerospace/space and other technologies.
- For several challenging 3D flow problems, parallel computing becomes necessary to greatly shorten the otherwise very lengthy computational time.
- Over the past two decades, parallel CFD has evolved from SIMD-type vectorized processing to SPMD-type distributed-memory processing, mainly because of the latter's much lower hardware cost and easier programming.

SIMD vs. SPMD
- SIMD (Single Instruction, Multiple Data) is a class of parallel computers that performs the same operation on multiple data points simultaneously at the instruction level, e.g., SSE/AVX instructions on CPUs and GPU computation such as CUDA.
- SPMD (Single Program, Multiple Data) is a higher-level abstraction in which copies of one program run across multiple processors and operate on different subsets of the data, e.g., message-passing programming (MPI) on distributed-memory architectures.
- SPMD is a subcategory of MIMD and the most common style of parallel programming [1]; it is also a prerequisite for research concepts such as active messages and distributed shared memory.

MPI vs. CUDA
- Most well-known parallel CFD codes adopt SPMD parallelism using MPI, e.g., Fluent (Ansys) and CFL3D (NASA), to name a few.
- Recently, because of the potentially very high performance-to-cost (C/P) ratio of graphics processing units (GPUs), GPU parallelization of CFD codes based on CUDA (developed by Nvidia) has become an active research area.
- However, the numerical scheme may need to be redesigned to take full advantage of the GPU architecture.

Split HLL Scheme on GPUs
- The Split Harten-Lax-van Leer (SHLL) scheme (Kuo et al., 2011):
  - a highly local numerical scheme, modified from the original HLL scheme
  - formulated on a Cartesian grid
  - ~60x speedup (Nvidia Tesla C1060 GPU vs. a single thread of an Intel Xeon X5472 CPU) with an explicit implementation
- However, it is difficult to treat objects with complex geometry accurately, especially for high-speed gas flow; an example is given on the next slide.
- Thus, retaining the easy implementation of a Cartesian grid on GPUs while improving the treatment of objects with complex geometry becomes important for further extending the applicability of the SHLL scheme in CFD simulations.

Staircase-like vs. IBM
- Spurious waves are often generated when a staircase-like solid surface is used for high-speed gas flows.

(Figure: moving shock over a body, staircase-like boundary vs. IBM, with the shock direction indicated.)

Immersed Boundary Method
- Immersed boundary method (IBM) (Peskin, 1972; Mittal & Iaccarino, 2005):
  - easy treatment of objects with complex geometry on a Cartesian grid
  - grid computation near the objects becomes automatic or very easy
  - easy treatment of moving objects in the computational domain without remeshing
- The major idea of IBM is simply to enforce the boundary conditions at computational grid points through interpolation among the fluid grid and the boundary conditions at solid boundaries.
- The stencil of the IBM operation is local in general:
  - enables efficient use of the original numerical scheme, e.g., SHLL
  - easy parallel implementation

Objectives

Goals
- To develop and validate an explicit cell-centered finite-volume solver for the Euler equations, based on the SHLL scheme, on a Cartesian grid with the cubic-spline IBM on multiple GPUs
- To study the parallel performance of the code on single and multiple GPUs
- To demonstrate the capability of the code with several applications

Split HLL Scheme

SHLL Scheme - 1

(Figure: SIMD model for 2D flux computation across cells i-1, i, i+1, with +Flux and -Flux contributions.)

- Original HLL scheme.
- Introduce local approximations for the wave speeds.
- The final form (SHLL) is a highly local scheme: the new SR and SL terms are approximated without involving neighbor-cell data (see the flux form sketched below).
- A highly local flux computation scheme: great for GPUs!
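For reference, the standard HLL interface flux that the scheme starts from can be written as follows; the local wave-speed estimates shown after it are only a hedged sketch of the kind of approximation described here (cell-local velocity u_i and sound speed a_i), not necessarily the exact expressions of Kuo et al. (2011):

    F^{HLL} = \frac{S_R F_L - S_L F_R + S_L S_R (U_R - U_L)}{S_R - S_L}

    S_L \approx u_i - a_i, \qquad S_R \approx u_i + a_i

With the wave speeds evaluated per cell, each cell's flux contribution depends only on its own state, which is what makes the scheme attractive for GPUs.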

SHLL Scheme - 2
- Final form (SHLL): the flux computation is now embarrassingly parallel and is perfect for GPU application, almost the same as the vector-addition case (a kernel sketch follows after the figure below).
- More than 60x speedup is possible using a single Tesla C1060 GPU device, compared with a single thread of a high-performance CPU (Intel Xeon X5472).

(Figure: SIMD model for 2D flux computation across cells i-1, i, i+1, with +Flux and -Flux contributions.)
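To illustrate why this maps so naturally onto the GPU, the following is a minimal CUDA sketch of a per-cell split-flux update, written for the 1D Euler equations for brevity (the actual solver is 2D); the struct names, the splitFlux() helper, and the split-flux expressions are assumptions for illustration and are not taken from the SHLL paper.

    // Illustrative only: 1D Euler, one thread per cell, purely local flux splitting.
    #include <cuda_runtime.h>
    #include <math.h>

    struct Cons { double rho, mom, ene; };   // conserved variables per cell
    struct Flux { double f0, f1, f2; };      // flux components

    // Split the physical flux of one cell using only that cell's own state
    // (hypothetical SHLL-style splitting with local wave-speed estimates).
    __device__ void splitFlux(const Cons& U, double gamma, Flux& Fp, Flux& Fm)
    {
        double u = U.mom / U.rho;
        double p = (gamma - 1.0) * (U.ene - 0.5 * U.rho * u * u);
        double a = sqrt(gamma * p / U.rho);
        double SL = u - a, SR = u + a;                        // local estimates
        double F[3] = { U.mom, U.mom * u + p, u * (U.ene + p) };
        double C[3] = { U.rho, U.mom, U.ene };
        double fp[3], fm[3];
        for (int k = 0; k < 3; ++k) {
            fp[k] = (SR * F[k] - SL * SR * C[k]) / (SR - SL); // right-going part
            fm[k] = (SL * SR * C[k] - SL * F[k]) / (SR - SL); // left-going part
        }
        Fp.f0 = fp[0]; Fp.f1 = fp[1]; Fp.f2 = fp[2];
        Fm.f0 = fm[0]; Fm.f1 = fm[1]; Fm.f2 = fm[2];
    }

    // One explicit update: each interior cell gathers only its own and its two
    // neighbors' split fluxes, so there is no other coupling between threads.
    __global__ void shllUpdate(const Cons* Uold, Cons* Unew, int n,
                               double dt_over_dx, double gamma)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < 1 || i >= n - 1) return;                      // skip boundary cells

        Flux pL, mL, pC, mC, pR, mR;
        splitFlux(Uold[i - 1], gamma, pL, mL);
        splitFlux(Uold[i],     gamma, pC, mC);
        splitFlux(Uold[i + 1], gamma, pR, mR);

        // interface fluxes: F_{i-1/2} = F+(i-1) + F-(i), F_{i+1/2} = F+(i) + F-(i+1)
        double dF0 = (pC.f0 + mR.f0) - (pL.f0 + mC.f0);
        double dF1 = (pC.f1 + mR.f1) - (pL.f1 + mC.f1);
        double dF2 = (pC.f2 + mR.f2) - (pL.f2 + mC.f2);

        Unew[i].rho = Uold[i].rho - dt_over_dx * dF0;
        Unew[i].mom = Uold[i].mom - dt_over_dx * dF1;
        Unew[i].ene = Uold[i].ene - dt_over_dx * dF2;
    }

Because the kernel reads only old-state data and writes only its own cell, it needs no synchronization, which is why the update behaves almost like vector addition on the GPU.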

Cubic-Spline IBM

Two Critical Issues of IBM
- How to approximate solid boundaries?
  - Local cubic splines reconstruct solid boundaries with far fewer points.
  - Easier calculation of the surface normal/tangent.
- How to apply IBM in a cell-centered FVM framework?
  - Ghost-cell approach:
    - Obtain ghost-cell properties by interpolation of data among neighboring fluid cells.
    - Enforce the BCs at solid boundaries on the ghost cells through data mapping from image points.

Cell Identification
- Define a cubic-spline function for each segment of boundary data to best fit the solid boundary geometry.
- Identify all the solid cells, fluid cells, and ghost cells (a device-side sketch of this tagging step follows below).
- Locate the image points corresponding to the ghost cells.
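A hedged sketch of how such a classification might be done on the device; insideBody() stands in for whatever point-in-solid test the spline boundary provides (a circle is used here as a placeholder), and the ghost-cell rule (a solid cell with at least one fluid neighbor) is a common convention rather than something stated on the slide.

    // Illustrative cell tagging on a 2D Cartesian grid (not the authors' code).
    enum CellType { FLUID = 0, SOLID = 1, GHOST = 2 };

    // Hypothetical point-in-solid test; a circle stands in for the spline boundary.
    __device__ bool insideBody(double x, double y)
    {
        double dxc = x - 0.5, dyc = y - 0.5;
        return dxc * dxc + dyc * dyc < 0.04;   // radius 0.2, centered at (0.5, 0.5)
    }

    __global__ void classifyCells(int* type, int nx, int ny, double dx, double dy)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i >= nx || j >= ny) return;

        double xc = (i + 0.5) * dx;            // cell-center coordinates
        double yc = (j + 0.5) * dy;
        type[j * nx + i] = insideBody(xc, yc) ? SOLID : FLUID;
    }

    // Second pass: a solid cell with at least one fluid neighbor becomes a ghost cell.
    // In-place tagging is safe here because FLUID values are never modified.
    __global__ void markGhostCells(int* type, int nx, int ny)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        int j = blockIdx.y * blockDim.y + threadIdx.y;
        if (i < 1 || j < 1 || i >= nx - 1 || j >= ny - 1) return;

        int c = j * nx + i;
        if (type[c] != SOLID) return;
        if (type[c - 1] == FLUID || type[c + 1] == FLUID ||
            type[c - nx] == FLUID || type[c + nx] == FLUID)
            type[c] = GHOST;
    }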

(Figure: cell identification near the solid boundary curve: solid cells, fluid cells, and ghost cells.)

Cubic-Spline Reconstruction (Solid Boundary)
- The cubic-spline method provides several advantages (see the spline form below):
  - a high-order curve fit of the solid boundary
  - the ghost cells are found easily
  - the normal vector to the body surface is calculated directly
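As a reminder of the standard form implied here (the parameter s and the coefficient names are notational assumptions, not taken from the slides), each boundary segment j is represented by a parametric cubic,

    x(s) = a^{x}_j + b^{x}_j (s - s_j) + c^{x}_j (s - s_j)^2 + d^{x}_j (s - s_j)^3, \quad s_j \le s \le s_{j+1}
    y(s) = a^{y}_j + b^{y}_j (s - s_j) + c^{y}_j (s - s_j)^2 + d^{y}_j (s - s_j)^3,

with C^2 continuity enforced at the knots. The unit tangent t = (x'(s), y'(s)) / |(x'(s), y'(s))| and the unit normal n (t rotated by 90 degrees) then follow directly by differentiating the spline, which is the easier normal/tangent calculation mentioned above.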

BCs of the Euler Equations
- The boundary condition is written in an approximated form using the unit normal n of the body surface and the tangential unit vector t.

IBM Procedures
- Approximate the properties of the image points using bilinear interpolation among the neighboring fluid cells, then map them onto the ghost cells (a sketch of the wall condition follows below).
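A hedged sketch of the usual ghost-cell enforcement of an inviscid (slip) wall, which is what these steps appear to describe; the exact relations used in this work may differ:

    \mathbf{u}_{ghost} = \mathbf{u}_{image} - 2\,(\mathbf{u}_{image} \cdot \mathbf{n})\,\mathbf{n}
    p_{ghost} = p_{image}, \qquad \rho_{ghost} = \rho_{image}

Reflecting the normal velocity component makes the normal velocity vanish at the wall lying midway between the ghost cell and its image point, while the tangential velocity and the thermodynamic state are extrapolated with zero normal gradient, as is appropriate for the inviscid Euler equations.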

(Figure: interpolation stencil showing a ghost point, its image point, and the neighboring fluid and solid cells used for the bilinear interpolation.)
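A minimal CUDA-style sketch of the bilinear interpolation step; the array layout, the precomputed indices ii/jj, and the function name are hypothetical and only illustrate the local four-point stencil.

    // Illustrative bilinear interpolation of a cell-centered field phi at an image
    // point (xp, yp); (ii, jj) is the lower-left of the four cell centers that
    // bracket the point (assumed to be found beforehand).
    __device__ double bilinear(const double* phi, int nx,
                               int ii, int jj, double dx, double dy,
                               double xp, double yp)
    {
        // local coordinates of the image point, measured from the center of cell (ii, jj)
        double tx = (xp - (ii + 0.5) * dx) / dx;   // 0 <= tx <= 1
        double ty = (yp - (jj + 0.5) * dy) / dy;   // 0 <= ty <= 1

        double p00 = phi[ jj      * nx + ii    ];
        double p10 = phi[ jj      * nx + ii + 1];
        double p01 = phi[(jj + 1) * nx + ii    ];
        double p11 = phi[(jj + 1) * nx + ii + 1];

        return (1.0 - tx) * (1.0 - ty) * p00 + tx * (1.0 - ty) * p10
             + (1.0 - tx) * ty * p01         + tx * ty * p11;
    }

In the real solver the four-point stencil must be adjusted when one of the neighbors is a solid cell; that logic is omitted here.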

SHLL/IBM Scheme on GPU

Nearly All-Device Computation
(Flowchart: Start on the host -> set GPU device ID and end time -> initialize -> on the device, repeat [flux calculation -> state calculation -> IBM -> CFL calculation -> flowtime += dt] until flowtime > T -> output the result on the host.)
- The first step initializes the hardware and the simulation conditions on the GPU and CPU.
- Each time step then performs the flux, state, IBM, and CFL calculations on the device; only the time-step information is returned to the host. Once the flow time exceeds the end time, the result is output (a host-side sketch follows below).
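A hedged host-side sketch of the loop the flowchart describes; the kernel names mirror the flowchart boxes but are hypothetical stand-ins assumed to be defined elsewhere, and the CFL step is shown as a device reduction whose result (the stable time step) is the only value copied back each iteration.

    // Illustrative time-marching driver; not the authors' code.
    #include <cuda_runtime.h>

    // Hypothetical kernels operating entirely on device-resident arrays.
    __global__ void fluxKernel (double* state, double* flux, int n);
    __global__ void stateKernel(double* state, double* flux, int n);
    __global__ void ibmKernel  (double* state, int n);               // ghost cells
    __global__ void cflKernel  (const double* state, double* dtBlk, int n);
    double minOnHost(const double* d_dtBlk, int nBlocks);            // hypothetical reduction finish

    void runSolver(double endTime, int n, dim3 grid, dim3 block,
                   double* d_state, double* d_flux, double* d_dtBlk)
    {
        cudaSetDevice(0);                    // "Set GPU device ID"
        double flowtime = 0.0;

        while (flowtime < endTime) {         // the host only steers the loop
            fluxKernel <<<grid, block>>>(d_state, d_flux, n);
            stateKernel<<<grid, block>>>(d_state, d_flux, n);
            ibmKernel  <<<grid, block>>>(d_state, n);
            cflKernel  <<<grid, block>>>(d_state, d_dtBlk, n);

            // only the time-step information returns to the host each step
            flowtime += minOnHost(d_dtBlk, grid.x);
        }
        // copy the final state back to the host and write the output here
    }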

Results & Discussion (Parallel Performance)

Parallel Performance - 1
- Test case: also named Schardin's problem; domain L = 1, H = 1, with the moving shock initially at x = 0.2 at t = 0.
- Test conditions:
  - moving shock with Mach 1.5
  - resolution: 2000x2000 cells
  - CFLmax = 0.2
  - physical time: 0.35 s, i.e., 9843 time steps using one GPU
- The problem is considered in order to demonstrate the validity of the limiting procedure for complex moving-shock capturing and shock interaction.

Parallel Performance - 2
- Resolution: 2000x2000 cells
- GPU cluster:
  - GPU: GeForce GTX 590 (2x 512 cores, 1.2 GHz, 3 GB GDDR5)
  - CPU: Intel Xeon X5472
- Overhead with IBM: only ~3%
- Speedup:
  - GPU/CPU: ~60x
  - GPU/GPU: 1.9 with 2 GPUs
  - GPU/GPU: 3.6 with 4 GPUs

(Figure: computational time in seconds and speedup vs. number of GPUs.)

Results & Discussion (Demonstrations)


Shock over a Finite Wedge - 1
- In the case of 400x400 cells without IBM, the staircase solid boundary generates spurious waves, which destroy the accuracy of the surface properties.
- By comparison, the case with IBM shows a clear improvement in the surface properties.
(Figure: results with IBM vs. without IBM.)

Shock over a Finite Wedge - 2
- Density contour comparison at t = 0.35 s: with IBM vs. without IBM.
- All important physical phenomena are well captured by the solver with IBM, without spurious wave generation.

Transonic Flow past a NACA Airfoil
(Figure: pressure contours for the staircase boundary without IBM and for the IBM result.)
- In the left case, spurious waves appear near the solid boundary; in the right case, the boundary is modified by using the IBM.

Transonic Flow past a NACA Airfoil

(Figure: distribution of pressure around the surface of the airfoil, upper and lower surfaces: new approach vs. the ghost-cell method of J. Liu et al., 2009.)
- The two results are very close: the left result is obtained with the cubic-spline IBM and the right result is from Liu et al. (2009). They are compared further with other results on the next slides.

Transonic Flow past a NACA Airfoil

- Top-side shock wave comparison: new approach vs. Furmánek* (2008).
- Our result is very close to the results obtained with other CFD methods.

Transonic Flow past a NACA Airfoil
- Bottom-side shock wave comparison: new approach vs. Furmánek* (2008).

* Petr Furmánek, Numerical Solution of Steady and Unsteady Compressible Flow, Czech Technical University in Prague, 2008.

Conclusion & Future Work

Summary
- A cell-centered 2D finite-volume solver for the inviscid Euler equations, which can easily treat objects with complex geometry on a Cartesian grid by using the cubic-spline IBM on multiple GPUs, has been completed and validated.
- The addition of the cubic-spline IBM increases the computational time by only about 3%, which is negligible.
- The GPU/CPU speedup generally exceeds 60x on a single GPU (Nvidia Tesla C1060) compared with a single thread of an Intel Xeon X5472 CPU.
- The multi-GPU speedup reaches 3.6 with 4 GPUs (GeForce) for a simulation with 2000x2000 cells.

Future Work
- Extend the Cartesian grid to an adaptive mesh.
- Simulate moving-boundary problems and real-life problems with this immersed boundary method.
- Replace the SHLL solver with a true-direction finite-volume solver, such as QDS.

Thanks for your patience. Questions?