Agenda: GPU Acceleration for Applied CFD
- Overview of GPU Progress for CFD
- GPU Acceleration of ANSYS Fluent
- GPU Acceleration of OpenFOAM
GPU Progress Summary for GPU-Parallel CFD
- GPU progress in CFD research continues to expand
  - Growth from particle-based CFD and high-order methods
  - Explicit schemes have generally seen more progress than implicit schemes
- Strong GPU investments by commercial CFD vendors (ISVs)
  - Breakthroughs in GPU-parallel linear solvers and preconditioners
  - GPUs used for 2nd-level parallelism, preserving the costly MPI investment
  - ISV focus on hybrid parallel CFD that utilizes all CPU cores + GPU
- GPU progress for end-user-developed CFD with OpenACC
  - Most benefit to aerospace companies with legacy Fortran
- GPUs behind fast growth in particle-based commercial CFD
  - New ISV developments in lattice Boltzmann (LBM) and SPH
CFD Software Character and GPU Suitability
Discretizations: Structured Grid FV | Unstructured FV | Unstructured FE
- Explicit (usually compressible): numerical operations on an I,J,K stencil, no "solver"
  [Typically flat profiles: GPU strategy of directives (OpenACC)]
- Implicit (usually incompressible): sparse matrix linear algebra with iterative solvers
  [Hot spot ~50% of runtime in a small % of lines of code (LoC): GPU strategy of CUDA and libraries]
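The explicit case can be illustrated with a minimal sketch. It assumes a forward-Euler update of the heat equation on a uniform I,J,K grid with unit spacing and fixed boundary values, standing in for a real compressible flux scheme. Each interior cell is updated from its six neighbors without any linear solve, and no cell depends on another cell's new value. That independence is why such loops map to one GPU thread per cell or to a single OpenACC-annotated loop nest. The function name `explicit_diffusion_step` and the parameter `alpha` (= dt/dx^2) are illustrative, not taken from any code named here.

```python
def explicit_diffusion_step(u, nx, ny, nz, alpha):
    """One explicit (forward-Euler) step of du/dt = laplacian(u) on an
    nx*ny*nz grid stored as a flat list, using the 7-point I,J,K stencil.
    alpha = dt/dx**2 (unit spacing assumed); stability needs alpha <= 1/6.
    Boundary cells are held fixed."""

    def idx(i, j, k):
        # Flat index with I varying fastest.
        return (k * ny + j) * nx + i

    new = list(u)  # copy so boundary values carry over unchanged
    for k in range(1, nz - 1):
        for j in range(1, ny - 1):
            for i in range(1, nx - 1):
                c = idx(i, j, k)
                lap = (u[idx(i + 1, j, k)] + u[idx(i - 1, j, k)]
                       + u[idx(i, j + 1, k)] + u[idx(i, j - 1, k)]
                       + u[idx(i, j, k + 1)] + u[idx(i, j, k - 1)]
                       - 6.0 * u[c])
                # Reads only old values: no dependence between iterations,
                # so the triple loop is fully parallel (one thread per cell).
                new[c] = u[c] + alpha * lap
    return new


# Usage: in a 3x3x3 block only the center cell is interior; it relaxes
# toward its (zero) boundary neighbors.
u0 = [0.0] * 27
u0[13] = 1.0  # center cell (i, j, k) = (1, 1, 1)
u1 = explicit_diffusion_step(u0, 3, 3, 3, alpha=0.1)
print(u1[13])
```

Contrast this with the implicit row of the slide. There, each time step requires solving a coupled sparse system, so most of the work moves into a solver library instead of a loop like this one.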
CFD Speedups for GPU Relative to 8-Core CPU
Discretizations: Structured Grid FV | Unstructured FV | Unstructured FE
- Explicit (usually compressible):
  - Structured grid FV, ~10x: Turbostream, SJTU RANS
  - Unstructured, ~5x: SD++ (Stanford, Jameson), FEFLO (Lohner), Veloxi
- Implicit (usually incompressible): (none listed)
Structured grid explicit is generally the best GPU fit.
Turbostream: CFD for Turbomachinery
Sample Turbostream GPU Simulations
- Typical routine simulation
- Large-scale simulation: ~19x speedup
Source: http://www.turbostream-cfd.com/
Commercial Aircraft Wing Design on GPUs
COMAC (Commercial Aircraft Corporation of China) and SJTU
GPU Application
- SJTU-developed explicit RANS CFD code for aerodynamic evaluation of wing shapes
GPU Benefit
- Tesla C2070: 37x speedup vs. a single core of an Intel Core i7 CPU
- Faster simulations allow more wing design candidates to be evaluated, reducing reliance on costly wind tunnel tests
- Expanding to multi-GPU and full aircraft
Figures: COMAC wing candidate; ONERA M6 wing CFD simulation
CFD Speedups for GPU Relative to 8-Core CPU
Discretizations: Structured Grid FV | Unstructured FV | Unstructured FE
- Explicit (usually compressible):
  - Structured grid FV, ~15x: Turbostream, SJTU RANS
  - Unstructured, ~5x: SD++ (Stanford, Jameson), FEFLO (Lohner), Veloxi
- Implicit (usually incompressible), ~2x: Moldflow, AcuSolve, Moldex3D, ANSYS Fluent, Culises for OpenFOAM, SpeedIT for OpenFOAM, CFD-ACE+, FIRE
Commercial CFD is mostly unstructured implicit.
NVIDIA Strategy for GPU-Accelerated CFD
Strategic Alliances
- Business and technical alliances with key ISVs (ANSYS, CD-adapco, etc.)
- Invest in long-term technical collaboration for ANSYS Fluent acceleration
- Develop key technical collaborations with the CFD research community: TiTech (Aoki), Stanford (Jameson), Oxford (Giles), Wyoming (Mavriplis), others
Software Development
- NVIDIA linear solver toolkit with emphasis on AMG for industry CFD
- Invest in relevant high-order methods (DGM, flux reconstruction, etc.)
Applications Support
- Direct developer support for a range of ISV and customer requests
- Implicit schemes: integration support for libraries and the solver toolkit
- Explicit schemes: stencil libraries, OpenACC support for Fortran
Primary Commercial CAE and GPU Progress
(Green color indicates CUDA-ready during 2013)
ISV | Primary Applications
ANSYS | ANSYS Mechanical; ANSYS Fluent; ANSYS HFSS
DS SIMULIA | Abaqus/Standard; Abaqus/Explicit; Abaqus/CFD
MSC Software | MSC Nastran; Marc; Adams
Altair | RADIOSS; AcuSolve
CD-adapco | STAR-CD; STAR-CCM+
Autodesk | AS Mechanical; Moldflow; AS CFD
ESI Group | PAM-CRASH imp; CFD-ACE+
Siemens | NX Nastran
LSTC | LS-DYNA; LS-DYNA CFD
Mentor | FloEFD; FloTherm
Metacomp | CFD++
Additional Commercial GPU Developments
ISV | Domain | Location | Primary Applications
FluiDyna | CFD | Germany | Culises for OpenFOAM; LBultra
Vratis | CFD | Poland | Speed-IT for OpenFOAM; ARAEL
Prometech | CFD | Japan | Particleworks
Turbostream | CFD | England, UK | Turbostream
IMPETUS | Explicit FEA | Sweden | AFEA
AVL | CFD | Austria | FIRE
CoreTech | CFD (molding) | Taiwan | Moldex3D
Intes | Implicit FEA | Germany | PERMAS
Next Limit | CFD | Spain | XFlow
CPFD | CFD | USA | BARRACUDA
Flow Science | CFD | USA | FLOW-3D
SCSK | Implicit FEA | Japan | ADVENTURECluster
CDH | Implicit FEA | Germany | AMLS; FastFRS
FunctionBay | MB Dynamics | S. Korea | RecurDyn
Cradle Software | CFD | Japan | SC/Tetra; scSTREAM
Status Summary of ISVs and GPU Acceleration
- Every primary ISV has products available on GPUs or under ongoing evaluation
- The 4 largest ISVs all have products based on GPUs, some at 3rd generation: ANSYS, SIMULIA, MSC Software, Altair
- 4 of the top 5 ISV applications are available on GPUs today: ANSYS Fluent, ANSYS Mechanical, Abaqus/Standard, MSC Nastran, . . . (LS-DYNA implicit only)
- Several new ISVs were founded with GPUs as a primary competitive strategy: Prometech, FluiDyna, Vratis, IMPETUS, Turbostream
- Open-source CFD OpenFOAM is available on GPUs today with many options
  - Commercial options: FluiDyna, Vratis
  - Open-source options: Cufflink, Symscape ofgpu, RAS, etc.
Basics of GPU Computing for ISV Software
- ISV software use of GPU acceleration is user-transparent
  - Jobs launch and complete without additional user steps
  - User informs the ISV application (GUI, command) that a GPU exists
- CPU begins and ends the job; GPU manages the heavy computations
  1. ISV job launched on CPU
  2. Solver operations sent to GPU
  3. GPU sends results back to CPU
  4. ISV job completes on CPU
[Figure: schematic of an x86 CPU (with cache and DDR memory) connected through an I/O hub and PCI-Express to a GPU accelerator with its own GDDR memory; labels 1-4 mark the steps above]
Commercial CFD Focus on Sparse Solvers
CFD application software = CPU + GPU
- CPU: read input, matrix set-up; global solution, write output
- GPU: implicit sparse matrix operations, 40%-65% of profile time but a small % of LoC
  - Hand-CUDA parallel
  - GPU libraries, CUBLAS
  - OpenACC directives (investigating OpenACC for more tasks on GPU)
NVIDIA Offers an Accelerated Solver Toolkit
- Toolkit of linear solvers, preconditioners, and other components for large sparse Ax = b
- Available schemes include:
  - AMG: multi-level scheme popular with several commercial CFD codes
  - Jacobi, BiCGStab, FGMRES, MC-DILU, and others
- Use of the NVIDIA linear solver toolkit for industry-ready CFD:
  - ANSYS 14.5 collaboration introduced their AMG-GPU solver in Nov 2012
  - FluiDyna collaboration on the Culises 2.0 AMG solver library for OpenFOAM
  - Other ISVs and customer CFD software undergoing evaluation . . .
- Accelerate state-of-the-art multi-level linear solvers in targeted application domains
  - Primary targets: CFD and reservoir simulation; other domains will follow
- Focus on difficult-to-parallelize algorithms
  - Parallelize both setup and solve phases
  - Difficult problems: parallel graph algorithms, sparse matrix manipulation, parallel smoothers
  - No groups have successfully mapped production-quality algorithms to fine-grained parallel architectures
- Ensure the NVIDIA architecture team understands these applications and is influenced by them
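The scheme list above combines Krylov methods (BiCGStab, FGMRES) with preconditioners and smoothers (Jacobi, MC-DILU, AMG). The following minimal sketch shows how those pieces fit together. It assumes a symmetric positive-definite matrix in CSR storage and uses conjugate gradient with a Jacobi (diagonal) preconditioner in plain Python. It is not the NVIDIA toolkit's API. It only exposes the kernels that such a toolkit must run in parallel on the GPU: sparse matrix-vector products, dot products, and vector updates. Nonsymmetric CFD systems would need BiCGStab or FGMRES instead of CG, and all helper names here are hypothetical.

```python
import math


def spmv(indptr, indices, data, x):
    """y = A x for A stored in CSR form (indptr, indices, data)."""
    y = [0.0] * (len(indptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def jacobi_pcg(indptr, indices, data, b, tol=1e-10, max_iter=500):
    """Preconditioned CG with M = diag(A); A must be symmetric positive
    definite. Returns (x, iterations)."""
    n = len(b)
    diag = [0.0] * n
    for row in range(n):
        for k in range(indptr[row], indptr[row + 1]):
            if indices[k] == row:
                diag[row] = data[k]
    x = [0.0] * n
    r = list(b)                                # r = b - A x with x = 0
    z = [ri / di for ri, di in zip(r, diag)]   # z = M^-1 r (Jacobi step)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        ap = spmv(indptr, indices, data, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


def poisson_1d_csr(n):
    """CSR form of the SPD tridiagonal model matrix tridiag(-1, 2, -1)."""
    indptr, indices, data = [0], [], []
    for i in range(n):
        for j, v in ((i - 1, -1.0), (i, 2.0), (i + 1, -1.0)):
            if 0 <= j < n:
                indices.append(j)
                data.append(v)
        indptr.append(len(indices))
    return indptr, indices, data


# Usage: solve a 50-unknown model problem.
indptr, indices, data = poisson_1d_csr(50)
b = [1.0] * 50
x, iters = jacobi_pcg(indptr, indices, data, b)
print(iters)
```

Every step inside the loop is a data-parallel operation over rows or vector entries. The hard part that the slide mentions, the AMG setup phase, is deliberately absent here.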
GPU Developments for Aircraft CFD (Green color indicates GPU-ready during 2013)
Developer | Location | Software
External aero:
NASA | USA | OVERFLOW
NASA | USA | FUN3D
AFRL | USA | AVUS
ONERA | France | elsA
Stanford/Jameson | USA | SD++
JAXA | Japan | UPACS
ANSYS | USA | ANSYS Fluent 15.0
CD-adapco | USA/UK | STAR-CCM+
Metacomp | USA | CFD++
Internal flows:
ANSYS | USA | ANSYS Fluent 15.0
FluiDyna | Germany | Culises for OpenFOAM 2.2.0
Vratis | Poland | Speed-IT for OpenFOAM 2.2.0
CD-adapco | USA/UK | STAR-CCM+
GPU Developments for Turbine Engine CFD Developer Location Software (Green color indicates CUDA-ready during 2013)
Turbostream England, UK Turbostream 3.0
Oxford / Rolls Royce England, UK OP2 / Hydra
ANSYS USA ANSYS CFD 15.0 (Fluent + CFX)
ANSYS USA ANSYS Fluent 15.0
FluiDyna Germany Culises for OpenFOAM 2.2.0
Vratis Poland Speed-IT for OpenFOAM 2.2.0
Cascade Technologies USA CHARLES
Convergent Science USA Converge CFD
Sandia NL / Oak Ridge NL USA S3D
Naval Research Lab USA JENRE
Aviadvigatel OJSC Russia GHOST CFD
Turbomachinery
Combustor
Nozzle / Noise
GPU Status of Select Automotive CAE Software
Automotive CAE Application | ISV Software | GPU Status
CSM: Durability (Stress) and Fatigue | MSC Nastran | Available today
Road Handling and VPG | Adams (for MBD) | Evaluation
Powertrain Stress Analysis | Abaqus/Standard | Available today
Body NVH | MSC Nastran | Available today
Crashworthiness and Safety | LS-DYNA | Implicit only, beta
CFD: Aerodynamics / Thermal (underhood) | ANSYS Fluent | Available today, beta
IC Engine Combustion | STAR-CCM+ | Evaluation
Aerodynamics / HVAC | OpenFOAM | Available today
Plastic Mold Injection | Moldflow | Available today
Particle-Based Commercial CFD Software Growing
ISV Software | Application | Method | GPU Status
PowerFLOW | Aerodynamics | LBM | Evaluation
LBultra | Aerodynamics | LBM | Available v2.0
XFlow | Aerodynamics | LBM | Evaluation
Project Falcon | Aerodynamics | LBM | Evaluation
Particleworks | Multiphase/FS | MPS (~SPH) | Available v3.5
BARRACUDA | Multiphase/FS | MP-PIC | In development
EDEM | Discrete phase | DEM | In development
ANSYS Fluent-DDPM | Multiphase/FS | DEM | In development
STAR-CCM+ | Multiphase/FS | DEM | Evaluation
AFEA | High impact | SPH | Available v2.0
ESI | High impact | SPH, ALE | In development
LSTC | High impact | SPH, ALE | Evaluation
Altair | High impact | SPH, ALE | Evaluation
TiTech Aoki Lab LBM Solution of External Flows
- Aoki CFD solver using the lattice Boltzmann method (LBM) with Large Eddy Simulation (LES)
- "A Peta-scale LES (Large-Eddy Simulation) for Turbulent Flows Based on Lattice Boltzmann Method", Prof. Dr. Takayuki Aoki: http://registration.gputechconf.com/quicklink/8Is4ClC
- www.sim.gsic.titech.ac.jp
FluiDyna Lattice Boltzmann Solver LBultra
- Spin-off in 2006 from TU Munich
- CFD solver using the lattice Boltzmann method (LBM)
- Demonstrated 25x speedup on a single GPU
- Multi-GPU ready
- Contact FluiDyna for license details
www.fluidyna.de
http://www.fluidyna.com/content/lbultra
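Both LBM solvers above rest on the same collide-and-stream update. The minimal sketch below assumes the textbook D2Q9 lattice, a single-relaxation-time (BGK) collision, and periodic boundaries. It shows why the method suits GPUs: collision touches only one cell's data, and streaming only copies values to nearest neighbors, so both loops parallelize over cells with no global solve. This is the textbook scheme, not Aoki's LES solver or LBultra itself. The relaxation time `tau`, the grid size, and the function names are illustrative.

```python
# D2Q9 lattice: discrete velocities C[q] and weights W[q].
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4.0 / 9.0] + [1.0 / 9.0] * 4 + [1.0 / 36.0] * 4


def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium populations for one cell."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1.0 + 3.0 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def lbm_step(f, nx, ny, tau):
    """One collide-and-stream step; f[q][y][x], periodic boundaries."""
    post = [[[0.0] * nx for _ in range(ny)] for _ in range(9)]
    # Collision: purely local, each cell independent (one GPU thread per cell).
    for y in range(ny):
        for x in range(nx):
            fc = [f[q][y][x] for q in range(9)]
            rho = sum(fc)
            ux = sum(fc[q] * C[q][0] for q in range(9)) / rho
            uy = sum(fc[q] * C[q][1] for q in range(9)) / rho
            feq = equilibrium(rho, ux, uy)
            for q in range(9):
                post[q][y][x] = fc[q] - (fc[q] - feq[q]) / tau
    # Streaming: each population moves one cell along its lattice velocity.
    new = [[[0.0] * nx for _ in range(ny)] for _ in range(9)]
    for q, (cx, cy) in enumerate(C):
        for y in range(ny):
            for x in range(nx):
                new[q][(y + cy) % ny][(x + cx) % nx] = post[q][y][x]
    return new


def total_mass(f):
    return sum(sum(sum(row) for row in plane) for plane in f)


# Usage: start from a non-uniform density at equilibrium and run 10 steps.
nx, ny = 8, 6
f = [[[0.0] * nx for _ in range(ny)] for _ in range(9)]
for y in range(ny):
    for x in range(nx):
        feq = equilibrium(1.0 + 0.01 * ((x + 2 * y) % 5), 0.05, -0.02)
        for q in range(9):
            f[q][y][x] = feq[q]
m0 = total_mass(f)
for _ in range(10):
    f = lbm_step(f, nx, ny, tau=0.8)
print(abs(total_mass(f) - m0))  # mass is conserved up to round-off
```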
Prometech and Particleworks for Particle CFD
- Based on the MPS (Moving Particle Semi-implicit) method developed at the University of Tokyo [Prof. Koshizuka]
- Example: oil flow in an HB gearbox
- Particleworks 3.0 performance: GPU vs. 4-core Core i7
Courtesy of Prometech Software and Particleworks CFD Software
http://www.prometech.co.jp
Agenda: GPU Acceleration for Applied CFD
- Overview of GPU Progress for CFD
- GPU Acceleration of ANSYS Fluent
- GPU Acceleration of OpenFOAM
ANSYS and NVIDIA Technical Collaboration
Release | ANSYS Mechanical | ANSYS Fluent | ANSYS EM
13.0 (Dec 2010) | SMP, single GPU, sparse and PCG/JCG solvers | (none listed) | ANSYS Nexxim
14.0 (Dec 2011) | + Distributed ANSYS; + multi-node support | Radiation heat transfer (beta) | ANSYS Nexxim
14.5 (Nov 2012) | + Multi-GPU support; + hybrid PCG; + Kepler GPU support | + Radiation HT; + GPU AMG solver (beta), single GPU | ANSYS Nexxim
15.0 (Q4 2013) | + CUDA 5 Kepler tuning | + Multi-GPU AMG solver; + CUDA 5 Kepler tuning | ANSYS Nexxim; ANSYS HFSS (Transient)
ANSYS Fluent 14.5 and Radiation HT on GPU
Radiation HT applications:
- Underhood cooling
- Cabin comfort HVAC
- Furnace simulations
- Solar loads on buildings
- Combustor in turbine
- Electronics passive cooling
VIEWFAC utility: use on CPUs, GPUs, or both; ~2x speedup
RAY TRACING utility: uses the OptiX library from NVIDIA, up to ~15x speedup (GPU only)
ANSYS Fluent Use of NVIDIA Solver Toolkit
- ANSYS Fluent 15.0 will offer a GPU-based AMG solver (Nov/Dec 2013)
  - Developed with support for MPI across multiple nodes and multiple GPUs
- Solver collaboration on pressure-based coupled Navier-Stokes; others to follow
- Early results published at Parallel CFD 2013, 20-24 May, Changsha, CN: "GPU-Accelerated Algebraic Multigrid for Applied CFD"
ANSYS Fluent CPU Profile for Coupled Solver
Non-linear iterations:
1. Assemble linear system of equations (~35% of runtime)
2. Solve linear system of equations Ax = b (~65% of runtime; accelerate this first)
3. Converged? No: return to step 1. Yes: stop.
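The 35/65 split above sets a hard limit on what solver-only acceleration can deliver, a consequence of Amdahl's law. The minimal sketch below assumes that the ~65% solve share applies to the whole job, that assembly stays on the CPU at unchanged speed, and that data-transfer overhead can be ignored. The solver speedups in the loop are hypothetical inputs, not measured Fluent results.

```python
def amdahl_speedup(accel_fraction, accel_factor):
    """Overall speedup when a fraction of the CPU-only runtime is made
    accel_factor times faster and the rest is unchanged (Amdahl's law)."""
    return 1.0 / ((1.0 - accel_fraction) + accel_fraction / accel_factor)


SOLVE_FRACTION = 0.65  # ~65% of runtime in the linear solve, per the profile

# Hypothetical solver speedups, including the limit of an infinitely fast solver.
for solver_speedup in (2.0, 5.0, 10.0, float("inf")):
    total = amdahl_speedup(SOLVE_FRACTION, solver_speedup)
    print(f"solver {solver_speedup:>4}x -> job {total:.2f}x")
```

Under these assumptions, a 5x faster solve yields about 2.1x overall. Even an infinitely fast solve caps the iteration at about 2.9x, which is why the solve is the first target and the remaining ~35% becomes the next bottleneck.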
ANSYS Fluent 14.5 GPU Solver Convergence
nvAMG Preview of ANSYS Fluent Convergence Behavior
Model FL5S1:
- Incompressible
- Flow in a bend
- 32K hex cells
- Coupled solver
[Figure: error/residuals (log scale, 1E+00 down to 1E-08) vs. iteration number (1 to ~141) for continuity and X-, Y-, Z-momentum, comparing NVAMG and FLUENT curves]
Numerical results: Mar 2012 test for convergence at each iteration matches precise Fluent behavior.
ANSYS Fluent 14.5 GPU Acceleration
Preview of ANSYS Fluent 14.5 Performance, by ANSYS, Aug 2012
Helix model:
- Helix geometry
- 1.2M tet cells
- Unsteady, laminar
- Coupled pressure-based Navier-Stokes (PBNS), double precision (DP)
- AMG F-cycle on CPU
- AMG V-cycle on GPU
ANSYS Fluent AMG solver time (s), lower is better; NOTE: all jobs solver time only
- 2 x Xeon X5650, only 1 core used: 2832 s CPU only; 517 s with Tesla C2075 (5.5x)
- 2 x Xeon X5650, all 12 cores used: 933 s CPU only; 517 s with Tesla C2075 (1.8x)
ANSYS Fluent with GPU-Based AMG Solver
ANSYS Fluent 14.5 Performance: Results by NVIDIA, Nov 2012
Airfoil and aircraft models with hexahedral cells
ANSYS Fluent AMG solver time per iteration (s), Tesla K20X vs. Core i7-3930K (6 cores), lower is better; NOTE: times for solver only
- Airfoil (hex 784K): 2.4x GPU speedup
- Aircraft (hex 1798K): 2.4x GPU speedup
CPU: 2 x Core-i7 3930K, only 6 cores used
Solver settings:
- CPU Fluent solver: F-cycle, agg8, DILU, 0pre, 3post
- GPU nvAMG solver: V-cycle, agg8, MC-DILU, 0pre, 3post
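The solver settings above name V- and F-cycles, aggregation (agg8), DILU/MC-DILU smoothing, and 0 pre-smoothing / 3 post-smoothing sweeps. The minimal sketch below shows what a V-cycle with those sweep counts does, using a geometric multigrid V-cycle for the 1D Poisson problem -u'' = f with zero boundary values. This is a deliberate simplification, not Fluent's or nvAMG's algorithm. AMG builds its coarse levels algebraically (for example by aggregating about 8 cells) rather than by halving a grid, and the weighted-Jacobi smoother here stands in for DILU/MC-DILU. An F-cycle would revisit the coarse levels more often per cycle; only the V-cycle is shown. All function names are hypothetical.

```python
def residual(u, f, h):
    """r = f - A u for A = tridiag(-1, 2, -1) / h**2 (zero Dirichlet ends)."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (-u[i - 1] + 2.0 * u[i] - u[i + 1]) / (h * h)
    return r


def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted-Jacobi sweeps (stand-in for the DILU/MC-DILU smoothers)."""
    for _ in range(sweeps):
        new = list(u)
        for i in range(1, len(u) - 1):
            jac = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jac
        u = new
    return u


def restrict(r):
    """Full weighting from n = 2m + 1 fine points to m + 1 coarse points."""
    m = (len(r) - 1) // 2
    rc = [0.0] * (m + 1)
    for j in range(1, m):
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    return rc


def prolong(ec):
    """Linear interpolation from m + 1 coarse points to 2m + 1 fine points."""
    ef = [0.0] * (2 * (len(ec) - 1) + 1)
    for j in range(len(ec)):
        ef[2 * j] = ec[j]
    for j in range(len(ec) - 1):
        ef[2 * j + 1] = 0.5 * (ec[j] + ec[j + 1])
    return ef


def v_cycle(u, f, h, pre=0, post=3):
    """One V-cycle; the defaults mirror the slide's '0pre, 3post' setting."""
    if len(u) == 3:
        # Coarsest grid: one unknown, solve 2 u / h^2 = f exactly.
        return [0.0, 0.5 * h * h * f[1], 0.0]
    u = smooth(u, f, h, pre)
    rc = restrict(residual(u, f, h))
    ec = v_cycle([0.0] * len(rc), rc, 2.0 * h, pre, post)  # coarse correction
    u = [ui + ei for ui, ei in zip(u, prolong(ec))]
    return smooth(u, f, h, post)


# Usage: -u'' = 1 on [0, 1]; the discrete solution is exactly x(1 - x)/2.
n = 65
h = 1.0 / (n - 1)
f = [1.0] * n
u = [0.0] * n
for _ in range(20):
    u = v_cycle(u, f, h)
exact = [0.5 * (i * h) * (1.0 - i * h) for i in range(n)]
print(max(abs(a - b) for a, b in zip(u, exact)))
```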
GPUs and Distributed Cluster Computing
- Geometry decomposed: partitions put on independent cluster nodes
- CPU distributed parallel processing: nodes run in distributed parallel using MPI
[Figure: partitions 1-4 placed on nodes N1-N4, each partition on CPU, combined into a global solution]
- Execution on CPU + GPU: GPUs run shared-memory parallel using OpenMP under the distributed (MPI) parallel level
[Figure: nodes N1-N4, each with an attached GPU (G1-G4)]
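The decomposition described above can be sketched in a few lines under three assumptions. First, partitions are contiguous blocks of cell IDs; production codes use graph partitioners such as METIS instead. Second, MPI ranks are placed on nodes in consecutive blocks. Third, each GPU on a node is shared by an equal number of ranks. The function names are hypothetical. The 16-rank, 4-GPU usage mirrors the 16-core node with 4 x Tesla K20X used for the 3.6M-cell case.

```python
def partition_cells(n_cells, n_parts):
    """Split cell IDs 0..n_cells-1 into n_parts contiguous, nearly equal ranges."""
    base, extra = divmod(n_cells, n_parts)
    parts, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


def place_rank(rank, ranks_per_node, gpus_per_node):
    """Return (node, gpu) for an MPI rank under block placement, with
    ranks_per_node / gpus_per_node ranks sharing each GPU."""
    if ranks_per_node % gpus_per_node != 0:
        raise ValueError("ranks_per_node must be a multiple of gpus_per_node")
    node, local_rank = divmod(rank, ranks_per_node)
    ranks_per_gpu = ranks_per_node // gpus_per_node
    return node, local_rank // ranks_per_gpu


# Usage: 3.6M cells on one 16-core node with 4 GPUs -> 4 ranks per GPU.
parts = partition_cells(3_600_000, 16)
for rank in range(16):
    node, gpu = place_rank(rank, 16, 4)
    print(rank, node, gpu, len(parts[rank]))
```

Block placement keeps neighboring ranks, which often own neighboring partitions, on the same GPU. A round-robin mapping (`local_rank % gpus_per_node`) is an equally common alternative.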
ANSYS Fluent for 3.6M Cell Aerodynamic Case
Multi-GPU acceleration of 16-core ANSYS Fluent 15.0 (preview), external aero
- Xeon E5-2667 + 4 x Tesla K20X GPUs
- 2.9x solver speedup
[Figure: CPU configuration (16-core server node, 2 x 8 cores) vs. CPU + GPU configuration (same node plus GPUs G1-G4)]
ANSYS Fluent for 14M Cell Aerodynamic Case
ANSYS Fluent 15.0 Preview Performance: Results by NVIDIA, Jun 2013
Truck body model:
- 14M mixed cells
- DES (detached-eddy simulation) turbulence
- Coupled pressure-based Navier-Stokes (PBNS), single precision (SP)
- Times for 1 iteration
- AMG F-cycle on CPU
- GPU: preconditioned FGMRES with AMG
NOTE: All jobs solver time only
ANSYS Fluent AMG solver time per iteration (s), lower is better (values as plotted); CPU: Intel Xeon E5-2667, 2.90GHz; GPU: Tesla K20X
- 1 x node, 2 CPUs (12 cores total): 69 s
- 2 x nodes, 4 CPUs (24 cores total): 41 s; with 8 GPUs (4 per node): 12 s, 3.5x
- 4 x nodes, 8 CPUs (48 cores total): 28 s; with 16 GPUs (4 per node): 9 s, 3.3x
Agenda: GPU Acceleration for Applied CFD
- Overview of GPU Progress for CFD
- GPU Acceleration of ANSYS Fluent
- GPU Acceleration of OpenFOAM
2013: Further Expansion of OF Community
- ESI acquisition of OpenCFD from SGI, Sep 2012
- IDAJ acquired a majority stake of ICON, May 2013
- This year, 3 (up from 2) OpenFOAM global user events:
  - APR 24-26, Frankfurt, DE: ESI OpenFOAM Users Conference (first ever), http://www.esi-group.com/corporate/events/2013/OpenFOAM2013. Concentration on OpenFOAM from OpenCFD
  - JUN 11-14, Jeju, KR: 8th International OpenFOAM Workshop (first in Asia), http://www.openfoamworkshop2013.org/. Concentration on OpenFOAM-extend and Wikki
  - OCT 24-25, Hamburg, DE: 7th Open Source CFD International Conference (ICON), http://www.opensourcecfd.com/conference2013/. Concentration on both OpenFOAM and OpenFOAM-extend
NVIDIA Market Strategy for OpenFOAM
- Provide technical support for commercial GPU solver developments
  - FluiDyna Culises AMG solver library using the NVIDIA toolkit
  - Vratis Speed-IT library, development of CUSP-based AMG
- Alliances (but no development) with key OpenFOAM organizations
  - ESI and OpenCFD Foundation (H. Weller, M. Salari)
  - Wikki and the OpenFOAM-extend community (H. Jasak)
  - IDAJ in Japan and ICON in the UK: support of both OF and OF-ext
- Conduct performance studies and customer benchmark evaluations
  - Collaborations: developers, customers, OEMs (Dell, SGI, HP, etc.)
Culises: CFD Solver Library for OpenFOAM
www.fluidyna.de
- FluiDyna: TU Munich spin-off from 2006
- Culises provides a linear solver library
- Culises requires only two edits to the OpenFOAM control file
- Multi-GPU ready
- Contact FluiDyna for license details
Culises easy-to-use AMG-PCG solver:
#1. Download and license from http://www.FluiDyna.de
#2. Automatic installation with a FluiDyna-provided script
#3. Activate Culises and GPUs with 2 edits to the config-file
[Figure: config-file for CPU-only vs. config-file for CPU+GPU]
Culises Coupling to OpenFOAM
Culises coupling is user-transparent.
OpenFOAM Speedups Based on CFD Application
GPU speedups for different industry cases (chart series: job speedup, solver speedup, OpenFOAM CPU-only efficiency):
- Automotive: 1.6x
- Multiphase: 1.9x
- Thermal: 3.0x
- Pharma CFD: 2.2x
- Process CFD: 4.7x
Range of model sizes and different solver schemes (Krylov, AMG-PCG, etc.)
Source: www.fluidyna.de
FluiDyna Culises: CFD Solver for OpenFOAM
DrivAer: joint car body shape by BMW and Audi (http://www.aer.mw.tum.de/en/research-groups/automotive/drivaer)
- Solver speedup of 7x for 2 CPU + 4 GPU
  • 36M cells (mixed type)
  • GAMG on CPU
  • AMGPCG on GPU
Mesh Size | CPUs | GPUs | Solver Speedup | Job Speedup
9M | 2 CPU | +1 GPU | 2.5x | 1.36x
18M | 2 CPU | +2 GPUs | 4.2x | 1.52x
36M | 2 CPU | +4 GPUs | 6.9x | 1.67x
Reference: "Culises: A Library for Accelerated CFD on Hybrid GPU-CPU Systems", Dr. Bjoern Landmann, FluiDyna, developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0293-GTC2012-Culises-Hybrid-GPU.pdf
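The gap between solver speedup and job speedup in these DrivAer results can be read through Amdahl's law. The minimal sketch below inverts the law to estimate the share of CPU-only runtime spent in the solver. It assumes that the non-solver part of each job ran at the same speed with and without GPUs, and it ignores data-transfer overhead. The resulting shares are an inference from the published speedups, not figures reported by FluiDyna, and the function name is hypothetical.

```python
def implied_solver_fraction(job_speedup, solver_speedup):
    """Invert Amdahl's law S_job = 1 / ((1 - f) + f / S_solver) for f, the
    fraction of CPU-only runtime spent in the linear solver."""
    return (1.0 - 1.0 / job_speedup) / (1.0 - 1.0 / solver_speedup)


# (mesh, solver speedup, job speedup) from the DrivAer results above
drivaer = [("9M", 2.5, 1.36), ("18M", 4.2, 1.52), ("36M", 6.9, 1.67)]
for mesh, s_solver, s_job in drivaer:
    f = implied_solver_fraction(s_job, s_solver)
    ceiling = 1.0 / (1.0 - f)  # job speedup if the solver took zero time
    print(f"{mesh}: solver share ~{f:.0%}, job-speedup ceiling ~{ceiling:.1f}x")
```

Under those assumptions, the solver accounts for roughly 44-47% of the CPU-only runtime. Even an infinitely fast solver would therefore cap these jobs at about 1.8-1.9x, and further job speedup would require moving more of the remaining work onto the GPU.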
Conclusions For Applied CFD on GPUs
- GPUs provide significant speedups for solver-intensive jobs
  - Improved product quality with higher-fidelity modeling
  - Shorter product engineering cycles with faster simulation turnaround
- Simulations recently considered impractical are now possible
  - Unsteady RANS and Large Eddy Simulation (LES) practical in cost and time
  - Effective parameter optimization from a large increase in the number of jobs