Aerodynamics of a hi-performance vehicle: a parallel computing application inside the Hi-ZEV project
Workshop “HPC enabling of OpenFOAM® for CFD applications”
26-28 november, CINECA, Casalecchio di Reno (BO), Italy
A. De Maio(1), V. Krastev(2), P. Lanucara(3), F. Salvadore(3)
(1) Nu.m.i.d.i.a. S.r.l.
(2) Dept. of Industrial Engineering, University of Rome “Tor Vergata”
(3) CINECA Roma, Dipartimento SCAI
Summary
• Hi-ZEV project outline
• Preliminary evaluation of the OpenFOAM® code
• Prototype car simulations: aerodynamic results and scalability/performance tests
• Conclusions
Hi-ZEV: a collaborative industrial research project
• Granted by the Italian Ministry of Economic Development’s program «Industria 2015 – Nuove Tecnologie per il Made in Italy»
• The project aims to develop an innovative high-performance car with low environmental impact, based on an electric/hybrid powertrain
• The project started on 01/01/2011 and will last until 31/12/2013
Hi-ZEV: the partners
Technos Reat
Fondazione Italiana Nuove Comunicazioni
Icomet Microsistemi srl Elettromedia Advanced Devices spa
Dyesol Italia srl Leaff Engineering srl ISAM spa Concept Inn srl HPH Consulting
Team Leader and Project Coordinator
Hi-ZEV: technical key points
• Very light vehicle (low weight/power ratio)
• High-performance hybrid powertrain for wide-range torque availability
• Very advanced chassis and suspensions for excellent road-holding
• Accurate fluid-dynamic design (CFD)
The role of CFD inside the project
• In the early as well as in the more advanced design stages, CFD can be effectively used to optimize:
1. the external aerodynamics of the vehicle;
2. the underhood aerodynamics/thermal management;
3. the HVAC systems.
• The combination of an open-source, fully parallelized code (OpenFOAM®) with the HPC infrastructure of CASPUR/CINECA represents a powerful and efficient answer to these needs.
[Diagram: OpenFOAM® + HPC; CFD; external aerodynamics, underhood, HVAC]
Preliminary simulations on the Matrix cluster
• Preliminary evaluation of OpenFOAM® on the Matrix infrastructure
• Standard external aerodynamics test case (Ahmed body)
• OpenFOAM-1.7.1 + OpenMPI-1.4.2 + Scotch for decomposition
• Steady-state solver (simpleFoam) on unstructured grids (up to 6×10^6 cells)
• High-Re RANS turbulence modeling (RNG/realizable k-ε + wall functions, WF)
• Up to 256 cores (32 nodes) involved
Matrix (AMD Opteron)
• 8 cores per node (2 × quad-core AMD Opteron 23xx @ 2.1 GHz)
• 320 nodes with 16 GB RAM each
• InfiniBand DDR connection between nodes
• 20 Tflops peak performance, 177 Mflops/W sustained performance
Preliminary simulations on the Matrix cluster: computational domain
[Figure: computational domain for the Ahmed body case]
Ahmed body results: wake flow structures, ϕ=25°
[Figure: RKE and RNG results; panels: symmetry plane and 3D view (Q-criterion, Q = 10^4 s^-2)]
Ahmed body results: wake flow structures, ϕ=35°
[Figure: RKE and RNG results; panels: symmetry plane and 3D view (Q-criterion, Q = 10^4 s^-2)]
Ahmed body results: velocity profiles in the symmetry plane
[Figure: velocity profiles; panels: ϕ=25° and ϕ=35°]
Ahmed body results: integrated rear pressure drag
Overall comparison:
Rear pressure drag coefficients (ϕ=25°)
                  Slant    Base     Total    Difference (%)*
RKE               0.147    0.088    0.235    -13.3
RNG               0.147    0.083    0.230    -15.1
Lienhart et al.   0.156    0.115    0.271    -
Rear pressure drag coefficients (ϕ=35°)
                  Slant    Base     Total    Difference (%)*
RKE               0.110    0.107    0.217    -12.5
RNG               0.115    0.101    0.216    -12.9
Lienhart et al.   0.121    0.127    0.248    -
Comments:
• Results are aligned with previous CFD studies on the 25°/35° configurations
• The realizable k-ε model captures fairly well the relative drag reduction (~8%) in passing from the 25° to the 35° configuration
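The totals and percentage differences in the tables above can be re-derived from the slant and base contributions. The sketch below assumes that the starred "Difference (%)" column is the relative deviation of each computed total from the Lienhart et al. total (the footnote itself did not survive in the source); the function names are illustrative only.

```python
# Rear pressure drag contributions (slant, base) copied from the tables above.
DRAG = {
    25: {"RKE": (0.147, 0.088), "RNG": (0.147, 0.083), "Lienhart et al.": (0.156, 0.115)},
    35: {"RKE": (0.110, 0.107), "RNG": (0.115, 0.101), "Lienhart et al.": (0.121, 0.127)},
}

def total(angle, source):
    """Total rear pressure drag = slant contribution + base contribution."""
    slant, base = DRAG[angle][source]
    return slant + base

def difference_percent(angle, model, reference="Lienhart et al."):
    """Assumed definition: deviation of the model total from the reference total."""
    ref = total(angle, reference)
    return 100.0 * (total(angle, model) - ref) / ref

def change_25_to_35_percent(source):
    """Relative change of the total drag when going from the 25 deg to the 35 deg slant."""
    return 100.0 * (total(35, source) - total(25, source)) / total(25, source)

if __name__ == "__main__":
    for angle in (25, 35):
        for model in ("RKE", "RNG"):
            print(f"phi={angle}  {model}: total={total(angle, model):.3f}  "
                  f"diff={difference_percent(angle, model):+.1f}%")
    for source in ("RKE", "RNG", "Lienhart et al."):
        print(f"{source}: 25 -> 35 change = {change_25_to_35_percent(source):+.1f}%")
```

Under this assumed definition the computed differences match the tabulated column, and the 25° to 35° reduction comes out at about 8.5% for the reference data, against roughly 7.7% for RKE and 6.1% for RNG.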
Ahmed body results: some considerations about scalability
Case description:
• Finest grid (~6×10^6 cells)
• PCG linear solver on pressure equation
• 64-96-128-256 cores (8-12-16-32 nodes) progression
[Figure: speedup specific efficiency (%) vs. nodes increase (8-12, 12-16, 16-32)]
$$\text{s.s.e.} = \frac{\text{speedup relative increase}}{\text{nodes relative increase}}$$
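One way to read the speedup specific efficiency: between two consecutive runs, divide the relative gain in speedup by the relative gain in node count. The sketch below assumes speedup is inversely proportional to time per step, so the reference run cancels out. The function name and the timings in the usage lines are made up for illustration and are not measurements from these tests.

```python
def speedup_specific_efficiency(nodes_a, time_a, nodes_b, time_b):
    """Relative speedup increase divided by relative node increase, in percent.

    Assumes speedup ~ 1 / time-per-step, so that
    (S_b - S_a) / S_a = time_a / time_b - 1.
    """
    if nodes_b <= nodes_a:
        raise ValueError("nodes_b must be larger than nodes_a")
    speedup_gain = time_a / time_b - 1.0
    node_gain = nodes_b / nodes_a - 1.0
    return 100.0 * speedup_gain / node_gain

if __name__ == "__main__":
    # Hypothetical timings (seconds per step), NOT data from the slides.
    print(speedup_specific_efficiency(8, 12.0, 12, 8.0))   # perfectly linear step
    print(speedup_specific_efficiency(16, 6.0, 32, 3.2))   # sub-linear step
```

A value of 100% means the step from one node count to the next scaled linearly; values below 100% indicate that the added nodes were only partly converted into speedup.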
• Almost linear inter-node scaling (at least in the considered interval)
Prototype car simulations
• Aims:
1. Aerodynamic optimization of the Hi-ZEV prototype external design;
2. More systematic scalability tests on the CASPUR/CINECA HPC infrastructures.
• Two hybrid (prisms + tetras) grids considered:
1. 7.5×10^6 cells (symmetric);
2. 15×10^6 cells (complete geometry).
• OpenFOAM-2.1.1 + Scotch
• Three architectures selected for the performance tests
Matrix (AMD Opteron)
• 8 cores per node (2 × quad-core AMD Opteron 23xx @ 2.1 GHz)
• 320 nodes with 16 GB RAM each
• InfiniBand DDR connection between nodes
• 20 Tflops peak performance, 177 Mflops/W sustained performance
Jazz (Intel Xeon)
• 12 cores per node (2 × six-core Intel X5650 “Westmere” @ 2.67 GHz)†
• 16 nodes with 48 GB RAM each
• InfiniBand QDR connection between nodes
• 14.3 Tflops peak performance, 785 Mflops/W sustained performance
† Each node is also equipped with 2 nVidia Tesla GPU computing units, not involved in the OpenFOAM simulations
Fermi (BG/Q)
• 16 cores per node (IBM PowerPC A2 @ 1.6 GHz)
• 10240 nodes (163840 cores) with 16 GB RAM each (1 GB per core)
• Network interface with 11 links -> 5D torus
• 2 Pflops peak performance
Prototype car simulations: computational domain
[Figure: computational domain for the half-car case, with labeled boundaries: inlet, outlet, top, side, moving floor, symmetry plane, half car]
Prototype car simulations: aerodynamic results (OF vs. Fluent)
OpenFOAM® settings:
• Symmetrical prism/tetra grid (exactly the same for both codes)
• simpleFoam pressure-based solver
• Realizable k-ε for turbulence + standard WF
• TVD scheme for momentum convection, upwind for k/ε
Fluent settings:
• Symmetrical prism/tetra grid (exactly the same for both codes)
• Pressure-based solver
• Realizable k-ε for turbulence + non-equilibrium WF
• Second-order upwind scheme for all convective terms
Aerodynamic coefficients
             Cd      CL
OpenFOAM®    0.32    0.14
Fluent       0.31    0.17
[Figures: pressure coefficient distribution around the car at y=0 (symmetry plane), y=-0.4 and y=-0.7; panels: Fluent (6000 iterations) and OpenFOAM (4500 iterations)]
$$C_p = \frac{p - p_\infty}{\frac{1}{2}\rho_\infty U_\infty^2}$$
[Figures: total pressure coefficient distribution around the car at y=0 (symmetry plane), y=-0.4, y=-0.7 and z=0.11; panels: Fluent (6000 iterations) and OpenFOAM (4500 iterations)]
$$C_{pt} = \frac{p_t - p_\infty}{p_{t,\infty} - p_\infty}$$
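As a quick illustration of the static and total pressure coefficients plotted in these figures, the sketch below evaluates both for a few reference points. It assumes incompressible flow, so that the free-stream total pressure is static pressure plus dynamic pressure, and it uses made-up free-stream values; none of the numbers come from the simulations.

```python
def pressure_coefficient(p, p_inf, rho_inf, u_inf):
    """C_p = (p - p_inf) / (0.5 * rho_inf * u_inf**2)."""
    return (p - p_inf) / (0.5 * rho_inf * u_inf ** 2)

def total_pressure_coefficient(p_t, p_inf, p_t_inf):
    """C_pt = (p_t - p_inf) / (p_t_inf - p_inf)."""
    return (p_t - p_inf) / (p_t_inf - p_inf)

if __name__ == "__main__":
    # Hypothetical free-stream state in SI units, chosen only for illustration.
    rho_inf, u_inf, p_inf = 1.25, 40.0, 101325.0
    q_inf = 0.5 * rho_inf * u_inf ** 2   # free-stream dynamic pressure
    p_t_inf = p_inf + q_inf              # incompressible total pressure (assumption)

    # Stagnation point: local static pressure equals free-stream total pressure.
    print(pressure_coefficient(p_t_inf, p_inf, rho_inf, u_inf))
    # Undisturbed flow: no total pressure loss.
    print(total_pressure_coefficient(p_t_inf, p_inf, p_t_inf))
    # Wake point where the whole dynamic head has been lost.
    print(total_pressure_coefficient(p_inf, p_inf, p_t_inf))
```

With this normalization the total pressure coefficient is 1 where no losses have occurred and drops towards 0 (or below) inside wakes and separated regions, which is why it is a convenient quantity for comparing the loss regions predicted by the two codes.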
Prototype car simulations: inter-node scalability tests (Matrix vs. Jazz)
Case description:
• Symmetrical grid (~7.5×10^6 cells)
• PCG and GAMG linear solvers on pressure equation
• 50 iterations monitored, starting from a fairly converged solution
• The computing node is selected as the fundamental unit
$$\text{speedup} = \frac{(\text{time per step})_{1\text{-node}}}{(\text{time per step})_{N\text{-nodes}}}$$
[Figure: speedup vs. number of nodes, Matrix vs. Jazz, PCG]
[Figure: speedup vs. number of nodes, Matrix vs. Jazz, GAMG]
[Figure: speedup vs. number of nodes, Matrix, GAMG vs. PCG]
[Figure: speedup vs. number of nodes, Jazz, GAMG vs. PCG]
Comments:
• The PCG solver clearly outperforms GAMG in scaling once the parallelization becomes extensive (approximately above 100 processes for the half-car case)
• Jazz appears to scale better than Matrix, probably because of the more capable InfiniBand network (QDR vs. DDR) and of better cache “filling” as the single processes become smaller
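The node-based speedup used in these tests is the ratio of the single-node time per step to the N-node time per step. The sketch below assumes a plain dictionary of measured times keyed by node count; the helper name is illustrative and the numbers are invented, not the Matrix or Jazz measurements.

```python
def node_speedup(time_per_step):
    """Map {nodes: time per step} to {nodes: speedup} relative to the 1-node run."""
    if 1 not in time_per_step:
        raise KeyError("a single-node measurement is required as the reference")
    reference = time_per_step[1]
    return {n: reference / t for n, t in sorted(time_per_step.items())}

if __name__ == "__main__":
    # Hypothetical time-per-step values in seconds, NOT data from the slides.
    timings = {1: 64.0, 2: 32.0, 4: 17.0, 8: 9.5}
    for nodes, s in node_speedup(timings).items():
        print(f"{nodes:2d} nodes: speedup = {s:.2f}")
```

Taking the node rather than the core as the unit keeps the intra-node effects (memory bandwidth, cache sharing) out of the comparison, so the curves mainly reflect the interconnect and the solver's communication pattern.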
Prototype car simulations: absolute and single-node performances (Matrix vs. Jazz)
Case description:
• Symmetrical grid (~7.5×10^6 cells)
• PCG and GAMG linear solvers on pressure equation
• 50 iterations monitored, starting from a fairly converged solution
• Time-per-step evaluated on a per-core basis
[Figure: time per step (s) vs. number of cores (8 to 256), Matrix, GAMG vs. PCG]
[Figure: time per step (s) vs. number of cores (12 to 192), Jazz, GAMG vs. PCG]
[Figure: single-node time per step (s) vs. number of cores (1 to 8), Matrix, GAMG vs. PCG]
[Figure: single-node time per step (s) vs. number of cores (1 to 12), Jazz, GAMG vs. PCG]
Comments:
• Despite the very inefficient intra-node scaling, the newer Intel architecture is (as expected) much faster than the AMD one
• If the number of processes is kept within the “acceptable scaling range”, the GAMG solver is always faster than the PCG one (e.g. 40% faster on 64 Matrix cores)
Prototype car simulations: scalability tests (Fermi, symmetrical grid)
Case description:
• Symmetrical grid (~7.5×10^6 cells)
• PCG and GAMG linear solvers on pressure equation
• 50 iterations monitored, starting from a fairly converged solution
• 16 and 32 MPI processes per node (ppn) considered
$$\text{s.e.}(\%) = 100 \cdot \frac{1}{N} \cdot \frac{(\text{time per step})_{1\text{-node}}}{(\text{time per step})_{N\text{-nodes}}}$$
[Figure: speedup efficiency (%) vs. number of nodes (2 to 256), 16 ppn, PCG vs. GAMG]
[Figure: speedup efficiency (%) vs. number of nodes (2 to 64), PCG, 16 ppn vs. 32 ppn]
What about absolute performance?
[Figure: time per step (s) vs. number of nodes (2 to 64), PCG, 16 ppn vs. 32 ppn]
• Apparently, using more ppn could be beneficial in terms of absolute performance, but when the number of nodes reaches a “practical” value (64) the benefit vanishes, and in addition…
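The efficiency plotted for Fermi normalizes the node-based speedup by the node count. A minimal sketch follows. The reference node count is left as a parameter, which is a generalization not stated in the slides (with reference_nodes=1 it reduces to the speedup efficiency expression given for these tests); the timings are invented for illustration.

```python
def speedup_efficiency(time_per_step, reference_nodes=1):
    """Return {nodes: efficiency in %} from {nodes: time per step}.

    efficiency = 100 * (N_ref * T_ref) / (N * T_N); with N_ref = 1 this is
    100 * T_1 / (N * T_N).
    """
    t_ref = time_per_step[reference_nodes]
    return {
        n: 100.0 * (reference_nodes * t_ref) / (n * t)
        for n, t in sorted(time_per_step.items())
    }

if __name__ == "__main__":
    # Hypothetical seconds per step, NOT Fermi measurements.
    timings = {1: 40.0, 2: 20.0, 4: 11.0, 8: 6.5}
    for nodes, eff in speedup_efficiency(timings).items():
        print(f"{nodes:3d} nodes: {eff:6.1f} %")
```

An efficiency curve can stay high while the absolute time per step is poor, which is why the time-per-step comparison between 16 and 32 ppn is worth reading alongside it.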
Prototype car simulations: I/O performance tests (Fermi, symmetrical grid)
Case description:
• Symmetrical grid (~7.5×10^6 cells)
• PCG linear solver on pressure equation
• Output generation time and initialization time monitored
• 16 and 32 MPI processes per node considered
[Figure: output generation time (s) vs. number of nodes (4 to 128), PCG, 16 ppn vs. 32 ppn]
[Figure: initialization time (s) vs. number of nodes (4 to 128), PCG, 16 ppn vs. 32 ppn]
Prototype car simulations: comments about Fermi runs (symmetrical grid)
Comments:
• The case is of course too small to prove Fermi’s real potential, but…
• …up to the minimum “practical” node number (64) the SIMPLE iteration scaling is acceptable (PCG)
• …when the I/O capability of the nodes actually gets saturated, a dramatic drop in I/O efficiency occurs (and things get even worse with 32 ppn)
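To put “too small” into numbers, one can estimate the average mesh load per MPI process for the symmetrical grid. The sketch below simply divides the cell count by nodes × ppn; it ignores decomposition imbalance and halo cells, and the helper name is illustrative.

```python
def cells_per_process(n_cells, nodes, ppn):
    """Average number of mesh cells handled by each MPI process."""
    return n_cells / (nodes * ppn)

if __name__ == "__main__":
    n_cells = 7.5e6  # symmetrical grid, approximate size from the case description
    for nodes in (2, 8, 64, 256):
        for ppn in (16, 32):
            load = cells_per_process(n_cells, nodes, ppn)
            print(f"{nodes:4d} nodes x {ppn} ppn = {nodes * ppn:5d} processes, "
                  f"about {load:,.0f} cells each")
```

On 64 nodes with 16 ppn this gives 1024 processes and roughly 7,300 cells per process, so each process has little computation to offset its communication and I/O.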
Further simulations on Fermi: doubled grid
Case description:
• Doubled grid (~15×10^6 cells)
• PCG solver on pressure equation
• Only 16 ppn considered
• Comparison made assuming the same mesh-per-node load distribution (i.e. doubling the number of nodes for the bigger grid)
[Figure: time per step (s) vs. number of nodes (symm-double: 32-64, 64-128, 128-256), PCG, symmetrical vs. doubled grid]
[Figure: output generation time (s) vs. number of nodes (symm-double: 32-64, 64-128, 128-256), PCG, symmetrical vs. doubled grid]
[Figure: initialization time (s) vs. number of nodes (symm-double: 32-64, 64-128, 128-256), PCG, symmetrical vs. doubled grid]
Comments:
• The SIMPLE iteration weak-scaling performance appears fairly good and thus should encourage more tests on bigger cases, but…
• …the I/O issues are confirmed
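A common way to quantify weak scaling, as in the symmetrical vs. doubled grid comparison, is the ratio of the times per step at fixed per-node load. The definition below is the usual textbook one and is an assumption here (the slides compare the raw times directly); the timings are invented and are not Fermi measurements.

```python
def weak_scaling_efficiency(time_base, time_scaled):
    """Percent efficiency when problem size and node count grow together.

    100% means the larger case on proportionally more nodes takes the same
    time per step as the base case.
    """
    if time_base <= 0 or time_scaled <= 0:
        raise ValueError("times must be positive")
    return 100.0 * time_base / time_scaled

if __name__ == "__main__":
    # Hypothetical pairs: (symmetrical grid on N nodes, doubled grid on 2N nodes).
    pairs = {"32-64": (2.0, 2.1), "64-128": (1.1, 1.2), "128-256": (0.7, 0.8)}
    for label, (t_symm, t_double) in pairs.items():
        print(f"{label}: {weak_scaling_efficiency(t_symm, t_double):.1f} %")
```

The same ratio could be applied to the output generation and initialization times to express the I/O degradation on the same scale as the solver iterations.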
Conclusions (1)
• Hi-ZEV is a successful example of how industry can take advantage of the combination of parallelized open-source CFD toolkits and highly qualified HPC infrastructures, in a collaborative project framework
• The OpenFOAM® code has been evaluated on “conventional” AMD and Intel HPC facilities for external aerodynamics applications, showing:
– Good accuracy compared to well-established commercial CFD codes;
– Interesting parallel performance (still not fully exploited), at least for small/medium-size cases (~10^7 cells) and depending on the optimal pressure solver choice (PCG scales better, GAMG is faster for small numbers of processes)
Conclusions (2)
• The OpenFOAM® performance has also been assessed on the BG/Q supercomputer Fermi and, in spite of the (relatively) small size of the considered cases, the following remarks can be made:
– The solver iteration scaling performance is promising (with PCG), especially with a view to coping with much bigger problems;
– Although for the considered cases a more conventional architecture (e.g. Intel Xeon) seems to be a better choice, a deeper investigation should be made in order to also include performance vs. energy consumption aspects;
– Unfortunately, for massively parallel applications (thousands of processes) a dramatic I/O efficiency issue arises (further evaluation needed)
Acknowledgments
M. Testa (Nu.m.i.d.i.a. S.r.l.), for providing the half-car grid and Fluent results