© 2011 ANSYS, Inc. June 8, 2012
Laz Foley & Simon Pereira, Confidence by Design, Detroit, June 5, 2012
Automated Design Exploration and Optimization + HPC Best Practices
Outline
• The Path to Robust Design
• ANSYS DesignXplorer
• Mesh Morphing and Optimizer
• RBF Morph
• Adjoint Solver
• HPC Best Practices
The Path to Robust Design
Single Physics Solution
• Accuracy, robustness, speed…
Multiphysics Solution
• Integration Platform
“What if” Study
• Parametric Platform
Design Exploration
• DOE, Response Surfaces, Correlation, Sensitivity, Unified reporting, etc.
Optimization
• Algorithms
• Published API
Robust Design
• Six Sigma Analysis
• Probabilistic Algorithms
• Adjoint solver methods
Robust Design is an ANSYS Advantage
Deformation
Single Physics/Multi Physics
Single physics is the entry point to simulation
Multi‐Physics enables real virtual prototyping
Electromagnetic Force Density with Thermal‐Stress and Electromagnetic Force load
Fluid Pressure Distribution
2 way coupled with a transient structural solution
Stress ?
Parametric CAD Connections
Pervasive Parameters
Persistent Updates
Managed State, Update Mechanisms
Remote Solve Manager (RSM)
Needed for "What If?"
“What If?”
Interactively adjust the parameter values and “Update”
Design Exploration
Optimization
Optimal Candidates
ANSYS DesignXplorer
Initial vs. Optimized Design
Output | Initial Design | Optimized
Tt Ratio | 1.116 | 1.126
pt Ratio | 1.674 | 1.709
η [%] | 71.65 | 76.25
Power [MW] | 1.208 | 1.268
Engineers can easily appreciate the value of
understanding.
Exhaust manifold design
ParametersDiameter of the dess at inletal Temperaturee RPM
All samples reports max deformation below 1.5 mm
Parametric Geometry
Pressure & Flow Velocity
Thermal
Deformation
Stress
Response Parameters
• Max Flow Temperature
• Max Deformation
• Max Von‐Mises stress
Uncertainty of input
Maximum Displacement should not exceed 1.5 mm
Response Surface showing the effect of engine speed and thickness at outlet on
Six Sigma Analysis
Where are you?
You may want to consider how far along the “path” your simulation practices are
Perhaps you could improve your use of optimization and better leverage your investment in simulation?
ANSYS DesignXplorer
Integral with Workbench
• Parametric multiphysics modeling with automated updates
• Bi‐directional CAD, RSM, scripting, reporting and more...
Robust Design Tools at ANSYS
ANSYS DesignXplorer
• Unified Workbench solution
ANSYS Fluent
• Built‐in Mesh Morphing and shape Optimization (MMO) tools
• Being hooked up to DX for 14.5
Adjoint solver
Baseline Design
Optimized Design
High sensitivity – changes to shape have a big effect on drag
Low sensitivity
High sensitivity – Shape on Downforce
R14R14
Robust Design Tools at ANSYS
ANSOFT Optimetrics
• Produce “families of curves”
• Simultaneous solve with DSO packs
• Access to EBU Adjoint
• And more
[Figure: Ansoft Corporation Maxwell3DDesign1, XY Plot 1. Force Fm [newton] versus ampturns [A] (0 to 1000), one curve per value of Gap from 0.001in to 0.006in (Setup1 : LastAdaptive).]
Optimization Partners
ANSYS simulation software has been effectively used in concert with many optimization partners
• MATLAB (Mathworks)
• ModeFrontier (Esteco)
• OptiSLang (Dynardo)
• eArtius
• Optimus (Noesis)
• RBF‐Morph
• Sculptor (Optimal)
• Sigma Technology (IOSO)
• TOSCA (FE‐DESIGN)
• iSight (Dassault)
• Qfin (Qfinsoft)
ANSYS DesignXplorer
DesignXplorer is everything under this Parameter bar…
• Low cost & easy to use!
• It drives Workbench
• Improves the ROI!
DX
ANSYS Workbench Solvers
Design of Experiments
With little more effort than for a single run, you can use
DesignXplorer to create a DOE and run many variations
Correlation Matrix
Understand how your parameters are correlated/influenced by other parameters!
Sensitivity
Understand which parameters your design is most sensitive to!
Response Surface
Understand the sensitivities of the output parameters (results) with respect to the input parameters.
3D Response
2D Slices Response
Goal‐Driven Optimization
Use an optimization algorithm or screening to understand tradeoffs or discover optimal
design candidates!
Robustness Evaluation
Input parameters have variation!
Output parameters also!
Understand how our performance
will vary with your
Make sure your design is robust!
Six Sigma, TQM
Predict how many parts
will likely fail?
Understand which inputs require the
Industry Testimonials
“Over the course of the design process, Dyson’s engineers steadily improved the performance of the fan to the point that the final design has an amplification ratio of 15 to one, a 2.5‐fold improvement over the six‐to‐one ratio of the original concept design. The team investigated 200 different design iterations using simulation, which was 10 times the number that would have been possible had physical prototyping been the primary design tool. Physical testing was used to validate the final design, and the results correlated well with the simulation analysis.”
R. Mason, Research, Design and Development Manager, DYSON
“This technology makes it possible to quickly evaluate hundreds of designs in batch processes to explore the complete design space so that we know we have the best possible design.”
Ken Karbon, Staff Engineer, General Motors
“The ease of using simulation tools has helped to transform our organization from a test‐centric culture to an analysis‐centric culture.”
Bob Tickel, Director of Analysis at Cummins
P3
P2
P1
Need uniform outflow
Minimize pressure drop
BAD
GOOD
Pressure Drop
Good
Bad
Example 1: Slit Die
Example 2: Combustor
− 3 parameters
− Minimize pressure loss
− Minimize Mach number
Inlet
Outlet
Outlet
Outlet
Dump Gap
Diffuser Length
Exit Height
Sensitivity
Mesh Morphing and Optimizer
Fluent Morpher‐Optimization Feature
Allows users to optimize product design based on shape deformation to achieve a design objective
Based on a free‐form deformation tool coupled with various optimization methods
Mesh Morphing
• Applies a geometric design change directly to the mesh in the solver
• Uses a Bernstein polynomial‐based morphing scheme
• Freeform mesh deformation defined on a matrix of control points leads to a smooth deformation
• Works on all mesh types (Tet/Prism, CutCell, HexaCore, Polyhedral)
• User prescribes the scale and direction of deformations to control points distributed evenly through the rectilinear region.
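To make the Bernstein polynomial idea concrete, here is a minimal 2D sketch of free‐form deformation. It is an illustration under stated assumptions, not Fluent's implementation: the function names, the 2D restriction and the data layout are all hypothetical, and the node is assumed to lie inside the rectilinear region.

```python
from math import comb

def bernstein(n, i, t):
    """Bernstein basis polynomial B_{i,n}(t) for t in [0, 1]."""
    return comb(n, i) * t**i * (1.0 - t)**(n - i)

def morph_point(point, box_min, box_max, control_disp):
    """Displace one 2D mesh node from a grid of control-point displacements.

    control_disp[i][j] is the (dx, dy) displacement prescribed at control
    point (i, j) of an evenly spaced grid spanning the rectilinear region.
    The node is assumed to lie inside the region (hypothetical layout).
    """
    n = len(control_disp) - 1
    m = len(control_disp[0]) - 1
    # Local coordinates of the node inside the region, each in [0, 1].
    s = (point[0] - box_min[0]) / (box_max[0] - box_min[0])
    t = (point[1] - box_min[1]) / (box_max[1] - box_min[1])
    dx = dy = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            # Smooth blending weight of control point (i, j) at this node.
            w = bernstein(n, i, s) * bernstein(m, j, t)
            dx += w * control_disp[i][j][0]
            dy += w * control_disp[i][j][1]
    return (point[0] + dx, point[1] + dy)
```

Because the Bernstein weights are smooth and sum to one, moving a single control point produces a smooth deformation, and moving all control points by the same amount translates every node by that amount.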
Process
What if?: Setup Case, Run, Setup Morph (Regions, Parameters, Deformation), Evaluate, Choose “best” design
OR
Optimizer: Setup Case, Run, Setup Optimizer, Optimize (Auto: Morph, Optimizer), Optimal
Deformation Definition
• Define constraint(s) (if any)
• Select control points and prescribe the relative ranges of motion
Objective Function
Baseline Design Optimized Design
• Objective Function: Equal flow rate
Optimizer
Algorithms: Compass, Powell, Rosenbrock, Simplex, Torczon
Auto
• Optimize!
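As an illustration of the simplest algorithm in that list, the following is a generic textbook compass (coordinate pattern) search. It is a sketch only, not the Fluent optimizer; the function name and the idea of passing the objective as a plain Python callable of the morph parameters are assumptions made for the example.

```python
def compass_search(f, x0, step=1.0, tol=1e-6, max_iter=10_000):
    """Derivative-free minimization by probing +/- step along each coordinate.

    f  : objective taking a list of design parameters (hypothetical stand-in
         for "run the morphed case and evaluate the objective function").
    x0 : starting parameter values.
    """
    x = list(x0)
    fx = f(x)
    for _ in range(max_iter):
        if step < tol:
            break
        improved = False
        for k in range(len(x)):
            for sign in (+1.0, -1.0):
                trial = list(x)
                trial[k] += sign * step
                ft = f(trial)
                if ft < fx:
                    # Accept the first improving move and restart the probe.
                    x, fx, improved = trial, ft, True
                    break
            if improved:
                break
        if not improved:
            # No direction helps at this scale, so refine the step.
            step *= 0.5
    return x, fx
```

Each call to `f` corresponds to one morph and solve, so in practice the cost is dominated by the number of trial designs rather than by the search logic itself.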
Example: L‐Shaped Duct
Application: L‐shaped duct
Objective Function: Uniform flow at the outlet
Significant Improvement in Flow Uniformity
RBF Morph
A system of radial functions is used to fit a solution for the mesh movement/morphing, from a list of source points and their prescribed displacements
Radial Basis Function interpolation is used to derive the displacement at any location in the space
The RBF problem definition is mesh independent.
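The fit-then-interpolate idea can be sketched in a few lines. This is a generic illustration, not the RBF‐Morph implementation: the Gaussian radial function, the `scale` parameter and all function names are assumptions chosen so that the small linear system is well posed.

```python
import math

def solve(a, b):
    """Solve a x = b (b may have several columns) by Gaussian elimination."""
    n = len(a)
    m = [row[:] + rhs[:] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, len(m[r])):
                m[r][c] -= factor * m[col][c]
    x = [[0.0] * (len(m[0]) - n) for _ in range(n)]
    for r in reversed(range(n)):
        for k in range(len(x[r])):
            acc = m[r][n + k] - sum(m[r][c] * x[c][k] for c in range(r + 1, n))
            x[r][k] = acc / m[r][r]
    return x

def kernel(p, q, scale=1.0):
    """Gaussian radial function of the distance between p and q (assumed choice)."""
    return math.exp(-math.dist(p, q) ** 2 / scale ** 2)

def rbf_fit(sources, displacements, scale=1.0):
    """Weights such that the RBF field reproduces the prescribed displacements."""
    a = [[kernel(p, q, scale) for q in sources] for p in sources]
    return solve(a, [list(d) for d in displacements])

def rbf_displacement(point, sources, weights, scale=1.0):
    """Interpolated displacement at any location in space."""
    dims = len(weights[0])
    return [sum(kernel(point, s, scale) * w[k] for s, w in zip(sources, weights))
            for k in range(dims)]
```

Note that the fit depends only on the source points and their displacements, never on mesh connectivity, which is what makes the problem definition mesh independent: the same weights can then be evaluated at every node of any mesh.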
How Does RBF‐Morph work?
RBF‐Morph is Integrated with Fluent
Example 1: Internal Flow
Here, a pipe is projected onto a previously defined STL shape
Example 3: External Flow
Recently conducted conceptual study by ANSYS in conjunction with Volvo Cars
50 Million cell hybrid mesh of Volvo XC60
50 Design variants investigated using RBF‐Morph Addon for ANSYS Fluent and Workbench Design Explorer
50 hours total clock time to complete full optimisation on HPC Cluster
Courtesy of Volvo Cars
Example 4: 50:50:50 Optimisation
• Volvo XC60 vehicle model
• Four shape parameters
• RBF Morph to define shape parameters
• ANSYS DesignXplorer
– To drive shape parameters
– To create DOE
– To perform Goal Driven Optimization
External Aerodynamics
Ideal Aerodynamics Optimization Process
– Capacity
– Automatic
– Fast
Volume Mesh – TGrid
• Cell Count : 50.2 Million Cells
• Prism Layers : 10 (First Aspect Ratio 10, Growth 1.1)
• Prism Count : 24.4 Million Cells
• Skewness < 0.9
Step #1 : Baseline Model
Prism Layers
Cut Plane Y=0
Boundary Conditions
• Inlet : Velocity Inlet 100 kmph
• Outlet : Pressure Outlet, 0 Pa (Gage)
• Side walls : Wall, no‐slip
• Top wall : Wall, no‐slip
Solver Settings
• Steady, PBCS, Green Gauss Node Based Gradient
• Fluid : Incompressible air, Density = 1.225 kg/m3
• Turbulence : Realizable K‐epsilon, Non‐equilibrium wall treatment
• Discretization : Pressure – Standard; Momentum, TKE, TDR – 2nd Order
Step #2 : CFD Setup
• Solution Controls
– Courant Number = 200
– ERF: Momentum, Pressure = 0.75
– URFs: Density = 1.0, Body Forces = 1.0, TKE, TDR = 0.8
Step #3 : RBF‐Morph
Boat Tail Angle (P2) Roof Drop Angle (P3)
• Fully integrated within FLUENT and Workbench
• Easy to use
• Parallel => rapidly morph large size models
• Mesh independent solution works with all element types (tetrahedral, hexahedral, polyhedral, etc.)
• Superposition of multiple RBF-solutions makes the FLUENT case truly parametric (only 1 mesh is stored)
• RBF-solution can also be applied on the CAD
• Precision: exact nodal movement and exact feature preservation.
Central Composite Design, Face Centered, Enhanced
Step #4 : Setup DesignXplorer
Step #5 : Running Simulations
Task (time in seconds) | 768 Cores | 384 Cores | 288 Cores | 240 Cores | 144 Cores
Baseline Case (i.e. Design Point 1)
Load volume mesh of baseline into the CFD solver and apply solver settings | 225 | 340 | 365 | 481 | 228
Solution | 6979 | 11153 | 14409 | 17256 | 27246
Writing CFD data file | 681 | 538 | 558 | 600 | 532
Each Subsequent Design Point
Morph vehicle shape | 84 | 59 | 65 | 69 | 100
Solution | 1284 | 1754 | 2208 | 2630 | 4100
Writing CFD data file | 734 | 559 | 572 | 621 | 532
Total Run Time (Wall Clock) Needed for All 50 Design Points (Hours) | 30.80 | 35.63 | 42.98 | 50.28 | 72.19
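The totals in the last row can be cross-checked from the per-task timings. The sketch below assumes the total is one baseline run plus 49 morph, solve and write cycles; that breakdown is an assumption, but it reproduces the reported hours to within rounding. The dictionary and function names are illustrative.

```python
# Timings in seconds copied from the table above, keyed by core count.
# Each tuple is (load or morph, solution, write data file).
baseline = {768: (225, 6979, 681), 384: (340, 11153, 538),
            288: (365, 14409, 558), 240: (481, 17256, 600),
            144: (228, 27246, 532)}
subsequent = {768: (84, 1284, 734), 384: (59, 1754, 559),
              288: (65, 2208, 572), 240: (69, 2630, 621),
              144: (100, 4100, 532)}

def total_hours(cores, design_points=50):
    """Baseline run plus (design_points - 1) morph/solve/write cycles, in hours."""
    first = sum(baseline[cores])
    rest = sum(subsequent[cores]) * (design_points - 1)
    return (first + rest) / 3600.0

for cores in sorted(baseline, reverse=True):
    print(f"{cores} cores: {total_hours(cores):.2f} h")
```

Seen this way, the subsequent design points dominate the total, and within each of them the data file write is a sizeable fraction of the cycle at high core counts.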
Results :
Optimization
Design Point | Boat Tail Angle (P2) | Long Roof Angle (P3) | Green House (P4) | Front Spoiler Angle (P5) | Drag Force (N) (P1)
1 | 0.000 | 0.000 | 0.000 | 0.000 | 388.01
9 | 0.000 | 1.500 | 0.000 | 1.900 | 393.01
19 | 1.850 | -2.300 | -0.700 | 0.000 | 372.30
25 | -1.850 | 1.500 | -0.700 | 0.000 | 397.33
Flow Results Discussion
– Design point 1, 9, 19 & 25
– Velocity contours
– Iso‐surface of total pressure = 0.0
Results :
Design Point #1
Design Point #19
Design Point #25
Adjoint Solver
An adjoint solver allows specific information about a fluid system to be computed that is very difficult to gather otherwise.
The adjoint solution itself is a set of derivatives.
• They are not particularly useful in their raw form and must be post‐processed appropriately.
• The derivative of an engineering quantity with respect to all of the inputs for the system can be computed in a single calculation.
– Example: Sensitivity of the drag on an airfoil to its shape.
There are 4 main ways in which these derivatives can be used:
1. Qualitative guidance on what can influence the performance of a system strongly.
2. Quantitative guidance on the anticipated effect of specific design changes.
3. Guidance on important factors in solver numerics.
4. Gradient‐based design optimization.
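Why one adjoint calculation yields the derivative with respect to every input can be seen on a small linear model. This is a toy sketch, not the Fluent adjoint solver: the "flow" is assumed to satisfy A u = B p, the observable is J = c·u, and the matrices and names below are made up for illustration. Solving the transposed system once for λ gives dJ/dp = Bᵀ λ for all inputs p at the same time.

```python
def solve(a, b):
    """Solve a x = b for one right-hand side by Gaussian elimination."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def transpose(a):
    return [list(col) for col in zip(*a)]

def matvec(a, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

# Hypothetical model: 3 state unknowns, 2 design inputs.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
B = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c = [1.0, 0.0, 2.0]

def observable(p):
    """Primal route: solve the state for inputs p, then evaluate J = c . u."""
    u = solve(A, matvec(B, p))
    return sum(ci * ui for ci, ui in zip(c, u))

def adjoint_gradient():
    """Adjoint route: one transposed solve gives dJ/dp for every input."""
    lam = solve(transpose(A), c)
    return matvec(transpose(B), lam)
```

With finite differences the primal solve would have to be repeated once per input; the adjoint route needs one extra solve regardless of how many inputs there are, which is what makes shape sensitivity over thousands of surface nodes affordable.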
Adjoint Solution?
GOAL: Identify features of a system design that are most influential in the performance of the system.
EXAMPLE:
– Sensitivity of the Drag on a NACA 0012 airfoil to changes in the shape of the airfoil.
– The shape sensitivity field is extracted from the adjoint solution in a post‐processing step.
How to Use the Results ‐ Qualitative
High sensitivity – changes to shape have a big effect on drag
Low sensitivity – changes to shape have a small effect on drag
GOAL: Identify specific system design changes that benefit the performance and quantify the improvement in performance that is anticipated.
EXAMPLE:
– Design modifications to turning vanes in a 90 degree elbow to reduce the total pressure drop.
– The optimal adjustment that is made to the shape is defined by the shape sensitivity field (steepest descent algorithm).
– Effect of each change can be computed in advance based on linear extrapolation.
How to Use the Results ‐ Quantitative
Original P = -232.8 Pa
Expected change computed using the adjoint and linear extrapolation = 10.0 Pa
Make the change and recompute the solution.
Actual change = 9.0 Pa
Baseline | Modified
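The gap between the expected and the actual change is what a first-order estimate looks like in practice. The sketch below uses a made-up smooth function in place of a recomputed flow solution and its analytic gradient in place of the adjoint data; all names and numbers in it are hypothetical.

```python
def pressure_drop(x):
    """Hypothetical smooth stand-in for a recomputed flow solution."""
    return 0.5 * (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2 + 3.0

def gradient(x):
    """Analytic gradient of the stand-in (the role played by the adjoint data)."""
    return [x[0] - 2.0, 2.0 * (x[1] + 1.0)]

def steepest_descent_step(x, scale):
    """Move against the gradient and compare predicted with actual change."""
    g = gradient(x)
    delta = [-scale * gi for gi in g]
    # Linear extrapolation: predicted change = gradient . design change.
    predicted = sum(gi * di for gi, di in zip(g, delta))
    new_x = [xi + di for xi, di in zip(x, delta)]
    # "Make the change and recompute the solution."
    actual = pressure_drop(new_x) - pressure_drop(x)
    return new_x, predicted, actual

new_x, predicted, actual = steepest_descent_step([0.0, 0.0], scale=0.1)
print(f"predicted change {predicted:.2f}, actual change {actual:.2f}")
```

As in the elbow example, the linear prediction is somewhat more optimistic than the recomputed result, and the discrepancy grows with the size of the step because the neglected higher-order terms grow.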
GOAL: Identify aspects of the solver numerics and computational mesh that have a strong influence on quantities that are being computed that are of engineering interest.
EXAMPLE:
– Use the adjoint solution to identify parts of the mesh where mesh adaption will benefit the computed drag by reducing the influence of discretization errors.
How to Use the Results ‐ Solver Numerics
Baseline Mesh Adapted Mesh
Adapted MeshDetail
GOAL: Perform a sequence of automated design modifications to improve a specific performance measure for a system
EXAMPLE:
– Gradient‐based optimization of the total pressure drop in a pipe.
– Flow solution is recomputed and the adjoint recomputed at each design iteration.
How to Use the Results ‐ Optimization
[Figure: total pressure drop [Pa] (0 to 100) versus design iteration (0 to 30), from the initial design to the final design]
30% reduction in total pressure drop after 30 design iterations
Once a desired change to the geometry of the system has been selected, how is that change to be made?
• Mesh morphing provides a convenient and powerful means of changing the geometry and the computational mesh.
– Use Bernstein polynomial‐based morphing scheme discussed earlier
Mesh Morphing
Example: Sensitivity of lift to surface shape
Select portions of the geometry to be modified
Adjoint to deformation operation
• Surface shape sensitivity becomes control point sensitivity (chain rule for differentiation)
Benefit of this approach is two‐fold
• Smooths the surface sensitivity field
• Provides a smooth interior and boundary mesh deformation
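The chain-rule step can be sketched for a one-dimensional row of control points. This is an illustration only: each surface node is assumed to move by a Bernstein-weighted sum of scalar control-point displacements, so the sensitivity to a control point is the same weights applied to the surface sensitivities. Function and argument names are hypothetical.

```python
from math import comb

def bernstein_weights(n, t):
    """Weights of the n + 1 control points at local coordinate t in [0, 1]."""
    return [comb(n, j) * t**j * (1.0 - t)**(n - j) for j in range(n + 1)]

def control_point_sensitivity(surface_sens, surface_params, n):
    """Chain rule: dJ/d(control point j) = sum over nodes of w_j(t) * dJ/d(node).

    surface_sens   : dJ/d(displacement) at each surface node (from the adjoint).
    surface_params : local coordinate t of each surface node in the region.
    n              : polynomial degree (n + 1 control points).
    """
    cp = [0.0] * (n + 1)
    for g, t in zip(surface_sens, surface_params):
        for j, w in enumerate(bernstein_weights(n, t)):
            cp[j] += w * g
    return cp
```

Because many noisy node values are gathered into a few control-point values through smooth weights, the resulting field is smoother than the raw surface sensitivity, which is the first of the two benefits listed above.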
Mesh Morphing & Adjoint Data
Flow
The adjoint solution is determined based on the specific flow physics of the problem in hand.
The effect of other practical engineering constraints must be reconciled with the adjoint data to decide on an allowable design change.
Example:
– Some walls within the control volume may be constrained not to move.
– A minimal adjustment is made to the control‐point sensitivity field so that deformation of the fixed walls is eliminated.
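One way to read "minimal adjustment" is as an orthogonal projection. In the sketch below, each constraint row is assumed to hold the weights by which the control points move one fixed-wall node, and the sensitivity is changed by the smallest amount (in the least-squares sense) that makes every such row's response zero. This interpretation and all names are assumptions, not a description of Fluent's algorithm.

```python
def remove_fixed_wall_motion(sensitivity, constraints, eps=1e-12):
    """Smallest change to `sensitivity` that zeroes every constraint row's response."""
    # Build an orthonormal basis of the constraint rows (Gram-Schmidt).
    basis = []
    for row in constraints:
        v = list(row)
        for q in basis:
            dot = sum(a * b for a, b in zip(v, q))
            v = [a - dot * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm > eps:  # skip rows that add no new constraint
            basis.append([a / norm for a in v])
    # Subtract the component of the sensitivity that would move fixed walls.
    result = list(sensitivity)
    for q in basis:
        dot = sum(a * b for a, b in zip(result, q))
        result = [a - dot * b for a, b in zip(result, q)]
    return result
```

Control points that do not influence any fixed wall keep their original sensitivity, so the design change on the moveable walls is preserved as far as the constraints allow.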
Mesh Morphing, Adjoint Data & Constraints
Fixed wall
Fixed wall
Moveable walls
• The adjoint solver is released with all Fluent 14 packages.
• Documentation is available
– Theory
– Usage
– Tutorial
– Case study
• Training is available
• Functionality is activated by loading the adjoint solver add‐on module
• A new menu item is added at the top level
Current Functionality
Key initial application areas are:
• Low‐speed external aerodynamics
– F1 (increase downforce)
– Production automobiles (decrease drag)
• Low‐speed internal flows
– Total pressure drop (reduce losses)
Current Functionality
Application Drivers
• Ratios
• Products
• Variances
• Linear combinations
• Unary operations
In Fluent 14.5 a mechanism for users to define a wide range of observables of interest will be provided.
• Forces
• Moments
• Pressure drop
• Swirl
Current Scope
ANSYS‐Fluent flow solver has very broad scope
Adjoint is configured to compute solutions based on some assumptions
• Steady, incompressible, laminar flow.
• Steady, incompressible, turbulent flow with standard wall functions.
• First‐order discretization in space.
• Frozen turbulence.
The primary flow solution does NOT need to be run with these restrictions
• Strong evidence that these assumptions do not undermine the utility of the adjoint solution data for engineering purposes.
Fully parallelized.
Gradient algorithm for shape modification
• Mesh morphing using control points.
Adjoint‐based solution adaption
Example 1: Automotive Aerodynamics
Surface map of the drag sensitivity to shape changes
Aggressive adjustment results in a 17% reduction in loss in just one design iteration
Example 2: Pressure Drop in a Duct
Total Pressure Drop (Pa)
Geometry | Predicted | Result
Original | --- | -22.0
Modified | -14.8 | -18.3
HPC Best Practices
• Know your hardware lifecycle
• Have a goal in mind for what you want to achieve
• Using Licensing productively
• Using ANSYS provided processes effectively
Guidelines :
• This section is meant to provide an overview of the different hardware components and how they can affect solution time.
• Hopefully this will give you some of the tools to understand some of the benchmark numbers in better detail.
• ANSYS would always recommend that the best thing to do before buying a system is to look at the latest benchmarks.
• If you are not sure please ask.
Hardware Considerations
Effect of Clock Speed
[Figure: Impact of CPU Clock on Application Performance. Processor: Xeon X5600 Series; Hyper Threading: OFF; TURBO: ON; Active cores: 12/node; Memory speed: 1333 MHz. Bars show the clock ratio and the improvement due to clock at 2.66 GHz, 2.93 GHz and 3.47 GHz for the ANSYS/FLUENT models eddy_417K, aircraft_2M, turbo_500K, sedan_4M and truck_14M. The performance measure is improvement relative to CPU clock 2.66 GHz; higher is better.]
Effect of Memory Speed
We can see here the effect of memory speed.
This has implications on how you build your hardware.
Some processor types have slower memory speeds by default.
On other processors non‐optimally filling the memory channels can slow the memory speed.
[Figure: Impact of DIMM speed on ANSYS/FLUENT Application Performance (Intel Xeon x5670, 2.93 GHz). Hyper Threading: OFF; TURBO: ON; Active threads per node: 12. Bars compare memory speeds of 1066 MHz and 1333 MHz for the ANSYS/FLUENT models eddy_417K, turbo_500K, aircraft_2M, sedan_4M and truck_14M. The performance measure is improvement relative to a memory speed of 1066 MHz.]
Turbo Boost (Intel) / Turbo Core (AMD)
Turbo Boost (Intel) / Turbo Core (AMD) is a form of over‐clocking that gives more GHz to individual cores when others are idle.
With Intel processors we have seen variable performance, ranging between 0 and 8% improvement depending on the number of cores in use.
The graph below is for CFX on an Intel X5550, which sees a maximum improvement of only 2.5%.
Hyper‐Threading: ANSYS Fluent
Hyper‐Threading Technology makes a single physical processor appear as two logical processors.
This is not the same as physically having two processors and does not give double the speedup.
In our tests we’ve seen as high as a 20% increase in performance although you can see the actual performance can be quite variable from the graph opposite.
It is worth noting that this has licensing implications as you would need to oversubscribe the physical cores and hence would need double the HPC Licenses.
[Figure: Evaluation of Hyperthreading on ANSYS/FLUENT Performance, iDataplex M3 (Intel Xeon x5670, 2.93 GHz), TURBO: ON. Bars compare HT OFF (12 threads on 12 physical cores) with HT ON (24 threads on 12 physical cores) for the ANSYS/FLUENT models eddy_417K, turbo_500K, aircraft_2M, sedan_4M and truck_14M. The measurement is improvement relative to Hyperthreading OFF; higher is better.]
• Traditionally Intel take the “power approach” in their 2 socket systems (faster cores but fewer of them per processor/socket).
• Traditionally AMD take the economies of scale approach (more cores per processor but individually slower clock speeds).
• Remember that this landscape changes because they are constantly in competition with each other.
• Please note that whilst we do have some numbers for the new Intel Sandy‐bridge chips we do not have scaling numbers for the equivalent AMD 6200 series at the time of writing this presentation.
AMD vs. Intel
2 Socket vs. 4 Socket Systems
Current 4 socket systems come up slower than their 2 socket counterparts (based on Intel Westmere vs. Xeon E7‐8837).
• Clock speed slower
• Memory speed slower
• No additional memory bandwidth.
Performance of ANSYS Fluent on two‐socket and four‐socket based systems
Performance measure is Fluent Rating (higher values are better)
System | Nodes | Sockets | Cores | Fluent Rating
2‐socket based systems: HS22/HS22V Blade, 3550/3650 M3, Dx360 M3 (Xeon 5600 Series) | 1 | 2 | 12 | 88
4‐socket based systems: IBM HX5 Blade, X3850 (Xeon E7‐8837 series) | 1 | 2 | 16 | 96
Effect of the Interconnect
[Figure: ANSYS/FLUENT Performance, iDataplex M3 (Intel Xeon x5670, 12C 2.93 GHz). Network: Gigabit, 10-Gigabit, 4X QDR Infiniband (QLogic, Voltaire); Hyperthreading: OFF; TURBO: ON; Model: truck_14M. FLUENT Rating (higher is better) versus number of cores used by a single job (12 to 768).]
When going for multiple systems linked together the interconnect becomes an important factor.
The interconnect is the fabric that connects the nodes.
We can see from the graph opposite with FLUENT how quickly the performance of Gigabit Ethernet drops off.
ANSYS Fluent Auto‐Partitioning
• Auto partitioning is now very quick
• Less than 10s to process 800M cells!
• Serial pre‐partitioning step no longer required
[Figure: Time to Partition 200M Cavity Case over 768 cores (cavity case, 768 cores)]
Cells | 200M | 400M | 600M | 800M
Time (seconds) | 2.914 | 4.706 | 6.617 | 9.86
[Figure: Time to Partition 111M Truck Case (truck_111m), time in seconds]
ANSYS CFX Partitioning
Optimize parallel partitioning in multi‐core clusters (CFX) β
• Partitioner determines number of connections between partitions and optimizes partition‐host assignments
Re‐use previous results to initialize calculations on large problem (CFX) β
• Large case interpolation for cases with >~100M nodes
Clean up of coupled partitioning option for multi‐domain cases (CFX)
• Eliminates ‘isolated’ partition spots
• Dramatically reduced partitioning times for cases with fluid‐solid interfaces and very large numbers of regions
[Figure: partitions P1 to P8 distributed over Compute Node 1 and Compute Node 2, before and after adjacency‐based grouping]
Partitioning step finds adjacency amongst partitions; partitions with max adjacency are grouped on same compute nodes
ANSYS Fluent Parallel Scalability
[Figure: parallel scalability versus number of cores for the sedan case (4M cells), comparing releases 6.3.0, 12.0.0, 13.0.0 and 14.0.0. Hardware labels: Xeon X5560 @ 2.80GHz (Nehalem EP), Intel Harpertown, Intel Westmere.]
Consistently improved scalability across releases
ANSYS Fluent Parallel Scalability
[Figure: parallel scalability versus number of cores for the truck case (111M cells), comparing releases 6.3.0, 12.0.0, 13.0.0 and 14.0.0. Hardware labels: ICE 8400EX, Intel 6‐core; Intel Harpertown; Intel Westmere hex‐core 2.93 GHz.]
Consistently improved scalability across releases
Leading Performance for fluid flow simulation
The memory bandwidth of the Intel® Xeon® processor E5-2600 product family allows excellent scalability and per core performance.
Support for higher speed memory DIMMs, added on-core capacity for memory loads, as well as a larger cache size are key to extending performance and scalability.
Higher memory bandwidth has a pronounced impact with fully coupled solver applications, which are the most memory intensive. Sedan_4m is shown as an example of fully coupled solver performance. Truck_14m is representative of segregated solver performance. The horizontal line at 1.63 represents the geomean speedup over 6 standard benchmarks.
ANSYS Fluent 14 Relative Performance (higher is better)
Benchmark | 6 core Xeon X5675 | 8 core Xeon E5‐2680
Sedan_4m | 1 | 1.86
Truck_14m | 1 | 1.53
ANSYS Fluent Parallel Scalability on Intel
• Good scalability and more operations per clock make obtaining results on Intel® Xeon® E5 1.68x faster than on Intel Xeon 5600 platforms
• For end user it is about faster turnaround or solving larger tasks with the same resources along with lower TCO
[Figure: ANSYS CFX benchmark comparison of Intel Xeon 5650 and Intel Xeon E5-2680 across cases including AirliftReactor, BigPipe, CombBVM, CombEDM, Cylinder, IndyCar, Internal, LeMansCar, LES_001, Pump, RadCity, RadFurnace, StageCompressor, StaticMixer100MM, StaticMixer100, StaticMixer200, StaticMixer400 and Wigley100]
ANSYS CFX Parallel Scalability on Intel
Source: Published/submitted/approved results as of March 6, 2012. Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may
Including Monitors
Scalability with Monitors
• Scalability to higher core counts
• Simulations with monitors including plotting and printing
Hex‐core mesh, F1 car, 130 million cells, monitor‐enabled
[Figure: Example data for scaling with R14 monitors; label: 3072 cores]
Monitor support optimizations maintain scalability expectations
Fluids I/O
FLUENT, CFX and AUTODYN use a “singular” file structure. This means there is one global set of files and every process writes to them.
This methodology falls down at a large number of cores, where the file I/O becomes a bottleneck.
• CFX deals with this by using inline compression (cdat)
• FLUENT has both inline compression (cdat) and at v12.x introduced support for a Parallel File (pdat).
Parallel file system support in ANSYS FLUENT
– ~10x ‐ 20x speedup for data write
– Eliminates scaling bottleneck for data
Serial I/O Parallel I/O
ANSYS FLUENT
To Demonstrate 50:50:50 Method
– Volvo XC60 vehicle model
– Four shape parameters
– RBF Morph (Integrated within FLUENT) to define shape parameters
– Grid morphing in parallel
ANSYS WorkBench (Frame Work to Automate Process)
– To drive shape parameters
– To create DOE
– To perform Goal Driven Optimization
HPC Fluids Demonstration Case
The 50:50:50 Method
• EXTENT: 50 design points in the design space
• ACCURACY: 50 million cells used in CFD simulation of each design point
• SPEED: 50 hours total elapsed time to simulate all the design points
“One – Click” – Entire design space is simulated and post‐processed completely automatically after the initial baseline case setup
HPC Fluids Demonstration Case
Prepare Meshed Model for Baseline Vehicle Shape
CFD Solver Setup, Define Shape Parameters
Generate DOE using Input Shape Parameters
Collate Data,
Morph Vehicle Shape
Run CFD Simulation
STEP 1
STEP 2
STEP 3
STEP 4
STEP 5
Mesh Morpher Integrated within FLUENT
Solver (FLUENT), Optimizer (DX) & Post Processor (CFD Post) Integrated within ANSYS WorkBench
HPC Fluids Demonstration Case
Task (time in seconds) | 768 Cores | 384 Cores | 288 Cores | 240 Cores | 144 Cores
Baseline Case (i.e. Design Point 1)
Load volume mesh of baseline into the CFD solver and apply solver settings | 225 | 340 | 365 | 481 | 228
Solution | 6979 | 11153 | 14409 | 17256 | 27246
Writing CFD data file | 681 | 538 | 558 | 600 | 532
Each Subsequent Design Point
Morph vehicle shape | 84 | 59 | 65 | 69 | 100
Solution | 1284 | 1754 | 2208 | 2630 | 4100
Writing CFD data file | 734 | 559 | 572 | 621 | 532
Total Run Time (Wall Clock) Needed for All 50 Design Points (Hours) | 30.80 | 35.63 | 42.98 | 50.28 | 72.19
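From an HPC point of view, the baseline solution times in this table give a quick read on parallel scaling. The sketch below computes speedup and parallel efficiency relative to the 144-core run; choosing 144 cores as the reference and the function name are assumptions made for the example.

```python
# Baseline "Solution" times in seconds from the table above, keyed by cores.
solution_seconds = {144: 27246, 240: 17256, 288: 14409, 384: 11153, 768: 6979}

def scaling(reference=144):
    """Return (cores, speedup, efficiency) rows relative to the reference run."""
    rows = []
    for cores, seconds in sorted(solution_seconds.items()):
        speedup = solution_seconds[reference] / seconds
        ideal = cores / reference  # speedup expected from perfect scaling
        rows.append((cores, speedup, speedup / ideal))
    return rows

for cores, speedup, efficiency in scaling():
    print(f"{cores:4d} cores: speedup {speedup:.2f}, efficiency {efficiency:.0%}")
```

Efficiency below 100% at the highest core counts is one reason the best value per HPC license is not always found at the largest job size.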
HPC Fluids Demonstration Case
Compute Cluster Details
1. Intel’s Endeavor Cluster
2. Intel Xeon X5670 (dual socket)
3. Clock speed 2.93 GHz
4. Six cores per socket (12 cores per node)
5. 24 GB RAM @ 1333 MHz, SMT ON, Turbo ON
6. QDR Infiniband
7. RHEL Server Release 6.1
capability for “specialty physics” view factors, ray tracing, reaction rates, etc.
GPU Acceleration for CFD
Radiation View Factor calculation (ANSYS FLUENT 14 ‐beta)
Getting the right setup is a balancing act.
• HPC Licensing Cost
• Cost of Hardware
• Complexity of Deployment and Maintenance
Factors to Consider
• ANSYS HPC is licensed either as HPC Workgroup/Enterprise (or individually) or as HPC Packs.
• Given that it is licensed per partition (which in most cases translates to a core), the best value for money comes from getting the best possible scalability per core.
• When running multiple cores make sure you are using them as effectively as the memory bandwidth allows.
HPC Licensing Cost
• ANSYS will, in general, recommend the best hardware for performance that gets you the best out of your licensing investment. However you may need to make trade‐offs for your budget.
• 2 socket systems provide the best performance but inherently more complexity (and hence cost) because of the need for high speed interconnects when in a cluster.
• Current 4 socket systems have less performance than their 2 socket counterparts but are also cheaper because of their lack of requirement for the high speed interconnects to get to higher numbers of nodes at the low end.
Cost of Hardware
• A large cluster can have significant overheads in ease of deployment & on‐going maintenance costs.
• A 4 socket system, whilst having less performance, may provide an easier deployment and maintenance route at the lower end and will be a better fit to what the average IT department is used to.
• Often users get too caught up on per core performance, to the point of not getting any extra speedup at all.
• It is important to purchase something you feel you can internally support.
• Purchase 3rd party support for high performance clusters if you do not feel you have the skills to support it internally.
Complexity of Deployment and Maintenance
If you opt for unsupported infrastructure
– This does not mean that it will not work but you use it at your own risk.
– We may ask you to replicate it on a system that is supported before providing further support if you run into problems!
We recommend:
– Buying Supported Operating systems and Hardware
– Using ANSYS Supported Practices
– Talking to us before buying! It is in all our interests that you get this right!
Remember the Following ...
ANSYS Partner Solutions
– http://www.ansys.com/corporate/partners/partners‐hpc.asp
• Reference configurations
• Performance data
• White papers
• Sales contact points
Performance Data
– http://www.ansys.com/benchmarks
Information Available
ANSYS Platform Support
http://www.ansys.com/services/ss‐platform‐support.asp
– Platform Support Policies
– Supported Platforms
– Supported Hardware
– Tested systems
ANSYS Virtual Demo Room
http://www.ansys.com/demoroom/
– Click on HPC!
Information Available
The Manual
– Sections on best practices and parallel processing for various solvers
– Installation walkthroughs for installing the products, parallel processing, licensing and RSM (remote solve manager)
ANSYS Advantage
– Online Magazine
Information Available
Customer Portal
http://www1.ansys.com/customer/
– Knowledge Resources
– Installation and Systems FAQ’s
Customer Support
http://www1.ansys.com/customer/
– Portal, Email or Phone