High Performance Computing: A Review of Parallel Computing with ANSYS Solutions
"Efficient and Smart Solutions for Large Models"
© 2011 ANSYS, Inc.
Use ANSYS HPC solutions to perform efficient design variations of large structural models
Everyone can take advantage of HPC solutions for faster computation and variations of large models
All users can benefit from HPC computations
Most analysis types can be accelerated:
- Static linear or nonlinear analyses
- Buckling analyses
- Modal analyses
- Harmonic and transient response analyses using the FULL method
- Low-frequency electromagnetic analyses
- High-frequency electromagnetic analyses
- Coupled-field analyses
- Superelements (use pass)
- Cyclic symmetry analyses
Size of the model – how large is "large"?
[Chart: solver speed-up (0–4.5x) vs. number of CPUs (1–4) for models of 1,000, 8,000, 64,000, and 512,000 elements.]
A simple and productive licensing scheme:
- ANSYS HPC Pack
- ANSYS HPC Workgroup
Which part of the simulation is faster?
Not all steps of the simulation are parallel
[Diagram: simulation workflow; the model solution phase is the part that runs in parallel.]
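A standard way to reason about this (background, not from the slides) is Amdahl's law: if a fraction p of the elapsed time runs in parallel on n cores, the overall speed-up is bounded by

    S(n) = 1 / ((1 - p) + p/n)

For example, with p = 0.9 and n = 8, S = 1/(0.1 + 0.1125), roughly 4.7x – well short of 8x, which is why total elapsed time scales worse than the solver alone.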
How should I read speed-up curves?
[Chart annotations: one curve is the solver part ("excellent scaling!"); the other is YOUR elapsed time.]
The right combination of algorithms and hardware leads to maximum efficiency
Shared Memory Parallel (SMP) vs. Distributed Memory Parallel (DMP)
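As a rough sketch, the two modes are typically requested from the Mechanical APDL launcher command line as below (the executable and file names are placeholders; -np, -dis, -b, -i, and -o are standard launcher options – check your release's documentation):

    # Shared Memory Parallel: one machine, 4 threads
    ansys145 -b -np 4 -i model.dat -o model.out

    # Distributed Memory Parallel: add -dis; here 8 MPI processes
    ansys145 -b -dis -np 8 -i model.dat -o model.out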
Challenges and solutions for the distributed method

Challenge                                          Addressed by
Efficient and relevant decomposition               Partitioning methods, solver
Load balancing                                     Partitioning methods, solver
Speed                                              Hardware (processors, interconnects), solver
Maximum problem size                               Hardware (RAM), solver
I/O to communicate between cores                   Hardware (interconnects), MPI, solver
I/O to write results and overflow files            Hardware (disks, interconnects), MPI, solver
during solution
Sparse or iterative solvers?

Solver type                   Distributed/Shared Memory
SPARSE (direct)               DMP/SMP
PCG (iterative)               DMP/SMP
LANB (direct, modal)          SMP
LANPCG (iterative, modal)     DMP/SMP
SNODE (modal)                 SMP
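For reference, a minimal APDL sketch of selecting these solvers (EQSLV and MODOPT are standard Mechanical APDL commands; the surrounding model setup is omitted):

    /SOLU
    ANTYPE,STATIC
    EQSLV,SPARSE          ! direct sparse solver (DMP/SMP)
    ! EQSLV,PCG,1.0E-8    ! or: iterative PCG solver with tolerance (DMP/SMP)
    SOLVE

    ANTYPE,MODAL
    MODOPT,LANB,10        ! Block Lanczos, 10 modes (SMP)
    ! MODOPT,LANPCG,10    ! or: PCG Lanczos (DMP/SMP)
    SOLVE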
Get it in-core!
[Chart: elapsed time (0–2,500 s) for sparse solver memory modes: in-core with 24 GB, optimal and minimum out-of-core with 24 GB, and optimal and minimum modes with 4 GB.]
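A minimal APDL sketch of forcing the sparse solver in-core (BCSOPTION is the documented command; whether the factorization actually fits in-core depends on available RAM):

    /SOLU
    EQSLV,SPARSE
    BCSOPTION,,INCORE     ! factor in memory, avoiding out-of-core I/O
    ! BCSOPTION,,OPTIMAL  ! or: optimal out-of-core mode
    ! BCSOPTION,,MINIMUM  ! or: smallest memory footprint, most I/O
    SOLVE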
Check the PCG level!
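The "PCG level" is the Lev_Diff argument of the PCGOPT command (1–5; higher levels build a stronger preconditioner for ill-conditioned models). A minimal sketch:

    /SOLU
    EQSLV,PCG
    PCGOPT,2              ! level of difficulty 2; try higher if convergence is slow
    SOLVE

The PCG solver output reports the level actually used, so it is worth checking after a first run.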
Balancing the load: a key to efficiency
A consequence for contact users
Remote Loads or Displacements
[Figure: a point moment distributed to the internal surface of a hole; deformed shape shown.]
All nodes connected to one RBE3 node must be grouped into the same domain. This hurts load balancing! Try to reduce the number of RBE3 nodes.
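In APDL terms, a remote load of this kind boils down to an RBE3-style constraint; a hedged sketch (the node number and component name below are hypothetical):

    ! Hypothetical setup: node 9999 is a remote "pilot" node on the hole axis;
    ! component HOLE_SURF holds the nodes of the hole's inner surface
    RBE3,9999,ALL,HOLE_SURF   ! distribute the pilot node's loads to the surface
    F,9999,MZ,1000            ! point moment applied at the pilot node

Every node in HOLE_SURF ends up tied to node 9999, so the domain decomposition must keep them together – the load-balance penalty described above.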
The choice of hardware determines the performance of the solution
Interconnects to ensure data traffic
3 million DOF, direct sparse solver, SOLID95 elements – a worst case for a direct solver.
[Chart: wall time (0–6,000 s) vs. cores (16, 32, 64, 128); TrueScale InfiniBand vs. Gigabit Ethernet, 16 nodes, in-core memory.]
Disks to efficiently handle I/O
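One practical lever here is where the scratch and result files land. A hedged launch sketch (the executable name and paths are placeholders; -dir sets the working directory):

    # Keep solver scratch/result I/O on a fast local RAID 0 or SSD volume,
    # not on a network share
    ansys145 -b -dis -np 8 -dir /scratch/local/job1 -i model.dat -o model.out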
Approximate model size with respect to configuration (in million DOF)

Configuration             RAM (GB)    PCG      Sparse
PC1 (standard)            ~16         ~15      ~1
PC2 (high performance)    ~96         ~100     ~10

- PC1 high end: 64-bit, 12–24 GB RAM, 2 disks in RAID 0 (desktop or laptop)
- PC2 ultra high end: 64-bit, 96 GB RAM, 4 disks in RAID 0
Taking advantage of new hardware solutions: GPU
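A hedged sketch of enabling GPU acceleration at launch (-acc nvidia is the documented accelerator option of this era; availability and licensing depend on the release):

    # Request the NVIDIA GPU accelerator alongside 8 distributed processes
    ansys145 -b -dis -np 8 -acc nvidia -i model.dat -o model.out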
Speed-up from GPU technology
Solder joint benchmark – 4M DOF, creep strain analysis
[Figure: benchmark model showing mold, PCB, and solder balls.]
Linux cluster: each node contains 12 Intel Xeon 5600-series cores, 96 GB RAM, an NVIDIA Tesla M2070 GPU, and InfiniBand.
Results courtesy of MicroConsult Engineering GmbH.
Speed-up from multiple nodes with 1 GPU board per node
[Chart: elapsed time for the same solder joint model (mold, PCB, solder balls) in four configurations:
- 1 node @ 8 cores, no GPU
- 1 node @ 8 cores, 1 GPU
- 8 nodes @ 1 core, 8 GPUs
- 2 nodes @ 4 cores, 2 GPUs]
Results courtesy of MicroConsult Engineering GmbH.
Reduce computation times from hours to minutes, from days to hours – some examples
Typical benchmark results
[Chart 1: core solver speed-up (0–80x) vs. number of cores (1, 2, 4, 8, 32, 64, 128); ANSYS Mechanical 12.0, 10M DOF, Distributed ANSYS PCG solver, Intel Xeon 5500 series ("Nehalem").]
[Chart 2: solve time (y-axis 0–5,000) vs. number of cores (16, 32, 64, 128, 256); ANSYS Mechanical 12.0, 3M DOF, Distributed ANSYS sparse solver, AMD Opteron 2360 ("Barcelona"), QLogic TrueScale InfiniBand.]
What could it look like on your model?
Example: 6 million degrees of freedom, plasticity, contact, bolt pretension, 4 load steps – run with 1 HPC Pack.
BGA (Ball Grid Array) Model
BGA Model – Mesh pictures
BGA Model – Deformations and stresses
BGA Model – comparing solvers

                   PCG (level 1)    Sparse
Elapsed solve      425 s            4,512 s
Solver Mflops      16,066           35,690
Elapsed total      580 s            4,717 s
BGA Model – comparing hardware configurations (PCG solver)

                     Laptop (2 X9100 cores, 3.06 GHz)    Desktop (8 W5580 cores, 3.20 GHz)
Elapsed PCG          1,394 s                             425 s
PCG solver Mflops    5,020                               16,066
Elapsed total        1,753 s                             580 s
BGA Model – elapsed time vs. model size
[Chart: elapsed solve time (0–8,000 s) vs. model size (0–140 million DOF).]
In summary: use ANSYS HPC solutions to perform efficient design variations of large structural models
Q & A