
© 2011 ANSYS, Inc. May 20, 2012 1

High Performance Computing: A Review of Parallel Computing with ANSYS solutions

“Efficient and Smart Solutions for Large Models”


Use ANSYS HPC solutions to perform efficient design variations of large structural models


Everyone can take advantage of HPC solutions for faster computation and variations of large models


All users can benefit from HPC computations


Most analysis types can be accelerated:

- Static linear or nonlinear analyses
- Buckling analyses
- Modal analyses
- Harmonic and transient response analyses using the FULL method
- Low-frequency electromagnetic analyses
- High-frequency electromagnetic analyses
- Coupled-field analyses
- Superelements (use pass)
- Cyclic symmetry analyses


Size of the model – how large is “large”?

[Chart: solver speed-up (0 to 4.5x) vs. number of CPUs (1 to 4) for models of 1,000, 8,000, 64,000, and 512,000 elements.]


A simple and productive licensing scheme

ANSYS HPC Pack

ANSYS HPC Workgroup


Which part of the simulation is faster?


Not all steps of the simulation are parallel

Model solution
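The limit this slide points to can be sketched with Amdahl's law: when only the model solution runs in parallel, the serial pre- and post-processing steps cap the overall speed-up. A minimal sketch, where the 90% parallel fraction is an illustrative assumption rather than a measured value:

```python
# Amdahl's law: only the parallel part of the run (here, the model solution)
# speeds up with more cores; pre- and post-processing stay serial.
# The parallel fraction below is an illustrative assumption, not measured data.

def amdahl_speedup(parallel_fraction, cores):
    """Overall speed-up when only `parallel_fraction` of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)

# Even with 90% of the elapsed time in the solver, 8 cores give
# well under an 8x overall speed-up.
print(round(amdahl_speedup(0.9, 8), 2))
```

With an infinite number of cores the same model caps out at 1 / (1 - 0.9) = 10x, which is why the next slide distinguishes solver scaling from your total elapsed time.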


How should I read speed-up curves? The solver part shows excellent scaling, but what you experience is the total elapsed time.


The right combination of algorithms and hardware leads to maximum efficiency


Shared Memory Parallel vs Distributed Memory Parallel


Challenges and solutions for the distributed method

Challenge                                                 Solution
Efficient and relevant decomposition                      Partitioning methods, Solver
Load balancing                                            Partitioning methods, Solver
Speed                                                     Hardware (processors, interconnects), Solver
Maximum problem size                                      Hardware (RAM), Solver
I/O to communicate between cores                          Hardware (interconnects), MPI, Solver
I/O to write results and overflow files during solution   Hardware (disks, interconnects), MPI, Solver
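The load-balancing challenge above can be illustrated with the simplest possible partitioner: splitting the elements into near-equal domains. This is only a toy sketch of the balance aspect; real partitioners also minimize the interface between domains, which this ignores:

```python
# Toy domain decomposition: split n_elements into n_domains nearly equal
# chunks. Real partitioning methods also minimize the interfaces between
# domains; this sketch only shows the load-balance aspect.

def partition_sizes(n_elements, n_domains):
    """Return per-domain element counts differing by at most one."""
    base, extra = divmod(n_elements, n_domains)
    return [base + (1 if i < extra else 0) for i in range(n_domains)]

print(partition_sizes(1000, 3))
```

If one domain ends up much larger than the others, every other core waits for it at each synchronization point, which is exactly why balanced decomposition matters for speed.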


Sparse or iterative solvers?

Solver type                 Distributed/Shared Memory
SPARSE (direct)             DMP/SMP
PCG (iterative)             DMP/SMP
LANB (direct, modal)        SMP
LANPCG (iterative, modal)   DMP/SMP
SNODE                       SMP
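The solver table above can be expressed as a lookup, so a script can check whether a chosen solver runs under Distributed Memory Parallel (DMP) or only Shared Memory Parallel (SMP). The data is transcribed from the slide; the helper function is just an illustration:

```python
# Solver -> supported parallel modes, transcribed from the table above.
SOLVER_MODES = {
    "SPARSE": {"DMP", "SMP"},
    "PCG": {"DMP", "SMP"},
    "LANB": {"SMP"},
    "LANPCG": {"DMP", "SMP"},
    "SNODE": {"SMP"},
}

def supports(solver, mode):
    """True if the given solver runs under the given parallel mode."""
    return mode in SOLVER_MODES[solver]

print(supports("PCG", "DMP"), supports("LANB", "DMP"))
```

In practice this means a modal analysis that must scale across cluster nodes would use LANPCG rather than LANB, since LANB is SMP-only.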


Get it in-core!

[Chart: sparse solver elapsed time (0 to 2500 s) for five memory configurations: in-core with 24 GB, optimal out-of-core with 24 GB, minimum out-of-core with 24 GB, optimal out-of-core with 4 GB, and minimum out-of-core with 4 GB; the in-core run is fastest.]
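The "get it in-core" advice can be turned into a rough sizing check. The ~10 GB per million DOF figure below is a rule of thumb consistent with the sizing table later in this deck, not an exact requirement, and the helper is a sketch rather than what ANSYS computes internally:

```python
# "Get it in-core": the sparse solver is fastest when the factorized matrix
# fits entirely in RAM. The ~10 GB per million DOF figure is an assumed
# rule of thumb (roughly consistent with the deck's sizing table).

GB_PER_MDOF_SPARSE = 10.0  # assumed rough in-core requirement

def runs_in_core(model_mdof, ram_gb):
    """Rough check: does an in-core sparse solve fit in this much RAM?"""
    return model_mdof * GB_PER_MDOF_SPARSE <= ram_gb

print(runs_in_core(2.0, 24))  # a 2 MDOF model on a 24 GB machine
```

When the check fails, the solver falls back to out-of-core mode and disk I/O dominates, which is the gap the bar chart above illustrates.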


Check the PCG level!


Balancing the load: a key to efficiency


A consequence for contact users


Remote Load or Displacements

A point moment is applied and distributed to the internal surface of the hole

Deformed shape

All nodes connected to one RBE3 node have to be grouped into the same domain. This hurts load balance! Try to reduce the number of RBE3 nodes.


The choice of hardware will condition the performance of the solution


Interconnects to ensure data traffic

3 million DOF using the direct sparse solver; SOLID95 elements (the worst case for a direct solver).

[Chart: wall time (s) vs. number of cores (16, 32, 64, 128) across 16 nodes, in-core memory: QLogic TrueScale InfiniBand vs. Gigabit Ethernet.]
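The gap between the two interconnects can be sketched with the classic latency-plus-bandwidth model of a message exchange. The latency and bandwidth figures below are illustrative ballpark assumptions, not measured values from the benchmark:

```python
# Why the interconnect matters for distributed solves: per-exchange
# communication time = latency + message_size / bandwidth.
# The latency and bandwidth figures below are illustrative assumptions.

def comm_time(message_bytes, latency_s, bandwidth_bytes_per_s):
    """Simple latency + size/bandwidth model for one message exchange."""
    return latency_s + message_bytes / bandwidth_bytes_per_s

# Assumed ballpark figures: Gigabit Ethernet vs. an InfiniBand-class fabric.
GIGE = {"latency": 50e-6, "bandwidth": 125e6}  # ~50 us, ~1 Gbit/s
IB = {"latency": 2e-6, "bandwidth": 4e9}       # ~2 us,  ~32 Gbit/s

msg = 1_000_000  # 1 MB of interface data exchanged between two domains
t_gige = comm_time(msg, GIGE["latency"], GIGE["bandwidth"])
t_ib = comm_time(msg, IB["latency"], IB["bandwidth"])
print(f"GigE: {t_gige * 1e3:.2f} ms, InfiniBand: {t_ib * 1e3:.2f} ms")
```

Since a distributed sparse solve exchanges interface data at every factorization and solve step, an order-of-magnitude difference per message compounds into the large wall-time gap shown above.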


Disks to efficiently handle I/O


Approximate model size per configuration (in millions of DOF)

Configuration              RAM (GB)   PCG     Sparse
PC1 (standard)             ~16        ~15     ~1
PC2 (high performance)     ~96        ~100    ~10

- PC1 high end: 64-bit, 12 GB <= RAM <= 24 GB, 2 disks in RAID 0 (desktop or laptop)
- PC2 ultra high end: 64-bit, 96 GB, 4 disks in RAID 0
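The table implies rough capacity ratios: about 1 million DOF per GB of RAM for PCG, and about a tenth of that for the direct sparse solver. A sketch using those ratios (approximations read off the slide, not vendor guarantees):

```python
# Rough capacity estimate derived from the sizing table above:
# PCG handles roughly 1 MDOF per GB of RAM, the direct sparse solver
# roughly 0.1 MDOF per GB. These ratios are approximations, not guarantees.

MDOF_PER_GB = {"pcg": 1.0, "sparse": 0.1}

def max_mdof(ram_gb, solver):
    """Estimate the largest model (in millions of DOF) a machine can solve."""
    return ram_gb * MDOF_PER_GB[solver]

print(max_mdof(96, "pcg"))     # the PC2-class machine with PCG
print(max_mdof(96, "sparse"))  # same machine with the sparse solver
```

The same arithmetic explains the table's PC1 row: 16 GB supports roughly a 15 MDOF PCG model but only about a 1 MDOF in-core sparse model.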


Taking advantage of new hardware solutions: GPU


Speed-up from GPU technology

Solder Joint Benchmark - 4M DOF, Creep Strain Analysis

Results Courtesy of MicroConsult Engineering, GmbH

Linux cluster: each node contains 12 Intel Xeon 5600-series cores, 96 GB RAM, an NVIDIA Tesla M2070 GPU, and InfiniBand.

[Figure: benchmark model geometry showing the mold, PCB, and solder balls.]


Speed-up from multiple nodes with 1 GPU board per node

Results courtesy of MicroConsult Engineering, GmbH

[Chart: elapsed solve time for four configurations: 1 node @ 8 cores, no GPU; 1 node @ 8 cores, 1 GPU; 2 nodes @ 4 cores, 2 GPUs; 8 nodes @ 1 core, 8 GPUs.]


Reduce computation times from hours to minutes, from days to hours – some examples


Typical benchmark results

[Chart 1: core solver speed-up (0 to 80x) vs. number of cores (1 to 128); ANSYS Mechanical 12.0, 10M DOF, Distributed ANSYS PCG solver, Intel Xeon 5500 processor series ("Nehalem").]

[Chart 2: solve time (0 to 5000 s) vs. number of cores (16 to 256); ANSYS Mechanical 12.0, 3M DOF, Distributed ANSYS sparse solver, AMD Opteron 2360 ("Barcelona"), QLogic TrueScale InfiniBand.]
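One way to read benchmark curves like those above is to convert speed-up into parallel efficiency, i.e. the fraction of ideal linear scaling achieved. The sample readings below are illustrative values, not the measured benchmark data from the charts:

```python
# Parallel efficiency = speed-up / core count. The sample values below are
# illustrative, not the measured data from the benchmark charts above.

def efficiency(speedup, cores):
    """Fraction of ideal linear scaling achieved."""
    return speedup / cores

# Hypothetical readings: near-linear at low core counts, tailing off later.
for cores, speedup in [(8, 7.5), (32, 26.0), (128, 70.0)]:
    print(f"{cores:4d} cores: {speedup:5.1f}x speed-up, "
          f"{efficiency(speedup, cores):.0%} efficiency")
```

Falling efficiency at high core counts is expected: per-core work shrinks while communication overhead does not, so adding cores eventually stops paying for itself.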


What could it look like on your model?

6 million degrees of freedom; plasticity, contact, bolt pretension; 4 load steps

1 HPC Pack


BGA Model


BGA Model – Mesh pictures


BGA Model – Deformations and stresses


BGA Model – comparing solvers

                 PCG (level 1)   Sparse
Elapsed solve    425 s           4512 s
Solver Mflops    16066           35690
Elapsed total    580 s           4717 s
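The table above can be reduced to speed-up ratios. The timings are copied from the slide; the point of the snippet is only the ratio arithmetic:

```python
# Solver comparison for the BGA model; timings copied from the table above.
pcg = {"elapsed_solve_s": 425, "elapsed_total_s": 580}
sparse = {"elapsed_solve_s": 4512, "elapsed_total_s": 4717}

solve_ratio = sparse["elapsed_solve_s"] / pcg["elapsed_solve_s"]
total_ratio = sparse["elapsed_total_s"] / pcg["elapsed_total_s"]
print(f"PCG is {solve_ratio:.1f}x faster in the solve, "
      f"{total_ratio:.1f}x faster overall")
```

Note the sparse solver sustains more than twice the Mflops yet is an order of magnitude slower here: for this model the iterative PCG solver simply performs far less arithmetic, so raw floating-point rate alone does not decide the winner.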


BGA Model – comparing hardware configurations (PCG solver)

                    Laptop (2 X9100 cores, 3.06 GHz)   Desktop (8 W5580 cores, 3.20 GHz)
Elapsed PCG         1394 s                             425 s
PCG solver Mflops   5020                               16066
Elapsed total       1753 s                             580 s
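A quick check on the table above: the elapsed-time ratio tracks the sustained-Mflops ratio closely, i.e. the faster machine wins roughly in proportion to its achieved floating-point rate. Values are copied from the slide:

```python
# Hardware comparison for the PCG solve; numbers copied from the table above.
laptop = {"elapsed_s": 1394, "mflops": 5020}
desktop = {"elapsed_s": 425, "mflops": 16066}

time_ratio = laptop["elapsed_s"] / desktop["elapsed_s"]
mflops_ratio = desktop["mflops"] / laptop["mflops"]
print(f"time ratio {time_ratio:.2f}x vs Mflops ratio {mflops_ratio:.2f}x")
```

Both ratios come out near 3.2x, which is consistent with the PCG solve being compute-bound on these machines rather than limited by disk I/O.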


BGA Model – elapsed time vs model size

[Chart: elapsed solve time (0 to 8000 s) vs. model size (0 to 140 million DOF).]


Use ANSYS HPC solutions to perform efficient design variations of large structural models


Q & A
