High Performance Computing: A Review of Parallel Computing with ANSYS solutions

“Efficient and Smart Solutions for Large Models”

© 2011 ANSYS, Inc. May 20, 2012



Use ANSYS HPC solutions to perform efficient design variations of large structural models


Everyone can take advantage of HPC solutions for faster computation and variations of large models


All users can benefit from HPC computations


Most analysis types can be accelerated

– Static linear or nonlinear analyses
– Buckling analyses
– Modal analyses
– Harmonic & Transient response analyses using the FULL method
– Low-frequency electromagnetic analysis
– High-frequency electromagnetic analysis
– Coupled-field analyses
– Superelements (use pass)
– Cyclic symmetry analyses


Size of the model – how large is “large”?

[Figure: speed-up (axis 0 to 4.5) versus number of CPUs (1 to 4), one curve per model size: 1000, 8000, 64000 and 512000 elements]


A simple and productive licensing scheme

ANSYS HPC Pack

ANSYS HPC Workgroup


Which part of the simulation is faster?


Not all steps of the simulation are parallel

Model solution


How should I read speed-up curves?

This is the solver part – excellent scaling!

This is YOUR time (elapsed)
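The gap between the solver curve and the elapsed-time curve can be illustrated with Amdahl's law. The sketch below is not taken from the slides: the function names and the 900 s / 100 s split are hypothetical, and the solver is assumed to scale perfectly, which real runs will not quite reach.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Overall speed-up when only `parallel_fraction` of the run time scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def elapsed_speedup(solver_time: float, other_time: float, solver_speedup: float) -> float:
    """Speed-up of the total elapsed time when only the solver phase gets faster."""
    before = solver_time + other_time
    after = solver_time / solver_speedup + other_time
    return before / after


if __name__ == "__main__":
    # Hypothetical run: 900 s in the solver, 100 s in steps that stay serial.
    for cores in (1, 2, 4, 8):
        # Idealized assumption: the solver speed-up equals the core count.
        total = elapsed_speedup(900.0, 100.0, float(cores))
        print(f"{cores} cores: solver x{cores}, elapsed x{total:.2f}")
```

Under these assumptions the elapsed speed-up flattens well below the solver speed-up, which is why the two curves should be read separately.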


The right combination of algorithms and hardware leads to maximum efficiency


Shared Memory Parallel vs Distributed Memory Parallel


Challenges and solutions for the distributed method

Challenges                                                 Solutions
Efficient and relevant decomposition                       Partitioning methods, Solver
Load balancing                                             Partitioning methods, Solver
Speed                                                      Hardware (Processors, Interconnects), Solver
Maximum problem size                                       Hardware (RAM), Solver
I/O to communicate between cores                           Hardware (Interconnects), MPI, Solver
I/O to write results and overflow files during solution    Hardware (Disks, Interconnects), MPI, Solver


Sparse or iterative solvers?

Solver type                  Distributed/Shared Memory
SPARSE (direct)              DMP/SMP
PCG (iterative)              DMP/SMP
LANB (direct, modal)         SMP
LANPCG (iterative, modal)    DMP/SMP
SNODE                        SMP


Get it in-core!

[Figure: solve time in seconds (axis 0 to 2500) for five memory configurations: Incore - 24GB, Optimal - 24GB, Minimum - 24GB, Optimal - 4GB, Minimum - 4GB]


Check the PCG level!


Balancing the load: a key to efficiency


A consequence for contact users


Remote Load or Displacements

A point moment is distributed to the internal surface of the hole

Deformed shape

All nodes connected to one RBE3 node have to be grouped into the same domain. This hurts load balance! Try to reduce the number of RBE3 nodes.
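Why a large grouped node set hurts can be shown with a simple load-balance measure. This is a minimal sketch, not the partitioner used by the solver: the function name and the node counts are hypothetical, and the cost of a domain is assumed to be proportional to its node count.

```python
def load_balance_efficiency(domain_loads):
    """Mean domain load divided by the largest domain load (1.0 means perfectly balanced)."""
    if not domain_loads:
        raise ValueError("at least one domain is required")
    mean_load = sum(domain_loads) / len(domain_loads)
    return mean_load / max(domain_loads)


if __name__ == "__main__":
    # Hypothetical 8000-node model split over 8 domains.
    balanced = [1000] * 8
    # Same model, but one domain must hold 2400 nodes tied to a single RBE3 node.
    constrained = [2400] + [800] * 7
    print(load_balance_efficiency(balanced))
    print(load_balance_efficiency(constrained))
```

In a distributed run every domain waits for the slowest one, so under this assumption the constrained split keeps the other cores idle for more than half of the time.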


The choice of hardware determines the performance of the solution


Interconnects to ensure data traffic

3 million DOF using the direct sparse solver. SOLID95 elements, the worst case for a direct solver.

[Figure: TrueScale vs GigE (16 nodes), in-core memory. Wall time (secs, axis 0 to 6000) versus cores (16, 32, 64, 128) for two interconnects: TrueScale and Gig-E]


Disks to efficiently handle I/O


Approximate size of models with respect to configuration

(Size of models in million DOF)

Configuration             RAM (GB)   PCG    Sparse
PC1 (standard)            ~16        ~15    ~1
PC2 (high performance)    ~96        ~100   ~10

– PC1: high end (64-bit, 12 GB <= RAM <= 24 GB, 2 disks in RAID 0, desktop or laptop)

– PC2: ultra high end (64-bit, 96 GB, 4 disks in RAID 0)
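The table above implies a rough ratio of model size to RAM for each solver. The sketch below averages the two rows and scales linearly to other RAM sizes. The linear scaling and the function names are assumptions made for illustration, not a sizing rule stated in the slides, and real limits depend on element types and solver settings.

```python
# Approximate values read from the table above (model sizes in million DOF).
CONFIGS = {
    "PC1": {"ram_gb": 16, "PCG": 15, "Sparse": 1},
    "PC2": {"ram_gb": 96, "PCG": 100, "Sparse": 10},
}


def mdof_per_gb(solver: str) -> float:
    """Average million DOF per GB of RAM over the tabulated configurations."""
    ratios = [cfg[solver] / cfg["ram_gb"] for cfg in CONFIGS.values()]
    return sum(ratios) / len(ratios)


def estimate_max_mdof(ram_gb: float, solver: str) -> float:
    """Linear extrapolation of the tabulated ratio (an assumption, not a guarantee)."""
    return ram_gb * mdof_per_gb(solver)


if __name__ == "__main__":
    for solver in ("PCG", "Sparse"):
        print(solver, round(mdof_per_gb(solver), 3), "million DOF per GB")
        print(solver, round(estimate_max_mdof(32, solver), 1), "million DOF at 32 GB")
```

Read this way, the table suggests on the order of one million DOF per GB for PCG and roughly ten times less for the sparse solver.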


Taking advantage of new hardware solutions: GPU


Speed-up from GPU technology

Solder Joint Benchmark - 4M DOF, Creep Strain Analysis

Results Courtesy of MicroConsult Engineering, GmbH

Linux cluster: each node contains 12 Intel Xeon 5600-series cores, 96 GB RAM, NVIDIA Tesla M2070, InfiniBand

[Figure: solder joint model with labeled parts: mold, PCB, solder balls]


Speed-up from multiple nodes with 1 GPU board per node

[Figure: solder joint model with labeled parts: mold, PCB, solder balls]

Results Courtesy of MicroConsult Engineering, GmbH

Configurations compared:

– 1 node @ 8 cores, no GPU
– 1 node @ 8 cores, 1 GPU
– 8 nodes @ 1 core, 8 GPU
– 2 nodes @ 4 cores, 2 GPU


Reduce computation times from hours to minutes, from days to hours – some examples


Typical benchmark results

[Figure: core solver speedup (axis 0 to 80) versus number of cores (1, 2, 4, 8, 32, 64, 128). ANSYS Mechanical 12.0, 10M DOF, Distributed ANSYS PCG Solver, Intel Xeon 5500 Processor Series ("Nehalem")]

[Figure: values on an axis from 0 to 5000 plotted at 16, 32, 64, 128 and 256 (axis names not recoverable). ANSYS Mechanical 12.0, 3M DOF, Distributed ANSYS Sparse Solver, AMD Opteron 2360 ("Barcelona"), QLogic TrueScale InfiniBand]


What could it look like on your model?

– 6 million degrees of freedom
– Plasticity, contact
– Bolt pretension
– 4 load steps

1 HPC Pack


BGA Model


BGA Model – Mesh pictures


BGA Model – Deformations and stresses


BGA Model – comparing solvers

                  PCG level 1    Sparse
Elapsed solve     425 sec        4512 sec
Solver Mflops     16066          35690
Elapsed total     580 sec        4717 sec
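The figures in the table above can be turned into ratios to make the comparison explicit. The sketch below only does arithmetic on the tabulated values. The dictionary keys and function name are illustrative choices, not names from ANSYS output files.

```python
# Values copied from the solver comparison table above.
RESULTS = {
    "PCG": {"elapsed_solve_s": 425, "solver_mflops": 16066, "elapsed_total_s": 580},
    "Sparse": {"elapsed_solve_s": 4512, "solver_mflops": 35690, "elapsed_total_s": 4717},
}


def sparse_to_pcg_ratio(metric: str) -> float:
    """Ratio of the Sparse value to the PCG value for one tabulated metric."""
    return RESULTS["Sparse"][metric] / RESULTS["PCG"][metric]


if __name__ == "__main__":
    for metric in ("elapsed_solve_s", "elapsed_total_s", "solver_mflops"):
        print(f"{metric}: Sparse / PCG = {sparse_to_pcg_ratio(metric):.2f}")
    # Time spent outside the solver, per run.
    for name, row in RESULTS.items():
        print(name, "non-solver time:", row["elapsed_total_s"] - row["elapsed_solve_s"], "s")
```

On this model the sparse solver sustains more than twice the Mflops rate of PCG yet takes over ten times longer to solve, so the flop rate alone says little about time to solution.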


BGA Model – comparing hardware configurations (PCG solver)

                     Laptop                      Desktop
                     2 X9100 cores, 3.06 GHz     8 W5580 cores, 3.20 GHz
Elapsed PCG          1394 sec                    425 sec
PCG solver Mflops    5020                        16066
Elapsed total        1753 sec                    580 sec
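The laptop and desktop figures above give a feel for how much of the extra core count shows up as speed-up. The sketch below is plain arithmetic on the tabulated values. Because the two machines differ in processor generation and clock speed as well as core count, the resulting ratio is only a rough indicator, not a pure parallel efficiency. The names used are illustrative.

```python
# Values copied from the hardware comparison table above.
LAPTOP = {"cores": 2, "elapsed_pcg_s": 1394, "pcg_mflops": 5020, "elapsed_total_s": 1753}
DESKTOP = {"cores": 8, "elapsed_pcg_s": 425, "pcg_mflops": 16066, "elapsed_total_s": 580}


def speedup(metric: str) -> float:
    """Laptop time divided by desktop time for a tabulated elapsed-time metric."""
    return LAPTOP[metric] / DESKTOP[metric]


if __name__ == "__main__":
    core_ratio = DESKTOP["cores"] / LAPTOP["cores"]
    solve = speedup("elapsed_pcg_s")
    total = speedup("elapsed_total_s")
    print(f"core ratio: {core_ratio:.0f}x")
    print(f"PCG solve speed-up: {solve:.2f}x")
    print(f"total elapsed speed-up: {total:.2f}x")
    print(f"solve speed-up per unit of core ratio: {solve / core_ratio:.2f}")
```

The total elapsed time improves less than the solver time, consistent with the earlier point that not all steps of the simulation are parallel.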


BGA Model – elapsed time vs model size

[Figure: elapsed solve time (sec, axis 0 to 8000) versus model size (million DOF, axis 0 to 140)]


Use ANSYS HPC solutions to perform efficient design variations of large structural models


Q & A