16
MSC Confidential MSC Nastran Sparse Direct Solvers for Tesla GPUs Cheng Liao May 16, 2012

MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

  • Upload
    others

  • View
    27

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

MSC Nastran Sparse Direct Solvers for

Tesla GPUs

Cheng Liao

May 16, 2012

Page 2: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

Background Information

• MSC Nastran is one of the oldest and most widely used FEA

solver codes for Aerospace, Automotive and manufacturing in

general.

• First GPU feature for real symmetric sparse direct solver is

released in 2012.1, 2nd half of 2011.

• Additional GPU features for complex and unsymmetric sparse

direct solvers will be in a 2012 point release. (subject to change)

• GPU rank update kernels are developed by Nvidia Engineers for

MSCLDL and MSCLU in house solvers. (Thank you Nvidia!)

• Unsymmetric, 3M and CAS extensions are done by MSC.

• On-going work.

2

Page 3: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

Sparse Direct Factorization

3

Original Matrix [A ] Factor Matrix [L][D][L]t

Page 4: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

Multi-frontal Algorithm

4

In a proper order, do the

following at each node.

Assembly

Pivoting

Block Gaussian Elimination:

Solve Factorization Update (Contribution Block)

From Global Stiffness &

contribution blocks

From Element Stiffness &

contribution blocks or

ttLDLADLA

L

AA

AA

21112122

1

111121

11

2221

11 00

1 2

3 4

6 7

5 8

9

11

10

Page 5: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

Three Prong Approach for FP Intensive Kernels

• Sys653=0 (default): segmented rank1 factorization and solve, segmented

s/d/c/zgemm update, thresholding 1x1 and 2x2 pivoting, medium segment size,

least memory usage.

• Sys653=1: segmented rank-1 factorization, segmented s/d/c/ztrsm solve,

segmented s/d/c/zgemm update, no pivoting, larger segment size, slightly more

memory usage.

• Sys653=3: s/d/c/zsytrf and s/d/c/zgetrf factorization, s/d/c/ztrsm solve, segmented

s/d/c/zgemm update, supernodal 1x1 and 2x2 pivoting, largest memory usage.

5

sys653=0 sys653=1 sys653=3

Page 6: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

Superelement Matrix Domain Solver

6

j

bb

j

bi

j

ib

j

iij

KK

KKK

2S

4S3S

1S

j

ib

j

ii

j

bi

j

bb

j

b KKKKK

n

j

j

bb KK1

bbb FXK

Superelement stiffness:

Superelement boundary stiffness:

Global stiffness assembly:

Global equilibrium:

Page 7: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

Segmented (rankN) Update with Staged DGEMM

7

S1 S2 S1 S2 S1 S2

omp omp omp omp omp omp

Copyback GPU->CPU

OMP update on CPU

Compute on GPU S2 S1

S1 S2

S2 S1

tLDL 211121

tLDLA 21112122

Copy & scale

Rank Size

nfront

icol ifrtf ifrtl

Page 8: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

C2070/Westmere-EP Update Gflops vs Rank Size

8

0

50

100

150

200

250

300

350

400

450

32 64 96 128 192 256 384 512

cs

cs_3m_cut

cu

cs_cas_3m

rs

ru

rs_cas

nfront=8000

(rank size)

Page 9: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

C2070/Westmere-EP Update Gflops vs Front Size

9

0

50

100

150

200

250

300

350

400

1k 2k 3k 5k 7k 8k 9k 11k

cs

cs_3m_cut

cu

cs_cas_3m

rs

ru

rs_cas

nrank=384

(front size)

Page 10: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

DMP+SMP+GPU (943K dof Crankshaft SOL101)

10

0

100

200

300

400

500

600

700

800

1G6C 1C 2G6C 6C

solver only

sys653=0

sys653=1

sys653=3

Westmere3.33ghz 2x6

Tesla C2070/2075 48GB

Lower, Better

Page 11: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

Large 2.4M dof Linear Static SOL101

11

0

20

40

60

80

100

120

1G4C 4C

sys653=0

sys653=1

sys653=3

Lower, Better Westmere2.93ghz 2x6

Tesla 2x M2070 96GB

Page 12: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

12

SOL108, NVH

Exterior Acoustics simulation

>5X Speedup with 1 GPU over Serial run

~2x Speedup with 2 GPUs over 8 cores

EMEA Customer Model

Structural & Acoustic mesh

with Infinite elements

DOF: 0.165M 0

2

4

6

8

serial 4c 4c1g 8c (dmp=2) 8c2g (dmp=2)

Exterior Acoustics (3 FREQ1 increments) Speedup over Serial run

Server node: Westmere 2.93 GHz, 2x 6 core

Tesla 2x M2070 GPU, 96GB memory

Page 13: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

13

SOL108, NVH

Trimmed Car Body Frequency Response

USA Automotive OEM Model

Grids: 1.2M

Shells (CQUAD4): 1.04M

Solids (CTETRA): 0.1M

DOF: 7.47M 0

50

100

150

200

250

serial 4c 4c+1g 8c (dmp=2on 2 nodes)

8c+2g (dmp=2on 2 nodes)

Trimmed Car Body NVH (3 FREQ1 increments; 5 load cases)

Elapsed Time in minutes

Server node: Westmere 2.93 GHz, 2x 6 core

Tesla 2x M2070 GPU, 96GB memory 1x

2x

3.7x 3.7x 5.6x

~4X Speedup with 1 GPU over Serial run

1.55x Speedup with 2 GPUs over 8 cores

Page 14: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

14

SOL108, NVH

Coupled Structural-Acoustics simulation

EMEA Automotive OEM Model

Grids: 0.71M

CTETRAs: 3.8M

DOF: 0.71M 0

20

40

60

80

serial 4c (smp) 4c + 1g 8c (smp) 8c(dmp=2)

8c + 2g(dmp =2)

NVH (FREQ1: limited to 5 frequency increments) Elapsed Time in Minutes

1x

2x

4x

2.5x

3.6x 5.5x

Server node: Westmere 2.93 GHz, 2x 6 core

Tesla 2x M2070 GPU, 96GB memory

4X Speedup with 1 GPU over Serial run

~1.55x Speedup with 2 GPUs over 8 cores

Page 15: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

Conclusions

• GPGPU can give very significant performance boost for direct

solver intensive large jobs, ie. max front > 10000 for real data and

> 5000 for complex data models.

• Multiple GPGPU performance is delivered by SOL108

(embarrassingly parallel) and matrix domain solver in SOL101.

• Nvidia and MSC continue to work together to tune BLAS and

LAPACK kernels for MSCLDL and MSCLU.

• Tree-level parallelism, using GPU memory to cache large sub-tree

computing, etc. would provide future road maps to even greater

performance.

• A number of other Nastran functional areas are candidates for

GPU acceleration.

15

Page 16: MSC Nastran Sparse Direct Solvers for Tesla GPUs - GTC 2012developer.download.nvidia.com/.../S0064-GTC2012-MSC-Nastran-Solver… · • MSC Nastran is one of the oldest and most widely

MSC Confidential

Acknowledgement

16

NVIDIA: Steve Harpster, Srinivas Kodiyalam, Stan Posey, Thomas Reed,

Steve Rennich, Peng Wang, Shucai Xiao.

MSC: Sunny Gogar, Joe Griffin, Dan’l Pierce, Peter Schartz, Alex TenEyck,

Paul Vanderwalt, Ted Wertheimer, Gaowen Ye, members of MSC

India QA team.