23
Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

  • Upload
    dophuc

  • View
    243

  • Download
    0

Embed Size (px)

Citation preview

Page 1: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

Kostas Sikelis, Phd

Software Development specialist, Optistruct

MUMPS User Days, June 2017

MUMPS v 5.1.1

Page 2: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Outline

• Optistruct - Overview

• Comparison of MUMPS 5.1.1 vs 5.0.1

• 64 bit MUMPS

• BLR/ABLR – Preliminary results

• Wish list

Page 3: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

OptiStruct - Overview

Large Scale Computing and Parallelization

CompositesVibrations and

Acoustics

Linear and

Nonlinear

Analysis

Fatigue Multiphysics

Optimization

Page 4: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Optistruct – MUMPS usage

• Linear static analysis

• Symmetric positive definite or indefinite systems

• Nonlinear static/transient analysis

• Unsymmetric systems

• Lanczos Eigensolver

• Symmetric indefinite systems

• Direct Frequency response Analysis

• Symmetric complex systems

Page 5: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Optistruct – MUMPS Benchmark

Model # of Equations # of Nonzeros # of Elements Element Type

Knuckle 2.8M 117M 650K Solid (CTETRA10)

Car body 12M 320M 1.8M Shell (CQUAD4/CTRIA3)

366K Solid (CTETRA/CPENTA/CHEXA

• Incore linear static analysis.

Page 6: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Optistruct – MUMPS Benchmark

Data extracted from MUMPS report

• Time

• ELAPSED TIME IN ANALYSIS DRIVER

• ELAPSED TIME IN FACTORIZATION DRIVER

• ELAPSED TIME IN SOLVE DRIVER

• Total Memory

• TOTAL space in MBYTES for IC factorization

• ** EFF Min: Avg. Space in MBYTES per working proc (x # of MPI-Processes)

• ** Avg. Space in MBYTES per working proc during solve (x # of MPI-Processes)

Page 7: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

MUMPS in Optistruct – 5.1.1 vs 5.0.1

• Slight performance drop in Analysis phase due to 64bit METIS

• Improved MPI scaling both in Factorization and Solve phases

• Signifficantly improved SMP scaling both in Factorization and Solve phases

1 2 4 8 16

5.1.1 80 77 80 83 95

5.0.1 66 67 67 71 79

0

20

40

60

80

100

# of mpi

Analysis Wall time - Shell

1 2 4 8 12

5.1.1 52 53 53 54 55

5.0.1 45 44 44 46 47

0

10

20

30

40

50

60

# of mpi

Analysis Wall time - Solid

Page 8: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

1 2 4 8 12

5.1.1 7172 4567 2515 2227 1248

5.0.1 7072 3817 3230 2302 1316

010002000300040005000600070008000

# of mpi

Factorization Wall time (s) - Solid

-19,6%

-1,4%

22,1%3,2%

5,1%

MUMPS in Optistruct – 5.1.1 vs 5.0.1

• Slight performance drop in Analysis phase due to 64bit METIS

• Improved MPI scaling both in Factorization and Solve phases

• Signifficantly improved SMP scaling both in Factorization and Solve phases

1 2 4 8 16

5.1.1 335 168 108 77 55

5.0.1 328 180 116 72 60

050

100150200250300350400

# of mpi

Factorization Wall time (s) - Shell

-2,1%

6,7%

6,9%-6,9%

8,3%

Page 9: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

MUMPS in Optistruct – 5.1.1 vs 5.0.1

• Slight performance drop in Analysis phase due to 64bit METIS

• Improved MPI scaling both in Factorization and Solve phases

• Signifficantly improved SMP scaling both in Factorization and Solve phases (even more potential withaggressive setting)

1 2 4 8 12

5.1.1 7172 3605 1940 1073 779

5.0.1 7072 3729 2131 1286 1010

010002000300040005000600070008000

# of threads

Factorization Wall time (s) - Solid

22,9%

-1,4%

3,3%

9,0%16,6%

1 2 4 8 16

5.1.1 335 225 169 163 163

5.0.1 328 270 201 207 201

050

100150200250300350400

# of threads

Factorization Wall time (s) - Shell

-2,1%

16,7%15,9% 21,2% 18,9%

Page 10: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Optistruct – MUMPS 64bit (full)

• Motivation

• Industrial models challenge 32bit integer capacity

• Quick solution: Promote all integers to 64

• MPI-Free Implementation

• Compile Fortran source forcing all integers to be 64bit (-i8 or -fdefault-integer-8 in OPTF)

• Compile C source forcing all integers to be 64bit (-DINTSIZE64 in OPTC)

• Link with 64bit BLAS libraries

• MPI : Intel supports 64 bit integer mpi with option –ilp64 in mpirun command, but with the following limitation

• In REDUCE operations MPI_2INTEGER data type is not promoted, send and receive buffers must be INTEGER(4)

• For custom reduction functions used in REDUCE operations the integer arguments must be INTEGER(4)

Page 11: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Optistruct – MUMPS 64bit (full)

i. bcast_errors.F@16 : INTEGER IN(2), OUT(2) --> INTEGER(4) IN(2), OUT(2)

ii. tools_common.F@269 : INTEGER TEMP1(2), TEMP2(2) --> INTEGER(4) TEMP1(2), TEMP2(2)

iii. [dzsc]fac_scalings_simScale_util.F@23 : INTEGER IWRK(IWSZ) --> INTEGER(4) IWRK(IWSZ)

iv. [dzsc]fac_scalings_simScale_util.F@433-436: INTEGER LEN, INV(2*LEN), INOUTV(2*LEN), DTYPE --> INTEGER(4) LEN, INV(2*LEN), INOUTV(2*LEN), DTYPE

v. [dzsc]fac_scalings_simScale_util.F@466 : INTEGER IW(IWSZ) --> INTEGER(4) IW(IWSZ)

vi. [dzsc]fac_scalings_simScale_util.F@923 : INTEGER IWRK(IWSZ) --> INTEGER(4) IWRK(IWSZ)

vii. [dzsc]mumps_driver.F@1642 : INTEGER TMP1(2),TMP(2) --> INTEGER(4) TMP1(2),TMP(2)

• MUMPS 5.0.1 : Apply above changes manually

• MUMPS 5.1.1 : -DWORKAROUNDINTELILP64MPI2INTEGER

Page 12: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.Copyright © 2015 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

• MUMPS 5.0.1

• NZ -> 32 bit

• Stiffness definitely less than 2 billion

• 32bit integer for internal symbolic

analysis

• No more than 1 billion terms

• Even smaller model for unsymmetric

matrix from nonlinear with friction

• Interface to METIS 32bit API

• Need scotch sometimes

Optistruct – MUMPS 64 bit (selective)

• MUMPS 5.1.1

• NZ -> 32 bit, NNZ -> 64bit

• Backward compatible

• Solve stiffness with more than 2

billion terms if NNZ is used

• 64bit integer for internal symbolic

analysis

• Solve more than 1 billion terms

• Interface to METIS 64bit API

• Avoid METIS overflow

• Better performance

Page 13: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

MUMPS in Optistruct – selective 64 bit vs full

• Analysis phase is a bit faster

• Factorization phase occasionally much faster.

• Solve phase a bit slower.

• Considerable reduction in Total Memory.

1x12 2x6 4x3 6x2 12x1

OS32 55 53 53 57 55

OS64 55 55 55 56 57

0

10

20

30

40

50

60

#of mpi x # of threads

Analysis Wall time (s) - Solid

0% 3,8% 3,8% -1,8% 3,6%

1x16 2x8 4x4 8x2 16x1

OS32 80 76 76 80 95

OS64 82 79 81 85 99

0

20

40

60

80

100

120

#of mpi x # of threads

Analysis Wall time (s) - Shell

2,5% 3,9% 6,6% 6,2%4,2%

Page 14: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

MUMPS in Optistruct – selective 64 bit vs full

1x12 2x6 4x3 6x2 12x1

OS32 780 942 1020 1011 1248

OS64 776 921 1262 1118 1257

0

200

400

600

800

1000

1200

1400

#of mpi x # of threads

Factorization Wall time (s) - Solid

-0,5%-2,2% 23,7% 10,6%

0,7%

1x16 2x8 4x4 8x2 16x1

OS32 163 82 59 49 55

OS64 157 81 61 58 89

0

50

100

150

200

#of mpi x # of threads

Factorization Wall time (s) - Shell

-3,7%

-1,2%3,4%

18,4% 61,9%

• Analysis phase is a bit faster

• Factorization phase occasionally much faster.

• Solve phase a bit slower.

• Considerable reduction in Total Memory.

Page 15: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

MUMPS in Optistruct – selective 64 bit vs full

• Analysis phase is a bit faster

• Factorization phase occasionally much faster.

• Solve phase a bit slower.

• Considerable reduction in Total Memory.

1x12 2x6 4x3 6x2 12x1

OS32 9,2 6,5 5,2 5,3 5,8

OS64 9,6 6,5 4,8 4,6 5,4

0

2

4

6

8

10

12

#of mpi x # of threads

Solve Wall time (s) - Solid

4,3%

0%-7,7% -6,9%-13,2%

1 2 4 8 16

OS32 10,7 7,5 6,5 2,6 3,3

OS64 10,8 7,5 5,2 3,5 3,2

0

2

4

6

8

10

12

#of mpi x # of threads

Solve Wall time (s) - Shell

0,9%

0%-20%

34,6%-3%

Page 16: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

MUMPS in Optistruct – selective 64 bit vs full

• Analysis phase is a bit faster

• Factorization phase occasionally much faster.

• Solve phase a bit slower.

• Considerable reduction in Total Memory.

1x12 2x6 4x3 6x2 12x1

OS32 85122 102054 115483 152051 201719

OS64 91174 111843 137507 185438 225805

0

50000

100000

150000

200000

250000

#of mpi x # of threads

Total Memory Estimate (MB) - Solid

7,1%9,6%

19%

22%

12%

1x16 2x8 4x4 8x2 16x1

OS32 41182 43499 45981 52768 61797

OS64 42909 46238 52252 60256 71286

01000020000300004000050000600007000080000

#of mpi x # of threads

Total Memory Estimate (MB) - Shell

4,2%6,3% 13,6%

14,2%15,4%

Page 17: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

MUMPS in Optistruct – BLR/ABLR

• Analysis phase is a bit slower (for solid).

• Factorization and Solve phases up to 3 times faster (for solid)

• Reduced accuracy of the solution. Solution changes with the number of MPI-processes

1x16 2x8 4x4 8x2 16x1

BLR 84 83 91 99 108

ABLR 86 83 86 90 108

5.1.1 80 76 76 80 95

020406080

100120

#of mpi x # of threads

Analysis Wall time (s) - Shell

1x12 2x6 4x3 6x2 12x1

BLR 73 72 72 76 73

ABLR 72 72 72 76 73

5.1.1 55 53 53 57 55

01020304050607080

#of mpi x # of threads

Analysis Wall time (s) - Solid

Page 18: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

MUMPS in Optistruct – BLR/ABLR

• Analysis phase is a bit slower (for solid).

• Factorization and Solve phases up to 3 times faster (for solid)

• Reduced accuracy of the solution. Solution changes with the number of MPI-processes

1x16 2x8 4x4 8x2 16x1

BLR 153 77 53 46 55

ABLR 158 77 52 43 52

5.1.1 163 82 59 49 55

0

50

100

150

200

#of mpi x # of threads

Factorization Wall time (s) - Shell

1x12 2x6 4x3 6x2 12x1

BLR 362 282 294 328 349

ABLR 296 256 262 307 314

5.1.1 780 942 1020 1011 1248

0200400600800

100012001400

#of mpi x # of threads

Factorization Wall time (s) - Solid

x2,1x3,3 x3,5 x3 x3,5

Page 19: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

MUMPS in Optistruct – BLR/ABLR

• Analysis phase is a bit slower (for solid).

• Factorization and Solve phases up to 3 times faster (for solid)

• Reduced accuracy of the solution. Solution changes with the number of MPI-processes

1x12 2x6 4x3 6x2 12x1

BLR 2,4640E+03 2,5385E+03 2,4489E+03 2,6225E+03 2,5359E+03

ABLR 2,4640E+03 2,4649E+03 4,8555E+04 2,3939E+06 1,5674E+03

5.1.1 2,43367E+032,43367E+03 2,43367E+032,43367E+032,43367E+03

1,0E+001,0E+011,0E+021,0E+031,0E+041,0E+051,0E+061,0E+07

#of mpi x # of threads

Log(Compliance) - Solid

1x12 2x6 4x3 6x2 12x1

BLR 1,2578E-02 4,2218E-02 6,4007E-03 7,3649E-02 4,1007E-02

ABLR 1,2578E-02 1,8823E-02 2,3833E+00 1,9451E+00 3,4821E+00

5.1.1 2,3268E-10 2,3735E-10 2,4897E-10 2,8041E-10 2,9909E-10

1,0E-10

1,0E-08

1,0E-06

1,0E-04

1,0E-02

1,0E+00

#of mpi x # of threads

Epsilon - Solid

Page 20: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Optistruct – Wish list

• Robust detection of ill-conditioned (near-singular) matrix

• Often MUMPS provides physically meaningless solutions to ill-conditioned systems.

• Preferable just error out.

• Possibly a parameter similar to MAXRATIO used for LDLT i.e 𝒓 = 𝒎𝒂𝒙(𝑫𝒊𝒂𝒈 𝑨 𝒊𝒊

𝑫𝒊)

Page 21: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Optistruct – Wish list

• Detection of NaN in input matrix

• A failsafe in case input is erroneous.

• Preferable to error out instead of delivering an erroneous solution

Page 22: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

© 2016 Altair Engineering, Inc. Proprietary and Confidential. All rights reserved.

Optistruct – Wish list

• Suggestion of optimal reordering

0:00:00

0:02:53

0:05:46

0:08:38

0:11:31

0:14:24

0:17:17

0:20:10

0:23:02

0:25:55

METIS SCOTCH PT-SCOTCH

Total timing

0

5

10

15

20

25

30

35

40

METIS SCOTCH PT-SCOTCH

Operations (1.0E+12)

Page 23: MUMPS v 5.1mumps.enseeiht.fr/doc/ud_2017/Sikelis_Talk.pdf · Kostas Sikelis, Phd Software Development specialist, Optistruct MUMPS User Days, June 2017 MUMPS v 5.1.1

Questions?

THANK YOU

MUMPS Team, thank you for a great linear equation solver!