Parallel ESSL for Linux on Power 5.2: Guide and Reference

IBM Parallel Engineering and Scientific Subroutine Library for Linux on POWER
Version 5 Release 2

Parallel ESSL Guide and Reference

GA38-0701-02

IBM

Note: Before using this information and the product it supports, read the information in “Notices” on page 1075.

This edition applies to:
• Version 5 Release 2 of the IBM Parallel Engineering and Scientific Subroutine Library (Parallel ESSL) for Linux on POWER licensed program, program number 5765-EL5

and to all subsequent releases and modifications until otherwise indicated by a new edition.

Significant changes or additions to the text and illustrations are marked by a vertical line (|) to the left of the change.

IBM welcomes your comments. See the topic “How to Send Your Comments” on page xvii. When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

© Copyright IBM Corporation 1995, 2015.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Tables

About this information
  How to Find a Subroutine Description
  Where to Find Related Publications
  Using Bibliography References
  IBM Request for Enhancement (RFE) Community
  Special Terms
  How to Interpret Product Names Used in This Document
  Abbreviated Names
  Fonts
  Scalar Data Notations
  Special Characters, Symbols, Expressions, and Abbreviations
  How to Interpret the Subroutine Descriptions
    Syntax
    On Entry
    On Return
    Notes and Coding Rules
    Error Conditions
    Example
  How to Send Your Comments

Summary of changes

Part 1. Guide Information

Chapter 1. Overview, Requirements, and List of Subroutines
  Overview of Parallel ESSL
    How Parallel ESSL Works
    Accuracy of the Computations
    The Fortran Language Interface to the Parallel ESSL Subroutines
  Hardware and Software Products That Can Be Used with Parallel ESSL
    Hardware Products Supported by Parallel ESSL
    Operating Systems Supported by Parallel ESSL
    Software Products Required by Parallel ESSL
    Thread Safety and Parallel ESSL
    Installation and Customization of Parallel ESSL
  Software Products Required for Displaying Parallel ESSL Documentation
  BLACS—Usage in Parallel ESSL for Communication
  List of Parallel ESSL Subroutines
    Level 2 PBLAS
    Level 3 PBLAS
    Linear Algebraic Equations
    Eigensystem Analysis and Singular Value Analysis
    Fourier Transforms
    Random Number Generation
    Utilities

Chapter 2. Distributing Your Data
  Concepts
    About Global Data Structures
    About Process Grids
    Data Distribution and Your Program
    Block, Cyclic, and Block-Cyclic Data Distributions
  Specifying and Distributing Data in Your Program
    Specifying Block-Cyclically-Distributed Vectors and Matrices
    Specifying Block-Cyclically-Distributed Matrices for the Banded Linear Algebraic Equations
    Distributing Data Structures

Chapter 3. Coding and Running Your Program
  Coding Tips for Optimizing Parallel Performance
    Choosing How Many MPI Tasks and Computational Threads to Use
    Considerations for Using GPUs with Parallel ESSL
    Parallel ESSL Techniques
  Avoiding Conflicts with Parallel ESSL and ESSL Routine Names
  Coding Your Program
    Using the BLACS
    Using Extrinsic Procedures—The Fortran 90 Sparse Linear Algebraic Equation Subroutines
    Setting Up the Parallel ESSL Header File for C and C++
    Setting Up the C Interface for the BLACS Header File for C and C++
    Application Program Outline
    Application Program Outline for the Fortran 90 Sparse Linear Algebraic Equations and Their Utilities
    Application Program Outline for the Fortran 77 Sparse Linear Algebraic Equations and Their Utilities
  Running Your Program
    Fortran Program Procedures
    C Program Procedures
    C++ Program Procedures

Chapter 4. Migrating Your Programs
  Migrating to Parallel ESSL Version 5 Release 2
  Migrating to Parallel ESSL Version 5 Release 1

Chapter 5. Using Error Handling
  Where to Find More Information About Errors
  Getting Help from IBM Support
  National Language Support
  PESSL_ERROR_SYNC Environment Variable
  Dealing with Errors
    Program Exceptions
    Input-Argument Errors


    Computational Errors
    Resource Errors
    Communication Errors
    Informational and Attention Messages
    Miscellaneous Errors
    ESSL Error Messages
    MPI Error Messages
  Messages
    Message Conventions
    Input-Argument Error Messages (001-299)
    Computational Error Messages (300-399)
    Resource Error Messages (400-499)
    Communication Error Messages (500-599)
    Informational and Attention Messages (600-699)
    Miscellaneous Error Messages (700-799)
    Input-Argument Error Messages (800-999)

Part 2. Reference Information

Chapter 6. Level 2 PBLAS
  Overview of the Level 2 PBLAS Subroutines
  Level 2 PBLAS Subroutines
  PDGEMV and PZGEMV — Matrix-Vector Product for a General Matrix or Its Transpose
  PDSYMV and PZHEMV — Matrix-Vector Product for a Real Symmetric or a Complex Hermitian Matrix
  PDGER, PZGERC, and PZGERU — Rank-One Update of a General Matrix
  PDSYR and PZHER — Rank-One Update of a Real Symmetric or a Complex Hermitian Matrix
  PDSYR2 and PZHER2 — Rank-Two Update of a Real Symmetric or a Complex Hermitian Matrix
  PDTRMV and PZTRMV — Matrix-Vector Product for a Triangular Matrix or Its Transpose
  PDTRSV and PZTRSV — Solution of Triangular System of Equations with a Single Right-Hand Side

Chapter 7. Level 3 PBLAS
  Overview of the Level 3 PBLAS Subroutines
  Level 3 PBLAS Subroutines
  PDGEMM and PZGEMM — Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose
  PDSYMM, PZSYMM, and PZHEMM — Matrix-Matrix Product Where One Matrix is Real or Complex Symmetric or Complex Hermitian
  PDTRMM and PZTRMM — Triangular Matrix-Matrix Product
  PDTRSM and PZTRSM — Solution of Triangular System of Equations with Multiple Right-Hand Sides
  PDSYRK, PZSYRK, and PZHERK — Rank-K Update of a Real or Complex Symmetric or a Complex Hermitian Matrix
  PDSYR2K, PZSYR2K, and PZHER2K — Rank-2K Update of a Real or Complex Symmetric or a Complex Hermitian Matrix
  PDTRAN, PZTRANC, and PZTRANU — Matrix Transpose for a General Matrix

Chapter 8. Linear Algebraic Equations
  Overview of the Dense Linear Algebraic Equation Subroutines
  Overview of the Banded Linear Algebraic Equation Subroutines
  Overview of the Fortran 90 Sparse Linear Algebraic Equation Subroutines
  Overview of the Fortran 77 Sparse Linear Algebraic Equation Subroutines
  Dense Linear Algebraic Equation Subroutines
  PDGESV and PZGESV — General Matrix Factorization and Solve
  PDGETRF and PZGETRF — General Matrix Factorization
  PDGETRS and PZGETRS — General Matrix Solve
  PDGETRI and PZGETRI — General Matrix Inverse
  PDGECON and PZGECON — Estimate the Reciprocal of the Condition Number of a General Matrix
  PDGEQRF and PZGEQRF — General Matrix QR Factorization
  PDGELS and PZGELS — General Matrix Least Squares Solution
  PDPOSV and PZPOSV — Positive Definite Real Symmetric or Complex Hermitian Matrix Factorization and Solve
  PDPOTRF and PZPOTRF — Positive Definite Real Symmetric or Complex Hermitian Matrix Factorization
  PDPOTRS and PZPOTRS — Positive Definite Real Symmetric or Complex Hermitian Matrix Solve
  PDPOTRI and PZPOTRI — Positive Definite Real Symmetric or Complex Hermitian Matrix Inverse
  PDPOCON and PZPOCON — Estimation of the Reciprocal of the Condition Number of a Positive Definite Real Symmetric or Complex Hermitian Matrix
  PDTRTRI and PZTRTRI — Triangular Matrix Inverse
  Banded Linear Algebraic Equation Subroutines
  PDPBSV and PZPBSV — Positive Definite Real Symmetric or Complex Hermitian Band Matrix Factorization and Solve
  PDPBTRF and PZPBTRF — Positive Definite Real Symmetric or Complex Hermitian Band Matrix Factorization
  PDPBTRS and PZPBTRS — Positive Definite Real Symmetric or Complex Hermitian Band Matrix Solve
  PDGTSV, PDDTSV, and PZDTSV — General Tridiagonal Matrix Factorization and Solve
  PDGTTRF, PDDTTRF, and PZDTTRF — General Tridiagonal Matrix Factorization
  PDGTTRS, PDDTTRS, and PZDTTRS — General Tridiagonal Matrix Solve
  PDPTSV and PZPTSV — Positive Definite Real Symmetric or Complex Hermitian Tridiagonal Matrix Factorization and Solve
  PDPTTRF and PZPTTRF — Positive Definite Real Symmetric or Complex Hermitian Tridiagonal Matrix Factorization


  PDPTTRS and PZPTTRS — Positive Definite Real Symmetric or Complex Hermitian Tridiagonal Matrix Solve
  Fortran 90 Sparse Linear Algebraic Equation Subroutines and Their Utility Subroutines
  PADALL — Allocates Space for an Array Descriptor for a General Sparse Matrix
  PSPALL — Allocates Space for a General Sparse Matrix
  PGEALL — Allocates Space for a Dense Vector
  PSPINS — Inserts Local Data into a General Sparse Matrix
  PGEINS — Inserts Local Data into a Dense Vector
  PSPASB — Assembles a General Sparse Matrix
  PGEASB — Assembles a Dense Vector
  PSPGPR — Preconditioner for a General Sparse Matrix
  PSPGIS — Iterative Linear System Solver for a General Sparse Matrix
  PGEFREE — Deallocates Space for a Dense Vector
  PSPFREE — Deallocates Space for a General Sparse Matrix
  PADFREE — Deallocates Space for an Array Descriptor for a General Sparse Matrix
  Example—Using the Fortran 90 Sparse Subroutines
    Output
    Application Program
  Fortran 77 Sparse Linear Algebraic Equation Subroutines and Their Utility Subroutines
  PADINIT — Initializes an Array Descriptor for a General Sparse Matrix
  PDSPINIT — Initializes a General Sparse Matrix
  PDSPINS — Inserts Local Data into a General Sparse Matrix
  PDGEINS — Inserts Local Data into a Dense Vector
  PDSPASB — Assembles a General Sparse Matrix
  PDGEASB — Assembles a Dense Vector
  PDSPGPR — Preconditioner for a General Sparse Matrix
  PDSPGIS — Iterative Linear System Solver for a General Sparse Matrix
  Example—Using the Fortran 77 Sparse Subroutines
    Application Program

Chapter 9. Eigensystem Analysis and Singular Value Analysis
  Overview of the Eigensystem Analysis and Singular Value Analysis Subroutines
  Eigensystem Analysis and Singular Value Analysis Subroutines
  PDSYEVX and PZHEEVX — Selected Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric or Complex Hermitian Matrix
  PDSYEVD and PZHEEVD — All Eigenvalues and Eigenvectors of a Real Symmetric or Complex Hermitian Matrix Using a Parallel Divide-and-Conquer Algorithm
  PDSYEV and PZHEEV — All Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric or Complex Hermitian Matrix
  PDSYGVX and PZHEGVX — Selected Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric or Complex Hermitian Positive Definite Generalized Eigenproblem
  PDSYNTRD, PZHENTRD, PDSYTRD, and PZHETRD — Reduce a Real Symmetric or Complex Hermitian Matrix to Tridiagonal Form
  PDSYNGST, PZHENGST, PDSYGST, and PZHEGST — Reduce a Real Symmetric or Complex Hermitian Positive Definite Generalized Eigenproblem to Standard Form
  PDGEHRD — Reduce a General Matrix to Upper Hessenberg Form
  PDGEBRD and PZGEBRD — Reduce a General Matrix to Bidiagonal Form
  PDGESVD and PZGESVD — Singular Value Decomposition of a General Matrix

Chapter 10. Fourier Transforms
  Overview of the Fourier Transforms Subroutines
    Determining an Acceptable Length of a Transform
    Acceptable Lengths for the Transforms
  Fourier Transforms Subroutines
  PSCFTD and PDCFTD — Multidimensional Complex Fourier Transforms
  PSRCFTD and PDRCFTD — Multidimensional Real-to-Complex Fourier Transforms
  PSCRFTD and PDCRFTD — Multidimensional Complex-to-Real Fourier Transforms
  PSCFT2 and PDCFT2 — Complex Fourier Transforms in Two Dimensions
  PSRCFT2 and PDRCFT2 — Real-to-Complex Fourier Transforms in Two Dimensions
  PSCRFT2 and PDCRFT2 — Complex-to-Real Fourier Transforms in Two Dimensions
  PSCFT3 and PDCFT3 — Complex Fourier Transforms in Three Dimensions
  PSRCFT3 and PDRCFT3 — Real-to-Complex Fourier Transforms in Three Dimensions
  PSCRFT3 and PDCRFT3 — Complex-to-Real Fourier Transforms in Three Dimensions

Chapter 11. Random Number Generation
  Overview of the Random Number Generation Subroutine
  Random Number Generation Subroutine
  PDURNG — Uniform Random Number Generator

Chapter 12. Utilities
  Overview of the Utility Subroutines
  Utility Subroutines
  IPESSL — Determine the Level of Parallel ESSL Installed on Your System
  DESCINIT — Initialize a Type-1 Array Descriptor with Error Checking
  DESCSET — Initialize a Type-1 Array Descriptor
  ICEIL — Compute the Ceiling of the Division of Two Integers


  ILCM — Compute the Least Common Multiple of Two Positive Integers
  INDXG2L — Compute the Local Row or Column Index of a Global Element of a Block-Cyclically Distributed Matrix
  INDXG2P — Compute the Process Row or Column Index of a Global Element of a Block-Cyclically Distributed Matrix
  INDXL2G — Compute the Global Row or Column Index of a Local Element of a Block-Cyclically Distributed Matrix
  INFOG1L — Compute the Starting Local Row or Column Index and Process Row or Column Index of a Global Element of a Block-Cyclically Distributed Matrix
  INFOG2L — Compute the Starting Local Row and Column Indices and the Process Row and Column Indices of a Global Element of a Block-Cyclically Distributed Matrix
  NUMROC — Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process
  PDLANGE and PZLANGE — General Matrix Norm
  PDLANSY, PZLANSY, and PZLANHE — Real Symmetric, Complex Symmetric, or Complex Hermitian Matrix Norm
  PDLANTR and PZLANTR — Triangular or Trapezoidal Matrix Norm

Part 3. Appendixes

Appendix A. BLACS Quick Reference Guide
  Calling sequences
    Fortran interface for the BLACS
    C interface for the BLACS
  Argument data types
  Argument options

Appendix B. Sample Programs
  Sample Programs and Utilities Provided with Parallel ESSL
    Sample Thermal Diffusion Program
    Sample Sparse Linear Algebraic Equations Programs
    Sample Makefiles and Run Script

Appendix C. Accessibility features for Parallel ESSL
  Accessibility features
  IBM and accessibility

Notices
  Trademarks
  Software Update Protocol
  Programming Interfaces

Bibliography

Index


Tables

1. Hardware supported by Parallel ESSL
2. Operating Systems Supported by Parallel ESSL
3. Required Software Products for Parallel ESSL for Linux on POWER
4. Software needed to display various formats of online information
5. List of Level 2 PBLAS
6. List of Level 3 PBLAS
7. List of Dense Linear Algebraic Equation Subroutines
8. List of Banded Linear Algebraic Equation Subroutines
9. List of Fortran 90 Sparse Linear Algebraic Equation Subroutines
10. List of The Fortran 77 Sparse Linear Algebraic Equation Subroutines
11. List of Eigensystem Analysis and Singular Value Analysis Subroutines
12. List of Fourier Transform Subroutines
13. List of Random Number Generation Subroutines
14. List of Utility Subroutines
15. Six Processes Mapped to a 2 × 3 Process Grid Using Row-Major Order
16. Six Processes Mapped to a 2 × 3 Process Grid Using Column-Major Order
17. Block Distribution
18. Cyclic Distribution
19. Block-Cyclic Distribution
20. Inverse Mapping of Block-Cyclic Distribution
21. Block Distribution Over a 2 by 3 Process Grid
22. Data Distribution from a Process Point-of-View
23. Distributed Matrix elements from a Process Point-of-View
24. Calling Sequence Arguments for a Block-Cyclically-Distributed Vector
25. Calling Sequence Arguments for a Block-Cyclically-Distributed Matrix
26. Type-1 Array Descriptor for Block-Cyclically Distributed Vector or Matrix
27. Calling Sequence Arguments for a Distributed Real Symmetric or Complex Hermitian Band Matrix
28. Calling Sequence Arguments for General Tridiagonal Matrix
29. Calling Sequence Arguments for a Symmetric Tridiagonal Matrix
30. Calling Sequence Arguments for a Matrix Containing the Multiple Right-Hand Sides
31. Type-501 Array Descriptor
32. Type-502 Array Descriptor
33. Calling Sequence Arguments for the Sparse Matrix
34. Components of D_SPMAT
35. Elements of DESC_A%MATRIX_DATA(_)
36. Calling Sequence Arguments for the Sparse Matrix
37. Elements of INFOA()
38. Elements of DESC_A()
39. Specifying the Number of MPI Tasks and the Number of Computational Threads
40. Suggested Block Sizes
41. Input and Output for BLACS_GET
42. A 3 by 4 process grid
43. 3 by 4 process grid
44. 2 by 2 process grid
45. 4 by 4 process grid
46. 4 by 4 process grid
47. Fortran program compile and link commands for use with MPICH libraries
48. C program compile and link commands for use with MPICH libraries
49. C program gcc compile and link commands for use with MPICH libraries
50. C++ program compile and link commands for use with MPICH libraries
51. C++ program g++ compile and link commands for use with MPICH libraries
52. Product Package Names on Linux
53. List of Level 2 PBLAS
54. Data Types
55. Data Types
56. Data Types
57. Data Types
58. Data Types
59. Data Types
60. Data Types
61. List of Level 3 PBLAS
62. Data Types
63. Coding Rules for the Reference Matrix X
64. Coding Rules for the Reference Matrix X
65. Data Types
66. Data Types
67. Data Types
68. Data Types
69. Data Types
70. Data Types
71. List of Dense Linear Algebraic Equation Subroutines
72. List of Banded Linear Algebraic Equation Subroutines
73. List of Fortran 90 Sparse Linear Algebraic Equation Subroutines
74. List of The Fortran 77 Sparse Linear Algebraic Equation Subroutines
75. Data Types
76. Data Types
77. Data Types
78. Data Types
79. Data Types
80. Data Types


81. Data Types
82. Data Types
83. Data Types
84. Data Types
85. Data Types
86. Data Types
87. Data Types
88. Data Types
89. Data Types
90. Data Types
91. Data Types
92. Type-502 Array Descriptor
93. Type-1 Array Descriptor (P × 1 Process Grid)
94. Type-501 Array Descriptor
95. Type-1 Array Descriptor (1 × p Process Grid)
96. Data Types
97. Type-502 Array Descriptor
98. Type-1 Array Descriptor (p × 1 Process Grid)
99. Type-501 Array Descriptor
100. Type-1 Array Descriptor (1 × p Process Grid)
101. Data Types
102. Type-502 Array Descriptor
103. Type-1 Array Descriptor (p × 1 Process Grid)
104. Type-501 Array Descriptor
105. Type-1 Array Descriptor (1 × p Process Grid)
106. Data Types
107. Type-502 Array Descriptor
108. Type-1 Array Descriptor (p × 1 Process Grid)
109. Type-501 Array Descriptor
110. Type-1 Array Descriptor (1 × p Process Grid)
111. Data Types
112. Type-502 Array Descriptor
113. Type-1 Array Descriptor (p × 1 Process Grid)
114. Type-501 Array Descriptor
115. Type-1 Array Descriptor (1 × p Process Grid)
116. Data Types
117. Type-502 Array Descriptor
118. Type-1 Array Descriptor (p × 1 Process Grid)
119. Type-501 Array Descriptor
120. Type-1 Array Descriptor (1 × p Process Grid)
121. List of Eigensystem Analysis and Singular Value Analysis Subroutines
122. Data Types
123. Data Types
124. Data Types
125. Data Types
126. Data Types
127. Data Types
128. Data Types
129. Data Types
130. Data Types
131. List of Fourier Transform Subroutines
132. Fourier Transform subroutines allowing all lengths between 2 and 1073479680
133. Fourier transform subroutines whose lengths are limited
134. Data Types
135. Data Types
136. Data Types
137. Data Types
138. Data Types
139. Data Types
140. Data Types
141. Data Types
142. Data Types
143. List of Random Number Generation Subroutines
144. Data Types
145. List of Utility Subroutines
146. Data Types
147. Data Types
148. Data Types
149. Table of Contents for the Sample Thermal Diffusion Programs


About this information

The IBM® Parallel Engineering and Scientific Subroutine Library (Parallel ESSL) is a set of high-performance mathematical subroutines.

This book is a guide and reference manual for application programming in Fortran, C, and C++. It includes:
• An overview of Parallel ESSL and guidance information for coding and running your program, as well as using error handling
• Reference information for coding each subroutine calling sequence

This book is meant to be used in conjunction with the ESSL Guide and Reference. Where information is identical between Parallel ESSL and ESSL, such as matrix storage modes, this book references the appropriate section of the ESSL Guide and Reference.

This book is written for a wide class of users: scientists, mathematicians, engineers, statisticians, computer scientists, and system programmers. It assumes a basic knowledge of mathematics and of Single Program Multiple Data (SPMD) parallel processing concepts, and familiarity with Fortran, C, or C++.

How to Find a Subroutine Description

If you want to locate a subroutine description and you know the subroutine name, you can find it listed individually or under the entry “subroutines” in the Index.

Where to Find Related Publications

Parallel ESSL documentation, as well as other related information, can be displayed or downloaded from the Internet at the URL:
http://www-01.ibm.com/support/knowledgecenter/SSNR5K/welcome

Related Publications

The related publications or libraries listed at the following Web sites may be useful to you when using Parallel ESSL.

Product               Web site URL

Linux                 For general information and documentation on Linux:
                      http://www.tldp.org/

                      For information about the standard Linux installation procedure using the RPM Package Manager (RPM):
                      http://www.rpm.org/

                      For information about IBM-related offerings for Linux:
                      http://www.ibm.com/linux/

C and C++,            http://www.ibm.com/support/knowledgecenter/, under 'Rational' in the left-hand pane.
XL Fortran

Parallel              http://www.ibm.com/support/knowledgecenter/, under 'Cluster Software' in the left-hand pane.
Environment

NVIDIA                http://www.nvidia.com

                      For information about CUDA, see:
                      http://developer.nvidia.com/cuda-toolkit

                      For the CUDA Toolkit Documentation site, see:
                      http://docs.nvidia.com/cuda/#axzz3VafCSAvr

Using Bibliography References

Special references are made throughout this book to mathematical background publications and software libraries, available through IBM, publishers, or other companies. All of these are described in detail in the “Bibliography” on page 1079. A reference to one of these is made by using a number enclosed in square brackets. The number refers to the item listed under that number in the bibliography. For example, reference [1] cites the first item listed in the bibliography.

IBM Request for Enhancement (RFE) Community

The IBM Requests for Enhancements (RFEs) Community provides an opportunity to collaborate directly with the IBM product development teams and other product users on RFEs.

You can submit ESSL RFEs at the Servers and Systems Software RFE Community:
https://www.ibm.com/developerworks/rfe/?BRAND_ID=352

Special Terms

Standard data processing and mathematical terms are used in this book. Terminology is generally consistent with that used for Fortran. See the Glossary for more definitions of terms used in this book.

Distribution: Used to describe the method in which global data structures are divided among processes. Referenced reports may use the term decomposition to mean the same thing.

Global: Used to identify arguments that must have the same value on all processes.

Local: Used to identify arguments that may have different values on different processes.

LOCp(): For block-cyclic data distribution, LOCp(M_) represents the number of rows that a process would receive if M_ was distributed block-cyclically over p rows of its process column.

The ScaLAPACK Users' Guide uses LOCr, which is equivalent to LOCp.

LOCq(): LOCq() can be used in three ways:


• For block-cyclic data distribution, LOCq(N_) represents the number of columns that a process would receive if N_ was distributed block-cyclically over q columns of its process row.

• For block-column data distribution, LOCq(n) represents the number of columns that a process would receive if n was distributed block over q processes.

• For block-plane data distribution, LOCq(n) represents the number of planes that a process would receive if n was distributed block over q processes.

The ScaLAPACK Users' Guide uses LOCc, which is equivalent to LOCq.
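
As a rough illustration of the block-cyclic cases of LOCp() and LOCq(), the following C sketch counts how many rows (or columns) of one dimension fall on a given process. The function name local_count, its argument names, and the src argument (the process that holds the first block) are assumptions made for this example and are not part of the Parallel ESSL interface. The contents list the NUMROC utility for this computation; the arithmetic here follows the usual ScaLAPACK-style counting rule and is not taken from IBM source.

```c
#include <assert.h>

/* Hypothetical helper (not a Parallel ESSL routine): number of rows or
 * columns of an n-element dimension that land on process iproc when the
 * dimension is dealt out block-cyclically, in blocks of nb, over nprocs
 * processes, with the first block placed on process src. */
static int local_count(int n, int nb, int iproc, int src, int nprocs)
{
    assert(n >= 0 && nb > 0 && nprocs > 0);

    int dist    = (nprocs + iproc - src) % nprocs; /* distance from the first owner */
    int nblocks = n / nb;                          /* number of full blocks */
    int count   = (nblocks / nprocs) * nb;         /* full rounds that every process gets */
    int extra   = nblocks % nprocs;                /* full blocks left after those rounds */

    if (dist < extra)
        count += nb;        /* this process gets one more full block */
    else if (dist == extra)
        count += n % nb;    /* this process gets the trailing partial block, if any */

    return count;
}
```

For example, with M_ = 9, a block size of 2, and p = 2, local_count(9, 2, 0, 0, 2) gives 5 and local_count(9, 2, 1, 0, 2) gives 4, which together account for all 9 rows.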

Optional: Indicates an argument does not have to be coded and is assigned a default value if the argument is not present.

Process: Indicates the logical CPUs identified in the process grid. Referenced reports may also use the terms processor or node to mean the same thing.

Process Grid: Indicates a way to view a parallel machine as a logical one- or two-dimensional rectangular grid.

For one-dimensional process grids, the variables p and np are used interchangeably to indicate the number of processes in a row or column of the process grid.

For two-dimensional process grids, the variables p and nprow are used interchangeably to indicate the number of rows in the process grid. The variables q and npcol are used interchangeably to indicate the number of columns in the process grid.

Referenced reports or manuals may also use the terms processor mesh, processor template, processor shape, or processor grid. These all mean the same thing.
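To make the two-dimensional view concrete, the sketch below maps a process number onto (row, column) coordinates of an nprow by npcol grid. It is an illustration only: the type grid_coord and both function names are hypothetical, and this section does not state which ordering a particular grid uses, so both the row-major and the column-major mappings are shown.

```c
#include <assert.h>

/* Hypothetical coordinate pair; not a Parallel ESSL or BLACS type. */
typedef struct {
    int row;
    int col;
} grid_coord;

/* Processes numbered along each grid row first (row-major ordering). */
grid_coord rank_to_coord_rowmajor(int rank, int nprow, int npcol)
{
    assert(nprow > 0 && npcol > 0);
    assert(rank >= 0 && rank < nprow * npcol);
    grid_coord c = { rank / npcol, rank % npcol };
    return c;
}

/* Processes numbered down each grid column first (column-major ordering). */
grid_coord rank_to_coord_colmajor(int rank, int nprow, int npcol)
{
    assert(nprow > 0 && npcol > 0);
    assert(rank >= 0 && rank < nprow * npcol);
    grid_coord c = { rank % nprow, rank / nprow };
    return c;
}
```

For a 2 by 3 grid (p = nprow = 2, q = npcol = 3), process 3 sits at (1, 0) under row-major ordering and at (1, 1) under column-major ordering.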

Required: Indicates an argument must be coded in the calling sequence.

Scope: Scope can be used in two ways:

1. Refers to the portion of the parallel computer program within which the definition of an argument remains unchanged. When the scope of an argument is defined as global, the argument must have the same value on all processes. When the scope of an argument is defined as local, the argument may have different values on different processes.

2. In Appendix A, “BLACS Quick Reference Guide,” on page 993, scope indicates the processes that participate in the broadcast and global operations. It can equal 'all', 'row', or 'column'.

Short and Long Precision: Because Parallel ESSL can be used with more than one programming language, the terms short precision and long precision are used in place of the Fortran terms single precision and double precision.

Subroutines and Subprograms: A subroutine is a named sequence of instructions within the Parallel ESSL library, whose execution is invoked by a call. A subroutine can be called in one or more user programs and at one or more times within each program. The Parallel ESSL subroutines are referred to as subprograms in the areas of Level 2 and 3 Parallel Basic Linear Algebra Subprograms (PBLAS). The term subprograms is used because it is consistent with the Basic Linear Algebra Subprograms (BLAS).

How to Interpret Product Names Used in This Document

Parallel ESSL refers to the Parallel Engineering and Scientific Subroutine Library product.

ESSL refers to the Engineering and Scientific Subroutine Library product.

MPI refers to the Message Passing Interface provided by IBM Parallel Environment Runtime Edition (PE).

Abbreviated Names

The abbreviated names used in this book are defined as follows.

Short Name    Full Name
BLACS         Basic Linear Algebra Communication Subprograms
BLAS          Basic Linear Algebra Subprograms
CUDA          Parallel computing platform and programming model invented by NVIDIA
ESSL          Engineering and Scientific Subroutine Library
FCA           Fabric Collective Accelerator
GPU           Graphics processing unit
HTML          Hypertext Markup Language
LAPACK        Linear Algebra Package
MPI           Message Passing Interface
MPICH         Implementation of the Message Passing Interface created by Argonne National Laboratory
NLS           National Language Support
PDF           Portable Document Format
PE            Parallel Environment Runtime Edition
PBLAS         Parallel Basic Linear Algebra Subprograms
ScaLAPACK     Scalable Linear Algebra Package
SMP           Symmetric Multi-Processing
SPMD          Single Program Multiple Data
US            User Space
xCAT          Extreme Cloud Administration Toolkit

Fonts

This book uses a variety of special fonts to distinguish between many mathematical and programming items. These are defined as follows:

Special Font                             Example                              Description
Italic with no subscripts                m, incx, uplo                        A calling sequence argument or mathematical variable
Italic with subscripts                   $x_1$, $a_{ij}$, $y_{k1,k2}$         An element of a vector, matrix, or sequence
Bold italic lowercase                    x, y, z                              A vector or sequence
Bold italic lowercase with subscripts    $x_{ix:ix+n-1}$                      A vector, with defined bounds
Bold italic uppercase                    A, B, C                              A matrix
Bold italic uppercase with subscripts    $A_{ia:ia+m-1,\,ja:ja+n-1}$          A submatrix, with defined bounds
                                         $X_{ix:ix+n-1,\,ja:ja}$              A vector (a special form of submatrix), with defined limits
Gothic uppercase                         A, B, C, AGB                         An array
                                         NPROW=2                              A Fortran statement

Scalar Data Notations

Following are the special notations used in this book for scalar data items. These notations do not imply usage of any precision, short or long.

Data Item         Example            Description
Character item    'T'                Character(s) in single quotation marks
Logical item      .TRUE. .FALSE.     True or false logical value, as indicated
Integer data      1                  Number with no decimal point
Real data         1.6                Number with a decimal point
Complex data      (1.0,-2.9)         Real part followed by the imaginary part

Special Characters, Symbols, Expressions, and Abbreviations

The mathematical and programming notations used in this book are consistent with traditional mathematical and programming usage. These conventions are explained in the following table, along with special abbreviations that are associated with specific values.

Item                          Description
Greek letters: α, σ, ω, Ω     Symbolic scalar values
$|a|$                         The absolute value of a
$a \cdot b$                   The dot product of a and b
$x_i$                         The i-th element of vector x
$c_{ij}$                      The element in matrix C at row i and column j
$x_1 \ldots x_n$              Elements from $x_1$ to $x_n$
$i = 1, n$                    i is assigned the values 1 to n
$y \leftarrow x$              Vector y is replaced by vector x
$xy$                          Vector x times vector y
$a^k$                         a raised to the k power
$e^x$                         Exponential function of x
$A^T$; $x^T$                  The transpose of matrix A; the transpose of vector x
$\bar{x}$; $\bar{A}$          The complex conjugate of vector x; the complex conjugate of matrix A
$\bar{x}_i$                   The complex conjugate of the complex vector element $x_i$, where, for $x_i = (a_i, b_i)$, $\bar{x}_i = (a_i, -b_i)$
$\bar{c}_{jk}$                The complex conjugate of the complex matrix element $c_{jk}$
$x^H$; $A^H$                  The complex conjugate transpose of vector x; the complex conjugate transpose of matrix A
$I$                           Identity matrix
$\sum_{i=1}^{n} x_i$          The sum of elements $x_1$ to $x_n$
$\sqrt{a+b}$                  The square root of a+b
$\|x\|_2$                     The Euclidean norm of vector x, defined as: $\|x\|_2 = \sqrt{\sum_i |x_i|^2}$
$\|A\|_1$                     The one norm of matrix A, defined as: $\|A\|_1 = \max_j \sum_i |a_{ij}|$
$\|A\|_2$                     The spectral norm of matrix A, defined as: $\|A\|_2 = \max\{\|Ax\|_2 : \|x\|_2 = 1\}$
$\|A\|_F$                     The Frobenius norm of matrix A, defined as: $\|A\|_F = \sqrt{\sum_i \sum_j |a_{ij}|^2}$
$\|A\|_\infty$                The infinity norm of matrix A, defined as: $\|A\|_\infty = \max_i \sum_j |a_{ij}|$
$A^{-1}$                      The inverse of matrix A
$A^{-T}$                      The transpose of A inverse
$|A|$                         The determinant of matrix A
m by n matrix A               Matrix A has m rows and n columns
sin a                         The sine of a
cos b                         The cosine of b
SIGN (a)                      The sign of a; the result is either + or -
address {a}                   The storage address of a
size(a, dim)                  The number of elements in a along the specified dimension dim or, if dim is not present, the total number of array elements in a
max(x)                        The maximum element in vector x
min(x)                        The minimum element in vector x
ceiling(x)                    The smallest integer that is greater than or equal to x
floor(x)                      The largest integer that is not greater than x
iceil(m,n)                    The smallest integer that is greater than or equal to m/n; that is, iceil(m,n) = ceiling(m/n)
ilcm(i1,i2)                   The integer least common multiple of the integers i1 and i2
int(x), x > 0                 The largest integer that is less than or equal to x
m→(p, i)                      m is mapped into (p, i)
mod(x, m)                     x modulo m; the remainder when x is divided by m
∞                             Infinity
π                             Pi, 3.14159265
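The integer expressions iceil and ilcm from the table can be evaluated without floating-point arithmetic. The following C sketch shows one way to do so; these functions are illustrations written for this table, not routines exported by Parallel ESSL, and the helper igcd is a hypothetical name. The sketch assumes nonnegative m and positive n, i1, and i2.

```c
#include <assert.h>

/* iceil(m,n) = ceiling(m/n), computed with integer division only.
 * Assumes m >= 0 and n > 0. */
int iceil(int m, int n)
{
    assert(m >= 0 && n > 0);
    return (m + n - 1) / n;
}

/* Greatest common divisor by Euclid's algorithm (hypothetical helper). */
static int igcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* ilcm(i1,i2): least common multiple of two positive integers.
 * Dividing before multiplying keeps the intermediate value small. */
int ilcm(int i1, int i2)
{
    assert(i1 > 0 && i2 > 0);
    return (i1 / igcd(i1, i2)) * i2;
}
```

For example, iceil(7,2) is 4 because 7/2 = 3.5 rounds up to 4, and ilcm(4,6) is 12.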

How to Interpret the Subroutine Descriptions

This section explains how to interpret the information in the subroutine descriptions in Parts 2 and 3 of this book. Each subroutine description explains the function(s) performed by the subroutine(s). It provides a data types table, showing how the data differs for each subroutine.

Syntax

Fortran, C, and C++ Syntax

This section shows the syntax for the Fortran, C, and C++ calling statements.

Language     Syntax
Fortran      CALL NAME-1 | NAME-2 | ... | NAME-n (arg-1, arg-2, ... , arg-m)
C and C++    name-1 | name-2 | ... | name-n (arg-1, ... , arg-m);

The syntax indicates:

• The programming language (Fortran, C, or C++)

• Each possible subroutine name that you can code in the calling sequence. Each name is separated by the | (or) symbol. You specify only one of these names in your calling sequence. (You do not code the | in the calling sequence.)

• The arguments, listed in the order in which you code them in the calling sequence. You must code them all in your calling sequence.

  You can distinguish between input arguments and output arguments by looking at the “On Entry” and “On Return” sections, respectively. An argument used for both input and output is described in both the “On Entry” and “On Return” sections. In this case, the input value for the argument is overlaid with the output value.

Fortran 90 Syntax

This shows the syntax for the Fortran 90 calling statements.

Syntax for the Fortran 90 calling statements

Fortran 90 Equations or Cases

CALL NAME (req-1, ... , req-m)

CALL NAME (req-1, ... , req-m, opt-1, ... , opt-l)

The syntax indicates:

• The programming language (Fortran 90)

• The Parallel ESSL subroutine name, which is a generic name for one or more functions.

• The arguments in the calling sequence.

  The first calling sequence shows the arguments required when coding your program. The second calling sequence shows all the arguments, required and optional. The subroutine assigns a default value for any optional argument that is not present.

  You can distinguish between input arguments and output arguments by looking at the “On Entry” and “On Return” sections, respectively. An argument used for both input and output is described in both the “On Entry” and “On Return” sections. In this case, the input value for the argument is overlaid with the output value.

On Entry

This lists the input arguments, which are the arguments you pass to the subroutine. Each argument description first gives the meaning of the argument, and then gives the form of data required for the argument. (To help you avoid errors, output arguments are included, with a reference to the “On Return” section.)

On Return

This lists the output arguments, which are the arguments passed back to your program from the subroutine. Each argument description first gives the meaning of the argument, and then gives the form of data passed back to your program for the argument.

Notes and Coding Rules

The notes describe any programming considerations and restrictions that apply to the arguments or the data for the arguments. There may be references to other parts of the book for further information.

Error Conditions

These are all the Parallel ESSL run-time errors that can occur in the subroutine. They are organized under the headings “Computational Errors”, “Input Argument Errors”, “Resource Errors”, “Communications Errors”, and “Miscellaneous Errors”.

Example

The two reference sections in this book contain different types of examples.

Fortran Examples

The examples in Part 2, “Reference Information,” on page 125 show how you would call the subroutine in a Fortran program. Each example includes:

• A description of the salient features of the example
• The calling sequence, coded in Fortran
• The input and output data distributed across a process grid

How to Send Your Comments

Your feedback is important in helping us to produce accurate, high-quality information. If you have any comments about this information or any other Parallel ESSL documentation, send your comments to the following e-mail address:

[email protected]

Include the publication title and order number, and, if applicable, the specific location of the information about which you have comments (for example, a page number or a table number).

Summary of changes

The following sections summarize changes to Parallel ESSL and the Parallel ESSL documentation for each new release or major service update for a given product version. Within each book in the library, a vertical line to the left of text and illustrations indicates technical changes or additions made to the previous edition of the book.

Summary of changes for Parallel ESSL for Linux on POWER®, Version 5 Release 2

This release of Parallel ESSL includes the following changes:

• Parallel ESSL for Linux, V5.2 now supports the following:

  – IBM Power System S822LC (8335-GTA) servers with PCIe Gen3 x16 1-port InfiniBand EDR adapters or PCIe Gen3 x16 2-port InfiniBand EDR adapters interconnected using EDR InfiniBand switches running Red Hat Enterprise Linux 7.2, or later (little endian mode).

  – Compiling Parallel ESSL applications using the g++ compiler.

• Support is not provided with Parallel ESSL for Linux V5.2 for:

  – Ubuntu operating system

  – IBM Power System S824L server Model 42L, IBM Power S822L server Model 22L, IBM Power S821L server Model 21L with Mellanox Connect-IB FDR 2 Port 56G x16 PCIe Gen3 Adapters interconnected using Mellanox FDR InfiniBand switches.

  – Standalone Power8 Servers

If you require the above support, order Parallel ESSL for Linux V5.1 instead.

Summary of changes for Parallel ESSL for Linux on POWER, Version 5 Release 1.0.1

This release of Parallel ESSL includes the following changes:

• Parallel ESSL for Linux, V5.1.0.1 now supports IBM Power System S824L server (8247-42L) with Mellanox Connect-IB FDR 2 Port 56G x16 PCIe Gen3 Adapters interconnected using a Mellanox FDR InfiniBand Switch. See “Hardware Products Supported by Parallel ESSL” on page 5 for more information.

• Linking Parallel ESSL applications with the ESSL SMP CUDA Library is now supported.

Summary of changes for Parallel ESSL for Linux on POWER, Version 5 Release 1

This release of Parallel ESSL includes the following changes:

• Parallel ESSL for Linux, V5.1 supports either:

  – The following compute nodes with Mellanox Connect-IB FDR 2 Port 56G x16 PCIe Gen3 Adapters interconnected using a Mellanox FDR InfiniBand Switch:
    - IBM Power® S822L server Model 22L
    - IBM Power S821L server Model 21L

  – Standalone Power8 servers

  See “Hardware and Software Products That Can Be Used with Parallel ESSL” on page 5 for more information.

• New Banded Linear Algebraic Equation Subroutines:

  – PZDTSV - Diagonally-Dominant Complex General Tridiagonal Matrix Factorization and Solve (see “PDGTSV, PDDTSV, and PZDTSV - General Tridiagonal Matrix Factorization and Solve” on page 529)

  – PZDTTRF - Diagonally-Dominant Complex General Tridiagonal Matrix Factorization (see “PDGTTRF, PDDTTRF, and PZDTTRF - General Tridiagonal Matrix Factorization” on page 547)

  – PZDTTRS - Diagonally-Dominant Complex General Tridiagonal Matrix Solve (see “PDGTTRS, PDDTTRS, and PZDTTRS - General Tridiagonal Matrix Solve” on page 566)

• New Eigensystems Analysis Subroutines:

  – PZHENTRD - Reduce a Complex Hermitian Matrix to Tridiagonal Form (see “PDSYNTRD, PZHENTRD, PDSYTRD, and PZHETRD - Reduce a Real Symmetric or Complex Hermitian Matrix to Tridiagonal Form” on page 786)

  – PZHENGST - Reduce a Complex Hermitian Positive Definite Generalized Eigenproblem to Standard Form (see “PDSYNGST, PZHENGST, PDSYGST, and PZHEGST - Reduce a Real Symmetric or Complex Hermitian Positive Definite Generalized Eigenproblem to Standard Form” on page 802)

• Support is provided for use of C99 complex floating-point types for complex arithmetic in C and C++ applications.

• Support is not provided with Parallel ESSL for Linux V5.1 for:

  – Red Hat Enterprise Linux 6 operating system
  – Compute nodes based on POWER7/POWER7+ technology
  – Mellanox QDR InfiniBand switches
  – 32-bit applications
  – Parallel ESSL defined or user defined complex floating-point types for complex arithmetic in C applications

Part 1. Guide Information

The guidance information about how to use Parallel ESSL is organized as follows:

• Overview, Requirements, and List of Subroutines
• Distributing Your Data
• Coding and Running Your Program
• Migrating Your Program
• Using Error Handling

Chapter 1. Overview, Requirements, and List of Subroutines

This chapter introduces you to the IBM Parallel Engineering and Scientific Subroutine Library (Parallel ESSL) product.

Overview of Parallel ESSL

Parallel ESSL is a scalable mathematical subroutine library that supports parallel processing applications on clusters of processor nodes optionally connected by a high-performance switch. Parallel ESSL supports the Single Program Multiple Data (SPMD) programming model using the Message Passing Interface (MPI) library.

Parallel ESSL provides subroutines in the following computational areas:

• Level 2 Parallel Basic Linear Algebra Subprograms (PBLAS)
• Level 3 PBLAS
• Linear Algebraic Equations
• Eigensystem Analysis and Singular Value Analysis
• Fourier Transforms
• Random Number Generation

For communication, Parallel ESSL includes the Basic Linear Algebra Communications Subprograms (BLACS), which use MPI. For computations, Parallel ESSL uses the ESSL subroutines.

The Parallel ESSL subroutines can be called from 64-bit environment application programs written in Fortran, C, and C++.

The following Parallel ESSL libraries are available:

Parallel ESSL MPICH Libraries
    These 32-bit integer, 64-bit pointer environment libraries are provided for use with the IBM Parallel Environment Runtime Edition (PE) MPICH library. You cannot simultaneously call Parallel ESSL from multiple threads.

To order the Parallel ESSL product, specify the appropriate program number, listed as follows:

IBM Parallel ESSL for Linux
    5765-EL5

How Parallel ESSL Works

Parallel ESSL (which supports the SPMD programming model) uses MPI for communication during parallel processing and runs on clusters of processor nodes optionally connected by a high-performance switch.

A parallel program, such as yours with calls to the Parallel ESSL subroutines, executes as a number of individual, but related, parallel tasks on a number of your system's processor nodes. The group of parallel tasks is called a partition. The parallel tasks of your partition can communicate to exchange data or synchronize execution.

Your system may have an optional high-performance switch for communication. The switch increases the speed of communication between nodes. This helps your application program, as well as the Parallel ESSL subroutines, achieve maximum performance.

Parallel ESSL assumes that the application program is using the SPMD programming model, where the programs running the parallel tasks of your partition are identical. The tasks, however, work on different sets of data.

Coding Your Program

The application developer creates a parallel program's source code, including calls to Parallel ESSL BLACS or MPI routines. These calls enable the parallel processes of your partition to communicate data and coordinate their execution.

Details on what other specific coding additions are required when using Parallel ESSL are given in Chapter 3, “Coding and Running Your Program,” on page 75.

Distributing Your Data

Your global data structures (vectors, matrices, or sequences) must be distributed across your processes prior to calling the Parallel ESSL subroutines.

Because data is distributed for both input and output, no implicit bottleneck is created by an initial scatter or ending gather operation. Parallel ESSL works in true SPMD mode, where each process operates only on a portion of the data. Also, the input and output data may be too large to collectively reside on a single node; therefore, problems associated with the storage limitations of a single processor node are eased by performing the computation in actual SPMD fashion.

See Chapter 2, “Distributing Your Data,” on page 21 for details on distributing your data.
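As an illustration of what "each process operates only on a portion of the data" means for indexing, the sketch below maps a global element index to its owning process and local index for a one-dimensional block-cyclic distribution, and back again. It is a simplified model under stated assumptions, not the Parallel ESSL interface: indices are 0-based, the first block is placed on process 0, and the names bc_location, bc_global_to_local, and bc_local_to_global are hypothetical.

```c
#include <assert.h>

/* Hypothetical result type: owning process and index within its local part. */
typedef struct {
    int proc;
    int local;
} bc_location;

/* Map 0-based global index g to (process, local index) when elements are
 * dealt out in blocks of nb over nprocs processes, starting at process 0. */
bc_location bc_global_to_local(int g, int nb, int nprocs)
{
    assert(g >= 0 && nb > 0 && nprocs > 0);
    int block = g / nb;                 /* global block that contains g  */
    bc_location loc;
    loc.proc = block % nprocs;          /* blocks are dealt round-robin  */
    loc.local = (block / nprocs) * nb   /* earlier local blocks          */
              + g % nb;                 /* offset inside the block       */
    return loc;
}

/* Inverse mapping: recover the global index from (process, local index). */
int bc_local_to_global(int proc, int local, int nb, int nprocs)
{
    assert(local >= 0 && nb > 0 && nprocs > 0);
    assert(proc >= 0 && proc < nprocs);
    int lblock = local / nb;            /* local block number            */
    return (lblock * nprocs + proc) * nb + local % nb;
}
```

With nb = 2 and three processes, global index 5 lies in block 2, so it is element 1 on process 2; global index 6 starts the second local block of process 0 and has local index 2.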

Running and Testing

After writing the parallel application program containing calls to the Parallel ESSL subroutines, the developer then begins a cycle of modification and testing. The application program is run using the following product:

• IBM Parallel Environment Runtime Edition (PE)

This product includes a number of compiler scripts, environment variables, and command-line flags, which may be used to set up your execution environment. (For example, before you execute a program, you need to set the size of your partition, that is, the number of parallel tasks, by setting the appropriate environment variables or their command-line flags.)

For further details on PE and its various capabilities, see the PE manuals available at the URLs listed in “Related Publications” on page ix. For more information about MPI, see references [42 on page 1082] and [50 on page 1082].

Tuning for Performance

Once the parallel program is debugged, you will want to tune it for optimal performance. This is an important step of the process, because performance is the key reason for using the Parallel ESSL subroutines. To tune and analyze programs with calls to the Parallel ESSL subroutines, you may wish to use the tools provided by PE. For details, see the PE manuals available at the URLs listed in “Related Publications” on page ix.


Accuracy of the Computations

Parallel ESSL provides accuracy comparable to libraries using equivalent algorithms with identical precision formats. The data types operated on are ANSI/IEEE 64-bit binary floating-point format and 32-bit integer. See the ANSI/IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754–1985 for more detail.

The Fortran Language Interface to the Parallel ESSL Subroutines

The Parallel ESSL subroutines follow standard Fortran calling conventions. When Parallel ESSL subroutines are called from a C or C++ program, the Fortran conventions must be used. This applies to all aspects of the interface, such as the linkage conventions and the data conventions. For example, array ordering must be consistent with Fortran array ordering techniques. Data and linkage conventions for each language are given in the ESSL Guide and Reference.
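The array-ordering requirement is the one most likely to surprise C programmers, because Fortran stores a matrix column by column. The following C sketch shows how element (i, j) of a matrix with leading dimension lda is located in a flat buffer laid out the Fortran way. The function names colmajor_index and fill_colmajor are hypothetical, and the indices here are 0-based as is usual in C, whereas Fortran subscripts normally start at 1.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of element (i, j), 0-based, in a column-major array whose
 * columns are lda elements apart (lda is the leading dimension). */
size_t colmajor_index(int i, int j, int lda)
{
    assert(i >= 0 && j >= 0 && lda > 0 && i < lda);
    return (size_t)i + (size_t)j * (size_t)lda;
}

/* Fill an m-by-n matrix stored column-major so that the element in
 * 1-based row r and column c holds the value 10*r + c. */
void fill_colmajor(double *a, int m, int n, int lda)
{
    assert(a != NULL && m >= 0 && n >= 0 && lda >= m && lda > 0);
    for (int j = 0; j < n; ++j)          /* columns are the outer loop   */
        for (int i = 0; i < m; ++i)      /* rows are contiguous in memory */
            a[colmajor_index(i, j, lda)] = 10.0 * (i + 1) + (j + 1);
}
```

For a 2 by 3 matrix with lda = 2, the buffer holds the first column in positions 0 and 1, the second column in positions 2 and 3, and so on, which is the opposite of the row-by-row layout of a C two-dimensional array.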

Hardware and Software Products That Can Be Used with Parallel ESSL

This section describes the hardware and software products you can use with Parallel ESSL, as well as those products for installing Parallel ESSL and displaying the online documentation.

Hardware Products Supported by Parallel ESSL

Parallel ESSL runs on the following hardware combinations:

Table 1. Hardware supported by Parallel ESSL

Servers and Processors (Select models and operating systems are supported.)    InfiniBand Switches
POWER8®                                                                       See Note

Note: Parallel ESSL for Linux, V5.2 supports the following:

• IBM Power System S822LC (8335-GTA) servers with PCIe Gen3 x16 1-port InfiniBand EDR adapters or PCIe Gen3 x16 2-port InfiniBand EDR adapters interconnected using EDR InfiniBand switches running Red Hat Enterprise Linux 7.2, or later (little endian mode).

Operating Systems Supported by Parallel ESSL

Table 2. Operating Systems Supported by Parallel ESSL

Product                             Supported Environment (big endian mode)    Supported Environment (little endian mode)
Parallel ESSL for Linux on POWER    N/A                                        Red Hat Enterprise Linux (RHEL) 7.2 or later (little endian mode) (See Note)

Note: No virtualization is supported, neither PowerVM® nor PowerKVM. RHEL must run as a bare metal operating system on the supported systems.

Software Products Required by Parallel ESSL

Parallel ESSL for Linux requires the software products shown in Table 3 on page 6 for compiling and running.


To assist C and C++ users, two header files are provided. Use of these files is described in “Running Your Program” on page 92.

To assist Fortran 90 sparse linear algebraic equation users, module files are provided with the Parallel ESSL product. Use of these files is described in “Using Extrinsic Procedures - The Fortran 90 Sparse Linear Algebraic Equation Subroutines” on page 87.

Required Software Products

The following table lists the required software products for Parallel ESSL for Linux on POWER:

Table 3. Required Software Products for Parallel ESSL for Linux on POWER

Required Software Products                                        Software Release Level

For Compiling
  IBM XL Fortran for Linux                                        15.1.2 or later with the latest service
  IBM XL C/C++ for Linux                                          13.1.2 or later with the latest service
  gcc and g++ (See Note 3.)

For Linking, Loading, or Running (See Note 1.)
  IBM XL Fortran Runtime Environment for Linux (See Note 2.)      15.1.2 or later with the latest service
  gcc and g++ 64-bit libraries (See Note 3.)
  IBM ESSL for Linux (See Note 4.)                                5.4 with the latest service
  CUDA Toolkit (See Note 5.)                                      7.5
  Parallel Environment Runtime Edition for Linux (PE)             2.3 with the latest service

Notes:

1. Additional software packages may be required for building applications. For details, consult the Linux and compiler documentation.

2. The correct version of IBM XL Fortran Runtime Environment and Addons Library for Linux is automatically shipped with the compiler. It is also available for downloading from the following website:

   http://www.ibm.com/support/docview.wss?rs=43&uid=swg21156900

3. Use the libraries and compilers provided with your Linux distribution. The ESSL SMP libraries require the XL OpenMP runtime. The gcc OpenMP runtime is not compatible with the XL OpenMP runtime.

4. ESSL for Linux on POWER must be ordered separately.

5. This product is only required in order to use the ESSL SMP CUDA library.

Thread Safety and Parallel ESSL

The Parallel ESSL SMP libraries are not thread safe; however, they are thread tolerant and can therefore be called from a single thread of a multithreaded application. Multiple simultaneous calls to the Parallel ESSL SMP libraries from different threads of a single process can cause unpredictable results.

Installation and Customization of Parallel ESSL

The Parallel ESSL for Linux on POWER Installation Guide provides the detailed information you need to install Parallel ESSL on Linux.



Software Products Required for Displaying Parallel ESSL Documentation

The software products needed to display Parallel ESSL online information are listed in Table 4.

Table 4. Software needed to display various formats of online information

Format of online information    Software needed

HTML                            HTML document browser (such as Microsoft Internet Explorer)

PDF                             Adobe Acrobat Reader, which is freely available for downloading from the Adobe web site at:

                                http://www.adobe.com

Manpages                        No additional software needed. To display a specific manpage, use the man command as follows:

                                • man subroutine-name

                                Note: These manpages will be installed in the following directory:

                                On Linux
                                    /usr/share/man/man3

                                In order for manpages to display properly on Linux, the LANG environment variable must be set to either of the following values: C or en_US.iso885915.

                                The manpages provided by ScaLAPACK are installed in the /usr/share/man/manl directory. By default, Parallel ESSL manpages will be displayed rather than PBLAS or ScaLAPACK manpages with the same names. If you want to access the PBLAS or ScaLAPACK manpages, you must set the MANPATH environment variable. See the documentation for the man command.

BLACS - Usage in Parallel ESSL for Communication

The Basic Linear Algebra Communication Subprograms (BLACS) provide ease-of-use and portability for message passing in parallel linear algebra programs. The BLACS efficiently support not only point-to-point operations between processes on a logical two-dimensional process grid, but also collective communications on such grids, or within just a grid row or column (a one-dimensional process grid).

Most communication packages, such as MPI, require an address and a length to be sent; therefore, they are classified as having operations based on vectors. In programming linear algebra problems, however, it is preferable to express all operations in terms of matrices. Vectors and scalars are simply subclasses of matrices. The BLACS operate on matrices, as defined by an address, column size, row size, leading dimension, and so forth.
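The difference between the two descriptions shows up as soon as a submatrix is sent. A submatrix of a larger column-major array is not one contiguous run of memory, so an interface that accepts only an address and a length needs the data packed first, whereas a matrix description (address, number of rows, number of columns, leading dimension) identifies the same data directly. The C sketch below performs that packing step by hand to show what the matrix description encodes. It is an illustration only, not BLACS code, and the name pack_submatrix is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copy the m-by-n submatrix that starts at address a, inside a
 * column-major array with leading dimension lda, into the contiguous
 * buffer buf (which must hold m*n elements). */
void pack_submatrix(const double *a, int m, int n, int lda, double *buf)
{
    assert(a != NULL && buf != NULL);
    assert(m >= 0 && n >= 0 && lda >= 1 && lda >= m);
    for (int j = 0; j < n; ++j)              /* one column at a time     */
        for (int i = 0; i < m; ++i)          /* m contiguous elements    */
            buf[(size_t)j * m + i] = a[(size_t)j * lda + i];
}
```

For a 3 by 3 array holding 1 through 9 in column-major order, the 2 by 2 submatrix whose first element is in row 2, column 2 starts at offset 4 and packs to 5, 6, 8, 9. The skipped element between the two columns is what the leading dimension accounts for.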

Parallel ESSL includes the following interfaces for the BLACS:

• Fortran interface for the BLACS
• C interface for the BLACS


A BLACS quick reference guide can be found in Appendix A, “BLACS Quick Reference Guide,” on page 993.

An example of the usage of BLACS in a Fortran 90 program is shown in “Sample Programs and Utilities Provided with Parallel ESSL” on page 997.

The BLACS are documented in references [6 on page 1079], [35 on page 1081], and [36 on page 1081].

List of Parallel ESSL Subroutines

This section provides an overview of the subroutines in each of the areas of Parallel ESSL.

Level 2 PBLAS

The Level 2 PBLAS include a subset of the standard set of distributed memory parallel versions of the Level 2 BLAS.

Note: These subroutines were designed in accordance with the proposed Level 2 PBLAS standard. (See references [15 on page 1080], [16 on page 1080], and [18 on page 1080].) If these subroutines do not comply with the standard as approved, IBM will consider updating them to do so.

If IBM updates these subroutines, the update could require modifications of the calling application program.

Table 5. List of Level 2 PBLAS

Matrix-Vector Product for a General Matrix or Its Transpose
    Long-Precision Subprogram: PDGEMV, PZGEMV
    Location: “PDGEMV and PZGEMV - Matrix-Vector Product for a General Matrix or Its Transpose” on page 129

Matrix-Vector Product for a Real Symmetric or a Complex Hermitian Matrix
    Long-Precision Subprogram: PDSYMV, PZHEMV
    Location: “PDSYMV and PZHEMV - Matrix-Vector Product for a Real Symmetric or a Complex Hermitian Matrix” on page 152

Rank-One Update of a General Matrix
    Long-Precision Subprogram: PDGER, PZGERC, PZGERU
    Location: “PDGER, PZGERC, and PZGERU - Rank-One Update of a General Matrix” on page 166

Rank-One Update of a Real Symmetric or a Complex Hermitian Matrix
    Long-Precision Subprogram: PDSYR, PZHER
    Location: “PDSYR and PZHER - Rank-One Update of a Real Symmetric or a Complex Hermitian Matrix” on page 184

Rank-Two Update of a Real Symmetric or a Complex Hermitian Matrix
    Long-Precision Subprogram: PDSYR2, PZHER2
    Location: “PDSYR2 and PZHER2 - Rank-Two Update of a Real Symmetric or a Complex Hermitian Matrix” on page 195

Matrix-Vector Product for a Triangular Matrix or Its Transpose
    Long-Precision Subprogram: PDTRMV, PZTRMV
    Location: “PDTRMV and PZTRMV - Matrix-Vector Product for a Triangular Matrix or Its Transpose” on page 210

Solution of Triangular System of Equations with a Single Right-Hand Side
    Long-Precision Subprogram: PDTRSV, PZTRSV
    Location: “PDTRSV and PZTRSV - Solution of Triangular System of Equations with a Single Right-Hand Side” on page 222

Level 3 PBLAS

The Level 3 PBLAS include a subset of the standard set of distributed memory parallel versions of the Level 3 BLAS.

Note: These subroutines were designed in accordance with the proposed Level 3 PBLAS standard. (See references [15 on page 1080], [16 on page 1080], and [18 on page 1080].) If these subroutines do not comply with the standard as approved, IBM will consider updating them to do so.

Table 6. List of Level 3 PBLAS

Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose
    Long-Precision Subprogram: PDGEMM, PZGEMM
    Location: “PDGEMM and PZGEMM - Matrix-Matrix Product for a General Matrix, Its Transpose, or Its Conjugate Transpose” on page 237

Matrix-Matrix Product Where One Matrix is Real or Complex Symmetric or Complex Hermitian
    Long-Precision Subprogram: PDSYMM, PZSYMM, PZHEMM
    Location: “PDSYMM, PZSYMM, and PZHEMM - Matrix-Matrix Product Where One Matrix is Real or Complex Symmetric or Complex Hermitian” on page 254

Triangular Matrix-Matrix Product
    Long-Precision Subprogram: PDTRMM, PZTRMM
    Location: “PDTRMM and PZTRMM - Triangular Matrix-Matrix Product” on page 273

Solution of Triangular System of Equations with Multiple Right-Hand Sides
    Long-Precision Subprogram: PDTRSM, PZTRSM
    Location: “PDTRSM and PZTRSM - Solution of Triangular System of Equations with Multiple Right-Hand Sides” on page 285

Rank-K Update of a Real or Complex Symmetric or a Complex Hermitian Matrix
    Long-Precision Subprogram: PDSYRK, PZSYRK, PZHERK
    Location: “PDSYRK, PZSYRK, and PZHERK - Rank-K Update of a Real or Complex Symmetric or a Complex Hermitian Matrix” on page 298

Rank-2K Update of a Real or Complex Symmetric or a Complex Hermitian Matrix
    Long-Precision Subprogram: PDSYR2K, PZSYR2K, PZHER2K
    Location: “PDSYR2K, PZSYR2K, and PZHER2K - Rank-2K Update of a Real or Complex Symmetric or a Complex Hermitian Matrix” on page 313

Matrix Transpose for a General Matrix
    Long-Precision Subprogram: PDTRAN, PZTRANC, PZTRANU
    Location: “PDTRAN, PZTRANC, and PZTRANU - Matrix Transpose for a General Matrix” on page 333

Linear Algebraic EquationsThese subroutines consist of dense, banded, and sparse subroutines, and include asubset of the ScaLAPACK subroutines.

Note: The dense and banded linear algebraic equations subroutines were designedin accordance with the proposed ScaLAPACK standard. See references [10 on page1080], [17 on page 1080], [19 on page 1080], [28 on page 1081], and [29 on page1081]. If these subroutines do not comply with the standard as approved, IBM willconsider updating them to do so.


Dense Linear Algebraic Equations

The dense linear algebraic equation subroutines provide:

- Solutions to linear systems of equations for real and complex general matrices, and their transposes, and for positive definite real symmetric and complex Hermitian matrices.
- Least squares solutions to linear systems of equations for real and complex general matrices.
- Inverse of real and complex general matrices, of positive definite real symmetric and complex Hermitian matrices, and of real and complex triangular matrices.
- Condition number of real and complex general matrices and of positive definite real symmetric and complex Hermitian matrices.

Table 7. List of Dense Linear Algebraic Equation Subroutines

| Descriptive Name | Long-Precision Subroutine | Location |
|---|---|---|
| General Matrix Factorization and Solve | PDGESV, PZGESV | “PDGESV and PZGESV — General Matrix Factorization and Solve” on page 354 |
| General Matrix Factorization | PDGETRF, PZGETRF | “PDGETRF and PZGETRF — General Matrix Factorization” on page 368 |
| General Matrix Solve | PDGETRS, PZGETRS | “PDGETRS and PZGETRS — General Matrix Solve” on page 379 |
| General Matrix Inverse | PDGETRI, PZGETRI | “PDGETRI and PZGETRI — General Matrix Inverse” on page 391 |
| Estimate the Reciprocal of the Condition Number of a General Matrix | PDGECON, PZGECON | “PDGECON and PZGECON — Estimate the Reciprocal of the Condition Number of a General Matrix” on page 400 |
| General Matrix QR Factorization | PDGEQRF, PZGEQRF | “PDGEQRF and PZGEQRF — General Matrix QR Factorization” on page 409 |
| General Matrix Least Squares Solution | PDGELS, PZGELS | “PDGELS and PZGELS — General Matrix Least Squares Solution” on page 419 |
| Positive Definite Real Symmetric or Complex Hermitian Matrix Factorization and Solve | PDPOSV, PZPOSV | “PDPOSV and PZPOSV — Positive Definite Real Symmetric or Complex Hermitian Matrix Factorization and Solve” on page 433 |
| Positive Definite Real Symmetric or Complex Hermitian Matrix Factorization | PDPOTRF, PZPOTRF | “PDPOTRF and PZPOTRF — Positive Definite Real Symmetric or Complex Hermitian Matrix Factorization” on page 446 |
| Positive Definite Real Symmetric or Complex Hermitian Matrix Solve | PDPOTRS, PZPOTRS | “PDPOTRS and PZPOTRS — Positive Definite Real Symmetric or Complex Hermitian Matrix Solve” on page 455 |
| Positive Definite Real Symmetric or Complex Hermitian Matrix Inverse | PDPOTRI, PZPOTRI | “PDPOTRI and PZPOTRI — Positive Definite Real Symmetric or Complex Hermitian Matrix Inverse” on page 466 |
| Estimation of the Reciprocal of the Condition Number of a Positive Definite Real Symmetric or Complex Hermitian Matrix | PDPOCON, PZPOCON | “PDPOCON and PZPOCON — Estimation of the Reciprocal of the Condition Number of a Positive Definite Real Symmetric or Complex Hermitian Matrix” on page 473 |
| Triangular Matrix Inverse | PDTRTRI, PZTRTRI | “PDTRTRI and PZTRTRI — Triangular Matrix Inverse” on page 482 |

Banded Linear Algebraic Equations

The banded linear algebraic equation subroutines provide solutions to linear systems of equations for positive definite real symmetric and complex Hermitian band matrices, real general tridiagonal matrices, diagonally-dominant real and complex general tridiagonal matrices, and positive definite real symmetric and complex Hermitian tridiagonal matrices.
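To make the tridiagonal case concrete, the following is a minimal serial sketch in Python of solving a diagonally-dominant tridiagonal system by elimination without pivoting (commonly called the Thomas algorithm). It only illustrates the kind of problem these subroutines address. It is not the parallel algorithm used by Parallel ESSL, and the function name `solve_tridiagonal` and its argument layout are assumptions made for this illustration.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system by elimination without pivoting.

    lower: sub-diagonal entries, length n-1
    diag:  main diagonal entries, length n
    upper: super-diagonal entries, length n-1
    rhs:   right-hand side, length n

    Assumption: the matrix is diagonally dominant, so skipping
    pivoting is numerically safe.
    """
    n = len(diag)
    d = [float(v) for v in diag]   # working copy of the diagonal
    b = [float(v) for v in rhs]    # working copy of the right-hand side

    # Forward elimination: remove the sub-diagonal entry of each row.
    for i in range(1, n):
        m = lower[i - 1] / d[i - 1]
        d[i] -= m * upper[i - 1]
        b[i] -= m * b[i - 1]

    # Back substitution, starting from the last unknown.
    x = [0.0] * n
    x[n - 1] = b[n - 1] / d[n - 1]
    for i in range(n - 2, -1, -1):
        x[i] = (b[i] - upper[i] * x[i + 1]) / d[i]
    return x


# 3x3 example: diagonal 4, off-diagonals 1, right-hand side chosen
# so that every unknown equals 1.
x = solve_tridiagonal([1, 1], [4, 4, 4], [1, 1], [5, 6, 5])
```

In this sketch the elimination and the back substitution are done in one call, which loosely corresponds to a combined factorization-and-solve subroutine. Separating the two loops would correspond to the factorization and solve subroutines listed separately in the table.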


Table 8. List of Banded Linear Algebraic Equation Subroutines

| Descriptive Name | Long-Precision Subroutine | Location |
|---|---|---|
| Positive Definite Real Symmetric or Complex Hermitian Band Matrix Factorization and Solve | PDPBSV, PZPBSV | “PDPBSV and PZPBSV — Positive Definite Real Symmetric or Complex Hermitian Band Matrix Factorization and Solve” on page 491 |
| Positive Definite Real Symmetric or Complex Hermitian Band Matrix Factorization | PDPBTRF, PZPBTRF | “PDPBTRF and PZPBTRF — Positive Definite Real Symmetric or Complex Hermitian Band Matrix Factorization” on page 505 |
| Positive Definite Real Symmetric or Complex Hermitian Band Matrix Solve | PDPBTRS, PZPBTRS | “PDPBTRS and PZPBTRS — Positive Definite Real Symmetric or Complex Hermitian Band Matrix Solve” on page 516 |
| General Tridiagonal Matrix Factorization and Solve | PDGTSV | “PDGTSV, PDDTSV, and PZDTSV — General Tridiagonal Matrix Factorization and Solve” on page 529 |
| General Tridiagonal Matrix Factorization | PDGTTRF | “PDGTTRF, PDDTTRF, and PZDTTRF — General Tridiagonal Matrix Factorization” on page 547 |
| General Tridiagonal Matrix Solve | PDGTTRS | “PDGTTRS, PDDTTRS, and PZDTTRS — General Tridiagonal Matrix Solve” on page 566 |
| Diagonally-Dominant General Tridiagonal Matrix Factorization and Solve | PDDTSV, PZDTSV | “PDGTSV, PDDTSV, and PZDTSV — General Tridiagonal Matrix Factorization and Solve” on page 529 |
| Diagonally-Dominant General Tridiagonal Matrix Factorization | PDDTTRF, PZDTTRF | “PDGTTRF, PDDTTRF, and PZDTTRF — General Tridiagonal Matrix Factorization” on page 547 |
| Diagonally-Dominant General Tridiagonal Matrix Solve | PDDTTRS, PZDTTRS | “PDGTTRS, PDDTTRS, and PZDTTRS — General Tridiagonal Matrix Solve” on page 566 |
| Positive Definite Real Symmetric or Complex Hermitian Tridiagonal Matrix Factorization and Solve | PDPTSV, PZPTSV | “PDPTSV and PZPTSV — Positive Definite Real Symmetric or Complex Hermitian Tridiagonal Matrix Factorization and Solve” on page 587 |
| Positive Definite Real Symmetric or Complex Hermitian Tridiagonal Matrix Factorization | PDPTTRF, PZPTTRF | “PDPTTRF and PZPTTRF — Positive Definite Real Symmetric or Complex Hermitian Tridiagonal Matrix Factorization” on page 605 |
| Positive Definite Real Symmetric or Complex Hermitian Tridiagonal Matrix Solve | PDPTTRS, PZPTTRS | “PDPTTRS and PZPTTRS — Positive Definite Real Symmetric or Complex Hermitian Tridiagonal Matrix Solve” on page 621 |

Fortran 90 Sparse Linear Algebraic Equation Subroutines

The Fortran 90 sparse linear algebraic equation subroutines provide solutions to linear systems of equations for a real general sparse matrix. The sparse utility subroutines provided in Parallel ESSL must be used in conjunction with the sparse linear algebraic equation subroutines.

Table 9. List of Fortran 90 Sparse Linear Algebraic Equation Subroutines

| Descriptive Name | Long-Precision Subroutine | Location |
|---|---|---|
| Allocates Space for an Array Descriptor for a General Sparse Matrix | PADALL | “PADALL — Allocates Space for an Array Descriptor for a General Sparse Matrix” on page 640 |
| Allocates Space for a General Sparse Matrix | PSPALL | “PSPALL — Allocates Space for a General Sparse Matrix” on page 642 |
| Allocates Space for a Dense Vector | PGEALL | “PGEALL — Allocates Space for a Dense Vector” on page 644 |
| Inserts Local Data into a General Sparse Matrix | PSPINS | “PSPINS — Inserts Local Data into a General Sparse Matrix” on page 646 |
| Inserts Local Data into a Dense Vector | PGEINS | “PGEINS — Inserts Local Data into a Dense Vector” on page 650 |
| Assembles a General Sparse Matrix | PSPASB | “PSPASB — Assembles a General Sparse Matrix” on page 652 |
| Assembles a Dense Vector | PGEASB | “PGEASB — Assembles a Dense Vector” on page 655 |
| Preconditioner for a General Sparse Matrix | PSPGPR | “PSPGPR — Preconditioner for a General Sparse Matrix” on page 657 |
| Iterative Linear System Solver for a General Sparse Matrix | PSPGIS | “PSPGIS — Iterative Linear System Solver for a General Sparse Matrix” on page 660 |
| Deallocates Space for a Dense Vector | PGEFREE | “PGEFREE — Deallocates Space for a Dense Vector” on page 665 |
| Deallocates Space for a General Sparse Matrix | PSPFREE | “PSPFREE — Deallocates Space for a General Sparse Matrix” on page 666 |
| Deallocates Space for an Array Descriptor for a General Sparse Matrix | PADFREE | “PADFREE — Deallocates Space for an Array Descriptor for a General Sparse Matrix” on page 668 |

Fortran 77 Sparse Linear Algebraic Equation Subroutines

The Fortran 77 sparse linear algebraic equation subroutines provide solutions to linear systems of equations for a real general sparse matrix. The sparse utility subroutines provided in Parallel ESSL must be used in conjunction with the sparse linear algebraic equation subroutines.

Table 10. List of The Fortran 77 Sparse Linear Algebraic Equation Subroutines

| Descriptive Name | Long-Precision Subroutine | Location |
|---|---|---|
| Initializes an Array Descriptor for a General Sparse Matrix | PADINIT | “PADINIT — Initializes an Array Descriptor for a General Sparse Matrix” on page 677 |
| Initializes a General Sparse Matrix | PDSPINIT | “PDSPINIT — Initializes a General Sparse Matrix” on page 679 |
| Inserts Local Data into a General Sparse Matrix | PDSPINS | “PDSPINS — Inserts Local Data into a General Sparse Matrix” on page 681 |
| Inserts Local Data into a Dense Vector | PDGEINS | “PDGEINS — Inserts Local Data into a Dense Vector” on page 686 |
| Assembles a General Sparse Matrix | PDSPASB | “PDSPASB — Assembles a General Sparse Matrix” on page 689 |
| Assembles a Dense Vector | PDGEASB | “PDGEASB — Assembles a Dense Vector” on page 693 |
| Preconditioner for a General Sparse Matrix | PDSPGPR | “PDSPGPR — Preconditioner for a General Sparse Matrix” on page 695 |
| Iterative Linear System Solver for a General Sparse Matrix | PDSPGIS | “PDSPGIS — Iterative Linear System Solver for a General Sparse Matrix” on page 698 |

Eigensystem Analysis and Singular Value Analysis

The eigensystem analysis subroutines provide solutions to the algebraic and generalized eigensystem analysis problem. The singular value analysis subroutines provide the singular value decomposition. These subroutines include a subset of the ScaLAPACK subroutines. See references [20 on page 1080] and [21 on page 1080].

Note: These subroutines were designed in accordance with the proposed ScaLAPACK standard. If these subroutines do not comply with the standard as approved, IBM will consider updating them to do so. If IBM updates these subroutines, the update could require modifications of the calling application program.

Table 11. List of Eigensystem Analysis and Singular Value Analysis Subroutines

| Descriptive Name | Long-Precision Subroutine | Location |
|---|---|---|
| Selected Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric or Complex Hermitian Matrix | PDSYEVX, PZHEEVX | “PDSYEVX and PZHEEVX — Selected Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric or Complex Hermitian Matrix” on page 712 |
| All Eigenvalues and Eigenvectors of a Real Symmetric or Complex Hermitian Matrix Using a Parallel Divide-and-Conquer Algorithm | PDSYEVD, PZHEEVD | “PDSYEVD and PZHEEVD — All Eigenvalues and Eigenvectors of a Real Symmetric or Complex Hermitian Matrix Using a Parallel Divide-and-Conquer Algorithm” on page 733 |
| All Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric or Complex Hermitian Matrix | PDSYEV, PZHEEV | “PDSYEV and PZHEEV — All Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric or Complex Hermitian Matrix” on page 747 |
| Selected Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric or Complex Hermitian Positive Definite Generalized Eigenproblem | PDSYGVX, PZHEGVX | “PDSYGVX and PZHEGVX — Selected Eigenvalues and, Optionally, the Eigenvectors of a Real Symmetric or Complex Hermitian Positive Definite Generalized Eigenproblem” on page 760 |
| Reduce a Real Symmetric or Complex Hermitian Matrix to Tridiagonal Form | PDSYNTRD, PZHENTRD, PDSYTRD, PZHETRD | “PDSYNTRD, PZHENTRD, PDSYTRD, and PZHETRD — Reduce a Real Symmetric or Complex Hermitian Matrix to Tridiagonal Form” on page 786 |
| Reduce a Real Symmetric or Complex Hermitian Positive Definite Generalized Eigenproblem to Standard Form | PDSYNGST, PZHENGST, PDSYGST, PZHEGST | “PDSYNGST, PZHENGST, PDSYGST, and PZHEGST — Reduce a Real Symmetric or Complex Hermitian Positive Definite Generalized Eigenproblem to Standard Form” on page 802 |
| Reduce a General Matrix to Upper Hessenberg Form | PDGEHRD | “PDGEHRD — Reduce a General Matrix to Upper Hessenberg Form” on page 817 |
| Reduce a General Matrix to Bidiagonal Form | PDGEBRD, PZGEBRD | “PDGEBRD and PZGEBRD — Reduce a General Matrix to Bidiagonal Form” on page 826 |
| Singular Value Decomposition of a General Matrix | PDGESVD, PZGESVD | “PDGESVD and PZGESVD — Singular Value Decomposition of a General Matrix” on page 843 |

Fourier Transforms

The Fourier transform subroutines perform mixed-radix transforms in two and three dimensions. See references [1 on page 1079] and [3 on page 1079].
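As an illustration of what a two-dimensional complex transform computes, the following is a minimal serial sketch in Python. It evaluates the discrete Fourier transform directly from its definition, first along each row and then along each column, rather than with a mixed-radix fast algorithm. The function names `dft_1d` and `dft_2d`, the negative sign in the exponent, and the absence of a scaling factor are all assumptions made for this illustration; see the individual subroutine descriptions for the conventions that Parallel ESSL actually uses.

```python
import cmath


def dft_1d(seq):
    """Direct (O(n^2)) discrete Fourier transform of a sequence.

    Assumption: exponent sign is negative and no scaling is applied.
    """
    n = len(seq)
    return [
        sum(seq[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def dft_2d(matrix):
    """Two-dimensional transform of a list of rows.

    The 2-D transform is separable: transform every row, then
    transform every column of the intermediate result.
    """
    row_pass = [dft_1d(row) for row in matrix]
    # zip(*row_pass) iterates over columns of the intermediate result.
    col_pass = [dft_1d(list(col)) for col in zip(*row_pass)]
    # col_pass is indexed [column][row]; transpose back to [row][column].
    return [list(row) for row in zip(*col_pass)]


result = dft_2d([[1, 2], [3, 4]])
# result[0][0] is the sum of all four input elements.
```

The same row-then-column idea extends to three dimensions by adding one more pass along the third axis.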

Table 12. List of Fourier Transform Subroutines

| Descriptive Name | Short-Precision Subroutine | Long-Precision Subroutine | Location |
|---|---|---|---|
| Multidimensional Complex Fourier Transforms | PSCFTD | PDCFTD | “PSCFTD and PDCFTD — Multidimensional Complex Fourier Transforms” on page 863 |
| Multidimensional Real-to-Complex Fourier Transforms | PSRCFTD | PDRCFTD | “PSRCFTD and PDRCFTD — Multidimensional Real-to-Complex Fourier Transforms” on page 875 |
| Multidimensional Complex-to-Real Fourier Transforms | PSCRFTD | PDCRFTD | “PSCRFTD and PDCRFTD — Multidimensional Complex-to-Real Fourier Transforms” on page 884 |
| Complex Fourier Transforms in Two Dimensions | PSCFT2 | PDCFT2 | “PSCFT2 and PDCFT2 — Complex Fourier Transforms in Two Dimensions” on page 893 |
| Real-to-Complex Fourier Transforms in Two Dimensions | PSRCFT2 | PDRCFT2 | “PSRCFT2 and PDRCFT2 — Real-to-Complex Fourier Transforms in Two Dimensions” on page 900 |
| Complex-to-Real Fourier Transforms in Two Dimensions | PSCRFT2 | PDCRFT2 | “PSCRFT2 and PDCRFT2 — Complex-to-Real Fourier Transforms in Two Dimensions” on page 905 |
| Complex Fourier Transforms in Three Dimensions | PSCFT3 | PDCFT3 | “PSCFT3 and PDCFT3 — Complex Fourier Transforms in Three Dimensions” on page 910 |
| Real-to-Complex Fourier Transforms in Three Dimensions | PSRCFT3 | PDRCFT3 | “PSRCFT3 and PDRCFT3 — Real-to-Complex Fourier Transforms in Three Dimensions” on page 918 |
| Complex-to-Real Fourier Transforms in Three Dimensions | PSCRFT3 | PDCRFT3 | “PSCRFT3 and PDCRFT3 — Complex-to-Real Fourier Transforms in Three Dimensions” on page 924 |

Random Number Generation

The random number generation subroutine generates uniformly distributed random numbers.

Table 13. List of Random Number Generation Subroutines

| Descriptive Name | Long-Precision Subroutine | Location |
|---|---|---|
| Uniform Random Number Generator | PDURNG | “PDURNG — Uniform Random Number Generator” on page 933 |


Utilities

The utility subroutines perform general service functions that support Parallel ESSL.

Table 14. List of Utility Subroutines

| Descriptive Name | Subprogram | Location |
|---|---|---|
| Determine the Level of Parallel ESSL Installed on Your System | IPESSL | “IPESSL — Determine the Level of Parallel ESSL Installed on Your System” on page 942 |
| Initialize a Type-1 Array Descriptor with Error Checking | DESCINIT | “DESCINIT — Initialize a Type-1 Array Descriptor with Error Checking” on page 944 |
| Initialize a Type-1 Array Descriptor | DESCSET | “DESCSET — Initialize a Type-1 Array Descriptor” on page 947 |
| Compute the Ceiling of the Division of Two Integers | ICEIL | “ICEIL — Compute the Ceiling of the Division of Two Integers” on page 950 |
| Compute the Least Common Multiple of Two Positive Integers | ILCM | “ILCM — Compute the Least Common Multiple of Two Positive Integers” on page 951 |
| Compute the Local Row or Column Index of a Global Element of a Block-Cyclically Distributed Matrix | INDXG2L | “INDXG2L — Compute the Local Row or Column Index of a Global Element of a Block-Cyclically Distributed Matrix” on page 952 |
| Compute the Process Row or Column Index of a Global Element of a Block-Cyclically Distributed Matrix | INDXG2P | “INDXG2P — Compute the Process Row or Column Index of a Global Element of a Block-Cyclically Distributed Matrix” on page 954 |
| Compute the Global Row or Column Index of a Local Element of a Block-Cyclically Distributed Matrix | INDXL2G | “INDXL2G — Compute the Global Row or Column Index of a Local Element of a Block-Cyclically Distributed Matrix” on page 956 |
| Compute the Starting Local Row or Column Index and Process Row or Column Index of a Global Element of a Block-Cyclically Distributed Matrix | INFOG1L | “INFOG1L — Compute the Starting Local Row or Column Index and Process Row or Column Index of a Global Element of a Block-Cyclically Distributed Matrix” on page 958 |
| Compute the Starting Local Row and Column Indices and the Process Row and Column Indices of a Global Element of a Block-Cyclically Distributed Matrix | INFOG2L | “INFOG2L — Compute the Starting Local Row and Column Indices and the Process Row and Column Indices of a Global Element of a Block-Cyclically Distributed Matrix” on page 960 |
| Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process | NUMROC | “NUMROC — Compute the Number of Rows or Columns of a Block-Cyclically Distributed Matrix Contained in a Process” on page 963 |
| General Matrix Norm | PDLANGE, PZLANGE | “PDLANGE and PZLANGE — General Matrix Norm” on page 967 |
| Real Symmetric, Complex Symmetric, or Complex Hermitian Matrix Norm | PDLANSY, PZLANSY, PZLANHE | “PDLANSY, PZLANSY, and PZLANHE — Real Symmetric, Complex Symmetric, or Complex Hermitian Matrix Norm” on page 974 |
| Triangular or Trapezoidal Matrix Norm | PDLANTR, PZLANTR | “PDLANTR and PZLANTR — Triangular or Trapezoidal Matrix Norm” on page 983 |
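Several of the integer utilities listed above perform small calculations that are easy to state in code. The following Python sketch shows the arithmetic behind three of them, as described by their names in the table. The lowercase function names, the argument order (which is assumed to mirror the ScaLAPACK routine of the same name), and the use of 0-based process indices are assumptions made for this illustration; the subroutine descriptions remain the authority on the calling sequences.

```python
import math


def iceil(inum, idenom):
    """Ceiling of inum / idenom, using integer arithmetic only."""
    return -(-inum // idenom)


def ilcm(m, n):
    """Least common multiple of two positive integers."""
    return m * n // math.gcd(m, n)


def numroc(n, nb, iproc, isrcproc, nprocs):
    """Number of rows (or columns) of an n-element dimension held by
    process iproc, for block size nb distributed block-cyclically over
    nprocs processes, with the first block on process isrcproc.

    Assumption: process indices are 0-based.
    """
    # Distance of this process from the process that owns the first block.
    mydist = (nprocs + iproc - isrcproc) % nprocs
    nblocks = n // nb                      # number of complete blocks
    count = (nblocks // nprocs) * nb       # blocks every process receives
    extrablks = nblocks % nprocs           # leftover complete blocks
    if mydist < extrablks:
        count += nb                        # one extra complete block
    elif mydist == extrablks:
        count += n % nb                    # the final partial block
    return count


# 10 rows in blocks of 3 over 2 processes:
# process 0 holds blocks 1 and 3, process 1 holds block 2 and the last row.
rows_on_p0 = numroc(10, 3, 0, 0, 2)
rows_on_p1 = numroc(10, 3, 1, 0, 2)
```

A count computed this way is typically what is needed to size a local array before the global data is distributed.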


Chapter 2. Distributing Your Data

Before specifying and distributing data for your programs, you need to understand the general concepts involved in this process.

Concepts

It is important to understand the general concepts involved in the process of data distribution, specifically global data structures, process grids, and the block and cyclic distribution methods.

About Global Data Structures

Because the Parallel ESSL subroutines support the SPMD programming model, your global data structures (vectors, matrices, or sequences) must be distributed across your processes prior to calling the Parallel ESSL subroutines.

Conceptually, global data structures have a defined storage mode consistent with those used by the serial ESSL library, except for real symmetric and complex Hermitian tridiagonal matrices. For Parallel ESSL, you must store real symmetric and complex Hermitian tridiagonal matrices as described in “Block-Cyclically Distributing a Real Symmetric or Complex Hermitian Tridiagonal Matrix over One-Dimensional Process Grids” on page 48. For how to store all other data structures when using Parallel ESSL, see the appropriate section in the ESSL Guide and Reference. The FFT-packed storage mode is a new storage mode for Parallel ESSL and is described in “Specifying Sequences for the Fourier Transforms” on page 63.

Global data structures must be mapped to local (distributed memory) data structures, according to the data distribution technique supported by the Parallel ESSL subroutines that you are using. These local data structures are called local arrays.

The data distribution techniques described here apply equally to real and complex data structures.
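The mapping from a global data structure to local arrays can be sketched in a few lines of Python. The example below distributes a global vector block-cyclically over a one-dimensional set of processes. It is a serial illustration only: the function names `owner_and_local_index` and `distribute`, the 0-based indexing, and the choice of process 0 as the owner of the first block are assumptions made here, not part of Parallel ESSL.

```python
def owner_and_local_index(g, nb, nprocs):
    """Map a 0-based global index to (owning process, 0-based local index).

    Assumptions: blocks of nb consecutive elements are dealt out to
    processes 0, 1, ..., nprocs-1 in turn, starting with process 0.
    """
    block = g // nb                  # which global block the element is in
    proc = block % nprocs            # blocks are dealt out cyclically
    local_block = block // nprocs    # how many earlier blocks proc holds
    local = local_block * nb + g % nb
    return proc, local


def distribute(global_vec, nb, nprocs):
    """Split a global vector into one local array per process."""
    local_arrays = [[] for _ in range(nprocs)]
    for g, value in enumerate(global_vec):
        proc, _ = owner_and_local_index(g, nb, nprocs)
        # Elements arrive in increasing global order, so appending
        # places each one at its computed local index.
        local_arrays[proc].append(value)
    return local_arrays


# Ten elements, block size 3, two processes.
local_arrays = distribute(list(range(10)), nb=3, nprocs=2)
proc, local = owner_and_local_index(7, nb=3, nprocs=2)
```

With a block size of 1 this reduces to a purely cyclic distribution, and with a block size large enough that each process receives a single block it reduces to a pure block distribution.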

About Process Grids

A parallel machin
