A Factored Sparse Approximate Inverse software package (FSAIPACK) for the parallel preconditioning of linear systems

Massimiliano Ferronato, Carlo Janna, Giuseppe Gambolati, Flavio Sartoretto

Department ICEA

Sparse Days 2014, June 5-6

Outline

Introduction: preconditioning techniques for high performance computing

Approximate inverse preconditioning for Symmetric Positive Definite matrices: the FSAI-based approach

FSAIPACK: a software package for high performance FSAI preconditioning

Numerical results

Conclusions and future work

Introduction: Preconditioning techniques for high performance computing

Implementing large models is becoming common in several applications, making the use of parallel computational resources almost mandatory

One of the most expensive and memory-consuming tasks in any numerical application is the solution of large and sparse linear systems

Conjugate Gradient-like solution methods can be efficiently implemented on parallel computers provided that an effective parallel preconditioner is available

Algebraic preconditioners: robust algorithms that generate a preconditioner from the knowledge of the system matrix only, independently of the problem it arises from

Most popular and successful classes of preconditioners:

Incomplete LU factorizations, approximate inverses, algebraic multigrid

For parallel computations the Factorized Sparse Approximate Inverse (FSAI) approach is quite attractive, as it is «naturally» parallel

FSAIPACK: a parallel software package for high performance FSAI preconditioning in the solution of Symmetric Positive Definite linear systems

Collection of routines that implement several different existing methods for computing an FSAI-based preconditioner

Allows for a very flexible user-specified construction of a parallel FSAI preconditioner

General-purpose package that is easy to include as an external library in any existing code

Currently coded in Fortran 90 with OpenMP directives for shared memory machines

Freely available online at www.dmsa.unipd.it/~janna/software.html

The FSAI-based approach: FSAI definition

Factorized Sparse Approximate Inverse (FSAI): an almost perfectly parallel factored preconditioner for SPD problems [Kolotilina & Yeremin, 1993]:

$$M^{-1} = G^T G$$

with $G$ a lower triangular matrix such that:

$$\| I - G L \|_F \rightarrow \min$$

over the set of matrices with a prescribed lower triangular sparsity pattern $S_L$, e.g. the pattern of $A$ or $A^2$, where $L$ is the exact Cholesky factor of $A$

$L$ is not actually required for computing $G$!

Computed via the solution of n independent small dense systems and applied via matrix-vector products

Nice features: (1) ideally perfect parallel construction and application of the preconditioner; (2) preservation of the positive definiteness of the native matrix
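A brief sketch of why the Cholesky factor drops out of the computation, using only $A = L L^T$ and the fact that both $G$ and $L$ are lower triangular. The symbols $g_{ij}$, $l_{ii}$ and $\delta_{ij}$ (entries of $G$, diagonal entries of $L$, Kronecker delta) are introduced here for illustration only:

```latex
\begin{aligned}
\| I - GL \|_F^2 &= \mathrm{tr}\left[ (I - GL)(I - GL)^T \right]
  = n - 2\,\mathrm{tr}(GL) + \mathrm{tr}(G A G^T), \\
\mathrm{tr}(GL) &= \sum_{i=1}^{n} g_{ii}\, l_{ii}
  \quad \text{(both factors are lower triangular)}, \\
\frac{\partial}{\partial g_{ij}} \| I - GL \|_F^2
  &= 2\,[GA]_{ij} - 2\, l_{ii}\, \delta_{ij} = 0
  \quad \text{for all } (i,j) \in S_L .
\end{aligned}
```

Each row of $G$ therefore solves a small system involving only entries of $A$, with a right-hand side proportional to a unit vector. The unknown $l_{ii}$ only fixes the scaling of the row, and in practice it is replaced by requiring a unit diagonal for $G A G^T$.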

The key factor for the quality of any FSAI-based parallel preconditioner is the selection of the sparsity pattern $S_L$

Historically, the first idea to build $S_L$ was to define it a priori, but more effective strategies can be developed by dynamically selecting the position of the non-zero entries in $S_L$

Static FSAI: $S_L$ is defined a priori, e.g., as the pattern of $A^k$, possibly after a sparsification of $A$ [Huckle 1999; Chow 2000, 2001]

Dynamic FSAI: $S_L$ is defined dynamically during the computation of $G$ using some optimization algorithm [Huckle 2003; Janna & Ferronato, 2011]

Recurrent FSAI: the FSAI factor G is defined as the product of several factors, computed either statically or dynamically [Wang & Zhang 2003; Bergamaschi & Martinez 2012]

Post-filtration: it is generally recommended to apply an a posteriori sparsification of G dropping the smallest entries [Kolotilina & Yeremin, 1999]

FSAIPACK: Static FSAI construction

FSAIPACK is a software library that collects several different ways for computing an FSAI preconditioner in a shared memory environment and allows for combining the construction techniques into original user-specified strategies

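Assuming that $S_L$ is given, it is possible to compute $G$

Static FSAI: denote by $P_i$ the set of column indices belonging to the $i$-th row of $S_L$:

$$P_i = \{ j : (i,j) \in S_L \}, \qquad m_i = |P_i|$$

Compute the vector $\tilde{g}_i$ by solving the $m_i \times m_i$ linear system:

$$A[P_i, P_i] \, \tilde{g}_i = e_{m_i}$$

where $e_{m_i}$ is the $m_i$-th vector of the canonical basis, and scale $\tilde{g}_i$ to obtain the dense $i$-th row of $G$:

$$g_i = \frac{\tilde{g}_i}{\sqrt{\tilde{g}_{i,m_i}}}$$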
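The row-by-row construction can be sketched in a few lines of plain Python. This is only an illustration on a dense 3×3 matrix stored as nested lists, not the FSAIPACK implementation: the helper names `solve` and `static_fsai` and the test matrix are assumptions made for the example, and each row pattern is assumed to be sorted with the diagonal index last.

```python
import math

def solve(M, b):
    """Solve the small dense system M x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def static_fsai(A, pattern):
    """pattern[i]: sorted column indices of row i of S_L, with i itself last."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):  # every row is independent of the others
        P = pattern[i]
        sub = [[A[r][c] for c in P] for r in P]   # A[P_i, P_i]
        rhs = [0.0] * (len(P) - 1) + [1.0]        # last unit vector e_{m_i}
        g_tilde = solve(sub, rhs)
        scale = math.sqrt(g_tilde[-1])            # square root of the diagonal entry
        for c, v in zip(P, g_tilde):
            G[i][c] = v / scale
    return G

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
pattern = [[0], [0, 1], [1, 2]]  # lower triangular pattern of A
G = static_fsai(A, pattern)
# Diagonal of G A G^T, which the scaling is meant to make equal to 1
diag = [sum(G[i][j] * A[j][k] * G[i][k] for j in range(3) for k in range(3))
        for i in range(3)]
print(diag)
```

Since the loop body touches only row `i`, the rows could be distributed over threads without any synchronization, which is the property the slides refer to as an ideally parallel construction.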

FSAIPACK: Static pattern generation

Static pattern generation: $S_L$ is the lower triangular pattern of a power of $A$ or of a sparsified $A$

The non-zero pattern for the Static FSAI computation can be generated with the aid of the following recurrence:

$$B_i = \mathrm{Low}(B_{i-1} \tilde{A}), \qquad i = 1, \ldots, \kappa$$

with:

$$\tilde{a}_{ij} = 0 \quad \text{if} \quad |a_{ij}| < \tau \sqrt{a_{ii} \, a_{jj}}$$

and:

$$B_0 = \tilde{A}$$

User-specified parameters needed: $\kappa$ (integer), $\tau$ (real)
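A purely symbolic sketch of this kind of pattern generation, working on sets of column indices rather than numerical values. The function names, the `steps` argument (number of times the recurrence is applied) and the choice of the sparsified pattern as starting point are assumptions made for illustration; FSAIPACK's own routine may differ in these details.

```python
import math

def sparsified_pattern(A, tau):
    """Row-wise pattern of the sparsified matrix: entry (i, j) is dropped
    if |a_ij| < tau * sqrt(a_ii * a_jj). The diagonal survives for tau <= 1."""
    n = len(A)
    return [{j for j in range(n)
             if A[i][j] != 0.0
             and abs(A[i][j]) >= tau * math.sqrt(A[i][i] * A[j][j])}
            for i in range(n)]

def low(rows):
    """Keep only the lower triangular part (j <= i) of a row-wise pattern."""
    return [{j for j in cols if j <= i} for i, cols in enumerate(rows)]

def symbolic_product(B, At):
    """Pattern of B * At: row i is the union of the rows of At indexed by B[i]."""
    return [set().union(*(At[k] for k in cols)) for cols in B]

def static_pattern(A, tau, steps):
    At = sparsified_pattern(A, tau)
    B = At                              # assumed starting pattern
    for _ in range(steps):
        B = low(symbolic_product(B, At))
    return [sorted(cols) for cols in low(B)]

A = [[4.0, 1.0, 0.0, 0.0],
     [1.0, 4.0, 1.0, 0.0],
     [0.0, 1.0, 4.0, 1.0],
     [0.0, 0.0, 1.0, 4.0]]
print(static_pattern(A, 0.0, 1))
```

For the tridiagonal example, one application of the recurrence without dropping adds the second sub-diagonal to the pattern, so each row has at most three entries.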

FSAIPACK: Dynamic FSAI construction

For ill-conditioned problems, high values of $\kappa$ may be needed to properly decrease the iteration count, or even to allow for convergence, and the preconditioner construction and application can become quite heavy

A more efficient option relies on selecting the pattern dynamically by an adaptive procedure that picks what are, in some sense, the “best” available positions for the non-zero coefficients

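The Kaporin conditioning number of an SPD matrix is defined as:

$$\beta(A) = \frac{\mathrm{tr}(A)}{n \, \det(A)^{1/n}}$$

where:

$$\beta(A) \ge 1 \quad \text{and} \quad \beta(A) = 1 \ \text{iff} \ \lambda_1 = \lambda_2 = \cdots = \lambda_n$$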
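The Kaporin number is the ratio between the arithmetic and the geometric mean of the eigenvalues, which is why it is never smaller than 1. A small sketch for dense SPD matrices follows; the function name `kaporin` is hypothetical, and the determinant is obtained from a textbook Cholesky factorization, accumulated in logarithms to avoid overflow.

```python
import math

def kaporin(A):
    """Kaporin number of a dense SPD matrix: (tr(A) / n) / det(A)^(1/n)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):                      # Cholesky factorization A = L L^T
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    trace = sum(A[i][i] for i in range(n))
    return (trace / n) / math.exp(log_det / n)

print(kaporin([[1.0, 0.0], [0.0, 1.0]]))   # all eigenvalues equal
print(kaporin([[1.0, 0.0], [0.0, 4.0]]))   # eigenvalues 1 and 4
```

In the second call the arithmetic mean of the eigenvalues is 2.5 and the geometric mean is 2, so the value is larger than 1.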

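The Kaporin conditioning number of an FSAI preconditioned matrix reads [Janna & Ferronato 2011; Janna et al. 2014]:

$$\beta(G A G^T) = \frac{\left( \prod_{i=1}^{n} \psi_i \right)^{1/n}}{\det(A)^{1/n}}$$

where $\psi_i$ depends on the non-zero entries in the $i$-th row of $G$:

$$\psi_i = \tilde{G}[i,:] \, A \, \tilde{G}[i,:]^T = \tilde{g}_i^T A[P_i, P_i] \, \tilde{g}_i + 2 \, \tilde{g}_i^T A[P_i, i] + a_{ii}, \qquad i = 1, \ldots, n$$

Here $\tilde{G}$ is the unit-diagonal, unscaled factor, $\tilde{g}_i$ collects the off-diagonal non-zeroes of its $i$-th row, and $P_i$ their column indices

The scalar $\psi_i$ is a quadratic form of $A$ in $\tilde{g}_i$

Idea for generating the pattern dynamically: for each row, select the non-zero positions in $\tilde{g}_i$ providing the largest decrease in the $\psi_i$ value

Compute the gradient of $\psi_i$ with respect to $\tilde{g}_i$ and retain the positions containing the largest entries

The procedure can be iterated until either a maximum number of iterations or some exit tolerance is met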

Dynamic construction of FSAI by an adaptive pattern generation row-by-row:

Adaptive FSAI: $S_L$ is built dynamically and $G$ immediately computed, choosing $s$ entries per step, with a maximum number of $k_{max}$ steps, into the $i$-th row such that:

$$\psi_i^{(k)} < \psi_i^{(k-1)}, \qquad k = 1, \ldots, k_{max}$$

until the exit tolerance is achieved:

$$\frac{\psi_i^{(k)}}{\psi_i^{(0)}} \le \varepsilon$$

User-specified parameters needed: $k_{max}$ (integer), $s$ (integer), $\varepsilon$ (real)

The default initial guess $G_0$ is $\mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix is possible
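A minimal sketch of the adaptive procedure for a single row, under several assumptions: the matrix is dense and stored as nested lists, the row is expressed in unit-diagonal form (so the starting point is the row of the identity and the initial value of the quadratic form is the diagonal entry of A), and the exit test is the ratio between the current and the initial value. The names `solve` and `adaptive_row` are hypothetical, not FSAIPACK routines.

```python
def solve(M, b):
    """Solve the small dense system M x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    aug = [list(M[r]) + [b[r]] for r in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def adaptive_row(A, i, s, kmax, eps):
    """Return (P, g, psi): off-diagonal pattern, coefficients and psi for row i."""
    P, g = [], []
    psi0 = psi = A[i][i]  # value of the quadratic form for an empty pattern
    for _ in range(kmax):
        # Gradient of psi with respect to every unused position j < i
        grad = {j: 2.0 * (sum(A[j][c] * v for c, v in zip(P, g)) + A[j][i])
                for j in range(i) if j not in P}
        new = sorted(grad, key=lambda j: abs(grad[j]), reverse=True)[:s]
        new = [j for j in new if grad[j] != 0.0]
        if not new:
            break
        P = sorted(P + new)
        # Minimise psi on the enlarged pattern: A[P, P] g = -A[P, i]
        g = solve([[A[r][c] for c in P] for r in P], [-A[r][i] for r in P])
        psi = A[i][i] + sum(v * A[r][i] for v, r in zip(g, P))
        if psi / psi0 <= eps:
            break
    return P, g, psi

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
P1, g1, psi1 = adaptive_row(A, 2, 1, 1, 0.0)   # one step, one entry per step
P2, g2, psi2 = adaptive_row(A, 2, 1, 2, 0.0)   # two steps
print(P1, psi1)
print(P2, psi2)
```

In this example the first step picks the position with the largest gradient (column 1), and the second step adds column 0, which lowers the quadratic form further. The scaled row of G would then be obtained by dividing the unit-diagonal row by the square root of the final value.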

As $\psi_i$ is a quadratic form of $A$ in the $i$-th row of $G$, it can be minimized by using a gradient method

This gives rise to an iterative construction of $S_L$ and $G$, another kind of Dynamic FSAI

Iterative FSAI: the $i$-th row of $G$ is computed by minimizing $\psi_i$ with an incomplete Steepest Descent method:

$$\tilde{G}_{k+1}[i,:] = \tilde{G}_k[i,:] - \alpha_k \, \nabla\psi_i^{(k)T}, \qquad \alpha_k = \frac{\nabla\psi_i^{(k)T} \, \nabla\psi_i^{(k)}}{2 \, \nabla\psi_i^{(k)T} A \, \nabla\psi_i^{(k)}}$$

retaining the $s$ largest entries per row for $k_{iter}$ iterations until the exit tolerance $\varepsilon$ is achieved

User-specified parameters needed: $k_{iter}$ (integer), $s$ (integer), $\varepsilon$ (real)

The default initial guess $G_0$ is $\mathrm{diag}(A)^{-1/2}$, but any other user-specified lower triangular matrix is possible

The use of an inner preconditioner $M^{-1}$ is also allowed
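The incomplete Steepest Descent idea can be sketched for one row as follows. The assumptions are the same as for a plain quadratic minimization: the row is written in unit-diagonal form, `x` holds its entries in the columns before the diagonal, and after every exact line-search step only the `s` entries of largest magnitude are kept. The function name `iterative_row` is hypothetical, and the exit tolerance and the optional inner preconditioner are left out for brevity.

```python
def iterative_row(A, i, s, kiter):
    """Incomplete steepest descent on the quadratic form of row i.
    Returns (x, psi) with x the entries of the row in columns j < i."""
    x = [0.0] * i
    for _ in range(kiter):
        grad = [2.0 * (sum(A[j][c] * x[c] for c in range(i)) + A[j][i])
                for j in range(i)]
        curv = sum(grad[j] * A[j][c] * grad[c]
                   for j in range(i) for c in range(i))
        if curv == 0.0:  # zero gradient: already at the minimiser
            break
        alpha = sum(v * v for v in grad) / (2.0 * curv)  # exact line search
        x = [xj - alpha * gj for xj, gj in zip(x, grad)]
        # Incomplete step: keep only the s entries of largest magnitude
        keep = set(sorted(range(i), key=lambda j: abs(x[j]), reverse=True)[:s])
        x = [x[j] if j in keep else 0.0 for j in range(i)]
    psi = (sum(x[j] * A[j][c] * x[c] for j in range(i) for c in range(i))
           + 2.0 * sum(x[j] * A[j][i] for j in range(i)) + A[i][i])
    return x, psi

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
x_sparse, psi_sparse = iterative_row(A, 2, 1, 2)   # keep one entry per row
x_full, psi_full = iterative_row(A, 2, 2, 50)      # no dropping
print(x_sparse, psi_sparse)
print(x_full, psi_full)
```

With one retained entry the iteration settles on column 1 only, while without dropping it approaches the minimum of the quadratic form over both columns. Dropping entries can undo part of a descent step, so a monotone decrease is not guaranteed in general.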

FSAIPACK: Recurrent FSAI construction

Implicit construction of the sparsity pattern $S_L$, writing the FSAI preconditioner as a product of factors

Recurrent FSAI: the final factor $G$ is obtained as the product of $n_l$ factors:

$$G = \prod_{k=1}^{n_l} G_k$$

where $G_k$ is the $k$-level preconditioning factor for:

$$A_k = G_{k-1} A_{k-1} G_{k-1}^T$$

with $A_0 = A$ and $G_0 = I$. Even if each factor is very sparse and computationally very cheap, the resulting preconditioner is actually very dense and never formed explicitly:

$$G = G_{n_l} \cdots G_2 \, G_1$$
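Applying a preconditioner stored as a list of factors only needs successive matrix-vector products, so the dense product never has to be built. The sketch below uses small dense factors stored as nested lists and hypothetical helper names; an actual implementation would use sparse storage for each factor.

```python
def matvec(M, v):
    """Product M v for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matvec_t(M, v):
    """Product M^T v without forming the transpose."""
    return [sum(M[r][c] * v[r] for r in range(len(M)))
            for c in range(len(M[0]))]

def apply_recurrent(factors, v):
    """Return G^T G v with G = G_nl ... G_1, given factors = [G_1, ..., G_nl]."""
    w = list(v)
    for Gk in factors:             # w <- G_nl ... G_1 v
        w = matvec(Gk, w)
    for Gk in reversed(factors):   # w <- G_1^T ... G_nl^T w
        w = matvec_t(Gk, w)
    return w

G1 = [[1.0, 0.0], [1.0, 1.0]]
G2 = [[2.0, 0.0], [0.0, 1.0]]
result = apply_recurrent([G1, G2], [1.0, 1.0])
print(result)
```

The cost of one application is the sum of the costs of the individual factors, taken twice, regardless of how dense their product would be.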

FSAIPACK: Numerical results

Analysis of the properties of each single method on a structural test case (size = 190,581, no. of non-zeroes: 7,531,389):

[Figure: Static FSAI]

[Figure: Adaptive FSAI]

[Figure: Iterative FSAI]

[Figure: Recurrent FSAI]

Comparison between the different methods on a Linux cluster with 24 processors, at three values of the preconditioner density μ_G:

                μ_G = 0.50          μ_G = 1.00          μ_G = 2.00
Method      Tp [s]   # iter.    Tp [s]   # iter.    Tp [s]   # iter.
Static        0.20       885      0.84       858      2.68       558
Adaptive      1.24       622      1.96       557      7.47       444
Iterative     1.42       697      2.13       607      3.97       562
Recurrent     2.72       617      6.64       504     13.48       426

The most efficient option is combining the different methods so as to maximize the pros and minimize the cons

FSAIPACK implements all the methods for building an FSAI-based preconditioner following a user-specified strategy that can be prescribed by a pseudo-programming language

Examples and numerical results (Linux cluster, 24 processors)

EMILIA (reservoir mechanics): size = 923,136, non-zeroes = 41,005,206

Method                              # iter.   Tp [s]   Ts [s]   Tt [s]     μ_G
Static (κ=3, τ=1e-2)                   2245     16.0    101.4    117.4   0.214
Adaptive (kmax=10, s=5, ε=1e-2)         897      9.5     43.9     53.4   0.323
Iterative (kiter=20, s=10)             1332     27.8     59.1     86.9   0.213
Static + Adaptive                       861      6.2     34.3     40.5   0.270
Iterative + Static + Adaptive           675      9.3     33.8     43.1   0.332

Note: post-filtration is always applied

STOCF (porous media flow): size = 1,465,137, non-zeroes = 21,005,389

Method                              # iter.   Tp [s]   Ts [s]   Tt [s]     μ_G
Static (κ=4, τ=1e-2)                    736      1.6     24.8     26.4   0.329
Adaptive (kmax=20, s=1, ε=1e-3)         360      3.4     13.4     16.8   0.476
Iterative (kiter=10, s=10)             1204      5.3     40.9     46.2   0.387
Iterative + Static + S.P. Iterative     191      7.0      7.7     14.7   0.626
Static + S.P. Iterative + Adaptive      220      4.8      8.8     13.6   0.590

Note: post-filtration is always applied

MECH (structural mechanics): size = 1,102,614, non-zeroes = 48,987,558

Method                              # iter.   Tp [s]   Ts [s]   Tt [s]     μ_G
Static (κ=3, τ=1e-2)                   2208     23.1    119.4    142.5   0.238
Adaptive (kmax=25, s=2, ε=1e-3)         681     25.5     40.2     65.7   0.317
Iterative (kiter=30, s=10)             1981     40.0    102.0    142.0   0.187
Static + S.P. Iterative + Adaptive      661     16.5     39.3     55.8   0.305
Iterative + Adaptive                    689     14.9     39.8     54.7   0.294

Note: post-filtration is always applied

Example of strategy prescribed using the pseudo-programming language:

> MK_PATTERN [ A : patt ] -t -k 1e-2 2
> STATIC_FSAI [ A, patt : F ]
> TRANSP_FSAI [ F : Ft ]
> PROJ_FSAI [ A, F, Ft : F ] -n -s -e 1 10 1e-8
> ADAPT_FSAI [ A : F ] -n -s -e 10 1 1e-3
> POST_FILT [ A : F ] -t 0.01
> TRANSP_FSAI [ F : Ft ]
> APPEND_FSAI [ F, Ft : PREC ]

Complex strategies are also easy to manage
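To make the structure of such a strategy explicit, the sketch below splits one command line into its name, input objects, output objects and options. The grammar is inferred from the single example above and is an assumption, in particular the rule that all flags are listed first and their values follow in the same order; the function `parse_command` is hypothetical and is not part of FSAIPACK.

```python
import re

# Assumed line shape: "> NAME [ inputs : outputs ] -flag ... value ..."
COMMAND = re.compile(r">\s*(\w+)\s*\[\s*([^:\]]*?)\s*:\s*([^\]]*?)\s*\]\s*(.*)")
FLAG = re.compile(r"-[A-Za-z]\w*")

def parse_command(line):
    match = COMMAND.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognised command line: {line!r}")
    name, inputs, outputs, rest = match.groups()
    tokens = rest.split()
    flags = [t[1:] for t in tokens if FLAG.fullmatch(t)]
    values = [t for t in tokens if not FLAG.fullmatch(t)]
    if len(flags) != len(values):
        raise ValueError("each flag is assumed to take exactly one value")
    return {
        "name": name,
        "inputs": [x.strip() for x in inputs.split(",") if x.strip()],
        "outputs": [x.strip() for x in outputs.split(",") if x.strip()],
        "options": dict(zip(flags, values)),  # flags paired with values by position
    }

cmd = parse_command("> PROJ_FSAI [ A, F, Ft : F ] -n -s -e 1 10 1e-8")
print(cmd)
```

Read this way, every command consumes named objects and produces named objects, so a strategy is a short pipeline in which the output of one step (for instance F) feeds the next.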

FSAIPACK scalability on the largest example

Test on an IBM Blue Gene/Q node equipped with 16 cores

Between 16 and 64 threads the ideal profile is flat because all physical cores are saturated

Using more threads than cores still pays off, as it hides memory access latencies

Conclusions: Results…

FSAI-based approaches are attractive preconditioners for an efficient solution of SPD linear systems on parallel computers

The traditional static pattern generation is fast and cheap, but can give rise to poor preconditioners

The dynamic pattern generation can improve considerably the FSAI quality, especially in ill-conditioned problems, but its cost typically increases quite rapidly with the density of the preconditioner

FSAIPACK is a high performance software package for building an FSAI-based preconditioner using a user-specified strategy that combines different methods for selecting the sparsity pattern

A smart combination of static and dynamic pattern generation techniques is probably the most efficient way to build an effective preconditioner even for very ill-conditioned problems

Conclusions: … and future work

Generalizing the results to non-symmetric linear systems: difficulties with existence and uniqueness of the preconditioner, and with an efficient dynamic pattern generation

Implementing the FSAIPACK library also for distributed memory computers and GPU accelerators mixing OpenMP, MPI and CUDA

Studying in more detail the Iterative FSAI construction:

Analysis of the theoretical properties of Incomplete gradient methods

Replace the Incomplete Steepest Descent method with an Incomplete Self-Preconditioned Conjugate Gradient method

Understand why the pattern is generally good, even though the computed coefficients could be inaccurate

FSAIPACK is freely available online at: http://www.dmsa.unipd.it/~janna/software.html

Thank you for your attention