
  • Parallel Numerical Algorithms
    Chapter 4 – Sparse Linear Systems
    Section 4.3 – Iterative Methods

    Michael T. Heath and Edgar Solomonik

    Department of Computer Science
    University of Illinois at Urbana-Champaign

    CS 554 / CSE 512

  • Outline

    1  Serial Iterative Methods
         Stationary Iterative Methods
         Krylov Subspace Methods

    2  Parallel Iterative Methods
         Partitioning
         Ordering
         Communication-Avoiding Iterative Methods
         Chaotic Relaxation

    3  Preconditioning
         Simple Preconditioners
         Domain Decomposition
         Incomplete LU

  • Iterative Methods for Linear Systems

    Iterative methods for solving linear system $Ax = b$ begin with initial
    guess for solution and successively improve it until solution is as
    accurate as desired

    In theory, infinite number of iterations might be required to converge
    to exact solution

    In practice, iteration terminates when residual $\|b - Ax\|$, or some
    other measure of error, is as small as desired

    Iterative methods are especially useful when matrix $A$ is sparse
    because, unlike direct methods, no fill is incurred

  • Jacobi Method

    Beginning with initial guess $x^{(0)}$, Jacobi method computes next
    iterate by solving for each component of $x$ in terms of others

    $$x_i^{(k+1)} = \Bigl(b_i - \sum_{j \ne i} a_{ij}\, x_j^{(k)}\Bigr)\big/ a_{ii}, \qquad i = 1, \ldots, n$$

    If $D$, $L$, and $U$ are diagonal, strict lower triangular, and strict
    upper triangular portions of $A$, then Jacobi method can be written

    $$x^{(k+1)} = D^{-1}\bigl(b - (L + U)\,x^{(k)}\bigr)$$
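    The matrix form translates directly into code; below is a minimal dense
    sketch in Python (illustrative only — the slides assume sparse storage,
    and the test system here is a made-up example):

      import numpy as np

      def jacobi_sweep(A, b, x):
          """One Jacobi iteration: every component computed from previous iterate."""
          d = np.diag(A)                 # diagonal entries a_ii
          R = A - np.diag(d)             # off-diagonal part L + U
          return (b - R @ x) / d         # x^(k+1) = D^(-1) (b - (L+U) x^(k))

      # Strictly diagonally dominant system, so Jacobi converges
      A = np.array([[4., 1., 0.], [1., 4., 1.], [0., 1., 4.]])
      b = np.array([1., 2., 3.])
      x = np.zeros(3)
      for _ in range(50):
          x = jacobi_sweep(A, b, x)
      print(np.linalg.norm(b - A @ x))   # residual norm, small after 50 sweeps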

  • Jacobi Method

    Jacobi method requires nonzero diagonal entries, which can usually be
    accomplished by permuting rows and columns if not already true

    Jacobi method requires duplicate storage for $x$, since no component
    can be overwritten until all new values have been computed

    Components of new iterate do not depend on each other, so they can be
    computed simultaneously

    Jacobi method does not always converge, but it is guaranteed to
    converge under conditions that are often satisfied (e.g., if matrix is
    strictly diagonally dominant), though convergence rate may be very slow

  • Gauss-Seidel Method

    Faster convergence can be achieved by using each new component value
    as soon as it has been computed rather than waiting until next
    iteration

    This gives Gauss-Seidel method

    $$x_i^{(k+1)} = \Bigl(b_i - \sum_{j<i} a_{ij}\, x_j^{(k+1)} - \sum_{j>i} a_{ij}\, x_j^{(k)}\Bigr)\big/ a_{ii}$$

    Using same notation as for Jacobi, Gauss-Seidel method can be written

    $$x^{(k+1)} = (D + L)^{-1}\bigl(b - U x^{(k)}\bigr)$$
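    For contrast with the Jacobi sketch above, one Gauss-Seidel sweep in
    Python (same dense numpy-array assumptions, illustrative only):

      def gauss_seidel_sweep(A, b, x):
          """One Gauss-Seidel iteration: components overwritten in order,
          so each update already sees the new values x[0..i-1]."""
          for i in range(len(b)):
              s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
              x[i] = (b[i] - s) / A[i, i]
          return x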

  • Gauss-Seidel Method

    Gauss-Seidel requires nonzero diagonal entries

    Gauss-Seidel does not require duplicate storage for $x$, since
    component values can be overwritten as they are computed

    But each component depends on previous ones, so they must be computed
    successively

    Gauss-Seidel does not always converge, but it is guaranteed to
    converge under conditions that are somewhat weaker than those for
    Jacobi method (e.g., if matrix is symmetric and positive definite)

    Gauss-Seidel converges about twice as fast as Jacobi, but may still be
    very slow

  • SOR Method

    Successive over-relaxation (SOR) uses step to next Gauss-Seidel
    iterate as search direction with fixed search parameter $\omega$

    SOR computes next iterate as

    $$x^{(k+1)} = x^{(k)} + \omega\bigl(x^{(k+1)}_{GS} - x^{(k)}\bigr)$$

    where $x^{(k+1)}_{GS}$ is next iterate given by Gauss-Seidel

    Equivalently, next iterate is weighted average of current iterate and
    next Gauss-Seidel iterate

    $$x^{(k+1)} = (1 - \omega)\, x^{(k)} + \omega\, x^{(k+1)}_{GS}$$

    If $A$ is symmetric, SOR can be written as application of a symmetric
    matrix; this is the symmetric successive over-relaxation (SSOR) method
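    A sketch of one SOR sweep (same dense numpy-array assumptions as the
    Jacobi and Gauss-Seidel sketches above):

      def sor_sweep(A, b, x, omega):
          """One SOR iteration: weighted average of current component and
          its Gauss-Seidel update, with relaxation parameter omega."""
          for i in range(len(b)):
              s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
              x_gs = (b[i] - s) / A[i, i]          # Gauss-Seidel value
              x[i] = (1 - omega) * x[i] + omega * x_gs
          return x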

  • SOR Method

    $\omega$ is fixed relaxation parameter chosen to accelerate
    convergence

    $\omega > 1$ gives over-relaxation, while $\omega < 1$ gives
    under-relaxation ($\omega = 1$ gives Gauss-Seidel method)

    SOR diverges unless $0 < \omega < 2$, but choosing optimal $\omega$ is
    difficult in general except for special classes of matrices

    With optimal value for $\omega$, convergence rate of SOR method can be
    order of magnitude faster than that of Gauss-Seidel

  • Conjugate Gradient Method

    If $A$ is $n \times n$ symmetric positive definite matrix, then
    quadratic function

    $$\phi(x) = \tfrac{1}{2}\, x^T A x - x^T b$$

    attains minimum precisely when $Ax = b$

    Optimization methods have form

    $$x_{k+1} = x_k + \alpha\, s_k$$

    where $\alpha$ is search parameter chosen to minimize objective
    function $\phi(x_k + \alpha\, s_k)$ along $s_k$

    For method of steepest descent, $s_k = -\nabla\phi(x)$

  • Conjugate Gradient Method

    For special case of quadratic problem:

    Negative gradient is residual vector

    $$-\nabla\phi(x) = b - Ax = r$$

    Optimal line search parameter is given by

    $$\alpha = r_k^T s_k \big/ s_k^T A s_k$$

    Successive search directions can easily be A-orthogonalized by
    three-term recurrence

    Using these properties, we obtain conjugate gradient method (CG) for
    linear systems

  • Conjugate Gradient Method

    $x_0$ = initial guess
    $r_0 = b - A x_0$
    $s_0 = r_0$
    for $k = 0, 1, 2, \ldots$
        $\alpha_k = r_k^T r_k \big/ s_k^T A s_k$
        $x_{k+1} = x_k + \alpha_k s_k$
        $r_{k+1} = r_k - \alpha_k A s_k$
        $\beta_{k+1} = r_{k+1}^T r_{k+1} \big/ r_k^T r_k$
        $s_{k+1} = r_{k+1} + \beta_{k+1} s_k$
    end
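    The recurrences translate almost line-for-line into Python; a minimal
    dense sketch (a tolerance test is added here, which the slide omits):

      import numpy as np

      def conjugate_gradient(A, b, x0, tol=1e-10):
          """CG for symmetric positive definite A, following the recurrences above."""
          x, r = x0.copy(), b - A @ x0
          s = r.copy()
          rr = r @ r
          for _ in range(len(b)):            # at most n steps in exact arithmetic
              As = A @ s
              alpha = rr / (s @ As)          # alpha_k = r_k^T r_k / s_k^T A s_k
              x += alpha * s
              r -= alpha * As
              rr_new = r @ r
              if np.sqrt(rr_new) < tol:
                  break
              beta = rr_new / rr             # beta_{k+1}
              s = r + beta * s               # next A-orthogonal direction
              rr = rr_new
          return x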

  • Conjugate Gradient Method

    Key features that make CG method effective

      Short recurrence determines search directions that are A-orthogonal
      (conjugate)

      Error is minimal over space spanned by search directions generated
      so far

    Minimum error property implies that method produces exact solution
    after at most $n$ steps

    In practice, rounding error causes loss of orthogonality that spoils
    finite termination property, so method is used iteratively

  • Conjugate Gradient Method

    Error is reduced at each iteration by factor of

    $$\bigl(\sqrt{\kappa} - 1\bigr)\big/\bigl(\sqrt{\kappa} + 1\bigr)$$

    on average, where

    $$\kappa = \mathrm{cond}(A) = \|A\| \cdot \|A^{-1}\| = \lambda_{\max}(A)\big/\lambda_{\min}(A)$$

    Thus, convergence tends to be rapid if matrix is well-conditioned, but
    can be arbitrarily slow if matrix is ill-conditioned

    But convergence also depends on clustering of eigenvalues of $A$
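    For example (a worked instance, not from the original slides): for
    $\kappa = 100$ the factor is $(\sqrt{100} - 1)/(\sqrt{100} + 1) = 9/11
    \approx 0.82$, so reducing the error by ten orders of magnitude takes
    roughly $\ln(10^{-10})/\ln(9/11) \approx 115$ iterations.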

  • Nonsymmetric Krylov Subspace Methods

    CG is not directly applicable to nonsymmetric or indefinite systems

    CG cannot be generalized to nonsymmetric systems without sacrificing
    one of its two key properties (short recurrence and minimum error)

    Nevertheless, several generalizations have been developed for solving
    nonsymmetric systems, including GMRES, QMR, CGS, BiCG, and Bi-CGSTAB

    These tend to be less robust and require more storage than CG, but
    they can still be very useful for solving large nonsymmetric systems

  • Parallel Implementation

    Iterative methods for linear systems are composed of basic operations
    such as

      vector updates (saxpy)
      inner products
      matrix-vector multiplication
      solution of triangular systems

    In parallel implementation, both data and operations are partitioned
    across multiple tasks

    In addition to communication required for these basic operations,
    necessary convergence test may require additional communication
    (e.g., sum or max reduction)

  • Partitioning of Vectors

    Iterative methods typically require several vectors, including
    solution $x$, right-hand side $b$, residual $r = b - Ax$, and possibly
    others

    Even when matrix $A$ is sparse, these vectors are usually dense

    These dense vectors are typically uniformly partitioned among $p$
    tasks, with given task holding same set of component indices of each
    vector

    Thus, vector updates require no communication, whereas inner products
    of vectors require reductions across tasks, at cost we have already
    seen

  • Partitioning of Sparse Matrix

    Sparse matrix $A$ can be partitioned among tasks by rows, by columns,
    or by submatrices

    Partitioning by submatrices may give uneven distribution of nonzeros
    among tasks; indeed, some submatrices may contain no nonzeros at all

    Partitioning by rows or by columns tends to yield more uniform
    distribution because sparse matrices typically have about same number
    of nonzeros in each row or column

  • Row Partitioning of Sparse Matrix

    Suppose that each task is assigned $n/p$ rows, yielding $p$ tasks,
    where for simplicity we assume that $p$ divides $n$

    In dense matrix-vector multiplication, since each task owns only
    $n/p$ components of vector operand, communication is required to
    obtain remaining components

    If matrix is sparse, however, few components may actually be needed,
    and these should preferably be stored in neighboring tasks

    Assignment of rows to tasks by contiguous blocks or cyclically would
    not, in general, result in desired proximity of vector components

  • Graph Partitioning

    Desired data locality can be achieved by partitioning graph of matrix,
    or partitioning underlying grid or mesh for finite difference or
    finite element problem

    For example, graph can be partitioned into $p$ pieces by nested
    dissection, and vector components corresponding to nodes in each
    resulting piece assigned to same task, with neighboring pieces
    assigned to neighboring tasks

    Then matrix-vector product requires relatively little communication,
    and only between neighboring tasks

  • Two-Dimensional Partitioning

    [figure]

  • Sparse MatVec with 2-D Partitioning

    Partition entries of both $x$ and $y$ across processors

    Partition entries of $A$ accordingly

    (a) Send entries $x_j$ to processors with nonzero $a_{ij}$ for some $i$
    (b) Local multiply-add: $y_i = y_i + a_{ij} x_j$
    (c) Send partial inner products to relevant processors
    (d) Local sum: sum partial inner products
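    Step (b) is an ordinary local sparse matrix-vector product; a serial
    CSR sketch in Python (the 2-D-partitioned version runs this on each
    processor's local block after the communication of step (a)):

      def csr_spmv(rowptr, colind, vals, x, y):
          """y += A x for a matrix in compressed sparse row (CSR) format."""
          for i in range(len(rowptr) - 1):
              for k in range(rowptr[i], rowptr[i + 1]):
                  y[i] += vals[k] * x[colind[k]]   # y_i += a_ij * x_j
          return y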

  • Sparse MatVec with 2-D Partitioning

    [figure]

  • Graph Partitioning Methods

    Coordinate-based
      Coordinate bisection
      Inertial
      Geometric

    Coordinate-free
      Level structure
      Spectral
      Multilevel
      Combinatorial refinement (e.g., Kernighan-Lin)

  • Graph Partitioning Software

    Chaco            http://www.cs.sandia.gov/~bahendr/chaco.html
    Jostle           http://staffweb.cms.gre.ac.uk/~c.walshaw/jostle/
    Meshpart         http://www.cerfacs.fr/algor/Softs/MESHPART/
    Metis/ParMetis   http://glaros.dtc.umn.edu/gkhome/views/metis
    Mondriaan        http://www.math.uu.nl/people/bisseling/Mondriaan/mondriaan.html
    Party            http://wwwcs.uni-paderborn.de/fachbereich/AG/monien/RESEARCH/PART/party.html
    Scotch           http://www.labri.fr/perso/pelegrin/scotch/
    Zoltan           http://www.cs.sandia.gov/Zoltan/

  • Coordinate-based partitioning

    Assume graph embedded in $d$-dimensional space, so we have coordinates
    for nodes

    Find centerpoints; every hyperplane that includes one is a good
    partition (G. Miller, S. Teng, W. Thurston, S. Vavasis, 1997)

    Possible to find vertex separators of size $O(n^{(d-1)/d})$ for meshes
    with constant aspect ratio – max relative distance of edges in space

  • Coordinate-free partitioning

    Partitioning is hard for arbitrary graphs

    Level partitioning – breadth-first-search (BFS) tree levels

    Maximal $k$-independent sets – select vertices separated by distance
    $k$ and create partitions by combining vertices with nearest vertex in
    $k$-independent set

    Spectral partitioning looks at eigenvalues of Laplacian matrix $L$ of
    a graph $G = (V, E)$,

    $$l_{ij} = \begin{cases} \mathrm{degree}(v_i) & i = j \\ -1 & (i,j) \in E \\ 0 & (i,j) \notin E \end{cases}$$

    the eigenvector of $L$ associated with the second smallest eigenvalue
    (the Fiedler vector) provides a good partition of $G$
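    A dense sketch of spectral bisection in Python (illustrative: real
    partitioners use sparse eigensolvers such as Lanczos, and the median
    split is just one common way to balance the two parts; adj is assumed
    to be a symmetric 0/1 numpy adjacency matrix):

      import numpy as np

      def spectral_bisection(adj):
          """Two-way partition from the sign pattern of the Fiedler vector."""
          L = np.diag(adj.sum(axis=1)) - adj    # graph Laplacian
          w, V = np.linalg.eigh(L)              # eigenvalues in ascending order
          fiedler = V[:, 1]                     # vector for second-smallest eigenvalue
          return fiedler >= np.median(fiedler)  # boolean part assignment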

  • Parallel Jacobi

    We have already seen example of this approach with Jacobi method for
    1-D Laplace equation

    Contiguous groups of variables are assigned to each task, so most
    communication is internal, and external communication is limited to
    nearest neighbors in 1-D mesh

    More generally, Jacobi method usually parallelizes well if underlying
    grid is partitioned in this manner, since all components of $x$ can be
    updated simultaneously

  • Parallel Gauss-Seidel and SOR

    Unfortunately, Gauss-Seidel and SOR methods require successive
    updating of solution components in given order (in effect, solving
    triangular system), rather than permitting simultaneous updating as in
    Jacobi method

  • Row-Wise Ordering for 2-D Grid

    [figure: graph $G(A)$ of 4 × 4 grid with nodes numbered row-wise 1–16,
    and nonzero pattern of corresponding matrix $A$]

  • Red-Black Ordering

    Apparent sequential order can be broken, however, if components are
    reordered according to coloring of underlying graph

    For 5-point discretization on square grid, for example, color
    alternate nodes in each dimension red and others black, giving color
    pattern of chess or checker board

    Then all red nodes can be updated simultaneously, as can all black
    nodes, so algorithm proceeds in alternating phases, first updating all
    nodes of one color, then those of other color, repeating until
    convergence
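    A minimal serial sketch of the two-phase sweep in Python, for the
    5-point Laplacian $-\nabla^2 u = f$ on a square grid with spacing h
    (within each phase the updates are independent, which is exactly what
    a parallel implementation exploits):

      def red_black_sweep(u, f, h):
          """One red-black Gauss-Seidel sweep: first all red points,
          then all black points, on the interior of the grid."""
          n = u.shape[0]
          for color in (0, 1):                  # phase 1: red, phase 2: black
              for i in range(1, n - 1):
                  for j in range(1, n - 1):
                      if (i + j) % 2 == color:  # checkerboard coloring
                          u[i, j] = 0.25 * (u[i-1, j] + u[i+1, j]
                                            + u[i, j-1] + u[i, j+1]
                                            + h * h * f[i, j])
          return u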

  • Red-Black Ordering for 2-D Grid

    [figure: same 4 × 4 grid with red-black node numbering, and nonzero
    pattern of correspondingly reordered matrix $A$, whose diagonal block
    for each color is itself diagonal]

  • Multicolor Orderings

    More generally, arbitrary graph requires more colors, so there is one
    phase per color in parallel algorithm

    Nodes must also be partitioned among tasks, and load should be
    balanced for each color

    Reordering nodes may affect convergence rate, however, so gain in
    parallel performance may be offset somewhat by slower convergence rate

  • Sparse Triangular Systems

    More generally, multicolor ordering of graph of matrix enhances
    parallel performance of sparse triangular solution by identifying sets
    of solution components that can be computed simultaneously (rather
    than in usual sequential order for triangular solution)

  • Parallel 1D 2-pt Stencil

    Normally, halo exchange done before every stencil application

  • In-time Blocking

    Instead apply stencil repeatedly before larger halo exchange

  • Analysis of in-time blocking for 1D mesh

    For 1-D 2-pt stencil (3-pt stencil similar)

    Consider $t$ steps, and execute $s$ of them at a time without messages

    Bring down latency cost by factor of $s$; for $t = \Theta(n)$, we
    improve latency cost from $\Theta(\alpha n)$ to $\Theta(\alpha p)$
    with $s = n/p$

    Also improves flop-to-byte ratio of local subcomputations
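    A sketch of the idea for the 1-D 2-pt stencil in Python: with one
    width-s halo from the right neighbor, s steps proceed with no further
    messages (the averaging update here is a stand-in for whatever 2-pt
    stencil is applied):

      import numpy as np

      def blocked_stencil_steps(u_local, halo_right, s):
          """s stencil steps per halo exchange instead of one exchange per
          step; the working array shrinks by one entry per step,
          consuming the halo."""
          v = np.concatenate([u_local, halo_right])  # halo_right has s entries
          for _ in range(s):
              v = 0.5 * (v[:-1] + v[1:])             # one 2-pt stencil application
          return v                                   # len(v) == len(u_local)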

  • Analysis of in-time blocking for 2-D mesh

    For 2-D mesh, there is more complexity

    Consider $t$ steps and execute $s \le \sqrt{n/p}$ without messages

    Need to do asymptotically more computation and interprocessor
    communication if $s > \sqrt{n/p}$

  • Asynchronous or Chaotic Relaxation

    Using updated values for solution components in Gauss-Seidel and SOR
    methods improves convergence rate, but limits parallelism and requires
    synchronization

    Alternatively, in computing next iterate, each processor could use
    most recent value it has for each solution component, rather than
    waiting for latest value on any processor

    This approach, sometimes called asynchronous or chaotic relaxation,
    can be effective, but stochastic behavior complicates analysis of
    convergence and convergence rate

  • Preconditioning

    Convergence rate of iterative methods depends on condition number and
    can often be substantially accelerated by preconditioning

    Apply method to $M^{-1}A$, where $M$ is chosen so that $M^{-1}A$ is
    better conditioned than $A$, and systems of form $Mz = Ay$ are easily
    solved for $z$

    Typically, $M$ is (block-)diagonal or triangular
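    In preconditioned CG, for example, $M$ enters only through solves
    $Mz = r$; a dense sketch in Python (apply_Minv is any routine applying
    $M^{-1}$, e.g. lambda r: r / np.diag(A) for the diagonal
    preconditioner on the next slide):

      import numpy as np

      def pcg(A, b, x0, apply_Minv, tol=1e-10):
          """Preconditioned CG: CG with the residual r replaced by
          z = M^{-1} r in the direction updates."""
          x = x0.copy()
          r = b - A @ x
          z = apply_Minv(r)
          s = z.copy()
          rz = r @ z
          for _ in range(len(b)):
              As = A @ s
              alpha = rz / (s @ As)
              x += alpha * s
              r -= alpha * As
              if np.linalg.norm(r) < tol:
                  break
              z = apply_Minv(r)
              rz_new = r @ z
              s = z + (rz_new / rz) * s      # preconditioned direction update
              rz = rz_new
          return x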

  • Basic Preconditioners: $M$ for $M^{-1}A$

    Polynomial (most commonly Chebyshev)

    $$M^{-1} = \mathrm{poly}(A)$$

    neither $M$ nor $M^{-1}$ explicitly formed, latter applied by multiple
    SpMVs

    Diagonal (Jacobi) (diagonal scaling)

    $$M = \mathrm{diag}(d)$$

    sometimes can use $d = \mathrm{diag}(A)$, easy but ineffective

  • Block Preconditioners: $M$ for $M^{-1}A$

    Consider partitioning

    $$A = \begin{bmatrix} B & E \\ F & C \end{bmatrix}, \qquad
      B = \begin{bmatrix} B_1 & & \\ & \ddots & \\ & & B_p \end{bmatrix}, \quad
      E = \begin{bmatrix} E_1 \\ \vdots \\ E_p \end{bmatrix}, \quad
      F = \begin{bmatrix} F_1 \\ \vdots \\ F_p \end{bmatrix}, \quad
      C = \begin{bmatrix} C_{11} & \cdots & C_{1p} \\ \vdots & & \vdots \\ C_{p1} & \cdots & C_{pp} \end{bmatrix}$$

    Block-diagonal (domain decomposition)

    $$M = \begin{bmatrix} B & \\ & I \end{bmatrix}
      \quad\text{so}\quad
      M^{-1}A \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}
      = \begin{bmatrix} y_1 + B^{-1}E\, y_2 \\ F y_1 + C y_2 \end{bmatrix}$$

    iterative methods with $M^{-1}A$ can compute each $B_i^{-1}$ in
    parallel to form and apply $B^{-1}$

  • Sparse Preconditioners: $M$ for $M^{-1}A$

    Incomplete LU (ILU) computes an approximate LU factorization, ignoring
    fill entries throughout regular sparse LU

    Let $S \subseteq \mathbb{N}^2$ be a sparsity mask and compute $L$, $U$
    on $S$

    Level-0 ILU factorization

    $$\mathrm{ILU}[0] : (i,j) \in S \text{ if and only if } a_{ij} \ne 0$$

    Given $[L^{(0)}, U^{(0)}] \leftarrow \mathrm{ILU}[0](A)$, our
    preconditioner will be $M = L^{(0)} U^{(0)} \approx A$

    Level-1 ILU factorization

    $$\mathrm{ILU}[1] : (i,j) \in S \text{ if } \exists k,\ l^{(0)}_{ik} u^{(0)}_{kj} \ne 0 \text{ or } a_{ij} \ne 0$$

    (generate fill only if updates are from non-newly-filled entries)
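    A dense sketch of ILU[0] in Python (illustrative: a real
    implementation works in sparse storage, but the masking logic is the
    same):

      import numpy as np

      def ilu0(A):
          """Gaussian elimination with updates restricted to the sparsity
          mask S = {(i,j) : a_ij != 0}, so no fill is generated."""
          n = A.shape[0]
          LU = A.astype(float)
          mask = A != 0
          for k in range(n - 1):
              for i in range(k + 1, n):
                  if not mask[i, k]:
                      continue
                  LU[i, k] /= LU[k, k]                 # multiplier l_ik
                  for j in range(k + 1, n):
                      if mask[i, j]:                   # keep update only inside S
                          LU[i, j] -= LU[i, k] * LU[k, j]
          return LU    # unit-lower L below diagonal, U on and above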

  • Parallel Incomplete LU

    When number of nonzeros per row is small, computing ILU[0] is as hard
    as triangular solves with the factors

    Elimination tree is given by spanning tree of original graph
    (filled graph = original graph)

    No need for symbolic factorization, and lower memory usage

    However, no general accuracy guarantees

    ILU can be approximated iteratively with high concurrency (see Chow
    and Patel, 2015)

  • References – Iterative Methods

    R. Barrett, M. Berry, T. Chan, J. Demmel, J. Donato, J. Dongarra,
    V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for
    the Solution of Linear Systems: Building Blocks for Iterative Methods,
    SIAM, 1994

    A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM, 1997

    Y. Saad, Iterative Methods for Sparse Linear Systems, 2nd ed., SIAM,
    2003

    H. A. van der Vorst, Iterative Krylov Methods for Large Linear
    Systems, Cambridge University Press, 2003

  • References – Parallel Iterative Methods

    J. W. Demmel, M. T. Heath, and H. A. van der Vorst, Parallel numerical
    linear algebra, Acta Numerica 2:111-197, 1993

    J. J. Dongarra, I. S. Duff, D. C. Sorensen, and H. A. van der Vorst,
    Numerical Linear Algebra for High-Performance Computers, SIAM, 1998

    I. S. Duff and H. A. van der Vorst, Developments and trends in the
    parallel solution of linear systems, Parallel Computing 25:1931-1970,
    1999

    M. T. Jones and P. E. Plassmann, The efficient parallel iterative
    solution of large sparse linear systems, A. George, J. R. Gilbert, and
    J. Liu, eds., Graph Theory and Sparse Matrix Computation,
    Springer-Verlag, 1993, pp. 229-245

    H. A. van der Vorst and T. F. Chan, Linear system solvers: sparse
    iterative methods, D. E. Keyes, A. Sameh, and V. Venkatakrishnan,
    eds., Parallel Numerical Algorithms, pp. 91-118, Kluwer, 1997

  • References – Preconditioning

    T. F. Chan and H. A. van der Vorst, Approximate and incomplete
    factorizations, D. E. Keyes, A. Sameh, and V. Venkatakrishnan, eds.,
    Parallel Numerical Algorithms, pp. 167-202, Kluwer, 1997

    M. J. Grote and T. Huckle, Parallel preconditioning with sparse
    approximate inverses, SIAM J. Sci. Comput. 18:838-853, 1997

    Y. Saad, Highly parallel preconditioners for general sparse matrices,
    G. Golub, A. Greenbaum, and M. Luskin, eds., Recent Advances in
    Iterative Methods, pp. 165-199, Springer-Verlag, 1994

    H. A. van der Vorst, High performance preconditioning, SIAM J. Sci.
    Stat. Comput. 10:1174-1185, 1989

    E. Chow and A. Patel, Fine-grained parallel incomplete LU
    factorization, SIAM J. Sci. Comput. 37(2):C169-C193, 2015

  • References – Graph Partitioning

    B. Hendrickson and T. Kolda, Graph partitioning models for parallel
    computing, Parallel Computing 26:1519-1534, 2000

    G. Karypis and V. Kumar, A fast and high quality multilevel scheme for
    partitioning irregular graphs, SIAM J. Sci. Comput. 20:359-392, 1999

    G. L. Miller, S.-H. Teng, W. Thurston, and S. A. Vavasis, Automatic
    mesh partitioning, A. George, J. R. Gilbert, and J. Liu, eds., Graph
    Theory and Sparse Matrix Computation, Springer-Verlag, 1993, pp. 57-84

    A. Pothen, Graph partitioning algorithms with applications to
    scientific computing, D. E. Keyes, A. Sameh, and V. Venkatakrishnan,
    eds., Parallel Numerical Algorithms, pp. 323-368, Kluwer, 1997

    C. Walshaw and M. Cross, Parallel optimisation algorithms for
    multilevel mesh partitioning, Parallel Comput. 26:1635-1660, 2000

  • References – Chaotic Relaxation

    R. Barlow and D. Evans, Synchronous and asynchronous iterative
    parallel algorithms for linear systems, Comput. J. 25:56-60, 1982

    R. Bru, L. Elsner, and M. Neumann, Models of parallel chaotic
    iteration methods, Linear Algebra Appl. 103:175-192, 1988

    G. M. Baudet, Asynchronous iterative methods for multiprocessors,
    J. ACM 25:226-244, 1978

    D. Chazan and W. L. Miranker, Chaotic relaxation, Linear Algebra Appl.
    2:199-222, 1969

    A. Frommer and D. B. Szyld, On asynchronous iterations, J. Comput.
    Appl. Math. 123:201-216, 2000
