
Linear Triangular System

PowerPoint presentation uploaded by erik


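  • Linear Triangular System

    Lx = b, where L is an n×n nonsingular lower triangular matrix and b is a known vector. For n = 2 the equations are $L_{11}x_1 = b_1$ and $L_{21}x_1 + L_{22}x_2 = b_2$, so $x_1 = b_1/L_{11}$ and $x_2 = (b_2 - L_{21}x_1)/L_{22}$.

    Forward substitution, row version (the solution x overwrites b):

        b(1) = b(1)/L(1,1)
        for i = 2:n
            b(i) = (b(i) - L(i,1:i-1)*b(1:i-1)) / L(i,i)
        end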
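
    A minimal Python sketch of the row version follows. It assumes a dense matrix stored as a list of lists and uses 0-based indices rather than the 1-based indices of the pseudocode. The helper name `forward_sub_row` is illustrative. It returns x in a new list instead of overwriting b.

```python
def forward_sub_row(L, b):
    """Forward substitution, row version: solve L x = b, L lower triangular.

    Illustrative helper; indices are 0-based.
    """
    n = len(b)
    x = list(b)                      # copy so the caller's b is untouched
    x[0] = x[0] / L[0][0]
    for i in range(1, n):
        # inner product of row i (left of the diagonal) with the solved x(0:i)
        s = sum(L[i][j] * x[j] for j in range(i))
        x[i] = (x[i] - s) / L[i][i]
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, -1.0, 5.0]]
b = [2.0, 9.0, 16.0]
print(forward_sub_row(L, b))         # [1.0, 2.0, 3.0]
```

    Each x_i needs a full inner product with the entries solved before it, so the row version proceeds strictly one row at a time.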

  • Triangular System

    Column version (column sweep method): as soon as a variable is solved, its effect can be subtracted from the subsequent equations of Lx = b.

        for j = 1:n-1
            b(j) = b(j)/L(j,j)
            b(j+1:n) = b(j+1:n) - b(j)*L(j+1:n,j)
        end
        b(n) = b(n)/L(n,n)

    Forward substitution, column version. The column version is more amenable to parallel computing: at step j, the n-j updates of b(j+1:n) are independent of one another.
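
    The column sweep can be sketched the same way, in Python with 0-based indices and an illustrative helper `forward_sub_col`. On the same system it gives the same answer as the row version. The work at step j, however, is an independent update of every later entry of b.

```python
def forward_sub_col(L, b):
    """Forward substitution, column version (column sweep): solve L x = b.

    Illustrative helper; indices are 0-based.
    """
    n = len(b)
    x = list(b)
    for j in range(n - 1):
        x[j] = x[j] / L[j][j]
        # b(j+1:n) -= x(j) * L(j+1:n, j); the iterations of this loop
        # do not depend on each other, so they could run in parallel
        for i in range(j + 1, n):
            x[i] -= x[j] * L[i][j]
    x[n - 1] = x[n - 1] / L[n - 1][n - 1]
    return x


L = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, -1.0, 5.0]]
print(forward_sub_col(L, [2.0, 9.0, 16.0]))   # [1.0, 2.0, 3.0]
```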

  • Triangular System: Parallel

    [Figure: rows of L and b partitioned among CPUs]

    As soon as x_i (or a few x_i) has been computed, its value is passed down to the neighboring CPU. A CPU that receives an x_i value first passes it on to its own lower neighbor and then updates its local part of b.

    Disadvantage: load imbalance; on average only about 50% of the CPUs are active. Remedy: cyclic or block-cyclic distribution of rows.

    [Figure: block vs. block-cyclic distribution of the rows of L and b]
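
    To make the imbalance concrete, this Python sketch counts a simplified cost per processor under a block and a cyclic row distribution. Row i (0-based) of the forward substitution is charged i+1 operations. The sketch measures total work only, not the pipelined timing described above. The cost model and the helper names are assumptions for illustration, and the cyclic case is block-cyclic with block size 1.

```python
def row_cost(i):
    """Assumed cost of row i (0-based): i multiply-adds plus 1 division."""
    return i + 1


def work_per_proc(n, P, owner):
    """Total assumed cost assigned to each of P processors under a row -> owner map."""
    work = [0] * P
    for i in range(n):
        work[owner(i)] += row_cost(i)
    return work


n, P = 8, 4


def block_owner(i):          # contiguous blocks of n/P rows (assumes P divides n)
    return i // (n // P)


def cyclic_owner(i):         # rows dealt out round-robin
    return i % P


print(work_per_proc(n, P, block_owner))    # [3, 7, 11, 15]
print(work_per_proc(n, P, cyclic_owner))   # [6, 8, 10, 12]
```

    The rows near the bottom of L are the longest, so a block distribution piles the work onto the last processors, while the cyclic distribution spreads it much more evenly.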

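  • Triangular System: Inversion

    A is an N×N lower triangular matrix. Divide A into equal blocks, with C the lower-left block:

    $$A = \begin{pmatrix} A_1 & 0 \\ C & A_2 \end{pmatrix}, \qquad A^{-1} = \begin{pmatrix} A_1^{-1} & 0 \\ X & A_2^{-1} \end{pmatrix}, \qquad X = -A_2^{-1} C A_1^{-1},$$

    where X follows from requiring the lower-left block of $AA^{-1}$ to vanish: $CA_1^{-1} + A_2X = 0$.

    A can therefore be inverted recursively: invert A1; invert A2 (independently of A1); then compute X by matrix multiplication.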
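
    A recursive Python sketch of this scheme follows. It assumes list-of-lists matrices and splits at h = n // 2, so n need not be even. The helper names `matmul` and `tri_inv` are illustrative, and `C` denotes the lower-left block of A.

```python
def matmul(A, B):
    """Plain triple-loop product of list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def tri_inv(A):
    """Invert a nonsingular lower triangular matrix recursively.

    With A = [[A1, 0], [C, A2]]:
    inv(A) = [[inv(A1), 0], [X, inv(A2)]],  X = -inv(A2) * C * inv(A1).
    """
    n = len(A)
    if n == 1:
        return [[1.0 / A[0][0]]]
    h = n // 2
    A1 = [row[:h] for row in A[:h]]
    C = [row[:h] for row in A[h:]]
    A2 = [row[h:] for row in A[h:]]
    A1i = tri_inv(A1)          # these two recursive calls are independent
    A2i = tri_inv(A2)          # and could run in parallel
    X = [[-v for v in row] for row in matmul(matmul(A2i, C), A1i)]
    top = [A1i[i] + [0.0] * (n - h) for i in range(h)]
    bottom = [X[i] + A2i[i] for i in range(n - h)]
    return top + bottom


L = [[2.0, 0.0, 0.0],
     [1.0, 4.0, 0.0],
     [3.0, -1.0, 5.0]]
print(matmul(L, tri_inv(L)))   # identity, up to rounding
```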

  • Triangular System: Inversion

    First phase: invert the diagonal elements of A.
    Second phase: compute the 2×2 diagonal blocks of $A^{-1}$.
    k-th phase: compute the $2^{k-1} \times 2^{k-1}$ diagonal blocks of $A^{-1}$.

    Each phase is essentially matrix multiplication. The k-th phase consists of $N/2^{k-1}$ pairs of $2^{k-2} \times 2^{k-2}$ matrix multiplications (one pair per diagonal block), and these are independent of one another.

    Can be done in parallel on P = K^3 processors.
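
    A phase-by-phase Python sketch follows, assuming N is a power of two. Phase 1 inverts the diagonal. Each later phase joins pairs of already-inverted diagonal blocks of size h into one block of size 2h, using X = -inv(A2)*C*inv(A1), where C is the lower-left block of A within that diagonal block. The helper names are illustrative. The loop over `b0` is the part that would be distributed across processors.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def sub(M, r0, c0, size):
    """Square block of M starting at (r0, c0)."""
    return [row[c0:c0 + size] for row in M[r0:r0 + size]]


def tri_inv_phases(A):
    """Invert a lower triangular N x N matrix (N a power of two) phase by phase."""
    N = len(A)
    assert N & (N - 1) == 0, "this sketch assumes N is a power of two"
    R = [[0.0] * N for _ in range(N)]
    for i in range(N):                     # phase 1: invert diagonal elements
        R[i][i] = 1.0 / A[i][i]
    h = 1                                  # size of the blocks already inverted
    while h < N:                           # phase k builds blocks of size 2h = 2^(k-1)
        for b0 in range(0, N, 2 * h):      # N/(2h) independent diagonal blocks
            A1i = sub(R, b0, b0, h)
            A2i = sub(R, b0 + h, b0 + h, h)
            C = sub(A, b0 + h, b0, h)
            X = matmul(matmul(A2i, C), A1i)    # one pair of h x h products
            for i in range(h):
                for j in range(h):
                    R[b0 + h + i][b0 + j] = -X[i][j]
        h *= 2
    return R


A = [[2.0, 0.0, 0.0, 0.0],
     [1.0, 3.0, 0.0, 0.0],
     [0.0, 1.0, 4.0, 0.0],
     [2.0, 0.0, 1.0, 5.0]]
print(matmul(A, tri_inv_phases(A)))    # identity, up to rounding
```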

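  • Gaussian Elimination

    Solve Ax = b by factoring A = LU, where L is unit lower triangular and U is upper triangular:

    Ax = b  ⇒  LUx = b  ⇒  solve Ly = b, then Ux = y.

    This is especially useful with multiple right-hand sides, or when the same equations (same coefficient matrix) are solved many times. A is factored once, and each new b then costs only two triangular solves.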

  • LU Factorization

    A = LU, where A is an n×n matrix with A(1:k,1:k) nonsingular for k = 1:n-1.

    (kij) version:

        for k = 1:n-1
            for i = k+1:n
                A(i,k) = A(i,k)/A(k,k)
                for j = k+1:n
                    A(i,j) = A(i,j) - A(i,k)*A(k,j)
                end
            end
        end

    or, in vector form:

        for k = 1:n-1
            A(k+1:n,k) = A(k+1:n,k)/A(k,k)
            A(k+1:n,k+1:n) = A(k+1:n,k+1:n) - A(k+1:n,k)*A(k,k+1:n)
        end

    After factorization, L is in the strictly lower triangular part of A (its unit diagonal is not stored), and U is in the upper triangular part of A (including the diagonal). A(k,k) is the pivot.
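
    A direct Python transcription of the (kij) version follows, 0-based and in place. An illustrative helper `split_lu` unpacks the stored factors so they can be checked against A.

```python
def lu_kij(A):
    """LU factorization without pivoting, kij order, in place.

    Afterwards the strictly lower part of A holds L (unit diagonal implied)
    and the upper part, including the diagonal, holds U. Assumes every
    pivot A[k][k] encountered is nonzero.
    """
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            A[i][k] = A[i][k] / A[k][k]            # multiplier l(i,k)
            for j in range(k + 1, n):
                A[i][j] = A[i][j] - A[i][k] * A[k][j]
    return A


def split_lu(F):
    """Unpack the factored array into explicit L (unit lower) and U."""
    n = len(F)
    L = [[F[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[F[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
F = lu_kij([row[:] for row in A])
print(F)    # [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
```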

  • Factorization Breakdown

    If A(k,k) = 0 at any stage, the method breaks down; an LU factorization may not exist even if A is nonsingular. For example, $\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$ is nonsingular but has no LU factorization.

    Theorem: Let A be an n×n matrix.
    (1) A has an LU factorization if A(1:k,1:k) is nonsingular for all k = 1:n-1.
    (2) If the LU factorization exists and A is nonsingular, then the LU factorization (with L unit lower triangular) is unique.

    Pivoting avoids method breakdown. Pivoting is also necessary to improve accuracy: small pivots produce large multipliers and increased rounding errors. Make sure no large entries appear in L or U: use large pivots.
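
    The accuracy point can be seen on a 2×2 system. This Python sketch uses an illustrative helper `solve_no_pivot`, which performs plain Gaussian elimination with no pivoting. It first solves a system whose first pivot is 1e-20; the exact solution is very close to x = (1, 1). It then solves the same system with the two rows interchanged, which is what partial pivoting would choose. Finally, it shows the breakdown on the nonsingular matrix with a zero pivot.

```python
def solve_no_pivot(A, b):
    """Gaussian elimination and back substitution with NO pivoting."""
    n = len(A)
    A = [row[:] for row in A]            # work on copies
    b = list(b)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]        # raises ZeroDivisionError if the pivot is 0
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x


# tiny pivot: the multiplier 1e20 swamps the data and x1 comes out as 0.0
print(solve_no_pivot([[1e-20, 1.0], [1.0, 1.0]], [1.0, 2.0]))   # [0.0, 1.0]
# rows interchanged (large pivot): both components are accurate
print(solve_no_pivot([[1.0, 1.0], [1e-20, 1.0]], [2.0, 1.0]))   # [1.0, 1.0]
# zero pivot on a nonsingular matrix: the method breaks down
try:
    solve_no_pivot([[0.0, 1.0], [1.0, 1.0]], [1.0, 2.0])
except ZeroDivisionError:
    print("breakdown: zero pivot")
```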

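  • Block LU Factorization

    A is an n×n matrix with n = r*N. Partition A with A11 an r×r matrix, A12 an r×(n-r) matrix, A21 an (n-r)×r matrix, and A22 an (n-r)×(n-r) matrix:

    $$\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} = \begin{pmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{pmatrix} \begin{pmatrix} U_{11} & U_{12} \\ 0 & U_{22} \end{pmatrix}$$

    Comparing blocks:
    A11 = L11*U11: LU-factor the r×r diagonal block.
    A12 = L11*U12: solve the triangular systems for U12.
    A21 = L21*U11: solve the triangular systems for L21.
    A22 = L21*U12 + L22*U22, so $\tilde{A} = A_{22} - L_{21}U_{12} = L_{22}U_{22}$.

    Applying the same procedure to $\tilde{A}$ computes the LU factorization iteratively, r columns at a time.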

  • Block LU Factorization

    A is an n×n matrix with A(1:k,1:k) nonsingular for k = 1:n-1.
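
    A Python sketch of block LU without pivoting follows, assuming block size r. It follows the four block equations of the previous slide: A11 = L11*U11, A12 = L11*U12, A21 = L21*U11, then the update of A22. The slide's own algorithm text is not preserved, so this is a standard right-looking variant rather than the author's exact code. `block_lu` is an illustrative name.

```python
def block_lu(A, r):
    """Block LU factorization without pivoting, in place, block size r.

    Each pass follows the block equations:
      A11 = L11*U11            -> unblocked LU of the r x r diagonal block
      A12 = L11*U12            -> forward substitution (L11 unit lower) for U12
      A21 = L21*U11            -> solve L21*U11 = A21 row by row for L21
      A22 - L21*U12 = L22*U22  -> update the trailing block, then repeat on it
    Assumes all leading principal submatrices are nonsingular.
    """
    n = len(A)
    for k0 in range(0, n, r):
        k1 = min(k0 + r, n)
        for k in range(k0, k1):                      # A11 = L11*U11
            for i in range(k + 1, k1):
                A[i][k] /= A[k][k]
                for j in range(k + 1, k1):
                    A[i][j] -= A[i][k] * A[k][j]
        for j in range(k1, n):                       # U12 = inv(L11) * A12
            for i in range(k0, k1):
                for m in range(k0, i):
                    A[i][j] -= A[i][m] * A[m][j]
        for i in range(k1, n):                       # L21 = A21 * inv(U11)
            for j in range(k0, k1):
                for m in range(k0, j):
                    A[i][j] -= A[i][m] * A[m][j]
                A[i][j] /= A[j][j]
        for i in range(k1, n):                       # A22 <- A22 - L21*U12
            for j in range(k1, n):
                A[i][j] -= sum(A[i][m] * A[m][j] for m in range(k0, k1))
    return A


A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
print(block_lu([row[:] for row in A], 2))   # [[2.0, 1.0, 1.0], [2.0, 1.0, 1.0], [4.0, 3.0, 2.0]]
```

    With r = 1 the sketch reduces to the unblocked (kij) algorithm. Larger r groups most of the work into the A22 update, which is a matrix-matrix product.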
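  • Permutation Matrix

    A permutation matrix is the identity matrix with its rows reordered. The vector p = [4 1 3 2] encodes the permutation matrix

    $$P = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix},$$

    where p(k) is the column index of the 1 in the k-th row. PA is a row-permuted version of A (row k of PA is row p(k) of A); AP is a column-permuted version of A.

    Interchange permutation matrix: the identity matrix with two rows swapped, e.g. E with rows 1 and 4 swapped. EA swaps rows 1 and 4 of A; AE swaps columns 1 and 4 of A.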
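
    To check the encoding, this Python sketch builds P from p using an illustrative helper `perm_matrix`, with p 1-based as on the slide. It multiplies P by a test matrix whose entry (i, j) is the two-digit number ij, so it is easy to see which row or column landed where.

```python
def perm_matrix(p):
    """Permutation matrix from a 1-based vector p: row k has its 1 in column p(k)."""
    n = len(p)
    return [[1 if j == p[k] - 1 else 0 for j in range(n)] for k in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[11, 12, 13, 14],
     [21, 22, 23, 24],
     [31, 32, 33, 34],
     [41, 42, 43, 44]]
P = perm_matrix([4, 1, 3, 2])
print(matmul(P, A))   # rows 4, 1, 3, 2 of A
E = perm_matrix([4, 2, 3, 1])          # interchange matrix: rows 1 and 4 swapped
print(matmul(E, A))   # rows 1 and 4 of A swapped
print(matmul(A, E))   # columns 1 and 4 of A swapped
```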

  • Permutation Matrix

    A permutation matrix can be expressed as a series of row interchanges. If E_k is the interchange permutation matrix with rows k and p(k) interchanged, then $P = E_n E_{n-1} \cdots E_1$, and P can be encoded by the vector p(1:n). Here p(k) records an interchange, not the column index of the 1 in row k as on the previous slide.

    If x(1:n) is a vector, then Px can be computed using p(1:n):

        for k = 1:n
            swap x(k) and x(p(k))
        end

    The p(1:n) vector is useful for pivoting.
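
    The loop above can be written in Python as follows. The name `apply_interchanges` is illustrative, and p stays 1-based as on the slide, while Python lists are 0-based.

```python
def apply_interchanges(x, p):
    """Compute P x for P = E_n ... E_1, where E_k swaps entries k and p(k).

    p is 1-based, as on the slides; x is copied rather than modified.
    """
    y = list(x)
    for k in range(len(p)):
        j = p[k] - 1
        y[k], y[j] = y[j], y[k]
    return y


print(apply_interchanges(["a", "b", "c"], [3, 3, 3]))   # ['c', 'a', 'b']
# applying the interchanges to 1..n recovers the "column of the 1" form
print(apply_interchanges([1, 2, 3], [3, 3, 3]))         # [3, 1, 2]
```

    Applying the interchanges to the index vector 1..n converts the interchange encoding into the "column of the 1 in row k" encoding.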

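  • Partial Pivoting

    Pivoting is crucial for preventing breakdown and improving accuracy.

    Partial pivoting: choose the element of largest absolute value in a column (or row), and interchange rows (or columns) to bring it into the pivot position.

    [Figure: worked example; step 1 swaps rows 1 and 3, step 2 swaps rows 2 and 3]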

  • LU Factorization with Row Partial Pivoting

    A is an n×n matrix. After factorization, the strictly lower triangular part of A contains L, the upper triangular part contains U, and the vector p(1:n-1) records the interchanges made during partial pivoting.

    Algorithm F2:

        for k = 1:n-1
            Determine s with k

  • How to Use Factorized A

    Solve Ax = b using the LU factorization with row partial pivoting. The elements of b must be swapped according to the partial pivoting information in p(1:n-1).

    Assume A has been LU-factorized with row partial pivoting using algorithm F2:

        for k = 1:n-1
            swap b(k) and b(p(k))
        end
        Solve Ly = b
        Solve Ux = y

    L is the unit lower triangular matrix whose strictly lower triangular part is that of A; U is the upper triangular part of A (including the diagonal).
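
    The slide's statement of Algorithm F2 is cut off after "Determine s with k". The following Python sketch is therefore a standard row-partial-pivoting LU, under the assumption that F2 interchanges entire rows. With that choice, the strictly lower part of A holds the L of PA = LU, which matches the solve procedure above. Helper names are illustrative, and p is 0-based. The test matrix has a zero in position (1,1), so elimination without pivoting would break down on it.

```python
def lu_row_pivot(A):
    """LU with row partial pivoting, in place, swapping whole rows.

    Returns p (0-based): at step k rows k and p[k] were interchanged.
    Afterwards P A = L U, with L (unit diagonal implied) in the strictly
    lower part of A and U in the upper part. Assumes A is nonsingular.
    """
    n = len(A)
    p = []
    for k in range(n - 1):
        # s = index of the entry of largest magnitude in A(k:n, k)
        s = max(range(k, n), key=lambda i: abs(A[i][k]))
        p.append(s)
        A[k], A[s] = A[s], A[k]          # interchange entire rows k and s
        if A[k][k] == 0.0:
            raise ValueError("matrix is singular")
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return p


def solve_row_pivot(F, p, b):
    """Solve A x = b from the output of lu_row_pivot."""
    n = len(F)
    y = list(b)
    for k in range(n - 1):               # swap b(k) and b(p(k))
        y[k], y[p[k]] = y[p[k]], y[k]
    for i in range(n):                   # L y = b (unit lower triangular)
        y[i] -= sum(F[i][j] * y[j] for j in range(i))
    for i in range(n - 1, -1, -1):       # U x = y, x overwrites y
        y[i] = (y[i] - sum(F[i][j] * y[j] for j in range(i + 1, n))) / F[i][i]
    return y


A = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, 3.0]]                   # zero in position (1,1)
F = [row[:] for row in A]
p = lu_row_pivot(F)
print(p, solve_row_pivot(F, p, [7.0, 6.0, 13.0]))   # [2, 2] [1.0, 2.0, 3.0]
```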

  • LU Factorization with Row Partial Pivoting

    A is an n×n matrix. After factorization, the strictly lower triangular part of A contains the multipliers, the upper triangular part contains U, and the vector p(1:n-1) records the interchanges made during partial pivoting.

    Algorithm F1:

        for k = 1:n-1
            Determine s with k

  • How to Use Factorized A

    Solve Ax = b using the LU factorization with partial pivoting. The elements of b must be swapped according to the partial pivoting information in p(1:n-1), and the multipliers stored in the lower triangular part of A must be applied along the way.

    Assume A has been LU-factorized with partial pivoting using algorithm F1:

        for k = 1:n-1
            swap b(k) and b(p(k))
            b(k+1:n) = b(k+1:n) - b(k)*A(k+1:n,k)
        end
        Solve Ux = b

    U is the upper triangular part of A (including the diagonal).
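
    The following Python sketch of the F1 variant is again a reconstruction, because the slide's algorithm text is truncated. It assumes F1 interchanges only the not-yet-eliminated parts A(k,k:n) and A(s,k:n), so earlier multipliers stay in their original rows. That is why the solve must apply the swaps and the multipliers to b in step-by-step order. Names are illustrative, and p is 0-based.

```python
def lu_row_pivot_f1(A):
    """LU with row partial pivoting, swapping only A(k, k:n) and A(s, k:n).

    Multipliers from earlier steps stay in place, so they must be applied
    to b step by step during the solve. Returns 0-based p.
    """
    n = len(A)
    p = []
    for k in range(n - 1):
        s = max(range(k, n), key=lambda i: abs(A[i][k]))
        p.append(s)
        A[k][k:], A[s][k:] = A[s][k:], A[k][k:]     # partial-row interchange
        if A[k][k] == 0.0:
            raise ValueError("matrix is singular")
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                      # multiplier
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return p


def solve_f1(F, p, b):
    """Solve A x = b from the output of lu_row_pivot_f1."""
    n = len(F)
    y = list(b)
    for k in range(n - 1):
        y[k], y[p[k]] = y[p[k]], y[k]               # swap b(k) and b(p(k))
        for i in range(k + 1, n):
            y[i] -= y[k] * F[i][k]                  # b(k+1:n) -= b(k)*A(k+1:n,k)
    for i in range(n - 1, -1, -1):                  # solve U x = b
        y[i] = (y[i] - sum(F[i][j] * y[j] for j in range(i + 1, n))) / F[i][i]
    return y


A = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, 3.0]]
F = [row[:] for row in A]
p = lu_row_pivot_f1(F)
print(p, solve_f1(F, p, [7.0, 6.0, 13.0]))   # [2, 2] [1.0, 2.0, 3.0]
```

    On this matrix, the multiplier 0.5 from step 1 stays in row 2 (F[1][0] in 0-based indexing), even though rows 2 and 3 are interchanged at step 2.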

  • Column Partial Pivoting

    Column partial pivoting: search row k for the element of largest absolute value and exchange its column with column k.

    A is an n×n matrix. After factorization, the strictly lower triangular part of A contains L, the upper triangular part contains U, and the vector p(1:n-1) records the column interchanges made during partial pivoting.

    Algorithm G:

        for k = 1:n-1
            Determine s with k

  • How to Use Factorized A

    Solve Ax = b using the LU factorization with column partial pivoting. Column interchanges reorder the unknowns, so here the elements of x (not b) must be swapped according to the partial pivoting information in p(1:n-1), last interchange first.

    Assume A has been LU-factorized with column partial pivoting using algorithm G:

        Solve Ly = b
        Solve Ux = y
        for k = n-1:-1:1
            swap x(k) and x(p(k))
        end

    L is the unit lower triangular matrix whose strictly lower triangular part is that of A; U is the upper triangular part of A (including the diagonal).
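
    The following Python sketch of column partial pivoting assumes that whole columns k and s are interchanged, because the text of Algorithm G is truncated. The result then satisfies AQ = LU. Since column swaps reorder the unknowns, the solve undoes them on x, last interchange first. Names are illustrative, and p is 0-based.

```python
def lu_col_pivot(A):
    """LU with column partial pivoting, in place: A Q = L U.

    At step k the entry of largest magnitude in row k, A(k, k:n), is moved
    to the diagonal by interchanging whole columns. Returns 0-based p.
    """
    n = len(A)
    p = []
    for k in range(n - 1):
        s = max(range(k, n), key=lambda j: abs(A[k][j]))
        p.append(s)
        for row in A:                                # interchange columns k and s
            row[k], row[s] = row[s], row[k]
        if A[k][k] == 0.0:
            raise ValueError("matrix is singular")
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return p


def solve_col_pivot(F, p, b):
    """Solve A x = b from the output of lu_col_pivot."""
    n = len(F)
    y = list(b)
    for i in range(n):                               # L y = b
        y[i] -= sum(F[i][j] * y[j] for j in range(i))
    for i in range(n - 1, -1, -1):                   # U x = y
        y[i] = (y[i] - sum(F[i][j] * y[j] for j in range(i + 1, n))) / F[i][i]
    for k in range(n - 2, -1, -1):                   # swap x(k) and x(p(k)), last first
        y[k], y[p[k]] = y[p[k]], y[k]
    return y


A = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, 3.0]]
F = [row[:] for row in A]
p = lu_col_pivot(F)
print(p, solve_col_pivot(F, p, [7.0, 6.0, 13.0]))   # [1, 1] [1.0, 2.0, 3.0]
```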

  • Complete Pivoting

    Complete pivoting: the element of largest absolute value in the submatrix A(k:n,k:n) is permuted into position (k,k) as the pivot. This requires both a row interchange and a column interchange.

    A is an n×n matrix; p(1:n-1) encodes the row interchanges and q(1:n-1) encodes the column interchanges.

    After factorization, the strictly lower triangular part of A contains L, and the upper triangular part of A (including the diagonal) contains U.

  • LU Factorization with Complete Pivoting

        for k = 1:n-1
            Determine s (k

  • How to Use Factorized A

    Solve Ax = b using the LU factorization with complete pivoting. Suppose A has been LU-factorized with complete pivoting, and p(1:n-1) and q(1:n-1) are the permutation encoding vectors:

        for k = 1:n-1
            swap b(k) and b(p(k))
        end
        Solve Ly = b for y
        Solve Ux = y for x
        for k = n-1:-1:1
            swap x(k) and x(q(k))
        end

    L and U are the (unit) lower and upper triangular parts of the factorized A.
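
    The following Python sketch combines both interchanges. It assumes that whole rows and whole columns are swapped at each step, so that PAQ = LU; the slide's algorithm text is truncated, and the helper names are illustrative. The pivot is found by searching the whole trailing submatrix. This costs O((n-k)^2) comparisons per step, compared with O(n-k) for partial pivoting.

```python
def lu_complete_pivot(A):
    """LU with complete pivoting, in place: P A Q = L U.

    Returns 0-based vectors p (row interchanges) and q (column interchanges).
    """
    n = len(A)
    p, q = [], []
    for k in range(n - 1):
        # (s, t) = position of the largest |A(i, j)| in the submatrix A(k:n, k:n)
        s, t = max(((i, j) for i in range(k, n) for j in range(k, n)),
                   key=lambda ij: abs(A[ij[0]][ij[1]]))
        p.append(s)
        q.append(t)
        A[k], A[s] = A[s], A[k]                      # row interchange
        for row in A:                                # column interchange
            row[k], row[t] = row[t], row[k]
        if A[k][k] == 0.0:
            raise ValueError("matrix is singular")
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return p, q


def solve_complete_pivot(F, p, q, b):
    """Solve A x = b from the output of lu_complete_pivot."""
    n = len(F)
    y = list(b)
    for k in range(n - 1):                           # swap b(k) and b(p(k))
        y[k], y[p[k]] = y[p[k]], y[k]
    for i in range(n):                               # L y = b
        y[i] -= sum(F[i][j] * y[j] for j in range(i))
    for i in range(n - 1, -1, -1):                   # U x = y
        y[i] = (y[i] - sum(F[i][j] * y[j] for j in range(i + 1, n))) / F[i][i]
    for k in range(n - 2, -1, -1):                   # swap x(k) and x(q(k)), last first
        y[k], y[q[k]] = y[q[k]], y[k]
    return y


A = [[0.0, 2.0, 1.0],
     [1.0, 1.0, 1.0],
     [2.0, 1.0, 3.0]]
F = [row[:] for row in A]
p, q = lu_complete_pivot(F)
print(p, q)                                          # [2, 2] [2, 1]
print(solve_complete_pivot(F, p, q, [7.0, 6.0, 13.0]))   # approximately [1, 2, 3]
```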

  • Parallelization of Gaussian Elimination

    Row-wise 1D block decomposition. [Figure: rows of A distributed among processors, with pivot A(k,k)]

    At step k, the processor holding the pivot sends row k, A(k,k:n), to its bottom neighbor. Each processor forwards data received from its top neighbor immediately to its bottom neighbor, then updates its own data, and then waits for the next data from its top neighbor.

    Disadvantage: load imbalance, since processors whose rows have all been eliminated sit idle. Remedy: row-wise block-cyclic distribution.
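
    This rough Python model of the load imbalance assumes a processor is busy at elimination step k exactly when it owns at least one row below row k (0-based: rows k+1..n-1). It ignores communication and the varying row lengths, and the helper names are illustrative.

```python
def active_fraction(n, P, owner):
    """Average fraction of processors with rows left to update per elimination step.

    Simplified model: at step k (0-based) rows k+1..n-1 are updated, and a
    processor counts as active if it owns at least one of those rows.
    """
    total = 0.0
    for k in range(n - 1):
        busy = {owner(i) for i in range(k + 1, n)}
        total += len(busy) / P
    return total / (n - 1)


n, P = 8, 4


def block_owner(i):          # contiguous blocks of n/P rows
    return i // (n // P)


def cyclic_owner(i):         # rows dealt out one at a time
    return i % P


print(active_fraction(n, P, block_owner))    # 16/28, about 0.571
print(active_fraction(n, P, cyclic_owner))   # 22/28, about 0.786
```

    With the block distribution, the first processors run out of rows early and stay idle. Dealing rows out cyclically keeps more processors busy until the last few steps.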

  • Parallelization with Partial Pivoting

    Row-wise block or block-cyclic decomposition.

    Use Gaussian elimination with column partial pivoting. Row partial pivoting is more difficult, because its pivot search runs down a column whose entries are spread over many processors.

    The pivot search is done on the processor holding row k, with no communication among processors.

    The column index of the new pivot element, together with row k, A(k,k:n), must be sent out.

    On each processor, upon receiving data from the top neighbor: forward it immediately to the bottom neighbor, swap column k with the new pivot column in the local data, update the local data, and wait for data from the top neighbor.