Scientific Computing II: Conjugate Gradient Methods
Michael Bader, Technical University of Munich, Summer 2016



  • Scientific Computing II

    Conjugate Gradient Methods

    Michael Bader, Technical University of Munich

    Summer 2016

  • Families of Iterative Solvers

    • relaxation methods:
      • Jacobi relaxation, Gauss-Seidel relaxation, ...
      • over-relaxation methods

    • Krylov methods:
      • Steepest Descent, Conjugate Gradient, ...
      • GMRES, ...

    • multilevel/multigrid methods, domain decomposition, ...


  • Remember: The Residual Equation

    • for Ax = b, we defined the residual as:

      r^(i) = b − Ax^(i)

    • and the error: e^(i) := x − x^(i)

    • leads to the residual equation:

      Ae^(i) = r^(i)

    • relaxation methods: solve a modified (easier) SLE:

      B ê^(i) = r^(i)   where B ∼ A

    • multigrid methods: coarse-grid correction on the residual equation:

      A_H e_H^(i) = r_H^(i)   and   x^(i+1) := x^(i) + I_H^h e_H^(i)


  • Part I

    Quadratic Forms and Steepest Descent

    Quadratic Forms
    Direction of Steepest Descent
    Steepest Descent


  • Quadratic Forms

    A quadratic form is a scalar, quadratic function of a vector of the form:

      f(x) = 1/2 x^T A x − b^T x + c,   where A = A^T

    (figure: 3D surface plot of f(x, y) over the (x, y)-plane for the example matrix)


  • Quadratic Forms (2)

    The gradient of a quadratic form is defined as

      f'(x) = ( ∂f/∂x_1 (x), ..., ∂f/∂x_n (x) )^T

    • apply this to f(x) = 1/2 x^T A x − b^T x + c, then
      • f'(x) = Ax − b
      • f'(x) = 0  ⇔  Ax − b = 0  ⇔  Ax = b

    ⇒ Ax = b is equivalent to a minimisation problem
    ⇒ proper minimum only if A is positive definite


  • Direction of Steepest Descent

    • gradient f'(x): direction of “steepest ascent”
    • f'(x) = Ax − b = −r (with residual r = b − Ax)
    • residual r: direction of “steepest descent”

    (figure: contour plot of the quadratic form; the residual points in the direction of steepest descent)


  • Solving SLE via Minimum Search

    • basic idea to find the minimum:
      move into the direction of steepest descent
    • most simple scheme:  x^(i+1) = x^(i) + α r^(i)
    • α constant ⇒ Richardson iteration
      (usually considered as a relaxation method)
    • better choice of α: move to the lowest point in that direction
      ⇒ Steepest Descent


  • Steepest Descent – find an optimal α

    • task: line search along the line x^(1) = x^(0) + α r^(0)
    • choose α such that f(x^(1)) is minimal:

      ∂/∂α f(x^(1)) = 0

    • use the chain rule:

      ∂/∂α f(x^(1)) = f'(x^(1))^T ∂/∂α x^(1) = f'(x^(1))^T r^(0)

    • remember f'(x^(1)) = −r^(1), thus we require:

      −(r^(1))^T r^(0) = 0

    hence, f'(x^(1)) = −r^(1) should be orthogonal to r^(0)


  • Steepest Descent – find α (2)

      (r^(1))^T r^(0) = (b − A x^(1))^T r^(0) = 0
      (b − A(x^(0) + α r^(0)))^T r^(0) = 0
      (b − A x^(0))^T r^(0) − α (A r^(0))^T r^(0) = 0
      (r^(0))^T r^(0) − α (r^(0))^T A r^(0) = 0

    Solve for α:

      α = (r^(0))^T r^(0) / ((r^(0))^T A r^(0))


  • Steepest Descent – Algorithm

    1. r^(i) = b − A x^(i)
    2. α_i = (r^(i))^T r^(i) / ((r^(i))^T A r^(i))
    3. x^(i+1) = x^(i) + α_i r^(i)

    (a runnable sketch of these steps follows below)

    Observations:

    (figure: zig-zag path of the Steepest Descent iterates on the contour plot)

    • slow convergence (similar to Jacobi relaxation)
    • ||e^(i)||_A ≤ ((κ−1)/(κ+1))^i ||e^(0)||_A
    • for positive definite A: κ = λ_max/λ_min (largest/smallest eigenvalue of A)
    • many steps in the same direction
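    A minimal Python/NumPy sketch of the three steps above (not from the slides; the function name and the tolerances are chosen freely):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000):
    """Steepest Descent: steps 1-3 of the algorithm above, repeated until the residual is small."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        r = b - A @ x                    # 1. residual = direction of steepest descent
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        alpha = rr / (r @ A @ r)         # 2. optimal step length along r
        x = x + alpha * r                # 3. update the iterate
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # Example 1 on the next slide
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.array([-3.0, -3.0])))
```

    With the matrix of Example 2 the same routine needs far more iterations, which is exactly the behaviour discussed on the following slides.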


  • Steepest Descent – Example 1

    Consider the example

      (3  2; 2  6) x = (2; −8),   with starting solution x^(0) = (−3; −3)


  • Steepest Descent – Example 2

    Consider the example

      (3  2; 2  100) x = (2; −8),   with starting solution x^(0) = (−10; −2)

    Multiple steps in the same search direction!
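    Not on the slides: the error bound from the algorithm slide makes the difference between the two examples concrete; a quick check of the condition numbers (computed with NumPy):

```python
import numpy as np

# condition numbers and contraction factors (kappa-1)/(kappa+1) of the two examples
for A in (np.array([[3.0, 2.0], [2.0, 6.0]]), np.array([[3.0, 2.0], [2.0, 100.0]])):
    lam = np.linalg.eigvalsh(A)          # eigenvalues of the symmetric matrix
    kappa = lam.max() / lam.min()
    print(kappa, (kappa - 1) / (kappa + 1))
```

    Example 1 has κ = 3.5 and a contraction factor of about 0.56 per step, while Example 2 has κ ≈ 33.8 and a factor of about 0.94, so the error bound shrinks by only a few percent per step.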


  • Part II

    Conjugate Gradients

    Conjugate Directions
    A-Orthogonality
    Conjugate Gradients
    A Miracle Occurs ...
    CG Algorithm


  • Conjugate Directions

    • observation:
      Steepest Descent takes repeated steps in the same direction
    • obvious idea:
      try to do only one step in each direction
    • possible approach:
      choose orthogonal search directions d^(0) ⊥ d^(1) ⊥ d^(2) ⊥ ...
    • notice:
      errors orthogonal to previous directions:

      e^(1) ⊥ d^(0),   e^(2) ⊥ d^(1) ⊥ d^(0),   ...


  • Conjugate Directions (2)

    • compute α from

      (d^(0))^T e^(1) = (d^(0))^T (e^(0) − α d^(0)) = 0

      which requires propagation of the error e^(1) = x − x^(1):

      x^(1)     = x^(0) + α d^(0)
      x − x^(1) = x − x^(0) − α d^(0)
      e^(1)     = e^(0) − α d^(0)

    • formula for α:

      α = (d^(0))^T e^(0) / ((d^(0))^T d^(0))

    • but: we don't know e^(0)


  • A-Orthogonality

    • make the search directions A-orthogonal:

      (d^(i))^T A d^(j) = 0   for i ≠ j

    • again: errors A-orthogonal to previous directions:

      (e^(i+1))^T A d^(i) = 0

    • equivalent to minimisation in search direction d^(i):

      ∂/∂α f(x^(i+1)) = (f'(x^(i+1)))^T ∂/∂α x^(i+1) = 0
      ⇔ −(r^(i+1))^T d^(i) = 0
      ⇔ −(d^(i))^T A e^(i+1) = 0


  • A-Conjugate Directions

    • remember the formula for conjugate directions:

      α = (d^(0))^T e^(0) / ((d^(0))^T d^(0))

    • same computation, but with A-orthogonality (for the i-th iteration):

      α_i = (d^(i))^T A e^(i) / ((d^(i))^T A d^(i)) = (d^(i))^T r^(i) / ((d^(i))^T A d^(i))

    • these α_i can be computed!
    • still to do: find A-orthogonal search directions


  • A-Conjugate Directions (2)

    classical approach to find orthogonal directions → conjugate Gram-Schmidt process:

    • from linearly independent vectors u^(0), u^(1), ..., u^(i−1)
    • construct A-orthogonal directions d^(0), d^(1), ..., d^(i−1):

      d^(i) = u^(i) + Σ_{k=0}^{i−1} β_ik d^(k),   β_ik = −(u^(i))^T A d^(k) / ((d^(k))^T A d^(k))

    • needs to keep all old search vectors in memory
    • O(n^3) computational complexity ⇒ infeasible
      (a small sketch of the process is given below)
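    For illustration only (not from the slides): a direct NumPy implementation of the conjugate Gram-Schmidt process, which makes the memory and work requirements visible; the function name is chosen here.

```python
import numpy as np

def conjugate_gram_schmidt(A, U):
    """A-orthogonalise the columns of U (assumed linearly independent).

    All previous directions d^(0), ..., d^(i-1) must be kept and re-used,
    which leads to the O(n^3) work / full memory footprint mentioned above.
    """
    n, m = U.shape
    D = np.zeros((n, m))
    for i in range(m):
        d = U[:, i].astype(float).copy()
        for k in range(i):
            beta_ik = -(U[:, i] @ A @ D[:, k]) / (D[:, k] @ A @ D[:, k])
            d += beta_ik * D[:, k]
        D[:, i] = d
    return D

A = np.array([[3.0, 2.0], [2.0, 100.0]])
D = conjugate_gram_schmidt(A, np.eye(2))
print(D[:, 0] @ A @ D[:, 1])   # approx. 0: the two directions are A-orthogonal
```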


  • Conjugate Gradients

    • use the residuals (i.e., u^(i) := r^(i)) to construct conjugate directions:

      d^(i) = r^(i) + Σ_{k=0}^{i−1} β_ik d^(k)

    • the new direction d^(i) should be A-orthogonal to all d^(j):

      0 = (d^(i))^T A d^(j) = (r^(i))^T A d^(j) + Σ_{k=0}^{i−1} β_ik (d^(k))^T A d^(j)

    • all directions d^(k) (for k = 0, ..., i − 1) are already A-orthogonal (and j < i), hence:

      0 = (r^(i))^T A d^(j) + β_ij (d^(j))^T A d^(j)   ⇒   β_ij = −(r^(i))^T A d^(j) / ((d^(j))^T A d^(j))


  • Conjugate Gradients – Status

    1. conjugate directions and computation of α_i:

      α_i = (d^(i))^T r^(i) / ((d^(i))^T A d^(i)),   x^(i+1) = x^(i) + α_i d^(i)

    2. use residuals to compute the search directions:

      d^(i) = r^(i) + Σ_{k=0}^{i−1} β_ik d^(k),   β_ik = −(r^(i))^T A d^(k) / ((d^(k))^T A d^(k))

    → still too expensive, as we need to store all vectors d^(k)


  • A Miracle Occurs – Part 1

    Two small contributions:

    1. propagation of the error e^(i) = x − x^(i):

      x^(i+1)     = x^(i) + α_i d^(i)
      x − x^(i+1) = x − x^(i) − α_i d^(i)
      e^(i+1)     = e^(i) − α_i d^(i)

      (we have used this once already)

    2. propagation of the residuals:

      r^(i+1) = Ae^(i+1) = A(e^(i) − α_i d^(i))   ⇒   r^(i+1) = r^(i) − α_i A d^(i)


  • A Miracle Occurs – Part 2

    Orthogonality of the residuals:
    • search directions are A-orthogonal
    • only one step in each direction
    • hence: the error is A-orthogonal to all previous search directions:

      (d^(i))^T A e^(j) = 0   for i < j

    • residuals are orthogonal to all previous search directions:

      (d^(i))^T r^(j) = 0   for i < j

    • search directions are built from the residuals:

      span{ d^(0), ..., d^(i−1) } = span{ r^(0), ..., r^(i−1) }

    • hence: the residuals are orthogonal:

      (r^(i))^T r^(j) = 0   for i < j


  • A Miracle Occurs – Part 3

    • combine orthogonality and the recurrence for the residuals:

      (r^(i))^T r^(j+1) = (r^(i))^T r^(j) − α_j (r^(i))^T A d^(j)
      ⇒ α_j (r^(i))^T A d^(j) = (r^(i))^T r^(j) − (r^(i))^T r^(j+1)

    • with (r^(i))^T r^(j) = 0 for i ≠ j:

      (r^(i))^T A d^(j) =    (1/α_i) (r^(i))^T r^(i)        if i = j
                           −(1/α_{i−1}) (r^(i))^T r^(i)     if i = j + 1
                             0                              otherwise


  • A Miracle Occurs – Part 4

    • computation of β_ik (for k = 0, ..., i − 1):

      β_ik = −(r^(i))^T A d^(k) / ((d^(k))^T A d^(k))
           = (r^(i))^T r^(i) / (α_{i−1} (d^(i−1))^T A d^(i−1))   if i = k + 1
           = 0                                                    if i > k + 1

    • thus: search directions

      d^(i) = r^(i) + Σ_{k=0}^{i−1} β_ik d^(k) = r^(i) + β_{i,i−1} d^(i−1)

      with  β_i := β_{i,i−1} = (r^(i))^T r^(i) / (α_{i−1} (d^(i−1))^T A d^(i−1))

    ⇒ reduces to a simple iterative scheme for β_i


  • A Miracle Occurs – Part 5

    • build the search directions:

      d^(i+1) = r^(i+1) + β_{i+1} d^(i),   β_{i+1} = (r^(i+1))^T r^(i+1) / (α_i (d^(i))^T A d^(i))

    • remember: α_i = (d^(i))^T r^(i) / ((d^(i))^T A d^(i))
    • thus: α_i (d^(i))^T A d^(i) = (d^(i))^T r^(i)

      ⇒ β_{i+1} = (r^(i+1))^T r^(i+1) / ((d^(i))^T r^(i)) = (r^(i+1))^T r^(i+1) / ((r^(i))^T r^(i))

    • last step:

      (d^(i))^T r^(i) = (r^(i) + β_i d^(i−1))^T r^(i) = (r^(i))^T r^(i) + β_i (d^(i−1))^T r^(i) = (r^(i))^T r^(i)

      (the residual r^(i) is orthogonal to the previous search direction d^(i−1))


  • Conjugate Gradients – Algorithm

    Start with d^(0) = r^(0) = b − Ax^(0).

    While ||r^(i)|| > ε, iterate over:

    1. α_i = (r^(i))^T r^(i) / ((d^(i))^T A d^(i))
    2. x^(i+1) = x^(i) + α_i d^(i)
    3. r^(i+1) = r^(i) − α_i A d^(i)
    4. β_{i+1} = (r^(i+1))^T r^(i+1) / ((r^(i))^T r^(i))
    5. d^(i+1) = r^(i+1) + β_{i+1} d^(i)
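    A minimal NumPy sketch of the five steps above (not from the slides; the function name and the default stopping criterion are chosen freely):

```python
import numpy as np

def conjugate_gradients(A, b, x0, eps=1e-10, max_iter=None):
    """Plain CG, following steps 1-5 of the algorithm above."""
    x = x0.astype(float).copy()
    r = b - A @ x                          # r^(0) = b - A x^(0)
    d = r.copy()                           # d^(0) = r^(0)
    rr = r @ r
    for _ in range(max_iter or len(b)):
        if np.sqrt(rr) <= eps:
            break
        Ad = A @ d
        alpha = rr / (d @ Ad)              # 1. step length
        x = x + alpha * d                  # 2. update the iterate
        r = r - alpha * Ad                 # 3. update the residual
        rr_new = r @ r
        beta = rr_new / rr                 # 4. beta_{i+1}
        d = r + beta * d                   # 5. new search direction
        rr = rr_new
    return x

A = np.array([[3.0, 2.0], [2.0, 100.0]])   # example from the next slide
b = np.array([2.0, -8.0])
print(conjugate_gradients(A, b, np.array([-10.0, -2.0])))
```

    For this 2x2 system the loop body is executed only twice, matching the "n = 2 steps" observation on the next slide (up to round-off).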


  • Conjugate Gradients – Example

    Consider the example

      (3  2; 2  100) x = (2; −8),   with starting solution x^(0) = (−10; −2)

    Convergence to the solution after n = 2 steps!


  • Part III

    Preconditioning

    CG Convergence
    Preconditioning
    CG with “Change-of-Basis” Preconditioning
    Hierarchical Transforms
    CG with Matrix Preconditioner
    Preconditioners – Examples
    ILU and Incomplete Cholesky


  • Conjugate Gradients – Convergence

    Convergence Analysis:
    • uses the Krylov subspace:

      span{ r^(0), A r^(0), A^2 r^(0), ..., A^{i−1} r^(0) }

    • hence the name “Krylov subspace method”

    Convergence Results:
    • in principle a direct method (n steps)
      (however: orthogonality is lost due to round-off errors → the exact solution is not found)
    • in practice an iterative scheme with

      ||e^(i)||_A ≤ 2 ((√κ − 1)/(√κ + 1))^i ||e^(0)||_A,   κ = λ_max/λ_min


  • Preconditioning

    • convergence depends on the matrix A
    • idea: modify the linear system

      Ax = b   →   M^{-1} A x = M^{-1} b

      then: convergence depends on the matrix M^{-1} A
    • optimal preconditioner M^{-1} = A^{-1}:

      A^{-1} A x = A^{-1} b   ⇔   x = A^{-1} b

    • in practice:
      – avoid explicit computation of M^{-1} A
      – find an M similar to A, compute the effect of M^{-1}
        (i.e., approximate solution of the SLE)
      – or: find an M^{-1} similar to A^{-1}
    • also possible: solve AM y = b for y, and then x = My


  • CG and Preconditioning

    • just replace A by M^{-1} A in the algorithm??
    • problem: M^{-1} A is not necessarily symmetric
      (even if M and A both are)
    • we will try an alternative first: symmetric preconditioning

      Ax = b   →   L^T A L x̂ = L^T b,   x = L x̂

    • remember: for a Finite Element discretisation, this corresponds to a change of basis functions!
    • can we implement CG without having to set up L^T A L?
    • requires some re-computations in the CG algorithm
      (see the following slides)


  • “Change-of-Basis” Preconditioning

    • preconditioned system of equations:

      Ax = b   →   (L^T A L) x̂ = L^T b,   x = L x̂,   with  Â := L^T A L,  b̂ := L^T b

    • computation of the residual:

      r̂ = b̂ − Â x̂ = L^T b − L^T A L x̂ = L^T (b − Ax) = L^T r

    • computation of α (for the preconditioned system):

      α_i := (r̂^(i))^T r̂^(i) / ((d̂^(i))^T Â d̂^(i)) = (r̂^(i))^T r̂^(i) / ((d̂^(i))^T L^T A L d̂^(i)) = (r̂^(i))^T r̂^(i) / ((d̃^(i))^T A d̃^(i))

      where we defined d̃^(i) := L d̂^(i)

    • update of the solution:

      x̂^(i+1) = x̂^(i) + α_i d̂^(i)
      ⇒ x^(i+1) = L x̂^(i+1) = L x̂^(i) + α_i L d̂^(i) = x^(i) + α_i d̃^(i)


  • “Change-of-Basis” Preconditioning (2)

    • update of the residuals r̂:

      r̂^(i+1) = r̂^(i) − α_i Â d̂^(i) = r̂^(i) − α_i L^T A L d̂^(i) = r̂^(i) − α_i L^T A d̃^(i)

    • computation of β_{i+1}:

      β_{i+1} = (r̂^(i+1))^T r̂^(i+1) / ((r̂^(i))^T r̂^(i))

    • update of the search directions:

      d̂^(i+1) = r̂^(i+1) + β_{i+1} d̂^(i)
      ⇒ d̃^(i+1) = L d̂^(i+1) = L r̂^(i+1) + β_{i+1} L d̂^(i) = L r̂^(i+1) + β_{i+1} d̃^(i)


  • CG with “Change-of-Basis” Preconditioning

    Start with r̂^(0) = L^T (b − A x^(0)) and d̃^(0) = L r̂^(0).

    While ||r̂^(i)|| > ε, iterate over:

    1. α_i = (r̂^(i))^T r̂^(i) / ((d̃^(i))^T A d̃^(i))
    2. x^(i+1) = x^(i) + α_i d̃^(i)
    3. r̂^(i+1) = r̂^(i) − α_i L^T A d̃^(i)
    4. β_{i+1} = (r̂^(i+1))^T r̂^(i+1) / ((r̂^(i))^T r̂^(i))
    5. d̃^(i+1) = L r̂^(i+1) + β_{i+1} d̃^(i)
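    A sketch of this variant in NumPy (not from the slides); for readability L is passed as an explicit matrix here, although in practice only the actions of L and L^T are implemented:

```python
import numpy as np

def cg_change_of_basis(A, b, L, x0, eps=1e-10, max_iter=None):
    """CG on (L^T A L) x_hat = L^T b without forming L^T A L explicitly."""
    x = x0.astype(float).copy()
    r_hat = L.T @ (b - A @ x)                  # r_hat^(0) = L^T (b - A x^(0))
    d_tilde = L @ r_hat                        # d_tilde^(0) = L r_hat^(0)
    rr = r_hat @ r_hat
    for _ in range(max_iter or len(b)):
        if np.sqrt(rr) <= eps:
            break
        Ad = A @ d_tilde
        alpha = rr / (d_tilde @ Ad)            # 1.
        x = x + alpha * d_tilde                # 2. iterate stays in the original basis
        r_hat = r_hat - alpha * (L.T @ Ad)     # 3.
        rr_new = r_hat @ r_hat
        beta = rr_new / rr                     # 4.
        d_tilde = L @ r_hat + beta * d_tilde   # 5.
        rr = rr_new
    return x

# with L = I this reduces to plain CG:
A = np.array([[3.0, 2.0], [2.0, 100.0]]); b = np.array([2.0, -8.0])
print(cg_change_of_basis(A, b, np.eye(2), np.zeros(2)))
```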


  • Hierarchical Basis Preconditioning

    Some specifics for the CG implementation:
    • L transforms a coefficient vector from the hierarchical basis to the nodal basis,
      for example x = L x̂ or d̃ = L d̂
    • L^T transforms the vector of basis functions from the nodal basis to the
      hierarchical basis (cmp. FEM), thus r̂ = L^T r
    • the effect of L and L^T can be computed in O(N) operations

    HB-CG for the Poisson problem:
    • in 1D: convergence after about log N iterations!
      (in this case, L^T A L is a diagonal matrix with log N different eigenvalues)
    • in 2D and 3D: very fast convergence!
    • further improved by additional diagonal preconditioning
    • so-called hierarchical generating systems (change to a multigrid basis)
      achieve multigrid-like performance


  • CG with Hierarchical Generating Systems

    Recall: the system of linear equations A_GS v_GS = b_GS is given as

      [ A_h            A_h P^h_2h        A_h P^h_4h   ] [ v_h  ]   [ b_h        ]
      [ R^2h_h A_h     A_2h              A_2h P^2h_4h ] [ v_2h ] = [ R^2h_h b_h ]
      [ R^4h_h A_h     R^4h_2h A_2h      A_4h         ] [ v_4h ]   [ R^4h_h b_h ]

    Preconditioning for CG?

    • the system A_GS v_GS = b_GS is singular
      → a subspace of solutions v_GS (minima of the quadratic form!)
    • however: any of the solutions will do!
    • convergence result:

      ||e^(i)||_A ≤ 2 ((√κ̂_GS − 1)/(√κ̂_GS + 1))^i ||e^(0)||_A,   κ̂_GS = λ̂_max/λ̂_min

      where κ̂_GS is the ratio of the largest vs. the smallest non-zero eigenvalue
    • for the Poisson equation: κ̂_GS is independent of h → multigrid convergence


  • Hierarchical Basis Transformation: towards a level-wise approach

    Consider the “semi-hierarchical” transform:

    (figure: nodal basis functions Φ_1, ..., Φ_7 at the nodes x_1, ..., x_7, transformed into the semi-hierarchical basis functions Ψ_i)

    The matrices for the change of basis are then (H^(2)_3 to transform to the hierarchical basis):

      H^(1)_3 = [ 1    0    0    0    0    0    0   ]
                [ 1/2  1    1/2  0    0    0    0   ]
                [ 0    0    1    0    0    0    0   ]
                [ 0    0    1/2  1    1/2  0    0   ]
                [ 0    0    0    0    1    0    0   ]
                [ 0    0    0    0    1/2  1    1/2 ]
                [ 0    0    0    0    0    0    1   ]

      H^(2)_3 = [ 1    0    0    0    0    0    0   ]
                [ 0    1    0    0    0    0    0   ]
                [ 0    0    1    0    0    0    0   ]
                [ 0    1/2  0    1    0    1/2  0   ]
                [ 0    0    0    0    1    0    0   ]
                [ 0    0    0    0    0    1    0   ]
                [ 0    0    0    0    0    0    1   ]


  • Hierarchical Basis Transformation

    Level-wise hierarchical transform:
    • hierarchical basis transformation: ψ_{n,i}(x) = Σ_j H_{i,j} φ_{n,j}(x)
    • written as a matrix-vector product: ψ_n = H_n φ_n (vectors of basis functions)
    • H_n φ_n can be performed as a sequence of level-wise transforms:

      H_n φ_n = H^(n−1)_n H^(n−2)_n ... H^(2)_n H^(1)_n φ_n

    • consider Hierarchical-Basis preconditioning for a FEM discretisation:
      L^T := H_n, and we need to compute r̂^(0) = L^T r^(0) and L^T A d̃^(i)
    • each level-wise transform H^(k)_n has a simple loop implementation:

      For k from 1 to n−1 do
        For j from 2^k to 2^n step 2^k
          r_j := 1/2 r_{j−2^{k−1}} + r_j + 1/2 r_{j+2^{k−1}}
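    A Python sketch of this loop (not from the slides; 0-based array storage, so entry r[j-1] holds the coefficient with 1-based index j):

```python
import numpy as np

def hierarchize(r, n):
    """Apply L^T = H_n = H^(n-1) ... H^(1) level by level to a vector of length 2^n - 1."""
    r = r.astype(float).copy()
    for k in range(1, n):                     # levels k = 1, ..., n-1
        step, half = 2 ** k, 2 ** (k - 1)
        for j in range(step, 2 ** n, step):   # j = 2^k, 2*2^k, ... (< 2^n)
            r[j - 1] += 0.5 * r[j - half - 1] + 0.5 * r[j + half - 1]
    return r

print(hierarchize(np.arange(1.0, 8.0), 3))    # n = 3: vector with 2^3 - 1 = 7 entries
```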


  • Hierarchical Coordinate Transformation

    • the transform b = H^T_n a turns “hierarchical” coefficients a into “nodal” coefficients b:

      Σ_j b_j φ_{n,j}(x) = Σ_i a_i ψ_{n,i}(x) ≈ f(x)

    • H_n = H^(n−1)_n H^(n−2)_n ... H^(2)_n H^(1)_n has a level-wise representation, thus:

      H^T_n = (H^(1)_n)^T (H^(2)_n)^T ... (H^(n−2)_n)^T (H^(n−1)_n)^T

    • consider Hierarchical-Basis preconditioning for a FEM discretisation:
      L := H^T_n for the computation of d̃^(0) = L r̂^(0) and d̃^(i+1) = L r̂^(i+1) + β_{i+1} d̃^(i)
    • again: use a loop-based implementation for (H^(k)_n)^T a:

      For k from n−1 downto 1
        For i from 2^{k−1} to 2^n step 2^k
          d_i := 1/2 d_{i−2^{k−1}} + d_i + 1/2 d_{i+2^{k−1}}     (with d_0 = d_{2^n} = 0)
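    The corresponding Python sketch for the transposed transform (again not from the slides; the vector is padded with the zero boundary values d_0 = d_{2^n} = 0):

```python
import numpy as np

def dehierarchize(a, n):
    """Apply L = H_n^T = (H^(1))^T ... (H^(n-1))^T level by level; a has length 2^n - 1."""
    d = np.zeros(2 ** n + 1)                   # padded: d[0] = d[2^n] = 0
    d[1:2 ** n] = a
    for k in range(n - 1, 0, -1):              # levels k = n-1, ..., 1
        step, half = 2 ** k, 2 ** (k - 1)
        for i in range(half, 2 ** n, step):    # i = 2^(k-1), 2^(k-1) + 2^k, ...
            d[i] += 0.5 * d[i - half] + 0.5 * d[i + half]
    return d[1:2 ** n]

print(dehierarchize(np.arange(1.0, 8.0), 3))
```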


  • CG and Preconditioning (revisited)

    • preconditioning: replace A by M^{-1} A
    • problem: M^{-1} A is not necessarily symmetric
    • compare symmetric preconditioning:

      Ax = b   →   L^T A L x̂ = L^T b,   x = L x̂

    • workaround: find E^T E = M (Cholesky factorisation), then

      Ax = b   →   E^{-T} A E^{-1} x̂ = E^{-T} b,   x̂ = E x

    • what if E cannot be computed (efficiently)?
      (neither M nor M^{-1} might be known explicitly!)
    • E, E^{-T}, E^{-1} can be eliminated from the algorithm
      (again requires some re-computations):

      set d̂ = E d and use r̂ = E^{-T} r,  x̂ = E x,  E^{-1} E^{-T} = M^{-1}


  • CG with Preconditioner

    Start with r^(0) = b − Ax^(0) and d^(0) = M^{-1} r^(0); then iterate over:

    1. α_i = (r^(i))^T M^{-1} r^(i) / ((d^(i))^T A d^(i))
    2. x^(i+1) = x^(i) + α_i d^(i)
    3. r^(i+1) = r^(i) − α_i A d^(i)
    4. β_{i+1} = (r^(i+1))^T M^{-1} r^(i+1) / ((r^(i))^T M^{-1} r^(i))
    5. d^(i+1) = M^{-1} r^(i+1) + β_{i+1} d^(i)

    (for a detailed derivation, see Shewchuk)
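    A NumPy sketch of this preconditioned CG (not from the slides); apply_Minv stands for any routine that applies M^{-1} to a vector, e.g. an approximate solver, and is an assumed interface rather than part of the lecture material:

```python
import numpy as np

def preconditioned_cg(A, b, x0, apply_Minv, eps=1e-10, max_iter=None):
    """PCG following steps 1-5 above; apply_Minv(r) must return M^{-1} r
    (the action of a symmetric positive definite approximation of A^{-1})."""
    x = x0.astype(float).copy()
    r = b - A @ x                            # r^(0)
    z = apply_Minv(r)                        # M^{-1} r^(0)
    d = z.copy()                             # d^(0)
    rz = r @ z
    for _ in range(max_iter or len(b)):
        if np.linalg.norm(r) <= eps:
            break
        Ad = A @ d
        alpha = rz / (d @ Ad)                # 1.
        x = x + alpha * d                    # 2.
        r = r - alpha * Ad                   # 3.
        z = apply_Minv(r)
        rz_new = r @ z
        beta = rz_new / rz                   # 4.
        d = z + beta * d                     # 5.
        rz = rz_new
    return x
```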


  • Implementation

    Preconditioning steps: M^{-1} r^(i), M^{-1} r^(i+1)

    • M^{-1} known? Then multiply: M^{-1} r^(i)
    • M known? Then solve My = r^(i) to obtain y = M^{-1} r^(i)
    • neither M nor M^{-1} known explicitly:
      • an algorithm to solve My = r^(i) is sufficient!
      • an algorithm to compute the action of M^{-1} on a vector is sufficient!
      ⇒ any approximate solver for Ae = r^(i) is sufficient
        (if it is equivalent to applying a symmetric and positive definite matrix)


  • Preconditioners for CG – Examples

    • find M ≈ A and compute the effect of M^{-1}:
      • Jacobi preconditioning: M := D_A
      • (symmetric) Gauss-Seidel preconditioning: M := L_A, or
        M = (D_A + L'_A) D_A^{-1} (D_A + (L'_A)^T), etc.
    • just compute the effect of M^{-1}:
      • any approximate solver might do → incl. multigrid methods
      • incomplete LU decomposition (ILU);
        should be symmetric → incomplete Cholesky factorisation
      • use a multigrid method as preconditioner(?)
        → worthwhile (only) in situations where multigrid does not work (well) as a stand-alone solver
    • find an M^{-1} similar to A^{-1}:
      • “sparse approximate inverse” (SPAI)
      • tries to minimise ||I − MA||_F, where M is a matrix with a (given) sparse non-zero pattern
    (a short Jacobi example is given below)
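    For illustration only: with Jacobi preconditioning, M = D_A, the action of M^{-1} is a componentwise division by the diagonal of A; this plugs directly into the preconditioned_cg sketch given after the "CG with Preconditioner" slide.

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 100.0]])
b = np.array([2.0, -8.0])

jacobi_apply_Minv = lambda r: r / np.diag(A)   # M := D_A, so M^{-1} r = r / diag(A)
x = preconditioned_cg(A, b, np.zeros(2), jacobi_apply_Minv)
print(x)
```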


  • Preconditioners – ILU and Incomplete Cholesky

    Recall LU decomposition and Cholesky factorization:
    • LU decomposition: given A, find lower/upper triangular matrices L and U such that A = LU
    • Cholesky factorization: given A = A^T, find a lower triangular matrix L such that A = LL^T
      → symmetric preconditioning!
    • variants with an explicit diagonal matrix D:

      A = L D^{-1} U   or   A = L D^{-1} L^T,

      where L = D + L' and U = D + U' with strictly lower/upper triangular L', U'
    • but: for sparse A, L and U may be non-sparse

    Idea: disregard all fill-in during the factorization


  • Cholesky Factorization

      [ L_11              ]   [ D_11^{-1}                      ]   [ L_11^T  L_21^T  L_31^T ]   [ A_11  A_21^T  A_31^T ]
      [ L_21  L_22        ] · [           D_22^{-1}            ] · [         L_22^T  L_32^T ] = [ A_21  A_22    A_32^T ]
      [ L_31  L_32  L_33  ]   [                      D_33^{-1} ]   [                 L_33^T ]   [ A_31  A_32    A_33   ]

    Derive the factorization algorithm:

    • assume that A_11 = L_11 D_11^{-1} L_11^T is already factorized
    • let L_21 be a 1 × k submatrix, i.e., to compute the next row of L:

      L_21 D_11^{-1} L_11^T = A_21

      D_11^{-1} L_11^T is an upper triangular matrix → solve a triangular system for L_21
    • by convention L_22 = D_22, which is computed from:

      L_21 D_11^{-1} L_21^T + L_22 D_22^{-1} L_22^T = A_22   ⇒   L_22 = D_22 := A_22 − L_21 D_11^{-1} L_21^T


  • Incomplete Cholesky Factorization

    Algorithm (A → L D^{-1} L^T):
    • initialize D := 0, L := 0
    • for i = 1, ..., n:
      1. for k = 1, ..., i − 1:
         if (i, k) ∈ S then set L_ik := A_ik − Σ_j ...

  • Literature/References

    Conjugate Gradients:
    • Shewchuk: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain.
    • Hackbusch: Iterative Solution of Large Sparse Systems of Equations, Springer, 1993.
    • M. Griebel: Multilevelmethoden als Iterationsverfahren über Erzeugendensystemen, Teubner Skripten zur Numerik, 1994.
      M. Griebel: Multilevel algorithms considered as iterative methods on semidefinite systems, SIAM J. Sci. Comput. 15(3), 1994.


    Quadratic Forms and Steepest Descent
      Quadratic Forms
      Direction of Steepest Descent
      Steepest Descent

    Conjugate Gradients
      Conjugate Directions
      A-Orthogonality
      Conjugate Gradients
      A Miracle Occurs ...
      CG Algorithm

    Preconditioning
      CG Convergence
      Preconditioning
      CG with “Change-of-Basis” Preconditioning
      Hierarchical Transforms
      CG with Matrix Preconditioner
      Preconditioners – Examples
      ILU and Incomplete Cholesky