Scientific Computing II
Conjugate Gradient Methods
Michael Bader, Technical University of Munich
Summer 2016
Families of Iterative Solvers
• relaxation methods:
  • Jacobi-, Gauss-Seidel-relaxation, ...
  • over-relaxation methods
• Krylov methods:
  • Steepest Descent, Conjugate Gradients, ...
  • GMRES, ...
• multilevel/multigrid methods, domain decomposition, ...
Remember: The Residual Equation
• for $Ax = b$, we defined the residual as:
  $$r^{(i)} = b - Ax^{(i)}$$
• and the error: $e^{(i)} := x - x^{(i)}$
• leads to the residual equation:
  $$Ae^{(i)} = r^{(i)}$$
• relaxation methods: solve a modified (easier) SLE:
  $$B\hat e^{(i)} = r^{(i)} \quad \text{where } B \sim A$$
• multigrid methods: coarse-grid correction on the residual equation:
  $$A_H e_H^{(i)} = r_H^{(i)} \quad \text{and} \quad x^{(i+1)} := x^{(i)} + I_H^h\, e_H^{(i)}$$
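As a tiny numerical illustration of the residual equation (a sketch only; it reuses the small 2×2 example system that appears later in these slides, and the iterate `x_i` is an arbitrary choice):

```python
import numpy as np

# small SPD example system (same matrix as in the Steepest Descent example later on)
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x_exact = np.linalg.solve(A, b)
x_i = np.array([0.0, 0.0])          # some current iterate x^(i)

r_i = b - A @ x_i                   # residual r^(i) = b - A x^(i)
e_i = x_exact - x_i                 # error    e^(i) = x - x^(i)
print(np.allclose(A @ e_i, r_i))    # residual equation A e^(i) = r^(i)  -> True
```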
Part I
Quadratic Forms and Steepest Descent
• Quadratic Forms
• Direction of Steepest Descent
• Steepest Descent
Quadratic Forms
A quadratic form is a scalar, quadratic function of a vector of the form:
$$f(x) = \frac{1}{2}\, x^T A x - b^T x + c, \qquad \text{where } A = A^T$$
(figure: surface and contour plots of a quadratic form $f(x,y)$)
Quadratic Forms (2)
The gradient of a quadratic form is defined as
$$f'(x) = \begin{pmatrix} \frac{\partial}{\partial x_1} f(x) \\ \vdots \\ \frac{\partial}{\partial x_n} f(x) \end{pmatrix}$$
• apply this to $f(x) = \frac{1}{2} x^T A x - b^T x + c$, then:
  • $f'(x) = Ax - b$
  • $f'(x) = 0 \;\Leftrightarrow\; Ax - b = 0 \;\Leftrightarrow\; Ax = b$
⇒ $Ax = b$ is equivalent to a minimisation problem
⇒ proper minimum only if $A$ is positive definite
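A short numerical sanity check of this relationship (a sketch; the 2×2 matrix and right-hand side are the example used later in these slides, the function names are illustrative):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive definite
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    return 0.5 * x @ A @ x - b @ x + c    # the quadratic form

def grad_f(x):
    return A @ x - b                       # f'(x) = Ax - b (since A = A^T)

x_star = np.linalg.solve(A, b)             # solution of Ax = b
print(np.allclose(grad_f(x_star), 0.0))    # gradient vanishes at the minimiser -> True
```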
Direction of Steepest Descent
• gradient $f'(x)$: direction of "steepest ascent"
• $f'(x) = Ax - b = -r$ (with residual $r = b - Ax$)
• residual $r$: direction of "steepest descent"
(figure: contour plot of $f(x,y)$ with the steepest-descent direction $-f'(x) = r$)
Solving SLE via Minimum Search
• basic idea to find the minimum: move in the direction of steepest descent
• simplest scheme: $x^{(i+1)} = x^{(i)} + \alpha r^{(i)}$
• $\alpha$ constant ⇒ Richardson iteration (usually considered a relaxation method)
• better choice of $\alpha$: move to the lowest point in that direction ⇒ Steepest Descent
Steepest Descent – find an optimal α
• task: line search along the line $x^{(1)} = x^{(0)} + \alpha r^{(0)}$
• choose $\alpha$ such that $f(x^{(1)})$ is minimal:
  $$\frac{\partial}{\partial \alpha} f(x^{(1)}) = 0$$
• use the chain rule:
  $$\frac{\partial}{\partial \alpha} f(x^{(1)}) = f'(x^{(1)})^T \frac{\partial}{\partial \alpha} x^{(1)} = f'(x^{(1)})^T r^{(0)}$$
• remember $f'(x^{(1)}) = -r^{(1)}$, thus:
  $$-\left(r^{(1)}\right)^T r^{(0)} \overset{!}{=} 0$$
hence, $f'(x^{(1)}) = -r^{(1)}$ should be orthogonal to $r^{(0)}$
Steepest Descent – find α (2)
$$\begin{aligned}
\left(r^{(1)}\right)^T r^{(0)} = \left(b - Ax^{(1)}\right)^T r^{(0)} &= 0 \\
\left(b - A\left(x^{(0)} + \alpha r^{(0)}\right)\right)^T r^{(0)} &= 0 \\
\left(b - Ax^{(0)}\right)^T r^{(0)} - \alpha \left(Ar^{(0)}\right)^T r^{(0)} &= 0 \\
\left(r^{(0)}\right)^T r^{(0)} - \alpha \left(r^{(0)}\right)^T A r^{(0)} &= 0
\end{aligned}$$
Solve for $\alpha$:
$$\alpha = \frac{\left(r^{(0)}\right)^T r^{(0)}}{\left(r^{(0)}\right)^T A r^{(0)}}$$
Steepest Descent – Algorithm
1. $r^{(i)} = b - Ax^{(i)}$
2. $\alpha_i = \dfrac{\left(r^{(i)}\right)^T r^{(i)}}{\left(r^{(i)}\right)^T A r^{(i)}}$
3. $x^{(i+1)} = x^{(i)} + \alpha_i r^{(i)}$
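A minimal NumPy sketch of this iteration (the function name and the parameters `tol` and `max_iter` are illustrative, not from the slides):

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=1000):
    """Steepest Descent for SPD A: x^(i+1) = x^(i) + alpha_i * r^(i)."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        r = b - A @ x                      # 1. residual r^(i)
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        alpha = rr / (r @ (A @ r))         # 2. optimal step length alpha_i
        x = x + alpha * r                  # 3. step along the residual direction
    return x
```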
Observations:
(figure: contour plot of $f(x,y)$ with the zig-zag path of the Steepest Descent iterates)
• slow convergence (similar to Jacobi relaxation)
• $\left\|e^{(i)}\right\|_A \le \left(\dfrac{\kappa-1}{\kappa+1}\right)^i \left\|e^{(0)}\right\|_A$
• for positive definite $A$: $\kappa = \lambda_{\max}/\lambda_{\min}$ (largest/smallest eigenvalue of $A$)
• many steps in the same direction
Steepest Descent – Example 1
Consider the example
$$\begin{pmatrix} 3 & 2 \\ 2 & 6 \end{pmatrix} x = \begin{pmatrix} 2 \\ -8 \end{pmatrix}, \qquad \text{starting solution } x^{(0)} = \begin{pmatrix} -3 \\ -3 \end{pmatrix}$$
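Applying the Steepest Descent sketch from above to this system (a usage example; the data are from the slide, the call itself is illustrative):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x0 = np.array([-3.0, -3.0])

x = steepest_descent(A, b, x0)   # converges towards the exact solution (2, -2)
print(x)
```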
Steepest Descent – Example 2
Consider the example
$$\begin{pmatrix} 3 & 2 \\ 2 & 100 \end{pmatrix} x = \begin{pmatrix} 2 \\ -8 \end{pmatrix}, \qquad \text{starting solution } x^{(0)} = \begin{pmatrix} -10 \\ -2 \end{pmatrix}$$
Multiple steps in the same search direction!
Part II
Conjugate Gradients
• Conjugate Directions
• A-Orthogonality
• Conjugate Gradients
• A Miracle Occurs ...
• CG Algorithm
Conjugate Directions
• observation: Steepest Descent takes repeated steps in the same direction
• obvious idea: try to take only one step in each direction
• possible approach: choose orthogonal search directions $d^{(0)} \perp d^{(1)} \perp d^{(2)} \perp \dots$
• notice: errors orthogonal to previous directions:
  $$e^{(1)} \perp d^{(0)}, \quad e^{(2)} \perp d^{(1)} \perp d^{(0)}, \quad \dots$$
Conjugate Directions (2)
• compute $\alpha$ from
  $$\left(d^{(0)}\right)^T e^{(1)} = \left(d^{(0)}\right)^T \left(e^{(0)} - \alpha d^{(0)}\right) = 0$$
  which requires propagation of the error $e^{(1)} = x - x^{(1)}$:
  $$x^{(1)} = x^{(0)} + \alpha d^{(0)} \;\Rightarrow\; x - x^{(1)} = x - x^{(0)} - \alpha d^{(0)} \;\Rightarrow\; e^{(1)} = e^{(0)} - \alpha d^{(0)}$$
• formula for $\alpha$:
  $$\alpha = \frac{\left(d^{(0)}\right)^T e^{(0)}}{\left(d^{(0)}\right)^T d^{(0)}}$$
• but: we don't know $e^{(0)}$
A-Orthogonality
• make the search directions A-orthogonal:
  $$\left(d^{(i)}\right)^T A\, d^{(j)} = 0$$
• again: errors A-orthogonal to previous directions:
  $$\left(e^{(i+1)}\right)^T A\, d^{(i)} \overset{!}{=} 0$$
• equivalent to minimisation in search direction $d^{(i)}$:
  $$\frac{\partial}{\partial \alpha} f\left(x^{(i+1)}\right) = \left(f'\left(x^{(i+1)}\right)\right)^T \frac{\partial}{\partial \alpha} x^{(i+1)} = 0
  \;\Leftrightarrow\; -\left(r^{(i+1)}\right)^T d^{(i)} = 0
  \;\Leftrightarrow\; -\left(d^{(i)}\right)^T A\, e^{(i+1)} = 0$$
A-Conjugate Directions
• remember the formula for conjugate directions:
  $$\alpha = \frac{\left(d^{(0)}\right)^T e^{(0)}}{\left(d^{(0)}\right)^T d^{(0)}}$$
• same computation, but with A-orthogonality:
  $$\alpha_i = \frac{\left(d^{(i)}\right)^T A\, e^{(i)}}{\left(d^{(i)}\right)^T A\, d^{(i)}} = \frac{\left(d^{(i)}\right)^T r^{(i)}}{\left(d^{(i)}\right)^T A\, d^{(i)}}$$
  (for the i-th iteration)
• these $\alpha_i$ can be computed!
• still to do: find A-orthogonal search directions
A-Conjugate Directions (2)
classical approach to find orthogonal directions → conjugate Gram-Schmidt process:
• from linearly independent vectors $u^{(0)}, u^{(1)}, \dots, u^{(i-1)}$
• construct A-orthogonal directions $d^{(0)}, d^{(1)}, \dots, d^{(i-1)}$:
  $$d^{(i)} = u^{(i)} + \sum_{k=0}^{i-1} \beta_{ik}\, d^{(k)}, \qquad \beta_{ik} = -\frac{\left(u^{(i)}\right)^T A\, d^{(k)}}{\left(d^{(k)}\right)^T A\, d^{(k)}}$$
• needs to keep all old search vectors in memory
• $O(n^3)$ computational complexity ⇒ infeasible
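A sketch of this conjugate Gram-Schmidt construction, for illustration only (as noted above, it is too expensive to use directly; the function name is an assumption):

```python
import numpy as np

def conjugate_gram_schmidt(A, U):
    """A-orthogonalise the columns of U: returns D with d_i^T A d_j = 0 for i != j."""
    n, m = U.shape
    D = np.zeros((n, m))
    for i in range(m):
        d = U[:, i].astype(float).copy()           # start from u^(i)
        for k in range(i):
            beta_ik = -(U[:, i] @ (A @ D[:, k])) / (D[:, k] @ (A @ D[:, k]))
            d += beta_ik * D[:, k]                  # subtract A-projections onto d^(k)
        D[:, i] = d
    return D
```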
Conjugate Gradients
• use the residuals (i.e., $u^{(i)} := r^{(i)}$) to construct conjugate directions:
  $$d^{(i)} = r^{(i)} + \sum_{k=0}^{i-1} \beta_{ik}\, d^{(k)}$$
• the new direction $d^{(i)}$ should be A-orthogonal to all $d^{(j)}$:
  $$0 \overset{!}{=} \left(d^{(i)}\right)^T A\, d^{(j)} = \left(r^{(i)}\right)^T A\, d^{(j)} + \sum_{k=0}^{i-1} \beta_{ik} \left(d^{(k)}\right)^T A\, d^{(j)}$$
• all directions $d^{(k)}$ (for $k = 0, \dots, i-1$) are already A-orthogonal (and $j < i$), hence:
  $$0 = \left(r^{(i)}\right)^T A\, d^{(j)} + \beta_{ij} \left(d^{(j)}\right)^T A\, d^{(j)} \;\Rightarrow\; \beta_{ij} = -\frac{\left(r^{(i)}\right)^T A\, d^{(j)}}{\left(d^{(j)}\right)^T A\, d^{(j)}}$$
Conjugate Gradients – Status
1. conjugate directions and computation of $\alpha_i$:
  $$\alpha_i = \frac{\left(d^{(i)}\right)^T r^{(i)}}{\left(d^{(i)}\right)^T A\, d^{(i)}}, \qquad x^{(i+1)} = x^{(i)} + \alpha_i d^{(i)}$$
2. use residuals to compute the search directions:
  $$d^{(i)} = r^{(i)} + \sum_{k=0}^{i-1} \beta_{ik}\, d^{(k)}, \qquad \beta_{ik} = -\frac{\left(r^{(i)}\right)^T A\, d^{(k)}}{\left(d^{(k)}\right)^T A\, d^{(k)}}$$
→ still too expensive, as we need to store all vectors $d^{(k)}$
A Miracle Occurs – Part 1
Two small contributions:
1. propagation of the error $e^{(i)} = x - x^{(i)}$:
  $$x^{(i+1)} = x^{(i)} + \alpha_i d^{(i)} \;\Rightarrow\; x - x^{(i+1)} = x - x^{(i)} - \alpha_i d^{(i)} \;\Rightarrow\; e^{(i+1)} = e^{(i)} - \alpha_i d^{(i)}$$
  (we have used this once already)
2. propagation of the residuals:
  $$r^{(i+1)} = A e^{(i+1)} = A\left(e^{(i)} - \alpha_i d^{(i)}\right) \;\Rightarrow\; r^{(i+1)} = r^{(i)} - \alpha_i A d^{(i)}$$
A Miracle Occurs – Part 2
Orthogonality of the residuals:
• search directions are A-orthogonal
• only one step in each direction
• hence: the error is A-orthogonal to all previous search directions:
  $$\left(d^{(i)}\right)^T A\, e^{(j)} = 0 \quad \text{for } i < j$$
• residuals are orthogonal to all previous search directions:
  $$\left(d^{(i)}\right)^T r^{(j)} = 0 \quad \text{for } i < j$$
• search directions are built from the residuals:
  $$\operatorname{span}\left\{d^{(0)}, \dots, d^{(i-1)}\right\} = \operatorname{span}\left\{r^{(0)}, \dots, r^{(i-1)}\right\}$$
• hence: the residuals are orthogonal:
  $$\left(r^{(i)}\right)^T r^{(j)} = 0, \quad i < j$$
A Miracle Occurs – Part 3
• combine orthogonality and the recurrence for the residuals:
  $$\left(r^{(i)}\right)^T r^{(j+1)} = \left(r^{(i)}\right)^T r^{(j)} - \alpha_j \left(r^{(i)}\right)^T A\, d^{(j)}
  \;\Rightarrow\; \alpha_j \left(r^{(i)}\right)^T A\, d^{(j)} = \left(r^{(i)}\right)^T r^{(j)} - \left(r^{(i)}\right)^T r^{(j+1)}$$
• with $\left(r^{(i)}\right)^T r^{(j)} = 0$ if $i \ne j$:
  $$\left(r^{(i)}\right)^T A\, d^{(j)} = \begin{cases}
  \dfrac{1}{\alpha_i}\left(r^{(i)}\right)^T r^{(i)}, & i = j \\[1ex]
  -\dfrac{1}{\alpha_{i-1}}\left(r^{(i)}\right)^T r^{(i)}, & i = j+1 \\[1ex]
  0, & \text{otherwise.}
  \end{cases}$$
A Miracle Occurs – Part 4
• computation of $\beta_{ik}$ (for $k = 0, \dots, i-1$):
  $$\beta_{ik} = -\frac{\left(r^{(i)}\right)^T A\, d^{(k)}}{\left(d^{(k)}\right)^T A\, d^{(k)}} = \begin{cases}
  \dfrac{\left(r^{(i)}\right)^T r^{(i)}}{\alpha_{i-1}\left(d^{(i-1)}\right)^T A\, d^{(i-1)}}, & \text{if } i = k+1 \\[1ex]
  0, & \text{if } i > k+1
  \end{cases}$$
• thus the search directions become:
  $$d^{(i)} = r^{(i)} + \sum_{k=0}^{i-1} \beta_{ik}\, d^{(k)} = r^{(i)} + \beta_{i,i-1}\, d^{(i-1)}, \qquad
  \beta_i := \beta_{i,i-1} = \frac{\left(r^{(i)}\right)^T r^{(i)}}{\alpha_{i-1}\left(d^{(i-1)}\right)^T A\, d^{(i-1)}}$$
⇒ reduces to a simple iterative scheme for $\beta_i$
A Miracle Occurs – Part 5
• build the search directions:
  $$d^{(i+1)} = r^{(i+1)} + \beta_{i+1}\, d^{(i)}, \qquad \beta_{i+1} = \frac{\left(r^{(i+1)}\right)^T r^{(i+1)}}{\alpha_i \left(d^{(i)}\right)^T A\, d^{(i)}}$$
• remember: $\alpha_i = \dfrac{\left(d^{(i)}\right)^T r^{(i)}}{\left(d^{(i)}\right)^T A\, d^{(i)}}$
• thus: $\alpha_i \left(d^{(i)}\right)^T A\, d^{(i)} = \left(d^{(i)}\right)^T r^{(i)}$
  $$\Rightarrow\; \beta_{i+1} = \frac{\left(r^{(i+1)}\right)^T r^{(i+1)}}{\left(d^{(i)}\right)^T r^{(i)}} = \frac{\left(r^{(i+1)}\right)^T r^{(i+1)}}{\left(r^{(i)}\right)^T r^{(i)}}$$
• last step:
  $$\left(d^{(i)}\right)^T r^{(i)} = \left(r^{(i)} + \beta_i\, d^{(i-1)}\right)^T r^{(i)} = \left(r^{(i)}\right)^T r^{(i)} + \beta_i \left(d^{(i-1)}\right)^T r^{(i)} = \left(r^{(i)}\right)^T r^{(i)}$$
  (the residual $r^{(i)}$ is orthogonal to the previous search direction $d^{(i-1)}$)
Conjugate Gradients – Algorithm
Start with $d^{(0)} = r^{(0)} = b - Ax^{(0)}$.
While $\left\|r^{(i)}\right\| > \epsilon$ iterate over:
1. $\alpha_i = \dfrac{\left(r^{(i)}\right)^T r^{(i)}}{\left(d^{(i)}\right)^T A\, d^{(i)}}$
2. $x^{(i+1)} = x^{(i)} + \alpha_i d^{(i)}$
3. $r^{(i+1)} = r^{(i)} - \alpha_i A d^{(i)}$
4. $\beta_{i+1} = \dfrac{\left(r^{(i+1)}\right)^T r^{(i+1)}}{\left(r^{(i)}\right)^T r^{(i)}}$
5. $d^{(i+1)} = r^{(i+1)} + \beta_{i+1} d^{(i)}$
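A direct transcription of these five steps into NumPy (a sketch; the function name and the parameters `tol` and `max_iter` are illustrative):

```python
import numpy as np

def conjugate_gradients(A, b, x0, tol=1e-10, max_iter=1000):
    """Plain CG for SPD A, following steps 1-5 above."""
    x = x0.astype(float).copy()
    r = b - A @ x                          # r^(0)
    d = r.copy()                           # d^(0) = r^(0)
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) < tol:
            break
        Ad = A @ d
        alpha = rr / (d @ Ad)              # 1. step length
        x = x + alpha * d                  # 2. update the solution
        r = r - alpha * Ad                 # 3. update the residual
        rr_new = r @ r
        beta = rr_new / rr                 # 4. beta_{i+1}
        d = r + beta * d                   # 5. new search direction
        rr = rr_new
    return x
```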
Conjugate Gradients – Example
Consider the example
$$\begin{pmatrix} 3 & 2 \\ 2 & 100 \end{pmatrix} x = \begin{pmatrix} 2 \\ -8 \end{pmatrix}, \qquad \text{starting solution } x^{(0)} = \begin{pmatrix} -10 \\ -2 \end{pmatrix}$$
Convergence to solution after n = 2 steps!
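Using the CG sketch from above on this system (a usage example; the two-step convergence for $n = 2$ is the statement of the slide, in exact arithmetic):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 100.0]])
b = np.array([2.0, -8.0])
x0 = np.array([-10.0, -2.0])

x = conjugate_gradients(A, b, x0)    # n = 2 unknowns -> exact solution after 2 steps
print(np.allclose(A @ x, b))         # True
```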
Part III
Preconditioning
• CG Convergence
• Preconditioning
• CG with "Change-of-Basis" Preconditioning
• Hierarchical Transforms
• CG with Matrix Preconditioner
• Preconditioners – Examples
• ILU and Incomplete Cholesky
Conjugate Gradients – Convergence
Convergence Analysis:
• uses the Krylov subspace:
  $$\operatorname{span}\left\{r^{(0)}, Ar^{(0)}, A^2 r^{(0)}, \dots, A^{i-1} r^{(0)}\right\}$$
• "Krylov subspace method"

Convergence Results:
• in principle: a direct method ($n$ steps)
  (however: orthogonality is lost due to round-off errors → exact solution not found)
• in practice: an iterative scheme with
  $$\left\|e^{(i)}\right\|_A \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^i \left\|e^{(0)}\right\|_A, \qquad \kappa = \lambda_{\max}/\lambda_{\min}$$
Preconditioning
• convergence depends on the matrix $A$
• idea: modify the linear system:
  $$Ax = b \;\longrightarrow\; M^{-1}Ax = M^{-1}b,$$
  then: convergence depends on the matrix $M^{-1}A$
• optimal preconditioner $M^{-1} = A^{-1}$:
  $$A^{-1}Ax = A^{-1}b \;\Leftrightarrow\; x = A^{-1}b.$$
• in practice:
  – avoid explicit computation of $M^{-1}A$
  – find an $M$ similar to $A$, compute the effect of $M^{-1}$ (i.e., approximate solution of the SLE)
  – or: find an $M^{-1}$ similar to $A^{-1}$
• also possible: solve $AMy = b$ for $y$, and then $x = My$
CG and Preconditioning
• just replace $A$ by $M^{-1}A$ in the algorithm??
• problem: $M^{-1}A$ is not necessarily symmetric (even if $M$ and $A$ both are)
• we will try an alternative first: symmetric preconditioning
  $$Ax = b \;\longrightarrow\; L^T A L\, \hat x = L^T b, \qquad x = L\hat x$$
• remember: for a Finite Element discretization, this corresponds to a change of basis functions!
• can we implement CG without having to set up $L^T A L$?
• requires some re-computations in the CG algorithm (see the following slides)
“Change-of-Basis” Preconditioning
• preconditioned system of equations:
  $$Ax = b \;\longrightarrow\; \underbrace{\left(L^T A L\right)}_{=:\,\hat A} \hat x = \underbrace{L^T b}_{=:\,\hat b}, \qquad x = L\hat x$$
• computation of the residual:
  $$\hat r = \hat b - \hat A \hat x = L^T b - L^T A L \hat x = L^T\left(b - Ax\right) = L^T r$$
• computation of $\alpha$ (for the preconditioned system):
  $$\alpha_i := \frac{\left(\hat r^{(i)}\right)^T \hat r^{(i)}}{\left(\hat d^{(i)}\right)^T \hat A\, \hat d^{(i)}}
  = \frac{\left(\hat r^{(i)}\right)^T \hat r^{(i)}}{\left(\hat d^{(i)}\right)^T L^T A L\, \hat d^{(i)}}
  = \frac{\left(\hat r^{(i)}\right)^T \hat r^{(i)}}{\left(\tilde d^{(i)}\right)^T A\, \tilde d^{(i)}}$$
  where we defined $L\hat d^{(i)} =: \tilde d^{(i)}$
• update of the solution:
  $$\hat x^{(i+1)} = \hat x^{(i)} + \alpha_i \hat d^{(i)} \;\Rightarrow\; x^{(i+1)} = L\hat x^{(i+1)} = L\hat x^{(i)} + L\alpha_i \hat d^{(i)} = x^{(i)} + \alpha_i \tilde d^{(i)}$$
“Change-of-Basis” Preconditioning (2)
• update of the residuals $\hat r$:
  $$\hat r^{(i+1)} = \hat r^{(i)} - \alpha_i \hat A\, \hat d^{(i)} = \hat r^{(i)} - \alpha_i L^T A L\, \hat d^{(i)} = \hat r^{(i)} - \alpha_i L^T A\, \tilde d^{(i)}$$
• computation of $\beta_{i+1}$:
  $$\beta_{i+1} = \frac{\left(\hat r^{(i+1)}\right)^T \hat r^{(i+1)}}{\left(\hat r^{(i)}\right)^T \hat r^{(i)}}$$
• update of the search directions:
  $$\hat d^{(i+1)} = \hat r^{(i+1)} + \beta_{i+1} \hat d^{(i)}
  \;\Rightarrow\; \tilde d^{(i+1)} = L\hat d^{(i+1)} = L\hat r^{(i+1)} + \beta_{i+1} L\hat d^{(i)} = L\hat r^{(i+1)} + \beta_{i+1} \tilde d^{(i)}$$
CG with “Change-of-Basis” Preconditioning
Start with $\hat r^{(0)} = L^T\left(b - Ax^{(0)}\right)$ and $\tilde d^{(0)} = L\hat r^{(0)}$.
While $\left\|\hat r^{(i)}\right\| > \epsilon$ iterate over:
1. $\alpha_i = \dfrac{\left(\hat r^{(i)}\right)^T \hat r^{(i)}}{\left(\tilde d^{(i)}\right)^T A\, \tilde d^{(i)}}$
2. $x^{(i+1)} = x^{(i)} + \alpha_i \tilde d^{(i)}$
3. $\hat r^{(i+1)} = \hat r^{(i)} - \alpha_i L^T A\, \tilde d^{(i)}$
4. $\beta_{i+1} = \dfrac{\left(\hat r^{(i+1)}\right)^T \hat r^{(i+1)}}{\left(\hat r^{(i)}\right)^T \hat r^{(i)}}$
5. $\tilde d^{(i+1)} = L\hat r^{(i+1)} + \beta_{i+1} \tilde d^{(i)}$
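A sketch of this variant, where the change of basis is passed in as callables `apply_L` and `apply_LT` (these names are assumptions for illustration; they could, e.g., be the O(N) hierarchical transforms of the following slides):

```python
import numpy as np

def cg_change_of_basis(A, b, x0, apply_L, apply_LT, tol=1e-10, max_iter=1000):
    """CG on L^T A L x_hat = L^T b, formulated without assembling L^T A L."""
    x = x0.astype(float).copy()
    r_hat = apply_LT(b - A @ x)                # r_hat^(0) = L^T (b - A x^(0))
    d_tilde = apply_L(r_hat)                   # d_tilde^(0) = L r_hat^(0)
    rr = r_hat @ r_hat
    for _ in range(max_iter):
        if np.sqrt(rr) < tol:
            break
        Ad = A @ d_tilde
        alpha = rr / (d_tilde @ Ad)            # 1.
        x = x + alpha * d_tilde                # 2.
        r_hat = r_hat - alpha * apply_LT(Ad)   # 3.
        rr_new = r_hat @ r_hat
        beta = rr_new / rr                     # 4.
        d_tilde = apply_L(r_hat) + beta * d_tilde   # 5.
        rr = rr_new
    return x
```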
Hierarchical Basis Preconditioning
Some specifics for the CG implementation:
• $L$ transforms a coefficient vector from the hierarchical basis to the nodal basis, for example $x = L\hat x$ or $\tilde d = L\hat d$
• $L^T$ transforms the vector of basis functions from the nodal basis to the hierarchical basis (cf. FEM), thus $\hat r = L^T r$
• the effect of $L$ and $L^T$ can be computed in $O(N)$ operations

HB-CG for the Poisson problem:
• in 1D: convergence after about $\log N$ iterations!
  (in this case: $L^T A L$ is a diagonal matrix with $\log N$ different eigenvalues)
• in 2D and 3D: very fast convergence!
• further improved by additional diagonal preconditioning
• so-called hierarchical generating systems (change to a multigrid basis) achieve multigrid-like performance
CG with Hierarchical Generating Systems
Recall: the system of linear equations $A_{GS} v_{GS} = b_{GS}$ is given as
$$\begin{pmatrix}
A_h & A_h P^h_{2h} & A_h P^h_{4h} \\
R^{2h}_h A_h & A_{2h} & A_{2h} P^{2h}_{4h} \\
R^{4h}_h A_h & R^{4h}_{2h} A_{2h} & A_{4h}
\end{pmatrix}
\begin{pmatrix} v_h \\ v_{2h} \\ v_{4h} \end{pmatrix}
= \begin{pmatrix} b_h \\ R^{2h}_h b_h \\ R^{4h}_h b_h \end{pmatrix}$$

Preconditioning for CG?
• the system $A_{GS} v_{GS} = b_{GS}$ is singular → a subspace of solutions $v_{GS}$ (minima of the quadratic form!)
• however: any of the solutions will do!
• convergence result:
  $$\left\|e^{(i)}\right\|_A \le 2\left(\frac{\sqrt{\hat\kappa_{GS}}-1}{\sqrt{\hat\kappa_{GS}}+1}\right)^i \left\|e^{(0)}\right\|_A, \qquad \hat\kappa_{GS} = \hat\lambda_{\max}/\hat\lambda_{\min}$$
  where $\hat\kappa_{GS}$ is the ratio of the largest vs. smallest non-zero eigenvalue
• for the Poisson equation: $\hat\kappa_{GS}$ independent of $h$ → multigrid convergence
Hierarchical Basis Transformation – Towards a Level-wise Approach
Consider the "semi-hierarchical" transform:
(figure: nodal basis functions $\Phi_1, \dots, \Phi_7$ at nodes $x_1, \dots, x_7$, transformed into the semi-hierarchical basis functions $\Psi_1, \dots, \Psi_7$)
The matrices for the change of basis are then ($H^{(2)}_3$ transforms to the hierarchical basis):
$$H^{(1)}_3 = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
\tfrac{1}{2} & 1 & \tfrac{1}{2} & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & \tfrac{1}{2} & 1 & \tfrac{1}{2} & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & \tfrac{1}{2} & 1 & \tfrac{1}{2} \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}, \qquad
H^{(2)}_3 = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & \tfrac{1}{2} & 0 & 1 & 0 & \tfrac{1}{2} & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{pmatrix}$$
Hierarchical Basis Transformation
Level-wise hierarchical transform:
• hierarchical basis transformation: $\psi_{n,i}(x) = \sum_j H_{i,j}\, \phi_{n,j}(x)$
• written as a matrix-vector product: $\vec\psi_n = H_n \vec\phi_n$
• $H_n \vec\phi_n$ can be performed as a sequence of level-wise transforms:
  $$H_n \vec\phi_n = H^{(n-1)}_n H^{(n-2)}_n \cdots H^{(2)}_n H^{(1)}_n \vec\phi_n$$
• consider Hierarchical-Basis preconditioning for the FEM discretization: $L^T := H_n$, and we need to compute $\hat r^{(0)} = L^T r^{(0)}$ and $L^T A\, \tilde d^{(i)}$
• each level-wise transform $H^{(k)}_n$ has a simple loop implementation:
  For k from 1 to n−1 do
    For j from 2^k to 2^n step 2^k do
      r_j := (1/2)·r_{j−2^{k−1}} + r_j + (1/2)·r_{j+2^{k−1}}
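A sketch of this double loop in NumPy, under the assumption of a 1D grid with nodes $0, \dots, 2^n$, zero values in the boundary entries, and the inner loop stopping before the right boundary node (the function name `apply_Hn` is illustrative). In the notation of the preceding slides this realizes $L^T = H_n$, e.g. for $\hat r = L^T r$:

```python
import numpy as np

def apply_Hn(r, n):
    """Apply H_n = H^(n-1) ... H^(1) to a vector of length 2**n + 1 (boundary entries 0)."""
    r = np.asarray(r, dtype=float).copy()
    for k in range(1, n):                          # levels k = 1, ..., n-1
        half = 2 ** (k - 1)
        for j in range(2 ** k, 2 ** n, 2 ** k):    # coarse nodes j = 2^k, 2*2^k, ...
            r[j] += 0.5 * r[j - half] + 0.5 * r[j + half]
    return r
```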
Hierarchical Coordinate Transformation
• the transform $b = H_n^T a$ turns "hierarchical" coefficients $a$ into "nodal" coefficients $b$:
  $$\sum_j b_j \phi_{n,j}(x) = \sum_i a_i \psi_{n,i}(x) \approx f(x)$$
• $H_n = H^{(n-1)}_n H^{(n-2)}_n \cdots H^{(2)}_n H^{(1)}_n$ has a level-wise representation, thus:
  $$H_n^T = \left(H^{(1)}_n\right)^T \left(H^{(2)}_n\right)^T \cdots \left(H^{(n-2)}_n\right)^T \left(H^{(n-1)}_n\right)^T$$
• consider Hierarchical-Basis preconditioning for the FEM discretization: $L := H_n^T$ for the computation of $\tilde d^{(0)} = L\hat r^{(0)}$ and $\tilde d^{(i+1)} = L\hat r^{(i+1)} + \beta_{i+1}\tilde d^{(i)}$
• again: use loop-based implementations for $\left(H^{(k)}_n\right)^T a$:
  For k from n−1 downto 1 do
    For i from 2^{k−1} to 2^n step 2^k do
      d_i := (1/2)·d_{i−2^{k−1}} + d_i + (1/2)·d_{i+2^{k−1}}   (with d_0 = d_{2^n} = 0)
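Correspondingly, a sketch of the transposed level-wise transforms under the same grid and boundary assumptions as above (the function name `apply_HnT` is illustrative); this realizes $L = H_n^T$ and could be passed, together with `apply_Hn`, to the change-of-basis CG sketch from earlier:

```python
import numpy as np

def apply_HnT(a, n):
    """Apply H_n^T = (H^(1))^T ... (H^(n-1))^T to a vector of length 2**n + 1."""
    d = np.asarray(a, dtype=float).copy()
    for k in range(n - 1, 0, -1):                  # levels k = n-1, ..., 1
        half = 2 ** (k - 1)
        for i in range(half, 2 ** n, 2 ** k):      # odd multiples of 2^(k-1)
            d[i] += 0.5 * d[i - half] + 0.5 * d[i + half]   # boundary entries stay 0
    return d
```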
CG and Preconditioning (revisited)
• preconditioning: replace $A$ by $M^{-1}A$
• problem: $M^{-1}A$ is not necessarily symmetric
• compare symmetric preconditioning:
  $$Ax = b \;\longrightarrow\; L^T A L\, \hat x = L^T b, \qquad x = L\hat x$$
• workaround: find $E^T E = M$ (Cholesky factorization), then
  $$Ax = b \;\longrightarrow\; E^{-T} A E^{-1} \hat x = E^{-T} b, \qquad \hat x = Ex$$
• what if $E$ cannot be computed (efficiently)? (neither $M$ nor $M^{-1}$ might be known explicitly!)
• $E$, $E^{-T}$, $E^{-1}$ can be eliminated from the algorithm (again requires some re-computations):
  set $\hat d = Ed$ and use $\hat r = E^{-T} r$, $\hat x = Ex$, $E^{-1}E^{-T} = M^{-1}$
CG with Preconditioner
Start: $r^{(0)} = b - Ax^{(0)}$; $d^{(0)} = M^{-1}r^{(0)}$
1. $\alpha_i = \dfrac{\left(r^{(i)}\right)^T M^{-1} r^{(i)}}{\left(d^{(i)}\right)^T A\, d^{(i)}}$
2. $x^{(i+1)} = x^{(i)} + \alpha_i d^{(i)}$
3. $r^{(i+1)} = r^{(i)} - \alpha_i A d^{(i)}$
4. $\beta_{i+1} = \dfrac{\left(r^{(i+1)}\right)^T M^{-1} r^{(i+1)}}{\left(r^{(i)}\right)^T M^{-1} r^{(i)}}$
5. $d^{(i+1)} = M^{-1} r^{(i+1)} + \beta_{i+1} d^{(i)}$
(for a detailed derivation, see Shewchuk)
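A sketch of this preconditioned CG where the preconditioner is passed as a callable `apply_Minv` that returns $M^{-1}v$ (the name is an assumption; any symmetric positive definite approximate solve will do, cf. the next slide):

```python
import numpy as np

def preconditioned_cg(A, b, x0, apply_Minv, tol=1e-10, max_iter=1000):
    """PCG following steps 1-5 above; apply_Minv(v) should return M^{-1} v."""
    x = x0.astype(float).copy()
    r = b - A @ x
    z = apply_Minv(r)                      # M^{-1} r^(0)
    d = z.copy()                           # d^(0) = M^{-1} r^(0)
    rz = r @ z
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ad = A @ d
        alpha = rz / (d @ Ad)              # 1. (r^T M^{-1} r) / (d^T A d)
        x = x + alpha * d                  # 2.
        r = r - alpha * Ad                 # 3.
        z = apply_Minv(r)
        rz_new = r @ z
        beta = rz_new / rz                 # 4.
        d = z + beta * d                   # 5.
        rz = rz_new
    return x
```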
Implementation
Preconditioning steps: $M^{-1}r^{(i)}$, $M^{-1}r^{(i+1)}$
• $M^{-1}$ known? Then multiply: $M^{-1}r^{(i)}$
• $M$ known? Then solve $My = r^{(i)}$ to obtain $y = M^{-1}r^{(i)}$
• neither $M$ nor $M^{-1}$ known explicitly:
  • an algorithm to solve $My = r^{(i)}$ is sufficient!
  • an algorithm to compute the action of $M^{-1}$ on a vector is sufficient!
⇒ any approximate solver for $Ae = r^{(i)}$ is sufficient (if it is equivalent to applying a symmetric and positive definite matrix)
Preconditioners for CG – Examples
• find $M \approx A$ and compute the effect of $M^{-1}$:
  • Jacobi preconditioning: $M := D_A$ (see the sketch after this list)
  • (symmetric) Gauss-Seidel preconditioning: $M := L_A$ or $M = (D_A + L'_A)\, D_A^{-1}\, (D_A + (L'_A)^T)$, etc.
• just compute the effect of $M^{-1}$:
  • any approximate solver might do → incl. multigrid methods
  • incomplete LU decomposition (ILU); should be symmetric → incomplete Cholesky factorization
  • use a multigrid method as preconditioner(?) → worthwhile (only) in situations where multigrid does not work (well) as a stand-alone solver
• find an $M^{-1}$ similar to $A^{-1}$:
  • "sparse approximate inverse" (SPAI)
  • tries to minimise $\|I - MA\|_F$, where $M$ is a matrix with a (given) sparse non-zero pattern
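As the simplest concrete case, a Jacobi preconditioner ($M := D_A$) only needs the diagonal of $A$. A sketch that plugs into the preconditioned CG sketch from the previous slides (function names are illustrative):

```python
import numpy as np

def make_jacobi_preconditioner(A):
    """Return a callable computing M^{-1} v for M = diag(A)."""
    diag = np.diag(A).copy()
    return lambda v: v / diag

# usage with the matrix from the earlier example
A = np.array([[3.0, 2.0], [2.0, 100.0]])
b = np.array([2.0, -8.0])
x = preconditioned_cg(A, b, np.zeros(2), make_jacobi_preconditioner(A))
```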
Preconditioners – ILU and Incomplete Cholesky
Recall LU decomposition and Cholesky factorization:
• LU decomposition: given $A$, find lower/upper triangular matrices $L$ and $U$ such that $A = LU$
• Cholesky factorization: given $A = A^T$, find a lower triangular matrix $L$ such that $A = LL^T$ → symmetric preconditioning!
• variants with an explicit diagonal matrix $D$:
  $$A = LD^{-1}U \quad \text{or} \quad A = LD^{-1}L^T,$$
  where $L = D + L'$ and $U = D + U'$ with strictly lower/upper triangular $L'$, $U'$
• but: for sparse $A$, $L$ and $U$ may be non-sparse

Idea: disregard all fill-in during the factorization
Cholesky Factorization
$$\begin{pmatrix}
L_{11} & & \\
L_{21} & L_{22} & \\
L_{31} & L_{32} & L_{33}
\end{pmatrix}
\begin{pmatrix}
D_{11}^{-1} & & \\
& D_{22}^{-1} & \\
& & D_{33}^{-1}
\end{pmatrix}
\begin{pmatrix}
L_{11}^T & L_{21}^T & L_{31}^T \\
& L_{22}^T & L_{32}^T \\
& & L_{33}^T
\end{pmatrix}
=
\begin{pmatrix}
A_{11} & A_{21}^T & A_{31}^T \\
A_{21} & A_{22} & A_{32}^T \\
A_{31} & A_{32} & A_{33}
\end{pmatrix}$$
Derive the factorization algorithm:
• assume that $A_{11} \overset{!}{=} L_{11} D_{11}^{-1} L_{11}^T$ is already factorized
• let $L_{21}$ be a $1 \times k$ submatrix, i.e., to compute the next row of $L$:
  $$L_{21} D_{11}^{-1} L_{11}^T \overset{!}{=} A_{21}$$
  $D_{11}^{-1} L_{11}^T$ is an upper triangular matrix → solve a triangular system for $L_{21}$
• by convention $L_{22} = D_{22}$, which is computed from:
  $$L_{21} D_{11}^{-1} L_{21}^T + L_{22} D_{22}^{-1} L_{22}^T \overset{!}{=} A_{22} \;\Rightarrow\; L_{22} = D_{22} := A_{22} - L_{21} D_{11}^{-1} L_{21}^T$$
Incomplete Cholesky Factorization
Algorithm ($A \to LD^{-1}L^T$):
• initialize $D := 0$, $L := 0$
• for $i = 1, \dots, n$:
  1. for $k = 1, \dots, i-1$:
     if $(i,k) \in S$ then set $L_{ik} := A_{ik} - \sum_{j<k} L_{ij} D_{jj}^{-1} L_{kj}$ (else $L_{ik} := 0$)
  2. set $L_{ii} = D_{ii} := A_{ii} - \sum_{j<i} L_{ij} D_{jj}^{-1} L_{ij}$
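A dense NumPy sketch of this incomplete factorization, assuming the sparsity pattern $S$ is taken as the non-zero pattern of $A$ (IC(0)-style); in practice this would use sparse storage, and the function name is illustrative:

```python
import numpy as np

def incomplete_cholesky_ldlt(A):
    """Compute L, D with A ~= L D^{-1} L^T, keeping only entries where A is non-zero."""
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    D = np.zeros(n)
    for i in range(n):
        for k in range(i):
            if A[i, k] != 0.0:                      # (i, k) in the pattern S
                s = sum(L[i, j] * L[k, j] / D[j] for j in range(k) if L[i, j] and L[k, j])
                L[i, k] = A[i, k] - s
        s = sum(L[i, j] ** 2 / D[j] for j in range(i) if L[i, j])
        D[i] = A[i, i] - s
        L[i, i] = D[i]                              # convention: L_ii = D_ii
    return L, D
```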
Literature/References
Conjugate Gradients:
• Shewchuk: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain.
• Hackbusch: Iterative Solution of Large Sparse Systems of Equations, Springer, 1993.
• M. Griebel: Multilevelmethoden als Iterationsverfahren über Erzeugendensystemen, Teubner Skripten zur Numerik, 1994.
• M. Griebel: Multilevel algorithms considered as iterative methods on semidefinite systems, SIAM J. Sci. Comput. 15(3), 1994.
Quadratic Forms and Steepest Descent: Quadratic Forms · Direction of Steepest Descent · Steepest Descent
Conjugate Gradients: Conjugate Directions · A-Orthogonality · Conjugate Gradients · A Miracle Occurs ... · CG Algorithm
Preconditioning: CG Convergence · Preconditioning · CG with "Change-of-Basis" Preconditioning · Hierarchical Transforms · CG with Matrix Preconditioner · Preconditioners – Examples · ILU and Incomplete Cholesky