Introduction to Iterative Methods for Solving Sparse Linear Systems

Dr. Doreen De Leon
Math 191T, Spring 2019
Source: zimmer.fresnostate.edu/.../itmethods_spmat.pdf



Introduction to Iterative Methods for SolvingSparse Linear Systems

Dr. Doreen De Leon

Math 191T, Spring 2019



Outline

1. Motivation
2. Some Linear Algebra Background
3. Discretization of Partial Differential Equations (PDEs)
     Discretizations in One Dimension
     Discretizations in Two Dimensions
4. Basic Iterative Methods
     Splittings and Overrelaxation
     Convergence Results
5. Krylov Subspace Methods
     Orthogonal Krylov Subspace Methods
     Biorthogonal Krylov Subspace Methods
6. Preconditioning


Motivation

A sparse matrix is a matrix in which most of the entries are zero.

Large sparse linear systems arise in a variety of applications.

Examples appear in combinatorics, network theory (when there is a low density of connections), and science and engineering (often in the form of numerical solutions of partial differential equations).

Focus in this talk: numerical solution of differential equations.
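As a quick illustration of why sparsity matters for storage (a sketch, not from the talk; it assumes SciPy is available), a sparse format such as CSR keeps only the nonzero entries:

```python
import scipy.sparse as sp

n = 1000
# Tridiagonal matrix: only about 3n of the n^2 entries are nonzero.
A = sp.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")

dense_entries = n * n        # what a dense array would have to store
stored_entries = A.nnz       # what the sparse format actually stores
```

Here only 2,998 of the 1,000,000 entries are stored explicitly.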


Some Linear Algebra Background

M-matrix: A matrix A is an M-matrix if it satisfies the following properties:

1. a_ii > 0 for i = 1, 2, ..., n
2. a_ij ≤ 0 for i ≠ j, i, j = 1, 2, ..., n
3. A is nonsingular
4. A⁻¹ ≥ 0.

Positive definite matrix: A is positive definite if (Au, u) > 0 for all u ∈ R^n such that u ≠ 0.

Symmetric positive definite (SPD): A is symmetric positive definite if A^T = A and A is positive definite.
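Numerically, symmetry plus a successful Cholesky factorization characterizes SPD matrices; a minimal sketch (the helper name is ours, not from the talk):

```python
import numpy as np

def is_spd(A, tol=1e-12):
    """SPD check: symmetry plus a successful Cholesky factorization."""
    if not np.allclose(A, A.T, atol=tol):
        return False
    try:
        np.linalg.cholesky(A)   # raises LinAlgError unless positive definite
        return True
    except np.linalg.LinAlgError:
        return False

A = np.array([[2.0, -1.0], [-1.0, 2.0]])   # SPD
B = np.array([[1.0,  2.0], [2.0,  1.0]])   # symmetric but indefinite
```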


Some More Linear Algebra

Two flavors of diagonal dominance:

(weak) diagonal dominance: |a_jj| ≥ ∑_{i=1, i≠j}^{n} |a_ij|, j = 1, 2, ..., n

strict diagonal dominance: |a_jj| > ∑_{i=1, i≠j}^{n} |a_ij|, j = 1, 2, ..., n

Irreducible matrix: A is irreducible if there is no permutation matrix P such that PAP^T is block upper triangular.

Theorem: If A is a nonnegative irreducible matrix, then λ ≡ ρ(A) is a simple eigenvalue of A.
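Both flavors of dominance are easy to test directly; a sketch (the helper name is ours) following the column-wise form stated above:

```python
import numpy as np

def diagonally_dominant(A, strict=False):
    """Column-wise diagonal dominance: |a_jj| >= (or > if strict)
    the sum over i != j of |a_ij|, for every column j."""
    d = np.abs(np.diag(A))
    off = np.abs(A).sum(axis=0) - d   # column sums of off-diagonal magnitudes
    return bool(np.all(d > off)) if strict else bool(np.all(d >= off))

A = np.array([[ 2.0, -1.0,  0.0],
              [-1.0,  2.0, -1.0],
              [ 0.0, -1.0,  2.0]])
```

For this matrix the middle column has |a_22| exactly equal to the off-diagonal sum, so it is weakly but not strictly diagonally dominant.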


Partial Differential Equations

Physical phenomena are often modeled by equations relating several partial derivatives of physical quantities.

Such equations rarely have an explicit solution.

Model problems to be used in this talk:

Poisson's equation: ∂²u/∂x² + ∂²u/∂y² = f(x, y) for (x, y) ∈ Ω, where Ω is a bounded, open domain in R².

General elliptic equation: ∂/∂x(a ∂u/∂x) + ∂/∂y(a ∂u/∂y) = f(x, y).

Steady-state convection-diffusion equation: −∇ · (a∇u) + b · ∇u = f.


Introduction to Finite Difference Discretizations – 1-D

Based on local approximations of partial derivatives using low-order Taylor expansions.

Basic approximations (one dimension), with step size h:

Forward difference: du/dx ≈ (u(x + h) − u(x))/h.

Backward difference: du/dx ≈ (u(x) − u(x − h))/h (the forward difference with −h).

Centered difference: du/dx ≈ (u(x + h) − u(x − h))/(2h) (from combining the Taylor approximations for u(x + h) and u(x − h)).

Taylor series expansion of the right-hand sides shows the error is O(h) for the forward and backward differences and O(h²) for the centered difference.
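The orders of accuracy can be verified numerically (a sketch, not from the talk; the test function sin is arbitrary): halving h should roughly halve an O(h) error and quarter an O(h²) error.

```python
import numpy as np

u, du = np.sin, np.cos     # smooth test function and its exact derivative
x0 = 1.0

def fd_error(h, scheme):
    if scheme == "forward":
        approx = (u(x0 + h) - u(x0)) / h
    elif scheme == "backward":
        approx = (u(x0) - u(x0 - h)) / h
    else:                  # centered
        approx = (u(x0 + h) - u(x0 - h)) / (2 * h)
    return abs(approx - du(x0))

h = 1e-3
ratio_fwd = fd_error(h, "forward") / fd_error(h / 2, "forward")    # ~2
ratio_ctr = fd_error(h, "centered") / fd_error(h / 2, "centered")  # ~4
```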


Introduction to Finite Difference Discretizations – 1-D (cont.)

Second derivative approximation (centered difference approximation): add the Taylor approximations for u(x + h) and u(x − h) and divide by h² to obtain

d²u/dx² = (u(x + h) − 2u(x) + u(x − h))/h² + O(h²).

Note: O(h²) means that the dominant term in the error as h → 0 is a constant times h².


One-Dimensional Example

Consider the equation

−u″(x) = f(x), for x ∈ (0, 1)
u(0) = u(1) = 0.

Discretize [0, 1]: x_i = ih, i = 0, 1, ..., n + 1, so h = 1/(n + 1).

Values at the boundary are known, so we only number the interior points x_i with i = 1, 2, ..., n.

At x_i, the centered-difference approximation gives

(1/h²)(−u_{i+1} + 2u_i − u_{i−1}) = f_i,

where u_i ≈ u(x_i), etc.


One-Dimensional Example (cont.)

Resulting linear system: Ax = f, where A = (1/h²) tridiag(−1, 2, −1) is the n × n tridiagonal matrix with 2 on the diagonal and −1 on the sub- and superdiagonals, scaled by 1/h².

This is a sparse linear system, because the coefficient matrix consists mostly of zeros.
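A small end-to-end sketch (not from the talk): build this tridiagonal matrix sparsely, solve against a manufactured solution u(x) = sin(πx), and confirm the O(h²) accuracy.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 100
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)          # interior points x_1, ..., x_n

# A = (1/h^2) tridiag(-1, 2, -1), the matrix described above
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h**2

# Manufactured solution u(x) = sin(pi x) gives f = -u'' = pi^2 sin(pi x)
f = np.pi**2 * np.sin(np.pi * x)
u_h = spla.spsolve(A, f)

max_err = np.max(np.abs(u_h - np.sin(np.pi * x)))   # should be O(h^2)
```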


Upwind Schemes

Consider the 1-D version of the convection-diffusion equation:

−au″ + bu′ = 0, x ∈ (0, 1)
u(0) = 0, u(1) = 1.

The exact solution is u(x) = (1 − e^{Rx})/(1 − e^R), where R = b/a.


Upwind Schemes (cont.)

Consideration of the various discretization schemes shows that only the forward or backward difference gives a non-oscillating solution approximating the true solution, depending on the sign of b.

Solution: Use forward differencing when b < 0 and backward differencing when b > 0, giving the upwind scheme.

Define the upwind scheme by

bu′(x_i) ≈ (1/2)(b − |b|)(u_{i+1} − u_i)/h + (1/2)(b + |b|)(u_i − u_{i−1})/h.
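The oscillation phenomenon can be reproduced in a few lines (a sketch, not from the talk; the parameter values a, b, n are illustrative, and since b > 0 here the upwind formula reduces to backward differencing):

```python
import numpy as np

a, b, n = 0.02, 1.0, 20      # strong convection relative to diffusion
h = 1.0 / (n + 1)

def solve(scheme):
    A = np.zeros((n, n))
    rhs = np.zeros(n)
    for i in range(n):
        # diffusion term -a u'': a(-u_{i-1} + 2 u_i - u_{i+1}) / h^2
        diag, lower, upper = 2 * a / h**2, -a / h**2, -a / h**2
        if scheme == "centered":      # b u' ~ b (u_{i+1} - u_{i-1}) / (2h)
            lower -= b / (2 * h)
            upper += b / (2 * h)
        else:                         # upwind for b > 0: b (u_i - u_{i-1}) / h
            diag += b / h
            lower -= b / h
        A[i, i] = diag
        if i > 0:
            A[i, i - 1] = lower
        if i < n - 1:
            A[i, i + 1] = upper
        else:
            rhs[i] -= upper * 1.0     # right boundary value u(1) = 1
    return np.linalg.solve(A, rhs)

u_c, u_up = solve("centered"), solve("upwind")

def oscillates(u):
    """Sign changes in successive differences flag oscillation."""
    d = np.diff(u)
    return bool(np.any(d[:-1] * d[1:] < 0))
```

On this coarse mesh the centered solution oscillates while the upwind solution stays monotone.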


Introduction to Finite Difference Discretizations – 2-D

Consider the Poisson problem

−(∂²u/∂x² + ∂²u/∂y²) = f in Ω
u = 0 on ∂Ω,

where Ω is the rectangle (0, 1) × (0, 1).

Discretize in the x and y directions with uniform grids: take m + 2 points in x and n + 2 points in y. So x_i = ih_x, i = 0, 1, ..., m + 1 and y_j = jh_y, j = 0, 1, ..., n + 1, where

h_x = 1/(m + 1), h_y = 1/(n + 1).


Finite Difference Discretizations for Poisson Problem

Values at the boundary are known, so we only number the interior points, i.e., the points (x_i, y_j), i = 1, 2, ..., m, j = 1, 2, ..., n.

At (x_i, y_j), the centered-difference approximation gives

(1/h_x²)(−u_{i+1,j} + 2u_{i,j} − u_{i−1,j}) + (1/h_y²)(−u_{i,j+1} + 2u_{i,j} − u_{i,j−1}) = f_{i,j},

where u_{i,j} ≈ u(x_i, y_j), etc.

Gives a linear system Ax = f , with A block tridiagonal.


Finite Difference Discretization of Poisson – Form of A

If h_x = h_y = h, then A is block tridiagonal of the form A = (1/h²) tridiag(−I, B, −I), with blocks B on the diagonal and −I on the sub- and superdiagonals.

B is tridiagonal with diagonal 4 and off-diagonals of −1.

Therefore, A is a sparse matrix.


Finite Difference Discretization of Poisson – Pattern of Nonzero Entries of A

Pattern of nonzero entries in A if m = 7, n = 5 (spy plot in the original slides).


What is an Iterative Method?

A method in which you first "guess" the solution, then apply an algorithm to obtain a better approximation to the solution.

Example from Calculus: Newton’s method.


Relationship Between Error and Residual

Given an approximate solution x̃ ∈ R^n to the linear system Ax = b, the residual vector is r = b − Ax̃.

In general, we expect ‖r‖ small ⟹ ‖x − x̃‖ small.

It turns out that if x̃ is an approximation to the solution of Ax = b, A is nonsingular, and r is the residual vector for x̃, then, for any induced norm,

‖x − x̃‖ ≤ ‖r‖ ‖A⁻¹‖.

This is important because we don’t know the exact solution inmost cases.


Relaxation Methods

Consider the linear system

Ax = b,

where A is an n × n real matrix and b ∈ R^n.

Start with a given approximate solution (our "guess").

Modify the components of the approximation in a certain order.

Stop when convergence is reached (in terms of a set tolerance).

Decompose A as A = D − E − F, where D is the diagonal of A, −E is the strictly lower part, and −F is the strictly upper part.


Jacobi Iteration

The ith component of the (k + 1)st approximation of x is defined to eliminate the ith component of the residual (i.e., solve (b − Ax^(k+1))_i = 0, with the other components held at iteration k).

So, we have

a_ii x_i^(k+1) = − ∑_{j=1, j≠i}^{n} a_ij x_j^(k) + b_i, i = 1, 2, ..., n, or

x_i^(k+1) = (1/a_ii)(b_i − ∑_{j=1, j≠i}^{n} a_ij x_j^(k)), i = 1, 2, ..., n.


Vector Form of Jacobi

Writing the above in vector form gives

x^(k+1) = D⁻¹(E + F)x^(k) + D⁻¹b.

Alternately, we have

Dx^(k+1) = (E + F)x^(k) + b.
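The vector form translates directly into code; a minimal sketch (not from the talk, and the helper name is ours), using a small strictly diagonally dominant test system:

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-8, maxit=10_000):
    """Jacobi: x^(k+1) = D^{-1}((E + F) x^(k) + b), stopping on small updates."""
    D = np.diag(A)                 # diagonal entries of A
    R = A - np.diag(D)             # off-diagonal part, R = -(E + F)
    x = x0.copy()
    for k in range(maxit):
        x_new = (b - R @ x) / D
        if np.linalg.norm(x_new - x) < tol:
            return x_new, k + 1
        x = x_new
    return x, maxit

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])   # strictly diagonally dominant
b = np.array([2.0, 4.0, 10.0])
x_jac, it_jac = jacobi(A, b, np.zeros(3))
```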


Gauss-Seidel Iteration

Corrects the ith component of the current approximate solution, in the same order as Jacobi, but updates the approximate solution immediately when the new component is determined.

So, the residual at the ith step in the (k + 1)st iteration is

b_i − ∑_{j=1}^{i−1} a_ij x_j^(k+1) − a_ii x_i^(k+1) − ∑_{j=i+1}^{n} a_ij x_j^(k) = 0, i = 1, ..., n.

Resulting iteration:

x_i^(k+1) = (1/a_ii)(b_i − ∑_{j=1}^{i−1} a_ij x_j^(k+1) − ∑_{j=i+1}^{n} a_ij x_j^(k)), i = 1, ..., n.


Vector Form of Gauss-Seidel Iteration

In vector form, we can write the defining equation as

b + Ex^(k+1) − Dx^(k+1) + Fx^(k) = 0.

Solving for x^(k+1) gives

x^(k+1) = (D − E)⁻¹Fx^(k) + (D − E)⁻¹b.

Note that this is equivalent to solving the triangular system

(D − E)x^(k+1) = Fx^(k) + b.
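In code, the forward substitution amounts to overwriting each component as soon as it is recomputed; a sketch (not from the talk, helper name ours):

```python
import numpy as np

def gauss_seidel(A, b, x0, tol=1e-8, maxit=10_000):
    """Forward Gauss-Seidel: solve (D - E) x^(k+1) = F x^(k) + b by updating
    each component in place as soon as it is computed."""
    n = len(b)
    x = x0.copy()
    for k in range(maxit):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old) < tol:
            return x, k + 1
    return x, maxit

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([2.0, 4.0, 10.0])
x_gs, it_gs = gauss_seidel(A, b, np.zeros(3))
```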


Backward Gauss-Seidel Iteration

Backward Gauss-Seidel iteration may be defined as

(D − F)x^(k+1) = Ex^(k) + b.

This is equivalent to making the coordinate corrections in the reverse order.


Symmetric Gauss-Seidel Iteration

Consists of a forward sweep followed by a backward sweep.

So, do the following:

(D − E)x^(∗) = Fx^(k) + b
(D − F)x^(k+1) = Ex^(∗) + b.


Splittings

Jacobi and Gauss-Seidel iterations both have the form

Mx^(k+1) = Nx^(k) + b = (M − A)x^(k) + b,   (1)

where A = M − N is a splitting of A.

For Jacobi, M = D. For Gauss-Seidel:
  Forward Gauss-Seidel: M = D − E.
  Backward Gauss-Seidel: M = D − F.

An iterative method of the form (1) can be obtained for any splitting of the form A = M − N, where M is nonsingular.


Successive Overrelaxation (SOR) Method

Overrelaxation is based on the splitting

ωA = (D − ωE) − (ωF + (1 − ω)D).

The corresponding successive overrelaxation method (SOR) is given by the recursion

(D − ωE)x^(k+1) = [ωF + (1 − ω)D]x^(k) + ωb.

This iteration corresponds to the relaxation sequence

x_i^(k+1) = ω x_i^GS + (1 − ω)x_i^(k), i = 1, 2, ..., n,

where x_i^GS is defined by the Gauss-Seidel iteration.

A backward SOR sweep can be defined analogously to thebackward Gauss-Seidel sweep.
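The relaxation sequence above is a one-line change to the Gauss-Seidel sweep; a sketch (not from the talk, helper name ours) showing the speedup on a 1-D Poisson matrix, where over-relaxation pays off dramatically (ω = 1.8 here is illustrative, close to but not exactly optimal):

```python
import numpy as np

def sor(A, b, omega, x0, tol=1e-8, maxit=20_000):
    """SOR sweep: new component = omega * (Gauss-Seidel value)
    + (1 - omega) * (old value); omega = 1 recovers Gauss-Seidel."""
    n = len(b)
    x = x0.copy()
    for k in range(maxit):
        x_old = x.copy()
        for i in range(n):
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x_gs = (b[i] - s) / A[i, i]
            x[i] = omega * x_gs + (1 - omega) * x[i]
        if np.linalg.norm(x - x_old) < tol:
            return x, k + 1
    return x, maxit

n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Poisson matrix
b = np.ones(n)
_, it_gs = sor(A, b, 1.0, np.zeros(n))        # omega = 1: Gauss-Seidel
x_sor, it_sor = sor(A, b, 1.8, np.zeros(n))   # over-relaxed
```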


Symmetric SOR (SSOR)

One SSOR step consists of a forward SOR step followed by a backward SOR step:

(D − ωE)x^(k+1/2) = [ωF + (1 − ω)D]x^(k) + ωb
(D − ωF)x^(k+1) = [ωE + (1 − ω)D]x^(k+1/2) + ωb.

This gives the recurrence

x^(k+1) = G_ω x^(k) + f_ω,

where

G_ω = (D − ωF)⁻¹(ωE + (1 − ω)D)(D − ωE)⁻¹(ωF + (1 − ω)D)
f_ω = ω(2 − ω)(D − ωF)⁻¹D(D − ωE)⁻¹b.


Comparison of Results: Poisson Problem

Consider the problem −(u_xx + u_yy) = 0 on (0, 1) × (0, 1), with u = 0 on the boundary.

Table: Iterations until Convergence to Tolerance of 10⁻⁶

Grid size | Jacobi | Gauss-Seidel | SOR | SSOR
5 × 10    | 271    | 137          | 32  | 36
10 × 10   | 418    | 211          | 44  | 45
10 × 15   | 655    | 330          | 54  | 57

Note: SOR and SSOR are computed using approximate optimal ω.


Comparison of Results: Example with Discontinuous a

Consider the problem −((au_x)_x + (au_y)_y) = 0 on (0, 1) × (0, 1), with u = 0 on the boundary, and

a(x, y) = 100 if 0.25 < x < 0.75 and 0.25 < y < 0.75, and a(x, y) = 1 otherwise.

Table: Iterations until Convergence to Tolerance of 10⁻⁶

Grid size | Jacobi   | Gauss-Seidel | SOR   | SSOR
5 × 10    | >100,000 | >70,000      | 792   | 28,803
10 × 10   | >200,000 | >100,000     | 1,078 | 33,519
10 × 15   | >300,000 | >100,000     | 1,136 | 58,133

Note: SOR and SSOR are computed using approximate optimal ω.


Iteration Matrices

Jacobi, Gauss-Seidel, SOR, and SSOR can be written in the form

x^(k+1) = Gx^(k) + f.   (2)

G for Jacobi and Gauss-Seidel:
  G_JA = I − D⁻¹A
  G_GS = I − (D − E)⁻¹A

Consider the splitting A = M − N. Then we can define a linear fixed-point iteration

x^(k+1) = M⁻¹Nx^(k) + M⁻¹b,   (3)

so in (2),

G = M⁻¹N = M⁻¹(M − A) = I − M⁻¹A,   f = M⁻¹b.


M Matrices for Jacobi, Gauss-Seidel, SOR, and SSOR

M_JA = D
M_GS = D − E
M_SOR = (1/ω)(D − ωE)
M_SSOR = (1/(ω(2 − ω)))(D − ωE)D⁻¹(D − ωF)


General Convergence Results

Consider an iteration scheme defined by x^(k+1) = Gx^(k) + f. Questions to answer:

1. If the method converges, does it converge to the correct answer?
2. Under what conditions does the iteration converge?
3. When the iteration converges, how fast does it converge?


Answering the First Convergence Question

The first question is easy to answer.

If the iteration converges, its limit satisfies x = Gx + f .

If A = M − N and G = M⁻¹N, then the iteration can be written as

x^(k+1) = M⁻¹Nx^(k) + M⁻¹b,

which in the limit satisfies

Mx = Nx + b, or Ax = b.


Conditions for Convergence

Theorem: Let G be a square matrix such that ρ(G) < 1. Then I − G is nonsingular and the iteration (2) converges for any f and any initial guess. Conversely, if the iteration (2) converges for any f and any initial guess, then ρ(G) < 1.

Note: We can replace the condition ρ(G) < 1 by ‖G‖ < 1 for some matrix norm, since ρ(G) ≤ ‖G‖.
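For the 1-D Poisson matrix, the theorem's hypothesis is easy to verify numerically (a sketch, not from the talk), using the closed form ρ(G_JA) = cos(π/(n + 1)) for this matrix:

```python
import numpy as np

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Poisson (times h^2)

D = np.diag(np.diag(A))
G_jac = np.eye(n) - np.linalg.solve(D, A)   # G_JA = I - D^{-1} A
rho = max(abs(np.linalg.eigvals(G_jac)))
# rho < 1, so Jacobi converges for any initial guess on this problem.
```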


Speed of Convergence

The error d^(k) = x^(k) − x at step k satisfies d^(k) = G^k d^(0).

Some work shows that ‖d^(k)‖ has the asymptotic form

‖d^(k)‖ ≈ C |λ₁|^(k−p+1) · (k choose p − 1),

where λ₁ is a dominant eigenvalue of G and p is the size of its largest Jordan block.

The convergence factor has the limit

ρ = lim_{k→∞} (‖d^(k)‖ / ‖d^(0)‖)^(1/k).

It can be shown that ρ = ρ(G).

The convergence rate τ is defined by τ = − ln ρ.


Some Other Convergence Results

Theorem: If A is strictly diagonally dominant or irreducibly diagonally dominant, then the associated Jacobi and Gauss-Seidel iterations converge for any initial guess.

Theorem: If A is symmetric with positive diagonal elements and 0 < ω < 2, then SOR converges for any x^(0) if and only if A is positive definite.


The Optimal ω for SOR

The optimal ω for SOR satisfies

ω = 2 / (1 + √(1 − ρ(G_JA)²)),

where G_JA is the iteration matrix for Jacobi. (Results from a theorem and additional work.)
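Applied to the 1-D Poisson matrix, where ρ(G_JA) = cos(π/(n + 1)) is known in closed form, the formula can be evaluated directly (a sketch, not from the talk):

```python
import numpy as np

n = 20
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1-D Poisson matrix

D = np.diag(np.diag(A))
rho_jac = max(abs(np.linalg.eigvals(np.eye(n) - np.linalg.solve(D, A))))
omega_opt = 2 / (1 + np.sqrt(1 - rho_jac**2))   # between 1 and 2
```

For this matrix the formula simplifies to ω = 2/(1 + sin(π/(n + 1))).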


Introduction to Projection Methods

Consider the linear system Ax = b, where A is a real n × n matrix.

The idea of projection methods is to extract an approximate solution to the system from a subspace of Rⁿ.

Let K be the subspace of candidate approximants (or the search subspace) and let dim(K) = m. Then m constraints are required to obtain an approximation.

Introduction to Projection Methods (cont.)

The typical way to describe the constraints is to impose m independent orthogonality conditions: the residual vector b − Ax is constrained to be orthogonal to m linearly independent vectors (the Petrov-Galerkin condition).

This defines a subspace L of dimension m, the subspace of constraints.

Two classes of projection methods:

orthogonal: L = K.
oblique: L is different from K and may be totally unrelated to it.

General Projection Methods

A projection technique onto the subspace K and orthogonal to L finds an approximate solution to the system of equations by doing the following:

Find x ∈ K such that b − Ax ⊥ L.

To take advantage of the knowledge of an initial guess x0 to the solution, the approximations must be sought in the affine space x0 + K, so instead we do:

Find x ∈ x0 + K such that b − Ax ⊥ L.

Gauss-Seidel can be viewed as a projection method with K = L = span{e_i}, where the projection steps are cycled for i = 1, 2, . . . , n until convergence.

Optimality Results I

Proposition. Suppose A is SPD and L = K. Then a vector x̃ is the result of an orthogonal projection method onto K with starting vector x0 iff it minimizes the A-norm of the error over x0 + K, i.e., iff

E(x̃) = min_{x ∈ x0+K} E(x),

where E(x) ≡ (A(x∗ − x), x∗ − x)^(1/2).

Optimality Results II

Proposition. Let A be an arbitrary square matrix and assume L = AK. Then a vector x̃ is the result of an oblique projection method onto K orthogonally to L with the starting vector x0 iff it minimizes the 2-norm of the residual vector r = b − Ax over x ∈ x0 + K, i.e., iff

R(x̃) = min_{x ∈ x0+K} R(x),

where R(x) ≡ ‖b − Ax‖2.

Orthogonal Krylov Subspace Methods

The subspace Km in the projection method has the form

Km(A, r0) = span{r0, Ar0, A²r0, . . . , A^(m−1)r0},

where r0 = b − Ax0.

Typically, Km(A, r0) is denoted by Km when there is no ambiguity.

Different Krylov subspace methods result from different choices of Lm and different ways the system is preconditioned.

Approximation Theory View

The approximations obtained from a Krylov subspace method take the form

A^(−1)b ≈ xm = x0 + qm−1(A)r0,

where qm−1(A) is a polynomial of degree m − 1.

If x0 = 0, this gives A^(−1)b ≈ qm−1(A)b, i.e., A^(−1)b is approximated by a polynomial in A applied to b.

Arnoldi’s Procedure

An algorithm for building an orthonormal basis of the Krylov subspace Km. One variant of the algorithm:

1. Choose a vector v1 such that ‖v1‖2 = 1
2. For j = 1, 2, . . . , m, Do
3. Compute hij = (Avj, vi) for i = 1, 2, . . . , j
4. Compute wj = Avj − Σ_{i=1}^{j} hij vi
5. h_{j+1,j} = ‖wj‖2
6. If h_{j+1,j} = 0, then Stop
7. v_{j+1} = wj / h_{j+1,j}
8. End Do
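A direct transcription of the procedure in pure Python, applied to a small matrix of my choosing (not from the slides); the check at the end confirms the computed basis is orthonormal:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def arnoldi(A, v1, m):
    """Arnoldi's procedure: orthonormal basis V of K_m(A, v1) plus the
    (m+1) x m Hessenberg matrix H satisfying A V_m = V_{m+1} H_m."""
    V = [v1[:]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):                 # orthogonalize against v_1..v_j
            H[i][j] = dot(w, V[i])
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        if H[j + 1][j] == 0.0:                 # breakdown: K_j is invariant
            break
        V.append([wk / H[j + 1][j] for wk in w])
    return V, H

# Illustrative 4x4 nonsymmetric matrix.
A = [[2.0, 1.0, 0.0, 0.0],
     [0.0, 3.0, 1.0, 0.0],
     [0.0, 0.0, 4.0, 1.0],
     [1.0, 0.0, 0.0, 5.0]]
V, H = arnoldi(A, [1.0, 0.0, 0.0, 0.0], 3)
# The basis is orthonormal: (v_i, v_j) = delta_ij
print([round(dot(V[0], V[1]), 12), round(dot(V[1], V[1]), 12)])  # prints [0.0, 1.0]
```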

Generalized Minimum Residual Method (GMRES)

Based on taking K = Km and L = AKm.

Such a technique minimizes the residual norm over all vectors in x0 + Km.

Let Km = span{v1, Av1, . . . , A^(m−1)v1} and let v1 = r0/‖r0‖2.

Basic idea: Any vector x ∈ x0 +Km can be written as

x = x0 + Vmy,

where Vm is the n × m matrix whose columns are the orthonormal basis vectors v1, v2, . . . , vm from Arnoldi's procedure and y ∈ R^m.

GMRES (cont.)

Define

J(y) = ‖b− Ax‖2 = ‖b− A(x0 + Vmy)‖2.

Then, since by Arnoldi's procedure AVm = Vm+1Hm, where Hm = (hij),

b− Ax = b− A(x0 + Vmy)

= r0 − AVmy

= ‖r0‖2v1 − Vm+1Hmy

= Vm+1(‖r0‖2e1 − Hmy).

So, J(y) = ‖Vm+1(‖r0‖2e1 − Hmy)‖2 = ‖ ‖r0‖2e1 − Hmy ‖2, since the columns of Vm+1 are orthonormal.

GMRES (cont.)

GMRES computes the unique vector in x0 + Km that minimizes the residual norm, i.e., xm = x0 + Vmym, where ym minimizes J(y).

The minimizer ym is inexpensive to compute because it only requires the solution of an (m + 1) × m least-squares problem, where m is typically small.

If A is nonsingular, the GMRES algorithm breaks down at step j (i.e., h_{j+1,j} = 0) iff the approximate solution xj is exact.
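A minimal sketch of m-step GMRES under these definitions (pure Python; the test matrix is an illustrative choice, and the small least-squares problem is solved via the normal equations for brevity, where practical implementations use Givens rotations):

```python
import math

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def matvec(M, v): return [dot(row, v) for row in M]

def solve_dense(M, rhs):
    # Gaussian elimination with partial pivoting for the small m x m system.
    n = len(M)
    M = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y

def gmres(A, b, x0, m):
    # Arnoldi's procedure, then minimize ||beta*e1 - H y||_2 for the small
    # (m+1) x m Hessenberg least-squares problem.
    n = len(b)
    r0 = [bi - ai for bi, ai in zip(b, matvec(A, x0))]
    beta = math.sqrt(dot(r0, r0))
    V = [[ri / beta for ri in r0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        w = matvec(A, V[j])
        for i in range(j + 1):
            H[i][j] = dot(w, V[i])
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))
        if H[j + 1][j] < 1e-14:          # lucky breakdown: solution is exact
            m = j + 1
            break
        V.append([wk / H[j + 1][j] for wk in w])
    Hm = [row[:m] for row in H[:m + 1]]
    HtH = [[sum(Hm[k][i] * Hm[k][j] for k in range(m + 1)) for j in range(m)]
           for i in range(m)]
    Htb = [Hm[0][i] * beta for i in range(m)]
    y = solve_dense(HtH, Htb)
    return [x0[k] + sum(V[j][k] * y[j] for j in range(m)) for k in range(n)]

# Illustrative small system with known solution [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [6.0, 12.0, 14.0]
x = gmres(A, b, [0.0, 0.0, 0.0], 3)
r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
print(math.sqrt(dot(r, r)))   # residual norm, near machine precision after n steps
```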

Conjugate Gradient (CG)

One of the best known iterative techniques for solving sparse SPD linear systems.

An orthogonal projection technique onto the Krylov subspace Km(A, r0), where r0 is the initial residual.

Based on the Lanczos method for symmetric linear systems (which is similar to Arnoldi's procedure).

The direct version of Lanczos introduces auxiliary vectors pj, j = 0, 1, . . . .

Useful properties of this algorithm:

The residual vectors rj are orthogonal to each other; and
the auxiliary vectors pj form an A-conjugate set, i.e., (Api, pj) = 0 if i ≠ j.

Algorithm for CG

The conjugate gradient algorithm takes advantage of the above properties. Standard formulation of the algorithm:

1. Compute r0 = b − Ax0, p0 = r0
2. For j = 0, 1, . . . until convergence, Do
3. αj = (rj, rj)/(Apj, pj)
4. xj+1 = xj + αjpj
5. rj+1 = rj − αjApj
6. βj = (rj+1, rj+1)/(rj, rj)
7. pj+1 = rj+1 + βjpj
8. End Do
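The algorithm translates almost line-for-line into code; below is a pure-Python sketch applied to an illustrative tridiagonal SPD system (not one from the slides):

```python
import math

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def matvec(M, v): return [dot(row, v) for row in M]

def cg(A, b, x0, tol=1e-10, maxit=200):
    # Standard CG iteration, following the steps on this slide.
    x = x0[:]
    r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
    p = r[:]
    rr = dot(r, r)
    for _ in range(maxit):
        if math.sqrt(rr) < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(Ap, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Illustrative SPD system: A = tridiag(-1, 2, -1), right-hand side of ones.
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = [1.0] * n
x = cg(A, b, [0.0] * n)
print([round(v, 6) for v in x])   # CG converges in at most n steps in exact arithmetic
```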

Comparison of Results: Poisson Problem

Consider the problem −(uxx + uyy) = 0 on (0, 1) × (0, 1), with u = 0 on the boundary.

Table: Iterations until Convergence to Tolerance of 10⁻⁶

Grid size   GMRES   CG
5×10          15    16
10×10         15    16
15×15         30    31

Some Convergence Results

We can show that if A is a positive definite matrix, then GMRES(m) converges for any m ≥ 1.

This holds because the subspace Km contains the initial residual vector at each restart, and the residual norm is minimized over Km in each outer iteration.

We can also show that for CG, if we define κ = λmax/λmin, then the error satisfies

‖x∗ − xm‖A ≤ 2 [ (√κ − 1)/(√κ + 1) ]^m ‖x∗ − x0‖A.
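The bound can be used to estimate iteration counts: below, a quick illustrative computation of the smallest m for which 2[(√κ − 1)/(√κ + 1)]^m ≤ 10⁻⁶, for a few condition numbers κ:

```python
import math

# CG error-bound factor 2*((sqrt(k)-1)/(sqrt(k)+1))^m as a function of kappa and m.
def cg_bound(kappa, m):
    s = math.sqrt(kappa)
    return 2.0 * ((s - 1.0) / (s + 1.0)) ** m

for kappa in (10.0, 100.0, 1000.0):
    # iterations needed to guarantee a 1e-6 reduction in the A-norm of the error
    m = 1
    while cg_bound(kappa, m) > 1e-6:
        m += 1
    print(kappa, m)
```

The required m grows roughly like √κ, which is why preconditioning (reducing κ) pays off.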

Biorthogonal Krylov Subspace Methods

Based on a biorthogonalization algorithm due to Lanczos.

The algorithm by Lanczos for nonsymmetric matrices builds a pair of biorthogonal bases for the subspaces

Km(A, v1) = span{v1, Av1, . . . , A^(m−1)v1}

and

Km(A^T, w1) = span{w1, A^T w1, . . . , (A^T)^(m−1) w1}.

Lanczos Biorthogonalization Procedure

1. Choose two vectors v1 and w1 such that (v1, w1) = 1
2. Set β1 = δ1 = 0, w0 = v0 = 0
3. For j = 1, 2, . . . , m, Do
4. αj = (Avj, wj)
5. v̂j+1 = Avj − αjvj − βjvj−1
6. ŵj+1 = A^T wj − αjwj − δjwj−1
7. δj+1 = |(v̂j+1, ŵj+1)|^(1/2); If δj+1 = 0, Stop
8. βj+1 = (v̂j+1, ŵj+1)/δj+1
9. wj+1 = ŵj+1/βj+1
10. vj+1 = v̂j+1/δj+1
11. End Do
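A pure-Python sketch of the procedure on a small nonsymmetric matrix of my choosing, verifying the biorthogonality (vi, wj) = δij it is designed to produce:

```python
import math

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def matvec(M, x): return [dot(row, x) for row in M]
def transpose(M): return [list(c) for c in zip(*M)]

def lanczos_biortho(A, v1, w1, m):
    # Two-sided Lanczos: builds {v_j} and {w_j} with (v_i, w_j) = delta_ij.
    At = transpose(A)
    n = len(v1)
    V, W = [v1[:]], [w1[:]]
    beta = delta = 0.0
    v_prev = w_prev = [0.0] * n
    for j in range(m - 1):
        Av = matvec(A, V[j])
        Atw = matvec(At, W[j])
        alpha = dot(Av, W[j])
        vhat = [a - alpha * b - beta * c for a, b, c in zip(Av, V[j], v_prev)]
        what = [a - alpha * b - delta * c for a, b, c in zip(Atw, W[j], w_prev)]
        inner = dot(vhat, what)
        delta = math.sqrt(abs(inner))
        if delta == 0.0:              # serious or lucky breakdown
            break
        beta = inner / delta
        v_prev, w_prev = V[j], W[j]
        V.append([a / delta for a in vhat])
        W.append([a / beta for a in what])
    return V, W

# Illustrative nonsymmetric matrix (chosen here, not from the slides).
A = [[2.0, 1.0, 1.0],
     [2.0, 3.0, 1.0],
     [1.0, 1.0, 4.0]]
V, W = lanczos_biortho(A, [1.0, 0.0, 0.0], [1.0, 0.0, 0.0], 3)
print(round(dot(V[1], W[1]), 10), round(dot(V[0], W[1]), 10))  # prints 1.0 0.0
```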

Page 56: Introduction to Iterative Methods for Solving Sparse ...zimmer.fresnostate.edu/.../itmethods_spmat.pdf · 4 Basic Iterative Methods Splittings and Overrelaxation Convergence Results

OutlineMotivation

Some Linear Algebra BackgroundDiscretization of Partial Differential Equations (PDEs)

Basic Iterative MethodsKrylov Subspace Methods

Preconditioning

Orthogonal Krylov Subspace MethodsBiorthogonal Krylov Subspace Methods

The Biconjugate Gradient Algorithm (BCG)

The Lanczos algorithm (if it does not break down) gives two sets of vectors {vi} and {wi} that are biorthogonal and that form bases of Km(A, v1) and Km(A^T, w1), respectively.

BCG is a projection process onto Km = span{v1, Av1, . . . , A^(m−1)v1} orthogonal to Lm = span{w1, A^T w1, . . . , (A^T)^(m−1) w1}, where v1 = r0/‖r0‖2.

If there is a dual system A^T x∗ = b∗ to solve, then w1 is obtained by scaling the initial residual b∗ − A^T x0∗; if not, then w1 is obtained by choosing an "initial residual" r∗ not orthogonal to r0 and scaling it.

Algorithm for BCG

Note that the algorithm is similar to CG. Standard formulation of the algorithm:

1. Compute r0 = b − Ax0; choose r0∗ so that (r0, r0∗) ≠ 0
2. Set p0 = r0, p0∗ = r0∗
3. For j = 0, 1, . . . until convergence, Do
4. αj = (rj, rj∗)/(Apj, pj∗)
5. xj+1 = xj + αjpj
6. rj+1 = rj − αjApj
7. rj+1∗ = rj∗ − αjA^T pj∗
8. βj = (rj+1, rj+1∗)/(rj, rj∗)
9. pj+1 = rj+1 + βjpj
10. pj+1∗ = rj+1∗ + βjpj∗
11. End Do
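A sketch of BCG in pure Python; here r0∗ is taken equal to r0 (a common default, not dictated by the slides), and the nonsymmetric test system is an illustrative choice:

```python
import math

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def matvec(M, x): return [dot(row, x) for row in M]
def transpose(M): return [list(c) for c in zip(*M)]

def bcg(A, b, x0, tol=1e-10, maxit=200):
    # Biconjugate gradient; requires products with both A and A^T.
    At = transpose(A)
    x = x0[:]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    rs = r[:]                 # shadow residual, r0* = r0
    p, ps = r[:], rs[:]
    for _ in range(maxit):
        if math.sqrt(dot(r, r)) < tol:
            break
        Ap = matvec(A, p)
        Atps = matvec(At, ps)
        rrs = dot(r, rs)
        alpha = rrs / dot(Ap, ps)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, Ap)]
        rs = [ri - alpha * qi for ri, qi in zip(rs, Atps)]
        beta = dot(r, rs) / rrs
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        ps = [ri + beta * pi for ri, pi in zip(rs, ps)]
    return x

# Illustrative nonsymmetric system with known solution [1, -1, 2].
A = [[4.0, 1.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
x_true = [1.0, -1.0, 2.0]
b = matvec(A, x_true)
x = bcg(A, b, [0.0, 0.0, 0.0])
print([round(v, 8) for v in x])   # close to x_true
```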

Orthogonality Properties of the Vectors Produced by BCG

Theorem. The vectors produced by the BCG algorithm satisfy the following orthogonality properties:

(ri, rj∗) = 0 for i ≠ j, (4)

(Api, pj∗) = 0 for i ≠ j. (5)

Transpose-free Variant of BCG: Conjugate Gradient Squared (CGS)

Goal: avoid using A^T in BCG and get faster convergence. Algorithm:

1. Compute r0 = b − Ax0; choose r0∗ arbitrary
2. Set p0 = u0 = r0
3. For j = 0, 1, . . . until convergence, Do
4. αj = (rj, r0∗)/(Apj, r0∗)
5. qj = uj − αjApj
6. xj+1 = xj + αj(uj + qj)
7. rj+1 = rj − αjA(uj + qj)
8. βj = (rj+1, r0∗)/(rj, r0∗)
9. uj+1 = rj+1 + βjqj
10. pj+1 = uj+1 + βj(qj + βjpj)
11. End Do
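A corresponding pure-Python sketch of CGS (again with the common default r0∗ = r0 and an illustrative test system):

```python
import math

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def matvec(M, x): return [dot(row, x) for row in M]

def cgs(A, b, x0, tol=1e-10, maxit=200):
    # Conjugate Gradient Squared: transpose-free, two products with A per step.
    x = x0[:]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    rstar = r[:]
    p, u = r[:], r[:]
    for _ in range(maxit):
        if math.sqrt(dot(r, r)) < tol:
            break
        Ap = matvec(A, p)
        rho = dot(r, rstar)
        alpha = rho / dot(Ap, rstar)
        q = [ui - alpha * api for ui, api in zip(u, Ap)]
        uq = [ui + qi for ui, qi in zip(u, q)]
        Auq = matvec(A, uq)
        x = [xi + alpha * w for xi, w in zip(x, uq)]
        r = [ri - alpha * w for ri, w in zip(r, Auq)]
        beta = dot(r, rstar) / rho
        u = [ri + beta * qi for ri, qi in zip(r, q)]
        p = [ui + beta * (qi + beta * pi) for ui, qi, pi in zip(u, q, p)]
    return x

# Illustrative nonsymmetric system with known solution [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
x_true = [1.0, 2.0, 3.0]
b = matvec(A, x_true)
x = cgs(A, b, [0.0, 0.0, 0.0])
print([round(v, 8) for v in x])   # close to x_true
```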

CGS (cont.)

Roughly the same computational cost as BCG.

Two matrix-by-vector products with the matrix A are performed at each step of the algorithm.

Expectation: CGS should converge twice as fast as BCG (when it converges).

Problem: Rounding errors tend to be more problematic due to the squaring of the residual polynomials.

Biconjugate Gradient Stabilized (BICGSTAB)

Remedies the issue of build-up of round-off error in CGS. Algorithm:

1. Compute r0 = b − Ax0; choose r0∗ arbitrary
2. Set p0 = r0
3. For j = 0, 1, . . . until convergence, Do
4. αj = (rj, r0∗)/(Apj, r0∗)
5. sj = rj − αjApj
6. ωj = (Asj, sj)/(Asj, Asj)
7. xj+1 = xj + αjpj + ωjsj
8. rj+1 = sj − ωjAsj
9. βj = (rj+1, r0∗)/(rj, r0∗) · (αj/ωj)
10. pj+1 = rj+1 + βj(pj − ωjApj)
11. End Do
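A pure-Python sketch of BICGSTAB (same conventions as above: r0∗ = r0 as a common default, illustrative test system):

```python
import math

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def matvec(M, x): return [dot(row, x) for row in M]

def bicgstab(A, b, x0, tol=1e-10, maxit=200):
    # Transpose-free; the omega step smooths the residual at each iteration.
    x = x0[:]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    rstar = r[:]
    p = r[:]
    for _ in range(maxit):
        if math.sqrt(dot(r, r)) < tol:
            break
        Ap = matvec(A, p)
        rho = dot(r, rstar)
        alpha = rho / dot(Ap, rstar)
        s = [ri - alpha * api for ri, api in zip(r, Ap)]
        As = matvec(A, s)
        omega = dot(As, s) / dot(As, As)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * asi for si, asi in zip(s, As)]
        beta = (dot(r, rstar) / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * api) for ri, pi, api in zip(r, p, Ap)]
    return x

# Illustrative nonsymmetric system with known solution [2, 0, 1].
A = [[4.0, 1.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
x_true = [2.0, 0.0, 1.0]
b = matvec(A, x_true)
x = bicgstab(A, b, [0.0, 0.0, 0.0])
print([round(v, 8) for v in x])   # close to x_true
```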

Comparison of Results: Poisson Problem

Consider the problem −(uxx + uyy) = 0 on (0, 1) × (0, 1), with u = 0 on the boundary.

Table: Iterations until Convergence to Tolerance of 10⁻⁶

Grid size   BCG   CGS   BICGSTAB
10×10        16    15         14
15×15        31    26         23
20×20        41    33         30

Comparison of Results: Convection-Diffusion Problem

Consider the problem −(uxx + uyy) + 10ux + 10uy = 0 on (0, 1) × (0, 1), with u = 0 on the boundary.

Table: Iterations until Convergence to Tolerance of 10⁻⁶

Grid size   BCG   CGS   BICGSTAB
10×10        31    20         20
15×15        53    35         31
20×20        74    46         41

Improving Convergence: Preconditioning

The methods discussed typically converge slowly for problems arising from applications.

Preconditioning is key to improving convergence.

Preconditioning is a way of changing the original system into one with the same solution that is easier to solve with an iterative method.

The reliability of such techniques is strongly dependent on the quality of the preconditioner.

Idea Behind Preconditioning

First, find a preconditioning matrix M. Requirements for M:

It must be computationally inexpensive to solve Mx = b.
M should be close to A in some sense.
M should be nonsingular.

Ways of applying preconditioners:

From the left, giving the preconditioned system M^(−1)Ax = M^(−1)b.
To the right: AM^(−1)u = b, x = M^(−1)u.
When M is available in factored form, M = ML MR, where typically ML and MR are triangular matrices, the preconditioning can be split: ML^(−1) A MR^(−1) u = ML^(−1) b, x = MR^(−1) u.
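A tiny illustration of the idea: for the 2×2 SPD matrix below (an illustrative choice), the plain Richardson iteration x ← x + (b − Ax) diverges, while the left-preconditioned version x ← x + M^(−1)(b − Ax) with M = diag(A) (the Jacobi preconditioner) converges to the same solution:

```python
import math

# Same solution, easier iteration: left preconditioning by M = diag(A).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [9.0, 5.0]            # exact solution [2, 1]

def residual(x):
    return [b[i] - sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

x_plain = [0.0, 0.0]
x_prec = [0.0, 0.0]
for _ in range(25):
    r = residual(x_plain)
    x_plain = [x_plain[i] + r[i] for i in range(2)]             # Richardson
    r = residual(x_prec)
    x_prec = [x_prec[i] + r[i] / A[i][i] for i in range(2)]     # + Jacobi precond.

print(norm(residual(x_plain)), norm(residual(x_prec)))
# plain Richardson blows up; the preconditioned iteration is accurate
```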

Two Example Preconditioners

Define M as the matrix M from a splitting A = M − N, e.g., M = MSSOR.

Define M by computing an incomplete LU-factorization of A.

Preconditioned Conjugate Gradient

M must be SPD. Algorithm:

1. Compute r0 = b − Ax0, z0 = M^(−1)r0, and p0 = z0
2. For j = 0, 1, . . . until convergence, Do
3. αj = (rj, zj)/(Apj, pj)
4. xj+1 = xj + αjpj
5. rj+1 = rj − αjApj
6. zj+1 = M^(−1)rj+1
7. βj = (rj+1, zj+1)/(rj, zj)
8. pj+1 = zj+1 + βjpj
9. End Do

We will use SSOR for the preconditioner in the examples.
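A sketch of the algorithm in pure Python; for brevity the preconditioner here is M = diag(A) (Jacobi) rather than the SSOR preconditioner used in the slides' examples, and the SPD test matrix is an illustrative choice:

```python
import math

def dot(u, v): return sum(a * b for a, b in zip(u, v))
def matvec(M, x): return [dot(row, x) for row in M]

def pcg(A, b, x0, apply_Minv, tol=1e-10, maxit=200):
    # Preconditioned CG; apply_Minv(r) returns M^(-1) r.
    x = x0[:]
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    z = apply_Minv(r)
    p = z[:]
    rz = dot(r, z)
    for _ in range(maxit):
        if math.sqrt(dot(r, r)) < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(Ap, p)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = apply_Minv(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

# SPD test matrix with widely varying diagonal, where diagonal scaling helps.
A = [[10.0, 1.0, 0.0],
     [1.0, 100.0, 1.0],
     [0.0, 1.0, 1000.0]]
b = [1.0, 1.0, 1.0]
x = pcg(A, b, [0.0, 0.0, 0.0], lambda r: [r[i] / A[i][i] for i in range(3)])
res = [bi - yi for bi, yi in zip(b, matvec(A, x))]
print(math.sqrt(dot(res, res)))   # small residual norm
```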

Comparison of Results: Poisson Problem

Consider the problem −(uxx + uyy) = 0 on (0, 1) × (0, 1), with u = 0 on the boundary.

Table: Iterations until Convergence to Tolerance of 10⁻⁶

Grid size   CG   SSOR   PCG
10×10       16     45     3
15×15       31     66     4
20×20       41     87     5

Note: ω = 1.5 for SSOR in PCG.

Comparison of Results: Diffusion Problem with a Discontinuous Coefficient

Consider the problem −((aux)x + (auy)y) = 0 on (0, 1) × (0, 1), with u = 0 on the boundary, where a(x, y) = 100 for 0.25 < x, y < 0.75 and a(x, y) = 1 otherwise.

Table: Iterations until Convergence to Tolerance of 10⁻⁶

Grid size    CG    SSOR   PCG
10×10        38   1,000     7
15×15       144   1,000    16
20×20       280   1,000    32

Note: ω = 1.5 for SSOR in PCG.

