
Page 1: Iterative Methods in Linear Algebra (part 1)

Iterative Methods in Linear Algebra (part 1)

Stan Tomov

Innovative Computing Laboratory
Computer Science Department

The University of Tennessee

Wednesday March 31, 2010

CS 594, 03-31-2010

Page 2: Iterative Methods in Linear Algebra (part 1)


Page 3: Iterative Methods in Linear Algebra (part 1)

Outline

Part I: Discussion; motivation for iterative methods

Part II: Stationary Iterative Methods

Part III: Nonstationary Iterative Methods


Page 4: Iterative Methods in Linear Algebra (part 1)

Part I

Discussion


Page 5: Iterative Methods in Linear Algebra (part 1)

Discussion

Part 1:


Page 6: Iterative Methods in Linear Algebra (part 1)

Discussion

Part 2:


Page 7: Iterative Methods in Linear Algebra (part 1)

Discussion

Original matrix:
>> load matrix.output
>> S = spconvert(matrix);
>> spy(S)

Reordered matrix:
>> load A.data
>> S = spconvert(A);
>> spy(S)


Page 8: Iterative Methods in Linear Algebra (part 1)

Discussion

Original sub-matrix:
>> load matrix.output
>> S = spconvert(matrix);
>> spy(S(1:8660,1:8660))

Reordered sub-matrix:
>> load A.data
>> S = spconvert(A);
>> spy(S(8602:8700,8602:8700))


Page 9: Iterative Methods in Linear Algebra (part 1)

Discussion

Results on torc0 (Pentium III, 933 MHz, 16 KB L1 and 256 KB L2 cache):

  Original matrix:  42.4568 Mflop/s
  Reordered matrix: 45.3954 Mflop/s
  BCRS (block compressed row storage): 72.17374 Mflop/s

Results on woodstock (Intel Xeon 5160 Woodcrest, 3.0 GHz, 32 KB L1, 4 MB L2):

  Original matrix:  386.8 Mflop/s
  Reordered matrix: 387.6 Mflop/s
  BCRS:             894.2 Mflop/s


Page 10: Iterative Methods in Linear Algebra (part 1)

Homework 9


Page 11: Iterative Methods in Linear Algebra (part 1)

Questions?

Homework #9

PART I

PETSc: many iterative solvers. We will see how projection is used in iterative methods.

PART II

Hybrid CPU-GPU computations


Page 12: Iterative Methods in Linear Algebra (part 1)

Hybrid CPU-GPU computations


Page 13: Iterative Methods in Linear Algebra (part 1)

Iterative Methods


Page 14: Iterative Methods in Linear Algebra (part 1)

Motivation

So far we have discussed and seen:

Many engineering and physics simulations lead to sparse matrices, e.g. PDE-based simulations and their discretizations based on
  FEM/FVM
  finite differences
  Petrov-Galerkin type conditions, etc.

How to optimize performance on sparse matrix computations

Some software for solving sparse matrix problems (e.g. PETSc)

The question is: can we solve sparse matrix problems faster than with direct sparse methods?

In certain cases, yes: using iterative methods


Page 15: Iterative Methods in Linear Algebra (part 1)

Sparse direct methods


These are based on Gaussian elimination (LU)

Performed in sparse format

Are there limitations of the approach?

Yes, they have fill-ins, which lead to
  more memory requirements
  more flops being performed

Fill-ins can become prohibitively high


Page 16: Iterative Methods in Linear Algebra (part 1)

Sparse direct methods

Consider LU for the matrix below

A nonzero is represented by a *

The 1st step of the LU factorization will introduce fill-ins, marked by an F


Page 17: Iterative Methods in Linear Algebra (part 1)

Sparse direct methods

Fill-ins can be reduced by reordering

Remember: we talked about this in a slightly different context (for speed and parallelization)

Consider the following reordering

These were extreme cases, but still, the problem exists


Page 18: Iterative Methods in Linear Algebra (part 1)

Sparse methods

Sparse direct vs dense methods

Dense methods take O(n^2) storage and O(n^3) flops, but run close to peak performance

Sparse direct methods can take O(#nonz) storage and O(#nonz) flops, but both can grow due to fill-ins, and performance is poor

With #nonz << n^2 and a 'proper' ordering it pays off to do sparse direct

Software (table from Julien Langou): http://www.netlib.org/utk/people/JackDongarra/la-sw.html

Package   Support   Type           Language    Mode
                    Real Complex   f77 c c++   Seq Dist   SPD Gen

DSCPACK   yes       X X X M X
HSL       yes       X X X X X X
MFACT     yes       X X X X
MUMPS     yes       X X X X X M X X
PSPASES   yes       X X X M X
SPARSE    ?         X X X X X X
SPOOLES   ?         X X X X M X
SuperLU   yes       X X X X X M X
TAUCS     yes       X X X X X X
UMFPACK   yes       X X X X X
Y12M      ?         X X X X X


Page 19: Iterative Methods in Linear Algebra (part 1)

Sparse methods

What about Iterative Methods? Think for example

xi+1 = xi + P(b − Axi )

Pluses:

Storage is only O(#nonz) (for the matrix and a few working vectors)

Can take only a few iterations to converge (e.g. << n)

For P = A−1 it takes one iteration (check!)

In general, easy to parallelize


Page 20: Iterative Methods in Linear Algebra (part 1)

Sparse methods

What about Iterative Methods? Think for example

xi+1 = xi + P(b − Axi )

Minuses:

Performance can be bad (as we saw in Lecture 11 and today's discussion)

Convergence can be slow or even stagnate, but it can be improved with preconditioning

Think of P as a preconditioner, an operator/matrix P ≈ A−1

Optimal preconditioners (e.g. multigrid can be optimal) lead to convergence in O(1) iterations
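As an illustration, a minimal MATLAB/Octave sketch of the iteration xi+1 = xi + P(b − Axi); the tridiagonal test matrix, the Jacobi-style choice of P, and the tolerance are assumptions for the example:

% Sketch of x_{i+1} = x_i + P*(b - A*x_i) on an assumed 1-D Laplacian test problem
n = 100; e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % SPD model matrix (assumption)
b = ones(n, 1);
d = full(diag(A));
P = spdiags(1 ./ d, 0, n, n);           % P ~ A^{-1}; here the Jacobi choice D^{-1}
x = zeros(n, 1);
for i = 1:500
    r = b - A * x;                      % residual
    if norm(r) / norm(b) < 1e-8, break, end
    x = x + P * r;                      % preconditioned Richardson update
end
% With P = inv(full(A)) the loop would exit after one update (cf. the previous slide).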


Page 21: Iterative Methods in Linear Algebra (part 1)

Part II

Stationary Iterative Methods


Page 22: Iterative Methods in Linear Algebra (part 1)

Stationary Iterative Methods

Can be expressed in the form

xi+1 = Bxi + c

where B and c do not depend on i

Older, simpler, and easy to implement, but usually not as effective as nonstationary methods

Examples: Richardson, Jacobi, Gauss-Seidel, SOR, etc. (next)


Page 23: Iterative Methods in Linear Algebra (part 1)

Richardson Method

Richardson iteration

xi+1 = xi + (b − Axi ) = (I − A)xi + b (1)

i.e. B from the definition above is B = I − A

Denote ei = x − xi and rewrite (1)

x − xi+1 = x − xi − (Ax − Axi)

ei+1 = ei − Aei = (I − A)ei

||ei+1|| = ||(I − A)ei|| ≤ ||I − A|| ||ei|| ≤ ||I − A||^2 ||ei−1|| ≤ · · · ≤ ||I − A||^{i+1} ||e0||

i.e. for convergence (ei → 0) we need

||I − A|| < 1

for some norm || · ||, i.e. when ρ(I − A) < 1.


Page 24: Iterative Methods in Linear Algebra (part 1)

Jacobi Method


xi+1 = xi + D−1(b − Axi ) = (I − D−1A)xi + D−1b (2)

where D is the diagonal of A (assumed nonzero); B = I − D−1A in the notation above

Denote ei = x − xi and rewrite (2)

x − xi+1 = x − xi − D−1(Ax − Axi)

ei+1 = ei − D−1Aei = (I − D−1A)ei

||ei+1|| = ||(I − D−1A)ei|| ≤ ||I − D−1A|| ||ei|| ≤ ||I − D−1A||^2 ||ei−1|| ≤ · · · ≤ ||I − D−1A||^{i+1} ||e0||

i.e. for convergence (ei → 0) we need

||I − D−1A|| < 1

for some norm || · ||, i.e. when ρ(I − D−1A) < 1
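A minimal MATLAB sketch of iteration (2); the test matrix (with nonzero diagonal), tolerance, and iteration limit below are illustrative assumptions:

% Jacobi iteration x_{i+1} = x_i + D^{-1}(b - A*x_i)
n = 100; e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % assumed SPD model matrix
b = ones(n, 1);
d = full(diag(A));                      % the diagonal D, stored as a vector
x = zeros(n, 1);
for i = 1:1000
    r = b - A * x;
    if norm(r) / norm(b) < 1e-8, break, end
    x = x + r ./ d;                     % equivalent to x + D^{-1}*r
end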


Page 25: Iterative Methods in Linear Algebra (part 1)

Gauss-Seidel Method


xi+1 = (D − L)−1(Uxi + b) (3)

where −L is the lower and −U the upper triangular part of A, andD is the diagonal.

Equivalently:

x1^(i+1) = (b1 − a12 x2^(i) − a13 x3^(i) − a14 x4^(i) − . . . − a1n xn^(i)) / a11
x2^(i+1) = (b2 − a21 x1^(i+1) − a23 x3^(i) − a24 x4^(i) − . . . − a2n xn^(i)) / a22
x3^(i+1) = (b3 − a31 x1^(i+1) − a32 x2^(i+1) − a34 x4^(i) − . . . − a3n xn^(i)) / a33
...
xn^(i+1) = (bn − an1 x1^(i+1) − an2 x2^(i+1) − an3 x3^(i+1) − . . . − a(n,n−1) x(n−1)^(i+1)) / ann
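The componentwise formulas translate directly into a MATLAB sketch (the test data and sweep count are assumptions for the example):

% One Gauss-Seidel sweep per outer iteration, following the formulas above
n = 100; e = ones(n, 1);
A = full(spdiags([-e 2*e -e], -1:1, n, n));  % assumed test matrix
b = ones(n, 1);
x = zeros(n, 1);
for it = 1:100
    for k = 1:n
        % x(1:k-1) already hold the new iterate; x(k+1:n) still hold the old one
        x(k) = (b(k) - A(k, 1:k-1) * x(1:k-1) - A(k, k+1:n) * x(k+1:n)) / A(k, k);
    end
end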


Page 26: Iterative Methods in Linear Algebra (part 1)

A comparison (from Julien Langou's slides)

Gauss-Seidel method

A = [ 10   1   2   0   1
       0  12   1   3   1
       1   2   9   1   0
       0   3   1  10   0
       1   2   0   0  15 ]        b = [ 23  44  36  49  80 ]'

[Plot: ‖b − Ax(i)‖2/‖b‖2 as a function of i, the number of iterations]

x(0)     x(1)     x(2)     x(3)     x(4)     . . .
0.0000   2.3000   0.8783   0.9790   0.9978   1.0000
0.0000   3.6667   2.1548   2.0107   2.0005   2.0000
0.0000   2.9296   3.0339   3.0055   3.0006   3.0000
0.0000   3.5070   3.9502   3.9962   3.9998   4.0000
0.0000   4.6911   4.9875   5.0000   5.0001   5.0000
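The iterates in this table can be reproduced with a short MATLAB script (printing the first five iterates is an illustrative choice):

A = [10 1 2 0 1; 0 12 1 3 1; 1 2 9 1 0; 0 3 1 10 0; 1 2 0 0 15];
b = [23; 44; 36; 49; 80];
x = zeros(5, 1);                        % x(0)
for it = 1:5                            % produces x(1) .. x(5)
    for k = 1:5
        x(k) = (b(k) - A(k, 1:k-1) * x(1:k-1) - A(k, k+1:5) * x(k+1:5)) / A(k, k);
    end
    % each printed line is one column of the table, transposed
    fprintf('%8.4f %8.4f %8.4f %8.4f %8.4f\n', x);
end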


Page 27: Iterative Methods in Linear Algebra (part 1)

A comparison (from Julien Langou's slides)

Jacobi method:

x1^(i+1) = (b1 − a12 x2^(i) − a13 x3^(i) − a14 x4^(i) − . . . − a1n xn^(i)) / a11
x2^(i+1) = (b2 − a21 x1^(i) − a23 x3^(i) − a24 x4^(i) − . . . − a2n xn^(i)) / a22
x3^(i+1) = (b3 − a31 x1^(i) − a32 x2^(i) − a34 x4^(i) − . . . − a3n xn^(i)) / a33
...
xn^(i+1) = (bn − an1 x1^(i) − an2 x2^(i) − an3 x3^(i) − . . . − a(n,n−1) x(n−1)^(i)) / ann

Gauss-Seidel method:

x1^(i+1) = (b1 − a12 x2^(i) − a13 x3^(i) − a14 x4^(i) − . . . − a1n xn^(i)) / a11
x2^(i+1) = (b2 − a21 x1^(i+1) − a23 x3^(i) − a24 x4^(i) − . . . − a2n xn^(i)) / a22
x3^(i+1) = (b3 − a31 x1^(i+1) − a32 x2^(i+1) − a34 x4^(i) − . . . − a3n xn^(i)) / a33
...
xn^(i+1) = (bn − an1 x1^(i+1) − an2 x2^(i+1) − an3 x3^(i+1) − . . . − a(n,n−1) x(n−1)^(i+1)) / ann

Gauss-Seidel is the method with better numerical properties (fewer iterations to convergence). Which method is more efficient to implement on a sequential or a parallel computer? Why?

In Gauss-Seidel, computing the (k+1)-st component of x^(i+1) requires the already-updated k-th component of x^(i+1). Parallelization is impossible.
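In matrix form (with A = D − L − U as on the previous slides), the two iterations can be compared side by side; a sketch on the 5x5 example above (the iteration count is illustrative):

A = [10 1 2 0 1; 0 12 1 3 1; 1 2 9 1 0; 0 3 1 10 0; 1 2 0 0 15];
b = [23; 44; 36; 49; 80];
D = diag(diag(A)); L = -tril(A, -1); U = -triu(A, 1);
Bj = D \ (L + U);  cj = D \ b;          % Jacobi:       x <- Bj*x + cj
Bg = (D - L) \ U;  cg = (D - L) \ b;    % Gauss-Seidel: x <- Bg*x + cg
xj = zeros(5, 1); xg = zeros(5, 1);
for i = 1:10
    xj = Bj * xj + cj;
    xg = Bg * xg + cg;
    fprintf('%2d   Jacobi %.2e   Gauss-Seidel %.2e\n', i, ...
            norm(b - A * xj) / norm(b), norm(b - A * xg) / norm(b));
end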


Page 28: Iterative Methods in Linear Algebra (part 1)

Convergence

Convergence can be very slow. Consider a modified Richardson method:

xi+1 = xi + τ(b − Axi ) (4)

Convergence is linear; similarly to Richardson we get

||ei+1|| ≤ ||I − τA||^{i+1} ||e0||

but it can be very slow (if ||I − τA|| is close to 1). For example, let

A be symmetric and positive definite (SPD), with extreme eigenvalues λ1 and λN

the matrix norm in ||I − τA|| be the one induced by || · ||2

Then the best choice for τ, namely τ = 2/(λ1 + λN), gives

||I − τA|| = (k(A) − 1) / (k(A) + 1)

where k(A) = λN/λ1 is the condition number of A.
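A quick numerical check of this formula (MATLAB sketch; the tridiagonal test matrix is an assumption):

n = 50; e = ones(n, 1);
A = full(spdiags([-e 2*e -e], -1:1, n, n)); % SPD test matrix (assumption)
lam = eig(A);
l1 = min(lam); lN = max(lam);
tau = 2 / (l1 + lN);                    % best tau for SPD A in the 2-norm
kappa = lN / l1;                        % condition number k(A)
(kappa - 1) / (kappa + 1)               % predicted contraction factor
norm(eye(n) - tau * A, 2)               % measured ||I - tau*A||_2: same value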


Page 29: Iterative Methods in Linear Algebra (part 1)

Convergence

Note:

The rate of convergence depends on the condition number k(A)

Even for the best τ, the rate of convergence

(k(A) − 1) / (k(A) + 1)

is slow (i.e. close to 1) for large k(A)

We will see that the convergence of nonstationary methods also depends on k(A) but is better; e.g. compare with CG's rate

(√k(A) − 1) / (√k(A) + 1)
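The difference is easy to see numerically (a sketch; the sample condition numbers are arbitrary):

kappa = [10 100 1000 10000];
(kappa - 1) ./ (kappa + 1)              % stationary rate: 0.82, 0.98, 0.998, ...
(sqrt(kappa) - 1) ./ (sqrt(kappa) + 1)  % CG-type rate:    0.52, 0.82, 0.94,  ...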


Page 30: Iterative Methods in Linear Algebra (part 1)

Part III

Nonstationary Iterative Methods


Page 31: Iterative Methods in Linear Algebra (part 1)

Nonstationary Iterative Methods

The methods involve information that changes at every iteration

e.g. inner-products with residuals or other vectors, etc.

The methods are

newer, harder to understand, but more effective

in general based on the idea of orthogonal vectors and subspace projections

examples: Krylov iterative methods

CG/PCG, GMRES, CGNE, QMR, BiCG, etc.


Page 32: Iterative Methods in Linear Algebra (part 1)

Krylov Iterative Methods


Page 33: Iterative Methods in Linear Algebra (part 1)

Krylov Iterative Methods

Krylov subspaces: these are the spaces

Ki(A, r) = span{r, Ar, A^2 r, . . . , A^{i−1} r}

Krylov iterative methods find an approximation xi to x, where

Ax = b,

as a minimization on Ki(A, r). We have seen how this can be done, for example, by projection, i.e. by the Petrov-Galerkin conditions:

(Axi, φ) = (b, φ) for all φ ∈ Ki(A, r)
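As a MATLAB sketch of this projection (with x0 = 0, so r = b; the test matrix, the subspace dimension, and the use of orth for a numerically stable basis are assumptions):

n = 100; e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % assumed SPD test matrix
b = ones(n, 1);
m = 10;                                 % dimension of the Krylov subspace
K = zeros(n, m); K(:, 1) = b;           % r = b since x0 = 0
for j = 2:m
    K(:, j) = A * K(:, j-1);            % K_m = span{r, Ar, ..., A^{m-1} r}
end
V = orth(K);                            % orthonormal basis for K_m
y = (V' * A * V) \ (V' * b);            % Galerkin: (A*x - b) orthogonal to K_m
x = V * y;
norm(b - A * x) / norm(b)               % residual of the projected approximation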


Page 34: Iterative Methods in Linear Algebra (part 1)

Krylov Iterative Methods

In general we

expand the Krylov subspace by a matrix-vector product, and

do a minimization/projection in it.

Various methods result by specific choices of expansion andminimization/projection.


Page 35: Iterative Methods in Linear Algebra (part 1)

Krylov Iterative Methods

An example: The Conjugate Gradient Method (CG)

The method is for SPD matrices

There is a way, at iteration i, to construct a new 'search direction' pi such that

span{p0, p1, . . . , pi} ≡ Ki+1(A, r0) and (Api, pj) = 0 for i ≠ j.

Note: A is SPD ⇒ (Api, pj) ≡ (pi, pj)A, i.e. (·, ·)A can be used as an inner product, and p0, . . . , pi is an (·, ·)A-orthogonal basis for Ki+1(A, r0)

⇒ we can easily find xi+1 ≈ x as

xi+1 = x0 + α0 p0 + · · · + αi pi   s.t.   (Axi+1, pj) = (b, pj) for j = 0, . . . , i

Namely, because of the (·, ·)A-orthogonality of p0, . . . , pi, at iteration i + 1 we only have to find αi:

(Axi+1, pi) = (A(xi + αi pi), pi) = (b, pi)   ⇒   αi = (ri, pi) / (Api, pi)

Note: xi above can actually be replaced by any x0 + v, v ∈ Ki(A, r0) (Why?)
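A minimal MATLAB sketch of CG following this derivation, with αi = (ri, pi)/(Api, pi) as above; the test problem and tolerance are assumptions, and the βi update keeping the search directions A-orthogonal is the standard one:

n = 100; e = ones(n, 1);
A = spdiags([-e 2*e -e], -1:1, n, n);   % SPD test matrix (assumption)
b = ones(n, 1);
x = zeros(n, 1);
r = b - A * x;  p = r;                  % first search direction p0 = r0
for i = 1:n
    Ap = A * p;
    alpha = (r' * p) / (p' * Ap);       % alpha_i = (r_i, p_i) / (A p_i, p_i)
    x = x + alpha * p;
    rnew = r - alpha * Ap;
    if norm(rnew) / norm(b) < 1e-10, break, end
    beta = (rnew' * rnew) / (r' * r);   % keeps the new p A-orthogonal to the old ones
    p = rnew + beta * p;
    r = rnew;
end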


Page 36: Iterative Methods in Linear Algebra (part 1)

Learning Goals

A brief introduction to iterative methods

Motivation for iterative methods and links to previous lectures, namely
  PDE discretization and sparse matrices
  optimized implementations

Stationary iterative methods

Nonstationary iterative methods and links to building blocks that we have already covered
  projection/minimization
  orthogonalization

Krylov methods; an example with CG; to see more examples ... (next lecture)
