
©2001 Eric de Sturler

Eric de Sturler

Department of Computer Science

University of Illinois at Urbana-Champaign

[email protected] www-faculty.cs.uiuc.edu/sturler

Krylov subspace methods

Hermitian Problems


Consider again how GMRES builds an orthogonal basis for the Krylov space $K_{m+1}(A, r_0)$:

    v_1 = r_0/||r_0||_2;
    for k = 1 : m,
        v_{k+1} = A v_k;
        for j = 1 : k,
            h_{j,k} = v_j^H v_{k+1};
            v_{k+1} = v_{k+1} - h_{j,k} v_j;
        end
        h_{k+1,k} = ||v_{k+1}||_2;
        v_{k+1} = v_{k+1}/h_{k+1,k};
    end

Verify that this (Arnoldi) algorithm generates the recurrence $AV_m = V_{m+1}H_{m+1,m}$. What does $H_{m+1,m}$ look like?

Prove that the columns of $V_{m+1}$ are orthonormal.

Note that $H_{m+1,m} = V_{m+1}^H A V_m$, and that $\mathrm{range}(V_m) = K_m(A, r_0)$ and $\mathrm{range}(V_{m+1}) = K_{m+1}(A, r_0)$. So both $\mathrm{range}(U_m)$ and $\mathrm{range}(C_m)$ from GCR are contained in $\mathrm{range}(V_{m+1})$.
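A minimal NumPy sketch of the Arnoldi loop above (modified Gram-Schmidt), useful for checking the recurrence, the orthonormality of $V_{m+1}$ and the Hessenberg structure numerically. The function name `arnoldi` and the random test data are illustrative assumptions; there is no breakdown handling.

```python
import numpy as np

def arnoldi(A, r0, m):
    """Arnoldi with modified Gram-Schmidt, following the loop above.

    Returns V (n x (m+1)) and the (m+1) x m upper Hessenberg H_{m+1,m}
    with A V[:, :m] = V H.  Assumes no breakdown (h_{k+1,k} != 0).
    """
    n = r0.shape[0]
    dtype = np.result_type(A, r0, np.float64)
    V = np.zeros((n, m + 1), dtype=dtype)
    H = np.zeros((m + 1, m), dtype=dtype)
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):
            H[j, k] = np.vdot(V[:, j], w)   # h_{j,k} = v_j^H w
            w = w - H[j, k] * V[:, j]        # remove the component along v_j
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
    return V, H

rng = np.random.default_rng(0)
n, m = 50, 10
A = rng.standard_normal((n, n))
r0 = rng.standard_normal(n)
V, H = arnoldi(A, r0, m)

print(np.linalg.norm(A @ V[:, :m] - V @ H))            # recurrence A V_m = V_{m+1} H_{m+1,m}
print(np.linalg.norm(V.conj().T @ V - np.eye(m + 1)))  # orthonormal columns
print(np.allclose(np.tril(H, -2), 0))                  # upper Hessenberg structure
```

Both norms should be at round-off level; `np.vdot` conjugates its first argument, which matches $v_j^H w$ for complex data.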

MINRES



Now consider $A$ Hermitian: $A^H = A$.

Another way to write the recurrence relation from Arnoldi:
$$AV_m = V_{m+1}H_{m+1,m} = V_m H_m + v_{m+1} h_{m+1,m} e_m^T,$$
where $H_m$ is the upper $m \times m$ part of $H_{m+1,m}$.

So, using $V_m^H V_m = I$ and $V_m^H v_{m+1} = 0$,
$$V_m^H A V_m = V_m^H V_m H_m + V_m^H v_{m+1} h_{m+1,m} e_m^T = H_m.$$

Since $(V_m^H A V_m)^H = V_m^H A^H V_m = V_m^H A V_m$ (because $A^H = A$), $H_m$ must be Hermitian as well.

This has some important consequences ...
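To see this numerically, the sketch below runs Arnoldi on a random complex Hermitian matrix and checks that $H_m = V_m^H A V_m$ is Hermitian, and hence (being upper Hessenberg) tridiagonal. The helper `arnoldi_hessenberg` and the test data are illustrative assumptions.

```python
import numpy as np

def arnoldi_hessenberg(A, r0, m):
    """Return V_{m+1} and the (m+1) x m Hessenberg matrix from Arnoldi (MGS)."""
    V = np.zeros((A.shape[0], m + 1), dtype=complex)
    H = np.zeros((m + 1, m), dtype=complex)
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):
            H[j, k] = np.vdot(V[:, j], w)   # v_j^H w
            w -= H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / H[k + 1, k]
    return V, H

rng = np.random.default_rng(1)
n, m = 40, 8
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (B + B.conj().T) / 2                    # Hermitian: A^H = A
r0 = rng.standard_normal(n) + 1j * rng.standard_normal(n)
V, H = arnoldi_hessenberg(A, r0, m)
Hm, Vm = H[:m, :], V[:, :m]                 # upper m x m part of H_{m+1,m}

print(np.linalg.norm(Vm.conj().T @ A @ Vm - Hm))  # V_m^H A V_m = H_m
print(np.linalg.norm(Hm - Hm.conj().T))           # H_m is Hermitian ...
print(np.abs(np.triu(Hm, 2)).max())               # ... so entries above the superdiagonal vanish
```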



A Hermitian upper Hessenberg matrix is tridiagonal!

This means that (in exact arithmetic) we need to orthogonalize each new vector $Av_i$ only against the vectors $v_i$ and $v_{i-1}$.

We could solve the least squares problem in the same way as for GMRES, except that we save on orthogonalizations (inner products and vector updates).

What is the computational cost of $m$ iterations of GMRES?

Theorem: Let $A$ be Hermitian and let $v_1, v_2, \ldots, v_m$ be the vectors generated by the Arnoldi algorithm (so they span $K_m(A, v_1)$). Then $Av_i \perp v_1, v_2, \ldots, v_{i-2}$, and so $Av_i \perp \mathrm{span}\{v_1, v_2, \ldots, v_{i-2}\}$.

Proof:



The algorithm now proceeds as follows:

Lanczos recurrence: $AV_m = V_{m+1}T_{m+1,m}$ (T for tridiagonal).

Lanczos is Arnoldi in the Hermitian case (2 orthogonalizations per step).

Solve $y_m = \arg\min_y \|r_0 - AV_m y\|_2$ just as in GMRES:

We have $AV_m = V_{m+1}T_{m+1,m} = V_{m+1}Q_m R_m$, where $Q_m$ has $m+1$ rows and orthonormal columns and $R_m$ is $m \times m$ upper triangular, and we compute $y_m = R_m^{-1}Q_m^H V_{m+1}^H r_0$ (solving the least squares problem). Since $V_{m+1}^H r_0 = e_1\|r_0\|_2$:

Every step we update the QR decomposition of $T_{i+1,i}$ and solve $R_i y_i = Q_i^H e_1\|r_0\|_2$.

At the end we update $x_m = x_0 + V_m y_m$ and $r_m = r_0 - V_{m+1}T_{m+1,m}y_m$.

Note that each step we only orthogonalize on the previous 2 vectors.

What would seem an obvious improvement? Can we do that here?
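A minimal sketch of this "MINRES without the trick", assuming a real symmetric test matrix: a Lanczos three-term recurrence builds $V_{m+1}$ and $T_{m+1,m}$, and the small least squares problem is solved with `np.linalg.lstsq` instead of the Givens QR updates (a deliberate simplification). Note that the final update of $x_m$ needs every column of $V_m$. The name `lanczos` and the data are illustrative assumptions.

```python
import numpy as np

def lanczos(A, r0, m):
    """Lanczos = Arnoldi for Hermitian A: orthogonalize only against v_k, v_{k-1}.

    Returns V_{m+1} and the (m+1) x m tridiagonal T_{m+1,m} with
    A V_m = V_{m+1} T_{m+1,m}.  Real symmetric A, no breakdown handling.
    """
    n = r0.shape[0]
    V = np.zeros((n, m + 1))
    T = np.zeros((m + 1, m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for k in range(m):
        w = A @ V[:, k]
        if k > 0:
            T[k - 1, k] = T[k, k - 1]           # symmetry: t_{k-1,k} = t_{k,k-1}
            w -= T[k - 1, k] * V[:, k - 1]
        T[k, k] = V[:, k] @ w
        w -= T[k, k] * V[:, k]
        T[k + 1, k] = np.linalg.norm(w)
        V[:, k + 1] = w / T[k + 1, k]
    return V, T

rng = np.random.default_rng(2)
n, m = 100, 20
B = rng.standard_normal((n, n))
A = (B + B.T) / 2 + 20 * np.eye(n)              # symmetric test matrix
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0

V, T = lanczos(A, r0, m)
rhs = np.zeros(m + 1)
rhs[0] = np.linalg.norm(r0)                     # V_{m+1}^H r0 = e_1 ||r0||_2
y, *_ = np.linalg.lstsq(T, rhs, rcond=None)     # y_m = argmin ||e_1 ||r0|| - T y||_2
x = x0 + V[:, :m] @ y                           # the final update needs all of V_m
r = r0 - V @ (T @ y)                            # r_m = r0 - V_{m+1} T_{m+1,m} y_m
print(np.linalg.norm(r - (b - A @ x)))          # agrees with the true residual
print(np.linalg.norm(r) / np.linalg.norm(b))
```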



Since we only orthogonalize on the previous two vectors, we would like to discard the other vectors.

However, we need them for the update of $x_m$ at the end.

Can we update $x$ every step and discard the vectors $v_i$?

The problem is that $R_m$ gets a new column every step, and hence $y_m = R_m^{-1}Q_m^H e_1\|r_0\|_2$ changes (in general) completely. So we need all previous $v_i$.

We need a trick.



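A cunning plan:

Since $y_m$ changes completely every step, apply a change of variables. Alternative for the update $V_m y_m$: take $W_m = V_m R_m^{-1}$ and
$$\tilde y_m = R_m y_m = R_m R_m^{-1} Q_m^H e_1\|r_0\|_2 = Q_m^H e_1\|r_0\|_2.$$

Then $W_m \tilde y_m = V_m y_m$, and each iteration only the last component of $\tilde y_m$ changes (the Givens rotation of step $m$ only changes entries $m$ and $m+1$ of $Q_{m+1}^H e_1\|r_0\|_2$). So we can update $W_m\tilde y_m$ without keeping all $w_i$.

$R_m$ from the Givens QR decomposition of a tridiagonal matrix is upper triangular, with nonzeros only on the diagonal and the first 2 superdiagonals.

The columns of $W_m$ are found by solving $W_m R_m = V_m$ each iteration.

So looking at the last (= the new) column we have
$$w_m r_{m,m} + w_{m-1} r_{m-1,m} + w_{m-2} r_{m-2,m} = v_m,$$
where only $w_m$ is not known:
$$w_m = r_{m,m}^{-1}\left(v_m - w_{m-1} r_{m-1,m} - w_{m-2} r_{m-2,m}\right).$$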
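The three claims above (two upper diagonals in $R_m$, only the last entry of $\tilde y_m$ changing, and the three-term recurrence for $w_m$) can be checked with the small sketch below. It assumes a real tridiagonal sample matrix and an arbitrary $V$; the helper `givens_qr_tridiag` is hypothetical and stores all rotations only for the check.

```python
import numpy as np

def givens_qr_tridiag(T, beta0):
    """Givens QR of an (m+1) x m tridiagonal T, one column at a time.

    Returns the m x m factor R and the list [ytilde_1, ..., ytilde_m],
    where ytilde_k holds the first k entries of Q^H e_1 beta0 after step k.
    """
    m = T.shape[1]
    R = np.array(T, dtype=float)
    g = np.zeros(m + 1)
    g[0] = beta0
    rots, history = [], []
    for k in range(m):
        # column k is only touched by the two previous rotations
        for i in (k - 2, k - 1):
            if i >= 0:
                c, s = rots[i]
                a1, a2 = R[i, k], R[i + 1, k]
                R[i, k], R[i + 1, k] = c * a1 + s * a2, -s * a1 + c * a2
        # new rotation annihilates the subdiagonal entry R[k+1, k]
        rho = np.hypot(R[k, k], R[k + 1, k])
        c, s = R[k, k] / rho, R[k + 1, k] / rho
        rots.append((c, s))
        R[k, k], R[k + 1, k] = rho, 0.0
        g[k], g[k + 1] = c * g[k], -s * g[k]     # only entries k, k+1 change
        history.append(g[:k + 1].copy())
    return R[:m, :], history

rng = np.random.default_rng(3)
m = 8
d = rng.uniform(2, 4, m + 1)
e = rng.uniform(0.5, 1.0, m)
T = (np.diag(d) + np.diag(e, 1) + np.diag(e, -1))[:, :m]   # sample (m+1) x m tridiagonal
R, hist = givens_qr_tridiag(T, beta0=1.0)

print(np.allclose(np.triu(R, 3), 0), np.allclose(np.tril(R, -1), 0))  # 2 upper diagonals
print(np.allclose(R.T @ R, T.T @ T))          # T = Q [R; 0] with Q orthogonal
print(all(np.array_equal(hist[k][:k], hist[k - 1]) for k in range(1, m)))

# columns of W = V R^{-1} from the three-term recurrence, for any V
V = rng.standard_normal((20, m))
W = np.zeros_like(V)
for k in range(m):
    w = V[:, k].copy()
    if k >= 1:
        w -= W[:, k - 1] * R[k - 1, k]
    if k >= 2:
        w -= W[:, k - 2] * R[k - 2, k]
    W[:, k] = w / R[k, k]
print(np.allclose(W @ R, V))                  # W_m R_m = V_m
```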



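Update solution: $x_m = x_0 + V_m y_m = x_0 + W_m\tilde y_m$.

Since $\tilde y_m$, contrary to $y_m$, changes only in its last position ($\tilde y_{i,m} = \tilde y_{i,m-1}$ for $i < m$), we can do the update iteration-wise:
$$x_m = x_0 + \sum_{i=1}^{m} w_i \tilde y_{i,m} = x_0 + \sum_{i=1}^{m-1} w_i \tilde y_{i,m} + w_m \tilde y_{m,m} = x_{m-1} + w_m \tilde y_{m,m}.$$

How many vectors do we need to keep (independent of the number of iterations)?

Do we need $r_m$ to continue the iteration?

What would be an update formula for $r_m$?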



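MINRES: $Ax = b$

    choose x_0 and tol; set r_0 = b - A x_0, v_1 = r_0/||r_0||_2, v_0 = w_0 = w_{-1} = 0, t_{1,0} = 0, k = 0;
    while ||r_k||_2 > tol do
        k = k + 1;
        t_{k-1,k} = t_{k,k-1};  t_{k,k} = v_k^H A v_k;
        v_{k+1} = A v_k - t_{k,k} v_k - t_{k-1,k} v_{k-1};
        t_{k+1,k} = ||v_{k+1}||_2;  v_{k+1} = v_{k+1}/t_{k+1,k};
        update QR: Q_{k+1} = Q_k G_k;  R_k = G_k^H (Q_k^H T_{k+1,k});
        ỹ_{k,k} = q_k^H e_1 ||r_0||_2;  ||r_k||_2 = |q_{k+1}^H e_1| ||r_0||_2;
        w_k = r_{k,k}^{-1} (v_k - w_{k-1} r_{k-1,k} - w_{k-2} r_{k-2,k});
        x_k = x_{k-1} + w_k ỹ_{k,k};
    end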
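A compact sketch of the loop above for real symmetric (possibly indefinite) $A$, keeping only $v_{k-1}, v_k, w_{k-2}, w_{k-1}$ and the last two rotations. The name `minres`, its parameters and the test matrix are assumptions for illustration; there is no preconditioning and only an exact-breakdown check, so this is not a production solver.

```python
import numpy as np

def minres(A, b, x0=None, tol=1e-10, maxiter=None):
    """MINRES as on the slide: Lanczos, Givens QR of T_{k+1,k}, W_m = V_m R_m^{-1}.

    Only v_{k-1}, v_k, w_{k-2}, w_{k-1} and the last two rotations are stored.
    Sketch for real symmetric A: no preconditioning, minimal breakdown handling.
    """
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    maxiter = 10 * n if maxiter is None else maxiter
    r0 = b - A @ x
    g = np.linalg.norm(r0)                  # bottom entry of Q^H e_1 ||r0||_2
    resnorm = g
    if g == 0.0:
        return x, 0.0, 0
    v_old, v = np.zeros(n), r0 / g          # v_{k-1}, v_k
    w_old2, w_old = np.zeros(n), np.zeros(n)  # w_{k-2}, w_{k-1}
    t_prev = 0.0                            # t_{k-1,k} = t_{k,k-1}
    c2, s2, c1, s1 = 1.0, 0.0, 1.0, 0.0     # rotations G_{k-2}, G_{k-1}
    k = 0
    while resnorm > tol and k < maxiter:
        k += 1
        # Lanczos: v_{k+1} = A v_k - t_{k,k} v_k - t_{k-1,k} v_{k-1}
        u = A @ v - t_prev * v_old
        t_kk = v @ u
        u -= t_kk * v
        t_next = np.linalg.norm(u)          # t_{k+1,k}
        # column k of T is (t_prev, t_kk, t_next) in rows k-1, k, k+1
        r_km2 = s2 * t_prev                 # G_{k-2} creates the fill-in r_{k-2,k}
        tmp = c2 * t_prev
        r_km1 = c1 * tmp + s1 * t_kk        # G_{k-1}
        tmp = -s1 * tmp + c1 * t_kk
        r_kk = np.hypot(tmp, t_next)        # G_k annihilates t_{k+1,k}
        c, s = tmp / r_kk, t_next / r_kk
        ytilde = c * g                      # ytilde_{k,k}: the only new component
        g = -s * g
        resnorm = abs(g)                    # ||r_k||_2
        w = (v - r_km1 * w_old - r_km2 * w_old2) / r_kk
        x += ytilde * w                     # x_k = x_{k-1} + w_k ytilde_{k,k}
        w_old2, w_old = w_old, w
        c2, s2, c1, s1 = c1, s1, c, s
        if t_next == 0.0:                   # invariant subspace: x is exact
            break
        v_old, v = v, u / t_next
        t_prev = t_next
    return x, resnorm, k

rng = np.random.default_rng(4)
n = 200
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.concatenate([rng.uniform(-3, -1, n // 2), rng.uniform(1, 3, n // 2)])
A = (Q * lam) @ Q.T                         # symmetric indefinite test matrix
b = rng.standard_normal(n)
x, res, its = minres(A, b, tol=1e-10 * np.linalg.norm(b))
print(its, res, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```

The residual norm comes for free from the rotations ($|g| = |q_{k+1}^H e_1|\,\|r_0\|_2$), which answers the question on the previous slide of whether $r_m$ itself is needed to continue.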



Hermitian matrices: Error minimization in the A-norm

We are solving $Ax = b$ with initial guess $x_0$ and $r_0 = b - Ax_0$; $x$ is the solution to $Ax = b$.

The error at iteration $i$ is $e_i = x - (x_0 + z_i)$, where $z_i \in K_i(A, r_0)$ is the $i$th update to the initial guess.

Theorem: Let $A$ be Hermitian positive definite (so that $\|\cdot\|_A$ is a norm). Then the vector $z_i \in K_i(A, r_0)$ satisfies
$$z_i = \arg\min_{z \in K_i(A, r_0)} \|x - (x_0 + z)\|_A$$
iff $r_i \equiv r_0 - Az_i$ satisfies $r_i \perp K_i(A, r_0)$.

The most important algorithm of this class is the Conjugate Gradient algorithm.

Conjugate Gradients



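Proof:
$$z_i = \arg\min_{z \in K_i(A, r_0)} \|x - (x_0 + z)\|_A \iff (x - x_0) - z_i \perp_A K_i(A, r_0).$$
We know $K_i(A, r_0) = \mathrm{span}\{r_0, r_1, \ldots, r_{i-1}\}$. This gives
$$\begin{aligned}
& r_k \perp_A (x - x_0 - z_i) && \text{for } k = 0, \ldots, i-1 \\
\iff\ & \langle A(x - x_0 - z_i), r_k \rangle = 0 && \text{for } k = 0, \ldots, i-1 \\
\iff\ & \langle b - Ax_0 - Az_i, r_k \rangle = 0 && \text{for } k = 0, \ldots, i-1 \\
\iff\ & \langle r_0 - Az_i, r_k \rangle = 0 && \text{for } k = 0, \ldots, i-1 \\
\iff\ & \langle r_i, r_k \rangle = 0 && \text{for } k = 0, \ldots, i-1 \\
\iff\ & r_i \perp K_i(A, r_0).
\end{aligned}$$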
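A small numerical check of the theorem, assuming a real SPD test matrix: $z_i$ is computed from the Galerkin condition $r_i \perp K_i(A, r_0)$, and its $A$-norm error is compared with that of randomly perturbed vectors from the same space. The basis is obtained by a QR factorization of the normalized Krylov matrix, which is adequate only for small $i$ (Lanczos is the practical choice); `err_A` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(5)
n, i = 60, 5
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # real SPD test matrix
b = rng.standard_normal(n)
x0 = np.zeros(n)
x = np.linalg.solve(A, b)            # reference solution of Ax = b
r0 = b - A @ x0

# orthonormal basis of K_i(A, r0): QR of the normalized Krylov matrix
K = np.empty((n, i))
K[:, 0] = r0 / np.linalg.norm(r0)
for j in range(1, i):
    K[:, j] = A @ K[:, j - 1]
    K[:, j] /= np.linalg.norm(K[:, j])
Q, _ = np.linalg.qr(K)

# Galerkin condition: r_i = r0 - A z_i orthogonal to K_i  <=>  (Q^T A Q) c = Q^T r0
c = np.linalg.solve(Q.T @ A @ Q, Q.T @ r0)
z = Q @ c
r = r0 - A @ z

def err_A(zz):
    """A-norm of the error x - (x0 + zz)."""
    e = x - (x0 + zz)
    return np.sqrt(e @ A @ e)

print(np.abs(Q.T @ r).max())         # r_i is orthogonal to K_i up to round-off
trials = [err_A(z + Q @ rng.standard_normal(i)) for _ in range(100)]
print(err_A(z) <= min(trials))       # other z in K_i give a larger A-norm error
```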



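Lanczos iteration:

    choose q_1 with ||q_1||_2 = 1; beta_0 = 0; q_0 = 0;
    for i = 1, 2, ...
        q_{i+1} = A q_i;
        alpha_i = <A q_i, q_i>;  q_{i+1} = q_{i+1} - alpha_i q_i;  q_{i+1} = q_{i+1} - beta_{i-1} q_{i-1};
        beta_i = ||q_{i+1}||_2;  q_{i+1} = q_{i+1}/beta_i;
    end

Show that $q_{i+1} = q_{i+1} - \beta_{i-1} q_{i-1}$ makes $q_{i+1} \perp q_{i-1}$ (one argument is the symmetry of the Hessenberg matrix for Arnoldi; give another).

This generates the recurrence relation
$$AQ_i = Q_i T_i + \beta_i q_{i+1} e_i^T, \quad \text{where } Q_i = [q_1\ q_2\ \cdots\ q_i], \quad T_i = \begin{pmatrix} \alpha_1 & \beta_1 & 0 & \cdots \\ \beta_1 & \alpha_2 & \beta_2 & \ddots \\ 0 & \beta_2 & \ddots & \ddots \\ \vdots & \ddots & \ddots & \ddots \end{pmatrix}.$$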



Use the Lanczos orthonormal basis for minimizing the A-norm of the error:
$z_i = \arg\min_{z \in K_i(A, r_0)} \|x - (x_0 + z)\|_A$ iff $r_i \equiv r_0 - Az_i$ satisfies $r_i \perp K_i(A, r_0)$.

Take $q_1 = r_0/\|r_0\|_2$ in the Lanczos method: $AQ_i = Q_i T_i + \beta_i q_{i+1} e_i^T$.

Solve $r_0 - AQ_i y_i \perp Q_i \iff Q_i^H(\|r_0\|_2 q_1 - AQ_i y_i) = 0$, and
$$Q_i^H(\|r_0\|_2 q_1 - AQ_i y_i) = 0 \iff \|r_0\|_2 e_1 - Q_i^H A Q_i y_i = 0.$$

Notice $\mathrm{range}(Q_i) = \mathrm{span}\{r_0, r_1, \ldots, r_{i-1}\}$.

$AQ_i = Q_i T_i + \beta_i q_{i+1} e_i^T \Rightarrow Q_i^H A Q_i = T_i$.

So we reduced the problem to solving $\|r_0\|_2 e_1 - T_i y_i = 0$:
$$y_i = T_i^{-1} e_1 \|r_0\|_2.$$
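The reduction above can be tried directly: run the Lanczos iteration (in the $\alpha_i$, $\beta_i$ form of the earlier slide) with $q_1 = r_0/\|r_0\|_2$, solve $T_i y_i = e_1\|r_0\|_2$ for each $i$, and watch the $A$-norm of the error. This is a sketch assuming a real SPD test matrix and no reorthogonalization; `lanczos_ab` is a hypothetical name. It also checks that $r_i$ is a multiple of $q_{i+1}$, namely $r_i = -\beta_i y_{i,i} q_{i+1}$, which follows from the Lanczos relation.

```python
import numpy as np

def lanczos_ab(A, q1, i):
    """Lanczos iteration as on the slide (real symmetric A, ||q1||_2 = 1).

    Returns Q_{i+1} = [q_1 ... q_{i+1}], alpha_1..alpha_i and beta_1..beta_i.
    No reorthogonalization and no breakdown handling.
    """
    n = q1.shape[0]
    Q = np.zeros((n, i + 1))
    Q[:, 0] = q1
    alpha, beta = np.zeros(i), np.zeros(i)
    for k in range(i):
        q = A @ Q[:, k]
        alpha[k] = q @ Q[:, k]
        q -= alpha[k] * Q[:, k]
        if k > 0:
            q -= beta[k - 1] * Q[:, k - 1]
        beta[k] = np.linalg.norm(q)
        Q[:, k + 1] = q / beta[k]
    return Q, alpha, beta

rng = np.random.default_rng(7)
n, imax = 80, 15
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)                 # SPD test matrix
b = rng.standard_normal(n)
x0 = np.zeros(n)
r0 = b - A @ x0
xstar = np.linalg.solve(A, b)

Q, alpha, beta = lanczos_ab(A, r0 / np.linalg.norm(r0), imax)
errs = []
for i in range(1, imax + 1):
    T = np.diag(alpha[:i]) + np.diag(beta[:i - 1], 1) + np.diag(beta[:i - 1], -1)
    rhs = np.zeros(i)
    rhs[0] = np.linalg.norm(r0)
    y = np.linalg.solve(T, rhs)             # y_i = T_i^{-1} e_1 ||r0||_2
    x = x0 + Q[:, :i] @ y
    r = b - A @ x
    e = xstar - x
    errs.append(np.sqrt(e @ A @ e))         # A-norm of the error

print(np.abs(Q[:, :imax].T @ r).max())      # Galerkin: r_i orthogonal to range(Q_i)
print(np.linalg.norm(r + beta[imax - 1] * y[-1] * Q[:, imax]))  # r_i = -beta_i y_{i,i} q_{i+1}
print(np.all(np.diff(errs) <= 1e-12))       # A-norm error does not increase
```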



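In order to update step-by-step we use the same trick as in MINRES:

Let $T_i = L_i D_i L_i^H$; then $y_i = L_i^{-H} D_i^{-1} L_i^{-1} e_1 \|r_0\|_2$, where $L_i$ is unit lower bidiagonal with lower diagonal coefficients $l_1, l_2, \ldots, l_{i-1}$ (the index of $l_j$ is its column).

Change of variables: $P_i = Q_i L_i^{-H}$ and $\tilde y_i = D_i^{-1} L_i^{-1} e_1 \|r_0\|_2$, so that $Q_i y_i = P_i \tilde y_i$.

Notice that each iteration only the last component of $\tilde y_i$ changes. From $P_i L_i^H = Q_i$ we get a recurrence for $p_i$:
$$p_i + l_{i-1} p_{i-1} = q_i \quad (p_1 = q_1).$$

So every new step we compute a new $q_{i+1}$, we update the decomposition of $T_i$, and from that $\tilde y_{i+1}$ and $p_{i+1}$:
$$x_i = x_{i-1} + p_i \tilde y_{i,i}, \qquad r_i = r_{i-1} - Ap_i \tilde y_{i,i} = -\beta_i \tilde y_{i,i}\, q_{i+1},$$
where $\tilde y_{i,i}$ is the $i$th component of the vector $\tilde y_i$ (it equals $y_{i,i}$, because $L_i^{-H}$ is unit upper triangular).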
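For a symmetric tridiagonal $T_i$ the factorization can be built one row at a time: $d_1 = \alpha_1$, $l_j = \beta_j/d_j$, $d_{j+1} = \alpha_{j+1} - l_j\beta_j$. The sketch below (real data, helper names `ldl_tridiag` and `ytilde` are hypothetical) checks $T_i = L_i D_i L_i^T$, that extending $T_{i-1}$ to $T_i$ leaves the earlier entries of $\tilde y$ unchanged, that $y_{i,i} = \tilde y_{i,i}$, and that the recurrence for $p_i$ reproduces $P_i = Q_i L_i^{-T}$. It assumes no zero pivots, which holds for SPD $T_i$.

```python
import numpy as np

def ldl_tridiag(alpha, beta):
    """T = L D L^T for symmetric tridiagonal T (diagonal alpha, off-diagonal beta).
    L is unit lower bidiagonal with subdiagonal l_1..l_{i-1}; assumes nonzero pivots."""
    i = len(alpha)
    d, l = np.empty(i), np.empty(i - 1)
    d[0] = alpha[0]
    for j in range(i - 1):
        l[j] = beta[j] / d[j]                  # l_j d_j = beta_j
        d[j + 1] = alpha[j + 1] - l[j] * beta[j]
    return l, d

def ytilde(l, d, beta0):
    """ytilde = D^{-1} L^{-1} e_1 beta0 by forward substitution."""
    u = np.empty(len(d))
    u[0] = beta0
    for j in range(len(d) - 1):
        u[j + 1] = -l[j] * u[j]
    return u / d

rng = np.random.default_rng(6)
i = 7
alpha = rng.uniform(3.0, 5.0, i)               # sample SPD tridiagonal T_i
beta = rng.uniform(0.5, 1.0, i - 1)
T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
l, d = ldl_tridiag(alpha, beta)
L = np.eye(i) + np.diag(l, -1)
print(np.allclose(L @ np.diag(d) @ L.T, T))    # T_i = L_i D_i L_i^T

# extending T_{i-1} to T_i keeps the earlier entries of ytilde
yt = ytilde(l, d, 1.0)
yt_prev = ytilde(*ldl_tridiag(alpha[:-1], beta[:-1]), 1.0)
print(np.allclose(yt[:-1], yt_prev))

# y_i = L^{-T} ytilde solves T_i y_i = e_1, and y_{i,i} = ytilde_{i,i}
y = np.linalg.solve(L.T, yt)
e1 = np.zeros(i)
e1[0] = 1.0
print(np.allclose(T @ y, e1), np.isclose(y[-1], yt[-1]))

# P_i = Q_i L_i^{-T} from the recurrence p_j + l_{j-1} p_{j-1} = q_j
Qi, _ = np.linalg.qr(rng.standard_normal((20, i)))
P = np.zeros_like(Qi)
P[:, 0] = Qi[:, 0]
for j in range(1, i):
    P[:, j] = Qi[:, j] - l[j - 1] * P[:, j - 1]
print(np.allclose(P @ L.T, Qi))
```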



(Easier form of) CG algorithm: $Ax = b$

    choose x_0; r_0 = b - A x_0; p_1 = r_0; i = 0;
    while ||r_i||_2 > tol do
        i = i + 1;
        alpha_i = <r_{i-1}, r_{i-1}> / <p_i, A p_i>;
        x_i = x_{i-1} + alpha_i p_i;
        r_i = r_{i-1} - alpha_i A p_i;
        beta_i = <r_i, r_i> / <r_{i-1}, r_{i-1}>;
        p_{i+1} = r_i + beta_i p_i;
    end
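A direct NumPy transcription of this loop, as a sketch assuming a real SPD matrix; the name `cg` and its keyword arguments are illustrative, not a library API. One matrix-vector product and two inner products are needed per iteration.

```python
import numpy as np

def cg(A, b, x0=None, tol=1e-10, maxiter=None):
    """Conjugate Gradients in the form of the slide, real SPD A assumed."""
    x = np.zeros(b.shape[0]) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x                       # r_0
    p = r.copy()                        # p_1 = r_0
    rr = r @ r                          # <r_{i-1}, r_{i-1}>
    maxiter = 10 * b.shape[0] if maxiter is None else maxiter
    i = 0
    while np.sqrt(rr) > tol and i < maxiter:
        i += 1
        Ap = A @ p                      # one matrix-vector product per iteration
        alpha = rr / (p @ Ap)           # alpha_i = <r_{i-1},r_{i-1}> / <p_i, A p_i>
        x += alpha * p                  # x_i = x_{i-1} + alpha_i p_i
        r -= alpha * Ap                 # r_i = r_{i-1} - alpha_i A p_i
        rr_new = r @ r
        beta = rr_new / rr              # beta_i = <r_i,r_i> / <r_{i-1},r_{i-1}>
        p = r + beta * p                # p_{i+1} = r_i + beta_i p_i
        rr = rr_new
    return x, np.sqrt(rr), i

rng = np.random.default_rng(8)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)             # SPD test matrix
b = rng.standard_normal(n)
x, res, its = cg(A, b, tol=1e-10 * np.linalg.norm(b))
print(its, np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```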



[Figure: $\log_{10}\|r\|_2$ versus # iterations (matvecs) for CG and GMRES]



Comparing Methods


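GMRES: $Ax = b$

    choose x_0 (e.g. x_0 = 0) and tol;
    r_0 = b - A x_0; k = 0; v_1 = r_0/||r_0||_2;
    while ||r_k||_2 > tol
        k = k + 1;
        v_{k+1} = A v_k;
        for j = 1 : k,
            h_{j,k} = v_j^H v_{k+1};  v_{k+1} = v_{k+1} - h_{j,k} v_j;
        end
        h_{k+1,k} = ||v_{k+1}||_2;  v_{k+1} = v_{k+1}/h_{k+1,k};
        update QR decomposition: H_{k+1,k} = Q_{k+1} [R_k; 0];
        ||r_k||_2 = |q_{k+1}^H e_1| ||r_0||_2;
    end
    y_k = R_k^{-1} Q_k^H e_1 ||r_0||_2;  x_k = x_0 + V_k y_k;
    r_k = r_0 - V_{k+1} H_{k+1,k} y_k = V_{k+1} (I - Q_k Q_k^H) e_1 ||r_0||_2   (or simply r_k = b - A x_k)

Here $Q_k$ denotes the first $k$ columns of the unitary $Q_{k+1}$ and $q_{k+1}$ its last column.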
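A sketch of this algorithm in NumPy for real $A$, assuming full (unrestarted) GMRES: the residual norm is read off from the rotated right-hand side every step, and $y_k$ and $x_k$ are formed only once at the end. The name `gmres`, its arguments and the non-symmetric test matrix are illustrative assumptions.

```python
import numpy as np

def gmres(A, b, x0=None, tol=1e-10, maxiter=None):
    """Full GMRES following the slide: Arnoldi with MGS, Givens QR of H_{k+1,k},
    ||r_k||_2 = |g_k| without forming x_k, and y_k = R_k^{-1} (Q_k^H e_1 ||r0||_2)
    only at the end.  Real A assumed; no restarts."""
    n = b.shape[0]
    x0 = np.zeros(n) if x0 is None else np.array(x0, dtype=float)
    maxiter = n if maxiter is None else maxiter
    r0 = b - A @ x0
    beta0 = np.linalg.norm(r0)
    if beta0 <= tol:
        return x0, beta0, 0
    V = np.zeros((n, maxiter + 1))
    V[:, 0] = r0 / beta0
    H = np.zeros((maxiter + 1, maxiter))   # overwritten by R during the rotations
    cs, sn = np.zeros(maxiter), np.zeros(maxiter)
    g = np.zeros(maxiter + 1)
    g[0] = beta0                           # Q^H e_1 ||r0||_2
    k = 0
    while abs(g[k]) > tol and k < maxiter:
        w = A @ V[:, k]
        for j in range(k + 1):             # modified Gram-Schmidt against v_1..v_{k+1}
            H[j, k] = V[:, j] @ w
            w -= H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] > 0.0:
            V[:, k + 1] = w / H[k + 1, k]
        for j in range(k):                 # apply earlier rotations to the new column
            a1, a2 = H[j, k], H[j + 1, k]
            H[j, k], H[j + 1, k] = cs[j] * a1 + sn[j] * a2, -sn[j] * a1 + cs[j] * a2
        rho = np.hypot(H[k, k], H[k + 1, k])
        cs[k], sn[k] = H[k, k] / rho, H[k + 1, k] / rho
        H[k, k], H[k + 1, k] = rho, 0.0
        g[k], g[k + 1] = cs[k] * g[k], -sn[k] * g[k]
        k += 1
    y = np.linalg.solve(H[:k, :k], g[:k])  # R_k y_k = Q_k^H e_1 ||r0||_2
    return x0 + V[:, :k] @ y, abs(g[k]), k

rng = np.random.default_rng(9)
n = 100
A = 4 * np.eye(n) + rng.standard_normal((n, n)) / np.sqrt(n)   # non-symmetric test matrix
b = rng.standard_normal(n)
x, res, its = gmres(A, b, tol=1e-10 * np.linalg.norm(b))
print(its, res, np.linalg.norm(b - A @ x))
```

Storing all of $V_k$ (and orthogonalizing against all of it) is exactly the cost that the short recurrences of MINRES and CG avoid for Hermitian problems.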

GMRES


Swiss Center for Scientific Computing © Eric de Sturler

Iterative Methods: Cost

- Many cheap iterations versus a minimum number of expensive iterations
  - same as in the sequential case, but the issues that determine the cost change
- Four main kernels:
  - matrix-vector product: comp: 2*N*nz1, comm: "neighbour"
  - preconditioner: comp: 2*N*nz2, comm: "neighbour" (& global)
  - vector update: comp: 2*N, comm: none
  - inner product: comp: 2*N, comm: global
- Methods: GMRES, GCR, FOM, BiCG, CGS, BiCGSTAB(l)
  - short recurrence: cheap iterations / many iterations
  - full orthogonalization: minimal number of iterations / expensive iterations
- The matrix-vector product is often linked with grid/domain partitioning
  - partition scheme to minimize communication volume/number of messages
  - separate local and nonlocal references
  - overlap communication (latency hiding)


Model Problem: Convection-Diffusion(-Reaction) Equation

$$Lu = -(p u_x)_x - (q u_y)_y + r u_x + s u_y + t u = f,$$

with Dirichlet boundary conditions $u = u_n$, $u = u_s$, $u = u_w$, $u = u_e$ on the north, south, west, and east boundaries.
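The slides do not spell out the discretization. The sketch below is one plausible choice, stated as an assumption: unit square, constant coefficients, 5-point stencil with central differences for the convection terms, $h = 1/(N+1)$, and every equation multiplied by $h^2$ (this scaling matches the eigenvalue range of roughly $(0, 8)$ reported for the pure diffusion case). `model_problem` is a hypothetical helper built with dense `np.kron`, so it is only meant for small grids.

```python
import numpy as np

def model_problem(N, p=1.0, q=1.0, r=0.0, s=0.0, t=0.0, f=0.0,
                  us=0.0, uw=0.0, un=0.0, ue=0.0):
    """Hypothetical 5-point discretization of
        -(p u_x)_x - (q u_y)_y + r u_x + s u_y + t u = f
    on the unit square with constant coefficients, central differences,
    h = 1/(N+1), Dirichlet values us, uw, un, ue on the south, west, north
    and east sides, every equation multiplied by h^2.
    Unknown (x_i, y_j) has index j*N + i (x runs fastest)."""
    h = 1.0 / (N + 1)
    I = np.eye(N)
    D2 = 2 * I - np.eye(N, k=1) - np.eye(N, k=-1)   # -u_{i-1} + 2 u_i - u_{i+1}
    D1 = (np.eye(N, k=1) - np.eye(N, k=-1)) / 2      # (u_{i+1} - u_{i-1}) / 2
    A = (p * np.kron(I, D2) + q * np.kron(D2, I)
         + r * h * np.kron(I, D1) + s * h * np.kron(D1, I)
         + t * h**2 * np.eye(N * N))
    # right-hand side: h^2 f, minus boundary neighbours moved to the right
    F = np.full((N, N), h**2 * f)                    # F[j, i]
    F[:, 0] -= (-p - r * h / 2) * uw                 # west side, x = 0
    F[:, -1] -= (-p + r * h / 2) * ue                # east side, x = 1
    F[0, :] -= (-q - s * h / 2) * us                 # south side, y = 0
    F[-1, :] -= (-q + s * h / 2) * un                # north side, y = 1
    return A, F.ravel()

# boundary values of the h = 1/11 runs (r = s = 0 assumed: symmetric A)
A, b = model_problem(10, us=0.0, uw=1.0, un=1.0, ue=0.0)
u = np.linalg.solve(A, b)
print(np.allclose(A, A.T), u.min(), u.max())
# with convection (r = s = 70, h = 1/31) the matrix is no longer symmetric
A2, b2 = model_problem(30, r=70.0, s=70.0, un=1.0, ue=1.0)
print(np.allclose(A2, A2.T))
```

With $r = s = 0$ the matrix is symmetric positive definite, so CG applies; nonzero convection makes it non-symmetric, which is the setting of the later GMRES experiments.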



CG vs GMRES for various mesh widths (h)

[Figure: $\log_{10}\|r\|_2$ for CG and GMRES with h = 1/11, 1/21, 1/31, 1/51, 1/71]


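[Figure: eigenvalues of $A$ for h = 1/11 and h = 1/31]

Eigenvalues for various h ($n = 1/h - 1$ interior grid points per direction):

| n  | λ_min   | λ_max | cond. nr. |
|----|---------|-------|-----------|
| 10 | 0.162   | 7.84  | 48.4      |
| 20 | 4.47e-2 | 7.95  | 178       |
| 30 | 2.05e-2 | 7.98  | 389       |
| 50 | 7.57e-3 | 7.99  | 1.06e3    |
| 70 | 3.92e-3 | 8.00  | 2.04e3    |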
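Assuming the table refers to the model problem with $p = q = 1$, $r = s = t = 0$, a 5-point stencil on $n \times n$ interior points with $h = 1/(n+1)$, and equations scaled by $h^2$ (an assumption, though the numbers are consistent with it), the sketch below recomputes the first rows. For this matrix the smallest eigenvalue has the closed form $8\sin^2(\pi h/2)$ and the largest is $8 - 8\sin^2(\pi h/2)$, which explains why $\lambda_{\max}$ approaches 8 and the condition number grows like $h^{-2}$. `laplacian_2d` is a hypothetical helper.

```python
import numpy as np

def laplacian_2d(N):
    """5-point Laplacian on an N x N interior grid (h = 1/(N+1)), times h^2."""
    I = np.eye(N)
    D2 = 2 * I - np.eye(N, k=1) - np.eye(N, k=-1)
    return np.kron(I, D2) + np.kron(D2, I)

results = {}
for N in (10, 20, 30):
    lam = np.linalg.eigvalsh(laplacian_2d(N))      # ascending order
    results[N] = (lam[0], lam[-1], lam[-1] / lam[0])
    h = 1.0 / (N + 1)
    closed_form = 8 * np.sin(np.pi * h / 2) ** 2    # smallest eigenvalue of this matrix
    print(N, results[N], np.isclose(lam[0], closed_form))
```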



CG vs GMRES

[Figure, panel 1: $\log_{10}\|r\|_2$ for CG and GMRES; panel 2: $\log_{10}\|r\|_2$ versus time (s). Parameters: p = q = 1; t = 0; f = 0; h = 1/11; u_s = 0; u_w = 1; u_n = 1; u_e = 0]


[Figure: CG vs GMRES, $\log_{10}\|r\|_2$. Parameters: p = q = 1; t = 0; f = 0; h = 1/51; u_s = 0; u_w = 1; u_n = 1; u_e = 0]

[Figure: CG vs GMRES (time), $\log_{10}\|r\|_2$ versus time (s), same parameters]


[Figure: Iterations for GMRES(m), curves for m = 3, 5, 10, 30, 50 and full (156); $\log_{10}\|r\|_2$. Parameters: p = q = 1; t = 0; f = 0; h = 1/51; u_s = 0; u_w = 1; u_n = 1; u_e = 0]

[Figure: Time for GMRES(m), same curves and parameters; $\log_{10}\|r\|_2$ versus time (s)]


[Figure: CG for a non-Hermitian problem. Parameters: p = q = 1; r = s = 5; h = 1/31; u_s = 0; u_w = 0; u_n = 1; u_e = 1]

[Figure: eigenvalues in the complex plane (real versus imaginary part). Parameters: p = q = 1; r = s = 70; h = 1/31; u_s = 0; u_w = 0; u_n = 1; u_e = 1]


[Figure: GMRES for varying convection, curves for r = s = 10, 70, 100, 250; $\log_{10}\|r\|_2$. Parameters: p = q = 1; h = 1/31; u_s = 0; u_w = 0; u_n = 1; u_e = 1]

[Figure: GMRES(m) with r = s = 10, curves labeled 10, 25, 50 and full; $\log_{10}\|r\|_2$. Parameters: p = q = 1; h = 1/31; u_s = 0; u_w = 0; u_n = 1; u_e = 1]



[Figure: GMRES(m), curves labeled 10, 25, 50 and full; $\log_{10}\|r\|_2$. Parameters: p = q = 1; r = s = 70; h = 1/31; u_s = 0; u_w = 0; u_n = 1; u_e = 1]

[Figure: GMRES(m), curves labeled 10, 25, 50 and full; $\log_{10}\|r\|_2$. Parameters: p = q = 1; r = s = 100; h = 1/31; u_s = 0; u_w = 0; u_n = 1; u_e = 1]



[Figure: GMRES(m), curves labeled 10, 25, 50 and full; $\log_{10}\|r\|_2$. Parameters: p = q = 1; r = s = 250; h = 1/31; u_s = 0; u_w = 0; u_n = 1; u_e = 1]

[Figure: GMRES(m) after shifting the spectrum, A = A - 3.65*I; curves labeled 10, 25, 50, 100, 120 and full; $\log_{10}\|r\|_2$. Parameters: p = q = 1; r = s = 70; h = 1/31; u_s = 0; u_w = 0; u_n = 1; u_e = 1]


[Figure: GMRES(m), curves for m = 3, 5, 10, 20, 30, 50, 75, 76 and full (119); $\log_{10}\|r\|_2$. Parameters: p = q = 1; r = 200; s = -200; t = 0; f = 0; h = 1/51; u_s = 0; u_w = 100; u_n = 100; u_e = 0]

[Figure: GMRES(100), $\log_{10}\|r\|_2$, same parameters]
