CG VERSUS MINRES: AN EMPIRICAL COMPARISON∗

DAVID CHIN-LUNG FONG† AND MICHAEL SAUNDERS‡

Abstract. For iterative solution of symmetric systems Ax = b, the conjugate gradient method (CG) is commonly used when A is positive definite, while the minimum residual method (MINRES) is typically reserved for indefinite systems. We investigate the sequence of approximate solutions x_k generated by each method and suggest that even if A is positive definite, MINRES may be preferable to CG if iterations are to be terminated early. In particular, we show for MINRES that the solution norms ‖x_k‖ are monotonically increasing when A is positive definite (as was already known for CG), and the solution errors ‖x∗ − x_k‖ are monotonically decreasing. We also show that the backward errors for the MINRES iterates x_k are monotonically decreasing.

Key words. conjugate gradient method, minimum residual method, iterative method, sparse matrix, linear equations, CG, CR, MINRES, Krylov subspace method, trust-region method

1. Introduction. The conjugate gradient method (CG) [11] and the minimum residual method (MINRES) [18] are both Krylov subspace methods for the iterative solution of symmetric linear equations Ax = b. CG is commonly used when the matrix A is positive definite, while MINRES is generally reserved for indefinite systems [27, p. 85]. We reexamine this wisdom from the point of view of early termination on positive-definite systems.

We assume that the system Ax = b is real with A symmetric positive definite (spd) and of dimension n × n. The Lanczos process [13] with starting vector b may be used to generate the n × k matrix V_k ≡ (v_1 v_2 … v_k) and the (k+1) × k Hessenberg tridiagonal matrix T_k such that AV_k = V_{k+1} T_k for k = 1, 2, …, ℓ, and AV_ℓ = V_ℓ T_ℓ for some ℓ ≤ n, where the columns of V_k form a theoretically orthonormal basis for the kth Krylov subspace K_k(A, b) ≡ span{b, Ab, A²b, …, A^{k−1}b}, and T_ℓ is ℓ × ℓ and tridiagonal. Approximate solutions within the kth Krylov subspace may be formed as x_k = V_k y_k for some k-vector y_k. As shown in [18], three iterative methods CG, MINRES, and SYMMLQ may be derived by choosing y_k appropriately at each iteration. CG is well defined if A is spd, while MINRES and SYMMLQ are stable for any symmetric nonsingular A.
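For concreteness, here is a minimal Matlab sketch of the Lanczos process (our illustration, not code from the original report). It builds V_{k+1} and the (k+1) × k tridiagonal T_k satisfying AV_k = V_{k+1} T_k, with no reorthogonalization, so in floating point the columns of V are only approximately orthonormal:

    function [V, T] = lanczos(A, b, k)
    % Lanczos sketch: V has k+1 columns, T is (k+1)-by-k tridiagonal,
    % and A*V(:,1:k) = V*T in exact arithmetic.
    n = length(b);
    V = zeros(n, k+1);  T = zeros(k+1, k);
    V(:,1) = b / norm(b);
    for j = 1:k
        w = A * V(:,j);
        if j > 1, w = w - T(j,j-1) * V(:,j-1); end  % subtract beta_{j-1} v_{j-1}
        T(j,j) = V(:,j)' * w;                       % alpha_j
        w = w - T(j,j) * V(:,j);
        T(j+1,j) = norm(w);                         % beta_j
        if j < k, T(j,j+1) = T(j+1,j); end
        if T(j+1,j) == 0, break; end                % process terminates (k = ell)
        V(:,j+1) = w / T(j+1,j);
    end
    end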

As noted by Choi [2], SYMMLQ can form an approximation x_{k+1} = V_{k+1} y_{k+1} in the (k+1)th Krylov subspace when CG and MINRES are forming their approximations x_k = V_k y_k in the kth subspace. It would be of future interest to compare all three methods on spd systems, but for the remainder of this paper we focus on CG and MINRES.

With different methods using the same information V_{k+1} and T_k to compute solution estimates x_k = V_k y_k within the same Krylov subspace (for each k), it is commonly thought that the number of iterations required will be similar for each method, and hence CG should be preferable on spd systems because it requires somewhat fewer floating-point operations per iteration. This view is justified if an accurate solution is required (stopping tolerance τ close to machine precision ε). We show that with looser stopping tolerances, MINRES is sure to terminate sooner than CG when the stopping rule is based on the backward error for x_k, and by numerical examples we illustrate that the difference in iteration numbers can be substantial.

∗ Report SOL 2011-2R (to appear in SQU Journal for Science, 17:1 (2012), 44–62).
† ICME, Stanford University, CA 94305-4042, USA ([email protected]). Partially supported by a Stanford Graduate Fellowship.
‡ Systems Optimization Laboratory, Department of Management Science and Engineering, Stanford University, CA 94305-4026, USA ([email protected]). Partially supported by Office of Naval Research grant N00014-08-1-0191 and NSF grant DMS-1009005, and by the U.S. Army Research Laboratory, through the Army High Performance Computing Research Center, Cooperative Agreement W911NF-07-0027.

1.1. Notation. We study the application of CG and MINRES to real symmetric positive-definite (spd) systems Ax = b. The unique solution is denoted by x∗. The initial approximate solution is x_0 ≡ 0, and r_k ≡ b − Ax_k is the residual vector for an approximation x_k within the kth Krylov subspace. For a vector v and matrix A, ‖v‖ and ‖A‖ denote the 2-norm and the Frobenius norm respectively, and A ≻ 0 indicates that A is spd.

2. Minimization properties of Krylov subspace methods. With exact arithmetic, the Lanczos process terminates with k = ℓ for some ℓ ≤ n. To ensure that the approximations x_k = V_k y_k improve by some measure as k increases toward ℓ, the Krylov solvers minimize some convex function within the expanding Krylov subspaces [9].

2.1. CG. When A is spd, the quadratic form φ(x) ≡ ½ x^T A x − b^T x is bounded below, and its unique minimizer solves Ax = b. A characterization of the CG iterations is that they minimize the quadratic form within each Krylov subspace [9], [17, §2.4], [28, §§8.8–8.9]:

    x_k^C = V_k y_k^C,   where   y_k^C = arg min_y φ(V_k y).

With b = Ax∗ and 2φ(x_k) = x_k^T A x_k − 2x_k^T A x∗, this is equivalent to minimizing the function ‖x∗ − x_k‖_A ≡ √((x∗ − x_k)^T A(x∗ − x_k)), known as the energy norm of the error, within each Krylov subspace. For some applications, this is a desirable property [21, 24, 1, 17, 28].
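The equivalence is a one-line expansion, which we spell out here using b = Ax∗:

    2φ(x_k) = x_k^T A x_k − 2x_k^T A x∗ = (x∗ − x_k)^T A(x∗ − x_k) − x∗^T A x∗,

so minimizing φ over the Krylov subspace minimizes ‖x∗ − x_k‖_A², the term x∗^T A x∗ being a constant.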

2.2. MINRES. For nonsingular (and possibly indefinite) systems, the residual norm was used in [18] to characterize the MINRES iterations:

    x_k^M = V_k y_k^M,   where   y_k^M = arg min_y ‖b − AV_k y‖.    (2.1)

Thus, MINRES minimizes ‖r_k‖ within the kth Krylov subspace. This was also an aim of Stiefel's Conjugate Residual method (CR) [23] for spd systems (and of Luenberger's extensions of CR to indefinite systems [15, 16]). Thus, CR and MINRES must generate the same iterates on spd systems. We use this connection to prove that ‖x_k‖ increases monotonically when MINRES is applied to an spd system.

2.3. CG and CR. The two methods for solving spd systems Ax = b are summarized in Table 2.1. The first two columns are pseudocode for CG and CR with iteration number k omitted for clarity; they match our Matlab implementations. Note that q = Ap in both methods, but it is not computed as such in CR. Termination occurs when r = 0 (⇒ ρ = β = 0).

To prove our main result we need to introduce iteration indices; see column 3 of Table 2.1. Termination occurs when r_k = 0 for some index k = ℓ ≤ n (⇒ ρ_ℓ = β_ℓ = 0, r_ℓ = s_ℓ = p_ℓ = q_ℓ = 0). Note: this ℓ is the same as the ℓ at which the Lanczos process theoretically terminates for the given A and b.

Theorem 2.1. The following properties hold for Algorithm CR:
(a) q_i^T q_j = 0 (i ≠ j)
(b) r_i^T q_j = 0 (i ≥ j + 1)


Table 2.1
Pseudocode for algorithms CG and CR

CG:
  Initialize x = 0, r = b, ρ = ‖r‖², p = r
  Repeat
    q = Ap
    α = ρ/p^T q
    x ← x + αp
    r ← r − αq
    ρ̄ = ρ, ρ = r^T r
    β = ρ/ρ̄
    p ← r + βp

CR:
  Initialize x = 0, r = b, s = Ar, ρ = r^T s, p = r, q = s
  Repeat
    (q = Ap)
    α = ρ/‖q‖²
    x ← x + αp
    r ← r − αq
    s = Ar
    ρ̄ = ρ, ρ = r^T s
    β = ρ/ρ̄
    p ← r + βp
    q ← s + βq

CR (with iteration indices):
  Initialize x_0 = 0, r_0 = b, s_0 = Ar_0, ρ_0 = r_0^T s_0, p_0 = r_0, q_0 = s_0
  For k = 1, 2, …
    (q_{k−1} = A p_{k−1})
    α_k = ρ_{k−1}/‖q_{k−1}‖²
    x_k = x_{k−1} + α_k p_{k−1}
    r_k = r_{k−1} − α_k q_{k−1}
    s_k = A r_k
    ρ_k = r_k^T s_k
    β_k = ρ_k/ρ_{k−1}
    p_k = r_k + β_k p_{k−1}
    q_k = s_k + β_k q_{k−1}

Proof. Given in [16, Theorem 1].
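As an executable counterpart to Table 2.1, the indexed CR column translates directly into the following Matlab sketch (our reconstruction, not the authors' implementation). It assumes A is spd and uses the relative-residual stopping test of Section 3 with α = 0:

    function x = cr(A, b, tol, maxit)
    % Conjugate Residual (CR) for spd A, following column 3 of Table 2.1.
    x = zeros(size(b));
    r = b;  s = A*r;
    rho = r'*s;  p = r;  q = s;         % rho_0, p_0, q_0
    for k = 1:maxit
        alpha = rho / (q'*q);           % alpha_k = rho_{k-1}/||q_{k-1}||^2
        x = x + alpha*p;
        r = r - alpha*q;
        if norm(r) <= tol*norm(b), break, end
        s = A*r;
        rhobar = rho;  rho = r'*s;      % rho_k = r_k' s_k
        beta = rho / rhobar;
        p = r + beta*p;
        q = s + beta*q;                 % maintains q_k = A*p_k with one product per iteration
    end
    end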

Theorem 2.2. The following properties hold for Algorithm CR:

(a) α_i ≥ 0
(b) β_i ≥ 0
(c) p_i^T q_j ≥ 0
(d) p_i^T p_j ≥ 0
(e) x_i^T p_j ≥ 0
(f) r_i^T p_j ≥ 0

Proof.

(a) Here we use the fact that A is spd. The inequalities are strict until i = ℓ (and r_ℓ = 0):

    ρ_i = r_i^T s_i = r_i^T A r_i ≥ 0   (A ≻ 0),    (2.2)
    α_i = ρ_{i−1}/‖q_{i−1}‖² ≥ 0.

(b) And again:

    β_i = ρ_i/ρ_{i−1} ≥ 0   (by (2.2)).

(c) Case I: i = j.

    p_i^T q_i = p_i^T A p_i ≥ 0   (A ≻ 0).

Case II: i − j = k > 0.

    p_i^T q_j = p_i^T q_{i−k} = r_i^T q_{i−k} + β_i p_{i−1}^T q_{i−k}
              = β_i p_{i−1}^T q_{i−k}   (by Thm 2.1 (b))
              ≥ 0,

where β_i ≥ 0 by (b) and p_{i−1}^T q_{i−k} ≥ 0 by induction, as (i−1) − (i−k) = k−1 < k.


Case III: j − i = k > 0.

    p_i^T q_j = p_i^T q_{i+k} = p_i^T A p_{i+k}
              = p_i^T A(r_{i+k} + β_{i+k} p_{i+k−1})
              = q_i^T r_{i+k} + β_{i+k} p_i^T q_{i+k−1}
              = β_{i+k} p_i^T q_{i+k−1}   (by Thm 2.1 (b))
              ≥ 0,

where β_{i+k} ≥ 0 by (b) and p_i^T q_{i+k−1} ≥ 0 by induction, as (i+k−1) − i = k−1 < k.

(d) At termination, define P ≡ span{p_0, p_1, …, p_{ℓ−1}} and Q ≡ span{q_0, …, q_{ℓ−1}}. By construction, P = span{b, Ab, …, A^{ℓ−1}b} and Q = span{Ab, …, A^ℓ b} (since q_i = A p_i). Again by construction, x_ℓ ∈ P, and since r_ℓ = 0 we have b = A x_ℓ ⇒ b ∈ Q. We see that P ⊆ Q. By Theorem 2.1(a), {q_i/‖q_i‖}, i = 0, …, ℓ−1, forms an orthonormal basis for Q. If we project p_i ∈ P ⊆ Q onto this basis, we have

    p_i = Σ_{k=0}^{ℓ−1} (p_i^T q_k / q_k^T q_k) q_k,

where all coordinates are nonnegative by (c). Similarly for any other p_j, j < ℓ. Therefore p_i^T p_j ≥ 0 for any i, j < ℓ.

(e) By construction,

    x_i = x_{i−1} + α_i p_{i−1} = ⋯ = Σ_{k=1}^{i} α_k p_{k−1}   (x_0 = 0).

Therefore x_i^T p_j ≥ 0 by (d) and (a).

(f) Note that any r_i can be expressed as a sum of the q_k:

    r_i = r_{i+1} + α_{i+1} q_i
        = ⋯
        = r_ℓ + α_ℓ q_{ℓ−1} + ⋯ + α_{i+1} q_i
        = α_ℓ q_{ℓ−1} + ⋯ + α_{i+1} q_i.

Thus we have

    r_i^T p_j = (α_ℓ q_{ℓ−1} + ⋯ + α_{i+1} q_i)^T p_j ≥ 0,

where the inequality follows from (a) and (c).

We are now able to prove our main theorem about the monotonic increase of ‖x_k‖ for CR and MINRES. A similar result was proved for CG by Steihaug [21].

Theorem 2.3. For CR (and hence MINRES) on an spd system Ax = b, ‖x_k‖ increases monotonically.

Proof. ‖x_i‖² − ‖x_{i−1}‖² = 2α_i x_{i−1}^T p_{i−1} + α_i² p_{i−1}^T p_{i−1} ≥ 0, where the inequality follows from Theorem 2.2 (a), (d), and (e). Therefore ‖x_i‖ ≥ ‖x_{i−1}‖.

‖x∗ − x_k‖ is known to be monotonic for CG [11]. The corresponding result for CR is a direct consequence of [11, Thm 7:5]. However, the second half of that theorem,


‖x∗ − x_{k−1}^C‖ > ‖x∗ − x_k^M‖, rarely holds in machine arithmetic. We give here an alternative proof that does not depend on CG.

Theorem 2.4. For CR (and hence MINRES) on an spd system Ax = b, the error ‖x∗ − x_k‖ decreases monotonically.

Proof. From the update rule for x_k, we can express the final solution x_ℓ = x∗ as

    x_ℓ = x_{ℓ−1} + α_ℓ p_{ℓ−1}
        = ⋯
        = x_k + α_{k+1} p_k + ⋯ + α_ℓ p_{ℓ−1}                       (2.3)
        = x_{k−1} + α_k p_{k−1} + α_{k+1} p_k + ⋯ + α_ℓ p_{ℓ−1}.    (2.4)

Using the last two equalities above, we can write

    ‖x_ℓ − x_{k−1}‖² − ‖x_ℓ − x_k‖²
        = (x_ℓ − x_{k−1})^T(x_ℓ − x_{k−1}) − (x_ℓ − x_k)^T(x_ℓ − x_k)
        = 2α_k p_{k−1}^T(α_{k+1} p_k + ⋯ + α_ℓ p_{ℓ−1}) + α_k² p_{k−1}^T p_{k−1}
        ≥ 0,

where the last inequality follows from Theorem 2.2 (a), (d).

The energy norm error ‖x∗ − x_k‖_A is monotonic for CG by design. The corresponding result for CR is given in [11, Thm 7:4]. We give an alternative proof here.

Theorem 2.5. For CR (and hence MINRES) on an spd system Ax = b, the error in energy norm ‖x∗ − x_k‖_A is strictly decreasing.

Proof. From (2.3) and (2.4), we can write

    ‖x_ℓ − x_{k−1}‖_A² − ‖x_ℓ − x_k‖_A²
        = (x_ℓ − x_{k−1})^T A(x_ℓ − x_{k−1}) − (x_ℓ − x_k)^T A(x_ℓ − x_k)
        = 2α_k p_{k−1}^T A(α_{k+1} p_k + ⋯ + α_ℓ p_{ℓ−1}) + α_k² p_{k−1}^T A p_{k−1}
        = 2α_k q_{k−1}^T(α_{k+1} p_k + ⋯ + α_ℓ p_{ℓ−1}) + α_k² q_{k−1}^T p_{k−1}
        > 0,

where the last inequality follows from Theorem 2.2 (a), (c).

3. Backward error analysis. For many physical problems requiring numerical solution, we are given inexact or uncertain input data (in this case A and/or b). It is not justifiable to seek a solution beyond the accuracy of the data [6]. Instead, it is more reasonable to stop an iterative solver once we know that the current approximate solution solves a nearby problem. The measure of "nearby" should match the error in the input data. The design of such stopping rules is an important application of backward error analysis.

For a consistent linear system Ax = b, we think of x_k as coming from the kth iteration of one of the iterative solvers. Following Titley-Peloquin [25] we say that x_k is an acceptable solution if and only if there exist perturbations E and f satisfying

    (A + E)x_k = b + f,    ‖E‖/‖A‖ ≤ α,    ‖f‖/‖b‖ ≤ β    (3.1)

for some tolerances α ≥ 0, β ≥ 0 that reflect the (preferably known) accuracy of the data. We are naturally interested in minimizing the size of E and f. If we define the optimization problem

    min_{ξ,E,f} ξ   s.t.   (A + E)x_k = b + f,    ‖E‖/‖A‖ ≤ αξ,    ‖f‖/‖b‖ ≤ βξ


to have optimal solution ξ_k, E_k, f_k (all functions of x_k, α, and β), we see that x_k is an acceptable solution if and only if ξ_k ≤ 1. We call ξ_k the normwise relative backward error (NRBE) for x_k.

With r_k = b − Ax_k, the optimal solution ξ_k, E_k, f_k is shown in [25] to be

    φ_k = β‖b‖ / (α‖A‖‖x_k‖ + β‖b‖),    E_k = ((1 − φ_k)/‖x_k‖²) r_k x_k^T,    (3.2)
    ξ_k = ‖r_k‖ / (α‖A‖‖x_k‖ + β‖b‖),    f_k = −φ_k r_k.                       (3.3)

(See [12, p. 12] for the case β = 0 and [12, §7.1 and p. 336] for the case α = β.)
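For illustration, (3.2)–(3.3) are cheap to evaluate; the following Matlab helper (ours, using the Frobenius norm for ‖A‖ as in Section 1.1) returns the NRBE and the perturbation norms used below:

    function [xi, normE, normf] = nrbe(A, b, x, alpha, beta)
    % Normwise relative backward error (3.2)-(3.3) for A*x = b.
    r = b - A*x;
    denom = alpha*norm(A,'fro')*norm(x) + beta*norm(b);
    phi = beta*norm(b) / denom;
    xi = norm(r) / denom;               % x is acceptable iff xi <= 1
    normE = (1 - phi)*norm(r)/norm(x);  % ||E_k||, cf. (3.5)
    normf = phi*norm(r);                % ||f_k||, cf. (3.6)
    end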

3.1. Stopping rule. For general tolerances α and β, the condition ξ_k ≤ 1 for x_k to be an acceptable solution becomes

    ‖r_k‖ ≤ α‖A‖‖x_k‖ + β‖b‖,    (3.4)

the stopping rule used in LSQR for consistent systems [19, p. 54, rule S1].

3.2. Monotonic backward errors. Of interest is the size of the perturbations to A and b for which x_k is an exact solution of Ax = b. From (3.2)–(3.3), the perturbations have the following norms:

    ‖E_k‖ = (1 − φ_k)‖r_k‖/‖x_k‖ = α‖A‖‖r_k‖ / (α‖A‖‖x_k‖ + β‖b‖),    (3.5)
    ‖f_k‖ = φ_k‖r_k‖ = β‖b‖‖r_k‖ / (α‖A‖‖x_k‖ + β‖b‖).                 (3.6)

Since ‖x_k‖ is monotonically increasing for CG and MINRES, we see from (3.2) that φ_k is monotonically decreasing for both solvers. Since ‖r_k‖ is monotonically decreasing for MINRES (but not for CG), we have the following result.

Theorem 3.1. Suppose α > 0 and β > 0 in (3.1). For CR and MINRES (but not CG), the relative backward errors ‖E_k‖/‖A‖ and ‖f_k‖/‖b‖ decrease monotonically.

Proof. This follows from (3.5)–(3.6), with ‖x_k‖ increasing for both solvers and ‖r_k‖ decreasing for CR and MINRES but not for CG.

4. Numerical results. Here we compare the convergence of CG and MINRES on various spd systems Ax = b and some associated indefinite systems (A − δI)x = b. The test examples are drawn from the University of Florida Sparse Matrix Collection (Davis [5]). We experimented with all 26 cases for which A is real spd and b is supplied. In Matlab we computed the condition number for each test matrix by finding the largest and smallest eigenvalues using eigs(A,1,'LM') and eigs(A,1,'SM') respectively. For this test set, the condition numbers range from 1.7E+03 to 3.1E+13.

Since A is spd, we apply diagonal preconditioning by redefining A and b as follows: d = diag(A), D = diag(1./sqrt(d)), A ← DAD, b ← Db, b ← b/‖b‖. Thus in the figures below we have diag(A) = I and ‖b‖ = 1. With this preconditioning, the condition numbers range from 1.2E+01 to 2.2E+11. The distribution of condition numbers of the test set matrices before and after preconditioning is shown in Figure 4.1.

The stopping rule used for CG and MINRES was (3.4) with α = 0 and β = 10⁻⁸. That is, ‖r_k‖ ≤ 10⁻⁸‖b‖ = 10⁻⁸ (but with a maximum of 5n iterations for spd systems and n iterations for indefinite systems).
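This setup can be reproduced with Matlab's built-in solvers (a sketch under the assumption that A and b have already been loaded from the collection); with α = 0, rule (3.4) coincides with their relative-residual test:

    n = size(A,1);
    d = diag(A);
    D = spdiags(1./sqrt(d), 0, n, n);    % diagonal preconditioning
    A = D*A*D;  b = D*b;  b = b/norm(b);
    [xc, flagc, relc, itc] = pcg(A, b, 1e-8, 5*n);     % CG
    [xm, flagm, relm, itm] = minres(A, b, 1e-8, 5*n);  % MINRES
    fprintf('CG: %d iterations, MINRES: %d iterations\n', itc, itm);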


[Figure 4.1: condition numbers Cond(A) (log scale, 10⁰ to 10¹⁴) plotted against matrix ID for the test set, original and preconditioned.]

Fig. 4.1. Distribution of condition number for matrices used for CG vs MINRES comparison, before and after diagonal preconditioning.

4.1. Positive-definite systems. In plotting backward errors, we assume for simplicity that α > 0 and β = 0 in (3.1)–(3.3), even though this doesn't match the choice of α and β in the stopping rule (3.4). This gives φ_k = 0 and ‖E_k‖ = ‖r_k‖/‖x_k‖ in (3.5). Thus, as in Theorem 3.1, we expect ‖E_k‖ to decrease monotonically for CR and MINRES but not for CG.

In Figures 4.2 and 4.3, we plot ‖r_k‖/‖x_k‖, ‖x∗ − x_k‖_A, and ‖x∗ − x_k‖ for CG and MINRES for four different problems. For CG, the plots confirm that ‖x∗ − x_k‖_A and ‖x∗ − x_k‖ are monotonic. For MINRES, the plots confirm the prediction of Theorems 3.1, 2.5, and 2.4 that ‖r_k‖/‖x_k‖, ‖x∗ − x_k‖_A, and ‖x∗ − x_k‖ are monotonic.

Figure 4.2 (left) shows problem Schenk_AFE_af_shell8 with A of size 504855 × 504855 and cond(A) = 2.7E+05. From the plot of backward errors ‖r_k‖/‖x_k‖, we see that both CG and MINRES converge quickly in the early iterations. Then the backward error of MINRES plateaus at about iteration 80, and the backward error of CG stays about one order of magnitude behind MINRES. A similar phenomenon of fast convergence in the early iterations followed by slow convergence is observed in the energy norm error and 2-norm error plots.

Figure 4.2 (right) shows problem Cannizzo_sts4098 with A of size 4098 × 4098 and cond(A) = 6.7E+03. MINRES converges slightly faster in terms of backward error, while CG converges slightly faster in terms of both error norms.


[Figure 4.2: six panels (two problems × three measures) plotting log10(‖r_k‖/‖x_k‖), log10 ‖x_k − x∗‖_A, and log10 ‖x_k − x∗‖ against iteration count for CG and MINRES.]

Fig. 4.2. Comparison of backward and forward errors for CG and MINRES solving two spd systems Ax = b.
Left: Problem Schenk_AFE_af_shell8 with n = 504855 and cond(A) = 2.7E+05. Note that MINRES stops significantly sooner than CG with α = 0 and β = 10⁻⁸ in (3.4).
Right: Cannizzo_sts4098 with n = 4098 and cond(A) = 6.7E+03. MINRES stops slightly sooner than CG.
Top: The values of log10(‖r_k‖/‖x_k‖) are plotted against iteration number k. These values define log10 ‖E_k‖ when the stopping tolerances in (3.4) are α > 0 and β = 0.
Middle: The values of log10 ‖x∗ − x_k‖_A are plotted against iteration number k. This is the quantity that CG minimizes at each iteration.
Bottom: The values of log10 ‖x∗ − x_k‖.


[Figure 4.3: six panels in the same layout as Figure 4.2, for problems Simon_raefsky4 and BenElechi_BenElechi1.]

Fig. 4.3. Comparison of backward and forward errors for CG and MINRES solving two more spd systems Ax = b.
Left: Problem Simon_raefsky4, with n = 19779 and cond(A) = 2.2E+11.
Right: BenElechi_BenElechi1, with n = 245874 and cond(A) = 1.8E+09.
Top: The values of log10(‖r_k‖/‖x_k‖) are plotted against iteration number k. These values define log10 ‖E_k‖ when the stopping tolerances in (3.4) are α > 0 and β = 0.
Middle: The values of log10 ‖x∗ − x_k‖_A are plotted against iteration number k. This is the quantity that CG minimizes at each iteration.
Bottom: The values of log10 ‖x∗ − x_k‖.


Figure 4.3 (left) shows problem Simon_raefsky4 with A of size 19779 × 19779 and cond(A) = 2.2E+11. Because of the high condition number, both algorithms hit the 5n iteration limit. We see that the backward error for MINRES converges faster than for CG, as expected. For the energy norm error, CG is able to decrease over 5 orders of magnitude, while MINRES plateaus after a decrease of 2 orders of magnitude. For both the energy norm error and the 2-norm error, MINRES reaches a lower point than CG for some iterations. This must be due to numerical error in CG and MINRES (a result of loss of orthogonality in V_k).

Figure 4.3 (right) shows problem BenElechi_BenElechi1 with A of size 245874 × 245874 and cond(A) = 1.8E+09. The backward error of MINRES stays ahead of CG by 2 orders of magnitude for most iterations. Around iteration 32000, the backward error of both algorithms goes down rapidly and CG catches up with MINRES. Both algorithms exhibit a plateau in energy norm error for the first 20000 iterations. The error norms for CG start decreasing around iteration 20000 and decrease even faster after iteration 30000.

Figure 4.4 shows ‖r_k‖ and ‖x_k‖ for CG and MINRES on two typical spd examples. We see that ‖x_k‖ is monotonically increasing for both solvers, and the ‖x_k‖ values rise fairly rapidly to their limiting value ‖x∗‖, with a moderate delay for MINRES.

Figure 4.5 shows ‖r_k‖ and ‖x_k‖ for CG and MINRES on two spd examples in which the residual decrease and the solution norm increase are somewhat slower than typical. The rise of ‖x_k‖ for MINRES is rather more delayed. In the second case, if the stopping tolerance were β = 10⁻⁶ rather than β = 10⁻⁸, the final MINRES ‖x_k‖ (k ≈ 10000) would be less than half the exact value ‖x∗‖. It will be of future interest to evaluate this effect within the context of trust-region methods for optimization.

4.1.1. Why does ‖r_k‖ for CG lag behind MINRES? It is commonly thought that even though MINRES is known to minimize ‖r_k‖ at each iteration, the cumulative minimum of ‖r_k‖ for CG should approximately match that of MINRES. That is,

    min_{0≤i≤k} ‖r_i^C‖ ≈ ‖r_k^M‖.

However, in Figures 4.2 and 4.3 we see that ‖r_k‖ for MINRES is often smaller than for CG by 1 or 2 orders of magnitude. This phenomenon can be explained by the following relation between ‖r_k^C‖ and ‖r_k^M‖ [10, Lemma 5.4.1], [26]:

    ‖r_k^C‖ = ‖r_k^M‖ / √(1 − ‖r_k^M‖²/‖r_{k−1}^M‖²).    (4.1)

From (4.1), one can infer that if ‖r_k^M‖ decreases substantially between iterations k−1 and k, then ‖r_k^C‖ is roughly the same as ‖r_k^M‖. The converse also holds: ‖r_k^C‖ is much larger than ‖r_k^M‖ if MINRES is almost stalling at iteration k (that is, ‖r_k^M‖ did not decrease much relative to the previous iteration). The above analysis was pointed out by Titley-Peloquin [26] in comparing LSQR and LSMR [8]. We repeat the analysis here for CG vs MINRES and extend it to demonstrate why there is a lag in general for large problems.
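As a concrete illustration with made-up numbers: if MINRES reduces the residual by 50% at iteration k, then (4.1) gives ‖r_k^C‖ = ‖r_k^M‖/√(1 − 0.25) ≈ 1.15 ‖r_k^M‖, whereas if MINRES is stalling and reduces it by only 1%, then ‖r_k^C‖ = ‖r_k^M‖/√(1 − 0.99²) ≈ 7.1 ‖r_k^M‖.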

With α = 0 in stopping rule (3.4), CG and MINRES stop when ‖r_k‖ ≤ β‖b‖. If this occurs at iteration ℓ, we have

    Π_{k=1}^{ℓ} ‖r_k‖/‖r_{k−1}‖ = ‖r_ℓ‖/‖b‖ ≈ β.


[Figure 4.4: four panels plotting log10 ‖r_k‖ (top) and ‖x_k‖ (bottom) against iteration count for CG and MINRES.]

Fig. 4.4. Comparison of residual and solution norms for CG and MINRES solving two spd systems Ax = b. These are typical examples.
Left: Problem Simon_olafu with n = 16146 and cond(A) = 4.3E+08.
Right: Problem Cannizzo_sts4098 with n = 4098 and cond(A) = 6.7E+03.
Top: The values of log10 ‖r_k‖ are plotted against iteration number k.
Bottom: The values of ‖x_k‖ are plotted against k. The solution norms grow somewhat faster for CG than for MINRES. Both reach the limiting value ‖x∗‖ significantly before x_k is close to x∗.

Thus on average, ‖r_k^M‖/‖r_{k−1}^M‖ will be closer to 1 if ℓ is large. This means that the larger ℓ is (in absolute terms), the more CG will lag behind MINRES (a bigger gap between ‖r_k^C‖ and ‖r_k^M‖).

4.2. Indefinite systems. A key part of Steihaug's trust-region method for large-scale unconstrained optimization [21] (see also [4]) is his proof that when CG is applied to a symmetric (possibly indefinite) system Ax = b, the solution norms ‖x_1‖, …, ‖x_k‖ are strictly increasing as long as p_j^T A p_j > 0 for all iterations 1 ≤ j ≤ k. (We are using the notation in Table 2.1.)


[Figure 4.5: four panels plotting log10 ‖r_k‖ (top) and ‖x_k‖ (bottom) against iteration count for CG and MINRES.]

Fig. 4.5. Comparison of residual and solution norms for CG and MINRES solving two more spd systems Ax = b. Sometimes the solution norms take longer to reach the limiting value ‖x∗‖.
Left: Problem Schmid_thermal1 with n = 82654 and cond(A) = 3.0E+05.
Right: Problem BenElechi_BenElechi1 with n = 245874 and cond(A) = 1.8E+09.
Top: The values of log10 ‖r_k‖ are plotted against iteration number k.
Bottom: The values of ‖x_k‖ are plotted against k. Again the solution norms grow faster for CG.

From our proof of Theorem 2.2, we see that the same property holds for CR and MINRES as long as both p_j^T A p_j > 0 and r_j^T A r_j > 0 for all iterations 1 ≤ j ≤ k. In case future research finds that MINRES is a useful solver in the trust-region context, it is of interest now to offer some empirical results about the behavior of ‖x_k‖ when MINRES is applied to indefinite systems.


[Figure 4.6: two panels plotting ‖x_k‖ (left) and ‖r_k‖/‖x_k‖ (right) against iteration number for MINRES on problem (4.2).]

Fig. 4.6. For MINRES on the indefinite problem (4.2), ‖x_k‖ and the backward error ‖r_k‖/‖x_k‖ are both slightly non-monotonic.

First, on the nonsingular indefinite system

    [ 2  1  1 ]       [ 0 ]
    [ 1  0  1 ]  x =  [ 1 ],    (4.2)
    [ 1  1  2 ]       [ 1 ]

MINRES gives non-monotonic solution norms, as shown in the left plot of Figure 4.6. The decrease in ‖x_k‖ implies that the backward errors ‖r_k‖/‖x_k‖ may not be monotonic, as illustrated in the right plot.
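This example is easy to check numerically. The following Matlab fragment (our illustration) extracts the three MINRES iterates by re-running the built-in solver with increasing iteration limits:

    A = [2 1 1; 1 0 1; 1 1 2];  b = [0; 1; 1];
    for k = 1:3
        [xk, flag] = minres(A, b, eps, k);   % maxit = k stops after k iterations
        rk = b - A*xk;
        fprintf('k = %d:  ||x_k|| = %.4f,  ||r_k||/||x_k|| = %.4f\n', ...
                k, norm(xk), norm(rk)/norm(xk));
    end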

More generally, we can gain an impression of the behavior of ‖x_k‖ by recalling from Choi et al. [3] the connection between MINRES and MINRES-QLP. Both methods compute the iterates x_k^M = V_k y_k^M in (2.1) from the subproblems

    y_k^M = arg min_{y ∈ R^k} ‖T_k y − β_1 e_1‖,   and possibly   T_ℓ y_ℓ^M = β_1 e_1.

When A is nonsingular or Ax = b is consistent (which we now assume), y_k^M is uniquely defined for each k ≤ ℓ and the methods compute the same iterates x_k^M (but by different numerical methods). In fact they both compute the expanding QR factorizations

    Q_k [ T_k   β_1 e_1 ] = [ R_k   t_k ]
                            [  0    φ_k ]

(with R_k upper tridiagonal), and MINRES-QLP also computes the orthogonal factorizations R_k P_k = L_k (with L_k lower tridiagonal), from which the kth solution estimate is defined by W_k = V_k P_k, L_k u_k = t_k, and x_k^M = W_k u_k. As shown in [3, §5.3], the construction of these quantities is such that the first k − 3 columns of W_k are the same as in W_{k−1}, and the first k − 3 elements of u_k are the same as in u_{k−1}. Since W_k has orthonormal columns, ‖x_k^M‖ = ‖u_k‖, where the first k − 2 elements of u_k are unaltered by later iterations. As shown in [3, §6.5], this means that certain quantities can be cheaply updated to give norm estimates of the form

    χ² ← χ² + μ_{k−2}²,    ‖x_k^M‖² = χ² + μ_{k−1}² + μ_k²,

where it is clear that χ² increases monotonically. Although the last two terms are of unpredictable size, ‖x_k^M‖² tends to be dominated by the monotonic term χ², and we can expect that ‖x_k^M‖ will be approximately monotonic as k increases from 1 to ℓ.


[Figure 4.7: four panels plotting log10 ‖r_k‖ (top) and ‖x_k‖ (bottom) against iteration count for MINRES.]

Fig. 4.7. Residual norms and solution norms when MINRES is applied to two indefinite systems (A − δI)x = b, where A is each of the spd matrices used in Figure 4.4 and δ = 0.5 is large enough to make the systems indefinite.
Left: Problem Simon_olafu with n = 16146.
Right: Problem Cannizzo_sts4098 with n = 4098.
Top: The values of log10 ‖r_k‖ are plotted against iteration number k for the first n iterations.
Bottom left: The values of ‖x_k‖ are plotted against k. During the n = 16146 iterations, ‖x_k‖ increased 83% of the time and the backward errors ‖r_k‖/‖x_k‖ (not shown) decreased 96% of the time.
Bottom right: During the n = 4098 iterations, ‖x_k‖ increased 90% of the time and the backward errors ‖r_k‖/‖x_k‖ (not shown) decreased 98% of the time.

Experimentally we find that for most MINRES iterations on an indefinite problem, ‖x_k‖ does increase. To obtain indefinite examples that were sensibly scaled, we used the four spd (A, b) cases in Figures 4.4–4.5, applied diagonal scaling as before, and solved (A − δI)x = b with δ = 0.5, where A and b are now the scaled versions (so that diag(A) = I). The number of iterations increased significantly but was limited to n.

Figure 4.7 shows log10 ‖r_k‖ and ‖x_k‖ for the first two cases (where A is one of the spd matrices in Figure 4.4). The values of ‖x_k‖ are essentially monotonic. The backward errors ‖r_k‖/‖x_k‖ (not shown) were even closer to being monotonic (at least for the first n iterations).


[Figure 4.8: four panels plotting log10 ‖r_k‖ (top) and ‖x_k‖ (bottom) against iteration count for MINRES.]

Fig. 4.8. Residual norms and solution norms when MINRES is applied to two indefinite systems (A − δI)x = b, where A is each of the spd matrices used in Figure 4.5 and δ = 0.5 is large enough to make the systems indefinite.
Left: Problem Schmid_thermal1 with n = 82654.
Right: Problem BenElechi_BenElechi1 with n = 245874.
Top: The values of log10 ‖r_k‖ are plotted against iteration number k for the first n iterations.
Bottom left: The values of ‖x_k‖ are plotted against k. There is a mild but clear decrease in ‖x_k‖ over an interval of about 10000 iterations. During the n = 82654 iterations, ‖x_k‖ increased 83% of the time and the backward errors ‖r_k‖/‖x_k‖ (not shown) decreased 91% of the time.
Bottom right: The solution norms and backward errors are essentially monotonic. During the n = 245874 iterations, ‖x_k‖ increased 88% of the time and the backward errors ‖r_k‖/‖x_k‖ (not shown) decreased 95% of the time.

Figure 4.8 shows ‖x_k‖ and log10 ‖r_k‖ for the second two cases (where A is one of the spd matrices in Figure 4.5). The left example reveals a definite period of decrease in ‖x_k‖. Nevertheless, during the n = 82654 iterations, ‖x_k‖ increased 83% of the time and the backward errors ‖r_k‖/‖x_k‖ decreased 91% of the time. The right example is more like those in Figure 4.7. During the n = 245874 iterations, ‖x_k‖ increased 88% of the time, the backward errors ‖r_k‖/‖x_k‖ decreased 95% of the time, and any nonmonotonicity was very slight.


Table 5.1
Comparison of CG and MINRES properties on an spd system Ax = b.

                  CG                      MINRES
‖x_k‖             ↗ [21, Thm 2.1]         ↗ (Thm 2.3)
‖x∗ − x_k‖        ↘ [11, Thm 4:3]         ↘ (Thm 2.4) [11, Thm 7:5]
‖x∗ − x_k‖_A      ↘ [11, Thm 6:3]         ↘ (Thm 2.5) [11, Thm 7:4]
‖r_k‖             not monotonic           ↘ [18] [11, Thm 7:2]
‖r_k‖/‖x_k‖       not monotonic           ↘ (Thm 3.1)

↗ monotonically increasing; ↘ monotonically decreasing.

Table 5.2
Comparison of LSQR and LSMR properties on min ‖Ax − b‖.

                  LSQR                    LSMR
‖x_k‖             ↗ [7, Thm 3.3.1]        ↗ [7, Thm 3.3.6]
‖x∗ − x_k‖        ↘ [7, Thm 3.3.2]        ↘ [7, Thm 3.3.7]
‖r∗ − r_k‖        ↘ [7, Thm 3.3.3]        ↘ [7, Thm 3.3.8]
‖A^T r_k‖         not monotonic           ↘ [8, §3.2]
‖r_k‖             ↘ [19, §5.1]            ↘ [7, Thm 3.3.11]

(‖A^T r_k‖/‖r_k‖)_LSQR ≥ (‖A^T r_k‖/‖r_k‖)_LSMR.
x_k converges to the minimum-norm x∗ for singular systems.

5. Conclusions. For full-rank least-squares problems min ‖Ax − b‖, the solvers LSQR [19, 20] and LSMR [8, 14] are equivalent to CG and MINRES on the (spd) normal equation A^T Ax = A^T b. Comparisons in [8] indicated that LSMR can often stop much sooner than LSQR when the stopping rule is based on Stewart's backward error norm ‖A^T r_k‖/‖r_k‖ for least-squares problems [22].

Our theoretical and experimental results here provide analogous evidence that MINRES can often stop much sooner than CG on spd systems when the stopping rule is based on the backward error ‖r_k‖/‖x_k‖ for Ax = b (or the more general backward errors in Theorem 3.1). In some cases, MINRES can converge faster than CG by as much as 2 orders of magnitude (Figure 4.3). On the other hand, CG converges somewhat faster than MINRES in terms of both ‖x∗ − x_k‖_A and ‖x∗ − x_k‖ (same figure). For spd systems, Table 5.1 summarizes properties that were already known from Hestenes and Stiefel [11] and Steihaug [21], along with the two additional properties of MINRES that we proved here (Theorems 2.3 and 3.1).

These theorems and experiments on CG and MINRES are part of the first author's PhD thesis [7], which also discusses LSQR and LSMR and derives some new results for both solvers. Table 5.2 summarizes the known results for LSQR and LSMR (in [19] and [8] respectively) and the newly derived properties of both solvers (in [7]).

Acknowledgements. We kindly thank Professor Mehiddin Al-Baali and other colleagues for organizing the Second International Conference on Numerical Analysis and Optimization at Sultan Qaboos University (January 3–6, 2011, Muscat, Sultanate of Oman). Their wish to publish some of the conference papers in a special issue of SQU Journal for Science gave added motivation for this research. We also thank Dr Sou-Cheng Choi for many helpful discussions of the iterative solvers, and Michael Friedlander for a final welcome tip.


REFERENCES

[1] M. Arioli, A stopping criterion for the conjugate gradient algorithm in a finite element method framework, Numer. Math., 97 (2004), pp. 1–24.

[2] S.-C. Choi, Iterative Methods for Singular Linear Equations and Least-Squares Problems, PhD thesis, Stanford University, Stanford, CA, December 2006.

[3] S.-C. Choi, C. C. Paige, and M. A. Saunders, MINRES-QLP: a Krylov subspace method for indefinite or singular symmetric systems, SIAM J. Sci. Comput., 33 (2011), pp. 1810–1836.

[4] A. R. Conn, N. I. M. Gould, and Ph. L. Toint, Trust-Region Methods, vol. 1, SIAM, Philadelphia, 2000.

[5] T. A. Davis, University of Florida Sparse Matrix Collection. http://www.cise.ufl.edu/research/sparse/matrices.

[6] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users' Guide, SIAM, Philadelphia, 1979.

[7] D. C.-L. Fong, Minimum-Residual Methods for Sparse Least-Squares Using Golub-Kahan Bidiagonalization, PhD thesis, Stanford University, Stanford, CA, December 2011.

[8] D. C.-L. Fong and M. A. Saunders, LSMR: An iterative algorithm for sparse least-squares problems, SIAM J. Sci. Comput., 33 (2011), pp. 2950–2971.

[9] R. W. Freund, G. H. Golub, and N. M. Nachtigal, Iterative solution of linear systems, Acta Numerica, 1 (1992), pp. 57–100.

[10] A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM, Philadelphia, 1997.

[11] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards, 49 (1952), pp. 409–436.

[12] N. J. Higham, Accuracy and Stability of Numerical Algorithms, SIAM, Philadelphia, second ed., 2002.

[13] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Nat. Bur. Standards, 45 (1950), pp. 255–282.

[14] LSMR software for linear systems and least squares. http://www.stanford.edu/group/SOL/software.html.

[15] D. G. Luenberger, Hyperbolic pairs in the method of conjugate gradients, SIAM J. Appl. Math., 17 (1969), pp. 1263–1267.

[16] D. G. Luenberger, The conjugate residual method for constrained minimization problems, SIAM J. Numer. Anal., 7 (1970), pp. 390–398.

[17] G. A. Meurant, The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations, vol. 19 of Software, Environments, and Tools, SIAM, Philadelphia, 2006.

[18] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975), pp. 617–629.

[19] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Softw., 8 (1982), pp. 43–71.

[20] C. C. Paige and M. A. Saunders, Algorithm 583; LSQR: Sparse linear equations and least-squares problems, ACM Trans. Math. Softw., 8 (1982), pp. 195–209.

[21] T. Steihaug, The conjugate gradient method and trust regions in large scale optimization, SIAM J. Numer. Anal., 20 (1983), pp. 626–637.

[22] G. W. Stewart, Research, development and LINPACK, in Mathematical Software III, J. R. Rice, ed., Academic Press, New York, 1977, pp. 1–14.

[23] E. Stiefel, Relaxationsmethoden bester Strategie zur Lösung linearer Gleichungssysteme, Comm. Math. Helv., 29 (1955), pp. 157–179.

[24] Y. Sun, The Filter Algorithm for Solving Large-Scale Eigenproblems from Accelerator Simulations, PhD thesis, Stanford University, Stanford, CA, March 2003.

[25] D. Titley-Peloquin, Backward Perturbation Analysis of Least Squares Problems, PhD thesis, School of Computer Science, McGill University, 2010.

[26] D. Titley-Peloquin, Convergence of backward error bounds in LSQR and LSMR. Private communication, 2011.

[27] H. A. van der Vorst, Iterative Krylov Methods for Large Linear Systems, Cambridge University Press, Cambridge, first ed., 2003.

[28] D. S. Watkins, Fundamentals of Matrix Computations, Pure and Applied Mathematics, Wiley, Hoboken, NJ, third ed., 2010.