
Regularization of Total Least Squares Problems

Heinrich Voss (voss@tu-harburg.de)

Hamburg University of Technology, Institute of Numerical Simulation


Outline

1 Total Least Squares Problems

2 Numerical solution of TLS problems

3 Regularization of TLS problems (RTLSQEP)

4 Nonlinear maxmin characterization

5 RTLSQEP: Numerical considerations

6 Numerical example

7 Regularization of TLS problems (RTLSLEP)

8 Determining the regularization parameter



Total Least Squares Problems

The ordinary Least Squares (LS) method assumes that the system matrix A of a linear model is error free, and that all errors are confined to the right-hand side b.

However, in engineering applications this assumption is often unrealistic. Many problems in data estimation lead to linear systems where both the matrix A and the right-hand side b are contaminated by noise, for example if A itself is only available from measurements or if A is an idealized approximation of the true operator.

If the true values of the observed variables satisfy linear relations, and if the errors in the observations are independent random variables with zero mean and equal variance, then the total least squares (TLS) approach often gives better estimates than LS.

Given A ∈ R^{m×n}, b ∈ R^m, m ≥ n.

Find ∆A ∈ R^{m×n}, ∆b ∈ R^m and x ∈ R^n such that

‖[∆A, ∆b]‖_F² = min!   subject to   (A + ∆A)x = b + ∆b,   (1)

where ‖ · ‖_F denotes the Frobenius norm.


Total Least Squares Problems cnt.

Although the name “total least squares” was introduced only recently in the literature by Golub and Van Loan (1980), this fitting method is not new and has a long history in the statistical literature, where it is known as orthogonal regression, errors-in-variables, or measurement errors, and in image deblurring as blind deconvolution.

The univariate problem (n = 1) was already discussed by Adcock (1877), and it has been rediscovered many times, often independently.

About 30 – 40 years ago, the technique was extended by Sprent (1969) and Gleser (1981) to the multivariate case (n > 1).

More recently, the total least squares method also stimulated interest outside statistics. In numerical linear algebra it was first studied by Golub and Van Loan (1980). Their analysis and their algorithm are based on the singular value decomposition.



TLS analysis

If b + ∆b is in the range of A + ∆A, then there is a vector x ∈ R^n such that

(A + ∆A)x = b + ∆b,

i.e.

([A, b] + [∆A, ∆b]) [x; −1] = 0.   (2)

Let the singular value decomposition of C := [A, b] be given by

[A, b] = UΣV^T   (3)

with U ∈ R^{m×(n+1)}, V ∈ R^{(n+1)×(n+1)} having orthonormal columns and Σ = diag(σ_1, ..., σ_{n+1}), where

σ_1 ≥ σ_2 ≥ · · · ≥ σ_k > σ_{k+1} = · · · = σ_{n+1}.

Then

σ_{n+1} = min {‖[∆A, ∆b]‖_F : rank([A + ∆A, b + ∆b]) < n + 1}   (4)

and the minimum is attained by [∆A, ∆b] = −[A, b]vv^T, where v is any unit vector in the right singular space

S := span{v_{k+1}, ..., v_{n+1}}   (5)

of [A, b] corresponding to the smallest singular value σ_{n+1}.
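A quick numerical illustration of (4) and (5) (not from the slides; the random data are made up) checks that the perturbation [∆A, ∆b] = −[A, b]vv^T has Frobenius norm σ_{n+1} and lowers the rank of the augmented matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 40, 6
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    C = np.column_stack([A, b])
    U, s, VT = np.linalg.svd(C)
    v = VT[-1]                                   # unit right singular vector for sigma_{n+1}

    dC = -C @ np.outer(v, v)                     # [dA, db] = -[A, b] v v^T
    print(np.linalg.norm(dC, 'fro'), s[-1])      # equal: the minimum in (4)
    print(np.linalg.matrix_rank(C + dC), n)      # rank has dropped below n + 1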

TLS analysis cnt.

Denote by S the set of right singular vectors corresponding to σ_{n+1}. Suppose that there exists a vector v ∈ S having the form

v = [y; α],   y ∈ R^n,   α ≠ 0.

If

[x; −1] := −(1/α) v   and   [∆A, ∆b] = −[A, b]vv^T,   (6)

then

([A, b] + [∆A, ∆b]) [x; −1] = [A, b](I − vv^T)(−v/α) = 0,

and x solves the TLS problem.

If e_{n+1} = (0, ..., 0, 1)^T is orthogonal to S, then the TLS problem has no solution.
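An illustrative sketch of the construction (6) on made-up data: in the generic case the last component of v is nonzero, and the recovered x satisfies the corrected system exactly:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 6))
    b = rng.standard_normal(40)

    C = np.column_stack([A, b])
    v = np.linalg.svd(C)[2][-1]          # right singular vector for sigma_{n+1}
    assert abs(v[-1]) > 1e-12            # generic case: alpha != 0

    x = -v[:-1] / v[-1]                  # from (x; -1) = -(1/alpha) v
    dC = -C @ np.outer(v, v)
    dA, db = dC[:, :-1], dC[:, -1]
    print(np.linalg.norm((A + dA) @ x - (b + db)))   # ~1e-15: (A + dA)x = b + db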


TLS analysis cnt.

If σ_{n+1} is a repeated singular value of [A, b], then the TLS problem may lack a unique solution. However, in this case it is possible to single out a unique minimum norm TLS solution.

Let Q be an orthogonal matrix of order n − k + 1 with the property that

[v_{k+1}, ..., v_{n+1}] Q = [W  y; 0  α],   y ∈ R^n,   (7)

then x_TLS := −y/α is the minimum norm TLS solution.

A unique solution can only occur if k = n, i.e. if σ_{n+1} is simple, and in the case k = n there exists a unique solution if the last component of v_{n+1} is different from 0.

Obviously, if σ'_n is the smallest singular value of A and σ_{n+1} < σ'_n, then the TLS solution is unique; for σ_{n+1} = σ'_n it is not.


Uniquely solvable case

Theorem. The TLS problem has a unique solution if and only if

σ_{n+1} = σ_min([A, b]) < σ_min(A) = σ'_n.   (8)

If the TLS problem is uniquely solvable then a closed form solution is

x_TLS = (A^T A − σ_{n+1}² I)^{−1} A^T b,   (9)

which looks similar to the normal equations for LS problems.

Comparing (9) to the normal equations for RLS problems with Tikhonov regularization, we observe some kind of deregularization in the TLS case.

This deregularization results in a larger norm of the solution, i.e. the following relation holds:

‖x_TLS‖ ≥ ‖x_LS‖ ≥ ‖x_λ‖   (10)

where x_λ is the solution of the Tikhonov regularized problem for some λ ≥ 0.
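A minimal sketch (made-up data) checking the closed form (9) against the SVD construction:

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((50, 8))
    b = rng.standard_normal(50)
    n = A.shape[1]

    s, VT = np.linalg.svd(np.column_stack([A, b]))[1:]
    x_svd = -VT[-1, :-1] / VT[-1, -1]                # TLS solution from the SVD

    # closed form (9): (A^T A - sigma_{n+1}^2 I)^{-1} A^T b
    x_cf = np.linalg.solve(A.T @ A - s[-1]**2 * np.eye(n), A.T @ b)
    print(np.linalg.norm(x_svd - x_cf))              # ~1e-12: both agree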


Uniquely solvable case cnt.

The inequalities (10) can be obtained directly from the SVD A = U'Σ'V'^T:

‖ ∑_{i=1}^{n} [σ'_i (b^T u'_i)/(σ'_i² − σ_{n+1}²)] v'_i ‖ ≥ ‖ ∑_{i=1}^{n} [(b^T u'_i)/σ'_i] v'_i ‖ ≥ ‖ ∑_{i=1}^{n} [σ'_i (b^T u'_i)/(σ'_i² + λ)] v'_i ‖,   (11)

where the three sums are the expansions of x_TLS, x_LS and x_λ, respectively.

If σ_{n+1} ≠ 0 the strict inequality ‖x_TLS‖ > ‖x_LS‖ holds; otherwise the system Ax = b is consistent.
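The ordering (10) is easy to observe numerically; in this sketch λ = 0.1 is an arbitrary Tikhonov parameter and the data are made up:

    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((50, 8))
    b = rng.standard_normal(50)
    n = A.shape[1]

    sigma = np.linalg.svd(np.column_stack([A, b]), compute_uv=False)[-1]
    x_tls = np.linalg.solve(A.T @ A - sigma**2 * np.eye(n), A.T @ b)
    x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
    x_tik = np.linalg.solve(A.T @ A + 0.1 * np.eye(n), A.T @ b)

    # printed norms come out decreasing, as in (10)
    print(np.linalg.norm(x_tls), np.linalg.norm(x_ls), np.linalg.norm(x_tik))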


Alternative characterization

Let

f(x) := ‖Ax − b‖² / (1 + ‖x‖²),   x ∈ R^n.   (12)

It holds that

σ_{n+1}² = min_{v ≠ 0} (v^T [A, b]^T [A, b] v)/(v^T v) ≤ f(x) = ([x^T, −1] [A, b]^T [A, b] [x^T, −1]^T)/(1 + ‖x‖²).

If the TLS problem is solvable then σ_{n+1}² = f(x_TLS); if the TLS problem is not solvable, then there exists x ∈ R^n, ‖x‖ = 1, with ‖Ax‖ = σ'_n = σ_{n+1}, and it follows that

lim_{k→∞} f(kx) = lim_{k→∞} ‖kAx − b‖²/(1 + k²) = ‖Ax‖² = σ_{n+1}².

Thus the solution of the TLS problem can be characterized by the minimization problem

x_TLS = argmin_{x ∈ R^n} ‖Ax − b‖²/(1 + ‖x‖²).   (13)
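The characterization (12)/(13) can be verified the same way: f attains σ_{n+1}² at x_TLS and larger values nearby (illustrative sketch on made-up data):

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((50, 8))
    b = rng.standard_normal(50)

    f = lambda x: np.sum((A @ x - b)**2) / (1 + np.sum(x**2))

    U, s, VT = np.linalg.svd(np.column_stack([A, b]))
    x_tls = -VT[-1, :-1] / VT[-1, -1]
    print(f(x_tls), s[-1]**2)                        # equal: the minimum of (13)
    print(f(x_tls + 0.01 * rng.standard_normal(8)))  # slightly larger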


Numerical solution of TLS problems

Newton's method

In the generic case the TLS solution x = x_TLS satisfies the system of equations

[A^T A  A^T b; b^T A  b^T b] [x; −1] = λ [x; −1]   (14)

where λ = σ_{n+1}² is the square of the smallest singular value of [A, b].

Hence, (λ, (x^T, −1)^T) is the eigenpair of the symmetric matrix [A, b]^T [A, b] corresponding to its smallest eigenvalue. A widely used method for computing it is inverse iteration, where it is natural in this case to normalize the eigenvector approximation such that its last component is −1.

Inverse iteration is equivalent to Newton's method for the system of n + 1 nonlinear equations in x and λ

[f(x, λ); g(x, λ)] := [−A^T r − λx; −b^T r + λ] = 0   (15)

with r := b − Ax.
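In other words, (λ, (x^T, −1)^T) can be read off from the smallest eigenpair of [A, b]^T [A, b]; a short sanity check of (14) on made-up data:

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((40, 6))
    b = rng.standard_normal(40)

    C = np.column_stack([A, b])
    evals, evecs = np.linalg.eigh(C.T @ C)       # eigenvalues in ascending order
    lam, v = evals[0], evecs[:, 0]               # smallest eigenpair, lam = sigma_{n+1}^2
    v = v / (-v[-1])                             # normalize the last component to -1
    print(np.linalg.norm(C.T @ C @ v - lam * v)) # ~1e-13: (14) holds
    x_tls = v[:-1]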


Newton's method cnt.

The Jacobian matrix of this system is

J̃ = [A^T A − λI  −x; b^T A  1] =: [J  −x; b^T A  1]   (16)

with J = A^T A − σ²I, λ = σ².

We assume in the following that σ < σ'_n, so that J is positive definite.

Let the current approximation be (x^k, λ^{(k)}). Then the corrections are obtained from the linear system

[J  −x^k; b^T A  1] [∆x^k; ∆λ^{(k)}] = −[f_k; g_k],   [f_k; g_k] := [−A^T r^k − λ^{(k)} x^k; −b^T r^k + λ^{(k)}].   (17)


Newton's method cnt.

To solve this system we take advantage of the block LU factorization of J̃:

J̃ = [I  0; z^T  1] [J  −x^k; 0  τ]   (18)

where Jz = A^T b and τ = 1 + z^T x^k = 1 + b^T A J^{−1} x^k.

To improve the numerical stability we note that

Jz = A^T b = A^T r^k + Jx^k + λ^{(k)} x^k = Jx^k − f_k,

and hence z can be computed from z = x^k − J^{−1} f_k.

By (block) forward and backward substitution we obtain

∆λ^{(k)} = β := (z^T f_k − g_k)/τ,   ∆x^k = −J^{−1} f_k + β J^{−1} x^k.

Hence, in each Newton step two symmetric linear systems have to be solved:

J y^k = x^k   and   J w^k = −f_k,   with J = A^T A − σ²I.


Newton's method cnt.

Require: initial approximation x
 1: r = b − Ax
 2: σ² = r^T r
 3: for k = 1, 2, ... until δ < tol do
 4:   f = −A^T r − σ²x
 5:   g = −b^T r + σ²
 6:   δ = sqrt(f^T f + g²)
 7:   solve (A^T A − σ²I)[y, w] = [x, −f]
 8:   z = x + w
 9:   β = (z^T f − g)/(z^T x + 1)
10:   x = z + βy
11:   σ² = σ² + β
12:   r = b − Ax
13: end for
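The listing translates directly into NumPy. The following sketch follows it step by step and assumes the generic case σ < σ'_n, so that A^T A − σ²I stays positive definite; the demo data (a nearly consistent system) are made up:

    import numpy as np

    def tls_newton(A, b, x, tol=1e-12, maxit=50):
        # Newton's method for (15), following the listing step by step
        n = x.size
        r = b - A @ x
        sig2 = r @ r
        for _ in range(maxit):
            f = -A.T @ r - sig2 * x
            g = -b @ r + sig2
            if np.hypot(np.linalg.norm(f), g) < tol:   # delta = sqrt(f^T f + g^2)
                break
            yw = np.linalg.solve(A.T @ A - sig2 * np.eye(n), np.column_stack([x, -f]))
            y, w = yw[:, 0], yw[:, 1]
            z = x + w
            beta = (z @ f - g) / (z @ x + 1.0)
            x = z + beta * y
            sig2 = sig2 + beta
            r = b - A @ x
        return x, sig2

    # demo: sig2 converges to sigma_{n+1}^2 of [A, b]
    rng = np.random.default_rng(5)
    A = rng.standard_normal((40, 6))
    b = A @ rng.standard_normal(6) + 0.01 * rng.standard_normal(40)
    x0 = np.linalg.lstsq(A, b, rcond=None)[0]          # LS solution as starting guess
    x, sig2 = tls_newton(A, b, x0)
    print(sig2, np.linalg.svd(np.column_stack([A, b]), compute_uv=False)[-1]**2)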

Rayleigh quotient iteration

Newton's method is known to converge quadratically. One can even get cubic convergence if σ² is updated not by σ² ← σ² + β but by the Rayleigh quotient

ρ = ([x^T, −1][A, b]^T [A, b][x^T, −1]^T)/(x^T x + 1) = ‖Ax − b‖²/(1 + ‖x‖²) = r^T r/(x^T x + 1).

Then one gets the following Rayleigh quotient iteration:

Require: initial approximation x
 1: for k = 1, 2, ... until δ < tol do
 2:   r = b − Ax
 3:   σ² = r^T r/(1 + x^T x)
 4:   f = −A^T r − σ²x
 5:   g = −b^T r + σ²
 6:   δ = sqrt(f^T f + g²)
 7:   solve (A^T A − σ²I)[y, w] = [x, −f]
 8:   z = x + w
 9:   β = (z^T f − g)/(z^T x + 1)
10:   x = z + βy
11: end for


Regularization of TLS problems (RTLSQEP)

Regularized Total Least Squares Problem

If A and [A, b] are ill-conditioned, regularization is necessary. We consider the quadratically constrained total least squares problem.

Let L ∈ R^{k×n}, k ≤ n, and δ > 0. Then the quadratically constrained formulation of the Regularized Total Least Squares (RTLS) problem reads: find ∆A ∈ R^{m×n}, ∆b ∈ R^m and x ∈ R^n such that

‖[∆A, ∆b]‖_F² = min!   subject to   (A + ∆A)x = b + ∆b,   ‖Lx‖² ≤ δ².

Using the orthogonal distance, this problem can be rewritten as: find x ∈ R^n such that

‖Ax − b‖²/(1 + ‖x‖²) = min!   subject to   ‖Lx‖² ≤ δ².


Regularized Total Least Squares Problem cnt.

If δ > 0 is chosen small enough (e.g. δ < ‖Lx_TLS‖, where x_TLS is the solution of the TLS problem), then the constraint ‖Lx‖² ≤ δ² is active, and the RTLS problem reads: find x ∈ R^n such that

‖Ax − b‖²/(1 + ‖x‖²) = min!   subject to   ‖Lx‖² = δ².   (1)

This problem is equivalent to the quadratic optimization problem

‖Ax − b‖² − f*(1 + ‖x‖²) = min!   subject to   ‖Lx‖² = δ²,   (2)

where

f* = inf{f(x) : ‖Lx‖² = δ²},

i.e. x* is a global minimizer of problem (1) if and only if it is a global minimizer of (2).


Regularized Total Least Squares Problem cnt.

This suggests considering the following family of quadratic optimization problems: for fixed y ∈ R^n find x ∈ R^n such that

g(x; y) := ‖Ax − b‖² − f(y)(1 + ‖x‖²) = min!   subject to   ‖Lx‖² = δ².   (P_y)

Regularized Total Least Squares Problem cnt.

Lemma 1. Problem (P_y) admits a global minimizer if and only if

f(y) ≤ min_{x ∈ N(L), x ≠ 0} (x^T A^T A x)/(x^T x),   (19)

where N(L) denotes the null space of L.

If L is square and nonsingular, the feasible region F = {x : ‖Lx‖² = δ²} is a nondegenerate ellipsoid; therefore any quadratic function attains its global minimum on F. On the other hand, N(L^T L) = {0}, and any vector x satisfies (19).

If L^T L is singular, any x can be uniquely written as x = x_1 + x_2 with x_1^T x_2 = 0, x_2 ∈ N(L^T L), x_1 ∈ range(L^T L), and the unboundedness of g(·; y) can only be caused by the contribution of x_2.

Hence, g(·; y) is bounded below if and only if x_2^T H x_2 ≥ 0 for every x_2 ∈ N(L^T L), where H = A^T A − f(y)I is the Hessian of g(·; y), from which (19) is readily obtained.
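For a concrete regularization matrix the right-hand side of (19) is easy to evaluate: with an orthonormal basis N of N(L) it is the smallest eigenvalue of N^T A^T A N. A sketch (made-up data), taking L to be the discrete first derivative:

    import numpy as np

    rng = np.random.default_rng(6)
    n = 8
    A = rng.standard_normal((40, n))
    L = np.eye(n - 1, n, 1) - np.eye(n - 1, n)   # discrete first derivative, k = n-1

    VT = np.linalg.svd(L)[2]
    N = VT[L.shape[0]:].T                        # orthonormal basis of N(L)

    bound = np.linalg.eigvalsh(N.T @ A.T @ A @ N).min()
    print(bound)   # (P_y) admits a global minimizer as long as f(y) <= bound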


RTLSQEP Method

Lemma 2. Assume that y satisfies the conditions of Lemma 1 and ‖Ly‖ = δ, and let z be a global minimizer of problem (P_y). Then it holds that

f(z) ≤ f(y).

Proof:

(1 + ‖z‖²)(f(z) − f(y)) = g(z; y) ≤ g(y; y) = (1 + ‖y‖²)(f(y) − f(y)) = 0.

This monotonicity result suggests the following algorithm:

Require: x^0 satisfying the conditions of Lemma 1 and ‖Lx^0‖ = δ.
for m = 0, 1, 2, ... until convergence do
  determine a global minimizer x^{m+1} of
    g(x; x^m) = min!   subject to   ‖Lx‖² = δ².
end for

This is exactly the method RTLSQEP (regularized total least squares via quadratic eigenvalue problems) introduced by Sima, Van Huffel, and Golub (2004), but motivated in a different way.


RTLSQEP cnt.

From Lemma 2 it follows that

0 ≤ f(x^{m+1}) ≤ f(x^m),

i.e. {f(x^m)} is monotonically decreasing and bounded below.

Problem (P_{x^m}) can be solved via the first order necessary optimality conditions

(A^T A − f(x^m)I)x + λL^T Lx = A^T b,   ‖Lx‖² = δ².

Although g(·; x^m) in general is not convex, these conditions are even sufficient if the Lagrange multiplier λ is chosen maximal.


RTLSQEP cnt.

Theorem 1. Assume that (λ, x) solves the first order conditions

(A^T A − f(y)I)x + λL^T Lx = A^T b,   ‖Lx‖² = δ².   (20)

If ‖Ly‖ = δ and λ is the maximal Lagrange multiplier, then x is a global minimizer of problem (P_y).

Proof. The statement follows immediately from the following identity, which can be shown similarly as in Gander (1981): if (λ_j, z^j), j = 1, 2, are solutions of (20), then it holds that

g(z^2; y) − g(z^1; y) = (1/2)(λ_1 − λ_2)‖L(z^1 − z^2)‖².


A quadratic eigenproblem

The first order conditions

(A^T A − f(y)I)x + λL^T Lx = A^T b,   ‖Lx‖² = δ²   (∗)

can be solved via a quadratic eigenvalue problem.

If L is square and nonsingular, let z := Lx. Then it holds that

Wz + λz := L^{−T}(A^T A − f(y)I)L^{−1}z + λz = L^{−T}A^T b =: h,   z^T z = δ².

With

u := (W + λI)^{−2}h  ⇒  h^T u = z^T z = δ²  ⇒  h = δ^{−2}hh^T u.

Hence,

(W + λI)²u − δ^{−2}hh^T u = 0.

If λ is the right-most real eigenvalue and the corresponding eigenvector u is scaled such that h^T u = δ², then the solution of problem (∗) is recovered as x = L^{−1}(W + λI)u.
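For square nonsingular L the derivation above leads to a small companion linearization of the quadratic eigenproblem. The following sketch (illustrative code with made-up data, not the authors' implementation) solves the eigenproblem, recovers x, and wraps the solve in the RTLSQEP iteration of Lemma 2:

    import numpy as np

    def qep_solve(A, b, L, fy, delta):
        # (W + lam I)^2 u - delta^-2 h h^T u = 0, linearized in companion form
        n = L.shape[0]
        Li = np.linalg.inv(L)
        W = Li.T @ (A.T @ A - fy * np.eye(n)) @ Li
        h = Li.T @ (A.T @ b)
        M = np.block([[np.zeros((n, n)), np.eye(n)],
                      [-(W @ W - np.outer(h, h) / delta**2), -2.0 * W]])
        ev, Q = np.linalg.eig(M)
        real = np.abs(ev.imag) <= 1e-8 * (1.0 + np.abs(ev))
        i = np.argmax(np.where(real, ev.real, -np.inf))   # right-most real eigenvalue
        lam, u = ev[i].real, Q[:n, i].real
        u *= delta**2 / (h @ u)                           # scale so that h^T u = delta^2
        return Li @ ((W + lam * np.eye(n)) @ u)           # x = L^{-1} (W + lam I) u

    def rtlsqep(A, b, L, delta, maxit=30, tol=1e-10):
        f = lambda x: np.sum((A @ x - b)**2) / (1 + np.sum(x**2))
        x = np.linalg.lstsq(A, b, rcond=None)[0]
        x *= delta / np.linalg.norm(L @ x)                # start with ||L x|| = delta
        for _ in range(maxit):
            x_new = qep_solve(A, b, L, f(x), delta)
            if abs(f(x_new) - f(x)) <= tol * max(f(x), tol):
                return x_new
            x = x_new
        return x

    # demo with made-up data
    rng = np.random.default_rng(7)
    A = rng.standard_normal((40, 6))
    b = rng.standard_normal(40)
    x = rtlsqep(A, b, np.eye(6), delta=0.5)
    print(np.linalg.norm(x))      # = 0.5: the constraint ||L x|| = delta is active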


A quadratic eigenproblem cnt.

Assume that k < n and that L has linearly independent rows.

Let L^T L = USU^T be the spectral decomposition of L^T L. Then the first order condition is equivalent to

((AU)^T(AU) − f(y)I)z + λSz = (AU)^T b,   z^T Sz = δ²

with z = U^T x.

Partitioning the matrices and vectors in block form,

(AU)^T(AU) = [T_1  T_2; T_2^T  T_4],   S = [S_1  0; 0  0],   (AU)^T b = [c_1; c_2],   z = [z_1; z_2],

where the leading blocks have dimension k, one gets

[T_1 − f(y)I_k  T_2; T_2^T  T_4 − f(y)I_{n−k}] [z_1; z_2] + λ [S_1 z_1; 0] = [c_1; c_2].


A quadratic eigenproblem cnt.

Solving the second component for z_2,

z_2 = (T_4 − f(y)I_{n−k})^{−1}(c_2 − T_2^T z_1),

and substituting into the first component, one gets

(T_1 − f(y)I_k − T_2(T_4 − f(y)I_{n−k})^{−1}T_2^T) z_1 + λS_1 z_1 = c_1 − T_2(T_4 − f(y)I_{n−k})^{−1}c_2.

Hence, one again gets the quadratic eigenvalue problem

(W + λI)²u − δ^{−2}hh^T u = 0,

with

W = S_1^{−1/2}(T_1 − f(y)I_k − T_2(T_4 − f(y)I_{n−k})^{−1}T_2^T)S_1^{−1/2},
h = S_1^{−1/2}(c_1 − T_2(T_4 − f(y)I_{n−k})^{−1}c_2).
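A sketch of how W and h are assembled in the case k < n (illustrative; L is taken to be the discrete first derivative, so n − k = 1, and the value f(y) is made up):

    import numpy as np

    rng = np.random.default_rng(8)
    n = 8
    A = rng.standard_normal((40, n))
    b = rng.standard_normal(40)
    L = np.eye(n - 1, n, 1) - np.eye(n - 1, n)   # discrete first derivative, k = n-1
    k = L.shape[0]
    fy = 0.1                                     # current value f(y), made up here

    s, U = np.linalg.eigh(L.T @ L)               # L^T L = U S U^T, ascending order
    s, U = s[::-1], U[:, ::-1]                   # reorder so that S_1 = diag(s[:k]) > 0
    T = (A @ U).T @ (A @ U)
    c = (A @ U).T @ b
    T1, T2, T4 = T[:k, :k], T[:k, k:], T[k:, k:]
    c1, c2 = c[:k], c[k:]

    S1h = np.diag(s[:k] ** -0.5)                 # S_1^{-1/2}
    G = T2 @ np.linalg.inv(T4 - fy * np.eye(n - k))
    W = S1h @ (T1 - fy * np.eye(k) - G @ T2.T) @ S1h
    h = S1h @ (c1 - G @ c2)
    print(W.shape, np.allclose(W, W.T))          # (7, 7) True: ready for the QEP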


A quadratic eigenproblem cnt.

If (λ, u) is the eigenpair corresponding to the right-most eigenvalue, u is normalized such that u^T h = δ², and w = (W + λI)u, then the solution of the first order conditions is recovered as x = Uz, where

z = [z_1; z_2] = [S_1^{−1/2}w; (T_4 − f(y)I)^{−1}(c_2 − T_2^T S_1^{−1/2}w)].

For k < n the quadratic eigenproblem looks very complicated. Notice, however, that n − k usually is very small (n − k = 1 and n − k = 2 for the discrete first and second order derivatives) and quite often fast decomposition techniques can be used.

Moreover, L with k < n can often be replaced by some nonsingular approximation.


Convergence of RTLSQEP

Theorem. Any limit point x* of the sequence {x^m} constructed by RTLSQEP is a global minimizer of the optimization problem

f(x) = min!   subject to   ‖Lx‖² = δ².

Proof: Let x* be a limit point of {x^m}, and let {x^{m_j}} be a subsequence converging to x*. Then x^{m_j} solves the first order conditions

(A^T A − f(x^{m_j−1})I)x^{m_j} + λ_{m_j}L^T Lx^{m_j} = A^T b.

By the monotonicity of f(x^m) it follows that f(x^{m_j−1}) converges to f(x*).

Since W(y) and h(y) depend continuously on f(y), the sequence of right-most eigenvalues {λ_{m_j}} converges to some λ*, and x* satisfies

(A^T A − f(x*)I)x* + λ*L^T Lx* = A^T b,   ‖Lx*‖² = δ²,

where λ* is the maximal Lagrange multiplier.


Convergence of RTLSQEP

Hence, x* is a global minimizer of

g(x; x*) = min!   subject to   ‖Lx‖² = δ²,

and for y ∈ R^n with ‖Ly‖² = δ² it follows that

0 = g(x*; x*) ≤ g(y; x*) = ‖Ay − b‖² − f(x*)(1 + ‖y‖²) = (f(y) − f(x*))(1 + ‖y‖²),

i.e.

f(y) ≥ f(x*).


Nonlinear maxmin characterization

Nonlinear maxmin characterization

Let T(λ) ∈ C^{n×n}, T(λ) = T(λ)^H, λ ∈ J ⊂ R, where J is an open interval (maybe unbounded).

For every fixed x ∈ C^n, x ≠ 0, assume that the real function

f(·; x) : J → R,   f(λ; x) := x^H T(λ)x

is continuous, and that the real equation

f(λ; x) = 0

has at most one solution λ =: p(x) in J.

Then the equation f(λ; x) = 0 implicitly defines a functional p on some subset D of C^n, which we call the Rayleigh functional. (For the linear problem T(λ) = λI − B with Hermitian B, p is the classical Rayleigh quotient p(x) = x^H Bx/x^H x.)

Assume that

(λ − p(x)) f(λ; x) > 0 for every x ∈ D, λ ≠ p(x).


Nonlinear maxmin characterization

maxmin characterization (V., Werner 1982)

Let sup_{v∈D} p(v) ∈ J and assume that there exists a subspace W ⊂ C^n of dimension ℓ such that

W ∩ D ≠ ∅ and inf_{v∈W∩D} p(v) ∈ J.

Then T(λ)x = 0 has at least ℓ eigenvalues in J, and for j = 1, …, ℓ the j-largest eigenvalue λ_j can be characterized by

λ_j = max_{dim V=j, V∩D≠∅} inf_{v∈V∩D} p(v).  (1)

For j = 1, …, ℓ every j-dimensional subspace V ⊂ C^n with

V ∩ D ≠ ∅ and λ_j = inf_{v∈V∩D} p(v)

is contained in D ∪ {0}, and the maxmin characterization of λ_j can be replaced by

λ_j = max_{dim V=j, V\{0}⊂D} min_{v∈V\{0}} p(v).


Nonlinear maxmin characterization

Back to

T(λ)x := ((W + λI)² − δ⁻²hh^T)x = 0.  (QEP)

For x ≠ 0,

f(λ; x) = x^H T(λ)x = λ²‖x‖² + 2λ x^H W x + ‖Wx‖² − |x^H h|²/δ²

is a parabola in λ which attains its minimum at

λ = −(x^H W x)/(x^H x).

Let J = (−λ_min, ∞), where λ_min is the minimum eigenvalue of W. Then f(λ; x) = 0 has at most one solution p(x) ∈ J for every x ≠ 0. Hence the Rayleigh functional p of (QEP) corresponding to J is defined, and the general conditions are satisfied.
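As a small illustration, this Rayleigh functional can be evaluated numerically: f(λ; x) is an upward parabola in λ, so p(x) is its larger root whenever that root lies in J. A sketch using only the definitions on this slide (returning None signals x ∉ D):

```python
import numpy as np

def rayleigh_functional(x, W, h, delta):
    # Larger root of f(lam; x) = lam^2 ||x||^2 + 2 lam x^T W x
    #                            + ||Wx||^2 - (x^T h)^2 / delta^2,
    # accepted only if it lies in J = (-lambda_min(W), infinity).
    lam_min = np.linalg.eigvalsh(W)[0]
    a = x @ x
    bq = 2.0 * (x @ (W @ x))
    c = (W @ x) @ (W @ x) - (x @ h) ** 2 / delta**2
    disc = bq * bq - 4.0 * a * c
    if disc < 0:
        return None                           # no real root: x is not in D
    lam = (-bq + np.sqrt(disc)) / (2.0 * a)   # root to the right of the vertex
    return lam if lam > -lam_min else None
```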


Nonlinear maxmin characterization

Characterization of maximal real eigenvalue

Let x_min be an eigenvector of W corresponding to λ_min. Then

f(−λ_min; x_min) = x_min^H (W − λ_min I)² x_min − |x_min^H h|²/δ² = −|x_min^H h|²/δ² ≤ 0.

Hence, if x_min^H h ≠ 0, then x_min ∈ D.

If x_min^H h = 0 and the minimum eigenvalue µ_min of T(−λ_min) is negative, then for the corresponding eigenvector y_min it holds that

f(−λ_min; y_min) = y_min^H T(−λ_min) y_min = µ_min ‖y_min‖² < 0,

and y_min ∈ D.

If x_min^H h = 0 and T(−λ_min) is positive semidefinite, then

f(−λ_min; x) = x^H T(−λ_min) x ≥ 0 for every x ≠ 0,

and D = ∅.


Nonlinear maxmin characterization

Characterization of maximal real eigenvalue cnt.

Assume that D ≠ ∅. For x^H h = 0 it holds that

f(λ; x) = ‖(W + λI)x‖² > 0 for every λ ∈ J,

i.e. x ∉ D.

Hence D does not contain a two-dimensional subspace of R^n, and therefore J contains at most one eigenvalue of (QEP).

If λ ∈ C is a non-real eigenvalue of (QEP) and x a corresponding eigenvector, then

x^H T(λ)x = λ²‖x‖² + 2λ x^H W x + ‖Wx‖² − |x^H h|²/δ² = 0.

Taking the imaginary part of this equation shows that the real part of λ satisfies

Re(λ) = −(x^H W x)/(x^H x) ≤ −λ_min.


Nonlinear maxmin characterization

Theorem

Let λ_min be the minimal eigenvalue of W, and let x_min be a corresponding eigenvector.

If x_min^H h = 0 and T(−λ_min) is positive semidefinite, then λ̂ := −λ_min is the maximal real eigenvalue of (QEP).

Otherwise, the maximal real eigenvalue λ̂ is the unique eigenvalue of (QEP) in J = (−λ_min, ∞), and it holds that

λ̂ = max_{x∈D} p(x).

In either case λ̂ is the rightmost eigenvalue of (QEP), i.e.

Re(λ) ≤ −λ_min ≤ λ̂ for every eigenvalue λ ≠ λ̂ of (QEP).
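A quick numerical check of the theorem (not from the slides): compute all eigenvalues of (QEP) for a random symmetric W via a dense companion linearization and verify that the rightmost real eigenvalue λ̂ lies in J while every other eigenvalue λ satisfies Re(λ) ≤ −λ_min:

```python
import numpy as np

rng = np.random.default_rng(0)
n, delta = 6, 0.5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W = Q @ np.diag(rng.uniform(-1.0, 2.0, n)) @ Q.T   # random symmetric W
h = rng.standard_normal(n)

# Linearize T(lam) = lam^2 I + 2*lam*W + (W^2 - hh^T/delta^2)
K = W @ W - np.outer(h, h) / delta**2
Z = np.block([[np.zeros((n, n)), np.eye(n)], [-K, -2.0 * W]])
vals = np.linalg.eigvals(Z)

lam_min = np.linalg.eigvalsh(W)[0]
real_vals = vals[np.abs(vals.imag) < 1e-8].real
lam_hat = real_vals.max()
others = vals[np.abs(vals - lam_hat) > 1e-8]
print(lam_hat > -lam_min)                          # lam_hat lies in J
print(np.all(others.real <= -lam_min + 1e-8))      # all others left of -lam_min
```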


Nonlinear maxmin characterization

[Figure: Example]

[Figure: Example - close up]

Nonlinear maxmin characterization

Positivity of λ̂

Sima et al. claimed that the rightmost eigenvalue λ̂ is always positive.

Simplest counterexample: if W is positive definite with eigenvalues λ_j > 0, then the −λ_j are the only eigenvalues of the quadratic eigenproblem (W + λI)²x = 0, and if the term δ⁻²hh^T is small enough, then the quadratic problem has no positive eigenvalue; its rightmost eigenvalue is negative.

However, in the quadratic eigenproblems occurring in regularized total least squares problems δ and h are not arbitrary: regularization only makes sense if δ ≤ ‖Lx_TLS‖, where x_TLS denotes the solution of the total least squares problem without regularization.

The following theorem characterizes the case that the rightmost eigenvalue is negative.


Nonlinear maxmin characterization

Positivity of λ̂ cnt.

Theorem 3

The maximal real eigenvalue λ̂ of the quadratic problem

(W + λI)²x − δ⁻²hh^T x = 0

is negative if and only if W is positive definite and ‖W⁻¹h‖ < δ.

For the quadratic eigenproblem occurring in regularized total least squares it holds that

‖W⁻¹h‖ = ‖L(A^T A − f(x)I)⁻¹A^T b‖.

For the standard case L = I the rightmost eigenvalue λ̂ is always nonnegative if δ < ‖x_TLS‖.
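This characterization is easy to test numerically; a sketch, reusing the dense linearization from the earlier check: for a positive definite W the rightmost real eigenvalue changes sign exactly at δ = ‖W⁻¹h‖:

```python
import numpy as np

def rightmost_real(W, h, delta):
    # Rightmost real eigenvalue of (W + lam*I)^2 x - hh^T x / delta^2 = 0.
    n = len(h)
    K = W @ W - np.outer(h, h) / delta**2
    Z = np.block([[np.zeros((n, n)), np.eye(n)], [-K, -2.0 * W]])
    vals = np.linalg.eigvals(Z)
    return vals[np.abs(vals.imag) < 1e-8].real.max()

rng = np.random.default_rng(1)
n = 5
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
W = Q @ np.diag(rng.uniform(0.5, 2.0, n)) @ Q.T    # positive definite W
h = rng.standard_normal(n)
crit = np.linalg.norm(np.linalg.solve(W, h))       # ||W^{-1} h||

print(rightmost_real(W, h, 0.5 * crit) >= 0)       # delta < ||W^{-1}h||: True
print(rightmost_real(W, h, 2.0 * crit) < 0)        # delta > ||W^{-1}h||: True
```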


RTLSQEP: Numerical considerations

Outline

1 Total Least Squares Problems

2 Numerical solution of TLS problems

3 Regularization of TLS problems (RTLSQEP)

4 Nonlinear maxmin characterization

5 RTLSQEP: Numerical considerations

6 Numerical example

7 Regularization of TLS problems (RTLSLEP)

8 Determining the regularization parameter

RTLSQEP: Numerical considerations

Quadratic eigenproblem

The quadratic eigenproblems

T_m(λ)z = (W_m + λI)²z − (1/δ²) h_m h_m^T z = 0

can be solved by

- linearization,
- a Krylov subspace method for QEPs (Li & Ye 2003),
- SOAR (Bai & Su 2005),
- the nonlinear Arnoldi method (Meerbergen 2001, V. 2004).

RTLSQEP: Numerical considerations

Krylov subspace method: Li & Ye 2003

For A, B ∈ R^{n×n} such that some linear combination of A and B has rank q, an Arnoldi-type process determines a matrix Q ∈ R^{n×(m+q+1)} with orthonormal columns and two matrices H_a, H_b ∈ R^{(m+q+1)×m} with lower bandwidth q + 1 such that

A Q(:, 1:m) = Q(:, 1:m+q+1) H_a and B Q(:, 1:m) = Q(:, 1:m+q+1) H_b.

Then approximations to eigenpairs of the quadratic eigenproblem

(λ²I − λA − B)y = 0

are obtained from its projection onto span Q(:, 1:m), which reads

(λ²I_m − λH_a(1:m, 1:m) − H_b(1:m, 1:m))z = 0.

For the quadratic eigenproblem ((W + λI)² − δ⁻²hh^T)x = 0 the algorithm of Li & Ye is applied with A = −W and B = hh^T, from which the projected problem

(θ²I − 2θH_a(1:m, 1:m) + H_a(1:m+2, 1:m)^T H_a(1:m+2, 1:m) − δ⁻²H_b(1:m, 1:m))z = 0

is easily obtained.


RTLSQEP: Numerical considerations

SOAR: Bai & Su 2005

The Second Order Arnoldi Reduction method is based on the observation that the Krylov space of the linearization

( A  B ) ( y )       ( y )
( I  O ) ( z )  =  λ ( z )

of (λ²I − λA − B)y = 0 with initial vector (r_0; 0) has the form

K_k = { (r_0; 0), (r_1; r_0), (r_2; r_1), …, (r_{k−1}; r_{k−2}) },

where

r_1 = A r_0, and r_j = A r_{j−1} + B r_{j−2} for j ≥ 2.

The entire information on K_k is therefore contained in the Second Order Krylov Space

G_k(A, B) = span{r_0, r_1, …, r_{k−1}}.

SOAR determines an orthonormal basis of G_k(A, B).
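The recurrence translates directly into a naive basis construction. This is a sketch only: the actual SOAR algorithm of Bai & Su orthogonalizes on the fly and handles deflation of linearly dependent directions, which the version below omits:

```python
import numpy as np

def second_order_krylov_basis(A, B, r0, k):
    # Orthonormal basis of G_k(A, B) = span{r0, r1, ..., r_{k-1}},
    # built from the recurrence r1 = A r0, r_j = A r_{j-1} + B r_{j-2}.
    rs = [r0, A @ r0]
    for j in range(2, k):
        rs.append(A @ rs[-1] + B @ rs[-2])
    R = np.column_stack(rs[:k])
    Q, _ = np.linalg.qr(R)     # naive a-posteriori orthogonalization
    return Q
```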


RTLSQEP: Numerical considerations

Nonlinear Arnoldi Method

1: start with an initial basis V, V^T V = I
2: determine a preconditioner M ≈ T(σ)⁻¹, σ close to the wanted eigenvalue
3: find the smallest eigenvalue µ of V^T T(µ)V y = 0 and a corresponding eigenvector y
4: set u = V y, r = T(µ)u
5: while ‖r‖/‖u‖ > ε do
6:   v = M r
7:   v = v − V V^T v
8:   v = v/‖v‖, V = [V, v]
9:   find the smallest eigenvalue µ of V^T T(µ)V y = 0 and a corresponding eigenvector y
10:  set u = V y, r = T(µ)u
11: end while
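For the quadratic problem T(λ) = (W + λI)² − δ⁻²hh^T this scheme can be sketched in Python as follows. Here the small projected problems are solved by dense linearization and, matching the RTLSQEP setting, the rightmost eigenvalue is targeted instead of the smallest; the identity preconditioner is an illustrative simplification:

```python
import numpy as np

def proj_rightmost(WV, KV):
    # Rightmost real eigenpair of the projected QEP mu^2 I + 2*mu*WV + KV = 0.
    k = WV.shape[0]
    Z = np.block([[np.zeros((k, k)), np.eye(k)], [-KV, -2.0 * WV]])
    vals, vecs = np.linalg.eig(Z)
    idx = np.where(np.abs(vals.imag) < 1e-10)[0]
    i = idx[np.argmax(vals[idx].real)]
    return vals[i].real, vecs[:k, i].real

def nonlinear_arnoldi(W, h, delta, v0, M=None, eps=1e-10, maxit=100):
    # Nonlinear Arnoldi for T(mu)x = ((W + mu*I)^2 - hh^T/delta^2)x = 0.
    n = len(v0)
    K = W @ W - np.outer(h, h) / delta**2
    if M is None:
        M = np.eye(n)                   # trivial preconditioner M ~ T(sigma)^{-1}
    V = (v0 / np.linalg.norm(v0)).reshape(n, 1)
    for _ in range(maxit):
        mu, y = proj_rightmost(V.T @ W @ V, V.T @ K @ V)
        u = V @ y                       # Ritz vector
        r = mu**2 * u + 2.0 * mu * (W @ u) + K @ u   # residual r = T(mu)u
        if np.linalg.norm(r) <= eps * np.linalg.norm(u):
            return mu, u
        v = M @ r                       # preconditioned residual
        v -= V @ (V.T @ v)              # orthogonalize against the search space
        V = np.hstack([V, (v / np.linalg.norm(v)).reshape(n, 1)])
    return mu, u
```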

RTLSQEP: Numerical considerations

Reuse of information

The convergence of W_m and h_m suggests reusing information from previous iterations when solving T_m(λ)z = 0.

Krylov subspace methods for T_m(λ)z = 0 can be started with the solution z_{m−1} of T_{m−1}(λ)z = 0.

The nonlinear Arnoldi method can use thick starts, i.e. the projection method for

T_m(λ)z = 0

can be initialized with the basis V_{m−1}, where z_{m−1} = V_{m−1}u_{m−1} and u_{m−1} is an eigenvector of V_{m−1}^T T_{m−1}(λ) V_{m−1} u = 0.


RTLSQEP: Numerical considerations

Thick starts

W_m = C − f(x_m)D − S(T − f(x_m)I_{n−k})⁻¹S^T,
h_m = g − S(T − f(x_m)I_{n−k})⁻¹c

with C, D ∈ R^{k×k}, S ∈ R^{k×(n−k)}, T ∈ R^{(n−k)×(n−k)}, g ∈ R^k, c ∈ R^{n−k}.

Hence, in order to update the projected problem

V^T(W_{m+1} + λI)²Vu − (1/δ²) V^T h_{m+1} h_{m+1}^T Vu = 0

one only has to keep CV, DV, S^T V, and g^T V.

Since it is inexpensive to obtain updates of W_m and h_m, we decided to terminate the inner iteration long before convergence, namely as soon as the residual of the quadratic eigenproblem had been reduced by at least a factor of 10⁻². This reduced the computing time further.


RTLSQEP: Numerical considerations

Numerical example

To evaluate RTLSQEP for large dimensions we considered the example shaw from Hansen's Regularization Tools, a discretization of a Fredholm integral equation of the first kind on a one-dimensional domain.

We considered problems of dimensions n = 1000, n = 2000 and n = 4000, and we added white noise of level 5% and 50% to the system matrix and the right-hand side. The following table contains the CPU times in seconds, averaged over 100 random simulations on a Pentium 4 computer with 3.4 GHz and 3 GB RAM.

Table: Example shaw, average CPU time in seconds

noise    n      SOAR   Li & Ye   NL Arn. (exact)   NL Arn. (early)
5%       1000   0.40   0.23      0.29              0.12
         2000   1.33   0.60      0.65              0.37
         4000   5.09   1.89      2.06              1.39
50%      1000   0.27   0.14      0.28              0.11
         2000   1.01   0.44      0.68              0.35
         4000   4.10   1.73      2.23              1.31


RTLSQEP: Numerical considerations

[Figure: shaw(2000) - convergence history of Li/Ye-Arnoldi (residual norm vs. inner iterations)]

[Figure: shaw(2000) - convergence history of SOAR (residual norm vs. inner iterations)]

[Figure: shaw(2000) - convergence history of NL-Arnoldi (residual norm vs. inner iterations)]

[Figure: shaw(2000) - convergence history of NL-Arnoldi with early updates (residual norm vs. inner iterations)]

Numerical example

Outline

1 Total Least Squares Problems

2 Numerical solution of TLS problems

3 Regularization of TLS problems (RTLSQEP)

4 Nonlinear maxmin characterization

5 RTLSQEP: Numerical considerations

6 Numerical example

7 Regularization of TLS problems (RTLSLEP)

8 Determining the regularization parameter

Regularization of TLS problems (RTLSLEP)

Outline

1 Total Least Squares Problems

2 Numerical solution of TLS problems

3 Regularization of TLS problems (RTLSQEP)

4 Nonlinear maxmin characterization

5 RTLSQEP: Numerical considerations

6 Numerical example

7 Regularization of TLS problems (RTLSLEP)

8 Determining the regularization parameter

Regularization of TLS problems (RTLSLEP)

First Order Conditions; Golub, Hansen, O'Leary 1999

Assume that x_RTLS exists and that the constraint is active. Then (RTLS) is equivalent to

f(x) := ‖Ax − b‖²/(1 + ‖x‖²) = min!  subject to ‖Lx‖² = δ².

The first-order optimality conditions are equivalent to

(A^T A + λ_I I + λ_L L^T L)x = A^T b,  µ ≥ 0,  ‖Lx‖² = δ²

with

λ_I = −‖Ax − b‖²/(1 + ‖x‖²),  λ_L = µ(1 + ‖x‖²),  µ = (b^T(b − Ax) + λ_I)/(δ²(1 + ‖x‖²)).
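These formulas are straightforward to evaluate. A small sketch (the helper name is illustrative) computes the multipliers for a given x and reports the residual of the stationarity equation:

```python
import numpy as np

def rtls_multipliers(x, A, b, L, delta):
    # lambda_I, lambda_L and mu from the first-order conditions above.
    r = A @ x - b
    s = 1.0 + x @ x
    lam_I = -(r @ r) / s
    mu = (b @ (b - A @ x) + lam_I) / (delta**2 * s)
    lam_L = mu * s
    # residual of (A^T A + lam_I I + lam_L L^T L) x = A^T b
    res = A.T @ (A @ x) + lam_I * x + lam_L * (L.T @ (L @ x)) - A.T @ b
    return lam_I, lam_L, mu, np.linalg.norm(res)
```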


Regularization of TLS problems (RTLSLEP)

Two Iterative Algorithms based on EVPs

Two approaches for solving the first order conditions

(A^T A + λ_I(x)I + λ_L(x)L^T L)x = A^T b  (∗)

Idea of Sima, Van Huffel, Golub [2004]: an iterative algorithm based on updating λ_I:

- with fixed λ_I, reformulate (∗) as a QEP,
- determine its rightmost eigenvalue, i.e. the free parameter λ_L,
- use the corresponding eigenvector to update λ_I.

The idea of Renaut, Guo [2005] is the other way round: an iterative algorithm based on updating λ_L:

- with fixed λ_L, reformulate (∗) as a linear EVP,
- determine its smallest eigenvalue, i.e. the free parameter λ_I,
- use the corresponding eigenvector to update λ_L.


Regularization of TLS problems (RTLSLEP)

Necessary Condition: Guo, Renaut 2002

The RTLS solution x satisfies the linear eigenvalue problem

B(x) (x; −1) = −λ_I (x; −1),

where

B(x) = [A, b]^T [A, b] + λ_L(x) ( L^T L  0 ; 0  −δ² ),
λ_I = −f(x),
λ_L = −(1/δ²)(b^T(Ax − b) − λ_I).

Conversely, if (−λ, (x; −1)) is an eigenpair of B(x) and λ_L(x) = −(1/δ²)(b^T(Ax − b) + f(x)), then x satisfies the necessary conditions, and the eigenvalue is given by λ = −f(x).


Regularization of TLS problems (RTLSLEP)

Approach of Renaut, Guo 2005

Determine θ ∈ R⁺ such that the eigenvector (x_θ^T, −1)^T of

B(θ) := [A, b]^T [A, b] + θ ( L^T L  0 ; 0  −δ² )  (∗)

corresponding to the smallest eigenvalue satisfies the constraint ‖Lx_θ‖² = δ², i.e. find a nonnegative root θ of the real function

g(θ) := (‖Lx_θ‖² − δ²)/(1 + ‖x_θ‖²).

Renaut and Guo claim that (under the conditions b^T A ≠ 0 and N(A) ∩ N(L) = {0}) the smallest eigenvalue of B(θ) is simple, and that (under the further condition that the matrix [A, b] has full rank) g is continuous and strictly monotonically decreasing.

Hence g(θ) = 0 has a unique root θ_0, and the corresponding eigenvector (scaled appropriately) yields the solution of the RTLS problem.
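In code, one evaluation of g amounts to one symmetric eigenproblem. A sketch that, like Renaut and Guo, assumes the last eigenvector component is nonzero (the very assumption questioned next):

```python
import numpy as np

def g_renaut_guo(theta, A, b, L, delta):
    # Smallest eigenpair of B(theta), then g from the scaled eigenvector.
    n = A.shape[1]
    Ab = np.hstack([A, b[:, None]])
    M = Ab.T @ Ab
    N = np.zeros((n + 1, n + 1))
    N[:n, :n] = L.T @ L
    N[n, n] = -delta**2
    vals, vecs = np.linalg.eigh(M + theta * N)
    y = vecs[:, 0]                 # eigenvector of the smallest eigenvalue
    x = -y[:n] / y[n]              # scale the last component to -1
    return ((L @ x) @ (L @ x) - delta**2) / (1.0 + x @ x)
```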


Regularization of TLS problems (RTLSLEP)

Approach of Renaut, Guo cnt.

Unfortunately, these assertions are not true. The last component of an eigenvector corresponding to the smallest eigenvalue need not be different from zero, and then g is not defined.

Example: Let

A = ( 1 0 ; 0 1 ; 0 0 ),  b = (1, 0, √3)^T,  L = ( √2 0 ; 0 1 ),  δ = 1.

Then the conditions '[A, b] has full rank', 'b^T A = (1, 0) ≠ 0' and 'N(A) ∩ N(L) = {0}' are satisfied, and

B(θ) = ( 1+2θ  0  1 ; 0  1+θ  0 ; 1  0  4−θ ).

The smallest eigenvalues λ_min(B(0.5)) = 1.5 and λ_min(B(1)) = 2 have multiplicity 2, and for θ ∈ (0.5, 1) the last component of the eigenvector y_θ = (0, 1, 0)^T corresponding to the smallest eigenvalue λ_min(B(θ)) = 1 + θ is equal to zero.


Regularization of TLS problems (RTLSLEP)

[Figure: Example 1]

Regularization of TLS problems (RTLSLEP)

Modified definition of g

Let E(θ) denote the eigenspace of B(θ) corresponding to its smallest eigenvalue, and let

N := ( L^T L  0 ; 0  −δ² ).

Then

g(θ) := min_{y∈E(θ)} (y^T N y)/(y^T y)  (21)

is the minimal eigenvalue of the projection of N onto E(θ).
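With this definition an evaluation of g reduces to projecting N onto E(θ). A sketch, where M = [A, b]^T [A, b] and N are passed in, and E(θ) is approximated by all eigenvectors whose eigenvalue agrees with λ_min up to a tolerance:

```python
import numpy as np

def g_modified(theta, M, N, tol=1e-10):
    # Eq. (21): minimal eigenvalue of N projected onto the eigenspace E(theta)
    # of B(theta) = M + theta*N belonging to the smallest eigenvalue.
    vals, vecs = np.linalg.eigh(M + theta * N)
    E = vecs[:, vals <= vals[0] + tol]   # orthonormal basis of E(theta)
    return np.linalg.eigvalsh(E.T @ N @ E)[0]
```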

Regularization of TLS problems (RTLSLEP)

[Figures: Generalized g (two plots)]

Regularization of TLS problems (RTLSLEP)

Properties of g

Assume that σ_min([AF, b]) < σ_min(AF) holds, where the columns of F ∈ R^{n×(n−k)} form an orthonormal basis of the null space of L. Then g : [0, ∞) → R has the following properties:

(i) If σ_min([A, b]) < σ_min(A), then g(0) > 0.
(ii) lim_{θ→∞} g(θ) = −δ².
(iii) If the smallest eigenvalue of B(θ_0) is simple, then g is continuous at θ_0.
(iv) g is monotonically non-increasing on [0, ∞).
(v) Let g(θ) = 0 and let y ∈ E(θ) be such that g(θ) = y^T N y/‖y‖². Then the last component of y is different from 0.
(vi) g has at most one root.


Regularization of TLS problems (RTLSLEP)

Roots of g

If θ is a positive root of g, then x := −y(1:n)/y(n+1) solves the RTLS problem, where y denotes an eigenvector of B(θ) corresponding to its smallest eigenvalue.

However, g is not necessarily continuous. If the multiplicity of the smallest eigenvalue of B(θ_0) is greater than 1 for some θ_0, then g may have a jump discontinuity at θ_0, and this actually occurs (cf. the example above, where g is discontinuous at θ_0 = 0.5 and θ_0 = 1). Hence the question arises whether g may jump from a positive value to a negative one, so that it has no positive root.


Regularization of TLS problems (RTLSLEP)

Theorem

Consider the standard case L = I, where σ_min([A, b]) < σ_min(A) and δ² < ‖x_TLS‖².

Assume that the smallest eigenvalue of B(θ_0) is a multiple eigenvalue for some θ_0.

Then it holds that

0 ∉ [ lim_{θ→θ_0−} g(θ), g(θ_0) ].


Regularization of TLS problems (RTLSLEP)

General case

For general regularization matrices L it may happen that g does not have a root but jumps below zero at some θ_0.

Example: Let

A = ( 1 0 ; 0 1 ; 0 0 ),  b = (1, 0, √5)^T,  L = ( √2 0 ; 0 1 ),  δ = √3.

Then 1 = σ_min(A) > σ_min([A, b]) ≈ 0.8986 holds, and the corresponding TLS problem has the solution

x_TLS = (A^T A − σ_min² I)⁻¹ A^T b ≈ (5.1926, 0)^T.

The constraint is active, because δ² = 3 < 53.9258 ≈ ‖Lx_TLS‖² holds, and N(L) = {0}, so the RTLS problem is solvable.

The corresponding function g(θ) has two jumps, one at θ = 0.25 and another at θ = 1, where it jumps below zero.


Regularization of TLS problems (RTLSLEP)

[Figure: Jump below zero]

Regularization of TLS problems (RTLSLEP)

Jump below zero

A jump discontinuity of g can only appear if λ_min(B(θ_0)) is a multiple eigenvalue of B(θ_0).

Hence there exists v ∈ E(θ_0) with vanishing last component, and clearly the Rayleigh quotient R_N(v) of N at v is positive.

Since g(θ_0) < 0, there exists some w ∈ E(θ_0) with R_N(w) = g(θ_0) < 0 and nonvanishing last component.

Hence, for some linear combination of v and w we have R_N(αv + βw) = 0, and scaling the last component to −1 yields a solution of the RTLS problem.


Regularization of TLS problems (RTLSLEP)

[Figure: Typical g]

Regularization of TLS problems (RTLSLEP)

Method of Renaut and Guo

Assuming that g is continuous and strictly monotonically decreasing, Renaut and Guo derived the update

θ_{k+1} = θ_k + (θ_k/δ²) g(θ_k)

for solving

g(θ) = (‖Lx‖² − δ²)/(‖x‖² + 1) = 0,

where at step k, (x_{θ_k}^T, −1)^T is the eigenvector of B(θ_k) = M + θ_k N corresponding to λ_min(B(θ_k)).

Additionally, backtracking was introduced to make the method converge, i.e. the update was modified to

θ_{k+1} = θ_k + ι (θ_k/δ²) g(θ_k),

where ι ∈ (0, 1] is reduced until the sign condition g(θ_k) g(θ_{k+1}) ≥ 0 is satisfied.


Regularization of TLS problems (RTLSLEP)

RTLSEVP

Require: initial guess θ_0 > 0
1: compute the smallest eigenvalue λ_min(B(θ_0)) and the corresponding eigenvector (x_0^T, −1)^T
2: compute g(θ_0), and set k = 1
3: while not converged do
4:   ι = 1
5:   while sign condition not satisfied do
6:     update θ_{k+1}
7:     compute the smallest eigenvalue λ_min(B(θ_{k+1})) and the corresponding eigenvector (x_{k+1}^T, −1)^T
8:     if g(θ_k) g(θ_{k+1}) < 0 then
9:       ι = ι/2
10:    end if
11:  end while
12: end while
13: x_RTLS = x_{k+1}
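A compact Python sketch of RTLSEVP, with a dense symmetric eigensolver in place of the Rayleigh quotient iteration; the initial guess, tolerances, and the safeguard floor for ι are illustrative choices:

```python
import numpy as np

def rtlsevp(A, b, L, delta, theta0=1.0, tol=1e-8, maxit=50):
    n = A.shape[1]
    Ab = np.hstack([A, b[:, None]])
    M = Ab.T @ Ab
    N = np.zeros((n + 1, n + 1))
    N[:n, :n] = L.T @ L
    N[n, n] = -delta**2

    def g_of(theta):
        # g(theta) from the smallest eigenpair of B(theta) = M + theta*N
        vals, vecs = np.linalg.eigh(M + theta * N)
        y = vecs[:, 0]
        x = -y[:n] / y[n]
        return ((L @ x) @ (L @ x) - delta**2) / (1.0 + x @ x), x

    theta, (gk, x) = theta0, g_of(theta0)
    for _ in range(maxit):
        if abs(gk) <= tol:
            break
        iota = 1.0
        while True:                                  # backtracking on iota
            theta_new = theta + iota * theta / delta**2 * gk
            g_new, x_new = g_of(theta_new)
            if gk * g_new >= 0.0 or iota < 1e-12:    # sign condition
                break
            iota *= 0.5
        theta, gk, x = theta_new, g_new, x_new
    return x
```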

Regularization of TLS problems (RTLSLEP)

Although in general these assumptions are not satisfied, the algorithm may be applied to the modified function g, since generically the smallest eigenvalue of B(θ) is simple and solutions of the RTLS problem correspond to the root of g. However, the method as suggested by Renaut and Guo suffers from two drawbacks.

First, the eigensolver suggested in line 7 of the algorithm for finding the smallest eigenpair of B(θ_{k+1}) is the Rayleigh quotient iteration. Due to the LU factorizations required in each step, this method is very costly. Moreover, such an approach does not exploit the fact that the matrices B(θ_k) converge as θ_k approaches the root of g. We suggest a method which takes advantage of information acquired in previous iteration steps by thick starts.

Secondly, the safeguarding by backtracking hampers the convergence of the method considerably. We propose to replace it by an algorithm which encloses the root in bounds and utilizes the asymptotic behaviour of g.


Regularization of TLS problems (RTLSLEP)

[Figure: Typical g⁻¹]

Regularization of TLS problems (RTLSLEP)

Root finding

Given three pairs (θj, g(θj)), j = 1, 2, 3, with

θ1 < θ2 < θ3 and g(θ1) > 0 > g(θ3), (∗)

we determine the rational interpolation

h(γ) = p(γ) / (γ + δ2), where p is a polynomial of degree 2,

and p is chosen such that h(g(θj)) = θj, j = 1, 2, 3.

If g is strictly monotonically decreasing in [θ1, θ3], then this is a rational interpolation of g−1 : [g(θ3), g(θ1)] → R.

As our next iterate we choose θ4 = h(0). In exact arithmetic θ4 ∈ (θ1, θ3), and we replace θ1 or θ3 by θ4 such that the new triple satisfies (∗).

It may happen that θ4 is not contained in the interval (θ1, θ3), either because the inverse g−1 does not exist on [g(θ3), g(θ1)] or due to rounding errors very close to the root θ. In this case we perform a bisection step so that the interval definitely still contains the root.

If a discontinuity at or close to the root is encountered, then a very small ε = θ3 − θ1 appears together with a relatively large g(θ1) − g(θ3). In this case we terminate the iteration and determine the solution as described before.
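The interpolation conditions translate into a 3×3 linear system for the coefficients of p, and the next iterate is θ4 = h(0) = p(0)/δ2. The following is a minimal Python sketch of one step of this root finder (an illustration under assumptions, not the implementation behind the slides; the bisection safeguard is included):

```python
import numpy as np

def rational_root_step(theta, gval, delta):
    """One step of the enclosing root finder (illustrative sketch).

    theta, gval: length-3 arrays with theta[0] < theta[1] < theta[2] and
    gval[0] > 0 > gval[2].  Fits h(gamma) = p(gamma) / (gamma + delta**2)
    with a quadratic p through (g(theta_j), theta_j) and returns
    theta4 = h(0), falling back to bisection if theta4 leaves the bracket.
    """
    th = np.asarray(theta, dtype=float)
    g = np.asarray(gval, dtype=float)
    d2 = delta**2
    # Interpolation conditions p(g_j) = theta_j * (g_j + delta^2):
    V = np.vander(g, 3, increasing=True)    # columns: 1, g_j, g_j**2
    a = np.linalg.solve(V, th * (g + d2))   # coefficients of p
    theta4 = a[0] / d2                      # h(0) = p(0) / delta^2
    if not (th[0] < theta4 < th[2]):        # safeguard: bisection step
        theta4 = 0.5 * (th[0] + th[2])
    return theta4
```

After evaluating g(θ4) (via the eigenproblem on the next slide), θ1 or θ3 is replaced by θ4 so that the sign condition (∗) is preserved.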

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 77 / 86


Regularization of TLS problems (RTLSLEP)

Solving sequence of eigenproblems

To evaluate g one has to solve an eigenproblem

B(θk)y = (M + θk N)y = λy

with M = [A, b]T [A, b] and the block diagonal matrix

N = ( LT L    0
       0    −δ2 ).

As for the sequence of quadratic eigenproblems in the approach of Sima, Golub and Van Huffel, this can be done with the Nonlinear Arnoldi method and thick starts, taking advantage of the information gathered in previous iteration steps.
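For illustration, a minimal dense Python sketch of one such evaluation (an assumption-laden stand-in: it calls a standard symmetric eigensolver from SciPy instead of the Nonlinear Arnoldi method with thick starts):

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def smallest_eigpair(A, b, L, delta, theta):
    """Smallest eigenpair of B(theta) = M + theta*N with
    M = [A, b]^T [A, b] and N = blkdiag(L^T L, -delta^2)."""
    Ab = np.column_stack([A, b])
    n = A.shape[1]
    N = np.zeros((n + 1, n + 1))
    N[:n, :n] = L.T @ L                 # upper left block: L^T L
    N[n, n] = -delta**2                 # lower right block: -delta^2
    B = Ab.T @ Ab + theta * N           # B(theta) = M + theta * N
    lam, Y = eigsh(B, k=1, which='SA')  # smallest algebraic eigenvalue
    return lam[0], Y[:, 0]
```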

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 78 / 86


Regularization of TLS problems (RTLSLEP)

Numerical example

Numerical examples are obtained from Hansen’s regularization toolbox:

We added white noise to the data of heat(1) with noise levels of 1% and 10%, and chose L1 to be an approximate first order derivative (marked ’a’) and a nonsingular approximation (marked ’b’).

The following table contains the average CPU time for 100 test problems of dimensions n = 1000, n = 2000, and n = 4000 (CPU: Pentium D, 3.4 GHz). The columns LY, SO and NLA refer to the method of Li and Ye, the SOAR method, and the Nonlinear Arnoldi method, respectively.

         n     LY 1a  LY 1b  SO 1a  SO 1b  NLA 1a  NLA 1b  NLA 2a
 1%   1000    0.53   0.47   0.52   0.63   0.35    0.36    0.19
      2000    1.28   1.19   1.13   1.02   1.08    0.99    0.60
      4000    4.94   4.68   4.37   3.78   4.21    3.88    2.65
 10%  1000    0.55   0.46   0.48   0.45   0.36    0.32    0.19
      2000    1.37   1.18   1.19   0.99   1.07    0.98    0.61
      4000    4.95   4.67   4.31   3.73   4.17    3.92    2.54

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 79 / 86


Regularization of TLS problems (RTLSLEP)

Numerical example 2: tomo

’tomo’ is a discretization of a Fredholm integral equation of the first kind with a 2D domain.

L2 = ( L1 ⊗ I
       I ⊗ L1 )

is an approximation of the first order derivative.
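For illustration, a minimal sketch of assembling L2 with Kronecker products in Python (the forward-difference choice for L1 is an assumption, not taken from the slides):

```python
from scipy.sparse import diags, identity, kron, vstack

def build_L2(n):
    """L2 = [L1 kron I; I kron L1] on an n-by-n grid, where L1 is assumed
    to be the (n-1) x n forward-difference matrix."""
    L1 = diags([-1.0, 1.0], [0, 1], shape=(n - 1, n))
    I = identity(n, format='csr')
    return vstack([kron(L1, I), kron(I, L1)]).tocsr()
```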

The following table contains the average CPU time for 100 test problems.

         n      Li&Ye 1a  SOAR 1a  NL Arn. 1a  NL Arn. 1b  NL Arn. 2a
 1%   30x30    0.77      1.01     1.02        1.24        0.20
      40x40    2.62      2.55     2.07        2.81        0.54
      50x50    6.93      6.44     4.78        6.03        3.86
 10%  30x30    0.77      1.02     1.00        1.23        0.21
      40x40    2.63      2.56     2.02        2.88        0.56
      50x50    6.89      6.38     4.80        5.98        3.83

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 80 / 86

Determining the regularization parameter

Outline

1 Total Least Squares Problems

2 Numerical solution of TLS problems

3 Regularization of TLS problems (RTLSQEP)

4 Nonlinear maxmin characterization

5 RTLSQEP: Numerical considerations

6 Numerical example

7 Regularization of TLS problems (RTLSLEP)

8 Determining the regularization parameter

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 81 / 86

Determining the regularization parameter

L-curve

Idea of the L-curve:

Developed to balance ‖Axλ − b‖2 and ‖Lxλ‖2 in the Tikhonov approach ‖Ax − b‖2 + λ‖Lx‖2 = minx

Works as well for ‖Axδ − b‖2 = minx subject to ‖Lxδ‖ ≤ δ

Can be extended to f(xδ) = ‖Axδ − b‖2 / (1 + ‖xδ‖2) = minx s.t. ‖Lxδ‖ ≤ δ

Choose a set of δi, i = 1, . . . , ℓ, and solve one RTLS problem for each δi

Note that there is no discrete Picard condition for RTLS problems, nor does there exist a tool like the GSVD to obtain the L-curve analytically.

We simply try to balance the function value f(xδ) and the size of the norm ‖Lxδ‖, which seems to work fine in most cases.
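A minimal sketch of sampling such an L-curve (solve_rtls stands for any RTLS solver, e.g. RTLSQEP; its name and signature are hypothetical):

```python
import numpy as np

def l_curve_points(solve_rtls, A, b, L, deltas):
    """For each bound delta_i solve the RTLS problem and record the pair
    (||L x||, f(x)); plotted on log-log axes these points form the L-curve
    from which the regularization parameter is picked at the corner."""
    points = []
    for d in deltas:
        x = solve_rtls(A, b, L, d)  # hypothetical solver interface
        f = np.linalg.norm(A @ x - b)**2 / (1.0 + np.linalg.norm(x)**2)
        points.append((np.linalg.norm(L @ x), f))
    return points
```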

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 82 / 86


Determining the regularization parameter

shaw(4000-2000); L-curve

[Figure: L-curve for RTLSQEP — ‖Lx‖ on the horizontal axis against f(x) = ‖Ax − b‖2 / (1 + ‖x‖2) on the vertical axis, both on logarithmic scales.]

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 83 / 86

Determining the regularization parameter

L-curve

Use the Nonlinear Arnoldi method to solve the sequence of RTLS problems. This means solving a sequence of sequences of QEPs. Updating the projected problems is trivial, because all stored matrices are independent of the parameter δi.

Reuse the search space V during the sequence of sequences of eigenproblems. If the search space grows too large, include a restart strategy, sketched after this list.

Restart strategy: if the dimension p ≪ n of the search space span(V) is reached, restart the Nonlinear Arnoldi method with the subspace of q < p eigenvectors corresponding to the rightmost eigenvalues of the last p eigenproblems. The values can be set in advance (e.g. p = 45, q = 5).

Effect of the restart: it purges ’old’ information corresponding to values of δ that highlight other parts of the problem.
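A minimal sketch of this search-space management (an illustration under assumptions: dense arrays, re-orthonormalization by QR; not the implementation behind the timings above):

```python
import numpy as np

def update_search_space(V, ritz_vectors, p=45, q=5):
    """Grow the search space V across the sequence of RTLS problems; once
    its dimension reaches p, restart with only q retained eigenvectors
    (columns of ritz_vectors, assumed ordered by relevance)."""
    if V.shape[1] < p:
        W = np.column_stack([V, ritz_vectors])  # keep enlarging V
    else:
        W = ritz_vectors[:, :q]                 # purge 'old' information
    Q, _ = np.linalg.qr(W)                      # orthonormal basis for next problem
    return Q
```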

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 84 / 86


Determining the regularization parameter

shaw(4000-2000); reuse of entire search space

[Figure: RTLSQEP — QEP residual norm against inner iterations when the entire search space is reused; about 80 inner iterations in total.]

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 85 / 86

Determining the regularization parameter

shaw(4000-2000); reuse of solution vector only

[Figure: RTLSQEP — QEP residual norm against inner iterations when only the solution vector is reused; about 200 inner iterations, considerably more than with full reuse of the search space.]

TUHH Heinrich Voss Total Least Squares Problems Valencia 2010 86 / 86
