
Numerical Methods

Rafał Zdunek

Ill-posed Least Squares Problems (3h.)

(LLS, ill-posed and rank-deficient LS problems, regularization, regularization parameter estimation.)

Introduction

• Linear least squares (LLS) problems,
• Normal equations,
• Basic orthogonal subspaces,
• Linear regression,
• Ill-posed and rank-deficient LS problems,
• Regularization,
• Regularization parameter estimation.

Bibliography

[1] Å. Björck, Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996,
[2] G. Golub, C. F. Van Loan, Matrix Computations (Third Edition), The Johns Hopkins University Press, 1996,
[3] J. Stoer, R. Bulirsch, Introduction to Numerical Analysis (Second Edition), Springer-Verlag, 1993,
[4] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, 2000,
[5] Ch. Zarowski, An Introduction to Numerical Analysis for Electrical and Computer Engineers, Wiley, 2004,
[6] G. Strang, Linear Algebra and Its Applications, Harcourt Brace & Company International Edition, 1998.

Solutions to linear systems

A system of linear equations can be expressed in the following matrix form:

$$\mathbf{Ax} = \mathbf{b}, \quad (1)$$

where A = [a_{ij}] ∈ ℑ^{M×N} is a coefficient matrix, b = [b_i] ∈ ℑ^M is a data vector, and x = [x_j] ∈ ℑ^N is the solution to be estimated. Let B = [A b] ∈ ℑ^{M×(N+1)} be the augmented matrix of the system (1). The system of linear equations may behave in any of three possible ways:

(A) The system has no solution if rank(A) < rank(B).

(B) The system has a single unique solution if rank(A) = rank(B) = N.

(C) The system has infinitely many solutions if rank(A) = rank(B) < N.
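
The three cases can be checked numerically. Below is a minimal sketch in NumPy; the matrix A and vector b are invented examples for illustration only.

```python
import numpy as np

def classify_system(A, b):
    """Classify Ax = b by comparing rank(A) with rank of the augmented matrix B = [A b]."""
    B = np.column_stack([A, b])            # augmented matrix B = [A b]
    rA = np.linalg.matrix_rank(A)
    rB = np.linalg.matrix_rank(B)
    N = A.shape[1]
    if rA < rB:
        return "no solution (inconsistent)"         # case (A)
    elif rA == rB == N:
        return "single unique solution"             # case (B)
    else:
        return "infinitely many solutions"          # case (C)

# Row 2 of A is twice row 1, but b breaks the proportionality -> inconsistent.
A = np.array([[1.0, 2.0], [2.0, 4.0], [0.0, 1.0]])
b = np.array([1.0, 3.0, 3.0])
print(classify_system(A, b))
```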

Solution to inconsistent systems

Given the inconsistent and overdetermined system of linear equations:

$$\mathbf{Ax} = \mathbf{b}, \quad \mathbf{A} \in \mathbb{R}^{M \times N}, \quad \mathbf{b} \in \mathbb{R}^{M}, \quad \mathbf{x} \in \mathbb{R}^{N}, \quad M \geq N,$$

the linear least squares (LLS) problem aims at finding an estimate of the solution vector x such that the forward projection vector Ax is the "best" approximation to the observation vector b. There are many possible ways of defining the "best" solution. A choice which can often be motivated for statistical reasons, and which also leads to a simple computational approach, is to let x be a solution to the following minimization problem:

$$\min_{\mathbf{x}} \|\mathbf{b} - \mathbf{Ax}\|_2$$

This problem originally arose from the need to fit a linear mathematical model to given observations. The best fit in the least-squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by the model.

LLS problem

Usually the LLS problem is expressed by:

$$\min_{\mathbf{x}} \frac{1}{2}\|\mathbf{b} - \mathbf{Ax}\|_2^2.$$

Let Φ(x) be the objective function of the LLS problem:

$$\Phi(\mathbf{x}) = \frac{1}{2}\|\mathbf{b} - \mathbf{Ax}\|_2^2$$

We have:

$$\Phi(\mathbf{x}) = \frac{1}{2}\|\mathbf{b} - \mathbf{Ax}\|_2^2 = \frac{1}{2}(\mathbf{b} - \mathbf{Ax})^T(\mathbf{b} - \mathbf{Ax}) = \frac{1}{2}\left(\mathbf{b}^T\mathbf{b} - \mathbf{x}^T\mathbf{A}^T\mathbf{b} - \mathbf{b}^T\mathbf{Ax} + \mathbf{x}^T\mathbf{A}^T\mathbf{Ax}\right) = \frac{1}{2}\mathbf{b}^T\mathbf{b} - \mathbf{x}^T\mathbf{A}^T\mathbf{b} + \frac{1}{2}\mathbf{x}^T\mathbf{A}^T\mathbf{Ax}.$$

Normal equations

Stationary point of Φ(x): ∇Φ(x) = 0.

$$\nabla_{\mathbf{x}}\Phi(\mathbf{x}) = \nabla_{\mathbf{x}}\left(\frac{1}{2}\mathbf{b}^T\mathbf{b} - \mathbf{x}^T\mathbf{A}^T\mathbf{b} + \frac{1}{2}\mathbf{x}^T\mathbf{A}^T\mathbf{Ax}\right) = \mathbf{A}^T\mathbf{Ax} - \mathbf{A}^T\mathbf{b} = \mathbf{0}$$

Hence:

$$\mathbf{A}^T\mathbf{Ax} = \mathbf{A}^T\mathbf{b} \quad \text{(Normal equations of the first kind)}$$

In general, the matrix AᵀA ∈ ℜ^{N×N} is symmetric and nonnegative definite.

Theorem: If M ≥ N and rank(A) = N, the matrix AᵀA is symmetric and positive definite. In this case, the Least Squares (LS) solution is:

$$\mathbf{x} = \left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{b} = \mathbf{A}^{+}\mathbf{b},$$

where A⁺ is the Moore-Penrose pseudoinverse.

Normal equations

Theorem: If M < N (underdetermined system), the matrix AᵀA is singular, so the formula (AᵀA)⁻¹Aᵀ no longer applies; it is replaced by the equivalent form Aᵀ(AAᵀ)⁻¹, where AAᵀ ∈ ℜ^{M×M} is symmetric and positive definite if rank(A) = M.

Thus:

$$\mathbf{x} = \mathbf{A}^T\left(\mathbf{AA}^T\right)^{-1}\mathbf{b} = \mathbf{A}^{+}\mathbf{b}$$

Solution: solve AAᵀz = b, and set x = Aᵀz (Normal equations of the second kind).

Normal equations

Example for M = 3, N = 2:

$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}$$

$$\mathbf{x} = \left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{b}$$

$$\mathbf{P} = \mathbf{A}\left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T \quad \text{(projection matrix)}$$
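
A minimal NumPy sketch contrasting the two kinds of normal equations; the random test matrices are assumptions for illustration. It verifies that Ax is the projection Pb of b onto R(A), and that the second-kind solution matches the pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined, full column rank: normal equations of the first kind.
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)
x = np.linalg.solve(A.T @ A, A.T @ b)      # x = (A^T A)^{-1} A^T b
P = A @ np.linalg.solve(A.T @ A, A.T)      # projection matrix onto R(A)
print(np.allclose(A @ x, P @ b))           # Ax is the projection of b onto R(A)

# Underdetermined, full row rank: normal equations of the second kind.
A2 = rng.standard_normal((3, 6))
b2 = rng.standard_normal(3)
z = np.linalg.solve(A2 @ A2.T, b2)         # solve A A^T z = b
x2 = A2.T @ z                              # minimal-norm solution x = A^T z
print(np.allclose(x2, np.linalg.pinv(A2) @ b2))
```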

Orthogonal subspaces

R(A) = Im(A) = {b ∈ ℜ^M : b = Ax, x ∈ ℜ^N}  — range of A, or column space,

N(A) = Ker(A) = {x ∈ ℜ^N : Ax = 0}  — nullspace of A,

R(A)^⊥ = N(Aᵀ) = Ker(Aᵀ) = {y ∈ ℜ^M : Aᵀy = 0}  — left nullspace of A, or orthogonal complement of the column space,

N(A)^⊥ = R(Aᵀ) = Im(Aᵀ) = {x ∈ ℜ^N : x = Aᵀb, b ∈ ℜ^M}  — row space of A, or orthogonal complement of the nullspace.

ℜ^M = Im(A) ⊕ Ker(Aᵀ),   ℜ^N = Im(Aᵀ) ⊕ Ker(A),

dim(Ker(Aᵀ)) + dim(Im(Aᵀ)) = dim(Ker(Aᵀ)) + rank(Aᵀ) = M,

dim(Ker(A)) + dim(Im(A)) = dim(Ker(A)) + rank(A) = N.

Orthogonal subspaces

[Figure: the four fundamental subspaces. A vector x ∈ ℜ^N splits as x = x_r + x_n with x_r ∈ Im(Aᵀ) and x_n ∈ Ker(A); then Ax = Ax_r ∈ Im(A) and Ax_n = 0, while Ker(Aᵀ) is the orthogonal complement of Im(A) in ℜ^M.]

Orthogonal subspaces

Let A ∈ ℑ^{M×N}, r = rank(A), and let the SVD of A be given by:

$$\mathbf{A} = \mathbf{U} \begin{bmatrix} \boldsymbol{\Sigma}_r & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \mathbf{V}^H = \sum_{i=1}^{r} \sigma_i \mathbf{u}_i \mathbf{v}_i^H,$$

where U = [u_1, …, u_M] ∈ ℑ^{M×M}, V = [v_1, …, v_N] ∈ ℑ^{N×N}, and Σ_r = diag(σ_1, σ_2, …, σ_r) ∈ ℜ^{r×r}.

The four fundamental orthogonal subspaces:

R(A) = span{u_1, …, u_r} ⊂ ℑ^M,

N(Aᴴ) = span{u_{r+1}, …, u_M} ⊂ ℑ^M (dimension M − r),

R(Aᴴ) = span{v_1, …, v_r} ⊂ ℑ^N,

N(A) = span{v_{r+1}, …, v_N} ⊂ ℑ^N (dimension N − r).
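
The partition above can be read off directly from a computed SVD. A minimal sketch in NumPy follows; the example matrix (with rank 2 by construction) is an assumption for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],     # row 2 = 2 * row 1
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])    # row 1 = row 3 + 2 * row 4, so rank(A) = 2
U, s, Vh = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))           # numerical rank

range_A  = U[:, :r]                # R(A)   = span{u_1, ..., u_r}
null_AH  = U[:, r:]                # N(A^H) = span{u_{r+1}, ..., u_M}
range_AH = Vh[:r, :].T             # R(A^H) = span{v_1, ..., v_r}
null_A   = Vh[r:, :].T             # N(A)   = span{v_{r+1}, ..., v_N}

# Sanity checks: A maps N(A) to zero, and N(A^H) annihilates A from the left.
print(r, np.allclose(A @ null_A, 0), np.allclose(null_AH.T @ A, 0))
```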

Perturbation problems

A system of linear equations Ax = b is inconsistent ⇔ b ∉ R(A).

Consider the perturbed linear system:

$$\mathbf{Ax} = \mathbf{b} = \bar{\mathbf{b}} + \delta\mathbf{b}, \quad \text{where} \quad E(\delta\mathbf{b}) = \mathbf{0}, \quad \mathrm{cov}(\delta\mathbf{b}) = E\left(\delta\mathbf{b}\,\delta\mathbf{b}^T\right) = \sigma^2\mathbf{I}_M$$

(perturbations, e.g. Gaussian noise). Then δb = δb_r + δb_n, where δb_r ∈ R(A) and δb_n ∈ N(Aᵀ).

Since Aᵀb ∈ R(AᵀA) = R(Aᵀ), we have:

$$\mathbf{Pb} = \mathbf{A}\left(\mathbf{A}^T\mathbf{A}\right)^{-1}\mathbf{A}^T\mathbf{b} \in R(\mathbf{A}).$$

Remark: the normal equations boil down an inconsistent system to a consistent one!

Pseudoinverse

Theorem: The general LLS problem:

$$\min_{\mathbf{x} \in S} \|\mathbf{x}\|_2, \quad \text{where} \quad S = \left\{\mathbf{x} \in \mathbb{R}^N : \|\mathbf{b} - \mathbf{Ax}\|_2 = \min\right\},$$

with A ∈ ℑ^{M×N} and rank(A) = r ≤ min(M, N), always has a unique solution, which is called the minimal-norm least squares solution and can be written in terms of the SVD of A as:

$$\mathbf{x} = \mathbf{A}^{+}\mathbf{b},$$

where A⁺ ∈ ℑ^{N×M} is the Moore-Penrose pseudoinverse:

$$\mathbf{A}^{+} = \mathbf{V} \begin{bmatrix} \boldsymbol{\Sigma}_r^{-1} & \mathbf{0} \\ \mathbf{0} & \mathbf{0} \end{bmatrix} \mathbf{U}^H = \sum_{i=1}^{r} \sigma_i^{-1} \mathbf{v}_i \mathbf{u}_i^H.$$
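
A minimal sketch building A⁺ from the SVD sum above and checking it against numpy.linalg.pinv; the rank-1 example matrix and data vector are assumptions for illustration.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])   # 3x2, rank 1
U, s, Vh = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s[0]
r = int(np.sum(s > tol))

# A^+ = sum_{i<=r} sigma_i^{-1} v_i u_i^H, assembled column-block-wise.
A_pinv = (Vh[:r, :].T / s[:r]) @ U[:, :r].T
print(np.allclose(A_pinv, np.linalg.pinv(A)))

b = np.array([1.0, 0.0, 1.0])
x = A_pinv @ b                     # minimal-norm least squares solution
print(x)
```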

Pseudoinverse

The pseudoinverse is uniquely determined by the following conditions:

$$\mathbf{AA}^{+}\mathbf{A} = \mathbf{A}, \quad \mathbf{A}^{+}\mathbf{AA}^{+} = \mathbf{A}^{+}, \quad \left(\mathbf{AA}^{+}\right)^H = \mathbf{AA}^{+}, \quad \left(\mathbf{A}^{+}\mathbf{A}\right)^H = \mathbf{A}^{+}\mathbf{A}.$$

Orthogonal projectors:

$$P_{R(\mathbf{A})} = \mathbf{AA}^{+} = \mathbf{U}_1\mathbf{U}_1^H, \quad P_{R(\mathbf{A}^H)} = \mathbf{A}^{+}\mathbf{A} = \mathbf{V}_1\mathbf{V}_1^H,$$

$$P_{N(\mathbf{A}^H)} = \mathbf{I}_M - \mathbf{AA}^{+} = \mathbf{U}_2\mathbf{U}_2^H, \quad P_{N(\mathbf{A})} = \mathbf{I}_N - \mathbf{A}^{+}\mathbf{A} = \mathbf{V}_2\mathbf{V}_2^H,$$

where U_1 = [u_1, …, u_r] ∈ ℑ^{M×r}, U_2 = [u_{r+1}, …, u_M] ∈ ℑ^{M×(M−r)}, V_1 = [v_1, …, v_r] ∈ ℑ^{N×r}, V_2 = [v_{r+1}, …, v_N] ∈ ℑ^{N×(N−r)}.

Rank-deficient problems

If A ∈ ℑ^{M×N}, M ≥ N, and rank(A) = N (the column vectors are linearly independent), then N(A) = {0} (the nullspace is trivial). The minimal-norm LS solution can be computed from:

$$\mathbf{x} = \mathbf{A}^{+}\mathbf{b} = \left(\mathbf{A}^H\mathbf{A}\right)^{-1}\mathbf{A}^H\mathbf{b},$$

since the matrix AᴴA is nonsingular.

Theorem: If A ∈ ℑ^{M×N}, M ≥ N, and rank(A) < N (rank-deficient problem), then N(A) ≠ {0} (the nullspace is not trivial). The matrix AᴴA is singular! The minimal-norm LS solution may not be the true solution.

Ill-posed LS problems

According to Hadamard's definition, a problem is ill-posed if:

• its solution is not unique, or
• it is not a continuous function of the data – that is, small perturbations of the data may result in large perturbations of the solution.

The degree of ill-posedness can be estimated from the distribution of singular values of the operator (or the system matrix) that transforms the space of solutions into the space of observations (measurements). Let Ax = b, where A is the system matrix.

Ill-posed LS problems

The condition number of a matrix A is defined as:

$$\mathrm{cond}(\mathbf{A}) = \frac{\sigma_{\max}}{\sigma_{\min}},$$

where σ_max and σ_min are the maximal and minimal singular values, respectively. The condition number of an ill-posed problem is large!

Let μ_i denote the decay rate of the singular values. A problem characterized by μ_i = O(i^{−α}), where α ≤ 1 or α > 1, is regarded as mildly or moderately ill-posed, respectively; if μ_i = O(e^{−αi}), the problem is considered severely ill-posed. Rank-deficient LS problems are always severely ill-posed.
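
A small sketch gauging ill-posedness from the singular value spectrum. The discretized Gaussian smoothing kernel used as a model problem is an assumption for illustration; such kernels typically exhibit fast (near-exponential) singular value decay.

```python
import numpy as np

# Discretized 1-D Gaussian blurring kernel (assumed model problem).
N = 64
t = np.linspace(0, 1, N)
A = np.exp(-(t[:, None] - t[None, :])**2 / (2 * 0.03**2))

s = np.linalg.svd(A, compute_uv=False)
print("cond(A) =", s[0] / s[-1])   # very large => ill-posed

# A near-linear drop of log(sigma_i) indicates severe ill-posedness.
print(np.log(s[:8]))
```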

Regularization

Regularization introduces additional a priori information about the desired solution into the underlying model, in order to stabilize an ill-posed problem or to prevent overfitting. This information usually takes the form of a penalty on some spectral components of the solution. In consequence, it imposes restrictions on smoothness or sparsity, or bounds on the vector space norm. From a statistical point of view, many regularization techniques correspond to imposing certain prior distributions on model parameters within the Bayesian framework.

Regularization

Regularization can be achieved in many ways:

• TSVD (Truncated SVD) or FSVD (Filtered SVD): these techniques aim at selecting only those spectral components of the solution (usually low-frequency components) that are not strongly affected by perturbations of the data.

• Morphological constraints: penalty terms, usually added to the objective function, that enforce certain characteristics of the solution, e.g. smoothness, sparsity, unimodality, nonnegativity, box constraints, etc.

• Early termination of iterative updates in iterative solvers (Landweber iterations, Jacobi, Gauss-Newton, CG, GMRES, QMR, LSQR, etc.).

TSVD

The Truncated SVD (TSVD) solution is obtained by selecting only those spectral components (right singular vectors) that correspond to relatively large singular values. Let Ax = b, where A ∈ ℑ^{M×N}, M ≥ N, and A = UΣVᴴ. Then:

$$\mathbf{x}_{LS} = \sum_{i=1}^{k} \frac{\mathbf{u}_i^H\mathbf{b}}{\sigma_i}\,\mathbf{v}_i,$$

where k is the numerical rank, which is usually lower than rank(A). It can be roughly estimated from the distribution of singular values, e.g. as k = max{i : σ_i > τ}, where τ is some threshold.
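
A minimal TSVD sketch in NumPy; the threshold value and the synthetic test problem (a random matrix with decaying singular values) are assumptions for illustration.

```python
import numpy as np

def tsvd_solve(A, b, tau):
    """TSVD solution: keep only components with sigma_i > tau."""
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    k = int(np.sum(s > tau))               # numerical rank estimate
    coeffs = (U[:, :k].T @ b) / s[:k]      # (u_i^H b) / sigma_i
    return Vh[:k, :].T @ coeffs            # sum of the first k components

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20)) @ np.diag(0.5 ** np.arange(20)) \
    @ rng.standard_normal((20, 20))        # singular values decay ~ 0.5^i
x_true = rng.standard_normal(20)
b = A @ x_true + 1e-6 * rng.standard_normal(50)

x_k = tsvd_solve(A, b, tau=1e-4)
print(np.linalg.norm(A @ x_k - b) / np.linalg.norm(b))   # relative residual
```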

FSVD

In general, the regularized solution can be expressed in terms of the SVD as:

$$\mathbf{x}_{LS} = \sum_{i=1}^{N} f_i\,\frac{\mathbf{u}_i^H\mathbf{b}}{\sigma_i}\,\mathbf{v}_i,$$

where {f_i}, i = 1, …, N, are filter factors that take various forms, depending on the regularization technique. For the spectral components that are not strongly affected by the perturbations: f_i = 1; for the remaining components: 0 ≤ f_i < 1.

From the Picard condition: |u_iᴴb| < σ_i, i.e. the coefficients |u_iᴴb| should decay to zero faster than the singular values σ_i.

Tikhonov regularization

The discrete Tikhonov regularization scheme is obtained by adding a penalty term to the objective function. Usually this term is expressed by the discrete smoothing norm Ω(x). Thus the Tikhonov regularized LS problem has the form:

$$\min_{\mathbf{x}} \left\{\frac{1}{2}\|\mathbf{b} - \mathbf{Ax}\|_2^2 + \alpha^2\,\Omega(\mathbf{x})\right\},$$

where α > 0 is the regularization parameter.

Penalty terms

Examples of penalty terms:

1. Ω(x) = ‖Lx‖₁  (the ℓ₁-norm enforces sparseness),

2. Ω(x) = ‖Lx‖₂²  (the ℓ₂-norm enforces smoothness),

3. Ω(x) = ‖L(x − x_prior)‖₂²  (if an a priori estimate is accessible),

4. Ω(x) = ∫_{−∞}^{∞} x ln x dx  (maximization of entropy decreases randomness),

5. Ω(x) = ∫₀¹ |dx/dt| dt  (Total Variation (TV) functional).

Penalty terms

For Ω(x) = ‖Lx‖₂²:

L = I_N — standard Tikhonov regularization (enforces total smoothness; minimizes the norm ‖x‖₂),

$$\mathbf{L} = \begin{bmatrix} -1 & 1 & & & \\ & \ddots & \ddots & & \\ & & -1 & 1 \end{bmatrix} \in \mathbb{R}^{(N-1) \times N} \quad \text{(first derivative operator)},$$

$$\mathbf{L} = \begin{bmatrix} 1 & -2 & 1 & & & \\ & \ddots & \ddots & \ddots & & \\ & & 1 & -2 & 1 \end{bmatrix} \in \mathbb{R}^{(N-2) \times N} \quad \text{(second derivative operator)}.$$

General Tikhonov regularization

Let A ∈ ℜ^{M×N}, M ≥ N, and Ω(x) = ½‖Lx‖₂². Then the regularized objective function is:

$$\Phi(\mathbf{x}) = \frac{1}{2}\|\mathbf{b} - \mathbf{Ax}\|_2^2 + \frac{\alpha^2}{2}\|\mathbf{Lx}\|_2^2$$

From the stationarity of Φ(x), we have ∇Φ(x) = 0:

$$\nabla_{\mathbf{x}}\Phi(\mathbf{x}) = \nabla_{\mathbf{x}}\left(\frac{1}{2}\mathbf{b}^T\mathbf{b} - \mathbf{x}^T\mathbf{A}^T\mathbf{b} + \frac{1}{2}\mathbf{x}^T\mathbf{A}^T\mathbf{Ax} + \frac{\alpha^2}{2}\mathbf{x}^T\mathbf{L}^T\mathbf{Lx}\right) = \mathbf{A}^T\mathbf{Ax} + \alpha^2\mathbf{L}^T\mathbf{Lx} - \mathbf{A}^T\mathbf{b} = \mathbf{0}$$

Thus:

$$\mathbf{x}_{LS} = \left(\mathbf{A}^T\mathbf{A} + \alpha^2\mathbf{L}^T\mathbf{L}\right)^{-1}\mathbf{A}^T\mathbf{b} \quad \text{(Tikhonov regularized solution)}$$
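
A sketch of the general Tikhonov solution with the second-derivative operator L defined earlier; the test problem (random A, smooth ground truth, noise level, α) is an assumption for illustration.

```python
import numpy as np

def tikhonov_solve(A, b, L, alpha):
    """x = (A^T A + alpha^2 L^T L)^{-1} A^T b."""
    return np.linalg.solve(A.T @ A + alpha**2 * (L.T @ L), A.T @ b)

def second_derivative_operator(N):
    """L in R^{(N-2) x N} with stencil rows [..., 1, -2, 1, ...]."""
    L = np.zeros((N - 2, N))
    for i in range(N - 2):
        L[i, i:i + 3] = [1.0, -2.0, 1.0]
    return L

rng = np.random.default_rng(1)
N = 30
A = rng.standard_normal((40, N))
x_true = np.sin(np.linspace(0, np.pi, N))       # smooth ground truth
b = A @ x_true + 0.05 * rng.standard_normal(40)
x_reg = tikhonov_solve(A, b, second_derivative_operator(N), alpha=1.0)
print(np.linalg.norm(x_reg - x_true))
```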

Standard Tikhonov regularization

For L = I_N:

$$\mathbf{x}_{LS} = \left(\mathbf{A}^T\mathbf{A} + \alpha^2\mathbf{I}_N\right)^{-1}\mathbf{A}^T\mathbf{b}$$

Assuming A = UΣVᵀ, we have:

$$\mathbf{x}_{LS} = \left(\mathbf{V}\boldsymbol{\Sigma}^T\boldsymbol{\Sigma}\mathbf{V}^T + \alpha^2\mathbf{I}_N\right)^{-1}\mathbf{V}\boldsymbol{\Sigma}^T\mathbf{U}^T\mathbf{b} = \mathbf{V}\left(\boldsymbol{\Sigma}^T\boldsymbol{\Sigma} + \alpha^2\mathbf{I}_N\right)^{-1}\boldsymbol{\Sigma}^T\mathbf{U}^T\mathbf{b} = \mathbf{V}\,\mathrm{diag}\left(\frac{\sigma_i}{\sigma_i^2 + \alpha^2}\right)\mathbf{U}^T\mathbf{b}$$

Thus the Tikhonov regularized solution can be expressed in terms of the FSVD, with the filter factors:

$$f_i = \frac{\sigma_i^2}{\sigma_i^2 + \alpha^2}$$

If σ_i ≫ α, then f_i ≈ 1; if σ_i ≪ α, then f_i → 0.
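
A quick numerical check (random test data assumed) that the filtered-SVD form with f_i = σ_i²/(σ_i² + α²) reproduces the direct Tikhonov solve.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)
alpha = 0.5

# Direct solve: x = (A^T A + alpha^2 I)^{-1} A^T b
x_direct = np.linalg.solve(A.T @ A + alpha**2 * np.eye(10), A.T @ b)

# Filtered-SVD form: x = sum_i f_i (u_i^T b / sigma_i) v_i
U, s, Vh = np.linalg.svd(A, full_matrices=False)
f = s**2 / (s**2 + alpha**2)                   # Tikhonov filter factors
x_fsvd = Vh.T @ (f * (U.T @ b) / s)
print(np.allclose(x_direct, x_fsvd))
```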

Iterative Tikhonov regularization

Let x_0 ∈ ℜ^N. For k = 1, 2, …:

$$\mathbf{x}_k = \mathbf{x}_{k-1} + \left(\mathbf{A}^T\mathbf{A} + \alpha^2\mathbf{L}^T\mathbf{L}\right)^{-1}\mathbf{A}^T\left(\mathbf{b} - \mathbf{Ax}_{k-1}\right)$$

The limit point for the consistent and inconsistent cases is the same as in the Landweber iterations.

For L = I_N:

$$\mathbf{x}_k = \mathbf{x}_n^0 + \sum_{i=1}^{\mathrm{rank}(\mathbf{A})} f_i^{(k)}\,\frac{\mathbf{u}_i^T\mathbf{b}_r}{\sigma_i}\,\mathbf{v}_i + \sum_{i=1}^{N}\left(1 - f_i^{(k)}\right)\left(\mathbf{v}_i^T\mathbf{x}_0\right)\mathbf{v}_i, \quad \text{where} \quad f_i^{(k)} = 1 - \left(\frac{\alpha^2}{\sigma_i^2 + \alpha^2}\right)^k.$$

For L ≠ I_N: lim_{k→∞} x_k may have a nullspace component in N(A) different from P_{N(A)}(x_0).
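
A sketch of the iteration above (test problem assumed); each pass applies the regularized correction to the current residual, and the filter factors f_i^{(k)} approach 1 as k grows.

```python
import numpy as np

def iterated_tikhonov(A, b, L, alpha, x0, iters):
    """x_k = x_{k-1} + (A^T A + alpha^2 L^T L)^{-1} A^T (b - A x_{k-1})."""
    M_alpha = A.T @ A + alpha**2 * (L.T @ L)
    x = x0.copy()
    for _ in range(iters):
        x = x + np.linalg.solve(M_alpha, A.T @ (b - A @ x))
    return x

rng = np.random.default_rng(3)
A = rng.standard_normal((40, 15))
x_true = rng.standard_normal(15)
b = A @ x_true + 0.01 * rng.standard_normal(40)
L = np.eye(15)                                   # standard case L = I_N
for k in (1, 5, 20):
    x_k = iterated_tikhonov(A, b, L, alpha=1.0, x0=np.zeros(15), iters=k)
    print(k, np.linalg.norm(x_k - x_true))
```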

Tikhonov regularization

For x_0 = 0 and k = 1, the iterative Tikhonov regularization (ITR) reduces to the standard Tikhonov regularization:

$$\mathbf{x}_{reg} = \left(\mathbf{A}^T\mathbf{A} + \alpha^2\mathbf{I}_N\right)^{-1}\mathbf{A}^T\mathbf{b} = \sum_{i=1}^{N} f_i\,\frac{\mathbf{u}_i^T\mathbf{b}}{\sigma_i}\,\mathbf{v}_i, \quad f_i = \frac{\sigma_i^2}{\sigma_i^2 + \alpha^2}.$$

Distance between the LS and regularized solutions:

$$d\left(\mathbf{x}_{LS}, \mathbf{x}_{reg}\right) = \sum_{i=1}^{N}\left[\underbrace{\left(1 - f_i\right)\frac{\mathbf{u}_i^T\mathbf{b}_r}{\sigma_i}}_{\varepsilon_{reg}} - \underbrace{f_i\,\frac{\mathbf{u}_i^T\delta\mathbf{b}}{\sigma_i}}_{\varepsilon_{per}}\right],$$

where ε_reg is the regularization error and ε_per is the perturbation error. To balance the errors, the OCV or GCV criteria may be applied.

Regularization parameter estimation

Typical techniques for estimating the regularization parameter are:

• L-curve,
• Cross-Validation.

The Generalized Cross-Validation (GCV) assumes that the components of b are subject to perturbations δb, where E(δb) = 0 and cov(δb) = E(δb δbᵀ) = σ²I_M, and the variance σ² may not be known.

Assume:

$$\mathbf{x}_{LS} = \mathbf{M}_{\alpha}^{-1}\mathbf{A}^T\mathbf{b}, \quad \text{where} \quad \mathbf{M}_{\alpha} = \mathbf{A}^T\mathbf{A} + \alpha^2\mathbf{L}^T\mathbf{L}.$$

GCV function

The predicted values of b can be expressed in terms of the system:

$$\mathbf{Ax}_{\alpha} = \mathbf{P}_{\alpha}\mathbf{b}, \quad \text{where} \quad \mathbf{P}_{\alpha} = \mathbf{A}\mathbf{M}_{\alpha}^{-1}\mathbf{A}^T.$$

When σ² is not known, α may be chosen to minimize the Generalized Cross-Validation (GCV) function:

$$G(\alpha) = \frac{M^{-1}\|\mathbf{Ax}_{\alpha} - \mathbf{b}\|_2^2}{\left(M^{-1}\,\mathrm{trace}\left(\mathbf{I}_M - \mathbf{P}_{\alpha}\right)\right)^2} \;\overset{\text{Tikhonov}}{=}\; \frac{M^{-1}\|\mathbf{Ax}_{\alpha} - \mathbf{b}\|_2^2}{\left(M^{-1}\left(M - \sum_{i=1}^{\mathrm{rank}(\mathbf{A})} f_i\right)\right)^2}.$$
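
A sketch minimizing G(α) over a grid of α values for standard Tikhonov (L = I), using the fact that trace(I_M − P_α) = M − Σᵢ f_i; the test problem and the grid are assumptions for illustration.

```python
import numpy as np

def gcv(A, b, alpha):
    """GCV function for standard Tikhonov regularization (L = I)."""
    M = A.shape[0]
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    f = s**2 / (s**2 + alpha**2)               # filter factors
    P_b = U @ (f * (U.T @ b))                  # P_alpha b = U diag(f) U^T b
    num = np.sum((P_b - b)**2) / M
    den = ((M - np.sum(f)) / M)**2             # trace(I - P_alpha) = M - sum(f)
    return num / den

rng = np.random.default_rng(4)
A = rng.standard_normal((50, 20))
b = A @ rng.standard_normal(20) + 0.1 * rng.standard_normal(50)
alphas = np.logspace(-4, 2, 60)
best = min(alphas, key=lambda a: gcv(A, b, a))
print("alpha_GCV =", best)
```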

Image Reconstruction (Tikhonov Regularization)

[Figure: GCV function computed using Hansen's toolbox (1998), with λ = α; the plotted quantity is the distance d(x_LS, x_reg).]

L-curve

The L-curve has two distinct segments:

• a horizontal segment, where the regularization errors dominate,
• a vertical segment, where the perturbation errors dominate.

The corner between the two segments corresponds to the optimal solution.
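
A sketch computing L-curve points (residual norm ‖Ax_α − b‖₂ against solution norm ‖x_α‖₂) over a sweep of α; plotting these on a log-log scale exposes the corner. The test problem is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((50, 20))
b = A @ rng.standard_normal(20) + 0.1 * rng.standard_normal(50)

for alpha in np.logspace(-3, 1, 9):
    x = np.linalg.solve(A.T @ A + alpha**2 * np.eye(20), A.T @ b)
    # One L-curve point per alpha: (residual norm, solution norm).
    print(alpha, np.linalg.norm(b - A @ x), np.linalg.norm(x))
```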

Linear regression

In linear regression the aim is to fit the linear model y(t) = α + βt to given data (y_i, t_i), i = 1, …, M. This leads to the overdetermined linear system:

$$\begin{bmatrix} 1 & t_1 \\ 1 & t_2 \\ \vdots & \vdots \\ 1 & t_M \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_M \end{bmatrix}$$

From the normal equations:

$$\begin{bmatrix} M & \sum_{i=1}^{M} t_i \\ \sum_{i=1}^{M} t_i & \sum_{i=1}^{M} t_i^2 \end{bmatrix} \begin{bmatrix} \alpha \\ \beta \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^{M} y_i \\ \sum_{i=1}^{M} y_i t_i \end{bmatrix}$$

Linear regression

The LS solution is given by:

$$\beta = \frac{\sum_{i=1}^{M} y_i t_i - M\,\bar{y}\,\bar{t}}{\sum_{i=1}^{M} t_i^2 - M\,\bar{t}^2}, \quad \alpha = \bar{y} - \beta\,\bar{t},$$

where

$$\bar{y} = \frac{1}{M}\sum_{i=1}^{M} y_i, \quad \bar{t} = \frac{1}{M}\sum_{i=1}^{M} t_i$$

are the mean values. Note that the point (t̄, ȳ) lies on the fitted line.
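
A minimal sketch of the closed-form fit above; the sample data are an assumption for illustration. The explicit formulas for β and α agree with the normal-equations solve.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])      # roughly y = 1.1 + 2t
M = len(t)

t_bar, y_bar = t.mean(), y.mean()
beta = (np.sum(y * t) - M * y_bar * t_bar) / (np.sum(t**2) - M * t_bar**2)
alpha = y_bar - beta * t_bar
print(alpha, beta)

# Cross-check against the normal equations with the design matrix [1 t].
T = np.column_stack([np.ones(M), t])
print(np.linalg.solve(T.T @ T, T.T @ y))
```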