Statistical Regularization Approaches for Linear/Nonlinear Inverse Problems: Hybrids
Rosemary Renaut
Colorado State University, Fort Collins
July 8, 2009
National Science Foundation: Division of Computational Mathematics 1 / 18
Least Squares for Ax = b: A Quick Review
Consider discrete systems: A ∈ Rm×n, b ∈ Rm, x ∈ Rn,
Ax = b + e,
where e is the noise in the data.
Classical Approach: Linear Least Squares (A full rank)
x_LS = arg min_x ‖Ax − b‖₂²
Difficulty: x_LS is sensitive to changes in the right-hand side b when A is ill-conditioned.
Regularization is used to stabilize the solution.
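This sensitivity can be seen numerically. A minimal numpy sketch (my own illustration, not from the talk), using the classically ill-conditioned Hilbert matrix:

```python
import numpy as np

# Illustration: for ill-conditioned A, a tiny perturbation of b produces a
# large change in the least-squares solution.
n = 8
# Hilbert matrix: a standard ill-conditioned example (cond ~ 1e10 for n = 8)
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
b = A @ x_true

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

rng = np.random.default_rng(0)
e = 1e-8 * rng.standard_normal(n)              # tiny perturbation of b
x_pert = np.linalg.lstsq(A, b + e, rcond=None)[0]

# ||x_pert - x_ls|| / ||e|| can be as large as 1/sigma_min(A)
amplification = np.linalg.norm(x_pert - x_ls) / np.linalg.norm(e)
```

Here the perturbation is amplified by many orders of magnitude, which is exactly why regularization is needed.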
Example Signal Restoration
[Figure: (a) Original, (b) Noisy, (c) Unregularized, (d) Regularized]
Introduce Generalized Tikhonov Regularization
Weighted Fidelity with Regularization
x_RLS(λ) = arg min_x { ‖b − Ax‖²_{W_b} + λ²‖D(x − x₀)‖² }
W_b is a weighting matrix on the data-fidelity term.
D is a suitable operator, often derivative approximation.
Assume N(A) ∩ N(D) = {0}.
x₀ is a reference solution, often x₀ = 0.
λ is a regularization parameter, which is unknown.
The solution x_RLS(λ) depends on λ, D and W_b.
Having found λ, the posterior inverse covariance matrix is
W̃_x = A^T W_b A + λ²I
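The generalized Tikhonov problem above can be solved as an equivalent stacked least-squares problem. A hedged numpy sketch (the function name and the diagonal-W_b restriction are my own simplifications):

```python
import numpy as np

def tikhonov(A, b, D, lam, wb=None, x0=None):
    """min ||b - A x||^2_Wb + lam^2 ||D(x - x0)||^2, with Wb = diag(wb).

    Solved via the equivalent stacked least-squares problem
        || [ L A   ]     [ L b      ] ||
        || [ lam D ] x - [ lam D x0 ] ||   where  Wb = L^T L.
    """
    m, n = A.shape
    wb = np.ones(m) if wb is None else wb
    x0 = np.zeros(n) if x0 is None else x0
    L = np.sqrt(wb)                      # Cholesky factor of the diagonal Wb
    K = np.vstack([L[:, None] * A, lam * D])
    rhs = np.concatenate([L * b, lam * (D @ x0)])
    return np.linalg.lstsq(K, rhs, rcond=None)[0]
```

With λ = 0 this reduces to ordinary least squares; as λ → ∞ (and D invertible) the solution approaches x₀.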
Choice of λ is crucial
Different algorithms yield different solutions.
Discrepancy Principle
L-Curve
Generalized Cross Validation (GCV)
Unbiased Predictive Risk Estimation (UPRE)
χ² Method
Residual Periodogram and related approaches (O’Leary et al)
Some standard approaches I: L-curve - Find the corner
Let r(λ) = (A(λ) − A)b, where the influence matrix is A(λ) = A(A^T W_b A + λ² D^T D)^{-1} A^T.
Plot (log‖r(λ)‖, log‖Dx(λ)‖) and trade off the two contributions.
Expensive: requires a range of λ values.
GSVD makes the calculations efficient.
Not statistically based
[Plots: an L-curve with a corner to find; an L-curve with no corner]
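A minimal sketch of sampling the L-curve (my own code; for simplicity D = I and W_b = I, so the plain SVD replaces the GSVD):

```python
import numpy as np

def l_curve_points(A, b, lams):
    """Return (log ||r(lam)||, log ||x(lam)||) for each lambda (D = I, Wb = I)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    pts = []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)           # Tikhonov filter factors
        x = Vt.T @ (f * beta / s)            # regularized solution
        r = b - A @ x
        pts.append((np.log(np.linalg.norm(r)), np.log(np.linalg.norm(x))))
    return np.array(pts)
```

The residual norm grows and the solution norm shrinks monotonically with λ; the corner, when it exists, marks the trade-off.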
Generalized Cross-Validation (GCV)
Let A(λ) = A(A^T W_b A + λ² D^T D)^{-1} A^T.
Can pick W_b = I.
Minimize the GCV function
G(λ) = ‖b − Ax(λ)‖²_{W_b} / [trace(I_m − A(λ))]²,
which estimates the predictive risk.
Expensive: requires a range of λ values.
GSVD makes the calculations efficient.
A minimum must be found
There may be multiple minima
The function is sometimes flat
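A sketch of evaluating the GCV function via the SVD (my code, with W_b = I and D = I; in practice one would factor A once rather than per call):

```python
import numpy as np

def gcv(lam, A, b):
    """G(lam) = ||b - A x(lam)||^2 / [trace(I_m - A(lam))]^2  (Wb = I, D = I)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    f = s**2 / (s**2 + lam**2)                     # Tikhonov filter factors
    # residual: filtered in-range part plus the part of b outside range(A)
    resid2 = np.sum(((1 - f) * beta)**2) + (b @ b - beta @ beta)
    trace = A.shape[0] - np.sum(f)                 # trace(I_m - A(lam))
    return resid2 / trace**2
```

λ is then chosen as the minimizer, e.g. over a logarithmic grid.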
Unbiased Predictive Risk Estimation (UPRE)
Minimize the expected value of the predictive risk: minimize the UPRE function
U(λ) = ‖b − Ax(λ)‖²_{W_b} + 2 trace(A(λ)) − m
Expensive: requires a range of λ values.
GSVD makes the calculations efficient.
Needs an estimate of the trace
A minimum must be found
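The corresponding SVD evaluation of the UPRE function (again my sketch, taking W_b = I/σ² and D = I; σ must be known or estimated):

```python
import numpy as np

def upre(lam, A, b, sigma):
    """U(lam) = ||b - A x(lam)||^2_Wb + 2 trace(A(lam)) - m,  Wb = I/sigma^2."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    f = s**2 / (s**2 + lam**2)                     # Tikhonov filter factors
    resid2 = np.sum(((1 - f) * beta)**2) + (b @ b - beta @ beta)
    return resid2 / sigma**2 + 2 * np.sum(f) - A.shape[0]
```

Here trace(A(λ)) = Σ fᵢ, the sum of the filter factors, so no explicit trace estimation is needed once the SVD is available.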
phillips Fredholm integral equation (Hansen)
1. Add noise to b
2. Standard deviation σ_{b_i} = .01|b_i| + .1 b_max
3. Covariance matrix C_b = σ_b² I_m
4. σ_b² is the average of the σ_{b_i}²
5. Uncertainty estimates shown
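The stated noise model, sketched in numpy (the signal here is a stand-in, not Hansen's phillips data):

```python
import numpy as np

rng = np.random.default_rng(1)
b = np.sin(np.linspace(0.0, np.pi, 64))         # placeholder for the exact data
# per-component standard deviations: sigma_i = 0.01 |b_i| + 0.1 b_max
sigma_i = 0.01 * np.abs(b) + 0.1 * b.max()
b_noisy = b + sigma_i * rng.standard_normal(b.size)
# white-noise approximation: C_b = sigma_b^2 I with sigma_b^2 the mean sigma_i^2
sigma_b2 = np.mean(sigma_i**2)
```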
phillips Fredholm integral equation (Hansen)
Comparison
Solutions with Uncertainty
(e) L-curve Solution (f) GCV Solution
(g) UPRE Solution (h) χ² curve Solution
The Discrepancy Principle
Suppose the noise is white: C_b = σ_b² I.
Find λ such that the regularized residual satisfies
σ_b² = (1/m)‖b − Ax(λ)‖₂²   (1)
Can be implemented with a Newton root-finding algorithm.
But the discrepancy principle typically oversmooths.
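A Newton root-finding sketch for the discrepancy equation (my code: white noise, D = I; via the SVD, ‖r(λ)‖² = Σᵢ βᵢ²λ⁴/(sᵢ²+λ²)² plus the part of b outside range(A)):

```python
import numpy as np

def discrepancy_lambda(A, b, sigma, lam0=1.0, maxit=50, tol=1e-12):
    """Newton iteration for F(lam) = ||b - A x(lam)||^2 - m sigma^2 = 0."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    out = b @ b - beta @ beta            # residual component outside range(A)
    m = len(b)
    lam = lam0
    for _ in range(maxit):
        d = s**2 + lam**2
        F = np.sum(beta**2 * lam**4 / d**2) + out - m * sigma**2
        dF = np.sum(beta**2 * 4 * lam**3 * s**2 / d**3)   # F'(lam) > 0
        step = F / dF
        lam = max(lam - step, lam / 10)  # safeguard keeps lambda positive
        if abs(step) < tol * max(lam, 1.0):
            break
    return lam
```

F is monotone increasing in λ, so the safeguarded Newton iteration converges quickly when a root exists (i.e. when the unregularized residual is below mσ²).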
The χ² Method
Solve so the functional attains its expected χ² value:
‖b − Ax(λ)‖²_{W_b} + λ²‖D(x − x₀)‖² = m
Newton root finding: efficient
Small problems: use the GSVD
Estimate of the mean x₀ needed
Noise distribution W_b needed
But extends to the case where x₀ is not known.
The χ² curve is monotonic, giving fast convergence.
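A Newton sketch for the χ² root finding (my code: W_b = I/σ², D = I, x₀ = 0). Scaling Ã = A/σ, b̃ = b/σ, the minimum value of the functional reduces via the SVD to F(λ) = Σᵢ β̃ᵢ²λ²/(s̃ᵢ²+λ²) + ‖b̃_out‖², which is monotone increasing in λ:

```python
import numpy as np

def chi2_lambda(A, b, sigma, lam0=1.0, maxit=50, tol=1e-12):
    """Newton root finding for the chi^2 principle: F(lam) = m."""
    U, s, _ = np.linalg.svd(A / sigma, full_matrices=False)
    bt = b / sigma
    beta = U.T @ bt
    out = bt @ bt - beta @ beta        # component of b~ outside range(A)
    m = len(b)
    lam = lam0
    for _ in range(maxit):
        d = s**2 + lam**2
        F = np.sum(beta**2 * lam**2 / d) + out - m
        dF = np.sum(beta**2 * 2 * lam * s**2 / d**2)   # F'(lam) > 0
        step = F / dF
        lam = max(lam - step, lam / 10)                # keep lambda positive
        if abs(step) < tol * max(lam, 1.0):
            break
    return lam
```

Monotonicity of F is what makes this Newton iteration converge in few iterations, as noted on the slide.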
Relation between the Discrepancy Principle and the χ² Method
χ2 Method:
‖b − Ax(λ)‖²_{W_b} + λ²‖D(x − x₀)‖² = m
Discrepancy:
‖b − Ax(λ)‖²_{W_b} = m
Both are quick for a given A: use a Newton algorithm and converge in few iterations.
Large Scale Problems
Solve the global problem using the LSQR algorithm: requires only products with A and A^T.
Projects the problem to a small subproblem.
Hybrids solve the regularization on the projected problem.
Defining the regularization on the projected problem needs care: Nagy uses a weighted GCV, and for the χ² method the number of degrees of freedom is reduced.
The approach is viable.
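LSQR's built-in damping applies Tikhonov regularization on the projected (bidiagonalized) problem, the building block of these hybrids. A small scipy sketch (my example, on a dense problem small enough to check against the full Tikhonov solve):

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# LSQR needs only products with A and A^T; `damp` adds lam^2 ||x||^2.
rng = np.random.default_rng(0)
m, n = 200, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.ones(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)

lam = 0.1
x_proj = lsqr(A, b, damp=lam, atol=1e-10, btol=1e-10)[0]

# Agrees with the full Tikhonov solution (A^T A + lam^2 I) x = A^T b
x_full = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)
```

In a true hybrid, λ would be chosen adaptively on the projected problem rather than fixed in advance as here.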
Observations
1. The χ² principle is similar in spirit to the discrepancy principle
2. Can be used for large-scale problems
3. Uses statistical information on errors in the right-hand side
4. Finds the solution efficiently
5. May be used to provide uncertainty estimates
6. Remember: it uses a weighted norm
Nonlinear Least Squares for Inverse Problems: Well Known
Solve for parameters q, potentially defined by the solution of a PDE. Nonlinear least squares for the residual vector R(q):
q_opt = arg min_q (1/2)‖R(q)‖² = arg min_q f(q)
Basic damped Gauss-Newton algorithm:
q^(k+1) = q^(k) + δ p^(k+1),
where δ is a line-search step and p is a search direction solving the Jacobian system
J(q)p ≈ −R(q)
But the Jacobian is usually ill-conditioned: Levenberg-Marquardt introduces regularization.
Typical approaches use the Morozov discrepancy principle to estimate an acceptable solution.
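A damped Gauss-Newton sketch with Levenberg-Marquardt regularization (my code; a fixed λ and a simple Armijo backtracking line search stand in for the parameter-choice rules discussed in this talk):

```python
import numpy as np

def gauss_newton_lm(R, J, q0, lam=1e-2, maxit=50, tol=1e-10):
    """Damped Gauss-Newton: each step solves the regularized normal equations
    (J^T J + lam^2 I) p = -J^T R(q), then backtracks on f(q) = 0.5||R(q)||^2."""
    q = np.asarray(q0, float)
    for _ in range(maxit):
        r = R(q)
        Jq = J(q)
        g = Jq.T @ r                                   # gradient of f
        p = np.linalg.solve(Jq.T @ Jq + lam**2 * np.eye(q.size), -g)
        f0 = 0.5 * (r @ r)
        delta = 1.0                                    # line-search parameter
        while delta > 1e-8 and \
                0.5 * np.sum(R(q + delta * p)**2) > f0 + 1e-4 * delta * (g @ p):
            delta *= 0.5                               # Armijo backtracking
        q = q + delta * p
        if np.linalg.norm(delta * p) < tol:
            break
    return q
```

Replacing the fixed λ with a discrepancy- or χ²-based update at each outer iteration gives the regularized schemes discussed next.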
For Discussion
Suggest using LSQR with the χ² principle, replacing the discrepancy principle.
Solve on the projected problem; the projected problem has to be updated each outer iteration.
The cost of forming the projected problem, though, is cheap.
Size of projected problem is generally small.
Provides alternative to trust region based approaches.
Can still introduce sufficient decrease conditions.
Use weighted norm based on errors in estimates.