Statistical Regularization Approaches for Linear/Nonlinear Inverse Problems: Hybrids
Rosemary Renaut
Colorado State University, Fort Collins
July 8, 2009
National Science Foundation: Division of Computational Mathematics 1 / 18
Least Squares for Ax = b: A Quick Review
Consider discrete systems: A ∈ Rm×n, b ∈ Rm, x ∈ Rn,
Ax = b + e,
where e is the noise in the data.
Classical Approach: Linear Least Squares (A full rank)
x_LS = arg min_x ‖Ax − b‖₂²
Difficulty: x_LS is sensitive to changes in the right-hand side b when A is ill-conditioned.
Regularization is used to stabilize the solution.
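This sensitivity can be seen numerically. A minimal numpy sketch (my own illustration, not from the talk), using the classically ill-conditioned Hilbert matrix:

```python
import numpy as np

# Illustration: for ill-conditioned A, a tiny perturbation of b produces a
# large change in the least-squares solution.
n = 8
# Hilbert matrix: a standard ill-conditioned example (cond ~ 1e10 for n = 8)
A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])
x_true = np.ones(n)
b = A @ x_true

x_ls = np.linalg.lstsq(A, b, rcond=None)[0]

rng = np.random.default_rng(0)
e = 1e-8 * rng.standard_normal(n)              # tiny perturbation of b
x_pert = np.linalg.lstsq(A, b + e, rcond=None)[0]

# ||x_pert - x_ls|| / ||e|| can be as large as 1/sigma_min(A)
amplification = np.linalg.norm(x_pert - x_ls) / np.linalg.norm(e)
```

Here the perturbation is amplified by many orders of magnitude, which is exactly why regularization is needed.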
Example Signal Restoration
[Figure: (a) Original, (b) Noisy, (c) Unregularized, (d) Regularized]
Introduce Generalized Tikhonov Regularization
Weighted Fidelity with Regularization
x_RLS(λ) = arg min_x { ‖b − Ax‖²_{W_b} + λ²‖D(x − x₀)‖² }
W_b is a weighting matrix on the data-fidelity term.
D is a suitable operator, often derivative approximation.
Assume N(A) ∩ N(D) = {0}.
x₀ is a reference solution, often x₀ = 0.
λ is a regularization parameter, which is unknown.
The solution x_RLS(λ) depends on λ, D and W_b.
Having found λ, the posterior inverse covariance matrix is
W̃_x = A^T W_b A + λ²I
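The generalized Tikhonov problem above can be solved as an equivalent stacked least-squares problem. A hedged numpy sketch (the function name and the diagonal-W_b restriction are my own simplifications):

```python
import numpy as np

def tikhonov(A, b, D, lam, wb=None, x0=None):
    """min ||b - A x||^2_Wb + lam^2 ||D(x - x0)||^2, with Wb = diag(wb).

    Solved via the equivalent stacked least-squares problem
        || [ L A   ]     [ L b      ] ||
        || [ lam D ] x - [ lam D x0 ] ||   where  Wb = L^T L.
    """
    m, n = A.shape
    wb = np.ones(m) if wb is None else wb
    x0 = np.zeros(n) if x0 is None else x0
    L = np.sqrt(wb)                      # Cholesky factor of the diagonal Wb
    K = np.vstack([L[:, None] * A, lam * D])
    rhs = np.concatenate([L * b, lam * (D @ x0)])
    return np.linalg.lstsq(K, rhs, rcond=None)[0]
```

With λ = 0 this reduces to ordinary least squares; as λ → ∞ (and D invertible) the solution approaches x₀.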
Choice of λ is crucial
Different algorithms yield different solutions.
Discrepancy Principle
L-Curve
Generalized Cross Validation (GCV)
Unbiased Predictive Risk Estimation (UPRE)
χ² Method
Residual Periodogram and related approaches (O’Leary et al)
Some standard approaches I: L-curve - Find the corner
Let r(λ) = (A(λ) − A)b, where the influence matrix is A(λ) = A(A^T W_b A + λ² D^T D)^{-1} A^T.
Plot (log‖r(λ)‖, log‖Dx(λ)‖) and trade off the two contributions.
Expensive: requires a range of λ values.
GSVD makes the calculations efficient.
Not statistically based
[Plots: an L-curve with a corner to find; an L-curve with no corner]
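A minimal sketch of sampling the L-curve (my own code; for simplicity D = I and W_b = I, so the plain SVD replaces the GSVD):

```python
import numpy as np

def l_curve_points(A, b, lams):
    """Return (log ||r(lam)||, log ||x(lam)||) for each lambda (D = I, Wb = I)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    pts = []
    for lam in lams:
        f = s**2 / (s**2 + lam**2)           # Tikhonov filter factors
        x = Vt.T @ (f * beta / s)            # regularized solution
        r = b - A @ x
        pts.append((np.log(np.linalg.norm(r)), np.log(np.linalg.norm(x))))
    return np.array(pts)
```

The residual norm grows and the solution norm shrinks monotonically with λ; the corner, when it exists, marks the trade-off.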
Generalized Cross-Validation (GCV)
Let A(λ) = A(A^T W_b A + λ² D^T D)^{-1} A^T.
Can pick W_b = I.
Minimize the GCV function
G(λ) = ‖b − Ax(λ)‖²_{W_b} / [trace(I_m − A(λ))]²,
which estimates the predictive risk.
Expensive: requires a range of λ values.
GSVD makes the calculations efficient.
A minimum must be found
There may be multiple minima
The function is sometimes flat
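A sketch of evaluating the GCV function via the SVD (my code, with W_b = I and D = I; in practice one would factor A once rather than per call):

```python
import numpy as np

def gcv(lam, A, b):
    """G(lam) = ||b - A x(lam)||^2 / [trace(I_m - A(lam))]^2  (Wb = I, D = I)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    f = s**2 / (s**2 + lam**2)                     # Tikhonov filter factors
    # residual: filtered in-range part plus the part of b outside range(A)
    resid2 = np.sum(((1 - f) * beta)**2) + (b @ b - beta @ beta)
    trace = A.shape[0] - np.sum(f)                 # trace(I_m - A(lam))
    return resid2 / trace**2
```

λ is then chosen as the minimizer, e.g. over a logarithmic grid.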
Unbiased Predictive Risk Estimation (UPRE)
Minimize the expected value of the predictive risk: minimize the UPRE function
U(λ) = ‖b − Ax(λ)‖²_{W_b} + 2 trace(A(λ)) − m
Expensive: requires a range of λ values.
GSVD makes the calculations efficient.
Needs an estimate of the trace
A minimum must be found
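The corresponding SVD evaluation of the UPRE function (again my sketch, taking W_b = I/σ² and D = I; σ must be known or estimated):

```python
import numpy as np

def upre(lam, A, b, sigma):
    """U(lam) = ||b - A x(lam)||^2_Wb + 2 trace(A(lam)) - m,  Wb = I/sigma^2."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    f = s**2 / (s**2 + lam**2)                     # Tikhonov filter factors
    resid2 = np.sum(((1 - f) * beta)**2) + (b @ b - beta @ beta)
    return resid2 / sigma**2 + 2 * np.sum(f) - A.shape[0]
```

Here trace(A(λ)) = Σ fᵢ, the sum of the filter factors, so no explicit trace estimation is needed once the SVD is available.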
phillips Fredholm integral equation (Hansen)
1. Add noise to b
2. Standard deviation σ_{b_i} = .01|b_i| + .1 b_max
3. Covariance matrix C_b = σ_b² I_m
4. σ_b² is the average of the σ_{b_i}²
5. Uncertainty estimates shown
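The stated noise model, sketched in numpy (the signal here is a stand-in, not Hansen's phillips data):

```python
import numpy as np

rng = np.random.default_rng(1)
b = np.sin(np.linspace(0.0, np.pi, 64))         # placeholder for the exact data
# per-component standard deviations: sigma_i = 0.01 |b_i| + 0.1 b_max
sigma_i = 0.01 * np.abs(b) + 0.1 * b.max()
b_noisy = b + sigma_i * rng.standard_normal(b.size)
# white-noise approximation: C_b = sigma_b^2 I with sigma_b^2 the mean sigma_i^2
sigma_b2 = np.mean(sigma_i**2)
```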
phillips Fredholm integral equation (Hansen)
Comparison
Solutions with Uncertainty
(e) L-curve Solution (f) GCV Solution
(g) UPRE Solution (h) χ² curve Solution
The Discrepancy Principle
Suppose the noise is white: C_b = σ_b² I.
Find λ such that the regularized residual satisfies
σ_b² = (1/m)‖b − Ax(λ)‖₂²   (1)
Can be implemented with a Newton root-finding algorithm.
But the discrepancy principle typically oversmooths.
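A Newton root-finding sketch for the discrepancy equation (my code: white noise, D = I; via the SVD, ‖r(λ)‖² = Σᵢ βᵢ²λ⁴/(sᵢ²+λ²)² plus the part of b outside range(A)):

```python
import numpy as np

def discrepancy_lambda(A, b, sigma, lam0=1.0, maxit=50, tol=1e-12):
    """Newton iteration for F(lam) = ||b - A x(lam)||^2 - m sigma^2 = 0."""
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    out = b @ b - beta @ beta            # residual component outside range(A)
    m = len(b)
    lam = lam0
    for _ in range(maxit):
        d = s**2 + lam**2
        F = np.sum(beta**2 * lam**4 / d**2) + out - m * sigma**2
        dF = np.sum(beta**2 * 4 * lam**3 * s**2 / d**3)   # F'(lam) > 0
        step = F / dF
        lam = max(lam - step, lam / 10)  # safeguard keeps lambda positive
        if abs(step) < tol * max(lam, 1.0):
            break
    return lam
```

F is monotone increasing in λ, so the safeguarded Newton iteration converges quickly when a root exists (i.e. when the unregularized residual is below mσ²).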
The χ² Method
Solve so the functional attains its expected χ² value:
‖b − Ax(λ)‖²_{W_b} + λ²‖D(x − x₀)‖² = m
Newton root finding: efficient
Small problems: use the GSVD
Estimate of the mean x₀ needed
Noise distribution W_b needed
But extends to the case where x₀ is not known.
The χ² curve is monotonic, giving fast convergence.
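A Newton sketch for the χ² root finding (my code: W_b = I/σ², D = I, x₀ = 0). Scaling Ã = A/σ, b̃ = b/σ, the minimum value of the functional reduces via the SVD to F(λ) = Σᵢ β̃ᵢ²λ²/(s̃ᵢ²+λ²) + ‖b̃_out‖², which is monotone increasing in λ:

```python
import numpy as np

def chi2_lambda(A, b, sigma, lam0=1.0, maxit=50, tol=1e-12):
    """Newton root finding for the chi^2 principle: F(lam) = m."""
    U, s, _ = np.linalg.svd(A / sigma, full_matrices=False)
    bt = b / sigma
    beta = U.T @ bt
    out = bt @ bt - beta @ beta        # component of b~ outside range(A)
    m = len(b)
    lam = lam0
    for _ in range(maxit):
        d = s**2 + lam**2
        F = np.sum(beta**2 * lam**2 / d) + out - m
        dF = np.sum(beta**2 * 2 * lam * s**2 / d**2)   # F'(lam) > 0
        step = F / dF
        lam = max(lam - step, lam / 10)                # keep lambda positive
        if abs(step) < tol * max(lam, 1.0):
            break
    return lam
```

Monotonicity of F is what makes this Newton iteration converge in few iterations, as noted on the slide.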
Relation between the Discrepancy Principle and the χ² Method
χ2 Method:
‖b − Ax(λ)‖²_{W_b} + λ²‖D(x − x₀)‖² = m
Discrepancy:
‖b − Ax(λ)‖²_{W_b} = m
Both are quick for a given A: use a Newton algorithm and converge in few iterations.
Large Scale Problems
Solve the global problem using the LSQR algorithm: requires only products with A and A^T.
Projects the problem to a small subproblem.
Hybrids solve the regularization on the projected problem.
Defining the regularization on the projected problem needs care: Nagy uses a weighted GCV, and for the χ² method the number of degrees of freedom is reduced.
The approach is viable.
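LSQR's built-in damping applies Tikhonov regularization on the projected (bidiagonalized) problem, the building block of these hybrids. A small scipy sketch (my example, on a dense problem small enough to check against the full Tikhonov solve):

```python
import numpy as np
from scipy.sparse.linalg import lsqr

# LSQR needs only products with A and A^T; `damp` adds lam^2 ||x||^2.
rng = np.random.default_rng(0)
m, n = 200, 100
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.ones(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)

lam = 0.1
x_proj = lsqr(A, b, damp=lam, atol=1e-10, btol=1e-10)[0]

# Agrees with the full Tikhonov solution (A^T A + lam^2 I) x = A^T b
x_full = np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)
```

In a true hybrid, λ would be chosen adaptively on the projected problem rather than fixed in advance as here.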
Observations
1. The χ² principle is similar in spirit to the discrepancy principle
2. Can be used for large-scale problems
3. Uses statistical information on errors in the right-hand side
4. Finds the solution efficiently
5. May be used to provide uncertainty estimates
6. Remember: it uses a weighted norm
Nonlinear Least Squares for Inverse Problems: Well Known
Solve for parameters q, potentially defined by the solution of a PDE. Nonlinear least squares for the residual vector R(q):
q_opt = arg min_q (1/2)‖R(q)‖² = arg min_q f(q)
Basic damped Gauss-Newton algorithm:
q^(k+1) = q^(k) + δ p^(k+1),
where δ is a line-search step and p is a search direction solving the Jacobian system
J(q)p ≈ −R(q)
But the Jacobian is usually ill-conditioned: Levenberg-Marquardt introduces regularization.
Typical approaches use the Morozov discrepancy principle to estimate an acceptable solution.
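A damped Gauss-Newton sketch with Levenberg-Marquardt regularization (my code; a fixed λ and a simple Armijo backtracking line search stand in for the parameter-choice rules discussed in this talk):

```python
import numpy as np

def gauss_newton_lm(R, J, q0, lam=1e-2, maxit=50, tol=1e-10):
    """Damped Gauss-Newton: each step solves the regularized normal equations
    (J^T J + lam^2 I) p = -J^T R(q), then backtracks on f(q) = 0.5||R(q)||^2."""
    q = np.asarray(q0, float)
    for _ in range(maxit):
        r = R(q)
        Jq = J(q)
        g = Jq.T @ r                                   # gradient of f
        p = np.linalg.solve(Jq.T @ Jq + lam**2 * np.eye(q.size), -g)
        f0 = 0.5 * (r @ r)
        delta = 1.0                                    # line-search parameter
        while delta > 1e-8 and \
                0.5 * np.sum(R(q + delta * p)**2) > f0 + 1e-4 * delta * (g @ p):
            delta *= 0.5                               # Armijo backtracking
        q = q + delta * p
        if np.linalg.norm(delta * p) < tol:
            break
    return q
```

Replacing the fixed λ with a discrepancy- or χ²-based update at each outer iteration gives the regularized schemes discussed next.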
For Discussion
Suggest using LSQR with the χ² principle, replacing the discrepancy principle.
Solve on the projected problem; the projected problem has to be updated each outer iteration.
The cost of forming the projected problem, though, is cheap.
Size of projected problem is generally small.
Provides alternative to trust region based approaches.
Can still introduce sufficient decrease conditions.
Use weighted norm based on errors in estimates.