
SOLUTION OF ILL-POSED INVERSE PROBLEMS PERTAINING TO SIGNAL RESTORATION: UNDERSTANDING NOISE IN THE BASIS

Rosemary Renaut, http://math.asu.edu/~rosie

EDINBURGH

APRIL 12, 2012

Outline

- Background: Examples of ill-posed problems; Motivation: total variation solutions
- The SVD solution: Importance of the basis; Picard condition for ill-posed problems
- Generalized regularization: GSVD for examining the solution; Revealing the noise in the GSVD basis; Stabilizing the GSVD solution
- Conclusions and future work
- Regularization parameter estimation for TV

Illustration: Blurred Signal Restoration

b(t) = ∫_{−π}^{π} h(t, s) x(s) ds,  h(s) = (1/(σ√(2π))) exp(−s²/(2σ²)),  σ > 0.

Forward Problem: Given x, A calculate b

- The source x is sampled at the s_j; b(t) is measured at the t_i.
- The kernel h determines the matrix A: the integral is approximated using numerical quadrature (weights w_j), yielding a matrix A (a minimal discretization is sketched below).
- The matrix equation b = Ax describes the linear relationship.
- Generally ∫_a^b h(x) dx = 1 (for a PSF no energy is introduced, the signal is only spread), which leads to the normalization ∑_j w_j h(x_j) = 1.
- When h(s, t) = h(t − s) the kernel is spatially invariant.
- A typical choice of h is the Gaussian blur, a low-pass filter. We are also interested in the general integral equation problem.
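As a concrete illustration, here is a minimal sketch (Python/NumPy, not from the original slides) of the midpoint-quadrature discretization of the Gaussian blur model above; the grid size, σ, and the box test signal are illustrative assumptions.

```python
import numpy as np

def gaussian_blur_matrix(n=200, sigma=0.05):
    """Midpoint quadrature for b(t) = int_{-pi}^{pi} h(t - s) x(s) ds."""
    s = -np.pi + (np.arange(n) + 0.5) * (2 * np.pi / n)   # quadrature midpoints
    w = 2 * np.pi / n                                     # quadrature weight
    T, S = np.meshgrid(s, s, indexing="ij")               # t_i down rows, s_j across
    h = np.exp(-(T - S) ** 2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    return w * h, s

A, s = gaussian_blur_matrix()
x = (np.abs(s) < 1.0).astype(float)   # a box signal with sharp edges
b = A @ x                             # forward problem: given x and A, calculate b
```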


Inverse Problem: Given A, b find x

The solution depends on the conditioning of A and on the noise in the measurements b. The condition number of A is 1.8679e+05. Restoration with noise 0.0001: x = A^{-1} b (almost committing the inverse crime).

Example of restoration of 2D image with low noise

Figure: Original

Figure: Noisy and blurred, noise level 10^{-5}

Figure: Restored, noise level 10^{-5} (inverse crime)

Figure: Noisy and blurred, noise level 10^{-3}

Figure: Restored, noise level 10^{-3}

Summary

Goals: Given data b and kernel h, find x.

Features: One may wish to extract features from x, for example the edges in the signal.

Difficulties: The solution is very sensitive to the data.

Ill-posed (according to Hadamard): A problem is ill-posed if it does not satisfy the conditions for well-posedness, i.e. if
1. b ∉ range(A), or
2. the inverse is not unique because more than one image is mapped to the same data, or
3. an arbitrarily small change in b can cause an arbitrarily large change in x.

Required: Analyse the formulation / adapt solution techniques.

Tikhonov regularization (a numerical sketch follows below):

x_Tik(λ) = argmin_x {‖b − Ax‖² + λ²‖Lx‖²}
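One standard way to compute x_Tik(λ) is to solve the equivalent augmented least squares problem min_x ‖[A; λL]x − [b; 0]‖₂. A minimal sketch, assuming a dense A and defaulting to L = I:

```python
import numpy as np

def tikhonov(A, b, lam, L=None):
    """Solve min_x ||b - A x||^2 + lam^2 ||L x||^2 via the augmented system."""
    n = A.shape[1]
    L = np.eye(n) if L is None else L
    A_aug = np.vstack([A, lam * L])
    b_aug = np.concatenate([b, np.zeros(L.shape[0])])
    return np.linalg.lstsq(A_aug, b_aug, rcond=None)[0]
```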


Tikhonov Regularized Solutions x(λ) and derivative Lx for changing λ

Figure: We cannot capture Lx (red) from the solution (green). Notice that ‖Lx‖ decreases as λ increases.


Total Variation Regularization

A more general regularization term R(x) may be considered to better preserve properties of the solution x:

x(λ) = argmin_x {‖Ax − b‖²_W + λ² R(x)}

Suppose R is the total variation of x (more general options are possible). The total variation for a function x defined on a discrete grid is

TV(x(s)) = ∑_i |x(s_i) − x(s_{i−1})| ≈ Δ ∑_i |dx(s_i)/ds|

TV approximates a scaled sum of the magnitudes of the jumps in x; Δ is a scale factor dependent on the grid size. Notice TV(x(s)) ≈ Δ‖Lx‖₁, so we solve (a one-line computation is sketched below)

x(λ) = argmin_x {‖Ax − b‖₂² + λ²‖Lx‖₁}

This is problematic for large-scale problems.
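On a one-dimensional grid the discrete TV term is just the 1-norm of the forward differences. A one-line sketch (the grid factor Δ is absorbed into L here):

```python
import numpy as np

def tv(x):
    """Discrete total variation: sum of absolute jumps, i.e. ||L1 x||_1
    for the forward-difference matrix L1 (grid factor absorbed)."""
    return np.sum(np.abs(np.diff(x)))
```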


Split Bregman for TV

The main reference is here, with software:

T. Goldstein, Split Bregman Software Page, http://www.stanford.edu/~tagoldst/Tom_Goldstein/Split_Bregman.html

T. Goldstein and S. Osher, The Split Bregman Method for L1-Regularized Problems, SIAM J. Imaging Sci. 2, 2 (2009), 323-343.

Many developments have been made since that time:

S. Setzer, Operator Splittings, Bregman Methods and Frame Shrinkage in Image Processing, International Journal of Computer Vision, in press, 2011.

The above reference discusses the relationship of SB to the Augmented Lagrangian and Peaceman-Rachford alternating direction methods.

SB: The Main Idea of the GO Paper, for Regularization R(x)

For R(x) = ‖Lx‖₁ with L ∈ R^{q×n}: introduce d = Lx.

Rewrite R(x) as (λ²/2)‖d − Lx‖₂² + μ‖d‖₁ and restate the problem as

(x, d) = argmin_{x,d} {(1/2)‖Ax − b‖₂² + (λ²/2)‖d − Lx‖₂² + μ‖d‖₁}

Solve using an alternating minimization which separates the minimization for d from that for x. Various versions of the iteration can be defined; one is (a sketch follows below)

S1: x^{(k+1)} = argmin_x {(1/2)‖Ax − b‖₂² + (λ²/2)‖Lx − (d^{(k)} − g^{(k)})‖₂²}   (1)

S2: d^{(k+1)} = argmin_d {(λ²/2)‖d − (Lx^{(k+1)} + g^{(k)})‖₂² + μ‖d‖₁}   (2)

S3: g^{(k+1)} = g^{(k)} + Lx^{(k+1)} − d^{(k+1)}.   (3)
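A minimal sketch of the S1-S3 iteration in Python/NumPy. The direct normal-equations solve for S1, the fixed iteration count, and the parameter values a caller would pass are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def shrink(c, kappa):
    """Soft thresholding: minimizer of kappa*||d||_1 + (1/2)||d - c||_2^2."""
    return np.sign(c) * np.maximum(np.abs(c) - kappa, 0.0)

def split_bregman(A, b, L, lam, mu, n_iter=50):
    """One sketch of the S1-S3 split Bregman iteration."""
    d = np.zeros(L.shape[0])
    g = np.zeros(L.shape[0])
    M = A.T @ A + lam**2 * (L.T @ L)          # normal-equations matrix for S1
    for _ in range(n_iter):
        rhs = A.T @ b + lam**2 * (L.T @ (d - g))
        x = np.linalg.solve(M, rhs)           # S1: Tikhonov-type solve
        d = shrink(L @ x + g, mu / lam**2)    # S2: componentwise shrinkage
        g = g + L @ x - d                     # S3: Bregman update
    return x
```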


SB Unconstrained algorithm:

Update for x:

x = argmin_x {(1/2)‖Ax − b‖₂² + (λ²/2)‖Lx − h‖₂²},  h = d − g

This is a standard least squares update using a Tikhonov regularizer. It uses the standard basis, but with weights perturbed through h:

x^{(k+1)} = ∑_{i=1}^{p} [(υ_i u_i^T b + λ²μ_i v_i^T h^{(k)}) / (υ_i² + λ²μ_i²)] z̃_i + ∑_{i=p+1}^{n} (u_i^T b) z̃_i

Update for d:

d = argmin_d {μ‖d‖₁ + (λ²/2)‖d − c‖₂²},  c = Lx + g
  = argmin_d {‖d‖₁ + (γ/2)‖d − c‖₂²},  γ = λ²/μ.

To obtain an effective solution we need to find the parameters λ, μ. (A closed-form check of the d update is sketched below.)
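The d update has the well-known closed form of componentwise soft thresholding with threshold 1/γ. A small numerical check (with assumed test values) against direct minimization of one component:

```python
import numpy as np

gamma, c = 4.0, np.array([-2.0, 0.1, 0.5, 3.0])
d_closed = np.sign(c) * np.maximum(np.abs(c) - 1.0 / gamma, 0.0)

# Brute-force check of the first component on a fine grid.
t = np.linspace(-5, 5, 200001)
obj = np.abs(t) + 0.5 * gamma * (t - c[0]) ** 2
assert abs(t[np.argmin(obj)] - d_closed[0]) < 1e-3
```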


The Situation

1. For the inverse problem we need regularization.
2. For feature extraction we need more than Tikhonov regularization, e.g. TV.
3. Both techniques are parameter dependent.
4. Notice the dependence of the TV solution on Tikhonov solutions.
5. The TV iteration runs over many Tikhonov solutions.
6. Moreover, the parameters are needed.
7. We need to fully understand Tikhonov regularization and ill-posed problems.

Spectral Decomposition of the Solution: The SVD

Consider the general overdetermined discrete problem

Ax = b,  A ∈ R^{m×n}, b ∈ R^m, x ∈ R^n, m ≥ n.

The singular value decomposition (SVD) of A (full column rank)

A = UΣV^T = ∑_{i=1}^{n} u_i σ_i v_i^T,  Σ = diag(σ_1, …, σ_n),

gives an expansion for the solution (sketched below)

x = ∑_{i=1}^{n} (u_i^T b / σ_i) v_i.

Here u_i, v_i are the left and right singular vectors of A. The solution is a weighted linear combination of the basis vectors v_i.
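A sketch of this expansion in Python/NumPy, equivalent to x = VΣ^{-1}U^T b for a full column rank A:

```python
import numpy as np

def svd_solve(A, b):
    """Naive inverse solution x = sum_i (u_i^T b / sigma_i) v_i."""
    U, svals, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt.T @ ((U.T @ b) / svals)   # weighted combination of the v_i
```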


Illustration: Example of the basis vectors

Problem wing, n = 84: evaluating the impact of higher precision. Cond(A) = 2.2056e+19. From Hansen's Regularization Toolbox.

h(s, t) = s e^{−t s²},  x(s) = 1 for 1/3 < s < 2/3 (0 otherwise),  b(t) = (e^{−t/9} − e^{−4t/9}) / (2t)

Left Singular Vectors and Basis Depend on the kernel (matrix A)

Figure: The first few left singular vectors u_i and basis vectors v_i. Are the results correct?

Use Matlab High Precision to examine the SVD

- Matlab's digits allows high precision; the standard setting is 32.
- The Symbolic Toolbox allows operations on high precision variables with vpa.
- The SVD of vpa variables calculates the singular values symbolically, but not the singular vectors.
- Higher accuracy for the singular values generates higher accuracy singular vectors.
- Solutions with high precision can take advantage of Matlab's Symbolic Toolbox.

Left Singular Vectors and Basis Calculated in High Precision

Figure: The first few left singular vectors u_i and basis vectors v_i. Apparently with higher precision we preserve the frequency content of the basis. How many can we use in the solution for x?

Power Spectrum for Detecting White Noise: A Time Series Analysis Technique

Suppose that a given vector y is a time series indexed by position, i.e. by the index i.

Diagnostic 1: Does the histogram of the entries of y look consistent with y ∼ N(0, 1), i.e. independent normally distributed entries with mean 0 and variance 1? It is not practical to automatically look at a histogram and make an assessment.

Diagnostic 2: Test the expectation that the y_i are selected from a white noise time series. Take the Fourier transform of y and form the cumulative periodogram z from the power spectrum c:

c_j = |(dft(y))_j|²,  z_j = ∑_{i=1}^{j} c_i / ∑_{i=1}^{q} c_i,  j = 1, …, q.

This is automatic: test whether the line (z_j, j/q) is close to a straight line with slope 1 and length √5/2. (A sketch is given below.)
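A sketch of Diagnostic 2 in Python/NumPy. The maximum-deviation statistic returned at the end is one simple choice of deviation measure, an assumption rather than the talk's exact test:

```python
import numpy as np

def cumulative_periodogram(y):
    """z_j = sum_{i<=j} c_i / sum_i c_i from the power spectrum c (DC excluded)."""
    q = len(y) // 2                              # frequencies up to Nyquist
    c = np.abs(np.fft.fft(y)[1:q + 1]) ** 2
    return np.cumsum(c) / np.sum(c)

def deviation_from_white(y):
    """Maximum deviation of (j/q, z_j) from the slope-one white-noise line."""
    z = cumulative_periodogram(y)
    q = len(z)
    return np.max(np.abs(z - np.arange(1, q + 1) / q))
```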


Cumulative Periodogram for the left singular vectors

Figure: Standard precision on the left and high precision on the right. On the left more vectors are close to white than on the right; vectors are white if the CP is close to the diagonal of the plot. Low frequency vectors lie above the diagonal and high frequency vectors below it; white noise follows the diagonal.

Cumulative Periodogram for the basis vectors

Figure: Standard precision on the left and high precision on the right. On the left more vectors are close to white than on the right; if you count, there are 9 vectors with true frequency content on the left.

Measure Deviation from Straight Line: Basis Vectors

Figure: Testing for white noise for the standard precision vectors. Calculate the cumulative periodogram and measure the deviation from the "white noise" line, or assess the proportion of the vector outside the Kolmogorov-Smirnov white noise lines at the 5% confidence level. In this case this suggests that about 9 vectors are noise free.

We cannot expect to use more than 9 vectors in the expansion for x. Additional terms are contaminated by noise, independent of the noise in b.


Discrete Picard condition: examine the weights of the expansion

Recall

x = ∑_{i=1}^{n} (u_i^T b / σ_i) v_i

Here |u_i^T b|/σ_i = O(1).

The ratios are not large, but are the values correct? Considering only the discrete Picard condition does not tell us whether the expansion for the solution is correct.

Discrete Picard condition: examine the weights of the expansion

- High precision calculation of the σ_i shows that they decay exponentially to zero (down to machine precision).
- The Picard condition considers the ratios |u_i^T b|/σ_i for data b; they decay exponentially (down to machine precision).
- Note the ratios in this case are O(1), hence the noise-contaminated basis vectors are not ignored; i.e. to approximate a solution with a discontinuity we need all the basis vectors.
- We may obtain solutions by truncating the SVD (a sketch follows below):

x = ∑_{i=1}^{k} (u_i^T b / σ_i) v_i

- Now the parameter k is a regularization parameter.
- For the given example we know k < 10, independent of the measured data b. We cannot see this from the Picard condition.
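A sketch of the truncated expansion, with k supplied by the diagnostics above:

```python
import numpy as np

def tsvd_solve(A, b, k):
    """Truncated SVD solution: keep only the first k terms of the expansion."""
    U, svals, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T @ ((U[:, :k].T @ b) / svals[:k])
```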

The Truncated Solutions (noise free data b)

Figure: Truncated SVD solutions, standard precision |u_i^T b|. Error in the basis contaminates the solution.

Figure: Truncated SVD solutions, VPA calculation |u_i^T b|. The number of terms is not sufficient to represent the solution discontinuity.

Observations

- Even when committing the inverse crime we will not achieve the solution if we cannot approximate the basis correctly.
- We need all the basis vectors which contain the high frequency terms in order to approximate a solution with high frequency components, e.g. edges.
- Reminder: this is independent of the data.
- But it is an indication of an ill-posed problem; here the difficulty exhibits itself in the decomposition of the matrix A.
- We now look at a problem with a smoother solution: what are the issues? Problem shaw from the Regularization Toolbox, defined on the interval [−π/2, π/2] (a rough construction is sketched below):

h(s, t) = (cos(s) + cos(t))² (sin(u)/u)²,  u = π(sin(s) + sin(t))

x(t) = 2 exp(−6(t − 0.8)²) + exp(−2(t + 0.5)²)
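For readers without the Regularization Toolbox, a rough midpoint-quadrature construction of shaw in Python/NumPy; Hansen's routine differs in details, so this is an assumption-laden sketch:

```python
import numpy as np

def shaw(n=84):
    """Midpoint-rule discretization of the shaw kernel on [-pi/2, pi/2]."""
    t = -np.pi / 2 + (np.arange(n) + 0.5) * np.pi / n
    S, T = np.meshgrid(t, t, indexing="ij")
    u = np.pi * (np.sin(S) + np.sin(T))
    with np.errstate(divide="ignore", invalid="ignore"):
        sinc2 = np.where(u == 0.0, 1.0, (np.sin(u) / u) ** 2)
    A = (np.pi / n) * (np.cos(S) + np.cos(T)) ** 2 * sinc2
    x = 2 * np.exp(-6 * (t - 0.8) ** 2) + np.exp(-2 * (t + 0.5) ** 2)
    return A, x, A @ x   # matrix, exact solution, noise free data
```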


Basis vectors for problem shaw

Figure: The first few basis vectors v_i (left) and their white noise content (right).

The Solutions with truncated SVD- problem shaw

Figure: Truncated SVD solutions; the data enters through the coefficients |u_i^T b|. On the left no noise in b, on the right noise 10^{-4}.

In this case the low frequency vectors can represent the solution, but we need to know the regularization parameter k.

Observations from the SVD analysis in presence of noise

- The number of terms k in the TSVD depends on the v_i.
- In practice the measured data are also contaminated by noise e:

x = ∑_{i=1}^{n} (u_i^T(b_exact + e) / σ_i) v_i = x_exact + ∑_{i=1}^{n} (u_i^T e / σ_i) v_i

- Note ‖U^T e‖ = ‖e‖, and ‖U^T b‖ = ‖b‖ is dominated by low frequency terms when A smooths x to give b.
- If e is white noise, we expect the |u_i^T e| to be of similar magnitude for all i (a numerical check is sketched below).
- When σ_i ≪ |u_i^T e| the contribution of the high frequency error is magnified, and with it that of the basis vector v_i.
- The truncated SVD is a special case of spectral filtering:

x_filt = ∑_{i=1}^{n} γ_i (u_i^T b / σ_i) v_i

- Spectral filtering damps the components in the spectral basis, so that noise in the signal is suppressed.
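A small numerical check of these claims: for white noise e the coefficients |u_i^T e| are of similar magnitude at all indices, while the σ_i decay. The smoothing matrix and noise draw here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(-1, 1, n)
A = np.exp(-(t[:, None] - t[None, :]) ** 2 / (2 * 0.05**2))   # a smoothing kernel
U, svals, Vt = np.linalg.svd(A)

e = rng.standard_normal(n)                # white noise
print(np.abs(U.T @ e)[[0, n // 2, -1]])   # |u_i^T e|: similar magnitude everywhere
print(svals[[0, n // 2, -1]])             # sigma_i: decays by many orders
```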


Regularization by Spectral Filtering: This is Tikhonov Regularization

x_Tik = ∑_{i=1}^{n} γ_i (u_i^T b / σ_i) v_i

- Tikhonov regularization: γ_i = σ_i² / (σ_i² + λ²), i = 1, …, n, where λ is the regularization parameter, and the solution is

x_Tik(λ) = argmin_x {‖b − Ax‖² + λ²‖x‖²}

- The choice of λ² impacts the solution (a sketch follows below).
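A sketch of the filtered solution with these Tikhonov filter factors:

```python
import numpy as np

def tikhonov_svd(A, b, lam):
    """Filtered SVD solution with Tikhonov factors gamma_i = s_i^2/(s_i^2 + lam^2)."""
    U, svals, Vt = np.linalg.svd(A, full_matrices=False)
    gamma = svals**2 / (svals**2 + lam**2)
    return Vt.T @ (gamma * (U.T @ b) / svals)
```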

1-D Interesting but Noisy Signal

Blur with a Gaussian and add noise: can we find the solution?

Solution for Increasing λ

Solutions x(λ)


Finding the Optimal Parameter

1. Many options exist.
2. The best approach is debatable.
3. One should use information about the noise if it is available.
4. Parameter estimation is in general somewhat costly.
5. It usually involves finding several solutions.
6. We consider Unbiased Predictive Risk and the χ² method (the latter is very cheap); a UPRE sketch follows below.
7. But it is important to reduce the cost.
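As one concrete option, a sketch of UPRE selection for the Tikhonov filter via the SVD. The functional U(λ) = ‖Ax(λ) − b‖² + 2σ² trace(A(λ)) − mσ², with trace(A(λ)) = ∑ γ_i, is the standard form; the grid of candidate λ and the known noise variance σ² are assumptions:

```python
import numpy as np

def upre_lambda(A, b, sigma2, lambdas):
    """Pick lambda minimizing the UPRE estimate of the predictive risk."""
    m = A.shape[0]
    U, svals, _ = np.linalg.svd(A, full_matrices=False)
    beta = U.T @ b
    best_u, best_lam = np.inf, None
    for lam in lambdas:
        gamma = svals**2 / (svals**2 + lam**2)                # filter factors
        resid2 = np.sum(((1 - gamma) * beta) ** 2) + (b @ b - beta @ beta)
        u = resid2 + 2 * sigma2 * np.sum(gamma) - m * sigma2  # UPRE functional
        if u < best_u:
            best_u, best_lam = u, lam
    return best_lam
```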

Regularizing the TSVD: But we can just use the truncated expansion

The inaccuracy of the basis vectors v_i suggests that one should not just seek the filtered SVD solution but use the stabilized TSVD.

Figure: Two example solutions. Calculating the solution is robust for fewer basis vectors; 1% noise.


Further Observations

1. We can use a limited basis to obtain the solution.
2. We can stabilize the TSVD if needed, and parameter choice methods are cheaper.
3. Calculating the basis to use is data independent: it depends only on the ill-posedness of the system.
4. Removing the noisy vectors still does not permit all good solutions.
5. Clearly high frequency content cannot be represented with a low frequency basis.
6. We need to alter the basis.

Extending the Regularization - Change the basis

Notice that gradients in the solution are smoothed. Consider the more general weighting ‖Lx‖²:

x(λ) = argmin_x {‖Ax − b‖² + λ²‖Lx‖²}

A typical L approximates the first or second order derivative (constructions are sketched below):

L_1 =
[ −1  1
       ⋱  ⋱
          −1  1 ]  ∈ R^{(n−1)×n}

L_2 =
[ 1  −2  1
      ⋱   ⋱   ⋱
          1  −2  1 ]  ∈ R^{(n−2)×n}

Note that neither L_1 nor L_2 is invertible.
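A sketch constructing these operators:

```python
import numpy as np

def first_derivative(n):
    """L1 in R^{(n-1) x n}: rows [-1, 1]."""
    return np.eye(n - 1, n, k=1) - np.eye(n - 1, n)

def second_derivative(n):
    """L2 in R^{(n-2) x n}: rows [1, -2, 1]."""
    return np.eye(n - 2, n) - 2 * np.eye(n - 2, n, k=1) + np.eye(n - 2, n, k=2)
```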


The Generalized Singular Value Decomposition

Introduce a generalization of the SVD to obtain an expansion for

x(λ) = argmin_x {‖Ax − b‖² + λ²‖L(x − x₀)‖²}

Lemma (GSVD). Assume invertibility and m ≥ n ≥ p. There exist unitary matrices U ∈ R^{m×m}, V ∈ R^{p×p}, and a nonsingular matrix Z ∈ R^{n×n} such that

A = U [Υ; 0_{(m−n)×n}] Z^T,  L = V [M, 0_{p×(n−p)}] Z^T,

Υ = diag(υ_1, …, υ_p, 1, …, 1) ∈ R^{n×n},  M = diag(μ_1, …, μ_p) ∈ R^{p×p},

with

0 ≤ υ_1 ≤ ⋯ ≤ υ_p ≤ 1,  1 ≥ μ_1 ≥ ⋯ ≥ μ_p > 0,  υ_i² + μ_i² = 1, i = 1, …, p.

Use Υ̃ and M̃ to denote the rectangular matrices containing Υ and M.

Solution of the Generalized Problem using the GSVD

We can use the GSVD to write down the solution of the generalized problem:

x(λ) = ∑_{i=1}^{p} [υ_i / (υ_i² + λ²μ_i²)] (u_i^T b) z̃_i + ∑_{i=p+1}^{n} (u_i^T b) z̃_i

where z̃_i is the ith column of (Z^T)^{−1}. With generalized singular values ρ_i = υ_i/μ_i, i = 1, …, p,

x(λ) = ∑_{i=1}^{p} γ_i (u_i^T b / υ_i) z̃_i + ∑_{i=p+1}^{n} (u_i^T b) z̃_i,

Lx(λ) = ∑_{i=1}^{p} γ_i (u_i^T b / ρ_i) v_i,  γ_i = ρ_i² / (ρ_i² + λ²).

Notice the similarity with the filtered SVD solution

x(λ) = ∑_{i=1}^{n} γ_i (u_i^T b / σ_i) v_i,  γ_i = σ_i² / (σ_i² + λ²).


What Matrix to Use for the GSVD? Form the truncated matrix A_k using only k basis vectors v_i: above A and below A_k. Problem ilaplace.

Figure: Contrast of the GSVD basis vectors u (left) and z (right); trade-off between U and Z.

Figure: This is confirmed by examining the KS test for white noise.

Example Solution: Problem ilaplace, A above and A_k below

Figure: The solution is robust to using the reduced matrix A_k.

Figure: The solution is robust to truncating the GSVD.

Example Solution: Problem phillips

Figure: The solution is robust to using the reduced matrix A_k.

Figure: The solution is robust to truncating the GSVD.

Summary/Conclusions

- Basis vectors are subject to noise and contaminate the solution, independent of the data.
- Solutions require finding k.
- We can determine k by examining the noise in the basis.
- Using the GSVD we see that Tikhonov regularization yields a basis with smoothed basis vectors.
- But we can apply the same techniques.
- Stabilizing both the TSVD and the TGSVD is robust to parameter estimation.
- Parameter estimation for truncated expansions is cheaper.
- Reminder: the truncation level k is independent of the data b.


Extensions

- Truncating the basis means that we will not see high frequencies in the solutions.
- We cannot represent edges or steep gradients.
- We need to use total variation or alternative regularizers for feature enhancement.
- It will still be important to analyze the noise.
- We use the same ideas to further analyze TV solutions: work in progress.


Picard Condition for the GSVD: For x and Lx Examine the Weights in the Expansion

Figure: Weights for the expansion, λ = 0.0001: the weights blow up together.

Figure: Weights for the expansion, λ = 0.05: the weights separate for low frequencies.

Figure: Weights for the expansion, λ = 5.

Notice that for L_1 the υ_i are small except for large i, i.e. μ_i ≈ 1 dominates for small i.

Illustrating Total Variation Solutions for Noise Level 0.1 in the Data

Figure: λ = 0.1, γ = 0.5: the final h is too large relative to b.

Figure: λ = 1, γ = 0.5: the final h balances b.

Figure: λ = 10, γ = 0.5: b dominates and the solution is over smoothed.

The impact of γ:

Figure: λ = 0.1; γ = 0.5 on the left and γ = 5 on the right.

Observations

The advantage of TV is clear for constant components. But the solutions still depend on the parameters: we need to find both λ and μ. We may use standard parameter estimation to find λ (for updating x), and we can use reweighting for μ (for updating d).


Exploiting the GSVD for analysis

x^{(1)} = ∑_{i=1}^{p} (φ_i/υ_i)(u_i^T b) z_i + ∑_{i=p+1}^{n} (u_i^T b) z_i,

x^{(k+1)} = ∑_{i=1}^{p} [(φ_i/υ_i) u_i^T b + ((1 − φ_i)/μ_i) v_i^T h^{(k)}] z_i + ∑_{i=p+1}^{n} (u_i^T b) z_i   (4)

Lx^{(k+1)} = ∑_{i=1}^{p} [(φ_i μ_i/υ_i)(u_i^T b) + (1 − φ_i) v_i^T h^{(k)}] v_i.   (5)

Notice in this case that we must also control the coefficients for the terms with h. The relevant coefficients are

φ_i/υ_i,  φ_i μ_i/υ_i,  (1 − φ_i)/μ_i,  (1 − φ_i).

GSVD Coefficients for the Problem

Figure: λ = 0.001

Figure: λ = 0.01

Figure: λ = 1

Figure: λ = 10

Of course the coefficients are independent of the data.

Picard Condition Using the GSVD for Noise Level 0.1

But h changes with the iteration.


Further Observation

- For regularization in general, the choice of λ depends on the right hand side vector.
- In this case h changes at each step.
- It is clear that we should update λ at each step.
- We use standard approaches: Unbiased Predictive Risk Estimation and the L-curve.
- We also use an iteratively reweighted norm approach for the d update.

Example Solution: 1D

Figure: γ = 200, low noise. Without updating λ (left) and updated (right). SB UPRE uses the λ estimated by UPRE for all SB steps; Update SB updates λ at each step; SB IRN updates λ and iteratively reweights ‖d‖₁.

Figure: γ = 200, high noise; the same comparison as above.

Example Solution: 2D - similar blurring operator

Figure: γ = 5

SB with updated λ is useful


Further Observations and Future Work

The results demonstrate that basic analysis of the problem is worthwhile.

Parameter estimation from basic least squares can be used to find an appropriate parameter.

Questions that may be raised concern the cost of finding the optimal λ:
- The overhead of finding the optimal λ for the first step is reasonable.
- The overhead of subsequent steps: UPRE requires a matrix trace, but for deblurring we can use results about the spectrum of Toeplitz matrices.

Future: Implement using the Toeplitz operators.

Extensions: Implement using statistical estimation with the χ² approach, which takes account of the covariance of h. Convergence testing is based on h.

See me for extensive references to the literature.

THANK YOU
