Iterative Shrinkage/Thresholding Algorithms: Some History and Recent Developments

Mário A. T. Figueiredo
Instituto de Telecomunicações and Instituto Superior Técnico, Technical University of Lisbon, PORTUGAL

CS Workshop, Duke, 2009
[email protected]
Signal/Image Restoration/Representation/Reconstruction
Many signal/image reconstruction/approximation criteria have the form

$\min_x\; f(x) + \tau\,\phi(x)$

where $f$ is smooth and convex (the data-fidelity term); usually $f(x) = \frac{1}{2}\|y - Ax\|_2^2$;
$\phi$ is a regularization/penalty function, typically convex (sometimes not), often non-differentiable.
Examples: TV-based and wavelet-based restoration/reconstruction, sparse representations, sparse (linear or logistic) regression, compressive sensing (with $\phi(x) = \|x\|_1$).
Outline
1. The optimization problem (previous slide)
2. IST Algorithms: four derivations
3. Convergence results
4. Enhanced (accelerated) versions: TwIST and SpaRSA
5. Warm starting and continuation
6. Concluding remarks
Denoising/shrinkage operators
If $A = I$, we have a denoising problem: $\min_x \frac{1}{2}\|x - y\|_2^2 + \tau\,\phi(x)$.

If $\phi$ is proper and convex, the objective is strictly convex, so there is a unique minimizer.

Thus, the so-called shrinkage/thresholding/denoising function

$\Psi_\tau(y) = \arg\min_x \frac{1}{2}\|x - y\|_2^2 + \tau\,\phi(x)$

is well defined (the Moreau proximal mapping) [Moreau, 1962], [Combettes, 2001].
Examples: $\phi(x) = \|x\|_1$ yields the soft-thresholding function $\Psi_\tau(y) = \mathrm{sign}(y)\,\max\{|y| - \tau,\, 0\}$ (componentwise); $\phi(x) = \|x\|_0$ (not convex) yields hard thresholding.
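As a concrete illustration, here is a minimal NumPy sketch of these two shrinkage functions (the names soft and hard are mine; the hard threshold at $\sqrt{2\tau}$ comes from comparing the two candidate minimizers of the scalar $\ell_0$ problem):

```python
import numpy as np

def soft(y, tau):
    # Prox of tau * ||.||_1: componentwise soft thresholding.
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)

def hard(y, tau):
    # Prox of tau * ||.||_0 (not convex): keep y_i iff y_i^2 / 2 > tau.
    return np.where(np.abs(y) > np.sqrt(2.0 * tau), y, 0.0)
```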
Iterative Shrinkage/Thresholding (IST)
Problem: $\min_x \frac{1}{2}\|y - Ax\|_2^2 + \tau\,\phi(x)$

IST algorithm: $x_{t+1} = \Psi_{\tau/\alpha}\bigl(x_t + \frac{1}{\alpha}A^T(y - Ax_t)\bigr)$

Adequate when products by $A$ and $A^T$ are efficiently computable (e.g., via the FFT).

Since $A^T(Ax - y)$ is the gradient of $\frac{1}{2}\|y - Ax\|_2^2$: if $\phi = 0$, IST is gradient descent with step length $1/\alpha$.
IST is also applicable in Bregman iterations to solve constrained problems [Yin, Osher, Goldfarb, Darbon, 2008].
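A minimal sketch of the IST loop in this setting, assuming $f(x) = \frac{1}{2}\|y - Ax\|_2^2$ and $\phi = \|\cdot\|_1$ (it reuses the soft function sketched above; a dense matrix stands in for the fast operators, e.g., FFT-based, mentioned above):

```python
import numpy as np

def ist(A, y, tau, alpha, n_iters=500):
    # x <- Psi_{tau/alpha}( x + (1/alpha) A^T (y - A x) ), Psi = soft threshold.
    x = np.zeros(A.shape[1])
    for _ in range(n_iters):
        x = soft(x + (A.T @ (y - A @ x)) / alpha, tau / alpha)
    return x
```

A safe step choice is $\alpha$ slightly above $\|A^TA\|_2$, e.g., `alpha = 1.01 * np.linalg.norm(A, 2) ** 2`.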
IST as Expectation-Maximization [F. and Nowak, 2001, 2003]
Underlying observation model: $y = Ax + n$, with $n \sim \mathcal{N}(0, \sigma^2 I)$.

Equivalent model: split the noise as $n = A\,n_1 + n_2$, so that $y = Az + n_2$, with hidden image $z = x + n_1$.

E-step (Wiener): $\hat z_t = \mathbb{E}[z \mid y, x_t]$, which (with the noise variances suitably normalized) reduces to the Landweber-type step $\hat z_t = x_t + A^T(y - Ax_t)$.

M-step (denoising): $x_{t+1} = \Psi_\tau(\hat z_t)$.

The EM construction guarantees monotonicity: the objective does not increase across iterations.
IST as Majorization-Minimization [Daubechies, Defrise, De Mol, 2004]
Majorization function:

$Q(x; x_t) = \frac{1}{2}\|y - Ax\|_2^2 + \tau\,\phi(x) + \frac{\alpha}{2}\|x - x_t\|_2^2 - \frac{1}{2}\|A(x - x_t)\|_2^2$

MM algorithm: $x_{t+1} = \arg\min_x Q(x; x_t)$, which is exactly the IST update.

If $\alpha \ge \lambda_{\max}(A^TA)$, we can set $\alpha I - A^TA \succeq 0$, so $Q(\cdot\,; x_t)$ majorizes the objective, with equality at $x = x_t$.

Thus, monotonicity: $f(x_{t+1}) + \tau\phi(x_{t+1}) \le Q(x_{t+1}; x_t) \le Q(x_t; x_t) = f(x_t) + \tau\phi(x_t)$.
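Spelling out those two assertions from the definitions above (my own check): subtracting the objective from $Q$ leaves a quadratic form, and completing the square in $Q$ recovers the IST update:

$$Q(x; x_t) - \Bigl(\tfrac{1}{2}\|y - Ax\|_2^2 + \tau\phi(x)\Bigr) = \tfrac{1}{2}(x - x_t)^T(\alpha I - A^TA)(x - x_t),$$

which is non-negative for all $x$ exactly when $\alpha \ge \lambda_{\max}(A^TA)$; and, dropping terms constant in $x$,

$$\arg\min_x Q(x; x_t) = \arg\min_x \tfrac{\alpha}{2}\Bigl\|x - \Bigl(x_t + \tfrac{1}{\alpha}A^T(y - Ax_t)\Bigr)\Bigr\|_2^2 + \tau\phi(x) = \Psi_{\tau/\alpha}\Bigl(x_t + \tfrac{1}{\alpha}A^T(y - Ax_t)\Bigr).$$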
IST as Forward-Backward Splitting
Back to the problem $\min_x f(x) + \tau\,\phi(x)$, with $f$ differentiable and $\phi$ convex.

$\hat x$ is a minimizer $\iff 0 \in \nabla f(\hat x) + \tau\,\partial\phi(\hat x) \iff \hat x = \Psi_{\tau/\alpha}\bigl(\hat x - \frac{1}{\alpha}\nabla f(\hat x)\bigr)$ (a fixed-point equation; the minimizer defining $\Psi$ is unique).

Fixed-point scheme: $x_{t+1} = \Psi_{\tau/\alpha}\bigl(x_t - \frac{1}{\alpha}\nabla f(x_t)\bigr)$, i.e., IST: a forward (gradient) step followed by a backward (proximal) step.
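The second equivalence, spelled out (it is just the optimality condition of the strictly convex problem defining $\Psi$):

$$0 \in \nabla f(\hat x) + \tau\,\partial\phi(\hat x) \iff \hat x - \tfrac{1}{\alpha}\nabla f(\hat x) \in \hat x + \tfrac{\tau}{\alpha}\,\partial\phi(\hat x) \iff \hat x = \arg\min_x \tfrac{1}{2}\Bigl\|x - \Bigl(\hat x - \tfrac{1}{\alpha}\nabla f(\hat x)\Bigr)\Bigr\|_2^2 + \tfrac{\tau}{\alpha}\phi(x).$$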
IST as Separable Approximation
Recall the problem: $\min_x f(x) + \tau\,\phi(x)$.

Separable approximation to $f$ around $x_t$: keep the gradient, replace the Hessian by $\alpha_t I$.

Iteration: $x_{t+1} = \arg\min_x\; (x - x_t)^T \nabla f(x_t) + \frac{\alpha_t}{2}\|x - x_t\|_2^2 + \tau\,\phi(x)$.

If $\phi$ is separable, the minimization decouples into independent scalar problems.

Can be rewritten as $x_{t+1} = \Psi_{\tau/\alpha_t}\bigl(x_t - \frac{1}{\alpha_t}\nabla f(x_t)\bigr)$, an IST step.

If $\phi$ is convex, the subproblem has a unique solution.

The objective function in each iteration can be seen as the Lagrangian of a trust-region method.
Bibliographical Notes
IST as expectation-maximization: [F. and Nowak, 2001, 2003]
IST as majorization-minimization: [Daubechies, Defrise, De Mol, 2003, 2004], [F., Nowak, Bioucas-Dias, 2005, 2007]
Forward-backward schemes in math: [Bruck, 1977], [Passty, 1979], [Lions and Mercier, 1979]
Forward-backward schemes in signal reconstruction: [Combettes and Wajs, 2003, 2004]
Separable approximation: [Wright, Nowak, and F., 2008]
Other authors independently proposed IST schemes for signal/image recovery: [Bect, Blanc-Féraud, Aubert, and Chambolle, 2004], [Elad, Matalon, and Zibulevsky, 2006], [Starck, Nguyen, Murtagh, 2003], [Starck, Candès, Donoho, 2003], [Hale, Yin, Zhang, 2007]
Existence, Uniqueness
The set of minimizers is non-empty if the objective $f + \tau\phi$ is coercive ($f(x) + \tau\phi(x) \to +\infty$ as $\|x\|_2 \to \infty$).

It has at most one element if $\phi$ is strictly convex or $A^TA$ is invertible.

It has exactly one element if, in addition, the objective is bounded below and coercive.

[Combettes and Wajs, 2004]
Convergence Results (I)
Problem: $\min_x \frac{1}{2}\|y - Ax\|_2^2 + \tau\,\phi(x)$

IST algorithm: $x_{t+1} = \Psi_{\tau/\alpha}\bigl(x_t + \frac{1}{\alpha}A^T(y - Ax_t)\bigr)$

[Daubechies, Defrise, De Mol, 2004] (applies in a Hilbert-space setting):
let $\phi = \|\cdot\|_1$, $\tau > 0$, and $\alpha > \|A^TA\|_2$; then IST converges to a minimizer of the objective.

[Combettes and Wajs, 2005] (applies to a more general version of IST):
let $\phi$ be convex and proper (never $-\infty$, not everywhere $+\infty$) and $\alpha > \frac{1}{2}\|A^TA\|_2$; then IST converges to a minimizer of the objective.
Convergence Results (II)
Problem: $\min_x \frac{1}{2}\|y - Ax\|_2^2 + \tau\,\|x\|_1$

IST algorithm: $x_{t+1} = \Psi_{\tau/\alpha}\bigl(x_t + \frac{1}{\alpha}A^T(y - Ax_t)\bigr)$

[Hale, Yin, Zhang, 2007]: let $\tau > 0$ and $\alpha > \frac{1}{2}\|A^TA\|_2$.

Then, IST converges to some minimizer $\hat x$ and, for all but a finite number of iterations, $(x_t)_i = 0$ for all $i \in L$, where $L = \{i : \hat x_i = 0,\ |(\nabla f(\hat x))_i| < \tau\}$ (the zero components are identified in finitely many steps).
Accelerating IST: Two-Step IST (TwIST)
IST becomes slow when $A^TA$ is very ill-conditioned and $\tau$ is small.

Inspired by two-step methods for linear systems [Frankel, 1950], [Axelsson, 1996], the TwIST algorithm [Bioucas-Dias and F., 2007]:

$x_{t+1} = (1 - \alpha)\,x_{t-1} + (\alpha - \beta)\,x_t + \beta\,\Gamma(x_t)$, where $\Gamma(x) = \Psi_\tau\bigl(x + A^T(y - Ax)\bigr)$.

Simplified analysis: assume the eigenvalues of $A^TA$ lie in $[\xi_1, 1]$, with $\xi_1 > 0$.

The minimizer $\hat x$ is unique and TwIST converges to $\hat x$ for suitable $\alpha$ and $\beta$.

There is an optimal choice for $\alpha$ and $\beta$, for which the linear convergence factor is $\frac{1 - \sqrt{\xi_1}}{1 + \sqrt{\xi_1}}$ (versus roughly $\frac{1 - \xi_1}{1 + \xi_1}$ for IST).
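A minimal sketch of the two-step recursion, in the same $\ell_1$/soft-threshold setting as the earlier sketches (soft() is assumed from above; the choice of parameters is left to the caller):

```python
import numpy as np

def twist(A, y, tau, alpha, beta, n_iters=500):
    # Gamma(x) = Psi_tau( x + A^T (y - A x) ), with Psi = soft threshold.
    gamma = lambda x: soft(x + A.T @ (y - A @ x), tau)
    x_prev = np.zeros(A.shape[1])
    x = gamma(x_prev)  # first iterate: one plain IST step
    for _ in range(n_iters):
        x, x_prev = ((1 - alpha) * x_prev + (alpha - beta) * x
                     + beta * gamma(x)), x
    return x
```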
Accelerating IST: TwIST (II)
A one-step method is recovered for $\alpha = 1$, which is an over-relaxed version of the original IST.

For the optimal choice of $\alpha$ and $\beta$, the number of iterations needed to decrease the error by a factor of 10 scales as $O(1/\sqrt{\xi_1})$ for TwIST, versus $O(1/\xi_1)$ for IST.

Example: with $\xi_1 = 10^{-4}$, that is on the order of $10^2$ TwIST iterations versus $10^4$ IST iterations.
Another two-step method (FISTA) was recently proposed in [Beck and Teboulle, 2008].
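For reference, a sketch of that method (FISTA): an IST step taken from an extrapolated point, with the standard $t_k$ momentum sequence; soft() is again the $\ell_1$ shrinkage from above, and $\alpha \ge \|A^TA\|_2$ is assumed:

```python
import numpy as np

def fista(A, y, tau, alpha, n_iters=500):
    x = np.zeros(A.shape[1])
    z, t = x.copy(), 1.0
    for _ in range(n_iters):
        x_new = soft(z + (A.T @ (y - A @ z)) / alpha, tau / alpha)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        z = x_new + ((t - 1.0) / t_new) * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x
```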
Accelerating IST: TwIST (III)

[Figure: original image, blurred image (9×9 blur, 40 dB noise), and restored image; plots of the objective function and SNR versus iterations (0 to 1000) for IST, over-relaxed IST, and TwIST.]
Accelerating IST: The SpaRSA Algorithmic Framework

Initialization: choose $\eta > 1$, $0 < \alpha_{\min} < \alpha_{\max}$, and $x_0$; set $t = 0$.
repeat:
    choose $\alpha_t \in [\alpha_{\min}, \alpha_{\max}]$
    repeat:
        $x_{t+1} = \Psi_{\tau/\alpha_t}\bigl(x_t - \frac{1}{\alpha_t}\nabla f(x_t)\bigr)$
        $\alpha_t \leftarrow \eta\,\alpha_t$
    until $x_{t+1}$ satisfies the acceptance criterion
    $t \leftarrow t + 1$
until the stopping criterion is satisfied.
[Wright, Nowak, F., 2008]

Variants of SpaRSA are distinguished by the choices of $\alpha_t$, the acceptance criterion, and the stopping criterion.

Examples: accepting every step with a fixed $\alpha_t = \alpha$ yields standard IST; requiring a decrease of the objective yields monotone SpaRSA.
Choosing $\alpha_t$ for Speed

The Barzilai-Borwein approach: seek to mimic a Newton step, choosing $\alpha_t$ so that $\alpha_t I \approx \nabla^2 f = A^TA$ over the last step; a less conservative choice than in IST.

With a least-squares criterion over the last step $s_t = x_t - x_{t-1}$:

$\alpha_t = \arg\min_\alpha \|\alpha\,s_t - A^TA\,s_t\|_2^2 = \frac{s_t^T A^TA\, s_t}{s_t^T s_t} = \frac{\|A s_t\|_2^2}{\|s_t\|_2^2}$

If $s_t \ne 0$, then $\lambda_{\min}(A^TA) \le \alpha_t \le \lambda_{\max}(A^TA)$ (a Rayleigh quotient).

Alternative rule (SpaRSA-monotone): accept $x_{t+1}$ only if the objective decreases, with $\alpha_t$ increased by the factor $\eta$ until acceptance.
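Putting the framework and the BB rule together, a hedged sketch (the monotone acceptance test and the numeric safeguards are simplifications of the choices discussed above; soft() is the $\ell_1$ shrinkage from earlier):

```python
import numpy as np

def sparsa_bb(A, y, tau, n_iters=200, eta=2.0, alpha_min=1e-12, alpha_max=1e12):
    obj = lambda x: 0.5 * np.sum((y - A @ x) ** 2) + tau * np.sum(np.abs(x))
    x, alpha = np.zeros(A.shape[1]), 1.0
    for _ in range(n_iters):
        grad = A.T @ (A @ x - y)
        while True:
            x_new = soft(x - grad / alpha, tau / alpha)
            if obj(x_new) <= obj(x):             # monotone acceptance criterion
                break
            alpha = min(eta * alpha, alpha_max)  # shorten the step and retry
        s = x_new - x
        x = x_new
        if s @ s > 0:  # BB rule: alpha = ||A s||^2 / ||s||^2, safeguarded
            alpha = np.clip((A @ s) @ (A @ s) / (s @ s), alpha_min, alpha_max)
    return x
```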
Compressed Sensing Experiment
$A$ is a $2^{10} \times 2^{12}$ random (Gaussian) matrix; $x$ has 160 randomly located non-zeros.

Problem: $\min_x \frac{1}{2}\|y - Ax\|_2^2 + \tau\,\|x\|_1$

Compared algorithms: GPSR [F., Nowak, Wright, 2007], FPC [Hale, Yin, Zhang, 2007], l1_ls [Kim, Koh, Lustig, Boyd, Gorinevsky, 2007], TwIST [Bioucas-Dias, F., 2007], and Nesterov's method [Nesterov, 2007].

GPSR and l1_ls are "hardwired" for $\phi = \|\cdot\|_1$.
Non-monotonicity

[Figure: objective function versus iteration for GPSR, SpaRSA, and SpaRSA-monotone, illustrating the non-monotone behavior of SpaRSA.]
Convergence of SpaRSA
Problem: $\min_x f(x) + \tau\,\phi(x)$

Critical point: $\bar x$ is critical if $0 \in \nabla f(\bar x) + \tau\,\partial\phi(\bar x)$.

Criticality is necessary for optimality. If both $f$ and $\phi$ are convex, it is also sufficient.

Safeguarded SpaRSA (S-SpaRSA) [Wright, Nowak, F., 2008]: accept $x_{t+1}$ only if

$F(x_{t+1}) \le \max_{i = \max(t-M,\,0), \dots, t} F(x_i) - \frac{\sigma\,\alpha_t}{2}\|x_{t+1} - x_t\|_2^2$

where $F = f + \tau\phi$ and $\sigma \in (0, 1)$, usually small, e.g., $\sigma = 0.01$.

Let $f$ be Lipschitz continuously differentiable, $\phi$ convex and finite-valued, and $F$ bounded below. Then, all accumulation points of S-SpaRSA are critical points of $F$.
Warm Starting and Continuation
SpaRSA (like GPSR, IST, etc.) is slow for small $\tau$.

SpaRSA (like GPSR and IST) is "warm-startable", i.e., it benefits (a lot) from a good initialization.

Continuation scheme: start with a large $\tau$ and slowly decrease it while tracking the solution.
IST + continuation = fixed point continuation (FPC) [Hale, Yin, Zhang, 2007]
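A sketch of such a continuation loop; `solver` stands for any warm-startable method above (assumed here to accept an initial point `x0`, which the earlier sketches would need as an extra argument), with a geometric schedule as one common choice:

```python
import numpy as np

def continuation(solver, A, y, tau_final, n_stages=10):
    # For tau >= ||A^T y||_inf the solution is exactly zero, so start there.
    tau0 = np.max(np.abs(A.T @ y))
    x = np.zeros(A.shape[1])
    for k in range(1, n_stages + 1):
        tau = tau0 * (tau_final / tau0) ** (k / n_stages)  # geometric decrease
        x = solver(A, y, tau, x0=x)  # warm start from the previous solution
    return x
```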
Continuation Experiment
For $\tau \ge \|A^T y\|_\infty$, the solution is the zero vector.
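A one-line check of this threshold, from the optimality condition and $\partial\|\cdot\|_1(0) = [-1, 1]^n$:

$$x = 0 \text{ minimizes } \tfrac{1}{2}\|y - Ax\|_2^2 + \tau\|x\|_1 \iff 0 \in -A^Ty + \tau\,[-1, 1]^n \iff \|A^Ty\|_\infty \le \tau.$$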
Conclusions
- Reviewed several ways to derive the IST algorithm.

- Reviewed several convergence results for IST.

- Described recent accelerated versions: TwIST and SpaRSA.

- IST and SpaRSA benefit (a lot) from a continuation scheme.

- State-of-the-art performance for a variety of problems: MRI reconstruction (TV and wavelets), MEG imaging, deconvolution, compressed sensing, …