
Algorithms for sparse optimization

Michael P. Friedlander, University of California, Davis

Google, May 19, 2015

Sparse representations

[Figure: a signal and its coefficients under each representation.]

y = Hx          y = Dx          y = [H D] x

Haar            DCT             Haar/DCT dictionary

Sparsity via a transform

Represent a signal y as a superposition of elementary signal atoms:

y = ∑_j φ_j x_j,   ie, y = Φx

Orthogonal bases yield a unique representation:

x = Φᵀy

Overcomplete dictionaries, eg, A = [Φ₁ Φ₂], have more flexibility, but the representation is not unique. One approach:

minimize nnz(x) subj to Ax ≈ y

Efficient transforms

[Figure: percentage of signal energy captured vs. percentage of coefficients retained, under the Identity, Fourier, and Wavelet transforms.]
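A minimal sketch of the idea behind the figure (not from the slides; the test signal, scipy's DCT, and the 10% cutoff are illustrative assumptions): a transform is "efficient" for a signal class when a small fraction of its coefficients captures most of the energy.

import numpy as np
from scipy.fft import dct

n = 256
t = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * 3 * t) + (t > 0.5)            # smooth part plus a jump

def energy_curve(coeffs):
    # Fraction of total energy captured by the k largest-magnitude coefficients.
    c = np.sort(np.abs(coeffs))[::-1]
    return np.cumsum(c**2) / np.sum(c**2)

identity_curve = energy_curve(y)                      # coefficients in the standard basis
dct_curve = energy_curve(dct(y, norm='ortho'))        # coefficients in an orthogonal DCT

# The DCT curve rises much faster: a handful of coefficients carry most of the energy.
print(identity_curve[n // 10], dct_curve[n // 10])    # energy in the top 10% of coefficients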


Prototypical workflow

Measure structured signal y via linear measurements:

b_i = 〈f_i, y〉,  i = 1, …, m        ie, b = My

Decode the m measurements:

MΦx ≈ b,  x "sparse"

Reconstruct the signal:

y := Φx

Guarantees: number of samples m needed to reconstruct signal w.h.p.
• compressed sensing (eg, M Gaussian / Φ orthogonal): O(k log(n/k))
• matrix completion: # matrix samples O(n^(5/4) r log n)

Sparse optimization

Problem: find sparse solution x

minimize nnz(x) subj to Ax ≈ b

Convex relaxations, eg:

basis pursuit [Chen et al. (1998); Candes et al. (2006)]

minimize_{x∈Rⁿ} ‖x‖₁ subj to Ax ≈ b    (m ≪ n)

nuclear norm [Fazel (2002); Candes and Recht (2009)]

minimize_{X∈R^{n×p}} ∑_i σ_i(X) subj to AX ≈ b    (m ≪ np)
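As a small illustration (hypothetical data; scipy's linprog stands in for the specialized solvers discussed later), the equality-constrained basis pursuit relaxation can be solved as a linear program by splitting x into nonnegative parts:

import numpy as np
from scipy.optimize import linprog

# Basis pursuit:  min ‖x‖₁  s.t.  Ax = b.  Write x = u − v with u, v ≥ 0,
# so that ‖x‖₁ = 1ᵀu + 1ᵀv and A(u − v) = b.
rng = np.random.default_rng(0)
m, n, k = 40, 100, 5
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
b = A @ x_true

c = np.ones(2 * n)                       # objective: sum of u and v
A_eq = np.hstack([A, -A])                # A u − A v = b
res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
x_bp = res.x[:n] - res.x[n:]

print(np.linalg.norm(x_bp - x_true))     # typically tiny when m is large enough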

APPLICATIONS and FORMULATIONS

Compressed sensing

[Figure: measurements 〈φ_1, y〉, …, 〈φ_m, y〉, ie, b = Φy with Φ Gaussian; representing y in a Haar/DCT dictionary gives b ≈ Φ [Haar DCT] x ≡ Ax.]

minimize ‖x‖₁ subj to Ax ≈ b

Image deblurring

Observe b = My, y = image, M = blurring operator. Recover significant coeff's via

minimize ½‖MWx − b‖² + λ‖x‖₁

λ           2.3e+0    2.3e-2    2.3e-4
‖Ax − b‖    2.6e+2    3.0e+0    3.1e-2
nnz(x)      72        1,741     28,484

Sparco problem blurrycam [Figueiredo et al. (2007)]

Joint sparsity: source localization

[Figure: a sensor array receives signals from sources s_1, …, s_5 arriving at angles θ with wavelength λ; measurement columns b_1, b_2, …, b_p relate to coefficient columns x_1, x_2, …, x_p through a phase-shift gain matrix whose columns correspond to candidate arrival angles.]

minimize ‖X‖₁,₂ subj to ‖B − AX‖_F ≤ σ        [Malioutov et al. (2005)]

Mass spectrometry – separating mixtures

[Figure: mass spectra (relative intensity vs. m/z); the observed mixture is a sum of the spectra of molecule 1, molecule 2, and molecule 3. The columns of A are the spectral fingerprints of candidate molecules. A bar plot compares the actual component percentages with those recovered from clean and noisy data.]

minimize ∑_i x_i subj to Ax ≈ b, x ≥ 0        [Du-Angeletti '06, Berg-F. '11]

Phase retrieval

Observe signal x ∈ Cⁿ through magnitudes:

b_k = |〈a_k, x〉|²,  k = 1, …, m

Lift.

b_k = |〈a_k, x〉|² = 〈a_k a_k*, xx*〉 = 〈a_k a_k*, X〉  with  X = xx*

Decode. Find X such that

〈a_k a_k*, X〉 ≈ b_k,  X ⪰ 0,  rank(X) = 1

Convex relaxation.

minimize_{X∈Hⁿ} tr(X) subj to 〈a_k a_k*, X〉 ≈ b_k,  X ⪰ 0

[Chai et al. (2011); Candes et al. (2012); Waldspurger et al. (2015)]

Matrix completion

Observe random entries of an m × p matrix Y.

b_k = Y_{i_k, j_k} ≡ e_{i_k}ᵀ Y e_{j_k} = 〈e_{i_k} e_{j_k}ᵀ, Y〉,  k = 1, …, m

Decode. Find m × p matrix X such that

〈e_{i_k} e_{j_k}ᵀ, X〉 ≈ b_k,  rank(X) = min

Convex relaxation.

minimize_{X∈R^{m×p}} ∑_i σ_i(X) subj to 〈e_{i_k} e_{j_k}ᵀ, X〉 ≈ b_k

[Fazel (2002); Candes and Recht (2009)]


CONVEX OPTIMIZATION

Convex optimization

minimize_{x∈C} f(x)

f : Rⁿ → R is a convex function, C ⊆ Rⁿ is a convex set

Essential objective.

f(x) : Rⁿ → R̄ := R ∪ {+∞},   f(x) = { f(x)  if x ∈ C
                                      { +∞    otherwise

No loss of generality in restricting ourselves to unconstrained problems:

minimize_x f(x) + g(Ax)

f : Rⁿ → R̄, g : Rᵐ → R̄, A is an m × n matrix

Convex functions

Definition. f : Rⁿ → R̄ is convex if its epigraph

epi f = { (x, τ) ∈ Rⁿ × R | f(x) ≤ τ }

is convex.

Examples

Indicator on a convex set.

δ_C(x) = { 0 if x ∈ C; +∞ otherwise },        epi δ_C = C × R₊

Supremum of convex functions.

f := sup_{j∈J} f_j,  {f_j}_{j∈J} convex functions,        epi f = ⋂_{j∈J} epi f_j

Infimal convolution (epi-addition).

(f □ g)(x) = inf_z { f(z) + g(x − z) },        epi(f □ g) = epi f + epi g

Examples

Generic form.

minimize_x f(x) + g(Ax)

Basis pursuit.

min_x ‖x‖₁ st Ax = b,        f = λ‖·‖₁,  g = δ_b

Basis pursuit denoising.

min_x ‖x‖₁ st ‖Ax − b‖₂ ≤ σ,        f = ‖·‖₁,  g = δ_{‖·−b‖₂≤σ}

Lasso.

min_x ½‖Ax − b‖² st ‖x‖₁ ≤ τ,        f = δ_{τB₁},  g = ½‖· − b‖²


DUALITY

Duality

minimize_x f(x) + g(Ax)

Decouple the objective terms:

minimize_{x,z} f(x) + g(z) subj to Ax = z

Lagrangian function.

L(x, z, y) := f(x) + g(z) + 〈y, Ax − z〉

encapsulates all the information about the problem:

sup_y L(x, z, y) = { f(x) + g(z) if Ax = z; +∞ otherwise }

p∗ = inf_{x,z} sup_y L(x, z, y)

Primal-dual pair.

p∗ = inf_{x,z} sup_y L(x, z, y)        (primal problem)

d∗ = sup_y inf_{x,z} L(x, z, y)        (dual problem)

The following always holds:

p∗ ≥ d∗        (weak duality)

Under some constraint qualification,

p∗ = d∗        (strong duality)

Fenchel dual

Dual problem.

sup_y inf_{x,z} L(x, z, y) = sup_y inf_{x,z} { f(x) + g(z) + 〈y, Ax − z〉 }
                           = sup_y { − sup_x [ 〈Aᵀy, x〉 − f(x) ] − sup_z [ 〈−y, z〉 − g(z) ] }

Conjugate function.

h∗(u) = sup_x { 〈u, x〉 − h(x) }

Fenchel-dual pair.

minimize_x f(x) + g(Ax)        minimize_y f∗(Aᵀy) + g∗(−y)


PROXIMAL METHODS

Proximal algorithm

minimize_x f(x)        (g ≡ 0)

Proximal operator.

prox_{αf}(x) := argmin_z { f(z) + (1/2α)‖z − x‖² }        (cf. the envelope f □ (1/2α)‖·‖²)

Optimality. Every optimal solution x∗ solves

x = prox_{αf}(x),  ∀α > 0

Proximal iteration.

x⁺ = prox_{αf}(x)

[Martinet (1970)]
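A minimal sketch of the proximal iteration (the quadratic f, the step α, and the iteration count are illustrative assumptions, not from the slides); for f(x) = ½xᵀQx − bᵀx the proximal map has the closed form (Q + I/α)⁻¹(b + x/α), and the iterates approach the minimizer of f:

import numpy as np

rng = np.random.default_rng(0)
n = 20
M = rng.standard_normal((n, n))
Q = M @ M.T + np.eye(n)            # symmetric positive definite
b = rng.standard_normal(n)
x_star = np.linalg.solve(Q, b)     # minimizer of f(x) = ½xᵀQx − bᵀx

alpha = 0.5
x = np.zeros(n)
for _ in range(100):
    # prox step: argmin_z f(z) + (1/2α)‖z − x‖²
    x = np.linalg.solve(Q + np.eye(n) / alpha, b + x / alpha)

print(np.linalg.norm(x - x_star))  # shrinks toward 0 as the iteration proceeds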

x_{k+1} = prox_{αf}(x_k) := argmin_z { f(z) + (1/2α)‖z − x_k‖² }

Descent. Because x_{k+1} is the minimizing argument,

f(x_{k+1}) + (1/2α_k)‖x_{k+1} − x_k‖² ≤ f(x) + (1/2α_k)‖x − x_k‖²    ∀x

Set x = x_k:

f(x_{k+1}) − f(x_k) ≤ −(1/2α_k)‖x_{k+1} − x_k‖²

Convergence.
• finite convergence for f polyhedral [Polyak and Tretyakov (1973)]
• subproblems may be solved approximately [Rockafellar (1976)]
• f(x_k) − p∗ ≤ O(1/k) [Guler (1991)]
• f(x_k) − p∗ ≤ O(1/k²) via an accelerated variant [Guler (1992)]

Moreau-Yosida regularization

M-Y envelope     fα(x) = min_z { f(z) + (1/2α)‖z − x‖² }

proximal map     prox_{αf}(x) = argmin_z { f(z) + (1/2α)‖z − x‖² }

Useful properties:
• differentiable: ∇fα(x) = (1/α)[x − prox_{αf}(x)]
• preserves minima: argmin fα = argmin f
• decomposition: x = prox_f(x) + prox_{f∗}(x)

Examples

vector 1-norm       prox_{α‖·‖₁}(x) = sgn(x) .∗ max{|x| − α, 0}

Schatten 1-norm     prox_{α‖·‖₁}(X) = UΣVᵀ,  Σ = Diag[prox_{α‖·‖₁}(σ(X))]
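A minimal sketch of the vector case (the function names and the test vector are illustrative): the prox of the 1-norm is componentwise soft-thresholding, and the envelope-gradient identity ∇fα(x) = (1/α)[x − prox_{αf}(x)] can be checked numerically:

import numpy as np

def prox_l1(x, alpha):
    # prox_{α‖·‖₁}(x) = sgn(x) * max(|x| − α, 0)
    return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

def envelope_l1(x, alpha):
    # fα(x) = min_z ‖z‖₁ + (1/2α)‖z − x‖², attained at z = prox_{α‖·‖₁}(x)
    z = prox_l1(x, alpha)
    return np.sum(np.abs(z)) + np.sum((z - x) ** 2) / (2 * alpha)

x = np.array([3.0, -0.2, 0.5, -4.0])
alpha = 1.0
z = prox_l1(x, alpha)
grad = (x - z) / alpha                               # ∇fα(x) via the identity above

eps = 1e-6                                           # finite-difference check
fd = np.array([(envelope_l1(x + eps * e, alpha) - envelope_l1(x, alpha)) / eps
               for e in np.eye(len(x))])
print(z, grad, fd)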

Augmented Lagrangian (AL) algorithm

Rockafellar ('73) connects AL to proximal method on dual

minimize f(x) subj to Ax = b,   ie, minimize f(x) + δ_b(Ax)

Lagrangian:      L(x, y) = f(x) + yᵀ(Ax − b)

aug Lagrangian:  Lρ(x, y) = f(x) + yᵀ(Ax − b) + (ρ/2)‖Ax − b‖²

AL algorithm:

x_{k+1} = argmin_x Lρ(x, y_k)           [may be inexact]
y_{k+1} = y_k + ρ(Ax_{k+1} − b)         [multiplier update]

• for linear constraints, AL algorithm ≡ Bregman
• used by L1-Bregman for f(x) = ‖x‖₁ [Yin et al. (2008)]
• proposed by Hestenes/Powell ('69) for smooth nonlinear optimization
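A minimal sketch of the AL iteration (the choice f(x) = ½‖x‖², the penalty ρ, and the data are illustrative assumptions; this f gives a closed-form x-update, whereas the ℓ₁ case needs an inner solver):

import numpy as np

rng = np.random.default_rng(1)
m, n = 10, 30
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x_star = A.T @ np.linalg.solve(A @ A.T, b)     # minimum-norm solution of Ax = b

rho = 1.0
x, y = np.zeros(n), np.zeros(m)
for _ in range(50):
    # x-update: argmin_x ½‖x‖² + yᵀ(Ax − b) + (ρ/2)‖Ax − b‖²
    x = np.linalg.solve(np.eye(n) + rho * A.T @ A, A.T @ (rho * b - y))
    # multiplier update
    y = y + rho * (A @ x - b)

print(np.linalg.norm(A @ x - b), np.linalg.norm(x - x_star))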


SPLITTING METHODS

projected gradient, proximal-gradient, iterative soft-thresholding, alternating projections

Proximal gradient

minimize_x f(x) + g(x)

Algorithm.

x⁺ = prox_{αg}(x − α∇f(x))

with

α ∈ (µ, 2/L),  µ > 0,  ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖

Convergence.
• f(x_k) − p∗ ≤ O(1/k)    constant stepsize (α ≡ 1/L)
• f(x_k) − p∗ ≤ O(1/k²)   accelerated variant [Beck and Teboulle (2009)]
• f(x_k) − p∗ ≤ O(γᵏ), γ < 1, with stronger assumptions

Interpretations

Majorization minimization. Lipschitz assumption on ∇f implies

f(z) ≤ f(x) + 〈∇f(x), z − x〉 + (L/2)‖z − x‖²    ∀x, z

Proximal-gradient iteration minimizes the majorant: ∀α ∈ (0, 1/L)

x⁺ = prox_{αg}(x − α∇f(x)) ≡ argmin_z { g(z) + f(x) + 〈∇f(x), z − x〉 + (1/2α)‖z − x‖² }

Forward-backward splitting.

x⁺ = prox_{αg}(x − α∇f(x)) = (I + α∂g)⁻¹ (I − α∇f)(x)

where I − α∇f is the forward (explicit gradient) step and (I + α∂g)⁻¹ is the backward (resolvent) step.

Projected gradient

minimize f(x) subj to x ∈ C        x_{k+1} = proj_C(x_k − α_k∇f(x_k))

LASSO

minimize_{‖x‖₁≤τ} ½‖Ax − b‖²    via    f(x) = ½‖Ax − b‖²,  g(x) = δ_{τB₁}(x)

• proj_C(·) can be computed in O(n) (randomized) [Duchi et al. (2008)]
• used by SPGL1 for Lasso subproblems [vdBerg and Friedlander (2008)]

BPDN (Lagrangian form)

minimize_{x∈Rⁿ} ½‖Ax − b‖² + λ‖x‖₁    via    minimize_{x∈R^{2n}_+} ½‖Ax − b‖² + λeᵀx

• proj_{≥0}(·) can be computed in O(n)
• GPSR uses Barzilai-Borwein for step α_k [Figueiredo-Nowak-Wright 07]
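A minimal sketch of the LASSO case (the sorting-based ℓ₁-ball projection below is the standard O(n log n) variant rather than the randomized O(n) one cited above; the data and iteration count are illustrative):

import numpy as np

def proj_l1_ball(v, tau):
    # Euclidean projection of v onto {x : ‖x‖₁ ≤ τ}
    if np.sum(np.abs(v)) <= tau:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.max(np.nonzero(u - (css - tau) / j > 0)[0]) + 1
    theta = (css[rho - 1] - tau) / rho
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

rng = np.random.default_rng(2)
m, n = 50, 200
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
tau = 5.0

L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of ∇f = Aᵀ(Ax − b)
x = np.zeros(n)
for _ in range(300):
    x = proj_l1_ball(x - (1.0 / L) * (A.T @ (A @ x - b)), tau)

print(np.sum(np.abs(x)), 0.5 * np.linalg.norm(A @ x - b) ** 2)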

Iterative soft-thresholding (IST)

minimize_x ½‖Ax − b‖² + λ‖x‖₁

x⁺ = prox_{αλ‖·‖₁}(x − αAᵀr),  r ≡ Ax − b
   = S_{αλ}(x − αAᵀr)

where

α ∈ (0, 2/‖A‖²)  and  S_τ(x) = sgn(x) · max{|x| − τ, 0}

[aka iterative shrinkage, iterative Landweber]

Derived from various viewpoints:
• expectation-maximization [Figueiredo & Nowak '03]
• surrogate functionals [Daubechies et al '04]
• forward-backward splitting [Combettes & Wajs '05]
• FISTA (accelerated variant) [Beck-Teboulle '09]
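A minimal sketch of IST for the problem above (the data, λ, and the fixed iteration count are illustrative assumptions):

import numpy as np

def soft_threshold(x, tau):
    # S_τ(x) = sgn(x) · max{|x| − τ, 0}
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

rng = np.random.default_rng(3)
m, n = 60, 200
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 8, replace=False)] = 1.0
b = A @ x_true + 0.01 * rng.standard_normal(m)

lam = 0.1
alpha = 1.0 / np.linalg.norm(A, 2) ** 2     # α ∈ (0, 2/‖A‖²)
x = np.zeros(n)
for _ in range(500):
    r = A @ x - b
    x = soft_threshold(x - alpha * (A.T @ r), alpha * lam)

objective = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.sum(np.abs(x))
print(np.count_nonzero(np.abs(x) > 1e-6), objective)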

Low-rank approximation via nuclear-norm:

minimize ½‖AX − b‖² + λ‖X‖_n  with  ‖X‖_n = ∑_j σ_j(X)

Singular-value soft-thresholding algorithm:

X⁺ = S_{αλ}(X − αA∗(r))

with

r := AX − b,   S_λ(Z) = U Diag(σ̄)Vᵀ,  σ̄ = S_λ(σ(Z))

Computing S_λ:

No need to compute full SVD of Z, only leading singular values/vectors
• Monte-Carlo approach [Ma-Goldfarb-Chen '08]
• Lanczos bidiagonalization (via PROPACK)

[Cai-Candes-Shen '08; Jain et al '10]
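A minimal sketch of the singular-value soft-thresholding operator (a dense SVD is used purely for illustration, in contrast to the truncated approaches listed above; the test matrix and λ are assumptions):

import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def svt(Z, lam):
    # S_λ(Z) = U Diag(S_λ(σ(Z))) Vᵀ, the prox of λ‖·‖_n
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    return U @ np.diag(soft_threshold(s, lam)) @ Vt

rng = np.random.default_rng(4)
Z = rng.standard_normal((50, 8)) @ rng.standard_normal((8, 40))   # rank-8 matrix
Z_noisy = Z + 0.1 * rng.standard_normal(Z.shape)

Z_hat = svt(Z_noisy, lam=2.0)
print(np.linalg.matrix_rank(Z_hat), np.linalg.norm(Z_hat - Z) / np.linalg.norm(Z))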

Typical behavior

minimize_{X∈R^{n₁×n₂}} ‖X‖₁ subj to X_Ω ≈ b

recover 1000 × 1000 matrix X,  rank(X∗) = 4

[Figure: number of singular triples computed at each iteration, over roughly 30 iterations.]

Backward-backward splitting

Approximate one of the functions via its (smooth) Moreau envelope.

minimize_x fα(x) + g(x)        (both nonsmooth)

fα = f □ (1/2α)‖·‖²,   ∇fα(x) = (1/α)[x − prox_{αf}(x)]

Proximal gradient.

x⁺ = prox_{αg}(x − α∇fα(x)) = prox_{αg} ∘ prox_{αf}(x)

Projection onto convex sets (POCS).

f = δ_C,  g = δ_D

x⁺ = proj_D ∘ proj_C(x)

solves the problem

minimize_{x,z} ½‖z − x‖² subj to x ∈ D, z ∈ C
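A minimal POCS sketch (the two sets, an affine set C = {x : Ax = b} and the nonnegative orthant D, are illustrative choices; b is built so that the sets intersect):

import numpy as np

rng = np.random.default_rng(5)
m, n = 5, 20
A = rng.standard_normal((m, n))
b = A @ np.abs(rng.standard_normal(n))        # guarantees a nonnegative feasible point

AAt_inv = np.linalg.inv(A @ A.T)
proj_C = lambda x: x - A.T @ (AAt_inv @ (A @ x - b))   # projection onto {x : Ax = b}
proj_D = lambda x: np.maximum(x, 0.0)                  # projection onto {x : x ≥ 0}

x = np.zeros(n)
for _ in range(500):
    x = proj_D(proj_C(x))

print(np.linalg.norm(A @ x - b), x.min())     # residual shrinks; x stays nonnegative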

Alternating direction method of multipliers (ADMM)

minimize_x f(x) + g(x)

ADMM algorithm.

x⁺ = prox_{αf}(z − u)
z⁺ = prox_{αg}(x⁺ + u)
u⁺ = u + x⁺ − z⁺

• used by YALL1 [Zhang et al. (2011)]
• ADMM algo ≡ Douglas-Rachford ≡ (for linear constraints) split Bregman
• Esser's UCLA PhD thesis ('10); survey by Boyd et al. ('11)
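A minimal sketch of the scheme above applied to BPDN, taking f(x) = ½‖Ax − b‖² (its prox is a linear solve) and g = λ‖·‖₁ (its prox is soft-thresholding); the data, λ, and α are illustrative assumptions:

import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

rng = np.random.default_rng(6)
m, n = 60, 150
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, 6, replace=False)] = 1.0
b = A @ x_true

lam, alpha = 0.1, 1.0
M = A.T @ A + np.eye(n) / alpha                     # system matrix for prox_{αf}
Atb = A.T @ b
x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
for _ in range(200):
    x = np.linalg.solve(M, Atb + (z - u) / alpha)   # x⁺ = prox_{αf}(z − u)
    z = soft_threshold(x + u, alpha * lam)          # z⁺ = prox_{αg}(x⁺ + u)
    u = u + x - z                                   # u⁺ = u + x⁺ − z⁺

print(np.count_nonzero(np.abs(z) > 1e-6), np.linalg.norm(A @ z - b))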


PROXIMAL SMOOTHING

Primal smoothing

minimize_x f(x) + g(Ax)

Partial smoothing via the Moreau envelope:

minimize_x fµ(x) + g(Ax),   fµ = f □ (1/2µ)‖·‖²

Example (Basis Pursuit). Huber approximation to 1-norm:

f = ‖·‖₁,  g = δ_b  ⟹  minimize_x ‖x‖₁ subj to Ax = b

Smoothed function is Huber with parameter µ:

fµ(x) = { x²/2µ        if |x| ≤ µ
        { |x| − µ/2    otherwise

µ ↓ 0  ⟹  x∗_µ → x∗
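A minimal sketch of the Huber envelope and its gradient (the grid of test points and µ are illustrative; the gradient formula comes from the envelope identity ∇fµ(x) = (1/µ)[x − prox_{µ|·|}(x)]):

import numpy as np

def huber(x, mu):
    # componentwise Moreau envelope of |·| with parameter µ
    return np.where(np.abs(x) <= mu, x**2 / (2 * mu), np.abs(x) - mu / 2)

def huber_grad(x, mu):
    # (1/µ)[x − prox_{µ|·|}(x)] = clip(x/µ, −1, 1)
    return np.clip(x / mu, -1.0, 1.0)

x = np.linspace(-3, 3, 7)
print(huber(x, mu=1.0))
print(huber_grad(x, mu=1.0))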

Dual smoothing

Regularize the primal problem.

minimize_x f(x) + (µ/2)‖x‖² + g(Ax)

Dualize. Regularizing f smoothes the conjugate:

(f + (µ/2)‖·‖²)∗ = f∗ □ (1/2µ)‖·‖² = f∗_µ

minimize_y f∗_µ(Aᵀy) + g∗(−y)

Example (Basis Pursuit).

f = ‖·‖₁,  g = δ_b  ⟹  min_x ‖x‖₁ + (µ/2)‖x‖² subj to Ax = b

f∗_µ = (1/2µ) dist²_{B∞}  ⟹  max_y 〈b, y〉 − (1/2µ) dist²_{B∞}(Aᵀy)

Exact regularization: x∗_µ = x∗ for all µ ∈ [0, µ̄), µ̄ > 0
[Mangasarian and Meyer (1979); Friedlander and Tseng (2007)]

[Nesterov (2005); Beck and Teboulle (2012)]; TFOCS [Becker et al. (2011)]


CONCLUSION

much left unsaid, eg,
• optimal first-order methods [Nesterov (1988, 2005)]
• block coordinate descent [Sardy et al. (2004)]
• active-set methods (eg, LARS & Homotopy) [Osborne et al. (2000)]

my own work.
• avoiding proximal operators [Friedlander et al. (2014) (SIOPT)]
• greedy coordinate descent [Nutini et al. (2015) (ICML)]

email: [email protected]

web: www.math.ucdavis.edu/~mpf


REFERENCES

References I

A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.

A. Beck and M. Teboulle. Smoothing and first order methods: A unified framework. SIAM J. Optim., 22(2):557–580, 2012.

S. Becker, E. Candes, and M. Grant. Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comp., 3:165–218, 2011. doi: 10.1007/s12532-011-0029-5.

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

J.-F. Cai, E. J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20(4):1956–1982, 2010. doi: 10.1137/080738970.

E. J. Candes and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, December 2009.

References II

E. J. Candes, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.

E. J. Candes, T. Strohmer, and V. Voroninski. Phaselift: Exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pur. Appl. Ana., 2012.

A. Chai, M. Moscoso, and G. Papanicolaou. Array imaging using intensity-only measurements. Inverse Problems, 27(1):015005, 2011. URL http://stacks.iop.org/0266-5611/27/i=1/a=015005.

S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1998.

P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul., 4(4):1168–1200, 2005.

I. Daubechies, M. Defrise, and C. D. Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math., 57:1413–1457, 2004.

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279, 2008.

References III

J. E. Esser. Primal Dual Algorithms for Convex Models and Applications to Image Restoration, Registration and Nonlocal Inpainting. PhD thesis, University of California, Los Angeles, 2010.

M. Fazel. Matrix rank minimization with applications. PhD thesis, Elec. Eng. Dept, Stanford University, 2002.

M. Figueiredo, R. Nowak, and S. J. Wright. Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems. Sel. Top. in Signal Process., IEEE J., 1(4):586–597, 2007.

M. A. T. Figueiredo and R. D. Nowak. An EM algorithm for wavelet-based image restoration. IEEE Trans. Image Process., 12(8):906–916, August 2003.

M. P. Friedlander and P. Tseng. Exact regularization of convex programs. SIAM J. Optim., 18(4):1326–1350, 2007. doi: 10.1137/060675320.

M. P. Friedlander, I. Macedo, and T. K. Pong. Gauge optimization and duality. SIAM J. Optim., 24(4):1999–2022, 2014. doi: 10.1137/130940785.

O. Guler. On the convergence of the proximal point algorithm for convex minimization. SIAM Journal on Control and Optimization, 29(2):403–419, 1991. doi: 10.1137/0329022. URL http://dx.doi.org/10.1137/0329022.

References IV

O. Guler. New proximal point algorithms for convex minimization. SIAM Journal on Optimization, 2(4):649–664, 1992. doi: 10.1137/0802032. URL http://dx.doi.org/10.1137/0802032.

M. R. Hestenes. Multiplier and gradient methods. J. Optim. Theory Appl., 4(5):303–320, 1969.

S. Ma, D. Goldfarb, and L. Chen. Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program., 128:321–353, 2011. doi: 10.1007/s10107-009-0306-5.

D. Malioutov, M. Cetin, and A. S. Willsky. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Sig. Proc., 53(8):3010–3022, August 2005.

O. L. Mangasarian and R. R. Meyer. Nonlinear perturbation of linear programs. SIAM J. Control Optim., 17(6):745–752, November 1979.

B. Martinet. Régularisation d'inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle, 4(Sér. R-3):154–158, 1970.

Y. Nesterov. On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonom. i. Mat. Metody, 24, 1988.

References V

Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103:127–152, 2005.

P. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternating minimization. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2796–2804. Curran Associates, Inc., 2013. URL http://papers.nips.cc/paper/5041-pha.

J. Nutini, I. Laradji, M. Schmidt, and M. Friedlander. Coordinate descent converges faster with the Gauss-Southwell rule than random selection. In Inter. Conf. Mach. Learning, 2015.

M. R. Osborne, B. Presnell, and B. A. Turlach. A new approach to variable selection in least squares problems. IMA J. Numer. Anal., 20(3):389–403, 2000.

B. Polyak and N. Tretyakov. The method of penalty bounds for constrained extremum problems. Zh. Vych. Mat i Mat. Fiz, 13:34–46, 1973.

M. J. D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, chapter 19. Academic Press, London and New York, 1969.

References VI

R. T. Rockafellar. Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Mathematics of Operations Research, 1(2):97–116, 1976.

S. Sardy, A. Antoniadis, and P. Tseng. Automatic smoothing with wavelets for a wide class of distributions. J. Comput. Graph. Stat., 13, 2004.

E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890–912, 2008. doi: 10.1137/080714488.

I. Waldspurger, A. d'Aspremont, and S. Mallat. Phase recovery, MaxCut and complex semidefinite programming. Math. Program., 149(1-2):47–81, 2015.

W. Yin, S. Osher, D. Goldfarb, and J. Darbon. Bregman iterative algorithms for l1 minimization with applications to compressed sensing. SIAM J. Imag. Sci., 1, 2008.

Y. Zhang, W. Deng, J. Yang, and W. Yin, 2011. URL http://yall1.blogs.rice.edu/.