Algorithms for sparse optimization
Michael P. Friedlander
University of California, Davis

Google, May 19, 2015
Sparse representations
y = Hx        y = Dx        y = [H D] x

[Figure: a test signal and its coefficients in the Haar basis, the DCT basis, and the combined Haar/DCT dictionary (coefficient vectors of length 100, 100, and 200).]
Sparsity via a transform
Represent a signal y as a superposition of elementary signal atoms:
y = Σ_j φ_j x_j,   ie, y = Φx

Orthogonal bases yield a unique representation:

x = Φᵀy

Overcomplete dictionaries, eg, A = [Φ1 Φ2], have more flexibility, but the representation is not unique. One approach:

minimize nnz(x) subj to Ax ≈ y
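For concreteness, a minimal NumPy sketch (not from the slides) of the orthogonal-transform picture: build an orthonormal DCT-II basis Φ, compute x = Φᵀy for a smooth test signal, and count how few coefficients carry most of the energy (the effect plotted on the next slide). The signal and basis are illustrative choices.

    import numpy as np

    n = 100
    t = np.linspace(0, 1, n)
    y = np.cos(2 * np.pi * 3 * t) + 0.5 * np.cos(2 * np.pi * 7 * t)  # smooth test signal

    # Orthonormal DCT-II basis: columns of Phi are the atoms, so y = Phi @ x.
    k = np.arange(n)[:, None]          # frequency index
    j = np.arange(n)[None, :]          # sample index
    C = np.cos(np.pi * (j + 0.5) * k / n)
    C[0] *= np.sqrt(1.0 / n)
    C[1:] *= np.sqrt(2.0 / n)
    Phi = C.T                          # synthesis operator: y = Phi x

    x = Phi.T @ y                      # unique coefficients, since Phi is orthogonal
    assert np.allclose(Phi @ x, y)

    # Energy compaction: a few coefficients carry almost all the energy.
    energy = np.sort(x**2)[::-1].cumsum() / (x**2).sum()
    print("coefficients for 99% energy:", np.searchsorted(energy, 0.99) + 1)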
Efficient transforms
[Figure: three panels (Identity, Fourier, Wavelet), each plotting the percentage of signal energy captured against the percentage of coefficients kept.]
Prototypical workflow
Measure structured signal y via linear measurements:
b_i = ⟨f_i, y⟩,   i = 1, …, m,    ie, b = My

Decode the m measurements:

MΦx ≈ b,   x “sparse”

Reconstruct the signal:  y := Φx

Guarantees: number of samples m needed to reconstruct the signal w.h.p.
• compressed sensing (eg, M Gaussian, Φ orthogonal): O(k log(n/k))
• matrix completion: # matrix samples O(n^{5/4} r log n)
Sparse optimization
Problem: find a sparse solution x

minimize nnz(x) subj to Ax ≈ b

Convex relaxations, eg:

basis pursuit [Chen et al. (1998); Candes et al. (2006)]

minimize_{x∈Rⁿ} ‖x‖₁ subj to Ax ≈ b   (m ≪ n)

nuclear norm [Fazel (2002); Candes and Recht (2009)]

minimize_{X∈R^{n×p}} Σ_i σ_i(X) subj to AX ≈ b   (m ≪ np)
APPLICATIONS and FORMULATIONS
Compressed sensing
Measurements: b = (⟨φ1, y⟩, …, ⟨φm, y⟩) = Φy, with Φ Gaussian.

Representing y in a Haar/DCT dictionary:  Φy ≈ Φ [Haar DCT] x,   ie, b ≈ Ax

minimize ‖x‖₁ subj to Ax ≈ b
Image deblurring
Observe b = My, where y = image and M = blurring operator. Recover the significant coefficients via

minimize ½‖MWx − b‖² + λ‖x‖₁

    λ          2.3e+0    2.3e-2    2.3e-4
    ‖Ax − b‖   2.6e+2    3.0e+0    3.1e-2
    nnz(x)         72     1,741    28,484

Sparco problem blurrycam [Figueiredo et al. (2007)]
Joint sparsity: source localization
[Figure: sources s1, …, s5 impinging on a sensor array (wavelength λ, arrival angle θ); the observations B = [b1 · · · bp] equal a phase-shift gain matrix, with columns indexed by arrival angle, times X = [x1 · · · xp].]

minimize ‖X‖_{1,2} subj to ‖B − AX‖_F ≤ σ

[Malioutov et al. (2005)]
Mass spectrometry – separating mixtures
[Figure: mass spectra (relative intensity vs m/z). A measured mixture decomposes as the sum of the spectral fingerprints of three molecules; the fingerprints form the columns of A. A bar plot of component percentages compares the actual mixture against the recovered ones, with and without noise.]

minimize Σ_i x_i subj to Ax ≈ b, x ≥ 0   [Du-Angeletti ’06, Berg-F. ’11]
Phase retrieval

Observe a signal x ∈ Cⁿ through magnitudes:

b_k = |⟨a_k, x⟩|²,   k = 1, …, m

Lift.

b_k = |⟨a_k, x⟩|² = ⟨a_k a_k*, xx*⟩ = ⟨a_k a_k*, X⟩ with X = xx*

Decode. Find X such that

⟨a_k a_k*, X⟩ ≈ b_k,   X ⪰ 0,   rank(X) = 1

Convex relaxation.

minimize_{X∈Hⁿ} tr(X) subj to ⟨a_k a_k*, X⟩ ≈ b_k, X ⪰ 0

[Chai et al. (2011); Candes et al. (2012); Waldspurger et al. (2015)]
Matrix completion
Observe random entries of an m × p matrix Y:

b_k = Y_{i_k,j_k} ≡ e_{i_k}ᵀ Y e_{j_k} = ⟨e_{i_k} e_{j_k}ᵀ, Y⟩,   k = 1, …, m

Decode. Find an m × p matrix X such that

⟨e_{i_k} e_{j_k}ᵀ, X⟩ ≈ b_k,   rank(X) minimal

Convex relaxation.

minimize_{X∈R^{m×p}} Σ_i σ_i(X) subj to ⟨e_{i_k} e_{j_k}ᵀ, X⟩ ≈ b_k

[Fazel (2002); Candes and Recht (2009)]
CONVEX OPTIMIZATION
Convex optimization
minimize_{x∈C} f(x)

f : Rⁿ → R is a convex function, C ⊆ Rⁿ is a convex set.

Essential objective. Extend f to f̄ : Rⁿ → R̄ := R ∪ {+∞} with

f̄(x) = { f(x) if x ∈ C;  +∞ otherwise }

No loss of generality in restricting ourselves to unconstrained problems:

minimize_x f(x) + g(Ax)

f : Rⁿ → R̄, g : Rᵐ → R̄, A an m × n matrix
Convex functions
Definition. f : Rⁿ → R̄ is convex if its epigraph

epi f = { (x, τ) ∈ Rⁿ × R | f(x) ≤ τ }

is convex.
Examples
Indicator on a convex set.

δ_C(x) = { 0 if x ∈ C;  +∞ otherwise },    epi δ_C = C × R₊

Supremum of convex functions.

f := sup_{j∈J} f_j,  {f_j}_{j∈J} convex functions,    epi f = ⋂_{j∈J} epi f_j

Infimal convolution (epi-addition).

(f □ g)(x) = inf_z { f(z) + g(x − z) },    epi(f □ g) = epi f + epi g
Examples
Generic form.

minimize_x f(x) + g(Ax)

Basis pursuit.

min_x ‖x‖₁ st Ax = b:    f = ‖·‖₁,  g = δ_b

Basis pursuit denoising.

min_x ‖x‖₁ st ‖Ax − b‖₂ ≤ σ:    f = ‖·‖₁,  g = δ_{‖·−b‖₂≤σ}

Lasso.

min_x ½‖Ax − b‖² st ‖x‖₁ ≤ τ:    f = δ_{τB₁},  g = ½‖· − b‖²
DUALITY
Duality
minimize_x f(x) + g(Ax)

Decouple the objective terms:

minimize_{x,z} f(x) + g(z) subj to Ax = z

Lagrangian function.

L(x, z, y) := f(x) + g(z) + ⟨y, Ax − z⟩

encapsulates all the information about the problem:

sup_y L(x, z, y) = { f(x) + g(z) if Ax = z;  +∞ otherwise }

p* = inf_{x,z} sup_y L(x, z, y)
Primal-dual pair.
p* = inf_{x,z} sup_y L(x, z, y)   (primal problem)

d* = sup_y inf_{x,z} L(x, z, y)   (dual problem)

The following always holds:

p* ≥ d*   (weak duality)

Under some constraint qualification,

p* = d*   (strong duality)
Fenchel dual

Dual problem.

sup_y inf_{x,z} L(x, z, y) = sup_y inf_{x,z} { f(x) + g(z) + ⟨y, Ax − z⟩ }
                           = sup_y { − sup_x [⟨−Aᵀy, x⟩ − f(x)] − sup_z [⟨y, z⟩ − g(z)] }

Conjugate function.

h*(u) = sup_x { ⟨u, x⟩ − h(x) }

Relabeling y ↦ −y yields the Fenchel-dual pair:

minimize_x f(x) + g(Ax)        (primal)
minimize_y f*(Aᵀy) + g*(−y)    (dual)
PROXIMAL METHODS
Proximal algorithm
minimize_x f(x)   (g ≡ 0)

Proximal operator.

prox_{αf}(x) := argmin_z { f(z) + (1/2α)‖z − x‖² }

(its optimal value is the Moreau envelope (f □ (1/2α)‖·‖²)(x); see below)

Optimality. Every optimal solution x* is a fixed point:

x* = prox_{αf}(x*),   ∀α > 0

Proximal iteration.

x⁺ = prox_{αf}(x)

[Martinet (1970)]
x_{k+1} = prox_{α_k f}(x_k) := argmin_z { f(z) + (1/2α_k)‖z − x_k‖² }

Descent. Because x_{k+1} is the minimizing argument,

f(x_{k+1}) + (1/2α_k)‖x_{k+1} − x_k‖² ≤ f(x) + (1/2α_k)‖x − x_k‖²   ∀x

Setting x = x_k gives

f(x_{k+1}) − f(x_k) ≤ −(1/2α_k)‖x_{k+1} − x_k‖²

Convergence.
• finite convergence for f polyhedral [Polyak and Tretyakov (1973)]
• subproblems may be solved approximately [Rockafellar (1976)]
• f(x_k) − p* ≤ O(1/k) [Guler (1991)]
• f(x_k) − p* ≤ O(1/k²) via an accelerated variant [Guler (1992)]
Moreau-Yosida regularization
M-Y envelope:    f_α(x) = min_z { f(z) + (1/2α)‖z − x‖² }
proximal map:    prox_{αf}(x) = argmin_z { f(z) + (1/2α)‖z − x‖² }

Useful properties:
• differentiable: ∇f_α(x) = (1/α)[x − prox_{αf}(x)]
• preserves minima: argmin f_α = argmin f
• decomposition: x = prox_f(x) + prox_{f*}(x)

Examples

vector 1-norm:    prox_{α‖·‖₁}(x) = sgn(x) .* max{|x| − α, 0}
Schatten 1-norm:  prox_{α‖·‖₁}(X) = U Σ̄ Vᵀ,   Σ̄ = Diag[prox_{α‖·‖₁}(σ(X))]
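A minimal NumPy sketch of these two prox maps (not from the slides; the test vector and rank-1 test matrix are arbitrary illustrations):

    import numpy as np

    def prox_l1(x, alpha):
        """Soft-thresholding: prox of alpha*||.||_1, applied componentwise."""
        return np.sign(x) * np.maximum(np.abs(x) - alpha, 0.0)

    def prox_nuclear(Z, alpha):
        """Prox of alpha * Schatten 1-norm: soft-threshold the singular values."""
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        return U @ np.diag(prox_l1(s, alpha)) @ Vt

    x = np.array([3.0, -0.5, 1.2])
    print(prox_l1(x, 1.0))                   # -> [ 2.  -0.   0.2]

    Z = np.outer([1.0, 2.0], [1.0, -1.0])    # rank-1 test matrix
    print(np.linalg.matrix_rank(prox_nuclear(Z, 0.1)))   # -> 1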
Augmented Lagrangian (AL) algorithm
Rockafellar (’76) connects AL to the proximal method applied to the dual.

minimize f(x) subj to Ax = b,   ie, f(x) + δ_b(Ax)

Lagrangian:       L(x, y)  = f(x) + yᵀ(Ax − b)
aug Lagrangian:   L_ρ(x, y) = f(x) + yᵀ(Ax − b) + (ρ/2)‖Ax − b‖²

AL algorithm:

x_{k+1} = argmin_x L_ρ(x, y_k)     [may be inexact]
y_{k+1} = y_k + ρ(Ax_{k+1} − b)    [multiplier update]

• for linear constraints, AL algorithm ≡ Bregman iteration
• used by L1-Bregman for f(x) = ‖x‖₁ [Yin et al. (2008)]
• proposed by Hestenes/Powell (’69) for smooth nonlinear optimization
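As an illustrative sketch (not from the slides), the AL iteration for the toy problem min ½‖x‖² subj to Ax = b, where the x-step has a closed form; the data is random and ρ = 1 is an arbitrary choice.

    import numpy as np

    rng = np.random.default_rng(0)
    m, n, rho = 5, 20, 1.0
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    y = np.zeros(m)
    for _ in range(50):
        # x-step: minimize (1/2)||x||^2 + y'(Ax-b) + (rho/2)||Ax-b||^2,
        # whose optimality condition is (I + rho A'A) x = A'(rho*b - y).
        x = np.linalg.solve(np.eye(n) + rho * A.T @ A, A.T @ (rho * b - y))
        y = y + rho * (A @ x - b)          # multiplier update

    print("feasibility:", np.linalg.norm(A @ x - b))   # -> near 0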
SPLITTING METHODS
projected gradient, proximal-gradient, iterative soft-thresholding, alternating projections
Proximal gradient
minimize_x f(x) + g(x)

Algorithm.

x⁺ = prox_{αg}(x − α∇f(x))

with α ∈ (µ, 2/L), µ > 0, where L is the Lipschitz constant of ∇f:
‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖

Convergence.
• f(x_k) − p* ≤ O(1/k)    constant stepsize (α ≡ 1/L)
• f(x_k) − p* ≤ O(1/k²)   accelerated variant [Beck and Teboulle (2009)]
• f(x_k) − p* ≤ O(γᵏ), γ < 1, with stronger assumptions
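A minimal sketch of this iteration in NumPy; grad_f, prox_g, the step α, and the iteration count are caller-supplied, and the worked example (f = ½‖x − c‖², g = ½‖x‖₁) is a hypothetical choice whose solution is soft-thresholding of c.

    import numpy as np

    def prox_gradient(grad_f, prox_g, x0, alpha, iters=100):
        """Iterate x+ = prox_{alpha*g}(x - alpha*grad_f(x))."""
        x = x0
        for _ in range(iters):
            x = prox_g(x - alpha * grad_f(x), alpha)  # forward step, then backward step
        return x

    # Example: f = (1/2)||x - c||^2, so grad_f(x) = x - c and L = 1;
    # g = 0.5*||x||_1, whose prox is soft-thresholding at level 0.5*alpha.
    c = np.array([2.0, -0.3, 0.8])
    x = prox_gradient(lambda x: x - c,
                      lambda v, a: np.sign(v) * np.maximum(np.abs(v) - 0.5 * a, 0.0),
                      np.zeros(3), alpha=1.0)
    print(x)   # -> [1.5, 0., 0.3]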
Interpretations
Majorization minimization. The Lipschitz assumption on ∇f implies

f(z) ≤ f(x) + ⟨∇f(x), z − x⟩ + (L/2)‖z − x‖²   ∀x, z

The proximal-gradient iteration minimizes the majorant: for α ∈ (0, 1/L),

x⁺ = prox_{αg}(x − α∇f(x))
   ≡ argmin_z { g(z) + f(x) + ⟨∇f(x), z − x⟩ + (1/2α)‖z − x‖² }

Forward-backward splitting.

x⁺ = prox_{αg}(x − α∇f(x)) = (I + α∂g)⁻¹ (I − α∇f)(x),

where I − α∇f is the forward (explicit) step and (I + α∂g)⁻¹ is the backward (implicit) step.
Projected gradient
minimize f(x) subj to x ∈ C:    x_{k+1} = proj_C(x_k − α_k ∇f(x_k))

LASSO

minimize_{‖x‖₁≤τ} ½‖Ax − b‖²   via   f(x) = ½‖Ax − b‖², g(x) = δ_{τB₁}(x)

• proj_C(·) can be computed in O(n) (randomized) [Duchi et al. (2008)]; a sort-based variant is sketched below
• used by SPGL1 for Lasso subproblems [vdBerg and Friedlander (2008)]

BPDN (Lagrangian form)

minimize_{x∈Rⁿ} ½‖Ax − b‖² + λ‖x‖₁   via   minimize_{x∈R²ⁿ₊} ½‖Ax − b‖² + λeᵀx
(splitting x into its positive and negative parts)

• proj_{≥0}(·) can be computed in O(n)
• GPSR uses Barzilai-Borwein for the step α_k [Figueiredo-Nowak-Wright ’07]
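A minimal sketch of projected gradient for the LASSO formulation above, using the standard sort-based O(n log n) projection onto the 1-norm ball (the O(n) randomized variant of Duchi et al. is more involved); the data is synthetic.

    import numpy as np

    def proj_l1(x, tau):
        """Euclidean projection of x onto the l1-ball of radius tau."""
        if np.abs(x).sum() <= tau:
            return x
        u = np.sort(np.abs(x))[::-1]                     # magnitudes, descending
        css = np.cumsum(u)
        k = np.nonzero(u > (css - tau) / np.arange(1, x.size + 1))[0][-1]
        theta = (css[k] - tau) / (k + 1.0)               # shrinkage threshold
        return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

    rng = np.random.default_rng(1)
    m, n, tau = 40, 100, 1.0
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    x = np.zeros(n)
    alpha = 1.0 / np.linalg.norm(A, 2) ** 2              # step < 2/L, L = ||A||^2
    for _ in range(200):
        x = proj_l1(x - alpha * A.T @ (A @ x - b), tau)
    print("||x||_1 =", np.abs(x).sum())                  # <= tau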
Iterative soft-thresholding (IST)
minimize_x ½‖Ax − b‖² + λ‖x‖₁

x⁺ = prox_{αλ‖·‖₁}(x − αAᵀr) = S_{αλ}(x − αAᵀr),   r ≡ Ax − b

where α ∈ (0, 2/‖A‖²) and S_τ(x) = sgn(x) · max{|x| − τ, 0}

[aka iterative shrinkage, iterative Landweber]

Derived from various viewpoints:
• expectation-maximization [Figueiredo & Nowak ’03]
• surrogate functionals [Daubechies et al ’04]
• forward-backward splitting [Combettes & Wajs ’05]
• FISTA (accelerated variant) [Beck-Teboulle ’09]
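A minimal IST sketch on synthetic data (the sparsity level, λ, and iteration count are arbitrary illustrations):

    import numpy as np

    def ist(A, b, lam, iters=500):
        alpha = 1.0 / np.linalg.norm(A, 2) ** 2          # step in (0, 2/||A||^2)
        x = np.zeros(A.shape[1])
        for _ in range(iters):
            r = A @ x - b
            v = x - alpha * A.T @ r                      # gradient (Landweber) step
            x = np.sign(v) * np.maximum(np.abs(v) - alpha * lam, 0.0)  # shrinkage
        return x

    rng = np.random.default_rng(2)
    A = rng.standard_normal((50, 200))
    x0 = np.zeros(200)
    x0[:5] = 1.0                                         # 5-sparse ground truth
    x = ist(A, A @ x0, lam=0.1)
    print("nonzeros:", np.count_nonzero(np.round(x, 3))) # close to the 5-sparse truth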
Low-rank approximation via the nuclear norm:

minimize ½‖AX − b‖² + λ‖X‖_n,   ‖X‖_n = Σ_j σ_j(X)

Singular-value soft-thresholding algorithm:

X⁺ = S_{αλ}(X − αA*(r)),   r := AX − b

where S_λ(Z) = U diag(S_λ(σ(Z))) Vᵀ for the SVD Z = U diag(σ(Z)) Vᵀ.

Computing S_λ: no need to compute the full SVD of Z, only the leading singular values/vectors
• Monte-Carlo approach [Ma-Goldfarb-Chen ’08]
• Lanczos bidiagonalization (via PROPACK) [Cai-Candes-Shen ’08; Jain et al ’10]
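A minimal sketch of the iteration with A taken, for concreteness, to be entrywise sampling on a mask Ω (a common instance; this toy version computes the full SVD, whereas the methods above use only the leading triples). Data and parameters are synthetic.

    import numpy as np

    def svt(Z, tau):
        """Soft-threshold the singular values of Z (full SVD for simplicity)."""
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    rng = np.random.default_rng(3)
    n, lam, alpha = 50, 1.0, 1.0            # alpha in (0, 2): ||A|| = 1 for a mask
    Y = rng.standard_normal((n, 4)) @ rng.standard_normal((4, n))   # rank-4 target
    mask = rng.random((n, n)) < 0.5                                 # observed entries

    X = np.zeros((n, n))
    for _ in range(200):
        r = mask * (X - Y)                   # residual A(X) - b, mapped back by A*
        X = svt(X - alpha * r, alpha * lam)  # gradient step + singular-value shrinkage
    print("rank:", np.linalg.matrix_rank(X, tol=1e-3))   # small, near the true rank 4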
Typical behavior
minimize_{X∈R^{n₁×n₂}} ‖X‖₁ subj to X_Ω ≈ b

recover a 1000 × 1000 matrix X with rank(X*) = 4

[Figure: number of singular triples computed at each iteration over 30 iterations.]
Backward-backward splitting
Approximate one of the functions by its (smooth) Moreau envelope:

minimize_x f_α(x) + g(x)   (both f, g nonsmooth)

f_α = f □ (1/2α)‖·‖²,   ∇f_α(x) = (1/α)[x − prox_{αf}(x)]

Proximal gradient.

x⁺ = prox_{αg}(x − α∇f_α(x)) = prox_{αg}(prox_{αf}(x))

Projection onto convex sets (POCS). With f = δ_C and g = δ_D,

x⁺ = proj_D(proj_C(x))

solves the problem

minimize_{x,z} ½‖z − x‖² subj to x ∈ D, z ∈ C
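A minimal POCS sketch with two illustrative sets whose projections are closed-form: a hyperplane C = {x : ⟨a, x⟩ = c} and the unit box D = [0, 1]²; the iterates converge to their intersection.

    import numpy as np

    a, c = np.array([1.0, 2.0]), 3.0
    proj_C = lambda x: x - (a @ x - c) / (a @ a) * a     # onto the hyperplane
    proj_D = lambda x: np.clip(x, 0.0, 1.0)              # onto the unit box

    x = np.array([5.0, -4.0])
    for _ in range(100):
        x = proj_D(proj_C(x))
    print(x, "residual:", a @ x - c)                     # -> [1. 1.], residual near 0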
Alternating direction method of multipliers (ADMM)

minimize_x f(x) + g(x)

ADMM algorithm.

x⁺ = prox_{αf}(z − u)
z⁺ = prox_{αg}(x⁺ + u)
u⁺ = u + x⁺ − z⁺

• used by YALL1 [Zhang et al. (2011)]
• ADMM ≡ Douglas-Rachford ≡ split Bregman (for linear constraints)
• Esser’s UCLA PhD thesis (’10); survey by Boyd et al. (’11)
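A minimal ADMM sketch for an illustrative instance with closed-form prox maps: f = ½‖x − c‖² and g = λ‖·‖₁, so the consensus solution is soft-thresholding of c.

    import numpy as np

    c, lam, alpha = np.array([1.0, -2.0, 0.2]), 0.5, 1.0
    prox_f = lambda v, a: (v + a * c) / (1.0 + a)        # prox of (1/2)||x - c||^2
    prox_g = lambda v, a: np.sign(v) * np.maximum(np.abs(v) - a * lam, 0.0)

    z, u = np.zeros(3), np.zeros(3)
    for _ in range(100):
        x = prox_f(z - u, alpha)
        z = prox_g(x + u, alpha)
        u = u + x - z
    print(z)   # -> soft-thresholding of c: [0.5, -1.5, 0.]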
PROXIMAL SMOOTHING
Primal smoothing
minimize_x f(x) + g(Ax)

Partial smoothing via the Moreau envelope:

minimize_x f_µ(x) + g(Ax),   f_µ = f □ (1/2µ)‖·‖²

Example (Basis Pursuit). Huber approximation to the 1-norm:

f = ‖·‖₁, g = δ_b   =⇒   minimize_x ‖x‖₁ subj to Ax = b

The smoothed function is the Huber function with parameter µ (applied componentwise):

f_µ(x) = { x²/2µ if |x| ≤ µ;  |x| − µ/2 otherwise }

µ ↓ 0   =⇒   x*_µ → x*
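A quick numerical check (illustrative, scalar case on a grid) that the Moreau envelope of |·| is this Huber function:

    import numpy as np

    mu = 0.5
    t = np.linspace(-2, 2, 401)

    # Envelope by direct minimization over a fine grid of z.
    z = np.linspace(-3, 3, 2001)
    env = np.min(np.abs(z)[None, :] + (t[:, None] - z[None, :])**2 / (2 * mu), axis=1)

    # Closed-form Huber function with parameter mu.
    huber = np.where(np.abs(t) <= mu, t**2 / (2 * mu), np.abs(t) - mu / 2)
    print("max error:", np.abs(env - huber).max())   # -> tiny (grid resolution)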
Dual smoothing
Regularize the primal problem.

minimize_x f(x) + (µ/2)‖x‖² + g(Ax)

Dualize. Regularizing f smoothes the conjugate:

(f + (µ/2)‖·‖²)* = f* □ (1/2µ)‖·‖² = f*_µ

minimize_y f*_µ(Aᵀy) + g*(−y)

Example (Basis Pursuit).

f = ‖·‖₁, g = δ_b   =⇒   min_x ‖x‖₁ + (µ/2)‖x‖² subj to Ax = b

f*_µ = (1/2µ) dist²_{B∞}   =⇒   max_y ⟨b, y⟩ − (1/2µ) dist²_{B∞}(Aᵀy)

Exact regularization: x*_µ = x* for all µ ∈ [0, µ̄), for some µ̄ > 0
[Mangasarian and Meyer (1979); Friedlander and Tseng (2007)]

[Nesterov (2005); Beck and Teboulle (2012)]; TFOCS [Becker et al. (2011)]
CONCLUSION
Much left unsaid, eg:
• optimal first-order methods [Nesterov (1988, 2005)]
• block coordinate descent [Sardy et al. (2004)]
• active-set methods (eg, LARS & Homotopy) [Osborne et al. (2000)]

My own work:
• avoiding proximal operators [Friedlander et al. (2014) (SIOPT)]
• greedy coordinate descent [Nutini et al. (2015) (ICML)]

email: [email protected]
web: www.math.ucdavis.edu/~mpf
REFERENCES
References I
A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2(1):183–202, 2009.

A. Beck and M. Teboulle. Smoothing and first order methods: A unified framework. SIAM J. Optim., 22(2):557–580, 2012.

S. Becker, E. Candes, and M. Grant. Templates for convex cone problems with applications to sparse signal recovery. Math. Program. Comp., 3:165–218, 2011. doi: 10.1007/s12532-011-0029-5.

S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1):1–122, 2011.

J.-F. Cai, E. J. Candes, and Z. Shen. A singular value thresholding algorithm for matrix completion. SIAM J. Optim., 20(4):1956–1982, 2010. doi: 10.1137/080738970.

E. J. Candes and B. Recht. Exact matrix completion via convex optimization. Found. Comput. Math., 9:717–772, December 2009.
References II

E. J. Candes, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.

E. J. Candes, T. Strohmer, and V. Voroninski. PhaseLift: Exact and stable signal recovery from magnitude measurements via convex programming. Commun. Pur. Appl. Ana., 2012.

A. Chai, M. Moscoso, and G. Papanicolaou. Array imaging using intensity-only measurements. Inverse Problems, 27(1):015005, 2011. URL http://stacks.iop.org/0266-5611/27/i=1/a=015005.

S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1998.

P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul., 4(4):1168–1200, 2005.

I. Daubechies, M. Defrise, and C. De Mol. An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math., 57:1413–1457, 2004.

J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the ℓ1-ball for learning in high dimensions. In Proceedings of the 25th International Conference on Machine Learning, pages 272–279, 2008.
References III

J. E. Esser. Primal Dual Algorithms for Convex Models and Applications to Image Restoration, Registration and Nonlocal Inpainting. PhD thesis, University of California, Los Angeles, 2010.

M. Fazel. Matrix rank minimization with applications. PhD thesis, Elec. Eng. Dept., Stanford University, 2002.

M. Figueiredo, R. Nowak, and S. J. Wright. Gradient projection for sparse reconstruction: Application to compressed sensing and other inverse problems. IEEE J. Sel. Top. Signal Process., 1(4):586–597, 2007.

M. A. T. Figueiredo and R. D. Nowak. An EM algorithm for wavelet-based image restoration. IEEE Trans. Image Process., 12(8):906–916, August 2003.

M. P. Friedlander and P. Tseng. Exact regularization of convex programs. SIAM J. Optim., 18(4):1326–1350, 2007. doi: 10.1137/060675320.

M. P. Friedlander, I. Macedo, and T. K. Pong. Gauge optimization and duality. SIAM J. Optim., 24(4):1999–2022, 2014. doi: 10.1137/130940785.

O. Guler. On the convergence of the proximal point algorithm for convex minimization. SIAM J. Control Optim., 29(2):403–419, 1991. doi: 10.1137/0329022.
References IV

O. Guler. New proximal point algorithms for convex minimization. SIAM J. Optim., 2(4):649–664, 1992. doi: 10.1137/0802032.

M. R. Hestenes. Multiplier and gradient methods. J. Optim. Theory Appl., 4(5):303–320, 1969.

S. Ma, D. Goldfarb, and L. Chen. Fixed point and Bregman iterative methods for matrix rank minimization. Math. Program., 128:321–353, 2011. doi: 10.1007/s10107-009-0306-5.

D. Malioutov, M. Cetin, and A. S. Willsky. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. Sig. Proc., 53(8):3010–3022, August 2005.

O. L. Mangasarian and R. R. Meyer. Nonlinear perturbation of linear programs. SIAM J. Control Optim., 17(6):745–752, November 1979.

B. Martinet. Régularisation d'inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle, 4(Sér. R-3):154–158, 1970.

Y. Nesterov. On an approach to the construction of optimal methods of minimization of smooth convex functions. Ekonom. i Mat. Metody, 24, 1988.
References V

Y. Nesterov. Smooth minimization of non-smooth functions. Math. Program., 103:127–152, 2005.

P. Netrapalli, P. Jain, and S. Sanghavi. Phase retrieval using alternating minimization. In C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Weinberger, editors, Advances in Neural Information Processing Systems 26, pages 2796–2804. Curran Associates, Inc., 2013. URL http://papers.nips.cc/paper/5041-pha.

J. Nutini, I. Laradji, M. Schmidt, and M. Friedlander. Coordinate descent converges faster with the Gauss-Southwell rule than random selection. In Inter. Conf. Mach. Learning, 2015.

M. R. Osborne, B. Presnell, and B. A. Turlach. A new approach to variable selection in least squares problems. IMA J. Numer. Anal., 20(3):389–403, 2000.

B. Polyak and N. Tretyakov. The method of penalty bounds for constrained extremum problems. Zh. Vych. Mat. i Mat. Fiz., 13:34–46, 1973.

M. J. D. Powell. A method for nonlinear constraints in minimization problems. In R. Fletcher, editor, Optimization, chapter 19. Academic Press, London and New York, 1969.
References VI

R. T. Rockafellar. Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Mathematics of Operations Research, 1(2):97–116, 1976.

S. Sardy, A. Antoniadis, and P. Tseng. Automatic smoothing with wavelets for a wide class of distributions. J. Comput. Graph. Stat., 13, 2004.

E. van den Berg and M. P. Friedlander. Probing the Pareto frontier for basis pursuit solutions. SIAM J. Sci. Comput., 31(2):890–912, 2008. doi: 10.1137/080714488.

I. Waldspurger, A. d'Aspremont, and S. Mallat. Phase recovery, MaxCut and complex semidefinite programming. Math. Program., 149(1-2):47–81, 2015.

W. Yin, S. Osher, D. Goldfarb, and J. Darbon. Bregman iterative algorithms for ℓ1 minimization with applications to compressed sensing. SIAM J. Imag. Sci., 1, 2008.

Y. Zhang, W. Deng, J. Yang, and W. Yin, 2011. URL http://yall1.blogs.rice.edu/.