
Accelerated primal-dual methods for linearly constrained convex problems

Yangyang Xu

SIAM Conference on Optimization

May 24, 2017

1 / 23


Accelerated proximal gradient

For the convex composite problem: minimize_x F(x) := f(x) + g(x)

• f : convex and Lipschitz differentiable

• g: closed convex (possibly nondifferentiable) and simple

Proximal gradient:

x^{k+1} = argmin_x ⟨∇f(x^k), x⟩ + (L_f/2)‖x − x^k‖² + g(x)

• convergence rate: F(x^k) − F(x*) = O(1/k)

Accelerated proximal gradient [Beck-Teboulle'09, Nesterov'14]:

x^{k+1} = argmin_x ⟨∇f(x̄^k), x⟩ + (L_f/2)‖x − x̄^k‖² + g(x)

• x̄^k: extrapolated point

• convergence rate (with smart extrapolation): F(x^k) − F(x*) = O(1/k²)
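The extrapolation step is what buys the O(1/k²) rate. As a concrete sketch (not from the slides): a FISTA-style loop in NumPy for the lasso instance f(x) = ½‖Ax − b‖², g(x) = λ‖x‖₁, with Beck-Teboulle's t_k extrapolation weights; the function name and the spectral-norm Lipschitz estimate are illustrative choices.

```python
import numpy as np

def prox_l1(v, t):
    # proximal operator of t*||.||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def accelerated_prox_grad(A, b, lam, iters=500):
    """FISTA-style accelerated proximal gradient for
       min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    Lf = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of grad f
    x = xold = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(iters):
        tnew = (1 + np.sqrt(1 + 4 * t * t)) / 2
        xbar = x + ((t - 1) / tnew) * (x - xold)     # extrapolated point
        grad = A.T @ (A @ xbar - b)
        xold, x = x, prox_l1(xbar - grad / Lf, lam / Lf)
        t = tnew
    return x
```

Setting x̄^k = x^k (no extrapolation) recovers plain proximal gradient with its O(1/k) rate.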

This talk: ways to accelerate primal-dual methods

2 / 23


Part I: accelerated linearized augmented Lagrangian

3 / 23


Affinely constrained composite convex problems

minimize_x F(x) = f(x) + g(x), subject to Ax = b (LCP)

• f : convex and Lipschitz differentiable

• g: closed convex and simple

Examples

• nonnegative quadratic programming: f = (1/2)x^⊤Qx + c^⊤x, g = ι_{R^n_+} (indicator of the nonnegative orthant)

• TV image denoising: min { (1/2)‖X − B‖²_F + λ‖Y‖₁, s.t. D(X) = Y }

4 / 23


Augmented Lagrangian method (ALM)

At iteration k,

x^{k+1} ← argmin_x f(x) + g(x) − ⟨λ^k, Ax⟩ + (β/2)‖Ax − b‖²,

λ^{k+1} ← λ^k − γ(Ax^{k+1} − b)

• augmented dual gradient ascent with stepsize γ

• β: penalty parameter; dual gradient Lipschitz constant 1/β

• 0 < γ < 2β: convergence guaranteed

• also popular for (nonlinear, nonconvex) constrained problems

x-subproblem as difficult as original problem
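For intuition, a minimal NumPy sketch on a toy equality-constrained QP (g ≡ 0), where the x-subproblem happens to be a linear system and can be solved exactly; the function name and the parameter values are illustrative, not from the slides.

```python
import numpy as np

def alm_qp(Q, c, A, b, beta=10.0, gamma=10.0, iters=200):
    """Classical ALM for  min 0.5*x'Qx + c'x  s.t.  Ax = b.
       Here the x-subproblem is a linear system, solved exactly."""
    x = np.zeros(Q.shape[0])
    lam = np.zeros(A.shape[0])
    H = Q + beta * A.T @ A                    # Hessian of the augmented Lagrangian
    for _ in range(iters):
        rhs = -c + A.T @ lam + beta * A.T @ b
        x = np.linalg.solve(H, rhs)           # exact x-update
        lam = lam - gamma * (A @ x - b)       # dual ascent, stepsize gamma < 2*beta
    return x, lam
```

The dual step obeys the slide's condition 0 < γ < 2β; for a general f + g the x-update has no such closed form, which is what motivates linearization next.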

5 / 23


Linearized augmented Lagrangian method

• Linearize the smooth term f :

x^{k+1} ← argmin_x ⟨∇f(x^k), x⟩ + (η/2)‖x − x^k‖² + g(x) − ⟨λ^k, Ax⟩ + (β/2)‖Ax − b‖².

• Linearize both f and ‖Ax− b‖2:

x^{k+1} ← argmin_x ⟨∇f(x^k), x⟩ + g(x) − ⟨λ^k, Ax⟩ + ⟨βA^⊤r^k, x⟩ + (η/2)‖x − x^k‖²,

where r^k = Ax^k − b is the residual.

Easier updates while keeping the O(1/k) convergence rate

6 / 23


Accelerated linearized augmented Lagrangian method

At iteration k,

x̄^k ← (1 − α_k)x^k + α_k x̂^k,

x̂^{k+1} ← argmin_x ⟨∇f(x̄^k) − A^⊤λ^k, x⟩ + g(x) + (β_k/2)‖Ax − b‖² + (η_k/2)‖x − x̂^k‖²,

x^{k+1} ← (1 − α_k)x^k + α_k x̂^{k+1},

λ^{k+1} ← λ^k − γ_k(Ax̂^{k+1} − b).

• Inspired by [Lan ’12] on accelerated stochastic approximation

• reduces to linearized ALM if α_k = 1, β_k = β, η_k = η, γ_k = γ, ∀k

• convergence rate: O(1/k) if η ≥ L_f and 0 < γ < 2β

• adaptive parameters give O(1/k²) (next slides)
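A minimal NumPy sketch of the loop above on a hypothetical toy instance f(x) = ½‖x‖², g ≡ 0 (so the x̂-update is a small linear solve); the adaptive parameters follow the theorem on the next slide, with η = 2L_f, γ = 1, and λ^1 = 0.

```python
import numpy as np

def acc_lin_alm(A, b, gamma=1.0, eta=2.0, iters=500):
    """Accelerated linearized ALM sketch for
       min 0.5*||x||^2  s.t.  Ax = b   (f = 0.5*||.||^2, g = 0, L_f = 1).
       Adaptive parameters: alpha_k = 2/(k+1), gamma_k = k*gamma,
       beta_k = gamma_k/2, eta_k = eta/k, with eta >= 2*L_f."""
    n = A.shape[1]
    x = xh = np.zeros(n)
    lam = np.zeros(A.shape[0])
    AtA, Atb = A.T @ A, A.T @ b
    for k in range(1, iters + 1):
        alpha, gk = 2.0 / (k + 1), k * gamma
        bk, ek = gk / 2.0, eta / k
        xbar = (1 - alpha) * x + alpha * xh          # extrapolation
        # linearize f at xbar, keep the augmented term, prox weight eta_k;
        # here grad f(xbar) = xbar
        rhs = ek * xh - xbar + A.T @ lam + bk * Atb
        xh = np.linalg.solve(bk * AtA + ek * np.eye(n), rhs)
        x = (1 - alpha) * x + alpha * xh             # averaged iterate
        lam = lam - gk * (A @ xh - b)                # growing dual stepsize
    return x
```

The returned x is the averaged iterate x^{k+1}, which is the one the O(1/k²) theorem bounds.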

7 / 23


Better numerical performance

[Figure: objective error (left) and feasibility violation (right) vs. iteration number (0–1000, log-scale vertical axis), comparing nonaccelerated ALM and accelerated ALM.]

• Tested on quadratic programming (subproblems solved exactly)

• Parameters set according to theorem (see next slide)

• Accelerated ALM significantly better

8 / 23


Guaranteed fast convergence

Assumptions:

• There exists a primal-dual solution pair (x*, λ*).

• ∇f is Lipschitz continuous: ‖∇f(x)−∇f(y)‖ ≤ Lf‖x− y‖

Convergence rate of order O(1/k²):

• Set the parameters to

∀k: α_k = 2/(k + 1), γ_k = kγ, β_k ≥ γ_k/2, η_k = η/k,

where γ > 0 and η ≥ 2L_f. Then

|F(x^{k+1}) − F(x*)| ≤ (1/(k(k + 1))) (η‖x^1 − x*‖² + 4‖λ*‖²/γ),

‖Ax^{k+1} − b‖ ≤ (1/(k(k + 1) max(1, ‖λ*‖))) (η‖x^1 − x*‖² + 4‖λ*‖²/γ).

9 / 23


Sketch of proof

Let Φ(x, x̃, λ) = F(x) − F(x̃) − ⟨λ, Ax − b⟩.

1. Fundamental inequality (for any λ):

Φ(x^{k+1}, x*, λ) − (1 − α_k)Φ(x^k, x*, λ)
≤ −(α_k η_k / 2)[‖x̂^{k+1} − x*‖² − ‖x̂^k − x*‖² + ‖x̂^{k+1} − x̂^k‖²] + (α_k² L_f / 2)‖x̂^{k+1} − x̂^k‖²
+ (α_k / (2γ_k))[‖λ^k − λ‖² − ‖λ^{k+1} − λ‖² + ‖λ^{k+1} − λ^k‖²] − (α_k β_k / γ_k²)‖λ^{k+1} − λ^k‖².

2. Plug in α_k = 2/(k + 1), γ_k = kγ, β_k ≥ γ_k/2, η_k = η/k, and multiply the inequality by k(k + 1):

k(k + 1)Φ(x^{k+1}, x*, λ) − k(k − 1)Φ(x^k, x*, λ)
≤ −η[‖x̂^{k+1} − x*‖² − ‖x̂^k − x*‖²] + (1/γ)[‖λ^k − λ‖² − ‖λ^{k+1} − λ‖²].

3. Set λ^1 = 0 and sum the inequality over k:

Φ(x^{k+1}, x*, λ) ≤ (1/(k(k + 1))) (η‖x̂^1 − x*‖² + ‖λ‖²/γ).

4. Take λ = max(1 + ‖λ*‖, 2‖λ*‖) · (Ax^{k+1} − b)/‖Ax^{k+1} − b‖ and use the optimality condition Φ(x, x*, λ*) ≥ 0, which gives F(x^{k+1}) − F(x*) ≥ −‖λ*‖ · ‖Ax^{k+1} − b‖.

10 / 23


Literature

• [He-Yuan '10]: accelerated ALM to O(1/k²) for smooth problems

• [Kang et al. '13]: accelerated ALM to O(1/k²) for nonsmooth problems

• [Huang-Ma-Goldfarb '13]: accelerated linearized ALM (with linearization of the augmented term) to O(1/k²) for strongly convex problems

11 / 23


Part II: accelerated linearized ADMM

12 / 23


Two-block structured problems

The variable is partitioned into two blocks; the smooth part involves only one block, and the nonsmooth part is separable

minimize_{y,z} h(y) + f(z) + g(z), subject to By + Cz = b (LCP-2)

• f convex and Lipschitz differentiable

• g and h closed convex and simple

Examples:

• Total-variation regularized regression: min_{y,z} { λ‖y‖₁ + f(z), s.t. Dz = y }

13 / 23


Alternating direction method of multipliers (ADMM)

At iteration k,

y^{k+1} ← argmin_y h(y) − ⟨λ^k, By⟩ + (β/2)‖By + Cz^k − b‖²,

z^{k+1} ← argmin_z f(z) + g(z) − ⟨λ^k, Cz⟩ + (β/2)‖By^{k+1} + Cz − b‖²,

λ^{k+1} ← λ^k − γ(By^{k+1} + Cz^{k+1} − b)

• 0 < γ < ((1 + √5)/2)β: convergence guaranteed [Glowinski-Marrocco'75]

• updating y and z alternately: easier than a joint update

• but the z-subproblem can still be difficult
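A toy sketch of the three-step loop (hypothetical instance: h(y) = λ‖y‖₁, f(z) = ½‖z − a‖², g ≡ 0, constraint z = y, i.e. B = −I, C = I, b = 0), chosen so that both subproblems have closed forms; names and parameters are illustrative.

```python
import numpy as np

def soft(v, t):
    # soft-thresholding: prox of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm_l1(a, reg, beta=1.0, gamma=1.0, iters=200):
    """ADMM for  min_{y,z} reg*||y||_1 + 0.5*||z - a||^2  s.t.  z = y,
       i.e. By + Cz = b with B = -I, C = I, b = 0."""
    y = z = np.zeros_like(a)
    dual = np.zeros_like(a)
    for _ in range(iters):
        # y-update: prox of (reg/beta)*||.||_1 at z - dual/beta
        y = soft(z - dual / beta, reg / beta)
        # z-update: quadratic, solved in closed form
        z = (a + dual + beta * y) / (1.0 + beta)
        # multiplier update, stepsize gamma < (1+sqrt(5))/2 * beta
        dual = dual - gamma * (z - y)
    return y, z
```

At convergence y = z = soft(a, reg), the solution of the equivalent problem min_y reg‖y‖₁ + ½‖y − a‖².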

14 / 23


Accelerated linearized ADMM

At iteration k,

y^{k+1} ← argmin_y h(y) − ⟨λ^k, By⟩ + (β_k/2)‖By + Cz^k − b‖²,

z^{k+1} ← argmin_z ⟨∇f(z^k) − C^⊤λ^k + β_k C^⊤ r^{k+1/2}, z⟩ + g(z) + (η_k/2)‖z − z^k‖²,

λ^{k+1} ← λ^k − γ_k(By^{k+1} + Cz^{k+1} − b),

where r^{k+1/2} = By^{k+1} + Cz^k − b.

• reduces to linearized ADMM if β_k = β, η_k = η, γ_k = γ, ∀k

• convergence rate: O(1/k) if 0 < γ ≤ β and η ≥ L_f + β‖C‖²

• O(1/k²) with adaptive parameters and strong convexity on z (next two slides)
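A sketch of the loop with the adaptive parameters of the next slide, again on the hypothetical toy instance h(y) = reg·‖y‖₁, f(z) = ½‖z − a‖², g ≡ 0, z = y (so μ_f = L_f = 1); the parameter values γ = 0.25, η = 0.5 satisfy γ < η ≤ μ_f/2 and are illustrative.

```python
import numpy as np

def acc_lin_admm(a, reg, gamma=0.25, eta=0.5, iters=3000):
    """Accelerated linearized ADMM sketch on
       min reg*||y||_1 + 0.5*||z - a||^2  s.t.  z = y
       (B = -I, C = I, b = 0; mu_f = L_f = 1).
       Adaptive parameters: beta_k = gamma_k = (k+1)*gamma,
       eta_k = (k+1)*eta + L_f, with gamma < eta <= mu_f/2."""
    Lf = 1.0
    y = z = dual = np.zeros_like(a)
    for k in range(1, iters + 1):
        bk = gk = (k + 1) * gamma
        ek = (k + 1) * eta + Lf
        # y-update: exact prox step (soft-thresholding)
        v = z - dual / bk
        y = np.sign(v) * np.maximum(np.abs(v) - reg / bk, 0.0)
        # z-update: linearize f and the augmented term (here r = z - y)
        grad = (z - a) - dual + bk * (z - y)
        z = z - grad / ek
        # multiplier update with growing stepsize gamma_k
        dual = dual - gk * (z - y)
    return y, z
```

The growing penalty/stepsize sequence is what converts the O(1/k) rate into O(1/k²) under strong convexity of f.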

15 / 23


Accelerated convergence speed

Assumptions:

• Existence of a primal-dual solution (y*, z*, λ*)

• ∇f Lipschitz continuous: ‖∇f(z) − ∇f(z̃)‖ ≤ L_f‖z − z̃‖

• f strongly convex with modulus μ_f (not required for y)

Convergence rate of order O(1/k²)

• Set the parameters as follows (with γ > 0 and γ < η ≤ μ_f/2):

∀k: β_k = γ_k = (k + 1)γ, η_k = (k + 1)η + L_f.

Then

max(‖z^k − z*‖², |F(y^k, z^k) − F*|, ‖By^k + Cz^k − b‖) ≤ O(1/k²),

where F(y, z) = h(y) + f(z) + g(z) and F* = F(y*, z*).

16 / 23


Sketch of proof

1. Fundamental inequality from the optimality conditions of each iterate:

F(y^{k+1}, z^{k+1}) − F(y, z) − ⟨λ, By^{k+1} + Cz^{k+1} − b⟩
≤ −⟨(1/γ_k)(λ^k − λ^{k+1}), λ − λ^k + (β_k/γ_k)(λ^k − λ^{k+1}) − β_k C(z^{k+1} − z^k)⟩
+ (L_f/2)‖z^{k+1} − z^k‖² − (μ_f/2)‖z^k − z‖² − η_k⟨z^{k+1} − z, z^{k+1} − z^k⟩.

2. Plug in the parameters and bound the cross terms:

F(y^{k+1}, z^{k+1}) − F(y*, z*) − ⟨λ, By^{k+1} + Cz^{k+1} − b⟩
+ (1/2)(η(k + 1)‖z^{k+1} − z*‖² + L_f‖z^{k+1} − z*‖²) + (1/(2γ(k + 1)))‖λ − λ^{k+1}‖²
≤ (1/2)(η(k + 1)‖z^k − z*‖² + (L_f − μ_f)‖z^k − z*‖²) + (1/(2γ(k + 1)))‖λ − λ^k‖².

3. Multiply by k + k₀ (here k₀ ~ 2L_f/μ_f) and sum the inequality over k:

F(y^{k+1}, z^{k+1}) − F(y*, z*) − ⟨λ, By^{k+1} + Cz^{k+1} − b⟩ ≤ φ(y*, z*, λ)/k².

4. Take a special λ and use the KKT conditions.

17 / 23


Literature

• [Ouyang et al. '15]: O(L_f/k² + C₀/k) with only weak convexity

• [Goldstein et al. '14]: O(1/k²) with strong convexity on both y and z

• [Chambolle-Pock '11, Chambolle-Pock '16, Dang-Lan '14, Bredies-Sun '16]: accelerated first-order methods on bilinear saddle-point problems

Open question: the weakest conditions ensuring O(1/k²)

18 / 23


Numerical experiments (more results in the paper)

19 / 23


Accelerated (linearized) ADMM

Tested problem: total-variation regularized image denoising

minimize_{X,Y} (1/2)‖X − B‖²_F + μ‖Y‖₁, subject to DX = Y. (TVDN)

• B: observed noisy Cameraman image; D: finite-difference operator

Compared methods:

• original ADMM

• accelerated ADMM

• linearized ADMM

• accelerated linearized ADMM

• accelerated Chambolle-Pock

20 / 23


Performance of compared methods

[Figure: objective error vs. iteration number (left) and vs. running time in seconds (right), log-scale vertical axis, comparing accelerated ADMM, accelerated linearized ADMM, nonaccelerated ADMM, nonaccelerated linearized ADMM, and Chambolle-Pock.]

• Accelerated (linearized) ADMM significantly better than the nonaccelerated versions

• (accelerated) ADMM faster than (accelerated) linearized ADMM in iteration count (but the latter takes less time)

21 / 23


Conclusions

• accelerated linearized ALM to O(1/k²) from O(1/k) with mere convexity

• accelerated (linearized) ADMM to O(1/k²) from O(1/k) with strong convexity on one block variable

• numerical experiments demonstrate the acceleration

22 / 23


References

1. Y. Xu. Accelerated first-order primal-dual proximal methods for linearly constrained composite convex programming, SIAM Journal on Optimization, 2017.

2. T. Goldstein, B. O'Donoghue, S. Setzer, and R. Baraniuk. Fast alternating direction optimization methods, SIAM Journal on Imaging Sciences, 2014.

3. B. He and X. Yuan. On the acceleration of augmented Lagrangian method for linearly constrained optimization, Optimization Online, 2010.

4. B. Huang, S. Ma, and D. Goldfarb. Accelerated linearized Bregman method, Journal of Scientific Computing, 2013.

5. M. Kang, S. Yun, H. Woo, and M. Kang. Accelerated Bregman method for linearly constrained ℓ₁-ℓ₂ minimization, Journal of Scientific Computing, 2013.

23 / 23