Convex Optimization, CMU-10725
Quasi-Newton Methods
Barnabás Póczos & Ryan Tibshirani


Page 1:

Convex Optimization

CMU-10725: Quasi-Newton Methods

Barnabás Póczos & Ryan Tibshirani

Page 2:

Quasi-Newton Methods

Page 3:

Outline

• Modified Newton Method

• Rank one correction of the inverse

• Rank two correction of the inverse

• Davidon–Fletcher–Powell Method (DFP)

• Broyden–Fletcher–Goldfarb–Shanno Method (BFGS)

Page 4:

Books to Read

David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming

Nesterov: Introductory Lectures on Convex Optimization

Bazaraa, Sherali, Shetty: Nonlinear Programming

Dimitri P. Bertsekas: Nonlinear Programming

Page 5:

Motivation

Quasi-Newton: somewhere between steepest descent and Newton's method.

Motivation: evaluation and use of the Hessian matrix is impractical or costly.

Idea: use an approximation to the inverse Hessian.

Page 6:

Modified Newton Method

Goal: $\min_x f(x)$

Gradient descent: $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$

Newton method: $x_{k+1} = x_k - \alpha_k [\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$

Modified Newton method [Method of Deflected Gradients]: $x_{k+1} = x_k - \alpha_k S_k \nabla f(x_k)$, where $S_k$ is a symmetric positive definite matrix.

Special cases: $S_k = I$ gives gradient descent; $S_k = [\nabla^2 f(x_k)]^{-1}$ gives Newton's method.
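
As a quick illustration, here is a minimal NumPy sketch of the deflected-gradient update on a small quadratic; the specific Q, b, step size, and the choice S = Q^{-1} are illustrative assumptions, not part of the slides.

import numpy as np

# Quadratic test problem: f(x) = 0.5 x^T Q x - b^T x, gradient g(x) = Q x - b.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
grad = lambda x: Q @ x - b

x = np.zeros(2)
S = np.linalg.inv(Q)   # S_k = [Hessian]^{-1}: the Newton special case
alpha = 1.0            # illustrative fixed step size
for _ in range(5):
    x = x - alpha * S @ grad(x)   # modified Newton step x_{k+1} = x_k - a S g_k

print(x, np.linalg.solve(Q, b))   # both equal the minimizer Q^{-1} b

With this choice of S and step size, the loop reaches the exact minimizer after a single step, as Newton's method does on a quadratic.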

Page 7:

Modified Newton Method

Lemma [Descent direction]
If $S_k$ is positive definite and $\nabla f(x_k) \neq 0$, then $d_k = -S_k \nabla f(x_k)$ is a descent direction.

Proof:
We know that if a vector has negative inner product with the gradient vector, then that direction is a descent direction. Here $\nabla f(x_k)^T d_k = -\nabla f(x_k)^T S_k \nabla f(x_k) < 0$ by the positive definiteness of $S_k$. Q.E.D.

Page 8:

Quadratic problem

$\min_x f(x) = \tfrac{1}{2} x^T Q x - b^T x$, where $Q$ is symmetric positive definite; the gradient is $g_k = Q x_k - b$.

Modified Newton Method update rule: $x_{k+1} = x_k - \alpha_k S_k g_k$

Lemma [$\alpha_k$ in quadratic problems]
With exact line search, $\alpha_k = \dfrac{g_k^T S_k g_k}{g_k^T S_k Q S_k g_k}$.

Page 9:

Quadratic problem

Lemma [$\alpha_k$ in quadratic problems]
With exact line search, $\alpha_k = \dfrac{g_k^T S_k g_k}{g_k^T S_k Q S_k g_k}$.

Proof [$\alpha_k$]:
Let $d_k = -S_k g_k$ and $\varphi(\alpha) = f(x_k + \alpha d_k)$. Since $f$ is quadratic, $\varphi'(\alpha) = g_k^T d_k + \alpha\, d_k^T Q d_k$, and setting $\varphi'(\alpha_k) = 0$ gives $\alpha_k = -\dfrac{g_k^T d_k}{d_k^T Q d_k} = \dfrac{g_k^T S_k g_k}{g_k^T S_k Q S_k g_k}$. Q.E.D.
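
A small numerical sanity check of the lemma (a sketch; the particular Q, b, and S below are made up): the closed-form $\alpha_k$ should minimize $\varphi(\alpha) = f(x_k + \alpha d_k)$ along the deflected direction.

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
Q = A @ A.T + np.eye(3)                 # random SPD Hessian
b = rng.normal(size=3)
S = np.diag([1.0, 2.0, 0.5])            # some SPD deflection matrix S_k

f = lambda z: 0.5 * z @ Q @ z - b @ z
x = rng.normal(size=3)
g = Q @ x - b                           # gradient g_k
d = -S @ g                              # direction d_k = -S_k g_k
alpha = (g @ S @ g) / (g @ S @ Q @ S @ g)   # closed-form exact step size

# f along the ray is minimized at alpha: nearby step sizes are never better.
assert all(f(x + alpha * d) <= f(x + a * d) + 1e-12
           for a in np.linspace(alpha - 1, alpha + 1, 201))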

Page 10:

Convergence rate (Quadratic case)

Theorem [Convergence rate of the modified Newton method]
Let $b_k$ and $B_k$ be the smallest and largest eigenvalues of $S_k Q$, and let $E(x) = \tfrac{1}{2}(x - x^*)^T Q (x - x^*)$. Then for the modified Newton method it holds at every step $k$ that
$$E(x_{k+1}) \le \left(\frac{B_k - b_k}{B_k + b_k}\right)^2 E(x_k).$$

Proof: No time for it…

Corollary
If $S_k^{-1}$ is close to $Q$, then $S_k Q$ is close to the identity, so $b_k$ is close to $B_k$, and convergence is fast.

Page 11:

Classical modified Newton's method

The Hessian at the initial point $x_0$ is used throughout the process.

The effectiveness depends on how fast the Hessian is changing.

Classical modified Newton: $x_{k+1} = x_k - \alpha_k [\nabla^2 f(x_0)]^{-1} \nabla f(x_k)$
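
A minimal sketch of the classical variant, assuming a generic grad/hess interface (the names are hypothetical): the Hessian is inverted once at $x_0$ and reused at every step.

import numpy as np

def classical_modified_newton(grad, hess, x0, alpha=1.0, iters=50):
    # Invert the Hessian once at the initial point; in a real implementation
    # one would factor it (e.g. Cholesky) instead of forming the inverse.
    H0inv = np.linalg.inv(hess(x0))
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * H0inv @ grad(x)   # x_{k+1} = x_k - a [hess(x0)]^{-1} g_k
    return x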

Page 12:

Construction of the inverse of the Hessian

Page 13:

Construction of the inverse

Idea behind quasi-Newton methods: construct the approximation of the inverse Hessian using information gathered during the optimization process.

We show how the inverse Hessian can be built up from gradient information obtained at various points.

In the quadratic case: $g_{k+1} - g_k = Q(x_{k+1} - x_k)$.

Notation: $p_k = x_{k+1} - x_k$ and $q_k = g_{k+1} - g_k$, so that $q_k = Q p_k$.
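
A two-line check of the identity the construction rests on (the quadratic and the two iterates below are made up): for $f(x) = \tfrac{1}{2} x^T Q x - b^T x$, gradient differences satisfy $q_k = Q p_k$ exactly.

import numpy as np

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 0.0])
g = lambda x: Q @ x - b                    # gradient of the quadratic

x0, x1 = np.array([0.0, 0.0]), np.array([1.0, -1.0])   # two arbitrary iterates
p = x1 - x0                                # p_k = x_{k+1} - x_k
q = g(x1) - g(x0)                          # q_k = g_{k+1} - g_k
assert np.allclose(q, Q @ p)               # q_k = Q p_k holds exactly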

Page 14:

Construction of the inverse

Quadratic case: $q_i = Q p_i$, equivalently $p_i = Q^{-1} q_i$, for every step $i$.

Goal: build matrices $H_k$ approximating $Q^{-1}$ such that $H_{k+1} q_i = p_i$ for $0 \le i \le k$; after $n$ linearly independent steps this forces $H_n = Q^{-1}$.

Page 15:

Symmetric rank one correction (SR1)

We want an update of $H_k$ such that: $H_{k+1} q_k = p_k$.

Let us find the update in this form [Rank one correction]: $H_{k+1} = H_k + a_k z_k z_k^T$.

Theorem [Rank one update of $H_k$]
$$H_{k+1} = H_k + \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)}$$

Page 16:

Symmetric rank one correction (SR1)

Proof:
We already know that $p_k = H_{k+1} q_k = H_k q_k + a_k z_k (z_k^T q_k)$, so $z_k$ must be proportional to $p_k - H_k q_k$.

Therefore,
$$a_k z_k z_k^T = \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)}.$$
Q.E.D.

Page 17:

Symmetric rank one correction (SR1)

We still have to prove that this update will be good for us:

Theorem [$H_k$ update works]
In the quadratic case, the SR1 update satisfies $H_{k+1} q_i = p_i$ for all $i \le k$.

Corollary
If the steps $p_0, \dots, p_{n-1}$ are linearly independent, then $H_n = Q^{-1}$.

Page 18:

Symmetric rank one correction (SR1)

Algorithm: [Modified Newton method with rank 1 correction]
1. Set $H_0 = I$ and pick a starting point $x_0$.
2. Compute the direction $d_k = -H_k g_k$.
3. Line search: find $\alpha_k$ and set $x_{k+1} = x_k + \alpha_k d_k$.
4. Compute $p_k$ and $q_k$, apply the SR1 update to obtain $H_{k+1}$, and repeat.

Issues: the denominator $q_k^T (p_k - H_k q_k)$ can become zero or very small, making the update undefined or numerically unstable, and $H_{k+1}$ is not guaranteed to remain positive definite.
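
Below is a minimal sketch of this algorithm on a quadratic, assuming exact line search via the earlier lemma; the skip rule guarding the small denominator is one common safeguard for the issue just noted, not something from the slides.

import numpy as np

def sr1_quadratic(Q, b, x0):
    """Modified Newton with SR1 correction on f(x) = 0.5 x^T Q x - b^T x."""
    n = len(b)
    H = np.eye(n)                              # H_0 = I
    x = np.asarray(x0, dtype=float)
    g = Q @ x - b
    for _ in range(n):
        if np.linalg.norm(g) < 1e-12:          # already at the minimizer
            break
        d = -H @ g                             # search direction d_k = -H_k g_k
        alpha = (g @ H @ g) / (d @ Q @ d)      # exact line search (quadratic case)
        x_new = x + alpha * d
        g_new = Q @ x_new - b
        p, q = x_new - x, g_new - g
        r = p - H @ q
        denom = q @ r                          # the problematic SR1 denominator
        if abs(denom) > 1e-8 * np.linalg.norm(q) * np.linalg.norm(r):
            H += np.outer(r, r) / denom        # rank one correction
        x, g = x_new, g_new
    return x, H

On a well-conditioned quadratic, the returned H matches np.linalg.inv(Q) up to rounding, illustrating the corollary above.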

Page 19:

Davidon–Fletcher–Powell Method [Rank two correction]

Page 20:

Davidon–Fletcher–Powell Method

• For a quadratic objective, it simultaneously generates the directions of the conjugate gradient method while constructing the inverse Hessian.

• At each step the inverse Hessian is updated by the sum of two symmetric rank one matrices. [rank two correction procedure]

• The method is also often referred to as the variable metric method.

Page 21:

Davidon–Fletcher–Powell Method

DFP update of the inverse Hessian approximation:
$$H_{k+1} = H_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k q_k^T H_k}{q_k^T H_k q_k}$$
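
As a sketch, here is the rank two correction written as a function (the function name and calling convention are mine, not from the slides):

import numpy as np

def dfp_update(H, p, q):
    """DFP correction: two symmetric rank one terms added to H."""
    Hq = H @ q
    return (H
            + np.outer(p, p) / (p @ q)         # rank one term built from p_k
            - np.outer(Hq, Hq) / (q @ Hq))     # rank one term built from H_k q_k

Replacing the SR1 update in the earlier driver with dfp_update gives the DFP method.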

Page 22:

Davidon–Fletcher–Powell Method

Theorem [$H_k$ is positive definite]
If $H_k$ is positive definite and $\alpha_k$ is obtained by exact line search (so that $p_k^T q_k > 0$), then $H_{k+1}$ is positive definite.

Theorem [DFP is a conjugate direction method]
On a quadratic objective, the search directions generated by DFP are $Q$-conjugate, and $H_{k+1} q_i = p_i$ for all $i \le k$.

Corollary [finite step convergence for quadratic functions]
DFP minimizes an $n$-dimensional quadratic in at most $n$ steps, and then $H_n = Q^{-1}$.

Page 23:

Broyden–Fletcher–Goldfarb–Shanno

In DFP, at each step the inverse Hessian is updated by the sum of two symmetric rank one matrices.

In BFGS we estimate the Hessian $Q$ itself, instead of its inverse.

In the quadratic case we already proved: $q_k = Q p_k$, i.e. $p_k = Q^{-1} q_k$.

To estimate $H \approx Q^{-1}$, we used the update: $H_{k+1} = H_k + \dfrac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)}$.

Therefore, if we switch $q$ and $p$, then $Q$ can be estimated as well, by matrices $Q_k$ with $Q_{k+1} = Q_k + \dfrac{(q_k - Q_k p_k)(q_k - Q_k p_k)^T}{p_k^T (q_k - Q_k p_k)}$.

Page 24:

BFGS

Similarly, the DFP update rule for $H$ is
$$H_{k+1} = H_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k q_k^T H_k}{q_k^T H_k q_k}.$$

Switching $q$ and $p$, this can also be used to estimate $Q$:
$$Q_{k+1} = Q_k + \frac{q_k q_k^T}{q_k^T p_k} - \frac{Q_k p_k p_k^T Q_k}{p_k^T Q_k p_k}.$$

In the minimization algorithm, however, we need an estimator of $Q^{-1}$.

To get an update for $H_{k+1}$, let us use the Sherman–Morrison formula twice.

Page 25:

Sherman–Morrison matrix inversion formula

For an invertible matrix $A$ and vectors $u, v$ with $1 + v^T A^{-1} u \neq 0$:
$$(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}$$
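
A quick NumPy verification of the identity on random data (the matrix size and seed are arbitrary choices):

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 4.0 * np.eye(4)   # comfortably invertible
u, v = rng.normal(size=4), rng.normal(size=4)

Ainv = np.linalg.inv(A)
lhs = np.linalg.inv(A + np.outer(u, v))
rhs = Ainv - (Ainv @ np.outer(u, v) @ Ainv) / (1.0 + v @ Ainv @ u)
assert np.allclose(lhs, rhs)                    # the Sherman-Morrison identity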

Page 26:

BFGS Algorithm

BFGS is almost the same as DFP; only the $H$ update is different:
$$H_{k+1} = H_k + \left(1 + \frac{q_k^T H_k q_k}{q_k^T p_k}\right)\frac{p_k p_k^T}{p_k^T q_k} - \frac{p_k q_k^T H_k + H_k q_k p_k^T}{q_k^T p_k}$$

In practice BFGS seems to work better than DFP.
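
A sketch of this inverse update as a drop-in replacement for dfp_update above (same assumed calling convention):

import numpy as np

def bfgs_update(H, p, q):
    """BFGS update of the inverse Hessian approximation H."""
    pq = p @ q
    Hq = H @ q
    return (H
            + (1.0 + (q @ Hq) / pq) * np.outer(p, p) / pq   # scaled p p^T term
            - (np.outer(p, Hq) + np.outer(Hq, p)) / pq)     # mixed p, q terms

Since H stays symmetric, np.outer(p, Hq) equals $p q^T H$, matching the formula above.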

Page 27:

Summary

• Modified Newton Method

• Rank one correction of the inverse

• Rank two correction of the inverse

• Davidon–Fletcher–Powell Method (DFP)

• Broyden–Fletcher–Goldfarb–Shanno Method (BFGS)