Quasi-Newton Methods of Optimization Lecture 2

Page 1

Quasi-Newton Methods of Optimization

Lecture 2

Page 2

General Algorithm

A Baseline Scenario

• Algorithm U (Model algorithm for n-dimensional unconstrained minimization). Let xk be the current estimate of x*.

– U1. [Test for convergence] If the conditions for convergence are satisfied, the algorithm terminates with xk as the solution.

– U2. [Compute a search direction] Compute a non-zero n-vector pk, the direction of the search.

Page 3

General Algorithm

– U3. [Compute a step length] Compute a scalar ak, the step length, for which f(xk + ak pk) < f(xk).

– U4. [Update the estimate of the minimum] Set xk+1 = xk + ak pk, k=k+1, and go back to step U1.
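The four steps above can be sketched directly in code. This is a minimal illustration, not from the lecture: the test function, starting point, and the simple backtracking search used for step U3 are all my own assumptions.

```python
import numpy as np

def algorithm_u(f, grad, direction, x0, tol=1e-8, max_iter=200):
    """Model Algorithm U for n-dimensional unconstrained minimization."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # U1: test for convergence
            break
        p = direction(x)                   # U2: compute a search direction
        a = 1.0
        while f(x + a * p) >= f(x):        # U3: shrink a until f decreases
            a *= 0.5
            if a < 1e-16:
                break
        x = x + a * p                      # U4: update the estimate
    return x

# Usage: minimize f(x) = x'x with the steepest descent direction p = -grad f.
f = lambda x: float(x @ x)
grad = lambda x: 2 * x
xstar = algorithm_u(f, grad, lambda x: -grad(x), [3.0, -4.0])
```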

• Given the steps of the prototype algorithm, I want to develop a sample problem against which we can compare the various algorithms.

max U(x) = x1^.4 x2^.3 x3^.2 x4^.1

s.t. x1 + x2 + x3 + x4 ≤ 100

Page 4

General Algorithm

– Using Newton-Raphson, the optimal point for this problem is found in 10 iterations using 1.23 seconds on the DEC Alpha.

Page 5

Derivation of the Quasi-Newton Algorithm

An Overview of Newton and Quasi-Newton Algorithms

• The Newton-Raphson methodology can be used in step U2 of the prototype algorithm. Specifically, the search direction can be determined by:

p_k = -[∇²f(x_k)]^-1 ∇f(x_k)

Page 6

Derivation of the Quasi-Newton Algorithm

• Quasi-Newton algorithms involve an approximation to the Hessian matrix. For example, we could replace the Hessian matrix with the negative of the identity matrix for the maximization problem. In this case the search direction would be:

p_k = -(-I_n)^-1 ∇f(x_k) = ∇f(x_k)

Page 7

Derivation of the Quasi-Newton Algorithm

• This replacement is referred to as the steepest descent method. In our sample problem, this methodology requires 990 iterations and 29.28 seconds on the DEC Alpha.

– The steepest descent method requires more overall iterations. In this example, the steepest descent method requires 99 times as many iterations as the Newton-Raphson method.

Page 8

Derivation of the Quasi-Newton Algorithm

– Typically, the time spent on each iteration is reduced. In the current comparison, the steepest descent method requires .030 seconds per iteration (29.28/990) while Newton-Raphson requires .123 seconds per iteration (1.23/10).

Page 9

Derivation of the Quasi-Newton Algorithm

• Obviously, substituting the identity matrix uses no real information from the Hessian matrix. An alternative to this drastic reduction would be to systematically derive a matrix Hk which uses curvature information akin to the Hessian matrix. The search direction could then be derived as:

p_k = -H_k^-1 ∇f(x_k)
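As a concrete sketch (my own illustration, using an assumed quadratic for a minimization problem), the direction is computed by solving a linear system with either the true Hessian or a stand-in for it:

```python
import numpy as np

def newton_direction(hess, grad_x):
    # p_k = -[Hessian]^-1 grad: solve the linear system, don't invert
    return np.linalg.solve(hess, -grad_x)

# For f(x) = x1^2 + 4*x2^2 at x = (2, 1): grad = (4, 8), Hessian = diag(2, 8).
g = np.array([4.0, 8.0])
H = np.diag([2.0, 8.0])
p_newton = newton_direction(H, g)            # exact Newton step: (-2, -1)
p_steepest = newton_direction(np.eye(2), g)  # identity stand-in gives -g
```

Replacing `H` with any approximation `H_k` changes only the matrix passed to the solver, which is the whole idea of the quasi-Newton family.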

Page 10

Derivation of the Quasi-Newton Algorithm

Conjugate Gradient Methods

• One class of quasi-Newton methods is the conjugate gradient methods, which “build up” information on the Hessian matrix.

– From our standard starting point, we take a Taylor series expansion around the point xk + sk

∇f(x_k + s_k) ≈ ∇f(x_k) + ∇²f(x_k) s_k

Page 11

Derivation of the Quasi-Newton Algorithm

for some s_k ∈ N(x_k), a neighborhood of x_k. Solving this expression for the term involving the Hessian yields

∇f(x_k + s_k) - ∇f(x_k) ≈ ∇²f(x_k) s_k

s_k' (∇f(x_k + s_k) - ∇f(x_k)) ≈ s_k' ∇²f(x_k) s_k

Page 12

Derivation of the Quasi-Newton Algorithm

y_k = ∇f(x_k + s_k) - ∇f(x_k)

y_k ≈ ∇²f(x_k) s_k

y_k = B_{k+1} s_k
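The condition y_k = B_{k+1} s_k can be checked numerically. For a quadratic, where the Hessian is constant, the gradient difference matches the Hessian-times-step product exactly; the quadratic below is my own example, not the lecture's problem.

```python
import numpy as np

# Quadratic f(x) = 0.5 x'Ax, so grad f(x) = A x and the Hessian is A.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
grad = lambda x: A @ x

x_k = np.array([1.0, 2.0])
s_k = np.array([0.5, -1.0])
y_k = grad(x_k + s_k) - grad(x_k)   # gradient difference across the step

# For a quadratic, the condition holds exactly with B_{k+1} = A.
secant_gap = y_k - A @ s_k
```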

Page 13

Derivation of the Quasi-Newton Algorithm

• One way to generate B_{k+1} is to start with the current B_k and add new information on the current solution:

B_{k+1} = B_k + u v'

y_k = (B_k + u v') s_k

Page 14

Derivation of the Quasi-Newton Algorithm

u (v' s_k) = y_k - B_k s_k

u = (y_k - B_k s_k) / (v' s_k)

B_{k+1} = B_k + (1 / (v' s_k)) (y_k - B_k s_k) v'

Page 15

Derivation of the Quasi-Newton Algorithm

• The Rank-One update then involves choosing v to be y_k - B_k s_k. Among other things, this update will yield a symmetric approximation to the Hessian matrix:

B_{k+1} = B_k + (y_k - B_k s_k)(y_k - B_k s_k)' / ((y_k - B_k s_k)' s_k)
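The Rank-One formula translates directly into code. This is my own sketch; the guard that skips the update when the denominator is near zero is standard numerical practice, not something stated on the slide.

```python
import numpy as np

def sr1_update(B, s, y, tol=1e-8):
    """Symmetric Rank-One update: B+ = B + rr'/(r's), with r = y - Bs."""
    r = y - B @ s
    denom = r @ s
    # Skip the update when r's is tiny relative to |r||s| (assumed safeguard).
    if abs(denom) < tol * np.linalg.norm(r) * np.linalg.norm(s):
        return B
    return B + np.outer(r, r) / denom

# On an assumed quadratic with Hessian A, one update makes B+ s = y hold.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
s = np.array([1.0, -0.5])
y = A @ s
B_next = sr1_update(np.eye(2), s, y)
```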

Page 16

Derivation of the Quasi-Newton Algorithm

• Other than the Rank-One update, no simple choice of v will result in a symmetric Hessian approximation. An alternative is to restore symmetry by replacing the update with one-half the sum of the updated matrix and its transpose. This procedure yields the general update:

B_{k+1} = B_k + [(y_k - B_k s_k) v' + v (y_k - B_k s_k)'] / (v' s_k)
              - [(y_k - B_k s_k)' s_k] v v' / (v' s_k)^2
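The general symmetrized update can be sketched as follows (my own illustration, on an assumed quadratic). Any choice of v satisfies B_{k+1} s_k = y_k; v = s_k gives the PSB update used in the numerical comparison below, while the next section sets v = y_k to obtain DFP.

```python
import numpy as np

def symmetrized_update(B, s, y, v):
    """General rank-two update obtained by symmetrizing the rank-one formula."""
    r = y - B @ s                       # residual of the current approximation
    vs = v @ s
    correction = (np.outer(r, v) + np.outer(v, r)) / vs \
                 - (r @ s) * np.outer(v, v) / vs**2
    return B + correction

A = np.array([[4.0, 1.0], [1.0, 3.0]])
s = np.array([1.0, -0.5])
y = A @ s
B_psb = symmetrized_update(np.eye(2), s, y, v=s)   # PSB choice: v = s
```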

Page 17

DFP and BFGS

• Two prominent conjugate gradient methods are the Davidon-Fletcher-Powell (DFP) update and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) update.

– In the DFP update, v is set equal to y_k, yielding

B_{k+1} = B_k - (B_k s_k s_k' B_k) / (s_k' B_k s_k) + (y_k y_k') / (y_k' s_k)
              + (s_k' B_k s_k) w_k w_k'

w_k = y_k / (y_k' s_k) - (B_k s_k) / (s_k' B_k s_k)
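The DFP formula above transcribes directly into code (my own sketch, checked on an assumed quadratic); the (s'Bs) w w' term is what distinguishes it from BFGS.

```python
import numpy as np

def dfp_update(B, s, y):
    """DFP update of the Hessian approximation B, in the form given above."""
    Bs = B @ s
    sBs = s @ Bs            # s' B s
    ys = y @ s              # y' s
    w = y / ys - Bs / sBs   # w_k
    return (B - np.outer(Bs, Bs) / sBs
              + np.outer(y, y) / ys
              + sBs * np.outer(w, w))

# On an assumed quadratic with Hessian A, B+ s = y holds after one update.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
s = np.array([1.0, -0.5])
y = A @ s
B_dfp = dfp_update(np.eye(2), s, y)
```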

Page 18

DFP and BFGS

– The BFGS update is then

B_{k+1} = B_k - (B_k s_k s_k' B_k) / (s_k' B_k s_k) + (y_k y_k') / (y_k' s_k)
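For comparison with the DFP sketch, the BFGS update in code (again my own illustration on an assumed quadratic):

```python
import numpy as np

def bfgs_update(B, s, y):
    """BFGS update: the DFP formula without the (s'Bs) ww' correction term."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# On an assumed quadratic with Hessian A, B+ s = y holds after one update.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
s = np.array([1.0, -0.5])
y = A @ s
B_bfgs = bfgs_update(np.eye(2), s, y)
```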

Page 19

DFP and BFGS

A Numerical Example

• Using the previously specified problem and starting with an identity matrix as the original Hessian matrix, each algorithm was used to maximize the utility function.

B_t =
    1       0       0
    0       1       0
    0       0       1

B_t* =
    .5275   .2885   .0718
    .2885   .6085   .0954
    .0718   .0954   .2244

Page 20

DFP and BFGS

• In discussing the difference in step, I will focus on two attributes.

– The first attribute is the relative length of the step (the 2-norm).

– The second attribute is the direction of the step. Dividing each vector by its 2-norm yields a normalized direction of the search

s_t  = ( .7337, .9766, .2428 )

s_t* = ( 4.3559, 4.3476, 4.3226 )
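Both attributes can be computed directly; a minimal sketch using the conjugate gradient step quoted above:

```python
import numpy as np

s_t = np.array([0.7337, 0.9766, 0.2428])   # step vector from the slide
length = np.linalg.norm(s_t)               # attribute 1: the 2-norm
direction = s_t / length                   # attribute 2: normalized direction
```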

Page 21

DFP and BFGS

                      Step (x1, x2, x3)     2-norm    Direction (x1, x2, x3)
Newton-Raphson        4.36   4.35   4.32     7.52     0.58   0.58   0.57
Conjugate Gradient    0.73   0.98   0.24     1.25     0.59   0.78   0.19

Page 22

Relative Performance

– The Rank-One Approximation

• Iteration 1

B_t =
    .6690   .3842   .1536
    .3842   .5540   .1783
    .1536   .1783   .9287

B_t* =
    .2780   .1276   .0511
    .1276   .2501   .0592
    .0511   .0592   .2280

                      Step (x1, x2, x3)     2-norm    Direction (x1, x2, x3)
Newton-Raphson        6.75   7.63   4.83    11.27     0.60   0.68   0.43
Conjugate Gradient    4.32   5.01   2.00     6.91     0.62   0.73   0.29

Page 23

Relative Performance

• Iteration 2

B_t =
    .6315   .4256   .1776
    .4256   .5083   .2048
    .1776   .2048   .9134

B_t* =
    .0628   .0171   .0106
    .0171   .0612   .0113
    .0106   .0113   .0846

                      Step (x1, x2, x3)     2-norm    Direction (x1, x2, x3)
Newton-Raphson       10.85  11.69   6.08    17.07     0.64   0.68   0.36
Conjugate Gradient    7.73   8.91   3.79    12.39     0.62   0.72   0.31

Page 24

Relative Performance

– PSB

• Iteration 1

B_t =
    .6703   .3860   .1504
    .3860   .5565   .1827
    .1504   .1827   .9365

B_t* =
    .2780   .1276   .0511
    .1276   .2501   .0592
    .0511   .0592   .2280

                      Step (x1, x2, x3)     2-norm    Direction (x1, x2, x3)
Newton-Raphson       10.85  11.69   6.08    17.07     0.64   0.68   0.36
Conjugate Gradient    7.73   8.91   3.79    12.39     0.62   0.72   0.31

Page 25

Relative Performance

• Iteration 2

B_t =
    .6328   .4274   .1745
    .4274   .5109   .2096
    .1745   .2096   .9223

B_t* =
    .0629   .0171   .0106
    .0171   .0612   .0114
    .0106   .0114   .0850

                      Step (x1, x2, x3)     2-norm    Direction (x1, x2, x3)
Newton-Raphson       10.85  11.70   6.07    17.07     0.64   0.69   0.36
Conjugate Gradient    7.72   8.91   3.78    12.38     0.62   0.72   0.30

Page 26

Relative Performance

– DFP

• Iteration 1

B_t =
    .7187   .4517   .0326
    .4517   .6455   .3424
    .0326   .3424  1.2232

B_t* =
    .2780   .1276   .0511
    .1276   .2501   .0592
    .0511   .0592   .2280

                      Step (x1, x2, x3)     2-norm    Direction (x1, x2, x3)
Newton-Raphson        6.75   7.63   4.83    11.27     0.60   0.68   0.43
Conjugate Gradient    4.14   5.01   1.76     6.73     0.61   0.74   0.26

Page 27

Relative Performance

• Iteration 2

B_t =
    .6788   .4945   .0589
    .4945   .6021   .3766
    .0589   .3766  1.2194

B_t* =
    .0653   .0177   .0119
    .0177   .0602   .0124
    .0119   .0124   .0971

                      Step (x1, x2, x3)     2-norm    Direction (x1, x2, x3)
Newton-Raphson        6.75   7.63   4.83    11.27     0.60   0.68   0.43
Conjugate Gradient    4.14   5.01   1.76     6.73     0.61   0.74   0.26

Page 28

Relative Performance

– BFGS

• Iteration 1

B_t =
    .6771   .3952   .1338
    .3952   .5690   .2051
    .1338   .2051   .9768

B_t* =
    .2780   .1276   .0511
    .1276   .2501   .0593
    .0511   .0593   .2280

                      Step (x1, x2, x3)     2-norm    Direction (x1, x2, x3)
Newton-Raphson       10.81  11.93   5.80    17.11     0.63   0.70   0.34
Conjugate Gradient    7.52   9.06   3.40    12.26     0.61   0.74   0.28

Page 29

Relative Performance

• Iteration 2

B_t =
    .6391   .4369   .1585
    .4369   .5238   .2333
    .1585   .2333   .9644

B_t* =
    .0634   .0172   .0109
    .0172   .0610   .0115
    .0109   .0115   .0871

                      Step (x1, x2, x3)     2-norm    Direction (x1, x2, x3)
Newton-Raphson       10.84  11.74   6.02    17.08     0.63   0.69   0.35
Conjugate Gradient    7.69   8.94   3.71    12.36     0.62   0.72   0.30