
Multivariate Optimization
J. McNames, Portland State University, ECE 4/557, Ver. 1.14


Example 1: Optimization Problem

[Figure: contour plot of the example objective over a_1, a_2 ∈ [−5, 5]; companion slides show quiver and 3-D surface views generated by the code below.]

Overview of Multivariate Optimization Topics

• Problem definition

• Algorithms

– Cyclic coordinate method

– Steepest descent

– Conjugate gradient algorithms

– PARTAN

– Newton’s method

– Levenberg-Marquardt

• Concise, subjective summary



Multivariate Optimization Overview

• The “unconstrained optimization” problem is a generalization of the line search problem

• Find a vector a∗ such that

a∗ = argmin_a f(a)

• Note that there are no constraints on a

• Example: find the vector of coefficients w ∈ R^{p×1} that minimizes the average absolute error of a linear model

• Akin to a blind person trying to find their way to the bottom of a valley in a multidimensional landscape

• We want to reach the bottom with the minimum number of “cane taps”

• Also vaguely similar to taking core samples for oil prospecting
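As a concrete illustration of the problem statement (an added sketch, not part of the original slides), the following MATLAB fragment minimizes a made-up two-parameter objective with the built-in fminsearch; the objective f here is a hypothetical stand-in for the course's OptFn.

% Minimal sketch: unconstrained minimization of a made-up f(a).
f  = @(a) (a(1)-1).^2 + 2*(a(2)+2).^2 + cos(a(1));
a0 = [0; 0];                          % starting guess
[aStar,fStar] = fminsearch(f,a0);     % derivative-free simplex search
fprintf('a* = (%.3f, %.3f), f(a*) = %.4f\n',aStar(1),aStar(2),fStar);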


Example 1: MATLAB Code

function [] = OptimizationProblem();
%==============================================================================
% User-Specified Parameters
%==============================================================================
x = -5:0.05:5;
y = -5:0.05:5;
%==============================================================================
% Evaluate the Function
%==============================================================================
[X,Y] = meshgrid(x,y);
[Z,G] = OptFn(X,Y);
functionName = 'OptimizationProblem';
fileIdentifier = fopen([functionName '.tex'],'w');
%==============================================================================
% Contour Map
%==============================================================================
figure;
FigureSet(2,'Slides');
contour(x,y,Z,50);
xlabel('a_1');
ylabel('a_2');
zoom on;
AxisSet(8);
fileName = sprintf('%s-%s',functionName,'Contour');


case 1, view(45,10);
case 2, view(-55,22);
case 3, view(-131,10);
otherwise, error('Not implemented.');
end
fileName = sprintf('%s-%s%d',functionName,'Surface',c1);
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');
end
%==============================================================================
% List the MATLAB Code
%==============================================================================
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: MATLAB Code}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\t\\matlabcode{Matlab/%s.m}\n',functionName);
fclose(fileIdentifier);

print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\stepcounter{exc}\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');
%==============================================================================
% Quiver Map
%==============================================================================
figure;
FigureSet(1,'Slides');
axis([-5 5 -5 5]);
contour(x,y,Z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
hold on;
xCoarse = -5:0.5:5;
yCoarse = -5:0.5:5;
[X,Y] = meshgrid(xCoarse,yCoarse);
[ZCoarse,GCoarse] = OptFn(X,Y);
nr = length(xCoarse); % number of grid points (size(xCoarse,1) is 1 for a row vector)
dzx = GCoarse(1:nr,1:nr);
dzy = GCoarse(nr + (1:nr),1:nr);
quiver(xCoarse,yCoarse,dzx,dzy);
hold off;
xlabel('a_1');
ylabel('a_2');
zoom on;

Global Optimization?

• In general, all optimization algorithms find a local minimum in as few steps as possible

• There are also “global” optimization algorithms based on ideas such as

– Evolutionary computing

– Genetic algorithms

– Simulated annealing

• None of these guarantee convergence in a finite number of iterations

• All require a lot of computation

AxisSet(8);
fileName = sprintf('%s-%s',functionName,'Quiver');
print(fileName,'-depsc');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\newslide\n');
fprintf(fileIdentifier,'\\slideheading{Example \\arabic{exc}: Optimization Problem}\n');
fprintf(fileIdentifier,'%%==============================================================================\n');
fprintf(fileIdentifier,'\\includegraphics[scale=1]{Matlab/%s}\n',fileName);
fprintf(fileIdentifier,'\n');
%==============================================================================
% 3D Maps
%==============================================================================
figure;
set(gcf,'Renderer','zbuffer');
FigureSet(1,'Slides');
h = surf(x,y,Z);
set(h,'LineStyle','None');
xlabel('a_1');
ylabel('a_2');
shading interp;
grid on;
AxisSet(8);
hl = light('Position',[0,0,30]);
set(hl,'Style','Local');
set(h,'BackFaceLighting','unlit');
material dull
for c1 = 1:3
switch c1

Cyclic Coordinate Method

1. For i = 1 to p,

a_i := argmin_α f([a_1, a_2, . . . , a_{i−1}, α, a_{i+1}, . . . , a_p])

2. Loop to 1 until convergence

+ Simple to implement

+ Each line search can be performed semi-globally to avoid shallow local minima

+ Can be used with nominal variables

+ f(a) can be discontinuous

+ No gradient required

− Very slow compared to gradient-based optimization algorithms

− Usually only practical when the number of parameters, p, is small

• There are modified versions with faster convergence
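To make step 1 concrete, here is a minimal added sketch (not from the slides) of cyclic coordinate descent on a hypothetical two-parameter objective; the bounded fminbnd call plays the role of a “semi-global” line search over each coordinate.

% Cyclic coordinate descent sketch; f is a made-up objective.
f = @(a) (a(1)-2).^2 + (a(2)+1).^2 + 0.3*sin(3*a(1));
a = [-3; 1];                              % starting point
for sweep = 1:20                          % step 2: loop until convergence
    for i = 1:numel(a)                    % step 1: for i = 1 to p
        g = @(alpha) f([a(1:i-1); alpha; a(i+1:end)]);
        a(i) = fminbnd(g,-5,5);           % minimize over coordinate i only
    end
end
fprintf('a = (%.3f, %.3f), f(a) = %.4f\n',a(1),a(2),f(a));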

Optimization Comments

• Ideally, when we construct models we should favor those which can be optimized with few shallow local minima and reasonable computation

• Graphically, you can think of the function to be minimized as the elevation in a complicated high-dimensional landscape

• The problem is to find the lowest point

• The most common approach is to go downhill

• The gradient points in the most “uphill” direction

• The steepest downhill direction is the opposite of the gradient

• Most optimization algorithms use a line search algorithm

• The methods mostly differ only in the way that the “direction of descent” is generated

Example 2: Cyclic Coordinate Method

[Figure: contour plot of the objective with the cyclic coordinate search path; X and Y axes span −5 to 5.]

Optimization Algorithm Outline

• The basic steps of these algorithms are as follows (a minimal code sketch appears below)

1. Pick a starting vector a

2. Find the direction of descent, d

3. Move in that direction until a minimum is found:

α∗ := argmin_α f(a + αd)

a := a + α∗d

4. Loop to 2 until convergence

• Most of the theory of these algorithms is based on quadratic surfaces

• Near local minima, this is a good approximation

• Note that the functions should (must) have continuous gradients (almost) everywhere
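The outline maps almost line-for-line onto code. Below is a minimal added sketch of the generic loop, assuming a hypothetical objective f and gradient gradFn, and using a crude backtracking rule in place of the course's LineSearch routine.

% Generic descent loop sketch (f and gradFn are made-up stand-ins).
f      = @(a) (a(1)-2).^2 + 4*(a(2)+3).^2;
gradFn = @(a) [2*(a(1)-2); 8*(a(2)+3)];
a = [-3; 1];                                % 1. pick a starting vector
for k = 1:100
    d = -gradFn(a);                         % 2. direction of descent
    if norm(d) < 1e-8, break; end           % 4. stop when converged
    alpha = 1;                              % 3. crude backtracking line search
    while f(a + alpha*d) >= f(a) && alpha > 1e-12
        alpha = alpha/2;
    end
    a = a + alpha*d;
end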

Example 2: Cyclic Coordinate Method

[Figure: Euclidean position error versus iteration (0 to 25) for the cyclic coordinate method.]

Example 2: Cyclic Coordinate Method

[Figure: zoomed contour plot of the search path near the minimum; X from −3 to 0, Y from −3.5 to 0.5.]

Example 2: Relevant MATLAB Code

function [] = CyclicCoordinate();
%clear all;
close all;
ns = 26;
x  = -3;
y  = 1;
b0 = -1;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,dzx,dzy] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
for cnt = 2:ns,
    if rem(cnt,2)==1,
        d = [1 0]'; % Along x direction
    else
        d = [0 1]'; % Along y direction
    end;
    [b,fmin] = LineSearch([x y]',d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);

Example 2: Cyclic Coordinate Method

[Figure: function value versus iteration (0 to 25) for the cyclic coordinate method.]

print -depsc CyclicCoordinateContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc CyclicCoordinatePositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);

    a(cnt,:) = [x y];
    f(cnt) = fmin;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');

print -depsc CyclicCoordinateErrorLinear;

set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc CyclicCoordinateContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-1.5 + (-2:0.05:2),-1.5 + (-2:0.05:2));
[z,dzx,dzy] = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);

Example 3: Steepest Descent

[Figure: contour plot of the objective with the steepest descent search path; X and Y axes span −5 to 5.]

Steepest Descent

The gradient of the function f(a) is defined as the vector of partial derivatives:

∇_a f(a) ≡ [ ∂f(a)/∂a_1   ∂f(a)/∂a_2   . . .   ∂f(a)/∂a_p ]^T

• It can be shown that the gradient, ∇_a f(a), “points” in the direction of maximum ascent

• The negative of the gradient, −∇_a f(a), “points” in the direction of maximum descent

• A vector d is a direction of descent if there exists an ε such that f(a + λd) < f(a) for all 0 < λ < ε

• It can also be shown that d is a direction of descent if (∇_a f(a))^T d < 0

• The algorithm of steepest descent uses d = −∇_a f(a)

• It is the most fundamental of all algorithms for minimizing a continuously differentiable function
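As a quick numeric check (added, not from the slides), the descent condition (∇_a f(a))^T d < 0 is easy to test directly; the gradient value here is hypothetical.

g = [2; -1];       % hypothetical gradient at some point a
d = -g;            % steepest descent direction
g'*d               % -5: negative, so d is a direction of descent
g'*[1; 0]          %  2: positive, so [1 0]' moves uphill here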

Example 3: Steepest Descent

[Figure: zoomed contour plot of the steepest descent path near the minimum; X from −2 to −1.2, Y from −2.2 to −1.2.]

Steepest Descent

+ Very stable algorithm

− Can converge very slowly once near a local minimum where the surface is approximately quadratic

Example 3: Relevant MATLAB Code

function [] = SteepestDescent();
%clear all;
close all;
ns = 26;
x  = -3;
y  = 1;
b0 = 0.01;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
d = -g/norm(g);
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y]',d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    [z,g] = OptFn(x,y);
    d = -g;
    d = d/norm(d);

Example 3: Steepest Descent

[Figure: function value versus iteration (0 to 25) for steepest descent.]

    a(cnt,:) = [x y];
    f(cnt) = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
[zopt zopt2]
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;

Example 3: Steepest Descent

[Figure: Euclidean position error versus iteration (0 to 25) for steepest descent.]

AxisSet(8);
print -depsc SteepestDescentErrorLinear;

h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc SteepestDescentContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-1.6 + (-0.5:0.01:0.5),-1.7 + (-0.5:0.01:0.5));
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;

Conjugate Gradient Algorithms

1. Take a steepest descent step

2. For i = 2 to p

• α := argmin_α f(a + αd)

• a := a + αd

• g_i := ∇f(a)

• β := (g_i^T g_i) / (g_{i−1}^T g_{i−1})

• d := −g_i + βd

3. Loop to 1 until convergence

• Based on quadratic approximations of f

• Called the Fletcher-Reeves method
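Since conjugate gradient converges in at most p steps on a quadratic with exact line searches (as the Polak-Ribiere slide notes), here is a minimal added sketch of Fletcher-Reeves applied to a made-up quadratic f(a) = (1/2)a^T Q a − b^T a, for which the exact line search has a closed form.

% Fletcher-Reeves CG on a hypothetical 2-D quadratic.
Q = [4 1; 1 3];  b = [1; 2];          % made-up SPD matrix and vector
a = [0; 0];
g = Q*a - b;                          % gradient of the quadratic
d = -g;                               % 1. steepest descent step first
for i = 1:numel(a)                    % 2. at most p steps for a quadratic
    alpha = -(g'*d)/(d'*Q*d);         % exact line search along d
    a = a + alpha*d;
    gOld = g;
    g = Q*a - b;
    beta = (g'*g)/(gOld'*gOld);       % Fletcher-Reeves beta
    d = -g + beta*d;
end
% a now matches Q\b up to round-off.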

AxisSet(8);
print -depsc SteepestDescentContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc SteepestDescentPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');

Example 4: Fletcher-Reeves Conjugate Gradient

[Figure: function value versus iteration (0 to 25) for Fletcher-Reeves.]

Example 4: Fletcher-Reeves Conjugate Gradient

[Figure: contour plot of the objective with the Fletcher-Reeves search path; X and Y axes span −5 to 5.]

Example 4: Fletcher-Reeves Conjugate Gradient

[Figure: Euclidean position error versus iteration (0 to 25) for Fletcher-Reeves.]

Example 4: Fletcher-Reeves Conjugate Gradient

[Figure: zoomed contour plot of the Fletcher-Reeves path near the minimum; X from 1.5 to 2.5, Y from −3.5 to −2.5.]

h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc FletcherReevesContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;

Example 4: Relevant MATLAB Code

function [] = FletcherReeves();
%clear all;
close all;
ns = 26;
x  = -3;
y  = 1;
b0 = 0.01;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
d = -g/norm(g); % First direction
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y]',d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    go = g; % Old gradient
    [z,g] = OptFn(x,y);
    beta = (g'*g)/(go'*go);

AxisSet(8);
print -depsc FletcherReevesContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc FletcherReevesPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');

    d = -g + beta*d;
    a(cnt,:) = [x y];
    f(cnt) = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;

Example 5: Polak-Ribiere Conjugate Gradient

[Figure: contour plot of the objective with the Polak-Ribiere search path; X and Y axes span −5 to 5.]

AxisSet(8);
print -depsc FletcherReevesErrorLinear;

Example 5: Polak-Ribiere Conjugate Gradient

[Figure: zoomed contour plot of the Polak-Ribiere path near the minimum; X from 1.5 to 2.5, Y from −3.5 to −2.5.]

Conjugate Gradient Algorithms Continued

• There is also a variant called Polak-Ribiere where

β := ((g_i − g_{i−1})^T g_i) / (g_{i−1}^T g_{i−1})

+ Only requires the gradient

+ Converges in a finite number of steps when f(a) is quadratic and perfect line searches are used

− Less stable numerically than steepest descent

− Sensitive to inexact line searches

Example 5: MATLAB Code

function [] = PolakRibiere();
%clear all;
close all;
ns = 26;
x  = -3;
y  = 1;
b0 = 0.01;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
d = -g/norm(g); % First direction
for cnt = 2:ns,
    [b,fmin] = LineSearch([x y]',d,b0,ls);
    x = x + b*d(1);
    y = y + b*d(2);
    go = g; % Old gradient
    [z,g] = OptFn(x,y);
    beta = ((g-go)'*g)/(go'*go);

Example 5: Polak-Ribiere Conjugate Gradient

[Figure: function value versus iteration (0 to 25) for Polak-Ribiere.]

    d = -g + beta*d;
    a(cnt,:) = [x y];
    f(cnt) = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;

Example 5: Polak-Ribiere Conjugate Gradient

[Figure: Euclidean position error versus iteration (0 to 25) for Polak-Ribiere.]

AxisSet(8);
print -depsc PolakRibiereErrorLinear;

h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc PolakRibiereContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;

Parallel Tangents (PARTAN)

1. First gradient step

• d := −∇f(a)

• α := argmin_α f(a + αd)

• s_p := αd

• a := a + s_p

2. Gradient step

• d_g := −∇f(a)

• α := argmin_α f(a + αd_g)

• s_g := αd_g

• a := a + s_g

3. Conjugate step

• d_p := s_p + s_g

• α := argmin_α f(a + αd_p)

• s_p := αd_p

• a := a + s_p

4. Loop to 2 until convergence

AxisSet(8);
print -depsc PolakRibiereContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PolakRibierePositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');

Example 6: PARTAN

[Figure: zoomed contour plot of the PARTAN path near the minimum; X from 1.5 to 2.5, Y from −3.5 to −2.5.]

PARTAN Concept

[Diagram: the PARTAN zig-zag path through the iterates a_0, a_1, . . . , a_7.]

• First two steps are steepest descent

• Thereafter, each iteration consists of two steps (see the sketch below)

1. Search along the direction

d_i = a_i − a_{i−2}

where a_i is the current point and a_{i−2} is the point from two steps ago

2. Search in the direction of the negative gradient

d_i = −∇f(a_i)
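For a quadratic, the acceleration step can be verified directly. The following added sketch (hypothetical quadratic, exact line searches) runs two steepest descent steps and then one PARTAN acceleration step, which lands on the minimizer in this 2-D case.

% PARTAN sketch on a made-up quadratic f(a) = 0.5*a'*Q*a - b'*a.
Q = [4 1; 1 3];  b = [1; 2];
gradFn = @(a) Q*a - b;
step   = @(a,d) -(gradFn(a)'*d)/(d'*Q*d);  % exact line search alpha
a0 = [0; 0];
d  = -gradFn(a0);  a1 = a0 + step(a0,d)*d; % steepest descent step
d  = -gradFn(a1);  a2 = a1 + step(a1,d)*d; % gradient step
d  = a2 - a0;      a3 = a2 + step(a2,d)*d; % accelerate along a2 - a0
% a3 matches Q\b up to round-off for this 2-D quadratic.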

Example 6: PARTAN

[Figure: function value versus iteration (0 to 25) for PARTAN.]

Example 6: PARTAN

[Figure: contour plot of the objective with the PARTAN search path; X and Y axes span −5 to 5.]

cnt = 2;
while cnt<ns,
    % Gradient step
    [z,g] = OptFn(x,y);
    d = -g/norm(g); % Direction
    [bg,fmin] = LineSearch([x y]',d,b0,ls);
    xg = x + bg*d(1);
    yg = y + bg*d(2);
    cnt = cnt + 1;
    a(cnt,:) = [xg yg];
    f(cnt) = OptFn(xg,yg);
    fprintf('G : %d %5.3f\n',cnt,f(cnt));
    if cnt==ns,
        break;
    end;
    % Conjugate step
    d = [xg-xa yg-ya]';
    if norm(d)~=0,
        d = d/norm(d);
        [bp,fmin] = LineSearch([xg yg]',d,b0,ls);
    else
        bp = 0;
    end;
    if bp>0, % Line search in conjugate direction was successful
        fprintf('P :');
        x = xg + bp*d(1);
        y = yg + bp*d(2);

Example 6: PARTAN

[Figure: Euclidean position error versus iteration (0 to 25) for PARTAN.]

    else % Could not move - do another gradient update
        cnt = cnt + 1;
        a(cnt,:) = a(cnt-1,:);
        f(cnt) = f(cnt-1);
        if cnt==ns,
            break;
        end;
        fprintf('G2:');
        [z,g] = OptFn(xg,yg);
        d = -g/norm(g); % Direction
        [bp,fmin] = LineSearch([xg yg]',d,b0,ls);
        x = xg + bp*d(1);
        y = yg + bp*d(2);
    end;
    % Update anchor point
    xa = xg;
    ya = yg;
    cnt = cnt + 1;
    a(cnt,:) = [x y];
    f(cnt) = OptFn(x,y);
    fprintf(' %d %5.3f\n',cnt,f(cnt));
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);

Example 6: MATLAB Code

function [] = Partan();
%clear all;
close all;
ns = 26;
x  = -3;
y  = 1;
b0 = 0.01;
ls = 30;
a = zeros(ns,2);
f = zeros(ns,1);
[z,g] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
xa = x;
ya = y;
% First step - substitute for a conjugate step
d = -g/norm(g); % First direction
[bp,fmin] = LineSearch([x y]',d,b0,100);
x = x + bp*d(1); % Stand-in for a conjugate step
y = y + bp*d(2);
a(2,:) = [x y];
f(2) = fmin;

xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PartanPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc PartanErrorLinear;

yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;

PARTAN Pros and Cons

[Diagram: the same PARTAN path through a_0, . . . , a_7 as on the concept slide.]

+ For quadratic functions, converges in a finite number of steps

+ Easier to implement than 2nd order methods

+ Can be used with a large number of parameters

+ Each (composite) step is at least as good as steepest descent

+ Tolerant of inexact line searches

− Each (composite) step requires two line searches

AxisSet(8);
print -depsc PartanContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc PartanContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');

Example 7: Newton’s with Steepest Descent Safeguard

[Figure: zoomed contour plot of the safeguarded Newton path; X from 0 to 2, Y from −3 to −1.5.]

Newton’s Method

a_{k+1} = a_k − H(a_k)^{−1} ∇f(a_k)

where ∇f(a_k) is the gradient and H(a_k) is the Hessian of f(a),

H(a_k) ≡ [ ∂²f(a)/∂a_1²       ∂²f(a)/∂a_1∂a_2   . . .   ∂²f(a)/∂a_1∂a_p ]
         [ ∂²f(a)/∂a_2∂a_1    ∂²f(a)/∂a_2²      . . .   ∂²f(a)/∂a_2∂a_p ]
         [       . . .              . . .        . . .        . . .      ]
         [ ∂²f(a)/∂a_p∂a_1    ∂²f(a)/∂a_p∂a_2   . . .   ∂²f(a)/∂a_p²    ]

• Based on a quadratic approximation of the function f(a)

• If f(a) is quadratic, converges in one step

• If H(a) is positive-definite, the problem is well defined near local minima where f(a) is nearly quadratic
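A minimal added sketch of the pure Newton iteration, using a hypothetical objective with a hand-coded gradient and Hessian (not the course's OptFn):

% Newton's method on f(a) = a1^4 + a1^2 + (a2+2)^2, minimum at (0,-2).
gradFn = @(a) [4*a(1)^3 + 2*a(1); 2*(a(2)+2)];
hessFn = @(a) [12*a(1)^2 + 2, 0; 0, 2];   % positive definite everywhere
a = [3; 0];
for k = 1:20
    g = gradFn(a);
    if norm(g) < 1e-12, break; end
    a = a - hessFn(a)\g;                  % solve H*step = g; avoids inv(H)
end
% a approaches (0,-2); convergence near the minimum is very fast because
% the Hessian is positive definite there.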

Example 7: Newton’s with Steepest Descent Safeguard

[Figure: function value versus iteration (0 to 90) for safeguarded Newton.]

Example 7: Newton’s with Steepest Descent Safeguard

[Figure: contour plot of the objective with the safeguarded Newton search path; X and Y axes span −5 to 5.]

    y = y + b*d(2);
    [z,g,H] = OptFn(x,y);
    a(cnt,:) = [x y];
    f(cnt) = z;
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);

Example 7: Newton’s with Steepest Descent Safeguard

[Figure: Euclidean position error versus iteration (0 to 90) for safeguarded Newton.]

axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc NewtonsContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.0 + (-1:0.02:1),-2.4 + (-1:0.02:1));
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');

Example 7: Relevant MATLAB Code

function [] = Newtons();
%clear all;
close all;
ns = 100;
x = -3; % Starting x
y = 1;  % Starting y
b0 = 1;
a = zeros(ns,2);
f = zeros(ns,1);
[z,g,H] = OptFn(x,y);
a(1,:) = [x y];
f(1) = z;
for cnt = 2:ns,
    d = -inv(H)*g;
    if d'*g>0, % Revert to steepest descent if d is not a direction of descent
        %fprintf('(%2d of %2d) Min. Eig: %5.3f Reverting...\n',cnt,ns,min(eig(H)));
        d = -g;
    end;
    d = d/norm(d);
    [b,fmin] = LineSearch([x y]',d,b0,100);
    %a(cnt,:) = (a(cnt-1,:)' - inv(H)*g)'; % Pure Newton's Method
    x = x + b*d(1);

Newton’s Method Pros and Cons

a_{k+1} = a_k − H(a_k)^{−1} ∇f(a_k)

+ Very fast convergence near local minima

− Not guaranteed to converge (may actually diverge)

− Requires the p × p Hessian

− Requires a p × p matrix inverse that uses O(p³) operations

ylabel('Y');
zoom on;
AxisSet(8);
print -depsc NewtonsContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc NewtonsPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);

Levenberg-Marquardt

1. Determine if ε_k I + H(a_k) is positive definite. If not, ε_k := 4ε_k and repeat.

2. Solve the following equation for a_{k+1}:

[ε_k I + H(a_k)] (a_{k+1} − a_k) = −∇f(a_k)

3. Compute the ratio of the actual to the predicted decrease,

r_k ≡ (f(a_k) − f(a_{k+1})) / (q(a_k) − q(a_{k+1}))

where q(a) is the quadratic approximation of f(a) based on f(a_k), ∇f(a_k), and H(a_k)

4. If r_k < 0.25, then ε_{k+1} := 4ε_k
If r_k > 0.75, then ε_{k+1} := ε_k/2
If r_k ≤ 0, then a_{k+1} := a_k

5. If not converged, k := k + 1 and loop to 1.
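One damped step of the algorithm above, as a minimal added sketch (gradFn and hessFn are hypothetical stand-ins; steps 3-4 are indicated in comments):

% One Levenberg-Marquardt-style step on a made-up objective.
gradFn = @(a) [4*a(1)^3 + 2*a(1); 2*(a(2)+2)];
hessFn = @(a) [12*a(1)^2 + 2, 0; 0, 2];
a = [3; 0];  epsk = 1e-4;                 % damping parameter eps_k
g = gradFn(a);  H = hessFn(a);
while min(eig(epsk*eye(2) + H)) <= 0      % 1. force positive definiteness
    epsk = 4*epsk;
end
aNew = a - (epsk*eye(2) + H)\g;           % 2. solve for a_{k+1}
% 3-4. compare the actual decrease f(a)-f(aNew) with the decrease
% predicted by the quadratic model q, then grow or shrink eps_k.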

grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc NewtonsErrorLinear;

Example 8: Levenberg-Marquardt

[Figure: zoomed contour plot of the Levenberg-Marquardt path near the minimum; X from 1.5 to 2.5, Y from −3.5 to −2.5.]

Levenberg-Marquardt Comments

• Similar to Newton’s method

• Has safety provisions for regions where the quadratic approximation is inappropriate

• Compare

Newton’s: a_{k+1} = a_k − H(a_k)^{−1} ∇f(a_k)
LM:       [ε_k I + H(a_k)] (a_{k+1} − a_k) = −∇f(a_k)

• If ε = 0, these are equivalent

• If ε → ∞, a_{k+1} → a_k

• ε is chosen to ensure that the smallest eigenvalue of ε_k I + H(a_k) is positive and sufficiently large (≥ δ)

Example 8: Levenberg-Marquardt

[Figure: function value versus iteration (0 to 25) for Levenberg-Marquardt.]

Example 8: Levenberg-Marquardt

[Figure: contour plot of the objective with the Levenberg-Marquardt search path; X and Y axes span −5 to 5.]

    y = a(cnt,2);
    zo = zn; % Old function value
    zn = OptFn(x,y);
    xd = (a(cnt,:)'-ap);
    qo = zo;
    qn = zn + g'*xd + 0.5*xd'*H*xd;
    if qo==qn, % Test for convergence
        x = a(cnt,1);
        y = a(cnt,2);
        a(cnt:ns,:) = ones(ns-cnt+1,1)*[x y];
        f(cnt:ns,:) = OptFn(x,y);
        break;
    end;
    r = (zo-zn)/(qo-qn);
    if r<0.25,
        eta = eta * 4;
    elseif r>0.50, % 0.75 is recommended, but much slower
        eta = eta / 2;
    end;
    if zn>zo, % Back up
        a(cnt,:) = a(cnt-1,:);
    else
        ap = a(cnt,:)';
    end;
    x = a(cnt,1);

Example 8: Levenberg-Marquardt

[Figure: Euclidean position error versus iteration (0 to 25) for Levenberg-Marquardt.]

    y = a(cnt,2);
    a(cnt,:) = [x y];
    f(cnt) = OptFn(x,y);
    %disp([cnt a(cnt,:) f(cnt) r eta])
end;
[x,y] = meshgrid(0+(-0.01:0.001:0.01),3+(-0.01:0.001:0.01));
[z,dzx,dzy] = OptFn(x,y);
[zopt,id1] = min(z);
[zopt,id2] = min(zopt);
id1 = id1(id2);
xopt = x(id1,id2);
yopt = y(id1,id2);
[x,y] = meshgrid(1.883+(-0.02:0.001:0.02),-2.963+(-0.02:0.001:0.02));
[z,dzx,dzy] = OptFn(x,y);
[zopt2,id1] = min(z);
[zopt2,id2] = min(zopt2);
id1 = id1(id2);
xopt2 = x(id1,id2);
yopt2 = y(id1,id2);
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(-5:0.1:5,-5:0.1:5);
z = OptFn(x,y);
contour(x,y,z,50);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');

Example 8: Relevant MATLAB Code

function [] = LevenbergMarquardt();
%clear all;
close all;
ns = 26;
x = -3; % Starting x
y = 1;  % Starting y
eta = 0.0001;
a = zeros(ns,2);
f = zeros(ns,1);
[zn,g,H] = OptFn(x,y);
a(1,:) = [x y];
f(1) = zn;
ap = [x y]'; % Previous point
for cnt = 2:ns,
    [zn,g,H] = OptFn(x,y);
    while min(eig(eta*eye(2)+H))<0,
        eta = eta * 4;
    end;
    a(cnt,:) = (ap - inv(eta*eye(2)+H)*g)';
    x = a(cnt,1);

set(gca,'Box','Off');
AxisSet(8);
print -depsc LevenbergMarquardtErrorLinear;

hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
h = plot(xopt,yopt,'kx',xopt,yopt,'rx');
set(h(1),'LineWidth',1.5);
set(h(2),'LineWidth',0.5);
set(h(1),'MarkerSize',5);
set(h(2),'MarkerSize',4);
hold off;
xlabel('X');
ylabel('Y');
zoom on;
AxisSet(8);
print -depsc LevenbergMarquardtContourA;
figure;
FigureSet(1,4.5,2.75);
[x,y] = meshgrid(1.5:0.01:2.5,-3.5:0.01:-2.5);
z = OptFn(x,y);
contour(x,y,z,75);
h = get(gca,'Children');
set(h,'LineWidth',0.2);
axis('square');
hold on;
h = plot(a(:,1),a(:,2),'k',a(:,1),a(:,2),'r');
set(h(1),'LineWidth',1.2);
set(h(2),'LineWidth',0.6);
hold off;
xlabel('X');
ylabel('Y');

Levenberg-Marquardt Pros and Cons

[ε_k I + H(a_k)] (a_{k+1} − a_k) = −∇f(a_k)

• Many equivalent formulations

+ No line search required

+ Can be used with approximations to the Hessian

+ Extremely fast convergence (2nd order)

− Requires gradient and Hessian (or approximate Hessian)

− Requires O(p³) operations for each solution to the key equation

zoom on;
AxisSet(8);
print -depsc LevenbergMarquardtContourB;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
xerr = (sum(((a-ones(ns,1)*[xopt2 yopt2])').^2)').^(1/2);
h = plot(k-1,xerr,'b');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Euclidean Position Error');
xlim([0 ns-1]);
ylim([0 xerr(1)]);
grid on;
set(gca,'Box','Off');
AxisSet(8);
print -depsc LevenbergMarquardtPositionError;
figure;
FigureSet(2,4.5,2.75);
k = 1:ns;
h = plot(k-1,f,'b',[0 ns],zopt*[1 1],'r',[0 ns],zopt2*[1 1],'g');
set(h(1),'Marker','.');
set(h,'MarkerSize',6);
xlabel('Iteration');
ylabel('Function Value');
ylim([0 f(1)]);
xlim([0 ns-1]);
grid on;

Optimization Algorithm Summary

Algorithm             Convergence   Stable   ∇f(a)   H(a)   LS
Cyclic Coordinate     Slow          Y        N       N      Y
Steepest Descent      Slow          Y        Y       N      Y
Conjugate Gradient    Fast          N        Y       N      Y
PARTAN                Fast          Y        Y       N      Y
Newton’s Method       Very Fast     N        Y       Y      N
Levenberg-Marquardt   Very Fast     Y        Y       Y      N

(LS = whether a line search is required)