Page 1: Optimization

Optimization 吳育德

Page 2: Optimization

Unconstrained Minimization

Def: f(x) is said to be differentiable at a point x* if it is defined in a neighborhood N of x* and if, for x* + h in N, there exists a vector a independent of h such that

f(x^* + h) = f(x^*) + \langle a, h \rangle + \langle h, \Phi(x^*, h) \rangle,

where the vector a is called the gradient of f(x) evaluated at x*, denoted

a = \nabla f(x^*) = \left[ \frac{\partial f}{\partial x_1} \;\; \frac{\partial f}{\partial x_2} \;\; \cdots \;\; \frac{\partial f}{\partial x_n} \right]^T \Big|_{x = x^*}.

The term \langle a, h \rangle is called the 1st variation, and

\Phi(x^*, h) = [\phi_1(x^*, h) \;\; \cdots \;\; \phi_n(x^*, h)]^T, \qquad \lim_{h \to 0} \phi_i(x^*, h) = 0, \quad i = 1, 2, \ldots, n.

Page 3: Optimization

Unconstrained Minimization

Note: if f(x) is twice differentiable, then

\langle h, \Phi(x^*, h) \rangle = \frac{1}{2} \langle h, F(x^*) h \rangle + \text{H.O.T.},

where F(x) is an n × n symmetric matrix, called the Hessian of f(x):

F(x) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & & & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}.

Then

f(x^* + h) = f(x^*) + \underbrace{\langle \nabla f(x^*), h \rangle}_{\text{1st variation}} + \underbrace{\tfrac{1}{2} \langle h, F(x^*) h \rangle}_{\text{2nd variation}} + \text{H.O.T.}

Page 4: Optimization

Directional derivatives

Let w be a direction vector of unit norm, ||w|| = 1. Now consider

g_w(r) = f(x^* + r w),

which is a function of the scalar r.

Def: The directional derivative of f(x) in the direction w (unit norm) at x* is defined as

D_w f(x^*) = \frac{d g_w(r)}{dr}\Big|_{r=0}
= \lim_{r \to 0} \frac{g_w(r) - g_w(0)}{r}
= \lim_{r \to 0} \frac{f(x^* + r w) - f(x^*)}{r}
= \lim_{r \to 0} \frac{\langle \nabla f(x^*), r w \rangle + \langle r w, \Phi(x^*, r w) \rangle}{r}
= \lim_{r \to 0} \big( \langle \nabla f(x^*), w \rangle + \langle w, \Phi(x^*, r w) \rangle \big)
= \langle \nabla f(x^*), w \rangle.

Page 5: Optimization

Directional derivatives

Example: Let w = e_i = [0 \;\cdots\; 0 \;\; 1 \;\; 0 \;\cdots\; 0]^T (1 in the i-th position). Then

D_{e_i} f(x^*) = \langle \nabla f(x^*), e_i \rangle = \frac{\partial f}{\partial x_i}\Big|_{x^*},

i.e. the partial derivative of f(x*) w.r.t. x_i is the directional derivative of f(x) in the direction e_i.

Interpretation of D_w f(x^*):

Consider the projection of \nabla f(x^*) on w,

\mathrm{proj}\, \nabla f(x^*) = \frac{\langle \nabla f(x^*), w \rangle}{\|w\|^2}\, w = \langle \nabla f(x^*), w \rangle\, w.

Then

\| \mathrm{proj}\, \nabla f(x^*) \| = |\langle \nabla f(x^*), w \rangle| = |D_w f(x^*)|.

The directional derivative along a direction w (||w|| = 1) is the length of the projection of \nabla f(x^*) on w.

(Figure: \nabla f(x^*), the unit vector w at x^*, and \mathrm{proj}\, \nabla f(x^*).)
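A small numerical check of these two facts (a sketch; the test function f(x) = x_1^2 + 3 x_2^2 and the point/direction are assumptions, not from the slides):

```python
import numpy as np

# Assumed test function and its analytic gradient (illustration only).
f = lambda x: x[0]**2 + 3.0 * x[1]**2
grad_f = lambda x: np.array([2.0 * x[0], 6.0 * x[1]])

x_star = np.array([1.0, 2.0])
w = np.array([3.0, 4.0]) / 5.0           # unit-norm direction

# Directional derivative via the limit definition (small r).
r = 1e-6
D_w_numeric = (f(x_star + r * w) - f(x_star)) / r

# Directional derivative via <grad f(x*), w>.
D_w_analytic = grad_f(x_star) @ w

# Length of the projection of grad f(x*) onto w.
proj_len = abs(grad_f(x_star) @ w) / np.linalg.norm(w)

print(D_w_numeric, D_w_analytic, proj_len)   # all three agree (~10.8)
```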

Page 6: Optimization

Unconstrained Minimization

[Q]: What direction w yields the largest directional derivative?
Ans:

w = \frac{\nabla f(x^*)}{\| \nabla f(x^*) \|}.

Recall that the 1st variation of f(x^* + r w) is

r \langle \nabla f(x^*), w \rangle = r\, D_w f(x^*).

Conclusion 1: The direction of the gradient is the direction that yields the largest change (1st variation) in the function.

This suggests the steepest descent method, which will be described later:

x^{(k+1)} = x^{(k)} - \alpha \nabla f(x^{(k)}).

Page 7: Optimization

Example:

f(x) = x_1^2 + x_2^2, \qquad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in R^2.

Sol:

\nabla f(x) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2 x_1 \\ 2 x_2 \end{bmatrix}.

Let x = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, so that \nabla f(x) = \begin{bmatrix} 2 \\ 2 \end{bmatrix}, and the direction of the gradient with unit norm is

w = \frac{\nabla f(x)}{\| \nabla f(x) \|} = \frac{1}{\sqrt{8}} \begin{bmatrix} 2 \\ 2 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.

(Figure: level circles f(x_1, x_2) = c in the (x_1, x_2) plane, with \nabla f(x) normal to the level circle through x.)

Directional derivatives

Page 8: Optimization

The directional derivative in the direction of the gradient is

D_w f(x) = \langle \nabla f(x), w \rangle = \left\langle \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \frac{1}{\sqrt{8}} \begin{bmatrix} 2 \\ 2 \end{bmatrix} \right\rangle = \frac{8}{\sqrt{8}} = 2\sqrt{2}.

Notes:

D_{e_1} f(x) = \langle \nabla f(x), e_1 \rangle = \left\langle \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 1 \\ 0 \end{bmatrix} \right\rangle = 2 < 2\sqrt{2},

D_{e_2} f(x) = \langle \nabla f(x), e_2 \rangle = \left\langle \begin{bmatrix} 2 \\ 2 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \end{bmatrix} \right\rangle = 2 < 2\sqrt{2}.

Directional derivatives
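A quick check of these numbers (a minimal sketch that simply recomputes D_w f, D_{e_1} f and D_{e_2} f for f(x) = x_1^2 + x_2^2 at x = (1, 1)):

```python
import numpy as np

grad = np.array([2.0, 2.0])                  # grad f at x = (1, 1) for f = x1^2 + x2^2
w = grad / np.linalg.norm(grad)              # unit vector along the gradient
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

print(grad @ w)    # 2*sqrt(2) ~ 2.828, the largest directional derivative
print(grad @ e1)   # 2
print(grad @ e2)   # 2
```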

Page 9: Optimization

Directional derivatives

Def: f(x) is said to have a local (or relative) minimum at x* if, in a nbd N of x*,

f(x^*) \le f(x), \qquad \forall x \in N.

Theorem: Let f(x) be differentiable. If f(x) has a local minimum at x*, then

\nabla f(x^*) = 0 \qquad (\text{or } D_w f(x^*) = 0, \ \forall w).

pf:

f(x^* + h) - f(x^*) = \langle \nabla f(x^*), h \rangle + \langle h, \Phi(x^*, h) \rangle \ge 0.

As h \to 0 the term \langle h, \Phi(x^*, h) \rangle vanishes faster than \langle \nabla f(x^*), h \rangle, so \langle \nabla f(x^*), h \rangle \ge 0. Since this holds for every h, it also holds for -h, i.e. \langle \nabla f(x^*), -h \rangle \ge 0; hence \langle \nabla f(x^*), h \rangle = 0 for all h, and therefore \nabla f(x^*) = 0.

Note: \nabla f(x^*) = 0 is a necessary condition, not a sufficient condition.

Page 10: Optimization

Directional derivatives

Theorem: If f(x) is twice differentiable and

(1) \nabla f(x^*) = 0,
(2) F(x^*) is a positive definite matrix (i.e. \langle v, F(x^*) v \rangle > 0, \ \forall v \ne 0),

then x* is a local minimum of f(x).

pf:

f(x^* + h) - f(x^*) = \langle \nabla f(x^*), h \rangle + \frac{1}{2} \langle h, F(x^*) h \rangle + \text{H.O.T.} = \frac{1}{2} \langle h, F(x^*) h \rangle + \text{H.O.T.} > 0,

so f(x) \ge f(x^*), \ \forall x \in N of x^*.

Conclusion 2: The necessary & sufficient conditions for a local minimum of f(x) at x* are

(1) \nabla f(x^*) = 0,
(2) F(x^*) is p.d.
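In practice the two conditions can be verified numerically: the gradient should vanish at x*, and all eigenvalues of the Hessian F(x*) should be positive. A minimal sketch (the convex quadratic test function is an assumption, not from the slides):

```python
import numpy as np

# Assumed example: f(x) = 2*x1^2 + x2^2 + x1*x2, a convex quadratic with minimum at (0, 0).
def grad_f(x):
    return np.array([4.0 * x[0] + x[1], 2.0 * x[1] + x[0]])

def hessian_f(x):
    return np.array([[4.0, 1.0],
                     [1.0, 2.0]])

x_star = np.array([0.0, 0.0])

# Condition (1): the gradient vanishes at x*.
print(np.allclose(grad_f(x_star), 0.0))                        # True

# Condition (2): F(x*) is positive definite (all eigenvalues > 0).
print(np.all(np.linalg.eigvalsh(hessian_f(x_star)) > 0.0))     # True
```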

Page 11: Optimization

Minimization of Unconstrained function

Prob.: Let y = f(x), x \in R^n. We want to generate a sequence x^{(0)}, x^{(1)}, x^{(2)}, \ldots such that

f(x^{(0)}) > f(x^{(1)}) > f(x^{(2)}) > \cdots

and such that it converges to the minimum of f(x).

Consider the kth guess x^{(k)}. We can generate x^{(k+1)} provided that we have two pieces of information: (1) the direction to go, d^{(k)}; (2) a scalar step size, \alpha^{(k)}. Then

x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}.

Basic descent methods: (1) Steepest descent (2) Newton-Raphson method

Page 12: Optimization

Steepest Descent

Steepest descent:

d^{(k)} = -\nabla f(x^{(k)}), \qquad x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)}), \quad \alpha^{(k)} > 0.

To determine \alpha^{(k)}, consider

g(\alpha) = f\big(x^{(k)} - \alpha \nabla f(x^{(k)})\big),

which is a function of the scalar \alpha.

Note (expanding f around x^{(k)}):

g(\alpha) = f(x^{(k)}) - \alpha \langle \nabla f(x^{(k)}), \nabla f(x^{(k)}) \rangle + \cdots

1.a. Optimum \alpha^{(k)}: it minimizes g(\alpha), i.e.

\frac{d g(\alpha)}{d\alpha}\Big|_{\alpha = \alpha^{(k)}} = 0, \qquad \text{i.e. } \langle \nabla f(x^{(k+1)}), d^{(k)} \rangle = 0.

Page 13: Optimization

Steepest Descent

Example:

f(x) = f(x_1, x_2), \qquad x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \in R^2.

Suppose x^{(k)} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} and \nabla f(x^{(k)}) = \begin{bmatrix} 37/64 \\ 21/64 \end{bmatrix}. Then

x^{(k+1)} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} - \alpha^{(k)} \begin{bmatrix} 37/64 \\ 21/64 \end{bmatrix},

g(\alpha^{(k)}) = f\!\left( 3 - \tfrac{37}{64}\alpha^{(k)}, \; 3 - \tfrac{21}{64}\alpha^{(k)} \right),

and the optimum step follows from \dfrac{d g(\alpha^{(k)})}{d \alpha^{(k)}} = 0, which is messy to calculate (in general).

Page 14: Optimization

Steepest Descent

Example:

f(x) = \frac{1}{2} x^T Q x - b^T x, \qquad Q: n \times n, \text{ symmetric and p.d.} \quad (\text{a paraboloid in } R^n)

\nabla f(x) = Q x - b

x^{(k+1)} = x^{(k)} - \alpha^{(k)} (Q x^{(k)} - b), \qquad \text{i.e. } d^{(k)} = -(Q x^{(k)} - b)

g(\alpha^{(k)}) = f\big(x^{(k)} + \alpha^{(k)} d^{(k)}\big)
= \frac{1}{2} \big(x^{(k)} + \alpha^{(k)} d^{(k)}\big)^T Q \big(x^{(k)} + \alpha^{(k)} d^{(k)}\big) - b^T \big(x^{(k)} + \alpha^{(k)} d^{(k)}\big)
= \frac{1}{2} (\alpha^{(k)})^2\, d^{(k)T} Q d^{(k)} + \alpha^{(k)} d^{(k)T} (Q x^{(k)} - b) + \frac{1}{2} x^{(k)T} Q x^{(k)} - b^T x^{(k)}

Page 15: Optimization

Steepest Descent

\frac{d g(\alpha^{(k)})}{d \alpha^{(k)}} = \alpha^{(k)} d^{(k)T} Q d^{(k)} + d^{(k)T} (Q x^{(k)} - b) = 0

\Rightarrow \quad \alpha^{(k)} = -\frac{d^{(k)T} (Q x^{(k)} - b)}{d^{(k)T} Q d^{(k)}} = \frac{d^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}}
\qquad \big(\text{Note } d^{(k)} = -(Q x^{(k)} - b); \ d^{(k)T} Q d^{(k)} > 0 \text{ since } Q \text{ is p.d.}\big)

Optimum iteration:

x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}, \qquad \alpha^{(k)} = \frac{d^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}}, \qquad d^{(k)} = -(Q x^{(k)} - b).

Remark: The optimal steepest descent step size can be determined analytically for a quadratic function.
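A minimal sketch of this optimal-step iteration for a small quadratic (the particular Q and b are assumptions chosen only for illustration):

```python
import numpy as np

# Assumed quadratic: f(x) = 0.5 x^T Q x - b^T x with Q symmetric p.d.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.zeros(2)                       # x^(0)
for k in range(50):
    d = b - Q @ x                     # d^(k) = -(Q x^(k) - b) = -grad f(x^(k))
    if np.linalg.norm(d) < 1e-10:
        break
    alpha = (d @ d) / (d @ (Q @ d))   # analytic optimal step size for a quadratic
    x = x + alpha * d                 # x^(k+1) = x^(k) + alpha^(k) d^(k)

print(x, np.linalg.solve(Q, b))       # both should equal the minimizer Q^{-1} b
```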

Page 16: Optimization

Steepest Descent

1.b. Other possibilities for choosing \alpha^{(k)}

(1) Constant step size, i.e. \alpha^{(k)} = \alpha \ (\text{constant}), \ \forall k:

x^{(k+1)} = x^{(k)} - \alpha \nabla f(x^{(k)})

adv: simple
disadv: no idea of which value of \alpha to choose; if \alpha is too large → diverge; if \alpha is too small → very slow

(2) Variable step size

i.e. choose \alpha^{(k)} from \{\alpha_1, \alpha_2, \ldots, \alpha_k\} such that g(\alpha^{(k)}) is minimized: evaluate g(\alpha_1), g(\alpha_2), \ldots, g(\alpha_k) and find the minimizing one.

Page 17: Optimization

Steepest Descent

1.b. Other possibilities for choosing \alpha^{(k)}

(3) Polynomial fit methods

(i) Quadratic fit

Guess three values for \alpha, say \alpha_1, \alpha_2, \alpha_3. Let

g(\alpha) \approx a + b\alpha + c\alpha^2,

g_i = g(\alpha_i) = f\big(x^{(k)} - \alpha_i \nabla f(x^{(k)})\big) = a + b\alpha_i + c\alpha_i^2, \qquad i = 1, 2, 3.

Solve for a, b, c (so that \alpha^{(k)} is a function of \alpha_1, \alpha_2, \alpha_3, g_1, g_2, g_3), and minimize g(\alpha) by

\frac{d g(\alpha)}{d\alpha} = b + 2c\alpha = 0 \quad \Rightarrow \quad \alpha^{(k)} = -\frac{b}{2c}.

Check

\frac{d^2 g(\alpha)}{d\alpha^2} = 2c > 0.

(Figure: the quadratic fit through (\alpha_1, g_1), (\alpha_2, g_2), (\alpha_3, g_3) and its minimizer \alpha^{(k)}.)
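A sketch of the quadratic-fit step: fit a + b\alpha + c\alpha^2 through three trial points and take \alpha^{(k)} = -b/(2c). The one-dimensional function g and the trial points are assumptions for illustration:

```python
import numpy as np

g = lambda a: (a - 1.3)**2 + 0.5        # assumed g(alpha) along the search direction

alphas = np.array([0.0, 1.0, 2.0])      # three guessed step sizes
gs = np.array([g(a) for a in alphas])

# Solve [1  alpha  alpha^2] [a b c]^T = g_i for the quadratic coefficients.
A = np.vstack([np.ones(3), alphas, alphas**2]).T
a, b, c = np.linalg.solve(A, gs)

if c > 0:                               # check d^2 g / d alpha^2 = 2c > 0
    alpha_k = -b / (2.0 * c)
    print(alpha_k)                      # 1.3, the exact minimizer of this g
```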

Page 18: Optimization

Steepest Descent

1.b. Other possibilities for choosing \alpha^{(k)}

(3) Polynomial fit methods

(ii) Cubic fit

g(\alpha) \approx a_1 \alpha^3 + a_2 \alpha^2 + a_3 \alpha + a_4

Fit four values g_i = g(\alpha_i), i = 1, \ldots, 4:

\begin{bmatrix}
\alpha_1^3 & \alpha_1^2 & \alpha_1 & 1 \\
\alpha_2^3 & \alpha_2^2 & \alpha_2 & 1 \\
\alpha_3^3 & \alpha_3^2 & \alpha_3 & 1 \\
\alpha_4^3 & \alpha_4^2 & \alpha_4 & 1
\end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix}
=
\begin{bmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \end{bmatrix}

to solve for a_1, a_2, a_3, a_4. Then

\frac{d g(\alpha)}{d\alpha} = 3 a_1 \alpha^2 + 2 a_2 \alpha + a_3 = 0
\quad \Rightarrow \quad
\alpha^{(k)} = \frac{-2 a_2 \pm \sqrt{4 a_2^2 - 12 a_1 a_3}}{6 a_1} = \frac{-a_2 \pm \sqrt{a_2^2 - 3 a_1 a_3}}{3 a_1},

check

\frac{d^2 g(\alpha)}{d\alpha^2} = 6 a_1 \alpha + 2 a_2 > 0.

(Figure: the cubic fit through (\alpha_i, g_i), i = 1, \ldots, 4, and its minimizer \alpha^{(k)}.)
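The same idea with four points; a sketch under an assumed g(\alpha) (a cubic, so the fit is exact and the formula above is recovered):

```python
import numpy as np

g = lambda a: a**3 - 3.0 * a            # assumed g(alpha); local minimum at alpha = 1

alphas = np.array([0.0, 0.5, 1.5, 2.0])
gs = g(alphas)

# Solve the 4x4 system for the cubic coefficients a1..a4 (a1*alpha^3 + ... + a4).
A = np.vstack([alphas**3, alphas**2, alphas, np.ones(4)]).T
a1, a2, a3, a4 = np.linalg.solve(A, gs)

# Roots of dg/dalpha = 3 a1 alpha^2 + 2 a2 alpha + a3 = 0;
# keep the root with 6 a1 alpha + 2 a2 > 0 (the local minimum).
roots = np.roots([3.0 * a1, 2.0 * a2, a3])
alpha_k = [r.real for r in roots if 6.0 * a1 * r.real + 2.0 * a2 > 0][0]
print(alpha_k)                          # 1.0
```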

Page 19: Optimization

Steepest Descent

1.b. Other possibilities for choosing \alpha^{(k)}

(4) Region elimination methods

Assume g(\alpha) is convex over [a, b], i.e. it has one minimum. Pick \alpha_1 < \alpha_2 in [a, b] and evaluate g_1 = g(\alpha_1), g_2 = g(\alpha_2). Three cases arise:

(a) g_1 > g_2 \qquad (b) g_1 < g_2 \qquad (c) g_1 = g_2

(Figure: the three cases, showing which part of [a, b] is eliminated in each.)

Initial interval of uncertainty: [a, b]. Next interval of uncertainty: for (a) it is [\alpha_1, b]; for (b) it is [a, \alpha_2]; for (c) it is [\alpha_1, \alpha_2].

Page 20: Optimization

Steepest Descent

[Q]: How do we choose \alpha_1 and \alpha_2?

(i) Two-point equal interval search

i.e. \alpha_1 - a = \alpha_2 - \alpha_1 = b - \alpha_2

1st iteration: L_0 = b - a

2nd iteration: L_1 = \frac{2}{3} L_0

3rd iteration: L_2 = \frac{2}{3} L_1 = \left(\frac{2}{3}\right)^2 L_0

kth iteration: L_k = \left(\frac{2}{3}\right)^k L_0

(Figure: a, \alpha_1, \alpha_2, b equally spaced on the interval.)

Page 21: Optimization

Steepest Descent

[Q]: How do we choose \alpha_1 and \alpha_2?

(ii) Fibonacci search method

F_0 = F_1 = 1, \qquad F_k = F_{k-1} + F_{k-2}, \quad k \ge 2.

For N search iterations:

\alpha_1^{(k)} = a_k + \frac{F_{N-k-1}}{F_{N-k+1}} (b_k - a_k), \qquad k = 0, 1, 2, \ldots, N-1,

\alpha_2^{(k)} = a_k + \frac{F_{N-k}}{F_{N-k+1}} (b_k - a_k), \qquad k = 0, 1, 2, \ldots, N-1.

Example: Let N = 5, initial a = 0, b = 1.

k = 0:

\alpha_1^{(0)} = 0 + \frac{F_4}{F_6}(1 - 0) = \frac{5}{13}, \qquad \alpha_2^{(0)} = 0 + \frac{F_5}{F_6}(1 - 0) = \frac{8}{13}.

Compare g_1, g_2 on [a, b] = [0, 1] and eliminate accordingly; in either case the new interval [a_1, b_1] has length L_1 = \frac{8}{13} L_0.

k = 1:

\alpha_1^{(1)} = a_1 + \frac{F_3}{F_5}(b_1 - a_1) = a_1 + \frac{3}{8}(b_1 - a_1), \qquad \alpha_2^{(1)} = a_1 + \frac{F_4}{F_5}(b_1 - a_1) = a_1 + \frac{5}{8}(b_1 - a_1).

Compare g_1, g_2 on [a_1, b_1], and so on.
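A sketch of the Fibonacci search described above (the convex test function g is an assumption; the loop performs N-1 interval reductions, since the N-th pair of trial points would coincide):

```python
def fibonacci_search(g, a, b, N):
    """Shrink [a, b] with the Fibonacci placement rule."""
    F = [1, 1]
    while len(F) < N + 2:
        F.append(F[-1] + F[-2])          # F_0 = F_1 = 1, F_k = F_{k-1} + F_{k-2}
    for k in range(N - 1):
        a1 = a + F[N - k - 1] / F[N - k + 1] * (b - a)
        a2 = a + F[N - k]     / F[N - k + 1] * (b - a)
        if g(a1) > g(a2):                # minimum cannot lie in [a, a1]
            a = a1
        else:                            # minimum cannot lie in [a2, b]
            b = a2
    return a, b

g = lambda x: (x - 0.3)**2               # assumed convex g(alpha) on [0, 1]
print(fibonacci_search(g, 0.0, 1.0, 5))  # a small interval containing 0.3
```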

Page 22: Optimization

Steepest Descent

[Q]: How do we choose \alpha_1 and \alpha_2?

(iii) Golden section method

If N \to \infty,

\lim_{N \to \infty} \frac{F_{N-1}}{F_N} = 0.618, \qquad 1 - \lim_{N \to \infty} \frac{F_{N-1}}{F_N} = 0.382.

Then use

\alpha_1^{(k)} = a_k + 0.382\,(b_k - a_k), \qquad \alpha_2^{(k)} = a_k + 0.618\,(b_k - a_k), \qquad k = 0, 1, 2, \ldots

until b_k - a_k < \varepsilon.

Example: [a_0, b_0] = [0, 2]. Then

\alpha_1^{(0)} = 0 + 0.382\,(2 - 0) = 0.764, \qquad \alpha_2^{(0)} = 0 + 0.618\,(2 - 0) = 1.236.

Comparing g_1 and g_2 (here g_1 < g_2) gives [a_1, b_1] = [0, 1.236]; then

\alpha_1^{(1)} = 0 + 0.382\,(1.236 - 0) = 0.472, \qquad \alpha_2^{(1)} = 0 + 0.618\,(1.236 - 0) = 0.764,

etc…
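A sketch of the golden-section iteration with the ratios used above (the function g is an assumption; the slides only compare g_1 and g_2 abstractly):

```python
def golden_section(g, a, b, eps=1e-4):
    """Golden-section search: shrink [a, b] until b - a < eps."""
    while b - a >= eps:
        a1 = a + 0.382 * (b - a)
        a2 = a + 0.618 * (b - a)
        if g(a1) > g(a2):        # eliminate [a, a1]
            a = a1
        else:                    # eliminate [a2, b]
            b = a2
    return 0.5 * (a + b)

g = lambda x: (x - 0.7)**2       # assumed convex g(alpha) on [0, 2]
print(golden_section(g, 0.0, 2.0))   # ~0.7
```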

Page 23: Optimization

Steepest Descent

Flow chart of steepest descent

Initial guess x^{(0)}, k = 0
    ↓
Compute ∇f(x^{(k)})
    ↓
Is ‖∇f(x^{(k)})‖ < ε ?
    Yes → Stop! x^{(k)} is the minimum.
    No  → Determine α^{(k)}
              (α ∈ {α_1, …, α_n}; polynomial fit: cubic, …; region elimination: …)
          x^{(k+1)} = x^{(k)} − α^{(k)} ∇f(x^{(k)})
          k = k + 1, then return to "Compute ∇f(x^{(k)})".
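A sketch of this flow chart as a loop. The test function, its gradient, and the crude variable-step rule for α^{(k)} are assumptions; any of the step-size rules above could be plugged in instead:

```python
import numpy as np

def steepest_descent(f, grad_f, x0, line_search, eps=1e-6, max_iter=1000):
    """Follow the flow chart: stop when ||grad f(x^(k))|| < eps."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < eps:
            break                              # x^(k) is (approximately) the minimum
        alpha = line_search(f, x, g)           # determine alpha^(k)
        x = x - alpha * g                      # x^(k+1) = x^(k) - alpha^(k) grad f(x^(k))
    return x

# Assumed test problem and a variable-step rule: pick the best of a few trial alphas.
f = lambda x: (x[0] - 1.0)**2 + 10.0 * (x[1] + 2.0)**2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
trial = lambda f, x, g: min((0.5**i for i in range(10)), key=lambda a: f(x - a * g))

print(steepest_descent(f, grad_f, [0.0, 0.0], trial))   # close to (1, -2)
```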

Page 24: Optimization

Steepest Descent

[Q]: Is -\nabla f(x) the "best" direction to go?

Consider again the quadratic

f(x) = \frac{1}{2} x^T Q x - b^T x, \qquad \nabla f(x) = Q x - b,

and suppose the initial guess is x^{(0)}. Consider the next guess

x^{(1)} = x^{(0)} - M \nabla f(x^{(0)}) = x^{(0)} - M (Q x^{(0)} - b), \qquad M: n \times n \text{ matrix}.

What should M be such that x^{(1)} is the minimum, i.e. \nabla f(x^{(1)}) = 0?

Since

\nabla f(x^{(1)}) = Q x^{(1)} - b = Q\big(x^{(0)} - M(Q x^{(0)} - b)\big) - b = Q x^{(0)} - Q M Q x^{(0)} + Q M b - b,

we want this to be 0. If M Q = I, or M = Q^{-1}, then

\nabla f(x^{(1)}) = Q x^{(0)} - Q x^{(0)} + b - b = 0.

Thus, for a quadratic function, x^{(k+1)} = x^{(k)} - Q^{-1} \nabla f(x^{(k)}) will take us to the minimum in one iteration no matter what x^{(0)} is.
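A small check of this one-step property (Q, b and x^{(0)} are arbitrary assumed values):

```python
import numpy as np

Q = np.array([[3.0, 1.0],
              [1.0, 2.0]])               # symmetric p.d.
b = np.array([1.0, -1.0])
grad = lambda x: Q @ x - b

x0 = np.array([10.0, -7.0])              # any starting point
x1 = x0 - np.linalg.solve(Q, grad(x0))   # x^(1) = x^(0) - Q^{-1} grad f(x^(0))

print(grad(x1))                          # ~[0, 0]: x^(1) is already the minimizer
```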

Page 25: Optimization

Newton-Raphson Method

Minimize f(x). The necessary condition is ∇f(x) = 0, and the N-R algorithm is used to find the roots of ∇f(x) = 0.

Guess x^{(k)}; then x^{(k+1)} must satisfy (in the scalar picture below)

\frac{\nabla f(x^{(k)})}{x^{(k)} - x^{(k+1)}} = \frac{d\, \nabla f(x)}{dx}\Big|_{x^{(k)}}

\Rightarrow \quad x^{(k+1)} = x^{(k)} - \left( \frac{d\, \nabla f(x)}{dx}\Big|_{x^{(k)}} \right)^{-1} \nabla f(x^{(k)}).

Note: it does not always converge.

(Figure: \nabla f(x) vs. x, with the tangent at x^{(k)} crossing the x-axis at x^{(k+1)}.)

Page 26: Optimization

Newton-Raphson Method

A more formal derivation: minimize f(x^{(k)} + h) w.r.t. h.

f(x^{(k)} + h) \approx f(x^{(k)}) + \langle \nabla f(x^{(k)}), h \rangle + \frac{1}{2} \langle h, F(x^{(k)}) h \rangle

\nabla_h f(x^{(k)} + h) = \nabla f(x^{(k)}) + F(x^{(k)}) h = 0

\Rightarrow \quad h = -[F(x^{(k)})]^{-1} \nabla f(x^{(k)})

x^{(k+1)} = x^{(k)} + h = x^{(k)} - [F(x^{(k)})]^{-1} \nabla f(x^{(k)})

(Figure: f(x) vs. x with successive iterates x^{(k)}, x^{(k+1)}, x^{(k+2)}, x^{(k+3)} approaching the minimum.)
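A sketch of the N-R iteration x^{(k+1)} = x^{(k)} - [F(x^{(k)})]^{-1} ∇f(x^{(k)}). The test function (a non-quadratic with a known minimum at the origin) is an assumption:

```python
import numpy as np

# Assumed test function: f(x) = x1^4 + x2^4 + x1^2 + x2^2, minimum at (0, 0).
grad_f = lambda x: np.array([4.0 * x[0]**3 + 2.0 * x[0],
                             4.0 * x[1]**3 + 2.0 * x[1]])
hess_f = lambda x: np.array([[12.0 * x[0]**2 + 2.0, 0.0],
                             [0.0, 12.0 * x[1]**2 + 2.0]])

x = np.array([2.0, -1.5])                           # initial guess x^(0)
for k in range(20):
    step = np.linalg.solve(hess_f(x), grad_f(x))    # [F(x^(k))]^{-1} grad f(x^(k))
    x = x - step
    if np.linalg.norm(step) < 1e-12:
        break

print(x)   # ~(0, 0)
```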

Page 27: Optimization

Newton-Raphson Method

Remarks:

(1) Computation of [F(x^{(k)})]^{-1} at every iteration is time consuming → modify the N-R algorithm to recompute [F(x^{(k)})]^{-1} only every M-th iteration.

(2) Must check that F(x^{(k)}) is p.d. at every iteration. If not, replace it by

\hat{F}(x^{(k_0)}) = F(x^{(k_0)}) + \Lambda, \qquad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n), \quad \lambda_i > 0.

Example:

f(x_1, x_2) = 1 - \frac{1}{1 + x_1^2 + x_2^2} = \frac{x_1^2 + x_2^2}{1 + x_1^2 + x_2^2}

\nabla f(x) = \begin{bmatrix} \dfrac{2 x_1}{(1 + x_1^2 + x_2^2)^2} \\[2mm] \dfrac{2 x_2}{(1 + x_1^2 + x_2^2)^2} \end{bmatrix}

Page 28: Optimization

Newton-Raphson Method

F(x) = \frac{2}{(1 + x_1^2 + x_2^2)^3} \begin{bmatrix} 1 - 3 x_1^2 + x_2^2 & -4 x_1 x_2 \\ -4 x_1 x_2 & 1 + x_1^2 - 3 x_2^2 \end{bmatrix}

The minimum of f(x) is at (0, 0), and

F((0,0)) = 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},

so in the nbd of (0, 0), F(x) is p.d.

Now suppose we start with an initial guess

x^{(0)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}. \quad \text{Then} \quad
\nabla f(x^{(0)}) = \begin{bmatrix} 1/2 \\ 0 \end{bmatrix}, \qquad
F(x^{(0)}) = \begin{bmatrix} -1/2 & 0 \\ 0 & 1/2 \end{bmatrix},

which is not p.d., and

x^{(1)} = x^{(0)} - [F(x^{(0)})]^{-1} \nabla f(x^{(0)}) = \begin{bmatrix} 1 - (-2)\cdot\tfrac{1}{2} \\ 0 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}.

The iterates move farther and farther from (0, 0): the iteration diverges.

Remark: (3) The N-R algorithm is good (fast) when the initial guess is close to the minimum, but not very good when it is far from the minimum.
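A minimal sketch of this divergence, coding the f, ∇f and F written above (the expressions are as reconstructed on the previous slide); the first step reproduces x^{(1)} = (2, 0)^T and later steps keep moving away from (0, 0):

```python
import numpy as np

# f(x1, x2) = (x1^2 + x2^2) / (1 + x1^2 + x2^2), minimum at (0, 0).
def grad_f(x):
    s = 1.0 + x @ x
    return 2.0 * x / s**2

def hess_f(x):
    x1, x2 = x
    s = 1.0 + x1**2 + x2**2
    return (2.0 / s**3) * np.array([[1.0 - 3.0 * x1**2 + x2**2, -4.0 * x1 * x2],
                                    [-4.0 * x1 * x2, 1.0 + x1**2 - 3.0 * x2**2]])

x = np.array([1.0, 0.0])                      # x^(0): here F(x^(0)) is NOT p.d.
for k in range(5):
    x = x - np.linalg.solve(hess_f(x), grad_f(x))   # plain N-R step
    print(k + 1, x)                           # x1 grows each step: the iteration runs away
```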