Optimization 吳育德
Unconstrained Minimization
Def: f(x) is said to be differentiable at a point x*, if it is defined in a neighborhood N around x* and if for x* + h there exists a vector a independent of h such that

f(x^* + h) = f(x^*) + \langle a, h \rangle + \Phi(x^*, h)

where the vector a is called the gradient of f(x) evaluated at x*, denoted

a = \nabla f(x^*) = \left[ \frac{\partial f}{\partial x_1} \;\; \frac{\partial f}{\partial x_2} \;\; \cdots \;\; \frac{\partial f}{\partial x_n} \right]^T \Big|_{x = x^*}

The term \langle a, h \rangle is called the 1st variation, and

\Phi(x^*, h) = \langle h, \alpha(x^*, h) \rangle, \quad \alpha(x^*, h) = [\alpha_1(x^*, h) \;\cdots\; \alpha_n(x^*, h)]^T

with

\lim_{h \to 0} \alpha_i(x^*, h) = 0, \quad i = 1, 2, \ldots, n
Unconstrained Minimization
Note: if f(x) is twice differentiable, then

\Phi(x^*, h) = \frac{1}{2} \langle h, F(x^*) h \rangle + \text{H.O.T.}

where F(x) is an n \times n symmetric matrix, called the Hessian of f(x):

F(x) = \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix}

Then

f(x^* + h) = f(x^*) + \underbrace{\langle \nabla f(x^*), h \rangle}_{\text{1st variation}} + \underbrace{\tfrac{1}{2} \langle h, F(x^*) h \rangle}_{\text{2nd variation}} + \text{H.O.T.}
Directional derivatives
Let w be a direction vector of unit norm, \|w\| = 1. Now consider

g_w(r) = f(x^* + rw)

which is a function of the scalar r.

Def: The directional derivative of f(x) in the direction w (unit norm) at x* is defined as

D_w f(x^*) = \frac{d g_w(r)}{dr} \Big|_{r=0} = \lim_{r \to 0} \frac{g_w(r) - g_w(0)}{r} = \lim_{r \to 0} \frac{f(x^* + rw) - f(x^*)}{r}

= \lim_{r \to 0} \frac{\langle \nabla f(x^*), rw \rangle + \langle rw, \alpha(x^*, rw) \rangle}{r}

= \lim_{r \to 0} \left( \langle \nabla f(x^*), w \rangle + \langle w, \alpha(x^*, rw) \rangle \right)

= \langle \nabla f(x^*), w \rangle
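The last equality can be checked numerically: the difference quotient (f(x* + rw) − f(x*))/r should approach ⟨∇f(x*), w⟩ as r → 0. A minimal sketch in Python (the test function and point are illustrative):

```python
import math

def f(x):
    # f(x) = x1^2 + x2^2 (the example used later in these notes)
    return x[0]**2 + x[1]**2

def grad_f(x):
    return [2*x[0], 2*x[1]]

def directional_derivative(f, x, w, r=1e-6):
    # one-sided difference quotient (f(x + r w) - f(x)) / r
    xr = [xi + r*wi for xi, wi in zip(x, w)]
    return (f(xr) - f(x)) / r

x_star = [1.0, 1.0]
w = [1/math.sqrt(2), 1/math.sqrt(2)]          # unit-norm direction
num = directional_derivative(f, x_star, w)     # numerical D_w f(x*)
exact = sum(g*wi for g, wi in zip(grad_f(x_star), w))  # <grad f(x*), w>
```

For this f and w the exact value is 2√2, and the quotient matches it to within O(r).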
Directional derivatives

Example: Let w = e_i = [0 \cdots 0 \;\; 1 \;\; 0 \cdots 0]^T (1 in the ith position). Then

D_{e_i} f(x^*) = \langle \nabla f(x^*), e_i \rangle = \frac{\partial f}{\partial x_i}(x^*)

i.e. the partial derivative of f(x*) w.r.t. x_i is the directional derivative of f(x) in the direction e_i.
Interpretation of D_w f(x^*)

Consider the projection of \nabla f(x^*) on w:

\mathrm{proj}_w \nabla f(x^*) = \frac{\langle \nabla f(x^*), w \rangle}{\|w\|^2}\, w = \langle \nabla f(x^*), w \rangle\, w = D_w f(x^*)\, w \quad (\|w\| = 1)

Then the directional derivative along a direction w (\|w\| = 1) is the length of the projection vector of \nabla f(x^*) on w.

(Figure: the gradient \nabla f(x^*) at x^* and its projection \mathrm{proj}\,\nabla f(x^*) on the unit vector w.)
[Q]: What direction w yields the largest directional derivative?
Ans:

w = \frac{\nabla f(x^*)}{\|\nabla f(x^*)\|}

Recall that the 1st variation of f(x^* + rw) is r \langle \nabla f(x^*), w \rangle = r\, D_w f(x^*), which is maximized over unit-norm w by the direction of the gradient.

Conclusion 1: The direction of the gradient is the direction that yields the largest change (1st variation) in the function.

This suggests the steepest descent method, which will be described later:

x_{k+1} = x_k - \alpha \nabla f(x_k)
Example: f(x) = x_1^2 + x_2^2, \; x \in \mathbb{R}^2.

Sol:

\nabla f(x) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 \\ 2x_2 \end{bmatrix}

Let x = [1, 1]^T. Then \nabla f(x) = [2, 2]^T, and w with unit norm is

w = \frac{\nabla f(x)}{\|\nabla f(x)\|} = \frac{1}{\sqrt{8}} \begin{bmatrix} 2 \\ 2 \end{bmatrix}

(Figure: level circles f(x_1, x_2) = c in the (x_1, x_2)-plane, with \nabla f(x) drawn at x = [1, 1]^T.)
Directional derivatives

The directional derivative in the direction of the gradient is

D_w f(x) = \langle \nabla f(x), w \rangle = \left\langle \nabla f(x), \frac{\nabla f(x)}{\|\nabla f(x)\|} \right\rangle = \|\nabla f(x)\| = \sqrt{2^2 + 2^2} = 2\sqrt{2}

Notes:

D_{e_1} f(x) = \langle \nabla f(x), e_1 \rangle = \langle [2, 2]^T, [1, 0]^T \rangle = 2 \le 2\sqrt{2}

D_{e_2} f(x) = \langle \nabla f(x), e_2 \rangle = \langle [2, 2]^T, [0, 1]^T \rangle = 2 \le 2\sqrt{2}
Directional derivatives

Def: f(x) is said to have a local (or relative) minimum at x* if, in a nbd N of x*,

f(x^*) \le f(x), \quad \forall x \in N

Theorem: Let f(x) be differentiable. If f(x) has a local minimum at x*, then

\nabla f(x^*) = 0 \quad (\text{or } D_w f(x^*) = 0 \;\; \forall w)

pf:

f(x^* + h) - f(x^*) = \langle \nabla f(x^*), h \rangle + \langle h, \alpha(x^*, h) \rangle \ge 0

As h \to 0, the term \langle h, \alpha(x^*, h) \rangle vanishes faster than \langle \nabla f(x^*), h \rangle, so we need

\langle \nabla f(x^*), h \rangle \ge 0 \quad \forall h

Since if \langle \nabla f(x^*), h \rangle \ge 0 for some h, then \langle \nabla f(x^*), -h \rangle \le 0. Hence \langle \nabla f(x^*), h \rangle = 0 for all h, i.e. \nabla f(x^*) = 0.

Note: \nabla f(x^*) = 0 is a necessary condition, not a sufficient condition.
Directional derivatives

Theorem: If f(x) is twice differentiable and
(1) \nabla f(x^*) = 0
(2) F(x^*) is a positive definite matrix (i.e. \langle v, F(x^*) v \rangle > 0 \;\; \forall v \ne 0),
then x* is a local minimum of f(x).

pf:

f(x^* + h) = f(x^*) + \langle \nabla f(x^*), h \rangle + \frac{1}{2} \langle h, F(x^*) h \rangle + \text{H.O.T.}

f(x^* + h) - f(x^*) = \frac{1}{2} h^T F(x^*) h + \text{H.O.T.} > 0

so f(x) \ge f(x^*), \;\forall x \in N, a nbd of x*.

Conclusion 2: Sufficient conditions for a local minimum of f(x) are
(1) \nabla f(x^*) = 0
(2) F(x^*) is p.d.
(Condition (1), together with positive semidefiniteness of F(x^*), is necessary.)
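Conclusion 2 can be checked numerically at a candidate point: the gradient should vanish and all Hessian eigenvalues should be positive. A minimal sketch (the quadratic test function and helper name are illustrative):

```python
import numpy as np

def grad_f(x):
    # gradient of f(x) = x1^2 + x2^2
    return np.array([2*x[0], 2*x[1]])

def hessian_f(x):
    # Hessian of the quadratic (constant)
    return np.array([[2.0, 0.0], [0.0, 2.0]])

def is_local_min_candidate(x, tol=1e-8):
    # (1) gradient = 0, (2) Hessian p.d. (all eigenvalues > 0)
    g = grad_f(x)
    eigvals = np.linalg.eigvalsh(hessian_f(x))
    return np.linalg.norm(g) < tol and bool(np.all(eigvals > 0))

ok = is_local_min_candidate(np.array([0.0, 0.0]))   # x* = (0,0) passes
```

A point with a nonzero gradient, such as (1, 0), fails condition (1).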
Minimization of Unconstrained Functions

Prob.: Let y = f(x), x \in \mathbb{R}^n. We want to generate a sequence x^{(0)}, x^{(1)}, x^{(2)}, \ldots such that

f(x^{(0)}) \ge f(x^{(1)}) \ge f(x^{(2)}) \ge \cdots

and such that it converges to the minimum of f(x).

Consider the kth guess x^{(k)}. We can generate x^{(k+1)} provided that we have two pieces of information:
(1) the direction to go: d^{(k)}
(2) a scalar step size: \alpha^{(k)}
Then

x^{(k+1)} = x^{(k)} + \alpha^{(k)} d^{(k)}

Basic descent methods: (1) Steepest descent (2) Newton-Raphson method
Steepest Descent

Steepest descent: d^{(k)} = -\nabla f(x^{(k)}), i.e.

x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)}), \quad \alpha^{(k)} > 0

Note: to determine \alpha^{(k)}, consider

g(\alpha) = f(x^{(k)} - \alpha \nabla f(x^{(k)}))

which is a function of the scalar \alpha.

1.a. Optimum \alpha^{(k)}: choose \alpha^{(k)} so that it minimizes g(\alpha), i.e.

\frac{dg(\alpha)}{d\alpha} \Big|_{\alpha = \alpha^{(k)}} = 0, \quad \alpha^{(k)} \ge 0
Steepest Descent

Example: Suppose at the kth iterate x^{(k)} = [3, 3]^T and \nabla f(x^{(k)}) = \frac{1}{64}[37, 21]^T. Then

x^{(k+1)} = \begin{bmatrix} 3 \\ 3 \end{bmatrix} - \alpha^{(k)} \frac{1}{64} \begin{bmatrix} 37 \\ 21 \end{bmatrix}

and

g(\alpha^{(k)}) = f(x^{(k+1)}) = f\!\left(3 - \tfrac{37}{64}\alpha^{(k)},\; 3 - \tfrac{21}{64}\alpha^{(k)}\right)

Setting dg(\alpha^{(k)})/d\alpha^{(k)} = 0 determines the optimal step size, but this is messy to calculate in general.
Steepest Descent

Example: f(x) = \frac{1}{2} x^T Q x - b^T x, with Q: n \times n, symmetric and p.d. (a paraboloid in \mathbb{R}^n).

\nabla f(x) = Qx - b

x^{(k+1)} = x^{(k)} - \alpha^{(k)} (Q x^{(k)} - b)

g^{(k)} = \nabla f(x^{(k)}) = Q x^{(k)} - b, \quad d^{(k)} = -g^{(k)}

i.e.

g(\alpha^{(k)}) = f(x^{(k)} + \alpha^{(k)} d^{(k)})
= \frac{1}{2} (x^{(k)} + \alpha^{(k)} d^{(k)})^T Q (x^{(k)} + \alpha^{(k)} d^{(k)}) - b^T (x^{(k)} + \alpha^{(k)} d^{(k)})
= \frac{1}{2} (\alpha^{(k)})^2\, d^{(k)T} Q d^{(k)} + \alpha^{(k)} d^{(k)T} (Q x^{(k)} - b) + \frac{1}{2} x^{(k)T} Q x^{(k)} - b^T x^{(k)}
Steepest Descent

\frac{dg(\alpha^{(k)})}{d\alpha^{(k)}} = \alpha^{(k)} d^{(k)T} Q d^{(k)} + d^{(k)T} (Q x^{(k)} - b) = 0

\Rightarrow \alpha^{(k)} = -\frac{d^{(k)T} (Q x^{(k)} - b)}{d^{(k)T} Q d^{(k)}} = \frac{d^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}}

(Note d^{(k)} = -g^{(k)} = -(Q x^{(k)} - b), and d^{(k)T} Q d^{(k)} > 0 since Q is p.d.)

Optimum iteration:

x^{(k+1)} = x^{(k)} + \frac{d^{(k)T} d^{(k)}}{d^{(k)T} Q d^{(k)}}\, d^{(k)}, \quad d^{(k)} = -(Q x^{(k)} - b)

Remark: The optimal steepest descent step size can be determined analytically for quadratic functions.
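The optimum iteration above can be sketched in a few lines of Python; Q, b, and the starting point are illustrative:

```python
import numpy as np

def steepest_descent_quadratic(Q, b, x0, tol=1e-10, max_iter=1000):
    # minimize f(x) = 0.5 x^T Q x - b^T x (Q symmetric p.d.)
    # with the analytic optimal step alpha = <d,d>/<d,Qd>, d = -(Qx - b)
    x = x0.astype(float)
    for _ in range(max_iter):
        d = -(Q @ x - b)                 # d = -grad f(x)
        if np.linalg.norm(d) < tol:
            break
        alpha = (d @ d) / (d @ (Q @ d))  # optimal step size
        x = x + alpha * d
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x_min = steepest_descent_quadratic(Q, b, np.zeros(2))
```

At convergence the gradient Qx − b vanishes, so x_min solves Qx = b.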
Steepest Descent

1.b. Other possibilities for choosing \alpha^{(k)}:

(1) Constant step size, i.e. \alpha^{(k)} = \alpha = \text{constant} \;\forall k:

x^{(k+1)} = x^{(k)} - \alpha \nabla f(x^{(k)})

adv: simple. disadv: no idea of which value of \alpha to choose; if \alpha is too large → diverge; if \alpha is too small → very slow.

(2) Variable step size: i.e. choose \alpha^{(k)} from \{\alpha_1, \alpha_2, \ldots\} such that g(\alpha^{(k)}) is minimized: evaluate g(\alpha_1), g(\alpha_2), \ldots and find the minimizing one.
Steepest Descent

1.b. Other possibilities for choosing \alpha^{(k)}:

(3) Polynomial fit methods

(i) Quadratic fit: guess three values for \alpha, say \alpha_1, \alpha_2, \alpha_3. Let

g(\alpha) = a\alpha^2 + b\alpha + c

g_i = g(\alpha_i) = f(x^{(k)} - \alpha_i \nabla f(x^{(k)})) = a\alpha_i^2 + b\alpha_i + c, \quad i = 1, 2, 3

Solve for a, b, c (they are functions of \alpha_1, \alpha_2, \alpha_3, g_1, g_2, g_3). Minimize g(\alpha) by

\frac{dg(\alpha)}{d\alpha} = 2a\alpha + b = 0 \;\Rightarrow\; \alpha^{(k)} = -\frac{b}{2a}

Check: \frac{d^2 g(\alpha)}{d\alpha^2} = 2a > 0.

(Figure: g(\alpha) sampled at \alpha_1, \alpha_2, \alpha_3 with values g_1, g_2, g_3, and the fitted minimizer \alpha^{(k)}.)
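The quadratic fit can be sketched as follows; the three sample points and the line-search function g are illustrative, and the coefficients are obtained by solving a 3×3 linear system:

```python
import numpy as np

def quadratic_fit_step(g, alphas):
    # fit g(alpha) ~ a*alpha^2 + b*alpha + c through three samples,
    # then return the fitted minimizer alpha = -b/(2a)
    a1, a2, a3 = alphas
    A = np.array([[a1**2, a1, 1.0],
                  [a2**2, a2, 1.0],
                  [a3**2, a3, 1.0]])
    gv = np.array([g(a1), g(a2), g(a3)])
    a, b, c = np.linalg.solve(A, gv)
    assert a > 0, "fitted parabola must open upward (d2g/da2 = 2a > 0)"
    return -b / (2*a)

# illustrative line-search function: g is itself quadratic,
# so the fit recovers the exact minimizer alpha = 0.5
g = lambda alpha: (1.0 - 2.0*alpha)**2
alpha_k = quadratic_fit_step(g, (0.0, 0.25, 1.0))
```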
Steepest Descent

1.b. Other possibilities for choosing \alpha^{(k)}:

(3) Polynomial fit methods

(ii) Cubic fit: guess four values \alpha_1, \ldots, \alpha_4 with g_i = g(\alpha_i). Let

g(\alpha) = a_1 \alpha^3 + a_2 \alpha^2 + a_3 \alpha + a_4

To solve for a_1, a_2, a_3, a_4:

\begin{bmatrix} \alpha_1^3 & \alpha_1^2 & \alpha_1 & 1 \\ \alpha_2^3 & \alpha_2^2 & \alpha_2 & 1 \\ \alpha_3^3 & \alpha_3^2 & \alpha_3 & 1 \\ \alpha_4^3 & \alpha_4^2 & \alpha_4 & 1 \end{bmatrix} \begin{bmatrix} a_1 \\ a_2 \\ a_3 \\ a_4 \end{bmatrix} = \begin{bmatrix} g_1 \\ g_2 \\ g_3 \\ g_4 \end{bmatrix}

Minimize by

\frac{dg(\alpha)}{d\alpha} = 3a_1\alpha^2 + 2a_2\alpha + a_3 = 0

\Rightarrow \alpha^{(k)} = \frac{-2a_2 \pm \sqrt{4a_2^2 - 12 a_1 a_3}}{6 a_1} = \frac{-a_2 \pm \sqrt{a_2^2 - 3 a_1 a_3}}{3 a_1}

Check: \frac{d^2 g(\alpha)}{d\alpha^2} = 6 a_1 \alpha + 2 a_2 > 0.

(Figure: g(\alpha) sampled at \alpha_1, \alpha_2, \alpha_3, \alpha_4 with values g_1, g_2, g_3, g_4.)
Steepest Descent

1.b. Other possibilities for choosing \alpha^{(k)}:

(4) Region elimination methods

Assume g(\alpha) is convex over [a, b], i.e. it has one minimum. Pick \alpha_1 < \alpha_2 in [a, b] and compare g_1 = g(\alpha_1), g_2 = g(\alpha_2):

(i) g_1 > g_2: the minimum cannot lie in [a, \alpha_1] → eliminate [a, \alpha_1]
(ii) g_1 < g_2: eliminate [\alpha_2, b]
(iii) g_1 = g_2: eliminate both [a, \alpha_1] and [\alpha_2, b]

Given the initial interval of uncertainty [a, b], the next interval of uncertainty is [\alpha_1, b] for (i); [a, \alpha_2] for (ii); [\alpha_1, \alpha_2] for (iii).
Steepest Descent

[Q]: how do we choose \alpha_1 and \alpha_2?

(i) Two points equal interval search: place \alpha_1, \alpha_2 so that a, \alpha_1, \alpha_2, b are equally spaced, i.e.

\alpha_1 - a = \alpha_2 - \alpha_1 = b - \alpha_2

Let L_0 = b - a. Each elimination keeps two of the three subintervals, so:

1st iteration: L_1 = \frac{2}{3} L_0
2nd iteration: L_2 = \frac{2}{3} L_1 = \left(\frac{2}{3}\right)^2 L_0
3rd iteration: L_3 = \left(\frac{2}{3}\right)^3 L_0
kth iteration: L_k = \left(\frac{2}{3}\right)^k L_0
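The two-point equal-interval search can be sketched as follows (the convex test function is illustrative):

```python
def equal_interval_search(g, a, b, iters=20):
    # sample at a + L/3 and a + 2L/3; keep the sub-interval that must
    # contain the minimum; the interval shrinks by 2/3 per iteration
    for _ in range(iters):
        L = b - a
        a1, a2 = a + L/3.0, a + 2.0*L/3.0
        if g(a1) > g(a2):
            a = a1        # case (i): eliminate [a, a1]
        elif g(a1) < g(a2):
            b = a2        # case (ii): eliminate [a2, b]
        else:
            a, b = a1, a2 # case (iii): eliminate both ends
    return (a + b) / 2.0

g = lambda alpha: (alpha - 0.3)**2   # convex, minimum at alpha = 0.3
alpha_k = equal_interval_search(g, 0.0, 1.0)
```

After 20 iterations the interval length is (2/3)^20 ≈ 3e-4, so the midpoint is within that of the minimizer.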
Steepest Descent

[Q]: how do we choose \alpha_1 and \alpha_2?

(ii) Fibonacci search method: F_0 = F_1 = 1, F_k = F_{k-1} + F_{k-2}.

For N search iterations:

\alpha_1^{(k)} = a_k + \frac{F_{N-k-1}}{F_{N-k+1}} (b_k - a_k), \quad k = 0, 1, 2, \ldots, N-1

\alpha_2^{(k)} = a_k + \frac{F_{N-k}}{F_{N-k+1}} (b_k - a_k), \quad k = 0, 1, 2, \ldots, N-1

Example: Let N = 5, initial a = 0, b = 1.

k = 0:
\alpha_1^{(0)} = 0 + \frac{F_4}{F_6}(1 - 0) = \frac{5}{13}, \quad \alpha_2^{(0)} = 0 + \frac{F_5}{F_6}(1 - 0) = \frac{8}{13}

Compare g_1, g_2 on [a, b]. Suppose g_1 < g_2; then [a_1, b_1] = [0, \frac{8}{13}] and L_1 = \frac{8}{13} L_0.

k = 1:
\alpha_1^{(1)} = a_1 + \frac{F_3}{F_5}(b_1 - a_1) = \frac{3}{8} \cdot \frac{8}{13} = \frac{3}{13}, \quad \alpha_2^{(1)} = a_1 + \frac{F_4}{F_5}(b_1 - a_1) = \frac{5}{8} \cdot \frac{8}{13} = \frac{5}{13}

(note \alpha_2^{(1)} = \alpha_1^{(0)}, so only one new evaluation of g is needed), then compare g_1, g_2 on [a_1, b_1], etc.
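The Fibonacci search above can be sketched as follows (the convex test function is illustrative; at the last iteration the two sample points coincide, which is a known property of the method):

```python
def fib(n):
    # F0 = F1 = 1, Fk = F(k-1) + F(k-2)
    f = [1, 1]
    while len(f) <= n:
        f.append(f[-1] + f[-2])
    return f[n]

def fibonacci_search(g, a, b, N):
    # N search iterations; interior points at Fibonacci ratios of [a_k, b_k]
    for k in range(N):
        L = b - a
        a1 = a + fib(N - k - 1) / fib(N - k + 1) * L
        a2 = a + fib(N - k)     / fib(N - k + 1) * L
        if g(a1) > g(a2):
            a = a1   # eliminate [a, a1]
        else:
            b = a2   # eliminate [a2, b]
    return (a + b) / 2.0

g = lambda alpha: (alpha - 0.3)**2   # convex, minimum at alpha = 0.3
alpha_k = fibonacci_search(g, 0.0, 1.0, N=10)
```

Each iteration shrinks the interval by F_{N-k}/F_{N-k+1}, so after N iterations the length is L_0 / F_{N+1} (here 1/144).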
Steepest Descent

[Q]: how do we choose \alpha_1 and \alpha_2?

(iii) Golden section method: if N \to \infty,

\lim_{N \to \infty} \frac{F_{N-1}}{F_N} = 0.618, \quad 1 - \lim_{N \to \infty} \frac{F_{N-1}}{F_N} = 0.382

then use

\alpha_1^{(k)} = a_k + 0.382 (b_k - a_k)
\alpha_2^{(k)} = a_k + 0.618 (b_k - a_k), \quad k = 0, 1, 2, \ldots

until b_k - a_k \le \varepsilon.

Example: [a_0, b_0] = [0, 2]:

\alpha_1^{(0)} = 0 + 0.382(2 - 0) = 0.764, \quad \alpha_2^{(0)} = 0 + 0.618(2 - 0) = 1.236

If g_1 < g_2, then [a_1, b_1] = [0, 1.236], and

\alpha_1^{(1)} = 0 + 0.382(1.236 - 0) = 0.472, \quad \alpha_2^{(1)} = 0 + 0.618(1.236 - 0) = 0.764

etc.
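The golden section iteration then becomes (test function and tolerance illustrative):

```python
def golden_section(g, a, b, eps=1e-6):
    # shrink [a, b] using the golden-ratio points 0.382 / 0.618
    # until b - a <= eps
    while b - a > eps:
        a1 = a + 0.382 * (b - a)
        a2 = a + 0.618 * (b - a)
        if g(a1) > g(a2):
            a = a1   # eliminate [a, a1]
        else:
            b = a2   # eliminate [a2, b]
    return (a + b) / 2.0

g = lambda alpha: (alpha - 0.764)**2   # convex on [0, 2]
alpha_k = golden_section(g, 0.0, 2.0)
```

Unlike the Fibonacci method, the number of iterations need not be fixed in advance; the 0.618 ratio reuses one interior point per iteration.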
Steepest Descent

Flow chart of steepest descent:
1. Initial guess x^{(0)}, k = 0.
2. Compute \nabla f(x^{(k)}).
3. If \|\nabla f(x^{(k)})\| < \varepsilon: stop, x^{(k)} is the minimum. Otherwise:
4. Determine \alpha^{(k)} (constant or variable \alpha \in \{\alpha_1, \ldots, \alpha_n\}; polynomial fit: quadratic, cubic, …; region elimination: …).
5. x^{(k+1)} = x^{(k)} - \alpha^{(k)} \nabla f(x^{(k)}), k = k + 1, go to 2.

[Q]: Is -\nabla f(x) the "best" direction to go?

Consider f(x) = \frac{1}{2} x^T Q x - b^T x, so \nabla f(x) = Qx - b. Suppose the initial guess is x^{(0)}, and consider the next guess

x^{(1)} = x^{(0)} - M \nabla f(x^{(0)}) = x^{(0)} - M (Q x^{(0)} - b), \quad M: n \times n \text{ matrix}

What should M be such that x^{(1)} is the minimum, i.e. \nabla f(x^{(1)}) = 0?

Since we want

\nabla f(x^{(1)}) = Q x^{(1)} - b = Q (x^{(0)} - M (Q x^{(0)} - b)) - b = (Q x^{(0)} - b) - QM (Q x^{(0)} - b) = 0

this holds for any x^{(0)} if QM = I, i.e. M = Q^{-1}.

Thus, for a quadratic function, x^{(k+1)} = x^{(k)} - Q^{-1} \nabla f(x^{(k)}) will take us to the minimum in one iteration no matter what x^{(0)} is.
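This one-iteration property is easy to verify numerically; Q, b, and x^{(0)} below are illustrative:

```python
import numpy as np

# For a quadratic f(x) = 0.5 x^T Q x - b^T x, one step of
# x(1) = x(0) - Q^{-1} grad f(x(0)) lands exactly on the minimizer
# (the solution of Qx = b), regardless of x(0).
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

x0 = np.array([10.0, -7.0])          # arbitrary initial guess
grad = Q @ x0 - b                    # grad f(x(0)) = Q x(0) - b
x1 = x0 - np.linalg.solve(Q, grad)   # x(0) - Q^{-1} grad f(x(0))
```

The gradient at x1 is zero to machine precision.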
Newton-Raphson Method

Minimize f(x). The necessary condition is \nabla f(x) = 0. The N-R algorithm finds the roots of \nabla f(x) = 0.

Guess x^{(k)}; then x^{(k+1)} must satisfy (scalar case, using the tangent to f'(x) at x^{(k)})

\frac{\dfrac{df(x)}{dx}\Big|_{x^{(k)}}}{x^{(k)} - x^{(k+1)}} = \frac{d^2 f(x)}{dx^2}\Big|_{x^{(k)}}

\Rightarrow x^{(k+1)} = x^{(k)} - \left[ \frac{d^2 f(x)}{dx^2}\Big|_{x^{(k)}} \right]^{-1} \frac{df(x)}{dx}\Big|_{x^{(k)}}

(Figure: tangent construction on f'(x) at x^{(k)}; the iterates x^{(k)}, x^{(k+1)} approach the root of f'(x) = 0.)

Note: it does not always converge.

Newton-Raphson Method

A more formal derivation: Min f(x^{(k)} + h) w.r.t. h:

f(x^{(k)} + h) \approx f(x^{(k)}) + \langle \nabla f(x^{(k)}), h \rangle + \frac{1}{2} \langle h, F(x^{(k)}) h \rangle

\nabla_h f(x^{(k)} + h) = \nabla f(x^{(k)}) + F(x^{(k)}) h = 0

\Rightarrow h = -[F(x^{(k)})]^{-1} \nabla f(x^{(k)})

x^{(k+1)} = x^{(k)} + h = x^{(k)} - [F(x^{(k)})]^{-1} \nabla f(x^{(k)})

(Figure: iterates x^{(k)}, x^{(k+1)}, x^{(k+2)}, \ldots descending on f(x).)
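The update x^{(k+1)} = x^{(k)} − [F(x^{(k)})]^{-1}∇f(x^{(k)}) can be sketched as follows (the non-quadratic test function is illustrative; in practice one solves the linear system F h = −∇f rather than forming the inverse):

```python
import numpy as np

def newton_raphson(grad, hess, x0, tol=1e-10, max_iter=50):
    # x(k+1) = x(k) - [F(x(k))]^{-1} grad f(x(k))
    x = x0.astype(float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        x = x - np.linalg.solve(hess(x), g)  # Newton step h = -F^{-1} g
    return x

# illustrative non-quadratic example: f(x) = x1^4 + x2^2, minimum at (0, 0)
grad = lambda x: np.array([4*x[0]**3, 2*x[1]])
hess = lambda x: np.array([[12*x[0]**2, 0.0], [0.0, 2.0]])
x_min = newton_raphson(grad, hess, np.array([1.0, 1.0]))
```

On the quadratic x2-coordinate Newton converges in one step; on the quartic x1-coordinate each step multiplies the coordinate by 2/3.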
Newton-Raphson Method

Remarks:
(1) Computation of [F(x^{(k)})]^{-1} at every iteration → time consuming → modify the N-R algorithm to calculate [F(x^{(k)})]^{-1} only every M-th iteration.
(2) Must check that F(x^{(k)}) is p.d. at every iteration. If not, replace it by

\hat{F}(x^{(k_0)}) = F(x^{(k_0)}) + \Lambda, \quad \Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n), \; \lambda_i > 0

chosen so that \hat{F} is p.d.

Example:

f(x_1, x_2) = 1 - \frac{1}{1 + x_1^2 + x_2^2}

\nabla f = \begin{bmatrix} \dfrac{2 x_1}{(1 + x_1^2 + x_2^2)^2} \\[2ex] \dfrac{2 x_2}{(1 + x_1^2 + x_2^2)^2} \end{bmatrix}
Newton-Raphson Method

F(x) = \frac{2}{(1 + x_1^2 + x_2^2)^3} \begin{bmatrix} 1 - 3x_1^2 + x_2^2 & -4 x_1 x_2 \\ -4 x_1 x_2 & 1 + x_1^2 - 3x_2^2 \end{bmatrix}

The minimum of f(x) is at (0, 0). In the nbd of (0, 0),

F((0,0)) = 2 \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}

is p.d.

Now suppose we start with the initial guess x^{(0)} = [1, 0]^T. Then

\nabla f(x^{(0)}) = \begin{bmatrix} 1/2 \\ 0 \end{bmatrix}, \quad F(x^{(0)}) = \begin{bmatrix} -1/2 & 0 \\ 0 & 1/2 \end{bmatrix}

which is not p.d., and

x^{(1)} = x^{(0)} - [F(x^{(0)})]^{-1} \nabla f(x^{(0)}) = \begin{bmatrix} 2 \\ 0 \end{bmatrix}

which is farther from the minimum: the sequence diverges.

Remark: (3) The N-R algorithm is good (fast) when the initial guess is close to the minimum, but not very good when far from the minimum.
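The divergence in this example can be reproduced numerically (a sketch; the iteration count is arbitrary):

```python
import numpy as np

# f(x) = 1 - 1/(1 + x1^2 + x2^2), starting from x(0) = [1, 0]^T,
# where the Hessian is indefinite and N-R steps move away from (0, 0).
def grad(x):
    r = 1.0 + x[0]**2 + x[1]**2
    return np.array([2*x[0], 2*x[1]]) / r**2

def hess(x):
    r = 1.0 + x[0]**2 + x[1]**2
    return (2.0 / r**3) * np.array(
        [[1 - 3*x[0]**2 + x[1]**2, -4*x[0]*x[1]],
         [-4*x[0]*x[1],            1 + x[0]**2 - 3*x[1]**2]])

x = np.array([1.0, 0.0])
F0 = hess(x)                                   # diag(-1/2, 1/2): not p.d.
x1 = x - np.linalg.solve(hess(x), grad(x))     # first N-R step: [2, 0]
for _ in range(5):                             # further steps move away
    x = x - np.linalg.solve(hess(x), grad(x))
```

Each step increases |x1|, confirming Remark (3): far from the minimum the raw N-R iteration can diverge.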