Applications of MATLAB in Economics and Management Research

Xi Chen
ibs.bfsu.edu.cn/chenxi

Department of Management Science and Engineering
International Business School
Beijing Foreign Studies University

Xi Chen ([email protected]) MATLAB 333²²²LLL+++nnnïïïÄÄÄ¥¥¥AAA^ 1 / 139

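Prerequisites:

1 Advanced mathematics, linear algebra, probability theory;

2 Operations research, statistics;

3 Familiarity with the basic programming constructs of C or a C-like language.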

¡é

1 c?ïÄ)¶

2 nc?)"

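Goal: be able to use the basic functions and advanced toolboxes provided by MATLAB to analyze research problems in economics and management.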


Øª

1 µ10%

2 ,Lyµ20%

3 Ï"Áµ70%

1 µ2-3g

2 Ï"Áµtake-home


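MATLAB Basics

1 MATLAB basics: matrix and array operations, common mathematical functions, symbolic computation, MATLAB programming, 2-D and 3-D plotting

2 Solving optimization problems: linear programming, integer programming, quadratic programming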


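MATLAB Basics: Matrix and Array Operations

Matrix input

A =
    1 2 3
    4 5 6
    7 8 9

In the MATLAB Command Window, enter

A = [1, 2, 3; 4, 5, 6; 7, 8, 9]

and press Enter to display the result.

Matrix A can also be built by entering the following program code in the MATLAB Editor: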

for i=1:1:3

for j=1:1:3

A(i,j)=(i-1)*3+j;

end

end



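Matrices

In the MATLAB Command Window, enter the following commands in turn:

>> x=[-1.3,sqrt(3),(1+2+3)*4/5]

>> x(5)=abs(x(1))

>> x(4)=abs(x(2))

If

A =
    1 2 3
    4 5 6
    7 8 9

enter the following commands in the MATLAB Command Window in turn: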

>> B=[A;[10,11,12]]

>> C=A(1:2,:)

>> D=A(1:2,1:2)

>> E=A([1,3],[1,3])



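Matrix operations

If A = [1, 2, 3; 4, 5, 6; 7, 8, 9], observe the results of B = A', C = A + B;

If x = [-1, 0, 2], observe the result of y = x - 1;

If A = [1, 2, 3; 4, 5, 6; 7, 8, 9], observe the results of pi*A, A*pi;

If X = [-1, 0, 2], Y = [-2, -1, 1], observe the results of X*Y', X'*Y;

Solve the nonhomogeneous linear system

$$
\begin{cases}
2x_1 + x_2 - 5x_3 + x_4 = 8 \\
x_1 - 3x_2 - 6x_4 = 9 \\
2x_2 - x_3 + 2x_4 = -5 \\
x_1 + 4x_2 - 7x_3 + 6x_4 = 0
\end{cases}
$$

Following the problem statement, build the coefficient matrix A and the right-hand-side vector b in the MATLAB Command Window, and observe the results of det(A), A\b, inv(A)*b, b/A, b*inv(A).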
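The system above can be set up as in the following minimal sketch. It assumes b is stored as a column vector; under that assumption b/A and b*inv(A) raise dimension errors, which is presumably part of what the exercise wants you to observe, so they are left out of the runnable code.

```matlab
% Coefficient matrix and right-hand side of the 4x4 system above
A = [2  1 -5  1;
     1 -3  0 -6;
     0  2 -1  2;
     1  4 -7  6];
b = [8; 9; -5; 0];      % column vector (assumption)

d  = det(A);            % a nonzero determinant means a unique solution
x1 = A \ b;             % left division, the usual way to solve A*x = b
x2 = inv(A) * b;        % explicit inverse, shown only for comparison

disp(d);
disp([x1, x2]);
```

Left division is generally preferred over inv(A)*b because it avoids forming the inverse explicitly.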



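Example 1

$$
\begin{cases}
2x_1 - 4x_2 + 5x_3 = -4 \\
-4x_1 - 2x_2 + 3x_3 = 4 \\
2x_1 + 6x_2 - 8x_3 = 0
\end{cases}
$$

Run in turn: rank(A), rank([A,b]), null(A), pinv(A)*b

Example 2

Use the null space (nullspace, AX = 0) to prove that if A has full column rank, then A^T A is invertible. Then solve:

$$
A = \begin{pmatrix} 1 & 0 & 1 \\ 2 & -1 & 1 \\ -2 & 3 & -2 \\ 0 & 1 & 5 \end{pmatrix}, \qquad
b = \begin{pmatrix} 1 \\ 0 \\ 1 \\ 6 \end{pmatrix}
$$

Run in turn: rank(A), rank([A,b]), A\b, inv(A'*A)*A'*b, pinv(A)*b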



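Array operations

Addition and subtraction;

If x = [1, 2, 3], y = [4, 5, 6], observe the results of x.*y, x.\y, x./y;

If x = [1, 2, 3], y = [4, 5, 6], observe the results of x.^y, x.^2, 2.^x.

Vector subscripts

Observe the results of x = 1:1:5, x = 6:-1:1;

Plot to observe the limit $y = \lim_{x \to 0} \sin x / x$:

>> x=pi/2:-0.00001:0.00001;

>> y=sin(x)./x;

>> plot(x,y,'o')

If A = [1, 2, 3; 4, 5, 6; 7, 8, 9],

1 observe the results of A(1:3,3), A(1:2,2:3), A(:,2);

2 observe the results of A([1,3],[1,3]), A([1,3],:);

3 reduce A to row echelon form using elementary row operations, and compare with the result of rref(A).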



éCþ

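Note case sensitivity, and the use of ";" and "...";

eps, pi, i, inf, nan and other built-in system variables;

Relational operations between scalar and scalar, scalar and matrix, matrix and matrix: if A = [1, 2, 3; 2, 3, 4], B = [1, 2, 2; 2, 2, 3], observe the results of x = (1 < 2), C = (A <= B), C = (A <= 1);

Logical operations between scalar and scalar, scalar and matrix, matrix and matrix: if A = [0, 2, 3; 0, 2, 0], B = [0, 0, 0; 2, 3, 4], observe the results of A&B, A|B, ~A;

Common commands: format, help, quit, save, save filename, save filename x y z, clear, clc, who, whos, ↑. Try computing

$$
x_{n+1} = \frac{1}{2}\left(x_n + \frac{2}{x_n}\right), \quad x_1 = 2, \qquad \sqrt{1 + \sqrt{1 + \sqrt{1 + \cdots}}}
$$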
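Both expressions above are fixed-point iterations, so a short loop is enough to evaluate them. The sketch below is one possible approach; the iteration counts (6 and 60) are arbitrary choices, not taken from the slides.

```matlab
% Iteration x_{n+1} = (x_n + 2/x_n)/2 with x_1 = 2; it converges to sqrt(2)
x = 2;
for n = 1:6
    x = (x + 2/x)/2;
end

% Nested radical sqrt(1 + sqrt(1 + ...)), evaluated as y <- sqrt(1 + y)
y = 0;
for n = 1:60
    y = sqrt(1 + y);
end

fprintf('x = %.15f, y = %.15f\n', x, y);
```

The first limit is sqrt(2); the second satisfies y^2 = 1 + y, so it equals (1 + sqrt(5))/2.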



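Example 3

$$
\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{n-1}\frac{x^n}{n} + \cdots \quad (-1 < x \le 1)
$$

Compute ln 2 using the built-in function;

Approximate ln 2 using the first 20 terms of the expansion;

Approximate ln 2 using the first 3 terms of the series for

$$
\ln\frac{1 + x}{1 - x}
$$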



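Example 4

$$
\arctan x = x - \frac{x^3}{3} + \frac{x^5}{5} - \frac{x^7}{7} + \cdots + (-1)^{n-1}\frac{x^{2n-1}}{2n-1} + \cdots \quad (-1 \le x \le 1)
$$

Approximate π using the first 20 terms;

If π is instead computed with Euler's formula, explain whether the computational efficiency improves;

$$
\frac{\pi}{4} = \arctan\frac{1}{2} + \arctan\frac{1}{3}
$$

If π is instead computed with the following recurrence, explain whether the computational efficiency improves.

$$
a_{n+1} = \frac{\sqrt{a_n} + 1/\sqrt{a_n}}{2}, \quad
b_{n+1} = \frac{\sqrt{a_n}\,(1 + b_n)}{a_n + b_n}, \quad
p_{n+1} = \frac{p_n b_{n+1}(1 + a_{n+1})}{1 + b_{n+1}},
$$

$$
a_0 = \sqrt{2}, \quad b_0 = 0, \quad p_0 = 2 + \sqrt{2}
$$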
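The three approaches in Example 4 can be compared numerically with a sketch like the one below. The helper name atanSeries and the choice of 4 recurrence steps are illustrative assumptions.

```matlab
N = 20;                       % number of series terms
k = 1:N;
% Partial sum of the arctan series with N terms (hypothetical helper)
atanSeries = @(x) sum((-1).^(k-1) .* x.^(2*k-1) ./ (2*k-1));

piSeries = 4*atanSeries(1);                          % arctan(1) = pi/4
piEuler  = 4*(atanSeries(1/2) + atanSeries(1/3));    % Euler's formula

% Recurrence for (a_n, b_n, p_n); b uses the old a, p uses the new a and b
a = sqrt(2); b = 0; p = 2 + sqrt(2);
for n = 1:4
    aNew = (sqrt(a) + 1/sqrt(a))/2;
    b = sqrt(a)*(1 + b)/(a + b);
    a = aNew;
    p = p*b*(1 + a)/(1 + b);
end
piRecur = p;

fprintf('errors: %.2e  %.2e  %.2e\n', ...
    abs(piSeries - pi), abs(piEuler - pi), abs(piRecur - pi));
```

With the same 20 terms, the series at x = 1 converges slowly, while the smaller arguments 1/2 and 1/3 make each term shrink geometrically.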



Iterative methods for solving a square linear system

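We seek an iteration of the form $x_{k+1} = F(x_k)$, where an initial guess $x_0 \in \mathbb{R}^n$ is given and $F$ is simple to compute. Write $A = U + L + D$, where $D$ is the diagonal of $A$ and $L$, $U$ are its strictly lower and strictly upper triangular parts.

Jacobi Iteration

$$
\begin{aligned}
Ax &= b \\
(U + L + D)x &= b \\
Dx &= -(U + L)x + b \\
x &= -D^{-1}(U + L)x + D^{-1}b
\end{aligned}
$$

which suggests the iteration $x_{k+1} = -D^{-1}(U + L)x_k + D^{-1}b$.

Gauss-Seidel Iteration

$$
\begin{aligned}
Ax &= b \\
(U + L + D)x &= b \\
(L + D)x &= -Ux + b \\
x &= -(L + D)^{-1}Ux + (L + D)^{-1}b
\end{aligned}
$$

which suggests the iteration $x_{k+1} = -(L + D)^{-1}Ux_k + (L + D)^{-1}b$.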



Example 5

Use Jacobi and Gauss-Seidel iteration to solve the following system of linear equations and compare their computational efficiency.

$$
\begin{cases}
7x_1 + x_2 + 2x_3 = 10 \\
x_1 + 8x_2 + 2x_3 = 8 \\
2x_1 + 2x_2 + 9x_3 = 6
\end{cases}
$$
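One way to carry out Example 5 is sketched below. The zero starting vector, the tolerance 1e-10 and the iteration cap are assumptions made for illustration; efficiency is compared simply by counting iterations.

```matlab
A = [7 1 2; 1 8 2; 2 2 9];
b = [10; 8; 6];
D = diag(diag(A));      % diagonal part
L = tril(A, -1);        % strictly lower triangular part
U = triu(A, 1);         % strictly upper triangular part
tol = 1e-10; maxIter = 500;

% Jacobi: D*x_{k+1} = b - (L + U)*x_k
xJ = zeros(3, 1);
for kJ = 1:maxIter
    xNew = D \ (b - (L + U)*xJ);
    done = norm(xNew - xJ, inf) < tol;
    xJ = xNew;
    if done, break; end
end

% Gauss-Seidel: (L + D)*x_{k+1} = b - U*x_k
xG = zeros(3, 1);
for kG = 1:maxIter
    xNew = (L + D) \ (b - U*xG);
    done = norm(xNew - xG, inf) < tol;
    xG = xNew;
    if done, break; end
end

fprintf('Jacobi: %d iterations, Gauss-Seidel: %d iterations\n', kJ, kG);
```

The matrix is strictly diagonally dominant, so both iterations converge; Gauss-Seidel needs fewer iterations here because it uses updated components as soon as they are available.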

Theorem 6 (Weierstrass Approximation Theorem)

If $f(x)$ is a continuous function on $[a, b]$, then for every $\varepsilon > 0$ there exists a polynomial $p(x)$ such that

$$
|f(x) - p(x)| \le \varepsilon
$$

for every $x \in [a, b]$.


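MATLAB Basics: Common Mathematical Functions

Data statistics

Maximum and minimum of vectors and matrices, max, min: if x = [9, 4, 7, 8, 3], observe the results of max(x), min(x), [y, i] = max(x), [y, i] = min(x); if A = [1, 5, 3; 7, 2, 6; 9, 3, 8], observe the results of max(A), max(A, [], 1), max(A, [], 2).

Sum and product, sum, prod;

Mean and standard deviation, mean, std(A, flag, dim)

$$
\sigma_1 = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}
\quad\text{or}\quad
\sigma_2 = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}
$$

Correlation coefficient, corrcoef

$$
r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\cdot\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
$$

Sorting, sort.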


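Polynomials

Representation: $p(x) = a_1x^n + a_2x^{n-1} + \cdots + a_nx + a_{n+1}$ is represented by the (n+1)-dimensional vector p = [a1, a2, ..., an, an+1];

Addition and subtraction: pad the coefficient vector of the lower-degree polynomial with zeros for the missing higher-order terms;

Multiplication and division, conv(p1,p2), [q,r]=deconv(p1,p2): if p1 = [1, 8, 0, 0, -10], p2 = [2, -1, 3], observe the results of conv(p1,p2), [q,r]=deconv(p1,p2);

Differentiation, polyder(p), polyder(p1,p2), [p,q]=polyder(p1,p2): if p1 = [1, -1], p2 = [1, -1, 3], observe the results of p=polyder(p1), p=polyder(p1,p2), [p,q]=polyder(p1,p2);

Evaluation and root finding, polyval(p,x), roots, fzero: if p=[1,-6,11,-6], x=1:4, observe the results of polyval(p,x), roots(p), fzero('x-10^x+2',0.5).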
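As a quick check of the polynomial functions listed above, the following sketch runs them on the vectors from the exercises. The variable names (prodP, check, vals, rts) are illustrative.

```matlab
p1 = [1 8 0 0 -10];          % x^4 + 8x^3 - 10
p2 = [2 -1 3];               % 2x^2 - x + 3

prodP  = conv(p1, p2);       % coefficients of the product p1*p2
[q, r] = deconv(p1, p2);     % quotient q and remainder r of p1/p2
check  = conv(p2, q) + r;    % should reproduce p1

p    = [1 -6 11 -6];         % (x - 1)(x - 2)(x - 3)
vals = polyval(p, 1:4);      % values of p at x = 1, 2, 3, 4
rts  = roots(p);             % roots of p

disp(prodP); disp(vals); disp(rts.');
```

The identity p1 = conv(p2, q) + r is a convenient way to confirm the output of deconv.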

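Example 7

Let the mortgage payment formula be

$$
M_p = \frac{rL/12}{1 - \dfrac{1}{(1 + r/12)^{12y}}}
$$

where L is the loan amount, M_p is the monthly payment (in the same monetary unit as L), r is the annual interest rate, and y is the loan term in years.

Given a 20-year loan of 17 with a monthly payment of 0.125 (same monetary unit), find the annual interest rate.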
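A possible way to solve Example 7 numerically is to treat the payment formula as an equation in r and call fzero on a bracketing interval. The interval [0.01, 0.2] and the helper name payment are assumptions made for this sketch.

```matlab
L  = 17;        % loan amount
Mp = 0.125;     % monthly payment, same monetary unit as L
y  = 20;        % loan term in years

% Monthly payment as a function of the annual rate r (hypothetical helper)
payment = @(r) (r*L/12) ./ (1 - (1 + r/12).^(-12*y));

% The payment is below Mp at r = 0.01 and above it at r = 0.2,
% so the residual changes sign on this interval
r = fzero(@(r) payment(r) - Mp, [0.01, 0.2]);

fprintf('annual interest rate: %.4f%%\n', 100*r);
```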



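Extrema of functions

Extrema of a function of one variable, fminbnd('file',x1,x2): observe the result of fminbnd('x^3-2*x-5',0,5);

Extrema of a function of several variables, fminsearch('file',x0): find the minimum of

$$
f(x, y, z) = x + \frac{y^2}{4x} + \frac{z^2}{y} + \frac{2}{z}
$$

near (0.5, 0.5, 0.5).

Create the function file f.m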

function w=f(p)

x=p(1);

y=p(2);

z=p(3);

w=x+y^2/(4*x)+z^2/y+2/z;

In the Command Window, call the fminsearch function:

w=fminsearch('f',[0.5,0.5,0.5])


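MATLAB Basics: Symbolic Computation

Creating symbolic objects

Create symbolic variables and symbolic constants with sym, syms

syms a b;

a=sym('3');

b=sym('4');

x=3;

y=4;

a*a+b*b

x*x+y*y

cos((a+b)^2)

cos((x+y)^2)

Create symbolic expressions using ' ', sym, syms;

Determine the variables in a symbolic expression with findsym; symbolic constants do not appear in the result.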



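Operations on symbolic expressions

Addition, subtraction, multiplication, division and related operations;

Example 8

Let f = 2x^2 + 3x - 5, g = x^2 - x + 7, and observe the results.

Factorization and expansion, factor, expand;

Example 9

Let s = x^3 - 6x^2 + 11x - 6, and observe the results.

Conversion between symbolic and numeric expressions, eval;

Example 10

Let x = '(1 + sqrt(5))/2', and observe the result.

Simplifying symbolic expressions, simple, simplify, pretty.

Example 11

Let s = (x^3 - 4x^2 + 16x)/(x^3 + 64), and observe the results.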



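Symbolic calculus

Symbolic limits, limit(f,x,a), limit(f,x,a,'left'), limit(f,x,a,'right');

Example 12

$$
\lim_{x \to 0}(1 + x)^{3t/x}, \quad
\lim_{x \to 0^-}\left(e^{1/x} - 1\right), \quad
\lim_{x \to 0^+}\left(e^{1/x} - 1\right), \quad
\lim_{x \to 0}\frac{|x|}{x}
$$

Symbolic derivatives, diff(s,x), diff(s,x,n);

Example 13

$$
y = e^{-ax^2} + x,\ y'; \qquad y = \cos(x^2),\ y''
$$

Symbolic integration, int(s,x), int(s,x,a,b); the limits of integration may be expressions;

Example 14

$$
\int_0^1 \sqrt{1 - x^2}\,dx, \quad
\int_0^{\sin x} xt\,dt, \quad
\int_0^{\infty} e^{-x^2}\,dx
$$

Symbolic series, symsum(s,x,n,m);

Example 15

$$
\sum_{n=1}^{\infty}\frac{1}{n^2}, \quad
\sum_{n=1}^{\infty}(-1)^{n-1}\frac{1}{2n - 1}
$$

Taylor expansion of a function, taylor(f,x,a,Name,Value);

Example 16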

(1). ¦¼êy = tan(x)3x = 0?VÐmªcl¶

(2). ¦¼êy = ln(x)3x = 1?VÐmªcÊ"

Symbolic equations, [x1,...,xn]=solve(s1,...,sn,'x1',...,'xn');

Example 17

$$
\begin{cases}
x + ay + a^2z = a^3 \\
x + by + b^2z = b^3 \\
x + cy + c^2z = c^3
\end{cases}
$$

where a, b, c are known, pairwise distinct real numbers.



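Example 18 (Symbolic differentiation)

The marginal utility of consumption is the derivative of the utility function with respect to the quantity consumed. For the utility functions

$$
u_1(x) = \ln x \quad\text{and}\quad u_2(x) = \frac{x^{1-a}}{1 - a},
$$

find the marginal utility $u'(x)$. The elasticity of the marginal utility $u'(x)$ is defined as

$$
\sigma(x) = -\frac{u''(x)}{u'(x)}\,x.
$$

Compute the elasticity of marginal utility for the utility functions $u_1(x)$ and $u_2(x)$.

Example 19

Using the steps rank([A,b]) = rank(A); A\b; null(A); general solution and particular solution, solve the following linear system

$$
\begin{cases}
ax_1 - bx_2 + bx_3 = c \\
bx_1 + ax_2 + ax_3 = d
\end{cases}
$$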



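Example 20 (Symbolic integration)

Given that q = 4 when p = 1, find the demand function q(p) whose price elasticity of demand satisfies

$$
\varepsilon(p) = -\frac{p}{q}\frac{dq}{dp} = \frac{2p^2}{p^2 + 1}.
$$

Example 21 (Jacobian matrix)

The Jacobian matrix of a vector function $f = (f_1, f_2, \ldots, f_m)$ with respect to the vector $v = (x_1, x_2, \ldots, x_n)$ is

$$
J = \frac{\partial f}{\partial v} =
\begin{pmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \dfrac{\partial f_m}{\partial x_2} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{pmatrix}.
$$

For f = (ax + by, by) and f = (xyz, y, x + z), use the jacobian function to compute the Jacobian matrix.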



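Solving algebraic equations and systems of equations with the solve command

Example 22 (Roots of a quadratic equation)

Find the two zeros of f = ax^2 + bx + c (a ≠ 0).

Example 23 (Solving a system of equations)

For the system

$$
\begin{cases}
10x + 12y + 16t = 0 \\
5x - y - 13t = 0
\end{cases}
$$

solve for the variables t and y in terms of x;

If t and y are not specified, observe the result.

Example 24 (Variable substitution)

Given the two equations 4x^2 + y^2 = 4 and x^2 + 4y^2 = 4a, find the symbolic solution of the equations with the parameter, and use the subs function to obtain the numerical solution for a = 1.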


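MATLAB Basics: MATLAB Programming

Program structure

Sequential structure: data input, computation, data output;

Selection structure: the if statement has three forms

(1) Single branch

if condition
    statements
end

(2) Two-way selection

if condition
    statements 1
else
    statements 2
end

(3) Multi-branch (nested) structure

if condition 1
    statements 1
elseif condition 2
    statements 2
...
elseif condition m
    statements m
else
    statements n
end

Example 25

$$
y = \begin{cases}
\dfrac{\sin x}{x}, & x \ne 0, \\
1, & x = 0.
\end{cases}
$$

The switch statement

Multi-way selection structure

switch expression
case expression 1
    statements 1
case expression 2
    statements 2
...
case expression m
    statements m
otherwise
    statements m+1
end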

Ìé§for, while, ^þö)¤.Ý¶

break Ú continue é§aÑÌćÌ"



Polynomial Interpolation

Suppose we have a set of ordered points $x_0, x_1, \ldots, x_n$ and either a continuous function $f(x)$ defined on $[x_0, x_n]$ or a set of y-values $y_0, y_1, \ldots, y_n$ corresponding to the x-values.

The polynomial interpolation problem is to find a polynomial $p(x)$ of degree at most $n$ that interpolates the data $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$, that is,

$$
y_0 = p(x_0), \quad y_1 = p(x_1), \quad \ldots, \quad y_n = p(x_n).
$$

We then say that $p$ interpolates $f$ (or the data) at $x_0, x_1, \ldots, x_n$, and that $p$ is an interpolant.



Natural Form

Let $x_0 < x_1 < \cdots < x_n$ and $y_0, y_1, \ldots, y_n$ be given, and let

$$
p(x) = \sum_{i=0}^{n} a_i x^i = a_0 + a_1x + a_2x^2 + \cdots + a_nx^n
$$

represent the interpolant. We represent the interpolation requirements as

$$
\begin{aligned}
a_0 + a_1x_0 + a_2x_0^2 + \cdots + a_nx_0^n &= y_0 \\
a_0 + a_1x_1 + a_2x_1^2 + \cdots + a_nx_1^n &= y_1 \\
&\ \,\vdots \\
a_0 + a_1x_n + a_2x_n^2 + \cdots + a_nx_n^n &= y_n
\end{aligned}
\quad\Longrightarrow\quad
\begin{pmatrix}
1 & x_0 & \cdots & x_0^n \\
1 & x_1 & \cdots & x_1^n \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_n & \cdots & x_n^n
\end{pmatrix} a = y
$$

Nevertheless, we may need to solve a highly ill-conditioned linear system, and handle miscellaneous problems such as roundoff error, cancellation and the Runge phenomenon.



Example 26

Suppose we wish to interpolate a polynomial to the data points $(0, 0), (\pi/2, 1), (\pi, 0), (3\pi/2, -1)$ from the sine curve.

$$
p(x) = 0.0860x^3 - 0.8106x^2 + 1.6977x.
$$

Plot $p(x)$ and $\sin x$ in the same figure.
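The coefficients in Example 26 can be reproduced by solving the Vandermonde system of the natural form directly. This sketch uses MATLAB's vander, whose columns are ordered from the highest power down, so the coefficient vector matches the convention of polyval.

```matlab
x = [0; pi/2; pi; 3*pi/2];      % interpolation nodes
y = [0; 1; 0; -1];              % values of sin at the nodes

V = vander(x);                  % columns: x.^3, x.^2, x, 1
a = V \ y;                      % coefficients, highest power first

xx = linspace(0, 3*pi/2, 200);
plot(xx, polyval(a.', xx), '-', xx, sin(xx), '--', x, y, 'o');
legend('p(x)', 'sin x', 'data');
```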

Polynomial interpolants of high degree tend to be oscillatory, that is, they often must make large twists and turns to go through the nodes.

Example 27 (Runge Phenomenon)

Plot the Runge function

$$
f(x) = \frac{1}{1 + 25x^2}
$$

with a polynomial interpolant of degree 10 (11 points -1 : .2 : 1).



Figure: Runge’s phenomenon

The red curve is the Runge function.

The blue curve is a 5th-order interpolating polynomial (using six equally spaced interpolating points).

The green curve is a 9th-order interpolating polynomial (using ten equally spaced interpolating points).
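A minimal sketch of Example 27 follows. It assumes polyfit with degree 10 on 11 points is an acceptable way to obtain the interpolant (with as many coefficients as points, the fit passes through every node).

```matlab
f  = @(x) 1 ./ (1 + 25*x.^2);   % Runge function
xn = -1:0.2:1;                  % 11 equally spaced nodes
p  = polyfit(xn, f(xn), 10);    % degree-10 interpolating polynomial

xx  = linspace(-1, 1, 401);
err = max(abs(polyval(p, xx) - f(xx)));   % largest deviation on [-1, 1]

plot(xx, f(xx), 'r-', xx, polyval(p, xx), 'b-', xn, f(xn), 'ko');
legend('Runge function', 'degree-10 interpolant', 'nodes');
fprintf('max error on [-1, 1]: %.4f\n', err);
```

The error is largest near the ends of the interval, which is the oscillation the figure illustrates.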



Lagrange Form

Suppose we wish to interpolate a line to two distinct points $x_a$, $x_b$. Define

$$
L_a(x) = \frac{x - x_b}{x_a - x_b}, \qquad L_b(x) = \frac{x - x_a}{x_b - x_a}.
$$

Then we have

$$
L_a(x_a) = 1, \quad L_a(x_b) = 0, \quad L_b(x_a) = 0, \quad L_b(x_b) = 1,
$$

and both $L_a(x)$ and $L_b(x)$ are linear functions of $x$.

Given $y_a = f(x_a)$ and $y_b = f(x_b)$, we see that

$$
\ell(x) = y_aL_a(x) + y_bL_b(x)
$$

has the desired properties and interpolates $f$ at $(x_a, y_a)$ and $(x_b, y_b)$.



Given an arbitrary list of $n + 1$ distinct ordered nodes $x_0 < x_1 < \cdots < x_n$, define the functions

$$
L_{n,i}(x) = \frac{(x - x_0)(x - x_1)\cdots(x - x_{i-1})(x - x_{i+1})\cdots(x - x_n)}{(x_i - x_0)(x_i - x_1)\cdots(x_i - x_{i-1})(x_i - x_{i+1})\cdots(x_i - x_n)}
= \prod_{k=0,\,k \ne i}^{n}\frac{x - x_k}{x_i - x_k}, \quad i = 0, 1, \ldots, n.
$$

Lagrange polynomials are polynomials of precise degree $n$ with the property that $L_{n,i}(x_j) = 0$ if $j \ne i$, and $L_{n,i}(x_i) = 1$.

Since a sum of polynomials of degree $n$ is a polynomial of degree at most $n$, given $(x_0, y_0), (x_1, y_1), \ldots, (x_n, y_n)$, we have

$$
p(x) = y_0L_{n,0}(x) + y_1L_{n,1}(x) + \cdots + y_nL_{n,n}(x) = \sum_{i=0}^{n} y_iL_{n,i}(x).
$$



Example 28 (Lagrange Polynomials)

Suppose we wish to interpolate a polynomial to the data points $(0, 0), (\pi/2, 1), (\pi, 0), (3\pi/2, -1)$ from the sine curve.

$$
p_1(x) = 0.0860x^3 - 0.8106x^2 + 1.6977x \quad \text{(natural form, how?)},
$$

$$
p_2(x) = 0.0860x^3 - 0.8106x^2 + 1.6977x \quad \text{(Lagrange form, how?)}.
$$

The two forms agree with each other.

What is the difference between

$$
p(x) = \sum_{i=0}^{n} a_ix^i \quad\text{and}\quad p(x) = \sum_{i=0}^{n} y_iL_{n,i}(x)\,?
$$

The natural form is easy to evaluate but hard to find.

The Lagrange form is easy to find but hard to evaluate.
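To see the agreement of the two forms numerically, the Lagrange form can be evaluated directly from its definition and compared with the natural form. In this sketch the evaluation points xq are an arbitrary choice, and polyfit is used as a stand-in for the natural-form coefficients.

```matlab
xn = [0, pi/2, pi, 3*pi/2];          % nodes
yn = [0, 1, 0, -1];                  % data values
xq = linspace(0, 3*pi/2, 7);         % evaluation points (include the nodes)

pq = zeros(size(xq));
for i = 1:numel(xn)
    Li = ones(size(xq));             % builds L_{n,i}(xq) as a product
    for k = [1:i-1, i+1:numel(xn)]   % all k different from i
        Li = Li .* (xq - xn(k)) / (xn(i) - xn(k));
    end
    pq = pq + yn(i) * Li;            % accumulate y_i * L_{n,i}
end

pNatural = polyval(polyfit(xn, yn, 3), xq);   % natural form at the same points
disp(max(abs(pq - pNatural)));
```

No linear system is solved in the Lagrange evaluation, but each evaluation point costs a double loop over the nodes, which reflects the trade-off stated above.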



Example 29 (Newton’s method)

Suppose that $f(\lambda) = -a_0 + a_1\lambda + a_2\lambda^2 + \cdots + a_n\lambda^n$, where $n > 1$ and $a_i > 0$, $i = 0, 1, \ldots, n$. Here is an iterative technique that generates a sequence $\lambda_0, \lambda_1, \ldots, \lambda_k, \ldots$ of estimates that converges to the root $\lambda > 0$ solving $f(\lambda) = 0$. Start with any $\lambda_0 > 0$ close to the solution and use

$$
f'(\lambda_k) = a_1 + 2a_2\lambda_k + 3a_3\lambda_k^2 + \cdots + na_n\lambda_k^{n-1}, \qquad
\lambda_{k+1} = \lambda_k - \frac{f(\lambda_k)}{f'(\lambda_k)}.
$$

Locate the root of $f(\lambda) = -1 + \lambda + \lambda^2$ in $[0, 1]$ with $\lambda_0 = 1$.
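Newton's iteration for this example takes only a few lines. The stopping tolerance and the cap of 50 iterations below are assumptions, not part of the example.

```matlab
f  = @(lam) -1 + lam + lam.^2;   % f(lambda)
df = @(lam) 1 + 2*lam;           % f'(lambda)

lam = 1;                         % starting point lambda_0 = 1
for k = 1:50
    step = f(lam) / df(lam);
    lam  = lam - step;           % lambda_{k+1} = lambda_k - f/f'
    if abs(step) < 1e-14
        break;
    end
end

fprintf('root = %.15f after %d iterations\n', lam, k);
```

The positive root of $-1 + \lambda + \lambda^2 = 0$ is $(\sqrt{5} - 1)/2$, which gives an independent check.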



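Example 30

Conjecture: take any natural number greater than 1; if it is even, divide it by 2; if it is odd, multiply it by 3 and add 1; repeat the operation. Question: is there a natural number for which this process never terminates?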
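The rule in Example 30 can be explored with a while loop that counts steps until 1 is reached. The starting value 27 is an arbitrary choice for illustration.

```matlab
n = 27;            % starting value (arbitrary choice)
steps = 0;
while n ~= 1
    if mod(n, 2) == 0
        n = n / 2;         % even: halve
    else
        n = 3*n + 1;       % odd: triple and add one
    end
    steps = steps + 1;
end
fprintf('reached 1 after %d steps\n', steps);
```

The loop ends only if the sequence reaches 1, so for an untested starting value a maximum step count would be a sensible safeguard.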

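Example 31

Merge sort is an efficient sorting algorithm built on the merge operation. The steps of the merge are:

(1). Allocate space whose size is the sum of the sizes of the two sorted sequences;

(2). Set two pointers, initially at the starting positions of the two sorted sequences;

(3). Compare the elements the two pointers point to, place the smaller one into the merged space, and move that pointer to the next position;

(4). Repeat step 3 until one pointer reaches the end of its sequence;

(5). Copy all remaining elements of the other sequence directly to the end of the merged sequence.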



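Function files

Basic structure

function [a1,...,am]=fname(b1,...,bn)

Comment (help) section, followed by the statements of the function body

function [s,c]=circle(r)
% CIRCLE Compute the area and circumference of a circle of radius r
% [s,c]=circle(r)
% r: radius, s: area, c: circumference
s=pi*r*r;
c=2*pi*r;

In the Command Window, call [s,c]=circle(10) and help circle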



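Nested (recursive) function calls

Example 32

The Fibonacci numbers are defined as follows:

$$
f_n = f_{n-1} + f_{n-2}, \quad f_2 = 1, \quad f_1 = 1.
$$

Find the Fibonacci number for any n ≥ 2.

function f=fib(n)
% FIB Function that computes Fibonacci numbers
% f=fib(n)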

if n>2

f=fib(n-1)+fib(n-2);

else

f=1;

end



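Example 33

(1). Using the definition of the definite integral, compute

$$
I = \int_0^1 x^2\,dx
$$

with the rectangle rule and with the trapezoid rule.

(2). Use bisection to find an approximate root of the equation $x^3 - 4x^2 + 1 = 0$ on $[0, 1]$.

(3). Use the series expansion

$$
\sin x = \lim_{n \to \infty} S_n = \sum_{n=1}^{\infty} a_n = \sum_{n=1}^{\infty}(-1)^{n-1}\frac{x^{2n-1}}{(2n-1)!}
= x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots
$$

to compute an approximate value of $\sin x$ with error smaller than $10^{-8}$, using

$$
\sin x = \lim_{n \to \infty} S_n, \qquad |\sin x - S_n| \le |a_{n+1}|.
$$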
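The three parts of Example 33 can be sketched as follows. The number of subintervals (1000), the bisection tolerance, and the test point x0 = 1 for the sine series are assumptions chosen for illustration.

```matlab
% (1) Rectangle and trapezoid rules for the integral of x^2 on [0, 1]
n = 1000; h = 1/n;
x = 0:h:1;
fx = x.^2;
Irect = h * sum(fx(1:end-1));                   % left-endpoint rectangles
Itrap = h * (sum(fx) - (fx(1) + fx(end))/2);    % trapezoids

% (2) Bisection for x^3 - 4x^2 + 1 = 0 on [0, 1]
g = @(x) x.^3 - 4*x.^2 + 1;
lo = 0; hi = 1;                                 % g(0) > 0 > g(1)
while hi - lo > 1e-10
    mid = (lo + hi)/2;
    if g(lo)*g(mid) <= 0
        hi = mid;
    else
        lo = mid;
    end
end
root = (lo + hi)/2;

% (3) Partial sums of the sine series, stopped when the next term is < 1e-8
x0 = 1; term = x0; S = 0; m = 1;
while abs(term) >= 1e-8
    S = S + term;
    term = -term * x0^2 / ((2*m)*(2*m + 1));    % a_{m+1} from a_m
    m = m + 1;
end

fprintf('%.8f %.8f %.8f %.10f\n', Irect, Itrap, root, S);
```

The stopping rule in part (3) relies on the alternating-series bound quoted above: the error of a partial sum is at most the first omitted term.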



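Example 34

The eight queens problem is a problem set on a chessboard: how can eight queens be placed on an 8 × 8 chessboard so that no queen can capture any other queen? To achieve this, no two queens may share the same row, column, or diagonal.

Requirement: implement the solution using the stack data structure; output the final result in matrix form.

Example 35

A knight is placed at random on a square of the 8 × 8 chessboard and moves according to the rules of chess. Each square may be entered only once, and the knight must visit all 64 squares of the board. Write a program that finds the knight's route.

Requirement: implement the solution using the stack data structure; fill the numbers 1, 2, ..., 64 into the 8 × 8 square array in the order visited, and output the final result in matrix form.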


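MATLAB Basics: 2-D and 3-D Plotting

Plotting a single curve

Example 36

Use a plot to illustrate that

$$
\lim_{x \to 0^+}\frac{\tan x - \sin x}{x^3} = \frac{1}{2}
$$

x=1:-0.00005:0.001;

y=(tan(x)-sin(x))./(x.^3);

plot(x,y,'o')

Example 37

Use an animation to demonstrate the approximation of a circle by inscribed regular polygons (close; axis square; fill; pause). Plot the curve r = cos(2θ) in polar coordinates (polar(theta,rho,option)).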



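function myfun(n)
% Animate regular polygons with 3..n sides inscribed in the unit circle.
% Note: "clear all" is not used here, because inside a function it would
% erase the input argument n.
close;
t=0:.005:2*pi;
x=cos(t);
y=sin(t);
for side=3:n
    plot(x, y, '*');
    axis square;
    hold on;
    % vertex angles of the regular polygon with "side" sides
    theta=(2*pi/side)*(0:side-1);
    fill(cos(theta), sin(theta), 'r');
    pause(1);
    hold off;
end
end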



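Plotting multiple curves

Example 38

Plot the two curves

$$
y = \sin\left(t + \frac{\pi}{4}\right) \quad\text{and}\quad y = 2\sin\left(2t + \frac{\pi}{8}\right)
$$

in the same coordinate system.

t=-4*pi:0.005:4*pi;

y=[sin(t+pi/4);2*sin(2*t+pi/8)];

plot(t,y)

To keep the current graph, use hold on/off; for example, with the same t as in the example above: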

y1=sin(t+pi/4);

plot(t,y1);

hold on;

y2=2*sin(2*t+pi/8);

plot(t,y2,'r');

hold off;



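Plot formatting

Example 39

In the same coordinate system, plot the functions

$$
y = 0.2e^{-0.5x}\cos(4\pi x) \quad\text{and}\quad y = 2e^{-0.5x}\cos(\pi x)
$$

using different line styles and colors, and mark the intersection points of the two curves.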

x=0:2*pi/1000:2*pi;

y1=0.2*exp(-0.5*x).*cos(4*pi*x);

y2=2*exp(-0.5*x).*cos(pi*x);

k=find(abs(y1-y2)<1e-2);

x3=x(k);

y3=0.2*exp(-0.5*x3).*cos(4*pi*x3);

plot(x,y1,x,y2,'k:',x3,y3,'ro');

Graph annotation: title, xlabel, ylabel, zlabel, text(x,y,description), axis.

Plotting implicit functions: ezplot;

subplot(2,2,1);

ezplot('sin(x)');

subplot(2,2,2);

ezplot('x^2+y^2-1',[-1,1],[-1,1]);

subplot(2,2,3);

ezplot('x^3+y^3-5*x*y');

subplot(2,2,4);

ezplot('2*cos(t)','sin(t)',[0,2*pi]);



3-D graphics

3-D curves: plot3

Example 40

Plot the helix

$$
x = \cos t, \quad y = \sin t, \quad z = t \qquad (0 \le t \le 4\pi)
$$

t=0:0.001:4*pi;

x=cos(t);

y=sin(t);

z=t;

plot3(x,y,z);

grid on;



n¡µnk²¡§2m"

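Example 41

Plot the plane z = x + y and the paraboloid of revolution z = x^2 + y^2.

x=0:1:2;
y=0:1:2;
[x,y]=meshgrid(x,y);
z=x+y;
mesh(x,y,z);
axis([0,2,0,2,0,4]);   % set the limits after plotting so that mesh does not reset them

figure;                % open a new figure so the plane is not overwritten
x=-8:0.1:8;
y=-8:0.1:8;
[x,y]=meshgrid(x,y);
z=x.^2+y.^2;
surf(x,y,z);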



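Example 42

Give a graphical demonstration of the numerical experiment in which the areas of inscribed and circumscribed regular polygons approximate the area of a circle.

$$
s_n = \frac{1}{2}r^2n\sin\frac{2\pi}{n}, \qquad S_n = r^2n\tan\frac{\pi}{n}, \qquad (n > 2).
$$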

Example 43

y = x(1− x)3[0, 1]þ=Ä§lAÛþ`²Y²35"

Example 44

Given the function

$$
z = \frac{1}{1 + D^{2n}(x, y)/D_0}, \qquad D(x, y) = \sqrt{(x - x_0)^2 + (y - y_0)^2},
$$

with $D_0 = 200$, $n = 2$, $x_0 = y_0 = 16$, draw the 3-D graph using mesh and surf.


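Optimization: Linear Programming

Standard form of a linear programming problem

$$
\begin{aligned}
\max\ & f(x) = c_1x_1 + c_2x_2 + \cdots + c_nx_n \\
\text{s.t. } & a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n = b_1 \\
& a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n = b_2 \\
& \cdots \\
& a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n = b_m \\
& x_i \ge 0, \quad i = 1, 2, \ldots, n
\end{aligned}
$$

or, in matrix form, $\max f(x) = c^Tx$ s.t. $Ax = b$, $x \ge 0$.

Example 2.1

Convert the following linear programming model to standard form

$$
\begin{aligned}
\min\ & x_1 - 2x_2 + x_3 \\
\text{s.t. } & x_1 + x_2 + x_3 \le 3 \\
& x_1 + x_2 - 2x_3 \ge 1 \\
& -x_1 + 2x_2 + 3x_3 = 4 \\
& x_1 \ge 0,\ x_2 \ge 0.
\end{aligned}
$$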
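One possible standardization of Example 2.1 is worked out below. The names $x_3'$, $x_3''$ (splitting the free variable $x_3$), $x_4$ (slack) and $x_5$ (surplus) are introduced here for illustration and do not appear in the slides.

$$
\begin{aligned}
\max\ & -x_1 + 2x_2 - x_3' + x_3'' \\
\text{s.t. } & x_1 + x_2 + x_3' - x_3'' + x_4 = 3 \\
& x_1 + x_2 - 2x_3' + 2x_3'' - x_5 = 1 \\
& -x_1 + 2x_2 + 3x_3' - 3x_3'' = 4 \\
& x_1, x_2, x_3', x_3'', x_4, x_5 \ge 0
\end{aligned}
$$

Here $x_3 = x_3' - x_3''$, and minimizing the original objective is the same as maximizing its negative, so the optimal values differ only in sign.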


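Concepts of solutions in linear programming

Basic solution: choose a nonsingular matrix B from the columns of A so that $Bx_B + Nx_N = b$ (with $x_N = 0$);

Feasible solution and feasible region: all solutions satisfying $D_1 = \{x \mid Ax = b,\ x \ge 0\}$; basic feasible solution: $D = \{x \mid \text{basic solution}\} \cap \{x \mid \text{feasible solution}\}$; optimal solution: a basic feasible solution that maximizes the objective function.

Example 2.2

$$
\begin{aligned}
\max\ & -x_1 + 2x_2 - x_3 \\
\text{s.t. } & x_1 + x_2 + x_3 = 3 \\
& x_1 - x_2 - 2x_3 = 1, \quad x_1, x_2, x_3 \ge 0
\end{aligned}
$$

$$
A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & -1 & -2 \end{pmatrix}
$$

Then

$$
B_1 = \begin{pmatrix} P_1 & P_2 \end{pmatrix} \Rightarrow x_{B_1} = \begin{pmatrix} 2 & 1 & 0 \end{pmatrix}, \qquad
B_2 = \begin{pmatrix} P_2 & P_3 \end{pmatrix} \Rightarrow x_{B_2} = \begin{pmatrix} 0 & 7 & -4 \end{pmatrix}, \qquad
B_3 \Rightarrow x_{B_3} = \ ?
$$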
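The basic solutions of Example 2.2 can be enumerated by looping over all pairs of columns of A. This is a minimal sketch; the names idx, X and feasible are illustrative, and the tolerance 1e-12 is an assumption.

```matlab
A = [1  1  1;
     1 -1 -2];
b = [3; 1];

idx = nchoosek(1:3, 2);            % all choices of 2 basic columns
X = zeros(3, size(idx, 1));        % one basic solution per column of X
for k = 1:size(idx, 1)
    B = A(:, idx(k, :));           % candidate basis matrix
    if abs(det(B)) > 1e-12         % keep only nonsingular bases
        X(idx(k, :), k) = B \ b;   % basic variables; nonbasic stay 0
    end
end
feasible = all(X >= -1e-12, 1);    % basic feasible solutions have x >= 0

disp(idx); disp(X); disp(feasible);
```

Each column of X is a basic solution; the feasible flag shows which of them are basic feasible solutions.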


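Solving linear programming problems in MATLAB

MATLAB standard form of a linear programming problem

$$
\min c^Tx \quad \text{s.t. } Ax \le b,\ A_{eq}x = b_{eq},\ lb \le x \le ub
$$

Example 2.3

Write down c, A, b, Aeq, beq, lb, ub for the following linear programming model

$$
\begin{aligned}
\max\ & 4x_1 - 2x_2 + x_3 \\
\text{s.t. } & 2x_1 - x_2 + x_3 \le 12 \\
& -8x_1 + 2x_2 - 2x_3 \ge 8 \\
& -2x_1 + x_3 = 3 \\
& x_1 + x_2 = 7 \\
& x_1, x_2, x_3 \ge 0
\end{aligned}
$$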


Calling the MATLAB function for linear programming

x = linprog(c, A, b)

x = linprog(c, A, b, Aeq, beq)

x = linprog(c, A, b, Aeq, beq, lb, ub)

x = linprog(c, A, b, Aeq, beq, lb, ub, x0)

x = linprog(c, A, b, Aeq, beq, lb, ub, x0, options)

[x, fval] = linprog(...)

[x, fval, exitflag] = linprog(...)

[x, fval, exitflag, output] = linprog(...)

[x, fval, exitflag, output, lambda] = linprog(...)

Example 2.4

Use the commands above to solve the linear programming model of the previous example.
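A sketch of Example 2.4 follows. It assumes the Optimization Toolbox is installed. Because linprog minimizes, the objective is negated, and the "≥" constraint is multiplied by -1 to fit the "A*x <= b" form.

```matlab
c   = -[4; -2; 1];          % maximize 4x1 - 2x2 + x3  <=>  minimize its negative
A   = [2 -1  1;             %  2x1 -  x2 +  x3 <= 12
       8 -2  2];            % -8x1 + 2x2 - 2x3 >= 8, multiplied by -1
b   = [12; -8];
Aeq = [-2 0 1;              % -2x1 + x3 = 3
        1 1 0];             %   x1 + x2 = 7
beq = [3; 7];
lb  = zeros(3, 1);          % x >= 0

[x, fval, exitflag] = linprog(c, A, b, Aeq, beq, lb, []);
maxValue = -fval;           % undo the sign change of the objective
disp(x.'); disp(maxValue);
```

Substituting the two equalities into the inequalities shows that x1 is forced to 0, so the feasible region is the single point (0, 7, 3).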



55y¢~

Example 2.5

,óI)AÚBü«¬±÷v½|I¦"ùü«¬)þI²Lüó²6§"z)1ú6A¬31Ú1ó²6§Ñ©O4Ú6¶z)1ú6B¬331Ú1ó²6§Ñ©O6Ú8"du)Oy¦§ø1Ú1ó²6§ó©O240Ú480"Tó3)B¬Ó§¬ÑB¬C§z)1ú6B¬¬)2ú6B¬C§ ØI\?Û¤^§duB¬C|^Ç¯K§¦C¥Ü©J|§Ù§Ü©U¢"

âØ§ÑÈ1ú6A¬±J|600§ÑÈ1ú6B¬±J|1000§ÑÈ1ú6C¬±J|300§ ¢1ú6C¬Iº200 "²½|ýÿ§3OyÏS§¬CÈþ50 ú6"KAXÛSüAÚBü«¬þ§âU¦óoJ|pº



Example 2.6

,ÅyPk]200§¼ÂÃ§TÅû½òù200?1Ý]§±Ï£"yko«YøÀJ§Ý]ªzccÐòÅ±k¤k]ÑûÝ]"

1 l11c14czccÐÑIÝ]§gcc"Â£|1.15¶

2 13ccÐÝ]§15cc"Â£|1.25§Ý]80¶

3 12ccÐÝ]§15cc"Â£|1.40§Ý]60¶

4 zccÐÝ]§zcc"Â£|1.06"

KAæ^Û«Ý]|ÜüÑ§¦TÅ15cc"o]º

11c 12c 13c 14c 15c

Y x11 x21 x31 x41Y x32Yn x23Yo x14 x24 x34 x44 x54



Example 2.7

küïáAÚB§zcâþ©O35ëÚ55ë§ùâøA`!¯!ZnïÓó/§zïÓó/éâI¦þ©O26ë!38ëÚ26ë§ïáïÓó/m$¤£/ë¤XeL¤«§KAXÛN$âU¦o$¤º

ó/` ó/¯ ó/Z

ïáA 10 12 9ïáB 8 11 13

ó/` ó/¯ ó/Z ÑÑoþ

ïáA x11 x12 x13 35ïáB x21 x22 x21 55Âoþ 26 38 26 90



Example 2.8

,r|k5«"®«rü dÚzü rx!¶Ô!)¹þXeL¤«§qT|zFIx70ü !¶Ô3ü !)10Îü "¯XÛ·ÜNù5«r§âU¦o¤$º

r«a x/ü ¶Ô/ü )/Îü üd

1 0.30 0.10 0.05 22 2.20 0.05 0.10 73 1.00 0.02 0.02 44 0.60 0.20 0.20 35 1.80 0.05 0.08 5


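Example 2.9

Solve the following optimization problem

$$
\begin{aligned}
\min\ & |x| + |y| + |z| \\
\text{s.t. } & x + y \le 1, \\
& 2x + z = 3.
\end{aligned}
$$

Using the change of variables

$$
m = \frac{x + |x|}{2}, \qquad n = \frac{|x| - x}{2},
$$

convert the nonlinear programming problem into a linear programming problem and solve it.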
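With the substitution above, each variable is written as a difference of two nonnegative parts, e.g. x = m - n with |x| = m + n, and the problem becomes a linear program. The sketch below assumes the Optimization Toolbox; the ordering of the six variables is a choice made here.

```matlab
% Variables: v = [xp; xn; yp; yn; zp; zn] >= 0 with x = xp - xn, etc.
c   = ones(6, 1);                 % |x| + |y| + |z| = sum of all parts
A   = [1 -1 1 -1 0 0];            % x + y <= 1
b   = 1;
Aeq = [2 -2 0 0 1 -1];            % 2x + z = 3
beq = 3;
lb  = zeros(6, 1);

[v, fval] = linprog(c, A, b, Aeq, beq, lb, []);
xyz = [v(1) - v(2); v(3) - v(4); v(5) - v(6)];   % recover x, y, z
disp(xyz.'); disp(fval);
```

The minimizer is not unique for this problem, so only the optimal value should be compared across solvers.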


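Optimization: Integer Programming

Mathematical model of integer programming

$$
\begin{aligned}
\max\ & c^Tx, \\
\text{s.t. } & Ax = b, \\
& x \ge 0, \text{ with all or some components integer.}
\end{aligned}
$$

Integer programs are classified into pure integer programming, mixed integer programming and 0–1 programming.

Solving integer programming problems in MATLAB

Mixed-integer linear programming: [x, fval, exitflag] = intlinprog(c, intcon, A, b, Aeq, beq, lb, ub)

$$
\begin{aligned}
\min\ & c^Tx, \\
\text{s.t. } & Ax \le b, \\
& A_{eq}x = b_{eq}, \\
& lb \le x \le ub, \\
& x_i \ge 0, \quad i = 1, 2, \ldots, n, \\
& x_j \text{ integer}, \quad j \in intcon.
\end{aligned}
$$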



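Example 2.10

$$
\begin{aligned}
\max\ & x_1 + x_2, \\
\text{s.t. } & 4x_1 - 2x_2 \ge 1, \\
& 4x_1 + 2x_2 \le 11, \\
& 2x_2 \ge 1, \\
& x_1, x_2 \ge 0 \text{ and integer.}
\end{aligned}
$$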

c=[-1;-1];

A=[-4 2;4 2;0 -2];

b=[-1;11;-1];

lb=[0;0];

intcon=[1;2];

[x,fval]=linprog(c,A,b,[],[],lb,[])

[x1,fval1]=intlinprog(c,intcon,A,b,[],[],lb,[])



Solving 0–1 integer programming problems with MATLAB

min cT x ,

s.t.

Ax ≤ b,

Aeqx = beq,

x = 0 or 1.



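Example 2.11

For the 0–1 programming model

$$
\begin{aligned}
\max\ & 20x_1 + 6x_2 + 8x_3 + 9x_4, \\
\text{s.t. } & 10x_1 + 6x_2 + 5x_3 + 2x_4 \le 19, \\
& 7x_1 + 2x_2 + 2x_3 + 4x_4 \le 11, \\
& 2x_1 + x_2 + x_3 + 10x_4 \le 12, \\
& x_4 \le x_2 + x_3, \\
& x_1, x_2, x_3, x_4 = 0 \text{ or } 1,
\end{aligned}
$$

solve it with linprog and with intlinprog, respectively.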
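Example 2.11 can be set up as follows, assuming the Optimization Toolbox. The 0–1 requirement is expressed with bounds 0 ≤ x ≤ 1 plus integrality, and linprog solves the relaxation in which integrality is dropped.

```matlab
c  = -[20; 6; 8; 9];            % maximize  <=>  minimize the negative
A  = [10  6  5  2;
       7  2  2  4;
       2  1  1 10;
       0 -1 -1  1];             % x4 <= x2 + x3  rewritten as  -x2 - x3 + x4 <= 0
b  = [19; 11; 12; 0];
lb = zeros(4, 1);
ub = ones(4, 1);
intcon = 1:4;                   % all four variables are integer

[xLP, fLP] = linprog(c, A, b, [], [], lb, ub);              % LP relaxation
[xIP, fIP] = intlinprog(c, intcon, A, b, [], [], lb, ub);   % 0-1 program

fprintf('relaxation: %.4f, 0-1 optimum: %.4f\n', -fLP, -fIP);
```

The relaxation value is an upper bound on the 0–1 optimum, which is the relationship that branch-and-bound methods exploit.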



Example 2.12

À1öÑ1InÙ1o§´¨õU«É25ú6Ô¬"TÀ1ök6Ô¬ø§Ô¬éA?Ò§þ4ÙdXeL¤«"du§TÀ1öû½¦UpdÔ¬§@o¦ATXÛÀJ¦Ô¬odº

Ô¬?Ò 1 2 3 4 5 6

þ/ú6 4.8 6.2 5.7 3.6 4.4 8.5d/ 120 180 150 100 90 230

Example 2.13

,û8ck«5Ag+§Ù½Ý17 "y3IâréÝI¦éÙ)g+?1§XJrI208!406!804T«Ag+§KAXÛég+âU¦^£I²(XÛ½Â^¤º



Example 2.14

,ó),«¬±÷v½|I¦"²L½|NïÚÚO§ýOT«>ì35oGÝ½|I¦þ©O1500§2000§4000Ú1000"XJTû½3,GÝmó§Ió§O¤20000 §zÅì)¤500§XJ÷v½|I¦±GÝ"k¥¬§Kz>ì;¤10"b½óÃÐ©¥§AXÛSü)§âU3÷v½|I¦ê¦o¤^?

Example 2.15

,ë£U,3Umã¤IÑÖ<êXeL¤«"bÑÖþm2mãm©§¿ëYó8"KTU,Aõ¶ÑÖº

g 1 2 3 4 5 6

mã 08–12 12–16 16–20 20–24 00–04 04–08I¦<ê 100 120 80 60 30 50



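Steps of the branch-and-bound method (maximizing the objective function)

(1). Initialization. Solve the relaxation of the integer program, i.e. the linear program obtained by dropping the integrality requirements. If its solution is integral, it is the optimal solution of the integer program. Otherwise, set the initial lower bound to −∞.

(2). Build the branching tree. In any (sub)problem, pick one variable that violates the integrality requirement and, by adding a pair of mutually exclusive constraints, split the (sub)problem into two further-constrained subproblems, which shrinks the search region. A subproblem whose solution still violates integrality is branched again, and so on, forming a branching tree.

(3). Bounding. By repeatedly branching and solving subproblems, the method keeps updating the lower bound given by the best integer solution found so far. Solving a subproblem has one of the following outcomes:

no feasible solution: no further branching is needed; an integer solution: no further branching is needed, and the lower bound is updated; a non-integer solution: whether to branch further depends on the objective value.

(4). Following the steps above, each time the lower bound is updated, check all subproblems that have not yet been solved and prune those whose objective value is smaller than the new lower bound.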



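Example 2.16

Use the branch-and-bound method to solve the integer programming problem

$$
\begin{aligned}
\max\ & x_1 + x_2, \\
\text{s.t. } & 4x_1 - 2x_2 \ge 1, \\
& 4x_1 + 2x_2 \le 11, \\
& 2x_2 \ge 1, \\
& x_1, x_2 \ge 0 \text{ and integer.}
\end{aligned}
$$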
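The steps above can be sketched for Example 2.16 with linprog as the subproblem solver and a cell array used as a stack of open subproblems. This is an illustrative sketch under stated assumptions, not a general-purpose solver: it assumes a recent MATLAB release with the Optimization Toolbox (where options is the eighth argument of linprog), branches on the most fractional variable, and uses 1e-6 as the integrality tolerance.

```matlab
c = [-1; -1];                       % linprog minimizes, so negate the objective
A = [-4 2; 4 2; 0 -2];              % all constraints written as A*x <= b
b = [-1; 11; -1];
opts = optimoptions('linprog', 'Display', 'none');

stack = {struct('lb', [0; 0], 'ub', [Inf; Inf])};   % root subproblem
bestVal = -Inf;                     % lower bound (best integer value so far)
bestX = [];

while ~isempty(stack)
    node = stack{end};              % take the most recent subproblem
    stack(end) = [];
    [x, fval, flag] = linprog(c, A, b, [], [], node.lb, node.ub, opts);
    if flag ~= 1 || -fval <= bestVal + 1e-9
        continue;                   % infeasible, or cannot beat the lower bound
    end
    [worst, j] = max(abs(x - round(x)));   % most fractional component
    if worst < 1e-6
        bestVal = -fval;            % integer solution: update the lower bound
        bestX = round(x);
    else
        left = node;   left.ub(j)  = floor(x(j));   % branch x_j <= floor
        right = node;  right.lb(j) = ceil(x(j));    % branch x_j >= ceil
        stack{end+1} = left;
        stack{end+1} = right;
    end
end

disp(bestX.'); disp(bestVal);
```

The result can be cross-checked against intlinprog applied to the same data.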


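Optimization: Quadratic Programming

Mathematical model of quadratic programming

$$
\begin{aligned}
\min\ & \frac{1}{2}x^tHx + c^tx, \\
\text{s.t. } & Ax \ge b,
\end{aligned}
$$

where $H \in \mathbb{R}^{n \times n}$ is a real symmetric matrix of order n.

Solution methods

Elimination method (equality constraints only);

Lagrange multiplier method (equality constraints only);

Subspace trust-region method (equality constraints only, or upper and lower bounds only);

Active-set method;

Wolfe's method;

Lemke's method.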



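Elimination method: partition $A = (B, N)$, where $B$ is a basis matrix, and partition $x$, $c$, $H$ accordingly:

$$
x = \begin{pmatrix} x_B \\ x_N \end{pmatrix}, \quad
c = \begin{pmatrix} c_B \\ c_N \end{pmatrix}, \quad
H = \begin{pmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{pmatrix}.
$$

Then $Bx_B + Nx_N = b$, i.e. $x_B = B^{-1}b - B^{-1}Nx_N$. Substituting into the quadratic program gives

$$
\min \varphi(x_N) = \frac{1}{2}x_N^t\hat{H}_2x_N + \hat{c}_N^tx_N,
$$

where

$$
\hat{H}_2 = H_{22} - H_{21}B^{-1}N - N^t(B^{-1})^tH_{12} + N^t(B^{-1})^tH_{11}B^{-1}N,
$$

$$
\hat{c}_N = c_N - N^t(B^{-1})^tc_B + \left[H_{21} - N^t(B^{-1})^tH_{11}\right]B^{-1}b.
$$

If $\hat{H}_2$ is positive definite, the unconstrained problem has the optimal solution

$$
x_N^* = -\hat{H}_2^{-1}\hat{c}_N \ \Rightarrow\
x^* = \begin{pmatrix} x_B^* \\ x_N^* \end{pmatrix}
= \begin{pmatrix} B^{-1}b \\ 0 \end{pmatrix}
+ \begin{pmatrix} B^{-1}N \\ -I \end{pmatrix}\hat{H}_2^{-1}\hat{c}_N.
$$

Advantages: the idea is intuitive and simple, and the method is easy to use. Disadvantages: B may be close to singular, which makes the computed $x^*$ numerically unstable.

Example 2.17

Use the elimination method to solve the quadratic programming problem:

$$
\begin{aligned}
\min\ & x_1^2 + x_2^2 + x_3^2, \\
\text{s.t. } & x_1 + 2x_2 - x_3 = 4, \\
& x_1 - x_2 + x_3 = -2.
\end{aligned}
$$

$$
x^* = (x_1^*, x_2^*, x_3^*)^t = \left(\frac{2}{7}, \frac{10}{7}, -\frac{6}{7}\right)^t.
$$

From $A^t\lambda^* = \nabla f(x^*) = Hx^* + c$,

$$
\lambda^* = (\lambda_1^*, \lambda_2^*)^t = \left(\frac{8}{7}, -\frac{4}{7}\right)^t.
$$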
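The elimination formulas can be checked on Example 2.17 with a few lines of MATLAB. In this sketch the basis is taken as the first two columns of A (an assumption; any nonsingular pair of columns would do), and B\N is used instead of forming the inverse of B.

```matlab
H = 2*eye(3);  c = zeros(3, 1);       % objective x1^2 + x2^2 + x3^2
A = [1 2 -1; 1 -1 1];  b = [4; -2];

iB = [1 2];  iN = 3;                  % basic and nonbasic indices (assumed choice)
B = A(:, iB);  N = A(:, iN);
H11 = H(iB, iB);  H12 = H(iB, iN);
H21 = H(iN, iB);  H22 = H(iN, iN);

BinvN = B \ N;                        % B^{-1} N
Binvb = B \ b;                        % B^{-1} b

% Reduced Hessian and reduced linear term from the formulas above
H2 = H22 - H21*BinvN - BinvN'*H12 + BinvN'*H11*BinvN;
cN = c(iN) - BinvN'*c(iB) + (H21 - BinvN'*H11)*Binvb;

x = zeros(3, 1);
x(iN) = -(H2 \ cN);                   % unconstrained minimizer in x_N
x(iB) = Binvb - BinvN*x(iN);          % back-substitute for x_B

lambda = A' \ (H*x + c);              % multipliers from A'*lambda = grad f(x*)
disp(x.'); disp(lambda.');
```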



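Example 2.18

Principle of the Lagrange multiplier method:

$$
L(x, \lambda) = \frac{1}{2}x^tHx + c^tx - \lambda^t(Ax - b),
$$

$$
\nabla_xL(x, \lambda) = 0, \quad \nabla_\lambda L(x, \lambda) = 0
\ \Rightarrow\
\begin{pmatrix} H & -A^t \\ -A & 0 \end{pmatrix}
\begin{pmatrix} x \\ \lambda \end{pmatrix}
= -\begin{pmatrix} c \\ b \end{pmatrix}.
$$

These are the K-T conditions. The Lagrange matrix is symmetric but not necessarily positive definite; if it is invertible, then

$$
\begin{pmatrix} H & -A^t \\ -A & 0 \end{pmatrix}^{-1}
= \begin{pmatrix} Q & -R \\ -R^t & G \end{pmatrix},
$$

$$
\begin{aligned}
Q &= H^{-1} - H^{-1}A^t(AH^{-1}A^t)^{-1}AH^{-1} = H^{-1} - RAH^{-1}, \\
R &= H^{-1}A^t(AH^{-1}A^t)^{-1} = -H^{-1}A^tG, \\
G &= -(AH^{-1}A^t)^{-1}.
\end{aligned}
$$

Let $x^{(0)}$ be any feasible solution, i.e. $Ax^{(0)} = b$. The gradient of the objective function at $x^{(0)}$ can be written as $\nabla f(x^{(0)}) = Hx^{(0)} + c$. Then from

$$
x^* = -Qc + Rb, \qquad \lambda^* = R^tc - Gb
$$

we obtain

$$
x^* = x^{(0)} - Q\nabla f(x^{(0)}), \qquad \lambda^* = R^t\nabla f(x^{(0)}).
$$

Example 2.19

Use the Lagrange method to solve the quadratic programming problem:

$$
\begin{aligned}
\min\ & x_1^2 + 2x_2^2 + x_3^2 - 2x_1x_2 + x_3, \\
\text{s.t. } & x_1 + x_2 + x_3 = 4, \\
& 2x_1 - x_2 + x_3 = 2.
\end{aligned}
$$

$$
x^* = \left(\frac{21}{11}, \frac{43}{22}, \frac{3}{22}\right)^t.
$$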
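For Example 2.19 the K-T system above can be assembled and solved directly. This is a minimal sketch; H and c are read off from the objective written as (1/2) x'Hx + c'x.

```matlab
H = [ 2 -2 0;
     -2  4 0;
      0  0 2];             % Hessian of x1^2 + 2x2^2 + x3^2 - 2x1x2
c = [0; 0; 1];             % linear term: + x3
A = [1  1 1;
     2 -1 1];
b = [4; 2];

K   = [H, -A'; -A, zeros(2)];     % Lagrange (K-T) matrix
sol = K \ (-[c; b]);              % solves [H -A'; -A 0][x; lambda] = -[c; b]

x      = sol(1:3);
lambda = sol(4:5);
disp(x.'); disp(lambda.');
```

Solving the block system with the backslash operator avoids forming Q, R and G explicitly.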



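Solving quadratic programming problems in MATLAB

MATLAB standard form of a quadratic programming problem

$$
\min \frac{1}{2}x^THx + c^Tx \quad \text{s.t. } Ax \le b,\ A_{eq}x = b_{eq},\ lb \le x \le ub
$$

Calling the MATLAB function for quadratic programming

x = quadprog(H, c, A, b)

x = quadprog(H, c, A, b, Aeq, beq)

x = quadprog(H, c, A, b, Aeq, beq, lb, ub)

x = quadprog(H, c, A, b, Aeq, beq, lb, ub, x0)

[x, fval] = quadprog(...)

Example 2.20

Use the commands above to solve the quadratic programming problems of the previous examples.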

Example 2.21

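Solve the following quadratic programming problem:

$$
\begin{aligned}
\min\ & \frac{1}{2}x_1^2 + x_2^2 - x_1x_2 - 2x_1 - 6x_2, \\
\text{s.t. } & x_1 + x_2 \le 2, \\
& -x_1 + 2x_2 \le 2, \\
& 2x_1 + x_2 \le 3, \\
& x_1, x_2 \ge 0.
\end{aligned}
$$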
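Example 2.21 maps onto quadprog as sketched below, assuming the Optimization Toolbox. H and f come from writing the objective as (1/2) x'Hx + f'x.

```matlab
H  = [ 1 -1;
      -1  2];              % (1/2)x1^2 + x2^2 - x1*x2 = (1/2) x'*H*x
f  = [-2; -6];             % linear term: -2x1 - 6x2
A  = [ 1 1;
      -1 2;
       2 1];
b  = [2; 2; 3];
lb = zeros(2, 1);          % x >= 0

[x, fval] = quadprog(H, f, A, b, [], [], lb, []);
disp(x.'); disp(fval);
```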



Definition 2.1

A set $\Omega$ in $\mathbb{R}^n$ is said to be convex if for every $x_1, x_2 \in \Omega$ and every real number $\alpha$, $0 < \alpha < 1$, the point $\alpha x_1 + (1 - \alpha)x_2 \in \Omega$.

Definition 2.2

A function $f$ defined on a convex set $\Omega$ is said to be convex if, for every $x_1, x_2 \in \Omega$ and every $\alpha$, $0 \le \alpha \le 1$, there holds

$$
f(\alpha x_1 + (1 - \alpha)x_2) \le \alpha f(x_1) + (1 - \alpha)f(x_2).
$$

If, for every $\alpha$, $0 < \alpha < 1$, and $x_1 \ne x_2$, there holds

$$
f(\alpha x_1 + (1 - \alpha)x_2) < \alpha f(x_1) + (1 - \alpha)f(x_2),
$$

then $f$ is said to be strictly convex.



Definition 2.3

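A function $g$ defined on a convex set $\Omega$ is said to be concave if the function $-g$ is convex, and strictly concave if $-g$ is strictly convex.

Definition 2.4

If $f \in C^1$ is a real-valued function on $\mathbb{R}^n$, $f(x) = f(x_1, x_2, \ldots, x_n)$, define the gradient of $f$ to be the vector

$$
\nabla f(x) = \left[\frac{\partial f(x)}{\partial x_1}, \frac{\partial f(x)}{\partial x_2}, \ldots, \frac{\partial f(x)}{\partial x_n}\right].
$$

If $f \in C^2$, then define the Hessian of $f$ at $x$ to be the $n \times n$ matrix

$$
F(x) = \left[\frac{\partial^2 f(x)}{\partial x_i\partial x_j}\right].
$$

$$
\frac{\partial^2 f(x)}{\partial x_i\partial x_j} = \frac{\partial^2 f(x)}{\partial x_j\partial x_i} \ \Rightarrow\ F(x) = [F(x)]^t.
$$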



Proposition 2.1

Let $f_1$ and $f_2$ be convex functions on the convex set $\Omega$. Then the functions $f_1 + f_2$ and $af_1$ ($a > 0$) are both convex on $\Omega$.

Proof.

Let $x_1, x_2 \in \Omega$, and $0 < \alpha < 1$. Then

$$
f_1(\alpha x_1 + (1 - \alpha)x_2) + f_2(\alpha x_1 + (1 - \alpha)x_2)
\le \alpha[f_1(x_1) + f_2(x_1)] + (1 - \alpha)[f_1(x_2) + f_2(x_2)].
$$

The conclusion about the function $af_1$ ($a > 0$) is immediate.

Proposition 2.2

Let $f_1, f_2, \ldots, f_n$ be convex functions over the convex set $\Omega$. Then a positive combination of the $f_i$, namely

$$
\sum_{i=1}^{n} a_if_i = a_1f_1 + a_2f_2 + \cdots + a_nf_n \quad (a_i > 0),
$$

is again convex.






Proposition 2.3

Let $f \in C^1$. Then $f$ is convex over a convex set $\Omega$ if and only if $f(y) \ge f(x) + \nabla f(x)(y - x)$ for all $x, y \in \Omega$.

Proof.

(Necessity) Since $f$ is convex, for all $\alpha$, $0 \le \alpha \le 1$,

$$
f(\alpha y + (1 - \alpha)x) \le \alpha f(y) + (1 - \alpha)f(x),
$$

which indicates that for all $\alpha$, $0 < \alpha \le 1$,

$$
\frac{f(x + \alpha(y - x)) - f(x)}{\alpha} \le f(y) - f(x).
$$

Letting $\alpha \to 0^+$ gives

$$
\nabla f(x)(y - x) = \lim_{\alpha \to 0^+}\frac{f(x + \alpha(y - x)) - f(x)}{\alpha} \le f(y) - f(x).
$$






continued.

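(Sufficiency) Assume $f(y) \ge f(x) + \nabla f(x)(y - x)$ for all $x, y \in \Omega$. Fix $x_1, x_2 \in \Omega$ and $\alpha$ ($0 \le \alpha \le 1$). Setting $x = \alpha x_1 + (1 - \alpha)x_2$, we have

$$
f(x_1) \ge f(x) + \nabla f(x)(x_1 - x), \tag{1}
$$

$$
f(x_2) \ge f(x) + \nabla f(x)(x_2 - x). \tag{2}
$$

Multiplying (1) by $\alpha$ and (2) by $1 - \alpha$ and adding, we obtain

$$
\alpha f(x_1) + (1 - \alpha)f(x_2) \ge f(x) + \nabla f(x)(\alpha x_1 + (1 - \alpha)x_2 - x).
$$

Substituting $x = \alpha x_1 + (1 - \alpha)x_2$, the last term vanishes and we have

$$
\alpha f(x_1) + (1 - \alpha)f(x_2) \ge f(\alpha x_1 + (1 - \alpha)x_2).
$$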



The original definition essentially states that linear interpolation between two points overestimates the function.

The above proposition states that linear approximation based on the local derivative underestimates the function.



Theorem 45

A quadratic form $f(x) = x^tAx$ is positive definite if and only if the leading principal minors $|A_k|$ of $A$ are all positive. It is negative definite if and only if

$$
|A_k| < 0 \text{ if } k \text{ is odd}, \qquad |A_k| > 0 \text{ if } k \text{ is even}.
$$

Proof.

We skip the proof for positive definiteness. For negative definiteness, since

$$
f(x) = f(x_1, x_2, \ldots, x_n) = x^tAx
$$

is negative definite, we have that

$$
-f(x) = -f(x_1, x_2, \ldots, x_n) = -x^tAx = x^t(-A)x
$$

is positive definite. Since the $k$-th leading principal minor of $-A$ equals $(-1)^k|A_k|$, the conclusion is immediate.
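The criterion of Theorem 45 is easy to test numerically. The matrix below is a hypothetical example chosen for illustration (a standard tridiagonal matrix), not one taken from the slides.

```matlab
A = [ 2 -1  0;
     -1  2 -1;
      0 -1  2];                     % hypothetical symmetric test matrix
n = size(A, 1);

minors = zeros(1, n);
for k = 1:n
    minors(k) = det(A(1:k, 1:k));   % k-th leading principal minor |A_k|
end
isPosDef = all(minors > 0);         % positive definite: all minors positive

% Negative definiteness of -A: its minors must alternate in sign, - + - ...
minorsNeg = arrayfun(@(k) det(-A(1:k, 1:k)), 1:n);
isNegDef  = all((-1).^(1:n) .* minorsNeg > 0);

disp(minors); disp(minorsNeg);
```

Comparing with the signs of eig(A) gives an independent check of the conclusion.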






Theorem 2.1 (Taylor’s Theorem)

If $f \in C^1$ in a region containing the line segment $[x_1, x_2]$, then there is an $\alpha$ ($0 \le \alpha \le 1$) such that

$$
f(x_2) = f(x_1) + \nabla f(\alpha x_1 + (1 - \alpha)x_2)(x_2 - x_1).
$$

Furthermore, if $f \in C^2$, then there is an $\alpha$ ($0 \le \alpha \le 1$) such that

$$
f(x_2) = f(x_1) + \nabla f(x_1)(x_2 - x_1) + \frac{1}{2}(x_2 - x_1)^tF(\alpha x_1 + (1 - \alpha)x_2)(x_2 - x_1),
$$

where $F$ denotes the Hessian of $f$.

Definition 2.5

Given $x \in \Omega$, a vector $d$ is a feasible direction at $x$ if there is an $\bar{\alpha} > 0$ such that $x + \alpha d \in \Omega$ for all $\alpha$ ($0 \le \alpha \le \bar{\alpha}$).



Theorem 2.2

Let $f \in C^2$ be a quadratic function. Then $f$ is convex over a convex set $\Omega$ containing an interior point if and only if the Hessian matrix $F$ of $f$ is positive semidefinite throughout $\Omega$.

Proof.

By Taylor's theorem, for some $\alpha$ ($0 \le \alpha \le 1$),

$$
f(y) = f(x) + \nabla f(x)(y - x) + \frac{1}{2}(y - x)^tF(x + \alpha(y - x))(y - x). \tag{3}
$$

Since $f$ is a quadratic function, its Hessian is constant and (3) reduces to

$$
f(y) = f(x) + \nabla f(x)(y - x) + \frac{1}{2}(y - x)^tF(y - x).
$$

By Proposition 2.3, the conclusion is immediate.






Theorem 2.3

Let $f \in C^2$. Then $f$ is convex over a convex set $\Omega$ containing an interior point if and only if the Hessian matrix $F$ of $f$ is positive semidefinite throughout $\Omega$.

Proof.

By Taylor's theorem, for some $\alpha$ ($0 \le \alpha \le 1$),

$$
f(y) = f(x) + \nabla f(x)(y - x) + \frac{1}{2}(y - x)^tF(x + \alpha(y - x))(y - x). \tag{4}
$$

(Sufficiency) Clearly, if the Hessian is everywhere positive semidefinite, we have

$$
f(y) \ge f(x) + \nabla f(x)(y - x), \tag{5}
$$

which in view of Proposition 2.3 implies that $f$ is convex.






continued.

(Necessity) Now suppose the Hessian is not positive semidefinite at some point $x \in \Omega$.

By the continuity of the Hessian it can be assumed, without loss of generality, that $x$ is an interior point of $\Omega$. There is a $y \in \Omega$ such that

$$
(y - x)^tF(x)(y - x) < 0.
$$

Again by the continuity of the Hessian, $y$ may be selected so that for all $\alpha$ ($0 \le \alpha \le 1$),

$$
(y - x)^tF(x + \alpha(y - x))(y - x) < 0.
$$

By (4), inequality (5) does not hold, which in view of Proposition 2.3 implies that $f$ is not convex.



Theorem 2.4

Let $f$ be a convex function defined on the convex set $\Omega$. Then the set $\Gamma$ where $f$ achieves its minimum is convex, and any relative minimum of $f$ is a global minimum.

Proof.

If $f$ has no relative minima the theorem is valid by default. Otherwise, let $c$ be the minimum value of $f$ on $\Omega$, so that $\Gamma = \{x : x \in \Omega,\ f(x) \le c\}$. Let $x_1, x_2 \in \Gamma$. Then for $\alpha$ ($0 < \alpha < 1$),

$$
f(\alpha x_1 + (1 - \alpha)x_2) \le \alpha f(x_1) + (1 - \alpha)f(x_2) \le c.
$$

Suppose now that $x^* \in \Omega$ is a relative minimum point of $f$, but that there is another point $y \in \Omega$ with $f(y) < f(x^*)$. On the line $\alpha y + (1 - \alpha)x^*$, $0 < \alpha < 1$,

$$
f(\alpha y + (1 - \alpha)x^*) \le \alpha f(y) + (1 - \alpha)f(x^*) < f(x^*),
$$

contradicting the fact that $x^*$ is a relative minimum point.






Proposition 2.4 (First-order necessary conditions)

Let $\Omega$ be a subset of $\mathbb{R}^n$ and let $f \in C^1$ be a function on $\Omega$. If $x^*$ is a relative minimum point of $f$ over $\Omega$, then for any feasible direction $d \in \mathbb{R}^n$ at $x^*$, we have $\nabla f(x^*)d \ge 0$.

Proof.

For any $\alpha$ ($0 \le \alpha \le \bar{\alpha}$), the point $x(\alpha) = x^* + \alpha d \in \Omega$. Define the function $g(\alpha) = f(x(\alpha))$. Then $g(\alpha)$ has a relative minimum at $\alpha = 0$. By the ordinary calculus we have

$$
g(\alpha) - g(0) = g'(0)\alpha + o(\alpha),
$$

where $o(\alpha)$ denotes terms that go to zero faster than $\alpha$. If $g'(0) < 0$, then, for sufficiently small values of $\alpha > 0$, the right side will be negative, and hence $g(\alpha) - g(0) < 0$, which contradicts the minimal nature of $g(0)$. Thus $g'(0) = \nabla f(x^*)d \ge 0$.






Corollary 2.1

Let $\Omega$ be a subset of $\mathbb{R}^n$, and let $f \in C^1$ be a function on $\Omega$. If $x^*$ is a relative minimum point of $f$ over $\Omega$ and if $x^*$ is an interior point of $\Omega$, then $\nabla f(x^*) = 0$.

This gives a system of n equations in n unknowns. Can it be solved to derive the solution?

Example 2.22

Consider the problem

$$
\begin{aligned}
\min\ & x_1^2 - x_1 + x_2 + x_1x_2, \\
\text{s.t. } & x_1, x_2 \ge 0.
\end{aligned}
$$

The problem has a global minimum at $x^* = (1/2, 0)^t$. At $x^*$, the partial derivatives, $\nabla f(x^*) = (0, 3/2)$, do not both vanish, but $\nabla f(x^*)d \ge 0$ for all feasible directions $d \in \mathbb{R}^2$ at $x^* = (1/2, 0)^t$.



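Proposition 2.5 (Second-order necessary conditions)

Let $\Omega$ be a subset of $\mathbb{R}^n$ and let $f \in C^2$ be a function on $\Omega$. If $x^*$ is a relative minimum point of $f$ over $\Omega$, then for any feasible direction $d \in \mathbb{R}^n$ at $x^*$, we have

i). $\nabla f(x^*)d \ge 0$;

ii). if $\nabla f(x^*)d = 0$, then $d^tF(x^*)d \ge 0$.

Proof.

The first condition is just Proposition 2.4, and the second condition applies only if $\nabla f(x^*)d = 0$. Introduce $x(\alpha) = x^* + \alpha d$ and define $g(\alpha) = f(x(\alpha))$ as before. Then in view of $g'(0) = 0$, we have

$$
g(\alpha) - g(0) = \frac{1}{2}g''(0)\alpha^2 + o(\alpha^2). \tag{6}
$$

If $g''(0) < 0$, the right side of (6) is negative for sufficiently small $\alpha$, which contradicts the relative minimum nature of $g(0)$. Thus $g''(0) = d^tF(x^*)d \ge 0$.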






Corollary 2.2

Let $x^*$ be an interior point of the set $\Omega$, and suppose $x^*$ is a relative minimum point over $\Omega$ of the function $f \in C^2$. Then

i). $\nabla f(x^*) = 0$;

ii). for all $d$, $d^t F(x^*) d \ge 0$.

Example 2.23


\[ \min\; f(x_1, x_2) = x_1^2 - x_1 x_2 + x_2^2 - 3x_2. \]

There are no constraints, so $\Omega = \mathbb{R}^2$. Setting the partial derivatives of $f$ equal to zero yields the two equations

\[ 2x_1 - x_2 = 0, \qquad -x_1 + 2x_2 = 3, \]

which have the solution $x = (1, 2)^t$, the global minimum point of $f$.



Example 2.24


\[ \min\; x_1^3 - x_1^2 x_2 + 2x_2^2, \quad \text{s.t. } x_1, x_2 \ge 0. \]

Assume the solution is in the interior of $\Omega$. By Corollary 2.2 i), we have $x^* = (6, 9)^t$. Nevertheless, $x^*$ is not a relative minimum point, since according to

\[ F(x) = \begin{pmatrix} 6x_1 - 2x_2 & -2x_1 \\ -2x_1 & 4 \end{pmatrix}, \]

we have

\[ F(x^*) = \begin{pmatrix} 18 & -12 \\ -12 & 4 \end{pmatrix}, \]

whose determinant is $-72$, and thus $F(x)$ is not positive semidefinite at $x^*$.
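The check in Example 2.24 can be repeated numerically. The following is a minimal sketch, assuming only the objective stated in the example; the handle names `grad` and `hess` are illustrative and not part of the slides.

```matlab
% Example 2.24: f(x) = x1^3 - x1^2*x2 + 2*x2^2 at the candidate x* = (6, 9)^t.
grad = @(x) [3*x(1)^2 - 2*x(1)*x(2); -x(1)^2 + 4*x(2)];   % gradient of f
hess = @(x) [6*x(1) - 2*x(2), -2*x(1); -2*x(1), 4];       % Hessian F(x)

xs = [6; 9];
g  = grad(xs);      % both components vanish at x*
F  = hess(xs);      % equals [18 -12; -12 4]
ev = eig(F);        % eigenvalues of opposite sign: F is indefinite

fprintf('norm of gradient = %g, det(F) = %g\n', norm(g), det(F));
disp(ev');
```

A negative eigenvalue of `F` means the second-order necessary condition fails, in agreement with the determinant argument.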



Example 2.25 (Approximation)

Suppose that through an experiment the value of a function $g$ is observed at $m$ points, $x_1, x_2, \ldots, x_m$. Thus, $g(x_1), g(x_2), \ldots, g(x_m)$ are known. We wish to approximate the function by a polynomial

\[ h(x) = \sum_{i=0}^{n} a_i x^i = a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0 \]

of degree $n$ (or less), where $n < m$. Corresponding to any choice of the approximating polynomial, there will be a set of errors $\varepsilon_k = g(x_k) - h(x_k)$. We define the best approximation as the polynomial that minimizes the sum of the squares of these errors; i.e., minimizes

\[ f(a) = \sum_{k=1}^{m} (\varepsilon_k)^2 = \sum_{k=1}^{m} \left( g(x_k) - \left[ a_n (x_k)^n + \ldots + a_1 x_k + a_0 \right] \right)^2 \]

with respect to $a = (a_0, a_1, \ldots, a_n)$ to find the best coefficients.



Example 2.26 (continued)

To find a compact representation for this objective, define, for $i, j = 0, 1, \ldots, n$,

\[ q_{ij} = \sum_{k=1}^{m} (x_k)^{i+j}, \qquad b_j = \sum_{k=1}^{m} g(x_k)(x_k)^j, \qquad c = \sum_{k=1}^{m} g(x_k)^2. \]

Then after a bit of algebra it can be shown that

\[ f(a) = a^t Q a - 2 b^t a + c, \]

where $Q = (q_{ij})$ is an $(n+1) \times (n+1)$ matrix and $b = (b_0, b_1, \ldots, b_n)^t$.

The first-order necessary conditions state that the gradient of $f$ must vanish. This leads directly to the system of $n + 1$ equations

\[ Qa = b, \]

which can be solved to determine $a$.
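The system $Qa = b$ can be assembled and solved directly in MATLAB. The sketch below uses made-up observations (`xk`, `gk`) and degree `n = 2` purely for illustration, and compares the result with the built-in `polyfit`, which returns the coefficients in the opposite order.

```matlab
% Hypothetical data: m = 6 observations of g, fitted by a polynomial of degree n = 2.
xk = [0 0.5 1 1.5 2 2.5]';
gk = [1.0 1.8 3.1 4.9 7.2 9.8]';
n  = 2;

% Build Q and b with q_ij = sum_k xk^(i+j) and b_j = sum_k g(xk)*xk^j, i,j = 0..n.
Q = zeros(n+1);
b = zeros(n+1, 1);
for i = 0:n
    for j = 0:n
        Q(i+1, j+1) = sum(xk.^(i+j));
    end
    b(i+1) = sum(gk .* xk.^i);
end

a = Q \ b;                    % a = (a0, a1, ..., an)^t from the normal equations
p = polyfit(xk, gk, n);       % built-in least-squares fit, ordered an, ..., a0
disp([a, flipud(p(:))]);      % the two columns should agree
```

For higher degrees $Q$ becomes ill-conditioned, so in practice a QR-based fit such as `polyfit` is usually preferred over forming $Q$ explicitly.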



Proposition 2.6 (Second-order sufficient conditions)

Let $f \in C^2$ be a function defined on a region in which the point $x^*$ is an interior point. Suppose in addition that

i). $\nabla f(x^*) = 0$;

ii). $F(x^*)$ is positive definite.

Then $x^*$ is a strict relative minimum point of $f$.

Proof.

Since $F(x^*)$ is positive definite, there exists an $a > 0$ such that for all $d$, $d^t F(x^*) d \ge a\|d\|^2$. Thus, by Taylor's Theorem, we have

\[ f(x^* + d) - f(x^*) = \tfrac{1}{2} d^t F(x^*) d + o(\|d\|^2) \ge \tfrac{a}{2}\|d\|^2 + o(\|d\|^2), \]

which implies that both sides are positive for small $d \ne 0$.






Theorem 2.5

Let $f \in C^1$ be convex on the convex set $\Omega$. If there is a point $x^* \in \Omega$ such that, for all $y \in \Omega$, $\nabla f(x^*)(y - x^*) \ge 0$, then $x^*$ is a global minimum point of $f$ over $\Omega$.

Proof.

Since $y - x^*$ is a feasible direction at $x^*$, the given condition is equivalent to the first-order necessary condition. The proof is immediate, since by Proposition 2.3, we have

\[ f(y) \ge f(x^*) + \nabla f(x^*)(y - x^*) \ge f(x^*). \]







\[ \min\; f(x), \quad \text{s.t. } x \in \Omega. \tag{7} \]

Consider the set $\Gamma = \{(r, x) : f(x) \le r,\ x \in \mathbb{R}^n\} \subset \mathbb{R}^{n+1}$. Suppose that $x^* \in \Omega$ is the minimizing point with $f^* = f(x^*)$, and construct a tubular region $B$ with cross section $\Omega$ extending vertically from $-\infty$ up to $f^*$. $B$ is also a convex set, and it overlaps the convex set $\Gamma$ only at the boundary point $(f^*, x^*)$ above $x^*$.



Theorem 2.6 (Separating Hyperplane Theorem)

Let $B$ and $C$ be convex sets with no common relative interior points, i.e., the only common points are boundary points. Then there is a hyperplane separating $B$ and $C$; that is, there is a nonzero vector $a$ such that

\[ \sup_{b \in B} a^t b \le \inf_{c \in C} a^t c. \]

According to Theorem 2.6, there is a hyperplane separating these two sets: there are a scalar $s$ and a vector $\lambda \in \mathbb{R}^n$, not both zero, and a constant $c$ such that

\[ s r + \lambda^t x \ge c \quad \text{for all } x \in \mathbb{R}^n \text{ and } f(x) \le r, \tag{8} \]

\[ s r + \lambda^t x \le c \quad \text{for all } x \in \Omega \text{ and } r \le f^*. \tag{9} \]

It follows that $s \ne 0$; for otherwise $\lambda \ne 0$ and then (8) would be violated for some $x \in \mathbb{R}^n$. It also follows that $s \ge 0$, since otherwise (9) would be violated by very negative values of $r$. As a consequence, we find $s > 0$, and by appropriate scaling we may take $s = 1$.



Proposition 2.7 (Zero-order necessary conditions)

If $x^*$ solves (7) under the stated convexity conditions, then there is a nonzero vector $\lambda \in \mathbb{R}^n$ such that $x^*$ is a solution to the two problems

\[ \min\; f(x) + \lambda^t x, \quad \text{s.t. } x \in \mathbb{R}^n, \tag{10} \]

\[ \max\; \lambda^t x, \quad \text{s.t. } x \in \Omega. \tag{11} \]

Proof.

Problem (10) follows from (8) (with $s = 1$) and the fact that $f(x) \le r$ for $r \ge f(x)$. The value $c$ is attained from above at $(f^*, x^*)$. Likewise problem (11) follows from (9) and the fact that $x^*$ and the appropriate $r$ attain $c$ from below.

Example 2.27 (Investigate what will happen to $f \in C^1$ in $\mathbb{R}^1$ over $[0, 1]$)

Let $f \in C^1$ on $\mathbb{R}^n$, and let $f$ have a minimum with respect to $\Omega$ at $x^*$. Let $d \in \mathbb{R}^n$ be a feasible direction at $x^*$. By (10), $\nabla f(x^*) = -\lambda^t$, and by (11), $\lambda^t d \le 0$; hence $\nabla f(x^*) d \ge 0$.






Proposition 2.8 (Zero-order sufficiency conditions)

If there is a $\lambda$ such that $x^* \in \Omega$ solves the problems (10) and (11), then $x^*$ solves (7).

Proof.

Suppose $x_1$ is any other point in $\Omega$. Then from (10),

\[ f(x_1) + \lambda^t x_1 \ge f(x^*) + \lambda^t x^*. \]

This can be rewritten as

\[ f(x_1) - f(x^*) \ge \lambda^t x^* - \lambda^t x_1. \]

By problem (11), the right-hand side is greater than or equal to zero. Hence $f(x_1) - f(x^*) \ge 0$, which establishes the result.






Line Search Methods

Fibonacci and Golden Section Search

Line Search by Curve Fitting: Newton's Method, Method of False Position, Cubic Fit, Quadratic Fit

Inaccurate Line Search: Percentage Test, Armijo's Rule, Goldstein Test, Wolfe Test, Backtracking



Fibonacci Search

The Fibonacci method determines the minimum value of a function $f$ over a closed interval $[c_1, c_2]$. The only property assumed of $f$ is that it is unimodal.

After values are known at $N$ points $x_1, x_2, \ldots, x_N$ with

\[ c_1 = x_0 \le x_1 < x_2 < \ldots < x_{N-1} < x_N \le x_{N+1} = c_2, \]

the region of uncertainty is the interval $[x_{k-1}, x_{k+1}]$, where $x_k$ is the point with the smallest function value among the $N$.


Let $d_1 = c_2 - c_1$ be the initial width of uncertainty, and $d_k$ be the width of uncertainty after $k$ measurements. If we have $N$ measurements in total,

\[ d_k = \left( \frac{F_{N-k+1}}{F_N} \right) d_1, \]

where the integers $F_k$ are members of the Fibonacci sequence generated by the recurrence relation $F_N = F_{N-1} + F_{N-2}$ with $F_0 = F_1 = 1$.

Recursive Structure

\[ L_{n-1} = 2L_n - \varepsilon; \]
\[ L_{n-2} = L_{n-1} + L_n = 3L_n - \varepsilon; \]
\[ L_{n-3} = L_{n-2} + L_{n-1} = 5L_n - 2\varepsilon; \]
\[ \ldots \]
\[ L_{n-k} = L_{n-(k-1)} + L_{n-(k-2)} = F_{k+1} L_n - F_{k-1}\varepsilon. \]

To reach a pre-specified accuracy $\delta$, we need $F_k \ge 1/\delta$.



Fibonacci Search Algorithm (Minimization)

1 Determine $k$ according to $\delta$, and $a_0$, $b_0$. Compute $t_1$ and $t_2$ by

\[ t_1 = b_0 - \frac{F_{k-1}}{F_k}(b_0 - a_0), \qquad t_2 = a_0 + \frac{F_{k-1}}{F_k}(b_0 - a_0). \]

2 If $f(t_1) < f(t_2)$, set $a_1 = a_0$, $b_1 = t_2$, $t_2 = t_1$ and compute $t_1$ by

\[ t_1 = b_1 - \frac{F_{k-2}}{F_{k-1}}(b_1 - a_1). \]

If $f(t_1) \ge f(t_2)$, set $b_1 = b_0$, $a_1 = t_1$, $t_1 = t_2$ and compute $t_2$ by

\[ t_2 = a_1 + \frac{F_{k-2}}{F_{k-1}}(b_1 - a_1). \]

3 . . .

4

\[ t_1 = b_n - \frac{F_1}{F_2}(b_n - a_n), \quad \text{or} \quad t_2 = a_n + \frac{F_1}{F_2}(b_n - a_n). \]



The sequence of measurement points is determined in accordance with the assumption that each measurement is of lower value than its predecessors. Note that the procedure always calls for the last two measurements to be made at the midpoint of the semifinal interval of uncertainty.
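The Fibonacci search steps can be sketched in MATLAB as follows. This is an illustrative sketch rather than the course's own code: the test function `f`, the interval $[0, 5]$, and the accuracy `delta` are assumptions, and the small offset $\varepsilon$ that separates the last two measurements is omitted, so the final two points coincide at the midpoint.

```matlab
% Fibonacci search for the minimum of a unimodal function on [a, b].
f = @(x) (x - 2).^2;        % hypothetical unimodal test function, minimizer x = 2
a = 0; b = 5;
delta = 1e-3;               % requested accuracy: choose k with F_k >= 1/delta

% Fibonacci numbers with F_0 = F_1 = 1; F(i+1) stores F_i.
F = [1 1];
while F(end) < 1/delta
    F(end+1) = F(end) + F(end-1);
end
k = numel(F) - 1;           % index of the largest Fibonacci number used

% Step 1: initial measurement points (F(k) is F_{k-1}, F(k+1) is F_k).
t1 = b - F(k)/F(k+1)*(b - a);
t2 = a + F(k)/F(k+1)*(b - a);
f1 = f(t1); f2 = f(t2);

% Steps 2..: shrink the interval, reusing one measurement per iteration.
for j = k-1:-1:2
    if f1 < f2
        b = t2; t2 = t1; f2 = f1;            % keep [a, t2]
        t1 = b - F(j)/F(j+1)*(b - a);
        f1 = f(t1);
    else
        a = t1; t1 = t2; f1 = f2;            % keep [t1, b]
        t2 = a + F(j)/F(j+1)*(b - a);
        f2 = f(t2);
    end
end
xmin = (a + b)/2;
fprintf('k = %d, final interval [%.5f, %.5f], xmin = %.5f\n', k, a, b, xmin);
```

Each pass evaluates `f` only once, because one of the two interior points is carried over from the previous interval.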



Golden Section Search

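Suppose $x_{n+2} = c_1 x_{n+1} + c_2 x_n$. If we can transform it into $x_{n+2} - r x_{n+1} = s(x_{n+1} - r x_n)$, then $x_{n+2} = (r + s)x_{n+1} - rs\,x_n$. Comparing coefficients gives $r + s = c_1$ and $rs = -c_2$, and eliminating $s$ (or $r$) yields the characteristic equation $r^2 - c_1 r - c_2 = 0$.

For the Fibonacci difference equation ($c_1 = c_2 = 1$),

\[ \tau^2 - \tau - 1 = 0 \;\Rightarrow\; \tau_1 = \frac{1 + \sqrt{5}}{2}, \quad \tau_2 = \frac{1 - \sqrt{5}}{2}, \]

where $\tau_1$ is known as the golden section ratio. The solution to the Fibonacci difference equation $F_N = F_{N-1} + F_{N-2}$ is of the form

\[ F_N = A\tau_1^N + B\tau_2^N \;\Rightarrow\; \lim_{N \to \infty} \frac{F_{N-1}}{F_N} = \frac{1}{\tau_1} \simeq 0.618. \]

As a consequence, we have

\[ d_k = \left( \frac{1}{\tau_1} \right)^{k-1} d_1, \qquad \frac{d_{k+1}}{d_k} = \frac{1}{\tau_1} \simeq 0.618. \]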



Golden Section Search Algorithm (Minimization)

1 Initialize $a$, $b$, and $\varepsilon$. Let $\lambda = 0.618$. Compute $x_1$ and $x_2$ by

\[ x_1 = a + (1 - \lambda)(b - a), \qquad x_2 = a + \lambda(b - a), \]

and $f_1 = f(x_1)$, $f_2 = f(x_2)$.

2 If $f(x_1) < f(x_2)$, set $b = x_2$, $x_2 = x_1$, $f_2 = f_1$ and compute $x_1$ by

\[ x_1 = a + (1 - \lambda)(b - a), \]

and $f_1 = f(x_1)$. If $f(x_1) \ge f(x_2)$, set $a = x_1$, $x_1 = x_2$, $f_1 = f_2$ and compute $x_2$ by

\[ x_2 = a + \lambda(b - a), \]

and $f_2 = f(x_2)$.

3 If $|a - b| < \varepsilon$, then $x^* = 0.5(a + b)$, $f^* = f(x^*)$; else go to Step 2.



Example 2.28

By using Fibonacci Search, maximize the function

\[ f(x) = -|2 - x| - |5 - 4x| - |8 - 9x| \]

over the interval $[0, 3]$. The user-specified tolerance level is $\varepsilon = 0.1$, and the first two experimental points are $x_1 = 1.147$ and $x_2 = 1.853$.

Iteration   x1       x2       f(x1)     f(x2)      Interval
2           0.7059   1.1471   -3.5882   -11.2353   [0.0000, 1.8529]
3           1.1471   1.4118   -5.1176   -3.5882    [0.7059, 1.8529]
4           0.9706   1.1471   -3.5882   -5.9412    [0.7059, 1.4118]
5           0.8824   0.9706   -2.8824   -3.5882    [0.7059, 1.1471]
6           0.7941   0.8824   -2.6471   -2.8824    [0.7059, 0.9706]
7           0.8824   0.8824   -3.8824   -2.6471    [0.7941, 0.9706]

In each row, the listed values $f(x_1)$ and $f(x_2)$ are those of the two points compared in the preceding step (for iteration 2, the given starting points), and the interval is the one retained after that comparison.

Example 2.29

Solve the previous problem by using Golden Section Search.
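One possible way to carry out Example 2.29 is sketched below. It applies the golden section steps to the negated function, since the algorithm is stated for minimization; the variable names are illustrative, and `lambda` is computed as $(\sqrt{5} - 1)/2$ rather than rounded to 0.618.

```matlab
% Golden section search for Example 2.28/2.29: maximize f by minimizing fneg = -f.
fneg = @(x) abs(2 - x) + abs(5 - 4*x) + abs(8 - 9*x);
a = 0; b = 3;
tol = 0.1;                        % tolerance on the interval width
lambda = (sqrt(5) - 1)/2;         % 1/tau_1, approximately 0.618

x1 = a + (1 - lambda)*(b - a);
x2 = a + lambda*(b - a);
f1 = fneg(x1); f2 = fneg(x2);

while abs(b - a) >= tol
    if f1 < f2
        b = x2; x2 = x1; f2 = f1;             % keep [a, x2]
        x1 = a + (1 - lambda)*(b - a);
        f1 = fneg(x1);
    else
        a = x1; x1 = x2; f1 = f2;             % keep [x1, b]
        x2 = a + lambda*(b - a);
        f2 = fneg(x2);
    end
end
xstar = 0.5*(a + b);
fstar = -fneg(xstar);             % value of the original (maximized) function
fprintf('x* = %.4f, f(x*) = %.4f, interval [%.4f, %.4f]\n', xstar, fstar, a, b);
```

The exact maximizer is the kink at $x = 8/9$, so the returned midpoint should lie within half the tolerance of that value.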



Newton’s Method

In most problems, however, it can be safely assumed that the function being searched, as well as being unimodal, possesses a certain degree of smoothness, and one might, therefore, expect that more efficient search techniques exploiting this smoothness can be devised.

Suppose that the function $f$ of a single variable $x$ is to be minimized, and it is possible to evaluate $f(x_k)$, $f'(x_k)$, $f''(x_k)$. Fit the quadratic

\[ q(x) = f(x_k) + f'(x_k)(x - x_k) + \tfrac{1}{2} f''(x_k)(x - x_k)^2, \]

\[ q'(x) = 0 \;\Rightarrow\; x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}. \]

Letting $g(x) \equiv f'(x)$, we have

\[ x_{k+1} = x_k - \frac{g(x_k)}{g'(x_k)}. \]

The method can more simply be viewed as a technique for iteratively solving equations of the form $g(x) = 0$ by letting $g(x) \equiv f'(x)$.
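The Newton iteration can be tried on a small example. In the sketch below the objective $f(x) = e^x - 2x$ (minimizer $\ln 2$), the starting point, and the stopping tolerance are all assumptions made for illustration.

```matlab
% Newton's method for a one-dimensional minimization problem.
df  = @(x) exp(x) - 2;      % f'(x) for the hypothetical f(x) = exp(x) - 2*x
d2f = @(x) exp(x);          % f''(x)

x = 0;                      % starting point, assumed close enough to the minimizer
for k = 1:20
    step = df(x)/d2f(x);    % Newton step f'(x_k)/f''(x_k)
    x = x - step;
    if abs(step) < 1e-12
        break;
    end
end
fprintf('x = %.12f after %d iterations (log(2) = %.12f)\n', x, k, log(2));
```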



Method of False Position

Newton's method for minimization is based on fitting a quadratic on the basis of information at a single point. However, by using more points, e.g., $f(x_k)$, $f'(x_k)$, $f'(x_{k-1})$, less information is required at each of them to fit the quadratic

\[ q(x) = f(x_k) + f'(x_k)(x - x_k) + \frac{f'(x_{k-1}) - f'(x_k)}{x_{k-1} - x_k} \cdot \frac{(x - x_k)^2}{2}, \]

\[ x_{k+1} = x_k - f'(x_k) \left[ \frac{x_{k-1} - x_k}{f'(x_{k-1}) - f'(x_k)} \right]. \]

Since this method does not depend on values of $f$ directly, it can be regarded as a method for solving $f'(x) \equiv g(x) = 0$. Viewed in this way the method takes the form

\[ x_{k+1} = x_k - g(x_k) \left[ \frac{x_k - x_{k-1}}{g(x_k) - g(x_{k-1})} \right]. \]
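The iteration in terms of $g$ can be sketched as below. The function $g(x) = e^x - 2$ (the derivative of the illustrative objective $e^x - 2x$), the two starting points, and the tolerance are assumptions, not part of the slides.

```matlab
% Method of false position (secant form) for solving g(x) = f'(x) = 0.
g = @(x) exp(x) - 2;        % hypothetical g; its root is log(2)

xprev = 0; x = 1;           % two starting points
for k = 1:50
    denom = g(x) - g(xprev);
    if denom == 0           % the two points have merged numerically
        break;
    end
    xnew  = x - g(x)*(x - xprev)/denom;
    xprev = x;
    x     = xnew;
    if abs(x - xprev) < 1e-12
        break;
    end
end
fprintf('x = %.12f after %d iterations\n', x, k);
```

Unlike Newton's method, only first derivatives of $f$ are evaluated, at the cost of a somewhat slower order of convergence.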



Definition 2.6

Let the sequence $\{r_k\}_{k=0}^{\infty}$ converge to $r^*$. The order of convergence of $\{r_k\}$ is defined as the supremum of the nonnegative numbers $p$ satisfying

\[ 0 \le \lim_{k \to \infty} \frac{|r_{k+1} - r^*|}{|r_k - r^*|^p} < \infty. \]

Definition 2.7

If the sequence $\{r_k\}_{k=0}^{\infty}$ converges to $r^*$ in such a way that

\[ \lim_{k \to \infty} \frac{|r_{k+1} - r^*|}{|r_k - r^*|} = \beta < 1, \]

the sequence is said to converge linearly to $r^*$ with convergence ratio (or rate) $\beta$.



When comparing the relative effectiveness of two competing algorithms, both of which produce linearly convergent sequences, the comparison is based on their corresponding convergence ratios:

The smaller the ratio, the faster the rate.

The ultimate case where $\beta = 0$ is referred to as superlinear convergence. Convergence of any order greater than one, i.e., $p > 1$, is superlinear, but it is also possible for superlinear convergence to correspond to unity order.

Example 2.30

The sequence $r_k = 1/k$ converges to zero. The convergence is of order one but it is not linear, since $\lim_{k \to \infty} (r_{k+1}/r_k) = 1$, that is, $\beta = 1$.

Example 2.31

The sequence $r_k = (1/k)^k$ is of order one, since $r_{k+1}/(r_k)^p \to \infty$ for $p > 1$. But $\lim_{k \to \infty} r_{k+1}/r_k = 0$ and hence this is superlinear convergence.
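The two limits in Examples 2.30 and 2.31 can be observed numerically. The short sketch below tabulates the successive ratios for the first twelve terms; the range of `k` is an arbitrary choice.

```matlab
% Successive ratios r_{k+1}/r_k for the sequences of Examples 2.30 and 2.31.
k  = (1:12)';
r1 = 1./k;                          % Example 2.30: r_k = 1/k
r2 = (1./k).^k;                     % Example 2.31: r_k = (1/k)^k

ratio1 = r1(2:end)./r1(1:end-1);    % approaches 1: order one, but not linear
ratio2 = r2(2:end)./r2(1:end-1);    % approaches 0: superlinear

disp([k(1:end-1), ratio1, ratio2]);
```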



Theorem 2.7

Let the function $g$ have a continuous second derivative, and let $x^*$ satisfy $g(x^*) = 0$ and $g'(x^*) \ne 0$. Then, provided $x_0$ is sufficiently close to $x^*$, the sequence $\{x_k\}$ generated by Newton's method converges to $x^*$ with an order of convergence at least two.

Proof.

Since $g(x^*) = 0$, we have

\[ x_{k+1} - x^* = x_k - x^* - \frac{g(x_k) - g(x^*)}{g'(x_k)} = -\frac{g(x_k) - g(x^*) + g'(x_k)(x^* - x_k)}{g'(x_k)} = \frac{1}{2} \frac{g''(\xi)}{g'(x_k)} (x_k - x^*)^2 \]

for some $\xi$ between $x^*$ and $x_k$. Therefore, if started close enough to the solution, the method will converge to $x^*$ with an order of convergence at least two.






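The derivative of $f$ at $M_0 = (x_0, y_0, z_0)^t$ along the direction $\ell$, whose direction cosines are $\cos\alpha$, $\cos\beta$, $\cos\gamma$, is

\[ \frac{\partial f(M_0)}{\partial \ell} = \frac{\partial f}{\partial x} \cos\alpha + \frac{\partial f}{\partial y} \cos\beta + \frac{\partial f}{\partial z} \cos\gamma. \]

Proof.

Let $M = (x, y, z)^t$ be an arbitrary point along $\ell$, and let $t = |M_0 M|$. Then we have $x - x_0 = t\cos\alpha$, $y - y_0 = t\cos\beta$, and $z - z_0 = t\cos\gamma$. Writing $\phi(t) = f(x_0 + t\cos\alpha,\, y_0 + t\cos\beta,\, z_0 + t\cos\gamma)$, we obtain

\[ \frac{\partial f(M_0)}{\partial \ell} = \lim_{M \to M_0} \frac{f(M) - f(M_0)}{|M_0 M|} = \lim_{t \to 0} \frac{\phi(t) - \phi(0)}{t} = \phi'(0). \]

On the other hand, by the chain rule we have

\[ \phi'(t) = \frac{\partial f}{\partial x} \cdot \frac{dx}{dt} + \frac{\partial f}{\partial y} \cdot \frac{dy}{dt} + \frac{\partial f}{\partial z} \cdot \frac{dz}{dt} = \frac{\partial f}{\partial x} \cos\alpha + \frac{\partial f}{\partial y} \cos\beta + \frac{\partial f}{\partial z} \cos\gamma. \]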






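Let

\[ a = \frac{\partial f(M_0)}{\partial x}, \qquad b = \frac{\partial f(M_0)}{\partial y}, \qquad c = \frac{\partial f(M_0)}{\partial z}, \]

and let $\ell_1$ be the direction of the vector $(a, b, c)^t$, with direction cosines $\cos\alpha_1 = a/\sqrt{a^2 + b^2 + c^2}$, $\cos\beta_1 = b/\sqrt{a^2 + b^2 + c^2}$, $\cos\gamma_1 = c/\sqrt{a^2 + b^2 + c^2}$. Then we have

\[ \begin{aligned}
\frac{\partial f}{\partial \ell} &= \frac{\partial f}{\partial x} \cos\alpha + \frac{\partial f}{\partial y} \cos\beta + \frac{\partial f}{\partial z} \cos\gamma \\
&= a \cos\alpha + b \cos\beta + c \cos\gamma \\
&= \sqrt{a^2 + b^2 + c^2}\, (\cos\alpha_1 \cos\alpha + \cos\beta_1 \cos\beta + \cos\gamma_1 \cos\gamma) \\
&= \sqrt{a^2 + b^2 + c^2}\, \cos(\ell_1, \ell) \\
&\le \sqrt{\left( \frac{\partial f}{\partial x} \right)^2 + \left( \frac{\partial f}{\partial y} \right)^2 + \left( \frac{\partial f}{\partial z} \right)^2}.
\end{aligned} \]

The factor $\cos(\ell_1, \ell)$ depends only on the direction, and the square root is the length of the gradient. Equality holds when $\ell$ coincides with $\ell_1$.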



Steepest Descent Method (Gradient Method)

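By Taylor's theorem, if $f \in C^1$ in a region containing the line segment $[x_k, x_{k+1}]$, then there is an $\alpha$, $0 \le \alpha \le 1$, such that

\[ f(x_{k+1}) = f(x_k) + \nabla f(\alpha x_k + (1 - \alpha) x_{k+1})(x_{k+1} - x_k). \]

For a short step $d_k = x_{k+1} - x_k$ this gives, to first order,

\[ f(x_{k+1}) \simeq f(x_k) + \nabla f(x_k) d_k. \]

If $d_k = -\nabla f(x_k)^t$, then we have

\[ f(x_{k+1}) \simeq f(x_k) - \|\nabla f(x_k)^t\|^2, \]

which guarantees descent in a minimization problem for a sufficiently short step, provided $\nabla f(x_k)^t \ne 0$; in practice, the iteration is stopped once $\|\nabla f(x_k)^t\|^2 < \varepsilon$. Along the direction $d_k = -\nabla f(x_k)^t$, we need to choose the step size $\alpha_k$ such that

\[ \alpha_k = \arg\min_{\alpha} f\left( x_k - \alpha \nabla f(x_k)^t \right). \]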



The Quadratic Case

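Consider

\[ f(x) = \tfrac{1}{2} x^t H x - x^t b \quad \text{and} \quad E(x) = \tfrac{1}{2} (x - x^*)^t H (x - x^*), \tag{12} \]

where $H$ is a positive definite symmetric $n \times n$ matrix.

The steepest descent method can be expressed as $x_{k+1} = x_k - \alpha_k g_k$, where $g_k = g(x_k) = \nabla f(x_k)^t = H x_k - b$, and $\alpha_k$ minimizes $f(x_k - \alpha g_k)$. By (12), we have

\[ f(x_k - \alpha g_k) = \tfrac{1}{2} (x_k - \alpha g_k)^t H (x_k - \alpha g_k) - (x_k - \alpha g_k)^t b, \]

and thus

\[ \alpha_k = \frac{g_k^t g_k}{g_k^t H g_k} \;\Rightarrow\; x_{k+1} = x_k - \left( \frac{g_k^t g_k}{g_k^t H g_k} \right) g_k. \]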
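The step from the expanded objective to the formula for $\alpha_k$ can be written out as follows. The auxiliary name $\varphi$ is introduced here only for this derivation.

```latex
\begin{aligned}
\varphi(\alpha) &= f(x_k - \alpha g_k)
  = f(x_k) - \alpha\, g_k^t (H x_k - b) + \tfrac{1}{2}\alpha^2\, g_k^t H g_k
  = f(x_k) - \alpha\, g_k^t g_k + \tfrac{1}{2}\alpha^2\, g_k^t H g_k, \\
\varphi'(\alpha) &= -g_k^t g_k + \alpha\, g_k^t H g_k = 0
  \;\Longrightarrow\; \alpha_k = \frac{g_k^t g_k}{g_k^t H g_k}.
\end{aligned}
```

Because $H$ is positive definite, $g_k^t H g_k > 0$ whenever $g_k \ne 0$, so $\varphi$ is a strictly convex quadratic in $\alpha$ and this stationary point is its minimizer.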



Example 2.32

The function $f$ and the steepest descent process can be illustrated as in the following figure by showing contours of constant values of $f$ and a typical sequence developed by the process. The contours of $f$ are $n$-dimensional ellipsoids with axes in the directions of the $n$ mutually orthogonal eigenvectors of $H$.



Lemma 2.1

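The iterative process of the steepest descent method satisfies

\[ E(x_{k+1}) = \left\{ 1 - \frac{(g_k^t g_k)^2}{(g_k^t H g_k)(g_k^t H^{-1} g_k)} \right\} E(x_k). \]

Proof.

Set $y_k = x_k - x^*$, so that $g_k = H x_k - b = H x_k - H x^* = H y_k$. For $E(x_k) = \tfrac{1}{2} y_k^t H y_k$, we have

\[ \frac{E(x_k) - E(x_{k+1})}{E(x_k)} = \frac{2\alpha_k g_k^t H y_k - \alpha_k^2 g_k^t H g_k}{y_k^t H y_k} = \frac{(g_k^t g_k)^2}{(g_k^t H g_k)(g_k^t H^{-1} g_k)}, \]

where the last equality uses $H y_k = g_k$, $y_k^t H y_k = g_k^t H^{-1} g_k$, and $\alpha_k = g_k^t g_k / (g_k^t H g_k)$.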






Lemma 2.2 (Kantorovich inequality)

Let $H$ be a positive definite symmetric $n \times n$ matrix. For any vector $x$ there holds

\[ \frac{(x^t x)^2}{(x^t H x)(x^t H^{-1} x)} \ge \frac{4aA}{(a + A)^2}, \]

where $a$ and $A$ are, respectively, the smallest and largest eigenvalues of $H$.

Proof.

Let the eigenvalues of $H$ satisfy $a = \lambda_1 \le \lambda_2 \le \ldots \le \lambda_n = A$. By an appropriate change of coordinates the matrix $H$ becomes diagonal with diagonal $\lambda_1, \lambda_2, \ldots, \lambda_n$. In this coordinate system we have

\[ \frac{(x^t x)^2}{(x^t H x)(x^t H^{-1} x)} = \frac{\left( \sum_{i=1}^{n} x_i^2 \right)^2}{\left( \sum_{i=1}^{n} \lambda_i x_i^2 \right) \left[ \sum_{i=1}^{n} (x_i^2 / \lambda_i) \right]}, \]






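continued.

which, taking into account the convex combination, can be written as

\[ \frac{(x^t x)^2}{(x^t H x)(x^t H^{-1} x)} = \frac{1 / \left( \sum_{i=1}^{n} \xi_i \lambda_i \right)}{\sum_{i=1}^{n} (\xi_i / \lambda_i)} \equiv \frac{\phi(\xi)}{\psi(\xi)}, \quad \text{where } \xi_i = \frac{x_i^2}{\sum_{i=1}^{n} x_i^2}. \]

The minimum value of this ratio is achieved for some $\lambda = \xi_1 \lambda_1 + \xi_n \lambda_n$ with $\xi_1 + \xi_n = 1$. Using the relation

\[ \frac{\xi_1}{\lambda_1} + \frac{\xi_n}{\lambda_n} = \frac{\lambda_1 + \lambda_n - \xi_1 \lambda_1 - \xi_n \lambda_n}{\lambda_1 \lambda_n}, \]

an appropriate bound is

\[ \frac{\phi(\xi)}{\psi(\xi)} \ge \min_{\lambda_1 \le \lambda \le \lambda_n} \frac{1/\lambda}{(\lambda_1 + \lambda_n - \lambda)/(\lambda_1 \lambda_n)} = \frac{4\lambda_1 \lambda_n}{(\lambda_1 + \lambda_n)^2}, \]

where the minimum is achieved at $\lambda = (\lambda_1 + \lambda_n)/2$.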



continued.

\[ \frac{\phi(\xi)}{\psi(\xi)} = \frac{1 / \left( \sum_{i=1}^{n} \xi_i \lambda_i \right)}{\sum_{i=1}^{n} (\xi_i / \lambda_i)} \]

The curve represents the function $1/\lambda$. Since $\sum_{i=1}^{n} \xi_i \lambda_i$ is a point between $\lambda_1$ and $\lambda_n$, the value of $\phi(\xi)$ is a point on the curve. On the other hand, the value of $\psi(\xi)$ is a convex combination of points on the curve and its value corresponds to a point in the shaded region. For the same vector $\xi$, both functions are represented by points on the same vertical line.
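The Kantorovich inequality can also be checked numerically. The sketch below uses an arbitrary symmetric positive definite matrix `H` and random test vectors, both chosen only for illustration, and compares the smallest observed ratio with the bound $4aA/(a + A)^2$.

```matlab
% Numerical check of the Kantorovich inequality for a sample matrix.
H = [4 1 0; 1 3 1; 0 1 2];          % hypothetical symmetric positive definite matrix
lam = eig(H);
a = min(lam); A = max(lam);
bound = 4*a*A/(a + A)^2;

X   = randn(3, 1000);               % 1000 random test vectors, one per column
num = sum(X.^2, 1).^2;              % (x'x)^2 for each column
den = sum(X.*(H*X), 1) .* sum(X.*(H\X), 1);   % (x'Hx)(x'H^{-1}x)
ratio = num ./ den;

fprintf('bound = %.4f, smallest ratio observed = %.4f\n', bound, min(ratio));
```

The ratio also never exceeds one, which follows from the Cauchy-Schwarz inequality.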



Theorem 2.8 (Steepest descent method–quadratic case)

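For any $x_0 \in \mathbb{R}^n$, the steepest descent method converges to the unique minimum point $x^*$ of $f$. Furthermore, there holds at every step $k$

\[ E(x_{k+1}) \le \left( \frac{A - a}{A + a} \right)^2 E(x_k). \]

Proof.

By Lemma 2.1 and Lemma 2.2, we have

\[ E(x_{k+1}) = \left\{ 1 - \frac{(g_k^t g_k)^2}{(g_k^t H g_k)(g_k^t H^{-1} g_k)} \right\} E(x_k) \le \left[ 1 - \frac{4aA}{(A + a)^2} \right] E(x_k) = \left( \frac{A - a}{A + a} \right)^2 E(x_k). \]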






In terms of convergence rate, Theorem 2.8 states that, with respect to $E(x)$, the steepest descent method converges linearly with a ratio no greater than

\[ \left( \frac{A - a}{A + a} \right)^2. \]

The convergence rate actually depends only on the ratio $r = A/a$. Thus the convergence ratio is

\[ \left( \frac{A - a}{A + a} \right)^2 = \left( \frac{r - 1}{r + 1} \right)^2, \]

which clearly shows that convergence is slowed as $r$ increases. The ratio $r$, which is the single number associated with the matrix $H$ that characterizes convergence, is often called the condition number of the matrix.



Example 2.33

Consider

\[ H = \begin{pmatrix} 0.78 & -0.02 & -0.12 & -0.14 \\ -0.02 & 0.86 & -0.04 & 0.06 \\ -0.12 & -0.04 & 0.72 & -0.08 \\ -0.14 & 0.06 & -0.08 & 0.74 \end{pmatrix}, \qquad b = \begin{pmatrix} 0.76 \\ 0.08 \\ 1.12 \\ 0.68 \end{pmatrix}. \]

Starting with $x_0 = (0, 0, 0, 0)^t$, show that the solution sequence generated by the steepest descent method converges to

\[ x^* = (1.534965, 0.1220097, 1.975156, 1.412954)^t, \qquad f(x^*) = -2.1746595. \]

For this positive definite matrix $H$, we have $a = 0.52$, $A = 0.94$ and hence $r = 1.8$. This is a very favorable condition number and leads to a convergence ratio of 0.081. Thus each iteration will reduce the error in the objective by more than a factor of ten; or, equivalently, each iteration will add about one more digit of accuracy.
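A possible MATLAB treatment of Example 2.33 is sketched below. It uses the exact step size $\alpha_k = g_k^t g_k / (g_k^t H g_k)$ from the quadratic case; the stopping tolerance, the iteration cap, and the use of `H\b` as a reference solution for computing $E(x)$ are assumptions added for the illustration.

```matlab
% Steepest descent with exact line search for the quadratic of Example 2.33.
H = [ 0.78 -0.02 -0.12 -0.14;
     -0.02  0.86 -0.04  0.06;
     -0.12 -0.04  0.72 -0.08;
     -0.14  0.06 -0.08  0.74];
b = [0.76; 0.08; 1.12; 0.68];

f     = @(x) 0.5*x'*H*x - x'*b;
xref  = H\b;                                  % reference solution, used only for E(x)
E     = @(x) 0.5*(x - xref)'*H*(x - xref);
lam   = eig(H);
bound = ((max(lam) - min(lam))/(max(lam) + min(lam)))^2;   % bound of Theorem 2.8

x = zeros(4, 1);
ratios = [];
for k = 1:50
    g = H*x - b;                              % gradient g_k = H*x_k - b
    if norm(g) < 1e-8
        break;
    end
    alpha = (g'*g)/(g'*H*g);                  % exact minimizing step size
    xnew  = x - alpha*g;
    ratios(end+1) = E(xnew)/E(x);             % observed reduction of E per step
    x = xnew;
end
fprintf('iterations = %d, f(x) = %.7f\n', numel(ratios), f(x));
disp(x');
fprintf('largest observed ratio = %.4f, bound = %.4f\n', max(ratios), bound);
```

The observed ratios $E(x_{k+1})/E(x_k)$ should stay below the bound of Theorem 2.8 at every step.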



Steepest descent method–nonquadratic case

To establish estimates of the progress of the gradient method when the Hessian matrix is always positive definite, assume that the Hessian matrix is bounded above and below as $aI \le F(x) \le AI$.

Given a point $x_k$, we have for any $\alpha$,

\[ f(x_k - \alpha g(x_k)) \le f(x_k) - \alpha\, g(x_k)^t g(x_k) + \frac{A\alpha^2}{2}\, g(x_k)^t g(x_k). \]

Minimizing both sides separately with respect to $\alpha$, the inequality will hold for the two minima, that is,

\[ f(x_{k+1}) \le f(x_k) - \frac{1}{2A} \|g(x_k)\|^2, \]

and we thus have

\[ f(x_{k+1}) - f^* \le f(x_k) - f^* - \frac{1}{2A} \|g(x_k)\|^2. \tag{13} \]

In a similar way, for any $x$ there holds

\[ f(x) \ge f(x_k) + g(x_k)^t (x - x_k) + \frac{a}{2} \|x - x_k\|^2. \]

Again we can minimize both sides separately and have

\[ f^* \ge f(x_k) - \frac{1}{2a} \|g(x_k)\|^2 \;\Rightarrow\; -\|g(x_k)\|^2 \le 2a[f^* - f(x_k)]. \tag{14} \]

By (13) and (14), we have

\[ \frac{f(x_{k+1}) - f^*}{f(x_k) - f^*} \le 1 - \frac{a}{A}, \]

which shows that the gradient method makes progress even when the starting point is not close to the optimal solution.



Example 2.34

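To minimize a function $f$, consider solving the equations $\nabla f(x) = 0$ that represent the necessary conditions. We could apply steepest descent to the function $h(x) = \|\nabla f(x)\|^2$. For simplicity, consider the quadratic case,

\[ f(x) = \tfrac{1}{2} x^t Q x - b^t x, \qquad h(x) = x^t Q^2 x - 2 x^t Q b + b^t b. \]

The rate of convergence of steepest descent applied to $h(x)$ will be governed by the eigenvalues of the matrix $Q^2$, whose condition number is $r^2$ when that of $Q$ is $r$. The convergence ratios of the new method and of ordinary steepest descent are therefore

\[ \left( \frac{r^2 - 1}{r^2 + 1} \right)^2 \simeq \left( 1 - \frac{1}{r^2} \right)^4, \qquad \left( \frac{r - 1}{r + 1} \right)^2 \simeq \left( 1 - \frac{1}{r} \right)^4. \]

Since, for large $r$,

\[ \left( 1 - \frac{1}{r^2} \right)^r \simeq 1 - \frac{1}{r}, \]

it takes about $r$ steps of the new method to equal one step of ordinary steepest descent.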



Newton’s Method

The function $f$ being minimized is approximated locally by a quadratic function, and this approximate function is minimized exactly. Thus near $x_k$, we can approximate $f$ by the truncated Taylor series,

\[ f(x) \simeq f(x_k) + \nabla f(x_k)(x - x_k) + \tfrac{1}{2} (x - x_k)^t F(x_k)(x - x_k). \]

The right-hand side is minimized at

\[ x_{k+1} = x_k - [F(x_k)]^{-1} \nabla f(x_k)^t, \]

and this equation is the pure form of Newton's method.

In view of the second-order sufficiency conditions for a minimum point, we assume that at a relative minimum point, $x^*$, the Hessian matrix, $F(x^*)$, is positive definite.
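The pure Newton iteration can be sketched in MATLAB as below. The objective $f(x) = e^{x_1 + x_2} + x_1^2 + 2x_2^2$, the starting point, and the tolerance are assumptions chosen so that the Hessian is positive definite everywhere; the linear system is solved with the backslash operator rather than by forming the inverse.

```matlab
% Pure Newton's method for a hypothetical smooth, strictly convex function
% f(x) = exp(x1 + x2) + x1^2 + 2*x2^2.
grad = @(x) [exp(x(1) + x(2)) + 2*x(1);
             exp(x(1) + x(2)) + 4*x(2)];            % gradient (column vector)
hess = @(x) [exp(x(1) + x(2)) + 2, exp(x(1) + x(2));
             exp(x(1) + x(2)),     exp(x(1) + x(2)) + 4];   % Hessian F(x)

x = [0; 0];                          % starting point
for k = 1:20
    g = grad(x);
    if norm(g) < 1e-12
        break;
    end
    x = x - hess(x)\g;               % x_{k+1} = x_k - F(x_k)^{-1} * grad f(x_k)^t
end
fprintf('stopped at iteration %d, norm of gradient = %.2e\n', k, norm(grad(x)));
disp(x');
```

Printing `norm(grad(x))` inside the loop shows the error roughly squaring from one iteration to the next, which is the second-order convergence described in Theorem 2.9.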



Theorem 2.9

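Let $f \in C^3$ on $\mathbb{R}^n$, and assume that at the local minimum point $x^*$, the Hessian $F(x^*)$ is positive definite. Then if started sufficiently close to $x^*$, the points generated by Newton's method converge to $x^*$. The order of convergence is at least two.

Proof.

There are $\rho > 0$, $\beta_1 > 0$, $\beta_2 > 0$ such that for all $x$ with $\|x - x^*\| < \rho$, there holds $\|F(x)^{-1}\| < \beta_1$ and $\|\nabla f(x^*)^t - \nabla f(x)^t - F(x)(x^* - x)\| \le \beta_2 \|x^* - x\|^2$. Suppose $x_k$ is selected with $\beta_1 \beta_2 \|x^* - x_k\| < 1$ and $\|x^* - x_k\| < \rho$. Then, using $\nabla f(x^*) = 0$,

\[ \begin{aligned}
\|x_{k+1} - x^*\| &= \left\| x_k - x^* - [F(x_k)]^{-1} \nabla f(x_k)^t \right\| \\
&= \left\| [F(x_k)]^{-1} \left[ \nabla f(x^*)^t - \nabla f(x_k)^t - F(x_k)(x^* - x_k) \right] \right\| \\
&\le \left\| [F(x_k)]^{-1} \right\| \cdot \beta_2 \|x_k - x^*\|^2 \\
&\le \beta_1 \beta_2 \|x_k - x^*\|^2 < \|x_k - x^*\|.
\end{aligned} \]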




