Optimization class notes MTH-9842


  • 8/14/2019 Optimization class notes MTH-9842


    Part I

    Unconstrained optimization

    Line search method

Trust region method

Recall: Taylor's theorem

1. $f(y) = f(x) + \nabla f(\xi)^T (y - x)$ for some $\xi \in (x, y)$

2. $f(y) = f(x) + \int_0^1 \nabla f(x + t(y - x))^T (y - x)\, dt$

    1 Line search method

1.1 Determine a direction (descent)

Definition:

A direction $p$ at $x_1$ is called a descent direction if $p^T \nabla f(x_1) < 0$.

1.1.1 Steepest descent (negative gradient)

    Rmk:

$D_u f(x) = \lim_{t \to 0} \dfrac{f(x + tu) - f(x)}{t} = \nabla f(x) \cdot u$  (1)

$= |\nabla f(x)| \cos\theta$, when $u$ is a unit vector.

Therefore, if

1. $\theta = 0$, i.e., $u$ is in the direction of the gradient of $f$, then $D_u f(x) = |\nabla f(x)| > 0$ if $\nabla f(x) \neq 0$.

2. $\theta = \pi$, i.e., $u$ is in the opposite direction of the gradient of $f$, then $D_u f(x) = -|\nabla f(x)| < 0$ if $\nabla f(x) \neq 0$.

    Rmk:

1. We only need the first derivative (good)!

2. Sometimes the convergence is very, very slow (bad)!

    3. Any descent direction will work.


1.1.2 Newton's method

Starting at $x_1$, approximate $f$ at $x_1$ by its Taylor polynomial of degree 2:

$f(x) \approx f(x_1) + \nabla f(x_1)^T (x - x_1) + \frac{1}{2}(x - x_1)^T \nabla^2 f(x_1)(x - x_1) =: m(x)$  (2)

A sufficient condition to minimize $m(x)$ is $\nabla m(x) = 0$, i.e.,

$\nabla f(x_1) + \nabla^2 f(x_1)(x - x_1) = 0$  (3)

Side remark:

$f(x) = b^T x \implies \nabla f(x) = b$

$f(x) = \frac{1}{2} x^T Q x \implies \nabla f(x) = \frac{1}{2}(Q + Q^T)x$

NOTE: Only if $Q$ is symmetric does this simplify to $\nabla f = Qx$.

If $\nabla^2 f(x_1)$ is invertible, then $x - x_1 = -(\nabla^2 f(x_1))^{-1} \nabla f(x_1)$.

This direction is called Newton's direction.

    Remarks:

    1. Convergence is quadratic (fast)!

    2. Need to compute Hessian.

3. Why is Newton's direction a descent direction? HOMEWORK: Show that if $\nabla^2 f(x_1)$ is positive definite, then Newton's direction is a descent direction.

4. If $\nabla^2 f(x_1)$ is NOT positive definite (then Newton's direction need not be a descent direction), then we need to fix the Hessian.

    Summary:

    Let

$p_k$: direction at the $k$th stage

$\alpha_k$: step size for the $k$th iteration.

Then

$x_{k+1} = x_k + \alpha_k p_k, \quad k = 0, 1, 2, \ldots$  (4)

Note that

$p_k = -B_k^{-1} \nabla f(x_k)$  (5)

and if

$B_k = I$, then we are using the steepest descent method;

$B_k = \nabla^2 f(x_k)$, then we are using Newton's method.
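The iteration (4) with the two choices of $B_k$ in (5) can be sketched in code. The quadratic test function and the fixed step size below are illustrative assumptions, not from the notes.

```python
# Sketch of iteration (4): x_{k+1} = x_k + alpha * p_k with p_k = -B_k^{-1} grad f(x_k).
# The test function and the fixed step size are illustrative assumptions.

def grad(x):
    # f(x, y) = 2x^2 + y^2, so grad f = (4x, 2y)
    return [4 * x[0], 2 * x[1]]

def step(x, Bk_inv, alpha):
    # x_{k+1} = x_k - alpha * B_k^{-1} grad f(x_k)
    g = grad(x)
    p = [-(Bk_inv[0][0] * g[0] + Bk_inv[0][1] * g[1]),
         -(Bk_inv[1][0] * g[0] + Bk_inv[1][1] * g[1])]
    return [x[0] + alpha * p[0], x[1] + alpha * p[1]]

# Steepest descent: B_k = I, small fixed step
x = [1.0, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(100):
    x = step(x, identity, alpha=0.1)
print(x)  # close to the minimizer (0, 0)

# Newton: B_k = Hessian = diag(4, 2); for a quadratic, one full step suffices
hess_inv = [[0.25, 0.0], [0.0, 0.5]]
x_newton = step([1.0, 1.0], hess_inv, alpha=1.0)
print(x_newton)  # exactly [0.0, 0.0]
```

For this quadratic the Hessian is constant, so the Newton step lands on the minimizer in a single iteration, while steepest descent only approaches it geometrically.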

    1.2 How far do we go along a chosen direction?

The next question is how far to go along a given $p_k$.

Let $\varphi(\alpha) = f(x_k + \alpha p_k)$, $\alpha > 0$.  (6)

Idea: Minimize $\varphi(\alpha)$, i.e., find the step $\alpha$ which minimizes $f$ along $p_k$ for $\alpha > 0$.

Good, but expensive!

How do we guarantee that we arrive at a global and not merely a local minimum?

What we need is:

1. $\varphi(\alpha) < \varphi(0)$

2. $\alpha \gg 0$ (bounded away from 0).

    1.2.1 Criteria for acceptable stepsize

1. Wolfe's conditions

(a) $\varphi(\alpha) \leq \varphi(0) + c_1 \alpha \varphi'(0)$  (Armijo condition)

(b) $\varphi'(\alpha) \geq c_2 \varphi'(0)$, where $0 < c_1 < c_2 < 1$.  (Curvature condition)

2. Goldstein's conditions

(a) $\varphi(0) + (1 - c)\alpha\varphi'(0) \leq \varphi(\alpha) \leq \varphi(0) + c\alpha\varphi'(0)$, $c \in (0, 1/2)$

Backtracking:

Pick $\alpha$; (a) if $\alpha$ is acceptable: done; (b) otherwise go back to smaller steps.

1. For backtracking, do not use the curvature condition.

2. For the Newton direction, the natural initial step is $\alpha = 1$.
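The backtracking scheme above can be sketched as follows. The shrink factor `rho` and the constant `c1` are conventional choices assumed here, not values from the notes.

```python
# Backtracking line search enforcing the Armijo condition (Wolfe (a)).
# The shrink factor rho and constant c1 are conventional, assumed choices.

def backtracking(f, grad_f, x, p, alpha0=1.0, c1=1e-4, rho=0.5):
    """Shrink alpha until f(x + alpha p) <= f(x) + c1 * alpha * grad_f(x)^T p."""
    fx = f(x)
    slope = sum(gi * pi for gi, pi in zip(grad_f(x), p))  # phi'(0), must be < 0
    alpha = alpha0
    while f([xi + alpha * pi for xi, pi in zip(x, p)]) > fx + c1 * alpha * slope:
        alpha *= rho
    return alpha

# Example: f(x, y) = x^2 + y^2, steepest descent direction at (1, 1)
f = lambda x: x[0] ** 2 + x[1] ** 2
grad_f = lambda x: [2 * x[0], 2 * x[1]]
x = [1.0, 1.0]
p = [-g for g in grad_f(x)]  # a descent direction
alpha = backtracking(f, grad_f, x, p)
print(alpha)  # 0.5: alpha = 1 fails the Armijo test, alpha = 0.5 passes
```

Note that, as in remark 1 above, only the Armijo condition is checked: the shrinking itself keeps the step from collapsing to 0 too quickly.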

1.3 Convergence

Theorem (Zoutendijk):

If $f$ is continuously differentiable and $\nabla f$ is Lipschitz continuous (i.e., $\exists L > 0$ s.t. $|\nabla f(y) - \nabla f(x)| \leq L|x - y|$ $\forall x, y$), the steps satisfy Wolfe's conditions, and the directions are descent directions in our iterative process

$x_{k+1} = x_k + \alpha_k p_k$  (7)

then

$\sum_{k=1}^{\infty} \cos^2\theta_k \, \|\nabla f(x_k)\|^2 < \infty.$  (8)

    Remarks:

1. Zoutendijk's theorem implies global convergence:

(a) The sum is finite, which implies the terms go to zero.

(b) If $\cos^2\theta_k > \delta > 0$ $\forall k$, then $\|\nabla f(x_k)\|^2 \to 0$.

(c) For steepest descent, $\cos^2\theta_k = 1 > 0 = \delta$, so steepest descent is always convergent.

(d) For Newton's method, the same holds if the Hessian has uniformly bounded condition number ($A$ invertible, $\kappa(A) = \|A\|\|A^{-1}\|$), i.e., $\|\nabla^2 f(x_k)\|\|\nabla^2 f(x_k)^{-1}\| \leq M$ $\forall x_k$.

2. $\|A\| = \sup_{\|x\| \leq 1} \|Ax\|$.

3. $g$ is Lipschitz continuous in a domain $D$ if $\exists L > 0$ such that $|g(y) - g(x)| \leq L|x - y|$ $\forall x, y \in D$. E.g. (HOMEWORK): if $g \in C^1$ on a compact domain, then $g$ is Lipschitz.

4. $\cos\theta_k = \dfrac{-\nabla f(x_k)^T p_k}{\|\nabla f(x_k)\|\,\|p_k\|}$.

Recall that in Wolfe's conditions $\varphi(\alpha) = f(x_k + \alpha p_k)$, hence $\varphi'(\alpha) = \nabla f(x_k + \alpha p_k)^T p_k$.

We pick $\alpha_k$ such that $\varphi'(\alpha_k) \geq c_2 \varphi'(0) = c_2 \nabla f(x_k)^T p_k$, hence $\nabla f(x_k + \alpha_k p_k)^T p_k \geq c_2 \nabla f(x_k)^T p_k$.

Proof (Zoutendijk):

$(\nabla f(x_{k+1}) - \nabla f(x_k))^T p_k$  (9)

$= \nabla f(x_{k+1})^T p_k - \nabla f(x_k)^T p_k$  (10)

$= \nabla f(x_k + \alpha_k p_k)^T p_k - \nabla f(x_k)^T p_k$  (11)

$\geq c_2 \nabla f(x_k)^T p_k - \nabla f(x_k)^T p_k = (c_2 - 1)\nabla f(x_k)^T p_k$, by Wolfe (ii).  (12)

At the same time,

$|(\nabla f(x_{k+1}) - \nabla f(x_k))^T p_k|$  (13)

$\leq \|\nabla f(x_{k+1}) - \nabla f(x_k)\|\,\|p_k\|$, by Cauchy-Schwarz  (14)

$\leq L\|x_{k+1} - x_k\|\,\|p_k\|$, by Lipschitz continuity  (15)

$= L\alpha_k\|p_k\|^2$, since $x_{k+1} - x_k = \alpha_k p_k$.  (16)

Combining both inequalities, we obtain

$(c_2 - 1)\nabla f(x_k)^T p_k \leq L\alpha_k\|p_k\|^2$

$\implies \alpha_k \geq \dfrac{(c_2 - 1)\nabla f(x_k)^T p_k}{L\|p_k\|^2},$

from the curvature condition; this implies that the steps have to be far enough away from 0.

Now, note that

$f(x_{k+1}) - f(x_k) = f(x_k + \alpha_k p_k) - f(x_k) \leq c_1\alpha_k \nabla f(x_k)^T p_k$, by Wolfe's condition (i),

$\leq c_1 \dfrac{(c_2 - 1)\nabla f(x_k)^T p_k}{L\|p_k\|^2} (\nabla f(x_k)^T p_k) = -\dfrac{c_1(1 - c_2)}{L} \dfrac{(\nabla f(x_k)^T p_k)^2}{\|p_k\|^2}.$

Therefore we have

$f(x_k) - f(x_{k+1}) \geq \dfrac{c_1(1 - c_2)}{L} \|\nabla f(x_k)\|^2 \cos^2\theta_k, \quad \forall k$

$f(x_{k-1}) - f(x_k) \geq \dfrac{c_1(1 - c_2)}{L} \|\nabla f(x_{k-1})\|^2 \cos^2\theta_{k-1}$

$\vdots$

$f(x_0) - f(x_1) \geq \dfrac{c_1(1 - c_2)}{L} \|\nabla f(x_0)\|^2 \cos^2\theta_0$

Adding this telescopic sum, we obtain

$f(x_0) - f(x_{k+1}) \geq \sum_{n=0}^{k} \dfrac{c_1(1 - c_2)}{L} \|\nabla f(x_n)\|^2 \cos^2\theta_n.$  (17)

If $f$ is bounded below, the left-hand side stays bounded as $k \to \infty$, so the series in (8) converges.

    1.4 More on Convergence Rate

Theorem: Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable, and suppose the iterates generated by steepest descent with exact line search converge to $x^*$, at which $\nabla^2 f(x^*) > 0$. Then for all $k$ large enough we have

$f(x_{k+1}) - f(x^*) \leq r^2 (f(x_k) - f(x^*)),$  (18)

where $r \in \left(\dfrac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}, 1\right)$ and $\lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n$ are the eigenvalues of the Hessian.

Sketch of proof: Using a telescopic argument,

$f(x_{k+1}) - f(x^*) \leq r^2(f(x_k) - f(x^*)) \leq r^4(f(x_{k-1}) - f(x^*)) \leq \ldots \leq r^{2(k+1)}(f(x_0) - f(x^*)).$


Remark: Assume that the condition number of $\nabla^2 f(x^*)$ is

$\kappa(\nabla^2 f(x^*)) = \|\nabla^2 f(x^*)\|\,\|\nabla^2 f(x^*)^{-1}\| = \dfrac{\lambda_n}{\lambda_1} = 800$, say.

Then

$\dfrac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1} = \dfrac{799}{801}.$

Then

$f(x_k) - f(x^*) \leq \left(\dfrac{799}{801}\right)^{2k} (f(x_0) - f(x^*)),$ but for $k = 500$,

$\left(\dfrac{799}{801}\right)^{2k} = \left(\dfrac{799}{801}\right)^{1000} \approx 0.08,$

so after 500 iterations the error bound has only shrunk by a factor of about 0.08, which is very slow! A disadvantage of linear convergence!
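The arithmetic in this remark is a one-liner to check:

```python
# Check the linear-rate arithmetic above: with condition number 800,
# the contraction factor is 799/801 per half-step of the bound, so after
# 500 iterations the error bound has shrunk by (799/801)^1000.
rate = 799 / 801
after_500_steps = rate ** 1000
print(round(after_500_steps, 3))  # ≈ 0.082
```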

2 Trust region method

In contrast to the line search method, the trust region method determines the direction and the step length simultaneously. Let $x_k$ be the current position within a region (the trust region), usually a disk $B(x_k, \Delta_k) := \{x : \|x - x_k\| \leq \Delta_k\}$.

Goal: $\min_x f(x)$, $x \in \mathbb{R}^n$.

The objective function $f$ has a good approximation $m(p)$, called the model function. Usually we pick

$m(p) = f(x_k) + \nabla f(x_k)^T p + \frac{1}{2} p^T \nabla^2 f(x_k) p,$

the second order Taylor expansion of $f$ at $x_k$.

That is the ideal case, but sometimes the Hessian is hard to find; therefore, in practice we use instead

$m(p) = f(x_k) + g_k^T p + \frac{1}{2} p^T B_k p.$

We choose $x_{k+1}$ to be the point that minimizes the model function $m(p)$ within the trust region, provided the objective decreases enough at the new point.

Remark: If the value of the objective function at the new point is not small enough, then we reject the step, shrink $\Delta_k$, and redo the constrained optimization.

2.1 Pseudo-Algorithm

Given a maximal radius $\bar{\Delta}$, pick $\Delta_0 \in (0, \bar{\Delta})$ and $\eta \in [0, \frac{1}{4})$.

1. Define

$\rho_k = \dfrac{f(x_k) - f(x_{k+1})}{m(0) - m(p_k)}, \quad x_{k+1} = x_k + p_k$


2. If $\rho_k < 0$, then reject the step, because $f(x_{k+1}) > f(x_k)$, and shrink $\Delta_k$.

3. If $\rho_k \approx 1$, accept the step and expand $\Delta_k$.

4. If $\rho_k > 0$ but close to 0, then reject the step and redo with smaller $\Delta_k$.

Pseudo-code:

Input: $\bar{\Delta}$, $\Delta_0 \in (0, \bar{\Delta})$, $\eta \in [0, \frac{1}{4})$

Compute $p_k$ by solving $\min_p m(p)$, $\|p\| \leq \Delta_k$.

Calculate $\rho_k = \dfrac{f(x_k) - f(x_{k+1})}{m(0) - m(p_k)}$.

If $\rho_k \geq \frac{3}{4}$ and $\|p_k\| = \Delta_k$, then $\Delta_{k+1} = \min\{2\Delta_k, \bar{\Delta}\}$, else $\Delta_{k+1} = \Delta_k$.

If $\rho_k > \eta$, then $x_{k+1} = x_k + p_k$, else $x_{k+1} = x_k$.
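A minimal sketch of this loop in one dimension, where the model subproblem $\min m(p)$, $|p| \leq \Delta$ can be solved exactly. The test function, the starting point, and the shrink factor 1/4 are illustrative assumptions (the shrink branch is a standard choice, not spelled out in the pseudo-code above).

```python
# One-dimensional trust-region loop following the pseudo-code above.
# f, the starting point, and the shrink factor are illustrative assumptions.

def f(x):
    return x ** 4 - 2 * x ** 2 + x  # a nonconvex test function

def fprime(x):
    return 4 * x ** 3 - 4 * x + 1

def fsecond(x):
    return 12 * x ** 2 - 4

def solve_model(g, B, delta):
    """Minimize m(p) = g*p + 0.5*B*p^2 over |p| <= delta (1-D, exact)."""
    candidates = [-delta, delta]
    if B > 0 and abs(-g / B) <= delta:
        candidates.append(-g / B)  # unconstrained minimizer, if interior
    return min(candidates, key=lambda p: g * p + 0.5 * B * p * p)

x, delta, delta_max, eta = 2.0, 1.0, 2.0, 0.1
for _ in range(50):
    g, B = fprime(x), fsecond(x)
    p = solve_model(g, B, delta)
    m_decrease = -(g * p + 0.5 * B * p * p)          # m(0) - m(p)
    rho = (f(x) - f(x + p)) / m_decrease if m_decrease > 0 else -1.0
    if rho >= 0.75 and abs(p) == delta:
        delta = min(2 * delta, delta_max)            # expand the region
    elif rho < 0.25:
        delta *= 0.25                                # shrink (standard choice)
    if rho > eta:
        x = x + p                                    # accept the step
print(x, fprime(x))  # near a stationary point: f'(x) ≈ 0
```

Starting from $x = 2$ the iterates settle on the local minimizer near $x \approx 0.84$, where the model step degenerates to a Newton step and the radius updates become inactive.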


    2.2 How to solve (or approximate) the model problem quickly?

    The model problem is

$\min_p m(p) = f_k + g_k^T p + \frac{1}{2} p^T B_k p, \quad \text{s.t. } \|p\| \leq \Delta_k,$  (19)

where $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, $B_k = \nabla^2 f(x_k)$.

Remark: The optimal solution $p^*$ of (19) is characterized by the existence of $\lambda$ such that

1. $p^*$ is feasible (i.e. $\|p^*\| \leq \Delta_k$)

2. $\lambda \geq 0$

3. $\lambda(\Delta_k - \|p^*\|) = 0$

4. $(B_k + \lambda I)p^* = -g_k$

5. $B_k + \lambda I \succeq 0$.

2.3 Three approaches to approximating the solution

1. Cauchy Point: steepest descent

    2. Dogleg method

    3. Two dimensional minimization


    2.4 Cauchy Point

$m(p) = f_k + g_k^T p + \frac{1}{2} p^T B_k p.$

The Cauchy point is the minimum point of $m(p)$ in the steepest descent direction such that $\|p\| \leq \Delta$. Define

$\varphi(\tau) = f - \tau|g|^2 + \frac{\tau^2}{2}\, g^T B_k g.$

Minimize $\varphi(\tau)$ s.t. $0 \leq \tau|g| \leq \Delta$.

The Cauchy point $p_c$ is defined by

$p_c = -\tau^* g$, where $\tau^*$ is the minimum point of $\varphi$.
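Since $\varphi$ is a one-dimensional quadratic in $\tau$, the Cauchy point has a closed form: go to the boundary if $g^T B g \leq 0$, otherwise take the unconstrained minimizer $\tau = |g|^2 / (g^T B g)$ clipped to the region. The pure-Python helpers below are illustrative.

```python
# Cauchy point: minimize phi(tau) = f - tau*|g|^2 + (tau^2/2) g^T B g
# over 0 <= tau <= Delta/|g|, then return p_c = -tau * g.
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cauchy_point(g, B, delta):
    """Minimizer of the model along -g within the trust region."""
    gnorm2 = dot(g, g)
    gBg = dot(g, [dot(row, g) for row in B])   # g^T B g
    tau_max = delta / math.sqrt(gnorm2)        # tau at the region boundary
    if gBg <= 0:
        tau = tau_max                          # phi decreasing: go to boundary
    else:
        tau = min(gnorm2 / gBg, tau_max)       # unconstrained min, clipped
    return [-tau * gi for gi in g]

# Example: g = (1, 0), B = identity, Delta = 2: unconstrained tau = 1
p = cauchy_point([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], delta=2.0)
print(p)  # [-1.0, -0.0]
```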

    2.5 Dogleg method

If $\Delta$ is small, then $m(p) \approx f + g^T p$: the Cauchy point is a good approximation.

If $\Delta$ is not small, then the quadratic term plays a role: $m(p) = f_k + g_k^T p + \frac{1}{2} p^T B_k p$.

The unconstrained minimum of $m(p)$ is given by $\nabla m(p) = 0$.

Let $\nabla m(p) = g + Bp = 0 \implies p^B = -B^{-1}g$ (if $B > 0$).

See graph in book, page 74.

We follow a kinked path, called the dogleg path, which is parametrized by

$p(\tau) = \begin{cases} \tau p^U, & 0 \leq \tau \leq 1, \\ p^U + (\tau - 1)(p^B - p^U), & 1 \leq \tau \leq 2, \end{cases}$

where $p^U$ is the minimizer of $m$ along the steepest descent direction.

    2.6 Two dimensional problem

The model problem is solving

$\min_p f + g^T p + \frac{1}{2} p^T B p,$

where $\|p\| \leq \Delta$, $p \in \operatorname{span}\{p^U, p^B\}$.

Remark: The 2-dimensional minimization improves the minimum found in the dogleg method, which in turn improves the Cauchy point:

2-dim min $\geq$ dogleg $\geq$ Cauchy point (in quality).


    Part II

    Constrained Optimization

    3 Mathematical programming

If $f : \mathbb{R}^n \to \mathbb{R}$, the general mathematical programming optimization problem is often denoted by (MP), and it is the following problem:

    min f(x) (20)

    subject to m equality constraints

$g_1(x) = 0$
$g_2(x) = 0$
$\vdots$
$g_m(x) = 0$

and $r$ inequality constraints:

$h_1(x) \leq 0$
$h_2(x) \leq 0$
$\vdots$
$h_r(x) \leq 0$

If $f, g_1, g_2, \ldots, g_m, h_1, \ldots, h_r$ are linear (affine) functions, then the problem is called Linear Programming and denoted (LP).

    The (LP) problem can then be written in matrix form:

$\min c^T x$  (21)

such that $A_{eq}x + b_{eq} = 0$ and $A_{ineq}x + b_{ineq} \leq 0$.

    3.1 Standard Linear programming

The standard linear programming problem can then be written as:

$\min c^T x \quad \text{s.t.} \quad Ax = b, \; x \geq 0.$

    3.2 Simplex Method

Consider the following problem, which is solved using the simplex method. The tables below are called tableaux.

    10

  • 8/14/2019 Optimization class notes MTH-9842

    11/25

    (1) Consider the following linear programming problem:

$\max \; 4x_1 + 3x_2$

subject to
$3x_1 + x_2 \leq 9$
$3x_1 + 2x_2 \leq 10$
$x_1 + x_2 \leq 4$
$x_1, x_2 \geq 0.$

Transform this problem into the standard form. How many basic solutions does the standard form problem have? What are the basic feasible solutions and what are the extreme points of the feasible region? Solve the problem by the simplex method. NOTE: Basic solutions are the points which satisfy the equality constraints but are not necessarily nonnegative (if a basic solution is nonnegative, then it is called basic feasible).

    By adding slack variables, this problem can be written in standard LP form:

$\max \; 4x_1 + 3x_2$  (22)

subject to
$3x_1 + x_2 + x_3 = 9$
$3x_1 + 2x_2 + x_4 = 10$
$x_1 + x_2 + x_5 = 4$
$x_1, x_2, x_3, x_4, x_5 \geq 0.$

The tableau for this linear programming problem is given by:

Basic var. |  x1   x2   x3   x4   x5 | RHS
Z          |  -4   -3    0    0    0 |   0
x3         |   3    1    1    0    0 |   9
x4         |   3    2    0    1    0 |  10
x5         |   1    1    0    0    1 |   4

We choose x1 to be our entering variable since it has the most negative coefficient in the objective row. We perform the ratio test and obtain 3, 10/3, 4, respectively, so we select x3 as our leaving variable and 3 as our pivot. Therefore, we obtain:

Basic var. |  x1    x2    x3   x4   x5 | RHS
Z          |   0  -5/3   4/3    0    0 |  12
x1         |   1   1/3   1/3    0    0 |   3
x4         |   0     1    -1    1    0 |   1
x5         |   0   2/3  -1/3    0    1 |   1

The entering variable is the only remaining variable with a negative coefficient in the objective row, x2; performing the ratio test we obtain 9, 1, 3/2, respectively. Therefore, the leaving variable is x4 and the pivot is 1, which leads to tableau #3:

Basic var. |  x1   x2    x3    x4   x5 |  RHS
Z          |   0    0  -1/3   5/3    0 | 41/3
x1         |   1    0   2/3  -1/3    0 |  8/3
x2         |   0    1    -1     1    0 |    1
x5         |   0    0   1/3  -2/3    1 |  1/3


Since x3 now has a negative coefficient in the objective row, we choose x3 as the entering variable. We perform the ratio test on the positive entries in that column and obtain 4 and 1, respectively, so x5 is our leaving variable and 1/3 is our pivot element, which leads to our last tableau:

Basic var. |  x1   x2   x3   x4   x5 | RHS
Z          |   0    0    0    1    1 |  14
x1         |   1    0    0    1   -2 |   2
x2         |   0    1    0   -1    3 |   2
x3         |   0    0    1   -2    3 |   1

Since all objective row coefficients are nonnegative, we have found an optimal solution, namely 14, attained at x1 = 2, x2 = 2, x3 = 1.
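The counts asked for in the problem statement can be checked by brute force: enumerate all choices of three basic variables, solve the resulting 3x3 system exactly, and test nonnegativity. The helper names are mine; exact arithmetic uses the standard library `fractions` module.

```python
# Enumerate basic solutions of the standard-form LP above:
# pick 3 basic columns out of 5, solve A_B x_B = b exactly, check x >= 0.
from fractions import Fraction
from itertools import combinations

A = [[3, 1, 1, 0, 0],
     [3, 2, 0, 1, 0],
     [1, 1, 0, 0, 1]]
b = [9, 10, 4]
c = [4, 3, 0, 0, 0]

def solve3(M, rhs):
    """Gauss-Jordan elimination with exact fractions; None if singular."""
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(3):
        piv = next((r for r in range(col, 3) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(3):
            if r != col and M[r][col] != 0:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [row[3] for row in M]

basic, feasible = [], []
for cols in combinations(range(5), 3):
    xB = solve3([[A[i][j] for j in cols] for i in range(3)], b)
    if xB is None:
        continue
    x = [Fraction(0)] * 5
    for j, v in zip(cols, xB):
        x[j] = v
    basic.append(x)
    if all(v >= 0 for v in x):
        feasible.append(x)

best = max(feasible, key=lambda x: sum(ci * xi for ci, xi in zip(c, x)))
print(len(basic), len(feasible))  # 10 basic solutions, 5 of them feasible
print([int(v) for v in best[:2]], sum(ci * xi for ci, xi in zip(c, best)))  # [2, 2] 14
```

The basic feasible solutions found this way are exactly the extreme points of the feasible region, and the best one reproduces the simplex answer above.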



    4 Convex Optimization

    NOTE: See Convex Optimization book chapters 4 and 5.

A convex optimization problem is one of the form

$\min f_0(x)$

subject to
$f_i(x) \leq 0, \quad i = 1, \ldots, m$
$a_i^T x = b_i, \quad i = 1, \ldots, p,$  (23)

where $f_0, \ldots, f_m$ are convex functions.

A fundamental property of convex optimization problems is that any locally optimal point is also (globally) optimal. The proof is by contradiction, using convexity on the segment from $x$ to $y$: for $z = (1 - \theta)x + \theta y$,

$f_0(z) \leq (1 - \theta)f_0(x) + \theta f_0(y) < f_0(x).$

More notes can be found in Boyd and Vandenberghe, pp. 136-151.

    4.1 Chebyshev problem

Assume $X$ is discrete and finite, $X \in \{x_1, x_2, \ldots, x_n\}$, with

$P[X = x_i] = p_i.$

Given that $f_0(x_i) = a_{0i}$, $f_1(x_i) = a_{1i}$, ..., $f_m(x_i) = a_{mi}$,

and

$E[f_k(X)] = \sum_{i=1}^n p_i a_{ki} \quad \forall k,$

the Chebyshev problem is to find

$\min E[f_0(X)]$

$\text{s.t. } \alpha_k \leq E[f_k(X)] \leq \beta_k, \quad k = 1, \ldots, m.$

    4.1.1 Example 1

    I could not add as a note into the book.

Let

$f_0(x) = x$

$f_1(x) = 1_{(\alpha, \infty)}(x),$

then the problem becomes

$\min E[X] = \sum p_i x_i$

s.t.

$\alpha_1 \leq E[1_{(\alpha, \infty)}(X)] = P[X > \alpha] = \sum_{\{x_i > \alpha\}} p_i \leq \beta_1.$

    4.1.2 Example 2: Forward rate

Let $X$ be the price of an asset.

What is the max/min of a call on $X$ struck at $K$, given $E[X]$?

$\max_p \text{ or } \min_p \; E[(X - K)^+] = \sum_{i=1}^n (x_i - K)^+ p_i$

s.t.

$E[X] = \sum_{i=1}^n x_i p_i = F, \quad p \geq 0, \quad \sum p_i = 1.$

It is more interesting / challenging if the constraint is a quoted call price:

$E[(X - K_1)^+] = C_1, \quad \text{i.e.} \quad \sum_{i=1}^n (x_i - K_1)^+ p_i = C_1.$

What could be added to make the problem more realistic?

Adding variance, the problem becomes quadratic optimization.
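Because this LP has only two equality constraints, its extreme points put mass on at most two states, so the price bounds can be found by enumerating pairs. A small sketch; the grid of prices and the forward value are illustrative assumptions.

```python
# Bounds on E[(X-K)^+] subject to E[X] = F, sum p_i = 1, p >= 0.
# Extreme points of this feasible set put mass on at most two states,
# so it suffices to enumerate pairs (i, j). Illustrative data below.
from itertools import combinations

xs = [80.0, 90.0, 100.0, 110.0, 120.0]  # possible prices (assumed grid)
F, K = 100.0, 100.0                      # forward and strike (assumed)

def payoff(x):
    return max(x - K, 0.0)

values = []
for i, j in combinations(range(len(xs)), 2):
    xi, xj = xs[i], xs[j]
    pi = (F - xj) / (xi - xj)            # solve pi*xi + pj*xj = F, pi + pj = 1
    pj = 1.0 - pi
    if pi >= 0 and pj >= 0:
        values.append(pi * payoff(xi) + pj * payoff(xj))
# single-state solutions: p_i = 1 requires x_i = F
values += [payoff(x) for x in xs if x == F]

print(min(values), max(values))  # min 0.0 (intrinsic value), max 10.0
```

The lower bound 0 is Jensen's inequality, $(F - K)^+ \leq E[(X - K)^+]$; the upper bound is attained by splitting mass between the two endpoints of the grid.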


    4.2 Quadratic problem QP

    Quadratic Objective function:

$\min \frac{1}{2} x^T Q x + p^T x + r$

s.t.

$Ax = b, \quad Gx \leq h$  (affine constraints)

    4.2.1 Quadratic constraint quadratic problem QCQP

$\min \frac{1}{2} x^T Q x + p^T x + r$

s.t.

$Ax = b$

and

$\frac{1}{2} x^T Q_i x + p_i^T x + r_i \leq 0, \quad i = 1, 2, \ldots, m.$

    4.2.2 Markowitz portfolio optimization

Let $X_1, X_2, \ldots, X_n$ be the returns of the risky assets, with mean returns

$\mu_1, \mu_2, \ldots, \mu_n$

and covariance matrix $\Sigma$, $\Sigma_{i,j} = E[(X_i - \mu_i)(X_j - \mu_j)]$.

Let $P := (p_1, p_2, \ldots, p_n)$ be the portfolio of these assets, i.e., the amount invested in each asset $1, 2, \ldots, n$.

Then $\mu^T p$ is the expected return of the portfolio and

$p^T \Sigma p$

is the variance of the portfolio $P$.

Consider the problem

$\min_p p^T \Sigma p$  (minimize risk)

such that

$\mu^T p \geq r_{\min}$  (expected return no less than $r_{\min}$)

$\sum_{i=1}^n p_i \leq 1$  (budget constraint)

$P[p^T X \leq \alpha] \leq \beta$  (shortfall constraint)
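To make the objective concrete, here is the risk/return arithmetic $\mu^T p$ and $p^T \Sigma p$ for a fixed two-asset portfolio; the means and covariance matrix are made-up illustrative numbers.

```python
# Risk/return of a fixed two-asset portfolio: mu^T p and p^T Sigma p.
# The means and covariance below are made-up illustrative numbers.
mu = [0.10, 0.04]                      # mean returns
sigma = [[0.04, 0.01],                 # covariance matrix (symmetric, PSD)
         [0.01, 0.02]]
p = [0.5, 0.5]                         # equal-weight portfolio

expected_return = sum(m * w for m, w in zip(mu, p))
variance = sum(p[i] * sigma[i][j] * p[j]
               for i in range(2) for j in range(2))
print(round(expected_return, 6), round(variance, 6))  # 0.07 0.02
```

The Markowitz problem above then searches over the weights $p$ for the smallest such variance subject to the return, budget, and shortfall constraints.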


    5 Duality Theory

Problem:

$\min f_0(x)$  (24)

such that
$f_i(x) \leq 0, \quad i = 1, \ldots, m$
$h_i(x) = 0, \quad i = 1, \ldots, p$

Let $x \in D$, which is the intersection of the domains of all the functions.

Let

$L(x, \lambda, \nu) := f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) = f_0(x) + \langle \lambda, f \rangle + \langle \nu, h \rangle,$

where

$L : D \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$

is the Lagrangian.

Define

$g(\lambda, \nu) := \inf_x L(x, \lambda, \nu) = \inf_x \left\{ f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right\}.$

This function $g$ is the dual function of (24).

Remark:

1. $g$ is concave, since it is the pointwise infimum of functions that are affine in $(\lambda, \nu)$.

2. Let $p^*$ be the optimal value of (24); then

$p^* \geq g(\lambda, \nu) \quad \forall \lambda \geq 0.$

Proof: Let $\lambda \geq 0$ and let $x$ be feasible. Then

$L(x, \lambda, \nu) := f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \leq f_0(x),$

because $\lambda_i \geq 0$, $f_i(x) \leq 0$, and $h_i(x) = 0$, since $x$ is feasible. Hence,

$\inf_{y \in D} L(y, \lambda, \nu) \leq L(x, \lambda, \nu) \leq f_0(x) \quad \forall x \text{ feasible and } \lambda \geq 0,$

but

$\inf_{y \in D} L(y, \lambda, \nu) = g(\lambda, \nu),$

so that

$g(\lambda, \nu) \leq p^*.$


5.0.3 Example: dual function for LP

$\min c^T x \quad \text{s.t. } Ax = b, \; x \geq 0.$ Rewrite $x \geq 0$ instead as $-x \leq 0$.

Then

$L(x, \lambda, \nu) := c^T x + \lambda^T(-x) + \nu^T(Ax - b)$

and

$g(\lambda, \nu) = \inf_x \{(c - \lambda + A^T\nu)^T x\} - \nu^T b = \begin{cases} -\nu^T b, & c - \lambda + A^T\nu = 0, \\ -\infty, & \text{otherwise.} \end{cases}$

Since $(c - \lambda + A^T\nu)^T x$ is linear in $x$ with no constraints, its infimum is $-\infty$ unless the coefficient vanishes. To avoid that, we need to have

$c - \lambda + A^T\nu = 0,$

or equivalently: $A^T\nu = \lambda - c.$

5.0.4 Example: trust-region problem

$\min x^T A x + b^T x \quad \text{s.t. } x^T x \leq \Delta^2,$ where $\Delta$ is the radius of the trust region.

Find the dual function of the problem.

Solution:

$L(x, \lambda) := x^T A x + b^T x + \lambda(x^T x - \Delta^2)$

and

$g(\lambda) = \inf_{x \in \mathbb{R}^n} L(x, \lambda) = \inf_x \{x^T A x + b^T x + \lambda(x^T x - \Delta^2)\} = \inf_x \{x^T(A + \lambda I)x + b^T x\} - \lambda\Delta^2.$

Then, since $A$ is symmetric, we obtain

$\nabla_x \left( x^T(A + \lambda I)x + b^T x - \lambda\Delta^2 \right) = 2(A + \lambda I)x + b = 0,$

and hence

$x^* = -\frac{1}{2}(A + \lambda I)^{-1} b$

if $A + \lambda I$ is invertible.

Therefore,

$g(\lambda) = \begin{cases} -\frac{1}{4} b^T (A + \lambda I)^{-1} b - \lambda\Delta^2, & \text{if } A + \lambda I \text{ is positive definite,} \\ -\infty, & \text{otherwise.} \end{cases}$

The first term is obtained by plugging $x^* = -\frac{1}{2}(A + \lambda I)^{-1} b$ into $L$:

$\frac{1}{4} b^T (A + \lambda I)^{-1}(A + \lambda I)(A + \lambda I)^{-1} b - \frac{1}{2} b^T (A + \lambda I)^{-1} b = -\frac{1}{4} b^T (A + \lambda I)^{-1} b.$
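The formula for $g(\lambda)$ can be sanity-checked numerically in one dimension, where $A = a$ is a scalar and the inverse is just $1/(a + \lambda)$; the numbers below are arbitrary.

```python
# 1-D check of g(lambda) = -b^2 / (4(a + lambda)) - lambda * Delta^2:
# compare the closed form against a brute-force infimum over x.
a, b, delta, lam = 1.0, 2.0, 1.5, 0.7   # arbitrary, with a + lam > 0

closed_form = -b * b / (4 * (a + lam)) - lam * delta ** 2

# brute-force inf_x { a x^2 + b x + lam (x^2 - delta^2) } on a fine grid
xs = [i / 1000.0 for i in range(-5000, 5001)]
brute = min((a + lam) * x * x + b * x - lam * delta ** 2 for x in xs)

print(abs(closed_form - brute) < 1e-3)  # True
```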

Remark:

$p^* \geq -\frac{1}{4} b^T (A + \lambda I)^{-1} b - \lambda\Delta^2$ whenever $\lambda \geq 0$ and $A + \lambda I$ is positive definite.

The dual function provides a nontrivial lower bound on $p^*$ if $\lambda \geq 0$ and $g(\lambda, \nu) \neq -\infty$.

A pair $(\lambda, \nu)$ s.t. $\lambda \geq 0$ and $g(\lambda, \nu) > -\infty$ is called dual feasible.

Natural questions:

1. What is the best lower bound?

2. Is the best lower bound equal to $p^*$?

    5.1 Dual Problem

    Answer:

1. Dual problem:

$d^* := \max g(\lambda, \nu) \quad \text{s.t. } \lambda \geq 0,$

which is always a convex problem.

2. In general, $p^* \neq d^*$; we only know $p^* \geq d^*$.

5.1.1 Strong Duality

If $p^* = d^*$, then we say we have strong duality.

    5.1.2 Weak Duality

Since

$p^* \geq g(\lambda, \nu) \quad \forall \lambda \geq 0,$

we have

$p^* \geq \max\{g(\lambda, \nu) \mid \lambda \geq 0\} =: d^*.$

    NOTE: Weak duality is always true, no matter if the primal is convex or not.
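Weak duality is easy to observe on the LP solved in the simplex section. For a maximization LP, the inequality flips into an upper bound: any $y \geq 0$ with $y^T A \geq c$ componentwise bounds the optimal value by $y^T b$. The candidate $y$ below is read off the final simplex tableau (the coefficients of the slack variables in the objective row); this pairing is a standard fact about LP duality, not derived in these notes.

```python
# Weak duality on the earlier LP: max 4x1 + 3x2 s.t. Ax <= b, x >= 0.
# Any y >= 0 with y^T A >= c (componentwise) gives the bound c^T x <= y^T b.
A = [[3, 1], [3, 2], [1, 1]]
b = [9, 10, 4]
c = [4, 3]

def is_dual_feasible(y):
    yTA = [sum(y[i] * A[i][j] for i in range(3)) for j in range(2)]
    return all(v >= 0 for v in y) and all(yTA[j] >= c[j] for j in range(2))

y = [0, 1, 1]                # candidate read off the final simplex tableau
bound = sum(yi * bi for yi, bi in zip(y, b))
print(is_dual_feasible(y), bound)  # True 14: matches the primal optimum
```

Here the bound equals the primal optimum 14, so this LP in fact exhibits strong duality, as discussed below.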

    5.1.3 Examples

1. LP:

$\min c^T x \quad \text{s.t. } Ax = b, \; x \geq 0.$

Then

$L(x, \lambda, \nu) := c^T x + \lambda^T(-x) + \nu^T(Ax - b) = (c - \lambda + A^T\nu)^T x - \nu^T b$

and

$g(\lambda, \nu) = \inf_x \{(c - \lambda + A^T\nu)^T x\} - \nu^T b = \begin{cases} -\nu^T b, & c - \lambda + A^T\nu = 0, \\ -\infty, & \text{otherwise.} \end{cases}$


The dual problem is

$\max g(\lambda, \nu) \quad \text{s.t. } \lambda \geq 0.$

Rewriting it in an equivalent form:

$\max_{\lambda, \nu} -\nu^T b \quad \text{s.t. } A^T\nu - \lambda + c = 0, \; \lambda \geq 0,$

or alternatively, since $\lambda$ is playing the role of a slack variable:

$\max_\nu -\nu^T b \quad \text{s.t. } A^T\nu + c \geq 0.$

Remark: The dual of the dual problem of the LP problem is the LP problem again!

2. Two-way partition problem:

$\min x^T W x$, $W$ symmetric, not necessarily positive definite,

s.t. $x = (x_1, \ldots, x_n)$, $x_i^2 = 1$, $i = 1, \ldots, n$.

What is the dual problem?

Solution:

$L(x, \nu) = x^T W x + \sum_i \nu_i (x_i^2 - 1)$

and

$g(\nu) = \inf_x \left\{ x^T W x + \sum_{i=1}^n \nu_i x_i^2 - \sum_{i=1}^n \nu_i \right\} = \inf_x \left\{ x^T (W + N) x \right\} - \sum_{i=1}^n \nu_i,$

which equals

$\begin{cases} -\sum_{i=1}^n \nu_i, & W + N \succeq 0, \\ -\infty, & \text{otherwise,} \end{cases}$

where

$N = \begin{pmatrix} \nu_1 & & 0 \\ & \ddots & \\ 0 & & \nu_n \end{pmatrix}.$

Dual problem:

$\max -\sum_{i=1}^n \nu_i \quad \text{s.t. } W + N \succeq 0.$  (25)

Remarks: $p^* \geq d^*$, even if $p^*$ or $d^*$ is infinite:

1. If $p^* = -\infty$, i.e. the primal is unbounded, then $d^* = -\infty$, i.e., the dual is infeasible.

2. If $d^* = +\infty$, i.e., the dual is unbounded, then $p^* = +\infty$, i.e., the primal is infeasible.

5.2 Strong duality

$p^* = d^*$

Remark: In general, we don't have strong duality. For convex problems, we usually have strong duality.

Theorem 5.1 (Slater's condition). For a convex problem, strong duality holds if there is an $x \in \operatorname{relint}(D)$ such that $x$ is strictly feasible, i.e.,

$f_i(x) < 0, \quad i = 1, \ldots, m$
$h_i(x) = 0, \quad i = 1, \ldots, p.$

Notice the strict inequality in the first condition.

Remark (weaker Slater condition): If $f_1, \ldots, f_m$ are affine, then Slater's condition can be weakened to nonpositivity: $f_1(x) \leq 0, \ldots, f_m(x) \leq 0$.

    5.2.1 Examples

1. LP:

$\min c^T x \quad \text{s.t. } Ax = b, \; x \geq 0.$

Slater's condition just means feasibility in this case. The dual in this case is

$\max_{\lambda, \nu} -\nu^T b \quad \text{s.t. } A^T\nu - \lambda + c = 0, \; \lambda \geq 0.$

(a) If the primal is feasible (then the weak Slater condition holds), then we have strong duality, i.e. $p^* = d^*$.

(b) If the dual is feasible (then the weak Slater condition holds for the dual), then we have strong duality, i.e. $d^* = p^*$.

(c) If the primal and dual are both infeasible, then it can happen that $p^* = +\infty$ and $d^* = -\infty$.


2. QCQP (quadratically constrained quadratic programming):

$\min \frac{1}{2} x^T P_0 x + q_0^T x + r_0$

s.t.

$\frac{1}{2} x^T P_i x + q_i^T x + r_i \leq 0, \quad i = 1, \ldots, m$

$Ax = b.$

Then Slater's condition is: $\exists x$ with $Ax = b$ such that $\frac{1}{2} x^T P_i x + q_i^T x + r_i < 0$ for all $i$.

Convexity: $P_0 \succ 0$, $P_i \succeq 0$, $i = 1, \ldots, m$.

Slater's condition + convexity give us strong duality.

3. Nonconvex problem with strong duality: Strong duality holds for any optimization problem with a quadratic objective function and one quadratic inequality constraint, provided that Slater's condition holds.

Remark: Trust-region problem

$\min x^T A x + b^T x \quad \text{s.t. } x^T x \leq \Delta^2,$

where $A$ is symmetric but not necessarily positive definite!

Dual problem:

$\max -\frac{1}{4} b^T (A + \lambda I)^{-1} b - \lambda\Delta^2 \quad \text{s.t. } \lambda \geq 0, \; A + \lambda I \succeq 0.$

    Figure 3: Solving both problems simultaneously gives faster convergence.

    5.3 Optimality conditions

    Assume that the objective function and the constraints are differentiable.


Complementary slackness:

Assume we have strong duality, i.e. $p^* = d^*$, attained at $x^*$ for the primal problem and at $(\lambda^*, \nu^*)$ for the dual problem, i.e.

$f_0(x^*) = g(\lambda^*, \nu^*)$  (strong duality)

$= \inf_x L(x, \lambda^*, \nu^*)$  (definition of $g$)

$\leq L(x^*, \lambda^*, \nu^*)$

$= f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*) + \sum_{i=1}^p \nu_i^* h_i(x^*)$  (definition of the Lagrangian)

$\leq f_0(x^*).$

Since the first and last terms are equal, every inequality in the chain is an equality. Since

$h_i(x^*) = 0, \quad f_i(x^*) \leq 0, \quad \lambda_i^* \geq 0,$

it follows that both sums have to be exactly 0.

Therefore,

1. $x^*$ minimizes $L(x, \lambda^*, \nu^*)$ over $x$.

2. Complementary slackness: $\lambda_i^* f_i(x^*) = 0$, $i = 1, \ldots, m$.

5.4 Necessary optimality conditions (Karush-Kuhn-Tucker)

Dual feasibility: $\lambda_i \geq 0$, $\forall i$.

Primal feasibility:

$f_i(x) \leq 0, \quad i = 1, \ldots, m, \qquad h_i(x) = 0, \quad i = 1, \ldots, p.$

Complementary slackness: $\lambda_i f_i(x) = 0$, $i = 1, \ldots, m$.

Stationarity:

$\nabla_x L(x, \lambda, \nu) = \nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0.$


Contents

I Unconstrained optimization

1 Line search method
1.1 Determine a direction (descent)
1.1.1 Steepest descent (negative gradient)
1.1.2 Newton's method
1.2 How far do we go along a chosen direction?
1.2.1 Criteria for acceptable stepsize
1.3 Convergence
1.4 More on Convergence Rate

2 Trust region method
2.1 Pseudo-Algorithm
2.2 How to solve (or approximate) the model problem quickly?
2.3 Three approaches to approximating the solution
2.4 Cauchy Point
2.5 Dogleg method
2.6 Two dimensional problem

II Constrained Optimization

3 Mathematical programming
3.1 Standard Linear programming
3.2 Simplex Method

4 Convex Optimization
4.1 Chebyshev problem
4.1.1 Example 1
4.1.2 Example 2: Forward rate
4.2 Quadratic problem QP
4.2.1 Quadratic constraint quadratic problem QCQP
4.2.2 Markowitz portfolio optimization

5 Duality Theory
5.0.3 Example: dual function for LP
5.0.4 Example: trust-region problem
5.1 Dual Problem
5.1.1 Strong Duality
5.1.2 Weak Duality
5.1.3 Examples
5.2 Strong duality
5.2.1 Examples
5.3 Optimality conditions
5.4 Necessary optimality conditions (Karush-Kuhn-Tucker)