8/14/2019 Optimization class notes MTH-9842
Part I
Unconstrained optimization
Line search method
Trust region method
Recall: Taylor's theorem
1. f(y) = f(x) + ∇f(ξ)ᵀ(y − x), for some ξ ∈ (x, y)
2. f(y) = f(x) + ∫_0^1 ∇f(x + t(y − x))ᵀ(y − x) dt
1 Line search method
1.1 Determine a direction (descent)
Definition:
A direction p at x_1 is called a descent direction if pᵀ∇f(x_1) < 0.
1.1.1 Steepest descent (negative gradient)
Rmk:
D_u f(x) = lim_{t→0} [f(x + tu) − f(x)] / t = ∇f(x) · u (1)
= |∇f(x)| cos θ, when u is a unit vector.
Therefore, if
1. θ = 0, i.e., u is in the direction of the gradient of f, then D_u f(x) = |∇f(x)| > 0 if ∇f(x) ≠ 0.
2. θ = π, i.e., u is in the opposite direction of the gradient of f, then D_u f(x) = −|∇f(x)| < 0 if ∇f(x) ≠ 0.
Rmk:
1. We only need the first derivative (good)!
2. Sometimes the convergence is very, very slow (bad)!
3. Any descent direction will work.
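The negative-gradient iteration can be sketched in a few lines; this is an illustrative example with a hypothetical quadratic f and a fixed step size (step-size selection is the subject of Section 1.2):

```python
import numpy as np

# Steepest-descent sketch on the hypothetical quadratic f(x) = 1/2 x^T Q x - b^T x,
# whose gradient is Q x - b (Q symmetric positive definite).
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(x):
    return Q @ x - b

x = np.zeros(2)
alpha = 0.1                     # fixed step size, for illustration only
for _ in range(500):
    x = x - alpha * grad(x)     # move along the negative gradient (a descent direction)

x_star = np.linalg.solve(Q, b)  # exact minimizer, for comparison
print(np.allclose(x, x_star, atol=1e-6))
```

The fixed step works here only because alpha is small relative to the largest eigenvalue of Q; the step-size criteria below remove this hand tuning.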
1.1.2 Newton's method
Starting at x_1, approximate f at x_1 by its Taylor polynomial of degree 2:
f(x) ≈ f(x_1) + ∇f(x_1)ᵀ(x − x_1) + (1/2)(x − x_1)ᵀ∇²f(x_1)(x − x_1) =: m(x) (2)
The first-order condition to minimize m(x) is ∇m(x) = 0, i.e.,
∇f(x_1) + ∇²f(x_1)(x − x_1) = 0 (3)
Side remark:
f(x) = bᵀx  ⇒  ∇f(x) = b
f(x) = (1/2)xᵀQx  ⇒  ∇f(x) = (1/2)(Q + Qᵀ)x
NOTE: Only if Q is symmetric does this simplify to ∇f = Qx.
If ∇²f(x_1) is invertible, then x − x_1 = −(∇²f(x_1))⁻¹ ∇f(x_1).
This direction is called Newton's direction.
Remarks:
1. Convergence is quadratic (fast)!
2. Need to compute the Hessian.
3. Why is Newton's direction a descent direction? HOMEWORK: Show that if ∇²f(x_1) is positive definite, then Newton's direction is a descent direction.
4. If ∇²f(x_1) is NOT positive definite (then Newton's direction need not be a descent direction), we need to fix the Hessian.
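A one-dimensional sketch of the Newton iteration (the test function f(x) = x² + eˣ is a hypothetical example; its second derivative is positive everywhere, so the Newton direction is a descent direction):

```python
import numpy as np

# Newton's method sketch for f(x) = x^2 + exp(x): iterate x <- x - f'(x)/f''(x).
def fp(x):  return 2 * x + np.exp(x)   # f'(x)
def fpp(x): return 2 + np.exp(x)       # f''(x) > 0 everywhere

x = 0.0
for _ in range(6):
    x = x - fp(x) / fpp(x)             # Newton step with natural step size 1

print(abs(fp(x)) < 1e-12)              # stationary point reached to machine precision
```

Six iterations suffice because of the quadratic convergence noted in Remark 1.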
Summary:
Let
p_k: direction at the kth stage
α_k: step size for the kth iteration.
Then
x_{k+1} = x_k + α_k p_k, k = 0, 1, 2, . . . (4)
Note that
p_k = −B_k ∇f(x_k) (5)
and if
B_k = Id, then we are using the steepest descent method.
B_k = (∇²f(x_k))⁻¹, then we are using Newton's method.
1.2 How far do we go along a chosen direction?
The next question is how far to go along a given p_k.
Let φ(α) = f(x_k + α p_k), α > 0. (6)
Idea: Minimize φ(α), i.e., find the step α which minimizes φ for α > 0.
Good, but expensive!
How do we guarantee that we arrive at an absolute and not at a local minimum?
What we need is:
1. φ(α) < φ(0)
2. α ≫ 0 (bounded away from 0).
1.2.1 Criteria for acceptable stepsize
1. Wolfe's conditions
(a) φ(α) ≤ φ(0) + c_1 α φ′(0) (Armijo condition)
(b) φ′(α) ≥ c_2 φ′(0), where 0 < c_1 < c_2 < 1. (Curvature condition)
2. Goldstein's conditions
(a) φ(0) + (1 − c) α φ′(0) ≤ φ(α) ≤ φ(0) + c α φ′(0), c ∈ (0, 1/2)
Backtracking:
Pick ᾱ.
(a) If ᾱ is acceptable: done.
(b) Otherwise go back to smaller steps and try again.
1. For backtracking, we do not use the curvature condition.
2. For the Newton direction, the natural initial step is α = 1.
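A minimal backtracking sketch using only the Armijo condition, as point 1 above suggests (the constants c_1 and the shrink factor ρ are illustrative choices, not fixed by the notes):

```python
import numpy as np

# Backtracking line search: shrink alpha until the Armijo condition holds.
def backtracking(f, g, x, p, alpha0=1.0, rho=0.5, c1=1e-4):
    alpha = alpha0
    while f(x + alpha * p) > f(x) + c1 * alpha * (g(x) @ p):
        alpha *= rho
    return alpha

# Hypothetical quadratic test problem.
Q = np.array([[10.0, 0.0], [0.0, 1.0]])
f = lambda x: 0.5 * x @ Q @ x
g = lambda x: Q @ x
x = np.array([1.0, 1.0])
p = -g(x)                          # steepest-descent direction
alpha = backtracking(f, g, x, p)   # natural initial step alpha0 = 1
print(f(x + alpha * p) < f(x))     # the accepted step decreases f
```

Because the direction is a descent direction, g(x) @ p < 0 and the loop is guaranteed to terminate for small enough alpha.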
1.3 Convergence
Theorem (Zoutendijk):
If f is continuously differentiable and ∇f is Lipschitz continuous (i.e., ∃L > 0 s.t. |∇f(y) − ∇f(x)| ≤ L|x − y| ∀x, y), the steps satisfy Wolfe's conditions, and the directions are
descent directions, then, given our iterative process
x_{k+1} = x_k + α_k p_k (7)
we have
Σ_{k=1}^∞ cos²θ_k ||∇f(x_k)||² < ∞. (8)
Remarks:
1. Zoutendijk's theorem implies global convergence:
(a) The sum is finite, so the terms go to zero.
(b) If cos²θ_k > δ > 0 ∀k, then ||∇f(x_k)||² → 0.
(c) For steepest descent, cos²θ_k = 1 > 0 = δ, so steepest descent is always convergent.
(d) For Newton's method, this holds if the Hessians have uniformly bounded condition number (for A invertible, κ(A) = ||A|| ||A⁻¹||), i.e., ||∇²f(x_k)|| ||∇²f(x_k)⁻¹|| ≤ M ∀k.
2. ||A|| = sup_{x≠0} ||Ax|| / ||x||
3. g is Lipschitz continuous in a domain D if ∃L > 0 such that |g(y) − g(x)| ≤ L|x − y| ∀x, y ∈ D. E.g. (HOMEWORK): if g ∈ C¹ with bounded derivative, then g is Lipschitz.
4. cos θ_k = −∇f(x_k)ᵀ p_k / (||∇f(x_k)|| ||p_k||)
Recall that in Wolfe's conditions φ(α) = f(x_k + α p_k), hence φ′(α) = ∇f(x_k + α p_k)ᵀ p_k.
We pick α_k such that φ′(α_k) ≥ c_2 φ′(0), hence ∇f(x_k + α_k p_k)ᵀ p_k ≥ c_2 ∇f(x_k)ᵀ p_k.
Proof (Zoutendijk):
(∇f(x_{k+1}) − ∇f(x_k))ᵀ p_k (9)
= ∇f(x_{k+1})ᵀ p_k − ∇f(x_k)ᵀ p_k (10)
= ∇f(x_k + α_k p_k)ᵀ p_k − ∇f(x_k)ᵀ p_k (11)
≥ c_2 ∇f(x_k)ᵀ p_k − ∇f(x_k)ᵀ p_k, Wolfe (ii) (12)
At the same time,
|(∇f(x_{k+1}) − ∇f(x_k))ᵀ p_k| (13)
≤ ||∇f(x_{k+1}) − ∇f(x_k)|| ||p_k||, Cauchy-Schwarz (14)
≤ L ||x_{k+1} − x_k|| ||p_k||, Lipschitz (15)
= L α_k ||p_k||², since x_{k+1} − x_k = α_k p_k (16)
Combining both inequalities, we obtain
(c_2 − 1) ∇f(x_k)ᵀ p_k ≤ L α_k ||p_k||²
⇒ α_k ≥ (c_2 − 1) ∇f(x_k)ᵀ p_k / (L ||p_k||²),
from the curvature condition; this implies that the steps have to stay far enough away from 0.
Now, note that
f(x_{k+1}) − f(x_k) = f(x_k + α_k p_k) − f(x_k) ≤ c_1 α_k ∇f(x_k)ᵀ p_k, by Wolfe's condition (i).
Using the lower bound on α_k (and ∇f(x_k)ᵀ p_k < 0, which flips the inequality):
f(x_{k+1}) − f(x_k) ≤ c_1 (c_2 − 1) / (L ||p_k||²) (∇f(x_k)ᵀ p_k)² = −(c_1(1 − c_2)/L) (∇f(x_k)ᵀ p_k)² / ||p_k||².
Therefore we have
f(x_k) − f(x_{k+1}) ≥ (c_1(1 − c_2)/L) ||∇f(x_k)||² cos²θ_k, ∀k,
f(x_{k−1}) − f(x_k) ≥ (c_1(1 − c_2)/L) ||∇f(x_{k−1})||² cos²θ_{k−1},
...
f(x_0) − f(x_1) ≥ (c_1(1 − c_2)/L) ||∇f(x_0)||² cos²θ_0.
Adding this telescoping sum, we obtain
f(x_0) − f(x_{k+1}) ≥ Σ_{n=0}^k (c_1(1 − c_2)/L) ||∇f(x_n)||² cos²θ_n. (17)
Since f is bounded below, the left side stays bounded as k → ∞, so the series converges, which proves (8).
1.4 More on Convergence Rate
Theorem: Let f : ℝⁿ → ℝ be twice continuously differentiable and suppose the iterates generated by steepest descent with exact line search converge to x*, at which ∇²f(x*) > 0. Then for all k large enough we have
f(x_{k+1}) − f(x*) ≤ r² (f(x_k) − f(x*)), (18)
where r ∈ ((λ_n − λ_1)/(λ_n + λ_1), 1) and λ_1 ≤ λ_2 ≤ . . . ≤ λ_n are the eigenvalues of the Hessian.
Sketch of proof: Using a telescoping argument,
f(x_{k+1}) − f(x*) ≤ r² (f(x_k) − f(x*))
≤ r⁴ (f(x_{k−1}) − f(x*))
. . .
≤ r^{2(k+1)} (f(x_0) − f(x*)).
Remark: Assume that the condition number of ∇²f(x*) is
κ(∇²f(x*)) = ||∇²f(x*)|| ||∇²f(x*)⁻¹|| = λ_n / λ_1 = 800, say.
Then
(λ_n − λ_1)/(λ_n + λ_1) = 799/801.
Then
f(x_{k+1}) − f(x*) ≤ (799/801)^{2(k+1)} (f(x_0) − f(x*)),
and even after k = 500 iterations the reduction factor is only
(799/801)^{1000} ≈ 0.08,
which is very slow! A disadvantage of linear convergence!
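The arithmetic in this remark can be checked directly:

```python
# Reduction factor ((kappa - 1)/(kappa + 1))^(2k) for condition number kappa = 800.
kappa = 800.0
r = (kappa - 1) / (kappa + 1)     # 799/801
factor = r ** (2 * 500)           # error reduction after 500 iterations
print(round(factor, 2))           # about 0.08: still 8% of the initial gap remains
```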
2 Trust region method
In contrast to the line search method, the trust region method determines the direction and the step length simultaneously. Let x_k be the current position; we work within a region (the trust region), usually a ball B(x_k, Δ_k) := {x : ||x − x_k|| ≤ Δ_k}.
Goal:
min_x f(x), x ∈ ℝⁿ
The objective function f has a good approximation m(p), called the model function. Usually we pick
m(p) = f(x_k) + ∇f(x_k)ᵀ p + (1/2) pᵀ ∇²f(x_k) p,
the second-order Taylor expansion of f at x_k.
That is the ideal case, but sometimes the Hessian is hard to find; therefore, in practice we use instead
m(p) = f(x_k) + g_kᵀ p + (1/2) pᵀ B_k p.
We choose x_{k+1} to be the point that minimizes the model function m(p) within the trust region, if the function decreases enough at the new point.
Remark: If the value of the objective function at the new point is not small enough, then we reject the step, shrink Δ_k, and redo the constrained optimization.
2.1 Pseudo-Algorithm
Given a maximal radius Δ_max, pick Δ_0 ∈ (0, Δ_max), η ∈ [0, 1/4).
1. Define
ρ_k = (f(x_k) − f(x_{k+1})) / (m(0) − m(p_k)), where x_{k+1} = x_k + p_k.
Figure 1: default
2. If ρ_k < 0, reject the step, because f(x_{k+1}) > f(x_k), and shrink Δ_k.
3. If ρ_k ≈ 1, accept the step and expand Δ_k.
4. If ρ_k > 0 but close to 0, reject the step and redo with a smaller Δ_k.
Pseudo-code:
Input: Δ_max, Δ_0 ∈ (0, Δ_max), η ∈ [0, 1/4)
Compute p_k by solving min_p m(p) s.t. ||p|| ≤ Δ_k.
Calculate ρ_k = (f(x_k) − f(x_k + p_k)) / (m(0) − m(p_k)).
If ρ_k < 1/4, then Δ_{k+1} = Δ_k / 4;
else if ρ_k > 3/4 and ||p_k|| = Δ_k, then Δ_{k+1} = min{2Δ_k, Δ_max};
else Δ_{k+1} = Δ_k.
If ρ_k > η, then x_{k+1} = x_k + p_k;
else x_{k+1} = x_k.
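The radius update in the pseudo-code can be sketched as a small helper (the thresholds 1/4 and 3/4 and the factors 1/4 and 2 are the choices above):

```python
# Trust-region radius update given the agreement ratio rho_k.
def update_radius(rho, p_norm, delta, delta_max):
    if rho < 0.25:
        return 0.25 * delta                  # poor model agreement: shrink
    if rho > 0.75 and abs(p_norm - delta) < 1e-12:
        return min(2 * delta, delta_max)     # good agreement on the boundary: expand
    return delta                             # otherwise keep the radius

print(update_radius(0.1, 1.0, 1.0, 10.0))    # shrinks to 0.25
print(update_radius(0.9, 1.0, 1.0, 10.0))    # expands to 2.0
```

Expansion only happens when the step hit the boundary ||p_k|| = Δ_k, since otherwise the region was not limiting the step.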
2.2 How to solve (or approximate) the model problem quickly?
The model problem is
min_p m(p) = f_k + g_kᵀ p + (1/2) pᵀ B_k p, s.t. ||p|| ≤ Δ_k, (19)
where f_k = f(x_k), g_k = ∇f(x_k), B_k = ∇²f(x_k).
Remark: The optimal solution p* of (19) is characterized by the existence of λ ≥ 0 such that
1. p* is feasible (i.e., ||p*|| ≤ Δ_k)
2. λ ≥ 0
3. λ (Δ_k − ||p*||) = 0
4. (B_k + λI) p* = −g_k
5. B_k + λI ⪰ 0.
2.3 Three approaches to approximating the solution
1. Cauchy point: steepest descent
2. Dogleg method
3. Two-dimensional minimization
2.4 Cauchy Point
m(p) = f_k + g_kᵀ p + (1/2) pᵀ B_k p.
The Cauchy point is the minimizer of m(p) in the steepest descent direction such that ||p|| ≤ Δ. Substituting p = −α g_k, define
φ(α) = f_k − α ||g_k||² + (1/2) α² g_kᵀ B_k g_k.
Minimize φ(α) s.t. 0 ≤ α ||g_k|| ≤ Δ.
The Cauchy point p_c is defined by
p_c = −α* g_k, where α* is the minimum point of φ.
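A direct implementation sketch of the Cauchy point (the closed-form step follows from minimizing the quadratic φ above; the test data are hypothetical):

```python
import numpy as np

# Cauchy point: minimize the model along -g subject to |p| <= delta.
def cauchy_point(g, B, delta):
    gBg = g @ B @ g
    gnorm = np.linalg.norm(g)
    if gBg <= 0:
        tau = 1.0                                   # model decreases out to the boundary
    else:
        tau = min(gnorm ** 3 / (delta * gBg), 1.0)  # interior minimizer of phi, capped at 1
    return -tau * (delta / gnorm) * g

g = np.array([1.0, 0.0])
B = np.eye(2)
p = cauchy_point(g, B, delta=2.0)
print(p)   # here the unconstrained minimizer -B^{-1} g = (-1, 0) lies inside the region
```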
2.5 Dogleg method
If Δ is small, then m(p) ≈ f + gᵀp: the Cauchy point is a good approximation.
If Δ is not small, then the quadratic term plays a role: m(p) = f_k + g_kᵀ p + (1/2) pᵀ B_k p.
The unconstrained minimum of m(p) is given by ∇m(p) = 0:
∇m(p) = g + Bp = 0 ⇒ p_B = −B⁻¹g (if B > 0).
See the graph in the book, page 74.
We follow a kinked path, called the dogleg path, which is parametrized by
p(τ) = τ p_U for 0 ≤ τ ≤ 1, and p(τ) = p_U + (τ − 1)(p_B − p_U) for 1 ≤ τ ≤ 2,
where p_U = −(gᵀg)/(gᵀBg) g is the minimizer of m along the steepest descent direction.
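The dogleg step along this path can be sketched as follows (B is assumed positive definite; the test data are hypothetical):

```python
import numpy as np

# Dogleg step: take the full step p_B if it fits, else follow the kinked path.
def dogleg(g, B, delta):
    p_b = -np.linalg.solve(B, g)                   # unconstrained model minimizer
    if np.linalg.norm(p_b) <= delta:
        return p_b
    p_u = -(g @ g) / (g @ B @ g) * g               # minimizer along -g
    if np.linalg.norm(p_u) >= delta:
        return -(delta / np.linalg.norm(g)) * g    # boundary point along -g
    # otherwise solve |p_u + t (p_b - p_u)| = delta for t in [0, 1]
    d = p_b - p_u
    a, b2, c = d @ d, 2 * (p_u @ d), p_u @ p_u - delta ** 2
    t = (-b2 + np.sqrt(b2 * b2 - 4 * a * c)) / (2 * a)
    return p_u + t * d

g = np.array([4.0, 0.0])
B = np.diag([2.0, 1.0])
print(dogleg(g, B, delta=10.0))   # full step fits: p_B = (-2, 0)
print(dogleg(g, B, delta=1.0))    # small region: boundary point along -g, (-1, 0)
```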
2.6 Two dimensional problem
The model problem is solving
min_p f + gᵀp + (1/2) pᵀ B p,
where ||p|| ≤ Δ, p ∈ span{p_U, p_B}.
Remark: The two-dimensional minimization improves on the minimum found by the dogleg method, which in turn improves on the Cauchy point:
2-dim min better than dogleg, better than Cauchy.
Part II
Constrained Optimization
3 Mathematical programming
If f : ℝⁿ → ℝ, the general mathematical programming optimization problem, often denoted (MP), is the following:
min f(x) (20)
subject to m equality constraints
g_1(x) = 0
g_2(x) = 0
...
g_m(x) = 0
and r inequality constraints:
h_1(x) ≤ 0
h_2(x) ≤ 0
...
h_r(x) ≤ 0
If f, g_1, g_2, . . . , g_m, h_1, . . . , h_r are linear (affine) functions, then the problem is called a Linear Program and denoted (LP).
The (LP) problem can then be written in matrix form:
min cᵀx (21)
such that A_eq x + b_eq = 0 and A_ineq x + b_ineq ≤ 0.
3.1 Standard Linear programming
The standard linear program can then be written as:
min cᵀx
s.t.
Ax = b
x ≥ 0
3.2 Simplex Method
Consider the following problem, which is solved using the simplex method. The tables below are called tableaux.
(1) Consider the following linear programming problem:
max 4x_1 + 3x_2
subject to
3x_1 + x_2 ≤ 9
3x_1 + 2x_2 ≤ 10
x_1 + x_2 ≤ 4
x_1, x_2 ≥ 0.
Transform this problem into the standard form. How many basic solutions does the standard form problem have? What are the basic feasible solutions and what are the extreme points of the feasible region? Solve the problem by the simplex method. NOTE: Basic solutions are the points which satisfy the equality constraints but are not necessarily nonnegative (if nonnegative, the solution is called basic feasible).
By adding slack variables, this problem can be written in standard LP form:
max 4x_1 + 3x_2 (22)
subject to
3x_1 + x_2 + x_3 = 9
3x_1 + 2x_2 + x_4 = 10
x_1 + x_2 + x_5 = 4
x_1, x_2, x_3, x_4, x_5 ≥ 0.
The tableau for this linear programming problem is given by:
Basic var.   x1    x2    x3    x4    x5    RHS
Z            -4    -3     0     0     0      0
x3            3     1     1     0     0      9
x4            3     2     0     1     0     10
x5            1     1     0     0     1      4
We choose x1 to be our entering variable since its coefficient is the most negative in the objective row. We perform the ratio test and obtain 3, 10/3, 4, respectively, so we select x3 as our leaving variable and 3 as our pivot. Therefore, we obtain:
Basic var.   x1    x2    x3    x4    x5    RHS
Z             0  -5/3   4/3     0     0     12
x1            1   1/3   1/3     0     0      3
x4            0     1    -1     1     0      1
x5            0   2/3  -1/3     0     1      1
The entering variable is x2, the only remaining negative coefficient in the objective row, and performing the ratio test we obtain 9, 1, 3/2, respectively. Therefore, the leaving variable is x4 and the pivot is 1, which leads to tableau #3:
Basic var.   x1    x2    x3    x4    x5    RHS
Z             0     0  -1/3   5/3     0   41/3
x1            1     0   2/3  -1/3     0    8/3
x2            0     1    -1     1     0      1
x5            0     0   1/3  -2/3     1    1/3
Since x3 now has a negative coefficient in the objective row, we try to eliminate it by choosing x3 as the entering variable. We perform the ratio test on the positive entries in that column and obtain 4 and 1, respectively, so that x5 is our leaving variable and 1/3 is our pivot element, which leads to our last tableau:
Basic var.   x1    x2    x3    x4    x5    RHS
Z             0     0     0     1     1     14
x1            1     0     0     1    -2      2
x2            0     1     0    -1     3      2
x3            0     0     1    -2     3      1
Since all coefficients in the objective row are nonnegative, we have found an optimal solution, namely 14, attained at x1 = 2, x2 = 2, x3 = 1.
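The tableau result can be cross-checked by brute force, enumerating all basic solutions of the standard form exactly as the problem statement describes (this is a verification, not part of the simplex method):

```python
import itertools
import numpy as np

# Enumerate all bases (3 of the 5 columns of A), keep the basic feasible
# solutions, and take the best objective value.
A = np.array([[3.0, 1, 1, 0, 0], [3, 2, 0, 1, 0], [1, 1, 0, 0, 1]])
b = np.array([9.0, 10, 4])
c = np.array([4.0, 3, 0, 0, 0])      # maximize c^T x

best_val, best_x = -np.inf, None
for cols in itertools.combinations(range(5), 3):
    B = A[:, cols]
    if abs(np.linalg.det(B)) < 1e-12:
        continue                                   # columns do not form a basis
    x = np.zeros(5)
    x[list(cols)] = np.linalg.solve(B, b)
    if np.all(x >= -1e-9) and c @ x > best_val:    # basic feasible solution
        best_val, best_x = c @ x, x

print(best_val, best_x[:2])           # optimum 14 at x1 = 2, x2 = 2
```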
4 Convex Optimization
NOTE: See Convex Optimization book chapters 4 and 5.
A convex optimization problem is one of the form
min f0(x)
subject to f_i(x) ≤ 0, i = 1, . . . , m
a_iᵀx = b_i, i = 1, . . . , p, (23)
where f_0, . . . , f_m are convex functions.
A fundamental property of convex optimization problems is that any locally optimal point is also (globally) optimal. Proof by contradiction: if y were a better point, then, using convexity on the segment from x to y, the point z = (1 − θ)x + θy satisfies
f_0(z) ≤ (1 − θ) f_0(x) + θ f_0(y) < f_0(x)
for every θ ∈ (0, 1], contradicting local optimality of x for z close enough to x.
More notes can be found in Boyd and Vandenberghe, pp. 136-151.
4.1 Chebyshev problem
Assume X is discrete and finite, X ∈ {x_1, x_2, . . . , x_n}, with
P[X = x_i] = p_i.
Given that f_0(x_i) = a_i^0, f_1(x_i) = a_i^1, . . . , f_m(x_i) = a_i^m,
we have
E[f_k(X)] = Σ_{i=1}^n p_i a_i^k for each k;
then Chebyshev's problem is to find
min E[f_0(X)]
s.t. α_k ≤ E[f_k(X)] ≤ β_k, k = 1, . . . , m.
4.1.1 Example 1
I could not add as a note into the book.
Let
f_0(x) = x
f_1(x) = 1_{(α,∞)}(x);
then the problem becomes
min E[X] = Σ_i p_i x_i
s.t.
α_1 ≤ E[1_{(α,∞)}(X)] ≤ β_1, where E[1_{(α,∞)}(X)] = P[X > α] = Σ_{i : x_i > α} p_i.
4.1.2 Example 2: Forward rate
Let X be the price of an asset.
What is the max/min of a call on X struck at K, given E[X]?
max_p or min_p E[(X − K)^+] = Σ_{i=1}^n (x_i − K)^+ p_i
s.t.
E[X] = Σ_{i=1}^n x_i p_i = π, p ≥ 0, Σ_i p_i = 1.
More interesting / challenging if the constraint is
E[(X − K_1)^+] = C,
i.e., Σ_{i=1}^n (x_i − K_1)^+ p_i = C.
What could be added to make the problem more realistic?
Adding a variance constraint makes the problem a quadratic optimization.
4.2 Quadratic problem QP
Quadratic objective function:
min (1/2) xᵀQx + pᵀx + r
s.t.
Ax = b, Gx ≤ h (affine constraints)
4.2.1 Quadratic constraint quadratic problem QCQP
min (1/2) xᵀQx + pᵀx + r
s.t.
Ax = b
and
(1/2) xᵀQ_i x + p_iᵀx + r_i ≤ 0, i = 1, 2, . . . , m
4.2.2 Markowitz portfolio optimization
Let
X_1, X_2, . . . , X_n
be the returns of the risky assets with mean returns
μ_1, μ_2, . . . , μ_n
and covariance matrix
Σ_{i,j} = Cov(X_i, X_j).
Let
p := (p_1, p_2, . . . , p_n)
be the portfolio of these assets, i.e., the amount invested in each asset 1, 2, . . . , n.
Then
μᵀp
is the expected return of the portfolio and
pᵀΣp
is the variance of the portfolio p.
Consider the problem
min_p pᵀΣp (minimize risk)
such that
μᵀp ≥ r_min (expected return no less than r_min)
Σ_{i=1}^n p_i ≤ 1 (budget constraint)
P[pᵀX . . .] (a chance constraint on the portfolio return)
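For the variant with equality constraints (target return met exactly and full investment), the minimizer solves a linear KKT system; a sketch with hypothetical two-asset data:

```python
import numpy as np

# Minimize p^T Sigma p subject to mu^T p = r_min and sum(p) = 1, via the KKT
# system [2 Sigma, -A^T; A, 0][p; lam] = [0; b] with A = [mu; 1], b = [r_min; 1].
mu = np.array([0.10, 0.05])
Sigma = np.array([[0.04, 0.01], [0.01, 0.02]])
r_min = 0.08

A = np.vstack([mu, np.ones(2)])
K = np.block([[2 * Sigma, -A.T], [A, np.zeros((2, 2))]])
rhs = np.concatenate([np.zeros(2), [r_min, 1.0]])
p = np.linalg.solve(K, rhs)[:2]
print(p, mu @ p)   # weights sum to 1 and hit the target return exactly
```

With two assets and two equality constraints the weights are pinned down by feasibility alone; with more assets the KKT solve genuinely trades off risk against the constraints.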
5 Duality Theory
Problem:
min f_0(x) (24)
such that
f_i(x) ≤ 0, i = 1, . . . , m
h_i(x) = 0, i = 1, . . . , p
Let x ∈ D, where D is the intersection of the domains of all the functions.
Let
L(x, λ, ν) := f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) = f_0(x) + ⟨λ, f⟩ + ⟨ν, h⟩,
where
L : D × ℝᵐ × ℝᵖ → ℝ
is the Lagrangian.
Define
g(λ, ν) := inf_x L(x, λ, ν) = inf_x { f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) }.
This function g is the dual function of (24).
Remark:
1. g is concave, since it is the pointwise infimum of affine functions of (λ, ν).
2. Let p* be the optimal value of (24); then
p* ≥ g(λ, ν) for all λ ≥ 0.
Proof: Let λ ≥ 0 and let x be feasible. Then
L(x, λ, ν) := f_0(x) + Σ_{i=1}^m λ_i f_i(x) + Σ_{i=1}^p ν_i h_i(x) ≤ f_0(x),
because λ_i ≥ 0, f_i(x) ≤ 0, h_i(x) = 0, since x is feasible. Hence
inf_{y∈D} L(y, λ, ν) ≤ L(x, λ, ν) ≤ f_0(x) ∀x feasible and λ ≥ 0,
but
inf_{y∈D} L(y, λ, ν) = g(λ, ν),
so that
g(λ, ν) ≤ p*.
5.0.3 Example: dual function for LP
min cᵀx
s.t. Ax = b, x ≥ 0. Rewrite the constraint x ≥ 0 as −x ≤ 0.
Then
L(x, λ, ν) := cᵀx + λᵀ(−x) + νᵀ(Ax − b)
and
g(λ, ν) = inf_x {(c − λ + Aᵀν)ᵀx} − νᵀb = { −νᵀb, if c − λ + Aᵀν = 0; −∞ otherwise. }
Since (c − λ + Aᵀν)ᵀx is linear in x with no constraints, its infimum is −∞ unless the coefficient vector vanishes. To avoid that, we need to have
c − λ + Aᵀν = 0,
or equivalently, eliminating λ ≥ 0:
Aᵀν + c ≥ 0.
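A numeric check of this dual function on a hypothetical instance: min x_1 + 2x_2 s.t. x_1 + x_2 = 1, x ≥ 0, whose optimum is p* = 1 at x = (1, 0).

```python
import numpy as np

# For any nu with c + A^T nu >= 0 (dual feasible), g(nu) = -nu^T b <= p*.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
p_star = 1.0

for nu in [np.array([-1.0]), np.array([0.0]), np.array([3.0])]:
    assert np.all(c + A.T @ nu >= 0)         # lambda = c + A^T nu >= 0
    print(float(-b @ nu) <= p_star + 1e-12)  # lower bound never exceeds p*
```

At ν = −1 the bound is tight (−νᵀb = 1 = p*), consistent with strong duality for feasible LPs (Section 5.2).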
5.0.4 Example: trust-region problem
min xᵀAx + bᵀx
s.t. xᵀx ≤ Δ², where Δ is the radius of the trust region.
Find the dual function of the problem.
Solution:
L(x, λ) := xᵀAx + bᵀx + λ(xᵀx − Δ²)
and
g(λ) = inf_{x∈ℝⁿ} L(x, λ) = inf_x { xᵀ(A + λI)x + bᵀx } − λΔ².
Then, since A is symmetric, we obtain
∇_x ( xᵀ(A + λI)x + bᵀx ) = 2(A + λI)x + b = 0,
and hence
x* = −(A + λI)⁻¹ b / 2
if A + λI is invertible.
Therefore,
g(λ) = { −(1/4) bᵀ(A + λI)⁻¹b − λΔ², if A + λI is positive definite; −∞ otherwise. }
The value −(1/4) bᵀ(A + λI)⁻¹b is obtained by plugging x* = −(A + λI)⁻¹b / 2 into L:
(1/4) bᵀ(A + λI)⁻¹(A + λI)(A + λI)⁻¹ b − (1/2) bᵀ(A + λI)⁻¹ b = −(1/4) bᵀ(A + λI)⁻¹ b.
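A quick numeric sanity check of the bound g(λ) ≤ p* on a tiny hypothetical instance (A positive definite, so the problem is convex and the unconstrained minimizer is interior):

```python
import numpy as np

# g(lam) = -1/4 b^T (A + lam I)^{-1} b - lam Delta^2 is a lower bound on p*.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([-2.0, 0.0])
delta = 10.0

x_star = -0.5 * np.linalg.solve(A, b)       # unconstrained minimizer (interior here)
p_star = x_star @ A @ x_star + b @ x_star   # = -0.5 for this data

def g(lam):
    M = A + lam * np.eye(2)
    return -0.25 * (b @ np.linalg.solve(M, b)) - lam * delta ** 2

print(g(0.0), p_star)     # equal: the bound is attained at lambda = 0
print(g(0.5) <= p_star)   # any lambda >= 0 still gives a valid lower bound
```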
Remark:
p* ≥ −(1/4) bᵀ(A + λI)⁻¹b − λΔ² whenever λ ≥ 0 and A + λI is positive definite.
The dual function provides a nontrivial lower bound on p* if λ ≥ 0 and g(λ, ν) ≠ −∞.
A pair (λ, ν) such that λ ≥ 0 and g(λ, ν) ≠ −∞ is called dual feasible.
Actual questions:
1. What is the best lower bound?
2. Is the best lower bound equal to p*?
5.1 Dual Problem
Answer:
1. Dual problem:
d* := max g(λ, ν) s.t. λ ≥ 0,
which is always a convex problem.
2. In general, p* ≠ d*.
5.1.1 Strong Duality
If p* = d*, then we say we have strong duality.
5.1.2 Weak Duality
Since
p* ≥ g(λ, ν) ∀ λ ≥ 0,
we have
p* ≥ max{ g(λ, ν) | λ ≥ 0 } =: d*.
NOTE: Weak duality always holds, no matter whether the primal is convex or not.
5.1.3 Examples
1. LP:
min cᵀx
s.t.
Ax = b
x ≥ 0
Then
L(x, λ, ν) := cᵀx + λᵀ(−x) + νᵀ(Ax − b) = (c − λ + Aᵀν)ᵀx − νᵀb
and
g(λ, ν) = inf_x {(c − λ + Aᵀν)ᵀx} − νᵀb = { −νᵀb, if c − λ + Aᵀν = 0; −∞ otherwise. }
The dual problem is
max g(λ, ν)
s.t.
λ ≥ 0.
Rewriting it in an equivalent form:
max_{λ,ν} −νᵀb
s.t. Aᵀν + c − λ = 0, λ ≥ 0,
or alternatively, since λ is playing the role of a slack variable:
max_ν −νᵀb
s.t. Aᵀν + c ≥ 0.
Remark: The dual of the dual problem of the LP problem is the LP problem again!
2. Two-way partition problem:
min xᵀWx, W symmetric, not necessarily positive definite
s.t. x = (x_1, . . . , x_n), x_i² = 1, i = 1, . . . , n.
What is the dual problem?
Solution:
L(x, ν) = xᵀWx + Σ_i ν_i (x_i² − 1)
and
g(ν) = inf_x { xᵀWx + Σ_{i=1}^n ν_i x_i² − Σ_{i=1}^n ν_i }
= inf_x { xᵀ(W + N)x } − Σ_{i=1}^n ν_i
= { −Σ_{i=1}^n ν_i, if W + N ⪰ 0; −∞ otherwise, }
where
N = diag(ν_1, . . . , ν_n).
Dual problem:
max −Σ_{i=1}^n ν_i
s.t. W + N ⪰ 0. (25)
Remarks: p* ≥ d* holds even when p* or d* is infinite:
1. If p* = −∞, i.e., the primal is unbounded, then d* = −∞, i.e., the dual is infeasible.
2. If d* = +∞, i.e., the dual is unbounded, then p* = +∞, i.e., the primal is infeasible.
5.2 Strong duality
p* = d*
Remark: In general, we don't have strong duality. For convex problems, we usually have strong duality.
Theorem 5.1 (Slater's condition)
For a convex problem, strong duality holds if there is an x ∈ relint(D) such that x is strictly feasible, i.e.,
f_i(x) < 0, i = 1, . . . , m
h_i(x) = 0, i = 1, . . . , p.
Notice the strict inequality in the first condition.
Remark (weaker Slater condition): If f_1, . . . , f_m are affine, then Slater's condition can be weakened to nonpositivity: f_1(x) ≤ 0, . . . , f_m(x) ≤ 0.
5.2.1 Examples
1. LP:
min cᵀx
s.t.
Ax = b
x ≥ 0
Slater's condition just means feasibility in this case. The dual in this case is
max_{λ,ν} −νᵀb
s.t. Aᵀν + c − λ = 0, λ ≥ 0.
(a) If the primal is feasible (so the weak Slater condition holds), then we have strong duality, i.e., p* = d*.
(b) If the dual is feasible (so the weak Slater condition holds), then we have strong duality, i.e., d* = p*.
(c) If primal and dual are both infeasible, then it can happen that d* = −∞ and p* = +∞.
2. QCQP (quadratically constrained quadratic programming):
min (1/2) xᵀP_0 x + q_0ᵀx + r_0
s.t.
(1/2) xᵀP_i x + q_iᵀx + r_i ≤ 0, i = 1, . . . , m
Ax = b.
Then Slater's condition is: ∃x such that (1/2) xᵀP_i x + q_iᵀx + r_i < 0 for all i.
Convexity: P_0 ≻ 0, P_i ⪰ 0, i = 1, . . . , m.
Slater's condition + convexity give us strong duality.
3. Nonconvex problem with strong duality: Strong duality holds for any optimization problem with a quadratic objective function and one quadratic inequality constraint, provided that Slater's condition holds.
Remark: Trust-region problem:
min xᵀAx + bᵀx
s.t. xᵀx ≤ Δ²,
where A is symmetric but not necessarily positive definite!
Dual problem:
max −(1/4) bᵀ(A + λI)⁻¹b − λΔ²
s.t. λ ≥ 0, A + λI ⪰ 0.
Figure 3: Solving both problems simultaneously gives faster convergence.
5.3 Optimality conditions
Assume that the objective function and the constraints are differentiable.
Complementary slackness:
Assume we have strong duality, i.e., p* = d*, attained at x* for the primal problem and at (λ*, ν*) for the dual problem. Then
f_0(x*) = g(λ*, ν*) (strong duality)
= inf_x L(x, λ*, ν*) (definition of g)
≤ L(x*, λ*, ν*)
= f_0(x*) + Σ_{i=1}^m λ_i* f_i(x*) + Σ_{i=1}^p ν_i* h_i(x*) (definition of the Lagrangian)
≤ f_0(x*).
Since the first and last expressions are equal, every inequality above is actually an equality. Since
h_i(x*) = 0, f_i(x*) ≤ 0, λ_i* ≥ 0,
it follows that both sums have to be exactly 0.
Therefore,
1. x* minimizes L(x, λ*, ν*) over x.
2. Complementary slackness: λ_i* f_i(x*) = 0, i = 1, . . . , m.
5.4 Necessary optimality conditions (Karush-Kuhn-Tucker)
Dual feasibility: λ_i ≥ 0, ∀i.
Primal feasibility:
f_i(x) ≤ 0, i = 1, . . . , m; h_i(x) = 0, i = 1, . . . , p.
Complementary slackness: λ_i f_i(x) = 0, i = 1, . . . , m.
Stationarity:
∇_x L(x, λ, ν) = ∇f_0(x) + Σ_{i=1}^m λ_i ∇f_i(x) + Σ_{i=1}^p ν_i ∇h_i(x) = 0.
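The four conditions can be checked mechanically for a candidate point; a sketch on the hypothetical problem min x_1² + x_2² s.t. f_1(x) = 2 − x_1 − x_2 ≤ 0, whose minimizer is x* = (1, 1) with multiplier λ* = 2:

```python
import numpy as np

# KKT verification for x* = (1, 1), lambda* = 2.
x = np.array([1.0, 1.0]); lam = 2.0
grad_f0 = 2 * x                        # gradient of the objective
grad_f1 = np.array([-1.0, -1.0])       # gradient of f1(x) = 2 - x1 - x2
f1 = 2 - x.sum()

stationarity = np.allclose(grad_f0 + lam * grad_f1, 0)  # grad of the Lagrangian
primal = f1 <= 0                                        # primal feasibility
dual = lam >= 0                                         # dual feasibility
comp_slack = abs(lam * f1) < 1e-12                      # complementary slackness
print(stationarity and primal and dual and comp_slack)
```

Here the constraint is active (f_1(x*) = 0), so complementary slackness holds with a strictly positive multiplier.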
Contents
I Unconstrained optimization
1 Line search method
1.1 Determine a direction (descent)
1.1.1 Steepest descent (negative gradient)
1.1.2 Newton's method
1.2 How far do we go along a chosen direction?
1.2.1 Criteria for acceptable stepsize
1.3 Convergence
1.4 More on Convergence Rate
2 Trust region method
2.1 Pseudo-Algorithm
2.2 How to solve (or approximate) the model problem quickly?
2.3 Three approaches to approximating the solution
2.4 Cauchy Point
2.5 Dogleg method
2.6 Two dimensional problem
II Constrained Optimization
3 Mathematical programming
3.1 Standard Linear programming
3.2 Simplex Method
4 Convex Optimization
4.1 Chebyshev problem
4.1.1 Example 1
4.1.2 Example 2: Forward rate
4.2 Quadratic problem QP
4.2.1 Quadratic constraint quadratic problem QCQP
4.2.2 Markowitz portfolio optimization
5 Duality Theory
5.0.3 Example: dual function for LP
5.0.4 Example: trust-region problem
5.1 Dual Problem
5.1.1 Strong Duality
5.1.2 Weak Duality
5.1.3 Examples
5.2 Strong duality
5.2.1 Examples
5.3 Optimality conditions
5.4 Necessary optimality conditions (Karush-Kuhn-Tucker)