Optimization class notes MTH-9842


  • 8/14/2019 Optimization class notes MTH-9842


    Part I

    Unconstrained optimization

    Line search method

Trust region method

Recall: Taylor's theorem

1. $f(y) = f(x) + \nabla f(\xi)^T (y - x)$ for some $\xi \in (x, y)$

2. $f(y) = f(x) + \int_0^1 \nabla f(x + t(y - x))^T (y - x)\, dt$

    1 Line search method

1.1 Determine a direction (descent)

Definition:

A direction $p$ at $x_1$ is called a descent direction if $p^T \nabla f(x_1) < 0$.

1.1.1 Steepest descent (negative gradient)

    Rmk:

$D_u f(x) = \lim_{t \to 0} \dfrac{f(x + tu) - f(x)}{t} = \nabla f(x) \cdot u$  (1)

$= |\nabla f(x)| \cos\theta$, when $u$ is a unit vector.

Therefore, if

1. $\theta = 0$, i.e., $u$ is in the direction of the gradient of $f$, then $D_u f(x) = |\nabla f(x)| > 0$ if $\nabla f(x) \neq 0$.

2. $\theta = \pi$, i.e., $u$ is in the opposite direction of the gradient of $f$, then $D_u f(x) = -|\nabla f(x)| < 0$ if $\nabla f(x) \neq 0$.

    Rmk:

1. We only need the first derivative (good)!

2. Sometimes the convergence is very, very slow (bad)!

    3. Any descent direction will work.


1.1.2 Newton's method

Starting at $x_1$, approximate $f$ at $x_1$ by its Taylor polynomial of degree 2:

$f(x) \approx f(x_1) + \nabla f(x_1)^T (x - x_1) + \frac{1}{2}(x - x_1)^T \nabla^2 f(x_1)(x - x_1) =: m(x)$  (2)

A sufficient condition to minimize $m(x)$ is $\nabla m(x) = 0$, i.e.,

$\nabla f(x_1) + \nabla^2 f(x_1)(x - x_1) = 0$  (3)

Side remark:

$f(x) = b^T x \implies \nabla f(x) = b$

$f(x) = \frac{1}{2} x^T Q x \implies \nabla f(x) = \frac{1}{2}(Q + Q^T)x$

NOTE: Only if $Q$ is symmetric does this simplify to $\nabla f = Qx$.

If $\nabla^2 f(x_1)$ is invertible, then $x - x_1 = -(\nabla^2 f(x_1))^{-1} \nabla f(x_1)$.

This direction is called Newton's direction.

    Remarks:

    1. Convergence is quadratic (fast)!

    2. Need to compute Hessian.

3. Why is Newton's direction a descent direction? HOMEWORK: Show that if $\nabla^2 f(x_1)$ is positive definite, then Newton's direction is a descent direction.

4. If $\nabla^2 f(x_1)$ is NOT positive definite (then Newton's direction need not be a descent direction), then we need to fix the Hessian.

    Summary:

    Let

$p_k$: direction at the $k$th stage

$\alpha_k$: step size for the $k$th iteration.

Then

$x_{k+1} = x_k + \alpha_k p_k, \quad k = 0, 1, 2, \ldots$  (4)

Note that

$p_k = -B_k^{-1} \nabla f(x_k)$  (5)

and if

$B_k = I$, then we are using the steepest descent method;

$B_k = \nabla^2 f(x_k)$, then we are using Newton's method.
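The iteration (4) with the two choices of $B_k$ in (5) can be sketched in code. The quadratic test function and the fixed step size below are illustrative assumptions, not from the notes.

```python
# Sketch of iteration (4): x_{k+1} = x_k + alpha * p_k with p_k = -B_k^{-1} grad f(x_k).
# The test function and the fixed step size are illustrative assumptions.

def grad(x):
    # f(x, y) = 2x^2 + y^2, so grad f = (4x, 2y)
    return [4 * x[0], 2 * x[1]]

def step(x, Bk_inv, alpha):
    # x_{k+1} = x_k - alpha * B_k^{-1} grad f(x_k)
    g = grad(x)
    p = [-(Bk_inv[0][0] * g[0] + Bk_inv[0][1] * g[1]),
         -(Bk_inv[1][0] * g[0] + Bk_inv[1][1] * g[1])]
    return [x[0] + alpha * p[0], x[1] + alpha * p[1]]

# Steepest descent: B_k = I, small fixed step
x = [1.0, 1.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(100):
    x = step(x, identity, alpha=0.1)
print(x)  # close to the minimizer (0, 0)

# Newton: B_k = Hessian = diag(4, 2); for a quadratic, one full step suffices
hess_inv = [[0.25, 0.0], [0.0, 0.5]]
x_newton = step([1.0, 1.0], hess_inv, alpha=1.0)
print(x_newton)  # exactly [0.0, 0.0]
```

For this quadratic the Hessian is constant, so the Newton step lands on the minimizer in a single iteration, while steepest descent only approaches it geometrically.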

    1.2 How far do we go along a chosen direction?

The next question is how far to go along a given $p_k$.

Let $\varphi(\alpha) = f(x_k + \alpha p_k)$, $\alpha > 0$.  (6)

Idea: Minimize $\varphi(\alpha)$, i.e., find the step $\alpha$ which minimizes $f$ along $p_k$ for $\alpha > 0$.

Good, but expensive!

How do we guarantee that we arrive at a global and not merely a local minimum?

What we need is:

1. $\varphi(\alpha) < \varphi(0)$

2. $\alpha \gg 0$ (bounded away from 0).

    1.2.1 Criteria for acceptable stepsize

1. Wolfe's conditions

(a) $\varphi(\alpha) \leq \varphi(0) + c_1 \alpha \varphi'(0)$  (Armijo condition)

(b) $\varphi'(\alpha) \geq c_2 \varphi'(0)$, where $0 < c_1 < c_2 < 1$.  (Curvature condition)

2. Goldstein's conditions

(a) $\varphi(0) + (1 - c)\alpha\varphi'(0) \leq \varphi(\alpha) \leq \varphi(0) + c\alpha\varphi'(0)$, $c \in (0, 1/2)$

Backtracking:

Pick $\alpha$; (a) if $\alpha$ is acceptable: done; (b) otherwise go back to smaller steps.

1. For backtracking, do not use the curvature condition.

2. For the Newton direction, the natural initial step is $\alpha = 1$.
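The backtracking scheme above can be sketched as follows. The shrink factor `rho` and the constant `c1` are conventional choices assumed here, not values from the notes.

```python
# Backtracking line search enforcing the Armijo condition (Wolfe (a)).
# The shrink factor rho and constant c1 are conventional, assumed choices.

def backtracking(f, grad_f, x, p, alpha0=1.0, c1=1e-4, rho=0.5):
    """Shrink alpha until f(x + alpha p) <= f(x) + c1 * alpha * grad_f(x)^T p."""
    fx = f(x)
    slope = sum(gi * pi for gi, pi in zip(grad_f(x), p))  # phi'(0), must be < 0
    alpha = alpha0
    while f([xi + alpha * pi for xi, pi in zip(x, p)]) > fx + c1 * alpha * slope:
        alpha *= rho
    return alpha

# Example: f(x, y) = x^2 + y^2, steepest descent direction at (1, 1)
f = lambda x: x[0] ** 2 + x[1] ** 2
grad_f = lambda x: [2 * x[0], 2 * x[1]]
x = [1.0, 1.0]
p = [-g for g in grad_f(x)]  # a descent direction
alpha = backtracking(f, grad_f, x, p)
print(alpha)  # 0.5: alpha = 1 fails the Armijo test, alpha = 0.5 passes
```

Note that, as in remark 1 above, only the Armijo condition is checked: the shrinking itself keeps the step from collapsing to 0 too quickly.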

1.3 Convergence

Theorem (Zoutendijk):

If $f$ is continuously differentiable and $\nabla f$ is Lipschitz continuous (i.e., $\exists L > 0$ s.t. $|\nabla f(y) - \nabla f(x)| \leq L|x - y|$ $\forall x, y$), the steps satisfy Wolfe's conditions, and the directions are descent directions in our iterative process

$x_{k+1} = x_k + \alpha_k p_k$  (7)

then

$\sum_{k=1}^{\infty} \cos^2\theta_k \, \|\nabla f(x_k)\|^2 < \infty.$  (8)

    Remarks:

1. Zoutendijk's theorem implies global convergence:

(a) The sum is finite, which implies the terms go to zero.

(b) If $\cos^2\theta_k > \delta > 0$ $\forall k$, then $\|\nabla f(x_k)\|^2 \to 0$.

(c) For steepest descent, $\cos^2\theta_k = 1 > 0 = \delta$, so steepest descent is always convergent.

(d) For Newton's method, the same holds if the Hessian has uniformly bounded condition number ($A$ invertible, $\kappa(A) = \|A\|\|A^{-1}\|$), i.e., $\|\nabla^2 f(x_k)\|\|\nabla^2 f(x_k)^{-1}\| \leq M$ $\forall x_k$.

2. $\|A\| = \sup_{\|x\| \leq 1} \|Ax\|$.

3. $g$ is Lipschitz continuous in a domain $D$ if $\exists L > 0$ such that $|g(y) - g(x)| \leq L|x - y|$ $\forall x, y \in D$. E.g. (HOMEWORK): if $g \in C^1$ on a compact domain, then $g$ is Lipschitz.

4. $\cos\theta_k = \dfrac{-\nabla f(x_k)^T p_k}{\|\nabla f(x_k)\|\,\|p_k\|}$.

Recall that in Wolfe's conditions $\varphi(\alpha) = f(x_k + \alpha p_k)$, hence $\varphi'(\alpha) = \nabla f(x_k + \alpha p_k)^T p_k$.

We pick $\alpha_k$ such that $\varphi'(\alpha_k) \geq c_2 \varphi'(0) = c_2 \nabla f(x_k)^T p_k$, hence $\nabla f(x_k + \alpha_k p_k)^T p_k \geq c_2 \nabla f(x_k)^T p_k$.

Proof (Zoutendijk):

$(\nabla f(x_{k+1}) - \nabla f(x_k))^T p_k$  (9)

$= \nabla f(x_{k+1})^T p_k - \nabla f(x_k)^T p_k$  (10)

$= \nabla f(x_k + \alpha_k p_k)^T p_k - \nabla f(x_k)^T p_k$  (11)

$\geq c_2 \nabla f(x_k)^T p_k - \nabla f(x_k)^T p_k = (c_2 - 1)\nabla f(x_k)^T p_k$, by Wolfe (ii).  (12)

At the same time,

$|(\nabla f(x_{k+1}) - \nabla f(x_k))^T p_k|$  (13)

$\leq \|\nabla f(x_{k+1}) - \nabla f(x_k)\|\,\|p_k\|$, by Cauchy-Schwarz  (14)

$\leq L\|x_{k+1} - x_k\|\,\|p_k\|$, by Lipschitz continuity  (15)

$= L\alpha_k\|p_k\|^2$, since $x_{k+1} - x_k = \alpha_k p_k$.  (16)

Combining both inequalities, we obtain

$(c_2 - 1)\nabla f(x_k)^T p_k \leq L\alpha_k\|p_k\|^2$

$\implies \alpha_k \geq \dfrac{(c_2 - 1)\nabla f(x_k)^T p_k}{L\|p_k\|^2},$

from the curvature condition; this implies that the steps have to be far enough away from 0.

Now, note that

$f(x_{k+1}) - f(x_k) = f(x_k + \alpha_k p_k) - f(x_k) \leq c_1\alpha_k \nabla f(x_k)^T p_k$, by Wolfe's condition (i),

$\leq c_1 \dfrac{(c_2 - 1)\nabla f(x_k)^T p_k}{L\|p_k\|^2} (\nabla f(x_k)^T p_k) = -\dfrac{c_1(1 - c_2)}{L} \dfrac{(\nabla f(x_k)^T p_k)^2}{\|p_k\|^2}.$

Therefore we have

$f(x_k) - f(x_{k+1}) \geq \dfrac{c_1(1 - c_2)}{L} \|\nabla f(x_k)\|^2 \cos^2\theta_k, \quad \forall k$

$f(x_{k-1}) - f(x_k) \geq \dfrac{c_1(1 - c_2)}{L} \|\nabla f(x_{k-1})\|^2 \cos^2\theta_{k-1}$

$\vdots$

$f(x_0) - f(x_1) \geq \dfrac{c_1(1 - c_2)}{L} \|\nabla f(x_0)\|^2 \cos^2\theta_0$

Adding this telescopic sum, we obtain

$f(x_0) - f(x_{k+1}) \geq \sum_{n=0}^{k} \dfrac{c_1(1 - c_2)}{L} \|\nabla f(x_n)\|^2 \cos^2\theta_n.$  (17)

If $f$ is bounded below, the left-hand side stays bounded as $k \to \infty$, so the series in (8) converges.

    1.4 More on Convergence Rate

Theorem: Let $f : \mathbb{R}^n \to \mathbb{R}$ be twice continuously differentiable, and suppose the iterates generated by steepest descent with exact line search converge to $x^*$, at which $\nabla^2 f(x^*) > 0$. Then for all $k$ large enough we have

$f(x_{k+1}) - f(x^*) \leq r^2 (f(x_k) - f(x^*)),$  (18)

where $r \in \left(\dfrac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1}, 1\right)$ and $\lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_n$ are the eigenvalues of the Hessian.

Sketch of proof: Using a telescopic argument,

$f(x_{k+1}) - f(x^*) \leq r^2(f(x_k) - f(x^*)) \leq r^4(f(x_{k-1}) - f(x^*)) \leq \ldots \leq r^{2(k+1)}(f(x_0) - f(x^*)).$


Remark: Assume that the condition number of $\nabla^2 f(x^*)$ is

$\kappa(\nabla^2 f(x^*)) = \|\nabla^2 f(x^*)\|\,\|\nabla^2 f(x^*)^{-1}\| = \dfrac{\lambda_n}{\lambda_1} = 800$, say.

Then

$\dfrac{\lambda_n - \lambda_1}{\lambda_n + \lambda_1} = \dfrac{799}{801}.$

Then

$f(x_k) - f(x^*) \leq \left(\dfrac{799}{801}\right)^{2k} (f(x_0) - f(x^*)),$ but for $k = 500$,

$\left(\dfrac{799}{801}\right)^{2k} = \left(\dfrac{799}{801}\right)^{1000} \approx 0.08,$

so after 500 iterations the error bound has only shrunk by a factor of about 0.08, which is very slow! A disadvantage of linear convergence!
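The arithmetic in this remark is a one-liner to check:

```python
# Check the linear-rate arithmetic above: with condition number 800,
# the contraction factor is 799/801 per half-step of the bound, so after
# 500 iterations the error bound has shrunk by (799/801)^1000.
rate = 799 / 801
after_500_steps = rate ** 1000
print(round(after_500_steps, 3))  # ≈ 0.082
```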

2 Trust region method

In contrast to the line search method, the trust region method determines the direction and the step length simultaneously. Let $x_k$ be the current position within a region (the trust region), usually a disk $B(x_k, \Delta_k) := \{x : \|x - x_k\| \leq \Delta_k\}$.

Goal: $\min_x f(x)$, $x \in \mathbb{R}^n$.

The objective function $f$ has a good approximation $m(p)$, called the model function. Usually we pick

$m(p) = f(x_k) + \nabla f(x_k)^T p + \frac{1}{2} p^T \nabla^2 f(x_k) p,$

the second order Taylor expansion of $f$ at $x_k$.

That is the ideal case, but sometimes the Hessian is hard to find; therefore, in practice we use instead

$m(p) = f(x_k) + g_k^T p + \frac{1}{2} p^T B_k p.$

We choose $x_{k+1}$ to be the point that minimizes the model function $m(p)$ within the trust region, provided the objective decreases enough at the new point.

Remark: If the value of the objective function at the new point is not small enough, then we reject the step, shrink $\Delta_k$, and redo the constrained optimization.

2.1 Pseudo-Algorithm

Given a maximal radius $\bar{\Delta}$, pick $\Delta_0 \in (0, \bar{\Delta})$ and $\eta \in [0, \frac{1}{4})$.

1. Define

$\rho_k = \dfrac{f(x_k) - f(x_{k+1})}{m(0) - m(p_k)}, \quad x_{k+1} = x_k + p_k$


2. If $\rho_k < 0$, then reject the step, because $f(x_{k+1}) > f(x_k)$, and shrink $\Delta_k$.

3. If $\rho_k \approx 1$, accept the step and expand $\Delta_k$.

4. If $\rho_k > 0$ but close to 0, then reject the step and redo with smaller $\Delta_k$.

Pseudo-code:

Input: $\bar{\Delta}$, $\Delta_0 \in (0, \bar{\Delta})$, $\eta \in [0, \frac{1}{4})$

Compute $p_k$ by solving $\min_p m(p)$, $\|p\| \leq \Delta_k$.

Calculate $\rho_k = \dfrac{f(x_k) - f(x_{k+1})}{m(0) - m(p_k)}$.

If $\rho_k \geq \frac{3}{4}$ and $\|p_k\| = \Delta_k$, then $\Delta_{k+1} = \min\{2\Delta_k, \bar{\Delta}\}$, else $\Delta_{k+1} = \Delta_k$.

If $\rho_k > \eta$, then $x_{k+1} = x_k + p_k$, else $x_{k+1} = x_k$.
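A minimal sketch of this loop in one dimension, where the model subproblem $\min m(p)$, $|p| \leq \Delta$ can be solved exactly. The test function, the starting point, and the shrink factor 1/4 are illustrative assumptions (the shrink branch is a standard choice, not spelled out in the pseudo-code above).

```python
# One-dimensional trust-region loop following the pseudo-code above.
# f, the starting point, and the shrink factor are illustrative assumptions.

def f(x):
    return x ** 4 - 2 * x ** 2 + x  # a nonconvex test function

def fprime(x):
    return 4 * x ** 3 - 4 * x + 1

def fsecond(x):
    return 12 * x ** 2 - 4

def solve_model(g, B, delta):
    """Minimize m(p) = g*p + 0.5*B*p^2 over |p| <= delta (1-D, exact)."""
    candidates = [-delta, delta]
    if B > 0 and abs(-g / B) <= delta:
        candidates.append(-g / B)  # unconstrained minimizer, if interior
    return min(candidates, key=lambda p: g * p + 0.5 * B * p * p)

x, delta, delta_max, eta = 2.0, 1.0, 2.0, 0.1
for _ in range(50):
    g, B = fprime(x), fsecond(x)
    p = solve_model(g, B, delta)
    m_decrease = -(g * p + 0.5 * B * p * p)          # m(0) - m(p)
    rho = (f(x) - f(x + p)) / m_decrease if m_decrease > 0 else -1.0
    if rho >= 0.75 and abs(p) == delta:
        delta = min(2 * delta, delta_max)            # expand the region
    elif rho < 0.25:
        delta *= 0.25                                # shrink (standard choice)
    if rho > eta:
        x = x + p                                    # accept the step
print(x, fprime(x))  # near a stationary point: f'(x) ≈ 0
```

Starting from $x = 2$ the iterates settle on the local minimizer near $x \approx 0.84$, where the model step degenerates to a Newton step and the radius updates become inactive.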


    2.2 How to solve (or approximate) the model problem quickly?

    The model problem is

$\min_p m(p) = f_k + g_k^T p + \frac{1}{2} p^T B_k p, \quad \text{s.t. } \|p\| \leq \Delta_k,$  (19)

where $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, $B_k = \nabla^2 f(x_k)$.

Remark: The optimal solution $p^*$ of (19) is characterized by the existence of $\lambda$ such that

1. $p^*$ is feasible (i.e. $\|p^*\| \leq \Delta_k$)

2. $\lambda \geq 0$

3. $\lambda(\Delta_k - \|p^*\|) = 0$

4. $(B_k + \lambda I)p^* = -g_k$

5. $B_k + \lambda I \succeq 0$.

2.3 Three approaches to approximating the solution

1. Cauchy Point: steepest descent

    2. Dogleg method

    3. Two dimensional minimization


    2.4 Cauchy Point

$m(p) = f_k + g_k^T p + \frac{1}{2} p^T B_k p.$

The Cauchy point is the minimum point of $m(p)$ in the steepest descent direction such that $\|p\| \leq \Delta$. Define

$\varphi(\tau) = f - \tau|g|^2 + \frac{\tau^2}{2}\, g^T B_k g.$

Minimize $\varphi(\tau)$ s.t. $0 \leq \tau|g| \leq \Delta$.

The Cauchy point $p_c$ is defined by

$p_c = -\tau^* g$, where $\tau^*$ is the minimum point of $\varphi$.
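Since $\varphi$ is a one-dimensional quadratic in $\tau$, the Cauchy point has a closed form: go to the boundary if $g^T B g \leq 0$, otherwise take the unconstrained minimizer $\tau = |g|^2 / (g^T B g)$ clipped to the region. The pure-Python helpers below are illustrative.

```python
# Cauchy point: minimize phi(tau) = f - tau*|g|^2 + (tau^2/2) g^T B g
# over 0 <= tau <= Delta/|g|, then return p_c = -tau * g.
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def cauchy_point(g, B, delta):
    """Minimizer of the model along -g within the trust region."""
    gnorm2 = dot(g, g)
    gBg = dot(g, [dot(row, g) for row in B])   # g^T B g
    tau_max = delta / math.sqrt(gnorm2)        # tau at the region boundary
    if gBg <= 0:
        tau = tau_max                          # phi decreasing: go to boundary
    else:
        tau = min(gnorm2 / gBg, tau_max)       # unconstrained min, clipped
    return [-tau * gi for gi in g]

# Example: g = (1, 0), B = identity, Delta = 2: unconstrained tau = 1
p = cauchy_point([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], delta=2.0)
print(p)  # [-1.0, -0.0]
```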

    2.5 Dogleg method

If $\Delta$ is small, then $m(p) \approx f + g^T p$: the Cauchy point is a good approximation.

If $\Delta$ is not small, then the quadratic term plays a role: $m(p) = f_k + g_k^T p + \frac{1}{2} p^T B_k p$.

The unconstrained minimum of $m(p)$ is given by $\nabla m(p) = 0$.

Let $\nabla m(p) = g + Bp = 0 \implies p^B = -B^{-1}g$ (if $B > 0$).

See graph in book, page 74.

We follow a kinked path, called the dogleg path, which is parametrized by

$p(\tau) = \begin{cases} \tau p^U, & 0 \leq \tau \leq 1, \\ p^U + (\tau - 1)(p^B - p^U), & 1 \leq \tau \leq 2, \end{cases}$

where $p^U$ is the minimizer of $m$ along the steepest descent direction.

    2.6 Two dimensional problem

The model problem is solving

$\min_p f + g^T p + \frac{1}{2} p^T B p,$

where $\|p\| \leq \Delta$, $p \in \operatorname{span}\{p^U, p^B\}$.

Remark: The 2-dimensional minimization improves the minimum found in the dogleg method, which in turn improves the Cauchy point:

2-dim min $\geq$ dogleg $\geq$ Cauchy point (in quality).


    Part II

    Constrained Optimization

    3 Mathematical programming

If $f : \mathbb{R}^n \to \mathbb{R}$, the general mathematical programming optimization problem is often denoted by (MP), and it is the following problem:

    min f(x) (20)

    subject to m equality constraints

$g_1(x) = 0$
$g_2(x) = 0$
$\vdots$
$g_m(x) = 0$

and $r$ inequality constraints:

$h_1(x) \leq 0$
$h_2(x) \leq 0$
$\vdots$
$h_r(x) \leq 0$

If $f, g_1, g_2, \ldots, g_m, h_1, \ldots, h_r$ are linear (affine) functions, then the problem is called Linear Programming and denoted (LP).

    The (LP) problem can then be written in matrix form:

$\min c^T x$  (21)

such that $A_{eq}x + b_{eq} = 0$ and $A_{ineq}x + b_{ineq} \leq 0$.

    3.1 Standard Linear programming

The standard linear programming problem can then be written as:

$\min c^T x \quad \text{s.t.} \quad Ax = b, \; x \geq 0.$

    3.2 Simplex Method

Consider the following problem, which is solved using the simplex method. The tables below are called tableaux.

    10

  • 8/14/2019 Optimization class notes MTH-9842

    11/25

    (1) Consider the following linear programming problem:

$\max \; 4x_1 + 3x_2$

subject to
$3x_1 + x_2 \leq 9$
$3x_1 + 2x_2 \leq 10$
$x_1 + x_2 \leq 4$
$x_1, x_2 \geq 0.$

Transform this problem into the standard form. How many basic solutions does the standard form problem have? What are the basic feasible solutions and what are the extreme points of the feasible region? Solve the problem by the simplex method. NOTE: Basic solutions are the points which satisfy the equality constraints but are not necessarily nonnegative (if a basic solution is nonnegative, then it is called basic feasible).

    By adding slack variables, this problem can be written in standard LP form:

$\max \; 4x_1 + 3x_2$  (22)

subject to
$3x_1 + x_2 + x_3 = 9$
$3x_1 + 2x_2 + x_4 = 10$
$x_1 + x_2 + x_5 = 4$
$x_1, x_2, x_3, x_4, x_5 \geq 0.$

The tableau for this linear programming problem is given by:

Basic var. |  x1   x2   x3   x4   x5 | RHS
Z          |  -4   -3    0    0    0 |   0
x3         |   3    1    1    0    0 |   9
x4         |   3    2    0    1    0 |  10
x5         |   1    1    0    0    1 |   4

We choose x1 to be our entering variable since it has the most negative coefficient in the objective row. We perform the ratio test and obtain 3, 10/3, 4, respectively, so we select x3 as our leaving variable and 3 as our pivot. Therefore, we obtain:

Basic var. |  x1    x2    x3   x4   x5 | RHS
Z          |   0  -5/3   4/3    0    0 |  12
x1         |   1   1/3   1/3    0    0 |   3
x4         |   0     1    -1    1    0 |   1
x5         |   0   2/3  -1/3    0    1 |   1

The entering variable is the only remaining variable with a negative coefficient in the objective row, x2; performing the ratio test we obtain 9, 1, 3/2, respectively. Therefore, the leaving variable is x4 and the pivot is 1, which leads to tableau #3:

Basic var. |  x1   x2    x3    x4   x5 |  RHS
Z          |   0    0  -1/3   5/3    0 | 41/3
x1         |   1    0   2/3  -1/3    0 |  8/3
x2         |   0    1    -1     1    0 |    1
x5         |   0    0   1/3  -2/3    1 |  1/3


Since x3 now has a negative coefficient in the objective row, we choose x3 as the entering variable. We perform the ratio test on the positive entries in that column and obtain 4 and 1, respectively, so x5 is our leaving variable and 1/3 is our pivot element, which leads to our last tableau:

Basic var. |  x1   x2   x3   x4   x5 | RHS
Z          |   0    0    0    1    1 |  14
x1         |   1    0    0    1   -2 |   2
x2         |   0    1    0   -1    3 |   2
x3         |   0    0    1   -2    3 |   1

Since all objective row coefficients are nonnegative, we have found an optimal solution, namely 14, attained at x1 = 2, x2 = 2, x3 = 1.
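The counts asked for in the problem statement can be checked by brute force: enumerate all choices of three basic variables, solve the resulting 3x3 system exactly, and test nonnegativity. The helper names are mine; exact arithmetic uses the standard library `fractions` module.

```python
# Enumerate basic solutions of the standard-form LP above:
# pick 3 basic columns out of 5, solve A_B x_B = b exactly, check x >= 0.
from fractions import Fraction
from itertools import combinations

A = [[3, 1, 1, 0, 0],
     [3, 2, 0, 1, 0],
     [1, 1, 0, 0, 1]]
b = [9, 10, 4]
c = [4, 3, 0, 0, 0]

def solve3(M, rhs):
    """Gauss-Jordan elimination with exact fractions; None if singular."""
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(3):
        piv = next((r for r in range(col, 3) if M[r][col] != 0), None)
        if piv is None:
            return None
        M[col], M[piv] = M[piv], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(3):
            if r != col and M[r][col] != 0:
                M[r] = [vr - M[r][col] * vc for vr, vc in zip(M[r], M[col])]
    return [row[3] for row in M]

basic, feasible = [], []
for cols in combinations(range(5), 3):
    xB = solve3([[A[i][j] for j in cols] for i in range(3)], b)
    if xB is None:
        continue
    x = [Fraction(0)] * 5
    for j, v in zip(cols, xB):
        x[j] = v
    basic.append(x)
    if all(v >= 0 for v in x):
        feasible.append(x)

best = max(feasible, key=lambda x: sum(ci * xi for ci, xi in zip(c, x)))
print(len(basic), len(feasible))  # 10 basic solutions, 5 of them feasible
print([int(v) for v in best[:2]], sum(ci * xi for ci, xi in zip(c, best)))  # [2, 2] 14
```

The basic feasible solutions found this way are exactly the extreme points of the feasible region, and the best one reproduces the simplex answer above.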



    4 Convex Optimization

    NOTE: See Convex Optimization book chapters 4 and 5.

A convex optimization problem is one of the form

$\min f_0(x)$

subject to
$f_i(x) \leq 0, \quad i = 1, \ldots, m$
$a_i^T x = b_i, \quad i = 1, \ldots, p,$  (23)

where $f_0, \ldots, f_m$ are convex functions.

A fundamental property of convex optimization problems is that any locally optimal point is also (globally) optimal. The proof is by contradiction, using convexity on the segment from $x$ to $y$: for $z = (1 - \theta)x + \theta y$,

$f_0(z) \leq (1 - \theta)f_0(x) + \theta f_0(y) < f_0(x).$

More notes can be found in Boyd and Vandenberghe, pp. 136-151.

    4.1 Chebyshev problem

Assume $X$ is discrete and finite, $X \in \{x_1, x_2, \ldots, x_n\}$, with

$P[X = x_i] = p_i.$

Given that $f_0(x_i) = a_{0i}$, $f_1(x_i) = a_{1i}$, ..., $f_m(x_i) = a_{mi}$,

and

$E[f_k(X)] = \sum_{i=1}^n p_i a_{ki} \quad \forall k,$

the Chebyshev problem is to find

$\min E[f_0(X)]$

$\text{s.t. } \alpha_k \leq E[f_k(X)] \leq \beta_k, \quad k = 1, \ldots, m.$

    4.1.1 Example 1

    I could not add as a note into the book.

Let

$f_0(x) = x$

$f_1(x) = 1_{(\alpha, \infty)}(x),$

then the problem becomes

$\min E[X] = \sum p_i x_i$

s.t.

$\alpha_1 \leq E[1_{(\alpha, \infty)}(X)] = P[X > \alpha] = \sum_{\{x_i > \alpha\}} p_i \leq \beta_1.$

    4.1.2 Example 2: Forward rate

Let $X$ be the price of an asset.

What is the max/min of a call on $X$ struck at $K$, given $E[X]$?

$\max_p \text{ or } \min_p \; E[(X - K)^+] = \sum_{i=1}^n (x_i - K)^+ p_i$

s.t.

$E[X] = \sum_{i=1}^n x_i p_i = F, \quad p \geq 0, \quad \sum p_i = 1.$

It is more interesting / challenging if the constraint is a quoted call price:

$E[(X - K_1)^+] = C_1, \quad \text{i.e.} \quad \sum_{i=1}^n (x_i - K_1)^+ p_i = C_1.$

What could be added to make the problem more realistic?

Adding variance, the problem becomes quadratic optimization.
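Because this LP has only two equality constraints, its extreme points put mass on at most two states, so the price bounds can be found by enumerating pairs. A small sketch; the grid of prices and the forward value are illustrative assumptions.

```python
# Bounds on E[(X-K)^+] subject to E[X] = F, sum p_i = 1, p >= 0.
# Extreme points of this feasible set put mass on at most two states,
# so it suffices to enumerate pairs (i, j). Illustrative data below.
from itertools import combinations

xs = [80.0, 90.0, 100.0, 110.0, 120.0]  # possible prices (assumed grid)
F, K = 100.0, 100.0                      # forward and strike (assumed)

def payoff(x):
    return max(x - K, 0.0)

values = []
for i, j in combinations(range(len(xs)), 2):
    xi, xj = xs[i], xs[j]
    pi = (F - xj) / (xi - xj)            # solve pi*xi + pj*xj = F, pi + pj = 1
    pj = 1.0 - pi
    if pi >= 0 and pj >= 0:
        values.append(pi * payoff(xi) + pj * payoff(xj))
# single-state solutions: p_i = 1 requires x_i = F
values += [payoff(x) for x in xs if x == F]

print(min(values), max(values))  # min 0.0 (intrinsic value), max 10.0
```

The lower bound 0 is Jensen's inequality, $(F - K)^+ \leq E[(X - K)^+]$; the upper bound is attained by splitting mass between the two endpoints of the grid.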


    4.2 Quadratic problem QP

    Quadratic Objective function:

$\min \frac{1}{2} x^T Q x + p^T x + r$

s.t.

$Ax = b, \quad Gx \leq h$  (affine constraints)

    4.2.1 Quadratic constraint quadratic problem QCQP

$\min \frac{1}{2} x^T Q x + p^T x + r$

s.t.

$Ax = b$

and

$\frac{1}{2} x^T Q_i x + p_i^T x + r_i \leq 0, \quad i = 1, 2, \ldots, m.$

    4.2.2 Markowitz portfolio optimization

Let $X_1, X_2, \ldots, X_n$ be the returns of the risky assets, with mean returns

$\mu_1, \mu_2, \ldots, \mu_n$

and covariance matrix $\Sigma$, $\Sigma_{i,j} = E[(X_i - \mu_i)(X_j - \mu_j)]$.

Let $P := (p_1, p_2, \ldots, p_n)$ be the portfolio of these assets, i.e., the amount invested in each asset $1, 2, \ldots, n$.

Then $\mu^T p$ is the expected return of the portfolio and

$p^T \Sigma p$

is the variance of the portfolio $P$.

Consider the problem

$\min_p p^T \Sigma p$  (minimize risk)

such that

$\mu^T p \geq r_{\min}$  (expected return no less than $r_{\min}$)

$\sum_{i=1}^n p_i \leq 1$  (budget constraint)

$P[p^T X \leq \alpha] \leq \beta$  (shortfall constraint)
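To make the objective concrete, here is the risk/return arithmetic $\mu^T p$ and $p^T \Sigma p$ for a fixed two-asset portfolio; the means and covariance matrix are made-up illustrative numbers.

```python
# Risk/return of a fixed two-asset portfolio: mu^T p and p^T Sigma p.
# The means and covariance below are made-up illustrative numbers.
mu = [0.10, 0.04]                      # mean returns
sigma = [[0.04, 0.01],                 # covariance matrix (symmetric, PSD)
         [0.01, 0.02]]
p = [0.5, 0.5]                         # equal-weight portfolio

expected_return = sum(m * w for m, w in zip(mu, p))
variance = sum(p[i] * sigma[i][j] * p[j]
               for i in range(2) for j in range(2))
print(round(expected_return, 6), round(variance, 6))  # 0.07 0.02
```

The Markowitz problem above then searches over the weights $p$ for the smallest such variance subject to the return, budget, and shortfall constraints.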


    5 Duality Theory

Problem:

$\min f_0(x)$  (24)

such that
$f_i(x) \leq 0, \quad i = 1, \ldots, m$
$h_i(x) = 0, \quad i = 1, \ldots, p$

Let $x \in D$, which is the intersection of the domains of all the functions.

Let

$L(x, \lambda, \nu) := f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) = f_0(x) + \langle \lambda, f \rangle + \langle \nu, h \rangle,$

where

$L : D \times \mathbb{R}^m \times \mathbb{R}^p \to \mathbb{R}$

is the Lagrangian.

Define

$g(\lambda, \nu) := \inf_x L(x, \lambda, \nu) = \inf_x \left\{ f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \right\}.$

This function $g$ is the dual function of (24).

Remark:

1. $g$ is concave, since it is the pointwise infimum of functions that are affine in $(\lambda, \nu)$.

2. Let $p^*$ be the optimal value of (24); then

$p^* \geq g(\lambda, \nu) \quad \forall \lambda \geq 0.$

Proof: Let $\lambda \geq 0$ and let $x$ be feasible. Then

$L(x, \lambda, \nu) := f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x) \leq f_0(x),$

because $\lambda_i \geq 0$, $f_i(x) \leq 0$, and $h_i(x) = 0$, since $x$ is feasible. Hence,

$\inf_{y \in D} L(y, \lambda, \nu) \leq L(x, \lambda, \nu) \leq f_0(x) \quad \forall x \text{ feasible and } \lambda \geq 0,$

but

$\inf_{y \in D} L(y, \lambda, \nu) = g(\lambda, \nu),$

so that

$g(\lambda, \nu) \leq p^*.$


5.0.3 Example: dual function for LP

$\min c^T x \quad \text{s.t. } Ax = b, \; x \geq 0.$ Rewrite $x \geq 0$ instead as $-x \leq 0$.

Then

$L(x, \lambda, \nu) := c^T x + \lambda^T(-x) + \nu^T(Ax - b)$

and

$g(\lambda, \nu) = \inf_x \{(c - \lambda + A^T\nu)^T x\} - \nu^T b = \begin{cases} -\nu^T b, & c - \lambda + A^T\nu = 0, \\ -\infty, & \text{otherwise.} \end{cases}$

Since $(c - \lambda + A^T\nu)^T x$ is linear in $x$ with no constraints, its infimum is $-\infty$ unless the coefficient vanishes. To avoid that, we need to have

$c - \lambda + A^T\nu = 0,$

or equivalently: $A^T\nu = \lambda - c.$

5.0.4 Example: trust-region problem

$\min x^T A x + b^T x \quad \text{s.t. } x^T x \leq \Delta^2,$ where $\Delta$ is the radius of the trust region.

Find the dual function of the problem.

Solution:

$L(x, \lambda) := x^T A x + b^T x + \lambda(x^T x - \Delta^2)$

and

$g(\lambda) = \inf_{x \in \mathbb{R}^n} L(x, \lambda) = \inf_x \{x^T A x + b^T x + \lambda(x^T x - \Delta^2)\} = \inf_x \{x^T(A + \lambda I)x + b^T x\} - \lambda\Delta^2.$

Then, since $A$ is symmetric, we obtain

$\nabla_x \left( x^T(A + \lambda I)x + b^T x - \lambda\Delta^2 \right) = 2(A + \lambda I)x + b = 0,$

and hence

$x^* = -\frac{1}{2}(A + \lambda I)^{-1} b$

if $A + \lambda I$ is invertible.

Therefore,

$g(\lambda) = \begin{cases} -\frac{1}{4} b^T (A + \lambda I)^{-1} b - \lambda\Delta^2, & \text{if } A + \lambda I \text{ is positive definite,} \\ -\infty, & \text{otherwise.} \end{cases}$

The first term is obtained by plugging $x^* = -\frac{1}{2}(A + \lambda I)^{-1} b$ into $L$:

$\frac{1}{4} b^T (A + \lambda I)^{-1}(A + \lambda I)(A + \lambda I)^{-1} b - \frac{1}{2} b^T (A + \lambda I)^{-1} b = -\frac{1}{4} b^T (A + \lambda I)^{-1} b.$
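The formula for $g(\lambda)$ can be sanity-checked numerically in one dimension, where $A = a$ is a scalar and the inverse is just $1/(a + \lambda)$; the numbers below are arbitrary.

```python
# 1-D check of g(lambda) = -b^2 / (4(a + lambda)) - lambda * Delta^2:
# compare the closed form against a brute-force infimum over x.
a, b, delta, lam = 1.0, 2.0, 1.5, 0.7   # arbitrary, with a + lam > 0

closed_form = -b * b / (4 * (a + lam)) - lam * delta ** 2

# brute-force inf_x { a x^2 + b x + lam (x^2 - delta^2) } on a fine grid
xs = [i / 1000.0 for i in range(-5000, 5001)]
brute = min((a + lam) * x * x + b * x - lam * delta ** 2 for x in xs)

print(abs(closed_form - brute) < 1e-3)  # True
```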

Remark:

$p^* \geq -\frac{1}{4} b^T (A + \lambda I)^{-1} b - \lambda\Delta^2$ whenever $\lambda \geq 0$ and $A + \lambda I$ is positive definite.

The dual function provides a nontrivial lower bound on $p^*$ if $\lambda \geq 0$ and $g(\lambda, \nu) \neq -\infty$.

A pair $(\lambda, \nu)$ s.t. $\lambda \geq 0$ and $g(\lambda, \nu) > -\infty$ is called dual feasible.

Natural questions:

1. What is the best lower bound?

2. Is the best lower bound equal to $p^*$?

    5.1 Dual Problem

    Answer:

1. Dual problem:

$d^* := \max g(\lambda, \nu) \quad \text{s.t. } \lambda \geq 0,$

which is always a convex problem.

2. In general, $p^* \neq d^*$; we only know $p^* \geq d^*$.

5.1.1 Strong Duality

If $p^* = d^*$, then we say we have strong duality.

    5.1.2 Weak Duality

Since

$p^* \geq g(\lambda, \nu) \quad \forall \lambda \geq 0,$

we have

$p^* \geq \max\{g(\lambda, \nu) \mid \lambda \geq 0\} =: d^*.$

    NOTE: Weak duality is always true, no matter if the primal is convex or not.
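Weak duality is easy to observe on the LP solved in the simplex section. For a maximization LP, the inequality flips into an upper bound: any $y \geq 0$ with $y^T A \geq c$ componentwise bounds the optimal value by $y^T b$. The candidate $y$ below is read off the final simplex tableau (the coefficients of the slack variables in the objective row); this pairing is a standard fact about LP duality, not derived in these notes.

```python
# Weak duality on the earlier LP: max 4x1 + 3x2 s.t. Ax <= b, x >= 0.
# Any y >= 0 with y^T A >= c (componentwise) gives the bound c^T x <= y^T b.
A = [[3, 1], [3, 2], [1, 1]]
b = [9, 10, 4]
c = [4, 3]

def is_dual_feasible(y):
    yTA = [sum(y[i] * A[i][j] for i in range(3)) for j in range(2)]
    return all(v >= 0 for v in y) and all(yTA[j] >= c[j] for j in range(2))

y = [0, 1, 1]                # candidate read off the final simplex tableau
bound = sum(yi * bi for yi, bi in zip(y, b))
print(is_dual_feasible(y), bound)  # True 14: matches the primal optimum
```

Here the bound equals the primal optimum 14, so this LP in fact exhibits strong duality, as discussed below.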

    5.1.3 Examples

1. LP:

$\min c^T x \quad \text{s.t. } Ax = b, \; x \geq 0.$

Then

$L(x, \lambda, \nu) := c^T x + \lambda^T(-x) + \nu^T(Ax - b) = (c - \lambda + A^T\nu)^T x - \nu^T b$

and

$g(\lambda, \nu) = \inf_x \{(c - \lambda + A^T\nu)^T x\} - \nu^T b = \begin{cases} -\nu^T b, & c - \lambda + A^T\nu = 0, \\ -\infty, & \text{otherwise.} \end{cases}$


The dual problem is

$\max g(\lambda, \nu) \quad \text{s.t. } \lambda \geq 0.$

Rewriting it in an equivalent form:

$\max_{\lambda, \nu} -\nu^T b \quad \text{s.t. } A^T\nu - \lambda + c = 0, \; \lambda \geq 0,$

or alternatively, since $\lambda$ is playing the role of a slack variable:

$\max_\nu -\nu^T b \quad \text{s.t. } A^T\nu + c \geq 0.$

Remark: The dual of the dual problem of the LP problem is the LP problem again!

2. Two-way partition problem:

$\min x^T W x$, $W$ symmetric, not necessarily positive definite,

s.t. $x = (x_1, \ldots, x_n)$, $x_i^2 = 1$, $i = 1, \ldots, n$.

What is the dual problem?

Solution:

$L(x, \nu) = x^T W x + \sum_i \nu_i (x_i^2 - 1)$

and

$g(\nu) = \inf_x \left\{ x^T W x + \sum_{i=1}^n \nu_i x_i^2 - \sum_{i=1}^n \nu_i \right\} = \inf_x \left\{ x^T (W + N) x \right\} - \sum_{i=1}^n \nu_i,$

which equals

$\begin{cases} -\sum_{i=1}^n \nu_i, & W + N \succeq 0, \\ -\infty, & \text{otherwise,} \end{cases}$

where

$N = \begin{pmatrix} \nu_1 & & 0 \\ & \ddots & \\ 0 & & \nu_n \end{pmatrix}.$

Dual problem:

$\max -\sum_{i=1}^n \nu_i \quad \text{s.t. } W + N \succeq 0.$  (25)

Remarks: $p^* \geq d^*$, even if $p^*$ or $d^*$ is infinite:

1. If $p^* = -\infty$, i.e. the primal is unbounded, then $d^* = -\infty$, i.e., the dual is infeasible.

2. If $d^* = +\infty$, i.e., the dual is unbounded, then $p^* = +\infty$, i.e., the primal is infeasible.

5.2 Strong duality

$p^* = d^*$

Remark: In general, we don't have strong duality. For convex problems, we usually have strong duality.

Theorem 5.1 (Slater's condition). For a convex problem, strong duality holds if there is an $x \in \operatorname{relint}(D)$ such that $x$ is strictly feasible, i.e.,

$f_i(x) < 0, \quad i = 1, \ldots, m$
$h_i(x) = 0, \quad i = 1, \ldots, p.$

Notice the strict inequality in the first condition.

Remark (weaker Slater condition): If $f_1, \ldots, f_m$ are affine, then Slater's condition can be weakened to nonpositivity: $f_1(x) \leq 0, \ldots, f_m(x) \leq 0$.

    5.2.1 Examples

1. LP:

$\min c^T x \quad \text{s.t. } Ax = b, \; x \geq 0.$

Slater's condition just means feasibility in this case. The dual in this case is

$\max_{\lambda, \nu} -\nu^T b \quad \text{s.t. } A^T\nu - \lambda + c = 0, \; \lambda \geq 0.$

(a) If the primal is feasible (then the weak Slater condition holds), then we have strong duality, i.e. $p^* = d^*$.

(b) If the dual is feasible (then the weak Slater condition holds for the dual), then we have strong duality, i.e. $d^* = p^*$.

(c) If the primal and dual are both infeasible, then it can happen that $p^* = +\infty$ and $d^* = -\infty$.


2. QCQP (quadratically constrained quadratic programming):

$\min \frac{1}{2} x^T P_0 x + q_0^T x + r_0$

s.t.

$\frac{1}{2} x^T P_i x + q_i^T x + r_i \leq 0, \quad i = 1, \ldots, m$

$Ax = b.$

Then Slater's condition is: $\exists x$ with $Ax = b$ such that $\frac{1}{2} x^T P_i x + q_i^T x + r_i < 0$ for all $i$.

Convexity: $P_0 \succ 0$, $P_i \succeq 0$, $i = 1, \ldots, m$.

Slater's condition + convexity give us strong duality.

3. Nonconvex problem with strong duality: Strong duality holds for any optimization problem with a quadratic objective function and one quadratic inequality constraint, provided that Slater's condition holds.

Remark: Trust-region problem

$\min x^T A x + b^T x \quad \text{s.t. } x^T x \leq \Delta^2,$

where $A$ is symmetric but not necessarily positive definite!

Dual problem:

$\max -\frac{1}{4} b^T (A + \lambda I)^{-1} b - \lambda\Delta^2 \quad \text{s.t. } \lambda \geq 0, \; A + \lambda I \succeq 0.$

    Figure 3: Solving both problems simultaneously gives faster convergence.

    5.3 Optimality conditions

    Assume that the objective function and the constraints are differentiable.


Complementary slackness:

Assume we have strong duality, i.e. $p^* = d^*$, attained at $x^*$ for the primal problem and at $(\lambda^*, \nu^*)$ for the dual problem, i.e.

$f_0(x^*) = g(\lambda^*, \nu^*)$  (strong duality)

$= \inf_x L(x, \lambda^*, \nu^*)$  (definition of $g$)

$\leq L(x^*, \lambda^*, \nu^*)$

$= f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*) + \sum_{i=1}^p \nu_i^* h_i(x^*)$  (definition of the Lagrangian)

$\leq f_0(x^*).$

Since the first and last terms are equal, every inequality in the chain is an equality. Since

$h_i(x^*) = 0, \quad f_i(x^*) \leq 0, \quad \lambda_i^* \geq 0,$

it follows that both sums have to be exactly 0.

Therefore,

1. $x^*$ minimizes $L(x, \lambda^*, \nu^*)$ over $x$.

2. Complementary slackness: $\lambda_i^* f_i(x^*) = 0$, $i = 1, \ldots, m$.

5.4 Necessary optimality conditions (Karush-Kuhn-Tucker)

Dual feasibility: $\lambda_i \geq 0$, $\forall i$.

Primal feasibility:

$f_i(x) \leq 0, \quad i = 1, \ldots, m, \qquad h_i(x) = 0, \quad i = 1, \ldots, p.$

Complementary slackness: $\lambda_i f_i(x) = 0$, $i = 1, \ldots, m$.

Stationarity:

$\nabla_x L(x, \lambda, \nu) = \nabla f_0(x) + \sum_{i=1}^m \lambda_i \nabla f_i(x) + \sum_{i=1}^p \nu_i \nabla h_i(x) = 0.$


Contents

I Unconstrained optimization

1 Line search method
1.1 Determine a direction (descent)
1.1.1 Steepest descent (negative gradient)
1.1.2 Newton's method
1.2 How far do we go along a chosen direction?
1.2.1 Criteria for acceptable stepsize
1.3 Convergence
1.4 More on Convergence Rate

2 Trust region method
2.1 Pseudo-Algorithm
2.2 How to solve (or approximate) the model problem quickly?
2.3 Three approaches to approximating the solution
2.4 Cauchy Point
2.5 Dogleg method
2.6 Two dimensional problem

II Constrained Optimization

3 Mathematical programming
3.1 Standard Linear programming
3.2 Simplex Method

4 Convex Optimization
4.1 Chebyshev problem
4.1.1 Example 1
4.1.2 Example 2: Forward rate
4.2 Quadratic problem QP
4.2.1 Quadratic constraint quadratic problem QCQP
4.2.2 Markowitz portfolio optimization

5 Duality Theory
5.0.3 Example: dual function for LP
5.0.4 Example: trust-region problem
5.1 Dual Problem
5.1.1 Strong Duality
5.1.2 Weak Duality
5.1.3 Examples
5.2 Strong duality
5.2.1 Examples
5.3 Optimality conditions
5.4 Necessary optimality conditions (Karush-Kuhn-Tucker)