Notes on Luenberger's Vector Space Optimization


  • 8/13/2019 Notes on Luenberger's Vector Space Optimization



Convexity and Optimization
WITH APPLICATIONS

Paul G. Bamberg

Copyright © 2008 Paul G. Bamberg
Harvard University, Cambridge, MA 02138

This text is based on lecture notes by Paul G. Bamberg written for MATH 116: Convexity and Optimization with Applications, a course offered at Harvard University in Fall 2008. The notes were meant to complement Optimization by Vector Space Methods by David Luenberger (Wiley Interscience, 1969 [1997]).

Front cover: The image was generated by the following MATHEMATICA code:

GraphicsGrid[Table[ReliefPlot[
    Table[Evaluate[Sum[RiemannSiegelZ[RandomReal[3, 2].{x, y}], {3}]],
        {x, 0, 10, .2}, {y, 0, 10, .2}],
    ColorFunction -> ColorData["BlueGreenYellow"], Frame -> False],
  {3}, {3}]]


CONTENTS

1 Generalizing from Two Dimensions 5
    1.1 Introduction
    1.2 Existence of Optimal Solutions
    1.3 Linear Programming
    1.4 Finite- vs. Infinite-Dimensional Vector Spaces
    1.5 Minimum Norm Problems

2 Preliminaries in Algebra, Topology, and Analysis 21
    2.1 Vector Spaces
    2.2 Convex Sets
    2.3 Linear Independence and Dimension
    2.4 Normed Vector Spaces
    2.5 Open and Closed Sets
    2.6 Convergence, Limits, and Continuity

3 Banach Spaces 41
    3.1 lp Space
    3.2 Lebesgue Integration
    3.3 Lp Space
    3.4 Cauchy Sequences
    3.5 Compactness and Extrema
    3.6 Quotient Spaces
    3.7 Denseness and Separability

4 Hilbert Space 59
    4.1 Inner Products
    4.2 The Projection Theorem
    4.3 Orthogonal Complements
    4.4 The Gram-Schmidt Procedure
    4.5 Fourier Series

5 Dual Spaces and the Hahn-Banach Theorem 75
    5.1 Linear Functionals
    5.2 Common Dual Spaces

6 Applications of the Hahn-Banach Theorem 87
    6.1 The Dual of C[a, b]
    6.2 The Second Dual Space
    6.3 Alignment and Orthogonal Components
    6.4 Minimum Norm Problems
    6.5 Applications
    6.6 Hyperplanes and Linear Functionals

7 Calculus of Variations 103
    7.1 Review
    7.2 Gateaux and Fréchet Differentials
    7.3 Euler-Lagrange Equations
    7.4 Problems with Constraints

8 Convex Functionals 115
    8.1 Local to Global
    8.2 Conjugate Convex Functionals
    8.3 Conjugate Concave Functionals
    8.4 Fenchel Duality

Bibliography 130


CHAPTER 1

GENERALIZING FROM TWO DIMENSIONS

    Reading: [1, Chapter 1]

    1.1 Introduction

The general approach in [1], which has made the book a classic, is this:

• Identify techniques from algebra, elementary single-variable calculus, or elementary multivariable calculus that can be used to solve optimization problems.

• Reformulate the solution geometrically.

• Using geometry for inspiration, generalize the solution, typically to infinite-dimensional vector spaces and non-Euclidean norms, and prove (algebraically) that it is still valid.

All the finite-dimensional problems in this chapter should be familiar, though they may be valuable review for some students. The infinite-dimensional problems are just stated, not solved, and we will take quite a while to get to them.

In this chapter, we will hold off on defining some important concepts, which for now are just in SMALL CAPS. These concepts will appear in Chapters 2, 3, and 5 of [1], generally in a context where there is no mention of optimization, just some challenging mathematics. You will need to learn them before you tackle optimization. In the process, you will acquire a good background in real analysis and in the branch of mathematics called functional analysis, the theory of normed infinite-dimensional vector spaces.

My hope for these introductory notes is to convince you that optimization problems are fun and relevant, that some of the best ones can only be formulated in infinite-dimensional vector spaces, and that it is worth your while to learn quite a few new definitions and theorems in order to be able to solve them.

    1.2 Existence of Optimal Solutions

We begin by recalling an EXISTENCE THEOREM from real analysis:

THEOREM 1.2.1 (extreme value theorem). If f is a continuous real function on a compact metric space X, M = sup_{p ∈ X} f(p), and m = inf_{p ∈ X} f(p), then there exist points q, r ∈ X such that f(q) = M and f(r) = m.

    The following examples illustrate applications of this theorem.


Example 1.2.2. You are entering a student competition to draw up a business plan for a company with m scientists and n other employees. Entries with m² ≥ 2n² get rejected. You want to have the highest possible ratio of scientists to other employees. Does this optimization problem have a solution?

Solution (based on [2, Example 1.1, p. 2]): We want to solve max m/n such that m² < 2n² for m, n ∈ N. Equivalently, we want to find the largest rational number p = m/n such that p² < 2. We cannot apply Theorem 1.2.1 because the function we are optimizing (our OBJECTIVE FUNCTION) is rational-, not real-valued. This alone is not enough to rule out the existence of an optimal solution, but we can show that, in fact, an optimal solution does not exist.

Assume that p is this largest rational number such that p² < 2. Define

    q = p − (p² − 2)/(p + 2).    (1.2.1)

Then

    q² − 2 = 2(p² − 2)/(p + 2)².    (1.2.2)

Since p² < 2, (1.2.2) shows that q² < 2. However, (1.2.1) shows that q > p, which contradicts our initial assumption that p is the largest rational number such that p² < 2. Therefore this optimization problem has no solution.

Of course, note that as p² approaches 2, m and n would become larger and larger. Since the number of people is finite, we should impose constraints m ≤ m_max and n ≤ n_max, in which case there would be a solution.
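The improvement step (1.2.1) is easy to check in exact rational arithmetic. The following sketch (illustrative; not part of the original notes) iterates q = p − (p² − 2)/(p + 2) and verifies that each step produces a strictly larger rational whose square is still below 2:

```python
from fractions import Fraction

def improve(p):
    # q = p - (p^2 - 2)/(p + 2), as in (1.2.1)
    return p - (p * p - 2) / (p + 2)

p = Fraction(1)
for _ in range(5):
    q = improve(p)
    # q is strictly better and still feasible, so p was not optimal
    assert q > p and q * q < 2
    p = q
print(p, float(p * p))  # p^2 creeps up toward 2 but never reaches it
```

Because every feasible p can be improved, the supremum √2 is approached but never attained by any rational.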

Example 1.2.3. As director of the state lottery, you are designing scratch tickets by assigning probabilities to the possible payoffs from $0 through $4. Since a ticket sells for $5, you want to be as generous as possible. Does this optimization problem in R⁵ have a solution? Change the problem so that any nonnegative integer payoff is allowed. Does this optimization problem in an infinite-dimensional vector space have a solution?

Solution: We define "as generous as possible" as maximizing the expected value of the ticket. Therefore, in the first problem, we want to solve max Σ_{k=0}^{4} k p_k, where p_k is the probability of receiving a payoff of $k, such that 0 ≤ p_k ≤ 1 for all k = 0, ..., 4 and Σ_{k=0}^{4} p_k = 1. Our objective function is continuous and real-valued, and the constraints on (p_k)_{k=0}^{4} define a compact set, so Theorem 1.2.1 guarantees the existence of a solution. We simply set p_k = 1 for k = 4 and p_k = 0 for k < 4.

For the second problem, we want to solve max Σ_{k≥0} k p_k, such that 0 ≤ p_k ≤ 1 for all k ≥ 0 and Σ_{k≥0} p_k = 1. The notion of compactness is tricky in infinite-dimensional vector spaces, so we cannot apply Theorem 1.2.1 to this problem, but we can show that, in fact, an optimal solution does not exist. Assume that (π_k)_{k≥0} is an optimal solution. We can always increase the expected value of the ticket and still satisfy our constraints by setting p₀ = 0 and p_k = π_{k−1} for k ≥ 1, since

    Σ_{k≥1} k π_{k−1} = Σ_{k≥0} (k + 1) π_k = Σ_{k≥0} k π_k + 1 > Σ_{k≥0} k π_k.

Because the expected value can be increased without bound, there is no solution.
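The mass-shifting argument can be illustrated on any finite payoff distribution (a sketch with made-up numbers, not from the notes):

```python
def expected_value(p):
    # p maps payoff k to its probability p_k
    return sum(k * pk for k, pk in p.items())

def shift_up(p):
    # set p_0 = 0 and move the mass at payoff k to payoff k + 1
    return {k + 1: pk for k, pk in p.items()}

p = {0: 0.5, 1: 0.3, 2: 0.2}
q = shift_up(p)
# the shift preserves total probability and raises the mean by exactly 1
print(expected_value(p), expected_value(q))
```

Repeating the shift raises the expected value by 1 each time, which is why no distribution can be optimal.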


    We will return to the concepts of compactness and METRIC SPACES later.

    1.3 Linear Programming

Example 1.3.1. Your small bakery can produce only two products: frosted cookies and cakes. A batch of frosted cookies uses up 1 pound of flour and 3 pounds of sugar. A batch of cake uses up 2 pounds of flour and 1 pound of sugar. Each day your suppliers bring you 14 pounds of flour and 17 pounds of sugar. Your optimization problem is to look at the market price of cookies and cakes and decide what to produce.

    Figure 1.3.1: possible production schemes in Example 1.3.1

Solution: Let x be the number of batches of cookies, y be the number of batches of cake, p₁ be the market price of a batch of cookies, and p₂ be the market price of a batch of cake. Then we want to solve:

    max_{x,y}  p₁x + p₂y    (revenue)

    such that  x + 2y ≤ 14    (flour constraint)
               3x + y ≤ 17    (sugar constraint)
               x, y ≥ 0.

Figure 1.3.1 shows the possible production schemes. Observe that if v and w represent possible production schemes, then λv + (1 − λ)w with 0 ≤ λ ≤ 1 is also possible. This is the definition of a CONVEX SET. To show this, let

    A = [[1, 2], [3, 1]],  x = (x, y)ᵀ,  b = (14, 17)ᵀ

so that we can write the constraints as Ax ≤ b. Then if Av ≤ b and Aw ≤ b, λAv ≤ λb and (1 − λ)Aw ≤ (1 − λ)b; adding these inequalities gives the desired result.

Now we can consider what the optimal solutions are for various values of p₁ and p₂:

• p₁ = 5, p₂ = 5. We see from Figure 1.3.1 that the optimal solution is x = 4, y = 5. Revenue is $5 · 4 + $5 · 5 = $45.

• p₁ = 1, p₂ = 7. Again, we see from Figure 1.3.1 that the optimal solution is x = 0, y = 7. Revenue is $1 · 0 + $7 · 7 = $49.

• p₁ = 6, p₂ = 2. Notice in Figure 1.3.1 that the revenue function overlaps with the sugar constraint. Therefore, we maximize revenue by choosing any point along this constraint, for example x = 17/3, y = 0. Revenue is $6 · (17/3) + $2 · 0 = $34.
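Since the feasible region is a convex polygon and revenue is linear, the maximum is always attained at a vertex, so the cases above can be checked by enumerating the vertices (computed by hand from the constraints; a sketch, not from the notes):

```python
# vertices of {(x, y) : x + 2y <= 14, 3x + y <= 17, x >= 0, y >= 0}
vertices = [(0.0, 0.0), (0.0, 7.0), (4.0, 5.0), (17.0 / 3.0, 0.0)]

def best_scheme(p1, p2):
    revenue = lambda v: p1 * v[0] + p2 * v[1]
    v = max(vertices, key=revenue)
    return v, revenue(v)

print(best_scheme(5, 5))  # the vertex (4, 5) yields revenue 45
```

In the degenerate case p₁ = 6, p₂ = 2 two vertices tie at revenue 34, reflecting the fact that the whole sugar-constraint edge is optimal.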

The revenue function is an example of a (LINEAR) FUNCTIONAL on R² that we are trying to optimize. It is an element of the DUAL SPACE. The straight line that we slide to solve the problem is an example of a HYPERPLANE. The fact that this approach works for any convex set is a simple consequence of the HAHN-BANACH THEOREM.

What we have done is to calculate, for any functional in the dual space, the largest value that this functional can achieve subject to the constraint imposed by our budget. This function on the dual space is called the SUPPORT FUNCTIONAL.

Example 1.3.2. What is the support functional for the (closed) unit disk D²? Can you reconstruct the unit disk (or any other convex set) from its support functional?

Figure 1.3.2: As shown in Example 1.3.2, on the left we see that the envelope of all functionals for which the support functional returns a constant value (i.e., 1) is the boundary of our convex set, the closed unit disk. On the right we see that the contour plots of the support functional c(m, n) = √(m² + n²) are concentric circles centered at the origin; the contour c(m, n) = 1 is the boundary of our convex set.

Solution: Let the functionals be given by mx + ny. The support functional is then

    c(m, n) = inf{c : mx + ny ≤ c for all (x, y) ∈ D²}

(D² is the closed unit disk). We can see from the left side of Figure 1.3.2 that c(m, n) = c₀, where mx₀ + ny₀ = c₀ is the tangent to the unit disk at the point (x₀, y₀). The slope of the tangent is −m/n, so the slope of the normal is n/m and the angle θ between the normal and the positive x-axis is tan⁻¹(n/m). Therefore, x₀ = cos θ = m/√(m² + n²) and y₀ = sin θ = n/√(m² + n²). Then c(m, n) = c₀ = mx₀ + ny₀ = √(m² + n²).

Observe that the value of the support functional evaluated for a given functional was just the given functional evaluated at some point on the boundary of the convex set. Therefore, intuitively, it seems that we can reconstruct the convex set from the support functional. We can see this in two ways. First, the boundary of the convex set can be given by a contour of the support functional, as shown in Figure 1.3.2. Equivalently, the boundary of the convex set is the envelope of all functionals for which the support functional returns some constant value, as shown in Figure 1.3.2.
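The formula c(m, n) = √(m² + n²) can be checked numerically: the support functional of the unit disk is the supremum of mx + ny over the boundary circle (a sketch, not from the notes):

```python
import math

def support(m, n, samples=100000):
    # sup of m*cos(t) + n*sin(t) over a fine grid of boundary points
    return max(m * math.cos(2 * math.pi * i / samples) +
               n * math.sin(2 * math.pi * i / samples)
               for i in range(samples))

m, n = 3.0, 4.0
print(support(m, n), math.hypot(m, n))  # both are close to 5
```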

1.4 Finite- vs. Infinite-Dimensional Vector Spaces

Example 1.2.3 illustrated some of the added complexities of moving from finite- to infinite-dimensional vector spaces. The following examples further explore the differences between solving optimization problems in finite- versus infinite-dimensional vector spaces.

Example 1.4.1 (infinite-dimensional; equivalent to [1, Example 8.7.3, pp. 231-234]). You are playing a real-time strategy computer game in which you have to build up a civilization from scratch. The first phase of the game, the Age of Agriculture, lasts from time t = 0 to time t = T. In this phase, you have farms that produce at a rate f(t). Your farms disappear when you move to the next age, the Age of Mining. What makes this game interesting is that you can allocate production between reinvestment r(t) and storage s(t). Your farm production increases at a rate proportional to your reinvestment:

    f′(t) = k r(t),  k > 0,  f′(t) ≡ df/dt.    (1.4.1)

What system of allocation maximizes the amount of food available at time T?

Solution: Note that if we set s(t) ≡ 0, then r(t) = f(t) and f′(t) = k r(t) = k f(t), so f(t) = f(0) exp(kt). In any case, we can solve (1.4.1) with the boundary condition f(0) = f₀:

    f(t) = ∫₀ᵗ k r(τ) dτ + f₀.

Then the total amount of food stored by time T is

    ∫₀ᵀ s(t) dt = ∫₀ᵀ [f(t) − r(t)] dt = ∫₀ᵀ [∫₀ᵗ k r(τ) dτ − r(t)] dt + f₀T.

Reinvestment and storage must be nonnegative, so we have the constraints

    0 ≤ r(t) ≤ f(t),  i.e.,  0 ≤ r(t) ≤ ∫₀ᵗ k r(τ) dτ + f₀.

If functions r₁(t) and r₂(t) satisfy these inequalities, so does λr₁(t) + (1 − λ)r₂(t), where 0 ≤ λ ≤ 1. The set of acceptable solutions to the problem is a CONVEX SET in an INFINITE-DIMENSIONAL vector space and the quantity to be maximized is a LINEAR FUNCTIONAL on this space.

We guess that the optimal strategy is to reinvest everything during the interval [0, t*] and then store everything during the interval [t*, T]:

• On [0, t*]: r = f, so f′ = kr = kf, f(τ) = f₀ exp(kτ), and s = 0.

• On [t*, T]: r = 0 and s = f, so f′ = kr = 0, f(τ) = f₀ exp(kt*), and s(τ) = f₀ exp(kt*).

Then we want to maximize

    ∫₀ᵀ s(τ) dτ = (T − t*) f₀ exp(kt*).

Differentiating with respect to t* gives

    (T − t*) k f₀ exp(kt*) − f₀ exp(kt*) = 0  ⟹  t* = T − 1/k.

This solution holds if T − 1/k ≥ 0 (i.e., if k ≥ 1/T); if k < 1/T, then the derivative of the objective function with respect to t* is always negative and we are forced into a corner solution in which we just set t* = 0.
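The switching time t* = T − 1/k can be checked numerically against a grid search (illustrative parameter values, not from the notes):

```python
import math

T, k, f0 = 10.0, 0.5, 1.0  # assumed illustrative parameters

def total_stored(t_star):
    # reinvest on [0, t*], store on [t*, T]:
    # total food stored is (T - t*) * f0 * exp(k * t*)
    return (T - t_star) * f0 * math.exp(k * t_star)

grid = [i * T / 10000 for i in range(10001)]
t_best = max(grid, key=total_stored)
print(t_best, T - 1 / k)  # the grid maximizer sits at T - 1/k = 8
```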

Example 1.4.2 (finite-dimensional). You are growing a new genetically engineered crop as part of a 2-year biofuels experiment. Your contract requires you to deliver 5 tons at the end of each year. The government pays all costs except the cost of fencing your plot of land, so the cost of producing x tons can be modeled as c(x) = 6√x. The inventory cost of storing x tons of excess crop is hx.

    Figure 1.4.1: possible production schemes for Example 1.4.2

Solution: Let yᵢ be the number of tons grown in year i for i = 1, 2. We have the constraints y₁ ≥ 5 and y₁ + y₂ ≥ 10, since we must grow at least 5 tons in order to be able to deliver the required amount by the end of year 1, and we must grow at least 10 tons in order to be able to deliver the required amount by the end of year 2. The region corresponding to possible production schemes is shown in Figure 1.4.1. Clearly, our cost function 6√y₁ + 6√y₂ + h(y₁ − 5) is strictly increasing in both y₁ and y₂ within the feasible region, so the minimum cost will be found somewhere along the line y₁ + y₂ = 10 between y₁ = 5 and y₁ = 10. We substitute y₂ = 10 − y₁ into the cost function: now we want to minimize

    6√y₁ + 6√(10 − y₁) + h(y₁ − 5)

with respect to y₁. The derivative with respect to y₁ is

    3/√y₁ − 3/√(10 − y₁) + h = 3 (√(10 − y₁) − √y₁)/√(y₁(10 − y₁)) + h,

which equals h > 0 at y₁ = 5 and, provided the inventory cost h is large enough, stays positive on (5, 10). In that case we reach a corner solution by setting (y₁, y₂) = (5, 5), at which the minimum cost is 12√5 ≈ 26.83.
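The corner solution can be confirmed numerically along the line y₁ + y₂ = 10. The inventory cost below (h = 2) is an assumed illustrative value, not from the notes; for very small h the balance can tip away from the corner:

```python
import math

h = 2.0  # assumed inventory cost per ton (illustrative)

def cost(y1):
    y2 = 10.0 - y1
    return 6 * math.sqrt(y1) + 6 * math.sqrt(y2) + h * (y1 - 5.0)

grid = [5.0 + i * 5.0 / 10000 for i in range(10001)]
y_best = min(grid, key=cost)
print(y_best, cost(y_best))  # corner solution y1 = 5, cost 12*sqrt(5)
```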

Example 1.4.3 (infinite-dimensional; equivalent to [1, Example 1.2.2, p. 3]). Your new contract requires you to deliver at a known rate d(t) during the interval t ∈ [0, T]. You produce at a rate r(t). Your rate of production cost is c(r(t)) (if all you have to pay for is maintaining the fence, this might equal √(r(t))). If you have x(t) tons on hand, you pay inventory costs at a rate h x(t). You start with an inventory x₀.

Solution: First, note that x′(t) = r(t) − d(t). Both inventory and production must be nonnegative, so we have the constraints

    x(t) = x₀ + ∫₀ᵗ [r(τ) − d(τ)] dτ ≥ 0,
    r(t) ≥ 0.

We are trying to minimize total costs:

    ∫₀ᵀ [c(r(t)) + h x(t)] dt.

The optimal solution r(t) lies in an infinite-dimensional vector space. Note that r(t) is not necessarily continuous: for example, we might want to produce some constant positive quantity r(τ) = r̄ for τ ∈ [0, t̄) and then produce r(τ) = 0 for τ ∈ [t̄, T]. r(t) is not necessarily bounded either: if h is very small, then the optimal solution will be to produce ∫₀ᵀ d(t) dt, the total amount required over the interval [0, T], as rapidly as possible.

    See [1, Example 8.7.4, p. 234] for a related problem.

Example 1.4.4 (finite-dimensional). You are operating a simple frictionless rocket-propelled car. You expend fuel instantaneously to increase your speed to v miles/second, coast 1 mile, and use your brakes to stop instantaneously. You then expend fuel to increase your speed to w miles/second, coast 4 miles, and stop instantaneously. The challenge is to minimize the travel time. The constraint imposed by your fuel tank is that v + w = 3.


Solution: The total time spent is 1/v + 4/w. We use a Lagrange multiplier to solve this problem:

    L(v, w, λ) = 1/v + 4/w + λ(v + w − 3)

    ∂L/∂v = −1/v² + λ = 0

    ∂L/∂w = −4/w² + λ = 0

Combining the two partial derivatives gives 4v² = w². Combining this with the constraint gives 4v² = (3 − v)², so v = 1 and w = 2 (we discard the solution v = −3 since v, w > 0, as we are discussing speed, not velocity). The travel time is 1/1 + 4/2 = 3 seconds.

If this were a physics course, we might make this problem more realistic by assuming some nonzero coefficient of friction between the car and the surface on which it travels and some function that gives the rate of expenditure of fuel, among many other things.
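The stationary point v = 1, w = 2 can be confirmed by a one-dimensional search along the constraint v + w = 3 (a sketch, not from the notes):

```python
def travel_time(v):
    w = 3.0 - v
    return 1.0 / v + 4.0 / w

# search v over the open interval (0, 3)
grid = [i / 10000 for i in range(1, 30000)]
v_best = min(grid, key=travel_time)
print(v_best, 3.0 - v_best)  # the minimizer is v = 1, w = 2
```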

Example 1.4.5 (infinite-dimensional; [1, Example 1.2.5, p. 4]). Now your rocket-propelled vehicle goes straight up and is subject to gravity. You expend fuel at a rate u(t). Your goal is for the vehicle to reach height h̄ at time T while expending minimum fuel.

Solution (full solution in [1, Example 5.9.3, p. 125]): Assuming unit mass, massless fuel, and the absence of aerodynamic forces, the motion of the rocket is governed by the second-order differential equation h″(t) = u(t) − g. We cannot expend negative fuel and we want to reach height h̄ at time T, so we have the constraints

    u(t) ≥ 0,
    h(T) = ∫₀ᵀ [∫₀ᵗ (u(τ) − g) dτ] dt = h̄,

where we assume in the second constraint that h(0) = h′(0) = 0; that is, we start at ground level with zero velocity. We want to minimize the total fuel expended:

    ∫₀ᵀ u(t) dt.

The optimal solution u(t) is not necessarily continuous or bounded: for example, it might (and actually does) consist of an impulse at time t = 0. In that case, in order to work with a function that is at least bounded, it might be a better idea to work with v(t) = ∫₀ᵗ u(τ) dτ, the total amount of fuel expended during the interval [0, t]. This insight is closely related to the RIESZ REPRESENTATION THEOREM.

Example 1.4.6 (finite-dimensional). Your job is to invent the most exciting scratch ticket. If an outcome has probability p of occurring, the excitement that results when that outcome occurs is proportional to −log p. For example, the excitement of rolling 6s on two dice, −log(1/36) = 2 · (−log(1/6)), is precisely twice the excitement of rolling 6 on one die. The ticket can pay $k for k = 0, 1, 2. You must assign probabilities p_k for k = 0, 1, 2 to these three outcomes. So that the state can make a nice profit, the expected payoff must be 4/7 of a dollar.


Solution: Our two constraints are

    Σ_{k=0}^{2} p_k = 1,    Σ_{k=0}^{2} k p_k = 4/7.

We want to maximize the quantity

    H(X) = −Σ_{k=0}^{2} p_k log p_k.    (1.4.2)

(H(X), the excitement function, is the information entropy or Shannon entropy of the random variable X, the payoff of the lottery ticket.) We can substitute the constraints into (1.4.2) to turn this into a single-variable problem, where we maximize

    −(3/7 + p₂) log(3/7 + p₂) − (4/7 − 2p₂) log(4/7 − 2p₂) − p₂ log p₂

over p₂. However, this would be messy, so we just use Lagrange multipliers:

    L(p₀, p₁, p₂, λ₁, λ₂) = −Σ_{k=0}^{2} p_k log p_k − λ₁ (Σ_{k=0}^{2} p_k − 1) − λ₂ (Σ_{k=0}^{2} k p_k − 4/7)

    ∂L/∂p_k = −1 − log p_k − λ₁ − kλ₂ = 0,  k = 0, 1, 2.

Therefore, p_k = e^{−1−λ₁} e^{−λ₂ k}. Some algebra tells us λ₁ = log(7/4) − 1 and λ₂ = log 2, so p_k = (4/7) · 2^{−k}; that is, (p₀, p₁, p₂) = (4/7, 2/7, 1/7).
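The maximum-entropy distribution can be verified via the single-variable substitution mentioned above, scanning over p₂ (a numerical sketch, not from the notes):

```python
import math

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

best = (-1.0, None)
for i in range(1, 2000):
    p2 = (2.0 / 7.0) * i / 2000   # p1 = 4/7 - 2*p2 must stay >= 0
    p1 = 4.0 / 7.0 - 2.0 * p2
    p0 = 3.0 / 7.0 + p2           # from p0 + p1 + p2 = 1
    H = entropy([p0, p1, p2])
    if H > best[0]:
        best = (H, (p0, p1, p2))
print(best[1])  # close to (4/7, 2/7, 1/7)
```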

Example 1.4.7 (infinite-dimensional). Your job is again to invent the most exciting scratch ticket. The ticket can have any nonnegative integer payoff. You need to assign probabilities p_k for k = 0, 1, 2, ... to each of these possible outcomes. Make the expected payoff be $1, since the tickets will sell for $1.50.

Solution: Now we are trying to maximize over an infinite number of variables. Obviously we cannot use substitution as we could in the previous example, since we still have an infinite number of variables after taking into account the two constraints

    Σ_{k≥0} p_k = 1,    Σ_{k≥0} k p_k = 1.

There are two LINEAR FUNCTIONALS in the dual space acting as constraints, so the CODIMENSION of this problem is 2.

We can still solve the problem using Lagrange multipliers:

    L(p₀, p₁, ..., λ₁, λ₂) = −Σ_{k≥0} p_k log p_k − λ₁ (Σ_{k≥0} p_k − 1) − λ₂ (Σ_{k≥0} k p_k − 1)

    ∂L/∂p_k = −1 − log p_k − λ₁ − kλ₂ = 0,  k ≥ 0.

Therefore, p_k = e^{−1−λ₁} e^{−λ₂ k}. We recognize this as having the form of a geometric distribution p_k = p(1 − p)^k (note this is a geometric distribution with support {0, 1, 2, ...}, not {1, 2, ...}). Since the mean of this geometric distribution is (1 − p)/p, we have p = 1/2, so λ₁ = log 2 − 1, λ₂ = log 2, and p_k = 1/2^{k+1}.
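A numerical sanity check: the distribution p_k = 1/2^{k+1} has total mass 1 and mean 1, and beats another mean-1 candidate such as Poisson(1) on entropy (a sketch with a finite truncation, not from the notes):

```python
import math

def entropy(ps):
    return -sum(p * math.log(p) for p in ps if p > 0)

N = 200  # truncation point; the discarded tails are negligible
geometric = [2.0 ** -(k + 1) for k in range(N)]

poisson = []                 # Poisson(1): p_k = e^(-1) / k!
p = math.exp(-1.0)
for k in range(N):
    poisson.append(p)
    p /= (k + 1)             # build k! incrementally to avoid overflow

mass = sum(geometric)
mean = sum(k * pk for k, pk in enumerate(geometric))
print(mass, mean)            # both are (essentially) 1
print(entropy(geometric), entropy(poisson))
```

The geometric entropy works out to 2 log 2 ≈ 1.386, above the Poisson(1) value, consistent with the Lagrange-multiplier solution.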

    1.5 Minimum Norm Problems

    Minimum norm problems arise in several instances, as shown by the following examples.

Example 1.5.1. You are visiting a friend who lives at the point (x, y) = (2, 1). A bus drives through town along the main highway, a subspace whose equation is x + 2y = 0.

• Where do you get off the bus in order to minimize your walking distance to the friend's house?

• Suppose you take a taxicab, which can travel only north-south or east-west. The cab driver charges for the driving distance. Where do you get out of the cab in order to minimize your cost?

• What if the driver only charges for the larger dimension?

    Figure 1.5.1: minimizing the distance between the highway and the house for the various norms in Ex-ample 1.5.1

Solution: In the first problem, we are minimizing the 2-NORM or EUCLIDEAN NORM. We can solve the problem geometrically by expanding a circle centered at the house until it is tangent to the road, as shown in Figure 1.5.1. The point of tangency is the solution to the system

    x + 2y = 0    (highway)
    y − 1 = 2(x − 2)    (normal to the highway through the point (2, 1)),

or (x, y) = (6/5, −3/5). The minimum cost (assuming unit cost) is then √((4/5)² + (8/5)²) = 4/√5. Note that we were implicitly invoking the PROJECTION THEOREM ([1], Theorem 2, p. 51) in knowing that we minimize the distance by finding the point on the highway through which the normal through (2, 1) passes.


In the second problem, we are minimizing the 1-norm or TAXICAB NORM. We can solve the problem geometrically by expanding a diamond centered at the house until it touches the road, as shown in Figure 1.5.1. The point where the diamond touches the road is the solution to the system x + 2y = 0, x = 2, or (x, y) = (2, −1). The minimum cost is then |2 − 2| + |1 − (−1)| = 2. In this case, the projection theorem with the ordinary concept of perpendicular does not apply, but we could have used [1, Theorem 5.8.1, p. 119].

In the third problem, we are minimizing a type of MINKOWSKI FUNCTIONAL (see [1, p. 131]). We can solve the problem geometrically by expanding a square around the house until it touches the road, as we did before. The factor by which you must expand the unit square in order to make it touch the road, or 4/3, is the minimum cost.
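All three answers can be checked by minimizing each norm of (2, 1) − (x, y) over points (x, y) = (−2t, t) on the highway (a numerical sketch, not from the notes):

```python
import math

def min_cost(norm):
    # points on the highway x + 2y = 0 are (-2t, t)
    return min(norm(2.0 - (-2.0 * t), 1.0 - t)
               for t in (-3.0 + i * 6.0 / 60000 for i in range(60001)))

euclid = lambda dx, dy: math.hypot(dx, dy)
taxicab = lambda dx, dy: abs(dx) + abs(dy)
sup_norm = lambda dx, dy: max(abs(dx), abs(dy))

print(min_cost(euclid), 4 / math.sqrt(5))  # 2-norm cost
print(min_cost(taxicab))                   # 1-norm cost: 2
print(min_cost(sup_norm))                  # sup-norm cost: 4/3
```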

Example 1.5.2 (minimum norms and convex sets). Assume that the Island of Sodor is a convex set (it actually is not; see Figure 1.5.2) whose shore is a smooth curve. Reverend Awdry is offshore and wants to swim to the closest point on the island.

Figure 1.5.2: The Island of Sodor is actually not a convex set (Example 1.5.2)!

Solution: We solve the problem geometrically by expanding a circle centered at Reverend Awdry's location until it just touches the island. Let the radius of the circle be d and the point of tangency between the circle and the convex set be p_t.

Consider any other tangent to the circle, where the point of tangency is very close to p_t: this tangent will cut across the interior of the island, for otherwise either it would not be a tangent or the set would not be convex. Therefore, d is the minimum distance from Reverend Awdry's location to the shore. Now consider all hyperplanes (lines) that separate Reverend Awdry from every point on the island. The reverend is farthest away from the hyperplane tangent to the island that we just constructed.

This is a DUALITY theorem: the distance d solves a minimization problem over points on the island and also solves a maximization problem over elements of the dual space. The theorem is proved in [1] (Theorem 1, p. 136) with no restrictions except convexity. That is:

• The shore of the island need not be smooth.


• The vector space can be infinite-dimensional.

• The norm need not be the Euclidean norm.

An important class of minimum norm problems is finding a polynomial that best approximates a function on some interval. Before taking a brief look at approximation problems, we first define a LINEAR FUNCTIONAL:

Definition 1.5.3 (linear functional). Let V be a vector space over a field F. A linear functional is a map f: V → F such that f(v + w) = f(v) + f(w) for all v, w ∈ V and f(av) = a f(v) for all v ∈ V and all a ∈ F.

Example 1.5.4. Consider the infinite-dimensional vector space V of all continuous functions f on [−1, 1]. Let W ⊂ V be the 4-dimensional subspace of polynomials p for which deg p ≤ 3. Which of the following are linear functionals?

• L: f(x) ↦ f(−1)

• L: f(x) ↦ f(0)

• L: f(x) ↦ f(1)

• L: f(x) ↦ ∫_{−1}^{1} f(x) dx

Solution: The first three are linear functionals: for evaluation at the point x₀ ∈ {−1, 0, 1},

    L(f + g) = (f + g)(x₀) = f(x₀) + g(x₀) = L(f) + L(g),
    L(af) = (af)(x₀) = a f(x₀) = a L(f).

The last is also a linear functional, since

    L(f + g) = ∫_{−1}^{1} [f(x) + g(x)] dx = ∫_{−1}^{1} f(x) dx + ∫_{−1}^{1} g(x) dx = L(f) + L(g),
    L(af) = ∫_{−1}^{1} a f(x) dx = a ∫_{−1}^{1} f(x) dx = a L(f).

Note that these four functionals acting on V are clearly linearly independent, as we cannot write one as a linear combination of the others valid for every continuous function f on [−1, 1]. If we restrict ourselves to W and write f(x) = Σ_{k=0}^{3} a_k x^k, then

    f(−1) = a₀ − a₁ + a₂ − a₃
    f(0) = a₀
    f(1) = a₀ + a₁ + a₂ + a₃
    ∫_{−1}^{1} f(x) dx = [a₀x + (a₁/2)x² + (a₂/3)x³ + (a₃/4)x⁴]_{−1}^{1} = 2a₀ + (2/3)a₂.


Since

    det [ 1  −1   1   −1
          1   0   0    0
          1   1   1    1
          2   0  2/3   0 ] = 0,

the four functionals restricted to W are linearly dependent. More elegantly, we might have remembered that Simpson's rule is exact for polynomials p with deg p ≤ 3, so we can write

    ∫_{−1}^{1} f(x) dx = (1/3) [f(−1) + 4 f(0) + f(1)].
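The Simpson identity for cubics is easy to confirm directly (a sketch, not from the notes):

```python
def simpson(f):
    # Simpson's rule on [-1, 1]: (1/3) (f(-1) + 4 f(0) + f(1))
    return (f(-1) + 4 * f(0) + f(1)) / 3

def exact_integral(a0, a1, a2, a3):
    # integral of a0 + a1 x + a2 x^2 + a3 x^3 over [-1, 1]
    return 2 * a0 + 2 * a2 / 3

coeffs = (1.0, -2.0, 3.0, 0.5)  # an arbitrary cubic
p = lambda x: sum(c * x ** k for k, c in enumerate(coeffs))
print(simpson(p), exact_integral(*coeffs))  # both equal 4
```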

Example 1.5.5. You have lost your scientific calculator and can only evaluate polynomials p(x). You need to compute approximate values of f(x) = sin x for randomly chosen values x ∈ [0, 1]. Here are three norms that might be used to choose the best approximating polynomial:

• L¹-norm: ∫₀¹ |f(x) − p(x)| dx

• L²-norm: (∫₀¹ |f(x) − p(x)|² dx)^{1/2}

• L∞-norm: max_{0 ≤ x ≤ 1} |f(x) − p(x)|

For which of these norms is the Taylor polynomial the right choice for p(x)? Under what circumstances might one or another of these norms be the appropriate choice?

Solution: The Taylor polynomial is actually not the correct choice for any of these norms. It turns out that interpolating polynomials arise in minimizing the L1-norm, Fourier series in minimizing the L2-norm, and CHEBYSHEV POLYNOMIALS in minimizing the L∞-norm. See [3] for a complete textbook on approximation theory.

The following theorem provides reassurance that under reasonably broad conditions, functions can be approximated by polynomials to arbitrary accuracy.

THEOREM 1.5.6 (Weierstraß Approximation Theorem). If f is a continuous real-valued function on [a, b], then for any ε > 0 there exists a polynomial p on [a, b] such that |f(x) − p(x)| < ε for all x ∈ [a, b].

Proof: Define the n + 1 Bernstein basis polynomials of degree n as

b_{k,n}(x) = C(n, k) x^k (1 − x)^{n−k},   k = 0, …, n,

where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient. Then a linear combination of Bernstein basis polynomials

B(x) = Σ_{k=0}^{n} β_k b_{k,n}(x)

is called a Bernstein polynomial, and β₀, …, β_n are called the Bernstein or Bézier coefficients. The proof of the Weierstraß approximation theorem is constructive: we will construct a sequence of Bernstein polynomials that converges uniformly to f.


Define the Bernstein polynomial

B_n(f)(x) = Σ_{k=0}^{n} f(k/n) C(n, k) x^k (1 − x)^{n−k},   x ∈ [0, 1].

f is continuous and [0, 1] is compact, so f is uniformly continuous on [0, 1] ([2, Theorem 4.19]). Therefore, given any ε > 0, there exists a δ > 0 such that |x − y| < δ implies |f(x) − f(y)| < ε/2. Because f is continuous and [0, 1] is compact, we can also conclude that f is bounded ([2, Theorem 4.15]); say sup_{x∈[0,1]} |f(x)| = M. Now, since the weights C(n, k) x^k (1 − x)^{n−k} sum to 1,

f(x) − B_n(f)(x) = Σ_{k=0}^{n} (f(x) − f(k/n)) C(n, k) x^k (1 − x)^{n−k},

so

|f(x) − B_n(f)(x)| ≤ Σ_{k=0}^{n} |f(x) − f(k/n)| C(n, k) x^k (1 − x)^{n−k}
  = Σ_{k∈S} |f(x) − f(k/n)| C(n, k) x^k (1 − x)^{n−k}   (call this A)
  + Σ_{k∈T} |f(x) − f(k/n)| C(n, k) x^k (1 − x)^{n−k}   (call this B),

where S = {j ∈ [0, n] : |x − j/n| > n^{−1/4}} and T = {j ∈ [0, n] : |x − j/n| ≤ n^{−1/4}} = [0, n] \ S. Let X ∼ Binom(n, x), so that P(X = k) = C(n, k) x^k (1 − x)^{n−k} and σ² = var(X) = nx(1 − x) ≤ n/4. By Chebyshev's inequality,

A ≤ 2M · P(|X/n − x| > n^{−1/4}) ≤ 2M · σ²/n^{3/2} = 2M · x(1 − x)/√n ≤ 2M · (1/4)/√n = M/(2√n).

Equivalently, A ≤ ε/2 if n ≥ M²/ε². Furthermore, if |x − k/n| ≤ n^{−1/4} and n^{−1/4} < δ, then |x − k/n| < δ and |f(x) − f(k/n)| < ε/2 since f is uniformly continuous on [0, 1]. Therefore, B is trivially less than ε/2. We conclude that if n > max(M²/ε², δ^{−4}), then |f(x) − B_n(f)(x)| ≤ A + B < ε for all x ∈ [0, 1]; that is, the Bernstein polynomials B_n(f) converge uniformly to f.
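The constructive proof is easy to try out numerically. The sketch below is my own illustration (the test function sin(πx) and the helper names are my choices, not from the text): it evaluates B_n(f) directly from the definition and checks that the sup-norm error shrinks as n grows.

```python
import math

def bernstein(f, n, x):
    """Evaluate the Bernstein polynomial B_n(f) at x in [0, 1]."""
    return sum(f(k / n) * math.comb(n, k) * x**k * (1 - x)**(n - k)
               for k in range(n + 1))

f = lambda x: math.sin(math.pi * x)   # a continuous function on [0, 1]
grid = [i / 200 for i in range(201)]  # sample points for estimating the sup-norm

def sup_error(n):
    return max(abs(f(x) - bernstein(f, n, x)) for x in grid)

# The sup-norm error decreases as n grows; convergence is uniform but slow.
assert sup_error(64) < sup_error(8) < sup_error(2)
```

The slowness is a known feature of Bernstein polynomials: they trade approximation speed for very well-behaved, shape-preserving convergence.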

Example 1.5.7 (Gibbs Phenomenon). Find a discontinuous real-valued function for which such a polynomial does not exist.

Solution (based on [4]): Consider the function f(x) = sgn(sin x), where

sgn(x) = 1 if x > 0,  0 if x = 0,  −1 if x < 0.

Clearly f is discontinuous at x = kπ for k ∈ ℤ. Consider the partial Fourier series

S_N(f)(x) = (4/π) Σ_{k=1}^{N} sin((2k − 1)x)/(2k − 1).


Figure 1.5.3: Notice how lim_{N→∞} S_N(f)(x) overshoots f(x) near x = 0. This discrepancy is known as Gibbs phenomenon or ringing artifacts (Example 1.5.7).

By the Weierstraß approximation theorem, we have lim_{N→∞} |f(x) − S_N(f)(x)| = 0 for all x in some closed interval not containing a multiple of π.

However, we notice in Figure 1.5.3 that S_N(f)(x) exhibits a bump whenever x ≈ kπ. In particular, S_N(f)(x) reaches a critical point whenever

S_N(f)′(x) = (4/π) Σ_{k=1}^{N} cos((2k − 1)x) = 0.

To find the zeros of S_N(f)′(x), we first show that

2 sin x Σ_{k=1}^{N} cos((2k − 1)x) = sin(2Nx).   (1.5.1)

The proof is by induction. For N = 1, Equation (1.5.1) reduces to the identity sin 2x = 2 sin x cos x. Now assume Equation (1.5.1) is true for general N and show that this implies it is true for N + 1. We have

2 sin x Σ_{k=1}^{N+1} cos((2k − 1)x) = 2 sin x [Σ_{k=1}^{N} cos((2k − 1)x) + cos((2N + 1)x)]
  (a)= sin(2Nx) + 2 sin x cos((2N + 1)x)
  = sin(2Nx) + 2 sin x [cos(2Nx) cos x − sin(2Nx) sin x]
  (b)= (1 − 2 sin² x) sin(2Nx) + sin 2x cos(2Nx)
  (c)= cos 2x sin(2Nx) + sin 2x cos(2Nx)
  (d)= sin(2(N + 1)x),

where (a) follows from the inductive step, (b) follows from the identity sin 2x = 2 sin x cos x, (c) follows from the identity cos 2x = cos² x − sin² x = (1 − sin² x) − sin² x = 1 − 2 sin² x, and (d) follows from the identity sin(a + b) = cos a sin b + sin a cos b.

Using this fact, we see that S_N(f) reaches a critical point whenever sin x · S_N(f)′(x) = (2/π) sin(2Nx) = 0, or whenever x is a multiple of π/(2N). Consider the closest critical points to x = 0: x = ±π/(2N). Then we


have

S_N(f)(π/(2N)) = (4/π) Σ_{k=1}^{N} sin((2k − 1)π/(2N))/(2k − 1) = (2/π) · (π/N) Σ_{k=1}^{N} [sin((2k − 1)π/(2N))] / [(2k − 1)π/(2N)].

The last sum is a Riemann sum of (sin x)/x taken over the midpoints of the partition {[2kπ/(2N), 2(k + 1)π/(2N)], k = 0, …, N − 1} of [0, π]. Therefore,

lim_{N→∞} S_N(f)(π/(2N)) = (2/π) ∫₀^π (sin x)/x dx ≈ 1.1790 ≠ 1 = lim_{x→0⁺} f(x).

A similar analysis shows lim_{N→∞} S_N(f)(−π/(2N)) ≈ −1.1790 ≠ −1 = lim_{x→0⁻} f(x). Therefore, the sequence of polynomials S_N(f) does not converge uniformly to f. However, we do see that the sequence S_N(f) does converge to f in the L1- and L2-norms, though it does not in the L∞-norm.
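The overshoot constant can be checked directly. This sketch is my own (not part of the text): it evaluates the partial Fourier sum at its first critical point x = π/(2N) and confirms the limit ≈ 1.1790 rather than 1.

```python
import math

def S(N, x):
    """Partial Fourier series S_N(f)(x) of the square wave sgn(sin x)."""
    return (4 / math.pi) * sum(math.sin((2 * k - 1) * x) / (2 * k - 1)
                               for k in range(1, N + 1))

# The value at the first critical point x = pi/(2N) tends to
# (2/pi) * integral_0^pi (sin x)/x dx, about 1.1790, rather than to 1.
for N in (100, 1000):
    peak = S(N, math.pi / (2 * N))
    assert abs(peak - 1.1790) < 1e-3
```

The bump never shrinks as N grows; it only narrows, which is why convergence fails in the sup-norm but survives in the integral norms.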


CHAPTER 2

PRELIMINARIES IN ALGEBRA, TOPOLOGY, AND ANALYSIS

Reading: [1, Sections 2.1–2.9]. Also see [5, Chapters 1–2] for more on the linear algebra covered in this chapter, [6, Chapter 2] for more on the topology, and [2, Chapters 2 and 4] for more on the topology and analysis.

    2.1 Vector Spaces

Definition 2.1.1 (vector space). A vector space V over a field F is a set V along with two operations: addition, which associates with any two vectors u, v ∈ V a vector u + v ∈ V, and scalar multiplication, which associates with any vector v ∈ V and any scalar a ∈ F a vector av ∈ V. The following axioms hold:

1. commutative law for vector addition: u + v = v + u for all u, v ∈ V.

2. associative law for vector addition: (u + v) + w = u + (v + w) for all u, v, w ∈ V.

3. existence of an additive identity: There exists a null vector 0 ∈ V such that v + 0 = v for all v ∈ V.

4. distributive law for vector addition: a(u + v) = au + av for all u, v ∈ V and all a ∈ F.

5. distributive law for scalar addition: (a + b)v = av + bv for all v ∈ V and all a, b ∈ F.

6. associative law for scalar multiplication: (ab)v = a(bv) for all v ∈ V and all a, b ∈ F.

7. existence of a multiplicative identity: 1v = v for all v ∈ V. Also, 0v = 0 for all v ∈ V.

Unless specified otherwise, we will assume we are working with real vector spaces (F = ℝ).

Example 2.1.2. Note that [1] replaces a standard axiom, the existence of an additive inverse, with the axiom 0v = 0 for all v ∈ V. Prove the existence of an additive inverse from [1]'s axioms.

Solution: We have

0 (a)= 0v = (1 + (−1))v (b)= 1v + (−1)v (c)= v + (−1)v,

where (a) follows from [1]'s axiom 0v = 0 for all v ∈ V, (b) follows from the distributive law for scalar addition, and (c) follows from the definition of a multiplicative identity. Therefore, we define the additive inverse −v = (−1)v.


Example 2.1.3. Which axiom on the list can be proved from the others?

Solution: We can prove the commutative law from the other axioms. We have

0 (a)= 0(u + v) = (1 + (−1))(u + v) (b)= 1(u + v) + (−1)(u + v) (c)= 1u + 1v + (−1)u + (−1)v (d)= u + v + (−u) + (−v),

where (a) follows from the fact that 0v = 0, (b) follows from the distributive law for scalar addition, (c) follows from the distributive law for vector addition, and (d) follows from the existence of additive inverses, as shown above, and the definition of the multiplicative identity. (Note that (−v) + v = 0 as well: apply the computation of Example 2.1.2 to −v and use −(−v) = (−1)(−1)v = 1v = v.) Then we add v on the right of both sides to get v = 0 + v = u + v + (−u) + ((−v) + v) = u + v + (−u); adding u on the right then gives v + u = u + v + ((−u) + u) = u + v.

How do we know that we cannot prove another axiom from the remaining ones? We can try defining vector addition and scalar multiplication in different ways and seeing if the resulting operations still obey the axioms for a vector space. If not, then the axioms that are not satisfied are independent of the axioms that are.

Example 2.1.4. Let V = ℝ². Define addition as usual, but define scalar multiplication by a(x, y) = (ax, 0) for (x, y) ∈ V and a ∈ ℝ. Is V a vector space? If not, what axiom is not satisfied?

Solution: A multiplicative identity does not exist in this space since there exists no a such that a(x, y) = (x, y) for all (x, y); in particular, 1(x, y) = (x, 0) ≠ (x, y) whenever y ≠ 0. Therefore, the existence of a multiplicative identity is independent of the other axioms.
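The broken operation can be checked mechanically. This is a small sketch of mine (the function name is arbitrary): it confirms that the modified scalar multiplication still satisfies, e.g., the associative law, while no scalar acts as a multiplicative identity.

```python
def scalar_mul(a, v):
    """The nonstandard scalar multiplication a * (x, y) = (a*x, 0)."""
    x, y = v
    return (a * x, 0)

v = (3.0, 4.0)

# The associative law (ab)v = a(bv) still holds for this operation...
assert scalar_mul(2, scalar_mul(5, v)) == scalar_mul(10, v)
# ...but no scalar acts as a multiplicative identity on v = (3, 4):
assert all(scalar_mul(a, v) != v for a in (1, 2, -1, 0.5))
```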

Example 2.1.5. For the NSA budget, let V = ℝ ∪ {s}. The symbol s denotes a secret amount. For elements in V other than s, addition and multiplication are defined as usual. However, s + v = v + s = s for all v ∈ V and as = s for all a ∈ ℝ. Is V a vector space? If not, what axiom is not satisfied?

Solution: No, because the axiom 0v = 0 is not satisfied: clearly 0s = s ≠ 0.

Below are some potential vector spaces:

• All 2 × 3 matrices with real entries: equivalent to ℝ⁶, a 6-dimensional vector space

• All infinite sequences: infinite-dimensional vector space

• All bounded infinite sequences: infinite-dimensional vector space (since the sum of two bounded sequences is bounded and a bounded sequence times a constant is still bounded)

• All infinite sequences that converge to zero: infinite-dimensional vector space


• All infinite sequences with only finitely many nonzero terms: infinite-dimensional vector space

• All infinite sequences for which the terms form a convergent series: infinite-dimensional vector space

• All functions f : {A, B, C} → ℝ: 3-dimensional vector space

• All polynomials of any (finite) degree: infinite-dimensional vector space

• All polynomials p with deg p = 3: not a vector space (consider f(x) = x³ and g(x) = −x³: deg(f + g) ≠ 3)

• All power series: infinite-dimensional vector space

• All continuous functions on [0, 1]: infinite-dimensional vector space

• All linear functions f : ℝ³ → ℝ: 3-dimensional vector space (since f is linear, it is determined by the values f(1, 0, 0), f(0, 1, 0), f(0, 0, 1))

Note that we nested SUBSPACES inside each other:

Definition 2.1.6 (vector subspace). A nonempty subset U of a vector space V over F is called a subspace of V if for any u, v ∈ U and a, b ∈ F, we have au + bv ∈ U. That is, U is a subspace of V if U is closed under vector addition and scalar multiplication.

A subspace must be nonempty, so it must contain at least one element u; closure under scalar multiplication then gives 0 = 0u ∈ U. Therefore, every subspace contains 0.

Proposition 2.1.7 ([1, Proposition 2.3.1, p. 15]). If U₁ and U₂ are subspaces of V, so is U₁ ∩ U₂.

Proof: 0 ∈ U₁ and 0 ∈ U₂ since U₁ and U₂ are subspaces, so 0 ∈ U₁ ∩ U₂. Therefore, U₁ ∩ U₂ is nonempty. If u, v ∈ U₁ ∩ U₂, then u, v ∈ U₁ and u, v ∈ U₂. Then au + bv ∈ U₁ and au + bv ∈ U₂ since U₁ and U₂ are subspaces, so au + bv ∈ U₁ ∩ U₂.

While U₁ ∩ U₂ is a subspace if U₁ and U₂ are subspaces, it is not necessarily true that U₁ ∪ U₂ is a subspace. Consider any two (noncollinear) lines in ℝ² that pass through the origin. These lines are both subspaces, but the smallest subspace containing both of them is ℝ².

Definition 2.1.8 (sum of vector spaces). The sum of two subsets U₁ and U₂ of a vector space, written U₁ + U₂, is the set of all sums u₁ + u₂, where u₁ ∈ U₁ and u₂ ∈ U₂.

Example 2.1.9. What are the sum and difference of two squares, one centered at a and with side length 2r and the other centered at b and with side length 2s?


Solution: We skip ahead a little and assume knowledge of NORMED VECTOR SPACES. Let the first square be the set A = {x : ‖x − a‖ ≤ r} and the second B = {y : ‖y − b‖ ≤ s}. Consider any x ∈ A and y ∈ B. We have

‖(x + y) − (a + b)‖ ≤ ‖x − a‖ + ‖y − b‖ ≤ r + s,

where the first inequality follows from the TRIANGLE INEQUALITY. Therefore, A + B is a square centered at a + b with side length 2(r + s). Furthermore, we have

‖(x − y) − (a − b)‖ ≤ ‖x − a‖ + ‖−(y − b)‖ = ‖x − a‖ + ‖y − b‖ ≤ r + s,

where again we use the triangle inequality, as well as the property that ‖ax‖ = |a|·‖x‖ for any scalar a ∈ F and any vector x ∈ X, where X is a normed vector space. Therefore, A − B is a square centered at a − b with side length 2(r + s).

Note that A − B does not depend on our choice of the origin o, since (a − o) − (b − o) = a − b. This construction will be crucial to the proof of the EIDELHEIT SEPARATION THEOREM ([1], Theorem 3, p. 133).

Proposition 2.1.10 ([1, Proposition 2.3.2, p. 15]). If U₁ and U₂ are subspaces of V, then U₁ + U₂ is a subspace of V.

Proof: U₁ and U₂ contain 0, so U₁ + U₂ contains 0. Suppose u₁, u₂ ∈ U₁ + U₂. Then there exist vectors v₁, v₂ ∈ U₁ and w₁, w₂ ∈ U₂ such that u₁ = v₁ + w₁ and u₂ = v₂ + w₂. For any scalars a, b ∈ F, we have

au₁ + bu₂ = (av₁ + bv₂) + (aw₁ + bw₂),

where av₁ + bv₂ ∈ U₁ and aw₁ + bw₂ ∈ U₂; since au₁ + bu₂ can be expressed as the sum of a vector in U₁ and a vector in U₂, it is in U₁ + U₂. Therefore, U₁ + U₂ is a subspace of V.

Given any set S of vectors in V, there will in general be many subspaces of V that contain S. One is V itself, but there may be smaller ones. For example, if S = {(1, 1, 0)}, then some examples of subspaces that contain S are {(x, x, 0) : x ∈ ℝ}, {(x, y, 0) : x, y ∈ ℝ}, and {(x, y, z) : x, y, z ∈ ℝ}, which are of dimensions 1, 2, and 3, respectively. The smallest such subspace is given a special name:

Definition 2.1.11 (subspace generated by a subset). Suppose S is a subset of a vector space V. The set [S], called the subspace generated by S, consists of all vectors in V which are linear combinations of vectors in S.

If a subspace U contains S, it contains [S]. Equivalently, [S] is the intersection of all subspaces that contain S.

The usual way to construct [S] is to form all linear combinations of vectors in S. This is clearly a subspace since it is closed under vector addition and scalar multiplication.

Example 2.1.12. Construct [S] ⊆ ℝ² if

• S = {(1, 2), (3, 6)}


• S = {(1, 2), (3, 3)}

• S is the line segment from (1, 2) to (3, 6).

Solution: We have

• [S] = {(x, 2x) : x ∈ ℝ} since (3, 6) is a multiple of (1, 2).

• [S] = ℝ² since (1, 2) and (3, 3) are linearly independent.

• [S] = {(x, 2x) : x ∈ ℝ} since the line segment is part of the line y = 2x.

SUBSPACE generalizes line or plane through the origin. The generalization of any line or plane is LINEAR VARIETY, a subspace plus a constant vector:

Definition 2.1.13 (linear variety). The translation of a subspace is a linear variety or affine subspace.

Analogous to the subspace generated by a subset, we have the following:

Definition 2.1.14 (linear variety generated by a subset). Suppose S is a nonempty subset of a vector space V. The set v(S), called the linear variety generated by S, is the intersection of all linear varieties in V that contain S.

Example 2.1.15. If S ⊂ ℝ³ consists of the vectors (0, 0, 1), (0, 1, 1), give examples of subspaces of two different dimensions that contain S. Which subspace is the smallest? What is v(S)?

Solution: The subspaces {(0, x, y) : x, y ∈ ℝ} and {(x, y, z) : x, y, z ∈ ℝ} are subspaces of dimensions 2 and 3, respectively, that contain S. The first subspace is the smallest. v(S) is {(0, 0, 1) + (0, x, 0) : x ∈ ℝ}.

Example 2.1.16. In ℝ³, what is v(S) if S is a circle? Under what circumstances does v(S) = [S]?

Solution: v(S) is the plane containing the circle. If the origin lies in the plane containing the circle, then v(S) = [S].

    2.2 Convex Sets

Definition 2.2.1 (convex set). A set K in a linear vector space is said to be convex if for any x₁, x₂ ∈ K, all elements in the set {αx₁ + (1 − α)x₂ : α ∈ [0, 1]} are in K.


Proposition 2.2.2. The empty set ∅ is convex.

Proof: To show that ∅ is not convex, we would have to find two vectors x₁, x₂ ∈ ∅ such that some element of the set {αx₁ + (1 − α)x₂ : 0 ≤ α ≤ 1} is not in ∅. Since ∅ contains no vectors at all, it is (vacuously) convex.

Proposition 2.2.3 ([1, Proposition 2.4.1, p. 18]). For any convex sets K, L and any scalars a, b, the set aK + bL is convex.

Proof: Consider the vectors z₁, z₂ ∈ aK + bL. There exist vectors x₁, x₂ ∈ K and y₁, y₂ ∈ L such that z₁ = ax₁ + by₁ and z₂ = ax₂ + by₂. We have

αz₁ + (1 − α)z₂ = α(ax₁ + by₁) + (1 − α)(ax₂ + by₂) = a(αx₁ + (1 − α)x₂) + b(αy₁ + (1 − α)y₂).

Then αx₁ + (1 − α)x₂ and αy₁ + (1 − α)y₂ are in K and L, respectively, since K and L are convex. Therefore, αz₁ + (1 − α)z₂ is in aK + bL.

    Figure 2.2.1: The union of the two disks is not convex.

In general, the union of two convex sets is not convex, as shown in Figure 2.2.1. However, we do have the following:

Proposition 2.2.4 ([1, Proposition 2.4.2, p. 18]). Let C be an arbitrary collection of convex sets. Then ∩_{K∈C} K is convex.

Proof: Let D = ∩_{K∈C} K. If D is empty, then the proof reduces to that of Proposition 2.2.2. Assume we have x₁, x₂ ∈ D and choose any α ∈ [0, 1]. Then x₁, x₂ ∈ K for all K ∈ C, and since each K is convex, αx₁ + (1 − α)x₂ ∈ K for all K ∈ C. Therefore, αx₁ + (1 − α)x₂ ∈ D and D is convex.

Definition 2.2.5 (convex hull). Given an arbitrary set S in a linear vector space, the convex hull or convex cover, denoted co(S), is the smallest convex set containing S.

We can express co(S) as the intersection of all convex sets containing S. Alternatively, we could express co(S) as the set of all convex combinations of vectors in S, where a convex combination of vectors x₁, …, x_n ∈ S is a linear combination Σ_{k=1}^{n} α_k x_k in which α_k ≥ 0 for k = 1, …, n and Σ_{k=1}^{n} α_k = 1.


Proposition 2.2.6. Let K be the set of vectors consisting of all convex combinations of vectors in S. Show that K = co(S).

Proof: First we show that K ⊆ co(S). Let K_m be the set of all convex combinations of the form Σ_{i=1}^{m} α_i x_{j_i}, where Σ_{i=1}^{m} α_i = 1 and α_i ≥ 0 for i = 1, …, m. Clearly K = ∪_{m≥1} K_m. Then in order to show K ⊆ co(S), we need to show that K_m ⊆ co(S) for every m. The proof is by induction. This clearly holds for m = 1, since in that case α₁ = 1 and the convex combinations are therefore just the elements of S ⊆ co(S). Now we assume that the result is true for general m and show that this implies it is true for m + 1. Say we are given a convex combination p = Σ_{i=1}^{m+1} α_i x_{j_i}, where Σ_{i=1}^{m+1} α_i = 1 and α_i ≥ 0 for i = 1, …, m + 1. At least one of α₁, …, α_{m+1} must be strictly positive; without loss of generality, assume that α₁ > 0 (if α₁ = 1, then p = x_{j₁} ∈ S ⊆ co(S), so we may also assume α₁ < 1). Then

p = Σ_{i=1}^{m+1} α_i x_{j_i} = α₁ x_{j₁} + Σ_{i=2}^{m+1} α_i x_{j_i} = α₁ q + (Σ_{i=2}^{m+1} α_i) r,

where q = x_{j₁} and r = Σ_{i=2}^{m+1} (α_i / Σ_{i'=2}^{m+1} α_{i'}) x_{j_i}. Clearly q ∈ co(S) since q ∈ S. Since

Σ_{i=2}^{m+1} (α_i / Σ_{i'=2}^{m+1} α_{i'}) = (Σ_{i=2}^{m+1} α_i) / (Σ_{i=2}^{m+1} α_i) = 1,

r is a convex combination of m elements of S and therefore, by the inductive hypothesis, is also in co(S). Then p is on the line segment connecting q and r with q, r ∈ co(S), so p ∈ co(S), as we wanted to show.

Remember that co(S) is the smallest convex set containing S. We know that S = K₁ ⊆ K ⊆ co(S), so we just need to show that K is actually convex in order to show that K = co(S). Say we are given two elements of K, q = Σ_{i=1}^{n} λ_i x_i and r = Σ_{i=1}^{n} μ_i x_i, with Σ_{i=1}^{n} λ_i = Σ_{i=1}^{n} μ_i = 1 and λ_i, μ_i ≥ 0 for i = 1, …, n. Then any point on the line segment connecting q and r can be written as p = α Σ_{i=1}^{n} λ_i x_i + (1 − α) Σ_{i=1}^{n} μ_i x_i with α ∈ [0, 1]. We can rewrite p as Σ_{i=1}^{n} (αλ_i + (1 − α)μ_i) x_i, with

Σ_{i=1}^{n} (αλ_i + (1 − α)μ_i) = α Σ_{i=1}^{n} λ_i + (1 − α) Σ_{i=1}^{n} μ_i = α + (1 − α) = 1,

so p is a convex combination of x₁, …, x_n and is therefore in K.
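The characterization of co(S) by convex combinations is easy to probe numerically. The sketch below is my own illustration (the helper name and the 1-dimensional choice of S are mine): on the real line, co(S) of a finite set is just the interval [min S, max S], and every random convex combination indeed lands inside it.

```python
import random

def convex_combination(points, weights):
    """Return sum_i w_i x_i for nonnegative weights w_i summing to 1."""
    assert all(w >= 0 for w in weights) and abs(sum(weights) - 1) < 1e-9
    dim = len(points[0])
    return tuple(sum(w * p[d] for w, p in zip(weights, points)) for d in range(dim))

# For S = {0, 1, 5} in R^1 the convex hull is the interval [0, 5].
S = [(0.0,), (1.0,), (5.0,)]
random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in S]
    s = sum(w)
    w = [wi / s for wi in w]       # normalize so the weights sum to 1
    (p,) = convex_combination(S, w)
    assert 0.0 <= p <= 5.0
```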

Definition 2.2.7 (cone). A set C in a linear vector space is said to be a cone with vertex at the origin if x ∈ C implies αx ∈ C for all α ≥ 0.

Example 2.2.8. Make a cone from a line segment, from a circle, and from a disk.

Solution: Figure 2.2.2 shows a cone made from the line segment connecting (5, 5) and (5, 10) and a cone made from the circle of radius 1/2 centered at (0, 0, 1/2) and parallel to the xy-plane. A cone made from the corresponding disk would be the same as the cone made from the circle, except that the cone made from the disk would be filled in.


    Figure 2.2.2: cones made from the line segment and the circle in Example 2.2.8

    Example 2.2.9. Make a convex cone from a non-convex set.

Solution: Consider the convex cone C generated from the line segment L in Figure 2.2.2. If we replace L with an arc A that has the same endpoints as L and that is entirely contained within C, then we still generate C. However, A is a non-convex set.

Proposition 2.2.10. It is impossible to make a non-convex cone from a convex set.

Proof: Let C be a cone generated from a convex set K. Pick any two vectors y₁, y₂ ∈ C. Then there exist vectors x₁, x₂ ∈ K and scalars λ₁, λ₂ ≥ 0 such that y₁ = λ₁x₁ and y₂ = λ₂x₂. Given β ∈ [0, 1], set γ = βλ₁ + (1 − β)λ₂; if γ = 0, then βy₁ + (1 − β)y₂ = 0 ∈ C. Otherwise set α = βλ₁/γ, so that βλ₁ = γα and (1 − β)λ₂ = γ(1 − α) with γ ≥ 0 and 0 ≤ α ≤ 1. Then we can write

y_c = βy₁ + (1 − β)y₂ = γ(αx₁ + (1 − α)x₂) = γx_c;

x_c ∈ K since K is convex, and y_c = γx_c ∈ C by definition of a cone. Therefore, C is convex.

    2.3 Linear Independence and Dimension

Definition 2.3.1 (linear independence). A vector x is said to be linearly dependent upon a set of vectors S if x can be expressed as a linear combination of vectors in S. Equivalently, x is linearly dependent upon S if x ∈ [S]. Conversely, x is said to be linearly independent of the set S if it is not linearly dependent on S, and a set of vectors is said to be a linearly independent set if each vector in the set is linearly independent of the remainder of the set.

Note that this definition works even for infinite-dimensional vector spaces.

Example 2.3.2. Let S = {1, t², t⁴, …}. Is f(t) = t⁸ + 4t²(t⁶ − t²) + 7 dependent upon S? Is g(t) = t⁵ dependent upon S? Is S a linearly independent set?


Solution: f(t) is dependent upon S since each term of f(t) has an even exponent. g(t) is independent of S since t⁵ has an odd exponent. S is a linearly independent set.

THEOREM 2.3.3 ([1, Theorem 2.5.1, p. 20]). The set {x₁, …, x_n} is linearly independent if and only if Σ_{k=1}^{n} α_k x_k = 0 implies α_k = 0 for k = 1, …, n.

Proof: See the proof in [1].

THEOREM 2.3.4 ([1, Corollary 2.5.1, p. 20]). If {x₁, …, x_n} is a linearly independent set and Σ_{k=1}^{n} α_k x_k = Σ_{k=1}^{n} β_k x_k, then α_k = β_k for k = 1, …, n.

Proof: If Σ_{k=1}^{n} α_k x_k = Σ_{k=1}^{n} β_k x_k, then Σ_{k=1}^{n} (α_k − β_k) x_k = 0 and α_k = β_k for k = 1, …, n by Theorem 2.3.3.

Definition 2.3.5 (basis). A finite set B of linearly independent vectors is said to be a basis for the space V if [B] = V. The number of vectors |B| in B is called the dimension of V.

Example 2.3.6. What is the dimension of the space spanned by {1, cos² x, sin² x, cos 2x}?

Solution: This space has dimension 2, since we can write 1 = cos² x + sin² x and cos 2x = cos² x − sin² x.
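This dimension count can be confirmed numerically: sampling the four functions at many points and computing the rank of the resulting matrix recovers the dimension of the span. The sketch is my own (sample points chosen arbitrarily), not part of the text.

```python
import numpy as np

x = np.linspace(0.1, 3.0, 50)  # sample points in the domain
rows = np.stack([np.ones_like(x),
                 np.cos(x) ** 2,
                 np.sin(x) ** 2,
                 np.cos(2 * x)])
# Each row samples one function; the rank of the matrix equals the
# dimension of the span, since the relations 1 = cos^2 + sin^2 and
# cos 2x = cos^2 - sin^2 carry over to the sampled vectors.
assert np.linalg.matrix_rank(rows) == 2
```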

QUIZ THEOREM 1 (from [1, Theorem 2.5.2, p. 21]). If a vector space V is generated by the set of k vectors S_k = {v₁, …, v_k}, then any set of k + 1 vectors T_{k+1} = {w₁, …, w_{k+1}} in V must be linearly dependent.

Proof: The proof is by induction. For the base case k = 1, S₁ = {v₁} and T₂ = {w₁, w₂}. There exist scalars α₁, α₂ such that w₁ = α₁v₁ and w₂ = α₂v₁. Then we have α₂w₁ − α₁w₂ = 0; if α₁ = α₂ = 0, then w₁ = w₂ = 0, so in either case T₂ is a linearly dependent set.

Our inductive hypothesis is that if V = [S_{k−1}] for some set S_{k−1} of k − 1 vectors, then any set T_k of k vectors in V is linearly dependent. We assume this is true and show that it implies the statement for k. First, we can write w_{k+1} = Σ_{i=1}^{k} α_i v_i since [S_k] = V. Choose some j such that α_j ≠ 0. If this cannot be done, then w_{k+1} = 0 and T_{k+1} is a linearly dependent set. Then let S_k′ = S_k \ {v_j}. Now

v_j = (1/α_j) w_{k+1} − v′

for some v′ ∈ [S_k′], with v′ = Σ_{i=1, i≠j}^{k} b_i v_i and b_i = α_i/α_j. Each w_m can be written w_m = Σ_{i=1}^{k} γ_{m,i} v_i; substituting the expression for v_j, we have

w_m = Σ_{i=1, i≠j}^{k} (γ_{m,i} − γ_{m,j} b_i) v_i + (γ_{m,j}/α_j) w_{k+1}


for m = 1, …, k. The k vectors w_m − (γ_{m,j}/α_j) w_{k+1} all lie in the space generated by the k − 1 vectors of S_k′, so by the inductive hypothesis they are linearly dependent; that is, there exists some linear combination Σ_{m=1}^{k} β_m (w_m − (γ_{m,j}/α_j) w_{k+1}) = 0 with not all of β₁, …, β_k equal to 0. But then we can set β_{k+1} = −Σ_{m=1}^{k} β_m γ_{m,j}/α_j so that the linear combination Σ_{m=1}^{k+1} β_m w_m also equals 0; again not all of β₁, …, β_{k+1} equal 0, so the set T_{k+1} is linearly dependent, thus proving the theorem.

We conclude that if S_k is a basis for V, then any set of more than k vectors in V is linearly dependent and cannot be a basis. Therefore, any two bases for a finite-dimensional vector space contain the same number of elements.

Example 2.3.7. Show that the result does not extend to infinite-dimensional vector spaces.

Solution: The vector space V of infinite sequences with only finitely many nonzero terms is generated by the countably infinite set of linearly independent elements S = {(1, 0, …), (1, 1, 0, …), (1, 1, 1, 0, …), …}. If we remove the first element of this set, we still have a countably infinite set S′ = S \ {(1, 0, …)}, but this does not generate V, since every sequence in [S′] has its first two terms equal, and not all sequences (a_k)_{k≥1} in V have a₁ = a₂.

    2.4 Normed Vector Spaces

Definition 2.4.1 (normed vector space). A normed vector space, or normed linear space or normed linear vector space, is a vector space X on which there is defined a real-valued function which maps every x ∈ X into a real number ‖x‖ called the norm of x. The norm satisfies the following axioms:

1. positive homogeneity: ‖αx‖ = |α|·‖x‖ for all α ∈ ℝ and all x ∈ X.

2. triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ X.

3. positivity and positive definiteness: ‖x‖ ≥ 0 for all x ∈ X, and ‖x‖ = 0 if and only if x = 0.

A real-valued function p : X → ℝ that satisfies just the first two axioms is called a SEMINORM. Note the following about each axiom:

positive homogeneity

• If you take a cab from your destination back to your starting point, you should pay the same amount as when you went from your starting point to your destination.

• You cannot save money by breaking a cab ride along a single straight road into two pieces.

• The cab driver cannot offer bargain fares for extra-long rides.

• In order to convert x₁⁴ + x₂⁴ into a norm, we must take the fourth root (this is the l₄-norm in 2 dimensions: ‖x‖₄ = (x₁⁴ + x₂⁴)^{1/4}).

• If you draw a convex set that contains the origin and decree that the cab fare from the origin to any point on the boundary of this set is $1, and that fare scales linearly with distance, you do not necessarily have a norm. In fact, you only have a norm if the boundary is the set {x : ‖x‖ = c} for some constant c.


triangle inequality

• You cannot save money by breaking a cab ride into two pieces.

• A cab driver can offer a 50% discount on the shorter dimension (‖x‖ = ‖(u, v)‖ = (1/2) min(|u|, |v|) + max(|u|, |v|)). Say we have x = (a, b) and y = (c, d). Assume without loss of generality that min(|a|, |b|) = |a|. If min(|c|, |d|) = |c|, then clearly ‖x + y‖ ≤ ‖x‖ + ‖y‖. If min(|c|, |d|) = |d|, assume without loss of generality that min(|a + c|, |b + d|) = |a + c|. Then

‖x‖ + ‖y‖ − ‖x + y‖ = (1/2)|a| + |b| + |c| + (1/2)|d| − (1/2)|a + c| − |b + d|
= (1/2)(|a| + |c| − |a + c|) + (|b| + |d| − |b + d|) + (1/2)(|c| − |d|) ≥ 0,

where the inequality follows since each term in parentheses is nonnegative. A 50% discount on the longer dimension, on the other hand, does not give a norm: with ‖(u, v)‖ = min(|u|, |v|) + (1/2) max(|u|, |v|), we get ‖(1, 1)‖ = 3/2 > 1 = ‖(1, 0)‖ + ‖(0, 1)‖.

• The set of vectors {x : ‖x‖ ≤ 1} is convex since

‖αx₁ + (1 − α)x₂‖ ≤ α‖x₁‖ + (1 − α)‖x₂‖ ≤ α + (1 − α) = 1.

positivity and positive definiteness

• The "if" part of positive definiteness is not an independent requirement: positive homogeneity implies that ‖x‖ = 0 if x = 0, since we have ‖0‖ = ‖0·0‖ = |0|·‖0‖ = 0.

• Positivity is not an independent requirement: we have

0 (a)= ‖0‖ = ‖x − x‖ (b)≤ ‖x‖ + ‖−x‖ (c)= ‖x‖ + |−1|·‖x‖ = 2‖x‖,

where (a) follows from positive homogeneity as shown above, (b) follows from the triangle inequality, and (c) follows from positive homogeneity. Dividing both sides by 2 proves positivity.

• If the cab driver makes the shorter dimension free, cab fare is still a norm (the l∞-norm).

In ℝ², we have seen the l₁-, l₂-, l₄-, and l∞-norms so far, defined by |x₁| + |x₂|, (x₁² + x₂²)^{1/2}, (x₁⁴ + x₂⁴)^{1/4}, and max(|x₁|, |x₂|), respectively. Before we are tempted to conclude that ‖x‖_p = (|x₁|^p + |x₂|^p)^{1/p} is a norm for all p > 0, note that it is actually not a norm for p ∈ (0, 1). The l₀-norm does not satisfy positive homogeneity, and the l_p-norm for p ∈ (0, 1) does not satisfy the triangle inequality (the unit disks for these norms are not convex).

Example 2.4.2. Consider the set of infinite sequences of real numbers that have only finitely many nonzero terms. Is this a vector space? If so, what is its dimension? Generalize the norm examples to this case. If you remove the restriction to finitely many nonzero terms but insist that the norm be finite, which example gives the largest vector space? The smallest?

Solution: Yes, this is a vector space, since the sum of an infinite sequence with M nonzero terms and an infinite sequence with N nonzero terms will have at most M + N nonzero terms, and any scalar multiple of an infinite sequence with M nonzero terms will still have at most M nonzero terms. The vector


space is infinite-dimensional. The l_p-norm (p ∈ [1, ∞)) for a sequence (ξ₁, ξ₂, …) is (Σ_{i=1}^{∞} |ξ_i|^p)^{1/p}. The l∞-norm is max_{i≥1} |ξ_i|. If we remove the restriction to finitely many nonzero terms but insist that the norm be finite, then the l∞-norm gives the largest vector space, while the l₁-norm gives the smallest.

Example 2.4.3. C[a, b] is the space of continuous functions on [a, b], with norm ‖x‖ = max_{t∈[a,b]} |x(t)|. Confirm that this norm satisfies all the axioms.

Solution: We have

1. positive homogeneity:

‖αx‖ = max_{a≤t≤b} |αx(t)| = |α| max_{a≤t≤b} |x(t)| = |α|·‖x‖.

2. triangle inequality:

‖x + y‖ = max_{a≤t≤b} |x(t) + y(t)| ≤ max_{a≤t≤b} (|x(t)| + |y(t)|) ≤ max_{a≤t≤b} |x(t)| + max_{a≤t≤b} |y(t)| = ‖x‖ + ‖y‖.

3. positivity and positive definiteness: Clearly ‖x‖ = 0 if and only if x(t) = 0 for all t ∈ [a, b], and ‖x‖ ≥ 0 since |x(t)| ≥ 0.

Example 2.4.4. Give examples of functions, not in C[0, 1], that can be included in the space if you choose the norm ‖x‖ = ∫₀¹ |x(t)| dt.

Figure 2.4.1: Thomae's function (Example 2.4.4). Image taken from Wikipedia.

Solution: The function x(t) = sgn(t − 0.5) is discontinuous at the point t = 0.5, but ‖x‖ = 1. A more interesting choice is Thomae's function, also known as the popcorn function or the Riemann function:

x(t) = 1/q if t = p/q ∈ ℚ (in lowest terms),  x(t) = 0 if t ∈ ℝ \ ℚ.

We show that ‖x‖ = 0 (see Figure 2.4.1). Given any ε > 0, choose n such that 1/n < ε. In the interval [0, 1], there are only finitely many rational numbers with denominator at most n. Say there are d_n of these numbers. Surround each of these d_n points with an interval of length ε/d_n, and use the endpoints of these intervals to form a dissection of the interval [0, 1]. The upper Darboux sum for this dissection is less than 2ε: within each of the d_n intervals, whose combined length is ε, the function has an upper bound of 1, and

  • 8/13/2019 Notes on Luenberger's Vector Space Optimization

    33/131

    2.4. Normed Vector Spaces 33

    within each of the remaining intervals, whose combined length is less than 1, the function has an upper bound of 1 n . The lower Darboux sum is obviously 0 since every interval in the dissection containsan irrational number. Since we have 0 x 2 for all 0, x 0.
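The Darboux argument can be made concrete. The Python sketch below (ours, not from the notes) computes exact upper Darboux sums for Thomae's function on uniform partitions of $[0, 1]$, using the fact that the supremum of the function on a subinterval is $1/q$ for the smallest denominator $q$ of a rational in that subinterval; the sums shrink as the partition is refined.

```python
from fractions import Fraction

def sup_thomae(lo, hi):
    """sup of Thomae's function on [lo, hi] (Fractions in [0, 1]):
    equals 1/q for the smallest q such that some p/q lies in the interval."""
    q = 1
    while True:
        # smallest integer p with p/q >= lo, i.e. p = ceil(lo * q)
        p_min = -(-lo.numerator * q // lo.denominator)
        if Fraction(p_min, q) <= hi:
            return Fraction(1, q)
        q += 1

def upper_sum(N):
    """Upper Darboux sum on the uniform partition of [0, 1] into N pieces."""
    total = Fraction(0)
    for i in range(N):
        lo, hi = Fraction(i, N), Fraction(i + 1, N)
        total += sup_thomae(lo, hi) * Fraction(1, N)
    return total

# upper_sum(10) is about 0.46; refining the partition drives the sum down
assert upper_sum(100) < upper_sum(10) < 1
```

Since the lower sums are all 0, the shrinking upper sums illustrate numerically that the integral, and hence the norm $\|x\|$, is 0.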

The norm $\|x\|$ does not exist for the Dirichlet function, defined as
$$x(t) = \mathbf{1}_{\mathbb{Q}}(t) = \begin{cases} 1 & t \in \mathbb{Q}, \\ 0 & t \in \mathbb{R} \setminus \mathbb{Q}, \end{cases}$$
which is not Darboux-integrable (and therefore not Riemann-integrable), since every interval in any dissection of $[0, 1]$ always contains at least one rational and at least one irrational number; therefore, the upper and lower Darboux sums do not converge to a common value. If we use the Lebesgue integral, however, then $\|x\|$ exists and equals 0, since $\mathbb{Q}$ is countable.

Definition 2.4.5 (total variation). A function $x$ on $[a, b]$ is said to be of bounded variation if there is a constant $K$ such that for any partition $a = t_0 < t_1 < \dots < t_n = b$ of $[a, b]$,
$$\sum_{i=1}^{n} |x(t_i) - x(t_{i-1})| \le K.$$
We then define the total variation of $x$ as
$$\mathrm{TV}(x) = \sup \sum_{i=1}^{n} |x(t_i) - x(t_{i-1})| = \int_a^b |dx(t)|,$$
where the supremum is taken over all partitions of $[a, b]$.
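The partition sums in this definition approximate $\mathrm{TV}(x)$ from below, and for a monotone function every partition sum telescopes to exactly $x(b) - x(a)$. A small Python sketch (ours, not from the notes) illustrates this:

```python
def variation(x, ts):
    """Partition sum sum_i |x(t_i) - x(t_{i-1})| over the partition ts."""
    return sum(abs(x(ts[i]) - x(ts[i - 1])) for i in range(1, len(ts)))

x = lambda t: t * t                     # monotone increasing on [0, 1]
ts = [i / 50 for i in range(51)]        # uniform partition of [0, 1]

# For a monotone function the sum telescopes: TV(x) = x(1) - x(0) = 1
assert abs(variation(x, ts) - 1.0) < 1e-9
```

For non-monotone functions the sums genuinely depend on the partition, which is why the supremum is needed.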

Example 2.4.6. The space $BV[a, b]$ is defined as the space of all functions of bounded variation on $[a, b]$, together with the norm $\|x\| = |x(a)| + \mathrm{TV}(x)$. Why is this norm the appropriate choice? Is the function $x(t) = \sin(1/t)$ in $BV[0, 1]$?

Figure 2.4.2: The function $x(t) = \sin(1/t)$ does not have bounded variation (Example 2.4.6).

Solution: We check that $\|x\|$ satisfies the three axioms for a norm:


1. positive homogeneity:
$$\|\alpha x\| = |\alpha x(a)| + \sup \sum_{i=1}^{n} |\alpha x(t_i) - \alpha x(t_{i-1})| = |\alpha| \left( |x(a)| + \sup \sum_{i=1}^{n} |x(t_i) - x(t_{i-1})| \right) = |\alpha| \, \|x\|.$$

2. triangle inequality:
$$\|x + y\| = |x(a) + y(a)| + \sup \sum_{i=1}^{n} |x(t_i) + y(t_i) - x(t_{i-1}) - y(t_{i-1})| \le |x(a)| + |y(a)| + \sup \sum_{i=1}^{n} |x(t_i) - x(t_{i-1})| + \sup \sum_{i=1}^{n} |y(t_i) - y(t_{i-1})| = \|x\| + \|y\|,$$
where we use the triangle inequality for the absolute value function.

3. positivity and positive definiteness: Clearly $\|x\| = 0$ if $x(t) = 0$ for all $t \in [a, b]$. Furthermore, because we include the term $|x(a)|$ in the norm, $\|x\| = 0$ only if $x(t) = 0$ for all $t \in [a, b]$ (a nonzero constant function has total variation 0, which is why the term $|x(a)|$ is needed). Finally, $\|x\| \ge 0$ for all $x \in BV[a, b]$, since every term in the norm involves absolute values.

The function $x(t) = \sin(1/t)$ is not in $BV[0, 1]$, since there are infinitely many points $t \in (0, 1]$ at which $|x(t)| = 1$. Specifically, as $t \to 0$, $x(t)$ oscillates between $-1$ and $1$ infinitely often, as shown in Figure 2.4.2; each oscillation contributes 2 to the partition sums, so they are unbounded. Therefore, $x(t)$ does not have bounded variation.
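The unbounded oscillation can be checked numerically. The Python sketch below (ours, not from the notes) evaluates partition sums for $\sin(1/t)$ through the points $t_k = 2 / ((2k+1)\pi)$, where the function alternates between $\pm 1$, so each oscillation contributes roughly 2 to the sum.

```python
import math

def variation(x, ts):
    """Partition sum sum_i |x(t_i) - x(t_{i-1})| over the partition ts."""
    return sum(abs(x(ts[i]) - x(ts[i - 1])) for i in range(1, len(ts)))

def partition(n):
    """Partition of (0, 1] through points where sin(1/t) = +-1."""
    ts = [2.0 / ((2 * k + 1) * math.pi) for k in range(n, -1, -1)]
    return ts + [1.0]

x = lambda t: math.sin(1.0 / t)
v10 = variation(x, partition(10))     # roughly 2 * 10
v100 = variation(x, partition(100))   # roughly 2 * 100
assert v100 > v10                     # the sums grow without bound
```

Since finer partitions near 0 produce ever larger sums, no constant $K$ bounds them, confirming $x \notin BV[0, 1]$.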

We can think of $x(t)$ as the position of a car as a function of time for $t \in [a, b]$; the car's odometer increases even when the car is backing up. Then the space $BV[a, b]$ is the set of position functions for which the change in the odometer reading is finite. This space is important because it will turn out to be the dual space of $C[a, b]$.

    2.5 Open and Closed Sets

    A norm introduces a topology that may be more general than what the reader is used to.

Definition 2.5.1 (topology). A topology on a set $X$ is a collection $\mathcal{T}$ of subsets of $X$ having the following properties:

1. $\emptyset$ and $X$ are in $\mathcal{T}$.

2. The union of the elements of any subcollection of $\mathcal{T}$ is in $\mathcal{T}$.

3. The intersection of the elements of any finite subcollection of $\mathcal{T}$ is in $\mathcal{T}$.

A set $X$ for which a topology $\mathcal{T}$ has been specified is called a topological space.

Definition 2.5.2 (open set). If $X$ is a topological space with topology $\mathcal{T}$, we say that a subset $U \subset X$ is an open set of $X$ if $U$ belongs to the collection $\mathcal{T}$. More generally, a topological space is a set $X$ together with a collection of subsets of $X$, called open sets, such that $\emptyset$ and $X$ are both open, and such that arbitrary unions and finite intersections of open sets are open.


The most basic topology on a nonempty set $X$ is the collection $\{\emptyset, X\}$, so a topology can contain just two open sets. At the other extreme, it is permissible for every subset of $X$ to be open.

Here are some other definitions that are independent of any model for topology:

Definition 2.5.3 (closed set). A subset $A$ of a topological space $X$ is said to be closed if the set $X \setminus A$ is open.

Definition 2.5.4 (interior and closure). Given a subset $A$ of a topological space $X$, the interior of $A$, denoted $\mathring{A}$, is defined as the union of all open sets contained in $A$ (i.e., the largest open subset of $A$). The closure of $A$, denoted $\bar{A}$, is defined as the intersection of all closed sets containing $A$ (i.e., the smallest closed set containing $A$).

Clearly if $A = \mathring{A}$, then $A$ is open, while if $A = \bar{A}$, then $A$ is closed.

Figure 2.5.1: A Web site model that illustrates a finite topology that does not involve norms.

We use a Web site model to illustrate a finite topology that does not involve norms. Figure 2.5.1 shows the links between the six Web pages in a set $X$. An open set is defined by the property that no page in the set can be reached by a link from a page outside the set.

Example 2.5.5. For the Web site example, show that both $\emptyset$ and $X$ are open.

Solution: We cannot find a page in $\emptyset$ that can be reached from a page outside $\emptyset$, because there are no pages in $\emptyset$. We cannot find a page outside $X$ that links to a page in $X$, because there are no pages outside $X$. Therefore, $\emptyset$ and $X$ are open.

    Example 2.5.6. For the Web site example, prove that the union of two open sets is open.

Solution: Let $A$ and $B$ be two open sets. By the definition of an open set, there are no links from pages in $X \setminus A$ to pages in $A$, nor are there links from pages in $X \setminus B$ to pages in $B$. Since $X \setminus (A \cup B) \subset X \setminus A$ and $X \setminus (A \cup B) \subset X \setminus B$, there are no links from pages in $X \setminus (A \cup B)$ to pages in $A$ or to pages in $B$; therefore, there are no links from pages in $X \setminus (A \cup B)$ to pages in $A \cup B$. Thus, $A \cup B$ is open.


    Example 2.5.7. For the Web site example, prove that the intersection of two open sets is open.

Solution: Let $A$ and $B$ be two open sets. Suppose that $A \cap B$ is not open. Then there must be some page in $X \setminus (A \cap B)$ that links to a page in $A \cap B$. But $X \setminus (A \cap B) = (X \setminus A) \cup (X \setminus B)$, so we have found either a page in $X \setminus A$ that links to a page in $A$, or a page in $X \setminus B$ that links to a page in $B$. This contradicts our assumption that $A$ and $B$ are open, so $A \cap B$ is open.

For the Web site model, we note the following to illustrate some of the topological concepts we have defined:

The nine open sets in $X$ are $\emptyset$, $\{2\}$, $\{4, 5\}$, $\{1, 2, 3\}$, $\{2, 4, 5\}$, $\{4, 5, 6\}$, $\{2, 4, 5, 6\}$, $\{1, 2, 3, 4, 5\}$, and $\{1, 2, 3, 4, 5, 6\} = X$.

The interior of $\{2, 3\}$ is $\{2\}$, since $\{2\}$ is the largest open subset of $\{2, 3\}$.

The interior of $\{1, 2, 4, 5, 6\}$ is $\{2, 4, 5, 6\}$.

The nine closed sets are the complements of the nine open sets in $X$. Note that $\emptyset$ and $X$ are both open and closed (sometimes called clopen sets).

The closure of $\{1\}$ is $\{1, 3\}$, since $\{1, 3\}$ is the smallest closed set containing the subset $\{1\}$.

The closure of $\{1, 6\}$ is $\{1, 3, 6\}$, since $\{1, 3, 6\}$ is the smallest closed set containing the subset $\{1, 6\}$.
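These claims are small enough to verify mechanically. The Python sketch below (ours, not part of the notes) checks that the nine listed sets form a topology on $X = \{1, \dots, 6\}$ and computes interiors and closures directly from the definitions.

```python
from itertools import combinations

X = frozenset(range(1, 7))
opens = {frozenset(s) for s in
         [(), (2,), (4, 5), (1, 2, 3), (2, 4, 5), (4, 5, 6),
          (2, 4, 5, 6), (1, 2, 3, 4, 5), (1, 2, 3, 4, 5, 6)]}

# topology axioms: empty set and X are open; unions and intersections stay open
assert frozenset() in opens and X in opens
for A, B in combinations(opens, 2):
    assert A | B in opens and A & B in opens

def interior(S):
    """Union of all open sets contained in S (the largest open subset)."""
    S = frozenset(S)
    return frozenset().union(*[U for U in opens if U <= S])

def closure(S):
    """Complement of the interior of the complement."""
    return X - interior(X - frozenset(S))

assert interior({2, 3}) == {2}
assert interior({1, 2, 4, 5, 6}) == {2, 4, 5, 6}
assert closure({1}) == {1, 3}
assert closure({1, 6}) == {1, 3, 6}
```

Note that only finite intersections need to be checked here; in a finite topology that is all there is.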

Now we define a topology using a norm.

Definition 2.5.8 (interior point). Let $P$ be a subset of a normed vector space $X$. The point $p \in P$ is an interior point of $P$ if there exists some $\epsilon > 0$ such that the ball $B(p, \epsilon) = \{x : \|x - p\| < \epsilon\}$ is a subset of $P$.

Definition 2.5.9 (closure point). A point $x \in X$ is a closure point or limit point of a set $P$ if, given any $\epsilon > 0$, there is a point $p \in P$ such that $\|x - p\| < \epsilon$.

In other words, these definitions say that a point $p$ is an interior point of $P$ if we can surround $p$ with a ball entirely contained within $P$, while a point $x$ is a closure point of $P$ if every ball centered at $x$ contains at least one point of $P$.

Proposition 2.5.10. $\bar{P}$ is a closed set; that is, the complement of $\bar{P}$ is an open set.

Proof (based on [2, Theorem 2.27, p. 35]): If $p \in X$ and $p \notin \bar{P}$, then $p$ is neither a point of $P$ nor a closure point of $P$. Therefore, there exists some ball $B(p, \epsilon)$ that contains no points of $P$. The smaller ball $B(p, \epsilon/2)$ then contains no closure points of $P$ either, so it lies in the complement of $\bar{P}$. This shows that the complement of $\bar{P}$ in $X$ is open, so $\bar{P}$ is closed.


QUIZ THEOREM 2 (first part of [1, Proposition 2.7.4, p. 25]). Let $C$ be a convex set in a normed space. Then $\mathring{C}$ is convex.

Proof: If $\mathring{C}$ is empty, it is convex (see Proposition 2.2.2). Suppose $x_0, y_0 \in \mathring{C}$. Fix $\lambda \in (0, 1)$: we must show that $z_0 = \lambda x_0 + (1 - \lambda) y_0$ is in $\mathring{C}$. Since $x_0, y_0 \in \mathring{C}$, there exists some $\epsilon > 0$ such that the open balls $B(x_0, \epsilon)$ and $B(y_0, \epsilon)$ are contained in $C$; that is, all vectors $x_0 + w$ and $y_0 + w$ with $\|w\| < \epsilon$ are in $C$. Since $C$ is convex, all convex combinations $\lambda(x_0 + w) + (1 - \lambda)(y_0 + w)$ are in $C$. Furthermore, since $\lambda(x_0 + w) + (1 - \lambda)(y_0 + w) = z_0 + w$, it follows that all points of the form $z_0 + w$ with $\|w\| < \epsilon$ are in $C$; that is, there exists some $\epsilon > 0$ such that the open ball $B(z_0, \epsilon)$ is contained in $C$. Therefore, $z_0 \in \mathring{C}$.

QUIZ THEOREM 3 (second part of [1, Proposition 2.7.4, p. 25]). Let $C$ be a convex set in a normed space. Then $\bar{C}$ is convex.

Proof: If $\bar{C}$ is empty, it is convex (see Proposition 2.2.2). Suppose $x_0, y_0 \in \bar{C}$. Fix $\lambda \in (0, 1)$: we must show that $z_0 = \lambda x_0 + (1 - \lambda) y_0$ is in $\bar{C}$. Given any $\epsilon > 0$, select $x, y$ from $C$ such that $\|x - x_0\| < \epsilon$ and $\|y - y_0\| < \epsilon$. Since $C$ is convex, $z = \lambda x + (1 - \lambda) y$ is in $C$. Then by the triangle inequality,
$$\|z - z_0\| = \|\lambda x + (1 - \lambda) y - \lambda x_0 - (1 - \lambda) y_0\| \le \lambda \|x - x_0\| + (1 - \lambda) \|y - y_0\| < \lambda \epsilon + (1 - \lambda) \epsilon = \epsilon,$$
so $z_0$ is within a distance $\epsilon$ of the point $z \in C$. Since this is true for every $\epsilon > 0$, $z_0$ is a closure point of $C$ and is therefore in $\bar{C}$.

    2.6 Convergence, Limits, and Continuity

Any topology is sufficient to define convergence.

Definition 2.6.1 (convergence (any topology)). The sequence $\{x_n\}$ converges to $x$ if, for every open set $P$ containing $x$, there exists an $N$ such that for all $n > N$, $x_n \in P$. We write $x_n \to x$.

Example 2.6.2. For the Web site topology, show that the sequence $6, 5, 4, 6, 5, 4, 5, 4, 5, 4, \dots$ converges both to 4 and to 5.

Solution: The smallest open set containing 4 is $\{4, 5\}$, and the smallest open set containing 5 is also $\{4, 5\}$. After the second 6, every term of the sequence lies in $\{4, 5\}$. Therefore, for every open set $P$ containing 4 or 5, there exists some $N$ such that for all $n > N$, $x_n \in P$, so the sequence converges to both 4 and 5. In a topology that is not derived from a norm, limits need not be unique.

If we define open sets by a norm, we can formulate convergence in terms of norms.

Definition 2.6.3 (convergence (normed space)). The sequence $\{x_n\}$ converges to $x$ if, for every $\epsilon > 0$, there exists an $N$ such that for all $n > N$, $\|x_n - x\| < \epsilon$. As before, we write $x_n \to x$.


QUIZ THEOREM 4 ([1, Proposition 2.8.1, p. 27]). If a sequence converges, its limit is unique.

Proof: Suppose $x_n \to x$ and $x_n \to x'$. Then for every $\epsilon/2 > 0$, there exist $N, N'$ such that for all $n > N$ and all $n > N'$, $\|x_n - x\| < \epsilon/2$ and $\|x_n - x'\| < \epsilon/2$. Then by the triangle inequality,
$$\|x - x'\| \le \|x - x_m\| + \|x_m - x'\| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$$
for $m > \max(N, N')$. Since this is true for any $\epsilon > 0$, $x = x'$.

THEOREM 2.6.4 ([1, Proposition 2.8.2, p. 27]). A set $F$ is closed if and only if every convergent sequence with elements in $F$ has its limit in $F$.

Proof: For the "only if" direction, the limit of a sequence in $F$ is obviously a closure point of $F$ and therefore must be in $F$ if $F$ is closed. For the "if" direction, suppose that $F$ is not closed. Then there is a closure point $x$ of $F$ that is not in $F$. In each of the open balls $B(x, 1/n)$ we may select a point $x_n \in F$, since $x$ is a closure point. The sequence $\{x_n\}$ generated in this way converges to $x \notin F$, which contradicts our assumption that every convergent sequence with elements in $F$ has its limit in $F$. Therefore, $F$ must be closed.

Definition 2.6.5 (transformation). Let $X$ and $Y$ be vector spaces and let $D$ be a subset of $X$. A rule which associates an element $y \in Y$ with every element $x \in D$ is a transformation from $X$ to $Y$ with domain $D$.

Definition 2.6.6 (injective). $T$ is injective or one-to-one if $T(x) = T(y)$ implies $x = y$.

Definition 2.6.7 (surjective). $T$ is surjective or onto if for every $y \in Y$, there exists at least one $x$ such that $T(x) = y$. In other words, the image of $T$ equals its codomain.

Definition 2.6.8 (linear). $T$ is linear if $T(ax + by) = aT(x) + bT(y)$ for any $x, y \in X$ and any scalars $a, b \in \mathbb{F}$.

Example 2.6.9. If $X$ or $Y$ is infinite-dimensional, then a linear transformation $T$ cannot be represented by a matrix. Show that the following transformations still qualify as linear:

1. Both $X$ and $Y$ are the space of polynomial functions: $T$ is differentiation.

2. Both $X$ and $Y$ are the space of continuous functions on $[a, b]$: $T$ is defined by
$$(Tx)(t) = \int_a^b k(t, s) \, x(s) \, ds,$$
where $k$ is a continuous function on $[a, b] \times [a, b]$.


Solution: Differentiation and integration are linear operators: $D_x(af + bg) = a \, D_x f + b \, D_x g$ and
$$\int \big( a f(x) + b g(x) \big) \, dx = a \int f(x) \, dx + b \int g(x) \, dx.$$
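The kernel operator in Example 2.6.9 can be approximated by a Riemann sum, and the discretization preserves linearity exactly. The Python sketch below (ours, not from the notes) uses an illustrative kernel $k(t, s) = e^{-|t - s|}$; all names here are assumptions for the demonstration.

```python
import math

def T(x, k, a=0.0, b=1.0, n=200):
    """Midpoint Riemann-sum approximation of (Tx)(t) = integral of k(t,s) x(s) ds."""
    ds = (b - a) / n
    ss = [a + (j + 0.5) * ds for j in range(n)]
    return lambda t: sum(k(t, sj) * x(sj) for sj in ss) * ds

k = lambda t, s: math.exp(-abs(t - s))   # a continuous kernel (assumed)
x, y = math.sin, math.cos
a_, b_ = 2.0, -3.0

# T(a x + b y) agrees with a T(x) + b T(y) up to floating-point rounding
lhs = T(lambda s: a_ * x(s) + b_ * y(s), k)(0.3)
rhs = a_ * T(x, k)(0.3) + b_ * T(y, k)(0.3)
assert abs(lhs - rhs) < 1e-9
```

The linearity here is inherited from the linearity of the finite sum, which is exactly how the integral's linearity is proved in the limit.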

Definition 2.6.10 (continuity (any topology)). $T : X \to Y$ is continuous if the inverse image $U = T^{-1}(V)$ of any open subset $V \subset Y$ is an open subset of $X$.

Notice that the inverse image is defined even if $T$ is not invertible, and it is not necessarily a connected set.

For a topology defined by a norm, we can formulate continuity in terms of norms.

Definition 2.6.11 (continuity (normed space)). A transformation $T$ from a normed space $X$ to a normed space $Y$ is continuous at $x_0 \in X$ if for every $\epsilon > 0$, there exists a $\delta > 0$ such that $\|x - x_0\| < \delta$ implies $\|T(x) - T(x_0)\| < \epsilon$.

Example 2.6.12 (based on [6, Theorem 18.1, p. 104]). Show that this is the same definition, specialized to a topology defined by a norm. In other words, show that the definition of continuity for any topology implies the definition of continuity for a normed space.

Solution: Suppose $T$ is continuous at $x_0$ and let $V = \{y : \|T(x_0) - y\| < \epsilon\}$ be an open subset of $Y$. Then there exists an open set $U = T^{-1}(V)$ in $X$, and clearly $x_0 \in U$. Furthermore, since $U$ is