Luenberger, D. G., Optimization by Vector Space Methods (Wiley, 1969). Source: bioen.utah.edu/wiki/images/4/44/OptimizationIntroduction.pdf


OPTIMIZATION BY VECTOR SPACE METHODS


SERIES IN DECISION AND CONTROL

Ronald A. Howard

OPTIMIZATION BY VECTOR SPACE METHODS

by David G. Luenberger

INTRODUCTION TO DYNAMIC PROGRAMMING

by George L. Nemhauser

DYNAMIC PROBABILISTIC SYSTEMS (In Preparation)

by Ronald A. Howard


OPTIMIZATION BY VECTOR SPACE METHODS

David G. Luenberger

Stanford University, Stanford, California

John Wiley & Sons, Inc., New York · London · Sydney · Toronto


Copyright © 1969 by John Wiley & Sons, Inc.

All rights reserved. No part of this book may be reproduced by any means, nor transmitted, nor translated into a machine language without the written permission of the publisher.

Library of Congress Catalog Card Number: 68-8716
SBN 471 55359 X
Printed in the United States of America
3 4 5 6 7 8 9 10


To Nancy.


PREFACE

This book has evolved from a course on optimization that I have taught at Stanford University for the past five years. It is intended to be essentially self-contained and should be suitable for classroom work or self-study. As a text it is aimed at first- or second-year graduate students in engineering, mathematics, operations research, or other disciplines dealing with optimization theory.

The primary objective of the book is to demonstrate that a rather large segment of the field of optimization can be effectively unified by a few geometric principles of linear vector space theory. By use of these principles, important and complex infinite-dimensional problems, such as those generated by consideration of time functions, are interpreted and solved by methods springing from our geometric insight. Concepts such as distance, orthogonality, and convexity play fundamental roles in this development. Viewed in these terms, seemingly diverse problems and techniques often are found to be intimately related.

The essential mathematical prerequisite is a familiarity with linear algebra, preferably from the geometric viewpoint. Some familiarity with elementary analysis including the basic notions of sets, convergence, and continuity is assumed, but deficiencies in this area can be corrected as one progresses through the book. More advanced concepts of analysis such as Lebesgue measure and integration theory, although referred to in a few isolated sections, are not required background for this book.

Imposing simple intuitive interpretations on complex infinite-dimensional problems requires a fair degree of mathematical sophistication. The backbone of the approach taken in this book is functional analysis, the study of linear vector spaces. In an attempt to keep the mathematical prerequisites to a minimum while not sacrificing completeness of the development, the early chapters of the book essentially constitute an introduction to functional analysis, with applications to optimization, for those having the relatively modest background described above. The mathematician or more advanced student may wish simply to scan Chapters 2, 3, 5, and 6 for review or for sections treating applications and then concentrate on the other chapters which deal explicitly with optimization theory.


The sequencing of the various sections is not necessarily inviolable. Even at the chapter level the reader may wish to alter his order of progress through the book. The course from which this text developed is two quarters (six months) in duration, but there is more material in the text than can be comfortably covered in that period. By reading only the first few sections of Chapter 3, it is possible to go directly from Chapters 1 and 2 to Chapters 5, 7, 8, and 10 for a fairly comprehensive treatment of optimization which can be covered in about one semester. Alternatively, the material at the end of Chapter 6 can be combined with Chapters 3 and 4 for a unified introduction to Hilbert space problems. To help the reader make intelligent decisions regarding his order of progress through the book, sections of a specialized or digressive nature are indicated by an *.

The problems at the end of each chapter are of two basic varieties. The first consists of miscellaneous mathematical problems and proofs which extend and supplement the theoretical material in the text; the second consists of optimization problems which illustrate further areas of application and which hopefully will help the student formulate and solve practical problems. The problems represent a major component of the book, and the serious student will not pass over them lightly.

I have received help and encouragement from many people during the years of preparation of this book. Of great benefit were comments and suggestions of Pravin Varaiya, E. Bruce Lee, and particularly Samuel Karlin who read the entire manuscript and suggested several valuable improvements. I wish to acknowledge the Departments of Engineering-Economic Systems and Electrical Engineering at Stanford University for supplying much of the financial assistance. This effort was also partially supported by the Office of Naval Research and the National Science Foundation. Of particular benefit, of course, have been the faces of puzzled confusion or of elated understanding, the critical comments and the sincere suggestions of the many students who have worked through this material as the book evolved.

DAVID G. LUENBERGER

Palo Alto, California
August 1968


CONTENTS

1 INTRODUCTION

1.1 Motivation
1.2 Applications
1.3 The Main Principles
1.4 Organization of the Book

2 LINEAR SPACES

2.1 Introduction

VECTOR SPACES
2.2 Definition and Examples
2.3 Subspaces, Linear Combinations, and Linear Varieties
2.4 Convexity and Cones
2.5 Linear Independence and Dimension

NORMED LINEAR SPACES
2.6 Definition and Examples
2.7 Open and Closed Sets
2.8 Convergence
2.9 Transformations and Continuity
*2.10 The lp and Lp Spaces
2.11 Banach Spaces
2.12 Complete Subsets
*2.13 Extreme Values of Functionals, and Compactness
*2.14 Quotient Spaces
*2.15 Denseness and Separability
2.16 Problems
References

3 HILBERT SPACE

3.1 Introduction

PRE-HILBERT SPACES
3.2 Inner Products
3.3 The Projection Theorem
3.4 Orthogonal Complements
3.5 The Gram-Schmidt Procedure

APPROXIMATION
3.6 The Normal Equations and Gram Matrices
3.7 Fourier Series
*3.8 Complete Orthonormal Sequences
3.9 Approximation and Fourier Series

OTHER MINIMUM NORM PROBLEMS
3.10 The Dual Approximation Problem
*3.11 A Control Problem
3.12 Minimum Distance to a Convex Set
3.13 Problems
References

4 LEAST-SQUARES ESTIMATION

4.1 Introduction
4.2 Hilbert Space of Random Variables
4.3 The Least-Squares Estimate
4.4 Minimum-Variance Unbiased Estimate (Gauss-Markov Estimate)
4.5 Minimum-Variance Estimate
4.6 Additional Properties of Minimum-Variance Estimates
4.7 Recursive Estimation
4.8 Problems
References

5 DUAL SPACES

5.1 Introduction

LINEAR FUNCTIONALS
5.2 Basic Concepts
5.3 Duals of Some Common Banach Spaces

EXTENSION FORM OF THE HAHN-BANACH THEOREM
5.4 Extension of Linear Functionals
5.5 The Dual of C[a, b]
5.6 The Second Dual Space
5.7 Alignment and Orthogonal Complements
5.8 Minimum Norm Problems
5.9 Applications
*5.10 Weak Convergence

GEOMETRIC FORM OF THE HAHN-BANACH THEOREM
5.11 Hyperplanes and Linear Functionals
5.12 Hyperplanes and Convex Sets
*5.13 Duality in Minimum Norm Problems
5.14 Problems
References

6 LINEAR OPERATORS AND ADJOINTS

6.1 Introduction
6.2 Fundamentals

INVERSE OPERATORS
6.3 Linearity of Inverses
6.4 The Banach Inverse Theorem

ADJOINTS
6.5 Definition and Examples
6.6 Relations between Range and Nullspace
6.7 Duality Relations for Convex Cones
*6.8 Geometric Interpretation of Adjoints

OPTIMIZATION IN HILBERT SPACE
6.9 The Normal Equations
6.10 The Dual Problem
6.11 Pseudoinverse Operators
6.12 Problems
References

7 OPTIMIZATION OF FUNCTIONALS

7.1 Introduction

LOCAL THEORY
7.2 Gateaux and Frechet Differentials
7.3 Frechet Derivatives
7.4 Extrema
*7.5 Euler-Lagrange Equations
*7.6 Problems with Variable End Points
7.7 Problems with Constraints

GLOBAL THEORY
7.8 Convex and Concave Functionals
*7.9 Properties of the Set [f, C]
7.10 Conjugate Convex Functionals
7.11 Conjugate Concave Functionals
7.12 Dual Optimization Problems
*7.13 Min-Max Theorem of Game Theory
7.14 Problems
References

8 GLOBAL THEORY OF CONSTRAINED OPTIMIZATION

8.1 Introduction
8.2 Positive Cones and Convex Mappings
8.3 Lagrange Multipliers
8.4 Sufficiency
8.5 Sensitivity
8.6 Duality
8.7 Applications
8.8 Problems
References

9 LOCAL THEORY OF CONSTRAINED OPTIMIZATION

9.1 Introduction

LAGRANGE MULTIPLIER THEOREMS
9.2 Inverse Function Theorem
9.3 Equality Constraints
9.4 Inequality Constraints (Kuhn-Tucker Theorem)

OPTIMAL CONTROL THEORY
9.5 Basic Necessary Conditions
*9.6 Pontryagin Maximum Principle
9.7 Problems
References

10 ITERATIVE METHODS OF OPTIMIZATION

10.1 Introduction

METHODS FOR SOLVING EQUATIONS
10.2 Successive Approximation
10.3 Newton's Method

DESCENT METHODS
10.4 General Philosophy
10.5 Steepest Descent

CONJUGATE DIRECTION METHODS
10.6 Fourier Series
*10.7 Orthogonalization of Moments
10.8 The Conjugate Gradient Method

METHODS FOR SOLVING CONSTRAINED PROBLEMS
10.9 Projection Methods
10.10 The Primal-Dual Method
10.11 Penalty Functions
10.12 Problems
References

SYMBOL INDEX

SUBJECT INDEX


NOTATION

Sets

If x is a member of the set S, we write x ∈ S. The notation y ∉ S means y is not a member of S.

A set may be specified by listing its elements between braces such as S = {1, 2, 3} for the set consisting of the first three positive integers.

Alternatively, a set S may be specified as consisting of all elements of the set X which have the property P. This is written S = {x ∈ X : P(x)} or, if X is understood, S = {x : P(x)}.

The union of two sets S and T is denoted S ∪ T and consists of those elements that are in either S or T.

The intersection of two sets S and T is denoted S ∩ T and consists of those elements that are in both S and T. Two sets are disjoint if their intersection is empty.

If S is defined as a subset of elements of X, the complement of S, denoted S̃, consists of those elements of X that are not in S.

A set S is a subset of the set T if every element of S is also an element of T. In this case we write S ⊂ T or T ⊃ S. If S ⊂ T and S is not equal to T, then S is said to be a proper subset of T.

Sets of Real Numbers

If a and b are real numbers, [a, b] denotes the set of real numbers x satisfying a ≤ x ≤ b. A rounded instead of square bracket denotes strict inequality in the definition. Thus (a, b] denotes all x with a < x ≤ b.

If S is a set of real numbers bounded above, then there is a smallest real number y such that x ≤ y for all x ∈ S. The number y is called the least upper bound or supremum of S and is denoted sup (x) for x ∈ S, or sup {x : x ∈ S}. If S is not bounded above we write sup (x) = ∞. Similarly, the greatest lower bound or infimum of a set S is denoted inf (x) for x ∈ S, or inf {x : x ∈ S}.


Functions

The function sgn (pronounced signum) of a real variable is defined by

sgn (x) = 1 if x > 0, 0 if x = 0, −1 if x < 0.

The Kronecker delta function δij is defined by

δij = 1 if i = j, 0 if i ≠ j.

The Dirac delta function δ is used occasionally in heuristic discussions. It is defined by the relation

∫ₐᵇ f(t) δ(t) dt = f(0)

for every continuous function f provided that 0 ∈ (a, b).

If g is a real-valued function of a real variable we write S = lim sup g(x) as x → x₀ if: (1) for every ε > 0 there is δ > 0 such that for all x satisfying |x − x₀| < δ, g(x) < S + ε, and (2) for every ε > 0 and δ > 0 there is an x such that |x − x₀| < δ and g(x) > S − ε. (See the corresponding definitions for sequences.)

If g is a real-valued function of a real variable, the notation g(x) = O(x) means that

K = lim sup g(x)/x as x → 0

is finite. The notation g(x) = o(x) means that K, above, is zero.

Sequences

A sequence x1, x2, …, xn, … is denoted by {xi}, i = 1, 2, …, or simply {xi} if the range of the indices is clear.

Let {xn} be an infinite sequence of real numbers and suppose that there is a real number S satisfying: (1) for every ε > 0 there is an N such that for all n > N, xn < S + ε, and (2) for every ε > 0 and M > 0 there is an n > M such that xn > S − ε. Then S is called the limit superior of {xn} and we write S = lim sup xn. If {xn} is not bounded above we write lim sup xn = +∞.

The limit inferior of xn is lim inf xn = −lim sup (−xn). If lim sup xn = lim inf xn = S, we write lim xn = S.
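These definitions can be illustrated numerically. For instance, the sequence xn = (−1)ⁿ + 1/n has no limit, but its limit superior is 1 and its limit inferior is −1; a minimal sketch (the particular sequence is chosen only for illustration):

```python
# Illustration: x_n = (-1)^n + 1/n has no limit, but its far-out terms
# cluster near +1 (even n) and -1 (odd n), so lim sup = 1, lim inf = -1.
xs = [(-1) ** n + 1 / n for n in range(1, 2001)]
tail = xs[1000:]            # the terms x_1001, ..., x_2000
limsup_est = max(tail)      # close to 1, attained at an even n near 1000
liminf_est = min(tail)      # close to -1, attained at a large odd n
print(limsup_est, liminf_est)
```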


Matrices and Vectors

A vector x with n components is written x = (x1, x2, …, xn), but when used in matrix calculations it is represented as a column vector; the corresponding row vector is written x′.

An m × n matrix A with entry aij in its i-th row and j-th column is written A = [aij]. If x = (x1, x2, …, xn), the product Ax is the vector y with components yi = Σⱼ aij xj (sum over j = 1, …, n), i = 1, 2, …, m.

Let f(x1, x2, …, xn) be a function of the n real variables xi. Then we write fx for the row vector of partial derivatives [∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn].

If F = (f1, f2, …, fm) is a vector function of x = (x1, …, xn), we write Fx for the m × n Jacobian matrix [∂fi/∂xj].


1 INTRODUCTION

1.1 Motivation

During the past twenty years mathematics and engineering have been increasingly directed towards problems of decision making in physical or organizational systems. This trend has been inspired primarily by the significant economic benefits which often result from a proper decision concerning the distribution of expensive resources, and by the repeated demonstration that such problems can be realistically formulated and mathematically analyzed to obtain good decisions.

The arrival of high-speed digital computers has also played a major role in the development of the science of decision making. Computers have inspired the development of larger systems and the coupling of previously separate systems, thereby resulting in decision and control problems of correspondingly increased complexity. At the same time, however, computers have revolutionized applied mathematics and solved many of the complex problems they generated.

It is perhaps natural that the concept of best or optimal decisions should emerge as the fundamental approach for formulating decision problems. In this approach a single real quantity, summarizing the performance or value of a decision, is isolated and optimized (i.e., either maximized or minimized depending on the situation) by proper selection among available alternatives. The resulting optimal decision is taken as the solution to the decision problem. This approach to decision problems has the virtues of simplicity, preciseness, elegance, and, in many cases, mathematical tractability. It also has obvious limitations due to the necessity of selecting a single objective by which to measure results. But optimization has proved its utility as a mode of analysis and is firmly entrenched in the field of decision making.

Much of the classical theory of optimization, motivated primarily by problems of physics, is associated with great mathematicians: Gauss, Lagrange, Euler, the Bernoullis, etc. During the recent development of optimization in decision problems, the classical techniques have been reexamined, extended, sometimes rediscovered, and applied to problems


having quite different origins than those responsible for their earlier development. New insights have been obtained and new techniques have been discovered. The computer has rendered many techniques obsolete while making other previously impractical methods feasible and efficient. These recent developments in optimization have been made by mathematicians, system engineers, economists, operations researchers, statisticians, numerical analysts, and others in a host of different fields.

The study of optimization as an independent topic must, of course, be regarded as a branch of applied mathematics. As such it must look to various areas of pure mathematics for its unification, clarification, and general foundation. One such area of particular relevance is functional analysis.

Functional analysis is the study of vector spaces resulting from a merging of geometry, linear algebra, and analysis. It serves as a basis for aspects of several important branches of applied mathematics including Fourier series, integral and differential equations, numerical analysis, and any field where linearity plays a key role. Its appeal as a unifying discipline stems primarily from its geometric character. Most of the principal results in functional analysis are expressed as abstractions of intuitive geometric properties of ordinary three-dimensional space.

Some readers may look with great expectation toward functional analysis, hoping to discover new powerful techniques that will enable them to solve important problems beyond the reach of simpler mathematical analysis. Such hopes are rarely realized in practice. The primary utility of functional analysis for the purposes of this book is its role as a unifying discipline, gathering a number of apparently diverse, specialized mathematical tricks into one or a few general geometric principles.

1.2 Applications

The main purpose of this section is to illustrate the variety of problems that can be formulated as optimization problems in vector space by introducing some specific examples that are treated in later chapters. As a vehicle for this purpose, we classify optimization problems according to the role of the decision maker. We list the classification, briefly describe its meaning, and illustrate it with one problem that can be formulated in vector space and treated by the methods described later in the book. The classification is not intended to be necessarily complete nor, for that matter, particularly significant. It is merely representative of the classifications often employed when discussing optimization.

Although the formal definition of a vector space is not given until Chapter 2, we point out, in the examples that follow, how each problem


can be regarded as formulated in some appropriate vector space. However, the details of the formulation must, in many cases, be deferred until later chapters.

1. Allocation. In allocation problems there is typically a collection of resources to be distributed in some optimal fashion. Almost any optimization problem can be placed in this broad category, but usually the term is reserved for problems in which the resources are distributed over space or among various activities.

A typical problem of this type is that faced by a manufacturer with an inventory of raw materials. He has certain processing equipment capable of producing n different kinds of goods from the raw materials. His problem is to allocate the raw materials among the possible products so as to maximize his profit.

In an idealized version of the problem, we assume that the production and profit model is linear. Assume that the selling price per unit of product j is pj, j = 1, 2, …, n. If xj denotes the amount of product j that is to be produced, bi the amount of raw material i on hand, and aij the amount of material i in one unit of product j, the manufacturer seeks to maximize his profit

p1x1 + p2x2 + ⋯ + pnxn

subject to the production constraints on the amount of raw materials

a11x1 + a12x2 + ⋯ + a1nxn ≤ b1
a21x1 + a22x2 + ⋯ + a2nxn ≤ b2
  ⋮
am1x1 + am2x2 + ⋯ + amnxn ≤ bm

and the nonnegativity constraints

x1 ≥ 0, x2 ≥ 0, …, xn ≥ 0.

This type of problem, characterized by a linear objective function subject to linear inequality constraints, is a linear programming problem and is used to illustrate aspects of the general theory of optimization in later chapters.

We note that the problem can be regarded as formulated in ordinary n-dimensional vector space. The vector x with components xi is the unknown. The constraints define a region in the vector space in which the selected vector must lie. The optimal vector is the one in that region maximizing the objective.
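The formulation above can be sketched numerically. The following is a minimal illustration with hypothetical data (two products, two raw materials; the prices, usage coefficients, and stocks are invented for the example), using a brute-force search over integer production levels rather than a genuine linear programming algorithm such as the simplex method:

```python
from itertools import product

# Hypothetical data (not from the text): 2 products, 2 raw materials.
p = [3, 5]                # unit profits p_j
A = [[1, 2], [3, 1]]      # a_ij: units of material i per unit of product j
b = [10, 15]              # b_i: units of raw material i on hand

def feasible(x):
    """Production constraints: sum_j a_ij x_j <= b_i and x_j >= 0."""
    return all(xj >= 0 for xj in x) and all(
        sum(aij * xj for aij, xj in zip(row, x)) <= bi
        for row, bi in zip(A, b))

def profit(x):
    return sum(pj * xj for pj, xj in zip(p, x))

# Brute-force search over integer production levels; a real solver
# would use the simplex method, but this suffices to show the setup.
best = max((x for x in product(range(11), repeat=2) if feasible(x)),
           key=profit)
print(best, profit(best))  # → (4, 3) 27
```

For this tiny instance the optimum lies at a vertex of the constraint region, consistent with the geometric picture developed in later chapters.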

The manufacturing problem can be generalized to allow for nonlinear objectives and more general constraints. Linearity is destroyed, which may make the solution more difficult to obtain, but the problem can still be regarded as one in ordinary Euclidean n-dimensional space.

2. Planning. Planning is the problem of determining an optimal procedure for attaining a set of objectives. In common usage, planning refers especially to those problems involving outlays of capital over a period of time such as (1) planning a future investment in electric power generation equipment for a given geographic region or (2) determining the best hiring policy in order to complete a complex project at minimum expense.

As an example, consider a problem of production planning. A firm producing a certain product wishes to plan its production schedule over a period of time in an optimal fashion. It is assumed that a fixed demand function over the time interval is known and that this demand must be met. Excess inventory must be stored at a storage cost proportional to the amount stored. There is a production cost associated with a given rate of production. Thus, denoting x(t) as the stock held at time t, r(t) as the rate of production at time t, and d(t) as the demand at time t, the production system can be described by the equations¹

ẋ(t) = r(t) − d(t),   x(0) given,

and one seeks the function r satisfying the inequality constraints

r(t) ≥ 0,
x(0) + ∫₀ᵗ [r(τ) − d(τ)] dτ = x(t) ≥ 0,   for 0 ≤ t ≤ T,

and minimizing the cost

J = ∫₀ᵀ {c[r(t)] + h·x(t)} dt,

where c[r] is the production cost rate for the production level r and h·x is the inventory cost rate for inventory level x.

This problem can be regarded as defined on a vector space consisting of continuous functions on the interval [0, T] of the real line. The optimal production schedule r is then an element of the space. Again the constraints define a region in the space in which the solution r must lie while minimizing the cost.
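A discretized sketch can make the trade-off concrete. The following assumes, purely for illustration, a linear production cost c[r] = c0·r (the problem above allows a general cost) and an invented demand profile, and compares a just-in-time schedule against a constant-rate schedule:

```python
# Assumptions for illustration only: linear production cost c[r] = c0*r,
# unit time step dt, and an invented demand profile d.

def total_cost(r_sched, d, x0, c0, h, dt):
    """Cost of a schedule: production cost plus inventory holding cost."""
    x, cost = x0, 0.0
    for r_k, d_k in zip(r_sched, d):
        x += (r_k - d_k) * dt           # stock dynamics x' = r - d
        assert r_k >= 0 and x >= -1e-9  # feasibility: r >= 0, x >= 0
        cost += (c0 * r_k + h * x) * dt
    return cost

dt, c0, h, x0 = 1.0, 2.0, 0.5, 3.0
d = [1.0, 2.0, 1.0, 3.0, 1.0]

# Just-in-time policy: drain the initial stock, then track demand exactly.
jit, x = [], x0
for d_k in d:
    r_k = max(0.0, d_k - x / dt)
    x += (r_k - d_k) * dt
    jit.append(r_k)

# Constant-rate policy producing total demand net of the initial stock.
const = [(sum(d) * dt - x0) / (dt * len(d))] * len(d)

print(total_cost(jit, d, x0, c0, h, dt),
      total_cost(const, d, x0, c0, h, dt))  # → 11.0 13.5
```

With a linear production cost, holding inventory only adds expense, so the just-in-time schedule is cheaper here; a strictly convex cost c[r] would instead favor smoothing production, which is the interesting case for the theory.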

3. Control (or Guidance). Problems of control are associated with dynamic systems evolving in time. Control is quite similar to planning;

¹ ẋ(t) ≜ dx(t)/dt.


indeed, as we shall see, it is often the source of a problem rather than its mathematical structure which determines its category.

Control or guidance usually refers to directed influence on a dynamic system to achieve desired performance. The system itself may be physical in nature, such as a rocket heading for Mars or a chemical plant processing acid, or it may be operational such as a warehouse receiving and filling orders.

Often we seek feedback or so-called closed-loop control in which decisions of current control action are made continuously in time based on recent observations of system behavior. Thus, one may imagine himself as a controller sitting at the control panel watching meters and turning knobs or in a warehouse ordering new stock based on inventory and predicted demand. This is in contrast to the approach described for planning in which the whole series of control actions is predetermined. Generally, however, the terms planning or control may refer to either possibility.

As an example of a control problem, we consider the launch of a rocket to a fixed altitude h in given time T while expending a minimum of fuel. For simplicity, we assume unit mass, a constant gravitational force, and the absence of aerodynamic forces. The motion of a rocket being propelled vertically is governed by the equation

ÿ(t) = u(t) − g,

where y is the vertical height, u is the accelerating force, and g is the gravitational force. The optimal control function u is the one which forces y(T) = h while minimizing the fuel expenditure ∫₀ᵀ |u(t)| dt.

This problem too might be formulated in a vector space consisting of functions u defined on the interval [0, T]. The solution to this problem, however, is that u(t) consists of an impulse at t = 0 and, therefore, correct problem formulation and selection of an appropriate vector space are themselves interesting aspects of this example. Problems of this type, including this specific example, are discussed in Chapter 5.

4. Approximation. Approximation problems are motivated by the desire to approximate a general mathematical entity (such as a function) by one of simpler, specified form. A large class of such approximation problems is important in numerical analysis. For example, suppose we wish, because of storage limitations or for purposes of simplifying an analysis, to approximate a function, say x(t), over an interval [a, b] of the real line by a polynomial p(t) of order n. The best approximating polynomial p minimizes the error e = x − p in the sense of some criterion. The choice of criterion determines the approximation. Often used criteria are:


1. ∫ₐᵇ e²(t) dt

2. max |e(t)| over a ≤ t ≤ b

3. ∫ₐᵇ |e(t)| dt.

The problem is quite naturally viewed as formulated in a vector space of functions over the interval [a, b]. The problem is then viewed as finding a vector from a given class (polynomials) which is closest to a given vector.
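For criterion 1, the best approximating polynomial can be found by solving the normal equations treated in Chapter 3. As a small illustration (the particular function and interval are chosen for the example, not taken from the text), the best first-degree approximation to x(t) = t² on [0, 1] can be computed exactly:

```python
from fractions import Fraction as F

# Best approximation (criterion 1) of x(t) = t^2 on [0, 1] by
# p(t) = a0 + a1*t, via the normal equations with Gram matrix
# G[i][j] = integral of t^(i+j) over [0,1] = 1/(i+j+1) and
# right-hand side rhs[i] = integral of t^2 * t^i = 1/(i+3).
G = [[F(1, 1), F(1, 2)],
     [F(1, 2), F(1, 3)]]
rhs = [F(1, 3), F(1, 4)]

# Solve the 2x2 system G a = rhs by Cramer's rule.
det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
a0 = (rhs[0] * G[1][1] - G[0][1] * rhs[1]) / det
a1 = (G[0][0] * rhs[1] - rhs[0] * G[1][0]) / det
print(a0, a1)  # → -1/6 1 : the best linear fit is p(t) = t - 1/6
```

The result p(t) = t − 1/6 is the orthogonal projection of t² onto the subspace spanned by 1 and t, foreshadowing the projection theorem of Chapter 3.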

5. Estimation. Estimation problems are really a special class of approximation problems. We seek to estimate some quantity from imperfect observations of it or from observations which are statistically correlated but not deterministically related to it. Loosely speaking, the problem amounts to approximating the unobservable quantity by a combination of the observable ones. For example, the position of a randomly maneuvering airplane at some future time might reasonably be estimated by a linear combination of past measurements of its position.

Another example of estimation arises in connection with triangulation problems such as in location of forest fires, ships at sea, or remote stars. Suppose there are three lookout stations, each of which measures the angle of the line-of-sight from the station to the observed object. The situation is illustrated in Figure 1.1. Given these three angles, what is the best estimate of the object's location?

Figure 1.1 A triangulation problem

To formulate the problem completely, a criterion must be precisely prescribed and hypotheses specified regarding the nature of probable

Page 25: Luenberger D.- Optimization bey, 1969)(171d)047155359Xbioen.utah.edu/wiki/images/4/44/OptimizationIntroduction.pdf · Gateaux and Frechet Differentials Frechet Derivatives 175 Extrema


measurement errors and probable location of the object. Approaches can be taken that result in a problem formulated in vector space; such problems are discussed in Chapter 4.

6. Games. Many problems involving a competitive element can be regarded as games. In the usual formulation, involving two players or protagonists, there is an objective function whose value depends jointly on the actions employed by both players. One player attempts to maximize this objective while the other attempts to minimize it.

Often two problems from the categories discussed above can be competitively intermixed to produce a game. Combinations of categories that lead to interesting games include: allocation-allocation, allocation-control, control-control, and estimation-control.

As an example, consider a control-control game. Most problems of this type are of the pursuer-evader type, such as a fighter plane chasing a bomber. Each player has a system he controls, but one is trying to maximize the objective (time to intercept, for instance) while the other is trying to minimize the objective.

As a simpler example, we consider a problem of advertising or campaigning which is essentially an allocation-allocation game.² Two opposing candidates, A and B, are running for office and must plan how to allocate their advertising resources (A and B dollars, respectively) among n distinct geographical areas. Let xᵢ and yᵢ denote, respectively, the resources committed to area i by candidates A and B. We assume that there are currently a total of u undecided votes, of which there are uᵢ undecided votes in area i. The numbers of votes going to candidates A and B from area i are assumed to be

[xᵢ/(xᵢ + yᵢ)]·uᵢ  and  [yᵢ/(xᵢ + yᵢ)]·uᵢ,

respectively. The total difference between the number of votes received by A and by B is then

Σ_{i=1}^{n} [(xᵢ − yᵢ)/(xᵢ + yᵢ)]·uᵢ.

Candidate A seeks to maximize this quantity while B seeks to minimize it.

This problem is obviously finite dimensional and can be solved by ordinary calculus in a few lines. It is illustrative, however, of an interesting class of game problems.

2. This problem is due to L. Friedman [57].
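A numerical sketch of this game follows. The budgets and vote counts are invented, and the proportional-allocation solution checked here is our own calculus exercise, not a formula taken from the text:

```python
# Campaigning game sketch (hypothetical numbers throughout).
# Claim checked numerically: allocating each budget in proportion to the
# undecided votes u_i is an equilibrium. This is our own derivation by
# ordinary calculus, not a statement quoted from the book.

def payoff(x, y, u):
    """Total vote difference: sum_i (x_i - y_i)/(x_i + y_i) * u_i."""
    return sum((xi - yi) / (xi + yi) * ui for xi, yi, ui in zip(x, y, u))

u = [100.0, 300.0, 600.0]               # undecided votes per area (made up)
A, B = 50.0, 80.0                        # advertising budgets (made up)
total = sum(u)

x_star = [A * ui / total for ui in u]    # proportional allocation for A
y_star = [B * ui / total for ui in u]    # proportional allocation for B
base = payoff(x_star, y_star, u)

# Shifting a dollar of A's budget between areas never helps A (A maximizes) ...
x_pert = [x_star[0] + 1.0, x_star[1] - 1.0, x_star[2]]
assert payoff(x_pert, y_star, u) <= base + 1e-9

# ... and shifting a dollar of B's budget never helps B (B minimizes).
y_pert = [y_star[0] - 1.0, y_star[1] + 1.0, y_star[2]]
assert payoff(x_star, y_pert, u) >= base - 1e-9
```

At the proportional allocations every area's ratio (xᵢ − yᵢ)/(xᵢ + yᵢ) equals (A − B)/(A + B), so the payoff collapses to (A − B)/(A + B)·u, which is why neither player can gain by reallocating.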





1.3 The Main Principles

The theory of optimization presented in this book is derived from a few simple, intuitive, geometric relations. The extension of these relations to infinite-dimensional spaces is the motivation for the mathematics of functional analysis, which, in a sense, often enables us to extend our three-dimensional geometric insights to complex infinite-dimensional problems. This is the conceptual utility of functional analysis. On the other hand, these simple geometric relations have great practical utility as well because a vast assortment of problems can be analyzed from this point of view.

In this section, we briefly describe a few of the important geometric principles of optimization that are developed in detail in later chapters.

1. The Projection Theorem. This theorem is one of the simplest and nicest results of optimization theory. In ordinary three-dimensional Euclidean space, it states that the shortest line from a point to a plane is furnished by the perpendicular from the point to the plane, as illustrated in Figure 1.2.

Figure 1.2 The projection theorem

This simple and seemingly innocuous result has direct extensions in spaces of higher dimension and in infinite-dimensional Hilbert space. In the generalized form, this optimization principle forms the basis of all least-squares approximation, control, and estimation procedures.
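In matrix terms, a finite-dimensional sketch (the subspace and vector below are arbitrary choices of ours): the closest point to b in the subspace spanned by the columns of A is the orthogonal projection, and the optimal error is perpendicular to the subspace.

```python
import numpy as np

# Finite-dimensional projection theorem sketch: project b onto the
# column space of A and verify that the residual b - p is orthogonal
# to every column of A. A and b are arbitrary illustrations.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 2))          # a 2-dimensional subspace of R^5
b = rng.standard_normal(5)

# Least-squares coefficients solve the normal equations A^T A c = A^T b.
c, *_ = np.linalg.lstsq(A, b, rcond=None)
p = A @ c                                # projection of b onto range(A)

residual = b - p                         # the shortest error vector
# Perpendicularity: A^T (b - p) = 0, i.e. residual orthogonal to the subspace.
orthogonal = np.allclose(A.T @ residual, 0.0)
```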

2. The Hahn-Banach Theorem. Of the many results and concepts in functional analysis, the one theorem dominating the theme of this book and embodying the essence of the simple geometric ideas upon which the theory is built is the Hahn-Banach theorem. The theorem takes several forms. One version extends the projection theorem to problems having nonquadratic objectives. In this manner the simple geometric interpretation is preserved for these more complex problems. Another version of the



Hahn-Banach theorem states (in simplest form) that given a sphere and a point not in the sphere, there is a hyperplane separating the point and the sphere. This version of the theorem, together with the associated notions of hyperplanes, is the basis for most of the theory beyond Chapter 5.

3. Duality. There are several duality principles in optimization theory that relate a problem expressed in terms of vectors in a space to a problem expressed in terms of hyperplanes in the space. This concept of duality is a recurring theme in this book.

Many of these duality principles are based on the geometric relation illustrated in Figure 1.3. The shortest distance from a point to a convex set is equal to the maximum of the distances from the point to the hyperplanes separating the point from the convex set. Thus, the original minimization over vectors can be converted to a maximization over hyperplanes.

Figure 1.3 Duality
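In symbols (a paraphrase in notation introduced only in later chapters; the precise hypotheses on x, K, and the functionals are deferred to Chapter 5), the relation of Figure 1.3 can be written

```latex
\inf_{k \in K} \|x - k\|
  \;=\;
\max_{\|x^*\| \le 1}
  \Bigl[ \langle x, x^* \rangle - \sup_{k \in K} \langle k, x^* \rangle \Bigr],
```

where x* ranges over linear functionals of norm at most one; the bracketed quantity is the distance from x to a hyperplane separating x from K.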

4. Differentials. Perhaps the most familiar optimization technique is the method of differential calculus: setting the derivative of the objective function equal to zero. The technique is discussed for a single or, perhaps, a finite number of variables in the most elementary courses on differential calculus. Its extension to infinite-dimensional spaces is straightforward and, in that form, it can be applied to a variety of interesting optimization problems. Much of the classical theory of the calculus of variations can be viewed as a consequence of this principle.

The geometric interpretation of the technique for one-dimensional problems is obvious. At a maximum or minimum the tangent to the graph of a function is horizontal. In higher dimensions the geometric interpretation is similar: at a maximum or minimum the tangent hyperplane to the


graph is horizontal. Thus, again we are led to observe the fundamental role of hyperplanes in optimization.

1.4 Organization of the Book

Before our discussion of optimization can begin in earnest, certain fundamental concepts and results of linear vector space theory must be introduced. Chapter 2 is devoted to that task. The chapter consists of material that is standard, elementary functional analysis background and is essential for further pursuit of our objectives. Anyone having some familiarity with linear algebra and analysis should have little difficulty with this chapter.

Chapters 3 and 4 are devoted to the projection theorem in Hilbert space and its applications. Chapter 3 develops the general theory, illustrating it with some applications from Fourier approximation and optimal control theory. Chapter 4 deals solely with the applications of the projection theorem to estimation problems, including the recursive estimation and prediction of time series as developed by Kalman.

Chapter 5 is devoted to the Hahn-Banach theorem. It is in this chapter that we meet with full force the essential ingredients of the general theory of optimization: hyperplanes, duality, and convexity.

Chapter 6 discusses linear transformations on a vector space and is the last chapter devoted to the elements of linear functional analysis. The concept of duality is pursued in this chapter through the introduction of adjoint transformations and their relation to minimum norm problems. The pseudoinverse of an operator in Hilbert space is discussed.

Chapters 7, 8, and 9 consider general optimization problems in linear spaces. Two basic approaches, the local theory leading to differential conditions and the global theory relying on convexity, are isolated and discussed in a parallel fashion. The techniques in these chapters are a direct outgrowth of the principles of earlier chapters, and geometric visualization is stressed wherever possible. In the course of the development, we treat problems from the calculus of variations, the Fenchel conjugate function theory, Lagrange multipliers, the Kuhn-Tucker theorem, and Pontryagin's maximum principle for optimal control problems.

Finally, Chapter 10 contains an introduction to iterative techniques for the solution of optimization problems. Some techniques in this chapter are quite different from those in previous chapters, but many are based on extensions of the same logic and geometric considerations found to be so fruitful throughout the book. The methods discussed include successive approximation, Newton's method, steepest descent, conjugate gradients, the primal-dual method, and penalty functions.


2 LINEAR SPACES

2.1 Introduction

The first few sections of this chapter introduce the concept of a vector space and explore the elementary properties resulting from the basic definition. The notions of subspace, linear independence, convexity, and dimension are developed and illustrated by examples. The material is largely review for most readers since it duplicates the first part of standard courses in linear algebra.

The second part of the chapter discusses the basic properties of normed linear spaces. A normed linear space is a vector space having a measure of distance or length defined on it. With the introduction of a norm, it becomes possible to define analytical or topological properties such as convergence and open and closed sets. Therefore, that portion of the chapter introduces and explores these basic concepts which distinguish functional analysis from linear algebra.

VECTOR SPACES

2.2 Definition and Examples

Associated with every vector space is a set of scalars used to define scalar multiplication on the space. In the most abstract setting these scalars are required only to be elements of an algebraic field. However, in this book the scalars are always taken to be either the set of real numbers or the set of complex numbers. We sometimes distinguish between these possibilities by referring to a vector space as either a real or a complex vector space. In this book, however, the primary emphasis is on real vector spaces and, although occasional reference is made to complex spaces, many results are derived only for real spaces. In case of ambiguity, the reader should assume the space to be real.

Definition. A vector space X is a set of elements called vectors together with two operations. The first operation is addition, which associates with any two vectors x, y ∈ X a vector x + y ∈ X, the sum of x and y. The second operation is scalar multiplication, which associates with any vector x ∈ X and any scalar α a vector αx, the scalar multiple of x by α. The set X and the operations of addition and scalar multiplication are assumed to satisfy the following axioms:

1. x + y = y + x. (commutative law)
2. (x + y) + z = x + (y + z). (associative law)
3. There is a null vector θ in X such that x + θ = x for all x in X.
4. α(x + y) = αx + αy. (distributive laws)
5. (α + β)x = αx + βx.
6. (αβ)x = α(βx). (associative law)
7. 0x = θ, 1x = x.

For convenience the vector −1·x is denoted −x and called the negative of the vector x. We have x + (−x) = (1 − 1)x = 0x = θ.

There are several elementary but important properties of vector spaces that follow directly from the axioms listed in the definition. For example, the following properties are easily deduced. The details are left to the reader.

Proposition 1. In any vector space:

1. x + y = x + z implies y = z. (cancellation laws)
2. αx = αy and α ≠ 0 imply x = y.
3. αx = βx and x ≠ θ imply α = β.
4. (α − β)x = αx − βx. (distributive laws)
5. α(x − y) = αx − αy.
6. αθ = θ.

Some additional properties are given as exercises at the end of the chapter.

Example 1. Perhaps the simplest example of a vector space is the set of real numbers. It is a real vector space with addition defined in the usual way and multiplication by (real) scalars defined as ordinary multiplication. The null vector is the real number zero. The properties of ordinary addition and multiplication of real numbers satisfy the axioms in the definition of a vector space. This vector space is called the one-dimensional real coordinate space or simply the real line. It is denoted by R¹ or simply R.

Example 2. An obvious extension of Example 1 is to n-dimensional real coordinate space. Vectors in the space consist of sequences (n-tuples) of n real numbers, so that a typical vector has the form x = (ξ₁, ξ₂, …, ξₙ).



The real number ξₖ is referred to as the k-th component of the vector. Two vectors are equal if their corresponding components are equal. The null vector is defined as θ = (0, 0, …, 0). If x = (ξ₁, ξ₂, …, ξₙ) and y = (η₁, η₂, …, ηₙ), the vector x + y is defined as the n-tuple whose k-th component is ξₖ + ηₖ. The vector αx, where α is a (real) scalar, is the n-tuple whose k-th component is αξₖ. The axioms in the definition are verified by checking for equality among components. For example, if x = (ξ₁, ξ₂, …, ξₙ), the relation ξₖ + 0 = ξₖ implies x + θ = x.

This space, n-dimensional real coordinate space, is denoted by Rⁿ. The corresponding complex space consisting of n-tuples of complex numbers is denoted by Cⁿ.
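The componentwise definitions of Example 2 translate directly into code. The following sketch (our own illustration, using bare tuples to keep the definitions visible) checks a few of the axioms for R³:

```python
# Componentwise vector operations in R^n, as in Example 2.

def add(x, y):
    """x + y, defined component by component."""
    return tuple(xk + yk for xk, yk in zip(x, y))

def scale(alpha, x):
    """alpha * x, defined component by component."""
    return tuple(alpha * xk for xk in x)

theta = (0.0, 0.0, 0.0)          # the null vector in R^3
x = (1.0, -2.0, 3.0)
y = (4.0, 0.5, -1.0)

assert add(x, theta) == x                                # axiom 3: x + theta = x
assert add(x, y) == add(y, x)                            # commutative law
assert scale(2.0, add(x, y)) == add(scale(2.0, x), scale(2.0, y))  # distributive law
```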

At this point we are, strictly speaking, somewhat prematurely introducing the term dimensionality. Later in this chapter the notion of dimension is defined, and it is proved that these spaces are in fact n-dimensional.

Example 3. Several interesting vector spaces can be constructed with vectors consisting of infinite sequences of real numbers, so that a typical vector has the form x = (ξ₁, ξ₂, …, ξₖ, …) or, equivalently, x = {ξₖ} with k = 1, 2, ….

Again addition and multiplication are defined componentwise as in Example 2. The collection of all infinite sequences of real numbers forms a vector space. A sequence {ξₖ} is said to be bounded if there is a constant M such that |ξₖ| < M for all k. The collection of all bounded infinite sequences forms a vector space since the sum of two bounded sequences or the scalar multiple of a bounded sequence is again bounded. This space is referred to as the space of bounded real sequences.

Example 4. The collection of all sequences of real numbers having only a finite number of terms not equal to zero is a vector space. (Different members of the space may have different numbers of nonzero components.) This space is called the space of finitely nonzero sequences.

Example 5. The collection of infinite sequences of real numbers which converge to zero is a vector space since the sum of two sequences converging to zero or the scalar multiple of a sequence converging to zero also converges to zero.

Example 6. Consider the interval [a, b] on the real line. The collection of all real-valued continuous functions on this interval forms a vector space. Write x = y if x(t) = y(t) for all t ∈ [a, b]. The null vector θ is the function identically zero on [a, b]. If x and y are vectors in the space and α is a (real) scalar, write (x + y)(t) = x(t) + y(t) and (αx)(t) = αx(t). These are obviously continuous functions. This space is referred to as the vector space of real-valued continuous functions on [a, b].



Example 7. The collection of all polynomial functions defined on the interval [a, b] with complex coefficients forms a complex vector space. The null vector, addition, and scalar multiplication are defined as in Example 6. The sum of two polynomials and the scalar multiple of a polynomial are themselves polynomials.

We now consider how a set of vector spaces can be combined to produce a larger one.

Definition. Let X and Y be vector spaces over the same field of scalars. Then the Cartesian product of X and Y, denoted X × Y, consists of the collection of ordered pairs (x, y) with x ∈ X, y ∈ Y. Addition and scalar multiplication are defined on X × Y by (x₁, y₁) + (x₂, y₂) = (x₁ + x₂, y₁ + y₂) and α(x, y) = (αx, αy).

That the above definition is consistent with the axioms of a vector space is obvious. The definition is easily generalized to the product of n vector spaces X₁, X₂, …, Xₙ. We write Xⁿ for the product of a vector space with itself n times.

2.3 Subspaces, Linear Combinations, and Linear Varieties

Definition. A nonempty subset M of a vector space X is called a subspace of X if every vector of the form αx + βy is in M whenever x and y are both in M.

Since a subspace is assumed to be nonempty, it must contain at least one element x. By definition, it must then also contain 0x = θ, so every subspace contains the null vector. The simplest subspace is the set consisting of θ alone. In three-dimensional space, a plane that passes through the origin is a subspace, as is a line through the origin.

The entire space X is itself a subspace of X. A subspace not equal to the entire space is said to be a proper subspace.

Any subspace contains sums and scalar multiples of its elements; satisfaction of the seven axioms in the entire space implies that they are satisfied within the subspace. Therefore a subspace is itself a vector space, and this observation justifies the terminology.

If X is the space of n-tuples, the set of n-tuples having the first component equal to zero is a subspace of X. The space of convergent infinite sequences is a subspace of the vector space of bounded sequences. The space of continuous functions on [0, 1] which are zero at the point one-half is a subspace of the vector space of continuous functions.

In three-dimensional space, the intersection of two distinct planes


containing the origin is a line containing the origin. This is a special case of the following result.

Proposition 1. Let M and N be subspaces of a vector space X. Then the intersection, M ∩ N, of M and N is a subspace of X.

Proof. M ∩ N contains θ since θ is contained in each of the subspaces M and N. Therefore, M ∩ N is nonempty. If x and y are in M ∩ N, they are in both M and N. For any scalars α, β the vector αx + βy is contained in both M and N since M and N are subspaces. Therefore, αx + βy is contained in the intersection M ∩ N. ∎

The union of two subspaces in a vector space is not necessarily a subspace. In the plane, for example, the union of two (noncolinear) lines through the origin does not contain arbitrary sums of its elements. However, two subspaces may be combined to form a larger subspace by introducing the notion of the sum of two sets.

Definition. The sum of two subsets S and T in a vector space, denoted S + T, consists of all vectors of the form s + t where s ∈ S and t ∈ T.

The sum of two sets is illustrated in Figure 2.1.


Figure 2.1 The sum of two sets

Proposition 2. Let M and N be subspaces of a vector space X. Then their sum, M + N, is a subspace of X.

Proof. Clearly M + N contains θ. Suppose x and y are vectors in M + N. There are vectors m₁, m₂ in M and vectors n₁, n₂ in N such that




x = m₁ + n₁, y = m₂ + n₂. For any scalars α, β: αx + βy = (αm₁ + βm₂) + (αn₁ + βn₂). Therefore, αx + βy can be expressed as the sum of a vector in the subspace M and a vector in the subspace N. ∎

In two-dimensional Euclidean space the sum of two noncolinear lines through the origin is the entire space. The set of even continuous functions and the set of odd continuous functions are subspaces of the space X of continuous functions on the real line. Their sum is the whole space X since any continuous function can be expressed as the sum of an even and an odd continuous function.
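The even/odd decomposition can be written down explicitly: f_e(t) = [f(t) + f(−t)]/2 and f_o(t) = [f(t) − f(−t)]/2. A small sketch (the choice of f = exp is our own arbitrary example):

```python
import math

# Decompose a continuous function f into its even and odd parts:
#   f_e(t) = (f(t) + f(-t)) / 2    (even part)
#   f_o(t) = (f(t) - f(-t)) / 2    (odd part)

def even_part(f):
    return lambda t: (f(t) + f(-t)) / 2.0

def odd_part(f):
    return lambda t: (f(t) - f(-t)) / 2.0

f = math.exp                       # an example continuous function
fe, fo = even_part(f), odd_part(f)

for t in (-1.5, 0.0, 0.7, 2.0):
    assert abs(fe(t) + fo(t) - f(t)) < 1e-12     # f = f_e + f_o
    assert abs(fe(t) - fe(-t)) < 1e-12           # f_e is even
    assert abs(fo(t) + fo(-t)) < 1e-12           # f_o is odd
```

For f = exp the even and odd parts are of course cosh and sinh.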

Definition. A linear combination of the vectors x₁, x₂, …, xₙ in a vector space is a sum of the form α₁x₁ + α₂x₂ + ⋯ + αₙxₙ.

Actually, addition has been defined previously only for a sum of two vectors. To form a sum consisting of n elements, the sum must be performed two at a time. It follows from the axioms, however, that, analogous to the corresponding operations with real numbers, the result is independent of the order of summation. There is thus no ambiguity in the simplified notation.

It is apparent that a linear combination of vectors from a subspace is also in the subspace. Conversely, linear combinations can be used to construct a subspace from an arbitrary subset in a vector space.

Definition. Suppose S is a subset of a vector space X. The set [S], called the subspace generated by S, consists of all vectors in X which are linear combinations of vectors in S.

The verification that [S] is a subspace in X follows from the obvious fact that a linear combination of linear combinations is also a linear combination.

There is an interesting characterization of the subspace [S]. The set S is, in general, wholly contained in a number of subspaces. Of these, the subspace [S] is the smallest in the sense that if M is a subspace containing S, then M contains [S]. This statement is proved by noting that if the subspace M contains S, it must contain all linear combinations from S.

In three-dimensional space the subspace generated by a two-dimensional circle centered at the origin is a plane. The subspace generated by a plane not passing through the origin is the whole space. A subspace is a generalization of our intuitive notion of a plane or line through the origin. The translation of a subspace, therefore, is a generalization of an arbitrary plane or line.

Definition. The translation of a subspace is said to be a linear variety.



A linear variety¹ V can be written as V = x₀ + M where M is a subspace. In this representation the subspace M is unique, but any vector in V can serve as x₀. This is illustrated in Figure 2.2.

Given a subset S, we can construct the smallest linear variety containing S.



Figure 2.2 A linear variety

Definition. Let S be a nonempty subset of a vector space X. The linear variety generated by S, denoted v(S), is defined as the intersection of all linear varieties in X that contain S.

We leave it to the reader to justify the above definition by showing that v(S) is indeed a linear variety.

2.4 Convexity and Cones

We come now to the topic that is responsible for a surprising number of the results in this book and which generalizes many of the useful properties of subspaces and linear varieties.

Definition. A set K in a linear vector space is said to be convex if, given x₁, x₂ ∈ K, all points of the form αx₁ + (1 − α)x₂ with 0 < α < 1 are in K.

This definition merely says that given two points in a convex set, the line segment between them is also in the set. Examples are shown in Figure 2.3. Note in particular that subspaces and linear varieties are convex. The empty set is considered convex.
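A small numerical illustration (the disk and the two points are our own choices): the closed unit disk in the plane is convex, so every convex combination of two of its points stays inside it.

```python
# The closed unit disk in R^2 is convex: every point
# alpha*x1 + (1 - alpha)*x2 with 0 <= alpha <= 1 lies inside it.

def in_unit_disk(p):
    return p[0] ** 2 + p[1] ** 2 <= 1.0 + 1e-12

x1 = (0.6, -0.3)
x2 = (-0.5, 0.8)
assert in_unit_disk(x1) and in_unit_disk(x2)

for k in range(11):                   # alpha = 0.0, 0.1, ..., 1.0
    a = k / 10.0
    p = (a * x1[0] + (1 - a) * x2[0], a * x1[1] + (1 - a) * x2[1])
    assert in_unit_disk(p)            # the whole segment lies in the disk
```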

¹ Other names for a linear variety include: flat, affine subspace, and linear manifold. The term linear manifold, although commonly used, is reserved by many authors as an alternative term for subspace.


Figure 2.3 Convex and nonconvex sets

The following relations for convex sets are elementary but important. Their proofs are left to the reader.

Proposition 1. Let K and G be convex sets in a vector space. Then

1. αK = {x : x = αk, k ∈ K} is convex for any scalar α.
2. K + G is convex.

We also have the following elementary property.

Proposition 2. Let 𝒞 be an arbitrary collection of convex sets. Then ∩_{K∈𝒞} K is convex.

Proof. Let C = ∩_{K∈𝒞} K. If C is empty, the proposition is trivially proved. Assume that x₁, x₂ ∈ C and select α, 0 < α < 1. Then x₁, x₂ ∈ K for all K ∈ 𝒞, and since each K is convex, αx₁ + (1 − α)x₂ ∈ K for all K ∈ 𝒞. Thus αx₁ + (1 − α)x₂ ∈ C and C is convex. ∎

Definition. Let S be an arbitrary set in a linear vector space. The convex cover or convex hull, denoted co(S), is the smallest convex set containing S. In other words, co(S) is the intersection of all convex sets containing S.

Note that the justification of this definition rests with Proposition 2 since it guarantees the existence of a smallest convex set containing S. Some examples of a set and its convex hull are illustrated in Figure 2.4.

Figure 2.4 Convex hulls

Definition. A set C in a linear vector space is said to be a cone with vertex at the origin if x ∈ C implies that αx ∈ C for all α ≥ 0.



Several cases are shown in Figure 2.5. A cone with vertex p is defined as a translation p + C of a cone C with vertex at the origin. If the vertex of a cone is not explicitly mentioned, it is assumed to be the origin. A convex cone is, of course, defined as a set which is both convex and a cone. Of the cones in Figure 2.5, only (b) is a convex cone. Cones again generalize the concepts of subspace and linear variety since both of these are convex cones.

Figure 2.5 Cones

Convex cones usually arise in connection with the definition of positive vectors in a vector space. For instance, in n-dimensional real coordinate space, the set

P = {x : x = (ξ₁, ξ₂, …, ξₙ), ξᵢ ≥ 0 for all i},

defining the positive orthant, is a convex cone. Likewise, the set of all nonnegative continuous functions is a convex cone in the vector space of continuous functions.

2.5 Linear Independence and Dimension

In this section we first introduce the concept of linear independence, which is important for any general study of vector spaces, and then review the essential distinguishing features of finite-dimensional spaces: basis and dimension.

Definition. A vector x is said to be linearly dependent upon a set S of vectors if x can be expressed as a linear combination of vectors from S. Equivalently, x is linearly dependent upon S if x is in [S], the subspace generated by S. Conversely, the vector x is said to be linearly independent of the set S if it is not linearly dependent on S. A set of vectors is said to be a linearly independent set if each vector in the set is linearly independent of the remainder of the set.



Thus, two vectors are linearly independent if they do not lie on a common line through the origin, three vectors are linearly independent if they do not lie in a plane through the origin, etc. It follows from our definition that the vector θ is dependent on any given vector x since θ = 0x. Also, by convention, the set consisting of θ alone is understood to be a dependent set. On the other hand, a set consisting of a single nonzero vector is an independent set. With these conventions the following major test for linear independence is applicable even to trivial sets consisting of a single vector.

Theorem 1. A necessary and sufficient condition for the set of vectors x₁, x₂, …, xₙ to be linearly independent is that the expression Σ_{k=1}^{n} αₖxₖ = θ implies αₖ = 0 for all k = 1, 2, …, n.

Proof. To prove the necessity of the condition, assume that Σ_{k=1}^{n} αₖxₖ = θ and that for some index r, αᵣ ≠ 0. Since the coefficient of xᵣ is nonzero, the original relation may be rearranged to produce xᵣ = Σ_{k≠r} (−αₖ/αᵣ)xₖ, which shows xᵣ to be linearly dependent on the remaining vectors.

To prove the sufficiency of the condition, note that linear dependence among the vectors implies that one vector, say xᵣ, can be expressed as a linear combination of the others: xᵣ = Σ_{k≠r} αₖxₖ. Rearrangement gives Σ_{k≠r} αₖxₖ − xᵣ = θ, which is the desired relation. ∎
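Numerically, Theorem 1 is usually checked by a rank computation: the xₖ are independent exactly when the matrix having them as columns has rank n. This restatement (a standard computational device, not the book's own procedure) can be sketched as:

```python
import numpy as np

# Linear independence via rank: the only solution of
# sum_k alpha_k x_k = theta is alpha = 0 exactly when the matrix
# with the x_k as columns has full column rank.

X_indep = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [1.0, 1.0]])      # two independent vectors in R^3

X_dep = np.array([[1.0, 2.0],
                  [2.0, 4.0],
                  [3.0, 6.0]])        # second column = 2 * first column

assert np.linalg.matrix_rank(X_indep) == 2   # full column rank: independent
assert np.linalg.matrix_rank(X_dep) == 1     # rank deficient: dependent
```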

An important consequence of this theorem is that a vector expressed as a linear combination of linearly independent vectors can be so expressed in only one way.

Corollary 1. If x₁, x₂, …, xₙ are linearly independent vectors, and Σ_{k=1}^{n} αₖxₖ = Σ_{k=1}^{n} βₖxₖ, then αₖ = βₖ for all k = 1, 2, …, n.

Proof. If Σ_{k=1}^{n} αₖxₖ = Σ_{k=1}^{n} βₖxₖ, then Σ_{k=1}^{n} (αₖ − βₖ)xₖ = θ and αₖ − βₖ = 0 according to Theorem 1. ∎

We turn now from the general notion of linear independence to the special topic of finite dimensionality. Consider a set S of linearly independent vectors in a vector space. These vectors may be used to generate [S], a certain subspace of the vector space. If the subspace [S] is actually the entire vector space, every vector in the space is expressible as a linear combination of vectors from the original set S. Furthermore, according to the corollary to Theorem 1, the expression is unique.

Definition. A finite set S of linearly independent vectors is said to be a basis for the space X if S generates X. A vector space having a finite basis is said to be finite dimensional. All other vector spaces are said to be infinite dimensional.




Usually, we characterize a finite-dimensional space by the number of elements in a basis. Thus, a space with a basis consisting of n elements is referred to as n-dimensional space. This practice would be undesirable on grounds of ambiguity if the number of elements in a basis for a space were not unique. The following theorem is, for this reason, a fundamental result in the study of finite-dimensional spaces.

Theorem 2. Any two bases for a finite-dimensional vector space contain the same number of elements.

Proof. Suppose that {x₁, x₂, …, xₙ} and {y₁, y₂, …, yₘ} are bases for a vector space X. Suppose also that m ≥ n. We shall substitute y vectors for x vectors one by one in the first basis until all the x vectors are replaced.

Since the xᵢ's form a basis, the vector y₁ may be expressed as a linear combination of them, say, y₁ = ∑ᵢ₌₁ⁿ αᵢxᵢ. Since y₁ ≠ θ, at least one of the scalars αᵢ must be nonzero. Rearranging the xᵢ's if necessary, it may be assumed that α₁ ≠ 0. Then x₁ may be expressed as a linear combination of y₁, x₂, x₃, ..., xₙ by the formula

x₁ = (1/α₁)(y₁ − ∑ᵢ₌₂ⁿ αᵢxᵢ).

The set y₁, x₂, ..., xₙ generates X since any linear combination of the original xᵢ's becomes an equivalent linear combination of this new set when x₁ is replaced according to the above formula.

Suppose now that k − 1 of the vectors xᵢ have been replaced by the first k − 1 yᵢ's. The vector yₖ can be expressed as a linear combination of y₁, y₂, ..., yₖ₋₁, xₖ, ..., xₙ, say,

yₖ = ∑ᵢ₌₁ᵏ⁻¹ αᵢyᵢ + ∑ᵢ₌ₖⁿ βᵢxᵢ.

Since the vectors y₁, y₂, ..., yₖ are linearly independent, not all the βᵢ's can be zero. Rearranging xₖ, xₖ₊₁, ..., xₙ if necessary, it may be assumed that βₖ ≠ 0. Then xₖ can be solved for as a linear combination of y₁, y₂, ..., yₖ, xₖ₊₁, ..., xₙ, and this new set of vectors generates X.

By induction on k then, we can replace all n of the xᵢ's by yᵢ's, forming a generating set at each step. This implies that the independent vectors y₁, y₂, ..., yₙ generate X and hence form a basis for X. Therefore, by the assumption of linear independence of {y₁, y₂, ..., yₘ}, we must have n = m. ■

Finite-dimensional spaces are somewhat simpler to analyze than infinite-dimensional spaces. Fewer definitions are required, fewer pathological cases arise, and our native intuition is contradicted in fewer instances than in



infinite-dimensional spaces. Furthermore, there are theorems and concepts in finite-dimensional space which have no direct counterpart in infinite-dimensional spaces. Occasionally, the availability of these special characteristics of finite dimensionality is essential to obtaining a solution to a particular problem. It is more usual, however, that results first derived for finite-dimensional spaces do have direct analogs in more general spaces. In these cases, verification of the corresponding result in infinite-dimensional space often enhances our understanding by indicating precisely which properties of the space are responsible for the result. We constantly endeavor to stress the similarities between infinite- and finite-dimensional spaces rather than the few minor differences.

NORMED LINEAR SPACES

2.6 Definition and Examples

The vector spaces of particular interest in both abstract analysis and applications have a good deal more structure than that implied solely by the seven principal axioms. The vector space axioms only describe algebraic properties of the elements of the space: addition, scalar multiplication, and combinations of these. What are missing are the topological concepts such as openness, closure, convergence, and completeness. These concepts can be provided by the introduction of a measure of distance in a space.

Definition. A normed linear vector space is a vector space X on which there is defined a real-valued function which maps each element x in X into a real number ‖x‖ called the norm of x. The norm satisfies the following axioms:

1. ‖x‖ ≥ 0 for all x ∈ X; ‖x‖ = 0 if and only if x = θ.
2. ‖x + y‖ ≤ ‖x‖ + ‖y‖ for each x, y ∈ X. (triangle inequality)
3. ‖αx‖ = |α| · ‖x‖ for all scalars α and each x ∈ X.

The norm is clearly an abstraction of our usual concept of length. The following useful inequality is a direct consequence of the triangle inequality.

Lemma 1. In a normed linear space ‖x‖ − ‖y‖ ≤ ‖x − y‖ for any two vectors x, y.

Proof. ‖x‖ − ‖y‖ = ‖x − y + y‖ − ‖y‖ ≤ ‖x − y‖ + ‖y‖ − ‖y‖ = ‖x − y‖. ■
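The lemma invites a quick numerical check. Below is a minimal Python sketch (ours, not the text's) that tests the inequality, together with its symmetric consequence |‖x‖ − ‖y‖| ≤ ‖x − y‖, for random vectors in E⁵ under the Euclidean norm:

```python
import math
import random

def norm(x):
    # Euclidean norm on E^n
    return math.sqrt(sum(xi * xi for xi in x))

random.seed(0)
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(5)]
    y = [random.uniform(-10, 10) for _ in range(5)]
    diff = [xi - yi for xi, yi in zip(x, y)]
    # Lemma 1: ||x|| - ||y|| <= ||x - y||; by symmetry the absolute value works too
    assert norm(x) - norm(y) <= norm(diff) + 1e-12
    assert abs(norm(x) - norm(y)) <= norm(diff) + 1e-12
```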

By introduction of a suitable norm, many of our earlier examples of vector spaces can be converted to normed spaces.




Example 1. The normed linear space C[a, b] consists of continuous functions on the real interval [a, b] together with the norm ‖x‖ = max_{a≤t≤b} |x(t)|. This space was considered as a vector space in Section 2.2. We now verify that the proposed norm satisfies the three required axioms. Obviously, ‖x‖ ≥ 0 and is zero only for the function which is identically zero. The triangle inequality follows from the relation

max |x(t) + y(t)| ≤ max [|x(t)| + |y(t)|] ≤ max |x(t)| + max |y(t)|.

Finally, the third axiom follows from the relation

max |αx(t)| = max |α| |x(t)| = |α| max |x(t)|.
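On a machine, the max norm of C[a, b] can only be approximated by sampling on a finite grid. The following Python sketch (illustrative; the grid size and the test functions sin and cos are our arbitrary choices) checks the triangle inequality and the third axiom numerically on [0, 1]:

```python
import math

def sup_norm(f, a=0.0, b=1.0, n=1000):
    # approximate ||f|| = max over [a, b] of |f(t)| on an (n+1)-point grid
    return max(abs(f(a + (b - a) * i / n)) for i in range(n + 1))

x, y, alpha = math.sin, math.cos, -2.5

# axiom 2 (triangle inequality): ||x + y|| <= ||x|| + ||y||
assert sup_norm(lambda t: x(t) + y(t)) <= sup_norm(x) + sup_norm(y) + 1e-12
# axiom 3 (homogeneity): ||alpha x|| = |alpha| ||x||
assert abs(sup_norm(lambda t: alpha * x(t)) - abs(alpha) * sup_norm(x)) < 1e-9
```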

Example 2. The normed linear space D[a, b] consists of all functions on the interval [a, b] which are continuous and have continuous derivatives on [a, b]. The norm on the space D[a, b] is defined as

‖x‖ = max_{a≤t≤b} |x(t)| + max_{a≤t≤b} |ẋ(t)|.

We leave it to the reader to verify that D[a, b] is a normed linear space.

Example 3. The space of finitely nonzero sequences together with the norm equal to the sum of the absolute values of the nonzero components is a normed linear space. Thus the element x = {ξ₁, ξ₂, ..., ξₙ, 0, 0, ...} has its norm defined as ‖x‖ = ∑ᵢ₌₁ⁿ |ξᵢ|. We may easily verify the three required properties by inspection.

Example 4. The space of continuous functions on the interval [a, b] becomes a normed space with the norm of a function x defined as ‖x‖ = ∫ₐᵇ |x(t)| dt. This is a different normed space than C[a, b].

Example 5. Euclidean n-space, denoted Eⁿ, consists of n-tuples with the norm of an element x = {ξ₁, ξ₂, ..., ξₙ} defined as ‖x‖ = (∑ᵢ₌₁ⁿ |ξᵢ|²)^{1/2}.

This definition obviously satisfies the first and third axioms for norms. The triangle inequality for this norm is a well-known result from finite-dimensional vector spaces and is a special case of the Minkowski inequality discussed in Section 2.10. The space Eⁿ can be chosen as a real or complex space by considering real or complex n-tuples. We employ the same notation Eⁿ for both because it is generally apparent from context which is meant.


Example 6. We consider now the space BV[a, b] consisting of functions of bounded variation on the interval [a, b]. By a partition of the interval [a, b], we mean a finite set of points tᵢ ∈ [a, b], i = 0, 1, 2, ..., n, such that a = t₀ < t₁ < t₂ < ... < tₙ = b. A function x defined on [a, b] is said to




be of bounded variation if there is a constant K so that for any partition of [a, b]

∑ᵢ₌₁ⁿ |x(tᵢ) − x(tᵢ₋₁)| ≤ K.

The total variation of x, denoted T.V.(x), is then defined as

T.V.(x) = sup ∑ᵢ₌₁ⁿ |x(tᵢ) − x(tᵢ₋₁)|

where the supremum is taken with respect to all partitions of [a, b]. A convenient and suggestive notation for the total variation is

T.V.(x) = ∫ₐᵇ |dx(t)|.

The total variation of a constant function is zero, and the total variation of a monotonic function is the absolute value of the difference between the function values at the end points a and b.
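Over a fixed partition the sum in the definition is immediate to compute, and the two facts just stated can be checked numerically. A Python sketch (ours; the partition of [0, 1] is an arbitrary choice):

```python
def variation(x, partition):
    # sum over i of |x(t_i) - x(t_{i-1})| for a partition t_0 < t_1 < ... < t_n
    return sum(abs(x(t1) - x(t0)) for t0, t1 in zip(partition, partition[1:]))

grid = [i / 100 for i in range(101)]  # partition of [0, 1]

# a constant function has total variation zero
assert variation(lambda t: 3.0, grid) == 0.0
# a monotonic function: the sum telescopes to |x(b) - x(a)| for every partition
mono = lambda t: t * t
assert abs(variation(mono, grid) - abs(mono(1.0) - mono(0.0))) < 1e-12
```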

The space BV[a, b] is defined as the space of all functions of bounded variation on [a, b] together with the norm defined as

‖x‖ = |x(a)| + T.V.(x).

2.7 Open and Closed Sets

We come now to the concepts that are fundamental to the study of topological properties.

Definition. Let P be a subset of a normed space X. The point p ∈ P is said to be an interior point of P if there is an ε > 0 such that all vectors x satisfying ‖x − p‖ < ε are also members of P. The collection of all interior points of P is called the interior of P and is denoted P̊.

We introduce the notation S(x, ε) for the (open) sphere centered at x with radius ε; that is, S(x, ε) = {y : ‖x − y‖ < ε}. Thus, according to the above definition, a point x is an interior point of P if there is a sphere S(x, ε) centered at x and contained in P. A set may have an empty interior as, for example, a set consisting of a single point or a line in E².

Definition. A set P is said to be open if P = P̊.

The empty set is open since its interior is also empty. The entire space is an open set. The unit sphere consisting of all vectors x with ‖x‖ < 1 is an open set. We leave it to the reader to verify that P̊ is open.




Definition. A point x ∈ X is said to be a closure point of a set P if, given ε > 0, there is a point p ∈ P satisfying ‖x − p‖ < ε. The collection of all closure points of P is called the closure of P and is denoted P̄.

In other words, a point x is a closure point of P if every sphere centered at x contains a point of P. It is clear that P ⊂ P̄.

Definition. A set P is said to be closed if P = P̄.

The empty set and the whole space X are closed as well as open. The unit sphere consisting of all points x with ‖x‖ ≤ 1 is a closed set. A single point is a closed set. It is clear that the closure of P̄ is P̄ itself.

Proposition 1. The complement of an open set is closed and the complementof a closed set is open.

Proof. Let P be an open set and P̃ = {x : x ∉ P} its complement. A point x in P is not a closure point of P̃ since there is a sphere about x disjoint from P̃. Thus P̃ contains all its closure points and is therefore closed.

Let S be a closed set and S̃ its complement. If x ∈ S̃, then x is not a closure point of S and, hence, there is a sphere about x which is disjoint from S. Therefore x is an interior point of S̃. We conclude that S̃ is open. ■

The proofs of the following two complementary results are left to the reader.

Proposition 2. The intersection of a finite number of open sets is open; the union of an arbitrary collection of open sets is open.

Proposition 3. The union of a finite number of closed sets is closed; the intersection of an arbitrary collection of closed sets is closed.

We now have two topological operations, taking closures and taking interiors, that can be applied to sets in normed space. It is natural to investigate the effect of these operations on convexity, the fundamental algebraic concept of vector space.

Proposition 4. Let C be a convex set in a normed space. Then C̄ and C̊ are convex.

Proof. If C̄ is empty, it is convex. Suppose x₀, y₀ are points in C̄. Fix α, 0 < α < 1. We must show that z₀ = αx₀ + (1 − α)y₀ ∈ C̄. Given ε > 0, let x, y be selected from C such that ‖x − x₀‖ < ε, ‖y − y₀‖ < ε. Then ‖αx + (1 − α)y − αx₀ − (1 − α)y₀‖ < ε and, hence, z₀ is within a distance ε of z = αx + (1 − α)y which is in C. Since ε is arbitrary, it follows that z₀ is a closure point of C.



If C̊ is empty, it is convex. Suppose x₀, y₀ ∈ C̊ and fix α, 0 < α < 1. We must show that z₀ = αx₀ + (1 − α)y₀ ∈ C̊. Since x₀, y₀ ∈ C̊, there is an ε > 0 such that the open spheres S(x₀, ε), S(y₀, ε) are contained in C. It follows that all points of the form z₀ + w with ‖w‖ < ε are in C since z₀ + w = α(x₀ + w) + (1 − α)(y₀ + w). Thus, z₀ is an interior point of C. ■

Likewise, it can be shown that taking closure preserves subspaces, linear varieties, and cones.

Finally, we remark that all of the topological concepts discussed above can be defined relative to a given linear variety. Suppose that P is a set contained in a linear variety V. We say that p ∈ P is an interior point of P relative to V if there is an ε > 0 such that all vectors x ∈ V satisfying ‖x − p‖ < ε are also members of P. The set P is said to be open relative to V if every point in P is an interior point of P relative to V.

In case V is taken as the closed linear variety generated by P, i.e., the intersection of all closed linear varieties containing P, then x is simply referred to as a relative interior point of P if it is an interior point of P relative to the variety V. Similar meaning is given to relatively closed, etc.

2.8 Convergence

In order to prove the existence of a vector satisfying a desired property, it is common to establish an appropriate sequence of vectors converging to a limit. In many cases the limit can be shown to satisfy the required property. It is for this reason that the concept of convergence plays an important role in analysis.

Definition. In a normed linear space an infinite sequence of vectors {xₙ} is said to converge to a vector x if the sequence {‖x − xₙ‖} of real numbers converges to zero. In this case, we write xₙ → x.

If xₙ → x, it follows that ‖xₙ‖ → ‖x‖ because, according to Lemma 1, Section 2.6, we have both ‖xₙ‖ − ‖x‖ ≤ ‖xₙ − x‖ and ‖x‖ − ‖xₙ‖ ≤ ‖xₙ − x‖, which implies that |‖xₙ‖ − ‖x‖| ≤ ‖xₙ − x‖ → 0.

In the space Eⁿ of n-tuples, a sequence converges if and only if each component converges; however, in other spaces convergence is not always easy to characterize. In the space of finitely nonzero sequences, define the vectors eᵢ = {0, 0, ..., 0, 1, 0, ...}, the i-th vector having each of its components zero except the i-th, which is 1. The sequence of vectors {eᵢ} (which is now a sequence of sequences) converges to zero componentwise, but the sequence does not converge to the null vector θ since ‖eᵢ‖ = 1 for all i.
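The vectors eᵢ can be imitated with finite lists, truncating the sequences at some length. A Python sketch (ours), using the sum-of-absolute-values norm of Example 3, under which ‖eᵢ‖ = 1:

```python
def e(i, length=10):
    # i-th unit sequence: every component 0 except the i-th, which is 1
    v = [0.0] * length
    v[i - 1] = 1.0
    return v

def norm(v):
    # the norm of Example 3: sum of the absolute values of the components
    return sum(abs(c) for c in v)

# componentwise convergence to zero: any fixed component is eventually always 0
assert all(e(i)[0] == 0.0 for i in range(2, 11))
# ...yet no convergence to the null vector in norm:
assert all(norm(e(i)) == 1.0 for i in range(1, 11))
```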

An important observation is stated in the following proposition.

Page 45: Luenberger D.- Optimization bey, 1969)(171d)047155359Xbioen.utah.edu/wiki/images/4/44/OptimizationIntroduction.pdf · Gateaux and Frechet Differentials Frechet Derivatives 175 Extrema

§2.9 TRANSFORMATIONS AND CONTINUITY 27

Proposition 1. If a sequence converges, its limit is unique.

Proof. Suppose xₙ → x and xₙ → y. Then

‖x − y‖ = ‖x − xₙ + xₙ − y‖ ≤ ‖x − xₙ‖ + ‖xₙ − y‖ → 0.

Thus, x = y. ■

Another way to state the definition of convergence is in terms of spheres.

A sequence {xₙ} converges to x if and only if, given ε > 0, the sphere S(x, ε) contains xₙ for all n greater than some number N.

The definition of convergence can be used to characterize closed sets and provides a useful alternative to the original definition of closed sets.

Proposition 2. A set F is closed if and only if every convergent sequence with elements in F has its limit in F.

Proof. The limit of a sequence from F is obviously a closure point of F and, therefore, must be contained in F if F is closed. Suppose now that F is not closed. Then there is a closure point x of F that is not in F. In each of the spheres S(x, 1/n) we may select a point xₙ ∈ F since x is a closure point. The sequence {xₙ} generated in this way converges to x ∉ F. ■

2.9 Transformations and Continuity

The objects that make the study of linear spaces interesting and useful are transformations.

Definition. Let X and Y be linear vector spaces and let D be a subset of X. A rule T which associates with every element x ∈ D an element y ∈ Y is said to be a transformation from X to Y with domain D. If y corresponds to x under T, we write y = T(x).

Transformations on vector spaces become increasingly more important as we progress through this book; they are treated in some detail beginning with Chapter 6. It is the purpose of this section to introduce some common terminology that is convenient for describing the simple transformations encountered in the early chapters.

If a specific domain is not explicitly mentioned when discussing a transformation on a vector space X, it is understood that the domain is X itself. If for every y ∈ Y there is at most one x ∈ D for which T(x) = y, the transformation T is said to be one-to-one. If for every y ∈ Y there is at least one x ∈ D for which T(x) = y, T is said to be onto or, more precisely, to map D onto Y. This terminology, as the notion of a transformation itself, is of course an extension of the familiar notion of ordinary functions. A transformation is simply a function defined on one vector space X while



taking values in another vector space Y. A special case of this situation is that in which the space Y is taken to be the real line.

Definition. A transformation from a vector space X into the space of real (or complex) scalars is said to be a functional on X.

In order to distinguish functionals from more general transformations, they are usually denoted by lower case letters such as f and g. Hence, f(x) denotes the scalar that f associates with the vector x ∈ X.

On a normed space, f(x) = ‖x‖ is an example of a functional. On the space C[0, 1], examples of functionals are f₁(x) = x(1), f₂(x) = ∫₀¹ x(t) dt, f₃(x) = max_{0≤t≤1} x(t), etc. Real-valued functionals are of direct interest to optimization theory, since optimization consists of selecting a vector to minimize (or maximize) a prescribed functional.

Definition. A transformation T mapping a vector space X into a vector space Y is said to be linear if for every x₁, x₂ ∈ X and all scalars α₁, α₂ we have T(α₁x₁ + α₂x₂) = α₁T(x₁) + α₂T(x₂).

The most familiar example of a linear transformation is supplied by a rectangular m × n matrix mapping elements of Rⁿ into Rᵐ. An example of a linear transformation mapping X = C[a, b] into X is the integral operator T(x) = ∫ₐᵇ k(t, τ)x(τ) dτ where k(t, τ) is a function continuous on the square a ≤ t ≤ b, a ≤ τ ≤ b.
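Discretized on a grid, such an integral operator becomes an ordinary matrix-vector product, which makes its linearity easy to verify numerically. A Python sketch (ours; the kernel k(t, τ) = tτ + 1, the interval [0, 1], and the grid size are arbitrary illustrative choices):

```python
n = 200
ts = [i / (n - 1) for i in range(n)]   # grid on [0, 1]
h = 1.0 / (n - 1)

def k(t, tau):
    # a kernel continuous on the square 0 <= t <= 1, 0 <= tau <= 1
    return t * tau + 1.0

def T(x):
    # T(x)(t) = integral of k(t, tau) x(tau) d tau, by a Riemann sum on the grid
    return [sum(k(t, tau) * xv for tau, xv in zip(ts, x)) * h for t in ts]

x1 = [t for t in ts]
x2 = [t * t for t in ts]
lhs = T([2.0 * u + 3.0 * v for u, v in zip(x1, x2)])
rhs = [2.0 * u + 3.0 * v for u, v in zip(T(x1), T(x2))]
# linearity: T(2 x1 + 3 x2) = 2 T(x1) + 3 T(x2), up to rounding
assert all(abs(u - v) < 1e-9 for u, v in zip(lhs, rhs))
```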

Up to this point we have considered transformations mapping one abstract space into another. If these spaces happen to be normed, it is possible to define the notion of continuity.

Definition. A transformation T mapping a normed space X into a normed space Y is continuous at x₀ ∈ X if for every ε > 0 there is a δ > 0 such that ‖x − x₀‖ < δ implies that ‖T(x) − T(x₀)‖ < ε.

Note that continuity depends on the norm in both the spaces X and Y. If T is continuous at each point x₀ ∈ X, we say that T is continuous everywhere or, more simply, that T is continuous.

The following characterization of continuity is useful in many proofsinvolving continuous transformations.

Proposition 1. A transformation T mapping a normed space X into a normed space Y is continuous at the point x₀ ∈ X if and only if xₙ → x₀ implies T(xₙ) → T(x₀).

Proof. The "if" portion of the statement is obvious; thus we need only prove the "only if" portion. Let {xₙ} be a sequence such that xₙ → x₀ but T(xₙ) ↛ T(x₀). Then, for some ε > 0 and every N there is an n > N such



that ‖T(xₙ) − T(x₀)‖ > ε. Since xₙ → x₀, this implies that for every δ > 0 there is a point xₙ with ‖xₙ − x₀‖ < δ and ‖T(xₙ) − T(x₀)‖ > ε. This proves the "only if" portion by contraposition. ■

*2.10 The lₚ and Lₚ Spaces

In this section we discuss some classical normed spaces that are useful throughout the book.

Definition. Let p be a real number, 1 ≤ p < ∞. The space lₚ consists of all sequences of scalars {ξ₁, ξ₂, ...} for which

∑ᵢ₌₁^∞ |ξᵢ|^p < ∞.

The norm of an element x = {ξᵢ} in lₚ is defined as

‖x‖ₚ = (∑ᵢ₌₁^∞ |ξᵢ|^p)^{1/p}.

The space l∞ consists of bounded sequences. The norm of an element x = {ξᵢ} in l∞ is defined as

‖x‖∞ = supᵢ |ξᵢ|.

It is obvious that the norm on lₚ satisfies ‖αx‖ = |α| ‖x‖ and that ‖x‖ > 0 for each x ≠ θ. In this section, we establish two inequalities concerning the lₚ norms, the second of which gives the triangle inequality for these lₚ norms. Therefore, the lₚ norm indeed satisfies the three axioms required of a general norm. Incidentally, it follows from these properties of the norm that lₚ is in fact a linear vector space because, if x = {ξᵢ}, y = {ηᵢ} are vectors in lₚ, then for any scalars α, β we have ‖αx + βy‖ ≤ |α| ‖x‖ + |β| ‖y‖ < ∞, so that αx + βy is a vector in lₚ. Since lₚ is a vector space and the norm satisfies the three required axioms, we may justifiably refer to lₚ as a normed linear vector space.
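For finitely nonzero sequences the lₚ norm is a one-line computation. A Python sketch (ours) that also spot-checks positivity and homogeneity:

```python
def lp_norm(x, p):
    # ||x||_p = (sum of |xi|^p)^(1/p), for a finitely nonzero sequence given as a list
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

x = [3.0, -4.0, 0.0]
assert lp_norm(x, 2) == 5.0                  # the Euclidean case
assert lp_norm(x, 1) == 7.0
assert lp_norm(x, 2) > 0.0                   # ||x|| > 0 for x != 0
# homogeneity: ||alpha x|| = |alpha| ||x||
assert abs(lp_norm([-2.0 * c for c in x], 2) - 2.0 * lp_norm(x, 2)) < 1e-12
```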

The following two theorems, although of fundamental importance for a study of the lₚ spaces, are somewhat tangential to our main purpose. The reader will lose little by simply scanning the proofs.

Theorem 1. (The Hölder Inequality) If p and q are positive numbers, 1 ≤ p ≤ ∞, 1 ≤ q ≤ ∞, such that 1/p + 1/q = 1, and if x = {ξ₁, ξ₂, ...} ∈ lₚ, y = {η₁, η₂, ...} ∈ l_q, then

∑ᵢ₌₁^∞ |ξᵢηᵢ| ≤ ‖x‖ₚ · ‖y‖_q.




Equality holds if and only if

(|ξᵢ| / ‖x‖ₚ)^{1/q} = (|ηᵢ| / ‖y‖_q)^{1/p}

for each i.

Proof. The cases p = 1, ∞; q = ∞, 1 are straightforward and are left to the reader. Therefore, it is assumed that 1 < p < ∞, 1 < q < ∞. We first prove the auxiliary inequality: for a ≥ 0, b ≥ 0, and 0 < λ < 1, we have

a^λ b^{1−λ} ≤ λa + (1 − λ)b

with equality if and only if a = b. For this purpose, consider the function

f(t) = t^λ − λt + λ − 1

defined for t ≥ 0. Then f′(t) = λ(t^{λ−1} − 1). Since 0 < λ < 1, we have f′(t) > 0 for 0 < t < 1 and f′(t) < 0 for t > 1. It follows that for t ≥ 0, f(t) ≤ f(1) = 0 with equality only for t = 1. Hence

t^λ ≤ λt + 1 − λ

with equality only for t = 1. If b ≠ 0, the substitution t = a/b gives the desired inequality, while for b = 0 the inequality is trivial.

Applying this inequality to the numbers

a = (|ξᵢ| / ‖x‖ₚ)^p,  b = (|ηᵢ| / ‖y‖_q)^q  with λ = 1/p, 1 − λ = 1/q,

we obtain for each i

|ξᵢ| |ηᵢ| / (‖x‖ₚ ‖y‖_q) ≤ (1/p)(|ξᵢ| / ‖x‖ₚ)^p + (1/q)(|ηᵢ| / ‖y‖_q)^q.

Summing this inequality over i, we obtain the Hölder inequality

∑ᵢ₌₁^∞ |ξᵢηᵢ| / (‖x‖ₚ ‖y‖_q) ≤ 1/p + 1/q = 1.

The conditions for equality follow directly from the required condition a = b in the auxiliary inequality. ■

The special case p = 2, q = 2 is of major importance. In this case, Hölder's inequality becomes the well-known Cauchy–Schwarz inequality for sequences:

∑ᵢ₌₁^∞ |ξᵢηᵢ| ≤ (∑ᵢ₌₁^∞ |ξᵢ|²)^{1/2} (∑ᵢ₌₁^∞ |ηᵢ|²)^{1/2}.
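Both the Hölder inequality and its Cauchy–Schwarz special case are easy to probe numerically. A Python sketch (ours; the exponent p = 3 and the random sequences are arbitrary choices):

```python
import random

def lp_norm(x, p):
    # ||x||_p for a finitely nonzero sequence given as a list
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

random.seed(1)
p = 3.0
q = p / (p - 1.0)                       # conjugate exponent: 1/p + 1/q = 1
for _ in range(200):
    x = [random.uniform(-1, 1) for _ in range(8)]
    y = [random.uniform(-1, 1) for _ in range(8)]
    s = sum(abs(a * b) for a, b in zip(x, y))
    assert s <= lp_norm(x, p) * lp_norm(y, q) + 1e-12   # Holder
    assert s <= lp_norm(x, 2) * lp_norm(y, 2) + 1e-12   # Cauchy-Schwarz
```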



Using the Hölder inequality, we can establish the triangle inequality for lₚ norms.

Theorem 2. (The Minkowski Inequality) If x and y are in lₚ, 1 ≤ p ≤ ∞, then so is x + y, and ‖x + y‖ₚ ≤ ‖x‖ₚ + ‖y‖ₚ. For 1 < p < ∞, equality holds if and only if k₁x = k₂y for some positive constants k₁ and k₂.

Proof. The cases p = 1 and p = ∞ are straightforward. Therefore, it is assumed that 1 < p < ∞. We first consider finite sums. Clearly we may write

∑ᵢ₌₁ⁿ |ξᵢ + ηᵢ|^p ≤ ∑ᵢ₌₁ⁿ |ξᵢ + ηᵢ|^{p−1} |ξᵢ| + ∑ᵢ₌₁ⁿ |ξᵢ + ηᵢ|^{p−1} |ηᵢ|.

Applying Hölder's inequality to each summation on the right, we obtain

∑ᵢ₌₁ⁿ |ξᵢ + ηᵢ|^p ≤ (∑ᵢ₌₁ⁿ |ξᵢ + ηᵢ|^{(p−1)q})^{1/q} [(∑ᵢ₌₁ⁿ |ξᵢ|^p)^{1/p} + (∑ᵢ₌₁ⁿ |ηᵢ|^p)^{1/p}]

= (∑ᵢ₌₁ⁿ |ξᵢ + ηᵢ|^p)^{1/q} [(∑ᵢ₌₁ⁿ |ξᵢ|^p)^{1/p} + (∑ᵢ₌₁ⁿ |ηᵢ|^p)^{1/p}].

Dividing both sides of this inequality by (∑ᵢ₌₁ⁿ |ξᵢ + ηᵢ|^p)^{1/q} and taking account of 1 − 1/q = 1/p, we find

(∑ᵢ₌₁ⁿ |ξᵢ + ηᵢ|^p)^{1/p} ≤ (∑ᵢ₌₁ⁿ |ξᵢ|^p)^{1/p} + (∑ᵢ₌₁ⁿ |ηᵢ|^p)^{1/p}.

Letting n → ∞ on the right side of this inequality can only increase its value, so

(∑ᵢ₌₁ⁿ |ξᵢ + ηᵢ|^p)^{1/p} ≤ ‖x‖ₚ + ‖y‖ₚ

for each n. Therefore, letting n → ∞ on the left side produces ‖x + y‖ₚ ≤ ‖x‖ₚ + ‖y‖ₚ.

The conditions for equality follow from the conditions for equality in the Hölder inequality and are left to the reader. ■
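The Minkowski inequality, including its equality case, can likewise be checked numerically. A Python sketch (ours; p = 2.5 and the vectors are arbitrary choices):

```python
def lp_norm(x, p):
    # ||x||_p for a finitely nonzero sequence given as a list
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

p = 2.5
x = [1.0, -2.0, 3.0, 0.5]
y = [0.25, 4.0, -1.0, 2.0]
s = [a + b for a, b in zip(x, y)]
assert lp_norm(s, p) <= lp_norm(x, p) + lp_norm(y, p) + 1e-12

# equality when k1 x = k2 y; here y = 2x, so ||x + y|| = 3||x|| = ||x|| + ||2x||
y2 = [2.0 * a for a in x]
s2 = [a + b for a, b in zip(x, y2)]
assert abs(lp_norm(s2, p) - (lp_norm(x, p) + lp_norm(y2, p))) < 1e-9
```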

The Lₚ and Rₚ spaces are defined analogously to the lₚ spaces. For p ≥ 1, the space Lₚ[a, b] consists of those real-valued measurable functions x on the interval [a, b] for which |x(t)|^p is Lebesgue integrable. The norm on this space is defined as

‖x‖ₚ = (∫ₐᵇ |x(t)|^p dt)^{1/p}.

Unfortunately, on this space ‖x‖ₚ = 0 does not necessarily imply x = θ since x may be nonzero on a set of measure zero (such as a set consisting of a countable number of points). If, however, we do not distinguish



between functions that are equal almost everywhere,¹ then Lₚ[a, b] is a normed linear space.

The space Rₚ[a, b] is defined in an analogous manner but with attention restricted to functions x for which |x(t)|^p is Riemann integrable. Since all functions encountered in applications in this book are Riemann integrable, we have little reason, at this point in the development, to prefer one of these spaces over the other. Readers unfamiliar with Lebesgue measure and integration theory lose no essential insight by considering the Rₚ spaces only. In the next section, however, when considering the notion of completeness, we find reason to prefer the Lₚ spaces over the Rₚ spaces.

The space L∞[a, b] is the function space analog of l∞. It is defined as the space of all Lebesgue measurable functions on [a, b] which are bounded, except possibly on a set of measure zero. Again, in this space, two functions differing only on a set of measure zero are regarded as equivalent.

Roughly speaking, the norm of an element x in L∞[a, b] is defined as sup_{a≤t≤b} |x(t)|. This quantity is ambiguous, however, since an element x does not correspond uniquely to any given function but to a whole class of functions differing on a set of measure zero. The value sup_{a≤t≤b} |x(t)| is different for the different functions equivalent to x. The norm of a function in L∞[a, b] is therefore defined as

‖x‖∞ = essential supremum of |x(t)| = infimum over {y : y(t) = x(t) a.e.} of sup_{a≤t≤b} |y(t)|.

For brevity, we write ‖x‖∞ = ess sup |x(t)|.

Example 1. Consider the function on [−1, 1]

x(t) = 2 for t = 0,  x(t) = −1 for t ≠ 0,

shown in Figure 2.6. The supremum of this function is equal to 2 but the essential supremum is 1.
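The single point t = 0 is invisible to any integral, which is one way to see the role of the essential supremum. A Python sketch (ours) approximates the Lₚ norms of such an x over [−1, 1] by a midpoint Riemann sum, whose sample points never hit t = 0; each norm comes out as 2^{1/p}, governed by the value 1 taken almost everywhere, and as p grows this tends to the essential supremum 1, not to the supremum 2:

```python
def x(t):
    # a function as in Example 1: 2 at t = 0, magnitude 1 everywhere else
    return 2.0 if t == 0.0 else -1.0

def lp_norm_midpoint(f, p, a=-1.0, b=1.0, n=1000):
    # midpoint rule: the sample points a + (i + 1/2)h never include t = 0 for even n
    h = (b - a) / n
    return (sum(abs(f(a + (i + 0.5) * h)) ** p for i in range(n)) * h) ** (1.0 / p)

for p in (1.0, 2.0, 10.0, 50.0):
    # |x(t)| = 1 almost everywhere, so the L_p norm over [-1, 1] is 2^(1/p)
    assert abs(lp_norm_midpoint(x, p) - 2.0 ** (1.0 / p)) < 1e-9
```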

There are Hölder and Minkowski inequalities for the Lₚ spaces analogous to the corresponding results for the lₚ spaces.

Theorem 3. (The Hölder Inequality) If x ∈ Lₚ[a, b], y ∈ L_q[a, b], 1/p + 1/q = 1, p, q > 1, then

∫ₐᵇ |x(t)y(t)| dt ≤ ‖x‖ₚ ‖y‖_q.

¹ Two functions are said to be equal almost everywhere (a.e.) on [a, b] if they differ on a set of Lebesgue measure zero.




Equality holds if and only if

(|x(t)| / ‖x‖ₚ)^p = (|y(t)| / ‖y‖_q)^q

almost everywhere on [a, b].

Theorem 4. (The Minkowski Inequality) If x and y are in Lₚ[a, b], then so is x + y, and ‖x + y‖ₚ ≤ ‖x‖ₚ + ‖y‖ₚ.

The proofs of these inequalities are similar to those for the lₚ spaces and are omitted here.

Figure 2.6 The function for Example 1

2.11 Banach Spaces

Definition. A sequence {xₙ} in a normed space is said to be a Cauchy sequence if ‖xₙ − xₘ‖ → 0 as n, m → ∞; i.e., given ε > 0, there is an integer N such that ‖xₙ − xₘ‖ < ε for all n, m > N.

In a normed space, every convergent sequence is a Cauchy sequence since, if xₙ → x, then

‖xₙ − xₘ‖ = ‖xₙ − x + x − xₘ‖ ≤ ‖xₙ − x‖ + ‖x − xₘ‖ → 0.

In general, however, a Cauchy sequence may not be convergent.

Normed spaces in which every Cauchy sequence is convergent are of particular interest in analysis; in such spaces it is possible to identify convergent sequences without explicitly identifying their limits. A space in which every Cauchy sequence has a limit (and is therefore convergent) is said to be complete.



Definition. A normed linear vector space X is complete if every Cauchy sequence from X has a limit in X. A complete normed linear vector space is called a Banach space.

We frequently take great care to formulate problems arising in applications as equivalent problems in Banach space rather than as problems in other, possibly incomplete, spaces. The principal advantage of Banach space in optimization problems is that when seeking an optimal vector maximizing a given objective, we often construct a sequence of vectors, each member of which is superior to the preceding members; the desired optimal vector is then the limit of the sequence. In order that the scheme be effective, there must be available a test for convergence which can be applied when the limit is unknown. The Cauchy criterion for convergence meets this requirement provided the underlying space is complete.

We now consider examples of incomplete normed spaces.

Example 1. Let X be the space of continuous functions on [0, 1] with norm defined by ‖x‖ = ∫₀¹ |x(t)| dt. One may readily verify that X is a normed linear space. Note that the space is not the space C[0, 1] since the norm is different. We show that X is incomplete. Define a sequence of elements in X by the equation

xₙ(t) = 0 for 0 ≤ t ≤ 1/2 − 1/n,
xₙ(t) = nt − n/2 + 1 for 1/2 − 1/n ≤ t ≤ 1/2,
xₙ(t) = 1 for 1/2 ≤ t ≤ 1.

This sequence of functions is illustrated in Figure 2.7. Each member of the sequence is a continuous function and thus a member of the space X. The sequence is Cauchy since, as is easily verified, ‖xₙ − xₘ‖ = ½ |1/n − 1/m| → 0. It is obvious, however, that there is no continuous function to which the sequence converges.
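The value ‖xₙ − xₘ‖ = ½ |1/n − 1/m| can be confirmed numerically. A Python sketch (ours) builds the ramp functions of this example and evaluates the integral norm of a difference by a midpoint Riemann sum:

```python
def x_seq(n):
    # the ramp function x_n of Example 1: 0, then linear, then 1
    def xn(t):
        if t <= 0.5 - 1.0 / n:
            return 0.0
        if t <= 0.5:
            return n * t - n / 2.0 + 1.0
        return 1.0
    return xn

def int_norm(f, m=200000):
    # ||f|| = integral over [0, 1] of |f(t)| dt, by a midpoint Riemann sum
    h = 1.0 / m
    return sum(abs(f((i + 0.5) * h)) for i in range(m)) * h

n, m = 4, 10
d = int_norm(lambda t: x_seq(n)(t) - x_seq(m)(t))
assert abs(d - 0.5 * abs(1.0 / n - 1.0 / m)) < 1e-4
```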

Example 2. Let X be the vector space of finitely nonzero sequences x = {ξ₁, ξ₂, ..., ξₙ, 0, 0, ...}. Define the norm of an element x as ‖x‖ = maxᵢ |ξᵢ|. Define a sequence of elements in X by xₙ = {1, 1/2, 1/3, ..., 1/n, 0, 0, ...}.




Each xₙ is in X and has its norm equal to unity. The sequence {xₙ} is a Cauchy sequence since, as is easily verified,

‖xₙ − xₘ‖ = 1/(min{n, m} + 1) → 0.

It is obvious, however, that there is no element of X (with a finite number of nonzero components) to which this sequence converges.
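Taking xₙ = {1, 1/2, ..., 1/n, 0, 0, ...} (an assumption consistent with each ‖xₙ‖ = 1 and with the absence of a finitely nonzero limit), a Python sketch (ours) with exact rational arithmetic confirms the behavior of the norms:

```python
from fractions import Fraction

def x_seq(n, length=20):
    # x_n = {1, 1/2, ..., 1/n, 0, 0, ...}, stored as a finite list
    return [Fraction(1, k) if k <= n else Fraction(0) for k in range(1, length + 1)]

def max_norm(v):
    # the norm of this example: max of the absolute values of the components
    return max(abs(c) for c in v)

assert all(max_norm(x_seq(n)) == 1 for n in range(1, 15))
n, m = 5, 12
d = [a - b for a, b in zip(x_seq(n), x_seq(m))]
assert max_norm(d) == Fraction(1, min(n, m) + 1)   # here 1/6
```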

Figure 2.7 Sequence for Example 1

The space E¹, the real line, is the fundamental example of a complete space. It is assumed here that the completeness of E¹, a major topic in elementary analysis, is well known. The completeness of E¹ is used to establish the completeness of various other normed spaces.

We establish the completeness of several important spaces that are used throughout this book. For this purpose the following lemma is useful.

Lemma 1. A Cauchy sequence is bounded.

Proof. Let {xₙ} be a Cauchy sequence and let N be an integer such that ‖xₙ − x_N‖ < 1 for n > N. For n > N, we have

‖xₙ‖ = ‖xₙ − x_N + x_N‖ ≤ ‖x_N‖ + ‖xₙ − x_N‖ < ‖x_N‖ + 1. ■

Example 3. C[0, 1] is a Banach space. We have previously considered this space as a normed space. To prove that C[0, 1] is complete, it is only necessary to show that every Cauchy sequence in C[0, 1] has a limit. Suppose {xₙ} is a Cauchy sequence in C[0, 1]. For each fixed t ∈ [0, 1], |xₙ(t) − xₘ(t)| ≤ ‖xₙ − xₘ‖ → 0, so {xₙ(t)} is a Cauchy sequence of real numbers. Since the set of real numbers is complete, there is a real number x(t) to which the sequence converges; xₙ(t) → x(t). Therefore, the functions xₙ converge pointwise to the function x.

We prove next that this pointwise convergence is actually uniform in t ∈ [0, 1], i.e., given ε > 0, there is an N such that |x_n(t) − x(t)| < ε for all


36 LINEAR SPACES 2


t ∈ [0, 1] and n > N. Given ε > 0, choose N such that ||x_n − x_m|| < ε/2 for n, m > N. Then for n > N

|x_n(t) − x(t)| ≤ |x_n(t) − x_m(t)| + |x_m(t) − x(t)|

≤ ||x_n − x_m|| + |x_m(t) − x(t)|.

By choosing m sufficiently large (which may depend on t), each term on the right can be made smaller than ε/2, so |x_n(t) − x(t)| < ε for n > N.

We must still prove that the function x is continuous and that the sequence {x_n} converges to x in the norm of C[0, 1]. To prove the continuity of x, fix ε > 0. For every δ, t, and n,

|x(t + δ) − x(t)| ≤ |x(t + δ) − x_n(t + δ)| + |x_n(t + δ) − x_n(t)| + |x_n(t) − x(t)|.

Since {x_n} converges uniformly to x, n may be chosen to make both the first and the last terms less than ε/3 for all δ. Since x_n is continuous, δ may be chosen to make the second term less than ε/3. Therefore, x is continuous. The convergence of x_n to x in the C[0, 1] norm is a direct consequence of the uniform convergence.

It is instructive to reconcile the completeness of C[0, 1] with Example 1, in which a sequence of continuous functions was shown to be Cauchy but nonconvergent with respect to the integral norm. The difference is that, with respect to the C[0, 1] norm, the sequence defined in Example 1 is not Cauchy. The reader may find it useful to compare these two cases in detail.
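The comparison suggested above can be carried out numerically. In the sketch below (a supplement, not from the text; the Example 1 sequence is taken to be the usual ramp x_n that is 0 on [0, 1/2 − 1/n], rises linearly to 1 at t = 1/2, and is 1 thereafter), the integral-norm distances shrink like (1/2)|1/n − 1/m| while the sup-norm distances do not:

```python
# The ramp sequence of Example 1: Cauchy in the integral norm,
# not Cauchy in the C[0, 1] (sup) norm.
def ramp(n, t):
    a = 0.5 - 1.0 / n
    if t <= a:
        return 0.0
    if t >= 0.5:
        return 1.0
    return (t - a) * n  # linear rise from 0 to 1 on [a, 1/2]

def l1_dist(n, m, grid=100000):
    # midpoint-rule approximation of the integral norm ||x_n - x_m||
    h = 1.0 / grid
    return sum(abs(ramp(n, (i + 0.5) * h) - ramp(m, (i + 0.5) * h))
               for i in range(grid)) * h

def sup_dist(n, m, grid=100000):
    h = 1.0 / grid
    return max(abs(ramp(n, i * h) - ramp(m, i * h)) for i in range(grid + 1))

print(l1_dist(10, 20))   # ~0.025 = (1/2)|1/10 - 1/20|
print(sup_dist(10, 20))  # ~0.5 = 1 - 10/20: not shrinking
print(sup_dist(50, 100)) # still ~0.5
```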

Example 4. l_p, 1 ≤ p ≤ ∞ is a Banach space. Assume first that 1 ≤ p < ∞.

Let {x_n} be a Cauchy sequence in l_p. Then, if x_n = {ξ_1^n, ξ_2^n, …}, we have

|ξ_k^n − ξ_k^m| ≤ (Σ_{i=1}^∞ |ξ_i^n − ξ_i^m|^p)^{1/p} = ||x_n − x_m|| → 0.

Hence, for each k, the sequence {ξ_k^n}_{n=1}^∞ is a Cauchy sequence of real numbers and, therefore, converges to a limit ξ_k. We show now that the element x = {ξ_1, ξ_2, …} is a vector in l_p. According to Lemma 1, there is a constant M which bounds the sequence {x_n}. Hence, for all n and k,

Σ_{i=1}^k |ξ_i^n|^p ≤ ||x_n||^p ≤ M^p.

Since the left member of the inequality is a finite sum, the inequality remains valid as n → ∞; therefore,

Σ_{i=1}^k |ξ_i|^p ≤ M^p.


Since this inequality holds uniformly in k, it follows that

Σ_{i=1}^∞ |ξ_i|^p ≤ M^p

and x ∈ l_p; the norm of x is bounded by ||x|| ≤ M. It remains to be shown that the sequence {x_n} converges to x in the l_p

norm. Given ε > 0, there is an N such that

Σ_{i=1}^k |ξ_i^n − ξ_i^m|^p ≤ ||x_n − x_m||^p < ε

for n, m > N and for each k. Letting m → ∞ in the finite sum on the left, we conclude that

Σ_{i=1}^k |ξ_i^n − ξ_i|^p ≤ ε

for n > N. Now letting k → ∞, we deduce that ||x_n − x||^p ≤ ε for n > N and, therefore, x_n → x. This completes the proof for 1 ≤ p < ∞.

Now let {x_n} be a Cauchy sequence in l_∞. Then |ξ_k^n − ξ_k^m| ≤ ||x_n − x_m|| → 0. Hence, for each k there is a real number ξ_k such that ξ_k^n → ξ_k. Furthermore, this convergence is uniform in k. Let x = {ξ_1, ξ_2, …}. Since {x_n} is Cauchy, there is a constant M such that ||x_n|| ≤ M for all n. Therefore, for each k and each n, we have |ξ_k^n| ≤ ||x_n|| ≤ M, from which it follows that x ∈ l_∞ and ||x|| ≤ M.

The convergence of x_n to x follows directly from the uniform convergence of ξ_k^n → ξ_k.

Example 5. L_p[0, 1], 1 ≤ p < ∞ is a Banach space. We do not prove the completeness of the L_p spaces because the proof requires a fairly thorough familiarity with Lebesgue integration theory. Consider instead the space R_p consisting of all functions x on [0, 1] for which |x|^p is Riemann integrable, with norm defined as

||x|| = (∫_0^1 |x(t)|^p dt)^{1/p}.

The normed space R_p is incomplete. It may be completed by adjoining to it certain additional functions derived from Cauchy sequences in R_p.

In this way, R_p is imbedded in a larger normed space which is complete. The smallest complete space containing R_p is L_p. A general method for completing a normed space is discussed in Problem 15.

Example 6. Given two normed spaces X, Y, we consider the product space X × Y consisting of ordered pairs (x, y) as defined in Section 2.2. The space X × Y can be normed in several ways, such as ||(x, y)|| = ||x|| + ||y||


or ||(x, y)|| = max {||x||, ||y||} but, unless specifically noted otherwise, we define the product norm as ||(x, y)|| = ||x|| + ||y||. It is simple to show that if X and Y are Banach spaces, the product space X × Y with the product norm is also a Banach space.

2.12 Complete Subsets

The definition of completeness has an obvious extension to subsets of a normed space: a subset is complete if every Cauchy sequence from the subset converges to a limit in the subset. The following theorem states that completeness and closure are equivalent in a Banach space. This is not so in a general normed space since, for example, a normed space is always closed but not necessarily complete.

Theorem 1. In a Banach space a subset is complete if and only if it is closed.

Proof. A complete subset is obviously closed since every convergent (and hence Cauchy) sequence has a limit in the subset. A Cauchy sequence from a closed subset has a limit somewhere in the Banach space. By closure the limit must be in the subset. ∎

The following theorem is of great importance in many applications.

Theorem 2. In a normed linear space, any finite-dimensional subspace is complete.

Proof. The proof is by induction on the dimension of the subspace. A one-dimensional subspace is complete since, in such a subspace, all elements have the form x = αe where α is an arbitrary scalar and e is a fixed vector. Convergence of a sequence α_n e is equivalent to convergence of the sequence of scalars {α_n} and, hence, completeness follows from the completeness of E^1.

Assume that the theorem is true for subspaces of dimension N − 1. Let X be a normed space and M an N-dimensional subspace of X. We show that M is complete.

Let {e_1, e_2, …, e_N} be a basis for M. For each k, define

δ_k = inf_{α_j} ||e_k − Σ_{j≠k} α_j e_j||.

The number δ_k is the distance from the vector e_k to the subspace M_k generated by the remaining N − 1 basis vectors. The number δ_k is greater than zero because otherwise a sequence of vectors in the (N − 1)-dimensional subspace M_k could be constructed converging to e_k. Such a sequence cannot exist since M_k is complete by the induction hypothesis.


Define δ > 0 as the minimum of the δ_k, k = 1, 2, …, N. Suppose that {x_n} is a Cauchy sequence in M. Each x_n has a unique representation as

x_n = Σ_{k=1}^N λ_k^n e_k.

For arbitrary n, m

||x_n − x_m|| = ||Σ_{k=1}^N (λ_k^n − λ_k^m) e_k|| ≥ |λ_k^n − λ_k^m| δ

for each k, 1 ≤ k ≤ N. Since ||x_n − x_m|| → 0, each |λ_k^n − λ_k^m| → 0. Thus, {λ_k^n}_{n=1}^∞ is a Cauchy sequence of scalars and hence convergent to a scalar λ_k. Let x = Σ_{k=1}^N λ_k e_k. Obviously, x ∈ M. We show that x_n → x. For all n, we have

||x_n − x|| = ||Σ_{k=1}^N (λ_k^n − λ_k) e_k|| ≤ N · max_{1≤k≤N} (|λ_k^n − λ_k| ||e_k||),

but since |λ_k^n − λ_k| → 0 for all k, ||x_n − x|| → 0. Thus, {x_n} converges to x ∈ M. ∎

*2.13 Extreme Values of Functionals and Compactness

Optimization theory is largely concerned with the maximization or minimization of real functionals over a given subset; indeed, a major portion of this book is concerned with principles for finding the points at which a given functional attains its maximum. A more fundamental question, however, is whether a functional has a maximum on a given set. In many cases the answer is easily established by inspection, but in others it is by no means obvious.

In finite-dimensional spaces the well-known Weierstrass theorem, which states that a continuous function defined on a closed and bounded (compact) set has a maximum and a minimum, is of great utility. Usually, this theorem alone is sufficient to establish the existence of a solution to a given optimization problem.

In this section we generalize the Weierstrass theorem to compact sets in a normed space, thereby obtaining a simple and yet general result applicable to infinite-dimensional problems. Unfortunately, however, the restriction to compact sets is so severe in infinite-dimensional normed spaces that the Weierstrass theorem can in fact only be employed in the minority of optimization problems. The theorem, however, deserves special attention in optimization theory if only because of the finite-dimensional version. The interested reader should also consult Section 5.10.


Actually, to prove the Weierstrass theorem it is not necessary to assume continuity of the functional but only upper semicontinuity. This added generality is often of great utility.

Definition. A (real-valued) functional f defined on a normed space X is said to be upper semicontinuous at x_0 if, given ε > 0, there is a δ > 0 such that f(x) − f(x_0) < ε for ||x − x_0|| < δ. A functional f is said to be lower semicontinuous at x_0 if −f is upper semicontinuous at x_0.

As the reader may verify, an equivalent definition is that f is upper semicontinuous at x_0 if³ lim sup_{x→x_0} f(x) ≤ f(x_0). Clearly, if f is both upper and lower semicontinuous, it is continuous.

Definition. A set K in a normed space X is said to be compact if, given an arbitrary sequence {x_i} in K, there is a subsequence {x_{i_n}} converging to an element x ∈ K.

In finite dimensions, compactness is equivalent to being closed and bounded but, as is shown below, this is not true in a general normed space. Note, however, that a compact set K must be complete since any Cauchy sequence from K must have a limit in K.
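A concrete witness to this failure in infinite dimensions (a supplementary sketch, not from the text) is the sequence of unit coordinate vectors in l_2: each has norm one, so the sequence lies in the closed unit ball, yet all pairwise distances equal √2, and no subsequence can converge:

```python
import math

def e(n, dim=10):
    # n-th unit coordinate vector of l_2, truncated to `dim` components
    v = [0.0] * dim
    v[n] = 1.0
    return v

def l2_dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# all distinct pairs are the same distance apart
dists = {round(l2_dist(e(n), e(m)), 12)
         for n in range(10) for m in range(10) if n != m}
print(dists)  # a single value, sqrt(2); no subsequence is Cauchy
```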

Theorem 1. (Weierstrass) An upper semicontinuous functional on a compact subset K of a normed linear space X achieves a maximum on K.

Proof. Let M = sup_{x∈K} f(x) (we allow the possibility M = ∞). There is a sequence {x_i} from K such that f(x_i) → M. Since K is compact, there is a convergent subsequence x_{i_n} → x ∈ K. Clearly, f(x_{i_n}) → M and, since f is upper semicontinuous, f(x) ≥ lim_n f(x_{i_n}) = M. Thus, since f(x) must be finite, we conclude that M < ∞ and that f(x) = M. ∎

We offer now an example of a continuous functional on the unit sphere of C[0, 1] which does not attain a maximum (thus proving that the unit sphere is not compact).

Example 1. Let the functional f be defined on C[0, 1] by

f(x) = ∫_0^{1/2} x(t) dt − ∫_{1/2}^1 x(t) dt.

It is easily verified that f is continuous since, in fact, |f(x)| ≤ ||x||. The supremum of f over the unit sphere in C[0, 1] is 1, but no continuous function of norm less than or equal to unity achieves this supremum. (If the problem

³ The lim sup of a functional on a normed space is the obvious extension of the corresponding definition for functions of a real variable.


were formulated in L_∞[0, 1], the supremum would be achieved by a function discontinuous at t = 1/2.)
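The near-maximizers can be exhibited numerically. The sketch below is a supplement, not from the text: it evaluates f on the continuous ramps x_n that equal 1 up to t = 1/2 − 1/n, fall linearly to −1 at t = 1/2 + 1/n, and equal −1 thereafter. Each has norm one and f(x_n) = 1 − 1/n → 1, while the value 1 itself is never attained by a continuous function:

```python
def step(n, t):
    # continuous surrogate for the discontinuous maximizer sgn(1/2 - t)
    return max(-1.0, min(1.0, (0.5 - t) * n))

def f(xfun, grid=200000):
    # midpoint-rule approximation of
    # f(x) = int_0^{1/2} x(t) dt - int_{1/2}^1 x(t) dt
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        t = (i + 0.5) * h
        total += (xfun(t) if t < 0.5 else -xfun(t)) * h
    return total

for n in (2, 10, 100):
    print(n, f(lambda t: step(n, t)))  # approaches 1 - 1/n
```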

Example 2. Suppose that in the above example we restrict our attention to those continuous functions in C[0, 1] within the unit sphere which are polynomials of degree n or less. The set of permissible elements in C[0, 1] is now a closed, bounded subset of the finite-dimensional space of n-th degree polynomials and is therefore compact. Thus Weierstrass's theorem guarantees the existence of a maximizing vector from this set.

*2.14 Quotient Spaces

Suppose we select a subspace M from a vector space X and generate linear varieties V in X by translations of M. The linear varieties obtained can be regarded as the elements of a new vector space called the quotient space of X modulo M and denoted X/M. If, for example, M is a plane through the origin in three-dimensional space, X/M consists of the family of planes parallel to M. We formalize this definition below.

Definition. Let M be a subspace of a vector space X. Two elements x_1, x_2 ∈ X are said to be equivalent modulo M if x_1 − x_2 ∈ M. In this case, we write x_1 ≡ x_2.

This equivalence relation partitions the space X into disjoint subsets, or classes, of equivalent elements: namely, the linear varieties that are distinct translates of the subspace M. These classes are often called the cosets of M. Given an arbitrary element x ∈ X, it belongs to a unique coset of M, which we denote⁴ by [x].

Definition. Let M be a subspace of a vector space X. The quotient space X/M consists of all cosets of M with addition and scalar multiplication defined by [x_1] + [x_2] = [x_1 + x_2], α[x] = [αx].

Several things, which we leave to the reader, need to be verified in order to justify the above definition. It must be shown that the definitions for addition and scalar multiplication are independent of the choice of representative elements, and the axioms for a vector space must be verified. These matters are easily proved, however, and it is not difficult to see that addition of cosets [x_1] and [x_2] merely amounts to addition of the corresponding linear varieties regarded as sets in X. Likewise, multiplication of a coset by a scalar α (except for α = 0) amounts to multiplication of the corresponding linear variety by α.

⁴ The notation [x] is also used for the subspace generated by x, but the usage is always clear from context.


Suppose now that X is a normed space and that M is a closed subspace of X. We define the norm of a coset [x] ∈ X/M by

||[x]|| = inf_{m∈M} ||x + m||,

i.e., ||[x]|| is the infimum of the norms of all elements in the coset [x]. The assumption that M is closed insures that ||[x]|| > 0 if [x] ≠ θ. Satisfaction of the other two axioms for a norm is easily verified. In the case of X being two dimensional, M one dimensional, and X/M consisting of parallel lines, the quotient norm of one of the lines is the minimum distance of the line from the origin.
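For the two-dimensional picture just described, the quotient norm can be computed directly. In the sketch below (a supplement; the choices X = E², the Euclidean norm, and M = span{(1, 1)} are mine), inf_{m∈M} ||x + m|| is approximated by a one-dimensional search and compared with the exact point-to-line distance:

```python
import math

def quotient_norm(x1, x2, steps=400000, span=100.0):
    # brute-force the infimum of ||x + m|| over m = t*(1, 1), t in [-span, span]
    best = float("inf")
    for i in range(steps + 1):
        t = -span + 2.0 * span * i / steps
        best = min(best, math.hypot(x1 + t, x2 + t))
    return best

x = (3.0, 1.0)
print(quotient_norm(*x))                # distance of the coset line from 0
print(abs(x[0] - x[1]) / math.sqrt(2))  # exact value for this norm and M
```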

Proposition 1. Let X be a Banach space, M a closed subspace of X, and X/M the quotient space with the quotient norm defined as above. Then X/M is also a Banach space.

The proof is left to the reader.

*2.15 Denseness and Separability

We conclude this chapter by introducing one additional topological concept, that of denseness.

Definition. A set D is said to be dense in a normed space X if for each element x ∈ X and each ε > 0 there exists d ∈ D with ||x − d|| < ε.


If D is dense in X, there are points of D arbitrarily close to each x ∈ X. Thus, given x, a sequence can be constructed from D which converges to x. It follows that equivalent to the above definition is the statement: D is dense in X if the closure of D is X.

The definition converse to the above is that of a nowhere dense set.

Definition. A set E is said to be nowhere dense in a normed space X if the closure of E contains no open set.

The classic example of a dense set is the set of rationals in the real line. Another example, provided by the well-known Weierstrass approximation theorem, is that the space of polynomials is dense in the space C[a, b].

Definition. A normed space is separable if it contains a countable dense set.

Most, but not all, of the spaces considered in this book are separable.

Example 1. The space E^n is separable. The collection of vectors x = (ξ_1, ξ_2, …, ξ_n) having rational components is countable and dense in E^n.


Example 2. The l_p spaces, 1 ≤ p < ∞, are separable. To prove separability, let D be the set consisting of all finitely nonzero sequences with rational components. D is easily seen to be countable. Let x = {ξ_1, ξ_2, …} be an element of l_p, 1 ≤ p < ∞, and fix ε > 0. Since Σ_{i=1}^∞ |ξ_i|^p < ∞, there is an N such that Σ_{i=N+1}^∞ |ξ_i|^p < ε/2. For each k, 1 ≤ k ≤ N, let r_k be a rational such that |ξ_k − r_k|^p < ε/2N; let d = {r_1, r_2, …, r_N, 0, 0, …}. Clearly, d ∈ D and ||x − d||^p < ε. Thus, D is dense in l_p.

The space l_∞ is not separable.

Example 3. The space C[a, b] is separable. Indeed, the countable set consisting of all polynomials with rational coefficients is dense in C[a, b]. Given x ∈ C[a, b] and ε > 0, it follows from the well-known Weierstrass approximation theorem that a polynomial p can be found such that |x(t) − p(t)| < ε/2 for all t ∈ [a, b]. Clearly, there is another polynomial r with rational coefficients such that |p(t) − r(t)| < ε/2 for all t ∈ [a, b] (r can be constructed by changing each coefficient by less than ε/2N, where N − 1 is the order of the polynomial p). Thus

||x − r|| = max_{t∈[a,b]} |x(t) − r(t)| ≤ max_{t∈[a,b]} |x(t) − p(t)| + max_{t∈[a,b]} |p(t) − r(t)| < ε.

Example 4. The L_p spaces, 1 ≤ p < ∞, are separable, but L_∞ is not separable. The particular space L_2 is considered in great detail in Chapter 3.

2.16 Problems

1. Prove Proposition 1, Section 2.2.

2. Show that in a vector space (−α)x = α(−x) = −(αx), α(x − y) = αx − αy, (α − β)x = αx − βx.

3. Let M and N be subspaces in a vector space. Show that [M ∪ N] = M + N.

4. A convex combination of the vectors x_1, x_2, …, x_n is a linear combination of the form α_1 x_1 + α_2 x_2 + … + α_n x_n where α_i ≥ 0 for each i and α_1 + α_2 + … + α_n = 1. Given a set S in a vector space, let K be the set of vectors consisting of all convex combinations from S. Show that K = co(S).

5. Let C and D be convex cones in a vector space. Show that C ∩ D and C + D are convex cones.

6. Prove that the union of an arbitrary collection of open sets is open and that the intersection of a finite collection of open sets is open.


7. Prove that the intersection of an arbitrary collection of closed sets is closed and that the union of a finite collection of closed sets is closed.

8. Show that the closure of a set S in a normed space is the smallest closed set containing S.

9. Let X be a normed linear space and let x_1, x_2, …, x_n be linearly independent vectors from X. For fixed y ∈ X, show that there are coefficients a_1, a_2, …, a_n minimizing ||y − a_1 x_1 − a_2 x_2 − … − a_n x_n||.

10. A normed space is said to be strictly normed if ||x + y|| = ||x|| + ||y|| implies that y = θ or x = αy for some α.
(a) Show that L_p[0, 1] is strictly normed for 1 < p < ∞.
(b) Show that if X is strictly normed the solution to Problem 9 is unique.

11. Prove the Hölder inequality for p = 1, ∞.

12. Suppose x = {ξ_1, ξ_2, …, ξ_n, …} ∈ l_{p_0} for some p_0, 1 ≤ p_0 < ∞.
(a) Show that x ∈ l_p for all p ≥ p_0.
(b) Show that ||x||_∞ = lim_{p→∞} ||x||_p.

13. Prove that the normed space D[a, b] is complete.

14. Two vector spaces, X and Y, are said to be isomorphic if there is a one-to-one mapping T of X onto Y such that T(α_1 x_1 + α_2 x_2) = α_1 T(x_1) + α_2 T(x_2). Show that any real n-dimensional space is isomorphic to E^n.

15. Two normed linear spaces, X and Y, are said to be isometrically isomorphic if they are isomorphic and if the corresponding one-to-one mapping T satisfies ||T(x)|| = ||x||. The object of this problem is to show that any normed space X is isometrically isomorphic to a dense subset of a Banach space X̂.
(a) Let X̄ be the set of all Cauchy sequences {x_n} from X with addition and scalar multiplication defined coordinatewise. Show that X̄ is a linear space.
(b) If y = {x_n} ∈ X̄, define ||y|| = sup_n ||x_n||. Show that with this definition, X̄ becomes a normed space.
(c) Let M be the subspace of X̄ consisting of all Cauchy sequences convergent to zero, and let X̂ = X̄/M. Show that X̂ has the required properties.

16. Let S, T be open sets in the normed spaces X and Y, respectively. Show that S × T = {(s, t) : s ∈ S, t ∈ T} is an open subset of X × Y under the usual product norm.

17. Let M be an m-dimensional subspace of an n-dimensional vector space X. Show that X/M is (n − m)-dimensional.

18. Let X be n-dimensional and Y be m-dimensional. What is the dimension of X × Y?


19. A real-valued functional |x|, defined on a vector space X, is called a seminorm if:
1. |x| ≥ 0 for all x ∈ X;
2. |αx| = |α|·|x|;
3. |x + y| ≤ |x| + |y|.
Let M = {x : |x| = 0} and show that the space X/M with ||[x]|| = inf_{m∈M} |x + m| is normed.

20. Let X be the space of all functions x on [0, 1] which vanish at all but a countable number of points and for which

||x|| = Σ_{n=1}^∞ |x(t_n)| < ∞,

where the t_n are the points at which x does not vanish.
(a) Show that X is a Banach space.
(b) Show that X is not separable.

21. Show that the Banach space l_∞ is not separable.

REFERENCES

There are a number of excellent texts on linear spaces that can be used to supplement this chapter.

§2.2. For the basic concepts of vector space, an established standard is Halmos [68].

§2.4. Convexity is an important aspect of a number of mathematical areas. For an introductory discussion, consult Eggleston [47] or, more advanced, Valentine [148].

§2.6-9. For general discussions of functional analysis, refer to any of the following. They are listed in approximate order of increasing level: Kolmogorov and Fomin [88], Simmons [139], Lusternik and Sobolev [101], Goffman and Pedrick [59], Kantorovich and Akilov [79], Taylor [145], Yosida [157], Riesz and Sz.-Nagy [123], Edwards [46], Dunford and Schwartz [45], and Hille and Phillips [73].

§2.10. For a readable discussion of l_p and L_p spaces and a general reference on analysis and integration theory, see Royden [131].

§2.13. For the Weierstrass approximation theorem, see Apostol [10].