Notes on Optimization and Pareto Optimality



John Duggan, University of Rochester

    June 21, 2010

    Contents

1 Opening Remarks

2 Unconstrained Optimization

3 Pareto Optimality
    3.1 Existence of Pareto Optimals
    3.2 Characterization with Concavity
    3.3 Characterization with Differentiability

4 Constrained Optimization

5 Equality Constraints
    5.1 First Order Analysis
    5.2 Examples
    5.3 Second Order Analysis
    5.4 Multiple Equality Constraints

6 Inequality Constraints
    6.1 First Order Analysis
    6.2 Concave Programming
    6.3 Second Order Analysis

7 Pareto Optimality Revisited

8 Mixed Constraints


    1 Opening Remarks

These notes were written for PSC 408, the second semester of the formal modeling sequence in the political science graduate program at the University of Rochester. I hope they will be a useful reference on optimization and Pareto optimality for political scientists, who otherwise would see very little of these subjects, and economists wanting deeper coverage than one gets in a typical first-year micro class. I do not invent any new theory, but I try to draw together results in a systematic way and to build up gradually from the basic problems of unconstrained optimization and optimization with a single equality constraint. That said, Theorem 8.2 may be a slightly new way of presenting results on convex optimization, and I've strived for quantity and quality of figures to aid intuition. As alternatives to these notes, I suggest Simon and Blume (1994), who cover a greater range of topics, and Sundaram (1996), who is more thorough and technically rigorous. Unfortunately, my notes are not entirely self-contained and do presume some sophistication with calculus and a bit of linear algebra and matrix algebra (not too much), and worse yet, I haven't been entirely consistent with notation for partial derivatives; I hope the meaning of my notation is clear from context.

    2 Unconstrained Optimization

Given $X \subseteq \mathbb{R}^n$, $f \colon X \to \mathbb{R}$, and $x \in X$, we say $x$ is a maximizer of $f$ if $f(x) = \max\{f(y) \mid y \in X\}$, i.e., for all $y \in X$, $f(x) \geq f(y)$. We say $x$ is a local maximizer of $f$ if there is some $\epsilon > 0$ such that for all $y \in X \cap B_\epsilon(x)$, $f(x) \geq f(y)$. And $x$ is a strict local maximizer of $f$ if the latter inequality holds strictly: there is some $\epsilon > 0$ such that for all $y \in X \cap B_\epsilon(x)$ with $y \neq x$, $f(x) > f(y)$. We sometimes use the term global maximizer to refer to a maximizer of $f$.

Our first result establishes a straightforward necessary condition for an interior local maximizer of a function: the derivative of the function at the local maximizer must be equal to zero. A direction $t \in \mathbb{R}^n$ is any vector with unit norm, i.e., $\|t\| = 1$.

Theorem 2.1 Let $X \subseteq \mathbb{R}^n$, let $x \in X$ be interior to $X$, and let $f \colon X \to \mathbb{R}$ be differentiable at $x$. If $x$ is a local maximizer of $f$, then for every direction $t$, $D_t f(x) = 0$.

Proof Suppose $x$ is an interior local maximizer, and let $t$ be an arbitrary direction. Pick $\epsilon > 0$ such that $B_\epsilon(x) \subseteq X$ and for all $y \in B_\epsilon(x)$, $f(x) \geq f(y)$. In particular, $f(x) \geq f(x + \alpha t)$ for $\alpha \in \mathbb{R}$ small. Then
\[ D_t f(x) = \lim_{\alpha \downarrow 0} \frac{f(x + \alpha t) - f(x)}{\alpha} \leq 0 \]
and
\[ D_t f(x) = \lim_{\alpha \uparrow 0} \frac{f(x + \alpha t) - f(x)}{\alpha} \geq 0, \]


    Figure 1: First order condition

and we conclude $D_t f(x) = 0$, as required.

In matrix terms, assuming $f$ is continuously differentiable, the necessary first order condition is written $Df(x)t = 0$. This is a generalization of the well-known first order condition from univariate calculus: if the graph of a function peaks at one point in the domain, then the graph of the function has slope equal to zero at that point. In general, we see that the derivative of the function in any direction (in multiple dimensions, there are many directions) must be zero. See Figure 1.

This suggests an approach for finding the maximizers of a function: we solve the first order condition on the interior of the domain and check to see if any of these solutions are maximizers. Now, however, solving the first order condition can itself be a rather complicated task. Considering the directions pointing along each axis (the unit coordinate vectors), we see that each partial derivative is equal to zero,

\[ D_1 f(x_1, \ldots, x_n) = 0 \]
\[ D_2 f(x_1, \ldots, x_n) = 0 \]
\[ \vdots \]
\[ D_n f(x_1, \ldots, x_n) = 0, \]
a system of $n$ equations in $n$ unknowns $(x_1, \ldots, x_n)$. Solving such a system can be straightforward or impossible (hopefully the former).

Example Letting $X = \mathbb{R}^2_+$ and $f(x) = x_1 x_2 - 2x_1^4 - x_2^2$, the first order condition is
\[ x_2 - 8x_1^3 = 0 \]
\[ x_1 - 2x_2 = 0. \]


Solving the second equation for $x_1$, we have $x_1 = 2x_2$. Substituting this into the first equation, we have $x_2 - 64x_2^3 = 0$, which has three solutions: $x_2 = 0, 1/8, -1/8$. Then the first order condition has three solutions,
\[ (x_1, x_2) = (0, 0), \ (1/4, 1/8), \ (-1/4, -1/8), \]
but the last of these is not in the domain of $f$, and the first is on the boundary of the domain. Thus, we have a unique solution in the interior of the domain: $(x_1, x_2) = (1/4, 1/8)$.
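As a quick numerical check (my own illustration, not part of the original notes, assuming the sympy library is available), the following sketch solves this first order condition symbolically and inspects the Hessian at the interior solution:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', real=True)
    f = x1*x2 - 2*x1**4 - x2**2

    # First order condition: both partial derivatives equal to zero.
    foc = [sp.diff(f, x1), sp.diff(f, x2)]
    print(sp.solve(foc, [x1, x2], dict=True))  # (0,0), (1/4,1/8), (-1/4,-1/8)

    # Hessian at the interior critical point (1/4, 1/8).
    H = sp.hessian(f, (x1, x2)).subs({x1: sp.Rational(1, 4), x2: sp.Rational(1, 8)})
    print(H, H.eigenvals())  # both eigenvalues negative, so negative definite here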

    The usual necessary second order condition from univariate calculus extends as well.

Theorem 2.2 Let $X \subseteq \mathbb{R}^n$, let $x \in X$ be interior to $X$, and let $f \colon X \to \mathbb{R}$ be twice differentiable. If $x$ is a local maximizer of $f$, then for every direction $t$, we have $D^2_t f(x) \leq 0$.

Proof Assume $x$ is an interior local maximizer of $f$, let $t$ be an arbitrary direction, and let $\epsilon > 0$ be such that $B_\epsilon(x) \subseteq X$ and for all $y \in B_\epsilon(x)$, $f(x) \geq f(y)$. Consider a sequence $\{\alpha_n\}$ such that $\alpha_n \downarrow 0$, so for sufficiently high $n$, we have $x + \alpha_n t \in B_\epsilon(x)$, and therefore $f(x + \alpha_n t) \leq f(x)$. For each such $n$, the mean value theorem yields $\beta_n \in (0, \alpha_n)$ such that
\[ D_t f(x + \beta_n t) = \frac{f(x + \alpha_n t) - f(x)}{\alpha_n}, \]
and therefore $D_t f(x + \beta_n t) \leq 0$. Furthermore, using $D_t f(x) = 0$, we have
\[ \frac{D_t f(x + \beta_n t) - D_t f(x)}{\beta_n} \leq 0. \]
Taking limits, we have $D^2_t f(x) \leq 0$, as required.

In matrix terms, assuming $f$ is twice continuously differentiable, the inequality in the previous theorem is written $t' D^2 f(x) t \leq 0$. That is, the Hessian of $f$ at $x$ is negative semi-definite. It is easy to see that the necessary second order condition is not sufficient.

Example Let $X = \mathbb{R}$ and $f(x) = x^3 - x^4$. Then $Df(0) = D^2 f(0) = 0$, but $x = 0$ is not a local maximizer.

As for functions of one variable, we can give sufficient conditions for a strict local maximizer. Although this result is of limited usefulness in finding a global maximizer, we will see that it can be of great use in comparative statics analysis, which investigates the dependence of maximizers on parameters.


Theorem 2.3 Let $X \subseteq \mathbb{R}^n$, let $x \in X$ be interior to $X$, and let $f \colon X \to \mathbb{R}$ be twice continuously differentiable. If $D_t f(x) = 0$ and $D^2_t f(x) < 0$ for every direction $t$, then $x$ is a strict local maximizer of $f$.

Proof Assume $x$ is interior, $D_t f(x) = 0$ and $D^2_t f(x) < 0$ for every direction $t$, and suppose that $x$ is not a strict local maximizer. Then for all $n$, there exists $x_n \in B_{1/n}(x)$ with $x_n \neq x$ such that $f(x_n) \geq f(x)$. For each $n$, letting $t_n = \frac{1}{\|x_n - x\|}(x_n - x)$, the mean value theorem yields $\beta_n \in (0, 1)$ such that
\[ D_{t_n} f(\beta_n x_n + (1 - \beta_n)x) = \frac{f(x_n) - f(x)}{\|x_n - x\|} \geq 0. \]
Then, letting $y_n = x + \beta_n(x_n - x)$, we have
\[ \frac{D_{t_n} f(y_n) - D_{t_n} f(x)}{\|y_n - x\|} \geq 0. \]
Since $\{t_n\}$ lies in the closed unit ball, a compact set, we may consider a convergent subsequence (still indexed by $n$) with limit $t$. Taking limits, we conclude that $D^2_t f(x) \geq 0$, a contradiction. We conclude that $x$ is a strict local maximizer.

In matrix terms, the inequality in the previous theorem is written $t' D^2 f(x) t < 0$ for every direction $t$; that is, the Hessian of $f$ at $x$ is negative definite.


To deal with the possibility that the domain is unbounded, let $y^*$ denote the limiting value of $f(x)$ as $\|x\| \to \infty$, where we assume for simplicity that this limit exists, and let
\[ \bar{y} = \sup\{f(x) : x \in \operatorname{bd} X\} \]
represent the highest value of $f$ on the boundary of its domain. Assume for simplicity that the function $f$ has at most a finite number of critical points. There are then three possibilities.

1. The first order condition has a unique solution $x^*$.

(a) If $D^2_t f(x^*) \geq 0$ for every direction $t$, then $x^*$ is the unique minimizer. An element $x$ is a maximizer if and only if it is a boundary point and $f(x) \geq \max\{y^*, \bar{y}\}$. There may be no maximizer.

(c) Else, $x^*$ is the unique maximizer if and only if $f(x^*) \geq \max\{y^*, \bar{y}\}$. If this inequality does not hold, then an element $x$ is a maximizer if and only if it is a boundary point and $f(x) \geq \max\{y^*, \bar{y}\}$. There may be no maximizer.

2. There are multiple solutions, say $x_1, \ldots, x_k$, to the first order condition. An element $x$ is a maximizer if and only if it is a critical point or boundary point and
\[ f(x) \geq \max\{y^*, \bar{y}, f(x_1), \ldots, f(x_k)\}. \]
There may be no maximizer.

3. The first order condition has no solution. An element $x$ is a maximizer if and only if it is a boundary point and $f(x) \geq \max\{y^*, \bar{y}\}$. There may be no maximizer.

If $X$ is compact, then $f$ has a maximizer, simplifying the situation somewhat.

Example Returning one last time to $X = \mathbb{R}^2_+$ and $f(x) = x_1 x_2 - 2x_1^4 - x_2^2$, we have noted that $(1/4, 1/8)$ is the unique interior solution to the first order condition and that the second directional derivatives of $f$ are non-positive at this point. Thus, we are in case 1(c). Note that when $x_1 = 0$ or $x_2 = 0$, we have $f(x_1, x_2) \leq 0$, and $f(0) = 0$. Thus, $\bar{y} = 0$. Furthermore, we claim that $y^* = -\infty$. To see this, take any value $c < 0$, and choose $a > 0$ large enough that $f(x_1, x_2) < c$ whenever $x_1 \geq a$ (this is possible because $f(x_1, x_2) \leq \frac{x_1^2}{4} - 2x_1^4$, which tends to $-\infty$ as $x_1 \to \infty$). Now consider any $x \in \mathbb{R}^2_+$ with $\|x\| = k$. If $x_1 \geq a$, then $f(x_1, x_2) < c$ by the choice of $a$. Otherwise, we have $x_1 < a$, which implies
\[ x_2 = \sqrt{k^2 - x_1^2} > \sqrt{k^2 - a^2}. \]
Then we have
\[ f(x_1, x_2) \leq -(x_2 - x_1)^2 + x_1^2 \leq -\left(\sqrt{k^2 - a^2} - a\right)^2 + a^2 < c \]
for high enough $k$, which establishes the claim. Then $f(1/4, 1/8) > 0 = \max\{y^*, \bar{y}\}$, and we conclude that $(1/4, 1/8)$ is the unique maximizer of this function.
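An informal numerical check of this conclusion (mine, not the author's, assuming numpy is available; the grid bounds are arbitrary):

    import numpy as np

    def f(x1, x2):
        return x1 * x2 - 2 * x1**4 - x2**2

    # Coarse grid over a box that is large relative to the critical point;
    # f tends to -infinity far from the origin, so a bounded box suffices here.
    grid = np.linspace(0.0, 5.0, 1001)
    X1, X2 = np.meshgrid(grid, grid)
    vals = f(X1, X2)
    i, j = np.unravel_index(np.argmax(vals), vals.shape)
    print(X1[i, j], X2[i, j], vals[i, j])   # approximately 0.25, 0.125
    print(f(0.25, 0.125))                    # value at the critical point: 1/128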

As for functions of one variable, matters are greatly simplified when we consider concave functions. Recall that a necessary condition for $f$ to be concave is that the Hessian matrix $D^2 f(x)$ be negative semi-definite at every interior $x$, i.e., $t' D^2 f(x) t \leq 0$ for every direction $t$. When the domain is open, negative semi-definiteness is also sufficient.

Theorem 2.4 Let $X \subseteq \mathbb{R}^n$ be convex, let $x \in X$ be interior to $X$, and let $f \colon X \to \mathbb{R}$ be differentiable at $x$ and concave. If $Df(x) = 0$, then $x$ is a global maximizer of $f$.

Proof Suppose $Df(x) = 0$ but $f(y) > f(x)$ for some $y \in X$. Let $t = \frac{1}{\|y - x\|}(y - x)$. Consider any sequence $\alpha_n \downarrow 0$, and let $x_n = x + \alpha_n t$. When $\alpha_n < \|y - x\|$, note that
\[ x_n = \frac{\|y - x_n\|}{\|y - x\|}\, x + \frac{\|x_n - x\|}{\|y - x\|}\, y \]
is a convex combination of $x$ and $y$. By concavity, we have
\[ \frac{f(x_n) - f(x)}{\|x_n - x\|} \geq \frac{\frac{\|y - x_n\|}{\|y - x\|} f(x) + \frac{\|x_n - x\|}{\|y - x\|} f(y) - f(x)}{\|x_n - x\|} = \frac{f(y) - f(x)}{\|y - x\|} > 0. \]
Taking limits, we have
\[ D_t f(x) \geq \frac{f(y) - f(x)}{\|y - x\|} > 0, \]
a contradiction.

Example Assume $X = \mathbb{R}^n$, and note that $f(x) = -\|x - y\|^2$, for a fixed $y \in \mathbb{R}^n$, is strictly concave. The first order condition has the unique solution $x = y$, and we conclude that this is the unique maximizer of the function. (Of course, we could have verified that directly.)

The next result lays the foundation for comparative statics analysis, in which we consider how local maximizers vary with respect to underlying parameters of the problem. Specifically, we study the effect of letting a parameter, say $\alpha$, vary in the objective function. Of course, if $x$ is a local maximizer given parameter $\alpha$, and then the value of the parameter changes to $\alpha'$, then $x$ may no longer be a local maximizer. But we will see that under the first order and second order sufficient conditions, its location will vary smoothly as we vary the parameter.

Theorem 2.5 Let $I$ be an open interval and $X \subseteq \mathbb{R}^n$, and let $f \colon X \times I \to \mathbb{R}$ be twice continuously differentiable in an open neighborhood around $(\hat{x}, \hat{\alpha})$, an interior point of $X \times I$. Assume $\hat{x}$ satisfies the first order condition at $\hat{\alpha}$, i.e., $D_x f(\hat{x}, \hat{\alpha}) = 0$. If $t' D^2_x f(\hat{x}, \hat{\alpha}) t < 0$ for every direction $t$, then there are an open interval $J \subseteq I$ around $\hat{\alpha}$ and a continuously differentiable mapping $\chi \colon J \to \mathbb{R}^n$ such that $\chi(\hat{\alpha}) = \hat{x}$ and, for every $\alpha \in J$, $\chi(\alpha)$ is a local maximizer of $f(\cdot, \alpha)$ satisfying
\[ D\chi(\alpha) = -\left[ D^2_x f(\chi(\alpha), \alpha) \right]^{-1} D_{x\alpha} f(\chi(\alpha), \alpha). \]


In case the matrix algebra in the preceding theorem is a bit hard to digest, we can state the derivative of $\chi$ in terms of partial derivatives when $n = 1$: it is
\[ D\chi(\alpha) = -\frac{D_{x\alpha} f(x, \alpha)}{D^2_x f(x, \alpha)}. \]
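To make the formula concrete, here is a small sympy sketch (my own illustration, not from the notes) for the single-variable objective $f(x, \alpha) = -x^2 + \alpha x$: the local maximizer is $\chi(\alpha) = \alpha/2$, and the formula reproduces its derivative.

    import sympy as sp

    x, a = sp.symbols('x alpha', real=True)
    f = -x**2 + a*x              # illustrative parameterized objective

    # Solve the first order condition for the maximizer as a function of alpha.
    chi = sp.solve(sp.diff(f, x), x)[0]      # chi(alpha) = alpha/2
    print(chi, sp.diff(chi, a))              # alpha/2, 1/2

    # Comparative statics formula evaluated at x = chi(alpha).
    formula = -sp.diff(f, x, a) / sp.diff(f, x, 2)
    print(sp.simplify(formula.subs(x, chi)))  # 1/2, matching the direct computation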

Given the parameterization $\chi$ of a local maximizer from the previous theorem, we can consider the locally maximized value of the objective function,
\[ F(\alpha) = f(\chi(\alpha), \alpha), \]
as a function of $\alpha$. Since $f$ is twice continuously differentiable and $\chi$ is continuously differentiable, it follows that $f(\chi(\alpha), \alpha)$ is continuously differentiable as a function of $\alpha$. The next result, known as the envelope theorem, provides a simple way of calculating the rate of change of the locally maximized objective function as a function of $\alpha$: basically, we take a simple partial derivative of the objective function with respect to $\alpha$, holding $x = \chi(\alpha)$ fixed. That is, although the location of the local maximizer may in fact change when we vary $\alpha$, we can ignore that variation, treating the local maximizer as fixed in taking the derivative.

Theorem 2.6 Let $I$ be an open interval and $X \subseteq \mathbb{R}^n$, and let $f \colon X \times I \to \mathbb{R}$ be twice continuously differentiable in an open neighborhood around $(\hat{x}, \hat{\alpha})$, an interior point of $X \times I$. Let $\chi \colon I \to \mathbb{R}^n$ be a continuously differentiable mapping such that for all $\alpha \in I$, $\chi(\alpha)$ is a local maximizer of $f$ at $\alpha$. Let $\hat{x} = \chi(\hat{\alpha})$, and define the mapping $F \colon I \to \mathbb{R}$ by
\[ F(\alpha) = f(\chi(\alpha), \alpha) \]
for all $\alpha \in I$. Then $F$ is continuously differentiable and $DF(\hat{\alpha}) = D_\alpha f(\hat{x}, \hat{\alpha})$.

Proof For all $\alpha \in I$, the chain rule implies
\[ DF(\alpha) = D_x f(\chi(\alpha), \alpha)\, D\chi(\alpha) + D_\alpha f(\chi(\alpha), \alpha). \]
Since $\chi(\alpha)$ is a local maximizer at $\alpha$, the first order condition $D_x f(\chi(\alpha), \alpha) = 0$ obtains, and the above expression simplifies as required.
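A quick numerical illustration of the envelope theorem (mine, with the same toy objective $f(x, \alpha) = -x^2 + \alpha x$ used above): the value function is $F(\alpha) = \alpha^2/4$, and its derivative agrees with the partial derivative of $f$ with respect to $\alpha$ holding $x = \chi(\alpha)$ fixed.

    import sympy as sp

    x, a = sp.symbols('x alpha', real=True)
    f = -x**2 + a*x
    chi = sp.solve(sp.diff(f, x), x)[0]   # maximizer chi(alpha) = alpha/2

    F = f.subs(x, chi)                    # value function F(alpha) = alpha**2/4
    lhs = sp.diff(F, a)                   # direct derivative of the value function
    rhs = sp.diff(f, a).subs(x, chi)      # envelope theorem: partial wrt alpha at x = chi(alpha)
    print(sp.simplify(lhs - rhs))         # 0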

    3 Pareto Optimality

A set $N = \{1, 2, \ldots, n\}$ of individuals must choose from a set $A$ of alternatives. Assume each individual $i$'s preferences over alternatives are represented by a utility function $u_i \colon A \to \mathbb{R}$. One alternative $y$ Pareto dominates another alternative $x$ if $u_i(y) \geq u_i(x)$ for all $i$, with strict inequality for at least one individual.


    Figure 2: Pareto optimals with Euclidean preferences

An alternative is Pareto optimal if there is no alternative that Pareto dominates it.

Consider the case of two individuals and quadratic utility, i.e., $u_i(x) = -\|x - \tilde{x}_i\|^2$, where $\tilde{x}_i$ is individual $i$'s ideal point, and an alternative $x$, as in Figure 2. It is clear that any alternative in the shaded lens is strictly preferred to $x$ by both individuals, which implies that $x$ is Pareto dominated and, therefore, not Pareto optimal. In fact, this will be true whenever the individuals' indifference curves through an alternative create a lens shape like this. The only way that the individuals' indifference curves won't create such a lens is if they meet at a tangency at the alternative $x$, and this happens only when $x$ lies directly between the two individuals' ideal points. We conclude that, when there are just two individuals and both have Euclidean preferences, the set of Pareto optimal alternatives is the line connecting the two ideal points. See Figure 3 for elliptical indifference curves, in which case the set of Pareto optimal alternatives is a curve connecting the two ideal points. This motivates the standard terminology: when there are just two individuals, we refer to the set of Pareto optimal alternatives as the contract curve.

    3.1 Existence of Pareto Optimals

It is straightforward to provide a sufficient condition for Pareto optimality of an alternative in terms of social welfare maximization with weights $\lambda_1, \ldots, \lambda_n$ on the utilities of individuals.

Theorem 3.1 Let $x \in A$, and let $\lambda_1, \ldots, \lambda_n > 0$ be positive weights for each individual. If $x$ solves
\[ \max_{y \in A} \sum_{i=1}^{n} \lambda_i u_i(y), \]
then $x$ is Pareto optimal.


    Figure 3: More Pareto optimals

Proof Suppose $x$ solves the above maximization problem but there is some alternative $y$ that Pareto dominates it. Since $u_i(y) \geq u_i(x)$ for each $i$, each term $\lambda_i u_i(y)$ is at least as great as $\lambda_i u_i(x)$. And since there is some individual, say $j$, such that $u_j(y) > u_j(x)$, and since $\lambda_j > 0$, there is at least one $y$-term that is strictly greater than the corresponding $x$-term. But then
\[ \sum_{i=1}^{n} \lambda_i u_i(y) > \sum_{i=1}^{n} \lambda_i u_i(x), \]
a contradiction.

From the preceding sufficient condition, we can then deduce the existence of at least one Pareto optimal alternative very generally.

Theorem 3.2 Assume $A \subseteq \mathbb{R}^d$ is compact and each $u_i$ is continuous. Then there exists a Pareto optimal alternative.

Proof Define the function $f \colon A \to \mathbb{R}$ by $f(x) = \sum_{i=1}^{n} \lambda_i u_i(x)$ for all $x$, where each $\lambda_i > 0$. Note that $f$ is continuous, and so it achieves a maximum over the compact set $A$. Letting $x$ be a maximizer, Theorem 3.1 implies this alternative is Pareto optimal.
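A minimal computational sketch of this idea (my own, with made-up utility numbers, assuming numpy is available; the helper pareto_dominated is hypothetical and written only for this check): over a finite set of alternatives, maximizing any strictly positively weighted sum of utilities selects a Pareto optimal alternative.

    import numpy as np

    # Each row is an alternative, each column an individual's utility (made-up numbers).
    utilities = np.array([
        [3.0, 1.0],
        [2.0, 2.0],
        [1.0, 3.0],
        [1.5, 1.5],   # Pareto dominated by the second alternative
    ])
    weights = np.array([0.7, 0.3])   # any strictly positive weights will do

    best = int(np.argmax(utilities @ weights))
    print(best, utilities[best])

    def pareto_dominated(k, U):
        """True if some row of U weakly dominates row k with at least one strict gain."""
        other = np.delete(U, k, axis=0)
        return bool(np.any(np.all(other >= U[k], axis=1) & np.any(other > U[k], axis=1)))

    print(pareto_dominated(best, utilities))  # False: the welfare maximizer is Pareto optimal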

We have shown that if an alternative maximizes the sum of utilities for strictly positive weights, then it is Pareto optimal. The next result imposes Euclidean structure on the set of alternatives and individual utilities, namely strict quasi-concavity, and strengthens the result of Theorem 3.1 by weakening the sufficient condition to allow some weights to be zero.


Theorem 3.3 Assume $A \subseteq \mathbb{R}^d$ is convex and each $u_i$ is strictly quasi-concave. If there exist weights $\lambda_1, \ldots, \lambda_n \geq 0$ (not all zero) such that $x$ solves
\[ \max_{y \in A} \sum_{i=1}^{n} \lambda_i u_i(y), \]
then it is Pareto optimal.

Proof Suppose $x$ maximizes the weighted sum of utilities over $A$ but is Pareto dominated by some alternative $z$. In particular, $u_i(z) \geq u_i(x)$ for each $i$. Define $w = \frac{1}{2}x + \frac{1}{2}z$, and note that convexity of $A$ implies $w \in A$. Furthermore, strict quasi-concavity implies $u_i(w) > \min\{u_i(x), u_i(z)\} = u_i(x)$ for all $i$. Since the weights $\lambda_i$ are non-negative, we have $\lambda_i u_i(w) \geq \lambda_i u_i(x)$ for all $i$, and since $\lambda_i > 0$ for at least one individual, the latter inequality is strict for at least one individual. But then
\[ \sum_{i=1}^{n} \lambda_i u_i(w) > \sum_{i=1}^{n} \lambda_i u_i(x), \]
a contradiction. We conclude that $x$ is Pareto optimal.

Our sufficient condition for Pareto optimality for general utilities, Theorem 3.1, relies on all coefficients $\lambda_i$ being strictly positive, while Theorem 3.3 weakens this for strictly quasi-concave utilities to at least one positive $\lambda_i$. In general, we cannot state a sufficient condition that allows some coefficients to be zero, even if we replace strict quasi-concavity with concavity.

Example Let there be two individuals, $A = [0, 1]$, $u_1(x) = x$, and $u_2(x) = 0$. These utilities are concave, and $x = 0$ maximizes $\lambda_1 u_1(x) + \lambda_2 u_2(x)$ with weights $\lambda_1 = 0$ and $\lambda_2 = 1$, but it is obviously not Pareto optimal.

In the latter example, of course, the problem $\max_{x \in [0,1]} \lambda_1 u_1(x) + \lambda_2 u_2(x)$ (with $\lambda_1 = 0$ and $\lambda_2 = 1$) has multiple (in fact, an infinite number of) solutions. Next, we provide a different sort of sufficient condition, relying on uniqueness of solutions to the social welfare problem, for Pareto optimality.

Theorem 3.4 Assume that for weights $\lambda_1, \ldots, \lambda_n \geq 0$ (not all zero), the problem
\[ \max_{y \in A} \sum_{i=1}^{n} \lambda_i u_i(y) \]
has a unique solution. If $x$ solves the above maximization problem, then it is Pareto optimal.


    Figure 4: Utility imputations

The proof is trivial. Suppose that the conditions of the theorem hold and $x$ solves the problem but is not Pareto optimal; then there is a distinct alternative $y$ that provides each individual with utility no lower than at $x$, and hence $y$ is another solution to the problem, a contradiction.

    3.2 Characterization with Concavity

As yet, we have derived sufficient (but not necessary) conditions for Pareto optimality. To provide a more detailed characterization of the Pareto optimal alternatives under convexity and concavity conditions, we define the set of utility imputations as
\[ U = \{ z \in \mathbb{R}^n : \text{there exists } x \in A \text{ s.t. } (u_1(x), \ldots, u_n(x)) \geq z \}. \]
Intuitively, given an alternative $x$, we may consider the vector $(u_1(x), \ldots, u_n(x))$ of utilities generated by $x$. Note that this vector lies in $\mathbb{R}^n$, which has number of dimensions equal to the number of individuals. The set of utility imputations consists of all such utility vectors, as well as any vectors less than or equal to them. See Figure 4 for the $n = 2$ case.

The next lemma gives some useful technical properties of the set of utility imputations. In particular, assuming the set of alternatives is convex and utilities are concave, it establishes that the set $U$ of imputations is convex. See Figure 4.

Lemma 3.5 Assume $A \subseteq \mathbb{R}^d$ is convex and each $u_i$ is concave. Then $U$ is convex. Furthermore, if each $u_i$ is strictly concave, then for all distinct $x, y \in A$ and all $\alpha \in (0, 1)$, there exists $z \in U$ such that
\[ z > \alpha(u_1(x), \ldots, u_n(x)) + (1 - \alpha)(u_1(y), \ldots, u_n(y)). \]

Proof Take distinct $z, z' \in U$ and any $\alpha \in (0, 1)$. There exist $x, x' \in A$ such that
\[ (u_1(x), \ldots, u_n(x)) \geq z \quad \text{and} \quad (u_1(x'), \ldots, u_n(x')) \geq z'. \]
Since $A$ is convex, we have $x'' = \alpha x + (1 - \alpha)x' \in A$. By concavity of $u_i$, we have
\[ u_i(x'') \geq \alpha u_i(x) + (1 - \alpha) u_i(x') \geq \alpha z_i + (1 - \alpha) z'_i \]
for all $i \in N$. Setting $z'' = (u_1(x''), \ldots, u_n(x''))$, we have $z'' \geq \alpha z + (1 - \alpha) z'$, which implies $\alpha z + (1 - \alpha) z' \in U$. Therefore, $U$ is convex. Now assume each $u_i$ is strictly concave, and consider any distinct $x, x' \in A$ and any $\alpha \in (0, 1)$. Borrowing the above notation, strict concavity implies
\[ u_i(x'') > \alpha u_i(x) + (1 - \alpha) u_i(x'), \]
which implies
\[ z'' > \alpha(u_1(x), \ldots, u_n(x)) + (1 - \alpha)(u_1(x'), \ldots, u_n(x')), \]
as required.

Next, assuming utilities are concave, we derive a necessary condition for Pareto optimality: if an alternative $x$ is Pareto optimal, then there is a vector of non-negative weights $\lambda = (\lambda_1, \ldots, \lambda_n)$ (not all zero) such that $x$ maximizes the sum of individual utilities with those weights. Note that we do not claim that $x$ must maximize the sum of utilities with strictly positive weights.

Theorem 3.6 Assume $A \subseteq \mathbb{R}^d$ is convex and each $u_i$ is concave. If $x$ is Pareto optimal, then there exist weights $\lambda_1, \ldots, \lambda_n \geq 0$ (not all zero) such that $x$ solves
\[ \max_{y \in A} \sum_{i=1}^{n} \lambda_i u_i(y). \]

Proof Assume that $x$ is Pareto optimal, and define the set
\[ V = \{z \in \mathbb{R}^n : z > (u_1(x), \ldots, u_n(x))\} \]
of vectors strictly greater than the utility vector $(u_1(x), \ldots, u_n(x))$ in each coordinate. For the remainder of the proof, let $z^* = (u_1(x), \ldots, u_n(x))$ be the utility vector associated with $x$. The set $V$ is nonempty, convex, and open (and so has nonempty interior). The set $U$ of imputations is nonempty and, by Lemma 3.5, convex. Note that $U \cap V = \emptyset$, for suppose otherwise. Then there exists $z \in U \cap V$, which implies the existence of $x' \in A$ such that
\[ (u_1(x'), \ldots, u_n(x')) \geq z > z^*. \]
But then $u_i(x') > u_i(x)$ for all $i \in N$, contradicting our assumption that $x$ is Pareto optimal. Therefore, by the separating hyperplane theorem, there is a hyperplane $H$ that separates $U$ and $V$. Let $H$ be generated by the linear function $f$ at value $c$, and let $\lambda = (\lambda_1, \ldots, \lambda_n) \in \mathbb{R}^n$ be the non-zero gradient of $f$. Then we may assume without loss of generality that for all $z \in U$ and all $w \in V$, we have $f(z) \leq c \leq f(w)$, i.e., $\lambda \cdot z \leq c \leq \lambda \cdot w$. We claim that $\lambda \cdot z^* = c$, and in particular that $x$ solves the maximization problem in the theorem. Since $z^* \in U$, it follows immediately that $f$ evaluated at this vector is less than or equal to $c$. Suppose it is strictly less, i.e., $\lambda \cdot z^* < c$. Given $\epsilon > 0$, define $w = z^* + \epsilon(1, 1, \ldots, 1)$, and note that $w \in V$, and therefore $\lambda \cdot w \geq c$. But for $\epsilon$ sufficiently small, we in fact have $\lambda \cdot w < c$, a contradiction. That $x$ solves the maximization problem in the theorem then follows immediately: for all $x' \in A$, we have $(u_1(x'), \ldots, u_n(x')) \in U$, and then
\[ \lambda \cdot (u_1(x'), \ldots, u_n(x')) \leq c = \lambda \cdot z^*, \]
or equivalently,
\[ \sum_{i \in N} \lambda_i u_i(x') \leq \sum_{i \in N} \lambda_i u_i(x), \]
as claimed. Finally, we claim that $\lambda \in \mathbb{R}^n_+$, i.e., $\lambda_i \geq 0$ for all $i \in N$. To see this, suppose that $\lambda_i < 0$ for some $i$. Then we may define the vector $w = z^* + \gamma e_i$, and for high enough $\gamma$, we have
\[ \lambda \cdot w = \lambda \cdot z^* + \gamma \lambda_i < \lambda \cdot z^*. \]
For all $\epsilon > 0$, we have $w + \epsilon(1, 1, \ldots, 1) \in V$, and therefore $\lambda \cdot (w + \epsilon(1, 1, \ldots, 1)) \geq c$. But we may choose $\epsilon > 0$ sufficiently small that $\lambda \cdot (w + \epsilon(1, 1, \ldots, 1)) < \lambda \cdot z^* = c$, a contradiction. Thus, $\lambda \in \mathbb{R}^n_+ \setminus \{0\}$.

The proof of the previous result uses the separating hyperplane theorem and the following insight. We can think of the social welfare function above as merging two steps: first we apply individual utility functions to an alternative $x$ to get a vector, say $z = (z_1, \ldots, z_n)$, of individual utilities, and then we take the dot product $\lambda \cdot z$ to get the social welfare from $x$. Of course, dot products are equivalent to linear functions, so we can view the second step as applying a linear function $f \colon \mathbb{R}^n \to \mathbb{R}$ to the vector of utilities. Geometrically, when $n = 2$, we can draw the level sets of the linear function, and if $x$ maximizes social welfare with weights $\lambda$, then the vector of utilities from $x$, denoted $(u_1(x), \ldots, u_n(x))$, must maximize the linear function over the set $U$ of utility imputations. See Figure 4.


    Figure 5: No strictly positive weights

With the previous result, this provides a complete characterization of Pareto optimality (under appropriate convexity/concavity conditions) in terms of optimization theory.

Corollary 3.7 Assume $A \subseteq \mathbb{R}^d$ is convex and each $u_i$ is strictly concave. Then $x$ is Pareto optimal if and only if there exist weights $\lambda_1, \ldots, \lambda_n \geq 0$ (not all zero) such that $x$ solves
\[ \max_{y \in A} \sum_{i=1}^{n} \lambda_i u_i(y). \]

The condition that the weights are non-negative but not all zero cannot be strengthened to the condition that they are all strictly positive in the necessary condition of Theorem 3.6 and Corollary 3.7.

Example Suppose there are two individuals who must choose an alternative in the unit interval, $A = [0, 1]$, with quadratic utilities: $u_1(x) = -x^2$ and $u_2(x) = -(1 - x)^2$. Then $x = 1$ is Pareto optimal, yet there do not exist strictly positive weights $\lambda_1, \lambda_2 > 0$ such that $x$ maximizes $\lambda_1 u_1(y) + \lambda_2 u_2(y)$ over $A$. See Figure 5. Given any strictly positive weights $\lambda_1$ and $\lambda_2$, the level set through $(u_1(1), u_2(1)) = (-1, 0)$ of the linear function with gradient $(\lambda_1, \lambda_2)$ cuts through the set of utility imputations; thus, $(u_1(1), u_2(1))$ does not maximize the linear function over the set of imputations.

The previous corollary uses the assumption of strict concavity to provide a full characterization of Pareto optimality. It is simple to deduce a more general conclusion that relies instead on the uniqueness condition of Theorem 3.4.


Corollary 3.8 Assume $A \subseteq \mathbb{R}^d$ is convex and each $u_i$ is concave. Furthermore, assume that for all weights $\lambda_1, \ldots, \lambda_n \geq 0$ (not all zero), the problem
\[ \max_{y \in A} \sum_{i=1}^{n} \lambda_i u_i(y) \]
has a unique solution. Then $x$ is Pareto optimal if and only if there exist weights $\lambda_1, \ldots, \lambda_n \geq 0$ (not all zero) such that $x$ solves the above maximization problem.

One direction follows immediately from Theorem 3.6. For the other, under the conditions of the corollary, suppose $x$ solves the maximization problem for some non-negative weights (not all zero). Then Theorem 3.4 implies $x$ is Pareto optimal, as required.

With the necessary condition for Pareto optimality established in Theorem 3.6, we can use calculus techniques to calculate contract curves in simple examples with two individuals. Let $x \in \operatorname{int} A$ be Pareto optimal, which therefore maximizes $\lambda_1 u_1(x) + \lambda_2 u_2(x)$ for some $\lambda_1, \lambda_2 \geq 0$ such that $\lambda_1 + \lambda_2 > 0$. Then the first order necessary condition holds, and for all coordinates $j, k = 1, \ldots, d$, we have
\[ \lambda_1 D_j u_1(x) + \lambda_2 D_j u_2(x) = 0 \]
\[ \lambda_1 D_k u_1(x) + \lambda_2 D_k u_2(x) = 0. \]
Note that when $D_k u_1(x) \neq 0$ and $D_k u_2(x) \neq 0$, we have
\[ \frac{D_j u_1(x)}{D_k u_1(x)} = \frac{D_j u_2(x)}{D_k u_2(x)}. \]
That is, the marginal rates of substitution of $k$ for $j$ are equal for the two individuals, i.e., their indifference curves are tangent, as in Figures 2 and 3. And although the machinery we have developed thus far requires the utilities $u_1$ and $u_2$ in the preceding discussion to be concave, we will see that the analysis extends more generally.

Example Suppose $A = \mathbb{R}^d$ and each $u_i$ is quadratic, i.e., $u_i(x) = -\|x - \tilde{x}_i\|^2$, where $\tilde{x}_i$ is individual $i$'s ideal point. Since quadratic utilities are strictly concave, it follows that $x$ is Pareto optimal if and only if there exist weights $\lambda_1, \ldots, \lambda_n \geq 0$ (not all zero) such that $x$ solves
\[ \max_{y \in A} \sum_{i=1}^{n} \lambda_i u_i(y). \]
Furthermore, since each $u_i$ is strictly concave, the function $\sum_{i=1}^{n} \lambda_i u_i(x)$ is strictly concave, so $x$ is a solution to the above maximization problem if and only if it solves the first order condition
\[ 0 = D \sum_{i=1}^{n} \lambda_i u_i(x) = \sum_{i=1}^{n} 2\lambda_i (\tilde{x}_i - x), \]
or
\[ x = \sum_{i=1}^{n} \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}\, \tilde{x}_i. \]
Finally, writing $\delta_i = \frac{\lambda_i}{\sum_{j=1}^{n} \lambda_j}$, we have $\delta_i \geq 0$ for all $i$, $\sum_{i=1}^{n} \delta_i = 1$, and
\[ x = \sum_{i=1}^{n} \delta_i \tilde{x}_i, \]
i.e., $x$ is a convex combination of ideal points with weights $\delta_i$. This gives us a characterization of all of the Pareto optimal alternatives: an alternative is Pareto optimal if and only if it is a convex combination of individual ideal points. That is, we connect the exterior ideal points to create an enclosed space, and the Pareto optimals consist of that boundary and the area within. See Figure 6.

Figure 6: Convex hull of ideal points
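A small numerical sketch of this characterization (my own illustration, assuming numpy and scipy are available; the ideal points and weights are randomly drawn): maximizing the weighted sum of quadratic utilities lands exactly on the corresponding convex combination of ideal points.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(0)
    ideal = rng.normal(size=(3, 2))      # three individuals, alternatives in R^2
    lam = np.array([0.5, 0.3, 0.2])      # non-negative weights, not all zero

    def neg_welfare(x):
        # negative of sum_i lam_i * (-||x - ideal_i||^2), for use with a minimizer
        return np.sum(lam * np.sum((x - ideal)**2, axis=1))

    res = minimize(neg_welfare, x0=np.zeros(2))
    print(res.x)                          # numerical welfare maximizer
    print(lam @ ideal / lam.sum())        # convex combination of ideal points: same point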

We rely only on ordinal information contained in utility representations, and any utility representation $u_i$ is equivalent, for our purposes, to an infinite number of others resulting from monotonic transformations of $u_i$. This may seem to run counter to the result just described: if $x$ maximizes social welfare with weights $(\lambda_1, \ldots, \lambda_n)$ for one specification of utility representations, $u_1, \ldots, u_n$, then there is no cause to think it will maximize social welfare with those weights for a different specification, say $5u_1, u_2^3, \ln(u_3), \ldots$. Indeed, it may not. But if we take monotonic transformations of the original utility functions, $x$ will still be Pareto optimal, and there will still exist weights, say $(\lambda'_1, \ldots, \lambda'_n)$, for which $x$ maximizes social welfare. In short, Theorem 3.6 says that a Pareto optimal alternative will maximize social welfare for suitably chosen weights, but those weights may depend on the precise specification of utility functions.


    3.3 Characterization with Differentiability

When utilities are differentiable, we can sharpen the characterization of the previous subsection. We first note that at an interior Pareto optimal alternative, the gradients of the individuals' utilities are linearly dependent.

Theorem 3.9 Assume $A \subseteq \mathbb{R}^d$, and let $x$ be interior to $A$. Assume each $u_i$ is differentiable at $x$ and that $x$ is Pareto optimal. Then there exist $\lambda_1, \ldots, \lambda_n \geq 0$ (not all zero) such that $\sum_{i=1}^{n} \lambda_i Du_i(x) = 0$.

Proof If there do not exist such weights, then $0 \notin \operatorname{conv}\{Du_1(x), \ldots, Du_n(x)\}$. Then by the separating hyperplane theorem, there is a non-zero vector $p \in \mathbb{R}^d$ such that $p \cdot Du_1(x) > 0, \ldots, p \cdot Du_n(x) > 0$. Then there exists $\epsilon > 0$ such that $x + \epsilon p \in A$ and $u_i(x + \epsilon p) > u_i(x)$ for all $i$, contradicting Pareto optimality of $x$.

An easy implication of Theorem 3.9 is a differentiable version of Theorem 3.6. Indeed, if each $u_i$ is differentiable and concave and $x$ is Pareto optimal, then there are weights $\lambda_1, \ldots, \lambda_n \geq 0$ such that $x$ satisfies the first order condition for $\max_{y \in A} \sum_{i=1}^{n} \lambda_i u_i(y)$, and by concavity, $x$ solves the maximization problem.

We can take a geometric perspective by defining the mapping $u \colon X \to \mathbb{R}^n$ from alternatives to vectors of utilities, i.e., $u(x) = (u_1(x), \ldots, u_n(x))$. Then the derivative of $u$ at $x$ is the matrix
\[ Du(x) = \begin{pmatrix} \frac{\partial u_1}{\partial x_1}(x) & \cdots & \frac{\partial u_1}{\partial x_d}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial u_n}{\partial x_1}(x) & \cdots & \frac{\partial u_n}{\partial x_d}(x) \end{pmatrix}. \]

The span of the columns is a linear subspace of $\mathbb{R}^n$ called the tangent space of $u$ at $x$. Theorem 3.9 implies that at a Pareto optimal alternative, the rank of this derivative is $n - 1$ or less. By Pareto optimality, $u(x)$ belongs to the boundary of $u(X)$. Furthermore, the theorem implies

\[ (\lambda_1, \ldots, \lambda_n) \begin{pmatrix} \frac{\partial u_1}{\partial x_1}(x) & \cdots & \frac{\partial u_1}{\partial x_d}(x) \\ \vdots & \ddots & \vdots \\ \frac{\partial u_n}{\partial x_1}(x) & \cdots & \frac{\partial u_n}{\partial x_d}(x) \end{pmatrix} = 0, \]

so the tangent space has a normal vector $(\lambda_1, \ldots, \lambda_n)$ with non-negative coordinates.

The weights in Theorem 3.9 cannot be unique: if weights $(\lambda_1, \ldots, \lambda_n)$ fulfill the theorem, then any positive scaling of the weights does as well. But when the derivative $Du(x)$ has rank $n - 1$, the weights are unique up to a positive scalar. Indeed, when the derivative has rank $n - 1$, the tangent space at $u(x)$ is a hyperplane of dimension $n - 1$, e.g., it is a tangent line when $n = 2$ and a tangent plane when $n = 3$. See Figure 7 for the three-individual case. Then the normal space is one-dimensional, and the uniqueness claim follows.

Figure 7: Unique weights

Theorem 3.10 Assume $A \subseteq \mathbb{R}^d$, and let $x$ be interior to $A$. Assume each $u_i$ is differentiable at $x$, that $x$ is Pareto optimal, and that $Du(x)$ has rank $n - 1$. Then there exist $\lambda_1, \ldots, \lambda_n \geq 0$ (not all zero) such that $\sum_{i=1}^{n} \lambda_i Du_i(x) = 0$, and these weights are unique up to a positive scaling.

The rank condition used in the previous result, while reasonable in some contexts, is restrictive; it implies, for example, that the set of alternatives has dimension at least $n - 1$. Note that the condition that the weights are non-negative and not all zero implies that the tangent line at $u(x)$ is downward sloping when $n = 2$, and it formalizes the idea that the boundary of $u(X)$ at $u(x)$ is downward sloping for any number of individuals.

    4 Constrained Optimization

A constrained maximization problem is one in which we search for a maximizer within a constraint set $C \subseteq \mathbb{R}^n$. Given domain $X \subseteq \mathbb{R}^n$, constraint set $C \subseteq \mathbb{R}^n$, and objective function $f \colon X \to \mathbb{R}$, the problem is
\[ \max_{x \in X} f(x) \quad \text{subject to } x \in C. \]
That is, we want a vector $x \in X \cap C$ such that for all $y \in X \cap C$, $f(x) \geq f(y)$. An element $x \in X \cap C$ is a constrained local maximizer of $f$ subject to $C$ if there exists some $\epsilon > 0$ such that for all $y \in B_\epsilon(x) \cap X \cap C$, $f(x) \geq f(y)$.

Similarly, an element $x \in X \cap C$ is a constrained strict local maximizer of $f$ subject to $C$ if there exists some $\epsilon > 0$ such that for all $y \in B_\epsilon(x) \cap X \cap C$ with $y \neq x$, we have $f(x) > f(y)$.

As long as $f$ is continuous and $X \cap C$ is nonempty and compact, there is at least one (global) constrained maximizer.

We will first consider constraint sets $C$ taking the form of a single equality constraint:
\[ C = \{x \in \mathbb{R}^n \mid g(x) = c\}, \]
where $g \colon \mathbb{R}^n \to \mathbb{R}$ is any function, and $c \in \mathbb{R}$ is a fixed value of $g$. We write a maximization problem subject to such a constraint as
\[ \max_{x \in X} f(x) \quad \text{s.t. } g(x) = c. \]

You might think of $g(x)$ as a cost and $c$ as a pre-determined budget. The latter formulation is unrestrictive, but we will impose more structure (i.e., differentiability) on $g$. Then we will allow multiple equality constraints $g_1 \colon \mathbb{R}^n \to \mathbb{R}, \ldots, g_m \colon \mathbb{R}^n \to \mathbb{R}$, so that the constraint set takes the form
\[ C = \{x \in \mathbb{R}^n \mid g_1(x) = c_1, \ldots, g_m(x) = c_m\}, \]
where $c_j$ is a fixed value of the $j$th constraint for $j = 1, \ldots, m$.

We then consider the maximization problem with multiple inequality constraints: $C$ satisfying
\[ C = \{x \in \mathbb{R}^n \mid g_1(x) \leq c_1, \ldots, g_m(x) \leq c_m\}. \]
These problems are written
\[ \max_{x \in \mathbb{R}^n} f(x) \quad \text{s.t. } g_1(x) \leq c_1, \ \ldots, \ g_m(x) \leq c_m, \]


5 Equality Constraints

5.1 First Order Analysis

    Figure 8: Constrained local maximizer


    Figure 9: Not a constrained local maximizer


    Figure 10: Proof of Lagrange

Theorem 5.1 (Lagrange) Let $X \subseteq \mathbb{R}^n$, $f \colon X \to \mathbb{R}$, and $g \colon \mathbb{R}^n \to \mathbb{R}$. Assume $f$ and $g$ are continuously differentiable in an open neighborhood around $x$, an interior point of $X$. Also assume $Dg(x) \neq 0$. If $x$ is a constrained local maximizer of $f$ subject to $g(x) = c$, then there is a unique multiplier $\lambda \in \mathbb{R}$ such that
\[ Df(x) = \lambda Dg(x). \qquad (1) \]

Proof I provide a heuristic argument for the case of two variables. The idea is to transform the constrained problem into an unconstrained one. The theorem assumes that $Dg(x) \neq 0$, and (only to simplify notation) we will assume $x = 0$ and $D_2 g(x) \neq 0$. The implicit function theorem implies that in an open interval $I$ around $x_1 = 0$, we may then view the level set of $g$ at $c$ as the graph of a function $\gamma \colon I \to \mathbb{R}$ such that for all $z \in I$, $g(z, \gamma(z)) = c$. See Figure 10. Note that $0 = x = (0, \gamma(0))$. Furthermore, $\gamma$ is continuously differentiable with derivative
\[ D\gamma(z) = -\frac{D_1 g(z, \gamma(z))}{D_2 g(z, \gamma(z))}. \qquad (2) \]
Because $x$ is interior to $X$, we can choose the interval $I$ small enough that each $(z, \gamma(z))$ belongs to the domain $X$ of the objective function. Then $z = 0$ is a local maximizer of the unconstrained problem
\[ \max_{z \in I} f(z, \gamma(z)), \]
and we know the first order condition holds, i.e., differentiating with respect to $z$ and using the chain rule, we have
\[ D_1 f(0) + D_2 f(0) D\gamma(0) = 0, \]
which implies
\[ D_1 f(0) - D_2 f(0) \frac{D_1 g(0)}{D_2 g(0)} = 0. \]
Defining $\lambda = \frac{D_2 f(0)}{D_2 g(0)}$, we have $Df(0) = \lambda Dg(0)$, as desired.

Of course, the first order condition from Lagrange's theorem can be written in terms of partial derivatives:
\[ \frac{\partial f}{\partial x_i}(x) = \lambda \frac{\partial g}{\partial x_i}(x) \]
for all $i = 1, \ldots, n$. Thus, the theorem gives us $n + 1$ equations (including the constraint) in $n + 1$ unknowns (including $\lambda$). If we can solve for all of the solutions of this system, then we have an upper bound on the interior constrained local maximizers. Remember: the theorem of Lagrange gives a necessary condition for a constrained local maximizer, not a sufficient one; the solutions to the first order condition may not be local maximizers.
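As an illustration of solving these $n + 1$ equations (my own example, not from the notes, assuming sympy is available): take $f(x_1, x_2) = x_1 x_2$ subject to $x_1 + x_2 = 2$; the system has the unique solution $x_1 = x_2 = 1$, $\lambda = 1$.

    import sympy as sp

    x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
    f = x1 * x2
    g = x1 + x2
    c = 2

    # n + 1 equations: Df(x) = lam * Dg(x) componentwise, plus the constraint g(x) = c.
    eqs = [sp.Eq(sp.diff(f, x1), lam * sp.diff(g, x1)),
           sp.Eq(sp.diff(f, x2), lam * sp.diff(g, x2)),
           sp.Eq(g, c)]
    print(sp.solve(eqs, [x1, x2, lam], dict=True))   # [{x1: 1, x2: 1, lam: 1}]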

The number $\lambda$ is the Lagrange multiplier corresponding to the constraint. The condition $Dg(x) \neq 0$ is called the constraint qualification. Without it, the result would not be true.

Example Consider $X = \mathbb{R}$, $f(x) = (x + 1)^2$, and $g(x) = x^2$. Consider the problem of maximizing $f$ subject to $g(x) = 0$. The maximizer is clearly $x = 0$. But $Dg(0) = 0$ and $Df(0) = 2$, so there can be no $\lambda$ such that $Df(0) = \lambda Dg(0)$.

There is an easy way to remember the conditions in Lagrange's theorem: if $x$ is an interior constrained local maximizer of $f$ subject to $g(x) = c$, and if $Dg(x) \neq 0$, then there exists $\lambda \in \mathbb{R}$ such that $(x, \lambda)$ is a critical point of the function $L \colon X \times \mathbb{R} \to \mathbb{R}$ defined by
\[ L(x, \lambda) = f(x) + \lambda(c - g(x)). \]
That is, there exists $\lambda \in \mathbb{R}$ such that

\[ \frac{\partial L}{\partial x_1}(x, \lambda) = \frac{\partial f}{\partial x_1}(x) - \lambda \frac{\partial g}{\partial x_1}(x) = 0 \]
\[ \vdots \]
\[ \frac{\partial L}{\partial x_n}(x, \lambda) = \frac{\partial f}{\partial x_n}(x) - \lambda \frac{\partial g}{\partial x_n}(x) = 0 \]
\[ \frac{\partial L}{\partial \lambda}(x, \lambda) = c - g(x) = 0, \]


which is equivalent to the first order condition (1). The function $L$ is called the Lagrangian function.

Though it's not quite technically correct, it's as though we've converted a constrained maximization problem into an unconstrained one: maximizing the Lagrangian $L(x, \lambda)$ with respect to $x$. Imagine allowing $x$'s that violate the constraint; for example, suppose, at a constrained maximizer $x$, that we could increase the value of $f$ by moving from $x$ to a nearby point $x'$ with $g(x') < c$. Since this $x'$ violates the constraint, we don't want this to be profitable, so the Lagrangian has to impose a cost of doing so in the amount $\lambda(c - g(x'))$ (here, $\lambda$ has to be positive). Then $\lambda$ is like a price of violating the constraint imposed by the Lagrangian. The reason why this is not technically correct is that given the multiplier $\lambda$, a constrained local maximizer need not be a local maximizer of $L(\cdot, \lambda)$.

Example Consider $X = \mathbb{R}$, $f(x) = (x - 1)^3 + x$, $g(x) = x$, and
\[ \max_{x \in \mathbb{R}} f(x) \quad \text{s.t. } g(x) = 1. \]
The unique solution to the constraint, and therefore to the maximization problem, is $x = 1$. Note that $Df(x) = 3(x - 1)^2 + 1$ and $Dg(x) = 1$, and evaluating at the solution $x = 1$, we have $Df(1) = 1 = Dg(1)$. Thus, the multiplier for this problem is $\lambda = 1$. The Lagrangian is
\[ L(x, \lambda) = (x - 1)^3 + x + \lambda(1 - x), \]
and evaluated at $\lambda = 1$, this becomes
\[ L(x, 1) = (x - 1)^3 + 1. \]
But note that this function is strictly increasing at $x = 1$, i.e., for arbitrarily small $\epsilon > 0$, we have $L(1 + \epsilon, 1) > L(1, 1)$, so $x = 1$ is not a local maximizer of $L(\cdot, 1)$.

Note the following implication of Lagrange's theorem: at a constrained local maximizer $x$, we have
\[ \frac{\frac{\partial f}{\partial x_i}(x)}{\frac{\partial f}{\partial x_j}(x)} = \frac{\frac{\partial g}{\partial x_i}(x)}{\frac{\partial g}{\partial x_j}(x)} \]
for all $i$ and $j$. The lefthand side is the marginal rate of substitution telling us the value of $x_i$ in terms of $x_j$. The righthand side tells us the cost of $x_i$ in terms of $x_j$. Lagrange tells us that, at an interior local maximizer, those have to be the same.

Recall that Lagrange's theorem only gives us a necessary (not a sufficient) condition for a constrained local maximizer. To see why the first order condition is not generally sufficient, consider the following example.


Example Consider $X = \mathbb{R}^2$, $f(x_1, x_2) = x_1 + x_2^2$, $g(x_1, x_2) = x_1$, and
\[ \max_{x \in \mathbb{R}^2} f(x_1, x_2) \quad \text{s.t. } g(x_1, x_2) = 1. \]
Note that $x^* = (1, 0)$ satisfies the constraint $g(x^*) = 1$, and the constraint qualification is also satisfied. Furthermore, the first order condition from Lagrange's theorem is satisfied at $x^* = (1, 0)$. Indeed, $Df(x) = (1, 2x_2)$ and $Dg(x) = (1, 0)$. Evaluating at $x^*$, we have $Df(x^*) = (1, 0) = Dg(x^*)$. Thus, the equality $Df(x^*) = \lambda Dg(x^*)$ is obtained by setting $\lambda = 1$, as in Lagrange's theorem. But $x^*$ is not a constrained local maximizer: for arbitrarily small $\epsilon > 0$, $(1, \epsilon)$ satisfies $g(1, \epsilon) = 1$ and $f(1, \epsilon) > f(x^*)$.

Note that the objective function in the preceding example violates quasi-concavity. I claim, for now without proof, that when the objective function is concave and the constraint is linear, the first order condition from Lagrange's theorem is sufficient for a global maximum. But what if $g$ is linear, the first order condition is satisfied at $x$, and $f$ is only quasi-concave? Must $x$ be a global maximizer? The answer, unfortunately, is no. In fact, $x$ need not even be a local maximizer.

Example Consider $f(x_1, x_2) = x_2^3$, $g(x_1, x_2) = x_1$, and
\[ \max_{x \in \mathbb{R}^2} f(x_1, x_2) \quad \text{s.t. } g(x_1, x_2) = 1. \]
Note that $f$ is quasi-concave, that $g$ is linear with gradient $(1, 0)$ satisfying the constraint qualification, and that $x^* = (1, 0)$ satisfies the constraint $g(x^*) = 1$. Furthermore, $Df(x_1, x_2) = (0, 3x_2^2)$. Evaluating at $x^*$, we have $Df(x^*) = 0$, and we obtain the equality $Df(x^*) = \lambda Dg(x^*)$ by setting $\lambda = 0$. But $x^*$ is not a constrained local maximizer: for arbitrarily small $\epsilon > 0$, $(1, \epsilon)$ satisfies $g(1, \epsilon) = 1$ and $f(1, \epsilon) > f(x^*)$.

But the example leaves open one possibility for a general result. In the example, the objective function was quasi-concave, but the gradient at $x^*$ was zero; what if $f$ is quasi-concave and the gradient is non-zero? The next result establishes that these conditions are indeed sufficient for a global maximizer. It actually follows from a more general result, Theorem 6.2, for inequality constrained maximization, so we defer the proof until then.

Theorem 5.2 Let $X$ be open and convex, let $f \colon X \to \mathbb{R}$ be quasi-concave and continuously differentiable, and let $g \colon \mathbb{R}^n \to \mathbb{R}$ be linear. If $g(x) = c$ and there exists $\lambda \in \mathbb{R}$ such that the first order condition (1) holds with respect to $x$, then $x$ is a constrained global maximizer of $f$ subject to $g(x) = c$ provided either of two conditions holds:


1. $Df(x) \neq 0$, or
2. $f$ is concave.

The preceding example shows that the first order condition is not sufficient for a local maximizer (and a fortiori, not for a global maximizer). One approach to this problem, taken above, is to add the assumption of non-zero gradient. An alternative is to strengthen the first order condition to the assumption that $x$ is a local maximizer, but this hope is not realized: in the previous example, re-define $f$ to be constant at zero whenever $x_2 < 0$, leaving the definition unchanged whenever $x_2 \geq 0$; then every vector satisfying the constraint with $x_2 \leq 0$ is a constrained local maximizer, yet none is a constrained global maximizer. If, however, $x$ is a constrained strict local maximizer and $f$ is quasi-concave with $g$ linear, then the conclusion does hold, by the following argument. Suppose otherwise, so there is some $y \in X \cap C$ with $f(y) \geq f(x)$, and let $\epsilon > 0$ be such that for all $z \in X \cap C \cap B_\epsilon(x)$ with $z \neq x$, we have $f(x) > f(z)$. Given any $\alpha$ with $0 < \alpha < 1$, define $z(\alpha) = \alpha y + (1 - \alpha)x$. Then quasi-concavity implies $f(z(\alpha)) \geq \min\{f(x), f(y)\} = f(x)$. Furthermore, with $g(x) = g(y) = c$, linearity of $g$ implies $g(z(\alpha)) = c$. But for small enough $\alpha > 0$, we have $z(\alpha) \in X \cap C \cap B_\epsilon(x)$ and $f(z(\alpha)) \geq f(x)$, a contradiction.

Of course, if $f$ is strictly quasi-concave and $x$ is a constrained local maximizer, then it is a constrained strict local maximizer, and the result above can be applied.

    5.2 Examples

Example A consumer purchases a bundle $(x_1, x_2)$ to maximize utility. His income is $I > 0$ and prices are $p_1 > 0$ and $p_2 > 0$. His utility function is $u \colon \mathbb{R}^2_+ \to \mathbb{R}$. We assume $u$ is differentiable and monotonic in the following sense: for all $(x_1, x_2)$ and $(y_1, y_2)$ with $x_1 \geq y_1$ and $x_2 \geq y_2$, at least one inequality strict, we have $u(x_1, x_2) > u(y_1, y_2)$. The consumer's problem is:
\[ \max_{(x_1, x_2) \in \mathbb{R}^2_+} u(x_1, x_2) \quad \text{s.t. } p_1 x_1 + p_2 x_2 = I. \]


Note that we impose the constraint that the consumer must spend all of his income; since we assume monotonicity, this is without loss of generality. The set $X \cap C = \mathbb{R}^2_+ \cap \{(x_1, x_2) \mid p_1 x_1 + p_2 x_2 = I\}$ is compact (since $p_1, p_2 > 0$), and $u$ is continuous, so the maximization problem has a solution. We can apply Lagrange's theorem with
\[ f(x_1, x_2) = u(x_1, x_2), \quad g(x_1, x_2) = p_1 x_1 + p_2 x_2, \quad c = I \]
to find all the constrained local maximizers $(x_1, x_2)$ interior to $\mathbb{R}^2_+$ (i.e., $x_1, x_2 > 0$) satisfying $Dg(x_1, x_2) \neq 0$. In fact, for all $(x_1, x_2) \in \mathbb{R}^2_+$, $Dg(x_1, x_2) = (p_1, p_2) \neq 0$, so the constraint qualification is always met. Letting $(x_1, x_2)$ be an interior constrained local maximizer, there exists $\lambda \in \mathbb{R}$ such that $(x_1, x_2, \lambda)$ is a critical point of the Lagrangian:
\[ L(x_1, x_2, \lambda) = u(x_1, x_2) + \lambda(I - p_1 x_1 - p_2 x_2). \]
That is,
\[ \frac{\partial L}{\partial x_1}(x_1, x_2, \lambda) = \frac{\partial u}{\partial x_1}(x_1, x_2) - \lambda p_1 = 0 \]
\[ \frac{\partial L}{\partial x_2}(x_1, x_2, \lambda) = \frac{\partial u}{\partial x_2}(x_1, x_2) - \lambda p_2 = 0 \]
\[ \frac{\partial L}{\partial \lambda}(x_1, x_2, \lambda) = I - p_1 x_1 - p_2 x_2 = 0. \]

Solving these equations gives us the critical points of the Lagrangian, and if a maximizer $(x_1^*, x_2^*)$ is interior to $\mathbb{R}^2_+$ ($x_1^*, x_2^* > 0$), then it will be one of these critical points. Note that
\[ \frac{\frac{\partial u}{\partial x_1}(x_1, x_2)}{\frac{\partial u}{\partial x_2}(x_1, x_2)} = \frac{p_1}{p_2}, \]
i.e., the relative value of $x_1$ in terms of $x_2$ equals the relative price. Consider the Cobb-Douglas special case $u(x_1, x_2) = x_1^{\alpha} x_2^{\beta}$, where $\alpha, \beta > 0$. It's clear that every maximizer must be interior to $\mathbb{R}^2_+$. (Right?) The critical points of the Lagrangian satisfy
\[ \alpha x_1^{\alpha - 1} x_2^{\beta} - \lambda p_1 = 0 \]
\[ \beta x_1^{\alpha} x_2^{\beta - 1} - \lambda p_2 = 0. \]
Divide to get $\frac{\alpha x_2}{\beta x_1} = \frac{p_1}{p_2}$, or $x_2 = \frac{\beta}{\alpha}\frac{p_1}{p_2} x_1$. Plug into $p_1 x_1 + p_2 x_2 = I$ to get
\[ p_1 x_1 + p_2 \left( \frac{\beta}{\alpha}\frac{p_1}{p_2} x_1 \right) = I, \]
so the unique critical point of the Lagrangian is
\[ x_1 = \frac{\alpha}{\alpha + \beta}\,\frac{I}{p_1} \quad \text{and} \quad x_2 = \frac{\beta}{\alpha + \beta}\,\frac{I}{p_2}. \]
Since this critical point is unique, it is the unique maximizer, and we call
\[ x_1(p_1, p_2, I) = \frac{\alpha}{\alpha + \beta}\,\frac{I}{p_1} \]
\[ x_2(p_1, p_2, I) = \frac{\beta}{\alpha + \beta}\,\frac{I}{p_2} \]
demand functions. They tell us the consumer's consumption for different prices and incomes. Fixing $p_2$ and $I$, we can graph $x_1$ as a function of $p_1$, which gives us the demand curve for good 1. We can also solve for $\lambda$ by substituting into $\alpha x_1^{\alpha - 1} x_2^{\beta} = \lambda p_1$. This gives us
\[ \lambda = \frac{\alpha \left( \frac{\alpha}{\alpha + \beta}\frac{I}{p_1} \right)^{\alpha - 1} \left( \frac{\beta}{\alpha + \beta}\frac{I}{p_2} \right)^{\beta}}{p_1} = \left( \frac{\alpha}{p_1} \right)^{\alpha} \left( \frac{\beta}{p_2} \right)^{\beta} \left( \frac{I}{\alpha + \beta} \right)^{\alpha + \beta - 1}. \]
If $\alpha + \beta = 1$, then the last term drops out. Note that we can always take a strictly increasing transformation of Cobb-Douglas utilities to obtain $\alpha + \beta = 1$ without altering the consumer's demand functions, but such a transformation can affect the Lagrange multiplier.
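A quick numerical check of these demand formulas (my own, with arbitrary parameter values, assuming scipy is available):

    import numpy as np
    from scipy.optimize import minimize

    alpha, beta = 0.3, 0.7
    p1, p2, I = 2.0, 5.0, 100.0

    def neg_u(x):
        x1, x2 = x
        return -(x1**alpha * x2**beta)

    # Maximize utility subject to the budget equality and non-negativity.
    res = minimize(neg_u, x0=[1.0, 1.0],
                   constraints=[{'type': 'eq', 'fun': lambda x: p1*x[0] + p2*x[1] - I}],
                   bounds=[(0, None), (0, None)])
    print(res.x)                                              # numerical demands
    print([alpha/(alpha+beta)*I/p1, beta/(alpha+beta)*I/p2])  # closed form: [15.0, 14.0]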

Example Now consider the distributive model of social choice, where the set of alternatives is the unit simplex,
\[ X = \left\{ x \in \mathbb{R}^n_+ \;\middle|\; \sum_{i=1}^{n} x_i = 1 \right\}, \]
and each individual simply wants more of this scarce resource for him- or herself. Formally, assume each $i$'s utility function $u_i(x_1, \ldots, x_n)$ is strictly increasing in $x_i$ and invariant with respect to reallocations of the resource among other individuals. Consider the welfare maximization problem of a social planner with non-negative weights $\lambda_1, \ldots, \lambda_n$ (not all zero):
\[ \max_{x \in X} \sum_{i=1}^{n} \lambda_i u_i(x). \]

From Lagrange's theorem, all interior local maximizers satisfy
\[ \frac{\partial}{\partial x_i} \sum_{j=1}^{n} \lambda_j u_j(x) = \lambda_i \frac{\partial u_i}{\partial x_i}(x) = \mu \quad \text{for all } i, \qquad \sum_{i=1}^{n} x_i = 1, \]
where $\mu$ is the multiplier on the resource constraint.


In contrast to unconstrained maximization, where the first order condition means that the marginal impact of changing each choice variable is zero, now an interior allocation can be a local maximizer only if the marginal impacts are equalized across individuals. If a local maximizer involves some individuals receiving an allocation of zero, then the logic extends to all individuals who receive a positive amount of the resource. Now consider the special case $u_i(x) = \ln(x_i)$. (Henceforth, we only consider alternatives in which each individual receives a strictly positive amount of the resource, so utilities are well-defined.) These utilities are concave in $x$ but not strictly concave or even strictly quasi-concave. Given the structure of the set of alternatives and utilities, we can write the first order condition as
\[ \frac{\partial}{\partial x_i} \sum_{j=1}^{n} \lambda_j \ln(x_j) = \frac{\lambda_i}{x_i} = \mu \quad \text{for all } i, \qquad \sum_{i=1}^{n} x_i = 1, \]
and it is straightforward to deduce that the unique maximizer is $x = (\lambda_1, \ldots, \lambda_n)$ (normalizing the weights to sum to one). Interestingly, we have seen this problem before in the Cobb-Douglas example of the consumer's problem: the maximization problem in the distributive setting is unaffected if we take a strictly increasing transformation of the objective function, so we can replace the above objective with
\[ e^{\sum_{i=1}^{n} \lambda_i \ln(x_i)} = \prod_{i=1}^{n} e^{\ln x_i^{\lambda_i}} = \prod_{i=1}^{n} x_i^{\lambda_i}, \]
which has the form of a Cobb-Douglas utility function with exponent $\lambda_i$ on $x_i$; thus, the above problem is isomorphic to the problem of a Cobb-Douglas consumer facing unit prices $p_1 = \cdots = p_n = 1$ and income $I = 1$. Because the maximization problem has a unique solution for all such weights, the characterization result of Corollary 3.8 applies, and so we have solved for all Pareto optimal alternatives. In fact, by varying the weights $\lambda_1, \ldots, \lambda_n$, we conclude that every alternative is Pareto optimal, a fact that was pretty obvious from the outset (right?).
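A small numerical confirmation of this solution (mine, not from the notes, assuming scipy is available): maximizing $\sum_i \lambda_i \ln(x_i)$ over the simplex returns the weight vector itself when the weights sum to one.

    import numpy as np
    from scipy.optimize import minimize

    lam = np.array([0.2, 0.3, 0.5])   # normalized weights

    def neg_welfare(x):
        return -np.sum(lam * np.log(x))

    res = minimize(neg_welfare, x0=np.full(3, 1/3),
                   constraints=[{'type': 'eq', 'fun': lambda x: np.sum(x) - 1.0}],
                   bounds=[(1e-9, None)] * 3)
    print(res.x)   # approximately [0.2, 0.3, 0.5], matching the weights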

Example Prior to a national election, suppose a political party must decide how much to spend in a number of electoral districts $i = 1, \ldots, n$. Let $x_i$ denote the amount spent in district $i$, and assume $x_1 \geq 0, \ldots, x_n \geq 0$, $\sum_{i=1}^{n} x_i = I$. The probability the party wins district $i$ is $P_i(x_i)$, where $P_i \colon \mathbb{R}_+ \to \mathbb{R}$ is a differentiable function. The party seeks to maximize the expected number of districts it wins, i.e.,
\[ \max_{(x_1, \ldots, x_n) \in \mathbb{R}^n_+} \sum_{i=1}^{n} P_i(x_i) \quad \text{s.t. } x_1 + \cdots + x_n = I. \]
The first order conditions for an interior local maximizer are
\[ DP_1(x_1) = \lambda, \ \ldots, \ DP_n(x_n) = \lambda, \qquad \sum_{i=1}^{n} x_i = I. \]


Again, the first order conditions reduce to the following simple principle: allocate money to districts in a way that equalizes the marginal probability of victory across the districts. Note that the special case $P_i(x_i) = \alpha_i \ln(x_i)$ is equivalent to the Cobb-Douglas specification of the consumer's problem. For an alternative parameterization, it could be that
\[ P_i(x_i) = \frac{\theta_i + x_i}{\theta + x_i}, \]
where $\theta_i < \theta$ and $\theta_i$ may vary across districts. The first order condition is
\[ \frac{\theta - \theta_1}{(\theta + x_1)^2} = \frac{\theta - \theta_2}{(\theta + x_2)^2} = \cdots = \frac{\theta - \theta_{n-1}}{(\theta + x_{n-1})^2} = \frac{\theta - \theta_n}{(\theta + x_n)^2}. \]
The solutions to these equations will include all interior maximizers, if any. (Whether there are any will depend on the $\theta_i$'s. If $\theta_i$ is close to $\theta$, so the probability of victory is close to one, spending will be low.)

$Df(x) \cdot t > 0$ and $-Dg_j(x) \cdot t > 0$ for all $j = 1, \ldots, k$. In the latter case, however, we can choose $\epsilon > 0$ sufficiently small so that $f(x + \epsilon t) > f(x)$ and $g_j(x + \epsilon t) < g_j(x) = c_j$ for all $j = 1, \ldots, k$, but then $x$ is not a constrained local maximizer, a contradiction. In the former case, note that linear independence of $\{Dg_1(x), \ldots, Dg_k(x)\}$ implies that $\mu_0 \neq 0$, and so we can define


$\lambda_j = \mu_j / \mu_0$, $j = 1, \ldots, k$, to fulfill (7), and $\lambda_{k+1} = \cdots = \lambda_m = 0$ to fulfill (8) and (9). Again, linear independence implies that these coefficients are unique.

Geometrically, the first order condition from the Karush-Kuhn-Tucker theorem means that the gradient of the objective function, $Df(x)$, is contained in the semi-positive cone generated by the gradients of binding constraints, i.e., it is contained in the set
\[ \left\{ \sum_{j=1}^{m} \lambda_j Dg_j(x) \;\middle|\; \lambda_1, \ldots, \lambda_m \geq 0 \right\}, \]

depicted to the right. The technology of the proof is essentially a form of the separating hyperplane theorem, but one known as a theorem of the alternative that is especially adapted for problems exhibiting a linear structure. In turn, there are different versions of the theorem of the alternative, depending on the types of inequalities involved. (Some versions involve all strict inequalities, some all weak, etc.) We use Gale's (1960) Theorem 2.9, which states that a vector $y$ lies in the semi-positive cone of a collection $\{a_1, \ldots, a_k\}$ if and only if it is not the case that there exists a vector $t$ such that $y \cdot t \leq 0$ and $a_j \cdot t > 0$ for all $j = 1, \ldots, k$.

In practical terms, the first order conditions (7) and (8) give us $n + m$ equations in $n + m$ unknowns. If we can solve for all of the solutions of this system, then we have an upper bound on the interior constrained local maximizers. Typically, one goes through all combinations of binding constraints; given one set of binding constraints meeting the constraint qualification, solve the problem as though it were just one of multiple equality constraints. Furthermore, if any solutions involve $\lambda_j < 0$ for some $j$, they violate the first order conditions and can be discarded.


    Figure 13: Constraint qualification needed

(2) $g_2(x_1, x_2) = -x_1 \leq 0$, and (3) $g_3(x_1, x_2) = -x_2 \leq 0$. Note that the three constraints cannot all bind simultaneously. First, consider the possibility that only (2) binds, i.e., $p_1 x_1 + p_2 x_2 < I$, $x_1 = 0$, and $x_2 > 0$. Note that $Dg_2(x) = (-1, 0) \neq 0$, so the constraint qualification is met. By complementary slackness, it follows that $\lambda_1 = \lambda_3 = 0$, so the first order condition becomes
\[ \frac{\partial u}{\partial x_1}(x_1, x_2) = \lambda_2 \frac{\partial g_2}{\partial x_1}(x_1, x_2) = -\lambda_2 \]
\[ \frac{\partial u}{\partial x_2}(x_1, x_2) = \lambda_2 \frac{\partial g_2}{\partial x_2}(x_1, x_2) = 0 \]
\[ g_2(x_1, x_2) = 0, \qquad \lambda_2 \geq 0, \]
but this is incompatible with monotonicity of $u$, so we discard this case. Similarly for the case in which only (3) binds, the case in which (2) and (3) both bind, and the case in which no constraints bind. Next, consider the case in which (1) and (2) bind, i.e., $p_1 x_1 + p_2 x_2 = I$, $x_1 = 0$, $x_2 > 0$. Note that $Dg_1(x) = (p_1, p_2)$ and $Dg_2(x) = (-1, 0)$ are linearly independent, so the constraint qualification is met. Since $x_2 > 0$, complementary slackness implies $\lambda_3 = 0$, so the first order conditions are
\[ \frac{\partial u}{\partial x_1}(x_1, x_2) = \lambda_1 \frac{\partial g_1}{\partial x_1}(x_1, x_2) + \lambda_2 \frac{\partial g_2}{\partial x_1}(x_1, x_2) \]
\[ \frac{\partial u}{\partial x_2}(x_1, x_2) = \lambda_1 \frac{\partial g_1}{\partial x_2}(x_1, x_2) + \lambda_2 \frac{\partial g_2}{\partial x_2}(x_1, x_2) \]
\[ g_1(x_1, x_2) = I, \qquad g_2(x_1, x_2) = 0, \qquad \lambda_1, \lambda_2 \geq 0. \]


We substitute $x_1 = 0$ into $p_1 x_1 + p_2 x_2 = I$ to solve for $x_2 = I/p_2$, and we conclude that the bundle $(0, I/p_2)$ is one possible optimal bundle for the consumer. Similarly, when (1) and (3) bind, we find the possible bundle $(I/p_1, 0)$. Finally, we consider the case in which only (1) binds. Then complementary slackness implies $\lambda_2 = \lambda_3 = 0$, and the first order conditions are
\[ \frac{\partial u}{\partial x_1}(x_1, x_2) = \lambda_1 \frac{\partial g_1}{\partial x_1}(x_1, x_2) \]
\[ \frac{\partial u}{\partial x_2}(x_1, x_2) = \lambda_1 \frac{\partial g_1}{\partial x_2}(x_1, x_2) \]
\[ g_1(x_1, x_2) = I, \qquad \lambda_1 \geq 0. \]
When $u$ is Cobb-Douglas, these equations yield $x_1 = \frac{\alpha}{\alpha + \beta}\frac{I}{p_1}$ and $x_2 = \frac{\beta}{\alpha + \beta}\frac{I}{p_2}$, and checking the three possible solutions, you'll see that this one indeed solves the consumer's problem.

    sumers problem. Assume, instead, that the two goods are perfect substitutes, i.e.,u(x1, x2) = ax1+ bx2 with a, b > 0, and consider the case in which only (1) binds.The first order conditions imply a= 1p1 and b= 1p2, so this case is only possiblewhen the consumers marginal rate of substitution (measuring the value of good 1 interms of good 2) is equal to the relative price of good 1: a

    b = p1

    p2. Then every bundle

    (x1, x2) satisfying the budget constraint with equality yields utility

    ax1+b

    Ip1x1

    p2

    = ax1+

    ap2p1

    Ip1x1

    p2

    =

    aI

    p1=

    bI

    p2,

    so all such bundles are optimal. If the razors edge condition on marginal rates of sub-

    stitution and relative prices does not hold, then either a

    b > p1

    p2 or the opposite obtain,and the only possible optimal bundles are the corner solutions. In the former case,

    u

    I

    p1, 0

    =

    aI

    p1>

    bI

    p2= u

    0,

    I

    p2

    ,

    so the consumer optimally spends all of his money on good 1, and in the remainingcase he spends everything on good 2.
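
A numerical sketch of the consumer's problem may be a useful companion to the case analysis above. The snippet below (Python with SciPy; the Cobb-Douglas exponents α = 0.3, β = 0.7, the prices, and the income level are assumptions chosen for illustration) solves the inequality-constrained problem directly and compares the answer to the closed-form demands derived from the first order conditions.

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative parameters: Cobb-Douglas exponents, prices, and income.
    alpha, beta = 0.3, 0.7
    p1, p2, I = 2.0, 5.0, 100.0

    def neg_u(x):
        x1, x2 = x
        return -(x1 ** alpha) * (x2 ** beta)   # minimize the negative of utility

    budget = {"type": "ineq", "fun": lambda x: I - p1 * x[0] - p2 * x[1]}  # I - p.x >= 0
    res = minimize(neg_u, x0=[1.0, 1.0], method="SLSQP",
                   bounds=[(0, None), (0, None)], constraints=[budget])

    # Closed-form demands from the first order conditions above.
    x_closed = np.array([alpha / (alpha + beta) * I / p1,
                         beta / (alpha + beta) * I / p2])
    print(res.x)        # numerical solution, approximately [15., 14.]
    print(x_closed)     # [15., 14.]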

    6.2 Concave Programming

Like optimization subject to equality constraints, optimization problems subject to inequality constraints are simplified under concavity conditions. In fact, such problems are even more amenable to this structure. We first extend our earlier results for a concave objective and linear constraints, establishing a general result that implies our earlier results for quasi-concave objectives and linear equality constraints. Now it is enough that the constraints are quasi-convex: the full strength of linearity is not needed for inequality constraints.


Theorem 6.2 Let f : R^n → R be quasi-concave and continuously differentiable, and let g1 : R^n → R, . . . , gm : R^n → R be quasi-convex. Suppose there exist λ1, . . . , λm ∈ R such that the first order condition (7)–(9) holds with respect to x. Then x is a constrained global maximizer of f subject to g1(x) ≤ c1, . . . , gm(x) ≤ cm provided either of two conditions holds:

1. Df(x) ≠ 0, or

2. f is concave.

Proof  Note that either Df(x) ≠ 0 or, under the assumptions of the theorem, Df(x) = 0 and f is concave, which implies that x is an unconstrained (and therefore a constrained) global maximizer. Thus, we consider the Df(x) ≠ 0 case. Let y be any element of the constraint set C, i.e., y satisfies gj(y) ≤ cj for j = 1, . . . , m, and let t = (1/‖y − x‖)(y − x) be the direction pointing to the vector y from x. Given α ∈ (0, 1], define

    z(α) = x + α(y − x) = (1 − α)x + αy,

a convex combination of x and y. Note that gj(x) ≤ cj and gj(y) ≤ cj for each j, and so by quasi-convexity, we have

    gj(z(α)) ≤ max{gj(x), gj(y)} ≤ cj.

For each binding constraint j, we then have gj(z(α)) ≤ cj = gj(x), and therefore

    Dtgj(x) = lim_{α→0} [gj(z(α)) − gj(x)] / (α‖y − x‖) ≤ 0,

and of course, for each slack constraint, we have λj = 0. Combining these observations, we conclude

    Dtf(x) = Σ_{j=1}^m λjDtgj(x) ≤ 0.

Now suppose, in order to derive a contradiction, that f(y) > f(x). Then there exists α ∈ (0, 1] such that

    Dtf(z(α)) = Df(z(α))·t > 0,

and by quasi-concavity of f, we have f(z(α)) ≥ f(x). See Figure 14 for a visual depiction. By continuity of the dot product, there exists ε > 0 sufficiently small that Df(z(α))·(t + εDf(x)) > 0. Letting t′ = (1/‖t + εDf(x)‖)(t + εDf(x)) point in the direction of the perturbed vector t + εDf(x), it follows that the derivative of f at z(α) in this direction is positive, i.e., D_{t′}f(z(α)) > 0. This means that for sufficiently small δ > 0, we can define w = z(α) + δt′ such that f(w) > f(z(α)) ≥ f(x). Given β ∈ (0, 1], define

    v(β) = x + β(w − x) = (1 − β)x + βw,


[Figure 14: Proof of Theorem 6.2. Shown in the (x1, x2) plane: the points x, z(α), w, and y, the directions t and t′, the gradient Df(x), and a level set of f.]

Theorem 6.3 Let X ⊆ R^n be convex, let f : X → R be quasi-concave, and let g1 : R^n → R, . . . , gm : R^n → R be quasi-convex. If x ∈ X is a constrained strict local maximizer, then it is the unique constrained global maximizer of f subject to g1(x) ≤ c1, . . . , gm(x) ≤ cm.

We end this section with an analysis that is particular to inequality constraints. Under a weak version of the constraint qualification, and with concave objective and convex constraints, solutions to the constrained maximization problem can be re-cast as unconstrained maximizers of the Lagrangian, with appropriately chosen multipliers. Formally, writing λ = (λ1, . . . , λm) for a vector of multipliers, we say (x*, λ*) is a saddlepoint of the Lagrangian if for all x ∈ R^n and all λ ∈ R^m_+,

    L(x, λ*) ≤ L(x*, λ*) ≤ L(x*, λ).

In words, given x*, λ* minimizes Σ_{j=1}^m λj(cj − gj(x*)); and given λ*, x* maximizes f(x) + Σ_{j=1}^m λ*_j(cj − gj(x)). Note that the maximization problem over x is unconstrained, but if (x*, λ*) is a saddlepoint, then x* will indeed satisfy gj(x*) ≤ cj for each j; indeed, if cj − gj(x*) < 0, then the term λj(cj − gj(x*)) could be made arbitrarily negative by choice of arbitrarily large λj, so (x*, λ*) could not be a saddlepoint.

Theorem 6.4 Let f : R^n → R be concave, let g1 : R^n → R, . . . , gm : R^n → R be convex, and let x* ∈ R^n. If there exist λ*_1, . . . , λ*_m ∈ R such that (x*, λ*) is a saddlepoint of the Lagrangian, then x* is a global constrained maximizer of f subject to g1(x) ≤ c1, . . . , gm(x) ≤ cm. Conversely, assume there is some x ∈ R^n such that g1(x) < c1, . . . , gm(x) < cm. If x* is a constrained local maximizer of f subject to g1(x) ≤ c1, . . . , gm(x) ≤ cm, then there exist λ*_1, . . . , λ*_m ∈ R such that (x*, λ*) is a saddlepoint of the Lagrangian. Furthermore, if f, g1, . . . , gm are differentiable at x*, then the first order condition (7)–(9) holds.
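
For a concrete illustration of the saddlepoint property (a hypothetical example, not taken from the text), let f(x) = −(x − 2)², g(x) = x, and c = 1, so the constrained maximizer is x* = 1 with multiplier λ* = 2. The sketch below checks both saddlepoint inequalities on a grid.

    import numpy as np

    # Illustrative problem: maximize f(x) = -(x-2)^2 subject to g(x) = x <= 1.
    # The solution is x* = 1 with multiplier lam* = 2.
    f = lambda x: -(x - 2.0) ** 2
    g = lambda x: x
    c = 1.0
    L = lambda x, lam: f(x) + lam * (c - g(x))   # the Lagrangian

    x_star, lam_star = 1.0, 2.0
    xs = np.linspace(-3, 5, 1001)
    lams = np.linspace(0, 10, 1001)

    # L(x, lam*) <= L(x*, lam*) for all x:
    print(np.all(L(xs, lam_star) <= L(x_star, lam_star) + 1e-12))   # True
    # L(x*, lam*) <= L(x*, lam) for all lam >= 0:
    print(np.all(L(x_star, lam_star) <= L(x_star, lams) + 1e-12))   # True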


The condition g1(x) < c1, . . . , gm(x) < cm is called Slater's condition. To gain intuition for the saddlepoint theorem and the need for Slater's condition, consider Figure 15. Here, we consider maximizing a function of any number of variables, but to illustrate the problem in a two-dimensional graph, we assume there is a single inequality constraint, g(x) ≤ c. On the horizontal axis, we graph values of f(x) as x varies over R^n, and on the vertical axis, we graph c − g(x) as x varies over the Euclidean space. When f is concave and g is convex (so c − g(x) is concave), you can check that the set {(f(x), c − g(x)) | x ∈ R^n}, which is shaded in the figure, is convex. The values (f(x), c − g(x)) corresponding to vectors x satisfying the constraint g(x) ≤ c lie above the horizontal axis, the darker shaded regions in the figure. The ordered pairs (f(x), c − g(x)) corresponding to solutions of the constrained maximization problem are indicated by the black dots.

Consider the problem of minimizing f(x*) + λ(c − g(x*)) with respect to λ, holding x* fixed. This simply means that at a saddlepoint, (i) if c − g(x*) > 0, then λ* = 0, and (ii) if c − g(x*) = 0, then λ* can be any non-negative number. Figure 15 depicts the first possibility in Panel (a) and the second possibility in Panels (b) and (c). Now consider the problem of maximizing f(x) + λ*(c − g(x)) with respect to x, holding λ* fixed. Let's write the objective function as a dot product: (1, λ*)·(f(x), c − g(x)). Viewed this way, we can understand the problem as choosing the ordered pair (f(x), c − g(x)) in the shaded region that maximizes the linear function with gradient (1, λ*). This is depicted in Panels (a) and (b). The difference in the two panels is that in (a), the constraint is not binding at the solution to the optimization problem (so Df(x*) = λ*Dg(x*) = 0), while in (b) it is (so λ* may be positive).

The difference between Panels (b) and (c) is that Slater's condition is not satisfied in the latter: there is no x such that g(x) < c; graphically, the shaded region does not contain any points above the horizontal axis. The pair (f(x*), c − g(x*)) corresponding to the solution of the maximization problem is indicated by the black dot; we then must choose λ* such that (f(x*), c − g(x*)) maximizes the linear function with gradient (1, λ*). The difficulty is that for any finite λ*, the pair (f(x*), c − g(x*)) does not maximize the linear function; instead, the maximizing pair will correspond to a vector x that violates the constraint, i.e., c − g(x) < 0. To make (f(x*), c − g(x*)) the maximizing pair, the gradient of the linear function must point straight up, which would correspond to something like an infinite λ* (whatever that would mean). In other words, if Slater's condition is not satisfied, then there may be no way to choose a multiplier to solve the saddlepoint problem.

Example For a formal example demonstrating the need for Slater's condition, let n = 1, f(x) = x, m = 1, c1 = 0, and g1(x) = x². The only point in R satisfying g1(x) ≤ 0 is x = 0, so this is trivially the constrained maximizer of f. But Df(0) = 1 and Dg1(0) = 0, so there is no λ ≥ 0 such that Df(0) = λDg1(0).
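
A quick numerical companion to this example (Python, purely illustrative): for any finite λ ≥ 0, the unconstrained maximizer of the Lagrangian x + λ(0 − x²) is x = 1/(2λ), which violates the constraint x² ≤ 0, so no choice of multiplier produces a saddlepoint.

    from scipy.optimize import minimize_scalar

    # f(x) = x, g1(x) = x^2, c1 = 0: the only feasible point is x = 0.
    for lam in [0.5, 1.0, 10.0, 1000.0]:
        # maximize L(x, lam) = x + lam*(0 - x^2) by minimizing its negative
        res = minimize_scalar(lambda x: -(x - lam * x ** 2))
        x_max = res.x                          # equals 1/(2*lam) > 0
        print(lam, x_max, x_max ** 2 <= 0)     # the maximizer is always infeasible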


[Figure 15: Saddlepoint theorem. Panels (a), (b), and (c) plot f(x) on the horizontal axis and c − g(x) on the vertical axis; the pair (f(x*), c − g(x*)) is marked in each panel, along with the gradient (1, λ*): λ* = 0 in Panel (a), λ* > 0 in Panel (b), and no suitable (1, λ*) exists in Panel (c).]


Note that Slater's condition is implied by the usual constraint qualification. [Figure: the convex hull of Dg1(x), . . . , Dgm(x), the origin 0, and a separating direction p.] Indeed, suppose the gradients of the constraints {Dg1(x), . . . , Dgm(x)} are linearly independent; in particular, there do not exist non-negative coefficients α1, . . . , αm summing to one such that Σ_{j=1}^m αjDgj(x) = 0. In geometric terms, the zero vector does not belong to the convex hull of the set of gradients. By the separating hyperplane theorem, there is a direction p such that p·Dgj(x) > 0 for j = 1, . . . , m, and this means the derivative in direction −p is negative for each constraint: D_{−p}gj(x) < 0. Then we can choose ε > 0 sufficiently small that z = x − εp satisfies gj(z) < gj(x) ≤ cj for j = 1, . . . , m, fulfilling Slater's condition. In fact, this argument shows that we can fulfill Slater's condition using vectors arbitrarily close to the constrained local maximizer.

    6.3 Second Order Analysis

The second order analysis parallels that for multiple equality constraints, modified to accommodate the different first order conditions. Again, the necessary condition is that the second directional derivative of the Lagrangian be non-positive in a restricted set of directions. A difference is that now the inequality must hold only for directions orthogonal to the gradients of binding constraints.

Theorem 6.5 Let f : R^n → R, g1 : R^n → R, . . . , gm : R^n → R be twice continuously differentiable in an open neighborhood around x. Suppose the first k constraints are the binding ones at x, and assume the gradients of the binding constraints, {Dg1(x), . . . , Dgk(x)}, are linearly independent. Assume x is a constrained local maximizer of f subject to g1(x) ≤ c1, . . . , gm(x) ≤ cm, and let λ1, . . . , λm ∈ R_+ satisfy the first order conditions (7)–(9). Consider any direction t such that Dgj(x)·t = 0 for all binding constraints j = 1, . . . , k. Then

    t′ ( D²f(x) − Σ_{j=1}^m λjD²gj(x) ) t ≤ 0.
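
To see the necessary condition at work, the sketch below (Python; the square-root utility, prices p = (1, 1), and income I = 2 are assumptions for illustration) checks that the Hessian of the Lagrangian at the consumer's optimum is non-positive in directions orthogonal to the binding budget constraint.

    import numpy as np

    # Illustrative check of Theorem 6.5 at a Cobb-Douglas consumer optimum:
    # u(x1, x2) = x1**0.5 * x2**0.5 with budget x1 + x2 <= 2 (made-up numbers),
    # whose solution is x* = (1, 1) with multiplier lam = 0.5 on the budget.
    u = lambda x: x[0] ** 0.5 * x[1] ** 0.5
    x_star = np.array([1.0, 1.0])
    lam = 0.5

    def hessian(f, x, h=1e-5):
        """Central finite-difference Hessian of f at x."""
        n = len(x)
        H = np.zeros((n, n))
        for i in range(n):
            for j in range(n):
                ei, ej = np.eye(n)[i] * h, np.eye(n)[j] * h
                H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                           - f(x - ei + ej) + f(x - ei - ej)) / (4 * h ** 2)
        return H

    # The budget constraint g(x) = x1 + x2 is linear, so D^2 g = 0, and the
    # Hessian of the Lagrangian reduces to D^2 u minus lam times a zero matrix.
    H_L = hessian(u, x_star) - lam * np.zeros((2, 2))

    # Directions orthogonal to the binding constraint gradient Dg = (1, 1):
    t = np.array([1.0, -1.0]) / np.sqrt(2)
    print(t @ H_L @ t)   # approximately -0.5 <= 0, as the theorem requires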

Note that the range of directions for which the above inequality must hold is the set of directions that are orthogonal to the gradients of binding constraints. One might think it should hold as well for directions t such that Dgj(x)·t ≤ 0 for all j = 1, . . . , m, since any direction with Dgj(x)·t


holds with λ1 = 1. Furthermore, the direction t = −1 satisfies Dg1(0)·t = −1 < 0. Nevertheless,

    D²f(0) − λ1D²g1(0) = 1 > 0,

violating the stronger version of the condition.

Again, strengthening the weak inequality to strict gives us a second order condition that, in combination with the first order condition, is sufficient for a constrained strict local maximizer. In contrast to the analysis of necessary conditions, the next result does not rely on the constraint qualification.

Theorem 6.6 Let f : R^n → R, g1 : R^n → R, . . . , gm : R^n → R be twice continuously differentiable in an open neighborhood around x. Assume x satisfies the constraints g1(x) ≤ c1, . . . , gm(x) ≤ cm and the first order condition with multipliers λ1, . . . , λm ∈ R_+ satisfying (7)–(9). Assume that for all directions t with Dgj(x)·t ≤ 0 for all binding constraints j = 1, . . . , k, we have

    t′ ( D²f(x) − Σ_{j=1}^m λjD²gj(x) ) t < 0.    (11)

Then x is a constrained strict local maximizer of f subject to g1(x) ≤ c1, . . . , gm(x) ≤ cm.

Note that, in contrast to Theorem 6.5, the range of directions over which the inequality holds in Theorem 6.6 is now larger, also holding for directions in which binding constraints are decreasing: it holds for all t such that Dgj(x)·t ≤ 0 rather than Dgj(x)·t = 0. This subtlety does not arise in the analysis of equality constraints, and the next example demonstrates that it plays a critical role.

Example Let n = 1, f(x) = x², m = 1, c1 = 0, and g1(x) = x. Obviously, x = 0 is not a local maximizer of f subject to g1(x) ≤ 0, and the first order condition from Theorem 6.1 holds with λ = 0. Nevertheless, it is vacuously true that for all directions t such that Dg1(0)·t = 0, the inequality (11) holds.

As with equality constraints, we can consider the parameterized optimization problem and can provide conditions under which a constrained local maximizer is a well-defined, smooth function of the parameter. As before, we reinstate the constraint qualification. A change from the previous result is that we strengthen the first order condition by assuming strict complementary slackness, which entails that λj > 0 if and only if gj(x) = cj. That is, whereas complementary slackness means gj(x) = cj if λj > 0, we now add the converse direction of this statement.


Theorem 6.7 Let I be an open interval, and let f : R^n × I → R and g1 : R^n × I → R, . . . , gm : R^n × I → R be twice continuously differentiable in an open neighborhood of (x*, θ*). Assume x* satisfies the constraints g1(x*, θ*) ≤ c1, . . . , gm(x*, θ*) ≤ cm, suppose the first k constraints are the binding ones at x*, and assume the gradients of the binding constraints, {Dxg1(x*, θ*), . . . , Dxgk(x*, θ*)}, are linearly independent. Assume x* satisfies the first order condition at θ*, i.e.,

    Dxf(x*, θ*) = Σ_{j=1}^m λjDxgj(x*, θ*)
    λj(cj − gj(x*, θ*)) = 0, j = 1, . . . , m
    λj ≥ 0, j = 1, . . . , m,

with multipliers λ1, . . . , λm ∈ R_+, and that strict complementary slackness holds, i.e., λj > 0 if and only if j ≤ k. Assume that for all t with Dxgj(x*, θ*)·t ≤ 0 for all binding constraints j = 1, . . . , k, we have

    t′ ( D²xf(x*, θ*) − Σ_{j=1}^m λjD²xgj(x*, θ*) ) t < 0.

Then there are an open set Y ⊆ R^n with x* ∈ Y, an open interval J ⊆ R with θ* ∈ J, and continuously differentiable mappings ξ : J → Y, λ1 : J → R, . . . , λm : J → R such that for all θ ∈ J, (i) ξ(θ) is the unique maximizer of f(·, θ) subject to g1(x, θ) ≤ c1, . . . , gm(x, θ) ≤ cm belonging to Y, (ii) the unique multipliers for which ξ(θ) satisfies the first order necessary condition with strict complementary slackness at θ are λ1(θ), . . . , λm(θ), and (iii) ξ(θ) satisfies the second order sufficient condition (11) at θ with multipliers λ1(θ), . . . , λm(θ).

    Fortunately, the statement of the envelope theorem carries over virtually unchanged.

Theorem 6.8 Let I be an open interval, and let f : R^n × I → R and g1 : R^n × I → R, . . . , gm : R^n × I → R be twice continuously differentiable in an open neighborhood of (x*, θ*). Let ξ : I → R^n and λ1 : I → R, . . . , λm : I → R be continuously differentiable mappings such that for all θ ∈ I, ξ(θ) is a constrained local maximizer satisfying the first order condition (7)–(9) at θ with multipliers λ1(θ), . . . , λm(θ). Let x* = ξ(θ*) and λ*_j = λj(θ*), j = 1, . . . , m, and define the mapping F : I → R by F(θ) = f(ξ(θ), θ) for all θ ∈ I. Then F is continuously differentiable and

    DF(θ*) = ∂L/∂θ (x*, λ*, θ*).


Again, we can use the envelope theorem to characterize λ*_j as the marginal effect on the maximized objective of increasing the value of the jth constraint; with inequality constraints, of course, this cannot diminish the maximized value of the objective, so the multipliers are non-negative.
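
The marginal-value interpretation is easy to verify numerically. In the sketch below (Python with SciPy; the Cobb-Douglas specification and the particular prices and income are assumptions for illustration), the maximized utility is computed at two nearby income levels, and the finite-difference slope matches the multiplier on the budget constraint, λ1 = (∂u/∂x1)/p1, evaluated at the optimum.

    import numpy as np
    from scipy.optimize import minimize

    alpha, beta, p1, p2 = 0.5, 0.5, 1.0, 2.0    # illustrative parameters

    def value(inc):
        """Maximized utility as a function of income."""
        neg_u = lambda x: -(x[0] ** alpha) * (x[1] ** beta)
        budget = {"type": "ineq", "fun": lambda x: inc - p1 * x[0] - p2 * x[1]}
        res = minimize(neg_u, x0=[inc / (2 * p1), inc / (2 * p2)], method="SLSQP",
                       bounds=[(0, None), (0, None)], constraints=[budget])
        return -res.fun, res.x

    I0, h = 10.0, 1e-2
    (F, x_opt), (F_up, _) = value(I0), value(I0 + h)
    lam1 = alpha * x_opt[0] ** (alpha - 1) * x_opt[1] ** beta / p1  # (du/dx1)/p1
    print((F_up - F) / h)   # numerical dF/dI, approximately 0.354
    print(lam1)             # multiplier on the budget, also approximately 0.354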

    7 Pareto Optimality Revisited

We now return to the topic of characterizing Pareto optimal alternatives and explore an alternative approach using the framework of constrained optimization. First, we give a general characterization in terms of inequality constrained optimization. Second, we establish a necessary first order condition for Pareto optimality that adds a rank condition on gradients of individual utilities to the assumptions of Theorem 3.9 to deduce strictly positive coefficients and provides an interpretation of the coefficients in terms of shadow prices of utilities. Finally, we establish that with quasi-concave utilities, the first order condition is actually sufficient for Pareto optimality as well. This gives us a full characterization that, in comparison with Corollary 3.7, weakens the assumption of concavity to quasi-concavity but adds the rank condition on gradients.

The next result is structure free, extending our earlier analysis by dropping all convexity, concavity, and differentiability conditions. It gives a full characterization: an alternative is Pareto optimal if and only if it solves n different maximization problems (one for each individual) subject to inequality constraints. The proof follows directly from definitions and is omitted.

Theorem 7.1 Let x ∈ A be an alternative, and let ūi = ui(x) for all i. Then x is Pareto optimal if and only if it solves

    max_{y∈X} ui(y)
    s.t. uj(y) ≥ ūj, j = 1, . . . , i − 1, i + 1, . . . , n

for all i.

Note that the sufficiency direction of Theorem 7.1 uses the fact that the alternative x solves n constrained optimization problems, one for each individual. Figure 16 demonstrates that this feature is needed for the result: there, x maximizes u2(y) subject to u1(y) ≥ ū1, but it is Pareto dominated by x′. Obviously, x′ is Pareto optimal, as it maximizes u1(y) subject to u2(y) ≥ u2(x′) and it maximizes u2(y) subject to u1(y) ≥ u1(x′).
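
Theorem 7.1 also suggests a direct computational test for Pareto optimality: fix a candidate x and, for each individual, maximize that individual's utility subject to holding every other individual at his utility level at x; the candidate is Pareto optimal exactly when it solves all n problems. The sketch below (Python with SciPy) carries this out for two quadratic utilities on the unit interval, with made-up ideal points 0.2 and 0.8.

    from scipy.optimize import minimize

    # Two quadratic utilities on [0, 1] with hypothetical ideal points.
    ideals = [0.2, 0.8]
    u = [lambda y, p=p: -(y - p) ** 2 for p in ideals]

    def is_pareto_optimal(x, tol=1e-6):
        # x is Pareto optimal iff, for each i, no feasible y gives i strictly
        # more utility while holding the other individual at u_j(x).
        for i in range(2):
            j = 1 - i
            cons = [{"type": "ineq", "fun": lambda y, j=j: u[j](y[0]) - u[j](x)}]
            res = minimize(lambda y, i=i: -u[i](y[0]), x0=[x], method="SLSQP",
                           bounds=[(0.0, 1.0)], constraints=cons)
            if -res.fun > u[i](x) + tol:
                return False
        return True

    print(is_pareto_optimal(0.5))   # True: 0.5 lies between the ideal points
    print(is_pareto_optimal(0.9))   # False: moving toward 0.8 helps both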

Of course, we can use our analysis of maximization subject to multiple inequality constraints to draw implications of Theorem 7.1. Consider a Pareto optimal alternative


[Figure 16: Pareto optimality without concavity. The utility possibility set U is drawn with utility for 1 and utility for 2 on the axes; the points (u1(x), u2(x)) and (u1(x′), u2(x′)) are marked, together with a supporting line whose normal is p ∝ (1, λ^1_2) ∝ (λ^2_1, 1).]

x, as in Figure 16, for which the constraint qualification holds for the optimization problem corresponding to each i. In this context, note that all constraints are binding by construction: uj(x) = ūj for all j ≠ i. Thus, the constraint qualification is that the gradients

    Du1(x), . . . , Du_{i−1}(x), Du_{i+1}(x), . . . , Dun(x)

are linearly independent. One implication of the constraint qualification is that the set of alternatives has dimension at least n − 1. Furthermore, an implication of the constraint qualification holding for all i is that all individuals' gradients are non-zero at x. When there are just two individuals, the qualification becomes Du2(x) ≠ 0 for individual 1's problem and Du1(x) ≠ 0 for individual 2's problem, i.e., the condition of non-zero gradients is necessary and sufficient for the constraint qualification.

Note that for each i, x is a constrained local maximizer of ui subject to uj(y) ≥ ūj, j ≠ i. Then the first order condition from Theorem 6.1 holds, as stated in the next theorem, where we omit the complementary slackness conditions because all constraints are binding.

Theorem 7.2 Assume A ⊆ R^d, let x be an alternative interior to A, and assume each ui : R^d → R is continuously differentiable in an open neighborhood around x. Suppose that for each i, the gradients {Duj(x) | j ≠ i} are linearly independent. If x is Pareto optimal, then for each i, there exist unique multipliers λ^i_j ≥ 0, j ≠ i, such that

    Dui(x) = −Σ_{j≠i} λ^i_jDuj(x).    (12)


Recall that the multiplier on a constraint has the interpretation of giving the rate of change of the maximized objective function as we increase the value of the constraint. In this context, the multiplier λ^i_j has a special meaning: it is the rate at which we can increase i's utility by taking utility away from individual j. Put differently, it is the rate at which i's utility would decrease if we increase j's utility (holding all other individuals at the constraint). Thus, it is the shadow price of utility for j in terms of utility for i. Geometrically, viewed in R^d, the gradient Dui(x) of individual i lies in the (n − 1)-dimensional subspace spanned by the other individuals' gradients.

Now recall the mapping u : X → R^n defined by u(x) = (u1(x), . . . , un(x)). Then u(X) is the set of possible utility vectors, and the linear independence assumption in Theorem 7.2 is equivalent to the requirement that the derivative of u at x, which is the matrix

    [ ∂u1/∂x1(x)  · · ·  ∂u1/∂xd(x) ]
    [     ⋮         ⋱         ⋮     ]
    [ ∂un/∂x1(x)  · · ·  ∂un/∂xd(x) ],

has rank n − 1. This means that there is a uniquely defined hyperplane that is tangent to u(X) at the point u(x). When there are just two individuals, this implies there is a unique tangent line at (u1(x), u2(x)), as in Figure 16. See Figure 7 for the case of three individuals. This hyperplane has a normal vector p that is uniquely defined up to a non-zero scalar. The first order condition (12) from Theorem 7.2 can be written in matrix terms as

    (λ^i_1, . . . , λ^i_{i−1}, 1, λ^i_{i+1}, . . . , λ^i_n)
    [ ∂u1/∂x1(x)  · · ·  ∂u1/∂xd(x) ]
    [     ⋮         ⋱         ⋮     ]
    [ ∂un/∂x1(x)  · · ·  ∂un/∂xd(x) ]  =  0,

and we conclude that p is, up to a non-zero scalar, equal to the vector of multipliers (with a coefficient of one for i) for individual i's problem.

An implication of the above analysis is that the vectors (λ^i_1, . . . , λ^i_{i−1}, 1, λ^i_{i+1}, . . . , λ^i_n) of multipliers corresponding to individuals i = 1, . . . , n are collinear. Indeed, they are each normal to the tangent hyperplane at u(x), and the set of normal vectors is one-dimensional, so the claim follows. The claim can also be verified mechanically by multiplying both sides of

    Dui(x) = −Σ_{j≠i} λ^i_jDuj(x)

by −1/λ^i_j and manipulating to obtain

    Duj(x) = −(1/λ^i_j)Dui(x) − Σ_{k≠i,j} (λ^i_k/λ^i_j)Duk(x).


Since the multipliers for j's problem are unique, this implies λ^j_i = 1/λ^i_j and, for all k ≠ i, j, λ^j_k = λ^i_k/λ^i_j, as claimed.

Three interesting conclusions follow from these observations. First, the multipliers from Theorem 7.2 are actually strictly positive. Second, the utility shadow prices for any two individuals are reciprocal: we can transfer utility from j to i at rate λ^i_j, and we can transfer utility from i to j at rate λ^j_i = 1/λ^i_j. Third, the relative prices of any two individuals are independent of the problem we consider. To see this, consider any two individuals h, i, and let j and k be any two individuals. Then from the analysis in the preceding paragraph, we have

    λ^i_j/λ^i_k = (λ^h_j/λ^h_i) / (λ^h_k/λ^h_i) = λ^h_j/λ^h_k.

If, for example, it is twice as expensive, in terms of i's utility, to increase j's utility as it is to increase k's utility, then it is also twice as expensive in terms of h's utility.
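
These relations are easy to verify from gradients alone. In the sketch below (Python; the three gradients are invented so that they sum to zero with positive weights, as at a Pareto optimum with n = 3 and d = 2), the multipliers λ^i_j are recovered from (12) by solving a linear system, and the reciprocity and relative-price relations are checked.

    import numpy as np

    # Hypothetical gradients Du_1, Du_2, Du_3 at a Pareto optimal x.
    Du = np.array([[2.0, 0.0],
                   [0.0, 1.0],
                   [-1.0, -1.0]])

    def multipliers(i):
        """Solve Du_i = -sum_{j != i} lam_j Du_j for the lam_j (equation (12))."""
        others = [j for j in range(3) if j != i]
        A = -Du[others].T                  # columns are -Du_j for j != i
        lam = np.linalg.solve(A, Du[i])
        return dict(zip(others, lam))

    lam1 = multipliers(0)                  # {1: 2.0, 2: 2.0}: lam^1_2 and lam^1_3
    lam2 = multipliers(1)                  # {0: 0.5, 2: 1.0}: lam^2_1 and lam^2_3
    print(lam2[0], 1 / lam1[1])            # reciprocity: lam^2_1 = 1 / lam^1_2 = 0.5
    print(lam1[2], lam2[2] / lam2[0])      # lam^1_3 = lam^2_3 / lam^2_1 = 2.0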

A final (and important) geometric insight stems from the sign of the multipliers; they are all non-negative, and at least one is strictly positive. Thus, the tangent hyperplane to u(X) has a normal vector with all non-negative coordinates, at least one positive. When there are just two individuals, this means that the utility frontier is sloping downward at (u1(x), u2(x)), as in Figure 16, and the idea extends to a general number of individuals, as in Figure 7. We conclude that at a Pareto optimal alternative for which the constraint qualification is satisfied, the boundary of u(X) is sloping downward, in a precise sense.

This is only a necessary condition, as Figure 17 illustrates: the boundary of u(X) is downward sloping at (u1(x), u2(x)), but x is Pareto dominated by y. Although conceptually possible, however, the anomaly depicted in the figure is precluded under the typical assumption of quasi-concave utility. Recall that, by Theorem 6.2, the first order condition is sufficient for a maximizer when the objective is quasi-concave with non-zero gradient and the constraints are quasi-convex. With Theorem 7.1, this yields the following result.

Theorem 7.3 Assume A ⊆ R^d is convex, let x ∈ A be an alternative, and assume each ui : R^d → R is continuously differentiable and quasi-concave. Suppose that for each i, Dui(x) ≠ 0 and there exist multipliers λ^i_j ≥ 0, j ≠ i, such that

    Dui(x) = −Σ_{j≠i} λ^i_jDuj(x).

Then x is Pareto optimal.

Thus, under quite general conditions, the first order condition (12) is necessary and sufficient for Pareto optimality.


[Figure 17: Violation of Pareto optimality. The set u(X) is drawn with utility for 1 and utility for 2 on the axes; the boundary is downward sloping at (u1(x), u2(x)), with normal vector p, yet the point (u1(y), u2(y)) dominates it.]

Corollary 7.4 Assume A ⊆ R^d is convex, let x be an alternative interior to A, and assume each ui : R^d → R is continuously differentiable and quasi-concave. Suppose that for each i, the gradients {Duj(x) | j ≠ i} are linearly independent. Then x is Pareto optimal if and only if there exist strictly positive multipliers μ1, . . . , μn > 0 such that

    Σ_{i=1}^n μiDui(x) = 0.

As discussed above, Pareto optimality implies strictly positive coefficients through the rank condition, and then we select any i in Theorem 7.2 and manipulate (12) to obtain the simpler first order condition in the above corollary. For the other direction, obviously all gradients must be non-zero, and we can manipulate the first order condition and set λ^i_j = μj/μi to fulfill the assumptions of Theorem 7.3.
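
As a concrete illustration of the corollary (a made-up spatial example, not from the text), take three concave quadratic utilities ui(x) = −‖x − x̂i‖² on R², with ideal points x̂i. A point is Pareto optimal exactly when non-negative weights summing to one (strictly positive in the interior case) make the weighted gradients vanish, which here amounts to lying in the convex hull of the ideal points.

    import numpy as np
    from scipy.optimize import nnls

    # Quadratic utilities u_i(x) = -||x - ideal_i||^2 on R^2 (hypothetical ideals).
    ideals = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

    def pareto_weights(x):
        """Look for mu >= 0, summing to 1, with sum_i mu_i Du_i(x) = 0."""
        Du = -2.0 * (x - ideals)                  # gradient of each u_i at x
        A = np.vstack([Du.T, np.ones(len(ideals))])
        b = np.concatenate([np.zeros(2), [1.0]])
        mu, resid = nnls(A, b)
        return mu, resid

    print(pareto_weights(np.array([0.25, 0.25])))  # resid ~ 0, mu > 0: Pareto optimal
    print(pareto_weights(np.array([1.0, 1.0])))    # resid > 0: not Pareto optimal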

    8 Mixed Constraints

The goal of this section is simply to draw together results for equality constrained and inequality constrained maximization into a general framework. Conceptually, nothing new is added.

Let f : R^n → R, g1 : R^n → R, . . . , gℓ : R^n → R, and h1 : R^n → R, . . . , hm : R^n → R.


We consider the hybrid optimization problem

    max_{x∈R^n} f(x)
    s.t. gj(x) = cj, j = 1, . . . , ℓ
         hj(x) ≤ dj, j = 1, . . . , m,

incorporating any restrictions on the domain of f into the constraints.

The first order analysis extends from the previous sections. See Theorem 1 and Corollary 3 of Fiacco and McCormick (1968).

Theorem 8.1 Let f : R^n → R, g1 : R^n → R, . . . , gℓ : R^n → R, h1 : R^n → R, . . . , hm : R^n → R be continuously differentiable in an open neighborhood around x. Suppose the first k inequality constraints are the binding ones at x, and assume the gradients {Dg1(x), . . . , Dgℓ(x), Dh1(x), . . . , Dhk(x)} are linearly independent. If x is a constrained local maximizer of f subject to g1(x) = c1, . . . , gℓ(x) = cℓ and h1(x) ≤ d1, . . . , hm(x) ≤ dm, then there are unique multipliers λ1, . . . , λℓ, μ1, . . . , μm ∈ R such that

    Df(x) = Σ_{j=1}^ℓ λjDgj(x) + Σ_{j=1}^m μjDhj(x)    (13)
    μj(dj − hj(x)) = 0, j = 1, . . . , m    (14)
    μj ≥ 0, j = 1, . . . , m.    (15)

As above, we can define the Lagrangian L : R^n × R^ℓ × R^m → R by

    L(x, λ1, . . . , λℓ, μ1, . . . , μm) = f(x) + Σ_{j=1}^ℓ λj(cj − gj(x)) + Σ_{j=1}^m μj(dj − hj(x)),

and condition (13) from Theorem 8.1 is then the requirement that x is a critical point of the Lagrangian given multipliers λ1, . . . , λℓ, μ1, . . . , μm.
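
A small numerical sketch of the hybrid problem (Python with SciPy; the objective and constraints are invented for illustration): we maximize f(x) = −(x1 − 2)² − (x2 − 2)² subject to the equality x1 + x2 = 2 and the inequality x1 ≤ 0.5, and then recover the multipliers in (13) by solving the linear system formed by the gradients of the equality constraint and the binding inequality constraint.

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative mixed problem: max f s.t. x1 + x2 = 2 (equality), x1 <= 0.5.
    f = lambda x: -((x[0] - 2) ** 2 + (x[1] - 2) ** 2)
    cons = [{"type": "eq",   "fun": lambda x: x[0] + x[1] - 2.0},
            {"type": "ineq", "fun": lambda x: 0.5 - x[0]}]
    res = minimize(lambda x: -f(x), x0=[0.0, 2.0], method="SLSQP", constraints=cons)
    x = res.x                                          # approximately (0.5, 1.5)

    # Recover multipliers from (13): Df(x) = lam*Dg(x) + mu*Dh(x), where
    # g(x) = x1 + x2 and h(x) = x1 (the inequality binds at the solution).
    Df = np.array([-2 * (x[0] - 2), -2 * (x[1] - 2)])
    A = np.column_stack([[1.0, 1.0], [1.0, 0.0]])      # columns are Dg and Dh
    lam, mu = np.linalg.solve(A, Df)
    print(x, lam, mu)                                  # x ~ (0.5, 1.5), lam ~ 1, mu ~ 2 >= 0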

Our results for quasi-concave objective functions with non-zero gradient go through in the general setting, now with the assumption that all equality constraints are linear and all inequality constraints are quasi-convex. Again, we rely on Theorem 6.2 for the proof.


Theorem 8.2 Let f : R^n → R be quasi-concave and continuously differentiable, let g1 : R^n → R, . . . , gℓ : R^n → R be linear, and let h1 : R^n → R, . . . , hm : R^n → R be quasi-convex. Suppose there exist λ1, . . . , λℓ, μ1, . . . , μm ∈ R such that the first order condition (13)–(15) holds with respect to x. Then x is a constrained global maximizer of f subject to g1(x) = c1, . . . , gℓ(x) = cℓ and h1(x) ≤ d1, . . . , hm(x) ≤ dm provided either of two conditions holds:

1. Df(x) ≠ 0, or

2. f is concave.

With the above convexity conditions on the objective and constraints, if x is a constrained strict local maximizer, then it is the unique global maximizer.

Theorem 8.3 Let f : R^n → R be quasi-concave, let g1 : R^n → R, . . . , gℓ : R^n → R be linear, and let h1 : R^n → R, . . . , hm : R^n → R