
Lagrangian Notes


Supplementary Notes for EC4101
(c) Indranil Chakraborty
(Strictly not for further distribution)

This note gives an overview of the mathematical details that advanced undergraduate students would benefit from. This material will not be covered in an examination and is being supplied only for the more mathematically inclined students. It is likely that most students will have a hard time following the vector notation; in that case, writing down the corresponding longhand expressions with just one or two constraints would help make things clear. In particular, the sections on the Lagrangian may be of most interest to you. It might help if you set n = 2 and m = k = 1 in those sections. Accordingly, you can ignore the notation T in the superscripts of vectors, which denotes the transpose. If you are dealing with m = k = 1 then such constants are scalars rather than vectors. The section on gradients should be skipped unless you really want to understand things at a much deeper level. Nonetheless, I have included it here in case some of you would like a quick summary of the related ideas and concepts.

The mathematical concepts are best reviewed in Mathematics for Economists by Carl P. Simon and Lawrence Blume (henceforth referred to as "S&B"). For concepts that are hard to follow, the best strategy for further clarification is to Google them.

Notations:
∈ : "is in"
∃ : "there exists a/an"
∀ : "for all"

1 A quick review of standard definitions and results

Set. A well-defined collection of elements is called a set. E.g., (i) the set of integers, (ii) the set of positive numbers, (iii) the set of functions that take values in the interval [0, 1].

Function. A function f from a set D to a set R is a relationship or mapping that associates each element of D with a unique element in R. We say that the function takes each element of D to an element of R and write f : D → R. This translation


from one set to another is described mathematically using a formula that is denoted by f(x) for every x ∈ D. E.g., (i) f(x) = x + 1 is a function f : ℝ → ℝ, (ii) f(x) = x² is a function f : ℝ → ℝ₊, (iii) f(x) = eˣ is a function f : ℝ → ℝ₊₊. In this case D is called the domain (of definition) of the function f.

A function is called one-to-one if f(x) = f(y) ⟹ x = y. A function is called onto if for each z ∈ R there is an x ∈ D such that f(x) = z.

n-dimensional Euclidean space (ℝⁿ). We define it as the n-fold Cartesian product of the real numbers: ℝⁿ = ℝ × ℝ × ⋯ × ℝ (n times).

Finite, countable and uncountable sets. A subset is called finite if the number of elements in the subset is finite. A subset that is not finite is called an infinite set. A subset S is called countable if its elements can be placed in a one-to-one correspondence with the set of natural numbers ℕ (i.e., if we can define a one-to-one function f from ℕ to S). A subset is uncountable if it is not countable.

Note that we often refer to finite subsets as countably finite. Countable subsets that are infinite are called countably infinite.

Infinite (countable) sequences. An array of numbers indexed by the natural numbers is called a sequence. E.g., {x²}_{x=1}^∞ denotes the sequence of numbers 1, 4, 9, 16, ....

Note that a sequence is essentially a function on the set of natural numbers ℕ.

Limits of (infinite) sequences. We say that a sequence {xₙ}_{n=1}^∞ converges to a limit x if for every ε > 0 there is an N(ε) ∈ ℕ such that |xₙ − x| < ε for all n > N(ε). We say that a sequence converges (or that the sequence is convergent) if it converges to some limit.

Subsequence. An infinite sub-collection from a sequence is called a subsequence of the sequence; e.g., the sequence {n²}_{n=1}^∞ is a subsequence of the sequence {n}_{n=1}^∞. The subsequence of a sequence is a sequence in its own right and, thus, may have limits, etc.

Theorem. A sequence is convergent if and only if every subsequence of the sequence converges to a common limit.


Closed set. We say that a set is closed if the limit of every convergent sequence in the set is also in that set. E.g., the set ℝ is closed, ℕ is closed (because ℕ does not have a convergent sequence in it), and [0, 1] is closed.

Theorem. (i) The union of a finite number of closed sets is always closed. (ii) The intersection of a finite number of closed sets is always closed.

Open set. A set S is called open if its complement Sᶜ is closed. E.g., (0, 1).

Theorem. The union of a finite number of open sets is always open. The intersection of a finite number of open sets is always open.

Example. The set [0, 1) is neither closed nor open.

Closure of a set. The closure of a set S, denoted S̄, is simply the set obtained by taking the union of all limit or accumulation points of S with S. Thus the closure of an open set is a closed set. E.g., the closure of the set (0, 1) is [0, 1].

Theorem. The closure of a set is a closed set.

Interior of a set. An element x ∈ S is said to be in the interior of the set if there is an ε > 0 such that for all y with the property |y − x| < ε, y ∈ S (i.e., all elements in a close neighborhood of x are also in S). The set of all such elements is called the interior of the set. E.g., the set (0, 1) is the interior of the sets [0, 1], [0, 1), (0, 1] and (0, 1).

Boundary of a set. The boundary of a set S, denoted ∂S, consists of all elements in the closure of the set that are not in the interior of the set. E.g., {0, 1} is the boundary of the sets [0, 1], (0, 1], [0, 1) and (0, 1).

Important note. The notation ∂ is the same as that used in denoting partial derivatives. It is generally clear from the context what exactly ∂ denotes.

Bounded set. A set S is called bounded if there is an M ∈ ℕ such that |x| < M for all x ∈ S.

Convex set. A set is convex if for every two elements x and y in the set the element λx + (1 − λ)y is also in the set for all λ ∈ (0, 1). E.g., the set (0, 1] is convex.


Continuous functions. A function f is continuous in its domain D if for each x ∈ D and each ε > 0 there is a δ > 0 such that whenever y ∈ D satisfies |y − x| < δ, |f(y) − f(x)| < ε. A function is said to be discontinuous at x if it is not continuous at x. E.g., the function f(x) = x² is continuous on [0, 1), while the function f defined as

f(x) = x² if x > 0
     = x + 1 if x ≤ 0

defined on ℝ is discontinuous at 0.

Theorem. A function f is continuous if and only if for every convergent sequence {xₙ} in D that converges to x in D, {f(xₙ)} converges to f(x).

Left and right derivatives. We define the left derivative of a function f at x by

f′₋(x) ≡ lim_{h→0⁻} [f(x + h) − f(x)] / h

and the right derivative by

f′₊(x) ≡ lim_{h→0⁺} [f(x + h) − f(x)] / h.

We say that the derivative f′(x) of f exists at x if f′₋(x) = f′₊(x). In that case the right (i.e., the left) derivative is called the derivative of f at x. The function is also called differentiable at x.

Differentiable functions. A function is called differentiable if its derivative exists everywhere in its domain, i.e., it is differentiable everywhere in its domain.

Continuously differentiable functions. A function f : D → ℝ is called continuously differentiable if its derivative f′(x), as a function of x, is continuous everywhere in D. The collection of all functions on D that are continuously differentiable is denoted by C¹(D), or simply C¹ if the domain is clear from the context.

Twice continuously differentiable functions. A function f : D → ℝ is called twice continuously differentiable if its derivative f′(x) is differentiable everywhere on D and the second derivative f″(x) is continuous everywhere on D. The collection of all functions on D that are twice continuously differentiable is denoted by C².


Convex and concave functions. A differentiable function f is called convex if f″(x) ≥ 0 for all x ∈ D and concave if f″(x) ≤ 0 for all x ∈ D. The function is called strictly convex or strictly concave if the inequalities are strict. More generally, we call a function f convex (resp., concave) if for all x, y ∈ D and λ ∈ (0, 1), f(λx + (1 − λ)y) ≤ (resp., ≥) λf(x) + (1 − λ)f(y).

1.1 Functions of several variables (S&B ch. 10, 12, 13, 14)

The Real Line. It is simply the collection of all real numbers. It is denoted by ℝ and represented by a straight line that extends in both the positive and negative directions.

Euclidean Space with Higher Dimensions. The direct product of n real lines gives the n-dimensional Euclidean space ℝⁿ. When n = 2 this space is called the Euclidean plane.

Vectors and scalars. An n-tuple (x₁, ..., xₙ) ∈ ℝⁿ is often called a vector (in the n-dimensional Euclidean space, and often written as a column rather than a row) with the interpretation that it has a sense of magnitude as well as a direction, rather than just a location in the space. This interpretation comes from the area of mechanics and is useful for doing multivariate calculus.

A scalar, on the other hand, is a real number with the interpretation that it has a sense of magnitude only.

(See section 10.2 of S&B and the exercises therein.)

Important concepts to go over: addition and subtraction of vectors, scalar multiplication of a vector.

Dot (or inner) product of vectors. The dot product of two n-vectors u and v is written and defined as u · v = Σᵢ₌₁ⁿ uᵢvᵢ. A standard result interprets the dot product as u · v = ‖u‖ ‖v‖ cos θ, where θ is the angle between the two vectors (the concept of the angle between vectors is tricky when we are in more than three dimensions, so just think of n = 2 or 3).

Norm or metric. There is an easy way to measure the distance between two points x and y in ℝ: simply calculate the absolute value |x − y| of the difference. When we deal with higher dimensional Euclidean spaces, like ℝⁿ, there is no straightforward


definition of "distance between two points." In that case, we construct an appropriate definition for the "distance" and call it the metric or the norm of the Euclidean space.

Conventionally, the distance (as defined) between the point 0 of the vector space and a point x under consideration is called the norm of the space and is denoted by ‖x‖. The idea is that this same definition can be used to calculate the "distance" d(x, y) between two points x and y as d(x, y) = ‖x − y‖.

Several notions of distance are used depending on the nature of the problem at hand. E.g., (i) ‖x‖ ≡ |x| (the absolute value) is a norm on the vector space ℝ; (ii) ‖x‖ = √(Σᵢ₌₁ⁿ xᵢ²) is a norm or metric on the vector space ℝⁿ. An alternative metric on ℝⁿ is also given by ‖x‖ = Σᵢ₌₁ⁿ |xᵢ|. Note that the two different norms in the example mean the same thing when n = 1, i.e., when the vector space is the real line ℝ.
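For readers who like to experiment, the two norms above are easy to compute numerically. A minimal Python sketch (numpy is my own choice of tool, not something used in these notes, and the vectors are arbitrary examples):

import numpy as np

x = np.array([3.0, -4.0])
y = np.array([0.0, 0.0])

# Euclidean norm: the square root of the sum of squared components
euclidean = np.sqrt(np.sum(x**2))      # equivalently np.linalg.norm(x)

# Alternative norm: the sum of the absolute values of the components
sum_abs = np.sum(np.abs(x))            # equivalently np.linalg.norm(x, 1)

# The distance between two points is the norm of their difference
dist = np.linalg.norm(x - y)

print(euclidean, sum_abs, dist)        # 5.0 7.0 5.0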

Bounded set. A set S is called bounded with respect to a norm ‖·‖ if there is an M ∈ ℝ₊ such that ‖x‖ ≤ M for all x ∈ S. A set that is not bounded is called an unbounded set.

Convergence of Sequences in ℝⁿ. (i) Sequences in ℝⁿ; (ii) convergence of a sequence; (iii) limit of a sequence.

Results
i. A sequence can have at most one limit.
ii. lim(xₙ + yₙ) = lim xₙ + lim yₙ.
iii. lim cxₙ = c lim xₙ.
iv. lim xₙ · yₙ = lim xₙ · lim yₙ.
v. A sequence in ℝⁿ converges if and only if every component converges in ℝ.

Subsequence of a sequence in ℝⁿ. Any sub-collection of the elements in the same order is called a subsequence of the original sequence.

Results
i. A sequence converges if and only if every subsequence converges.
ii. The limit of a convergent sequence is the same as the limit of any of its subsequences.

Limit points or accumulation points of a set. The limit of a convergent sequence constructed from the elements of a set is called a limit or accumulation point of the set.


Closed set. A subset S ⊆ ℝⁿ is called a closed set if it contains all its limit points.

Open sets. A subset S ⊆ ℝⁿ is called open if its complement is closed.

Results.
i. The union of a finite number of open (resp., closed) sets is open (resp., closed).
ii. The complement of an open (resp., closed) set is closed (resp., open).

Closure of a set. A subset S with all its limit points included is called the closure of the set. It is denoted by S̄.

Boundary of a set. An element that is in the closure of a set as well as the closure of its complement is called a boundary point of the set. The set of boundary points of a set S is denoted by ∂S.

Interior of a set. The interior of a set consists of all points in the set that are not in the boundary of the set. The interior of a set S is denoted by int S.

Open ball and closed ball. The set Bε(x) = {y ∈ ℝⁿ : ‖y − x‖ < ε} is called an open ball around x of diameter ε > 0. The set B̄ε(x) = {y ∈ ℝⁿ : ‖y − x‖ ≤ ε} is called a closed ball around x of diameter ε > 0. (Note that precisely what the set looks like depends on the norm that is being used.)

Compact set. A closed and bounded set is called a compact set.

Vector-valued functions on Euclidean spaces. A vector-valued function is a function f : ℝᵏ → ℝᵐ, or S → T where S ⊆ ℝᵏ and T ⊆ ℝᵐ, defined by a set of m real-valued functions as follows:

f(x) = (f₁(x), f₂(x), ..., fₘ(x))ᵀ.

Level curves and level sets. The level curve or set (corresponding to a level c) of a function f defined on D ⊆ ℝⁿ is precisely the set or curve described by {x ∈ D | f(x) = c}. Typically the level curve is simply described by the equation f(x) = c. E.g., an indifference curve or an isoquant curve.


Example. Consider the function f : S → ℝ given by f(x) = √(25 − x₁² − x₂²), where S = {x | x₁² + x₂² ≤ 25}. Let us find its level curves corresponding to c ∈ ℝ₊₊ and describe these level curves. The level curve/set corresponding to level c is simply the set of x ∈ S such that f(x) = c. It is described by the level set {x | x₁² + x₂² = 25 − c²}. As you can tell, the level set is defined only as long as |c| ≤ 5. So for c ∈ (0, 5) the level set corresponding to c is simply a circle centered at the origin with radius √(25 − c²). As c increases from 0 to 5 we get a system of concentric circles, all centered at the origin but with smaller and smaller radii. Note that when x₁ = 0 and x₂ = 0 the level is at its highest, viz. 5. In fact, the level set corresponding to c = 5 is just the singleton {(0, 0)}.

We call a set such as the above a level set. However, when the set can be geometrically represented by a curve, it is called a level curve. These sets are also called the contours of the graph of f(x) = √(25 − x₁² − x₂²).

Continuous functions. A function f is continuous at x ∈ D ⊆ ℝⁿ (a subset of a normed linear space) if for each ε > 0 there is a δ > 0 such that whenever y ∈ D satisfies ‖y − x‖ < δ, |f(y) − f(x)| < ε. The function f is said to be continuous on D if it is continuous at every x in D.

One-to-one and onto functions. A function f : D → R is called a one-to-one function if whenever x, y ∈ D satisfy x ≠ y, then f(x) ≠ f(y). A function is called onto if for every y ∈ R there is an x ∈ D with f(x) = y.

Inverse functions. The inverse function for a function f : D → R is a function f⁻¹ : R → D such that f⁻¹(f(x)) = x for all x ∈ D and f(f⁻¹(y)) = y for all y ∈ R.

Results
i. The inverse of a function exists if and only if it is one-to-one and onto.
ii. Every strictly increasing function has an inverse.

(First order) partial derivative. The derivative of a function with respect to exactly one variable, treating the other variables as constants, is called a partial derivative. Specifically, we define the partial derivative of f on ℝⁿ with respect to xᵢ as

∂f(x)/∂xᵢ ≡ lim_{h→0} [f(x₁, ..., x_{i−1}, xᵢ + h, x_{i+1}, ..., xₙ) − f(x)] / h


whenever the limit exists.

See 14.1 and 14.3 of S&B. The geometric interpretations in this chapter are particularly useful.

Differentiability. We say that f is differentiable if all the first order partials exist.

Total derivative. The total derivative of f is defined as

df(x) ≡ Σᵢ₌₁ⁿ ∂f(x)/∂xᵢ dxᵢ.

It is interpreted as the calculation that goes into expressing the (small) change in f in terms of the changes in the component variables and the slope of f in the different directions. E.g., consider the case of a function f defined on the real line; then we have df(x) ≡ f(x + dx) − f(x) = f′(x)dx.

Example. Let f(x) = x₁² + x₂³ − x₃² + x₁³x₂ be a function on ℝ³. The total differential of this function is

df(x) = (2x₁ + 3x₁²x₂)dx₁ + (3x₂² + x₁³)dx₂ − 2x₃dx₃.
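As a quick check of this example, the partial derivatives entering the total differential can be recomputed symbolically. A small sketch using sympy (an assumption on my part; any computer algebra system would do):

import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 + x2**3 - x3**2 + x1**3*x2

# The coefficients of dx1, dx2, dx3 in df are the first-order partials
print(sp.diff(f, x1))   # 3*x1**2*x2 + 2*x1
print(sp.diff(f, x2))   # x1**3 + 3*x2**2
print(sp.diff(f, x3))   # -2*x3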

Parametric curves. There are different ways in which curves in ℝⁿ can be defined. One way is to describe the coordinates as we move along the curve. For instance, we could describe the curve by the function (x₁(t), ..., xₙ(t)) of a parameter t, which could be interpreted as time. The curve is then fully described by varying the parameter t on an interval, say [0, 1].

Example. Consider the circle x² + y² = 1. The circle can be described using a parametric representation as given below:

x = cos θ
y = sin θ

for 0 ≤ θ ≤ 2π.

Tangent vector to a curve. The tangent vector to a parametrically described curve x(t) at a point t = t₀ is given by x′(t₀).

Example. Suppose that a piece of stone is held at the end of a string of length 1 unit and rotated about the origin. Suppose I rotate it and let go of it at time t; then the


stone will move in the direction of the tangent to the circle at the point where its position would be at time t. This tangent is a vector with a direction and a magnitude. The direction will depend on the position of the stone on the circle at time t, and intuitively the magnitude of the vector will determine the distance that the stone will be carried once I let go of it. Again, intuitively one would expect that the higher the speed at which I rotate the stone at the end of the string, the greater the distance it will go once it is released.

This is easily seen by considering the path of the stone over time t as it rotates, as described by the parametric form

x = cos t
y = sin t

The speed at which I am rotating the stone covers an angle t in time t. If I rotate it at a faster speed it covers an angle 2t in time t. In the first case, the position of the stone at time t = π/2 is (0, 1). The tangent vector at that point is

(−sin(π/2), cos(π/2)) = (−1, 0).

That gives us an idea of the force acting on the stone as the stone is released. In the second case, when I am rotating the stone at a higher speed, the position of the stone is described by

x = cos 2t
y = sin 2t

and the stone reaches the same position (0, 1) at time π/4. Now let us calculate the tangent vector at that point. It is given by

(−2 sin(2·π/4), 2 cos(2·π/4)) = (−2, 0).

Thus the force acting on the stone in this case is twice that in the earlier case, but in the same direction.

Chain rule. Consider the function f(x) as its value changes along a curve written parametrically as (x₁(t), ..., xₙ(t)). The derivative of the function f(x(t)) of t with respect to t can be calculated as

d/dt f(x(t)) = Σᵢ₌₁ⁿ ∂f(x(t))/∂xᵢ · dxᵢ(t)/dt.


Example. A person holds a stock for company A and a stock for company B. The person's utility when the prices of the stocks are p_A and p_B is U(p_A, p_B) = 20p_A + 5p_A² + 20p_B. Suppose at time t prices are given by p_A(t) = t and p_B(t) = 2t². How does the person's utility change over time?
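One way to answer the question is to apply the chain rule just stated: dU/dt = (∂U/∂p_A)·p_A′(t) + (∂U/∂p_B)·p_B′(t). A minimal sympy sketch of that computation (the variable names are my own):

import sympy as sp

t, pA, pB = sp.symbols('t pA pB')

U = 20*pA + 5*pA**2 + 20*pB            # utility as a function of the two prices
pA_t, pB_t = t, 2*t**2                 # prices as functions of time

# Chain rule: dU/dt = U_pA * pA'(t) + U_pB * pB'(t), evaluated along the price path
dUdt = sp.diff(U, pA)*sp.diff(pA_t, t) + sp.diff(U, pB)*sp.diff(pB_t, t)
dUdt = dUdt.subs({pA: pA_t, pB: pB_t})

print(sp.expand(dUdt))                 # 90*t + 20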

Directional derivatives and Gradients. The chain rule is very useful for computing how a function changes in a given direction. For instance, suppose we are interested in knowing how a function f : ℝⁿ → ℝ changes as we move in the direction of the vector v starting at a point x. The curve that this linear motion defines is described by the parametric function

x(t) ≡ x + vt.

Now using the chain rule we have that the derivative of f at x in the direction v is given by

d/dt f(x(t)) = Σᵢ₌₁ⁿ ∂f(x)/∂xᵢ · vᵢ.

The array of all partial derivatives, written and interpreted as a vector in its own right, is called the gradient (vector) of f at x and written as

∇f(x) = (∂f(x)/∂x₁, ..., ∂f(x)/∂xₙ)ᵀ.

Now we can write the above directional derivative in the more compact form ∇f(x) · v.

Observe that the magnitude of the directional derivative depends on the magnitude of v. So in order to make directional derivatives comparable across all directions, we often consider vectors v with ‖v‖ = 1.

Interpreting the Gradient Vector. The directional derivative takes the greatest value in the direction in which f increases most rapidly. Thus, ∇f(x) · v is greatest when v points in the direction in which f increases most rapidly. Recall that ∇f(x) · v = ‖∇f(x)‖ ‖v‖ cos θ, where θ is the angle between the two vectors ∇f(x) and v. Since cos θ takes its highest value of 1 when θ = 0, this means that ∇f(x) · v takes the greatest value when v points in the same direction as ∇f(x). In other words, the


vector ∇f(x) really points in the direction in which f increases most rapidly (i.e., the direction of steepest ascent).

Example. Consider the function f(x) = x₁² + x₂³ − x₁³x₂ on ℝ². The directional derivative of this function at (1, 1) in the direction of the vector (1, 2) is given by

df/dt((1, 1) + t(1, 2))|_{t=0}
= df/dt((1 + t, 1 + 2t))|_{t=0}
= d/dt[(1 + t)² + (1 + 2t)³ − (1 + t)³(1 + 2t)]|_{t=0}
= [2(1 + t) + 3·2·(1 + 2t)² − 3(1 + t)²(1 + 2t) − 2(1 + t)³]|_{t=0}
= 2 + 3·2 − 3 − 2
= 3.

Exercise. Calculate the same directional derivative using the alternative formula v · ∇f(x) and check that the answer is the same as above.
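For those who want to verify their answer to the exercise, a short sympy sketch that evaluates ∇f(1, 1) and takes its dot product with (1, 2) (again, sympy is just my own choice of tool):

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 + x2**3 - x1**3*x2

grad = [sp.diff(f, x1), sp.diff(f, x2)]              # gradient of f
grad_at_point = [g.subs({x1: 1, x2: 1}) for g in grad]

v = [1, 2]                                           # direction vector
directional = sum(g*vi for g, vi in zip(grad_at_point, v))
print(grad_at_point, directional)                    # [-1, 2] 3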

Explicit functions from ℝⁿ to ℝᵐ. A function y = f(x₁, ..., xₙ) of variables x₁, ..., xₙ is called an explicit function. For instance, the function y = x₁⁴ + x₃² is an explicit function.

Jacobian of a function f : ℝⁿ → ℝᵐ. The Jacobian of f is defined as the matrix of partial derivatives

\frac{\partial f(x)}{\partial x} \equiv \begin{pmatrix} \frac{\partial f_1(x)}{\partial x_1} & \frac{\partial f_1(x)}{\partial x_2} & \cdots & \frac{\partial f_1(x)}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_m(x)}{\partial x_1} & \frac{\partial f_m(x)}{\partial x_2} & \cdots & \frac{\partial f_m(x)}{\partial x_n} \end{pmatrix}

Higher order derivatives. The derivatives ∂ʳf(x)/∂xᵢʳ, i = 1, ..., n, are called the r-th order partial derivatives of f. Derivatives like ∂ʳf(x)/(∂xᵢᵏ ∂xⱼˡ), where k + l = r, are called the r-th order mixed partial derivatives of f.


1.2 Implicit Functions and Related Concepts

We call a function y = f(x), where f : ℝᵐ → ℝⁿ, implicitly defined if it is described as follows:

F(x₁, ..., xₘ, y₁, ..., yₙ) = 0

with the y variables being identified for every value of the x variables. In other words, we say that y is an implicit function of x if all of the x and y variables are "packed" into a function on the same side of the equation, as above. Note that the function F is a vector-valued function if n > 1. In many applications, including econometrics, the y variables are thought of as endogenous while the x variables are thought of as exogenous.

Example. In the following, y is an implicit function of x:

x₁ + x₂ + y₁ + y₂ = 0
x₁ − x₂ + y₁ − y₂ = 0.

Example. In the following, y is not an implicit function of x around (1, 0):

x² + y² − 1 = 0.

Often when we deal with the first order conditions for our optimization problems we face systems of equations whose solution potentially identifies the solution to the optimization problem. In order to know whether a solution to such equations exists, it becomes important to know whether the relevant explicit function exists locally. Whether a solution does exist, and the behavior of the solution, crucially hinge on the following two questions:

i. Is it possible to write an implicit function G(x, y) = 0 in an explicit form y = f(x)?
ii. Is it possible to calculate f′(x)?

If there does exist such a function then it must be the case that

G(x, f(x)) = 0.

Using the chain rule of differentiation we have

G_x(x, f(x)) + G_y(x, f(x)) f′(x) = 0


i.e.,

f′(x) = −G_x(x, f(x)) / G_y(x, f(x)).

Formally, we have the following theorem:

Implicit function theorem with two variables

Let G(x, y) be a C¹ function on a ball around (x̄, ȳ) in ℝ². Suppose that G(x̄, ȳ) = c and consider the expression G(x, y) = c.

If ∂G/∂y(x̄, ȳ) ≠ 0, then there exists a C¹ function y = f(x) defined on an interval I about the point x̄ such that:

(a) G(x, f(x)) ≡ c for all x in I;
(b) f(x̄) = ȳ, and
(c) f′(x̄) = −[∂G/∂x(x̄, ȳ)] / [∂G/∂y(x̄, ȳ)].

This theorem generalizes to the following:

Implicit function theorem with many variables

Let G(x₁, x₂, ..., xₙ, y) be a C¹ function on a ball about (x̄₁, x̄₂, ..., x̄ₙ, ȳ) in ℝⁿ⁺¹. Suppose that G(x̄₁, x̄₂, ..., x̄ₙ, ȳ) = c and consider the expression G(x₁, x₂, ..., xₙ, y) = c.

If ∂G/∂y(x̄₁, x̄₂, ..., x̄ₙ, ȳ) ≠ 0, then there exists a C¹ function y = f(x₁, x₂, ..., xₙ) defined on a ball B about the point (x̄₁, x̄₂, ..., x̄ₙ) such that:

(a) G(x₁, x₂, ..., xₙ, f(x₁, x₂, ..., xₙ)) ≡ c for all (x₁, x₂, ..., xₙ) in B;
(b) f(x̄₁, x̄₂, ..., x̄ₙ) = ȳ, and
(c) ∂f/∂xᵢ(x̄₁, x̄₂, ..., x̄ₙ) = −[∂G/∂xᵢ(x̄₁, x̄₂, ..., x̄ₙ, ȳ)] / [∂G/∂y(x̄₁, x̄₂, ..., x̄ₙ, ȳ)] for each i = 1, ..., n.
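As an illustration of part (c), here is a hedged sympy sketch for G(x, y) = x² + y² with c = 1, around a point on the unit circle where ∂G/∂y ≠ 0 (the point (3/5, 4/5) is my own choice); the implicit-function formula agrees with differentiating the explicit branch y = √(1 − x²):

import sympy as sp

x, y = sp.symbols('x y')
G = x**2 + y**2                                   # the level set G(x, y) = 1 is the unit circle

x0, y0 = sp.Rational(3, 5), sp.Rational(4, 5)     # a point with dG/dy != 0

# f'(x0) = -G_x(x0, y0) / G_y(x0, y0)
implicit_slope = -sp.diff(G, x).subs({x: x0, y: y0}) / sp.diff(G, y).subs({x: x0, y: y0})

# Compare with the explicit branch y = sqrt(1 - x^2)
explicit_slope = sp.diff(sp.sqrt(1 - x**2), x).subs(x, x0)

print(implicit_slope, sp.simplify(explicit_slope))   # both equal -3/4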

Taylor's Theorem (for a function of a single variable). Let f : (a, b) → ℝ be a Cᵏ function on (a, b). Then for x ∈ (a, b) and h ≠ 0 such that x + h ∈ (a, b) we have

f(x + h) = f(x) + \sum_{r=1}^{k-1} \frac{h^r}{r!} f^{(r)}(x) + \frac{h^k}{k!} f^{(k)}(\tilde{x})

for some number x̃ between x and x + h.¹

¹We could have alternatively stated the theorem as

f(x + h) = f(x) + \sum_{r=1}^{k-1} \frac{h^r}{r!} f^{(r)}(x) + \frac{h^k}{k!} f^{(k)}(x + th)

for some number t ∈ (0, 1).
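A quick numerical illustration of the single-variable theorem (my own toy example, with f(x) = eˣ, x = 0 and k = 3): the error of the (k − 1)-term expansion shrinks like hᵏ, and dividing it by hᵏ gives roughly the constant f⁽ᵏ⁾(x̃)/k! ≈ 1/6.

import math

def taylor_part(x, h, k):
    # f(x) + sum_{r=1}^{k-1} h^r/r! * f^(r)(x) for f = exp, whose derivatives are all exp(x)
    return sum(h**r / math.factorial(r) * math.exp(x) for r in range(0, k))

x, k = 0.0, 3
for h in (0.1, 0.01, 0.001):
    remainder = math.exp(x + h) - taylor_part(x, h, k)
    print(h, remainder, remainder / h**k)   # the last column stays close to 1/6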


Hessians. The Hessian of a function f of n variables is the matrix of all its second order partial derivatives. It is written as

f_{xx}(x) = D²f(x) = \begin{pmatrix} \frac{\partial^2 f(x)}{\partial x_1^2} & \frac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \cdots & \frac{\partial^2 f(x)}{\partial x_n \partial x_1} \\ \frac{\partial^2 f(x)}{\partial x_1 \partial x_2} & \frac{\partial^2 f(x)}{\partial x_2^2} & \cdots & \frac{\partial^2 f(x)}{\partial x_n \partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x)}{\partial x_1 \partial x_n} & \frac{\partial^2 f(x)}{\partial x_2 \partial x_n} & \cdots & \frac{\partial^2 f(x)}{\partial x_n^2} \end{pmatrix}

Young's Theorem. This result essentially tells us that, in the kinds of situations we will encounter, the order in which derivatives are taken in mixed partial derivatives is irrelevant, i.e.,

∂²f(x)/∂xᵢ∂xⱼ = ∂²f(x)/∂xⱼ∂xᵢ.

Taylor's Theorem (for a function of many variables). Let f : X → ℝ, where X is an open subset of ℝⁿ, be a C² function. Then for x ∈ X and h ≠ 0 with x + th ∈ X for all t ∈ [0, 1] we have

f(x + h) = f(x) + f_x(x + θh)h

for some θ ∈ (0, 1), and

f(x + h) = f(x) + f_x(x)h + (1/2!) hᵀ f_{xx}(x + θh)h

for some θ ∈ (0, 1), where f_{xx} is the Hessian of f (i.e., the Jacobian of f_x).²

²Note that Taylor's theorem for functions of many variables can be stated so as to approximate a function with the k-th and lower order derivatives. However, in that case the formula must be written out longhand.

1.3 Quadratic forms

Quadratic form. A function of many variables of the form Q(x) = Σᵢ,ⱼ aᵢⱼxᵢxⱼ is called a quadratic form. Incidentally, a quadratic form can be written as

Q(x) = xᵀAx

where A is a symmetric matrix.


We call a quadratic form Q(x) (i) negative definite if Q(x) < 0 for all x ≠ 0, (ii) positive definite if Q(x) > 0 for all x ≠ 0, (iii) non-negative definite (or positive semi-definite) if Q(x) ≥ 0 for all x ≠ 0, (iv) non-positive definite (or negative semi-definite) if Q(x) ≤ 0.

Theorem 16.1 (S&B). (a) A symmetric matrix A is positive definite if and only if all its leading principal minors are strictly positive. (b) A symmetric matrix A is negative definite if and only if its leading principal minors alternate in sign, starting with a negative sign.

Example. The quadratic form Q(x) = 2x² is positive definite while Q(x) = −2x² is negative definite.

It can be easily checked that Q(0) = 0 holds for any quadratic form. The sign definiteness of a quadratic form is interesting because it is equivalent to x = 0 being a maximum or a minimum of Q(x). If the quadratic form is positive definite (resp., negative definite) then the quadratic form attains its minimum (resp., maximum) at x = 0. As we will see later on, this is also helpful in contexts where we are dealing with more general functions, the reason being that locally all C² functions can be approximated by a quadratic form. (Remember Taylor's Theorem!)

Observe from the first example that whether the extremum is a max or a min depends on whether the second order derivative is negative or positive.
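A small numerical sketch of the test in Theorem 16.1 (numpy is my own choice of tool, and the matrix below is just an illustrative example): compute the leading principal minors and read off the sign pattern.

import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])        # symmetric matrix of the quadratic form x'Ax

# Leading principal minors: determinants of the top-left 1x1, 2x2, ... submatrices
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]
print(minors)                      # approximately [2.0, 3.0]: all positive, so A is positive definite

# Direct sign check of the quadratic form on a few non-zero vectors
for x in ([1.0, 0.0], [1.0, 1.0], [-1.0, 2.0]):
    x = np.array(x)
    print(x @ A @ x)               # every value is strictly positive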

1.4 Existence of maximum/minimum

Weierstrass Theorem. Every continuous function on a closed and bounded subset of ℝⁿ attains its maximum and minimum.

Theorem. Every strictly concave (resp., convex) continuous function on a closed, bounded and convex subset has a unique maximum (resp., minimum).

2 Static Optimization

Consider the problem of a monopoly firm. It faces a cost function C(q) = q² and an inverse demand function P(q) = 100 − q. The problem that the firm faces is to find


the output it should produce and the price it should charge to maximize its profit. Without even looking at the details of the functions we can write down the necessary condition (viz. MC = MR) that the firm needs to solve in order to make its price-output decision. The rule is obtained by solving the seller's profit maximization problem involving a single variable.

Consider the problem of the same monopolist if it sells its product in two different markets that are disjoint, with no arbitrage, the inverse demand for the second market being P(q) = 100 − 2q. The problem becomes slightly more complex in that it now involves two variables, q₁ and q₂, representing the outputs in the two markets.

Now suppose that the government stipulates that the firm must produce 20 units of the output in total. How should the firm set its price-quantity decisions for the two markets? The constrained optimization problem involved in this case is much more complex than the ones described above. How one solves such problems in general, and uses them in economic analysis, is what we are going to talk about in the next several lectures. Before we consider the more complex area of constrained optimization, let us refresh our memory of unconstrained optimization in a rather formal way. In the process, we will try to develop a different approach that will be helpful for solving and analyzing constrained optimization problems.

3.1. Unconstrained Static Optimization

Consider the problem max_{x∈S} f(x) of maximizing a function on an open subset S of ℝⁿ.

Theorem 3.1.1. (First order necessary condition). If f : S → ℝ is C¹ on an open set S ⊆ ℝⁿ and f has a local maximum at x* ∈ S, then f_x(x*) = 0.

Theorem 3.1.2. (Sufficient condition). If f : S → ℝ is C² (S is an open subset of ℝⁿ) and x* satisfies the conditions (i) f_x(x*) = 0 and (ii) the Hessian f_{xx}(x) is negative semi-definite in an open neighborhood of x*, then f has a local maximum at x*.

We will use only the strict version of this theorem:

Theorem 3.1.2a. (Sufficient condition). If f : S → ℝ is C² (S is an open subset of ℝⁿ) and x* satisfies the conditions (i) f_x(x*) = 0 and (ii) the Hessian f_{xx}(x*) is negative definite, then f has a local maximum at x*.


Theorem 3.1.3. (Sufficient condition for a strict maximum). If f : S → ℝ is C² and x* satisfies the conditions (i) f_x(x*) = 0 and (ii) the Hessian f_{xx}(x*) is negative definite, then f has a strict local maximum at x*.

Example. Consider a competitive firm with a U-shaped marginal cost curve C_q(q). The first-order condition is MC(q) = p, which is satisfied both where the profit is minimized and where it is maximized. The second-order condition then rules out one of the points.

Note. If, in addition to the conditions in Theorems 3.1.2 and 3.1.3, f is also a concave function, then f has a global maximum at x*.

Example 3.1.1. Consider the profit function π(x₁, x₂) = 100 − (x₁ − 10)² − (x₂ − 15)² of a competitive firm when it uses inputs x₁ and x₂ to produce its output and sell it at the market price (over the open subset S = {x : (x₁ − 10)² + (x₂ − 15)² < 100}). A theorem from above (which one?) implies that the profit function attains a strict global maximum at x* = (10, 15)ᵀ.

Now consider the level sets {x ∈ S : π(x) = r}. Observe that an alternative (although a bit more difficult) method of solving the same example above is to find the x ∈ S and the level r for which the level set is the singleton {x}. While this may seem awkward in this case, a similar approach makes solving a constrained optimization problem easy.

3 Static Optimization with equality constraints

Let us continue with the above example. Suppose that we want to maximize π(x₁, x₂) = 100 − (x₁ − 10)² − (x₂ − 15)² subject to the constraint that the firm must spend $200 on inputs with prices p₁ = 20 and p₂ = 5. This is really a maximization problem of the following type:

max_x f(x)
s.t. h(x) = 0

In this case the problem takes the form

max_x 100 − (x₁ − 10)² − (x₂ − 15)²
s.t. 20x₁ + 5x₂ = 200.


The most direct way of solving the problem involves writing the constraint as x₂ = 40 − 4x₁ and substituting it into the objective function, so that it becomes a maximization problem involving the single variable x₁, as follows:

max_{x₁} 100 − (x₁ − 10)² − (25 − 4x₁)².

By a previous result, the first-order condition is given by

−2(x₁ − 10) + 8(25 − 4x₁) = 0

which gives x₁ = 110/17 ≈ 6.47. We then have x₂ = 40 − 4·(110/17) = 240/17 ≈ 14.12. Compare this solution graphically with the solution (10, 15) of the unconstrained maximization.
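A short symbolic check of this substitution step (sympy is my own choice of tool; the calculation mirrors the text exactly):

import sympy as sp

x1 = sp.symbols('x1')
objective = 100 - (x1 - 10)**2 - (25 - 4*x1)**2   # profit after substituting x2 = 40 - 4*x1

foc = sp.diff(objective, x1)                      # first-order condition
sol = sp.solve(sp.Eq(foc, 0), x1)

print(sol)                                        # [110/17]
print([40 - 4*s for s in sol])                    # [240/17]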

3.1 The tangent method

The direct method becomes computationally intensive when there are two or more constraints involved in the maximization. When the constraints are nonlinear, that adds to the difficulty as well. This calls for developing a more convenient tool, which we discuss below. To see how we can come up with the same necessary condition for a maximum using an alternative technique, consider the level curves of the objective function f. Notice that if f attains its maximum at x* under the single constraint h(x) = 0, then one of the level curves, viz. {x : f(x) = f(x*)}, must be just about touching the curve h(x) = 0. Thus the curves have the same slope at x* and share a common tangent (in the case of the above example, the constraint itself describes the tangent). This "common-tangent" condition implies that

[∂f/∂x₁(x*)] / [∂f/∂x₂(x*)] = [∂h/∂x₁(x*)] / [∂h/∂x₂(x*)]

or,

[∂f/∂x₁(x*)] / [∂h/∂x₁(x*)] = [∂f/∂x₂(x*)] / [∂h/∂x₂(x*)] = μ

for some μ. Thus we have the necessary conditions

∂f/∂x₁(x*) − μ ∂h/∂x₁(x*) = 0
∂f/∂x₂(x*) − μ ∂h/∂x₂(x*) = 0


to which we can also add the constraint h(x*) = 0. These necessary conditions can be interpreted as saying that the gradients are perfectly aligned, although possibly in opposite directions. The gradients point in the same or opposite direction depending on whether μ is > or < 0.

The same necessary conditions are also obtained by considering the Lagrangian function L(x, μ) ≡ f(x) − μh(x) and writing its first order (necessary) conditions for maximization at (x*, μ*).³ Observe that for the tangency argument to work the curves must be smooth; hence we need to assume that f and h are C¹. This discussion is summarized in the following theorem.

Theorem 4.1.1. (Theorem 18.1 of S&B; necessary condition with a single equality constraint). Let f and h be C¹ functions of two variables on an open subset of ℝ². Suppose that x* is a solution of the problem

max_x f(x)
s.t. h(x) = 0

Suppose further that the non-degenerate constraint qualification (NDCQ) h_x(x*) ≠ 0 is satisfied, and define the Lagrangian function

L(x, μ) ≡ f(x₁, x₂) − μh(x).

Then, there is a real number μ* such that L_{x,μ}(x*, μ*) = 0.

A few words about the NDCQ: To see the role of the NDCQ, observe that if h_x(x*) = 0 then we cannot rule out the possibility that the constraint is "thick" at x*, in which case the tangency argument fails. The NDCQ, however, is not a necessary condition for the constraint to be thin (see Dixit's discussion of the NDCQ on pages 13-14 for an example). To avoid complex discussions, here we simply do not deal with situations where the NDCQ is not satisfied.

Next we generalize the above theorem to the case of an arbitrary number of variables and constraints.

³Read also the discussions in Simon and Blume, and in Dixit.


Theorem 4.1.2. (Theorem 18.2 of S&B: necessary conditions with multiple equality constraints). Let f, h₁, ..., hₘ be C¹ functions on an open subset of ℝⁿ (i.e., functions of n variables). Suppose that x* is a solution of the problem

max_x f(x)
s.t. h(x) = 0

Suppose further that the Jacobian h_x(x*) has rank m (as large as it can be)⁴ and define the Lagrangian function

L(x, μ) ≡ f(x) − μᵀh(x).

Then, there is a μ* ∈ ℝᵐ such that L_{x,μ}(x*, μ*) = 0.

⁴This condition is the non-degenerate constraint qualification (NDCQ) for this result.

Question. Can you tell why the statements involve open subsets rather than closed ones?

Theorem 4.1.3. (Sufficient condition for a strict local maximum). Let f, h₁, ..., hₘ be C² functions on an open set in ℝⁿ. If there exist vectors x* and μ* such that L_{x,μ}(x*, μ*) = 0 and the Hessian L_{xx}(x*, μ*) is negative definite, then f has a strict local maximum at x*.

Example. Consider again the example of the firm that must maximize its profit subject to an expenditure of $200. Let us solve the same problem now using the Lagrangian method. First of all, observe that the NDCQ is satisfied. We have L(x, μ) = 100 − (x₁ − 10)² − (x₂ − 15)² − μ(20x₁ + 5x₂ − 200). Suppose that a maximum exists at x*; then the necessary conditions for maximization can be written as

L_{x₁}(x*, μ*) = −2(x₁* − 10) − 20μ* = 0
L_{x₂}(x*, μ*) = −2(x₂* − 15) − 5μ* = 0
L_{μ}(x*, μ*) = −(20x₁* + 5x₂* − 200) = 0

Solving these equations we have x₁* = 110/17, x₂* = 240/17, and μ* = 6/17.

Now we check whether the sufficiency condition for maximization is satisfied at this solution. The relevant Hessian is given by

L_{xx}(x*, μ*) = \begin{pmatrix} -2 & 0 \\ 0 & -2 \end{pmatrix}.


Since the Hessian is negative definite at x₁* = 110/17, x₂* = 240/17, and μ* = 6/17, the aforementioned input choices maximize the seller's profit under the given constraint.
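The three first-order conditions above form a linear system in (x₁, x₂, μ). As a check, a minimal sympy sketch that solves it (the symbol mu stands for the multiplier):

import sympy as sp

x1, x2, mu = sp.symbols('x1 x2 mu')

eqs = [sp.Eq(-2*(x1 - 10) - 20*mu, 0),      # dL/dx1 = 0
       sp.Eq(-2*(x2 - 15) - 5*mu, 0),       # dL/dx2 = 0
       sp.Eq(20*x1 + 5*x2 - 200, 0)]        # the budget constraint

print(sp.solve(eqs, [x1, x2, mu]))          # {x1: 110/17, x2: 240/17, mu: 6/17}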

Observe that the input quantities cannot possibly be negative, so we should really have imposed the constraints x₁ ≥ 0 and x₂ ≥ 0 as well. In this case we were just lucky that this did not cause any problem, but there can be situations where not imposing these constraints explicitly may end up giving negative quantities, which would be meaningless in this context. Later, we will see such a situation.

3.2 Optimization with inequality constraints

Theorem 4.2.1. (Theorem 18.3 of S&B: optimization under a single inequality constraint). Suppose that f and g are C¹ functions on ℝ² and that (x₁*, x₂*) maximizes f on the constraint set g(x₁, x₂) ≤ 0. If g(x₁*, x₂*) = 0, suppose that

∂g/∂x₁(x₁*, x₂*) ≠ 0 or ∂g/∂x₂(x₁*, x₂*) ≠ 0.

Then there is a multiplier λ* such that:

(i) ∂L/∂x₁(x₁*, x₂*, λ*) = 0
(ii) ∂L/∂x₂(x₁*, x₂*, λ*) = 0
(iii) λ* g(x₁*, x₂*) = 0
(iv) λ* ≥ 0
(v) g(x₁*, x₂*) ≤ 0

where the Lagrangian is L(x₁, x₂, λ) = f(x₁, x₂) − λg(x₁, x₂).

Theorem 4.2.2. (Theorem 18.4 of S&B: optimization under multiple inequality constraints). Suppose that f and g₁, ..., gₖ are C¹ functions on ℝⁿ and that x* maximizes f on the constraint set g(x) = (g₁(x), ..., gₖ(x)) ≤ 0. Suppose also that the first k₀ constraints are binding and that the last k − k₀ constraints are not binding, and that the rank of the Jacobian of the binding constraints

\begin{pmatrix} \frac{\partial g_1}{\partial x_1}(x^*) & \cdots & \frac{\partial g_1}{\partial x_n}(x^*) \\ \vdots & \ddots & \vdots \\ \frac{\partial g_{k_0}}{\partial x_1}(x^*) & \cdots & \frac{\partial g_{k_0}}{\partial x_n}(x^*) \end{pmatrix}


at x* is k₀.

Then, there is a vector of multipliers λ* such that:

(i) L_x(x*, λ*) = 0
(ii) λᵢ* gᵢ(x*) = 0, i = 1, ..., k
(iii) λ* ≥ 0
(iv) g(x*) ≤ 0

where the Lagrangian is L(x, λ) = f(x) − λᵀg(x).

3.3 More results with multiple equality/inequality constraints

Theorem 4.4.1. (Theorem 18.5 of S&B: necessary condition for optimization with equality and inequality constraints). Suppose that f, g₁, ..., gₖ, h₁, ..., hₘ are C¹ functions of n variables. Suppose x* ∈ ℝⁿ is a local maximizer of f on the constraint set defined by the k inequalities and m equalities:

g(x) ≤ 0
h(x) = 0.

Assume, without loss of generality, that the first k₀ inequality constraints are binding and the remaining k − k₀ are not. Suppose also that the NDCQ, which requires the Jacobian of the binding constraints to have full rank, holds, i.e.,

\begin{pmatrix} \frac{\partial g_1}{\partial x_1}(x^*) & \cdots & \frac{\partial g_1}{\partial x_n}(x^*) \\ \vdots & \ddots & \vdots \\ \frac{\partial g_{k_0}}{\partial x_1}(x^*) & \cdots & \frac{\partial g_{k_0}}{\partial x_n}(x^*) \\ \frac{\partial h_1}{\partial x_1}(x^*) & \cdots & \frac{\partial h_1}{\partial x_n}(x^*) \\ \vdots & \ddots & \vdots \\ \frac{\partial h_m}{\partial x_1}(x^*) & \cdots & \frac{\partial h_m}{\partial x_n}(x^*) \end{pmatrix}


has rank k₀ + m. Then there exist multipliers λ₁*, ..., λₖ*, μ₁*, ..., μₘ* such that

(i) L_x(x*, λ*, μ*) = 0
(ii) λᵢ* gᵢ(x*) = 0, i = 1, ..., k
(iii) h(x*) = 0
(iv) λ* ≥ 0
(v) g(x*) ≤ 0

where the Lagrangian is

L(x, λ, μ) = f(x) − λᵀg(x) − μᵀh(x).

Theorem 4.4.2. (Theorem 18.7 of S&B: Kuhn-Tucker necessary condition). Let f and gᵢ, i = 1, ..., k, be C¹ functions on an open subset S of ℝⁿ. Suppose x* is a solution of the constrained maximization problem

max_x f(x)
s.t. g(x) ≤ 0
     x ≥ 0

and that the Jacobian matrix (∂gᵢ/∂xⱼ) has maximal rank at x*, where the i's vary over the indices for which the constraint gᵢ(x) ≤ 0 is binding at x* and the j's vary over the indices for which xⱼ* > 0. Then there exist nonnegative multipliers λ₁*, ..., λₖ* such that x₁*, ..., xₙ*, λ₁*, ..., λₖ* satisfy the following system of equalities and inequalities:

∂L̃/∂x(x*, λ*) ≤ 0
∂L̃/∂λ(x*, λ*) ≥ 0
xᵢ* ∂L̃/∂xᵢ(x*, λ*) = 0, i = 1, ..., n
λⱼ* ∂L̃/∂λⱼ(x*, λ*) = 0, j = 1, ..., k


where L̃(x, λ) = f(x) − λᵀg(x) is the Kuhn-Tucker Lagrangian of the problem.

Theorem 4.4.3. (Kuhn-Tucker sufficiency condition). Let f be a concave C¹ function and gᵢ, i = 1, ..., k, be convex C¹ functions on an open convex subset S of ℝⁿ. If x* ∈ S satisfies the Kuhn-Tucker conditions

∂L̃/∂x(x*, λ*) ≤ 0
∂L̃/∂λ(x*, λ*) ≥ 0
xᵢ* ∂L̃/∂xᵢ(x*, λ*) = 0, i = 1, ..., n
λⱼ* ∂L̃/∂λⱼ(x*, λ*) = 0, j = 1, ..., k

then f has a global maximum at x* subject to the constraints

g(x) ≤ 0
x ≥ 0.

Example. Consider the problem of a price-discriminating monopolist with a cost function C(q) = q² who sells to two disjoint markets (without arbitrage) that have inverse demands P₁(q₁) = 100 − q₁ and P₂(q₂) = 150 − q₂. Suppose that environmental regulations prohibit the monopoly firm from producing more than 25 units of output in total. Calculate how much this profit-maximizing monopolist will sell in each market.

The Kuhn-Tucker necessary conditions for the problem are given by

∂L/∂q₁ = 100 − 2q₁ − 2(q₁ + q₂) − λ ≤ 0
∂L/∂q₂ = 150 − 2q₂ − 2(q₁ + q₂) − λ ≤ 0
q₁ ∂L/∂q₁ = q₁(100 − 2q₁ − 2(q₁ + q₂) − λ) = 0
q₂ ∂L/∂q₂ = q₂(150 − 2q₂ − 2(q₁ + q₂) − λ) = 0

and

q₁ + q₂ − 25 ≤ 0
λ(q₁ + q₂ − 25) = 0.


If q₁ + q₂ − 25 < 0 then λ = 0, which implies that

100 − 2q₁ − 2(q₁ + q₂) ≤ 0
150 − 2q₂ − 2(q₁ + q₂) ≤ 0

adding which we have q₁ + q₂ ≥ 250/6 > 25, which is impossible. Therefore, it must be true that

q₁ + q₂ = 25.

Next, the first two inequalities imply that

250 − 2(q₁ + q₂) − 4(q₁ + q₂) − 2λ ≤ 0

or, that

100 ≤ 2λ

or, that λ ≥ 50.

Next suppose that q₁ > 0, which implies that

100 − 2q₁ − 2(q₁ + q₂) − λ = 0

or

50 − 2q₁ − λ = 0

or, that

λ = 50 − 2q₁ < 50

since q₁ > 0. This is a contradiction of our previous finding that λ ≥ 50. Hence q₁ = 0 and q₂ = 25. In other words, it is optimal for the firm to produce 25 units and sell the entire output in market 2 only. The price it charges in market 2 is $125. (Note that in this case it is not even necessary to prevent arbitrage.)
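One can also cross-check this corner solution numerically. A hedged scipy sketch (scipy is an assumption on my part, and the starting point is arbitrary):

import numpy as np
from scipy.optimize import minimize

def neg_profit(q):
    q1, q2 = q
    return -((100 - q1)*q1 + (150 - q2)*q2 - (q1 + q2)**2)

constraints = [{'type': 'ineq', 'fun': lambda q: 25 - q[0] - q[1]}]   # q1 + q2 <= 25
bounds = [(0, None), (0, None)]                                       # q1 >= 0, q2 >= 0

res = minimize(neg_profit, x0=np.array([10.0, 10.0]),
               bounds=bounds, constraints=constraints)

print(res.x)      # approximately [0, 25]
print(-res.fun)   # profit of approximately 2500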

3.4 Interpretations of the Lagrange Multiplier and Comparative Statics with the Envelope Theorem

The precise manner in which exogenously given parameters of an economic system affect the endogenously obtained economic outcomes is of great interest to economists. For


instance, in the principal-agent model of designing the optimal contract, the principal would be interested to know how the reserve utility of the agent (which is often a function of the agent's outside options) would impact the principal's profit. The impact of the different constraints faced in decision making on our chosen objective is also interesting, among other things, for a better understanding and appreciation of the trade-offs we face in economic contexts. In what we have seen so far, the Lagrange multiplier that accompanied a constraint in the optimization problem was simply a tool that facilitated our computation. In this section we will see that the Lagrange multiplier is also loaded with information about the trade-offs made in the optimization process. We will also see results that tell us how to compute the relationship between the parameters of the model and the optimal value of the objective function.

First consider the following two theorems, which tell us how to interpret the Lagrange multipliers.

Theorem 5.1. (Theorem 19.2 of S&B: Interpretation of the multiplier: equality constraints). Let f, h₁, ..., hₘ be C¹ functions on ℝⁿ. Let a = (a₁, ..., aₘ) be an m-tuple of exogenous parameters, and let x*(a) be the solution of the maximization problem

max_x f(x)
s.t. h(x) = a

with the corresponding multipliers being μ*(a). Suppose that x*(a) and μ*(a) are differentiable functions of a and that the relevant NDCQ holds at a. Then the multiplier corresponding to the j-th constraint satisfies:

μⱼ*(a) = ∂f(x*(a))/∂aⱼ.

Thus the multiplier is the rate at which the maximum value of the objective function changes when the constraint is relaxed. It can be seen as the price we would be willing to pay to relax the constraint a little bit. It is, therefore, called the shadow price of the corresponding constraint. Increasing the level of aᵢ may have a positive or a negative effect on the objective. To see the intuition behind this observation, consider the case of a manufacturer who is constrained to use a fixed number of workers (due to government


regulations or worker union demands). It is possible that the number of workers the firm is allowed to employ is too few, and increasing the number of workers will help the firm organize things better (e.g., perform division of labor, etc.). In that case, the firm's profit can be expected to increase if it is allowed to add one more employee. On the other hand, consider the situation where the firm has many more employees than it needs. Imagine the extreme scenario where the firm is so over-staffed that some of the employees cannot be put into any productive activity. If the firm is constrained to add one more employee then that will not increase its output or revenue; it simply adds to the firm's cost. In other words, adding the employee in this case is actually going to decrease the firm's profit. It is for this reason that the shadow price corresponding to an equality constraint can be negative or positive.

Next consider the case of inequality constraints:

Theorem 5.2. (Theorem 19.3 of S&B: Interpretation of the multiplier: inequality constraints). Let f, g₁, ..., gₘ be C¹ functions on ℝⁿ. Let a = (a₁, ..., aₘ) be an m-tuple of exogenous parameters, and let x*(a) be the solution of the maximization problem

max_x f(x)
s.t. g(x) ≤ a

with the corresponding multipliers being λ*(a). Suppose that x*(a) and λ*(a) are differentiable functions of a and that the relevant NDCQ holds. Then the multiplier corresponding to the j-th constraint satisfies:

λⱼ*(a) = ∂f(x*(a))/∂aⱼ.

Example. Consider a monopoly seller who uses l units of labor and r units of a certain raw material to produce lr units of its product. The seller faces an inverse demand of P(q) = 100 − q in the market. Let us calculate the optimal choice of the seller if it is allowed to use at most 10 units of the raw material (by government regulation), when the prices of labor and the raw material are $2 and $1 per unit, respectively. In this


case, we need to solve

max_{l,r} (100 − lr)lr − (2l + r)
s.t. r ≤ 10
     l ≥ 0
     r ≥ 0

The Lagrangian is given by

L = (100 − lr)lr − (2l + r) − λ₁(r − 10) + λ₂l + λ₃r.

The first-order conditions are

100r − 2lr² − 2 = 0
100l − 2l²r − 1 − λ₁ = 0

The multipliers λ₂ and λ₃ are dropped from these equations since they will be equal to zero, in light of the fact that labor and the raw material will be chosen in positive quantities (otherwise the seller would make a zero profit when he could do better).

There are two possibilities. In the first, r < 10, in which case λ₁ = 0 and we need to solve

100r − 2lr² − 2 = 0
100l − 2l²r − 1 = 0.

These equations give two sets of (non-negative) solutions: (r ≈ 9.99, l ≈ 4.99) and (r ≈ 0.02, l ≈ 0.01). The corresponding values of the objective function are 2475.02 and −0.03, respectively. In the second possibility, r = 10, in which case the first equation gives

100·10 − 2·10²·l − 2 = 0

or, that l = 4.99. The corresponding value of the objective function is 2480.01. Thus the solution we are looking for is given by r = 10, l = 4.99. The value of the Lagrange multiplier in this case is given by

λ₁ = 100·4.99 − 2·2.99²·10 − 1 = 319.198


Trick: In this case we do not need to check the second order condition. We know that if there are multiple critical points, some of them must be maxima and the others minima. Therefore, using the continuity and differentiability of the functions, the solution with the largest value must be a max.

Interpretation of the multiplier: The monopoly seller is willing to pay a price of $319.198 per unit for using a tiny bit more of the raw material. That is because the seller's profit will increase at a rate of $319.198 if it can use a tiny bit more of the raw material. (Note the word "rate" in the statement.)

Next, let us examine the effect of other types of parameters (i.e., parameters that are not necessarily associated with the level of a constraint) on the optimal value of the objective function.

Theorem 5.3. (Theorem 19.4 of S&B: Envelope Theorem for unconstrained maximization). Let f(x; a) be a C¹ function of x ∈ ℝⁿ and of the real-valued parameter a. For each choice of a, let x*(a) be the C¹ solution of the unconstrained maximization problem max_x f(x; a). Then

d/da f(x*(a); a) = ∂f(x*(a); a)/∂a.

Proof. Recall that the first-order necessary condition for the maximization problem is

∂f(x*(a); a)/∂xᵢ = 0 for all i.

Now, using the chain rule for derivatives (see the discussion of preliminary mathematical results), we have for each j

d/daⱼ f(x*(a); a) = Σᵢ [∂f(x*(a); a)/∂xᵢ] · ∂xᵢ*(a)/∂aⱼ + ∂f(x*(a); a)/∂aⱼ
= ∂f(x*(a); a)/∂aⱼ

using the above first order condition. ∎
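A tiny symbolic illustration of the theorem (the quadratic objective below is my own toy example, not one from the notes): with f(x; a) = −(x − a)² + a² we get x*(a) = a, the value function is a², and both sides of the envelope formula equal 2a.

import sympy as sp

x, a = sp.symbols('x a')
f = -(x - a)**2 + a**2

x_star = sp.solve(sp.Eq(sp.diff(f, x), 0), x)[0]   # x*(a) = a
value = f.subs(x, x_star)                          # value function f(x*(a); a) = a**2

lhs = sp.diff(value, a)                            # d/da of the value function
rhs = sp.diff(f, a).subs(x, x_star)                # partial derivative of f in a at x*(a)

print(sp.simplify(lhs), sp.simplify(rhs))          # 2*a 2*a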

Theorem 5.4. (Theorem 19.5 of S&B: Envelope Theorem for maximization with equality constraints). Let f(x; a), h₁(x; a), ..., hₖ(x; a) be real-valued C¹ functions on


ℝⁿ × ℝ. Also let x*(a) be the solution of the maximization problem

max_x f(x; a)
s.t. h(x; a) = 0

with the corresponding multipliers being μ*(a). Suppose that x*(a) and μ*(a) are C¹ functions of a and that the NDCQ holds. Then,

d/da f(x*(a); a) = ∂L/∂a (x*(a), μ*(a); a),

where L is the Lagrangian for the problem.