
Lecture Notes on Mathematics for Economists: A refresher

(Draft for teaching only. Do not cite.)

Andrés Carvajal

University of Warwick

[email protected]

August, 2006


Contents

Preface

1 Preliminary concepts
  1.1 Some useful notation
  1.2 Set theoretical concepts
    1.2.1 Definitions:
    1.2.2 Operations:
  1.3 The principle of mathematical induction

2 Introduction to real analysis
  2.1 Natural and Real numbers
  2.2 Functions
  2.3 Sequences
  2.4 Cauchy sequences and subsequences
  2.5 Limits
    2.5.1 Limits of sequences
    2.5.2 Limits of functions
    2.5.3 Properties of limits
  2.6 Euler's number and natural logarithm

3 Topology of R^K
  3.1 Open and closed sets
    3.1.1 Open sets
    3.1.2 Closed sets
  3.2 Bounded sets
  3.3 Compact sets
  3.4 Infimum and Supremum

4 Continuity
  4.1 Continuity of functions
  4.2 Properties and the intermediate value theorem
  4.3 Left/Right-continuity

5 Differentiability
  5.1 Functions on R
    5.1.1 Differentiability
    5.1.2 Continuity and differentiability
    5.1.3 Computing derivatives
    5.1.4 Higher order derivatives
    5.1.5 Derivatives and limits
  5.2 Functions on R^K
    5.2.1 Partial differentiability
    5.2.2 Differentiability

6 Composite functions
  6.1 Continuity of composite functions
  6.2 Differentiability: the chain rule

7 Taylor approximations
  7.1 Approximations by polynomials
  7.2 Taylor approximations
  7.3 The remainder
    7.3.1 The remainder
    7.3.2 Mean value and Taylor's theorems
  7.4 Local accuracy of Taylor approximations

8 Linear algebra
  8.1 Matrices
  8.2 Linear functions
  8.3 Determinants
  8.4 Linear independence, dimension and rank
  8.5 Inverse matrix
  8.6 Eigenvalues and Eigenvectors
  8.7 Quadratic forms

9 Concavity and convexity
  9.1 Convex sets
  9.2 Concave and convex functions
  9.3 Concavity and second order derivatives
  9.4 Quasiconcave and strongly concave functions
  9.5 Composition of functions and concavity
  9.6 Appendix

10 Unconstrained maximization
  10.1 Maximizers
  10.2 Existence
  10.3 Characterizing maximizers
    10.3.1 Problems in R
    10.3.2 Higher-dimensional problems
  10.4 Maxima and concavity

11 Constrained Maximization
  11.1 Equality constraints
  11.2 Inequality constraints
  11.3 Parametric programming
    11.3.1 Continuity
    11.3.2 Differentiability

12 Riemann integration
  12.1 The Riemann integral
  12.2 Properties of the Riemann integral
  12.3 Fundamental Theorems of Calculus
  12.4 Antiderivatives (indefinite integrals)
  12.5 Integration by parts
  12.6 Improper integrals
  12.7 Integration in higher-dimensional spaces

13 Probability
  13.1 Measure Theory
    13.1.1 Algebras and σ-algebras:
    13.1.2 Measure
    13.1.3 Example: Lebesgue measure
  13.2 Probability
  13.3 Conditional probability
  13.4 Independence
  13.5 Random variables
  13.6 Moments
  13.7 Independence of random variables
  13.8 Convergence of random variables
  13.9 The (weak) law of large numbers
  13.10 Central limit theorem

Preface

These lecture notes have been written as reference material for a refresher course in mathematics, for students entering a Ph.D. program in economics in which further, more advanced, math courses are offered. What material should be covered, and at what level, were matters of choice. The concern here was that there usually is a gap between the math that an undergraduate student in economics is taught and the complexity of the mathematical analysis that faces the first-year Ph.D. student. These notes present topics with which the typical undergraduate is familiar, only at a slightly higher level. The objective is to remind the students of mathematical concepts and results that they already know, with enough formality so as to introduce them to the type of reasoning that will be used in advanced courses. It is both a refresher and a warm-up.

The notes were used for a 16-lecture course offered to classes entering the Ph.D. in economics at Brown University, between 2000 and 2002. Brown offers two courses in mathematics for its first-year students, and the course was integrated into that sequence. In a sense, the 16 lectures simply meant that the math sequence began before the core courses. It seems fair to say that for a program in which no further math instruction is offered, these notes leave out some important material. At Brown, the reactions to the course were mixed: some students considered that the course was way too easy, while others found it challenging. This was not unexpected, and I took it as good news. Brown students were kind enough to pick up on many typos and mistakes. Many thanks to them!

The chapter on measure and probability was written in cooperation with Alvaro Riascos, and has been tested with students at Universidad del Rosario and Universidad Javeriana; thanks a lot to them too.

There is no claim of originality about the material presented. I borrowed heavily from classical books, such as [7], [6] and [1]. A book which contains the same material and more, and is written for economists, is [8]. More advanced, but with the same goal in mind, is [5]. I borrowed from these books, and from [4], [3] and [2], as well. Also, the notes refer students to [?] for some exercises.

Chapter 1

Preliminary concepts

1.1 Some useful notation

Besides the standard notation in mathematics, the following will be used throughout the lecture notes.

The symbol "∃" means "there exists"; the symbol "∀" means "for all". Both of the symbols ":" and "|" will be used to mean "such that" (the former will be used accompanying mathematical or logical statements, while the latter will be used when denoting conditions in the definition of a set). The symbol "∧" will mean "and", whereas "∨" will mean "or".

Example 1.1 The statement ∀x ∈ X, ∃y ∈ Y : y < x means that for any element x of the set X, there exists some y in the set Y such that y < x.

Example 1.2 We can define the set X as the subset of elements of the set Z that are greater than some given y, by saying:

X = {x ∈ Z | x > y}

The symbol "=⇒" means "then". For example, A =⇒ B is shorthand for saying that statement A implies statement B. In such a case, A is said to be a sufficient condition for B (and B a necessary condition for A).

The symbol "⇐⇒" means "if and only if" (which we will write iff). For instance, A ⇐⇒ B means that A occurs iff B occurs (in which case we say that the statements are equivalent).

The symbol "¬" will be used to negate a statement. In some cases, we will use parentheses to clarify the notation.


Example 1.3 (A =⇒ B) ⇐⇒ (¬B =⇒ ¬A).

Example 1.3 is the principle of "contrapositive arguments", which is very useful for mathematical proofs. A closely related method of proof is the one of "arguments by contradiction", in which, in order to show that A =⇒ B, one shows that (A ∧ ¬B) is impossible. These are alternatives to the method of "direct proof", in which, in order to prove A =⇒ B, one finds a collection of n < ∞ statements (whether definitions, axioms or already proven theorems) of the form A_{m−1} =⇒ A_m, for each m ∈ {1, ..., n}, such that A_0 = A and A_n = B. Then, one has the following reasoning:

A = A_0 =⇒ A_1 =⇒ A_2 =⇒ ... =⇒ A_{n−1} =⇒ A_n = B

Finally, we may use the symbol "∴" to mean "therefore", while "∎" will denote the end of a proof.

1.2 Set theoretical concepts

1.2.1 Definitions:

By set, we mean a collection of objects. These objects are called elements. Of course, this is not a bona fide definition, since the concept "collection" has not been defined either. Rather than trying to define a collection, we will take the concept of set as a "primitive" of our theory.

The idea that is important, though, is that a set is completely defined by its elements (no matter what means one uses to describe them).

Definition 1.1 We define the empty set, ∅, as the set which has no elements.

Notice that we say "the empty set", rather than "an empty set". The reason is that, since a set is completely defined by its elements, there exists only one empty set (no matter how one ends up finding it!).

Definition 1.2 We say that X ⊆ Y whenever x ∈ X =⇒ x ∈ Y. When we have that X ⊆ Y and Y ⊆ X, we say that X = Y.

Theorem 1.1 For every set X, ∅ ⊆ X and X ⊆ X.

Proof. Left as an (easy) exercise.

The symbols "N" and "R" will respectively be used to denote the sets of natural and real numbers.


1.2.2 Operations:

For the rest of this chapter, let us fix some set X. Relative to this "universe," we can define the elementary set operations.

Definition 1.3 ∀A, B ⊆ X, we define:

A ∩ B = {x ∈ X | x ∈ A ∧ x ∈ B}
A ∪ B = {x ∈ X | x ∈ A ∨ x ∈ B}
A\B = {x ∈ X | x ∈ A ∧ x ∉ B}
A^c = X\A

Notice that if A, B ⊆ Y, then the first three operations don't change when defined relative to Y, but the fourth one does.
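To see these operations in action, here is a small illustration using Python's built-in set type (the universe X and the sets A and B are arbitrary choices made for this example; the snippet is only a numerical companion to Definition 1.3):

# Illustration of Definition 1.3 with Python sets.
# The universe X and the sets A, B below are arbitrary choices for the example.
X = set(range(10))          # X = {0, 1, ..., 9}
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

intersection = A & B        # A ∩ B
union        = A | B        # A ∪ B
difference   = A - B        # A \ B
complement   = X - A        # A^c, taken relative to the universe X

print(intersection)   # {3, 4}
print(union)          # {1, 2, 3, 4, 5, 6}
print(difference)     # {1, 2}
print(complement)     # {0, 5, 6, 7, 8, 9}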

Theorem 1.2 ∀A, B ⊆ X, we have:

A ⊆ B ⇐⇒ B^c ⊆ A^c
(A^c)^c = A
∅^c = X
X^c = ∅
A ∪ A^c = X
A ∩ A^c = ∅
A\B = A ∩ B^c
A ∩ B = ∅ ⇐⇒ A ⊆ B^c
A ∩ B ⊆ A
A ∩ B = A ⇐⇒ A ⊆ B
A ⊆ A ∪ B
A ∪ B = A ⇐⇒ B ⊆ A

Proof. Left as an exercise. For illustration purposes, we prove the first statement.

We first prove the "if" part: suppose that B^c ⊆ A^c. Then, x ∈ B^c =⇒ x ∈ A^c, which means that, by definition, (x ∈ X ∧ x ∉ B) =⇒ (x ∈ X ∧ x ∉ A), and hence, by example 1.3, ¬(x ∈ X ∧ x ∉ A) =⇒ ¬(x ∈ X ∧ x ∉ B). Then,

(¬(x ∈ X) ∨ ¬(x ∉ A)) =⇒ (¬(x ∈ X) ∨ ¬(x ∉ B))

and, therefore,

¬(x ∉ A) =⇒ (¬(x ∈ X) ∨ ¬(x ∉ B))

meaning that

x ∈ A =⇒ (¬(x ∈ X) ∨ x ∈ B)

but then, since A ⊆ X,

x ∈ A =⇒ x ∈ B

We now prove the "only if" part: suppose that A ⊆ B. If B^c = ∅, we are done, by theorem 1.1. Else, suppose that x ∈ B^c. Then, x ∈ X and x ∉ B. Since A ⊆ B, we must have that x ∉ A. Then, we have x ∈ X and x ∉ A, which means that x ∈ A^c. Since this holds true ∀x ∈ B^c, we conclude that B^c ⊆ A^c.

Notice that the argument used to prove sufficiency (if) uses the contrapositive principle, whereas the necessity (only if) part argues by contradiction (where?). As part of the exercise, you may want to try the sufficiency part using a contradiction argument and the necessity part via the contrapositive principle.

Theorem 1.3 ∀A, B, C ⊆ X, we have:

A ∪ (B ∪ C) = (A ∪ B) ∪ C = (A ∪ C) ∪ B
A ∩ (B ∩ C) = (A ∩ B) ∩ C = (A ∩ C) ∩ B = A ∩ B ∩ C
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
(A ∩ B) ∪ (A\B) = A

Proof. Left as an exercise. For illustration, we prove the last statement.

By definition, we have to show that (A ∩ B) ∪ (A\B) ⊆ A and A ⊆ (A ∩ B) ∪ (A\B).

For the first part, notice that if (A ∩ B) ∪ (A\B) = ∅, we are done by theorem 1.1. Otherwise, suppose that x ∈ (A ∩ B) ∪ (A\B). By definition, then, either x ∈ (A ∩ B), or x ∈ (A\B). If the former is true, x ∈ A follows from theorem 1.2 (A ∩ B ⊆ A). If the latter is true, x ∈ A follows by definition.

For the second part, again notice that if A = ∅ the result is straightforward. Otherwise, consider x ∈ A. Since A ⊆ X, then x ∈ X. Obviously, either x ∈ B or x ∉ B. Since B ⊆ X, either x ∈ B or x ∈ B^c. In the first case, x ∈ A ∩ B. In the second, x ∈ A ∩ B^c = A\B, by theorem 1.2.

Given the first result of the previous theorem, if we just define

A ∪ B ∪ C = {x ∈ X | x ∈ A ∨ x ∈ B ∨ x ∈ C}

we get that A ∪ (B ∪ C) = (A ∪ B) ∪ C = (A ∪ C) ∪ B = A ∪ B ∪ C. The same can be done for intersection and for collections of more than three sets.

Theorem 1.4 (De Morgan's laws) ∀A, B ⊆ X, we have:

(A ∩ B)^c = A^c ∪ B^c
(A ∪ B)^c = A^c ∩ B^c

Proof. For the first law:

If (A ∩ B)^c = ∅, we have that (A ∩ B)^c ⊆ A^c ∪ B^c. If (A ∩ B)^c ≠ ∅, suppose that x ∈ (A ∩ B)^c. Then x ∈ X and x ∉ (A ∩ B). The latter implies that ¬(x ∈ A ∧ x ∈ B), which is the same as saying (x ∉ A ∨ x ∉ B). Since x ∈ X, we have that (x ∈ A^c ∨ x ∈ B^c), so that x ∈ A^c ∪ B^c and (A ∩ B)^c ⊆ A^c ∪ B^c.

Now, if A^c ∪ B^c = ∅, we have that A^c ∪ B^c ⊆ (A ∩ B)^c. If A^c ∪ B^c ≠ ∅, suppose that x ∈ A^c ∪ B^c. Then, (x ∈ A^c ∨ x ∈ B^c), which means that

((x ∈ X ∧ x ∉ A) ∨ (x ∈ X ∧ x ∉ B))

Therefore, we have that x ∈ X and ¬(x ∈ A ∧ x ∈ B), so that x ∈ X and ¬(x ∈ A ∩ B), or that x ∈ (A ∩ B)^c. Then, A^c ∪ B^c ⊆ (A ∩ B)^c.

The proof of the second law is left as an exercise.

All these results exist in far more general versions. In many cases, however, the ideas behind their proofs are the same as here.

Exercise 1.1 (Generalized De Morgan's laws) A general version of De Morgan's laws will later prove to be useful. Formulate and prove a law that applies to more general collections of sets (and not just to two-set collections).
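A quick numerical sanity check of the two laws, and of the finite-collection version that Exercise 1.1 asks for, can be run in Python (the universe and the randomly drawn sets are arbitrary; checking examples is, of course, not a proof):

import random

random.seed(0)
X = set(range(20))                                           # a small, arbitrary universe
sets = [set(random.sample(sorted(X), 8)) for _ in range(4)]  # an arbitrary finite collection

def complement(A, X=X):
    return X - A

# Two-set De Morgan laws (Theorem 1.4)
A, B = sets[0], sets[1]
assert complement(A & B) == complement(A) | complement(B)
assert complement(A | B) == complement(A) & complement(B)

# Finite-collection analogue (the direction Exercise 1.1 points toward):
# the complement of an intersection is the union of the complements, and vice versa.
inter_all = set.intersection(*sets)
union_all = set.union(*sets)
assert complement(inter_all) == set.union(*[complement(S) for S in sets])
assert complement(union_all) == set.intersection(*[complement(S) for S in sets])
print("De Morgan's laws hold on this example.")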


1.3 The principle of mathematical induction

We already know some standard techniques to prove statements of the form A =⇒ B. In general, in order to prove that in some specific mathematical context statement B is true, we can use these techniques, using as "A" the whole mathematical structure that the context has. For a particular type of problem, though, there exists a technique that usually proves to be very effective. Consider first the following axiom:

Axiom 1.1 (The Principle of Mathematical Induction)

(1 ∈ Ω ∧ (∀n ∈ N, n ∈ Ω =⇒ n + 1 ∈ Ω)) =⇒ N ⊆ Ω

Now, suppose that we want to show that ∀n ∈ N, the statement B(n) is true. Then, it follows from axiom 1.1 that all we need to show is that

B(1) is true

and that ∀n ∈ N,

B(n) is true =⇒ B(n + 1) is true
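To see the principle in action, here is a standard textbook illustration (added for this refresher; it is not one of the numbered examples in the notes): let B(n) be the statement that 1 + 2 + ... + n = n(n + 1)/2. First, B(1) is true, since 1 = 1·2/2. Second, fix n ∈ N and suppose that B(n) is true. Then

1 + 2 + ... + n + (n + 1) = n(n + 1)/2 + (n + 1) = (n + 1)(n + 2)/2

so B(n + 1) is true. By axiom 1.1, B(n) is true ∀n ∈ N.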


Chapter 2

Introduction to real analysis

2.1 Natural and Real numbers

In a Real Analysis course, one should probably be concerned with the construction of the set of real numbers, R. One way to do this would be to take as a primitive the set N of natural numbers, then construct the set of rational numbers, and finally fill in the holes that are left by the latter (i.e., the irrational numbers). Once R is constructed, it is shown that it can be completely characterized by three groups of axioms. Although we will take as given the existence and properties of R, we now recall the first two groups of axioms exhibited by R (you'll see how trivial they look; that's why we need not spend much time on them, and take them as given).

Axiom 2.1 (Field Axioms) ∀x, y, z ∈ R, we have:

x + y = y + x
(x + y) + z = x + (y + z)
∃0 ∈ R : ∀x ∈ R, x + 0 = x
∀x ∈ R, ∃w ∈ R : x + w = 0
xy = yx
(xy)z = x(yz)
∃1 ∈ R : ∀x ∈ R, 1x = x
∀x ∈ R, x ≠ 0, ∃w ∈ R : xw = 1
x(y + z) = xy + xz


We will denote by R_+ the set of nonnegative real numbers and by R_{++} the set of positive real numbers. Accordingly, we denote R_− = R\R_{++} and R_{−−} = R\R_+.

Axiom 2.2 (Order Axioms) ∀x, y ∈ R_{++} and ∀z ∈ R, we have:

x + y ∈ R_{++}
xy ∈ R_{++}
−x ∉ R_{++}
z ∈ R_{++}, or −z ∈ R_{++}, or z = 0

In chapter 3, we will introduce the third axiom (axiom 3.1). This one is satisfied by R, but not by the rationals.

The usual way to measure how far away from 0 a real number is is the absolute value. This is defined as follows:

|x| = x if x ≥ 0, and |x| = −x if x < 0

Technically speaking, the absolute value is a "norm," and, when used, it defines R as a normed vector space.

Four properties of the absolute value are straightforward: (1) ∀x ∈ R, |x| ≥ 0; (2) ∀x ∈ R, |x| ≥ x; (3) if x ∈ R and y ∈ R_+ are such that −y ≤ x ≤ y, then |x| ≤ |y|; and (4) ∀x ∈ R, |x| = |−x|.

Also, the absolute value satisfies a crucial property:

Lemma 2.1 (Triangle Inequality in R) ∀x, y ∈ R, |x + y| ≤ |x| + |y|.

Proof. If x + y ≥ 0, we have |x + y| = x + y ≤ |x| + |y|, by definition and property (2). Alternatively, if x + y < 0, we have |x + y| = −(x + y) = (−x) + (−y) ≤ |−x| + |−y| = |x| + |y|, by definition and properties (2) and (4).

Exercise 2.1 Prove that if x ∈ R_{++} and y ∈ R is such that |y − x| < x, then y ∈ R_{++}. Also prove that if z ∈ R_{−−} and y ∈ R is such that |y − z| < −z, then y ∈ R_{−−}.


For K ∈ N, the K-dimensional real (Euclidean) space is the K-fold Cartesian product of R. We denote this space by R^K, so x ∈ R^K is (x_1, x_2, ..., x_K). Similar notation is used for orthants of R^K.

In order to measure how far from 0 (that is, from (0, 0, ..., 0)) an element x of R^K is, we use the Euclidean norm:¹

‖x‖ = (∑_{k=1}^K x_k^2)^{1/2}

It is obvious that when K = 1, then ‖·‖ = |·|. More importantly, it is also clear that ∀x ∈ R^K, ‖x‖ ≥ 0, ‖x‖ = 0 ⇐⇒ x = 0, −y ≤ x ≤ y =⇒ ‖x‖ ≤ ‖y‖, and ‖x‖ = ‖−x‖. The crucial property, finally, is that the triangle inequality also holds in R^K:

Lemma 2.2 (Triangle Inequality in R^K) ∀x, y ∈ R^K, ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Proof. A well-established result in mathematics, called the Cauchy-Schwarz inequality, states that ∀x, y ∈ R^K,

(∑_{k=1}^K x_k y_k)^2 ≤ (∑_{k=1}^K x_k^2)(∑_{k=1}^K y_k^2)

Take this for granted and let x, y ∈ R^K. Then,

‖x + y‖^2 = ∑_{k=1}^K (x_k + y_k)^2
          = ∑_{k=1}^K x_k^2 + 2 ∑_{k=1}^K x_k y_k + ∑_{k=1}^K y_k^2
          ≤ ∑_{k=1}^K x_k^2 + 2 (∑_{k=1}^K x_k^2)^{1/2} (∑_{k=1}^K y_k^2)^{1/2} + ∑_{k=1}^K y_k^2
          = ((∑_{k=1}^K x_k^2)^{1/2} + (∑_{k=1}^K y_k^2)^{1/2})^2
          = (‖x‖ + ‖y‖)^2

¹If you want to avoid confusion, you can be explicit about the dimension for which the norm is being used, and use the notation ‖·‖_K.
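As a numerical companion to the definition of the norm and to Lemma 2.2, the following Python sketch checks the Cauchy-Schwarz and triangle inequalities on two arbitrary vectors (a spot check, not a proof):

import numpy as np

# Two arbitrary vectors in R^3, chosen only for illustration.
x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])

def norm(v):
    # Euclidean norm: ||v|| = (sum_k v_k^2)^(1/2)
    return np.sqrt(np.sum(v ** 2))

print(norm(x), np.linalg.norm(x))                 # same value, computed two ways
# Cauchy-Schwarz: (sum x_k y_k)^2 <= ||x||^2 ||y||^2
assert np.dot(x, y) ** 2 <= norm(x) ** 2 * norm(y) ** 2
# Triangle inequality: ||x + y|| <= ||x|| + ||y||
assert norm(x + y) <= norm(x) + norm(y)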


2.2 Functions

Let X and Y be two nonempty sets.

Definition 2.1 A function f from a set X into a set Y, denoted f : X → Y, is a rule that assigns to each x ∈ X a unique f(x) ∈ Y.

Definition 2.2 If f : X → Y, X is said to be the domain of f, and Y its target set.

Definition 2.3 If f : X → Y, and A ⊆ X, we define the image of A under f, denoted f[A], to be

f[A] = {y ∈ Y | ∃x ∈ A : f(x) = y}

In particular, f[X] is called the range of f.

Obviously, if f : X → Y, and A ⊆ X, we have f[A] ⊆ Y.

Definition 2.4 The function f : X → Y is said to be onto (or surjective) if f[X] = Y.

Definition 2.5 The function f : X → Y is said to be one-to-one (or injective) if ∀x_1, x_2 ∈ X,

x_1 ≠ x_2 =⇒ f(x_1) ≠ f(x_2)

Definition 2.6 The function f : X → Y is said to be a one-to-one correspondence (or a bijective function) if it is both onto and one-to-one.

Definition 2.7 If f : X → Y, and B ⊆ Y, we define the inverse image of B under f, denoted f^{−1}[B], to be

f^{−1}[B] = {x ∈ X | f(x) ∈ B}

If f : X → Y is a one-to-one correspondence, the "inverse" function f^{−1} : Y → X is implicitly defined by {f^{−1}(y)} = f^{−1}[{y}]. (Would this be a bona fide definition, had we forgotten to say that f is a one-to-one correspondence? What could have gone wrong?)

Theorem 2.1 The function f : X → Y is onto iff ∀B ⊆ Y : B ≠ ∅, f^{−1}[B] ≠ ∅.

Proof. It is left as an exercise.

From now on, we only concentrate on definitions and concepts in Euclidean spaces and maintain the assumption that K ∈ N.
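Before moving on, here is a small Python illustration of images and inverse images (Definitions 2.3 and 2.7) for a function between finite sets; the sets and the rule f are made up for the example:

# A function f : X -> Y given as a lookup table; X, Y and f are arbitrary.
X = {1, 2, 3, 4}
Y = {'a', 'b', 'c'}
f = {1: 'a', 2: 'a', 3: 'b', 4: 'b'}

def image(A):
    # f[A] = {y in Y : there is x in A with f(x) = y}
    return {f[x] for x in A}

def inverse_image(B):
    # f^{-1}[B] = {x in X : f(x) in B}
    return {x for x in X if f[x] in B}

print(image(X))               # range of f: {'a', 'b'} -- not all of Y, so f is not onto
print(inverse_image({'a'}))   # {1, 2} -- two points map to 'a', so f is not one-to-one
print(inverse_image({'c'}))   # empty, consistent with Theorem 2.1 (f is not onto)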


2.3 Sequences

Definition 2.8 A finite sequence in R^K is a function f : X → R^K, where, for some n* ∈ N, we have X = {n ∈ N | n ≤ n*}.

Definition 2.9 An infinite sequence in R^K is a function f : N → R^K.

If no confusion is likely, the space in which a sequence lies is omitted.

In the case of finite sequences, it is usual to express them extensively as (a_1, a_2, ..., a_{n*}), where a_n = f(n), for n ∈ X. Shorthand for (a_1, a_2, ..., a_{n*}) is (a_n)_{n=1}^{n*}.²

Similarly, we can express infinite sequences as (a_1, a_2, ...) or (a_n)_{n=1}^∞, where a_n = f(n), for n ∈ N.

Example 2.1 Suppose that n* = 5, so that X = {1, 2, 3, 4, 5}, and f(n) = n^2 − 1, for n ∈ X. Then, we can express the finite sequence as (0, 3, 8, 15, 24) or (n^2 − 1)_{n=1}^5.

Example 2.2 Suppose that f(n) = (√n, 1/n, 3), for n ∈ N. Then, we can express the infinite sequence as ((1, 1, 3), (√2, 1/2, 3), (√3, 1/3, 3), ...) or ((√n, 1/n, 3))_{n=1}^∞.

It is very important to notice that a sequence has more structure than a set (i.e. it is more complicated). Remember that a set is completely defined by its elements, no matter how they are described. For example, the set {0, 3, 8, 15, 24} is the same as the set {24, 15, 8, 3, 0}. However, the sequences (0, 3, 8, 15, 24) and (24, 15, 8, 3, 0) are clearly different. In a sequence, the order matters!

Following also a common usage in the literature, when referring to an infinite sequence, we will simply say "a sequence."

Definition 2.10 A sequence (a_n)_{n=1}^∞ is said to be (monotonically) nondecreasing if ∀n ∈ N, a_{n+1} ≥ a_n, and nonincreasing if ∀n ∈ N, a_{n+1} ≤ a_n. If ∀n ∈ N, a_{n+1} > a_n, the sequence is said to be (monotonically) increasing, and if ∀n ∈ N, a_{n+1} < a_n, the sequence is said to be (monotonically) decreasing.

²We have already introduced finite sequences: an element of R^K is nothing but a sequence in R, with n* = K.


Notice that when we say x ≤ y with x, y ∈ R^K, we are expressing K inequalities: x_k ≤ y_k for every k ∈ {1, ..., K}. This implies that for every x, y ∈ R, either x ≤ y or x ≥ y, but the same is not true in higher-dimensional spaces. Hence, the previous concepts are more useful in R than in R^K for K ≥ 2.

Definition 2.11 A sequence (a_n)_{n=1}^∞ is said to be bounded above if there exists a ∈ R such that ∀n ∈ N, a_n ≤ a. It is said to be bounded below if there exists a ∈ R such that ∀n ∈ N, a_n ≥ a, and is said to be bounded if it is bounded above and below.

2.4 Cauchy sequences and subsequences

Definition 2.12 Given a sequence (a_n)_{n=1}^∞, a sequence (b_m)_{m=1}^∞ is said to be a subsequence of (a_n)_{n=1}^∞ if there exists a monotonically increasing sequence (n_m)_{m=1}^∞ in R, such that ∀m ∈ N, n_m ∈ N, and ∀m ∈ N,

b_m = a_{n_m}

A subsequence is a selection of some (possibly all) members of the original sequence that preserves the original order.

Example 2.3 Consider the sequence (1/√n)_{n=1}^∞; it is easy to see that (1/√(2n + 5))_{n=1}^∞ is a subsequence of the former. To see why, consider the sequence

(n_m)_{m=1}^∞ = (2m + 5)_{m=1}^∞

and notice that

(1/√(2n + 5))_{n=1}^∞ = (1/√(n_m))_{m=1}^∞

Exercise 2.2 Is (1/√n)_{n=1}^∞ a subsequence of (1/n)_{n=1}^∞? How about the other way around?

A type of sequence that is very useful is the following:

Definition 2.13 A sequence (a_n)_{n=1}^∞ is said to be a Cauchy sequence if ∀ε > 0, ∃n* ∈ N : ∀n_1, n_2 ≥ n*, ‖a_{n_1} − a_{n_2}‖ < ε.
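The following Python fragment illustrates Example 2.3 and the Cauchy behavior of (1/√n)_{n=1}^∞ on finitely many terms (an illustration of the definitions, not a proof):

import math

a = lambda n: 1 / math.sqrt(n)     # the sequence (1/sqrt(n))
n_m = lambda m: 2 * m + 5          # the index sequence from Example 2.3

# The subsequence b_m = a_{n_m} reproduces 1/sqrt(2m + 5):
for m in range(1, 4):
    assert a(n_m(m)) == 1 / math.sqrt(2 * m + 5)

# Cauchy-type behavior: terms with large indices are close to each other.
for n1, n2 in [(10_000, 20_000), (1_000_000, 4_000_000)]:
    print(abs(a(n1) - a(n2)))      # the gaps shrink as the indices grow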


2.5 Limits

2.5.1 Limits of sequences

Definition 2.14 a ∈ R^K is the limit of a sequence (a_n)_{n=1}^∞ in R^K if ∀ε > 0, ∃n* ∈ N : ∀n ≥ n*, ‖a_n − a‖ < ε.

Exercise 2.3 Does the sequence (1/√n)_{n=1}^∞ have a limit? Is it Cauchy? How about (3n/(n + √n))_{n=1}^∞?

Definition 2.15 A sequence (a_n)_{n=1}^∞ is said to be convergent when it has a limit a ∈ R^K.

When (a_n)_{n=1}^∞ converges to a, the following notation is also sometimes used: a_n → a, or lim_{n→∞} a_n = a.

Exercise 2.4 Does ((−1)^n)_{n=1}^∞ converge? Does (−1/n)_{n=1}^∞?

It is convenient to allow +∞ and −∞ to be limits of sequences. Thus, we extend the definition as follows:

Definition 2.16 For a sequence (a_n)_{n=1}^∞ in R, we say that lim_{n→∞} a_n = ∞ when ∀∆ > 0, ∃n* ∈ N : ∀n ≥ n*, a_n > ∆. We say that lim_{n→∞} a_n = −∞ when lim_{n→∞}(−a_n) = ∞.

Exercise 2.5 Does the sequence (3n/√n)_{n=1}^∞ have a limit? Does it converge?

The importance of the concepts introduced in the previous section is given by the following theorems:

Theorem 2.2 A sequence (a_n)_{n=1}^∞ converges to a ∈ R^K iff every subsequence of (a_n)_{n=1}^∞ converges to a.

Proof. It is left as an exercise.

Theorem 2.3 A sequence (a_n)_{n=1}^∞ converges iff it is a Cauchy sequence.

Proof. The "if" part is difficult and we will skip it. The "only if" part is left as an exercise.
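For intuition about Exercises 2.3 and 2.4, one can tabulate a few terms numerically: the sequence 3n/(n + √n) appears to settle down, while ((−1)^n) keeps oscillating. The snippet below is only an illustration; the formal arguments use the definitions above.

import math

a = lambda n: 3 * n / (n + math.sqrt(n))   # the last sequence in Exercise 2.3
b = lambda n: (-1) ** n                    # the first sequence in Exercise 2.4

for n in (10, 1_000, 100_000):
    print(n, a(n), b(n))
# a(n) gets ever closer to 3: for any eps > 0 there is n* with |a(n) - 3| < eps for n >= n*.
# b(n) keeps jumping between -1 and 1, so no single limit can work.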


Theorem 2.4 If a sequence (a_n)_{n=1}^∞ in R is convergent, then it is bounded.

Proof. It is left as an exercise.

Theorem 2.5 If a sequence (a_n)_{n=1}^∞ is monotone and bounded, then it is convergent.

Proof. It is left as an exercise. Moreover, can you give a stronger statement of the result?

Theorem 2.6 (Bolzano-Weierstrass) If a sequence (a_n)_{n=1}^∞ is bounded, then it has a convergent subsequence.

Proof. A formal proof is deferred to chapter 3. A slightly informal argument, for sequences in R, is as follows: if (a_n)_{n=1}^∞ is bounded, then it lies in some bounded interval I_1. Slice that interval in halves. At least one of the halves will contain infinitely many terms of the sequence. Call that interval I_2, slice it in halves, and let I_3 be a half that contains infinitely many elements... By doing this indefinitely, we construct intervals I_1, I_2, ... such that ∀n ∈ N, I_n contains infinitely many terms of the sequence and I_{n+1} ⊆ I_n. By construction, we can find a subsequence (a_{n_m})_{m=1}^∞ such that ∀m ∈ N, a_{n_m} ∈ I_m. This subsequence must be Cauchy (hence convergent) because, by construction, our "sequence" of intervals is in fact shrinking to zero diameter as m goes to ∞.

2.5.2 Limits of functions

Definition 2.17 Let x ∈ R^K and δ > 0. The open ball of radius δ around x, denoted B_δ(x), is defined as

B_δ(x) = {y ∈ R^K | ‖y − x‖ < δ}

The punctured open ball of radius δ around x, denoted B^0_δ(x), is defined as

B^0_δ(x) = B_δ(x) \ {x}

Definition 2.18 A point x ∈ R^K is said to be a limit point of X ⊆ R^K if ∀ε > 0, B^0_ε(x) ∩ X ≠ ∅.


Exercise 2.6 Prove the following: Let X ⊆ R. A point x ∈ R is a limit point of X iff there exists a sequence (x_n)_{n=1}^∞ defined in X\{x} that converges to x.

Another type of limit has to do with functions, although not directly with sequences.

Definition 2.19 Consider a function f : X → R, where X ⊆ R^K. Suppose that x̄ ∈ R^K is a limit point of X and that ȳ ∈ R. We say that

lim_{x→x̄} f(x) = ȳ

when ∀ε > 0, ∃δ > 0 such that ∀x ∈ B^0_δ(x̄) ∩ X, we have that |f(x) − ȳ| < ε.

It is important to notice that we do not require x̄ ∈ X in our previous definition, so that f(x̄) need not be defined. Also, one should notice that even if x̄ ∈ X, x̄ is not always a limit point of X, in which case the definition does not apply. Finally, notice that even if x̄ ∈ X and x̄ is a limit point of X, it need not always be the case that

lim_{x→x̄} f(x) = f(x̄)

Definition 2.20 Consider a function f : X → R, where X ⊆ R^K. Suppose that x̄ ∈ R^K is a limit point of X. We say that

lim_{x→x̄} f(x) = ∞

when ∀∆ > 0, ∃δ > 0 such that ∀x ∈ B^0_δ(x̄) ∩ X, we have that f(x) > ∆.

Definition 2.21 Consider a function f : X → R, where X ⊆ R. Suppose that x̄ ∈ R is a limit point of X. We say that

lim_{x→x̄} f(x) = −∞

when

lim_{x→x̄} −f(x) = ∞

Exercise 2.7 Suppose that X = R and ∀x ∈ R, f(x) = x + a, for some a ∈ R. What is lim_{x→0} f(x)?


Exercise 2.8 Suppose that X = R and f : X → R is defined by

f(x) = 1/x if x ≠ 0, and f(x) = 0 if x = 0

What is lim_{x→5} f(x)? What is lim_{x→0} f(x)?

Example 2.4 Let X = R\{0} and f : X → R is defined by

f(x) = 1 if x > 0, and f(x) = −1 if x < 0

In this case, we claim that lim_{x→0} f(x) does not exist. To see why, fix ε : 0 < ε < 1, and notice that ∀δ > 0, ∃x_1, x_2 ∈ B_δ(0) : f(x_1) = 1 and f(x_2) = −1, so that |f(x_1) − f(x_2)| = 2 > 2ε. Hence, because of the triangle inequality, it is impossible that for some y ∈ R, we have |f(x_1) − y| < ε and |f(x_2) − y| < ε. Also, it is obvious that lim_{x→0} f(x) = ∞ and lim_{x→0} f(x) = −∞ are both impossible.

There exists a tight relationship between limits of functions and limits of sequences, which is explored in the following theorem.

Theorem 2.7 Consider a function f : X → R, where X ⊆ R^K. Suppose that x̄ ∈ R^K is a limit point of X and that ȳ ∈ R. Then, lim_{x→x̄} f(x) = ȳ iff for every sequence (x_n)_{n=1}^∞ such that ∀n ∈ N, x_n ∈ X\{x̄} and that

lim_{n→∞} x_n = x̄

we have that

lim_{n→∞} f(x_n) = ȳ

Proof. (If:) We argue by contradiction. Suppose that for every sequence (x_n)_{n=1}^∞ such that ∀n ∈ N, x_n ∈ X\{x̄} and that lim_{n→∞} x_n = x̄, we have that lim_{n→∞} f(x_n) = ȳ, but it is not true that lim_{x→x̄} f(x) = ȳ. Then, ∃ε > 0 : ∀δ > 0, ∃x ∈ B^0_δ(x̄) ∩ X : |f(x) − ȳ| ≥ ε. Fix one such ε. By assumption, ∀n ∈ N, ∃x*_n ∈ B^0_{1/n}(x̄) ∩ X : |f(x*_n) − ȳ| ≥ ε. Construct the sequence (x*_n)_{n=1}^∞. By construction, ∀n ∈ N, x*_n ∈ X\{x̄} and, since 1/n → 0, we have that lim_{n→∞} x*_n = x̄. However, by construction, ¬(lim_{n→∞} f(x*_n) = ȳ), which contradicts the initial hypothesis.

(Only if:) Consider any sequence (x_n)_{n=1}^∞ such that ∀n ∈ N, x_n ∈ X\{x̄} and that lim_{n→∞} x_n = x̄. Fix ε > 0. Since lim_{x→x̄} f(x) = ȳ ∈ R, then ∃δ > 0 : ∀x ∈ B^0_δ(x̄) ∩ X, |f(x) − ȳ| < ε. Since lim_{n→∞} x_n = x̄, ∃n* ∈ N : ∀n ≥ n*, x_n ∈ B_δ(x̄). Moreover, since ∀n ∈ N, x_n ∈ X\{x̄}, we have that ∀n ≥ n*, x_n ∈ B^0_δ(x̄) ∩ X and, therefore, ∀n ≥ n*, |f(x_n) − ȳ| < ε. Since ε > 0 was arbitrarily chosen, this implies that lim_{n→∞} f(x_n) = ȳ.

2.5.3 Properties of limits

Theorem 2.8 Let f : X → R and g : X → R. Let x̄ be a limit point of X. Suppose that ∃y_1, y_2 ∈ R :

lim_{x→x̄} f(x) = y_1 and lim_{x→x̄} g(x) = y_2

Then,³

lim_{x→x̄} (f + g)(x) = y_1 + y_2
lim_{x→x̄} (αf)(x) = αy_1, ∀α ∈ R
lim_{x→x̄} (f.g)(x) = y_1.y_2

Moreover, if y_2 ≠ 0,

lim_{x→x̄} (f/g)(x) = y_1/y_2

Proof. We prove only the first two parts of the theorem.

For the first part, we have that ∀ε > 0, ∃δ_1, δ_2 > 0 : ∀x ∈ B^0_{δ_1}(x̄) ∩ X, |f(x) − y_1| < ε/2 and ∀x ∈ B^0_{δ_2}(x̄) ∩ X, |g(x) − y_2| < ε/2. Let δ = min{δ_1, δ_2} > 0. Then, by construction, ∀x ∈ B^0_δ(x̄) ∩ X, |f(x) − y_1| < ε/2 and |g(x) − y_2| < ε/2. This implies that ∀x ∈ B^0_δ(x̄) ∩ X, |f(x) − y_1| + |g(x) − y_2| < ε. Now, by the triangle inequality,

|(f + g)(x) − (y_1 + y_2)| ≤ |f(x) − y_1| + |g(x) − y_2|

so that ∀x ∈ B^0_δ(x̄) ∩ X, |(f + g)(x) − (y_1 + y_2)| < ε. Since y_1 + y_2 ∈ R, the latter implies that

lim_{x→x̄} (f + g)(x) = y_1 + y_2

For the second part, notice first that if α = 0 the proof is trivial. Then, consider α ≠ 0. Since lim_{x→x̄} f(x) = y_1 ∈ R, ∀ε > 0, ∃δ > 0 : ∀x ∈ B^0_δ(x̄) ∩ X, |f(x) − y_1| < ε/|α|. This implies that ∀x ∈ B^0_δ(x̄) ∩ X,

|(αf)(x) − αy_1| = |α(f(x) − y_1)| = |α| |f(x) − y_1| < ε

and, therefore, that

lim_{x→x̄} (αf)(x) = αy_1, ∀α ∈ R

³The following notation is introduced. We define (f + g) : X → R by saying, ∀x ∈ X, (f + g)(x) = f(x) + g(x). We define (f.g) and (αf), for α ∈ R, accordingly. Now, define X*_g = {x ∈ X | g(x) ≠ 0}. Then, we define (f/g) : X*_g → R by saying, ∀x ∈ X*_g, (f/g)(x) = f(x)/g(x).

Given the relationship found in theorem 2.7, it comes as no surprise that a theorem analogous to the previous one holds for sequences:

Theorem 2.9 Let (a_n)_{n=1}^∞ and (b_n)_{n=1}^∞ be two sequences. Suppose that ∃a, b ∈ R^K :

lim_{n→∞} a_n = a and lim_{n→∞} b_n = b

Then,

lim_{n→∞} (a_n + b_n) = a + b
lim_{n→∞} αa_n = αa, ∀α ∈ R
lim_{n→∞} (a_n.b_n) = a.b

Moreover, if K = 1, ∀n ∈ N, b_n ≠ 0, and b ≠ 0,

lim_{n→∞} (a_n/b_n) = a/b

Proof. The proof of the first two parts is left as an exercise.

The following theorem is also very useful:

Theorem 2.10 For every sequence (a_n)_{n=1}^∞ in R such that ∀n ∈ N, a_n > 0,

lim_{n→∞} a_n = ∞ ⇐⇒ lim_{n→∞} 1/a_n = 0


Proof. Consider first the if part: suppose that lim_{n→∞} 1/a_n = 0. Fix ∆ > 0. Then, ∃n* ∈ N : ∀n ≥ n*, |1/a_n − 0| < 1/∆. Then, since ∀n ∈ N, a_n > 0, it follows that ∀n ≥ n*, a_n > ∆. Since ∆ > 0 was arbitrary, we have shown that ∀∆ > 0, ∃n* ∈ N : ∀n ≥ n*, a_n > ∆.

The only if part is left as an exercise.

Exercise 2.9 Repeat the last part of exercise 2.3, using the previous theorem. Is it easier? Show that

lim_{n→∞} (15n^5 + 73n^4 − 118n^2 − 98)/(30n^5 + 19n^3) = 1/2
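A numerical spot check of the limit claimed in Exercise 2.9 (the argument divides numerator and denominator by n^5 and uses theorems 2.9 and 2.10; the code merely tabulates the ratio):

def a(n):
    # The ratio from Exercise 2.9.
    return (15 * n**5 + 73 * n**4 - 118 * n**2 - 98) / (30 * n**5 + 19 * n**3)

for n in (10, 1_000, 1_000_000):
    print(n, a(n))        # the values approach 0.5 as n grows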

A very useful property of limits (for both sequences and functions) is that they preserve weak inequalities:

Theorem 2.11 Consider a sequence (a_n)_{n=1}^∞ in R such that ∀n ∈ N, a_n ≤ α, for some α ∈ R. If ∃a ∈ R :

lim_{n→∞} a_n = a

then a ≤ α.

Proof. Left as an exercise.

Corollary 2.1 Consider a sequence (a_n)_{n=1}^∞ in R such that ∀n ∈ N, a_n ≥ α, for some α ∈ R. If ∃a ∈ R :

lim_{n→∞} a_n = a

then a ≥ α.

Proof. Left as an exercise.

Exercise 2.10 Can we strengthen our results to something like: "Consider a sequence (a_n)_{n=1}^∞ in R such that ∀n ∈ N, a_n < α, for some α ∈ R. If ∃a ∈ R :

lim_{n→∞} a_n = a

then a < α."?


Theorem 2.12 Consider f : X → R such that ∀x ∈ X, f(x) ≤ γ, and let x̄ ∈ R^K be a limit point of X. If ∃y ∈ R :

lim_{x→x̄} f(x) = y

then y ≤ γ.

Proof. Left as an exercise.

Exercise 2.11 The previous theorem can be proved by two different arguments. Can you give them both? (Hint: one argument is by contradiction; the other one uses theorem 2.11 directly.)

Corollary 2.2 Consider f : X → R such that ∀x ∈ X, f(x) ≥ γ, and let x̄ ∈ R^K be a limit point of X. If ∃y ∈ R :

lim_{x→x̄} f(x) = y

then y ≥ γ.

Proof. Left as an exercise.

Corollary 2.3 Consider f : X → R and g : X → R such that ∀x ∈ X, f(x) ≥ g(x), and let x̄ ∈ R^K be a limit point of X. If ∃y_1, y_2 ∈ R :

lim_{x→x̄} f(x) = y_1
lim_{x→x̄} g(x) = y_2

then y_1 ≥ y_2.

Proof. Left as an exercise.

Obviously, a sequence (a_n)_{n=1}^∞ in R^K is nothing but an array of K sequences in R: (a_{k,n})_{n=1}^∞ for each k ∈ {1, ..., K}. So, it should come as no surprise that some relations exist between these objects:

Theorem 2.13 A sequence (a_n)_{n=1}^∞ in R^K is bounded if, and only if, ∀k ∈ {1, ..., K}, the sequence (a_{k,n})_{n=1}^∞ in R is bounded.

Proof. Left as an exercise.


Theorem 2.14 A sequence (a_n)_{n=1}^∞ in R^K converges to a if, and only if, ∀k ∈ {1, ..., K}, the sequence (a_{k,n})_{n=1}^∞ in R converges to a_k.

Proof. (If:) Fix ε > 0. For every k ∈ {1, ..., K}, there is n*_k ∈ N such that ∀n ≥ n*_k, |a_{k,n} − a_k| < ε/√K. Let n* = max{n*_1, ..., n*_K} ∈ N, and let n ≥ n*. By construction,

‖a_n − a‖ = (∑_{k=1}^K (a_{k,n} − a_k)^2)^{1/2} < (∑_{k=1}^K ε^2/K)^{1/2} = ε

(Only if:) Let k ∈ {1, ..., K} and ε > 0. There is n* ∈ N such that ∀n ≥ n*, ‖a_n − a‖ < ε, which suffices to imply that |a_{k,n} − a_k| < ε.

2.6 Euler’s number and natural logarithm

One of the most important numbers in mathematics is given by the following definition:

Definition 2.22 We define the number e as

e = lim_{n→∞} (1 + 1/n)^n

And it can be shown (later you will) that

e = ∑_{n=0}^∞ 1/n!

and, numerically, e = 2.71828182...

Definition 2.23 For x ∈ R_{++}, we define the natural logarithm of x, denoted ln(x), as the number y such that

e^y = x
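Both characterizations of e can be explored numerically. The sketch below compares the sequence (1 + 1/n)^n with partial sums of the series (illustration only):

import math

def e_limit(n):
    # (1 + 1/n)^n, the sequence in Definition 2.22
    return (1 + 1 / n) ** n

def e_series(terms):
    # partial sum of sum_{n=0}^infinity 1/n!
    return sum(1 / math.factorial(n) for n in range(terms))

for n in (10, 1_000, 100_000):
    print(n, e_limit(n))
print(e_series(15), math.e)   # the series converges much faster than the sequence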


Chapter 3

Topology of R^K

From now on, we deal only with subsets of R^K, for K ∈ N; that is, whenever we introduce sets X or Y, we assume that X, Y ⊆ R^K and use all the algebraic structure of R^K, and the structure induced in R^K by the Euclidean norm. Whenever we take complements, they are relative to R^K.

3.1 Open and closed sets

3.1.1 Open sets

Definition 3.1 A set X is said to be open if ∀x ∈ X, ∃ε > 0 : Bε (x) ⊆ X.

Example 3.1 (Open intervals are open sets in R) We define an open interval, denoted (a, b),¹ where a, b ∈ R and a < b, as {x ∈ R | a < x < b}. To see that these are open sets in R, take x ∈ (a, b), and define

ε = min{x − a, b − x}/2 > 0

By construction, B_ε(x) ⊆ (a, b). As a consequence, notice that open balls are open sets in R. The same is true in R^K, for K ≥ 2.

Theorem 3.1 The empty set is open.

¹Sometimes open intervals are denoted by ]a, b[ rather than (a, b), in order to distinguish them from two-element sequences. We will, however, follow the more standard notation.


Proof. A set X fails to be open if ∃x ∈ X : ∀ε > 0, B_ε(x) ∩ X^c ≠ ∅. Clearly, ∅ cannot exhibit such a property.

Theorem 3.2 The union of any collection of open sets is an open set. The intersection of a finite collection of open sets is an open set.

Proof. For the first part, suppose that Z is the union of a given collection of open sets (whether finite or infinite doesn't matter). If Z = ∅, we are done. Suppose now that x ∈ Z. By definition, then, there exists a member X of the collection of sets such that x ∈ X. By assumption, X is open, so that ∃ε > 0 : B_ε(x) ⊆ X. It follows then that B_ε(x) ⊆ Z.

For the second part, suppose that Z is the intersection of a finite collection of open sets, say X_1, X_2, ..., X_{n*}, for n* ∈ N. If Z = ∅, we are done. Suppose now that x ∈ Z. By definition, then, ∀n ∈ {1, 2, ..., n*}, it is true that x ∈ X_n. By assumption, each X_n is open, so that ∀n ∈ {1, 2, ..., n*}, ∃ε_n > 0 : B_{ε_n}(x) ⊆ X_n. Let ε = min{ε_1, ε_2, ..., ε_{n*}} > 0 (this is why we impose finiteness). By construction, ∀n ∈ {1, 2, ..., n*}, B_ε(x) ⊆ B_{ε_n}(x) ⊆ X_n and therefore B_ε(x) ⊆ Z.

Theorem 3.3 R^K is an open set.

Proof. It is left as an exercise.

It is easy to see that if we extend the definition of the open interval (a, b) to {x ∈ R | a < x < b}, where a, b ∈ R ∪ {+∞, −∞} and a < b, then it continues to be true that open intervals are open sets.

Definition 3.2 We say that x is an interior point of the set X if ∃ε > 0 : B_ε(x) ⊆ X. The set of all the interior points of X is called the interior of X, and is usually denoted int(X).

Notice that int(X) ⊆ X.

Exercise 3.1 Show that for every X, int(X) is open and that X is open iff int(X) = X.

Exercise 3.2 Prove the following: If x ∈ int(X), then x is a limit point of X.

Exercise 3.3 Did we really need finiteness in the second part of theorem 3.2? Consider the following infinite collection of open intervals: ∀n ∈ N, define I_n = (−1/n, 1/n). Find the intersection of all those intervals, denoted ⋂_{n=1}^∞ I_n. Is it an open set?


3.1.2 Closed sets

Definition 3.3 A set X is said to be closed if for every sequence (x_n)_{n=1}^∞ that satisfies that ∀n ∈ N, x_n ∈ X and converges to x, we have that x ∈ X.

Theorem 3.4 The empty set is closed.

Proof. In order to show that a set X fails to be closed, we have to argue that there exists (x_n)_{n=1}^∞ satisfying that ∀n ∈ N, x_n ∈ X, and that it converges to x, and such that x ∉ X. Clearly, we cannot find such a sequence if X = ∅.

Exercise 3.4 If a, b ∈ R ∪ {−∞, ∞}, a < b, is (a, b) closed? We define the half-closed interval (a, b], where a ∈ R ∪ {−∞}, b ∈ R, a < b, as (a, b] = {x ∈ R | a < x ≤ b}. Similarly, we define the half-closed interval [a, b), where a ∈ R, b ∈ R ∪ {∞}, a < b, as [a, b) = {x ∈ R | a ≤ x < b}. Are half-closed intervals closed sets? Are they open? If x ∈ R^K, is {x} an open set, a closed set or neither?

Theorem 3.5 A set X is closed iff X^c is open.

Proof. (If:) Suppose that X^c is open. If X is empty, we are done. Otherwise, consider any sequence (x_n)_{n=1}^∞ satisfying that ∀n ∈ N, x_n ∈ X, and that it converges to x. We need to show that x ∈ X. In order to argue by contradiction, suppose that x ∈ X^c. Since X^c is open, ∃ε > 0 : B_ε(x) ⊆ X^c. Since x_n → x, ∃n* ∈ N : ∀n ≥ n*, ‖x_n − x‖ < ε. Then ∀n ≥ n*, x_n ∈ B_ε(x) ⊆ X^c, which contradicts the fact that ∀n ∈ N, x_n ∈ X. Thus, we conclude that x ∈ X.

(Only if:) Suppose now that X is closed. If X^c = ∅, we are done. Otherwise, fix x ∈ X^c. We need to show that ∃ε > 0 : B_ε(x) ⊆ X^c. Suppose not: ∀ε > 0, B_ε(x) ∩ X ≠ ∅. Clearly ∀n ∈ N, ∃x_n ∈ B_{1/n}(x) ∩ X. Construct a sequence (x_n)_{n=1}^∞ of such elements. Since 1/n → 0, it follows that x_n → x. Since ∀n ∈ N, x_n ∈ X and X is closed, then x ∈ X, contradicting the fact that x ∈ X^c. Thus, we conclude, ∃ε > 0 : B_ε(x) ⊆ X^c.

Theorem 3.6 R^K is closed.

Proof. It is left as an exercise.

(Yes: the sets ∅ and R^K are both open and closed in R^K. In fact, in the context in which we are working, those are the only two sets for which this is true.)


Theorem 3.7 The intersection of any collection of closed sets is closed. The union of a finite collection of closed sets is closed.

Proof. It is left as an exercise. (Hint: Hopefully, you can use the generalized version of De Morgan's laws that you proposed and proved in exercise 1.1.)

Definition 3.4 Given a set X ⊆ R^K, we define its closure, denoted by X̄, as

X̄ = {x ∈ R^K | ∀ε > 0, B_ε(x) ∩ X ≠ ∅}

Exercise 3.5 Prove the following: given a set X ⊆ R^K, x ∈ X̄ iff there exists a sequence (x_n)_{n=1}^∞ in X such that x_n → x.

Exercise 3.6 Prove the following: for every set X ⊆ R^K, X ⊆ X̄, and X is closed iff X = X̄.

Example 3.2 Closed intervals are closed sets. We define a closed interval, denoted [a, b], where a, b ∈ R and a ≤ b, as {x ∈ R | a ≤ x ≤ b}. To see that these are closed sets, notice that [a, b]^c = (−∞, a) ∪ (b, ∞), and conclude based on previous results.

Exercise 3.7 Did we really need finiteness in the second part of theorem 3.7? Consider the following infinite collection of closed intervals: ∀n ∈ N, define J_n = [1 + 1/n, 3 − 1/n]. Find the union of all those intervals, denoted ⋃_{n=1}^∞ J_n. Is it a closed set?

3.2 Bounded sets

Definition 3.5 A set X ⊆ R^K is said to be bounded above if ∃α ∈ R^K : ∀x ∈ X, x ≤ α; it is said to be bounded below if ∃β ∈ R^K : ∀x ∈ X, x ≥ β, and is said to be bounded if it is bounded above and below.

Exercise 3.8 Show that a set X is bounded if and only if ∃α ∈ R_+ : ∀x ∈ X, ‖x‖ ≤ α.


3.3 Compact sets

Definition 3.6 A set X ⊆ R^K is said to be compact if it is closed and bounded.

The previous definition is fine for subsets of R^K, but this is not true for more general spaces. The general concept (in topology) is that a set is compact if, whenever you can "cover" it with a collection of open sets, then you can also do it with finitely many of those sets (people usually say: "if every open cover has a finite subcover"). Here, we use the simpler definition above.

Exercise 3.9 Prove the following statement: if (x_n)_{n=1}^∞ is a sequence defined on a compact set X, then it has a subsequence that converges to a point in X.

3.4 Infimum and Supremum

Definition 3.7 Let X ≠ ∅, X ⊆ R. A number α ∈ R is said to be an upper bound of X if ∀x ∈ X, x ≤ α. A number β ∈ R is said to be a lower bound of X if ∀x ∈ X, x ≥ β.

Definition 3.8 Let X ≠ ∅, X ⊆ R. A number α ∈ R is said to be the least upper bound of X, denoted α = sup X, if: (i) α is an upper bound of X; and (ii) if γ is an upper bound of X, then γ ≥ α.

Definition 3.9 Let X ≠ ∅, X ⊆ R. A number β ∈ R is said to be the greatest lower bound of X, denoted β = inf X, if: (i) β is a lower bound of X; and (ii) if γ is a lower bound of X, then γ ≤ β.

Theorem 3.8 Let X ≠ ∅, X ⊆ R. α = sup X if, and only if, ∀ε > 0 it is true that ∀x ∈ X, x < α + ε and ∃x ∈ X : α − ε < x.

Proof. It is left as an exercise.

Remember that in chapter 2 we announced one axiom of R that is not satisfied by the rationals? Here it goes:

Axiom 3.1 (Axiom of Completeness) Let X ⊆ R, X ≠ ∅. If X is bounded above, then it has a least upper bound.


The axiom of completeness gives us the tool for a formal proof of the Bolzano-Weierstrass theorem (2.6):

Proof of theorem 2.6. By theorems 2.13 and 2.14, it suffices that we consider just the case K = 1.

Since (a_n)_{n=1}^∞ is bounded, there exists a ∈ R such that ∀n ∈ N, −a < a_n < a. Define the set

X = {x ∈ R | a_n ≥ x for infinitely many terms of (a_n)_{n=1}^∞}

Since −a ∈ X and ∀x ∈ X, x ≤ a, it follows from axiom 3.1 that X has a least upper bound. Let α = sup X.

Fix ε > 0 and ñ ∈ N. By definition, α + ε/2 ∉ X, which means that ∃n* ∈ N : ∀n ≥ n*, a_n < α + ε. Now, if ∃n̂ ∈ N, ∀n ≥ n̂ : a_n < α − ε/2, then it follows that if x ∈ X, then x ≤ α − ε/2, which contradicts the fact that α = sup X. So, it must be that ∀n̂ ∈ N, ∃n ≥ n̂ : a_n ≥ α − ε/2. It follows that ∃n > ñ such that α − ε < a_n < α + ε.

Then, we can define (n_m)_{m=1}^∞ recursively as follows:

n_1 = min{n ∈ N | α − 1 < a_n < α + 1}
n_m = min{n ∈ N | n > n_{m−1} ∧ α − 1/m < a_n < α + 1/m}

It is straightforward that (a_{n_m})_{m=1}^∞ is a convergent subsequence of (a_n)_{n=1}^∞.


Chapter 4

Continuity

Throughout this chapter, we maintain the assumption that X ⊆ R^K, K ∈ N.

4.1 Continuity of functions

Definition 4.1 We say that the function f : X → R is continuous at x̄ ∈ X if ∀ε > 0, ∃δ > 0 : ∀x ∈ B_δ(x̄) ∩ X, we have that |f(x) − f(x̄)| < ε.

Notice first that continuity at x̄ ∈ X is a local concept. Second, x̄ in the definition may or may not be a limit point of X. Therefore, two points are worth noticing. If x̄ is not a limit point of X, then any f : X → R is continuous at x̄. (Why?) If, on the other hand, x̄ is a limit point of X, it is straightforward that f : X → R is continuous at x̄ iff

lim_{x→x̄} f(x) = f(x̄)

Intuitively, this occurs when a function is such that in order to get arbitrarily close to f(x̄) in the range, all we need to do is get close enough to x̄ in the domain. By theorem 2.7, it follows that when x̄ ∈ X is a limit point of X, f is continuous at x̄ iff whenever we take a sequence of points in the domain that converges to x̄, the sequence formed by their images converges to f(x̄) (that in this case the concept is not vacuous follows from exercise 2.6).

Definition 4.2 We say that the function f : X → R is continuous if ∀x ∈ X, it is continuous at x.


Exercise 4.1 Consider the function introduced in exercise 2.8. Is it continuous?

Exercise 4.2 Consider the function introduced in example 2.4. Is it continuous? What if we change the function slightly as follows: f : R → R, defined as

f(x) = 1 if x > 0, f(x) = 0 if x = 0, and f(x) = −1 if x < 0

Is it continuous?

4.2 Properties and the intermediate value theorem

The following properties of continuous functions are derived from the properties of limits.

Theorem 4.1 Suppose that f : X → R and g : X → R are continuous at x̄ ∈ X and α ∈ R. Then, the functions f + g, αf, and f.g are continuous at x̄ ∈ X. Moreover, if g(x̄) ≠ 0, then f/g is continuous at x̄ ∈ X.

Proof. This theorem follows from theorem 2.8. For example, if x̄ ∈ X is a limit point of X, then continuity of f and g at x̄ implies that lim_{x→x̄} f(x) = f(x̄) and lim_{x→x̄} g(x) = g(x̄). Since f(x̄) ∈ R and g(x̄) ∈ R, it follows from theorem 2.8 that

lim_{x→x̄} (f + g)(x) = f(x̄) + g(x̄) = (f + g)(x̄)

so that (f + g) is continuous at x̄. The proof of the rest of the theorem is similar.

Another result that is useful in economics is the following.

Theorem 4.2 f : R^K → R is continuous iff ∀U ⊆ R, U open, we have that f^{−1}[U] is open.


Proof. (If:) Consider x̄ ∈ R^K and ε > 0. By example 3.1, B_ε(f(x̄)) is open and, therefore, so is f^{−1}[B_ε(f(x̄))]. Since x̄ ∈ f^{−1}[B_ε(f(x̄))], we have that ∃δ > 0 : B_δ(x̄) ⊆ f^{−1}[B_ε(f(x̄))], or, what is the same, that ∀x ∈ B_δ(x̄), |f(x) − f(x̄)| < ε.

(Only if:) Let U ⊆ R be an open set. If f^{−1}[U] = ∅, we are done. Otherwise, let x̄ ∈ f^{−1}[U]. By definition, f(x̄) ∈ U, and since U is open, ∃ε > 0 : B_ε(f(x̄)) ⊆ U. Now, since f is continuous, lim_{x→x̄} f(x) = f(x̄), which implies that ∃δ > 0 : ∀x ∈ B_δ(x̄), |f(x) − f(x̄)| < ε. The latter implies that ∀x ∈ B_δ(x̄), f(x) ∈ B_ε(f(x̄)) ⊆ U, or, what is the same, that B_δ(x̄) ⊆ f^{−1}[U].

I stated the previous theorem in a weaker form than it needs to be. Actually, we don't really need the domain of f to be R^K. If the domain is X ⊆ R^K, the result continues to hold, but we need to qualify the definition of open set, to make it relative to the set X (which we won't in these notes).

The following result is very intuitive:

Theorem 4.3 (The Intermediate Value Theorem in R) If f : [a, b] → R is continuous, and f(b) > f(a), then ∀γ ∈ [f(a), f(b)], ∃x̄ ∈ [a, b] : f(x̄) = γ.

Proof. If γ = f(a) or γ = f(b), the result is obvious. Then, we assume that f(a) < γ < f(b). Denote S = f^{−1}[(−∞, γ)]. Since a ∈ S, it follows that S ≠ ∅. Since S ⊆ [a, b], it follows that S is bounded. Then, by the axiom of completeness (axiom 3.1), we have that x̄ = sup S exists. By definition, ∀n ∈ N, x̄ − 1/n is not an upper bound of S (see definition 3.8). Hence, ∀n ∈ N, ∃x_n ∈ S : x̄ − 1/n < x_n ≤ x̄. Construct such a sequence (x_n)_{n=1}^∞. By construction, ∀n ∈ N, f(x_n) < γ, whereas since 1/n → 0, we have that x_n → x̄ and, since f is continuous, we have that

lim_{n→∞} f(x_n) = f(x̄) ≤ γ

where the inequality follows from theorem 2.11.

Now, define ∀n ∈ N, x̃_n = min{b, x̄ + 1/n}. Consider n ∈ N. If x̃_n = b, then x̃_n ∉ S. Alternatively, x̃_n = x̄ + 1/n > x̄, from where if x̃_n ∈ S, we have that ∃x ∈ S : x > sup S, which is a contradiction. It must then be that ∀n ∈ N, x̃_n ∉ S, which implies that ∀n ∈ N, f(x̃_n) ≥ γ. Again, since x̃_n → x̄ (why?), we have that

lim_{n→∞} f(x̃_n) = f(x̄) ≥ γ

by continuity of f and corollary 2.1.

Combining the two inequalities, we obtain that f(x̄) = γ.

Corollary 4.1 If f : [a, b] → R is continuous, and f(a) > f(b), then ∀γ ∈ [f(b), f(a)], ∃x̄ ∈ [a, b] : f(x̄) = γ.

It should be clear that if we consider f defined on X ⊆ R^K, even with a, b ∈ X, the object [a, b] is not well defined. The line segment connecting a and b, however, is:

{x ∈ R^K | ∃θ ∈ [0, 1] : x = θa + (1 − θ)b}

The following result now generalizes the previous theorem:

Theorem 4.4 Let a, b ∈ X be such that {x ∈ R^K | ∃θ ∈ [0, 1] : x = θa + (1 − θ)b} ⊆ X. If f : X → R is continuous, then ∀γ ∈ [f(a), f(b)] ∪ [f(b), f(a)], ∃x̄ ∈ {x ∈ R^K | ∃θ ∈ [0, 1] : x = θa + (1 − θ)b} : f(x̄) = γ.

Proof. Fix γ ∈ [f(a), f(b)] ∪ [f(b), f(a)]. Define the function ϕ : [0, 1] → R by ϕ(θ) = f(θa + (1 − θ)b), which we can do because {x ∈ R^K | ∃θ ∈ [0, 1] : x = θa + (1 − θ)b} ⊆ X. By construction, ϕ(1) = f(a) and ϕ(0) = f(b). By a later result (theorem 6.1, in chapter 6, which does not require the theorem we are now showing), it follows that ϕ is continuous. By the results on R, it follows that ∃θ̄ ∈ [0, 1] such that ϕ(θ̄) = γ. Let x̄ = θ̄a + (1 − θ̄)b.

4.3 Left/Right-continuity

When we are dealing with functions defined on X ⊆ R, we can easily identify, for each x ∈ X, which part of the domain is above x and which one is below. This property allows us to study how the function behaves as we approach x from above (the right) or below (the left).

Definition 4.3 Consider a function f : X → R, where X ⊆ R. Suppose that x̄ is a limit point of X, and let ℓ ∈ R. We say that

lim_{x↘x̄} f(x) = ℓ

when ∀ε > 0, ∃δ > 0 : ∀x ∈ X ∩ B_δ(x̄) : x > x̄, we have that |f(x) − ℓ| < ε.

In such a case, we say that the function f converges to ℓ as x tends to x̄ from above. Similarly, we define:

Definition 4.4 Consider a function f : X → R, where X ⊆ R. Suppose that x̄ is a limit point of X, and let ℓ ∈ R. We say that

lim_{x↗x̄} f(x) = ℓ

when ∀ε > 0, ∃δ > 0 : ∀x ∈ X ∩ B_δ(x̄) : x < x̄, we have that |f(x) − ℓ| < ε.

In this case, we say that f converges to ℓ as x tends to x̄ from below.

Definition 4.5 We say that the function f : X → R is right-continuous at x̄ ∈ X, where x̄ is a limit point of X, if

lim_{x↘x̄} f(x) = f(x̄)

We say that f is right-continuous if ∀x ∈ X, it is right-continuous at x.

Definition 4.6 We say that the function f : X → R is left-continuous at x̄ ∈ X, where x̄ is a limit point of X, if

lim_{x↗x̄} f(x) = f(x̄)

We say that f is left-continuous if ∀x ∈ X, it is left-continuous at x.

Exercise 4.3 Consider the function introduced in exercise 4.2 (the new one). Is it right-continuous? Left-continuous? What if we have

f(x) = 1 if x ≥ 0, and f(x) = −1 if x < 0

Is it left- or right-continuous? How about

f(x) = 1 if x > 0, and f(x) = −1 if x ≤ 0?


Chapter 5

Differentiability

For simplicity, we first consider functions defined on R, and study higher-dimensional spaces later.

5.1 Functions on R

Throughout this section, we maintain the assumption that X ⊆ R is open. Suppose that we have f : X → R. Fix x̄ ∈ X. Define

H_x̄ = {h ∈ R\{0} | x̄ + h ∈ X}

Now, ∀h ∈ H_x̄, evaluate the expression

(f(x̄ + h) − f(x̄)) / h

Since x̄ is fixed, the expression depends on (is a function of) h only, on the nonempty (why?) domain H_x̄. Moreover, since 0 is a limit point of H_x̄, we can use definition 2.19 to study the following object:

lim_{h→0} (f(x̄ + h) − f(x̄)) / h

5.1.1 Differentiability

Definition 5.1 A function f : X → R is said to be differentiable at x̄ ∈ X if ∃ℓ ∈ R :

lim_{h→0} (f(x̄ + h) − f(x̄)) / h = ℓ


Notice that the definition does require the limit to be a real number. Besides, since we only define lim_{x→x̄} g(x) when x̄ is a limit point of the domain of the function g, our definition of differentiability implicitly requires x̄ to be a limit point of X\{x̄} and, therefore, of X. But it follows from exercises 3.1 and 3.2 that this is always the case, since X is open.¹

Definition 5.2 A function f : X → R is said to be differentiable if it is differentiable at all x ∈ X.

Definition 5.3 Suppose that a function f : X → R is differentiable at x̄ ∈ X. Then we define the derivative of f at x̄, denoted f′(x̄), to be

f′(x̄) = lim_{h→0} (f(x̄ + h) − f(x̄)) / h

Example 5.1 Let f : R → R be defined by f(x) = x². We want to know whether the function is differentiable. Fix x ∈ R. Now, ∀h ≠ 0,

(f(x + h) − f(x))/h = ((x + h)² − x²)/h = (x² + 2xh + h² − x²)/h = (2xh + h²)/h = 2x + h

Then, by exercise 2.7, we know that f is indeed differentiable and ∀x ∈ R, f′(x) = 2x.

Example 5.2 Let f : R → R be defined by f(x) = |x|. We want to know whether the function is differentiable at 0. Fix x = 0 and evaluate, ∀h ≠ 0,

(f(x + h) − f(x))/h = (|h| − 0)/h = 1 if h > 0, and −1 if h < 0

¹One can study differentiability in a slightly more general context by not restricting X and only defining the concept at limit points of the domain. In R, applying our definition to noninterior limit points would encompass the concepts of left- and right-differentiability, which we won't cover here.


By example 2.4, we know that in this case

lim_{h→0} (f(x + h) − f(x))/h

does not exist. Therefore, f is not differentiable at 0.
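The two examples are easy to check numerically: the difference quotient of x² at a point settles down as h shrinks, while that of |x| at 0 keeps jumping between +1 and −1. The sketch below is only an illustration (the sample point and step sizes are arbitrary choices, not part of the original notes).

```python
def diff_quotient(f, x, h):
    """Evaluate (f(x+h) - f(x)) / h for a given step h."""
    return (f(x + h) - f(x)) / h

for h in [0.1, -0.1, 0.001, -0.001]:
    print("x^2 at x=3:", diff_quotient(lambda x: x**2, 3.0, h),
          "   |x| at x=0:", diff_quotient(abs, 0.0, h))
# The first column approaches 6 = 2*3 as h -> 0; the second column is +1 or -1
# depending on the sign of h, so no single limit exists at 0.
```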

A useful characterization of differentiability is the following:

Theorem 5.1 A function f : X → R is differentiable at x̄ ∈ X, with derivative f′(x̄) = ℓ, iff ∃ε > 0 and ∃ϕ : B_ε(0) → R such that

lim_{h→0} ϕ(h)/h = 0

and

∀h ∈ B_ε(0), f(x̄ + h) = f(x̄) + ℓh + ϕ(h)

Proof. (If:) Suppose that ∃ε > 0 and ∃ϕ : B_ε(0) → R with the mentioned properties. Then,

∀h ∈ B⁰_ε(0), (f(x̄ + h) − f(x̄))/h − ℓ = ϕ(h)/h

from where

lim_{h→0} ϕ(h)/h = 0

implies the result.

(Only if:) Suppose now that f is differentiable at x̄. Since X is open, ∃ε > 0 : B_ε(x̄) ⊆ X. Fix one such ε, and define ϕ : B_ε(0) → R by

∀h ∈ B_ε(0), ϕ(h) = f(x̄ + h) − f(x̄) − ℓh

which is well defined since, ∀h ∈ B_ε(0), x̄ + h ∈ B_ε(x̄) ⊆ X. Then,

∀h ∈ B⁰_ε(0), ϕ(h)/h = (f(x̄ + h) − f(x̄))/h − ℓ

from where, by definition,

lim_{h→0} (f(x̄ + h) − f(x̄))/h = f′(x̄) = ℓ

implies the result.


5.1.2 Continuity and differentiability

For simplicity of notation, notice that if h ∈ H_x̄, then there exists x ∈ X : x = x̄ + h. Thus, it follows that

lim_{h→0} (f(x̄ + h) − f(x̄))/h = lim_{x→x̄} (f(x) − f(x̄))/(x − x̄)

A very strong relation is the following:

Theorem 5.2 If a function f : X → R is differentiable at x̄ ∈ X, then it is continuous at x̄ ∈ X.

Proof. If the function is differentiable at x̄ ∈ X, then ∃ℓ ∈ R :

lim_{x→x̄} (f(x) − f(x̄))/(x − x̄) = ℓ

Notice that ∀x ∈ X\{x̄},

f(x) = f(x̄) + ((f(x) − f(x̄))/(x − x̄))·(x − x̄)

Now, since lim_{x→x̄} f(x̄) = f(x̄) ∈ R, lim_{x→x̄} (f(x) − f(x̄))/(x − x̄) = ℓ ∈ R and lim_{x→x̄} (x − x̄) = 0 ∈ R, then by theorem 2.8 we have that

lim_{x→x̄} f(x) = lim_{x→x̄} ( f(x̄) + ((f(x) − f(x̄))/(x − x̄))·(x − x̄) ) = f(x̄) + ℓ·0 = f(x̄)

so that f is continuous at x̄.

5.1.3 Computing derivatives

There are some very well known rules to compute derivatives. The following theorem appears as theorem 2.8 in Simon and Blume (p. 28).


Theorem 5.3 Suppose that f, g : X → R are differentiable. Then ∀x ∈ X and ∀k ∈ R, we have that

(f + g)′(x) = f′(x) + g′(x)
(k·f)′(x) = k·f′(x)
(f·g)′(x) = g(x)f′(x) + f(x)g′(x)
(f/g)′(x) = (g(x)f′(x) − f(x)g′(x)) / (g(x))², provided g(x) ≠ 0
((f(x))^k)′ = k(f(x))^(k−1) f′(x)

Proof. The first two parts follow straightforwardly from the properties of limits and are left as an exercise. Once the first part has been proven, the fifth part can be proven for the case k ∈ N using the principle of mathematical induction (over k). The general case is more complicated and we will not attempt to prove it. We now prove the third and fourth parts.

For the third part, notice that ∀x ∈ X and ∀h ∈ R\{0} such that x + h ∈ X,

((f·g)(x + h) − (f·g)(x))/h = (f(x + h)g(x + h) − f(x)g(x))/h
= (f(x + h)g(x) − f(x)g(x))/h + (f(x + h)g(x + h) − f(x + h)g(x))/h
= g(x)·(f(x + h) − f(x))/h + f(x + h)·(g(x + h) − g(x))/h

So that, by theorems 5.1 and 2.8,

lim_{h→0} ((f·g)(x + h) − (f·g)(x))/h = g(x)f′(x) + f(x)g′(x)


For the fourth part, notice that ∀x ∈ X and ∀h ∈ R\{0} such that x + h ∈ X, g(x + h) ≠ 0 and g(x) ≠ 0,

((f/g)(x + h) − (f/g)(x))/h = (f(x + h)/g(x + h) − f(x)/g(x))/h
= (f(x + h)g(x) − f(x)g(x) + f(x)g(x) − f(x)g(x + h)) / (h·g(x + h)g(x))
= ((f(x + h) − f(x))g(x)) / (h·g(x + h)g(x)) − ((g(x + h) − g(x))f(x)) / (h·g(x + h)g(x))

and, again, by theorems 5.1 and 2.8, we have that

lim_{h→0} ((f/g)(x + h) − (f/g)(x))/h = (g(x)f′(x) − f(x)g′(x)) / (g(x))²

Exercise 5.1 Solve exercise 2.11 (except for part k), on page 29 of Simon and Blume. Also find the derivatives of

f(x) = √x/(1 + x),   f(x) = x/(1 + √x),   f(x) = 1/(1 + √x)

at x ∈ R++.

Exercise 5.2 Let n̄ ∈ N. Suppose that ∀n ∈ {1, ..., n̄}, f_n : X → R is differentiable. Define g : X → R by

g(x) = f_1(x)·f_2(x)···f_n̄(x)

Is g differentiable? Compute g′(x) and show that

g′(x)/g(x) = f′_1(x)/f_1(x) + f′_2(x)/f_2(x) + ... + f′_n̄(x)/f_n̄(x)

Remark: Shorthand notation for the last two expressions would respectively be

g(x) = ∏_{n=1}^{n̄} f_n(x)   and   g′(x)/g(x) = ∑_{n=1}^{n̄} f′_n(x)/f_n(x)

(Hint: Use the principle of mathematical induction, example 1.1.)
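The identity in exercise 5.2 is easy to confirm symbolically for a small number of factors. The sketch below uses sympy with three concrete factors of my own choosing; it is only an illustration of the result, not part of the exercise.

```python
import sympy as sp

x = sp.symbols('x')
factors = [x**2 + 1, sp.exp(x), sp.sin(x) + 2]   # three arbitrary differentiable f_n's
g = factors[0] * factors[1] * factors[2]

lhs = sp.diff(g, x) / g
rhs = sum(sp.diff(f, x) / f for f in factors)
print(sp.simplify(lhs - rhs))   # 0, confirming g'/g = sum of f_n'/f_n
```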

Another important result is:

Theorem 5.4 If x ∈ R and y ∈ R++, then

(e^x)′ = e^x   and   (ln(y))′ = 1/y

Proof. The proof of this result is not complicated, but we won't attempt it here.

5.1.4 Higher order derivatives

Suppose that f : X → R is differentiable. Then, ∀x ∈ X, ∃f′(x) ∈ R. In other words, the derivative itself assigns a real number f′(x) to each x ∈ X. This means that f′ : X → R.

If f : X → R is differentiable and f′ : X → R is continuous, we say that f is continuously differentiable. In such a case, we say that f ∈ C¹, and refer to C¹ as the class of continuously differentiable functions.

Definition 5.4 Consider f : X → R, f ∈ C¹. If f′ is differentiable at x ∈ X, we say that f is twice differentiable at x, and define the second-order derivative of f at x, denoted f″(x), to be the derivative of f′ at x.

In other words, we define f″(x) to be (f′)′(x), whenever f′ is differentiable at x. This means that ∃ℓ ∈ R :

lim_{h→0} (f′(x + h) − f′(x))/h = ℓ

and we let f″(x) = ℓ.

Page 49: Lecture Notes on Mathematics for Economists: A refresher

42 CHAPTER 5 DIFFERENTIABILITY

Definition 5.5 A function f : X → R is said to be twice differentiable (on X) if it is twice differentiable at all x ∈ X.

As before, if f : X → R is twice differentiable, then f″ : X → R. In this case, if f″ : X → R is continuous, we say that f is twice continuously differentiable and that f ∈ C², where C² is the class of twice continuously differentiable functions. It follows then that C² ⊆ C¹.

We can continue to define higher-order levels of differentiability in a recursive manner. Fix k ∈ N. Denote by C^(k−1) the class of (k − 1) times continuously differentiable functions and, for f ∈ C^(k−1), denote by f^[k−1](x) the (k − 1)th-order derivative of f at x ∈ X. Then:

Definition 5.6 Consider f : X → R, f ∈ C^(k−1). If f^[k−1] is differentiable at x ∈ X, we say that f is k times differentiable at x, and define the kth-order derivative of f at x, denoted f^[k](x), to be the derivative of f^[k−1] at x.

Definition 5.7 A function f : X → R is said to be k times differentiable (on X) if it is k times differentiable at all x ∈ X.

Once again, if f : X → R is k times differentiable, then f^[k] : X → R. In this case, if f^[k] : X → R is continuous, we say that f is k times continuously differentiable and that f ∈ C^k, where C^k is the class of k times continuously differentiable functions. It follows then that

C^k ⊆ C^(k−1) ⊆ ... ⊆ C² ⊆ C¹

Finally, if ∀k ∈ N we have f ∈ C^k, then we say that f is infinitely differentiable (or "smooth") and that f ∈ C^∞, where C^∞ is the class of infinitely differentiable functions.

Example 5.3 Consider the function f : R → R defined by

f(x) = x³/3 if x ≥ 0, and f(x) = −x³/3 if x < 0

It is easy to see that f is differentiable and

f′(x) = x² if x ≥ 0, and f′(x) = −x² if x < 0

Page 50: Lecture Notes on Mathematics for Economists: A refresher

5.1 FUNCTIONS ON R 43

which is continuous, so that f ∈ C¹. From f′, we further conclude that f is twice differentiable and

f″(x) = 2x if x ≥ 0, and f″(x) = −2x if x < 0; that is, f″(x) = 2|x|,

which is continuous, so that f ∈ C². However, we know from example 5.2 that f″ is not differentiable at 0. Thus, we conclude that ∀k ∈ N with k ≥ 3, it is true that f ∉ C^k. Another way to state this result is to say that f ∈ C²\C³.

Exercise 5.3 Polynomials. A polynomial of degree n ∈ N, defined on a domain X, is a function P_n : X → R defined by

∀x ∈ X, P_n(x) = ∑_{j=0}^{n} a_j x^j

where ∀j ∈ {0, 1, 2, ..., n}, a_j ∈ R and a_n ≠ 0. Show that ∀n ∈ N and ∀(a_j)_{j=0}^{n}, P_n ∈ C^∞. (Hint: Probably the easiest way to do this is by mathematical induction on the order of the polynomial.)

5.1.5 Derivatives and limits

A very useful result is the following:

Theorem 5.5 (L'Hôpital's rule) Suppose that f : (a, b) → R and g : (a, b) → R\{0}, where −∞ ≤ a < b ≤ ∞, are differentiable, that x̄ ∈ [a, b], and that for some ℓ ∈ R ∪ {−∞, ∞},

lim_{x→x̄} f′(x)/g′(x) = ℓ

If

lim_{x→x̄} f(x) = 0 and lim_{x→x̄} g(x) = 0

then

lim_{x→x̄} (f/g)(x) = ℓ

Proof. The proof of this result is complicated and we won’t try it here.
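Symbolic software can be used to check the rule on a standard 0/0 case. The functions in the sketch below are my own illustrative choice.

```python
import sympy as sp

x = sp.symbols('x')
f, g = sp.exp(x) - 1, sp.sin(x)          # both vanish at x = 0

direct = sp.limit(f / g, x, 0)
via_rule = sp.limit(sp.diff(f, x) / sp.diff(g, x), x, 0)
print(direct, via_rule)                   # 1 1: the two limits agree, as the rule predicts
```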


5.2 Functions on R^K

5.2.1 Partial differentiability

We now consider X ⊆ R^K open. Suppose that x ∈ X and fix k ∈ {1, ..., K}. It is easy to see that the set

∆ = {δ ∈ R | (x_1, ..., x_(k−1), x_k + δ, x_(k+1), ..., x_K) ∈ X}

is open as a subset of R and that the function ϕ : ∆ → R defined by

ϕ(δ) = f(x_1, ..., x_(k−1), x_k + δ, x_(k+1), ..., x_K)

is well defined. So, we can directly apply all the ideas of the first section of this chapter to the function ϕ and study the differentiability of f when, starting from the point x, we change the kth argument of the function but keep the other arguments fixed at (x_1, ..., x_(k−1), x_(k+1), ..., x_K). When ϕ is differentiable at zero, we say that f is partially differentiable with respect to x_k at x, and say that the partial derivative of f with respect to x_k at x, denoted by ∂f/∂x_k(x), is

∂f/∂x_k(x) = ϕ′(0)

When f is partially differentiable with respect to x_k at every x ∈ X, we say that f is partially differentiable with respect to x_k. When f is partially differentiable with respect to x_k for every k ∈ {1, ..., K}, we say that f is partially differentiable.

Notice that each ∂f/∂x_k(x) depends on x and not just on x_k (exceptions obviously exist). Importantly, we can study continuity of these functions and talk of continuous partial differentiability, and we can also study differentiability of these functions and introduce higher-order partial differentiability. For example, if the function ∂f/∂x_k is differentiable with respect to x_j at x, then we say that the second derivative of f with respect to x_j and x_k, denoted by ∂²f/∂x_j∂x_k(x), is

∂²f/∂x_j∂x_k(x) = ∂(∂f/∂x_k)/∂x_j (x)

When appropriate, we define the Hessian of f at x, denoted by Hf(x), as the double array (K × K matrix)

Hf(x) =
  ∂²f/∂x_1∂x_1(x)  ···  ∂²f/∂x_1∂x_K(x)
  ...
  ∂²f/∂x_K∂x_1(x)  ···  ∂²f/∂x_K∂x_K(x)

A crucial result in calculus is:

Theorem 5.6 (Young's theorem) If f is twice continuously partially differentiable, then Hf(x) is symmetric.

Proof. The proof is standard in calculus and is omitted here.
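For a concrete function one can approximate the cross-partials by finite differences and observe the symmetry that Young's theorem asserts. The function, the evaluation point and the step size in the sketch below are arbitrary choices made only for illustration.

```python
import numpy as np

def f(x):                        # an arbitrary smooth function on R^2
    return x[0]**2 * np.sin(x[1]) + x[0] * x[1]**3

def hessian_fd(f, x, h=1e-4):    # central finite-difference Hessian
    K = len(x)
    H = np.zeros((K, K))
    for i in range(K):
        for j in range(K):
            e_i, e_j = np.eye(K)[i] * h, np.eye(K)[j] * h
            H[i, j] = (f(x + e_i + e_j) - f(x + e_i - e_j)
                       - f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h**2)
    return H

H = hessian_fd(f, np.array([1.0, 2.0]))
print(np.allclose(H, H.T, atol=1e-4))    # True: the mixed partials coincide
```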

5.2.2 Differentiability

The concept of partial differentiability is important and straightforward but, at least in principle, limited: it only considers perturbations in the "canonical" directions, in which one keeps all but one of the arguments fixed. A stronger concept is needed for more general perturbations, which we can introduce by analogy to theorem 5.1.

Definition 5.8 A function f : X → R is differentiable at x ∈ X, with gradient Df(x) = ∆, if ∃ε > 0 and ∃ϕ : B_ε(0) → R such that

lim_{h→0} ‖h‖⁻¹ ϕ(h) = 0

and

∀h ∈ B_ε(0), f(x + h) = f(x) + ∆h + ϕ(h)

For reasons that will later be obvious, the gradient of a function at some point is the direction in which the function increases most rapidly, when perturbed infinitesimally in its domain. As before, we can study the continuity of Df as a function of x and talk of continuous differentiability.

The definition is not easy to verify, but the following result bridges the gap between differentiability and the more operational concept of partial differentiability:


Theorem 5.7 If f is continuously partially differentiable at x, then it is differentiable at x, and

Df(x) = (∂f/∂x_1(x), ..., ∂f/∂x_K(x))

If, on the other hand, f is continuously differentiable at x, then it is continuously partially differentiable at x, and

(∂f/∂x_1(x), ..., ∂f/∂x_K(x)) = Df(x)

Higher-order differentiability is also possible. If f : X → R is differentiable at x and ∃ε > 0 and ∃ϕ : B_ε(0) → R such that

lim_{h→0} ‖h‖⁻² ϕ(h) = 0

and

∀h ∈ B_ε(0), f(x + h) = f(x) + Df(x)h + (1/2)h⊤∆h + ϕ(h)

for some real-valued K × K matrix ∆, then we say that f is twice differentiable at x and

D²f(x) = ∆

It follows from the previous theorem that if f is twice continuously partially differentiable at x, then it is twice differentiable at x, and

D²f(x) = Hf(x)

whereas, on the other hand, if f is twice continuously differentiable at x, then it is twice continuously partially differentiable at x, and

Hf(x) = D²f(x)


Chapter 6

Composite functions

Throughout this chapter, we maintain the assumptions X ⊆ R^K, K ∈ N and Y ⊆ R.

Definition 6.1 Given the functions f : X → Y and g : Y → R, we define the composite function (g ∘ f) : X → R as follows: ∀x ∈ X, (g ∘ f)(x) = g(f(x)).

6.1 Continuity of composite functions

Theorem 6.1 Suppose that f : X → Y is continuous at x̄ ∈ X, and g : Y → R is continuous at f(x̄). Then, (g ∘ f) : X → R is continuous at x̄.

Proof. Fix ε > 0. Since g is continuous at f(x̄), ∃γ > 0 : ∀y ∈ B_γ(f(x̄)) ∩ Y, |g(y) − g(f(x̄))| < ε. Fix one such γ. Since f is continuous at x̄ and γ > 0, ∃δ > 0 : ∀x ∈ B_δ(x̄) ∩ X, |f(x) − f(x̄)| < γ. Thus, ∀x ∈ B_δ(x̄) ∩ X, f(x) ∈ B_γ(f(x̄)). And since ∀x ∈ X, f(x) ∈ Y, we have that ∀x ∈ B_δ(x̄) ∩ X, f(x) ∈ B_γ(f(x̄)) ∩ Y, and therefore |g(f(x)) − g(f(x̄))| < ε.

Corollary 6.1 Suppose that f : X → Y and g : Y → R are continuous. Then, (g ∘ f) : X → R is continuous.

Proof. It follows straightforwardly from the theorem.


6.2 Differentiability: the chain rule

We now further assume that X and Y are both open. For simplicity, we assume K = 1 (i.e. X ⊆ R). The higher-dimensional result for partial differentiation is straightforward.

Theorem 6.2 (The Chain Rule) Suppose that f : X → Y is differentiable at x ∈ X, and g : Y → R is differentiable at f(x). Then, (g ∘ f) : X → R is differentiable at x, and

(g ∘ f)′(x) = g′(f(x))·f′(x)

Proof. By definition of derivative, we are interested in

((g ∘ f)(x + h) − (g ∘ f)(x))/h

for h ∈ R\{0} such that (x + h) ∈ X. By definition of composite function, this expression equals

(g(f(x + h)) − g(f(x)))/h

Now, for simplicity of notation, let y = f(x) and Ω = {k ∈ R | (y + k) ∈ Y}. We define ϕ : Ω → R as follows:

ϕ(k) = (g(y + k) − g(y))/k − g′(y) if k ≠ 0, and ϕ(0) = 0

Notice that lim_{k→0} ϕ(k) = 0. This, together with the fact that g is continuous (theorem 5.2), suffices to ensure that ϕ is continuous, by theorem 4.1. Clearly, ∀k ∈ Ω it is true that

g(y + k) − g(y) = k·(ϕ(k) + g′(y))

Now, evaluate the last expression at k_h = f(x + h) − f(x) (we keep the subscript h to stress that k will change as h does¹). By construction, we have

g(f(x + h)) − g(f(x)) = k_h·(ϕ(k_h) + g′(y))

¹Formally, we are introducing a function k : {h ∈ R : (x + h) ∈ X} → Ω. Notice that this function is continuous. For simplicity, we are denoting the function using just the subscript (i.e., we are writing k_h rather than k(h)).


so that (for h ≠ 0)

(g(f(x + h)) − g(f(x)))/h = (k_h/h)·(ϕ(k_h) + g′(y))

Now, notice that, by definition,

lim_{h→0} k_h/h = lim_{h→0} (f(x + h) − f(x))/h = f′(x) ∈ R

whereas²,

lim_{h→0} ϕ(k_h) = lim_{k→0} ϕ(k) = 0

and g′(y) = g′(f(x)) ∈ R. Thus, by theorem 2.8,

lim_{h→0} (g(f(x + h)) − g(f(x)))/h = g′(f(x))·f′(x) ∈ R

Corollary 6.2 Suppose that f : X → Y and g : Y → R are differentiable. Then, (g ∘ f) : X → R is differentiable.

Proof. It follows straightforwardly from the theorem.

Exercise 6.1 Solve exercises 2.11.k, 4.1, 4.5 and 5.8 on pages 29, 74, 75 and 97 of Simon and Blume, respectively.

²Following what we said in the last footnote, what we have is

lim_{h→0} ϕ(k(h)) = lim_{h→0} (ϕ ∘ k)(h)

Now, by corollary 6.1, ϕ ∘ k is continuous, so that

lim_{h→0} (ϕ ∘ k)(h) = (ϕ ∘ k)(0) = 0


Chapter 7

Taylor approximations

For simplicity, throughout this chapter we maintain the assumption that X ⊆ R, X open.

7.1 Approximations by polynomials

Suppose that we have a function f : X → R, f ∈ C^m. Suppose that 0 ∈ X. Suppose also that, for some n ∈ N, n ≤ m, we want to construct an nth-degree polynomial, P_n : X → R, such that the values of f and its first n derivatives evaluated at 0 are the same as the values of P_n and its first n derivatives at 0.

We know that ∀(a_j)_{j=0}^n, P_n(x) has the form

P_n(x) = a_0 + a_1 x + a_2 x² + ... + a_n x^n


and we know that P_n ∈ C^∞. Also, since¹

P′_n(x) = a_1 + 2a_2 x + ... + n a_n x^(n−1)
P″_n(x) = 2a_2 + 3·2a_3 x + ... + n(n − 1)a_n x^(n−2)
P‴_n(x) = 3·2a_3 + 4·3·2a_4 x + ... + n(n − 1)(n − 2)a_n x^(n−3)
...
P_n^[n](x) = n(n − 1)(n − 2)···(2)(1)a_n

we have that

P_n(0) = a_0
P′_n(0) = a_1
P″_n(0) = 2a_2
P‴_n(0) = 3·2a_3 = 3!a_3
...
P_n^[n](0) = n(n − 1)(n − 2)···(2)(1)a_n = n!a_n

Now, since what we want is to find (a_j)_{j=0}^n such that

P_n(0) = f(0)
P′_n(0) = f′(0)
P″_n(0) = f″(0)
P‴_n(0) = f‴(0)
...
P_n^[n](0) = f^[n](0)

it is easy to see that the (only) sequence (a_j)_{j=0}^n that satisfies these equalities is given by

a_0 = f(0) and, ∀k ∈ N with k ≤ n, a_k = (1/k!) f^[k](0)

¹In what follows, I'm going to use an extensive notation. If you want to keep the notation short, although complicated, notice that ∀k ∈ N, k ≤ n,

P_n^[k](x) = ∑_{j=k}^{n} ( j!/(j − k)! ) a_j x^(j−k)


so that our desired polynomial is

P_n(x) = f(0) + f′(0)x + (1/2)f″(0)x² + ... + (1/n!)f^[n](0)x^n

Since the restrictions we imposed are such that f and P_n are very close to each other when x is close to 0 and have the same derivatives at 0, we say that P_n is an nth-order approximation of f about 0. Usually, this fact is expressed by writing

f(x) ≈ f(0) + f′(0)x + (1/2)f″(0)x² + ... + (1/n!)f^[n](0)x^n

One important remark is in order. Notice first that ∀(a_j)_{j=0}^n, ∀x ∈ X, ∀k > n, P_n^[k](x) = 0. This implies that we should not try to equate more than the first n derivatives of the polynomial to those of the function. However, it is true that the higher the degree of the polynomial (subject to the differentiability of f), the better the approximation to the function. We will later come back to this point. For now, the following example may be illustrative:

Example 7.1 Let f : [−0.5, 0.5] → R be defined by f(x) = 1/(1 + x)². One can check that f ∈ C^∞. Also, we have that f′(x) = −2/(1 + x)³ and f″(x) = 6/(1 + x)⁴. In particular, f(0) = 1, f′(0) = −2 and f″(0) = 6. Thus, we have that

P_1(x) = 1 − 2x
P_2(x) = 1 − 2x + 3x²

The graph of f and these two polynomial approximations is figure 1 below. Notice how well the second-order polynomial (thin curve) approximates the function (dashed curve). It certainly does better than the first-order polynomial (thick line)!

[Figure 1 (omitted): graphs of f(x), P1(x) and P2(x) for x ∈ [−0.1, 0.1].]
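The comparison in example 7.1 can also be reproduced numerically. The short sketch below tabulates the approximation errors of P_1 and P_2 at a few points near 0; the grid of points is an arbitrary choice for illustration.

```python
f  = lambda x: 1.0 / (1.0 + x)**2
P1 = lambda x: 1.0 - 2.0 * x
P2 = lambda x: 1.0 - 2.0 * x + 3.0 * x**2

for x in [-0.1, -0.05, 0.05, 0.1]:
    print(f"x={x:+.2f}  |f-P1|={abs(f(x) - P1(x)):.5f}  |f-P2|={abs(f(x) - P2(x)):.5f}")
# Near 0 the second-order error is roughly an order of magnitude smaller than the first-order error.
```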

Exercise 7.1 Develop first- and second-order polynomial approximations to f : [−1, 1] → R, defined by f(x) = e^x.

Exercise 7.2 Argue that

e = ∑_{n=0}^{∞} 1/n!

7.2 Taylor approximations

The method that we used in the previous section is limited in that it requires that 0 ∈ X, and in that we can only approximate the function about 0. The natural way to generalize this particular case is the use of Taylor polynomials.

Let x̄ ∈ R. An nth-degree polynomial about x̄ is a function P_{n,x̄} : R → R of the form:

P_{n,x̄}(x) = a_0 + a_1(x − x̄) + a_2(x − x̄)² + ... + a_n(x − x̄)^n


where ∀j ∈ {0, 1, 2, ..., n}, a_j ∈ R. It is easy to show that ∀x̄ ∈ R, ∀n ∈ N, and ∀(a_j)_{j=0}^n, P_{n,x̄} ∈ C^∞.

Now, suppose that we have a function f : X → R, f ∈ C^m, and that, for some n ∈ N, n ≤ m, and some x̄ ∈ X, we want to construct an nth-degree Taylor polynomial, P_{n,x̄} : X → R, such that the values of f and its first n derivatives evaluated at x̄ are the same as the values of P_{n,x̄} and its first n derivatives at x̄. Then, we only have to repeat the procedure we already used. Again, we have that

P′_{n,x̄}(x) = a_1 + 2a_2(x − x̄) + ... + n a_n(x − x̄)^(n−1)
P″_{n,x̄}(x) = 2a_2 + 3·2a_3(x − x̄) + ... + n(n − 1)a_n(x − x̄)^(n−2)
P‴_{n,x̄}(x) = 3·2a_3 + 4·3·2a_4(x − x̄) + ... + n(n − 1)(n − 2)a_n(x − x̄)^(n−3)
...
P_{n,x̄}^[n](x) = n(n − 1)(n − 2)···(2)(1)a_n

so that

P_{n,x̄}(x̄) = a_0
P′_{n,x̄}(x̄) = a_1
P″_{n,x̄}(x̄) = 2a_2
P‴_{n,x̄}(x̄) = 3·2a_3 = 3!a_3
...
P_{n,x̄}^[n](x̄) = n(n − 1)(n − 2)···(2)(1)a_n = n!a_n

Now, since what we want is to find (a_j)_{j=0}^n such that

P_{n,x̄}(x̄) = f(x̄)
P′_{n,x̄}(x̄) = f′(x̄)
P″_{n,x̄}(x̄) = f″(x̄)
P‴_{n,x̄}(x̄) = f‴(x̄)
...
P_{n,x̄}^[n](x̄) = f^[n](x̄)


it is easy to see that the (only) sequence (a_j)_{j=0}^n that satisfies these equalities is given by

a_0 = f(x̄) and, ∀k ∈ N with k ≤ n, a_k = (1/k!) f^[k](x̄)

When we use these particular values of (a_j)_{j=0}^n, we obtain the nth-degree Taylor polynomial approximation to f about x̄. We denote this function by T_{f,n,x̄} : X → R, and define it as

T_{f,n,x̄}(x) = f(x̄) + f′(x̄)(x − x̄) + (1/2)f″(x̄)(x − x̄)² + ... + (1/n!)f^[n](x̄)(x − x̄)^n
            = f(x̄) + ∑_{j=1}^{n} (1/j!) f^[j](x̄)(x − x̄)^j

Again, in order to highlight that this is an approximation to f, it is usually written

f(x) ≈ f(x̄) + f′(x̄)(x − x̄) + (1/2)f″(x̄)(x − x̄)² + ... + (1/n!)f^[n](x̄)(x − x̄)^n

Exercise 7.3 Argue that

e^x = ∑_{n=0}^{∞} (1/n!) x^n

Exercise 7.4 Develop first- and second-order Taylor approximations around 1 to f : R++ → R, defined by f(x) = ln(x).
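For exercise 7.4, a computer algebra system can produce the Taylor polynomial directly. The sketch below (illustrative only, not a substitute for the exercise) expands ln(x) around 1 to second order.

```python
import sympy as sp

x = sp.symbols('x')
taylor2 = sp.series(sp.log(x), x, 1, 3).removeO()   # expansion about x0 = 1, terms up to order 2
print(taylor2)
# -(x - 1)**2/2 + x - 1, i.e. f(1) + f'(1)(x-1) + (1/2)f''(1)(x-1)^2
# with f(1) = 0, f'(1) = 1 and f''(1) = -1.
```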

7.3 The remainder

In this section, we maintain the assumption that we have f : X → R, f ∈ C^m, and that for some n ∈ N, n ≤ m, and some x̄ ∈ X, T_{f,n,x̄} : X → R is the nth-degree Taylor approximation to f about x̄.

7.3.1 The remainder

Definition 7.1 We define the remainder of the nth-degree Taylor polynomial approximation to f about x̄, denoted R_{f,n,x̄}, as the function R_{f,n,x̄} : X → R where, ∀x ∈ X,

R_{f,n,x̄}(x) = f(x) − T_{f,n,x̄}(x)


The remainder measures (locally) the error that we make when approximating the function by the nth-degree Taylor polynomial. It follows, by definition, that

f(x) = T_{f,n,x̄}(x) + R_{f,n,x̄}(x)

and that

R_{f,n,x̄}(x̄) = 0

but these properties are of no particular interest, since they hold by construction.

7.3.2 Mean value and Taylor's theorems

Before we introduce the most important property of the remainder, the following result is of interest:

Theorem 7.1 (The Mean Value Theorem) Suppose that we have f : X → R, f differentiable. If x, x̄ ∈ X are such that [x̄, x] ⊆ X and [x̄, x] ≠ ∅, then ∃x* ∈ [x̄, x] such that

f(x) = f(x̄) + f′(x*)(x − x̄)

Similarly, if x, x̄ ∈ X are such that [x, x̄] ⊆ X and [x, x̄] ≠ ∅, then ∃x* ∈ [x, x̄] such that

f(x) = f(x̄) + f′(x*)(x − x̄)

Notice the similarity between the expression resulting from the mean value theorem and a first-degree polynomial approximation about x̄. The only difference is that the derivative is not (necessarily) evaluated at x̄, but (maybe) at some other point in the interval between x̄ and x. The importance of the result is that, with just that little change, our expression is no longer an approximation: it is exact!

The mean value theorem allows us to prove the following result. Recall that we maintain the assumptions introduced at the beginning of this section:

Theorem 7.2 (Taylor's Theorem) Suppose that f is (n + 1) times differentiable (i.e. f^[n+1] : X → R exists). If x, x̄ ∈ X are such that [x̄, x] ⊆ X and [x̄, x] ≠ ∅, then ∃x* ∈ [x̄, x] :

R_{f,n,x̄}(x) = (1/(n + 1)!) f^[n+1](x*)(x − x̄)^(n+1)


Similarly, if x, x̄ ∈ X are such that [x, x̄] ⊆ X and [x, x̄] ≠ ∅, then ∃x* ∈ [x, x̄] :

R_{f,n,x̄}(x) = (1/(n + 1)!) f^[n+1](x*)(x − x̄)^(n+1)

Two remarks are in order. Notice first that n < m suffices as a hypothesis for the theorem. But it is also important to notice that we do not require f^[n+1] to be continuous, only to exist.

Second, notice again how similar this expression for the remainder is to each one of the terms of the Taylor polynomial. Again, the only difference is that the derivative in the remainder is computed at some point in the interval between x̄ and x (an interval that must, obviously, be part of the domain), rather than at x̄. The importance comes from the fact that, if we have f : X → R, f ∈ C^n, and f^[n+1] exists, then, for some x* in the interval between x̄ and x, the expression

f(x) = f(x̄) + f′(x̄)(x − x̄) + (1/2)f″(x̄)(x − x̄)² + ... + (1/n!)f^[n](x̄)(x − x̄)^n + (1/(n + 1)!)f^[n+1](x*)(x − x̄)^(n+1)

is not an approximation. It is exact.

Unfortunately, we do not (yet) have the necessary elements to prove the mean value theorem (and, therefore, Taylor's theorem). However, we should be able to convince ourselves that the mean value theorem is intuitively clear.

7.4 Local accuracy of Taylor approximations

Although the mean value and Taylor's theorems are very important, in many cases one cannot find which particular point in the interval between x̄ and x makes our expressions exact. In those cases, we must stick to our nth-order approximation. In this section, we claim that Taylor approximations are very good whenever x and x̄ are close to each other, and that the higher n is, the better (locally) the approximation. The exact sense in which this is true is that, as x → x̄, we have that R_{f,n,x̄}(x) → 0 faster than (x − x̄)^n. In other words, we claim that

lim_{x→x̄} ( R_{f,n,x̄}(x) / (x − x̄)^n ) = 0


Notice that if n < m, then, by Taylor's theorem, the result is straightforward. However, since we did not prove the theorem, you may still doubt the result. We now offer a heuristic argument, for n = 1. In such a case, it is clear that

R_{f,1,x̄}(x) = f(x) − f(x̄) − f′(x̄)(x − x̄)

so that ∀x ≠ x̄,

R_{f,1,x̄}(x)/(x − x̄) = (f(x) − f(x̄))/(x − x̄) − f′(x̄)

and, by definition,

lim_{x→x̄} R_{f,1,x̄}(x)/(x − x̄) = lim_{x→x̄} ( (f(x) − f(x̄))/(x − x̄) − f′(x̄) ) = f′(x̄) − f′(x̄) = 0

meaning that R_{f,1,x̄}(x) goes to 0 faster than x − x̄.

Exercise 7.5 (L'Hôpital's rule is really useful, I) Remember L'Hôpital's rule? If so, you can further convince yourself of our claim for n ≥ 2. For example, if n = 2, notice that

R_{f,2,x̄}(x) = R_{f,1,x̄}(x) − (1/2)f″(x̄)(x − x̄)²

Now, you can use L'Hôpital's rule (after showing that it applies, of course) to argue that

lim_{x→x̄} R_{f,1,x̄}(x)/(x − x̄)² = (1/2)f″(x̄)

from where it follows that

lim_{x→x̄} R_{f,2,x̄}(x)/(x − x̄)² = 0

Moreover, if you do this, you can prove the general result by mathematical induction.


Chapter 8

Linear algebra

Let K, J, L,N ∈ N.

8.1 Matrices

Given a finite sequence (x_n)_{n=1}^N in R^K, define the span of (x_n)_{n=1}^N, denoted Sp(x_n)_{n=1}^N, as

Sp(x_n)_{n=1}^N = { x ∈ R^K | ∃(θ_n)_{n=1}^N in R : ∑_{n=1}^N θ_n x_n = x }

That is, the span of a sequence of vectors is the set of all its possible linear combinations.

The canonical vectors in R^K are the sequence (e_k)_{k=1}^K in R^K where, ∀k ∈ {1, ..., K}, e_{k,l} = 1 if k = l, and e_{k,l} = 0 otherwise. Evidently,

Sp(e_k)_{k=1}^K = R^K

Any sequence (x_n)_{n=1}^N in R^K such that Sp(x_n)_{n=1}^N = R^K is called a basis for R^K. In particular, (e_k)_{k=1}^K is called the canonical basis for R^K.

A K × J matrix is an array of J ∈ N vectors in R^K, each of which is taken as a column. If K = J, the matrix is said to be square.

Example 8.1 The identity matrix of size K, denoted I, is I = [e_1 e_2 ··· e_K].


Definition 8.1 Given a K × J matrix

A =
  a_{1,1} a_{1,2} ··· a_{1,J}
  a_{2,1} a_{2,2} ··· a_{2,J}
  ...
  a_{K,1} a_{K,2} ··· a_{K,J}

the transpose of A, denoted by A⊤, is the J × K matrix

A⊤ =
  a_{1,1} a_{2,1} ··· a_{K,1}
  a_{1,2} a_{2,2} ··· a_{K,2}
  ...
  a_{1,J} a_{2,J} ··· a_{K,J}

Exercise 8.1 Prove the following: if A is a K × J matrix and B is a J × L matrix, then (AB)⊤ = B⊤A⊤.

A matrix A is said to be symmetric if A⊤ = A, which obviously requires it to be square.

8.2 Linear functions

Definition 8.2 A function f : R^K → R^J, where J ∈ N, is said to be linear if

1. ∀x, x′ ∈ R^K, f(x + x′) = f(x) + f(x′);
2. ∀x ∈ R^K and ∀θ ∈ R, f(θx) = θf(x).

Theorem 8.1 A function f : R^K → R^J is linear iff there exists a J × K matrix A such that ∀x ∈ R^K, f(x) = Ax.

Proof. Sufficiency is obvious. For necessity, consider the J × K matrix

A = [ f(e_1) f(e_2) ··· f(e_K) ]

Exercise 8.2 Show that if f : R^K → R^J and g : R^J → R^L are linear, then g ∘ f is linear.


8.3 Determinants

Definition 8.3 Given a K × K matrix

A =
  a_{1,1} a_{1,2} ··· a_{1,K}
  a_{2,1} a_{2,2} ··· a_{2,K}
  ...
  a_{K,1} a_{K,2} ··· a_{K,K}

denote by A¬(j,l) the (K − 1) × (K − 1) matrix resulting from deleting the jth row and the lth column. The determinant of A, denoted by det(A), is constructively defined by

det(A) = ∑_{k=1}^{K} (−1)^(1+k) a_{1,k} det(A¬(1,k))

The latter definition is known as cofactor expansion along the first row. It is a fact that the determinant can be found using any one of the rows or columns for the cofactor expansion, which incidentally implies that det(A) = det(A⊤).
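The constructive definition translates directly into a recursive computation. The sketch below implements cofactor expansion along the first row and compares it with numpy's determinant on an arbitrary 3 × 3 matrix; it is meant only as an illustration (and is far slower than numpy for large K).

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    K = A.shape[0]
    if K == 1:
        return A[0, 0]
    total = 0.0
    for k in range(K):
        minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)   # drop row 1 and column k+1
        total += (-1) ** k * A[0, k] * det_cofactor(minor)       # (-1)^(1+(k+1)) = (-1)^k
    return total

A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
print(det_cofactor(A), np.linalg.det(A))   # both print 18.0
```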

Exercise 8.3 Prove the following results:

1. det(I) = 1.

2. For any pair of K × K matrices A and B, det(AB) = det(A) det(B).

3. For a 2 × 2 matrix, det(A) = a_{1,1}a_{2,2} − a_{2,1}a_{1,2}.

Important and easily verifiable facts about determinants are the following: let (a_k)_{k=1}^K be in R^K; then,

1. ∀x ∈ R^K, det([a_1 + x  a_2 ··· a_K]) = det([a_1 a_2 ··· a_K]) + det([x a_2 ··· a_K]).

2. ∀θ ∈ R, det([θa_1  a_2 ··· a_K]) = θ det([a_1 a_2 ··· a_K]).

3. det([a_1 + a_2  a_2 ··· a_K]) = det([a_1 a_2 ··· a_K]).

4. det([a_2  a_1 ··· a_K]) = − det([a_1 a_2 ··· a_K]).

8.4 Linear independence, dimension and rank

Definition 8.4 A sequence (x_n)_{n=1}^N in R^K is linearly independent if

∑_{n=1}^{N} θ_n x_n = 0 =⇒ (θ_n)_{n=1}^N = (0)_{n=1}^N

Exercise 8.4 Prove the following: (x_n)_{n=1}^N is linearly dependent iff there exist some n* ∈ {1, ..., N} and some (θ_n)_{n∈{1,...,N}\{n*}} such that

x_{n*} = ∑_{n∈{1,...,N}\{n*}} θ_n x_n

A set L ⊆ R^K is said to be a linear subspace of R^K, denoted by L ≤ R^K, if (i) ∀x, y ∈ L, x + y ∈ L; and (ii) ∀x ∈ L and ∀θ ∈ R, θx ∈ L.

Exercise 8.5 Prove the following: for every (x_n)_{n=1}^N in R^K, Sp(x_n)_{n=1}^N ≤ R^K.

Definition 8.5 A sequence (x_n)_{n=1}^N is a basis for a linear subspace L of R^K if it is linearly independent and Sp(x_n)_{n=1}^N = L.

It is a fact that any two bases of a linear subspace have the same number of vectors. This common number is known as the dimension of L. The dimension of R^K is K, while the dimension of any linear subspace of R^K is at most K.

Given a K × J matrix A = [a_1 a_2 ··· a_J], the span of A, denoted by Sp(A), is Sp(A) = Sp(a_j)_{j=1}^J.

Definition 8.6 Given a K × J matrix A, the rank of A, denoted by rank(A), is the dimension of Sp(A).

It is a fact that rank(A) ≤ min(K, J). Intuitively, the rank is the largest number of linearly independent rows or columns of the matrix.


8.5 Inverse matrix

Definition 8.7 A K × K matrix A is said to be invertible if there exists another K × K matrix B such that AB = BA = I. In this case, B is said to be the inverse matrix of A, and is denoted by A⁻¹.

Exercise 8.6 Prove that if A and B are invertible K × K matrices, then AB is invertible and (AB)⁻¹ = B⁻¹A⁻¹.

The following fundamental result, which we now state without proof, characterizes invertibility:

Theorem 8.2 Given a K × K matrix A = [a_1 a_2 ··· a_K], the following statements are equivalent:

1. A is invertible.

2. det(A) ≠ 0.

3. rank(A) = K.

4. (a_k)_{k=1}^K is linearly independent.

Also, a key result that makes it operational to find the inverse of an invertible matrix is the following: given invertible A,

A⁻¹ = (1/det(A)) C⊤

where C is the K × K matrix of cofactors, with entries c_{j,l} = (−1)^(j+l) det(A¬(j,l)).


8.6 Eigenvalues and Eigenvectors

Let a K ×K matrix A be fixed.

Definition 8.8 An Eigenvector of A is an x ∈ R^K\{0} such that, for some λ ∈ R, Ax = λx. The λ associated with an Eigenvector is called an Eigenvalue.

Intuitively, an Eigenvector is such that the linear transformation simply re-scales it by the associated Eigenvalue. This implies that Eigenvectors can only be determined up to scalar multiplication.

Eigenvectors and Eigenvalues are also known as characteristic vectors and values of the matrix, respectively. Since the system that defines them is (A − λI)x = 0, and we are interested in x ≠ 0, we can find the Eigenvalues by solving the equation det(A − λI) = 0. The latter is known as the characteristic equation. Notice that det(A − λI) is a Kth-degree polynomial in λ (so it has at most K real roots and, counting multiplicity, exactly K roots, whether real or complex).

Three key results are the following. Denote by λ_1, ..., λ_K the Eigenvalues of A; then,

1. ∑_{k=1}^K λ_k = ∑_{k=1}^K a_{k,k}. The number ∑_{k=1}^K a_{k,k} is known as the trace of A and is denoted Tr(A).

2. ∏_{k=1}^K λ_k = det(A).

3. The number of nonzero Eigenvalues of A equals rank(A), at least whenever A is diagonalizable (in particular, whenever A is symmetric).
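The first two facts are easy to confirm numerically for a given matrix; the symmetric matrix in the sketch below is an arbitrary example chosen only for illustration.

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

lams = np.linalg.eigvals(A)
print(np.isclose(lams.sum(),  np.trace(A)))        # True: sum of Eigenvalues = trace
print(np.isclose(lams.prod(), np.linalg.det(A)))   # True: product of Eigenvalues = determinant
print(np.linalg.matrix_rank(A), np.sum(~np.isclose(lams, 0.0)))   # rank vs number of nonzero Eigenvalues
```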

8.7 Quadratic forms

Let a K × K symmetric matrix A be fixed. The quadratic form associated with A is the function f_A : R^K → R given by f_A(x) = x⊤Ax.

Definition 8.9 A is negative definite if ∀x ∈ R^K\{0}, f_A(x) < 0. It is negative semidefinite if ∀x ∈ R^K, f_A(x) ≤ 0.

Definition 8.10 A is positive definite if ∀x ∈ R^K\{0}, f_A(x) > 0. It is positive semidefinite if ∀x ∈ R^K, f_A(x) ≥ 0.

Exercise 8.7 Show that A is positive (semi)definite iff −A is negative (semi)definite.


Exercise 8.8 Show that I is positive definite.

Exercise 8.9 Show that A is negative semidefinite iff f_A satisfies the following property: ∀x, x′ ∈ R^K and ∀θ ∈ [0, 1],

f_A(θx + (1 − θ)x′) ≥ θf_A(x) + (1 − θ)f_A(x′).

A key fact is the following: A is positive definite if all its Eigenvalues are (strictly) positive, and negative definite if all its Eigenvalues are (strictly) negative; moreover, it is positive semidefinite, but not definite, if all its Eigenvalues are nonnegative and at least one is zero, and negative semidefinite, but not definite, if all its Eigenvalues are nonpositive and at least one is zero.

These observations, and the second result about Eigenvalues, make the following theorems very intuitive. Define the principal minor of order k ∈ {1, ..., K} of A as

det
  a_{1,1} a_{1,2} ··· a_{1,k}
  a_{2,1} a_{2,2} ··· a_{2,k}
  ...
  a_{k,1} a_{k,2} ··· a_{k,k}

that is, the determinant of the submatrix resulting from keeping the first k rows and columns and dropping the rest. Also, the determinant of the submatrix resulting from keeping k rows and the corresponding columns, and dropping the rest of the matrix, is called a minor of A of order k.

Theorem 8.3 A is negative definite iff each principal minor of order k of A has sign (−1)^k. It is positive definite iff all principal minors are positive.

Theorem 8.4 A is negative semidefinite iff all minors of order k of A have sign (−1)^k or 0. It is positive semidefinite iff all minors are nonnegative.
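Theorems 8.3 and 8.4 are straightforward to check on a concrete symmetric matrix. The example below compares the principal minors with the eigenvalue test; the matrix is an arbitrary illustrative choice.

```python
import numpy as np

A = -np.array([[2.0, 1.0], [1.0, 3.0]])       # candidate negative definite matrix

eigs = np.linalg.eigvalsh(A)                   # eigenvalues of a symmetric matrix
minors = [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

print(eigs)     # both eigenvalues are negative
print(minors)   # [-2.0, 5.0]: the principal minor of order k has sign (-1)^k
```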


Chapter 9

Concavity and convexity

Throughout this chapter, we maintain the assumption that X ⊆ R^K, K ∈ N and X ≠ ∅.

9.1 Convex sets

Definition 9.1 A set X is said to be convex if ∀x, y ∈ X and ∀θ ∈ [0, 1], we have that θx + (1 − θ)y ∈ X.

Exercise 9.1 Show that X = {x} is convex. Show that ∀x ∈ R and ∀ε > 0, B_ε(x) is convex. Is N convex? Is ∅ convex? Is R convex? Show that if X and Y are convex, then X ∩ Y is convex. Is X ∪ Y convex in that same case?

9.2 Concave and convex functions

Definition 9.2 Suppose that X is a convex set. A function f : X → R is said to be concave if ∀x_1, x_2 ∈ X and ∀θ ∈ [0, 1], it is true that

f(θx_1 + (1 − θ)x_2) ≥ θf(x_1) + (1 − θ)f(x_2)

Definition 9.3 Suppose that X is a convex set. A function f : X → R is said to be convex if ∀x_1, x_2 ∈ X and ∀θ ∈ [0, 1], it is true that

θf(x_1) + (1 − θ)f(x_2) ≥ f(θx_1 + (1 − θ)x_2)

Exercise 9.2

1. Let X = R. Is f(x) = x a concave or a convex function?


2. Let X = R. Is f(x) = x² a concave or a convex function?

3. Let X = R. Is f(x) = x³ a concave or a convex function? What if X = R+?

4. Let X = R+. Is f(x) = √x a concave or a convex function?

5. Let X = R. Is f(x) = |x| a concave or a convex function? What if X = R+?

6. Let X = R². Is f(x) = x_1 + x_2 a concave or a convex function?

7. Let X = R²+. Is f(x) = x_1·x_2 a concave or a convex function?

From now on, we maintain the assumption that X is a convex set.

Theorem 9.1 A function f : X → R is concave iff the function (−f) : X → R, defined as (−f)(x) = −f(x), is convex.

Proof. It is left as an exercise.

The importance of the previous theorem is that it allows us to derive properties of convex functions straightforwardly from those of concave functions.

Theorem 9.2 If f : X → R is concave, then ∀k ∈ N, and for any finite sequences (x_n)_{n=1}^k and (θ_n)_{n=1}^k satisfying that ∀n ∈ {1, 2, ..., k}, x_n ∈ X and θ_n ∈ [0, 1], and that ∑_{n=1}^k θ_n = 1, it is true that

f( ∑_{n=1}^k θ_n x_n ) ≥ ∑_{n=1}^k θ_n f(x_n)

Proof. It is left as an exercise. (Hint: remember the principle of mathematical induction.)


9.3 Concavity and second order derivatives

Notice that concavity is not a differential property. However, when a function is twice differentiable, there exists a tight relationship between concavity and the sign of the second derivatives of the function.

Theorem 9.3 Consider f : X → R, f ∈ C². Then, f is concave iff ∀x ∈ X, D²f(x) is negative semidefinite.

Proof. (If:) Let x_1, x_2 ∈ X and θ ∈ [0, 1]. Since X is convex, we have that θx_1 + (1 − θ)x_2 ∈ X, and by Taylor's theorem we have that

f(x_1) = f(θx_1 + (1 − θ)x_2) + (1 − θ)Df(θx_1 + (1 − θ)x_2)(x_1 − x_2) + ((1 − θ)²/2)(x_1 − x_2)⊤D²f(x_1*)(x_1 − x_2)

and that

f(x_2) = f(θx_1 + (1 − θ)x_2) + θDf(θx_1 + (1 − θ)x_2)(x_2 − x_1) + (θ²/2)(x_2 − x_1)⊤D²f(x_2*)(x_2 − x_1)

where x_1* lies in the interval between x_1 and θx_1 + (1 − θ)x_2, and x_2* lies in the interval between x_2 and θx_1 + (1 − θ)x_2. Since ∀x ∈ X, D²f(x) is negative semidefinite, it follows that

f(x_1) ≤ f(θx_1 + (1 − θ)x_2) + (1 − θ)Df(θx_1 + (1 − θ)x_2)(x_1 − x_2)
f(x_2) ≤ f(θx_1 + (1 − θ)x_2) + θDf(θx_1 + (1 − θ)x_2)(x_2 − x_1)

Now, multiplying the first inequality by θ and the second one by (1 − θ), both of which are nonnegative, and adding, one gets that

θf(x_1) + (1 − θ)f(x_2) ≤ f(θx_1 + (1 − θ)x_2)

(Only if:) For simplicity, we consider only the case K = 1 here, and defer the general case to the appendix of the chapter.

Since, ∀θ ∈ (0, 1] and ∀x, x̄ ∈ X with x ≠ x̄, we have that

f(θx + (1 − θ)x̄) ≥ θf(x) + (1 − θ)f(x̄)


Denote ∆ = x − x̄ ≠ 0. This implies that x = x̄ + ∆ and that θx + (1 − θ)x̄ = x̄ + θ∆, so that our inequality becomes

f(x̄ + θ∆) ≥ θf(x̄ + ∆) + (1 − θ)f(x̄)

so that (since θ ≠ 0 and ∆ ≠ 0),

f(x̄ + ∆) ≤ f(x̄) + ((f(x̄ + θ∆) − f(x̄))/(θ∆))·∆

And since the latter is true ∀θ ∈ (0, 1], it is also true when we take θ → 0 (by corollary 2.3). This implies that

f(x̄ + ∆) ≤ f(x̄) + f′(x̄)∆   (∗)

Now, keeping x̄ fixed, fix also z ∈ R\{0}. Consider the function ϕ : {θ ∈ [0, 1] | (x̄ + θz) ∈ X} → R defined by ϕ(θ) = f(x̄ + θz). Since f ∈ C², we have ϕ ∈ C², and by the chain rule ϕ′(θ) = f′(x̄ + θz)z and ϕ″(θ) = f″(x̄ + θz)z².

Also, ∀θ ∈ (0, 1], by Taylor's theorem (7.2), we have that ∃θ* ∈ [0, θ] :

ϕ(θ) = ϕ(0) + ϕ′(0)θ + (1/2)ϕ″(θ*)θ²

which means that

f(x̄ + θz) = f(x̄) + f′(x̄)zθ + (1/2)f″(x̄ + θ*z)z²θ²

so that

(1/2)f″(x̄ + θ*z)z²θ² = f(x̄ + θz) − f(x̄) − f′(x̄)zθ

By our previous finding (inequality (∗)), the right-hand side of the last equality is nonpositive (just use zθ instead of ∆). Moreover, since θ > 0 and z ≠ 0, this implies that

f″(x̄ + θ*z) ≤ 0

Finally, since θ* ∈ [0, θ] and f ∈ C², if we take the limit as θ → 0, we find that

f″(x̄) ≤ 0


Corollary 9.1 Consider f : X → R, f ∈ C². Then, f is convex iff ∀x ∈ X, D²f(x) is positive semidefinite.

Proof. It follows directly from theorems 9.1, 9.3 and 5.3.

Exercise 9.3 (L'Hôpital's rule is really useful, II) We now obtain another proof of the only-if part of theorem 9.3 for the case K = 1. Your job is to complete and justify all the steps. We argue by contradiction: given f : X → R, f ∈ C², suppose that f is concave but ∃x ∈ X : f″(x) > 0. Take y ∈ X\{x} and θ ∈ (0, 1). By concavity, we have that

f(θx + (1 − θ)y) ≥ θf(x) + (1 − θ)f(y)

Now, since f ∈ C², we can use the mean value theorem to get that

f(θx + (1 − θ)y) ≥ f(x) + (1 − θ)f′(x)(y − x) + (1 − θ)(1/2)f″(x*)(y − x)²

for some x* in the interval between x and y. Since y ∈ X\{x}, this implies

(1 − θ)(1/2)f″(x*) ≤ ( f(θx + (1 − θ)y) − f(x) − (1 − θ)f′(x)(y − x) ) / (y − x)²

You can use L'Hôpital's rule and the chain rule (twice) to show that

lim_{y→x} ( f(θx + (1 − θ)y) − f(x) − (1 − θ)f′(x)(y − x) ) / (y − x)² = (1 − θ)²·(1/2)f″(x)

while

lim_{y→x} ( (1 − θ)(1/2)f″(x*) ) = (1 − θ)(1/2)f″(x)

and therefore,

(1 − θ)(1/2)f″(x) ≤ (1 − θ)²·(1/2)f″(x)

Now, since f″(x) > 0, this implies that

(1 − θ) ≤ (1 − θ)²

contradicting the fact that θ ∈ (0, 1).


9.4 Quasiconcave and strongly concave functions

Definition 9.4 Suppose that X is a convex set. A function f : X → R is said to be quasiconcave if ∀x_1, x_2 ∈ X and ∀θ ∈ [0, 1], it is true that

f(θx_1 + (1 − θ)x_2) ≥ min{f(x_1), f(x_2)}

Definition 9.5 Suppose that X is a convex set. A function f : X → R is said to be strictly concave if ∀x_1, x_2 ∈ X with x_1 ≠ x_2, and ∀θ ∈ (0, 1), it is true that

f(θx_1 + (1 − θ)x_2) > θf(x_1) + (1 − θ)f(x_2)

Definition 9.6 Suppose that X is a convex set. A function f : X → R is said to be strongly quasiconcave if ∀x_1, x_2 ∈ X with x_1 ≠ x_2, and ∀θ ∈ (0, 1), it is true that

f(θx_1 + (1 − θ)x_2) > min{f(x_1), f(x_2)}

As before, we can find relationships between concavity, convexity and second-order derivatives. Again, we assume that X is convex.

Theorem 9.4 Consider f : X → R, f ∈ C². If ∀x ∈ X, Df(x) ≠ 0 and, ∀∆ ∈ R^K\{0} such that ∆·Df(x) = 0, it is true that ∆⊤D²f(x)∆ < 0, then f is strongly quasiconcave.

Proof. The proof is complicated and thus omitted.

Notice that the previous result is pretty intuitive from the point of view of a second-order Taylor approximation.

Theorem 9.5 Consider f : X → R, a twice differentiable function. If ∀x ∈ X, D²f(x) is negative definite, then f is strictly concave.

Proof. It is left as an exercise.

Notice that in theorems 9.4 and 9.5 the condition is sufficient but not necessary (in contrast with theorem 9.3).

Of course, definitions and corollaries for convex, quasiconvex and strictly convex functions follow straightforwardly.


9.5 Composition of functions and concavity

A key point to observe is that quasiconcavity is an ordinal property, whereas concavity has a cardinal character:

Theorem 9.6 If f : X → R is quasiconcave and g : R → R is increasing (i.e. y′ > y =⇒ g(y′) > g(y)), then g ∘ f is quasiconcave.

Theorem 9.7 If f : X → R is concave and g : R → R is concave and increasing, then g ∘ f is concave.

Proof. Left as an exercise.

Exercise 9.4 Propose and prove a result analogous to the previous one to imply that g ∘ f is strictly concave.


9.6 Appendix

Our goal is to generalize the necessity part of theorem 9.3 to K ≥ 2: if f ∈ C² is concave, then ∀x ∈ D, D²f(x) is negative semidefinite. Given x ∈ D and ∆ ∈ R^K\{0}, define

D_{x,∆} = {δ ∈ R | x + δ∆ ∈ D} ⊆ R
ϕ_{x,∆} : D_{x,∆} → R;  ϕ_{x,∆}(δ) = f(x + δ∆)

We use the following lemmas.

Lemma 9.1 Let x ∈ D and ∆ ∈ R^K\{0}. If D is convex, then D_{x,∆} is convex. If f is concave, then ϕ_{x,∆} is concave. If D is open, then D_{x,∆} is open. If f is differentiable, then ϕ_{x,∆} is differentiable and ∀δ ∈ D_{x,∆}, ϕ′_{x,∆}(δ) = ∆·Df(x + δ∆). If f ∈ C², then ϕ_{x,∆} ∈ C² and ∀δ ∈ D_{x,∆}, ϕ″_{x,∆}(δ) = ∆⊤D²f(x + δ∆)∆.

Proof. We first show that D_{x,∆} is convex: let δ, δ′ ∈ D_{x,∆} and λ ∈ [0, 1]. By definition, x + δ∆ ∈ D and x + δ′∆ ∈ D. Since D is convex, λ(x + δ∆) + (1 − λ)(x + δ′∆) = x + (λδ + (1 − λ)δ′)∆ ∈ D, so (λδ + (1 − λ)δ′) ∈ D_{x,∆}.

We now show that ϕ_{x,∆} is concave. Let δ, δ′ ∈ D_{x,∆} and λ ∈ [0, 1]. By definition, x + δ∆ ∈ D and x + δ′∆ ∈ D. By concavity,

ϕ_{x,∆}(λδ + (1 − λ)δ′) = f(x + (λδ + (1 − λ)δ′)∆) = f(λ(x + δ∆) + (1 − λ)(x + δ′∆)) ≥ λf(x + δ∆) + (1 − λ)f(x + δ′∆) = λϕ_{x,∆}(δ) + (1 − λ)ϕ_{x,∆}(δ′)

To see that D_{x,∆} is open, let δ ∈ D_{x,∆}. By definition, x + δ∆ ∈ D and, since D is open, ∃ε′ > 0 such that B_{ε′}(x + δ∆) ⊆ D. Define ε = ε′/‖∆‖ > 0 and suppose that δ′ ∈ B_ε(δ). Then, ‖(x + δ∆) − (x + δ′∆)‖ = |δ − δ′|·‖∆‖ < ε‖∆‖ = ε′, so x + δ′∆ ∈ B_{ε′}(x + δ∆) ⊆ D and, hence, δ′ ∈ D_{x,∆}, which implies that B_ε(δ) ⊆ D_{x,∆}.

That if f is differentiable, then ϕ_{x,∆} is differentiable with ϕ′_{x,∆}(δ) = ∆·Df(x + δ∆) for all δ ∈ D_{x,∆}, and that if f ∈ C², then ϕ_{x,∆} ∈ C² with ϕ″_{x,∆}(δ) = ∆⊤D²f(x + δ∆)∆, is left as an exercise.

Lemma 9.2 Suppose that there exists x ∈ D such that D²f(x) is not negative semidefinite. Then, there exists ∆ ∈ R^K\{0} such that ϕ_{x,∆} is not concave.


Proof. Since D²f(x) is not negative semidefinite, there is ∆ ∈ R^K\{0} such that ∆⊤D²f(x)∆ > 0, and by the previous lemma ϕ_{x,∆} ∈ C² and ϕ″_{x,∆}(0) > 0. From the case K = 1, it follows that ϕ_{x,∆} is not concave.


Chapter 10

Unconstrained maximization

Throughout this chapter, we maintain the assumption that D ⊆ R^K, K ∈ N and D ≠ ∅.

10.1 Maximizers

Theorem 10.1 Let Y ⊆ R and let b = sup Y ∈ R. Then b ∉ Y if, and only if, ∀a ∈ Y, ∃a′ ∈ Y such that a′ > a.

Proof. (If:) Left as an exercise.

(Only if:) If ∃a ∈ Y such that ∀a′ ∈ Y, a′ ≤ a, then, by definition, b ≤ a, whereas a ≤ b, which implies that b = a ∈ Y, a contradiction.

It follows that we need a stronger concept of extremum, in particular one that implies that the extremum lies within the set.

Definition 10.1 A point b ∈ R is said to be the maximum of Y ⊆ R, denoted b = max Y, if b ∈ Y and ∀a ∈ Y, a ≤ b.

Theorem 10.2 If max Y exists, then it is unique.

Proof. Left as an exercise.

Theorem 10.3 If max Y exists, then sup Y exists and sup Y = max Y. If sup Y exists and sup Y ∈ Y, then max Y exists and max Y = sup Y.

Proof. Left as an exercise.


Exercise 10.1 Given Y, Y′ ⊆ R, prove the following:

1. If sup Y and sup Y′ exist, and ∀(a, a′) ∈ Y × Y′, a ≤ a′, then sup Y ≤ inf Y′.

2. If sup Y and sup Y′ exist, λ, λ′ ∈ R++ and

Ỹ = {ã | ∃(a, a′) ∈ Y × Y′ : λa + λ′a′ = ã}

then sup Ỹ = λ sup Y + λ′ sup Y′.

3. If sup Y and sup Y′ exist, and ∀a ∈ Y, ∃a′ ∈ Y′ such that a ≤ a′, then sup Y ≤ sup Y′.

Show also that a strict version of the third statement is not true.

Now, it typically is of more interest in economics to find extrema of functions, rather than extrema of sets. To a large extent, the distinction is only apparent: what we will be looking for are the extrema of the image of the domain under the function.

Definition 10.2 A point x̄ ∈ D is said to be a global maximizer of f : D → R if ∀x ∈ D it is true that

f(x) ≤ f(x̄)

Definition 10.3 A point x̄ ∈ D is said to be a local maximizer of f : D → R if ∃ε > 0 : ∀x ∈ B_ε(x̄) ∩ D it is true that

f(x) ≤ f(x̄)

When x̄ ∈ D is a local (global) maximizer of f : D → R, f(x̄) is said to be a local (the global) maximum of f. Notice that, in the latter case, f(x̄) = max f[D], although more standard notation for max f[D] is max_D f or max_{x∈D} f(x).¹ Notice that there is a conceptual difference between a maximum and a maximizer! Also, notice that a function can have only one global maximum even if it has multiple global maximizers, but the same is not true for the local concept. The set of maximizers of a function is usually denoted by Argmax_D f and, if the latter set is a singleton, then its unique element is denoted by argmax_D f.

By analogy:

¹A point x̄ ∈ D is said to be a local minimizer of f : D → R if ∃ε > 0 : ∀x ∈ B_ε(x̄) ∩ D it is true that f(x) ≥ f(x̄). A point x̄ ∈ D is said to be a global minimizer of f : D → R if ∀x ∈ D it is true that f(x) ≥ f(x̄). From now on, we only deal with maxima, although the minimization problem is obviously covered by analogy.


Definition 10.4 A number b ∈ R is said to be the supremum of f : D → R, denoted b = sup_D f or b = sup_{x∈D} f(x), if b = sup f[D].

There is no reason why there should exist x ∈ D such that f(x) = sup_D f, even if the supremum is well defined.

10.2 Existence

We now present a weak version of a result that is very useful in economics.

Theorem 10.4 (Weierstrass) Let C ⊆ X be nonempty and compact. If the function f : X → R is continuous, then ∃x̄, x̲ ∈ C : ∀x ∈ C, f(x̲) ≤ f(x) ≤ f(x̄).

Proof. Let (y_n)_{n=1}^∞ be a sequence in f[C] such that y_n → y. Fix (x_n)_{n=1}^∞ in C such that ∀n ∈ N, f(x_n) = y_n. Since C is bounded, there exists a subsequence (x_{n_m})_{m=1}^∞ that converges to some x, with x ∈ C because C is closed. By continuity, y = lim_{m→∞} y_{n_m} = lim_{m→∞} f(x_{n_m}) = f(x), so y ∈ f[C].

Now, suppose that ∀∆ ∈ R, ∃y ∈ f[C] : |y| ≥ ∆. Then, ∀n ∈ N, ∃x_n ∈ C : |f(x_n)| ≥ n. Since C is compact, as before, there exists a subsequence (x_{n_m})_{m=1}^∞ that converges to some x ∈ C. By continuity, |f(x)| = lim_{m→∞} |f(x_{n_m})| = ∞, which is impossible.

It follows that f[C] is compact.

Let ȳ = sup f[C]. By theorem 3.8, ∀n ∈ N, ∃y_n ∈ f[C] : ȳ − 1/n < y_n ≤ ȳ. Clearly, y_n → ȳ, so, since f[C] is closed, ȳ ∈ f[C], and it follows that ∃x̄ ∈ C : f(x̄) = ȳ. By definition, ∀x ∈ C, f(x) ≤ ȳ = f(x̄).

Existence of x̲ is left as an exercise.

The importance of this result is that, when the domain of a continuous function is closed and bounded, the function attains its maximum and minimum values on that domain.

10.3 Characterizing maximizers

Even though maximization is not a differential problem, when one has differentiability there are results that make it easy to find maximizers. For this section, we take D ⊆ R^K, K ∈ N, D ≠ ∅, D open.


10.3.1 Problems in R

For simplicity, we first consider the case K = 1.

Lemma 10.1 Suppose that f : D → R is differentiable. Let x̄ ∈ D. If f′(x̄) > 0, then ∃δ > 0 : ∀x ∈ B_δ(x̄) ∩ D we have

x > x̄ =⇒ f(x) > f(x̄)
x < x̄ =⇒ f(x) < f(x̄)

Proof. By assumption, we have f′(x̄) ∈ R++. Then, by definition, ∃δ > 0 : ∀x ∈ B⁰_δ(x̄) ∩ D,

| (f(x) − f(x̄))/(x − x̄) − f′(x̄) | < f′(x̄)

and by exercise 2.1, since f′(x̄) > 0,

(f(x) − f(x̄))/(x − x̄) > 0

Then, it is clear that ∀x ∈ B_δ(x̄) ∩ D, (x > x̄) =⇒ (f(x) > f(x̄)) and (x < x̄) =⇒ (f(x) < f(x̄)).

Corollary 10.1 Suppose that f : D → R is differentiable. Let x̄ ∈ D. If f′(x̄) < 0, then ∃δ > 0 : ∀x ∈ B_δ(x̄) ∩ D we have

x > x̄ =⇒ f(x) < f(x̄)
x < x̄ =⇒ f(x) > f(x̄)

Proof. It is left as an exercise.

Theorem 10.5 Suppose that f : D → R is differentiable. If x̄ ∈ int(D) is a local maximizer of f, then f′(x̄) = 0.

Proof. Suppose not: f′(x̄) ≠ 0. If f′(x̄) > 0, then, by lemma 10.1, ∃δ > 0 : ∀x ∈ B_δ(x̄) ∩ D with x > x̄, we have that f(x) > f(x̄). Since x̄ is a local maximizer of f, ∃ε > 0 : ∀x ∈ B_ε(x̄) ∩ D it is true that f(x) ≤ f(x̄). Since x̄ ∈ int(D), ∃γ > 0 : B_γ(x̄) ⊆ D. Let β = min{ε, δ, γ} > 0. Clearly, (x̄, x̄ + β) ⊂ B⁰_β(x̄) ≠ ∅ and B⁰_β(x̄) ⊆ D. Moreover, B⁰_β(x̄) ⊆ B_δ(x̄) ∩ D and B⁰_β(x̄) ⊆ B_ε(x̄) ∩ D. This implies that ∃x : f(x) > f(x̄) and f(x) ≤ f(x̄), an obvious contradiction.

Now, if f′(x̄) < 0, then, by corollary 10.1, ∃δ > 0 : ∀x ∈ B_δ(x̄) ∩ D with x < x̄, we have that f(x) > f(x̄). Since x̄ is a local maximizer of f, ∃ε > 0 : ∀x ∈ B_ε(x̄) ∩ D it is true that f(x) ≤ f(x̄). Since x̄ ∈ int(D), ∃γ > 0 : B_γ(x̄) ⊆ D. Let β = min{ε, δ, γ} > 0. Clearly, (x̄ − β, x̄) ⊂ B⁰_β(x̄) ≠ ∅ and B⁰_β(x̄) ⊆ D. Moreover, B⁰_β(x̄) ⊆ B_δ(x̄) ∩ D and B⁰_β(x̄) ⊆ B_ε(x̄) ∩ D. This implies that ∃x : f(x) > f(x̄) and f(x) ≤ f(x̄), again a contradiction.

Theorem 10.6 Let f : D → R be such that f ∈ C². If x̄ ∈ int(D) is a local maximizer of f, then f″(x̄) ≤ 0.

Proof. Since x̄ ∈ int(D), ∃ε > 0 : B_ε(x̄) ⊆ D. Fix one such ε.

∀h ∈ B_ε(0), since f is twice differentiable, by Taylor's theorem (7.2) ∃x*_h ∈ [x̄ + h, x̄] ∪ [x̄, x̄ + h] such that

f(x̄ + h) = f(x̄) + f′(x̄)h + (1/2)f″(x*_h)h²

Since x̄ is a local maximizer, ∃δ > 0 : ∀x ∈ B_δ(x̄) ∩ D, f(x) ≤ f(x̄). Let β = min{ε, δ} > 0. By construction, ∀h ∈ B⁰_β(0),

f′(x̄)h + (1/2)f″(x*_h)h² = f(x̄ + h) − f(x̄) ≤ 0

By theorem 10.5, since f is differentiable and x̄ is a local maximizer, f′(x̄) = 0, from where ∀h ∈ B⁰_β(0),

f″(x*_h)h² ≤ 0

which implies that ∀h ∈ B⁰_β(0),

f″(x*_h) ≤ 0

Now, letting h → 0, we get, by theorem 2.12, that

lim_{h→0} f″(x*_h) ≤ 0

and hence, since f″ is continuous and ∀h ∈ B⁰_β(0), x*_h ∈ [x̄ + h, x̄] ∪ [x̄, x̄ + h], it follows that

f″(x̄) ≤ 0


Exercise 10.2 Prove theorems analogous to the previous two for the case of local minimizers.

Notice that the last theorems only give us necessary conditions:² this is not a tool that tells us which points are local maximizers, but one that tells us which points are not. A complete characterization requires both necessary and sufficient conditions. We now find sufficient conditions:

Theorem 10.7 Suppose that f : D → R is twice differentiable. Let x̄ ∈ int(D). If f′(x̄) = 0 and f″(x̄) < 0, then x̄ is a local maximizer.

Proof. Since f : D → R is twice differentiable and f″(x̄) < 0, we have, by corollary 10.1 applied to f′, that ∃δ > 0 : ∀x ∈ B_δ(x̄) ∩ D we have (x > x̄) =⇒ (f′(x) < f′(x̄) = 0) and (x < x̄) =⇒ (f′(x) > f′(x̄) = 0). Since x̄ ∈ int(D), ∃ε > 0 : B_ε(x̄) ⊆ D. Let β = min{δ, ε} > 0. Now, by the mean value theorem (7.1), we have that ∀x ∈ B_β(x̄),

f(x) = f(x̄) + f′(x*)(x − x̄)

for some x* in the interval between x̄ and x (why?). Thus, if x > x̄, we have x̄ ≤ x* ≤ x and, therefore, f′(x*) ≤ 0, so that f(x) ≤ f(x̄). On the other hand, if x < x̄, we have x ≤ x* ≤ x̄ and, therefore, f′(x*) ≥ 0, so that f(x) ≤ f(x̄).

Exercise 10.3 Prove an analogous theorem for the case of a local minimizer.

Notice that the sufficient conditions are stronger than the set of necessary conditions: there is a little gap that the differential method does not cover.

10.3.2 Higher-dimensional problems

We now allow K ≥ 2 and use the results of the one-dimensional case, usingthe definition on the appendix of chapter 9.

Lemma 10.2 If x∗ ∈ D is a local maximizer of f : D −→ R, then ∀∆ ∈RK\ 0, 0 is a local maximizer of ϕx∗,∆.

2And there are further necessary conditions.

Page 92: Lecture Notes on Mathematics for Economists: A refresher

10.3 CHARACTERIZING MAXIMIZERS 85

Proof. Let ∆ ∈ RK\ 0. By construction ∃ε0 > 0 such that ∀x ∈Bε0 (x

∗) ∩ D, f (x) ≤ f (x∗). Define ε = ε0/ k∆k and suppose that δ ∈Bε (0) ∩Dx∗,∆. By construction, kx∗ + (x∗ + δ∆)k = |δ| k∆k < ε k∆k = ε0,while x∗ + δ∆ ∈ D, which implies that x∗ + δ∆ ∈ Bε0 (x

∗) ∩D and, hence,ϕx∗,∆ (δ) = f (x∗ + δ∆) ≤ f (x∗) = ϕx∗,∆ (0).The previous lemma implies:

Theorem 10.8 If f : D −→ R is differentiable and x∗ ∈ D is a localmaximizer of f , then Df (x∗) = 0.

Proof. By lemma 10.2, for every k ∈ 1, ..., K, 0 is a local maximizer ofϕx∗,eK , where ek is the k

th canonical vector inRK. Since ϕx∗,ek is differentiableby lemma 9.1, it follows from theorem 10.5 than ϕ0x∗,ek (0) = 0, whereas, againby lemma 9.1, ϕ0x∗,ek (0) = ek •Df (x∗) = ∂f

∂xk(x∗).

Theorem 10.9 If f : D −→ R ∈ C2 and x∗ ∈ D is a local maximizer of f ,then D2f (x∗) is negative semidefinite.

Proof. Let ∆ ∈ RK\ 0. By lemma 10.2 and theorem 10.6, ϕ00x∗,∆ (0) ≤ 0,whereas by lemma 9.1 ϕ00x∗,∆ (0) = ∆>D2f (x∗)∆.As before, these conditions do not tell us which points are maximizers,

but only which ones are not. Before sufficiency, we need to introduce thefollowing lemma:

Lemma 10.3 If f : D −→ R ∈ C2 and D2f (x∗) is negative definite, thenthere exists ε > 0 such that for every x ∈ Bε (x

∗), D2f (x) is negative definite.

Proof. Suppose not. Then, ∀n ∈ N, ∃xn ∈ B1/n (x∗) and ∃∆n ∈ RK\ 0

such that ∆>nD

2f (xn)∆ ≥ 0. Since ∀n ∈ N, ∆n 6= 0, we can assume withoutlosing generality that ∀n ∈ N, k∆nk = 1. Then, it follows that for some sub-sequence

¡xn(m),∆n(m)

¢∞m=1

we have that ∀m ∈ N, ∆>n(m)D

2¡xn(m)

¢∆n(m) ≥

0, and¡xn(m),∆n(m)

¢ −→ (x∗,∆) for some ∆ such that k∆k = 1. Since f ∈C2, ∆>D2f (x∗)∆ ≥ 0, contradicting the negative definiteness of D2f (x∗).

Theorem 10.10 Suppose that f : D → R ∈ C2. Let x ∈ D. If Df (x) = 0and D2f (x) is negative definite, then x is a local maximizer.

Proof. It is left as an exercise. (Hint: use the previous lemma.)

Page 93: Lecture Notes on Mathematics for Economists: A refresher

86 CHAPTER 10 UNCONSTRAINED MAXIMIZATION

10.4 Maxima and concavity

For this section, we take D ⊆ RK , K ∈ N, D 6= ∅ and drop the opennessassumption.The results that we obtained in the previous sections hold only locally.

However:

Theorem 10.11 Suppose that D is a convex set and f : D→ R is a concavefunction. Then, if x ∈ D is a local maximizer of f , it is also a globalmaximizer.

Proof. We argue by contradiction: suppose that x ∈ D is a local maximizerof f , but it is not a global maximizer. Then, ∃ε > 0 : ∀x ∈ Bε (x) ∩ D,f (x) 6 f (x), and ∃x∗ ∈ D : f (x∗) > f (x). Clearly, then, x∗ /∈ Bε (x),which implies that kx∗ − xk > ε. Now, since D is convex and f is concave,we have that ∀θ ∈ [0, 1],

f (θx∗ + (1− θ) x) > θf (x∗) + (1− θ) f (x)

but, since f (x∗) > f (x), we have that ∀θ ∈ (0, 1],θf (x∗) + (1− θ) f (x) > f (x)

so that ∀θ ∈ (0, 1],f (θx∗ + (1− θ)x) > f (x)

Now, consider θ∗ ∈³0, ε

kx∗−xk´6= ∅. Clearly, θ∗ ∈ (0, 1), so that

f (θ∗x∗ + (1− θ∗)x) > f (x)

However, by construction,

k(θ∗x∗ + (1− θ∗)x)− xk = kθ∗x∗ − θ∗xk= θ∗ kx∗ − xk<

µε

kx∗ − xk¶kx∗ − xk

= ε

which implies that (θ∗x∗ + (1− θ∗)x) ∈ Bε (x), and, moreover, by convexityof D, we have that (θ∗x∗ + (1− θ∗)x) ∈ Bε (x) ∩ D. This contradicts thefact that ∀x ∈ Bε (x) ∩D, f (x) 6 f (x).

Page 94: Lecture Notes on Mathematics for Economists: A refresher

10.4 MAXIMA AND CONCAVITY 87

Exercise 10.4 Prove an analogous theorem, for the case of a local mini-mizer.

Theorem 10.12 Suppose that D is convex, f : D → R ∈ C2 and ∀x ∈ D,Df 00 (x) is negative definite. Then, there exists at most one point x ∈ D :Df (x) = 0. If such point exists, it is a global maximizer.

Proof. We first prove the last part of the theorem. Suppose that ∃x ∈D : Df (x) = 0. By assumption, Df 00 (x) is negative definite, and therefore,by theorem 10.10, x is a local maximizer. Since ∀x ∈ D, Df (x) is negativedefinite, by theorem 9.5, we have that f is concave and, therefore, by theorem10.11, x is a global maximizer.We must now show that there cannot exist more than one such point.

We argue by contradiction: suppose that ∃x1, x2 ∈ D, x1 6= x2, such thatf 0 (x1) = f 0 (x2) = 0. By our last paragraph, both x1 and x2 are globalmaximizers, so that f (x1) = f (x2). Now, since ∀x ∈ D, D2f (x) is negativedefinite, by theorem 9.5, we have that f is strictly concave, and since 1

2∈

(0, 1), we have that

f

µ1

2x1 +

1

2x2

¶>1

2f (x1) +

1

2f (x2) = f 0 (x1) = f 0 (x2)

contradicting the fact that both x1 and x2 are global maximizers, since D isconvex.

Exercise 10.5 Prove an analogous theorem, for the case when ∀x ∈ D,f 00 (x) > 0.

Exercise 10.6 Solve exercise 3.9 in pages 57 and 58 of Simon and Blume.

Page 95: Lecture Notes on Mathematics for Economists: A refresher
Page 96: Lecture Notes on Mathematics for Economists: A refresher

Chapter 11

Constrained Maximization

Suppose that f : R −→ R is differentiable, and suppose that for a, b ∈ R,a < b, we want to find x∗ ∈ [a, b] such that ∀x ∈ [a, b], f (x) 6 f (x∗). Thatis, we want to solve the problem:

Maximize f (x) subject to x > a and x 6 b

If x∗ ∈ (a, b) solves the problem, then x∗ is a local maximizer of f (why?)and it follows from theorem 10.5 that f 0 (x∗) = 0. If, alternatively, x∗ = bsolves the problem, then by corollary 10.1, it must be that f 0 (x∗) > 0.Finally, if x∗ = a solves the problem, it follows from lemma 10.1 that f 0 (x∗) 60.It is then straightforward that if x∗ solves the problem, then there exist

λ∗a, λ∗b ∈ R such that:

f (x∗)− λ∗b + λ∗a = 0

and that:

x∗ ∈ (a, b) =⇒ (λ∗a = 0 ∧ λ∗b = 0)x∗ = b =⇒ (λ∗a = 0 ∧ λ∗b > 0)x∗ = a =⇒ (λ∗a > 0 ∧ λ∗b = 0)

This second set conditions can easily be rewritten as

λ∗a (x∗ − a) = 0 ∧ λ∗b (b− x∗) = 0

Summarizing:

89

Page 97: Lecture Notes on Mathematics for Economists: A refresher

90 CHAPTER 11 CONSTRAINED MAXIMIZATION

If x∗ solves the problem, there exist λ∗a, λ∗b ∈ R such that1:

f (x∗)− λ∗b + λ∗a = 0

x∗ > a ∧ λ∗a > 0 ∧ λ∗a (x∗ − a) = 0

x∗ 6 b ∧ λ∗b > 0 ∧ λ∗b (b− x∗) = 0

With functions on higher-dimensional spaces, it is customary to define afunction

L : R3 −→ R;L (x, λa, λb) = f (x) + λb (b− x) + λa (a− x)

which is called the Lagrangian, and with which the first condition can bere-written as

∂L

∂x(x∗, λ∗a, λ

∗b) = 0

where ∂L∂xdenotes the “partial derivative of L with respect to x,” which is the

derivative that we know for one-dimensional functions, keeping fixed valuesof λa and λb.In this section we show how these Lagrangian methods work, and empha-

size when they fail.

11.1 Equality constraints

For this subsection, we maintain the assumption that D ⊆ RK, K ∈ N, Dopen, f : D −→ R, g : D −→ RJ with J ≤ K.Suppose that we want to solve the problem

maxx∈D

f (x) s.t. g (x) = 0

1When functions on higher-dimensional spaces have been introduced, it is customaryto define a function

L : R3 −→ R;L (x, λa, λb) = f (x) + λb (b− x) + λa (a− x)

which is called the Lagrangean, and with which the first condition can be re-written as

∂L

∂x(x∗, λ∗a, λ

∗b) = 0

where ∂L∂x denotes the “partial derivative of L with respect to x,” which is the derivative

that we know for one-dimensional functions, keeping fixed values of λa and λb.

Page 98: Lecture Notes on Mathematics for Economists: A refresher

11.1 EQUALITY CONSTRAINTS 91

which means, in our previous notation, that we want to findmaxx∈D|g(x)=0 f .The method that is usually applied in economics consists of:

• Defining the Lagrangian function L : D × RJ −→ R, by L (x, λ) =f (x) + λ · g (x);

• Finding (x∗, λ∗) ∈ D ×RJ such that DL (x∗, λ∗) = 0.

That is, a recipe is applied as though there is a "result" that states:

Let f and g be differentiable. x∗ ∈ D solves the problem

maxx∈D

f (x) s.t. g (x) = 0

if, and only if, there exists λ∗ ∈ RJ such thatDf (x∗)+λ∗>Dg (x∗) =0.

Unfortunately, though, such a statement is not true, for reasons thatwe now study.For simplicity, suppose that D = R2 and J = 1, and denote the typical

element of R2 by (x, y). So, given f : R2 −→ R and g : R2 −→ R,we want tofind

max(x,y)∈R2

f (x, y) s.t. g (x, y) = 0

Let us suppose that we do not know the Lagrangian method, but are quitefamiliar with unconstrained optimization. A "crude" method suggests:

• Suppose that we can solve from the equation g (x, y) = 0, to express yas a function of x: we find a function y : R −→ R such that g (x, y) =0⇐⇒ y = y (x).

• With the function y at hand, we study the unconstrained problem

maxx∈R

F (x)

where F : R −→ R is defined by F (x) = g (x, y (x)).

Page 99: Lecture Notes on Mathematics for Economists: A refresher

92 CHAPTER 11 CONSTRAINED MAXIMIZATION

• Since we want to use calculus, if f and g are differentiable, we need tofigure out y0 (·). Now, if ∀x ∈ R, g (x, y (x)) = 0, then, differentiatingboth sides, we get that

∂xg (x, y (x)) + ∂yg (x, y (x)) y0 (x) = 0

from where ∀x ∈ R2,

y0 (x) = −∂xg (x, y (x))∂yg (x, y (x))

• Now, with F differentiable, we know that x∗ solvesmaxx∈R F (x) locallyonly if F 0 (x∗) = 0.

In our case, the last condition is simply that

∂xf (x∗, y (x∗)) + ∂yf (x

∗, y (x∗)) y0 (x∗) = 0

or, equivalently,

∂xf (x∗, y (x∗))− ∂yf (x

∗, y (x∗))∂xg (x

∗, y (x∗))∂yg (x∗, y (x∗))

= 0

so, if we define y∗ = y (x∗) and

λ∗ = −∂yf (x∗, y (x∗))

∂yg (x∗, y (x∗))∈ R

we get that∂xf (x

∗, y (x∗)) + λ∗∂xg (x∗, y (x∗)) = 0

whereas,∂yf (x

∗, y (x∗)) + λ∗∂yg (x∗, y (x∗)) = 0

So, our method has apparently shown that

Let f and g be differentiable. x∗ ∈ D locally solves the prob-lem2

maxx∈D

f (x) s.t. g (x) = 0

only if there exists λ∗ ∈ RJ such that Df (x∗) +λ∗>Dg (x∗) = 0.

2That is, ∃ > 0 such that ∀x ∈ B (x∗) ∩ x ∈ D| g (x) = 0, f (x) ≤ f (x∗).

Page 100: Lecture Notes on Mathematics for Economists: A refresher

11.1 EQUALITY CONSTRAINTS 93

The latter means that: (i) as in the unrestricted case, the differentialapproach, at least in principle, only finds local extrema; and (ii) the La-grangian condition is only necessary and not sufficient by itself. So, we needto be careful and study further conditions for sufficiency. Also, we need todetermine under what conditions can we find the function y and, moreover,be sure that it is differentiable.For sufficiency, we can again appeal to our crude method and use the

sufficiency results we inherit from unconstrained optimization. Since we nowneed F to be differentiable twice, so as to make it possible that F 00 (x∗) < 0,we must assume that so are f and g, and moreover, we need to know y00 (x).Since we already know y0 (x), by differentiation,

y00 (x) = − ∂

∂x

µ∂xg (x, y (x))

∂yg (x, y (x))

¶= −

µ∂2xxg (x, y (x)) + ∂2xy (x, y (x)) y

0 (x)∂yg (x, y (x))

−∂2xyg (x, y (x)) + ∂2yyg (x, y (x)) y

0 (x)

∂yg (x, y (x))2 ∂xg (x, y (x))

!

= − 1

∂yg (x, y (x))

³1 y0 (x)

´D2g (x, y (x))

1

y0 (x)

Now, the condition that F 00 (x∗) < 0 is equivalent to

0 >¡∂2xxf (x

∗, y (x∗)) + ∂2xyf (x∗, y (x∗)) y0 (x∗) + ∂2yx (x

∗, y (x∗)) y0 (x∗)

+∂2yyf (x∗, y (x∗)) y0 (x∗)2 + ∂yf (x

∗, y (x∗)) y00 (x∗)¢

substitution yields

0 >

³ 1 y0 (x∗)´D2f (x∗, y (x∗))

1

y0 (x∗)

−∂yf (x

∗, y (x∗))∂yg (x∗, y (x∗))

³1 y0 (x∗)

´D2g (x∗, y (x∗))

1

y0 (x∗)

Page 101: Lecture Notes on Mathematics for Economists: A refresher

94 CHAPTER 11 CONSTRAINED MAXIMIZATION

which, by definition of y∗ and λ∗, is

³1 y0 (x∗)

´D2(x,y)L (x

∗, y∗, λ∗)

1

y0 (x∗)

< 0

Obviously, the condition is satisfied if D2(x,y)L (x

∗, y∗, λ∗) is negative definite,but this would be overkill: notice that³

1 y0 (x∗)´·Dg (x∗, y∗) = 0

so it suffices that we guarantee that for every ∆ ∈ R2\ 0 such that ∆ ·Dg (x∗, y∗) = 0 we have that ∆>D2

(x,y)L (x∗, y∗, λ∗)∆ < 0.

So, in summary, we seem to have argued to following result:

Suppose that f, g ∈ C1. Then:1. If x∗ ∈ D locally solves the problem

maxx∈D

f (x) s.t. g (x) = 0

only if there exists λ∗ ∈ RJ such that DL (x∗, λ∗) = 0.2. If f, g ∈ C2 and there exists λ∗ ∈ RJ such thatDL (x∗, λ∗) =

0 and that for every ∆ ∈ R2\ 0 such that ∆ · Dg (x∗, y∗) = 0we have that ∆>D2

(x,y)L (x∗, y∗, λ∗)∆ < 0, then, x∗ ∈ D locally

solves the problem

maxx∈D

f (x) s.t. g (x) = 0.

But we still need to argue that we can indeed solve y as a function of x.Notice that it has been crucial throughout our analysis that ∂yg (x∗, y∗) 6= 0.Of course, even if the latter hadn’t been true, but ∂xg (x

∗, y∗) 6= 0, ourmethod would still have worked, mutatis mutandis. So, what we actuallyrequire is that Dg (x∗, y∗) have rank 1, its maximum possible. The obviousquestion is: is this a general result, or does it only work in our simplifiedcase?To see that it is indeed a general result, we introduce without proof the

following important result:

Theorem 11.1 (The implicit function theorem) Let D ⊆ RK+J andlet g : D −→ RJ ∈ C1. If (x∗, y∗) ∈ D is such that rank (Dyg (x

∗, y∗)) = J,then there exist ε, δ > 0 and γ : Bε (x

∗) −→ Bδ (y∗) ∈ C1 such that:

Page 102: Lecture Notes on Mathematics for Economists: A refresher

11.2 INEQUALITY CONSTRAINTS 95

1. ∀x ∈ Bε (x∗), (x, γ (x)) ∈ D;

2. ∀x ∈ Bε (x∗), (g (x, y) = g (x∗, y∗) ∧ y ∈ Bδ (y

∗))⇐⇒ y = γ (x) ;

3. ∀x ∈ Bε (x∗) , Dγ (x) = −Dyg (x, γ (x))

−1Dxg (x, γ (x)).

This important theorem allows us to express y as a function of x andgives us the derivative of this function: exactly what we wanted! Of course,we need to satisfy the hypotheses of the theorem if we are to invoke it. Inparticular, the condition on the rank is known as “constraint qualification”and is crucial for the Lagrangian method to work (albeit it is oftentimesforgotten!). So, finally, the following result is true:

Theorem 11.2 (Lagrange) Let f : D −→ R ∈ C1, g : D −→ RJ ∈ C1,with J ≤ K. Let x∗ ∈ D be such that rank (Dg (x∗)) = J. Then,

1. If x∗ locally solves the problem

maxD

f (x) s.t. g (x) = 0

then, there exists λ∗ ∈ RJ such that DL (x∗, λ∗) = 0.

2. If there exists λ∗ ∈ RJ such that DL (x∗, λ∗) = 0 and for every ∆ ∈RJ\ 0 such that ∆•Dg (x∗) = 0, it is true that ∆>D2

x,xL (x∗, λ∗)∆ <

0, then x∗ locally solves the problem

maxD

f (x) s.t. g (x) = 0

11.2 Inequality constraints

As before, let f : D −→ R ∈ C1 and f : D −→ RJ ∈ C1. Now suppose thatwe have the problem

maxD

f (x) s.t. g (x) ≥ 0Again, the usual method says that one should try to find (x∗, λ∗) ∈ D ×RJ

+

such that:

DxL (x∗, λ∗) = 0

g (x∗) ≥ 0

λ∗ · g (x∗) = 0

which is as though there is a theorem that states:

Page 103: Lecture Notes on Mathematics for Economists: A refresher

96 CHAPTER 11 CONSTRAINED MAXIMIZATION

If x∗ ∈ D locally solves

maxD

f (x) s.t. g (x) ≥ 0

then there exists λ∗ ∈ RJ+ such that:

DxL (x∗, λ∗) = 0

g (x∗) ≥ 0

λ∗ · g (x∗) = 0

Now, even though we are recognizing the local character and the necessityof the result, we still have to worry about constraint qualification. To seethat this is the case, consider the following example:

Example 11.1 Consider the problem

maxR2− ¡(x− 3)2 + y2

¢s.t. 0 ≤ y ≤ − (x− 1)3

The Lagrangian of this problem can be written as

L (x, y, λ1, λ2) = − (x− 3)2 − y2 + λ1¡− (x− 1)3 − y

¢+ λ2y

Notice that there is no solution (x∗, λ∗1, λ∗2) to the system

−2 (x∗ − 3) + 3λ∗1 (x∗ − 1)2 = 0

−2y∗ − λ∗1 + λ∗2 = 0

λ∗1 ≥ 0

λ∗2 ≥ 0

− (x∗ − 1)3 − y∗ ≥ 0

y∗ ≥ 0

λ∗1¡− (x∗ − 1)3 − y∗

¢= 0

λ∗2y∗ = 0

although (1, 0) solves the problem! If the first order conditions were necessaryeven without the constraint qualification (i.e. if the statement were true) thesystem would necessarily have to have a solution.

The point of the example is just that the theorem requires the constraintqualification condition:

Page 104: Lecture Notes on Mathematics for Economists: A refresher

11.2 INEQUALITY CONSTRAINTS 97

Theorem 11.3 Let f : D −→ R ∈ C1 and f : D −→ RJ ∈ C1. Letx∗ ∈ D be such that g (x∗) ≥ 0. Define I = j ∈ 1, ..., J| gj (x∗) = 0, letI = #I, and suppose that rank (Deg (x∗)) = I for eg : D −→ RI defined byeg (x) = (gj (x))j∈I. Then,1. If x∗ is a local solution to the problem

maxD

f (x) s.t. g (x) ≥ 0

then there exists λ∗ ∈ RJ+ such that:

DxL (x∗, λ∗) = 0

g (x∗) ≥ 0

λ∗ · g (x∗) = 0

2. If f, g ∈ C2 and there exists λ∗ ∈ RJ+ such that:

DxL (x∗, λ∗) = 0

g (x∗) ≥ 0

λ∗ · g (x∗) = 0¡∀∆ ∈ RI\ 0 : ∆ ·Deg (x∗) = 0¢ : ∆>D2x,xL (x

∗, λ∗)∆ < 0

then, x∗ is a local solution to the problem

maxD

f (x) s.t. g (x) ≥ 0

As before, it must be noticed that there is a gap between necessity andsufficiency, and that the theorem only gives local solutions. For the formerproblem, there is no solution. For the latter, one can study concavity of theobjective function and convexity of the feasible set. Importantly, notice thatwith inequality constraint the sign of λ does matter: this is because of thegeometry of the theorem: a local maximizer is attained when the feasibledirections, as determined by the gradients of the binding constraints is ex-actly opposite to the desired direction, as determined by the gradient of theobjective function. Obviously, locally only the binding constraints matter,which explains why the constraint qualification looks more complicated herethan with equality constraints. Finally, it is crucial to notice that the processdoes not amount to maximizing L: in general, L does not have a maximum;what one finds is a saddle point of L.

Page 105: Lecture Notes on Mathematics for Economists: A refresher

98 CHAPTER 11 CONSTRAINED MAXIMIZATION

Exercise 11.1 Prove the following results:

Theorem 11.4 1. Suppose that f : RK −→ R, g : RK −→ RJ ∈ C1.Suppose that F = x ∈ Rn| g (x) > 0 is compact. Suppose that forevery x ∈ F , if I (x) = j ∈ 1, ..., J| gj (x) = 0 and I (x) = #I (x) ,we have that

Rank³[Dgj (x)]j∈I(x)

´= I (x)

If there exists x∗ ∈ F such that¡∃λ∗ ∈ Rm+

¢: DxL (x

∗, λ∗) = 0 ∧ λ∗ · g∗ (x∗) = 0(∀x ∈ F\ x∗) ¡∀λ ∈ Rm

+

¢: DxL (x, λ) = 0 =⇒ λ · g (x) > 0

then x∗ uniquely solves the problem

max f (x) s.t. g (x) > 0

2. Suppose that f : RK −→ R, g : RK −→ RJ ∈ C1. Suppose that thereexists no solution to the system

DxL (x, λ) = 0

λ > 0

g (x) > 0

λ · g (x) = 0

Then, x∗ locally solves the problem

max f (x) s.t. g (x) > 0

only ifRango

¡(Dgi (x))i∈K

¢< K

where K = i ∈ 1, ...,m| gi (x∗) = 0 and K = #K.

11.3 Parametric programming

We now study how the solution of a problem depends on the parameter thatdefine the problem.

Page 106: Lecture Notes on Mathematics for Economists: A refresher

11.3 PARAMETRIC PROGRAMMING 99

11.3.1 Continuity

Let Ω ⊆ RM , Ω 6= ∅ and let D : Ω ⇒ RK be a correspondence from Ω intoRK : for every ω ∈ Ω, D assigns a subset D (ω) ⊆ RK .Suppose that D : Ω ⇒ RK is nonempty- and compact-valued: ∀ω ∈ Ω,

D (ω) 6= ∅, compact.

Definition 11.1 D is upper hemicontinuous at ω ∈ Ω if for every pair ofsequences if for every pair of sequence (ωn)

∞n=1 in RM , and (xn)

∞n=1in RK,

such that

• ∀n ∈ N, ωn ∈ Ω;

• limn−→∞ ωn = ω;

• ∀n ∈ N, xn ∈ D (ωn);

there exist a subsequence¡xn(k)

¢∞k=1

of (xn)∞n=1 and x ∈ D (ω) such that

limk−→∞ xn(k) = x.D is upper hemicontinuous if it is upper hemicontinuous at every ω ∈ Ω.

An upper hemicontinuous correspondence has the property that its graphis “closed” at points where the correspondence explodes.

Definition 11.2 D is lower hemicontinuous at ω ∈ Ω if for every sequence(ωn)

∞n=1 in RM , such that

• ∀n ∈ N, ωn ∈ Ω;

• limn−→∞ ωn = ω;

and for every x ∈ D (ω), there exists a sequence (xn)∞n=1 in RK such that

• ∀n ∈ N, xn ∈ D (ωn);

• limn−→∞ xn = x.

D is lower hemicontinuous if it is lower hemicontinuous at every ω ∈ Ω.

A lower hemicontinuous correspondence has the property that its graphis “open” at points where the correspondence explodes.

Page 107: Lecture Notes on Mathematics for Economists: A refresher

100 CHAPTER 11 CONSTRAINED MAXIMIZATION

Definition 11.3 D is continuous at ω ∈ Ω if it is upper and lower hemi-continuous at ω. D is continuous if it is continuous at every ω ∈ Ω.

A continuous correspondence does not explode.

Theorem 11.5 If F : Ω⇒ RK is single-valued (that is, ∀ω ∈ Ω, #D (ω) =1) and upper or lower hemicontinuous, then the function f : Ω −→ RK,defined implicitly by f (ω) = F (ω), is continuous.

Proof. Left as an exercise.In fact, the relationship between the concepts introduced in the previous

theorem is stronger: in the case of single valued correspondence, both typesof hemicontinuity are equivalent (iff) to continuity of the associated function(and equivalent to each other, then).The importance of the concept of continuity of preferences is the following:

Theorem 11.6 (Theorem of the Maximum ) Let f : RK × Ω −→ R becontinuous and let D : Ω⇒ RK be nonempty-, single-valued and continuous.The correspondence X : Ω⇒ RK defined by

X (ω) = Arg maxx∈D(ω)

f (x, ω)

is upper hemicontinuous (and nonempty- and compact-valued) and the (“value”)function v : Ω −→ R defined by

v (ω) = maxx∈D(ω)

f (x, ω)

is continuous.

Proof. The proof is not terribly difficult, but we won’t try it here.

11.3.2 Differentiability

Suppose now that D ⊆ RK , Ω ⊆ RM , with K,M ∈ N, D and Ω open.Suppose that f : D ×Ω −→ R and g : D × Ω −→ RJ .Consider the following (simplified) parametric problem: given ω ∈ Ω,

v (ω) = maxx∈D

f (x, ω) s.t. g (x, ω) = 0

Page 108: Lecture Notes on Mathematics for Economists: A refresher

11.3 PARAMETRIC PROGRAMMING 101

Suppose that the differentiability and second-order conditions are given sothatµx∗ solves max

x∈Df (x, ω) s.t. g (x, ω) = 0

¶⇐⇒ ∃λ∗ ∈ RJ : DL (x∗, λ∗, ω) = 0

Suppose further that we can define functions x : Ω −→ D and λ : Ω −→ RJ

given by the solution of the problem and the associated multiplier, for everyω. Then, it follows directly from the implicit function theorem that if for agiven ω ∈ Ω

rank

0J×J Dxg (x∗, ω)

Dxg (x∗, ω)> D2

x,xL (x∗, λ∗, ω)

= J +K

then there exists > 0 such that on B (ω) the functions x and λ are differ-entiable andDλ (ω)

Dx (ω)

= − 0J×J Dxg (x

∗, ω)

Dxg (x∗, ω)> D2

x,xL (x∗, λ∗, ω)

−1 Dωg (x (ω) , ω)

D2ω,xL (x (ω) , λ (ω) , ω)

It is then immediate that v is differentiable at ω and

Dv (ω) = Dxf (x (ω) , ω)Dx (ω)

A simpler method, however, is given by the following theorem

Theorem 11.7 (The Envelope Theorem) If, under the assumptions ofthis subsection, v is continuously differentiable at ω, then

Dv (ω) = DωL (x (ω) , λ (ω) , ω)

Proof. One just needs to use the chain rule: by assumption,

Dxf (x (ω) , ω) +Dxg (x (ω) , ω)> λ (ω) = 0

whereas g (x (ω) , ω) = 0, so

Dxg (x (ω) , ω)Dx (ω) +Dωg (x (ω) , ω) = 0

Page 109: Lecture Notes on Mathematics for Economists: A refresher

102 CHAPTER 11 CONSTRAINED MAXIMIZATION

Meanwhile,

Dv (ω) = Dx (ω)>Dxf (x (ω) , ω) +Dωf (x (ω) , ω)

= −Dx (ω)>Dxg (x (ω) , ω)

> λ (ω) +Dωf (x (ω) , ω)

and

DωL (x (ω) , λ (ω) , ω) = Dωf (x (ω) , ω) +Dωg (x (ω) , ω)> λ (ω)

= Dωf (x (ω) , ω)−Dx (ω)>Dxg (x (ω) , ω)

> λ (ω)

Exercise 11.2 Let f : RK −→ R, g : RK −→ RJ ∈ C2, with J 6 K ∈ N.Suppose that ∀ω ∈ Rm, the problem

max f (x) s.t. g (x) = ω

has a solution, which is characterized by the first order conditions of theLagrangian L : RK × RJ × RJ −→ R, defined by L (x, λ, ω) = f (x) + λ ·(ω − g (x)). Suppose further that these conditions define differentiable func-tions x : RJ −→ RK and λ : RJ −→ RJ . Prove the following result: letv : RJ −→ R be defined by

v (ω) = max f (x) s.t. g (x) = ω

then, ∀ω ∈ RJ , Dv (ω) = λ (ω).

Page 110: Lecture Notes on Mathematics for Economists: A refresher

Chapter 12

Riemann integration

In the first part of this chapter, we assume that a, b ∈ R, a < b. As before,we assume that X ⊆ R.

12.1 The Riemann integral

There are different, but equivalent, ways to define the Riemann integral. Wenow introduce the simplest (although not necessarily the best) one.

Definition 12.1 A function f : X → R is said to be bounded if ∃∆ ∈ R :∀x ∈ X, |f (x)| 6 ∆.

Definition 12.2 A function s : [a, b] → R is said to be a step function ifthere exist a monotonically increasing finite sequence (x1,...,xn∗) such thatx1 > a, xn∗ = b, and a finite sequence (s1,...,sn∗) satisfying that ∀n ∈1, 2, ..., n∗ it is true that ∀x ∈ (xn−1, xn), s (x) = sn, where we definex0 = a.

Example 12.1 Consider s : [−2, 2]→ R defined by

s (x) =

−1 if − 2 6 x < 0

0 if x = 0

1 if 0 < x 6 2

It is easy the see that s is a step function: use x1, x2 = 0, 2 and s1, s2 =−1, 1, and let x0 = −2; then, we have that ∀x ∈ (x0, x1) = (−2, 0) , s (x) =−1 = s1 and ∀x ∈ (x1, x2) = (0, 2) , s (x) = 1 = s2.

103

Page 111: Lecture Notes on Mathematics for Economists: A refresher

104 CHAPTER 12 RIEMANN INTEGRATION

It is important to notice, both in the definition and in the example, thatthe values of s (a) and s (b) do not matter. Similarly, the value of s at anypoint of discontinuity is irrelevant (e.g. s (0) in the example,) but there canbe only finitely many such points. It should also be clear that any stepfunction on [a, b] is bounded, because it takes at most (2n∗ + 1) (a finitenumber) different values.

Definition 12.3 Given a step function s : [a, b]→ R, we define the integralof s from a to b by

bZa

s (x) dx =n∗Xn=1

sn (xn − xn−1)

where x0 = a, (x1,...,xn∗) is a monotonically increasing finite sequence suchthat x1 > x0, xn∗ = b, and (s1,...,sn∗) is a finite sequence satisfying that∀n ∈ 1, 2, ..., n∗ it is true that ∀x ∈ (xn−1, xn), s (x) = sn.

Notice that the integral of a step function on [a, b] is always a real number.Also, it should be clear that the integral is unique, so that no matter whatparticular sequences one uses to find it, the summation is always the same.

Example 12.2 For s defined as in example 12.1, we have

2Z−2

s (x) dx = −1 (0− (−2)) + 1 (2− 0) = 0

Again, notice that the integral is independent of s (a) and s (b), and ofthe value of s at any point of discontinuity (e.g. 0 in our example).

Definition 12.4 Let f : [a, b]→ R be a bounded function. If there exists aunique I ∈ R such that

bZa

s (x) dx 6 I 6bZ

a

t (x) dx

for every pair of step functions s : [a, b] → R and t : [a, b] → R such that∀x ∈ [a, b]

s (x) 6 f (x) 6 t (x)

Page 112: Lecture Notes on Mathematics for Economists: A refresher

12.2 PROPERTIES OF THE RIEMANN INTEGRAL 105

then, f is said to be integrable (on [a, b]), and I is said to be the integral off from a to b, which we denote by

bZa

f (x) dx = I

It is important to notice that I is required by the definition to be finiteand unique.

Definition 12.5 Let f : [a, b]→ R be integrable. Then, we define

aZb

f (x) dx = −bZ

a

f (x) dx

Moreover, we defineaZ

a

f (x) dx = 0

The following definition is (has to be) given for formal completeness:

Definition 12.6 Let f : [a, b]→ R be integrable and let c, d ∈ [a, b], c 6 d.We define the integral of f from c to d as

dZc

f (x) dx =

dZc

ef (x) dxwhere ef : [c, d]→ R is defined by: ∀x ∈ [c, d] , ef (x) = f (x).

12.2 Properties of the Riemann integral

The following results list (some) important properties of the Riemann inte-gral. We state them without proof.

Page 113: Lecture Notes on Mathematics for Economists: A refresher

106 CHAPTER 12 RIEMANN INTEGRATION

Theorem 12.1 Suppose that the functions f : [a, b]→ R and g : [a, b]→ Rare integrable and α, β ∈ R. Then, (αf + βg) : [a, b]→ R is integrable and

bZa

(αf + βg) (x) dx = α

bZa

f (x) dx+ β

bZa

g (x) dx

Theorem 12.2 Suppose that f : [a, b]→ R is integrable and c ∈ [a, b]. Then,bZ

a

f (x) dx =

cZa

f (x) dx+

bZc

f (x) dx

Theorem 12.3 Suppose that the functions f : [a, b]→ R and g : [a, b]→ Rare integrable and ∀x ∈ [a, b],

f (x) 6 g (x)

Then,bZ

a

f (x) dx 6bZ

a

g (x) dx

Additionally, if f : [a, b]→ R is integrable, α ∈ R and β ∈ R\ 0, thenbZ

a

f (x) dx =

b+αZa+α

f (x− α) dx

andbZ

a

f (x) dx =1

β

βbZβa

f

µx

β

¶dx

Theorem 12.4 If f : [a, b] → R is either monotonic or continuous, then itis integrable.

It must be pointed out that the fact that the domain of f is assumedbounded in the previous theorem is crucial.

Page 114: Lecture Notes on Mathematics for Economists: A refresher

12.3 FUNDAMENTAL THEOREMS OF CALCULUS 107

12.3 Fundamental Theorems of Calculus

Following are the two most important results relating integral and differentialcalculus. (Their proofs use the Mean Value Theorem that we learned inchapter 7.)

Theorem 12.5 (First Fundamental Theorem of Calculus) If f : [a, b]→R is differentiable, and f 0 : [a, b]→ R is integrable, then

bZa

f 0 (x) dx = f (b)− f (a)

Theorem 12.6 (Second Fundamental Theorem of Calculus) Supposethat f : [a, b]→ R is integrable. Define F : [a, b]→ R by, ∀x ∈ [a, b],

F (x) =

xZa

f (t) dt

If f is continuous at x ∈ X, then

F 0 (x) = f (x)

12.4 Antiderivatives (indefinite integrals)

The results in last section show the tight relation that exists between dif-ferential and integral calculus. We now show how we can take advantage ofsuch relation in order to find the integral

Definition 12.7 A function F : X → R is said to be an antiderivative off : X → R if ∀x ∈ X,

F 0 (x) = f (x)

Notice that antiderivatives are not unique.

Notation 12.1 Suppose that F : X → R is an antiderivative of f : X → R.Then, we will also write

F =

Zf (x) dx

Page 115: Lecture Notes on Mathematics for Economists: A refresher

108 CHAPTER 12 RIEMANN INTEGRATION

The following results (which establish the most useful properties of anti-derivatives,) come straightforwardly from theorems 5.2, 5.3 and 6.2:Z

(αf) (x) dx = α

Zf (x) dx for α ∈ RZ

(f + g) (x) dx =

Zf (x) dx+

Zg (x) dxZ

xndx =xn+1

n+K if n 6= −1Z

1

xdx = ln (x) +KZ

exdx = ex +KZef(x)f 0 (x) dx = ef(x) +KZ

(f (x))n f 0 (x) dx =1

n+ 1(f (x))n+1 +K if n 6= −1Z

1

f (x)f 0 (x) dx = ln (f (x)) +K

where K ∈ R (and is arbitrary). Notice that the first two lines assume thatantiderivatives for f and g exist (on the domain in which the functions aredefined).The importance of these rules is that, together with the Fundamentals

Theorems of Calculus, they make the computation of Riemann integrals aprocess in which one just has to reverse the one of differentiation. In particu-lar, if we can find the antiderivative of an integrable function (with boundeddomain), we can use the First Fundamental Theorem of Calculus in the com-putation of the Riemann integral.

Example 12.3 Suppose that f : R→ R is defined by

f (x) = x3 − 3x∀x ∈ R. Now, we can use the first, second and third rules that we justintroduced, to show thatZ

f (x) dx =1

4x4 − 3

2x2 +K

Page 116: Lecture Notes on Mathematics for Economists: A refresher

12.4 ANTIDERIVATIVES (INDEFINITE INTEGRALS) 109

where K ∈ R. Now, let F : [a, b]→ R be defined as ∀x ∈ [a, b],

F (x) =1

4x4 − 3

2x2

Clearly, F ∈ C∞ so that F is differentiable and F 0 is integrable (by theorem12.4). By the First Fundamental Theorem of Calculus, it follows that

bZa

F 0 (x) dx = F (b)− F (a)

which means that

bZa

¡x3 − 3x¢ dx = µ

1

4b4 − 3

2b2¶−µ1

4a4 − 3

2a2¶

=

µ1

4x4 − 3

2x2¶b

a

where the last inequality is just introducing some new notation. Alternativenotation would be: µ

1

4x4 − 3

2x2¶¯b

a

For example,

2Z1

¡x3 − 3x¢ dx = µ

1

4x4 − 3

2x2¶21

= −34

Example 12.4 Suppose that we are interested inZe−

x2

2 xdx

By our first rule, Ze−

x2

2 xdx = −Z

e−x2

2 (−x) dx

Page 117: Lecture Notes on Mathematics for Economists: A refresher

110 CHAPTER 12 RIEMANN INTEGRATION

Now, let f (x) = −x2

2, so that f 0 (x) = −x andZ

e−x2

2 xdx = −Z

ef(x)f 0 (x) dx

= −ef(x) +K

= −e−x2

2 +K

using the sixth rule. As before, by the First Fundamental Theorem, ∀a ∈ RaZ

−a

e−x2

2 xdx =³−e−x2

2 +K´a−a

= 0

12.5 Integration by parts

Suppose that we have two functions u : X → R and v : X → R, both ofwhich are differentiable. We know from chapter 5 that

(u.v)0 (x) = u (x) v0 (x) + v (x)u0 (x)

so thatu (x) v0 (x) = (u.v)0 (x)− v (x)u0 (x)

Therefore, by the rules that we introduced in the last sectionZu (x) v0 (x) dx =

Z ¡(u.v)0 (x)− v (x)u0 (x)

¢dx

=

Z(u.v)0 (x) dx−

Zv (x)u0 (x) dx

= (u.v) (x)−Z

v (x)u0 (x) dx

= u (x) .v (x)−Z

v (x)u0 (x) dx

Example 12.5 Suppose we want to findZexxdx

Page 118: Lecture Notes on Mathematics for Economists: A refresher

12.6 IMPROPER INTEGRALS 111

Then, let v0 (x) = ex and u (x) = x. Clearly, v (x) = ex and u0 (x) = 1.Then, we have Z

exxdx =

Zu (x) v0 (x) dx

= u (x) .v (x)−Z

v (x)u0 (x) dx

= exx−Z

exdx

= exx− ex

Therefore, by the First Fundamental Theorem of Calculus,

1Z0

exxdx = (exx− ex)10

= 1

12.6 Improper integrals

So far, we have restricted our definition of Riemann Integral of the functionf to the case in which for a, b ∈ R, the function f is defined on [a, b] andis bounded. It is convenient, however, to generalize the definition of theintegral.Suppose initially that a ∈ R and b ∈ R ∪ ∞, b > a, and consider the

function f : [a, b) → R. Then, if ∀d ∈ [a, b), the function is integrable whenits domain is restricted to [a, d], then, we define

bZa

f (x) dx = limd→b

dZa

f (x) dx

provided that the limit exists.Similarly, suppose now that a ∈ R∪−∞ and b ∈ R, a < b, and consider

the function f : (a, b] → R. Then, if ∀d ∈ (a, b], the function is integrablewhen its domain is restricted to [d, b], then, we define

bZa

f (x) dx = limd→a

bZd

f (x) dx

Page 119: Lecture Notes on Mathematics for Economists: A refresher

112 CHAPTER 12 RIEMANN INTEGRATION

provided that the limit exists.Finally, if we have a ∈ R ∪ −∞ and b ∈ R ∪ ∞, b > a, and consider

the function f : (a, b) → R. Then, if ∀c, d ∈ (a, b), c < d, the function isintegrable when its domain is restricted to [c, d], then, we define

bZa

f (x) dx =

γZa

f (x) dx+

bZγ

f (x) dx

for (any) γ ∈ (a, b), provided that both integrals and its sum exist (oneshould be particularly worried about the result ∞ + (−∞), which does notexist).

Example 12.6 Recall example 12.5. It follows that∞Z0

exxdx = limd→∞

(exx− ex)d0

= limd→∞

¡ed (d− 1)¢+ 1

= ∞Exercise 12.1 (From Apostol, Calculus:) For a ∈ R, b ∈ R++, compute:

aZ0

¡1 + x+ x2

¢dx

2aZ0

¡1 + x+ x2

¢dx

2aZ−1

¡1 + x+ x2

¢dx

aZ−2

x2¡1 + x2

¢dx

a2Za

¡1 + x2

¢2dx

Page 120: Lecture Notes on Mathematics for Economists: A refresher

12.7 INTEGRATION IN HIGHER-DIMENSIONAL SPACES 113

bZ1

¡1 + x1/2

¢dx

b2Zb

¡x1/4 + x1/2

¢dx

Exercise 12.2 (From Apostol, Calculus:) Show that:Z √1− x2dx = x

√1− x2 +

Zx2√1− x2

dx

Exercise 12.3 Recall example 12.4. Show that

∞Z−∞

e−x2

2 xdx = 0

Exercise 12.4 Recall example 12.3. Show that

∞Z−∞

¡x3 − 3x¢ dx

does not exist.

12.7 Integration in higher-dimensional spaces

Recall that one of the goals of this notes was to introduce concepts in sucha way that generalizing to multiple dimensions was relatively simple. Sur-prisingly, one of the easiest concepts to generalize is the one of Riemannintegral.From now on, we maintain the assumption that a, b, c, d ∈ R, a < b and

c < d.

Definition 12.8 A function s : [a, b]× [c, d]→ R is a step function if thereexist finite monotonically increasing sequences (xn)

n∗n=1 and (ym)

m∗m=1 and a

finite double array³(sn,m)

n∗n=1

´m∗m=1

such that x1 > a, xn∗ = b, y1 > c, ym∗ =

Page 121: Lecture Notes on Mathematics for Economists: A refresher

114 CHAPTER 12 RIEMANN INTEGRATION

d and that ∀n ∈ 1, ..., n∗, ∀m ∈ 1, ...,m∗ and ∀ (x, y) ∈ (xn−1, xn) ×(ym−1, ym),1 s (x, y) = sn,m, where x0 = a and y0 = c.

Notice that this is a straightforward generalization of definition 12.2, withthe sole exception that step functions need no longer be bounded (why?).

Example 12.7 Consider s : [−2, 2]× [0, 1]→ R, defined by:

s (x, y) =

−1 if − 2 6 x < 0

y if x = 0

1 if 0 < x < 1 and 0 < y 6 0.5−1 if 0 < x < 1 and 0.5 < y 6 1−2 if 1 6 x 6 2 and 0 < y < 0.5

2 if 1 6 x 6 2 and 0.5 6 y 6 1

To see that s is step, define xn3n=1 = 0, 1, 2, ym2m=1 = 0.5, 1 anddefine

©sn,m3n=1ª2m=1 bys1,1 = −1s1,2 = −1s2,1 = 1

s2,2 = −1s3,1 = −2s3,2 = 2

Notice that the definition of s (x, y) when x = 0 does not matter. If wehad defined ∀y ∈ [0, 1], s (0, y) = 0 the same definitions would imply that sis step. Moreover, these definitions would still work if we defined s (0, 0) = 0and ∀y ∈ (0, 1], s (0, y) = ln (y), but in this case s would not be bounded!From now, we will denote Q = [a, b] × [c, d]. Notice that Q is a closed

cube.Again, a straightforward generalization is the following:1This is where notation exhibits a conflict: the left-hand side of the expression is an

ordered pair, whereas each one of the terms in the Cartesian product on the right-handside is an open interval. Had we used ]a, b[ for open intervals, the conflict would not havearisen, but this would be nonstandard.

Page 122: Lecture Notes on Mathematics for Economists: A refresher

12.7 INTEGRATION IN HIGHER-DIMENSIONAL SPACES 115

Definition 12.9 Given a step function s : Q→ R, we define the integral ofs on Q by:ZZ

Q

s (x, y) dxdy =m∗Xm=1

n∗Xn=1

sn,m (xn − xn−1) (ym − ym−1)

where x0 = a, y0 = c, (xn)n∗n=1 and (ym)

m∗m=1 are two finite monotonically

increasing sequences and³(sn,m)

n∗n=1

´m∗m=1

is a finite double array such that

x1 > a, xn∗ = b, y1 > c, ym∗ = d and that ∀n ∈ 1, ..., n∗, ∀m ∈ 1, ...,m∗and ∀ (x, y) ∈ (xn−1, xn)× (ym−1, ym), s (x, y) = sn,m.

Example 12.8 For s defined as in example 12.7, Q = [−2, 2]× [0, 1] andZZQ

s (x, y) dxdy = (−1) (0− (−2)) + 1 (1− 0) (0.5− 0)+ (−2) (2− 1) (0.5− 0) + (−1) (0− (−2)) (1− 0.5)+ (−1) (1− 0) (1− 0.5) + 2 (2− 1) (1− 0.5)

= −1 + 0.5− 1− 1− 0.5 + 1= −2

Given definitions 12.8 and 12.9, the following should appear natural as ageneralization of definition 12.4.

Definition 12.10 Let f : Q → R be a bounded function. If there exists aunique I ∈ R such thatZZ

Q

s (x, y) dxdy 6 I 6ZZ

Q

t (x, y) dxdy

for every pair of step functions s : Q → R and t : Q → R such that∀ (x, y) ∈ Q

s (x, y) 6 f (x, y) 6 t (x, y)

then, f is said to be integrable, and I is said to be the integral of f , whichwe denote by ZZ

Q

f (x, y) dxdy = I

Page 123: Lecture Notes on Mathematics for Economists: A refresher

116 CHAPTER 12 RIEMANN INTEGRATION

Notice that the definition requires I to exist and be unique.As before, the definition results clumsy to compute the integral. However,

the following theorem simplifies such computation.

Theorem 12.7 Let f : Q → R be bounded and integrable. Suppose that∀y ∈ [c, d], f (·, y) : [a, b] → R is integrable. Let A : [c, d] → R be defined by∀y ∈ [c, d],

A (y) =

bZa

f (x, y) dx

If A is integrable, then

ZZQ

f (x, y) dxdy =

dZc

A (y) dy

This result is normally expressed by saying that

ZZQ

f (x, y) dxdy =

dZc

bZa

f (x, y) dx

dy

or simply that ZZQ

f (x, y) dxdy =

dZc

bZa

f (x, y) dxdy

Notice, however, that the latter is not a definition. It is the implicationof a theorem and applies only when its assumptions hold.

Exercise 12.5 The proof of this theorem is relatively simple; try and fill inthe gaps.Proof. Let s : Q→ R and t : Q→ R be step functions such that ∀ (x, y) ∈ Q

s (x, y) 6 f (x, y) 6 t (x, y)

Fix y ∈ [c, d]. Clearly, s (·, y) and t (·, y) are also step functions and ∀x ∈[a, b]

s (x, y) 6 f (x, y) 6 t (x, y)

Page 124: Lecture Notes on Mathematics for Economists: A refresher

12.7 INTEGRATION IN HIGHER-DIMENSIONAL SPACES 117

Then, by definition (which one?)

bZa

s (x, y) dx 6 A (y) 6bZ

a

t (x, y) dx

Now, the left-most and right-most terms on the previous expressions arethemselves step functions (of y on [c, d]; check this!), and by assumptionthe term in the middle is integrable, so that, by theorem 12.3,

dZc

bZa

s (x, y) dx

dy 6dZc

A (y) dy 6dZc

bZa

t (x, y) dx

dy

and then, by properties of sums (justify this!),

ZZQ

s (x, y) dxdy 6dZc

A (y) dy 6ZZ

Q

t (x, y) dxdy

and, finally, since s and t were arbitrary,

ZZQ

f (x, y) dxdy =

dZc

A (y) dy

Moreover, most usually one can use the following:

Theorem 12.8 If f : Q→ R is continuous, then it is integrable and

ZZQ

f (x, y) dxdy =

dZc

bZa

f (x, y) dx

dy

=

bZa

dZc

f (x, y) dy

dx

Page 125: Lecture Notes on Mathematics for Economists: A refresher

118 CHAPTER 12 RIEMANN INTEGRATION

Example 12.9 Let f : [0, 1] × [0, 1] → R be defined by: ∀ (x, y) ∈ [0, 1] ×[0, 1],

f (x, y) = x3y − 3xy2f is continuous andZZ

[0,1]×[0,1]

f (x, y) dxdy =

1Z0

1Z0

¡x3y − 3xy2¢ dx

dy

=

1Z0

µ1

4x4y − 3

2x2y2

¶10

dy

=

1Z0

µ1

4y − 3

2y2¶10

dy

=

µ1

8y2 − 1

2y3¶10

= −38

Exercise 12.6 Show that ∀a ∈ R+ and ∀c, d ∈ R : c > d,ZZ[−a,a]×[c,d]

e−x2+y2

2 xydxdy = 0

(This result will be extremely useful in mathematical statistics. Do you knowwhy?)

So far we have restricted attention to functions defined on rectangles (orcubes) only. This is clearly limited, but easy to overcome.

Definition 12.11 Let S ⊆ Q. If f : D→ R and S ⊆ D ⊆ Q, defineZZS

f (x, y) dxdy =

ZZQ

ef (x, y) dxdywhere ef : Q→ R is defined by

ef (x, y) = f (x, y) if (x, y) ∈ S

0 otherwise

Page 126: Lecture Notes on Mathematics for Economists: A refresher

12.7 INTEGRATION IN HIGHER-DIMENSIONAL SPACES 119

Exercise 12.7 Let Q = [0, 2]× [0, 2] and suppose that f : Q→ R is definedby ∀ (x, y) ∈ Q, f (x, y) = xy. Let

S =©(x, y) ∈ R|x2 6 y < x

ªShow that S ⊆ Q. Show that S 6= ∅. Show thatZZ

S

f (x, y) dxdy =1

24

Can you prove the last result using two different arguments (orders of inte-gration)?

Page 127: Lecture Notes on Mathematics for Economists: A refresher
Page 128: Lecture Notes on Mathematics for Economists: A refresher

Chapter 13

Probability

13.1 Measure Theory

Suppose that we have fixed a universe S. Denote by P (S) the set of allsubsets of S. That is, E ∈ P (S) iff E ⊆ S. Obviously, ∅ ∈ P (S) andS ∈ P (S). One can also say that ∅, S ⊆ P (S) and ∅ ⊆ P (S), but itwould be a mistake to say that S ⊆ P (S).

13.1.1 Algebras and σ-algebras:

Our problem now is to define, in a consistent manner, the size of (some of) thesubsets of S. The consistency of our definition will require some “structure”on the family of subsets whose size we define: (i) we should be able to tellthe size of the set with no elements in it; (ii) If we are able to measure aset, we should also be able to measure the rest of the universe; (iii) If we areable to measure a series of sets, then we should also be able to measure theirunion.For this:

Definition 13.1 Σ ⊆ P (S) is a (Boolean) algebra if:1. ∅ ∈ Σ

2. A ∈ Σ =⇒ S\A ∈ Σ

3.³N ∈ N ∧ (An)

Nn=1 in Σ

´=⇒

N[n=1

An ∈ Σ

121

Page 129: Lecture Notes on Mathematics for Economists: A refresher

122 CHAPTER 13 PROBABILITY

Theorem 13.1 If Σ is an algebra, then:

1. S ∈ Σ

2.³N ∈ N ∧ (An)

Nn=1

sec⊆ Σ´=⇒

N\n=1

An ∈ Σ

Proof. Left as an exercise. For the second part, recall De Morgan’s laws.

Theorem 13.2 P (S) is an algebra.

Proof. Left as an exercise.

Theorem 13.3 Let ∆ 6= ∅ be a collection of Algebras. Σ =\Σ0∈∆

Σ0 is an

algebra.

Proof. By construction, (∀Σ0 ∈ ∆) : ∅ ∈ Σ0, which implies that∅ ∈ Σ. Sup-pose now that A ∈ Σ. By definition, (∀Σ0 ∈ ∆) : A ∈ Σ0, which implies that(∀Σ0 ∈ ∆) : S\A ∈ Σ0 and hence that S\A ∈ Σ. Finally, let N ∈ N and let(An)

Nn=1 be a sequence in Σ. By construction, (∀n ∈ 1, ..., N) (∀Σ0 ∈ ∆) :

An ∈ Σ0, which implies that (∀Σ0 ∈ ∆) (∀n ∈ 1, ..., N) : An ∈ Σ0, which

means that (∀Σ0 ∈ ∆) : (An)Nn=1

sec⊆ Σ0 and hence that (∀Σ0 ∈ ∆) :N[n=1

An ∈

Σ0, soN[n=1

An ∈ Σ.

Exercise 13.1 Is it true that if ∆ is a collection of algebras and ∆ 6= ∅,then Σ =

[Σ0∈∆

Σ0 is an algebra?

Theorem 13.4 For every A ⊆ P (S), there is an algebra Σ ⊆ P (S) suchthat:

1. A ⊆ Σ

2. If Σ0 ⊆ P (S) is an algebra and A ⊆ Σ0, then Σ ⊆ Σ0.

Page 130: Lecture Notes on Mathematics for Economists: A refresher

13.1 MEASURE THEORY 123

Proof. Consider the set

∆ = Σ0 ⊆ P (S) : Σ0 is an algebra and A ⊆ Σ0

Since P (S) ∈ ∆, it follows that ∆ 6= ∅. Let Σ =\Σ0∈∆

Σ0.

That Σ satisfies the properties of the statement is obvious, while it followsby theorem 13.3 that Σ is an algebra.The algebra Σ of the previous theorem is known as the algebra generated

by A. Notice that it is the and not a, because, so defined, Σis unique.Notice that the conditions of the definition of algebra have the intuition

we wanted. For some purposes, however, we need to strengthen the thirdproperty:

Definition 13.2 Σ ⊆ P (S) is a σ-algebra if:

1. ∅ ∈ Σ

2. A ∈ Σ =⇒ S\A ∈ Σ

3. (An)∞n=1 in Σ =⇒

∞[n=1

An ∈ Σ

Theorem 13.5 P (S) is a σ-algebra.

Proof. Left as an exercise.

Theorem 13.6 If Σ is a σ-algebra, then it is an algebra. When S is finite,if Σ is an algebra, then it is a σ-algebra.

Proof. Left as an exercise.

Theorem 13.7 If Σ is an σ-algebra, then

1. S ∈ Σ

2. (An)∞n=1 in Σ =⇒

∞\n=1

An ∈ Σ

Proof. Left as an exercise.

Page 131: Lecture Notes on Mathematics for Economists: A refresher

124 CHAPTER 13 PROBABILITY

Theorem 13.8 Let ∆ 6= ∅ be a collection of σ-algebras in S. Then, Σ =\Σ0∈∆

Σ0 is a σ-algebra.

Proof. Left as an exercise.

Exercise 13.2 Is it true that if ∆ is a collection of σ-algebras of S and∆ 6= ∅, then Σ =

[Σ0∈∆

Σ0 is a σ-algebra?

Theorem 13.9 For every A ⊆ P (S), there exists a σ-algebra Σ ⊆ P (S)such that:

1. A ⊆ Σ

2. If Σ0 ⊆ P (S) is a σ-algebra and A ⊆ Σ0, then Σ ⊆ Σ0.

Proof. Left as an exercise.The σ-algebra Σ of the previous theorem is the σ-algebra generated by

A. Notice that it is the, and not a, as in the case of algebras. The σ-algebragenerated by A is denoted by σ (A) .Definition 13.3 If Σ is a σ-algebra for S, then (S,Σ) is a measurable space.

There is an argument, oftentimes used when dealing with σ-algebras,known as the good-set principle. Let Σ ⊆ P (S) be a σ-algebra for S. Thinkof Σ as the family of all the subsets of S satisfying some property: the goodsubsets of S. The good-set principle says the following: if A is an arbitraryfamily of good subsets of S, then all the sets in σ (A) are good. The resultis trivial since, by hypothesis, the class of all good subsets is a σ-algebra, so,by definition, σ (A) ⊆ Σ. In words: if the good subsets form a σ-algebra,then all the sets in the σ-algebra generated by a family of good subsets aregood as well.

Exercise 13.3 Let A be a class of subsets of S and A ⊆ S. For any E ⊆P (S), denote by E ∩A the class B ∩A : B ∈ E. Show that:

σA(A ∩A) ⊆ σS(A) ∩Awhere σA(A ∩ A) denotes the σ-algebra generated by A ∩ A relative to theuniverse A and σS(A) is the σ-algebra generated for A relative to universeS.

Page 132: Lecture Notes on Mathematics for Economists: A refresher

13.1 MEASURE THEORY 125

The good-set principle allows us to show that the relationship betweenσA(A ∩ A) and σS(A) ∩ A is stronger that the previous exercise suggests:define Σ = E ∈ σS (A) : E ∩A ∈ σA (A ∩A); notice that Σ is a σ-algebra(for S) and satisfies that A ⊆ Σ; the latter implies that σS(A) ⊆ Σ, so

E ∈ σS(A) =⇒ E ∩A ∈ σA (A ∩A)

and, then, σS(A) ∩ A ⊆ σA (A ∩A). Here, Σ is the family of all subsets ofS. This result and the exercise together imply that

σA(A ∩A) = σS(A) ∩A

13.1.2 Measure

The idea here is that Σ is the collection of subsets of S that we can “measure”.Now, what do we understand by “measuring”? Intuitively, , what we want todo is to associate each set to a number. Of course, this assignment cannot bearbitrary: (i) sizes cannot be negative, and we must consider the possibilityof finding an “infinitely large” set; (ii) a set that contains nothing must havezero measure; (iii) if we take a collection of mutually separated sets andwe measure them, and we then measure their union, the sum of the firstmeasures must equal the last measure. Formally:Denote R∗ = R ∪ −∞,∞, which is usually called the extended real

line, and let us take its positive orthant: R∗+ = R+ ∪ ∞.

Definition 13.4 Let Σ be an algebra for S and let µ : Σ −→ R∗. µ is saidto be finitely additive if³

N ∈ N ∧ (An)Nn=1 in Σ ∧ (n 6= n0 =⇒ An ∩An0 = ∅)

´=⇒ µ

ÃN[n=1

An

!=

NXn=1

µ (An)

Definition 13.5 Let (S,Σ) be a measurable space and let µ : Σ −→ R∗. µis said to be σ-additive if

((An)∞n=1 in Σ ∧ (n 6= n0 =⇒ An ∩An0 = ∅)) =⇒ µ

à ∞[n=1

An

!=

∞Xn=1

µ (An)

Page 133: Lecture Notes on Mathematics for Economists: A refresher

126 CHAPTER 13 PROBABILITY

Theorem 13.10 Let (S,Σ) be a measurable space. If µ : Σ −→ R∗ is σ-additive, then it is additive.

Proof. Left as an exercise.

Theorem 13.11 If S is finite, Σ is an algebra for S and µ : Σ −→ R∗isfinitely additive, then (S,Σ) is a measurable space and µ is σ-additive.

Proof. Left as an exercise.

Theorem 13.12 Let Σ be an algebra for S and let µ : Σ −→ R∗ be finitelyadditive. If ∃A ∈ Σ such that µ (A) ∈ R, then µ (∅) = 0.

Proof. Left as an exercise.

Exercise 13.4 Let S be an infinite, countable set (i.e. there exists a bijectivefunction between N and S). Define the following class of subsets of S, Σ =A ⊆ S : A or Ac is finite and define µ : Σ → 0, 1 by µ(A) = 0 if A isfinite and µ(A) = 1 if Ac is finite.

1. Show that Σ is an algebra.

2. Show that µ is finitely additive but not σ-additive.

3. Show that there exists (An)∞n=1 in Σ such that (∀n ∈ N) : µ(An) = 0 but

µ(S∞

n=1An) = 1. (Hint: find (An)∞n=1 such that (∀n ∈ N) : An ⊆ An+1

andS∞

n=1An = S).

It is obvious that the structure imposed when we consider arbitrary se-quences is more than when we consider finite sequences only. But, is the extracomplication necessary? To see that it is, consider the following experiment:a coin is tossed until if comes head. Suppose that we want to measure theprobability that the experiment stops in an even toss. We need to considercountable but infinite sequences! Notice that additivity corresponds to thecondition (iii) that we want to impose to our measures.

Definition 13.6 Let (S,Σ) be a measurable space and let µ : Σ −→ R∗. µis said to be a measure if it is σ-additive and satisfies that µ (∅) = 0 and∀A ∈ Σ, µ (A) ∈ R∗+.

Page 134: Lecture Notes on Mathematics for Economists: A refresher

13.1 MEASURE THEORY 127

Definition 13.7 A measure space is (S,Σ, µ), where (S,Σ) is a measurablespace and µ : Σ −→ R∗+ is a measure.

Theorem 13.13 Let (S,Σ) be a measurable space and let µ, µ0 : Σ −→ R∗+be measures. µ∗ = µ+ µ0 is a measure as well

Proof. Left as an exercise.

Theorem 13.14 Let (S,Σ) be a measurable space and let A∗ ∈ Σ. Defineµ∗ : Σ −→ R∗+ by µ∗ (A) = µ (A ∩A∗). µ∗ is a measureProof. Left as an exercise.

Theorem 13.15 Let (S,Σ, µ) be a measure space. Then,

A,A0 ∈ Σ, A ⊆ A0 =⇒ µ (A) ≤ µ (A0)

If, additionally, µ (A) ∈ R, then µ (A) = µ (A0)− µ (A0\A).Proof. Left as an exercise.

Exercise 13.5 Prove the following results:

1. If X : S −→ R is ∅, S-measurable, then ∀s, s0 ∈ S, X (s) = X (s0).

2. Any X : S −→ R is P (S)-measurable.3. σ (S) = ∅, S.4. If Σ is a σ-algebra, then σ (Σ) = Σ.

13.1.3 Example: Lebesgue measure

Let L ∈ N. Denote by I the class of subsets I of RL that can be written asI =

QLl=1 [al, bl], for (al, bl)

Ll=1 defined in R2 such that ∀l ∈ 1, ..., L, al < bl.

Define v : I −→ R, by

v

ÃLYl=1

[al, bl]

!=

LYl=1

(bl − al)

Define the “outer measure” function, m : P ¡RL¢ −→ R∗+, by

m (A) = inf

( ∞Xn=1

v (In) : (In)∞n=1 in I ∧A ⊆

∞[n=1

In

)

Page 135: Lecture Notes on Mathematics for Economists: A refresher

128 CHAPTER 13 PROBABILITY

Definition 13.8 A set A ⊆ RL is Lebesgue-measurable if ∀ε > 0, ∃O ⊆ RL,open and such that m (O\A) < ε.

Denote by LL ⊆ P¡RL¢the class of all Lebesgue-measurable sets. Define

the “Lebesgue measure” µ : LL −→ R∗+ by µ (A) = m (A).

Theorem 13.16¡RL, LL, µ

¢is a measure space.

Proof. This proof is beyond these notes. It can be found in standardtextbook on the topic.

13.2 Probability

When the space S we are dealing with is the space of all possible results ofan experiment, the subsets we want to measure are called events and theirmeasures are understood as probabilities (which can be understood fromeither a frequentist or a likelihood perspective).. So,

Definition 13.9 Let (S,Σ, p) be a measure space. If p (S) = 1, we saythat S is a sample space, that (S,Σ, p) is a probability space and that p is aprobability measure.

So defined, the properties we impose for p to be considered a probabilitymeasure are known as Kolmogorov’s axioms.

Theorem 13.17 Let (S,Σ, p) be a probability space. For every E ∈ Σ,p (E) ≤ 1 and p (Ec) = 1− P (E), where Ec = S\E.

Proof. Left as an exercise.

Theorem 13.18 Let (S,Σ, p) be a probability space. For every E,E0 ∈Σ, p (E0 ∩Ec) = P (E0) − P (E0 ∩E) and p (E ∪E0) = p (E) + p (E0) −p (E ∩E0).

Proof. Left as an exercise.

Page 136: Lecture Notes on Mathematics for Economists: A refresher

13.2 PROBABILITY 129

Exercise 13.6 Prove the following generalization of theorem 13.18: let (S,Σ, p)be a probability space. A partition of S is a sequence (En)

Nn=1 in Σ, with

N ∈ N ∪ ∞, such that

n 6= n0 =⇒ En ∩En0 = ∅N[n=1

En = S

Then, for every E ∈ Σ and every partition (En)Nn=1 of S, with N ∈ N∪∞,

p (E) =NXn=1

p (E ∩En)

Proof. Left as an exercise.

Theorem 13.19 (Bonferroni’s simple inequality) Let (S,Σ, p) be a prob-ability space. For every E,E0 ∈ Σ, P (E ∩E0) ≥ p (E) + p (E0)− 1

Proof. Left as an exercise.

Theorem 13.20 (Boole’s inequality) Let (S,Σ, p) be a probability space.For every sequence (En)

∞n=1 in Σ,

p

à ∞[n=1

En

!≤

∞Xn=1

p (En)

Proof. Left as an exercise.

Theorem 13.21 Let (S,Σ, p) be a probability space. Let (En)∞n=1 be a se-

quence in Σ such that (∀n ∈ N) : En ⊆ En+1, then

p

à ∞[n=1

En

!= lim

n−→∞p (En)

Page 137: Lecture Notes on Mathematics for Economists: A refresher

130 CHAPTER 13 PROBABILITY

Proof. Since p is σ-additive,

p

à ∞[n=1

En

!= p (E1) +

∞Xn=1

p (En+1\En)

= p (E1) +∞Xn=1

(p (En+1)− p (En))

= p (E1) + limN−→∞

NXn=1

(p (En+1)− p (En))

= p (E1) + limN−→∞

(p (EN+1)− p (E1))

= limN−→∞

p (EN)

where the second step comes from theorem 13.18 and the limit exists becauseit is a monotone bounded sequence.

Corollary 13.1 Let (S,Σ, p) be a probability space. (En)∞n=1 be a sequence

in Σ such that (∀n ∈ N) : En ⊇ En+1, then

p

à ∞\n=1

En

!= lim

n−→∞p (En)

Proof. Left as an exercise.

13.3 Conditional probability

Henceforth, we fix a probability space (S,Σ, p).

Definition 13.10 Let E∗ ∈ Σ be such that p (E∗) > 0. The probabilitymeasure given (or conditional on) E∗ is defined by p (• | E∗) : Σ −→ [0, 1],

p (E | E∗) = p (E ∩E∗)p (E∗)

Theorem 13.22 Let E,E0 ∈ Σ and suppose that p (E) ∈ (0, 1). Then,

p (E0) = p (E) p (E0 | E) + (1− p (E)) p (E0 | Ec)

Page 138: Lecture Notes on Mathematics for Economists: A refresher

13.4 INDEPENDENCE 131

Proof. By definition,

p (E) p (E0 | E) + (1− p (E)) p (E0 | Ec) =

µp (E)

p (E0 ∩E)p (E)

+ (1− p (E))p (E0 ∩Ec)

p (Ec)

¶= p (E0 ∩E) + p (E0 ∩Ec)

= p (E0)

because of σ-additivity.The previous theorem, in fact, admits the following generalization:

Theorem 13.23 Let (En)Nn=1 be a partition of S such that (∀n ∈ 1, ..., N) :

p (En) > 0. Then, (∀E0 ∈ Σ) :

p (E0) =NXn=1

p (En) p (E0 | En)

Proof. Left as an exercise.

Theorem 13.24 Let N ∈ N and (En)Nn=1 in Σ be such that p

³TN−1n=1 En

´>

0. Then,

p

ÃN\n=1

En

!= p (E1) p (E2 | E1) p (E3 | E1 ∩E2) ...p

ÃEN |

N−1\n=1

En

!Proof. Left as an exercise. (Hint: mathematical induction!)

13.4 Independence

Definition 13.11 A family of events E ⊆ Σ is pairwise independent if

(∀E,E0 ∈ E , E 6= E0) : p (E ∩E0) = p (E) p (E0)

Definition 13.12 A family of events E ⊆ Σ is independent if

(∀N ∈ N)³∀ EnNn=1 ⊆ E

´: p

ÃN\n=1

En

!=

NYn=1

p (En)

Page 139: Lecture Notes on Mathematics for Economists: A refresher

132 CHAPTER 13 PROBABILITY

Theorem 13.25 If E is independent, then it is pairwise independent.

Proof. Left as an exercise.

Example 13.1 Notice that pairwise independence does not suffice for in-dependence. Let S = 1, 2, 3, 4, 5, 6, 7, 8, 9, Σ = P (S) and suppose that(∀s ∈ S) : p (s) = 1/9. Let E1 = 1, 2, 7, E2 = 3, 4, 7 and E3 =5, 6, 7, so p (E1) = p (E2) = p (E3) = 1/3. Now, if i, j ∈ 1, 2, 3, i 6= j,then p (Ei ∩Ej) = 1/9 = p (Ei) p (Ej), but

p (E1 ∩E2 ∩E3) = 1/9 6= 1/27 = p (E1) p (E2) p (E3)

(Question: why can”t one show that pairwise independence suffices for inde-pendence using mathematical induction?)

Notice thatT

E∈E E = ∅ is neither necessary nor sufficient for indepen-dence.

Theorem 13.26 If E,E0 is independent, then so is Ec, E0.

Proof. By additivity and independence of {E, E′}:

p (Ec ∩ E′) = p (E′) − p (E ∩ E′)
= p (E′) − p (E) p (E′)
= (1 − p (E)) p (E′)
= p (Ec) p (E′)

Corollary 13.2 If {E, E′} is independent, then so is {Ec, (E′)c}.

Proof. This is trivial.

Theorem 13.27 Let E be independent. Then E∗ = {E ∈ Σ : Ec ∈ E} is independent.

Proof. Left as an exercise.


13.5 Random variables

Fix a measurable space (S,Σ).

Definition 13.13 A function X : S −→ R is measurable with respect to Σ (or Σ-measurable) if for every x ∈ R,

{s ∈ S : X (s) ≤ x} ∈ Σ

Theorem 13.28 If X is Σ-measurable, then ∀x ∈ R,

{s ∈ S : X (s) ≥ x} ∈ Σ
{s ∈ S : X (s) < x} ∈ Σ
{s ∈ S : X (s) > x} ∈ Σ
{s ∈ S : X (s) = x} ∈ Σ

Proof. Left as an exercise.

Definition 13.14 A random variable (in R) is a Σ-measurable function X : S −→ R.

We now endow the measurable space with a probability measure p and fix a random variable X : S −→ R.

Definition 13.15 The distribution function of X is FX : R −→ [0, 1], defined by FX (x) = p ({s ∈ S : X (s) ≤ x}).

Notice that FX is well defined because X is Σ-measurable.

Theorem 13.29 Let FX be the distribution function of X. Then,

1. lim_{x→−∞} FX (x) = 0 and lim_{x→∞} FX (x) = 1

2. x ≥ x′ =⇒ FX (x) ≥ FX (x′)

3. FX is right-continuous: ∀x ∈ R, lim_{h↓0} FX (x + h) = FX (x)


Proof. To see that lim_{x→−∞} FX (x) = 0, consider

(En)_{n=1}^∞ = ({s ∈ S : X (s) ≤ −n})_{n=1}^∞ ,

which is a sequence in Σ. Notice that (∀n ∈ N) : En ⊇ En+1, so

lim_{x→−∞} FX (x) = lim_{n→∞} p (En)
= p ( ∩_{n=1}^∞ En )
= p ( ∩_{n=1}^∞ {s ∈ S : X (s) ≤ −n} )
= p ({s ∈ S : (∀n ∈ N) : X (s) ≤ −n})
= p (∅)
= 0

where the second equality comes from corollary 13.1 (and the first one uses the fact that FX is nondecreasing, so its limit as x −→ −∞ can be computed along the sequence x = −n).

Proving that lim_{x→∞} FX (x) = 1 and that x ≥ x′ =⇒ FX (x) ≥ FX (x′) is left as an exercise.

Now, fix x ∈ R and consider (En)_{n=1}^∞ = ({s ∈ S : X (s) ≤ x + 1/n})_{n=1}^∞. Notice that (∀n ∈ N) : En ⊇ En+1, so

lim_{h↓0} FX (x + h) = lim_{n→∞} p (En)
= p ( ∩_{n=1}^∞ En )
= p ( ∩_{n=1}^∞ {s ∈ S : X (s) ≤ x + 1/n} )
= p ({s ∈ S : (∀n ∈ N) : X (s) ≤ x + 1/n})
= p ({s ∈ S : X (s) ≤ x})
= FX (x)

where the second equality comes from corollary 13.1.

Notice that it is not necessarily true that lim_{h↑0} FX (x + h) = FX (x), so we cannot guarantee that FX is continuous. It is a good exercise to find a case in


which lim_{h↑0} FX (x + h) ≠ FX (x). It is also important to see which step in the obvious attempt at a proof of left-continuity would fail:

{s ∈ S : (∃n ∈ N) : X (s) ≤ x − 1/n} = {s ∈ S : X (s) < x}

which may be a proper subset of {s ∈ S : X (s) ≤ x}.
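For a concrete failure of left-continuity, consider a random variable with an atom. The following minimal sketch (an assumed Bernoulli(1/2) example, chosen only for the illustration) evaluates its distribution function around the atom at 0.

```python
def F(x):
    """Distribution function of a Bernoulli(1/2) random variable:
    p(X = 0) = p(X = 1) = 1/2."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0

# Right-continuity at 0: F(0 + h) stays at F(0) = 0.5 as h decreases to 0.
print([F(0 + h) for h in (0.1, 0.01, 0.001)])   # [0.5, 0.5, 0.5]

# Left-continuity at 0 fails: F(0 + h) stays at 0 as h increases to 0 from below,
# and the gap 0.5 - 0 is exactly p({s : X(s) = 0}) (compare theorem 13.30 below).
print([F(0 - h) for h in (0.1, 0.01, 0.001)])   # [0.0, 0.0, 0.0]
```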

Theorem 13.30 Let FX be the distribution function of X. Then,

1. (∀x ∈ R) : p ({s ∈ S : X (s) > x}) = 1 − FX (x)

2. (∀x, x′ ∈ R : x ≤ x′) : p ({s ∈ S : x < X (s) ≤ x′}) = FX (x′) − FX (x)

3. (∀x ∈ R) : p ({s ∈ S : X (s) = x}) = FX (x) − lim_{h↑0} FX (x + h)

Proof. Part 1 is left as an exercise.

To see the second part, notice that

p ({s ∈ S : x < X (s) ≤ x′}) = p ({s ∈ S : X (s) ≤ x′} \ {s ∈ S : X (s) ≤ x})
= p ({s ∈ S : X (s) ≤ x′}) − p ({s ∈ S : X (s) ≤ x})
= FX (x′) − FX (x)

where the second equality follows since x ≤ x′.

For the third part, consider (En)_{n=1}^∞ = ({s ∈ S : X (s) ≤ x − 1/n})_{n=1}^∞. Notice that (∀n ∈ N) : En ⊆ En+1, so

lim_{h↑0} FX (x + h) = lim_{n→∞} p (En)
= p ( ∪_{n=1}^∞ En )
= p ( ∪_{n=1}^∞ {s ∈ S : X (s) ≤ x − 1/n} )
= p ({s ∈ S : (∃n ∈ N) : X (s) ≤ x − 1/n})
= p ({s ∈ S : X (s) < x})
= p ({s ∈ S : X (s) ≤ x} \ {s ∈ S : X (s) = x})
= p ({s ∈ S : X (s) ≤ x}) − p ({s ∈ S : X (s) = x})
= FX (x) − p ({s ∈ S : X (s) = x})


where the second equality comes from theorem 13.21 and the seventh since {s ∈ S : X (s) = x} ⊆ {s ∈ S : X (s) ≤ x}.

The distribution function of a random variable characterizes (totally defines) its associated probability measure.

Theorem 13.31 Let FX be the distribution function of X. Let g : R −→ R be strictly increasing and define the random variable Y = g ◦ X : S −→ R. Denote by FY the distribution function of Y. Then,

(∀y ∈ R) : FY (y) = FX (g⁻¹ (y))

Proof. Let y ∈ R.

FY (y) = p ({s ∈ S : Y (s) ≤ y})
= p ({s ∈ S : g (X (s)) ≤ y})
= p ({s ∈ S : X (s) ≤ g⁻¹ (y)})
= FX (g⁻¹ (y))

where the existence of g⁻¹ and the third equality follow from the fact that g is strictly increasing.

Theorem 13.32 Let FX be the distribution function of X. Let g : R −→ R be strictly decreasing and define the random variable Y = g ◦ X : S −→ R. Denote by FY the distribution function of Y. Then,

(∀y ∈ R) : FY (y) = 1 − lim_{h↑0} FX (g⁻¹ (y) + h)

If X is continuous,

(∀y ∈ R) : FY (y) = 1 − FX (g⁻¹ (y))

Proof. Left as an exercise.

Definition 13.16 A random variable X is said to be continuous if FX is continuous. X is said to be absolutely continuous if there exists an (integrable) function fX : R −→ R+ such that

(∀x ∈ R) : FX (x) = ∫_{−∞}^{x} fX (u) du

In this case fX is said to be a density function of X.


Corollary 13.3 Let fX be a density function of an absolutely continuous random variable X. Let g : R −→ R be C¹ and strictly increasing, and define the random variable Y = g ◦ X : S −→ R. Define fY : R −→ R by

(∀y ∈ R) : fY (y) = fX (g⁻¹ (y)) / g′ (g⁻¹ (y))

Then fY is a density function for Y.

Proof. Left as an exercise. (Hint: remember the chain rule and the inverse function theorem.)

Corollary 13.4 Let fX be a density function of an absolutely continuous random variable X. Let g : R −→ R be C¹ and strictly decreasing, and define the random variable Y = g ◦ X : S −→ R. Define fY : R −→ R by

(∀y ∈ R) : fY (y) = −fX (g⁻¹ (y)) / g′ (g⁻¹ (y))

Then fY is a density function for Y.

Proof. Left as an exercise.

The last two theorems and corollaries have been stated under assumptions stronger than needed: it suffices that g be strictly monotone on the closure of the set on which FX is increasing. The latter set is known as the support of X.
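A minimal Monte Carlo sketch of corollary 13.3, under assumed choices made only for the illustration: X exponential with fX (x) = e^{−x} on R+, and g (x) = √x, which is C¹ and strictly increasing on the support of X.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=200_000)     # draws of X, with f_X(x) = e^{-x}
y = np.sqrt(x)                                   # Y = g(X)

def f_Y(v):
    # f_Y(y) = f_X(g^{-1}(y)) / g'(g^{-1}(y)), with g^{-1}(y) = y^2 and
    # g'(x) = 1/(2 sqrt(x)); this gives f_Y(y) = 2 y exp(-y^2).
    return 2.0 * v * np.exp(-v ** 2)

y0 = 1.0
empirical = np.mean(y <= y0)                     # p(Y <= y0) from the simulation
grid = np.linspace(0.0, y0, 100_001)             # midpoint rule for the integral of f_Y
implied = np.sum(f_Y((grid[:-1] + grid[1:]) / 2)) * (grid[1] - grid[0])

print(empirical, implied)                        # both close to 1 - 1/e, about 0.632
```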

Exercise 13.7 Let FX be the distribution function of a continuous random variable X. Let g : R −→ R, g (x) = x², and define the random variable Y = g ◦ X : S −→ R+. Denote by FY the distribution function of Y. Show that

(∀y ∈ R+) : FY (y) = FX (√y) − FX (−√y)

and find a density function for Y, under the assumption that X is absolutely continuous.

Exercise 13.8 Prove the following result: Let FX be the distribution function of X. Suppose that FX is strictly increasing and define the random variable Y = FX ◦ X. Then Y follows a standard uniform distribution.
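A simulation sketch of the result in exercise 13.8, under an assumed distribution for X (a normal with mean 2 and standard deviation 3, chosen only for the illustration): the transformed draws should behave like a standard uniform sample.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)   # X ~ N(2, 9); F_X is strictly increasing

def F_X(v):
    # distribution function of N(2, 9), written with the error function
    return 0.5 * (1.0 + erf((v - 2.0) / (3.0 * sqrt(2.0))))

y = np.array([F_X(v) for v in x])                  # Y = F_X(X)

# If Y is standard uniform, p(Y <= q) should be approximately q for q in [0, 1].
for q in (0.1, 0.25, 0.5, 0.75, 0.9):
    print(q, round(float(np.mean(y <= q)), 3))
```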


13.6 Moments

Henceforth, we assume that X is an absolutely continuous random variable with density fX. Assume also, for simplicity, that for every D ⊆ R such that ∫_D fX (x) dx exists,

p (X ∈ D) = ∫_D fX (x) dx

where the notation p (X ∈ D) replaces p ({s ∈ S : X (s) ∈ D}).

Definition 13.18 Let g : R −→ R and define the random variable g ◦ X : S −→ R. The expectation of g ◦ X is defined as

E (g (X)) = ∫_{−∞}^{∞} g (x) fX (x) dx

whenever the integral exists.

Whenever there is x∗ ∈ R such that

∫_{−∞}^{x∗} x fX (x) dx = −∞   and   ∫_{x∗}^{∞} x fX (x) dx = ∞,

we say that E (X) does not exist. The reason is simple:

E (X) = ∫_{−∞}^{∞} x fX (x) dx = ∫_{−∞}^{x∗} x fX (x) dx + ∫_{x∗}^{∞} x fX (x) dx = −∞ + ∞

is not defined. Notice, in particular, that even if for every x∗ ∈ R+

∫_{−x∗}^{0} |x| fX (x) dx = ∫_{0}^{x∗} |x| fX (x) dx ∈ R,

E (X) may fail to exist. (The Cauchy density, fX (x) = 1/(π (1 + x²)), is the classical example.) Notice also that if

∫_{−∞}^{∞} |x| fX (x) dx ∈ R,

then E (X) exists.


Theorem 13.33 Let g1, g2 : R −→ R and a, b, c ∈ R. Then,

1. E (a g1 (X) + b g2 (X) + c) = a E (g1 (X)) + b E (g2 (X)) + c

2. If (∀x ∈ R) : g1 (x) ≥ 0, then E (g1 (X)) ≥ 0.

3. If (∀x ∈ R) : g1 (x) ≥ g2 (x), then E (g1 (X)) ≥ E (g2 (X)).

4. If (∀x ∈ R) : a ≤ g1 (x) ≤ b, then a ≤ E (g1 (X)) ≤ b.

Proof. Left as an exercise.

Theorem 13.34 (Chebychev’s Inequality) Let g : R −→ R+ be such that E (g (X)) ∈ R. Then, ∀r > 0,

p (g (X) ≥ r) ≤ E (g (X)) / r

Proof. By definition,

E (g (X)) = ∫_{−∞}^{∞} g (x) fX (x) dx
≥ ∫_{x∈R : g(x)≥r} g (x) fX (x) dx
≥ ∫_{x∈R : g(x)≥r} r fX (x) dx
= r ∫_{x∈R : g(x)≥r} fX (x) dx
= r p (g (X) ≥ r)

where the first inequality follows since (∀x ∈ R) : g (x) ≥ 0.

Strictly speaking, in the previous proof we needed to argue that

∫_{x∈R : g(x)≥r} fX (x) dx

exists. For this, it would suffice, for example, that g be continuous.


Definition 13.19 Let k ∈ N. The kth (noncentral) moment of X is E (X^k), whenever it exists.

Definition 13.20 Let k ∈ N and suppose that E (X) exists in R. The kth central moment of X is E ((X − E (X))^k), whenever it exists.

The first noncentral moment of X is its expectation, or mean, and its second central moment is its variance, denoted V (X).

Corollary 13.5 Suppose that E (X) and V (X) > 0 exist. Then, ∀t > 0,

p (|X − E (X)| ≥ t √V (X)) ≤ 1/t²

Proof. Define g : R −→ R+ by

g (x) = (x − E (X))² / V (X)

By Chebychev’s inequality,

p (|X − E (X)| ≥ t √V (X)) = p ((X − E (X))² / V (X) ≥ t²)
≤ (1/t²) E ((X − E (X))² / V (X))
= 1/t²

Exercise 13.9 Prove the following corollary: if E (X) and V (X) > 0 exist, then

p (|X − E (X)| < 2 √V (X)) ≥ 0.75

Notice the implication: the probability that the realization of a random variable lies within two standard deviations of its mean is at least 0.75, regardless of its distribution!
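A quick simulation sketch of the two-standard-deviation bound, for two assumed distributions chosen only for the illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

samples = {
    "exponential (beta = 2)": rng.exponential(scale=2.0, size=n),
    "uniform on [0, 1]": rng.uniform(0.0, 1.0, size=n),
}

for name, x in samples.items():
    mu, sd = x.mean(), x.std()
    within = np.mean(np.abs(x - mu) < 2.0 * sd)   # fraction within two standard deviations
    print(name, round(float(within), 3), "(bound: 0.75)")
```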

Definition 13.21 The moment generating function of X is MX : R −→ R, defined by MX (t) = E (e^{tX}), whenever the integral exists in R.


Theorem 13.35 ∀k ∈ N, if the derivative exists, MX^(k) (0) = E (X^k).

Proof. If the derivative exists,

MX′ (t0) = ∂E (e^{tX}) / ∂t (t0)
= ∂ [ ∫_{−∞}^{∞} e^{tx} fX (x) dx ] / ∂t (t0)
= ∫_{−∞}^{∞} [ ∂ (e^{tx} fX (x)) / ∂t ]|_{t0} dx
= ∫_{−∞}^{∞} x e^{t0 x} fX (x) dx

Now, suppose that k ∈ N \ {1} and

MX^(k−1) (t0) = ∫_{−∞}^{∞} x^{k−1} e^{t0 x} fX (x) dx

Then,

MX^(k) (t0) = ∂ [ ∫_{−∞}^{∞} x^{k−1} e^{tx} fX (x) dx ] / ∂t (t0)
= ∫_{−∞}^{∞} [ ∂ (x^{k−1} e^{tx} fX (x)) / ∂t ]|_{t0} dx
= ∫_{−∞}^{∞} x^k e^{t0 x} fX (x) dx

By mathematical induction, it follows that ∀k ∈ N,

MX^(k) (t0) = ∫_{−∞}^{∞} x^k e^{t0 x} fX (x) dx

and hence that

MX^(k) (0) = ∫_{−∞}^{∞} x^k fX (x) dx = E (X^k)


Notice that the previous theorem assumes that the derivative exists and replaces the derivative of an integral by the integral of the derivative, which amounts to replacing the limit of an integral by the integral of a limit. When X has bounded support, this is just fine. In other cases, it suffices to show that there exists a random variable with absolute value larger than the integrand and with finite integral (in which case one can appeal to a result known as Lebesgue’s Dominated Convergence Theorem).

It is important to know that the moment generating function completely characterizes a random variable’s distribution: if MX = MX′, then FX = FX′.
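As a numerical illustration of theorem 13.35 (with assumed ingredients: X exponential with β = 0.5, whose moment generating function MX (t) = 1/(1 − βt), for t < 1/β, appears in exercise 13.12 below), the sketch approximates MX′ (0) and MX″ (0) by finite differences and compares them with E (X) = β and E (X²) = 2β².

```python
beta = 0.5
M = lambda t: 1.0 / (1.0 - beta * t)   # MGF of an exponential with parameter beta

h = 1e-5
# central finite-difference approximations of M'(0) and M''(0)
M1 = (M(h) - M(-h)) / (2 * h)
M2 = (M(h) - 2 * M(0.0) + M(-h)) / h ** 2

print(M1, beta)             # M'(0)  ~ E(X)   = 0.5
print(M2, 2 * beta ** 2)    # M''(0) ~ E(X^2) = 0.5
```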

Exercise 13.10 (Standard Uniform distribution) Suppose that the distribution function of X is

FX (x) = 0 if x < 0
         x if 0 ≤ x ≤ 1
         1 if x > 1

Find E (X) and MX, and show that MX is not differentiable at 0.

Exercise 13.11 (Standard Normal distribution) Suppose that the density function of X is

(∀x ∈ R) : fX (x) = (1/√(2π)) e^{−x²/2}

Show that

(∀t ∈ R) : MX (t) = e^{t²/2},   E (X) = 0,   V (X) = 1

Exercise 13.12 (Exponential distribution) Suppose that the density function of X is

(∀x ∈ R) : fX (x) = 0 if x < 0;  (1/β) e^{−x/β} if x ≥ 0

where β > 0. Find E (X) and E (X²). Show that MX : (−∞, 1/β) −→ R is

MX (t) = 1/(1 − βt)

Use MX to verify that MX′ (0) = E (X) and MX″ (0) = E (X²).


13.7 Independence of random variables

Definition 13.22 Let N ∈ N and let (Xn)_{n=1}^N be a sequence of random variables. The joint distribution of (Xn)_{n=1}^N is F_{(Xn)_{n=1}^N} : R^N −→ [0, 1], defined by

F_{(Xn)_{n=1}^N} (x) = p ({s ∈ S : (Xn (s))_{n=1}^N ≤ x})

where the inequality is understood coordinate by coordinate.

Definition 13.23 Let N ∈ N and let (Xn)_{n=1}^N be a sequence of random variables. (Xn)_{n=1}^N is said to be absolutely continuous if there exists f_{(Xn)_{n=1}^N} : R^N −→ R+ such that

(∀x ∈ R^N) : F_{(Xn)_{n=1}^N} (x) = ∫_{u≤x} f_{(Xn)_{n=1}^N} (u) du

in which case f_{(Xn)_{n=1}^N} is said to be a joint density function for (Xn)_{n=1}^N.

Definition 13.24 Let N ∈ N and let (Xn)_{n=1}^N be a sequence of absolutely continuous random variables with densities (fXn)_{n=1}^N. (Xn)_{n=1}^N is said to be independent if

f_{(Xn)_{n=1}^N} = ∏_{n=1}^N fXn

The definition above is not totally general, in that independence does not really require absolute continuity. For the purposes of these notes, however, the definition suffices.

Definition 13.25 A sequence (Xn)_{n=1}^∞ of random variables is said to be independent if, for every K ∈ N, every finite sequence (Xn_k)_{k=1}^K constructed with elements of (Xn)_{n=1}^∞ is independent.

Definition 13.26 Let N ∈ N and let (Xn)_{n=1}^N be a sequence of absolutely continuous random variables. Let g : R^N −→ R. The expectation of g ((Xn)_{n=1}^N) is defined as

E (g ((Xn)_{n=1}^N)) = ∫_{−∞}^{∞} g (x) f_{(Xn)_{n=1}^N} (x) dx


Theorem 13.36 Let N ∈ N and let (Xn)_{n=1}^N be a sequence of random variables. If (Xn)_{n=1}^N is independent, then

E ( ∏_{n=1}^N Xn ) = ∏_{n=1}^N E (Xn)

Proof. By definition,

E ( ∏_{n=1}^N Xn ) = ∫_{−∞}^{∞} ( ∏_{n=1}^N xn ) f_{(Xn)_{n=1}^N} (x) dx
= ∫_{−∞}^{∞} ( ∏_{n=1}^N xn ) ( ∏_{n=1}^N fXn (xn) ) dx
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} ... ∫_{−∞}^{∞} ∏_{n=1}^N (xn fXn (xn)) dxN ... dx2 dx1
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} ... ∏_{n=1}^{N−1} (xn fXn (xn)) ( ∫_{−∞}^{∞} xN fXN (xN) dxN ) ... dx2 dx1
= E (XN) ∫_{−∞}^{∞} ∫_{−∞}^{∞} ... ∏_{n=1}^{N−1} (xn fXn (xn)) dxN−1 ... dx2 dx1
...
= E (XN) E (XN−1) ... E (X2) ∫_{−∞}^{∞} x1 fX1 (x1) dx1
= E (XN) E (XN−1) ... E (X2) E (X1)

where the second equality follows from independence.

Strictly speaking, the third equality in the last expression also has to be justified. This would follow from a result known as Fubini’s theorem.

Exercise 13.13 Prove the following corollary: if (X1, X2) is independent, then

Cov (X1, X2) = E ((X1 − E (X1)) (X2 − E (X2))) = 0

Perhaps because of results like this one, the idea of “no correlation” is oftentimes confused with independence. One must be careful about this: if two random variables are independent, then their correlation is zero; but the converse is not true: if X is standard normal, then X and X² are uncorrelated, but they certainly are not independent.
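A simulation sketch of that last remark (assuming X standard normal; the sample size and seed are arbitrary choices for the illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(500_000)
y = x ** 2

# Sample covariance of X and X^2: close to zero, as the remark states.
cov = np.mean((x - x.mean()) * (y - y.mean()))
print("cov(X, X^2) approx.", round(float(cov), 4))

# But X and X^2 are clearly dependent: conditioning on |X| > 2 changes the
# distribution of X^2 completely.
print("p(X^2 > 4)            approx.", round(float(np.mean(y > 4)), 4))
print("p(X^2 > 4 | |X| > 2) =", float(np.mean(y[np.abs(x) > 2] > 4)))
```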


13.8 Convergence of random variables

There are several concepts of convergence for random variables. We consider three of them:

Definition 13.27 A sequence (Xn)_{n=1}^∞ of random variables converges in probability to the random variable X if

(∀ε > 0) : lim_{n→∞} p (|Xn − X| < ε) = 1

in which case we write Xn →p X.

Definition 13.28 A sequence (Xn)_{n=1}^∞ of random variables converges almost surely to the random variable X if p (lim_{n→∞} Xn = X) = 1, in which case we write Xn →a.s. X.

Definition 13.29 A sequence (Xn)_{n=1}^∞ of random variables converges in distribution to the random variable X if

(∀x ∈ R : FX is continuous at x) : lim_{n→∞} FXn (x) = FX (x)

where each FXn is the distribution function of Xn and FX is that of X. In this case, we write Xn →d X.

It is important to understand the relationship between these three concepts, which we now do, albeit in a somewhat informal manner.

We first introduce, without proof, two very intuitive results (formally, they follow from Lebesgue’s dominated convergence theorem):

1. If Xn →a.s. X and g : R −→ R is bounded, then E (g (Xn)) −→ E (g (X)).

2. If Xn →p X and g : R −→ R is bounded, then E (g (Xn)) −→ E (g (X)).

With these two results, we can argue that:

1. If Xn →a.s. X, then Xn →p X.

2. If Xn →p X, then Xn →d X.

Consider the case in which all the Xn and X are continuous.


1. For the first result, fix ε > 0 and define I≥ε : R −→ {0, 1} by

I≥ε (x) = 1 if x ≥ ε;  0 if x < ε

Since Xn →a.s. X, it follows that I≥ε (|Xn − X|) →a.s. 0, which guarantees, by the first fact, that

E (I≥ε (|Xn − X|)) −→ 0

or, equivalently,

p (|Xn − X| ≥ ε) −→ 0

which yields the result.

2. For the second result, fix x∗ ∈ R, a point of continuity of FX, and define I≤x∗ : R −→ {0, 1} by

I≤x∗ (x) = 1 if x ≤ x∗;  0 if x > x∗

By the second fact, E (I≤x∗ (Xn)) −→ E (I≤x∗ (X)), which is equivalent to

p (Xn ≤ x∗) −→ p (X ≤ x∗)

Now, neither of the converse implications is true in general. When the limit variable is a constant, however, convergence in distribution does imply convergence in probability:

Theorem 13.37 Let (Xn)_{n=1}^∞ be a sequence of continuous random variables and let x∗ ∈ R be such that Xn →d x∗. Then, Xn →p x∗. (One must understand what the statement Xn →d x∗ means: it says that Xn →d X, where X : S −→ R is the random variable constant at x∗.)

Proof. Denote by F the distribution function of the random variable constant at x∗. Fix ε > 0. By definition,


p (|Xn − x∗| ≥ ε) = p (Xn ≤ x∗ − ε) + p (Xn ≥ x∗ + ε)
= FXn (x∗ − ε) + 1 − lim_{h↑0} FXn (x∗ + ε + h)
= FXn (x∗ − ε) + 1 − FXn (x∗ + ε)
−→ F (x∗ − ε) + 1 − F (x∗ + ε)
= 0 + 1 − 1
= 0

where the third equality follows because FXn is continuous, and the convergence holds because x∗ − ε and x∗ + ε are points of continuity of F, since

F (x) = 1 if x ≥ x∗;  0 if x < x∗

However, if the limit variable is not a constant, convergence in distributiondoes not ensure convergence in probability and, in any case, convergencein probability does not ensure almost sure convergence, as shown in thefollowing example.

Example 13.2 Suppose S = [0, 1], endowed with the uniform measure. For every interval [a, b], define I[a,b] : R −→ {0, 1} by

I[a,b] (s) = 1 if s ∈ [a, b];  0 if s ∉ [a, b]

Consider the sequence (Xn)_{n=1}^∞ of random variables defined, for every s ∈ S, by

X1 (s) = s + I[0,1] (s)
X2 (s) = s + I[0,1/2] (s)
X3 (s) = s + I[1/2,1] (s)
X4 (s) = s + I[0,1/3] (s)
X5 (s) = s + I[1/3,2/3] (s)
X6 (s) = s + I[2/3,1] (s)
X7 (s) = s + I[0,1/4] (s)
...


and define X by (∀s ∈ S) : X (s) = s. We shall show that Xn →p X, but that it is not true that Xn →a.s. X.

• To see that Xn →p X, notice that the interval on which Xn ≠ X gets smaller and smaller as n grows and, since S is endowed with the uniform measure, if ε < 1,

p (|Xn − X| > ε) −→ 0

• Now, to see that it is not true that Xn →a.s. X, simply notice that there is no s ∈ [0, 1] for which

Xn (s) −→ s = X (s)

since, in fact, for no s ∈ [0, 1] is (Xn (s))_{n=1}^∞ convergent; so

p (lim_{n→∞} Xn = X) = 0
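A small computational sketch of example 13.2 (the indexing of the intervals below is one way of generating the blocks described above; it is an illustration, not part of the argument):

```python
def interval(n):
    """Return the interval [a, b] defining X_n (1-indexed): block k splits
    [0, 1] into k equal pieces, and the blocks are listed one after another."""
    k, start = 1, 1
    while start + k <= n:          # find the block containing index n
        start += k
        k += 1
    j = n - start                  # position within block k (0-based)
    return j / k, (j + 1) / k

# Convergence in probability: the uniform measure of {s : X_n(s) != X(s)} is
# the length of the n-th interval, which shrinks to zero.
for n in (1, 10, 100, 1000):
    a, b = interval(n)
    print(n, round(b - a, 4))

# No almost sure convergence: for any fixed s, every block contains an interval
# covering s, so X_n(s) = s + 1 infinitely often and (X_n(s)) does not converge.
s = 0.3
hits = [n for n in range(1, 200) if interval(n)[0] <= s <= interval(n)[1]]
print(hits[:10])
```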

Example 13.3 Consider a sequence (Xn : S −→ R++)_{n=1}^∞ such that Xn →p x∗ ∈ R++. Let ε ∈ (0, √x∗] and let γ ∈ (0, x∗] be such that ε = √x∗ − √(x∗ − γ). Notice that

|x − x∗| < γ ⇐⇒ x∗ − γ < x < x∗ + γ
⇐⇒ √(x∗ − γ) < √x < √(x∗ + γ)
⇐⇒ √(x∗ − γ) − √x∗ < √x − √x∗ < √(x∗ + γ) − √x∗
=⇒ |√x − √x∗| < √x∗ − √(x∗ − γ) = ε

where the last implication uses the concavity of the square root, which gives √(x∗ + γ) − √x∗ ≤ √x∗ − √(x∗ − γ). This implies that

p (|√Xn − √x∗| < ε) ≥ p (|Xn − x∗| < γ) −→ 1

Now, if ε > √x∗, then p (|√Xn − √x∗| < ε) ≥ p (|√Xn − √x∗| < √x∗) −→ 1, which implies that √Xn →p √x∗.

Exercise 13.14 Consider a sequence (Xn : S −→ [x, ∞))_{n=1}^∞, where x ∈ R++, such that Xn →p x∗ ∈ R. Show that x∗ ∈ [x, ∞) and that x∗/Xn →p 1.


13.9 The (weak) law of large numbers

Definition 13.30 A sequence (Xn)_{n=1}^∞ of random variables is said to be i.i.d. if it is independent and, for every n, n′ ∈ N,

(∀D ⊆ R) : p (Xn ∈ D) = p (Xn′ ∈ D)

Theorem 13.38 (The Weak Law of Large Numbers) Let (Xn)_{n=1}^∞ be a sequence of random variables. Define the sequence of random variables

(X̄n)_{n=1}^∞ = ( (1/n) Σ_{k=1}^n Xk )_{n=1}^∞

If (Xn)_{n=1}^∞ is i.i.d. and, for every n ∈ N, E (Xn) = µ ∈ R and V (Xn) = σ² ∈ R++, then X̄n →p µ.

Notice that the implication of the theorem is that X̄n converges in probability to the random variable constant at µ.

Proof. Given that (Xn)_{n=1}^∞ is i.i.d., E (X̄n) = µ and V (X̄n) = σ²/n (showing this is left as an exercise). Now, by corollary 13.5, for ε > 0,

p (|X̄n − µ| ≥ ε) ≤ σ²/(nε²)

so

lim_{n→∞} p (|X̄n − µ| ≥ ε) ≤ lim_{n→∞} σ²/(nε²) = 0
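A simulation sketch of the weak law (assumed ingredients for the illustration: Xn i.i.d. exponential with mean µ = 2, and ε = 0.1); the estimated probability p (|X̄n − µ| ≥ ε) falls toward zero as n grows.

```python
import numpy as np

rng = np.random.default_rng(4)
mu, eps, reps = 2.0, 0.1, 2_000          # reps = number of independent replications

for n in (10, 100, 1000, 5000):
    x = rng.exponential(scale=mu, size=(reps, n))
    xbar = x.mean(axis=1)                # one sample mean per replication
    prob_far = np.mean(np.abs(xbar - mu) >= eps)
    print(n, round(float(prob_far), 3))  # estimate of p(|X̄_n - mu| >= eps)
```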

Exercise 13.15 Let (Xn)_{n=1}^∞ be a sequence of random variables. Define the sequences

(X̄n)_{n=1}^∞ = ( (1/n) Σ_{k=1}^n Xk )_{n=1}^∞

(Vn)_{n=1}^∞ = ( (1/(n−1)) Σ_{k=1}^n (Xk − X̄n)² )_{n=1}^∞

Show that ∀n ∈ N,

X̄n+1 = (Xn+1 + n X̄n)/(n + 1)

n Vn+1 = (n − 1) Vn + (n/(n + 1)) (Xn+1 − X̄n)²
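The recursions in exercise 13.15 are the usual “online” updates for the sample mean and sample variance. A minimal sketch (with arbitrary simulated data) that applies them and checks the final values against direct computation:

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(size=1_000)

xbar, v = data[0], 0.0                  # X̄_1 = X_1; the value of V_1 plays no role below
for n, x_next in enumerate(data[1:], start=1):
    # X̄_{n+1} = (X_{n+1} + n X̄_n) / (n + 1)
    new_xbar = (x_next + n * xbar) / (n + 1)
    # n V_{n+1} = (n - 1) V_n + (n / (n + 1)) (X_{n+1} - X̄_n)^2
    v = ((n - 1) * v + (n / (n + 1)) * (x_next - xbar) ** 2) / n
    xbar = new_xbar

print(xbar, data.mean())                # running mean vs direct sample mean
print(v, data.var(ddof=1))              # running variance vs direct sample variance
```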


Exercise 13.16 Prove the following result: let X be an absolutely continuous random variable, Ω ⊆ R and γ = p (X ∈ Ω) ∈ (0, 1). Consider the following experiment: “n ∈ N realizations of X are taken independently”. Let Gn be the relative frequency with which a realization in Ω is obtained in the experiment. Then, Gn →p γ.

The “strong” law of large numbers gives almost sure convergence. In econometrics, the weak law usually suffices.

13.10 Central limit theorem

Theorem 13.39 (The Central Limit Theorem, CLT) Let (Xn)_{n=1}^∞ be a sequence of random variables. Define the sequence of random variables

(X̄n)_{n=1}^∞ = ( (1/n) Σ_{k=1}^n Xk )_{n=1}^∞

If (Xn)_{n=1}^∞ is i.i.d. and, for every n ∈ N, E (Xn) = µ ∈ R, V (Xn) = σ² ∈ R++ and MXn = MX is defined in an open neighborhood of 0, then

√n (X̄n − µ)/σ →d Z

where Z is a random variable with the standard normal distribution function, FZ (x) = ∫_{−∞}^{x} (1/√(2π)) e^{−u²/2} du.

Proof. Define the sequence (Yn)_{n=1}^∞ = ((Xn − µ)/σ)_{n=1}^∞ and denote by MY the moment generating function common to all the Yn variables (which we can do because (Xn)_{n=1}^∞ is i.i.d.). Note that E (Yn) = 0 and V (Yn) = 1, and that

(1/√n) Σ_{k=1}^n Yk = (1/√n) Σ_{k=1}^n (Xk − µ)/σ
= (√n/σ) Σ_{k=1}^n (Xk/n − µ/n)
= (√n/σ) (X̄n − µ)


so

M_{(√n/σ)(X̄n − µ)} (t) = M_{(1/√n) Σ_{k=1}^n Yk} (t)
= E ( e^{t (1/√n) Σ_{k=1}^n Yk} )
= E ( ∏_{k=1}^n e^{t Yk/√n} )
= ∏_{k=1}^n E ( e^{t Yk/√n} )
= ∏_{k=1}^n MY (t/√n)
= MY (t/√n)^n

where the fourth equality follows from independence. Taking a Taylor expansion of MY around 0,

MY (t/√n) = MY (0) + MY′ (0) (t/√n) + (1/2) MY″ (0) (t²/n) + R3 (t/√n)

where, by Taylor’s theorem,

lim_{n→∞} (t/√n)⁻² R3 (t/√n) = 0

By construction, MY (0) = 1, MY′ (0) = E (Yn) = 0 and MY″ (0) = E (Yn²) = V (Yn) = 1, which implies

MY (t/√n) = 1 + (1/2)(t²/n) + R3 (t/√n)

It then follows that

lim_{n→∞} MY (t/√n)^n = lim_{n→∞} ( 1 + (1/2)(t²/n) + R3 (t/√n) )^n
= lim_{n→∞} ( 1 + (1/n) ( (1/2) t² + n R3 (t/√n) ) )^n
= e^{t²/2}

where the last equality follows since

lim_{n→∞} (t/√n)⁻² R3 (t/√n) = 0 =⇒ lim_{n→∞} n R3 (t/√n) = 0

The latter suffices, since e^{t²/2} is the moment generating function of the standard normal distribution, and the moment generating function characterizes the distribution.
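A simulation sketch of the CLT (assumed ingredients for the illustration: Xn i.i.d. exponential with β = 1, so µ = σ = 1, with n = 1000 and 10,000 replications), comparing the empirical distribution of √n (X̄n − µ)/σ with the standard normal distribution function at a few points.

```python
import numpy as np
from math import erf, sqrt

Phi = lambda z: 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal distribution function

rng = np.random.default_rng(6)
mu = sigma = 1.0
n, reps = 1_000, 10_000

x = rng.exponential(scale=1.0, size=(reps, n))
z = sqrt(n) * (x.mean(axis=1) - mu) / sigma        # standardized sample means

for z0 in (-1.5, 0.0, 1.5):
    print(z0, round(float(np.mean(z <= z0)), 3), round(Phi(z0), 3))
```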

Exercise 13.17 Let (Xn)_{n=1}^∞ be a sequence of random variables. Define the sequence

(X̄n)_{n=1}^∞ = ( (1/n) Σ_{k=1}^n Xk )_{n=1}^∞

of random variables. If (Xn)_{n=1}^∞ is i.i.d. and, for every n ∈ N, E (Xn) = µ ∈ R and V (Xn) = σ² ∈ R++, show that

E ( √n (X̄n − µ)/σ ) = 0   and   V ( √n (X̄n − µ)/σ ) = 1

Exercise 13.18 How can both the law of large numbers and the central limit theorem be true? That is, if the law says that X̄n converges in probability to a constant (µ), and convergence in probability implies convergence in distribution, then how can √n (X̄n − µ)/σ also converge in distribution to the standard normal?

Please take a good look at what the CLT implies. In particular, it does not imply that every “large” sample of realizations of a random variable “tends” to be distributed normally!



