Einstein Institute of Mathematics @ The Hebrew University ...razk/Teaching/LectureNotes/AdvancedCalculus… · Contents 1 Metric Spaces 1 1.1 Basic deﬁnitions . . . . . . .

Lecture Notes in Advanced Calculus 1 (80315)

Raz KupfermanInstitute of MathematicsThe Hebrew University

February 7, 2007

2

Contents

1 Metric Spaces 11.1 Basic definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.2 Point set topology in metric spaces . . . . . . . . . . . . . . . . . 5

1.3 Compactness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

1.4 Continuous functions on metric spaces . . . . . . . . . . . . . . . 19

1.5 Convergence of sequences of functions . . . . . . . . . . . . . . . 27

1.6 Completeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

1.7 Contractive mapping theorem . . . . . . . . . . . . . . . . . . . . 42

1.8 Baire’s category theorem . . . . . . . . . . . . . . . . . . . . . . 45

2 Paths and integration along paths 492.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49

2.2 Differentiable paths in Rn . . . . . . . . . . . . . . . . . . . . . . 54

2.3 Homotopy and connectedness . . . . . . . . . . . . . . . . . . . 58

2.4 Sets of measure zero and the Peano curve . . . . . . . . . . . . . 63

2.5 Integration of a vector field along a path . . . . . . . . . . . . . . 67

2.6 Conservative fields . . . . . . . . . . . . . . . . . . . . . . . . . 71

3 Differential Calculus in Rn 793.1 Normed spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

3.2 Linear mappings between normed spaces . . . . . . . . . . . . . 81

3.3 Differentiability and derivatives . . . . . . . . . . . . . . . . . . . 88

ii CONTENTS

3.4 Multivariate mean value and Taylor theorems . . . . . . . . . . . 104

3.5 Minima and maxima . . . . . . . . . . . . . . . . . . . . . . . . 109

3.6 The inverse function theorem . . . . . . . . . . . . . . . . . . . . 111

3.7 The implicit function theorem . . . . . . . . . . . . . . . . . . . 118

3.8 Lagrange multipliers . . . . . . . . . . . . . . . . . . . . . . . . 121

4 Integration in Rn 127

Chapter 1

Metric Spaces

Lindenstrauss, Chapters IV and XI.

1.1 Basic definitions

Discussion: say something about abstraction and generalization of concepts we’vemet in previous courses.

Definition 1.1 (Metric space) A metric space is a pair (X, d), where X is a non-empty set, and d : X × X → R+ is a distance function, or, a metric. The metricd assigns to every pair of points x, y ∈ X a non-negative number, d(x, y), whichwe will call the distance between x and y. The metric has to satisfy three definingproperties:

À Symmetry: d(x, y) = d(y, x).

Á Positivity: d(x, y) ≥ 0 with equality iff x = y.

Â The triangle inequality, d(x, z) ≤ d(x, y) + d(y, z).

Comment: It follows by induction that for every finite set of points (xi)ni=1,

d(x1, xn) ≤ d(x1, x2) + d(x2, x3) + · · · + d(xn−1, xn).

2 Chapter 1

Comment: Metric spaces were first introduced by Maurice Frechet in 1906, uni-fying work on function spaces by Cantor, Volterra, Arzela, Hadamard, Ascoli,and other. Topological spaces were first introduced by Felix Haussdorf in 1914;their current formulation is a slight generalization from 1992 by Kazimierz Kura-towski.

Examples:

• Both R and C are metric spaces when endowed with the distance functiond(x, y) = |x − y|. From now on, we will often refer to the metric space R,without explicit mention of the metric.

• Let F : R → R be any strictly monotonic function (say increasing). Then,R endowed with the distance function

d(x, y) = |F(x) − F(y)|

is a metric space. We only need to verify the triangle inequality:

d(x, z) = |F(x) − F(z)| ≤ |F(x) − F(y)| + |F(y) − F(z)| = d(x, y) + d(y, x).

Note that the strict monotonicity is only needed to ensure the positivity ofd.

• Rn endowed with the Euclidean distance,

d(x, y) =

√√n∑

i=1

(xi − yi)2

is a metric space. Unless otherwise specified, Rn will always be assumed tobe endowed with the Euclidean metric.

• The surface of earth is a metric space, with d being the (shortest) distancealong great circles.

• Any set X can be endowed with the discrete metric,

d(x, y) =

0 x = y1 x , y.

This is a particularly boring metric, since it provides no information aboutthe structure of the space. We will often use it, however, as a clarifyingexample.

Metric Spaces 3

• Let (X, d) be an arbitrary metric space, and let Y ⊂ X. Then the set Y withthe function d restricted to Y is a metric space. We will call d|Y the metricon Y induced by the metric on X. A subspace of a metric space alwaysrefers to a subset endowed with the induced metric.

N

. Exercise 1.1 (Ex. 3 in L.) Prove that the set of infinite sequences of realnumbers, endowed with the function

d((xn), (yn)) =∞∑

n=1

12n

|xn − yn|

1 + |xn − yn|

is a metric space.

Recall the definition of a normed space: a vector space X endowed with a func-tion ‖ · ‖ : X → R (the norm), subject the three conditions of positivity, homo-geneity, and the triangle inequality. Obviously, every normed space is a metricspace, if we define

d(x, y) = ‖x − y‖.

Note, however, that normed spaces carry properties that are not pertinent to allmetric spaces. Metric spaces need not be vector spaces. In addition, vector spaces,as metric spaces, are translational invariant,

d(x − z, y − z) = d(x, y),

and homogeneous,d(λx, λy) = |λ| d(x, y),

where λ is a scalar.

Examples:

• Both R and C are normed space, with ‖x‖ = |x|, and the resulting metric isthe same as the one before.

• For X = Rn and 1 < p < ∞, the function ‖ · ‖p : X → R defined by

‖x‖p =

n∑i=1

|xi|p

1/p

is a norm, hence defines a metric. We give this family of metric spaces aname, `np.

4 Chapter 1

• The function ‖ · ‖∞ : Rn → R defined by

‖x‖∞ = max1≤i≤n|xi|.

is a norm. The corresponding metric space is denoted by `n∞.

• The set of infinite sequences (xi) for which

‖x‖p =

∞∑i=1

|xi|p

1/p

is finite is an example of an infinite-dimensional normed space (hence aninfinite-dimensional metric space). We denote this space by `p.

• Consider C[0, 1], the set of continuous real-valued functions on [0, 1]. Thefunction ‖ · ‖∞ : C[0, 1]→ R defined by

‖ f ‖∞ = max0≤x≤1

| f (x)|

is a norm, therefore C[0, 1] is a metric space if endowed with the distancefunction

d( f , g) = max0≤x≤1

| f (x) − g(x)|.

N

. Exercise 1.2 Show that the functions ‖ · ‖p : Rn → R, defined by

‖x‖p =

n∑i=1

|xi|p

1/p

1 < p < ∞

satisfy the definition of a norm. (Hint: the triangle inequality is proved throughthe Young inequality, followed by Holder’s inequality, followed by Minkowski’sinequality. Prove all three inequalities.)

. Exercise 1.3 (Ex. 4 in L.) Let x = (x1, x2, . . . , xn). Prove that the function

ϕ(p) = ‖x‖p

defined for 1 < p < ∞ is monotonically decreasing, and that ϕ(p) → ‖x‖∞ asp→ ∞.

Metric Spaces 5

1.2 Point set topology in metric spaces

Throughout this section we will assume a metric space (X, d).

r

aX

Definition 1.2 A sphere of radius r centered at a ∈ X is the set of points,

{x ∈ X : d(x, a) = r} .

(Note that this set may be empty). An open ball of radius r centered at a is the set

B0(a, r) = {x ∈ X : d(x, a) < r} ,

whereas the closed ball is defined by

B(a, r) = {x ∈ X : d(x, a) ≤ r} .

Comment: In R3 these definitions coincide with our intuitive pictures of spheresand balls. Note, however, that for a space endowed with the discrete metric wealways have

B0(a, r) =

{a} r ≤ 1X r > 1.

. Exercise 1.4 Draw the unit balls in R2 for the metrics induced by the 1-, 2-,and∞-norms.

A

A

A

A bounded set

An open set

Definition 1.3 A set A ∈ X is called bounded if it is enclosed in some ball,it is called open if every point x ∈ A is the center of an open ball containedin A (note that we can replace “open” by “closed”, since if B0(a, r) ⊆ A, thenB(a, r/2) ⊆ A).

Comment: By definition, X itself is always an open set. By convention, the emptyset ∅ is always considered to be open as well.

Proposition 1.1 In every metric space (X, d),

À The intersection of any finite number of open sets is open.Á The union of any collection (not necessarily countable) of open sets is open.Â An open ball is an open set.

6 Chapter 1

Ã A set is open iff it is the union of open balls.

Proof :

À Let (Ai)ni=1 be a finite collection of open subsets of X and set A = ∩n

i=1Ai.If A = ∅ then we are done. Otherwise, for every x ∈ A we need to showthat there exists an r > 0 such that B0(x, r) ⊆ A. Since x belongs to each ofthe Ai, which are all open in X, there exists a collection of positive numbers(ri)n

i=1 such thatB0(x, ri) ⊆ Ai, i = 1, . . . , n.

Take, for example, r = minni=1 ri, then

B0(x, r) ⊆ Ai, i = 1, . . . , n,

i.e., B0(x, r) ⊆ A.

Á Every point x in the union A = ∪αAα belongs to at least one open set Aβ,therefore there exists an r > 0 such that B0(x, r) ⊆ Aβ ⊆ A.

a

x

r

d(a,x)

Â Let B0(a, r) be an open ball, and let x ∈ B0(a, r). We claim that the open ballB0(x, r − d(a, x)) in contained in B0(a, r). Indeed, if y ∈ B0(x, r − d(a, x)),then by definition,

d(y, x) < r − d(a, x),

i.e.,d(a, y) ≤ d(a, x) + d(x, y) < r,

so y ∈ B0(x, r − d(a, x)) implies y ∈ B0(a, r).

Ã We have already seen that every open ball is open and that any union ofopen sets is open. Thus, it only remains to show that every open set can berepresented as a union of open balls. By definition, to every point x in aopen set A corresponds an open ball B0(x, r(x)) included in A. Clearly,

A =⋃x∈A

x ⊆⋃x∈A

B0(x, r(x)) ⊆ A.

n(2 hrs)

Metric Spaces 7

Comment: An infinite intersection of open sets is not necessarily open. Takefor example X = R with the standard metric. Then for every integer n the set(−1/n, 1/n) is open. On the other hand,

∞⋂n=1

(−

1n,

1n

)= {0}

is not open.

Definition 1.4 A set A ⊆ X is called closed it is complement Ac is open.

The following proposition is a direct consequence of the previous proposition,using de Morgan’s laws,

Proposition 1.2 À Every finite union of closed sets is closed.Á Every arbitrary intersection of closed sets is closed.

. Exercise 1.5 Show, by example, that it is not generally true that an infiniteunion of closed sets is closed.

Comments:

• X and ∅ are both open and closed.

• For every x ∈ X the singleton {x} is a closed set. Every y , x is the centerof an open ball B(y, d(x, y)) ⊂ {x}c.

• By the previous proposition, every finite collection of points is closed aswell.

• Every closed ball B(a, r) is closed. Indeed, if y ∈ Bc(a, r) then B0(y, d(y, a)−r) ⊆ Bc(a, r).

• In a set endowed with the discrete metric, every subset is both open andclosed. To prove it, it is sufficient to show that every singleton is both openand closed. We have already seen that singletons are always closed, regard-less of the metric. They are also open since B0(x, 1/2) = {x}.

• A metric space endowed with the collection of its open subsets is an instanceof a topological space (a set with a collection of subsets, the open sets, thatare closed under finite intersection, arbitrary union, and contain the emptyset and the whole space).

8 Chapter 1

Definition 1.5 (Limit of a sequence) A sequence (xn) in a metric space (X, d) iscalled convergent if there exists a point x ∈ X such that

limn→∞

d(xn, x) = 0.

(For every ε > 0 there exists an Nε such that for every n > Nε , d(xn, x) < ε.)Thepoint x is called the limit of the sequence (xn), and we write

limn→∞

xn = x or xn → x.

Comments:

• If a sequence converges then its limit is unique. Indeed, suppose that asequence (xn) has two limits x and y. Then, for every ε > 0 there exists a(sufficiently large) n such that

d(xn, x) < ε and d(xn, y) < ε,

from which we conclude that for every ε > 0,

d(x, y) ≤ d(x, xn) + d(xn, y) < 2ε,

i.e., d(x, y) = 0.

• This notion of convergence coincides with that of convergence in topologi-cal spaces: (xn) converges to x if for every open set A that contains x thereexists an NA such that xn ∈ A for all n > NA (the sequence is eventuallyconfined to every neighborhood of x).

. Exercise 1.6 Let (X, ρ) be a metric space and define for every x, y ∈ X,

d(x, y) =ρ(x, y)

1 + ρ(x, y).

À Show that d is a metric on X.

Á Show that convergence with respect to this new metric d is equivalent toconvergence with respect to the original metric ρ. That is, prove that asequence (xn) is convergent in (X, d) if and only if it is convergent in (X, ρ).

Metric Spaces 9

. Exercise 1.7 Let X = {0, 1}N (the space of infinite binary sequences), anddefine for every two sequences (xn), (yn) ∈ X,

d1((xn), (yn)) =∞∑

n=1

2−n|xn − yn|

d2((xn), (yn)) = 2−min(i|xi,yi).

À Show that these are indeed metrics.Á Show that these two metrics are equivalent.

A

x

Proposition 1.3 A subset A ⊆ X is closed iff the limit of every convergent se-quence of points (xn) is also in A. That is, A is closed iff for every (xn) ⊂ A suchthat there exists an x ∈ X for which xn → x, x ∈ A.

Proof : Suppose first that A is closed (i.e., its complement is open), and let (xn) bea sequence in A with limit x (a priori, not necessarily in A). We need to show thatx ∈ A, or that x < Ac. If, by contradiction, x ∈ Ac, then it is the center of an openball B0(x, a) ⊂ Ac. This implies that d(xn, x) > a for all n, in contradiction with xbeing the limit of xn.

Suppose then that A has the property that every convergent sequence in A has itslimit in A. We need to show that Ac is open. If this weren’t the case, there wouldexists a point x < A such that all the open balls B0(x, 1/n) have a non-emptyintersection with A. For every n take a point xn ∈ A ∩ B0(x, 1/n). We thus obtaina sequence in A that converges to x < A, hence, a contradiction. n

Definition 1.6 Let A ⊂ X be a set in a metric space. Its interior, A◦, is the unionof all open subsets of A (it is the largest open subset of A). Its closure, A, isthe intersection of all closed sets that include A (it is the smallest closed set thatincludes A).

Comment: Obviously, A◦ is open and A is closed.

Example: Let X = R and A = Q. Then, A◦ = ∅ and A = R. N

. Exercise 1.8 Prove that for every A ⊂ X

(A◦)c = Ac.

10 Chapter 1

Proposition 1.4 The closure of a set A is the set of points that are the limit ofsome sequence in A,

A = {x ∈ X : ∃(xn) ⊂ A such that xn → x} .

Proof : Define

B = {x ∈ X : ∃(xn) ⊂ A such that xn → x} .

We need to show that both A ⊆ B and B ⊆ A.

(1) We first show that B ⊆ A. Let x ∈ B. Then it is the limit of a sequence (xn) ⊂ A.Since A ⊂ A, this sequence is also a sequence in A. But A is closed, hence thelimit x must be in A, i.e., B ⊆ A.

(2) We then show that A ⊆ B. We first argue that B is closed; indeed, every pointin its complement must have an open neighborhood that does not intersect A, i.e.,a neighborhood that does not intersect B. On the other hand, every point in A isalso in B (the trivial sequence). Thus, B is a closed set including A, and thereforeA ⊆ B. n

Definition 1.7 The boundary ∂A of a set A is defined as

∂A = A \ A◦.

Comment: The boundary is always closed since it is the intersection of two closedsets,

∂A = A ∩ (A◦)c

. Exercise 1.9 (Ex. 16 in L.) Let A, B be subsets of a metric space (x, d). Provethat

À A ∪ B = A ∪ B.Á A ∩ B ⊂ A ∩ B.Â If A is open then (∂A)◦ = ∅.Ã A is closed iff ∂A ⊂ A.Ä A is open iff ∂A ∩ A = ∅.

Metric Spaces 11

. Exercise 1.10 Let A be a set in a metric space (X, d). Show that its boundary∂A coincides with the set of points x, such that every open ball with center at xintersects both A and Ac.

!A

A

Consider now a metric space (X, d). Any Y ⊂ X is a metric space with the inducedmetric. Note, however, that a set A ⊂ Y can be open in the metric space (Y, d), butclosed in the metric space (X, d) (e.g., the set Y itself is always open in (Y, d), butcould be closed in (X, d)).

Proposition 1.5 Let (Y, d) be a subspace of the metric space (X, d). A set A ⊂ Yis open in (Y, d) iff it is the intersection of Y and a set that is open in (X, d). Thesame applies if we replace “open” by “closed”.

Proof : (1) Let A ⊂ Y be open in (Y, d). we can express it as follows,

A = ∪y∈Y {z ∈ Y : d(z, y) < r(y)}= ∪y∈YY ∩ {z ∈ X : d(z, y) < r(y)}

= Y ∩(∪y∈Y {z ∈ X : d(z, y) < r(y)}

).

(2) Let A ⊂ X be open in (X, d) and set B = A∩Y . We need to show that B is openin (Y, d). Take y ∈ B ⊂ A. Because A is open, there exists an r such that

{z ∈ X : d(z, y) < r} ⊆ A.

It follows that{z ∈ Y : d(z, y) < r} ⊆ A ∩ Y = B,

hence B is open in (Y, d). n (4 hrs)

. Exercise 1.11 (Ex. 5 in L.) A metric space (X, d) is called separable if thereexists a countable set {xn} ⊂ X such that

X = {xn}.

Show that every subspace (Y, d) of a separable metric space (X, d) is separable.

. Exercise 1.12 (Ex. 6 in L.) (a) Prove that in a normed space,

(B(a, r))◦ = B0(a, r) and B0(a, r) = B(a, r).

(b) Prove that this is not necessarily true in a general metric space.

12 Chapter 1

. Exercise 1.13 (Ex. 12 in L.) A point a in a subset A of a metric space X iscalled an accumulation point of A if in any open ball centered at a there existsat least one point of A other than a. Prove that a is an accumulation point of A iffthere exists a sequence of distinct points in A that converges to a.

. Exercise 1.14 (Ex. 18 in L.) Prove that the only sets in R that are both openand closed are R and ∅.

. Exercise 1.15 (? Ex. 26 in L.) A point x ∈ A ⊂ X is called isolated in A ifthere exists a ball B(x, r) such that B(x, r) ∩ A = {x}. Prove that if A ⊂ R is anon-empty, countable and closed, then it contains at least one isolated point.

. Exercise 1.16 (? Ex. 27 in L.) Prove that `∞, the space of infinite sequencewith the metric induced by the norm

‖x‖∞ = supn|xn|

is not separable. Hint: use Cantor’s diagonalization technique.

1.3 Compactness

Definition 1.8 (Compactness) A metric space (X, d) is called compact if everysequence (xn) has a convergent subsequence. A subset K ⊂ X is called com-pact if the induced subspace (K, d) is compact (i.e., if every sequence in K has asubsequence that converges to a limit in K).

Comment: Note that the compactness of a set is a property pertinent only to theset, whereas openness/closedness is property of a set in relation with the entirespace.

. Exercise 1.17 (Ex. 17 in L.) Let (X, d) be a metric space and (xn) a sequencethat converges to a limit x. Prove that the set x ∪ {xn} is compact.

Proposition 1.6 A finite union of compact sets is compact.

Proof : Given a sequence (xn) ⊂ ∪ni=1Ki = A, Ki compact, it must have an infinite

subsequence in at least one of the Ki’s. Since this Ki is compact, there is a sub-subsequence that converges in Ki hence in A. n

Metric Spaces 13

Examples:

• Every metric space that has a finite number of elements is compact.

• A set in a discrete space is compact iff it is finite (if it is infinite, a sequencethat never repeats itself has no converging subsequence).

• The Bolzano-Weierstraß theorem implies that a closed segment on the lineis compact.

N

Proposition 1.7 A compact set in a metric space is both bounded and closed.

Proof : Let K be a compact set in a metric space (X, d).

Closed By definition, every sequence (xn) ⊂ K has a subsequence that convergesto a point in K. In particular, if (xn) is a sequence that converges to a point x ∈ X,the point x must be in K. It follows from Proposition 1.3 that K is closed.

a

x2

x1

Bounded Suppose, by contradiction, that K was not bounded. Given a pointa ∈ X, no open ball B0(a, n) contains all of K. Thus, we can find a point

x1 ∈ K such that d(x1, a) > 1.

Then, we can find a point

x2 ∈ K such that d(x2, a) > 1 + d(x1, a),

and it follows that

d(x2, x1) + d(x1, a) ≥ d(x2, a) > 1 + d(x1, a),

i.e., d(x2, x1) > 1. We can further find a point

x3 ∈ K such that d(x3, a) > 1 + d(x2, a),

whose distance from both x1 and x2 is greater than 1. We can thus generate asequence (xn) ⊂ K, which has no converging subsequence, in contradiction withK being compact. n

14 Chapter 1

Comment: People often identify “compact” with “bounded and closed”. In gen-eral, the implication is only one-directional, although we will soon see situationsin which there is indeed a correspondence between the two. A simple counterexample is the segment (0, 1), which is closed (as a space) and bounded, but notcompact.

Proposition 1.8 If (X, d) is a compact metric space, then every closed subset iscompact.

Proof : This is quite obvious. Let (xn) ⊂ K ⊆ X. Since X is compact, (xn) has asubsequence (xnk) that converges to some limit x ∈ X. But because K is closed,this limit must be in K, hence K is compact. n

Corollary 1.1 In a compact metric space closedness and compactness are thesame.

Theorem 1.1 (Bolzano-Weirestraß in Rm) In Rm (endowed with the Euclideanmetric) a set is compact iff it is closed and bounded.

Proof : We only need to show that closedness and boundedness imply compact-ness. Let K ⊂ Rm be closed and bounded. Thus, there exists an M > 0 suchthat

maxi=1,...,n

|xi| < M ∀x ∈ K.

Let now (xn) = (x1n, . . . , x

mn ) be a sequence in K. Consider the sequence formed by

the first component, (x1n). By the Bolzano-Weierstraß theorem, (xn) has a subse-

quence such that the first component has a limit x1. Consider now the second com-ponent: the subsequence has a sub-subsequence such that the second componentconverges to a limit x2. We repeat this procedure n times, until we obtain a sub-sequence which converges to a limit x = (x1, . . . , xm) component-by-component.Since K is closed, this limit is in K, hence K is compact. n

. Exercise 1.18 (Ex. 7 in L.) Prove that in a metric space in which all the closedballs are compact, the compact sets coincide with the sets that are closed andbounded.

A

a

Definition 1.9 Let (X, d) be a metric space with A, B ⊂ X and a ∈ X. Then wedefine the distance between a point and a set, and the distance between two sets:

d(a, A) = infx∈A

d(a, x) and d(A, B) = infx∈A,y∈B

d(x, y).

Metric Spaces 15

Proposition 1.9 Let A, B ⊂ X be nonempty, closed disjoint sets, such that at leastone of them is compact. Then d(A, B) > 0.

Comment: A “counter-example” (an example that highlights the importance ofthe conditions): let X = R2 with A = {(x, 1/x) : x > 0} and B = {(x, 0) : x ≥ 0}.The sets A, B are non-empty, disjoint and closed (why?). Yet, d(A, B) = 0. Thereis no contradiction because neither A nor B is compact.

Proof : Suppose w.l.o.g. that A were compact, i.e., every sequence in A has a sub-sequence that converges to a point in A. Assume, by contradiction, that d(A, B) =0. Then, there exists sequences (xn) ⊂ A, (yn) ⊂ B, such that d(xn, yn) → 0. Thesequence (xn) has a subsequence (xnk) with limit x ∈ A, Now,

d(x, ynk) ≤ d(x, xnk) + d(xnk , ynk)→ 0,

from which follows at once that ynk → x. But since B is closed, x ∈ B, contradict-ing the disjointness of A and B. n

AK

d!K,Ac"Corollary 1.2 If A is open in (X, d) and K ⊆ A is compact, then Ac is closed anddisjoint from K, from which follows that d(K, Ac) > 0.

Definition 1.10 Let (X, d) be a metric space. A covering of X is a collection ofsubsets {Aα}, whose union coincides with X. A covering is called a finite coveringif the number of sets is finite. It is called an open covering if all the Aα are opensets (in X).

Let {Aα} be a covering of X. One may ask whether we could still cover X witha smaller collection of subsets. Consider for example the case where X = R; thecountable collection of subsets

{(n − 1, n + 1) : n ∈ Z}

is a covering of R. The omission of any of those (n − 1, n + 1) would fail to coverthe point x = n. As the following theorem shows, compactness can be definedalternatively as “every open covering can be reduced to a finite sub-covering”. (6 hrs)

Theorem 1.2 (Heine-Borel) The following three properties of a metric space (X, d)are equivalent:

À X is compact.

16 Chapter 1

Á Every open covering of X has a finite sub-covering.Â Every collection of closed sets {Fα}α∈I has the property that if the intersec-

tion of every finite sub-collection is not empty, then the intersection of theentire collection is not empty.

Proof :

• The equivalence of properties and ® follows from the identity,

(∩αFα)c = ∪αFcα.

Note that the Fcα are a collection of open sets. Now,

∀ {Fα}α∈I :⋃α∈I

Fcα = X ⇒ exists a finite J ⊂ I s.t.

⋃α∈J

Fcα = X

is equivalent to

∀ {Fα}α∈I :

⋃α∈I

Fcα

c

= ∅ ⇒ exists a finite J ⊂ I s.t.

⋃α∈J

Fcα

c

= ∅,

which in turn is equivalent to

∀ {Fα}α∈I :⋂α∈I

Fα = ∅ ⇒ exists a finite J ⊂ I s.t.⋂α∈J

Fα = ∅.

Finally, this is equivalent to

∀ {Fα}α∈I : for all finite J ⊂ I ,⋂α∈J

Fα , ∅. ⇒⋂α∈I

Fα , ∅

• We next show that ® implies ¬. Let (xn) be a sequence, and consider the(not necessarily closed) sets

An = {xn, xn+1, . . . } .

This is a decreasing sequence of sets that has the property that every finitesub-collection has a non-empty intersection. By assumption, the countableintersection, ∩∞n=1An is not empty: there exists a point

x ∈∞⋂

n=1

{xn, xn+1, . . . }.

Metric Spaces 17

This point x belongs to each of the closed sets Ak, and therefore in eachof the Ak there exists a point xnk whose distance from x is less than 1/k.Thus, we can construct a subsequence (xnk) that converges to x, hence X iscompact.

• It remains to show that ¬ implies , for which we need two lemmas.

n

Lemma 1.1 Let (X, d) be a compact metric space, and F an open covering ofX. Then there exists an ε > 0 such that every open ball of radius ε (or less) iscontained in at least one of the sets in F .

Comment: Given an open covering, the supremum over all such ε is called theLebesgue number of the covering.

Proof :

X

12

Suppose that the desired property does not hold, i.e., we can construct a sequenceof open balls B0(xn, 1/n), that none of them is contained in any of the sets in F .Since X is compact, the sequence (xn) has a convergent subsequence xnk → x.Since F is a covering, there exists an open set A ∈ F such that x ∈ A. Moreover,since A is open, there exists an open ball B0(x, r) contained in A. Since x is apartial limit of (xn), there exists infinitely many n for which d(xn, x) < r/2, and atleast one for which 1/n < r/2. Thus, the set B0(xn, 1/n) is a subset of A ∈ F incontradiction with our assumption. n

Lemma 1.2 Let (X, d) be a compact metric space. Then for every ε > 0, X can becovered by a finite set of open balls of radius ε.

Proof : Let ε > 0 be given and let x0 ∈ X be an arbitrary point. If B0(x0, ε) =X then we are done. Otherwise, there exists a point x1 ∈ X \ B0(x0, ε). IfB0(x0, ε) ∪ B0(x1, ε) = X, then we are done, otherwise, etc. Thus, either thelemma is correct, or we can construct an infinite sequence of point (xn) that arepairwise at a distance greater than ε. Such a sequence does not have a convergentsubsequence, in contradiction to X being compact. n

Completion of the Heine-Borel theorem: Let X be compact and let F be an opencovering. By the first lemma there exists an ε > 0 such that every open ball of

18 Chapter 1

radius ε is contained in an element of F . By the second lemma, we can coverX with a finite number of such open balls. This finite collection of open balls iscontained in a finite collection of sets in F , which is therefore a covering.

Comment: The term compactness was introduced by Maurice Frechet in 1906. Itwas primarily introduced in the context of metric spaces in the sense of “sequentialcompactness”. The “covering compact” definition has become more prominentbecause it allows us to consider general topological spaces, and many of the oldresults about metric spaces can be generalized to this setting. This generalizationis particularly useful in the study of function spaces, many of which are not metricspaces.

One of the main reasons for studying compact spaces is because they are in someways very similar to finite sets. In other words, there are many results which areeasy to show for finite sets, the proofs of which carry over with minimal changeto compact spaces.

Comment: A “counter-example”: the space X = (0, 1) is not compact, yet forevery ε > 0 we can cover (0, 1) with a finite number of segments of size ε. Thereis no contradiction. We can form an open covering of (0, 1),

F = {(1/(n + 2), 1/n) : n ∈ N} ,

which does not have a finite sub-covering.

The Heine-Borel theorem refers to compact metric spaces. An analogous theoremapplies for compact sets in metric spaces:

Theorem 1.3 (Heine-Borel for sets) A set K in a metric space X is compact iff forevery collection of open sets {Aα}α∈I , such that K ⊂ ∪α∈IAα, there exists a finitesub-collection {Aα}α∈J, such that K ⊂ ∪α∈JAα.

. Exercise 1.19 Prove it.

Definition 1.11 A set A ⊂ X is called dense in X if A = X (every non-empty openset in X intersects A). A metric space is called separable if is contains a densecountable subset.

Example: R is separable because it contains the countable dense set Q. So is anyRn, which contains the countable dense set Qn. N

Metric Spaces 19

Proposition 1.10 A compact metric space is separable.

Proof : If X is compact, then it can be covered with a finite number of open ballsof size 1. The centers of these balls form the first elements of a countable set. Thenext elements are the finite number of points that are the centers of open balls ofsize 1/2 that cover X. Thus, we proceed, forming a countable set of points in X.It is obvious that this set is dense in X. n

. Exercise 1.20 (Ex. 8 in L.) Let (X, d) be a metric space in which the closed-and-bounded sets are compact (like Rn). Prove that for any a ∈ X and F ⊂ Xclosed, there exists a point f ∈ F such that

d(a, f ) = d(a, F).

. Exercise 1.21 (Ex. 20 in L.) Show that the finite product space of compactspaces is compact: Let (X1, d1), . . . , (Xn, dn) be metric spaces and consider theproduct space

X =n∏

i=1

{(x1, . . . , xn) : x j ∈ X j ∀ j

},

andd(x, y) = max

j=1,...,ndi(xi, yi).

(1) Prove that (X, d) is a metric space. (2) Prove that if the (Xi, di) are compactthen (X, d) is compact.

1.4 Continuous functions on metric spaces

Metric spaces are an abstraction that generalizes the real line. Abstractions, suchas metric spaces, groups, rings, etc., never live on their own. They are alwaysassociated with mappings (morphisms, in the language of categories) betweeninstances of the same abstraction, which preserve certain properties (e.g., groupsand homomorphisms). In the first calculus courses we learned about continuousmappings (i.e., functions) R → R. This concept will be generalized into continu-ous mappings between metric spaces.

X Y

x

f!x"

!

"

20 Chapter 1

Definition 1.12 Let (X, d) and (Y, ρ) be two metric spaces. A mapping f : X → Yis said to be continuous at a point x ∈ X if for every ε > 0 there exists a δ > 0such that

d(x′, x) ≤ δ ⇒ ρ( f (x′), f (x)) ≤ ε.

That is, f is continuous at x if for every ball B( f (x), ε) ⊂ Y there exists a ballB(x, δ) ⊂ X whose image under f is contained in B( f (x), ε). A function is calledcontinuous if it is continuous at all points.

Examples:

• Let (X, d) be an arbitrary metric space and a ∈ X an arbitrary point. Thefunction x 7→ d(x, a) is a continuous mapping f : (X, d) → (R, | · |) becausefor every ε > 0 take δ = ε. Then d(x, z) ≤ ε implies that

| f (x) − f (z)| = |d(x, a) − d(z, a)| ≤ d(x, z) ≤ ε.

• Every function defined on a discrete space is continuous because we canalways take δ = 1/2.

• Every function f : X → Rn is continuous iff each of its component is acontinuous function X → R.

N

Proposition 1.11 (Sequential continuity) Let f : (X, d) → (Y, ρ) be a functionbetween two metric spaces. Then f is continuous at a iff for every sequence (xn)in X

limn→∞

xn = a ⇒ limn→∞

f (xn) = f (a).(8 hrs)

Proof : Suppose first that f is continuous at a and let (xn) be a sequence in X thatconverges to a. Since f is continuous, there exists for every ε > 0 a δ > 0 suchthat x ∈ B(a, δ) implies f (x) ∈ B( f (a), ε). Because xn → a, there exists an Nδsuch that xn ∈ B(a, δ) for all n > Nδ, hence f (xn) ∈ B( f (a), ε) for all n > Nδ, fromwhich follows that f (xn)→ f (a).

Suppose next that for every sequence (xn) that converges to a the sequence f (xn)converges to f (a). Suppose that f were not continuous: then there exists an ε >

Metric Spaces 21

0 such that for every δ > 0 there exists an xn such that d(xn, a) ≤ δ and yetρ( f (xn), f (a)) > ε, in contradiction with the assumption. n

The following theorem implies alternative definitions to the continuity of func-tions between metric spaces. In fact, these definitions are more general in thesense that they remain valid in (non-metric) topological spaces (i.e., spaces thatare only endowed with a structure of open sets).

Theorem 1.4 Let f : (X, d) → (Y, ρ) be a function between two metric spaces.Then the following three statements are equivalent:

À f is continuous.Á The pre-image of every set open in Y is a set open in X.Â The pre-image of every set closed in Y is a set closed in X.

Proof : For every set A ⊂ Y , its pre-image under f is defined as

f −1(A) = {x ∈ X : f (x) ∈ A} .

The equivalence between and ® follows from the complementation property ofthe pre-image:

f −1(Ac) = ( f −1(A))c.

Af!1"A#

x f"x#We now show that ¬ implies . Let A ⊂ Y be an open set, and let x ∈ f −1(A).Since A is open and f (x) ∈ A, there exists a ball B( f (x), ε) ⊂ A. Since f is contin-uous, there exists a ball B(x, δ) whose image under f is a subset of B( f (x), ε), i.e.,B(x, δ) ⊂ f −1(A), which implies that f −1(A) is open.

It remains to show that implies ¬. Let x ∈ X be an arbitrary point; by as-sumption, for every ε > 0 the pre-image of the set B0( f (x), ε) is open in X, andincludes the point x, i.e., there exists an open ball B0(x, δ) ⊂ f −1(B0( f (x), ε)),which implies continuity at x. n

Comment: A “counter-example”: continuous function not necessarily map opensets into open sets. For example the function f : R → R defined by f : x 7→ 0maps the open set (0, 1) into the closed set {0}.

Definition 1.13 (Function composition) Let f : (X1, d1) → (X2, d2) and g :(X2, d2) → (X3, d3) be functions between metric spaces. Their composition g ◦ fis a function (X1, d1)→ (X3, d3) defined by

(g ◦ f )(x) = g( f (x)).

22 Chapter 1

Proposition 1.12 (Composition of continuous mappings) The composition of con-tinuous mappings is a continuous mapping.

Proof : We can use either interpretation of continuity. The proof is immediate ifwe use the “topological interpretation” and observe that

(g ◦ f )−1 = f −1 ◦ g−1.

n

Corollary 1.3 For continuous functions f , g : X → R, the functions f ± g, andf · g are continuous. Furthermore, f /g is continuous at all points where g , 0.

Proof : The function x 7→ ( f (x), g(x)) is a continuous mapping from R to R2,because, by the separate continuity of f , g, there exists for every ε > 0 a δ > 0,such that |x − a| ≤ δ implies.

√| f (x) − f (a)|2 + |g(x) − g(a)|2 ≤ ε. Furthermore,

the function (x, y) 7→ x+y is a continuous function from R2 to R because for everyε > 0 we can take δ = ε/2 and√|x − a|2 + |y − b|2 ≤ ε/2 implies |(x+y)− (a+b)| ≤ |x−a|+ |y−b| ≤ ε.

Finally the mapping x 7→ f (x) + g(x) is a composition of these two continuousmapping. The same argument holds for subtraction and multiplication.

To prove the continuity of division, it is sufficient to show that the mapping x →1/x is continuous at all points where x , 0 (which we know from elementarycalculus). n

Proposition 1.13 (Continuity and compactness) Continuous functions betweenmetric spaces map compact sets into compact sets.

Proof : Let f : X → Y and K ⊂ X a compact set. Take a sequence (yn) in f (K),and consider a corresponding sequence (xn) in K such that f (xn) = yn (it is notnecessarily unique). Since K is compact, (xn) has a subsequence that converges toa limit x ∈ K. Because f is continuous, the image of this subsequence convergesto f (x) ∈ f (K), hence f (K) is compact. n

Corollary 1.4 Recall that in compact spaces, the closed sets coincide with thecompact sets: thus, if f : X → Y and X is compact, the image of every closed setis closed. By complementation, the image of every open set is open.

Metric Spaces 23

The following theorem is a generalization of a well-known theorem in elementarycalculus:

Proposition 1.14 (Extremal values in compact spaces) A continuous function froma compact metric space into R assumes a minimum and a maximum.

Proof : Let f : X → R with X compact. By the previous proposition, the setf (X) ⊂ R is compact, hence bounded and closed. Since it is bounded, it has afinite supremum, and since it is closed this supremum must belong to the set, i.e.,it is a maximum. n

Discussion: Note the advantage of working with abstract concepts: we provepowerful theorems without having to deal with problem-specific details.

Definition 1.14 (Uniform continuity) A mapping f : (X, d) → (Y, ρ) is calleduniformly continuous if there exists for every ε > 0 a δ > 0 such that for everya, b ∈ X,

d(a, b) ≤ δ implies ρ( f (a), f (b)) ≤ ε.

That is, it is uniformly continuous if it is continuous, and the same δ can be chosenfor all points. The function δ(ε) is called the continuity modulus of the function.

Definition 1.15 (Holder and Lipschitz continuity) A mapping f : (X, d)→ (Y, ρ)is called Holder continuous of order α if there exists a real number k > 0 suchthat for every a, b ∈ X,

ρ( f (a), f (b)) ≤ k [d(a, b)]α.

The mapping is called Lipschitz continuous if it is Holder continuous of orderα = 1.

. Exercise 1.22 (Ex. 9 in L.) Let F ⊂ X be a closed subset in a metric space.Prove that the function X → R defined by x 7→ d(x, F) is Lipschitz continuous.

Theorem 1.5 A continuous function on a compact metric space is uniformly con-tinuous.

Comment: Note that we can always replace this statement by “a continuous func-tion defined on a compact set in a metric space”.

24 Chapter 1

Proof : Let f : (X, d) → (Y, ρ) be continuous with X compact. Suppose it isn’tuniformly continuous. Then, there exists an ε > 0 such that for all δ = 1/n thereexist (xn, yn) such that

d(xn, yn) ≤ 1/n and yet ρ( f (xn), f (yn)) > ε.

Since X is compact, there exists a sub-sequence (xnk , ynk) such that xn → x andyn → y. Because d(xn, yn) ≤ 1/n it follows that x = y. Recall that the dis-tance function is continuous in each of its arguments: it follows that the mapping(x, y) → ρ( f (x), f (y)) is continuous i.e., ρ( f (xn), f (yn)) → ρ( f (x), f (x)) = 0, thusa contradiction. n(10 hrs)

Definition 1.16 (Homeomorphism) A mapping f : (X, d) → (Y, ρ) between met-ric spaces is called a homeomorphism if it is one-to-one, onto, and both f andf −1 are continuous, i.e., A ⊂ X is open iff f (A) ⊂ Y is open. Two metric spacesare called homeomorphic of there exists a homeomorphism between them.

Comment: It is easy to see that homeomorphism is an equivalence relation be-tween metric spaces: it is symmetric, reflexive and transitive. In fact, homeomor-phism is a topological concept (only requires the definition of open sets), and isalso called topological equivalence. Note that this equivalence is with respect toall topological properties (openness, closedness, compactness), but not necessar-ily with respect to metric properties, such as boundedness. Indeed, a homeomor-phism maps open sets into open sets, closed sets into closed sets, and compact setsinto compact sets.

Examples:

• The identity map from (Rn, ‖·‖p) to (Rn, ‖·‖q) is a homeomorphism, becauseopen balls with respect to the ‖ · ‖p metric are open sets (although not balls)with respect to the ‖ · ‖q metric.

• Every two open segments (a, b) and (c, d) on the line are homeomorphic,because the linear mapping x→ c+(d−c)(x−a)/(b−a) is a homeomorphism.

• Every open segment (a, b) is homeomorphic to the entire real line becauseit is homeomorphic to the open segment (−π/2, π/2) and the latter is home-omorphic to the real line though the homeomorphism x→ tan x.

Metric Spaces 25

N

Proposition 1.15 If f is a continuous, one-to-one mapping from a compact metricspace (X, d) onto an (arbitrary) metric space (Y, ρ), then f is a homeomorphism,i.e., f −1 is continuous.

Proof : Let K ⊂ X be a closed set. Since it is a closed subset of a compact space,it is compact, and by Proposition 1.13 f (K) is compact and, in particular, closed.By Theorem 1.4, if f maps closed sets into closed sets then f −1 is continuous. n

Discussion: How does one show that two metric spaces are not homeomorphic?In principle, we should show that there does not exist a homeomorphism betweenthem. We can’t of course check all mappings. The idea is to show that the twospaces differ in a topological invariant—a property that is preserved by homeo-morphisms.

Examples:

• R is not homeomorphic to a discrete space because in a discrete space allcompact sets are finite, whereas this not true in R. Thus, the image of [0, 1]in the discrete space cannot be compact.

• (0, 1) is not homeomorphic to [0, 1] because the first is not compact whereasthe second is.

• The unit square [0, 1] × [0, 1] is not homeomorphic to [0, 1]. Both spacesare compact, and it requires more subtle arguments to show it.

• Rn is not homeomorphic to Rm for m , n. This is a very deep theorem(invariance of domain, Brouwer 1912), which is hard to prove and beyondthe scope of this course (requires advanced topology).

N

. Exercise 1.23 (Ex. 13 in L.) Let F,G be sets in a metric space X and let Y bea metric space. Given two functions f : F → Y and g : g→ Y such that f = g onF ∩G, we define the function

( f ∧ g)(x) =

f (x) x ∈ Fg(x) x ∈ G.

26 Chapter 1

À Prove that if F,G are closed in X and f , g are continuous, then f ∧ g iscontinuous.

Á Prove that this is also correct if F,G are open in X.

Â Show that this is not necessarily the case if F is open and G is closed.

. Exercise 1.24 (Ex. 14 in L.) Let f : (0, 1) → R be continuous. Show that fcan be extended into a continuous function on [0, 1] iff f is uniformly continuouson (0, 1).

. Exercise 1.25 (Ex. 23 in L.) A function f : (X, d) → (Y, ρ) is called anisometry if

ρ( f (x), f (y)) = d(x, y) ∀x, y ∈ X.

À Show that every isometry from X onto Y is an homeomorphism. Give anexample to show that the converse is not true.

Á Prove that every isometry from R onto itself is of the form x 7→ ±x + a.

. Exercise 1.26 (Ex. 24 in L.) Show that every isometry from a compact spaceinto itself is onto.

. Exercise 1.27 (Ex. 19 in L.) Let A ⊂ X. The indicator function of the theset A is defined as

χA(x) =

1 x ∈ A0 x < A.

Show that the points of discontinuity of χA coincide with ∂A.

. Exercise 1.28 (Ex. 30 in L.) Prove that the following spaces are homeomor-phic (using the natural metrics):

À R2.

Á The unit disc: D2 ={(x, y) : x2 + y2 < 1

}.

Â The unit square: Sq2 = {(x, y) : max(|x|, |y|) < 1}.

Ã The unit sphere in R3 less one point, e.g.S 2 \ ? =

{(x, y, z) : x2 + y2 + z2 = 1

}\ {(0, 0, 1)}.

Metric Spaces 27

1.5 Convergence of sequences of functions

This section deals with sequences of functions between metric spaces.

Definition 1.17 (Pointwise and uniform convergence) A sequence ( fn) of func-tions from (X, d) to (Y, ρ) is said to converge pointwise to a limit function f , if forevery x ∈ X,

limn→∞

fn(x) = f (x),

i.e., if for every x ∈ X and ε > 0 there exists an Nx,ε such that

ρ( fn(x), f (x)) ≤ ε for all n > Nx,ε .

It is said to converge uniformly if N can be chosen independently of x.

Comments:

• Pointwise convergence means that

limn→∞ρ( fn(x), f (x)) = 0 ∀x,

whereas uniform convergence means

limn→∞

supx∈Xρ( fn(x), f (x)) = 0.

• A sequence of functions ( fn) that converges uniformly to f satisfies theCauchy criterion for uniform convergence,

limn,m→∞

supx∈Xρ( fn(x), fm(x)) = 0.

(This is because ρ( fn(x), fm(x)) ≤ ρ( fn(x), f (x)) + ρ( fm(x), f (x)).) It can beshown (see exercise below) that for functions X → R the Cauchy criterionis also a sufficient condition for uniform convergence.

. Exercise 1.29 Prove that for functions X → R the Cauchy criterion is a nec-essary and sufficient conditions for uniform convergence.

Examples:

28 Chapter 1

• The functions between the discrete spaces N and {0, 1} defined by fn : j 7→δ j,n converge to the constant function f : j 7→ 0 pointwise by not uniformly.In fact, it is easy to see that a sequence of functions fn into a discrete spaceconverges uniformly to f iff fn coincides with f eventually.

• The sequence of functions fn : [0, 1) → [0, 1) (with the standard metric)defined by fn(x) = xn converges to the function f , defined by f (x) = 0,pointwise, but not uniformly, because for every n,

sup0≤x<1

| fn(x) − f (x)| = sup0≤x<1

|xn − 0| = 1.

!

1/!

• Consider the following sequence of functions fn : R→ R:

fn(x) =

n2x 0 < x < 1/n2n − n2x 1/n < x < 2/n0 otherwise.

It converges pointwise to the function f = 0, but not only it does not con-verges uniformly, we even have

limn→∞

supx∈R| fn(x) − f (x)| = lim

n→∞n = ∞.

N

Consider the functions fn(x) = xn on the closed interval [0, 1]. Although eachof the fn is continuous, this sequence converges pointwise to the discontinuousfunction

f (x) =

0 0 ≤ x < 11 x = 1.

The following theorem shows that this can’t happen if the convergence is uniform.

Theorem 1.6 Let ( fn) be a sequence of functions from (X, d) to (Y, ρ), which con-verge uniformly to a function f . If all the fn are continuous, then the limit iscontinuous. Moreover, if all the fn are uniformly continuous on X, then f is uni-formly continuous on X.

Metric Spaces 29

Proof : Suppose that all the fn are continuous at a point x ∈ X and let (xm) be asequence that converges to x. For every n,m,

ρ( f (xm), f (x)) ≤ ρ( f (xm), fn(xm)) + ρ( fn(xm), fn(x)) + ρ( fn(x), f (x)).

Let ε > 0 be given. Because the fn converge to f uniformly, we can choose nsufficiently large such that the first and third term are less than ε/3, irrespective ofm. Having fixed n, there exists an N such that for all m > N, the second term isless than ε/3 (since fn is continuous). To conclude, for all ε > 0 there exists an Nεsuch that for all m > Nε , ρ( f (xm), f (x)) ≤ ε, i.e., f is continuous.

Assume next that the fn are uniformly continuous. For x, y ∈ X we write

ρ( f (x), f (y)) ≤ ρ( f (x), fn(x)) + ρ( fn(x), fn(y)) + ρ( fn(y), f (y)).

Since fn converges uniformly to f , given ε > 0, there exists an n such that thefirst and third terms are less than ε/3. Since this fn is uniformly continuous, thereexists a δ > 0 such that d(x, y) ≤ δ implies ρ( fn(x), fn(y)) ≤ ε/3, i.e., impliesρ( f (x), f (y)) ≤ ε. n

Proposition 1.16 If ( fn) are continuous and converge uniformly to f , then forevery sequence (xn) in X,

xn → x implies fn(xn)→ f (x).

Comment: A “counter example”: the sequence fn(x) = xn defined on [0, 1] con-verges pointwise to the function

f (x) =

0 0 ≤ x < 11 x = 1.

Consider the sequence xn = 1 − 1/n, which converges to 1. Then,

limn→∞

fn(xn) =(1 −

1n

)n

=1e, f (1).

Proof : We have

ρ( fn(xn), f (x)) ≤ ρ( fn(xn), f (xn)) + ρ( f (xn), f (x))≤ sup

y∈Xρ( fn(y), f (y)) + ρ( f (xn), f (x)).

30 Chapter 1

The first term on the right hand side converges to zero by the uniform convergenceof fn, whereas the second term converges to zero by the continuity of f (whichresults from the previous proposition). n

Definition 1.18 Let ( fn) be a sequence of functions from a metric space (X, d)into R (we need the target to be a normed space). The series

∑∞n=1 fn is said to

converge uniformly to f if the sequence of partial sums converges uniformly to f .

Theorem 1.7 (Weierstraß’ M-test) Let ( fn) be a sequence of functions from ametric space (X, d) into R. Suppose that there exists a sequence of real numbers(Mn) such that

supx∈X| fn(x)| ≤ Mn,

and that the series∑∞

n=1 Mn converges. Then, the series∑∞

n=1 fn converges uni-formly on X.

Example: The figure below shows the functions

S n(x) =n∑

k=1

sin kxk1.1

for n = 2, 20, 200, 2000. By the Weierstraß M-test, the sequence S n convergesuniformly on [0, 2π], and since each of the partial sums is a uniformly continuousfunction, the limit is uniformly continuous.

0 2 4 6−1.5

−1

−0.5

0

0.5

1

1.5

n=20 2 4 6

−1.5

−1

−0.5

0

0.5

1

1.5

n=20

0 2 4 6−1.5

−1

−0.5

0

0.5

1

1.5

n=2000 2 4 6

−1.5

−1

−0.5

0

0.5

1

1.5

n=2000

N

Metric Spaces 31

Proof : For every x the series S n(x) =∑

n fn(x) converges absolutely (satisfiesCauchy’s criterion), which defines a limiting function S (x). It remains to showthat the convergence is uniform.

Note that for every n,

S n(x) − S (x) =∞∑

k=n+1

fk(x),

i.e.,

|S n(x) − S (x)| ≤∞∑

k=n+1

| fk(x)| ≤∞∑

k=n+1

Mk,

and since the right hand side does not depend on x,

supx∈X|S n(x) − S (x)| ≤

∞∑k=n+1

Mk.

It remains to let n→ ∞. n

Comment: In particular, if all the fn are continuous, so is the series S . (12 hrs)

Proposition 1.17 Let fn : [a, b] → R be a sequence of continuous functions thatconverges uniformly on [a, b] to a limit f . Then,

limn→∞

∫ b

afn(x) dx =

∫ b

af (x) dx.

Comment: We restrict ourselves to functions R→ R because we only know whatare integrals of such functions.

Proof : The integrals exists because all the fn and their limit f are continuous(Theorem 1.6), hence Riemann integrable. Now, for every n,∫ b

af (x) dx −

∫ b

afn(x) dx =

∫ b

a[ f (x) − fn(x)] dx.

To prove the theorem, we need to show that the right hand side tends to zero:∣∣∣∣∣∣∫ b

a[ f (x) − fn(x)] dx

∣∣∣∣∣∣ ≤∫ b

a| f (x) − fn(x)| dx

≤

∫ b

asup

a≤y≤b| f (y) − fn(y)| dx = (b − a) sup

a≤y≤b| f (y) − fn(y)|,

32 Chapter 1

and since fn converges to f uniformly, the right hand side tends to zero. n

Theorem 1.6 provides a criterion for the limit of a sequence of continuous func-tions to be continuous (respectively, uniformly continuous). What about differ-entiability? Under what conditions is the limit of a sequence of differentiablefunctions differentiable?

Theorem 1.8 Let fn : (a, b) → R be a sequence of differentiable functions suchthat (i) the sequence fn(x) converges at at least one point c ∈ (a, b), and (ii) thesequence of derivatives, f ′n converges uniformly on (a, b) to a limit function g.Then,

À The sequence fn converges uniformly on (a, b) to a limit f .Á f is differentiable with f ′ = g.

Comments:

• We need the first condition, because the non-convergent sequence fn(x) = nsatisfies the second condition with g = 0.

• Since we have not assumed the fn to be continuously differentiable, we can’tassume g to be continuous, and not even integrable. This prevents us fromdefining f (x) =

∫ x

ag(y) dy, and then show that fn converges to f uniformly.

This will “cost us” in technical complications. Providing a simpler prooffor the case where the fn are continuously differentiable is left as an exercise(see below).

Proof : The fact that the target space is R allows us to prove uniform convergencebased on Cauchy’s criterion. For every m, n, we use the mean value theorem toget

| fm(x) − fn(x)| ≤ |( fm − fn)(x) − ( fm − fn)(c)| + | fm(c) − fn(c)|= |x − c||( f ′m − f ′n)(ξ)| + | fm(c) − fn(c)|,

where ξ is between x and c. Hence,

supx| fm(x) − fn(x)| ≤ (b − a) sup

x| f ′m(x) − f ′n(x)| + | fm(c) − fn(c)|.

The first term on the right hand side converges to zero, as m, n→ ∞, by Cauchy’scriterion for uniform convergence for the sequence of derivatives. The second

Metric Spaces 33

term converges to zero by the first assumption. Hence, the functions fn convergesuniformly (to a limiting function f ) by Cauchy’s criterion.

It remains to show that f ′ = g. Let x be an arbitrary (but fixed!) point, and set

ϕ(h) =

1h [ f (x + h) − f (x)] h , 0g(x) h = 0.

We will be done if we prove that ϕ is continuous at zero.

How can we show that a function is continuous? For example, by showing that itis the uniform limit of a sequence of continuous functions. Define the sequenceof functions

ϕn(h) =

1h [ fn(x + h) − fn(x)] h , 0f ′n(x) h = 0.

These functions are, by definition, continuous at zero, and we already know thatthey converge pointwise to ϕ. To show that they converge uniformly, we show thatthey satisfy Cauchy’s criterion: if h , 0 then

|ϕn(h) − ϕm(h)| =1h|( fn − fm)(x + h) − ( fn − fm)(x)|

≤ supy∈X| f ′n(y) − f ′m(y)|,

whereas for h = 0,

|ϕn(h) − ϕm(h)| = | f ′n(0) − f ′m(0)| ≤ supy∈X| f ′n(y) − f ′m(y)|.

Letting n,m→ ∞, and using the fact that the right hand sides do not depend on hand that ( f ′n) converge uniformly, we have that ϕn converges to ϕ uniformly, whichconcludes the proof. n

. Exercise 1.30 (Ex. 31 in L.) Let fn : (a, b)→ R be a sequence of continuouslydifferentiable functions such that (i) the sequence fn(x) converges at at least onepoint c ∈ (a, b), and (ii) the sequence of derivatives, f ′n converges uniformly on(a, b) to a limit function g. Prove that

limn→∞

( fn(x) − fn(c)) = limn→∞

∫ x

cf ′n(y) dy =

∫ x

cg(y) dy.

Conclude then that ( fn) has a limit, which we denote by f , and that f ′ = g.

34 Chapter 1

Proposition 1.17 and Theorem 1.8 can be reformulated in terms of convergentseries:

Corollary 1.5 (1) If the series of continuous real-valued functions,∑∞

n=1 fn, con-verges uniformly on an interval [a, b], then∫ b

a

∞∑n=1

fn(x)

dx =∞∑

n=1

∫ b

afn(x) dx.

(2) Let ( fn) be a sequence of differentiable functions on (a, b) such that the seriesof derivatives converges uniformly on (a, b), and there exists a point c ∈ (a, b) atwhich the series

∑∞n=1 fn converges, then the series converges uniformly on (a, b)

and ∞∑n=1

fn(x)

′ = ∞∑n=1

f ′n(x).

We have seen that the uniform convergence of continuous functions implies acontinuous limit. What about the other direction? Does the fact that a (pointwise)converging sequence of continuous functions have a continuous limit imply thatthe convergence is uniform?

Theorem 1.9 (Dini) Let ( fn) be a sequence of continuous real-valued functionsdefined on a compact metric space (X, d), and converging (pointwise) to the func-tion f , which is also continuous. If for every x ∈ X the sequence ( fn(x)) is mono-tonic, then fn converges to f uniformly.

Comment: There is no requirement that ( fn(x)) and ( fn(y)) be monotonic in thesame direction.

. Exercise 1.31 Find a “counter example” to Dini’s theorem where the mono-tonicity assumption does not hold.

Proof : Suppose, by contradiction, that the convergence was not uniform. Then,there exists an ε > 0, a sequence (xk) ⊂ X and a subsequence nk ⊂ N, such that

| fnk(xk) − f (xk)| > ε.

Furthermore, since the sequence ( fn(x)) is monotonic for every x, it follows that

| fm(xk) − f (xk)| ≥ | fnk(xk) − f (xk)| > ε ∀m ≤ nk.

Metric Spaces 35

Since X is compact, the sequence (xk) has a converging subsequence xkl with limitx. Since fm and f are continuous, the limit l→ ∞ may be taken, giving

| fm(x) − f (x)| > ε

for all m. This violate the assumption that fm converges to f pointwise. n

Alternative proof : Here is another proof that uses the “covering version” of com-pactness. Since the sequence fn converge pointwise and monotonically to f , thenthere exists for every ε > 0 and every point x a number N(x, ε), such that

| fn(x) − f (x)| ≤ε

3∀n > N(x, ε).

Moreover, since both fn and f are continuous, there exists a δ(x, ε) > 0 such that

| f (y) − f (x)| ≤ε

3and | fN(x,ε)(y) − fN(x,ε)(x)| ≤

ε

3∀y ∈ B0(x, δ(x, ε)).

The union of all B0(x, δ(x, ε)) is an open covering of X. Since X is compact, thereexists a finite sub-covering,

X =m⋃

i=1

B0(xi, δ(xi, ε)).

Let now N = maxN(xi,ε) (here we exploit the finiteness). Since every y ∈ X iscontained in one of these balls, say the k-th ball, then for all n > N (here weexploit the monotonicity),

| fn(y) − f (y)| ≤ | fn(y) − fn(xk)| + | fn(xk) − f (xk)| + | f (xk) − f (y)| ≤ ε,

which prove uniform convergence. n

Corollary 1.6 Let ( fn) be a sequence of non-negative, real-valued function definedon a compact metric space (X, d). If the sequence

∑∞n=1 fn converges pointwise and

its limit is continuous, then it converges uniformly on X.(14 hrs)

Definition 1.19 Let K be a compact metric space, and consider the space ofcontinuous real-valued functions over K; it is a vector space (verify!). We canendow this space with the supremum norm,

‖ f ‖ = maxx∈K| f (x)|.

This normed space is denoted by C(K).

36 Chapter 1

Comments:

• Why is this a norm? First, it is defined, since by Proposition 1.14 a contin-uous function on a compact set assumes a maximum. The properties of thenorm are readily verified.

• Recall that any norm induces a metric, d( f , g) = ‖ f − g‖. Convergencefn → f in this metric means that

limn→∞

d( fn, f ) = limn→∞

maxx∈K| fn(x) − f (x)| = 0,

i.e., uniform convergence. Uniform convergence is the “natural” mode ofconvergence in the space of continuous functions!

• Note the structure, which repeats itself everywhere in analysis: we havemetric spaces, and construct a new metric space from mappings betweentwo metric spaces.

• The segment [0, 1] is compact (or, is a compact metric space once endowedwith its natural metric). What about the space C[0, 1]? Is it compact? Ifit were, the functions fn = xn would have a subsequence which convergesuniformly, which is not the case. Moreover, the set { fn} is closed in C[0, 1]and bounded and yet, does not have a converging subsequence. That is,closedness and boundedness does not ensure compactness in C[0, 1].

Thus, knowing that a set of continuous functions on a compact set is closed andbounded does not ensure the existence of a converging subsequence. This is unfor-tunate, because compactness is often used to show the existence of certain func-tions (e.g., existence of solutions to ordinary differential equations). It turns outthat if closedness and boundedness are supplemented with another property, setsin C[0, 1] are compact.

Definition 1.20 (Equicontinuity) Let K be a compact metric space and let A bea collection of functions in C(K). This collection is said to be equicontinuous iffor every ε > 0 there exists a δ > 0 such that

|x − y| ≤ δ ⇒ | f (x) − f (y)| ≤ ε ∀ f ∈ A ∀x, y ∈ K.

That is, all the f ∈ A are uniformly continuous, and the same “modulus of conti-nuity” δ(ε) applies for all f ∈ A.

Metric Spaces 37

Theorem 1.10 (Arzela-Ascoli) A closed and bounded set A ⊂ C(K) is compactiff it is equicontinuous.

Proof : Suppose all three conditions hold. Since the collection is bounded, thereexists an M such that1

supf∈A

(maxx∈K| f (x)|

)≤ M.

Consider a sequence of functions ( fn) ⊂ A. We need to show that it has a subse-quence which converges uniformly.

Let (xn) be a sequence of points which is dense in K; such a sequence exists byProposition 1.10 (compact spaces are separable). The sequence of real numbers( fn(x1)) is bounded hence there exists a subsequence ( f (1)

n ), which converges atthe point x1. Take then the sequence f (1)

n (x2); by the same argument, it has asubsequence ( f (2)

n ) which converges at the point x2 (and x1). Thus we proceedinductively, constructing a sequence of subsequences, ( f (k)

n ), which for given kconverge at the point x1, · · · , xk.

The following picture is helpful:

f (1)1 (x) f (1)

2 (x) · · · f (1)n (x) · · ·

f (2)1 (x) f (2)

2 (x) · · · f (2)n (x) · · ·

· · · · · · · · · · · ·

f (k)1 (x) f (k)

2 (x) · · · f (k)n (x) · · · .

Every row is a sequence of functions which is (i) a subsequence of the previousrow, and (ii) converges at a finite number of points.

Consider now the “diagonal” sequence f (n)n (x). Since it is a subsequence of f (1)

n itconverges at the point x1. By the same argument, it converges at every point xk,i.e., it converges pointwise in a dense set of points.

So far, we have not used equicontinuity. We will use it now to show that f (n)n

converges uniformly on K. Let ε > 0 be given, and let’s take δ according to thedefinition of equicontinuity. Since K is compact, we can cover it with a finite

1Recall that bounded means that there exists a function g ∈ A and an r > 0 such that for allf ∈ A

d( f , g) ≤ r ⇒ supf∈A

(max

x| f (x) − g(x)|

)< r. ⇒ sup

f∈A

(max

x| f (x)|

)< r +max

x|g(x)| = M.

38 Chapter 1

number of balls B0(xk, δ), where the center xk belong to the dense set. Each ofthese xk has an associated Nk such that for all n,m > Nk,

| f (n)n (xk) − f (m)

m (xk)| ≤ ε.

Finally, take N = maxk Nk.

Every x ∈ K is in one of those balls, whose center we denote by xk; for everym, n > N,

| f (n)n (x) − f (m)

m (x)| ≤ | f (n)n (x) − f (n)

n (xk)| + | f (n)n (xk) − f (m)

m (xk)| + | f (m)m (xk) − f (m)

m (x)|≤ ε + ε + ε.

It follows that for every ε > 0 there exists an N such that for all m, n > N | f (n)n (x)−

f (m)m (x)| ≤ ε, where N does not depend on x. By Cauchy’s criterion, the sequence

f (n)n converges uniformly on K (and its limit is continuous).

We leave the other direction as an exercise. n

Comment: A classical application of the Arzela-Ascoli theorem is the local exis-tence proof of solutions of ordinary differential equations.

Example: Consider a sequence of differentiable functions fn : [a, b]→ R, that areuniformly bounded, | f | ≤ M1 and have a uniformly bounded derivative, | f ′n | ≤ M2.Then the sequence has a subsequence that converges uniformly.

We need to show that this sequence is equicontinuous. That is, that for every ε > 0there exists a δ > 0 such that

|x − y| ≤ δ ⇒ | fn(x) − fn(y)| ≤ ε,

independently of x, y, n. Indeed,

| fn(x) − fn(y)| ≤ M2|x − y|,

so that we may take δ = ε/M2. The uniform boundedness of fn is of course neededas shows the example fn(x) = n, where each fn is continuous and bounded, andthe sequence is equicontinuous. N

. Exercise 1.32 (Ex. 29 in L.)

Metric Spaces 39

À Let f and ( fn) be continuous mappings from a compact metric space X intoa metric space Y . Show that if

xn → x ⇒ fn(xn)→ f (x)

for all x ∈ X and sequences (xn) ⊂ X, then ( fn) converges uniformly to f .Á Show, by a counter example, that this is not true if X is not compact.

. Exercise 1.33 (Ex. 34 in L.) Let ( fn) be a sequence of real-valued functionsuniformly bounded (i.e., fn(x) ≤ M) and Riemann integrable on [a, b]. Define forx ∈ [a, b]:

Fn(x) =∫ x

afn(y) dy.

Show that the sequence (Fn) has a subsequence that converges uniformly on [a, b].(16 hrs)

1.6 Completeness

Definition 1.21 (Cauchy sequence) Let (X, d) be a metric space. A sequence ofpoints (xn) in X is called a Cauchy sequence if

limm,n→∞

d(xn, xm) = 0,

i.e., if for every ε > 0 there exists an N such that d(xn, xm) ≤ ε for all m, n > N.

Comment: A converging sequence is always a Cauchy sequence, since if xn → x,then

d(xn, xm) ≤ d(xn, x) + d(xm, x).

On the other hand, a Cauchy sequence does not necessarily converge. Take thesequence xn = 1/n in (0, 1]. Then,

d(xn, xm) ≤ 1/min(m, n)→ 0,

but the sequence has no limit in this space.

Proposition 1.18 If a Cauchy sequence has a converging subsequence then thewhole sequence converges.

40 Chapter 1

Proof : Suppose that the Cauchy sequence (xn) has a subsequence (xnk) that con-verges to x. Then, for every ε > 0 there exists an N1 such that for all m, n > N1,d(xm, xn) ≤ ε and an N2 such that for all k for which nk > N2, d(xnk , x) ≤ ε. Then,for every n > max(N1,N2), there exists a k such that nk > N2 and

d(xn, x) ≤ d(xn, xnk) + d(xnk , x) ≤ ε + ε.

n

Definition 1.22 (Complete metric space) A metric space is called complete ifevery Cauchy sequence in X converges.

Examples:

• R is a complete metric space (elementary calculus course).

• Q is not a complete metric space because the sequence 1/n is Cauchy buthas no limit in Q.

• Completeness should not be confused with closedness. The metric space(Q, | · |) is closed (as every metric space). The fact that the set Q is open inthe metric space (R, | · |) is irrelevant to this purpose. Yet:

• If (X, d) is a complete metric space and Y ⊂ X, then (Y, d) is a completemetric space iff Y is closed in X.

• Every compact metric space is complete (a consequence of Proposition 1.18).

• A complete normed space is called a Banach space; a complete inner prod-uct space is called a Hilbert space.

N

Proposition 1.19 If K is a compact metric space then C(K) is a complete metricspace.

Proof : We have already seen that C(K) is a normed space, hence a metric space.Let ( fn) be a Cauchy sequence in C(K), i.e.,

limm,n→∞

supx∈K| fn(x) − fm(x)| = 0.

This is precisely Cauchy’s criterion for uniform convergence, that is, ( fn) con-verges uniformly, and its limit is in C(K). n

Metric Spaces 41

Comment: Completeness is not a topological invariant: it is not conserved undera homeomorphism, as shows the example of f (x) = tan x which is a homeomor-phism between the incomplete space (−π/2, π/2) and the complete space R.

. Exercise 1.34 Prove that a metric space (X, d) is complete if and only if forevery decreasing sequence of closed balls,

B(x1, r1) ⊃ B(x2, r2) ⊃ · · · ⊃ B(xn, rn) ⊃ . . . ,

such that rn → 0, we have∞⋂

n=1

B(xn, rn) , ∅.

. Exercise 1.35 This exercise shows that the condition rn → 0 in the previousexercise was necessary. Let X = N, and define

d(m, n) =

0 m = n1 + 1

m+n m , n..

À Show that d is a metric on X.Á Show that X is complete with respect to this metric.Â Show that every point in X is isolated, i.e., for every n ∈ N there exists an

rn such that B0(xn, rn) = {xn}.Ã Show that the sequence of closed balls Bn = B(n, 1 + 1/2n) is decreasing,

i.e., Bn+1 ⊂ Bn, and yet their countable intersection is empty.

. Exercise 1.36 Let (X, d) be a metric space and D ⊂ X a dense set. Show thatX is complete if and only if every Cauchy sequence in D converges in X.

. Exercise 1.37 Show that if X and Y are metric spaces, Y is complete, D ⊂ X isa dense subset of X, and f ; D→ Y is a uniformly continuous function, then f hasa unique extension which is a function on the whole X. That is, that there existsa unique continuous function, f : X → Y , such that f |D = f . Show, furthermore,that the uniform continuity of f is necessary.

. Exercise 1.38 Prove that the following spaces are Banach spaces:

À The space `∞ of infinite sequences with the norm

‖x‖∞ = supn|xn|.

42 Chapter 1

Á The space `1 of infinite sequences with the norm

‖x‖1 =∞∑

n=1

|xn|.

1.7 Contractive mapping theorem

x

f!x" Definition 1.23 Let (X, d) be a metric space. A mapping f : X → X is calledcontractive if there exists a real number λ < 1 such that for all x, y ∈ X

d( f (x), f (y)) ≤ λ d(x, y).

Theorem 1.11 (Contractive mapping theorem) Let (X, d) be a complete metricspace. If f : X → X is contractive, then it has a unique fixed point x ∈ X(i.e., a point where f (x) = x).

Proof : The proof is constructive: let x0 be an arbitrary point in X and considerthe sequence (xn) defined inductively by xn+1 = f (xn). We first show that (xn) is aCauchy sequence. Note that

d(xn+2, xn+1) = d( f (xn+1), f (xn)) ≤ λ d(xn+1, xn),

and by induction, d(xn+1, xn) ≤ λnd(x1, x0). For n > m,

d(xn, xm) ≤ d(xn, xn−1) + d(xn−1, xn−2) + · · · + d(xm+1, xm)

≤

n−1∑k=m

λkd(x1, x0) ≤λm

1 − λd(x1, x0),

which converges to zero as m, n → ∞. Since the space is complete it follows thatxn → x.

We next claim that the contractive property implies continuity, as for every ε > 0we may take δ = ε/λ, and

d(x, y) ≤ δ ⇒ d( f (x), f (y)) ≤ λδ = ε.

Then, taking the n → ∞ limit of the relation xn+1 = f (xn) we obtain that x is afixed point of f .

Finally, if x, y are two fixed point then

d(x, y) = d( f (x), f (y)) ≤ λ d(x, y) < d(x, y),

i.e., d(x, y) = 0, which proves uniqueness. n

Metric Spaces 43

Example: Let p > 1. What is the value of the following expression:

x =

√p +

√p +√

p + · · ·.

By this expression, we mean the limit of the sequence x0 = p, xn+1 =√

p + xn.To show that this limit exists, we use the contractive mapping theorem. First, weclaim that the mapping

x 7→√

p + x ≡ Φ(x)

maps the interval K = [0, 2p] into itself. Indeed, the mapping is monotonic, and2p 7→

√3p ≤ 2p. Thus, Φ is a mapping in the complete metric space K.

We then show that this mapping is contracting: for x, y ∈ K,

Φ(x) − Φ(y) = Φ′(ξ)(x − y),

where Φ′(ξ) = 1/2√

p + ξ < 1/2, and ξ is a point between x and y. It follows that

|Φ(x) − Φ(y)| ≤12|x − y|,

i.e., the mapping is contracting. Thus (xn) converges to the unique fixed point,satisfying x =

√p + x, i.e.,

x =12+

12

√1 + 4p.

N

Example: Suppose we want to solve the nonlinear equation f (x) = 0, where

x

f!x"

x0 x1

f : R→ R is continuous. Consider the following iterations,

xn+1 = xn − α f (xn) ≡ Φ(xn),

with x0 chosen arbitrarily, and α is to be determined. Then. for every x, y ∈ R,

|Φ(x) − Φ(y)| = |1 − α f ′(ξ)||x − y|,

where ξ is a point between x and y. If f ′(x) has the property that 0 < λ1 ≤ f ′(x) ≤λ2 < M, then we can choose α = 2/M, so that

|1 − α f ′(ξ)| ≤ max (|1 − 2λ1/M|, |1 − 2λ2/M|) < 1,

so that the mapping is contractive and the iterations converge. N

We will now use the contractive map theorem to prove the fundamental theoremof existence of solutions to ordinary differential equations:

44 Chapter 1

Theorem 1.12 (Picard, Existence and uniqueness of solutions to ODEs) Let f :R2 → R be a function such that f (t, y) is a continuous function of t for fixed y anduniformly Lipschitz continuous in y for fixed t. That is, there exists an L > 0 suchthat for every t

| f (t, y) − f (t, z)| ≤ L|y − z|.

Let t0 and y0 be arbitrary real numbers. Then, there exists an a > 0, such that thereexists a unique function y ∈ C[t0 − a, t0 + a] satisfying the differential equation,

y′(t) = f (t, y(t))

subject to the initial condition y(t0) = y0.

Comment: We have given here very restrictive conditions on f to simplify theproof. A similar proof holds if we restrict the conditions on f to a neighborhoodof the point (t0, y0). More general situations are treated with the course on ordinarydifferential equations.

Proof : For every a > 0, the constant function function z0(t) = y0 is in C[t0−a, t0+

a]. We construct then the following function,

z1(t) = y0 +

∫ t

t0f (s, z0(s)) ds,

which belongs to C[t0 − a, , t0 + a] as well. Thus, we construct a sequence offunctions, (zn) ⊂ C[t0 − a, t0 + a], defined recursively by

zn(t) = y0 +

∫ t

t0f (s, zn(s)) ds. (1.1)

Recall that since [t0 − a, t0 + a] is compact, then C[t0 − a, t0 + a] is a completemetric space. Eq. (1.1) defines a mapping Φ : C[t0 − a, t0 + a]→ C[t0 − a, t0 + a],given by

(Φ(z))(t) = y0 +

∫ t

t0f (s, z(s)) ds.

We will show that if we choose a sufficiently small, then this mapping is contrac-tive, from which follows at once the existence and uniqueness of a fixed point,z ∈ C[t0 − a, t0 + a].

Metric Spaces 45

Now,

d(Φ(z),Φ(y)) = supt0−a≤t≤t0+a

∣∣∣∣∣∣∫ t

t0f (s, z(s)) ds −

∫ t

t0f (s, y(s)) ds

∣∣∣∣∣∣≤ sup

t0−a≤t≤t0+a

∣∣∣∣∣∣∫ t

t0| f (s, z(s)) − f (s, y(s))| ds

∣∣∣∣∣∣≤

∣∣∣∣∣∣ supt0−a≤t≤t0+a

∫ t

t0L|z(s) − y(s)| ds

∣∣∣∣∣∣≤ aL sup

t0−a≤t≤t0+a|z(t) − y(t)|

= aL d(z, y).

If we choose a < 1/L then the mapping is contractive.

It follows, from the contractive mapping theorem, that there exists a unique func-tion y ∈ C[t0 − a, t0 + a] satisfying the integral equation

y(t) = y0 +

∫ t

t0f (s, y(s)) ds.

It remains to show that such a y solves the differential equation, and that everysolution of the differential equation satisfies this integral equation. This is obvious.n (18 hrs)

1.8 Baire’s category theorem

Definition 1.24 A subset A ⊂ X of a metric space is called nowhere dense (i.e.,sparse) if the closure of A has an empty interior, i.e.,

(A)◦ = ∅.

Comment: An alternative definition is the following: A is called nowhere dense ifthere are no open sets O ⊆ X in which A ∩ O is dense. Indeed, (i) suppose A isnowhere dense but there exists an open set O such that A ∩ O ⊃ O. Recall that

A ⊃ A ∩ O ⊃ A ∩ O ⊃ O.

Thus, there exists an open set in the closure of A, which implies that A is notnowhere dense. (ii) Suppose now that there are no open sets O in which A ∩ O ⊃

46 Chapter 1

O. We need to show that the closure of A does not contain open sets. Suppose, bycontradiction, that there exists an open set O ⊂ A. Then,

O = O ∩ A ⊂ O ∩ A,

i.e., a contradiction. (The last inequality states that points that are both in O andlimits of sequences in A are contained in the set of points that are limits of se-quences of points that are both in O or in A.)

Examples:

À The empty set is nowhere dense.Á Every closed set that has no interior is nowhere dense.Â A singleton {x} is a nowhere dense set if and only if it is not an isolated

point (i.e., there does not exist an r such that B0(x, r) = {x}). Indeed, if it isisolated, there exists an open set contained in {x} (which is its own closure)and therefore the interior of {x} is not empty.

Ã A line in the plane is nowhere dense. There is no open set in the plane thatis contained in the line.

Ä Q is not a nowhere dense set in R.Å If A is nowhere dense then every B ⊂ A is nowhere dense.

N

Definition 1.25 A set A ⊂ X is called of first category if it can be represented asa countable union of sets that are nowhere dense. A set that is not of first categoryis called of second category.

Examples:

À The rationals Q are of first category as a subset of R (a countable union ofnon-isolated points).

Á A nowhere dense set is of first category, as is any finite union of nowheredense sets.

Â Any countable union of sets of first category is of first category.

N

Metric Spaces 47

Comment: Belonging to the first category is a measure of smallness (differentfrom the smallness as defined in measure theory). Sometime a set of first categoryis called meager. The following theorem states in what sense are sets of firstcategory small.

Theorem 1.13 (Baire) A set of first category in a complete metric space has anempty interior, i.e., contains no (non-empty) open set.

Proof : Let A be of first category in X, and let ∅ , G ⊂ X be an arbitrary open set.We need to show that G 1 A, i.e., that there exists a point x ∈ G which does notbelong to A.

Since A s of first category, it can be represented as ∪nAn, where all the An arenowhere dense. We can assume that the An are closed, otherwise, we will showthat ∪nAn has an empty interior.

Because A1 is nowhere dense and G is open, there exists a point x1 ∈ G \ A1.Furthermore, since G \ A1 is open, there exists an r1 < 1 such that

B0(x1, r1) ⊂ G \ A1.

Now, B0(x1, r1) is open, therefore there exists a point x1 ∈ B0(x1, r1) \ A2. More-over, since B0(x1, r1) \ A2 is open then there exists an r2 < 1/2 such that

B0(x2, r2) ⊂ B0(x1, r1) \ A2 ⊂ G \ (A1 ∪ A2).

We thus proceed inductively, where at the n-step we find a point xn and a numberrn < 1/n, such that

B0(xn, rn) ⊂ G \ (A1 ∪ A2 ∪ · · · ∪ An).

Consider now the sequence (xn). By construction, for every m > n we haved(xm, xn) < 1/n, which means that the sequence is a Cauchy sequence. Sincethe space is complete, it has a limit x. Since this point belongs to all the setsB0(xn, rn), it COMPLETE. n

48 Chapter 1

Chapter 2

Paths and integration along paths

2.1 Definitions

Definition 2.1 A path in a metric space is a continuous function f from a closedinterval, [a, b], into a metric space X. The path is called simple if the function fis one-to-one. It is called closed if f (a) = f (b). It is called simple and closed ifit is one-to-one, except for the end points.

Examples:

−4 −2 0 2 4 6 8−5

−4

−3

−2

−1

0

1

2f(t) = (t cos t, t sin t)

À The function f : [0, 2π]→ R2 defined by

f (t) = (t cos t, t sin t)

is a simple path.Á The function g : [0, 2π]→ R2 defined by

g(t) = (cos t, sin t)

is a simple closed path.

−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1h(t) = (sin t, sin 2t)

Â The function h : [0, 2π]→ R2 defined by

h(t) = (sin t, sin 2t)

is a closed path, but it is not simple since h(0) = h(π) = h(2π).

50 Chapter 2

N

Note that we have defined paths to be functions from a segment into a metricspace. One may claim that the path as a geometric object is a set in the metricspace, and that the mapping from the segment is of second importance. For thatreason, it makes sense to call two paths equivalent if they only differ by the waythey are parameterized.

[a,b] [c,d]

Xf g

φ Definition 2.2 Two paths f : [a, b] → X and g[c, d] → X are said to beequivalent if there exists a strictly monotonically increasing continuous functionϕ : [a, b]→ [c, d] such that f = g ◦ ϕ. That is, f (t) = g(ϕ(t)), for all t ∈ [a, b].

Comment: The definition is not symmetric between f and g. A continuous strictlyincreasing function ϕ has a inverse ϕ−1 which is also continuous and strictly in-creasing, so that g = f ◦ ϕ−1 as well. It is easy to see that this is indeed anequivalence relation. We say that two equivalent paths are the same up to a repa-rameterization.

Paths have the property that they can be concatenated if the end point of one pathcoincides with the starting point of the second path.

Definition 2.3 Let f : [a, b] → X and g : [c, d] → X be paths with f (b) = g(c).Then, there is a natural concatenation, which is a new path, h : [a, b + d − c],denoted by h = g ? f , defined by

h(t) =

f (t) a ≤ t ≤ bg(t + c − b) b < t ≤ b + d − c.

Example: Consider the concatenation f ? g of the two paths in the examplesabove. What about the concatenation g ? f ? N

Definition 2.4 A path in a normed space X is called a segment in X if it is equiv-alent to a path f : [0, 1] → X of the form f (t) = tx + (1 − t)y, for x, y ∈ X. Apolygonal path is a concatenation of segments.

Notation: Let f : [a, b] → X and set [c, d] ⊂ [a, b]. We denote the restriction ofthe path f to the segment [c, d] by f |[c,d].

Paths in metric spaces—in spaces where we have a notion of distance—can beattributed a length. The distance between two points is well defined; intuitively,the length of a path is a sum of distances along the path. The precise definition isthe following:

Paths and integration along paths 51

Definition 2.5 Let f : [a, b]→ X be a path in a metric space (X, d). Its length isdefined as follows,

`( f ) = sup

n∑i=1

d( f (ti), f (ti−1)) : {ti}ni=0 is a partition of [a, b]

.If this supremum is finite, we say that the path f has finite length.

The subsequent analysis will be made simpler if we introduce some auxiliarynotations: first, a partition of a segment [a, b] is a finite ordered set of points{ti}

ni=0, with t0 = a and tn = b. We denote the set of all partitions of the segment

[a, b] by P[a, b]. Finally, for A ∈ P[a, b] we denote by |A| the number of segmentspartitioned by A.

Let then f : [a, b]→ X be a path. For every partition A ∈ P[a, b] we define

L( f ,A) =|A|∑i=1

d( f (ti), f (ti−1)).

With these definitions, the length of f is

`( f ) = supA∈P[a,b]

L( f ,A).

Comments:

À The length of a path is non-negative.

Á The length of a path whose range has two distinct points is positive.

Â It is always true that

`( f ) ≥ d( f (a), f (b)).

Ã A refinement of a partition A ∈ P[a, b] is a partition P[a, b] 3 B ⊃ A. Bythe triangle inequality, L( f ,B) ≥ L( f ,A).

Ä The length of a path is invariant under reparameterization, i.e., equivalentpaths has equal length. Indeed, let f : [a, b] → X and g : [c, d] → X be

52 Chapter 2

equivalent paths with f = g ◦ ϕ. Then,

`( f ) = sup

n∑i=1

d( f (ti), f (ti−1)) : {ti}ni=0 ∈ P[a, b]

= sup

n∑i=1

d(g(ϕ(ti)), g(ϕ(ti−1))) : {ti}ni=0 ∈ P[a, b]

= sup

n∑i=1

d(g(si), g(si−1)) : {si}ni=0 ∈ P[c, d]

= `(g).

Å The length of a path is not (necessarily) the “size” of image in X. Considerfor example the path f : [0, π] → R defined by f (t) = sin t. Its image is thesegment [0, 1] but its length is 2 (because is passes twice at every point).

Æ The story of Richardson who wanted to measure the length of the seashoreof Great Britain.

Proposition 2.1 Let f : [a, b] → X and g[c, d] → X be paths with f (b) = g(c).Then,

`(g ? f ) = `( f ) + `(g).

That is, the length of a concatenated path is the sum of the lengths of its con-stituents.

Proof : The concatenated path h = g ? f is defined on the interval [a, b + d − c].Thus,

`(h) = supA∈P[a,b+d−c]

L(h,A).

Now, for every partition A ∈ P[a, b + d − c] corresponds a refinement A ∪ {b},which is the union of a partition B ∈ P[a, b] and a partition C ∈ P[b, b + d − c],where the latter is a partition of [c, d] shifted by b − c. Clearly,

L(h,A) ≤ L( f ,B) + L(g,C).

Since such a refinement can be found for any partition A it follows that

`(h) ≤ `( f ) + `(g).


Conversely, let B ∈ P[a, b] and C ∈ P[c, d]. For any such couple corresponds apartition A ∈ P[a, b + d − c], given by B ∪ (C + b − c), and

L( f ,B) + L(g,C) = L(h,A).

Since we can do so for any B,C, it follows that

`( f ) + `(g) ≤ `(h),

which concludes the proof. n (20 hrs)

Proposition 2.2 Let f : [a, b] → X be a path of finite length. For every t ∈ [a, b]define

s(t) = `( f |[0,t]),

which is the length of the path up to the point t. Then,

À s(t) is monotonically increasing.Á s(t) is continuous.Â If f is not constant on any segment, then s(t) is strictly increasing.

Proof : For every t1 < t2 we have f |[0,t2] = f |[t1,t2] ? f |[0,t1], then

s(t2) = s(t1) + `( f |[t1,t2]) ≥ s(t1),

i.e., s(t) is monotonically increasing.

Let t be given. We are going to show that it is left-continuous; the same argumentswill prove continuity from the right. Given ε > 0, there exists a partition A of [0, t]such that

L( f |[0,t],A) > `( f |[0,t]) −ε

2= s(t) −

ε

2.

Since f is continuous, there exists a point u ∈ (tn−1, t) such that

d( f (u), f (t)) <ε

2.

Now,

s(u) ≥ d( f (t1), f (t0)) + · · · + d( f (tn−1, f (u))

> d( f (t1), f (t0)) + · · · + d( f (tn−1, f (u)) + d( f (u), f (t)) −ε

2≥ L( f |[0,t],A) −

ε

2> s(t) − ε.

54 Chapter 2

Thus, for every ε > 0 there exists a u < t such that

r ∈ (u, t) ⇒ s(r) ∈ (s(t) − ε, s(t)),

i.e. s(t) is left-continuous.

Finally, suppose that f is nowhere constant in a segment, and let t1 < t2. Thenthere exists a point t1 < u < t2 which differs either from t1 or from t2, and

`( f[t1,t2]) ≥ d( f (t1), f (u)) + d( f (u), f (t2)) > 0,

from which follows that

s(t2) = s(t1) + `( f |[t1,t2]) > s(t1),

n

. Exercise 2.1 Let f : [a, b]→ X be a path of finite length which is not constanton any segment. Then there exists an equivalent path g : [0, `( f )]→ X, such that

`(g|[0,t]) = t

for all 0 ≤ t ≤ `( f ). The path g is said to be parameterized by its arc-length.

. Exercise 2.2 (Ex. 6 in L.) Let fn : I → Y be paths in a metric space Y , andsuppose that fn(t)→ f (t) for every t ∈ I.

À Prove that `( f ) ≤ lim infn→∞ `( fn).Á Find an example for a sequence of paths fn that converge uniformly to a

path f and yet `( f ) < lim infn→∞ `( fn).

2.2 Differentiable paths in Rn

So far, we have only required paths to be continuous. To extend our treatmentto differentiable paths, i.e., to path that also have smoothness properties, we mustconsider path in a normed space. In this section we deal with path in the Euclideanspace Rn.

Comment: Note that without loss of generality we can require the path to be afunction from the unit interval I = [0, 1] by a reparametrization.


Definition 2.6 A path f : [I → Rn is said to be differentiable at a point t ∈ (a, b)if the following limit exists,

limh→0

f (t + h) − f (t)h

def= f ′(t),

where the limit is with respect to the Euclidean metric. The value f ′(t) ∈ Rn

is called the derivative of the path f at the point t. We can similarly definedifferentiability on the right or on the left of the point.

Comment: Recall that convergence in Rn is equivalent to convergence in everycomponent, i.e., xk = (x1,k, . . . , xn,k) converges in Rn to x = (x1, . . . , xn) if and onlyif x j,k → x j in R for all j. It follows that f (t) = ( f1(t), . . . , fn(t)) is differentiable atthe point t if and only if every component f j(t) is differentiable at t (as a functionR→ R). Then,

f ′(t) = ( f ′1(t), . . . , f ′n(t)).

Example: The derivative of a segment f : I → Rn,

f (t) = tx + (1 − t)y

isf ′(t) = lim

h→0

[(t + h)x + (1 − t − h)y] − [tx + (1 − t)y]h

= x − y.

N

Definition 2.7 A path f : I → Rn is said to be differentiable if it is differentiableat every t ∈ (0, 1), and it is differentiable on the right at t = 0 and on the left att = 1. It is said to be continuous differentiable if its derivative is a continuousfunction. Finally, it is said to be piecewise differentiable it is can be representedas a finite number of concatenated differentiable paths.

Continuously differentiable paths have finite length. In fact, we will now derivean explicit expression for their length.

Lemma 2.1 Let f = ( f1, . . . , fn) be a continuously differentiable path I → Rn,and let s(t) be the length of the path up to point t. Then, s(t) is differentiable ands′(t) = ‖ f ′(t)‖.

56 Chapter 2

Proof : Without loss of generality, we may show that s(t) is right-sided differen-tiable. For h > 0 we have

s(t + h) − s(t)h

=1h`( f |[t,t+h]).

The right hand side satisfies

1h`( f |[t,t+h]) ≥

d( f (t), f (t + h))h

=

∥∥∥∥∥ f (t + h) − f (t)h

∥∥∥∥∥ ,where we have used the basic property of a length of a path being longer thenthe distance between its end points, and the explicit expression for the distancein Rn. Since the norm is a continuous function Rn → R, then the right-hand sideconverges as h→ 0, and therefore

lim infh→0+

s(t + h) − s(t)h

≥ ‖ f ′(t)‖. (2.1)

On the other, for any partition {ti}mi=0 of the segment [t, t + h],

1h

m∑i=1

d( f (ti), f (ti−1)) =m∑

i=1

(ti − ti−1)h

∥∥∥ f ′1(ti−1 + θi,1∆ti), . . . , f ′n(ti−1 + θi,n∆ti)∥∥∥

≤ lim infξ1,...,ξn

∥∥∥ f ′1(ξ1), . . . , f ′n(ξn)∥∥∥ ,

where ξ j ∈ (t, t + h) for all j = 1, . . . , n. Taking the supremum over all partitionsgives the inequality

1h`( f |[t,t+h]) ≤ lim inf

ξ1,...,ξn

∥∥∥ f ′1(ξ1), . . . , f ′n(ξn)∥∥∥ .

Finally, taking h→ 0,

lim suph→0+

s(t + h) − s(t)h

≤ ‖ f ′(t)‖. (2.2)

Combining (2.1) and (2.2) we conclude that

limh→0+

s(t + h) − s(t)h

= ‖ f ′(t)‖.

The existence of the left-sided derivative is proved similarly. n


Theorem 2.1 Let f : I → Rn be a piecewise continuously differentiable path.Then

`( f ) =∫ 1

0‖ f ′(t)‖ dt.

Proof : If f is continuously differentiable then by the above lemma,

`( f ) = s(b) = s(b) − s(a) =∫ b

as′(t) ds =

∫ b

a‖ f ′(t)‖ dt.

If it is piecewise continuously differentiable then we can express the length of thepath as a sum of such expressions and use the additivity of the integral. n

Example: Forf (t) = (cos t, sin t) t ∈ [0, 2π]

we have f ′(t) = (− sin t, cos t), hence

‖ f ′(t)‖ =√

sin2 + cos2 = 1,

and therefore

`( f ) =∫ 2π

0‖ f ′(t)‖ dt = 2π.

What happens if instead

f (t) = (cos√

t, sin√

t) t ∈ [0, 4π2]?

N (22 hrs)

. Exercise 2.3 (Ex. 1 in L.) Calculate the length of the path f : [0, 2π] → R3

defined byf (t) = (cos t, sin t, t).

What does the path look like?

. Exercise 2.4 (Ex. 4 in L.) Let f : I → Rn be a continuously differentiablesimple path, with f ′(t) , 0 everywhere on I. Denote by s(t) the arclength up to t.

À Prove thatlimh→0

|s(t + h) − s(t)|‖ f (t + h) − f (t)‖

= 1,

where the limit is uniform on I.

58 Chapter 2

Á Show that the condition f ′(t) , 0 is essential by a counter-example.

. Exercise 2.5 (Ex. 7 in L.) Let f : [0, 1]→ Rn be a continuously differentiablepath, where Rn is a metric space with respect to the norm ‖ · ‖p for some p ≥ 1.Prove that

`( f ) =∫ 1

0‖ f ′(t)‖p dt.

. Exercise 2.6 Parametrize the following curves and calculate their lengths:

À The cycloid: the path in R2 traced by a point on the perimeter of a rotatingwheel between two consecutive times it touches the ground.

Á A helix: a path in R3 whose projection on the xy plane is a circle and whoseprojection on the z axis has a fixed derivative. Calculate its length over onecycle.

. Exercise 2.7 Calculate the length of the following paths:

À The unit circle x2 + y2 = 1, where the metric is induced by the `∞ norm inR2.

Á The asteroid x2/3 + y2/3 = 1, where the metric is induced by the `1 norm inR2.

2.3 Homotopy and connectedness

Paths have multiple applications in mathematics. In subsequent sections these ap-plications will be of “analytical” nature. Paths also have multiple applications inother fields of mathematics, for example in topology. The central theme in al-gebraic topology is the determination whether two topological spaces are homeo-morphic. Recall that a positive answer can be given if a homeomorphism is found.To answer negatively, one has to find a topological invariant that differs in bothspaces, e.g., compactness. Paths provide another tool for constructing topologicalinvariants.

Definition 2.8 A set A in a metric space (X, d) is said to be path connected ifthere exists for every two points x, y ∈ A a path f : I → A, such that f (0) = x andf (1) = y.


Examples:

À The unit circle S1 ⊂ R2 is path connected. Let x = (cosα, sinα) and y =(cos β, sin β) be two points on the circle. The mapping

t 7→ (cos((1 − t)α + tβ), sin((1 − t)α + tβ)), t ∈ [0, 1]

is a path in S1 connecting x and y.

Á The punctured line, X = R\{0} is not path-connected. Suppose there existeda path f : I → X connecting the points −1 and 1. The function f is alsoa continuous function as a function into R, because if A ⊂ R is open in R,then A \ {0} is also open in X, and therefore its pre-image under f is open inI. By the mid-value theorem, there must be a 0 < t < 1 for which f (t) = 0,in contradiction of f being a function into X.

N

. Exercise 2.8 (Ex. 9 in L.) Let (X, d) be a path-connected metric space, and letf : X → R be a continuous function. Prove that if there exist points x, y ∈ X suchthat f (x) = a and f (y) = b, then f assumes on X all values between a and b.

Proposition 2.3 Let X, Y be metric spaces, and f : X → Y a continuous mapping.If X is path-connected so is its image f (X).

Proof : Let y1, y2 be two points in f (X). By definition, there exist two pointsx1, x2 ∈ X such that y1 = f (x1) and y2 = f (x2). Since X is path connected, thereexists a continuous function ϕ : I → X satisfying ϕ(0) = x1 and ϕ(1) = x2. Thecomposite function f ◦ ϕ is a path in f (X) connecting the points y1 and y2, hencef (X) is path-connected. n

Corollary 2.1 Path connectedness is a topological invariant.

We can now use path connectedness to show that a segment is not homeomorphicto a circle. For that, we will be using the following lemma:

Lemma 2.2 Let f : X → Y be a homeomorphism between two metric spaces, andlet A ⊂ X. Then, f is also a homeomorphism between X \ A and Y \ f (A).

60 Chapter 2

Proof : Because f is one-to-one and onto Y , then its restriction to X \ A is one-to-one and onto Y \ f (A). We only need to show that this restriction is continuoussince the same will automatically apply to its inverse. Let (xn) be a convergentsequence in X \ A; call its limit x. Since it is a sequence in X, then f (xn), which isa sequence in Y \ f (A), converges in Y to f (x). Thus, f (xn) converges in Y \ f (A),from which follows that the restriction of f to X \ A is continuous. n

Proposition 2.4 I and S1 are not homeomorphic.

Proof : Suppose there existed a homeomorphism f : I → S1, and let x be aninterior point in I. Then, it has a unique image y = f (x) ∈ S1. The function finduces a homeomorphism from I \ {x} into S1 \ { f (x)}, however the second ispath-connected whereas the first is not, which contradicts the previous corollary.Hence there can not be a homeomorphism from I to S1. n

. Exercise 2.9 (Ex. 11 in L.) A metric space X is called connected if for everypartition of X as the union of two disjoint open sets one of the sets is empty. Thatis, for every open A, B,

A ∪ B = X, A ∩ B = ∅ ⇒ A = ∅ or B = ∅.

À Show that path-connectedness implies connectedness.Á Show that the subset of R2,

{(x, sin 1/x) : x , 0} ∪ {(0, y) : y ∈ R}

is connected but not path-connected.

. Exercise 2.10 (Ex. 14 in L.) Let {Xi}ni=1 be path-connected sets in a metric

space X such that for every pair i, j, Xi ∩ X j , ∅. Show that ∪ni=1Xi is path-

connected.

Path connectivity is one among many other path-related topological invariants Inorder to construct more invariants we need to introduce the notion of homotopy.

t

sX

H

Definition 2.9 Two closed paths (or loops), f , g : I → X in a metric space arecalled homotopic if there exists a continuous function H : I2 → X satisfying,

À For every fixed s ∈ I the function H(t, s) is a closed path (as a function of t).


Á H(t, 0) is coincides with f (t) and H(t, 1) coincides with g(t).

The function H is called a homotopy between the closed paths f and g.

Comment: A homotopy can be defined between non-closed path as well. Twopaths are homotopic if, roughly speaking, there exists a continuous transition fromone to the other.

Proposition 2.5 Homotopy defines an equivalence relation between closed paths.

Proof : A path f : I → X is homotopic to itself with respect to the homotopyH(t, s) = f (t). If H(t, s) is a homotopy between f and g then H(t, 1 − s) is ahomotopy between g and f . Finally, let H(t, s) be a homotopy between the pathsf and g and H∗(t, s) be a homotopy between the path g and h. Then the functionH∗∗ : I2 → X defined by

H∗∗(t, s) =

H(t, 2s) 0 ≤ s ≤ 1/2H∗(t, 2s − 1) 1/2 < s ≤ 1

is a homotopy between f and h. n

Proposition 2.6 Two closed paths f , g : I → X that are equivalent are homo-topic.

Proof : By definition, there exists a continuous monotonically increasing ontofunction ϕ : I → I, such that f = g ◦ ϕ. The function

H(t, s) = f ((1 − s)t + sϕ(t))

is a homotopy between f and g. n

Comment: The fact that homotopy is preserved under equivalence justifies ourrestriction to paths defined in the unit interval I. More generally, two paths f :[a, b] → X and g : [c, d] → X are called homotopic if they are equivalent tohomotopic paths f ∗ : I → X and g∗ : I → X.

Definition 2.10 A metric space is called simply-connected if every two closedpaths in it are homotopic.

62 Chapter 2

Comment: Since a “point” is a closed path, simple-connectedness means thatevery loop can be continuously deformed to any point.

Example: Consider any convex set A in a normed space X. We claim that it issimply-connected. Let f , g : I → A be closed path, and consider the function

H(t, s) = sg(t) + (1 − s) f (t).

Since A is convex then H(t, s) ∈ A for all (t, s) ∈ I2, and this function is a homo-topy between f and g. N

. Exercise 2.11 Show that any “star-shaped” subset of a normed space is simplyconnected.

Proposition 2.7 A simply-connected metric space is path-connected.

Proof : Let X be a simply-connected metric space and let x, y ∈ X. We need toshow that there exists a path h : I → X that connects the points x and y. Theconstant path f (t) = x and g(t) = y are closed paths. Since they are homotopicthere exists a homotopy H(t, s) such that H(t, 0) = f (t) = x and H(t, 1) = g(t) = yfor all t. The function h(s) = H(0, s) is a path in X that connects the points x andy. n

Comment: Since a constant path (a point) is a closed path, it follows that a metricspace is simply connected if it is path-connected and every closed path is homo-topic to a point.

Proposition 2.8 Simple-connectivity is a topological invariant.

Proof : Let X be a simply-connected metric space and ϕ : X → Y a homeomor-phism. We need to show that Y is simply connected. Let then f , g : I → Y beclosed paths in Y . The functions ϕ−1 ◦ f and ϕ−1 ◦g are closed paths I → X. SinceX is simply-connected, there exists a homotopy H : I2 → X such that

H(t, 0) = ϕ−1( f (t)) and H(t, 1) = ϕ−1(g(t)).

Consider now the function

H(t, s) = ϕ(H(t, s)).


It is a continuous function I2 → Y satisfying,

H(t, 0) = f (t) and H(t, 1) = g(t),

from which we conclude that Y is simply-connected. n

We conclude this section by arguing that to prove that a path-connected space isnot simply-connected is not an easy task. An example of a space that is not simplyconnected is the unit circle S1 in which the closed path

f (t) = (cos t, sin t)

is not homotopic to the constant path g(t) = (0, 0). Proving it is however not easy.

Without proving it formally, we state that the punctured unit square I2 \ {(0, 0)} isnot simply-connected, whereas the punctured unit cube I3 \ {(0, 0, 0)} is simply-connected. By the previous proposition it follows that the unit cube is not home-omorphic to the unit square.

2.4 Sets of measure zero and the Peano curve

Measure theory (a third year course) deals with the assignment of “volume”, or“weight” (a measure) to sets. A measure on, say, Rn is a function that assignsa value, µ(A), to certain subsets A ⊂ Rn (the measurable sets). Building such ameasure requires a whole machinery. It turns out, however, that it is possible todefine when a set has measure zero, without actually defining what a measure is.

Definition 2.11 A rectangular box in Rn is a closed set of the form

A = [a1, b1] × · · · × [an, bn],

where a j < b j for j = 1, 2, . . . , n. We define the volume of the box to be

V(A) =n∏

i=1

(bi − ai).

Definition 2.12 A set A ⊂ Rn is said to have measure zero if for every ε > 0there exists a sequence of rectangular boxes {Bi} that cover A (i.e., A ⊂ ∪iBi), and

∞∑i=1

V(Bi) ≤ ε.

We write µ(A) = 0.

64 Chapter 2

Comment: If A can be covered by a finite set of boxes then we can always add aninfinite sequence of boxes that have arbitrarily small volumes.

Comment: Note that a set has measure zero with respect to the “embedding”space Rn. For example, a segment has a nonzero measure as a subset of R but zeromeasure as a subset of R2.

Example:

À A point x ⊂ Rn has measure zero. Indeed, for every ε > 0 the rectangularbox

A = [x1 − δ, x1 + δ] × · · · × [xn − δ, xn + δ]

includes x and has volume (2δ)n, so choosing δ < 12ε

1/n gives V(A) < ε.Á The segment

([a, b], 0) ⊂ R2

has zero measure in R2 since it can be covered by any box of the form[a, b] × [−δ, δ]. whose volume is 2(b − a)δ.

N(24 hrs)

Proposition 2.9

À If A ⊂ Rn has measure zero then any B ⊂ A has measure zero.Á If in a sequence (An) all the An have measure zero inRn, then their countable

union has measure zero.

Proof : The first statement is obvious. For the second, we note that given ε > 0we can cover each An by countably many boxes (Bn,m), such that

∞∑m=1

V(Bn,m) ≤ε

2n .

Now,∞⋃

n=1

An ⊂

∞⋃n,m=1

Bn,m,

and∞∑

n,m=1

V(Bn,m) ≤∞∑

n=1

ε

2n = ε.

n


Proposition 2.10 Let f : I → Rn be a path of finite length, n ≥ 2. Then the imageof f in Rn has measure zero.

Proof : Let f : I → Rn be given, and let s(t) be the arclength function; it is giventhat s(1) = `( f ) ≡ ` < ∞. Given ε > 0 we choose m such that

(2`)n

m< ε.

We then partition I into m segments,

0 = t0 < t1 < · · · < tn = 1,

such that s(t j) = j`/m (i.e., into sub-paths of equal length, `( f |[ti−1,ti]) = `/m).Around every point f (t j) we take a closed ball B j of radius `/m. We claim thateach such ball includes the sub-segment f |[ti−1,ti], otherwise this segment wouldhave a length longer than `/m. Moreover, every such ball B j in contained in a boxC j whose sides have length 2`/m. Thus,

{ f (t) : t ∈ I} ⊂m⋃

j=1

B(t j, `/m) ⊂m⋃

j=1

C j,

andm∑

j=1

V(C j) = m ·(2`m

)n

=1

mn−2

(2`)n

m≤ ε.

n

The last proposition requires explicitly the path to have finite length. It is possibleindeed to construct paths of infinite length whose image has finite measure in R2.An example of such a path is the Peano curve. It is a path in R2 whose image isthe entire unit square I2. The latter does not have measure zero.

Note that such a path cannot be a simple path, otherwise there would exist a home-omorphism between I and I2, which we know is impossible (the path-connectednessargument).

We proceed to construct the Peano curve. Consider the function f : R → [0, 1]defined by

f (x) =

x 0 ≤ x ≤ 11 1 < x ≤ 45 − x 4 < x ≤ 50 5 < x ≤ 8,

66 Chapter 2

and we continue this function periodically, f (x + 8) = f (x). For t ∈ I we definethe mapping t 7→ (x(t), y(t)), where

x(t) =∞∑

k=1

f (8kt)2k y(t) =

∞∑k=1

f (8kt + 2)2k . (2.3)

By the Weierstraß M-test (using the fact that | f (x)| ≤ 1) these series convergeuniformly, and therefore the limit is continuous. Since 0 ≤ x(t), y(t) ≤ 1, theimage of this path is included in the unit square, (x(t), y(t)) ∈ I2.

It remains to show that this path fills the entire unit square. Any point (x0, y0) hasa binary representation

x0 =

∞∑k=1

ak

2k y0 =

∞∑k=1

bk

2k ,

where ak, bk are either zero or one. Define then for every k ∈ N,

ξk =

5 ak = 0, bk = 0 ⇒ f (ξk) = ak

7 ak = 0, bk = 1 ⇒ f (ξk) = ak

3 ak = 1, bk = 0 ⇒ f (ξk) = ak

1 ak = 1, bk = 1 ⇒ f (ξk) = ak,

and set t0 =∑∞

k=1 ξk/8k. Then1,

x(t0) =∞∑

k=1

f (ξk)2k =

∞∑k=1

ak

2k = x0

y(t0) =∞∑

k=1

f (ξk + 2)2k =

∞∑k=1

bk

2k = y0.

That is, for every (x0, y0) ∈ I2 there exists a t0 ∈ I such that (x(t0), y(t0)) = (x0, y0).

In Figure 2.1 we display the images of paths generated by truncating the sum (2.3)at n = 2, 3, 5, 10.

Comment: One can also construct a simple path in R2 whose image does not havemeasure zero.

1Note that for every 0 < θ < 1, f (ξk + θ) = f (ξk) = ak. Now,

f

8k∞∑

l=1

ξl

8l

= f

k−1∑l=1

8k−lξl + ξk +

∞∑l=k+1

ξl

8l−k

= f (ξk).


0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.80

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8Peano curve, n=2

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.90

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9Peano curve, n=3

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Peano curve, n=5

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 10

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1Peano curve, n=10

Figure 2.1: paths generated by truncating the sum (2.3) at n = 2, 3, 5, 10.

2.5 Integration of a vector field along a path

In the first calculus course we learned about integrals of real-valued functions. Areal-valued function f (x) on an interval [a, b] can be viewed as a function definedon a path; the trivial path given by the identity [a, b] → [a, b]. This concept canbe generalized for integrals of functions defined on arbitrary paths.

Definition 2.13 A vector field on a set A ⊂ Rn is a function f : A→ Rn.

If vectors in Rn can be visualized as arrows that originate from the origin, the wayto visualize a vector field, f (x), is as an arrow that originates at the point x.

Example: Consider the vector field D2 → Rn (D2 is the unit disc):

f (x, y) =

−y√x2 + y2

,x√

x2 + y2

,which is visualized in the figure below.

68 Chapter 2

−1 −0.8 −0.6 −0.4 −0.2 0 0.2 0.4 0.6 0.8 1−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

N

Thus a vector field is a mapping from (a subset of) Rn to Rn. If f is a vector field,we denote its components by

f (x) = ( f1(x1, . . . , xn), . . . , fn(x1, . . . , xn)).

Similarly, we denote the components of a path γ : I → Rn by

γ(t) = (γ1(t), . . . , γn(t)).

Definition 2.14 Let f : A → Rn be a vector field, where A ⊂ Rn and let γ :[a, b] → A be a differentiable path in A. The integral of f along the path γ isdenoted by ∫

γ

f · dγ

and is defined by ∫γ

f · dγ =n∑

j=1

∫ b

af j(γ(t)) γ′j(t) dt.

The definition holds when the integrals on the right hand side exist.

Comment: The integral of a vector field along a path does not necessitate, strictlyspeaking, the path to be differentiable. It is sufficient for the path to be of boundedvariation, a concept which is beyond the scope of the present course. A morecomprehensive treatment can be found in Lindenstrauss, Chapter IX.


Example: Consider the vector field on R2,

f (x, y) = (1 + y2, x)

and the path γ(t) = (cos t, sin t) for 0 ≤ t ≤ 2π (i.e., the path is a circle). Then,∫γ

f · dγ =∫ 2π

0(1 + sin2 t)(− sin t) dt +

∫ 2π

0cos t cos t dt = π.

N (26 hrs)

The following properties of the integral along paths are easily verified:

À Linearity. If f and g are vector fields, then∫γ

(a f + bg) · dγ = a∫γ

f · dγ + b∫γ

g · dγ.

Á Additivity. If γ1 and γ2 can be concatenated, then∫γ2?γ1

f · dγ2?1 =

∫γ1

f · dγ1 +

∫γ2

f · dγ2.

Â Path inversion. Let γ : [a, b] → Rn be a differentiable path and set γ1 :[−b,−a]→ Rn be the path that goes “backward”, γ1(t) = γ(−t). Then2,∫

γ1

f · dγ1 = −

∫γ

f · dγ.

Comment: By the additivity of the integral we can equally well make sense ofintegration along a piecewise differentiable path.

What motivates this definition of the integral along paths? The main motivationcomes from physics where, for example, the vector field may be a force field, and

2Indeed,∫γ1

f ·dγ =∑

j

∫ −a

−bf j(γ1(t)) (γ′1(t)) j dt =

∑j

∫ −a

−bf j(γ(−t)) (−γ j(−t)) dt =

∑j

∫ a

bf j(γ(s)) γ j(s) ds.

70 Chapter 2

the integral of the force along a path is the work exerted by the force field. Thinkof the path integral in terms of its approximation by a Riemann sum,

∫γ

f · dγ ≈N∑

i=1

n∑j=1

f j(γ(ti))γ′j(ti)∆ti.

To order O(∆t2j ) we have

γ′j(ti)∆ti = γ j(ti) − γ j(ti−1) def= ∆γ j(ti).

Thus, ∫γ

f · dγ ≈N∑

i=1

( f (γ(ti)),∆γ(ti)) ,

i.e., the integral of a vector field along a path is the limit of sums of the innerproduct of the function and increments along the path (scalar product in physics).

The next proposition shows that the integral of a vector field along a path is inde-pendent of parameterization, i.e., that equivalent paths yields the same integrals.Recall that paths γ1 : [a, b] → Rn and γ2 : [c, d] → Rn are equivalent if there ex-ists a continuous, monotonically increasing function φ : [a, b] → [c, d] such thatγ1 = γ2 ◦ φ. Note that γ1 could be differentiable without γ2 being differentiable,and vice versa. Since we restrict our discussion to differentiable paths, we assumethat the equivalence mapping is differentiable as well.

Proposition 2.11 Let γ1 : [a, b] → Rn and γ2 : [c, d] → Rn be equivalent differ-entiable paths with γ1 = γ2 ◦ φ, φ differentiable. Then for every function f (forwhich the integrals exist)

∫γ1

f · dγ1 =

∫γ2

f · dγ2.


Proof : It is a simple consequence of the change of variable formula:∫γ1

f · dγ1 =

n∑j=1

∫γ1

f j(γ1(t))(γ1)′j(t) dt

=

n∑j=1

∫γ1

f j(γ2(φ(t)))(γ2 ◦ φ)′j(t) dt

=

n∑j=1

∫γ1

f j(γ2(φ(t)))(γ2)′j(φ(t))φ′(t) dt

=

n∑j=1

∫γ1◦φ−1

f j(γ2(s))(γ2)′j(s) ds

=

∫γ2

f · dγ2.

n

2.6 Conservative fields

The following definition is again motivated from physics.

Definition 2.15 A vector field f defined on an open path-connected set A ⊂ Rn

is called conservative if the value of the path integral∫γ

f · dγ,

where γ is a differentiable path, depends on the end points of γ but not on thewhole path. That is, if γ1 : [a, b] → A and γ2 : [c, d] → A satisfy γ1(a) = γ2(c)and γ1(b) = γ2(d), then ∫

γ1

f · dγ1 =

∫γ2

f · dγ2.

Proposition 2.12 Let A ⊂ Rn be an open path-connected domain. A vector fieldf : A→ Rn is conservative if and only if∫

γ

f · dγ = 0,

72 Chapter 2

for all closed (differentiable) paths γ (we often denote integrals along closed pathsby

∮).

Proof : Let f be conservative and let γ : I → Rn by a closed differentiable path.Then we can always partition γ as γ2 ? γ1 where γ1 : [0, 1/2] → Rn and γ2 :[1/2, 1]→ Rn. Now define γ3 : [0, 1/2]→ Rn by γ3(t) = γ2(1 − t). Then,∮

γ

f · dγ =∫γ1

f · dγ +∫γ1

f · dγ

=

∫γ1

f · dγ −∫γ3

f · dγ = 0.

The other direction is obtained by reversing the argument. n

The question is how, given a vector field f , to identify whether it is conservativeor not. There is a simple criterion, which we will derive immediately. Before, weneed some reminders:

Definition 2.16 Let A ⊂ Rn be an open set and g : A → R. The partial deriva-tive of g with respect to the j-th coordinate at the point x is defined as the limit (ifit exists),

∂g∂x j

(x) = limh→0

g(x + he j) − g(x)h

= limh→0

g(x1, . . . , x j + h, . . . , xn) − g(x1, . . . , . . . , xn)h

.

If the function g has n partial derivatives at the point x, then the vector(∂g∂x1

(x), . . . ,∂g∂xn

(x))

is called the gradient of g at the point x, and is denoted ∇g(x). If the gradientof g is defined for all x ∈ A, then ∇g is a vector field on A. Finally, g is calleddifferentiable at x if its gradient exists and is continuous (as a function A → Rn)at x.

Comment: We will learn about differentiability in depth in the next chapter.


Proposition 2.13 Let A ⊂ Rn be an open path-connected set, and f a continuousvector field on A. Then f is conservative if and only if there exists a differentiablefunction g : A→ R such that f = ∇g, i.e.,

f j(x) =∂g∂x j

(x).

Proof : Suppose first that f is conservative, and let a ∈ A be an arbitrary point.For every x ∈ A we define

g(x) =∫γ

f · dγ,

where γ : I → A is a path that connects the points a and x. Since the domain ispath connected such a path exists, and since the path integral depends only on theend points, g(x) is well-defined.

For every j = 1, 2, . . . , n,


=1h

(∫γ1

f · dγ −∫γ

f · dγ)=

1h

∫γ2

f · dγ,

where γ1 is a paths the connects a and x + he j, passing through x, and γ2 is a paththat connects the points x and x + he j. We have used here again the independenceof the integral on the path, and the additivity of the integral under concatenation.

Since we are free to choose any path, we take

γ2(t) = x + hte j, t ∈ I,

so that


=1h

n∑l=1

∫ 1

0fl(x + hte j)hδl, j dt =

∫ 1

0f j(x + hte j) dt.

Since f is continuous, it follows that the h → 0 limit tends to f j(x). Since f iscontinuous, then all the partial derivatives of g are continuous, i.e., g is differen-tiable.

Conversely, suppose that f is the gradient of a differentiable function g. We needto show that f is conservative, or that the integral of f along every closed path iszero. Let then γ : I → A be a closed differentiable path and define the functionh : I → R,

h(t) = g(γ(t)).

74 Chapter 2

Then, by the chain rule,

h′(t) =n∑

j=1

∂g∂x j

(γ(t))γ′j(t) =n∑

j=1

f j(γ(t))γ′j(t),

i.e.,

0 = h(1) − h(0) =∫ 1

0h′(t) dt =

n∑j=1

∫ 1

0f j(γ(t))γ′j(t) dt =

∫γ

f · dγ.

n

Thus, every conservative field f is the gradient of a function g; such a function gis called a potential function, and it is determined only up to an additive constant.We now derive an additional property of conservative fields:

Proposition 2.14 Let f be a continuously differentiable conservative field (i.e.,all the n components of f have n partial derivatives that are continuous). Then,for every i, j = 1, . . . , n,

∂ fi

∂x j=∂ f j

∂xi.

A function f satisfying this property is called locally conservative.

Proof : We have f j =∂g∂x j

. Since g has continuous second derivatives it follows that

∂2g∂xi∂x j

=∂2g∂x j∂xi

,

which is precisely what we need to show. n(28 hrs)

x x+h!j

x+h!i x+ h!i +h!j

Here is an independent justification of local conservation. Let h > 0 be a finitenumber, and let x ∈ A. Since A was assumed to be open, there exists an open ballaround x which is in A, and we can take h sufficiently small such that B(x, 2h) ⊂ A.Consider now the following four paths:

γ1(t) = x + the j

γ2(t) = x + he j + thei

γ3(t) = x + thei

γ4(t) = x + hei + the j.


If the field f is conservative then∫γ1

f · dγ +∫γ2

f · dγ =∫γ3

f · dγ +∫γ4

f · dγ.

Substituting the definitions we have∫ 1

0f j(x+the j)h dt+

∫ 1

0fi(x+he j+thei)h dt =

∫ 1

0fi(x+thei)h dt+

∫ 1

0f j(x+hei+the j)h dt.

We re-organize this equation as follows:∫ 1

0

fi(x + he j + thei) − fi(x + thei)h

dt =∫ 1

0

f j(x + hei + the j) − f j(x + the j)h

dt.

By the mid-value theorem,∫ 1

0

∂ fi

∂x j(x + thei + θhe j) dt =

∫ 1

0

∂ f j

∂xi(x + the j + ηhei) dt.

It remains to apply the integral mid-value theorem and let h → 0 to recover thedesired result. Note that we have used explicitly the continuous-differentability off , and we didn’t need to use any property of second derivatives. This also clarifiesthe “local” nature of this property.

The question is whether this implication is reciprocal: does local conservationimply conservation? The following example shows that this is not the case.

Example: Consider the vectors field

f (x, y) =(−y

x2 + y2 ,x

x2 + y2

)defined on R2 \ {(0, 0)} (in R2 we use the coordinates (x, y) rather than (x1, x2)). Itis a locally conservative field since

∂ f1

∂y−∂ f2

∂x=

(−1

x2 + y2 +2y2

(x2 + y2)2

)−

(1

x2 + y2 −2x2

(x2 + y2)2

)=

(−x2 − y2 + 2y2) − (x2 + y2 − 2x2)(x2 + y2)2 = 0.

76 Chapter 2

This vector field is however not conservative. Take the closed path

γ(t) = (cos t, sin t) t ∈ [0, 2π],

then,f (γ(t)) = (− sin t, cos t),

and ∫γ

f · dγ =∫ 2π

0(− sin t)(− sin t) dt +

∫ 2π

0cos t cos t dt = 2π , 0.

N

Proposition 2.15 Let f be a locally conservative vector field in an open path-connected domain A. If γ1 and γ2 are two differentiable closed paths, and thereexists a continuously twice differentiable homotopy H(t, s) between γ1 and γ2, then∫

γ1

f · dγ =∫γ2

f · dγ.

Comment: Here again, the restrictions on differentiability can be alleviated andreplaced by a more delicate analysis.

Proof : Recall the definition of a homotopy. H is a twice continuously-differentiablefunction I2 → A such that for every s, H(t, s) is a closed path, and

H(t, 0) = γ1(t) H(t, 1) = γ2(t).

We also know that f is locally conservative. For every s we define J : I → R by

J(s) =∫

H(·,s)f · dγ,

and we will be done if we prove that J(s) is independent of s. Now,

J′(s) =dds

n∑j=1

∫ 1

0f j(H(t, s))

∂H j

∂t(t, s) dt

=

n∑j=1

∫ 1

0

[∂

∂sf j(H(t, s))

∂H j

∂t(t, s) + f j(H(t, s))

∂2H j

∂s∂t(t, s)

]dt

=

n∑j=1

∫ 1

0

[∂

∂sf j(H(t, s))

∂H j

∂t(t, s) + f j(H(t, s))

∂2H j

∂t∂s(t, s)

]dt

=

n∑j=1

∫ 1

0

[∂

∂sf j(H(t, s))

∂H j

∂t(t, s) −

∂

∂tf j(H(t, s))

∂H j

∂s(t, s)

]dt,


where we have used the integration by parts and the facts that H(0, s) = H(1, s)for every s. Using the chain rule we have

J′(s) =n∑

j=1

n∑k=1

∫ 1

0

[∂ f j

∂xk(H(t, s))

∂Hk

∂s(t, s)∂H j

∂t(t, s) −

∂ f j

∂xk(H(t, s))

∂Hk

∂t(t, s)∂H j

∂s(t, s)

]dt.

Using the local conservation property, this expression vanishes as the two sumsare the same, upon renaming the summation indexes. n

Corollary 2.2 In a simply-connected domain every locally conservative field isconservative.

Proof : In a simply-connected domain every closed path γ1 is homotopic to aconstant path γ2(t) = x ∈ A, whose derivative is zero, hence∫

γ1

f · dγ =∫γ2

f · dγ = 0.

n

With the previous example in mind we reach the following conclusion, which wehave already used to explain why R2 and R3 were not homeomorphic.

Corollary 2.3 The perforated plane R2 \ {(0, 0)} is not simply connected.

. Exercise 2.12 In Newton’s theory of gravitation, the force that earth exerts onan elephant is a vector field in R3 \ {(0, 0, 0)}:

f (x, y, z) = GMm(

−x(x2 + y2 + z2)3/2 ,

−y(x2 + y2 + z2)3/2 ,

−z(x2 + y2 + z2)3/2

),

where (x, y, z) are Cartesian coordinates with origin at the center of earth, G is aconstant, M is the earth’s mass, and m is the mass of the elephant.

À Prove that this is a conservative field.Á What is the potential?Â The integral of the force field along a path is the amount of work needed to

move the elephant between the ends of the path. If the elephant is originallyon earth (i.e., at a distance R0 from the origin), how much work is requiredto move it to “infinity”?

78 Chapter 2

Chapter 3

Differential Calculus in Rn

In the first chapter, we learned about metric spaces, their topology, and specifi-cally, about metric spaces of real-valued functions. Note that we have almost notdealt with differential and integral calculus. The reason is that derivatives and in-tegrals are associated with limits of differences and sums, which are not pertinentto general metric spaces. The notions of derivatives and integrals can be definedfor functions that take values in normed spaces, which, as we learned, are a sub-class of metric spaces. In this chapter we will develop the differential calculusof functions between finite-dimensional normed spaces, which are all isomorphicto a Euclidean space. Differential calculus can also be constructed for infinite-dimensional normed spaces, but this is beyond the scope of the present course1.

3.1 Normed spaces

In this section we briefly review some facts about vector spaces endowed withnorms. We will not restate the axiomatic definition of a vector space. Recall thata vector space is n-dimensional if there exist n (non-null) vectors that are linearlyindependent, but every n+1 (non-null) vectors are linearly dependent. The space isinfinite-dimensional if there exist n independent vectors for every n. Furthermore,recall that a basis u1, u2, . . . is a set of independent vectors such that every vector

1Physics students who studied analytical mechanics encountered functional derivatives,which are derivatives of mappings between functions and real numbers.

80 Chapter 3

x has a representation,

x =n∑

j=1

a ju j.

In an n-dimensional vector space every n linearly independent vectors form sucha basis. It is easy to see that for a given basis the representation is unique.

Let X be a vector space. A norm on X is a function ‖ · ‖ : X → R which sat-isfies the properties of positivity, homogeneity and the triangle inequality. Asdiscussed in our first chapter, a norm induces a metric on X, so that convergenceof a sequence xk to x in this metric space amounts to convergence in norm,

limk→∞‖xk − x‖ = 0.

Two norms ‖ · ‖1 and ‖ · ‖2 on X are said to be equivalent if there exist two realnumbers c,C > 0 such that for all x ∈ X.

c‖x‖1 ≤ ‖x‖2 ≤ C‖x‖1.

If two norms are equivalent then xk → x with respect to the ‖ · ‖1 norm if and onlyif xk → x with respect to the ‖ · ‖2 norm. An equivalence between norms is anequivalence relation.

Let X be an n-dimensional vector space, and let {u1, . . . , un} be a basis. Since everyx ∈ X has a unique representation,

x =n∑

j=1

a ju j,

it follows that there exists a one-to-one correspondence between elements of Xand n-tuples of real numbers, i.e., Rn, via the mapping

n∑j=1

a ju j ↔

a1...

an

.This mappings is in fact an isomorphism as is preserves vector operations. Inparticular, a norm on X induces a norm on Rn,∥∥∥∥∥∥∥∥∥

a1...

an

∥∥∥∥∥∥∥∥∥ def=

∥∥∥∥∥∥∥n∑

j=1

a ju j

∥∥∥∥∥∥∥ ,

Differential Calculus in Rn 81

and vice versa.

The main property of finite-dimensional normed spaces is that all norms are equiv-alent. By the previous argument, it is sufficient to show that this is correct for Rn.We will omit the proof, which relies on the compactness of the unit sphere.

3.2 Linear mappings between normed spaces

As always in mathematics, once we have defined a certain class of objects (here,normed vector spaces), we define mappings between such spaces. We start withdefinitions that hold also for infinite-dimensional normed spaces, before restrict-ing the discussion back to finite-dimensional ones.

Definition 3.1 Let X and Y be normed spaces over R, endowed with norms ‖ · ‖Xand ‖·‖Y , respectively. A mapping T : X → Y is called linear if for every x, x′ ∈ X,α, β ∈ R,

T (αx + βx′) = αT (x) + βT (x′).

Definition 3.2 (The norm of a linear transformation) Let T : X → Y be a lin-ear mapping between normed spaces. We define

‖T‖X,Y = sup‖x‖X=1

‖T x‖Y .

The supremum may be a priori infinite.

Comments:

• The notations ‖ · ‖X, ‖ · ‖Y , and ‖ · ‖X,Y are unbearable.... we shall alwaysuse the short-hand notation ‖ · ‖, provided that the meaning is clear from thecontext.

• By the linearity of T ,

supx,0

‖T x‖‖x‖

= supx,0

∥∥∥∥∥Tx‖x‖

∥∥∥∥∥ = sup‖x‖=1‖T x‖ = ‖T‖,

which provides and alternative definition for ‖T‖.

82 Chapter 3

• It follows from the definition that for all x , 0,

‖T x‖ = ‖x‖‖T x‖‖x‖

≤ ‖x‖ supy,0

‖Ty‖‖y‖= ‖T‖ ‖x‖. (3.1)

Obviously, this inequality holds also for x = 0.

• The number ‖T‖ (if it is finite) is the maximum amount (in norm) by whichthe linear operator T can stretch a vector.

• We denote the space of linear transformations from the vector space X tothe vector space Y by Hom(X,Y). It is a vector space over R with vectoraddition and scalar multiplication defined by

(T + S )x = T x + S x and (αT )x = α(T x).

(For those who know some algebra, the notation Hom is appropriate becausevector spaces are groups under the addition operation, and linear transfor-mations are homomorphisms between such groups.)

• We denote the subset of bounded linear operators

{T ∈ Hom(X,Y) : ‖T‖ < ∞}

by L(X,Y). The function ‖ · ‖ is a norm in L(X,Y). Positivity and homogene-ity are obvious. The triangle inequality follows from

‖T + S ‖ = sup‖x‖=1‖(T + S )x‖ = sup

‖x‖=1‖T x + S x‖

≤ sup‖x‖=1

(‖T x‖ + ‖S x‖) ≤ sup‖x‖=1‖T x‖ + sup

‖x‖=1‖S x‖

= ‖T‖ + ‖S ‖.

• The identity I is an element of L(X, X). We always have ‖I‖ = 1.(30 hrs)

Let T ∈ Hom(X,Y) and S ∈ Hom(Y,Z). Then their composition S ◦ T is a linearmapping from X to Z, i.e., S ◦ T ∈ Hom(X,Z).

Proposition 3.1 Let T ∈ L(X,Y) and S ∈ L(Y,Z). Then S ◦ T ∈ L(X,Z) and

‖S ◦ T‖ ≤ ‖S ‖ ‖T‖. (3.2)


Proof : For every x ∈ X, we use twice inequality (3.1) to get

‖(S ◦ T )x‖ = ‖S (T x)‖ ≤ ‖S ‖ ‖T x‖ ≤ ‖S ‖ ‖T‖ ‖x‖.

Taking the supremum over all ‖x‖ = 1 we get the desired result,

‖S ◦ T‖ = sup‖x‖=1‖(S ◦ T )x‖ ≤ sup

‖x‖=1‖S ‖ ‖T‖ ‖x‖ = ‖S ‖ ‖T‖.

n

An important property of linear operators between normed spaces (at this point,still possibly infinite-dimensional) is that boundedness coincides with continuity.

Proposition 3.2 Let T ∈ Hom(X,Y) where X,Y are normed spaces. Then T iscontinuous if and only if it is bounded (i.e., ‖T‖ < ∞).

Proof : (a) Bounded implies continuous: Assume that T was bounded and letxk → x in X (i.e., ‖xk − x‖ → 0). Then, by inequality (3.1):

‖T xk − T x‖ = ‖T (xk − x)‖ ≤ ‖T‖ ‖xk − x‖k→∞−→ 0,

i.e., T xk → T x, which implies that T is continuous.

(b) Continuous implies bounded: Conversely, assume that T was continuous.Since T : 0 7→ 0, then for a given ε > 0 there exists a δ > 0 such that

‖y‖ ≤ δ ⇒ ‖Ty‖ ≤ ε.

For every x on the unit sphere, we use the homogeneity of the norm to get

‖T x‖ =1δ‖T (δx)‖ ≤

ε

δ,

i.e., ‖T‖ ≤ ε/δ, which proves the boundedness of T . n

If X and Y are finite-dimensional, then every operator T ∈ Hom(X,Y) is bounded,i.e., L(X,Y) = Hom(X,Y). Thus, from now on we will only use the notation Hom,and the fact that every linear operator is continuous.

Comment: Having identified the space of linear transformations Hom(X,Y) as anormed space (and in particular a metric space), it makes sense to talk about the

84 Chapter 3

convergence of a sequence of linear transformations (Tk) ⊂ Hom(X,Y). Such asequence is said to converge to a linear transformation T if

limk→∞‖Tk − T‖ = 0.

An n-dimensional vector space X has a basis (e1, . . . , en), such that every x ∈ Xhas a unique representation as

x =n∑

i=1

aiei.

We endow this space with an inner-product, i.e., a bilinear, symmetric positivemapping X × X → R. By the bilinearity, the inner is completely defined by speci-fying it for every pair of basis vectors, (ei, e j). The basis is said to be orthonormalwith respect to the inner product if

(ei, e j) = δi j,

where δi j is the Kronecker delta. Then, for vectors x =∑n

i=1 aiei and y =∑nj=1 b je j we have

(x, y) =

n∑i=1

aiei,

n∑j=1

b je j

= n∑i=1

aibi.

The Euclidean norm in X is defined in terms of the inner product:

‖x‖ =√

(x, x) =

n∑i=1

a2i

1/2

.

In Rn, the standard orthonormal basis has basis vectors:

ei = (0, . . . , 0, 1, 0, . . . , 0)T ,

so that the inner product of the two vectors

a = (a1, . . . , an) and b = (b1, . . . , bn),

is

(a, b) =n∑

i=1

aibi.

Note that vectors in Rn are always column vectors.


. Exercise 3.1 Let X be a spaces endowed with an inner-product, (·, ·):

À Remind yourself the definition of an inner-product space.Á Prove that for every x, y ∈ X the Cauchy-Schwarz inequality holds:

|(x, y)| ≤ ‖x‖ ‖y‖.

Let now (e1, . . . , en) and ( f1, . . . , fm) be orthonormal bases for Rn and Rm, respec-tively. A linear transformation T ∈ Hom(Rn,Rm) is fully specified by prescribingits action on the basis vectors of Rn. Indeed, since any x ∈ Rn has a representation

x =n∑

i=1

ciei,

it follows by the linearity of T that

T x =n∑

i=1

ci Tei.

Now, for every i = 1, . . . , n, Tei, as a vector in Rm, has a unique representation ofthe form

Tei =

n∑j=1

b j f j.

Since the coefficients b j depend on i (for every i there are m such coefficients), wemake this fact explicit in the notation by renaming the b j as a ji. Finally, we wishto make it explicit that the coefficients a ji characterize the linear transformationT (another transformation will have another set of associated coefficients). Theseconsiderations lead us to the following choice of notation,

Tei =

m∑j=1

a ji(T ) f j.

If on both sides we take an inner product with fk, we find that

aki(T ) = (ek,Tei).

For an arbitrary x =∑n

i=1 ciei,

T x =n∑

i=1

ci Tei =

n∑i=1

ci

m∑j=1

a ji(T ) f j =

m∑j=1

n∑i=1

a ji(T ) ci

f j.

86 Chapter 3

Thus, if we represent the vectors in Rn and Rm by their standard representation ascolumn vectors, the action of T takes the form of matrix multiplication:

T :

c1...

cn

7→a11(T ) . . . a1n(T )...

...am1(T ) . . . amn(T )

c1...

cn

.Proposition 3.3 Given orthonormal bases (e1, . . . , en) and ( f1, . . . , fm), a sequenceof linear transformations (Tk) converges in Hom(Rn,Rm) to a linear transforma-tion T if and only if ai j(Tk) converges to ai j(T ) for every 1 ≤ i ≤ m and 1 ≤ j ≤ n.

Proof : (a) Convergence of matrix elements implies convergence in norm:Suppose that ai j(Tk)→ ai j(T ) for all i, j. Then,

‖Tk − T‖ = sup‖x‖=1‖(Tk − T )x‖

= sup∑c2

j=1

∥∥∥∥∥∥∥m∑

j=1

n∑i=1

(a ji(Tk) − a ji(T ))ci

f j

∥∥∥∥∥∥∥= sup∑

c2j=1

m∑j=1

n∑i=1

(a ji(Tk) − a ji(T ))ci

21/2

≤

m∑j=1

n∑i=1

(a ji(Tk) − a ji(T ))2

1/2

→ 0,

where we have used the Cauchy-Schwarz Inequality.

(b) Convergence in norm implies convergence of matrix elements: Supposethat Tk → T . Then, by the Cauchy-Schwarz inequality and (3.1),

|a ji(Tk) − a ji(T )| = |( f j, (Tk − T )ei)| ≤ ‖ f j‖ ‖(Tk − T )ei‖ ≤ ‖Tk − T‖ → 0.

n

Example: Linear transformation in Hom(Rn,Rn) can be represented as n × n ma-trices. Suppose that Rn with standard basis ei is endowed with the `∞n norm. Findan explicit expression for the norms of linear transformations.

In this case,‖x‖ = max

1≤i≤n|xi|,


and for a linear transformation T represented by a matrix with entries ai j,

‖T‖ = sup‖x‖=1‖T x‖ = sup

‖x‖=1max1≤i≤n

∣∣∣∣∣∣∣n∑

j=1

ai jx j

∣∣∣∣∣∣∣ ≤ sup‖x‖=1

max1≤i≤n

n∑j=1

|ai j||x j| ≤ max1≤i≤n

n∑j=1

|ai j|.

This maximum is however attainable if we take x j = sgn(ai j) for the maximizingi, hence,

‖T‖ = max1≤i≤n

n∑j=1

|ai j|,

i.e., it is the maximum over all rows of the matrix ai j of the sum of absolute valuesof its entries. N

. Exercise 3.2 Find explicit expressions for the `1n and the `2n norms of a matrix.

We focus now on Hom(Rn,Rn). Among all those linear transformation, thereexists a subset of transformations that are invertible; we denote them by GL(Rn)(the General Linear groups; the set of invertible matrices is a group under matrixmultiplication). In linear algebra we learned the following equivalences:

Proposition 3.4 The following statements are equivalent:

À T ∈ Hom(Rn,Rn) is invertible, i.e., there exists an S ∈ Hom(Rn,Rn) suchthat TS = S T = I (we denote the inverse by T−1).

Á The kernel of T includes only the zero vector, i.e., T x = 0 iff x = 0.

Â T is onto. For every y ∈ Rn there exists an x ∈ Rn such that y = T x.

Ã det T , 0.

In addition, we state the following topological properties of linear transforma-tions:

Proposition 3.5

À The determinant is a continuous function from Rn to R.

Á GL(Rn) is an open set in Hom(Rn,Rn).

Â The mapping T 7→ T−1 is a continuous mapping in GL(Rn).

88 Chapter 3

Proof : (a) Continuity of the determinant: We need to show that Tk → T impliesthat det Tk → det T . The determinant is a polynomial function of the entries ofthe matrix ai j(T ), and we have proved that Tk → T implies ai j(Tk) → ai j(T ) forall i, j, which in turn implies that det Tk → det T .

(b) GL(Rn) is open in Hom(Rn,Rn): Since the determinant is a continuous map-ping det : Hom(Rn,Rn)→ R, and R \ {0} is open in R, it follows that

GL(Rn) = {T ∈ Hom(Rn,Rn) : det T ∈ R \ {0}}

is open in Hom(Rn,Rn).

(c) Continuity of the inverse mapping: The continuity of the mapping T 7→ T−1

follows from Cramer’s rule. The ai j(T−1) are rational functions of the coefficientai j(T ), hence are continuously dependent of them. n

Comment: If follows that the mapping T 7→ T−1 is a homeomorphism in GL(Rn).(32 hrs)

3.3 Differentiability and derivatives

We now turn to general (not only linear) mappings from Rn to Rm. Having chosenthe standard bases in Rn and Rm, we can represent such transformations as mfunctions, each depending on n variables:

f1(x1, . . . , xn)...

fm(x1, . . . , xn).

Example: The following is a nonlinear mapping from R2 to R3:

f :(x1

x2

)7→

x21 sin x2√

x1/x2

x1

.The function f has three component, given by

f1(x1, x2) = x21 sin x2

f2(x1, x2) =√

x1/x2

f3(x1, x2) = x1.

N


0 1 2 3 4 5 6−1

−0.8

−0.6

−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

x

y

The function sin(x)

−2

−1

0

1

2

−2

−1

0

1

2−14

−12

−10

−8

−6

−4

−2

0

x

The function −(x−0.5)2 − (y−0.5)2

y

z

Figure 3.1: The graphs of real-valued functions R→ R (left) and R2 → R (right).

Example: Real-valued functions, f : R→ R, are best visualized by their graphs,as one-dimensional curves embedded in the plane. One coordinate of the plane—the abscissa—represents the value of the independent variable, whereas the sec-ond coordinate—the ordinate—represents the value of its image. Similarly, real-valued functions f : R2 → R are visualized by their graph which is a (two-dimensional) surface embedded in R3 (see Figure 3.1). In the same way, the graphof a real-valued function f : Rn → R is an n-dimensional manifold embedded inRn+1. N

We want to define derivatives for such functions f : Rn → Rm. We first remindourselves how we defined derivatives of function f : R → R. The standarddefinition for the derivative of f at the point x is

f ′(x) = limh→0

f (x + h) − f (x)h

.

This definition cannot be generalized in any simple way to derivatives of multi-variate functions (we can only define partial derivatives). Another definition of thederivative of a univariate function is the following: we say that f is differentiableat x if there exists a real number a such that

limy→0

f (x + y) − f (x) − ayy

= 0,

and this number a, which can be shown to be unique, is the derivative of f at x.An equivalent way of writing it is

f (x + y) − f (x) − ay = o(|y|),

90 Chapter 3

which means that, to linear order in y, ay is the best approximation to f (x + y) −f (x). Note that although x and y are both real-numbers, they play different roles:x is a point in R, whereas y is a displacement in R (in a more general context, yis said to be an element to the space tangent to R at the point x).

Having formulated differentiability this way, we think of the number a as a lineartransformation, which converts the displacement y from the point x, into an (ap-proximate) displacement, ay, of the image f (x + y) − f (x).. This interpretation ofthe derivative can be generalized in a natural way into mappings from Rn to Rm:

Definition 3.3 The function f : Rn → Rm is said to be differentiable at a pointx ∈ Rn is there exists a linear transformation T ∈ Hom(Rn,Rm) such that

f (x + y) − f (x) − Ty = o(‖y‖),

i.e.,

limy→0

f (x + y) − f (x) − Ty‖y‖

= 0. (3.3)

We will denote the linear transformation T by D f (x) (the differential, or deriva-tive of f at the point x). We denote by D f (x)y, or by D f (x)(y) the differential of fat x operating on a vector y ∈ Rn, resulting in a vector in Rm.

Comment: Recall that the limit in (3.3) is the limit in norm, which amount to acomponentwise limit.

Comment: As explained above, D f (x) is a linear transformation which maps thedisplacement y from the point x into a vector D f (x)y, which approximates thedisplacement of f (x + y) from f (x).

Comment: The function D f is a (generally nonlinear) function fromRn to Hom(Rn,Rm);it is called the Frechet derivative of f ; it can be generalized to mappings betweenarbitrary Banach spaces. The standard notion of derivative from the first calculuscourse is called the Gateaux derivative. We will see the relation between the twodefinitions later on.

But first, we need to check that the derivative D f (x) is well defined, i.e., that if alinear transformation wit hthe required properties exists, it is unique.

Proposition 3.6 If f is differentiable at x then the linear transformation T in thedefinition (3.3) is unique.


Proof : Suppose that T, S ∈ Hom(Rn,Rm) both satisfy

limy→0

f (x + y) − f (x) − Ty‖y‖

= 0

limy→0

f (x + y) − f (x) − S y‖y‖

= 0.

For every finite displacement y ∈ Rn,

(S − T )y =[f (x + y) − f (x) − Ty

]−

[f (x + y) − f (x) − S y

].

Dividing by ‖y‖ and letting y→ 0,

limy→0

(S − T )y‖y‖= 0.

We need to show that this last relation implies that S − T = 0. In particular, wecan y → 0 in a specific way: let z ∈ Rn be a unit vector and set y = tz, so thatt → 0 implies y→ 0. Then,

(S − T )y‖y‖= (S − T )

tz‖tz‖= (S − T )z.

Letting t → 0 we get(S − T )z = 0,

from which follows that‖S − T‖ = 0,

i.e., S = T . n

As defined, D f (x)y is the derivative of f at x acting on a displacement vector y.Fix now y and consider the action D f (x) on displacements hy, where h ∈ R; thatis, we restrict our attention to displacement along a single direction. By definition,in f is differentiable at x, then

limh→0

f (x + hy) − f (x) − D f (x)(hy)‖hy‖

= limh→0

f (x + hy) − f (x) − h D f (x)y|h|‖y‖

= 0,

orD f (x)y = lim

h→0

f (x + hy) − f (x)h

. (3.4)

This gives a new interpretation to the vector D f (x)y ∈ Rm; it is the rate of changeof f when we displace its argument in the y-direction; we call it the directionalderivative of f at x in the y-direction.

92 Chapter 3

Comment: Directional derivatives are generalizations of partial derivatives. Iffollows directly from (3.4) that

∂ f∂x j

(x) = D f (x)e j. (3.5)

Comment: The right-hand side of (3.4) does not look like a linear operator actingon the vector y. The fact that it is so is due to f being differentiable.

. Exercise 3.3 (Ex. 10 in L.) Find a continuous function f for which the limitin (3.4) exists for all y, but it is nevertheless not a linear function of y.

Example: Consider the case where f is a linear transformation, f : x 7→ T x, withT ∈ Hom(Rn,Rm). Such a mapping is always differentiable as

f (x + y) − f (x) − Ty‖y‖

= 0,

so that D f (x) = T for all x, or, D f (x) : y 7→ Ty. This generalizes the identity(ax)′ = a. N

Example: If f is a constant function, f : x 7→ c, then D f (x) is the zero transfor-mation, D f (x) : y 7→ 0. N

Comment: Differentiability is a property of a function in a neighborhood of apoint. Thus, a notion of the derivative of f at x requires x to be an interior point.Thus, when we consider a function that is differentiable in a subset of Rn, thissubset must be open.

If f : Rn → Rm is differentiable is some domain, then for every x ∈ Rn, D f (x) ∈Hom(Rn,Rm). Since Hom(Rn,Rm) is a normed space, we have a notion of conti-nuity of D f (x) as function of x. Thus, we can define continuously differentiablefunctions.

Definition 3.4 Let A ⊂ Rn be an open set and f : A → Rm a function that isdifferentiable at every x ∈ A. f is said to be continuously-differentiable if themapping

D f : A→ Hom(Rn,Rm)

is continuous.


So far we have looked at the derivative in a general setting, without reference to aparticular basis. Equation (3.5) relates directional derivatives along basis vectorswith partial derivatives. More generally, we have:

Proposition 3.7 Let f (x) = ( f1(x1, . . . , xn), . . . , fm(x1, . . . , xn)) be differentiable ata point x. Then, all the m × n partial derivatives of f at x exists and the operatorD f (x) has the matrix representation

D f (x) =

∂ f1

∂x1. . .

∂ f1

∂xn...

......

∂ fm

∂x1. . .

∂ fm

∂xn

,where all the partial derivatives are evaluated at the point x.

Proof : The existence of the partial derivatives was already established in (3.5).From (3.4) we have

D f (x)e j = limh→0

f (x + he j) − f (x)h

.

Both sides are vectors in Rm. Taking the inner product with ei we get2

(ei,D f (x)e j

)= lim

h→0

fi(x + he j) − fi(x)h

def=∂ fi

∂x j(x).

n

Example: When f : Rn → R we have D f (x) ∈ Hom(Rn,R), i.e., the derivative off is represented by a row vector whose entries are(

∂ f∂x1

(x), . . . ,∂ f∂xn

(x)).

This vector is called the gradient of f at x. More precisely, the gradient, ∇ f (x),is a column vector formed with the same entries. The formal relation between thederivative D f (x) and the gradient ∇ f (x) is

D f (x)y = (∇ f (x), y) .

2Limits commute with inner products, because the latter is a continuous function Rn×Rn → R.

94 Chapter 3

Since Hom(Rn,R) is isomorphic Rn (the space of row vectors is isomorphic to thespace of column vectors), we can view the gradient either as an operator D f (x)or as a vector ∇ f (x)3. The gradient, as a vector, has a geometrical interpretation.Consider all the unit vectors in Rn. One may ask in which direction is the rate ofchange of f maximal, i.e., which unit vector y maximizes

limh→0

f (x + hy) − f (x)h

.

By (3.4), we are looking for the maximizer of

D f (x)y = (∇ f (x), y) .

Clearly, the unit vector along which f changes the fastest is parallel to the gradient.Thus, the gradient of f is a vector pointing in the direction of maximal change rateof f .

Moreover, consider the space

M⊥f (x) def=

{y ∈ Rn : D f (x)y = 0

}.

This is an (n − 1)-dimensional subpace of Rn that spans all the directions alongwhich f does not change, to first order in the displacement. This space spans theplane tangent to the level sets of f at the point x. N

Example: When f : R→ Rm we have D f (x) ∈ Hom(R,Rm), i.e., the derivative off is represented by a column vector whose entries are

∂ f1

∂x(x)...

∂ fm

∂x(x)

,which acts on elements of R by scalar multiplication. For functions defined on Rwe often use the notation f ′(x) for the derivative. N

3The one-to-one correspondence between vectors and real-valued linear operators on vectorsoccurs also in infinite dimension, It is known as the Riesz representation theorem.


Example: Let f : R2 → R2 be given by

f :(x1

x2

)7→

(x2

1 + x2

x1 + x22

).

Its derivative at x = (x1, x2) is represented by the 2-by-2 matrix

D f (x) =(2x1 11 2x2

).

N

Example: Consider the function f : R2 → R,

f(x1

x2

)=

√x2

1 + x22.

This function is not differentiable at 0 because its partial derivatives do not exist.For example,

limh→0

f (0 + he1) − f (0)h

= limh→0

|h|h

does not exist. N

The question is whether the opposite it true: does the existence of all the m ×n partial derivatives ensure the differentiability of the function? We approachthis question in two steps. We first show that differentiability amounts to thedifferentiability of all the components (which shouldn’t come as a surprise sincelimits in Rm coincide with componentwise limits).

Proposition 3.8 Let A ⊂ Rn be an open set and f : A → Rm; we denote its com-ponents by ( f1, . . . , fm), which are all functions A→ R. Then f is differentiable atx ∈ A if and only if all its components f j are differentiable at that point.

Proof : (a) Differentiability of the f j implies the differentiability of f : Sup-pose that all the f j were differentiable at x. By definition, there exist m lineartransformations T j ∈ Hom(Rn,R), j = 1, . . . ,m, such that

limy→0

f j(x + y) − f j(x) − T jy‖y‖

= 0.

96 Chapter 3

Each of the T j has a matrix representation as a row vector of length n. If weorganize these m equations as a column vector then

limy→0

1‖y‖

f1(x + y)...

fm(x + y)

−

f1(x)...

fm(x)

−T1...

Tm

y

= 0,

which implies that f is differentiable and D f (x) is represented by the matrixformed by the row vectors T j.

(b) Differentiability of f implies the differentiability of the f j: This followsby the same argument. n(34 hrs)

Theorem 3.1 Let A ⊂ Rn be an open set and f : A→ Rm. Then f is differentiableat x ∈ A if all its partial derivatives at x exist and are continuous.

Proof : By the previous proposition it is enough to prove this theorem for real-valued functions, i.e., for m = 1. Let f : A→ R and assume that

∂ f∂x j

(x), j = 1, . . . , n

exist are are continuous. We need to show that there exists a linear operator T ∈Hom(Rn,R) such that

limy→0

f (x + y) − f (x) − Ty‖y‖

= 0.

We know what this operator is; it is the row vector

T =(∂ f∂x1

(x), . . . ,∂ f∂xn

(x)).

That is, we need to show that

limy→0

1‖y‖

f (x + y) − f (x) −n∑

j=1

∂ f∂x j

(x)y j

= 0.

Since f has continuous partial derivatives we can invoke the mean-value theoremalong each component of its arguments. We first write the displacement of f in


the form of a telescoping sum,

f (x + y) − f (x) = f (x1 + y1, x2, . . . , xn) − f (x1, . . . , xn)+ f (x1 + y1, x2 + y2, . . . , xn) − f (x1 + y1, x2, . . . , xn)+ . . .

+ f (x1 + y1, . . . , xn + yn) − f (x1 + y1, . . . , xn−1 + yn−1, xn),

that is, each row consists of a variation of f upon the displacement of its argumentalong a single axis. By the (one-dimensional!) mean-value theorem, which weapply n times,

f (x+ y)− f (x) = y1∂ f∂x1

(x1 + θ1y1, x2, . . . , xn)+ · · ·+ yn∂ f∂xn

(x1 + y1, . . . , xn + θnyn),

where all the θ j are in (0, 1). Thus,

1‖y‖

f (x + y) − f (x) −n∑

j=1

∂ f∂x j

(x)y j

= n∑j=1

∂ f∂x j

x +∑

i

θiyiei

− ∂ f∂x j

(x)

y j

‖y‖.

Letting y→ 0, the right-hand side vanishes by the continuity of the partial deriva-tives. n

Example: The “counter example” is when f has all its m × n derivatives at apoint, but it is nevertheless not differentiable. This can happen when its partialderivatives are not continuous. Take the following example of f : R2 → R:

f :(x1

x2

)7→

x3

1

x21 + x2

2

x , 0

0 x = 0.

This function has partial derivatives at zero,

∂ f∂x1

(0) = limh→0

f (0 + he1) − f (0)h

= limh→0

h3/h2

h= 1

∂ f∂x2

(0) = limh→0

f (0 + he2) − f (0)h

= 0.

On the other hand, this function is not differentiable at 0. If it were, we wouldhave

limx1,x2→0

1√x2

1 + x22

(x3

1

x21 + x2

2

−(1 0

) (x1

x2

))= 0,

98 Chapter 3

i.e.,

limx1,x2→0

−x1x22

(x21 + x2

2)3/2= 0.

This is not the case as seen when taking x1 = x2 → 0.

N

We have seen that the existence of partial derivatives is a necessary by not suffi-cient condition for differentiability. The continuity of the partial derivatives en-sures differentiability. In fact, it ensures that the derivative is also continuous.Conversely, the continuity of the derivative implies the continuity of the partialderivatives. This leads us to the following corollary:

Corollary 3.1 Let A ∈ Rn be an open set. f : A → Rm is continuously differen-tiable in A if and only if its partial derivative exist and are continuous in A.

Definition 3.5 Let A ∈ Rn be an open set. We denote by C1(A;Rm) the set ofcontinuously-differentiable functions from A into Rm.

. Exercise 3.4 (Ex. 12 in L.) Let f : Rn → Rn such that D f (x) = 0 on an openconnected set U ⊂ Rn. Prove that f is constant on U.

. Exercise 3.5 Let A ⊂ Rn be an open set and f : A → Rm. Show that if f isdifferentiable in A then it is also continuous, in fact, Lipschitz continuous.

We have thus defined a notion of differentiability that generalizes the “old” con-cept of differentiability for functions R → R. Moreover, we have a formula forcalculating the derivative of a function Rn → Rm, which involves the calculationof m × n partial derivatives. The next stage is to do “useful” things with this newconcept, and in particular, generalize results known for real-valued functions.

Proposition 3.9 (The chain rule) Let A ⊆ Rn and B ⊆ Rm be open domains withf : A → Rm and g : B → Rk. Assume that x ∈ A satisfies f (x) ∈ B, with f , gdifferentiable at x and f (x), respectively. Then, the function g ◦ f : A → Rk isdifferentiable at x and its derivative is given by the chain rule:

Dg◦ f (x)︸︷︷︸Rn→Rk

= Dg( f (x))︸︷︷︸Rm→Rk

◦D f (x)︸︷︷︸Rn→Rm

.


Proof : For displacements y ∈ Rn and z ∈ Rm we define the following functions,

r(y) = f (x + y) − f (x) − D f (x)ys(z) = g( f (x) + z) − g( f (x)) − Dg( f (x))z.

By the definition of D f (x) and Dg( f (x)), we have r(y) = o(‖y‖) and s(z) = o(‖z‖).Now,

g( f (x + y)) = g(

f (x) + D f (x)y + r(y))

= g( f (x)) + Dg( f (x))(D f (x)y + r(y)

)+ s(D f (x)y + r(y)),

which we can re-organize as follows,

(g ◦ f )(x + y) − (g ◦ f )(x) − Dg( f (x)) ◦ D f (x)y‖y‖

= A1 + A2,

whereA1 = Dg( f (x))

r(y)‖y‖

A2 =s(D f (x)y + r(y))

‖y‖.

We need to show that both A1 and A2 tend to zero as y→ 0. For A1 we have

‖A1‖ ≤ ‖Dg( f (x))‖‖r(y)‖‖y‖

→ 0.

A2, on the other hand, can be bounded by

‖A2‖ =‖D f (x)y + r(y)‖

‖y‖‖s(D f (x)y + r(y))‖‖D f (x)y + r(y)‖

≤

(‖D f (x)‖ +

‖r(y)‖‖y‖

)‖s(z)‖‖z‖,

where z = D f (x)y + r(y). Since

‖z‖ ≤ ‖D f (x)‖ ‖y‖ + ‖r(y)‖,

it follows that y→ 0 implies z→ 0, from which we conclude that ‖s(z)‖/‖z‖ → 0,hence ‖A2‖ → 0. n

Example: Consider the case where f : R2 → R3 is given by

f :(x1

x2

)7→

cos x1 cos x2

cos x1 sin x2

sin x1

,

100 Chapter 3

and g : R3 → R is given by

g :

y1

y2

y3

7→ y21 + 2y2

2 + 3y23.

The composite function is

(g ◦ f ) :(x1

x2

)7→ cos2 x1 cos2 x2 + 2 cos x2

1 sin x22 + 3 sin2 x1.

The derivative of f at x is given by

D f (x) =

− sin x1 cos x2 − cos x1 sin x2

− sin x1 sin x2 cos x1 cos x2

cos x1 0

,whereas the derivative of g at y is given by

Dg(y) = (2y1, 4y2, 6y3) .

The composition of the derivatives gives

Dg( f (x))◦D f (x) = (2 cos x1 cos x2, 4 cos x1 sin x2, 6 sin x1)

− sin x1 cos x2 − cos x1 sin x2− sin x1 sin x2 cos x1 cos x2

cos x1 0

.On the other hand,

Dg◦ f (x) =(sin 2x1

(− cos2 x2 − 2 sin x2

2 + 3)− cos2 x1 sin 2x2 + 2 cos x2

1 sin 2x2).

It is easily checked that the two are equal. N

Comment: One often encounters a different notation of the chain rule. If f : Rn →

Rm, g : Rm → Rk, and y = f (x), z = g(y), then an alternative way of writing thechain rule is

∂zi

∂x j=

m∑`=1

∂zi

∂y`

∂y`∂x j.


Definition 3.6 Let f : Rn → Rm be differentiable at x. The Jacobian of f at x isdefined as

J f (x) = det D f (x).

A non-zero Jacobian means that the derivative is an invertible transformation.The Jacobian also plays an important role in integration theory, since it quantifieshow volumes are transformed under mappings.

Example: Consider the transformation

f :(x1

x2

)7→

(x1 cos x2

x1 sin x2

).

Its Jacobian is J f (x) = x1. N

Example: The Jacobian of the identity is 1. N

Example: The Jacobian of a composite function satisfies

Jg◦ f (x) = Jg( f (x))J f (x).

N (36 hrs)

For real-valued function, a “small” derivative implies that the function changes by“little” relative to the displacement of its argument. We expect a similar result tohold for functions between multi-dimensional Euclidean spaces, except that herethe magnitude of the derivative should be the norm of its derivative.

Lemma 3.1 Let h : [0, 1] → Rm bea continuously-differentiable path inRm such that

sup0≤t≤1‖Dh(t)‖ = M.

Then,

‖h(1) − h(0)‖ ≤ M.

R!

h"0#

h"1#

Comment: Recall that the derivative Dh(t) = h′(t) is a column vector whose entriesare h′j(t).

102 Chapter 3

Proof : The quantity that we want to bound is the norm of the end-to-end displace-ment of the path. For that we need to use the given data about the magnitude ofthe derivative. We therefore define a function, f : R→ R, given by

f (t) = (h(t) − h(0), h(1) − h(0)) ,

keeping in mind that f (1) is the norm-square of the end-to-end displacement. Bythe continuity of the inner-product, we may differentiate inside the inner-product:

f ′(t) = (Dh(t), h(1) − h(0)) .

By the Cauchy-Schwarz inequality,

| f ′(t)| ≤ ‖Dh(t)‖ ‖h(1) − h(0)‖ ≤ M ‖h(1) − h(0)‖.

Since f is a differentiable real-valued function, there exists by the mean-valuetheorem a point ξ ∈ (0, 1) such that

f (1) − f (0) = f ′(ξ),

i.e.,‖h(1) − h(0)‖2 = f ′(ξ) ≤ M ‖h(1) − h(0)‖,

which is the desired result. n

With this lemma we prove the following theorem:

Theorem 3.2 Let A be an open subset of Rn, g : A→ Rm a differentiable function,and x, y ∈ A such that

I = {ty + (1 − t)x : 0 ≤ t ≤ 1} ⊂ A.

(If A is convex, e.g., a ball, then this requirement holds.) Assume furthermore that

supz∈I‖Dg(z)‖ = M.

Then,‖g(y) − g(x)‖ ≤ M ‖y − x‖.


Proof : To use the previous lemma we define

h(t) = g(ty + (1 − t)x︸︷︷︸f (t)

) t ∈ [0, 1],

which is a path [0, 1]→ Rm satisfying by the chain rule

Dh(t) = Dg( f (t)) ◦ D f (t) = Dg( f (t))(y − x),

i.e.,‖Dh(t)‖ ≤ sup

t∈(0,1)‖Dg( f (t))‖ ‖y − x‖ = M ‖y − x‖.

By the previous lemma, ‖h(1) − h(0)‖ ≤ M‖y − x‖, which is the desired result. n

AR!

x

I

y

g"x#

g"y#

h"t#$

We conclude this section by a short discussion on higher derivatives. First, re-call that the space C1(Rn;Rm) is the space of continuously-differentiable functionsfrom Rn to Rm, and it coincides with the space of functions that have continuouspartial derivatives. For f ∈ C1(Rn;Rm), its derivative is a mapping

D f : Rn 7→ Hom(Rn,Rm),

i.e., a (generally nonlinear!) mapping between two finite-dimensional normedspaces. Thus, we can define the derivative of D f at x: D f is called differentiableat x is there exists a linear operator

D2f (x) : Rn → Hom(Rn,Hom(Rn,Rm)),

such that

limy→0

‖D f (x + y) − D f (x) − D2f (x)y‖

‖y‖= 0.

104 Chapter 3

Note that D2f (x) is a bilinear form. It maps an ordered pair of vectors in Rn into a

vector in Rm. Specifically, we have

D2f (x) ∈ Hom(Rn,Hom(Rn,Rm))

D2f (x)y ∈ Hom(Rn,Rm)

D2f (x)yz ∈ Rm.

To understand how does the second derivative operate, we use (3.4) to get

D2f (x)y = lim

h→0

D f (x + hy) − D f (x)h

.

Substituting y = e j we get

D2f (x)e j =

∂D f

∂x j(x).

This is an equality between operators. Applying both sides on the basis vector ei

we get

D2f (x)e jei =

∂

∂x j(D f (x)ei) =

∂2 f∂xi ∂x j

(x).

When f is real-valued this is a matrix called the Hessian (the matrix of secondderivatives). Given two vectors y, z ∈ Rn

D2f (x)yz =

n∑i, j=1

yiz j∂2 f∂xi ∂x j

(x).

In a similar way, higher derivatives may be defined. We conclude this discussionwith following definition:

Definition 3.7 We denote by Ck(A;Rm) the class of functions f for which allpartial derivatives,

∂k f∂xi1∂xi2 . . . ∂xik

exist and are continuous.

3.4 Multivariate mean value and Taylor theorems

Proposition 3.10 Let A ⊆ Rn be an open, path-connected set. Then f : A → Rm

is constant if and only if D f (x) = 0 (the zero operator) for all x ∈ A.


Proof : We have already seen that f = const implies D f = 0. Suppose nowthat D f = 0 in A, and let x, y ∈ A. Since A is path-connected, there exists a(continuously-differentiable) path ϕ : I → A such that ϕ(0) = x and ϕ(1) = y.Define then the path g : I → Rm by

g(t) = f (ϕ(t)).

By the chain rule,g′(t) = Dg(t) = D f (ϕ(t)) ◦ ϕ′(t) = 0,

from which we conclude that all the components of g are constant, i.e., f (x) =f (y). n

Corollary 3.2 Let A ⊆ Rn be open and path connected and f , g ∈ C1(A;Rm) suchthat

D f (x) = Dg(x) for all x ∈ A.

Then there exists a constant c ∈ Rm such that

f (x) = g(x) + c.

Proof : Apply the previous proposition to the function h(x) = f (x) − g(x). n

We now consider a generalization of the mean value theorem. Recall that forf ∈ C1([a, b];R), there exists a ξ ∈ (a, b) such that

f (b) − f (a) = f ′(ξ)(b − a),

or, alternatively, there exists a θ ∈ (0, 1) such that

f (b) − f (a) = f ′(a + θ(b − a))(b − a).

The question is whether this theorem holds for functions f : Rn → Rm, i.e., is ittrue that for every x, y ∈ Rn there exists a θ ∈ (0, 1) such that

f (y) − f (x) = D f (x + θ(y − x))(y − x)?

In general, this is false. It only holds for real-valued functions.

Theorem 3.3 (Mean value theorem) Let A ⊆ Rn be open, x, y ∈ A, such that thesegment that connects them is in A, and f ∈ C1(A;R). Then there exists a θ ∈ (0, 1)such that

f (y) − f (x) = D f (x + θ(y − x))(y − x).

106 Chapter 3

Proof : The idea is to use this known result for univariate function. Consider thefunction g : I → R:

g(t) = f (x + t(y − x)).

This function is differentiable with

g′(t) = D f (x + t(y − x))(y − x).

By the univariate mean value theorem there exists a θ ∈ (0, 1), such that

g(1) − g(0) = g′(θ),

which concludes the proof. n

Comment: Since a function f : Rn → Rm is a row of m real-valued functions,each component satisfies the mean-value theorem, but with a different θ, that isthere exist θ1, . . . , θm ∈ (0, 1) such that

f j(y) − f j(x) = D f j(x + θ j(y − x))(y − x).

In vector notation,

f (y) − f (x) =

∂ f1

∂x1(ξ1) . . .

∂ f1

∂xn(ξ1)

......

...∂ fm

∂x1(ξm) . . .

∂ fm

∂xn(ξm)

(y − x).

Example: Here is a “counter example”. Consider the function f : R→ R3:

f : t 7→

cos tsin t

t

.This function is differentiable with

D f (t) = f ′(t) =

− sin tcos t

1

,


but for all θ ∈ (0, 1),

D f (θ)(1 − 0) =

− sin θcos θ

1

,001

= f (1) − f (0).

N

We proceed to show that if f : Rn → Rm has continuous second partial derivatives,i.e., f ∈ C2(Rn;Rm), then the order of mixed derivation does not matter, i.e, forevery ` = 1, . . . ,m, i, j = 1, . . . , n,

∂2 f`∂xi∂x j

=∂2 f`∂x j∂xi

.

Clearly, it is enough to consider real-valued functions.

Theorem 3.4 (Equality of mixed derivatives) Let A ⊂ Rn be an open set andf ∈ C2(A;R). Then for every i, j = 1, . . . , n,

∂2 f∂xi∂x j

=∂2 f∂x j∂xi

.

Proof : Obviously, we only need to care about mixed derivatives, i , j. Let x ∈ A,and let t, s be real numbers such that the rectangle whose vertices are

x, x + tei, x + se j and x + tei + se j

is in A. Define,g(x) = f (x + te j) − f (x).

This function is differentiable with

Dg(x) = D f (x + te j) − D f (x).

By the mean value theorem, which we apply twice,

g(x + sei) − g(x) = Dg(x + θsei)(sei)

=[D f (x + te j + θsei) − D f (x + θsei)

](sei)

=

[∂ f∂xi

(x + te j + θsei) −∂ f∂xi

(x + θsei)]

s

= st∂2 f∂x j∂xi

(x + θsei + θte j)

108 Chapter 3

This can be rearranged as follows

1t

[f (x + tei + se j) − f (x + tei)

s−

f (x + se j) − f (x)s

]=∂2 f∂x j∂xi

(x + θsei + θte j),

and it remains to take s→ 0 followed by t → 0. n

Finally, we prove a multivariate version of Taylor’s theorem. Here again, weconsider real-valued functions. The result applies for Rm-valued functions in arow-by-row fashion.

Theorem 3.5 Let A ⊂ Rn be an open set, f ∈ Ck+1(A;R), x ∈ A, x+ y ∈ A (as wellas the segment connecting the two points). Then there exists a θ ∈ (0, 1) such that

f (x + y) = f (x) +n∑

i=1

yi∂ f∂xi

(x) +12

n∑i=1

n∑j=1

yiy j∂2 f∂x j∂xi

(x) + · · · + Rk(y),

where

Rk(y) =1

(k + 1)!

n∑i1=1

· · ·

n∑ik+1=1

yi1 . . . yik+1

∂k+1 f∂xi1 . . . ∂xik+1

(x + θy).

Comment: The compact formulation (that uses differential forms) is

f (x + y) = f (x) + D f (x)y +12

D2f (x)(y ⊗ y) +

16

D3f (x)(y ⊗ y ⊗ y) + . . . .

Proof : Here again, we base the proof on the univariate version of Taylor’s theo-rem. Define g : I → R by

g(t) = f (x + ty),

and expand g(1) about t = 0. We note that

g(0) = f (x)

g′(0) =n∑

i=1

∂ f∂xi

(x)yi

g′′(0) =n∑

i=1

n∑j=1

yiy j∂2 f∂xi ∂x j

(x),

and so on. n


Comment: For functions f : Rn → Rm each component f j satisfies Taylor’s the-orem, but every component will have its own θ, as we have already seen for themean value theorem.

Example: Taylor’s theorem is above all an approximation method. It states thatevery (smooth) function can, to zeroth order, be approximated by a constant, tofirst order by a linear function, and so on. First-order approximation gives riseto the multivariate Newton method. Suppose we have a function f : Rn → Rn

whose root(s) we want to compute. That is, we are looking for r ∈ Rn for whichRn 3 f (r) = 0. Suppose we have an initial guess x0 close enough to the desired r.By Taylor’s theorem,

f (x) = f (x0) + D f (x0)(x − x0) + . . . .

Retaining only the linear approximation, and provided that D f (x0) is not singular,an approximation for r is obtained by

f (x0) + D f (x0)(r − x0) = 0 ⇒ r = x0 − [D f (x0)]−1 f (x0).

This suggests the following iterative method:

xn+1 = xn − [D f (xn)]−1 f (xn),

known as the multivariate Newton method. The hope is that xn → r. This se-quence does not always converge, but when it does, it does it extremely fast! N

. Exercise 3.6 Assume that f ∈ C1(Rn;Rn), r a root of f and J f (r) , 0. Showthat there exists a neighborhood of r in which Newton’s method converges to r.

3.5 Minima and maxima

We devote this short section to the identification of extrema of multivariate real-valued functions. Throughout this section we consider functions f ∈ C2(A;R),where A ⊆ Rn is open.

Definition 3.8 A point a ∈ A is called a local maximum of f if there exists anopen neighborhood of a, U, such that

f (a) = supx∈U

f (x).

A local minimum is defined accordingly.

110 Chapter 3

The following proposition generalizes the well known property of extremal pointsfor univariate functions.

Proposition 3.11 If a ∈ A is a local maximum of f then D f (a) = 0.

Comment: A point where the derivative vanishes is called a critical point.

Proof : Let U be a neighborhood of a in which f (a) is maximal. By definition,there exists an open ball B0(a, r) ⊂ U in which f (a) is maximal. To show thatD f (a) = 0 we need to show that D f (x)y = 0 for every unit vector y. Let then y bean arbitrary unit vector and consider the path g : I → R,

g(t) = f (a + (2t − 1)ry),

which is inside B0(a, r). The path g is differentiable and reaches a local maximumat t = 1/2, hence

0 = g′(1/2) = 2r D f (a)y,

which concludes the proof. n

We know already for univariate functions that a vanishing derivative does notguarantee a local extremum. A univariate functions f reaches a local maximum ata if its first derivative vanishes and, in addition, its second derivative is negative.We expect a similar condition to hold for multivariate functions, except that thesecond derivative is a matrix (the Hessian).

Definition 3.9 An n-by-n matrix A is called positive-definite if

(x, Ax) ≥ 0 for all x ∈ Rn,

with equality only if x = 0. A is called positive-semidefinite if equality may alsohold for x , 0. Negative-definiteness is defined similarly.

Theorem 3.6 Let f ∈ C2(A;R) and let a ∈ A be a critical point. (a) If the Hessianof f at a is negative-definite then a is a local maximum. (b) If a is a local maximumthen the Hessian at a is negative-semidefinite.

Proof : (a) Negative-definiteness implies a local maximum: By Taylor’s theo-rem, there exists for every displacement y a θ ∈ (0, 1) such that

f (a + y) = f (a) + (y,D2f (a + θy)y),


where we have used the fact that a is a critical point. Since D2f (a) is negative-

definite and the unit ball is compact there exists a number r < 0 such that

r = max‖z‖=1

(z,D2f (a)z).

Since f is continuously twice differentiable, there exists a δ > 0 such that ify ∈ B0(a, δ) then

‖D2f (a + y) − D2

f (a)‖ ≤ −r/2.

For every y ∈ B0(a, δ) and unit vector z, using the negative-semidefiniteness ofD2

f (a),

(z,D2f (a + θy)z) = (z,D2

f (a)z) + (z, (D2f (a + θy) − D2

f (a))z)

≤ r + (z, (D2f (a + θy) − D2

f (a))z)

≤ r + ‖D2f (a + θy) − D2

f (a)‖,

or, (z,D2

f (a + θy)z)≤ r/2 < 0,

from which we conclude that f (a + y) < f (a), i.e., f assumes a local minimum ata.

(b) Local maximum implies semidefiniteness: Here we may use the univariatecondition of a maximum. Let B0(a, r) be an open ball in which f reaches itsmaximum at a, and let y ∈ Rn have norm less that r. Consider the path g : I → R,

g(t) = f (a + (2t − 1)y)

Since g reaches a local maximum at t = 1/2 is follows that

g′′(1/2) =n∑

i, j=1

yiy j∂2 f∂xi ∂x j

(a) ≤ 0,

i.e., D2f (a) is negative semi-definite. n

3.6 The inverse function theorem

Lemma 3.2 Let A ⊂ Rn be an open set and f ∈ C1(A;Rm). Let x, y ∈ A such thatthe segment I that connects them is in A. Then for every a ∈ A,

‖ f (x) − f (y) − D f (a)(x − y)‖ ≤ ‖x − y‖ supw∈I‖D f (a) − D f (w)‖.

112 Chapter 3

Comment: The one-dimensional reduction of this lemma is

| f (x) − f (y) − f ′(a)(x − y)| ≤ |x − y| supw∈I| f ′(a) − f ′(w)|.

Proof : Define the function g : A→ Rm,

g(x) = f (x) − D f (a)x.

Then,Dg(x) = D f (x) − D f (a),

and by Theorem 3.2,

‖g(x) − g(y)‖ ≤ ‖x − y‖ supw∈I‖Dg(w)‖,


Theorem 3.7 (Inverse function theorem) Let A ⊂ Rn be an open set and f ∈C1(A; Rn). Suppose that J f (a) , 0 for some point a ∈ A. Then there exists anopen neighborhood U of a, a ∈ U ⊂ A, such that V = f (U) is also open, and fis one-to-one and onto V. This defines an inverse function f −1 : V → U, which isalso continuously differentiable.

Comment: For real-valued functions, the inverse function theorem states that iff : R → R is continuously-differentiable and at the point a, f ′(a) , 0, then thereexists a neighborhood of a in which f has an inverse, which is also continuously-differentiable.

Comment: If the theorem holds, then we can derive a formula for the derivativeof f −1 using the chain rule. Since

f ( f −1(y)) = y,

it follows thatD f ( f −1(y)) ◦ D f −1(y) = I,

or,D f −1(y) = [D f ( f −1(y))]−1.

(38 hrs)


Proof : Since J f (a) is non-zero (i.e, det D f (a) , 0), the linear transformation D f (a)has an inverse (D f (a))−1. Moreover, since D f is continuous, there exists a closedball B(a, r) ⊂ A such that

‖D f (y) − D f (a)‖ ≤1

2 ‖(D f (a))−1‖

for all y ∈ B(a, r). (We choose this ball such that D f does not change by “toomuch” in it; in particular, we will show that D f remains invertible in B(a, r).)

Step 1: f is one-to-one in B(a, r) That is, we show that

x, y ∈ B(a, r) and x , y ⇒ f (x) , f (y).

From the Lemma 3.2 it follows that for every two points x, y ∈ B(a, r)

‖ f (x) − f (y) − D f (a)(x − y)‖ ≤ ‖x − y‖ supw∈B(a,r)

‖D f (w) − D f (a)‖ ≤‖x − y‖

2 ‖(D f (a))−1‖.

Applying the triangle inequality,

‖ f (x) − f (y)‖ ≥ ‖D f (a)(x − y)‖ − ‖ f (x) − f (y) − D f (a)(x − y)‖

≥ ‖D f (a)(x − y)‖ −‖x − y‖

2 ‖(D f (a))−1‖

=1

‖(D f (a))−1‖

[‖(D f (a))−1‖ ‖D f (a)(x − y)‖ −

12‖x − y‖

]≥

1‖(D f (a))−1‖

[‖(D f (a))−1D f (a)(x − y)‖ −

12‖x − y‖

],

i.e.,

‖ f (x) − f (y)‖ ≥‖x − y‖

2 ‖(D f (a))−1‖. (3.6)

This implies that f is one-to-one in B(a, r).

Step 2: D f is invertible in B(a, r) To show that, we need to show that

x ∈ B(a, r) and 0 , y ∈ Rn ⇒ D f (x)y , 0.

114 Chapter 3

Let x ∈ B(a, r) and 0 , y ∈ Rn; we use the continuity of the norm and (3.6) to get

‖D f (x)y‖ =∥∥∥∥∥lim

h→0

f (x + hy) − f (x)h

∥∥∥∥∥ = limh→0

1|h|‖ f (x + hy) − f (x)‖

≥ limh→0

1|h|

‖hy‖2 ‖(D f (a))−1‖

=‖y‖

2 ‖(D f (a))−1‖> 0,

which proves that D f (x) is invertible.

a f!a"

B!a,r"

f!B!a,r""

V

f#1!V"

Step 3: construction of U and V We proved that f was one-to-one in B(a, r).It seems as if the open ball B0(a, r) is a good candidate for the neighborhoodU of a. The problem is that it is not necessarily true that f (B0(a, r)) is open(continuous functions not necessarily map open sets into open sets). We nowshow that f (B0(a, r)) contains an open neighborhood V of f (a); its pre-image Uis guaranteed to be open, and f will be a one-to-one function from U onto V .

Let x ∈ ∂B(a, r); since f is one-to-one f (x) , f (a). The boundary of B(a, r) isclosed and bounded, hence compact, which implies that f (∂B(a, r)) is compactand does not include f (a). We proved in the past that it implies that there is apositive distance, ε, between f (a) and f (∂B(a, r)). Consider now the open ballB0( f (a), ε/2). We claim that

B0( f (a), ε/2) ⊂ f (B0(a, r)),

i.e.,∀w ∈ B0( f (a), ε/2) ∃b ∈ B0(a, r) such that f (b) = w.

Let w be an arbitrary point in B0( f (a), ε/2). Consider the function h : B(a, r)→ Rdefined by

h(x) = ‖ f (x) − w‖2 =n∑

j=1

( f j(x) − w j)2.


This function is continuous and defined on a compact set, hence it reaches a min-imum. Denote this minimum by b, that is b ∈ B(a, r) satisfies

‖ f (b) − w‖ ≤ ‖ f (x) − w‖ for all x ∈ B(a, r).

We claim that b cannot be on the boundary, since b ∈ ∂B(a, r) would imply

‖ f (b) − f (a)‖ ≥ ε,

and therefore

‖ f (b) − w‖ ≥ ‖ f (b) − f (a)‖ − ‖ f (a) − w‖ > ε −ε

2> ‖ f (a) − w‖,

which is a contradiction.

Thus, b ∈ B0(a, r). Since b is the minimizer of the differentiable function h, all itspartial derivatives must vanish at b, i.e.,

0 =∂h∂xi

(b) = 2n∑

j=1

( f j(b) − w j)∂ f j

∂xi(b) = 2( f (b) − w)T D f (b).

But D f (b) is invertible, from which we conclude that f (b) = w, i.e., that w ∈f (B0(a, r). Since w was chosen as an arbitrary point in B0( f (a), ε/2) it followsthat

B0( f (a), ε/2) ⊂ f (B(a, r)).

At this point we set V = B0( f (a), ε/2) and U = f −1(V) ∩ B0(a, r). These two setsare open and f is one-to-one and onto V , hence f −1 is defined on V .

Step 4: f −1 is continuously-differentiable We first show that f −1 is Lipschitzcontinuous: let y, z ∈ V , then by (3.6)

‖y − z‖ = ‖ f ( f −1(y)) − f ( f −1(z))‖ ≥‖ f −1(y) − f −1(z)‖

2 ‖(D f (a))−1‖,

or‖ f −1(y) − f −1(z)‖ ≤ 2 ‖(D f (a))−1‖ · ‖y − z‖.

We next show that f −1 is differentiable. Let y be an arbitrary point in V , and k ∈ Rn

sufficiently small such that y + k ∈ V . We define

x = f −1(y) and h = f −1(y + k) − f −1(y),

116 Chapter 3

and note that conversely,

y = f (x) and k = f (x + h) − f (x).

Since f −1 is Lipschitz continuous in V , there exists a constant M such that

‖h‖ ≤ M ‖k‖. (3.7)

Then,

f −1(y + k) − f −1(y) − (D f (x))−1k = (D f (x))−1[D f (y)h − k

]= −(D f (y))−1

[f (x + h) − f (x) − D f (x)h

].

By the differentiability of f and (3.7), the right hand side tends to zero whenk → 0, from which we conclude that f −1 is differentiable, and moreover,

D f −1(y) = (D f (x))−1 = [D f ( f −1(y))]−1.

Finally f −1 is continuously-differentiable due to the continuity of the matrix in-verse. n(40 hrs)

Before we proceed, some preliminaries. Recall that the column rank of a trans-formation A : Rn → Rm is the dimension of the vector space spanned by itscolumns (a subspace of Rm). Similarly, its row rank is the dimension of the vec-tor space spanned by its rows (a subspace of Rn). The row rank and column rankare always equal, hence

rank(A) ≤ min(m, n).

A is said to have full rank if rank(A) = m, thus it can have full rank only if m ≤ n(a transformation from a “large” space into a “small” space; A is a “fat-and-short”matrix).

Lemma 3.3 If A : Rn → Rm has full rank, then there exists a matrix B : Rm → Rn

such that AB : Rm → Rm is invertible.

Proof : Since A has rank m, its null space (or kernel) is (n − m)-dimensional, i.e.,Rn has an m-dimensional subpace which is orthogonal to ker A—there exist morthonormal vectors u1, . . . , un ∈ R

n, such that

Au j , 0 and Aw = 0 implies (w, u j) = 0.

Take then the columns of B be the u j. We claim that AB is invertible. Indeed, forevery non-zero x ∈ Rm, Bx ∈ (ker A)⊥, which implies that ABx , 0, i.e. AB isnon-singular. n(42 hrs)


Discussion: The derivative of a function f at a point a is a mapping betweendisplacements: it maps displacements from a into displacements from f (a). In themore general context of functions on manifolds, it maps vectors in the tangentspace that it attached to every point.

Consider momentarily functions f : R → R. The derivative f ′(a) tells us thatlocally a displacement x from the point a will be mapped into a displacementf ′(a)x from the point f (a). If f ′(a) , 0, an open neighborhood of a will bemapped into an open neighborhood of f (a), that is, the mapping is locally an“open” one. Similarly, for functions f : Rn → Rm the derivative D f (a) tells usthat a displacement x from a is mapped into a displacement D f (a)x of f (a) (up toan o(‖x‖) correction). If the rank of D f (a) is less than m, then there are directionsin Rm which are not “covered” by local displacements. If, on the other hand, D f (a)has full rank, then we expect again that an open neighborhood of a will be mappedinto an open neighborhood of f (a). The open mapping theorem formalizes theseideas.

Theorem 3.8 (Open mapping theorem) Let A ⊂ Rn be an open set, and let f ∈C1(A;Rm) be such that D f (x) has full rank for all x ∈ A. Then f is an openmapping: if B ⊂ A is open in Rn then f (B) is open in Rm.

Proof : We need to show that for every open set B ⊂ A and a ∈ B, f (a) is aninterior point of f (B), since it would imply that f (B) only contains interior points.This follows from the next lemma. n

Lemma 3.4 Let A ⊂ Rn be an open set and f ∈ C1(A;Rm). Suppose that at apoint a ∈ A, rank D f (a) = m (i.e., the derivative has full rank), then f (a) is aninterior point of f (A).

Proof : Since D f (a) has full rank, there exists a linear mapping T : Rm → Rn, suchthat D f (a) ◦ T is invertible. Define the function S : Rm → Rn,

S (x) = a + T x,

and on S −1(A) we define the function

g(x) = f (S (x)),

which is a function from an open subset of Rm into Rm (since T is continuous). Bythe chain rule

Dg(0) = D f (S (0)) ◦ T = D f (a) ◦ T,

118 Chapter 3

which is an invertible transformation. From the inverse mapping theorem, thepoint b has an open neighborhood which is mapped onto an open set that containsg(b) = f (a). Since the image of g is a subset of the image of f , we conclude thatf (a) has a neighborhood included in f (A), i.e., f (a) is an interior point.

n

Comment: This theorem has a generalized version, known as the Banach-Schaudertheorem: if X and Y are Banach spaces and A : X → Y is a surjective continuouslinear transformation, then A is an “open mapping”.

Example: Here is a “counter example”: the mapping f : R→ R,

f (x) = x2,

has a point x = 0, where D f (0) = 0 has rank less than one. This mapping is notopen since f (R) = [0,∞). N

3.7 The implicit function theorem

Consider, for example, the equation,

g(x, y) = ey cos(y3 + xy2 + x2) + ex sin(xy) − 1 = 0. (3.8)

This equation defines a relation between x and y; for every x there may existzero, one, or more values of y for which this equation is satisfied. Thus, such anequation defines a function R→ R, that may not be defined on the whole real line,and that may or may not be single-valued. If a (single-valued) function f : R→ Rsatisfying

g(x, f (x)) = 0

does exists, in say, a domain A ⊂ R, we say that (3.8) defines implicitly thefunction f (x). The implicit function theorem, which is the main topic of thischapter, states, in a more general setting, conditions on g that ensure that theequation g(x, y) = 0 does indeed define a function y = f (x).

Theorem 3.9 (Implicit function theorem) Let A ⊂ Rm and B ⊂ Rn be open sets,and

f ∈ C1(B × A;Rn) satisfies f (b, a) = 0


at a point (b, a) ∈ B × A ( f (y, x) is a mapping from a subset of Rn+m into Rn).Furthermore, suppose that the square matrix

∂ fi

∂y j(b, a), i, j = 1, . . . , n

is invertible. Then there exist open sets (b, a) ∈ U ⊂ Rn+m and a ∈ V ⊂ Rm and afunction g ∈ C1(V; B) such that for all (y, x) ∈ U,

f (y, x) = 0 iff y = g(x).

x!R!

y!R"

A

B

f#y,x$=0

a

a

U

V

Comment: Just a variable count: the equation f (y, x) = 0 constitutes n equationsfor n + m variables. Under the theorem’s condition, if we fix m of those vari-ables, we remain with n equations in n variables, and a unique solution exists.This implicitly defines a mapping between the m “fixed” variables and n resultingvariables, i.e., a mapping Rm → Rn.

Example: Consider the case m = n = 1 and the function

f (y, x) = x2y + xy3 − 2.

At the point (b, a) = (1, 1) we have f (b, a) = 0. Does there exist a neighborhoodof x = 1 and a function y = g(x) such that f (y, x) = 0 iff y = g(x)? According tothe theorem, we only need to verify that

∂ f∂y

(1, 1) = 1 + 3 = 4 , 0.

N

120 Chapter 3

Proof : Define a function F : B × A→ Rn+m by

F : (y, x) 7→ ( f (y, x), x).

At the point (y, x) = (b, a) we have

F(b, a) = ( f (b, a), a) = (0, a),

and

DF(b, a) =

∂ f1

∂y1. . .

∂ f1

∂yn

∂ f1

∂x1. . .

∂ f1

∂xm...

......

......

...∂ fn

∂y1. . .

∂ fn

∂yn

∂ fn

∂x1. . .

∂ fn

∂xm0 . . . 0 1 . . . 0

0 . . . 0 0 . . . 0

0 . . . 0 0 . . . 1

By assumption, the determinant of this matrix is non-zero. Thus, there exists, bythe inverse function theorem, an open neighborhood U of (b, a) such that W is anopen neighborhood of (0, a), F : U → W is one-to-one and onto and F−1 : W →U is continuously differentiable. Note that F−1 is a mapping of the form

(z, x) 7→ (h(z, x), x),

where h : W → B is continuously differentiable.

Define nowV = {x ∈ Rm : (0, x) ∈ W}

and g : V → B by g(x) = h(0, x). Since W is open so is V . And since h iscontinuously differentiable, so is g. Now,

(y, x) ∈ U, f (y, x) = 0 ⇔ (y, x) ∈ U, F(y, x) = (0, x)

⇔ (0, x) ∈ W, F−1(0, x) = (y, x)⇔ x ∈ V, h(0, x) = y⇔ x ∈ V, g(x) = y.

n(44 hrs)


Example: Consider the two equations:

0 = x3yt3 + ytz + 3y2 − 5

0 = z3 + xt2 − 2.

Does there exist a neighborhood of the point x = y = z = t = 1 where all thesolutions are of the form (

zt

)= g

(xy

),

where g is a continuously differentiable function R2 → R2?

To comply to the structure of the theorem we rewrite this system as

f1(y1, y2, x1, x2) = x31x2y3

2 + x2y2y1 + 3x22 − 5 = 0

f2(y1, y2, x1, x2) = y31 + x1y2

2 − 2 = 0,

where (y1, y2) are the old (z, t) and (x1, x2) are the old (x, y). Now∂ f1

∂y1

∂ f1

∂y2∂ f2

∂y1

∂ f2

∂y2

(1, 1, 1, 1) =(1 43 2

),

and the latter is invertible. N

3.8 Lagrange multipliers

In this section we consider the following problem: suppose we have a domainB ⊂ Rn and a function f ∈ C1(B;R), and we are looking for local minima ormaxima of f . We have seen that a necessary condition is that D f (the gradientof f ) vanishes as that point. Suppose however that we restrict ourselves to asubset of B determined by a set of k constraints: gi(x) = 0, where gi ∈ C1(B;R),i = 1, . . . , k. That is, we consider a restricted set:

A = {x ∈ B : g1(x) = · · · = gk(x) = 0} .

(For A not to be a trivial set, we need n ≥ k + 1.) The question is how to find alocal extremum of f within the set A.

122 Chapter 3

Example: A probability distribution on the set {1, . . . , n} is a vector p = (p1, . . . , pn)of non-negative entries that sum up to one. The entropy of the distribution is afunction H : Rn → R:

H(p) = −n∑

i=1

pi log pi.

We want to find the distribution that maximizes H under the constaint

g(p) =n∑

i=1

pi − 1 = 0.

N

Before formalizing this problem, we can solve it by the following considerations.Recall that the gradient of f at a points along the direction normal to the (n − 1)-dimensional hyperplane tangent to the level set of f . That is,

∇ f (a) ⊥ M⊥f (a) def=

{x ∈ Rn : D f (a)x = 0

}.

For f to be a local extremum in the constrained set A, we need f not to (locally)vary along directions that are level sets of all the gi at a. That is, we need

y ∈ M⊥g1(a) ∩ · · · ∩ M⊥gk

(a) ⇒ y ∈ M⊥f (a),

or.M⊥g1

(a) ∩ · · · ∩ M⊥gk(a) ⊆ M⊥f (a).

This can be re-written as follows

{x ∈ Rn : D f (a)x = 0

}⊇

x ∈ Rn :k∑

j=1

λ jDg j(a)x = 0,∀(λ1, . . . , λk)

.It follows that for a to be a local extremum of f in the hyperplane tangent to thelevel sets of all the gi there must exist k numbers (λ1, . . . , λk) such that

D f (a) =k∑

j=1

λ jDg j(a).

This will be found to be a corollary of the following theorem:


Theorem 3.10 Let B ⊂ Rn, k + 1 ≤ n,

f , g1, . . . , gk ∈ C1(B;R),

A = {x ∈ B : g1(x) = · · · = gk(x) = 0} ,

and a ∈ A is a local extremum point of f |A. Then the (k + 1)-by-n matrix

∂g1

∂x1. . .

∂g1

∂xn...

......

∂gk

∂x1. . .

∂gk

∂xn∂ f∂x1

. . .∂ f∂xn

has rank less than k + 1.

Proof : Suppose, wlog, that a is a local maximum point. That is, there exists anopen neighborhood U ⊂ B of a such that

f (a) = maxx∈A∩U

f (x).

Consider now the function F : U → Rk+1 defined by

F(x) =

g1...

gk

f

.We need to prove that DF(a) has rank less than k + 1. By contradiction, if it hadrank k + 1, then by the open mapping theorem

F(a) = (0, . . . , 0, f (a))T

would be an interior point of F(U) in Rn, which contradicts the fact that all thepoints of the form (0, . . . , 0, t), t > f (a) are not in F(U). n

Corollary 3.3 If Dg1(a), . . . ,Dgk(a) are linearly independent, then there exist knumbers λ1, . . . , λk, such that

D f (a) =k∑

i=1

λiDg1(a).

124 Chapter 3

The λi are called Lagrange multipliers. Thus, to find local extrema of f in A oneneeds to solve n + k equations in n + k unknowns:

g1(x) = 0...

gk(x) = 0

D f (x) =k∑

i=1

λiDg1(x).

Example: Let’s return to the above example: to maximize

f (p) = −n∑

i=1

pi log pi,

subject to the constraint

g(p) =n∑

i=1

pi − 1

we solve the system in n + 1 variables,

0 =n∑

i=1

pi − 1

− 1 − log p j = λ,

from which we get that all the p j are equal, i.e., equal to 1/n. This means that thedistribution that maximizes the entropy is the uniform distribution. N

Example: Find the non-negative vector in Rn that maximizes f (x) = x1x2 . . . xn

subject to the constraint that x1 + · · · + xn = n. Here we solve the system

x1 + · · · + xn = nx1x2 . . . x j−1x j+1 . . . xn = λ, j = 1, . . . , n.

Here again, we deduce that all the x j are equal to each other, and given by 1. Thismeans that for all such x,

x1x2 . . . xn ≤ 1.


In particular, for every non-negative vector y the vector

x =ny

y1 + · · · + yn

satisfies the normalization requirement, and therefore

nny1y2 . . . yn

(y1 + · · · + yn)n ≤ 1,

or,

(y1y2 . . . yn)1/n ≤1n

(y1 + · · · + yn),

which is the well-known arithmetic-mean-geometric-mean inequality. N

126 Chapter 3

Chapter 4

Integration in Rn

Consider a function f : A → R, where A ⊆ Rn. We would like to define theintegral of f over A, ∫

Af (x) dx.

Intuitively, it should be a straightforward generalization of the Riemann integral:partition A into little cubes, sample f in each cube, and sum up the sampled valuedof f times the volume of the cubes. The integral is the limit of this procedure asthe cubes become finer and finer. We are now going to formalize this general idea.

We start by defining integrals over domains that are rectangular boxes. Recall thata closed box in Rn is a set of the form

A = [a1, b1] × [a2, b2] × · · · × [an, bn],

with bi > ai, and that its volume is

V(A) =n∏

i=1

(bi − ai).

Suppose then that for each j = 1, . . . , n the segment [a j, b j] has a partition P j

made of pointsa j = t0

j < t1j < · · · < tm j

j = b j.

Such partitions of the segments induce a partition of the box into sub-boxes of theform

Ci1,...,in = [ti1−11 , ti1

1 ] × · · · × [tin−1n , tin

n ].

128 Chapter 4

There are (m1 − 1)(m2 − 1) . . . (mn − 1) such sub-boxes. We will call this partitionof A as formed by the partition P = P1× · · · ×Pn. It is tedious, but straightforwardto verify that

V(A) =∑

i1,...,in

V(Ci1,...,in).

In most cases, we will not need to worry about indexing: a partition of the seg-ments induces a partition of the box, and we can enumerate the sub-boxes in theorder we wish.

Definition 4.1 Let A be a box in Rn, f : A → R a bounded function, andA1, . . . , Ar the partition of A induced by a partition P. Define,

mi = infx∈Ai

f (x) and Mi = supx∈A

f (x).

The lower sum of f on A with respect to the partition P is

s( f , P) =r∑

i=1

mi V(Ai)

and similarly the upper sum of f on A with respect to the partition P is

S ( f , P) =r∑

i=1

Mi V(Ai)

Finally, a partition P′ is called a refinement of P if it is obtained from P by addingpartition points.

Proposition 4.1 If P′ is a refinement of P then

s( f , P) ≤ s( f , P′) ≤ S ( f , P′) ≤ S ( f , P).

Proof : This is obvious. n

Corollary 4.1 For every two partitions P, P′,

s( f , P) ≤ S ( f , P′).

Proof : Add a sub-partition of both P and P′. n

Integration in Rn 129

Definition 4.2 Let A and f be defined as above. The lower integral of f overthe box A is defined by∫

A

f (x) dx = sup {s( f , P) : all partitions P} .

The upper integral of f over the box A is defined by∫A

f (x) dx = inf {S ( f , P) : all partitions P} .

Definition 4.3 f is called Riemann integrable on A if∫A

f (x) dx =∫

Af (x) dx.

We call this expression the integral of f over A and denote it by∫A

f (x) dx or"

Af (x1, . . . , xn) dx1dx2 . . . dxn.

Proposition 4.2 Let f , g be Riemann integrable over A and c ∈ R, then

À f + g is integrable and∫A( f + g)(x) dx =

∫A

f (x) dx +∫

Ag(x) dx.

Á c f is integrable and ∫A(c f )(x) dx = c

∫A

f (x) dx.

Â The function 1 is integrable and∫A

1 dx = V(A).

Ã If f (x) ≥ 0 for all x ∈ A then ∫A

f (x) dx ≥ 0.

130 Chapter 4

Proof :

À For every partition S ,

s( f , P) + s(g, P) ≤ s( f + g, P) ≤ S ( f + g, P) ≤ S ( f , P) + S (g, P),

and taking suprema on s and infima on S we obtain the desired result.Á If c > 0 then for every partition P

s(c f , P) = c s( f , P) and S (c f , P) = c S ( f , P),

and we take suprema and infima, respectively. Otherwise, if c < 0,

s(c f , P) = c S ( f , P) and S (c f , P) = c s( f , P),

and, say,

supP

s(c f , P) = supP

c S ( f , P) = c infP

S ( f , P) = c∫

Af (x) dx.

Â For every partition P inducing a partition A1, . . . , Ar,

s(1, P) = S (1, P) =r∑

i=1

V(Ai) = V(A).

Ã If f (x) ≥ 0 then s( f , P) ≥ 0 for every partition P.

n

Corollary 4.2 If f (x) ≤ g(x) for all x then∫A

f (x) dx ≤∫

Ag(x) dx.

Proof : By the positivity of the integral,∫A

f dx ≤∫

Af dx +

∫A(g − f ) dx =

∫A

g dx.

n


Corollary 4.3 If | f (x)| ≤ M for all x then∣∣∣∣∣∫A

f (x) dx∣∣∣∣∣ ≤ M V(A).

We are now going to show the additivity of the integral with respect to the in-tegration set. At this point, we are still restricted to integration over rectangularboxes.

Proposition 4.3 Let A, B ⊂ Rn be boxes such that A◦ and B◦ are disjoint andA ∪ B is a box. Let f : A ∪ B → R be integrable over A and over B. Then it isintegrable over A ∪ B and∫

A∪Bf dx =

∫A

f dx +∫

Bf dx.

Proof : Let PA and PB be partitions of A and B. We can form a partition P ofA∪ B that contains all the points PA and PB. The partitions PA ∩ P and PB ∩ P arerefinements of PA and PB respectively.

A

B

Hence

s( f , PA) + s( f , PB) ≤ s( f , PA ∩ P) + s( f , PB ∩ P)= s( f , P) ≤ S ( f , P)= S ( f , PA ∩ P) + S ( f , PB ∩ P)≤ S ( f , PA) + S ( f , PB).

Thus, ∫A

f dx +∫

A

f dx ≤∫

A∪B

f dx ≤∫

A∪Bf dx ≤

∫A

f dx +∫

A

f dx,

132 Chapter 4

but the right and side and the left hand side are equal. n

Like for the univariate Riemann integral, we now consider under what conditionsis a function integrable.

Proposition 4.4 Let A ⊂ Rn be a box and f : A → R a continuous function.Then, f is integrable over A.

Proof : Since the box A compact, the function f is bounded and uniformly contin-uous on A. Thus, given ε > 0 there exists a δ > 0 such that

‖x − y‖ ≤ δ ⇒ | f (x) − f (y)| ≤ε

V(A).

Let P be a partition of A fine enough such that the diameter of every sub-box isless than δ. Then,

S ( f , P) − s( f , P) =∑

i

(Mi − mi) V(Ai) ≤ ε,

from which we conclude that∫A

f dx −∫

A

f dx ≤ ε,

i.e., the difference between the upper and lower integral is less that ε for everyε > 0, from which we conclude that f is integrable. n

The restriction to continuous function is unacceptable. The following theoremrelaxes the conditions on f :

Theorem 4.1 (Lebesgue) Let A ⊂ Rn be a box and f : A→ R a bounded function.Then f is integrable over A if and only if its points of discontinuity form a set ofmeasure zero.

Proof : TO BE COMPLETED n

Corollary 4.4 Let f1, . . . , fm : A → R be integrable functions and φ : Rm → R acontinuous function, then ∫

Aφ( f1(x), . . . , fm(x)) dx

exists.


Proof : The points of discontinuity of φ is the union of points of discontinuity ofthe fi. n

We are now in measure to define integrals over domains what are not only boxes.

Definition 4.4 Let S ⊂ Rn be a bounded set. The indicator function of S is thefunction

χS (x) =

1 x ∈ S0 x < S .

The set S is said to have a volume if for some rectangular box A ⊃ S the integral

V(S ) =∫

AχS (x) dx

exists.

We need to show however that V(S ) is well-defined, i.e., that it does not dependon the choice of A.

Lemma 4.1 Let A = [a1, b1] × · · · × [an, bn] be a box, and f : A → R a boundedfunction such that f (x) = 0 for all x ∈ A◦. Then f is integrable over A and itsintegral is zero.

Proof : Set M = supA | f (x)|, and let

F = 2n∑

i=1

∑j,i

(bi − ai).

be the sum of the (n − 1)-dimensional volumes of the 2(n − 1) sides of A. Givenε > 0 we partition A such that its sub-boxes have diameter less than ε/MF. If thepartition P divides A into sub-boxes A1, . . . , Ar, we can assume that only the firstk sub-boxes intersect ∂A, and the rest are in A◦. We claim that

k∑i=1

V(Ai) ≤ F ·ε

MF=ε

M.

Thus,

s( f , P) =r∑

i=1

miV(Ai) ≥k∑

i=1

(−M) V(Ai) ≥ −ε.

134 Chapter 4

Similarly,

S ( f , P) =r∑

i=1

MiV(Ai) ≤k∑

i=1

M V(Ai) ≤ ε,

from which we conclude that ∫A

f dx = 0.

n

Suppose now that A and B are boxes than enclose a set S . We want to show that∫AχS (x) dx =

∫BχS (x) dx.

It is sufficient to show this for the case where A ⊂ B, because we can then encloseA and B in a box C that contains both.

S

A

B

So let A ⊂ B. We can partition B into boxes B1, . . . , Br, where B1 = A, such thattheir union is B and their interiors are disjoint. Note that for all i = 2, . . . , r,

S ∩ Bi ⊂ A ∩ Bi ⊂ ∂Bi,

i.e. χS vanishes in B◦i . By the additivity of the integral and the previous lemmawe obtain the desired result.

Comment: For a box inRn this definition of the volume coincides with our originaldefinition.

The question is for what kind of sets in Rn we can prove the existence of a volumeand how to compute it.


Lemma 4.2 Let S ⊂ Rn be a compact set and f ∈ C(S ;R). The graph of f ,

{(x, f (x) : x ∈ S } ⊂ Rn+1

is a set of measure zero in Rn+1.

Proof : Let A ⊂ Rn be a box that contains S . Since f is continuous on a compactset , it is uniformly continuous, hence for every ε > 0 there exists a δ > 0 suchthat for every x, y ∈ S ,

‖x − y‖ ≤ δ ⇒ | f (x) − f (y)| ≤ε

V(A).

Let P be a partition of A into sub-boxes A j such their diameter is less than δ. Wenote the following inclusion,

{(x, f (x) : x ∈ S } ⊆⋃

A j∩S,∅

A j × [minA j∩S

f ,maxA j∩S

f ],

i.e., we enclose the graph of f in a finite set of boxes. On the other hand, thevolume of the right hand side is bounded by∑

j

V(A j)ε

V(A)= ε.

n

This means that given a continuous function f : S → R, S ⊂ Rn compact, the setof points

{(x, y) : |y| < f (x)} ⊂ Rn+1,

has a volume in Rn+1, since its indicator function is only discontinuous on thegraphs of f (x) and − f (x).

In fact, this lemma has two important generalization. The first is analogous toLebesgue’s theorem for integrability:

Lemma 4.3 Let S ⊂ Rn be a compact set and f : S → R a function whose pointsof discontinuity have measure zero in Rn. The graph of f ,

{(x, f (x) : x ∈ S } ⊂ Rn+1

is a set of measure zero in Rn+1.

136 Chapter 4

The second extension says that the measure (zero) of a set is invariant under trans-lations and rotations:

Lemma 4.4 Let A ⊂ Rn be a set of measure zero. Then for every rotation Q :Rn → Rn and every vector b ∈ Rn, the set

{Qx + b : x ∈ A}

has measure zero.

This last lemma is useful in the following setting: we can show that a set has afinite volume if we can partition its boundary into a finite number of sets, each ofwhich can be represented as the graph of a continuous function upon a rotation ofthe axes.

Finally, we are in measure to define the integral of functions over domains that arenot boxes:

Definition 4.5 Let S ⊂ Rn be a bounded set that has a volume and let f : S → Rbe a bounded function. f is called Riemann integrable over S is there exists a boxA enclosing S such that the integral∫

Af |S dx

def=

∫S

f dx,

where

f |S =

f (x) x ∈ S0 otherwise.

As before, we need to show that this definition is independent of A.

Comment: In principle we need to show at this stage that all the basic propertiesof the integral, which we proved for integration over boxes, carry verbatim tointegrals over arbitrary sets.

Lemma 4.5 Let A ⊂ Rn be a set with volume, as well as C,D ⊂ A. Then the setsC ∪ D and C ∩ D have volumes and

V(C ∪ D) = V(C) + V(D) − V(C ∩ D)

(an inclusion-exclusion principle).


Proof : Since C,D have volume the functions χC and χD are integrable, i.e., theirpoints of discontinuity of measure zero. It follows that the points of discontinuityof the functions χCχD and χC + χD − χCχD have measure zero, hence they areintegrable. Now,∫

A(χC + χD − χCχD) dx =

∫AχC dx +

∫AχD dx −

∫AχCχD dx,


Theorem 4.2 (Fubini) Let A ⊂ Rn and B ⊂ Rm be closed boxes, f : A × B → Ran integrable function then for every x, y the integrals

∫B

∫A

f (x, y) dx

dy and∫

B

∫A

f (x, y) dx dy

exist and equal to ∫A×B

f (x, y) dxdy.

Comment: Obviously, the roles of x and y can be interchanged.

Comment: Thus far the notation dxdy does not have any meaning, except forrepresenting the fact that the integration is over both variables x, y.

Proof : Since f is integrable on A × B there exists, for a given ε > 0, a partition Psuch that

S ( f , P) − s( f , P) ≤ ε.

Every partition P of A× B is a product of partitions PA and PB of the boxes A andB, yielding a partition in sub-boxes Ai, B j. For every y ∈ B we set

G(y) =∫

Af (x, y) dx and H(y) =

∫A

f (x, y) dx.

138 Chapter 4

Then,

s( f , P) =∑

i

∑j

V(Ai)V(B j) inf(x,y)∈Ai×B j

f (x, y)

=∑

j

V(B j)∑

i

V(Ai) inf(x,y)∈Ai×B j

f (x, y)

≤∑

j

V(B j) infy∈B j

∑i

V(Ai) infx∈Ai

f (x, y)

≤∑

j

V(B j) infy∈B j

H(y) = s(H, PB).

Similarly,S ( f , P) ≥ S (G, PB).

From that we conclude that∫A×B

f (x, y) dxdy − ε ≤ s( f , P) ≤ s(H, PB) ≤ S (H, PB) ≤ S (G, PB)

≤ S ( f , P) ≤∫

A×Bf (x, y) dxdy + ε.

Since this holds for every ε is follows that∫A×B

f (x, y) dxdy =∫

BH(y) dy =

∫B

∫A

f (x, y) dx dy.

In the same way we show that∫A×B

f (x, y) dxdy =∫

BG(y) dy =

∫B

∫A

f (x, y) dx dy.

n

Corollary 4.5 If for every x ∈ A and y ∈ B the integrals

u(x) =∫

Bf (x, y) dy and v(y) =

∫A

f (x, y) dx

exist, then Fubini’s theorem implies that∫A×B

f (x, y) dxdy =∫

Au(x) dx =

∫B

v(y) dy,

i.e., the order of integration can be interchanged.


Example: LetC =

{(x, y) ∈ R2 : x2 + y2 ≤ a2

},

i.e. a disc of radius a. What is the volume (area) of the disc.

We start by enclosing the disc in a box [−a, a] × [−a, a]. Using Fubini’s theorem,

V(A) =∫

AχC dx =

∫ a

−a

(∫ a

−aχC(x, y) dy

)dx,

provided the integrals exist (but they do because the points of discontinuity havemeasure zero). These integrals are easily calculated:

V(A) =∫ a

−a2√

a2 − x2 dx =∫ π/2

−π/22a2 cos2 θ dθ = πa2,

where we have used the change of variables x = a sin θ. N

Example: Consider a body of revolution:

C ={(x, y, z) ∈ R3 : x ∈ [a, b], y2 + z2 ≤ f 2(x)

}.

Using again fubini, we have

V(C) =∫ b

a

(∫χC(x, y, z) dydz

)dx =

∫ b

aπ f 2(x) dx,

since the inner integral is the area of a disc of radius | f (x)|. N

Example: Finally we calculate the volume of the n-sphere

Sn ={(x1, . . . , xn+1 ∈ R

n+1 : x21 + . . . x

2n+1 ≤ 1

}.

For n = 0 it is 2, for n = 1 it is π and for n = 2 it is 4π/3. We want to show thatthis is indeed correct and generalize it. The idea is to use recursion:

V(Sn) =∫ 1

−1

(∫[−1,1]n

χSn dx1 . . . dxn

)dxn+1.

The interior integral is the volume of an (n−1)-sphere of radius√

1 − x2n+1. Thus,

if Vn(a) is the volume of an n-sphere of radius a, then

Vn(1) =∫ 1

−1Vn−1(

√1 − x2) dx.

140 Chapter 4

We claim, without proof, that

Vn(a) = an+1Vn(1),

hence

Vn(1) = Vn−1(1)∫ 1

−1(1 − x2)n/2 dx def

= pnVn−1(1).

Thus,Vn(1) = 2p1 p2 . . . pn−1.

It remains to calculate those pn. The first is given by

p1 =

∫ 1

−1(1 − x2)1)/2 dx = π.

The next is

p2 =

∫ 1

−1(1 − x2) dx = 2 −

13=

23.

For n ≥ 2, substituting x = cos θ,

pn =

∫ 1

−1(1 − x2)n/2 dx =

∫ π

0sinn+1 θ dθ.

Integrating by parts:

pn = pn−2 −

∫ π

0(sinn−1 θ cos θ) cos θ dθ

= pn−2 −1n

sinn θ cos θ|π0 −1n

∫ π

0sinn+1 θ dθ

= pn−2 −pn

n.

We thus have the recursive formula,

pn =n − 1

npn−2,

from which we can reproduce all the volumes:

p3 =23π p4 =

34

23

p5 =45

23π p6 =

56

34

23.

N


Theorem 4.3 (Change of variables formula) Let B ⊂ Rn be an open set and g ∈C1(B;Rn) one-to-one such that Jg(x) , 0 in B. Let A ⊂ g(B) be a compact set andf : A→ R continuous (i.e., integrable). Then,∫

Af dx =

∫g−1(A)

f ◦ g |Jg| dx.

That is, the integration can be pulled back into the pre-image of f .

Example: Let f : R2 → R. We denote the coordinates on which f is defined by(x, y). Let g : R2 → R2 be the mapping

g :(rθ

)7→

(r cos θr sin θ

).

The Jacobian of this transformation is

Jg(r, θ) =(cos θ −r sin θsin θ r cos θ

)= r,

which is non-zero except along the θ-axis. For every set (x, y) that does not includethe origin, Jg(g−1) is non-zero. By the change of variables formula:∫

Af (x, y) dxdy =

∫g−1(A)

f (r cos θ, r sin θ) rdrdθ.

Take for example the integral of the function f (x, y) = x2 + y2 over the set

A ={(x, y) : 1 ≤ x2 + y2 ≤ 4

}.

By this formula, ∫D

(x2 + y2) dxdy =∫ 2

1

∫ 2π

0r2 rdrdθ =

15π2.

Note that without this change of variables it wouldn’t be easy to perform such anintegral. N

142 Chapter 4

Documents

Einstein Institute of Mathematics @ The Hebrew University ...razk/Teaching/LectureNotes/AdvancedCalculus… · Contents 1 Metric Spaces 1 1.1 Basic deﬁnitions . . . . . . .