
Introduction to Real Analysis

Fall 2011 Lecture Notes

Vern I. Paulsen

January 17, 2012


Contents

1 Metric Spaces
    1.1 Definition and Examples
    1.2 Open Sets
        1.2.1 Uniformly Equivalent Metrics
    1.3 Closed Sets
    1.4 Convergent Sequences
        1.4.1 Further results on convergence in R^k
        1.4.2 Subsequences
    1.5 Interiors, Closures, Boundaries of Sets
    1.6 Completeness
    1.7 Compact Sets

2 Finite and Infinite Sets

3 Continuous Functions
    3.1 Functions into Euclidean space
    3.2 Continuity of Some Basic Functions
    3.3 Continuity and Limits
    3.4 Continuous Functions and Compact Sets
    3.5 Connected Sets and the Intermediate Value Theorem

4 The Contraction Mapping Principle
    4.1 Application: Newton’s Method
    4.2 Application: Solution of ODE’s

5 Riemann and Riemann-Stieltjes Integration
    5.1 The Riemann-Stieltjes Integral
    5.2 Properties of the Riemann-Stieltjes Integral
    5.3 The Fundamental Theorem of Calculus

6 Solutions to Problems

Chapter 1

Metric Spaces

These notes accompany the Fall 2011 Introduction to Real Analysis course.

1.1 Definition and Examples

Definition 1.1. Given a set X, a metric on X is a function d : X × X → R satisfying:

1. for every x, y ∈ X, d(x, y) ≥ 0,

2. d(x, y) = 0 if and only if x = y,

3. d(x, y) = d(y, x),

4. (triangle inequality) for every x, y, z ∈ X, d(x, y) ≤ d(x, z) + d(z, y).

The pair (X, d) is called a metric space.

Example 1.2. Let X = R and set d(x, y) = |x − y|. Then d is a metric on R. We call this the usual metric on R.

To prove that it is a metric we verify (1)–(4). For (1): d(x, y) = |x − y| ≥ 0 by the definition of the absolute value function, so (1) holds. Since d(x, y) = 0 if and only if |x − y| = 0 if and only if x = y, (2) follows. (3) follows since d(x, y) = |x − y| = |y − x| = d(y, x). Finally, for (4), d(x, y) = |x − y| = |x − z + z − y| ≤ |x − z| + |z − y| = d(x, z) + d(z, y).

Example 1.3 (The taxi cab metric). Let X = R². Given x = (x1, x2), y = (y1, y2), set d(x, y) = |x1 − y1| + |x2 − y2|. Then d is a metric on R².

We verify (1)–(4). (1) and (3) are obvious. For (2): d(x, y) = 0 iff |x1 − y1| + |x2 − y2| = 0. Since both terms in the sum are non-negative, the sum is 0 only when each term is 0. So d(x, y) = 0 iff |x1 − y1| = 0 and |x2 − y2| = 0 iff x1 = y1 and x2 = y2 iff x = (x1, x2) = (y1, y2) = y. Finally, to see (4), let z = (z1, z2). Then

d(x, y) = |x1 − y1| + |x2 − y2| = |x1 − z1 + z1 − y1| + |x2 − z2 + z2 − y2|
≤ |x1 − z1| + |z1 − y1| + |x2 − z2| + |z2 − y2| = d(x, z) + d(z, y).

We often denote the taxi cab metric by d1(x, y).

Example 1.4. A different metric on R². For x = (x1, x2), y = (y1, y2) set d∞(x, y) = max{|x1 − y1|, |x2 − y2|}. So the distance between two points is the larger of these two numbers.

We only check the triangle inequality, writing d for d∞. Let z = (z1, z2) be another point. We have two cases to check: either d(x, y) = |x1 − y1| or d(x, y) = |x2 − y2|.

Case 1: d(x, y) = |x1 − y1|. Notice that |x1 − z1| ≤ max{|x1 − z1|, |x2 − z2|} = d(x, z). Similarly, |z1 − y1| ≤ max{|z1 − y1|, |z2 − y2|} = d(z, y). Hence,

d(x, y) = |x1 − y1| = |x1 − z1 + z1 − y1| ≤ |x1 − z1| + |z1 − y1| ≤ d(x, z) + d(z, y).

Case 2: d(x, y) = |x2 − y2|. Now use |x2 − z2| ≤ max{|x1 − z1|, |x2 − z2|} = d(x, z). Similarly, |z2 − y2| ≤ max{|z1 − y1|, |z2 − y2|} = d(z, y). Hence,

d(x, y) = |x2 − y2| = |x2 − z2 + z2 − y2| ≤ |x2 − z2| + |z2 − y2| ≤ d(x, z) + d(z, y).

The triangle inequality holds in each case, so it holds for all x, y, z.
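The four axioms of Definition 1.1 can also be spot-checked numerically. The following is a minimal Python sketch, not a proof: it tests the axioms on a small grid of integer points only. The helper names (`first_violation`, `d_squared`, and so on) are made up for this illustration, and `d_squared` is a hypothetical non-example that does not appear in the notes.

```python
import itertools

def d_usual(x, y):
    """Usual metric on R (Example 1.2)."""
    return abs(x - y)

def d_taxi(x, y):
    """Taxi cab metric d1 on R^2 (Example 1.3)."""
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d_max(x, y):
    """Metric d_infinity on R^2 (Example 1.4)."""
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

def d_squared(x, y):
    """Hypothetical non-example (not from the notes): squared difference on R."""
    return (x - y) ** 2

def first_violation(d, points):
    """Name the first axiom of Definition 1.1 that d violates on `points`, or return None."""
    for x, y in itertools.product(points, repeat=2):
        if d(x, y) < 0:
            return "non-negativity"
        if (d(x, y) == 0) != (x == y):
            return "d(x, y) = 0 iff x = y"
        if d(x, y) != d(y, x):
            return "symmetry"
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) > d(x, z) + d(z, y):
            return "triangle inequality"
    return None

# Integer sample points keep all arithmetic exact.
line_points = list(range(-3, 4))
plane_points = [(i, j) for i in range(-2, 3) for j in range(-2, 3)]

for name, d, pts in [("usual", d_usual, line_points),
                     ("taxi cab", d_taxi, plane_points),
                     ("d_infinity", d_max, plane_points),
                     ("squared difference", d_squared, line_points)]:
    print(name, "->", first_violation(d, pts))
```

A result of `None` only says that no violation was found on the sample; the proofs above are what establish the axioms for all points. For the squared difference the check reports a failure of the triangle inequality, for instance d(0, 2) = 4 while d(0, 1) + d(1, 2) = 2.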

Example 1.5. In this case we let X be the set of all continuous real-valued functions on [0, 1]. We use three facts from Math 3333:

1. if f and g are continuous on [0,1], then f − g is continuous on [0,1],

2. if f is continuous on [0,1], then |f | is continuous on [0,1],

3. if h is continuous on [0,1], then there is a point 0 ≤ t0 ≤ 1 so that h(t) ≤ h(t0) for every 0 ≤ t ≤ 1. That is, h(t0) = max{h(t) : 0 ≤ t ≤ 1}.

Now given f, g ∈ X, we set d(f, g) = max{|f(t) − g(t)| : 0 ≤ t ≤ 1}. Note that by (1) and (2) |f − g| is continuous, and so by (3) there is a point where it achieves its maximum.

We now show that d is a metric on X. Clearly, (1) holds. Next, if d(f, g) = 0, then the maximum of |f(t) − g(t)| is 0, so we must have |f(t) − g(t)| = 0 for every t. This means that f(t) = g(t) for every t, and so f = g. So d(f, g) = 0 implies f = g. Also, f = g implies d(f, g) = 0, so (2) holds. Clearly (3) holds. Finally, to see the triangle inequality, let f, g, h be three continuous functions on [0,1], that is, f, g, h ∈ X. We must show that d(f, g) ≤ d(f, h) + d(h, g).

We know that there is a point t0, 0 ≤ t0 ≤ 1, so that

d(f, g) = max{|f(t)− g(t)| : 0 ≤ t ≤ 1} = |f(t0)− g(t0)|.

Hence,

d(f, g) = |f(t0) − g(t0)| = |f(t0) − h(t0) + h(t0) − g(t0)|
≤ |f(t0) − h(t0)| + |h(t0) − g(t0)|
≤ max{|f(t) − h(t)| : 0 ≤ t ≤ 1} + max{|h(t) − g(t)| : 0 ≤ t ≤ 1}
= d(f, h) + d(h, g).

Example 1.6 (Euclidean space, Euclidean metric). Let X = R^n, the set of real n-tuples. For x = (a1, . . . , an) and y = (b1, . . . , bn) we set

$$d(x, y) = \sqrt{(a_1 - b_1)^2 + \cdots + (a_n - b_n)^2}.$$

This defines a metric on R^n, which we will prove shortly. This metric is called the Euclidean metric and (R^n, d) is called Euclidean space.

It is easy to see that the Euclidean metric satisfies (1)–(3) of a metric. The triangle inequality is harder to prove for the Euclidean metric than for the other metrics that we have looked at, and it requires some preliminary results.

Lemma 1.7. Let p(t) = at² + bt + c with a ≥ 0. If p(t) ≥ 0 for every t ∈ R, then b² ≤ 4ac.

Proposition 1.8 (Schwarz Inequality). Let a1, ..., an, b1, ..., bn be real numbers. Then

$$|a_1 b_1 + \cdots + a_n b_n| \le \sqrt{a_1^2 + \cdots + a_n^2}\,\sqrt{b_1^2 + \cdots + b_n^2}.$$


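Proof. Look at p(t) = (ta1 + b1)² + · · · + (tan + bn)² ≥ 0 for all t. Expanding, p(t) is a quadratic in t with leading coefficient a1² + · · · + an², linear coefficient 2(a1b1 + · · · + anbn) and constant term b1² + · · · + bn², so Lemma 1.7 gives 4(a1b1 + · · · + anbn)² ≤ 4(a1² + · · · + an²)(b1² + · · · + bn²). Dividing by 4 and taking square roots yields the inequality.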

Corollary 1.9.

$$\sqrt{(a_1 + b_1)^2 + \cdots + (a_n + b_n)^2} \le \sqrt{a_1^2 + \cdots + a_n^2} + \sqrt{b_1^2 + \cdots + b_n^2}.$$

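Proof. Let LHS denote the left hand side and RHS the right hand side of the inequality. Then, using the Schwarz inequality in the third line,

$$\begin{aligned}
(\mathrm{LHS})^2 &= (a_1 + b_1)^2 + \cdots + (a_n + b_n)^2 \\
&= a_1^2 + \cdots + a_n^2 + 2(a_1 b_1 + \cdots + a_n b_n) + b_1^2 + \cdots + b_n^2 \\
&\le a_1^2 + \cdots + a_n^2 + 2\sqrt{a_1^2 + \cdots + a_n^2}\,\sqrt{b_1^2 + \cdots + b_n^2} + b_1^2 + \cdots + b_n^2 \\
&= \left(\sqrt{a_1^2 + \cdots + a_n^2} + \sqrt{b_1^2 + \cdots + b_n^2}\right)^2 = (\mathrm{RHS})^2.
\end{aligned}$$

Since both sides are non-negative, taking square roots gives LHS ≤ RHS.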

Now prove the triangle inequality.
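Before writing the proof, it can be reassuring to test the Schwarz inequality and the Euclidean triangle inequality on random data. The Python sketch below does this in R^5 with floating point numbers; it is only a sanity check under these assumptions, and the function names (`schwarz_gap`, `triangle_gap`) are invented for the illustration.

```python
import math
import random

def euclidean(x, y):
    """Euclidean metric on R^n (Example 1.6)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def schwarz_gap(a, b):
    """Right side minus left side of the Schwarz inequality; expected to be >= 0."""
    lhs = abs(sum(ai * bi for ai, bi in zip(a, b)))
    rhs = math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b))
    return rhs - lhs

def triangle_gap(x, y, z):
    """d(x, z) + d(z, y) - d(x, y); expected to be >= 0."""
    return euclidean(x, z) + euclidean(z, y) - euclidean(x, y)

rng = random.Random(0)  # fixed seed so the run is reproducible

def random_vector(n):
    return [rng.uniform(-10, 10) for _ in range(n)]

samples = [(random_vector(5), random_vector(5), random_vector(5)) for _ in range(1000)]
min_schwarz_gap = min(schwarz_gap(a, b) for a, b, _ in samples)
min_triangle_gap = min(triangle_gap(x, y, z) for x, y, z in samples)

print("smallest Schwarz gap:", min_schwarz_gap)
print("smallest triangle gap:", min_triangle_gap)
# Proportional vectors give equality in the Schwarz inequality (up to rounding).
print("gap for proportional vectors:", schwarz_gap([1, 2, 3], [2, 4, 6]))
```

Both smallest gaps should come out non-negative up to rounding error, and the gap for the proportional vectors (1, 2, 3) and (2, 4, 6) should be essentially zero.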

Example 1.10 (The discrete metric). Let X be any non-empty set and define

$$d(x, y) = \begin{cases} 1, & x \ne y, \\ 0, & x = y. \end{cases}$$

Then this is a metric on X called the discrete metric, and we call (X, d) a discrete metric space.

Example 1.11. When (X, d) is a metric space and Y ⊆ X is a subset, restricting the metric on X to Y gives a metric on Y. We call (Y, d) a subspace of (X, d).

Homework Assignment, Due 9/7.

Problem 1.12. Let X = R, define d(x, y) = |x − y| + 1. Show that this is NOT a metric.

Problem 1.13. Let X = R, define d(x, y) = |x² − y²|. Show that this is NOT a metric.

Problem 1.14. Let X = R, define d(x, y) = |x − y| + |x² − y²|. Prove that this is a metric on R.

Problem 1.15. Let X = R^n. For x = (a1, ..., an), y = (b1, ..., bn) define

d1(x, y) = |a1 − b1|+ · · ·+ |an − bn|.

Prove that this is a metric.

Problem 1.16. Let X = R^n. For x and y as before, define

d∞(x, y) = max{|a1 − b1|, ..., |an − bn|}.

Prove that this is a metric.


1.2 Open Sets

Definition 1.17. Let (X, d) be a metric space, fix x ∈ X and r > 0. The open ball of radius r centered at x is the set

B(x; r) = {y ∈ X : d(x, y) < r}.

Example 1.18. In R with the usual metric, B(x; r) = {y : |x − y| < r} = {y : x − r < y < x + r} = (x − r, x + r).

Example 1.19. In R² with the Euclidean metric and x = (x1, x2), B(x; r) = {(y1, y2) : (x1 − y1)² + (x2 − y2)² < r²}, which is a disk of radius r centered at x.

Example 1.20. In R³ with the Euclidean metric, B(x; r) really is an open ball of radius r. This example is where the name comes from.

Example 1.21. In (R², d∞) we have

B(x; r) = {(y1, y2) : |x1 − y1| < r and |x2 − y2| < r}
= {(y1, y2) : x1 − r < y1 < x1 + r and x2 − r < y2 < x2 + r}.

So now an open “ball” is actually an open square, centered at x with sides of length 2r.

Example 1.22. In (R², d1) we have B((0, 0); 1) = {(x, y) : |x − 0| + |y − 0| < 1}, which can be seen to be the “diamond” with corners at (1,0), (0,1), (−1,0), (0,−1).

Example 1.23. When X = {f : [0, 1] → R | f is continuous} and d(f, g) = max{|f(t) − g(t)| : 0 ≤ t ≤ 1}, then B(f; r) = {g : g is continuous and f(t) − r < g(t) < f(t) + r, ∀t}. This can be pictured as all continuous functions g whose graphs lie in the band extending a distance r above and below the graph of f.

Example 1.24. If we let R have the usual metric and let Y = [0, 1] ⊆ R be the subspace, then in the metric space Y we have B(0; 1/2) = [0, 1/2) = (−1/2, 1/2) ∩ Y.

Example 1.25. When we let X be a set with the discrete metric and x ∈ X, then B(x; r) = {x} when r ≤ 1, since d(x, y) < 1 forces d(x, y) = 0. When r > 1, B(x; r) = X.
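To compare the shapes of the open balls in the examples above, one can test a few points for membership directly from Definition 1.17. The Python sketch below does this for the unit ball about the origin of R² in the Euclidean, taxi cab and d∞ metrics, and for the discrete metric on a small set; the function names are assumptions made for this illustration.

```python
import math

def d_euclid(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

def d_taxi(x, y):
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d_max(x, y):
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

def d_discrete(x, y):
    return 0 if x == y else 1

def in_open_ball(d, center, r, y):
    """True when y lies in B(center; r) = {y : d(center, y) < r}."""
    return d(center, y) < r

origin = (0.0, 0.0)
for point in [(0.6, 0.6), (0.9, 0.9)]:
    print(point,
          "disk:", in_open_ball(d_euclid, origin, 1, point),
          "diamond:", in_open_ball(d_taxi, origin, 1, point),
          "square:", in_open_ball(d_max, origin, 1, point))

# Discrete metric on the set {"a", "b", "c"}: the radius-1 ball is a single point,
# while any radius larger than 1 gives the whole set.
X = ["a", "b", "c"]
print([y for y in X if in_open_ball(d_discrete, "a", 1, y)])
print([y for y in X if in_open_ball(d_discrete, "a", 1.5, y)])
```

The point (0.6, 0.6) lies in the disk and the square but not in the diamond, and (0.9, 0.9) lies only in the square, which matches the picture diamond ⊆ disk ⊆ square for balls of the same radius.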

Definition 1.26. Given a metric space (X, d), a subset O ⊆ X is called open provided that whenever x ∈ O, there is an r > 0 such that B(x; r) ⊆ O.


Showing that sets are open really requires proof, so we do a few examples.

Example 1.27. In R with the usual metric, an interval of the form (a, b) = {x : a < x < b} is an open set.

Proof. Given x ∈ (a, b), let r = min{x − a, b − x}. If y ∈ B(x; r), then x − r < y < x + r. Since y < x + r ≤ x + (b − x) = b and y > x − r ≥ x − (x − a) = a, we have a < y < b. So B(x; r) ⊆ (a, b). Thus, (a, b) is open.

So “open intervals” really are “open sets”.

Example 1.28. In R² with the Euclidean metric, a rectangle of the form R = {(y1, y2) : a < y1 < b, c < y2 < d} is an open set.

Proof. Given x = (x1, x2) ∈ R, let r = min{x1 − a, b − x1, x2 − c, d − x2}. We claim that B(x; r) ⊆ R. To see this, let y = (y1, y2) ∈ B(x; r). Then (y1 − x1)² + (y2 − x2)² < r².

Looking at just one term in the sum, we see that (y1 − x1)² < r² and so |y1 − x1| < r. This implies that x1 − r < y1 < x1 + r. As before, we see that x1 + r ≤ b and x1 − r ≥ a, so a < y1 < b.

Similarly, we get that c < y2 < d, so y ∈ R.

Example 1.29. In R with the usual metric, an interval of the form [a, b) is not open.

Proof. Consider the point a ∈ [a, b). Any ball about this point is of the form B(a; r) = (a − r, a + r), and this contains points that are less than a and so not in the set [a, b).

CAREFUL! If we let Y = [0, 1) ⊆ R be the subspace, then in Y the set O = [0, 1/2) is open! (Explain why.)

The next result justifies us calling B(x; r) an open ball.

Proposition 1.30. Let (X, d) be a metric space, fix x ∈ X and r > 0. Then B(x; r) is an open set.

Theorem 1.31. Let (X,d) be a metric space. Then

1. the empty set is open,

2. X is open,

3. the union of any collection of open sets is open,

4. the intersection of finitely many open sets is open.


Proposition 1.32. In a discrete metric space, every set is open.

Homework, due 9/7.

Problem 1.33. Prove that the set O = {(y1, y2) : y1 + y2 > 0} is an open subset of R² in the Euclidean metric.

Problem 1.34. Prove that the set O = {(y1, y2) : y1 > 0} is an open subset of R² in the Euclidean metric.

Problem 1.35. Prove that the disk D = {(y1, y2) : y1² + y2² < 1} is an open subset of (R², d∞).

Problem 1.36. Let X be the set of continuous real-valued functions on [0,1] with the metric that we introduced. Prove that O = {g ∈ X : g(t) > 0 ∀t} is an open subset.

1.2.1 Uniformly Equivalent Metrics

The definition of open set really depends on the metric. For example, on R if instead of the usual metric we used the discrete metric, then every set would be open. But we have seen that when R has the usual metric, not every set is open. For example, [a, b) is not an open subset of R in the usual metric. Thus, whether a set is open or not really can depend on the metric that we are using.

For this reason, if a given set X has two metrics, d and ρ, and we say that a set is open, we generally need to specify which metric we mean. Consequently, we will say that a set is open with respect to d or open in (X, d) when we want to specify that it is open when we use the metric d. In this case it may or may not be open with respect to ρ.

In the case of R², we already have three metrics: the Euclidean metric d, the taxi cab metric d1 and the metric d∞. So when we say that a set is open in R², we could potentially mean three different things. On the other hand, it could be the case that all three of these metrics give rise to the same collection of open sets.

In fact, these three metrics do give rise to the same collection of open sets, and the following definition and result explain why.

Definition 1.37. Let X be a set and let d and ρ be two metrics on X. We say that these metrics are uniformly equivalent provided that there are constants A and B such that for every x, y ∈ X,

ρ(x, y) ≤ Ad(x, y) and d(x, y) ≤ Bρ(x, y).


Example 1.38. On R² the Euclidean metric d and the metric d∞ are uniformly equivalent. In fact,

$$d_\infty(x, y) \le d(x, y) \quad\text{and}\quad d(x, y) \le \sqrt{2}\, d_\infty(x, y).$$

To see this, let x = (a1, a2), y = (b1, b2). Since

$$|a_1 - b_1| \le \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2} \quad\text{and}\quad |a_2 - b_2| \le \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2},$$

we have that

$$d_\infty(x, y) = \max\{|a_1 - b_1|, |a_2 - b_2|\} \le \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2} = d(x, y).$$

On the other hand, since |a1 − b1| ≤ d∞(x, y) and |a2 − b2| ≤ d∞(x, y), we have that (a1 − b1)² + (a2 − b2)² ≤ 2(d∞(x, y))². Taking square roots of both sides yields d(x, y) ≤ √2 d∞(x, y).

Example 1.39. On R² the Euclidean metric d and d1 are uniformly equivalent. In fact,

$$d(x, y) \le d_1(x, y) \quad\text{and}\quad d_1(x, y) \le \sqrt{2}\, d(x, y).$$

We have that

$$\begin{aligned}
d_1(x, y)^2 &= (|a_1 - b_1| + |a_2 - b_2|)^2 \\
&= |a_1 - b_1|^2 + 2|a_1 - b_1||a_2 - b_2| + |a_2 - b_2|^2 \ge (a_1 - b_1)^2 + (a_2 - b_2)^2 \\
&= (d(x, y))^2.
\end{aligned}$$

Hence, d(x, y) ≤ d1(x, y). To see the other inequality, we use the Schwarz inequality:

$$d_1(x, y) = \big|\, |a_1 - b_1| \cdot 1 + |a_2 - b_2| \cdot 1 \,\big| \le \sqrt{|a_1 - b_1|^2 + |a_2 - b_2|^2}\,\sqrt{1^2 + 1^2} = \sqrt{2}\, d(x, y).$$
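The two chains of inequalities, d∞ ≤ d ≤ √2 d∞ and d ≤ d1 ≤ √2 d, can be checked on random pairs of points in R². The following Python sketch is one way to do this; it is a numerical illustration with a small floating point tolerance rather than a proof, and the names `check_equivalence` and `TOL` are assumptions of the sketch.

```python
import math
import random

def d_euclid(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

def d_taxi(x, y):
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d_max(x, y):
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

TOL = 1e-9  # allowance for floating point rounding

def check_equivalence(pairs):
    """Check d_inf <= d <= sqrt(2) d_inf and d <= d1 <= sqrt(2) d on every pair."""
    root2 = math.sqrt(2)
    for x, y in pairs:
        de, d1, dm = d_euclid(x, y), d_taxi(x, y), d_max(x, y)
        if not (dm <= de + TOL and de <= root2 * dm + TOL):
            return False
        if not (de <= d1 + TOL and d1 <= root2 * de + TOL):
            return False
    return True

rng = random.Random(1)  # fixed seed so the run is reproducible

def random_point():
    return (rng.uniform(-5, 5), rng.uniform(-5, 5))

pairs = [(random_point(), random_point()) for _ in range(1000)]
print("inequalities hold on the sample:", check_equivalence(pairs))

# The constant sqrt(2) cannot be improved: for x = (0, 0) and y = (1, 1)
# we get d = sqrt(2) * d_inf and d1 = sqrt(2) * d.
x, y = (0.0, 0.0), (1.0, 1.0)
print(d_euclid(x, y), d_max(x, y), d_taxi(x, y))
```

The pair (0, 0), (1, 1) shows that both occurrences of √2 are attained, so no smaller constant would work in either inequality.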

Given a set X with metrics d and ρ, a point x ∈ X and r > 0, we shall write B_d(x; r) = {y ∈ X : d(x, y) < r} and B_ρ(x; r) = {y ∈ X : ρ(x, y) < r}.

Lemma 1.40. Let X be a set, let d and ρ be two metrics on X that are uniformly equivalent, and let A and B denote the constants that appear in Definition 1.37. Then B_ρ(x; r) ⊆ B_d(x; Br) and B_d(x; r) ⊆ B_ρ(x; Ar).

Proof. If y ∈ B_ρ(x; r), then ρ(x, y) < r, which implies that d(x, y) ≤ Bρ(x, y) < Br. Hence, y ∈ B_d(x; Br) and so B_ρ(x; r) ⊆ B_d(x; Br). The other containment is proven similarly, using the other inequality.

Theorem 1.41. Let X be a set and let d and ρ be metrics on X that are uniformly equivalent. Then a set is open with respect to d if and only if it is open with respect to ρ.

Proof. Let E ⊆ X be a set that is open with respect to d. We must prove that E is open with respect to ρ.

Since E is open with respect to d, given x ∈ E there is an r > 0 so that B_d(x; r) ⊆ E. By the lemma, B_ρ(x; r/B) ⊆ B_d(x; r). Thus, B_ρ(x; r/B) ⊆ E, and we have shown that given an arbitrary point in E there is an open ball in the ρ metric about that point that is contained in E. Hence, E is open with respect to ρ.

The proof that a set that is open with respect to ρ is open with respect to d is similar and uses the other containment given in the lemma.

HW due 9/12:

Problem 1.42. Let X be a set with three metrics, d, ρ, and γ. Prove that if d and ρ are uniformly equivalent and ρ and γ are uniformly equivalent, then d and γ are uniformly equivalent.

Problem 1.43. Prove that on R^n the Euclidean metric, the metric d1 and the metric d∞ are all uniformly equivalent.

1.3 Closed Sets

Definition 1.44. Given a set X and E ⊆ X, the complement of E, denoted E^c, is the set of all elements of X that are not in E, i.e.,

E^c = {x ∈ X : x ∉ E}.

Other notations that are used for the complement are E^c = CE = X\E. Note that (E^c)^c = E.

Definition 1.45. Let (X, d) be a metric space. Then a set E ⊆ X is closed if and only if E^c is open.

The following gives a useful way to re-state this definition.

Proposition 1.46. Let (X, d) be a metric space. Then a set E is closed if and only if there is an open set O such that E = O^c.

Proof. If E is closed, then E^c is open. So let O = E^c; then E = O^c. Conversely, if E = O^c for some open set O, then E^c = (O^c)^c = O, which is open, so by the definition, E is closed.

Example 1.47. In R with the usual metric, we have that (b, +∞) is open and (−∞, a) is open. So when a < b we have that O = (−∞, a) ∪ (b, +∞) is open. Hence, O^c = [a, b] is closed.

This shows that our old calculus definition of a “closed interval” really is a closed set in this sense.

Definition 1.48. Let (X, d) be a metric space, let x ∈ X and let r > 0. The closed ball with center x and radius r is the set

B−(x; r) = {y ∈ X : d(x, y) ≤ r}.

The following result explains this notation.

Proposition 1.49. Let (X, d) be a metric space, let x ∈ X and let r > 0. Then B−(x; r) is a closed set.

Proof. Let O = (B−(x; r))^c denote the complement. We must prove that O is open. Note that O = {y ∈ X : d(x, y) > r}.

For p ∈ O set r1 = d(x, p) − r > 0. We claim that B(p; r1) ⊆ O. If we can prove this claim, then we will have shown that O is open. So let q ∈ B(p; r1). Then d(x, p) ≤ d(x, q) + d(q, p) < d(x, q) + r1. Subtracting r1 from both sides of this last inequality yields d(x, p) − r1 < d(x, q). But d(x, p) − r1 = r. Hence, r < d(x, q) and so q ∈ O. Since q was an arbitrary element of B(p; r1), we have that B(p; r1) ⊆ O and our proof is complete.
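The choice r1 = d(x, p) − r in the proof can be illustrated numerically in R² with the Euclidean metric. The Python sketch below picks a point p outside a closed ball, samples points q from B(p; r1), and confirms that each one stays outside the closed ball. The specific center, radius and point are arbitrary choices for the illustration, and the sampler shrinks the radius slightly as a guard against rounding.

```python
import math
import random

def d(x, y):
    """Euclidean metric on R^2."""
    return math.hypot(x[0] - y[0], x[1] - y[1])

def safe_radius(x, r, p):
    """Radius r1 = d(x, p) - r from the proof; requires p outside the closed ball."""
    r1 = d(x, p) - r
    if r1 <= 0:
        raise ValueError("p must lie outside the closed ball")
    return r1

def sample_in_open_ball(p, radius, rng):
    """Random q with d(p, q) < radius, chosen in polar coordinates about p."""
    rho = 0.999 * radius * rng.random()  # strictly less than radius
    theta = rng.uniform(0, 2 * math.pi)
    return (p[0] + rho * math.cos(theta), p[1] + rho * math.sin(theta))

center, r = (0.0, 0.0), 1.0   # closed ball of radius 1 about the origin
p = (3.0, 0.0)                # a point of the complement O
r1 = safe_radius(center, r, p)

rng = random.Random(2)
samples = [sample_in_open_ball(p, r1, rng) for _ in range(1000)]
all_outside = all(d(center, q) > r for q in samples)
print("r1 =", r1, "and every sampled q satisfies d(x, q) > r:", all_outside)
```

Here r1 = 2, and every sampled q has d(x, q) > 1, as the triangle inequality argument predicts.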

Because the definition of closed sets involves complements, it is useful to recall DeMorgan’s Laws. Given subsets E_i ⊆ X where i belongs to some set I, we have

$$\bigcup_{i \in I} E_i = \{x \in X : \text{there exists } i \in I \text{ with } x \in E_i\}$$

and

$$\bigcap_{i \in I} E_i = \{x \in X : x \in E_i \text{ for every } i \in I\}.$$

Proposition 1.50 (DeMorgan). Let E_i ⊆ X for i ∈ I. Then

$$\Big(\bigcup_{i \in I} E_i\Big)^c = \bigcap_{i \in I} E_i^c \quad\text{and}\quad \Big(\bigcap_{i \in I} E_i\Big)^c = \bigcup_{i \in I} E_i^c.$$

The following theorem about closed sets follows from DeMorgan’s Laws and Theorem 1.31.

Theorem 1.51. Let (X, d) be a metric space. Then:

1. the empty set is closed,

2. X is closed,

3. the intersection of any collection of closed sets is closed,

4. the union of finitely many closed sets is closed.

Proof. By Theorem 1.31, the set X is open, so X^c is closed, but X^c is the empty set. Similarly, the empty set is open, so its complement, which is X, is closed.

Given a collection of closed sets E_i, i ∈ I, we have that each E_i^c is open. Since

$$\Big(\bigcap_{i \in I} E_i\Big)^c = \bigcup_{i \in I} E_i^c,$$

we see that the complement of their intersection is the union of a collection of open sets. By Theorem 1.31.3, this union of open sets is an open set. Hence, the complement of the intersection is open and so the intersection is closed.

Similarly, if we have only a finite collection of closed sets E1, . . . , En, then (E1 ∪ · · · ∪ En)^c = E1^c ∩ · · · ∩ En^c, which is a finite intersection of open sets. By Theorem 1.31.4, this finite intersection of open sets is open. Hence, the complement of the union is open and so the union is closed.

Proposition 1.52. In a discrete metric space, every set is closed.

As with open sets, when there is more than one metric on the set X, we need to specify which metric we are referring to when saying that a set is closed. The following is the analogue of Theorem 1.41 for closed sets.

Proposition 1.53. Let X be a set and let d and ρ be two metrics on X that are uniformly equivalent. Then a set is closed with respect to d if and only if it is closed with respect to ρ.

HW due 9/12:

Problem 1.54. Prove that {(a1, a2) ∈ R² : 0 ≤ a1 ≤ 2, 0 ≤ a2 ≤ 4} is a closed set in the Euclidean metric.


1.4 Convergent Sequences

Our general idea from calculus of a sequence {pn} converging to a point p is that as n grows larger, the points pn grow closer and closer to p. When we say “grow closer” we really have in our minds that some distance is growing smaller. This leads naturally to the following definition.

Definition 1.55. Let (X, d) be a metric space, {pn} ⊆ X a sequence in X and p ∈ X. We say that the sequence {pn} converges to p and write

$$\lim_{n \to +\infty} p_n = p,$$

provided that for every ε > 0, there is a real number N so that when n > N, then d(p, pn) < ε.

Often out of sheer laziness, I will write limn pn = p for limn→+∞ pn = p.

Example 1.56. When X = R and d is the usual metric, then d(p, pn) = |p − pn| and this definition is identical to the definition used when we studied convergent sequences of real numbers in Math 3333.

For a quick review, we will look at a couple of examples of sequences and recall how we would prove that they converge.

For a first example, consider the sequence given by $p_n = \frac{3n+1}{5n-2}$. For any natural number n, the denominator of this fraction is non-zero, so this formula defines a sequence of points. For large n, the +1 in the numerator and the −2 in the denominator are small in relation to the 3n and 5n, so we expect that this sequence has limit $p = \frac{3}{5}$.

Now for some scrap work. To prove this we would need

$$d(p, p_n) = |p - p_n| = \left| \frac{3(5n-2) - (3n+1)5}{5(5n-2)} \right| < \varepsilon.$$

Simplifying this fraction (the numerator equals −11) leads to the condition $\frac{11}{25n-10} < \varepsilon$. Solving for n, we see that this is true provided $\frac{11}{\varepsilon} < 25n - 10$, i.e., $\frac{11}{25\varepsilon} + \frac{2}{5} < n$. So it looks like we should choose $N = \frac{11}{25\varepsilon} + \frac{2}{5}$. This is not a proof, just our scrap work.

Now for the proof: Given ε > 0, define $N = \frac{11}{25\varepsilon} + \frac{2}{5}$. For any n > N, we have that $25n - 10 > 25N - 10 = \frac{11}{\varepsilon}$, and so

$$d(p, p_n) = \left| \frac{3}{5} - \frac{3n+1}{5n-2} \right| = \frac{11}{25n-10} < 11 \left( \frac{11}{\varepsilon} \right)^{-1} = \varepsilon.$$

Hence, $\lim_{n \to +\infty} p_n = p$.

For the next example, we look at $p_n = \sqrt{n^2 + 8n} - n$ and prove that this sequence has limit p = 4.
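The scrap work for the first sequence can be checked with exact rational arithmetic. Direct computation gives |3/5 − (3n+1)/(5n−2)| = 11/(25n − 10), so the threshold N = 11/(25ε) + 2/5 is used in the Python sketch below. The choice ε = 1/100 and the helper names are assumptions made for the illustration; the second sequence is only evaluated numerically, using the algebraically equivalent form 8n/(√(n² + 8n) + n) to avoid cancellation.

```python
from fractions import Fraction
import math

P = Fraction(3, 5)  # the expected limit of the first sequence

def p(n):
    """First example: p_n = (3n + 1) / (5n - 2), as an exact fraction."""
    return Fraction(3 * n + 1, 5 * n - 2)

def distance_to_limit(n):
    """|p - p_n|, which equals 11 / (25n - 10) for n >= 1."""
    return abs(P - p(n))

def threshold(eps):
    """N = 11 / (25 eps) + 2/5, obtained by solving 11 / (25n - 10) < eps for n."""
    return Fraction(11) / (25 * eps) + Fraction(2, 5)

eps = Fraction(1, 100)
N = threshold(eps)
first_n = math.floor(N) + 1  # smallest integer n with n > N
print("N =", N, "so the estimate applies from n =", first_n)
print("distance at n =", first_n, "is", float(distance_to_limit(first_n)))

def q(n):
    """Second example: sqrt(n^2 + 8n) - n, written as 8n / (sqrt(n^2 + 8n) + n)."""
    return 8 * n / (math.sqrt(n * n + 8 * n) + n)

for n in (10, 1000, 10**6):
    print(n, q(n))
```

For ε = 1/100 this gives N = 222/5, so the estimate holds from n = 45 on, while n = 44 is still too small. The printed values of the second sequence approach 4, consistent with the claimed limit.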

Now that we have recalled what this definition means in R and have seen how to prove a few examples, we want to look at what this concept means in some of our other favorite metric spaces.

Theorem 1.57. Let R^k be endowed with the Euclidean metric d, let {pn} ⊆ R^k be a sequence with pn = (a_{1,n}, a_{2,n}, . . . , a_{k,n}) and let p = (a1, a2, . . . , ak) ∈ R^k. Then lim_{n→+∞} pn = p in (R^k, d) if and only if for each j, 1 ≤ j ≤ k, we have lim_{n→+∞} a_{j,n} = a_j in R with the usual metric.

Proof. First we assume that lim_{n→+∞} pn = p. Given ε > 0, there is an N so that for n > N we have that

$$d(p, p_n) = \sqrt{(a_1 - a_{1,n})^2 + \cdots + (a_k - a_{k,n})^2} < \varepsilon.$$

Now fix j. We have that $|a_j - a_{j,n}| = \sqrt{(a_j - a_{j,n})^2}$ is no larger than the above square root, since it is just one of the terms. Thus, for the same N, n > N implies that |a_j − a_{j,n}| ≤ d(p, pn) < ε. Hence, for each ε we have found an N, and so lim_n a_{j,n} = a_j. This proves the first implication.

Now conversely, suppose that for each j we have lim_n a_{j,n} = a_j. Given an ε > 0, we must produce an N so that when n > N, we have d(p, pn) < ε.

For each j, taking $\varepsilon_1 = \varepsilon/\sqrt{k}$, we can find an N_j so that for n > N_j we have |a_j − a_{j,n}| < ε1. (Why did I call it N_j?) Now let N = max{N_1, . . . , N_k}. Then for n > N, we have (a_j − a_{j,n})² < ε1² for every j, 1 ≤ j ≤ k.

Hence,

$$d(p, p_n) = \sqrt{(a_1 - a_{1,n})^2 + \cdots + (a_k - a_{k,n})^2} < \sqrt{\varepsilon_1^2 + \cdots + \varepsilon_1^2} = \sqrt{k \varepsilon_1^2} = \varepsilon.$$

This proves that lim_n pn = p in the Euclidean metric.

So the crux of the above theorem is that a sequence of points in R^k converges if and only if each of its component sequences converges.

If we combine our first two examples with the above theorem, we see that if we define a sequence of points in R² by setting

$$p_n = \left( \frac{3n+1}{5n-2},\ \sqrt{n^2 + 8n} - n \right),$$

then in the Euclidean metric these points converge to the point $p = \left( \frac{3}{5}, 4 \right)$.
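As a numerical companion to this example, the Python sketch below evaluates the distance from p_n to (3/5, 4) in the Euclidean, taxi cab and d∞ metrics for increasing n. It is an illustration in floating point only; the second coordinate is computed in the equivalent form 8n/(√(n² + 8n) + n) to limit rounding error, and the function names are assumptions of the sketch.

```python
import math

LIMIT = (3 / 5, 4.0)

def point(n):
    """p_n = ((3n + 1)/(5n - 2), sqrt(n^2 + 8n) - n), second entry in a stable form."""
    return ((3 * n + 1) / (5 * n - 2), 8 * n / (math.sqrt(n * n + 8 * n) + n))

def d_euclid(x, y):
    return math.hypot(x[0] - y[0], x[1] - y[1])

def d_taxi(x, y):
    return abs(x[0] - y[0]) + abs(x[1] - y[1])

def d_max(x, y):
    return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

def distances(n):
    """Distances from p_n to the limit in the Euclidean, taxi cab and d_infinity metrics."""
    x = point(n)
    return d_euclid(x, LIMIT), d_taxi(x, LIMIT), d_max(x, LIMIT)

for n in (10, 100, 1000, 10**4, 10**6):
    de, d1, dm = distances(n)
    print(f"n={n:>8}  Euclidean={de:.3e}  taxi cab={d1:.3e}  d_inf={dm:.3e}")
```

All three columns shrink toward 0 together, and at each n they are ordered d∞ ≤ d ≤ d1, in line with the inequalities of Examples 1.38 and 1.39.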

What if we had used the taxi cab metric or the d∞ metric on R² instead of the Euclidean metric? Would these points still converge to the same point? The answer is yes, and the following result explains why and saves us having to prove separate theorems for each of these metrics.

Proposition 1.58. Let X be a set with metrics d and ρ that are uniformly equivalent, let {pn} ⊆ X be a sequence and let p ∈ X be a point. Then {pn} converges to p in the metric d if and only if {pn} converges to p in the metric ρ.


Proof. Let ρ(x, y) ≤ Ad(x, y) and d(x, y) ≤ Bρ(x, y) for every x, y ∈ X.

First assume that {pn} converges to p in the metric d. Given ε > 0, we can pick an N so that when n > N, we have d(p, pn) < ε/A. Then when n > N, we have ρ(p, pn) ≤ Ad(p, pn) < ε. Hence, {pn} converges to p in the metric ρ.

The proof of the converse statement is similar, using the constant B.

We now look at the discrete metric.

Proposition 1.59. Let (X, d) be the discrete metric space, let {pn} ⊆ X be a sequence, and let p ∈ X. Then the sequence {pn} converges to p if and only if there exists N so that for n > N, pn = p.

Proof. First assume that {pn} converges to p. Taking ε = 1, there is N so that n > N implies that d(p, pn) < ε = 1. But since this is the discrete metric, d(x, y) < 1 implies that d(x, y) = 0, i.e., that x = y. Hence, pn = p for every n > N.

Conversely, if there is an N1 so that when n > N1 we have pn = p, then given any ε > 0, pick N = N1. Then n > N implies that d(p, pn) = d(p, p) = 0 < ε.

A sequence {pn} such that pn = p for every n > N is often called eventually constant.

We now look at some general theorems about convergence in metric spaces.

Proposition 1.60. Let (X, d) be a metric space. A sequence {pn} ⊆ X can have at most one limit.

Proof. Suppose that lim_n pn = p and lim_n pn = q. We need to prove that this implies that p = q. We will do a proof by contradiction. Suppose that p ≠ q; then d(p, q) > 0.

Set ε = d(p, q)/2. Since lim_n pn = p, there is N1 so that n > N1 implies that d(p, pn) < ε. Since lim_n pn = q, there is N2 so that n > N2 implies that d(q, pn) < ε. So if we let N = max{N1, N2}, then for n > N both of these inequalities will be true.

Thus, for any n > N, we have that d(p, q) ≤ d(p, pn) + d(pn, q) < ε + ε = d(p, q). Thus, d(p, q) < d(p, q), a contradiction.

The above result is most often used to prove that two points are really the same point, since if lim_n pn = p and lim_n pn = q, then p = q.

Convergence is an important way to characterize closed sets.


Theorem 1.61. Let (X, d) be a metric space and let S ⊆ X be a subset. Then S is a closed subset if and only if whenever {pn} ⊆ S is a convergent sequence, we have lim_n pn ∈ S.

That is, a set is closed if and only if limits of convergent sequences stay in the set.

Proof. First assume that S is closed, let {pn} ⊆ S be a convergent sequence and let p = lim_n pn. We must show that p ∈ S. We do a proof by contradiction.

Suppose that p ∉ S. Then p ∈ S^c, which is open. Hence, there is r > 0 so that B(p; r) ⊆ S^c. Thus, for any q ∈ S, d(p, q) ≥ r. Now set ε = r. Then for every n, d(p, pn) ≥ ε, and hence we can find no N for this ε, contradicting lim_n pn = p.

To prove the converse, we consider the logically equivalent contrapositive statement: if S is not closed, then there exists at least one convergent sequence {pn} ⊆ S with lim_n pn ∉ S.

So we want to prove that this statement is true. To this end, look at S^c. Since S is not closed, S^c is not open. Hence, there exists p ∈ S^c such that for every r > 0, the set B(p; r) is not a subset of S^c. That is, for every r > 0, B(p; r) ∩ S is not empty. Now for each n, if we set r = 1/n, then we can pick a point pn ∈ B(p; 1/n) ∩ S. Thus, the sequence of points that we get this way satisfies {pn} ⊆ S. Also, since d(p, pn) < 1/n, we have that lim_n pn = p, by Problem 1.65.

This completes the proof of the theorem.

One of our other main results from Math 3333 about convergent sequences in R is that every convergent sequence of real numbers is bounded. This plays an important role in metric spaces too. But first we need to say what it means for a set to be bounded in a metric space.

Proposition 1.62. Let (X, d) be a metric space and let E ⊆ X be a subset. Then the following are equivalent:

1. there exists a point p ∈ X and r1 > 0 such that E ⊆ B−(p; r1),

2. there exists a point q ∈ X and r2 > 0 such that E ⊆ B(q; r2),

3. there exists M > 0 so that every x, y ∈ E satisfies d(x, y) ≤M.

Proof. If 1) holds, then let q = p and let r2 be any number satisfying r2 > r1. Then E ⊆ B−(p; r1) ⊆ B(p; r2), and so 2) holds.

If 2) holds, let M = 2r2. Then for any x, y ∈ E, we have that d(x, y) ≤ d(x, q) + d(q, y) < r2 + r2 = M, and so 3) holds.

If 3) holds, fix any point p ∈ E and let r1 = M. Then any y ∈ E satisfies d(p, y) ≤ M, and so E ⊆ B−(p; M), which is 1).

Definition 1.63. Let (X, d) be a metric space and let E ⊆ X. We say that E is a bounded set provided that it satisfies any of the three equivalent conditions of the above proposition.

Proposition 1.64. Let (X, d) be a metric space. If {pn} is a convergent sequence, then it is bounded.

Proof. Let lim_n pn = p. For ε = 1, there is N so that when n > N, then d(p, pn) < 1.

Let r1 = max{1, d(p, pn) : n ≤ N}. Since there are only finitely many numbers in this set, the maximum is well-defined and a finite number.

Now when n ≤ N, d(p, pn) ≤ r1, since it is one of the numbers used in finding the maximum. While if n > N, then d(p, pn) < 1 ≤ r1. Thus, for every n, d(p, pn) ≤ r1 and so {pn} ⊆ B−(p; r1).

HW Due 9/19.

Problem 1.65. Let (X, d) be a metric space, {pn} ⊆ X and p ∈ X. Prove that lim_n pn = p if and only if the sequence of real numbers {d(p, pn)} satisfies lim_n d(p, pn) = 0.

Problem 1.66. Let S ⊆ R² be the set defined by S = {(a1, a2) : 0 ≤ a2 ≤ a1}. Prove that S is a closed subset of R² in the Euclidean metric.

Problem 1.67. Give an example of a bounded subset of R that is not closed, of a closed subset that is not bounded, and of a bounded sequence that is not convergent.

1.4.1 Further results on convergence in Rk

Recall that R^k is a vector space: for p = (a1, . . . , ak), q = (b1, . . . , bk) ∈ R^k and r ∈ R we have

p+ q = (a1 + b1, . . . , ak + bk) and rp = (ra1, . . . , rak).

There is also the “dot product”,

p · q = a1b1 + · · ·+ akbk.


Note that in terms of the dot product and Euclidean metric, we can see that the Schwarz inequality says that

$$|p \cdot q| \le \sqrt{a_1^2 + \cdots + a_k^2}\,\sqrt{b_1^2 + \cdots + b_k^2} = d(0, p)\, d(0, q).$$

Two other useful connections to notice between the Euclidean metric and the vector space operations are that

$$d(p, q) = \sqrt{(a_1 - b_1)^2 + \cdots + (a_k - b_k)^2} = d(0, p - q)$$

and

$$d(rp, rq) = \sqrt{(ra_1 - rb_1)^2 + \cdots + (ra_k - rb_k)^2} = |r|\, d(p, q).$$

Lemma 1.68. Let (R^k, d) be Euclidean space. If E ⊆ R^k is a bounded set, then there is A so that E ⊆ B−(0; A).

Proof. Since E is bounded, there is p ∈ R^k and r1 so that d(p, x) ≤ r1 for every x ∈ E. Let A = r1 + d(0, p). Then for any x ∈ E, d(0, x) ≤ d(0, p) + d(p, x) ≤ A, and so E ⊆ B−(0; A).

The following result can be proved using Theorem 1.57 and results from Math 3333 about convergence of sums and products of real numbers. We give a proof that mimics the proofs for real numbers.

Theorem 1.69. Let (R^k, d) be Euclidean space, let pn = (a_{1,n}, . . . , a_{k,n}) and qn = (b_{1,n}, . . . , b_{k,n}) be sequences in R^k with lim_n pn = p = (a1, . . . , ak) and lim_n qn = q = (b1, . . . , bk), and let {rn} be a sequence in R with lim_n rn = r. Then we have the following:

1. limn pn + qn = p+ q,

2. limn rnpn = rp,

3. limn pn · qn = p · q.

Proof. To see 1), given ε > 0, choose N1, so that for n > N1, d(p, pn) < ε/2and choose N2 so that for n > N2, d(q, qn) < ε/2.

Let N = max{N1, N2}, then for n > N, we have that

d(p+q, pn+qn) =√

(a1 + b1 − a1,n − b1,n)2 + · · ·+ (ak + bk − ak,n − bk,n)2 ≤√(a1 − a1,n)2 + · · ·+ (ak − ak,n)2 +

√(b1 − b1,n)2 + · · ·+ (bk − bk,n)2 ≤

d(p, pn) + d(q, qn) < ε.


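To prove 2), since $\{r_n\}$ is a bounded sequence of real numbers, there is $M > 0$ so that $|r_n| \le M$ for every $n$. Note that also $|r| \le M$. Since $\{p_n\}$ is a bounded set of vectors, by the lemma there is $A > 0$ so that $d(0, p_n) \le A$ for every $n$.

Now choose $N_1$ so that for $n > N_1$, $|r - r_n| < \frac{\varepsilon}{2A}$, and choose $N_2$ so that for $n > N_2$, $d(p, p_n) < \frac{\varepsilon}{2M}$.

Let $N = \max\{N_1, N_2\}$; then for $n > N$ we have

\[
\begin{aligned}
d(rp, r_n p_n) &\le d(rp, rp_n) + d(rp_n, r_n p_n) = |r|\,d(p, p_n) + |r - r_n|\,d(0, p_n) \\
&< |r|\,\frac{\varepsilon}{2M} + \frac{\varepsilon}{2A}\,A \le \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.
\end{aligned}
\]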

Finally, to prove 3), let $d(0, p_n) \le A$ as before and let $d(0, q_n) \le B$, so that $d(0, p) \le A$ and $d(0, q) \le B$. We may choose $N_1$ so that for $n > N_1$ we have $d(p, p_n) < \frac{\varepsilon}{2B}$, and $N_2$ so that for $n > N_2$ we have $d(q, q_n) < \frac{\varepsilon}{2A}$.

Thus, if we let $N = \max\{N_1, N_2\}$, then for $n > N$,

\[
\begin{aligned}
|p \cdot q - p_n \cdot q_n| &= |(p - p_n) \cdot q + p_n \cdot (q - q_n)| \\
&\le |(p - p_n) \cdot q| + |p_n \cdot (q - q_n)| \\
&\le d(0, p - p_n)\,d(0, q) + d(0, p_n)\,d(0, q - q_n) \\
&\le d(p, p_n)\,B + A\,d(q, q_n) < \varepsilon/2 + \varepsilon/2 = \varepsilon.
\end{aligned}
\]
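To see Theorem 1.69 in action, here is a small Python sketch on one concrete choice of sequences in R2. The sequences `p_n`, `q_n`, `r_n` and their limits are hypothetical examples chosen for illustration; they are not taken from the notes. The function `errors` returns the three distances that the theorem says tend to zero.

```python
import math

def dist(p, q):
    """Euclidean metric on R^k."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def dot(p, q):
    return sum(a * b for a, b in zip(p, q))

def add(p, q):
    return tuple(a + b for a, b in zip(p, q))

def scale(r, p):
    return tuple(r * a for a in p)

# Illustrative sequences with limits P = (1, 2), Q = (3, -1), R = 2.
def p_n(n):
    return (1 + 1 / n, 2 - 1 / n)

def q_n(n):
    return (3 + 1 / n ** 2, -1.0)

def r_n(n):
    return 2 + 1 / n

P, Q, R = (1.0, 2.0), (3.0, -1.0), 2.0

def errors(n):
    """Distances from the n-th sum, scalar multiple and dot product to their limits."""
    pn, qn, rn = p_n(n), q_n(n), r_n(n)
    sum_err = dist(add(P, Q), add(pn, qn))
    scalar_err = dist(scale(R, P), scale(rn, pn))
    dot_err = abs(dot(P, Q) - dot(pn, qn))
    return sum_err, scalar_err, dot_err

for n in (10, 100, 1000):
    print(n, errors(n))
```

All three errors shrink as n grows, as parts 1) through 3) predict for this particular example.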

1.4.2 Subsequences

We recall the definition of a subsequence.

Definition 1.70. Given a set $X$, a sequence $\{p_n\}$ in $X$ and numbers $1 \le n_1 < n_2 < \cdots$, the new sequence that we get by setting $q_k = p_{n_k}$ is called a subsequence of $\{p_n\}$.

For example, if $n_k = 2k$, $k = 1, 2, \ldots$, then the subsequence that we get is just the even numbered terms of the old sequence. If $n_k = 2k - 1$, $k = 1, 2, \ldots$, then we get the subsequence of odd numbered terms.

Often we simply denote the subsequence by $\{p_{n_k}\}$.

Proposition 1.71. Let $(X, d)$ be a metric space, $\{p_n\}$ a sequence in $X$ and $p \in X$. If the sequence $\{p_n\}$ converges to $p$, then every subsequence of $\{p_n\}$ also converges to $p$.

Proof. Let $1 \le n_1 < n_2 < \cdots$ be the numbers for our subsequence. Note that since these are integers and $n_j < n_{j+1}$, they must be at least one apart. Since $1 \le n_1$, we have that $2 \le n_2$, $3 \le n_3$, and in general, $k \le n_k$.

Now let $q_k = p_{n_k}$ denote our subsequence and let $\varepsilon > 0$ be given. We must show that there is a $K$ so that $k > K$ implies that $d(p, q_k) < \varepsilon$. Since $\lim_n p_n = p$, there is $N$ so that $n > N$ implies that $d(p, p_n) < \varepsilon$.

Let $K = N$; then for $k > K$, $n_k \ge k > N$. Hence, for $k > K$, $d(p, q_k) = d(p, p_{n_k}) < \varepsilon$. This proves that $\lim_k q_k = p$.

1.5 Interiors, Closures, Boundaries of Sets

In this section we look at some other important concepts related to open and closed sets.

Definition 1.72. Let (X,d) be a metric space and S ⊆ X. A point p is called an interior point of S provided that there exists r > 0 so that B(p; r) ⊆ S. The set of all interior points of S is called the interior of S and is denoted int(S).

Note that since p ∈ B(p; r), every interior point of S is a point in S. Thus, int(S) ⊆ S.

Example 1.73. Let X = R with the usual metric. Then int([a, b]) = (a, b), while int(Q) is the empty set.

Example 1.74. Let X = R2 with the Euclidean metric. Then int({(x1, x2) : a ≤ x1 ≤ b, c ≤ x2 ≤ d}) = {(x1, x2) : a < x1 < b, c < x2 < d} and int({(x1, 0) : a ≤ x1 ≤ b}) is the empty set.

Proposition 1.75. Let (X,d) be a metric space and let S ⊆ X. Then:

1. int(S) is an open set,

2. if O ⊆ S is an open set, then O ⊆ int(S),

3. int(S) is the largest open set contained in S.

Proof. If int(S) is empty, then it is open. Otherwise, let p ∈ int(S), so that there is r > 0 with B(p; r) ⊆ S. If we show that B(p; r) ⊆ int(S), then that will prove that int(S) is open. Let q ∈ B(p; r) and set r1 = r − d(p, q). Using the triangle inequality, we have seen before that B(q; r1) ⊆ B(p; r) ⊆ S. This shows that q ∈ int(S). Hence, B(p; r) ⊆ int(S).

To see the second statement, if p ∈ O, then since O is open, there is r > 0 so that B(p; r) ⊆ O ⊆ S. This shows that p ∈ int(S) and so O ⊆ int(S).

The third statement is a summary of the first two statements.


Definition 1.76. Let $(X, d)$ be a metric space and let $S \subseteq X$. Then the closure of $S$, denoted $\overline{S}$, is the intersection of all closed sets containing $S$.

Before giving any examples, it will be easiest to have some other descriptions of $\overline{S}$.

Proposition 1.77. $\overline{S}$ is closed, and if $C$ is any closed set with $S \subseteq C$, then $\overline{S} \subseteq C$.

Proof. $\overline{S}$ is closed because intersections of closed sets are closed. If $C$ is closed and $S \subseteq C$, then $C$ is one of the sets in the intersection defining $\overline{S}$, and so $\overline{S} \subseteq C$.

Thus, $\overline{S}$ is the smallest closed set containing $S$.

Theorem 1.78. Let $p \in X$. Then $p \in \overline{S}$ if and only if for every $r > 0$, $B(p; r) \cap S$ is non-empty.

Proof. Suppose that there was an $r > 0$ so that $B(p; r) \cap S$ was empty. This would imply that $S \subseteq B(p; r)^c$, which is a closed set. Thus, we have a closed set containing $S$ with $p \notin B(p; r)^c$. This implies that $p \notin \overline{S}$. (This is the contrapositive.)

Conversely, if $B(p; r) \cap S$ is non-empty for every $r > 0$, then for each $n$ we may choose a point $p_n \in B(p; 1/n) \cap S$. This gives a sequence $\{p_n\} \subseteq S$, and it is easy to see that $\lim_n p_n = p$. Now if $C$ is any closed set with $S \subseteq C$, then $\{p_n\} \subseteq C$, and so by Theorem 1.61, $p \in C$. Hence, $p$ is in every closed set containing $S$ and so $p \in \overline{S}$.

Corollary 1.79. Let $p \in X$. Then $p \in \overline{S}$ if and only if there is a sequence $\{p_n\} \subseteq S$ with $\lim_n p_n = p$.

Example 1.80. Let $X = \mathbb{R}$ with the usual metric. Then the closure of $(a, b)$ is $[a, b]$ and $\overline{\mathbb{Q}} = \mathbb{R}$.

Example 1.81. If $X = \{x \in \mathbb{R} : x > 0\}$ with the usual metric and $S = \{x : 0 < x < 1\}$, then $\overline{S} = (0, 1]$.

Definition 1.82. Let $(X, d)$ be a metric space and let $S \subseteq X$. Then the boundary of $S$, denoted $\partial S$, is the set $\partial S = \overline{S} \cap \overline{S^c}$.

Proposition 1.83. Let $p \in X$. Then $p \in \partial S$ if and only if for every $r > 0$, $B(p; r) \cap S$ is non-empty and $B(p; r) \cap S^c$ is non-empty.

Proof. Apply the characterization of the closure of sets to $S$ and $S^c$.


Corollary 1.84. Let p ∈ X. Then p ∈ ∂S if and only if there is a sequence {pn} ⊆ S and a sequence {qn} ⊆ Sc with limn pn = limn qn = p.

Example 1.85. Let X = R with the usual metric and let S = (0, 1]. Then ∂S = {0, 1} and ∂Q = R.

Due 9/26

Problem 1.86. Prove Corollary 1.79.


Problem 1.88. Let $X = \mathbb{R}^2$ with the Euclidean metric and let $S = \{(x_1, x_2) : x_1^2 + x_2^2 < 1\}$. Prove that $\overline{S} = \{(x_1, x_2) : x_1^2 + x_2^2 \le 1\}$ and that $\partial S = \{(x_1, x_2) : x_1^2 + x_2^2 = 1\}$.

Definition 1.89. Let (X,d) be a metric space and let S ⊆ X. A point p ∈ X is called a cluster point or accumulation point of S provided that for every r > 0, B(p; r) ∩ S has infinitely many points.

Problem 1.90. Prove that p is a cluster point of S if and only if there is a sequence {pn} ⊆ S of distinct points, i.e., pi ≠ pj for i ≠ j, such that limn pn = p.

Problem 1.91. Let (X,d) be a discrete metric space and let S ⊆ X be any non-empty set. Find int(S), the closure of S, and ∂S, and give the cluster points of S.

1.6 Completeness

One weakness of “convergence” is that when we want to prove that a sequence {pn} converges, we need the point p that it converges to before we can prove that it converges. But often in math, one doesn’t know yet that a problem has a “solution”; we can only produce a sequence {pn} of better and better approximate solutions, and we want to claim that necessarily a point exists that is the limit of this sequence. It is for these reasons that mathematicians introduced the concepts of Cauchy sequences and complete metric spaces.

Definition 1.92. Let (X,d) be a metric space. A sequence {pn} ⊆ X is called Cauchy provided that for each ε > 0 there exists N so that whenever m,n > N, then d(pn, pm) < ε.

We look at a few properties of Cauchy sequences.


Proposition 1.93. Let $(X, d)$ be a metric space and let $\{p_n\} \subseteq X$ be a sequence. If $\{p_n\}$ is a convergent sequence, then $\{p_n\}$ is a Cauchy sequence.

Proof. Let $p = \lim_n p_n$. Given $\varepsilon > 0$ there is $N$ so that $n > N$ implies that $d(p, p_n) < \frac{\varepsilon}{2}$. Now let $m, n > N$; then $d(p_n, p_m) \le d(p_n, p) + d(p, p_m) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$.

Proposition 1.94. Let $(X, d)$ be a metric space, $\{p_n\} \subseteq X$ a sequence and $\{p_{n_k}\}$ a subsequence. If $\{p_n\}$ is Cauchy, then $\{p_{n_k}\}$ is Cauchy, i.e., every subsequence of a Cauchy sequence is also Cauchy.

Proof. Given $\varepsilon > 0$, there is $N$ so that $n, m > N$ implies $d(p_n, p_m) < \varepsilon$. Let $K = N$ and recall that $n_k \ge k$. Thus, if $k, j > K$ then $n_k \ge k > N$ and $n_j \ge j > N$, and so $d(p_{n_k}, p_{n_j}) < \varepsilon$.

Proposition 1.95. Let $(X, d)$ be a metric space and let $\{p_n\} \subseteq X$ be a Cauchy sequence. Then $\{p_n\}$ is bounded.

Proof. Set $\varepsilon = 1$ and choose $N$ so that for $n, m > N$, $d(p_n, p_m) < 1$. Fix any $n_1 > N$; we will show that there is a radius $r$ so that $\{p_n\} \subseteq B^-(p_{n_1}; r)$. To this end, set $r = \max\{1, d(p_{n_1}, p_n) : 1 \le n \le N\}$. This is a finite set of numbers so it does have a maximum. Now given any $n$, either $1 \le n \le N$, in which case $d(p_{n_1}, p_n) \le r$ since it is one of the numbers occurring in the max, or $n > N$, and then $d(p_{n_1}, p_n) < 1 \le r$.

Definition 1.96. Let $(X, d)$ be a metric space. If for each Cauchy sequence in $(X, d)$ there is a point in $X$ that the sequence converges to, then $(X, d)$ is called a complete metric space.

Example 1.97. Let $X = (0, 1] \subseteq \mathbb{R}$ be endowed with the usual metric $d(x, y) = |x - y|$. Then $X$ is not complete, since $\{\frac{1}{n}\} \subseteq X$ is a Cauchy sequence with no point in $X$ that it can converge to.

Example 1.98. Let $\mathbb{Q}$ denote the rational numbers with metric $d(x, y) = |x - y|$. We can take a sequence of rational numbers converging to $\sqrt{2}$, which we know is irrational. Then that sequence will be Cauchy, but will not have a limit in $\mathbb{Q}$. Thus, $(\mathbb{Q}, d)$ is not complete.
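One concrete way to produce such a sequence, offered here only as an illustration, is the iteration x1 = 2, x(n+1) = (xn + 2/xn)/2, which stays inside Q and is a standard method for approximating the square root of 2. This particular sequence is an assumption of the sketch, not a choice made in the notes. The Python code below uses exact rational arithmetic, so every term really is a rational number.

```python
from fractions import Fraction

def sqrt2_approximations(count):
    """Return the first `count` terms of x_1 = 2, x_{n+1} = (x_n + 2/x_n)/2."""
    x = Fraction(2)
    terms = []
    for _ in range(count):
        terms.append(x)
        x = (x + 2 / x) / 2  # stays rational: sums and quotients of rationals
    return terms

terms = sqrt2_approximations(6)

# Distances between consecutive terms shrink rapidly (Cauchy-like behaviour).
gaps = [abs(terms[i + 1] - terms[i]) for i in range(len(terms) - 1)]

# The squares approach 2, but no rational term has square exactly 2.
square_errors = [t * t - 2 for t in terms]

for t, e in zip(terms, square_errors):
    print(t, float(e))
```

The terms get closer and closer together while their squares approach 2, yet no term can equal the limit, since the limit is not in Q.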

In Math 3333, we proved that R with the usual metric has the property that every Cauchy sequence converges, that is, (R, d) is a complete metric space. This fact is so important that we repeat the proof here. First, we need to recall a few important facts and definitions.


Recall that a set S ⊆ R is called bounded above if there is a number b ∈ R such that s ∈ S implies that s ≤ b. Such a number b is called an upper bound for S. An upper bound for S that is smaller than every other upper bound of S is called a least upper bound for S and denoted lub(S) in Rosenlicht’s book. In the book that we used for Math 3333, a least upper bound for S was called a supremum for S and denoted sup(S).

Similarly, a set S ⊆ R is called bounded below if there is a number a ∈ R such that s ∈ S implies that a ≤ s. Such a number a is called a lower bound for S. A lower bound for S that is larger than every other lower bound is called a greatest lower bound for S, and denoted glb(S) in Rosenlicht’s book. In the book that we used for Math 3333, a greatest lower bound was called an infimum for S and denoted inf(S).

A key property of R is that every non-empty set that is bounded above has a least upper bound and that every non-empty set that is bounded below has a greatest lower bound.

Proposition 1.99. Let {an} be a sequence of real numbers such that the set S = {an : n ≥ 1} is bounded above and such that an ≤ an+1 for every n. Then {an} converges and limn an = lub(S). If {bn} is a sequence of real numbers such that S = {bn : n ≥ 1} is bounded below and bn ≥ bn+1 for every n, then {bn} converges and limn bn = glb(S).

Proof. Let a = lub(S), then an ≤ a for every n. Given ε > 0, since a − ε < a, we have that a − ε is not an upper bound for S and so there is an N so that a − ε < aN. If n > N, then a − ε < aN ≤ an ≤ a < a + ε. Thus, a − ε < an < a + ε and so |an − a| < ε for every n > N. This proves that limn an = a.

The proof of the other case is similar.

Theorem 1.100. Let (R, d) denote the real numbers with the usual metric. Then (R, d) is complete, i.e., every Cauchy sequence of real numbers converges.

Proof. Let {pn} be a Cauchy sequence of real numbers. By Proposition 1.95, S1 = {pj : j ≥ 1} is a bounded set of real numbers, so it is bounded above and bounded below. For each n ≥ 1, let Sn = {pk : k ≥ n}. Note that S1 ⊇ S2 ⊇ . . . , so that each set Sn is a bounded set of real numbers.

Let an = glb(Sn) and bn = lub(Sn). Since Sn+1 ⊆ Sn we have that an ≤ an+1 and bn+1 ≤ bn. Since pn ∈ Sn, an ≤ pn ≤ bn. In particular, an ≤ bn ≤ b1 for every n, and so by Proposition 1.99 {an} converges and a = limn an = lub({an : n ≥ 1}). Similarly, a1 ≤ an ≤ bn for every n and so {bn} converges and b = limn bn = glb({bn : n ≥ 1}). Also, since an ≤ bn we have that a ≤ b.

Now given any ε > 0, there is N so that when j, n > N, |pn − pj| < ε. Hence, for any n > N and any j ≥ n, we have that pn − ε < pj < pn + ε. This shows that pn − ε is a lower bound for Sn and pn + ε is an upper bound for Sn. Thus, pn − ε ≤ an ≤ a ≤ b ≤ bn ≤ pn + ε.

This shows that a and b are trapped in an interval of length 2ε. Since ε was arbitrary, by the “vanishing ε principle”, |a − b| = 0, so a = b. Thus, limn an = limn bn, but an ≤ pn ≤ bn, so by the “Squeeze Theorem” for limits, limn an = limn pn = limn bn.
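The squeeze in this proof can be watched on a concrete Cauchy sequence. The sketch below uses the illustrative sequence pn = (−1)^n / n, which is an assumption of this example and does not appear in the notes. For this sequence the absolute values decrease, so the greatest lower bound and least upper bound of each tail Sn are already attained among its first two terms, and a finite window computes them exactly.

```python
from fractions import Fraction

def p(n):
    """The sample Cauchy sequence p_n = (-1)^n / n."""
    return Fraction((-1) ** n, n)

def tail_bounds(n, horizon):
    """Return (min, max) of p_n, ..., p_horizon.

    For this particular sequence |p_j| is decreasing, so as soon as
    horizon >= n + 1 these equal glb(S_n) and lub(S_n) for the full tail.
    """
    tail = [p(j) for j in range(n, horizon + 1)]
    return min(tail), max(tail)

HORIZON = 200
bounds = [tail_bounds(n, HORIZON) for n in range(1, 101)]
a = [lo for lo, _ in bounds]   # a_n = glb(S_n), increasing in n
b = [hi for _, hi in bounds]   # b_n = lub(S_n), decreasing in n

for n in (1, 10, 100):
    print(n, a[n - 1], p(n), b[n - 1])
```

The printed triples show an ≤ pn ≤ bn with the gap bn − an shrinking, which is exactly how the proof traps the limit.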

Another set of important examples of complete metric spaces are the Euclidean spaces.

Theorem 1.101. Let $(\mathbb{R}^k, d)$ denote $k$-dimensional Euclidean space. Then $(\mathbb{R}^k, d)$ is complete.

Proof. To simplify notation, we will only do the case $k = 2$. So let $p_n = (a_n, b_n)$ be a Cauchy sequence of points in $\mathbb{R}^2$ with the Euclidean metric $d$. Given $\varepsilon > 0$, there is $N$ so that $n, m > N$ implies that $d(p_n, p_m) < \varepsilon$. But $|a_n - a_m| \le \sqrt{|a_n - a_m|^2 + |b_n - b_m|^2} = d(p_n, p_m)$, so for $n, m > N$, $|a_n - a_m| < \varepsilon$, and so $\{a_n\}$ is a Cauchy sequence of real numbers. Hence $\{a_n\}$ converges; let $a = \lim_n a_n$.

Similarly, $|b_n - b_m| < \varepsilon$ for $n, m > N$, and so $\{b_n\}$ converges; let $b = \lim_n b_n$.

Since $\lim_n a_n = a$ and $\lim_n b_n = b$, by Theorem 1.57, $\lim_n p_n = (a, b)$. Hence, this arbitrary Cauchy sequence has a limit and the proof is done.

Due 10/3:

Problem 1.102. Let X be a set and let d and ρ be two uniformly equivalent metrics on X. Prove that (X, d) is a complete metric space if and only if (X, ρ) is complete.

By the above theorem and problem, (Rk, d1) and (Rk, d∞) are also complete metric spaces.

Example 1.103. Let $X = (0, 1]$. Recall that $X$ is not complete in the usual metric $d(x, y) = |x - y|$. Given $x, y \in X$ we set $\gamma(x, y) = \left|\frac{1}{x} - \frac{1}{y}\right|$. It is easy to check that $\gamma$ is a metric on $X$. We claim that $(X, \gamma)$ is complete! Thus, $d$ and $\gamma$ are not uniformly equivalent.
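It may help to see why the sequence {1/n}, which shows that X is not complete in the usual metric, does not cause the same trouble for γ: it is Cauchy for d but not for γ. The following Python sketch, with illustrative function names `d`, `gamma` and `x` that are not from the notes, compares the two metrics on this sequence using exact rational arithmetic.

```python
from fractions import Fraction

def d(x, y):
    """The usual metric on (0, 1]."""
    return abs(x - y)

def gamma(x, y):
    """The metric gamma(x, y) = |1/x - 1/y| on (0, 1]."""
    return abs(1 / x - 1 / y)

def x(n):
    """The sequence x_n = 1/n in (0, 1]."""
    return Fraction(1, n)

# For large indices the usual distance is tiny, but gamma(x_n, x_m) = |n - m|.
for n, m in [(10, 20), (100, 200), (1000, 2000)]:
    print(n, m, d(x(n), x(m)), gamma(x(n), x(m)))
```

Since γ(1/n, 1/m) = |n − m|, the terms never get close together in the γ metric, so this sequence is not Cauchy there and places no demand on completeness.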

We sketch the proof. We must show that if $\{x_n\} \subseteq X$ is Cauchy in the $\gamma$ metric, then it converges to a point in $X$. Given $\varepsilon > 0$, suppose that for $n, m > N$, $\gamma(x_n, x_m) < \varepsilon$. Since

\[ \gamma(x_n, x_m) = \left|\frac{1}{x_n} - \frac{1}{x_m}\right| = \left|\frac{x_n - x_m}{x_n x_m}\right|, \]

we have that $|x_n - x_m| < \varepsilon |x_n x_m| \le \varepsilon$. Thus, $\{x_n\}$ is Cauchy in the usual metric. Let $x = \lim_n x_n$. Since $0 < x_n \le 1$ we have that $0 \le x \le 1$. We claim that $x \ne 0$. Indeed, if $x = 0$, then for any $N$, if we fix $m > N$ and let $n \to +\infty$, then $x_n \to 0$ and so $\frac{1}{x_n} \to +\infty$. Thus, $\gamma(x_n, x_m) = \left|\frac{1}{x_n} - \frac{1}{x_m}\right| \to +\infty$. This prevents us from making $\gamma(x_n, x_m) < \varepsilon$ and so violates the Cauchy condition.

Thus, $x \ne 0$. But also $x_n \ne 0$ for every $n$. By one of our basic results from 3333, when $x \ne 0$, $x_n \ne 0$ and $\lim_n x_n = x$, then $\lim_n \frac{1}{x_n} = \frac{1}{x}$. But this last limit being true means that for any $\varepsilon > 0$, we can pick $N$ so that when $n > N$, $\left|\frac{1}{x} - \frac{1}{x_n}\right| < \varepsilon$. But this implies that for $n > N$, $\gamma(x, x_n) < \varepsilon$ and so $\{x_n\}$ converges to $x$ in the $\gamma$ metric! We are done.

Now that we have a few examples of complete metric spaces, the following result gives us many more examples.

Proposition 1.104. Let (X,d) be a complete metric space. If Y ⊆ X is a closed subset, then (Y,d) is a complete metric space.

Proof. Let {yn} ⊆ Y be a Cauchy sequence. Since X is complete, there is a point x ∈ X, so that limn yn = x. But since Y is closed, x ∈ Y. Thus, each Cauchy sequence in Y has a limit in Y.

1.7 Compact Sets

Definition 1.105. Let (X,d) be a metric space and S ⊆ X. A collection {Uα}α∈A of subsets of X is called a cover of S provided that S ⊆ ∪α∈A Uα, and an open cover of S provided that it is a cover of S and every set Uα is open. A subset S ⊆ X is called compact provided that whenever {Uα}α∈A is an open cover of S, there is a finite subset F ⊆ A such that S ⊆ ∪α∈F Uα. The collection {Uα}α∈F is called a finite subcover.

Example 1.106. Let R have the usual metric. Let Un = B(0;n) = (−n,+n), n ∈ N. Then these sets are open and R = ∪n∈N Un. Suppose that there was a finite subset F = {n1, ..., nL} ⊆ N so that R ⊆ ∪n∈F Un = Un1 ∪ · · · ∪ UnL. If we let N = max{n1, ..., nL}, then since n < m implies that Un ⊆ Um, we would have that R ⊆ ∪n∈F Un = UN. But this implies that every real number is in B(0;N), a contradiction. Hence, no finite subcollection of {Un}n∈N covers R and so R is not compact.

Proposition 1.107. Let (X, d) be a discrete metric space and let K ⊆ X. Then K is compact if and only if K is a finite set.


Proof. Assume that K is compact. For each x ∈ K, let Ux = {x}. Because X is discrete each Ux is an open set. Also K = ∪x∈K Ux, so this is an open cover of K. But if you leave out any of the sets Ux then you no longer cover K. So this cover must itself be a finite cover, in which case K has only finitely many points.

Conversely, assume that K = {x1, ..., xn} is a finite set and let {Uα}α∈A be any open cover of K. For each j there must be some αj so that xj ∈ Uαj. But then {Uα1, ..., Uαn} is a finite subcover for K.

Before we can give many more examples of compact sets, we need some theorems.

Proposition 1.108. Let $(X, d)$ be a metric space. If $K \subseteq X$ is compact, then $K$ is closed.

Proof. We will prove that if $K$ is not closed, then $K$ is not compact. Since $K$ is not closed, $K \ne \overline{K}$. Hence, there is $p \in \overline{K}$ with $p \notin K$. Since $p \in \overline{K}$, for every $r > 0$, $B(p; r) \cap K$ is non-empty.

Let $U_n = B^-(p; 1/n)^c = \{q \in X : d(p, q) > 1/n\}$. Then each of these sets is open and $\bigcup_{n \in \mathbb{N}} U_n = \{q \in X : d(p, q) > 0\} = \{p\}^c$. Since $p \notin K$, we have that this is an open cover of $K$. Note that $n < m$ implies that $U_n \subseteq U_m$, so as in Example 1.106, $U_{n_1} \cup \cdots \cup U_{n_L} = U_N$ where $N = \max\{n_1, \ldots, n_L\}$. So if $K$ was covered by finitely many of these sets, then there would be $N$ with $K \subseteq U_N$. But this would imply that for $q \in K$, $d(p, q) > 1/N$, and so $K \cap B(p; 1/N)$ would be empty, a contradiction.

Proposition 1.109. Let (X,d) be a metric space, K ⊆ X a compact set and let S ⊆ K be a closed subset. Then S is compact.

Proof. Let {Uα}α∈A be an open cover of S and let V = Sc so that V is open. Check that {V, Uα : α ∈ A} is an open cover of K. So finitely many of these sets cover K. The same finite collection covers S, since S is a subset of K, and it still covers S after V is removed, because V contains no points of S.

The next few results give a better understanding of the structure of compact sets.

Proposition 1.110. Let (X,d) be a metric space, K ⊆ X a compact set and S ⊆ K an infinite subset. Then S has a cluster point in K.

Proof. We prove the contrapositive. Suppose S ⊆ K is a set with no cluster point in K. Then for each point p ∈ K, since it is not a cluster point of S there would exist an r > 0 such that B(p; r) contains only finitely many points from S. The set of all such balls is an open cover of K. Now because K is compact, some finite collection of these balls covers K. But each of these balls has only finitely many points from S, so their union contains only finitely many points from S. Since their union contains K, it contains all of S and so S has only finitely many points.

Definition 1.111. Let (X,d) be a metric space, K ⊆ X. Then K is called sequentially compact if every sequence in K has a subsequence that converges to a point in K.

In some texts a set is said to have the Bolzano-Weierstrass property if and only if it is sequentially compact.

Proposition 1.112. Let (X,d) be a metric space, K ⊆ X. If K is compact, then K is sequentially compact.

Proof. Let $\{p_n\} \subseteq K$ be a sequence. We do two cases.

Case 1. The sequence has only finitely many distinct values. In this case there must be some $p$ so that $p_n = p$ for infinitely many $n$. Let $n_1$ be the first such integer, $n_2$ the second, etc.; this defines a subsequence $\{p_{n_k}\}$ with $p_{n_k} = p$ for all $k$. Clearly this subsequence converges to $p$.

Case 2. The sequence has infinitely many values. In this case if we let $S$ be the set of points in the sequence, then $S$ is an infinite set and so, by Proposition 1.110, has a cluster point $p$ in $K$. We will construct a subsequence that converges to $p$. First, let $n_1$ be the smallest integer so that $p_{n_1} \in B(p; 1)$ and $p_{n_1} \ne p$. Now let $n_2$ be the smallest integer strictly larger than $n_1$ so that $p_{n_2} \in B(p; 1/2)$ and $p_{n_2} \ne p$. Continuing in this way we obtain a subsequence $\{p_{n_k}\}$ such that $d(p, p_{n_k}) < 1/k$, and hence this subsequence converges to $p$.

Definition 1.113. Let (X,d) be a metric space, K ⊆ X and let ε > 0. A subset E ⊆ K is called an ε-net for K provided that given any p ∈ K there is q ∈ E such that d(p, q) < ε. The subset K is called totally bounded if for each ε > 0 there is an ε-net for K consisting of finitely many points.

Example 1.114. Let K = [0, 1]. For each ε > 0, let N be the largest integer so that Nε ≤ 1. Then {0, ε, 2ε, ..., Nε} is an ε-net for K.
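The net in Example 1.114 is easy to build and test by machine. The Python sketch below constructs {0, ε, 2ε, ..., Nε} with exact rational arithmetic and checks the ε-net property on a grid of sample points of [0, 1]. The function names `epsilon_net` and `is_net`, and the use of a finite sample grid in place of all of [0, 1], are assumptions made for this illustration.

```python
from fractions import Fraction

def epsilon_net(eps):
    """Return [0, eps, 2*eps, ..., N*eps], N the largest integer with N*eps <= 1."""
    eps = Fraction(eps)
    big_n = int(1 / eps)  # floor of 1/eps, since eps > 0
    return [j * eps for j in range(big_n + 1)]

def is_net(net, eps, samples=1000):
    """Check d(p, q) < eps for some q in net, for each sample point p of [0, 1]."""
    eps = Fraction(eps)
    points = [Fraction(i, samples) for i in range(samples + 1)]
    return all(min(abs(p - q) for q in net) < eps for p in points)

eps = Fraction(3, 10)
net = epsilon_net(eps)
print(net, is_net(net, eps))
```

Each point p of [0, 1] lies in some interval from jε to (j+1)ε with j ≤ N, so its distance to jε is less than ε; the sample check only confirms this on finitely many points.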

Proposition 1.115. Let (X,d) be a metric space and K ⊆ X. If K is sequentially compact, then K is totally bounded and complete.

Proof. First we show that $K$ is complete. To see this let $\{p_n\} \subseteq K$ be a Cauchy sequence. Since $K$ is sequentially compact, there is a $p \in K$ and a subsequence so that $\lim_k p_{n_k} = p$. We now show that $\lim_n p_n = p$. Given $\varepsilon > 0$, there is $N$ so that $n, m > N$ implies that $d(p_n, p_m) < \varepsilon/2$. But we can pick a large $k$ so that $n_k > N$ and $d(p, p_{n_k}) < \varepsilon/2$. Hence for $n > N$, $d(p, p_n) \le d(p, p_{n_k}) + d(p_{n_k}, p_n) < \varepsilon$.

Now to show that $K$ sequentially compact implies $K$ totally bounded, we prove the contrapositive. So assume that $K$ is not totally bounded. Then there exists some $\varepsilon > 0$ for which we can find no finite $\varepsilon$-net. Pick any $p_1 \in K$; since this point is not an $\varepsilon$-net, there must be $p_2 \in K$ such that $d(p_2, p_1) \ge \varepsilon$. Since $\{p_1, p_2\}$ cannot be an $\varepsilon$-net, there is $p_3 \in K$ such that $d(p_3, p_2) \ge \varepsilon$ and $d(p_3, p_1) \ge \varepsilon$. Continuing in this way, we obtain a sequence $\{p_n\}$ in $K$ with $d(p_n, p_m) \ge \varepsilon$ for all $n \ne m$. But for a sequence to converge to a point, it must be Cauchy. Since no subsequence of this sequence could be Cauchy, no subsequence converges. Hence, $K$ is not sequentially compact.

Now we come to the main theorem.

Theorem 1.116. Let (X,d) be a metric space and K ⊆ X. Then the following are equivalent:

1. K is compact,

2. K is sequentially compact,

3. K is totally bounded and complete.

Proof. We have already shown that 1) implies 2) and that 2) implies 3). So it will be enough to show that 3) implies 1). To this end assume that 3) holds but that $K$ is not compact. Then there is some open cover $\{U_\alpha\}$ of $K$ with no finite subcover.

For $\varepsilon = 1/2$ take a finite $1/2$-net $\{p_1, \ldots, p_L\}$, so that $K \subseteq \bigcup_{j=1}^{L} B(p_j; 1/2)$. Hence, $K = (K \cap B^-(p_1; 1/2)) \cup \cdots \cup (K \cap B^-(p_L; 1/2))$. This writes $K$ as the union of $L$ closed sets. Now if each of these sets could be covered by a finite collection of sets in $\{U_\alpha\}$ then $K$ would have a finite subcover. So one of these sets cannot have a finite subcover. Pick one and call it $K_1$. Then $K_1 \subseteq K$ is closed, and since it is the intersection with a closed ball of radius $1/2$, no two points in $K_1$ can be distance greater than 1 apart.

Now take a finite $1/4$-net, use it to write $K_1$ as a finite union of sets of radius $1/4$, and arguing as before pick one which cannot be covered by a finite collection of the $U$'s. This defines a closed subset $K_2 \subseteq K_1$ with no two points greater than $1/2$ apart. Proceed inductively to define a sequence of closed subsets, $K \supseteq K_1 \supseteq K_2 \supseteq \cdots$, so that no two points in $K_n$ are distance greater than $1/2^{n-1}$ apart and each $K_n$ cannot be covered by a finite collection of the $U$'s.

If we pick a point $p_n \in K_n$, then the sequence is Cauchy since for $n, m > N$, $d(p_n, p_m) \le 1/2^{N-1}$. Since $K$ is complete, there is $p \in K$ with $\lim_n p_n = p$. Also, since $p_n \in K_m$ for all $n \ge m$ and $K_m$ is closed, the limit is in $K_m$, i.e., $p \in K_m$ for all $m$.

Now since $\{U_\alpha\}$ is an open cover of $K$, there is some $\alpha_0$ with $p \in U_{\alpha_0}$, and since this set is open, there is an $r > 0$ so that $B(p; r) \subseteq U_{\alpha_0}$. But since no two points in $K_n$ are more than distance $1/2^{n-1}$ apart, if we pick $n$ so that $1/2^{n-1} < r$, then $K_n \subseteq B(p; r) \subseteq U_{\alpha_0}$. This contradicts the fact that $K_n$ is not contained in a finite union of the $U$'s. This contradiction completes the proof of the theorem.

Theorem 1.117 (Heine-Borel). Let $(\mathbb{R}^k, d)$ denote $k$-dimensional Euclidean space and let $K \subseteq \mathbb{R}^k$. Then $K$ is compact if and only if $K$ is closed and bounded.

Proof. We have already shown that compact sets are closed. Every compact set is bounded by Problem 1.121.

Conversely, if $K$ is closed and bounded, then there is some $M > 0$ so that $K \subseteq [-M, +M] \times [-M, +M] \times \cdots \times [-M, +M]$. If we can show that this $k$-dimensional cube is totally bounded, then, since it is a closed subset of the complete space $\mathbb{R}^k$ and hence complete, it will be compact; and since $K$ is a closed subset of this compact set, it will follow that $K$ is compact.

To see that the cube is totally bounded, divide each interval $[-M, +M]$ into $2N$ subintervals of equal length $M/N$. This divides the large cube up into a finite number of smaller cubes. Each of these smaller cubes is contained in a Euclidean ball of radius $\sqrt{k}\,\frac{M}{2N}$ about its center. Given any $\varepsilon > 0$, pick $N$ large enough that $\sqrt{k}\,\frac{M}{2N} < \varepsilon$; then the centers of these balls form a finite $\varepsilon$-net. Thus, the big cube is totally bounded.
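The grid construction in this proof can be tried out numerically. The Python sketch below builds the centers of the small cubes for a cube of half-width M in dimension k and measures, on random sample points, the distance to the nearest center. The names `cube_centers` and `max_gap`, the random sampling, and the small floating-point tolerance used when comparing against the bound are assumptions of this illustration.

```python
import itertools
import math
import random

def cube_centers(k, M, N):
    """Centers of the (2N)^k subcubes of side M/N that tile [-M, M]^k."""
    side = M / N
    axis = [-M + (i + 0.5) * side for i in range(2 * N)]
    return list(itertools.product(axis, repeat=k))

def max_gap(k, M, N, trials=500, seed=0):
    """Largest observed distance from a random point of the cube to its nearest center."""
    rng = random.Random(seed)
    centers = cube_centers(k, M, N)
    worst = 0.0
    for _ in range(trials):
        x = tuple(rng.uniform(-M, M) for _ in range(k))
        nearest = min(math.dist(x, c) for c in centers)
        worst = max(worst, nearest)
    return worst

k, M, N = 2, 1.0, 5
bound = math.sqrt(k) * M / (2 * N)   # radius sqrt(k) * M / (2N) from the proof
print(len(cube_centers(k, M, N)), max_gap(k, M, N), bound)
```

Every sampled point lies within the radius from the proof of some center, so choosing N with that radius below ε turns the centers into a finite ε-net for the cube.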

Theorem 1.118 (Bolzano-Weierstrass). Let $(\mathbb{R}^k, d)$ be $k$-dimensional Euclidean space. If $\{p_n\} \subseteq \mathbb{R}^k$ is a bounded sequence, then it has a convergent subsequence.

Proof. Since the sequence is bounded it is contained in a closed ball of some finite radius. This set is compact by Heine-Borel, hence sequentially compact.

Historically, the Bolzano-Weierstrass theorem was proved before Heine-Borel. The original proof used a “divide and conquer” strategy.

Definition 1.119. Let $(X, d)$ be a metric space and let $Y \subseteq X$. A subset $S \subseteq Y$ is called dense provided that $\overline{S} \cap Y = Y$. The set $Y$ is called separable if there is a sequence $S = \{p_n\} \subseteq Y$ that is dense.


Proposition 1.120. Let $(X, d)$ be a metric space. If $K \subseteq X$ is compact, then $K$ is separable.

Proof. For each $n \in \mathbb{N}$, let $S_n$ be a finite $1/n$-net for $K$. Now we write the elements in $S = \bigcup_{n \in \mathbb{N}} S_n$ as a sequence by numbering them beginning with $S_1$, then $S_2$, etc.

Now we claim that the sequence $S = \{p_n\}$ is dense. To see this, given any $p \in K$ and $r > 0$, if we take $1/n < r$ then since $S_n$ is a $1/n$-net for $K$, we have that $B(p; r) \cap S_n \ne \emptyset$. Hence, $B(p; r) \cap S \ne \emptyset$. This shows that $p \in \overline{S}$. Thus, $K \subseteq \overline{S} \cap K \subseteq K$ and so $S$ is dense.

Due 10/10:

Problem 1.121. Let (X, d) be a metric space and K ⊆ X a compact subset. Prove that K is bounded.

Problem 1.122. Let (X, d) be a metric space. Let Ki ⊆ X, i = 1, ..., n be a finite collection of compact subsets. Prove that K1 ∪ · · · ∪ Kn is compact.

We now look at a few applications of compact sets. First we need an inequality sometimes called the reverse triangle inequality.

Proposition 1.123. Let (X, d) be a metric space and let x, y, z ∈ X. Then |d(x, y) − d(x, z)| ≤ d(y, z).

Proof. We have that d(x, y) − d(x, z) ≤ (d(x, z) + d(z, y)) − d(x, z) = d(y, z). On the other hand, −(d(x, y) − d(x, z)) = d(x, z) − d(x, y) ≤ (d(x, y) + d(y, z)) − d(x, y) = d(y, z). Since this number and its negative are both less than or equal to d(y, z), we have that |d(x, y) − d(x, z)| ≤ d(y, z).

Proposition 1.124. Let (X, d) be a metric space, let {qn} ⊆ X be a convergent sequence with limit q and let p ∈ X. Then d(p, q) = limn d(p, qn).

Proof. Given ε > 0, there is an N so that n > N implies that d(q, qn) < ε. Hence, by the reverse triangle inequality, for n > N, we have |d(p, q) − d(p, qn)| ≤ d(q, qn) < ε. This shows that d(p, q) = limn d(p, qn).

Definition 1.125. Let (X, d) be a metric space, let S ⊆ X and let p ∈ X. Then the distance from p to S is the number

dist(p, S) = inf{d(p, q) : q ∈ S}.

In general, the distance from a point to a set is really an infimum and not a minimum. That is, there is not generally a point q0 ∈ S so that dist(p, S) = d(p, q0). For example, in R with the usual metric, if we take S = [0, 1) and p = 3, then dist(p, S) = 2, but there is no point q0 ∈ S with d(p, q0) = |p − q0| = 2.
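The example with S = [0, 1) and p = 3 can be made concrete by following points of S that approach 1. In the Python sketch below, the sequence q(n) = 1 − 1/n is an illustrative choice, not something fixed by the notes. Its distances to p decrease toward 2 but remain strictly larger than 2.

```python
from fractions import Fraction

P = Fraction(3)

def q(n):
    """A point of S = [0, 1) that approaches 1 as n grows."""
    return 1 - Fraction(1, n)

def d(x, y):
    """The usual metric on R."""
    return abs(x - y)

# d(P, q(n)) = 2 + 1/n: always above 2, and as close to 2 as we like.
distances = [d(P, q(n)) for n in range(1, 8)]
print(distances)
```

The infimum 2 would be attained only at q0 = 1, which is not in S; this is the gap that compactness closes in the next result.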

However, for compact sets the distance is always attained, as the following result shows.

Proposition 1.126. Let (X, d) be a metric space, let K ⊆ X and let p ∈ X. If K is compact, then there is a point q0 ∈ K such that dist(p,K) = d(p, q0).

Proof. Let $\{q_n\}$ be a sequence in $K$ so that $\mathrm{dist}(p, K) = \lim_n d(p, q_n)$. Since $K$ is compact, this sequence has a subsequence that converges to a point in $K$. Let $\{q_{n_k}\}$ be a convergent subsequence with limit $q_0 \in K$. Then by the above proposition, we have that $d(p, q_0) = \lim_k d(p, q_{n_k})$. Since $\{d(p, q_n)\}$ is a convergent sequence of reals, the subsequence has the same limit. Hence, $d(p, q_0) = \lim_n d(p, q_n) = \mathrm{dist}(p, K)$.

Proposition 1.127. Let $(\mathbb{R}^k, d)$ be $k$-dimensional Euclidean space, let $C \subseteq \mathbb{R}^k$, and let $p \in \mathbb{R}^k$. If $C$ is closed, then there is a point $q_0 \in C$ with $\mathrm{dist}(p, C) = d(p, q_0)$.

Proof. Pick any point $q_1 \in C$ and let $r = d(p, q_1)$. Then $\mathrm{dist}(p, C) \le d(p, q_1)$ and so $\mathrm{dist}(p, C) = \inf\{d(p, q) : q \in C,\ d(p, q) \le r\}$. This shows that if we let $K = C \cap B^-(p; r)$, then $\mathrm{dist}(p, C) = \mathrm{dist}(p, K)$. Now since $K$ is closed and bounded, it is a compact set by Heine-Borel. Thus, by the last result, there is $q_0 \in K$ (and hence $q_0 \in C$) with $\mathrm{dist}(p, C) = \mathrm{dist}(p, K) = d(p, q_0)$.


Chapter 2

Finite and Infinite Sets

Later we will need a little familiarity with infinite sets and the fact that they can have “different sizes”. So we discuss these ideas now.

One formal way to think of the statement that a set S has n elements is that there is a function f : {1, 2, ..., n} → S that is one-to-one and onto. Usually, we write f(j) = sj and write S = {s1, s2, ..., sn} to show that it has n elements.

A set S is finite when there is some natural number n and a one-to-one onto function f : {1, 2, ..., n} → S.

A set S is called countably infinite when there is a one-to-one onto function f : N → S. This can be thought of as saying that a set is countably infinite precisely when the elements of S can be listed in a sequence S = {s1, s2, ...} with no repetitions, i.e., i ≠ j implies that si ≠ sj.

A set is countable when it is either finite or countably infinite. It is not hard to see that a set S is countable if and only if there is a function f : N → S that is onto.

There are many sets that are countable, and some are a little surprising. For example, Z is countable, since I could “list” the elements as

{0, 1,−1, 2,−2, ...}.

Formally, f : N → Z is the function with f(1) = 0, f(2k) = k for k = 1, 2, ..., and f(2k + 1) = −k for k = 1, 2, ....
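The listing 0, 1, −1, 2, −2, ... sends 1 to 0, each even number 2k to k, and each odd number 2k + 1 to −k. A minimal Python sketch of this enumeration, with the function name `f` chosen to match the text, lets one check on an initial segment that no integer is repeated or skipped.

```python
def f(n):
    """n-th entry (n = 1, 2, 3, ...) of the list 0, 1, -1, 2, -2, ..."""
    if n == 1:
        return 0
    if n % 2 == 0:
        return n // 2        # even n = 2k goes to k
    return -(n // 2)         # odd n = 2k + 1 goes to -k

first_terms = [f(n) for n in range(1, 22)]
print(first_terms)
```

The first 21 values are exactly the integers from −10 to 10, each appearing once, which is the pattern that makes f one-to-one and onto.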

Also, even though N × N seems to be much “larger” than N, it is also countable. To see this, first I’ll write N × N as a union of sets, each of which is finite. For each natural number n ≥ 2, let Sn = {(i, j) ∈ N × N : i + j = n}. Thus, S2 = {(1, 1)}, S3 = {(1, 2), (2, 1)}, S4 = {(1, 3), (2, 2), (3, 1)}. It is not hard to see that Sn has n − 1 elements. Now to list these points, we will first list the points in S2, then S3, etc. Within each set, we will list the elements in order of their first coordinate, so for example, (1, 2) comes before (2, 1). This time the formula for the one-to-one onto function f : N → N × N is difficult to write down. But equivalently, we could define its inverse, i.e., a one-to-one onto function g : N × N → N. Now if i + j = n, then all the elements of the sets S2, ..., Sn−1 come before (i, j). These sets have $1 + 2 + \cdots + (n - 2) = \frac{(n-2)(n-1)}{2}$ elements, and the order after that is determined by whether i = 1, 2, .... Thus, the function g is defined by

\[ g((i, j)) = \frac{(i + j - 2)(i + j - 1)}{2} + i \]

and we have shown that N × N is countable.
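The formula for g can be checked directly against the listing order S2, S3, S4, .... The Python sketch below implements g and lists the pairs diagonal by diagonal; the helper name `pairs_up_to_diagonal` is hypothetical and introduced only for this check.

```python
def g(i, j):
    """Position of (i, j) in the listing of N x N by diagonals i + j = n."""
    n = i + j
    return (n - 2) * (n - 1) // 2 + i

def pairs_up_to_diagonal(n_max):
    """All pairs in S_2, ..., S_{n_max}, each diagonal ordered by first coordinate."""
    return [(i, n - i) for n in range(2, n_max + 1) for i in range(1, n)]

pairs = pairs_up_to_diagonal(6)
positions = [g(i, j) for (i, j) in pairs]
print(list(zip(pairs, positions)))
```

On the diagonals up to i + j = 6 the positions come out as 1, 2, ..., 15 in order, so g numbers the pairs consecutively with no gaps or repeats.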

Now since N × N is countable, N × N × N is also countable! To see how to do this, first define h : N × N × N → N × N by h((i, j, k)) = (g((i, j)), k). Since g is one-to-one and onto N, it is easy to see that h is one-to-one and onto N × N. Now if we compose with g, then g ◦ h : N × N × N → N is one-to-one and onto!

In a similar way one sees that taking the Cartesian product of N with itself any finite number of times yields a countable set.

These facts make it easy to see that some other sets are countable. For example, consider Q = {p/q : p, q ∈ Z, q ≠ 0}. First, the map from Z × (Z\{0}) → Q defined by (p, q) → p/q is onto (but not one-to-one). We know that Z is countable and Z\{0} is countable. So there is a one-to-one onto map from N × N → Z × (Z\{0}), and hence a one-to-one onto map from N to Z × (Z\{0}). Composing with the above map gives an onto map from N onto Q. Hence, Q is countable.

With so many countable sets it is a little surprising that there are sets so big that their elements cannot be “listed”, i.e., that are not countable. Such sets are called uncountable.

One example of an uncountable set is the set R. To prove that R isuncountable it will be enough to prove that the real numbers in the interval(0, 1) is uncountable.

Suppose that the real numbers in (0, 1) was a countable set, then wecould list them all, let rn denote the n-th real number, so that {rn : n ∈N} = (0, 1). Now each real number has a decimal expansion. Recall that0.4000... = 0.399999.... We shall only allow the first type of decimal expan-sion. When we disallow infinite tails of nines, then the decimal expansion is

39

unique. Now let

r1 = 0.a1,1a1,2a1,3.....

r2 = 0.a2,1a2,2a2,3.....

etc.

where each of the ai,j is an integer between 0 and 9. Note that ai,j is the j-th decimal entry of the i-th number.

Now define a sequence of integers bn as follows: if 0 ≤ an,n ≤ 4 let bn = an,n + 1, and if 5 ≤ an,n ≤ 9 let bn = an,n − 1.

Note that 1 ≤ bn ≤ 8. Hence, there is a real number r, 0 < r < 1, where r has decimal expansion

r = 0.b1b2....

Note that by the way we have defined the bn's, there are no nines. Since b1 ≠ a1,1, r ≠ r1, and since b2 ≠ a2,2, r ≠ r2, etc. Thus, r was not included in the list! Contradiction.

This method of proof, of listing things, somehow getting a doubly indexed array and then altering the (n, n)-entries, is often referred to as Cantor's diagonalization method.
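The diagonal rule for the digits bn can be tried out on a finite array. The following Python sketch is only an illustration under the assumption that each row is given as a string of decimal digits (the digits after "0.") at least as long as the number of rows; the function name `diagonal_digits` and the sample rows are made up for the example.

```python
def diagonal_digits(rows: list[str]) -> str:
    """Return digits b_1 b_2 ... b_N, where b_n is obtained from the (n, n)
    entry a_{n,n} by the rule: add 1 if a_{n,n} <= 4, subtract 1 otherwise."""
    out = []
    for n, row in enumerate(rows):
        a = int(row[n])                  # the (n, n) entry of the array
        b = a + 1 if a <= 4 else a - 1   # always lands between 1 and 8
        out.append(str(b))
    return "".join(out)


rows = ["1415", "7182", "4142", "0000"]  # hypothetical first digits of r1, ..., r4
r_digits = diagonal_digits(rows)

# The new expansion differs from the n-th row in the n-th digit.
assert all(r_digits[n] != rows[n][n] for n in range(len(rows)))
```

A finite array can of course be extended by the new row; the point of the proof is that the same rule applied to an infinite list produces a number that cannot already be on it.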


Chapter 3

Continuous Functions

Continuity plays an important role for functions on the real line. Intuitively, a function is continuous provided that it sends “close” points to “close” points. When we say “close” we naturally have a notion of distance between points, both in the domain and the range. Thus, we see that continuity is really a property for functions between metric spaces. In this chapter, we define what we mean by continuous functions between metric spaces, then study the properties of continuous functions. We will see that our notions of open, closed and compact sets all play an important role.

Definition 3.1. Let (X, d) and (Y, ρ) be metric spaces and let p0 ∈ X. We say that a function f : X → Y is continuous at p0 provided that for every ε > 0 there is δ > 0 such that whenever p ∈ X and d(p0, p) < δ then ρ(f(p0), f(p)) < ε.

Note that in this definition the value of δ really depends on ε. For this reason some authors write δ(ε).

Two quick examples.

Example 3.2. Let X = Y = R both endowed with the usual metric and let f : R → R be defined by

$$f(x) = \begin{cases} 0 & \text{if } x \le 0, \\ 1 & \text{if } x > 0. \end{cases}$$

Then f is not continuous at p0 = 0.

To see that f is not continuous at 0, take ε = 1/2. For any δ > 0, the point p = δ/2 satisfies d(p0, p) = |p0 − p| = |0 − δ/2| = δ/2 < δ, but ρ(f(p0), f(p)) = |f(p0) − f(p)| = |0 − 1| = 1 > ε. Thus, every possible value of δ fails to meet the criteria.

Note that the domain of the function is really important when trying to decide continuity. For this same formula, if we made the domain instead just X = [−1, 0], then f would be continuous at 0, since now the function would be the function f(x) = 0, for every x ∈ X.

Example 3.3. Let X = Y = R with the usual metric, let f(x) = x² and let p0 = 3. Then f is continuous at p0.

To see this, given ε > 0, we take δ = min{1, ε/7}. Then when d(p0, p) = |3 − p| < δ we know that |3 − p| < 1 and so 2 < p < 4. Hence, d(f(p0), f(p)) = |3² − p²| = |3 − p||3 + p| < (3 + 4)|3 − p| < 7δ ≤ ε.

If we wanted to prove that this function was continuous at p0 = 5, then we could take δ = min{1, ε/11}, and the same argument would work.

Definition 3.4. Let (X, d) and (Y, ρ) be metric spaces. We say that a function f : X → Y is continuous provided that f is continuous at every point in X.

Thus, f is continuous provided that for each p0 ∈ X and ε > 0 there is δ > 0, such that p ∈ X and d(p0, p) < δ implies that ρ(f(p0), f(p)) < ε. In this case δ depends on both ε and the point p0. For this reason some authors write δ(p0, ε).

Example 3.5. Let X = Y = R with the usual metric. Then f(p) = p² is continuous.

Given p0 ∈ R and ε > 0 let δ = min{1, ε/(2|p0| + 1)}. Then d(p0, p) < δ implies that |p| < |p0| + 1 and hence ρ(f(p0), f(p)) = |p0² − p²| = |p0 + p||p0 − p| ≤ (|p0| + |p|)δ < (2|p0| + 1)δ ≤ ε.

In this example, the δ that we picked depended on both p0 and ε. When it can be chosen independent of p0, the function is called uniformly continuous. That is:

Definition 3.6. Let (X, d) and (Y, ρ) be metric spaces. We say that a function f : X → Y is uniformly continuous provided that for every ε > 0 there is δ > 0 such that whenever p, q ∈ X and d(p, q) < δ then ρ(f(p), f(q)) < ε.

The following is immediate:

Proposition 3.7. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y. If f is uniformly continuous, then f is continuous.

Example 3.8. Let X = Y = R with the usual metric and let f(p) = 5p + 7. Then f is uniformly continuous.

Given ε > 0, let δ = ε/5. Then when d(p, q) = |p − q| < δ we have that ρ(f(p), f(q)) = |f(p) − f(q)| = 5|p − q| < 5δ = ε.

Example 3.9. Let X = Y = R with the usual metric and let f(p) = p². Then f is not uniformly continuous.

To prove this it will be enough to take ε = 1, and show that for this value of ε, we can find no corresponding δ > 0. By way of contradiction, suppose that there was a δ such that |p − q| < δ implied that |f(p) − f(q)| < 1. Set pn = n, qn = n + δ/2. Then |pn − qn| < δ, but |f(pn) − f(qn)| = qn² − pn² = nδ + δ²/4 > nδ > 1 for n sufficiently large, in fact for n > δ⁻¹.
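The failure of uniform continuity for p² can also be seen numerically. The sketch below is only an illustration: the helper name `gap` and the sample value δ = 0.01 are arbitrary choices. It evaluates |f(qn) − f(pn)| for pn = n and qn = n + δ/2, which stays within δ of pn while the gap in function values grows like nδ.

```python
def gap(n: int, delta: float) -> float:
    """|f(q_n) - f(p_n)| for f(p) = p^2, p_n = n and q_n = n + delta/2."""
    p = float(n)
    q = n + delta / 2
    return q * q - p * p          # equals n*delta + delta^2/4 up to rounding


delta = 0.01                      # any fixed delta > 0 behaves the same way
for n in (1, 10, 100, 101, 1000):
    # |p_n - q_n| = delta/2 < delta for every n, yet the gap exceeds 1
    # as soon as n > 1/delta.
    print(n, gap(n, delta))
```

Shrinking `delta` only postpones the first n at which the gap exceeds 1; it never removes it.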

Example 3.10. Let X = [−M, +M], Y = R with usual metrics and let f : X → Y with f(p) = p². Then f is uniformly continuous.

To see this, given ε > 0, set δ = ε/(2M). Then |p − q| < δ implies that |p² − q²| = |p + q||p − q| ≤ 2M|p − q| < 2Mδ = ε.

We now look at a characterization of continuity in terms of open and closed sets.

Recall that if f : X → Y and S ⊆ Y, then the preimage of S is the subset of X given by

f⁻¹(S) = {x ∈ X : f(x) ∈ S}.

It is important to realize that to make this definition, we do not need the function f to have an inverse function.
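For finite sets the preimage can be computed directly, which makes the last remark concrete. In this Python sketch (the helper `preimage` and the sample sets are illustrative assumptions) the squaring map is not one-to-one, so it has no inverse function, yet every preimage is perfectly well defined.

```python
def preimage(f, domain, S):
    """The set of points x in the domain with f(x) in S."""
    return {x for x in domain if f(x) in S}


def square(x):
    return x * x


X = {-2, -1, 0, 1, 2}
Y = {0, 1, 2, 3, 4}

print(preimage(square, X, {1, 4}))   # contains both square roots of 1 and of 4
print(preimage(square, X, {3}))      # empty: nothing in X squares to 3

# Preimages respect complements: the preimage of Y \ S is X \ preimage(S).
S = {0, 1}
assert preimage(square, X, Y - S) == X - preimage(square, X, S)
```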

Theorem 3.11. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y. Then the following are equivalent:

1. f is continuous,

2. for every open set U ⊆ Y, the set f⁻¹(U) is an open set in X,

3. for every closed set C ⊆ Y, the set f⁻¹(C) is a closed subset of X.

Proof. First we show that 2) and 3) are equivalent. To see this note that (f⁻¹(S))ᶜ = {x ∈ X : f(x) ∉ S} = f⁻¹(Sᶜ). So if, say, 2) holds and C is a closed set, then Cᶜ is an open set. Hence, (f⁻¹(C))ᶜ = f⁻¹(Cᶜ) is open, which implies that f⁻¹(C) is closed. The proof that 3) implies 2) is similar.

Now we prove that 1) implies 2). Let U ⊆ Y be open. We must prove that f⁻¹(U) is open. Since we will be talking about balls in two different metric spaces, we will use a subscript to help indicate which metric. Let p0 ∈ f⁻¹(U), so that f(p0) ∈ U. Since U is open there is ε > 0, so that Bρ(f(p0); ε) ⊆ U. Because f is continuous at p0, there is a δ > 0, so that d(p0, p) < δ implies that ρ(f(p0), f(p)) < ε.

But this last statement shows that Bd(p0; δ) ⊆ f⁻¹(U), because p ∈ Bd(p0; δ) implies that d(p0, p) < δ, which implies that ρ(f(p0), f(p)) < ε, which implies that f(p) ∈ Bρ(f(p0); ε), which implies that f(p) ∈ U.

Hence, f⁻¹(U) is open in X.

Conversely, assume that 2) holds and we will prove 1). Given p0 ∈ X and an ε > 0, we have that U = Bρ(f(p0); ε) is open. Hence, by assumption, f⁻¹(U) is open and p0 ∈ f⁻¹(U). Thus, we may pick δ > 0 so that Bd(p0; δ) ⊆ f⁻¹(U). But this implies that if d(p0, p) < δ, then p ∈ f⁻¹(U), which implies that f(p) ∈ U = Bρ(f(p0); ε), which implies that ρ(f(p0), f(p)) < ε.

Hence, f is continuous at p0, and since this was an arbitrary point, we have that f is continuous at every point in X.

Proposition 3.12. Let (X, d), (Y, ρ) and (Z, γ) be metric spaces and let f : X → Y and g : Y → Z be continuous functions. Then g ◦ f : X → Z is continuous.

Proof. First note that if S ⊆ Z, then x ∈ (g ◦ f)⁻¹(S) iff (g ◦ f)(x) ∈ S iff g(f(x)) ∈ S iff f(x) ∈ g⁻¹(S) iff x ∈ f⁻¹(g⁻¹(S)). Thus, we have that (g ◦ f)⁻¹(S) = f⁻¹(g⁻¹(S)).

We use the above theorem. If U ⊆ Z is open in Z, then g⁻¹(U) is open in Y and hence f⁻¹(g⁻¹(U)) is open in X. Thus, when U is open in Z, then (g ◦ f)⁻¹(U) is open in X. Hence, g ◦ f is continuous.

Due 10/19:

Problem 3.13. Let f : R → R be defined by f(x) = x³. Prove that f is continuous.

Problem 3.14. Prove that f(x) = x³ is not uniformly continuous on R.

Problem 3.15. Prove that f : [−M, +M] → R with f(x) = x³ is uniformly continuous.

Problem 3.16. Let (X, d) and (Y, ρ) be metric spaces. A function f : X → Y is called Lipschitz continuous provided that there is a constant K > 0, so that ρ(f(p), f(q)) ≤ Kd(p, q) for all p, q ∈ X. Prove that every Lipschitz continuous function is uniformly continuous.


Given a function f : X → Y and S ⊆ X, the image of S is the subset of Y given by f(S) = {f(p) : p ∈ S}.

Problem 3.17. Let (X, d) and (Y, ρ) be metric spaces and f : X → Y a function. Prove that f is continuous at p0 if and only if for every ε > 0, there exists a δ > 0 so that the image of Bd(p0; δ) is contained in Bρ(f(p0); ε).

3.1 Functions into Euclidean space

Let (X, d) be a metric space. Given functions fi : X → R, i = 1, ..., k, we can define a function F : X → Rᵏ by setting F(x) = (f1(x), ..., fk(x)). Conversely, any function F : X → Rᵏ is readily seen to be of this form. We call the functions f1, ..., fk the component functions or, more simply, the components of F.

To avoid possible confusion, we shall let d2 denote the Euclidean metric on Rᵏ.

Theorem 3.18. Let (X, d) be a metric space and let (Rᵏ, d2) be Euclidean space. A function F : X → Rᵏ is continuous at a point p0 ∈ X if and only if each of its component functions fi : X → R is continuous at p0 for every i = 1, ..., k. The function F is continuous if and only if each of its component functions is continuous.

Proof. First assume that F is continuous at p0. Given ε > 0, there exists δ > 0 such that when d(p0, p) < δ, then

$$d_2(F(p_0), F(p)) = \sqrt{(f_1(p_0) - f_1(p))^2 + \cdots + (f_k(p_0) - f_k(p))^2} < \varepsilon.$$

For each i, since |fi(p0) − fi(p)| ≤ d2(F(p0), F(p)), we see that d(p0, p) < δ implies that |fi(p0) − fi(p)| < ε. Thus, fi is continuous at p0 for each i.

Conversely, assume that each fi is continuous at p0 and let ε > 0 be given. For each i, there exists δi > 0, so that d(p0, p) < δi implies that |fi(p0) − fi(p)| < ε/√k. Now let δ = min{δ1, ..., δk}. Then for d(p0, p) < δ each of the k squared terms under the square root is less than ε²/k, and so we have that d2(F(p0), F(p)) < ε.

The second equivalence follows from the first and the fact that continuous means continuous at every point.

Theorem 3.19. Let (X, d) be a metric space, let p0 ∈ X, let R have the usual metric and let f, g : X → R be functions. If f and g are both continuous at p0, then:

1. f + g is continuous at p0,

2. fg is continuous at p0,

3. for any constant c, cf is continuous at p0,

4. if g(p) ≠ 0 for all p, then f/g is continuous at p0.

Proof. To prove the first statement, given ε > 0, there is δ1 > 0, so that d(p0, p) < δ1 implies that |f(p0) − f(p)| < ε/2. Also there is δ2 > 0, so that d(p0, p) < δ2 implies that |g(p0) − g(p)| < ε/2. Let δ = min{δ1, δ2}. Then d(p0, p) < δ implies that |(f + g)(p0) − (f + g)(p)| ≤ |f(p0) − f(p)| + |g(p0) − g(p)| < ε.

To prove the second, given ε > 0, first pick δ1 > 0 and δ2 > 0, so that d(p0, p) < δ1 implies |f(p0) − f(p)| < 1 and d(p0, p) < δ2 implies that |g(p0) − g(p)| < 1. In particular, d(p0, p) < δ1 implies that |f(p)| < |f(p0)| + 1. Now pick δ3 > 0 and δ4 > 0 so that d(p0, p) < δ3 implies that |f(p0) − f(p)| < ε/(2(|g(p0)| + 1)) and d(p0, p) < δ4 implies that |g(p0) − g(p)| < ε/(2(|f(p0)| + 1)). Let δ = min{δ1, ..., δ4}, then for d(p0, p) < δ we have that

$$|f(p_0)g(p_0) - f(p)g(p)| \le |f(p_0) - f(p)|\,|g(p_0)| + |g(p_0) - g(p)|\,|f(p)| < \frac{\varepsilon}{2(|g(p_0)| + 1)}\,|g(p_0)| + \frac{\varepsilon}{2(|f(p_0)| + 1)}\,(|f(p_0)| + 1) < \varepsilon.$$

Statement 3) follows from 2) and the fact that constants are continuous.

Statement 4) will follow from 2), if we prove that the function 1/g is continuous at p0. Given ε > 0, we first pick δ1 > 0, so that d(p0, p) < δ1 implies that |g(p0) − g(p)| < |g(p0)|/2. This guarantees that |g(p)| > |g(p0)|/2. We now pick δ2 > 0 so that d(p0, p) < δ2 implies that |g(p0) − g(p)| < |g(p0)|²ε/2. Now if we let δ = min{δ1, δ2}, then when d(p0, p) < δ, we have

$$\left|\frac{1}{g(p_0)} - \frac{1}{g(p)}\right| = \frac{|g(p) - g(p_0)|}{|g(p_0)g(p)|} \le \frac{|g(p) - g(p_0)|}{|g(p_0)|^2/2} < \frac{|g(p_0)|^2\varepsilon}{2} \cdot \frac{2}{|g(p_0)|^2} = \varepsilon.$$

Corollary 3.20. Let (X, d) be a metric space, let R have the usual metric and let f, g : X → R be functions. If f and g are both continuous, then:

1. f + g is continuous,

2. fg is continuous,

3. for any constant c, cf is continuous,

4. if g(p) ≠ 0 for all p, then f/g is continuous.


Corollary 3.21. Let (X, d) be a metric space, let p0 ∈ X, and let F, G : X → Rᵏ. If F, G are both continuous at p0 (respectively, both continuous), then

1. F + G is continuous at p0 (respectively, continuous),

2. cF is continuous at p0 (respectively, continuous),

3. F · G is continuous at p0 (respectively, continuous).

Proof. We only prove 3), the rest are similar. If we let f1, ..., fk and g1, ..., gk be the component functions for F and G, then the fact that F and G are continuous at p0 implies that f1, ..., fk, g1, ..., gk are all continuous at p0. By the above result about continuity of products, each figi is continuous at p0, and so by the result about continuity of sums, F · G = f1g1 + · · · + fkgk is continuous at p0.

The results about continuity on all of X then follow since they are true at each point.

3.2 Continuity of Some Basic Functions

Proposition 3.22. Every polynomial defines a continuous function on R.

Proof. The proof is by induction on the degree of the polynomial. First note that p1(x) = x is not only continuous but uniformly continuous since we can pick δ = ε. Applying Corollary 3.20.2 with f(x) = g(x) = x, we get that the function p2(x) = x² is continuous. Now assume that we have shown that pn(x) = xⁿ is continuous. Then applying Corollary 3.20.2 again, we have that pn(x)p1(x) = xⁿ⁺¹ = pn+1(x) is continuous.

Since pn is continuous, applying Corollary 3.20.3, any function of the form an xⁿ, with an ∈ R a constant, is continuous. Since constants are continuous, applying Corollary 3.20.1 to f(x) = a1x and g(x) = a0, we get that any first degree polynomial is continuous. Now assume that we have proved that all polynomials of degree at most n are continuous and we are given an (n + 1)-st degree polynomial, q(x) = an+1 xⁿ⁺¹ + qn(x), where qn is a polynomial of degree at most n. By our inductive hypothesis qn is continuous and by our earlier work, an+1 xⁿ⁺¹ is continuous. Thus, applying Corollary 3.20.1 gives that q is continuous.

Thus, all polynomials are continuous.

Proposition 3.23. Let p, q be polynomials and let E = {x ∈ R : q(x) ≠ 0}. Then r : E → R defined by r(x) = p(x)/q(x) is continuous.

Proof. We know that p and q are continuous on R by the above result and hence they are continuous on E. The result now follows by applying Corollary 3.20.4 with X = E.

We now prove continuity of our favorite trigonometric functions. We will need a few important facts and inequalities:

$$\cos(x+y) = \cos(x)\cos(y) - \sin(x)\sin(y), \qquad \sin(x+y) = \sin(x)\cos(y) + \cos(x)\sin(y),$$

$$|\sin(x)| \le |x|, \qquad |1 - \cos(x)| \le |x|.$$

The first two equalities are the angle addition formulas. The two inequalities follow by examining the unit circle and recalling that x in radian measure is the length of the arc.

Proposition 3.24. The functions sin, cos : R → R are both Lipschitz continuous with constant 2.

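Proof. We have that

$$\begin{aligned}
|\sin(x) - \sin(y)| &= |\sin((x-y)+y) - \sin(y)| \\
&= |\sin(x-y)\cos(y) + \cos(x-y)\sin(y) - \sin(y)| \\
&\le |\sin(x-y)\cos(y)| + |(\cos(x-y) - 1)\sin(y)| \\
&\le |\sin(x-y)| + |\cos(x-y) - 1| \le 2|x-y|.
\end{aligned}$$

Thus, d(sin(x), sin(y)) ≤ 2d(x, y). The proof for cos is similar, using the addition formula for cosine.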
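A numerical spot check of this Lipschitz estimate is easy to set up. The Python sketch below is only an illustration: the helper `lipschitz_ratio`, the sampling range and the sample count are arbitrary choices, and a finite sample cannot prove the inequality. It computes the difference quotients |f(x) − f(y)|/|x − y| for sin and cos at randomly chosen pairs of points and records the largest one observed.

```python
import math
import random


def lipschitz_ratio(f, x: float, y: float) -> float:
    """The quotient |f(x) - f(y)| / |x - y| for x != y."""
    return abs(f(x) - f(y)) / abs(x - y)


random.seed(0)  # fixed seed so the experiment is repeatable
samples = [(random.uniform(-50, 50), random.uniform(-50, 50)) for _ in range(10_000)]
samples = [(x, y) for (x, y) in samples if x != y]

worst_sin = max(lipschitz_ratio(math.sin, x, y) for (x, y) in samples)
worst_cos = max(lipschitz_ratio(math.cos, x, y) for (x, y) in samples)

# Both observed maxima stay below the constant 2 from the proposition.
assert worst_sin <= 2 and worst_cos <= 2
```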

Recall that by Problem 3.16, we have that the functions sin and cos are not only continuous on R but are uniformly continuous on R. Applying Corollary 3.20.4, we have that the functions tan, cot, sec, cosec are continuous wherever their denominators do not vanish.

We now look at some multivariable functions. We formally define functions xi : Rᵏ → R by xi((a1, ..., ak)) = ai. Thus, xi is the function that yields the i-th coordinate of a point. These are called the coordinate functions of a point. By a polynomial in the coordinate functions, we mean any function p : Rᵏ → R that can be expressed as a finite sum of products of constants times products of coordinate functions of points. For example,

$$6x_1^2 x_3^5 + 2x_2^3 + 7$$

is a polynomial in the coordinate functions.

Proposition 3.25. Every polynomial in the coordinate functions defines a continuous function from the Euclidean space Rᵏ to R.


Proof. Given p = (a1, ..., ak) and q = (b1, ..., bk) two points in Rᵏ, we have that d(xi(p), xi(q)) = |xi(p) − xi(q)| = |ai − bi| ≤ d2(p, q). This shows that each coordinate function is a Lipschitz continuous function from (Rᵏ, d2) to R. The remainder of the proof follows as in the case of ordinary polynomials. First, one does an induction to show that every monomial function is continuous and then does an induction on the number of terms in the polynomial.

Due 10/26:

Problem 3.26. Prove that the function g : [0, +∞) → R defined by g(x) = √x is continuous.

Problem 3.27. Prove or disprove that the above function is uniformly continuous.

3.3 Continuity and Limits

In this section we generalize the concept of limit and establish the connections between limits and continuity. The first result gives a sequential test for continuity.

Theorem 3.28. Let (X, d) and (Y, ρ) be metric spaces, let f : X → Y be a function and let p0 ∈ X. Then f is continuous at p0 if and only if whenever {pn} ⊆ X is a sequence that converges to p0, the sequence {f(pn)} ⊆ Y converges to f(p0).

Proof. Assume that f is continuous at p0, let {pn} be a sequence converging to p0, and let ε > 0. Then there exists δ > 0, such that d(p0, p) < δ implies that ρ(f(p0), f(p)) < ε. Since limn pn = p0, there is N so that n > N implies that d(p0, pn) < δ. Hence, if n > N, then ρ(f(p0), f(pn)) < ε. This proves that limn f(pn) = f(p0).

To prove the converse, we show its contrapositive. So assume that f is not continuous at p0. Then there exists ε > 0, for which we can obtain no δ that satisfies the definition of continuity. The fact that δ = 1/n fails to satisfy the definition means that there exists a point pn with d(p0, pn) < 1/n, but ρ(f(p0), f(pn)) ≥ ε.

In this manner we obtain a sequence {pn} with d(p0, pn) < 1/n and hence limn pn = p0. Yet ρ(f(p0), f(pn)) ≥ ε for every n, and so f(p0) cannot be the limit of the sequence {f(pn)}.
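The sequential test gives a convenient way to detect discontinuities. As an illustration, the Python sketch below applies it to the step function that is 0 for x ≤ 0 and 1 for x > 0, using the sequence pn = 1/n; the function name `step` and the choice of 1000 terms are illustrative assumptions.

```python
def step(x: float) -> int:
    """The step function: 0 for x <= 0 and 1 for x > 0."""
    return 0 if x <= 0 else 1


p = [1 / n for n in range(1, 1001)]   # p_n = 1/n converges to p0 = 0
values = [step(x) for x in p]         # f(p_n) for each n

# Every f(p_n) equals 1, so the sequence f(p_n) converges to 1, not to f(0) = 0.
assert all(v == 1 for v in values)
assert step(0.0) == 0
```

Since one sequence converging to 0 has images that do not converge to the value at 0, the theorem shows that this function is not continuous at 0.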

Corollary 3.29. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y. Then f is continuous if and only if whenever {pn} ⊆ X is a convergent sequence we have limn f(pn) = f(limn pn).

There is another notion of limit, familiar from calculus, that is also related to continuity.

Definition 3.30. Let (X, d) and (Y, ρ) be metric spaces, let p0 ∈ X be a cluster point, let f : X\{p0} → Y be a function and let q0 ∈ Y. We write

$$\lim_{p \to p_0} f(p) = q_0$$

provided that for every ε > 0 there is a δ > 0 such that whenever d(p0, p) < δ and p ≠ p0, then ρ(q0, f(p)) < ε.

Note that we need p0 to be a cluster point to guarantee that the set of p's satisfying d(p0, p) < δ and p ≠ p0 is non-empty. Often, when using this definition, the function f is actually defined on all of X, in which case we simply ignore its value at p0.

We leave the proof of the following result as an exercise.

Theorem 3.31. Let (X, d) and (Y, ρ) be metric spaces, let f : X → Y be a function and let p0 ∈ X be a cluster point. Then f is continuous at p0 if and only if lim_{p→p0} f(p) = f(p0).

Lemma 3.32. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y be any function. If p0 ∈ X is not a cluster point, then f is continuous at p0.

Proof. Since p0 is not a cluster point, there exists r > 0 so that d(p0, p) < r implies that p = p0. Thus, given any ε > 0, if we let δ = r, then d(p0, p) < δ implies that ρ(f(p0), f(p)) = 0 < ε.

Corollary 3.33. Let (X, d) and (Y, ρ) be metric spaces, and let f : X → Y be a function. Then f is continuous if and only if for every cluster point p0 ∈ X, we have that lim_{p→p0} f(p) = f(p0).

Due 10/26:

Problem 3.34. Prove Theorem 3.31.


Problem 3.36. Let (X, d) and (Y, ρ) be metric spaces, let f : X → Y be a function and let p0 ∈ X be a cluster point. Prove that lim_{p→p0} f(p) = q0 if and only if for every sequence of points {pn} ⊆ X such that pn ≠ p0 and limn pn = p0, we have limn f(pn) = q0.


3.4 Continuous Functions and Compact Sets

If we let C = {(x, y) ∈ R² : xy = 1} then it is not hard to see that C is a closed subset of R². Also we know that the function f : R² → R given by f((x, y)) = x is a continuous function; it is the first coordinate function. But the image f(C) = {x ∈ R : x ≠ 0} is not a closed subset of R. Thus, the continuous image of a closed set need not be a closed set. The story is quite different for compact sets.

Theorem 3.37. Let (X, d) and (Y, ρ) be metric spaces, and let f : X → Y be a continuous function. If K ⊆ X is compact, then its image f(K) ⊆ Y is compact.

Proof. Let {Uα}α∈A be an open cover of f(K). Then each of the sets f⁻¹(Uα) is open and {f⁻¹(Uα)}α∈A is an open cover of K. Since K is compact, there is a finite subset F ⊆ A such that {f⁻¹(Uα)}α∈F covers K.

Note that f(f⁻¹(Uα)) ⊆ Uα. Hence, {Uα}α∈F covers f(K). Since every open cover of f(K) has a finite subcover, f(K) is compact.

Corollary 3.38. Let (X, d) and (Y, ρ) be metric spaces, and let f : X → Y be a continuous function. If X is compact, then f(X) is a bounded subset of Y.

Thus, when X is non-empty and compact and f : X → R is continuous, there exists M such that |f(x)| ≤ M for every x ∈ X. So in particular sup{f(x) : x ∈ X} and inf{f(x) : x ∈ X} exist. The following shows that not only do the supremum and infimum exist but that they are attained.

Corollary 3.39. Let (X, d) be a non-empty compact metric space and let f : X → R be continuous. Then there are points xm, xM ∈ X, such that for any x ∈ X, f(xm) ≤ f(x) ≤ f(xM). That is, f(xm) = inf{f(x) : x ∈ X} and f(xM) = sup{f(x) : x ∈ X}.

Proof. The proof is similar to the proof of Proposition 1.126. Choose a sequence of points {xn} such that limn f(xn) = inf{f(x) : x ∈ X}. Then since X is sequentially compact, there is a subsequence {x_{n_k}} that converges to some point xm. Since f is continuous, f(xm) = limk f(x_{n_k}) = inf{f(x) : x ∈ X}. The proof for the supremum is similar.

The above result gives the proof that whenever f : [a, b] → R is continuous, then f attains its maximum and minimum value. First, by Heine-Borel the interval [a, b] is compact; now apply the above result. Note that (0, 1) is a bounded interval and f(x) = 1/x is continuous on this set, but f is not even bounded.


Theorem 3.40. Let (X, d) and (Y, ρ) be metric spaces, and let f : X → Y be a continuous function. If X is compact, then f is uniformly continuous.

Proof. Given ε > 0, for each p ∈ X, since f is continuous at p, there is a δ(p) > 0, such that d(p, q) < δ(p) implies that ρ(f(p), f(q)) < ε/2.

The collection of sets {B(p; δ(p)/2)}p∈X is an open cover of X. Hence, there exists a finite collection of points {p1, ..., pn} so that

$$X \subseteq B\left(p_1; \frac{\delta(p_1)}{2}\right) \cup \cdots \cup B\left(p_n; \frac{\delta(p_n)}{2}\right).$$

Let δ = (1/2) min{δ(p1), ..., δ(pn)}.

Now let p, q ∈ X be any points with d(p, q) < δ. Because the above balls are a cover, there exists an i so that p ∈ B(pi; δ(pi)/2). Hence, ρ(f(pi), f(p)) < ε/2.

Also, d(pi, q) ≤ d(pi, p) + d(p, q) < δ(pi)/2 + δ ≤ δ(pi) and so ρ(f(pi), f(q)) < ε/2. Thus, we have that ρ(f(p), f(q)) ≤ ρ(f(p), f(pi)) + ρ(f(pi), f(q)) < ε.

Thus, for example, any rational function r(x) = p(x)/q(x) such that q(x) ≠ 0 on the set [a, b] will be uniformly continuous on [a, b].

Problem 3.41. Let (X, d) be a metric space, fix p ∈ X and define f : X → R by f(x) = d(p, x). Prove that f is continuous. Use this fact to give another proof of Proposition 1.126.

3.5 Connected Sets and the Intermediate Value Theorem

We will see in this section that the intermediate value theorem from calculus is really a consequence of the fact that an interval of real numbers is a connected set. First we need to define this concept.

Definition 3.42. A metric space (X, d) is connected if the only subsets of X that are both open and closed are X and the empty set. A subset S of X is called connected provided that the subspace (S, d) is a connected metric space. If S is not connected then we say that S is disconnected or separated.

Proposition 3.43. A metric space (X, d) is disconnected if and only if X can be written as a union of two disjoint, non-empty open sets.


Proof. We have that (X, d) is disconnected if and only if there is a subset A with A ≠ X and A non-empty such that A is open and closed. Let B = Aᶜ. Then since A is closed, B is open, and since A ≠ X, B is non-empty. Thus, X = A ∪ B expresses X as a disjoint union of two non-empty open sets.

Conversely, suppose that X = A ∪ B with A and B disjoint, non-empty and open. Then necessarily A = Bᶜ and so A is both open and closed. Since neither set was empty, A ≠ X and A is non-empty.

Example 3.44. If (X, d) is a discrete metric space with two or more points, then X is disconnected since X = {p0} ∪ {p0}ᶜ expresses X as a disjoint union of two non-empty open sets.

We now come to perhaps the most important example of a connected space. By an interval in R we mean either an open interval, closed interval, or half-open interval. The endpoints can be either an actual number or +∞ or −∞.

Theorem 3.45. Let I ⊆ R be an interval or all of R. Then I is a connected set.

Proof. Suppose not. Then we could write I = A ∪ B where A and B are disjoint, non-empty and open in the metric space (I, d), where d is the usual metric. Let a ∈ A and b ∈ B. Without loss of generality, we can assume that a < b (otherwise just change the names of A and B). Since I is an interval, [a, b] ⊆ I. Let A1 = A ∩ [a, b] and B1 = B ∩ [a, b]. Then A1 and B1 are disjoint, non-empty open sets in the metric space ([a, b], d) whose union is [a, b]. Since A1 is the complement of the open set B1 in [a, b], it is closed; being closed and bounded, it is compact, and hence c = sup{x : x ∈ A1} is an element of A1. Since b ∈ B1, we have c < b. But since A1 is open in [a, b], there is an open interval centered at c whose intersection with [a, b] is still in A1. Because c < b, this intersection contains points in A1 that are larger than c. This contradiction completes the proof.

Now we come to a general version of the Intermediate Value Theorem.

Theorem 3.46 (Intermediate Value Theorem for Metric Spaces). Let (X, d) be a connected metric space and let f : X → R be a continuous function. If x0, x1 ∈ X with f(x0) < L < f(x1), then there is x2 ∈ X with f(x2) = L.

Proof. Suppose not. Then we would have that X = f⁻¹((−∞, L)) ∪ f⁻¹((L, +∞)). Since f is continuous, both of these sets are open, with x0 in the first set and x1 in the second set. Thus, X is written as a disjoint union of two non-empty open sets. This makes each of these sets both open and closed, which contradicts the connectedness of X.


Corollary 3.47 (Intermediate Value Theorem). Let I ⊆ R be an interval or the whole real line and let f : I → R be continuous. If x0, x1 ∈ I and f(x0) < L < f(x1), then there is x2 between x0 and x1 with f(x2) = L.

Proof. If x0 < x1, apply the theorem to the connected space [x0, x1]; if x1 < x0, apply the theorem to the connected space [x1, x0].
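The corollary guarantees that the point x2 exists but does not say how to find it. One standard numerical way to locate such a point is bisection, a technique that is not part of these notes; the Python sketch below is offered only as an illustration, and the function name `bisect_for_value`, the tolerance and the sample cubic are assumptions made for the example.

```python
def bisect_for_value(f, x0, x1, L, tol=1e-12, max_steps=200):
    """Approximate a point x2 between x0 and x1 with f(x2) = L.

    Assumes f is continuous on the interval between x0 and x1 and that
    f(x0) < L < f(x1). The loop keeps the invariant f(lo) < L <= f(hi)."""
    lo, hi = x0, x1
    for _ in range(max_steps):
        if abs(hi - lo) <= tol:
            break
        mid = (lo + hi) / 2
        if f(mid) < L:
            lo = mid      # the value L is still ahead of mid
        else:
            hi = mid      # the value L has been reached or passed at mid
    return (lo + hi) / 2


def cubic(x):
    return x ** 3 + x


# cubic(0) = 0 < 1 < 10 = cubic(2), so some x2 in [0, 2] has cubic(x2) = 1.
x2 = bisect_for_value(cubic, 0.0, 2.0, 1.0)
print(x2, cubic(x2))
```

The same function works when x1 < x0, since only the inequalities f(x0) < L < f(x1) are used.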

Theorem 3.48. Let (X, d) and (Y, ρ) be metric spaces and let f : X → Y be continuous. If X is connected, then f(X) ⊆ Y is a connected subset.

Proof. Let S = f(X) and let (S, ρ) be the subspace. Then f : X → S is also continuous. If we could write S = A ∪ B as a disjoint union of two non-empty open sets, then we would have that X = f⁻¹(A) ∪ f⁻¹(B) expresses X as a disjoint union of non-empty open sets, contradicting the connectedness of X. Hence, S must be connected.

Definition 3.49. A metric space (X, d) is called pathwise connected provided that for any two points a, b ∈ X there exists a continuous function f : [0, 1] → X such that f(0) = a and f(1) = b.

Intuitively, a space is pathwise connected if and only if you can draw a “curve” between any two points with no breaks in the curve.

Due 11/2:

Problem 3.50. Prove that if (X, d) is pathwise connected, then (X, d) is connected.

Example 3.51. The subset of the plane defined by

$$X = \{(x, \sin(1/x)) : 0 < x \le 1\} \cup \{(0, y) : -1 \le y \le +1\}$$

is a space that is connected, but not pathwise connected.

Due 11/2:

Problem 3.52. Let X = {(x, y) : a ≤ x ≤ b, c ≤ y ≤ d} ⊆ R². Prove that X is pathwise connected.

Definition 3.53. Let g : [a, b] → R. By the graph of g we mean the set G = {(x, g(x)) : a ≤ x ≤ b} ⊆ R².

Not due:

Problem 3.54. Let g : [a, b] → R. Prove that g is continuous if and only if the graph of g is a pathwise connected subset of the plane.

This last problem is often used in calculus to intuitively describe continuous functions as those whose graph can be drawn without lifting your pencil.

Chapter 4

The Contraction Mapping Principle

At this point we would like to present an important result that uses the concepts that we have developed, together with some of its applications. Logically, this material should be done later, since it uses facts from calculus about derivatives and integrals which we have not yet discussed. But we find that a little practical math at this stage helps to motivate the rest of the course.

Definition 4.1. Let (X, d) be a metric space. A function f : X → X is called a contraction mapping provided that there is r, 0 < r < 1, so that d(f(x), f(y)) ≤ rd(x, y) for every x, y ∈ X.

Note that saying that f is a contraction mapping is the same as saying that it is Lipschitz continuous with constant r < 1. It is important to note that saying that f is a contraction mapping is stronger than just saying that d(f(x), f(y)) < d(x, y), since we need the r.

Given a function f : X → X, any point satisfying f(x0) = x0 is called a fixed point of the function.

Theorem 4.2 (Contraction Mapping Principle). Let (X, d) be a complete metric space and let f : X → X be a contraction mapping. Then:

1. there exists a unique point x0 ∈ X, such that f(x0) = x0,

2. if x1 ∈ X is any point and we define a sequence inductively, by setting xn+1 = f(xn), then limn xn = x0,

3. for this sequence, we have that d(x0, xn) ≤ d(x2, x1) rⁿ⁻¹/(1 − r).


Proof. First, we show that the inductively defined sequence does converge. To see this, since X is complete, it will be enough to show that the sequence is Cauchy.

Let A = d(x2, x1). Then we have that d(x3, x2) = d(f(x2), f(x1)) ≤ r d(x2, x1) = rA. Similarly, d(x4, x3) = d(f(x3), f(x2)) ≤ r d(x3, x2) ≤ r²A. By induction, we prove that d(xn+1, xn) ≤ rⁿ⁻¹A.

Now, if m > n, then

$$d(x_m, x_n) \le d(x_m, x_{m-1}) + \cdots + d(x_{n+1}, x_n) \le \sum_{k=n}^{m-1} r^{k-1} A \le \frac{A r^{n-1}}{1-r}.$$

Given any ε > 0, we may choose an integer N such that A rᴺ/(1 − r) < ε. Then if m, n > N, we have that d(xm, xn) < ε. Thus, {xn}n∈N is Cauchy.

Let x0 = limn xn. Then since f is continuous, f(x0) = limn f(xn) = limn xn+1 = x0. Thus, the sequence converges to a point x0 satisfying f(x0) = x0.

Now if we fix any n, then d(x0, xn) = limm d(xm, xn) ≤ A rⁿ⁻¹/(1 − r), by the above estimate, which proves 3).

Thus, we know that there is a point x0 with f(x0) = x0 and that this sequence converges to one such point. To complete the proof of the theorem it will be enough to show that if f(x0) = x0 and f(p0) = p0, then p0 = x0. To see this last fact, note that d(p0, x0) = d(f(p0), f(x0)) ≤ rd(p0, x0). Since r < 1, this implies that d(p0, x0) = 0.

Thus, the contraction mapping principle not only guarantees us the existence of a fixed point, but shows that there is a unique fixed point and gives us a method for approximating the fixed point, together with an estimate of how close the sequence xn is to the fixed point! This is a remarkable amount of information.
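To see the iteration and the error estimate side by side, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the notes: the helper names `iterate` and `a_priori_bound`, and the sample map f(x) = x/2 + 1 on R, which is a contraction with r = 1/2 and fixed point 2.

```python
def iterate(f, x1: float, n: int) -> list[float]:
    """Return [x_1, ..., x_n] where x_{k+1} = f(x_k)."""
    xs = [x1]
    for _ in range(n - 1):
        xs.append(f(xs[-1]))
    return xs


def a_priori_bound(xs: list[float], r: float, n: int) -> float:
    """The estimate A r^(n-1) / (1 - r) with A = |x_2 - x_1|, for real iterates."""
    A = abs(xs[1] - xs[0])
    return A * r ** (n - 1) / (1 - r)


def half_plus_one(x: float) -> float:
    return x / 2 + 1          # |f(x) - f(y)| = |x - y| / 2, so r = 1/2


xs = iterate(half_plus_one, 0.0, 10)
for n in range(1, 11):
    error = abs(2.0 - xs[n - 1])               # distance to the fixed point 2
    print(n, xs[n - 1], error, a_priori_bound(xs, 0.5, n))
```

For this particular map the error and the bound coincide, so the estimate in part 3 of the theorem cannot be improved in general.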

Here's a typical application of this theorem. Let f(x) = cos(x) and note that since 1 < π/2, for 0 ≤ x ≤ 1 we have that 0 ≤ f(x) ≤ 1. Thus, f : [0, 1] → [0, 1] and f is continuous. Also, by the mean value theorem, for 0 ≤ x ≤ y ≤ 1, there is c, x ≤ c ≤ y, with

f(y) − f(x) = f′(c)(y − x) = −sin(c)(y − x).

Hence, |f(y) − f(x)| ≤ sin(c)|y − x| ≤ sin(1)|y − x|. Thus, f(x) = cos(x) is a contraction mapping with r = sin(1) < 1.

Using the fact that [0, 1] is complete, by the contraction mapping principle, there is a unique point, 0 ≤ x0 ≤ 1, such that cos(x0) = x0. Moreover, we can obtain this point (or at least approximate it) by choosing any number x1, 0 ≤ x1 ≤ 1, and forming the inductive sequence xn+1 = cos(xn).

The third part of the theorem gives us an estimate of the distance between our “approximate” fixed point xn and the true fixed point. In particular, if we pick x1 = 0, then x2 = cos(x1) = 1, so |x2 − x1| = 1. Hence,

$$|x_0 - x_n| \le \frac{\sin(1)^{n-1}}{1 - \sin(1)}.$$
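The cosine iteration is easy to run. The following Python sketch is only an illustration (the names `cos_iterates` and `error_bound` and the choice of 200 terms are assumptions); it starts from x1 = 0, iterates xn+1 = cos(xn), and compares the distance to a late iterate, used here as a stand-in for the true fixed point, with the estimate sin(1)ⁿ⁻¹/(1 − sin(1)).

```python
import math


def cos_iterates(n_terms: int, x1: float = 0.0) -> list[float]:
    """Return [x_1, ..., x_{n_terms}] with x_{k+1} = cos(x_k)."""
    xs = [x1]
    for _ in range(n_terms - 1):
        xs.append(math.cos(xs[-1]))
    return xs


def error_bound(n: int) -> float:
    """The estimate sin(1)^(n-1) / (1 - sin(1)), valid for the start x_1 = 0."""
    r = math.sin(1.0)
    return r ** (n - 1) / (1 - r)


xs = cos_iterates(200)
x_star = xs[-1]                # numerical stand-in for the fixed point x0
for n in (1, 5, 10, 20, 50):
    print(n, xs[n - 1], abs(x_star - xs[n - 1]), error_bound(n))
```

In this run the observed errors shrink faster than the bound, which is consistent with the bound being a worst-case estimate based on the contraction constant over all of [0, 1].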

Due 11/11/11:

Problem 4.3. Let f(x) = x/2 + 1/x. Use basic calculus to show that f : [1, 2] → [1, 2] and use the mean value theorem to show that f is a contraction. Show that the unique fixed point of f is √2. Using x1 = 3/2, give an estimate on |√2 − xn|.

4.1 Application: Newton’s Method

Newton’s method gives an iterative method for approximating the solution to an equation of the form f(x) = 0, when f is differentiable.

The idea of the iteration is to choose any x1 and then get an “improved” estimate to the solution by tracing the tangent line to the graph at the point (x1, f(x1)) until it intersects the x-axis and letting this determine the point x2. The formula that one obtains is

\[ x_2 = x_1 - \frac{f(x_1)}{f'(x_1)}. \]

Newton’s method consists of repeating this formula iteratively to generate a sequence of points

\[ x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)} \]

that, hopefully, converge to the zero of the function.

Note that if we set $g(x) = x - \frac{f(x)}{f'(x)}$, then f(x) = 0 if and only if g(x) = x. Thus, finding a zero of f is equivalent to finding a fixed point of g. Moreover, the above iteration is simply computing xn+1 = g(xn).
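The reformulation of Newton's method as a fixed-point iteration can be sketched in a few lines. This is only an illustration: the function f(x) = x^3 − 2, the starting point 1.5, and the helper names `newton_map` and `iterate` are assumptions chosen here and do not come from the notes.

```python
def newton_map(f, fprime):
    """Return g(x) = x - f(x)/f'(x); zeros of f are fixed points of g."""
    def g(x):
        return x - f(x) / fprime(x)
    return g


def iterate(g, x1, n):
    """Return [x_1, ..., x_n] with x_{k+1} = g(x_k)."""
    xs = [x1]
    for _ in range(n - 1):
        xs.append(g(xs[-1]))
    return xs


# Illustrative choice (not from the notes): f(x) = x^3 - 2.
f = lambda x: x ** 3 - 2.0
fprime = lambda x: 3.0 * x ** 2

g = newton_map(f, fprime)
xs = iterate(g, 1.5, 8)

# The zero of f is the cube root of 2, which should be a fixed point of g.
root = 2.0 ** (1.0 / 3.0)
print(xs[-1], root)
```

Running Newton's method is literally the loop xn+1 = g(xn), so any convergence statement for fixed-point iterations of g is a statement about Newton's method for f.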

Thus, if we can construct an interval [a, b] such that g : [a, b] → [a, b] and such that g is a contraction mapping on [a, b], then we will have a criterion for convergence of Newton’s method.

The details are below.

Theorem 4.4 (Newton’s Method). Let f be a twice continuously differentiable function and assume that there is a point x0 with f(x0) = 0 and f′(x0) ≠ 0. Then there is M > 0 so that for any x1 with x0 − M ≤ x1 ≤ x0 + M the sequence of points obtained by Newton’s method starting at x1 converges to x0.


Proof. Fix an r, 0 < r < 1. Let g be as above; then $g'(x) = \frac{f(x) f''(x)}{(f'(x))^2}$, which is continuous in a neighborhood of x0, since f′ is continuous and f′(x0) ≠ 0. Since g′(x0) = 0, there is some M > 0 so that |x − x0| ≤ M implies that |g′(x)| ≤ r. Hence, for any x in the interval I = [x0 − M, x0 + M], the mean value theorem gives a point c between x and x0 with |g(x) − x0| = |g(x) − g(x0)| = |g′(c)(x − x0)| ≤ rM < M. Hence, x ∈ I implies that g(x) ∈ I.

Thus, g : I → I and for any x, y ∈ I, |g(x) − g(y)| = |g′(c)(x − y)| ≤ r|x − y|, since c ∈ I. Hence, we may apply the contraction mapping principle to g to obtain the desired result.

There are two difficulties in using the above theorem. First, the interval is centered at x0, which is a value that we are trying to obtain! The second lies in having enough detailed information about g to explicitly find M. However, since the proof of Newton’s method uses the contraction mapping principle, in some cases it is possible to say a great deal.

Due 11/11/11:

Problem 4.5. Let $f(x) = x^2 - 5$. Show that a zero of this equation lies somewhere in the interval [2, 3]. Calculate g(x) for this function and prove that g : [2, 3] → [2, 3] and |g′(x)| ≤ 1/2 for x ∈ [2, 3]. Prove that if we perform Newton’s method with x1 = 2, then $|x_n - \sqrt{5}| \le (1/2)^n$.

Due 11/11/11:

Problem 4.6. Let f(x) = x − cos(x), so that f(x0) = 0 exactly when x0 = cos(x0), the problem that we studied earlier. Compute the corresponding g for this f and find a concrete interval I = [a, b] with 0 ≤ a < b ≤ 1 such that g : I → I is a contraction mapping. How does this r compare to the earlier r from the last section?

4.2 Application: Solution of ODE’s

In this section we show how the contraction mapping principle can be used to deduce that solutions exist and are unique for some very complicated ordinary differential equations.

Starting with a continuous function h(x, y) and an initial value y0, we wish to solve

\[ \frac{dy}{dx} = h(x, y), \quad y(a) = y_0. \]

That is, we seek a function f(x) so that f′(x) = h(x, f(x)) for a < x < b and f(a) = y0. Such a problem is often called an initial value problem (IVP).


For example, when $h(x, y) = x^2 y^3$, we are trying to solve $\frac{dy}{dx} = x^2 y^3$, which can be done by separation of variables. But our function could be h(x, y) = sin(xy), in which case the differential equation becomes $\frac{dy}{dx} = \sin(xy)$, which cannot be solved by elementary means.

For this application, we will use a number of things that we have not yet developed fully. But again, we stress that we are seeking motivation.

First note that by the fundamental theorem of calculus, a continuous function f : [a, b] → R satisfies the IVP if and only if for a ≤ x ≤ b,

\[ f(x) = \int_a^x f'(t)\,dt + y_0 = \int_a^x h(t, f(t))\,dt + y_0. \]

This is often called the integral form of the IVP.

Now let C([a, b]) denote the set of all real-valued continuous functions on the interval [a, b]. Note that if we are given any f ∈ C([a, b]) and we set

\[ g(x) = \int_a^x h(t, f(t))\,dt + y_0, \]

then g is also a continuous function on [a, b].

Thus, we can define a map Φ : C([a, b]) → C([a, b]) by letting Φ(f) = g, where f and g are as above. We see that solving our IVP is the same as finding a fixed point of the map Φ! Also, we are now in a situation where by a “point” in our space C([a, b]), we mean a function.

This is starting to look like an application of the contraction mapping principle. For this we would first need a metric d on the set C([a, b]) (again, points in this metric space are functions!) so that (C([a, b]), d) is a complete metric space, and then we would need Φ to be a contraction mapping.

It turns out that there is such a metric on C([a, b]) and that for many functions h the corresponding map Φ is a contraction mapping.

First for the metric. Given any two functions f, g ∈ C([a, b]), we set

d(f, g) = sup{|f(x)− g(x)| : a ≤ x ≤ b}.

This is the example that was introduced in our first section on metric spaces. Note that since f, g are continuous functions on the compact metric space [a, b], we have that the continuous function f − g is bounded. Hence, the supremum is finite. Also, it is clear that d(f, g) = 0 if and only if f(x) = g(x) for all x, i.e., if and only if f = g. Note that d(f, g) = d(g, f).


Finally, if f, g, h ∈ C([a, b]), then

\[ \begin{aligned} d(f, g) &= \sup\{|f(x) - h(x) + h(x) - g(x)| : a \le x \le b\} \\ &\le \sup\{|f(x) - h(x)| + |h(x) - g(x)| : a \le x \le b\} \\ &\le \sup\{|f(x) - h(x)| : a \le x \le b\} + \sup\{|h(x) - g(x)| : a \le x \le b\} \\ &= d(f, h) + d(h, g). \end{aligned} \]

Thus, we see that d is indeed a metric on C([a, b]). To apply the contraction mapping principle, we need this metric space to be complete. This fact is shown by the following theorem.

Theorem 4.7. The metric space (C([a, b]), d) is complete.

Proof. We need to prove that if {fn} ⊆ C([a, b]) is a Cauchy sequence in the d metric, then there is a continuous function f to which it converges. First, note that if we fix x, a ≤ x ≤ b, then we have that

|fn(x)− fm(x)| ≤ sup{|fn(t)− fm(t)| : a ≤ t ≤ b} = d(fn, fm).

Thus, for each fixed x, the sequence of real numbers {fn(x)} is Cauchy and hence converges to a real number.

Thus, we may define a function by setting f(x) = lim_n fn(x). Doing this for each a ≤ x ≤ b defines f : [a, b] → R.

We now wish to prove that f is a continuous function, so that f defines a point in the metric space C([a, b]), and then prove that the sequence {fn} converges to this point.

First, to see that f is continuous at some point x0, given ε > 0, pick N so that n, m > N implies that d(fn, fm) < ε/3. Now fix an n > N; then for every x in [a, b] we have that

\[ |f(x) - f_n(x)| = \lim_{m} |f_m(x) - f_n(x)| \le \varepsilon/3. \]

Using that fn is continuous, pick δ > 0 so that |x0 − x| < δ implies that |fn(x0) − fn(x)| < ε/3. Then for |x0 − x| < δ we have that

\[ \begin{aligned} |f(x_0) - f(x)| &= |f(x_0) - f_n(x_0) + f_n(x_0) - f_n(x) + f_n(x) - f(x)| \\ &\le |f(x_0) - f_n(x_0)| + |f_n(x_0) - f_n(x)| + |f_n(x) - f(x)| < \varepsilon. \end{aligned} \]

Thus, we have that f is continuous at x0 and since x0 was arbitrary, f is continuous.

Finally, we must show that lim_n d(f, fn) = 0. But to see this, given ε > 0, take N as before and recall that we have already shown that |f(x) − fn(x)| ≤ ε/3 for every x in [a, b] and any n > N. Hence, for n > N, we have that d(f, fn) = sup{|f(x) − fn(x)| : a ≤ x ≤ b} ≤ ε/3 < ε.


We now have all the tools at our disposal needed to solve some IVP’s.

Theorem 4.8. Let h(x, y) be a continuous function on [a, b] × R and assume that |h(x, y1) − h(x, y2)| ≤ K|y1 − y2| for all a ≤ x ≤ b and all real y1, y2, with r = K(b − a) < 1. Then for any y0, the initial value problem f′(x) = h(x, f(x)), f(a) = y0 has a unique solution on the interval [a, b].

Proof. Consider the mapping Φ : C([a, b]) → C([a, b]) defined by

\[ \Phi(f)(x) = \int_a^x h(t, f(t))\,dt + y_0. \]

Given f, g ∈ C([a, b]), we have that

\[ \begin{aligned} |\Phi(f)(x) - \Phi(g)(x)| &= \left| \int_a^x \bigl( h(t, f(t)) - h(t, g(t)) \bigr)\,dt \right| \le \int_a^x |h(t, f(t)) - h(t, g(t))|\,dt \\ &\le \int_a^x K |f(t) - g(t)|\,dt \le \int_a^x K\, d(f, g)\,dt = K(x - a)\, d(f, g) \le r\, d(f, g). \end{aligned} \]

Taking the supremum over a ≤ x ≤ b, we get d(Φ(f), Φ(g)) ≤ r d(f, g). Thus, Φ is a contraction mapping in the metric d. Since (C([a, b]), d) is a complete metric space, by the contraction mapping principle, there exists a unique f ∈ C([a, b]) such that Φ(f) = f. Earlier, we saw that an f was a solution to the IVP if and only if Φ(f) = f. Thus, the solution not only exists but it is unique.

The contraction mapping principle not only gives us the existence and uniqueness of solutions to the IVP, but it also gives us a method for approximating solutions and very nice bounds on the error of the approximation.
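The approximation method is the iteration fn+1 = Φ(fn). Below is a rough numerical sketch of it, under several assumptions that are not in the notes: functions are represented by their values on a uniform grid, the integral in Φ is replaced by the trapezoid rule, and the test equation is dy/dx = y/2 on [0, 1] with y(0) = 1 (so K = 1/2, r = 1/2, and the exact solution is e^{x/2}). The names `picard_step` and `sup_dist` are hypothetical.

```python
import math


def picard_step(h, xs, fvals, y0):
    """Grid version of Phi: Phi(f)(x) = y0 + integral from a to x of h(t, f(t)) dt.

    The integral is approximated by the cumulative trapezoid rule.
    """
    integrand = [h(x, y) for x, y in zip(xs, fvals)]
    out = [y0]
    total = 0.0
    for k in range(1, len(xs)):
        total += 0.5 * (integrand[k - 1] + integrand[k]) * (xs[k] - xs[k - 1])
        out.append(y0 + total)
    return out


def sup_dist(u, v):
    """Grid stand-in for the metric d(f, g) = sup |f(x) - g(x)|."""
    return max(abs(p - q) for p, q in zip(u, v))


# Illustrative IVP (an assumption, not from the notes): dy/dx = y/2, y(0) = 1.
a, b, y0, K = 0.0, 1.0, 1.0, 0.5
h = lambda x, y: K * y

n_grid = 200
xs = [a + (b - a) * k / n_grid for k in range(n_grid + 1)]

f = [y0] * len(xs)          # f_1 is the constant function y0
dists = []                  # d(f_{n+1}, f_n) for successive iterates
for _ in range(30):
    f_next = picard_step(h, xs, f, y0)
    dists.append(sup_dist(f_next, f))
    f = f_next

exact = [math.exp(K * x) for x in xs]
err = sup_dist(f, exact)
print(err, dists[:4])
```

The successive distances should shrink at least by the factor r at each step, which mirrors the contraction estimate in the proof; the remaining error against the exact solution comes from the trapezoid discretization, not from the iteration.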

The above theorem is far from the most useful, because the conditions on h(x, y) are too restrictive for most applications. For example, $h(x, y) = x^3 y^2$ doesn’t satisfy our hypothesis. For this reason you will seldom see it in a textbook. But it does have the advantage of being the simplest to prove and we believe that it illustrates the key guiding principles of the proofs of more complicated results.

Due 11/11/11:

Problem 4.9. Consider the IVP on the interval [0, π/2] given by $\frac{dy}{dx} = \frac{\sin(xy)}{5}$, y(0) = 1. Prove that the corresponding map Φ is a contraction mapping with $r = \frac{\pi^2}{10}$ (if you get a better bound, that is good!). Let f denote the unique solution to this IVP and let fn+1 = Φ(fn) denote the sequence that one obtains starting with f1 equal to the constant function 1. Give an estimate on d(f, fn).


Chapter 5

Riemann and Riemann-Stieltjes Integration

In this chapter we develop the theory of the Riemann integral, which is the type of integration used in your calculus courses, and we also introduce Riemann-Stieltjes integration, which is widely used in probability, statistics and financial mathematics.

Given a closed interval [a, b], by a partition of [a, b] we mean a set P = {x0, x1, ..., xn−1, xn} with a = x0 < x1 < ... < xn = b. The norm or width of the partition is

‖P‖ = max{xi − xi−1 : 1 ≤ i ≤ n}.

Given two partitions P1 and P2 we say that P2 is a refinement of P1 or P2 refines P1 provided that as sets P1 ⊆ P2. Note that if P2 refines P1, then ‖P2‖ ≤ ‖P1‖.

Given a bounded function f : [a, b] → R and a partition P = {x0, ..., xn} of [a, b], for i = 1, ..., n we set

\[ M_i = \sup\{f(x) : x_{i-1} \le x \le x_i\} \]

and

\[ m_i = \inf\{f(x) : x_{i-1} \le x \le x_i\}. \]

The upper Riemann sum of f given the partition P is the real number

\[ U(f, P) = \sum_{i=1}^{n} M_i (x_i - x_{i-1}), \]

and the lower Riemann sum of f is

\[ L(f, P) = \sum_{i=1}^{n} m_i (x_i - x_{i-1}). \]

Note that if we hadn’t assumed that f is a bounded function then some of the numbers Mi or mi would have been infinite. This is one reason that we can only define Riemann integrals for bounded functions.

By a general Riemann sum of f given P, we mean a sum of the form

\[ \sum_{i=1}^{n} f(x_i')(x_i - x_{i-1}), \]

where the x′i are any choice of points satisfying xi−1 ≤ x′i ≤ xi for i = 1, ..., n. Since mi ≤ f(x′i) ≤ Mi, the upper and lower Riemann sums give an upper and lower bound for general Riemann sums, i.e.,

\[ L(f, P) \le \sum_{i=1}^{n} f(x_i')(x_i - x_{i-1}) \le U(f, P). \]

In fact, since we can choose the points x′i so that f(x′i) is arbitrarily close to Mi, we see that U(f, P) is actually the supremum of all general Riemann sums of f given P. Similarly, by choosing the points x′i so that f(x′i) is arbitrarily close to mi, we see that L(f, P) is the infimum of all general Riemann sums.

Thus, if we want all general Riemann sums of a function to be “close” to a value that we wish to think of as the “integral of f”, then it will be enough to study the “extreme” cases of the upper and lower sums.
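These sums are easy to compute exactly in special cases. The sketch below is an illustration under an added assumption: f is increasing, so that Mi = f(xi) and mi = f(xi−1); for a general bounded f the suprema and infima cannot be read off from endpoints. The choice f(x) = x^2 on [0, 1], the uniform partitions, and all function names are hypothetical and not taken from the notes.

```python
from fractions import Fraction


def uniform_partition(a, b, n):
    """Partition of [a, b] into n subintervals of equal length (exact rationals)."""
    a, b = Fraction(a), Fraction(b)
    return [a + (b - a) * k / n for k in range(n + 1)]


def upper_lower_sums_increasing(f, partition):
    """U(f, P) and L(f, P) for an increasing f.

    Assumes f is increasing, so sup on [x0, x1] is f(x1) and inf is f(x0).
    """
    pairs = list(zip(partition, partition[1:]))
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    return upper, lower


def midpoint_riemann_sum(f, partition):
    """A general Riemann sum with x_i' chosen as the midpoint of each subinterval."""
    pairs = list(zip(partition, partition[1:]))
    return sum(f((x0 + x1) / 2) * (x1 - x0) for x0, x1 in pairs)


square = lambda x: x * x  # increasing on [0, 1]

P4 = uniform_partition(0, 1, 4)
P8 = uniform_partition(0, 1, 8)  # contains every point of P4, so it refines P4

U4, L4 = upper_lower_sums_increasing(square, P4)
U8, L8 = upper_lower_sums_increasing(square, P8)
mid4 = midpoint_riemann_sum(square, P4)
print(L4, mid4, U4)
print(L8, U8)
```

Using exact fractions avoids rounding questions, so the inequalities between lower sums, general sums, and upper sums can be checked literally.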

Proposition 5.1. Let f : [a, b] → R be a bounded function and let P1 and P2 be partitions of [a, b] with P2 a refinement of P1. Then

L(f, P1) ≤ L(f, P2) ≤ U(f, P2) ≤ U(f, P1).

Proof. We have already observed that for any partition, L(f, P) ≤ U(f, P). So we only need to consider the other two inequalities.

It is enough to prove these inequalities in the case that P2 is obtained from P1 by adding just one extra point. Because if we do this then P2 can be obtained from P1 by successively adding a point at a time and in each case the inequalities “move” in the right direction.

So let us set P1 = {x0, ..., xn} so that for some j, P2 = {x0, ..., xj−1, x̂, xj, ..., xn} where xj−1 < x̂ < xj.


We start with the case of the upper sums. Note that most of the terms in the sum for U(f, P1) and U(f, P2) will be equal, except that the one term Mj(xj − xj−1) in the sum for U(f, P1) will be replaced by two terms in the sum for U(f, P2). These two new terms are $M_j'(\hat{x} - x_{j-1}) + M_j''(x_j - \hat{x})$, where

\[ M_j' = \sup\{f(x) : x_{j-1} \le x \le \hat{x}\} \quad \text{and} \quad M_j'' = \sup\{f(x) : \hat{x} \le x \le x_j\}. \]

Since $[x_{j-1}, \hat{x}] \subseteq [x_{j-1}, x_j]$ and $[\hat{x}, x_j] \subseteq [x_{j-1}, x_j]$, we have that $M_j' \le M_j$ and $M_j'' \le M_j$. Hence, the two terms appearing in U(f, P2) satisfy

\[ M_j'(\hat{x} - x_{j-1}) + M_j''(x_j - \hat{x}) \le M_j(\hat{x} - x_{j-1}) + M_j(x_j - \hat{x}) = M_j(x_j - x_{j-1}), \]

which is the one term appearing in U(f, P1). Thus, U(f, P2) ≤ U(f, P1).

The proof for the lower sums is similar, except that since we are dealing with infima, we will have that

\[ m_j' = \inf\{f(x) : x_{j-1} \le x \le \hat{x}\} \ge m_j \quad \text{and} \quad m_j'' = \inf\{f(x) : \hat{x} \le x \le x_j\} \ge m_j. \]

Definition 5.2. Let f : [a, b] → R be a bounded function. Then the upper Riemann integral of f is the number

\[ \overline{\int_a^b} f(x)\,dx = \inf\{U(f, P) : P \text{ a partition of } [a, b]\}. \]

The lower Riemann integral of f is the number

\[ \underline{\int_a^b} f(x)\,dx = \sup\{L(f, P) : P \text{ a partition of } [a, b]\}. \]

We say that f is Riemann integrable when these two numbers are equal and in this case we define the Riemann integral of f to be

\[ \int_a^b f(x)\,dx = \overline{\int_a^b} f(x)\,dx = \underline{\int_a^b} f(x)\,dx. \]

To help cement these definitions, let us consider the function f : [a, b] → R defined by

\[ f(x) = \begin{cases} 1 & x \text{ rational} \\ 0 & x \text{ irrational}; \end{cases} \]

then for any partition P we will have that Mi = 1 and mi = 0 for every i. Hence, U(f, P) = b − a and L(f, P) = 0. Thus,

\[ \overline{\int_a^b} f(x)\,dx = b - a \quad \text{and} \quad \underline{\int_a^b} f(x)\,dx = 0. \]

In particular, f is not Riemann integrable.

The following helps to explain the terms “upper” and “lower”.

Proposition 5.3. Let f : [a, b] → R be a bounded function and let P1 and P2 be any two partitions of [a, b]. Then L(f, P1) ≤ U(f, P2) and

\[ \underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx. \]

Proof. Let P3 be the partition of [a, b] that as a set is P3 = P1 ∪ P2. Then P3 is a refinement of P1 and of P2. So by the above result,

\[ L(f, P_1) \le L(f, P_3) \le U(f, P_3) \le U(f, P_2). \]

Now we have that

\[ \underline{\int_a^b} f(x)\,dx = \sup\{L(f, P_1) : P_1 \text{ is a partition of } [a, b]\} \le U(f, P_2), \]

and so

\[ \underline{\int_a^b} f(x)\,dx \le \inf\{U(f, P_2) : P_2 \text{ is a partition of } [a, b]\} = \overline{\int_a^b} f(x)\,dx. \]

The following gives an important means of determining if a function is Riemann integrable.

Theorem 5.4. Let f : [a, b] → R be a bounded function. Then f is Riemann integrable if and only if for every ε > 0 there exists a partition P such that U(f, P) − L(f, P) < ε.

Proof. Assuming that the latter condition is met, for any ε > 0 and P as above, we have

\[ L(f, P) \le \underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx \le U(f, P), \]

which implies that

\[ 0 \le \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le U(f, P) - L(f, P) < \varepsilon. \]

Since ε > 0 was arbitrary, the lower and upper Riemann integrals must be equal.

Conversely, assume that f is Riemann integrable and that ε > 0 is given. Since the upper integral is an infimum, we may choose a partition P1 so that

\[ \overline{\int_a^b} f(x)\,dx \le U(f, P_1) < \overline{\int_a^b} f(x)\,dx + \varepsilon/2. \]

Similarly, we may choose a partition P2 so that

\[ \underline{\int_a^b} f(x)\,dx - \varepsilon/2 < L(f, P_2) \le \underline{\int_a^b} f(x)\,dx. \]

Let P3 = P1 ∪ P2, then

\[ U(f, P_3) - L(f, P_3) \le U(f, P_1) - L(f, P_2) < \left( \overline{\int_a^b} f(x)\,dx + \varepsilon/2 \right) - \left( \underline{\int_a^b} f(x)\,dx - \varepsilon/2 \right) = \varepsilon, \]

since the upper and lower integrals are equal. Thus, we have found a partition satisfying our criteria.

Definition 5.5. We say that a bounded function f : [a, b] → R satisfies the Riemann integrability criterion provided that for every ε > 0, there exists a partition P such that U(f, P) − L(f, P) < ε.

Thus, the above theorem says that a bounded function is Riemann integrable if and only if it satisfies the Riemann integrability criterion.

Theorem 5.6. Let f : [a, b] → R be a continuous function. Then f is Riemann integrable.

Proof. Since f is continuous and [a, b] is compact, we have that f is bounded and f is uniformly continuous. We will prove that f meets the Riemann integrability criterion.

Given any ε > 0, since f is uniformly continuous, there is a δ > 0 so that whenever |x − y| < δ then |f(x) − f(y)| < ε/(b − a). Now let P = {a = x0, ..., xn = b} be any partition of [a, b] with ‖P‖ < δ. Then for each i, we have that xi−1 ≤ z, y ≤ xi implies that |f(z) − f(y)| < ε/(b − a). Since f is continuous, we also have that there are points zi and yi in [xi−1, xi] with Mi = f(zi) and mi = f(yi). Hence, 0 ≤ Mi − mi < ε/(b − a). This implies that

\[ 0 \le U(f, P) - L(f, P) = \sum_{i=1}^{n} (M_i - m_i)(x_i - x_{i-1}) < \sum_{i=1}^{n} \frac{\varepsilon}{b - a}(x_i - x_{i-1}) = \varepsilon. \]

Thus, the Riemann integrability criterion is met by f.

For the following problem, we consider the interval [0, 1] and let Pn = {0, 1/n, ..., (n − 1)/n, 1} denote the partition of [0, 1] into n subintervals each of length 1/n. Thus, ‖Pn‖ = 1/n. We will also use the summation formula

\[ \sum_{j=1}^{n} j = \frac{n(n+1)}{2}. \]

Problem 5.7. Let f(x) = x. Compute U(f, Pn) and L(f, Pn). Use these formulas and the Riemann integrability criterion to prove that f is Riemann integrable on [0, 1] and to prove that $\int_0^1 x\,dx = 1/2$.

Problem 5.8. Let a ≤ c < d ≤ b and let f : [a, b] → R be the function defined by

\[ f(x) = \begin{cases} 0 & a \le x \le c \\ 1 & c < x < d \\ 0 & d \le x \le b. \end{cases} \]

Prove that f is Riemann integrable on [a, b] and that $\int_a^b f(x)\,dx = d - c$.

5.1 The Riemann-Stieltjes Integral

The Riemann-Stieltjes integral is a slight generalization of the Riemann integral. The new ingredient in Riemann-Stieltjes integration is a function,

α : [a, b] → R

that is increasing, i.e., x ≤ y implies that α(x) ≤ α(y). It is best to think of α as a function that measures a new “length” of subintervals by setting the length of a subinterval [xi−1, xi] equal to α(xi) − α(xi−1). One case where this concept arises is if we imagine that we have a piece of wire of varying density stretched from a to b and α(xi) − α(xi−1) represents the weight of the section of wire from xi−1 to xi.

Given a bounded function f : [a, b] → R, the Riemann-Stieltjes integral is denoted

\[ \int_a^b f\,d\alpha, \]

and it is designed to also define a “signed area” under the graph of f, but now if we want the area of a rectangle to be the length of the base times the height, then a rectangle from xi−1 to xi of height h should have area

h(α(xi) − α(xi−1)).

Thus, given a bounded function f, an increasing function α, a partition P = {a = x0, ..., xn = b}, and the numbers Mi = sup{f(x) : xi−1 ≤ x ≤ xi} and mi = inf{f(x) : xi−1 ≤ x ≤ xi}, we are led to define the upper Riemann-Stieltjes sum as

\[ U(f, P, \alpha) = \sum_{i=1}^{n} M_i (\alpha(x_i) - \alpha(x_{i-1})) \]

and the lower Riemann-Stieltjes sum as

\[ L(f, P, \alpha) = \sum_{i=1}^{n} m_i (\alpha(x_i) - \alpha(x_{i-1})). \]

The upper Riemann-Stieltjes integral of f with respect to α is then defined to be

\[ \overline{\int_a^b} f\,d\alpha = \inf\{U(f, P, \alpha) : P \text{ is a partition of } [a, b]\}. \]

Similarly, the lower Riemann-Stieltjes integral of f with respect to α is defined to be

\[ \underline{\int_a^b} f\,d\alpha = \sup\{L(f, P, \alpha) : P \text{ is a partition of } [a, b]\}. \]

When

\[ \overline{\int_a^b} f\,d\alpha = \underline{\int_a^b} f\,d\alpha, \]

then we say that f is Riemann-Stieltjes integrable with respect to α and we let

\[ \int_a^b f\,d\alpha \]

denote this common value.

We repeat the key facts about Riemann-Stieltjes integration below. Since the proofs are almost identical to the corresponding proofs in the case of Riemann integration, we omit the details.

Proposition 5.9. Let f : [a, b] → R be a bounded function, let α : [a, b] → R be an increasing function and let P1 and P2 be partitions of [a, b] with P2 a refinement of P1. Then

L(f, P1, α) ≤ L(f, P2, α) ≤ U(f, P2, α) ≤ U(f, P1, α).

Proposition 5.10. Let f : [a, b] → R be a bounded function, let α : [a, b] → R be increasing and let P1 and P2 be any two partitions of [a, b]. Then L(f, P1, α) ≤ U(f, P2, α) and

\[ \underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha. \]

Theorem 5.11. Let f : [a, b] → R be a bounded function and let α : [a, b] → R be increasing. Then f is Riemann-Stieltjes integrable with respect to α if and only if for every ε > 0 there exists a partition P such that U(f, P, α) − L(f, P, α) < ε.

Definition 5.12. Given an increasing function α : [a, b] → R, we say that a bounded function f : [a, b] → R satisfies the Riemann-Stieltjes integrability criterion with respect to α provided that for every ε > 0, there exists a partition P such that U(f, P, α) − L(f, P, α) < ε.

Thus, the above theorem says that a bounded function is Riemann-Stieltjes integrable with respect to α if and only if it satisfies the Riemann-Stieltjes integrability criterion with respect to α.

Theorem 5.13. Let f : [a, b] → R be a continuous function and let α : [a, b] → R be an increasing function. Then f is Riemann-Stieltjes integrable with respect to α.

One of the main motivations for Riemann-Stieltjes integration comes from the concept of a cumulative distribution function of a random variable. To understand the Riemann-Stieltjes integral, one need not understand any probability theory, but we introduce these ideas from probability here, via an example, in order to motivate the desire for the Riemann-Stieltjes integral.

For a probability space, suppose that we consider the flip of a “biased” coin, so that the probability of heads (H) is p and the probability of tails (T) is 1 − p with 0 < p < 1. When p = 1/2, we think of the coin as a “fair” coin. A random variable would then be a function that assigned a real number to each of these possible outcomes. For example, we could define a random variable X by setting X(H) = 1, X(T) = 3. If we let Prob(X = x) denote the probability that X is equal to the real number x, then we would have Prob(X = 1) = p and Prob(X = 3) = 1 − p. The cumulative distribution function of X is the function Prob(X ≤ x). So in our case,

\[ \mathrm{Prob}(X \le x) = \begin{cases} 0 & x < 1 \\ p & 1 \le x < 3 \\ 1 & 3 \le x. \end{cases} \]

So our cumulative distribution function has two “jump” discontinuities, the first of size p occurring at 1 and the second of size 1 − p occurring at 3.

More generally, if one has a probability space, a random variable X, and we let α(x) = Prob(X ≤ x), then α will be an increasing function, i.e., one for which we can consider Riemann-Stieltjes integrals. Much work in probability and its applications, e.g., financial math, is concerned with computing these Riemann-Stieltjes integrals for various functions f in the case that α is the cumulative distribution function of a random variable.

Returning to our example, if for any c we let

\[ J_c(x) = \begin{cases} 0 & x < c \\ 1 & c \le x, \end{cases} \]

then we can also write $\mathrm{Prob}(X \le x) = pJ_1(x) + (1 - p)J_3(x)$.

These simple “jump” functions are useful for expressing the cumulative distributions of many random variables. For example, if we consider one roll of a “fair” die, so that each side has probability 1/6 of facing up and we let X be the random variable that simply gives the number that is facing up, then

\[ \mathrm{Prob}(X \le x) = \tfrac{1}{6}\,[J_1(x) + J_2(x) + J_3(x) + J_4(x) + J_5(x) + J_6(x)]. \]

For this reason we want to take a careful look at what Riemann-Stieltjes integration is when $\alpha(x) = hJ_c(x)$, for some real number c and h > 0.
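The way jump functions assemble into a cumulative distribution function can be made concrete with a few lines of code. This is a small sketch; the helper names `jump` and `step_cdf` and the specific value p = 1/3 are assumptions made for illustration only.

```python
from fractions import Fraction


def jump(c):
    """Return J_c, where J_c(x) = 0 for x < c and J_c(x) = 1 for c <= x."""
    return lambda x: 1 if x >= c else 0


def step_cdf(heights_and_jumps):
    """Return alpha(x) = sum of h_i * J_{c_i}(x) for pairs (h_i, c_i)."""
    def alpha(x):
        return sum(h * jump(c)(x) for h, c in heights_and_jumps)
    return alpha


# Biased coin with X(H) = 1, X(T) = 3; p = 1/3 is an arbitrary illustrative value.
p = Fraction(1, 3)
coin_cdf = step_cdf([(p, 1), (1 - p, 3)])

# Fair die: six jumps of size 1/6 at 1, 2, ..., 6.
die_cdf = step_cdf([(Fraction(1, 6), k) for k in range(1, 7)])

print(coin_cdf(0), coin_cdf(1), coin_cdf(2), coin_cdf(3))
print(die_cdf(Fraction(7, 2)), die_cdf(6))
```

Note that each jump function is 1 at its own jump point, which is what makes these distribution functions continuous from the right.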

Definition 5.14. Let f : [a, b] → R and let a < c ≤ b. If there is a number L so that for every ε > 0, there exists δ > 0 so that when c − δ < x < c, we have |f(x) − L| < ε, then we say that L is the limit from the left of f at c. We write $\lim_{x \to c^-} f(x) = L$ and say that f is continuous from the left at c provided that $\lim_{x \to c^-} f(x) = f(c)$. When a ≤ c < b, the limit from the right of f at c is defined similarly and is denoted $\lim_{x \to c^+} f(x)$. We say that f is continuous from the right at c provided that $\lim_{x \to c^+} f(x) = f(c)$.

Theorem 5.15. Let a < c ≤ b, let h > 0, let $\alpha(x) = hJ_c(x)$, and let f : [a, b] → R be a bounded function. Then f is Riemann-Stieltjes integrable with respect to α if and only if f is continuous from the left at c. In this case,

\[ \int_a^b f\,d\alpha = hf(c). \]

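Proof. First we assume that f is continuous from the left at c and prove that f satisfies the Riemann-Stieltjes integrability criterion. So let ε > 0 be given. Since $\lim_{x \to c^-} f(x) = f(c)$, there is δ > 0 so that c − δ < x < c implies that $|f(x) - f(c)| < \frac{\varepsilon}{3h}$; shrinking δ if necessary, we may assume that a < c − δ/2.

Now consider the partition P = {a = x0, x1 = c − δ/2, x2 = c, x3 = b} (when c = b, the last point is omitted). Then we have that

\[ U(f, P, \alpha) - L(f, P, \alpha) = \sum_{i=1}^{3} (M_i - m_i)(\alpha(x_i) - \alpha(x_{i-1})) = (M_2 - m_2)h, \]

since α(xi) − α(xi−1) = 0 when i ≠ 2. Now for x1 = c − δ/2 ≤ x ≤ c = x2, we have that $|f(x) - f(c)| < \frac{\varepsilon}{3h}$, so that $f(c) - \frac{\varepsilon}{3h} \le f(x) \le f(c) + \frac{\varepsilon}{3h}$. Hence, $f(c) - \frac{\varepsilon}{3h} \le m_2 \le M_2 \le f(c) + \frac{\varepsilon}{3h}$, which implies that $0 \le M_2 - m_2 \le \frac{2\varepsilon}{3h}$. Thus, $U(f, P, \alpha) - L(f, P, \alpha) = (M_2 - m_2)h \le \frac{2\varepsilon}{3} < \varepsilon$, and we have shown that the Riemann-Stieltjes integrability criterion is met.

Also, note that these inequalities show that

\[ \left( f(c) - \frac{\varepsilon}{3h} \right) h \le m_2 h = L(f, P, \alpha) \le \int_a^b f\,d\alpha \le U(f, P, \alpha) = M_2 h \le \left( f(c) + \frac{\varepsilon}{3h} \right) h, \]

which shows that $\left| f(c)h - \int_a^b f\,d\alpha \right| \le \frac{\varepsilon}{3} < \varepsilon$. Since ε > 0 was arbitrary, it follows that $\int_a^b f\,d\alpha = f(c)h$.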

Now we must prove that if f satisfies the Riemann-Stieltjes integrability criterion, then f is continuous from the left at c, i.e., $\lim_{x \to c^-} f(x) = f(c)$.

To this end fix ε > 0 and pick a partition P so that U(f, P, α) − L(f, P, α) < hε. Let P2 be the partition that we obtain from P by including the point c. Since P2 is a refinement of P, we will have that

L(f, P, α) ≤ L(f, P2, α) ≤ U(f, P2, α) ≤ U(f, P, α),

and hence, U(f, P2, α) − L(f, P2, α) ≤ U(f, P, α) − L(f, P, α) < hε. Now our partition has the form P2 = {a = x0 < ... < xj−1 < xj = c < ... < xn = b}, for some j. Since α(xi) − α(xi−1) = 0 for all i ≠ j, we have that (Mj − mj)h = U(f, P2, α) − L(f, P2, α) < hε and so Mj − mj < ε. Thus, for xj−1 ≤ x ≤ c, we have that mj ≤ f(c) ≤ Mj and mj ≤ f(x) ≤ Mj, so that |f(x) − f(c)| ≤ Mj − mj < ε.

Thus, if we let δ = c − xj−1, then for xj−1 = c − δ < x < c, we have that |f(x) − f(c)| < ε. Since ε > 0 was arbitrary, $\lim_{x \to c^-} f(x) = f(c)$.

In probability, two events are called independent if the probability of both occurring is the product of the probabilities of each occurring. For example, suppose that we flip a biased coin with Prob(H) = p, Prob(T) = 1 − p twice, so that the possible outcomes are {HH, HT, TH, TT}. If we assume that the flips are independent then the outcomes would have probabilities

\[ \begin{aligned} \mathrm{Prob}(HH) &= \mathrm{Prob}(H)\,\mathrm{Prob}(H) = p^2, \\ \mathrm{Prob}(HT) = \mathrm{Prob}(TH) &= \mathrm{Prob}(H)\,\mathrm{Prob}(T) = p(1 - p), \\ \mathrm{Prob}(TT) &= \mathrm{Prob}(T)\,\mathrm{Prob}(T) = (1 - p)^2. \end{aligned} \]

Problem 5.16. Suppose that we flip our biased coin twice, assume that the flips are independent and let our random variable be the sum of our earlier random variable over the two flips, so that X(HH) = X(H) + X(H) = 2, X(HT) = X(TH) = 1 + 3, X(TT) = 3 + 3. Find the cumulative distribution function, α(t) = Prob(X ≤ t).

Problem 5.17. Suppose that we flip a fair coin and roll a fair die and assume that these events are independent, so that each possible outcome has probability (1/2)(1/6) = 1/12. Let X be the random variable that adds +1 to the number on the die when a H is flipped and +3 to the number on the die when a T is flipped. Find the cumulative distribution function of X.

Problem 5.18. Let α : [a, b] → R be an increasing function and let f : [a, b] → R be a bounded function. Prove that if m = inf{f(x) : a ≤ x ≤ b} and M = sup{f(x) : a ≤ x ≤ b}, then

\[ m(\alpha(b) - \alpha(a)) \le \underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha \le M(\alpha(b) - \alpha(a)). \]

Problem 5.19. Write out the proof of Theorem 5.13. You may use theearlier stated results in your proof without proving them.


5.2 Properties of the Riemann-Stieltjes Integral

Before trying to compute many integrals, it will be helpful to have proven some basic properties of these integrals.

Theorem 5.20. Let α : [a, b] → R be an increasing function, let f, g : [a, b] → R be bounded functions that are Riemann-Stieltjes integrable with respect to α and let c ∈ R be a constant. Then:

1. cf is Riemann-Stieltjes integrable with respect to α and

\[ \int_a^b cf\,d\alpha = c \int_a^b f\,d\alpha, \]

2. f + g is Riemann-Stieltjes integrable with respect to α and

\[ \int_a^b (f + g)\,d\alpha = \int_a^b f\,d\alpha + \int_a^b g\,d\alpha, \]

3. fg is Riemann-Stieltjes integrable with respect to α.

Proof. To prove 1), first consider the case that c ≥ 0. Then we have that for any partition, L(cf, P, α) = cL(f, P, α) and U(cf, P, α) = cU(f, P, α). Thus,

\[ \begin{aligned} \underline{\int_a^b} cf\,d\alpha &= \sup\{L(cf, P, \alpha)\} = \sup\{cL(f, P, \alpha)\} = c \underline{\int_a^b} f\,d\alpha = c \int_a^b f\,d\alpha \\ &= c \overline{\int_a^b} f\,d\alpha = \inf\{cU(f, P, \alpha)\} = \inf\{U(cf, P, \alpha)\} = \overline{\int_a^b} cf\,d\alpha, \end{aligned} \]

which shows that the upper and lower integrals are equal for cf, so that cf is Riemann-Stieltjes integrable with respect to α, and shows that they are both equal to $c \int_a^b f\,d\alpha$. Thus, 1) is proven in the case that c ≥ 0.

When c < 0, recall that for any bounded set S of real numbers, sup{cs : s ∈ S} = c inf{s : s ∈ S} and inf{cs : s ∈ S} = c sup{s : s ∈ S}. Thus, for any interval, sup{cf(x) : xi−1 ≤ x ≤ xi} = c inf{f(x) : xi−1 ≤ x ≤ xi} and inf{cf(x) : xi−1 ≤ x ≤ xi} = c sup{f(x) : xi−1 ≤ x ≤ xi}. From this it follows that L(cf, P, α) = cU(f, P, α) and U(cf, P, α) = cL(f, P, α).

Thus,

\[ \begin{aligned} \underline{\int_a^b} cf\,d\alpha &= \sup\{L(cf, P, \alpha)\} = \sup\{cU(f, P, \alpha)\} = c \inf\{U(f, P, \alpha)\} \\ &= c \overline{\int_a^b} f\,d\alpha = c \int_a^b f\,d\alpha = c \underline{\int_a^b} f\,d\alpha \\ &= c \sup\{L(f, P, \alpha)\} = \inf\{cL(f, P, \alpha)\} = \inf\{U(cf, P, \alpha)\} = \overline{\int_a^b} cf\,d\alpha, \end{aligned} \]

and again we have that cf is Riemann-Stieltjes integrable with respect to α and the equality of the integrals.

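Now we prove 2). Note that for any interval, we have that

\[ \sup\{f(x) + g(x) : x_{i-1} \le x \le x_i\} \le \sup\{f(x) : x_{i-1} \le x \le x_i\} + \sup\{g(x) : x_{i-1} \le x \le x_i\}, \]

from which it follows that for any partition, U(f + g, P, α) ≤ U(f, P, α) + U(g, P, α). Similarly,

\[ \inf\{f(x) + g(x) : x_{i-1} \le x \le x_i\} \ge \inf\{f(x) : x_{i-1} \le x \le x_i\} + \inf\{g(x) : x_{i-1} \le x \le x_i\}, \]

and it follows that L(f + g, P, α) ≥ L(f, P, α) + L(g, P, α).

Given any ε > 0, choose partitions P1, P2 so that $\overline{\int_a^b} f\,d\alpha \le U(f, P_1, \alpha) < \overline{\int_a^b} f\,d\alpha + \varepsilon/2$ and $\overline{\int_a^b} g\,d\alpha \le U(g, P_2, \alpha) < \overline{\int_a^b} g\,d\alpha + \varepsilon/2$. If we let P3 = P1 ∪ P2 denote their common refinement, then

\[ \begin{aligned} \overline{\int_a^b} (f + g)\,d\alpha &\le U(f + g, P_3, \alpha) \le U(f, P_3, \alpha) + U(g, P_3, \alpha) \\ &\le U(f, P_1, \alpha) + U(g, P_2, \alpha) < \overline{\int_a^b} f\,d\alpha + \overline{\int_a^b} g\,d\alpha + \varepsilon. \end{aligned} \]

Since ε > 0 was arbitrary, we have that

\[ \overline{\int_a^b} (f + g)\,d\alpha \le \overline{\int_a^b} f\,d\alpha + \overline{\int_a^b} g\,d\alpha. \]

A similar calculation shows that

\[ \underline{\int_a^b} (f + g)\,d\alpha \ge \underline{\int_a^b} f\,d\alpha + \underline{\int_a^b} g\,d\alpha. \]

Since f and g are integrable, the upper bound and lower bound are both equal to $\int_a^b f\,d\alpha + \int_a^b g\,d\alpha$. As the lower integral of f + g is at most its upper integral, these inequalities must be equalities and 2) follows.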

Before proving 3) we first make a number of observations to reduce the work. First suppose that we only prove that if a function h is Riemann-Stieltjes integrable with respect to α, then $h^2$ is Riemann-Stieltjes integrable. Then, since f and g are both integrable, by part 2), f + g is integrable and hence we would have that $(f + g)^2 = f^2 + 2fg + g^2$, $f^2$ and $g^2$ are all integrable. Applying 1) and 2) again, we would have that $(f + g)^2 - f^2 - g^2 = 2fg$ is integrable and hence by 1), fg is integrable. Thus, it will be enough to prove that h integrable implies that $h^2$ is integrable.

Next, we claim that to show that squares of integrable functions are integrable, it will be enough to show that squares of non-negative integrable functions are integrable. To see this, given any h that is integrable, we know that h is bounded, so there is some constant c > 0 so that h + c ≥ 0. Since constants are continuous functions and continuous functions are integrable, h + c is integrable by 2). Thus, if we know that squares of non-negative integrable functions are integrable, then we will have that $(h + c)^2 = h^2 + 2ch + c^2$ is integrable. But $2ch + c^2$ is integrable by 1) and 2), so $h^2 = (h + c)^2 - 2ch - c^2$ is integrable.

Thus, it remains to show that if h ≥ 0 is Riemann-Stieltjes integrable with respect to α, then $h^2$ is also Riemann-Stieltjes integrable with respect to α. To this end we show that $h^2$ satisfies the Riemann-Stieltjes integrability criterion. So let ε > 0 be given and let K = sup{h(x) : a ≤ x ≤ b}. If h = 0, then the result is clearly true, so we may assume that h ≠ 0 and so K > 0.

Since h is Riemann-Stieltjes integrable, it satisfies the criterion and hence there exists a partition P so that U(h, P, α) − L(h, P, α) < ε/(2K). Let P = {a = x0, ..., xn = b}, let Mi = sup{h(x) : xi−1 ≤ x ≤ xi} and let mi = inf{h(x) : xi−1 ≤ x ≤ xi}. Since h(x) ≥ 0, we have that $\sup\{h(x)^2 : x_{i-1} \le x \le x_i\} = M_i^2$ and $\inf\{h(x)^2 : x_{i-1} \le x \le x_i\} = m_i^2$. Thus, we have that

\[ \begin{aligned} U(h^2, P, \alpha) - L(h^2, P, \alpha) &= \sum_{i=1}^{n} (M_i^2 - m_i^2)(\alpha(x_i) - \alpha(x_{i-1})) \\ &= \sum_{i=1}^{n} (M_i - m_i)(M_i + m_i)(\alpha(x_i) - \alpha(x_{i-1})) \\ &\le 2K \sum_{i=1}^{n} (M_i - m_i)(\alpha(x_i) - \alpha(x_{i-1})) \\ &= 2K\,(U(h, P, \alpha) - L(h, P, \alpha)) < \varepsilon, \end{aligned} \]

and we have shown that $h^2$ satisfies the Riemann-Stieltjes integrability criterion with respect to α.

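Proposition 5.21. Let $\alpha_1, \alpha_2 : [a,b] \to \mathbb{R}$ be increasing functions, let $c > 0$ be a constant and let $f : [a,b] \to \mathbb{R}$ be a bounded function. Then

\[
\overline{\int_a^b} f\,d(c\alpha_1 + \alpha_2) = c\,\overline{\int_a^b} f\,d\alpha_1 + \overline{\int_a^b} f\,d\alpha_2
\]

and

\[
\underline{\int_a^b} f\,d(c\alpha_1 + \alpha_2) = c\,\underline{\int_a^b} f\,d\alpha_1 + \underline{\int_a^b} f\,d\alpha_2.
\]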

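Proof. First note that for any partition $\mathcal{P} = \{a = x_0, \dots, x_n = b\}$, we have that

\[
U(f,\mathcal{P}, c\alpha_1 + \alpha_2) = \sum_{i=1}^n M_i\big[(c\alpha_1(x_i) + \alpha_2(x_i)) - (c\alpha_1(x_{i-1}) + \alpha_2(x_{i-1}))\big] = cU(f,\mathcal{P},\alpha_1) + U(f,\mathcal{P},\alpha_2).
\]

Thus,

\[
\overline{\int_a^b} f\,d(c\alpha_1 + \alpha_2) = \inf\{cU(f,\mathcal{P},\alpha_1) + U(f,\mathcal{P},\alpha_2)\} \ge \inf\{cU(f,\mathcal{P},\alpha_1)\} + \inf\{U(f,\mathcal{P},\alpha_2)\} = c\,\overline{\int_a^b} f\,d\alpha_1 + \overline{\int_a^b} f\,d\alpha_2.
\]

On the other hand, given any $\varepsilon > 0$, there is a partition $\mathcal{P}_1$ so that

\[
U(f,\mathcal{P}_1,\alpha_1) < \overline{\int_a^b} f\,d\alpha_1 + \frac{\varepsilon}{2c}
\]

and a partition $\mathcal{P}_2$ so that

\[
U(f,\mathcal{P}_2,\alpha_2) < \overline{\int_a^b} f\,d\alpha_2 + \frac{\varepsilon}{2}.
\]

Let $\mathcal{P}_3 = \mathcal{P}_1 \cup \mathcal{P}_2$. Since $\mathcal{P}_3$ refines both $\mathcal{P}_1$ and $\mathcal{P}_2$,

\[
\overline{\int_a^b} f\,d(c\alpha_1 + \alpha_2) \le U(f,\mathcal{P}_3, c\alpha_1 + \alpha_2) = cU(f,\mathcal{P}_3,\alpha_1) + U(f,\mathcal{P}_3,\alpha_2) \le cU(f,\mathcal{P}_1,\alpha_1) + U(f,\mathcal{P}_2,\alpha_2)
\]
\[
< c\Big[\overline{\int_a^b} f\,d\alpha_1 + \frac{\varepsilon}{2c}\Big] + \Big[\overline{\int_a^b} f\,d\alpha_2 + \frac{\varepsilon}{2}\Big] = c\,\overline{\int_a^b} f\,d\alpha_1 + \overline{\int_a^b} f\,d\alpha_2 + \varepsilon.
\]

Since $\varepsilon > 0$ was arbitrary, we have that

\[
\overline{\int_a^b} f\,d(c\alpha_1 + \alpha_2) \le c\,\overline{\int_a^b} f\,d\alpha_1 + \overline{\int_a^b} f\,d\alpha_2,
\]

and these two inequalities for the upper Riemann-Stieltjes integrals prove equality of these integrals. The proof for the lower Riemann-Stieltjes integrals is similar.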

Proposition 5.22. Let $\alpha_1, \alpha_2 : [a,b] \to \mathbb{R}$ be increasing, let $c > 0$ and let $f : [a,b] \to \mathbb{R}$ be a bounded function. If $f$ is Riemann-Stieltjes integrable with respect to $\alpha_1$ and $\alpha_2$, then $f$ is Riemann-Stieltjes integrable with respect to $c\alpha_1 + \alpha_2$ and

\[
\int_a^b f\,d(c\alpha_1 + \alpha_2) = c\int_a^b f\,d\alpha_1 + \int_a^b f\,d\alpha_2.
\]

Proof. The previous result shows that the upper and lower Riemann-Stieltjes integrals of $f$ with respect to $c\alpha_1 + \alpha_2$ are equal and are given by the above formula.

Theorem 5.23. Let $a < c_1 < \cdots < c_n \le b$, let $h_i > 0$, $i = 1, \dots, n$, and let $\alpha = h_1 J_{c_1} + \cdots + h_n J_{c_n}$. If $f : [a,b] \to \mathbb{R}$ is a bounded function that is continuous from the left at each $c_i$, $i = 1, \dots, n$, then $f$ is Riemann-Stieltjes integrable with respect to $\alpha$ and

\[
\int_a^b f\,d\alpha = h_1 f(c_1) + \cdots + h_n f(c_n).
\]

Proof. Apply Theorem 5.15 together with the above result.

Problem 5.24. Let $a < 1$ and $3 < b$ and let $\alpha$ be the cumulative distribution function for the random variable on page 69 for one flip of a biased coin. Compute $\mu = \int_a^b t\,d\alpha(t)$ and $\int_a^b (t - \mu)^2\,d\alpha(t)$.

These two values are known as the "mean" and "variance" of the random variable. The square root of the variance is called the "standard deviation".

Problem 5.25. Let $a < 2$ and $6 < b$. Compute the mean and variance of the random variable of Problem 5.16.

Problem 5.26. Let $a < 2$ and $8 < b$. Compute the mean and variance of the random variable of Problem 5.17.

Problem 5.27. Let $\alpha : [-1,3] \to \mathbb{R}$ be defined by

\[
\alpha(x) = \begin{cases} 0 & x < 0 \\ 1 & 0 \le x < 1 \\ 1 + x & 1 \le x < 2 \\ 7 & 2 \le x \le 3. \end{cases}
\]

Compute $\int_{-1}^{3} t\,d\alpha(t)$.

5.3 The Fundamental Theorem of Calculus

Before proceeding it will be useful to have a few more facts about theRiemann-Stieltjes integral.

Theorem 5.28 (Mean Value Theorem for Integrals). Let $\alpha : [a,b] \to \mathbb{R}$ be an increasing function and let $f : [a,b] \to \mathbb{R}$ be a continuous function. Then there exists $c$, $a \le c \le b$, such that

\[
\int_a^b f\,d\alpha = f(c)(\alpha(b) - \alpha(a)).
\]

Proof. When $\alpha(b) - \alpha(a) = 0$, both sides of the equation are 0 and any point $c$ will suffice. So assume that $\alpha(b) - \alpha(a) \ne 0$.

By Problem 5.18,

\[
m \le \frac{\int_a^b f\,d\alpha}{\alpha(b) - \alpha(a)} \le M,
\]

where $m = f(x_0)$ and $M = f(x_1)$ are the minimum and maximum of $f$ on $[a,b]$. Thus, by the intermediate value theorem, there is a point $c$ between $x_0$ and $x_1$ such that $f(c)$ is equal to this fraction.


Proposition 5.29. Let $a \le c < d \le b$, let $\alpha : [a,b] \to \mathbb{R}$ be an increasing function and let $f : [a,b] \to \mathbb{R}$ be a bounded function that is Riemann-Stieltjes integrable with respect to $\alpha$ on $[a,b]$. Then $f$ is Riemann-Stieltjes integrable with respect to $\alpha$ on $[c,d]$.

Proof. Given $\varepsilon > 0$ there is a partition $\mathcal{P}$ of $[a,b]$ with $U(f,\mathcal{P},\alpha) - L(f,\mathcal{P},\alpha) < \varepsilon$. Let $\mathcal{P}_1 = \mathcal{P} \cup \{c,d\}$; then $\mathcal{P}_1$ is a refinement of $\mathcal{P}$ and so $U(f,\mathcal{P}_1,\alpha) - L(f,\mathcal{P}_1,\alpha) \le U(f,\mathcal{P},\alpha) - L(f,\mathcal{P},\alpha) < \varepsilon$. Now if we only consider the points in $\mathcal{P}_1$ that lie in $[c,d]$, then that will be a partition of $[c,d]$, and the terms in the above sum corresponding to those subintervals will add up to something even smaller. Thus, we obtain a partition of $[c,d]$ that satisfies the condition in the Riemann-Stieltjes integrability criterion.

In the case that $\alpha(x) = x$ the following theorem reduces to the classic fundamental theorem of calculus.

Theorem 5.30 (Fundamental Theorem of Calculus). Let $\alpha : [a,b] \to \mathbb{R}$ be an increasing function that is differentiable on $(a,b)$ and let $f : [a,b] \to \mathbb{R}$ be continuous. If $F(x) = \int_a^x f(t)\,d\alpha(t)$, then $F$ is differentiable on $(a,b)$ and $F'(x) = f(x)\alpha'(x)$.

Proof. Let $a < x < x + h < b$; then by the mean value theorem, there is $c$, $x \le c \le x + h$, so that

\[
\frac{F(x+h) - F(x)}{h} = \frac{1}{h}\int_x^{x+h} f(t)\,d\alpha(t) = f(c)\,\frac{\alpha(x+h) - \alpha(x)}{h}.
\]

By choosing $h$ sufficiently small, $|f(c) - f(x)|$ and $\left|\frac{\alpha(x+h) - \alpha(x)}{h} - \alpha'(x)\right|$ can be made arbitrarily small. The case of $h < 0$ is similar.
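The statement $F'(x) = f(x)\alpha'(x)$ can be checked numerically. The following is only a rough sketch, not part of the notes: the choices $f(t) = t$, $\alpha(t) = t^2$, the midpoint tags, and the helper name `rs_sum` are all assumptions made for illustration.

```python
def rs_sum(f, alpha, a, b, n=2000):
    """Midpoint-tagged Riemann-Stieltjes sum of f d(alpha) over [a, b]."""
    total = 0.0
    for i in range(n):
        left = a + (b - a) * i / n
        right = a + (b - a) * (i + 1) / n
        tag = 0.5 * (left + right)
        total += f(tag) * (alpha(right) - alpha(left))
    return total

# Illustrative (assumed) data: f continuous, alpha increasing and differentiable on [0, 2]
f = lambda t: t
alpha = lambda t: t * t
alpha_prime = lambda t: 2 * t

x, h = 1.0, 1e-3
# F(x+h) - F(x) equals the integral of f d(alpha) over [x, x+h]
quotient = rs_sum(f, alpha, x, x + h) / h
predicted = f(x) * alpha_prime(x)
print(quotient, predicted)
```

The difference quotient should approach the predicted value as `h` shrinks, in line with the theorem; the sum is only an approximation of the integral, so agreement is up to a small error.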

Corollary 5.31. Let $\alpha : [a,b] \to \mathbb{R}$ be an increasing, continuous function that is differentiable on $(a,b)$ and let $f : [a,b] \to \mathbb{R}$ be continuous. If $F : [a,b] \to \mathbb{R}$ is a continuous function that is differentiable on $(a,b)$ and $F'(x) = f(x)\alpha'(x)$, then $\int_a^b f\,d\alpha = F(b) - F(a)$.

Proof. Let $G(x) = \int_a^x f\,d\alpha + F(a)$; then for $a < x < b$, we have that $G'(x) = F'(x)$ and $G(a) = F(a)$. Hence, $G(x) = F(x)$ for $a < x < b$. Since $|G(b) - G(x)| = \left|\int_x^b f\,d\alpha\right| \le K(\alpha(b) - \alpha(x))$, where $K = \sup\{|f(x)| : a \le x \le b\}$, and $\alpha$ is continuous, we see that $G(b) = \lim_{x \to b^-} G(x) = \lim_{x \to b^-} F(x) = F(b)$, since $F$ is continuous. Therefore, $F(b) = G(b) = \int_a^b f\,d\alpha + F(a)$ and the formula follows.

Problem 5.32. Let $\alpha : [0,2] \to \mathbb{R}$ be defined by

\[
\alpha(x) = \begin{cases} x & 0 \le x < 2 \\ 5 & x = 2 \end{cases}
\]

and let $f(x) = x^m$. Calculate $\int_0^2 f\,d\alpha$. Show that if we let $F$ be continuous on $[0,2]$ with $F'(x) = f(x)\alpha'(x)$ for $0 < x < 2$, then $F(2) - F(0) \ne \int_0^2 f\,d\alpha$.


Chapter 6

Solutions to Problems

1.12. Since $d(x,x) = 1 \ne 0$, $d$ is not a metric.

1.13. Since $d(+1,-1) = |(+1)^2 - (-1)^2| = 0$, $d$ is not a metric.

1.14. To see that $d(x,y) = |x - y| + |x^2 - y^2|$ is a metric: 1) holds since $d(x,y) \ge 0$. 2) holds since if $x = y$ then $d(x,y) = 0$, and if $d(x,y) = 0$ then $|x - y| = 0$ so $x = y$. Clearly, $d(x,y) = d(y,x)$. For 4) let $x, y, z \in \mathbb{R}$; then $d(x,y) = |(x - z) + (z - y)| + |(x^2 - z^2) + (z^2 - y^2)| \le |x - z| + |z - y| + |x^2 - z^2| + |z^2 - y^2| = d(x,z) + d(z,y)$.

1.15. The first three are straightforward, so we only do 4), the triangle inequality. Let $x = (a_1, \dots, a_n)$, $y = (b_1, \dots, b_n)$, $z = (c_1, \dots, c_n)$ be three points in $\mathbb{R}^n$; then

\[
d(x,y) = |(a_1 - c_1) + (c_1 - b_1)| + \cdots + |(a_n - c_n) + (c_n - b_n)| \le |a_1 - c_1| + |c_1 - b_1| + \cdots + |a_n - c_n| + |c_n - b_n| = d(x,z) + d(z,y).
\]

1.16. Again, I'll only do the triangle inequality. Let $x = (a_1, \dots, a_n)$, $y = (b_1, \dots, b_n)$, $z = (c_1, \dots, c_n)$ be three points in $\mathbb{R}^n$; then there is some integer $j$, $1 \le j \le n$, where the max occurs, so that $d_\infty(x,y) = |a_j - b_j|$. Hence, $d_\infty(x,y) = |(a_j - c_j) + (c_j - b_j)| \le |a_j - c_j| + |c_j - b_j|$. But $|a_j - c_j| \le \max\{|a_1 - c_1|, \dots, |a_n - c_n|\} = d_\infty(x,z)$, since it is one of the terms occurring in the maximum. Similarly, $|c_j - b_j| \le d_\infty(z,y)$. Combining these inequalities yields $d_\infty(x,y) \le d_\infty(x,z) + d_\infty(z,y)$.

1.33. Let $p = (a_1, a_2) \in O$; then $a_1 + a_2 > 0$. Set $r = (a_1 + a_2)/2$. We claim that $B(p; r) \subseteq O$, which will prove that $O$ is open. To see the claim, if $q = (b_1, b_2) \in B(p; r)$, then $(a_1 - b_1)^2 + (a_2 - b_2)^2 < r^2$. Hence, each term satisfies $(a_i - b_i)^2 < r^2$ and so $|a_i - b_i| < r$. This implies that $a_1 - r < b_1 < a_1 + r$ and $a_2 - r < b_2 < a_2 + r$. Adding together the left-hand side inequalities yields $a_1 + a_2 - 2r < b_1 + b_2$, but $a_1 + a_2 - 2r = 0$, so $0 < b_1 + b_2$, which implies that $q \in O$. Thus, $B(p; r) \subseteq O$.

1.34. Given $p = (a_1, a_2) \in O$, let $r = a_1 > 0$. We claim that $B(p; r) \subseteq O$. To see the claim, let $q = (b_1, b_2) \in B(p; r)$; then $|a_1 - b_1| \le \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2} < r$, and so $0 = a_1 - r < b_1 < a_1 + r$, and we have $b_1 > 0$. This proves that $q \in O$ and so $B(p; r) \subseteq O$.

1.35. Let $p = (a_1, a_2)$, let $B(p; r)$ denote the ball of radius $r$ centered at $p$ in the Euclidean metric, which is an actual disk, and let $B_\infty(p; r)$ denote the ball of radius $r$ centered at $p$ in the $d_\infty$ metric, which is actually the square $\{(b_1, b_2) : a_1 - r < b_1 < a_1 + r,\ a_2 - r < b_2 < a_2 + r\}$. Now check that $B_\infty(p; r/\sqrt{2}) \subseteq B(p; r)$. This can be seen by geometry or by verifying the inequalities.

Now to prove that $D$ is open in the $d_\infty$ metric, let $p \in D$. If we let $r = 1 - d(0,p)$, then $q \in B(p; r)$ implies that $d(0,q) \le d(0,p) + d(p,q) < d(0,p) + r = 1$. Hence, $q \in D$ and we've shown that $B(p; r) \subseteq D$. Thus, $B_\infty(p; r/\sqrt{2}) \subseteq B(p; r) \subseteq D$.

1.36. Let $g \in O$. Set $r = \min\{g(t) : 0 \le t \le 1\}$. By the theorems about continuous functions from Math 3333, we know that there is a point $t_0$, $0 \le t_0 \le 1$, so that $r = g(t_0)$. Hence, $r > 0$. We claim that $B(g; r) \subseteq O$. To see this let $f \in B(g; r)$; then $|g(t) - f(t)| < r$ for every $t$, and hence $0 = g(t_0) - r \le g(t) - r < f(t)$. This shows that $0 < f(t)$ for every $t$ and so $f \in O$. Thus, $B(g; r) \subseteq O$.

1.42. Since $d$ and $\rho$ are uniformly equivalent there are constants $A, B$ so that $\rho(x,y) \le A\,d(x,y)$ and $d(x,y) \le B\,\rho(x,y)$. Since $\rho$ and $\gamma$ are uniformly equivalent there are constants $C, D$ so that $\gamma(x,y) \le C\rho(x,y)$ and $\rho(x,y) \le D\gamma(x,y)$.

Hence, $\gamma(x,y) \le C\rho(x,y) \le CA\,d(x,y)$ and $d(x,y) \le B\rho(x,y) \le BD\,\gamma(x,y)$. Thus, $d$ and $\gamma$ are uniformly equivalent with constants $CA$ and $BD$.

1.43. Let $d$ denote the Euclidean metric. We will prove that $d$ and $d_1$ are uniformly equivalent and that $d$ and $d_\infty$ are uniformly equivalent. Then by 1.42 it will follow that $d_1$ and $d_\infty$ are uniformly equivalent. Let $x = (a_1, \dots, a_n)$ and $y = (b_1, \dots, b_n)$ be arbitrary points in $\mathbb{R}^n$.

Note that

\[
d_1(x,y)^2 = \sum_{i=1}^n \sum_{j=1}^n |a_i - b_i| \cdot |a_j - b_j| \ge \sum_{j=1}^n |a_j - b_j|^2 = d(x,y)^2,
\]

so that $d(x,y) \le d_1(x,y)$. On the other hand,

\[
d_1(x,y) = |a_1 - b_1| \cdot 1 + \cdots + |a_n - b_n| \cdot 1 \le \sqrt{|a_1 - b_1|^2 + \cdots + |a_n - b_n|^2}\,\sqrt{1^2 + \cdots + 1^2} = \sqrt{n}\,d(x,y),
\]

using the Schwarz inequality and the fact that there are $n$ 1's inside the square root.

Thus, $d$ and $d_1$ are uniformly equivalent.

Now we prove that $d$ and $d_\infty$ are uniformly equivalent. For each $j$ we have that $|a_j - b_j| \le d_\infty(x,y)$ because it is the maximum difference. Hence,

\[
d(x,y)^2 = \sum_{j=1}^n |a_j - b_j|^2 \le n\,d_\infty(x,y)^2,
\]

and so $d(x,y) \le \sqrt{n}\,d_\infty(x,y)$. On the other hand, for a given $j$,

\[
|a_j - b_j|^2 \le \sum_{i=1}^n |a_i - b_i|^2 = d(x,y)^2,
\]

which shows that for each $j$, $|a_j - b_j| \le d(x,y)$. Hence, the maximum such term, which is $d_\infty(x,y)$, is also at most $d(x,y)$, i.e., $d_\infty(x,y) \le d(x,y)$.
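The four inequalities in 1.43, namely $d \le d_1 \le \sqrt{n}\,d$ and $d_\infty \le d \le \sqrt{n}\,d_\infty$, can be spot-checked on random points. This is only a sanity check under assumed sample ranges, not a proof; the function names and the small floating-point tolerance are illustrative choices.

```python
import math
import random

def d1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))

def d2(x, y):
    # Euclidean metric, called d in the text
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def dinf(x, y):
    return max(abs(a - b) for a, b in zip(x, y))

def check_inequalities(n, trials=1000, seed=0):
    """Return True if all four inequalities hold on every sampled pair."""
    rng = random.Random(seed)
    tol = 1e-12  # allowance for floating-point rounding
    root_n = math.sqrt(n)
    for _ in range(trials):
        x = [rng.uniform(-10, 10) for _ in range(n)]
        y = [rng.uniform(-10, 10) for _ in range(n)]
        e, t, m = d2(x, y), d1(x, y), dinf(x, y)
        if not (e <= t + tol and t <= root_n * e + tol):
            return False
        if not (m <= e + tol and e <= root_n * m + tol):
            return False
    return True

print(check_inequalities(5))
```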

1.54. A slick proof.

\[
B^-_\infty((1,1); 1) = \{(a_1, a_2) : |a_1 - 1| \le 1,\ |a_2 - 1| \le 1\} = \{(a_1, a_2) : 0 \le a_1 \le 2,\ 0 \le a_2 \le 2\}
\]

is a closed set in the $d_\infty$ metric. Similarly,

\[
B^-_\infty((1,3); 1) = \{(a_1, a_2) : 0 \le a_1 \le 2,\ 2 \le a_2 \le 4\}
\]

is a closed set in the $d_\infty$ metric. Since the union of two closed sets is still a closed set,

\[
B^-_\infty((1,1); 1) \cup B^-_\infty((1,3); 1) = \{(a_1, a_2) : 0 \le a_1 \le 2,\ 0 \le a_2 \le 4\}
\]

is closed in the $d_\infty$ metric. Since $d_\infty$ and the Euclidean metric are uniformly equivalent, this set is also closed in the Euclidean metric.

1.65. Suppose that $\lim_n p_n = p$. Then given $\varepsilon > 0$ there is $N$ so that $n > N$ implies that $d(p, p_n) < \varepsilon$. So for this same value of $N$, $|d(p, p_n) - 0| = d(p, p_n) < \varepsilon$. Thus, $\lim_n d(p, p_n) = 0$.

Conversely, assume that $\lim_n d(p, p_n) = 0$. Given $\varepsilon > 0$ there is $N$ so that $n > N$ implies that $|d(p, p_n) - 0| < \varepsilon$. Hence for $n > N$, $d(p, p_n) = |d(p, p_n) - 0| < \varepsilon$.

1.66. We know that the set $C = \{x \in \mathbb{R} : x \ge 0\}$ is closed. Hence if $x_n \ge 0$ and $\lim_n x_n = x$, then $x \in C$ by Theorem 1.61 and so $x \ge 0$. (This was also proved in Math 3333 and you can just use this "fact" from there.)

Now let $p_n = (a_{1,n}, a_{2,n}) \in S$ and let $\lim_n p_n = p = (a_1, a_2)$. Then by Theorem 1.57, $\lim_n a_{1,n} = a_1$ and $\lim_n a_{2,n} = a_2$. Since $p_n \in S$ we know that $0 \le a_{2,n}$ and $0 \le a_{1,n} - a_{2,n}$. Hence, $a_2 = \lim_n a_{2,n} \ge 0$. Also $a_1 - a_2 = \lim_n (a_{1,n} - a_{2,n}) \ge 0$. Thus, $0 \le a_2 \le a_1$ and so $p \in S$.

We have shown that the limit of every convergent sequence in $S$ is also in $S$. Hence, $S$ is closed by Theorem 1.61.

1.67. $(0,1]$ is bounded but not closed, $\{x : 0 \le x\}$ is closed but not bounded, and $p_n = (-1)^n(1 - 1/n)$ is bounded but not convergent.

1.86. If $p \in \overline{S}$ then by Theorem 1.78, $B(p; 1/n) \cap S$ is non-empty, so pick a point $p_n$ in this intersection. Then $\{p_n\} \subseteq S$ and for any $\varepsilon > 0$, if we set $N = 1/\varepsilon$ then for $n > N$, we have that $d(p, p_n) < 1/n < 1/N = \varepsilon$. This proves that $\lim_n p_n = p$.

Conversely, if $\{p_n\} \subseteq S$ and $\lim_n p_n = p$, then for any $r > 0$, there is an $N$ so that $n > N$ implies that $d(p, p_n) < r$ and hence $p_n \in B(p; r) \cap S$. Thus, by Theorem 1.78, $p \in \overline{S}$.

1.87. We apply Corollary 1.79 twice. $p \in \partial S$ iff $p \in \overline{S}$ and $p \in \overline{S^c}$ iff there exists $\{p_n\} \subseteq S$ with $\lim_n p_n = p$ and there exists $\{q_n\} \subseteq S^c$ with $\lim_n q_n = p$.

1.88. Let $C = \{(x_1, x_2) : x_1^2 + x_2^2 \le 1\}$. Since $C = B^-((0,0); 1)$ we have that $C$ is closed and $S \subseteq C$. Hence, $\overline{S} \subseteq C$ since $\overline{S}$ is contained in every closed set containing $S$. Let $r_n = 1 - 1/n$. If $p \in C$, then $p_n = r_n p \in S$ and $\lim_n r_n p = p$, by Theorem 1.69.2. Hence, by Corollary 1.79, $p \in \overline{S}$. Thus, $C \subseteq \overline{S}$ and the two sets are equal.

Look at $S^c = \{(x_1, x_2) : x_1^2 + x_2^2 \ge 1\}$. This is a closed set since its complement is open. Thus, $\overline{S^c} = S^c$.

Now $\partial S = \overline{S} \cap \overline{S^c} = C \cap S^c = \{(x_1, x_2) : x_1^2 + x_2^2 = 1\}$.

1.90. Let $p$ be a cluster point of $S$. Since there are infinitely many points of $S$ in $B(p; 1)$, we can pick a point $p_1 \in B(p; 1) \cap S$, $p_1 \ne p$. Now let $r_1 = \min\{1/2, d(p, p_1)\}$; again since there are infinitely many points in $B(p; r_1) \cap S$ we can pick a point $p_2 \in B(p; r_1) \cap S$, $p_2 \ne p$. Since $d(p, p_2) < d(p, p_1)$ we have that $p_2 \ne p_1$. Now set $r_2 = \min\{1/3, d(p, p_2)\}$ and pick a point $p_3 \in B(p; r_2) \cap S$, $p_3 \ne p$. Since $d(p, p_3) < d(p, p_i)$, $i = 1, 2$, we get that $p_3 \ne p_i$, $i = 1, 2$. Continuing this way we get a sequence of distinct points $\{p_n\} \subseteq S$ with $d(p, p_n) \le 1/n$. Hence, $\lim_n d(p, p_n) = 0$, and so $\lim_n p_n = p$.

1.91. Since every set is open and closed, $\mathrm{int}(S) = S$ and $\overline{S} = S$. But then $\partial S = \overline{S} \cap \overline{S^c} = S \cap S^c = \emptyset$.

1.102. Since $d$ and $\rho$ are uniformly equivalent there are constants $A, B$ so that $\rho(x,y) \le A\,d(x,y)$ and $d(x,y) \le B\,\rho(x,y)$.

Assume that $(X, d)$ is complete and let $\{p_n\}$ be a Cauchy sequence in $(X, \rho)$. Given $\varepsilon > 0$, there exists $N$ so that for $n, m > N$, we have $\rho(p_n, p_m) < \varepsilon/B$. This implies that $d(p_n, p_m) \le B\rho(p_n, p_m) < \varepsilon$. Hence, $\{p_n\}$ is Cauchy in $(X, d)$. Since $(X, d)$ is complete there is a point $p$ so that $\lim_n p_n = p$ in the $d$-metric. By Proposition 1.58, $\{p_n\}$ also converges to $p$ in the $\rho$ metric. Thus, $(X, \rho)$ is complete.

The converse implication follows by reversing the roles of $d$ and $\rho$.

1.121. Fix any point $p_0 \in X$. The balls $\{B(p_0; n)\}_{n \in \mathbb{N}}$ are an open cover of $K$, so there is a finite subset of them that covers $K$, say $K \subseteq B(p_0; n_1) \cup \cdots \cup B(p_0; n_m)$. Let $r = \max\{n_1, \dots, n_m\}$; then each $B(p_0; n_i) \subseteq B(p_0; r)$ and so $K \subseteq B(p_0; r)$. But this is one of our equivalent definitions of what bounded means.

1.122. Let $K = K_1 \cup \cdots \cup K_n$. If $\{U_\alpha\}_{\alpha \in A}$ is an open cover of $K$, then it is also an open cover of each $K_i$. Since $K_i$ is compact, there is a finite subset of the index set, $F_i \subseteq A$, such that $K_i \subseteq \cup_{\alpha \in F_i} U_\alpha$. Now let $F = F_1 \cup \cdots \cup F_n$, which is still a finite set, and notice that $K \subseteq \cup_{\alpha \in F} U_\alpha$.

3.13. To see that $f(x) = x^3$ is continuous at $x_0$, given $\varepsilon > 0$, let $\delta = \min\left\{1, \frac{\varepsilon}{3|x_0|^2 + 3|x_0| + 1}\right\}$. Then for $|x - x_0| < \delta$ we have $|x| < |x_0| + 1$, and so

\[
|x^3 - x_0^3| = |x - x_0|\,|x^2 + x x_0 + x_0^2| \le |x - x_0|\big((|x_0| + 1)^2 + (|x_0| + 1)|x_0| + |x_0|^2\big) < \delta(3|x_0|^2 + 3|x_0| + 1) \le \varepsilon.
\]

Since $f$ is continuous at every $x_0$, $f$ is continuous.

3.14. For $\varepsilon = 1$, we will show that there is no $\delta$; that is, for every positive $\delta$ we show that there are $x, y$ with $|x - y| < \delta$ but $|f(x) - f(y)| \ge 1$. To this end let $x = \sqrt{2/\delta}$ and let $y = x + \delta/2$. Then $|x - y| = \delta/2 < \delta$, but $|f(y) - f(x)| = |y^3 - x^3| = |y - x|\,|y^2 + yx + x^2| \ge (\delta/2)(x^2) = 1 \ge \varepsilon$.

3.15. Given $\varepsilon > 0$, let $\delta = \frac{\varepsilon}{3M^2}$. Then whenever $x, y \in [-M, +M]$ and $|x - y| < \delta$, we have that $|f(x) - f(y)| = |x - y|\,|x^2 + xy + y^2| \le |x - y|(M^2 + M^2 + M^2) < 3M^2\delta = \varepsilon$.
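The contrast between 3.14 and 3.15 can be seen numerically. The sketch below is illustrative only: the sample values of $\delta$, $M$ and $\varepsilon$, and the helper names `witness` and `gap`, are assumptions. It evaluates the witnesses $x = \sqrt{2/\delta}$, $y = x + \delta/2$ from 3.14, and then tests the choice $\delta = \varepsilon/(3M^2)$ on a grid in $[-M, M]$.

```python
import math

def witness(delta):
    """Return the pair (x, y) used in the solution to 3.14."""
    x = math.sqrt(2 / delta)
    return x, x + delta / 2

def gap(x, y):
    # |y^3 - x^3| via the factorization, to limit floating-point cancellation
    return abs(y - x) * abs(y * y + y * x + x * x)

# On all of R: the gap stays >= 1 no matter how small delta is
gaps = [gap(*witness(d)) for d in (1.0, 1e-1, 1e-2, 1e-3, 1e-4)]
print(gaps)

# On [-M, M]: a single delta works for every x (sample values are assumptions)
M, eps = 2.0, 0.1
delta = eps / (3 * M * M)
grid = [-M + k * (2 * M) / 400 for k in range(400)]
worst = max(gap(x, min(x + 0.999 * delta, M)) for x in grid)
print(worst, eps)
```

The grid test is of course not a proof of uniform continuity; it only illustrates that the bound $3M^2\delta$ is not exceeded at the sampled points.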

3.16. Given $\varepsilon > 0$ let $\delta = \varepsilon/K$. Then whenever $d(p,q) < \delta$, we have that $\rho(f(p), f(q)) \le K d(p,q) < K\delta = \varepsilon$.

3.17. Assume that $f$ is continuous at $p_0$; then given $\varepsilon > 0$, there exists $\delta > 0$ so that $d(p, p_0) < \delta$ implies that $\rho(f(p), f(p_0)) < \varepsilon$. If $p \in B_d(p_0; \delta)$, then $d(p, p_0) < \delta$, hence $\rho(f(p), f(p_0)) < \varepsilon$, but this implies that $f(p) \in B_\rho(f(p_0); \varepsilon)$. Thus, we have the set inclusion $f(B_d(p_0; \delta)) \subseteq B_\rho(f(p_0); \varepsilon)$.

Conversely, given $\varepsilon > 0$, let $\delta > 0$ be the number that gives the set inclusion; then $d(p, p_0) < \delta$ implies that $\rho(f(p), f(p_0)) < \varepsilon$. So $f$ is continuous at $p_0$.

3.26. We do two cases, $x_0 = 0$ and $x_0 \ne 0$. To see that $g$ is continuous at 0, given $\varepsilon > 0$, let $\delta = \varepsilon^2$. Then when $|x - 0| < \delta$, we have that $|g(x) - g(0)| = \sqrt{x} < \sqrt{\delta} = \varepsilon$.

To see that $g$ is continuous at $x_0 \ne 0$, given $\varepsilon > 0$, set $\delta = \varepsilon\sqrt{x_0}$. Then when $|x - x_0| < \delta$, we have that

\[
|g(x) - g(x_0)| = |\sqrt{x} - \sqrt{x_0}| = \frac{|x - x_0|}{\sqrt{x} + \sqrt{x_0}} < \frac{\delta}{\sqrt{x_0}} = \varepsilon.
\]

3.27. $g$ is uniformly continuous on $[0, +\infty)$. Given $\varepsilon > 0$, let $\delta = \varepsilon^2$. Now let $|y - x| < \delta$. Without loss of generality, we may assume that $y$ is the larger of the two numbers.

Case 1: $y < \varepsilon^2$. In this case, $|\sqrt{y} - \sqrt{x}| \le \sqrt{y} < \sqrt{\varepsilon^2} = \varepsilon$.

Case 2: $y \ge \varepsilon^2$. In this case $|\sqrt{y} - \sqrt{x}| = \frac{|y - x|}{\sqrt{y} + \sqrt{x}} < \frac{\delta}{\sqrt{y}} \le \frac{\delta}{\varepsilon} = \varepsilon$.

3.34. Assume that $f$ is continuous at $p_0$. Then given $\varepsilon > 0$, there exists $\delta > 0$ so that whenever $d(p_0, p) < \delta$, we have $\rho(f(p_0), f(p)) < \varepsilon$. For this arbitrary $\varepsilon$ this $\delta$ satisfies the conditions needed to show that $\lim_{p \to p_0} f(p) = f(p_0)$.

Conversely, assume that $\lim_{p \to p_0} f(p) = f(p_0)$. Then given $\varepsilon > 0$, we have a $\delta > 0$ so that when $d(p_0, p) < \delta$ and $p \ne p_0$, then $\rho(f(p), f(p_0)) < \varepsilon$. Since when $p = p_0$, $\rho(f(p), f(p_0)) = 0$, we have that $d(p_0, p) < \delta$ implies that $\rho(f(p_0), f(p)) < \varepsilon$. This shows that the $\varepsilon$-$\delta$ definition of continuity is met at $p_0$.

3.35. Using Lemma 3.32, we have that $f$ is continuous iff $f$ is continuous at $p_0$ for every $p_0$ iff $f$ is continuous at $p_0$ for every cluster point $p_0$. Applying Theorem 3.31, we have that $f$ is continuous at $p_0$ for every cluster point $p_0$ iff for every cluster point $p_0$, $\lim_{p \to p_0} f(p) = f(p_0)$.

3.36. First assume $\lim_{p \to p_0} f(p) = q_0$, and that $\{p_n\}$ is a sequence with $\lim_n p_n = p_0$ and $p_n \ne p_0$. Given $\varepsilon > 0$, there is $\delta > 0$ so that if $d(p_0, p) < \delta$ and $p \ne p_0$, then $\rho(f(p), q_0) < \varepsilon$. Now since $\lim_n p_n = p_0$, we may pick $N$ so that $n > N$ implies that $d(p_0, p_n) < \delta$. Since $p_n \ne p_0$, we have that for $n > N$, $\rho(f(p_n), q_0) < \varepsilon$.

Conversely, assume that $\lim_{p \to p_0} f(p) \ne q_0$. Then there is an $\varepsilon > 0$ for which we can find no $\delta$. Taking $\delta = 1/n$ means that we have a point $p_n \ne p_0$ with $d(p_0, p_n) < 1/n$ and $\rho(f(p_n), q_0) \ge \varepsilon$. This sequence satisfies $\lim_n p_n = p_0$, $p_n \ne p_0$ and $\lim_n f(p_n) \ne q_0$.

4.3. To show that $f([1,2]) \subseteq [1,2]$, do an absolute max-min problem. Look at $f'(x) = 1/2 - 1/x^2 = 0$ when $x = \sqrt{2}$. Calculating, $f(\sqrt{2}) = \sqrt{2}$, $f(1) = 3/2$, $f(2) = 3/2$. Thus the abs max is $3/2 < 2$ and the abs min is $\sqrt{2} > 1$. Thus, $f([1,2]) \subseteq [\sqrt{2}, 3/2] \subseteq [1,2]$.

For $1 \le x \le 2$, we see that $f'(x)$ is increasing, hence $-1/2 = f'(1) \le f'(x) \le f'(2) = 1/4$. Thus, for $1 \le x \le 2$, we have that $|f'(x)| \le 1/2$. Hence by the mean value theorem, $|f(x_2) - f(x_1)| = |f'(c)|\,|x_2 - x_1| \le \frac{1}{2}|x_2 - x_1|$ and $f$ is a contraction mapping on the set $[1,2]$ with $r = 1/2$.

So $f$ has a unique fixed point; since $f(\sqrt{2}) = \sqrt{2}$, that is the fixed point. Using $x_1 = 3/2$ we have that $x_2 = 17/12$, so $|x_2 - x_1| = 1/12$. Hence,

\[
|\sqrt{2} - x_n| \le \frac{|x_2 - x_1|(1/2)^{n-1}}{1 - 1/2} = \frac{1}{12}(1/2)^{n-2}.
\]
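As a quick numerical illustration of 4.3 (a sketch, not part of the original solution), one can iterate $f(x) = x/2 + 1/x$ from $x_1 = 3/2$ and compare the actual error with the a priori bound $|x_2 - x_1|\,r^{n-1}/(1-r)$, where $r = 1/2$ and $|x_2 - x_1| = |17/12 - 3/2| = 1/12$. The number of iterates is an arbitrary choice.

```python
import math

def f(x):
    return x / 2 + 1 / x

r = 0.5
xs = [1.5]                      # x_1 = 3/2
for _ in range(7):
    xs.append(f(xs[-1]))        # x_{n+1} = f(x_n)

step = abs(xs[1] - xs[0])       # |x_2 - x_1|
for n, xn in enumerate(xs, start=1):
    error = abs(math.sqrt(2) - xn)
    bound = step * r ** (n - 1) / (1 - r)
    print(n, xn, error, bound)
```

The actual error falls far below the bound after a few steps, since the contraction estimate only guarantees linear convergence while this particular iteration happens to converge much faster.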

4.5. For $f(x) = x^2 - 5$, we have $g(x) = x - \frac{x^2 - 5}{2x} = \frac{x}{2} + \frac{5}{2x}$. We have that $g(\sqrt{5}) = \sqrt{5}$. Again we do abs max-min on $[2,3]$. We find that $g'(x) = 0$ only when $x = \sqrt{5}$, and $g(\sqrt{5}) = \sqrt{5}$. Endpoints: $g(2) = 9/4$, $g(3) = 7/3$. So the abs max is $7/3$ and the abs min is $\sqrt{5}$. Hence, $g([2,3]) \subseteq [\sqrt{5}, 7/3] \subseteq [2,3]$.

Since $g'(x) = 1/2 - \frac{5}{2x^2}$ we see that $g'$ is an increasing function with $g'(x) \le 1/2$. Also, $g'(2) = -1/8$, so we have that $|g'(x)| \le 1/2$ for $2 \le x \le 3$. This is where I got the $1/2$ from that appears in the solution.

However, if you are more careful and, as in the other problem, do the max-min of $g'$ over $[2,3]$, you will find an improved estimate: for $2 \le x \le 3$, we have $-1/8 \le g'(x) \le 2/9$. Hence, $|g'(x)| \le 2/9 \le 1/2$.

Thus, $g$ is a contraction mapping on $[2,3]$ with unique fixed point $\sqrt{5}$. Using $x_1 = 2$, we find that $x_2 = 9/4$. We may use either $r = 2/9$ (for a better estimate) or $r = 1/2$. With $r = 2/9$ we get

\[
|\sqrt{5} - x_n| \le \frac{|x_2 - x_1|(2/9)^{n-1}}{1 - 2/9} = \frac{1}{4} \cdot \frac{9}{7}\,(2/9)^{n-1},
\]

or using $r = 1/2$, that

\[
|\sqrt{5} - x_n| \le \frac{|x_2 - x_1|(1/2)^{n-1}}{1 - 1/2} = (1/2)^n.
\]

4.6. Here, after simplifying, $g(x) = \frac{x\sin(x) + \cos(x)}{1 + \sin(x)}$ and $g'(x) = \frac{\cos(x)(x - \cos(x))}{(1 + \sin(x))^2}$.

We first need to find an interval $[a,b]$ with $g([a,b]) \subseteq [a,b]$. Then to show $g$ is a contraction on that interval, using the MVT, it will be enough to find a bound on $|g'|$ that is less than some $r < 1$.

We know that $g(x_0) = x_0$ when $\cos(x_0) = x_0$. On the interval $[0, \pi/2]$ we see that $g'(t) \le 0$ for $0 \le t \le x_0$ and $g'(t) \ge 0$ when $x_0 \le t \le \pi/2$.

So for any interval of the form $[a,b]$ with $0 \le a < x_0 < b \le \pi/2$ we will have that the minimum of $g$ is $g(x_0) = x_0$ and the maximum will be at the endpoints and so will be $L = \max\{g(a), g(b)\}$. Thus, we will have that $g([a,b]) \subseteq [x_0, L] \subseteq [a,b]$ as long as we have $L \le b$.

If we try $a = 0$, $b = \pi/2$, then $L = \max\{g(0), g(\pi/2)\} = \max\{1, \pi/4\} = 1 \le b$. So this interval works and $g([0, \pi/2]) \subseteq [x_0, 1]$.

If we want to take a smaller interval, which is always better, since the upper bound is 1, we could take $b = 1$ and $g([0,1]) \subseteq [x_0, 1]$.

Because $|g'(0)| = 1$, to get a contraction mapping we want to take $a > 0$, but some value that we can guarantee is still smaller than $x_0$. A little guesstimation shows that $x_0$ is around $3/4$, so to put it in the middle of our interval try $a = 1/2$. Our formulas for $g$ and $g'$ involve cosines and sines, and we can't compute these very accurately at $1/2$, but we can at $\pi/6$, which is roughly $1/2$.

So I picked an interval, $[a,b] = [\pi/6, 1]$. This is really my "scrap work". Now to prove things for this interval. First, we know that $x - \cos(x)$ is increasing on $[0, \pi/2]$, and at $\pi/6$, $\pi/6 - \cos(\pi/6) = \pi/6 - \sqrt{3}/2 < 0$. This implies that $\pi/6 < x_0$. Also on this interval $|x - \cos(x)| \le 1$, hence $|g'(x)| \le \frac{\cos(x)}{(1 + \sin(x))^2}$, which is decreasing on $[\pi/6, 1]$. Therefore,

\[
|g'(x)| \le \frac{\cos(\pi/6)}{(1 + \sin(\pi/6))^2} = \frac{\sqrt{3}/2}{(1 + 1/2)^2} = \frac{2}{3\sqrt{3}}.
\]

Therefore, $g : [\pi/6, 1] \to [\pi/6, 1]$ is a contraction mapping with $r = \frac{2}{3\sqrt{3}}$. This value is about .385. In the earlier section, when we tried to find the solution to $\cos(x_0) = x_0$ directly using only the contraction mapping theorem and iterating the function $\cos(x)$ (instead of $g(x)$), we had that $r = \sin(1)$, which is about .841. Thus, this Newton's method of finding $x_0$ should converge a whole lot faster.
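To see the difference in speed concretely, here is a small sketch comparing the Newton map $g$ with plain iteration of $\cos$. The starting point $1.0$, the stopping tolerance, and the helper name `iterations_to_converge` are assumptions chosen for illustration, not taken from the notes.

```python
import math

def g(x):
    # Newton map for x - cos(x) = 0, in the simplified form from the solution
    return (x * math.sin(x) + math.cos(x)) / (1 + math.sin(x))

def iterations_to_converge(step, x, tol=1e-10, max_iter=1000):
    """Count iterations until successive iterates differ by less than tol."""
    for k in range(1, max_iter + 1):
        nxt = step(x)
        if abs(nxt - x) < tol:
            return k, nxt
        x = nxt
    return max_iter, x

newton_steps, x_newton = iterations_to_converge(g, 1.0)
cos_steps, x_cos = iterations_to_converge(math.cos, 1.0)
print(newton_steps, cos_steps, x_newton)

# The two contraction constants quoted in the solution
print(2 / (3 * math.sqrt(3)), math.sin(1))
```

Both iterations approach the same fixed point; the Newton map needs far fewer steps, consistent with its smaller contraction constant.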

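4.9. For this problem our map $\Phi : C([0, \pi/2]) \to C([0, \pi/2])$ is given by

\[
\Phi(f)(x) = \int_0^x \frac{\sin(t f(t))}{5}\,dt + 1.
\]

Applying the MVT to $h(x,y) = \sin(xy)/5$,

\[
|h(x, y_1) - h(x, y_2)| = \frac{1}{5}|\sin(x y_1) - \sin(x y_2)| = \frac{|\cos(c)|}{5}\,|x y_1 - x y_2|
\]

for some $c$ between $x y_1$ and $x y_2$, so we get that $|h(x, y_1) - h(x, y_2)| \le \frac{1}{5}|x y_1 - x y_2| = \frac{|x|}{5}|y_1 - y_2|$. Because $0 \le x \le \pi/2$, we get $|h(x, y_1) - h(x, y_2)| \le \frac{\pi}{10}|y_1 - y_2|$. So $K = \pi/10$, and applying the theorem, $r = K(\pi/2 - 0) = \frac{\pi^2}{20}$.

Thus, by our theorem $\Phi$ is a contraction mapping with $r = \frac{\pi^2}{20}$. If we start with $f_1(x) = 1$, i.e., the constant function, then

\[
f_2(x) = \Phi(f_1)(x) = \int_0^x \frac{1}{5}\sin(t \cdot 1)\,dt + 1 = \frac{6 - \cos(x)}{5}.
\]

We have that $d(f_1, f_2) = \sup\left\{\left|\frac{6 - \cos(x)}{5} - 1\right| : 0 \le x \le \pi/2\right\} = 1/5$. Thus, if $f$ denotes the solution of our IVP, then

\[
d(f, f_n) \le \frac{d(f_1, f_2)(\pi^2/20)^{n-1}}{1 - \pi^2/20} = \frac{4}{20 - \pi^2}\left(\frac{\pi^2}{20}\right)^{n-1}.
\]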

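5.7. Since the function $x$ is increasing, the maximum value occurs at the right-hand endpoint and the minimum value occurs at the left-hand endpoint. For the partition $\mathcal{P}_n$ this means that $M_j = \frac{j}{n}$ and $m_j = \frac{j-1}{n}$.

Thus,

\[
U(f, \mathcal{P}_n) = \sum_{j=1}^n \frac{j}{n} \cdot \frac{1}{n} = \frac{1}{n^2} \cdot \frac{n(n+1)}{2} = \frac{n+1}{2n} \to \frac{1}{2}
\]

and

\[
L(f, \mathcal{P}_n) = \sum_{j=1}^n \frac{j-1}{n} \cdot \frac{1}{n} = \frac{1}{n^2} \sum_{j=0}^{n-1} j = \frac{1}{n^2} \cdot \frac{(n-1)n}{2} = \frac{n-1}{2n} \to \frac{1}{2}.
\]

This shows that

\[
\frac{1}{2} \le \underline{\int_0^1} x\,dx \le \overline{\int_0^1} x\,dx \le \frac{1}{2}.
\]

Hence the integral exists and $\int_0^1 x\,dx = \frac{1}{2}$.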
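The closed forms $U(f, \mathcal{P}_n) = \frac{n+1}{2n}$ and $L(f, \mathcal{P}_n) = \frac{n-1}{2n}$ from 5.7 can be confirmed with exact rational arithmetic. This is a minimal sketch assuming $\mathcal{P}_n$ is the uniform partition of $[0,1]$ into $n$ subintervals; the function name `upper_lower` is illustrative.

```python
from fractions import Fraction

def upper_lower(n):
    """Exact upper and lower sums of f(x) = x on the uniform partition of [0, 1]."""
    width = Fraction(1, n)
    upper = sum(Fraction(j, n) * width for j in range(1, n + 1))      # M_j = j/n
    lower = sum(Fraction(j - 1, n) * width for j in range(1, n + 1))  # m_j = (j-1)/n
    return upper, lower

for n in (1, 10, 100, 1000):
    U, L = upper_lower(n)
    print(n, U, L, U - L)
```

The gap `U - L` equals $1/n$, which is why both sums squeeze the integral to $1/2$.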

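5.8. Consider the partition $\mathcal{P} = \{a, c, c + \varepsilon, d - \varepsilon, d, b\}$. For this partition we have that $M_1 = m_1 = 0$, $M_2 = 1$, $m_2 = 0$, $M_3 = m_3 = 1$, $M_4 = 1$, $m_4 = 0$, $M_5 = m_5 = 0$. Thus, we have that $U(f, \mathcal{P}) = M_1(c - a) + M_2(c + \varepsilon - c) + M_3((d - \varepsilon) - (c + \varepsilon)) + M_4(d - (d - \varepsilon)) + M_5(b - d) = \varepsilon + d - c - 2\varepsilon + \varepsilon = d - c$, and $L(f, \mathcal{P}) = m_1(c - a) + m_2(\varepsilon) + m_3(d - c - 2\varepsilon) + m_4(\varepsilon) + m_5(b - d) = d - c - 2\varepsilon$. This shows that

\[
d - c - 2\varepsilon \le \underline{\int_a^b} f\,dx \le \overline{\int_a^b} f\,dx \le d - c.
\]

Since $\varepsilon$ was arbitrary, $\underline{\int_a^b} f\,dx = \overline{\int_a^b} f\,dx = d - c$. Thus, $f$ is Riemann integrable and the integral is $d - c$.

5.16. We have that

\[
\alpha(x) = \begin{cases} 0 & x < 2 \\ p^2 & 2 \le x < 4 \\ 2p - p^2 & 4 \le x < 6 \\ 1 & 6 \le x \end{cases}
\;=\; p^2 J_2(x) + 2p(1-p) J_4(x) + (1-p)^2 J_6(x).
\]

5.17. We have that

\[
\alpha(x) = \begin{cases} 0 & x < 2 \\ 1/12 & 2 \le x < 3 \\ 2/12 & 3 \le x < 4 \\ 4/12 & 4 \le x < 5 \\ 6/12 & 5 \le x < 6 \\ 8/12 & 6 \le x < 7 \\ 10/12 & 7 \le x < 8 \\ 11/12 & 8 \le x < 9 \\ 1 & 9 \le x \end{cases}
\]

or $\frac{1}{12}[J_2(x) + J_3(x) + J_8(x) + J_9(x)] + \frac{2}{12}[J_4(x) + J_5(x) + J_6(x) + J_7(x)]$.

5.18. Let your partition be $\mathcal{P} = \{a, b\}$; then $L(f, \mathcal{P}, \alpha) = m(\alpha(b) - \alpha(a))$ and $U(f, \mathcal{P}, \alpha) = M(\alpha(b) - \alpha(a))$. Since we always have that $L(f, \mathcal{P}, \alpha) \le \underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha \le U(f, \mathcal{P}, \alpha)$, the result follows.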

5.19. When $\alpha(b) = \alpha(a)$, then for any function $L(f, \mathcal{P}, \alpha) = U(f, \mathcal{P}, \alpha) = 0$ for every partition and so any $f$ is Riemann-Stieltjes integrable. So we assume that $\alpha(b) > \alpha(a)$. Given $\varepsilon > 0$, since $f$ is uniformly continuous there is a $\delta > 0$ so that when $|x - y| < \delta$, then $|f(x) - f(y)| < \frac{\varepsilon}{3(\alpha(b) - \alpha(a))}$. Now choose any partition $\mathcal{P} = \{a = x_0, \dots, x_n = b\}$ with the property that $|x_j - x_{j-1}| < \delta$. Then we will have that $M_j - f(x_j) \le \frac{\varepsilon}{3(\alpha(b) - \alpha(a))}$ and $f(x_j) - m_j \le \frac{\varepsilon}{3(\alpha(b) - \alpha(a))}$. From this it follows that $M_j - m_j \le \frac{2\varepsilon}{3(\alpha(b) - \alpha(a))}$. Hence,

\[
U(f, \mathcal{P}, \alpha) - L(f, \mathcal{P}, \alpha) = \sum_{j=1}^n (M_j - m_j)(\alpha(x_j) - \alpha(x_{j-1})) \le \sum_{j=1}^n \frac{2\varepsilon}{3(\alpha(b) - \alpha(a))}(\alpha(x_j) - \alpha(x_{j-1})) = \frac{2\varepsilon}{3} < \varepsilon.
\]

Thus, we have shown that $f$ satisfies the Riemann-Stieltjes integrability condition and so by Theorem 5.11, $f$ is Riemann-Stieltjes integrable.

5.24. We have that $\alpha(x) = pJ_1(x) + (1-p)J_3(x)$, so for continuous $f(x)$, we have that $\int_a^b f\,d\alpha = pf(1) + (1-p)f(3)$. Thus for $f(x) = x$, $\int_a^b x\,d\alpha = p \cdot 1 + (1-p) \cdot 3 = 3 - 2p = \mu$, and for $f(x) = (x - \mu)^2$, $\int_a^b f\,d\alpha = p(1 - \mu)^2 + (1-p)(3 - \mu)^2 = 4p - 4p^2$.

5.25. We have that $\alpha(x) = p^2 J_2(x) + 2p(1-p) J_4(x) + (1-p)^2 J_6(x)$. Hence, $\mu = p^2(2) + 2p(1-p)(4) + (1-p)^2(6) = 6 - 4p$ and the variance is $p^2(4 - 4p)^2 + 2p(1-p)(2 - 4p)^2 + (1-p)^2(4p)^2 = 8p - 8p^2$.

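5.26. We have $\alpha(x) = \frac{1}{12}J_2(x) + \frac{1}{12}J_3(x) + \frac{2}{12}J_4(x) + \frac{2}{12}J_5(x) + \frac{2}{12}J_6(x) + \frac{2}{12}J_7(x) + \frac{1}{12}J_8(x) + \frac{1}{12}J_9(x)$.

Hence, $\mu = 2/12 + 3/12 + 8/12 + 10/12 + 12/12 + 14/12 + 8/12 + 9/12 = 66/12 = 11/2$.

The variance is $\frac{1}{12}(2 - 11/2)^2 + \frac{1}{12}(3 - 11/2)^2 + \frac{2}{12}(4 - 11/2)^2 + \frac{2}{12}(5 - 11/2)^2 + \frac{2}{12}(6 - 11/2)^2 + \frac{2}{12}(7 - 11/2)^2 + \frac{1}{12}(8 - 11/2)^2 + \frac{1}{12}(9 - 11/2)^2 = 47/12$.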
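The arithmetic in 5.24 through 5.26 is easy to get wrong by hand, so here is a short exact check. It uses the formula of Theorem 5.23, $\int f\,d\alpha = \sum_i h_i f(c_i)$, for an integrator that is a sum of jumps; the sample value $p = 1/3$ and the function name `mean_and_variance` are assumptions for illustration.

```python
from fractions import Fraction

def mean_and_variance(jumps):
    """jumps maps each jump point c_i to its jump size h_i (sizes sum to 1)."""
    mean = sum(h * c for c, h in jumps.items())
    variance = sum(h * (c - mean) ** 2 for c, h in jumps.items())
    return mean, variance

# Problems 5.24 and 5.25 with the illustrative value p = 1/3
p = Fraction(1, 3)
one_flip = {1: p, 3: 1 - p}
two_flips = {2: p * p, 4: 2 * p * (1 - p), 6: (1 - p) ** 2}
print(mean_and_variance(one_flip), mean_and_variance(two_flips))

# Problem 5.26
t = Fraction(1, 12)
dist = {2: t, 3: t, 4: 2 * t, 5: 2 * t, 6: 2 * t, 7: 2 * t, 8: t, 9: t}
print(mean_and_variance(dist))
```

For any other value of `p` the first two results should agree with the closed forms $3 - 2p$, $4p - 4p^2$ and $6 - 4p$, $8p - 8p^2$ derived above.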

5.27. Note that $\alpha$ jumps by 1 at 0, by 1 at 1 and by 4 at 2, so we write $\alpha(x) = J_0(x) + J_1(x) + 4J_2(x) + \beta(x)$, and see that

\[
\beta(x) = \begin{cases} 0 & x < 1 \\ x - 1 & 1 \le x < 2 \\ 1 & 2 \le x. \end{cases}
\]

Note that this $\beta$ has no jumps and is continuous. Combining 5.22 and 5.23 we see that $\int_{-1}^{3} f\,d\alpha = f(0) + f(1) + 4f(2) + \int_{-1}^{3} f\,d\beta$. For any partition, since $\beta(1) - \beta(-1) = 0$ and $\beta(3) - \beta(2) = 0$, we see that the sums along the partition only get contributions for points in the partition between 1 and 2. Hence, $\int_{-1}^{3} f\,d\beta = \int_1^2 f\,d\beta$. Now for any partition $\mathcal{P} = \{1 = x_0, \dots, x_n = 2\}$ we have that $\beta(x_j) - \beta(x_{j-1}) = x_j - x_{j-1}$. Thus, we have that $U(f, \mathcal{P}, \beta) = U(f, \mathcal{P})$ and $L(f, \mathcal{P}, \beta) = L(f, \mathcal{P})$. Thus, the integral $d\beta$ is the ordinary $dx$ integral. So $\int_1^2 f\,d\beta = \int_1^2 f\,dx$. Applying all of this to the case $f(x) = x$ yields

\[
\int_{-1}^{3} f\,d\alpha = 0 + 1 + 4 \cdot 2 + \int_1^2 x\,dx = 21/2.
\]
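The value $21/2$ can be cross-checked directly from the definition, without splitting $\alpha$ into jumps plus a continuous part. The sketch below approximates $\int_{-1}^{3} t\,d\alpha(t)$ by a right-endpoint Riemann-Stieltjes sum on a uniform partition; the partition size and the tagging convention are assumptions, and the result is only an approximation.

```python
def alpha(x):
    # The integrator from Problem 5.27
    if x < 0:
        return 0.0
    if x < 1:
        return 1.0
    if x < 2:
        return 1.0 + x
    return 7.0

def rs_sum(f, a, b, n):
    """Right-endpoint Riemann-Stieltjes sum of f d(alpha) on a uniform partition."""
    total = 0.0
    prev = a
    for i in range(1, n + 1):
        cur = a + (b - a) * i / n
        total += f(cur) * (alpha(cur) - alpha(prev))
        prev = cur
    return total

approx = rs_sum(lambda t: t, -1.0, 3.0, 4000)
print(approx)
```

Each jump of $\alpha$ contributes roughly (jump size) times (value of $f$ near the jump), and the stretch between 1 and 2 contributes an ordinary Riemann sum, mirroring the decomposition used in the solution.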
