
Boot Camp: Real Analysis Lecture Notes

Lectures by Itay Neeman
Notes by Alexander Wertheim

August 23, 2016

Introduction

Lecture notes from the real analysis class of Summer 2015 Boot Camp, delivered by Professor Itay Neeman. Any errors are my fault, not Professor Neeman's. Corrections are welcome; please send them to [firstinitial][lastname]@math.ucla.edu.

Contents

1 Week 1
  1.1 Lecture 1 - Construction of the Real Line
  1.2 Lecture 2 - Uniqueness of R and Basic General Topology
  1.3 Lecture 3 - More on Compactness and the Baire Category Theorem
  1.4 Lecture 4 - Completeness and Sequential Compactness

2 Week 2
  2.1 Lecture 5 - Convergence of Sums and Some Exam Problems
  2.2 Lecture 6 - Some More Exam Problems and Continuity
  2.3 Lecture 7 - Path-Connectedness, Lipschitz Functions and Contractions, and Fixed Point Theorems
  2.4 Lecture 8 - Uniformity, Normed Spaces and Sequences of Functions

3 Week 3
  3.1 Lecture 9 - Arzela-Ascoli, Differentiation and Associated Rules
  3.2 Lecture 10 - Applications of Differentiation: Mean Value Theorem, Rolle's Theorem, L'Hopital's Rule and Lagrange Interpolation
  3.3 Lecture 11 - The Riemann Integral (I)
  3.4 Lecture 12 - The Riemann Integral (II)

4 Week 4
  4.1 Lecture 13 - Limits of Integrals, Mean Value Theorem for Integrals, and Integral Inequalities
  4.2 Lecture 14 - Power Series (I), Taylor Series, and Abel's Lemma/Theorem
  4.3 Lecture 15 - Stone-Weierstrass and Taylor Series Error Approximation
  4.4 Lecture 16 - Power Series (II), Fubini's Theorem, and exp(x)

5 Week 5
  5.1 Lecture 17 - Some Special Functions and Differentiation in Several Variables
  5.2 Lecture 18 - Inverse Function Theorem, Implicit Function Theorem and Lagrange Multipliers
  5.3 Lecture 19 - Multivariable Integration and Vector Calculus


1 Week 1

As per the syllabus, Week 1 topics include: cardinality, the real line, completeness, topology, connectedness, compactness, metric spaces, sequences, and convergence.

1.1 Lecture 1 - Construction of the Real Line

Today's main goal will be the construction of the real numbers. We will take the construction of N, Z, and Q for granted.

Let's start with a fact: the rationals form a dense linear order with no endpoints. Unpacked, this means:

(i) Dense: For all x and y, there exists z such that x < z < y

(ii) Linear: For all x and y, either x < y or x = y or y < x

(iii) No endpoints: For all x, there exists y such that y < x; for all x, there exists y such that y > x

It turns out that every countable dense linear order with no endpoints is isomorphic to (Q; <). We will come back to this result after a brief discussion of cardinality.

Cardinality:

Definition 1.1.1. Two sets A and B are equinumerous (written A ∼= B) if there is a bijection f : A → B.

Note that ∼= determines an equivalence relation:

(i) ∼= is reflexive (take the identity)

(ii) ∼= is symmetric (if f : A→ B is a bijection, then f−1 : B → A is also a bijection)

(iii) ∼= is transitive (if f : A → B and g : B → C are bijections, then g ◦ f : A → C is a bijection)

Definition 1.1.2. A set x is finite if there exists n ∈ N such that x ∼= {0, 1, . . . , n − 1}.

Definition 1.1.3. A set x is infinite if x is not finite.

We write A ≼ B if there exists an injection f : A → B.

Theorem 1.1.4 (Cantor-Schroeder-Bernstein). If A ≼ B and B ≼ A, then A ∼= B.

The proof of CSB is beyond the scope of this lecture, so we omit it here. Using CSB, wecan prove several useful facts.

Corollary 1.1.5 (Pigeonhole Principle). For all n,m ∈ N, if n < m, then {0, 1, . . . ,m − 1} ⋠ {0, 1, . . . , n − 1}; that is, there is no injection from {0, 1, . . . ,m − 1} into {0, 1, . . . , n − 1}.


Proof. By the uniqueness of the cardinality of finite sets, {0, 1, . . . , n − 1} and {0, 1, . . . ,m − 1} are not equinumerous, so by CSB, we must have {0, 1, . . . , n − 1} ⋠ {0, 1, . . . ,m − 1} or {0, 1, . . . ,m − 1} ⋠ {0, 1, . . . , n − 1}. It must be the latter, since inclusion is clearly an injection from {0, 1, . . . , n − 1} to {0, 1, . . . ,m − 1}.

Note that one doesn't really need the strength of CSB to prove the Pigeonhole Principle; a direct argument can be made. We have the following (nearly) immediate corollary.

Corollary 1.1.6. For finite A, B, if A ⊊ B, then A and B are not equinumerous.

Example 1.1.7. Note that the above corollary fails for infinite sets. Indeed, N and N \ {0} are equinumerous via the map n 7→ n + 1.

Now we will talk a bit about countable sets.

Definition 1.1.8. A set A is countable if A is equinumerous with N, i.e. A ∼= N.

Example 1.1.9. Here are some familiar faces which are countable:

(i) N is countable; just take the identity map.

(ii) Z is also countable. One can biject N with Z as follows:

0, 1,  2, 3,  4, . . .
0, 1, −1, 2, −2, . . .

(iii) More generally, if A and B are countable sets, then A ∪B is also countable.
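The zig-zag enumeration of Z in (ii) can be written down explicitly. A quick sketch (the function name is ours, not from the lecture):

```python
def n_to_z(n):
    """Zig-zag bijection N -> Z: odd n to the positives, even n to the non-positives."""
    return (n + 1) // 2 if n % 2 == 1 else -(n // 2)

# First few values match the correspondence displayed above.
assert [n_to_z(n) for n in range(5)] == [0, 1, -1, 2, -2]
# Bijectivity onto a symmetric window of Z:
assert {n_to_z(n) for n in range(2001)} == set(range(-1000, 1001))
```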

Claim 1.1.10. N× N is countable.

Proof. Proof 1: There is an obvious injection from N to N × N given by n 7→ (0, n). On the other hand, (n,m) 7→ 2^n · 3^m is an injection from N × N to N by the fundamental theorem of arithmetic, so by CSB, N ∼= N × N.
Proof 2: Picture the elements of N × N as a square lattice, e.g. by identifying N × N with the corresponding set of points in the Cartesian plane. Starting at the first element of this square (the origin, as it were), and for the kth element on the bottom row, count up k elements and over k − 1 elements to the left. That is, we define a bijection f : N → N × N so that f(k² + 1), . . . , f((k + 1)²) lists the elements of the (k + 1) × (k + 1) square minus the elements of the k × k square contained inside it.
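Both proofs of Claim 1.1.10 are explicit enough to compute with. A sketch of the injection from Proof 1, with a finite injectivity check (the function name is ours):

```python
def encode(n, m):
    """The injection N x N -> N from Proof 1: (n, m) -> 2^n * 3^m.
    Injective by the fundamental theorem of arithmetic."""
    return 2 ** n * 3 ** m

# Distinct pairs receive distinct codes on a finite grid.
codes = {encode(n, m) for n in range(40) for m in range(40)}
assert len(codes) == 40 * 40
```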

Corollary 1.1.11. If A and B are countable, then so is A×B.

Proof. Fix bijections fA : N → A and fB : N → B. Fix a bijection n 7→ (h1(n), h2(n)) from N to N × N. Then n 7→ (fA(h1(n)), fB(h2(n))) is a bijection from N to A × B.

Corollary 1.1.12. Q is countable.

Proof. There is an injection from N to Q given by inclusion. Also, there is an injection from Q to Z × Z by mapping each element p/q ∈ Q (in lowest terms) to (p, q) ∈ Z × Z. By the previous corollary, since Z is countable, Z × Z is countable, so there is an injection from Z × Z to N, whence composing injections, we obtain an injection from Q to N. Applying CSB, we're done.
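Chaining the injections in this proof gives a computable injection from Q into N. A sketch with exact rationals (helper names are ours; Python's `Fraction` automatically stores p/q in lowest terms):

```python
from fractions import Fraction

def z_to_n(z):
    """Injection Z -> N: non-negatives to evens, negatives to odds."""
    return 2 * z if z >= 0 else -2 * z - 1

def q_to_n(q):
    """Injection Q -> N: lowest terms gives Q -> Z x Z, then Z -> N on each
    coordinate, then pair via 2^a * 3^b as in the previous claim."""
    return 2 ** z_to_n(q.numerator) * 3 ** z_to_n(q.denominator)

# Injectivity check on a finite grid of rationals.
sample = {Fraction(p, q) for p in range(-20, 21) for q in range(1, 21)}
assert len({q_to_n(x) for x in sample}) == len(sample)
```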


Now we may return to our claim stated earlier.

Theorem 1.1.13. Every countable dense linear order with no endpoints is isomorphic to (Q; <).

Proof. Fix a countable dense linear order (L;<L) with no endpoints. Let f : N → L and g : N → Q be bijections; this is possible since L and Q are both countable. Our strategy will be as follows: by induction on n, we will construct sequences an ∈ L, bn ∈ Q with the following properties.

(1) an ≤L am if and only if bn ≤ bm, for all n,m ∈ N (this guarantees injectivity, since bn ≤ bm and bm ≤ bn implies an ≤L am and am ≤L an; the map is similarly well-defined)

(2) {a0, a1, . . .} = L

(3) {b0, b1, . . .} = Q

The map an 7→ bn will then be our isomorphism. One can see that (1) is very nearly all we need. It guarantees that our map between L and Q is injective, well-defined, and respects the order on each set. Condition (2) guarantees that our map is defined on all of L, and condition (3) guarantees that our map is surjective.

To define an, bn we work inductively. Suppose (inductively) that a0, . . . , an−1, b0, . . . , bn−1 have been defined, and satisfy (1).
Case 1: If n is even, set an = f(n/2) (this covers (2)!). Then, pick bn ∈ Q such that for all m < n, am ≤L an if and only if bm ≤ bn, and an ≤L am if and only if bn ≤ bm. One can do this because Q is a dense linear order and has no endpoints: we are picking bn so that we respect the order of the elements chosen so far, i.e. so as to preserve (1).
Case 2: If n is odd, set bn = g((n − 1)/2) (this covers (3)!). Pick an ∈ L to preserve (1); this is again possible since L is a dense linear order and has no endpoints.
Note then that {a0, a2, a4, . . .} = L and {b1, b3, b5, . . .} = Q, so conditions (2) and (3) are met.
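The back-and-forth construction in this proof is completely algorithmic. Below is a sketch in Python in which both orders are copies of Q under two different enumerations of our choosing (the function names and the enumerations are ours, not from the notes); running finitely many stages produces a partial order-isomorphism, and the even/odd alternation is exactly what makes every element of both sides eventually appear.

```python
from fractions import Fraction
from itertools import count

def rationals():
    """An enumeration of Q (ours, for illustration): 0, then ±p/q by increasing p+q."""
    yield Fraction(0)
    for s in count(2):
        for p in range(1, s):
            f = Fraction(p, s - p)
            if f.denominator == s - p:  # p/(s-p) already in lowest terms: not listed before
                yield f
                yield -f

def same_position(x, placed_x, y, placed_y):
    """y relates to each placed partner exactly as x does -- condition (1)."""
    return all((x < a) == (y < b) and (a < x) == (b < y)
               for a, b in zip(placed_x, placed_y))

def back_and_forth(steps):
    enums = [rationals(), (q + Fraction(1, 3) for q in rationals())]
    lists = [[], []]     # materialized prefixes of the two enumerations
    placed = [[], []]    # placed[0][i] <-> placed[1][i] are matched pairs

    def elem(side, i):
        while len(lists[side]) <= i:
            lists[side].append(next(enums[side]))
        return lists[side][i]

    for n in range(steps):
        s = n % 2        # alternate sides: this is what covers both orders
        i = 0            # take the first not-yet-placed element on side s ...
        while elem(s, i) in placed[s]:
            i += 1
        x = elem(s, i)
        j = 0            # ... and scan the other side for an order-compatible partner;
        while elem(1 - s, j) in placed[1 - s] or \
              not same_position(x, placed[s], elem(1 - s, j), placed[1 - s]):
            j += 1       # density and lack of endpoints guarantee this search ends
        placed[s].append(x)
        placed[1 - s].append(elem(1 - s, j))
    return placed

xs, ys = back_and_forth(40)
# the matched pairs form a partial order-isomorphism
assert all((xs[i] < xs[j]) == (ys[i] < ys[j]) for i in range(40) for j in range(40))
```

The inner search for a partner mirrors "pick bn to preserve (1)": density guarantees the scan terminates, though no finite run can, of course, exhibit the full isomorphism.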

Construction of the real line (R):

We start with a familiar fact. The motivation for why we would like to do calculus on R and not Q is that Q has natural 'gaps' which R (as we will see) does not.

Proposition 1.1.14. There is no q ∈ Q such that q² = 2.

While Q has gaps, we may often approximate real numbers to arbitrary accuracy.

Proposition 1.1.15. For every ε > 0 in Q, there exists q ∈ Q such that q² < 2 < (q + ε)².

Proof. Suppose that there exists ε > 0 in Q such that for all q ∈ Q, our claim is false. Taking q = 0, we find ε² ≤ 2, and by the above proposition, since ε ∈ Q, the inequality is strict. Further, if (nε)² < 2 for any n ≥ 1 in N, then taking q = nε, we find (q + ε)² ≤ 2, i.e. ((n + 1)ε)² ≤ 2. Again, since (n + 1)ε is rational, we must have strict inequality, i.e. ((n + 1)ε)² < 2. Hence, by induction, (nε)² < 2 for all n ∈ N. This is impossible: take, for example, the smallest n greater than the rational number 2/ε.
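The induction in this proof is effectively an algorithm: start at q = 0 and step by ε until the square overshoots 2. A sketch in exact arithmetic (the function name is ours):

```python
from fractions import Fraction

def sandwich(eps):
    """Find rational q with q^2 < 2 < (q + eps)^2 by stepping as in the proof."""
    q = Fraction(0)
    while (q + eps) ** 2 < 2:   # equality is impossible by Proposition 1.1.14
        q += eps
    return q

q = sandwich(Fraction(1, 1000))
assert q ** 2 < 2 < (q + Fraction(1, 1000)) ** 2
```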


Now we will construct the real numbers, using equivalence classes of bounded, strictly increasing sequences of rational numbers, which we will naturally identify with their suprema.

Definition 1.1.16. A sequence (an)∞n=0 is strictly increasing if for all n,m ∈ N, n < m =⇒ an < am. We say (an)∞n=0 is bounded (in Q) if there exists c ∈ Q such that for all n ∈ N, an < c.

Let E be the set of all strictly increasing bounded sequences of rationals. For (an), (bn) ∈ E, set (an) ≈ (bn) if and only if ∀n ∈ N, ∃m ∈ N such that bm > an, and ∀n ∈ N, ∃m ∈ N such that am > bn. In colloquial terms, the sequences (an) and (bn) are interleaved.

Proposition 1.1.17. ≈ is an equivalence relation on E.

Proof. It is straightforward to verify the three necessary conditions.

(1) (an) ≈ (an), since (an) is strictly increasing

(2) ≈ is symmetric by the requirement for equivalence

(3) ≈ is also transitive by a (layered) application of the requirement for equivalence

For (an) ∈ E, let [an] be the equivalence class of (an), which formally is the set {(bn) | (bn) ≈ (an)}. By moving to equivalence classes, we have [an] = [bn] if and only if (an) ≈ (bn), i.e. we translate equivalence to equality. Let E∗ be the set of equivalence classes. Define < on E∗ by setting [an] < [bn] if ∃k ∈ N such that ∀n ∈ N, an < bk. Informally, [an] < [bn] if the terms of (bn) eventually bound the terms of (an).
There are two things to check here, namely that < is well-defined, and that < is a linear order on E∗.
Well-defined: Suppose (an) ≈ (a′n) and (bn) ≈ (b′n), and suppose there exists k ∈ N such that ∀n ∈ N, an < bk. Then take l such that bk < b′l. For every n ∈ N, we have m ∈ N such that a′n < am, so a′n < am < bk < b′l. Hence < is well-defined.
Linear order on E∗: This is precisely what we have rigged in our definition of the equivalence relation on E. That is, if [an] ≮ [bn] and [bn] ≮ [an], then (an) and (bn) are interleaved, so (an) ≈ (bn), i.e. [an] = [bn].
Now, there is the matter of identifying the rationals in E∗. The map p 7→ [(p − 1/n)∞n=1] embeds Q into E∗ (that is, it is an order-preserving injection). From now on, we will identify [(p − 1/n)∞n=1] with p for p ∈ Q. Replacing E∗ with an isomorphic copy, we have Q ⊆ E∗; call this isomorphic copy R, the real line.
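No finite computation can verify ≈ (it quantifies over all indices), but the interleaving criterion can be illustrated on initial segments. A sketch, checking the first few terms of each sequence against a longer prefix of the other (the sequences, horizon, and function name are our choices):

```python
from fractions import Fraction

def interleaved_prefix(a, b, k=5):
    """Finite-horizon illustration of ≈: each of the first k terms of either
    sequence is exceeded by some listed term of the other."""
    return all(any(t > s for t in b) for s in a[:k]) and \
           all(any(t > s for t in a) for s in b[:k])

a = [Fraction(2) - Fraction(1, n) for n in range(1, 51)]       # supremum 2
b = [Fraction(2) - Fraction(1, 2) ** n for n in range(1, 51)]  # supremum 2
c = [Fraction(3) - Fraction(1, n) for n in range(1, 51)]       # supremum 3

assert interleaved_prefix(a, b)       # same supremum: interleaved
assert not interleaved_prefix(a, c)   # c eventually outruns every term of a
```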

Proposition 1.1.18. Q is dense in R. This means that for all x, y ∈ R with x < y, there exists z ∈ Q such that x < z < y.

Proof. Say x = [an], y = [bn]. Since x < y, there exists k ∈ N such that an < bk for all n ∈ N. Take z = bk+1 ∈ Q. Then z < y: every term of the sequence representing z is less than bk+1, hence less than the term bk+2 of (bn). We also have z > x: by the archimedean property of the rationals, there exists m ∈ N such that bk < bk+1 − 1/m, and since an < bk for all n, the term bk+1 − 1/m of the sequence representing z bounds every an.


Corollary 1.1.19. R is a dense linear ordering.

Proof. This is clear, since Q ⊆ R!

Proposition 1.1.20. R has no endpoints.

1.2 Lecture 2 - Uniqueness of R and Basic General Topology

Today, we will talk about the properties which characterize R, as well as some general topology.

Definition 1.2.1. An order (L;<) is Dedekind complete if

(i) Every nonempty A ⊆ L which is bounded above has a supremum (i.e., a <-least upper bound)

(ii) Every nonempty A ⊆ L which is bounded below has an infimum (i.e., a <-greatest lower bound)

Proposition 1.2.2. R is Dedekind complete.

Proof. Let A ⊆ R be nonempty and bounded above. Let f : N → Q be onto, i.e. enumerate the rationals. Put An = {f(i) | i ≤ n, ∃x ∈ A such that f(i) ≤ x}, and let (an) be the sequence defined by an = max An. Clearly, an ≤ an+1, since An ⊆ An+1. Colloquially, we are building (an) to be a nondecreasing sequence of rationals, each at most some element of A, with the goal of showing that [an] is the sup of A. We break into two cases:
Case 1: Suppose (an)∞n=0 is eventually constant, say an = p ∈ Q for all n ≥ k for some k ∈ N. One can check that p is a sup for A.
Case 2: Otherwise, we can thin (an)∞n=0 to a subsequence (ank)∞k=1 which is strictly increasing. One can check that [ank] is a sup for A.
The argument for infima is symmetric.

Example 1.2.3. Q is not Dedekind complete. Let A = {q ∈ Q | q² < 2}. Then A has no least upper bound in Q. Let z = sup A, computed in R; we will see later that z² = 2.

Definition 1.2.4. A linear order (L;<) is separable if there is a countable D ⊆ L which is dense in L.

Let's recap. So far, we know that (R; <) is:

(1) a dense linear order with no endpoints

(2) Dedekind complete

(3) separable

Proposition 1.2.5. (1)+(2)+(3) characterizes (R;<) uniquely up to isomorphism.

Proof. Let (L;<) satisfy (1)+(2)+(3). Using (3), pick a countable dense subset D of L. D naturally inherits the linear order from L, and cannot have any endpoints. Indeed, if D has a (say) right endpoint α, then since L has no endpoints, there are β1, β2 ∈ L such that α < β1 < β2; but then L has no point of D between β1 and β2, contradicting the denseness of D in L. Hence, D is a countable dense linear order with no endpoints, so (D;<) is isomorphic to Q. Let f : D → Q witness this. We will extend f to an isomorphism from L to R.

For every x ∈ L, let Ωx = {u | u ∈ D, u ≤ x} ⊆ L. Since D is dense and L has no endpoints, there exists d ∈ D such that d > x. Since f is order-preserving, f(d) > f(u) for every u ∈ Ωx, so f(d) is an upper bound for f(Ωx) in R. Thus, we may put f(x) = supR(f(Ωx)) by the Dedekind completeness of R.

Note that if x < y in L, then there exist z1, z2 ∈ D such that x < z1 < z2 < y, by the density of D in L. Since f is order-preserving on the elements of D, f(z2) > f(z1) > f(u) for every u ∈ Ωx, so f(z2) > f(z1) ≥ f(x). However, since z2 ∈ Ωy, f(z1) < f(z2) ≤ f(y), so f(x) < f(y). This establishes that f is order-preserving and injective on L.

Fix z ∈ R and let A = {p ∈ Q | p ≤ z}. Let x be the sup in L of B = f⁻¹(A) ⊆ D; x exists since L is Dedekind complete. Note B ⊆ Ωx, since B ⊆ D and b ≤ x for each b ∈ B, so z = supR(f(B)) ≤ supR(f(Ωx)) = f(x). Suppose z < f(x); then there is an element u ∈ Ωx such that z < f(u) ≤ f(x). Take q ∈ Q such that z < q < f(u); then f⁻¹(q) < u ≤ x, since f⁻¹ is order-preserving on Q, and f⁻¹(q) > b for each b ∈ B. So f⁻¹(q) is an upper bound for B, but f⁻¹(q) < x, a contradiction. So f(x) = z, and hence f is surjective.

Topological Spaces

Definition 1.2.6. A topological space is a pair (X,T), where X is a set and T is a collection of subsets of X satisfying:

(1) ∅ ∈ T, X ∈ T

(2) V1, . . . , Vn ∈ T implies V1 ∩ · · · ∩ Vn ∈ T

(3) Vi ∈ T for i ∈ I implies ⋃i∈I Vi ∈ T

T is called the topology of the space, and the elements of T are called the open sets. We will refer to X as the space when T is clear.

Definition 1.2.7. We say T is generated from U ⊆ T if T consists of arbitrary unions of sets from U (note then U must be closed under finite intersection). U is called a basis for T. Elements of U are basic open sets.

Example 1.2.8. If (L;<) is a linear order with no endpoints, the open intervals generate a topology, called the order topology.

Definition 1.2.9. V is an (open) neighborhood of x if V is open and x ∈ V. V is a basic open neighborhood if in addition V is a basic open set.
A basis for the neighborhoods of x is any collection U consisting of neighborhoods of x such that every neighborhood of x contains some V ∈ U.

Proposition 1.2.10. A ⊆ X is open if and only if for all x ∈ A, there exists an open neighborhood V of x such that V ⊆ A.

Proof. ( =⇒ ) If A is open, then A itself is an open neighborhood of each x ∈ A, with A ⊆ A.
( ⇐= ) For each x ∈ A, fix an open neighborhood Vx of x such that Vx ⊆ A. Then A is the union of all such neighborhoods, and is therefore open.


Definition 1.2.11. The interior of E ⊆ X, denoted Int E, is the union of all open subsets of X contained in E. It is the largest open subset of X contained in E.
The exterior of E ⊆ X, denoted Ext E, is the union of all open subsets of X which have empty intersection with E. It is the largest open subset of X contained in X \ E, hence is the interior of X \ E.
The boundary of E ⊆ X is the set of all points of X which are in neither Int E nor Ext E.

Definition 1.2.12. A ⊆ X is closed if X \ A is open. Note that arbitrary intersections of closed sets are closed, as are finite unions.

Definition 1.2.13. The closure of A in X, denoted Ā, is the intersection of all closed sets in X containing A. It is the smallest closed subset of X containing A.

Definition 1.2.14. D ⊆ X is dense in X if every nonempty open subset V of X contains a point of D.

Definition 1.2.15. Let Y ⊆ X, with T a topology on X. Then the relative or induced topology TY on Y is defined to be TY = {V ∩ Y | V ∈ T}.

Definition 1.2.16. An open cover of X is a collection {Vi}i∈I of open subsets of X such that ⋃i∈I Vi = X. A subcover is a subcollection {Vi}i∈J, with J ⊆ I, such that ⋃i∈J Vi = X.

Definition 1.2.17. X is compact if every open cover has a finite subcover. Y ⊆ X is compact if Y is compact in the relative topology. Equivalently, whenever {Vi}i∈I are open in X with ⋃i∈I Vi ⊇ Y, there exists finite J ⊆ I such that ⋃i∈J Vi ⊇ Y.

Proposition 1.2.18. If N ⊆ X is compact, and V is open, N \ V is compact.

Proof. Let U be an open cover of N \ V. Since V is open, U ∪ {V} is an open cover of N. Since N is compact, there is a finite subcover U′ ⊆ U ∪ {V} of N. If V ∈ U′, replace U′ by U′ \ {V}, which yields a finite subcover of N \ V.

Definition 1.2.19. (X,T) is locally compact if for every x ∈ X, there is a compact N containing a neighborhood of x.

Definition 1.2.20. (X,T) is connected if it cannot be partitioned into two nonempty open sets, i.e. there are no nonempty, open, disjoint A, B such that A ∪ B = X.

Proposition 1.2.21. R (with the order topology) is connected.

Proof. Suppose R = A ∪ B for some nonempty, open, disjoint subsets A, B. Let a ∈ A, b ∈ B; WLOG a < b. Put E = {x ∈ A | x < b}; since a ∈ E, E is nonempty and bounded above. Let z = sup(E), which exists since R is Dedekind complete. We must have z ∈ A or z ∈ B. We break into cases:
Case 1: Suppose z ∈ A. Since A is open, there is an open interval (x, y) such that z ∈ (x, y) ⊆ A. Since R is dense, we can find ẑ with z < ẑ < min{b, y}. Then ẑ ∈ (x, y), so ẑ ∈ A, and ẑ < b, so ẑ ∈ E, contradicting that z is an upper bound for E.
Case 2: Suppose z ∈ B. Since B is open, there is an open interval (x, y) such that z ∈ (x, y) ⊆ B. Then x < z, so x is not an upper bound for E (z is the least upper bound for E). Thus, we can find ẑ ∈ E such that ẑ > x. Since z is an upper bound for E, ẑ ≤ z, so ẑ ∈ (x, y) ⊆ B. This is a contradiction, since ẑ ∈ E ⊆ A.


Proposition 1.2.22. R is locally compact.

Proof. We show that for all a < b in R, the closed interval [a, b] = {x | a ≤ x ≤ b} is compact; since every point of R lies in the interior of such an interval, local compactness follows. Let {Vi}i∈I be an open cover of [a, b], and suppose for contradiction that there is no finite subcover. The strategy we will pursue is as follows: we will find the greatest point x of [a, b] such that [a, x] has a finite subcover. Of course, this x must lie in some set of the cover, which allows us to push the finite cover to cover [a, x + ε], contradicting the maximality of x.

Let A = {x ∈ [a, b] | there is finite J ⊆ I such that ⋃i∈J Vi ⊇ [a, x]}. A is nonempty and bounded: every element of A is at most b, and a ∈ A, since we can take J = {i} for any Vi containing a (such a Vi must exist since {Vi}i∈I is an open cover of [a, b]). If b ∈ A, we are done. Suppose not, and let c = sup(A). Then a ≤ c ≤ b, so there exists k ∈ I such that c ∈ Vk. Vk is open, so there is an open interval (u, w) such that c ∈ (u, w) ⊆ Vk. Since c = sup(A), we can find x ∈ A such that u < x ≤ c (otherwise, u would be a smaller upper bound for A, contradicting that c is the least upper bound). Since x ∈ A, we can find finite J ⊆ I such that ⋃i∈J Vi ⊇ [a, x]. If c = b, then ⋃i∈J∪{k} Vi ⊇ [a, b], a contradiction; so c < b, and we may take z ∈ (c, w) with z ≤ b. Then ⋃i∈J∪{k} Vi ⊇ [a, z], so z ∈ A, which contradicts that c is an upper bound for A, since z > c.
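The sup argument above can be animated as a greedy walk: starting from a, repeatedly choose an interval of the cover containing the current position that reaches farthest right. A sketch for a cover given as a finite list of open intervals (the strategy and the example cover are ours; the proof itself handles arbitrary covers):

```python
def finite_subcover(a, b, intervals):
    """Greedily extract a finite subcover of [a, b] from open intervals (l, r)."""
    chosen, x = [], a
    while True:
        # among intervals containing the current point, take the one reaching farthest right
        best = max((iv for iv in intervals if iv[0] < x < iv[1]),
                   key=lambda iv: iv[1], default=None)
        if best is None:
            raise ValueError("the intervals do not cover the point %r" % x)
        chosen.append(best)
        if best[1] > b:
            return chosen          # we have pushed past b: [a, b] is covered
        x = best[1]                # advance, like pushing sup(A) to the right

cover = [(-0.5, 0.3), (0.2, 0.7), (0.6, 1.2)]
sub = finite_subcover(0.0, 1.0, cover)
# every point of [0, 1] (sampled on a grid) lies in some chosen interval
assert all(any(l < t < r for (l, r) in sub) for t in [i / 100 for i in range(101)])
```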

Definition 1.2.23. (X,T) is Hausdorff if for all x ≠ y in X, there exist neighborhoods Vx, Vy ∈ T of x and y such that Vx ∩ Vy = ∅.

Proposition 1.2.24. R is Hausdorff.

Proof. This essentially boils down to density. Let x, y ∈ R be distinct points, with x < y WLOG. Then there exists z ∈ R such that x < z < y, so for any ε > 0, the sets (x − ε, z) and (z, y + ε) are neighborhoods of x and y respectively with empty intersection.

Proposition 1.2.25. Let (X,T) be Hausdorff and locally compact. Then for every open setU and for every x ∈ U , there exists a compact Nx ⊆ U containing a neighborhood of x.

Proof. Fix N compact containing a neighborhood Ux of x; this is possible by local compactness. If N ⊆ U, we're done. If not, we can use the fact that X is Hausdorff to selectively peel away the parts of N not contained in U while retaining a neighborhood of x. For every y ∈ N \ U, fix a neighborhood Vy of y and a neighborhood Wy of x such that Wy ∩ Vy = ∅.
Note ⋃y∈N\U Vy ⊇ N \ U, i.e. the sets {Vy}y∈N\U form an open cover of N \ U; but N \ U is compact, since N is compact and U is open. Thus, there is a finite subcover Vy1, . . . , Vyk with ⋃i=1,...,k Vyi ⊇ N \ U. Let Nx = N \ ⋃i=1,...,k Vyi. Then Nx is compact, and Nx ⊆ U. Further, Nx ⊇ ⋂i=1,...,k (Wyi ∩ N), since each Wyi has no points in common with Vyi. Finally, since Ux ⊆ N, the set ⋂i=1,...,k (Wyi ∩ Ux) is a neighborhood of x contained in Nx.

1.3 Lecture 3 - More on Compactness and the Baire Category Theorem

Today, we will give a few more results on compactness, and will introduce the Baire Category Theorem.

Proposition 1.3.1. Let (X,T) be a compact space. Let {Ci}i∈N be a collection of closed subsets of X. Suppose for all n ∈ N, ⋂i≤n Ci ≠ ∅. Then ⋂i∈N Ci ≠ ∅.


Proof. As we will see in this proof, compactness is often the natural bridge between finite facts and infinite claims, and vice versa. Suppose ⋂i∈N Ci = ∅. Then for every x ∈ X, there is some ix ∈ N such that x ∉ Cix. Since Cix is closed, there is an open neighborhood Ux of x contained in X \ Cix, i.e. Ux ∩ Cix = ∅. Note {Ux}x∈X is an open cover of X. Since X is compact, we have k ∈ N and x1, . . . , xk ∈ X such that Ux1, . . . , Uxk cover the space. Since Cixl ∩ Uxl = ∅ for each l ∈ {1, . . . , k}, every y ∈ X is outside at least one of Cix1, . . . , Cixk, so Cix1 ∩ · · · ∩ Cixk = ∅. Then for any n larger than max{ix1, . . . , ixk}, ⋂i≤n Ci = ∅, a contradiction.

Proposition 1.3.2. Compact sets in Hausdorff spaces are closed.

Proof. Let N be a compact subset of a Hausdorff space X. It suffices to show that each x ∈ X \ N has a neighborhood completely contained in X \ N. Fix such an x. For each y ∈ N, let Vy, Uy be neighborhoods of y and x respectively such that Vy ∩ Uy = ∅. Then {Vy}y∈N is an open cover of N, so there exists a finite subcover Vy1, . . . , Vyk. Since Uyi ∩ Vyi = ∅ for each i = 1, . . . , k, Uy1 ∩ · · · ∩ Uyk is a neighborhood of x which is disjoint from N.

We now have the tools to present a proof of the Baire Category Theorem. We will see an equivalent formulation of the BCT today, as well as a distinct formulation in Lecture 4.

Theorem 1.3.3 (Baire Category Theorem). Let (X,T) be Hausdorff and locally compact (e.g., R). Let {Dn}n∈N be dense open subsets of X. Then ⋂n∈N Dn is dense (and nonempty).

Proof. Let U be a nonempty open subset of X. We will find a point in U ∩ (⋂n∈N Dn). This will establish that ⋂n∈N Dn is nonempty and dense, as we will have shown each nonempty open subset of X contains a point of ⋂n∈N Dn. The basic idea is that we will construct a sequence of shrinking neighborhoods, making sure we pull in a point from each Dk at every step. We will then leverage local compactness and Hausdorff-ness as needed to ensure that our sets shrink to at least one point, using the previous two propositions.

Set U0 = U. Now by induction on k ≥ 1, we construct the following:

(i) Take xk ∈ Uk−1 ∩ Dk−1; this is possible since Dk−1 is dense, so its intersection with each nonempty open subset is nonempty.

(ii) Find Nk compact with Nk ⊆ Uk−1 ∩ Dk−1 and containing a neighborhood of xk; this is possible since X is locally compact and Hausdorff, applying the proposition proved at the end of Lecture 2 to the open set Uk−1 ∩ Dk−1.

(iii) Let Uk be a neighborhood of xk with Uk ⊆ Nk.

Then we have constructed a chain U0 ⊇ N1 ⊇ U1 ⊇ N2 ⊇ U2 ⊇ · · ·, with Nk ⊆ Dk−1 for each k ≥ 1. Each Nk is compact, hence closed by the previous proposition, and for every k ≥ 1, ⋂1≤i≤k Ni = Nk ≠ ∅. Thus, by our earlier proposition (applied inside the compact space N1), ⋂i≥1 Ni ≠ ∅. Take any y ∈ ⋂i≥1 Ni; then for every i ≥ 1, y ∈ Ni ⊆ Di−1, and y ∈ N1 ⊆ U0 = U. Thus, y ∈ ⋂n∈N Dn and y ∈ U, so y ∈ U ∩ (⋂n∈N Dn).


Definition 1.3.4. A set C is nowhere dense if its closure C̄ has empty interior.

Proposition 1.3.5. C is closed and nowhere dense if and only if X \ C is open and dense.

Proof. ( =⇒ ) If C is closed and nowhere dense, then X \ C is open, and C̄ = C. Further, let U be a nonempty open subset of X. Since C = C̄ has empty interior, U ⊈ C, so U contains at least one point u ∈ X \ C̄ = X \ C. Hence X \ C is dense.
( ⇐= ) If X \ C is open and dense, then C is closed, whence C̄ = C. Further, since X \ C is dense, every nonempty open subset U of X contains a point u ∈ X \ C = X \ C̄. Thus, C̄ has empty interior, so C is nowhere dense.

Theorem 1.3.6 (Equivalent formulation of the Baire Category Theorem). Let (X,T) be a locally compact Hausdorff space. Let {Cn}n∈N be closed nowhere dense sets. Then ⋃n∈N Cn ≠ X.

Proposition 1.3.7. Let (X,T) be Hausdorff. Then for every x ∈ X, X \ {x} is open.

Proof. Since X is Hausdorff, for each y ≠ x in X, there is a neighborhood Vy of y such that x ∉ Vy. Then X \ {x} = ⋃y≠x Vy, whence X \ {x} is open.

Definition 1.3.8. x ∈ X is isolated if {x} is open.

Proposition 1.3.9. If x is not isolated, then X \ {x} is dense.

Proof. Every nonempty open set, except possibly {x} itself, clearly contains a point of X \ {x}. Since x is not isolated, {x} is not open, so every nonempty open set contains a point of X \ {x}, and we're done.

Note: R has no isolated points.

Corollary 1.3.10. Let X be Hausdorff and locally compact with no isolated points. Then X is not countable.

Proof. Suppose X were countable, and enumerate X via the sequence (qn)n∈N. For each n ∈ N, put Dn = X \ {qn}. Then Dn is open and dense by the previous two propositions, so by the Baire Category Theorem, ⋂n∈N Dn ≠ ∅. But ⋂n∈N Dn = ⋂n∈N (X \ {qn}) = X \ ⋃n∈N {qn} = ∅, a contradiction.

Corollary 1.3.11. R is not countable.

Proof. R is Hausdorff and locally compact with no isolated points.

Proposition 1.3.12 (F12.5). Let (X,T) be Hausdorff and locally compact. E ⊆ X is Gδ if it can be written as ⋂n∈N Gn with Gn open for each n ∈ N. Prove that Q is not Gδ in R.

Proof. This is a standard application of the Baire Category Theorem. Let (qn)n∈N enumerate Q. Note that Dn = R \ {qn} is open and dense, since R is Hausdorff and has no isolated points. Suppose Q = ⋂n∈N Gn, where Gn is open for each n ∈ N. Note Gn is also dense for each n ∈ N, since Gn ⊇ Q. Thus, by the Baire Category Theorem, (⋂n∈N Gn) ∩ (⋂n∈N Dn) ≠ ∅. But this is a contradiction, as ⋂n∈N Gn = Q and ⋂n∈N Dn = R \ Q.


Metric Spaces

Definition 1.3.13. A metric space is a pair (X, d), where d : X × X → [0,∞) satisfies:

(1) For all x ∈ X, d(x, x) = 0

(2) For all x ≠ y in X, d(x, y) ≠ 0

(3) For all x, y ∈ X, d(x, y) = d(y, x)

(4) (Triangle inequality): For all x, y, z ∈ X, d(x, z) ≤ d(x, y) + d(y, z)

Intuitively, d(x, y) can be thought of as the distance between x and y.

Proposition 1.3.14. Let (X, d) be a metric space. Then the sets

B(z, r) = {x | d(z, x) < r}, for z ∈ X and r > 0,

generate a topology on X called the metric topology; B(z, r) is the open ball of radius r centered at z.

Proof. Let T be the collection of all unions of open balls. To show T is a topology, it suffices to check that the intersection of any two open balls is a union of open balls. For this, it is enough to show that for all z1, z2 ∈ X and r1, r2 ∈ (0,∞), and for all x ∈ B(z1, r1) ∩ B(z2, r2), there exists s > 0 such that B(x, s) ⊆ B(z1, r1) ∩ B(z2, r2).
This essentially boils down to the triangle inequality. Let d1 = d(z1, x) < r1 and d2 = d(z2, x) < r2. Let s > 0 be small enough that d1 + s < r1 and d2 + s < r2; this is possible since d1 < r1 and d2 < r2 (for instance, s = min(r1 − d1, r2 − d2)/2 works).
Fix y ∈ B(x, s). Then d(zi, y) ≤ d(zi, x) + d(x, y) < di + s < ri for i = 1, 2. Hence, B(x, s) ⊆ B(z1, r1) ∩ B(z2, r2).
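The choice of s in this proof is concrete: any s with di + s < ri works. A numerical sketch in the Euclidean plane (the points, radii, and sampling scheme are arbitrary choices of ours):

```python
import math
import random

def d(p, q):
    """Euclidean metric on R^2."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

z1, r1 = (0.0, 0.0), 2.0     # two overlapping balls ...
z2, r2 = (1.0, 0.0), 1.5
x = (0.8, 0.3)               # ... and a point in their intersection
assert d(z1, x) < r1 and d(z2, x) < r2

s = min(r1 - d(z1, x), r2 - d(z2, x)) / 2   # so that d_i + s < r_i, as in the proof

random.seed(0)
for _ in range(1000):        # sample points of B(x, s) and confirm containment
    ang, rad = random.uniform(0, 2 * math.pi), random.uniform(0, s)
    y = (x[0] + rad * math.cos(ang), x[1] + rad * math.sin(ang))
    # triangle inequality: d(z_i, y) <= d(z_i, x) + d(x, y) < d_i + s < r_i
    assert d(z1, y) < r1 and d(z2, y) < r2
```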

Example 1.3.15. The usual metric on R is d(x, y) = |x − y|. The metric topology on R is the usual order topology, generated by open intervals.

Example 1.3.16. The discrete metric on any set X:

d(x, y) = 0 if x = y, and d(x, y) = 1 if x ≠ y.

Every subset of X is open with respect to this metric, since {x} = B(x, 1/2) for each x ∈ X.

Definition 1.3.17. A metric space is compact if it is compact with the metric topology. Equivalently, any covering of X with open balls has a finite subcover.
Y ⊆ X is compact if (Y, d|Y×Y) is compact. Equivalently, any covering of Y with open balls has a finite subcover.

Definition 1.3.18. A sequence (xn)∞n=1 is Cauchy if for every ε > 0, there exists N ∈ N such that for all k, l ≥ N, d(xk, xl) < ε. Intuitively, the points of a Cauchy sequence cluster arbitrarily closely together as you go far enough out in the sequence.
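For a concrete instance, the decimal truncations of √2 form a Cauchy sequence of rationals; the sketch below checks the criterion for one particular ε (the construction via `isqrt` is ours). Its limit, √2, is not rational, which foreshadows the notion of completeness below.

```python
from fractions import Fraction
from math import isqrt

def x(n):
    """The n-digit decimal truncation of sqrt(2), as an exact rational."""
    return Fraction(isqrt(2 * 10 ** (2 * n)), 10 ** n)

# Cauchy condition for eps = 10^-6 with N = 7: all later pairs are within eps,
# since each x(n) is within 10^-n of sqrt(2).
eps = Fraction(1, 10 ** 6)
assert all(abs(x(k) - x(l)) < eps for k in range(7, 25) for l in range(7, 25))
```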


Definition 1.3.19. A sequence (x_n)_{n=1}^∞ converges to z if for every ε > 0, there exists N ∈ N such that for all k ≥ N, d(x_k, z) < ε.

Proposition 1.3.20. A sequence (x_n)_{n=1}^∞ in any metric space (X, d) can converge to at most one z ∈ X.

Proof. The underlying idea here is that every metric space is Hausdorff, so we can find disjoint neighborhoods of the two purported limits, and points of the sequence can't be in both at once. Suppose otherwise, i.e. (x_n)_{n=1}^∞ converges to both z_1, z_2 ∈ X with z_1 ≠ z_2. Then d(z_1, z_2) > 0; let ε = (1/2)d(z_1, z_2).
By the definition of convergence, for i = 1, 2 there exists N_i such that for all k ≥ N_i, d(x_k, z_i) < ε. Taking any k ≥ max{N_1, N_2}, we get d(x_k, z_1) < ε and d(x_k, z_2) < ε. But then d(z_1, z_2) ≤ d(x_k, z_1) + d(x_k, z_2) < 2ε = d(z_1, z_2), a contradiction.

Definition 1.3.21. If (x_n)_{n=1}^∞ converges, then the unique z it converges to is called the limit of (x_n)_{n=1}^∞, denoted lim_{n→∞} x_n.

Proposition 1.3.22. If (xn)∞n=1 converges, then it is Cauchy.

Proof. A simple application of the triangle inequality: given ε > 0, take N large enough that for all m ≥ N, d(x_m, z) < ε/2, where z is the limit. Then for all m, n ≥ N, d(x_m, x_n) ≤ d(x_m, z) + d(z, x_n) < ε/2 + ε/2 = ε.

Proposition 1.3.23. If (xn)∞n=1 converges to z, then so does every subsequence of (xn)∞n=1.

Proof. Let (x_n)_{n=1}^∞ converge to z, and let (x_{n_k})_{k=1}^∞ be a subsequence of (x_n)_{n=1}^∞. Let ε > 0 be given. Choose N ∈ N such that for all m ≥ N, d(x_m, z) < ε. Let k be the smallest element of N such that n_k ≥ N. Then for all j ≥ k, n_j ≥ n_k ≥ N, so d(x_{n_j}, z) < ε.

Definition 1.3.24. A metric space X is complete if every Cauchy sequence of points in X has a limit in X.

Definition 1.3.25. Let (x_n)_{n=1}^∞ be a sequence, bounded above and below. Define the lim inf of (x_n)_{n=1}^∞ by

lim inf_{n→∞} x_n = sup_{n∈N} inf_{l≥n} x_l

and similarly, the lim sup of (x_n)_{n=1}^∞ by

lim sup_{n→∞} x_n = inf_{n∈N} sup_{l≥n} x_l

For bounded sequences of real numbers, these values live in R by Dedekind completeness.
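These definitions can be approximated numerically by truncating the tails. A sketch (the sequence x_n = (−1)^n + 1/(n+1) is an illustrative choice, with lim inf = −1 and lim sup = 1; the truncation horizon is arbitrary):

```python
def tail_bounds(x, n, horizon):
    """Approximate inf_{l >= n} x_l and sup_{l >= n} x_l by truncating at `horizon`."""
    tail = [x(l) for l in range(n, horizon)]
    return min(tail), max(tail)

# x_n = (-1)^n + 1/(n+1): lim inf = -1, lim sup = 1
x = lambda n: (-1) ** n + 1.0 / (n + 1)

HORIZON = 10_000
pairs = [tail_bounds(x, n, HORIZON) for n in range(1, 50)]
infs = [p[0] for p in pairs]   # nondecreasing in n
sups = [p[1] for p in pairs]   # nonincreasing in n

approx_liminf = max(infs)  # sup over n of the tail infs
approx_limsup = min(sups)  # inf over n of the tail sups
```

As the proof of Theorem 1.3.26 shows, the tail infs only go up and the tail sups only go down, so the approximations satisfy approx_liminf ≤ approx_limsup.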

Theorem 1.3.26. R (with the Euclidean metric) is complete.

Proof. We will use the Dedekind completeness of R. Let (x_n)_{n=1}^∞ be a sequence, bounded above and below. Note that

inf_{l≥1} x_l ≤ inf_{l≥2} x_l ≤ ··· ≤ sup_{l≥2} x_l ≤ sup_{l≥1} x_l

This is because when we remove elements from our sequence, the inf can only go up, whereas the sup can only decrease, but the sups never fall below the infs. In particular, this shows

sup_{n∈N} inf_{l≥n} x_l ≤ inf_{n∈N} sup_{l≥n} x_l


Claim 1.3.27. For every ε > 0, there exists N ∈ N such that for all k ≥ N, x_k < lim sup x_n + ε.

Proof. Let N be large enough that

sup_{l≥N} x_l − lim sup x_n < ε, i.e. sup_{l≥N} x_l < lim sup x_n + ε

This is possible because (sup_{l≥n} x_l)_{n=1}^∞ is a monotone decreasing sequence whose greatest lower bound (i.e., limit, by the monotone convergence theorem) is lim sup x_n. Then for all k ≥ N, x_k ≤ sup_{l≥N} x_l < lim sup x_n + ε.

Claim 1.3.28. For every ε > 0, there exists N ∈ N such that for all k ≥ N, x_k > lim inf x_n − ε.

Assume (xn) is Cauchy. We want to show that (xn) has a limit in R.

Claim 1.3.29. If lim inf x_n = lim sup x_n = l for some l ∈ R, then x_n → l.

Proof. Let ε > 0 be given. By Claim 1.3.27, there exists N ∈ N such that for all k ≥ N,

x_k − l < ε

Additionally, by Claim 1.3.28, there exists N′ ∈ N such that for all k ≥ N′,

l − x_k < ε

Thus, for all k ≥ max{N, N′},

|x_k − l| < ε

Thus, it is sufficient to show lim inf x_n = lim sup x_n.

Claim 1.3.30. For every ε > 0 and N ∈ N, there exists k ≥ N such that x_k > lim sup x_n − ε.

Proof. Let ε > 0 and N ∈ N be given. Suppose there were no k ≥ N with x_k > sup_{l≥N} x_l − ε, i.e. x_k ≤ sup_{l≥N} x_l − ε for all k ≥ N. Then sup_{l≥N} x_l − ε would be an upper bound for (x_l)_{l≥N} which is smaller than sup_{l≥N} x_l, a contradiction. Hence, there exists k ≥ N such that x_k > sup_{l≥N} x_l − ε. Since sup_{l≥N} x_l ≥ lim sup x_n, we get x_k > lim sup x_n − ε.

Claim 1.3.31. For every ε > 0 and N ∈ N, there exists k ≥ N such that x_k < lim inf x_n + ε.

Suppose for contradiction that lim inf x_n ≠ lim sup x_n. Let

ε = (lim sup x_n − lim inf x_n)/3

Since (x_n) is Cauchy, there exists N ∈ N such that for all k_1, k_2 ≥ N,

|x_{k_1} − x_{k_2}| < ε

By the claims above, there exist k_1, k_2 ≥ N such that x_{k_1} > lim sup x_n − ε and x_{k_2} < lim inf x_n + ε. But then x_{k_1} − x_{k_2} > (lim sup x_n − ε) − (lim inf x_n + ε) = 3ε − 2ε = ε, a contradiction.


1.4 Lecture 4 - Completeness and Sequential Compactness

Today, we will talk more about completeness, as well as its connection with compactness.

Definition 1.4.1. A limit point of A (in a metric space (X, d)) is a point z such that for all ε > 0, A ∩ B(z, ε) ≠ ∅. Equivalently, z is a limit point of A if z is the limit of a sequence of values in A.

Proposition 1.4.2. Let (X, d) be a metric space. Then A ⊆ X is closed if and only if A contains each of its limit points.

Proof. ( =⇒ ) Suppose A is closed, and let z be a limit point of A. If z ∉ A, then since X \ A is open in the metric topology, there exists ε > 0 such that B(z, ε) ⊆ X \ A. But then A ∩ B(z, ε) = ∅, a contradiction since z is a limit point.
( ⇐= ) Suppose A contains all its limit points. Pick z ∈ X \ A. Then z is not a limit point of A, so there exists ε > 0 such that B(z, ε) ∩ A = ∅, i.e. B(z, ε) ⊆ X \ A. Hence X \ A is open, so A is closed.

Definition 1.4.3. Let (X, d) be a metric space. A subset A ⊆ X is complete if and only if (A, d|_{A×A}) is complete.

Proposition 1.4.4. If (X, d) is complete, then A ⊆ X is complete if and only if A is closed.

Proof. ( =⇒ ) Suppose A is complete. Let z be a limit point of A. Then there is a sequence (x_n)_{n=1}^∞ of points in A such that x_n → z; being convergent, (x_n) is Cauchy in (X, d). Since (x_n) is a sequence of points in A, it is also Cauchy in (A, d|_{A×A}). Since A is complete, (x_n) must have a limit in (A, d|_{A×A}); call it z′ ∈ A. Then x_n → z′ also in (X, d), so z = z′, whence z ∈ A. (Note that we have not used the completeness of X here.)
( ⇐= ) Suppose A is closed. Let (x_n) be Cauchy in (A, d|_{A×A}). Then (x_n) ⊆ A and (x_n) is Cauchy in (X, d). Since X is complete, there exists z ∈ X such that x_n → z. Since A is closed and z is a limit point of A, z ∈ A.

Proposition 1.4.5. Let (X, d) be a metric space. The closed balls B̄(z, r) = {x | d(z, x) ≤ r} are closed.

Proposition 1.4.6. Let (X, d) be a metric space. If U is open, then for each z ∈ U, there exists ε > 0 such that B̄(z, ε) ⊆ U.

Proof. Let ε be sufficiently small that B(z, 2ε) ⊆ U. Then B̄(z, ε) ⊆ B(z, 2ε) ⊆ U.

Proposition 1.4.7. Let (X, d) be complete. Let {B_n}_{n∈N} be open balls of radius r_n with lim_{n→∞} r_n = 0 and B̄_{n+1} ⊆ B_n. Then ⋂_{n∈N} B_n is nonempty.

Proof. Say B_n = B(x_n, r_n). Then for all N ∈ N and all k, l ≥ N, we have x_k, x_l ∈ B_N, since the balls are nested, which gives d(x_k, x_l) < 2r_N. Since r_n → 0, this implies that (x_n) is Cauchy: we can take N large enough that 2r_N is arbitrarily small, which bounds d(x_k, x_l) for all k, l ≥ N. By the completeness of (X, d), (x_n) has a limit; call it z.
For every N, z is the limit of the tail (x_n)_{n≥N+1}, whose points all lie in B_{N+1}. Hence z is a limit point of B_{N+1} ⊆ B̄_{N+1}. Since B̄_{N+1} is closed, z ∈ B̄_{N+1}, so z ∈ B_N, since B̄_{N+1} ⊆ B_N.


Theorem 1.4.8 (Baire Category Theorem, Version 2). Let X be a complete metric space. Let {D_n}_{n∈N} be open dense sets. Then ⋂_{n∈N} D_n is dense.

Proof. Though BCT v2 is actually not equivalent to BCT v1, the main technique in this proof closely resembles that of the proof of BCT v1. In particular, we will inductively construct a shrinking sequence of balls which capture points from each dense subset. The previous proposition guarantees that the eventual intersection is nonempty; this is where completeness is used.
Let U_0 be open and nonempty. Inductively, we construct (for n ≥ 1):

(i) Pick x_n ∈ U_{n−1} ∩ D_{n−1}; this is possible since D_{n−1} is dense

(ii) Pick r_n > 0 sufficiently small that B̄(x_n, r_n) ⊆ U_{n−1} ∩ D_{n−1}; this is possible by our earlier proposition, since U_{n−1} ∩ D_{n−1} is open

(iii) Reduce r_n as needed so that r_n < 1/n; this ensures r_n → 0

(iv) Set U_n = B(x_n, r_n)

Set B_n = B(x_n, r_n). Then we get the chain of inclusions

U_0 ⊇ B̄_1 ⊇ B_1 ⊇ B̄_2 ⊇ B_2 ⊇ ···

where B̄_n ⊆ D_{n−1} for each n ≥ 1. Now by the previous proposition, there exists z ∈ ⋂_{n∈N} B_n. Since B_n ⊆ B̄_n ⊆ D_{n−1}, z ∈ D_n for each n ∈ N; similarly, since z ∈ B_1 ⊆ U_0, z ∈ U_0. Thus z ∈ U_0 ∩ (⋂_{n∈N} D_n). Since U_0 was an arbitrary nonempty open set, ⋂_{n∈N} D_n is dense.

Corollary 1.4.9 (S08.6). A complete metric space with no isolated points cannot be countable.

Proof. See previous proof.

Definition 1.4.10. A metric space (X, d) is sequentially compact if every sequence hasa convergent subsequence.

Theorem 1.4.11. (X, d) is sequentially compact if and only if X is compact.

Proof. The ( =⇒ ) direction here is quite tricky, so we will start with ( ⇐= ).
( ⇐= ) Suppose (X, d) is compact. Let (x_n)_{n=1}^∞ be given. Suppose for contradiction that no z ∈ X is the limit of a subsequence of (x_n)_{n=1}^∞. Fix z ∈ X; if for every r > 0 and N ∈ N there existed k ≥ N with x_k ∈ B(z, r), then we could inductively construct a subsequence of (x_n) converging to z. Hence, for each z ∈ X, there exist r_z > 0 and N_z ∈ N such that for all n ≥ N_z, x_n ∉ B(z, r_z). The set {B(z, r_z) | z ∈ X} is an open cover of X. By compactness, there are finitely many z_1, ..., z_k such that B(z_1, r_{z_1}) ∪ ··· ∪ B(z_k, r_{z_k}) = X. But for any n ≥ max{N_{z_1}, ..., N_{z_k}}, we have x_n ∉ B(z_1, r_{z_1}) ∪ ··· ∪ B(z_k, r_{z_k}) = X, a contradiction.
( =⇒ ) Let {V_i}_{i∈I} be an open cover of (X, d). Suppose {V_i}_{i∈I} has no finite subcover. We aim to construct a sequence (x_n) with no convergent subsequence. The intuitive idea is to space the terms of (x_n) sufficiently far from one another that (x_n) cannot have a convergent subsequence. We work inductively as follows. Let x_0 be some point in X. Suppose we have constructed x_n. Then:


(i) There exists i_n ∈ I such that x_n ∈ V_{i_n}; this is possible since {V_i}_{i∈I} is an open cover for X

(ii) Fix r_n > 0 such that B(x_n, r_n) ⊆ V_{i_n}; this is possible since V_{i_n} is open

(iii) Pick i_n, r_n so that r_n ≥ 1/(2L) for the smallest possible L ∈ N; in other words, we pick i_n and r_n so that we can put the largest possible open ball around x_n inside V_{i_n}

(iv) Finally, pick x_{n+1} in X \ (V_{i_0} ∪ ··· ∪ V_{i_n}); this is possible since {V_i}_{i∈I} has no finite subcover

Claim 1.4.12. (xn) constructed above has no convergent subsequence.

Proof. Suppose for contradiction that (x_n) has a subsequence converging to some z ∈ X, i.e. for every ε > 0 and N ∈ N, there exists n ≥ N such that x_n ∈ B(z, ε).
Fix i such that z ∈ V_i. Fix L ∈ N such that B(z, 1/L) ⊆ V_i. Fix n ∈ N such that x_n ∈ B(z, 1/(2L)); we can do so by the above. Note then that B(x_n, 1/(2L)) ⊆ B(z, 1/L) ⊆ V_i, where the first inclusion follows from the triangle inequality. At stage n in the induction above, we could have picked i_n = i and r_n = 1/(2L). Hence, by construction, the r_n we did pick must be ≥ 1/(2L), whence B(x_n, 1/(2L)) ⊆ B(x_n, r_n) ⊆ V_{i_n}.
Hence, for all k > n, x_k ∉ B(x_n, 1/(2L)), since by construction x_k ∉ V_{i_n} for each k > n. From this, we get that for all k > n,

d(z, x_k) ≥ d(x_k, x_n) − d(z, x_n) ≥ 1/(2L) − d(z, x_n) =: ε > 0

where ε > 0 since d(z, x_n) < 1/(2L). Thus x_k ∉ B(z, ε) for every k > n, so no subsequence of (x_n) converges to z, a contradiction.

This completes the proof.

Theorem 1.4.13 (Heine-Borel). In R, a set A is compact if and only if it is closed (complete) and bounded (both above and below).

Proof. ( =⇒ ) We proved earlier that every compact subset of R (a Hausdorff space) is closed. If A were unbounded in either direction, then we would have a monotone sequence of points in A with no convergent subsequence, contradicting sequential compactness.
( ⇐= ) Suppose A is bounded and closed. We prove A is sequentially compact. Our method will be a "lion in the desert" style proof. To hunt a lion in the desert, divide the desert into two halves; the lion must be in one of them, so follow him there. Repeat until you've caught the lion.
Let (x_n) be a sequence of points in A. It is sufficient to find a Cauchy subsequence: by the completeness of R it converges, and since A is closed, its limit lies in A. Let (a, b) ⊆ R be such that A ⊆ (a, b) (possible since A is bounded above and below). Working inductively, we define open intervals B_k such that (∗) for infinitely many n, x_n ∈ B_k (i.e. B_k contains infinitely many terms of our sequence):

(i) Set B0 = (a, b).

(ii) Having defined B_k, say B_k = (a_k, b_k), let c be the midpoint of (a_k, b_k). Then (x_n) either has infinitely many terms in (a_k, c), or in (c, b_k) (if (x_n) has infinitely many terms equal to c, then that constant subsequence is clearly convergent).

(iii) If (x_n) has infinitely many terms in (a_k, c), take B_{k+1} = (a_k, c). Otherwise, set B_{k+1} = (c, b_k).

Using (∗), find a subsequence (x_{n_k}) such that x_{n_k} ∈ B_k for each k ∈ N. Since the lengths of the B_k shrink to 0, (x_{n_k}) is Cauchy.
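The halving construction can be sketched in code. This is an illustrative finite approximation: "infinitely many terms" becomes "at least half of the surviving sample", and the bookkeeping that makes the chosen indices strictly increasing is elided; the bounded sample sequence sin(n) is an arbitrary choice.

```python
import math

def bisect_nested(seq, a, b, steps):
    """'Lion in the desert': repeatedly halve (a, b), always keeping a half
    that contains at least half of the surviving terms.  Returns one
    representative value per stage and the final interval length."""
    indices = list(range(len(seq)))
    reps, lo, hi = [], a, b
    for _ in range(steps):
        mid = (lo + hi) / 2
        left = [i for i in indices if lo < seq[i] <= mid]
        right = [i for i in indices if mid < seq[i] < hi]
        if len(left) >= len(right):
            indices, hi = left, mid
        else:
            indices, lo = right, mid
        reps.append(seq[indices[0]])  # one term living in the current interval
    return reps, hi - lo

seq = [math.sin(n) for n in range(1, 100001)]  # bounded sequence in (-1.01, 1.01)
reps, final_len = bisect_nested(seq, -1.01, 1.01, 12)
```

Consecutive representatives both lie in the current interval, so their distance is bounded by its length, which halves at every stage — exactly the Cauchy estimate in the proof.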

Definition 1.4.14. (X, d) is totally bounded if for every ε > 0, X can be covered with finitely many balls of radius less than ε.

A proof similar to the one above gives the "only if" direction of the following:

Theorem 1.4.15 (S09.4(e), S13.3). A metric space (X, d) is compact if and only if X is complete and totally bounded.


2 Week 2

As per the syllabus, Week 2 topics include: convergence of sums, rearrangements and absolute convergence, continuity in topological and metric spaces, path-connectedness, the intermediate value theorem, contraction maps and the fixed point theorem, uniform continuity, uniform convergence, and the Arzela-Ascoli theorem.

2.1 Lecture 5 - Convergence of Sums and Some Exam Problems

Today, we will discuss convergence of sums, and will complete some exam problems concerning the evaluation and convergence of sums.

Definition 2.1.1 (Convergence of sums in R). For a sequence (a_i) ⊆ R, ∑_{i=1}^∞ a_i converges to s ∈ R if the sequence s_n = ∑_{i=1}^n a_i converges to s. ∑_{i=1}^∞ a_i converges absolutely if ∑_{i=1}^∞ |a_i| converges.

Define (s_n) by s_n = ∑_{i=1}^n a_i. Then to check that ∑_{i=1}^∞ a_i converges, it is sufficient (in fact, necessary) to show that (s_n) is Cauchy, i.e. for every ε > 0, there exists N ∈ N such that for all k, l ≥ N, |s_k − s_l| < ε; i.e. if k ≤ l,

|∑_{i=k+1}^l a_i| < ε

Example 2.1.2. For any (b_n) with 0 ≤ b_n ≤ M,

∑_{n=1}^∞ b_n/10^n

converges.

Proof. It is sufficient to show that the tail sums are arbitrarily small, i.e. for N sufficiently large,

∑_{n=N}^∞ b_n/10^n < ε

We compare:

∑_{n=N}^∞ b_n/10^n ≤ M · ∑_{n=N}^∞ 1/10^n = M · (1/10^N) · 1/(1 − 1/10) = (10M/9) · (1/10^N)

Since we can always take N sufficiently large that (10M/9) · (1/10^N) < ε, we are done.
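The geometric-series bound is easy to check numerically. A sketch (the choices of M and b_n are arbitrary, and the infinite tail is truncated, which only makes it smaller):

```python
def tail_sum(b, N, terms=200):
    """Truncated tail: sum_{n=N}^{N+terms-1} b_n / 10^n."""
    return sum(b(n) / 10**n for n in range(N, N + terms))

M = 7.0
b = lambda n: M if n % 3 == 0 else 0.0   # any sequence with 0 <= b_n <= M

# bound from the proof: sum_{n >= N} b_n/10^n <= (10*M/9) / 10^N
bounds_hold = all(tail_sum(b, N) <= (10.0 * M / 9.0) / 10**N for N in (1, 2, 5, 8))
```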

Corollary 2.1.3 (S04.1). P(N) = {b | b ⊆ N} injects into R.

Proof. For b ⊆ N, set

f(b) = ∑_{n=1}^∞ b_n/10^n

where b_n = 1 if n ∈ b, and b_n = 0 if n ∉ b. Note that f is well defined, since

∑_{n=1}^∞ b_n/10^n

always converges by the previous example. Moreover, if a ≠ b, then taking the least N which belongs to one of a, b but not the other (assume WLOG N ∈ a, N ∉ b), we have

f(a) = ∑_{n=1}^{N−1} a_n/10^n + 1/10^N + ∑_{n=N+1}^∞ a_n/10^n

f(b) = ∑_{n=1}^{N−1} b_n/10^n + 0/10^N + ∑_{n=N+1}^∞ b_n/10^n

Since no integer less than N is in one of a, b but not the other (by the choice of N), we have

∑_{n=1}^{N−1} a_n/10^n = ∑_{n=1}^{N−1} b_n/10^n

Note

∑_{n=N+1}^∞ b_n/10^n ≤ (1/10^N) · (1/9) < 1/10^N + ∑_{n=N+1}^∞ a_n/10^n

So f(b) < f(a), and f is injective.

Corollary 2.1.4. R ≅ P(N).

Proof. First, note P(Q) ≅ P(N): applying a bijection from Q to N elementwise sends every subset of Q to a unique subset of N. Next define f : R → P(Q) by

f(x) = {q ∈ Q | q < x}

Note f is injective since Q is dense in R: if x < y in R, then there is z ∈ Q such that x < z < y, and z ∈ f(y) but z ∉ f(x). So R injects into P(Q) ≅ P(N), and P(N) injects into R by the previous corollary; thus, by CSB, R ≅ P(N).

Theorem 2.1.5 (Cantor). P (N) is not countable.

Proof. Suppose for contradiction that there exists a bijection f : N → P(N). Let b = {n ∈ N | n ∉ f(n)}. Then b is not in the range of f: for every n ∈ N, n ∈ b if and only if n ∉ f(n), so b ≠ f(n) for each n ∈ N. Thus f is not onto, a contradiction.

Proposition 2.1.6. R ≅ R × R.


Proof. It suffices to show P(N) ≅ P(N) × P(N), as R ≅ P(N). Let A_1, A_2 ⊆ N be infinite subsets of N with A_1 ∪ A_2 = N, A_1 ∩ A_2 = ∅.
Let g_i : A_i → N be bijective for i = 1, 2. Define a function from P(N) to P(N) × P(N) by

b ↦ (g_1[b ∩ A_1], g_2[b ∩ A_2])

where g_i[S] denotes the image of S under g_i. This is a bijection, since g_1 and g_2 are bijections.

Proposition 2.1.7 (S09.1). Let a_0 = 0, a_{n+1} = √(6 + a_n). Show (a_n) converges, and find its limit.

Solution. We proceed directly, without use of continuity. By induction, we show that a_n ≤ 3 for all n ∈ N:
Note a_0 = 0 ≤ 3. Suppose a_n ≤ 3. Then

a_{n+1}² = 6 + a_n ≤ 6 + 3 = 9

so a_{n+1} ≤ 3. To prove a_{n+1}² > a_n² (so that the sequence is increasing), note that since 0 ≤ a_n < 3,

a_{n+1}² = 6 + a_n > 2a_n + a_n = 3a_n ≥ a_n²

So (a_n) is increasing and bounded above by 3, so (a_n) has a limit ≤ 3.

Claim 2.1.8. For every ε > 0, 3 − ε is not an upper bound for (a_n). This will show the limit is 3.

Proof. Restrict to ε < 4. We show that if 3 − ε is a bound, then so is 3 − 2ε. It is enough to show that a_{n+1} < 3 − ε implies a_n < 3 − 2ε. Say a_{n+1} < 3 − ε. Then

√(6 + a_n) < 3 − ε
⟹ 6 + a_n < 9 − 6ε + ε²

So a_n < 3 − 6ε + ε² < 3 − 2ε, where the last inequality holds because ε² < 4ε, since ε < 4 by assumption.

By repeated application of this to keep doubling ε, if some 3 − ε were a bound for (a_n) we would eventually get a_n ≤ 3 − 4 = −1 for all n. This is a contradiction, since a_0 = 0 > −1.
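The behavior is easy to see numerically. A sketch iterating the recursion (15 steps is an arbitrary cutoff, chosen so the terms are still strictly increasing in floating point):

```python
import math

def iterates(steps):
    """a_0 = 0, a_{n+1} = sqrt(6 + a_n), as in S09.1."""
    a, out = 0.0, [0.0]
    for _ in range(steps):
        a = math.sqrt(6.0 + a)
        out.append(a)
    return out

seq = iterates(15)  # strictly increasing, bounded by 3, limit 3
```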

Proposition 2.1.9. Let (an), (bn), (cn) be sequences. If an < bn < cn for each n ∈ N, and(an) and (cn) converge to the same limit l, then (bn) converges to l.

Proof. Let ε > 0 be given. Pick N large enough that for all n ≥ N, |a_n − l| < ε and |c_n − l| < ε. Then for all n ≥ N,

a_n − l < b_n − l < c_n − l < ε

Further,

ε > l − a_n > l − b_n > l − c_n

Hence |b_n − l| < ε.

Proposition 2.1.10. If lim_{n→∞} d_n = l_1, lim_{n→∞} u_n = l_2, and lim_{n→∞} (d_n − u_n) = 0, then l_1 = l_2.


Proof. Suppose not. Let ε > 0 be such that |l_1 − l_2| > ε. Take N large enough that

|d_N − l_1| < ε/3 and |u_N − l_2| < ε/3 and |d_N − u_N| < ε/3

Then

|l_1 − l_2| = |l_1 − d_N + d_N − u_N + u_N − l_2| ≤ |l_1 − d_N| + |d_N − u_N| + |u_N − l_2| < ε/3 + ε/3 + ε/3 = ε

a contradiction.

Proposition 2.1.11. ∑_{n=0}^∞ (−1)^n/(n+1) converges.

Proof. Set

s_n = ∑_{i=0}^n (−1)^i/(i+1)

The basic idea here is that (s_n) alternates between increasing and decreasing; that is, s_n > s_{n+2} for each n even, and s_n < s_{n+2} for n odd. We will build an increasing sequence and a decreasing sequence out of elements of (s_n), and wedge the terms of (s_n) between these two sequences.
Let (u_n) be the sequence s_0, s_0, s_2, s_2, s_4, s_4, ..., i.e. u_n = s_{2⌊n/2⌋}. Note that (u_n) is decreasing.
Let (d_n) be the sequence s_1, s_1, s_3, s_3, s_5, s_5, ..., i.e. d_n = s_{2⌊n/2⌋+1}. Note that (d_n) is increasing.
Then for all n, d_n ≤ s_n ≤ u_n. Note (u_n) is bounded below by 0 and (d_n) is bounded above by 1, so both sequences converge. Finally, note

|u_n − d_n| ≤ 1/(n+1)

so by the above proposition, (u_n) and (d_n) converge to the same limit. Thus, by the squeeze proposition, (s_n) converges.
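The wedge is visible numerically. A sketch (taking each even and odd partial sum once rather than twice, which does not affect monotonicity; 50 terms is an arbitrary cutoff):

```python
def s(n):
    """Partial sum s_n = sum_{i=0}^{n} (-1)^i / (i+1)."""
    return sum((-1) ** i / (i + 1) for i in range(n + 1))

u = [s(2 * k) for k in range(50)]       # s_0, s_2, s_4, ...: strictly decreasing
d = [s(2 * k + 1) for k in range(50)]   # s_1, s_3, s_5, ...: strictly increasing
```

The gaps u_k − d_k = s_{2k} − s_{2k+1} = 1/(2k+2) shrink to 0, squeezing the partial sums to a common limit.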

Proposition 2.1.12 (S05.5). Let

S_N = (1/N) ∑_{n=1}^N a_n

(a) Suppose lim_{n→∞} a_n = A. Show lim_{N→∞} S_N = A.

(b) Is the converse true? Prove your assertions.

Proof. (a) Let ε > 0 be given. Since (a_n) converges to A, there exists M_0 large enough that |a_n − A| < ε/2 for all n > M_0. Let M > M_0 be large enough that

|(a_1 − A) + (a_2 − A) + ··· + (a_{M_0} − A)| / M < ε/2

Then for N ≥ M,

|S_N − A| = |(a_1 + ··· + a_N)/N − A|
≤ |((a_1 − A) + ··· + (a_{M_0} − A))/N| + |((a_{M_0+1} − A) + ··· + (a_N − A))/N|
< ε/2 + (|a_{M_0+1} − A| + ··· + |a_N − A|)/N
≤ ε/2 + (ε/2)(N − M_0)/N < ε

(b) The converse is false. Consider a_n = (−1)^n. Then ∑_{n=1}^N a_n is −1 or 0 depending on whether N is odd or even, so S_N → 0. But (a_n) does not itself converge.

Proposition 2.1.13 (S05.2). Let X be the space of sequences (σ_n)_{n=1}^∞ with σ_n ∈ {0, 1}. Define a metric on X by

d((σ_n), (τ_n)) = ∑_{n=1}^∞ (1/2^n)|σ_n − τ_n|

Prove (directly) that every infinite A ⊆ X has an accumulation point, i.e. X is sequentially compact.

Proof. This will be another "lion in the desert" proof. The essential idea is that the distance between two sequences is very small if they agree on a large number of initial digits. Let A ⊆ X be infinite. We will define a_k ∈ {0, 1} for k ≥ 1 so that

(∗)_k: there are infinitely many (σ_n) ∈ A extending (a_1, ..., a_k) (i.e., whose first k digits are a_1, ..., a_k)

holds for each k. Note that we can arrange (∗)_1: since A is infinite, there must either be infinitely many sequences in A starting with 0, or infinitely many starting with 1. Let a_1 be 0 or 1 depending on which is the case.
Now suppose a_1, ..., a_k have been chosen so that (∗)_k holds. Note that

{(σ_n) extending (a_1, ..., a_k)} = {(σ_n) extending (a_1, ..., a_k, 0)} ∪ {(σ_n) extending (a_1, ..., a_k, 1)} = B_0 ∪ B_1

Since (∗)_k holds, A must have infinite intersection with at least one of the sets in the union. If A ∩ B_0 is infinite, put a_{k+1} = 0; otherwise, set a_{k+1} = 1. It remains to show that (a_k)_{k=1}^∞ is an accumulation point of A.
Let ε > 0 be given. Let N be large enough that 1/2^N < ε. Using (∗)_N, find (σ_n) ∈ A which extends (a_1, ..., a_N). Then

d((σ_n), (a_n)) = ∑_{n=1}^∞ (1/2^n)|σ_n − a_n| = 0 + ∑_{n=N+1}^∞ (1/2^n)|σ_n − a_n| ≤ ∑_{n=N+1}^∞ 1/2^n = 1/2^N < ε
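A tiny concrete instance: take A = {e_m : m ≥ 1}, where e_m is the sequence with a single 1 in place m. The digit-selection of the proof produces the all-zero sequence, and the distances d(e_m, 0) = 2^{−m} witness that it is an accumulation point. A sketch (truncating the metric at 60 digits is harmless here, since the sequences agree beyond that point):

```python
def d(sigma, tau, terms=60):
    """The metric of S05.2, truncated at `terms` binary digits."""
    return sum(abs(sigma(n) - tau(n)) / 2**n for n in range(1, terms + 1))

e = lambda m: (lambda n: 1 if n == m else 0)  # 1 in place m, 0 elsewhere
zero = lambda n: 0

dists = [d(e(m), zero) for m in range(1, 20)]  # 1/2, 1/4, 1/8, ...
```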


Example 2.1.14. There is a related space Y = {(σ_n)_{n=1}^∞ | σ_n ∈ N} where the metric on Y is defined by

d((σ_n), (τ_n)) = 0 if (σ_n) = (τ_n), and otherwise 1/2^N, where N is the least number such that σ_N ≠ τ_N

One can check that d is a metric, and in fact, (Y, d) is complete. But Y is not sequentially compact. For example, the sequence of elements

(0, 0, 0, ...)
(1, 0, 0, ...)
(2, 0, 0, ...)
...

has no convergent subsequence, since the distance between any two elements of the sequence is fixed at 1/2. In fact, (Y, d) is not even locally compact: in any open ball B(z, 1/2^N) one can find infinitely many sequences which agree with z on the first N digits but pairwise differ in place N + 1, and such a family has no convergent subsequence.

2.2 Lecture 6 - Some More Exam Problems and Continuity

Today, we will do some more exam problems. We will also introduce continuity.

Proposition 2.2.1 (F07.8). Suppose (a_n) is a sequence such that a_n ≥ 0 for all n ∈ N, and

∑_{n=1}^∞ a_n = ∞

Does

∑_{n=1}^∞ a_n/(1 + a_n) = ∞?

Proof. Yes. Since

∑_{n=1}^∞ a_n = ∞,

the sequences of partial sums

(∑_{n=1}^k a_n)_{k=1}^∞ and (∑_{n=M}^k a_n)_{k=M}^∞

are both unbounded, for each M ∈ N.
Case 1: Suppose there exists M ∈ N such that for each n ≥ M, a_n ≤ 1. Then for all n ≥ M,

a_n/(1 + a_n) ≥ a_n/2

Then

∑_{n=M}^k a_n/(1 + a_n) ≥ (1/2) ∑_{n=M}^k a_n

for each k ≥ M; since the right-hand side is not bounded, the left-hand side is not bounded either. Hence, certainly

∑_{n=1}^k a_n/(1 + a_n)

is not bounded.
Case 2: Suppose there are infinitely many n ∈ N such that a_n > 1. Then for each such n, 2a_n > a_n + 1, so

a_n/(1 + a_n) > 1/2

Thus (a_n/(1 + a_n))_{n=1}^∞ is a sequence of nonnegative real numbers with infinitely many terms greater than 1/2, so

∑_{n=1}^∞ a_n/(1 + a_n) = ∞

Proposition 2.2.2 (F13.1). Suppose (a_n) is a sequence such that a_n ≥ 0 for all n ∈ N. Let

P_n = ∏_{j=1}^n (1 + a_j)

Prove

lim_{n→∞} P_n < ∞ ⟺ ∑_{n=1}^∞ a_n < ∞

Proof. ( =⇒ ) It suffices to show that the sum is bounded, since each of the terms is nonnegative. Note that

P_n = ∏_{j=1}^n (1 + a_j) = (1 + a_1) ··· (1 + a_n)
= 1 + (a_1 + ··· + a_n) + (a_1a_2 + a_1a_3 + ··· + a_{n−1}a_n) + ··· + (a_1a_2 ··· a_n)
≥ a_1 + a_2 + ··· + a_n

So for each n ∈ N,

∑_{j=1}^n a_j ≤ P_n < ∞


( ⇐= ) Once again, it suffices to show that (P_n) is bounded. Assume

∑_{n=1}^∞ a_n < ∞

Then there exists N ∈ N such that

∑_{n=N}^∞ a_n < 1/2

For k ≥ N, expanding the product,

∏_{j=N}^k (1 + a_j) = 1 + ∑_{j=N}^k a_j + ∑_{N≤j_1<j_2≤k} a_{j_1}a_{j_2} + ··· + (a_N ··· a_k)
≤ 1 + (∑_{j=N}^k a_j) + (∑_{j=N}^k a_j)² + ··· + (∑_{j=N}^k a_j)^{k−N+1}
≤ 1 + (1/2) + (1/2)² + ··· + (1/2)^{k−N+1}
≤ 2

Thus, for every k ≥ N,

∏_{j=1}^k (1 + a_j) = ∏_{j=N}^k (1 + a_j) · ∏_{j=1}^{N−1} (1 + a_j) ≤ 2 · ∏_{j=1}^{N−1} (1 + a_j)

Proposition 2.2.3 (S10.11). Suppose ∑_{n=1}^∞ a_n converges absolutely. Show that every rearrangement of ∑_{n=1}^∞ a_n converges to the same limit. (A rearrangement of ∑_{n=1}^∞ a_n is ∑_{n=1}^∞ a_{σ(n)} where σ : N → N is a bijection.)

Proof. Let a = ∑_{n=1}^∞ a_n. Let ε > 0 be given. We must show that there exists N ∈ N such that for all k ≥ N,

|∑_{n=1}^k a_{σ(n)} − a| < ε

Let N_1 be large enough that

|∑_{n=1}^{N_1} a_n − a| < ε/2 and ∑_{n=N_1+1}^∞ |a_n| < ε/2

where the first inequality is obtained using the convergence of ∑_{n=1}^∞ a_n to a, and the second since ∑_{n=1}^∞ a_n converges absolutely. Let N ≥ N_1 be large enough that

{σ(1), ..., σ(N)} ⊇ {1, ..., N_1}

which is possible as σ is a bijection. Fix k ≥ N, and put

A = {σ(1), ..., σ(k)} \ {1, 2, ..., N_1}

Note A ⊆ {N_1 + 1, N_1 + 2, ...}. Then

|∑_{n=1}^k a_{σ(n)} − a| = |∑_{n=1}^{N_1} a_n − a + ∑_{i∈A} a_i| ≤ |∑_{n=1}^{N_1} a_n − a| + ∑_{i∈A} |a_i| < ε/2 + ε/2 = ε

Proposition 2.2.4 (F08.5). Suppose ∑_{n=1}^∞ a_n converges, but not absolutely. Then for every a ∈ R, there is a rearrangement (a_{σ(n)})_{n=1}^∞ such that ∑_{n=1}^∞ a_{σ(n)} = a.

Proof. The key property here is that the sum of all positive (resp. negative) terms tends to positive (resp. negative) infinity, while the terms themselves converge to 0. Let

X = {n | a_n ≥ 0}
Y = {n | a_n < 0}

Claim 2.2.5. (1) ∑_{n∈X} a_n = ∞

(2) ∑_{n∈Y} a_n = −∞

Proof. Since ∑_{n=1}^∞ a_n converges, each of (1), (2) implies the other, so it is enough to prove that one of them holds (if one of the sums were unbounded but the other finite, the sum over all n would diverge). Suppose for contradiction that both (1) and (2) fail. Then there exists M such that for all k ∈ N,

∑_{n≤k, n∈X} a_n < M and ∑_{n≤k, n∈Y} a_n > −M

Then for all k ∈ N,

∑_{n=1}^k |a_n| = ∑_{n≤k, n∈X} a_n − ∑_{n≤k, n∈Y} a_n < 2M

i.e. the sum converges absolutely, a contradiction.

Note that a_n → 0, since ∑_{n=1}^∞ a_n converges. We will define increasing sequences of thresholds (u_n) and (v_n) inductively. Take σ to be the rearrangement such that σ(1), ..., σ(k_1) are the elements of X which are ≤ u_1; σ(k_1 + 1), ..., σ(j_1) are the elements of Y which are ≤ v_1; σ(j_1 + 1), ..., σ(k_2) are the elements of X which are in (u_1, u_2]; σ(k_2 + 1), ..., σ(j_2) are the elements of Y which are in (v_1, v_2]; and so on. This defines a bijection from N to N: σ is injective by construction because X and Y partition N, and as we will see, σ covers every element of X and Y, so σ is also surjective. Let u_1, v_1 be sufficiently large that

i > u_1, i ∈ X ⟹ a_i < 1/2
i > v_1, i ∈ Y ⟹ a_i > −1/2

This is possible since (a_n) converges to 0. This defines σ up to j_1, so it also determines ∑_{n=1}^{j_1} a_{σ(n)}. Say WLOG that

∑_{n=1}^{j_1} a_{σ(n)} < a

Then pick u_2 so that adding the elements of X in (u_1, u_2] puts the sum between a and a + 1/2; we can do this since a_i < 1/2 for each i ∈ X with i > u_1, and since the positive terms sum to ∞. Similarly, pick v_2 so that adding the elements of Y in (v_1, v_2] puts the sum between a − 1/2 and a. Keep repeating this process until u_k, v_k are sufficiently large that

i > u_k, i ∈ X ⟹ a_i < 1/4
i > v_k, i ∈ Y ⟹ a_i > −1/4

Once again, this is possible since (a_n) converges to 0. Repeat the above step with 1/4 instead of 1/2, then with 1/8, and so on; the partial sums of the rearrangement are eventually trapped within 1/2^m of a for each m, so they are Cauchy with limit a.

Continuity

Definition 2.2.6. Let X, Y be topological spaces. Then f : X → Y is continuous at x_0 ∈ X if for every open neighborhood V of f(x_0), f^{−1}(V) contains an open neighborhood of x_0.
If the topologies are generated by some basic open sets, then f is continuous at x_0 if and only if for every basic open neighborhood V of f(x_0), there exists a basic open neighborhood U of x_0 such that U ⊆ f^{−1}(V), i.e. f(U) ⊆ V.
In particular, in metric spaces (where the topology is generated by open balls), f is continuous at x_0 if and only if for every ε > 0 (standing for the basic open ball B(f(x_0), ε)), there exists δ > 0 (standing for the basic open ball B(x_0, δ)) such that x ∈ B(x_0, δ) implies f(x) ∈ B(f(x_0), ε), i.e. d_X(x, x_0) < δ ⟹ d_Y(f(x), f(x_0)) < ε.

Proposition 2.2.7. Let X and Y be metric spaces. Then f : X → Y is continuous at x if and only if whenever (x_n) converges to x, (f(x_n)) converges to f(x).

Proof. ( =⇒ ) Suppose f is continuous at x, and (x_n) converges to x. Let δ > 0 be given. We must show that there exists N ∈ N such that n ≥ N implies d(f(x_n), f(x)) < δ. Since f is continuous at x, there exists ε > 0 such that

d(y, x) < ε =⇒ d(f(y), f(x)) < δ

Since x_n → x, there exists N ∈ N such that for all n ≥ N, d(x_n, x) < ε, whence d(f(x_n), f(x)) < δ.
( ⇐= ) Fix f, x, and let δ > 0 be given. We must find ε > 0 such that d(x, y) < ε implies d(f(x), f(y)) < δ. Suppose no such ε exists. In particular, for each n, ε = 1/n does not satisfy the desired condition. Then there exists y ∈ X such that d(x, y) < 1/n but d(f(x), f(y)) ≥ δ. Let x_n be some such y for each n ∈ N. Then (x_n) converges to x, but (f(x_n)) does not converge to f(x), a contradiction.

Definition 2.2.8. Let X, Y be topological spaces. Then f : X → Y is continuous if andonly if it is continuous at all x ∈ X.

Proposition 2.2.9. f : X → Y is continuous if and only if for every open V ⊆ Y, f^{−1}(V) is open in X.

Proof. ( =⇒ ) Let f be continuous, V ⊆ Y be open, and put U = f^{−1}(V). Fix x ∈ U. Then by continuity of f at x, since V is an open neighborhood of f(x), U must contain an open neighborhood of x. As this holds for every x ∈ U, U is open.
( ⇐= ) Fix x ∈ X, and let V ⊆ Y be an open neighborhood of f(x). Since V is open, f^{−1}(V) is an open neighborhood of x, whence f is continuous at x.

Corollary 2.2.10. The composition of continuous functions is continuous.

Note: +, −, and · are continuous functions on R; this can be checked directly from the definitions.

Theorem 2.2.11 (Intermediate Value Theorem). Let f : X → R be continuous, and suppose X is connected. If f takes values y_0, y_1 (with y_0 < y_1, say), then f takes every value in (y_0, y_1). Precisely, for every y ∈ (y_0, y_1), there exists x ∈ X such that f(x) = y.

Proof. Suppose y ∈ (y_0, y_1) and for all x ∈ X, f(x) ≠ y. Put

A = {x ∈ X | f(x) < y}; B = {x ∈ X | f(x) > y}

It is clear that A ∩ B = ∅ and X = A ∪ B by assumption. Both A and B are open in X by continuity, since A = f^{−1}((−∞, y)) and B = f^{−1}((y, ∞)). Additionally, both A and B are nonempty, since there exists x ∈ X such that f(x) = y_0 < y and z ∈ X such that f(z) = y_1 > y. This is a contradiction, since X is connected.

Corollary 2.2.12 (W06.6). Let f : [a, b] → R be continuous. Then f takes every valuebetween f(a) and f(b).

Proof. Since we showed earlier that [a, b] is connected, this is immediate by the Intermediate Value Theorem.

2.3 Lecture 7 - Path-Connectedness, Lipschitz Functions and Contractions, and Fixed Point Theorems

Today, we will discuss the idea of path-connectedness, and show that there are sets in R² which are connected but not path-connected. We will also introduce classes of continuous functions with stronger conditions on how much the function can grow between points which are close together. Finally, we will prove some valuable fixed point theorems for these classes of functions.


Definition 2.3.1. A metric space (X, d) is path-connected if for all x0, x1 ∈ X, thereexists a continuous f : [0, 1]→ X such that f(0) = x0, f(1) = x1.

Proposition 2.3.2. If (X, d) is path-connected, then it is connected.

Proof. Suppose A, B are open, nonempty, disjoint subsets such that A ∪ B = X. Pick x_0 ∈ A, x_1 ∈ B. Since (X, d) is path-connected, there exists a continuous f : [0, 1] → X such that f(0) = x_0, f(1) = x_1. Then f^{−1}(A), f^{−1}(B) decompose [0, 1] into disjoint nonempty open sets, which is a contradiction, as [0, 1] is connected.

Proposition 2.3.3. If f : [a, b]→ X is continuous, then im(f) = f([a, b]) is connected.

Proof. Suppose A, B are open, nonempty, disjoint subsets such that A ∪ B = im(f). Proceed as in the proposition above.

Proposition 2.3.4 (S11.11(b)). Give an example of a set in R² which is connected but not path-connected.

Proof. The classic example of a connected subset of R² which is not path-connected is called the Topologist's Sine Curve. Let

Z = {(x, y) ∈ R² | (x = 0 and y = 0) or (x ≠ 0 and y = sin(1/x))}

The idea here is that while the graph cannot be broken up into disjoint open pieces, the curve wiggles so violently near the origin that no continuous curve from [0, 1] can move "quickly" enough to pass through all the points.

Claim 2.3.5. Z is connected.

Proof. Suppose for contradiction that Z = A ∪ B, where A, B are open, disjoint, and nonempty. The topology on Z is inherited from R², so we have A*, B* open in R² such that A = A* ∩ Z, B = B* ∩ Z.
One of A, B must contain the origin. WLOG, say (0, 0) ∈ A. Then (0, 0) ∈ A*, so there exists ε > 0 such that B((0, 0), ε) ⊆ A*. So A has points in both Z⁻ and Z⁺, where

Z⁻ = {(x, y) ∈ Z | x < 0} and Z⁺ = {(x, y) ∈ Z | x > 0}

B has points in at least one of Z⁻, Z⁺, since (0, 0) ∉ B and B is nonempty. Suppose WLOG that B ∩ Z⁻ ≠ ∅. Thus, A and B both have points in Z⁻. Then A ∩ Z⁻ and B ∩ Z⁻ are nonempty, disjoint, open subsets of Z⁻, so Z⁻ is not connected. But by the previous proposition, Z⁻ must be connected, as it is the image of the continuous function x ↦ (x, sin(1/x)) on the interval (−∞, 0).

Claim 2.3.6. Z is not path connected.

31

Proof. Let (x0, sin(1/x0)) = (x0, y0) and x1 = 0, y1 = 0 be in Z with x0 < 0. Supposefor contradiction that there exists continuous f : [0, 1] → Z such that f(0) = (x0, y0) andf(1) = (x1, y1). Note that for any α ∈ [0, 1] we may write f(α) as (fX(α), fY (α)) with fX , fYcontinuous by composing f with the continuous projections to the x and y-axis respectively.Let (un) be an increasing sequence converging to 0 such that fX(0) = x0 < un < 0 = fX(1)and

sin

(1

un

)=

1 if n is even

−1 if n is odd

By the intermediate value theorem, there exists α0 ∈ [0, 1] such that fX(α0) = u0. Thenby the Intermediate Value Theorem applied inductively, we have αn < αn+1 < 1 such thatfX(αn+1) = un+1. Since (αn) is an increasing sequence which is bounded above, it mustconverge to some limit, say α 6 1.Since fX is continuous, fX(αn) must converge to fX(α). Since fX(αn) = un and un convergesto 0, we must have fX(α) = 0. Since (fX(α), fY (α)) ∈ Z, we must have fY (α) = 0. Bythe continuity of fY , this means that (fY (αn)) converges to 0. But (fX(αn), fY (αn)) =(un, sin(1/un)) ∈ Z, so

fY (αn) = sin

(1

un

)=

1 if n is even

−1 if n is odd

whence (fY (αn)) does not converge.

This completes the proof.

Theorem 2.3.7 (Extreme Value Theorem). Continuous functions f : X → R on compactspaces X take minimum and maximum values.

Proof. Suppose f does not attain a maximum on X. Then there exists a sequence (yn)n∈N ⊆im(f) such that (∗) for all x ∈ X, there exists n ∈ N such that f(x) < yn. Set An =f−1((−∞, yn)); note each An is open by the continuity of f . By (∗), Ann∈N is an opencover for X. By compactness, there exists k ∈ N finite such that

X ⊆⋃n6k

An = f−1((−∞,maxy0, . . . , yk))

But this is a contradiction, since maxy0, . . . , yk = yi for some i ∈ 1, . . . , k, so yi ∈ im(f).The proof is similar for minmum values.Here is an alternative proof. Note that the continuous image of a compact set is compact(we will prove this later). Hence, f(X) ⊆ R is compact, whence it is closed and bounded byHeine-Borel. Since f(X) is bounded, it has a least upper bound α. Note α is a limit pointof f(X) (every open ball about α must contain a point of f(X), or else α would not be theleast upper bound for f(X)), whence α ∈ f(X) since f(X) is closed. Hence, there existsx ∈ X such that f(x) = α. The proof that f takes minimum values is nearly identical.

Definition 2.3.8. Let (X,TX), (Y,TY ) be topological spaces. The product topology onX × Y is the topology generated by U × V, U ∈ TX , V ∈ TY .

32

Proposition 2.3.9. Let (X, d) be a metric space. Then d : X ×X → R is continuous.

Proof. Fix x, y ∈ X2. Let α = d(x, y), and let ε > 0 be given. We must find some openneighborhood U × V of (x, y) in X2 such that for all (x, y) ∈ U × V , d(x, y) ∈ B(α, ε). TakeU = B(x, ε/2), V = B(y, ε/2). Then for all (x, y) ∈ U × V , we have

d(x, y) 6 d(x, x) + d(x, y) + d(y, y) =⇒ d(x, y)− α < ε

andd(x, y) 6 d(x, x) + d(x, y) + d(y, y) =⇒ α− d(x, y) < ε

so|d(x, y)− α| < ε

i.e., d(x, y) ∈ B(α, ε).

Fixed Point Theorems

Definition 2.3.10. Let X, Y be metric spaces. Then f : X → Y is Lipschitz with con-stant L if for all x1, x2 ∈ X,

dY (f(x1), f(y1)) 6 L · (dX(x1, x2))

Proposition 2.3.11. All Lipschitz functions are continuous.

Proof. If f is Lipschitz with constant L, then for any point x ∈ X, take δ = ε/L. (Note thatthis also shows that all Lipschitz functions are uniformly continuous; see Lecture 8).

Definition 2.3.12. f : X → Y is a contraction if it is Lipschitz with constant L < 1.

Theorem 2.3.13 (W08.1(b) - Banach Fixed Point Theorem). Let f : X → X be a contrac-tion, where X is a complete metric space. Then f has a fixed point, i.e. there is x ∈ X suchthat f(x) = x. Morever, the fixed point is unique.

Proof. Uniqueness: Suppose f(x) = x, f(y) = y. Since f is a contraction, d(f(x), f(y)) 6L · d(x, y) for some L < 1. But f(x) = x, f(y) = y, so d(f(x), f(y)) = d(x, y), so it must bethat d(x, y) = 0, whence x = y.Existence: There’s not much we can do here, so we start by picking an arbitrary point inX and applying f . Then we hope for the best.Let x0 ∈ X. Set xn+1 = f(xn). For every n ∈ N,

(∗)d(xn+2, xn+1) = d(f(xn+1), f(xn)) 6 L · d(xn+1, xn) < d(xn+1, xn)

Using (∗) by induction, we obtain

d(xn+1, xn) 6 Ln · d(x1, x0)

33

Additionally, for every k > n+ 1, we have

d(xn+1, xk) 6d(xn+1, xn+2) + d(xn+2, xn+3) + · · ·+ d(xk−1, xk)

6L · d(xn, xn+1) + L2 · d(xn, xn+1) + · · ·+ Lk−(n+1)d(xn, xn+1)

<d(xn, xn+1) ·∞∑i=1

Li

=d(xn, xn+1) · L

1− L

<d(x0, x1) · Ln+1

1− L

In particular, we see (xn) is Cauchy. By completeness, xn → x for some x ∈ X. Thenxn+1 → x also, so f(xn)→ x. Finally, by continuity, we have

x = limn→∞

f(xn) = f( limn→∞

xn) = f(x)

Theorem 2.3.14 (W08.1(a) - Brouwer Fixed Point Theorem). Let g : [a, b] → [a, b] becontinuous. Prove g has a fixed point.

Proof. Let f(x) = g(x) − x; note f is continuous. We have g(a) > a, since g(a) ∈ [a, b], sof(a) > 0. Similarly, f(b) 6 0. Hence, by the Intermediate Value Theorem, there exists somex ∈ [a, b] such that f(x) = 0, i.e. g(x)− x = 0, so g(x) = x.

Proposition 2.3.15 (F11.1). Let f : X → X. Suppose for all x 6= y, d(f(x), f(y)) < d(x, y).Suppose X is compact. Prove f has a unique fixed point.

Proof. Uniqueness: If f(x) = x, f(y) = y for x 6= y, then d(x, y) = d(f(x), f(y)) < d(x, y),a contradiction.Existence: Let g(x) = d(x, f(x)). Since each of the component functions x 7→ x and x 7→f(x) are continuous and d is continuous, g : X → [0,∞) is continuous. Since X is compact, gattains a minimum value. Let x ∈ X be such that g(x) is a minimal value of g on X. Supposeg(x) = d(x, f(x)) = α > 0. Then by our assumption on f , d(f(x), f(f(x))) < d(x, f(x) = α,contradicting the minimality of α. This finishes the proof, since g(x) = d(x, f(x)) = 0, sox = f(x).

2.4 Lecture 8 - Uniformity, Normed Spaces and Sequences ofFunctions

Today, we will discuss the idea of uniform continuity. We will also introduce pointwise anduniform convergence of sequences of functions.

Uniformity

34

Definition 2.4.1. Let X, Y be metric spaces. Recall f : X → Y is continuous if for everyx ∈ X and ε > 0, there exists δ > 0 such that d(x, y) < δ =⇒ d(f(x), f(y)) < ε.We say f : X → Y is uniformly continuous if for every ε > 0, there exists δ > 0 such thatfor all x ∈ X, d(x, y) < δ =⇒ d(f(x), f(y)) < ε.

Proposition 2.4.2. If a function is uniformly continuous, it is continuous.

Proposition 2.4.3. Lipschitz functions are uniformly continuous.

Proof. Suppose f is Lipschitz with constant L. Then for all ε > 0, d(x, y) < ε/L =⇒d(f(x), f(y)) < ε.

Theorem 2.4.4. Any continuous function f on a compact space X is uniformly continuous.

Proof. Fix f, ε > 0. We must find a δ > 0 such that for all x, y ∈ X, d(x, y) < δ =⇒d(f(x), f(y)) < ε. By continuity, for every z ∈ x, there exists δz > 0 such that d(x, z) <δz =⇒ d(f(x), f(z)) < ε/2. Then for all x, y ∈ B(z, δz),

(∗)d(f(x), f(y)) 6 d(f(x), f(z)) + d(f(z), f(y)) <ε

2+ε

2= ε

Note that B(z, δz/2) | z ∈ X is an open cover for X. By compactness, there are z1, . . . , zksuch that B = B(z1, δz1/2), . . . , B(zk, δzk/2) cover X. Set δ = minδz1 , . . . , δzk. Supposed(x, y) < δ. Note x ∈ B(zi, δzi/2) for some i ∈ 1, . . . , k, since B is an open cover forX. Since d(x, y) < δ 6 δzi/2, by the triangle inequality we have y ∈ B(zi, δzi). By (∗),d(f(x), f(y)) < ε.

Proposition 2.4.5 (S04.2). f(x) =√x is uniformly continuous on [0,∞).

Proof. We break the proof into two steps, which we glue together in the third step.Step 1: Observe

√x is uniformly continuous on [0, 3] by the previous theorem, since

√x is

continuous∗ on [0, 3] and [0, 3] is compact.Step 2: We show f is Lipschitz on [1,∞), hence uniformly continuous. Note

√x =√y + (

√x−√y)

so on [1,∞)x = y + 2

√y(√x−√y) + (

√x−√y)2

=⇒ x− y = (√x−√y)2 + 2

√y(√x−√y) > 2

√y(√x−√y) > 2(

√x−√y)

Thus, on [1,∞)√x−√y 6

1

2(x− y)

Step 3: We now need to merge these two observations to get uniform continuity on thefull space [0,∞). Fix ε > 0. By steps 1, 2, there exist δ1, δ2 > 0 such that x, y ∈ [0, 3]and |x − y| < δ1 implies |f(x) − f(y)| < ε, and x, y ∈ [1,∞) and |x − y| < δ2 implies|f(x) − f(y)| < ε. Take δ = min(δ1, δ2, 1). If |x − y| < δ, then either x, y are both in [0, 3]or x, y are both in [1,∞), and in each case |f(x)− f(y)| < ε.

35

∗ Of course, you should now be asking: why is√x continuous in the first place? In fact,

a much more general class of functions is continuous, of which√x is a particular example.

Proposition 2.4.6. The forward image of compact sets by continuous functions are compact.

Proof. Let f : X → Y be continuous, and let K ⊆ X be compact. Let Vii∈I be anopen cover of f(K). Then f−1(Vi)i∈I is an open cover of K by the continuity of f . Bycompactness of k, there is a finite subcover f−1(Vi1), . . . , f

−1(Vin). Then Vi1 , . . . , Vin coverK.

Proposition 2.4.7. If f : X → Y is a continuous bijection, Y is Hausdorff, and X iscompact, then f−1 : Y → X is also continuous. In particular, this shows

√x is continuous

on [0,M ] for each M , hence on [0,∞).

Proof. Fix U ⊆ X open. We must show f(U) is open. Let K = X \U . Then K is compact,since X is compact and U is open. Note f(U) = Y \ f(X \U) since f is a bijection. By theprevious proposition, f(X \ U) is compact, hence closed, since Y is Hausdorff. So f(U) isopen.

Limits of functions

Definition 2.4.8. Let X, Y be metric spaces, E ⊆ X, and f : E → X. Let a ∈ X (withpossibly a /∈ E) with points of E arbitrarily close to A. Then we say limx→a,x∈E f(x) existsand is equal to y if for every ε > 0, there exists δ > 0 such that d(x, a) < δ, x ∈ E, x 6= aimplies d(f(x), f(y)) < ε.Note: some people allow x = a in the limit. We will also use this with a = ±∞.

Example 2.4.9. Let f(x) = 1/x from (0,∞) to (0,∞). Then limx→∞ f(x) = 0.

Convergence of sequences of functions

Definition 2.4.10. Let fn : X → Y , n ∈ N be a sequence of functions. Let f : X → Y . Wesay (fn) converges to f pointwise on X if for all x ∈ X, limn→∞ fn(x) exists and equalsf(x).

Example 2.4.11. Take fn : [0, 1] → [0, 1] given by fn(x) = xn. Then fn(x) convergespointwise to

f(x) =

0 if x ∈ [0, 1)

1 if x = 1

This is an example of a pointwise limit of continuous functions which is not continuous.Another way to put it is:

limn→∞

limx→a

fn(x) need not equal limx→a

limn→∞

f(x)

even if both limits exist (e.g., in the example above, take a = 1).

36

Definition 2.4.12. Let (fn) be a sequence of functions from X to Y and let f : X → Y .Then (fn) converges to f uniformly on X if for every ε > 0, there exists N ∈ N such thatfor all x ∈ X,n > N , d(fn(x), f(x)) < ε.

The key difference between uniform convergence and pointwise convergence is that inuniform convergence, N does not depend on the choice of x. This is similar to the case ofuniform continuity, which differs from continuity in that δ is independent of the choice of x.

Proposition 2.4.13. If (fn) converges to f uniformly, (fn) converges to f pointwise.

Proposition 2.4.14. Suppose (fn) converges to f uniformly. Suppose fn is continuous atx0 ∈ X for each n ∈ N. Then so is f .

Proof. Let ε > 0 be given. We need to find δ such that d(x0, y) < δ =⇒ d(f(x0), f(y)) < ε.Since (fn) converges to f uniformly, there exists N ∈ N such that for all n > N, y ∈X, d(fn(y), f(y)) < ε/3. By the continuity of fN at x0, there exists δ > 0 such that d(x0, y) <δ =⇒ d(fN(x0), fn(y)) < ε/3. Then if d(x0, y) < δ, we have

d(f(x0), f(y)) 6 d(f(x0), fN(x0)) + d(fN(x0), fN(y)) + d(fN(y), f(y)) <ε

3+ε

3+ε

3= ε

Corollary 2.4.15. If (fn) converges uniformly to f on X, and fn is continuous on X foreach n ∈ N, then so is f . If (fn) converges uniformly to f on X, and limx→x0 limn→∞ fn(x)exists, then so does limn→∞ limx→x0 fn(x), and the two limits are equal.

Normed Spaces

Definition 2.4.16. Let V be a vector space. A norm on V is a function || · || : V → [0,∞)such that

(1) If x 6= 0 then ||x|| 6= 0

(2) ||αx|| = |α|||x|| (in particular, ||~0|| = 0)

(3) (triangle inequality) ||x+ y|| 6 ||x||+ ||y||

We can then define d(x, y) = ||x− y||. This will be a metric on V .

Definition 2.4.17. For any set X, let

B(X) = f | f is a bounded function (both above and below) from X → R

Then ||f || := supx∈X |f(x)| is a norm on B(X) called the sup norm.

Note then that for (fn) ⊆ B(X) and f ∈ B(X), (fn) conveges to f uniformly on X ifand only if (fn) converges to f in the metric on B(X). We can also consider convergence

37

of sums of functions. For a sequence of functions (fn) between metric spaces X and Y , foreach x ∈ X we take

∞∑n=1

fn(x)

to be the limit as k →∞ of

sk(x) =k∑

n=1

fn(x)

Theorem 2.4.18. If (fn) consists of bounded, continuous functions on X and

∞∑n=1

||fn||sup

converges, then∑∞

n=1 fn converges uniformly to a continuous function f .

Proof. (Exericse). Look at tail sums, and use uniform convergence.

Getting a convergent subsequence

Definition 2.4.19. A sequence of functions (fn)∞n=1 is (pointwise) equicontinuous at x0 iffor every ε > 0, there exists δ such that for all n ∈ N, d(x0, y) < δ implies d(fn(x0), fn(y)) <ε. Note δ may depend on x0, but not on n!

Definition 2.4.20. A sequence of functions (fn)∞n=1 is pointwise bounded if for everyx0 ∈ X, there exists M such that for all n ∈ N, |fn(x0)| < M . (At each point, the boundmight be different.)

Theorem 2.4.21 (Arzela-Ascoli). Let X be a separable metric space. Suppose fn : X → R,and

(i) (fn) is pointwise bounded

(ii) (fn) is pointwise equicontinuous

Then (fn) has a subsequence which converges pointwise to a continuous function. Moreover,the convergence is uniform on compact subsets of X. Note : If X is a compact metricspace, then X is automatically separable.

38

3 Week 3

As per the syllabus, Week 3 topics include: definition of derivative, derivative for inversefunction, local maxima and minima, Rolle’s theorem, mean value theorem, Rolle’s the-orem for higher order derivatives and applications to error bounding for approximationsby Lagrange interpolations, monotonicity, L’Hopital’s rule, uniform convergence limits ofderivatives (in homework), upper and lower Riemann integrals, upper and lower Riemannsums, definition of the Riemann integral, integrability of bounded continuous functions onbounded intervals, basic properties of the Riemann integral, integrability of mins, maxes,sums, and products, Riemann-Stieltjes integral, the fundamenal theorems of calculus, inte-gration by parts, change of variables in integration, improper integrals, integrals of uniformconvergence limits (in homework), Cauchy-Schwarz inequality.

3.1 Lecture 9 - Arzela-Ascoli, Differentiation and Associated Rules

Today, we will discuss the Arzela-Ascoli theorem. We will also introduce differentiation,and discuss some important rules which allow us to compute derivatives of certain classes offunctions.

Recall, from last time:

Theorem 3.1.1 (Arzela-Ascoli). Let X be a separable metric space. Suppose fn : X → R,and

(i) (fn) is pointwise bounded

(ii) (fn) is pointwise equicontinuous

Then (fn) has a subsequence which converges pointwise to a continuous function. Moreover,the convergence is uniform on compact subsets of X. Note : If X is a compact metricspace, then X is automatically separable.

Proof. Let D ⊆ X be countable and dense (such a D exists since X is separated). We willproceed in four parts.

(1) Get a subsequence which converges pointwise on D (using pointwise boundedness)

(2) Show it converges pointwise everywhere on X (using equicontinuity)

(3) Check the limit is continuous (using equicontinuity)

(4) Show uniform convergence on compact sets

Part 1: We will show (fn) has a subsequence which converges on D. Let (zl)∞l=1 enumerate

D (this is possible since D is countable). The idea here is that we will find a subsequence of(fn) which converges at zl for each l ∈ N. We will then patch these subsequences togetherin a way that one particular subsequence of (fn) converges at zl for every l ∈ N.Let A0 = N. By induction, we define Al+1 ⊆ Al such that (fn(zl))n∈Al+1

converges. To doso, suppose Al has been defined. The sequence (fn(zl))n∈Al

is a bounded sequence in R since

39

(fn) is pointwise bounded, so it has a convergent subsequence. Take Al+1 ⊆ Al to be suchthat (fn(zl))n∈Al+1

is a convergent subsequence of (fn(zl))n∈Al. In this way, (fn(z0))n∈A1

is a convergent subsequence of (fn(z0))n∈A0 , (fn(z1))n∈A2 is a convergent subsequence of(fn(z1))n∈A1 , and so on, with A0 ⊇ A1 ⊇ A2 ⊇ · · · .If we naively take the intersection Ai over all i ∈ N, we might end up with an empty set,so we have to be careful. We want to arrange it so that we use the nestedness of the Ais tocapture a subsequence which has all of the tail ends. Precisely, for k ∈ N, let nk be the leastnatural number such that nk > k and nk ∈ Ak. Then for every l, nk ∈ Al for all k > l, sinceAk ⊆ Al for each k > l, and nk ∈ Ak for each k ∈ N. Hence, (nk) is eventually contained inAl for each l ∈ N.Hence, for every l ∈ N, a tail end of (fnk

(zl))∞k=1 is contained in (fn(zl))n∈Al+1

. Since(fn(zl))n∈Al+1

converges, so (fnk(zl))

∞k=1.

Part 2: Now we want to how (fnk(x))∞k=1 converges for all x ∈ X. The main tool we’ll use

here is equicontinuity; this allows us to leverage the continuity of fn on all of X for eachx ∈ X independently of our index n. We make this precise below.Fix x ∈ X, ε > 0. It suffices to find N ∈ N such that for all k1, k2 > N , |fnk1

(x)−fnk2(x)| < ε.

Thus, the sequence is Cauchy, hence converges. By equicontinuity of (fn), there exists δ > 0such that d(z, x) < δ implies for all n ∈ N, |fn(z)−fn(x)| < ε/3. By the density of D, thereexists some z ∈ D such that d(z, x) < δ. By part (1), since (fnk

(z)) converges for each z ∈ D,it is Cauchy, whence there exists N ∈ N such that for all k1, k2 > N, |fnk1

(z)−fnk2(z)| < ε/3.

Then for all k1, k2 > N ,

|fnk1(x)−fnk2

(x)| 6 |fnk1(x)−fnk1

(z)|+|fnk1(z)−fnk2

(z)|+|fnk2(z)−fnk2

(x)| < ε

3+ε

3+ε

3= ε

where we have arrived at the first term using equicontinuity, the second term using part (1),and the third term using equicontinuity again.Part 3: For x ∈ X, let f(x) = limk→∞ fnk

(x); this limit exists, as established by part (2).We now show f is continuous. Fix ε > 0 and x ∈ X. Let δ > 0 be given such that d(x, z) < dimplies |fnk

(x)− fnk(z)| < ε/3 for all k ∈ N; such a delta exists and is independent of nk by

equicontinuity.Our strategy is now essentially the same as that of part (2). We use the convergence of oursubsequence on all of X from part (2) together with equicontinuity to establish the continuityof f . Fix z ∈ X. Since fnk

(x)→ f(x) and fnk(z)→ f(z), for k sufficiently large, we have

|fnk(x)− f(x)| < ε

3and |fnk

(z)− f(z)| < ε

3

Then

|f(x)− f(z)| 6 |f(x)− fnk(x)|+ |fnk

(x)− fnk(z)|+ |fnk

(z)− f(z)| < ε

3+ε

3+ε

3= ε

where we have arrived at the first term using convergence from part (2), the second termusing equicontinuity, and the third term using convergence again.Part 4: Let K be a compact subset of X. To prove that (fnk

) converges to f uniformly onK, we will first establish uniform convergence on small neighborhoods. Using compactness,we will patch this convergence together on an open cover, then trim down to a finite set of

40

neighborhoods, i.e. subcover. We make this precise below.We want to show that for every ε > 0, there exists N ∈ N such that for all n > N ,

|fn(x)− f(x)| < ε

for all x ∈ X. Here, N is independent of x. Let ε > 0 be given. Let x ∈ X. Since (fnk) is

equicontinuous, there exists δ1x > 0 depending on ε and x such that for all y ∈ B(x, δ1

x),

|fnk(x)− fnk

(y)| < ε

3

for all k ∈ N. Since f is continuous on all of X, there similarly exists δ2x > 0 such that for

all y ∈ B(x, δ2x),

|f(x)− f(y)| < ε

3

Finally, since (fnk) converges to f pointwise, there exists Nx ∈ N such that for all j > Nx,

we have|fnj

(x)− f(x)| < ε

3

Put δx = minδ1x, δ

2x. Putting these conditions together, for all j > Nx

|fnj(y)− f(y)| 6 |fnj

(y)− fnj(x)|+ |fnj

(x)− f(x)|+ |f(x)− f(y)| 6 ε

3+ε

3+ε

3= ε

for all y ∈ B(x, δx). Cover K with the open balls B(x, δx) for each x ∈ X. Then there existsa finite subcover B(x1, δx1), . . . , B(xl, δxl). Put N = maxNx1 , . . . , Nxl. Then as shownabove, for all j > N , we have

|fnj(x)− f(x)| < ε

for each x ∈ K, completing the proof.

DerivativesWe will work in R throughout our discussion of derivatives, though there are select proposi-tions we will prove that hold for general metric spaces.

Definition 3.1.2. Let X ⊆ R, f : X → R, and x0 ∈ X be a limit point of X. We say f isdifferentiable at x0 if

limx→x0

f(x)− f(x0)

x− x0

exists. The derivative of f at x0, denoted f ′(x0), is the value of the limit above. If thelimit does not exist, f is not differentiable at x0. Finally, f is differentiable on X if fis differentiable at every point x0 ∈ X.

Example 3.1.3. We compute some fundamental examples of derivatives below.

(1) Let f(x) = c for some c ∈ R. Then

limx→x0

f(x)− f(x0)

x− x0

= limx→x0

c− cx− x0

= limx→x0

0 = 0

So f is differentable on R, and f ′(x) = 0 for all x ∈ R.

41

(2) Let f(x) = x on R. Then

limx→x0

f(x)− f(x0)

x− x0

= limx→x0

x− x0

x− x0

= limx→x0

1 = 1

So f is differentable on R, and f ′(x) = 1 for all x ∈ R.

(3) Let f(x) = xn for some n > 2. Then

limx→x0

f(x)− f(x0)

x− x0

= limx→x0

xn − xn0x− x0

= limx→x0

xn−1 + x0xn−2 + · · ·+ xn−1

0

=xn−10 + x0x

n−20 + · · ·+ xn−1

0

=nxn−10

where the third step is true by the continuity of multiplication and addition on R (hence,polynomials are continuous functions). So f is differentable on R, and f ′(x) = nxn−1

for all x ∈ R.

(4) Let f(x) = 1/x for x ∈ R \ 0. Then

limx→x0

f(x)− f(x0)

x− x0

= limx→x0

1x− 1

x0

x− x0

= limx→x0

x0−xx0x

x− x0

= limx→x0

−1

x0x

=−1

x20

where the fourth step is justified by the continuity of multiplication and division onR \ 0. So f is differentable on R \ 0, and f ′(x) = −1/x2 for all x ∈ R.

Proposition 3.1.4. If f is differentiable at x0, then f is continuous at x0.

Proof. Fix ε > 0. We will use a lot less than full differentiability here. Since

limx→x0

f(x)− f(x0)

x− x0

= f ′(x0)

there exists δ > 0 such that

|x− x0| < δ =⇒∣∣∣∣f(x)− f(x0)

x− x0

− f ′(x0)

∣∣∣∣ < 1

and this implies ∣∣∣∣f(x)− f(x0)

x− x0

∣∣∣∣ < |f ′(x0)|+ 1

42

so|f(x)− f(x0)| < |x− x0|(|f ′(x0)|+ 1)

Hence, it follows that

|x− x0| < min

δ,

ε

|f ′(x0) + 1|

=⇒ |f(x)− f(x0)| < ε

Proposition 3.1.5 (Chain Rule). Let f : X → Y ⊆ R, and g : Y → R be continuous. Letx0 ∈ X, y0 = f(x0). Suppose f is differentiable at x0 and g is differentiable at y0. Then(g f) is differentiable at x0, and (g f)′(x0) = g′(y0)f ′(x0) = g′(f(x0))f ′(x0).

Proof. This proof comes from Rudin’s “Principles of Mathematical Analysis”, Chapter 5,page 105. By the definition of the derivative,

f(x)− f(x0) = (x− x0)[f ′(x0) + ε(x)]

g(y)− g(y0) = (y − y0)[g′(y0) + δ(y)]

where x ∈ X, y ∈ Y , and ε(x)→ 0 as x→ x0, δ(y)→ 0 as y → y0. Let y = f(x). Then bythe second expression, we get

(g f)(x)− (g f)(x0) = g(f(x))− g(f(x0)) = [f(x)− f(x0)] · [g′(y0) + δ(y)]

By the first expression, this implies

(g f)(x)− (g f)(x0) = (x− x0) · [f ′(x0) + ε(x)] · [g′(y0) + δ(y)]

Then if x 6= x0, we find

(g f)(x)− (g f)(x0)

x− x0

= [f ′(x0) + ε(x)] · [g′(y0) + δ(y)]

Taking the limit at x → x0, y → limx→x0 f(x) = f(x0) by the continuity of f , so that theRHS tends to

g′(y0)f ′(x0)

whence the conclusion follows.

Corollary 3.1.6. Let f(x) = 1xn

on R \ 0. Then f is differentiable on R \ 0 andf ′(x) = −n

xn+1 .

Proof. Let g(y) = 1y

on R \ 0. Take h(x) = xn on R. Then f = g h, so by the chain rule,

f ′(x0) = g′(y0) · h′(x0), where y0 = h(x0)) = xn0 , so

f ′(x0) =−1

y20

· (n · xn−10 ) =

−1

x2n0

· (nxn−10 ) =

−nxn+1

0

So for f(x) = x−n, f ′(x) = −n · x−n−1.

43

Proposition 3.1.7. Suppose f : X → Y, g : Y → X with X, Y ⊆ R and (f g)(y) = y forall y ∈ Y , and (g f)(x) = x for all x ∈ X. Let x0 ∈ X, y0 ∈ Y such that y0 = f(x0) andx0 = f(y0), and suppose f ′(x0), g′(y0) exist. Then

g′(y0) =1

f ′(x0)

Proof. Proof is immediate from the chain rule, and the fact that the derivative of the identityis 1.

We will slightly weaken the assumptions of the previous proposition to obtain an identicalresult.

Proposition 3.1.8. Suppose f : X → Y is bijective for X, Y ⊆ R and g = f−1. Letx0 ∈ X, y0 ∈ Y such that y0 = f(x0) and x0 = f(y0). Suppose

(i) f is differentiable at x0

(ii) f ′(x0) 6= 0

(iii) g is continuous on Y

Then g is differentiable at y0, and

g′(y0) =1

f ′(x0)

Proof. Fix ε > 0. We must find δ > 0 such that

|y − y0| < δ =⇒∣∣∣∣g(y)− g(y0)

y − y0

− 1

x0

∣∣∣∣ < ε

By the continuity of u 7→ 1/u on R \ 0, and since f ′(x0) 6= 0, there exists ε1 > 0 such that

(∗)|u− f ′(x0)| =⇒∣∣∣∣1u − 1

f ′(x0)

∣∣∣∣ < ε

Since f is differentiable at x0, there exists ε2 > 0 such that

|x− x0| < ε2 =⇒∣∣∣∣f(x)− f(x0)

x− x0

− f ′(x0)

∣∣∣∣ < ε1

By the continuity of g, there exists δ > 0 such that

y − y0| < δ =⇒ |g(y)− g(y0)| < ε2

Then setting x = g(y) and using g(y0) = x0, we get∣∣∣∣f(x)− f(x0)

x− x0

− f ′(x0)

∣∣∣∣ =

∣∣∣∣ y − y0

g(y)− g(y0)− f ′(x0)

∣∣∣∣ < ε1

so by (∗), we have ∣∣∣∣g(y)− g(y0)

y − y0

− 1

f ′(x0)

∣∣∣∣ < ε

44

Corollary 3.1.9. Let g(x) = n√x = x1/n. Then g is differentiable on (0,∞), and g′(x) =

(1/n) · x1/n−1.

Proof. Note g is the inverse of f(x) = xn. By an earlier proposition (2.4.7) in lecture8, g is continuous on (0,∞). Put let x0 ∈ (0,∞) and put y0 = f(x0) = xn0 , so that

x0 = g(y0) = y1/n0 . Using the previous proposition, we have

g′(y0) =1

nxn−10

=1

ny(n−1)/n0

=1

n· y1/n−1

0

Corollary 3.1.10. For any rational α, the function f(x) = xα is differentiable on (0,∞),and f ′(x) = αxα−1.

Proof. This is an easy consequence of the chain rule on the maps x 7→ xn, x 7→ x1/k, togetherwith the previous propositions computing their derivatives.

We conclude with a few more basic differentiation rules.

Proposition 3.1.11. Let f, g : X → R be differentiable functions on some X ⊆ R. Then

(1) (f + g)′ = f ′ + g′

(2) (fg)′ = f ′g + g′f (product rule)

3.2 Lecture 10 - Applications of Differentiation: Mean Value The-orem, Rolle’s Theorem, L’Hopital’s Rule and Lagrange Inter-polation

Today, we will highlight some useful applications of derivatives. Chief among them areRolle’s Theorem and the Mean Value Theorem (of which Rolle’s Theorem is a special case).We will also discuss how derivatives can be used to analyze polynomial approximation viaLagrange Interpolation. Finally, we will prove several incarnations of L’Hopital’s rule, whichis a useful tool for computing certain kinds of limits.

Proposition 3.2.1 (Newton’s approximation). f is differentiable at x0 with derivative L ifand only if for every ε > 0, there exists δ > 0 such that for all x,

(∗)|x− x0| < δ =⇒ |f(x)− l(x)| < ε|x− x0|

where l(x) is the line through (x0, f(x0)) with slope L, i.e. l(x) = f(x0) + (x− x0).Informally, this proposition says f has derivative L at x0 if and only if the line with slope Lthrough (x0, f(x0) tracks the curve very closely around x0.

Proof. Notef(x)− l(x)

x− x0

=f(x)− f(x0)

x− x0

− L

Hence, the equivalence between the prescribed condition and differentiability at x0 is clear.

45

Definition 3.2.2. Let f : X → R. We say x0 is a local maximum of f if there existsδ > 0 such that for all x ∈ X, |x− x0| < δ implies f(x) 6 f(x0). Local minima are definedsimilarly.

Theorem 3.2.3. Let f : X → R. Let x0 ∈ X. Suppose x0 is a limit point of both (−∞, x0)∩X and (x0,∞)∩X. Suppose f is differentiable at x0, and x0 is a local maximum of minumumof f . Then f ′(x0) = 0.

Proof. The idea here is that if x0 is a local max (resp. min) of f : X → R, and there areelements of X to the right of x0 arbitrarily close to x0, and similarly from the left, and (∗)in the proposition above holds, then it must be that L = 0.Suppose L > 0. Then using (∗) with ε = 1/2L, we get that for x close enough to x0 withx > x0,

l(x)− L

2(x− x0) < f(x) < l(x) +

L

2(x− x0)

Since l(x) = f(x0) + (x− x0)L, this implies

f(x0) < f(x0) +L

2(x− x0) < f(x)

which is a contradiction, since x0 is a local maximum. If L < 0, we get a similar contradictionusing points x such that x < x0.

Theorem 3.2.4 (Rolle’s Theorem). Let f : [a, b] → R be continuous, and suppose f isdifferentiable on (a, b), where a < b. Suppose f(a) = f(b). Then there exists w ∈ (a, b) suchthat f ′(w) = 0.

Proof. If for all x ∈ (a, b), f(x) = f(a) = f(b), then f is constant on [a, b], so f ′(x) = 0 forall x ∈ (a, b), and the theorem holds.Suppose there exists c ∈ (a, b) such that f(c) 6= f(a). Assume f(c) > f(a). Since f iscontinuous on [a, b], it attains some maximal value, call it M , and let w ∈ [a, b] such thatf(w) = M . Then M > f(c) > f(a) = f(b), so w ∈ (a, b). W is then a local maximum off . By the previous theorem f ′(w) = 0. The case where f(c) < f(a) is dealt with similarly,the key being that continuous functions attain maximum and minimum values on compactsets.

Theorem 3.2.5 (Mean Value Theorem). Let a < b, and let f : [a, b] → R be continuous,and differentiable on (a, b). Then there exists x ∈ (a, b) such that

f ′(x) =f(b)− f(a)

b− aProof. The geometric picture you should have in your head is this. For a function which iscontinuous on an closed interval and differentiable on the open interval, between any twopoints in the interval, there exists a third point in between the two chosen such that thetangent line to the curve at that point is parallel to the secant line through the first twopoints.Let l(x) be the line between (a, f(a)) and (b, f(b)). Precisely,

l(x) = f(a) +(f(b)− f(a))(x− a)

b− a

46

Note

l′(x) =f(b)− f(a)

b− aPut g(x) = f(x)− l(x). Then

g′(x) = f ′(x)− f(b)− f(a)

b− a

and g(a) = g(b) = 0. By Rolle’s theorem, there exists x ∈ (a, b) such that g′(x) = 0. Then

f ′(x) =f(b)− f(a)

b− a

Example 3.2.6. For every rational α ∈ (0, 1) and every x, y > 1, |yα−xα| 6 |y−x|. Hence,x 7→ xα is Lipschitz on [1,∞). The proof uses the Mean Value Theorem, as one mightimagine.Fix x, y ∈ [1,∞), and assume WLOG x < y. Let f be the function x 7→ xα. From a previousproposition, we know f ′(x) = αxα−1. By the mean value theorem applied to [x, y], thereexists z ∈ (x, y) such that

f(y)− f(x)

y − x= f ′(x) = αzα−1 6 zα−1 =

1

z1−α 6 1

since z > 1. Hencef(y)− f(x) = yα − xα 6 y − x

Theorem 3.2.7 (Higher Order Rolle’s Theorem). Let f : [a, b]→ R be continuous. Supposef is n-times differentiable on (a, b), i.e. f ′, f ′′, f (3), . . . , f (n) all exist. Suppose a = a0 < a1 <· · · < an = b are such that for all i ∈ 1, . . . , n, f(ai) = 0. Then there exists x ∈ (a, b) suchthat f (n) = 0.

Proof. A small note in the statement above: while we take f(ai) = 0 for all i, this is morea matter of convenience. This theorem is true as long as f(ai) = c ∈ R for each i, but wecan always translate this case into the above form by replacing f with f − c. We proceed byinduction on n. Note the case n = 1 is Rolle’s Theorem, so our claim is true for n = 1.Suppose the claim holds for n > 1, and suppose a = a0 < · · · < an+1 = b such that for alli ∈ 1, . . . , n− 1, f(ai) = 0. By Rolle’s theorem, for each i, we can find ci ∈ (ai, ai+1) suchthat f ′(ci) = 0. Now we have c1, . . . , cn such that f ′(cj) = 0 for each j ∈ 1, . . . , n, so usingthe inductive hypothesis on f ′ (since f ′ is n-times differentiable), we get x ∈ (c0, cn) suchthat (f ′)(n)(x) = 0, i.e. f (n+1)(x) = 0.

Example 3.2.8 (Lagrange Interpolation). We can use the higher order Rolle’s theorem to bound the error in approximating a function by polynomials. We will start by approximating a function by a line. Suppose f : [a, b] → R is continuous, and twice differentiable on (a, b). Suppose |f^(2)(x)| is bounded on (a, b) by M. We approximate f by the line

p_1(x) = f(a) + (x − a) · (f(b) − f(a))/(b − a)

Let g(x) = f(x) − p_1(x); this is the approximation error. Note that p_1^(2)(x) = 0 for all x ∈ [a, b], so g^(2)(x) = f^(2)(x). Take c ∈ (a, b). Approximate g by a second, degree-2 polynomial p_2 that hits g at a, b and c:

p_2(x) = g(c)/((c − a)(c − b)) · (x − a)(x − b)

Let h(x) = g(x) − p_2(x). Then

h^(2)(x) = g^(2)(x) − 2g(c)/((c − a)(c − b)) = f^(2)(x) − 2g(c)/((c − a)(c − b))

By the higher order Rolle’s theorem applied to h, since h(a) = h(c) = h(b) = 0, we have some z ∈ (a, b) such that h^(2)(z) = 0. Then

f^(2)(z) = 2g(c)/((c − a)(c − b))

So

|g(c)| ≤ (1/2) |f^(2)(z)| |c − a| |c − b| ≤ (1/2) M (b − a)^2

Since the choice of c was arbitrary, the error g(x) is bounded on (a, b) by M(b − a)^2/2.
We can continue this same process with higher degree polynomial approximations. Set q_2 = p_1 + p_2. Then q_2 is a degree-2 polynomial with q_2(a) = f(a), q_2(b) = f(b), q_2(c) = f(c). By a similar argument to the above, assuming f is 3-times differentiable and M bounds |f^(3)|, using the n = 3 case of higher order Rolle’s, we get

|(f − q_2)(x)| ≤ M(b − a)^3/(3 · 2)

More generally, if q_n is a polynomial of degree n hitting f at a = a_0 < · · · < a_n = b, and f is (n + 1)-times differentiable and f^(n+1) is bounded, then

|(f − q_n)(x)| ≤ M(b − a)^(n+1)/(n + 1)!

where |f^(n+1)| ≤ M. The polynomials above are the Lagrange interpolations of f.
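The n = 1 bound |f(x) − p_1(x)| ≤ M(b − a)^2/2 can be illustrated numerically. In the sketch below, taking f = sin on [0, 1] with M = 1 (since |sin″| = |sin| ≤ 1) is our own choice of example, not part of the notes.

```python
# A quick numeric illustration of the n = 1 interpolation error bound
# |f(x) - p1(x)| <= M (b - a)^2 / 2, with f = sin on [0, 1] and M = 1.
import math

a, b = 0.0, 1.0
f = math.sin
M = 1.0  # bounds |f''| = |sin| on (0, 1)

def p1(x):
    # The chord through (a, f(a)) and (b, f(b)), as in the notes.
    return f(a) + (x - a) * (f(b) - f(a)) / (b - a)

xs = [a + (b - a) * k / 1000 for k in range(1001)]
worst = max(abs(f(x) - p1(x)) for x in xs)
print(worst <= M * (b - a) ** 2 / 2)
```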

Derivatives of Monotone Functions

Definition 3.2.9. A function f : X → R (with X ⊆ R) is monotone increasing (nondecreasing) if x < y implies f(x) ≤ f(y). We say f is strictly monotone increasing if x < y implies f(x) < f(y). Monotone decreasing and strictly monotone decreasing are defined similarly.

Proposition 3.2.10. If f is monotone increasing on X and differentiable at x_0 ∈ X, then f′(x_0) ≥ 0. Similarly, f monotone decreasing implies f′(x_0) ≤ 0.


Proof. Since f is non-decreasing, for all x ≠ x_0,

(f(x) − f(x_0))/(x − x_0) ≥ 0

so

lim_{x→x_0} (f(x) − f(x_0))/(x − x_0)

must be ≥ 0 if it exists.

Proposition 3.2.11. If f : [a, b] → R is continuous and differentiable on (a, b), and f′(x) ≥ 0 for all x ∈ (a, b), then f is monotone increasing on [a, b]. Similarly, if f′(x) ≤ 0 for all x ∈ (a, b), then f is monotone decreasing on [a, b]. This is a kind of “converse” to the above proposition.

Proof. As one might expect, we proceed using the Mean Value Theorem. Fix x < y in [a, b]. By the Mean Value Theorem on [x, y], there exists w ∈ (x, y) such that

(f(y) − f(x))/(y − x) = f′(w)

Since f′(w) ≥ 0 by hypothesis, we get

f(y) − f(x) = f′(w)(y − x) ≥ 0

so f(y) ≥ f(x).

L’Hopital’s Rule

We conclude today by presenting L’Hopital’s rule in several forms, differing mainly in the hypotheses assumed.

Proposition 3.2.12 (L’Hopital’s Rule). Let X ⊆ R, f, g : X → R, and let x_0 be a limit point of X. Suppose f(x_0) = g(x_0) = 0, that f, g are both differentiable at x_0, and that g′(x_0) > 0. Suppose there is a neighborhood (x_0 − δ, x_0 + δ) of x_0 on which g(x) is never 0 except at x_0. Then

lim_{x→x_0, x∈X} f(x)/g(x) = f′(x_0)/g′(x_0)

Proof. Note for x ≠ x_0, we have

f(x)/g(x) = (f(x) − f(x_0))/(g(x) − g(x_0)) = [(f(x) − f(x_0))/(x − x_0)] · [(x − x_0)/(g(x) − g(x_0))]

As x → x_0,

(f(x) − f(x_0))/(x − x_0) → f′(x_0) and (x − x_0)/(g(x) − g(x_0)) → 1/g′(x_0)

so

f(x)/g(x) → f′(x_0)/g′(x_0)


Proposition 3.2.13. Let X ⊆ R, f, g : X → R, and let x_0 be a limit point of X. Suppose f(x_0) = g(x_0) = 0, that f, g are both differentiable at x_0, and that g′(x_0) > 0. Suppose there is a neighborhood (x_0 − δ, x_0 + δ) of x_0 on which g′(x) is never 0. Then

lim_{x→x_0, x∈X} f(x)/g(x) = f′(x_0)/g′(x_0)

Proof. We show there exists δ > 0 such that g(x) ≠ 0 for all x with |x − x_0| < δ and x ≠ x_0, reducing our claim to the previous proposition.
Let δ > 0 be small enough that |x − x_0| < δ implies g′(x) ≠ 0. Suppose for contradiction that there is x ∈ B(x_0, δ) \ {x_0} such that g(x) = 0. Since also g(x_0) = 0, by Rolle’s Theorem on [x, x_0] (or [x_0, x] if x_0 < x), there exists w between x and x_0 such that g′(w) = 0, a contradiction.

Proposition 3.2.14. Suppose a < b and f, g : [a, b] → R are continuous. Suppose f(a) = g(a) = 0, and g′ is nonzero on (a, b]. If

lim_{x→a} f′(x)/g′(x)

exists and equals L, then g(x) ≠ 0 for all x ∈ (a, b] and

lim_{x→a} f(x)/g(x) = L

Proof. First, by Rolle’s Theorem as above, g is nonzero on (a, b]. For each z ∈ (a, b), let h_z(x) = f(x)g(z) − g(x)f(z). Note h_z is continuous on [a, z], h_z(a) = h_z(z) = 0, and h_z′(x) = f′(x)g(z) − g′(x)f(z). By Rolle’s Theorem, there exists w ∈ (a, z) such that h_z′(w) = 0, so f′(w)g(z) − g′(w)f(z) = 0, i.e.

f′(w)/g′(w) = f(z)/g(z)

Now, by the assumption that

lim_{x→a} f′(x)/g′(x) = L,

for every ε > 0 there exists δ > 0 such that

|w − a| < δ ⟹ |f′(w)/g′(w) − L| < ε

Now for any z ∈ (a, a + δ), by the above we have w ∈ (a, z) ⊆ (a, a + δ) such that

f(z)/g(z) = f′(w)/g′(w)

so

|f(z)/g(z) − L| < ε


3.3 Lecture 11 - The Riemann Integral (I)

Today, we will introduce Riemann integration as a method for computing areas under curves. We will develop the Riemann integral, define what it means to be “Riemann integrable”, and will prove that several classes of functions are Riemann integrable.

Definition 3.3.1. An interval I is any of [a, b], (a, b], [a, b), (a, b) for a ≤ b (allowing for a = −∞, b = ∞). Note that under this definition, ∅ and single points are considered intervals.

Definition 3.3.2. An interval is bounded if a ≠ −∞ and b ≠ ∞. We define the length of the interval to be b − a, denoted |I|.

Definition 3.3.3. A partition of an interval I is a (finite) set P of pairwise disjoint intervals such that ⋃_{J∈P} J = I.

Proposition 3.3.4. If P is a (finite) partition of a bounded interval I, then |I| = Σ_{J∈P} |J|.

Proof. Let P be a partition of a bounded interval I whose endpoints are a, b with a < b in R. Let P = {J_1, . . . , J_n}, and let J_i have endpoints x_i < y_i for each i. Since P is finite and every pair of elements of P is disjoint, we can “rearrange” the intervals so that the endpoints are ordered, i.e. x_1 ≤ y_1 ≤ x_2 ≤ y_2 ≤ · · · ≤ x_n ≤ y_n. Since I is the union of J_1, . . . , J_n, it follows that x_1 = a and y_n = b, where the intervals J_1 and J_n are inclusive on the left and right respectively. Further, we must have y_i = x_{i+1} for each i; if not, then there would exist z ∈ [a, b] such that y_i < z < x_{i+1}, contradicting the fact that I is the union of the elements of P. Finally, we have

Σ_{J∈P} |J| = Σ_{i=1}^{n} (y_i − x_i) = Σ_{i=1}^{n−1} (x_{i+1} − x_i) + (y_n − x_n) = y_n − x_1 = b − a = |I|

Definition 3.3.5. A function f : I → R is piecewise constant if there is a partition P of I such that for all J ∈ P, f is constant on J. We say f is piecewise constant with respect to P.

Definition 3.3.6. A partition P′ is finer than a partition P (of the same interval I) if for every J′ ∈ P′, there exists J ∈ P such that J′ ⊆ J. We say P′ refines P.

Proposition 3.3.7. If f is piecewise constant with respect to P and P′ refines P, then f is piecewise constant with respect to P′.

Proposition 3.3.8. If f, g are both piecewise constant on I, then so are f + g, f − g, f · g, and f/g (if g is nonzero everywhere on I).

Proof. The key is to find a single partition P of I such that f and g are both piecewise constant with respect to P. Say f is piecewise constant with respect to P_1, and g is piecewise constant with respect to P_2. Take P = {J ∩ K | J ∈ P_1, K ∈ P_2}; then one can easily see that P is a partition of I and is finer than both P_1 and P_2. It is straightforward to check that f + g, f − g, f · g and f/g are also piecewise constant with respect to P.


Definition 3.3.9. For f piecewise constant on a bounded interval I with respect to a partition P of I, define

p.c. ∫_[P] f = Σ_{J∈P} c_J · |J|

where f(x) = c_J for all x ∈ J, i.e. c_J is the constant value of f on J.

Proposition 3.3.10. Let I be a bounded interval and f : I → R. Then the value of p.c. ∫_[P] f is the same for every partition P of I with respect to which f is piecewise constant (it depends only on f and I).

Proof. Let P′ be a partition of I which refines P. Since P′ is finer than P, for each J ∈ P there exists a subset Ω_J of P′ which partitions J. Since Ω_J is a partition of J, the constant value of f on any J′ ∈ Ω_J is the same as the constant value of f on J. Further, by an earlier proposition, we have |J| = Σ_{J′∈Ω_J} |J′|. Hence

p.c. ∫_[P] f = Σ_{J∈P} c_J · |J| = Σ_{J∈P} c_J · Σ_{J′∈Ω_J} |J′| = Σ_{J′∈P′} c_{J′} · |J′| = p.c. ∫_[P′] f

Then given partitions P_1, P_2 of I, we can pick a partition P′ which refines both P_1, P_2, whence the above remark completes the proof.

Definition 3.3.11. Let I be a bounded interval, and f : I → R be piecewise constant. Then

p.c. ∫_I f = p.c. ∫_[P] f

for some (equivalently, by the last proposition, all) partition(s) P of I such that f is piecewise constant with respect to P.

Definition 3.3.12. We say f̄ majorizes f on I if f̄(x) ≥ f(x) for all x ∈ I. Similarly, f̲ minorizes f on I if f̲(x) ≤ f(x) for all x ∈ I.

Definition 3.3.13. Let I be a bounded interval, and f : I → R be bounded. Then the upper Riemann integral of f on I is defined as

∫̄_I f = inf{ p.c. ∫_[P] f̄ | f̄ majorizes f and is piecewise constant on I }

Analogously, the lower Riemann integral of f on I is defined as

∫̲_I f = sup{ p.c. ∫_[P] f̲ | f̲ minorizes f and is piecewise constant on I }

Proposition 3.3.14. Let f be bounded on a bounded interval I. Then

∫̲_I f ≤ ∫̄_I f


Proof. Let f̄ be piecewise constant with respect to a partition P_1 of I and majorize f. Similarly, let f̲ be piecewise constant with respect to a partition P_2 of I and minorize f. Fix a partition P which refines both P_1 and P_2, so that f̄ and f̲ are both piecewise constant with respect to P. Then it is clear that f̄(x) ≥ f̲(x) for each x ∈ I, so

p.c. ∫_[P] f̲ = Σ_{J∈P} (f̲)_J · |J| ≤ Σ_{J∈P} (f̄)_J · |J| = p.c. ∫_[P] f̄

where it is understood that (f̄)_J is the constant value of f̄ on J, and analogously (f̲)_J is the constant value of f̲ on J. Since this holds for every such pair f̲, f̄, taking the supremum over all piecewise constant minorants f̲ and then the infimum over all piecewise constant majorants f̄, together we have

∫̲_I f ≤ ∫̄_I f

Definition 3.3.15. We say f is Riemann integrable on I if

∫̲_I f = ∫̄_I f

In that case the Riemann integral of f on I, denoted ∫_I f, is defined to be this common value:

∫_I f = ∫̲_I f = ∫̄_I f

An alternate definition of the Riemann integral

Note that for any partition P of I, the function

g_P(y) = sup_{x∈J} f(x), where J is the element of P containing y,

is piecewise constant with respect to P, and it majorizes f. Moreover, for any f̄ majorizing f which is piecewise constant with respect to P, we have f̄(y) ≥ g_P(y). By definition, we have

p.c. ∫_[P] g_P = Σ_{J∈P} |J| · sup_{x∈J} f(x)

We refer to the sum on the right-hand side as the upper Riemann sum of f and P, denoted U(f, P). Since g_P majorizes f for each partition P, and is piecewise constant on I, we have

∫̄_I f = inf{ p.c. ∫_[P] f̄ | f̄ majorizes f and is piecewise constant on I } ≤ U(f, P)

for each partition P, whence we have

∫̄_I f ≤ inf{ U(f, P) | P a partition of I }

Similarly, since f̄(y) ≥ g_P(y) for any majorant f̄ which is piecewise constant with respect to a particular partition P, we have

inf{ U(f, Q) | Q a partition of I } ≤ U(f, P) = p.c. ∫_[P] g_P ≤ p.c. ∫_[P] f̄

for each f̄ which majorizes f and is piecewise constant on I, so

inf{ U(f, Q) | Q a partition of I } ≤ ∫̄_I f

so

∫̄_I f = inf{ U(f, P) | P a partition of I }

Similarly, define

L(f, P) = Σ_{J∈P} |J| · inf_{x∈J} f(x)

This is the lower Riemann sum. We then have similarly

∫̲_I f = sup{ p.c. ∫_[P] f̲ | f̲ minorizes f and is p.c. on I } = sup{ L(f, P) | P a partition of I }
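Upper and lower Riemann sums are easy to compute for concrete functions. The sketch below (the choice f(x) = x², the uniform partition, and the step count are our own) shows U(f, P) and L(f, P) squeezing the integral 1/3.

```python
# Upper and lower Riemann sums for f(x) = x^2 on [0, 1] over n equal
# subintervals; since f is increasing, the sup/inf on each piece are
# attained at the right/left endpoints respectively.
def riemann_sums(f, a, b, n):
    h = (b - a) / n
    upper = sum(h * f(a + (k + 1) * h) for k in range(n))
    lower = sum(h * f(a + k * h) for k in range(n))
    return upper, lower

U, L = riemann_sums(lambda x: x * x, 0.0, 1.0, 1000)
print(L <= 1 / 3 <= U and U - L < 2e-3)
```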

Proposition 3.3.16. If f is piecewise constant on a bounded interval I, then f is Riemann integrable and

∫_I f = p.c. ∫_I f

Proof. Since f ∈ { f̲ | f̲ minorizes f and is p.c. on I }, we have

p.c. ∫_I f ≤ ∫̲_I f

Similarly, since f ∈ { f̄ | f̄ majorizes f and is p.c. on I }, we have

p.c. ∫_I f ≥ ∫̄_I f

Hence, by the previous proposition (∫̲_I f ≤ ∫̄_I f), we have

p.c. ∫_I f ≤ ∫̲_I f ≤ ∫̄_I f ≤ p.c. ∫_I f

so all four quantities are equal. In particular f is Riemann integrable, and

∫_I f = p.c. ∫_I f

Theorem 3.3.17. Let I be a bounded interval. Let f : I → R be uniformly continuous. Then f is Riemann integrable on I.

Proof. The theorem holds trivially if |I| = 0, so suppose |I| > 0. Fix ε > 0. We will show that

∫̄_I f ≤ ∫̲_I f + ε

Since this holds for every ε > 0, together with the previous proposition (∫̲_I f ≤ ∫̄_I f) it gives

∫̲_I f = ∫̄_I f

Since f is uniformly continuous, there exists δ > 0 such that |x − y| < δ ⟹ |f(x) − f(y)| < ε/|I|. Take any partition P of I such that every interval J in P has length less than δ. Then for each J ∈ P, for any x, y ∈ J,

f(x) < f(y) + ε/|I|

So for every y ∈ J,

sup_{x∈J} f(x) ≤ f(y) + ε/|I|

So

sup_{x∈J} f(x) ≤ inf_{y∈J} f(y) + ε/|I|

and thus

|J| · sup_{x∈J} f(x) ≤ |J| · inf_{x∈J} f(x) + |J|ε/|I|

Summing over all J ∈ P, and using Σ_{J∈P} |J| = |I|, we obtain

Σ_{J∈P} |J| · sup_{x∈J} f(x) ≤ Σ_{J∈P} |J| · inf_{x∈J} f(x) + ε

i.e.

U(f, P) ≤ L(f, P) + ε

Thus, letting Γ be the set of all partitions of I,

inf_{Q∈Γ} U(f, Q) ≤ U(f, P) ≤ L(f, P) + ε ≤ sup_{Q∈Γ} L(f, Q) + ε

i.e.

∫̄_I f ≤ ∫̲_I f + ε

as desired.

Corollary 3.3.18 (S07.9). Let f : [a, b] → R be continuous ([a, b] bounded). Then f is Riemann integrable on [a, b].

Proof. This is immediate: [a, b] is compact, so f is in fact uniformly continuous on [a, b], hence Riemann integrable by the last theorem.

Proposition 3.3.19. If f : I → R is bounded by M, then

−M · |I| ≤ ∫̲_I f ≤ ∫̄_I f ≤ M · |I|

Proof. This is clear, since the constant function M majorizes f (resp. −M minorizes f).

Theorem 3.3.20. Let f : I → R be continuous. Suppose I is a bounded interval and f is bounded. Then f is Riemann integrable on I.

Proof. Let ε > 0. We will find a partition P of I such that

U(f, P) ≤ L(f, P) + ε

whence f is Riemann integrable on I, as shown in the previous theorem. Let a, b be the left and right endpoints respectively of I. Let M bound |f| on I. Let δ > 0 be small enough that 2δ < b − a and 2M · δ < ε/3.
Since f is continuous on the compact interval [a + δ, b − δ], it is uniformly continuous there, so by the previous theorem f is Riemann integrable on [a + δ, b − δ]. Hence there exists a partition P_0 of [a + δ, b − δ] such that

U(f, P_0) ≤ L(f, P_0) + ε/3

Let P be the partition of I obtained by adding to P_0 the intervals (a, a + δ) and (b − δ, b) (where we adjust the left and right endpoints to match whether I is open or closed on either end). Note that

δ · inf_{x∈(a,a+δ)} f(x) ≥ −Mδ and δ · inf_{x∈(b−δ,b)} f(x) ≥ −Mδ

and

δ · sup_{x∈(a,a+δ)} f(x) ≤ Mδ and δ · sup_{x∈(b−δ,b)} f(x) ≤ Mδ

Then

U(f, P) = δ · sup_{x∈(a,a+δ)} f(x) + U(f, P_0) + δ · sup_{x∈(b−δ,b)} f(x)
≤ 2Mδ + U(f, P_0)
≤ 2Mδ + L(f, P_0) + ε/3
= L(f, P_0) − 2Mδ + ε/3 + 4Mδ
≤ δ · inf_{x∈(a,a+δ)} f(x) + L(f, P_0) + δ · inf_{x∈(b−δ,b)} f(x) + ε/3 + 4Mδ
= L(f, P) + ε/3 + 4Mδ
< L(f, P) + ε/3 + 2ε/3 = L(f, P) + ε

Definition 3.3.21. We say f is piecewise continuous on I if there exists a partition P of I such that f is continuous on each J ∈ P.

Proposition 3.3.22. If f is bounded and piecewise continuous on a bounded interval I, then it is Riemann integrable on I.

Proof. Let ε > 0 be given. Let P be a partition of I such that f is continuous on each element of P, and let |P| = n. Since f is bounded and continuous on each J ∈ P, f is Riemann integrable on each element of P by the previous theorem, so there exists a partition P_J of J for each J ∈ P such that U(f, P_J) ≤ L(f, P_J) + ε/n. Let P′ be the refinement of P obtained as ⋃_{J∈P} P_J. Then

U(f, P′) − L(f, P′) = Σ_{J∈P} [U(f, P_J) − L(f, P_J)] ≤ n · (ε/n) = ε

Proposition 3.3.23. If f : [a, b]→ R is monotone, then f is Riemann integrable on [a, b].

Proof. Homework (S13.1, F12.2).

Example 3.3.24. We now exhibit the standard example of a function f : [0, 1] → R which is not Riemann integrable. Let

f(x) = 0 if x ∈ [0, 1] ∩ Q, and f(x) = 1 if x ∈ [0, 1] \ Q

Then for every interval J ⊆ [0, 1] of nonzero length,

sup_{x∈J} f(x) = 1, inf_{x∈J} f(x) = 0

So for every partition P of [0, 1], L(f, P) = 0 and U(f, P) = 1, so

∫̲_{[0,1]} f = 0 ≠ 1 = ∫̄_{[0,1]} f


Theorem 3.3.25. Let I be a bounded interval, and let f : I → R, g : I → R be Riemann integrable on I. Then:

(1) f + g is Riemann integrable on I, and

∫_I (f + g) = ∫_I f + ∫_I g

(2) For any real c, cf is Riemann integrable on I, and

∫_I cf = c ∫_I f

(3) If I = J ∪ K for disjoint intervals J, K, then f is Riemann integrable on each of J, K, and

∫_I f = ∫_J f + ∫_K f

(4) If f(x) ≥ 0 for all x ∈ I, then

∫_I f ≥ 0

Using (1) and (2), if f(x) ≥ g(x) for all x ∈ I, then

∫_I f ≥ ∫_I g

(5) If f(x) = c for all x ∈ I, then

∫_I f = c|I|

(6) The functions min(f, g) : x ↦ min(f(x), g(x)) and max(f, g) : x ↦ max(f(x), g(x)) are both Riemann integrable on I.
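Several of these properties can be spot-checked numerically. The sketch below (midpoint sums standing in for the integral, and f = sin, g = cos on [0, 1] as arbitrary choices of ours) illustrates linearity from (1)-(2) and the pointwise bound max(f, g) ≥ f from (6) and (4).

```python
# Hedged numerical spot-check of Theorem 3.3.25 (1), (2), (4), (6):
# linearity and monotonicity of the integral, via midpoint sums.
import math

n = 10 ** 4
h = 1.0 / n
xs = [(k + 0.5) * h for k in range(n)]

def integral(func):
    # Midpoint Riemann sum as a stand-in for the integral on [0, 1].
    return sum(func(x) for x in xs) * h

f, g = math.sin, math.cos
lin_gap = abs(integral(lambda x: f(x) + 2 * g(x))
              - (integral(f) + 2 * integral(g)))
mono = integral(lambda x: max(f(x), g(x))) >= integral(f)
print(lin_gap < 1e-9 and mono)
```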

3.4 Lecture 12 - The Riemann Integral (II)

Today, we will prove several theorems which will help us calculate integrals. In particular, wewill prove the 1st and 2nd fundamental theorems of calculus; we will introduce the Riemann-Stieltjes integral; and prove a kind of product rule (integration by parts) and chain rule(change of variables) for integrals. We will begin by proving (6) from the theorem introducedat the end of Lecture 11.

Proof. We will prove the result for the max function; the proof for the min function is similar. We start with a claim.

Claim 3.4.1. Let a, ā, b, b̄ ∈ R with ā ≥ a and b̄ ≥ b. Then max(ā, b̄) − max(a, b) ≤ (ā − a) + (b̄ − b).

Proof. Note

max(ā, b̄) − max(a, b) = min( max(ā, b̄) − a, max(ā, b̄) − b )

which is at most ā − a if max(ā, b̄) = ā, and at most b̄ − b if max(ā, b̄) = b̄. In either case, since ā − a ≥ 0 and b̄ − b ≥ 0,

max(ā, b̄) − max(a, b) ≤ (ā − a) + (b̄ − b)

Fix ε > 0. Since f, g are Riemann integrable on I, there exist partitions P_1, P_2 of I such that

U(f, P_1) − L(f, P_1) ≤ ε/2 and U(g, P_2) − L(g, P_2) ≤ ε/2

Let P refine both P_1, P_2. Note that for any refinement K′ of a partition K, we have U(f, K′) ≤ U(f, K) and L(f, K′) ≥ L(f, K). Hence

U(f, P) − L(f, P) ≤ ε/2 and U(g, P) − L(g, P) ≤ ε/2

Note for any J ∈ P,

max( sup_{x∈J} f(x), sup_{x∈J} g(x) ) ≥ max(f(x), g(x))

for each x ∈ J, so

max( sup_{x∈J} f(x), sup_{x∈J} g(x) ) ≥ sup_{x∈J} max(f(x), g(x))

Similarly,

max( inf_{x∈J} f(x), inf_{x∈J} g(x) ) ≤ max(f(x), g(x))

for each x ∈ J, so

max( inf_{x∈J} f(x), inf_{x∈J} g(x) ) ≤ inf_{x∈J} max(f(x), g(x))

Thus, by the claim above,

U(max(f, g), P) − L(max(f, g), P) = Σ_{J∈P} |J| · [ sup_{x∈J} max(f, g)(x) − inf_{x∈J} max(f, g)(x) ]
≤ Σ_{J∈P} |J| · [ max(sup_{x∈J} f(x), sup_{x∈J} g(x)) − max(inf_{x∈J} f(x), inf_{x∈J} g(x)) ]
≤ Σ_{J∈P} |J| · ( [sup_{x∈J} f(x) − inf_{x∈J} f(x)] + [sup_{x∈J} g(x) − inf_{x∈J} g(x)] )
= (U(f, P) − L(f, P)) + (U(g, P) − L(g, P))
≤ ε/2 + ε/2 = ε


Corollary 3.4.2. If f : I → R is Riemann integrable, then so are f⁺ = max(f, 0), f⁻ = min(f, 0), and |f| = f⁺ − f⁻.

Proof. By the last proposition, f⁺ and f⁻ are Riemann integrable on I. Since |f| = f⁺ − f⁻, it must also be Riemann integrable on I.

Proposition 3.4.3. Let f : I → R, and let J, K ⊆ I be intervals such that J ∩ K = ∅ and J ∪ K = I. If f is Riemann integrable on each of J, K, then f is Riemann integrable on I.

Theorem 3.4.4. Let I be a bounded interval, and let f, g : I → R be Riemann integrable on I. Then f · g is Riemann integrable on I.

Proof. Note f and g are bounded, and f = f⁺ + f⁻ and g = g⁺ + g⁻. Hence

f · g = f⁺g⁺ + f⁻g⁺ + f⁺g⁻ + f⁻g⁻ = f⁺g⁺ − (−f⁻)g⁺ − f⁺(−g⁻) + (−f⁻)(−g⁻)

and all of the functions f⁺, −f⁻, g⁺, −g⁻ are nonnegative on I. Hence it suffices to prove the theorem for functions which take nonnegative values only: then f⁺g⁺, (−f⁻)g⁺, f⁺(−g⁻), (−f⁻)(−g⁻) are all Riemann integrable, hence so is f · g by closure of Riemann integrable functions under addition and subtraction.
So assume f, g ≥ 0 on I. Fix ε > 0; we will find a partition P of I such that U(fg, P) − L(fg, P) ≤ ε. Note f, g are bounded; let M > 0 bound both |f|, |g|. Fix a partition P of I such that U(f, P) − L(f, P) ≤ ε/2M and U(g, P) − L(g, P) ≤ ε/2M. Then, since f, g ≥ 0,

U(fg, P) − L(fg, P) = Σ_{J∈P} |J| · [ sup_{x∈J} f(x)g(x) − inf_{x∈J} f(x)g(x) ]
≤ Σ_{J∈P} |J| · [ sup_{x∈J} f(x) · sup_{x∈J} g(x) − inf_{x∈J} f(x) · inf_{x∈J} g(x) ]
= Σ_{J∈P} |J| · [ sup_{x∈J} f(x) · sup_{x∈J} g(x) − inf_{x∈J} f(x) · sup_{x∈J} g(x) + inf_{x∈J} f(x) · sup_{x∈J} g(x) − inf_{x∈J} f(x) · inf_{x∈J} g(x) ]
≤ Σ_{J∈P} |J| · [ sup_{x∈J} f(x) − inf_{x∈J} f(x) ] · M + Σ_{J∈P} |J| · [ sup_{x∈J} g(x) − inf_{x∈J} g(x) ] · M
= M[(U(f, P) − L(f, P)) + (U(g, P) − L(g, P))]
≤ M[ ε/2M + ε/2M ] = ε

The Riemann-Stieltjes Integral

Let I be a bounded interval, f : I → R and α : I → R monotone increasing. The idea of the Riemann-Stieltjes integral is to replace |J| by α(right endpoint of J) − α(left endpoint of J).

Definition 3.4.5. The α-length of an interval J ⊆ I, denoted α[J], is α(b) − α(a), where J is any of [a, b], (a, b), [a, b), (a, b]. If J is empty, we take α[J] = 0. If α = id, then α[J] is just the traditional length |J|.


Definition 3.4.6. The Riemann-Stieltjes integral of f on I (and the upper/lower sums and piecewise constant integrals) are defined as before, using α[J] instead of |J| throughout. We denote the Riemann-Stieltjes integral of f on [a, b] with respect to α by

∫_a^b f dα

Calculating Integrals

Theorem 3.4.7 (Fundamental Theorem of Calculus I). Let a < b and let f : [a, b] → R be Riemann integrable. Let F : [a, b] → R be the function F(x) = ∫_[a,x] f. Then F is continuous on [a, b], and for every x_0 ∈ [a, b], if f is continuous at x_0, then F is differentiable at x_0 and F′(x_0) = f(x_0).

Proof. Let M be a bound for |f| on [a, b]. Then for every x < y in [a, b], we have:

|F(y) − F(x)| = | ∫_[a,y] f − ∫_[a,x] f | = | ∫_[x,y] f | ≤ ∫_[x,y] |f| ≤ ∫_[x,y] M = M(y − x)

So F is Lipschitz, hence continuous, on [a, b]. Suppose f is continuous at x_0. Then for every ε > 0, there exists δ > 0 such that |y − x_0| < δ implies |f(y) − f(x_0)| < ε. For x > x_0,

F(x) − F(x_0) = ∫_[x_0,x] f

Suppose x − x_0 < δ. Then for all y ∈ [x_0, x],

f(x_0) − ε < f(y) < f(x_0) + ε

Using a previous proposition gives

(x − x_0)(f(x_0) − ε) ≤ ∫_[x_0,x] f ≤ (x − x_0)(f(x_0) + ε)

Similarly, if x_0 − δ < x < x_0, the analogous bounds hold for F(x) − F(x_0) = −∫_[x,x_0] f. In either case, dividing by x − x_0 (attending to its sign) gives

f(x_0) − ε ≤ (F(x) − F(x_0))/(x − x_0) ≤ f(x_0) + ε

i.e.

0 < |x − x_0| < δ ⟹ | (F(x) − F(x_0))/(x − x_0) − f(x_0) | ≤ ε

so F′(x_0) = f(x_0).


Definition 3.4.8. We say F : I → R is an antiderivative of f : I → R (I bounded) if F is differentiable and for all x ∈ I, F′(x) = f(x).

Theorem 3.4.9 (Fundamental Theorem of Calculus II). Let a < b and let f : [a, b] → R be Riemann integrable. If F is (any) antiderivative of f, then

∫_[a,b] f = F(b) − F(a)

Proof. Let P be a partition of [a, b]. Let J ∈ P (with |J| > 0), say with left and right endpoints y and z. Then by the Mean Value Theorem, since F′ = f by assumption, F(z) − F(y) = (z − y)f(w) = |J| f(w) for some w ∈ J. So

|J| · inf_{x∈J} f(x) ≤ F(z) − F(y) ≤ |J| · sup_{x∈J} f(x)

For each J ∈ P, denote the left and right endpoints of J by y_J and z_J respectively. Then, summing over all J ∈ P, we find

L(f, P) ≤ Σ_{J∈P} [F(z_J) − F(y_J)] ≤ U(f, P)

The middle sum is telescoping, so only the first and last terms of the sum survive, and we get

L(f, P) ≤ F(b) − F(a) ≤ U(f, P)

Since f is Riemann integrable, we can get L(f, P) and U(f, P) arbitrarily close to ∫_[a,b] f. By the last inequality, then,

∫_[a,b] f = F(b) − F(a)
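FTC II is easy to sanity-check numerically: a fine Riemann sum for f should approach F(b) − F(a) for any antiderivative F. In the sketch below, f = cos with antiderivative F = sin on [0, 2], and the step count, are our own illustrative choices.

```python
# Hedged numeric check of FTC II: a midpoint Riemann sum for f = cos
# on [0, 2] should approach F(2) - F(0), where F = sin satisfies
# F' = cos.
import math

a, b, n = 0.0, 2.0, 10 ** 5
h = (b - a) / n
riemann = sum(math.cos(a + (k + 0.5) * h) for k in range(n)) * h
print(abs(riemann - (math.sin(b) - math.sin(a))) < 1e-8)
```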

Proposition 3.4.10. If F, G are both antiderivatives of a function f, then there exists a constant c ∈ R such that for all x, F(x) = G(x) + c.

Proof. If f is Riemann integrable, we can use the Fundamental Theorem of Calculus II to get

F(x) = F(a) + ∫_[a,x] f and G(x) = G(a) + ∫_[a,x] f

So F(x) − G(x) = F(a) − G(a) is constant.
For the general case, we use the Mean Value Theorem. Let H = F − G. Then H′ = F′ − G′, so for all x ∈ I, H′(x) = 0. Take y < z in I; we show H(y) = H(z). By the Mean Value Theorem, there exists w ∈ (y, z) such that

(H(z) − H(y))/(z − y) = H′(w) = 0

so H(y) = H(z).


Hereafter, we denote ∫_[a,b] f by ∫_a^b f.

Theorem 3.4.11 (Integration by parts). Let F : [a, b] → R, G : [a, b] → R be differentiable on [a, b]. Suppose F′, G′ are Riemann integrable on [a, b]. Then FG′ and F′G are Riemann integrable, and

∫_a^b FG′ = F(b)G(b) − F(a)G(a) − ∫_a^b F′G

Proof. F is differentiable, hence continuous, so F is Riemann integrable on [a, b]; the same holds for G. Since F′, G′ are Riemann integrable on [a, b], F′G and FG′ are also Riemann integrable on [a, b]. By the 2nd fundamental theorem of calculus and the product rule for differentiation (applied to (FG)′ = F′G + FG′),

∫_a^b (F′G + FG′) = (FG)(b) − (FG)(a) = F(b)G(b) − F(a)G(a)

and the left side equals ∫_a^b F′G + ∫_a^b FG′, which rearranges to the stated formula.
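Integration by parts can be spot-checked numerically. In the sketch below, the identity ∫₀¹ F G′ = F(1)G(1) − F(0)G(0) − ∫₀¹ F′G is evaluated for F(x) = x, G(x) = eˣ; all of these choices (F, G, the interval, and the step count) are ours, and the exact value of the left side is 1.

```python
# Sketch verifying integration by parts numerically for F(x) = x,
# G(x) = e^x on [0, 1], using midpoint sums for both integrals.
import math

n = 10 ** 5
h = 1.0 / n
mid = [(k + 0.5) * h for k in range(n)]
lhs = sum(x * math.exp(x) for x in mid) * h                   # int F G'
rhs = 1 * math.e - 0 * 1 - sum(math.exp(x) for x in mid) * h  # boundary - int F' G
print(abs(lhs - rhs) < 1e-6 and abs(lhs - 1.0) < 1e-6)
```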

Theorem 3.4.12. Let α : [a, b] → R be monotone increasing. Suppose α is differentiable on [a, b] and α′ is Riemann integrable on [a, b]. Let f be Riemann-Stieltjes integrable with respect to α on [a, b]. Then fα′ is Riemann integrable on [a, b], and

∫_a^b fα′ = ∫_a^b f dα

Proof. Consider first the case that f is piecewise constant on [a, b] with respect to a partition P. Then for each J ∈ P, by the 2nd fundamental theorem of calculus,

∫_J fα′ = ∫_J c_J α′ = c_J ∫_J α′ = c_J(α(z) − α(y)) = c_J · α[J]

where c_J is the constant value of f on J, and y, z are the endpoints of J. Hence

∫_a^b fα′ = Σ_{J∈P} ∫_J fα′ = Σ_{J∈P} c_J · α[J] = ∫_a^b f dα

For the general case, approximate f by piecewise constant functions. See Corollary 11.10.3 from Tao I.

Proposition 3.4.13. Let Φ : [a, b] → [Φ(a), Φ(b)] be continuous and monotone increasing. Let f : [Φ(a), Φ(b)] → R be Riemann integrable. Then f ∘ Φ : [a, b] → R is Riemann-Stieltjes integrable with respect to Φ, and

∫_a^b f dΦ = ∫_{Φ(a)}^{Φ(b)} f

Proof. We will show this for the case of piecewise constant functions f. The general case follows by a standard argument (see Tao I, 11.10.6). Say f is piecewise constant with respect to a partition P of [Φ(a), Φ(b)]. Note that if J ⊆ [Φ(a), Φ(b)] is an interval, then the endpoints of J are taken as values by Φ, by the Intermediate Value Theorem. So there is an interval J̃ ⊆ [a, b] such that Φ(J̃) = J. Let P̃ be the set of these J̃. Then P̃ is a partition of [a, b], so

∫_{Φ(a)}^{Φ(b)} f = Σ_{J∈P} c_J · |J| = Σ_{J̃∈P̃} c_{J̃} · Φ[J̃] = ∫_a^b f dΦ

where c_{J̃} = c_J, the constant value of f ∘ Φ on J̃.

Theorem 3.4.14 (Change of variables). Let Φ : [a, b] → [Φ(a), Φ(b)] be differentiable and monotone increasing, such that Φ′ is Riemann integrable. Let f : [Φ(a), Φ(b)] → R be Riemann integrable. Then (f ∘ Φ) · Φ′ is Riemann integrable on [a, b], and

∫_a^b (f ∘ Φ)Φ′ = ∫_{Φ(a)}^{Φ(b)} f

Proof. Immediate from the previous two results.
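A concrete instance of the change of variables formula can be checked numerically. Below, Φ(x) = x² on [0, 1] (monotone increasing, with Φ′ = 2x Riemann integrable) and f = cos are our own illustrative choices; the right-hand side is then sin(1) − sin(0).

```python
# Hedged numeric check of change of variables: with Phi(x) = x^2 on
# [0, 1] and f = cos, int_0^1 cos(x^2) * 2x dx should equal sin(1).
import math

n = 10 ** 5
h = 1.0 / n
lhs = sum(math.cos(((k + 0.5) * h) ** 2) * 2 * ((k + 0.5) * h)
          for k in range(n)) * h
print(abs(lhs - math.sin(1.0)) < 1e-7)
```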


4 Week 4

As per the syllabus, Week 4 topics include: Young’s, Holder’s, and Minkowski’s inequalities, formal power series, radius of convergence, real analytic functions, absolute and uniform convergence on closed subintervals, derivatives and integrals of power series, Taylor’s formula, Abel’s lemma, Abel’s theorem for uniform convergence and continuity, Stone-Weierstrass theorem, Cauchy mean value theorem, Taylor’s theorem with remainder in Lagrange, Cauchy, and integral forms, Newton’s method for finding roots of a single function, and error bounds in numerical integration and differentiation (homework).

4.1 Lecture 13 - Limits of Integrals, Mean Value Theorem for Integrals, and Integral Inequalities

Today, we will discuss how limits of integrals behave with respect to certain classes of functions. We will prove an analogue of the Mean Value Theorem for integrals. Finally, we prove some useful integral inequalities - namely, Cauchy-Schwarz and Young’s, Holder’s, and Minkowski’s inequalities.

A few words on limits

Definition 4.1.1. If f : [a, b) → R is not bounded, we can still try to make sense of ∫_a^b f as lim_{x→b} ∫_a^x f. If the limit exists, take ∫_a^b f to be this limit. We can similarly define an interpretation of ∫_a^b f on the left side of the interval, as well as for integrals of the form ∫_a^∞ f, ∫_{−∞}^b f.

Theorem 4.1.2 (F04.3). If f_n : [a, b] → R are Riemann integrable and (f_n) converges uniformly to f : [a, b] → R, then f is Riemann integrable, and

∫_a^b f = lim_{n→∞} ∫_a^b f_n

Equivalently, under the above assumptions,

∫_a^b lim_{n→∞} f_n = lim_{n→∞} ∫_a^b f_n

Proof. Homework F04.3 (and a few more). For a counterexample when uniform convergence is not assumed, see S08.2 (among others).

This gives the following special case for sums:

Theorem 4.1.3. If f_n : [a, b] → R are Riemann integrable and Σ_{n=1}^∞ f_n converges uniformly on [a, b], then

∫_a^b Σ_{n=1}^∞ f_n = Σ_{n=1}^∞ ∫_a^b f_n


A useful proposition to know in the context of the theorem above is the Weierstrass M-test:

Theorem 4.1.4 (Weierstrass M-test). If

Σ_{n=1}^∞ ||f_n||_∞ < ∞

then Σ_{n=1}^∞ f_n converges uniformly. This is sometimes stated alternatively as follows: suppose

|f_n(x)| ≤ M_n

for each x ∈ [a, b]. Then Σ_{n=1}^∞ f_n converges uniformly if Σ_{n=1}^∞ M_n converges.

Proof. See Rudin’s “Principles of Mathematical Analysis”, 7.10.
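The M-test bound is visible numerically: for any two partial sums, their sup-distance is at most the corresponding tail of Σ M_n, uniformly in x. The example f_n(x) = sin(nx)/n² with M_n = 1/n², and the sample grid, are our own choices for illustration.

```python
# Illustration of the Weierstrass M-test for f_n(x) = sin(n x)/n^2:
# the sup-distance between partial sums S_50 and S_100 is bounded by
# the tail sum of M_n = 1/n^2, uniformly in x.
import math

def partial(x, n_terms):
    return sum(math.sin(n * x) / n ** 2 for n in range(1, n_terms + 1))

xs = [2 * math.pi * k / 200 for k in range(201)]
gap = max(abs(partial(x, 100) - partial(x, 50)) for x in xs)
tail = sum(1 / n ** 2 for n in range(51, 101))
print(gap <= tail)
```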

Proposition 4.1.5. Let f : I → R be continuous and Riemann integrable. Suppose |I| > 0, and f(x) ≥ 0 for all x ∈ I. Then

∫_I f = 0

if and only if f(x) = 0 for all x ∈ I.

Proof. (⇐) This is clear.
(⇒) Suppose for contradiction that f(x_0) > 0 for some x_0 ∈ I. For the sake of convenience, suppose x_0 is not an endpoint; if x_0 were an endpoint, by continuity we could pick a nearby non-endpoint x_0′ ≠ x_0 with f(x_0′) > 0 and argue with it instead. Say f(x_0) = u. By continuity, there exists δ > 0 such that |x − x_0| < δ implies |f(x) − f(x_0)| < u/2, and in particular f(x) > u/2. Take δ small enough that B(x_0, δ) ⊆ I. Let J, K be the parts of I to the left of x_0 − δ and to the right of x_0 + δ respectively. Then

∫_I f = ∫_J f + ∫_{x_0−δ}^{x_0+δ} f + ∫_K f ≥ 0 + 2δ · (u/2) + 0 > 0

a contradiction.

Corollary 4.1.6. Let f, g : I → R be continuous and Riemann integrable. Suppose |I| > 0, and f(x) ≤ g(x) for all x ∈ I. Then if

∫_I f = ∫_I g

then for all x ∈ I, f(x) = g(x).

Proof. Use the proposition above on g − f.

Theorem 4.1.7 (Mean Value Theorem I for integrals). Let f : [a, b] → R be continuous, where a < b. Then there exists c ∈ [a, b] such that

(1/(b − a)) ∫_a^b f = f(c)


In fact, we will prove something stronger.

Theorem 4.1.8 (Mean Value Theorem II for integrals). Let f : [a, b] → R be continuous, where a < b. Let ϕ : [a, b] → R be Riemann integrable on [a, b], and suppose ϕ(x) ≥ 0 for all x ∈ [a, b]. Then there exists c ∈ [a, b] such that

∫_a^b f · ϕ = f(c) · ∫_a^b ϕ

Note that Mean Value Theorem I follows from the special case ϕ = 1.

Proof. First, since f is continuous and [a, b] is compact, note that f achieves its minimum and maximum, say at points x_min, x_max ∈ [a, b] respectively. Then in particular |f| is bounded by some M ∈ R, so if

∫_a^b ϕ = 0

then

| ∫_a^b fϕ | ≤ ∫_a^b Mϕ = 0

so for every x ∈ [a, b],

∫_a^b fϕ = 0 = f(x) ∫_a^b ϕ

and any c ∈ [a, b] works. Now assume

∫_a^b ϕ ≠ 0

Replace ϕ by

ϕ · ( ∫_a^b ϕ )^(−1)

so that we may assume

∫_a^b ϕ = 1

This normalization does not change the nature of the proof, but makes some later details simpler. We then note

∫_a^b f(x_min)ϕ ≤ ∫_a^b fϕ ≤ ∫_a^b f(x_max)ϕ

i.e.

f(x_min) ∫_a^b ϕ ≤ ∫_a^b fϕ ≤ f(x_max) ∫_a^b ϕ

hence

f(x_min) ≤ ∫_a^b fϕ ≤ f(x_max)

By the Intermediate Value Theorem, since f(x_min), f(x_max) ∈ f([a, b]), there exists c ∈ [a, b] (in fact, c is in between x_min and x_max) such that

f(c) = ∫_a^b fϕ

Then

f(c) ∫_a^b ϕ = ∫_a^b fϕ
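The mean value point c can be located explicitly in concrete cases. Below, f = exp and the weight ϕ(x) = x on [0, 1] are arbitrary choices of ours: since ∫₀¹ x eˣ dx = 1 and ∫₀¹ x dx = 1/2, the equation f(c)·∫ϕ = ∫fϕ is solved by c = log 2, which indeed lies in [0, 1].

```python
# Numerical illustration of MVT II for integrals with f = exp and
# phi(x) = x on [0, 1]: solve f(c) * int phi = int f*phi for c.
import math

n = 10 ** 5
h = 1.0 / n
xs = [(k + 0.5) * h for k in range(n)]
int_fphi = sum(math.exp(x) * x for x in xs) * h  # exact value is 1
int_phi = sum(x for x in xs) * h                 # exact value is 1/2
c = math.log(int_fphi / int_phi)
print(0.0 <= c <= 1.0 and abs(c - math.log(2)) < 1e-3)
```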

Some Important Inequalities

Theorem 4.1.9 (Cauchy-Schwarz Inequality). Let f, g : I → R be continuous and Riemann integrable, and let |I| > 0. Then

∫_I fg ≤ √( ∫_I f^2 · ∫_I g^2 )

with equality if and only if f and g are linearly dependent, meaning there exists c ∈ R such that f(x) = c·g(x) for all x ∈ I (or g(x) = c·f(x) for all x ∈ I).

Proof. If

∫_I f^2 = 0

then f(x) = 0 for all x ∈ I by the previous proposition, and the theorem is clear. One proceeds similarly if

∫_I g^2 = 0

So suppose

∫_I f^2 ≠ 0 and ∫_I g^2 ≠ 0

Note that for all u, v ∈ R, (u − v)^2 ≥ 0, i.e. u^2 − 2uv + v^2 ≥ 0, so 2uv ≤ u^2 + v^2, with equality if and only if u = v. We apply this with

u = f(x)/√(∫_I f^2), v = g(x)/√(∫_I g^2)

These can be thought of as the values at x of “normalizations” of f and g. Then for all x ∈ I,

2 f(x)g(x) / √( ∫_I f^2 · ∫_I g^2 ) ≤ f(x)^2 / ∫_I f^2 + g(x)^2 / ∫_I g^2

with equality if and only if

f(x)/√(∫_I f^2) = g(x)/√(∫_I g^2)

Now integrate both sides to obtain

∫_I 2 f(x)g(x) / √( ∫_I f^2 · ∫_I g^2 ) ≤ ∫_I f(x)^2 / ∫_I f^2 + ∫_I g(x)^2 / ∫_I g^2 = 2

so

∫_I fg ≤ √( ∫_I f^2 · ∫_I g^2 )

with equality if and only if

∫_I 2 f(x)g(x) / √( ∫_I f^2 · ∫_I g^2 ) = 2

which, by Corollary 4.1.6, holds if and only if the pointwise equality condition above holds for all x ∈ I. In this case, f and g are clearly linearly dependent; if f = c · g for some c ∈ R, then one can trace back through these steps and in fact show that

c = √( ∫_I f^2 / ∫_I g^2 )
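Cauchy-Schwarz is easy to check numerically for specific functions. The sketch below uses f(x) = x and g(x) = eˣ on I = [0, 1] with midpoint sums; these choices are ours, and since f, g are not proportional the inequality should be strict.

```python
# Numeric sanity check of Cauchy-Schwarz for f(x) = x, g(x) = e^x
# on [0, 1], approximating all three integrals by midpoint sums.
import math

n = 10 ** 4
h = 1.0 / n
xs = [(k + 0.5) * h for k in range(n)]
int_fg = sum(x * math.exp(x) for x in xs) * h
int_f2 = sum(x * x for x in xs) * h
int_g2 = sum(math.exp(2 * x) for x in xs) * h
print(int_fg <= math.sqrt(int_f2 * int_g2))
```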

Proposition 4.1.10 (F10.11). Find the function g(x) which minimizes

∫_0^1 |f′(x)|^2 dx

amongst smooth f : [0, 1] → R with f(0) = 0, f(1) = 1. Is the optimal g unique?

Solution. The idea is to use Cauchy-Schwarz on the functions f′ and 1. We obtain

∫_0^1 f′(x) · 1 dx ≤ √( ∫_0^1 |f′(x)|^2 dx · ∫_0^1 1 dx )

with equality if and only if f′(x) = c for some constant c ∈ R. By the 2nd fundamental theorem of calculus,

∫_0^1 f′(x) · 1 dx = f(1) − f(0) = 1 − 0 = 1

so

(∗) ∫_0^1 |f′(x)|^2 dx ≥ 1

with equality if and only if f′ = c for a constant c (assuming f(1) = 1, f(0) = 0). Hence, any f for which equality holds in (∗) will minimize the expression in question. Note the function g(x) = x achieves equality. Also, g is the only function satisfying f(0) = 0, f(1) = 1 for which f′ is constant, so g is the unique minimizing function.
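The conclusion can be illustrated by comparing the energy ∫₀¹ |f′|² for a few admissible candidates. The competitors below (f(x) = x², f(x) = sin(πx/2)) are arbitrary choices of ours; only f(x) = x attains the minimum value 1.

```python
# Hedged numeric illustration of F10.11: among f with f(0) = 0,
# f(1) = 1, the energy int_0^1 |f'|^2 has minimum 1, at f(x) = x.
import math

def energy(fprime, n=10 ** 4):
    h = 1.0 / n
    return sum(fprime((k + 0.5) * h) ** 2 for k in range(n)) * h

e_line = energy(lambda x: 1.0)                                # f(x) = x
e_quad = energy(lambda x: 2 * x)                              # f(x) = x^2
e_sine = energy(lambda x: (math.pi / 2) * math.cos(math.pi * x / 2))
print(abs(e_line - 1.0) < 1e-9 and e_quad > 1.0 and e_sine > 1.0)
```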

Theorem 4.1.11 (Young's Inequality). Let $\varphi : [0, \infty) \to [0, \infty)$ be continuous, strictly monotone increasing, and onto, with $\varphi(0) = 0$. (Then note we also have $\varphi^{-1} : [0, \infty) \to [0, \infty)$, which is continuous, strictly monotone increasing, and satisfies $\varphi^{-1}(0) = 0$.) Then for every $a, b \geq 0$,
\[
ab \leq \int_0^a \varphi(x) \, dx + \int_0^b \varphi^{-1}(x) \, dx
\]
with equality if and only if $b = \varphi(a)$.

Proof. Divide into three cases: $b < \varphi(a)$, $b = \varphi(a)$, $b > \varphi(a)$, and draw the picture in each case: the two integrals are the areas between the graph of $\varphi$ and the two coordinate axes, and together they cover the rectangle $[0, a] \times [0, b]$, with excess exactly when $b \neq \varphi(a)$.

Corollary 4.1.12. If $p, q > 1$ and $1/p + 1/q = 1$, then for all $a, b \geq 0$,
\[
ab \leq \frac{a^p}{p} + \frac{b^q}{q}
\]
with equality if and only if $a^p = b^q$.

Proof. We use Young's inequality. Let $\varphi(x) = x^{p-1}$; note
\[
\frac{1}{p} + \frac{1}{q} = 1 \implies 1 - \frac{1}{p} = \frac{1}{q} \implies p - 1 = \frac{p}{q}
\]
Similarly, $q - 1 = q/p$, i.e. $\frac{1}{q-1} = \frac{p}{q}$, so
\[
\frac{1}{p-1} = \frac{q}{p} = \frac{1}{p/q} = \frac{1}{1/(q-1)} = q - 1
\]
So $\varphi^{-1}(y) = y^{1/(p-1)} = y^{q-1}$. By Young's inequality,
\[
ab \leq \int_0^a x^{p-1} \, dx + \int_0^b y^{q-1} \, dy = \frac{x^p}{p} \bigg|_0^a + \frac{y^q}{q} \bigg|_0^b = \frac{a^p}{p} + \frac{b^q}{q}
\]
with equality if and only if
\[
b = \varphi(a) = a^{p-1} = a^{p/q}
\]
which is true if and only if $b^q = a^p$.
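A quick numerical sanity check of the corollary, over a few conjugate exponent pairs and sample values (all illustrative choices), including the equality case $b^q = a^p$:

```python
# Numerical check of Young's corollary: ab <= a^p/p + b^q/q for 1/p + 1/q = 1,
# with equality exactly when a^p = b^q. Sample values are illustrative.
pairs = [(1.5, 3.0), (2.0, 2.0), (4.0, 4.0 / 3.0)]  # (p, q) conjugate pairs
for p, q in pairs:
    assert abs(1 / p + 1 / q - 1) < 1e-12
    for a in (0.0, 0.3, 1.0, 2.5):
        for b in (0.0, 0.7, 1.0, 4.0):
            assert a * b <= a ** p / p + b ** q / q + 1e-12
    # equality case: choose b with b^q = a^p, i.e. b = a^(p/q)
    a = 1.7
    b = a ** (p / q)
    assert abs(a * b - (a ** p / p + b ** q / q)) < 1e-9
```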

Note that the case of $p = q = 2$ gives
\[
ab \leq \frac{a^2}{2} + \frac{b^2}{2}
\]
which is exactly what we used to get Cauchy-Schwarz. We can use an argument similar to the proof of Cauchy-Schwarz, with the more general inequality proved above, to show the following inequality.

Theorem 4.1.13 (Holder's Inequality). Let $f, g : I \to \mathbb{R}$ be continuous and Riemann integrable, and let $|I| > 0$. Let $p, q > 1$ be such that $1/p + 1/q = 1$. Then
\[
\int_I |fg| \leq \left( \int_I |f|^p \right)^{1/p} \left( \int_I |g|^q \right)^{1/q}
\]
with equality if and only if $|f|^p$ and $|g|^q$ are linearly dependent.

Proof. Put
\[
a = \frac{|f(x)|}{\left( \int_I |f|^p \right)^{1/p}}, \qquad b = \frac{|g(x)|}{\left( \int_I |g|^q \right)^{1/q}}
\]
and run the same proof as before, using the corollary above in place of $2uv \leq u^2 + v^2$.

Theorem 4.1.14 (Minkowski's Inequality). Let $f, g : I \to \mathbb{R}$ be continuous and Riemann integrable, and let $|I| > 0$. Let $p > 1$. Then
\[
\left( \int_I |f + g|^p \right)^{1/p} \leq \left( \int_I |f|^p \right)^{1/p} + \left( \int_I |g|^p \right)^{1/p}
\]
with equality if and only if $f$ and $g$ are linearly dependent with a constant $c \geq 0$.

Proof. Let $q$ be such that
\[
\frac{1}{q} + \frac{1}{p} = 1, \qquad \text{i.e.} \qquad q - 1 = \frac{1}{p - 1}, \qquad \text{so} \qquad q = 1 + \frac{1}{p - 1}
\]
We also then have
\[
q(p - 1) = \left( 1 + \frac{1}{p - 1} \right)(p - 1) = p - 1 + 1 = p
\]
By the triangle inequality,
\[
\int_I |f + g|^p = \int_I |f + g| \cdot |f + g|^{p-1} \leq \int_I (|f| + |g|) \cdot |f + g|^{p-1}
\]
with equality if and only if $f(x)$ and $g(x)$ have the same sign for each $x \in I$. Then by Holder's Inequality,
\[
\int_I |f| \cdot |f + g|^{p-1} + \int_I |g| \cdot |f + g|^{p-1} \leq \left( \int_I |f|^p \right)^{1/p} \left( \int_I |f + g|^{q(p-1)} \right)^{1/q} + \left( \int_I |g|^p \right)^{1/p} \left( \int_I |f + g|^{q(p-1)} \right)^{1/q}
\]
\[
= \left( \left( \int_I |f|^p \right)^{1/p} + \left( \int_I |g|^p \right)^{1/p} \right) \left( \int_I |f + g|^p \right)^{1/q}
\]
using $q(p - 1) = p$. If $\int_I |f + g|^p = 0$ the theorem is trivial; otherwise, dividing by
\[
\left( \int_I |f + g|^p \right)^{1/q}
\]
we get
\[
\left( \int_I |f + g|^p \right)^{1 - 1/q} \leq \left( \int_I |f|^p \right)^{1/p} + \left( \int_I |g|^p \right)^{1/p}
\]
i.e., since $1 - 1/q = 1/p$,
\[
\left( \int_I |f + g|^p \right)^{1/p} \leq \left( \int_I |f|^p \right)^{1/p} + \left( \int_I |g|^p \right)^{1/p}
\]
For equality, through the use of Holder's Inequality, we need $|f|^p$ and $|f + g|^{q(p-1)} = |f + g|^p$ to be linearly dependent, and likewise $|g|^p$ and $|f + g|^p$ to be linearly dependent; in particular, we need $|f|$ and $|g|$ to be linearly dependent. Also, through the use of the triangle inequality, we need $f(x)$ and $g(x)$ to have the same sign for all $x \in I$. Combining these conditions, we need $f$ and $g$ to be linearly dependent with a constant $c \geq 0$. This is a necessary condition for equality, and it is straightforward to check it is also sufficient.
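Both inequalities can be tested numerically with Riemann sums. The sketch below checks Holder and Minkowski on $I = [0,1]$; the functions and the exponent $p$ are illustrative choices.

```python
# Numerical check of Holder and Minkowski on I = [0, 1] via midpoint Riemann sums.
# f, g and the exponent p are illustrative choices.
import math

def integrate(h, n=10_000):
    dx = 1.0 / n
    return sum(h((i + 0.5) * dx) for i in range(n)) * dx

p = 3.0
q = p / (p - 1)          # conjugate exponent: 1/p + 1/q = 1
f = lambda x: 1.0 + x
g = lambda x: math.cos(x)

norm = lambda h, r: integrate(lambda x: abs(h(x)) ** r) ** (1 / r)

# Holder: integral of |fg| <= ||f||_p * ||g||_q
holder_lhs = integrate(lambda x: abs(f(x) * g(x)))
assert holder_lhs <= norm(f, p) * norm(g, q)

# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
assert norm(lambda x: f(x) + g(x), p) <= norm(f, p) + norm(g, p)
```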

4.2 Lecture 14 - Power Series (I), Taylor Series, and Abel's Lemma/Theorem

Today, we will introduce formal power series. We will discuss how the radius of convergence affects the convergence of power series, including two powerful tools, namely Abel's Lemma and Abel's Theorem. We will also briefly introduce Taylor series.

Definition 4.2.1. A formal power series centered at $a \in \mathbb{R}$ is any series of the form
\[
\sum_{n=0}^\infty c_n (x - a)^n
\]
where $c_n \in \mathbb{R}$ is called the $n$th coefficient of the series. The radius of convergence of the series is defined to be
\[
R = \frac{1}{\limsup_{n \to \infty} |c_n|^{1/n}}
\]
We allow $R = +\infty$ if $\limsup_{n \to \infty} |c_n|^{1/n} = 0$ and $R = 0$ if $\limsup_{n \to \infty} |c_n|^{1/n} = \infty$.

Theorem 4.2.2. (a) If $|x - a| > R$, then
\[
\sum_{n=0}^\infty c_n (x - a)^n
\]
diverges.

(b) (S06.2) If $|x - a| < R$, then
\[
\sum_{n=0}^\infty c_n (x - a)^n
\]
converges absolutely.

Proof. (a) It is enough to show that $|c_n (x - a)^n|$ does not converge to $0$; to do this, it is enough to find infinitely many $n$ where $|c_n (x - a)^n| > 1$. By the definition of $\limsup$, for each $\varepsilon > 0$, there are infinitely many $n$ such that
\[
|c_n|^{1/n} > \limsup_{n \to \infty} |c_n|^{1/n} - \varepsilon
\]
So for each $\varepsilon > 0$, we have infinitely many $n$ such that
\[
|c_n|^{1/n} > \frac{1}{R} - \varepsilon
\]
Since $|x - a| > R$ by assumption, put
\[
\varepsilon = \left( \frac{|x - a|}{R} - 1 \right) \cdot \frac{1}{|x - a|} > 0
\]
Then for infinitely many $n$,
\[
|c_n|^{1/n} |x - a| > \left( \frac{1}{R} - \varepsilon \right) |x - a| = \frac{|x - a|}{R} - \frac{|x - a|}{R} + 1 = 1
\]
so $|c_n (x - a)^n| > 1$ for infinitely many $n$. In particular, $(c_n (x - a)^n)_{n \in \mathbb{N}}$ does not converge to $0$.

(b) Again using the definition of $R$ and $\limsup$, for every $\varepsilon > 0$,
\[
|c_n|^{1/n} < \frac{1}{R} + \varepsilon
\]
for all but finitely many $n$ (say for all $n > k \in \mathbb{N}$). Since $|x - a| < R$ by assumption, put
\[
\varepsilon = \left( 1 - \frac{|x - a|}{R} \right) \cdot \frac{1}{2|x - a|} > 0
\]
Then for all but finitely many $n$,
\[
|c_n|^{1/n} |x - a| < \frac{|x - a|}{R} + \left( 1 - \frac{|x - a|}{R} \right) \cdot \frac{1}{2} = \frac{R + |x - a|}{2R} < \frac{2R}{2R} = 1
\]
whence $|c_n (x - a)^n| < L^n$, where $L = (R + |x - a|)/(2R) < 1$, so
\[
\sum_{n=0}^\infty |c_n (x - a)^n| = \sum_{n=0}^k |c_n (x - a)^n| + \sum_{n=k+1}^\infty |c_n (x - a)^n| < \sum_{n=0}^k |c_n (x - a)^n| + \sum_{n=k+1}^\infty L^n < \infty
\]

Theorem 4.2.3. Assume $R > 0$. Let $f : (a - R, a + R) \to \mathbb{R}$ be given by
\[
f(x) = \sum_{n=0}^\infty c_n (x - a)^n
\]
(Note $f(x)$ is well-defined on the specified domain by the previous theorem.)

(a) For any $r < R$, the series $\sum_{n=0}^\infty c_n (x - a)^n$ converges uniformly to $f$ on $[a - r, a + r]$. In particular, $f$ is continuous on $[a - r, a + r]$, hence on $(a - R, a + R)$.

(b) (Term by term differentiation of power series) $f$ is differentiable on $(a - R, a + R)$. For every $r < R$, the series
\[
\sum_{n=1}^\infty n c_n (x - a)^{n-1}
\]
converges uniformly to $f'$ on $[a - r, a + r]$.

(c) (Term by term integration of power series) For any closed $[y, z] \subseteq (a - R, a + R)$,
\[
\int_y^z f = \sum_{n=0}^\infty \frac{c_n (x - a)^{n+1}}{n + 1} \bigg|_y^z
\]

Proof. The proofs here are relatively straightforward with the help of the previous theorem, with the exception of (b), which is a little tricky.

(a) The key here is to use the Weierstrass M-test with the help of the bound from the previous proof. That is, in our proof of (b) in the last theorem, used with $x = a + r$, we found $L < 1$ such that for all $n > k$ (for some $k \in \mathbb{N}$),
\[
|c_n r^n| < L^n
\]
Then certainly, since $|x - a| \leq r$ for each $x \in [a - r, a + r]$, we have
\[
|c_n (x - a)^n| \leq |c_n r^n| < L^n
\]
for all $n > k$. Thus, it follows that
\[
\sum_{n=0}^\infty \sup_{x \in [a-r, a+r]} |c_n (x - a)^n| \leq \sum_{n=0}^k \sup_{x \in [a-r, a+r]} |c_n (x - a)^n| + \sum_{n=k+1}^\infty L^n < \infty
\]
Hence, by the Weierstrass M-test, $\sum_{n=0}^m c_n (x - a)^n$ converges uniformly to $f$ on $[a - r, a + r]$ as $m \to \infty$.

(b) By Tao II 3.7.2, it is sufficient to show that the sequence of derivatives
\[
\left( \left( \sum_{n=0}^l c_n (x - a)^n \right)' \right)_{l \in \mathbb{N}} = \left( \sum_{n=1}^l n c_n (x - a)^{n-1} \right)_{l \in \mathbb{N}}
\]
converges uniformly on $[a - r, a + r]$ (plus convergence of the original series at a point, but this is given by part (a)). Thus, we have to show
\[
\sum_{n=1}^\infty n c_n (x - a)^{n-1}
\]
converges uniformly on $[a - r, a + r]$. For this, by part (a) of this theorem, it is enough to show that the radius of convergence of this series is $> r$. And for that, it is sufficient to find one point $x$ outside $[a - r, a + r]$ at which the series converges; this is an application of part (a) of the previous theorem (the series must diverge at EVERY point beyond its radius of convergence). Pick $x, w \in \mathbb{R}$ such that $r < |x - a| < |w - a| < R$. Since $|w - a| < R$, the series for $f(w)$ converges absolutely by part (b) of the previous theorem; in particular, $|c_n| |w - a|^n$ is bounded by some $M \in \mathbb{R}$. Then we compute
\[
\sum_{n=1}^\infty n |c_n (x - a)^{n-1}| = \sum_{n=1}^\infty n |c_n| |w - a|^{n-1} \frac{|x - a|^{n-1}}{|w - a|^{n-1}} \leq \frac{M}{|w - a|} \cdot \sum_{n=1}^\infty n \left( \frac{|x - a|}{|w - a|} \right)^{n-1}
\]
Since $|x - a| / |w - a| < 1$, it is not hard to see that
\[
\sum_{n=1}^\infty n \left( \frac{|x - a|}{|w - a|} \right)^{n-1}
\]
converges, whence we're done.

(c) This is similar to (b), but easier, leveraging the information given in part (a). Since
\[
f_m = \sum_{n=0}^m c_n (x - a)^n
\]
converges uniformly to $f$ on $[a - r, a + r]$ for any $r < R$ by part (a), it follows that on any $[y, z] \subseteq (a - R, a + R)$,
\[
\int_y^z f_m \to \int_y^z f
\]
i.e.
\[
\lim_{m \to \infty} \int_y^z \sum_{n=0}^m c_n (x - a)^n = \lim_{m \to \infty} \sum_{n=0}^m \int_y^z c_n (x - a)^n = \int_y^z \lim_{m \to \infty} \sum_{n=0}^m c_n (x - a)^n
\]
so the desired conclusion follows.

Definition 4.2.4. $f : E \to \mathbb{R}$ is real analytic at $a \in \mathrm{Int}(E)$ if on some neighborhood $(a - r, a + r) \subseteq E$, $f$ is equal to a power series centered at $a$ with radius of convergence $\geq r$. We say $f$ is real analytic on an open set $E$ if $f$ is real analytic at each $a \in E$.

Proposition 4.2.5. If $f$ is real analytic on $E$, then $f$ is smooth ($k$-times continuously differentiable for all $k \in \mathbb{N}$), and for each $k$, $f^{(k)}$ is real analytic on $E$.

Proof. We proved both the base case and the induction step in part (b) of the previous theorem.

Corollary 4.2.6 (Taylor's formula). Let $f : E \to \mathbb{R}$ be real analytic at $a \in \mathrm{Int}(E)$. Say
\[
f(x) = \sum_{n=0}^\infty c_n (x - a)^n
\]
on some $(a - r, a + r)$. Then for all $k \in \mathbb{N}$, $f^{(k)}(a) = k! \cdot c_k$. In particular,
\[
f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!} (x - a)^n
\]

Proof. Since $f$ is real analytic at $a$, we can apply part (b) of the previous theorem to differentiate $f(x)$ term by term. Then a simple induction argument shows that the constant term of the power series expansion for $f^{(k)}(x)$ at $a$ is $k! \cdot c_k$. Plugging in at $a$ makes all the higher order terms vanish, so we obtain $f^{(k)}(a) = k! \cdot c_k$. Isolating $c_k$ gives $c_k = f^{(k)}(a) / k!$, yielding the familiar Taylor expansion on $(a - r, a + r)$.

Corollary 4.2.7. If $f$ is representable by (equal to) two power series with coefficients $(c_n), (d_n)$, then $c_n = d_n$ for each $n$.

Proof. By the above corollary, the Taylor formula forces the values of $c_n$ and $d_n$: both equal $f^{(n)}(a) / n!$.

Next, we want to consider the behavior of power series at the endpoints of the interval of convergence. The series may converge or diverge there, but we show that if the series converges at $x = a - R$ (resp. $x = a + R$), then it converges uniformly on $[a - R, a]$ (resp. $[a, a + R]$).

Lemma 4.2.8 (Abel's Lemma (F12.1)). Let $(b_n)$ be a (non-strictly) decreasing sequence of non-negative reals. Let $(a_n)$ be such that the partial sums $(\sum_{n=1}^m a_n)_m$ are bounded in absolute value by $A$. Then
\[
\left| \sum_{j=m+1}^n a_j b_j \right| \leq 2 A b_{m+1}
\]

Proof. The key here is to think of this as a special kind of "integration by parts" for sums. Let
\[
s_m = \sum_{n=1}^m a_n
\]
We know $|s_m|$ is bounded by $A$.

Claim 4.2.9 (Summation by parts). This is the parallel for sums of integration by parts:
\[
\sum_{j=m+1}^n a_j b_j = s_n b_{n+1} - s_m b_{m+1} - \sum_{j=m+1}^n s_j (b_{j+1} - b_j)
\]

Proof. Using $a_j = s_j - s_{j-1}$ and shifting the index in the second sum,
\[
\sum_{j=m+1}^n a_j b_j = \sum_{j=m+1}^n s_j b_j - \sum_{j=m+1}^n s_{j-1} b_j = \sum_{j=m+1}^n s_j b_j - \sum_{j=m}^{n-1} s_j b_{j+1}
\]
\[
= \sum_{j=m+1}^n s_j b_j - \sum_{j=m+1}^n s_j b_{j+1} + s_n b_{n+1} - s_m b_{m+1} = s_n b_{n+1} - s_m b_{m+1} - \sum_{j=m+1}^n s_j (b_{j+1} - b_j)
\]

Now to prove Abel's Lemma, note (by the claim above, and since $(b_j)$ is decreasing and non-negative),
\[
\left| \sum_{j=m+1}^n a_j b_j \right| \leq |s_n| b_{n+1} + |s_m| b_{m+1} + \sum_{j=m+1}^n |s_j| (b_j - b_{j+1})
\]
\[
\leq A b_{n+1} + A b_{m+1} + \sum_{j=m+1}^n A (b_j - b_{j+1}) = A b_{n+1} + A b_{m+1} + (A b_{m+1} - A b_{n+1}) = 2 A b_{m+1}
\]
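The summation-by-parts identity and the resulting bound are easy to verify on data. The sketch below uses randomly generated (seeded) sequences; the sizes and indices are illustrative.

```python
# Checking the summation-by-parts identity behind Abel's Lemma, and the bound
# |sum_{j=m+1}^n a_j b_j| <= 2*A*b_{m+1}, on seeded random data (sizes illustrative).
import random

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(40)]                        # a_1, ..., a_40
b = sorted((random.uniform(0, 1) for _ in range(41)), reverse=True)   # decreasing, >= 0

def s(m):  # partial sum s_m = a_1 + ... + a_m  (list is 0-based, indices shifted by 1)
    return sum(a[:m])

m, n = 5, 30
lhs = sum(a[j - 1] * b[j - 1] for j in range(m + 1, n + 1))
rhs = s(n) * b[n] - s(m) * b[m] - sum(s(j) * (b[j] - b[j - 1]) for j in range(m + 1, n + 1))
assert abs(lhs - rhs) < 1e-12                      # the identity of Claim 4.2.9

A = max(abs(s(k)) for k in range(1, len(a) + 1))   # bound on all partial sums
assert abs(lhs) <= 2 * A * b[m] + 1e-12            # b[m] holds b_{m+1} in 0-based indexing
```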

Theorem 4.2.10. If
\[
\sum_{n=0}^\infty c_n (x - a)^n
\]
converges at $x = a + R$ (for $R > 0$), then the series converges uniformly on $[a, a + R]$. Similarly at $a - R$.

Proof. For simplicity, suppose that $a = 0$. Fix $\varepsilon > 0$. By convergence of $\sum c_n x^n$ at $x = R$, there exists $N$ such that for $n > m > N$,
\[
\left| \sum_{j=m+1}^n c_j R^j \right| < \frac{\varepsilon}{3}
\]
Now consider
\[
\sum_{j=m+1}^n c_j x^j = \sum_{j=m+1}^n c_j R^j \left( \frac{x}{R} \right)^j
\]
for $0 \leq x < R$. By Abel's Lemma, used with $a_j = c_j R^j$ and $b_j = (x/R)^j$ (a decreasing sequence of non-negative reals, since $0 \leq x/R < 1$), we obtain
\[
\left| \sum_{j=m+1}^n c_j R^j \left( \frac{x}{R} \right)^j \right| \leq 2 \cdot \frac{\varepsilon}{3} \cdot \left( \frac{x}{R} \right)^{m+1} < \varepsilon
\]
since $x < R$. At $x = R$ itself, the tail is $< \varepsilon/3 < \varepsilon$ directly. So the partial sums are uniformly Cauchy on $[0, R]$, whence the series converges uniformly there.

Corollary 4.2.11 (Abel's Theorem). If
\[
\sum_{n=0}^\infty c_n (x - a)^n
\]
converges at $x = a + R$ (for $R > 0$), then the function
\[
f(x) = \sum_{n=0}^\infty c_n (x - a)^n
\]
is continuous from the left at $a + R$, i.e.
\[
\lim_{x \to (a+R)^-} \sum_{n=0}^\infty c_n (x - a)^n = \sum_{n=0}^\infty c_n R^n
\]
Similarly at $a - R$.

Proof. A uniform limit of continuous functions is continuous, so this is immediate by the previous theorem.

Example 4.2.12. We compute the Taylor series for $f(x) = \sqrt{1 + x}$ around $a = 0$.
\[
f(x) = (1 + x)^{1/2} \implies f(0) = 1
\]
\[
f'(x) = \frac{1}{2} (1 + x)^{-1/2} \implies f'(0) = \frac{1}{2}
\]
\[
f''(x) = -\frac{1}{4} (1 + x)^{-3/2} \implies f''(0) = -\frac{1}{4}
\]
\[
f'''(x) = \frac{3}{8} (1 + x)^{-5/2} \implies f'''(0) = \frac{3}{8}
\]
and in general, for $k \geq 1$,
\[
f^{(k)}(x) = \frac{(-1)^{k+1} (2k - 2)!}{(k - 1)! \, 2^{2k-1}} (1 + x)^{-(2k-1)/2} \implies f^{(k)}(0) = \frac{(-1)^{k+1} (2k - 2)!}{(k - 1)! \, 2^{2k-1}}
\]
So the Taylor series for $f$ is
\[
\sum_{k=0}^\infty \frac{(-1)^{k+1} (2k - 2)!}{(k - 1)! \, 2^{2k-1} \, k!} x^k = \sum_{k=0}^\infty \frac{(-1)^{k+1} (2k)!}{(k!)^2 \, 4^k \, (2k - 1)} x^k
\]
Call
\[
c_k = \frac{(-1)^{k+1} (2k)!}{(k!)^2 \, 4^k \, (2k - 1)}
\]
(This formula for $c_k$ works also at $k = 0, 1$, giving $c_0 = 1$ and $c_1 = 1/2$.) We take for granted for now that $\sum_k c_k x^k$ converges to $\sqrt{1 + x}$ on $(-1, 0]$. At $x = -1$, we have
\[
\sum_{k=0}^\infty \frac{(-1)^{k+1} (2k)!}{(k!)^2 4^k (2k - 1)} \cdot (-1)^k = \frac{(-1) \cdot 0!}{(0!)^2 4^0 \cdot (-1)} + \sum_{k=1}^\infty \left( -\frac{(2k)!}{(k!)^2 4^k (2k - 1)} \right) = 1 - \sum_{k=1}^\infty \frac{(2k)!}{(k!)^2 4^k (2k - 1)}
\]
This is the limit of a decreasing sequence of partial sums; in fact the same is true for any $x \in (-1, 0]$, since for $k \geq 1$ the term $c_k x^k = -|c_k| |x|^k$ is non-positive there. Let
\[
s_n(x) = \sum_{j=0}^n c_j x^j
\]
Then on $(-1, 0]$, $(s_n(x))_n$ is decreasing and converges to $\sqrt{1 + x} > 0$; in particular $s_n(x) > 0$ on $(-1, 0]$, so by continuity of $s_n$, $s_n(-1) \geq 0$. Thus $(s_n(-1))_{n \in \mathbb{N}}$ is decreasing and bounded below by $0$, whence it converges. That is, our power series converges at $x = -1$; by the theorem, it converges uniformly on $[-1, 0]$.

Let's recap. We saw that the polynomials $s_n$ converge uniformly to $\sqrt{1 + x}$ on $[-1, 0]$. In the next lecture, we will prove some more general theorems about functions which can be uniformly approximated by polynomials. We also found, by Abel's Theorem,
\[
\lim_{n \to \infty} s_n(-1) = 1 - \sum_{k=1}^\infty \frac{(2k)!}{(k!)^2 4^k (2k - 1)} = \sqrt{1 + (-1)} = 0
\]
so
\[
\sum_{k=1}^\infty \frac{(2k)!}{(k!)^2 4^k (2k - 1)} = 1
\]
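The computations in this example can be checked numerically. Convergence at $x = -1$ is slow (the terms decay like $k^{-3/2}$), so the sketch below verifies fast convergence at an interior point and only the monotone decay to $0$ at the endpoint; the truncation levels are illustrative.

```python
# Numerical check of the example: partial sums s_n(x) of the Taylor series of sqrt(1+x) at 0,
# with c_k = (-1)^(k+1) (2k)! / ((k!)^2 4^k (2k-1)). Truncation levels are illustrative.
import math

def c(k):
    return (-1) ** (k + 1) * math.factorial(2 * k) / (math.factorial(k) ** 2 * 4 ** k * (2 * k - 1))

assert c(0) == 1.0 and c(1) == 0.5 and abs(c(2) + 0.125) < 1e-15

def s(n, x):
    return sum(c(k) * x ** k for k in range(n + 1))

# Fast convergence at the interior point x = -0.5:
assert abs(s(40, -0.5) - math.sqrt(0.5)) < 1e-12

# At x = -1 the partial sums decrease to 0, i.e. sum_{k>=1} (2k)!/((k!)^2 4^k (2k-1)) -> 1.
tail = [s(n, -1.0) for n in (10, 100, 1000)]
assert all(t >= 0 for t in tail) and tail[0] > tail[1] > tail[2]
assert tail[2] < 0.02
```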

4.3 Lecture 15 - Stone-Weierstrass and Taylor Series Error Approximation

Today, we will discuss the approximation of certain classes of functions uniformly by polynomials; we will use this to prove the Stone-Weierstrass theorem. We will also give various forms of the error bounds for Taylor expansions.

Last time, we showed there is a sequence of polynomials which converges uniformly to $\sqrt{1 + x}$ on $[-1, 0]$. We still need to show that the Taylor expansion of $\sqrt{1 + x}$ at $0$ converges to $\sqrt{1 + x}$ on $(-1, 0]$. We will prove this later.

Proposition 4.3.1. Let $X$, $Y$ and $Z$ be general metric spaces. Suppose $g_n : X \to Y$ converge uniformly to $g$, and $f_n : Y \to Z$ converge uniformly to $f$, and $f$ is uniformly continuous. Then $(f_n \circ g_n)$ converges uniformly to $f \circ g$.

Proof. Fix $\varepsilon > 0$. By the uniform continuity of $f$, there exists $\delta > 0$ such that $d(y_1, y_2) < \delta \implies d(f(y_1), f(y_2)) < \varepsilon/2$. Using the uniform convergence of $(f_n)$ and $(g_n)$, there exists $N \in \mathbb{N}$ such that for all $n > N$,
\[
d(f_n(y), f(y)) < \frac{\varepsilon}{2} \quad \text{and} \quad d(g_n(x), g(x)) < \delta
\]
for all $y \in Y$ and $x \in X$. Then for all $n > N$, and for all $x \in X$,
\[
d(f(g(x)), f_n(g_n(x))) \leq d(f(g(x)), f(g_n(x))) + d(f(g_n(x)), f_n(g_n(x))) < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon
\]
since $d(g(x), g_n(x)) < \delta$ for all $x \in X$, and $f_n \to f$ uniformly (applied at the point $y = g_n(x)$).

Proposition 4.3.2. Each of the following is a uniform limit of polynomials on the indicated domain:

(1) $x \mapsto |x|$ on $[-1, 1]$

(2) $x, y \mapsto |x - y|/2$ on $[-1, 1]^2$

(3) $x, y \mapsto \min(x, y)$ on $[-1, 1]^2$

(4) $x, y \mapsto \max(x, y)$ on $[-1, 1]^2$

(5) $x_1, \ldots, x_k \mapsto \min(x_1, \ldots, x_k)$ on $[-1, 1]^k$

(6) $x_1, \ldots, x_k \mapsto \max(x_1, \ldots, x_k)$ on $[-1, 1]^k$

Proof. (1) Let $h$ be $x \mapsto |x|$, let $g : [-1, 1] \to [-1, 0]$ be given by $x \mapsto x^2 - 1$, and let $f : [-1, 0] \to [0, 1]$ be given by $y \mapsto \sqrt{y + 1}$. Then one can see $h = f \circ g$, since $|x| = \sqrt{x^2}$. We showed in the last lecture that $f$ is a uniform limit of polynomials; call them $(f_n)$. Also, $f$ is uniformly continuous, since $f$ is continuous on the compact set $[-1, 0]$. Since $g$ is itself a polynomial, it is trivially the uniform limit of the polynomials $(g_n)$ where $g_n = g$ for each $n \in \mathbb{N}$. By the previous proposition, $(f_n \circ g_n)$ converges uniformly to $f \circ g = h$, and $f_n \circ g_n$ is a composition of polynomials for each $n \in \mathbb{N}$, hence a polynomial.

(2) Note $x, y \mapsto |x - y|/2$ is the composition of $x, y \mapsto x - y$, $z \mapsto |z|$, and $w \mapsto w/2$. Since $z \mapsto |z|$ is a uniform limit of polynomials, a near identical argument to that given in (1) shows $x, y \mapsto |x - y|/2$ is a uniform limit of polynomials.

(3) Note
\[
\min(x, y) = \frac{x + y}{2} - \frac{|x - y|}{2}
\]
The first term is a polynomial in $x$ and $y$, and the second term is a uniform limit of polynomials by (2), so a similar argument to (1) and (2) does the job.

(4) Similarly, note
\[
\max(x, y) = \frac{x + y}{2} + \frac{|x - y|}{2}
\]

(5) This can be done using a simple induction argument, via the composition
\[
\min(x_1, \ldots, x_k, x_{k+1}) = \min(\min(x_1, \ldots, x_k), x_{k+1})
\]

(6) This can similarly be done using a simple induction argument, via the composition
\[
\max(x_1, \ldots, x_k, x_{k+1}) = \max(\max(x_1, \ldots, x_k), x_{k+1})
\]
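The construction in (1) is concrete enough to run: composing the partial sums $s_n$ of the series for $\sqrt{1 + y}$ from the last lecture with $g(x) = x^2 - 1$ gives explicit polynomials approaching $|x|$ uniformly on $[-1, 1]$. A sketch (the grid size and degrees are illustrative; the coefficient recurrence $c_k = -c_{k-1}(2k-3)/(2k)$ follows from the closed form):

```python
# Uniform polynomial approximation of |x| on [-1, 1] via |x| = sqrt(1 + (x^2 - 1)),
# using partial sums of the Taylor series of sqrt(1+y): c_0 = 1, c_k = -c_{k-1}*(2k-3)/(2k).

def coeffs(n):
    cs = [1.0]
    for k in range(1, n + 1):
        cs.append(-cs[-1] * (2 * k - 3) / (2 * k))
    return cs

def sup_error(n, grid_size=500):
    """Sup over a grid of |p_n(x) - |x||, where p_n(x) = s_n(x^2 - 1) has degree 2n."""
    cs = coeffs(n)
    worst = 0.0
    for i in range(-grid_size, grid_size + 1):
        x = i / grid_size
        y = x * x - 1.0
        val = 0.0
        for ck in reversed(cs):      # evaluate s_n(y) by Horner's rule
            val = val * y + ck
        worst = max(worst, abs(val - abs(x)))
    return worst

errs = [sup_error(n) for n in (5, 50, 500)]
assert errs[0] > errs[1] > errs[2]   # uniform error decreases with degree
assert errs[2] < 0.03                # worst case is at x = 0, decaying like 1/sqrt(n)
```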

Proposition 4.3.3. The previous proposition holds true if we replace $[-1, 1]$ in the domains by $[-M, M]$ for any $M > 0$.

Proof. This is clear by composing the functions above with the linear polynomial which scales the interval $[-M, M]$ to $[-1, 1]$.

Theorem 4.3.4 (Stone-Weierstrass). Let $X$ be a compact metric space. Let $A \subseteq C(X)$ (where $C(X)$ is the set of continuous functions from $X$ into $\mathbb{R}$) satisfy:

(1) $A$ is closed under sums, products, and products with scalars. Precisely:
\[
g \in A, \; c \in \mathbb{R} \implies c \cdot g \in A
\]
\[
f, g \in A \implies f + g \in A
\]
\[
f, g \in A \implies fg \in A
\]
I.e., $A$ is an algebra.

(2) The constant function $x \mapsto 1$ belongs to $A$ (i.e. $A$ is unital).

(3) For every $x_1 \neq x_2 \in X$, there exists $f \in A$ such that $f(x_1) \neq f(x_2)$ ($A$ separates points in $X$).

Then every $f \in C(X)$ is a uniform limit of functions in $A$.

Example 4.3.5. If $X = [a, b] \subseteq \mathbb{R}$, and $A$ is the set of all polynomials, then $A$ satisfies the criteria in Stone-Weierstrass, so every continuous function $f : [a, b] \to \mathbb{R}$ is a uniform limit of polynomials.

Proof. As one might imagine, there is some heavy lifting to be done here. The proof is essentially broken up into three steps. The first is to find functions in our algebra which match $f$ at any two given points. Next, we use the continuity of these approximating functions to keep them close to $f$ on small neighborhoods, and shrink down to finitely many neighborhoods using compactness; we then take the minimum of the finitely many functions we recover, approximating the min function in finitely many variables uniformly by polynomials (using the previous proposition!), which bounds the difference between $f$ and our approximating function from above. The third and final crucial step is to repeat this process using the max function to bound the difference between $f$ and our approximating function from below. This is all made precise in the following claims.

Fix $f \in C(X)$ and $\varepsilon > 0$. We need $g \in A$ such that for all $x \in X$,
\[
|g(x) - f(x)| < \varepsilon
\]

Claim 4.3.6. For every $s, t \in X$, there exists $f_{s,t} \in A$ such that $f_{s,t}(s) = f(s)$ and $f_{s,t}(t) = f(t)$.

Proof. If $s = t$, then we can use the constant function $x \mapsto f(s)$. Suppose $s \neq t$. By (3), there exists $h \in A$ such that $h(s) \neq h(t)$. Since $A$ is closed under multiplication by a scalar, we can rescale $h$ so that $h(t) - h(s) = f(t) - f(s)$, by multiplying $h$ by
\[
\frac{f(s) - f(t)}{h(s) - h(t)}
\]
where the denominator is nonzero by hypothesis. Then adding the constant function $x \mapsto f(s) - h(s)$ (the resulting function still lives in $A$ by (1) and (2)) gives $f_{s,t}$ with
\[
f_{s,t}(s) = f(s) \quad \text{and} \quad f_{s,t}(t) = f(t)
\]

Claim 4.3.7. For every $s \in X$, there is $h_s \in A$ such that

(1) $|h_s(s) - f(s)| < \varepsilon/2$

(2) For all $x \in X$, $h_s(x) < f(x) + \varepsilon/2$

Proof. The first condition might lead you to think we are taking a step backwards here; after all, we previously found elements of $A$ which agreed exactly with $f$ at $s$. However, the first condition comes at the cost of the second condition, which is clearly much stronger than exact agreement at two points; our loss of exactness at $s$ is the price for condition (2).

Fix $s \in X$. For each $t \in X$, by the previous claim there exists $f_{s,t} \in A$ such that
\[
f_{s,t}(s) = f(s) \quad \text{and} \quad f_{s,t}(t) = f(t)
\]
By the continuity of $f - f_{s,t}$ (which vanishes at $t$), there exists an open neighborhood $U_t$ of $t$ such that for all $x \in U_t$,
\[
|f_{s,t}(x) - f(x)| < \frac{\varepsilon}{4}
\]
In particular, for all $x \in U_t$, $f_{s,t}(x) < f(x) + \varepsilon/4$. It is clear that the set of all such $U_t$ for $t \in X$ forms an open cover of $X$, so by compactness, there are finitely many $t_1, \ldots, t_k$ such that $U_{t_1} \cup \cdots \cup U_{t_k} = X$. Let
\[
h(x) = \min(f_{s,t_1}, \ldots, f_{s,t_k})(x)
\]
Note that $h(s) = f(s)$, and for all $x \in X$, $h(x) < f(x) + \varepsilon/4$. To see the latter, note that for each $x \in X$, there is an $i \in \{1, \ldots, k\}$ such that $x \in U_{t_i}$, whence
\[
h(x) = \min(f_{s,t_1}, \ldots, f_{s,t_k})(x) \leq f_{s,t_i}(x) < f(x) + \frac{\varepsilon}{4}
\]
Let $M > 0$ bound $|f_{s,t_1}|, \ldots, |f_{s,t_k}|$ on $X$, which is possible since each of $f_{s,t_1}, \ldots, f_{s,t_k}$ is continuous and $X$ is compact. Let $p : [-M, M]^k \to \mathbb{R}$ be a polynomial such that for all $(y_1, \ldots, y_k) \in [-M, M]^k$,
\[
|p(y_1, \ldots, y_k) - \min(y_1, \ldots, y_k)| < \frac{\varepsilon}{4}
\]
This is possible by our earlier result on uniform approximation of the min in $k$ variables by polynomials. Now let $h_s(x) = p(f_{s,t_1}, \ldots, f_{s,t_k})(x)$. Then $h_s \in A$, since $A$ is closed under "polynomial" operations and is unital, and for all $x \in X$,
\[
|h_s(x) - h(x)| < \frac{\varepsilon}{4}
\]
So, since $h(s) = f(s)$,
\[
|h_s(s) - f(s)| = |h_s(s) - h(s)| < \frac{\varepsilon}{4} < \frac{\varepsilon}{2}
\]
and for all $x \in X$,
\[
h_s(x) < h(x) + \frac{\varepsilon}{4} < f(x) + \frac{\varepsilon}{4} + \frac{\varepsilon}{4} = f(x) + \frac{\varepsilon}{2}
\]

Claim 4.3.8. There is $g \in A$ such that for all $x \in X$,
\[
f(x) - \varepsilon < g(x) < f(x) + \varepsilon
\]
This finishes the proof.

Proof. We almost had exactly what we wanted with the last claim, but the trouble is that we could only get an upper bound on our approximating function. Now, we rig things similarly to get a lower bound on the approximation error while not losing our upper bound.

For all $s \in X$, $|h_s(s) - f(s)| < \varepsilon/2$ by the last claim. By the continuity of $h_s - f$, we have an open neighborhood $V_s$ of $s$ such that for all $x \in V_s$,
\[
|h_s(x) - f(x)| < \frac{3\varepsilon}{4}
\]
In particular, for all $x \in V_s$,
\[
f(x) - \frac{3\varepsilon}{4} < h_s(x)
\]
Once again, the set of all neighborhoods $V_s$ for $s \in X$ forms an open cover of $X$, so by the compactness of $X$, there are finitely many $s_1, \ldots, s_l$ such that $V_{s_1} \cup \cdots \cup V_{s_l} = X$. Let $h = \max(h_{s_1}, \ldots, h_{s_l})$. Then for all $x \in X$,
\[
h(x) < f(x) + \frac{\varepsilon}{2}
\]
since $h_{s_i}(x) < f(x) + \varepsilon/2$ for each $x \in X$ and each $i$ by the previous claim, and
\[
f(x) - \frac{3\varepsilon}{4} < h(x)
\]
To see the latter, note that for each $x \in X$, there is an $i \in \{1, \ldots, l\}$ such that $x \in V_{s_i}$, whence
\[
h(x) = \max(h_{s_1}, \ldots, h_{s_l})(x) \geq h_{s_i}(x) > f(x) - \frac{3\varepsilon}{4}
\]
Now, as in the previous claim, we can find a polynomial $q$ on $[-M', M']^l$ for $M'$ sufficiently large such that for all $(y_1, \ldots, y_l) \in [-M', M']^l$,
\[
|q(y_1, \ldots, y_l) - \max(y_1, \ldots, y_l)| < \frac{\varepsilon}{4}
\]
Set $g = q(h_{s_1}, \ldots, h_{s_l})$. Then $g \in A$, and for all $x \in X$,
\[
g(x) < h(x) + \frac{\varepsilon}{4} < f(x) + \frac{\varepsilon}{2} + \frac{\varepsilon}{4} < f(x) + \varepsilon
\]
and
\[
g(x) > h(x) - \frac{\varepsilon}{4} > f(x) - \frac{3\varepsilon}{4} - \frac{\varepsilon}{4} = f(x) - \varepsilon
\]

These three claims together complete the proof. Note that we can replace the assumption that $A$ is unital with the weaker assumption that for all $x \in X$, there exists $h \in A$ such that $h(x) \neq 0$. One can then still prove the first claim with this weaker assumption. We also used that $A$ is unital to show that $A$ is closed under composition with polynomials $p$. But this was only needed if $p$ has a nonzero constant term; one can avoid this by showing that the min and max functions can in fact be uniformly approximated by polynomials with zero constant term.

Some Error Bounds

Theorem 4.3.9 (Generalization of the Mean Value Theorem). Let $f, g : [a, b] \to \mathbb{R}$ be continuous on $[a, b]$ and differentiable on $(a, b)$. Then there exists $c \in (a, b)$ such that
\[
f'(c)(g(b) - g(a)) = g'(c)(f(b) - f(a))
\]
Moreover, if $g'(x)$ is never $0$ on $(a, b)$, then
\[
\frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}
\]

Proof. Let
\[
h(x) = (f(x) - f(a))(g(b) - g(a)) - (g(x) - g(a))(f(b) - f(a))
\]
Then $h$ is continuous on $[a, b]$, differentiable on $(a, b)$, and $h(a) = 0 = h(b)$, so by Rolle's theorem, there exists $c \in (a, b)$ such that $h'(c) = 0$, i.e.
\[
h'(c) = f'(c)(g(b) - g(a)) - g'(c)(f(b) - f(a)) = 0
\]
proving our first claim. Now if $g'(x) \neq 0$ for all $x \in (a, b)$, then $g'(c) \neq 0$, and $g(b) - g(a) \neq 0$ by Rolle's Theorem (applied to $g$), so
\[
\frac{f'(c)}{g'(c)} = \frac{f(b) - f(a)}{g(b) - g(a)}
\]
As an aside, note that under the right hypotheses, L'Hopital's rule is an easy consequence of this theorem. See the third incarnation from Lecture 10 for the analogy.

Remainder Calculation for Taylor Series

Let $f$ be $n$-times continuously differentiable on $[a, x]$, and $(n+1)$-times differentiable on $(a, x)$. Consider the function
\[
F(u) = f(u) + f'(u)(x - u) + \frac{f''(u)}{2!}(x - u)^2 + \cdots + \frac{f^{(n)}(u)}{n!}(x - u)^n
\]
as a function of $u$ on the interval $[a, x]$. $F$ is continuous on $[a, x]$ and differentiable on $(a, x)$. Let $g : [a, x] \to \mathbb{R}$ be continuous on $[a, x]$ and differentiable on $(a, x)$. By the generalization of the Mean Value Theorem, there exists $c \in (a, x)$ such that
\[
\frac{F'(c)}{g'(c)} = \frac{F(x) - F(a)}{g(x) - g(a)}
\]
Note that differentiating $F$ as a function of $u$ gives a telescoping sum:
\[
F'(u) = f'(u) + \left[ -f'(u) + f''(u)(x - u) \right] + \left[ -f''(u)(x - u) + \frac{f'''(u)}{2!}(x - u)^2 \right] + \cdots + \left[ -\frac{f^{(n)}(u)}{(n-1)!}(x - u)^{n-1} + \frac{f^{(n+1)}(u)}{n!}(x - u)^n \right]
\]
\[
= \frac{f^{(n+1)}(u)}{n!}(x - u)^n
\]
Then note that $F(x) = f(x)$, while $F(a)$ is the Taylor expansion of $f$ around $a$ to the $n$th term, so
\[
F(x) - F(a) = f(x) - \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k
\]
is the remainder of the $n$th term Taylor expansion, denoted $R_n(x)$. By the generalized Mean Value Theorem, there then exists $c \in (a, x)$ such that
\[
\frac{f^{(n+1)}(c)(x - c)^n / n!}{g'(c)} = \frac{f(x) - \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k}{g(x) - g(a)}
\]

Theorem 4.3.10 (Taylor Expansion with Lagrange Remainder). Let $f$ be $n$-times continuously differentiable on $[a, x]$ and $(n+1)$-times differentiable on $(a, x)$. Then
\[
f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x)
\]
where
\[
R_n(x) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}
\]
for some $c \in (a, x)$.

Proof. Use the above computation with $g(u) = (x - u)^{n+1}$. Then
\[
g'(u) = -(n + 1)(x - u)^n
\]
and
\[
g(x) - g(a) = 0 - (x - a)^{n+1}
\]
so there exists some $c \in (a, x)$ such that
\[
R_n(x) = \frac{f^{(n+1)}(c)(x - c)^n / n!}{-(n + 1)(x - c)^n} \cdot \left( 0 - (x - a)^{n+1} \right) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1}
\]
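The Lagrange form turns directly into a computable error bound: since every derivative of $\sin$ is bounded by $1$, $|\sin(x) - T_n(x)| \leq |x|^{n+1}/(n+1)!$. The sketch below (with $f = \sin$ at $a = 0$ as an illustrative choice) checks this for several degrees and points.

```python
# Checking the Lagrange remainder bound for f = sin expanded at a = 0:
# |sin(x) - T_n(x)| <= |x|^(n+1) / (n+1)!, since |sin^(n+1)| <= 1 everywhere.
import math

def taylor_sin(n, x):
    """Taylor polynomial of sin at 0 up to degree n."""
    total = 0.0
    for k in range(n + 1):
        if k % 2 == 1:                      # sin has only odd-degree terms
            total += (-1) ** ((k - 1) // 2) * x ** k / math.factorial(k)
    return total

for n in (1, 3, 5, 8):
    for x in (-2.0, -0.5, 0.3, 1.0, 2.5):
        bound = abs(x) ** (n + 1) / math.factorial(n + 1)
        assert abs(math.sin(x) - taylor_sin(n, x)) <= bound + 1e-15
```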

Theorem 4.3.11 (Taylor Expansion with Cauchy Remainder). Let $f$ be $n$-times continuously differentiable on $[a, x]$ and $(n+1)$-times differentiable on $(a, x)$. Then
\[
f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x)
\]
where
\[
R_n(x) = \frac{f^{(n+1)}(c)}{n!}(x - c)^n (x - a)
\]
for some $c \in (a, x)$.

Proof. Use the above computation with $g(u) = x - u$ (or $g(u) = u$). Then $g'(u) = -1$ and $g(x) - g(a) = -(x - a)$, so
\[
R_n(x) = \frac{f^{(n+1)}(c)(x - c)^n / n!}{-1} \cdot \left( -(x - a) \right) = \frac{f^{(n+1)}(c)}{n!}(x - c)^n (x - a)
\]

4.4 Lecture 16 - Power Series (II), Fubini’s Theorem, and exp(x)

Today, we will discuss yet another remainder term for Taylor expansions, the integral form of the Taylor remainder. We will also prove Fubini's theorem, and use this to justify the multiplication of power series. We will use these tools to introduce the exponential function.

Theorem 4.4.1 (Taylor Expansion with Integral Remainder). Let $f$ be $(n+1)$-times differentiable on $[a, x]$ with $f^{(n+1)}$ Riemann integrable on $[a, x]$. Then
\[
f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k + R_n(x)
\]
where
\[
R_n(x) = \int_a^x \frac{f^{(n+1)}(t)}{n!}(x - t)^n \, dt
\]

Proof. We proceed by induction on $n$. For the case $n = 0$, we need to show
\[
f(x) = f(a) + \int_a^x f'(t) \, dt
\]
which is true by the 1st fundamental theorem of calculus. Assume the theorem is true for $n \geq 0$. Let $f$ be $(n+2)$-times differentiable, with $f^{(n+2)}$ Riemann integrable. Then in particular, $f^{(n+1)}$ is differentiable, hence continuous, hence Riemann integrable. By the theorem for $n$, then:
\[
f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k + \int_a^x \frac{f^{(n+1)}(t)}{n!}(x - t)^n \, dt
\]
Note (when differentiating with respect to $t$)
\[
(x - t)^n = \left( -\frac{(x - t)^{n+1}}{n + 1} \right)'
\]
Hence, using integration by parts,
\[
\int_a^x \frac{f^{(n+1)}(t)}{n!}(x - t)^n \, dt = \frac{f^{(n+1)}(t)}{n!} \cdot \frac{-(x - t)^{n+1}}{n + 1} \bigg|_a^x + \int_a^x \frac{f^{(n+2)}(t)}{n!} \cdot \frac{(x - t)^{n+1}}{n + 1} \, dt
\]
\[
= \frac{f^{(n+1)}(a)}{n!} \cdot \frac{(x - a)^{n+1}}{n + 1} + \int_a^x \frac{f^{(n+2)}(t)}{(n + 1)!}(x - t)^{n+1} \, dt
\]
So
\[
f(x) = \sum_{k=0}^n \frac{f^{(k)}(a)}{k!}(x - a)^k + \frac{f^{(n+1)}(a)(x - a)^{n+1}}{(n + 1)!} + \int_a^x \frac{f^{(n+2)}(t)}{(n + 1)!}(x - t)^{n+1} \, dt
\]
\[
= \sum_{k=0}^{n+1} \frac{f^{(k)}(a)}{k!}(x - a)^k + R_{n+1}(x)
\]
which completes the induction.

One can use the integral form of the Taylor remainder to show that for every $x \in (-1, 0]$, the Taylor series for $\sqrt{1 + x}$ around $a = 0$ converges to $\sqrt{1 + x}$ at $x$. Note that doing so completes the proof of Stone-Weierstrass. Here are some alternative approaches to Stone-Weierstrass (on $[a, b] \subseteq \mathbb{R}$):

(i) Direct proof with explicit polynomials (this is the "Weierstrass" contribution of Stone-Weierstrass).

(ii) First approximate the $\delta$-"function" using polynomials; then given any $f$, its convolutions with these polynomials uniformly approach $f$. See Section 3.8 of Tao II for details.

Proposition 4.4.2 (S07.7). Let $f : \mathbb{R} \to \mathbb{R}$ be twice continuously differentiable. Suppose $f''$ is uniformly bounded and $f$ has a simple root at $x^*$, i.e. $f(x^*) = 0$ but $f'(x^*) \neq 0$. Let $x_0 \in \mathbb{R}$, and set $x_{n+1} = F(x_n)$, where
\[
F(x) = x - \frac{f(x)}{f'(x)}
\]
Show that if $x_0$ is close enough to $x^*$, then there is a constant $C$ such that for all $n \geq 1$,
\[
|x_n - x^*| \leq C |x_{n-1} - x^*|^2
\]

Solution. Let $M$ bound $|f''|$. Since $f'(x^*) \neq 0$, we can fix $\delta > 0$ and $L > 0$ using the continuity of $f'$ such that
\[
|x - x^*| < \delta \implies |f'(x)| > L
\]
Now for $x \in B(x^*, \delta)$ we have (since $f(x^*) = 0$)
\[
|F(x) - x^*| = \left| x - \frac{f(x)}{f'(x)} - x^* \right| = \left| (x - x^*) - \frac{f(x) - f(x^*)}{f'(x)} \right|
\]
By the Taylor expansion of $f$ around $x$ (evaluated at $x^*$), we have
\[
f(x^*) = f(x) + f'(x)(x^* - x) + R_1(x^*)
\]
so
\[
f(x^*) - f(x) = f'(x)(x^* - x) + R_1(x^*)
\]
Plug this in above to get
\[
|F(x) - x^*| = \left| (x - x^*) + \frac{f'(x)(x^* - x) + R_1(x^*)}{f'(x)} \right|
\]
Using the Lagrange form of the remainder,
\[
R_1(x^*) = \frac{f''(c)(x^* - x)^2}{2}
\]
for some $c$ between $x$ and $x^*$. Hence
\[
|F(x) - x^*| = \left| (x - x^*) + (x^* - x) + \frac{f''(c)(x^* - x)^2}{2 f'(x)} \right| \leq \frac{M}{2L} |x - x^*|^2
\]
Taking $C = M/2L$ completes the proof, with one subtlety; we need to make sure $F(x) \in (x^* - \delta, x^* + \delta)$ in order for the argument to apply to all $x_n$. To do so, take $x_0$ such that $|x_0 - x^*| < \min(\delta, 2L/M)$, so that $|F(x) - x^*| \leq C|x - x^*|^2 < |x - x^*|$ along the whole iteration.

Proposition 4.4.3 (W06.3). Let $f : [a, b] \to \mathbb{R}$ be twice continuously differentiable. Find a good error bound for the trapezoid approximation of $\int_a^b f(x) \, dx$, where the trapezoid approximation for $n = 1$ is:
\[
(b - a) \cdot \frac{f(b) + f(a)}{2}
\]

Solution. The trapezoid approximation is given by
\[
\int_a^b l(x) \, dx
\]
where
\[
l(x) = f(a) + (f(b) - f(a)) \cdot \frac{x - a}{b - a}
\]
So the error is given by
\[
\left| \int_a^b f(x) - l(x) \, dx \right|
\]
In Lecture 10, we got the following bound on $|f(x) - l(x)|$ using the higher order Rolle's theorem:
\[
f(x) - l(x) = \frac{(x - a)(b - x)}{2} \cdot f''(c)
\]
for some $c \in (a, b)$ (depending on $x$). Since $f''$ is continuous on $[a, b]$, let $M$ be a bound for $|f''|$. Then the error is bounded as follows:
\[
\left| \int_a^b f(x) - l(x) \, dx \right| \leq \int_a^b \left| \frac{(x - a)(b - x)}{2} \right| \cdot M \, dx \leq \int_a^b \frac{(b - a)(b - a)}{2} \cdot M \, dx = \frac{M (b - a)^3}{2}
\]
(Integrating exactly instead, $\int_a^b (x - a)(b - x) \, dx = (b - a)^3 / 6$, which sharpens the bound to $M (b - a)^3 / 12$.)

Theorem 4.4.4 (Fubini's Theorem for Sequences). If
\[
\sum_{n=1}^\infty \sum_{m=1}^\infty a_{n,m}
\]
converges absolutely, i.e.
\[
\sum_{n=1}^\infty \left( \sum_{m=1}^\infty |a_{n,m}| \right) < \infty
\]
then so does
\[
\sum_{m=1}^\infty \sum_{n=1}^\infty a_{n,m}
\]
and
\[
\sum_{n=1}^\infty \sum_{m=1}^\infty a_{n,m} = \sum_{m=1}^\infty \sum_{n=1}^\infty a_{n,m}
\]

Proof. Let
\[
s_n = \sum_{m=1}^\infty |a_{n,m}|
\]
By assumption, $\sum_{n=1}^\infty s_n < \infty$. Fix $\varepsilon > 0$. There exists $N \in \mathbb{N}$ such that
\[
\sum_{n=N+1}^\infty s_n < \frac{\varepsilon}{2}
\]
Also, for each $n$,
\[
\sum_{m=1}^\infty |a_{n,m}| < \infty
\]
Hence, we can find $M > N$ such that for all $n \leq N$,
\[
\sum_{m=M+1}^\infty |a_{n,m}| < \frac{\varepsilon}{2N}
\]
Then for all $l, k \geq M$,
\[
\left| \sum_{n=1}^k \sum_{m=1}^l a_{n,m} - \sum_{n=1}^M \sum_{m=1}^M a_{n,m} \right| \leq \sum_{n=N+1}^\infty s_n + \sum_{n=1}^N \sum_{m=M+1}^\infty |a_{n,m}| < \frac{\varepsilon}{2} + N \cdot \frac{\varepsilon}{2N} = \varepsilon
\]
since every term $a_{n,m}$ counted by one of the two double sums but not the other has either $n > N$ (these contribute at most $\sum_{n > N} s_n$ in absolute value), or else $n \leq N$ and $m > M$. Since finite rectangular sums do not depend on the order of summation, the same estimate shows that for large enough $M$ and $l, k \geq M$, both iterated partial sums
\[
\sum_{n=1}^k \sum_{m=1}^l a_{n,m}, \qquad \sum_{m=1}^l \sum_{n=1}^k a_{n,m}
\]
are within $\varepsilon$ of the square sum
\[
\sum_{n=1}^M \sum_{m=1}^M a_{n,m}
\]
So both iterated sums converge to
\[
\lim_{M \to \infty} \sum_{n=1}^M \sum_{m=1}^M a_{n,m}
\]
and in particular they are equal.

Multiplication of Power Series

Definition 4.4.5. Let $(c_n), (d_n)$ be two sequences of real numbers. We define the sequence $(e_n)$ by
\[
e_n = \sum_{j=0}^n c_j d_{n-j}
\]
The sequence $(e_n)$ is called the convolution of $(c_n)$ and $(d_n)$.


Theorem 4.4.6. Let $f, g : (a - r, a + r) \to \mathbb{R}$ be real analytic at $a$, with radii of convergence $\geq r$, given by power series
\[
\sum_{n=0}^\infty c_n (x - a)^n \quad \text{and} \quad \sum_{n=0}^\infty d_n (x - a)^n
\]
respectively. Then $fg$ is real analytic at $a$ with radius of convergence $\geq r$, with coefficients given by the convolution $(e_n)$ of $(c_n)$ and $(d_n)$.

Proof. Setting $d_m = 0$ for all $m < 0$, and substituting $k = n + m$, we have
\[
\left( \sum_{n=0}^\infty c_n (x - a)^n \right) \left( \sum_{m=0}^\infty d_m (x - a)^m \right) = \sum_{n=0}^\infty \sum_{m=0}^\infty c_n (x - a)^n d_m (x - a)^m
\]
\[
= \sum_{n=0}^\infty \sum_{k=0}^\infty c_n (x - a)^n d_{k-n} (x - a)^{k-n} = \sum_{n=0}^\infty \sum_{k=0}^\infty c_n d_{k-n} (x - a)^k
\]
\[
= \sum_{k=0}^\infty \sum_{n=0}^\infty c_n d_{k-n} (x - a)^k = \sum_{k=0}^\infty \sum_{n=0}^k c_n d_{k-n} (x - a)^k \qquad (\text{since } d_m = 0 \text{ for all } m < 0)
\]
\[
= \sum_{k=0}^\infty e_k (x - a)^k
\]
where interchanging the order of the sums in the third-to-last step is justified by Fubini's theorem for sequences, which applies since both series converge absolutely for $|x - a| < r$.
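The Cauchy product is easy to check numerically for a series whose product we know in closed form: convolving the coefficients of $\exp(x)$ with themselves should give the coefficients of $\exp(x)^2 = \exp(2x)$, i.e. $e_n = 2^n/n!$ by the binomial theorem. A sketch (the truncation level is illustrative):

```python
# Cauchy product check: the convolution e_n = sum_j c_j d_{n-j} of the exp coefficients
# with themselves reproduces the coefficients of exp(2x), i.e. e_n = 2^n / n!.
import math

N = 20
c = [1.0 / math.factorial(n) for n in range(N)]   # coefficients of exp

e = [sum(c[j] * c[n - j] for j in range(n + 1)) for n in range(N)]
for n in range(N):
    assert abs(e[n] - 2.0 ** n / math.factorial(n)) < 1e-12

# Consistency at a point well inside the (infinite) radius of convergence:
x = 0.7
assert abs(sum(e[n] * x ** n for n in range(N)) - math.exp(2 * x)) < 1e-9
```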

Corollary 4.4.7. More generally, if $\sum c_n x^n$ and $\sum d_n y^n$ converge absolutely at $x, y \in \mathbb{R}$, then
\[
\sum_{n=0}^\infty \left( \sum_{j=0}^n c_j d_{n-j} x^j y^{n-j} \right)
\]
converges, to
\[
\left( \sum_{n=0}^\infty c_n x^n \right) \left( \sum_{n=0}^\infty d_n y^n \right)
\]

Definition 4.4.8. Define exp: R→ R by

exp(x) =∞∑k=0

xk

k!

Theorem 4.4.9. (1) exp(x) converges absolutely for all x ∈ R

(2) exp is differentiable, and exp′(x) = exp(x)

(3) exp is continuous, hence Riemann integrable, and∫ b

a

exp(x)dx = exp(b)− exp(a)

(4) exp(x+ y) = exp(x) · exp(y)

(5) exp(0) = 1, exp(−x) = 1/ exp(x)

(6) exp is strictly increasing

Proof. (1) Note

lim supn→∞

1

n!= 0

so the radius of convergence is infinite.

(2) By our earlier theorem, exp(x) is differentiable, and its derivative is given by term byterm differntiation of the series. One can easily show by induction that the coefficientsof exp′(x) are exactly that of exp(x), whence the two series are equal.

(3) Since exp is differentiable, it is clearly continuous. One can show that

∫_a^b exp(x) dx = exp(b) − exp(a)

either by using term-by-term integration (which is possible by a previous theorem), or by using the fundamental theorem of calculus and applying (2).

(4) Note

exp(x) exp(y) = (∑_{k=0}^∞ x^k/k!)(∑_{k=0}^∞ y^k/k!)
  = ∑_{n=0}^∞ ∑_{j=0}^n (x^j/j!) · (y^{n−j}/(n − j)!)
  = ∑_{n=0}^∞ (1/n!) ∑_{j=0}^n (n!/(j!(n − j)!)) x^j y^{n−j}
  = ∑_{n=0}^∞ (1/n!)(x + y)^n
  = exp(x + y)

where the second equality is Corollary 4.4.7 and the fourth is the binomial theorem.
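The functional equation can be sanity-checked numerically with truncated partial sums of the defining series (a sketch, not part of the notes; the helper `exp_series` and the truncation level are our choices):

```python
# Verify exp(x + y) = exp(x) * exp(y) using truncated partial sums of the
# defining power series; the discrepancy is only truncation/roundoff error.
import math

def exp_series(x, n_terms=40):
    return sum(x**k / math.factorial(k) for k in range(n_terms))

x, y = 1.3, -0.7
lhs = exp_series(x + y)
rhs = exp_series(x) * exp_series(y)
print(abs(lhs - rhs))  # tiny
```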


(5) Using (4) with x = y = 0 gives

exp(0 + 0) = exp(0) = exp(0) · exp(0)

so exp(0) = 1. Then for any x ∈ R,

exp(x) · exp(−x) = exp(−x + x) = exp(0) = 1

so

exp(−x) = 1/exp(x)

(6) It is straightforward to show that exp(x) is strictly positive, so by (2), exp′(x) = exp(x) is strictly positive, and hence exp is strictly increasing.

Definition 4.4.10. Define

e = exp(1) = ∑_{n=0}^∞ 1/n!

Proposition 4.4.11. For every rational q, exp(q) = e^q.

Proof. An induction argument using (4) above shows the claim is true for q ∈ N, and (5) establishes the claim for q ∈ Z. Then using (4) once more establishes the claim for all rationals.

Definition 4.4.12. For every a > 0, define the function x ↦ a^x to be the unique continuous extension of the function q ↦ a^q for q ∈ Q.

Corollary 4.4.13. For all x ∈ R, exp(x) = e^x.

Proof. This is clear by the last two propositions, since exp is continuous.


5 Week 5

As per the syllabus, Week 5 topics include: Fubini theorem for sequences, multiplication of power series, the exponential and logarithm, sine and cosine, uniform approximation of periodic functions by trigonometric polynomials, multi-variable differentiation, the chain rule, partial derivatives, directional derivatives, differentiability of functions with continuous partial derivatives, inverse function theorem, implicit function theorem, Lagrange multipliers, integrals in several variables, change of variables, differentiation under the integral sign, integration over product of spaces and double integrals, Clairaut's theorem on equality of mixed partial derivatives, local minima, maxima, and saddle points in two variables, Taylor's formula with remainder for functions of several variables, connection to Newton's method in several variables, line integrals, Green's theorem, divergence theorem, Stokes theorem in R³.

5.1 Lecture 17 - Some Special Functions and Differentiation in Several Variables

Today, we will introduce the natural logarithm and the trigonometric functions sin and cos. We will prove several key results about these functions, and will introduce basic Fourier analysis. We will also discuss differentiation in several variables, prove the chain rule, and introduce directional derivatives.

Definition 5.1.1. ln : (0, ∞) → R is the inverse of exp. Note that ln exists and is continuous since exp is strictly monotone increasing, continuous, and onto (0, ∞).

Proposition 5.1.2. ln′(y) = 1/y.

Proof. By our earlier theorem on derivatives of inverses (see Lecture 9, Prop 3.1.8), ln is differentiable, and

ln′(y) = 1/exp′(x)

where y = exp(x), so

ln′(y) = 1/exp(x) = 1/y

Proposition 5.1.3. −∑_{n=1}^∞ x^n/n converges to ln(1 − x) on (−1, 1).

Proof. Note that the series expansion

∑_{n=0}^∞ t^n

converges absolutely to

f(t) = 1/(1 − t)

for all |t| < 1. By our earlier theorem on integration of power series, we have

∫_0^x 1/(1 − t) dt = ∑_{k=0}^∞ ∫_0^x t^k dt = ∑_{k=0}^∞ x^{k+1}/(k + 1) = ∑_{n=1}^∞ x^n/n


for all x ∈ (−1, 1). Additionally, by the fundamental theorem of calculus,

∫_0^x 1/(1 − t) dt = −ln(1 − t)|_0^x = −ln(1 − x)
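As a quick numerical sketch (not part of the notes), the partial sums of −∑ x^n/n match math.log(1 − x) inside the interval of convergence:

```python
# Compare a truncated partial sum of -sum_{n>=1} x^n / n with ln(1 - x)
# at a point |x| < 1; only truncation error remains.
import math

x = 0.5
partial = -sum(x**n / n for n in range(1, 60))
print(abs(partial - math.log(1 - x)))  # tiny
```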

Corollary 5.1.4. ln(1− x) is analytic on (−1, 1).

Example 5.1.5. Here is a nice application of Abel's theorem. In Lecture 5, we showed that

∑_{n=1}^∞ x^n/n

converges at x = −1. By Abel's theorem, it converges to −ln(1 − x) at x = −1, so

∑_{n=1}^∞ (−1)^n/n = −ln(2)
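A numerical sketch (not part of the notes): the alternating harmonic series does converge to −ln 2, but only at rate O(1/N), since Abel's theorem guarantees convergence at the boundary point, not a rate.

```python
# Partial sums of sum_{n>=1} (-1)^n / n approach -ln 2, slowly (error ~ 1/(2N)).
import math

N = 100_000
s = sum((-1)**n / n for n in range(1, N + 1))
print(s, -math.log(2))
```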

Proposition 5.1.6. Let f : (a − r, a + r) → (b − s, b + s) and g : (b − s, b + s) → R be analytic at a, b with radii of convergence ≥ r, s > 0 respectively. Then g ∘ f is analytic at a, with radius of convergence ≥ r.

Proof. Here's a sketch. Plug the power series of f into the power series of g, and use multiplication of power series (i.e., convolution and Fubini) together with absolute convergence.

Example 5.1.7 (The Trigonometric Functions).

Definition 5.1.8. For α ∈ R, define sin α, cos α as follows. Trace the path of length α on the unit circle counterclockwise, starting at (1, 0). Let (x, y) be the endpoint of the path. Then sin α = y, cos α = x. α is also the angle in the resulting right triangle with horizontal side x, vertical side y, and hypotenuse 1.

Proposition 5.1.9. sin, cos are periodic, with period 2π.

Proposition 5.1.10. In the triangle with horizontal side x, vertical side y, and hypotenuse r, where the angle between the horizontal side and the hypotenuse is α, we have x = r cos α and y = r sin α.

Proposition 5.1.11. sin and cos are continuous. In fact, they are Lipschitz continuous with constant 1 on sufficiently small neighborhoods.

Proof. We give the proof for sin; the proof for cos is very similar. Let h be a small perturbation of the angle α. We want to put bounds on the quantity sin(α + h) − sin(α). Let u be the length of the chord connecting the points on the unit circle determined by the endpoints of the path of length α and the path of length α + h. Clearly, u < h, since h is the length of the arc connecting the two points, which must be greater than the length of the straight line connecting the two points, i.e. u. If θ is the angle between u and the vertical line extending from the endpoint of the path of length α + h, then it is clear that

sin(α + h) − sin(α) = u cos θ ≤ u < h = 1 · h


Proposition 5.1.12. sin and cos are differentiable, and sin′ α = cosα, cos′ α = − sinα.

Proof. We give the proof only for sin. Let u, h, α, and θ be as they were in the previous proof. Then note

α + π/2 − θ + h/2 = π/2

so θ = α + h/2. So

(sin(α + h) − sin(α))/h = (u cos θ)/h = (u/h) cos(α + h/2)

As h → 0, u/h → 1, and by the continuity of cos, cos(α + h/2) → cos(α), so

(sin(α + h) − sin(α))/h → cos(α)
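A numerical sketch of this limit (not part of the notes): the difference quotient for sin approaches cos as the perturbation h shrinks.

```python
# The difference quotient (sin(a + h) - sin(a)) / h approaches cos(a),
# with error shrinking roughly linearly in h.
import math

a = 0.8
errors = [abs((math.sin(a + h) - math.sin(a)) / h - math.cos(a))
          for h in (1e-2, 1e-4, 1e-6)]
print(errors)  # decreasing toward 0
```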

Corollary 5.1.13. sin, cos are smooth.

Proof. This is an easy induction using the above proposition.

Proposition 5.1.14. The Taylor expansions of sin, cos around 0 converge to sin, cos on (−∞, ∞).

Proof. We use the Lagrange remainder. As cos and sin are smooth, the remainder for the n-term expansion is

|R_n(x)| = |f^(n+1)(c)/(n + 1)! · x^{n+1}|

for some c between 0 and x. Then as f^(n+1) = ±sin or ±cos, we have

|R_n(x)| ≤ |x|^{n+1}/(n + 1)!

which converges to 0 as n → ∞ for each fixed x.

Corollary 5.1.15.

sin x = x − x³/3! + x⁵/5! − · · ·

cos x = 1 − x²/2! + x⁴/4! − · · ·

Definition 5.1.16. The trigonometric polynomials are the functions obtained as linear combinations of sin(nx), cos(nx) for n = 0, 1, 2, . . .. Note all trigonometric polynomials are periodic with period 2π. We can then view them as functions on the interval [0, 2π] where we identify 0 and 2π, or equivalently, as functions on the unit circle (denoted R/2πZ).

Proposition 5.1.17. The trigonometric polynomials form an algebra on R/2πZ.


Proof. By definition, the trigonometric polynomials are closed under addition and scalar multiplication, so we only need to show closure under products. It is sufficient to show that for all m, n ∈ N,

sin(nx) cos(mx), sin(nx) sin(mx), cos(nx) cos(mx)

are trigonometric polynomials. Using the following trigonometric identities,

sin A sin B = (1/2)[cos(A − B) − cos(A + B)]

cos A cos B = (1/2)[cos(A − B) + cos(A + B)]

sin A cos B = (1/2)[sin(A − B) + sin(A + B)]

we note

sin(nx) sin(mx) = (1/2)[cos((n − m)x) − cos((n + m)x)]

cos(nx) cos(mx) = (1/2)[cos((n − m)x) + cos((n + m)x)]

sin(nx) cos(mx) = (1/2)[sin((n − m)x) + sin((n + m)x)]

which completes the proof.
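The product-to-sum identities above can be spot-checked numerically (a sketch, not part of the notes; the sample points are arbitrary):

```python
# Spot-check sin(nx)sin(mx) = (1/2)[cos((n-m)x) - cos((n+m)x)] at a few points.
import math

def max_identity_error(samples):
    err = 0.0
    for n, m, x in samples:
        lhs = math.sin(n * x) * math.sin(m * x)
        rhs = 0.5 * (math.cos((n - m) * x) - math.cos((n + m) * x))
        err = max(err, abs(lhs - rhs))
    return err

err = max_identity_error([(2, 3, 0.7), (5, 1, 2.1), (4, 4, -1.3)])
print(err)  # at floating-point roundoff level
```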

Proposition 5.1.18. The algebra of trigonometric polynomials is unital and separates points.

Proof. The constant function 1 is cos(0x), hence is a trigonometric polynomial. One can check that for any x ≠ y ∈ [0, 2π), either sin x ≠ sin y or cos x ≠ cos y.

Corollary 5.1.19. Any continuous function on R/2πZ (or equivalently, any continuous function on R with period 2π) is the uniform limit of trigonometric polynomials.

Proof. Immediate by Stone-Weierstrass.

Note that Tao's proof of Stone-Weierstrass also gives formulas for the coefficients of the approximating trigonometric polynomials, using convolutions. The coefficients obtained are the start of Fourier analysis.

Multivariable Calculus (Differentiation and Integration)

5.2 Lecture 18 - Inverse Function Theorem, Implicit Function Theorem and Lagrange Multipliers

Today, we will prove two essential results of multivariable calculus, namely the Inverse Function Theorem and the Implicit Function Theorem. We will then use the Implicit Function Theorem to rigorously prove the Lagrange multipliers method for finding extrema constrained to a particular surface.


5.3 Lecture 19 - Multivariable Integration and Vector Calculus

Today, we will introduce Riemann integration in several variables, and we will prove Fubini's theorem, which allows one to interchange the order of integration under the appropriate circumstances. We will also prove the standard results of vector calculus, namely Green's Theorem, Stokes' Theorem, and the Divergence Theorem.
