
LMU München, lerdos/WS07/FA/leb.pdf


Measures and Integration

László Erdős

Nov 9, 2007

Based upon the poll in class (and the required prerequisite for the course, Analysis III), I assume that everybody is familiar with general measure theory and Lebesgue integration.

The beginning of this note (Sections 1 and 2) is meant to remind you what concepts this involves. If something is unknown, look it up. I provide a good summary of basic concepts (without proofs) by Marcel Griesemer. This file actually contains a bit more than we need; see the list below. Another summary can be found e.g. in Appendix A of Werner: Funktionalanalysis. If you need to check more details (e.g. proofs), consult any analysis or measure theory book. Very good books are H.L. Royden: Real Analysis or Walter Rudin: Principles of Mathematical Analysis. A very concise and sharp introduction is in E. Lieb, M. Loss: Analysis.

1 Need for Lebesgue Integral

One can justify the necessity of a more general integration (than Riemann) in many ways. From the functional analysis point of view there are two “natural” arguments.

Need for Lebesgue I.

Recall that we have equipped $C[0,1]$ with two different metrics (actually, norms) $d_1$ and $d_\infty$ (or $\|\cdot\|_1$ and $\|\cdot\|_\infty$ in terms of norms). This space was proven to be complete under the $d_\infty$ metric. It is clearly not complete under the $d_1$ metric: it is trivial to find a sequence of continuous functions $f_n$ and a discontinuous function $f : [0,1] \to \mathbb{R}$ such that

$$\int_0^1 |f_n - f| \to 0.$$

In particular, $f_n$ is Cauchy in $d_1$ (it even converges), but the limit is not in $C[0,1]$.

You may think there is no problem, since we know how to Riemann-integrate functions with jumps, e.g. piecewise continuous functions. So if $(C[0,1], d_1)$ is not complete, maybe $(PC[0,1], d_1)$ is. It is fairly easy to see that this is not the case:

Homework 1.1 Prove that (PC[0, 1], d1) is not complete.
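Returning to the incompleteness of $(C[0,1], d_1)$ claimed above, here is a minimal numerical sketch. The ramp functions `ramp(n)`, the jump function `step` and the midpoint-rule helper `d1` are illustrative choices of mine, not taken from these notes; any continuous approximation of a jump would serve equally well.

```python
def step(x):
    """Discontinuous limit: characteristic function of [1/2, 1]."""
    return 1.0 if x >= 0.5 else 0.0


def ramp(n):
    """Continuous f_n: 0 up to 1/2, linear on [1/2, 1/2 + 1/n], then 1."""
    return lambda x: min(1.0, max(0.0, n * (x - 0.5)))


def d1(f, g, cells=20_000):
    """Midpoint-rule approximation of the d_1 distance on [0, 1]."""
    h = 1.0 / cells
    return sum(abs(f((k + 0.5) * h) - g((k + 0.5) * h)) for k in range(cells)) * h


for n in (2, 10, 100):
    # The exact distance is 1/(2n) for n >= 2, so it tends to zero.
    print(n, d1(ramp(n), step))
```

The distances shrink like $1/(2n)$, so the continuous ramps converge in $d_1$ to a function with a jump, which is not in $C[0,1]$.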


We know that $PC[0,1]$ is not the biggest class of functions that are Riemann integrable. In fact, Riemann integrability allows infinitely many discontinuities, as long as the difference of the lower sums and upper sums converges to zero, i.e. the oscillation of the function is not too big. The basic theorem about Riemann integrability is the following:

Theorem 1.2 A function $f : [0,1] \to \mathbb{R}$ is Riemann integrable if and only if $f$ is bounded and it is continuous almost everywhere, i.e. the set of discontinuities is of (Lebesgue) measure zero.

Homework 1.3 Prove directly (without reference to the above characterization of Riemann integrability) that the set of Riemann integrable functions equipped with the $d_1$ metric is not complete. (Hint: take a Cantor set $C$ that has nonzero measure and consider its approximations $C_n$ that you obtain along the Cantor procedure after removing the $n$-th generation of intervals. Then take the characteristic functions of these sets.)

Need for Lebesgue II.

We have seen that pointwise convergence and continuity (say in $C[0,1]$) are not compatible without further assumptions. What about pointwise convergence and Riemann integration? Is

$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx$$

true? (In the sense that the pointwise limit of Riemann integrable functions is Riemann integrable and the limit of the integrals is the integral of the limit.) We know that without some further condition this cannot hold; just consider the sequence

$$f_n(x) = n\,\chi_{(0,1/n)}(x)$$

that clearly converges to $f \equiv 0$ pointwise, but $\int_0^1 f_n = 1$ for every $n$.
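The counterexample $f_n = n\,\chi_{(0,1/n)}$ can be checked numerically. The sketch below is only an illustration: the helper names `f` and `integral`, the sample point `x0` and the midpoint rule are my own choices.

```python
def f(n, x):
    """f_n(x) = n on the open interval (0, 1/n), and 0 elsewhere."""
    return float(n) if 0.0 < x < 1.0 / n else 0.0


def integral(n, cells=100_000):
    """Midpoint-rule approximation of the integral of f_n over [0, 1]."""
    h = 1.0 / cells
    return sum(f(n, (k + 0.5) * h) for k in range(cells)) * h


x0 = 0.3  # a fixed sample point in (0, 1)
# Pointwise values at x0 vanish as soon as 1/n <= x0 ...
print([f(n, x0) for n in (2, 3, 4, 10, 1000)])
# ... while the integrals stay close to 1.
print([integral(n) for n in (10, 100, 1000)])
```

For every fixed $x$ the values $f_n(x)$ are eventually zero, yet the integrals do not converge to the integral of the limit.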

Suppose we are willing to assume uniform boundedness (which is anyway reasonable in the realm of Riemann integrable functions and, a posteriori, we know from the dominated convergence theorem that some condition is necessary) in order to save the exchangeability of the limit and the integral. It still does not work; for example, consider the Dirichlet function

$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \cap [0,1] \\ 0 & \text{if } x \in [0,1] \setminus \mathbb{Q} \end{cases}$$

and its approximations

$$f_n(x) = \begin{cases} 1 & \text{if } x = \frac{p}{q},\ p, q \in \mathbb{Z},\ p \le q \le n \\ 0 & \text{otherwise.} \end{cases}$$

Clearly $f_n$ is Riemann integrable (since it is zero everywhere apart from finitely many points), while its pointwise limit, $f$, is not Riemann integrable (WHY?). Again, the problem is the big oscillation.

2 Concepts you should know

You should be familiar with the following concepts and theorems:

• σ-algebra (meaningful on any set X)

• Borel sets (meaningful on a topological space).

• Measures, outer measures. Measure spaces.

• Regular measures on topological spaces (approximability of measures of sets by open sets from outside and compact sets from inside).

• Lebesgue measure and its properties. Lebesgue measure is the unique measure on $\mathbb{R}^n$ that is invariant under Euclidean motions and assigns 1 to the unit cube.

• Lebesgue measurable sets. Zero measure sets. Concept of “almost everywhere”. Not all Lebesgue sets are Borel (this is not easy to prove).

• Counting measure on the measure space $(\mathbb{N}, P(\mathbb{N}), \mu)$, where $P(\mathbb{N})$ is the σ-algebra of all subsets and $\mu$ is the counting measure.

• (Borel-)measurable functions. This class is closed under arithmetic operations, compositions, lim inf and lim sup.

• Lebesgue integral. Integrable functions. Usual properties (linearity, monotonicity, $|\int f| \le \int |f|$).

• Lebesgue integral coincides with Riemann integral for Riemann integrable functions.

• Basic limit theorems: Monotone and dominated convergence, Fatou’s Lemma.

• Lebesgue integral of complex valued functions (infinite integral is not allowed, $\int |f| < \infty$ is required).

• σ-finite measure spaces.

• Product of two measure spaces (construction of the product σ-algebra and product measure). Fubini theorem (need non-negativity or integrability with respect to the product measure to interchange integrals).

We will use the notation

$$\int_\Omega f\,d\mu = \int_\Omega f(x)\,d\mu(x)$$

interchangeably for the Lebesgue integral. The second notation is favored if for some reason the integration variable needs to be spelled out explicitly (e.g. when we have a multiple integral).

If $\Omega \subset \mathbb{R}^d$, then we use

$$\int_\Omega f = \int_\Omega f(x)\,dx$$

where $dx$ stands for the Lebesgue measure. Unless we indicate otherwise, integration on subsets of $\mathbb{R}^d$ is always understood with respect to the Lebesgue measure.

3 Singular measures

This chapter usually belongs to measure theory, but I am not sure if the majority of you had it, so we review it. We first present examples on $\mathbb{R}$, then develop the general definitions.

Let $\alpha : \mathbb{R} \to \mathbb{R}$ be a monotonically increasing function. A monotone function may not be continuous, but its one-sided limits exist at every point; we introduce the notation

$$\alpha(a+0) := \lim_{x \to a+0} \alpha(x), \qquad \alpha(a-0) := \lim_{x \to a-0} \alpha(x).$$

We define a measure $\mu_\alpha$ by assigning

$$\mu_\alpha((a,b)) := \alpha(b-0) - \alpha(a+0)$$

to any open interval $(a,b)$. Since open intervals generate the σ-algebra of Borel sets, it is easy to see that the usual construction of the Lebesgue measure (using $\alpha(x) = x$) goes through for this more general case. The resulting measure is called the Lebesgue-Stieltjes measure. With respect to this measure we can integrate; the corresponding integral is sometimes denoted as

$$\int f\,d\mu_\alpha = \int f\,d\alpha$$

(the right hand side is only a notation). This is called the Lebesgue-Stieltjes integral.

Examples:


(i) As mentioned, $\alpha(x) = x$ gives back the Lebesgue integral. A bit more generally, if $\alpha \in C^1$ (continuously differentiable), then it is easy to check that

$$\int f\,d\alpha = \int f(x)\,\alpha'(x)\,dx,$$

i.e. in this case the Lebesgue-Stieltjes integral can be expressed as a Lebesgue integral with the weight function $\alpha'$.
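The identity in example (i) can be illustrated numerically. The sketch below assumes a continuous integrand on a bounded interval, for which the Lebesgue-Stieltjes integral can be approximated by Riemann-Stieltjes sums; the helper names and the test choice $f(x) = x$, $\alpha(x) = x^2$ on $[0,1]$ are mine, not from the notes.

```python
def stieltjes_sum(f, alpha, a, b, cells=10_000):
    """Riemann-Stieltjes sum: f(midpoint) * (alpha(right) - alpha(left))."""
    h = (b - a) / cells
    total = 0.0
    for k in range(cells):
        left, right = a + k * h, a + (k + 1) * h
        total += f(0.5 * (left + right)) * (alpha(right) - alpha(left))
    return total


def weighted_integral(f, dalpha, a, b, cells=10_000):
    """Midpoint rule for the integral of f(x) * alpha'(x) over [a, b]."""
    h = (b - a) / cells
    mids = (a + (k + 0.5) * h for k in range(cells))
    return sum(f(x) * dalpha(x) for x in mids) * h


lhs = stieltjes_sum(lambda x: x, lambda x: x * x, 0.0, 1.0)
rhs = weighted_integral(lambda x: x, lambda x: 2.0 * x, 0.0, 1.0)
print(lhs, rhs)  # both approximate the integral of 2x^2 over [0, 1], i.e. 2/3
```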

(ii) Fix a number $p \in \mathbb{R}$. Let $\alpha(x) := \chi(x \ge p)$ be the characteristic function of the half-line $[p,\infty)$. CHECK that

$$\int f\,d\alpha = f(p)$$

for any function $f$. In particular, any function is integrable and the integral depends only on the value of $f$ at the point $p$. The corresponding $L^1$ space is simply

$$L^1(\mathbb{R}, d\alpha) \cong \mathbb{C},$$

i.e. it is a one-dimensional vector space (CHECK!).

The generated Lebesgue-Stieltjes measure is called the Dirac delta measure at $p$ and it is denoted by $\delta_p$. In particular,

$$\delta_p(A) = \begin{cases} 1 & \text{if } p \in A \\ 0 & \text{if } p \notin A. \end{cases}$$

(iii) Let $f \ge 0$ be a measurable function on $\mathbb{R}$ with finite total integral. Let

$$\alpha(x) := \int_{-\infty}^{x} f(s)\,ds.$$

Then

$$\mu_\alpha(A) = \int_A f(x)\,dx.$$

(iv) A considerably more interesting example is the following function. Consider the standard Cantor set, i.e.

$$C := [0,1] \setminus \Big( \big(\tfrac13,\tfrac23\big) \cup \big(\tfrac19,\tfrac29\big) \cup \big(\tfrac79,\tfrac89\big) \cup \big(\tfrac1{27},\tfrac2{27}\big) \cup \dots \Big).$$

Recall that the Cantor set is a compact, uncountable set. It is easy to see that the Lebesgue measure of $C$ is zero. Define an increasing function $\alpha$ on $[0,1]$ as follows: $\alpha$ will be constant on each of the sets removed in the definition of $C$, more precisely

$$\alpha(x) := \begin{cases} 1/2 & x \in (1/3, 2/3) \\ 1/4 & x \in (1/9, 2/9) \\ 3/4 & x \in (7/9, 8/9) \\ 1/8 & x \in (1/27, 2/27) \\ 3/8 & x \in (7/27, 8/27) \\ 5/8 & x \in (19/27, 20/27) \\ 7/8 & x \in (25/27, 26/27) \\ \text{etc.} \end{cases}$$

Make a picture to see the successive definition of $\alpha$ on the complement of the Cantor set.

With these formulas we have not yet defined $\alpha$ on $C$.
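As an aid for drawing the picture suggested above, the table of values can be generated mechanically. This is only a sketch: the function name `staircase_on_gap` and the cutoff `max_depth` are hypothetical, and the code evaluates $\alpha$ only on the removed intervals, returning `None` for points that are not found in a removed interval within `max_depth` generations (in particular for points of $C$).

```python
from fractions import Fraction


def staircase_on_gap(x, max_depth=40):
    """Value of alpha at x if x lies in a removed open interval, else None."""
    x = Fraction(x)
    lo, hi = Fraction(0), Fraction(1)    # current interval of the construction
    vlo, vhi = Fraction(0), Fraction(1)  # values of alpha at its two ends
    for _ in range(max_depth):
        third = (hi - lo) / 3
        a, b = lo + third, hi - third    # removed open middle third (a, b)
        mid = (vlo + vhi) / 2            # alpha is constant = mid on (a, b)
        if a < x < b:
            return mid
        if x <= a:                       # continue in the left remaining piece
            hi, vhi = a, mid
        else:                            # continue in the right remaining piece
            lo, vlo = b, mid
    return None


for x in (Fraction(1, 2), Fraction(1, 6), Fraction(5, 6), Fraction(13, 18)):
    print(x, staircase_on_gap(x))
```

Exact rational arithmetic is used so that the endpoints of the removed intervals are compared without rounding errors.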

Homework 3.1 (a) Show that the function $\alpha$ defined on $[0,1] \setminus C$ above can be uniquely extended to $[0,1]$ while keeping monotonicity. This is called the Devil’s staircase.

(b) Show that the extension is continuous.

(c) Let $\mu_\alpha$ be the corresponding Lebesgue-Stieltjes measure. Show that $\mu_\alpha(\{p\}) = 0$ for any point $p \in [0,1]$.

(d) Show that $\mu_\alpha$ is supported on a set of Lebesgue measure zero. (Recall that the support (Träger) of a measure $\mu$ is the smallest closed set $K$ such that for any proper closed subset $H$ we have $\mu(K \setminus H) > 0$.)

(e) Show that $\alpha$ is almost everywhere differentiable in $[0,1]$ but the fundamental theorem of calculus does not hold, e.g.

$$\alpha(1) - \alpha(0) \ne \int_0^1 \alpha'(x)\,dx.$$

Homework 3.2 Let $\mu_\alpha$ be the Lebesgue-Stieltjes measure constructed in the previous Homework. Compute

$$\text{(a)} \ \int_0^1 x\,d\mu_\alpha(x), \qquad \text{and} \qquad \text{(b)} \ \int_0^1 x^2\,d\mu_\alpha(x).$$

[Hint: use the hierarchical structure of $C$]


This example shows that without the fundamental theorem of calculus it can be quite complicated to compute integrals. In this particular example the special structure of $C$ and $\alpha$ helped. If one constructs a less symmetric Cantor set or defines $\alpha$ differently, it may be very complicated to compute the integral.

The last three examples were prototypes of a certain classification of measures according to their singularity structure. The Dirac delta measure is so singular that it assigns a nonzero value to a set consisting of a single point, namely $\delta_p(\{p\}) = 1$. The measure $\mu_\alpha$ obtained from the “Devil’s staircase”, example (iv), is less singular, since it assigns zero measure to every point, but it is still supported on a small set (measured with respect to the usual Lebesgue measure). Finally, example (iii) shows a non-singular measure in the sense that $\mu_\alpha(A) = 0$ for any set $A$ of zero Lebesgue measure.

We give the precise definitions of these classes.

Definition 3.3 Let $\mu$ and $\nu$ be two measures defined on a fixed σ-algebra on a space $X$. Then

(a) $\nu$ is absolutely continuous (absolutstetig) with respect to $\mu$ if $\nu(A) = 0$ whenever $\mu(A) = 0$. Notation: $\nu \ll \mu$;

(b) $\mu$ and $\nu$ are mutually singular if there is a measurable set $A$ such that $\mu(A) = 0$ and $\nu(X \setminus A) = 0$. Notation: $\mu \perp \nu$.

Example (iii) is a measure that is absolutely continuous with respect to the Lebesgue measure, while examples (ii) and (iv) are both mutually singular with the Lebesgue measure (and with each other as well).

It is clear that example (iii) is an absolutely continuous measure. It is less clear that essentially every absolutely continuous measure is the result of an integration. This is the content of the important Radon-Nikodym theorem, whose proof we postpone:

Theorem 3.4 (Radon-Nikodym) Let $\mu$ and $\nu$ be two measures on a common σ-algebra on $X$ and let $\mu$ be σ-finite. Then $\nu \ll \mu$ if and only if there exists a measurable function $f : X \to \mathbb{R}_+$ (infinity is allowed) such that

$$\nu(A) = \int_A f(x)\,d\mu(x)$$

for any $A$ in the σ-algebra. This function is $\mu$-a.e. (almost everywhere) unique. Notation: $f = \frac{d\nu}{d\mu}$ (this is only a formal fraction!)

Moreover, we also have the following decomposition, which we mention without proof:

Theorem 3.5 (Lebesgue decomposition I.) Let $\mu$ and $\nu$ be two σ-finite measures on a common σ-algebra. Then $\nu$ can be uniquely decomposed as

$$\nu = \nu_{ac} + \nu_{sing}$$

where $\nu_{ac} \ll \mu$ and $\nu_{sing} \perp \mu$.

The singular part can be further decomposed under a mild countability condition on the number of points that have positive measure:

Definition 3.6 Let $(X, \mathcal{B}, \mu)$ be a measure space such that for every point $x \in X$, the set $\{x\}$ belongs to $\mathcal{B}$, and let

$$P := \{ x \in X : \mu(\{x\}) \ne 0 \}.$$

The set $P$ is called the pure points or atomic points of the measure $\mu$. Assume that $P$ is a countable set. Then the measure

$$\mu_{pp}(A) := \mu(A \cap P) = \sum_{x \in A \cap P} \mu(\{x\})$$

is well-defined and it is called the pure point or atomic component of $\mu$. A measure $\mu$ is called a pure point measure if $\mu = \mu_{pp}$. A measure $\mu$ is called continuous if $\mu_{pp} = 0$.

Given another measure $\nu$ on the same σ-algebra, the measure $\mu$ is singular continuous with respect to $\nu$ if $\mu$ is continuous and $\mu \perp \nu$.

The Dirac delta measure from example (ii) is an atomic measure; examples (iii) and (iv) are continuous measures. Example (iii) is a measure that is absolutely continuous with respect to the Lebesgue measure, while (iv) and the Lebesgue measure are mutually singular. The measure in (iv) is thus a singular continuous measure with respect to the Lebesgue measure.

The following theorem is a simple exercise from these definitions:

Theorem 3.7 Given two measures $\mu, \nu$ on the same σ-algebra that contains each $\{x\}$, assume that $\nu \perp \mu$ and assume that the set of atoms of $\nu$ is countable. Then the measure $\nu$ can be uniquely decomposed into $\nu = \nu_{pp} + \nu_{sc}$, where $\nu_{pp}$ is the pure point component of $\nu$ and $\nu_{sc}$ is a singular continuous measure that is also mutually singular to $\nu_{pp}$.

The most important application is the following version of these decomposition theorems, whose proof is a simple exercise from the definitions above.

Theorem 3.8 (Lebesgue decomposition II.) Let $\mu$ and $\nu$ be two σ-finite measures on a common σ-algebra that contains each $\{x\}$. (In particular there are at most countably many points with nonzero weight.) Then $\nu$ can be uniquely decomposed as

$$\nu = \nu_{ac} + \nu_{pp} + \nu_{sc}$$

where $\nu_{ac} \ll \mu$, $\nu_{sc} \perp \mu$ and $\nu_{pp}$ is the pure-point component of $\nu$.


4 Lp-spaces

The dominated convergence theorem resolved the “Need for Lebesgue II.” by demonstrating that pointwise limit and integration can be interchanged within the Lebesgue framework (assuming the existence of an integrable dominating function). What about “Need for Lebesgue I”?

It is clear that the formula

$$\|f\|_1 := \int_0^1 |f(x)|\,dx \tag{4.1}$$

extends the norm (metric) $d_1$ from $C[0,1]$ to all Lebesgue integrable functions on $[0,1]$, since Riemann and Lebesgue integrals coincide on continuous functions. In the Riesz-Fischer theorem below (Section 5) we will show that the space of Lebesgue integrable functions is actually complete, so it is one of the possible completions of $(C[0,1], d_1)$ (we do not know yet that it is the smallest possible; for that we will have to show that the continuous functions are dense in the set of Lebesgue integrable ones).

However, before we discuss this, we have to introduce the $L^p$ spaces. It would be tempting to equip the space of Lebesgue integrable functions with the norm given by (4.1). Unfortunately, this is not a norm, for a “stupid” reason: the Lebesgue integral is insensitive to changing the integrand on a set of measure zero. In particular, $\|f\|_1 = 0$ does not imply that $f(x) = 0$ for all $x$, only for almost all $x$.

The following idea circumvents this problem, and we discuss it in full generality. Let $(\Omega, \mathcal{B}, \mu)$ be a measure space (where $\Omega$ is the base set, $\mathcal{B}$ is a σ-algebra and $\mu$ is the measure). We consider an equivalence relation on the set of functions $f : \Omega \to \mathbb{C}$:

$$f \sim g \quad \text{iff} \quad f(x) = g(x) \text{ for } \mu\text{-almost all } x.$$

It needs a (trivial) proof that this is indeed an equivalence relation.

Suppose that $f$ is integrable; then obviously any function in its equivalence class is also integrable with the same integral. Therefore we consider the space

$$L^1(\Omega, \mathcal{B}, \mu) = L^1(\Omega, \mu) = L^1(\Omega) = L^1 := \{\text{integrable functions}\} / \sim$$

i.e. the integrable functions factorized with this equivalence relation (the various notations are all used in practice; in principle the concept of $L^1$ depends on the space, the measure and the σ-algebra, but in most cases it is clear from the context which σ-algebra and measure we consider, so we omit them from the notation). It is easy to see that the usual vector space operations extend to the factor space. Moreover, the integration naturally extends to $L^1(\Omega, \mu)$.

The only thing to keep in mind is that notationally we still keep denoting elements of $L^1(\Omega, \mu)$ by $f(x)$, even though $f(x)$ does not make sense for a fixed $x$ for a general $L^1$ function (for continuous functions it is of course meaningful).


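Definition 4.1 Let $(\Omega, \mathcal{B}, \mu)$ be a measure space and let $0 < p \le \infty$. We set

$$L^p(\Omega, \mu) := \Big\{ f : \Omega \to \mathbb{C} \text{ measurable} : \int_\Omega |f|^p\,d\mu < \infty \Big\} / \sim$$

for $p < \infty$ and

$$L^\infty(\Omega, \mu) := \big\{ f : \Omega \to \mathbb{C} \text{ measurable} : \operatorname{ess\,sup} |f| < \infty \big\} / \sim$$

where the essential supremum of a function is defined by

$$\operatorname{ess\,sup} |f| := \inf \{ K \in \mathbb{R} : |f(x)| \le K \text{ for almost all } x \}.$$

These spaces are called $L^p$-spaces or Lebesgue spaces.

Note that every element of a Lebesgue space is actually an equivalence class of functions. But this fact is usually omitted from the notation.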
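The difference between the supremum and the essential supremum is easiest to see on a toy measure space. The sketch below assumes a finite set with point masses (`weights`), which is my own simplification and not a setting used in the notes; the helper name `ess_sup` is likewise hypothetical.

```python
def ess_sup(values, weights):
    """Essential supremum of |f| on a finite set with point masses `weights`.

    Points of weight zero form a null set, so they are ignored.
    Assumes that at least one point has positive weight.
    """
    return max(abs(v) for v, w in zip(values, weights) if w > 0)


values = [1.0, -100.0, 3.0]
weights = [0.5, 0.0, 0.5]            # the middle point has measure zero
print(max(abs(v) for v in values))   # ordinary supremum of |f|
print(ess_sup(values, weights))      # essential supremum of |f|
```

Changing `values[1]` changes the function only on a null set, so it does not change the equivalence class nor the essential supremum.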

Homework 4.2 Prove that $L^p(\Omega, \mu)$ is a vector space for any $p > 0$.

Definition 4.3 For $f \in L^p$ we define

$$\|f\|_p := \Big( \int_\Omega |f|^p\,d\mu \Big)^{1/p}$$

if $p < \infty$ and

$$\|f\|_\infty := \operatorname{ess\,sup} |f|$$

if $p = \infty$.

These formulas do not define a norm if $0 < p < 1$ (the triangle inequality is not satisfied) but they do define a norm for $1 \le p \le \infty$. For the proof, one needs the Minkowski inequality (Theorem 6.5)

$$\|f + g\|_p \le \|f\|_p + \|g\|_p$$

that is exactly the triangle inequality for $\|\cdot\|_p$ (the other two properties of the norm are trivially satisfied). From now on we will always assume that $1 \le p \le \infty$ whenever we talk about $L^p$ spaces.

These norms naturally define the concept of $L^p$ convergence of functions:

Definition 4.4 A sequence of functions $f_n \in L^p$ converges to $f \in L^p$ in $L^p$-sense or in $L^p$-norm if $\|f_n - f\|_p \to 0$ as $n \to \infty$.


In the case of $L^p$ convergent sequences, we often say that $f_n$ converges strongly (stark), although this is a bit imprecise, since it does not specify the exponent $p$. We will see later that the term nevertheless serves to distinguish it from the concept of weak convergence.

These convergences naturally extend the $d_1$, $d_p$ and $d_\infty$ convergences on continuous functions we have studied earlier. Moreover, pointwise convergence also naturally extends to $L^p$ functions, but we must keep in mind the problem that everything is defined only almost everywhere.

Definition 4.5 (i) A sequence of measurable functions $f_n$ on a measure space $(\Omega, \mathcal{B}, \mu)$ converges to a measurable function $f$ almost everywhere (fast überall) if there exists a set $Z$ of measure zero, $\mu(Z) = 0$, such that

$$f_n(x) \to f(x) \qquad \forall x \notin Z.$$

(ii) A sequence of equivalence classes of measurable functions $f_n$ converges pointwise to an equivalence class of measurable functions $f$ if any sequence of representatives of the classes of $f_n$ converges almost everywhere to any representative of $f$.

It is an easy exercise to show that if the convergence holds for at least one sequence of representatives, then it holds for any sequence (of course the exceptional set $Z$ changes); in particular, part (ii) of the above definition is meaningful. Therefore one does not need to distinguish between almost everywhere pointwise convergence of equivalence classes and of their representatives. In the future, we will thus freely talk about, e.g., $L^p$ functions converging almost everywhere pointwise without ever mentioning the equivalence classes.

Homework 4.6 Give examples showing that pointwise convergence does not imply $L^p$ convergence and vice versa. Give also examples showing that convergence in $L^p$ does not in general imply convergence in $L^q$, $p \ne q$.

There is, however, one positive statement:

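Lemma 4.7 Suppose the total measure of the space is finite, $\mu(\Omega) < \infty$. Then $L^p$ convergence implies $L^q$ convergence whenever $q \le p$.

Proof. Use the Hölder inequality (we will prove it later, but I assume everybody has seen it) with the conjugate exponents $p/q$ and $p/(p-q)$:

$$\int_\Omega |f|^q\,d\mu = \int_\Omega |f|^q \cdot 1\,d\mu \le \Big( \int_\Omega \big(|f|^q\big)^{p/q}\,d\mu \Big)^{q/p} \Big( \int_\Omega 1^{p/(p-q)}\,d\mu \Big)^{(p-q)/p},$$

thus

$$\|f\|_q \le \|f\|_p\, \big(\mu(\Omega)\big)^{\frac1q - \frac1p}.$$

Applying this bound to $f_n - f$ in place of $f$ proves the lemma.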
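The bound obtained in the proof of Lemma 4.7 can be sanity-checked on a toy example. The sketch below assumes a finite measure space given by point masses `weights`; this discrete setting and the helper names `lp_norm` and `lemma_bound` are illustrative assumptions of mine.

```python
def lp_norm(values, weights, p):
    """(sum_i |f_i|^p * w_i)^(1/p) on a finite set with point masses w_i."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)


def lemma_bound(values, weights, p, q):
    """Right-hand side ||f||_p * mu(Omega)^(1/q - 1/p), for q <= p < infinity."""
    total_mass = sum(weights)
    return lp_norm(values, weights, p) * total_mass ** (1.0 / q - 1.0 / p)


f = [1.0, -2.0, 0.5, 4.0]
w = [0.2, 1.5, 3.0, 0.3]     # total mass 5, so the factor mu(Omega) matters
print(lp_norm(f, w, 1.5), "<=", lemma_bound(f, w, p=3.0, q=1.5))
```

Note that the factor $\mu(\Omega)^{1/q - 1/p}$ cannot be dropped unless $\mu(\Omega) \le 1$.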


5 Riesz-Fischer theorem

The following theorem presents the most important step towards proving that $L^1[0,1]$ is the completion of $C[0,1]$ equipped with the $d_1$ metric.

Theorem 5.1 (Riesz-Fischer) Let $(\Omega, \mathcal{B}, \mu)$ be an arbitrary measure space, let $1 \le p \le \infty$ and consider the space $L^p = L^p(\Omega, \mu)$.

(i) The space $L^p$, equipped with the norm $\|\cdot\|_p$, is complete, i.e. if $f_i \in L^p$ is Cauchy, then there is a function $f \in L^p$ such that $f_i \to f$ in $L^p$-sense.

(ii) If $f_i \to f$ in $L^p$, then there exist a subsequence $f_{i_k}$ and a function $F \in L^p$ such that $|f_{i_k}(x)| \le F(x)$ for all $k$ (almost everywhere in $x$) and $f_{i_k}$ converges to $f$ almost everywhere, as $k \to \infty$.

Proof. We will do the proof for $p < \infty$. The $p = \infty$ case requires a somewhat different treatment (since $L^\infty$ is defined differently) but it is simpler.

Step 1: Subsequential convergence is enough.

This is an important basic idea. We want to prove that a Cauchy sequence $f_i$ converges strongly. It turns out that it is sufficient to show that some subsequence converges strongly. This seems much weaker, but actually it is not. Suppose that $f_{i_k}$ is a strongly convergent subsequence, i.e. $f_{i_k} \to f$ (in $L^p$) as $k \to \infty$. But then

$$\|f_i - f\|_p \le \|f_i - f_{i_k}\|_p + \|f_{i_k} - f\|_p$$

and thus for any $\varepsilon > 0$ we can make the second term smaller than $\varepsilon/2$ by choosing $k$ sufficiently large, and then, by the Cauchy property, the first term is smaller than $\varepsilon/2$ if $i$ and $k$ are sufficiently large. Thus from subsequential strong convergence of a Cauchy sequence we concluded the strong convergence of the whole sequence.

Step 2. Selecting a subsequence.

To find a convergent subsequence we proceed successively. Pick $i_1$ such that

$$\|f_n - f_{i_1}\|_p \le \frac12 \qquad \forall n \ge i_1;$$

such $i_1$ exists by the Cauchy property. Now select $i_2 > i_1$ such that

$$\|f_n - f_{i_2}\|_p \le \frac14 \qquad \forall n \ge i_2,$$

and again by the Cauchy property such $i_2$ exists. Next we choose $i_3 > i_2$ such that

$$\|f_n - f_{i_3}\|_p \le \frac18 \qquad \forall n \ge i_3$$

etc.; in general we have $i_k > i_{k-1}$ with

$$\|f_n - f_{i_k}\|_p \le \frac{1}{2^k} \qquad \forall n \ge i_k.$$
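The selection procedure of Step 2 can be made concrete on one example. The sketch below uses an illustrative Cauchy sequence of my own choosing, $f_n = \chi_{[0,\,1-1/n]}$ in $L^1[0,1]$, for which $\|f_n - f_i\|_1 = 1/i - 1/n$ when $n \ge i$, so the supremum over $n \ge i$ is $1/i$; the helper names are hypothetical.

```python
from fractions import Fraction


def tail_distance(i):
    """sup over n >= i of ||f_n - f_i||_1 for the illustrative sequence
    f_n = characteristic function of [0, 1 - 1/n]; this supremum is 1/i."""
    return Fraction(1, i)


def select_indices(count):
    """Pick i_1 < i_2 < ... with tail_distance(i_k) <= 2**(-k)."""
    indices, i = [], 1
    for k in range(1, count + 1):
        while tail_distance(i) > Fraction(1, 2 ** k):
            i += 1
        indices.append(i)
        i += 1  # enforce the strict monotonicity i_{k+1} > i_k
    return indices


print(select_indices(6))
```

For this particular sequence the procedure picks the smallest admissible index at each stage; the proof only needs that some admissible index exists.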

Step 3. Telescopic sum

Now we define

$$F_\ell := |f_{i_1}| + \sum_{k=1}^{\ell-1} |f_{i_k} - f_{i_{k+1}}|.$$

By the Minkowski inequality

$$\|F_\ell\|_p \le \|f_{i_1}\|_p + \frac12 + \frac14 + \dots = \|f_{i_1}\|_p + 1,$$

and clearly $F_\ell$ is a monotone increasing sequence of functions. Let

$$F := \lim_{\ell} F_\ell$$

be the almost everywhere pointwise limit; then by the monotone convergence theorem and by the uniform bound on the $L^p$ norm of $F_\ell$, we have

$$\|F\|_p < \infty,$$

in particular, $F(x) < \infty$ almost everywhere.

Now use the telescopic sum

$$f_{i_k} = f_{i_1} + (f_{i_2} - f_{i_1}) + (f_{i_3} - f_{i_2}) + \dots + (f_{i_k} - f_{i_{k-1}}).$$

As $k \to \infty$ this is an absolutely convergent series for every $x$ such that $F(x) < \infty$; let $f(x)$ be its limit, thus

$$f_{i_k}(x) \to f(x), \qquad k \to \infty$$

for almost every $x$. Moreover, from the telescopic sum it also follows that

$$|f_{i_k}| \le F \in L^p$$

and thus by dominated convergence, we have

$$f \in L^p.$$

Using dominated convergence once more, for

$$|f_{i_k} - f| \le |f_{i_k}| + |f| \le F + |f| \in L^p$$

we also have

$$\|f_{i_k} - f\|_p \to 0, \qquad k \to \infty.$$

We have proved earlier that $(C[0,1], \|\cdot\|_\infty)$ is complete and now we have seen that $(L^\infty[0,1], \|\cdot\|_\infty)$ is also complete. However, for any $p < \infty$, the space $(C[0,1], \|\cdot\|_p)$ is not complete (EXAMPLE!) but $(L^p[0,1], \|\cdot\|_p)$ is complete. Actually it is the (smallest) completion of $(C[0,1], \|\cdot\|_p)$, as we will soon prove.

Remark 5.2 The $p = \infty$ case often behaves exceptionally. Many theorems about $L^p$ spaces hold only with the restriction $p < \infty$, and/or sometimes, by duality, $p > 1$ is necessary. Rule of thumb: whenever you use some theorem about $L^p$ spaces, watch out for the borderline cases $p = 1, \infty$ and make sure the theorem applies to them. The Riesz-Fischer theorem holds without restrictions, but many other theorems do not.

6 Inequalities

The primary tools in analysis are inequalities. Even though theorems in analysis are often formulated as limiting statements, the heart of the proof is almost always an inequality. Here we discuss a few basic inequalities involving integrals of functions. I assume that you have already seen Jensen’s, Hölder’s and Minkowski’s inequalities. I will not prove them in class, but I enclose their proofs; they are important, so if you forgot them, review them.

Theorem 6.1 (Jensen’s inequality) Let $J : \mathbb{R} \to \mathbb{R}$ be a convex function and let $(\Omega, \mu)$ be a measure space with finite total measure, i.e. $\mu(\Omega) < \infty$. Let $f \in L^1(\Omega, \mu)$ be a function and define its average as

$$\langle f \rangle := \frac{1}{\mu(\Omega)} \int_\Omega f\,d\mu.$$

Then

(i) $(J \circ f)_- \in L^1$ (here $a_- := \max\{0, -a\}$ is the negative part (Negativteil) of $a$); in particular, $\int J \circ f\,d\mu$ is well defined (maybe $+\infty$).

(ii) $\langle J \circ f \rangle \ge J(\langle f \rangle)$.

(iii) If $J$ is strictly convex at $\langle f \rangle$, then equality in (ii) holds iff $f = \langle f \rangle$.


Proof. By convexity, there exists a number $v$ such that

$$J(t) \ge J(\langle f \rangle) + v\,(t - \langle f \rangle) \tag{6.2}$$

holds for every $t \in \mathbb{R}$. (The graph of a convex function lies “above” every tangent line.) Plugging in $t = f(x)$, we have

$$J(f(x)) \ge J(\langle f \rangle) + v\,(f(x) - \langle f \rangle) \tag{6.3}$$

and thus

$$J(f(x))_- \le |J(\langle f \rangle)| + |v||f(x)| + |v||\langle f \rangle| \in L^1,$$

thus (i) is proven (we needed only an upper bound on $J(f(x))_-$ since it is always non-negative).

Integrating (6.3) over $\Omega$ with respect to $\mu$, then dividing by $\mu(\Omega)$, we get exactly (ii).

Finally, to prove (iii), it is clear that if $f$ is constant (almost everywhere), then this constant must be its average, $\langle f \rangle$, and (ii) holds with equality. If $f$ is not a constant, then $f(x) - \langle f \rangle$ takes on positive and negative values on sets of positive measure. Since $J$ is strictly convex, (6.2) is a strict inequality either for all $t > \langle f \rangle$ or for all $t < \langle f \rangle$. That means that inequality (6.3) is a strict inequality on a set of positive measure, thus after integration we get a strict inequality in (ii).

Remark 6.2 A measure space $(M, \mu)$ is called a probability space (Wahrscheinlichkeitsraum) if $\mu(M) = 1$. On a probability space, Jensen’s inequality simplifies a bit since there is no need for normalization with $\mu(\Omega)$. For example, from the convexity of the function

$$J(t) = t^p, \qquad t \ge 0$$

in case of $1 \le p < \infty$, it follows that on a probability space

$$\Big( \int |f|\,d\mu \Big)^p \le \int |f|^p\,d\mu. \tag{6.4}$$
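Inequality (6.4) can be checked on a toy probability space. The sketch below assumes a finite set with probabilities `weights` summing to 1; this discrete setting and the helper name `jensen_sides` are my own illustrative assumptions.

```python
def jensen_sides(values, weights, p):
    """Return (left, right) of (6.4) on a finite probability space.

    `weights` must be non-negative and sum to 1, and p >= 1.
    """
    left = sum(abs(v) * w for v, w in zip(values, weights)) ** p
    right = sum(abs(v) ** p * w for v, w in zip(values, weights))
    return left, right


weights = [0.2, 0.3, 0.5]
print(jensen_sides([0.5, 2.0, -3.0], weights, p=2.5))  # left <= right
print(jensen_sides([2.0, 2.0, 2.0], weights, p=2.5))   # constant f: equality
```

The second call illustrates part (iii) of Jensen’s inequality: for a constant function both sides coincide (up to rounding).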

Inequality (6.4) is a special case of the (probably) most important inequality in analysis:

Theorem 6.3 (Hölder’s inequality) Let $1 \le p, q \le \infty$ be conjugate exponents (konjugierte Exponenten), i.e. satisfy $\frac1p + \frac1q = 1$ (by convention, $1/\infty = 0$). Then for any two nonnegative functions $f, g \ge 0$ defined on a measure space $(\Omega, \mu)$ we have

$$\Big| \int_\Omega f g\,d\mu \Big| \le \|f\|_p \|g\|_q. \tag{6.5}$$

Furthermore, if the assumption $f, g \ge 0$ is dropped but we assume $f \in L^p$ and $g \in L^q$, then $fg \in L^1$ and (6.5) holds.

Finally, if $f \in L^p$, $g \in L^q$, then (6.5) holds with equality if and only if there exists $\lambda \in \mathbb{R}$ such that

(i) $g = \lambda |f|^{p-1}$ in case of $1 < p < \infty$;

(ii) in case of $p = 1$ we have $|g| \le \lambda$ (a.e.) and $|g| = \lambda$ on the set where $f(x) \ne 0$.

The case $p = \infty$ is the dual of (ii).

Hölder’s inequality is usually stated for two functions, but it is trivial to extend it to a product of many functions by induction:

$$\Big| \int_\Omega f_1 f_2 \cdots f_k\,d\mu \Big| \le \|f_1\|_{p_1} \|f_2\|_{p_2} \cdots \|f_k\|_{p_k} \tag{6.6}$$

whenever

$$\frac{1}{p_1} + \frac{1}{p_2} + \dots + \frac{1}{p_k} = 1.$$

Proof. I will just show the inequality; the cases of equality follow from these arguments (THINK IT OVER!). We also assume that $f \in L^p$ and $g \in L^q$, otherwise (6.5) holds trivially for $f, g \ge 0$. [Note that this statement is not true without the non-negativity assumption, since $\int fg\,d\mu$ may not be defined!]

First proof. The standard proof starts with observing that it is sufficient to prove the inequality if $\|f\|_p = \|g\|_q = 1$; otherwise one could redefine $f \to f/\|f\|_p$, $g \to g/\|g\|_q$ by the homogeneity of the norm. Then one uses the arithmetic inequality

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}, \qquad a, b \ge 0$$

(that can be proven by elementary calculus) and gets

$$\int_\Omega |f||g|\,d\mu \le \frac1p \int_\Omega |f|^p\,d\mu + \frac1q \int_\Omega |g|^q\,d\mu = \frac1p + \frac1q = 1,$$

and this was to be proven under the condition that $\|f\|_p = \|g\|_q = 1$.

Second proof. Again, we will prove only the $\|f\|_p = \|g\|_q = 1$ case and for simplicity we can clearly assume that $f, g \ge 0$ (replace $f \to |f|$ and $g \to |g|$). In this case, the measure $g(x)^q\,d\mu(x)$ is a probability measure and we write

$$\Big| \int fg\,d\mu \Big|^p = \Big| \int f g^{1-q}\, g^q\,d\mu \Big|^p \le \int \big| f g^{1-q} \big|^p g^q\,d\mu$$

by the probability space version of Jensen’s inequality (6.4). Thus

$$\Big| \int fg\,d\mu \Big|^p \le \int f^p g^{(1-q)p+q}\,d\mu = \int f^p\,d\mu = 1$$

since $p, q$ were conjugate exponents, thus $(1-q)p + q = 0$.
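Hölder’s inequality (6.5) and its equality case (i) can be checked on a toy example. The sketch below assumes a finite measure space with point masses `weights` and $1 < p < \infty$; the discrete setting and the helper names `lp_norm` and `holder_sides` are illustrative assumptions of mine.

```python
def lp_norm(values, weights, p):
    """(sum_i |f_i|^p * w_i)^(1/p) on a finite set with point masses w_i."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)


def holder_sides(f, g, weights, p):
    """Return (|integral of f*g|, ||f||_p * ||g||_q), q conjugate to p, 1 < p < inf."""
    q = p / (p - 1.0)
    left = abs(sum(a * b * w for a, b, w in zip(f, g, weights)))
    right = lp_norm(f, weights, p) * lp_norm(g, weights, q)
    return left, right


w = [0.5, 2.0, 1.0]
print(holder_sides([1.0, -2.0, 0.5], [3.0, 1.0, -4.0], w, p=3.0))  # left <= right

f = [1.0, 2.0, 3.0]
g = [x ** 2 for x in f]          # g = |f|^(p-1) with p = 3: the equality case (i)
print(holder_sides(f, g, w, p=3.0))
```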

The most commonly used case of Hölder’s inequality is the case $p = q = 2$, i.e. the Cauchy-Schwarz inequality

$$\Big| \int fg\,d\mu \Big| \le \|f\|_2 \|g\|_2. \tag{6.7}$$

Homework 6.4 Prove the following form of the Cauchy-Schwarz inequality. For any $\alpha > 0$

$$\Big| \int fg\,d\mu \Big| \le \frac12 \Big[ \alpha \|f\|_2^2 + \alpha^{-1} \|g\|_2^2 \Big].$$

This form is actually stronger; (6.7) follows from it easily (HOW?). In many cases it is useful to have the freedom of choosing the additional parameter $\alpha$ in the estimate. Keep this in mind!

Theorem 6.5 (Minkowski inequality) Let $1 \le p \le \infty$ and let $f, g$ be defined on a measure space $(\Omega, \mu)$. Then

$$\|f + g\|_p \le \|f\|_p + \|g\|_p. \tag{6.8}$$

If $f \ne 0$ and $1 < p < \infty$, then equality holds iff $g = \lambda f$ for some $\lambda \ge 0$. For the endpoint exponents, $p = 1$ or $p = \infty$, equality can hold in other cases as well.

The Minkowski inequality states the triangle inequality of the $L^p$ norm, as was mentioned earlier.

Proof. Again, the Minkowski inequality has many proofs; see e.g. a very general version of this inequality whose proof uses Fubini’s theorem in Lieb-Loss: Analysis, Section 2.4.

The most direct proof relies on convexity of the function $t \to t^p$ (we can assume $1 < p < \infty$; the $p = 1$ case is trivial, the $p = \infty$ case requires a different but equally trivial argument). We first note that $f, g \ge 0$ can be assumed (WHY?). Then we write

$$(f+g)^p = f(f+g)^{p-1} + g(f+g)^{p-1}$$

and apply Hölder’s inequality

$$\int f(f+g)^{p-1}\,d\mu \le \|f\|_p \Big( \int (f+g)^{(p-1)q}\,d\mu \Big)^{1/q} = \|f\|_p \Big( \int (f+g)^p\,d\mu \Big)^{1/q}$$

(since $(p-1)q = p$). Similarly

$$\int g(f+g)^{p-1}\,d\mu \le \|g\|_p \Big( \int (f+g)^{(p-1)q}\,d\mu \Big)^{1/q} = \|g\|_p \Big( \int (f+g)^p\,d\mu \Big)^{1/q}.$$

Thus

$$\int (f+g)^p\,d\mu \le \big( \|f\|_p + \|g\|_p \big) \Big( \int (f+g)^p\,d\mu \Big)^{1/q};$$

dividing through by the second factor and using that $1 - \frac1q = \frac1p$, we obtain (6.8). There is only one small thing to check: the last step of the argument would not have been correct if $\int (f+g)^p\,d\mu = \infty$. But by convexity of $t \to t^p$ ($t \ge 0$), we have

$$\Big( \frac{f+g}{2} \Big)^p \le \frac{f^p + g^p}{2}$$

and since the right hand side is integrable, so is the left hand side.
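The Minkowski inequality (6.8), and its failure for $0 < p < 1$ mentioned after Definition 4.3, can both be seen on small examples. The sketch below assumes a finite measure space with point masses `weights`; the discrete setting and the helper names are illustrative assumptions of mine.

```python
def lp_norm(values, weights, p):
    """(sum_i |f_i|^p * w_i)^(1/p) on a finite set with point masses w_i."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)


def minkowski_sides(f, g, weights, p):
    """Return (||f + g||_p, ||f||_p + ||g||_p)."""
    s = [a + b for a, b in zip(f, g)]
    return lp_norm(s, weights, p), lp_norm(f, weights, p) + lp_norm(g, weights, p)


# For p >= 1 the left value does not exceed the right value.
print(minkowski_sides([1.0, -2.0, 0.5], [3.0, 1.0, -4.0], [0.5, 2.0, 1.0], p=3.0))

# For p = 1/2 the triangle inequality fails: two disjointly supported "bumps".
print(minkowski_sides([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], p=0.5))
```

In the second example $\|f+g\|_{1/2} = (1+1)^2 = 4$ while $\|f\|_{1/2} + \|g\|_{1/2} = 2$.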

So far we worked on arbitrary measure spaces. The following inequality uses that theunderlying space has a vectorspace structure and the measure is translation invariant. Forsimplicity we state it only for Rd and the Lebesgue measure.

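Theorem 6.6 (Young's inequality) Let $1 \le p, q, r \le \infty$ be three exponents satisfying

$$\frac{1}{p} + \frac{1}{q} + \frac{1}{r} = 2 \qquad (6.9)$$

Then for any $f \in L^p(\mathbb{R}^d)$, $g \in L^q(\mathbb{R}^d)$, $h \in L^r(\mathbb{R}^d)$ it holds that

$$\left| \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(x) g(x - y) h(y) \, dx \, dy \right| \le \|f\|_p \|g\|_q \|h\|_r \qquad (6.10)$$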

Proof of Young's inequality. It is a smart way of applying Hölder's inequality. We can assume that $f, g, h \ge 0$. Let $p', q', r'$ be the dual exponents of $p, q, r$, i.e.

$$\frac{1}{p} + \frac{1}{p'} = \frac{1}{q} + \frac{1}{q'} = \frac{1}{r} + \frac{1}{r'} = 1 \qquad (6.11)$$

and note that (6.9) implies

$$\frac{1}{p'} + \frac{1}{q'} + \frac{1}{r'} = 1$$

Define

$$\alpha(x, y) := f(x)^{p/r'} g(x - y)^{q/r'}$$

$$\beta(x, y) := g(x - y)^{q/p'} h(y)^{r/p'}$$

$$\gamma(x, y) := f(x)^{p/q'} h(y)^{r/q'}$$

and notice that the integral in Young's inequality is exactly

$$I = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \alpha(x, y) \beta(x, y) \gamma(x, y) \, dx \, dy$$

by using (6.11). Now we can use the generalized Hölder's inequality (6.6) for three functions with exponents $p', q', r'$ on the measure space $(\mathbb{R}^d \times \mathbb{R}^d, dx \, dy)$ and conclude that

$$I \le \|\alpha\|_{r'} \|\beta\|_{p'} \|\gamma\|_{q'}$$

These norms can all be computed, e.g.

$$\|\alpha\|_{r'} = \left( \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(x)^p g(x - y)^q \, dx \, dy \right)^{1/r'} = \|f\|_p^{p/r'} \|g\|_q^{q/r'}$$

and similarly for the other two. Putting these together, we arrive at (6.10).
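A quick numerical illustration of (6.10) may help. Note the assumption: instead of $\mathbb{R}^d$ with Lebesgue measure, the sketch below works on the integers $\mathbb{Z}$ with counting measure, where the same Hölder argument goes through because the counting measure is also translation invariant. The function names and the random data are hypothetical.

```python
import random

def lp_norm(v, p):
    """l^p norm of a finite sequence, 1 <= p < infinity."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def young_lhs(f, g, h):
    """sum over x, y of f(x) g(x - y) h(y).

    f and h are lists indexed by 0..n-1; g is a dict mapping integer
    offsets x - y to values (missing offsets count as zero).
    """
    total = 0.0
    for x, fx in enumerate(f):
        for y, hy in enumerate(h):
            total += fx * g.get(x - y, 0.0) * hy
    return total

def young_rhs(f, g, h, p, q, r):
    """Product of norms ||f||_p ||g||_q ||h||_r on the right of (6.10)."""
    return lp_norm(f, p) * lp_norm(list(g.values()), q) * lp_norm(h, r)

random.seed(2)
n = 30
f = [random.random() for _ in range(n)]
h = [random.random() for _ in range(n)]
g = {k: random.random() for k in range(-(n - 1), n)}

for p, q, r in ((1.5, 1.5, 1.5), (1.0, 2.0, 2.0), (2.0, 1.0, 2.0)):
    assert abs(1 / p + 1 / q + 1 / r - 2) < 1e-12  # condition (6.9)
    print((p, q, r), young_lhs(f, g, h), young_rhs(f, g, h, p, q, r))
```

Each printed line shows the trilinear sum next to the product of the three norms for one admissible triple of exponents.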

One important application of Young's inequality is the honest definition of the convolution. Recall the definition:

Definition 6.7 The convolution (Faltung) of two functions $f, g$ on $\mathbb{R}^d$ is given by

$$(f * g)(x) := \int_{\mathbb{R}^d} f(y) g(x - y) \, dy$$

It is a nontrivial question whether the integral in this definition makes sense and, if it does, in which sense (for all $x$, or maybe only for almost all $x$?). If $f, g$ are "nice" functions (e.g. bounded and sufficiently decaying at infinity), then it is easy to see that the convolution integral always exists; moreover, by a change of variables,

$$f * g = g * f$$

If, however, $f, g$ are just in some Lebesgue spaces, then the integral may not exist. It is exactly Young's inequality that tells us under which conditions on the exponents one can define convolution on Lebesgue spaces.

Theorem 6.8 Let $1 \le \frac{1}{p} + \frac{1}{q} \le 2$. Let $f \in L^p(\mathbb{R}^d)$, $g \in L^q(\mathbb{R}^d)$; then $f * g$ is a function in $L^{r'}$, where $r'$ is the dual exponent to $r$ from Young's inequality, i.e.

$$1 + \frac{1}{r'} = \frac{1}{p} + \frac{1}{q}$$
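The exponent bookkeeping in Theorem 6.8 is easy to get wrong, so here is a tiny helper that solves $1 + \frac{1}{r'} = \frac{1}{p} + \frac{1}{q}$ for $r'$ in exact arithmetic. The function name `conv_exponent` is hypothetical, and returning `None` for $r' = \infty$ is just a convention of this sketch; infinite $p$ or $q$ are not handled.

```python
from fractions import Fraction

def conv_exponent(p, q):
    """Return r' with 1 + 1/r' = 1/p + 1/q, or None when r' is infinite.

    p and q are finite exponents >= 1 given as Fractions or ints; the
    hypothesis of Theorem 6.8 requires 1 <= 1/p + 1/q <= 2.
    """
    s = Fraction(1) / p + Fraction(1) / q
    if not (1 <= s <= 2):
        raise ValueError("need 1 <= 1/p + 1/q <= 2")
    inv_r_prime = s - 1
    return None if inv_r_prime == 0 else 1 / inv_r_prime

print(conv_exponent(Fraction(2), Fraction(1)))        # f in L^2, g in L^1
print(conv_exponent(Fraction(3, 2), Fraction(3, 2)))  # f, g in L^(3/2)
print(conv_exponent(Fraction(2), Fraction(2)))        # f, g in L^2
```

In particular, for $q = 1$ one gets $r' = p$, which is the special case proved in the notes.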


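Proof of the special case $q = 1$. We want to show that

$$\|f * g\|_p^p = \int \left| \int f(y) g(x - y) \, dy \right|^p dx \qquad (6.12)$$

is finite. It is clearly enough to assume that $f, g \ge 0$ (see the remark below). Write

$$fg = f g^{\frac{1}{p}} \cdot g^{1 - \frac{1}{p}} = f g^{\frac{1}{p}} \cdot g^{\frac{1}{r}}$$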

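(notice that $p, r$ are dual exponents) and use Hölder's inequality for the inner integral (with $p, r$ as exponents):

$$\|f * g\|_p^p \le \int \left( \int f(y)^p g(x - y) \, dy \right) \left( \int g(x - y) \, dy \right)^{\frac{p}{r}} dx$$

$$= \|g\|_1^{\frac{p}{r}} \iint f(y)^p g(x - y) \, dx \, dy = \|f\|_p^p \|g\|_1^{\frac{p}{r} + 1} = \left( \|f\|_p \|g\|_1 \right)^p$$

(since $p/r + 1 = p$), which proves the claim for the special case $q = 1$.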
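The bound $\|f * g\|_p \le \|f\|_p \|g\|_1$ can be checked on finite sequences. This is a sketch under my own assumption that we replace $\mathbb{R}^d$ by $\mathbb{Z}$ with counting measure, so the convolution integral becomes a finite sum; `convolve` and `lp_norm` are hypothetical helper names.

```python
import random

def lp_norm(v, p):
    """l^p norm of a finite sequence, 1 <= p < infinity."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

def convolve(f, g):
    """Full discrete convolution (f*g)(x) = sum_y f(y) g(x - y) of finite lists."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

random.seed(3)
f = [random.uniform(-1, 1) for _ in range(40)]
g = [random.uniform(-1, 1) for _ in range(25)]

for p in (1, 2, 3.5):
    print(p, lp_norm(convolve(f, g), p), lp_norm(f, p) * lp_norm(g, 1))
```

The same helper also makes the symmetry $f * g = g * f$ visible: `convolve(f, g)` and `convolve(g, f)` agree up to rounding.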

There are two related general remarks:

(1) Note that Fubini's theorem has been used, but for non-negative functions this is justified without any further assumptions.

(2) You may not like that, before we have proved that $f * g$ is actually in $L^p$ or even that it exists, we already computed its $L^p$ norm. However, none of these steps actually requires any of these integrals to be finite: this is a big advantage of Lebesgue integrals of nonnegative functions. Recall that, for example, Hölder's inequality was stated for any two nonnegative functions. To convince you that there is nothing fishy here, I show the absolutely correct argument once, but later similar arguments will not be spelled out.

We first consider nonnegative $f, g$; for these functions every step is well justified, even if some of the above integrals are infinite. A posteriori, we obtain from $\|f\|_p \|g\|_1 < \infty$ that every integral is finite. This does not mean that

$$\int f(y) g(x - y) \, dy$$

is finite for every $x$, but it means that this is an $L^p$ function in $x$ (in particular, it is finite for almost all $x$).

Now for arbitrary $f$ and $g$ we want to prove that

$$\int f(y) g(x - y) \, dy \qquad (6.13)$$

defines an $L^p$ function, in particular that this integral is meaningful for almost all $x$. But this integral is clearly dominated pointwise (in $x$) by the integral

$$\int |f(y)| |g(x - y)| \, dy \qquad (6.14)$$

and we know that this latter is in $L^p$ by the argument above for nonnegative $f, g$. In particular, for almost all $x$, the function

$$y \to |f(y)| |g(x - y)|$$

is integrable, thus for almost all $x$ the function

$$y \to f(y) g(x - y)$$

is in $L^1$. Therefore the integral (6.13) is meaningful for almost all $x$, and then to check that it is in $L^p$ as a function of $x$, it is enough to show that it has a nonnegative majorant in $L^p$. But clearly (6.14) majorizes (6.13) and it is in $L^p$.

Remark on the proof of Theorem 6.8 in the general case. We do not yet have all the tools for the proof of this theorem in the general case: it requires knowing that the dual space of $L^r$ is $L^{r'}$; then $f * g$ will be identified by its integral against any function $h \in L^r$, i.e. by

$$\int (f * g)(y) h(y) \, dy$$

which (modulo a sign flip) is exactly the double integral in Young's inequality. Young's inequality will tell us that this double integral makes sense for any $h \in L^r$; moreover, it is a bounded linear functional on $L^r$, therefore $f * g$ can be identified with an element of $L^{r'}$. We will learn all this later, but keep the theorem in mind.

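7 Approximation by $C_0^\infty$ functions

The goal is to prove the following basic approximation theorem. Recall that for any open domain $\Omega \subset \mathbb{R}^d$ we denote by $C_0^\infty(\Omega)$ the set of compactly supported, smooth (= infinitely many times differentiable) functions:

$$C_0^\infty(\Omega) := \left\{ f : \Omega \to \mathbb{C} \; : \; \mathrm{supp}(f) \subset \Omega \text{ is compact}, \; \partial_1^{\alpha_1} \partial_2^{\alpha_2} \cdots \partial_d^{\alpha_d} f(x) \text{ exists } \forall x \in \Omega, \; \forall \alpha_j \in \mathbb{N} \right\}$$

(Some books use the notation $C_c^\infty(\Omega)$.)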


WARNING: Recall the precise definition of the support (Träger) of a continuous function:

$$\mathrm{supp}(f) := \overline{\{ x \in \mathbb{R}^d : f(x) \ne 0 \}}$$

i.e. it is the closure of the set of all points where $f$ does not vanish. In particular, since $\Omega$ is open, a function with compact support in $\Omega$ must vanish in a neighborhood of the boundary.

Theorem 7.1 Let $\Omega \subset \mathbb{R}^d$ be a non-empty open set and let $1 \le p < \infty$. Then $C_0^\infty(\Omega)$ is dense in the space $L^p(\Omega, dx)$ equipped with the $L^p$ norm.

In particular, it follows from this theorem that $C[0, 1]$ is dense in $L^p[0, 1]$ with respect to the $L^p$ norm for any $p < \infty$. Note that, equipped with the supremum (or $L^\infty$) norm, $(C[0, 1], L^\infty)$ is not dense in $(L^\infty[0, 1], L^\infty)$, because both spaces are complete and they are obviously not equal. Summarizing the conclusions of the Riesz-Fischer theorem and Theorem 7.1, we obtain

Corollary 7.2 Let $1 \le p < \infty$ and let $\Omega \subset \mathbb{R}^d$ be open. Then the completion of $C_0^\infty(\Omega)$ equipped with the $L^p$ norm is $L^p(\Omega)$.

Homework 7.3 Let $\Omega \subset \mathbb{R}^d$ be open. Show that the completion of $C_0^\infty(\Omega)$ equipped with the supremum norm is $C(\Omega)$.

Proof of Theorem 7.1. We will show the proof for $\Omega = \mathbb{R}^d$; the general case will be homework.

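Choose an arbitrary function $j \in L^1(\mathbb{R}^d)$ with $\int j = 1$. Define

$$j_\varepsilon(x) := \varepsilon^{-d} j\left( \frac{x}{\varepsilon} \right)$$

Note that

$$\int j_\varepsilon = 1, \qquad \|j\|_1 = \|j_\varepsilon\|_1 \qquad (7.15)$$

(this is how the normalization was chosen) and as $\varepsilon \to 0$, the function $j_\varepsilon$ is more and more concentrated and peaky around the origin.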
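To see the normalization (7.15) at work, one can rescale a concrete kernel and integrate numerically. The sketch below assumes $d = 1$ and picks the tent function $\max(0, 1 - |x|)$ as $j$; both the kernel and the midpoint rule are illustrative choices of mine, and the helper names are hypothetical.

```python
def tent(x):
    """Hypothetical choice of j: the tent function max(0, 1 - |x|), integral 1."""
    return max(0.0, 1.0 - abs(x))

def rescale(j, eps, d=1):
    """Return j_eps(x) = eps^(-d) * j(x / eps); only d = 1 is exercised here."""
    return lambda x: j(x / eps) / eps ** d

def midpoint_integral(func, a, b, n=40000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return sum(func(a + (k + 0.5) * h) for k in range(n)) * h

for eps in (1.0, 0.5, 0.1):
    j_eps = rescale(tent, eps)
    # total mass stays (approximately) 1 while the peak value j_eps(0) grows
    print(eps, midpoint_integral(j_eps, -2.0, 2.0), j_eps(0.0))
```

The second printed column stays close to 1 for every $\varepsilon$, while the third column, the height at the origin, grows like $1/\varepsilon$: this is the "more and more concentrated and peaky" behaviour.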

Let $f \in L^p$, $1 \le p < \infty$, and define

$$f_\varepsilon(x) := (f * j_\varepsilon)(x) = \int f(y) j_\varepsilon(x - y) \, dy$$

According to Theorem 6.8, $f_\varepsilon$ is an $L^p$ function and

$$\|f_\varepsilon\|_p \le \|f\|_p \|j\|_1 \qquad (7.16)$$

(we used $\|j\|_1 = \|j_\varepsilon\|_1$, and we used only the special case of Theorem 6.8 that we proved). Since $j_\varepsilon$ is very strongly concentrated around 0 with total integral 1, we expect that $f_\varepsilon$ is close to $f$. This is the content of the following proposition.

Proposition 7.4 Assuming $f \in L^p$, $1 \le p < \infty$, we have

$$\lim_{\varepsilon \to 0} \|f - f_\varepsilon\|_p = 0$$

Proof of Proposition 7.4. The proof consists of several standard steps. We will go through them because similar arguments are used very often in analysis, and usually they are not explained in detail: they are referred to as "by standard approximation arguments", and it is assumed that everybody has gone through such a proof at least once in his/her life.

Step 1. We show that it is sufficient to prove the Proposition if $j$ has compact support. For any sufficiently large $R$, we define

$$j^R(x) := C_R \, \chi(|x| < R) \, j(x)$$

(here $R$ is not a power, but an upper index), where $\chi(|x| < R)$ is the characteristic function of the ball $|x| < R$ and $C_R$ is the normalization

$$C_R := \left( \int_{|x| < R} j(x) \, dx \right)^{-1}$$

to ensure that $\int j^R = 1$. Obviously, $C_R \to 1$ as $R \to \infty$. As before, we define

$$j_\varepsilon^R(x) := \varepsilon^{-d} j^R(x / \varepsilon)$$

Then, by using $j - j^R = [(1 - \chi) + (1 - C_R)\chi] j$, we have

$$\|j_\varepsilon - j_\varepsilon^R\|_1 = \|j - j^R\|_1 \le \int_{|x| \ge R} |j| + |C_R - 1| \int_{|x| \le R} |j| \to 0$$

as $R \to \infty$, uniformly in $\varepsilon$. Therefore, by inequality (7.16) (which is basically a special case of Young's inequality), we have

$$\|j_\varepsilon * f - j_\varepsilon^R * f\|_p \le \|f\|_p \|j_\varepsilon - j_\varepsilon^R\|_1 \to 0$$

uniformly in $\varepsilon$ as $R \to \infty$. This shows that one can replace $j$ with a compactly supported version $j^R$ and the error can be made arbitrarily small.

This technique is called cutoff at infinity.


Step 2. With an almost identical (actually somewhat easier) cutoff argument, it is sufficient to show the Proposition for compactly supported $f$ (HOMEWORK: think it over).

Step 3. Now we show that it is sufficient to prove the Proposition for bounded $f$. We again use a cutoff argument, but now not in the domain ($x$-space) but in the range. For a sufficiently large positive $h$ we define

$$f_h(x) := f(x) \, \chi\{ x : |f(x)| \le h \}$$

Again, by (7.16) and (7.15), we have

$$\|j_\varepsilon * f - j_\varepsilon * f_h\|_p \le \|j\|_1 \|f - f_h\|_p$$

and clearly $\|f - f_h\|_p \to 0$ as $h \to \infty$. The estimate is again uniform in $\varepsilon$.

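Step 4. Now we show that it is sufficient to prove the Proposition for $p = 1$. Indeed, for any $1 < p < \infty$ we have

$$\|j_\varepsilon * f - f\|_p^p = \int \left| \int j_\varepsilon(x - y) f(y) \, dy - f(x) \right|^p dx$$

We can estimate

$$\left| \int j_\varepsilon(x - y) f(y) \, dy - f(x) \right|^{p-1} \le C \|f\|_\infty^{p-1}$$

where $C := (\|j\|_1 + 1)^{p-1}$, thus

$$\|j_\varepsilon * f - f\|_p^p \le C \|f\|_\infty^{p-1} \int \left| \int j_\varepsilon(x - y) f(y) \, dy - f(x) \right| dx = C \|f\|_\infty^{p-1} \|j_\varepsilon * f - f\|_1$$

Thus it is sufficient to show $\|j_\varepsilon * f - f\|_1 \to 0$ as $\varepsilon \to 0$. One should also check that the condition $f \in L^p$ can be translated to $f \in L^1$, but we have already assumed that $f$ is compactly supported and bounded, so it is in every $L^p$ space.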

Step 5. It is sufficient to prove the Proposition for simple functions of the form

$$f = \sum_i c_i \chi_{R_i}$$

where the sum is finite, $c_i \in \mathbb{C}$ and the $R_i$'s are rectangles. To see this, we recall that the set of simple functions of this form is dense in $L^1$; in other words, any $L^1$-function can be approximated by them in the $L^1$-sense. (This fact follows from the construction of the Lebesgue integral, the regularity of the Lebesgue measure, and the fact that any open set in $\mathbb{R}^d$ can be approximated by rectangles. THINK IT OVER!)


For any given $f \in L^1$, let $f_n$ be a sequence of simple functions such that $f_n \to f$ in $L^1$. Suppose that the Proposition is proven for every $f_n$. Then

$$\|j_\varepsilon * f - f\|_1 \le \|j_\varepsilon * (f - f_n)\|_1 + \|j_\varepsilon * f_n - f_n\|_1 + \|f_n - f\|_1$$

$$\le (\|j\|_1 + 1) \|f_n - f\|_1 + \|j_\varepsilon * f_n - f_n\|_1$$

For any given $\eta > 0$, the first term can be made smaller than $\eta/2$ by choosing $n$ sufficiently large, and this choice is uniform in $\varepsilon$. After choosing $n$ sufficiently large, we can fix it and choose $\varepsilon$ sufficiently small so that the second term becomes smaller than $\eta/2$. Thus $\|j_\varepsilon * f - f\|_1$ can be made smaller than any given $\eta$ if $\varepsilon$ is sufficiently small, and this proves Step 5.

Step 6. By linearity of the convolution and the triangle inequality for the norm, it is sufficient to prove the Proposition for $f = \chi_R$, i.e. for the characteristic function of a single rectangle.

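Step 7. By an explicit calculation:

$$\|j_\varepsilon * \chi_R - \chi_R\|_1 = \int \left| \int j_\varepsilon(x - y) \chi_R(y) \, dy - \chi_R(x) \right| dx$$

$$= \int \left| \int j_\varepsilon(x - y) (\chi_R(y) - \chi_R(x)) \, dy \right| dx$$

Notice the trick of bringing the second term $\chi_R(x)$ inside the integration by using that $\int j_\varepsilon = 1$. The integrand $j_\varepsilon(x - y)(\chi_R(y) - \chi_R(x))$ is explicitly zero unless $\mathrm{dist}(x, \partial R) \le \varepsilon \ell$, where $\partial R$ is the boundary of $R$ and $j$ is supported in a ball of radius $\ell$. This is because the first factor in $j_\varepsilon(x - y)(\chi_R(y) - \chi_R(x))$ is zero whenever $|x - y| \ge \varepsilon \ell$ and the second factor is nonzero only if exactly one of the two points $x, y$ lies in $R$. Therefore

$$\|j_\varepsilon * \chi_R - \chi_R\|_1 = \int_{\mathrm{dist}(x, \partial R) \le \varepsilon \ell} \left| \int j_\varepsilon(x - y) (\chi_R(y) - \chi_R(x)) \, dy \right| dx$$

$$\le 2 \|j_\varepsilon\|_1 \int_{\mathrm{dist}(x, \partial R) \le \varepsilon \ell} 1 \, dx = 2 \|j\|_1 \, \mathrm{vol}\{ x : \mathrm{dist}(x, \partial R) \le \varepsilon \ell \} \to 0$$

as $\varepsilon \to 0$, since the volume of an $\varepsilon \ell$ neighborhood of the boundary of a fixed rectangle $R$ is of order $\varepsilon$ (here $\ell$ is fixed). This completes the proof of Proposition 7.4.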
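Step 7 can be watched in the simplest possible setting. The sketch below makes several assumptions of mine: $d = 1$, the rectangle is $R = [0, 1]$, and the kernel is the box $j = \frac{1}{2} \chi_{[-1,1]}$ (so $\ell = 1$ and $\|j\|_1 = 1$). With this kernel the convolution has a closed form, namely the fraction of the window $[x - \varepsilon, x + \varepsilon]$ lying in $R$, so only the outer integral is approximated numerically. The function names are hypothetical.

```python
def smoothed_indicator(x, eps):
    """(j_eps * chi_R)(x) for j = (1/2) chi_[-1,1] and R = [0, 1]:
    the fraction of the window [x - eps, x + eps] that lies inside [0, 1]."""
    overlap = min(x + eps, 1.0) - max(x - eps, 0.0)
    return max(overlap, 0.0) / (2.0 * eps)

def l1_error(eps, a=-1.0, b=2.0, n=30000):
    """Midpoint-rule approximation of || j_eps * chi_R - chi_R ||_1 over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        chi = 1.0 if 0.0 <= x <= 1.0 else 0.0
        total += abs(smoothed_indicator(x, eps) - chi) * h
    return total

for eps in (0.4, 0.2, 0.1):
    # compare with the bound 2 * ||j||_1 * vol{dist(x, dR) <= eps} = 8 * eps
    print(eps, l1_error(eps), 8 * eps)
```

In this one-dimensional example the boundary $\partial R$ consists of the two points 0 and 1, and a hand computation for $\varepsilon \le 1/2$ gives an error of exactly $\varepsilon$, comfortably below the bound from Step 7 and of order $\varepsilon$ as claimed.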

From Proposition 7.4 our Theorem 7.1 easily follows. Simply consider a smooth, compactly supported function $j \in L^1$ with $\int j = 1$. If $f$ has compact support, then so does

$$f_\varepsilon(x) = \int j_\varepsilon(x - y) f(y) \, dy$$

and by the same argument as in Step 2 above, it is sufficient to prove Theorem 7.1 for compactly supported $f$ (THINK IT OVER). Since $f_\varepsilon \to f$ in $L^p$, Theorem 7.1 will be proven once we show that $f_\varepsilon \in C_0^\infty$. We will show that

$$\frac{\partial f_\varepsilon}{\partial x_j} = \frac{\partial j_\varepsilon}{\partial x_j} * f \qquad (7.17)$$

i.e. the convolution $f_\varepsilon = j_\varepsilon * f$ can be differentiated by differentiating one factor. The differentiability up to arbitrary order will then follow by induction.

To show (7.17), we form the difference quotient on the left hand side after changing variables:

$$\int \frac{j_\varepsilon(\ldots, y_j + \delta, \ldots) - j_\varepsilon(\ldots, y_j, \ldots)}{\delta} f(x - y) \, dy$$

The fraction in the integrand converges to $\frac{\partial j_\varepsilon}{\partial x_j}(y)$ pointwise, and it is also uniformly bounded in $\delta$ (here $\varepsilon$ is fixed!) since $j_\varepsilon$ is smooth and compactly supported, thus its first derivatives are bounded (and the first derivatives control the difference quotients by the Taylor formula with remainder term, THINK IT OVER!). Thus by the dominated convergence theorem we obtain (7.17), and this completes the proof for the case $\Omega = \mathbb{R}^d$.

Homework 7.5 The above proof was for $\Omega = \mathbb{R}^d$. Prove the theorem for any open set $\Omega$. [Hint: show that there exists an increasing sequence of compact sets $K_1 \subset K_2 \subset \ldots \subset \Omega$ such that if $f_n := f \chi_{K_n}$, then $\|f - f_n\|_{L^p(\Omega)} \to 0$ as $n \to \infty$. Apply the construction described above to each $f_n$; the construction shows that the support of the approximating functions of $f_n$ can be chosen arbitrarily close to the support of $f_n$. In particular it can be chosen in an arbitrarily small neighborhood of $K_n$, i.e. it can be chosen within $\Omega$.]

This completes the proof of the approximation theorem.
