
Crash course in analysis

László Erdős

Jan 10, 2011

Contents

1 Need for Lebesgue Integral

2 Concepts you should know

3 Singular measures

4 Outlook: spectral theorem and measure decomposition in physics

5 $L^p$-spaces

6 Riesz-Fischer theorem

7 Inequalities: Jensen, Hölder, Minkowski, Young and HLS

8 Interpolations: Riesz-Thorin, Hausdorff-Young

9 Approximation by $C_0^\infty$ functions

10 Schwartz functions

11 $H^1$ Sobolev spaces

12 Sobolev inequalities


Based on the poll in class (and the required prerequisite for the course, Analysis III), I assume that everybody is familiar with general measure theory and Lebesgue integration.

The beginning of this note (Sections 1 and 2) is meant to remind you which concepts this involves. If something is unfamiliar, look it up. I provide a good summary of basic concepts (without proofs) by Marcel Griesemer. This file actually contains a bit more than we need; see the list below. Another summary can be found, e.g., in Appendix A of Werner: Funktionalanalysis. If you need to check more details (e.g. proofs), consult any analysis or measure theory book. Very good books are H.L. Royden: Real Analysis, or Walter Rudin: Principles of Mathematical Analysis. A very concise and sharp introduction is E. Lieb, M. Loss: Analysis.

1 Need for Lebesgue Integral

One can justify the need for a more general integral than Riemann's in many ways. From the functional analysis point of view there are two “natural” arguments:

Need for Lebesgue I.

Consider the space $C[0,1]$, i.e. the space of continuous functions on the unit interval $[0,1]$. We can equip this space with two different metrics (actually, norms):

$$d_1(f,g) := \|f-g\|_1 := \int_0^1 |f(x)-g(x)|\,dx, \qquad d_\infty(f,g) := \|f-g\|_\infty := \sup_{x\in[0,1]} |f(x)-g(x)|$$

(the sup is actually a max). It is a standard exercise in analysis to show that $C[0,1]$ is complete under the $d_\infty$ metric (if this sounds unfamiliar, DO IT). Recall that completeness means that every Cauchy sequence converges; here it means that if a sequence $f_n \in C[0,1]$ is Cauchy in the $d_\infty$ metric (i.e. for any $\varepsilon > 0$ there is an $N = N_\varepsilon$ such that $d_\infty(f_n, f_m) \le \varepsilon$ for all $n, m \ge N_\varepsilon$), then there exists $f \in C[0,1]$ such that $d_\infty(f_n, f) \to 0$ as $n \to \infty$.

$C[0,1]$ is clearly not complete under the $d_1$ metric: it is trivial to find a sequence of continuous functions $f_n$ and a discontinuous function $f:[0,1]\to\mathbb{R}$ such that $\int_0^1 |f_n - f| \to 0$.

For example, let $f$ be the characteristic function of the interval $[0, \tfrac12]$ and let $f_n$ be a sequence of smooth approximations of $f$ (CONSTRUCT one to convince yourself). In particular, $f_n$ is Cauchy in $d_1$ (it even converges), but the limit is not in $C[0,1]$.
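To make the example concrete, here is a minimal Python sketch (NumPy assumed available). As an illustrative choice not taken from the notes, it uses a continuous piecewise-linear ramp $f_n$ (equal to 1 on $[0,\tfrac12]$, dropping linearly to 0 on $[\tfrac12, \tfrac12 + \tfrac1n]$) instead of a smooth approximation; the helper names `f`, `f_n` and `d1` are hypothetical. The exact distance is the area of a triangle, $d_1(f_n, f) = 1/(2n)$, and the midpoint-rule approximation should reproduce it.

```python
import numpy as np

def f(x):
    """Characteristic function of [0, 1/2]."""
    return (x <= 0.5).astype(float)

def f_n(x, n):
    """Continuous ramp: 1 on [0, 1/2], linear down to 0 on [1/2, 1/2 + 1/n], 0 afterwards."""
    return np.clip(1.0 - n * (x - 0.5), 0.0, 1.0)

def d1(g, h, num=200_000):
    """Midpoint-rule approximation of the d_1 distance, i.e. the integral of |g - h| over [0, 1]."""
    x = (np.arange(num) + 0.5) / num
    return np.mean(np.abs(g(x) - h(x)))

for n in (2, 4, 8, 16):
    # the exact value is the triangle area 1/(2n), so d1(f_n, f) -> 0
    print(n, d1(lambda x: f_n(x, n), f), 1 / (2 * n))
```

Any continuous ramp works equally well here; a smooth ramp would only change the constant in front of $1/n$.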

You may think there is no problem, since we know how to Riemann-integrate functions with jumps, e.g. piecewise continuous functions. Recall that $PC[0,1]$, the space of piecewise continuous functions on $[0,1]$, is defined as the set of all functions $f:[0,1]\to\mathbb{R}$ for which there exist finitely many numbers $0 = a_0 < a_1 < a_2 < \dots < a_{k-1} < a_k = 1$ such that $f$ restricted to each open interval $(a_j, a_{j+1})$ is continuous and all one-sided limits at the points $a_j$, $j = 0, 1, \dots, k$, exist, but may not coincide.

So if $(C[0,1], d_1)$ is not complete, maybe $(PC[0,1], d_1)$ is. It is fairly easy to see that this is not the case:

Homework 1.1 Prove that (PC[0, 1], d1) is not complete.

We know that $PC[0,1]$ is not the largest class of Riemann integrable functions: Riemann integrability allows infinitely many discontinuities, as long as the difference between the upper and lower sums converges to zero, i.e. the oscillation of the function is not too big. The basic theorem about Riemann integrability is the following:

Theorem 1.2 A function $f:[0,1]\to\mathbb{R}$ is Riemann integrable if and only if $f$ is bounded and continuous almost everywhere, i.e. the set of its discontinuities has (Lebesgue) measure zero.

Homework 1.3 Prove directly (without reference to the above characterization of Riemann integrability) that the set of Riemann integrable functions equipped with the $d_1$ metric is not complete. (Hint: take a Cantor set $C$ that has nonzero measure and consider its approximations $C_n$ that you obtain along the Cantor procedure after removing the $n$-th generation of intervals. Then take the characteristic functions of these sets.)
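For experimenting with the hint, the following Python sketch builds the generation-$n$ approximations $C_n$ of one particular fat Cantor set (a Smith-Volterra-Cantor set; removing a middle open interval of length $4^{-n}$ from each remaining interval at step $n$ is our assumption, and any construction with positive limiting measure would do). Exact rational arithmetic gives $|C_n| = \tfrac12 + 2^{-n-1}$, so the limit set has measure $\tfrac12$, and since $C_m \subset C_n$ for $m > n$, $d_1(\chi_{C_n}, \chi_{C_m}) = |C_n| - |C_m|$ becomes small. The function names are hypothetical.

```python
from fractions import Fraction

def remove_middles(intervals, n):
    """Remove an open middle interval of length 4**(-n) from each closed interval [a, b]."""
    gap = Fraction(1, 4**n)
    out = []
    for a, b in intervals:
        mid = (a + b) / 2
        out.append((a, mid - gap / 2))
        out.append((mid + gap / 2, b))
    return out

def fat_cantor(n):
    """Closed intervals making up C_n, the n-th generation approximation of the fat Cantor set."""
    intervals = [(Fraction(0), Fraction(1))]
    for k in range(1, n + 1):
        intervals = remove_middles(intervals, k)
    return intervals

def measure(intervals):
    """Lebesgue measure of a finite disjoint union of intervals."""
    return sum(b - a for a, b in intervals)

for n in range(1, 6):
    print(n, measure(fat_cantor(n)))    # 3/4, 5/8, 9/16, ... = 1/2 + 2**(-n-1)
```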

Need for Lebesgue II.

It is a well-known phenomenon from basic analysis that pointwise convergence and continuity are not compatible without further assumptions. For example, there are sequences of continuous functions $f_n \in C[0,1]$ that converge pointwise to some $f(x)$, but the limit function is not continuous, $f \notin C[0,1]$ (FIND an example!). In other words, the set of continuous functions is not closed under pointwise limits (however, if $f_n$ converges to $f$ uniformly, then $f$ must be continuous; CHECK!).

What about pointwise convergence and Riemann integration? Is

$$\lim_{n\to\infty} \int_0^1 f_n(x)\,dx = \int_0^1 \lim_{n\to\infty} f_n(x)\,dx$$

true? (In the sense that the pointwise limit of Riemann integrable functions is Riemann integrable and the limit of the integrals is the integral of the limit.) We know that without some further condition this cannot hold; just consider the sequence

$$f_n(x) = n\,\chi_{(0,1/n)}(x)$$

(where $\chi_{(0,1/n)}$ is the characteristic function of the open interval $(0,1/n)$), which clearly converges to $f \equiv 0$ pointwise, but $\int_0^1 f_n = 1$ for every $n$.
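A quick numerical sanity check of this counterexample, as a hypothetical Python sketch with NumPy: at a fixed point $x_0 > 0$ the values $f_n(x_0)$ vanish as soon as $1/n \le x_0$, while a midpoint-rule approximation of $\int_0^1 f_n$ stays equal to 1.

```python
import numpy as np

def f_n(x, n):
    """n times the characteristic function of the open interval (0, 1/n)."""
    return np.where((x > 0) & (x < 1 / n), float(n), 0.0)

def riemann_midpoint(g, num=1_000_000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    x = (np.arange(num) + 0.5) / num
    return np.mean(g(x))

x0 = 0.05   # any fixed point in (0, 1]
for n in (10, 100, 1000):
    # f_n(x0) becomes 0 once 1/n <= x0, but the integral of f_n stays equal to 1
    print(n, f_n(np.array([x0]), n)[0], riemann_midpoint(lambda x: f_n(x, n)))
```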

Suppose we are willing to assume uniform boundedness (which is anyway reasonable in the realm of Riemann integrable functions, and, a posteriori, the dominated convergence theorem shows that a domination condition of this kind is the right one) in order to save the exchangeability of the limit and the integral. It still does not work; for example, consider the Dirichlet function

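$$f(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}\cap[0,1],\\ 0 & \text{if } x \in [0,1]\setminus\mathbb{Q},\end{cases}$$

and its approximations

$$f_n(x) = \begin{cases} 1 & \text{if } x = p/q,\ p, q \in \mathbb{Z},\ 0 \le p \le q \le n,\ q \ge 1,\\ 0 & \text{otherwise.}\end{cases}$$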

Clearly $f_n$ is Riemann integrable (since it is zero everywhere apart from finitely many points), while its pointwise limit $f$ is not Riemann integrable (WHY?). Again, the problem is the big oscillation.
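The support of $f_n$ can be listed explicitly. The sketch below (standard library only; the helper name is hypothetical) collects the distinct fractions $p/q \in [0,1]$ with $1 \le q \le n$. The set is finite for each $n$, which is why $f_n$ is Riemann integrable with integral 0, and every rational number in $[0,1]$ eventually appears, which is why $f_n \to f$ pointwise.

```python
from fractions import Fraction

def support_of_f_n(n):
    """Points of [0, 1] where f_n = 1: all fractions p/q with 1 <= q <= n and 0 <= p <= q."""
    return sorted({Fraction(p, q) for q in range(1, n + 1) for p in range(q + 1)})

for n in (1, 2, 5, 10):
    print(n, len(support_of_f_n(n)))   # a finite number of points for every n
```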

The upshot is that the Riemann integral is not sufficient: it is incompatible both with the very useful concept of completeness and with pointwise limits. It was a major conceptual achievement in the history of mathematics to find the right generalization of the Riemann integral. The Lebesgue integral is the right concept. In the following section I list the toolbox of Lebesgue integration.

2 Concepts you should know

You should be familiar with the following concepts and theorems:

• σ-algebra (meaningful on any set X)

• Borel sets (meaningful on a topological space).

• Measures, outer measures. Measure spaces.

• Regular measures on topological spaces (approximability of measures of sets by open sets from outside and by compact sets from inside)

• Lebesgue measure and its properties. Lebesgue measure is the unique measure on $\mathbb{R}^n$ that is invariant under Euclidean motions and assigns 1 to the unit cube.

• Lebesgue measurable sets. Zero measure sets. Concept of “almost everywhere”. Not all Lebesgue measurable sets are Borel sets (this is not easy to prove).

• Counting measure on the measure space $(\mathbb{N}, \mathcal{P}(\mathbb{N}), \mu)$, where $\mathcal{P}(\mathbb{N})$ is the σ-algebra of all subsets and $\mu$ is the counting measure.

• (Borel) measurable functions. This class is closed under arithmetic operations, compositions, lim inf and lim sup.

• Lebesgue integral. Integrable functions. Usual properties (linearity, monotonicity, $|\int f| \le \int |f|$).

• The Lebesgue integral coincides with the Riemann integral for Riemann integrable functions. In particular, the fundamental theorem of calculus (Newton-Leibniz theorem) holds for the Lebesgue integral as well.

• Basic limit theorems: Monotone and dominated convergence, Fatou’s Lemma.

• Lebesgue integral of complex valued functions (an infinite integral is not allowed; $\int |f| < \infty$ is required).

• σ-finite measure spaces.

• Product of two measure spaces (construction of the product σ-algebra and product measure). Fubini's theorem (one needs non-negativity or integrability with respect to the product measure to interchange the integrals).

We will use the notations

$$\int_\Omega f\,d\mu = \int_\Omega f(x)\,d\mu(x)$$

interchangeably for the Lebesgue integral. The second notation is preferred if, for some reason, the integration variable needs to be spelled out explicitly (e.g. for multiple integrals).

If $\Omega \subset \mathbb{R}^d$, then we use

$$\int_\Omega f = \int_\Omega f(x)\,dx,$$

where $dx$ stands for the Lebesgue measure. Unless we indicate otherwise, integration on subsets of $\mathbb{R}^d$ is always understood with respect to the Lebesgue measure.

3 Singular measures

This chapter usually belongs to measure theory, but I am not sure whether the majority of you have seen it. So we review it; you should at least be familiar with the concepts even if you have not seen all the proofs. We first present examples on $\mathbb{R}$, then develop the general definitions.

Let $\alpha:\mathbb{R}\to\mathbb{R}$ be a monotonically increasing function. A monotonic function need not be continuous, but its one-sided limits exist at every point; we introduce the notation

$$\alpha(a+0) := \lim_{x\to a+0}\alpha(x), \qquad \alpha(a-0) := \lim_{x\to a-0}\alpha(x).$$

We define a measure $\mu_\alpha$ by assigning

$$\mu_\alpha((a,b)) := \alpha(b-0) - \alpha(a+0)$$

to any open interval $(a,b)$. Since open intervals generate the σ-algebra of Borel sets, it is easy to see that the usual construction of the Lebesgue measure (using $\alpha(x) = x$) goes through for this more general case. The resulting measure is called the Lebesgue-Stieltjes measure. With respect to this measure we can integrate; the corresponding integral is sometimes denoted as

$$\int f\,d\mu_\alpha = \int f\,d\alpha$$

(the right hand side is only a notation). This is called the Lebesgue-Stieltjes integral.
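For a continuous $f$ on a bounded interval the Lebesgue-Stieltjes integral can also be approximated by Riemann-Stieltjes sums $\sum_i f(\xi_i)\,(\alpha(x_{i+1}) - \alpha(x_i))$; we use this standard fact (for monotone, right-continuous $\alpha$, where the sums converge to the integral over $(a,b]$) without proof. The Python sketch below, with a hypothetical function name and NumPy assumed, tries it on the $C^1$ function $\alpha(x) = x^2$ and on the step function $\alpha(x) = \chi(x \ge 0.3)$. In the first case the result should match $\int_0^1 \cos(x)\,2x\,dx = 2\sin 1 + 2\cos 1 - 2$, in the second it should be close to $e^{0.3}$.

```python
import numpy as np

def stieltjes_sum(f, alpha, a, b, num=100_000):
    """Riemann-Stieltjes sum  sum_i f(xi_i) * (alpha(x_{i+1}) - alpha(x_i))  over [a, b].

    The tags xi_i are the midpoints of a uniform partition. For continuous f and
    monotone, right-continuous alpha these sums approach the integral of f with
    respect to d(alpha) over (a, b] as num grows.
    """
    x = np.linspace(a, b, num + 1)
    xi = 0.5 * (x[:-1] + x[1:])
    return np.sum(f(xi) * np.diff(alpha(x)))

# alpha(x) = x**2 is C^1, so the sum should approach the integral of cos(x) * 2x over [0, 1]
smooth = stieltjes_sum(np.cos, lambda x: x**2, 0.0, 1.0)
exact = 2 * np.sin(1) + 2 * np.cos(1) - 2

# alpha = indicator of [0.3, infinity): a single jump, so the sum picks out f(0.3)
step = stieltjes_sum(np.exp, lambda x: (x >= 0.3).astype(float), 0.0, 1.0)
print(smooth, exact, step, np.exp(0.3))
```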

Examples:

(i) As mentioned, $\alpha(x) = x$ gives back the Lebesgue integral. A bit more generally, if $\alpha \in C^1$ (continuously differentiable), then it is easy to check that

$$\int f\,d\alpha = \int f(x)\,\alpha'(x)\,dx,$$

i.e. in this case the Lebesgue-Stieltjes integral can be expressed as a Lebesgue integral with the weight function $\alpha'$.

(ii) Fix a number $p \in \mathbb{R}$. Let $\alpha(x) := \chi(x \ge p)$ be the characteristic function of the semi-axis $[p,\infty)$. CHECK that

$$\int f\,d\alpha = f(p)$$

for any (Borel measurable) function $f$. In particular, any such function is integrable and the integral depends only on the value at $p$. The corresponding $L^1$ space is simply

$$L^1(\mathbb{R}, d\alpha) \cong \mathbb{C},$$

i.e. it is a one-dimensional vector space (CHECK!).

The generated Lebesgue-Stieltjes measure is called the Dirac delta measure at $p$ and it is denoted by $\delta_p$. In particular,

$$\delta_p(A) = \begin{cases} 1 & \text{if } p \in A,\\ 0 & \text{if } p \notin A.\end{cases}$$

(iii) Let $f \ge 0$ be a measurable function on $\mathbb{R}$ with finite total integral. Let

$$\alpha(x) := \int_{-\infty}^x f(s)\,ds.$$

Then

$$\mu_\alpha(A) = \int_A f(x)\,dx$$

for every Borel set $A$.

(iv) A considerably more interesting example is the following function. Consider the standard Cantor set, i.e.

$$C := [0,1] \setminus \Big( (\tfrac13,\tfrac23) \cup (\tfrac19,\tfrac29) \cup (\tfrac79,\tfrac89) \cup (\tfrac1{27},\tfrac2{27}) \cup \dots \Big)$$

Recall that the Cantor set is a compact, uncountable set. It is easy to see that the Lebesgue measure of $C$ is zero. Define an increasing function $\alpha$ on $[0,1]$ as follows: $\alpha$ will be constant on each of the sets removed in the definition of $C$; more precisely,

$$\alpha(x) := \begin{cases} 1/2 & x \in (1/3, 2/3)\\ 1/4 & x \in (1/9, 2/9)\\ 3/4 & x \in (7/9, 8/9)\\ 1/8 & x \in (1/27, 2/27)\\ 3/8 & x \in (7/27, 8/27)\\ 5/8 & x \in (19/27, 20/27)\\ 7/8 & x \in (25/27, 26/27)\\ \text{etc.} \end{cases}$$

Make a picture to see the successive definition of $\alpha$ on the complement of the Cantor set.

With these formulas we have not yet defined α on C.

Homework 3.1 (a) Show that the function $\alpha$ defined on $[0,1]\setminus C$ above can be uniquely extended to a monotone function on $[0,1]$. This is called the Devil's staircase.

(b) Show that the extension is continuous.

(c) Let $\mu_\alpha$ be the corresponding Lebesgue-Stieltjes measure. Show that $\mu_\alpha(\{p\}) = 0$ for any point $p \in [0,1]$.

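(d) Show that $\mu_\alpha$ is supported on a set of Lebesgue measure zero. (Recall that the support (Träger) of a measure $\mu$ on $X$ is the smallest closed set $K$ such that $\mu(X \setminus K) = 0$; equivalently, $K$ is closed, $\mu(X \setminus K) = 0$, and $\mu(K \setminus H) > 0$ for every proper closed subset $H \subsetneq K$.)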

(e) Show that $\alpha$ is almost everywhere differentiable in $[0,1]$ but the fundamental theorem of calculus does not hold, e.g.

$$\alpha(1) - \alpha(0) \ne \int_0^1 \alpha'(x)\,dx.$$

Homework 3.2 (Not trivial) Let $\mu_\alpha$ be the Lebesgue-Stieltjes measure constructed in the previous Homework. Compute

(a) $\displaystyle\int_0^1 x\,d\mu_\alpha(x)$, and (b) $\displaystyle\int_0^1 x^2\,d\mu_\alpha(x)$.

[Hint: use the hierarchical structure of $C$.]

This example shows that without the fundamental theorem of calculus it can be quite complicated to compute integrals. In this particular example the special structure of $C$ and $\alpha$ helped. If one constructs a less symmetric Cantor set or defines $\alpha$ differently, it may be very complicated to compute the integral.
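To help with the picture suggested above, here is a Python sketch that evaluates $\alpha$ at rational points through the ternary expansion of $x$: ternary digits 0 and 2 become binary digits 0 and 1, and the first ternary digit 1 ends the expansion. This digit rule is a well-known description of the Cantor function, used here without proof; the function name and the truncation depth are our own choices. It reproduces the values listed in example (iv), e.g. $\alpha = 3/8$ on $(7/27, 8/27)$.

```python
import math
from fractions import Fraction

def cantor_function(x, depth=40):
    """Devil's staircase alpha(x) for rational x in [0, 1], via the ternary digits of x.

    Ternary digits 0/2 become binary digits 0/1; the first ternary digit 1 contributes a
    final binary 1 and stops (x lies in a removed middle third or at its left endpoint).
    Exact Fraction arithmetic; the expansion is truncated after `depth` digits.
    """
    x = Fraction(x)
    if x >= 1:
        return Fraction(1)
    result, scale = Fraction(0), Fraction(1, 2)
    for _ in range(depth):
        x *= 3
        digit = math.floor(x)          # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:
            return result + scale
        result += scale * (digit // 2)
        scale /= 2
    return result

print(cantor_function(Fraction(1, 2)), cantor_function(Fraction(7, 25)), cantor_function(Fraction(1, 4)))
```

Plotting `cantor_function(Fraction(i, 1000))` for `i` in `range(1001)` gives the staircase picture.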

The last three examples were prototypes of a certain classification of measures according to their singularity structure. The Dirac delta measure is so singular that it assigns a nonzero value to a set consisting of a single point, namely $\delta_p(\{p\}) = 1$. The measure $\mu_\alpha$ obtained from the “Devil's staircase”, example (iv), is less singular, since it assigns zero measure to every point, but it is still supported on a small set (measured with respect to the usual Lebesgue measure). Finally, example (iii) shows a non-singular measure in the sense that $\mu_\alpha(A) = 0$ for any set $A$ of zero Lebesgue measure.

We give the precise definitions of these classes.

Definition 3.3 Let $\mu$ and $\nu$ be two measures defined on a fixed σ-algebra on a space $X$. Then

(a) $\nu$ is absolutely continuous (absolutstetig) with respect to $\mu$ if $\nu(A) = 0$ whenever $\mu(A) = 0$. Notation: $\nu \ll \mu$;

(b) $\mu$ and $\nu$ are mutually singular if there is a measurable set $A$ such that $\mu(A) = 0$ and $\nu(X \setminus A) = 0$. Notation: $\mu \perp \nu$.

Example (iii) is a measure that is absolutely continuous with respect to the Lebesgue measure, while examples (ii) and (iv) are both mutually singular with the Lebesgue measure (and with each other as well).

It is clear that example (iii) is an absolutely continuous measure. It is less clear that essentially every absolutely continuous measure is the result of an integration. This is the content of the important Radon-Nikodym theorem:

Theorem 3.4 (Radon-Nikodym) Let $\mu$ and $\nu$ be two measures on a common σ-algebra on $X$ and let $\mu$ be σ-finite. Then $\nu \ll \mu$ if and only if there exists a measurable function $f: X \to \mathbb{R}_+$ (infinity is allowed) such that

$$\nu(A) = \int_A f(x)\,d\mu(x)$$

for any $A$ in the σ-algebra. This function is $\mu$-a.e. (almost everywhere) unique. Notation: $f = \frac{d\nu}{d\mu}$ (this is only a formal fraction!).

Moreover, we also have the following decomposition:

Theorem 3.5 (Lebesgue decomposition I.) Let $\mu$ and $\nu$ be two σ-finite measures on a common σ-algebra. Then $\nu$ can be uniquely decomposed as

$$\nu = \nu_{ac} + \nu_{sing},$$

where $\nu_{ac} \ll \mu$ and $\nu_{sing} \perp \mu$.

The singular part can be further decomposed under a mild countability condition on the number of points that have positive measure:

Definition 3.6 Let $(X,\mathcal{B},\mu)$ be a measure space such that for every point $x \in X$ the singleton $\{x\}$ belongs to $\mathcal{B}$, and let

$$P := \{x \in X : \mu(\{x\}) \ne 0\}.$$

The set $P$ is called the set of pure points or atomic points of the measure $\mu$. Assume that $P$ is a countable set. Then the measure

$$\mu_{pp}(A) := \mu(A \cap P) = \sum_{x \in A \cap P} \mu(\{x\})$$

is well-defined and it is called the pure point or atomic component of $\mu$. A measure $\mu$ is called a pure point measure if $\mu = \mu_{pp}$. A measure $\mu$ is called continuous if $\mu_{pp} = 0$.

Given another measure $\nu$ on the same σ-algebra, the measure $\mu$ is singular continuous with respect to $\nu$ if $\mu$ is continuous and $\mu \perp \nu$.

The Dirac delta measure from example (ii) is an atomic measure; examples (iii) and (iv) are continuous measures. Example (iii) is a measure that is absolutely continuous with respect to the Lebesgue measure, while (iv) and the Lebesgue measure are mutually singular. The measure in (iv) is thus a singular continuous measure with respect to the Lebesgue measure.

The following theorem is a simple exercise from these definitions:

Theorem 3.7 Given two measures $\mu, \nu$ on the same σ-algebra that contains each singleton $\{x\}$, assume that $\nu \perp \mu$ and that the set of atoms of $\nu$ is countable. Then $\nu$ can be uniquely decomposed into $\nu = \nu_{pp} + \nu_{sc}$, where $\nu_{pp}$ is the pure point component of $\nu$ and $\nu_{sc}$ is a singular continuous measure (with respect to $\mu$) that is also mutually singular to $\nu_{pp}$.

The most important application is the following version of these decomposition theorems, whose proof is a simple exercise from the definitions above.

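Theorem 3.8 (Lebesgue decomposition II.) Let $\mu$ and $\nu$ be two σ-finite measures on a common σ-algebra that contains each singleton $\{x\}$. (In particular, $\nu$ has at most countably many points with nonzero weight.) Then $\nu$ can be uniquely decomposed as

$$\nu = \nu_{ac} + \nu_{pp} + \nu_{sc},$$

where $\nu_{ac} \ll \mu$, $\nu_{sc}$ is singular continuous with respect to $\mu$, and $\nu_{pp}$ is the pure-point component of $\nu_{sing}$ from Theorem 3.5 (if $\mu$ has no atoms, e.g. if $\mu$ is the Lebesgue measure, this is the pure-point component of $\nu$ itself).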
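As a worked illustration (our own example, not from the notes), all three components can appear in a single Lebesgue-Stieltjes measure. Let $\lambda$ denote the Lebesgue measure and $\alpha_C$ the Devil's staircase of example (iv), extended by 0 on $(-\infty,0)$ and by 1 on $(1,\infty)$ (notation introduced only here). Since the defining formula $\mu_\alpha((a,b)) = \alpha(b-0) - \alpha(a+0)$ is additive in $\alpha$, the decomposition with respect to $\mu = \lambda$ reads as follows; the last line exhibits the $\lambda$-null set $C \cup \{\tfrac12\}$ that carries the singular part.

```latex
\begin{gather*}
\alpha(x) = \min\{\max\{x,0\},1\} + \chi(x \ge \tfrac12) + \alpha_C(x), \qquad x \in \mathbb{R},\\
\mu_\alpha = \underbrace{\lambda|_{[0,1]}}_{\nu_{ac}}
  + \underbrace{\delta_{1/2}}_{\nu_{pp}}
  + \underbrace{\mu_{\alpha_C}}_{\nu_{sc}},\\
\lambda\big(C \cup \{\tfrac12\}\big) = 0, \qquad
\big(\delta_{1/2} + \mu_{\alpha_C}\big)\big(\mathbb{R} \setminus (C \cup \{\tfrac12\})\big) = 0.
\end{gather*}
```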

4 Outlook: spectral theorem and measure decomposition in physics

To a physicist, the Lebesgue decompositions may look like ad hoc mathematical subtleties. Here I explain informally how they show up in quantum mechanics.

We are used to thinking of a measure as a map assigning real (or complex) numbers to a family $\mathcal{B}$ of selected subsets (σ-algebra) of a fixed space $\Omega$. Given a fixed separable Hilbert space $\mathcal{H}$, it makes sense to consider projection valued measures (PVM) on a σ-algebra $\mathcal{B}$ on $\Omega$. This is a map that assigns to any element $B \in \mathcal{B}$ an orthogonal projection $P(B)$ in $\mathcal{H}$. Recall that the orthogonal projections $P$ in a Hilbert space are characterized by $P^2 = P$ and $P^* = P$. Each orthogonal projection $P$ corresponds (in a unique fashion) to a closed subspace $\mathcal{H}_P \subset \mathcal{H}$ and vice versa, namely the subspace onto which $P$ projects. We say that the dimension of $P$ is the dimension of this subspace, in other words, the dimension of the range of $P$:

$$\dim P := \dim \mathcal{H}_P$$

(which may be infinite). To the trivial elements of the σ-algebra we assign

$$P(\emptyset) = 0, \qquad P(\Omega) = \mathrm{Id}$$

and we require σ-additivity in the sense that if $S_1, S_2, \dots$ is a countable family of disjoint subsets of $\Omega$ (elements of $\mathcal{B}$), then

$$\sum_j P(S_j) = P\Big(\bigcup_j S_j\Big).$$

It needs a proof that the infinite sum on the left hand side is independent of the order of summation and that it converges (in the sense of strong operator convergence; recall that $H_n \to H$ in this sense iff $\|H_n f - H f\| \to 0$ for every $f$) to another orthogonal projection, but a version of the monotone convergence theorem will do the job; note that every projection is a non-negative and bounded operator, $0 \le P \le 1$, where $1 = \mathrm{Id}$ is the identity operator on $\mathcal{H}$. Note also that the fact that $P(S) + P(S')$ is a projection for $S \cap S' = \emptyset$ by itself implies that $P(S)$ and $P(S')$ are orthogonal to each other (meaning that the corresponding subspaces are orthogonal).

Once we have the concept of a PVM, we can integrate, i.e. we can rigorously define

$$\int_\Omega f(x)\,dP(x) \tag{4.1}$$

for functions $f:\Omega\to\mathbb{R}$. The definition is essentially the same as for the usual Lebesgue integral: one considers the approximating sums

$$\sum_{k=-N\cdot 2^N}^{N\cdot 2^N} \frac{k}{2^N}\, P(S_k), \qquad S_k := \Big\{x \in \Omega : \frac{k}{2^N} \le f(x) < \frac{k+1}{2^N}\Big\}$$

and one defines (4.1) as the strong limit of these approximations as $N \to \infty$ (a sequence of operators $T_n$ in $\mathcal{H}$ converges strongly to an operator $T$ if $\|T_n\psi - T\psi\| \to 0$ for all $\psi$ in the domain of $T$). The limit may not exist for every element $\psi \in \mathcal{H}$, but very often it exists for a dense subspace. Once the integral (4.1) is defined for real valued functions, it is trivial to extend it to complex valued functions by integrating the real and imaginary parts separately.
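In finite dimensions everything in this definition can be computed. The Python sketch below (NumPy assumed; the function names and the test matrix are hypothetical) takes the spectral measure of a small Hermitian matrix, i.e. $P(S)$ is the projection onto the eigenvectors whose eigenvalue lies in $S$, and forms the dyadic sums $\sum_k \frac{k}{2^N} P(S_k)$ literally. Each value $f(E)$ at an eigenvalue $E$ is rounded down to the grid $2^{-N}\mathbb{Z}$, so for $f(x) = x$ the sum should differ from the matrix $H$ by less than $2^{-N}$ in operator norm, and for $f(x) = x^2$ it should approximate $H^2$ in the same way (provided the relevant values lie in $[-N, N]$).

```python
import numpy as np

def spectral_measure(H):
    """Spectral PVM of a Hermitian matrix H.

    Returns P such that P(in_S) is the orthogonal projection onto the span of the
    eigenvectors whose eigenvalue x satisfies in_S(x); the predicate in_S stands in for S.
    """
    evals, evecs = np.linalg.eigh(H)
    def P(in_S):
        V = evecs[:, [bool(in_S(e)) for e in evals]]
        return V @ V.conj().T                       # zero matrix if no eigenvalue lies in S
    return P

def dyadic_integral(f, P, N):
    """sum_k (k / 2^N) P(S_k), S_k = {x : k/2^N <= f(x) < (k+1)/2^N}, for |k| <= N 2^N."""
    total = 0
    for k in range(-N * 2**N, N * 2**N + 1):
        lo, hi = k / 2**N, (k + 1) / 2**N
        total = total + lo * P(lambda x: lo <= f(x) < hi)
    return total

H = np.array([[1.0, 0.5], [0.5, -0.3]])             # hypothetical Hermitian example
P = spectral_measure(H)
approx_H = dyadic_integral(lambda x: x, P, N=8)       # approximates the integral of x dP(x), i.e. H
approx_H2 = dyadic_integral(lambda x: x**2, P, N=8)   # approximates H^2
print(np.linalg.norm(approx_H - H, 2), np.linalg.norm(approx_H2 - H @ H, 2))
```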

The basic fact about self-adjoint operators $H = H^*$ on $\mathcal{H}$ is the spectral theorem, which states (a bit informally) that $H$ (uniquely) determines a corresponding PVM, namely the family of spectral projections (Spektralschar) or spectral decomposition of $H$. The base set is $\Omega = \mathbb{R}$, and the σ-algebra $\mathcal{B}$ is the usual Borel σ-algebra of $\mathbb{R}$. It holds that

$$\int_{\mathbb{R}} x\,dP(x) = H \tag{4.2}$$

and this allows us to define arbitrary functions of $H$ via the formula

$$f(H) := \int_{\mathbb{R}} f(x)\,dP(x).$$

In particular, the time evolution (the solution to the Schrödinger equation) is defined via

$$e^{itH} = \int_{\mathbb{R}} e^{itx}\,dP(x).$$

We sweep the domain issues under the rug: if $f(x)$ is unbounded on the support of the measure $P$, then $f(H)$ will be defined only on a (typically dense) subspace of the Hilbert space $\mathcal{H}$. However, if $f$ is bounded (like $f(x) = e^{itx}$), then $f(H)$ is defined on the whole $\mathcal{H}$.

To appreciate this concept, consider the very simple finite dimensional case. Here $\mathcal{H} = \mathbb{C}^n$ and operators are just $n \times n$ matrices. A self-adjoint (hermitian) matrix $H = H^*$ has altogether $n$ real eigenvalues counted with multiplicity,

$$\underbrace{E_1, \dots, E_1}_{m_1},\ \underbrace{E_2, \dots, E_2}_{m_2},\ \dots,\ \underbrace{E_k, \dots, E_k}_{m_k},$$

where there are $k$ distinct eigenvalues, and the positive integers $m_1, m_2, \dots, m_k$ denote the multiplicities of $E_1, E_2, \dots, E_k$, respectively (with $\sum_{j=1}^k m_j = n$). The usual spectral theorem for hermitian matrices states that there are orthogonal projections $P_{E_1}, P_{E_2}, \dots, P_{E_k}$, one for each eigenvalue, that are mutually orthogonal and such that

$$H = \sum_{j=1}^k E_j P_{E_j}, \qquad \dim P_{E_j} = m_j.$$

This is exactly the integral (4.2) if the spectral measure is a sum of delta-functions,

$$dP(x) = \sum_{j=1}^k \delta(x - E_j)\, P_{E_j}\, dx,$$

or, more precisely,

$$P(S) := \sum_{j=1}^k \mathbf{1}(E_j \in S)\, P_{E_j}$$

for any Borel set $S \subset \mathbb{R}$. In other words, finite dimensional matrices have a very singular spectral measure: it is a sum of delta functions. We say that such a spectrum is pure point.
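The finite dimensional spectral theorem can be checked numerically. The following Python sketch (NumPy assumed; the test matrix, the helper name and the tolerance used to decide when two computed eigenvalues are equal are our own assumptions) groups the eigenvectors returned by `numpy.linalg.eigh` by eigenvalue, forms the projections $P_{E_j}$, and verifies $H = \sum_j E_j P_{E_j}$, $\dim P_{E_j} = m_j$ and mutual orthogonality. It also builds $e^{itH} = \sum_j e^{itE_j} P_{E_j}$ and checks that it is unitary.

```python
import numpy as np

def spectral_projections(H, tol=1e-9):
    """Distinct eigenvalues E_j of a Hermitian matrix H with the projections P_{E_j}.

    Computed eigenvalues closer than `tol` are treated as equal (a numerical assumption).
    """
    evals, evecs = np.linalg.eigh(H)            # ascending eigenvalues, orthonormal eigenvectors
    groups = []                                 # list of (E_j, column indices of its eigenvectors)
    for i, e in enumerate(evals):
        if groups and abs(e - groups[-1][0]) < tol:
            groups[-1][1].append(i)
        else:
            groups.append((e, [i]))
    pieces = []
    for E, idx in groups:
        V = evecs[:, idx]
        pieces.append((E, V @ V.conj().T))      # orthogonal projection onto the eigenspace of E
    return pieces

# Hypothetical test matrix: eigenvalue 1 with multiplicity 2 and eigenvalue 3 with multiplicity 1
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
H = Q @ np.diag([1.0, 1.0, 3.0]) @ Q.T

pieces = spectral_projections(H)
H_rebuilt = sum(E * P for E, P in pieces)                   # sum_j E_j P_{E_j}
dims = [int(round(np.trace(P).real)) for _, P in pieces]   # dim P_{E_j} = trace of P_{E_j}
t = 0.7
U = sum(np.exp(1j * t * E) * P for E, P in pieces)        # e^{itH} = sum_j e^{itE_j} P_{E_j}
print(dims, np.allclose(H_rebuilt, H), np.allclose(U @ U.conj().T, np.eye(3)))
```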

Self-adjoint operators (“Hamiltonians”) on an infinite dimensional Hilbert space (like $\mathcal{H} = L^2(\mathbb{R}^3)$) also have a spectral decomposition, but it is often not singular. For example, the free kinetic energy operator, $H = -\Delta$, has an absolutely continuous spectral measure; namely, for any Borel set $S \subset \mathbb{R}$ we have

$$\mathcal{H}_{P(S)} = \big\{ f \in L^2(\mathbb{R}^3) : \operatorname{supp}\hat f \subset \{ p \in \mathbb{R}^3 : (2\pi|p|)^2 \in S \} \big\}$$

(recall that any projection $P = P(S)$ can be characterized by the subspace $\mathcal{H}_P$, its range). The spectral measure assigns zero weight (the zero projection) to each set $S$ of zero Lebesgue measure, since then $\{p : (2\pi|p|)^2 \in S\}$ also has zero Lebesgue measure in $\mathbb{R}^3$, and the subspace of $L^2$-functions whose Fourier transform is supported on such a set is trivial.

The Hydrogen Hamiltonian, $H = -\Delta - 1/|x|$, has a mixed spectrum: it has eigenvalues (pure point spectrum) below zero and an absolutely continuous spectrum above zero. For other operators, singular continuous spectrum can also show up (but this is rather a pathological case).

The analogue of Lebesgue decomposition II. is the statement that for any self-adjoint operator $H$, the Hilbert space decomposes into a direct sum of three mutually orthogonal subspaces,

$$\mathcal{H} = \mathcal{H}_{pp} \oplus \mathcal{H}_{ac} \oplus \mathcal{H}_{sc},$$

such that these subspaces are invariant with respect to $H$ and the spectral measure restricted to $\mathcal{H}_{pp}$, $\mathcal{H}_{ac}$, $\mathcal{H}_{sc}$ is pure point, absolutely continuous and singular continuous, respectively.

The pure point subspace $\mathcal{H}_{pp}$ is the closed linear span of the $L^2$ eigenfunctions of $H$; these are localized states. For a normalized eigenfunction $\psi$, $\|\psi\| = 1$, the eigenvalue $E$ is the energy of the state $\psi$:

$$H\psi = E\psi, \qquad \langle \psi, H\psi \rangle = E\|\psi\|^2 = E.$$

They are called localized because their time evolution, $e^{itH}\psi = e^{itE}\psi$, is essentially unchanged (only a time dependent phase factor appears).

In contrast, states in the continuous subspaces, $\psi \in \mathcal{H}_{ac} \oplus \mathcal{H}_{sc}$, are delocalized, because it can be shown that their time evolution extends beyond any compact set as $t \to \infty$ (this is the RAGE theorem). There is no $L^2$ eigenfunction $\psi$ assigned to an energy $E$ in the support of the continuous spectrum. Formally one can think of $\psi_p(x) := e^{2\pi i p\cdot x}$ as an “eigenvector” of $-\Delta$ with “eigenvalue” $E_p = (2\pi|p|)^2$, since $-\Delta\psi_p = E_p\psi_p$, but $\psi_p \notin L^2(\mathbb{R}^3)$. However, an appropriate “linear combination” of the $\psi_p$,

$$\int a(p)\,\psi_p\,dp, \tag{4.3}$$

can be an $L^2$ function, namely $(\mathcal{F}^{-1}a)(x)$. For example, if $a \in L^2(\mathbb{R}^3)$, then its inverse Fourier transform is also in $L^2$. The function $(\mathcal{F}^{-1}a)(x)$ given in (4.3) is not an eigenfunction, but it lies in the continuous subspace.


For $H = -\Delta$ the spectral decomposition can easily be obtained with the Fourier transform. For general self-adjoint operators such an easy and explicit characterization is not possible, but the (de)localization properties of the time evolution can still be identified via the spectral type.
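The delocalization of states in the continuous subspace can be observed numerically for $H = -\Delta$. The sketch below is a simplification: one dimension instead of three, and a large periodic box $[-L/2, L/2)$ standing in for $\mathbb{R}$ (our assumptions; the box must be large enough that the wave packet does not wrap around). It applies $e^{itH}$ as the Fourier multiplier $e^{it(2\pi p)^2}$, using NumPy's FFT, to a hypothetical Gaussian wave packet and records the fraction of the $L^2$ mass inside $[-5, 5]$, which should decrease towards 0 as $t$ grows, while the total mass is preserved.

```python
import numpy as np

# Assumption: a large periodic box [-L/2, L/2) in one dimension stands in for R (and R^3).
L, n = 1000.0, 2**16
x = (np.arange(n) - n // 2) * (L / n)
p = np.fft.fftfreq(n, d=L / n)            # psi_p(x) = exp(2 pi i p x) has "eigenvalue" (2 pi p)^2

def evolve(psi0, t):
    """exp(itH) psi0 for H = -d^2/dx^2, computed as the Fourier multiplier exp(it(2 pi p)^2)."""
    return np.fft.ifft(np.exp(1j * t * (2 * np.pi * p) ** 2) * np.fft.fft(psi0))

def mass_in_window(psi, R=5.0):
    """Fraction of the total L^2 mass of psi that lies in [-R, R]."""
    density = np.abs(psi) ** 2
    return density[np.abs(x) <= R].sum() / density.sum()

psi0 = np.exp(-x**2)                      # a Gaussian wave packet (hypothetical initial state)
for t in (0.0, 1.0, 5.0, 20.0):
    print(t, round(mass_in_window(evolve(psi0, t)), 3))
```

The sign convention $e^{itH}$ follows the text; $e^{-itH}$ would give the same densities $|\psi_t|^2$.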

5 $L^p$-spaces

The dominated convergence theorem resolved “Need for Lebesgue II.” by demonstrating that pointwise limit and integration can be interchanged within the Lebesgue framework (assuming the existence of an integrable dominating function). What about “Need for Lebesgue I.”?

It is clear that the formula

$$\|f\|_1 := \int_0^1 |f(x)|\,dx \tag{5.4}$$

extends the norm (metric) $d_1$ from $C[0,1]$ to all Lebesgue integrable functions on $[0,1]$, since the Riemann and Lebesgue integrals coincide on continuous functions. In the Riesz-Fischer theorem below (Section 6) we will show that the space of Lebesgue integrable functions is actually complete, so it is one of the possible completions of $(C[0,1], d_1)$ (we do not know yet that it is the smallest possible one; for that we will have to show that the continuous functions are dense in the set of Lebesgue integrable ones).

However, before we discuss this, we have to introduce the $L^p$ spaces. It would be tempting to equip the space of Lebesgue integrable functions with the norm given by (5.4). Unfortunately, this is not a norm, for a “stupid” reason: the Lebesgue integral is insensitive to changing the integrand on a zero measure set. In particular, $\|f\|_1 = 0$ does not imply that $f(x) = 0$ for all $x$, only for almost all $x$.

The following idea circumvents this problem, and we discuss it in full generality. Let $(\Omega,\mathcal{B},\mu)$ be a measure space (where $\Omega$ is the base set, $\mathcal{B}$ is a σ-algebra and $\mu$ is the measure). We consider the following equivalence relation on the set of functions $f:\Omega\to\mathbb{C}$:

$$f \sim g \iff f(x) = g(x) \text{ for } \mu\text{-almost all } x$$

It needs a (trivial) proof that this is indeed an equivalence relation. Suppose that $f$ is integrable; then obviously any function in its equivalence class is also integrable, with the same integral. Therefore we consider the space

$$L^1(\Omega,\mathcal{B},\mu) = L^1(\Omega,\mu) = L^1(\Omega) = L^1 := \{\text{integrable functions}\}/\sim$$

i.e. the integrable functions factorized by this equivalence relation (the various notations are all used in practice; in principle the concept of $L^1$ depends on the space, the measure and the σ-algebra, but in most cases it is clear from the context which σ-algebra and measure we consider, so we omit them from the notation). It is easy to see that the usual vector space operations extend to the factor space. Moreover, the integral naturally extends to $L^1(\Omega,\mu)$.

The only thing to keep in mind is that we still denote elements of $L^1(\Omega,\mu)$ by $f(x)$, even though $f(x)$ does not make sense for a fixed $x$ for a general $L^1$ function (for continuous functions it is of course meaningful).

Definition 5.1 Let (Ω,B, µ) be a measure space and let 0 < p ≤ ∞. We set

$$L^p(\Omega,\mu) := \Big\{ f:\Omega\to\mathbb{C} \text{ measurable} : \int_\Omega |f|^p\,d\mu < \infty \Big\}\Big/\sim$$

for $p < \infty$ and

$$L^\infty(\Omega,\mu) := \big\{ f:\Omega\to\mathbb{C} \text{ measurable} : \operatorname{ess\,sup}|f| < \infty \big\}\big/\sim,$$

where the essential supremum of a function is defined by

$$\operatorname{ess\,sup}|f| := \inf\{K \in \mathbb{R} : |f(x)| \le K \text{ for almost all } x\}.$$

These spaces are called $L^p$-spaces or Lebesgue spaces.

Note that every element of a Lebesgue space is actually an equivalence class of functions, but this fact is usually suppressed in the notation.

Homework 5.2 Prove that $L^p(\Omega,\mu)$ is a vector space for any $p > 0$.

Definition 5.3 For $f \in L^p$ we define

$$\|f\|_p := \Big(\int_\Omega |f|^p\,d\mu\Big)^{1/p}$$

if $p < \infty$, and

$$\|f\|_\infty := \operatorname{ess\,sup}|f|$$

if $p = \infty$.

These formulas do not define a norm if 0 < p < 1 (triangle inequality is not satisfied)but they do define a norm for 1 ≤ p ≤ ∞. For the proof, one needs Minkowski inequality(Theorem 7.5)

$$\|f + g\|_p \le \|f\|_p + \|g\|_p,$$

which is exactly the triangle inequality for $\|\cdot\|_p$ (the other two properties of a norm are trivially satisfied). From now on we will always assume that $1 \le p \le \infty$ whenever we talk about $L^p$ spaces.
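A minimal Python sketch of these formulas on a finite measure space, where points carry masses (names are hypothetical, NumPy assumed). With the counting measure on two points and $f = (1,0)$, $g = (0,1)$ one gets $\|f+g\|_p = 2^{1/p}$ while $\|f\|_p + \|g\|_p = 2$, so the triangle inequality fails exactly when $p < 1$. The $p = \infty$ branch ignores points of zero mass, mirroring the essential supremum.

```python
import numpy as np

def lp_norm(values, weights, p):
    """||f||_p on a finite measure space: point j has mass weights[j] and f takes values[j] there."""
    values = np.abs(np.asarray(values, dtype=float))
    weights = np.asarray(weights, dtype=float)
    if p == np.inf:
        return values[weights > 0].max()       # ess sup: points of zero mass are ignored
    return np.sum(weights * values**p) ** (1.0 / p)

counting = [1, 1]                 # counting measure on two points
f, g = [1, 0], [0, 1]
fg = [1, 1]                       # f + g
for p in (0.5, 1, 2, np.inf):
    # compare ||f + g||_p with ||f||_p + ||g||_p
    print(p, lp_norm(fg, counting, p), lp_norm(f, counting, p) + lp_norm(g, counting, p))
```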

These norms naturally define the concept of $L^p$ convergence of functions:

Definition 5.4 A sequence of functions $f_n \in L^p$ converges to $f \in L^p$ in $L^p$-sense or in $L^p$-norm if $\|f_n - f\|_p \to 0$ as $n \to \infty$.

In the case of $L^p$ convergent sequences, we often say that $f_n$ converges strongly (stark), although this is a bit imprecise, since it does not specify the exponent $p$. We will see later that this terminology nevertheless serves to distinguish it from weak convergence.

These convergences naturally extend the $d_1$ and $d_\infty$ convergences of continuous functions that we studied earlier. Moreover, pointwise convergence also naturally extends to $L^p$ functions, but we must keep in mind that everything is defined only almost everywhere.

Definition 5.5 (i) A sequence of measurable functions $f_n$ on a measure space $(\Omega,\mathcal{B},\mu)$ converges to a measurable function $f$ almost everywhere (fast überall) if there exists a set $Z$ of measure zero, $\mu(Z) = 0$, such that

$$f_n(x) \to f(x) \qquad \forall x \notin Z.$$

(ii) A sequence of equivalence classes of measurable functions $f_n$ converges pointwise to an equivalence class of measurable functions $f$ if any sequence of representatives of the classes of $f_n$ converges to any representative of $f$ almost everywhere.

It is an easy exercise to show that if the convergence holds for at least one sequence of representatives, then it holds for any sequence (of course the exceptional set $Z$ changes); in particular, part (ii) of the above definition is meaningful. Therefore one does not need to distinguish between almost everywhere pointwise convergence of equivalence classes and of their representatives. In the future, we will thus freely talk about, e.g., $L^p$ functions converging almost everywhere without ever mentioning the equivalence classes.

Homework 5.6 Give examples showing that pointwise convergence does not imply $L^p$ convergence and vice versa. Give also examples showing that convergence in $L^p$ does not in general imply convergence in $L^q$, $p \ne q$.

There is, however, one positive statement:

Lemma 5.7 Suppose the total measure of the space is finite, $\mu(\Omega) < \infty$. Then $L^p$ convergence implies $L^q$ convergence whenever $q \le p$.

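Proof. For $q < p < \infty$ use Hölder's inequality with the exponents $p/q$ and $p/(p-q)$ (we will prove it later, but I assume everybody has seen it):

$$\int_\Omega |f|^q\,d\mu = \int_\Omega |f|^q \cdot 1\,d\mu \le \Big(\int_\Omega \big(|f|^q\big)^{p/q}\,d\mu\Big)^{q/p} \Big(\int_\Omega 1^{p/(p-q)}\,d\mu\Big)^{(p-q)/p},$$

thus

$$\|f\|_q \le \|f\|_p\,\mu(\Omega)^{\frac1q - \frac1p}.$$

Applying this to $f_n - f$ gives the claim. The cases $q = p$ and $p = \infty$ are immediate. □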

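The inequality $\|f\|_q \le \|f\|_p\,\mu(\Omega)^{1/q - 1/p}$ from the proof can be tested on a finite measure space. The Python sketch below uses randomly generated weights and values (hypothetical data, NumPy assumed) and also checks that a constant function gives equality, so the factor $\mu(\Omega)^{1/q-1/p}$ cannot be improved in general.

```python
import numpy as np

rng = np.random.default_rng(1)
weights = rng.uniform(0.1, 2.0, size=50)      # hypothetical finite measure on 50 points
f = rng.standard_normal(50)                   # hypothetical function values
total_mass = weights.sum()                    # mu(Omega)

def norm(h, p):
    """||h||_p with respect to the weights above (p < infinity)."""
    return np.sum(weights * np.abs(h) ** p) ** (1.0 / p)

def lemma_bound(h, q, p):
    """Right-hand side of ||h||_q <= ||h||_p * mu(Omega)^(1/q - 1/p), for q <= p."""
    return norm(h, p) * total_mass ** (1.0 / q - 1.0 / p)

for q, p in [(1, 2), (1.5, 3), (2, 7)]:
    print(q, p, norm(f, q), lemma_bound(f, q, p))
```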

6 Riesz-Fischer theorem

The following theorem presents the most important step towards proving that $L^1[0,1]$ is the completion of $C[0,1]$ equipped with the $d_1$ metric.

Theorem 6.1 (Riesz-Fischer) Let $(\Omega,\mathcal{B},\mu)$ be an arbitrary measure space, let $1 \le p \le \infty$ and consider $L^p = L^p(\Omega,\mu)$.

(i) The space $L^p$, equipped with the norm $\|\cdot\|_p$, is complete, i.e. if $f_i \in L^p$ is Cauchy, then there is a function $f \in L^p$ such that $f_i \to f$ in $L^p$-sense.

(ii) If $f_i \to f$ in $L^p$, then there exist a subsequence $f_{i_k}$ and a function $F \in L^p$ such that $|f_{i_k}(x)| \le F(x)$ for all $k$ (almost everywhere in $x$) and $f_{i_k}$ converges to $f$ almost everywhere as $k \to \infty$.

Proof. We give the proof for $p < \infty$. The case $p = \infty$ requires a somewhat different treatment (since $L^\infty$ is defined differently), but it is simpler.

Step 1: Subsequential convergence is enough.

This is an important basic idea. We want to prove that a Cauchy sequence $f_i$ converges strongly. It turns out that it is sufficient to show that some subsequence converges strongly. Seemingly this is much weaker, but actually it is not. Suppose that $f_{i_k}$ is a strongly convergent subsequence, i.e. $f_{i_k} \to f$ (in $L^p$) as $k \to \infty$. But then

$$\|f_i - f\|_p \le \|f_i - f_{i_k}\|_p + \|f_{i_k} - f\|_p,$$

and thus for any $\varepsilon > 0$ we can make the second term smaller than $\varepsilon/2$ by choosing $k$ sufficiently large, and then, by the Cauchy property, the first term is smaller than $\varepsilon/2$ if $i$ and $k$ are sufficiently large. Thus from the subsequential strong convergence of a Cauchy sequence we conclude the strong convergence of the whole sequence.

Step 2. Selection of a subsequence.

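To find a convergent subsequence we proceed successively. Pick $i_1$ such that

$$\|f_n - f_{i_1}\|_p \le \frac12 \qquad \forall n \ge i_1;$$

such an $i_1$ exists by the Cauchy property. Now select $i_2 > i_1$ such that

$$\|f_n - f_{i_2}\|_p \le \frac14 \qquad \forall n \ge i_2,$$

and again by the Cauchy property such an $i_2$ exists. Next we choose $i_3 > i_2$ such that

$$\|f_n - f_{i_3}\|_p \le \frac18 \qquad \forall n \ge i_3,$$

etc.; in general we have $i_k > i_{k-1}$ with

$$\|f_n - f_{i_k}\|_p \le \frac{1}{2^k} \qquad \forall n \ge i_k.$$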

Step 3. Telescopic sum

Now we define

$$F_\ell := |f_{i_1}| + \sum_{k=1}^{\ell-1} |f_{i_k} - f_{i_{k+1}}|.$$

By Minkowski's inequality

$$\|F_\ell\|_p \le \|f_{i_1}\|_p + \frac12 + \frac14 + \dots = \|f_{i_1}\|_p + 1,$$

and clearly $F_\ell$ is a monotone increasing sequence of functions. Let

$$F := \lim_{\ell\to\infty} F_\ell$$

be the pointwise limit (which exists in $[0,\infty]$ by monotonicity); then by the monotone convergence theorem and by the uniform bound on the $L^p$ norm of $F_\ell$, we have

$$\|F\|_p < \infty;$$

in particular, $F(x) < \infty$ almost everywhere. Now use the telescopic sum

$$f_{i_k} = f_{i_1} + (f_{i_2} - f_{i_1}) + (f_{i_3} - f_{i_2}) + \dots + (f_{i_k} - f_{i_{k-1}}).$$

As k → ∞ this is an absolutely convergent series for every x such that F (x) < ∞, let f(x)be its limit, thus

fik(x) → f(x) k → ∞

for almost every x. Moreover, from the telescopic sum it also follows that

|fik | ≤ F ∈ Lp

18

and thus by dominated convergence, we have

f ∈ Lp

Using dominated convergence once more, for

|fik − f | ≤ |fik | + |f | ≤ F + |f | ∈ Lp

we also have‖fik − f‖p → 0, k → ∞. 2

We know that $(C[0,1], \|\cdot\|_\infty)$ is complete, and now we have seen that $(L^\infty[0,1], \|\cdot\|_\infty)$ is also complete. However, for any $p < \infty$, the space $(C[0,1], \|\cdot\|_p)$ is not complete (EXAMPLE!), but $(L^p[0,1], \|\cdot\|_p)$ is complete. In fact, it is the (smallest) completion of $(C[0,1], \|\cdot\|_p)$, as we will soon prove.
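One possible instance of the EXAMPLE, as a small numerical sketch (the functions are our own choice, not taken from the notes): the continuous ramps $f_n(x) = \min(1, \max(0, n(x - 1/2)))$ satisfy $\|f_n - f_m\|_1 = \frac{1}{2n} - \frac{1}{2m}$ for $n < m$, so they form a Cauchy sequence in $L^1$, but their $L^1$ limit is the indicator of $(1/2, 1]$, which is not continuous. The helper names `ramp`, `step` and `lp_dist` are hypothetical. The midpoint rule is exact here up to rounding, because the kinks fall on grid points.

```python
# Hypothetical example: continuous ramps that are Cauchy in L^1
# but converge in L^1 to a discontinuous step function.

def ramp(n):
    """f_n(x) = min(1, max(0, n (x - 1/2))), a continuous function on [0, 1]."""
    return lambda x: min(1.0, max(0.0, n * (x - 0.5)))


def step(x):
    """Indicator function of (1/2, 1]; it is not continuous."""
    return 1.0 if x > 0.5 else 0.0


def lp_dist(f, g, p=1.0, cells=100_000):
    """Midpoint-rule approximation of ||f - g||_p on [0, 1]."""
    h = 1.0 / cells
    total = sum(abs(f((i + 0.5) * h) - g((i + 0.5) * h)) ** p for i in range(cells))
    return (total * h) ** (1.0 / p)


for n in (10, 100, 1000):
    # exact values: ||f_n - f_2n||_1 = 1/(4n) and ||f_n - step||_1 = 1/(2n)
    print(n, lp_dist(ramp(n), ramp(2 * n)), lp_dist(ramp(n), step))
```

Both printed distances shrink like $1/n$, while no continuous function can be the $L^1$ limit, since the limit is already determined almost everywhere as the step function.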

Remark 6.2 The $p = \infty$ case often behaves exceptionally. Many theorems about $L^p$ spaces hold only with the restriction $p < \infty$, and sometimes, by duality, $p > 1$ is also necessary. Rule of thumb: whenever you use a theorem about $L^p$ spaces, watch out for the borderline cases $p = 1, \infty$ and make sure the theorem applies to them. The Riesz-Fischer theorem holds without restrictions, but many other theorems do not.

7 Inequalities: Jensen, Hölder, Minkowski, Young and HLS

The primary tools in analysis are inequalities. Even though theorems in analysis are often formulated as limiting statements, the heart of the proof is almost always an inequality. Here we discuss a few basic inequalities involving integrals of functions. I assume that you have already seen Jensen's, Hölder's and Minkowski's inequalities. I will not prove them in class, but I include their proofs; they are important, so if you have forgotten them, review them.

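Theorem 7.1 (Jensen's inequality) Let $J : \mathbb R \to \mathbb R$ be a convex function and let $(\Omega, \mu)$ be a measure space with finite, positive total measure, i.e. $0 < \mu(\Omega) < \infty$. Let $f \in L^1(\Omega, \mu)$ be a real-valued function and define its average as
$$\langle f\rangle := \frac{1}{\mu(\Omega)}\int_\Omega f\,d\mu.$$
Then

(i) $(J\circ f)_- \in L^1$ (here $a_- := \max\{0, -a\}$ is the negative part (Negativteil) of $a$); in particular, $\int J\circ f\,d\mu$ is well defined (possibly $+\infty$).

(ii) $\langle J\circ f\rangle \ge J(\langle f\rangle)$.

(iii) If $J$ is strictly convex at $\langle f\rangle$, then equality in (ii) holds iff $f = \langle f\rangle$ almost everywhere.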

Proof. By convexity, there exists a number $v$ such that
$$J(t) \ge J(\langle f\rangle) + v\,(t - \langle f\rangle) \qquad (7.5)$$
holds for every $t \in \mathbb R$. (The graph of a convex function lies "above" each of its supporting lines.) Plugging in $t = f(x)$, we have
$$J(f(x)) \ge J(\langle f\rangle) + v\,(f(x) - \langle f\rangle) \qquad (7.6)$$
and thus
$$J(f(x))_- \le J(\langle f\rangle)_- + |v|\,|f(x)| + |v|\,|\langle f\rangle| \in L^1$$
(constants are integrable since $\mu(\Omega) < \infty$). This proves (i) (we needed only an upper bound on $J(f(x))_-$, since it is always non-negative). Integrating (7.6) over $\Omega$ with respect to $\mu$ and dividing by $\mu(\Omega)$, we get exactly (ii).

Finally, to prove (iii): if $f$ is constant (almost everywhere), then this constant must be its average $\langle f\rangle$, and (ii) holds with equality. If $f$ is not constant, then $f(x) - \langle f\rangle$ takes both positive and negative values on sets of positive measure. Since $J$ is strictly convex at $\langle f\rangle$, (7.5) is a strict inequality either for all $t > \langle f\rangle$ or for all $t < \langle f\rangle$. Hence (7.6) is a strict inequality on a set of positive measure, and after integration we get a strict inequality in (ii). □

Remark 7.2 A measure space $(\Omega, \mu)$ is called a probability space (Wahrscheinlichkeitsraum) if $\mu(\Omega) = 1$. On a probability space, Jensen's inequality simplifies a bit, since there is no need for normalization by $\mu(\Omega)$. For example, from the convexity of the function
$$J(t) = t^p, \qquad t \ge 0,$$
for $1 \le p < \infty$, it follows (applying Jensen's inequality to $|f|$) that on a probability space
$$\Big(\int |f|\,d\mu\Big)^p \le \int |f|^p\,d\mu. \qquad (7.7)$$
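The following sketch (an illustration of ours, not part of the original notes) checks Jensen's inequality (ii) and its special case (7.7) on a finite probability space: 20 points with random point masses summing to 1, a random real-valued $f$, and the convex functions $J = \exp$ and $J(t) = t^p$. The helper name `jensen_sides` is hypothetical. For $p = 1$ the two sides of (7.7) coincide, as they must.

```python
import math
import random


def jensen_sides(J, f, probs):
    """Return (<J o f>, J(<f>)) for f given by its values on a finite probability space."""
    mean_f = sum(v * w for v, w in zip(f, probs))
    mean_Jf = sum(J(v) * w for v, w in zip(f, probs))
    return mean_Jf, J(mean_f)


rng = random.Random(1)
raw = [rng.random() for _ in range(20)]
total = sum(raw)
probs = [r / total for r in raw]                   # point masses summing to 1
f = [rng.uniform(-2.0, 2.0) for _ in range(20)]    # a real-valued function

# Jensen (ii) with the convex function J = exp: first value >= second value
print(jensen_sides(math.exp, f, probs))

# (7.7): J(t) = t**p applied to |f| gives (int |f|)^p <= int |f|^p
abs_f = [abs(v) for v in f]
for p in (1.0, 1.5, 2.0, 4.0):
    mean_of_power, power_of_mean = jensen_sides(lambda t: t ** p, abs_f, probs)
    print(p, power_of_mean, mean_of_power)
```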

The last example is a special case of the (probably) most important inequality in analysis:

Theorem 7.3 (Hölder's inequality) Let $1 \le p, q \le \infty$ be conjugate exponents (konjugierte Exponenten), i.e. they satisfy $\frac1p + \frac1q = 1$ (by convention, $1/\infty = 0$). Then for any two nonnegative measurable functions $f, g \ge 0$ defined on a measure space $(\Omega, \mu)$ we have
$$\Big|\int_\Omega fg\,d\mu\Big| \le \|f\|_p\|g\|_q. \qquad (7.8)$$

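Furthermore, if the assumption $f, g \ge 0$ is dropped but we assume $f \in L^p$ and $g \in L^q$, then $fg \in L^1$ and (7.8) holds.

Finally, if $f \in L^p$, $g \in L^q$ and $f \neq 0$, then (7.8) holds with equality if and only if $fg$ has constant phase almost everywhere (i.e. $fg = e^{i\theta}|fg|$ for a fixed $\theta$; this is automatic if $f, g \ge 0$) and there exists $\lambda \ge 0$ such that

(i) $|g| = \lambda|f|^{p-1}$ in case $1 < p < \infty$;

(ii) in case $p = 1$: $|g| \le \lambda$ (a.e.) and $|g| = \lambda$ on the set where $f(x) \neq 0$.

The case $p = \infty$ is the dual of (ii).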

Hölder's inequality is usually stated for two functions, but it is straightforward to extend it to products of several functions by induction:
$$\Big|\int_\Omega f_1 f_2\cdots f_k\,d\mu\Big| \le \|f_1\|_{p_1}\|f_2\|_{p_2}\cdots\|f_k\|_{p_k} \qquad (7.9)$$
whenever
$$\frac{1}{p_1} + \frac{1}{p_2} + \dots + \frac{1}{p_k} = 1.$$

Proof. I will only prove the inequality; the cases of equality follow from the same arguments (THINK IT OVER!). We may also assume that $f \in L^p$ and $g \in L^q$, otherwise (7.8) holds trivially for $f, g \ge 0$. [Note that this statement is not true without the non-negativity assumption, since $\int fg\,d\mu$ may not be defined!]

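First proof. The standard proof starts with the observation that it is sufficient to prove the inequality when $\|f\|_p = \|g\|_q = 1$: if one of the norms vanishes, then $fg = 0$ almost everywhere and there is nothing to prove; otherwise one can replace $f \to f/\|f\|_p$, $g \to g/\|g\|_q$ by the homogeneity of the norm. For $1 < p < \infty$ one then uses the arithmetic inequality
$$ab \le \frac{a^p}{p} + \frac{b^q}{q}, \qquad a, b \ge 0$$
(which can be proven by elementary calculus) and gets
$$\int_\Omega |f|\,|g|\,d\mu \le \frac1p\int_\Omega |f|^p\,d\mu + \frac1q\int_\Omega |g|^q\,d\mu = \frac1p + \frac1q = 1,$$
which was to be proven under the condition $\|f\|_p = \|g\|_q = 1$. (The cases $p = 1$ and $p = \infty$ follow directly from $|fg| \le |f|\,\|g\|_\infty$ almost everywhere.) □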
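A quick numerical sanity check (our own sketch, not a proof) of the arithmetic inequality $ab \le a^p/p + b^q/q$ and of Hölder's inequality (7.8) on a finite measure space with random point masses. The helper names are hypothetical. The last value in each printed row uses the equality case $|g| = |f|^{p-1}$ with $f \ge 0$, so it should vanish up to rounding.

```python
import random


def young_rhs(a, b, p):
    """a**p / p + b**q / q with the dual exponent q = p / (p - 1), for 1 < p < infinity."""
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q


def lp_norm(values, weights, p):
    """||f||_p on a finite measure space with point masses `weights`."""
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def holder_gap(f, g, weights, p):
    """||f||_p ||g||_q - |integral of f g|; by (7.8) this is >= 0."""
    q = p / (p - 1.0)
    integral = sum(w * fi * gi for fi, gi, w in zip(f, g, weights))
    return lp_norm(f, weights, p) * lp_norm(g, weights, q) - abs(integral)


rng = random.Random(0)
weights = [rng.uniform(0.0, 2.0) for _ in range(50)]   # total mass is not 1
for p in (1.5, 2.0, 3.0):
    f = [rng.uniform(-1.0, 1.0) for _ in weights]
    g = [rng.uniform(-1.0, 1.0) for _ in weights]
    f_pos = [abs(v) for v in f]
    g_eq = [v ** (p - 1.0) for v in f_pos]     # equality case |g| = |f|^(p-1)
    print(p, holder_gap(f, g, weights, p), holder_gap(f_pos, g_eq, weights, p))
```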

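Second proof. Again, we prove only the case $\|f\|_p = \|g\|_q = 1$ with $1 < p < \infty$, and for simplicity we can clearly assume that $f, g \ge 0$ (replace $f \to |f|$ and $g \to |g|$). In this case the measure $g(x)^q\,d\mu(x)$ is a probability measure. Since the integrand $fg$ vanishes where $g = 0$, all integrals below may be taken over the set $\{g > 0\}$, and we write
$$\int fg\,d\mu = \int f g^{1-q}\,g^q\,d\mu \le \Big(\int \big(f g^{1-q}\big)^p g^q\,d\mu\Big)^{1/p}$$
by the probability space version of Jensen's inequality (7.7). Thus
$$\int fg\,d\mu \le \Big(\int f^p g^{(1-q)p+q}\,d\mu\Big)^{1/p} = \Big(\int_{\{g>0\}} f^p\,d\mu\Big)^{1/p} \le 1,$$
since $p, q$ are conjugate exponents and thus $(1-q)p + q = 0$. □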

The most commonly used case of Hölder's inequality is $p = q = 2$, i.e. the Cauchy-Schwarz inequality
$$\Big|\int fg\,d\mu\Big| \le \|f\|_2\|g\|_2. \qquad (7.10)$$

Homework 7.4 Prove the following form of the Cauchy-Schwarz inequality: for any $\alpha > 0$,
$$\Big|\int fg\,d\mu\Big| \le \frac12\Big[\alpha\|f\|_2^2 + \alpha^{-1}\|g\|_2^2\Big].$$

In many cases it is useful to have the freedom of choosing the additional parameter $\alpha$ in the estimate. Keep this in mind!

Theorem 7.5 (Minkowski inequality) Let $1 \le p \le \infty$ and let the functions $f, g$ be defined on a measure space $(\Omega, \mu)$. Then
$$\|f + g\|_p \le \|f\|_p + \|g\|_p. \qquad (7.11)$$
If $f, g \in L^p$, $f \neq 0$ and $1 < p < \infty$, then equality holds iff $g = \lambda f$ for some $\lambda \ge 0$. For the endpoint exponents $p = 1$ or $p = \infty$, equality can hold in other cases as well.

Minkowski's inequality is the triangle inequality for the $L^p$ norm, as mentioned earlier.

Proof. Minkowski's inequality has many proofs; see e.g. a very general version of this inequality, whose proof uses Fubini's theorem, in Lieb-Loss: Analysis, Section 2.4.

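The most direct proof relies on Hölder's inequality and on the convexity of the function $t \mapsto t^p$. We can assume $1 < p < \infty$ (the case $p = 1$ is trivial, the case $p = \infty$ requires a different but equally trivial argument) and that $f, g \in L^p$ (otherwise the right hand side of (7.11) is infinite). We first note that $f, g \ge 0$ can be assumed (WHY?). Then we write
$$(f+g)^p = f(f+g)^{p-1} + g(f+g)^{p-1}$$
and apply Hölder's inequality:
$$\int f(f+g)^{p-1}\,d\mu \le \|f\|_p\Big(\int (f+g)^{(p-1)q}\,d\mu\Big)^{1/q} = \|f\|_p\Big(\int (f+g)^p\,d\mu\Big)^{1/q}$$
(since $(p-1)q = p$). Similarly
$$\int g(f+g)^{p-1}\,d\mu \le \|g\|_p\Big(\int (f+g)^{(p-1)q}\,d\mu\Big)^{1/q} = \|g\|_p\Big(\int (f+g)^p\,d\mu\Big)^{1/q}.$$
Thus
$$\int (f+g)^p\,d\mu \le \big(\|f\|_p + \|g\|_p\big)\Big(\int (f+g)^p\,d\mu\Big)^{1/q};$$
dividing by the second factor (we may assume it is nonzero, otherwise there is nothing to prove) and using that $1 - \frac1q = \frac1p$, we obtain (7.11). There is only one small thing to check: the last step of the argument would not be correct if $\int (f+g)^p\,d\mu = \infty$. But by the convexity of $t \mapsto t^p$ ($t \ge 0$) we have
$$\Big(\frac{f+g}{2}\Big)^p \le \frac{f^p + g^p}{2},$$
and since the right hand side is integrable, so is the left hand side. □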
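Similarly, a small sketch (ours, with hypothetical helper names and random point masses) checks (7.11) on a finite measure space. It also illustrates the equality statements: for $1 < p < \infty$ the gap vanishes when $g = \lambda f$ with $\lambda \ge 0$, while for the endpoint $p = 1$ it vanishes for any pair of nonnegative functions, proportional or not.

```python
import random


def lp_norm(values, weights, p):
    """||f||_p on a finite measure space with point masses `weights`; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(v) for v, w in zip(values, weights) if w > 0)
    return sum(w * abs(v) ** p for v, w in zip(values, weights)) ** (1.0 / p)


def minkowski_gap(f, g, weights, p):
    """||f||_p + ||g||_p - ||f + g||_p; by (7.11) this is >= 0."""
    f_plus_g = [a + b for a, b in zip(f, g)]
    return lp_norm(f, weights, p) + lp_norm(g, weights, p) - lp_norm(f_plus_g, weights, p)


rng = random.Random(3)
weights = [rng.uniform(0.1, 1.0) for _ in range(30)]
f = [rng.uniform(-1.0, 1.0) for _ in weights]
g = [rng.uniform(-1.0, 1.0) for _ in weights]
for p in (1.0, 1.5, 2.0, 5.0, float("inf")):
    print(p, minkowski_gap(f, g, weights, p))

# Equality case for 1 < p < infinity: g = lambda f with lambda >= 0
print(minkowski_gap(f, [2.5 * v for v in f], weights, 3.0))
# Endpoint p = 1: equality for any nonnegative f, g, even if not proportional
f_pos, g_pos = [abs(v) for v in f], [abs(v) for v in g]
print(minkowski_gap(f_pos, g_pos, weights, 1.0))
```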

So far we have worked on arbitrary measure spaces. The following inequality uses that the underlying space has a vector space structure and that the measure is translation invariant. For simplicity we state it only for $\mathbb R^d$ with the Lebesgue measure.

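Theorem 7.6 (Young's inequality) Let $1 \le p, q, r \le \infty$ be three exponents satisfying
$$\frac1p + \frac1q + \frac1r = 2. \qquad (7.12)$$
Then for any $f \in L^p(\mathbb R^d)$, $g \in L^q(\mathbb R^d)$, $h \in L^r(\mathbb R^d)$ it holds that
$$\Big|\int_{\mathbb R^d}\int_{\mathbb R^d} f(x)\,g(x-y)\,h(y)\,dx\,dy\Big| \le \|f\|_p\|g\|_q\|h\|_r. \qquad (7.13)$$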

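Proof of Young's inequality. It is a clever application of Hölder's inequality. We can assume that $f, g, h \ge 0$. Let $p', q', r'$ be the dual exponents of $p, q, r$, i.e.
$$\frac1p + \frac1{p'} = \frac1q + \frac1{q'} = \frac1r + \frac1{r'} = 1, \qquad (7.14)$$
and note that (7.12) implies
$$\frac1{p'} + \frac1{q'} + \frac1{r'} = 1.$$
Define
$$\alpha(x,y) := f(x)^{p/r'}\,g(x-y)^{q/r'},$$
$$\beta(x,y) := g(x-y)^{q/p'}\,h(y)^{r/p'},$$
$$\gamma(x,y) := f(x)^{p/q'}\,h(y)^{r/q'},$$
and notice that the integral in Young's inequality is exactly
$$I = \int_{\mathbb R^d}\int_{\mathbb R^d} \alpha(x,y)\,\beta(x,y)\,\gamma(x,y)\,dx\,dy$$
by using (7.14): for instance, the total power of $f(x)$ is $p\big(\frac1{r'} + \frac1{q'}\big) = p\big(1 - \frac1{p'}\big) = 1$, and similarly for $g$ and $h$. Now we can use the generalized Hölder inequality (7.9) for three functions with exponents $r', p', q'$ on the measure space $(\mathbb R^d\times\mathbb R^d, dx\,dy)$ and conclude that
$$I \le \|\alpha\|_{r'}\|\beta\|_{p'}\|\gamma\|_{q'}.$$
These norms can all be computed (using Fubini's theorem and the translation invariance of $dx$), e.g.
$$\|\alpha\|_{r'} = \Big(\int_{\mathbb R^d}\int_{\mathbb R^d} f(x)^p\,g(x-y)^q\,dx\,dy\Big)^{1/r'} = \|f\|_p^{p/r'}\,\|g\|_q^{q/r'},$$
and similarly $\|\beta\|_{p'} = \|g\|_q^{q/p'}\|h\|_r^{r/p'}$ and $\|\gamma\|_{q'} = \|f\|_p^{p/q'}\|h\|_r^{r/q'}$. Multiplying these together, each of $\|f\|_p, \|g\|_q, \|h\|_r$ again appears with total power 1, and we arrive at (7.13). □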
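The proof above uses only Hölder's inequality, Fubini's theorem and the translation invariance of the measure, so the same argument works on $\mathbb Z$ with counting measure. The sketch below (our own discrete illustration, with hypothetical helper names; sequences are stored as dicts `{index: value}` with finite support) checks (7.13) in that setting for a few exponent triples satisfying (7.12).

```python
import random


def lp_norm(seq, p):
    """l^p norm of a finitely supported sequence {index: value} (counting measure on Z)."""
    return sum(abs(v) ** p for v in seq.values()) ** (1.0 / p)


def trilinear(f, g, h):
    """Discrete analogue of the double integral in (7.13): sum_x sum_y f(x) g(x - y) h(y)."""
    return sum(fx * g.get(x - y, 0.0) * hy for x, fx in f.items() for y, hy in h.items())


def random_seq(rng, radius):
    """Random real values on the indices -radius, ..., radius."""
    return {k: rng.uniform(-1.0, 1.0) for k in range(-radius, radius + 1)}


# exponent triples with 1/p + 1/q + 1/r = 2, as in (7.12)
TRIPLES = [(1.5, 1.5, 1.5), (1.0, 2.0, 2.0), (1.2, 1.5, 2.0), (4 / 3, 4 / 3, 2.0)]

rng = random.Random(2)
for p, q, r in TRIPLES:
    f, g, h = random_seq(rng, 10), random_seq(rng, 15), random_seq(rng, 8)
    lhs = abs(trilinear(f, g, h))
    rhs = lp_norm(f, p) * lp_norm(g, q) * lp_norm(h, r)
    print((p, q, r), lhs, rhs, lhs <= rhs)
```

On $\mathbb Z$ the constant 1 cannot be improved: with $p = 1$, $q = r = 2$, $f = \delta_0$ and $h(y) = g(-y)$ both sides equal $\|g\|_2^2$.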

One important application of Young's inequality is the honest definition of the convolution. Recall the definition:

Definition 7.7 The convolution (Faltung) of two functions $f, g$ on $\mathbb R^d$ is given by
$$(f\star g)(x) := \int_{\mathbb R^d} f(y)\,g(x-y)\,dy.$$

It is a nontrivial question whether the integral in this definition makes sense and, if it does, in which sense (for all $x$, or maybe only for almost all $x$?). If $f, g$ are "nice" functions (e.g. bounded and sufficiently decaying at infinity), then it is easy to see that the convolution integral always exists; moreover, by a change of variables,
$$f\star g = g\star f.$$
If, however, $f, g$ are just in some Lebesgue spaces, then the integral may not exist. It is exactly Young's inequality that tells us under which conditions on the exponents one can define the convolution on Lebesgue spaces.

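Theorem 7.8 Let $1 \le \frac1p + \frac1q \le 2$ and let $f \in L^p(\mathbb R^d)$, $g \in L^q(\mathbb R^d)$. Then $f\star g$ is a function in $L^{r'}$ with
$$\|f\star g\|_{r'} \le \|f\|_p\|g\|_q, \qquad (7.15)$$
where $r'$ is the dual exponent to $r$ from Young's inequality, i.e.
$$1 + \frac1{r'} = \frac1p + \frac1q.$$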

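Proof of the special case $q = 1$. In this case $r' = p$, so $r$ is the dual exponent of $p$, and we want to show that
$$\|f\star g\|_p^p = \int\Big|\int f(y)\,g(x-y)\,dy\Big|^p\,dx \qquad (7.16)$$
is finite. It is clearly enough to assume that $f, g \ge 0$ (see the remark below). Write
$$f(y)\,g(x-y) = \big[f(y)\,g(x-y)^{1/p}\big]\cdot g(x-y)^{1-\frac1p} = \big[f(y)\,g(x-y)^{1/p}\big]\cdot g(x-y)^{1/r}$$
and use Hölder's inequality for the inner integral (with $p, r$ as exponents):
$$\|f\star g\|_p^p \le \int\Big(\int f(y)^p\,g(x-y)\,dy\Big)\Big(\int g(x-y)\,dy\Big)^{p/r}\,dx = \|g\|_1^{p/r}\int\!\!\int f(y)^p\,g(x-y)\,dx\,dy,$$
and the last expression equals
$$\|f\|_p^p\,\|g\|_1^{p/r+1} = \big(\|f\|_p\|g\|_1\big)^p$$
(since $p/r + 1 = p$), which proves the claim in the special case $q = 1$. □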
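In the same discrete setting ($\mathbb Z$ with counting measure, finitely supported sequences stored as dicts; our own illustration with hypothetical helper names), the following sketch computes the convolution of Definition 7.7 as a sum, checks $f\star g = g\star f$ up to rounding, and compares $\|f\star g\|_p$ with $\|f\|_p\|g\|_1$, the special case $q = 1$ just proved.

```python
import random


def convolve(f, g):
    """(f * g)(x) = sum_y f(y) g(x - y) for finitely supported sequences {index: value} on Z."""
    out = {}
    for y, fy in f.items():
        for z, gz in g.items():          # z = x - y, i.e. x = y + z
            out[y + z] = out.get(y + z, 0.0) + fy * gz
    return out


def lp_norm(seq, p):
    """l^p norm with respect to counting measure."""
    return sum(abs(v) ** p for v in seq.values()) ** (1.0 / p)


rng = random.Random(4)
f = {k: rng.uniform(-1.0, 1.0) for k in range(-12, 13)}
g = {k: rng.uniform(-1.0, 1.0) for k in range(0, 7)}
fg, gf = convolve(f, g), convolve(g, f)
print(max(abs(fg[x] - gf[x]) for x in fg))          # commutativity, up to rounding
for p in (1.0, 1.5, 2.0, 3.0):
    print(p, lp_norm(fg, p), lp_norm(f, p) * lp_norm(g, 1.0))
```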

There are two related general remarks:

(1) Note that Fubini's theorem has been used, but for non-negative functions this is justified without any further assumptions (Tonelli's theorem).

(2) You may not like that, before having proved that $f\star g$ is actually in $L^p$ or even that it exists, we already computed its $L^p$ norm. However, none of these steps actually requires any of these integrals to be finite: this is a big advantage of Lebesgue integrals of nonnegative functions. Recall that, for example, Hölder's inequality was stated for any two nonnegative functions. To convince you that there is nothing fishy here, I show the fully rigorous argument once; later, similar arguments will not be spelled out.

We first consider nonnegative $f, g$; for these functions every step is well justified, even if some of the above integrals are infinite. A posteriori, we obtain from $\|f\|_p\|g\|_1 < \infty$ that every integral is finite. This does not mean that
$$\int f(y)\,g(x-y)\,dy$$
is finite for every $x$, but it means that this is an $L^p$ function of $x$ (in particular, it is finite for almost all $x$).

Now for arbitrary $f$ and $g$ we want to prove that
$$\int f(y)\,g(x-y)\,dy \qquad (7.17)$$
defines an $L^p$ function; in particular, that this integral is meaningful for almost all $x$. But this integral is clearly dominated pointwise (in $x$) by the integral
$$\int |f(y)|\,|g(x-y)|\,dy, \qquad (7.18)$$
and we know that the latter is in $L^p$ by the argument above for nonnegative $f, g$. In particular, for almost all $x$ the function
$$y \mapsto |f(y)|\,|g(x-y)|$$
is integrable, thus for almost all $x$ the function
$$y \mapsto f(y)\,g(x-y)$$
is in $L^1$. Therefore the integral (7.17) is meaningful for almost all $x$, and to check that it is in $L^p$ as a function of $x$, it is enough to show that it has a nonnegative majorant in $L^p$. But clearly (7.18) dominates (7.17), and it is in $L^p$.

Remark on the proof of Theorem 7.8 in the general case. The proof in the general case requires knowing that the dual space of $L^r$ is $L^{r'}$ for $1 \le r < \infty$ (which is proved in functional analysis; the remaining case $r' = 1$, i.e. $p = q = 1$, follows directly from Fubini's theorem). Then $f\star g$ is identified through its integral against any function $h \in L^r$, i.e. by
$$\int (f\star g)(y)\,h(y)\,dy,$$
which (after the reflection $g(u) \to g(-u)$, which does not change $\|g\|_q$) is exactly the double integral in Young's inequality. Young's inequality tells us that this double integral makes sense for any $h \in L^r$; moreover, it is a bounded linear functional on $L^r$. Therefore $f\star g$ can be identified with an element of $L^{r'}$, and the norm of the functional is bounded by $\|f\|_p\|g\|_q$.

Finally, we mention a stronger version of Young's inequality. Suppose that the function $g(x-y)$ in (7.13) is given by $g(x-y) = |x-y|^{-\lambda}$ with some $\lambda > 0$. Since the power function is not contained in any $L^q$ space, Young's inequality cannot be applied (or rather, (7.13) is a useless statement, since its right hand side is infinite). Nevertheless, the following strengthening holds:

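Theorem 7.9 (Hardy-Littlewood-Sobolev (HLS)) Let $p, r > 1$ and $0 < \lambda < d$ be such that
$$\frac1p + \frac\lambda d + \frac1r = 2. \qquad (7.19)$$
Suppose that $f \in L^p(\mathbb R^d)$ and $h \in L^r(\mathbb R^d)$. Then there exists a constant $C$, depending on $d, \lambda, p$, such that
$$\Big|\int_{\mathbb R^d}\int_{\mathbb R^d}\frac{f(x)\,h(y)}{|x-y|^\lambda}\,dx\,dy\Big| \le C\|f\|_p\|h\|_r. \qquad (7.20)$$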

Remark 1. Notice that the condition $\lambda < d$ is necessary, otherwise the local singularity at $x \sim y$ would not be integrable.

Remark 2. Comparing (7.19) with (7.12), we see that $q$ corresponds to $d/\lambda$. The function $g(u) = |u|^{-\lambda}$ is not in $L^q(\mathbb R^d)$ for any $q$, since the integral
$$\|g\|_q^q = \int_{\mathbb R^d}\big(|u|^{-\lambda}\big)^q\,du$$
always diverges, but it is "closest" to being finite if $q = d/\lambda$, since in this case both singularities (at $u \approx 0$ and at $u \approx \infty$) are only logarithmically divergent. HLS states that this logarithmic divergence, which makes Young's inequality useless here, can be neglected.
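The divergence claimed in Remark 2 can be made explicit in polar coordinates (a short computation of ours; $|S^{d-1}|$ denotes the surface area of the unit sphere in $\mathbb R^d$):

```latex
\int_{\mathbb R^d} |u|^{-\lambda q}\,du
  = |S^{d-1}| \left( \int_0^1 r^{\,d-1-\lambda q}\,dr + \int_1^\infty r^{\,d-1-\lambda q}\,dr \right),
\qquad
\int_0^1 r^{\,d-1-\lambda q}\,dr < \infty \iff \lambda q < d,
\qquad
\int_1^\infty r^{\,d-1-\lambda q}\,dr < \infty \iff \lambda q > d .
```

At $q = d/\lambda$ both integrands equal $1/r$, so each half diverges only logarithmically, while for any other $q$ one of the two halves diverges like a power.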

The proof of the HLS inequality is somewhat more involved than that of Young's inequality, and we will not present it here. Interested students can read the proof in Lieb-Loss, Theorem 4.3.

We close this section by showing a simple physical application of HLS. Consider a continuous charge distribution $\rho(x)$ in $d = 3$ dimensions. Its self-energy is given by
$$SE(\rho) := \int_{\mathbb R^3}\int_{\mathbb R^3}\frac{\rho(x)\,\rho(y)}{|x-y|}\,dx\,dy.$$
Obviously, if the charge distribution has too strong a local singularity, then the self-energy is infinite. The question is: how strong a singularity can be allowed? For example, if
$$\rho(x) := \frac{\mathbf 1(|x| \le 1)}{|x|^\beta} \qquad (7.21)$$
for some $\beta > 0$ (the localization ensures that there is no divergence at infinity), then
$$SE(\rho) = \int_{|x|\le 1}\int_{|y|\le 1}\frac{1}{|x|^\beta\,|x-y|\,|y|^\beta}\,dx\,dy.$$

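How do we determine the threshold for $\beta$? Here is a back-of-the-envelope calculation. Consider the regime where $|x| \sim 2^{-k}$ and $|y| \sim 2^{-k}$, more precisely the following part of the above integral:
$$SE_k := \int_{2^{-k-1}\le|x|\le 2^{-k}}\int_{2^{-k-1}\le|y|\le 2^{-k}}\frac{1}{|x|^\beta\,|x-y|\,|y|^\beta}\,dx\,dy.$$
In this integration domain the integrand behaves as
$$\frac{1}{|x|^\beta\,|x-y|\,|y|^\beta} \gtrsim (2^k)^\beta\cdot 2^k\cdot(2^k)^\beta = (2^k)^{2\beta+1}$$
(using $|x-y| \le |x| + |y| \le 2\cdot 2^{-k}$). The volume of integration is of order $(2^{-k})^6$. Thus
$$SE(\rho) \ge \sum_{k=1}^\infty SE_k \gtrsim \sum_{k=1}^\infty (2^k)^{2\beta+1}\cdot(2^{-k})^6 = \sum_{k=1}^\infty (2^k)^{2\beta-5}.$$
This sum diverges if $\beta \ge 5/2$. Although the sum converges if $\beta < 5/2$, this does not immediately prove that $SE(\rho)$ is finite: we only estimated the integrand from below, and we left out all the integration regimes where $|x| \sim 2^{-k}$ and $|y| \sim 2^{-\ell}$ with $k \neq \ell$.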

Homework 7.10 Prove that $SE(\rho) < \infty$ if $\rho$ is given by (7.21) with $\beta < 5/2$.

Even with this homework at hand, we do not have a general condition on the singularity of $\rho$ ensuring the finiteness of $SE(\rho)$, since we considered only $\rho$ of a special form. However, the HLS inequality immediately gives the answer:
$$SE(\rho) \le C\|\rho\|_{6/5}^2$$
(check the exponent!). In other words, $\rho \in L^{6/5}$ guarantees that the self-energy is finite. Note that the threshold $\beta = 5/2$ found above is exactly the threshold where the local singularity $|u|^{-\beta}$ becomes $L^{6/5}$-divergent.
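Here is one way to carry out the suggested exponent check (our own worked computation): apply (7.19) and (7.20) with $d = 3$, $\lambda = 1$ and $f = h = \rho$, so that $p = r$, and then test the model density (7.21) near the origin.

```latex
\frac{1}{p} + \frac{1}{3} + \frac{1}{p} = 2
\;\Longrightarrow\; \frac{2}{p} = \frac{5}{3}
\;\Longrightarrow\; p = \frac{6}{5},
\qquad
|SE(\rho)| \le C\,\|\rho\|_{6/5}\,\|\rho\|_{6/5} = C\,\|\rho\|_{6/5}^2 ,

\int_{|x|\le 1} \big(|x|^{-\beta}\big)^{6/5}\,dx
  = 4\pi \int_0^1 r^{\,2 - 6\beta/5}\,dr < \infty
\iff 2 - \tfrac{6\beta}{5} > -1
\iff \beta < \tfrac{5}{2} .
```

Note that $p = 6/5 > 1$ and $0 < \lambda = 1 < 3$, so the hypotheses of Theorem 7.9 are satisfied.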

8 Interpolations: Riesz-Thorin, Hausdorff-Young

Consider a linear operator $T$ that maps functions on a measure space $(\Omega, \mu)$ to functions on another one, $(\Omega', \mu')$. We say that $T$ is bounded from $L^p$ to $L^q$ if its norm
$$\|T\|_{p\to q} = \|T\|_{L^p(\Omega,\mu)\to L^q(\Omega',\mu')} := \sup\big\{\|Tf\|_{L^q(\Omega',\mu')} : \|f\|_{L^p(\Omega,\mu)} = 1\big\},$$
viewed as a map from $L^p(\Omega,\mu)$ to $L^q(\Omega',\mu')$, is finite.

The following theorem is an important tool to extend linear operators from one $L^p$ space to another (over the same measure space). We will not give its proof (it can be found, e.g., in Reed-Simon, Appendix to IX.4), which is a beautiful application of the three-line (or three-circle) theorem in complex analysis.

Theorem 8.1 (Riesz-Thorin) Suppose we have a linear operator $T$ mapping functions on one measure space $(\Omega, \mu)$ to functions on another, $(\Omega', \mu')$. Suppose that there are exponents $1 \le p_0, q_0, p_1, q_1 \le \infty$ (not necessarily conjugate ones) such that $T$ is a linear transformation from $L^{p_0}(\Omega,\mu)\cap L^{p_1}(\Omega,\mu)$ to $L^{q_0}(\Omega',\mu')\cap L^{q_1}(\Omega',\mu')$, and it is bounded from $L^{p_0}$ to $L^{q_0}$ and also bounded from $L^{p_1}$ to $L^{q_1}$, i.e.
$$\|T\|_{p_0\to q_0} < \infty, \qquad \|T\|_{p_1\to q_1} < \infty.$$
For any $0 \le t \le 1$ define a new pair of exponents $p_t, q_t$ by
$$\frac1{p_t} = \frac{1-t}{p_0} + \frac{t}{p_1}, \qquad \frac1{q_t} = \frac{1-t}{q_0} + \frac{t}{q_1}.$$
Then $T$ is bounded from $L^{p_t}(\Omega,\mu)$ to $L^{q_t}(\Omega',\mu')$, with norm satisfying
$$\|T\|_{p_t\to q_t} \le \|T\|_{p_0\to q_0}^{1-t}\,\|T\|_{p_1\to q_1}^{t}.$$

For those who are interested, here is the three-circle theorem (the proof can be found in any decent complex analysis book).

Theorem 8.2 (Hadamard) Let $f$ be an analytic (holomorphic) function on an open neighborhood of the annulus $A(r_1, r_3) := \{z : r_1 \le |z| \le r_3\}$, where $0 < r_1 < r_3$. Define
$$M(r) := \max\{|f(z)| : |z| = r\}$$
to be the maximum of $|f|$ on the circle of radius $r$. Then for any $r_2$ with $r_1 < r_2 < r_3$ we have
$$M(r_2) \le \big[M(r_1)\big]^{\frac{\log(r_3/r_2)}{\log(r_3/r_1)}}\,\big[M(r_3)\big]^{\frac{\log(r_2/r_1)}{\log(r_3/r_1)}},$$
or, in logarithmic form,
$$\log M(r_2) \le \frac{\log(r_3/r_2)}{\log(r_3/r_1)}\log M(r_1) + \frac{\log(r_2/r_1)}{\log(r_3/r_1)}\log M(r_3);$$
in other words, $\log M(r)$ is a convex function of $\log r$.
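A numerical illustration of Theorem 8.2 (our own sketch; the polynomial is a hypothetical example). For a polynomial with nonnegative coefficients, $|f(z)| \le f(|z|)$ with equality at $z = r$, so sampling the circle with the angle 0 included recovers $M(r)$ exactly; for a general $f$, sampling would only give a lower bound on $M(r)$. The function `three_circle_gap` returns the right hand side minus the left hand side of the logarithmic form, which should be nonnegative.

```python
import cmath
import math

# Hypothetical test function: f(z) = 1 + 2z + z^3, entire, with nonnegative coefficients.
COEFFS = [1.0, 2.0, 0.0, 1.0]


def f(z):
    return sum(c * z ** k for k, c in enumerate(COEFFS))


def max_modulus(r, samples=720):
    """Approximate M(r) = max_{|z| = r} |f(z)| by sampling the circle (angle 0 included)."""
    return max(abs(f(cmath.rect(r, 2 * math.pi * k / samples))) for k in range(samples))


def three_circle_gap(r1, r2, r3):
    """RHS minus LHS of the logarithmic form of Hadamard's inequality; should be >= 0."""
    theta = math.log(r3 / r2) / math.log(r3 / r1)
    rhs = theta * math.log(max_modulus(r1)) + (1.0 - theta) * math.log(max_modulus(r3))
    return rhs - math.log(max_modulus(r2))


for r1, r2, r3 in [(0.5, 1.0, 2.0), (0.1, 0.3, 3.0), (1.0, 1.5, 10.0)]:
    print((r1, r2, r3), three_circle_gap(r1, r2, r3))
```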

We show two applications of Riesz-Thorin. First, we give a second proof of Theorem 7.8. Suppose first that $p^{-1} + q^{-1} = 1$, $f \in L^p$, $g \in L^q$. Then by Hölder's inequality we have
$$\|f\star g\|_\infty \le \sup_x\int |f(y)|\,|g(x-y)|\,dy \le \|f\|_p\|g\|_q, \qquad (8.22)$$
so Theorem 7.8 holds for $r = 1$ (i.e. $r' = \infty$). Next, suppose that $f, g \in L^1$, i.e. $p = q = 1$. Then trivially
$$\|f\star g\|_1 \le \int\!\!\int |f(y)|\,|g(x-y)|\,dy\,dx = \|f\|_1\|g\|_1$$
(by Fubini's theorem, which is applicable for nonnegative functions), i.e. Theorem 7.8 holds for $r = \infty$ (i.e. $r' = 1$).

Now we fix $f \in L^1$ and consider the map
$$T_f\,g := f\star g.$$
Clearly $T_f$ is a bounded map from $L^1$ to $L^1$ (with norm at most $\|f\|_1$), and it is also a bounded map from $L^\infty$ to $L^\infty$ (with norm at most $\|f\|_1$). Using Riesz-Thorin, we see that $T_f$ is also a bounded map from $L^p$ to $L^p$ with norm at most $\|f\|_1$, i.e.
$$\|T_f\,g\|_p = \|f\star g\|_p \le \|f\|_1\|g\|_p, \qquad 1 \le p \le \infty. \qquad (8.23)$$

Let again $p^{-1} + q^{-1} = 1$. Now we fix $g \in L^p$. Then the map $T_g$ (defined by $T_g f = g\star f = f\star g$) satisfies
$$T_g : L^1 \to L^p, \qquad \|T_g\|_{1\to p} \le \|g\|_p,$$
$$T_g : L^q \to L^\infty, \qquad \|T_g\|_{q\to\infty} \le \|g\|_p$$
(the first relation is from (8.23), the second from (8.22) after interchanging $p$ and $q$). Now we use Riesz-Thorin again to interpolate between the $L^1$ and $L^q$ spaces and conclude that $T_g : L^r \to L^s$ is also bounded with norm at most $\|g\|_p$, where $r^{-1} = 1 - t/p$ and $s^{-1} = (1-t)/p$ (here we used that $p$ and $q$ are dual). After eliminating $t$, we find the relation $p^{-1} + r^{-1} = s^{-1} + 1$ among the exponents $p, r, s$; as $p$ ranges over $[1, \infty]$ and $t$ over $[0, 1]$, this covers all exponents $p, r$ with $1 \le p^{-1} + r^{-1} \le 2$. Thus we proved that
$$\|T_g f\|_s = \|f\star g\|_s \le \|f\|_r\|g\|_p,$$
which is (7.15) after renaming the exponents. □
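The elimination of $t$ can be written out as follows (our own expansion of the step). In Theorem 8.1 the domain exponents are $1$ and $q$, the target exponents are $p$ and $\infty$, and $1/q = 1 - 1/p$:

```latex
\frac{1}{r} = \frac{1-t}{1} + \frac{t}{q} = 1 - t + t\Big(1 - \frac1p\Big) = 1 - \frac{t}{p},
\qquad
\frac{1}{s} = \frac{1-t}{p} + \frac{t}{\infty} = \frac{1-t}{p},

t = p\Big(1 - \frac1r\Big)
\;\Longrightarrow\;
\frac1s = \frac1p - \Big(1 - \frac1r\Big)
\;\Longrightarrow\;
\frac1p + \frac1r = \frac1s + 1 .
```

The condition $t \le 1$ is exactly $1 - \frac1r \le \frac1p$, i.e. $\frac1p + \frac1r \ge 1$.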

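Another application is the following basic result on extending the Fourier transform to $L^p$:

Theorem 8.3 (Hausdorff-Young) Suppose that $1 \le p \le 2$ and let $q$ be its dual exponent. The Fourier transform $\mathcal F$ on $\mathbb R^d$, initially defined on $L^1 \cap L^2$, extends to a bounded map from $L^p$ to $L^q$, i.e.
$$\|\hat f\|_q \le \|f\|_p.$$

Proof. Check directly that $\|\hat f\|_\infty \le \|f\|_1$ and $\|\hat f\|_2 = \|f\|_2$ (Parseval), then apply Riesz-Thorin. (Both estimates hold with constant 1 for the normalization $\hat f(k) = \int e^{-2\pi i k\cdot x} f(x)\,dx$; other normalizations introduce constants.) □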
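For completeness, here is the exponent bookkeeping behind this proof (our own expansion), with $p_0 = 1$, $q_0 = \infty$, $p_1 = q_1 = 2$ in Theorem 8.1, assuming a normalization of the Fourier transform for which both endpoint norms are at most 1 (e.g. $\hat f(k) = \int e^{-2\pi i k\cdot x} f(x)\,dx$):

```latex
\frac{1}{p_t} = \frac{1-t}{1} + \frac{t}{2} = 1 - \frac{t}{2},
\qquad
\frac{1}{q_t} = \frac{1-t}{\infty} + \frac{t}{2} = \frac{t}{2},
\qquad
\frac{1}{p_t} + \frac{1}{q_t} = 1,

\|\mathcal F\|_{p_t \to q_t}
  \le \|\mathcal F\|_{1\to\infty}^{1-t}\,\|\mathcal F\|_{2\to 2}^{t}
  \le 1^{1-t}\cdot 1^{t} = 1,
\qquad
0 \le t \le 1 \;\Longleftrightarrow\; 1 \le p_t \le 2 .
```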

9 Approximation by C∞0 functions

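The goal is to prove the following basic approximation theorem. Recall that for any open domain $\Omega \subset \mathbb R^d$ we denote by $C_0^\infty(\Omega)$ the set of compactly supported, smooth (= infinitely differentiable) functions:
$$C_0^\infty(\Omega) := \big\{f : \Omega \to \mathbb C \;:\; \mathrm{supp}(f) \subset \Omega \text{ is compact},\ \partial_1^{\alpha_1}\partial_2^{\alpha_2}\cdots\partial_d^{\alpha_d} f(x) \text{ exists } \forall x \in \Omega,\ \forall \alpha_j \in \mathbb N\big\}.$$
(Some books use the notation $C_c^\infty(\Omega)$.)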

WARNING: Recall the precise definition of the support (Träger) of a continuous function:
$$\mathrm{supp}(f) := \overline{\{x \in \mathbb R^d : f(x) \neq 0\}},$$
i.e. it is the closure of the set of all points where $f$ does not vanish. In particular, since $\Omega$ is open, a function with compact support in $\Omega$ must vanish in a neighborhood of the boundary.

Theorem 9.1 Let $\Omega \subset \mathbb R^d$ be a non-empty open set and let $1 \le p < \infty$. Then $C_0^\infty(\Omega)$ is dense in the space $L^p(\Omega, dx)$ equipped with the $L^p$ norm.

In particular, this theorem implies that $C[0,1]$ is dense in $L^p[0,1]$ with respect to the $L^p$ norm for any $p < \infty$. Note that, with the supremum (or $L^\infty$) norm, $(C[0,1], \|\cdot\|_\infty)$ is not dense in $(L^\infty[0,1], \|\cdot\|_\infty)$: both spaces are complete, so $C[0,1]$ is a closed subspace, and the two spaces are obviously not equal. Summarizing the conclusions of the Riesz-Fischer theorem and Theorem 9.1, we obtain

Corollary 9.2 Let $1 \le p < \infty$ and $\Omega \subset \mathbb R^d$ be open. Then the completion of $C_0^\infty(\Omega)$ equipped with the $L^p$ norm is $L^p(\Omega)$.

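Homework 9.3 Let $\Omega \subset \mathbb R^d$ be a bounded open set. Show that the completion of $C_0^\infty(\Omega)$ equipped with the supremum norm is the space $\{f \in C(\overline\Omega) : f|_{\partial\Omega} \equiv 0\}$ (where $\partial\Omega$ denotes the boundary of $\Omega$).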

Proof of Theorem 9.1. We will show the proof for $\Omega = \mathbb R^d$; the general case will be homework.

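Choose an arbitrary function $j \in L^1(\mathbb R^d)$ with $\int j = 1$. Define
$$j_\varepsilon(x) := \varepsilon^{-d}\,j\Big(\frac{x}{\varepsilon}\Big).$$
Note that
$$\int j_\varepsilon = 1, \qquad \|j\|_1 = \|j_\varepsilon\|_1 \qquad (9.24)$$
(this is how the normalization was chosen), and as $\varepsilon \to 0$, the function $j_\varepsilon$ becomes more and more concentrated and peaked around the origin.

Let $f \in L^p$, $1 \le p < \infty$, and define
$$f_\varepsilon(x) := (f\star j_\varepsilon)(x) = \int f(y)\,j_\varepsilon(x-y)\,dy.$$
According to Theorem 7.8, $f_\varepsilon$ is an $L^p$ function and
$$\|f_\varepsilon\|_p \le \|f\|_p\|j\|_1 \qquad (9.25)$$
(we used $\|j\|_1 = \|j_\varepsilon\|_1$, and only the special case $q = 1$ of Theorem 7.8 that we proved). Since $j_\varepsilon$ is very strongly concentrated around 0 with total integral 1, we expect that $f_\varepsilon$ is close to $f$. This is the content of the following

Proposition 9.4 Assuming $f \in L^p$, $1 \le p < \infty$, we have
$$\lim_{\varepsilon\to 0}\|f - f_\varepsilon\|_p = 0.$$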

Proof of Proposition 9.4. The proof consists of several standard steps. We go through them because similar arguments are used very often in analysis and are usually not explained in detail; they are typically dismissed as "by standard approximation arguments", and it is assumed that everybody has gone through such a proof at least once.

Step 1. We show that it is sufficient to prove the Proposition if $j$ has compact support. For any sufficiently large $R$ we define
$$j^R(x) := C_R\,\chi(|x| < R)\,j(x)$$
(here $R$ is not a power but an upper index), where $\chi(|x| < R)$ is the characteristic function of the ball $|x| < R$ and $C_R$ is the normalization
$$C_R := \Big(\int_{|x|<R} j(x)\,dx\Big)^{-1}$$
ensuring that $\int j^R = 1$. Obviously, $C_R \to 1$ as $R \to \infty$. As before, we define
$$j^R_\varepsilon(x) := \varepsilon^{-d}\,j^R(x/\varepsilon).$$
Then, using $j - j^R = [(1-\chi) + (1 - C_R)\chi]\,j$, we have
$$\|j_\varepsilon - j^R_\varepsilon\|_1 = \|j - j^R\|_1 \le \int_{|x|\ge R}|j| + |C_R - 1|\int_{|x|<R}|j| \to 0$$
as $R \to \infty$, uniformly in $\varepsilon$. Therefore, by inequality (9.25) (which is basically a special case of Young's inequality, but in fact can be proved from Hölder's inequality alone, see the proof of Theorem 7.8), we have
$$\|j_\varepsilon\star f - j^R_\varepsilon\star f\|_p \le \|f\|_p\,\|j_\varepsilon - j^R_\varepsilon\|_1 \to 0$$
uniformly in $\varepsilon$ as $R \to \infty$. This shows that one can replace $j$ with a compactly supported version $j^R$, and the error can be made arbitrarily small.

This technique is called cutoff at infinity.

Step 2. With an almost identical (actually somewhat easier) cutoff argument, it is sufficient to show the Proposition for compactly supported $f$ (HOMEWORK: think it over).

Step 3. Now we show that it is sufficient to prove the Proposition for bounded $f$. We again use a cutoff argument, but now not in the domain ($x$-space) but in the range. For a sufficiently large positive $h$ we define
$$f_h(x) := f(x)\,\chi\{x : |f(x)| \le h\}.$$
Again, by (9.25) and (9.24), we have
$$\|j_\varepsilon\star f - j_\varepsilon\star f_h\|_p \le \|j\|_1\|f - f_h\|_p,$$
and $\|f - f_h\|_p \to 0$ as $h \to \infty$ by dominated convergence. The estimate is again uniform in $\varepsilon$.

Step 4. Now we show that it is sufficient to prove the Proposition for $p = 1$. Indeed, for any $1 < p < \infty$ we have
$$\|j_\varepsilon\star f - f\|_p^p = \int\Big|\int j_\varepsilon(x-y)\,f(y)\,dy - f(x)\Big|^p\,dx.$$
We can estimate
$$\Big|\int j_\varepsilon(x-y)\,f(y)\,dy - f(x)\Big|^{p-1} \le C\|f\|_\infty^{p-1}, \qquad C := (\|j\|_1 + 1)^{p-1},$$
thus
$$\|j_\varepsilon\star f - f\|_p^p \le C\|f\|_\infty^{p-1}\int\Big|\int j_\varepsilon(x-y)\,f(y)\,dy - f(x)\Big|\,dx = C\|f\|_\infty^{p-1}\,\|j_\varepsilon\star f - f\|_1.$$
Thus it is sufficient to show $\|j_\varepsilon\star f - f\|_1 \to 0$ as $\varepsilon \to 0$. One should also check that the condition $f \in L^p$ can be replaced by $f \in L^1$, but we have already assumed that $f$ is compactly supported and bounded, so it is in every $L^p$ space.

Step 5. It is sufficient to prove the Proposition for simple functions of the form
$$f = \sum_i c_i\,\chi_{R_i},$$
where the sum is finite, $c_i \in \mathbb C$ and the $R_i$ are rectangles. To see this, recall that the set of simple functions of this form is dense in $L^1$; in other words, any $L^1$ function can be approximated by them in $L^1$ sense. (This fact follows from the construction of the Lebesgue integral, the regularity of the Lebesgue measure, and the fact that any open set in $\mathbb R^d$ can be approximated by rectangles. THINK IT OVER!)

For any given $f \in L^1$, let $f_n$ be a sequence of simple functions such that $f_n \to f$ in $L^1$. Suppose that the Proposition is proven for every $f_n$. Then
$$\|j_\varepsilon\star f - f\|_1 \le \|j_\varepsilon\star(f - f_n)\|_1 + \|j_\varepsilon\star f_n - f_n\|_1 + \|f_n - f\|_1 \le (\|j\|_1 + 1)\|f_n - f\|_1 + \|j_\varepsilon\star f_n - f_n\|_1.$$
For any given $\eta > 0$, the first term can be made smaller than $\eta/2$ by choosing $n$ sufficiently large, and this choice is uniform in $\varepsilon$. Having chosen such an $n$, we fix it and choose $\varepsilon$ so small that the second term becomes smaller than $\eta/2$. Thus $\|j_\varepsilon\star f - f\|_1$ can be made smaller than any given $\eta$ if $\varepsilon$ is sufficiently small, and this proves Step 5.

Step 6. By linearity of the convolution and the triangle inequality for the norm, it is sufficient to prove the Proposition for $f = \chi_R$, i.e. for the characteristic function of a single rectangle.

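Step 7. By an explicit calculation:
$$\|j_\varepsilon\star\chi_R - \chi_R\|_1 = \int\Big|\int j_\varepsilon(x-y)\,\chi_R(y)\,dy - \chi_R(x)\Big|\,dx = \int\Big|\int j_\varepsilon(x-y)\big(\chi_R(y) - \chi_R(x)\big)\,dy\Big|\,dx.$$
Notice the trick of bringing the second term $\chi_R(x)$ inside the integration by using that $\int j_\varepsilon = 1$.

By Step 1 we may assume that $j$ is supported in a ball of radius $\ell$. Then the integrand $j_\varepsilon(x-y)(\chi_R(y) - \chi_R(x))$ is zero unless $\mathrm{dist}(x, \partial R) \le \varepsilon\ell$, where $\partial R$ is the boundary of $R$. This is because the first factor is zero whenever $|x - y| > \varepsilon\ell$, and the second factor is nonzero only if exactly one of the two points $x, y$ lies in $R$ (and then the segment between them meets $\partial R$). Therefore
$$\|j_\varepsilon\star\chi_R - \chi_R\|_1 = \int_{\mathrm{dist}(x,\partial R)\le\varepsilon\ell}\Big|\int j_\varepsilon(x-y)\big(\chi_R(y) - \chi_R(x)\big)\,dy\Big|\,dx$$
$$\le 2\|j_\varepsilon\|_1\int_{\mathrm{dist}(x,\partial R)\le\varepsilon\ell} 1\,dx = 2\|j\|_1\,\mathrm{vol}\{x : \mathrm{dist}(x,\partial R) \le \varepsilon\ell\} \to 0$$
as $\varepsilon \to 0$, since the volume of an $\varepsilon\ell$-neighborhood of the boundary of a fixed rectangle $R$ is of order $\varepsilon$ (here $\ell$ is fixed). This completes the proof of Proposition 9.4. □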
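A one-dimensional numerical sketch of Proposition 9.4 and of Step 7 (our own illustration; all choices are assumptions): we take $d = 1$, $R = [0, 1]$ and the tent kernel $j(u) = \max(0, 1 - |u|)$, which is integrable with $\int j = 1$ and support radius $\ell = 1$, but is not smooth (the Proposition does not require smoothness). Its distribution function $\Phi$ gives $j_\varepsilon\star\chi_R$ in closed form, and the $L^1$ error is then integrated by the midpoint rule. For $\varepsilon \le 1/2$ a short calculation gives exactly $2\varepsilon/3$, linear in $\varepsilon$ as predicted by Step 7. The helper names are hypothetical.

```python
# Assumption: tent kernel j(u) = max(0, 1 - |u|) on R (d = 1), integral 1, support [-1, 1].

def tent_cdf(s):
    """Phi(s) = integral of j over (-inf, s] for the tent kernel."""
    if s <= -1.0:
        return 0.0
    if s <= 0.0:
        return 0.5 * (1.0 + s) ** 2
    if s <= 1.0:
        return 1.0 - 0.5 * (1.0 - s) ** 2
    return 1.0


def mollified_indicator(x, eps, a=0.0, b=1.0):
    """(j_eps * chi_[a,b])(x) = Phi((x - a)/eps) - Phi((x - b)/eps), in closed form."""
    return tent_cdf((x - a) / eps) - tent_cdf((x - b) / eps)


def l1_error(eps, lo=-1.0, hi=2.0, cells=60_000):
    """Midpoint-rule approximation of ||j_eps * chi - chi||_1 for chi = chi_[0,1], eps <= 1."""
    h = (hi - lo) / cells
    total = 0.0
    for i in range(cells):
        x = lo + (i + 0.5) * h
        chi = 1.0 if 0.0 <= x <= 1.0 else 0.0
        total += abs(mollified_indicator(x, eps) - chi)
    return total * h


for eps in (0.2, 0.1, 0.05, 0.025):
    # for eps <= 1/2 the exact value is 2 * eps / 3
    print(eps, l1_error(eps), 2 * eps / 3)
```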


From Proposition 9.4 our Theorem 9.1 easily follows. Simply consider a smooth, compactly supported function $j \in L^1$ with $\int j = 1$. If $f$ has compact support, then so does
$$f_\varepsilon(x) = \int j_\varepsilon(x-y)\,f(y)\,dy,$$
and by an argument similar to Step 2 above, it is sufficient to prove Theorem 9.1 for compactly supported $f$ (THINK IT OVER). Since $f_\varepsilon \to f$ in $L^p$, Theorem 9.1 will be proven once we show that $f_\varepsilon \in C_0^\infty$. We will show that
$$\frac{\partial f_\varepsilon}{\partial x_j} = \frac{\partial j_\varepsilon}{\partial x_j}\star f, \qquad (9.26)$$
i.e. the convolution $f_\varepsilon = j_\varepsilon\star f$ can be differentiated by differentiating one factor. Differentiability up to arbitrary order then follows by induction.

To show (9.26), we form the difference quotient of the left hand side and change variables, obtaining
$$\int\frac{j_\varepsilon(\dots, y_j + \delta, \dots) - j_\varepsilon(\dots, y_j, \dots)}{\delta}\,f(x-y)\,dy.$$
The fraction in the integrand converges pointwise to $\frac{\partial j_\varepsilon}{\partial x_j}(y)$. It is also uniformly bounded in $\delta$ (here $\varepsilon$ is fixed!) and, for $|\delta| \le 1$, supported in a fixed compact set, since $j_\varepsilon$ is smooth and compactly supported: its first derivatives are bounded, and the first derivatives control the difference quotients by the Taylor formula with remainder term (THINK IT OVER!). Since $f$ is locally integrable, the dominated convergence theorem yields (9.26), and this completes the proof for the case $\Omega = \mathbb R^d$.

Homework 9.5 The above proof was for $\Omega = \mathbb R^d$. Prove the theorem for any open set $\Omega$. [Hint: show that there exists an increasing sequence of compact sets $K_1 \subset K_2 \subset \dots \subset \Omega$ such that if $f_n := f\chi_{K_n}$, then $\|f - f_n\|_{L^p(\Omega)} \to 0$ as $n \to \infty$. Apply the construction described above to each $f_n$; the construction shows that the support of the approximating functions of $f_n$ can be chosen arbitrarily close to the support of $f_n$, in particular within an arbitrarily small neighborhood of $K_n$, i.e. within $\Omega$.]

This completes the proof of the approximation theorem. □

10 Schwartz functions

We have seen that $C_0^\infty$ functions play an important role: on the one hand, they are very easy to work with (they and all their derivatives are bounded, integrable, etc.); on the other hand, they are typically dense in larger spaces. Density allows us to prove results (especially inequalities) just by checking them on $C_0^\infty$ functions; they then automatically extend to all elements of the larger class, provided both sides are continuous in the relevant norm. Recall that working in larger classes (like $L^p$) is necessary because they are often complete, while $C_0^\infty$ is not.

There is another very convenient class of functions that is used for similar purposes instead of $C_0^\infty$: the Schwartz functions, or smooth functions with rapid decay. They are defined as follows:

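Definition 10.1 Denote by $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_d)$ a $d$-tuple of nonnegative integers; we call it a $d$-dimensional multi-index. The size or order of $\alpha$ is $|\alpha| := \sum_{j=1}^d \alpha_j$. Given $x = (x_1, x_2, \dots, x_d) \in \mathbb R^d$ and a $d$-dimensional multi-index $\alpha$, we introduce the shorthand notation $x^\alpha := x_1^{\alpha_1}x_2^{\alpha_2}\cdots x_d^{\alpha_d}$ for the monomial of coordinates and
$$D^\alpha := \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1}\,\partial x_2^{\alpha_2}\cdots\partial x_d^{\alpha_d}}.$$
The space of smooth functions with rapid decrease on $\mathbb R^d$ is defined as
$$\mathcal S(\mathbb R^d) := \big\{f \in C^\infty(\mathbb R^d) : \|f\|_{\alpha,\beta} < \infty \text{ for all multi-indices } \alpha, \beta\big\},$$
where we define
$$\|f\|_{\alpha,\beta} := \sup_{x\in\mathbb R^d}|x^\alpha D^\beta f(x)|. \qquad (10.27)$$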

It is easy to see (CHECK) that for any fixed $\alpha, \beta$ the expression $\|f\|_{\alpha,\beta}$ defined in (10.27) is indeed a seminorm, so the notation is justified (a seminorm enjoys the same properties as a norm, except that $\|x\| = 0$ does not imply $x = 0$). Still, $\mathcal S(\mathbb R^d)$ is not naturally a normed space, since its definition requires the finiteness of all the seminorms $\|f\|_{\alpha,\beta}$ simultaneously. However, it has a natural topology, generated by the sub-basis of neighborhoods of 0 consisting of the sets
$$N_{\alpha,\beta,\varepsilon} := \{f \in \mathcal S(\mathbb R^d) : \|f\|_{\alpha,\beta} < \varepsilon\}$$
as $\alpha, \beta$ run through all possible multi-indices and $\varepsilon$ runs through all positive numbers. It is easy to check that

(i) each seminorm $\|\cdot\|_{\alpha,\beta}$ is continuous in this topology;

(ii) the pointwise sum and the pointwise product of two functions are continuous maps $\mathcal S(\mathbb R^d)\times\mathcal S(\mathbb R^d) \to \mathcal S(\mathbb R^d)$;

(iii) a sequence $f_n$ converges to $f$ in this topology if and only if $\|f_n - f\|_{\alpha,\beta} \to 0$ (as $n \to \infty$) for all multi-indices $\alpha, \beta$.

In other words, $\mathcal S(\mathbb R^d)$ is a topological vector space whose topology is generated by the family of seminorms $\|\cdot\|_{\alpha,\beta}$. Moreover, the Schwartz functions also form an algebra under pointwise multiplication (however, this algebra has no unit element, since the identically 1 function is not in $\mathcal S$).

It is also easy to see that
$$C_0^\infty(\mathbb R^d) \subset \mathcal S(\mathbb R^d) \subset L^p(\mathbb R^d)$$
for any $1 \le p \le \infty$. The Schwartz functions are typically useful on the whole of $\mathbb R^d$, unlike the $C_0^\infty$ functions, which are also used on a domain. This would suggest that their applicability is more limited. However, they have a very important property that $C_0^\infty$ functions do not have:

Theorem 10.2 The Fourier transform is a bijection of $\mathcal S(\mathbb R^d)$ onto itself.

We leave the proof as an (easy) exercise.
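A classical example for Theorem 10.2 is the Gaussian $e^{-\pi x^2}$, a Schwartz function that the Fourier transform maps to itself, assuming the normalization $\hat f(k) = \int e^{-2\pi i k x} f(x)\,dx$ (other conventions rescale it). The sketch below (our own, with hypothetical helper names; $d = 1$) checks this numerically with a Riemann sum, which is extremely accurate for such smooth, rapidly decaying integrands. It illustrates the statement for one function only and is not a proof.

```python
import math


def fourier_transform_even(f, k, half_width=10.0, h=0.01):
    """Approximate f_hat(k) = integral of exp(-2 pi i k x) f(x) dx for an even, rapidly decaying f.

    For even f the sine part integrates to zero, leaving the integral of cos(2 pi k x) f(x),
    which we approximate by a Riemann sum with step h on [-half_width, half_width].
    """
    n = int(round(2 * half_width / h))
    return h * sum(math.cos(2 * math.pi * k * x) * f(x)
                   for x in (-half_width + i * h for i in range(n + 1)))


def gaussian(x):
    return math.exp(-math.pi * x * x)


for k in (0.0, 0.5, 1.0, 1.5):
    print(k, fourier_transform_even(gaussian, k), gaussian(k))
```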

11 H1 Sobolev spaces

Let Ω ⊂ Rd be a domain (i.e. an open subset), possibly Ω = Rd.

Definition 11.1 Let $f \in L^2(\Omega)$. We say that $f$ belongs to the Sobolev space $H^1(\Omega)$ if there exists a vector-valued function $g = (g_1, g_2, \dots, g_d) \in L^2(\Omega \to \mathbb C^d)$ such that $g$ is the distributional (or weak) gradient of $f$, i.e. if
$$\int_\Omega f\,\frac{\partial\varphi}{\partial x_j} = -\int_\Omega g_j\,\varphi \qquad (11.28)$$
holds for all $j = 1, 2, \dots, d$ and for all functions $\varphi \in C_0^\infty(\Omega)$. We will use the notation $g = \nabla f$, but here $\nabla f$ is not defined via difference quotients (as the usual gradient of differentiable functions); it is defined by the property (11.28).

We remark that higher-order distributional (or weak) derivatives can be defined analogously. The motivation behind (11.28) is that for continuously differentiable functions $f \in C^1(\Omega)$ this identity just follows from integration by parts (the boundary terms vanish because $\varphi$ has compact support in $\Omega$). For less regular $f$, integration by parts is not directly available, but in many cases we can define the gradient $\nabla f$ such that the integration by parts identity holds; in fact, this identity is the guiding principle of the definition.
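A minimal example of (11.28), with our own choice of data: on $\Omega = (-1, 1)$ the function $f(x) = |x|$ is not differentiable at $0$, yet its weak derivative is $g(x) = \operatorname{sign}(x)$ (split the integral at $0$ and integrate by parts on each half). The sketch below checks (11.28) numerically for one particular test function, a shifted and rescaled standard bump $e^{-1/(1-t^2)}$ whose support contains $0$; a single test function proves nothing, it only shows the identity at work.

```python
import numpy as np


def bump(t):
    """Standard C_0^∞ bump exp(-1/(1-t^2)) on (-1, 1), extended by 0."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out


def bump_prime(t):
    """Derivative of bump: exp(-1/(1-t^2)) * (-2t / (1-t^2)^2) on (-1, 1), 0 outside."""
    out = np.zeros_like(t)
    inside = np.abs(t) < 1
    ti = t[inside]
    out[inside] = np.exp(-1.0 / (1.0 - ti**2)) * (-2.0 * ti / (1.0 - ti**2) ** 2)
    return out


# Test function phi(x) = bump((x - c) / r), supported on (c - r, c + r) = (-0.3, 0.9), which contains 0
c, r = 0.3, 0.6
x = np.linspace(-1.0, 1.0, 200_001)
dx = x[1] - x[0]
phi = bump((x - c) / r)
dphi = bump_prime((x - c) / r) / r          # chain rule for the rescaling

lhs = np.sum(np.abs(x) * dphi) * dx         # ∫ f φ'  with f(x) = |x|
rhs = -np.sum(np.sign(x) * phi) * dx        # -∫ g φ  with g(x) = sign(x)
print(lhs, rhs)
```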

The following properties can be checked directly:


1) The distributional gradient, if it exists, is uniquely defined (as an element of $L^2$, i.e. up to sets of measure zero).

2) If $f \in C^1(\Omega)$ with $f \in L^2(\Omega)$ and $\nabla f \in L^2(\Omega \to \mathbb{C}^d)$, then $f \in H^1(\Omega)$, and the usual “old” gradient (defined via difference quotients) and the distributional gradient coincide.

3) The space H1(Ω) is a linear subspace of L2(Ω).

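4) The space $H^1(\Omega)$ can be equipped with a natural scalar product

$$\langle f, g\rangle_{H^1(\Omega)} := \langle f, g\rangle_{L^2(\Omega)} + \langle \nabla f, \nabla g\rangle_{L^2(\Omega)}$$

and with the norm

$$\|f\|_{H^1(\Omega)} := \sqrt{\langle f, f\rangle_{H^1(\Omega)}}.$$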

5) The space $H^1(\Omega)$ with the above scalar product is a Hilbert space; in particular, it is complete.

6) The gradient operator $\nabla$ is a bounded linear operator from $H^1(\Omega)$ to $L^2(\Omega \to \mathbb{C}^d)$ with norm $1$.

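7) The Leibniz rule holds in the following form: if $f \in H^1(\Omega)$ and $\psi \in C^\infty(\Omega)$ is bounded with bounded gradient (e.g. $\psi \in C_0^\infty(\Omega)$), then $f\psi \in H^1(\Omega)$ and its distributional gradient is given by

$$\nabla(f\psi) = f\,\nabla\psi + \psi\,\nabla f.$$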

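8) The chain rule holds if the outer function satisfies some restrictions. More precisely, let $G : \mathbb{R}^N \to \mathbb{C}$ be a differentiable function with bounded and continuous derivatives. If $u(x) = (u_1(x), u_2(x), \dots, u_N(x))$ denotes $N$ real-valued functions in $H^1(\Omega)$ (complex-valued functions are handled by treating their real and imaginary parts as separate components), then the function $K : \Omega \to \mathbb{C}$ defined by

$$K(x) = (G \circ u)(x) = G(u(x))$$

is in $H^1(\Omega)$, provided that $G(0) = 0$ in case $|\Omega| = \infty$. Moreover, the distributional derivative of $K$ is given by

$$\frac{\partial K}{\partial x_j} = \sum_{k=1}^{N} \frac{\partial G}{\partial s_k}(u)\cdot\frac{\partial u_k}{\partial x_j}.$$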

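9) Integration by parts holds in $H^1(\mathbb{R}^d)$, i.e. if $u, v \in H^1(\mathbb{R}^d)$, then

$$\int_{\mathbb{R}^d} u\,\frac{\partial v}{\partial x_j} = -\int_{\mathbb{R}^d} \frac{\partial u}{\partial x_j}\,v$$

for $j = 1, 2, \dots, d$. Integration by parts for $H^1(\Omega)$ functions is more complicated, since the boundary term needs to be taken into account.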


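Note that $H^1(\Omega)$ is a proper subspace of $L^2(\Omega)$, as it is easy to construct an $L^2$ function that is not in $H^1$ (EXERCISE!). On the other hand, $H^1(\Omega)$ is dense in $L^2(\Omega)$, since it contains $C_0^\infty(\Omega)$, which is dense in $L^2(\Omega)$ (notice that $H^1$ with its own $H^1$-norm is closed, since $H^1$ is a complete space, but $H^1$ is not closed in the $L^2$-norm; its $L^2$-closure is $L^2$). Approximating $H^1(\Omega)$ functions in the $H^1$-norm by functions that are smooth up to the boundary of $\Omega$ is more delicate and works for “reasonable” domains. We will not give a precise definition of “reasonable”, but essentially any domain without kinks and spikes is “reasonable”. Here we formulate the density theorem only for $\Omega = \mathbb{R}^d$: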

Theorem 11.2 (Meyers-Serrin) $C_0^\infty(\mathbb{R}^d)$ is dense in $H^1(\mathbb{R}^d)$, i.e. $H^1(\mathbb{R}^d)$ is the completion of $C_0^\infty(\mathbb{R}^d)$ in the $H^1$-norm.

The proof uses a regularization via convolution, similar to the one we have seen in the proof that $C_0^\infty$ is dense in $L^p$. We will not give the details; see Theorem 7.6 of Lieb-Loss.

Remark. Notice that there are two natural ways to define the Sobolev space. The first route is the one we followed here: we defined $H^1$ as the set of $L^2$ functions that have a distributional gradient in $L^2$. The second way is to consider the completion of $C_0^\infty$ in the $H^1$-norm, using the abstract result that every normed space can be completed. A priori it is not obvious that these two procedures lead to the same space, but the Meyers-Serrin theorem guarantees that they do, at least for $\Omega = \mathbb{R}^d$. For more details, see Lieb-Loss, Chapter 7.

The Sobolev space $H^1(\mathbb{R}^d)$ can be very conveniently characterized by the Fourier transform via the following theorem (Theorem 7.9 of Lieb-Loss):

Theorem 11.3 Let $f \in L^2(\mathbb{R}^d)$ and let $\hat f$ be its Fourier transform. Then $f \in H^1(\mathbb{R}^d)$ if and only if the function $k \mapsto |k|\,\hat f(k)$ is in $L^2(\mathbb{R}^d)$. In this case the Fourier transform of $\nabla f$ is given by

$$\widehat{\nabla f}(k) = 2\pi i k\,\hat f(k)$$

and

$$\|f\|_{H^1}^2 = \int_{\mathbb{R}^d} \bigl(1 + 4\pi^2|k|^2\bigr)\,|\hat f(k)|^2\,dk.$$

Note that this characterization immediately shows that $H^1$ is a Hilbert space; in fact, via the Fourier transform, $H^1$ can be identified with the $L^2$ space on $\mathbb{R}^d$ with the measure $(1 + 4\pi^2|k|^2)\,dk$ instead of the usual Lebesgue measure.

Both formulas in the theorem are easy to check for $f \in C_0^\infty$; the rest is a standard density argument. For details, see Lieb-Loss.
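Theorem 11.3 can be checked numerically in $d = 1$. The sketch below is our own illustration: it takes $f(x) = e^{-x^2}(1 + x)$, for which a hand computation gives $\|f\|_2^2 = \tfrac54\sqrt{\pi/2}$ and $\|f'\|_2^2 = \tfrac74\sqrt{\pi/2}$, so $\|f\|_{H^1}^2 = 3\sqrt{\pi/2}$. It approximates $\hat f$ (with the convention $\hat f(k) = \int f(x)e^{-2\pi i k x}\,dx$) by NumPy's FFT on a truncated grid and compares both sides of the norm identity.

```python
import numpy as np

# Grid on [-L/2, L/2); f decays like exp(-x^2), so truncating the line is harmless
L, N = 20.0, 4096
h = L / N
x = -L / 2 + h * np.arange(N)

f = np.exp(-x**2) * (1 + x)                    # example function (our choice)
df = np.exp(-x**2) * (1 - 2 * x - 2 * x**2)    # its derivative, computed by hand

# Position side: ||f||_{H^1}^2 = ||f||_2^2 + ||f'||_2^2
h1_position = h * np.sum(f**2 + df**2)

# Fourier side: ∫ (1 + 4π²k²) |fhat(k)|² dk with fhat(k) = ∫ f(x) e^{-2πikx} dx.
# On the grid, fhat(k_m) ≈ h * e^{-2πi k_m x_0} * FFT(f)_m; the phase factor drops
# out of |fhat|², and the frequencies k_m are spaced by 1/(N h).
k = np.fft.fftfreq(N, d=h)
fhat_sq = (h * np.abs(np.fft.fft(f))) ** 2
h1_fourier = np.sum((1 + 4 * np.pi**2 * k**2) * fhat_sq) / (N * h)

print(h1_position, h1_fourier, 3 * np.sqrt(np.pi / 2))
```

By the discrete Parseval identity, the Fourier-side sum equals the position-side sum with $f'$ replaced by its spectral (FFT) derivative, so for such a smooth, rapidly decaying $f$ the two values agree essentially to machine precision.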


12 Sobolev inequalities

We already mentioned the most important Sobolev inequality,

$$\|f\|_6 \le C\,\|\nabla f\|_2$$

for $f \in C_0^1(\mathbb{R}^3)$, i.e. for “nice” functions in $d = 3$ dimensions. Using a density argument, this inequality shows that

$$H^1(\mathbb{R}^3) \subset L^6(\mathbb{R}^3)$$

and the imbedding of $H^1$ into $L^6$ is continuous (by this we mean that the identity operator, viewed as a map from $H^1$ into $L^6$, is bounded). This inequality can be generalized to higher dimensions:

Theorem 12.1 (Sobolev inequality for $H^1(\mathbb{R}^d)$, $d \ge 3$) There exists a constant $S_d$, depending only on the dimension $d \ge 3$, such that for any $f \in C_0^1(\mathbb{R}^d)$ we have

$$\|f\|_p \le S_d\,\|\nabla f\|_2 \qquad (12.29)$$

where $p = \frac{2d}{d-2}$; in particular, the imbedding

$$H^1(\mathbb{R}^d) \subset L^p(\mathbb{R}^d)$$

is continuous.

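Remark 1. The exponent $p = \frac{2d}{d-2}$ comes from scaling; it is the only possible exponent for which the two sides of (12.29) scale in the same way in the length $\ell$. Indeed, if $f$ is replaced by $f_\ell(x) := f(x/\ell)$, then

$$\Bigl(\int |f_\ell|^p\,dx\Bigr)^{1/p} \sim \ell^{d/p}, \qquad \Bigl(\int |\nabla f_\ell|^2\,dx\Bigr)^{1/2} \sim \bigl(\ell^{-2}\ell^{d}\bigr)^{1/2},$$

and comparing the exponents, $\frac{d}{p} = \frac{d-2}{2}$, gives the only possible value of $p$. Note that this consistency check does not prove the inequality; it just shows that it is not obviously wrong.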

Remark 2. The inequality (12.29) in the other direction is wrong: there is no way to control the derivative with an $L^p$ norm. Construct a counterexample (take a small but heavily oscillating function).

Remark 3. The inequality is wrong for $d = 2$. Scaling gives $p = \infty$ as the only possible exponent, but the bound

$$\|f\|_\infty \le (\mathrm{const})\,\|\nabla f\|_2$$

fails in $d = 2$ dimensions. It is an exercise to check that the function $f(x) = \log\bigl|\log|x|\bigr| \cdot \chi(x)$ is a counterexample, where $\chi$ is a smooth cutoff function that equals $1$ in a neighborhood of $0$ and is supported in the disk $\{|x| < 1/2\}$ (so that the singularity of $\log|\log|x||$ on the unit circle is cut off). For simplicity, assume that $\chi$ is radial, i.e. it depends only on $|x|$.
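For orientation, here is the key computation behind Remark 3, assuming that $\chi \equiv 1$ on a disk $\{|x| < \delta\}$ with $0 < \delta < 1$ and that $\chi$ is supported inside the unit disk. In polar coordinates $r = |x|$ we have $f = \log(-\log r)$ for $r < \delta$, and

$$
|\nabla f(x)| = \frac{1}{r\,|\log r|}, \qquad
\int_{|x|<\delta} |\nabla f|^2\,dx = 2\pi \int_0^\delta \frac{dr}{r\,\log^2 r} = \frac{2\pi}{|\log \delta|} < \infty,
$$

while $f(x) \to \infty$ as $x \to 0$. Away from $0$ the function is smooth and compactly supported, and $f \in L^2$ near $0$ because $r\,(\log|\log r|)^2$ is integrable at $r = 0$; hence $f$ is an unbounded element of $H^1(\mathbb{R}^2)$.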

Remark 4. The best constant is known (see Theorem 8.3 of Lieb-Loss), and the optimizers (for which the Sobolev inequality with the best constant becomes an equality) are also known: these are the functions of the form

$$f(x) = \bigl(|x|^2 + 1\bigr)^{-\frac{d-2}{2}}$$

and their trivial modifications (translates, dilates and constant multiples).

Remark 5. Notice that the Sobolev inequality (12.29) did not really use the full $H^1$-norm; only the $L^2$-norm of the gradient was necessary. In principle, one can ask whether (12.29) holds for any function $f$ such that $\nabla f \in L^2$. The answer is: not quite. We do not need $f \in L^2$, but we do need the additional information that $f$ goes to zero at infinity, just to exclude the trivial case $f = \mathrm{const}$ (which would be a counterexample to the extension of (12.29)).
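The scaling in Remark 1 and the optimizer in Remark 4 can both be seen numerically. The sketch below restricts to radial functions $f(|x|)$ on $\mathbb{R}^3$ (our simplification), where $\int_{\mathbb{R}^3} g(|x|)\,dx = 4\pi\int_0^\infty g(r)\,r^2\,dr$, and evaluates the scale-invariant ratio $\|\nabla f\|_2^2 / \|f\|_6^2$; in the notation of (12.29), its smallest possible value is $1/S_3^2$ for the best constant $S_3$. For the Gaussian $e^{-(r/\ell)^2}$ the ratio should not depend on $\ell$ (by hand, its value is $\tfrac32\pi\sqrt3 \approx 8.16$), and for the optimizer $(1 + |x|^2)^{-1/2}$ it should equal $3(\pi/2)^{4/3} \approx 5.48$, which we computed by hand from $\int_0^\infty r^2(1+r^2)^{-3}\,dr = \pi/16$ and $\int_0^\infty r^4(1+r^2)^{-3}\,dr = 3\pi/16$.

```python
import numpy as np


def radial_integral(g, n=200_000):
    """∫_{R^3} g(|x|) dx = 4π ∫_0^∞ g(r) r^2 dr.

    The half-line is mapped to [0, 1) by r = s/(1-s), dr = ds/(1-s)^2, and the
    s-integral is done by the midpoint rule (which never evaluates s = 1).
    """
    s = (np.arange(n) + 0.5) / n
    r = s / (1.0 - s)
    return 4.0 * np.pi * np.mean(g(r) * r**2 / (1.0 - s) ** 2)


def sobolev_ratio(f, df):
    """||∇f||_2^2 / ||f||_6^2 for a radial function f(|x|) on R^3; df is f'(r)."""
    grad_sq = radial_integral(lambda r: df(r) ** 2)
    l6_sq = radial_integral(lambda r: np.abs(f(r)) ** 6) ** (1.0 / 3.0)
    return grad_sq / l6_sq


def gaussian(ell):
    """f(r) = exp(-(r/ell)^2) together with its radial derivative."""
    return (lambda r: np.exp(-(r / ell) ** 2),
            lambda r: -2.0 * r / ell**2 * np.exp(-(r / ell) ** 2))


# Remark 1: the ratio does not depend on the length scale ell
for ell in (0.5, 1.0, 2.0):
    print(ell, sobolev_ratio(*gaussian(ell)))

# Remark 4: the optimizer (1 + |x|^2)^{-(d-2)/2} with d = 3
opt = sobolev_ratio(lambda r: (1.0 + r**2) ** -0.5,
                    lambda r: -r * (1.0 + r**2) ** -1.5)
print(opt, 3.0 * (np.pi / 2.0) ** (4.0 / 3.0))
```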

The inequality (12.29) holds only in $d \ge 3$ dimensions. What about the lower dimensions $d = 1, 2$? In fact, in these dimensions the Sobolev inequality is stronger, but it has a slightly different form; in particular, controlling $\nabla f$ alone is not enough, and we really need the full $H^1$-norm. On the other hand, we can control higher $L^p$ norms.

Theorem 12.2 (Sobolev inequality for $H^1(\mathbb{R})$) Suppose that $f \in C_0^1(\mathbb{R})$. Then

$$\|f\|_\infty^2 \le \tfrac12\,\|f\|_{H^1}^2 = \tfrac12\bigl(\|f\|_2^2 + \|f'\|_2^2\bigr) \qquad (12.30)$$

and $f$ is Hölder continuous with Hölder exponent $1/2$, i.e.

$$|f(x) - f(y)| \le \|f'\|_2\,|x - y|^{1/2}.$$

After the density argument, this result implies that any function $f \in H^1(\mathbb{R})$ is bounded and continuous (more precisely, it has a bounded, Hölder continuous representative). In particular, the imbedding $H^1(\mathbb{R}) \subset C(\mathbb{R})$ is continuous.

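Proof. We write the proof for real-valued $f$ (for complex-valued $f$ the same argument applies to $|f|^2$, using $(|f|^2)' = 2\,\mathrm{Re}(\bar f f')$). By the fundamental theorem of calculus and the compact support of $f$,

$$f(x)^2 = \int_{-\infty}^{x} 2f'(y)f(y)\,dy, \qquad f(x)^2 = -\int_{x}^{\infty} 2f'(y)f(y)\,dy,$$

so, taking the average of the two identities,

$$f(x)^2 = \int_{-\infty}^{x} f'f - \int_{x}^{\infty} f'f,$$

and thus

$$|f(x)|^2 \le \int_{\mathbb{R}} |f|\cdot|f'| \le \|f\|_2\,\|f'\|_2 \le \tfrac12\bigl(\|f\|_2^2 + \|f'\|_2^2\bigr).$$

The Hölder continuity follows from

$$|f(x) - f(y)| = \Bigl|\int_x^y f'(s)\,ds\Bigr| \le \|f'\|_2\,|x - y|^{1/2}$$

after a Schwarz inequality. □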
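The constant $\frac12$ in the bound $|f(x)|^2 \le \frac12(\|f\|_2^2 + \|f'\|_2^2)$ from the proof cannot be improved. Since the bound extends to all of $H^1(\mathbb{R})$ by density, one can test it on $f(x) = e^{-|x|}$ (our example; it lies in $H^1(\mathbb{R})$ but not in $C_0^1$): here $\|f\|_\infty^2 = \|f\|_2^2 = \|f'\|_2^2 = 1$, so every inequality in the chain becomes an equality at $x = 0$. The sketch below confirms this on a grid; it uses a midpoint grid so that the kink at $0$ is never a grid point, which means the supremum is only approached, not attained.

```python
import numpy as np

# f(x) = e^{-|x|} lies in H^1(R), with f'(x) = -sign(x) e^{-|x|} for x != 0.
# Midpoint grid x_j = (j + 1/2) h, so x = 0 (the kink) is never a grid point.
h, M = 1e-4, 400_000
x = (np.arange(-M, M) + 0.5) * h          # covers [-40, 40]
f = np.exp(-np.abs(x))
df = -np.sign(x) * np.exp(-np.abs(x))

sup_sq = np.max(np.abs(f)) ** 2            # ≈ ||f||_inf^2 = 1
l2_sq = h * np.sum(f**2)                   # ≈ ||f||_2^2   = 1
dl2_sq = h * np.sum(df**2)                 # ≈ ||f'||_2^2  = 1
bound = 0.5 * (l2_sq + dl2_sq)

print(sup_sq, np.sqrt(l2_sq * dl2_sq), bound)
```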

Theorem 12.3 (Sobolev inequality for $H^1(\mathbb{R}^2)$) For any $2 \le q < \infty$ there exists a constant $C_q$ such that

$$\|f\|_{L^q(\mathbb{R}^2)} \le C_q\,\|f\|_{H^1(\mathbb{R}^2)} \qquad (12.31)$$

for any $f \in C_0^1(\mathbb{R}^2)$. After the density argument, this result implies that the imbedding $H^1(\mathbb{R}^2) \subset L^q(\mathbb{R}^2)$ is continuous for any $2 \le q < \infty$.

Remark. Note that $q = \infty$ is not allowed. It is a (somewhat tricky) EXERCISE to construct a counterexample, i.e. an unbounded $H^1(\mathbb{R}^2)$ function.

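Proof. The case $q = 2$ is trivial, so we assume $2 < q < \infty$, and let $1 < p < 2$ be its dual exponent, $1/p + 1/q = 1$. We estimate the $L^p$-norm of the Fourier transform $\hat f$ (here $k^2$ stands for $|k|^2$):

$$
\begin{aligned}
\|\hat f\|_p &= \Bigl(\int_{\mathbb{R}^2} |\hat f(k)|^p\,dk\Bigr)^{1/p}
= \Bigl(\int_{\mathbb{R}^2} \bigl|\hat f(k)\,(1 + 4\pi^2k^2)^{1/2}\bigr|^p\,(1 + 4\pi^2k^2)^{-p/2}\,dk\Bigr)^{1/p}\\
&\le \Bigl(\int_{\mathbb{R}^2} |\hat f(k)|^2\,(1 + 4\pi^2k^2)\,dk\Bigr)^{1/2}\Bigl(\int_{\mathbb{R}^2} (1 + 4\pi^2k^2)^{-\frac{p}{2}\cdot\frac{2}{2-p}}\,dk\Bigr)^{\frac{2-p}{2p}}\\
&= C_q\,\|f\|_{H^1(\mathbb{R}^2)}
\end{aligned}
\qquad (12.32)
$$

where in the inequality step we used the Hölder inequality with exponents $\frac{2}{p}$ and its dual $\frac{2}{2-p}$. To arrive at the last line, we used Theorem 11.3 for the first factor and evaluated the integral

$$\int_{\mathbb{R}^2} (1 + 4\pi^2k^2)^{-\frac{p}{2}\cdot\frac{2}{2-p}}\,dk,$$

which is a finite constant (depending on $q$), since the integrand decays faster than the second power of the distance (note that $\frac{p}{2-p} > 1$ since $p > 1$, which is the place where $q < \infty$ is used).

We have thus proved that

$$\|\hat f\|_p \le C_q\,\|f\|_{H^1(\mathbb{R}^2)},$$

and now (12.31) follows from the Hausdorff-Young inequality (Theorem 8.3), stating that

$$\|f\|_q \le \|\hat f\|_p$$

(note that the condition $1 \le p \le 2$ in Hausdorff-Young is satisfied). □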
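The integral at the end of the proof can be evaluated in closed form. In polar coordinates, with $a = \frac{p}{2-p} > 1$ and the substitution $u = 4\pi^2 r^2$, one gets $\int_{\mathbb{R}^2}(1 + 4\pi^2|k|^2)^{-a}\,dk = \frac{1}{4\pi(a-1)} = \frac{2-p}{8\pi(p-1)}$, so this proof produces $C_q = \bigl(\frac{2-p}{8\pi(p-1)}\bigr)^{\frac{2-p}{2p}}$ (not necessarily the best constant). The sketch below, our own check, compares the closed form with a numerical quadrature and shows that this $C_q$ blows up as $q \to \infty$, in line with the exclusion of $q = \infty$.

```python
import numpy as np


def dual(q):
    """Dual exponent p with 1/p + 1/q = 1."""
    return q / (q - 1.0)


def weight_integral_closed_form(q):
    """∫_{R^2} (1 + 4π²|k|²)^{-p/(2-p)} dk = (2 - p) / (8π(p - 1))."""
    p = dual(q)
    return (2.0 - p) / (8.0 * np.pi * (p - 1.0))


def weight_integral_numeric(q, n=400_000):
    """The same integral as 2π ∫_0^∞ (1 + 4π²r²)^{-a} r dr with a = p/(2-p),
    after mapping r = s/(1-s) to [0, 1) and using the midpoint rule in s."""
    p = dual(q)
    a = p / (2.0 - p)
    s = (np.arange(n) + 0.5) / n
    r = s / (1.0 - s)
    integrand = (1.0 + 4.0 * np.pi**2 * r**2) ** (-a) * r / (1.0 - s) ** 2
    return 2.0 * np.pi * np.mean(integrand)


def c_q(q):
    """The constant C_q produced by (12.32)."""
    p = dual(q)
    return weight_integral_closed_form(q) ** ((2.0 - p) / (2.0 * p))


for q in (3.0, 4.0, 6.0):
    print(q, weight_integral_numeric(q), weight_integral_closed_form(q), c_q(q))
print(c_q(1e6))  # large: C_q grows without bound as q -> infinity
```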

The above Sobolev inequalities used the $H^1$-norm. We have not rigorously defined Sobolev spaces with other exponents (we will not need them here), but one can impose the requirement that for a function $f \in L^q$ the weak derivative $\nabla f$ also be in $L^q$. This clearly generalizes Definition 11.1 from $q = 2$ to a general $1 \le q < \infty$. The properties listed after Definition 11.1 remain valid (with the obvious modifications), except that for $q \neq 2$ the corresponding space has no scalar product, so it is only a Banach space and not a Hilbert space. This space can be denoted by $H^q$ (but the more general notation $W^{1,q}$ is commonly used, where the $1$ refers to the fact that only one derivative is taken).

This justifies the following generalization of the Sobolev inequality:

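Theorem 12.4 (Sobolev inequality for $H^q(\mathbb{R}^d)$) Let $d \ge 2$ and $1 \le q < d$. There exists a constant $S_{d,q}$, depending only on $d$ and $q$, such that for any $f \in C_0^1(\mathbb{R}^d)$ we have

$$\|f\|_p \le S_{d,q}\,\|\nabla f\|_q \qquad (12.33)$$

where $p = \frac{qd}{d-q}$. In particular, the imbedding

$$H^q(\mathbb{R}^d) \subset L^p(\mathbb{R}^d)$$

is continuous.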

The exponent $p = \frac{qd}{d-q}$ is again unique and is given by the scaling argument (CHECK).
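The scaling check can be carried out as in Remark 1. With $f_\ell(x) := f(x/\ell)$ we have $\nabla f_\ell(x) = \ell^{-1}(\nabla f)(x/\ell)$, hence

$$
\|f_\ell\|_p = \ell^{d/p}\,\|f\|_p, \qquad \|\nabla f_\ell\|_q = \ell^{\,d/q - 1}\,\|\nabla f\|_q,
$$

so (12.33) can hold for all $\ell > 0$ with a fixed constant only if

$$
\frac{d}{p} = \frac{d}{q} - 1 \quad\Longleftrightarrow\quad p = \frac{qd}{d-q},
$$

and this is a finite positive exponent exactly when $q < d$.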

Proof. We will give the proof of (12.33) for the case $d = 3$; the general case is analogous, it just needs longer formulas (THINK IT OVER). We start with

$$f(x,y,z) = \int_{-\infty}^{x} \partial_x f(r,y,z)\,dr$$

(by the fundamental theorem of calculus, using that $f$ is compactly supported), implying that

$$|f(x,y,z)| \le \int_{-\infty}^{\infty} |\partial_x f(r,y,z)|\,dr =: g_1(y,z),$$

and we define $g_2(x,z)$ and $g_3(x,y)$ analogously (integrating $|\partial_y f|$ over the whole $y$-line and $|\partial_z f|$ over the whole $z$-line). Therefore

$$|f(x,y,z)|^3 \le g_1(y,z)\,g_2(x,z)\,g_3(x,y).$$

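We compute

$$
\begin{aligned}
\|f\|_{3/2}^{3/2} &\le \int \sqrt{g_1(y,z)}\,\sqrt{g_2(x,z)}\,\sqrt{g_3(x,y)}\,dx\,dy\,dz\\
&= \int \sqrt{g_1(y,z)}\,\Bigl[\int \sqrt{g_2(x,z)}\,\sqrt{g_3(x,y)}\,dx\Bigr]\,dy\,dz\\
&\le \int \sqrt{g_1(y,z)}\,\Bigl(\int g_2(x,z)\,dx\Bigr)^{1/2}\Bigl(\int g_3(x,y)\,dx\Bigr)^{1/2}\,dy\,dz\\
&\le \int \Bigl(\int g_1(y,z)\,dy\Bigr)^{1/2}\Bigl(\int g_2(x,z)\,dx\Bigr)^{1/2}\Bigl(\int g_3(x,y)\,dx\,dy\Bigr)^{1/2}\,dz\\
&\le \Bigl(\int g_1(y,z)\,dy\,dz\Bigr)^{1/2}\Bigl(\int g_2(x,z)\,dx\,dz\Bigr)^{1/2}\Bigl(\int g_3(x,y)\,dx\,dy\Bigr)^{1/2}
\end{aligned}
\qquad (12.34)
$$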

where we used a Schwarz inequality in the $dx$ integral going from the second line to the third, then a Schwarz inequality in the $dy$ integral from the third line to the fourth, and finally a Schwarz inequality in the $dz$ integral.

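Noticing that

$$\int g_1(y,z)\,dy\,dz \le \|\partial_x f\|_1, \quad \text{etc.},$$

we get

$$\|f\|_{3/2}^{3/2} \le \bigl(\|\partial_x f\|_1\,\|\partial_y f\|_1\,\|\partial_z f\|_1\bigr)^{1/2}, \quad\text{i.e.}\quad \|f\|_{3/2} \le \bigl(\|\partial_x f\|_1\,\|\partial_y f\|_1\,\|\partial_z f\|_1\bigr)^{1/3} \le \|\nabla f\|_1.$$

This completes the proof of (12.33) for $d = 3$, $p = \frac32$, $q = 1$ (and the constant turns out to be $S_{3,1} = 1$). Exactly the same proof works for any $d \ge 2$, and it gives $q = 1$, $p = \frac{d}{d-1}$.

Now for the general case, we replace $f$ by $|f|^s$ with some $s \ge 1$ to be chosen. Then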

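$$\bigl\|\,|f|^s\bigr\|_{3/2} \le \bigl\|\nabla(|f|^s)\bigr\|_1 \le s\,\bigl\|\,|\nabla f|\cdot|f|^{s-1}\bigr\|_1 \le s\,\|\nabla f\|_q\cdot\bigl\|\,|f|^{s-1}\bigr\|_{q'}$$

where $q'$ is the dual exponent to $q$, $1/q + 1/q' = 1$ (the last step is the Hölder inequality). Now we want $\|\,|f|^s\|_{3/2}$ on the left-hand side to be expressible in terms of $\|f\|_p$, i.e. we need $\frac{3s}{2} = p$; but $p = \frac{3q}{3-q}$, so $s = \frac{2q}{3-q}$. With this choice of $s$ we obtain

$$\|f\|_p^{2p/3} \le C\,\|\nabla f\|_q\cdot\bigl\|\,|f|^{s-1}\bigr\|_{q'} \qquad (12.35)$$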

Now we want to relate $\|f\|_p$ to the $L^{q'}$ norm of $|f|^{s-1}$, i.e. we want $(s-1)q' = p$. But this is already guaranteed, since

$$(s-1)q' = (s-1)\frac{q}{q-1} = \Bigl(\frac{2q}{3-q} - 1\Bigr)\frac{q}{q-1} = \frac{3(q-1)}{3-q}\cdot\frac{q}{q-1} = \frac{3q}{3-q} = p.$$


Thus (12.35) becomes

$$\|f\|_p^{2p/3} \le C\,\|\nabla f\|_q\cdot\|f\|_p^{p/q'},$$

i.e.

$$\|f\|_p^{\frac{2p}{3} - \frac{p}{q'}} \le C\,\|\nabla f\|_q.$$

Computing the exponent,

$$\frac{2p}{3} - \frac{p}{q'} = \Bigl(\frac23 - \frac{1}{q'}\Bigr)p = \Bigl(\frac23 - \frac{1}{q'}\Bigr)\frac{3q}{3-q} = \Bigl(\frac1q - \frac13\Bigr)\frac{3q}{3-q} = 1,$$

so we obtained (12.33). Note that the last step was superfluous, since by scaling we already knew that it has to come out right (no other power of $\|f\|_p$ could yield a valid inequality). □
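The exponent bookkeeping in this proof can be double-checked with exact rational arithmetic. The sketch below (ours, using Python's standard `fractions` module) verifies for a few sample values $1 < q < 3$ that $\frac{3s}{2} = p$, that $(s-1)q' = p$, and that the final power of $\|f\|_p$ equals $1$; the case $q = 1$ is left out because there $q' = \infty$, and it was handled directly above.

```python
from fractions import Fraction


def check_exponents(q):
    """Exact bookkeeping of the exponents in the d = 3 proof of (12.33), for 1 < q < 3."""
    p = 3 * q / (3 - q)             # Sobolev exponent p = 3q/(3-q)
    s = 2 * q / (3 - q)             # power in |f|^s, chosen so that 3s/2 = p
    q_dual = q / (q - 1)            # dual exponent: 1/q + 1/q' = 1
    assert 3 * s / 2 == p           # so that || |f|^s ||_{3/2} = ||f||_p^{2p/3}
    assert (s - 1) * q_dual == p    # so that || |f|^{s-1} ||_{q'} = ||f||_p^{p/q'}
    power = 2 * p / 3 - p / q_dual  # final power of ||f||_p
    return p, s, power


for q in (Fraction(5, 4), Fraction(3, 2), Fraction(2), Fraction(5, 2), Fraction(11, 4)):
    p, s, power = check_exponents(q)
    print(f"q = {q}: p = {p}, s = {s}, power = {power}")
```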

