
MATH41112/61112

Ergodic Theory

Charles Walkden

4th January, 2018

MATH4/61112 Contents

Contents

0 Preliminaries

1 An introduction to ergodic theory. Uniform distribution of real sequences

2 More on uniform distribution mod 1. Measure spaces.

3 Lebesgue integration. Invariant measures

4 More examples of invariant measures

5 Ergodic measures: definition, criteria, and basic examples

6 Ergodic measures: Using the Hahn-Kolmogorov Extension Theorem to prove ergodicity

7 Continuous transformations on compact metric spaces

8 Ergodic measures for continuous transformations

9 Recurrence

10 Birkhoff’s Ergodic Theorem

11 Applications of Birkhoff’s Ergodic Theorem

12 Solutions to the Exercises



0. Preliminaries

§0.1 Contact details

The lecturer is Dr Charles Walkden, Room 2.241, Tel: 0161 275 5805, Email: [email protected].

My office hour is: Monday 2pm-3pm. If you want to see me at another time then please email me first to arrange a mutually convenient time.

§0.2 Course structure

This is a reading course, supported by one lecture per week. I have split the notes into weekly sections. You are expected to have read through the material before the lecture, and then go over it again afterwards in your own time. In the lectures I will highlight the most important parts, explain the statements of the theorems and what they mean in practice, and point out common misunderstandings. As a general rule, I will not normally go through the proofs in great detail (but they are examinable unless indicated otherwise). You will be expected to work through the proofs yourself in your own time. All the material in the notes is examinable, unless it says otherwise. (Note that if a proof is marked ‘not examinable’ then it means that I won’t expect you to reproduce the proof, but you will be expected to know and understand the statement of the result. If an entire section is marked ‘not examinable’ (for example, the review of Riemann integration in §1.3, and the discussions on the proofs of von Neumann’s Ergodic Theorem and Birkhoff’s Ergodic Theorem in §§9.6, 10.5, respectively) then you don’t need to know the statements of any subsidiary lemmas/propositions in those sections that are not used elsewhere, but reading this material may help your understanding.)

Each section of the notes contains exercises. The exercises are a key part of the course and you are expected to attempt them. The solutions to the exercises are contained in the notes; I would strongly recommend attempting the exercises first without referring to the solutions.

Please point out any mistakes (typographical or mathematical) in the notes.

§0.3 The exam

The exam is a 3 hour written exam. There are several past exam papers on the course website. Note that some topics (for example, entropy) were covered in previous years and are not covered this year; there are also some new topics (for example, Kac’s Lemma) that were not covered in 2010 or earlier.

The format of the exam is the same as last year’s. The exam has 5 questions, of which you must do 4. If you attempt all 5 questions then you will get credit for your best 4 answers. The style of the questions is similar to last year’s exam as well as to ‘Section B’ questions from earlier years.



Each question is worth 30 marks. Thus the total number of marks available on the exam is 4 × 30 = 120. This will then be converted to a mark out of 100 (by multiplying by 100/120).

There is no coursework, in-class test or mid-term for this course.

§0.4 Recommended texts

There are several suitable introductory texts on ergodic theory, including

W. Parry, Topics in Ergodic Theory

P. Walters, An Introduction to Ergodic Theory

I.P. Cornfeld, S.V. Fomin and Ya.G. Sinai, Ergodic Theory

K. Petersen, Ergodic Theory

M. Einsiedler and T. Ward, Ergodic Theory: With a View Towards Number Theory.

Parry’s or Walters’s books are the most suitable for this course.



1. An introduction to ergodic theory. Uniform distribution of real sequences

§1.1 Introduction

A dynamical system consists of a space X, often called a phase space, and a rule that determines how points in X evolve in time. Time can be either continuous (in which case a dynamical system is given by a first order autonomous differential equation) or discrete (in which case we are studying the iterates of a single map T : X → X).

We will only consider the case of discrete time in this course. Thus we will be studying the iterates of a single map T : X → X. We will write $T^n = T \circ \cdots \circ T$ ($n$ times) to denote the $n$th composition of T. If x ∈ X then we can think of $T^n(x)$, the result of applying the map T n times to the point x, as being where x has moved to after time n.

We call the sequence $x, T(x), T^2(x), \ldots, T^n(x), \ldots$ the orbit of x. If T is invertible (and so we can iterate backwards by repeatedly applying $T^{-1}$) then sometimes we refer to the doubly-infinite sequence $\ldots, T^{-n}(x), \ldots, T^{-1}(x), x, T(x), \ldots, T^n(x), \ldots$ as the orbit of x and the sequence $x, T(x), \ldots, T^n(x), \ldots$ as the forward orbit of x.

As an example, consider the map T : [0, 1] → [0, 1] defined by

$$T(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1/2, \\ 2x - 1 & \text{if } 1/2 < x \le 1. \end{cases}$$

We call this the doubling map.

Some orbits for the doubling map are periodic, i.e. they return to where they started after a finite number of iterations. For example, 2/5 is periodic as

$$T(2/5) = 4/5, \quad T(4/5) = 3/5, \quad T(3/5) = 1/5, \quad T(1/5) = 2/5.$$

Thus $T^4(2/5) = 2/5$. We say that 2/5 has period 4.

More generally, for a dynamical system T : X → X, a point x ∈ X is a periodic point with period n > 0 if $T^n(x) = x$. (Note that we do not assume that n is least.) If x is a periodic point of period n then we call $\{x, T(x), \ldots, T^{n-1}(x)\}$ a periodic orbit of period n.
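The orbit of 2/5 above can be checked mechanically. Below is a minimal Python sketch (the helper names `doubling`, `orbit` and `least_period` are ours, not from the notes). It assumes rational starting points and uses exact arithmetic from the standard `fractions` module: in floating point every number is a dyadic rational $m/2^e$, whose orbit under T is eventually fixed, so floats cannot exhibit the period-4 orbit.

```python
from fractions import Fraction


def doubling(x: Fraction) -> Fraction:
    """The doubling map T on [0, 1]: 2x on [0, 1/2], 2x - 1 on (1/2, 1]."""
    return 2 * x if x <= Fraction(1, 2) else 2 * x - 1


def orbit(x: Fraction, n: int) -> list:
    """Return the first n points x, T(x), ..., T^(n-1)(x) of the orbit of x."""
    points = []
    for _ in range(n):
        points.append(x)
        x = doubling(x)
    return points


def least_period(x: Fraction, max_n: int = 1000):
    """Smallest n > 0 with T^n(x) = x, or None if there is none up to max_n."""
    y = doubling(x)
    for n in range(1, max_n + 1):
        if y == x:
            return n
        y = doubling(y)
    return None


# 2/5 is periodic with least period 4, matching the computation above.
assert orbit(Fraction(2, 5), 5) == [Fraction(2, 5), Fraction(4, 5), Fraction(3, 5),
                                    Fraction(1, 5), Fraction(2, 5)]
assert least_period(Fraction(2, 5)) == 4
```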

Other points for the doubling map may have a dense orbit in [0, 1]. Recall that a set Y is dense in [0, 1] if any point in [0, 1] can be arbitrarily well approximated by a point in Y. Thus the orbit of x is dense in [0, 1] if: for all x′ ∈ [0, 1] and for all ε > 0 there exists n > 0 such that $|T^n(x) - x'| < \varepsilon$.

Consider a subinterval [a, b] ⊂ [0, 1]. How frequently does an orbit of a point under the doubling map visit the interval [a, b]? Define the characteristic function $\chi_B$ of a set B by

$$\chi_B(x) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases}$$

Then

$$\sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x))$$

denotes the number of the first n points in the orbit of x that lie in [a, b]. Hence

$$\frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x))$$

denotes the proportion of the first n points in the orbit of x that lie in [a, b]. Hence

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x))$$

denotes (when this limit exists) the frequency with which the orbit of x lies in [a, b]. In ergodic theory, one wants to understand when this is equal to the ‘size’ of the interval [a, b] (we will make ‘size’ precise later by using measure theory; for the moment, ‘size’ = ‘length’). That is, when does

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j(x)) = b - a \tag{1.1.1}$$

for every interval [a, b]? Note that if x satisfies (1.1.1) then the proportion of time that the orbit of x spends in an interval [a, b] is equal to the length of that interval; i.e. the orbit of x is equidistributed in [0, 1] and does not favour one region of [0, 1] over another.
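To see (1.1.1) in action one can simulate a ‘typical’ orbit. Iterating T in floating point does not work for this: each step discards one binary digit, and after a few dozen steps the computed orbit is stuck at a fixed point. The Python sketch below (function name and parameters hypothetical) instead draws the binary digits of x at random, which models a ‘typical’ point, and uses the fact that, for points with a unique binary expansion, T shifts the expansion one place to the left. A single simulation only illustrates what the course aims to justify for typical x; it proves nothing.

```python
import random


def visit_frequency(a: float, b: float, n: int, precision: int = 60, seed: int = 0) -> float:
    """Estimate (1/n) * #{0 <= j < n : T^j(x) in [a, b]} for the doubling map T.

    x = 0.b_1 b_2 b_3 ... in binary, with the digits b_i drawn independently
    (probability 1/2 each). T shifts the digits one place left, so
    T^j(x) = 0.b_{j+1} b_{j+2} ..., truncated here to `precision` digits.
    """
    rng = random.Random(seed)
    digits = [rng.getrandbits(1) for _ in range(n + precision)]  # digits[0] is b_1
    mask = (1 << precision) - 1
    scale = float(1 << precision)

    window = 0
    for d in digits[:precision]:      # window holds b_{j+1}, ..., b_{j+precision}
        window = (window << 1) | d

    hits = 0
    for j in range(n):
        y = window / scale             # approximately T^j(x)
        if a <= y <= b:
            hits += 1
        # Applying T: drop the leading digit, append the next one.
        window = ((window << 1) | digits[j + precision]) & mask
    return hits / n


print(visit_frequency(0.25, 0.6, 100_000))   # compare with b - a = 0.35
```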

In general, one cannot expect (1.1.1) to hold for every point x; indeed, if x is periodic then (1.1.1) does not hold. Even if the orbit of x is dense, then (1.1.1) may not hold. However, one might expect (1.1.1) to hold for ‘typical’ points x ∈ X (where again we can make ‘typical’ precise using measure theory). One might also want to replace the function $\chi_{[a,b]}$ with an arbitrary function f : X → R. In this case one would want to ask: for the doubling map T, when is it the case that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \int_0^1 f(x)\,dx\,?$$

The goal of the course is to understand the statement of the following result, to prove it, and to explain how to apply it.

Theorem 1.1.1 (Birkhoff’s Ergodic Theorem)
Let $(X, \mathcal{B}, \mu)$ be a probability space. Let $f \in L^1(X, \mathcal{B}, \mu)$ be an integrable function. Suppose that $T : X \to X$ is an ergodic measure-preserving transformation of X. Then

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \int f \, d\mu$$

for µ-a.e. point x ∈ X.

Ergodic theory has many applications to other areas of mathematics, notably hyperbolic geometry, number theory, fractal geometry, and mathematical physics. We shall see some of the (simpler) applications to number theory throughout the course.



§1.2 Uniform distribution mod 1

Let T : X → X be a dynamical system. In ergodic theory we are interested in the long-term distributional behaviour of the sequence of points $x, T(x), T^2(x), \ldots$. Before studying this problem, we consider an analogous problem in the context of sequences of real numbers.

Let $x_n \in \mathbb{R}$ be a sequence of real numbers. We may decompose $x_n$ as the sum of its integer part $[x_n] = \sup\{m \in \mathbb{Z} \mid m \le x_n\}$ (i.e. the largest integer which is less than or equal to $x_n$) and its fractional part $\{x_n\} = x_n - [x_n]$. Clearly, $0 \le \{x_n\} < 1$. The study of $x_n$ mod 1 is the study of the sequence $\{x_n\}$ in [0, 1].

Definition. We say that the sequence $x_n$ is uniformly distributed mod 1 (udm1 for short) if for every a, b with 0 ≤ a < b ≤ 1, we have that

$$\lim_{n\to\infty} \frac{1}{n} \operatorname{card}\{0 \le j \le n - 1 \mid \{x_j\} \in [a, b]\} = b - a.$$
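The quantity inside the limit in this definition involves only finitely many terms, so it can be computed directly for any finite stretch of a sequence. The Python sketch below (function names hypothetical) does this; note that the integer part $[x]$ is `math.floor`, not truncation towards zero, which matters for negative $x_n$. A finite computation can suggest, but never prove, uniform distribution, since the definition is a statement about a limit.

```python
import math


def fractional_part(x: float) -> float:
    """{x} = x - [x], where [x] = floor(x) is the largest integer <= x."""
    return x - math.floor(x)


def udm1_frequency(xs, a: float, b: float) -> float:
    """(1/n) * card{0 <= j <= n-1 : {x_j} in [a, b]} for a finite list xs = [x_0, ..., x_{n-1}]."""
    hits = sum(1 for x in xs if a <= fractional_part(x) <= b)
    return hits / len(xs)


# [x] is floor(x), not truncation towards zero: [-2.25] = -3 and {-2.25} = 0.75.
assert math.floor(-2.25) == -3 and fractional_part(-2.25) == 0.75
```

For example, with $x_j = j\sqrt{2}$, $0 \le j < 10^5$, the frequency for [0.2, 0.5] should come out close to 0.3, while for $x_j = j/4$ the interval [0.3, 0.45] is never hit.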

Remarks.

(i) Here, card denotes the cardinality of a set.

(ii) Thus $x_n$ is uniformly distributed mod 1 if, given any interval [a, b] ⊂ [0, 1], the frequency with which the fractional parts of $x_n$ lie in the interval [a, b] is equal to its length, b − a.

(iii) We can replace [a, b] by [a, b), (a, b] or (a, b) without altering the definition.

The following result gives a necessary and sufficient condition for the sequence $x_n \in \mathbb{R}$ to be uniformly distributed mod 1.

Theorem 1.2.1 (Weyl’s Criterion)
The following are equivalent:

(i) the sequence $x_n \in \mathbb{R}$ is uniformly distributed mod 1;

(ii) for every continuous function $f : [0, 1] \to \mathbb{R}$ with $f(0) = f(1)$ we have

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) = \int_0^1 f(x)\,dx; \tag{1.2.1}$$

(iii) for each $\ell \in \mathbb{Z} \setminus \{0\}$ we have

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = 0.$$

Remarks.

(i) As a grammatical point, criterion is singular (the plural is criteria). Weyl’s criterion is that (i) and (iii) are equivalent. Statement (ii) has been included because it is an important intermediate step in the proof and, as we shall see, it closely resembles an ergodic theorem.

(ii) One can replace the hypothesis in (1.2.1) that f is continuous with the hypothesis that f is Riemann integrable.



(iii) To prove that (i) is equivalent to (iii) we work, in fact, not on the unit interval [0, 1] but on the unit circle R/Z. To form R/Z, we work with real numbers modulo the integers (informally: we ignore integer parts). Note that ignoring integer parts means that 0 and 1 in [0, 1] are ‘the same’. Thus the end-points of the unit interval ‘join up’ and we see that R/Z is a circle. More formally, R is an additive group, Z is a subgroup and the quotient group R/Z is, topologically, a circle. Note that the requirement in (ii) that f(0) = f(1) means that f : [0, 1] → R is a well-defined function on the circle R/Z.

It is, however, the case that (i) is equivalent to (ii) without the hypothesis in (ii) that f(0) = f(1).

§1.2.1 The sequence $x_n = \alpha n$

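The behaviour of the sequence $x_n = \alpha n$ depends on whether α is rational or irrational. If $\alpha \in \mathbb{Q}$ then it is easy to see that $\{\alpha n\}$ can take on only finitely many values in [0, 1]. Indeed, if $\alpha = p/q$ ($p \in \mathbb{Z}$, $q \in \mathbb{N}$, $\mathrm{hcf}(p, q) = 1$) then $\{\alpha n\}$ takes the q values

$$0 = \left\{\frac{0}{q}\right\}, \left\{\frac{p}{q}\right\}, \left\{\frac{2p}{q}\right\}, \ldots, \left\{\frac{(q-1)p}{q}\right\}$$

as $\{qp/q\} = 0$ (so the sequence $\{\alpha n\}$ is periodic with period q). In particular $\alpha n$ is not uniformly distributed mod 1: any interval [a, b] of positive length containing none of these q points is never visited, although b − a > 0.

If $\alpha \notin \mathbb{Q}$ then the situation is completely different. We shall show that $\alpha n$ is uniformly distributed mod 1 by applying Weyl’s Criterion. Let $\ell \in \mathbb{Z} \setminus \{0\}$. As $\alpha \notin \mathbb{Q}$ we have that $\ell\alpha$ is never an integer; hence $e^{2\pi i \ell \alpha} \neq 1$. Note that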

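$$\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell \alpha j} = \frac{1}{n} \, \frac{e^{2\pi i \ell \alpha n} - 1}{e^{2\pi i \ell \alpha} - 1}$$

by summing the geometric progression. Hence

$$\left| \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right| = \frac{1}{n} \, \frac{|e^{2\pi i \ell \alpha n} - 1|}{|e^{2\pi i \ell \alpha} - 1|} \le \frac{1}{n} \, \frac{2}{|e^{2\pi i \ell \alpha} - 1|}. \tag{1.2.2}$$

As $\alpha \notin \mathbb{Q}$, the denominator in the right-hand side of (1.2.2) is not 0. Letting $n \to \infty$ we see that

$$\lim_{n\to\infty} \left| \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right| = 0.$$

Hence $x_n$ is uniformly distributed mod 1.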
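The computation above is easy to reproduce numerically. The Python sketch below (function names hypothetical) evaluates the average $\frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell \alpha j}$ together with the bound on the right of (1.2.2). For $\alpha = \sqrt{2}$ the averages stay below the bound and so are $O(1/n)$; for $\alpha = 1/3$ the $\ell = 3$ average is identically 1, so condition (iii) of Weyl’s Criterion fails, as it must for rational α. The phases are reduced mod 1 before exponentiating to limit floating-point error.

```python
import cmath
import math

TWO_PI_I = 2j * math.pi   # the complex number 2*pi*i


def weyl_average(alpha: float, ell: int, n: int) -> complex:
    """(1/n) * sum_{j=0}^{n-1} exp(2*pi*i*ell*x_j) for x_j = alpha * j."""
    total = 0j
    for j in range(n):
        phase = (ell * alpha * j) % 1.0        # exp(2*pi*i*t) only depends on t mod 1
        total += cmath.exp(TWO_PI_I * phase)
    return total / n


def geometric_bound(alpha: float, ell: int, n: int) -> float:
    """Right-hand side of (1.2.2); requires ell * alpha not to be an integer."""
    return 2 / (n * abs(cmath.exp(TWO_PI_I * ell * alpha) - 1))


alpha = math.sqrt(2)
for n in (10, 100, 1000):
    # Compare |average| with the bound from (1.2.2).
    print(n, abs(weyl_average(alpha, 1, n)), geometric_bound(alpha, 1, n))

# Rational alpha = 1/3: with ell = 3 every term equals 1.
print(abs(weyl_average(1 / 3, 3, 1000)))
```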

Remarks.

1. More generally, we could consider the sequence $x_n = \alpha n + \beta$. It is easy to see by modifying the above argument that $x_n$ is uniformly distributed mod 1 if and only if α is irrational. (See Exercise 1.2.)

2. Fix α > 1 and consider the sequence $x_n = \alpha^n x$. Then it is possible to show that for (Lebesgue) almost every $x \in \mathbb{R}$, the sequence $x_n$ is uniformly distributed mod 1. We will prove this, at least for the cases when α = 2, 3, 4, . . ..



3. Suppose we set x = 1 in the above remark and consider the sequence $x_n = \alpha^n$. Then one can show that $x_n$ is uniformly distributed mod 1 for almost every α > 1. However, not a single example of such an α is known! Indeed, it is not even known if $(3/2)^n$ is dense mod 1.

§1.2.2 Proof of Weyl’s Criterion

We prove (i) implies (ii). Suppose that the sequence $x_n \in \mathbb{R}$ is uniformly distributed mod 1. If $\chi_{[a,b]}$ is the characteristic function of the interval [a, b], then we may rewrite the definition of uniform distribution mod 1 as

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) = \int_0^1 \chi_{[a,b]}(x)\,dx.$$

From this, since both sides depend linearly on the function being averaged, we deduce that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) = \int_0^1 g(x)\,dx$$

whenever g is a step function, i.e., when $g(x) = \sum_{k=1}^{m} c_k \chi_{[a_k, b_k]}(x)$ is a finite linear combination of characteristic functions of intervals.

Now let f be a continuous function on [0, 1]. Then, given ε > 0, we can find (using the uniform continuity of f) a step function g with $\|f - g\|_\infty \le \varepsilon$. We have the estimate

$$\begin{aligned}
\left| \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) - \int_0^1 f(x)\,dx \right|
&\le \left| \frac{1}{n} \sum_{j=0}^{n-1} \bigl(f(\{x_j\}) - g(\{x_j\})\bigr) \right| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) - \int_0^1 g(x)\,dx \right| + \left| \int_0^1 g(x)\,dx - \int_0^1 f(x)\,dx \right| \\
&\le \frac{1}{n} \sum_{j=0}^{n-1} |f(\{x_j\}) - g(\{x_j\})| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) - \int_0^1 g(x)\,dx \right| + \int_0^1 |g(x) - f(x)|\,dx \\
&\le 2\varepsilon + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) - \int_0^1 g(x)\,dx \right|.
\end{aligned}$$

Since the last term converges to zero as $n \to \infty$, we obtain

$$\limsup_{n\to\infty} \left| \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) - \int_0^1 f(x)\,dx \right| \le 2\varepsilon.$$

Since ε > 0 is arbitrary, this gives us that

$$\frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) \to \int_0^1 f(x)\,dx$$

as $n \to \infty$.

We now prove (ii) implies (iii). Suppose that $f : [0, 1] \to \mathbb{C}$ is continuous and $f(0) = f(1)$. By writing $f = \operatorname{Re} f + i \operatorname{Im} f$ and applying (ii) to the real and imaginary parts of f we have that

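$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) = \int_0^1 f(x)\,dx.$$

For $\ell \in \mathbb{Z}$, $\ell \neq 0$, we let $f(x) = e^{2\pi i \ell x}$. Note that, as exp is $2\pi i$-periodic and $x_j - \{x_j\}$ is an integer, $f(\{x_j\}) = e^{2\pi i \ell x_j}$. Hence

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = \int_0^1 e^{2\pi i \ell x}\,dx = \left. \frac{1}{2\pi i \ell} e^{2\pi i \ell x} \right|_0^1 = 0,$$

as $\ell \neq 0$.

We prove (iii) implies (i). Suppose that (iii) holds. Then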

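$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} g(\{x_j\}) = \int_0^1 g(x)\,dx$$

whenever $g(x) = \sum_{k=1}^{m} c_k e^{2\pi i \ell_k x}$, $c_k \in \mathbb{C}$, $\ell_k \in \mathbb{Z}$, is a trigonometric polynomial, i.e. a finite linear combination of exponential functions. (For $\ell_k \neq 0$ both sides of the corresponding term are 0 by (iii), and for $\ell_k = 0$ both sides equal $c_k$.)

Note that the space $C(X, \mathbb{C})$ is a vector space: if $f, g \in C(X, \mathbb{C})$ then $f + g \in C(X, \mathbb{C})$ and if $f \in C(X, \mathbb{C})$, $\lambda \in \mathbb{C}$ then $\lambda f \in C(X, \mathbb{C})$. A linear subspace $S \subset C(X, \mathbb{C})$ is an algebra if whenever $f, g \in S$ then $fg \in S$. We will need the following result: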

Theorem 1.2.2 (Stone-Weierstrass Theorem)
Let X be a compact metric space and let $C(X, \mathbb{C})$ denote the space of continuous functions defined on X. Suppose that $S \subset C(X, \mathbb{C})$ is an algebra of continuous functions such that

(i) if $f \in S$ then $\overline{f} \in S$,

(ii) S separates the points of X, i.e. for all $x, y \in X$, $x \neq y$, there exists $f \in S$ such that $f(x) \neq f(y)$,

(iii) for every $x \in X$ there exists $f \in S$ such that $f(x) \neq 0$.

Then S is uniformly dense in $C(X, \mathbb{C})$, i.e. for all $f \in C(X, \mathbb{C})$ and all ε > 0, there exists $g \in S$ such that $\|f - g\|_\infty = \sup_{x \in X} |f(x) - g(x)| < \varepsilon$.

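We shall apply the Stone-Weierstrass Theorem with $X = \mathbb{R}/\mathbb{Z}$ and S given by the set of trigonometric polynomials. It is easy to see that S satisfies the hypotheses of Theorem 1.2.2: the product $e^{2\pi i \ell x} e^{2\pi i \ell' x} = e^{2\pi i (\ell + \ell') x}$ of two exponentials is again an exponential, the complex conjugate of $e^{2\pi i \ell x}$ is $e^{-2\pi i \ell x}$, the function $e^{2\pi i x}$ separates the points of $\mathbb{R}/\mathbb{Z}$, and the constant function 1 never vanishes. Let f be any continuous function on [0, 1] with f(0) = f(1). Given ε > 0 we can find a trigonometric polynomial g such that $\|f - g\|_\infty \le \varepsilon$. As in the first part of the proof, we can conclude that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_j\}) = \int_0^1 f(x)\,dx.$$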

Now consider the interval [a, b] ⊂ [0, 1]. Given ε > 0, we can find continuous functions $f_1$ and $f_2$ (with $f_1(0) = f_1(1)$, $f_2(0) = f_2(1)$) such that

$$f_1 \le \chi_{[a,b]} \le f_2$$

and

$$\int_0^1 \bigl(f_2(x) - f_1(x)\bigr)\,dx \le \varepsilon.$$

We then have that

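$$\begin{aligned}
\liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) &\ge \liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f_1(\{x_j\}) = \int_0^1 f_1(x)\,dx \\
&\ge \int_0^1 f_2(x)\,dx - \varepsilon \ge \int_0^1 \chi_{[a,b]}(x)\,dx - \varepsilon
\end{aligned}$$

and

$$\begin{aligned}
\limsup_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) &\le \limsup_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f_2(\{x_j\}) = \int_0^1 f_2(x)\,dx \\
&\le \int_0^1 f_1(x)\,dx + \varepsilon \le \int_0^1 \chi_{[a,b]}(x)\,dx + \varepsilon.
\end{aligned}$$

Since ε > 0 is arbitrary, we have shown that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(\{x_j\}) = \int_0^1 \chi_{[a,b]}(x)\,dx = b - a,$$

so that $x_n$ is uniformly distributed mod 1. ✷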
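The proof of (iii) implies (i) only asserts that suitable $f_1$ and $f_2$ exist. One concrete choice, assuming $0 < a < b < 1$ (the cases a = 0 or b = 1 need functions that ‘wrap around’ the circle R/Z, which this sketch does not handle), is a pair of trapezoids: $f_2$ equal to 1 on [a, b] and $f_1$ vanishing outside (a, b), each with linear edges of width δ, so that $\int_0^1 (f_2 - f_1)\,dx = 2\delta$. The Python sketch below (names hypothetical) builds them with $\delta = \min(\varepsilon/2, a, 1 - b, (b - a)/2)$, which also forces $f_i(0) = f_i(1) = 0$.

```python
def trapezoid(lo: float, hi: float, delta: float):
    """Continuous piecewise-linear function: 1 on [lo, hi], 0 outside (lo - delta, hi + delta)."""
    def g(x: float) -> float:
        if x <= lo - delta or x >= hi + delta:
            return 0.0
        if x < lo:
            return (x - (lo - delta)) / delta   # rising edge
        if x > hi:
            return ((hi + delta) - x) / delta   # falling edge
        return 1.0
    return g


def sandwich(a: float, b: float, eps: float):
    """Return (f1, f2) with f1 <= chi_[a,b] <= f2, f_i(0) = f_i(1) = 0 and
    integral of (f2 - f1) over [0, 1] equal to 2*delta <= eps. Assumes 0 < a < b < 1."""
    delta = min(eps / 2, a, 1 - b, (b - a) / 2)
    f1 = trapezoid(a + delta, b - delta, delta)   # vanishes outside (a, b)
    f2 = trapezoid(a, b, delta)                   # equals 1 on [a, b]
    return f1, f2
```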

§1.2.3 Exercises

Exercise 1.1
Show that if $x_n$ is uniformly distributed mod 1 then $\{x_n\}$ is dense in [0, 1]. (The converse is not true.)

Exercise 1.2
Let α, β ∈ R. Let $x_n = \alpha n + \beta$. Show that $x_n$ is uniformly distributed mod 1 if and only if $\alpha \notin \mathbb{Q}$.

Exercise 1.3
(i) Prove that $\log_{10} 2$ is irrational.

(ii) The leading digit of an integer is the left-most digit of its base 10 representation. (Thus the leading digit of 32 is 3, the leading digit of 1024 is 1, etc.) Show that the frequency with which $2^n$ has leading digit r (r = 1, 2, . . . , 9) is $\log_{10}(1 + 1/r)$.

(Hint: first show that $2^n$ has leading digit r if and only if

$$r \cdot 10^k \le 2^n < (r + 1) \cdot 10^k$$

for some $k \in \mathbb{N}$.)
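Exercise 1.3(ii) can be sanity-checked numerically, without using the hint, by reading off the leading digits of $2^n$ with exact integer arithmetic. The Python sketch below (function name hypothetical) compares the empirical frequencies for $0 \le n < N$ with $\log_{10}(1 + 1/r)$. It is evidence, not a solution. One practical caveat: recent Python versions by default refuse to convert integers with more than 4300 decimal digits to strings, so with this approach N should stay below roughly 14000 unless that limit is raised via `sys.set_int_max_str_digits`.

```python
import math
from collections import Counter


def leading_digit_frequencies(N: int) -> dict:
    """Proportion of n in {0, ..., N-1} for which 2**n has leading digit r, for r = 1, ..., 9."""
    counts = Counter()
    power = 1                                   # 2**0
    for _ in range(N):
        counts[int(str(power)[0])] += 1         # exact: no floating-point rounding
        power *= 2
    return {r: counts[r] / N for r in range(1, 10)}


freqs = leading_digit_frequencies(3000)
for r in range(1, 10):
    # empirical frequency versus the predicted log10(1 + 1/r)
    print(r, round(freqs[r], 4), round(math.log10(1 + 1 / r), 4))
```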

Exercise 1.4
Calculate the frequency with which the penultimate leading digit of $2^n$ is equal to r, r = 0, 1, 2, . . . , 9. (The penultimate leading digit is the second-to-leftmost digit in the base 10 expansion. The penultimate leading digit of 2048 is 0, etc.)



§1.3 Appendix: a recap on the Riemann Integral

(This subsection is included for general interest and to motivate the Lebesgue integral. Hence it is not examinable.)

You have probably already seen the construction of the Riemann integral. This gives a method for defining the integral of suitable functions defined on an interval [a, b]. In the next section we will see how the Lebesgue integral is a generalisation of the Riemann integral in the sense that it allows us to integrate functions defined on spaces more general than subintervals of R. The Lebesgue integral has other nice properties; for example, it is well-behaved with respect to limits. Here we give a brief exposition of some inadequacies of the Riemann integral and how they motivate the Lebesgue integral.

Let f : [a, b] → R be a bounded function (for the moment we impose no other conditions on f).

A partition ∆ of [a, b] is a finite set of points $\Delta = \{x_0, x_1, x_2, \ldots, x_n\}$ with

$$a = x_0 < x_1 < x_2 < \cdots < x_n = b.$$

In other words, we are dividing [a, b] up into subintervals. We then form the upper and lower Riemann sums

$$U(f, \Delta) = \sum_{i=0}^{n-1} \sup_{x \in [x_i, x_{i+1}]} f(x)\,(x_{i+1} - x_i),$$

$$L(f, \Delta) = \sum_{i=0}^{n-1} \inf_{x \in [x_i, x_{i+1}]} f(x)\,(x_{i+1} - x_i).$$

The idea is then that if we make the subintervals in the partition small, these sums will be a good approximation to our intuitive notion of the integral of f over [a, b] as the area bounded by the graph of f. More precisely, if

$$\inf_{\Delta} U(f, \Delta) = \sup_{\Delta} L(f, \Delta),$$

where the infimum and supremum are taken over all possible partitions of [a, b], then we write

$$\int_a^b f(x)\,dx$$

for their common value and call it the (Riemann) integral of f between those limits. We also say that f is Riemann integrable.
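For a monotone function the supremum and infimum on each subinterval are attained at the endpoints, which makes U(f, ∆) and L(f, ∆) easy to compute exactly. The Python sketch below assumes that f is increasing and that ∆ is the uniform partition into n pieces (both simplifications of ours, not part of the general definition), and uses `fractions.Fraction` so that no rounding occurs. For $f(x) = x^2$ on [0, 1] one gets $U - L = 1/n$, consistent with $\inf_\Delta U = \sup_\Delta L = 1/3$.

```python
from fractions import Fraction


def riemann_sums_increasing(f, a, b, n: int):
    """Upper and lower Riemann sums U(f, Delta), L(f, Delta) for the uniform partition
    of [a, b] into n subintervals, assuming f is increasing, so that the sup (inf)
    of f over [x_i, x_{i+1}] is f(x_{i+1}) (respectively f(x_i))."""
    a, b = Fraction(a), Fraction(b)
    h = (b - a) / n
    xs = [a + i * h for i in range(n + 1)]      # x_0 = a < x_1 < ... < x_n = b
    upper = sum(f(xs[i + 1]) * (xs[i + 1] - xs[i]) for i in range(n))
    lower = sum(f(xs[i]) * (xs[i + 1] - xs[i]) for i in range(n))
    return upper, lower


square = lambda x: x * x
for n in (4, 40, 400):
    U, L = riemann_sums_increasing(square, 0, 1, n)
    print(n, U, L, U - L)    # here U - L = 1/n, and both sums tend to 1/3
```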

The class of Riemann integrable functions includes continuous functions and step functions (i.e. finite linear combinations of characteristic functions of intervals).

However, there are many functions for which one wishes to define an integral but which are not Riemann integrable, making the theory rather unsatisfactory. For example, define f : [0, 1] → R by

$$f(x) = \chi_{\mathbb{Q} \cap [0,1]}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q}, \\ 0 & \text{otherwise.} \end{cases}$$

Since between any two distinct real numbers we can find both a rational number and an irrational number, given 0 ≤ y < z ≤ 1, we can find y < x < z with f(x) = 1 and y < x′ < z with f(x′) = 0. Hence for any partition $\Delta = \{x_0, x_1, \ldots, x_n\}$ of [0, 1], we have

$$U(f, \Delta) = \sum_{i=0}^{n-1} (x_{i+1} - x_i) = 1, \qquad L(f, \Delta) = 0.$$

Taking the infimum and supremum, respectively, over all partitions ∆ gives $\inf_\Delta U(f, \Delta) = 1 \neq 0 = \sup_\Delta L(f, \Delta)$, so f is not Riemann integrable.

Why does Riemann integration not work for the above function and how could we go about improving it? Let us look again at (and slightly rewrite) the formulæ for U(f, ∆) and L(f, ∆). We have

$$U(f, \Delta) = \sum_{i=0}^{n-1} \sup_{x \in [x_i, x_{i+1}]} f(x)\,\lambda([x_i, x_{i+1}])$$

and

$$L(f, \Delta) = \sum_{i=0}^{n-1} \inf_{x \in [x_i, x_{i+1}]} f(x)\,\lambda([x_i, x_{i+1}]),$$

where, for an interval [y, z],

$$\lambda([y, z]) = z - y$$

denotes its length. In the example above, things did not work because dividing [0, 1] into intervals (no matter how small) did not ‘separate out’ the different values that f could take. But suppose we had a notion of ‘length’ that worked for more general sets than intervals. Then we could do better by considering more complicated ‘partitions’ of [0, 1], where by partition we now mean a collection of subsets $\{E_1, \ldots, E_m\}$ of [0, 1] such that $E_i \cap E_j = \emptyset$ if $i \neq j$, and

$$\bigcup_{i=1}^{m} E_i = [0, 1].$$

In the example, for instance, it might be reasonable to write

$$\int_0^1 f(x)\,dx = 1 \times \lambda([0, 1] \cap \mathbb{Q}) + 0 \times \lambda([0, 1] \setminus \mathbb{Q}) = \lambda([0, 1] \cap \mathbb{Q}).$$

Instead of using subintervals, the Lebesgue integral uses a much wider class of subsets (namely, sets in a given σ-algebra) together with a notion of ‘generalised length’ (namely, measure).



2. More on uniform distribution mod 1. Measure spaces

§2.1 Uniform distribution of sequences in $\mathbb{R}^k$

We shall now look at the uniform distribution of sequences in $\mathbb{R}^k$. We will say that a sequence $x_n = (x_{n,1}, \ldots, x_{n,k}) \in \mathbb{R}^k$ is uniformly distributed mod 1 if, given any k-dimensional box (a product of k intervals), the frequency with which the fractional parts of $x_n$ lie in the box is equal to its k-dimensional volume.

Definition. A sequence $x_n = (x_{n,1}, \ldots, x_{n,k}) \in \mathbb{R}^k$ is said to be uniformly distributed mod 1 if, for each choice of k intervals $[a_1, b_1], \ldots, [a_k, b_k] \subset [0, 1]$, we have that

$$\lim_{n\to\infty} \frac{1}{n} \operatorname{card}\bigl\{j \in \{0, 1, \ldots, n-1\} \mid (\{x_{j,1}\}, \ldots, \{x_{j,k}\}) \in [a_1, b_1] \times \cdots \times [a_k, b_k]\bigr\} = (b_1 - a_1) \cdots (b_k - a_k).$$

We have the following criterion for uniform distribution.

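Theorem 2.1.1 (Multi-dimensional Weyl’s Criterion)
Let $x_n = (x_{n,1}, \ldots, x_{n,k}) \in \mathbb{R}^k$. The following are equivalent:

(i) the sequence $x_n \in \mathbb{R}^k$ is uniformly distributed mod 1;

(ii) for any continuous function $f : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}$ we have

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(\{x_{j,1}\}, \ldots, \{x_{j,k}\}) = \int_0^1 \cdots \int_0^1 f(x_1, \ldots, x_k)\,dx_1 \ldots dx_k;$$

(iii) for all $\ell = (\ell_1, \ldots, \ell_k) \in \mathbb{Z}^k \setminus \{0\}$ we have

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i (\ell_1 x_{j,1} + \cdots + \ell_k x_{j,k})} = 0.$$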

Remark. Here and throughout $0 \in \mathbb{Z}^k$ denotes the zero vector (0, . . . , 0).

Remark. In §1 we commented that, topologically, the quotient group R/Z is a circle. More generally, the quotient group $\mathbb{R}^k/\mathbb{Z}^k$ is a k-dimensional torus.

Remark. Consider the case when k = 2 so that $\mathbb{R}^2/\mathbb{Z}^2$ is the 2-dimensional torus. We can regard $\mathbb{R}^2/\mathbb{Z}^2$ as the square [0, 1] × [0, 1] with the top and bottom sides identified and the left and right sides identified. Thus a continuous function $f : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}$ has the property that f(0, y) = f(1, y) and f(x, 0) = f(x, 1). More generally, we can identify the k-dimensional torus $\mathbb{R}^k/\mathbb{Z}^k$ with $[0, 1]^k$ with $(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_k)$ and $(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_k)$ identified, 1 ≤ i ≤ k. A continuous function $f : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}$ then corresponds to a continuous function $f : [0, 1]^k \to \mathbb{R}$ such that

$$f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_k) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_k)$$

for each i, 1 ≤ i ≤ k.



Proof of Theorem 2.1.1. The proof of Theorem 2.1.1 is essentially the same as in the case k = 1. ✷

§2.2 The sequence $x_n = (\alpha_1 n, \ldots, \alpha_k n)$

We shall apply Theorem 2.1.1 to the sequence $x_n = (\alpha_1 n, \ldots, \alpha_k n)$, for real numbers $\alpha_1, \ldots, \alpha_k$.

Definition. Real numbers $\beta_1, \ldots, \beta_s \in \mathbb{R}$ are said to be rationally independent if the only rationals $r_1, \ldots, r_s \in \mathbb{Q}$ such that

$$r_1 \beta_1 + \cdots + r_s \beta_s = 0$$

are $r_1 = \cdots = r_s = 0$.

Proposition 2.2.1
Let $\alpha_1, \ldots, \alpha_k \in \mathbb{R}$. Then the following are equivalent:

(i) the sequence $x_n = (\alpha_1 n, \ldots, \alpha_k n) \in \mathbb{R}^k$ is uniformly distributed mod 1;

(ii) $\alpha_1, \ldots, \alpha_k$ and 1 are rationally independent.

Proof. The proof is similar to the discussion in §1.2.1 and we leave it as an exercise. (See Exercise 2.1.) ✷

Remark. Note that in the case k = 1, Proposition 2.2.1 reduces to the results of §1.2.1. To see this, note that α, 1 are rationally dependent if and only if there exist rationals r, s (not both zero) such that rα + s = 0. This holds if and only if α is rational. Hence α, 1 are rationally independent if and only if α is irrational.
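Since $\ell_1 \alpha_1 n + \cdots + \ell_k \alpha_k n = (\ell_1 \alpha_1 + \cdots + \ell_k \alpha_k)\,n$, each exponential sum in Theorem 2.1.1(iii) for this sequence is a one-variable geometric progression, as in §1.2.1. The Python sketch below (function name hypothetical) evaluates these averages numerically: for $\alpha = (\sqrt{2}, \sqrt{3})$, where $\sqrt{2}, \sqrt{3}, 1$ are rationally independent, they are small, while for $\alpha = (\sqrt{2}, 1 + \sqrt{2})$ the relation $(1 + \sqrt{2}) - \sqrt{2} - 1 = 0$ makes the $\ell = (-1, 1)$ average identically 1.

```python
import cmath
import math

TWO_PI_I = 2j * math.pi   # the complex number 2*pi*i


def weyl_average_nd(alphas, ells, n: int) -> complex:
    """(1/n) * sum_{j=0}^{n-1} exp(2*pi*i*(l_1 x_{j,1} + ... + l_k x_{j,k}))
    for the sequence x_j = (alpha_1 j, ..., alpha_k j)."""
    rate = sum(l * a for l, a in zip(ells, alphas))   # l_1 alpha_1 + ... + l_k alpha_k
    total = 0j
    for j in range(n):
        total += cmath.exp(TWO_PI_I * ((rate * j) % 1.0))
    return total / n


s2, s3 = math.sqrt(2), math.sqrt(3)
print(abs(weyl_average_nd((s2, s3), (1, -1), 10_000)))        # rationally independent case
print(abs(weyl_average_nd((s2, 1 + s2), (-1, 1), 10_000)))    # dependent case: every term is (numerically) 1
```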

§2.3 Weyl’s Theorem on Polynomials

We have seen that $\alpha n + \beta$ is uniformly distributed mod 1 if α is irrational. Weyl’s Theorem generalises this to polynomials of higher degree. Write

$$p(n) = \alpha_k n^k + \alpha_{k-1} n^{k-1} + \cdots + \alpha_1 n + \alpha_0.$$

Theorem 2.3.1 (Weyl’s Theorem on Polynomials)
If any one of $\alpha_1, \ldots, \alpha_k$ is irrational then p(n) is uniformly distributed mod 1.
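Theorem 2.3.1 can be probed numerically through Weyl’s Criterion. The Python sketch below (function name hypothetical; coefficients passed lowest degree first) computes $\frac{1}{n}\sum_{j=0}^{n-1} e^{2\pi i \ell p(j)}$. For $p(n) = \sqrt{2}\,n^2 + 0.3$ the theorem predicts that these averages tend to 0 for every $\ell \neq 0$; for $p(n) = n^2/2$, where no coefficient is irrational, the $\ell = 2$ average equals 1 for every n because $2p(j) = j^2$ is an integer. Floating-point evaluation of $p(j)$ is only trustworthy while $|p(j)|$ stays well below $2^{53}$, which is why the sketch keeps n moderate.

```python
import cmath
import math

TWO_PI_I = 2j * math.pi   # the complex number 2*pi*i


def poly_weyl_average(coeffs, ell: int, n: int) -> complex:
    """(1/n) * sum_{j=0}^{n-1} exp(2*pi*i*ell*p(j)), where p(j) = sum_i coeffs[i] * j**i."""
    total = 0j
    for j in range(n):
        value = sum(c * j ** i for i, c in enumerate(coeffs))
        total += cmath.exp(TWO_PI_I * ((ell * value) % 1.0))   # reduce the phase mod 1
    return total / n


# p(n) = sqrt(2) n^2 + 0.3: irrational leading coefficient, average expected to be small
print(abs(poly_weyl_average([0.3, 0.0, math.sqrt(2)], 1, 20_000)))
# p(n) = n^2 / 2: all coefficients rational; with ell = 2 every term equals 1
print(abs(poly_weyl_average([0.0, 0.0, 0.5], 2, 20_000)))
```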

To prove this theorem we shall need the following technical result.

Lemma 2.3.2 (van der Corput’s Inequality)
Let $z_0, \ldots, z_{n-1} \in \mathbb{C}$ and let $1 < m < n$. Then

$$m^2 \left| \sum_{j=0}^{n-1} z_j \right|^2 \le m(n + m - 1) \sum_{j=0}^{n-1} |z_j|^2 + 2(n + m - 1) \operatorname{Re} \sum_{j=1}^{m-1} (m - j) \sum_{i=0}^{n-1-j} z_{i+j} \overline{z_i}.$$



Proof (not examinable). The proof is essentially an exercise in multiplying out a product and some careful book-keeping of the cross-terms. You are familiar with a particular case of it, namely the fact that

$$|z_0 + z_1|^2 = (z_0 + z_1)\overline{(z_0 + z_1)} = |z_0|^2 + |z_1|^2 + z_0 \overline{z_1} + \overline{z_0} z_1 = |z_0|^2 + |z_1|^2 + 2 \operatorname{Re}(z_0 \overline{z_1}).$$

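Construct the following parallelogram, in which $z_k$ always sits in the kth column:

$$\begin{array}{ccccccc}
z_0 & & & & & & \\
z_0 & z_1 & & & & & \\
z_0 & z_1 & z_2 & & & & \\
\vdots & \vdots & \vdots & \ddots & & & \\
z_0 & z_1 & z_2 & \cdots & z_{m-1} & & \\
 & z_1 & z_2 & \cdots & z_{m-1} & z_m & \\
 & & z_2 & \cdots & z_{m-1} & z_m & z_{m+1} \\
 & & & \ddots & & & \ddots \\
 & & & & z_{n-m} & \cdots & z_{n-1} \\
 & & & & & z_{n-m+1} \cdots & z_{n-1} \\
 & & & & & \ddots & \vdots \\
 & & & & & z_{n-2} & z_{n-1} \\
 & & & & & & z_{n-1}
\end{array}$$

(There are n columns, with each column containing m terms, and n + m − 1 rows: the jth row, $0 \le j \le n + m - 2$, consists of the $z_k$ with $\max(0, j - m + 1) \le k \le \min(j, n - 1)$.) Let $s_j$, $0 \le j \le n + m - 2$, denote the sum of the terms in the jth row. Each $z_i$ occurs in exactly m of the row sums $s_j$. Hence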

$$s_0 + \cdots + s_{n+m-2} = m(z_0 + \cdots + z_{n-1})$$

so that

$$\begin{aligned}
m^2 \left| \sum_{j=0}^{n-1} z_j \right|^2 &= |s_0 + \cdots + s_{n+m-2}|^2 \\
&\le \bigl(|s_0| + \cdots + |s_{n+m-2}|\bigr)^2 \\
&\le (n + m - 1)\bigl(|s_0|^2 + \cdots + |s_{n+m-2}|^2\bigr),
\end{aligned}$$

where the final inequality follows from the (n + m − 1)-dimensional Cauchy-Schwarz inequality.

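Recall that $|s_j|^2 = s_j \overline{s_j}$. Expanding out this product and recalling that $2 \operatorname{Re}(z) = z + \overline{z}$ we have that

$$|s_j|^2 = \sum_k |z_k|^2 + 2 \operatorname{Re} \sum_{k, \ell} z_k \overline{z_\ell}$$

where the first sum is over all indices k of the $z_i$ occurring in the definition of $s_j$, and the second sum is over the pairs of indices $\ell < k$ of the $z_i$ occurring in the definition of $s_j$.

Noting that the number of times the term $z_k \overline{z_\ell}$ (with $\ell < k$) occurs in $|s_0|^2 + \cdots + |s_{n+m-2}|^2$ is equal to $m - (k - \ell)$ when $k - \ell < m$ (and is 0 otherwise), we can write

$$|s_0|^2 + \cdots + |s_{n+m-2}|^2 = m \sum_{j=0}^{n-1} |z_j|^2 + 2 \operatorname{Re} \sum_{j=1}^{m-1} (m - j) \sum_{i=0}^{n-1-j} z_{i+j} \overline{z_i}$$

and the result follows. ✷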
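Since the proof above is pure book-keeping, it can be checked numerically. The Python sketch below (helper names ours) builds the row sums $s_j$ of the parallelogram, evaluates the cross term $\sum_{j=1}^{m-1}(m - j)\sum_{i=0}^{n-1-j} z_{i+j}\overline{z_i}$, and compares both sides of Lemma 2.3.2 for random complex $z_i$. This is a sanity check up to floating-point tolerance, not a proof.

```python
import random


def row_sums(z, m):
    """Row sums s_0, ..., s_{n+m-2} of the parallelogram: row j holds the z_k
    with max(0, j - m + 1) <= k <= min(j, n - 1)."""
    n = len(z)
    return [sum(z[k] for k in range(max(0, j - m + 1), min(j, n - 1) + 1))
            for j in range(n + m - 1)]


def cross_term(z, m):
    """sum_{j=1}^{m-1} (m - j) * sum_{i=0}^{n-1-j} z_{i+j} * conj(z_i)."""
    n = len(z)
    return sum((m - j) * sum(z[i + j] * z[i].conjugate() for i in range(n - j))
               for j in range(1, m))


def van_der_corput_sides(z, m):
    """Left- and right-hand sides of van der Corput's inequality (Lemma 2.3.2)."""
    n = len(z)
    lhs = m ** 2 * abs(sum(z)) ** 2
    rhs = (m * (n + m - 1) * sum(abs(w) ** 2 for w in z)
           + 2 * (n + m - 1) * cross_term(z, m).real)
    return lhs, rhs


rng = random.Random(1)
z = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(50)]
for m in (2, 5, 10, 49):
    lhs, rhs = van_der_corput_sides(z, m)
    assert lhs <= rhs * (1 + 1e-12) + 1e-9     # allow for rounding error
```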



Let $x_n \in \mathbb{R}$. For each m ≥ 1 define the sequence $x^{(m)}_n = x_{n+m} - x_n$ to be the sequence of mth differences. The following lemma allows us to infer the uniform distribution of the sequence $x_n$ if we know the uniform distribution of each of the mth differences of $x_n$.

Lemma 2.3.3
Let $x_n \in \mathbb{R}$ be a sequence. Suppose that for each m ≥ 1 the sequence $x^{(m)}_n$ of mth differences is uniformly distributed mod 1. Then $x_n$ is uniformly distributed mod 1.

Proof. We shall apply Weyl’s Criterion. We need to show that if $\ell \in \mathbb{Z} \setminus \{0\}$ then

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = 0.$$

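Let $z_j = e^{2\pi i \ell x_j}$ for $j = 0, \ldots, n-1$. Note that $|z_j| = 1$ and $z_{i+j} \overline{z_i} = e^{2\pi i \ell (x_{i+j} - x_i)}$. Let $1 < m < n$. By van der Corput’s inequality, after dividing by $n^2$,

$$\begin{aligned}
\frac{m^2}{n^2} \left| \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right|^2 &\le \frac{m}{n^2}(n + m - 1)n + \frac{2(n + m - 1)}{n} \operatorname{Re} \sum_{j=1}^{m-1} \frac{m - j}{n} \sum_{i=0}^{n-1-j} e^{2\pi i \ell (x_{i+j} - x_i)} \\
&= \frac{m}{n}(m + n - 1) + \frac{2(n + m - 1)}{n} \operatorname{Re} \sum_{j=1}^{m-1} (m - j) A_{n,j}
\end{aligned}$$

where

$$A_{n,j} = \frac{1}{n} \sum_{i=0}^{n-1-j} e^{2\pi i \ell (x_{i+j} - x_i)} = \frac{1}{n} \sum_{i=0}^{n-1-j} e^{2\pi i \ell x^{(j)}_i}.$$

As the sequence $x^{(j)}_i$ of jth differences is uniformly distributed mod 1, by Weyl’s Criterion (and since omitting the last j terms changes the average by at most j/n) we have that $A_{n,j} \to 0$ as $n \to \infty$ for each $j = 1, \ldots, m - 1$. Hence for each fixed m > 1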

$$\limsup_{n\to\infty} \frac{m^2}{n^2} \left| \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right|^2 \le \limsup_{n\to\infty} \frac{m(n + m - 1)}{n} = m.$$

Hence, for each m > 1 we have

$$\limsup_{n\to\infty} \frac{1}{n} \left| \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} \right| \le \frac{1}{\sqrt{m}}.$$

As m > 1 is arbitrary, the result follows. ✷

Proof of Weyl’s Theorem. We will only prove Weyl’s Theorem on Polynomials (Theorem 2.3.1) in the special case where the leading coefficient $\alpha_k$ of

$$p(n) = \alpha_k n^k + \cdots + \alpha_1 n + \alpha_0$$

is irrational. (The general case, where $\alpha_i$ is irrational for some $1 \le i \le k$, can be deduced easily from this special case and we leave this as an exercise. See Exercise 2.2.)

We shall use induction on the degree of p. Let ∆(k) denote the statement ‘for every polynomial p of degree ≤ k, with irrational leading coefficient, the sequence p(n) is uniformly distributed mod 1’. We know that ∆(1) is true; this follows immediately from Exercise 1.2.



Suppose that ∆(k − 1) is true. Let $p(n) = \alpha_k n^k + \cdots + \alpha_1 n + \alpha_0$ be any polynomial of degree k with $\alpha_k$ irrational. Let $m \in \mathbb{N}$ and consider the sequence $p^{(m)}(n) = p(n + m) - p(n)$ of mth differences. We have that

$$\begin{aligned}
p^{(m)}(n) &= p(n + m) - p(n) \\
&= \alpha_k (n + m)^k + \alpha_{k-1} (n + m)^{k-1} + \cdots + \alpha_1 (n + m) + \alpha_0 \\
&\quad - \alpha_k n^k - \alpha_{k-1} n^{k-1} - \cdots - \alpha_1 n - \alpha_0 \\
&= \alpha_k n^k + \alpha_k k n^{k-1} m + \cdots + \alpha_{k-1} n^{k-1} + \alpha_{k-1} (k - 1) n^{k-2} m \\
&\quad + \cdots + \alpha_1 n + \alpha_1 m + \alpha_0 - \alpha_k n^k - \alpha_{k-1} n^{k-1} - \cdots - \alpha_1 n - \alpha_0.
\end{aligned}$$

After cancellation, we can see that, for each m, $p^{(m)}(n)$ is a polynomial of degree k − 1 with leading coefficient $\alpha_k k m$, which is irrational since km is a non-zero integer. Therefore, by the inductive hypothesis, $p^{(m)}(n)$ is uniformly distributed mod 1. We may now apply Lemma 2.3.3 to conclude that p(n) is uniformly distributed mod 1 and so ∆(k) holds. This completes the induction. ✷
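The expansion of $p^{(m)}(n)$ can be carried out mechanically with the binomial theorem. The Python sketch below (function name hypothetical; coefficients listed lowest degree first, and assumed to be integers or `Fraction`s so that the arithmetic is exact) returns the coefficients of $p(n + m) - p(n)$ and confirms on an example that the $n^{k-1}$ coefficient is $\alpha_k k m$.

```python
from math import comb


def mth_difference(coeffs, m):
    """Coefficients of p(n + m) - p(n), where p(n) = sum_i coeffs[i] * n**i.

    Uses (n + m)**i = sum_{r=0}^{i} C(i, r) * m**(i - r) * n**r. The n**k terms
    cancel, so the last entry of the result is always 0.
    """
    result = [0] * len(coeffs)
    for i, a in enumerate(coeffs):
        for r in range(i + 1):
            result[r] += a * comb(i, r) * m ** (i - r)   # a * (n + m)**i
        result[i] -= a                                   # minus the a * n**i term of p(n)
    return result


# p(n) = 5n^3 + 3n^2 + 2n + 1 and m = 2: p(n + 2) - p(n) = 30n^2 + 72n + 56,
# whose leading coefficient is alpha_k * k * m = 5 * 3 * 2.
assert mth_difference([1, 2, 3, 5], 2) == [56, 72, 30, 0]
```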

§2.4 Measures and the Lebesgue integral

You may have seen the definition of Lebesgue measure, Lebesgue outer measure and the Lebesgue integral in other courses, for example in Fourier Analysis and Lebesgue Integration. The theory developed in that course is one particular example of a more general theory, which we sketch here. Measure theory is a key technical tool in ergodic theory, and so a good knowledge of measures and integration is essential for this course (although we will not need to know the (many) technical intricacies).

§2.4.1 Measure spaces

Loosely speaking, a measure is a function that, when given a subset of a space X, will say how ‘big’ that subset is. A motivating example is given by Lebesgue measure on [0, 1]. The Lebesgue measure of an interval [a, b] is given by its length b − a. In defining an abstract measure space, we will be taking the properties of ‘length’ (or, in higher dimensions, ‘volume’) and abstracting them, in much the same way that a metric space abstracts the properties of ‘distance’.

It turns out that in general it is not possible to define the measure of an arbitrary subset of X. Instead, we will usually have to restrict our attention to a class of subsets of X.

Definition. A collection B of subsets of X is called a σ-algebra if the following properties hold:

(i) ∅ ∈ B,

(ii) if E ∈ B then its complement X \ E ∈ B,

(iii) if $E_n \in \mathcal{B}$, n = 1, 2, 3, . . ., is a countable sequence of sets in B then their union $\bigcup_{n=1}^{\infty} E_n \in \mathcal{B}$.

Definition. If X is a set and B a σ-algebra of subsets of X then we call (X, B) a measurable space.

Examples.



1. The trivial σ-algebra is given by B = {∅,X}.

2. The full σ-algebra is given by B = P(X), i.e. the collection of all subsets of X.

Remark. In general, the trivial σ-algebra is too small and the full σ-algebra is too big. We shall see some more interesting examples of σ-algebras later.

Here are some easy properties of σ-algebras:

Lemma 2.4.1
Let B be a σ-algebra of subsets of X. Then

(i) X ∈ B;

(ii) if $E_n \in \mathcal{B}$ for n = 1, 2, 3, . . ., then $\bigcap_{n=1}^{\infty} E_n \in \mathcal{B}$.
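On a finite set X, a countable family of subsets contains only finitely many distinct sets. Axiom (iii) therefore reduces to closure under pairwise unions, and the axioms can be checked mechanically. The sketch below uses a hypothetical helper, `is_sigma_algebra`, introduced here for illustration. It checks the two examples above and a collection that fails axiom (ii). For the full σ-algebra it also confirms the properties in Lemma 2.4.1; this is a check on one example, not a proof.

```python
from itertools import chain, combinations

def is_sigma_algebra(X, B):
    """Check the sigma-algebra axioms for a collection B of subsets of a FINITE set X.

    On a finite set only finitely many distinct sets can occur in a countable
    union, so axiom (iii) reduces to closure under pairwise unions.
    """
    X = frozenset(X)
    B = {frozenset(E) for E in B}
    if frozenset() not in B:                        # axiom (i)
        return False
    if any(X - E not in B for E in B):              # axiom (ii): complements
        return False
    return all(E | F in B for E in B for F in B)    # axiom (iii), finite case

def power_set(X):
    X = list(X)
    subsets = chain.from_iterable(combinations(X, r) for r in range(len(X) + 1))
    return [frozenset(s) for s in subsets]

X = {1, 2, 3}
full = power_set(X)
print(is_sigma_algebra(X, [set(), X]))          # trivial sigma-algebra: True
print(is_sigma_algebra(X, full))                # full sigma-algebra: True
print(is_sigma_algebra(X, [set(), {1}, X]))     # False: the complement {2, 3} is missing
# Lemma 2.4.1 for the full sigma-algebra: X and all intersections belong to it
print(frozenset(X) in full and all(E & F in full for E in full for F in full))  # True
```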

In the special case when X is a compact metric space there is a particularly important σ-algebra.

Definition. Let X be a compact metric space. We define the Borel σ-algebra B(X) to be the smallest σ-algebra of subsets of X which contains all the open subsets of X.

Remarks.

1. By ‘smallest’ we mean that if C is another σ-algebra that contains all open subsets of X then B(X) ⊂ C, that is:

$$\mathcal{B}(X) = \bigcap \{\mathcal{C} \mid \mathcal{C} \text{ is a } \sigma\text{-algebra that contains the open sets}\}.$$

2. We say that the Borel σ-algebra is generated by the open sets. We call a set in B(X) a Borel set.

3. By property (ii) in the definition of a σ-algebra (closure under complements), the Borel σ-algebra also contains all the closed sets, and it is the smallest σ-algebra with this property.

4. By Lemma 2.4.1 it follows that B(X) contains all countable intersections of open sets, all countable unions of countable intersections of open sets, all countable intersections of countable unions of countable intersections of open sets, etc., and indeed many other sets.

5. There are plenty of sets that are not Borel sets, although by necessity they are rather complicated. For example, consider R as an additive group and Q ⊂ R as a subgroup. Form the quotient group R/Q and choose an element in [0, 1] for each coset (this requires the Axiom of Choice). The set E of coset representatives is a non-Borel set.

6. In the case when X = [0, 1] or R/Z, the Borel σ-algebra is also the smallest σ-algebra that contains all sub-intervals.
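On a finite set, the ‘smallest σ-algebra containing a given collection’ can be computed directly (compare Remark 1 above). Start from the collection and keep adding complements and unions until nothing new appears. The following is a minimal sketch, assuming X is finite; the function name is hypothetical.

```python
def generated_sigma_algebra(X, generators):
    """Smallest sigma-algebra on the finite set X containing every set in `generators`.

    Start from the generators together with the empty set and X, then keep adding
    complements and pairwise unions until the collection stops growing. Every set
    added must lie in any sigma-algebra containing the generators, and the final
    collection is itself a sigma-algebra, so it is the smallest one.
    """
    X = frozenset(X)
    B = {frozenset(), X} | {frozenset(G) for G in generators}
    while True:
        new = {X - E for E in B} | {E | F for E in B for F in B}
        if new <= B:
            return B
        B |= new

B = generated_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])
print(len(B))   # 8: all unions of the 'atoms' {1}, {2} and {3, 4}
```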

Let X be a set and let B be a σ-algebra of subsets of X.

Definition. A function µ : B → [0, ∞) is called a (finite) measure if:

(i) µ(∅) = 0;



(ii) if $E_n$ is a countable collection of pairwise disjoint sets in B (i.e. $E_n \cap E_m = \emptyset$ for n ≠ m) then

$$\mu\left(\bigcup_{n=1}^{\infty} E_n\right) = \sum_{n=1}^{\infty} \mu(E_n).$$

We call (X, B, µ) a measure space. If µ(X) = 1 then we call µ a probability or probability measure and refer to (X, B, µ) as a probability space.

Remark. Thus a measure just abstracts properties of ‘length’ or ‘volume’. Condition (i) says that the empty set has zero length, and condition (ii) says that the length of a disjoint union is the sum of the lengths of the individual sets.
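For a finite set X with B = P(X), any choice of non-negative point masses gives a measure: µ(E) is the sum of the masses of the points of E. A disjoint family of subsets of a finite set has only finitely many non-empty members, so condition (ii) reduces to additivity over finite disjoint unions. The sketch below uses exact rational arithmetic, and the weights are an illustrative choice of ours.

```python
from fractions import Fraction

def measure_from_weights(weights):
    """Return mu with mu(E) = sum of weights[x] over x in E (finite set, B = all subsets)."""
    def mu(E):
        return sum((weights[x] for x in E), Fraction(0))
    return mu

weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
mu = measure_from_weights(weights)

E1, E2 = {"a"}, {"b", "c"}                  # pairwise disjoint sets
assert mu(set()) == 0                       # condition (i)
assert mu(E1 | E2) == mu(E1) + mu(E2)       # condition (ii) for a disjoint union
assert mu(set(weights)) == 1                # mu(X) = 1: a probability measure
```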

Definition. We say that a property holds almost everywhere if the set of points on which the property fails to hold has measure zero. We will often abbreviate this to ‘µ-a.e.’ or to ‘a.e.’ when the implied measure is clear from the context.

Example. We shall see (Exercise 2.9) that the set of rationals in [0, 1] forms a Borel set with zero Lebesgue measure. Thus Lebesgue almost every point in [0, 1] is irrational. (Thus, ‘typical’ points in [0, 1], in the sense of measure theory and with respect to Lebesgue measure, are irrational.)

We will usually be interested in studying measures on the Borel σ-algebra of a compact metric space X. To define such a measure, we need to define the measure of an arbitrary Borel set. In general, the Borel σ-algebra is extremely large. We shall see that it is often unnecessary to do this and instead it is sufficient to define the measure of a certain class of subsets.

§2.4.2 The Hahn-Kolmogorov Extension Theorem

A collection A of subsets of X is called an algebra if:

(i) ∅ ∈ A,

(ii) if $A_1, A_2, \ldots, A_n \in \mathcal{A}$ then $\bigcup_{j=1}^{n} A_j \in \mathcal{A}$,

(iii) if A ∈ A then Ac ∈ A.

Thus an algebra is like a σ-algebra, except that it is closed under finite unions and not necessarily closed under countable unions.

Example. Take X = [0, 1], and A = {all finite unions of subintervals}.
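Concretely, an element of this algebra can be stored as a finite list of intervals. The sketch below simplifies in a way that is our own choice: it uses only half-open intervals [a, b) ⊂ [0, 1) and ignores endpoints, which have zero length. It shows that finite unions and complements of such sets are again finite unions of intervals, which is the closure required of an algebra.

```python
from fractions import Fraction as F

def normalise(intervals):
    """Rewrite a finite union of half-open intervals [a, b) as a sorted list of disjoint ones."""
    out = []
    for a, b in sorted((a, b) for a, b in intervals if a < b):
        if out and a <= out[-1][1]:                 # overlaps or touches the previous interval
            out[-1] = (out[-1][0], max(out[-1][1], b))
        else:
            out.append((a, b))
    return out

def union(U, V):
    return normalise(U + V)

def complement(U):
    """Complement in [0, 1) of a finite union of half-open subintervals of [0, 1)."""
    gaps, start = [], F(0)
    for a, b in normalise(U):
        if start < a:
            gaps.append((start, a))
        start = b
    if start < 1:
        gaps.append((start, F(1)))
    return gaps

def length(U):
    return sum((b - a for a, b in normalise(U)), F(0))

U = [(F(0), F(1, 4)), (F(1, 8), F(1, 2)), (F(3, 4), F(7, 8))]
assert normalise(U) == [(F(0), F(1, 2)), (F(3, 4), F(7, 8))]
assert complement(U) == [(F(1, 2), F(3, 4)), (F(7, 8), F(1))]
assert length(union(U, complement(U))) == 1
```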

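Let B(A) denote the σ-algebra generated by A, i.e., the smallest σ-algebra containing A. More precisely:

$$\mathcal{B}(\mathcal{A}) = \bigcap \{\mathcal{C} \mid \mathcal{C} \text{ is a } \sigma\text{-algebra}, \ \mathcal{C} \supset \mathcal{A}\}.$$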

In the case when X = [0, 1] and A is the algebra of finite unions of intervals, we have that B(A) is the Borel σ-algebra. Indeed, in the special case of the Borel σ-algebra of a compact metric space X, it is usually straightforward to check whether an algebra generates the Borel σ-algebra.



Proposition 2.4.2
Let X be a compact metric space and let B be the Borel σ-algebra. Let A be an algebra of Borel subsets, A ⊂ B. Suppose that for every x1, x2 ∈ X, x1 ≠ x2, there exist disjoint open sets A1, A2 ∈ A such that x1 ∈ A1, x2 ∈ A2. Then A generates the Borel σ-algebra B.

The following result says that if we have a function which looks like a measure defined on an algebra, then it extends uniquely to a measure defined on the σ-algebra generated by the algebra.

Theorem 2.4.3 (Hahn-Kolmogorov Extension Theorem)
Let A be an algebra of subsets of X. Suppose that µ : A → [0, 1] satisfies:

(i) µ(∅) = 0;

(ii) if $A_n \in \mathcal{A}$, n ≥ 1, are pairwise disjoint and if $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$ then

$$\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n).$$

Then there is a unique probability measure µ : B(A) → [0, 1] which is an extension of µ : A → [0, 1].

Remarks.

(i) We will often use the Hahn-Kolmogorov Extension Theorem as follows. Take X = [0, 1] and take A to be the algebra consisting of all finite unions of subintervals of X. We then define the ‘measure’ µ of a subinterval in such a way as to be consistent with the hypotheses of the Hahn-Kolmogorov Extension Theorem. It then follows that µ does indeed define a measure on the Borel σ-algebra.

(ii) Here is another way in which we shall use the Hahn-Kolmogorov Extension Theorem. Suppose we have two measures, µ and ν, and we want to see if µ = ν. A priori we would have to check that µ(B) = ν(B) for all B ∈ B. The Hahn-Kolmogorov Extension Theorem says that it is sufficient to check that µ(A) = ν(A) for all A in an algebra A that generates B. For example, to show that two Borel probability measures on [0, 1] are equal, it is sufficient to show that they give the same measure to each subinterval.

(iii) There is a more general version of the Hahn-Kolmogorov Extension Theorem for the case when X does not have finite measure (indeed, this is the setting in which the Hahn-Kolmogorov Theorem is usually stated). Suppose that X is a set, B is a σ-algebra of subsets of X, and A is an algebra that generates B. Suppose that µ : A → R ∪ {∞} satisfies conditions (i) and (ii) of Theorem 2.4.3. Suppose in addition that there exist countably many sets $A_n \in \mathcal{A}$, n = 1, 2, 3, . . ., such that $X = \bigcup_{n=1}^{\infty} A_n$ and µ(A_n) < ∞ for every n. Then there exists a unique measure µ : B(A) → R ∪ {∞} which is an extension of µ : A → R ∪ {∞}.

A consequence of the proof (which we omit) of the Hahn-Kolmogorov Extension Theorem is that sets in B can be arbitrarily well approximated by sets in A in the following sense. We define the symmetric difference between two sets A, B by

A△B = (A \ B) ∪ (B \ A).

Thus, two sets are ‘close’ if their symmetric difference is small.



Proposition 2.4.4
Let µ be a probability measure on (X, B) and suppose that A is an algebra that generates the σ-algebra B. Let B ∈ B and let ε > 0. Then there exists A ∈ A such that µ(A△B) < ε.

Remark. It is straightforward to check that if µ(A△B) < ε then |µ(A) − µ(B)| < ε.
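One way to carry out this check uses only finite additivity and the finiteness of µ (for instance, µ a probability measure). Split A and B along A ∩ B:

$$\mu(A) - \mu(B) = \bigl(\mu(A \setminus B) + \mu(A \cap B)\bigr) - \bigl(\mu(B \setminus A) + \mu(A \cap B)\bigr) = \mu(A \setminus B) - \mu(B \setminus A),$$

so that

$$|\mu(A) - \mu(B)| \le \mu(A \setminus B) + \mu(B \setminus A) = \mu(A \triangle B) < \varepsilon,$$

where the middle equality holds because A \ B and B \ A are disjoint.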

§2.4.3 Examples of measure spaces

Lebesgue measure on [0, 1]. Take X = [0, 1] and take A to be the collection of all finite unions of subintervals of [0, 1]. For a subinterval [a, b] define

µ([a, b]) = b − a.

This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra B. This is Lebesgue measure.

Lebesgue measure on R/Z. Take X = R/Z and take A to be the collection of all finite unions of subintervals of [0, 1). For a subinterval [a, b] define

µ([a, b]) = b − a.

This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra B. This is Lebesgue measure on the circle.

Lebesgue measure on the k-dimensional torus. Take $X = \mathbb{R}^k/\mathbb{Z}^k$ and take A to be the collection of all finite unions of k-dimensional sub-cubes $\prod_{j=1}^{k} [a_j, b_j]$ of $[0, 1]^k$. For a sub-cube $\prod_{j=1}^{k} [a_j, b_j]$ of $[0, 1]^k$, define

$$\mu\left(\prod_{j=1}^{k} [a_j, b_j]\right) = \prod_{j=1}^{k} (b_j - a_j).$$

This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra B. This is Lebesgue measure on the torus.

Stieltjes measures.¹ Take X = [0, 1] and let ρ : [0, 1] → R+ be an increasing function such that ρ(1) − ρ(0) = 1. Take A to be the algebra of finite unions of subintervals and define

µρ([a, b]) = ρ(b) − ρ(a).

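This satisfies the hypotheses of the Hahn-Kolmogorov Extension Theorem, and so defines a measure on the Borel σ-algebra B. We call µρ the Stieltjes measure on [0, 1] determined by ρ. When ρ is differentiable, it is the derivative ρ′ that plays the role of a density (see the Stieltjes integral in §3.1.1).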
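As a small illustration, take the hypothetical choice ρ(x) = x², which is increasing on [0, 1] with ρ(1) − ρ(0) = 1. Because this ρ is continuous, single points have zero µρ-measure, so each interval can be recorded by its endpoints alone. Unlike Lebesgue measure, µρ gives the two halves of [0, 1] different masses:

```python
from fractions import Fraction as F

def rho(x):
    # Illustrative choice: increasing on [0, 1] with rho(1) - rho(0) = 1.
    return x * x

def mu_rho(intervals):
    """mu_rho of a finite union of pairwise disjoint subintervals, given by endpoints (a, b)."""
    return sum((rho(b) - rho(a) for a, b in intervals), F(0))

left, right = [(F(0), F(1, 2))], [(F(1, 2), F(1))]   # [0, 1/2) and [1/2, 1]
print(mu_rho(left), mu_rho(right))                     # 1/4 3/4
print(mu_rho(left + right))                            # 1
```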

Dirac measures. Finally, we give an example of a class of measures that do not fall into the above categories. Let X be an arbitrary space and let B be an arbitrary σ-algebra. Let x ∈ X. Define the measure δx by

$$\delta_x(A) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$$

Then δx defines a probability measure. It is called the Dirac measure at x.

¹ An approximate pronunciation of Stieltjes is ‘Steeel-tyuz’.



§2.5 Exercises

Exercise 2.1
Prove Proposition 2.2.1: let $\alpha_1, \ldots, \alpha_k \in \mathbb{R}$ and let $x_n = (\alpha_1 n, \ldots, \alpha_k n) \in \mathbb{R}^k$. Prove that $x_n$ is uniformly distributed mod 1 if and only if $\alpha_1, \ldots, \alpha_k, 1$ are rationally independent.

Exercise 2.2
Deduce the general case of Weyl’s Theorem on Polynomials (where at least one non-constant coefficient is irrational) from the special case proved above (where the leading coefficient is irrational).

Exercise 2.3
Let α be irrational. Show that $p(n) = \alpha n^2 + n + 1$ is uniformly distributed mod 1 by using Lemma 2.3.3 and Exercise 1.2: i.e. show that, for each m ≥ 1, the sequence $p^{(m)}(n) = p(n + m) - p(n)$ of mth differences is uniformly distributed mod 1.

Exercise 2.4
Let $p(n) = \alpha_k n^k + \alpha_{k-1} n^{k-1} + \cdots + \alpha_1 n + \alpha_0$ and $q(n) = \beta_k n^k + \beta_{k-1} n^{k-1} + \cdots + \beta_1 n + \beta_0$. Show that $(p(n), q(n)) \in \mathbb{R}^2$ is uniformly distributed mod 1 if, for some 1 ≤ i ≤ k, $\alpha_i$, $\beta_i$ and 1 are rationally independent.

Exercise 2.5
Prove Lemma 2.4.1.

Exercise 2.6
Let X = [0, 1]. Find the smallest σ-algebra that contains the sets [0, 1/4), [1/4, 1/2), [1/2, 3/4) and [3/4, 1].

Exercise 2.7
Let X = [0, 1] and let B denote the Borel σ-algebra. A dyadic interval is an interval of the form

$$\left[\frac{p_1}{2^k}, \frac{p_2}{2^k}\right], \qquad p_1, p_2 \in \{0, 1, \ldots, 2^k\}.$$

Show that the algebra formed by taking finite unions of all dyadic intervals (over all k ∈ N) generates the Borel σ-algebra.

Exercise 2.8
Show that A = {all finite unions of subintervals of [0, 1]} is an algebra.

Exercise 2.9
Let µ denote Lebesgue measure on [0, 1]. Show that for any x ∈ [0, 1] we have that µ({x}) = 0. Hence show that the Lebesgue measure of any countable set is zero.

Show that Lebesgue almost every point in [0, 1] is irrational.

Exercise 2.10
Let X = [0, 1]. Let $\mu = \delta_{1/2}$ denote the Dirac δ-measure at 1/2. Show that

µ ([0, 1/2) ∪ (1/2, 1]) = 0.

Conclude that µ-almost every point in [0, 1] is equal to 1/2.


MATH4/61112 3. Lebesgue integration. Invariant measures

3. Lebesgue integration. Invariant measures

§3.1 Lebesgue integration

Let (X, B, µ) be a measure space. We are interested in how to integrate functions defined on X with respect to the measure µ. In the special case when X = [0, 1], B is the Borel σ-algebra and µ is Lebesgue measure, this will extend the definition of the Riemann integral to a class of functions that are not Riemann integrable.

Definition. Let f : X → R be a function. If D ⊂ R then we define the pre-image of D to be the set $f^{-1}D = \{x \in X \mid f(x) \in D\}$.

A function f : X → R is measurable if $f^{-1}D \in \mathcal{B}$ for every Borel subset D of R. One can show that this is equivalent to requiring that $f^{-1}(-\infty, c) \in \mathcal{B}$ for all c ∈ R.

A function f : X → C is measurable if both the real and imaginary parts, Re f and Im f, are measurable.

Remark. In writing $f^{-1}D$, we are not assuming that f is a bijection. We are writing $f^{-1}D$ to denote the pre-image of the set D.

We define integration via simple functions.

Definition. A function f : X → R is simple if it can be written as a linear combination of characteristic functions of sets in B, i.e.:

$$f = \sum_{j=1}^{r} a_j \chi_{B_j},$$

for some $a_j \in \mathbb{R}$ and $B_j \in \mathcal{B}$, where the $B_j$ are pairwise disjoint.

Remarks.

(i) Note that the sets $B_j$ are sets in the σ-algebra B; even in the case when X = [0, 1] we do not assume that the sets $B_j$ are intervals.

(ii) For example, $\chi_{\mathbb{Q} \cap [0,1]}$ is a simple function. Note, however, that $\chi_{\mathbb{Q} \cap [0,1]}$ is not Riemann integrable.

For a simple function f : X → R we define

$$\int f \, d\mu = \sum_{j=1}^{r} a_j \, \mu(B_j). \qquad (3.1.1)$$

For example, if µ denotes Lebesgue measure on [0, 1] then

$$\int \chi_{\mathbb{Q} \cap [0,1]} \, d\mu = \mu(\mathbb{Q} \cap [0, 1]) = 0,$$



as Q ∩ [0, 1] is a countable set and so has Lebesgue measure zero.

A simple function can be written as a linear combination of characteristic functions of pairwise disjoint sets in many different ways (for example, $\chi_{[1/4,3/4]} = \chi_{[1/4,1/2)} + \chi_{[1/2,3/4]}$). However, one can show that the value of the integral of a simple function f given in (3.1.1) is independent of the choice of representation of f as a linear combination of characteristic functions. Geometrically, the integral of a simple function f can be regarded as the (signed) area of the region in X × R bounded by the graph of f.
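For Lebesgue measure on [0, 1], the defining formula for the integral of a simple function can be evaluated directly when each $B_j$ is an interval. The sketch below checks the example above: the two representations of $\chi_{[1/4,3/4]}$ give the same integral. The helper is hypothetical and, for simplicity, handles only interval sets.

```python
from fractions import Fraction as F

def integral_simple(terms):
    """Integral of f = sum_j a_j * chi_{B_j} against Lebesgue measure on [0, 1].

    Each term is (a_j, (lo, hi)) where B_j is an interval with endpoints lo <= hi,
    so that mu(B_j) = hi - lo; the B_j are assumed to be pairwise disjoint.
    """
    return sum((a * (hi - lo) for a, (lo, hi) in terms), F(0))

one_piece = [(1, (F(1, 4), F(3, 4)))]                              # chi_[1/4, 3/4]
two_pieces = [(1, (F(1, 4), F(1, 2))), (1, (F(1, 2), F(3, 4)))]    # chi_[1/4,1/2) + chi_[1/2,3/4]
print(integral_simple(one_piece), integral_simple(two_pieces))    # 1/2 1/2
```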

If f : X → R, f ≥ 0, is measurable then one can show that there exists an increasing sequence of simple functions $f_n$ such that $f_n \uparrow f$ pointwise¹ as n → ∞, and we define

$$\int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu.$$

This can be shown to exist (although it may be ∞) and to be independent of the choice of sequence $f_n$.
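To see the limit in action, here is a sketch for the illustrative choice f(x) = x² on [0, 1] with Lebesgue measure. It uses the standard simple approximations $f_n = \sum_k (k/2^n)\chi_{E_k}$, where $E_k = \{x : k/2^n \le f(x) < (k+1)/2^n\}$. This is one common choice of increasing sequence. The usual construction also caps $f_n$ at n, which changes nothing here because 0 ≤ f ≤ 1. Since this f is increasing, each $E_k$ is an interval with square-root endpoints, and the integrals increase towards ∫ f dµ = 1/3.

```python
import math

def level_set_measure(lo, hi):
    """Lebesgue measure of {x in [0, 1] : lo <= x**2 < hi}, for 0 <= lo <= hi <= 1."""
    return math.sqrt(hi) - math.sqrt(lo)

def integral_of_simple_approx(n):
    """Integral of f_n = sum_k (k / 2**n) * chi_{E_k}, with E_k = {k/2**n <= x**2 < (k+1)/2**n}."""
    step = 2.0 ** -n
    return sum(k * step * level_set_measure(k * step, (k + 1) * step)
               for k in range(2 ** n))

for n in (1, 4, 8, 12):
    print(n, integral_of_simple_approx(n))   # increases towards 1/3
```

Since 0 ≤ f − f_n < 2⁻ⁿ on [0, 1), each printed value is within 2⁻ⁿ of 1/3.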

For an arbitrary measurable function f : X → R, we write $f = f^+ - f^-$, where $f^+ = \max\{f, 0\} \ge 0$ and $f^- = \max\{-f, 0\} \ge 0$, and define

$$\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu.$$

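If $\int f^+ \, d\mu = \infty$ and $\int f^- \, d\mu$ is finite then we set $\int f \, d\mu = \infty$. Similarly, if $\int f^+ \, d\mu$ is finite but $\int f^- \, d\mu = \infty$ then we set $\int f \, d\mu = -\infty$. If both $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ are infinite then we leave $\int f \, d\mu$ undefined.

Finally, for a measurable function f : X → C, we define

$$\int f \, d\mu = \int \mathrm{Re} f \, d\mu + i \int \mathrm{Im} f \, d\mu.$$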

We say that f is integrable if $\int |f| \, d\mu < +\infty$.

(Note that, in the case of a measurable function f : X → R, saying that f is integrable is equivalent to saying that both $\int f^+ \, d\mu$ and $\int f^- \, d\mu$ are finite.)

Denote the space of C-valued integrable functions by $L^1(X, \mathcal{B}, \mu)$. (We shall see a slightly more sophisticated definition of this space below.)

Note that when we write $\int f \, d\mu$ we are implicitly integrating over the whole space X. We can define integration over subsets of X as follows.

Definition. Let (X, B, µ) be a probability space. Let $f \in L^1(X, \mathcal{B}, \mu)$ and let B ∈ B. Then $\chi_B f \in L^1(X, \mathcal{B}, \mu)$. We define

$$\int_B f \, d\mu = \int \chi_B f \, d\mu.$$

¹ $f_n \uparrow f$ pointwise means: for every x, $f_n(x)$ is an increasing sequence of real numbers and $\lim_{n \to \infty} f_n(x) = f(x)$.



§3.1.1 Examples

Lebesgue measure. Let X = [0, 1] and let µ denote Lebesgue measure on the Borel σ-algebra. If f : [0, 1] → R is Riemann integrable then it is also Lebesgue integrable and the two definitions agree. However, there are plenty of examples of functions which are Lebesgue integrable but not Riemann integrable. For example, take $f(x) = \chi_{\mathbb{Q} \cap [0,1]}(x)$ defined on [0, 1] to be the characteristic function of the rationals. Then f(x) = 0 µ-a.e. Hence f is integrable and $\int f \, d\mu = 0$. However, f is not Riemann integrable.

The Stieltjes integral. Let ρ : [0, 1] → R+ be as in the definition of the Stieltjes measure µρ (§2.4.3), and suppose that ρ is differentiable. Then one can show that

$$\int f \, d\mu_\rho = \int f(x) \rho'(x) \, dx.$$
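Here is a quick numerical sanity check of this formula, under the illustrative assumptions ρ(x) = x² (so ρ′(x) = 2x) and f(x) = x. The left-hand side is approximated by summing f at midpoints against the µρ-masses ρ(b) − ρ(a) of a fine partition. The right-hand side is approximated by the midpoint rule. Both should approach $\int_0^1 2x^2 \, dx = 2/3$.

```python
def stieltjes_sum(f, rho, n):
    """Approximate int f d(mu_rho) by sum_i f(m_i) * (rho(b_i) - rho(a_i)) over n equal cells."""
    total = 0.0
    for i in range(n):
        a, b = i / n, (i + 1) / n
        total += f((a + b) / 2) * (rho(b) - rho(a))
    return total

def density_integral(f, drho, n):
    """Midpoint-rule approximation of int_0^1 f(x) rho'(x) dx with n cells."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) * drho((i + 0.5) * h) for i in range(n))

def f(x):
    return x

def rho(x):
    return x * x

def drho(x):
    return 2 * x

print(stieltjes_sum(f, rho, 1000), density_integral(f, drho, 1000))  # both close to 2/3
```

For this particular ρ the two sums agree term by term, since ρ(b) − ρ(a) = (b − a)(a + b) = 2m(b − a), where m is the midpoint of the cell.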

Integration with respect to Dirac measures. Let x ∈ X. Recall that we defined the Dirac measure at x by

$$\delta_x(B) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases}$$

If $\chi_B$ denotes the characteristic function of B then

$$\int \chi_B \, d\delta_x = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases}$$

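Suppose that $f = \sum_j a_j \chi_{B_j}$ is a simple function and that, without loss of generality, the $B_j$ are pairwise disjoint. Then $\int f \, d\delta_x = a_j$, where j is chosen so that $x \in B_j$ (and the integral equals zero if no such $B_j$ exists); in other words, $\int f \, d\delta_x = f(x)$. Now let f : X → R be measurable. By choosing increasing sequences of simple functions converging to $f^+$ and $f^-$, we see that

$$\int f \, d\delta_x = f(x).$$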
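The computation for simple functions is easy to mirror in code. In this sketch the helper is hypothetical, and each $B_j$ is given as a membership test, which is an assumption made for illustration. Integrating f = 2χ[0,1/2) + 5χ[1/2,1] against δx simply returns the coefficient of the set containing x, i.e. f(x):

```python
def integral_simple_dirac(terms, x):
    """Integral of f = sum_j a_j * chi_{B_j} against delta_x, for pairwise disjoint B_j.

    Each term is (a_j, in_B_j), where in_B_j(t) says whether t lies in B_j.
    The integral is a_j for the unique j with x in B_j, and 0 if there is none.
    """
    for a, in_B in terms:
        if in_B(x):
            return a
    return 0

terms = [
    (2, lambda t: 0 <= t < 0.5),   # 2 * chi_[0, 1/2)
    (5, lambda t: 0.5 <= t <= 1),  # 5 * chi_[1/2, 1]
]
print(integral_simple_dirac(terms, 0.3), integral_simple_dirac(terms, 0.75))  # 2 5
print(integral_simple_dirac(terms, 1.5))  # 0: x lies in none of the B_j
```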

We say that two measurable functions f, g : X → C are equivalent or equal µ-a.e. if f = g µ-a.e., i.e. if µ({x ∈ X | f(x) ≠ g(x)}) = 0. The following result says that if two functions differ only on a set of measure zero then their integrals are equal.

Lemma 3.1.1
Suppose that $f, g \in L^1(X, \mathcal{B}, \mu)$ and f, g are equal µ-a.e. Then $\int f \, d\mu = \int g \, d\mu$.

Functions being equivalent is an equivalence relation. We shall write $L^1(X, \mathcal{B}, \mu)$ for the set of equivalence classes of integrable functions f : X → C on (X, B, µ). We define

$$\|f\|_1 = \int |f| \, d\mu.$$

Then $d(f, g) = \|f - g\|_1$ is a metric on $L^1(X, \mathcal{B}, \mu)$. One can show that $L^1(X, \mathcal{B}, \mu)$ is a vector space; indeed, it is complete in the $L^1$ metric, and so is a Banach space.

Remark. In practice, we will often abuse notation and regard elements of $L^1(X, \mathcal{B}, \mu)$ as functions rather than equivalence classes of functions. In general, in measure theory one can often ignore sets of measure zero and treat two objects (functions, sets, etc.) that differ only on a set of measure zero as ‘the same’.



More generally, for any p ≥ 1, we can define the space $L^p(X, \mathcal{B}, \mu)$ consisting of (equivalence classes of) measurable functions f : X → C such that $|f|^p$ is integrable. We can again define a metric on $L^p(X, \mathcal{B}, \mu)$ by defining $d(f, g) = \|f - g\|_p$, where

$$\|f\|_p = \left(\int |f|^p \, d\mu\right)^{1/p}$$

is the $L^p$ norm.

Apart from $L^1$, the most interesting $L^p$ space is $L^2(X, \mathcal{B}, \mu)$. This is a Hilbert space² with the inner product

$$\langle f, g \rangle = \int f \, \overline{g} \, d\mu.$$

The Cauchy-Schwarz inequality holds: $|\langle f, g \rangle| \le \|f\|_2 \|g\|_2$ for all $f, g \in L^2(X, \mathcal{B}, \mu)$. Suppose that µ is a finite measure. Applying the Cauchy-Schwarz inequality to |f| and the constant function 1 gives $\|f\|_1 = \langle |f|, 1 \rangle \le \|f\|_2 \, \mu(X)^{1/2}$, so that $L^2(X, \mathcal{B}, \mu) \subset L^1(X, \mathcal{B}, \mu)$.

In general, the Riemann integral does not behave well with respect to limits. For example, if $f_n$ is a sequence of Riemann integrable functions such that $f_n(x) \to f(x)$ at every point x then it does not follow that f is Riemann integrable. Even if f is Riemann integrable, it does not follow that $\int f_n(x) \, dx \to \int f(x) \, dx$. The following convergence theorems hold for the Lebesgue integral.

Theorem 3.1.2 (Monotone Convergence Theorem)
Suppose that $f_n : X \to \mathbb{R}$ is an increasing sequence of integrable functions on (X, B, µ). Suppose that $\int f_n \, d\mu$ is a bounded sequence of real numbers (i.e. there exists M > 0 such that $|\int f_n \, d\mu| \le M$ for all n). Then $f(x) = \lim_{n \to \infty} f_n(x)$ exists µ-a.e. Moreover, f is integrable and

$$\int f \, d\mu = \lim_{n \to \infty} \int f_n \, d\mu.$$

Theorem 3.1.3 (Dominated Convergence Theorem)
Suppose that g : X → R is integrable and that $f_n : X \to \mathbb{R}$ is a sequence of measurable functions with $|f_n(x)| \le g(x)$ µ-a.e. and $\lim_{n \to \infty} f_n(x) = f(x)$ µ-a.e. Then f is integrable and

$$\lim_{n \to \infty} \int f_n \, d\mu = \int f \, d\mu.$$

Remark. Both the Monotone Convergence Theorem and the Dominated Convergence Theorem fail for Riemann integration.

§3.2 Invariant measures

We are now in a position to study dynamical systems. Let (X, B, µ) be a probability space. Let T : X → X be a dynamical system. If B ∈ B then we define

$$T^{-1}B = \{x \in X \mid T(x) \in B\},$$

that is, $T^{-1}B$ is the pre-image of B under T.

² An inner product 〈·, ·〉 : H × H → C on a complex vector space H is a function such that: (i) 〈v, v〉 ≥ 0 for all v ∈ H, with equality if and only if v = 0, (ii) $\langle u, v \rangle = \overline{\langle v, u \rangle}$, and (iii) for each v ∈ H, u ↦ 〈u, v〉 is linear. An inner product determines a norm by setting $\|v\| = \langle v, v \rangle^{1/2}$. A norm determines a metric by setting d(u, v) = ‖u − v‖. We say that H is a Hilbert space if the vector space H is complete with respect to the metric induced from the inner product.



Remark. Note that we do not have to assume that T is a bijection for this definition to make sense. For example, let T(x) = 2x mod 1 be the doubling map on [0, 1]. Then T is not a bijection. One can easily check that, for example, $T^{-1}(0, 1/2) = (0, 1/4) \cup (1/2, 3/4)$.

Definition. A transformation T : X → X is said to be measurable if $T^{-1}B \in \mathcal{B}$ for all B ∈ B.

Remark. We will often work with compact metric spaces X equipped with the Borel σ-algebra. In this setting, any continuous transformation is measurable.

Remark. Suppose that A is an algebra of sets that generates the σ-algebra B. One can show that if $T^{-1}A \in \mathcal{B}$ for all A ∈ A then T is measurable.

Definition. We say that T is a measure-preserving transformation (m.p.t. for short) or, equivalently, that µ is a T-invariant measure, if $\mu(T^{-1}B) = \mu(B)$ for all B ∈ B.

§3.3 Using the Hahn-Kolmogorov Extension Theorem to prove invariance

Recall the Hahn-Kolmogorov Extension Theorem:

Theorem 3.3.1 (Hahn-Kolmogorov Extension Theorem)
Let A be an algebra of subsets of X and let B(A) denote the σ-algebra generated by A. Suppose that µ : A → [0, 1] satisfies:

(i) µ(∅) = 0;

(ii) if $A_n \in \mathcal{A}$, n ≥ 1, are pairwise disjoint and if $\bigcup_{n=1}^{\infty} A_n \in \mathcal{A}$ then

$$\mu\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} \mu(A_n).$$

Then there is a unique probability measure µ : B(A) → [0, 1] which is an extension of µ : A → [0, 1].

That is, if µ looks like a measure on an algebra A, then it extends uniquely to a measure defined on the σ-algebra B(A) generated by A.

Corollary 3.3.2
Let A be an algebra of subsets of X. Suppose that µ1 and µ2 are two measures on B(A) such that µ1(A) = µ2(A) for all A ∈ A. Then µ1 = µ2 on B(A).

We shall discuss several examples of dynamical systems and prove that certain naturally occurring measures are invariant using the Hahn-Kolmogorov Extension Theorem.

Suppose that (X, B, µ) is a probability space and suppose that T : X → X is measurable. We define a new measure T∗µ by

$$T_*\mu(B) = \mu(T^{-1}B) \qquad (3.3.1)$$

where B ∈ B. It is straightforward to check that T∗µ is a probability measure on (X, B) (see Exercise 3.4). Thus µ is a T-invariant measure if and only if T∗µ = µ, i.e. T∗µ and µ are the same measure. Corollary 3.3.2 says that if two measures agree on an algebra, then they agree on the σ-algebra generated by that algebra. Hence if we can show that T∗µ(A) = µ(A) for all sets A ∈ A, for some algebra A that generates B, then T∗µ = µ, and so µ is a T-invariant measure.
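On a finite set X with B = P(X), every set is a finite disjoint union of single points. It is therefore enough to compare T∗µ and µ on singletons, which play the role of the generating algebra. The minimal sketch below uses an illustrative three-point space and two hypothetical maps:

```python
from collections import defaultdict
from fractions import Fraction

def push_forward(mu, T):
    """T_* mu for a measure on a finite set given by point masses mu[x].

    (T_* mu)({y}) = mu(T^{-1}{y}) = sum of mu[x] over all x with T(x) = y.
    """
    nu = defaultdict(Fraction)
    for x, mass in mu.items():
        nu[T(x)] += mass
    return dict(nu)

def is_invariant(mu, T):
    nu = push_forward(mu, T)
    return all(nu.get(x, Fraction(0)) == mass for x, mass in mu.items())

uniform = {x: Fraction(1, 3) for x in range(3)}
print(is_invariant(uniform, lambda x: (x + 1) % 3))   # True: a cyclic permutation
print(is_invariant(uniform, lambda x: 0))             # False: all mass is pushed to 0
```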



§3.3.1 The doubling map

Let X = R/Z be the circle, B be the Borel σ-algebra, and let µ denote Lebesgue measure. Define the doubling map by T(x) = 2x mod 1.

Proposition 3.3.3
Let X = R/Z be the circle, B be the Borel σ-algebra, and let µ denote Lebesgue measure. Define the doubling map by T(x) = 2x mod 1. Then Lebesgue measure µ is T-invariant.

Proof. Let A denote the algebra of finite unions of intervals. For an interval [a, b] we have that

$$T^{-1}[a, b] = \{x \in \mathbb{R}/\mathbb{Z} \mid T(x) \in [a, b]\} = \left[\frac{a}{2}, \frac{b}{2}\right] \cup \left[\frac{a + 1}{2}, \frac{b + 1}{2}\right].$$

See Figure 3.3.1.

[Figure 3.3.1: The pre-image of an interval under the doubling map. The interval [a, b] has the two pre-images [a/2, b/2] and [(a + 1)/2, (b + 1)/2].]

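Hence

$$\begin{aligned}
T_*\mu([a, b]) &= \mu(T^{-1}[a, b]) \\
&= \mu\left(\left[\frac{a}{2}, \frac{b}{2}\right] \cup \left[\frac{a + 1}{2}, \frac{b + 1}{2}\right]\right) \\
&= \frac{b}{2} - \frac{a}{2} + \frac{b + 1}{2} - \frac{a + 1}{2} = b - a = \mu([a, b]).
\end{aligned}$$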

Hence T∗µ = µ on the algebra A. As A generates the Borel σ-algebra, by uniqueness in the Hahn-Kolmogorov Extension Theorem we see that T∗µ = µ. Hence Lebesgue measure is T-invariant. ✷
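The pre-image formula can be checked with exact rational arithmetic. In this sketch the helper names are ours. We assume 0 ≤ a ≤ b < 1 and represent points of R/Z by their representatives in [0, 1):

```python
from fractions import Fraction as F

def doubling(x):
    """The doubling map T(x) = 2x mod 1 on representatives in [0, 1)."""
    return (2 * x) % 1

def preimage_doubling(a, b):
    """T^{-1}[a, b] for 0 <= a <= b < 1, as the two intervals found in the proof."""
    return [(a / 2, b / 2), ((a + 1) / 2, (b + 1) / 2)]

a, b = F(1, 5), F(3, 4)
pieces = preimage_doubling(a, b)
print(sum(hi - lo for lo, hi in pieces) == b - a)      # True: total length is b - a

# Spot check on a grid: every point of each piece is mapped into [a, b].
print(all(a <= doubling(lo + (hi - lo) * k / 10) <= b
          for lo, hi in pieces for k in range(11)))     # True
```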

§3.3.2 Rotations on a circle

Let X = R/Z be the circle, let B be the Borel σ-algebra and let µ be Lebesgue measure. Fix α ∈ R. Define T : R/Z → R/Z by T(x) = x + α mod 1. We call T a rotation through angle α.



One can also regard R/Z as the unit circle K = {z ∈ C | |z| = 1} in the complex plane via the map $t \mapsto e^{2\pi i t}$. In these co-ordinates, the map T becomes $T(e^{2\pi i \theta}) = e^{2\pi i \alpha} e^{2\pi i \theta}$, which is a rotation about the origin through the angle 2πα.

Proposition 3.3.4
Let T : R/Z → R/Z, T(x) = x + α mod 1, be a circle rotation. Then Lebesgue measure is an invariant measure.

Proof. Let [a, b] ⊂ R/Z be an interval. By the Hahn-Kolmogorov Extension Theorem, if we can show that T∗µ([a, b]) = µ([a, b]) for every such interval then it follows that $\mu(T^{-1}B) = \mu(B)$ for all B ∈ B; hence µ is T-invariant.

Note that $T^{-1}[a, b] = [a - \alpha, b - \alpha]$, where we interpret the endpoints mod 1. (One needs to be careful here: if a − α < 0 < b − α then $T^{-1}([a, b]) = [0, b - \alpha] \cup [a - \alpha + 1, 1]$, etc.) Hence

$$T_*\mu([a, b]) = \mu([a - \alpha, b - \alpha]) = (b - \alpha) - (a - \alpha) = b - a = \mu([a, b]).$$

Hence T∗µ = µ on the algebra A. As A generates the Borel σ-algebra, by uniqueness in the Hahn-Kolmogorov Extension Theorem we see that T∗µ = µ. Hence Lebesgue measure is T-invariant. ✷
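The wrap-around case mentioned in the proof is where a direct computation needs care. The small sketch below returns $T^{-1}[a, b]$ as one or two intervals in [0, 1]; the helper name is hypothetical. We take a rational α only so that the arithmetic is exact. The length computation does not use irrationality.

```python
from fractions import Fraction as F

def rotation_preimage(a, b, alpha):
    """T^{-1}[a, b] for T(x) = x + alpha mod 1, assuming 0 <= a <= b < 1.

    Returns one interval, or two if [a - alpha, b - alpha] wraps around 0 mod 1.
    """
    lo, hi = (a - alpha) % 1, (b - alpha) % 1
    if lo <= hi:
        return [(lo, hi)]
    return [(F(0), hi), (lo, F(1))]          # wrapped: [0, hi] and [lo, 1]

def total_length(pieces):
    return sum(hi - lo for lo, hi in pieces)

alpha = F(1, 3)
for a, b in [(F(1, 2), F(3, 4)), (F(1, 10), F(1, 2))]:
    pieces = rotation_preimage(a, b, alpha)
    print(len(pieces), total_length(pieces) == b - a)   # 1 True, then 2 True
```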

§3.3.3 The Gauss map

Let X = [0, 1] be the unit interval and let B be the Borel σ-algebra. Define the Gauss map T : X → X by

$$T(x) = \begin{cases} \dfrac{1}{x} \bmod 1 & \text{if } x \neq 0, \\ 0 & \text{if } x = 0. \end{cases}$$

See Figure 3.3.2.

[Figure 3.3.2: The graph of the Gauss map (note that there are, in fact, infinitely many branches to the graph; only the first 5 are illustrated).]

The Gauss map is very closely related to continued fractions. Recall that if x ∈ (0, 1) then x has a continued fraction expansion of the form

$$x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}} \qquad (3.3.2)$$

where $x_j \in \mathbb{N}$. If x is rational then this expansion is finite. One can show that x is irrational if and only if it has an infinite continued fraction expansion. Moreover, if x is irrational then it has a unique infinite continued fraction expansion.

If x has continued fraction expansion given by (3.3.2) then

$$\frac{1}{x} = x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}}.$$

Hence, taking the fractional part, we see that T(x) has continued fraction expansion given by

$$T(x) = \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}},$$

i.e. T acts by deleting the zeroth term in the continued fraction expansion of x and then shifting the remaining digits one place to the left.
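For rational x the expansion (3.3.2) is finite, so the shift property can be checked with exact arithmetic. The helper names below are ours. Reading off the digits is itself iteration of T: $x_j$ is the integer part of $1/T^j(x)$. `from_digits` evaluates an expansion independently, so that the digits can be checked.

```python
from fractions import Fraction

def gauss(x):
    """The Gauss map: T(x) = 1/x mod 1 for x != 0, and T(0) = 0."""
    return Fraction(0) if x == 0 else (1 / x) % 1

def cf_digits(x):
    """Digits [x0, x1, ...] of a rational x in (0, 1) with x = 1/(x0 + 1/(x1 + ...))."""
    digits = []
    while x != 0:
        digits.append(int(1 / x))   # integer part of 1/x, i.e. the current digit
        x = gauss(x)                # fractional part of 1/x
    return digits

def from_digits(digits):
    """Evaluate 1/(d0 + 1/(d1 + ... + 1/d_last)) exactly."""
    value = Fraction(0)
    for d in reversed(digits):
        value = 1 / (d + value)
    return value

x = Fraction(7, 17)
print(cf_digits(x))          # [2, 2, 3]
print(cf_digits(gauss(x)))   # [2, 3]: T deletes the zeroth digit
print(from_digits([2, 2, 3]) == x)   # True
```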

The Gauss map does not preserve Lebesgue measure (see Exercise 3.5). However, it does preserve Gauss’ measure µ defined by

$$\mu(B) = \frac{1}{\log 2} \int_B \frac{dx}{1 + x}$$

(here log denotes the natural logarithm; the factor log 2 is a normalising constant to make this a probability measure).

Proposition 3.3.5
Gauss’ measure is an invariant measure for the Gauss map.

Proof. It is sufficient to check that $\mu([a, b]) = \mu(T^{-1}[a, b])$ for any interval [a, b]. First note that

$$T^{-1}[a, b] = \bigcup_{n=1}^{\infty} \left[\frac{1}{b + n}, \frac{1}{a + n}\right].$$

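Thus

$$\begin{aligned}
\mu(T^{-1}[a, b]) &= \frac{1}{\log 2} \sum_{n=1}^{\infty} \int_{1/(b+n)}^{1/(a+n)} \frac{1}{1 + x} \, dx \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \left[ \log\left(1 + \frac{1}{a + n}\right) - \log\left(1 + \frac{1}{b + n}\right) \right] \\
&= \frac{1}{\log 2} \sum_{n=1}^{\infty} \left[ \log(a + n + 1) - \log(a + n) - \log(b + n + 1) + \log(b + n) \right] \\
&= \lim_{N \to \infty} \frac{1}{\log 2} \sum_{n=1}^{N} \left[ \log(a + n + 1) - \log(a + n) - \log(b + n + 1) + \log(b + n) \right] \\
&= \frac{1}{\log 2} \lim_{N \to \infty} \left[ \log(a + N + 1) - \log(a + 1) - \log(b + N + 1) + \log(b + 1) \right] \\
&= \frac{1}{\log 2} \left( \log(b + 1) - \log(a + 1) + \lim_{N \to \infty} \log\left(\frac{a + N + 1}{b + N + 1}\right) \right) \\
&= \frac{1}{\log 2} \left( \log(b + 1) - \log(a + 1) \right) \\
&= \frac{1}{\log 2} \int_a^b \frac{1}{1 + x} \, dx = \mu([a, b]),
\end{aligned}$$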

as required. ✷
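The telescoping argument can be sanity-checked numerically. Truncate the sum over branches at n ≤ N. The computation above shows that the truncated sum falls short of µ([a, b]) by exactly (1/log 2) log((b + N + 1)/(a + N + 1)). The sketch below checks this in floating-point arithmetic, with helper names of our own and illustrative values of a, b and N:

```python
import math

def gauss_measure(a, b):
    """Gauss measure of [a, b] in [0, 1]: (1 / log 2) * log((1 + b) / (1 + a))."""
    return math.log((1 + b) / (1 + a)) / math.log(2)

def truncated_preimage_measure(a, b, N):
    """Sum of the Gauss measures of the branches [1/(b+n), 1/(a+n)], n = 1, ..., N."""
    return sum(gauss_measure(1 / (b + n), 1 / (a + n)) for n in range(1, N + 1))

a, b, N = 0.2, 0.7, 1000
shortfall = gauss_measure(a, b) - truncated_preimage_measure(a, b, N)
predicted = math.log((b + N + 1) / (a + N + 1)) / math.log(2)
print(shortfall, predicted)   # equal up to rounding, and both tend to 0 as N grows
```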

§3.3.4 Markov shifts

Let S be a finite set, for example S = {1, 2, . . . , k}, with k ≥ 2. Let

$$\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in S\}$$

denote the set of all infinite sequences of symbols chosen from S. Thus a point x in the phase space Σ is an infinite sequence of symbols $x = (x_0, x_1, x_2, \ldots)$.

Define the shift map σ : Σ → Σ by

σ((x0, x1, x2, . . .)) = (x1, x2, x3, . . .)

(equivalently, $(\sigma(x))_j = x_{j+1}$). Thus σ takes a sequence, deletes the zeroth term in this sequence, and then shifts the remaining terms in the sequence one place to the left.

When constructing a measure µ on the Borel σ-algebra B of [0, 1] we first defined µ on an algebra A that generates the σ-algebra B and then extended µ to B using the Hahn-Kolmogorov Extension Theorem. In this case, our algebra A was the collection of finite unions of intervals; thus to define µ on A it was sufficient to define µ on an interval. We want to use a similar procedure to define measures on Σ. To do this, we first need to define a metric on Σ, so that it makes sense to talk about the Borel σ-algebra, and then we need an algebra of subsets that generates the Borel σ-algebra.

Let x, y ∈ Σ. Suppose that x ≠ y. Define n(x, y) = n, where $x_n \neq y_n$ but $x_j = y_j$ for 0 ≤ j ≤ n − 1. Thus n(x, y) is the index of the first place in which the sequences x, y disagree. For convenience, define n(x, y) = ∞ if x = y. Define

$$d(x, y) = \frac{1}{2^{n(x, y)}}$$

(with the convention $1/2^{\infty} = 0$).

Thus two sequences x, y are close if they agree for a large number of initial places. One can show (see Exercise 3.10) that d is a metric on Σ and that the shift map σ : Σ → Σ is continuous.

Fix $i_j \in S$, j = 0, 1, . . . , n − 1. We define the cylinder set

$$[i_0, i_1, \ldots, i_{n-1}] = \{x = (x_j)_{j=0}^{\infty} \in \Sigma \mid x_j = i_j, \ j = 0, 1, \ldots, n - 1\}.$$

That is, the cylinder set $[i_0, i_1, \ldots, i_{n-1}]$ consists of all infinite sequences of symbols from S that begin $i_0, i_1, \ldots, i_{n-1}$. We call n the rank of the cylinder. Cylinder sets for shifts often play the same role that intervals do for maps of the unit interval or circle. Let A denote the algebra of all finite unions of cylinders. Then A generates the Borel σ-algebra B. To see this we use Proposition 2.4.2. Cylinders are open sets, and distinct cylinders of the same rank are disjoint, so it is sufficient to check that A separates every pair of distinct points in Σ. Let $x = (x_j)_{j=0}^{\infty}$, $y = (y_j)_{j=0}^{\infty} \in \Sigma$ and suppose that x ≠ y. Then there exists n ≥ 0 such that $x_n \neq y_n$. Hence x, y are in different cylinders of rank n + 1, and the claim follows.
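The following is a sketch of the metric d and of cylinder membership. Points of Σ are infinite sequences. As an assumption made purely for illustration, the code stores only finite prefixes and requires the two sequences to differ somewhere inside the stored prefix. It also illustrates that x and y lie in the same cylinder of rank n exactly when d(x, y) ≤ 2⁻ⁿ.

```python
def n_first_difference(x, y):
    """n(x, y): the first index at which the sequences x and y differ.

    x and y are finite prefixes (tuples) of points of Sigma; we assume they
    already differ within the stored prefix.
    """
    for n, (xn, yn) in enumerate(zip(x, y)):
        if xn != yn:
            return n
    raise ValueError("prefixes agree: n(x, y) cannot be read off from them")

def d(x, y):
    return 2.0 ** -n_first_difference(x, y)

def in_cylinder(x, word):
    """Is x in the cylinder [i_0, ..., i_{n-1}] given by word = (i_0, ..., i_{n-1})?"""
    return tuple(x[:len(word)]) == tuple(word)

x, y = (1, 2, 1, 1, 2), (1, 2, 2, 1, 1)
print(d(x, y))                             # 0.25, since n(x, y) = 2
for n in range(1, 4):
    same = in_cylinder(y, x[:n])           # is y in the rank-n cylinder containing x?
    print(n, same, d(x, y) <= 2.0 ** -n)   # the two tests agree
```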

We will construct a family of σ-invariant measures on Σ by first constructing them on cylinders and then extending them to the Borel σ-algebra by using the Hahn-Kolmogorov Extension Theorem. A k × k matrix P is called a stochastic matrix if

(i) P(i, j) ∈ [0, 1] for all i, j;

(ii) each row of P sums to 1: for each i, $\sum_{j=1}^{k} P(i, j) = 1$.

(Here, P(i, j) denotes the (i, j)th entry of the matrix P.)

We say that P is irreducible if: for all i, j, there exists n > 0 such that $P^n(i, j) > 0$. We say that P is aperiodic if there exists n > 0 such that every entry of $P^n$ is strictly positive. Thus P is irreducible if for every (i, j) there exists an n such that the (i, j)th entry of $P^n$ is positive, and P is aperiodic if this n can be chosen to be independent of (i, j).

Suppose that P is irreducible. Let d be the highest common factor of $\{n > 0 \mid P^n(i, i) > 0\}$ (one can show that d does not depend on the choice of i). One can show that P is aperiodic if and only if d = 1. We call d the period of P.

In general, if d is the period of an irreducible matrix P then {1, 2, . . . , k} can be partitioned into d sets, $S_0, S_1, \ldots, S_{d-1}$, say, such that P(i, j) > 0 only if $i \in S_\ell$, $j \in S_{(\ell + 1) \bmod d}$. The matrix $P^d$ restricted to the indices that comprise each set $S_\ell$ is then aperiodic.
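The period can be computed from the zero pattern of P alone. The sketch below uses a function name of our own and takes state 0 as the index i. This is enough because, for irreducible P, d does not depend on i (a standard fact). The sketch only scans n ≤ 3k; this cut-off is our own choice. We expect it to suffice for irreducible P because every cycle of the transition graph can be spliced into a pair of closed walks through state 0 of length at most 3k whose lengths differ by that cycle's length.

```python
from math import gcd

def period(P):
    """Period of an irreducible non-negative k x k matrix P (list of rows).

    Computes the gcd of the n in 1..3k with P^n(0, 0) > 0, using only the
    zero pattern of P (True where the entry is positive).
    """
    k = len(P)
    A = [[P[i][j] > 0 for j in range(k)] for i in range(k)]   # zero pattern of P
    M = [row[:] for row in A]                                   # zero pattern of P^n, n = 1
    d = 0
    for n in range(1, 3 * k + 1):
        if M[0][0]:
            d = gcd(d, n)
        M = [[any(M[i][l] and A[l][j] for l in range(k)) for j in range(k)]
             for i in range(k)]                                 # pattern of P^(n+1)
    return d

print(period([[0.5, 0.5], [0.2, 0.8]]))           # 1: aperiodic
print(period([[0, 1], [1, 0]]))                   # 2
print(period([[0, 1, 0], [0, 0, 1], [1, 0, 0]]))  # 3
```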

The eigenvalues of aperiodic (or, more generally, irreducible) stochastic matrices are extremely well-behaved.

Theorem 3.3.6 (Perron-Frobenius Theorem)
Let P be an irreducible stochastic matrix with period d. Then the following statements hold:

(i) The dth roots of unity are simple eigenvalues for P and all other eigenvalues have modulus strictly less than 1.

(ii) Let 1 denote the column vector $(1, 1, \ldots, 1)^T$. Then P1 = 1, so that 1 is a right eigenvector corresponding to the maximal eigenvalue 1. Moreover, there exists a corresponding left eigenvector p = (p(1), . . . , p(k)) for the eigenvalue 1, that is pP = p. The vector p has strictly positive entries p(j) > 0, and we can assume that p is normalised so that $\sum_{j=1}^{k} p(j) = 1$.

(iii) If P is aperiodic (i.e. d = 1) then, for all i, j ∈ {1, 2, . . . , k}, we have that $P^n(i, j) \to p(j)$ as n → ∞.

Proof (not examinable). We prove only the aperiodic case. In this case, the period d = 1. We must show that 1 is a simple eigenvalue, construct the positive left eigenvector p, and show that $P^n(i, j) \to p(j)$ as n → ∞.

First note that 1 is an eigenvalue of P as P1 = 1; this follows from the fact that, for a stochastic matrix, the rows sum to 1.

Suppose P has an eigenvalue λ with corresponding eigenvector v ≠ 0. Then Pv = λv. Hence $P^n v = \lambda^n v$. As the entries of P are non-negative we have that

$$|\lambda^n| \, |v| \le P^n |v|,$$

where |v| = (|v(1)|, . . . , |v(k)|) and the inequality holds entrywise. Note that if P is stochastic then so is $P^n$ for any n ≥ 1. As $P^n$ is stochastic, its entries lie in [0, 1], so the right-hand side is a bounded sequence in n. If |λ| > 1 then $|\lambda^n| |v| \to \infty$, a contradiction as v ≠ 0. Hence the eigenvalues of P have modulus less than or equal to 1.

Suppose that Pv = λv and |λ| = 1. Then $P^n |v| \ge |v|$. As P is aperiodic, we can choose n such that $P^n(i, j) > 0$ for all i, j. Hence

$$\sum_{j=1}^{k} P^n(i, j) |v(j)| \ge |v(i)| \qquad (3.3.3)$$

for every i. Choose $i_0$ such that $|v(i_0)| = \max\{|v(j)| \mid 1 \le j \le k\}$. Also, as $P^n$ is stochastic and $P^n(i, j) > 0$, we must have that

$$|v(i_0)| \ge \sum_{j=1}^{k} P^n(i_0, j) |v(j)| \qquad (3.3.4)$$

as the right-hand side of (3.3.4) is a convex combination of the |v(j)|. By (3.3.3) with $i = i_0$, equality holds in (3.3.4). Since every weight $P^n(i_0, j)$ is strictly positive, a convex combination can equal the maximum only if $|v(j)| = |v(i_0)|$ for every j, 1 ≤ j ≤ k. We can assume, by normalising, that |v(j)| = 1 for all 1 ≤ j ≤ k.

Now $P^n v = \lambda^n v$, i.e. $\lambda^n v(i) = \sum_{j=1}^{k} P^n(i, j) v(j)$, a convex combination of the v(j) with strictly positive weights. The left-hand side has modulus 1 and all of the v(j) have modulus 1, so this can only happen if all of the v(j) are the same. Hence v is a multiple of 1 and, as P1 = 1, λ = 1. So 1 is a simple eigenvalue and there are no other eigenvalues of modulus 1.

Since 1 is a simple eigenvalue, there is a unique (up to scalar multiples) left eigenvector $p$ such that $pP = p$. As $P$ is non-negative, we have that $|p|P \ge |p|$, i.e.

$$\sum_{i=1}^{k} |p(i)|\, P(i, j) \ge |p(j)| \qquad (3.3.5)$$

and summing over $j$ gives

$$\sum_{i=1}^{k} |p(i)| \ge \sum_{j=1}^{k} |p(j)|$$

as $P$ is stochastic. The two sides of this inequality are the same sum, so we must have equality in (3.3.5) for every $j$, i.e. $|p|P = |p|$. Hence $|p|$ is a left eigenvector for $P$. Hence $p$ is a scalar multiple of $|p|$, so without loss of generality we can assume that $p(i) \ge 0$ for all $i$.

To see that $p(i) > 0$, choose $n$ such that all of the entries of $P^n$ are positive. Then $pP^n = p$. Hence $p(j) = \sum_{i=1}^{k} p(i) P^n(i, j)$. The right-hand side of this expression is a sum of non-negative terms and can only be zero if $p(i) = 0$ for all $1 \le i \le k$, i.e. if $p = 0$. Hence all of the entries of $p$ are strictly positive.

We can normalise $p$ and assume that $\sum_{j=1}^{k} p(j) = 1$.

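Decompose $\mathbb{R}^k$ into the direct sum $V_0 \oplus V_1$ where

$$V_0 = \{v \mid \langle p, v\rangle = 0\}, \qquad V_1 = \operatorname{span}\{\mathbf{1}\}$$

and $\langle p, v\rangle = \sum_{j=1}^{k} p(j) v(j)$, so that $V_1$ is the eigenspace corresponding to the eigenvalue 1 and $V_0$ is the sum of the (generalised) eigenspaces of the remaining eigenvalues. Then $P(V_1) = V_1$ and $P(V_0) \subset V_0$ (if $\langle p, v\rangle = 0$ then $\langle p, Pv\rangle = \langle pP, v\rangle = \langle p, v\rangle = 0$). Note that if $w \in V_0$ then $P^n w \to 0$, as 1 is not an eigenvalue of $P$ when restricted to $V_0$ and so the eigenvalues of $P$ restricted to $V_0$ have modulus strictly less than 1.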



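Let $v \in \mathbb{R}^k$ and write $v = c\mathbf{1} + w$ where $\langle p, w\rangle = 0$. Since $\langle p, \mathbf{1}\rangle = \sum_{j} p(j) = 1$, we have $c = \langle p, v\rangle$. Then

$$P^n v = \langle p, v\rangle \mathbf{1} + P^n w.$$

Hence $P^n v \to \langle p, v\rangle \mathbf{1}$ as $n \to \infty$. Taking $v = e_j = (0, \ldots, 0, 1, 0, \ldots, 0)^T$, the standard basis vectors, we have $(P^n e_j)(i) = P^n(i, j)$ and $\langle p, e_j\rangle = p(j)$, so we see that $P^n(i, j) \to p(j)$. ✷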
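As a numerical illustration of part (iii) in the aperiodic case, the following Python sketch raises a small stochastic matrix to a high power and compares every row with the left eigenvector $p$. The matrix `P` is a hypothetical example chosen for this sketch (irreducible, and aperiodic because its first diagonal entry is positive); its eigenvector $p = (2/9, 4/9, 1/3)$ was solved by hand from $pP = p$. Indices start at 0 in the code.

```python
def mat_mult(A, B):
    """Product of two square matrices stored as lists of rows."""
    k = len(A)
    return [[sum(A[i][m] * B[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]


def mat_power(P, n):
    """P^n by repeated multiplication (adequate for small k and n)."""
    k = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for _ in range(n):
        result = mat_mult(result, P)
    return result


# Hypothetical irreducible, aperiodic stochastic matrix (each row sums to 1).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.0, 0.75],
     [0.0, 1.0, 0.0]]

# Left eigenvector for the eigenvalue 1, solved by hand from pP = p.
p_exact = [2 / 9, 4 / 9, 1 / 3]

Pn = mat_power(P, 200)
for row in Pn:
    # Every row of P^n should be close to p_exact, i.e. P^n(i, j) -> p(j).
    print([round(x, 6) for x in row])

# Check the eigenvector equation pP = p directly.
pP = [sum(p_exact[i] * P[i][j] for i in range(3)) for j in range(3)]
print([round(x, 12) for x in pP])
```

The other two eigenvalues of this `P` are $(-1 \pm \sqrt{7})/4$, approximately $0.41$ and $-0.91$, so the error in $P^n(i, j) - p(j)$ decays roughly like $0.91^n$, as the decomposition $v = c\mathbf{1} + w$ suggests.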

Given an irreducible stochastic matrix $P$ with corresponding normalised left eigenvector $p$, we define a Markov measure $\mu_P$ on cylinders by defining

$$\mu_P([i_0, i_1, \ldots, i_{n-1}]) = p(i_0) P(i_0, i_1) P(i_1, i_2) \cdots P(i_{n-2}, i_{n-1}).$$

We can then extend $\mu_P$ to a probability measure on the Borel σ-algebra of $\Sigma$.

Bernoulli measures are particular examples of Markov measures. Let $p = (p(1), \ldots, p(k))$, $p(j) \in (0, 1)$, $\sum_{j=1}^{k} p(j) = 1$ be a probability vector. Define

$$\mu_p([i_0, i_1, \ldots, i_{n-1}]) = p(i_0) p(i_1) \cdots p(i_{n-1})$$

and then extend to the Borel σ-algebra. We call $\mu_p$ the Bernoulli-$p$ measure.

We can now prove that Markov measures are invariant for shift maps.
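The two cylinder formulas are easy to evaluate directly. The sketch below is a minimal Python version, with hypothetical helper names `markov_cylinder` and `bernoulli_cylinder`, symbols indexed from 0, and a hypothetical $3 \times 3$ example. It also checks one of the consistency conditions behind the extension: the masses of the $k$ cylinders $[i_0, \ldots, i_{n-1}, j]$ add up to the mass of $[i_0, \ldots, i_{n-1}]$, because each row of $P$ sums to 1.

```python
from itertools import product
from math import prod


def markov_cylinder(word, p, P):
    """mu_P([i_0, ..., i_{n-1}]) = p(i_0) P(i_0, i_1) ... P(i_{n-2}, i_{n-1})."""
    mass = p[word[0]]
    for a, b in zip(word, word[1:]):
        mass *= P[a][b]
    return mass


def bernoulli_cylinder(word, p):
    """mu_p([i_0, ..., i_{n-1}]) = p(i_0) p(i_1) ... p(i_{n-1})."""
    return prod(p[i] for i in word)


# Hypothetical example: P irreducible and stochastic, p its left eigenvector.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.0, 0.75],
     [0.0, 1.0, 0.0]]
p = [2 / 9, 4 / 9, 1 / 3]

word = (0, 1, 2)
print(markov_cylinder(word, p, P))  # (2/9) * (1/2) * (3/4) = 1/12

# Refining a cylinder by one more symbol does not change the total mass.
refined = sum(markov_cylinder(word + (j,), p, P) for j in range(3))
print(abs(refined - markov_cylinder(word, p, P)) < 1e-15)

# The cylinders of a fixed length partition Sigma, so their masses sum to 1.
print(sum(markov_cylinder(w, p, P) for w in product(range(3), repeat=4)))

# Bernoulli example: a fair coin, with H = 0 and T = 1.
print(bernoulli_cylinder((0, 0, 1), [0.5, 0.5]))  # 0.125
```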

Proposition 3.3.7
Let $\sigma : \Sigma \to \Sigma$ be a shift map on the set of $k$ symbols $S$. Let $P$ be an irreducible stochastic matrix with left eigenvector $p$. Then the Markov measure $\mu_P$ is a $\sigma$-invariant measure.

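Proof. It is sufficient to prove that $\mu_P(\sigma^{-1}[i_0, \ldots, i_{n-1}]) = \mu_P([i_0, \ldots, i_{n-1}])$ for each cylinder $[i_0, \ldots, i_{n-1}]$. First note that

$$\begin{aligned}
\sigma^{-1}[i_0, \ldots, i_{n-1}] &= \{x \in \Sigma \mid \sigma(x) \in [i_0, \ldots, i_{n-1}]\} \\
&= \{x \in \Sigma \mid x = (i, i_0, \ldots, i_{n-1}, \ldots),\ i \in S\} \\
&= \bigcup_{i=1}^{k} [i, i_0, \ldots, i_{n-1}].
\end{aligned}$$

Hence

$$\begin{aligned}
\mu_P(\sigma^{-1}[i_0, \ldots, i_{n-1}]) &= \mu_P\Big(\bigcup_{i=1}^{k} [i, i_0, \ldots, i_{n-1}]\Big) \\
&= \sum_{i=1}^{k} \mu_P([i, i_0, \ldots, i_{n-1}]) \quad \text{as this is a disjoint union} \\
&= \sum_{i=1}^{k} p(i) P(i, i_0) P(i_0, i_1) \cdots P(i_{n-2}, i_{n-1}) \\
&= p(i_0) P(i_0, i_1) \cdots P(i_{n-2}, i_{n-1}) \\
&= \mu_P([i_0, \ldots, i_{n-1}])
\end{aligned}$$

where we have used the fact that $pP = p$, i.e. $\sum_{i=1}^{k} p(i) P(i, i_0) = p(i_0)$. ✷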
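The computation in this proof can be checked numerically: summing over the possible preceding symbol $i$ reproduces the mass of the cylinder exactly when $pP = p$. In the Python sketch below (hypothetical helper names and a hypothetical $3 \times 3$ example, symbols indexed from 0) the same check fails if $p$ is replaced by a vector that is not a left eigenvector, which shows where the hypothesis $pP = p$ enters.

```python
from itertools import product


def markov_cylinder(word, p, P):
    """mu([i_0, ..., i_{n-1}]) = p(i_0) P(i_0, i_1) ... P(i_{n-2}, i_{n-1})."""
    mass = p[word[0]]
    for a, b in zip(word, word[1:]):
        mass *= P[a][b]
    return mass


def preimage_mass(word, p, P):
    """mu(sigma^{-1}[word]) = sum over i of mu([i, word]), a disjoint union."""
    return sum(markov_cylinder((i,) + word, p, P) for i in range(len(p)))


P = [[0.5, 0.5, 0.0],
     [0.25, 0.0, 0.75],
     [0.0, 1.0, 0.0]]
p_stationary = [2 / 9, 4 / 9, 1 / 3]  # satisfies pP = p
p_uniform = [1 / 3, 1 / 3, 1 / 3]     # does not satisfy pP = p

max_error = max(
    abs(preimage_mass(w, p_stationary, P) - markov_cylinder(w, p_stationary, P))
    for n in range(1, 5) for w in product(range(3), repeat=n))
print(max_error)  # at rounding-error level

# With the uniform vector the cylinder [0] has mass 1/3, but its preimage has
# mass (1/3) * (P(0,0) + P(1,0) + P(2,0)) = (1/3) * 0.75 = 0.25.
print(markov_cylinder((0,), p_uniform, P), preimage_mass((0,), p_uniform, P))
```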



Remark. Bernoulli measures are familiar to you from probability theory. Suppose that $S = \{H, T\}$ so that $\Sigma$ denotes all infinite sequences of $H$s and $T$s. We can think of an element of $\Sigma$ as the outcome of an infinite sequence of coin tosses. Suppose that $p = (p_H, p_T)$ is a probability vector with corresponding Bernoulli measure $\mu_p$. Then, for example, the cylinder set $[H, H, T]$ denotes the set of (infinite) sequences of coin tosses that start $H, H, T$, and this set has measure $p_H p_H p_T$, corresponding to the probability of tossing $H, H, T$.

Markov measures are similar. Given a stochastic matrix $P = (P(i, j))$ and a left probability eigenvector $p = (p(1), \ldots, p(k))$ we defined

$$\mu_P([i_0, i_1, \ldots, i_{n-1}]) = p(i_0) P(i_0, i_1) P(i_1, i_2) \cdots P(i_{n-2}, i_{n-1}).$$

We can regard $p(i_0)$ as being the probability of outcome $i_0$. Then we can regard $P(i_0, i_1)$ as being the probability of outcome $i_1$, given that the previous outcome was $i_0$.
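This probabilistic reading can be tested by simulation: draw $i_0$ according to $p$, then each later symbol according to the row of $P$ indexed by the previous symbol, and count how often a sampled path starts with a given word. The Python sketch below does this for a hypothetical $3 \times 3$ example; the seed, sample size and target word are arbitrary choices, and the empirical frequency should only agree with $\mu_P$ up to sampling error.

```python
import random


def sample_prefix(p, P, length, rng):
    """Sample the first `length` symbols of a Markov chain started from p."""
    states = range(len(p))
    path = [rng.choices(states, weights=p)[0]]
    while len(path) < length:
        path.append(rng.choices(states, weights=P[path[-1]])[0])
    return tuple(path)


P = [[0.5, 0.5, 0.0],
     [0.25, 0.0, 0.75],
     [0.0, 1.0, 0.0]]
p = [2 / 9, 4 / 9, 1 / 3]  # left eigenvector: pP = p

target = (0, 1, 2)
exact = p[0] * P[0][1] * P[1][2]  # mu_P([0, 1, 2]) = 1/12

rng = random.Random(12345)
trials = 50_000
hits = sum(sample_prefix(p, P, len(target), rng) == target for _ in range(trials))
empirical = hits / trials
print(f"empirical {empirical:.4f} vs exact {exact:.4f}")
```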

§3.4 Exercises

Exercise 3.1
Show that in Weyl's Criterion (Theorem 1.2.1) one cannot replace the hypothesis in equation (1.2.1) that $f$ is continuous with the hypothesis that $f \in L^1(\mathbb{R}/\mathbb{Z}, \mathcal{B}, \mu)$ (where $\mu$ denotes Lebesgue measure).

Exercise 3.2
Let $X$ be a compact metric space equipped with the Borel σ-algebra $\mathcal{B}$. Show that a continuous transformation $T : X \to X$ is measurable.

Exercise 3.3
Give an example of a sequence of functions $f_n \in L^1([0, 1], \mathcal{B}, \mu)$ ($\mu$ = Lebesgue measure) such that $f_n \to 0$ $\mu$-a.e. but $f_n \not\to 0$ in $L^1$.

Exercise 3.4
Let $(X, \mathcal{B}, \mu)$ be a probability space and suppose that $T : X \to X$ is measurable. Show that $T_*\mu$ is a probability measure on $(X, \mathcal{B})$.

Exercise 3.5
(i) Show that the Gauss map does not preserve Lebesgue measure. (That is, find an example of a Borel set $B$ such that $T^{-1}B$ and $B$ have different Lebesgue measures.)

(ii) Let $\mu$ denote Gauss' measure and let $\lambda$ denote Lebesgue measure. Show that if $B \in \mathcal{B}$, the Borel σ-algebra of $[0, 1]$, then

$$\frac{1}{2\log 2}\,\lambda(B) \le \mu(B) \le \frac{1}{\log 2}\,\lambda(B). \qquad (3.4.1)$$

Conclude that a set $B \in \mathcal{B}$ has Lebesgue measure zero if and only if it has Gauss' measure zero. (Two measures with the same sets of measure zero are said to be equivalent.)

(iii) Using (3.4.1), show that $f \in L^1([0, 1], \mathcal{B}, \mu)$ if and only if $f \in L^1([0, 1], \mathcal{B}, \lambda)$.

Exercise 3.6
For an integer $k \ge 2$ define $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ by $T(x) = kx \bmod 1$. Show that $T$ preserves Lebesgue measure.



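Exercise 3.7
Let $\beta > 1$ denote the golden ratio (so that $\beta^2 = \beta + 1$). Define $T : [0, 1] \to [0, 1]$ by $T(x) = \beta x \bmod 1$. Show that $T$ does not preserve Lebesgue measure. Define the measure $\mu$ by $\mu(B) = \int_B k(x)\,dx$ where

$$k(x) = \begin{cases} \dfrac{1}{\frac{1}{\beta} + \frac{1}{\beta^3}} & \text{on } [0, 1/\beta), \\[2ex] \dfrac{1}{\beta}\left(\dfrac{1}{\frac{1}{\beta} + \frac{1}{\beta^3}}\right) & \text{on } [1/\beta, 1]. \end{cases}$$

By using the Hahn-Kolmogorov Extension Theorem, show that $\mu$ is a $T$-invariant measure.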

Exercise 3.8
Define the logistic map $T : [0, 1] \to [0, 1]$ by $T(x) = 4x(1 - x)$. Define the measure $\mu$ by

$$\mu(B) = \frac{1}{\pi} \int_B \frac{1}{\sqrt{x(1 - x)}}\,dx.$$

(i) Check that $\mu$ is a probability measure.

(ii) By using the Hahn-Kolmogorov Extension Theorem, show that $\mu$ is a $T$-invariant measure.

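Exercise 3.9
Define $T : [0, 1] \to [0, 1]$ by

$$T(x) = \begin{cases} n(n + 1)x - n & \text{if } x \in \left(\frac{1}{n + 1}, \frac{1}{n}\right],\ n \in \mathbb{N}, \\ 0 & \text{if } x = 0. \end{cases}$$

This is called the Lüroth map. Show that

$$\sum_{n=1}^{\infty} \frac{1}{n(n + 1)} = 1.$$

Show that $T$ preserves Lebesgue measure.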

Exercise 3.10
Let $\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in \{1, 2, \ldots, k\}\}$ denote the shift space on $k$ symbols. For $x, y \in \Sigma$, define $n(x, y)$ to be the index of the first place in which the two sequences $x, y$ disagree, and write $n(x, y) = \infty$ if $x = y$. Define

$$d(x, y) = \frac{1}{2^{n(x, y)}}$$

(with the convention that $1/2^{\infty} = 0$).

(i) Show that $d(x, y)$ is a metric.

(ii) Show that the shift map $\sigma$ is continuous.

(iii) Show that a cylinder set $[i_0, \ldots, i_{n-1}]$ is both open and closed.

(One can also prove that $\Sigma$ is compact; we shall use this fact later.)



Exercise 3.11
Show that the matrix

$$P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0 \\ 1/4 & 0 & 3/4 & 0 & 0 \\ 0 & 1/2 & 0 & 1/2 & 0 \\ 0 & 0 & 3/4 & 0 & 1/4 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

is irreducible but not aperiodic. Show that $P$ has period 2. Show that $\{1, 2, \ldots, 5\}$ can be partitioned into two sets $S_0 \cup S_1$ so that $P(i, j) > 0$ only if $i \in S_\ell$ and $j \in S_{\ell + 1 \bmod 2}$. Show that $P^2$, when restricted to indices in $S_0$ and in $S_1$, is aperiodic.

Determine the eigenvalues of $P$. Find the unique left probability eigenvector $p$ such that $pP = p$.

Exercise 3.12
Show that Bernoulli measures are Markov measures. That is, given a probability vector $p = (p(1), \ldots, p(k))$, construct a stochastic matrix $P$ such that $pP = p$. Show that the corresponding Markov measure is the Bernoulli-$p$ measure.


4. More examples of invariant measures

§4.1 Criteria for invariance

We shall give more examples of invariant measures. Recall that, given a measurable transformation $T : X \to X$ of a probability space $(X, \mathcal{B}, \mu)$, we say that $\mu$ is a $T$-invariant measure (or, equivalently, $T$ is a measure-preserving transformation) if $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal{B}$.

We will need the following characterisations of invariance.

Lemma 4.1.1
Let $T : X \to X$ be a measurable transformation of a probability space $(X, \mathcal{B}, \mu)$. Then the following are equivalent:

(i) $T$ is a measure-preserving transformation;

(ii) for each $f \in L^1(X, \mathcal{B}, \mu)$ we have $\int f\,d\mu = \int f \circ T\,d\mu$;

(iii) for each $f \in L^2(X, \mathcal{B}, \mu)$ we have $\int f\,d\mu = \int f \circ T\,d\mu$.

Proof. We will use the identity $\chi_{T^{-1}B} = \chi_B \circ T$; this is straightforward to check, see Exercise 4.1.

We prove that (i) implies (ii). Suppose that $T$ is a measure-preserving transformation. For any characteristic function $\chi_B$, $B \in \mathcal{B}$,

$$\int \chi_B\,d\mu = \mu(B) = \mu(T^{-1}B) = \int \chi_{T^{-1}B}\,d\mu = \int \chi_B \circ T\,d\mu$$

and so the equality holds for any simple function (a finite linear combination of characteristic functions). Given any $f \in L^1(X, \mathcal{B}, \mu)$ with $f \ge 0$, we can find an increasing sequence of simple functions $f_n$ with $f_n \to f$ pointwise, as $n \to \infty$. For each $n$ we have

$$\int f_n\,d\mu = \int f_n \circ T\,d\mu$$

and, applying the Monotone Convergence Theorem to both sides (note that $f_n \circ T$ increases pointwise to $f \circ T$), we obtain

$$\int f\,d\mu = \int f \circ T\,d\mu.$$

To extend the result to a general real-valued integrable function $f$, consider the positive and negative parts. To extend the result to complex-valued integrable functions $f$, take real and imaginary parts.



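That (ii) implies (iii) follows immediately, as $L^2(X, \mathcal{B}, \mu) \subset L^1(X, \mathcal{B}, \mu)$ (since $\mu$ is a probability measure).

Finally, we prove that (iii) implies (i). Let $B \in \mathcal{B}$. Then $\chi_B \in L^2(X, \mathcal{B}, \mu)$ as $\int |\chi_B|^2\,d\mu = \int \chi_B\,d\mu = \mu(B)$. Recalling that $\chi_B \circ T = \chi_{T^{-1}B}$ we have that

$$\mu(B) = \int \chi_B\,d\mu = \int \chi_B \circ T\,d\mu = \int \chi_{T^{-1}B}\,d\mu = \mu(T^{-1}B)$$

so that $\mu$ is a $T$-invariant probability measure. ✷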
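Criterion (ii) can be sanity-checked numerically for maps of the circle. The Python sketch below approximates $\int f \circ T\,d\mu$ and $\int f\,d\mu$ for Lebesgue measure on $[0, 1)$ by a midpoint Riemann sum; the test function $f(x) = x^2$, the rotation number $\alpha = \sqrt{2} - 1$ and the grid size are hypothetical choices. Numerical agreement is not a proof of invariance, but a clear disagreement (as for $T(x) = x^2$, where $\int f \circ T\,d\mu = 1/5 \ne 1/3$) indicates that Lebesgue measure is not $T$-invariant.

```python
import math


def integral(g, n=100_000):
    """Midpoint-rule approximation of the integral of g over [0, 1)."""
    return sum(g((j + 0.5) / n) for j in range(n)) / n


def f(x):
    return x * x  # hypothetical test function


alpha = math.sqrt(2) - 1
maps = {
    "doubling": lambda x: (2 * x) % 1.0,
    "rotation": lambda x: (x + alpha) % 1.0,
    "squaring": lambda x: x * x,  # does not preserve Lebesgue measure
}

base = integral(f)  # close to 1/3
results = {name: integral(lambda x, T=T: f(T(x))) for name, T in maps.items()}
for name, value in results.items():
    print(f"{name:9s} {value:.6f} vs {base:.6f}")
```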

§4.2 Invariant measures on periodic orbits

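Recall that if $x \in X$ then we define the Dirac measure $\delta_x$ by

$$\delta_x(B) = \begin{cases} 1 & \text{if } x \in B, \\ 0 & \text{if } x \notin B. \end{cases}$$

We also recall that if $f : X \to \mathbb{R}$ then $\int f\,d\delta_x = f(x)$.

Let $T : X \to X$ be a measurable dynamical system defined on a measurable space $(X, \mathcal{B})$. Suppose that $x = T^n x$ is a periodic point with period $n$. Then the probability measure

$$\mu = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j x}$$

is $T$-invariant. This is clear from Lemma 4.1.1, noting that for $f \in L^1(X, \mathcal{B}, \mu)$

$$\begin{aligned}
\int f \circ T\,d\mu &= \frac{1}{n}\left(f(Tx) + \cdots + f(T^{n-1}x) + f(T^n x)\right) \\
&= \frac{1}{n}\left(f(x) + f(Tx) + \cdots + f(T^{n-1}x)\right) \\
&= \int f\,d\mu,
\end{aligned}$$

using the fact that $T^n x = x$.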
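For a concrete instance, the doubling map $T(x) = 2x \bmod 1$ has the periodic orbit $\{1/7, 2/7, 4/7\}$ of period 3. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`) to build this orbit and to compare $\int f \circ T\,d\mu$ with $\int f\,d\mu$ for the orbit measure $\mu$; the helper names and the test function are hypothetical.

```python
from fractions import Fraction


def doubling(x):
    return (2 * x) % 1


def periodic_orbit(x, T, max_period=10_000):
    """Return [x, Tx, ..., T^{n-1}x] for a periodic point x of T."""
    orbit = [x]
    y = T(x)
    while y != x:
        orbit.append(y)
        if len(orbit) > max_period:
            raise ValueError("x does not appear to be periodic")
        y = T(y)
    return orbit


def integrate_orbit_measure(f, orbit):
    """Integral of f against (1/n) * (sum of the Dirac measures on the orbit)."""
    return sum(f(y) for y in orbit) / len(orbit)


orbit = periodic_orbit(Fraction(1, 7), doubling)
f = lambda y: y ** 3 + y  # hypothetical test function
lhs = integrate_orbit_measure(lambda y: f(doubling(y)), orbit)
rhs = integrate_orbit_measure(f, orbit)
print(orbit)       # [Fraction(1, 7), Fraction(2, 7), Fraction(4, 7)]
print(lhs == rhs)  # True: applying T only permutes the orbit
```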

§4.3 The change of variables formula

The change of variables formula (equivalently, integration by substitution) for (Riemann) integration should be familiar to you. It can be stated in the following way: if $u : [a, b] \to [c, d]$ is a differentiable bijection with continuous derivative and $f : [c, d] \to \mathbb{R}$ is (Riemann) integrable then $f \circ u : [a, b] \to \mathbb{R}$ is (Riemann) integrable and

$$\int_{u(a)}^{u(b)} f(x)\,dx = \int_a^b f(u(x))\,u'(x)\,dx. \qquad (4.3.1)$$

Allowing for the possibility that $u$ is decreasing (so that $u(b) < u(a)$), we can rewrite (4.3.1) as

$$\int_{[c,d]} f(x)\,dx = \int_{[a,b]} f(u(x))\,|u'(x)|\,dx. \qquad (4.3.2)$$
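A quick numerical check of (4.3.2) with a decreasing substitution, where the absolute value matters: take $u(x) = 1 - x^2$, which maps $[0, 1]$ bijectively onto $[0, 1]$ with $u'(x) = -2x \le 0$, and $f(y) = e^y$. Both sides of (4.3.2) should equal $e - 1$, while dropping the absolute value flips the sign. The Python sketch below uses a midpoint rule; the functions and grid size are illustrative choices.

```python
import math


def midpoint(g, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (j + 0.5) * h) for j in range(n))


f = math.exp
u = lambda x: 1 - x * x  # decreasing bijection [0, 1] -> [0, 1]
du = lambda x: -2 * x    # u'(x)

left = midpoint(f, 0.0, 1.0)                                # integral of f over [c, d]
right = midpoint(lambda x: f(u(x)) * abs(du(x)), 0.0, 1.0)  # integral of f(u)|u'| over [a, b]
no_abs = midpoint(lambda x: f(u(x)) * du(x), 0.0, 1.0)      # without |.|: wrong sign

print(left, right, no_abs, math.e - 1)
```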

We would like a version of (4.3.2) that holds for (Lebesgue) integrable functions on subsets of $\mathbb{R}^n$, equipped with Lebesgue measure on $\mathbb{R}^n$.



Theorem 4.3.1 (Change of variables formula)
Let $B \subset \mathbb{R}^n$ be a Borel subset of $\mathbb{R}^n$ and suppose that $B \subset U$ for some open subset $U$. Suppose that $u : U \to \mathbb{R}^n$ is a diffeomorphism onto its image (i.e. $u : U \to u(U)$ is a differentiable bijection with differentiable inverse). Then $u(B)$ is a Borel set.

Let $\mu$ denote Lebesgue measure on $\mathbb{R}^n$ and let $f : \mathbb{R}^n \to \mathbb{C}$ be integrable. Then

$$\int_{u(B)} f\,d\mu = \int_B f \circ u\,|\det Du|\,d\mu$$

where $Du$ denotes the matrix of partial derivatives of $u$.

There are more sophisticated versions of the change of variables formula that hold for arbitrary measures on $\mathbb{R}^n$.

§4.4 Rotations of a circle

We illustrate how one can use the change of variables formula for integration to prove that Lebesgue measure is an invariant measure for certain maps on the circle.

Proposition 4.4.1
Fix $\alpha \in \mathbb{R}$ and define $T(x) = x + \alpha \bmod 1$. Then Lebesgue measure $\mu$ is $T$-invariant.

Proof. By Lemma 4.1.1(ii) we need to show that $\int f \circ T\,d\mu = \int f\,d\mu$ for every $f \in L^1(X, \mathcal{B}, \mu)$.

Recall that we can identify functions on $\mathbb{R}/\mathbb{Z}$ with 1-periodic functions on $\mathbb{R}$. As $T$ depends only on $\alpha \bmod 1$, we may assume that $0 \le \alpha < 1$. By using the substitution $u(x) = x + \alpha$ and the change of variables formula for integration we have that

$$\begin{aligned}
\int f \circ T\,d\mu &= \int_0^1 f(Tx)\,dx = \int_0^1 f(x + \alpha)\,dx = \int_\alpha^{1+\alpha} f(x)\,dx \\
&= \int_\alpha^1 f(x)\,dx + \int_1^{1+\alpha} f(x)\,dx = \int_\alpha^1 f(x)\,dx + \int_0^\alpha f(x)\,dx = \int_0^1 f(x)\,dx
\end{aligned}$$

where we have used the fact that $\int_0^\alpha f(x)\,dx = \int_1^{1+\alpha} f(x)\,dx$ by the periodicity of $f$. ✷

§4.5 Toral automorphisms

Let $X = \mathbb{R}^k/\mathbb{Z}^k$ be the $k$-dimensional torus. Let $A = (a(i, j))$ be a $k \times k$ matrix with entries in $\mathbb{Z}$ and with $\det A \ne 0$. We can define a linear map $\mathbb{R}^k \to \mathbb{R}^k$ by

$$\begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix} \mapsto A \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix}.$$

For brevity, we shall often abuse this notation by writing this as $(x_1, \ldots, x_k) \mapsto A(x_1, \ldots, x_k)$. Since $A$ is an integer matrix it maps $\mathbb{Z}^k$ to itself. We claim that $A$ allows us to define a map

$$T = T_A : X \to X : (x_1, \ldots, x_k) + \mathbb{Z}^k \mapsto A(x_1, \ldots, x_k) + \mathbb{Z}^k.$$

We shall often abuse notation and write $T(x_1, \ldots, x_k) = A(x_1, \ldots, x_k) \bmod 1$.



To see that this map is well defined, we need to check that if $x + \mathbb{Z}^k = y + \mathbb{Z}^k$ then $Ax + \mathbb{Z}^k = Ay + \mathbb{Z}^k$. If $x, y \in \mathbb{R}^k$ give the same point in the torus, then $x = y + n$ for some $n \in \mathbb{Z}^k$. Hence $Ax = A(y + n) = Ay + An$. As $A$ maps $\mathbb{Z}^k$ to itself, we see that $An \in \mathbb{Z}^k$ so that $Ax, Ay$ determine the same point in the torus.

Definition. Let $A = (a(i, j))$ denote a $k \times k$ matrix with integer entries such that $\det A \ne 0$. Then we call the map $T_A : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}^k/\mathbb{Z}^k$ a linear toral endomorphism.

The map $T$ is not invertible in general. However, if $\det A = \pm 1$ then $A^{-1}$ exists and is an integer matrix. Hence we have a map $T^{-1}$ given by

$$T^{-1}(x_1, \ldots, x_k) = A^{-1}(x_1, \ldots, x_k) \bmod 1.$$

One can easily check that $T^{-1}$ is the inverse of $T$.

Definition. Let $A = (a(i, j))$ denote a $k \times k$ matrix with integer entries such that $\det A = \pm 1$. Then we call the map $T_A : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}^k/\mathbb{Z}^k$ a linear toral automorphism.

Remark. The reason for this nomenclature is clear. If $T_A$ is either a linear toral endomorphism or linear toral automorphism, then it is an endomorphism or automorphism, respectively, of the torus regarded as an additive group.

Example. Take $A$ to be the matrix

$$A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$$

and define $T : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}^2/\mathbb{Z}^2$ to be the induced map:

$$T(x_1, x_2) = (2x_1 + x_2 \bmod 1,\ x_1 + x_2 \bmod 1).$$

Then $T$ is a linear toral automorphism and is called Arnold's CAT map (CAT stands for ‘C’ontinuous ‘A’utomorphism of the ‘T’orus). See Figure 4.5.1.

Definition. Suppose that $\det A = \pm 1$. Then we call $T$ a hyperbolic toral automorphism if $A$ has no eigenvalues of modulus 1.

Proposition 4.5.1
Let $T$ be a linear toral automorphism of the $k$-dimensional torus $X = \mathbb{R}^k/\mathbb{Z}^k$. Then Lebesgue measure $\mu$ is $T$-invariant.

Proof. By Lemma 4.1.1(ii) we need to show that $\int f \circ T\,d\mu = \int f\,d\mu$ for every $f \in L^1(X, \mathcal{B}, \mu)$.

Recall that we can identify functions $f : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{C}$ with functions $f : \mathbb{R}^k \to \mathbb{C}$ that satisfy $f(x + n) = f(x)$ for all $n \in \mathbb{Z}^k$. We apply the change of variables formula with the substitution $T(x) = Ax$. Note that $DT(x) = A$ and $|\det DT| = 1$. Hence, by the change of variables formula (and as $T$ is a bijection of $X$),

$$\int f \circ T\,d\mu = \int_{\mathbb{R}^k/\mathbb{Z}^k} f \circ T\,|\det DT|\,d\mu = \int_{T(\mathbb{R}^k/\mathbb{Z}^k)} f\,d\mu = \int f\,d\mu.$$

✷
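A discrete sanity check of Proposition 4.5.1 for the CAT map (not a proof): because $\det A = 1$, the matrix $A$ is invertible modulo every $N$, so $T$ permutes the $N^2$ grid points $(a/N, b/N)$ of the torus. Consequently the uniform measure on this grid, a discrete stand-in for Lebesgue measure, is $T$-invariant, and a Riemann sum of $f \circ T$ over the grid equals that of $f$ up to rounding. The Python sketch works with the integer numerators $(a, b)$; the grid size and test function are hypothetical.

```python
import math

A = [[2, 1], [1, 1]]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(det)  # 1


def cat_map_on_grid(N):
    """T(x1, x2) = (2x1 + x2, x1 + x2) mod 1 on the grid points (a/N, b/N),
    recorded via integer numerators so that all arithmetic is exact."""
    return {(a, b): ((2 * a + b) % N, (a + b) % N)
            for a in range(N) for b in range(N)}


N = 60
T = cat_map_on_grid(N)
is_bijection = len(set(T.values())) == N * N
print(is_bijection)  # True: T permutes the grid

# Riemann sums of f and of f o T over the grid (f is 1-periodic in each variable).
f = lambda a, b: math.cos(2 * math.pi * a / N) + math.sin(2 * math.pi * b / N) ** 2
sum_f = sum(f(a, b) for (a, b) in T)
sum_f_T = sum(f(*T[(a, b)]) for (a, b) in T)
print(abs(sum_f - sum_f_T) < 1e-9)
```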

We shall see in §5.4.3 that linear toral endomorphisms (i.e. when $A$ is a $k \times k$ integer matrix with $\det A \ne 0$) also preserve Lebesgue measure.



Figure 4.5.1: Arnold’s CAT map

§4.6 Exercises

Exercise 4.1
Suppose that $T : X \to X$. Show that $\chi_{T^{-1}B} = \chi_B \circ T$.

Exercise 4.2
Let $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$, $T(x) = 2x \bmod 1$, denote the doubling map. Show that the periodic points for $T$ are points of the form $p/(2^n - 1)$, $p = 0, 1, \ldots, 2^n - 2$. Conclude that $T$ has infinitely many invariant measures.

Exercise 4.3
By using the change of variables formula, prove that the doubling map $T(x) = 2x \bmod 1$ on $\mathbb{R}/\mathbb{Z}$ preserves Lebesgue measure.

Exercise 4.4
Fix $\alpha \in \mathbb{R}$ and define the map $T : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}^2/\mathbb{Z}^2$ by

$$T((x, y) + \mathbb{Z}^2) = (x + \alpha, x + y) + \mathbb{Z}^2.$$

By using the change of variables formula, prove that Lebesgue measure is $T$-invariant.


5. Ergodic measures: definition, criteria, and basic examples

§5.1 Introduction

In section 3 we defined what is meant by an invariant measure or, equivalently, what is meant by a measure-preserving transformation. In this section, we define what is meant by an ergodic measure. The primary motivation for ergodicity is Birkhoff's Ergodic Theorem: if $T$ is an ergodic measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$ then, for each $f \in L^1(X, \mathcal{B}, \mu)$ we have that

$$\lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \int f\,d\mu$$

for $\mu$-a.e. $x \in X$. Checking that a given measure-preserving transformation is ergodic is often a highly non-trivial task and we shall study some methods for proving ergodicity.
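To see what the theorem asserts in a simple case, the Python sketch below computes Birkhoff averages $\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x)$ for rotations $T(x) = x + \alpha \bmod 1$ with $f(x) = \cos^2(2\pi x)$, whose integral against Lebesgue measure is $1/2$. The choices of $f$, $\alpha$, starting point and $n$ are hypothetical. For the irrational $\alpha = (\sqrt{5} - 1)/2$ the average at the sample point is close to $1/2$ (irrational rotations are shown to be ergodic in §5.4.1), whereas for $\alpha = 1/2$ and $x = 0$ the orbit only visits $0$ and $1/2$ and the average is $1$.

```python
import math


def birkhoff_average(f, alpha, x0, n):
    """(1/n) * sum_{j<n} f(T^j x0) for the rotation T(x) = x + alpha mod 1.
    T^j x0 is computed directly as x0 + j*alpha mod 1 to avoid drift."""
    return sum(f((x0 + j * alpha) % 1.0) for j in range(n)) / n


f = lambda x: math.cos(2 * math.pi * x) ** 2  # integral over [0, 1) is 1/2

golden = (math.sqrt(5) - 1) / 2
irrational_avg = birkhoff_average(f, golden, 0.1, 10_000)
rational_avg = birkhoff_average(f, 0.5, 0.0, 10_000)
print(irrational_avg)  # close to 0.5
print(rational_avg)    # 1.0: the orbit of 0 is {0, 1/2}
```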

§5.2 Ergodicity

We define what it means to say that a measure-preserving transformation is ergodic.

Definition. Let $(X, \mathcal{B}, \mu)$ be a probability space and let $T : X \to X$ be a measure-preserving transformation. We say that $T$ is an ergodic transformation with respect to $\mu$ (or that $\mu$ is an ergodic measure) if, whenever $B \in \mathcal{B}$ satisfies $T^{-1}B = B$, then we have that $\mu(B) = 0$ or 1.

Remark. Ergodicity can be viewed as an indecomposability condition. If ergodicity does not hold then we can find a set $B \in \mathcal{B}$ such that $T^{-1}B = B$ and $0 < \mu(B) < 1$. We can then split $T : X \to X$ into $T : B \to B$ and $T : X \setminus B \to X \setminus B$ with invariant probability measures $\frac{1}{\mu(B)}\mu(\cdot \cap B)$ and $\frac{1}{1 - \mu(B)}\mu(\cdot \cap (X \setminus B))$, respectively.

It will sometimes be convenient for us to weaken the condition $T^{-1}B = B$ to $\mu(T^{-1}B \,\triangle\, B) = 0$, where $\triangle$ denotes the symmetric difference:

$$A \,\triangle\, B = (A \setminus B) \cup (B \setminus A).$$

We will often write that $A = B$ $\mu$-a.e. or $A = B \bmod 0$ to mean that $\mu(A \,\triangle\, B) = 0$.

Remark. It is easy to see that if A = B µ-a.e. then µ(A) = µ(B).

Lemma 5.2.1
Let $T$ be a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$. Suppose that $B \in \mathcal{B}$ is such that $\mu(T^{-1}B \,\triangle\, B) = 0$. Then there exists $B' \in \mathcal{B}$ with $T^{-1}B' = B'$ and $\mu(B \,\triangle\, B') = 0$. (In particular, $\mu(B) = \mu(B')$.)



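Proof (not examinable). For each $n \ge 0$, we have the inclusion

$$T^{-n}B \,\triangle\, B \subset \bigcup_{j=0}^{n-1} \left(T^{-(j+1)}B \,\triangle\, T^{-j}B\right) = \bigcup_{j=0}^{n-1} T^{-j}(T^{-1}B \,\triangle\, B).$$

Hence, as $T$ preserves $\mu$,

$$\mu(T^{-n}B \,\triangle\, B) \le n\,\mu(T^{-1}B \,\triangle\, B) = 0.$$

Let

$$B' = \bigcap_{n=0}^{\infty} \bigcup_{j=n}^{\infty} T^{-j}B.$$

We have that

$$\mu\Big(B \,\triangle \bigcup_{j=n}^{\infty} T^{-j}B\Big) \le \sum_{j=n}^{\infty} \mu(B \,\triangle\, T^{-j}B) = 0.$$

Since the sets $\bigcup_{j=n}^{\infty} T^{-j}B$ decrease as $n$ increases we have that $\mu(B \,\triangle\, B') = 0$. Also,

$$T^{-1}B' = \bigcap_{n=0}^{\infty} \bigcup_{j=n}^{\infty} T^{-(j+1)}B = \bigcap_{n=0}^{\infty} \bigcup_{j=n+1}^{\infty} T^{-j}B = B',$$

as required. ✷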

Corollary 5.2.2
If $T$ is ergodic and $\mu(T^{-1}B \,\triangle\, B) = 0$ then $\mu(B) = 0$ or 1.

We have the following convenient characterisations of ergodicity.

Proposition 5.2.3
Let $T$ be a measure-preserving transformation of the probability space $(X, \mathcal{B}, \mu)$. The following are equivalent:

(i) $T$ is ergodic;

(ii) whenever $f \in L^1(X, \mathcal{B}, \mu)$ satisfies $f \circ T = f$ $\mu$-a.e. we have that $f$ is constant $\mu$-a.e.;

(iii) whenever $f \in L^2(X, \mathcal{B}, \mu)$ satisfies $f \circ T = f$ $\mu$-a.e. we have that $f$ is constant $\mu$-a.e.

Remark. If $f$ is a constant function then clearly $f \circ T = f$. Proposition 5.2.3 says that, when $T$ is ergodic, the constants are the only $T$-invariant functions (up to sets of measure zero).

Proof. We prove that (i) implies (ii). Suppose that $T$ is ergodic. Suppose that $f \in L^1(X, \mathcal{B}, \mu)$ is such that $f \circ T = f$ $\mu$-a.e. By taking real and imaginary parts, we can assume without loss of generality that $f$ is real-valued. For $k \in \mathbb{Z}$ and $n \in \mathbb{N}$, define

$$X(k, n) = \left\{x \in X \;\middle|\; \frac{k}{2^n} \le f(x) < \frac{k + 1}{2^n}\right\} = f^{-1}\left[\frac{k}{2^n}, \frac{k + 1}{2^n}\right).$$

Since $f$ is measurable, we have that $X(k, n) \in \mathcal{B}$.



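We have that

$$T^{-1}X(k, n) \,\triangle\, X(k, n) \subset \{x \in X \mid f(Tx) \ne f(x)\}$$

so that $\mu(T^{-1}X(k, n) \,\triangle\, X(k, n)) = 0$.

Hence, as $T$ is ergodic, we have by Corollary 5.2.2 that $\mu(X(k, n)) = 0$ or $\mu(X(k, n)) = 1$. As $f \in L^1(X, \mathcal{B}, \mu)$ is integrable, we have that $f$ is finite almost everywhere. Hence, for each $n$,

$$f^{-1}\mathbb{R} = f^{-1}\left(\bigcup_{k=-\infty}^{\infty} \left[\frac{k}{2^n}, \frac{k + 1}{2^n}\right)\right) = \bigcup_{k=-\infty}^{\infty} f^{-1}\left[\frac{k}{2^n}, \frac{k + 1}{2^n}\right) = \bigcup_{k=-\infty}^{\infty} X(k, n)$$

is equal to $X$ up to a set of measure zero, i.e.,

$$\mu\Big(X \,\triangle \bigcup_{k \in \mathbb{Z}} X(k, n)\Big) = 0;$$

moreover, this union is disjoint. Hence we have

$$\sum_{k \in \mathbb{Z}} \mu(X(k, n)) = \mu(X) = 1$$

and so there is a unique $k_n$ for which $\mu(X(k_n, n)) = 1$. Let

$$Y = \bigcap_{n=1}^{\infty} X(k_n, n).$$

Then $\mu(Y) = 1$, as $Y$ is a countable intersection of sets of full measure. Let $x, y \in Y$. Then for each $n$ we have that $f(x), f(y) \in [k_n/2^n, (k_n + 1)/2^n)$. Hence for all $n \ge 1$ we have that

$$|f(x) - f(y)| \le \frac{1}{2^n}.$$

Hence $f(x) = f(y)$. Hence $f$ is constant on the set $Y$. Hence $f$ is constant $\mu$-a.e.

That (ii) implies (iii) is clear as if $f \in L^2(X, \mathcal{B}, \mu)$ then $f \in L^1(X, \mathcal{B}, \mu)$.

Finally, we prove that (iii) implies (i). Suppose that $B \in \mathcal{B}$ is such that $T^{-1}B = B$.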

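Then $\chi_B \in L^2(X, \mathcal{B}, \mu)$ and $\chi_B \circ T(x) = \chi_B(x)$ for all $x \in X$. Hence $\chi_B$ is constant $\mu$-a.e. Since $\chi_B$ only takes the values 0 and 1, we must have $\chi_B = 0$ $\mu$-a.e. or $\chi_B = 1$ $\mu$-a.e. Therefore

$$\mu(B) = \int_X \chi_B\,d\mu = \begin{cases} 0 & \text{if } \chi_B = 0\ \mu\text{-a.e.}, \\ 1 & \text{if } \chi_B = 1\ \mu\text{-a.e.} \end{cases}$$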

Hence T is ergodic with respect to µ. ✷

§5.3 Fourier series

We shall give a method for proving that certain transformations of the circle or torus are ergodic with respect to Lebesgue measure. To do this, we use Proposition 5.2.3 and Fourier series.



Let $X = \mathbb{R}/\mathbb{Z}$ denote the unit circle and let $f : X \to \mathbb{R}$. (Alternatively, we can think of $f$ as a periodic function $\mathbb{R} \to \mathbb{R}$ with $f(x) = f(x + n)$ for all $n \in \mathbb{Z}$.) Equip $X$ with the Borel σ-algebra, let $\mu$ denote Lebesgue measure and assume that $f \in L^2(X, \mathcal{B}, \mu)$.

We can associate to $f$ its Fourier series

$$\frac{a_0}{2} + \sum_{n=1}^{\infty} \left(a_n \cos 2\pi n x + b_n \sin 2\pi n x\right), \qquad (5.3.1)$$

where

$$a_n = 2\int_0^1 f(x) \cos 2\pi n x\,d\mu, \qquad b_n = 2\int_0^1 f(x) \sin 2\pi n x\,d\mu.$$

(Notice that we are not claiming that the series converges; we are just formally associating the Fourier series to $f$.)

We shall find it more convenient to work with a complex form of the Fourier series and rewrite (5.3.1) as

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}, \qquad (5.3.2)$$

where

$$c_n = \int_0^1 f(x) e^{-2\pi i n x}\,d\mu.$$

(In particular, $c_0 = \int_0^1 f\,d\mu$.) We call $c_n$ the $n$th Fourier coefficient.

Remark. That (5.3.2) and (5.3.1) are equivalent follows from the fact that

$$\cos 2\pi n x = \frac{e^{2\pi i n x} + e^{-2\pi i n x}}{2}, \qquad \sin 2\pi n x = \frac{e^{2\pi i n x} - e^{-2\pi i n x}}{2i}.$$

One can explain Fourier series by considering a more general construction. Recall that an inner product on a complex vector space $H$ is a function $\langle \cdot, \cdot \rangle : H \times H \to \mathbb{C}$ such that

(i) $\langle u, v\rangle = \overline{\langle v, u\rangle}$ for all $u, v \in H$,

(ii) for each $v \in H$, $u \mapsto \langle u, v\rangle$ is linear,

(iii) $\langle v, v\rangle \ge 0$ for all $v \in H$, with equality if and only if $v = 0$.

Given an inner product, one can define a norm on $H$ by setting $\|v\| = \sqrt{\langle v, v\rangle}$. One can then define a metric on $H$ by setting $d_H(u, v) = \|u - v\|$.

If $H$ is a complex vector space with an inner product $\langle \cdot, \cdot \rangle$ such that $H$ is complete with respect to the metric given by the inner product then we call $H$ a Hilbert space.

Recall that $L^2(X, \mathcal{B}, \mu)$ is a Hilbert space with the inner product

$$\langle f, g\rangle = \int f \bar{g}\,d\mu.$$

The metric on $L^2(X, \mathcal{B}, \mu)$ is then given by

$$d(f, g) = \left(\int |f - g|^2\,d\mu\right)^{1/2}.$$

Let $H$ be an infinite dimensional Hilbert space. We say that $\{e_j\}_{j=0}^{\infty}$ is an orthonormal basis for $H$ if:



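(i) $\langle e_i, e_j\rangle = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j; \end{cases}$

(ii) every $v \in H$ can be written in the form

$$v = \sum_{j=0}^{\infty} c_j e_j. \qquad (5.3.3)$$

As (5.3.3) involves an infinite sum, we need to be careful about what convergence means. To make (5.3.3) precise, let $s_n = \sum_{j=0}^{n} c_j e_j$ denote the $n$th partial sum. Then (5.3.3) means that $\|v - s_n\| \to 0$ as $n \to \infty$.

As the vectors $\{e_j\}_{j=0}^{\infty}$ are orthonormal, taking the inner product of (5.3.3) with $e_i$ (which commutes with the limit, as $u \mapsto \langle u, e_i\rangle$ is continuous) shows that

$$\langle v, e_i\rangle = \Big\langle \sum_{j=0}^{\infty} c_j e_j, e_i \Big\rangle = \sum_{j=0}^{\infty} c_j \langle e_j, e_i\rangle = c_i.$$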

Let $X = \mathbb{R}/\mathbb{Z}$ be the circle and let $\mathcal{B}$ be the Borel σ-algebra. Let $\mu$ denote Lebesgue measure. Let $x \in \mathbb{R}/\mathbb{Z}$. Let $e_n(x) = e^{2\pi i n x}$. Then $\{e_n\}_{n=-\infty}^{\infty}$ is an orthonormal basis for the Hilbert space $L^2(X, \mathcal{B}, \mu)$. Thus if $f \in L^2(X, \mathcal{B}, \mu)$ then we can write

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}$$

(in the sense that the sequence of partial sums $L^2$-converges to $f$) where

$$c_n = \langle f, e_n\rangle = \int f(x) e^{-2\pi i n x}\,d\mu. \qquad (5.3.4)$$

If we want to make the dependence of $c_n$ on $f$ clear, then we will sometimes write $c_n(f)$ for $c_n$.
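In practice the integral (5.3.4) can be approximated by an $N$-point midpoint sum. For a trigonometric polynomial the sum is exact up to rounding as long as every frequency $m$ present in $f$ satisfies $|m - n| < N$, because $\frac{1}{N}\sum_{j=0}^{N-1} e^{2\pi i (m-n)(j + 1/2)/N}$ vanishes for $0 < |m - n| < N$. The Python sketch below (the helper name `fourier_coefficient` and the test function are hypothetical) recovers the coefficients of $f(x) = 3 + 2\cos 2\pi x + \sin 4\pi x$, namely $c_0 = 3$, $c_{\pm 1} = 1$, $c_2 = -i/2$, $c_{-2} = i/2$, and checks the orthonormality of the $e_n$.

```python
import cmath
import math


def fourier_coefficient(f, n, N=256):
    """Approximate c_n(f) = integral of f(x) exp(-2 pi i n x) over [0, 1)
    with an N-point midpoint sum."""
    total = 0j
    for j in range(N):
        x = (j + 0.5) / N
        total += f(x) * cmath.exp(-2j * math.pi * n * x)
    return total / N


def f(x):
    return 3 + 2 * math.cos(2 * math.pi * x) + math.sin(4 * math.pi * x)


coeffs = {n: fourier_coefficient(f, n) for n in range(-3, 4)}
for n, c in coeffs.items():
    print(n, complex(round(c.real, 12), round(c.imag, 12)))

# <e_m, e_n> = c_n(e_m) should be 1 if m == n and 0 otherwise.
e = lambda m: (lambda x: cmath.exp(2j * math.pi * m * x))
gram = [[fourier_coefficient(e(m), n) for n in range(-2, 3)] for m in range(-2, 3)]
```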

We shall need the following facts about Fourier coefficients.

Proposition 5.3.1
(i) Let $f, g \in L^2(X, \mathcal{B}, \mu)$. Then $f = g$ $\mu$-a.e. if and only if their Fourier coefficients are equal, i.e. $c_n(f) = c_n(g)$ for all $n \in \mathbb{Z}$.

(ii) Let $f \in L^2(X, \mathcal{B}, \mu)$. Then $c_n \to 0$ as $n \to \pm\infty$.

Remark. Proposition 5.3.1(ii) is better known as the Riemann-Lebesgue Lemma.

So far, we have studied Fourier series for functions defined on the circle; a similar construction works for functions defined on the $k$-dimensional torus. Let $X = \mathbb{R}^k/\mathbb{Z}^k$ be the $k$-dimensional torus equipped with the Borel σ-algebra and let $\mu$ denote Lebesgue measure on $X$. Then $L^2(X, \mathcal{B}, \mu)$ is a Hilbert space when equipped with the inner product

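$$\langle f, g\rangle = \int f \bar{g}\,d\mu.$$

Let $x \in \mathbb{R}^k/\mathbb{Z}^k$. Let $n = (n_1, \ldots, n_k) \in \mathbb{Z}^k$ and define $e_n(x) = e^{2\pi i \langle n, x\rangle}$ where $\langle n, x\rangle = n_1 x_1 + \cdots + n_k x_k$. Then $\{e_n\}_{n \in \mathbb{Z}^k}$ is an orthonormal basis for $L^2(X, \mathcal{B}, \mu)$. Thus we can write $f \in L^2(X, \mathcal{B}, \mu)$ as

$$f(x) = \sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle n, x\rangle}$$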



in the sense that the sequence of partial sums $s_N$ converges in $L^2(X, \mathcal{B}, \mu)$, where

$$s_N(x) = \sum_{\substack{n = (n_1, \ldots, n_k) \in \mathbb{Z}^k \\ |n_j| \le N}} c_n e^{2\pi i \langle n, x\rangle}.$$

The $n$th Fourier coefficient is given by

$$c_n = c_n(f) = \int f(x) e^{-2\pi i \langle n, x\rangle}\,d\mu.$$

We have the following analogue of Proposition 5.3.1:

Proposition 5.3.2
(i) Let $f, g \in L^2(X, \mathcal{B}, \mu)$. Then $f = g$ $\mu$-a.e. if and only if their Fourier coefficients are equal.

(ii) Let $f \in L^2(X, \mathcal{B}, \mu)$. Let $n = (n_1, \ldots, n_k) \in \mathbb{Z}^k$ and define $\|n\| = \max_{1 \le j \le k} |n_j|$. Then $c_n \to 0$ as $\|n\| \to \infty$.

Remark. We could have used any norm on $\mathbb{Z}^k$ in (ii).

§5.4 Proving ergodicity using Fourier series

In the previous section we studied a number of examples of dynamical systems defined on the circle or the torus and we proved that Lebesgue measure is invariant. We show how Proposition 5.2.3 can be used in conjunction with Fourier series to determine whether Lebesgue measure is ergodic.

Recall that if $f \in L^2(X, \mathcal{B}, \mu)$ then we associate to $f$ the Fourier series

$$\sum_{n=-\infty}^{\infty} c_n(f) e^{2\pi i n x}$$

where

$$c_n(f) = \int f(x) e^{-2\pi i n x}\,d\mu.$$

If we let $s_n(x) = \sum_{\ell=-n}^{n} c_\ell(f) e^{2\pi i \ell x}$ then $\|s_n - f\|_2 \to 0$ as $n \to \infty$.

If $T$ is a measure-preserving transformation then it follows that

$$\begin{aligned}
\|s_n \circ T - f \circ T\|_2 &= \left(\int |s_n \circ T - f \circ T|^2\,d\mu\right)^{1/2} = \left(\int |s_n - f|^2 \circ T\,d\mu\right)^{1/2} \\
&= \left(\int |s_n - f|^2\,d\mu\right)^{1/2} = \|s_n - f\|_2 \to 0
\end{aligned}$$

as $n \to \infty$, where we have used Lemma 4.1.1. By Proposition 5.3.2(i) it follows that, if $\lim_{n \to \infty} s_n \circ T$ is a possibly infinite sum of terms of the form $e^{2\pi i n x}$, then it must be the Fourier series of $f \circ T$. In practice, this means that if we take the Fourier series for $f(x)$ and evaluate it at $Tx$, then we obtain the Fourier series for $f(Tx)$. If $f \circ T = f$ almost everywhere, then we can use Proposition 5.3.1(i) to compare Fourier coefficients to obtain relationships between the Fourier coefficients, and then show that $f$ must be constant.

A similar method works for Fourier series on the torus, as we shall see.
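The recipe "evaluate the Fourier series of $f$ at $Tx$" can be checked numerically. For the rotation $T(x) = x + \alpha \bmod 1$ it predicts $c_n(f \circ T) = e^{2\pi i n\alpha} c_n(f)$, and for the doubling map $T(x) = 2x \bmod 1$ it predicts $c_{2n}(f \circ T) = c_n(f)$ and $c_m(f \circ T) = 0$ for odd $m$. The Python sketch below compares both predictions with midpoint-sum coefficients; the test function, $\alpha$ and $N$ are hypothetical choices.

```python
import cmath
import math


def fourier_coefficient(g, n, N=512):
    """Midpoint-sum approximation of c_n(g) over [0, 1)."""
    return sum(g((j + 0.5) / N) * cmath.exp(-2j * math.pi * n * (j + 0.5) / N)
               for j in range(N)) / N


def f(x):  # a trigonometric polynomial with frequencies 0, 1 and -2
    return 1 + cmath.exp(2j * math.pi * x) + 0.5 * cmath.exp(-4j * math.pi * x)


alpha = math.sqrt(2) - 1
rotate = lambda x: f((x + alpha) % 1.0)
double = lambda x: f((2 * x) % 1.0)

rotation_error = max(
    abs(fourier_coefficient(rotate, n)
        - cmath.exp(2j * math.pi * n * alpha) * fourier_coefficient(f, n))
    for n in range(-4, 5))

doubling_error = max(
    abs(fourier_coefficient(double, m)
        - (fourier_coefficient(f, m // 2) if m % 2 == 0 else 0))
    for m in range(-8, 9))

print(rotation_error, doubling_error)  # both at rounding-error level
```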



§5.4.1 Rotations on a circle

Fix $\alpha \in \mathbb{R}$ and define $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$ by $T(x) = x + \alpha \bmod 1$. We have already seen that $T$ preserves Lebesgue measure. The following result gives a necessary and sufficient condition for $T$ to be ergodic.

Theorem 5.4.1
Let $T(x) = x + \alpha \bmod 1$.

(i) If $\alpha \in \mathbb{Q}$ then $T$ is not ergodic with respect to Lebesgue measure.

(ii) If $\alpha \notin \mathbb{Q}$ then $T$ is ergodic with respect to Lebesgue measure.

Proof. Suppose that $\alpha \in \mathbb{Q}$ and write $\alpha = p/q$ for $p, q \in \mathbb{Z}$ with $q \ne 0$. Define

$$f(x) = e^{2\pi i q x} \in L^2(X, \mathcal{B}, \mu).$$

Then $f$ is not constant but

$$f(Tx) = e^{2\pi i q(x + p/q)} = e^{2\pi i(qx + p)} = e^{2\pi i q x} = f(x).$$

Hence $T$ is not ergodic.

Suppose that $\alpha \notin \mathbb{Q}$. Suppose that $f \in L^2(X, \mathcal{B}, \mu)$ is such that $f \circ T = f$ a.e. We want to prove that $f$ is constant. Suppose that $f$ has Fourier series

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}.$$

Then $f \circ T$ has Fourier series

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n \alpha} e^{2\pi i n x}.$$

Comparing Fourier coefficients we see that

$$c_n = c_n e^{2\pi i n \alpha}$$

for all $n \in \mathbb{Z}$. As $\alpha \notin \mathbb{Q}$, we see that $e^{2\pi i n \alpha} \ne 1$ unless $n = 0$. Hence $c_n = 0$ for $n \ne 0$. Hence $f$ has Fourier series $c_0$, i.e. $f$ is constant a.e. ✷

§5.4.2 The doubling map

Let $X = \mathbb{R}/\mathbb{Z}$. Recall that if $f \in L^2(X, \mathcal{B}, \mu)$ has Fourier series

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}$$

then the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)) tells us that $c_n \to 0$ as $n \to \pm\infty$.

Proposition 5.4.2
The doubling map $T : X \to X$ defined by $T(x) = 2x \bmod 1$ is ergodic with respect to Lebesgue measure $\mu$.



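Proof. Let $f \in L^2(X, \mathcal{B}, \mu)$ and suppose that $f \circ T = f$ $\mu$-a.e. Let $f$ have Fourier series

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}.$$

For each $p \ge 0$, $f \circ T^p$ has Fourier series

$$\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n 2^p x}.$$

As $f \circ T^p = f$ $\mu$-a.e., comparing Fourier coefficients (the coefficient of $e^{2\pi i 2^p n x}$ in each series) we see that

$$c_n = c_{2^p n}$$

for all $n \in \mathbb{Z}$ and each $p = 0, 1, 2, \ldots$. Suppose that $n \ne 0$. Then $|2^p n| \to \infty$ as $p \to \infty$. By the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)), $c_{2^p n} \to 0$ as $p \to \infty$. As $c_{2^p n} = c_n$, we must have that $c_n = 0$ for $n \ne 0$. Thus $f$ has Fourier series $c_0$, and so must be equal to a constant a.e. Hence $T$ is ergodic with respect to $\mu$. ✷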
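The mechanism in this proof can be seen on a concrete function that is not $T$-invariant. For $f(x) = x$ on $[0, 1)$ one has $c_0 = 1/2$ and $c_n = i/(2\pi n)$ for $n \ne 0$, so $|c_{2^p n}| = |c_n|/2^p \to 0$ as the Riemann-Lebesgue Lemma predicts, and the identity $c_n = c_{2^p n}$ that an invariant function would need fails for every $n \ne 0$. The Python sketch below computes these coefficients with an $N$-point midpoint sum, which for this $f$ equals $i/(2N\sin(\pi n/N))$ when $n$ is not a multiple of $N$, so its relative error against $i/(2\pi n)$ is roughly $(\pi n/N)^2/6$. The choice of $f$ and $N$ is hypothetical.

```python
import cmath
import math


def fourier_coefficient(g, n, N=8192):
    """Midpoint-sum approximation of c_n(g) = integral of g(x) e^{-2 pi i n x} over [0, 1)."""
    return sum(g((j + 0.5) / N) * cmath.exp(-2j * math.pi * n * (j + 0.5) / N)
               for j in range(N)) / N


f = lambda x: x  # hypothetical example; not invariant under the doubling map

for n in (1, 3):
    for p in range(7):
        m = 2 ** p * n
        c = fourier_coefficient(f, m)
        exact = 1j / (2 * math.pi * m)
        print(f"c_{m:<3d} = {c.imag:+.6f}i   (exact {exact.imag:+.6f}i)")
```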

§5.4.3 Toral endomorphisms

The argument for the doubling map can be generalised using higher-dimensional Fourier series to study toral endomorphisms. Let $X = \mathbb{R}^k/\mathbb{Z}^k$ and let $\mu$ denote Lebesgue measure. When $T$ is invertible (and so a linear toral automorphism) we have already seen that Lebesgue measure is an invariant measure; in §7 we shall see that Lebesgue measure is an invariant measure when $T$ is a linear toral endomorphism.

Recall that $f \in L^2(X, \mathcal{B}, \mu)$ has Fourier series

$$\sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle n, x\rangle},$$

where $n = (n_1, \ldots, n_k)$, $x = (x_1, \ldots, x_k)$. Define $|n| = \max_{1 \le j \le k} |n_j|$. Then the Riemann-Lebesgue Lemma tells us that $c_n \to 0$ as $|n| \to \infty$.

Let $A$ be a $k \times k$ integer matrix with $\det A \ne 0$ and define $T : X \to X$ by

$$T((x_1, \ldots, x_k) + \mathbb{Z}^k) = A(x_1, \ldots, x_k) + \mathbb{Z}^k.$$

Proposition 5.4.3
A linear toral endomorphism $T$ is ergodic with respect to $\mu$ if and only if no eigenvalue of $A$ is a root of unity.

Remark. In particular, hyperbolic toral automorphisms (i.e. $\det A = \pm 1$ and $A$ has no eigenvalues of modulus 1) are ergodic with respect to Lebesgue measure.

Proof. Suppose that $T$ is ergodic but, for a contradiction, that $A$ has a $p$th root of unity as an eigenvalue. We choose $p > 0$ to be the least such integer. Then $A^p$ has 1 as an eigenvalue, and so $n(A^p - I) = 0$ for some non-zero row vector $n = (n_1, \ldots, n_k) \in \mathbb{R}^k$. Since $A$ is an integer matrix, we have that $A^p - I$ is an integer matrix, and so we can in fact take $n \in \mathbb{Z}^k$ (take a rational solution and clear denominators). Note that

$$e^{2\pi i \langle n, A^p x\rangle} = e^{2\pi i \langle nA^p, x\rangle} = e^{2\pi i \langle n, x\rangle}.$$



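This is because, writing $x = (x_1, \ldots, x_k)^T$,

$$\langle n, Ax\rangle = (n_1, \ldots, n_k) \begin{pmatrix} a_{1,1} & \cdots & a_{1,k} \\ \vdots & & \vdots \\ a_{k,1} & \cdots & a_{k,k} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix} = \langle nA, x\rangle.$$

Define

$$f(x) = \sum_{j=0}^{p-1} e^{2\pi i \langle n, A^j x\rangle}.$$

Then $f \in L^2(X, \mathcal{B}, \mu)$ and is $T$-invariant. Since $T$ is ergodic, we must have that $f$ is constant. But $f = \sum_{j=0}^{p-1} e_{nA^j}$ and, as $\det A \ne 0$, each frequency $nA^j$ is non-zero when $n \ne 0$; so the Fourier coefficient of $f$ at $0$ vanishes while its Fourier coefficient at $n$ is a positive integer. Hence $f$ can only be constant if $n = 0$, a contradiction.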

Conversely suppose that no eigenvalue of $A$ is a root of unity; we prove that $T$ is ergodic with respect to Lebesgue measure. Suppose that $f \in L^2(X, \mathcal{B}, \mu)$ is $T$-invariant $\mu$-a.e. We show that $f$ is constant $\mu$-a.e. Associate to $f$ its Fourier series:

$$\sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle n, x\rangle}.$$

Since $f \circ T^p = f$ $\mu$-a.e. for all $p > 0$, we have that

$$\sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle nA^p, x\rangle} = \sum_{n \in \mathbb{Z}^k} c_n e^{2\pi i \langle n, x\rangle}.$$

Comparing Fourier coefficients we see that, for every $n \in \mathbb{Z}^k$,

$$c_n = c_{nA} = \cdots = c_{nA^p} = \cdots.$$

If $c_n \ne 0$ then there can only be finitely many distinct indices in the above list, for otherwise it would contradict the fact that $c_n \to 0$ as $|n| \to \infty$, by the Riemann-Lebesgue Lemma (Proposition 5.3.2(ii)). Hence there exist $q_1 > q_2 \ge 0$ such that $nA^{q_1} = nA^{q_2}$. As $A$ is invertible, letting $p = q_1 - q_2 > 0$ we see that $nA^p = n$. Thus $n$ is either equal to 0 or $n$ is a (left) eigenvector for $A^p$ with eigenvalue 1. In the latter case, $A$ would have a $p$th root of unity as an eigenvalue. Hence $n = 0$. Hence $c_n = 0$ unless $n = 0$ and so $f$ is equal to the constant $c_0$ $\mu$-a.e. Thus $T$ is ergodic. ✷
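The dichotomy in this proof can be seen by iterating a frequency vector $n \mapsto nA$. For the CAT map $A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$, whose eigenvalues $(3 \pm \sqrt{5})/2$ are not roots of unity, the orbit of $n = (1, 0)$ consists of distinct, growing vectors (pairs of Fibonacci numbers), so a non-zero $c_n$ would be repeated infinitely often, against the Riemann-Lebesgue Lemma. For the hypothetical matrix $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, whose eigenvalues $\pm i$ are 4th roots of unity, the orbit returns to $n$ after 4 steps, and summing $e^{2\pi i \langle nA^j, x\rangle}$ over one period gives a non-constant invariant function, as in the first half of the proof. A Python sketch, with hypothetical helper names:

```python
import cmath
import math


def row_times_matrix(n, A):
    """The row vector nA for an integer row vector n and integer matrix A."""
    k = len(n)
    return tuple(sum(n[i] * A[i][j] for i in range(k)) for j in range(k))


def frequency_orbit(n, A, steps):
    """[n, nA, nA^2, ..., nA^steps]."""
    orbit = [tuple(n)]
    for _ in range(steps):
        orbit.append(row_times_matrix(orbit[-1], A))
    return orbit


CAT = [[2, 1], [1, 1]]
QUARTER_TURN = [[0, 1], [-1, 0]]  # hypothetical example with eigenvalues +/- i

cat_orbit = frequency_orbit((1, 0), CAT, 20)
turn_orbit = frequency_orbit((1, 0), QUARTER_TURN, 8)
print(cat_orbit[:5])   # [(1, 0), (2, 1), (5, 3), (13, 8), (34, 21)]
print(turn_orbit[:5])  # [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 0)]

# Eigenvalues of a 2x2 matrix from trace and determinant.
tr, det = 3, 1
eigs = [(tr + s * math.sqrt(tr * tr - 4 * det)) / 2 for s in (1, -1)]
print(eigs)  # about 2.618 and 0.382, neither of modulus 1


def f_turn(x):
    """Sum of e^{2 pi i <nA^j, x>} over one period j = 0, 1, 2, 3."""
    return sum(cmath.exp(2j * math.pi * (m[0] * x[0] + m[1] * x[1]))
               for m in turn_orbit[:4])


def T_turn(x):
    """x -> Ax mod 1 for the quarter-turn matrix, i.e. (x1, x2) -> (x2, -x1)."""
    return (x[1] % 1.0, -x[0] % 1.0)


x = (0.1, 0.3)
print(abs(f_turn(T_turn(x)) - f_turn(x)) < 1e-12)  # True: f_turn is T-invariant
```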

§5.5 Exercises

Exercise 5.1
Suppose that $\alpha \in \mathbb{Q}$. Show directly from the definition that the rotation $T(x) = x + \alpha \bmod 1$ is not ergodic, i.e. find an invariant set $B = T^{-1}B$, $B \in \mathcal{B}$, which has Lebesgue measure $0 < \mu(B) < 1$.

Exercise 5.2
Define $T : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}^2/\mathbb{Z}^2$ by

$$T((x, y) + \mathbb{Z}^2) = (x + \alpha, x + y) + \mathbb{Z}^2.$$

Suppose that $\alpha \notin \mathbb{Q}$. By using Fourier series, show that $T$ is ergodic with respect to Lebesgue measure.



Exercise 5.3
Let $T : X \to X$ be a measurable transformation of a measurable space $(X, \mathcal{B})$. Suppose that $x = T^n x$ is a periodic point with period $n$. Define the measure $\mu$ supported on the periodic orbit of $x$ by

$$\mu = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j x}$$

where $\delta_x$ denotes the Dirac measure at $x$. Show from the definition of ergodicity that $\mu$ is an ergodic measure.

Exercise 5.4
(Part (iv) of this exercise is outside the scope of the course!)

It is easy to construct lots of examples of hyperbolic toral automorphisms (i.e. no eigenvalues of modulus 1; the CAT map is such an example), which must necessarily be ergodic with respect to Lebesgue measure. It is harder to show that there are ergodic toral automorphisms with some eigenvalues of modulus 1.

(i) Show that to have an ergodic toral automorphism of $\mathbb{R}^k/\mathbb{Z}^k$ with an eigenvalue of modulus 1, we must have $k \ge 4$.

Consider the matrix

$$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -1 & 8 & -6 & 8 \end{pmatrix}.$$

(ii) Show that $A$ defines a linear toral automorphism $T_A$ of the 4-dimensional torus $\mathbb{R}^4/\mathbb{Z}^4$.

(iii) Show that $A$ has four eigenvalues, two of which have modulus 1.

(iv) Show that $T_A$ is ergodic with respect to Lebesgue measure. (Hint: you have to show that the two eigenvalues of modulus 1 are not roots of unity, i.e. are not solutions to $\lambda^n - 1 = 0$ for some $n$. The best way to do this is to use results from Galois theory on the irreducibility of polynomials.)



6. Ergodic measures: Using the Hahn-Kolmogorov Extension Theorem to prove ergodicity

§6.1 Introduction

We illustrate a method for proving that a given transformation is ergodic using the Hahn-Kolmogorov Extension Theorem. The key observation is the following technical lemma.

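Lemma 6.1.1
Let $(X, \mathcal{B}, \mu)$ be a probability space and suppose that $\mathcal{A} \subset \mathcal{B}$ is an algebra that generates $\mathcal{B}$. Let $B \in \mathcal{B}$. Suppose there exists $K > 0$ such that

$$\mu(B)\mu(I) \leq K\mu(B \cap I) \qquad (6.1.1)$$

for all $I \in \mathcal{A}$. Then $\mu(B) = 0$ or $1$.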

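Proof. Let $\varepsilon > 0$. As $\mathcal{A}$ generates $\mathcal{B}$, there exists $I \in \mathcal{A}$ such that $\mu(B^c \triangle I) < \varepsilon$. Hence $|\mu(B^c) - \mu(I)| < \varepsilon$. Moreover, note that $B \cap I \subset B^c \triangle I$, so that $\mu(B \cap I) < \varepsilon$. Hence

$$\mu(B)\mu(B^c) \leq \mu(B)(\mu(I) + \varepsilon) \leq \mu(B)\mu(I) + \mu(B)\varepsilon \leq K\mu(B \cap I) + \varepsilon \leq (K + 1)\varepsilon.$$

As $\varepsilon > 0$ is arbitrary, it follows that $\mu(B)\mu(B^c) = 0$. Hence $\mu(B) = 0$ or $1$. ✷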

Remark. We will often apply Lemma 6.1.1 when $\mathcal{A}$ is an algebra of finite unions of intervals or cylinders. In this case, we need only check that there exists a constant $K > 0$ such that (6.1.1) holds for intervals or cylinders. To see this, let $I \in \mathcal{A}$ and write $I = \bigcup_{j=1}^k I_j$ as a finite union of pairwise disjoint intervals or cylinders $I_j$. Then if (6.1.1) holds, with the same constant $K$, for each $I_j$, we have

$$\mu(B)\mu(I) = \mu(B)\mu\Big(\bigcup_{j=1}^k I_j\Big) = \sum_{j=1}^k \mu(B)\mu(I_j) \leq K\sum_{j=1}^k \mu(B \cap I_j) = K\mu\Big(B \cap \bigcup_{j=1}^k I_j\Big) = K\mu(B \cap I).$$

We will also use the change of variables formula for integration. Recall that if $I, J \subset \mathbb{R}$ are intervals, $u : I \to J$ is a differentiable bijection, and $f : J \to \mathbb{R}$ is integrable, then

$$\int_J f(x)\,dx = \int_I f(u(x))\,|u'(x)|\,dx.$$



§6.2 The doubling map

To illustrate the method, we give another proof that the doubling map is ergodic with respect to Lebesgue measure. Let $X = [0, 1]$ be the unit interval, let $\mathcal{B}$ be the Borel σ-algebra, and let $\mu$ be Lebesgue measure.

Given $x \in [0, 1]$, we can write $x$ as a base 2 ‘decimal’ expansion:

$$x = \cdot x_0x_1x_2\ldots = \sum_{j=0}^\infty \frac{x_j}{2^{j+1}} \qquad (6.2.1)$$

where $x_j \in \{0, 1\}$. Note that

$$T(x) = 2\sum_{j=0}^\infty \frac{x_j}{2^{j+1}} \bmod 1 = x_0 + \sum_{j=0}^\infty \frac{x_{j+1}}{2^{j+1}} \bmod 1 = \sum_{j=0}^\infty \frac{x_{j+1}}{2^{j+1}}.$$

Hence if $x$ has base 2 expansion given by (6.2.1) then $T(x)$ has base 2 expansion given by

$$T(x) = \cdot x_1x_2x_3\ldots$$

i.e. $T$ deletes the zeroth term in the base 2 expansion of $x$ and shifts the remaining terms one place to the left.
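The digit-shifting description of $T$ can be checked exactly on rational points. The sketch below is an illustration with hypothetical helper names `binary_digits` and `doubling`; it uses Python's `fractions.Fraction` so that no rounding occurs, computes the first digits $x_0, x_1, \ldots$ of $x = 5/13$, and confirms that the digits of $T(x)$ are those of $x$ with $x_0$ deleted.

```python
from fractions import Fraction


def binary_digits(x, n):
    """First n digits x_0, ..., x_{n-1} of x = sum_j x_j / 2^(j+1), for 0 <= x < 1."""
    digits = []
    for _ in range(n):
        x = 2 * x
        d = 1 if x >= 1 else 0   # the next digit is the integer part of 2x
        digits.append(d)
        x -= d
    return digits


def doubling(x):
    """The doubling map T(x) = 2x mod 1."""
    return (2 * x) % 1


x = Fraction(5, 13)
digits = binary_digits(x, 12)
print(digits)                                        # [0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1]
print(binary_digits(doubling(x), 11) == digits[1:])  # True: T deletes x_0
```

For a dyadic rational the loop produces the terminating expansion; the other (eventually all 1s) expansion is never chosen.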

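We introduce dyadic intervals or cylinders to be the sets

$$I(i_0, i_1, \ldots, i_{n-1}) = \{x \in [0, 1] \mid x_j = i_j,\ j = 0, \ldots, n-1\}.$$

(So, for example, $I(0) = [0, 1/2]$, $I(1) = [1/2, 1]$, $I(0, 0) = [0, 1/4]$, $I(0, 1) = [1/4, 1/2]$, etc.) We call $n$ the rank of the cylinder. A dyadic interval is an interval with end-points at $k/2^n$, $(k + 1)/2^n$ where $n \geq 1$ and $k \in \{0, 1, \ldots, 2^n - 1\}$. (A dyadic rational has two base 2 expansions, so it can be an end-point shared by two adjacent cylinders; as there are only countably many such points, this does not affect any of the measures computed below.)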
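As a quick check of the examples above, the following sketch (the helper name `cylinder` is hypothetical) reads the digits $i_0 i_1 \ldots i_{n-1}$ as the binary integer $k$ and returns the end-points $k/2^n$ and $(k+1)/2^n$ of $I(i_0, \ldots, i_{n-1})$. It also checks that the $2^n$ cylinders of rank $n = 3$ each have length $1/2^n$ and fit together to cover $[0, 1]$.

```python
from fractions import Fraction
from itertools import product


def cylinder(digits):
    """End-points (k/2^n, (k+1)/2^n) of the dyadic cylinder I(i_0, ..., i_{n-1})."""
    n = len(digits)
    k = 0
    for d in digits:            # read i_0 i_1 ... i_{n-1} as a binary integer
        k = 2 * k + d
    return Fraction(k, 2 ** n), Fraction(k + 1, 2 ** n)


# The examples from the text: I(0), I(1), I(0,0), I(0,1).
assert cylinder([0]) == (0, Fraction(1, 2))
assert cylinder([1]) == (Fraction(1, 2), 1)
assert cylinder([0, 0]) == (0, Fraction(1, 4))
assert cylinder([0, 1]) == (Fraction(1, 4), Fraction(1, 2))

# Rank-3 cylinders, in lexicographic order of their digits, tile [0, 1].
rank3 = [cylinder(list(w)) for w in product([0, 1], repeat=3)]
assert all(b - a == Fraction(1, 8) for a, b in rank3)
assert rank3[0][0] == 0 and rank3[-1][1] == 1
assert all(rank3[m][1] == rank3[m + 1][0] for m in range(len(rank3) - 1))
print("all cylinder checks passed")
```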

Let $\mathcal{A}$ denote the algebra of finite unions of cylinders. Then $\mathcal{A}$ generates the Borel σ-algebra. This follows from Proposition 2.4.2 by noting that cylinders are intervals (and so Borel) and that they separate points: if $x, y \in [0, 1]$, $x \neq y$, then they have base 2 expansions that differ at some index, say $x_n \neq y_n$. Hence $x, y$ belong to disjoint cylinders of rank $n + 1$.

Define the maps

$$\phi_0(x) = \frac{x}{2}, \qquad \phi_1(x) = \frac{x + 1}{2}.$$

Then $\phi_0 : [0, 1] \to I(0)$ and $\phi_1 : [0, 1] \to I(1)$ are differentiable bijections. Indeed, if $x \in [0, 1]$ has base 2 expansion

$$x = \cdot x_0x_1x_2\ldots$$

then $\phi_0(x)$ and $\phi_1(x)$ have base 2 expansions given by

$$\phi_0(x) = \cdot 0x_0x_1x_2\ldots, \qquad \phi_1(x) = \cdot 1x_0x_1x_2\ldots.$$

Thus $\phi_0$ and $\phi_1$ act on base 2 expansions as a shift to the right, inserting the digits 0 and 1 in the zeroth place, respectively. Note that $T\phi_0(x) = x$ and $T\phi_1(x) = x$ for all $x \in [0, 1]$.

Given $i_0, i_1, \ldots, i_{n-1} \in \{0, 1\}$, define

$$\phi_{i_0, i_1, \ldots, i_{n-1}} : [0, 1] \to I(i_0, i_1, \ldots, i_{n-1})$$

by

$$\phi_{i_0, i_1, \ldots, i_{n-1}} = \phi_{i_0}\phi_{i_1}\cdots\phi_{i_{n-1}}. \qquad (6.2.2)$$

Thus $\phi_{i_0, i_1, \ldots, i_{n-1}}$ takes the point $x$ with base 2 expansion given by (6.2.1), shifts the digits $n$ places to the right, and inserts the digits $i_0, i_1, \ldots, i_{n-1}$ in the first $n$ places. Note that $T^n\phi_{i_0, i_1, \ldots, i_{n-1}}(x) = x$ for all $x \in [0, 1]$.
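The inverse branches can be tested in the same exact arithmetic. In the sketch below (the names `phi`, `phi_word` and `doubling` are hypothetical), `phi_word` applies $\phi_{i_{n-1}}$ first and $\phi_{i_0}$ last, as in (6.2.2). For a few sample points of $[0, 1)$ and all words of length 3 it checks that $\phi_{i_0, \ldots, i_{n-1}}(x)$ lies in $I(i_0, \ldots, i_{n-1})$, that $T^n\phi_{i_0, \ldots, i_{n-1}}(x) = x$, and that $\phi_{i_0, \ldots, i_{n-1}}$ is affine with slope $1/2^n$. (The sample avoids $x = 1$, where $T(1) = 0$.)

```python
from fractions import Fraction
from itertools import product


def phi(i, x):
    """Inverse branch phi_i(x) = (x + i)/2 of the doubling map, for i in {0, 1}."""
    return (x + i) / 2


def phi_word(word, x):
    """phi_{i_0,...,i_{n-1}} = phi_{i_0} o ... o phi_{i_{n-1}}: apply phi_{i_{n-1}} first."""
    for i in reversed(word):
        x = phi(i, x)
    return x


def doubling(x):
    return (2 * x) % 1


samples = [Fraction(0), Fraction(3, 7), Fraction(2, 3), Fraction(9, 10)]
for word in product([0, 1], repeat=3):
    n = len(word)
    k = int("".join(map(str, word)), 2)             # I(word) = [k/2^n, (k+1)/2^n]
    for x in samples:
        y = phi_word(word, x)
        assert Fraction(k, 2 ** n) <= y <= Fraction(k + 1, 2 ** n)
        z = y
        for _ in range(n):
            z = doubling(z)
        assert z == x                               # T^n phi(x) = x
    # phi_word is affine, so this difference quotient is its derivative: 1/2^n
    slope = (phi_word(word, Fraction(1, 2)) - phi_word(word, Fraction(0))) / Fraction(1, 2)
    assert slope == Fraction(1, 2 ** n)
print("inverse-branch checks passed")
```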

We are now in a position to prove that $T$ is ergodic with respect to Lebesgue measure. Let $B \in \mathcal{B}$ be such that $T^{-1}B = B$. We must show that $\mu(B) = 0$ or $1$. By Lemma 6.1.1 and the Remark following it, it is sufficient to prove that there exists $K > 0$ such that $\mu(B)\mu(I) \leq K\mu(B \cap I)$ for all cylinders $I$; in fact, we shall prove that $\mu(B)\mu(I) = \mu(B \cap I)$ for all cylinders (dyadic intervals) $I$.

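Note that $T^{-n}B = B$ (by induction, as $T^{-1}B = B$). Let $I = I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder of rank $n$ and let $\phi = \phi_{i_0, i_1, \ldots, i_{n-1}}$. Then $T^n\phi(x) = x$. Note also that $\mu(I) = 1/2^n$. We will also need the fact that $\phi'(x) = 1/2^n$ (this follows by noting that $\phi_0'(x) = \phi_1'(x) = 1/2$ and differentiating (6.2.2) using the chain rule). Finally, we observe that

$$\begin{aligned}
\mu(B \cap I) &= \int \chi_{B \cap I}(x)\,dx = \int \chi_B(x)\chi_I(x)\,dx = \int_I \chi_B(x)\,dx \\
&= \int_0^1 \chi_B(\phi(x))\,\phi'(x)\,dx && \text{by the change of variables formula} \\
&= \int_0^1 \chi_{T^{-n}B}(\phi(x))\,\phi'(x)\,dx && \text{as } T^{-n}B = B \\
&= \int_0^1 \chi_B(T^n(\phi(x)))\,\phi'(x)\,dx && \text{as } \chi_{T^{-n}B} = \chi_B \circ T^n \\
&= \int_0^1 \chi_B(x)\,\phi'(x)\,dx && \text{as } T^n\phi(x) = x \\
&= \frac{1}{2^n}\int_0^1 \chi_B(x)\,dx && \text{as } \phi'(x) = 1/2^n \\
&= \mu(I)\mu(B) && \text{as } \mu(I) = 1/2^n.
\end{aligned}$$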

Hence $\mu(B \cap I) = \mu(B)\mu(I)$ for all cylinders $I$. By Lemma 6.1.1 (and the Remark following it) it follows that $\mu(B) = 0$ or $1$. Hence Lebesgue measure is an ergodic measure for $T$.

§6.3 The Gauss map

Let $x \in [0, 1]$. If $x$ has continued fraction expansion

$$x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}}$$

then for brevity we write $x = [x_0, x_1, x_2, \ldots]$. Let $X = [0, 1]$ and recall that the Gauss map is defined by $T(x) = 1/x \bmod 1$ (with $T$ defined at 0 by setting $T(0) = 0$). If $x$ has continued fraction expansion $[x_0, x_1, x_2, \ldots]$ then $T(x)$ has continued fraction expansion $[x_1, x_2, \ldots]$. We have already seen that $T$ leaves Gauss' measure $\mu$ invariant, where Gauss' measure is defined by

$$\mu(B) = \frac{1}{\log 2}\int_B \frac{1}{1 + x}\,dx.$$
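The shift property of the Gauss map, and the invariance of Gauss' measure, can both be checked numerically. In the sketch below (helper names are hypothetical), `cf_digits` computes the digits of a rational $x \in (0, 1)$ by iterating $T$ exactly with `fractions.Fraction`. The second part uses the closed form $\mu([a, b]) = \log\frac{1 + b}{1 + a}\big/\log 2$, obtained by integrating the density, together with the decomposition $T^{-1}[0, b] = \{0\} \cup \bigcup_{k \geq 1}[1/(k + b), 1/k]$ for $0 \leq b < 1$, to compare $\mu(T^{-1}[0, b])$ (truncated to finitely many $k$) with $\mu([0, b])$. Only sets of the form $[0, b]$ are tested, so this is a sanity check rather than a proof.

```python
import math
from fractions import Fraction


def gauss(x):
    """The Gauss map T(x) = 1/x mod 1, with T(0) = 0."""
    return Fraction(0) if x == 0 else (1 / x) % 1


def cf_digits(x):
    """Digits [x_0, x_1, ...] of a rational x in (0, 1), i.e. x = 1/(x_0 + 1/(x_1 + ...))."""
    digits = []
    while x != 0:
        digits.append(math.floor(1 / x))
        x = gauss(x)
    return digits


def gauss_measure(a, b):
    """Gauss' measure of [a, b]: (1/log 2) * integral_a^b dx/(1 + x)."""
    return math.log((1 + b) / (1 + a)) / math.log(2)


def gauss_measure_of_preimage(b, terms):
    """mu(T^{-1}[0, b]) for 0 <= b < 1, summing over the intervals [1/(k+b), 1/k], k <= terms."""
    return sum(gauss_measure(1 / (k + b), 1 / k) for k in range(1, terms + 1))


x = Fraction(7, 30)
print(cf_digits(x), cf_digits(gauss(x)))     # [4, 3, 2] [3, 2]
b = 0.3
print(gauss_measure(0, b), gauss_measure_of_preimage(b, 100000))   # agree to about 5 decimal places
```

The truncated sum telescopes to $\log\frac{(1 + b)(K + 1)}{K + 1 + b}\big/\log 2$, so the truncation error is roughly $b/((K + 1)\log 2)$.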

We shall find it convenient to swap between Gauss' measure $\mu$ and Lebesgue measure, which we shall denote here by $\lambda$. Recall from Exercise 3.5 that for any set $B \in \mathcal{B}$ we have

$$\frac{1}{2\log 2}\lambda(B) \leq \mu(B) \leq \frac{1}{\log 2}\lambda(B).$$

Hence $\mu(B) = 0$ if and only if $\lambda(B) = 0$. Thus to prove ergodicity it suffices to show that any $T$-invariant set $B$ has either $\lambda(B) = 0$ or $\lambda(B^c) = 0$.

We shall also need some basic facts about continued fractions. Let $x \in (0, 1)$ be irrational and have continued fraction expansion $[x_0, x_1, \ldots]$. For any $t \in [0, 1]$, write

$$[x_0, x_1, \ldots, x_{n-1} + t] = \frac{P_n(x_0, x_1, \ldots, x_{n-1}; t)}{Q_n(x_0, x_1, \ldots, x_{n-1}; t)}$$

where $P_n(x_0, x_1, \ldots, x_{n-1}; t)$ and $Q_n(x_0, x_1, \ldots, x_{n-1}; t)$ are polynomials in $x_0, x_1, \ldots, x_{n-1}$ and $t$. Let $P_n = P_n(x_0, x_1, \ldots, x_{n-1}; 0)$, $Q_n = Q_n(x_0, x_1, \ldots, x_{n-1}; 0)$ (we suppress the dependence of $P_n$ and $Q_n$ on $x_0, \ldots, x_{n-1}$ for brevity). The following lemma is easily proved using induction.

Lemma 6.3.1
(i) We have

$$P_n(x_0, x_1, \ldots, x_{n-1}; t) = P_n + tP_{n-1}, \qquad Q_n(x_0, x_1, \ldots, x_{n-1}; t) = Q_n + tQ_{n-1},$$

and the following recurrence relations hold:

$$P_{n+1} = x_nP_n + P_{n-1}, \qquad Q_{n+1} = x_nQ_n + Q_{n-1},$$

with initial conditions $P_0 = 0$, $P_1 = 1$, $Q_0 = 1$, $Q_1 = x_0$.

(ii) The following identity holds:

$$Q_nP_{n-1} - Q_{n-1}P_n = (-1)^n.$$
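Lemma 6.3.1 is easy to test on examples. The sketch below (helper names `convergents` and `cf_value` are hypothetical) builds $P_0, \ldots, P_n$ and $Q_0, \ldots, Q_n$ from the recurrence relations in (i), evaluates $[x_0, \ldots, x_{n-1} + t]$ directly in exact rational arithmetic, and checks both parts of the lemma for the example digits $[2, 5, 1, 3]$.

```python
from fractions import Fraction


def convergents(digits):
    """Lists P[0..n], Q[0..n] from P_{m+1} = x_m P_m + P_{m-1}, Q_{m+1} = x_m Q_m + Q_{m-1}."""
    P, Q = [0, 1], [1, digits[0]]
    for m in range(1, len(digits)):
        P.append(digits[m] * P[m] + P[m - 1])
        Q.append(digits[m] * Q[m] + Q[m - 1])
    return P, Q


def cf_value(digits, t=0):
    """[x_0, ..., x_{n-1} + t] = 1/(x_0 + 1/(x_1 + ... + 1/(x_{n-1} + t)))."""
    value = Fraction(digits[-1]) + t
    for d in reversed(digits[:-1]):
        value = d + 1 / value
    return 1 / value


digits = [2, 5, 1, 3]
n = len(digits)
P, Q = convergents(digits)
print(P, Q)                       # [0, 1, 5, 6, 23] [1, 2, 11, 13, 50]

# (i): [x_0, ..., x_{m-1}] = P_m / Q_m, and the t-dependence is (P_n + t P_{n-1})/(Q_n + t Q_{n-1}).
for m in range(1, n + 1):
    assert cf_value(digits[:m]) == Fraction(P[m], Q[m])
t = Fraction(2, 7)
assert cf_value(digits, t) == (P[n] + t * P[n - 1]) / (Q[n] + t * Q[n - 1])

# (ii): Q_m P_{m-1} - Q_{m-1} P_m = (-1)^m.
for m in range(1, n + 1):
    assert Q[m] * P[m - 1] - Q[m - 1] * P[m] == (-1) ** m
```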

Let $i_0, i_1, \ldots, i_{n-1} \in \mathbb{N}$. Define the cylinder $I(i_0, i_1, \ldots, i_{n-1})$ to be the set of all points $x \in (0, 1)$ whose continued fraction expansion starts with $i_0, \ldots, i_{n-1}$. This is easily seen to be an interval; indeed

$$I(i_0, i_1, \ldots, i_{n-1}) = \{[i_0, i_1, \ldots, i_{n-1} + t] \mid t \in [0, 1)\}.$$

Let $\mathcal{A}$ denote the algebra of finite unions of cylinders. Then $\mathcal{A}$ generates the Borel σ-algebra. (This follows from Proposition 2.4.2: cylinders are clearly Borel sets and they separate points. To see this, note that if $x \neq y$ then they have different continued fraction expansions. Hence there exists $n$ such that $x_n \neq y_n$. Hence $x, y$ are in different cylinders of rank $n + 1$, and these cylinders are disjoint.)

For each $i \in \mathbb{N}$ define the map $\phi_i : [0, 1) \to I(i)$ by

$$\phi_i(x) = \frac{1}{i + x}.$$

Thus if $x$ has continued fraction expansion $[x_0, x_1, \ldots]$ then $\phi_i(x)$ has continued fraction expansion $[i, x_0, x_1, \ldots]$. Clearly $T(\phi_i(x)) = x$ for all $x \in [0, 1)$.

For $i_0, i_1, \ldots, i_{n-1} \in \mathbb{N}$, define

$$\phi_{i_0, i_1, \ldots, i_{n-1}} = \phi_{i_0}\phi_{i_1}\cdots\phi_{i_{n-1}} : [0, 1) \to I(i_0, i_1, \ldots, i_{n-1}).$$

Then $\phi_{i_0, i_1, \ldots, i_{n-1}}(x)$ takes the continued fraction expansion of $x$, shifts every digit $n$ places to the right, and inserts the digits $i_0, i_1, \ldots, i_{n-1}$ in the first $n$ places. Clearly

$$T^n(\phi_{i_0, i_1, \ldots, i_{n-1}}(x)) = x$$

for all $x \in [0, 1)$.

We first need an estimate on the length of (i.e. the Lebesgue measure of) the cylinder $I(i_0, i_1, \ldots, i_{n-1})$. Note that

φi0,i1,...,in−1(t) =Pn(i0, . . . , in−1; t)

Qn(i0, . . . , in−1; t)=

Pn + tPn−1

Qn + tQn−1.

Differentiating this expression with respect to t and using Lemma 6.3.1(ii), we see that

|φ′i0,i1,...,in−1

(t)| =

∣

∣

∣

∣

QnPn−1 − PnQn−1

(Qn + tQn−1)2

∣

∣

∣

∣

=1

(Qn + tQn−1)2.

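It follows from the recurrence relations in Lemma 6.3.1(i) that $Q_{n-1} \leq Q_n$ (as every digit is at least 1), so that $Q_n \leq Q_n + tQ_{n-1} \leq Q_n + Q_{n-1} \leq 2Q_n$ for $t \in [0, 1]$. Hence

$$\frac{1}{4}\frac{1}{Q_n^2} \leq \frac{1}{(Q_n + Q_{n-1})^2} \leq |\phi'_{i_0, i_1, \ldots, i_{n-1}}(t)| \leq \frac{1}{Q_n^2}. \qquad (6.3.1)$$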

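Hence

$$\lambda(I(i_0, i_1, \ldots, i_{n-1})) = \int \chi_{I(i_0, i_1, \ldots, i_{n-1})}(t)\,dt = \int_{I(i_0, i_1, \ldots, i_{n-1})} dt = \int_0^1 |\phi'_{i_0, i_1, \ldots, i_{n-1}}(t)|\,dt \qquad (6.3.2)$$

where we have used the change of variables formula. Combining (6.3.2) with (6.3.1) we see that

$$\frac{1}{4}\frac{1}{Q_n^2} \leq \lambda(I(i_0, i_1, \ldots, i_{n-1})) \leq \frac{1}{Q_n^2}. \qquad (6.3.3)$$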
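The bounds (6.3.3) can also be checked exactly. Since $\phi_{i_0, \ldots, i_{n-1}}$ is a composition of the monotone maps $x \mapsto 1/(i + x)$, the length of the cylinder is $|\phi(1) - \phi(0)| = \left|\frac{P_n + P_{n-1}}{Q_n + Q_{n-1}} - \frac{P_n}{Q_n}\right|$. The sketch below (hypothetical helper names, reusing the recurrences of Lemma 6.3.1) verifies $\frac{1}{4Q_n^2} \leq \lambda(I(i_0, \ldots, i_{n-1})) \leq \frac{1}{Q_n^2}$ for every cylinder of rank 3 with digits in $\{1, \ldots, 6\}$.

```python
from fractions import Fraction
from itertools import product


def convergents(digits):
    """P[0..n], Q[0..n] from the recurrence relations of Lemma 6.3.1(i)."""
    P, Q = [0, 1], [1, digits[0]]
    for m in range(1, len(digits)):
        P.append(digits[m] * P[m] + P[m - 1])
        Q.append(digits[m] * Q[m] + Q[m - 1])
    return P, Q


def cylinder_length(digits):
    """Lebesgue measure of I(i_0, ..., i_{n-1}), together with Q_n."""
    P, Q = convergents(digits)
    n = len(digits)
    at_0 = Fraction(P[n], Q[n])                              # phi(0)
    at_1 = Fraction(P[n] + P[n - 1], Q[n] + Q[n - 1])        # phi(1)
    return abs(at_1 - at_0), Q[n]


checked = 0
for word in product(range(1, 7), repeat=3):
    length, q = cylinder_length(list(word))
    assert Fraction(1, 4 * q * q) <= length <= Fraction(1, q * q)   # (6.3.3)
    checked += 1
print(checked, "cylinders satisfy (6.3.3)")   # 216 cylinders satisfy (6.3.3)
```

In fact Lemma 6.3.1(ii) gives the exact value $\lambda(I(i_0, \ldots, i_{n-1})) = 1/(Q_n(Q_n + Q_{n-1}))$, which lies between $1/(2Q_n^2)$ and $1/Q_n^2$.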

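We can now prove that the Gauss map is ergodic with respect to Gauss' measure $\mu$. Suppose that $T^{-1}B = B$ where $B \in \mathcal{B}$. Let $I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder. Then

$$\begin{aligned}
\lambda(B \cap I(i_0, \ldots, i_{n-1})) &= \int_{I(i_0, \ldots, i_{n-1})} \chi_B(x)\,dx \\
&= \int_0^1 \chi_B(\phi_{i_0, \ldots, i_{n-1}}(x))\,|\phi'_{i_0, \ldots, i_{n-1}}(x)|\,dx && \text{by the change of variables formula} \\
&= \int_0^1 \chi_{T^{-n}B}(\phi_{i_0, \ldots, i_{n-1}}(x))\,|\phi'_{i_0, \ldots, i_{n-1}}(x)|\,dx && \text{as } T^{-n}B = B \\
&= \int_0^1 \chi_B(T^n(\phi_{i_0, \ldots, i_{n-1}}(x)))\,|\phi'_{i_0, \ldots, i_{n-1}}(x)|\,dx && \text{as } \chi_{T^{-n}B} = \chi_B \circ T^n \\
&= \int_0^1 \chi_B(x)\,|\phi'_{i_0, \ldots, i_{n-1}}(x)|\,dx && \text{as } T^n\phi_{i_0, \ldots, i_{n-1}}(x) = x.
\end{aligned}$$

By (6.3.1) and (6.3.3) it follows that

$$\lambda(B \cap I(i_0, \ldots, i_{n-1})) \geq \frac{1}{4Q_n^2}\lambda(B) \geq \frac{1}{4}\lambda(B)\lambda(I(i_0, \ldots, i_{n-1}))$$

so that

$$\lambda(B)\lambda(I(i_0, \ldots, i_{n-1})) \leq 4\lambda(B \cap I(i_0, \ldots, i_{n-1})).$$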

By Lemma 6.1.1 (applied to the probability measure $\lambda$ on $[0, 1]$, with $K = 4$) it follows that $\lambda(B) = 0$ or $\lambda(B^c) = 0$. Hence, as Lebesgue measure and Gauss' measure have the same sets of measure zero, it follows that either $\mu(B) = 0$ or $\mu(B^c) = 0$. Hence $T$ is ergodic with respect to Gauss' measure.

§6.4 Bernoulli shifts

Let $S = \{1, \ldots, k\}$ be a finite set of symbols and let $\Sigma = \{x = (x_j)_{j=0}^\infty \mid x_j \in \{1, 2, \ldots, k\}\}$ denote the shift space on $k$ symbols. Let $\sigma : \Sigma \to \Sigma$ denote the left shift map, so that $(\sigma(x))_j = x_{j+1}$.

Recall that we defined the cylinder $[i_0, \ldots, i_{n-1}]$ to be the set of all sequences in $\Sigma$ that start with symbols $i_0, \ldots, i_{n-1}$, that is

$$[i_0, \ldots, i_{n-1}] = \{x = (x_j)_{j=0}^\infty \in \Sigma \mid x_j = i_j,\ j = 0, 1, \ldots, n-1\}.$$

Let $p = (p(1), \ldots, p(k))$ be a probability vector (that is, $p(j) > 0$ and $\sum_{j=1}^k p(j) = 1$). We defined the Bernoulli measure $\mu_p$ on cylinders by setting

$$\mu_p[i_0, \ldots, i_{n-1}] = p(i_0)p(i_1)\cdots p(i_{n-1}).$$

We have already seen that µp is a σ-invariant measure.

Proposition 6.4.1
Let $\mu_p$ be a Bernoulli measure. Then $\mu_p$ is ergodic.

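Proof. We first make the following observation: let $I = [i_0, \ldots, i_{p-1}]$, $J = [j_0, \ldots, j_{q-1}]$ be cylinders of ranks $p$, $q$, respectively. Consider $I \cap \sigma^{-n}J$ where $n \geq p$. Then

$$\begin{aligned}
I \cap \sigma^{-n}J &= \{x = (x_k)_{k=0}^\infty \in \Sigma \mid x_k = i_k \text{ for } k = 0, 1, \ldots, p-1,\ x_{k+n} = j_k \text{ for } k = 0, 1, \ldots, q-1\} \\
&= \bigcup_{x_p, \ldots, x_{n-1}} [i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}],
\end{aligned}$$

a disjoint union. Hence

$$\begin{aligned}
\mu_p(I \cap \sigma^{-n}J) &= \sum_{x_p, \ldots, x_{n-1}} \mu_p[i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}] \\
&= \sum_{x_p, \ldots, x_{n-1}} p(i_0)p(i_1)\cdots p(i_{p-1})\,p(x_p)\cdots p(x_{n-1})\,p(j_0)p(j_1)\cdots p(j_{q-1}) \\
&= p(i_0)p(i_1)\cdots p(i_{p-1})\,p(j_0)p(j_1)\cdots p(j_{q-1}) && \text{as } \sum_{x_p=1}^k p(x_p) = \cdots = \sum_{x_{n-1}=1}^k p(x_{n-1}) = 1 \\
&= \mu_p(I)\mu_p(J). \qquad (6.4.1)
\end{aligned}$$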

Let $B \in \mathcal{B}$ be $\sigma$-invariant. By Lemma 6.1.1 it is sufficient to prove that $\mu_p(B)\mu_p(I) \leq \mu_p(B \cap I)$ for each cylinder $I$. Let $\varepsilon > 0$. We first approximate the invariant set $B$ by a finite union of cylinders. By Proposition 2.4.4, we can find a finite disjoint union of cylinders $A = \bigcup_{j=1}^r J_j$ such that $\mu_p(B \triangle A) < \varepsilon$. Note that $|\mu_p(A) - \mu_p(B)| < \varepsilon$.

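Let $n$ be any integer greater than the rank of $I$. Note that $\sigma^{-n}B \triangle \sigma^{-n}A = \sigma^{-n}(B \triangle A)$. Hence

$$\mu_p(\sigma^{-n}B \triangle \sigma^{-n}A) = \mu_p(\sigma^{-n}(B \triangle A)) = \mu_p(B \triangle A) < \varepsilon,$$

where we have used the fact that $\mu_p$ is an invariant measure. As $A = \bigcup_{j=1}^r J_j$ is a finite union of cylinders and $n$ is greater than the rank of $I$, it follows from (6.4.1) that

$$\begin{aligned}
\mu_p(\sigma^{-n}A \cap I) &= \mu_p\Big(\sigma^{-n}\Big(\bigcup_{j=1}^r J_j\Big) \cap I\Big) = \sum_{j=1}^r \mu_p(\sigma^{-n}J_j \cap I) \\
&= \sum_{j=1}^r \mu_p(J_j)\mu_p(I) = \mu_p\Big(\bigcup_{j=1}^r J_j\Big)\mu_p(I) = \mu_p(A)\mu_p(I).
\end{aligned}$$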

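Finally, note that $(\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I) \subset \sigma^{-n}A \triangle \sigma^{-n}B$. Hence $\mu_p((\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I)) < \varepsilon$, so that $\mu_p(\sigma^{-n}A \cap I) < \mu_p(\sigma^{-n}B \cap I) + \varepsilon$. Hence, using $\sigma^{-n}B = B$,

$$\begin{aligned}
\mu_p(B)\mu_p(I) = \mu_p(\sigma^{-n}B)\mu_p(I) &\leq \mu_p(\sigma^{-n}A)\mu_p(I) + \varepsilon = \mu_p(\sigma^{-n}A \cap I) + \varepsilon \\
&\leq \mu_p(\sigma^{-n}B \cap I) + 2\varepsilon = \mu_p(B \cap I) + 2\varepsilon.
\end{aligned}$$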

As $\varepsilon > 0$ is arbitrary, we have that $\mu_p(B)\mu_p(I) \leq \mu_p(B \cap I)$ for any cylinder $I$. By Lemma 6.1.1, it follows that $\mu_p(B) = 0$ or $1$. Hence $\mu_p$ is ergodic. ✷
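The key identity (6.4.1) can be verified by brute force for a small example. The sketch below (hypothetical helper names) uses the probability vector $p = (1/2, 1/3, 1/6)$ on $k = 3$ symbols, chosen purely for illustration, and writes `len(I)` for the rank of $I$ to avoid a clash with the vector $p$. It sums $\mu_p$ over all cylinders $[i_0, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}]$ exactly as in the proof and compares the result with $\mu_p(I)\mu_p(J)$ for several $n$ at least the rank of $I$.

```python
from fractions import Fraction
from itertools import product


def bernoulli(word, p):
    """mu_p[i_0, ..., i_{n-1}] = p(i_0) ... p(i_{n-1}); symbols are 1..k and p[s - 1] = p(s)."""
    measure = Fraction(1)
    for s in word:
        measure *= p[s - 1]
    return measure


def measure_I_cap_shift_J(I, J, n, p):
    """mu_p(I cap sigma^{-n} J) for n >= len(I), as a sum over the free symbols x_p, ..., x_{n-1}."""
    k = len(p)
    total = Fraction(0)
    for middle in product(range(1, k + 1), repeat=n - len(I)):
        total += bernoulli(list(I) + list(middle) + list(J), p)
    return total


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
I, J = (1, 3), (2, 2, 1)
for n in range(len(I), len(I) + 5):
    assert measure_I_cap_shift_J(I, J, n, p) == bernoulli(I, p) * bernoulli(J, p)
print(bernoulli(I, p) * bernoulli(J, p))   # 1/216
```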

§6.5 Markov shifts

Let $P$ be an irreducible stochastic $k \times k$ matrix with entries $P(i, j)$. Let $p = (p(1), \ldots, p(k))$ be the unique left probability eigenvector corresponding to eigenvalue 1, so that $pP = p$. Recall that the Markov measure $\mu_P$ is defined on the Borel σ-algebra by defining it on cylinders in the following way:

$$\mu_P[i_0, i_1, \ldots, i_{n-1}] = p(i_0)P(i_0, i_1)P(i_1, i_2)\cdots P(i_{n-2}, i_{n-1}).$$

We have seen that $\mu_P$ is an invariant measure for the shift map $\sigma$. We can adapt the proof of Proposition 6.4.1 to show that $\mu_P$ is ergodic.

Proposition 6.5.1
Let $P$ be an irreducible stochastic matrix. Then the corresponding Markov measure $\mu_P$ is ergodic.

Proof (not examinable). Let $d$ denote the period of $P$. Let $I = [i_0, \ldots, i_{p-1}]$, $J = [j_0, \ldots, j_{q-1}]$ be cylinders of ranks $p$, $q$, respectively. Consider $I \cap \sigma^{-n}J$ where $n \geq p$. Then

$$\begin{aligned}
I \cap \sigma^{-n}J &= \{x \in \Sigma \mid x_\ell = i_\ell \text{ for } \ell = 0, 1, \ldots, p-1,\ x_{\ell+n} = j_\ell \text{ for } \ell = 0, 1, \ldots, q-1\} \\
&= \bigcup_{x_p, \ldots, x_{n-1}} [i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}],
\end{aligned}$$

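a disjoint union. Hence

$$\begin{aligned}
\mu_P(I \cap \sigma^{-n}J) &= \sum_{x_p, \ldots, x_{n-1}} \mu_P[i_0, i_1, \ldots, i_{p-1}, x_p, \ldots, x_{n-1}, j_0, \ldots, j_{q-1}] \\
&= \sum_{x_p, \ldots, x_{n-1}} p(i_0)P(i_0, i_1)\cdots P(i_{p-2}, i_{p-1})\,P(i_{p-1}, x_p)P(x_p, x_{p+1})\cdots P(x_{n-1}, j_0) \\
&\qquad\qquad \times P(j_0, j_1)\cdots P(j_{q-2}, j_{q-1}) \\
&= \mu_P(I)\mu_P(J)\frac{1}{p(j_0)}\sum_{x_p, \ldots, x_{n-1}} P(i_{p-1}, x_p)P(x_p, x_{p+1})\cdots P(x_{n-1}, j_0) \\
&= \mu_P(I)\mu_P(J)\frac{P^{n-p+1}(i_{p-1}, j_0)}{p(j_0)}.
\end{aligned}$$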

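If $P$ is aperiodic (that is, $d = 1$) then, by the Perron–Frobenius Theorem (Theorem 3.3.6), $P^n(i, j) \to p(j)$ as $n \to \infty$, and hence

$$\mu_P(I \cap \sigma^{-n}J) \to \mu_P(I)\mu_P(J) \text{ as } n \to \infty. \qquad (6.5.1)$$

(If $d > 1$ then $P^n(i, j)$ need not converge; however, the averages $\frac{1}{N}\sum_{n=1}^N P^n(i, j)$ still converge to $p(j)$, so (6.5.1) holds for the averages $\frac{1}{N}\sum_{n=1}^N \mu_P(I \cap \sigma^{-n}J)$, and the argument below goes through after averaging over $n$. For simplicity, we give the details when $d = 1$.)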

The remainder of the proof is almost identical to the proof of Proposition 6.4.1. Let $B \in \mathcal{B}$ be $\sigma$-invariant. By Lemma 6.1.1 it is sufficient to prove that $\mu_P(B)\mu_P(I) \leq \mu_P(B \cap I)$ for every cylinder $I$. Let $\varepsilon > 0$. We approximate $B$ by a finite union of cylinders by using Proposition 2.4.4. That is, we can find a finite disjoint union of cylinders $A = \bigcup_{j=1}^r J_j$ such that $\mu_P(B \triangle A) < \varepsilon$. Note that $|\mu_P(A) - \mu_P(B)| < \varepsilon$. Let $n$ be any integer greater than the rank of $I$. Note that $\sigma^{-n}B \triangle \sigma^{-n}A = \sigma^{-n}(B \triangle A)$. Hence

$$\mu_P(\sigma^{-n}B \triangle \sigma^{-n}A) = \mu_P(\sigma^{-n}(B \triangle A)) = \mu_P(B \triangle A) < \varepsilon,$$

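where we have used the fact that $\mu_P$ is an invariant measure. As $A = \bigcup_{j=1}^r J_j$ is a finite union of cylinders, it follows from (6.5.1) that, by choosing $n$ sufficiently large, we have $\mu_P(J_j)\mu_P(I) \leq \mu_P(\sigma^{-n}J_j \cap I) + \varepsilon/r$ for $j = 1, 2, \ldots, r$. Hence, using the invariance of $\mu_P$,

$$\begin{aligned}
\mu_P(\sigma^{-n}A)\mu_P(I) &= \mu_P\Big(\sigma^{-n}\bigcup_{j=1}^r J_j\Big)\mu_P(I) = \sum_{j=1}^r \mu_P(\sigma^{-n}J_j)\mu_P(I) = \sum_{j=1}^r \mu_P(J_j)\mu_P(I) \\
&\leq \sum_{j=1}^r \mu_P(\sigma^{-n}J_j \cap I) + \varepsilon = \mu_P\Big(\sigma^{-n}\bigcup_{j=1}^r J_j \cap I\Big) + \varepsilon \\
&= \mu_P(\sigma^{-n}A \cap I) + \varepsilon.
\end{aligned}$$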

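Finally, note that $(\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I) \subset \sigma^{-n}A \triangle \sigma^{-n}B$. Hence $\mu_P((\sigma^{-n}A \cap I) \triangle (\sigma^{-n}B \cap I)) < \varepsilon$, so that $\mu_P(\sigma^{-n}A \cap I) < \mu_P(\sigma^{-n}B \cap I) + \varepsilon$. Hence

$$\begin{aligned}
\mu_P(B)\mu_P(I) = \mu_P(\sigma^{-n}B)\mu_P(I) &\leq \mu_P(\sigma^{-n}A)\mu_P(I) + \varepsilon \leq \mu_P(\sigma^{-n}A \cap I) + 2\varepsilon \\
&\leq \mu_P(\sigma^{-n}B \cap I) + 3\varepsilon = \mu_P(B \cap I) + 3\varepsilon.
\end{aligned}$$

As $\varepsilon > 0$ was arbitrary, we have that $\mu_P(B)\mu_P(I) \leq \mu_P(B \cap I)$. By Lemma 6.1.1, the result follows. ✷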
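The closed form for $\mu_P(I \cap \sigma^{-n}J)$ derived at the start of the proof can likewise be checked by brute force. The sketch below uses an example $3 \times 3$ matrix chosen for illustration, which is irreducible and aperiodic with stationary vector $p = (1/4, 1/2, 1/4)$; for convenience the symbols are labelled $0, 1, 2$ rather than $1, 2, 3$, and the helper names are hypothetical. It compares the sum over the free symbols with $\mu_P(I)\mu_P(J)P^{n-p+1}(i_{p-1}, j_0)/p(j_0)$, where the exponent $n - p + 1$ counts the transitions from position $p - 1$ to position $n$, and it checks that $P^m(i, j)$ approaches $p(j)$, which is what drives (6.5.1).

```python
from fractions import Fraction as F
from itertools import product

P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
p = [F(1, 4), F(1, 2), F(1, 4)]        # left eigenvector: pP = p
K = len(P)


def markov(word):
    """mu_P[i_0, ..., i_{n-1}] = p(i_0) P(i_0, i_1) ... P(i_{n-2}, i_{n-1})."""
    measure = p[word[0]]
    for a, b in zip(word, word[1:]):
        measure *= P[a][b]
    return measure


def power_entry(m, i, j):
    """(P^m)(i, j), computed by repeated multiplication."""
    R = [[F(int(r == c)) for c in range(K)] for r in range(K)]
    for _ in range(m):
        R = [[sum(R[r][s] * P[s][c] for s in range(K)) for c in range(K)] for r in range(K)]
    return R[i][j]


def brute_force(I, J, n):
    """mu_P(I cap sigma^{-n} J) as a sum over the free symbols x_p, ..., x_{n-1}."""
    return sum(markov(list(I) + list(mid) + list(J)) for mid in product(range(K), repeat=n - len(I)))


def formula(I, J, n):
    return markov(I) * markov(J) * power_entry(n - len(I) + 1, I[-1], J[0]) / p[J[0]]


assert all(sum(p[i] * P[i][j] for i in range(K)) == p[j] for j in range(K))
I, J = (0, 1), (2, 1)
for n in range(len(I), len(I) + 5):
    assert brute_force(I, J, n) == formula(I, J, n)
print(float(power_entry(40, 0, 2)), float(p[2]))   # both approximately 0.25
```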



§6.6 Exercises

Exercise 6.1
The dynamical system $T : [0, 1] \to [0, 1]$ defined by

$$T(x) = \begin{cases} 2x & \text{if } 0 \leq x \leq 1/2, \\ 2(1 - x) & \text{if } 1/2 \leq x \leq 1 \end{cases}$$

is called the tent map.

(i) Prove that T preserves Lebesgue measure.

(ii) Prove that T is ergodic with respect to Lebesgue measure.

Exercise 6.2
Recall that the Lüroth map $T : [0, 1] \to [0, 1]$ is defined to be

$$T(x) = \begin{cases} n(n + 1)x - n & \text{if } x \in \left(\frac{1}{n+1}, \frac{1}{n}\right], \\ 0 & \text{if } x = 0. \end{cases}$$

We saw in Exercise 3.9 that Lebesgue measure is a $T$-invariant probability measure. Prove that Lebesgue measure is ergodic.

Exercise 6.3
Prove (using induction on $n$) Lemma 6.3.1.


Exercise 6.4
Let $\Sigma = \{x = (x_j)_{j=0}^\infty \mid x_j \in \{0, 1\}\}$ and let $\sigma : \Sigma \to \Sigma$, $(\sigma(x))_j = x_{j+1}$, be the shift map on the space of infinite sequences of two symbols $\{0, 1\}$. Note that $\Sigma$ supports uncountably many different $\sigma$-invariant measures (for example, the Bernoulli-$(p, 1 - p)$ measures are all ergodic and all distinct for $p \in (0, 1)$). We will use this observation to prove that the doubling map has uncountably many ergodic measures.

Define $\pi : \Sigma \to \mathbb{R}/\mathbb{Z}$ by

$$\pi(x) = \pi(x_0, x_1, \ldots) = \frac{x_0}{2} + \frac{x_1}{2^2} + \cdots + \frac{x_n}{2^{n+1}} + \cdots.$$

(i) Show that π is continuous.

(ii) Let T : R/Z → R/Z be the doubling map: T (x) = 2x mod 1. Show that π◦σ = T ◦π.

(iii) If $\mu$ is a $\sigma$-invariant probability measure on $\Sigma$, show that $\pi_*\mu$ (where $\pi_*\mu(B) = \mu(\pi^{-1}B)$ for a Borel subset $B \subset \mathbb{R}/\mathbb{Z}$) is a $T$-invariant probability measure on $\mathbb{R}/\mathbb{Z}$.

(Lebesgue measure on R/Z corresponds to choosing µ to be the Bernoulli-(1/2, 1/2)-measure on Σ.)

(iv) Show that if µ is an ergodic measure for σ, then π∗µ is an ergodic measure for T .

(v) Conclude that there are uncountably many different ergodic measures for the doubling map.



7. Continuous transformations on compact metric spaces

§7.1 Introduction

So far, we have been studying a measurable map $T$ defined on a probability space $(X, \mathcal{B}, \mu)$. We have asked whether the given measure $\mu$ is invariant or ergodic. In this section, we shift our focus slightly and consider, for a given transformation $T : X \to X$, the space $M(X, T)$ of all probability measures that are invariant under $T$. In order to equip $M(X, T)$ with some structure we will need to assume that the underlying space $X$ is itself equipped with some additional structure other than merely being a measure space. Throughout this section we will work in the context of $X$ being a compact metric space and $T$ being a continuous transformation.

§7.2 Probability measures on compact metric spaces

Let $X$ be a compact metric space equipped with the Borel σ-algebra $\mathcal{B}$. (Recall that the Borel σ-algebra is the smallest σ-algebra that contains all the open subsets of $X$.)

Let $C(X, \mathbb{R}) = \{f : X \to \mathbb{R} \mid f \text{ is continuous}\}$ denote the space of real-valued continuous functions defined on $X$. Define the uniform norm of $f \in C(X, \mathbb{R})$ by

$$\|f\|_\infty = \sup_{x \in X} |f(x)|.$$

With this norm, $C(X, \mathbb{R})$ is a Banach space. An important property of $C(X, \mathbb{R})$ that will prove to be useful later on is that, as $X$ is a compact metric space, $C(X, \mathbb{R})$ is separable: it contains a countable dense subset. Thus we can choose a sequence $\{f_n \in C(X, \mathbb{R})\}_{n=1}^\infty$ such that, for all $f \in C(X, \mathbb{R})$ and all $\varepsilon > 0$, there exists $n$ such that $\|f - f_n\|_\infty < \varepsilon$.

Let $M(X)$ denote the set of all Borel probability measures on $(X, \mathcal{B})$.

It will be very important to have a sensible notion of convergence in $M(X)$; the appropriate notion for us is called weak$^*$ convergence. We say that a sequence of probability measures $\mu_n$ weak$^*$ converges to $\mu$ as $n \to \infty$ if, for every $f \in C(X, \mathbb{R})$,

$$\lim_{n \to \infty} \int f\,d\mu_n = \int f\,d\mu.$$

If $\mu_n$ weak$^*$ converges to $\mu$ then we write $\mu_n \rightharpoonup \mu$ as $n \to \infty$. We can make $M(X)$ into a metric space compatible with this definition of convergence by choosing a countable dense subset $\{f_n\}_{n=1}^\infty \subset C(X, \mathbb{R})$ and, for $\mu_1, \mu_2 \in M(X)$, setting

$$d_{M(X)}(\mu_1, \mu_2) = \sum_{n=1}^\infty \frac{1}{2^n\|f_n\|_\infty}\left|\int f_n\,d\mu_1 - \int f_n\,d\mu_2\right|$$

(we can assume that $f_n \not\equiv 0$ for any $n$). It is easy to check that $\mu_n \rightharpoonup \mu$ as $n \to \infty$ if and only if $d_{M(X)}(\mu_n, \mu) \to 0$ as $n \to \infty$.

However, we will not need to work with a particular metric: what will be important is the definition of convergence.



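Remark. Note that with this definition it is not necessarily true that $\lim_{n \to \infty} \mu_n(B) = \mu(B)$ for $B \in \mathcal{B}$. For example, on $X = [0, 1]$ we have $\delta_{1/n} \rightharpoonup \delta_0$ (as $\int f\,d\delta_{1/n} = f(1/n) \to f(0)$ for every continuous $f$), but $\delta_{1/n}(\{0\}) = 0$ for all $n$ whereas $\delta_0(\{0\}) = 1$.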
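Both weak$^*$ convergence and the failure of convergence on Borel sets can be illustrated with finitely supported measures. In the sketch below (hypothetical helper names), a measure is a list of (point, mass) pairs. The measures $\mu_n = \frac{1}{n}\sum_{j=0}^{n-1}\delta_{j/n}$ are an example chosen here: for the continuous function $f(x) = x^2$ their integrals approach $\int_0^1 x^2\,dx = 1/3$, as weak$^*$ convergence to Lebesgue measure requires. The last two lines show that $\int \cos\,d\delta_{1/n} \to \cos 0 = 1$, consistent with $\delta_{1/n} \rightharpoonup \delta_0$, while $\delta_{1/n}(\{0\}) = 0$ for every $n$ even though $\delta_0(\{0\}) = 1$.

```python
import math


def integrate(measure, f):
    """Integral of f with respect to a finitely supported measure [(point, mass), ...]."""
    return sum(mass * f(x) for x, mass in measure)


def riemann_measure(n):
    """mu_n = (1/n) * sum_{j=0}^{n-1} delta_{j/n} on [0, 1]."""
    return [(j / n, 1 / n) for j in range(n)]


def square(x):
    return x * x


def indicator_of_zero(x):
    return 1.0 if x == 0 else 0.0


for n in (10, 100, 1000):
    print(n, integrate(riemann_measure(n), square))   # tends to 1/3

# delta_{1/n} -> delta_0 weak*: integrals of continuous functions converge ...
print([integrate([(1 / n, 1.0)], math.cos) for n in (10, 100, 1000)])   # tends to cos(0) = 1
# ... but the mass of the Borel set {0} does not.
print([integrate([(1 / n, 1.0)], indicator_of_zero) for n in (10, 100, 1000)])   # [0.0, 0.0, 0.0]
```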

§7.2.1 The Riesz Representation Theorem

Let $\mu \in M(X)$ be a Borel probability measure. Then we can think of $\mu$ as a functional that acts on $C(X, \mathbb{R})$, that is, we can regard $\mu$ as a map

$$\mu : C(X, \mathbb{R}) \to \mathbb{R} : f \mapsto \int f\,d\mu.$$

We will often write $\mu(f)$ for $\int f\,d\mu$. Notice that this functional enjoys several natural properties:

(i) the functional defined by µ is linear:

µ(λ1f1 + λ2f2) = λ1µ(f1) + λ2µ(f2)

where λ1, λ2 ∈ R and f1, f2 ∈ C(X, R).

(ii) the functional defined by µ is bounded: i.e. if f ∈ C(X, R) then |µ(f)| ≤ ‖f‖∞.

(iii) if f ≥ 0 then µ(f) ≥ 0 (we say that the functional µ is positive);

(iv) consider the function $1$ defined by $1(x) \equiv 1$ for all $x$; then $\mu(1) = 1$ (we say that the functional $\mu$ is normalised).

The Riesz Representation Theorem says that the above properties characterise all Borel probability measures on $X$. That is, if we have a map $w : C(X, \mathbb{R}) \to \mathbb{R}$ that satisfies the above four properties, then $w$ must be given by integrating with respect to a Borel probability measure. This will be a very useful method of constructing measures: we need only construct bounded positive normalised linear functionals.

Theorem 7.2.1 (Riesz Representation Theorem)
Let $w : C(X, \mathbb{R}) \to \mathbb{R}$ be a functional such that:

(i) w is linear: i.e. w(λ1f1 + λ2f2) = λ1w(f1) + λ2w(f2);

(ii) w is bounded: i.e. for all f ∈ C(X, R) we have |w(f)| ≤ ‖f‖∞;

(iii) w is positive: i.e. if f ≥ 0 then w(f) ≥ 0;

(iv) w is normalised: i.e. w(1) = 1.

Then there exists a Borel probability measure $\mu \in M(X)$ such that

$$w(f) = \int f\,d\mu \quad \text{for all } f \in C(X, \mathbb{R}).$$

Moreover, $\mu$ is unique.

Thus the Riesz Representation Theorem says that “if it looks like integration on continuousfunctions, then it is integration with respect to a (unique) Borel probability measure.”



§7.2.2 Properties of M(X)

First note that the space $M(X)$ of Borel probability measures on the compact metric space $X$ is non-empty (provided $X \neq \emptyset$). This is because, for each $x \in X$, the Dirac measure $\delta_x$ is a Borel probability measure. Indeed, we have the following result:

Proposition 7.2.2
There is a continuous embedding of $X$ in $M(X)$ given by the map $X \to M(X) : x \mapsto \delta_x$, i.e. if $x_n \to x$ then $\delta_{x_n} \rightharpoonup \delta_x$.

Proof. See Exercise 7.1. ✷

Recall that a subset $C$ of a vector space is convex if whenever $v_1, v_2 \in C$ and $\alpha \in [0, 1]$ then $\alpha v_1 + (1 - \alpha)v_2 \in C$.

Proposition 7.2.3
The space $M(X)$ is convex.

Proof. Let $\mu_1, \mu_2 \in M(X)$, $\alpha \in [0, 1]$. Then it is easy to check that $\alpha\mu_1 + (1 - \alpha)\mu_2$, defined by

$$(\alpha\mu_1 + (1 - \alpha)\mu_2)(B) = \alpha\mu_1(B) + (1 - \alpha)\mu_2(B),$$

is a Borel probability measure. ✷

Finally, recall that a metric space $K$ is said to be (sequentially) compact if every sequence of points in $K$ has a subsequence that converges to a point of $K$.

Proposition 7.2.4
The space $M(X)$ is weak$^*$ compact.

Proof. For convenience, we shall write $\mu(f) = \int f\,d\mu$. Since $C(X, \mathbb{R})$ is separable, we can choose a countable dense subset of functions $\{f_i\}_{i=1}^\infty \subset C(X, \mathbb{R})$. Given a sequence $\mu_n \in M(X)$, we shall first consider the sequence of real numbers $\mu_n(f_1) \in \mathbb{R}$. We have that $|\mu_n(f_1)| \leq \|f_1\|_\infty$ for all $n$, so $\mu_n(f_1)$ is a bounded sequence of real numbers. As such, it has a convergent subsequence, $\mu^{(1)}_n(f_1)$ say.

Next we apply the sequence of measures $\mu^{(1)}_n$ to $f_2$ and consider the sequence $\mu^{(1)}_n(f_2) \in \mathbb{R}$. Again, this is a bounded sequence of real numbers and so it has a convergent subsequence $\mu^{(2)}_n(f_2)$.

In this way we obtain, for each $i \geq 1$, nested subsequences $\{\mu^{(i)}_n\} \subset \{\mu^{(i-1)}_n\}$ such that $\mu^{(i)}_n(f_j)$ converges for $1 \leq j \leq i$. Now consider the diagonal sequence $\mu^{(n)}_n$. Since, for $n \geq i$, the terms $\mu^{(n)}_n$ form a subsequence of $\mu^{(i)}_n$, the sequence $\mu^{(n)}_n(f_i)$ converges for every $i \geq 1$.

We can now use the fact that $\{f_i\}$ is dense to show that $\mu^{(n)}_n(f)$ converges for all $f \in C(X, \mathbb{R})$, as follows. For any $\varepsilon > 0$, we can choose $f_i$ such that $\|f - f_i\|_\infty \leq \varepsilon$. Since $\mu^{(n)}_n(f_i)$ converges, there exists $N$ such that if $n, m \geq N$ then

$$|\mu^{(n)}_n(f_i) - \mu^{(m)}_m(f_i)| \leq \varepsilon.$$

Thus if $n, m \geq N$ we have

$$|\mu^{(n)}_n(f) - \mu^{(m)}_m(f)| \leq |\mu^{(n)}_n(f) - \mu^{(n)}_n(f_i)| + |\mu^{(n)}_n(f_i) - \mu^{(m)}_m(f_i)| + |\mu^{(m)}_m(f_i) - \mu^{(m)}_m(f)| \leq 3\varepsilon,$$

so $\mu^{(n)}_n(f)$ is a Cauchy sequence and hence converges, as required.

To complete the proof, write $w(f) = \lim_{n \to \infty} \mu^{(n)}_n(f)$. We claim that $w$ satisfies the hypotheses of the Riesz Representation Theorem and so corresponds to integration with respect to a probability measure.

(i) By construction, $w$ is a linear mapping: $w(\lambda_1 f + \lambda_2 g) = \lambda_1 w(f) + \lambda_2 w(g)$.

(ii) As $|\mu^{(n)}_n(f)| \leq \|f\|_\infty$ for every $n$, we have $|w(f)| \leq \|f\|_\infty$, so $w$ is bounded.

(iii) If $f \geq 0$ then $\mu^{(n)}_n(f) \geq 0$ for every $n$, so $w(f) \geq 0$. Hence $w$ is positive.

(iv) As $\mu^{(n)}_n(1) = 1$ for every $n$, $w$ is normalised: $w(1) = 1$.

Therefore, by the Riesz Representation Theorem, there exists $\mu \in M(X)$ such that $w(f) = \int f\,d\mu$. We then have that $\int f\,d\mu^{(n)}_n \to \int f\,d\mu$, as $n \to \infty$, for all $f \in C(X, \mathbb{R})$, i.e., $\mu^{(n)}_n$ converges weak$^*$ to $\mu$ as $n \to \infty$. ✷

§7.3 Invariant measures for continuous transformations

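Let $X$ be a compact metric space equipped with the Borel σ-algebra and let $T : X \to X$ be a continuous transformation. Then $T$ is measurable: the sets $B$ with $T^{-1}B \in \mathcal{B}$ form a σ-algebra which contains the open sets (as preimages of open sets under a continuous map are open), and hence contains every Borel set.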

Given a measure $\mu$, we have already defined the measure $T_*\mu$ by $T_*\mu(B) = \mu(T^{-1}B)$. If $\mu$ is a Borel probability measure, then it is straightforward to check that $T_*\mu$ is a Borel probability measure. We can think of $T_*$ as a transformation on $M(X)$, namely:

$$T_* : M(X) \to M(X), \quad T_*\mu = \mu \circ T^{-1}.$$

That is, if $B \in \mathcal{B}$ then $T_*\mu(B) = \mu(T^{-1}B)$. The following result tells us how to integrate with respect to $T_*\mu$.

Lemma 7.3.1
For $f \in L^1(X, \mathcal{B}, T_*\mu)$ (equivalently, $f \circ T \in L^1(X, \mathcal{B}, \mu)$) we have

$$\int f\,d(T_*\mu) = \int f \circ T\,d\mu.$$

Proof. From the definition, for $B \in \mathcal{B}$,

$$\int \chi_B\,d(T_*\mu) = (T_*\mu)(B) = \mu(T^{-1}B) = \int \chi_{T^{-1}B}\,d\mu = \int \chi_B \circ T\,d\mu.$$

Thus, by linearity, the result holds for simple functions. If $f \geq 0$ is a non-negative measurable function then we can choose an increasing sequence of simple functions $f_n$ increasing to $f$ pointwise. We have

$$\int f_n\,d(T_*\mu) = \int f_n \circ T\,d\mu$$

and, applying the Monotone Convergence Theorem (Theorem 3.1.2) to each side (note that $f_n \circ T$ increases to $f \circ T$), we obtain

$$\int f\,d(T_*\mu) = \int f \circ T\,d\mu.$$

The result extends to an arbitrary real-valued $f \in L^1(X, \mathcal{B}, T_*\mu)$ by considering positive and negative parts, and then to complex-valued integrable functions by taking real and imaginary parts. ✷
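For a finitely supported measure, Lemma 7.3.1 can be seen directly: $T_*\mu$ moves the mass at each point $x$ to $T(x)$, merging the masses of points with the same image. The sketch below (hypothetical helper names; the doubling map, the three-point measure and the test function are examples chosen for illustration) computes $T_*\mu$ exactly and checks that $\int f\,d(T_*\mu) = \int f \circ T\,d\mu$.

```python
from collections import defaultdict
from fractions import Fraction


def push_forward(T, measure):
    """T_* mu for mu = {point: mass}: the mass at y is mu(T^{-1}{y})."""
    image = defaultdict(Fraction)
    for x, mass in measure.items():
        image[T(x)] += mass
    return dict(image)


def integrate(f, measure):
    return sum(mass * f(x) for x, mass in measure.items())


def doubling(x):
    return (2 * x) % 1


def f(y):
    return y * y + 1


mu = {Fraction(1, 8): Fraction(1, 2), Fraction(5, 8): Fraction(1, 3), Fraction(3, 7): Fraction(1, 6)}
nu = push_forward(doubling, mu)
print(nu)   # 1/8 and 5/8 both map to 1/4, so nu has mass 5/6 at 1/4 and 1/6 at 6/7
assert integrate(f, nu) == integrate(lambda x: f(doubling(x)), mu)
```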



Recall that a measure $\mu$ is said to be $T$-invariant if $\mu(T^{-1}B) = \mu(B)$ for all $B \in \mathcal{B}$. Hence $\mu$ is $T$-invariant if and only if $T_*\mu = \mu$. Write

$$M(X, T) = \{\mu \in M(X) \mid T_*\mu = \mu\}$$

to denote the space of all $T$-invariant Borel probability measures.

The following result gives a useful criterion for checking whether a measure is $T$-invariant.

Lemma 7.3.2
Let $T : X \to X$ be a continuous mapping of a compact metric space. The following are equivalent:

(i) $\mu \in M(X, T)$;

(ii) for all $f \in C(X, \mathbb{R})$ we have that

$$\int f \circ T\,d\mu = \int f\,d\mu. \qquad (7.3.1)$$

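Proof. We prove (i) implies (ii). Suppose that $\mu \in M(X, T)$, so that $T_*\mu = \mu$. Let $f \in C(X, \mathbb{R})$. As $f$ is continuous on the compact space $X$, it is bounded, and so $f \in L^1(X, \mathcal{B}, \mu)$. Hence by Lemma 7.3.1 we have

$$\int f \circ T\,d\mu = \int f\,d(T_*\mu) = \int f\,d\mu.$$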

Conversely, Lemma 7.3.1 allows us to write (7.3.1) as $\mu(f) = (T_*\mu)(f)$ for all $f \in C(X, \mathbb{R})$. Hence $\mu$ and $T_*\mu$ determine the same linear functional on $C(X, \mathbb{R})$. By uniqueness in the Riesz Representation Theorem, we have $T_*\mu = \mu$. ✷

§7.4 Invariant measures for continuous maps on the torus

We can use Lemma 7.3.2 to prove that a given measure is invariant for certain dynamical systems. We first note that we need only check (7.3.1) for a dense set of continuous functions.

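Lemma 7.4.1
Suppose that $S \subset C(X, \mathbb{R})$ is a uniformly dense subset of functions (that is, for all $f \in C(X, \mathbb{R})$ and all $\varepsilon > 0$ there exists $g \in S$ such that $\|f - g\|_\infty < \varepsilon$). Suppose that $\int g \circ T\,d\mu = \int g\,d\mu$ for all $g \in S$. Then $\int f \circ T\,d\mu = \int f\,d\mu$ for all $f \in C(X, \mathbb{R})$.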

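Proof. Let $f \in C(X, \mathbb{R})$ and let $\varepsilon > 0$. Choose $g \in S$ such that $\|f - g\|_\infty < \varepsilon$. Then

$$\begin{aligned}
\left|\int f \circ T\,d\mu - \int f\,d\mu\right|
&\leq \left|\int f \circ T\,d\mu - \int g \circ T\,d\mu\right| + \left|\int g \circ T\,d\mu - \int g\,d\mu\right| + \left|\int g\,d\mu - \int f\,d\mu\right| \\
&\leq \int |f \circ T - g \circ T|\,d\mu + \left|\int g \circ T\,d\mu - \int g\,d\mu\right| + \int |f - g|\,d\mu.
\end{aligned}$$

Noting that, as $\|f - g\|_\infty < \varepsilon$, we have $|f(Tx) - g(Tx)| < \varepsilon$ for all $x$, and that $\int g \circ T\,d\mu = \int g\,d\mu$, we have that

$$\left|\int f \circ T\,d\mu - \int f\,d\mu\right| < 2\varepsilon.$$

As $\varepsilon$ is arbitrary, the result follows. ✷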

Corollary 7.4.2
Let $T$ be a continuous transformation of a compact metric space $X$, equipped with the Borel σ-algebra. Let $\mu$ be a Borel probability measure on $X$.

Suppose that $S \subset C(X, \mathbb{R})$ is a uniformly dense subset of functions such that $\int g \circ T\,d\mu = \int g\,d\mu$ for all $g \in S$. Then $\mu$ is a $T$-invariant measure.

Proof. This follows immediately from Lemma 7.3.2 and Lemma 7.4.1. ✷

We show how to use Corollary 7.4.2 by studying some of our examples.

§7.4.1 Circle rotations

Let $T(x) = x + \alpha \bmod 1$ be a circle rotation. We show how to use Corollary 7.4.2 to prove that Lebesgue measure $\mu$ is $T$-invariant.

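Let $\ell \in \mathbb{Z}$. We first note that if $\ell \neq 0$ then

$$\int_0^1 e^{2\pi i\ell x}\,dx = \left.\frac{1}{2\pi i\ell}e^{2\pi i\ell x}\right|_0^1 = 0.$$

We also note that if $\ell = 0$ then $\int_0^1 e^{2\pi i\ell x}\,dx = 1$. Let $S$ denote the set of trigonometric polynomials, i.e.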

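$$S = \left\{\sum_{j=0}^{r-1} c_je^{2\pi i\ell_jx} \;\middle|\; c_j \in \mathbb{C},\ \ell_j \in \mathbb{Z},\ r \in \mathbb{N}\right\}.$$

Then, by the Stone–Weierstrass Theorem (Theorem 1.2.2), the real-valued functions in $S$ are uniformly dense in $C(X, \mathbb{R})$. Let $g \in S$ be a trigonometric polynomial and write

$$g(x) = \sum_{j=0}^{r-1} c_je^{2\pi i\ell_jx}$$

where the $\ell_j$ are distinct and $\ell_j = 0$ if and only if $j = 0$ (we allow $c_0 = 0$). Hence $\int g\,d\mu = c_0$. Note that

$$g(Tx) = \sum_{j=0}^{r-1} c_je^{2\pi i\ell_j(x+\alpha)} = \sum_{j=0}^{r-1} c_je^{2\pi i\ell_j\alpha}e^{2\pi i\ell_jx}.$$

Hence

$$\int g \circ T\,d\mu = \int \sum_{j=0}^{r-1} c_je^{2\pi i\ell_j\alpha}e^{2\pi i\ell_jx}\,d\mu = \sum_{j=0}^{r-1} c_je^{2\pi i\ell_j\alpha}\int e^{2\pi i\ell_jx}\,d\mu$$

and the only non-zero integral occurs when $\ell_j = 0$, i.e. $j = 0$. We must therefore have that $\int g \circ T\,d\mu = c_0$. Hence $\int g \circ T\,d\mu = \int g\,d\mu$ for all $g \in S$, and in particular for all real-valued $g \in S$. It follows from Corollary 7.4.2 that $\mu$ is $T$-invariant.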
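The computation above can be mirrored numerically. For a trigonometric polynomial whose frequencies satisfy $|\ell_j| < N$, the average of $g$ over the grid $\{m/N : 0 \leq m < N\}$ equals $\int_0^1 g\,dx$ exactly (up to rounding), because $\frac{1}{N}\sum_{m=0}^{N-1} e^{2\pi i\ell m/N} = 0$ for $0 < |\ell| < N$. This quadrature device is used only in the sketch, not in the notes. The example coefficients below (with $c_{-\ell} = \overline{c_\ell}$, so that $g$ is real-valued) and the value $\alpha = \sqrt{2} - 1$ are arbitrary choices, and the helper names are hypothetical; both integrals come out as $c_0 = 0.7$.

```python
import cmath
import math


def trig_poly(coeffs):
    """g(x) = sum over l of c_l e^{2 pi i l x}, from a dict {l: c_l}."""
    def g(x):
        return sum(c * cmath.exp(2j * math.pi * l * x) for l, c in coeffs.items())
    return g


def grid_integral(g, N):
    """(1/N) sum_{m<N} g(m/N); exact for trigonometric polynomials with all |l| < N."""
    return sum(g(m / N) for m in range(N)) / N


coeffs = {0: 0.7, 1: 2.0 - 1.0j, -1: 2.0 + 1.0j, 3: 0.5j, -3: -0.5j}
alpha = math.sqrt(2) - 1
g = trig_poly(coeffs)


def g_after_rotation(x):
    return g((x + alpha) % 1)   # g o T for T(x) = x + alpha mod 1


print(grid_integral(g, 16))                  # approximately (0.7+0j)
print(grid_integral(g_after_rotation, 16))   # approximately (0.7+0j) as well
```

Rotation multiplies each coefficient $c_\ell$ by $e^{2\pi i\ell\alpha}$ but leaves the frequencies unchanged, which is why the same grid integrates $g \circ T$ exactly.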



§7.4.2 Toral endomorphisms

Let $A$ be a $k \times k$ integer matrix with $\det A \neq 0$. Define the linear toral endomorphism $T : \mathbb{R}^k/\mathbb{Z}^k \to \mathbb{R}^k/\mathbb{Z}^k$ by

$$T((x_1, \ldots, x_k) + \mathbb{Z}^k) = A(x_1, \ldots, x_k) + \mathbb{Z}^k.$$

When $T$ is a linear toral automorphism (i.e. when $\det A = \pm 1$) we have already seen that Lebesgue measure is invariant. We use Corollary 7.4.2 to prove that Lebesgue measure $\mu$ is $T$-invariant whenever $\det A \neq 0$.

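For $n = (n_1, \ldots, n_k) \in \mathbb{Z}^k$ and $x = (x_1, \ldots, x_k) \in \mathbb{R}^k$ define, as before, $\langle n, x\rangle = n_1x_1 + \cdots + n_kx_k$. Note that

$$\int e^{2\pi i\langle n, x\rangle}\,d\mu = \int_0^1 \cdots \int_0^1 e^{2\pi in_1x_1}\cdots e^{2\pi in_kx_k}\,dx_1\cdots dx_k.$$

Hence

$$\int e^{2\pi i\langle n, x\rangle}\,d\mu = \begin{cases} 0 & \text{if } n \neq 0, \\ 1 & \text{if } n = 0, \end{cases}$$

where $0 = (0, \ldots, 0) \in \mathbb{Z}^k$. Let

$$S = \left\{\sum_{j=0}^{r-1} c_je^{2\pi i\langle n^{(j)}, x\rangle} \;\middle|\; r \in \mathbb{N},\ c_j \in \mathbb{C},\ n^{(j)} = (n^{(j)}_1, \ldots, n^{(j)}_k) \in \mathbb{Z}^k\right\}.$$

By the Stone–Weierstrass Theorem (Theorem 1.2.2), the real-valued functions in $S$ are uniformly dense in $C(\mathbb{R}^k/\mathbb{Z}^k, \mathbb{R})$.

Let $g \in S$ and write

$$g(x) = \sum_{j=0}^{r-1} c_je^{2\pi i\langle n^{(j)}, x\rangle}$$

where the $n^{(j)}$ are distinct and $n^{(j)} = 0$ if and only if $j = 0$ (we allow $c_0 = 0$). Then

$$\int g\,d\mu = \int \sum_{j=0}^{r-1} c_je^{2\pi i\langle n^{(j)}, x\rangle}\,d\mu = \sum_{j=0}^{r-1} c_j\int e^{2\pi i\langle n^{(j)}, x\rangle}\,d\mu = c_0.$$

Note that, regarding $n^{(j)}$ as a row vector,

$$g(Tx) = \sum_{j=0}^{r-1} c_je^{2\pi i\langle n^{(j)}, Ax\rangle} = \sum_{j=0}^{r-1} c_je^{2\pi i\langle n^{(j)}A, x\rangle}.$$

Hence

$$\int g \circ T\,d\mu = \int \sum_{j=0}^{r-1} c_je^{2\pi i\langle n^{(j)}A, x\rangle}\,d\mu = \sum_{j=0}^{r-1} c_j\int e^{2\pi i\langle n^{(j)}A, x\rangle}\,d\mu.$$

These integrals are zero unless $n^{(j)}A = 0$. As $\det A \neq 0$ this happens only when $n^{(j)} = 0$, i.e. when $j = 0$. Hence

$$\int g \circ T\,d\mu = c_0 = \int g\,d\mu.$$

Hence, by Corollary 7.4.2 (applied to the real-valued functions in $S$), $\mu$ is a $T$-invariant measure.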
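The same numerical check works on the 2-torus. The sketch below takes an example integer matrix $A = \begin{pmatrix} 2 & 1 \\ 0 & 3 \end{pmatrix}$ with $\det A = 6$, so $T$ is an endomorphism but not an automorphism, and an example real-valued trigonometric polynomial $g$. Averaging over the grid $\{(a/N, b/N)\}$ integrates a trigonometric polynomial exactly (up to rounding) once no nonzero frequency is congruent to $0$ modulo $N$; for $g \circ T$ the relevant frequencies are the vectors $n^{(j)}A$, whose entries here have modulus at most 5, so $N = 32$ suffices. The choice of $A$, $g$ and the grid are assumptions of this illustration, and the helper names are hypothetical.

```python
import cmath
import math
from itertools import product

A = [[2, 1], [0, 3]]    # integer matrix with det A = 6: a toral endomorphism
coeffs = {(0, 0): 1.5, (1, -2): 0.3 + 0.4j, (-1, 2): 0.3 - 0.4j, (2, 1): -0.8, (-2, -1): -0.8}


def g(x):
    """g(x) = sum_n c_n exp(2 pi i <n, x>) on R^2/Z^2 (real-valued: c_{-n} = conj(c_n))."""
    return sum(c * cmath.exp(2j * math.pi * (n[0] * x[0] + n[1] * x[1])) for n, c in coeffs.items())


def T(x):
    """T(x) = Ax mod Z^2."""
    return tuple((A[i][0] * x[0] + A[i][1] * x[1]) % 1 for i in range(2))


def torus_grid_integral(h, N):
    """Average of h over the N x N grid {(a/N, b/N)}."""
    return sum(h((a / N, b / N)) for a, b in product(range(N), repeat=2)) / N ** 2


N = 32
print(torus_grid_integral(g, N))                     # approximately c_0 = 1.5
print(torus_grid_integral(lambda x: g(T(x)), N))     # approximately 1.5 as well
```

Here the nonzero frequencies of $g \circ T$ are $(2, -5)$, $(-2, 5)$, $(4, 5)$ and $(-4, -5)$, none of which is $0$, exactly as $\det A \neq 0$ guarantees.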



Remark. You will notice a strong connection between the above arguments and Fourier series, and you may think that we could take $g(x)$ to be the $n$th partial sum of the Fourier series for $f$. However, one needs to take care. Suppose $f \in C(\mathbb{R}^k/\mathbb{Z}^k, \mathbb{R})$ has Fourier series $\sum_n c_n(f)e^{2\pi i\langle n, x\rangle}$. We need to be careful about what it means for this infinite series to converge. We know that the sequence of partial sums $s_n$ converges in $L^2$ to $f$, but we do not know that the partial sums converge uniformly to $f$. That is, we know that $\|f - s_n\|_2 \to 0$, but not necessarily that $\|f - s_n\|_\infty \to 0$. In fact, in general, it is not true that $\|f - s_n\|_\infty \to 0$.

However, if one defines $\sigma_n = \frac{1}{n}\sum_{j=0}^{n-1} s_j$ to be the average of the first $n$ partial sums, then it is true that $\|f - \sigma_n\|_\infty \to 0$. (This is Fejér's Theorem, and is quite a deep result.)

§7.5 Existence of invariant measures

Given a continuous mapping $T : X \to X$ of a compact metric space, it is natural to ask whether invariant measures necessarily exist, i.e., whether $M(X, T) \neq \emptyset$. The next result shows that this is the case.

Theorem 7.5.1
Let $T : X \to X$ be a continuous mapping of a compact metric space. Then there exists at least one $T$-invariant probability measure.

Proof. Let ν ∈ M(X) be a probability measure (for example, we could take ν to be a Dirac measure). Define the sequence $\mu_n \in M(X)$ by
$$\mu_n = \frac{1}{n}\sum_{j=0}^{n-1} T^j_*\nu,$$
so that, for B ∈ B,
$$\mu_n(B) = \frac{1}{n}\left(\nu(B) + \nu(T^{-1}B) + \cdots + \nu(T^{-(n-1)}B)\right).$$

Since M(X) is weak∗ compact, some subsequence $\mu_{n_k}$ converges, as k → ∞, to a measure µ ∈ M(X). We shall show that µ ∈ M(X,T). By Lemma 7.3.2, this is equivalent to showing that
$$\int f\,d\mu = \int f\circ T\,d\mu \quad\text{for all } f \in C(X,\mathbb{R}).$$

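To see this, first note that f ∘ T − f is continuous, so the weak∗ convergence of $\mu_{n_k}$ to µ gives the second equality below. The fifth equality uses $\int g\,d(T^j_*\nu) = \int g\circ T^j\,d\nu$, and the sum in the fifth line telescopes. Then
$$\begin{aligned}
\left|\int f\circ T\,d\mu - \int f\,d\mu\right| &= \left|\int (f\circ T - f)\,d\mu\right|\\
&= \lim_{k\to\infty}\left|\int (f\circ T - f)\,d\mu_{n_k}\right|\\
&= \lim_{k\to\infty}\left|\int (f\circ T - f)\,d\left(\frac{1}{n_k}\sum_{j=0}^{n_k-1}T^j_*\nu\right)\right|\\
&= \lim_{k\to\infty}\left|\frac{1}{n_k}\sum_{j=0}^{n_k-1}\int (f\circ T - f)\,d(T^j_*\nu)\right|\\
&= \lim_{k\to\infty}\left|\frac{1}{n_k}\int \sum_{j=0}^{n_k-1}\left(f\circ T^{j+1} - f\circ T^j\right)d\nu\right|\\
&= \lim_{k\to\infty}\left|\frac{1}{n_k}\int \left(f\circ T^{n_k} - f\right)d\nu\right|\\
&\le \lim_{k\to\infty}\frac{2\|f\|_\infty}{n_k} = 0.
\end{aligned}$$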

Therefore, µ ∈ M(X,T ), as required. ✷
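The mechanism in this proof can be seen exactly on a finite set, which is a compact metric space on which every map is continuous. In the sketch below (hypothetical helper names; a map T is a Python list with T[x] the image of x, and measures are lists of `Fraction` weights), µn is the average of the push-forwards $T^j_*\nu$. Its failure to be invariant, measured in total variation, equals $\frac{1}{n}\|T^n_*\nu - \nu\|$, which is at most 2/n, mirroring the telescoping estimate above.

```python
from fractions import Fraction


def push_forward(T, nu):
    """(T_* nu)({y}) = nu(T^{-1}{y}) for a map T given as a list on {0, ..., m-1}."""
    out = [Fraction(0)] * len(nu)
    for x, mass in enumerate(nu):
        out[T[x]] += mass
    return out


def cesaro_average(T, nu, n):
    """mu_n = (1/n) * sum_{j=0}^{n-1} T^j_* nu."""
    total = [Fraction(0)] * len(nu)
    current = list(nu)
    for _ in range(n):
        total = [a + b for a, b in zip(total, current)]
        current = push_forward(T, current)
    return [a / n for a in total]


def invariance_defect(T, mu):
    """Total variation sum_y |T_* mu({y}) - mu({y})|; it is 0 iff mu is T-invariant."""
    return sum(abs(a - b) for a, b in zip(push_forward(T, mu), mu))


T = [1, 2, 0, 0, 3, 4]                    # 5 -> 4 -> 3 -> 0 -> 1 -> 2 -> 0 -> ...
nu = [Fraction(0)] * 5 + [Fraction(1)]    # Dirac measure at the point 5
for n in (1, 10, 100):
    print(n, invariance_defect(T, cesaro_average(T, nu, n)))   # equals 2/n here
```

In this example the point 5 is not periodic, so $T^n_*\nu \neq \nu$ and the defect is exactly 2/n; the mass of µn concentrates on the cycle {0, 1, 2}, where any limit measure is supported.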

We will need the following additional properties of M(X,T ).

Theorem 7.5.2
Let T : X → X be a continuous mapping of a compact metric space. Then M(X,T) is a weak∗ compact and convex subset of M(X).

Proof. The fact that M(X,T) is convex is straightforward from the definition.

To see that M(X,T) is weak∗ compact, it is sufficient to show that it is a weak∗ closed subset of the weak∗ compact space M(X). Suppose that $\mu_n \in M(X,T)$ is such that $\mu_n \rightharpoonup \mu \in M(X)$. We need to show that µ ∈ M(X,T). To see this, observe that for any f ∈ C(X,R) we have that
$$\begin{aligned}
\int f\circ T\,d\mu &= \lim_{n\to\infty}\int f\circ T\,d\mu_n &&\text{as } f\circ T \text{ is continuous}\\
&= \lim_{n\to\infty}\int f\,d\mu_n &&\text{as } \mu_n \in M(X,T)\\
&= \int f\,d\mu &&\text{as } \mu_n \rightharpoonup \mu.
\end{aligned}$$
✷

§7.6 Exercises

Exercise 7.1
Prove Proposition 7.2.2: show that if $x_n, x \in X$ and $x_n \to x$ then $\delta_{x_n} \rightharpoonup \delta_x$.

Exercise 7.2
Prove that $T_* : M(X) \to M(X)$ is weak∗ continuous (i.e. if $\mu_n \rightharpoonup \mu$ then $T_*\mu_n \rightharpoonup T_*\mu$).

Exercise 7.3
Let X be a compact metric space. For µ ∈ M(X) define
$$\|\mu\| = \sup_{f\in C(X,\mathbb{R}),\ \|f\|_\infty\le 1}\left|\int f\,d\mu\right|.$$

We say that µn converges strongly to µ if ‖µn − µ‖ → 0 as n → ∞. The topology this determines is called the strong topology (or the operator topology).

(i) Show that if µn → µ strongly then µn ⇀ µ in the weak∗ topology.

(ii) Suppose that X is infinite. Show that X → M(X) : x ↦ δx is not continuous in the strong topology.



(iii) Prove that ‖δx − δy‖ = 2 if x ≠ y. (You may use Urysohn's Lemma: Let A and B be disjoint closed subsets of a metric space X. Then there is a continuous function f ∈ C(X,R) such that 0 ≤ f ≤ 1 on X while f ≡ 0 on A and f ≡ 1 on B.)

Hence prove that M(X) is not compact in the strong topology when X is infinite.

Exercise 7.4
Give an example of a sequence of measures µn and a set B such that µn ⇀ µ but µn(B) ↛ µ(B).

Exercise 7.5
Prove that M(X,T) is convex.

Exercise 7.6
Suppose that S ⊂ C(X,R) is a uniformly dense subset of functions (that is, for all f ∈ C(X,R) and all ε > 0 there exists g ∈ S such that ‖f − g‖∞ < ε). Let µn, µ ∈ M(X). Suppose that
$$\int f\,d\mu_n \to \int f\,d\mu \quad\text{for all } f \in S.$$
Prove that µn ⇀ µ.


Exercise 7.7
Let $\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in \{0,1\}\}$ denote the shift space on two symbols 0, 1. Let
$$\sigma : \Sigma \to \Sigma, \quad (\sigma(x))_j = x_{j+1}$$
denote the shift map.

(i) How many periodic points of period n are there?

(ii) Let Per(n) denote the set of periodic points of period n. Define
$$\mu_n = \frac{1}{2^n}\sum_{x\in \mathrm{Per}(n)} \delta_x.$$

Let $i_j \in \{0,1\}$, 0 ≤ j ≤ m − 1, and define the cylinder set
$$[i_0, i_1, \dots, i_{m-1}] = \{x = (x_j)_{j=0}^\infty \in \Sigma \mid x_j = i_j,\ j = 0, 1, \dots, m-1\}.$$
Let µ denote the Bernoulli-(1/2, 1/2) measure. Prove that
$$\int \chi_{[i_0,i_1,\dots,i_{m-1}]}\,d\mu_n \to \int \chi_{[i_0,i_1,\dots,i_{m-1}]}\,d\mu \quad\text{as } n\to\infty.$$

(iii) Prove that $\chi_{[i_0,i_1,\dots,i_{m-1}]}$ is a continuous function.

(iv) Use Exercise 7.6 and the Stone-Weierstrass Theorem (Theorem 1.2.2) to show that µn ⇀ µ as n → ∞.

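Exercise 7.8
Let X = R³/Z³ be the 3-dimensional torus. Let α ∈ R. Define T : X → X by
$$T\left(\begin{pmatrix} x\\ y\\ z\end{pmatrix} + \mathbb{Z}^3\right) = \begin{pmatrix}\alpha + x\\ x + y\\ y + z\end{pmatrix} + \mathbb{Z}^3.$$
Use Corollary 7.4.2 to prove that Lebesgue measure µ is a T-invariant measure.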


8. Ergodic measures for continuous transformations

§8.1 Introduction

In the previous section we saw that, given a continuous transformation of a compact metric space, the set of T-invariant Borel probability measures is non-empty. One can ask a similar question: is the set of ergodic Borel probability measures non-empty? In this section we address this question. We let E(X,T) ⊂ M(X,T) denote the set of ergodic T-invariant Borel probability measures on X.

§8.2 Radon-Nikodym derivatives

We will need the concept of Radon-Nikodym derivatives.

Definition. Let µ be a measure on the measurable space (X,B). We say that a measure ν is absolutely continuous with respect to µ, and write ν ≪ µ, if ν(B) = 0 whenever µ(B) = 0, B ∈ B.

Remark. Thus ν is absolutely continuous with respect to µ if sets of µ-measure zero also have ν-measure zero (but there may be more sets of ν-measure zero).

For example, let f ∈ L¹(X,B,µ) be non-negative and define a measure ν by
$$\nu(B) = \int_B f\,d\mu. \tag{8.2.1}$$
Then ν ≪ µ.

As a particular example, let X = [0,1] be equipped with the Borel σ-algebra B. Define f : [0,1] → R by
$$f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1/2,\\ 0 & \text{if } 1/2 < x \le 1.\end{cases}$$

Let µ be Lebesgue measure and let ν be the measure given by
$$\nu(B) = \int_B f\,d\mu.$$
If A ⊂ [1/2, 1] is any Borel set then ν(A) = 0, even though µ(A) may be positive.

The following theorem says that, essentially, all absolutely continuous measures occur by the construction in (8.2.1).

Theorem 8.2.1 (Radon-Nikodym)
Let (X,B,µ) be a probability space. Let ν be a finite measure defined on B and suppose that ν ≪ µ. Then there is a non-negative measurable function f such that
$$\nu(B) = \int_B f\,d\mu \quad\text{for all } B \in \mathcal{B}.$$
Moreover, f is unique in the sense that if g is a measurable function with the same property then f = g µ-a.e.
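For the example given just before Theorem 8.2.1, the Radon-Nikodym derivative dν/dµ is the function f itself. The short Python sketch below (the function names are hypothetical, and only intervals are handled) evaluates ν on intervals with exact rational arithmetic. It illustrates that ν ≪ µ while the converse fails: the interval [3/5, 9/10] has Lebesgue measure 3/10 but ν-measure 0.

```python
from fractions import Fraction


def nu_interval(a, b):
    """nu([a, b]) = integral over [a, b] of f d(Lebesgue), where f(x) = 2x on
    [0, 1/2] and f(x) = 0 on (1/2, 1]; assumes 0 <= a <= b <= 1."""
    lo, hi = max(a, Fraction(0)), min(b, Fraction(1, 2))
    if hi <= lo:
        return Fraction(0)
    return hi * hi - lo * lo        # an antiderivative of 2x is x^2


def lebesgue_interval(a, b):
    """Lebesgue measure of [a, b]."""
    return b - a
```

Note that ν([0, 1]) = 1/4, so ν is a finite measure but not a probability measure; this does not matter for the Radon-Nikodym Theorem.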



Remark. If ν ≪ µ then it is customary to write dν/dµ for the function given by the Radon-Nikodym Theorem, that is
$$\nu(B) = \int_B \frac{d\nu}{d\mu}\,d\mu.$$

The following relations are all easy to prove, and indicate why the notation was chosen in this way.

(i) If ν ≪ µ and f is a measurable function then f is ν-integrable if and only if $f\,\frac{d\nu}{d\mu}$ is µ-integrable, in which case
$$\int f\,d\nu = \int f\,\frac{d\nu}{d\mu}\,d\mu.$$

(ii) If ν1, ν2 ≪ µ then
$$\frac{d(\nu_1+\nu_2)}{d\mu} = \frac{d\nu_1}{d\mu} + \frac{d\nu_2}{d\mu}.$$

(iii) If λ ≪ ν ≪ µ then λ ≪ µ and
$$\frac{d\lambda}{d\mu} = \frac{d\lambda}{d\nu}\,\frac{d\nu}{d\mu}.$$
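On a finite set the Radon-Nikodym derivative is simply a ratio of point masses, which makes relations (i) to (iii) easy to test. The sketch below assumes measures are given as lists of `Fraction` weights on {0, ..., m − 1}; `rn_derivative` and `integrate` are hypothetical helpers. On a finite set every function is integrable, so the integrability subtleties in (i) do not arise here.

```python
from fractions import Fraction


def rn_derivative(nu, mu):
    """d(nu)/d(mu) on a finite set, as the list of ratios nu({x}) / mu({x}).

    Requires nu << mu (nu({x}) = 0 whenever mu({x}) = 0).  On mu-null points
    the derivative is only defined mu-a.e., so the value 0 is used there.
    """
    if any(m == 0 and n != 0 for n, m in zip(nu, mu)):
        raise ValueError("nu is not absolutely continuous with respect to mu")
    return [n / m if m != 0 else Fraction(0) for n, m in zip(nu, mu)]


def integrate(f, measure):
    """Integral of f (a list of values) against a finite measure."""
    return sum(fx * w for fx, w in zip(f, measure))
```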

§8.3 Ergodic measures as extreme points

§8.3.1 Extreme points of convex sets

A point in a convex set is called an extreme point if it cannot be written as a non-trivial convex combination of (other) elements of the set. More precisely, µ is an extreme point of M(X,T) if, whenever
$$\mu = \alpha\mu_1 + (1-\alpha)\mu_2$$
with µ1, µ2 ∈ M(X,T), 0 < α < 1, then we have µ1 = µ2 = µ.

Remarks.

(i) Let Y be the unit square

$$Y = \{(x,y) \mid 0 \le x \le 1,\ 0 \le y \le 1\} \subset \mathbb{R}^2.$$

Then the extreme points of Y are the corners (0, 0), (0, 1), (1, 0), (1, 1).

(ii) Let Y be the (closed) unit disc
$$Y = \{(x,y) \mid x^2 + y^2 \le 1\} \subset \mathbb{R}^2.$$
Then the set of extreme points of Y is precisely the unit circle {(x,y) | x² + y² = 1}.

§8.3.2 Existence of ergodic measures

The next result will allow us to show that ergodic measures for continuous transformations on compact metric spaces always exist.

Theorem 8.3.1
Let T be a continuous transformation of a compact metric space X equipped with the Borel σ-algebra B. The following are equivalent:


(i) the T -invariant probability measure µ is ergodic;

(ii) µ is an extreme point of M(X,T ).

Proof. We prove that (ii) implies (i): if µ is an extreme point of M(X,T) then it is ergodic. In fact, we shall prove the contrapositive. Suppose that µ is not ergodic; we show that µ is not an extreme point of M(X,T). As µ is not ergodic, there exists B ∈ B such that $T^{-1}B = B$ and 0 < µ(B) < 1.

Define probability measures µ1 and µ2 on X by
$$\mu_1(A) = \frac{\mu(A\cap B)}{\mu(B)}, \qquad \mu_2(A) = \frac{\mu(A\cap (X\setminus B))}{\mu(X\setminus B)}.$$

(The assumption that 0 < µ(B) < 1 ensures that the denominators are not equal to zero.) Clearly, µ1 ≠ µ2, since µ1(B) = 1 while µ2(B) = 0.

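Since $T^{-1}B = B$, we also have $T^{-1}(X\setminus B) = X\setminus B$. Thus, using the T-invariance of µ, we have
$$\mu_1(T^{-1}A) = \frac{\mu(T^{-1}A\cap B)}{\mu(B)} = \frac{\mu(T^{-1}A\cap T^{-1}B)}{\mu(B)} = \frac{\mu(T^{-1}(A\cap B))}{\mu(B)} = \frac{\mu(A\cap B)}{\mu(B)} = \mu_1(A)$$
and (by the same argument)
$$\mu_2(T^{-1}A) = \frac{\mu(T^{-1}A\cap (X\setminus B))}{\mu(X\setminus B)} = \mu_2(A),$$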

i.e., µ1 and µ2 are both in M(X,T). However, we may write µ as the non-trivial (since 0 < µ(B) < 1) convex combination
$$\mu = \mu(B)\mu_1 + (1-\mu(B))\mu_2,$$
so that µ is not an extreme point. ✷
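The construction of µ1 and µ2 can be reproduced on a finite example. In this Python sketch (hypothetical names; T is a permutation of five points stored as a list), the uniform measure is invariant but not ergodic, since B = {0, 1} satisfies $T^{-1}B = B$. Conditioning on B and on its complement gives two distinct invariant measures whose convex combination with weights µ(B) and 1 − µ(B) recovers µ.

```python
from fractions import Fraction


def push_forward(T, mu):
    """(T_* mu)({y}) = mu(T^{-1}{y})."""
    out = [Fraction(0)] * len(mu)
    for x, mass in enumerate(mu):
        out[T[x]] += mass
    return out


def is_invariant(T, mu):
    """On a finite set, mu(T^{-1}A) = mu(A) for all A iff T_* mu = mu pointwise."""
    return push_forward(T, mu) == list(mu)


def condition_on(mu, B):
    """mu_B(A) = mu(A ∩ B) / mu(B), stored point by point; assumes mu(B) > 0."""
    mass = sum(mu[x] for x in B)
    return [mu[x] / mass if x in B else Fraction(0) for x in range(len(mu))]


T = [1, 0, 3, 4, 2]                     # two cycles: {0, 1} and {2, 3, 4}
mu = [Fraction(1, 5)] * 5               # uniform measure, T-invariant
B, Bc = {0, 1}, {2, 3, 4}               # T^{-1}B = B and 0 < mu(B) < 1
mu1, mu2 = condition_on(mu, B), condition_on(mu, Bc)
p = sum(mu[x] for x in B)               # mu(B) = 2/5
```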

Proof (not examinable). We prove that (i) implies (ii). Suppose that µ is ergodic and that µ = αµ1 + (1 − α)µ2, with µ1, µ2 ∈ M(X,T) and 0 < α < 1. We shall show that µ1 = µ (so that µ2 = µ also). This will show that µ is an extreme point of M(X,T).

If µ(A) = 0 then µ1(A) = 0 (since αµ1(A) ≤ µ(A)), so that µ1 ≪ µ. Therefore the Radon-Nikodym derivative dµ1/dµ ≥ 0 exists. One can easily deduce from the statement of the Radon-Nikodym Theorem that µ1 = µ if and only if dµ1/dµ = 1 µ-a.e. We shall show that this is indeed the case by showing that the sets where, respectively, dµ1/dµ < 1 and dµ1/dµ > 1 both have µ-measure zero.

Let
$$B = \left\{x \in X \;\middle|\; \frac{d\mu_1}{d\mu}(x) < 1\right\}.$$


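Now
$$\mu_1(B) = \int_B \frac{d\mu_1}{d\mu}\,d\mu = \int_{B\cap T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu + \int_{B\setminus T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu \tag{8.3.1}$$
and
$$\mu_1(T^{-1}B) = \int_{T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu = \int_{B\cap T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu + \int_{T^{-1}B\setminus B}\frac{d\mu_1}{d\mu}\,d\mu. \tag{8.3.2}$$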

As µ1 ∈ M(X,T), we have that $\mu_1(B) = \mu_1(T^{-1}B)$. Hence, comparing the last summands in (8.3.1) and (8.3.2), we obtain
$$\int_{B\setminus T^{-1}B}\frac{d\mu_1}{d\mu}\,d\mu = \int_{T^{-1}B\setminus B}\frac{d\mu_1}{d\mu}\,d\mu. \tag{8.3.3}$$

In fact, these integrals are taken over sets of the same µ-measure:
$$\begin{aligned}
\mu(T^{-1}B\setminus B) &= \mu(T^{-1}B) - \mu(T^{-1}B\cap B)\\
&= \mu(B) - \mu(T^{-1}B\cap B)\\
&= \mu(B\setminus T^{-1}B).
\end{aligned}$$

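Note that on the left-hand side of (8.3.3) the integrand dµ1/dµ < 1, whereas on the right-hand side of (8.3.3) the integrand dµ1/dµ ≥ 1. Hence, if $\mu(B\setminus T^{-1}B) > 0$, the left-hand side of (8.3.3) would be strictly less than $\mu(B\setminus T^{-1}B)$, while the right-hand side is at least $\mu(T^{-1}B\setminus B) = \mu(B\setminus T^{-1}B)$, contradicting (8.3.3). Thus we must have that $\mu(B\setminus T^{-1}B) = \mu(T^{-1}B\setminus B) = 0$, which is to say that $\mu(T^{-1}B\,\triangle\,B) = 0$, i.e. $T^{-1}B = B$ µ-a.e. Therefore, since µ is ergodic, we have that µ(B) = 0 or µ(B) = 1.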

We can rule out the possibility that µ(B) = 1 by observing that if µ(B) = 1 then
$$1 = \mu_1(X) = \int_X \frac{d\mu_1}{d\mu}\,d\mu = \int_B \frac{d\mu_1}{d\mu}\,d\mu < \mu(B) = 1,$$
a contradiction. Therefore µ(B) = 0.

If we define
$$C = \left\{x \in X \;\middle|\; \frac{d\mu_1}{d\mu}(x) > 1\right\}$$
then repeating essentially the same argument gives µ(C) = 0. Hence
$$\mu\left\{x \in X \;\middle|\; \frac{d\mu_1}{d\mu}(x) = 1\right\} = \mu(X\setminus(B\cup C)) = \mu(X) - \mu(B) - \mu(C) = 1,$$

i.e., dµ1/dµ = 1 µ-a.e. Therefore µ1 = µ, as required. ✷

We can now prove that a continuous transformation of a compact metric space always has an ergodic measure. To do this, we will show that M(X,T) has an extreme point.

Theorem 8.3.2
Let T : X → X be a continuous mapping of a compact metric space. Then there exists at least one ergodic measure in M(X,T).

Proof. By Theorem 8.3.1, it is equivalent to prove that M(X,T) has an extreme point. Choose a countable dense subset of C(X,R), $\{f_i\}_{i=0}^\infty$ say (such a subset exists because C(X,R) is separable when X is a compact metric space). Consider the first function f0. Since the map
$$M(X,T) \to \mathbb{R} : \mu \mapsto \int f_0\,d\mu$$
is (weak∗) continuous and M(X,T) is compact, there exists (at least one) ν ∈ M(X,T) such that
$$\int f_0\,d\nu = \sup_{\mu\in M(X,T)}\int f_0\,d\mu.$$

If we define
$$M_0 = \left\{\nu \in M(X,T) \;\middle|\; \int f_0\,d\nu = \sup_{\mu\in M(X,T)}\int f_0\,d\mu\right\}$$
then the above shows that M0 is non-empty. Also, M0 is closed and hence compact.

We now consider the next function f1 and define
$$M_1 = \left\{\nu \in M_0 \;\middle|\; \int f_1\,d\nu = \sup_{\mu\in M_0}\int f_1\,d\mu\right\}.$$
By the same reasoning as above, M1 is a non-empty closed subset of M0.

Continuing inductively, we define

$$M_j = \left\{\nu \in M_{j-1} \;\middle|\; \int f_j\,d\nu = \sup_{\mu\in M_{j-1}}\int f_j\,d\mu\right\}$$
and hence obtain a nested sequence of sets
$$M(X,T) \supset M_0 \supset M_1 \supset \cdots \supset M_j \supset \cdots$$
with each Mj non-empty and closed.

Now consider the intersection
$$M_\infty = \bigcap_{j=0}^{\infty} M_j.$$
Recall that the intersection of a decreasing sequence of non-empty compact sets is non-empty. Hence M∞ is non-empty and we can pick µ∞ ∈ M∞. We shall show that µ∞ is an extreme point (and hence ergodic).

Suppose that we can write µ∞ = αµ1 + (1 − α)µ2, µ1, µ2 ∈ M(X,T), 0 < α < 1. We have to show that µ1 = µ2. Since $\{f_j\}_{j=0}^\infty$ is dense in C(X,R), it suffices to show that
$$\int f_j\,d\mu_1 = \int f_j\,d\mu_2 \quad \forall\, j \ge 0.$$

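Consider f0. By assumption
$$\int f_0\,d\mu_\infty = \alpha\int f_0\,d\mu_1 + (1-\alpha)\int f_0\,d\mu_2.$$
In particular,
$$\int f_0\,d\mu_\infty \le \max\left\{\int f_0\,d\mu_1,\ \int f_0\,d\mu_2\right\}.$$
However µ∞ ∈ M0 and so
$$\int f_0\,d\mu_\infty = \sup_{\mu\in M(X,T)}\int f_0\,d\mu \ge \max\left\{\int f_0\,d\mu_1,\ \int f_0\,d\mu_2\right\}.$$
Since a convex combination, with 0 < α < 1, of two numbers can only equal their maximum if the two numbers are equal, we obtain
$$\int f_0\,d\mu_1 = \int f_0\,d\mu_2 = \int f_0\,d\mu_\infty.$$
Thus, the first identity we require is proved and µ1, µ2 ∈ M0. This last fact allows us to employ the same argument on f1 (with M(X,T) replaced by M0) and conclude that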

$$\int f_1\,d\mu_1 = \int f_1\,d\mu_2 = \int f_1\,d\mu_\infty$$
and µ1, µ2 ∈ M1. Continuing inductively, we show that for an arbitrary j ≥ 0,
$$\int f_j\,d\mu_1 = \int f_j\,d\mu_2$$
and µ1, µ2 ∈ Mj. This completes the proof. ✷
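A finite analogue of the successive maximisation in this proof may help. For a map T of a finite set (with the discrete metric), every invariant probability measure is a convex combination of the uniform measures on the periodic cycles of T, and these cycle measures are exactly the ergodic ones. Since µ ↦ ∫ f dµ is affine, the set $M_j$ then consists of the convex combinations of the cycle measures that survive the first j + 1 maximisations. The Python sketch below (hypothetical names, integer-valued test functions assumed) performs this filtering; once the functions separate the cycles, a single ergodic measure remains.

```python
from fractions import Fraction


def cycles(T):
    """Periodic cycles of a map T on {0, ..., m-1}, each returned once as a tuple."""
    found, seen = [], set()
    for start in range(len(T)):
        x, path = start, []
        while x not in path:               # follow the orbit until a point repeats
            path.append(x)
            x = T[x]
        cyc = tuple(path[path.index(x):])  # the repeating part is a cycle
        if frozenset(cyc) not in seen:
            seen.add(frozenset(cyc))
            found.append(cyc)
    return found


def cycle_average(f, cyc):
    """Integral of f against the uniform (ergodic) measure on the cycle."""
    return Fraction(sum(f[x] for x in cyc), len(cyc))


def successive_maximisers(T, fs):
    """Cycles whose measures survive the filtering M_0 ⊃ M_1 ⊃ ... for f_0, f_1, ..."""
    candidates = cycles(T)
    for f in fs:
        best = max(cycle_average(f, c) for c in candidates)
        candidates = [c for c in candidates if cycle_average(f, c) == best]
    return candidates
```

In the infinite-dimensional setting of the theorem the cycles are replaced by the (possibly uncountably many) ergodic measures, and compactness replaces the finite search.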

§8.4 An example: the North-South map

For many dynamical systems there exist uncountably many different ergodic measures. This is the case for the doubling map, Markov shifts, toral automorphisms, etc. Here we give an example of a dynamical system T : X → X for which one can construct M(X,T) and E(X,T) explicitly.

Let X ⊂ R² denote the circle of radius 1 centred at (0,1) ∈ R². Call N = (0,2) the North Pole and S = (0,0) the South Pole of X. Define a map φ : X \ {N} → R × {0} by drawing a straight line through N and x and denoting by φ(x) the unique point on the x-axis that this line crosses (this is just stereographic projection of the circle). Define T : X → X by
$$T(x) = \begin{cases}\phi^{-1}\left(\tfrac{1}{2}\phi(x)\right) & \text{if } x \in X\setminus\{N\},\\ N & \text{if } x = N.\end{cases}$$

Hence T(N) = N, T(S) = S and if x ≠ N, S then $T^n(x) \to S$ as n → ∞. We call T the North-South map.

Figure 8.4.1: The North-South map

Clearly both N and S are fixed points for T. Hence δN and δS (the Dirac delta measures at N, S, respectively) are T-invariant. It is easy to see that both δN and δS are ergodic.
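A minimal Python sketch of the North-South map, assuming the explicit formulas for stereographic projection that follow from the construction: the line through N = (0, 2) and x = (a, b) meets the x-axis at φ(x) = 2a/(2 − b), and conversely the line through N and (u, 0) meets the circle again at $(4u/(u^2+4),\ 2 - 8/(u^2+4))$. The function names are illustrative only. Iterating from points other than N shows the orbits approaching S.

```python
import math

N = (0.0, 2.0)   # North Pole
S = (0.0, 0.0)   # South Pole


def phi(p):
    """Stereographic projection from N: circle x^2 + (y - 1)^2 = 1 (minus N) -> x-axis."""
    a, b = p
    return 2 * a / (2 - b)


def phi_inv(u):
    """Second intersection of the line through N and (u, 0) with the circle."""
    d = u * u + 4
    return (4 * u / d, 2 - 8 / d)


def north_south(p):
    """T(p) = phi^{-1}(phi(p) / 2) for p != N, and T(N) = N."""
    return N if p == N else phi_inv(phi(p) / 2)


p = (1.0, 1.0)
for n in range(1, 6):
    p = north_south(p)
    print(n, p, math.dist(p, S))   # the distance to S decreases towards 0
```

Since φ conjugates T to the contraction u ↦ u/2 on the real line, every orbit other than that of N converges to S, in agreement with the text.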



Now let µ ∈ M(X,T) be an invariant measure. We claim that µ assigns zero measure to the set X \ {N,S}. Let x ∈ X be any point in the right semi-circle (for example, take x = (1,1) ∈ R²) and consider the half-open arc I of the semi-circle from x (included) to T(x) (excluded). Then $\bigcup_{n=-\infty}^{\infty} T^{-n}I$ is a disjoint union of arcs of the semi-circle and, moreover, is equal to the entire (open) right semi-circle. Now

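$$\mu\left(\bigcup_{n=-\infty}^{\infty}T^{-n}I\right) = \sum_{n=-\infty}^{\infty}\mu(T^{-n}I) = \sum_{n=-\infty}^{\infty}\mu(I)$$
(for n < 0 we use that T is a bijection, so that $\mu(T^{m}I) = \mu(T^{-m}T^{m}I) = \mu(I)$ for m > 0), and the only way for this to be finite is if µ(I) = 0. Hence µ assigns zero measure to the entire right semi-circle. Similarly, µ assigns zero measure to the left semi-circle.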

Hence µ is concentrated on the two points N, S, and so must be a convex combination of the Dirac delta measures δN and δS. Hence

M(X,T ) = {αδN + (1 − α)δS | α ∈ [0, 1]}

and the ergodic measures are the extreme points of M(X,T ), namely δN , δS .

§8.5 Unique ergodicity

We conclude by looking at the case where T : X → X has a unique invariant probability measure.

Definition. Let T : X → X be a continuous transformation of a compact metric space X. If there is a unique T-invariant probability measure then we say that T is uniquely ergodic.

Remark. You might wonder why such T are not instead called ‘uniquely invariant’. Recall that the extreme points of M(X,T) are precisely the ergodic measures. If M(X,T) consists of just one measure then that measure is an extreme point, and so must be ergodic.

Unique ergodicity implies the following strong convergence result.

Theorem 8.5.1 (Oxtoby's Ergodic Theorem)
Let X be a compact metric space and let T : X → X be a continuous transformation. The following are equivalent:

(i) T is uniquely ergodic;

(ii) for each f ∈ C(X,R) there exists a constant c(f) such that
$$\frac{1}{n}\sum_{j=0}^{n-1} f(T^jx) \to c(f) \quad\text{as } n\to\infty, \tag{8.5.1}$$
uniformly for x ∈ X.

Remark. The convergence in (8.5.1) means that
$$\lim_{n\to\infty}\sup_{x\in X}\left|\frac{1}{n}\sum_{j=0}^{n-1}f(T^jx) - c(f)\right| = 0.$$


Remark. If M(X,T) = {µ} then the constant c(f) in (8.5.1) is $\int f\,d\mu$.

Proof. We prove (ii) implies (i). Suppose that µ, ν are T-invariant probability measures; we shall show that µ = ν. Integrating the expression in (ii), we obtain
$$\int f\,d\mu = \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\int f\circ T^j\,d\mu = \int \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}f\circ T^j\,d\mu = \int c(f)\,d\mu = c(f)$$
(that the convergence in (8.5.1) is uniform allows us to interchange integration and taking limits) and, by the same argument,
$$\int f\,d\nu = c(f).$$

Therefore
$$\int f\,d\mu = \int f\,d\nu \quad\text{for all } f \in C(X,\mathbb{R}),$$
and so µ = ν (by the Riesz Representation Theorem).

We prove (i) implies (ii). Let M(X,T) = {µ}. If (ii) is true, then, by the Dominated Convergence Theorem (Theorem 3.1.3), we must necessarily have $c(f) = \int f\,d\mu$. The convergence in (ii) means: ∀f ∈ C(X,R), ∀ε > 0, ∃N ∈ N such that if n ≥ N then for all x ∈ X we have
$$\left|\frac{1}{n}\sum_{j=0}^{n-1}f(T^jx) - \int f\,d\mu\right| < \varepsilon.$$

Suppose that (ii) is false. Then, negating the above quantifiers, we see that there exist f0 ∈ C(X,R), ε > 0, an increasing sequence $n_k \uparrow \infty$ and points $x_{n_k} \in X$ for which
$$\left|\frac{1}{n_k}\sum_{j=0}^{n_k-1}f_0(T^jx_{n_k}) - \int f_0\,d\mu\right| \ge \varepsilon. \tag{8.5.2}$$

Define the probability measure µk ∈ M(X) by
$$\mu_k = \frac{1}{n_k}\sum_{j=0}^{n_k-1}T^j_*\delta_{x_{n_k}},$$
so that (8.5.2) can be written as
$$\left|\int f_0\,d\mu_k - \int f_0\,d\mu\right| \ge \varepsilon.$$

Now µk ∈ M(X) and M(X) is weak∗ compact. Hence there exists a weak∗ convergent subsequence, say with weak∗ limit ν. By following the proof of Theorem 7.5.1, it is easy to see that ν ∈ M(X,T). Moreover, since f0 is continuous, passing to the limit along this subsequence gives
$$\left|\int f_0\,d\nu - \int f_0\,d\mu\right| \ge \varepsilon.$$
Therefore, ν ≠ µ, contradicting unique ergodicity. ✷


§8.6 Irrational rotations

Let X = R/Z and T : X → X, T(x) = x + α mod 1, where α is irrational. We have already seen that Lebesgue measure µ is an ergodic T-invariant measure. We can prove that Lebesgue measure is the only invariant measure.

Proposition 8.6.1
An irrational rotation of a circle is uniquely ergodic and the unique T-invariant measure is Lebesgue measure.

Proof. We use Oxtoby's Ergodic Theorem. To prove that T is uniquely ergodic, we must show that (8.5.1) holds for every continuous function f ∈ C(X,R). Note that the convergence in (8.5.1) is uniform, i.e. we must show that
$$\left\|\frac{1}{n}\sum_{j=0}^{n-1}f(T^j(x)) - \int f\,d\mu\right\|_\infty \to 0 \tag{8.6.1}$$
as n → ∞.

We first prove (8.6.1) in the case when $f(x) = e^{2\pi i\ell x}$, ℓ ∈ Z \ {0}. Note that $T^j(x) = x + j\alpha$. Hence

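$$\left|\frac{1}{n}\sum_{j=0}^{n-1}f(T^j(x))\right| = \left|\frac{1}{n}\sum_{j=0}^{n-1}e^{2\pi i\ell(x+j\alpha)}\right| = \left|e^{2\pi i\ell x}\,\frac{1}{n}\sum_{j=0}^{n-1}e^{2\pi i\ell\alpha j}\right| = \frac{1}{n}\,\frac{|e^{2\pi i\ell\alpha n}-1|}{|e^{2\pi i\ell\alpha}-1|} \le \frac{1}{n}\,\frac{2}{|e^{2\pi i\ell\alpha}-1|}. \tag{8.6.2}$$

As α is irrational and ℓ ≠ 0, we have ℓα ∉ Z, so the denominator in (8.6.2) is not zero. Note also that $\int e^{2\pi i\ell x}\,d\mu = 0$. Hence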

$$\sup_{x\in X}\left|\frac{1}{n}\sum_{j=0}^{n-1}f(T^j(x)) - \int f\,d\mu\right| \to 0$$
as n → ∞ when $f(x) = e^{2\pi i\ell x}$, ℓ ∈ Z \ {0}. Clearly (8.6.1) holds when f is a constant function. By taking finite linear combinations of exponential functions we see that

$$\sup_{x\in X}\left|\frac{1}{n}\sum_{j=0}^{n-1}g(T^j(x)) - \int g\,d\mu\right| \to 0$$
as n → ∞ for all trigonometric polynomials g. By the Stone-Weierstrass Theorem (Theorem 1.2.2), trigonometric polynomials are uniformly dense in C(X,R). Let f ∈ C(X,R) and let ε > 0. Then there exists a trigonometric polynomial g such that ‖f − g‖∞ < ε.


Hence for any x ∈ X we have
$$\begin{aligned}
\left|\frac{1}{n}\sum_{j=0}^{n-1}f(T^j(x)) - \int f\,d\mu\right|
&\le \left|\frac{1}{n}\sum_{j=0}^{n-1}\bigl(f(T^j(x)) - g(T^j(x))\bigr)\right| + \left|\frac{1}{n}\sum_{j=0}^{n-1}g(T^j(x)) - \int g\,d\mu\right| + \left|\int (g - f)\,d\mu\right|\\
&\le \frac{1}{n}\sum_{j=0}^{n-1}\bigl|f(T^j(x)) - g(T^j(x))\bigr| + \left|\frac{1}{n}\sum_{j=0}^{n-1}g(T^j(x)) - \int g\,d\mu\right| + \int |g - f|\,d\mu\\
&\le 2\varepsilon + \left|\frac{1}{n}\sum_{j=0}^{n-1}g(T^j(x)) - \int g\,d\mu\right|.
\end{aligned}$$

Hence, taking the supremum over all x ∈ X, we have
$$\left\|\frac{1}{n}\sum_{j=0}^{n-1}f(T^j(x)) - \int f\,d\mu\right\|_\infty \le 2\varepsilon + \left\|\frac{1}{n}\sum_{j=0}^{n-1}g(T^j(x)) - \int g\,d\mu\right\|_\infty.$$
Letting n → ∞ we see that
$$\limsup_{n\to\infty}\left\|\frac{1}{n}\sum_{j=0}^{n-1}f(T^j(x)) - \int f\,d\mu\right\|_\infty \le 2\varepsilon.$$

As ε > 0 is arbitrary, it follows that
$$\lim_{n\to\infty}\left\|\frac{1}{n}\sum_{j=0}^{n-1}f(T^j(x)) - \int f\,d\mu\right\|_\infty = 0.$$

Hence statement (ii) in Oxtoby's Ergodic Theorem holds. As (i) and (ii) in Oxtoby's Ergodic Theorem are equivalent, it follows that T is uniquely ergodic and Lebesgue measure is the unique invariant measure. ✷
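The estimate (8.6.2) can be checked numerically. The Python sketch below uses hypothetical helper names, and the golden-mean rotation number is an arbitrary choice; in floating point α is in fact a rational approximation, which does not affect (8.6.2) for the values of n used. It computes Birkhoff averages of $e^{2\pi i\ell x}$ along orbits and compares their size with the bound $2/(n|e^{2\pi i\ell\alpha}-1|)$, uniformly over a grid of starting points.

```python
import cmath
import math


def birkhoff_average(f, alpha, x, n):
    """(1/n) * sum_{j=0}^{n-1} f(T^j x) for the rotation T(x) = x + alpha mod 1."""
    return sum(f((x + j * alpha) % 1.0) for j in range(n)) / n


def exponential_bound(alpha, ell, n):
    """Right-hand side of (8.6.2): 2 / (n * |e^{2 pi i ell alpha} - 1|)."""
    return 2 / (n * abs(cmath.exp(2j * math.pi * ell * alpha) - 1))


alpha = (math.sqrt(5) - 1) / 2            # an irrational rotation number
for ell in (1, 2, 5):
    e_ell = lambda t, ell=ell: cmath.exp(2j * math.pi * ell * t)
    worst = max(abs(birkhoff_average(e_ell, alpha, x / 50, 1000)) for x in range(50))
    print(ell, worst, exponential_bound(alpha, ell, 1000))
```

The bound does not depend on the starting point x, which is exactly the uniformity that Oxtoby's Ergodic Theorem requires.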

§8.7 Exercises

Exercise 8.1
Prove the following identities concerning Radon-Nikodym derivatives.

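(i) If ν ≪ µ and f is measurable, then f ∈ L¹(X,B,ν) if and only if $f\,\frac{d\nu}{d\mu} \in L^1(X,\mathcal{B},\mu)$, in which case
$$\int f\,d\nu = \int f\,\frac{d\nu}{d\mu}\,d\mu.$$

(ii) If ν1, ν2 ≪ µ then
$$\frac{d(\nu_1+\nu_2)}{d\mu} = \frac{d\nu_1}{d\mu} + \frac{d\nu_2}{d\mu}.$$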


(iii) If λ ≪ ν ≪ µ then λ ≪ µ and
$$\frac{d\lambda}{d\mu} = \frac{d\lambda}{d\nu}\,\frac{d\nu}{d\mu}.$$

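Exercise 8.2
Let X = R³/Z³ be the 3-dimensional torus. Fix α ∉ Q. Define T : X → X by
$$T\left(\begin{pmatrix} x\\ y\\ z\end{pmatrix} + \mathbb{Z}^3\right) = \begin{pmatrix}\alpha + x\\ x + y\\ y + z\end{pmatrix} + \mathbb{Z}^3.$$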

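Prove by induction that for n ≥ 3
$$T^n\left(\begin{pmatrix} x\\ y\\ z\end{pmatrix} + \mathbb{Z}^3\right) = \begin{pmatrix}\binom{n}{1}\alpha + x\\[2pt] \binom{n}{2}\alpha + \binom{n}{1}x + y\\[2pt] \binom{n}{3}\alpha + \binom{n}{2}x + \binom{n}{1}y + z\end{pmatrix} + \mathbb{Z}^3$$
(here $\binom{n}{r}$ denotes the binomial coefficient).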

Let $f(x,y,z) = e^{2\pi i(kx+\ell y+mz)}$ where k, ℓ, m ∈ Z. Assuming Weyl's Theorem on Polynomials (Theorem 2.3.1), prove using Weyl's Criterion (Theorem 1.2.1) that
$$\sup_{x,y,z}\left|\frac{1}{n}\sum_{j=0}^{n-1}f\bigl(T^j((x,y,z)+\mathbb{Z}^3)\bigr)\right| \to 0$$
as n → ∞, whenever (k, ℓ, m) ∈ Z³ \ {(0,0,0)}.

Hence, using Oxtoby's Ergodic Theorem, prove that T is uniquely ergodic and Lebesgue measure is the unique invariant measure.

Exercise 8.3
Let T be a homeomorphism of a compact metric space X. Suppose that T is uniquely ergodic with unique invariant measure µ. Prove that every orbit of T is dense if, and only if, µ(U) > 0 for every non-empty open set U.


9. Recurrence

§9.1 Introduction

We can now begin to study ergodic theorems. Before we do this, we discuss a remarkable result due to Poincaré.

§9.2 Poincaré’s Recurrence Theorem

Theorem 9.2.1 (Poincaré’s Recurrence Theorem)
Let T : X → X be a measure-preserving transformation of the probability space (X,B,µ). Let B ∈ B be such that µ(B) > 0. Then for µ-a.e. x ∈ B, the orbit $\{T^nx\}_{n=0}^\infty$ returns to B infinitely often.

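Proof. Let
$$E = \{x \in B \mid T^nx \in B \text{ for infinitely many } n \ge 1\};$$
then we have to show that µ(B \ E) = 0. If we write
$$F = \{x \in B \mid T^nx \notin B\ \ \forall\, n \ge 1\}$$
then we have the identity
$$B\setminus E = \bigcup_{k=0}^{\infty}(T^{-k}F\cap B).$$
(Indeed, if x ∈ B \ E and k ≥ 0 is the last time with $T^kx \in B$, then $T^kx \in F$; conversely, a point of $T^{-k}F \cap B$ visits B at most at the times 0, …, k.)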

Thus we have the estimate
$$\mu(B\setminus E) = \mu\left(\bigcup_{k=0}^{\infty}(T^{-k}F\cap B)\right) \le \mu\left(\bigcup_{k=0}^{\infty}T^{-k}F\right) \le \sum_{k=0}^{\infty}\mu(T^{-k}F).$$

Since $\mu(T^{-k}F) = \mu(F)$ for all k ≥ 0 (because the measure is preserved), it suffices to show that µ(F) = 0.

First suppose that n > m and that $T^{-m}F \cap T^{-n}F \neq \emptyset$. If y lies in this intersection then $T^my \in F$ and $T^{n-m}(T^my) = T^ny \in F \subset B$, which contradicts the definition of F. Thus $T^{-m}F$ and $T^{-n}F$ are disjoint.

Since $\{T^{-k}F\}_{k=0}^\infty$ is a disjoint family, we have
$$\sum_{k=0}^{\infty}\mu(T^{-k}F) = \mu\left(\bigcup_{k=0}^{\infty}T^{-k}F\right) \le \mu(X) = 1.$$
Since the terms in the summation have the constant value µ(F), we must have µ(F) = 0. ✷
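As a numerical illustration only (not a proof), the sketch below follows orbits of an irrational rotation, which preserves Lebesgue measure, starting in B = [0, 0.05), and records the times at which they return to B. The function name is hypothetical and the golden-mean rotation number is an arbitrary choice.

```python
import math


def return_times(x, alpha, a, b, n_max):
    """Times 1 <= n <= n_max with T^n x in B = [a, b), where T(x) = x + alpha mod 1."""
    return [n for n in range(1, n_max + 1) if a <= (x + n * alpha) % 1.0 < b]


alpha = (math.sqrt(5) - 1) / 2            # rotation preserves Lebesgue measure
for x in (0.0, 0.02, 0.049):              # starting points in B = [0, 0.05)
    times = return_times(x, alpha, 0.0, 0.05, 10_000)
    print(x, times[:5], len(times))
```

A finite computation can only exhibit many returns, never infinitely many; the theorem supplies the latter for µ-a.e. point.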



Remark. Note that the hypotheses of Poincaré’s Recurrence Theorem are very mild: all one needs is for T to be a measure-preserving transformation of a probability space. (One does not need T to be ergodic.) If you carefully look at the proof, you will see that the fact that T is measure-preserving and the fact that µ(X) = 1 are used just once. The same proof continues to hold in the case when µ(X) is finite. Poincaré’s Recurrence Theorem is false if either of the hypotheses that µ(X) is finite or that T is measure-preserving is removed.

§9.3 Ergodic Theorems

An ergodic theorem is a result that describes the limiting behaviour of sequences of the form
$$\frac{1}{n}\sum_{j=0}^{n-1}f\circ T^j \tag{9.3.1}$$

as n → ∞. The precise formulation of an ergodic theorem depends on the class of function f (for example, one could assume that f is integrable, L², or continuous), and the notion of convergence used (for example, the convergence could be pointwise, L², or uniform). We have already studied when one has uniform convergence of (9.3.1): this is Oxtoby's Ergodic Theorem and only holds in the very special circumstances when T is uniquely ergodic. In what follows we will discuss von Neumann's (Mean) Ergodic Theorem and Birkhoff's Ergodic Theorem. Von Neumann's Ergodic Theorem is in the context of f ∈ L²(X,B,µ) and L²-convergence of the ergodic averages (9.3.1); Birkhoff's Ergodic Theorem is in the context of f ∈ L¹(X,B,µ) and almost everywhere pointwise convergence of (9.3.1). Note that L² convergence neither implies nor is implied by almost everywhere pointwise convergence.

Before stating these theorems, we first need to discuss conditional expectation.

§9.4 Conditional expectation

Let (X,B,µ) be a probability space. Let A ⊂ B be a sub-σ-algebra. Note that µ defines a measure on A by restriction. Let f ∈ L¹(X,B,µ) be non-negative. Then we can define a measure ν on A by setting, for A ∈ A,
$$\nu(A) = \int_A f\,d\mu.$$

Note that ν ≪ µ|A. Hence, by the Radon-Nikodym Theorem, there is an A-measurable function E(f | A), unique up to µ-a.e. equality, such that
$$\nu(A) = \int_A E(f\mid\mathcal{A})\,d\mu$$
for all A ∈ A. We call E(f | A) the conditional expectation of f with respect to the σ-algebra A.

So far, we have only defined E(f | A) for non-negative f. To define E(f | A) for an arbitrary real-valued f, we split f into positive and negative parts f = f⁺ − f⁻, where f⁺, f⁻ ≥ 0, and define
$$E(f\mid\mathcal{A}) = E(f^+\mid\mathcal{A}) - E(f^-\mid\mathcal{A}).$$

For a complex-valued f we split f into its real and imaginary parts and define
$$E(f\mid\mathcal{A}) = E(\mathrm{Re}(f)\mid\mathcal{A}) + iE(\mathrm{Im}(f)\mid\mathcal{A}).$$


Thus we can view conditional expectation as an operator
$$E(\cdot\mid\mathcal{A}) : L^1(X,\mathcal{B},\mu) \to L^1(X,\mathcal{A},\mu).$$

Note that E(f | A) is uniquely determined (up to µ-a.e. equality) by the two requirements that

(i) E(f | A) is A-measurable, and

(ii) $\int_A f\,d\mu = \int_A E(f\mid\mathcal{A})\,d\mu$ for all A ∈ A.

Intuitively, one can think of E(f | A) as the best approximation to f in the smaller space of A-measurable functions.
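When A is generated by a finite partition (the situation of Exercise 9.4), E(f | A) is just the µ-weighted average of f on each cell, which makes the phrase ‘best approximation’ concrete. Below is a minimal Python sketch on a finite probability space, with hypothetical names and every cell assumed to have positive measure; it checks requirement (ii) on the cells.

```python
from fractions import Fraction


def conditional_expectation(f, mu, partition):
    """E(f | A) for the sigma-algebra A generated by a finite partition.

    f and mu are lists indexed by the points of a finite probability space;
    every cell is assumed to have positive mu-measure.  On each cell the
    result is the mu-weighted average of f over that cell.
    """
    result = [None] * len(f)
    for cell in partition:
        mass = sum(mu[x] for x in cell)
        average = sum(f[x] * mu[x] for x in cell) / mass
        for x in cell:
            result[x] = average
    return result


F = Fraction
mu = [F(1, 4), F(1, 4), F(1, 8), F(1, 8), F(1, 4)]
f = [4, 0, 2, 6, 1]
partition = [[0, 1], [2, 3], [4]]
print(conditional_expectation(f, mu, partition))   # constant on each cell
```

For the trivial partition with a single cell the result is the constant $\int f\,d\mu$, matching the description of E(f | N) for the trivial σ-algebra.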

Let T be a measure-preserving transformation of the probability space (X,B,µ). To state von Neumann's and Birkhoff's Ergodic Theorems precisely, we will need the sub-σ-algebra I of T-invariant subsets, namely:

$$\mathcal{I} = \{B \in \mathcal{B} \mid T^{-1}B = B \text{ a.e.}\}.$$

It is straightforward to check that I is a σ-algebra. Note that if T is ergodic then I is the trivial σ-algebra consisting of all sets in B of measure 0 or 1.

§9.5 Von Neumann’s Ergodic Theorem

Von Neumann's Ergodic Theorem deals with the L²-limiting behaviour of $\frac{1}{n}\sum_{j=0}^{n-1}f\circ T^j$ for f ∈ L²(X,B,µ).

Theorem 9.5.1 (von Neumann's Ergodic Theorem)
Let (X,B,µ) be a probability space and let T : X → X be a measure-preserving transformation. Let I denote the σ-algebra of T-invariant sets. Then for every f ∈ L²(X,B,µ), we have
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}f\circ T^j = E(f\mid\mathcal{I}),$$
where the convergence is in L².

When T is ergodic with respect to µ, von Neumann's Ergodic Theorem takes a particularly simple form.

Corollary 9.5.2
Let (X,B,µ) be a probability space and let T : X → X be an ergodic measure-preserving transformation. Let f ∈ L²(X,B,µ). Then
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}f\circ T^j = \int f\,d\mu, \tag{9.5.1}$$
where the convergence is in L².

Proof. If T is ergodic then I is the trivial σ-algebra N consisting of sets of measure 0 and 1. If f ∈ L²(X,B,µ) then $E(f\mid\mathcal{N}) = \int f\,d\mu$. ✷


Remark. The meaning of convergence in (9.5.1) is that
$$\lim_{n\to\infty}\left\|\frac{1}{n}\sum_{j=0}^{n-1}f\circ T^j - \int f\,d\mu\right\|_2 = 0,$$
i.e.
$$\lim_{n\to\infty}\left(\int\left|\frac{1}{n}\sum_{j=0}^{n-1}f(T^jx) - \int f\,d\mu\right|^2 d\mu\right)^{1/2} = 0.$$

§9.6 Proof of von Neumann’s Ergodic Theorem

None of this section is examinable; it is included for people who like hard-core functional analysis!

We prove von Neumann's Ergodic Theorem in the case where T is invertible.

In order to prove von Neumann's Ergodic Theorem, it is useful to recast it in terms of linear analysis.

Theorem 9.6.1 (von Neumann's Ergodic Theorem for Operators)
Let U be a unitary operator of a complex Hilbert space H. Let I = {v ∈ H | Uv = v} be the closed subspace of U-invariant vectors and let $P_I : H \to I$ be orthogonal projection onto I. Then for all v ∈ H we have
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}U^jv = P_Iv \tag{9.6.1}$$
in the norm induced on H by the inner product.

Proof of Theorem 9.6.1. Denote the inner product and norm on H by ⟨·,·⟩ and ‖·‖, respectively.

First note that if v ∈ I then (9.6.1) holds, as
$$\frac{1}{n}\sum_{j=0}^{n-1}U^jv = v = P_Iv.$$

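If v = Uw − w for some w ∈ H then, since the sum telescopes and U is an isometry,
$$\left\|\frac{1}{n}\sum_{j=0}^{n-1}U^jv\right\| = \frac{1}{n}\|U^nw - w\| \le \frac{2}{n}\|w\| \to 0$$
as n → ∞. If we let C denote the norm-closure of the subspace {Uw − w | w ∈ H} then it follows that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}U^jv = 0$$
for all v ∈ C, by approximation (each operator $\frac{1}{n}\sum_{j=0}^{n-1}U^j$ has norm at most 1).

We claim that H = I ⊕ C, an orthogonal decomposition. Suppose that v ⊥ C. Then ⟨v, Uw − w⟩ = 0 for all w ∈ H. Hence ⟨U*v, w⟩ = ⟨v, w⟩ for all w ∈ H, so U*v = v. As U is unitary, we have that U* = U⁻¹. Hence v = Uv, so that v ∈ I. Reversing each implication we see that v ∈ I implies v ⊥ C, and the claim follows. Finally, for arbitrary v ∈ H write v = P_I v + (v − P_I v) with v − P_I v ∈ C; applying the two cases above gives (9.6.1). ✷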


Remark. Note that an isometry of a Hilbert space H is a linear operator U such that ⟨Uv, Uw⟩ = ⟨v, w⟩ for all v, w ∈ H. We say that U is unitary if, in addition, it is invertible. Equivalently, U is unitary if the dual operator U* is the inverse of U: U*U = UU* = id.

We can prove von Neumann's Ergodic Theorem for an invertible measure-preserving transformation T of a probability space (X,B,µ) as follows. Recall that L²(X,B,µ) is a Hilbert space with respect to the inner product
$$\langle f, g\rangle = \int f\,\overline{g}\,d\mu$$
and that T induces a linear operator U : L²(X,B,µ) → L²(X,B,µ) by Uf = f ∘ T. As T is measure-preserving, we have that U is an isometry; if T is invertible then U is unitary.

Let $P_I : L^2(X,\mathcal{B},\mu) \to L^2(X,\mathcal{I},\mu)$ denote the orthogonal projection onto the subspace of T-invariant functions. One can easily check (see Exercise 9.6) that $P_If = E(f\mid\mathcal{I})$.

Hence, when T is invertible, Theorem 9.5.1 follows immediately from Theorem 9.6.1.

One can deduce from Theorem 9.6.1 that the result continues to hold when U is an isometry and is not assumed to be invertible.
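A finite-dimensional check of Theorem 9.6.1 may help. On a finite set with uniform measure, a permutation T gives a unitary operator Uf = f ∘ T, the U-invariant vectors are those constant on each cycle, and $P_I$ averages over cycles. The Python sketch below (hypothetical names, integer-valued v assumed) compares the ergodic averages with this projection; the error decays like 1/n, in line with the argument for vectors of the form Uw − w.

```python
from fractions import Fraction


def koopman(perm, v):
    """(Uv)(i) = v(T(i)) for the permutation T = perm, i.e. U f = f∘T."""
    return [v[perm[i]] for i in range(len(v))]


def ergodic_average(perm, v, n):
    """(1/n) * sum_{j=0}^{n-1} U^j v."""
    total = [Fraction(0)] * len(v)
    w = list(v)
    for _ in range(n):
        total = [a + b for a, b in zip(total, w)]
        w = koopman(perm, w)
    return [a / n for a in total]


def project_invariant(perm, v):
    """Orthogonal projection onto {v : Uv = v} for the uniform inner product:
    invariant vectors are constant on cycles, so average v over each cycle."""
    out = [None] * len(v)
    for start in range(len(v)):
        if out[start] is not None:
            continue
        cyc, x = [start], perm[start]
        while x != start:
            cyc.append(x)
            x = perm[x]
        avg = Fraction(sum(v[i] for i in cyc)) / len(cyc)
        for i in cyc:
            out[i] = avg
    return out


perm = [1, 2, 0, 4, 3, 5]                 # cycles (0 1 2), (3 4), (5)
v = [3, -1, 4, 1, 5, 9]
print(project_invariant(perm, v))
print(ergodic_average(perm, v, 7))
```

When n is a multiple of every cycle length the average equals $P_Iv$ exactly; in general the error is at most 2·(cycle length)·max|v|/n.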

§9.7 Exercises

Exercise 9.1
Construct an example to show that Poincaré’s Recurrence Theorem does not hold on infinite measure spaces. That is, find a measure space (X,B,µ) with µ(X) = ∞ and a measure-preserving transformation T : X → X such that the conclusion of Poincaré’s Recurrence Theorem does not hold.

Exercise 9.2
Poincaré’s Recurrence Theorem says that, if we have a measure-preserving transformation T of a probability space (X,B,µ) and a set A ∈ B, µ(A) > 0, then, if we start iterating a typical point x ∈ A, the orbit of x will return to A infinitely often.

Construct an example to show that if we have a measure-preserving transformation T of a probability space (X,B,µ) and two sets A, B ∈ B, µ(A), µ(B) > 0, then, if we start iterating a typical point x ∈ A, the orbit of x does not necessarily visit B infinitely often.

Exercise 9.3
(i) Prove that f ↦ E(f | A) is linear.

(ii) Suppose that T is a measure-preserving transformation. Show that $E(f\mid\mathcal{A})\circ T = E(f\circ T\mid T^{-1}\mathcal{A})$.

(iii) Show that E(f | B) = f.

(iv) Let N denote the trivial σ-algebra consisting of all sets of measure 0 and 1. Show that a function f is N-measurable if and only if it is constant a.e. Show that $E(f\mid\mathcal{N}) = \int f\,d\mu$.

Exercise 9.4
Let (X,B,µ) be a probability space.

(i) Let α = {A1, …, An}, Aj ∈ B, be a finite partition of X. (By a partition we mean that $X = \bigcup_{j=1}^{n} A_j$ and $A_i \cap A_j = \emptyset$ if i ≠ j.) Let A denote the set of all finite unions of sets in α. Check that A is a σ-algebra.

(ii) Show that g : X → R is A-measurable if and only if g is constant on each Aj, i.e.
$$g(x) = \sum_{j=1}^{n} c_j\,\chi_{A_j}(x).$$

(iii) Let f ∈ L¹(X,B,µ) and suppose that µ(Aj) > 0 for each j. Show that
$$E(f\mid\mathcal{A})(x) = \sum_{j=1}^{n}\chi_{A_j}(x)\,\frac{\int_{A_j} f\,d\mu}{\mu(A_j)}.$$
Thus E(f | A) is the best approximation to f that is constant on sets in the partition α.

Exercise 9.5
Prove that I is a σ-algebra.

Exercise 9.6
Let T be a measure-preserving transformation of the probability space (X,B,µ) and let I denote the sub-σ-algebra of T-invariant sets. Let $P_I : L^2(X,\mathcal{B},\mu) \to L^2(X,\mathcal{I},\mu)$ denote the orthogonal projection onto the subspace of T-invariant functions. Prove that $P_If = E(f\mid\mathcal{I})$ for all f ∈ L²(X,B,µ).


10. Birkhoff’s Ergodic Theorem

§10.1 Birkhoff’s Ergodic Theorem

Birkhoff's Ergodic Theorem deals with the behaviour of $\frac{1}{n}\sum_{j=0}^{n-1}f(T^jx)$ for µ-a.e. x ∈ X, and for f ∈ L¹(X,B,µ).

Theorem 10.1.1 (Birkhoff’s Ergodic Theorem)
Let (X,B,µ) be a probability space and let T : X → X be a measure-preserving transformation. Let I denote the σ-algebra of T-invariant sets. Then for every f ∈ L¹(X,B,µ) we have
$$\lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) = E(f \mid \mathcal{I})(x)$$
for µ-a.e. x ∈ X.

Corollary 10.1.2 (Birkhoff’s Ergodic Theorem for an ergodic transformation)
Let (X,B,µ) be a probability space and let T : X → X be an ergodic measure-preserving transformation. Let f ∈ L¹(X,B,µ). Then
$$\lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) = \int f\,d\mu$$
for µ-a.e. x ∈ X.
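A quick way to see Corollary 10.1.2 in action is to follow a single orbit numerically. The Python sketch below assumes the circle rotation T(x) = x + α mod 1 with α = (√5 − 1)/2, for which Lebesgue measure is an ergodic invariant measure, and the function f(x) = x², whose integral is 1/3; these choices and the helper name `birkhoff_average` are illustrative only. Floating-point arithmetic makes this an experiment, not a proof.

```python
import math

# Illustrative irrational rotation number; T(x) = x + ALPHA mod 1 is ergodic
# with respect to Lebesgue measure.
ALPHA = (math.sqrt(5) - 1) / 2

def birkhoff_average(f, x, n, alpha=ALPHA):
    """Return (1/n) * sum_{j=0}^{n-1} f(T^j x) for the rotation by alpha."""
    total = 0.0
    for j in range(n):
        total += f((x + j * alpha) % 1.0)   # T^j x = x + j*alpha mod 1
    return total / n

f = lambda x: x * x            # its integral over [0, 1] is 1/3
for n in (10, 1_000, 100_000):
    print(n, birkhoff_average(f, 0.2, n))
```

The printed time averages drift towards the space average 1/3 as n grows.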

§10.2 Consequences of, and criteria for, ergodicity

Here we give some simple corollaries of Birkhoff’s Ergodic Theorem. The first result says that, for a typical orbit of an ergodic dynamical system, ‘time averages’ equal ‘space averages’.

Corollary 10.2.1
Let T be an ergodic measure-preserving transformation of the probability space (X,B,µ). Suppose that B ∈ B. Then for µ-a.e. x ∈ X, the frequency with which the orbit of x lies in B is given by µ(B), i.e.,
$$\lim_{n\to\infty}\frac{1}{n}\operatorname{card}\{j \in \{0,1,\ldots,n-1\} \mid T^j x \in B\} = \mu(B) \quad \mu\text{-a.e.}$$

Proof. Apply Birkhoff’s Ergodic Theorem with f = χ_B. ✷
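Corollary 10.2.1 can be illustrated on a shift space. The Python sketch below assumes the one-sided full shift on {0, 1} with the Bernoulli (0.7, 0.3) measure, which is ergodic, and takes B to be the cylinder [1, 1], so that µ(B) = 0.3 × 0.3 = 0.09. A µ-typical point is imitated by drawing its symbols independently with a seeded pseudo-random generator; that substitution, and the helper name `visit_frequency`, are assumptions of the sketch.

```python
import random

def visit_frequency(x, n):
    """(1/n) * card{0 <= j < n : T^j x in [1, 1]}, where T is the shift.
    T^j x lies in the cylinder [1, 1] exactly when x_j = x_{j+1} = 1."""
    return sum(1 for j in range(n) if x[j] == 1 and x[j + 1] == 1) / n

rng = random.Random(2018)
n = 200_000
# symbols drawn independently: 0 with probability 0.7, 1 with probability 0.3
x = rng.choices([0, 1], weights=[0.7, 0.3], k=n + 1)
print(visit_frequency(x, n))   # close to mu([1, 1]) = 0.09
```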

It is possible to characterise ergodicity in terms of the behaviour of iterated pre-images of sets, rather than iterates of points, under the dynamics. The next result deals with this.


Proposition 10.2.2
Let (X,B,µ) be a probability space and let T : X → X be a measure-preserving transformation. The following are equivalent:

(i) T is ergodic;

(ii) for all A, B ∈ B,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\mu(T^{-j}A \cap B) = \mu(A)\mu(B).$$

Proof. We prove that (i) implies (ii). Suppose that T is ergodic. Since χ_A ∈ L¹(X,B,µ), Birkhoff’s Ergodic Theorem tells us that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\chi_A(T^j x) = \mu(A)$$
for µ-a.e. x ∈ X. Multiplying both sides by χ_B(x) gives
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\chi_A(T^j x)\chi_B(x) = \mu(A)\chi_B(x)$$
for µ-a.e. x ∈ X. Since the left-hand side is bounded (by 1), we can apply the Dominated Convergence Theorem (Theorem 3.1.3) to see that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\mu(T^{-j}A\cap B) = \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\int \chi_A\circ T^j\,\chi_B\,d\mu = \lim_{n\to\infty}\int \frac{1}{n}\sum_{j=0}^{n-1}\chi_A\circ T^j\,\chi_B\,d\mu = \mu(A)\mu(B).$$

We prove that (ii) implies (i). Now suppose that the convergence holds. Suppose that T^{-1}B = B and take A = B. Then µ(T^{-j}A ∩ B) = µ(B) for every j ≥ 0, so
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\mu(B) = \mu(B)^2.$$
This gives µ(B) = µ(B)². Therefore µ(B) = 0 or 1, and so T is ergodic. ✷
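Condition (ii) can be checked numerically for an irrational rotation T(x) = x + α mod 1 (α = (√5 − 1)/2, an illustrative choice) when A and B are arcs of the circle. Since T^{-j}A is A translated by −jα, each term µ(T^{-j}A ∩ B) is the length of an intersection of two arcs, computed below up to floating-point error. The helper names and the particular arcs are assumptions of this sketch.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # illustrative irrational rotation number

def arc_overlap(s1, l1, s2, l2):
    """Lebesgue measure of [s1, s1 + l1) ∩ [s2, s2 + l2) taken mod 1 on the
    circle, assuming 0 <= s1, s2 < 1 and 0 <= l1, l2 <= 1.  The first arc is
    lifted to the real line and compared with integer translates of the second."""
    total = 0.0
    for m in (-2, -1, 0, 1, 2):
        lo = max(s1, s2 + m)
        hi = min(s1 + l1, s2 + l2 + m)
        total += max(0.0, hi - lo)
    return total

def cesaro_correlation(a, b, n, alpha=ALPHA):
    """(1/n) * sum_{j<n} mu(T^{-j}A ∩ B) for T(x) = x + alpha mod 1, where the
    arcs A and B are given as (start, length) pairs."""
    total = 0.0
    for j in range(n):
        start = (a[0] - j * alpha) % 1.0   # T^{-j}A = A - j*alpha mod 1
        total += arc_overlap(start, a[1], b[0], b[1])
    return total / n

A, B = (0.0, 0.3), (0.5, 0.4)
print(cesaro_correlation(A, B, 20_000), A[1] * B[1])   # both close to 0.12
```

For a rotation the individual terms µ(T^{-j}A ∩ B) keep oscillating as j grows (rotations are not mixing), so in this example only the averages in (ii) converge.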

§10.3 Kac’s Lemma

Poincaré’s Recurrence Theorem tells us that, under a measure-preserving transformation, almost every point of a subset A of positive measure will return to A. However, it does not tell us how long we have to wait for this to happen. One would expect return times to sets of large measure to be small, and return times to sets of small measure to be large. This is indeed the case, and forms the content of Kac’s Lemma.

Let T : X → X be a measure-preserving transformation of a probability space (X,B,µ) and let A ⊂ X be a measurable subset with µ(A) > 0. By Poincaré’s Recurrence Theorem, the integer
$$n_A(x) = \inf\{n \ge 1 \mid T^n(x) \in A\}$$
is defined (that is, finite) for a.e. x ∈ A.

Theorem 10.3.1 (Kac’s Lemma)
Let T be an ergodic measure-preserving transformation of the probability space (X,B,µ). Let A ∈ B be such that µ(A) > 0. Then
$$\int_A n_A\,d\mu = 1.$$

Proof. Let
$$A_n = A \cap T^{-1}A^c \cap \cdots \cap T^{-(n-1)}A^c \cap T^{-n}A.$$
Then A_n consists of those points in A that return to A for the first time after exactly n iterations of T, i.e. A_n = {x ∈ A | n_A(x) = n}.

Consider the illustration in Figure 10.3.1.

[Figure 10.3.1: The return times to A (the sets A_1, A_2, A_3, …, A_n ⊂ A, each with a column of sets above it, mapped upwards by T)]

As T is ergodic, almost every point of X eventually enters A. Hence the diagram represents almost all of X. Note that the column above A_n in the diagram consists of n sets, A_{n,0}, …, A_{n,n−1} say, with A_{n,0} = A_n. Note that T^{−k}A_{n,k} = A_n. As T is measure-preserving, it follows that µ(A_{n,k}) = µ(A_n) for k = 0, …, n − 1. Hence

$$1 = \mu(X) = \sum_{n=1}^{\infty}\sum_{k=0}^{n-1}\mu(A_{n,k}) = \sum_{n=1}^{\infty} n\,\mu(A_n) = \sum_{n=1}^{\infty}\int_{A_n} n_A\,d\mu = \int_A n_A\,d\mu.$$
✷

Remark. Let A be as in the statement of Kac’s Lemma (Theorem 10.3.1). Define a probability measure µ_A on A by µ_A(E) = µ(E)/µ(A) for measurable E ⊂ A, so that µ_A(A) = 1. Then Kac’s Lemma says that
$$\int_A n_A\,d\mu_A = \frac{1}{\mu(A)},$$
i.e. the expected return time of a point in A to the set A is 1/µ(A).
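To see the Remark numerically, the Python sketch below assumes the ergodic rotation T(x) = x + α mod 1 with α = (√5 − 1)/2 and the arc A = [0, 0.1), so Kac’s Lemma predicts an average return time of 1/µ(A) = 10. The integral of n_A against µ_A is approximated by averaging n_A over a midpoint grid in A; the rotation, the set A, the grid size and the helper names are illustrative assumptions.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2   # illustrative irrational rotation number

def first_return_time(x, a_len, alpha=ALPHA, max_steps=10_000):
    """n_A(x) = inf{n >= 1 : T^n x in A} for T(x) = x + alpha mod 1, A = [0, a_len)."""
    y = x
    for n in range(1, max_steps + 1):
        y = (y + alpha) % 1.0
        if y < a_len:
            return n
    raise RuntimeError("no return to A within max_steps")

def mean_return_time(a_len, grid_size=50_000):
    """Approximate the integral of n_A with respect to mu_A = mu / mu(A)
    by averaging n_A over a midpoint grid in A = [0, a_len)."""
    pts = [(i + 0.5) * a_len / grid_size for i in range(grid_size)]
    return sum(first_return_time(x, a_len) for x in pts) / grid_size

print(mean_return_time(0.1))   # Kac's Lemma predicts 1 / mu(A) = 10
```

Individual return times vary from point to point (for this example they take only a few distinct values); it is their average over A that Kac’s Lemma pins down.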

§10.4 Ehrenfests’ example

The following example, due to P. and T. Ehrenfest, demonstrates that the return times in Poincaré’s Recurrence Theorem may be extremely large.

Consider two urns. One urn contains 100 balls, numbered 1 to 100, and the other urn is empty. We also have a random number generator: this could be a bag containing 100 slips of paper, numbered 1 to 100.

Each second, a slip of paper is drawn from the bag, the number is noted, and the slip of paper is returned to the bag. The ball bearing that number is then moved from whichever urn it is currently in to the other urn.

Naively, we would expect that the system will settle into an equilibrium state in which there are 50 balls in each urn. Of course, there will continue to be small random fluctuations about the 50-50 distribution. However, it would appear highly unlikely for the system to return to the state in which all 100 balls are in the first urn. Nevertheless, Poincaré’s Recurrence Theorem tells us that this situation will occur almost surely, and Kac’s Lemma tells us how long we should expect to wait.

To see this, we represent the system as a shift on 101 symbols with an appropriate Markov measure. Regard x_j ∈ {0, …, 100} as the number of balls in the first urn after j seconds. Hence a sequence $(x_j)_{j=0}^{\infty}$ records the number of balls in the first urn at each time. Let $\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in \{0,1,\ldots,100\}\}$.

Let p(i) denote the probability of there being i balls in the first urn. This is equal to the number of possible ways of choosing i balls from 100, divided by the total number of ways of distributing 100 balls across the 2 urns. There are $\binom{100}{i}$ ways of choosing i balls from 100 balls. As there are 2 possible urns for each ball to be in, there are $2^{100}$ possible arrangements of all the balls. Hence the probability of there being i balls in the first urn is
$$p(i) = \frac{1}{2^{100}}\binom{100}{i}.$$

If we have i balls in the first urn then at the next stage we must have either i − 1 or i + 1 balls in the first urn. The number of balls becomes i − 1 if the random number chosen is equal to the number of one of the balls in the first urn. As there are currently i such balls, the probability of this happening is i/100. Hence the probability P(i, i − 1) that there are i − 1 balls remaining, given that we started with i balls in the first urn, is i/100. Similarly, the probability P(i, i + 1) that there are i + 1 balls in the first urn, given that we started with i balls, is (100 − i)/100. If j ≠ i − 1, i + 1 then we cannot have j balls in the first urn given that we started with i balls; thus P(i, j) = 0. This defines a stochastic matrix:
$$P = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & \cdots \\
\frac{1}{100} & 0 & \frac{99}{100} & 0 & 0 & \cdots \\
0 & \frac{2}{100} & 0 & \frac{98}{100} & 0 & \cdots \\
0 & 0 & \frac{3}{100} & 0 & \frac{97}{100} & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}$$


It is straightforward to check that pP = p. Hence we have a Markov probability measure µ_P defined on Σ. The matrix P is irreducible (but is not aperiodic); this ensures that µ_P is ergodic.

Consider the cylinder A = [100] of length 1. This represents there being 100 balls in the first urn. By Poincaré’s Recurrence Theorem, if we start in A then (for µ_P-a.e. starting sequence) we return to A infinitely often. Thus, with probability 1, we will return to the situation where all 100 balls have returned to the first urn, and this will happen infinitely often! We can use Kac’s Lemma to calculate the expected amount of time we will have to wait until all the balls first return to the first urn. Since µ_P(A) = p(100) = 1/2^{100}, Kac’s Lemma tells us that the expected first return time to A is
$$\frac{1}{\mu_P(A)} = 2^{100} \text{ seconds},$$
which is about 4 × 10^{22} years, or about 3 × 10^{12} times the length of time that the Universe has so far existed!

(This system, with 4 balls rather than 100, was also studied in Exercise 3.11.)
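The claims pP = p and 1/µ_P(A) = 2^{100} can be checked exactly with rational arithmetic, and for a small number of balls the expected return time can also be estimated by simulating the urns. The Python sketch below does both for N balls; the helper names, the seconds-per-year convention (365.25 days) and the simulation parameters are assumptions of the sketch, and the simulated figure is only an estimate.

```python
import random
from fractions import Fraction
from math import comb

def ehrenfest_matrix(N):
    """Transition matrix P(i, j) for the number of balls in the first urn."""
    P = [[Fraction(0)] * (N + 1) for _ in range(N + 1)]
    for i in range(N + 1):
        if i > 0:
            P[i][i - 1] = Fraction(i, N)        # the drawn ball is in the first urn
        if i < N:
            P[i][i + 1] = Fraction(N - i, N)    # the drawn ball is in the second urn
    return P

def binomial_vector(N):
    """p(i) = C(N, i) / 2^N."""
    return [Fraction(comb(N, i), 2 ** N) for i in range(N + 1)]

def is_stationary(p, P):
    """Exact check that pP = p."""
    n = len(p)
    return all(sum(p[i] * P[i][j] for i in range(n)) == p[j] for j in range(n))

def simulated_mean_return_time(N, returns, seed=0):
    """Average time between visits to 'all N balls in the first urn'."""
    rng = random.Random(seed)
    state, steps, total, count = N, 0, 0, 0
    while count < returns:
        # the drawn ball lies in the first urn with probability state / N
        state += -1 if rng.randrange(N) < state else 1
        steps += 1
        if state == N:
            total, count, steps = total + steps, count + 1, 0
    return total / count

N = 100
p = binomial_vector(N)
print(is_stationary(p, ehrenfest_matrix(N)))   # True
print(1 / p[N] == 2 ** N)                       # True: Kac gives 2^100 seconds
SECONDS_PER_YEAR = 365.25 * 24 * 3600
print(f"{2 ** N / SECONDS_PER_YEAR:.2e} years")  # 4.02e+22 years
print(simulated_mean_return_time(4, 20_000))     # close to 2^4 = 16
```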

§10.5 Proof of Birkhoff’s Ergodic Theorem

None of this section is examinable; it is included for people who like hard-core ε-δ analysis! The proof is based on the following inequality.

Theorem 10.5.1 (Maximal Inequality)
Let (X,B,µ) be a probability space, let T : X → X be a measure-preserving transformation and let f ∈ L¹(X,B,µ). Define f_0 = 0 and, for n ≥ 1,
$$f_n = f + f\circ T + \cdots + f\circ T^{n-1}.$$
For n ≥ 1, set $F_n(x) = \max_{0\le j\le n} f_j(x)$, so that F_n(x) ≥ 0. Then
$$\int_{\{x\in X \mid F_n(x) > 0\}} f\,d\mu \ge 0.$$

Proof. Clearly F_n ∈ L¹(X,B,µ). For 0 ≤ j ≤ n we have F_n ≥ f_j, so F_n ∘ T ≥ f_j ∘ T. Hence
$$F_n\circ T + f \ge f_j\circ T + f = f_{j+1}$$
and therefore
$$F_n\circ T(x) + f(x) \ge \max_{1\le j\le n} f_j(x).$$
If F_n(x) > 0 then
$$\max_{1\le j\le n} f_j(x) = \max_{0\le j\le n} f_j(x) = F_n(x),$$
so we obtain that f ≥ F_n − F_n ∘ T on the set A = {x | F_n(x) > 0}.

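Hence
$$\begin{aligned}
\int_A f\,d\mu &\ge \int_A F_n\,d\mu - \int_A F_n\circ T\,d\mu \\
&= \int_X F_n\,d\mu - \int_A F_n\circ T\,d\mu && \text{as } F_n = 0 \text{ on } X\setminus A \\
&\ge \int_X F_n\,d\mu - \int_X F_n\circ T\,d\mu && \text{as } F_n\circ T \ge 0 \\
&= 0 && \text{as } \mu \text{ is } T\text{-invariant.}
\end{aligned}$$
✷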
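The Maximal Inequality can be tested on a finite model. The Python sketch below assumes X = {0, …, m − 1} with the uniform probability measure and T a permutation of X (any permutation preserves that measure), with integer-valued f; because µ is uniform, the sign of the integral over {F_n > 0} is the sign of a plain sum. The random trial sizes and the helper name are illustrative choices.

```python
import random

def maximal_inequality_sum(perm, f, n):
    """Sum of f(x) over {x : F_n(x) > 0}, where F_n(x) = max_{0<=j<=n} f_j(x),
    f_j = f + f∘T + ... + f∘T^{j-1} and T is the permutation `perm`."""
    total = 0
    for x in range(len(perm)):
        s, best, y = 0, 0, x          # best starts at f_0(x) = 0
        for _ in range(n):
            s += f[y]                 # now s = f_j(x) for the next j
            best = max(best, s)
            y = perm[y]               # y = T^j x
        if best > 0:                  # x lies in {F_n > 0}
            total += f[x]
    return total

rng = random.Random(0)
for _ in range(200):
    m = rng.randint(1, 30)
    perm = list(range(m))
    rng.shuffle(perm)
    f = [rng.randint(-10, 10) for _ in range(m)]
    n = rng.randint(1, 40)
    assert maximal_inequality_sum(perm, f, n) >= 0
print("the Maximal Inequality held in every trial")
```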

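Corollary 10.5.2
Let g ∈ L¹(X,B,µ) and let
$$M_\alpha = \left\{x \in X \;\middle|\; \sup_{n\ge 1}\frac{1}{n}\sum_{j=0}^{n-1} g(T^j x) > \alpha\right\}.$$
Then for all B ∈ B with T^{-1}B = B we have that
$$\int_{M_\alpha\cap B} g\,d\mu \ge \alpha\,\mu(M_\alpha \cap B).$$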

Proof. Suppose first that B = X. Let f = g − α; then
$$M_\alpha = \bigcup_{n=1}^{\infty}\left\{x \;\middle|\; \sum_{j=0}^{n-1} g(T^j x) > n\alpha\right\} = \bigcup_{n=1}^{\infty}\{x \mid f_n(x) > 0\} = \bigcup_{n=1}^{\infty}\{x \mid F_n(x) > 0\}$$
(since f_n(x) > 0 ⇒ F_n(x) > 0, and F_n(x) > 0 ⇒ f_j(x) > 0 for some 1 ≤ j ≤ n). Write C_n = {x | F_n(x) > 0} and observe that C_n ⊂ C_{n+1}. Thus χ_{C_n} converges to χ_{M_α}, and so fχ_{C_n} converges to fχ_{M_α}, as n → ∞. Furthermore, |fχ_{C_n}| ≤ |f|. Hence, by the Dominated Convergence Theorem,
$$\int_{C_n} f\,d\mu = \int_X f\chi_{C_n}\,d\mu \to \int_X f\chi_{M_\alpha}\,d\mu = \int_{M_\alpha} f\,d\mu \quad \text{as } n\to\infty.$$

Applying the Maximal Inequality, we have $\int_{C_n} f\,d\mu \ge 0$ for all n ≥ 1. Therefore $\int_{M_\alpha} f\,d\mu \ge 0$, i.e.,
$$\int_{M_\alpha} g\,d\mu \ge \alpha\,\mu(M_\alpha).$$
For the general case, we work with the restriction of T to B, T|_B : B → B, and apply the Maximal Inequality on this subset to get
$$\int_{M_\alpha\cap B} g\,d\mu \ge \alpha\,\mu(M_\alpha\cap B),$$
as required. ✷

We will also need the following convergence result.

Proposition 10.5.3 (Fatou’s Lemma)
Let (X,B,µ) be a probability space and suppose that f_n : X → ℝ are non-negative measurable functions. Define $f(x) = \liminf_{n\to\infty} f_n(x)$. Then f is measurable and
$$\int f\,d\mu \le \liminf_{n\to\infty}\int f_n\,d\mu$$
(one or both of these expressions may be infinite).


Proof of Birkhoff’s Ergodic Theorem. Let
$$f^*(x) = \limsup_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x), \qquad f_*(x) = \liminf_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x).$$
These exist at all points x ∈ X (but may take the values ±∞). Clearly f_*(x) ≤ f^*(x). Let
$$a_n(x) = \frac{1}{n}\sum_{j=0}^{n-1} f(T^j x).$$
Observe that
$$\frac{n+1}{n}\,a_{n+1}(x) = a_n(Tx) + \frac{1}{n}f(x).$$
As f is finite µ-a.e., we have that f(x)/n → 0 µ-a.e. as n → ∞. Hence, taking the lim sup and lim inf as n → ∞ gives us that f^* ∘ T = f^* µ-a.e. and f_* ∘ T = f_* µ-a.e.

We have to show that

(i) f^* = f_* µ-a.e.;

(ii) f^* ∈ L¹(X,B,µ);

(iii) ∫ f^* dµ = ∫ f dµ.

We prove (i). For α, β ∈ ℝ, define
$$E_{\alpha,\beta} = \{x \in X \mid f_*(x) < \beta \text{ and } f^*(x) > \alpha\}.$$
Note that
$$\{x \in X \mid f_*(x) < f^*(x)\} = \bigcup_{\beta<\alpha,\ \alpha,\beta\in\mathbb{Q}} E_{\alpha,\beta}$$
(a countable union). Thus, to show that f^* = f_* µ-a.e., it suffices to show that µ(E_{α,β}) = 0 whenever β < α. Since f_* ∘ T = f_* and f^* ∘ T = f^*, we see that T^{-1}E_{α,β} = E_{α,β}. If we write
$$M_\alpha = \left\{x \in X \;\middle|\; \sup_{n\ge1}\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) > \alpha\right\}$$
then E_{α,β} ∩ M_α = E_{α,β}. Applying Corollary 10.5.2 we have that

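$$\int_{E_{\alpha,\beta}} f\,d\mu = \int_{E_{\alpha,\beta}\cap M_\alpha} f\,d\mu \ge \alpha\,\mu(E_{\alpha,\beta}\cap M_\alpha) = \alpha\,\mu(E_{\alpha,\beta}).$$
Replacing f, α and β by −f, −β and −α and using the facts that (−f)^* = −f_* and (−f)_* = −f^*, we also get
$$\int_{E_{\alpha,\beta}} f\,d\mu \le \beta\,\mu(E_{\alpha,\beta}).$$
Therefore
$$\alpha\,\mu(E_{\alpha,\beta}) \le \beta\,\mu(E_{\alpha,\beta}),$$
and since β < α this shows that µ(E_{α,β}) = 0. Thus f^* = f_* µ-a.e. and
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) = f^*(x) \quad \mu\text{-a.e.}$$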

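We prove (ii). Let
$$g_n(x) = \left|\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x)\right|.$$
Then g_n ≥ 0 and, by the triangle inequality and the T-invariance of µ,
$$\int g_n\,d\mu \le \int |f|\,d\mu,$$
so we can apply Fatou’s Lemma (Proposition 10.5.3) to conclude that lim_{n→∞} g_n = |f^*| (which exists µ-a.e. by (i)) is integrable, i.e., that f^* ∈ L¹(X,B,µ).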

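We prove (iii). For n ∈ ℕ and k ∈ ℤ, define
$$D^n_k = \left\{x \in X \;\middle|\; \frac{k}{n} \le f^*(x) < \frac{k+1}{n}\right\}.$$
For every ε > 0, we have that $D^n_k \cap M_{k/n-\varepsilon} = D^n_k$. Since $T^{-1}D^n_k = D^n_k$, we can apply Corollary 10.5.2 again to obtain
$$\int_{D^n_k} f\,d\mu \ge \left(\frac{k}{n} - \varepsilon\right)\mu(D^n_k).$$
Since ε > 0 is arbitrary, we have
$$\int_{D^n_k} f\,d\mu \ge \frac{k}{n}\,\mu(D^n_k).$$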

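Thus
$$\int_{D^n_k} f^*\,d\mu \le \frac{k+1}{n}\,\mu(D^n_k) \le \frac{1}{n}\,\mu(D^n_k) + \int_{D^n_k} f\,d\mu$$
(where the first inequality follows from the definition of $D^n_k$). Since
$$X = \bigcup_{k\in\mathbb{Z}} D^n_k$$
(a disjoint union, up to the µ-null set on which f^* is infinite), summing over k ∈ ℤ gives
$$\int_X f^*\,d\mu \le \frac{1}{n}\,\mu(X) + \int_X f\,d\mu = \frac{1}{n} + \int_X f\,d\mu.$$
Since this holds for all n ≥ 1, we obtain
$$\int_X f^*\,d\mu \le \int_X f\,d\mu.$$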


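Applying the same argument to −f gives
$$\int (-f)^*\,d\mu \le \int -f\,d\mu,$$
so that, since (−f)^* = −f_* and f_* = f^* µ-a.e.,
$$\int f^*\,d\mu = \int f_*\,d\mu \ge \int f\,d\mu.$$
Therefore
$$\int f^*\,d\mu = \int f\,d\mu,$$
as required.

Finally, we prove that f^* = E(f | I). First note that as f^* is T-invariant, it is measurable with respect to I. Moreover, if I ∈ I is any T-invariant set then χ_I ∘ T^j = χ_I for all j, so (fχ_I)^* = f^*χ_I; applying (iii) to fχ_I ∈ L¹(X,B,µ) gives
$$\int_I f\,d\mu = \int_I f^*\,d\mu.$$
Hence f^* = E(f | I). ✷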

§10.6 Exercises

Exercise 10.1
Suppose that T is an ergodic measure-preserving transformation of the probability space (X,B,µ) and suppose that f ∈ L¹(X,B,µ). Prove that
$$\lim_{n\to\infty}\frac{f(T^n x)}{n} = 0 \quad \mu\text{-a.e.}$$

Exercise 10.2
Deduce from Birkhoff’s Ergodic Theorem that if T is an ergodic measure-preserving transformation of a probability space (X,B,µ) and f ≥ 0 is measurable but ∫ f dµ = ∞, then
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) = \infty \quad \mu\text{-a.e.}$$
(Hint: define f_M = min{f, M} and note that f_M ∈ L¹(X,B,µ). Apply Birkhoff’s Ergodic Theorem to each f_M.)

Exercise 10.3
Let T be a measure-preserving transformation of the probability space (X,B,µ). Prove that the following are equivalent:

(i) T is ergodic with respect to µ;

(ii) for all f, g ∈ L²(X,B,µ) we have that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\int f(T^j x)g(x)\,d\mu = \int f\,d\mu\int g\,d\mu.$$


Exercise 10.4
Let X be a compact metric space equipped with the Borel σ-algebra B and let T : X → X be continuous. Suppose that µ ∈ M(X) is an ergodic measure.

Prove that there exists a set Y ∈ B with µ(Y) = 1 such that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) = \int f\,d\mu$$
for all x ∈ Y and for all f ∈ C(X, ℝ).

(Thus, in the special case of a continuous transformation of a compact metric space and continuous functions f, the set of full measure for which Corollary 10.1.2 holds can be chosen to be independent of the function f.)

Exercise 10.5
A popular illustration of recurrence concerns a monkey typing the complete works of Shakespeare on a typewriter. Here we study this from an ergodic-theoretic viewpoint.

Imagine a(n idealised) monkey typing on a typewriter. Each second he types one letter, and each letter occurs with equal probability (independently of the preceding letters). Suppose that the keyboard has 26 keys (so no space bar, carriage return, numbers, etc.). Show how to model this using a shift space on 26 symbols with an appropriate Bernoulli measure. Use Birkhoff’s Ergodic Theorem to show that the monkey must, with probability 1, eventually type the word ‘MONKEY’. Use Kac’s Lemma to calculate the expected time it would take for the monkey to first type ‘MONKEY’.


11. Applications of Birkhoff’s Ergodic Theorem

§11.1 Introduction

We will show how to use Birkhoff’s Ergodic Theorem to prove some interesting results in number theory.

§11.2 Normal and simply normal numbers

Recall that any number x ∈ [0,1] can be written as a decimal
$$x = \cdot x_0x_1x_2\ldots = \sum_{j=0}^{\infty}\frac{x_j}{10^{j+1}}$$
where x_j ∈ {0, 1, …, 9}. This decimal expansion is unique unless it ends in either infinitely repeated 0s or infinitely repeated 9s.

More generally, given any integer base b ≥ 2, we can write x ∈ [0,1] as a base b expansion:
$$x = \cdot x_0x_1x_2\ldots = \sum_{j=0}^{\infty}\frac{x_j}{b^{j+1}}$$
where x_j ∈ {0, 1, …, b − 1}. This expansion is unique unless it ends in either infinitely repeated 0s or infinitely repeated (b − 1)s. In what follows, if x has two expansions in base b then we always choose the expansion that ends in infinitely repeated 0s; note that the set of such x is countable and so has Lebesgue measure zero.

Definition. A number x ∈ [0,1] is said to be simply normal in base b if, for each k = 0, 1, …, b − 1, the frequency with which digit k occurs in the base b expansion of x is equal to 1/b, i.e. $\lim_{n\to\infty}\frac{1}{n}\operatorname{card}\{0\le j\le n-1 \mid x_j = k\} = 1/b$.

Remarks.

1. Thus a number is simply normal in base b if all of the b possible digits in its base bexpansion are equally likely to occur.

2. It is straightforward to construct examples of simply normal numbers in a given base. For example,
$$x = \cdot 0123456789\,0123456789\cdots \tag{11.2.1}$$
consisting of the block of decimal digits 0123456789 infinitely repeated, is simply normal in base 10. If a number is simply normal in one base then it need not be simply normal in any other base. For example, x as defined in (11.2.1) is not simply normal in base 10^{10}.



Fix b ≥ 2. Define the map T : [0,1] → [0,1] by T(x) = T_b(x) = bx mod 1. It is easy to see, by following any of the arguments we have seen for the doubling map, that Lebesgue measure µ on [0,1] is an ergodic invariant measure for T.

There is a close connection between the map T_b and base b expansions. Note that if x ∈ [0,1] has base b expansion
$$x = \sum_{j=0}^{\infty}\frac{x_j}{b^{j+1}} = \cdot x_0x_1x_2\cdots$$
then
$$T_b(x) = b\sum_{j=0}^{\infty}\frac{x_j}{b^{j+1}} \bmod 1 = x_0 + \sum_{j=1}^{\infty}\frac{x_j}{b^{j}} \bmod 1 = \sum_{j=0}^{\infty}\frac{x_{j+1}}{b^{j+1}} = \cdot x_1x_2x_3\cdots.$$
Thus T_b acts on base b expansions by deleting the zeroth digit and then shifting the remaining digits one place to the left. This relationship between base b expansions and the map T_b can be used to prove the following result.
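The shift property can be seen directly with exact rational arithmetic. In the Python sketch below the digits are read off as x_j = [b·T_b^j(x)], and applying T_b once deletes the zeroth digit. Using Fractions, rather than floating-point numbers (which would lose all digits after a few dozen iterations), is a deliberate choice; the helper names are illustrative.

```python
from fractions import Fraction
from math import floor

def T(b, x):
    """The map T_b(x) = b x mod 1, computed exactly."""
    y = b * x
    return y - floor(y)

def digits(b, x, n):
    """First n base b digits of x in [0, 1): x_j = floor(b * T_b^j(x))."""
    out = []
    for _ in range(n):
        out.append(floor(b * x))
        x = T(b, x)
    return out

x = Fraction(1, 7)
print(digits(10, x, 6))          # [1, 4, 2, 8, 5, 7]
print(digits(10, T(10, x), 5))   # [4, 2, 8, 5, 7]: the zeroth digit is deleted
```

For a rational such as 3/8 the procedure produces the expansion ending in repeated 0s, matching the convention adopted above.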

Proposition 11.2.1
Let b ≥ 2. Then Lebesgue almost every real number in [0,1] is simply normal in base b.

Proof. Fix k ∈ {0, 1, …, b − 1}. Note that x_0 = k if and only if x ∈ [k/b, (k+1)/b). Hence x_j = k if and only if T_b^j(x) ∈ [k/b, (k+1)/b). Thus
$$\frac{1}{n}\operatorname{card}\{0\le j\le n-1 \mid x_j = k\} = \frac{1}{n}\sum_{j=0}^{n-1}\chi_{[k/b,(k+1)/b)}(T_b^j x). \tag{11.2.2}$$
By Birkhoff’s Ergodic Theorem, for Lebesgue almost every point x the above expression converges to $\int \chi_{[k/b,(k+1)/b)}(x)\,dx = 1/b$. Let µ denote Lebesgue measure and let X_b(k) denote the set of points x ∈ [0,1] for which (11.2.2) converges to 1/b. Then µ(X_b(k)) = 1 for each k = 0, 1, …, b − 1.

Let $X_b = \bigcap_{k=0}^{b-1} X_b(k)$. Then µ(X_b) = 1. If x ∈ X_b then the frequency with which each digit k ∈ {0, 1, …, b − 1} occurs in the base b expansion of x is equal to 1/b, i.e. x is simply normal in base b. ✷

We can consider a more general notion of normality of numbers as follows. Take x ∈ [0,1] and write x as a base b expansion
$$x = \cdot x_0x_1x_2\ldots = \sum_{j=0}^{\infty}\frac{x_j}{b^{j+1}}$$
where x_j ∈ {0, 1, …, b − 1}. Fix a finite word of symbols i_0, i_1, …, i_{k−1}, where i_j ∈ {0, 1, …, b − 1} for j = 0, …, k − 1. We call such a block of symbols a word of length k, and we can ask with what frequency it occurs in the base b expansion of x. Note that x has a base b expansion that starts i_0i_1⋯i_{k−1} precisely when
$$x \in \left[\sum_{j=0}^{k-1}\frac{i_j}{b^{j+1}},\ \sum_{j=0}^{k-1}\frac{i_j}{b^{j+1}} + \frac{1}{b^k}\right).$$
Call this interval I(i_0, …, i_{k−1}) and note that it has Lebesgue measure 1/b^k.
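The interval I(i_0, …, i_{k−1}) is easy to compute exactly. The Python sketch below returns its endpoints as Fractions and checks membership by reading off digits; the half-open convention matches the choice of expansions ending in repeated 0s. Helper names and the example word are illustrative.

```python
from fractions import Fraction
from math import floor

def cylinder_interval(b, word):
    """Endpoints (s, s + 1/b^k) of I(i_0, ..., i_{k-1}) = [s, s + 1/b^k),
    where s = sum_j i_j / b^(j+1)."""
    k = len(word)
    left = sum(Fraction(i, b ** (j + 1)) for j, i in enumerate(word))
    return left, left + Fraction(1, b ** k)

def starts_with(b, x, word):
    """True if the base b expansion of x in [0, 1) starts with `word`."""
    for i in word:
        x *= b
        d = floor(x)        # next digit
        if d != i:
            return False
        x -= d
    return True

lo, hi = cylinder_interval(10, [3, 1, 4])
print(lo, hi, hi - lo)                                     # 157/500 63/200 1/1000
print(starts_with(10, Fraction(3141, 10000), [3, 1, 4]))   # True
print(starts_with(10, hi, [3, 1, 4]))                      # False: half-open interval
```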



Definition. A number x ∈ [0,1] is simply normal if it is simply normal in base b for all b ≥ 2.

Proposition 11.2.2
Lebesgue almost every real number x ∈ [0,1] is simply normal.

Proof. See Exercise 11.2(i). ✷

Definition. A number x ∈ [0,1] is said to be normal in base b if, for each k ≥ 1 and for each word i_0, i_1, …, i_{k−1} of length k, the frequency with which this word occurs in the base b expansion of x is equal to 1/b^k.

Proposition 11.2.3
Let b ≥ 2 be an integer. Then Lebesgue almost every real number in [0,1] is normal in base b.

Proof. Fix a word i_0, i_1, …, i_{k−1} of length k and define the interval I(i_0, …, i_{k−1}) as above. Then the word i_0i_1⋯i_{k−1} occurs at the jth place in the base b expansion of x (that is, x_j = i_0, …, x_{j+k−1} = i_{k−1}) if and only if T_b^j(x) ∈ I(i_0, …, i_{k−1}). Thus
$$\frac{1}{n}\operatorname{card}\{0\le j\le n-1 \mid i_0i_1\cdots i_{k-1} \text{ occurs at the } j\text{th place in the base } b \text{ expansion of } x\} = \frac{1}{n}\sum_{j=0}^{n-1}\chi_{I(i_0,\ldots,i_{k-1})}(T_b^j x). \tag{11.2.3}$$
By Birkhoff’s Ergodic Theorem, for Lebesgue almost every point x the above expression converges to $\int \chi_{I(i_0,\ldots,i_{k-1})}(x)\,dx = 1/b^k$. Let µ denote Lebesgue measure and let X_b(i_0, i_1, …, i_{k−1}) denote the set of points x ∈ [0,1] for which (11.2.3) converges to 1/b^k. Then µ(X_b(i_0, i_1, …, i_{k−1})) = 1 for each word i_0, i_1, …, i_{k−1} of length k.

Let
$$X_b = \bigcap_{k=1}^{\infty}\ \bigcap_{i_0,i_1,\ldots,i_{k-1}} X_b(i_0,i_1,\ldots,i_{k-1})$$
where the second intersection is taken over all words of length k. As this is a countable intersection of sets of full measure, we have that µ(X_b) = 1. If x ∈ X_b then, for every k ≥ 1, the frequency with which any word of length k occurs in the base b expansion of x is equal to 1/b^k, i.e. x is normal in base b. ✷
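Choosing base b digits independently and uniformly at random amounts to choosing x according to Lebesgue measure, so a long pseudo-random digit string can stand in for the expansion of a Lebesgue-typical x. The Python sketch below, with an illustrative base b = 3, length and seed, compares the observed frequency of every word of length k ≤ 3 with 1/b^k; treating a seeded pseudo-random string as "typical" is an assumption of the sketch, not part of the proof.

```python
import itertools
import random
from collections import Counter

def word_frequencies(digits, k):
    """Frequency of each length-k word among positions 0, ..., len(digits) - k."""
    n = len(digits) - k + 1
    counts = Counter(tuple(digits[j:j + k]) for j in range(n))
    return {w: c / n for w, c in counts.items()}

def max_deviation(digits, b, k):
    """Largest |frequency - 1/b^k| over all b^k words of length k."""
    freqs = word_frequencies(digits, k)
    return max(abs(freqs.get(w, 0.0) - b ** -k)
               for w in itertools.product(range(b), repeat=k))

b, n = 3, 300_000
rng = random.Random(11)
digits = [rng.randrange(b) for _ in range(n)]   # stands in for a typical x
for k in (1, 2, 3):
    print(k, round(max_deviation(digits, b, k), 5))   # small for each k
```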

We can then make the following definition.

Definition. A number x ∈ [0, 1] is normal if it is normal in base b for every base b ≥ 2.

We can now prove the following result:

Proposition 11.2.4Lebesgue almost every number x ∈ [0, 1] is normal.

Proof. See Exercise 11.2(ii). ✷

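Remark. Although a ‘typical’ number is normal, it is remarkably hard to exhibit one: the explicit constructions that exist are rather artificial, and it is not known whether familiar constants such as √2, e or π are normal (or even simply normal in base 10)!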



§11.3 Continued fractions

We can use Birkhoff’s Ergodic Theorem to study the frequency with which a given digit occurs in the continued fraction expansion of real numbers.

Proposition 11.3.1
For Lebesgue-almost every x ∈ [0,1], the frequency with which the natural number k occurs in the continued fraction expansion of x is
$$\frac{1}{\log 2}\log\left(\frac{(k+1)^2}{k(k+2)}\right).$$

Proof. Let λ denote Lebesgue measure and let µ denote Gauss’ measure. Recall that λ and µ are equivalent, i.e. they have the same sets of measure zero. Then λ-a.e. and µ-a.e. x ∈ [0,1] is irrational and has an infinite continued fraction expansion
$$x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}}. \tag{11.3.1}$$
Let T denote the continued fraction map. Then
$$T(x) = \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}}$$
so that
$$\frac{1}{T(x)} = x_1 + \cfrac{1}{x_2 + \cfrac{1}{x_3 + \cdots}}.$$

Hence x_1 = [1/T(x)], where [y] denotes the integer part of y. More generally, x_n = [1/T^n x].

Fix k ∈ ℕ. Note that x has a continued fraction expansion starting with digit k (i.e. x_0 = k) precisely when [1/x] = k. That is, x_0 = k precisely when
$$k \le \frac{1}{x} < k + 1,$$
which is equivalent to requiring
$$\frac{1}{k+1} < x \le \frac{1}{k},$$
i.e. x ∈ (1/(k+1), 1/k]. Similarly x_n = k precisely when T^n x ∈ (1/(k+1), 1/k]. Hence

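$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\operatorname{card}\{0\le j\le n-1 \mid x_j = k\} &= \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}\chi_{(1/(k+1),1/k]}(T^j x) \\
&= \int \chi_{(1/(k+1),1/k]}\,d\mu && \text{for } \mu\text{-a.e. } x \\
&= \frac{1}{\log 2}\left[\log\left(1 + \frac{1}{k}\right) - \log\left(1 + \frac{1}{k+1}\right)\right] && \text{for } \mu\text{-a.e. } x \\
&= \frac{1}{\log 2}\log\frac{(k+1)^2}{k(k+2)} && \text{for } \mu\text{-a.e. } x.
\end{aligned}$$
As µ and λ have the same sets of measure zero, this holds for Lebesgue almost every point. ✷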
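The frequencies in Proposition 11.3.1 can be compared with the continued fraction digits of a concrete number. Only rationals can be handled exactly, so the Python sketch below uses x = p/2^N with a large random odd p as a stand-in for a typical point; that substitution, the seed and the size N are assumptions of the sketch, and the digits are computed with integer arithmetic via x_0 = [1/x] and T(x) = 1/x − x_0.

```python
import math
import random
from fractions import Fraction

def gauss_kuzmin(k):
    """Predicted frequency (1/log 2) * log((k+1)^2 / (k(k+2))) of the digit k."""
    return math.log((k + 1) ** 2 / (k * (k + 2))) / math.log(2)

def cf_digits(x):
    """Continued fraction digits [x_0, x_1, ...] of a rational x in (0, 1)."""
    p, q = x.numerator, x.denominator
    out = []
    while p != 0:
        a, r = divmod(q, p)   # 1/x = q/p = a + r/p, so x_0 = a
        out.append(a)
        p, q = r, p           # T(x) = r/p
    return out

rng = random.Random(7)
N = 8000
x = Fraction(rng.getrandbits(N) | 1, 2 ** N)   # stand-in for a "typical" x
ds = cf_digits(x)
for k in (1, 2, 3, 10):
    print(k, round(ds.count(k) / len(ds), 4), round(gauss_kuzmin(k), 4))
```

The predicted frequencies sum to 1 because the sum over k telescopes.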

We can also study the limiting arithmetic and geometric means of the digits in the continued fraction expansion of Lebesgue almost every point x ∈ [0,1].

Proposition 11.3.2
(i) For Lebesgue-almost every x ∈ [0,1], the limiting arithmetic mean of the digits in the continued fraction expansion of x is infinite. More specifically, for Lebesgue almost every x ∈ [0,1] we have
$$\lim_{n\to\infty}\frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \infty$$
where x has continued fraction expansion given by (11.3.1).

(ii) For Lebesgue-almost every x ∈ [0,1], the limiting geometric mean of the digits in the continued fraction expansion of x is
$$\prod_{k=1}^{\infty}\left(1 + \frac{1}{k^2+2k}\right)^{\log k/\log 2}.$$
More specifically, for Lebesgue almost every x ∈ [0,1] we have
$$\lim_{n\to\infty}(x_0x_1\cdots x_{n-1})^{1/n} = \prod_{k=1}^{\infty}\left(1 + \frac{1}{k^2+2k}\right)^{\log k/\log 2},$$
where x has continued fraction expansion given by (11.3.1).

Proof. Writing
$$x = \cfrac{1}{x_0 + \cfrac{1}{x_1 + \cfrac{1}{x_2 + \cdots}}},$$
the proposition claims that
$$\lim_{n\to\infty}\frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \infty \tag{11.3.2}$$
for Lebesgue almost every point, and that
$$\lim_{n\to\infty}(x_0x_1\cdots x_{n-1})^{1/n} = \prod_{k=1}^{\infty}\left(1 + \frac{1}{k^2+2k}\right)^{\log k/\log 2} \tag{11.3.3}$$
for Lebesgue almost every point.

We leave (11.3.2) as an exercise (see Exercise 11.5).

We prove (11.3.3). Define f(x) = log k for x ∈ (1/(k+1), 1/k], so that f(x) = log k precisely when x_0 = k. Then f(T^j x) = log k precisely when x_j = k. By Exercise 3.5(iii), to show f ∈ L¹(X,B,µ) it is sufficient to show that f ∈ L¹(X,B,λ). Note that

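$$\int f\,d\lambda = \sum_{k=1}^{\infty}\log k\;\lambda\left(\left(\frac{1}{k+1}, \frac{1}{k}\right]\right) = \sum_{k=1}^{\infty}\frac{\log k}{k(k+1)} \le \sum_{k=1}^{\infty}\frac{\log k}{k^2},$$
which converges. Hence f ∈ L¹(X,B,µ). Now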

$$\begin{aligned}
\lim_{n\to\infty}\log\,(x_0x_1\cdots x_{n-1})^{1/n} &= \lim_{n\to\infty}\frac{1}{n}(\log x_0 + \log x_1 + \cdots + \log x_{n-1}) \\
&= \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} f(T^j x) \\
&= \frac{1}{\log 2}\int_0^1 \frac{f(x)}{1+x}\,dx \\
&= \frac{1}{\log 2}\sum_{k=1}^{\infty}\int_{1/(k+1)}^{1/k}\frac{\log k}{1+x}\,dx \\
&= \sum_{k=1}^{\infty}\frac{\log k}{\log 2}\log\left(1 + \frac{1}{k^2+2k}\right)
\end{aligned}$$
as n → ∞, for Gauss-almost every point x ∈ [0,1]. As Gauss’ measure and Lebesgue measure have the same sets of measure zero, this limit also exists for Lebesgue almost every point. Exponentiating both sides of the above gives the result. ✷
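The limit in (11.3.3) is known as Khinchin’s constant, approximately 2.6854520. The Python sketch below evaluates a truncation of the infinite product in logarithmic form (the factor for k = 1 equals 1 because log 1 = 0); the truncation point K is an illustrative choice. The terms behave like (log k)/k², so the truncation error in the logarithm is of order (log K)/K and convergence is slow.

```python
import math

def khinchin_constant(K=1_000_000):
    """Truncation of prod_{k=1}^K (1 + 1/(k^2 + 2k))^(log k / log 2),
    computed as exp((1/log 2) * sum_k log k * log(1 + 1/(k^2 + 2k)))."""
    log_sum = 0.0
    for k in range(2, K + 1):              # the k = 1 term contributes log 1 = 0
        log_sum += math.log(k) * math.log1p(1.0 / (k * k + 2 * k))
    return math.exp(log_sum / math.log(2))

print(khinchin_constant())   # about 2.6854
```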

Let x ∈ [0,1] be irrational and have continued fraction expansion [x_0, x_1, …]. Then [x_0, x_1, …, x_{n−1}] is a rational number; write [x_0, x_1, …, x_{n−1}] = P_n(x)/Q_n(x), where P_n(x), Q_n(x) are coprime integers (we write the dependence on x explicitly when we wish to emphasise it). Then P_n(x)/Q_n(x) is a ‘good’ rational approximation to x. As x and P_n(x)/Q_n(x) lie in the same cylinder I(x_0, …, x_{n−1}) of rank n, we must have that
$$\left|x - \frac{P_n(x)}{Q_n(x)}\right| \le \operatorname{diam} I(x_0,\ldots,x_{n-1}) \le \frac{1}{Q_n(x)^2},$$
where diam I denotes the length of the interval I. Thus we can quantify how good a rational approximation P_n(x)/Q_n(x) is to x by looking at the denominator Q_n(x), and understanding how Q_n(x) grows gives us information about x. For a typical point, Q_n(x) grows exponentially fast and we can determine the exponential growth rate.

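Proposition 11.3.3
For Lebesgue almost every real number x ∈ [0,1] we have that
$$\lim_{n\to\infty}\frac{1}{n}\log Q_n(x) = \frac{\pi^2}{12\log 2}.$$

Remark. Thus, for a typical point x ∈ [0,1], Q_n(x) grows like $e^{n\pi^2/(12\log 2)}$, in the sense that $\frac{1}{n}\log Q_n(x) \to \pi^2/(12\log 2)$.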

Proof (not examinable). Let x ∈ [0,1] be irrational and have continued fraction expansion [x_0, x_1, …]. Write
$$[x_0, x_1, \ldots, x_{n-1}] = \frac{P_n(x)}{Q_n(x)}.$$
Then
$$\frac{P_n(x)}{Q_n(x)} = \frac{1}{x_0 + [x_1,\ldots,x_{n-1}]} = \frac{1}{x_0 + \frac{P_{n-1}(Tx)}{Q_{n-1}(Tx)}} = \frac{Q_{n-1}(Tx)}{P_{n-1}(Tx) + x_0Q_{n-1}(Tx)}. \tag{11.3.4}$$

By Lemma 6.3.1(ii) and the Euclidean algorithm, we know that for all n and all x, P_n(x) and Q_n(x) are coprime. As P_{n−1}(Tx) and Q_{n−1}(Tx) are coprime, it follows that P_{n−1}(Tx) + x_0Q_{n−1}(Tx) and Q_{n−1}(Tx) are coprime. Hence, comparing the numerators in (11.3.4), we see that P_n(x) = Q_{n−1}(Tx). Also note that P_1(x) = 1. Hence
$$\frac{P_n(x)}{Q_n(x)}\cdot\frac{P_{n-1}(Tx)}{Q_{n-1}(Tx)}\cdots\frac{P_1(T^{n-1}x)}{Q_1(T^{n-1}x)} = \frac{P_1(T^{n-1}x)}{Q_n(x)} = \frac{1}{Q_n(x)}.$$

Taking the logarithm and dividing by n gives that
$$-\frac{1}{n}\log Q_n(x) = \frac{1}{n}\log\prod_{j=0}^{n-1}\frac{P_{n-j}(T^jx)}{Q_{n-j}(T^jx)} = \frac{1}{n}\sum_{j=0}^{n-1}\log\frac{P_{n-j}(T^jx)}{Q_{n-j}(T^jx)}. \tag{11.3.5}$$
This resembles an ergodic sum, except that the function P_{n−j}/Q_{n−j} depends on j and so we cannot immediately apply Birkhoff’s Ergodic Theorem. We will consider ergodic sums using the function f(x) = log x and show that the difference between $\frac{1}{n}\sum_{j=0}^{n-1} f(T^jx)$ and (11.3.5) is small.

Let f(x) = log x. Then we can write (11.3.5) as

$$-\frac{1}{n}\log Q_n(x) = \frac{1}{n}\sum_{j=0}^{n-1} f(T^jx) - \frac{1}{n}\sum_{j=0}^{n-1}\left(\log T^j(x) - \log\frac{P_{n-j}(T^jx)}{Q_{n-j}(T^jx)}\right) = \frac{1}{n}\Sigma^{(1)}_n - \frac{1}{n}\Sigma^{(2)}_n.$$

We evaluate $\lim_{n\to\infty}\frac{1}{n}\Sigma^{(1)}_n$. By Birkhoff’s Ergodic Theorem and the fact that Gauss’ measure µ and Lebesgue measure are equivalent, it follows that for Lebesgue almost every x ∈ [0,1] we have that
$$\lim_{n\to\infty}\frac{1}{n}\Sigma^{(1)}_n = \frac{1}{\log 2}\int_0^1\frac{f(x)}{1+x}\,dx = \frac{1}{\log 2}\int_0^1\frac{\log x}{1+x}\,dx.$$

Integrating by parts we have that
$$\int_0^1\frac{\log x}{1+x}\,dx = \Big[\log x\,\log(1+x)\Big]_0^1 - \int_0^1\frac{\log(1+x)}{x}\,dx = -\int_0^1\frac{\log(1+x)}{x}\,dx.$$

The Taylor series expansion of log(1 + x) about zero is
$$\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots = \sum_{k=1}^{\infty}\frac{(-1)^{k-1}x^k}{k}$$
so that
$$\frac{\log(1+x)}{x} = \sum_{k=0}^{\infty}\frac{(-1)^k x^k}{k+1}.$$

Hence for almost every x,
$$\lim_{n\to\infty}\frac{1}{n}\Sigma^{(1)}_n = -\frac{1}{\log 2}\sum_{k=0}^{\infty}\frac{(-1)^k}{k+1}\int_0^1 x^k\,dx = -\frac{1}{\log 2}\sum_{k=0}^{\infty}\frac{(-1)^k}{(k+1)^2}.$$

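Note that
$$\sum_{k=0}^{\infty}\frac{(-1)^k}{(k+1)^2} = \sum_{n=1}^{\infty}\frac{1}{n^2} - 2\sum_{n=1}^{\infty}\frac{1}{(2n)^2} = \sum_{n=1}^{\infty}\frac{1}{n^2} - \frac{1}{2}\sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{12},$$
using the well-known fact that $\sum_{n=1}^{\infty} 1/n^2 = \pi^2/6$. Hence for almost every x,
$$\lim_{n\to\infty}\frac{1}{n}\Sigma^{(1)}_n = -\frac{\pi^2}{12\log 2}.$$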

It remains to show that $\frac{1}{n}\Sigma^{(2)}_n \to 0$ as n → ∞. Recall that in §6.3 we introduced the cylinder set I(x_0, x_1, …, x_{n−1}) of rank n to denote the set of points x with continued fraction expansion that starts x_0, …, x_{n−1}. We proved in §6.3 that I(x_0, x_1, …, x_{n−1}) is an interval with length at most 1/Q_n(x)². Note that both x and P_n(x)/Q_n(x) lie in the same interval of rank n. Hence

$$\left|\frac{x}{P_n(x)/Q_n(x)} - 1\right| = \frac{Q_n(x)}{P_n(x)}\left|x - \frac{P_n(x)}{Q_n(x)}\right| \le \frac{Q_n(x)}{P_n(x)}\cdot\frac{1}{Q_n(x)^2} = \frac{1}{P_n(x)Q_n(x)}.$$
It follows from Lemma 6.3.1(i) that P_n(x) ≥ 2^{(n−2)/2} and Q_n(x) ≥ 2^{(n−1)/2}. Hence
$$\left|\frac{x}{P_n(x)/Q_n(x)} - 1\right| \le \frac{1}{2^{n-3/2}}.$$

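Moreover, x and P_n(x)/Q_n(x) both lie in the rank 1 cylinder (1/(x_0+1), 1/x_0], so their ratio lies in (1/2, 2). By the triangle inequality and the fact that |log y| ≤ 2|y − 1| for y ≥ 1/2, we have that
$$\left|\Sigma^{(2)}_n\right| \le \sum_{j=0}^{n-1}\left|\log\left(\frac{T^j(x)}{P_{n-j}(T^j(x))/Q_{n-j}(T^j(x))}\right)\right| \le 2\sum_{j=0}^{n-1}\left|\frac{T^j(x)}{P_{n-j}(T^j(x))/Q_{n-j}(T^j(x))} - 1\right| \le 2\sum_{j=0}^{n-1}\frac{1}{2^{n-j-3/2}}.$$
Note that
$$2\sum_{j=0}^{n-1}\frac{1}{2^{n-j-3/2}} = 2^{5/2}\sum_{m=1}^{n}\frac{1}{2^m} \le 2^{5/2}.$$
Hence |Σ^{(2)}_n| ≤ 2^{5/2} for all n. Hence
$$\lim_{n\to\infty}\frac{1}{n}\Sigma^{(2)}_n = 0$$
and the result follows. ✷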
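Proposition 11.3.3 can be explored numerically. The Python sketch below computes Q_n with the standard recurrence Q_n = x_{n−1}Q_{n−1} + Q_{n−2}, Q_0 = 1, Q_{−1} = 0 (a well-known fact not derived in this section; it gives Q_1 = x_0 and Q_2 = x_0x_1 + 1, matching [x_0] = 1/x_0 and [x_0, x_1] = x_1/(x_0x_1 + 1)), using the digits of x = p/2^N with a large random odd p as a stand-in for a typical point. That substitution, the seed, N and n are assumptions of the sketch. It also checks the series identity Σ(−1)^k/(k+1)² = π²/12 used in the proof.

```python
import math
import random
from fractions import Fraction

TARGET = math.pi ** 2 / (12 * math.log(2))   # pi^2 / (12 log 2), about 1.1866

def cf_digits(x):
    """Continued fraction digits [x_0, x_1, ...] of a rational x in (0, 1)."""
    p, q = x.numerator, x.denominator
    out = []
    while p != 0:
        a, r = divmod(q, p)   # 1/x = a + r/p
        out.append(a)
        p, q = r, p           # T(x) = r/p
    return out

def denominators(ds):
    """[Q_1, Q_2, ...] from Q_n = x_{n-1} Q_{n-1} + Q_{n-2}, Q_0 = 1, Q_{-1} = 0."""
    q_prev, q = 0, 1
    out = []
    for a in ds:
        q_prev, q = q, a * q + q_prev
        out.append(q)
    return out

rng = random.Random(3)
N = 8000
x = Fraction(rng.getrandbits(N) | 1, 2 ** N)   # stand-in for a typical x
Q = denominators(cf_digits(x))
n = 3000
print(math.log(Q[n - 1]) / n, TARGET)          # both close to 1.1866

# The alternating series from the proof: sum (-1)^k / (k+1)^2 = pi^2 / 12.
alt = sum((-1) ** k / (k + 1) ** 2 for k in range(100_000))
print(alt, math.pi ** 2 / 12)
```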



§11.4 Exercises

Exercise 11.1
Let b ≥ 2 be an integer. Prove that Lebesgue measure is an ergodic invariant measure for T_b(x) = bx mod 1 defined on the unit interval.

Exercise 11.2
(i) A number x ∈ [0,1] is said to be simply normal if it is simply normal in base b for all b ≥ 2. Prove that Lebesgue a.e. number x ∈ [0,1] is simply normal.

(ii) Prove Proposition 11.2.4.

Exercise 11.3
Let r ≥ 2 be an integer. Prove that for Lebesgue almost every x ∈ [0,1], the sequence x_n = r^n x is uniformly distributed mod 1.

Exercise 11.4
Prove that the arithmetic mean of the digits appearing in the base 10 expansion of Lebesgue-a.e. x ∈ [0,1) is equal to 4.5, i.e. prove that if $x = \sum_{j=0}^{\infty} x_j/10^{j+1}$, x_j ∈ {0, 1, …, 9}, then
$$\lim_{n\to\infty}\frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = 4.5 \quad \text{a.e.}$$

Exercise 11.5
Let x ∈ [0,1] have continued fraction expansion x = [x_0, x_1, x_2, …].

Prove that
$$\lim_{n\to\infty}\frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \infty$$
for Lebesgue almost every x ∈ [0,1]. (Hint: use Exercise 10.2.)


12. Solutions to the Exercises

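Solution 1.1
Suppose that x_n ∈ ℝ is uniformly distributed mod 1. Let x ∈ [0,1] and let ε > 0; without loss of generality ε < 1. We want to show that there exists n such that {x_n} ∈ (x − ε, x + ε) ∩ [0,1] (as usual, {x_n} denotes the fractional part of x_n).

By the definition of uniform distribution mod 1 we have that
$$\lim_{n\to\infty}\frac{1}{n}\operatorname{card}\{j \mid 0\le j\le n-1,\ \{x_j\}\in(x-\varepsilon,x+\varepsilon)\} = \operatorname{length}\big((x-\varepsilon,x+\varepsilon)\cap[0,1]\big) \ge \varepsilon.$$
Then there exists n_0 such that if n ≥ n_0 then
$$\frac{1}{n}\operatorname{card}\{j \mid 0\le j\le n-1,\ \{x_j\}\in(x-\varepsilon,x+\varepsilon)\} > \frac{\varepsilon}{2} > 0.$$
Hence
$$\operatorname{card}\{j \mid 0\le j\le n-1,\ \{x_j\}\in(x-\varepsilon,x+\varepsilon)\} > 0$$
for some n, so there exists j such that {x_j} ∈ (x − ε, x + ε).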

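Solution 1.2
We use Weyl’s Criterion. Let ℓ ∈ ℤ \ {0}. Then
$$\frac{1}{n}\sum_{j=0}^{n-1}e^{2\pi i\ell x_j} = \frac{1}{n}\sum_{j=0}^{n-1}e^{2\pi i\ell(\alpha j+\beta)} = \frac{1}{n}e^{2\pi i\ell\beta}\sum_{j=0}^{n-1}e^{2\pi i\ell\alpha j} = \frac{1}{n}e^{2\pi i\ell\beta}\left(\frac{e^{2\pi i\ell\alpha n}-1}{e^{2\pi i\ell\alpha}-1}\right),$$
summing the geometric progression. As α ∉ ℚ, we have that e^{2πiℓα} ≠ 1 for any ℓ ∈ ℤ \ {0}. Hence
$$\left|\frac{1}{n}\sum_{j=0}^{n-1}e^{2\pi i\ell x_j}\right| \le \frac{1}{n}\left|\frac{e^{2\pi i\ell\alpha n}-1}{e^{2\pi i\ell\alpha}-1}\right| \le \frac{1}{n}\,\frac{2}{|e^{2\pi i\ell\alpha}-1|} \to 0$$
as n → ∞, as |e^{2πiℓβ}| = 1. Hence x_n = αn + β is uniformly distributed mod 1.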
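The estimate in Solution 1.2 can be checked numerically: the Python sketch below computes the Weyl averages for x_j = αj + β and compares them with the bound 2/(n|e^{2πiℓα} − 1|). The values α = √2 and β = 0.3 and the helper names are illustrative assumptions.

```python
import cmath
import math

def weyl_average(alpha, beta, ell, n):
    """(1/n) * sum_{j=0}^{n-1} exp(2 pi i ell (alpha j + beta))."""
    return sum(cmath.exp(2j * math.pi * ell * (alpha * k + beta)) for k in range(n)) / n

def weyl_bound(alpha, ell, n):
    """The bound 2 / (n |exp(2 pi i ell alpha) - 1|) from the solution."""
    return 2 / (n * abs(cmath.exp(2j * math.pi * ell * alpha) - 1))

alpha, beta = math.sqrt(2), 0.3
for n in (10, 100, 1000, 10000):
    avg = abs(weyl_average(alpha, beta, 1, n))
    print(n, round(avg, 6), round(weyl_bound(alpha, 1, n), 6))
```

Both columns decay like 1/n, with the Weyl average never exceeding the bound.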

Solution 1.3
(i) If log_{10} 2 = p/q with p, q integers, hcf(p, q) = 1, then 2 = 10^{p/q}, i.e. 2^q = 10^p = 5^p 2^p. Comparing indices, we see that 0 = p = q, a contradiction.

(ii) Let 2^n have leading digit r. Then
$$2^n = r\cdot 10^\ell + \text{terms involving lower powers of } 10$$
where the terms involving lower powers of 10 sum to an integer lying in [0, 10^ℓ). Hence
$$\begin{aligned}
2^n \text{ has leading digit } r &\iff r\cdot 10^\ell \le 2^n < (r+1)\cdot 10^\ell \\
&\iff \log_{10} r + \ell \le n\log_{10} 2 < \log_{10}(r+1) + \ell \\
&\iff \log_{10} r \le \{n\log_{10} 2\} < \log_{10}(r+1).
\end{aligned}$$


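Hence
$$\frac{1}{n}\operatorname{card}\{k \mid 0\le k\le n-1,\ 2^k \text{ has leading digit } r\} = \frac{1}{n}\operatorname{card}\{k \mid 0\le k\le n-1,\ \{k\log_{10} 2\}\in[\log_{10} r, \log_{10}(r+1))\}$$
which, since k log_{10} 2 is uniformly distributed mod 1 (by (i) and Solution 1.2), converges to log_{10}(r+1) − log_{10} r = log_{10}(1 + 1/r) as n → ∞.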
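The limiting frequencies log_{10}(1 + 1/r) can be compared with exact counts. The Python sketch below reads the leading digit of 2^k from its decimal string for k < 5000, using exact integer arithmetic; the cut-off 5000 is an illustrative choice that keeps the integers well within Python’s default limit on integer-to-string conversion.

```python
import math
from collections import Counter

def leading_digit_frequencies(n):
    """Frequency of each leading digit r among 2^0, 2^1, ..., 2^(n-1)."""
    counts = Counter()
    power = 1
    for _ in range(n):
        counts[int(str(power)[0])] += 1
        power *= 2
    return {r: counts[r] / n for r in range(1, 10)}

freqs = leading_digit_frequencies(5000)
for r in range(1, 10):
    print(r, round(freqs[r], 4), round(math.log10(1 + 1 / r), 4))
```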

Solution 1.4

The frequency with which the penultimate leading digit (i.e. the second digit) of $2^n$ is $r$ is given by

$$\sum_{q=1}^{9}A(q,r)$$

where $A(q,r)$ is the frequency with which the leading digit is $q$ and the penultimate leading digit is $r$.

Now $2^n$ has leading digit $q$ and penultimate leading digit $r$ precisely when

$$q\cdot 10^\ell + r\cdot 10^{\ell-1} \le 2^n < q\cdot 10^\ell + (r+1)\cdot 10^{\ell-1}.$$

Taking logs shows that $2^n$ has leading digit $q$ and penultimate leading digit $r$ when

$$\log_{10}(10q+r) + \ell - 1 \le n\log_{10}2 < \log_{10}(10q+r+1) + (\ell - 1).$$

Reducing this mod 1 gives

$$\log_{10}(10q+r) - 1 \le \{n\log_{10}2\} < \log_{10}(10q+r+1) - 1$$

(the $-1$s appear because $1 \le \log_{10}(10q+r) < \log_{10}(10q+r+1) \le 2$). As $n\log_{10}2$ is uniformly distributed mod 1, we see that

$$A(q,r) = \bigl(\log_{10}(10q+r+1) - 1\bigr) - \bigl(\log_{10}(10q+r) - 1\bigr) = \log_{10}\left(1 + \frac{1}{10q+r}\right).$$

Hence the frequency with which the penultimate leading digit of $2^n$ is $r$ is

$$\sum_{q=1}^{9}\log_{10}\left(1 + \frac{1}{10q+r}\right) = \log_{10}\prod_{q=1}^{9}\left(1 + \frac{1}{10q+r}\right).$$
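The same kind of numerical check can be made for Solution 1.4. The Python sketch below (illustrative only; the names and the 5000-term cut-off are arbitrary) tallies the second digit of $2^k$ for $k < 5000$, skipping $1, 2, 4, 8$, which have only one digit, and compares the result with $\sum_{q=1}^{9}\log_{10}\bigl(1 + 1/(10q+r)\bigr)$.

```python
import math

def second_digit_frequencies(n_terms):
    """Empirical frequency of each second digit r = 0..9 among 2^0, ..., 2^(n_terms - 1)."""
    counts = [0] * 10
    total = 0
    power = 1
    for _ in range(n_terms):
        digits = str(power)
        if len(digits) >= 2:  # 2^0, ..., 2^3 have no second digit
            counts[int(digits[1])] += 1
            total += 1
        power *= 2
    return [c / total for c in counts]

def predicted_second_digit(r):
    """sum_{q=1}^{9} log10(1 + 1/(10q + r)), the limit found in Solution 1.4."""
    return sum(math.log10(1 + 1 / (10 * q + r)) for q in range(1, 10))

if __name__ == "__main__":
    freqs = second_digit_frequencies(5000)
    for r in range(10):
        print(r, f"{freqs[r]:.4f}", f"{predicted_second_digit(r):.4f}")
```

The predicted frequencies decrease from about 0.120 for $r = 0$ to about 0.085 for $r = 9$, so the second digit is much closer to being uniformly distributed than the first.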

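Solution 2.1

Suppose first that the numbers $\alpha_1, \ldots, \alpha_k, 1$ are rationally independent. This means that if $r_1, \ldots, r_k, r$ are rational numbers such that

$$r_1\alpha_1 + \cdots + r_k\alpha_k + r = 0,$$

then $r_1 = \cdots = r_k = r = 0$. In particular, for $\ell = (\ell_1, \ldots, \ell_k) \in \mathbb{Z}^k\setminus\{0\}$,

$$\ell_1\alpha_1 + \cdots + \ell_k\alpha_k \notin \mathbb{Z},$$

so that $e^{2\pi i(\ell_1\alpha_1 + \cdots + \ell_k\alpha_k)} \ne 1$. By summing the geometric progression we have that

$$\left|\frac{1}{n}\sum_{j=0}^{n-1}e^{2\pi i(\ell_1j\alpha_1 + \cdots + \ell_kj\alpha_k)}\right| = \left|\frac{1}{n}\,\frac{e^{2\pi in(\ell_1\alpha_1 + \cdots + \ell_k\alpha_k)} - 1}{e^{2\pi i(\ell_1\alpha_1 + \cdots + \ell_k\alpha_k)} - 1}\right| \le \frac{1}{n}\,\frac{2}{|e^{2\pi i(\ell_1\alpha_1 + \cdots + \ell_k\alpha_k)} - 1|} \to 0$$

as $n \to \infty$. Therefore, by Weyl's Criterion, $(n\alpha_1, \ldots, n\alpha_k)$ is uniformly distributed mod 1.

Now suppose that the numbers $\alpha_1, \ldots, \alpha_k, 1$ are rationally dependent. Thus there exist rationals $r_1, \ldots, r_k, r$ (not all zero) such that $r_1\alpha_1 + \cdots + r_k\alpha_k + r = 0$. Note that $r_1, \ldots, r_k$ cannot all be zero, as otherwise $r = 0$ too. By multiplying by a common denominator we can find $\ell = (\ell_1, \ldots, \ell_k) \in \mathbb{Z}^k\setminus\{0\}$ such that

$$\ell_1\alpha_1 + \cdots + \ell_k\alpha_k \in \mathbb{Z}.$$

Thus $e^{2\pi i(\ell_1n\alpha_1 + \cdots + \ell_kn\alpha_k)} = 1$ for all $n \in \mathbb{N}$, and so

$$\frac{1}{n}\sum_{j=0}^{n-1}e^{2\pi i(\ell_1j\alpha_1 + \cdots + \ell_kj\alpha_k)} = 1 \not\to 0 \quad\text{as } n \to \infty.$$

Therefore $(n\alpha_1, \ldots, n\alpha_k)$ is not uniformly distributed mod 1.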
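Both halves of Solution 2.1 can be seen numerically. The Python sketch below (an illustration, not part of the notes) computes the Weyl averages for $(j\alpha_1, j\alpha_2)$ with the sample choices $\alpha = (\sqrt2, \sqrt3)$, for which $\sqrt2, \sqrt3, 1$ are rationally independent, and $\alpha = (\sqrt2, 2\sqrt2)$, for which $\ell = (2,-1)$ gives $\ell_1\alpha_1 + \ell_2\alpha_2 = 0$. It also evaluates the geometric-series bound used above.

```python
import cmath
import math

def weyl_average(alpha, ell, n_terms):
    """(1/n) sum_{j<n} exp(2 pi i (ell_1 j alpha_1 + ... + ell_k j alpha_k))."""
    theta = sum(l * a for l, a in zip(ell, alpha))
    total = 0j
    for j in range(n_terms):
        total += cmath.exp(2j * math.pi * ((j * theta) % 1.0))
    return total / n_terms

def geometric_bound(alpha, ell, n_terms):
    """The bound 2 / (n |exp(2 pi i theta) - 1|) from Solution 2.1 (theta must not be an integer)."""
    theta = sum(l * a for l, a in zip(ell, alpha))
    return 2 / (n_terms * abs(cmath.exp(2j * math.pi * theta) - 1))

if __name__ == "__main__":
    independent = (math.sqrt(2), math.sqrt(3))    # sqrt 2, sqrt 3, 1 rationally independent
    dependent = (math.sqrt(2), 2 * math.sqrt(2))  # 2 * alpha_1 - alpha_2 = 0
    print(abs(weyl_average(independent, (2, -1), 10_000)))
    print(abs(weyl_average(dependent, (2, -1), 10_000)))
```

In the dependent case every term of the average equals 1, exactly as in the second half of the solution.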

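Solution 2.2

Let $p(n) = \alpha_kn^k + \cdots + \alpha_1n + \alpha_0$. Suppose that $\alpha_k, \ldots, \alpha_{s+1} \in \mathbb{Q}$ but $\alpha_s \notin \mathbb{Q}$ for some $s \ge 1$. Let

$$p_1(n) = \alpha_kn^k + \cdots + \alpha_{s+1}n^{s+1}, \qquad p_2(n) = \alpha_sn^s + \cdots + \alpha_1n + \alpha_0$$

so that $p(n) = p_1(n) + p_2(n)$. By choosing $q$ to be a common denominator for $\alpha_k, \ldots, \alpha_{s+1}$, we can write

$$p_1(n) = \frac{1}{q}\left(m_kn^k + \cdots + m_{s+1}n^{s+1}\right)$$

where $m_j \in \mathbb{Z}$. By Weyl's Criterion, we want to show that for $\ell \in \mathbb{Z}\setminus\{0\}$ we have

$$\frac{1}{n}\sum_{j=0}^{n-1}e^{2\pi i\ell p(j)} \to 0$$

as $n \to \infty$. Write $j = qm + r$ where $r = 0, \ldots, q-1$. Then $p_1(qm+r) = d_r \bmod 1$ for some $d_r \in \mathbb{Q}$ that does not depend on $m$ (on expanding $(qm+r)^i$, every term except $r^i$ is divisible by $q$). Moreover, $p_2(qm+r) = p_2^{(q,r)}(m)$ is a polynomial in $m$ with irrational leading coefficient $\alpha_sq^s$.

Now, ignoring the at most $q$ terms with $j \ge q[n/q]$ (which contribute at most $q/n \to 0$ to the average),

$$\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1}e^{2\pi i\ell p(j)} &= \lim_{n\to\infty}\frac{1}{n}\sum_{m=0}^{[n/q]-1}\sum_{r=0}^{q-1}e^{2\pi i\ell d_r}e^{2\pi i\ell p_2^{(q,r)}(m)}\\
&= \lim_{n\to\infty}\frac{[n/q]}{n}\sum_{r=0}^{q-1}e^{2\pi i\ell d_r}\,\frac{1}{[n/q]}\sum_{m=0}^{[n/q]-1}e^{2\pi i\ell p_2^{(q,r)}(m)}\\
&= 0
\end{aligned}$$

as $p_2^{(q,r)}(m)$ is uniformly distributed mod 1, so that each inner average tends to 0 by Weyl's Criterion.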



Solution 2.3

Let $p(n) = \alpha n^2 + n + 1$ where $\alpha \notin \mathbb{Q}$. Let $m \ge 1$ and consider the sequence $p^{(m)}(n) = p(n+m) - p(n)$ of $m$th differences. We have that

$$p^{(m)}(n) = \alpha(n+m)^2 + (n+m) + 1 - \alpha n^2 - n - 1 = 2\alpha mn + \alpha m^2 + m,$$

which is a degree 1 polynomial in $n$ with leading coefficient $2\alpha m \notin \mathbb{Q}$. Note that $2\alpha m \ne 0$, as $m \ge 1$. By Exercise 1.2 we have that $p^{(m)}(n)$ is uniformly distributed mod 1 for every $m \ge 1$. By Lemma 2.3.3, it follows that $p(n)$ is uniformly distributed mod 1.
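A quick numerical illustration of Solution 2.3: the Python sketch below computes the Weyl averages $\frac1N\sum_{n<N}e^{2\pi i\ell p(n)}$ for $p(n) = \alpha n^2 + n + 1$, using the sample value $\alpha = \sqrt2$, and contrasts them with the rational choice $\alpha = 1/2$, for which $2p(n)$ is an integer, so the $\ell = 2$ average is exactly 1. The slow (roughly $N^{-1/2}$) decay for $\alpha = \sqrt2$ is typical of quadratic Weyl sums; the tolerance used in the check is a loose, hedged choice.

```python
import cmath
import math

def weyl_average(values, ell):
    """(1/N) sum_j exp(2 pi i ell x_j), reducing ell * x_j mod 1 before exponentiating."""
    total = 0j
    for x in values:
        total += cmath.exp(2j * math.pi * ((ell * x) % 1.0))
    return total / len(values)

def quadratic_sequence(alpha, n_terms):
    """x_n = alpha n^2 + n + 1 for n = 0, ..., n_terms - 1."""
    return [alpha * n * n + n + 1 for n in range(n_terms)]

if __name__ == "__main__":
    irrational = quadratic_sequence(math.sqrt(2), 20_000)
    rational = quadratic_sequence(0.5, 20_000)
    for ell in (1, 2, 3):
        print(ell, abs(weyl_average(irrational, ell)), abs(weyl_average(rational, ell)))
```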

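Solution 2.4

By Weyl's Criterion, we require that for each $(\ell_1, \ell_2) \in \mathbb{Z}^2\setminus\{(0,0)\}$ we have

$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\exp 2\pi i\bigl(\ell_1p(k) + \ell_2q(k)\bigr) = 0. \tag{12.0.1}$$

Let

$$\begin{aligned}
p_{\ell_1,\ell_2}(n) &= \ell_1p(n) + \ell_2q(n)\\
&= (\ell_1\alpha_k + \ell_2\beta_k)n^k + \cdots + (\ell_1\alpha_1 + \ell_2\beta_1)n + (\ell_1\alpha_0 + \ell_2\beta_0).
\end{aligned}$$

This is a polynomial of degree at most $k$. Then the average in (12.0.1) can be written as

$$\frac{1}{n}\sum_{k=0}^{n-1}\exp 2\pi i\,p_{\ell_1,\ell_2}(k).$$

By the 1-dimensional version of Weyl's Criterion (using the integer $\ell = 1$), this will converge to 0 as $n \to \infty$ if $p_{\ell_1,\ell_2}(n)$ is uniformly distributed mod 1. By Weyl's Theorem on Polynomials (Theorem 2.3.1), this happens if at least one of $\ell_1\alpha_k + \ell_2\beta_k, \ell_1\alpha_{k-1} + \ell_2\beta_{k-1}, \ldots, \ell_1\alpha_1 + \ell_2\beta_1$ is irrational. Note that $\ell_1\alpha_i + \ell_2\beta_i \notin \mathbb{Q}$ for every $(\ell_1, \ell_2) \in \mathbb{Z}^2\setminus\{(0,0)\}$ if and only if $\alpha_i, \beta_i, 1$ are rationally independent.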

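Solution 2.5

(i) We know that $\emptyset \in \mathcal{B}$ and that if $E \in \mathcal{B}$ then $X\setminus E \in \mathcal{B}$. Hence $X = X\setminus\emptyset \in \mathcal{B}$.

(ii) Let $E_n \in \mathcal{B}$. Then $X\setminus E_n \in \mathcal{B}$, so that $\bigcup_n(X\setminus E_n) \in \mathcal{B}$. Now $\bigcup_n(X\setminus E_n) = X\setminus\bigcap_nE_n$. Hence

$$\bigcap_nE_n = X\setminus\Bigl(X\setminus\bigcap_nE_n\Bigr) \in \mathcal{B}.$$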

Solution 2.6

The smallest σ-algebra containing the sets $[0,1/4)$, $[1/4,1/2)$, $[1/2,3/4)$ and $[3/4,1]$ is

$$\begin{aligned}
\mathcal{B} = \{&\emptyset, [0,1/4), [1/4,1/2), [1/2,3/4), [3/4,1], [0,1/2), [0,1/4)\cup[1/2,3/4), [0,1/4)\cup[3/4,1],\\
&[1/4,3/4), [1/4,1/2)\cup[3/4,1], [1/2,1], [0,3/4), [0,1/2)\cup[3/4,1],\\
&[0,1/4)\cup[1/2,1], [1/4,1], [0,1]\}.
\end{aligned}$$
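Because the four given intervals partition $[0,1]$, the σ-algebra they generate consists exactly of the $2^4 = 16$ unions of some of them, which is the list above. The Python sketch below (illustrative only; the encoding of a union as a `frozenset` of interval indices is our own choice) enumerates these unions and checks closure under complements and unions.

```python
from itertools import combinations

ATOMS = ("[0,1/4)", "[1/4,1/2)", "[1/2,3/4)", "[3/4,1]")  # the four generating intervals

def generated_sigma_algebra(n_atoms):
    """All unions of atoms, each union encoded as a frozenset of atom indices."""
    return {
        frozenset(chosen)
        for size in range(n_atoms + 1)
        for chosen in combinations(range(n_atoms), size)
    }

def is_sigma_algebra(family, n_atoms):
    """For a finite family, closure under complements and pairwise unions suffices."""
    whole = frozenset(range(n_atoms))
    if frozenset() not in family:
        return False
    return all(whole - a in family and all(a | b in family for b in family) for a in family)

def describe(union):
    """Human-readable form, e.g. '[0,1/4) ∪ [1/2,3/4)'."""
    return " ∪ ".join(ATOMS[i] for i in sorted(union)) or "∅"

if __name__ == "__main__":
    family = generated_sigma_algebra(len(ATOMS))
    for union in sorted(family, key=lambda s: (len(s), sorted(s))):
        print(describe(union))
```

The printout writes, for example, $[0,1/4)\cup[1/4,1/2)$ where the list above simplifies this to $[0,1/2)$.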

Solution 2.7

Clearly a finite union of dyadic intervals is a Borel set.

By Proposition 2.4.2 we need to show that if $x, y \in [0,1]$, $x \ne y$, then there exist disjoint dyadic intervals $I_1, I_2$ such that $x \in I_1$, $y \in I_2$. Let $\varepsilon = |x-y|$ and choose $n$ such that $1/2^n < \varepsilon/2$. Without loss of generality, assume that $x < y$. Then there exist integers $p, q$, $p < q$, such that

$$\frac{p-1}{2^n} \le x < \frac{p}{2^n} < \frac{q}{2^n} < y \le \frac{q+1}{2^n}.$$

Hence $x, y$ belong to different dyadic intervals.



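Solution 2.8

Let $\mathcal{A}$ denote the collection of finite unions of intervals. Trivially $\emptyset \in \mathcal{A}$. If $A, B \in \mathcal{A}$ are finite unions of intervals then $A \cup B$ is a finite union of intervals. Hence $\mathcal{A}$ is closed under taking finite unions. The complement of a single interval is a union of at most two intervals; for example, if $A = [a,b] \subset [0,1]$ then $A^c = [0,a) \cup (b,1]$. If $A = I_1 \cup \cdots \cup I_m$ is a finite union of intervals then $A^c = I_1^c \cap \cdots \cap I_m^c$; distributing the intersection over the unions expresses $A^c$ as a finite union of finite intersections of intervals, and an intersection of intervals is again an interval (possibly empty). Hence $A^c \in \mathcal{A}$, and so $\mathcal{A}$ is an algebra.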

Solution 2.9

First note that if $\mu$ is a measure and $A \subset B$ then $\mu(A) \le \mu(B)$. (To see this, note that if $A \subset B$ then $B = A \cup (B\setminus A)$ is a disjoint union. Hence $\mu(B) = \mu(A \cup (B\setminus A)) = \mu(A) + \mu(B\setminus A) \ge \mu(A)$.)

Let $\mu$ denote Lebesgue measure on $[0,1]$. Let $x \in [0,1]$. For any $\varepsilon > 0$, we have that $\{x\} \subset (x-\varepsilon, x+\varepsilon) \cap [0,1]$. Hence $\mu(\{x\}) \le 2\varepsilon$. As $\varepsilon > 0$ is arbitrary, it follows that $\mu(\{x\}) = 0$.

Let $E = \{x_j\}_{j=1}^\infty$ be a countable set, with the $x_j$ distinct, so that the singletons $\{x_j\}$ are pairwise disjoint. Then

$$\mu(E) = \mu\Bigl(\bigcup_{j=1}^\infty\{x_j\}\Bigr) = \sum_{j=1}^\infty\mu(\{x_j\}) = 0.$$

Hence any countable set has Lebesgue measure 0. As the rational points in $[0,1]$ are countable, it follows that $\mu(\mathbb{Q} \cap [0,1]) = 0$. Hence Lebesgue almost every point in $[0,1]$ is irrational.

Solution 2.10

Let $\mu = \delta_{1/2}$ be the Dirac δ-measure at 1/2. Then, by definition, $\mu([0,1/2) \cup (1/2,1]) = 0$ as $1/2 \notin [0,1/2) \cup (1/2,1]$. Hence $\mu\{x \in [0,1] \mid x \ne 1/2\} = 0$, so that $\mu$-a.e. point in $[0,1]$ is equal to 1/2.

Solution 3.1

Let $x_n = \alpha n$ where $\alpha \in \mathbb{R}$ is irrational. Then $x_n$ is uniformly distributed mod 1 (by the results in §1.2.1). Let $A = \{\{\alpha n\} \mid n \ge 0\} \subset [0,1]$ denote the set of fractional parts of the sequence $x_n$; note that $A$ is a countable set. Let $f = \chi_A$. Then $f \in L^1([0,1], \mathcal{B}, \mu)$ (where $\mathcal{B}$ denotes the Borel σ-algebra and $\mu$ denotes Lebesgue measure on $[0,1]$) and $f \equiv 0$ a.e., as countable sets have Lebesgue measure zero. Hence $\int f\,d\mu = 0$. However, $f(\{x_j\}) = 1$ for each $j$. Hence

$$\frac{1}{n}\sum_{j=0}^{n-1}f(\{x_j\}) = 1 \not\to \int f\,d\mu = 0$$

as $n \to \infty$.

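Solution 3.2

Let $X$ be a compact metric space equipped with the Borel σ-algebra $\mathcal{B}$. Let $T : X \to X$ be continuous. Recall that $\mathcal{B}$ is generated by the open sets. As pre-images commute with complements and countable unions, the collection $\{B \in \mathcal{B} \mid T^{-1}B \in \mathcal{B}\}$ is a σ-algebra, so it is sufficient to check that $T^{-1}U \in \mathcal{B}$ for all open sets $U$. But this is clear: as $T$ is continuous, the pre-image $T^{-1}U$ of any open set is open, hence $T^{-1}U \in \mathcal{B}$.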

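Solution 3.3

Define $f_n : [0,1] \to \mathbb{R}$ by

$$f_n(x) = \begin{cases} n - n^2x & \text{if } 0 \le x \le 1/n,\\ 0 & \text{if } 1/n \le x \le 1.\end{cases}$$

(Draw a picture!) Then $f_n$ is continuous, hence $f_n \in L^1(X, \mathcal{B}, \mu)$. Moreover, $\int f_n\,d\mu = 1/2$ for each $n$ (the graph of $f_n$ bounds a triangle with base $1/n$ and height $n$). Hence $f_n \not\to 0$ in $L^1(X, \mathcal{B}, \mu)$.

However, $f_n \to 0$ $\mu$-a.e. To see this, let $x \in [0,1]$, $x \ne 0$. Choose $N$ such that $1/N < x$. Then $f_n(x) = 0$ for any $n \ge N$. Hence, if $x \ne 0$, we have that $f_n(x) = 0$ for all sufficiently large $n$. Hence $f_n \to 0$ $\mu$-a.e.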

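Solution 3.4

First note that if $B \in \mathcal{B}$ then $T^{-1}B \in \mathcal{B}$. Hence $T_*\mu(B) = \mu(T^{-1}B)$ is well-defined.

Clearly $T^{-1}(\emptyset) = \emptyset$. Hence $T_*\mu(\emptyset) = \mu(T^{-1}\emptyset) = \mu(\emptyset) = 0$.

Let $E_n \in \mathcal{B}$ be pairwise disjoint. Then the sets $T^{-1}E_n \in \mathcal{B}$ are pairwise disjoint. (To see this, suppose that $x \in T^{-1}E_n \cap T^{-1}E_m$. Then $T(x) \in E_n$ and $T(x) \in E_m$. Hence $T(x) \in E_n \cap E_m$. As the $E_n$ are pairwise disjoint, this implies that $n = m$. Hence $T^{-1}E_n \cap T^{-1}E_m = \emptyset$ whenever $n \ne m$.) Hence

$$T_*\mu\Bigl(\bigcup_{n=1}^\infty E_n\Bigr) = \mu\Bigl(T^{-1}\bigcup_{n=1}^\infty E_n\Bigr) = \mu\Bigl(\bigcup_{n=1}^\infty T^{-1}E_n\Bigr) = \sum_{n=1}^\infty\mu(T^{-1}E_n) = \sum_{n=1}^\infty T_*\mu(E_n)$$

where we have used the fact that $T^{-1}\bigcup_{n=1}^\infty E_n = \bigcup_{n=1}^\infty T^{-1}E_n$. Hence $T_*\mu$ is a measure.

Finally, note that $T^{-1}(X) = X$. Hence $T_*\mu(X) = \mu(T^{-1}X) = \mu(X) = 1$, so that $T_*\mu$ is a probability measure.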

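Solution 3.5

(i) Let $\lambda$ denote Lebesgue measure on $[0,1]$. All one needs to do is to find a set $B$ such that $\lambda(B) \ne \lambda(T^{-1}B)$, and any (reasonable) choice of set $B$ will work. For example, take $B = (1/2, 1)$. Then

$$T^{-1}(B) = \bigcup_{n=1}^\infty\left(\frac{1}{n+1}, \frac{1}{n+1/2}\right).$$

As $\frac{1}{n+1/2} - \frac{1}{n+1} = \frac{1}{(2n+1)(n+1)} = 2\left(\frac{1}{2n+1} - \frac{1}{2n+2}\right)$ and $\sum_{n=0}^\infty\left(\frac{1}{2n+1} - \frac{1}{2n+2}\right) = \log 2$, it follows that

$$\lambda(T^{-1}B) = \sum_{n=1}^\infty\frac{1}{(1+2n)(1+n)} = \log(4) - 1 < \frac{1}{2} = \lambda(B).$$

(ii) Recall that

$$\mu(B) = \frac{1}{\log 2}\int_B\frac{dx}{1+x} = \frac{1}{\log 2}\int\frac{\chi_B(x)}{1+x}\,dx.$$

Note that $1/2 \le 1/(1+x) \le 1$ if $0 \le x \le 1$. Hence

$$\frac{1}{\log 2}\int\frac{\chi_B(x)}{2}\,dx \le \mu(B) \le \frac{1}{\log 2}\int\chi_B(x)\,dx$$

so that

$$\frac{1}{2\log 2}\lambda(B) = \frac{1}{2\log 2}\int\chi_B(x)\,dx \le \mu(B) \le \frac{1}{\log 2}\int\chi_B(x)\,dx = \frac{1}{\log 2}\lambda(B).$$

(iii) From (3.4.1) it follows that

$$\frac{1}{2\log 2}\int f\,d\lambda \le \int f\,d\mu \le \frac{1}{\log 2}\int f\,d\lambda \tag{12.0.2}$$

for all non-negative simple functions $f$. By taking increasing sequences of simple functions, we see that (12.0.2) continues to hold for non-negative measurable functions. Let $f \in L^1(X, \mathcal{B}, \mu)$. Then

$$\frac{1}{2\log 2}\int|f|\,d\lambda \le \int|f|\,d\mu < \infty$$

so that $f \in L^1(X, \mathcal{B}, \lambda)$. Similarly, if $f \in L^1(X, \mathcal{B}, \lambda)$ then $f \in L^1(X, \mathcal{B}, \mu)$.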
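The numbers in Solution 3.5 are easy to check by machine. The Python sketch below assumes, as the preimage formula in (i) and the density $1/((1+x)\log 2)$ in (ii) indicate, that $T$ is the Gauss map $T(x) = 1/x \bmod 1$ and $\mu$ is the Gauss measure. It truncates the series for $\lambda(T^{-1}B)$ and also sums $\mu$ over the same intervals; the latter reproduces $\mu(B)$, which is consistent with the invariance of the Gauss measure (the product of the ratios telescopes). The truncation point is an arbitrary choice.

```python
import math

def lebesgue_of_preimage(n_terms):
    """lambda(T^{-1}(1/2, 1)) = sum_n 1/((2n + 1)(n + 1)), truncated after n_terms terms."""
    return sum(1 / ((2 * n + 1) * (n + 1)) for n in range(1, n_terms + 1))

def gauss_measure(a, b):
    """mu([a, b]) = (1/log 2) * integral_a^b dx/(1 + x) = log((1 + b)/(1 + a)) / log 2."""
    return math.log((1 + b) / (1 + a)) / math.log(2)

def gauss_measure_of_preimage(n_terms):
    """mu(T^{-1}(1/2, 1)), summing mu over the intervals (1/(n + 1), 1/(n + 1/2)), truncated."""
    return sum(gauss_measure(1 / (n + 1), 1 / (n + 0.5)) for n in range(1, n_terms + 1))

if __name__ == "__main__":
    N = 200_000
    print(lebesgue_of_preimage(N), math.log(4) - 1, 0.5)
    print(gauss_measure_of_preimage(N), gauss_measure(0.5, 1.0))
```

The truncation error is below $1/(2N)$ for the Lebesgue sum, so a few hundred thousand terms already agree with $\log 4 - 1 \approx 0.386$ to about six decimal places.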

Solution 3.6

Let $[a,b] \subset [0,1]$. Then

$$T^{-1}[a,b] = \bigcup_{j=0}^{k-1}\left[\frac{a+j}{k}, \frac{b+j}{k}\right]$$

so that

$$T_*\mu([a,b]) = \sum_{j=0}^{k-1}\left(\frac{b+j}{k} - \frac{a+j}{k}\right) = \sum_{j=0}^{k-1}\frac{b-a}{k} = b - a = \mu([a,b]).$$

Hence $T_*\mu$ and $\mu$ agree on intervals. Hence, by the Hahn-Kolmogorov Extension Theorem, $T_*\mu = \mu$, so that $\mu$ is a $T$-invariant measure.

Solution 3.7

Let $T(x) = \beta x \bmod 1$. Then $T$ has a graph as illustrated in Figure 12.1.

[Figure 12.1: The graph of $T(x) = \beta x \bmod 1$, with $0$, $1/\beta$ and $1$ marked on both axes.]

Let us first show that $T$ does not preserve Lebesgue measure. For this, it is sufficient to find a set $B$ such that $B$ and $T^{-1}B$ do not have equal Lebesgue measure; in fact, almost any reasonable choice of $B$ will suffice, but here is a specific example. Let $\lambda$ denote Lebesgue measure. Take $B = [1/\beta, 1]$. Then $\lambda(B) = 1 - 1/\beta = 1/\beta^2$ (as $\beta - 1 = 1/\beta$). Now $T^{-1}[1/\beta, 1] = [1/\beta^2, 1/\beta]$ (up to a finite set of points, which does not affect Lebesgue measure), so that $\lambda(T^{-1}B) = 1/\beta - 1/\beta^2 = 1/\beta^3 \ne \lambda(B)$.

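We now show that $T$ does preserve the measure $\mu$ defined as in the statement of the question. To do this we again use the Hahn-Kolmogorov Extension Theorem, which tells us that it is sufficient to prove that $\mu(T^{-1}[a,b]) = \mu([a,b])$ for all intervals $[a,b] \subset [0,1]$.

If $[a,b] \subset [0, 1/\beta]$ then

$$T^{-1}[a,b] = \left[\frac{a}{\beta}, \frac{b}{\beta}\right] \cup \left[\frac{a+1}{\beta}, \frac{b+1}{\beta}\right],$$

a disjoint union. Hence, using $1/\beta + 1/\beta^2 = 1$,

$$\begin{aligned}
\mu(T^{-1}[a,b]) &= \frac{1}{\frac{1}{\beta} + \frac{1}{\beta^3}}\left(\frac{b-a}{\beta}\right) + \frac{1/\beta}{\frac{1}{\beta} + \frac{1}{\beta^3}}\left(\frac{(b+1) - (a+1)}{\beta}\right)\\
&= \frac{b-a}{\frac{1}{\beta} + \frac{1}{\beta^3}}\left(\frac{1}{\beta} + \frac{1}{\beta^2}\right) = \mu([a,b]).
\end{aligned}$$

If $[a,b] \subset [1/\beta, 1]$ then $T^{-1}[a,b] = [a/\beta, b/\beta]$ and

$$\mu(T^{-1}[a,b]) = \mu\left(\left[\frac{a}{\beta}, \frac{b}{\beta}\right]\right) = \frac{1}{\frac{1}{\beta} + \frac{1}{\beta^3}}\left(\frac{b-a}{\beta}\right) = \mu([a,b]).$$

If $a < 1/\beta < b$ then we write $[a,b] = [a, 1/\beta] \cup [1/\beta, b]$. Then $T^{-1}[a,b] = T^{-1}[a, 1/\beta] \cup T^{-1}[1/\beta, b]$, and these two sets intersect only in the finite set $T^{-1}\{1/\beta\}$, which has measure zero. Hence

$$\begin{aligned}
\mu(T^{-1}[a,b]) &= \mu(T^{-1}[a, 1/\beta] \cup T^{-1}[1/\beta, b])\\
&= \mu(T^{-1}[a, 1/\beta]) + \mu(T^{-1}[1/\beta, b])\\
&= \mu([a, 1/\beta]) + \mu([1/\beta, b]) = \mu([a,b]).
\end{aligned}$$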
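The case analysis in Solution 3.7 can be checked numerically. The statement of the exercise is not reproduced here, so the Python sketch below reads the density of $\mu$ off the computations above and treats it as an assumption: constant $1/(1/\beta + 1/\beta^3)$ on $[0, 1/\beta]$ and constant $(1/\beta)/(1/\beta + 1/\beta^3)$ on $[1/\beta, 1]$, where $\beta$ is the positive root of $\beta - 1 = 1/\beta$. It then compares $\mu(T^{-1}[a,b])$ with $\mu([a,b])$ on random intervals and rechecks the two Lebesgue measures computed at the start of the solution.

```python
import math
import random

BETA = (1 + math.sqrt(5)) / 2        # positive root of BETA - 1 = 1/BETA
NORM = 1 / BETA + 1 / BETA**3        # normalising constant appearing in Solution 3.7
D0 = 1 / NORM                        # assumed density of mu on [0, 1/BETA]
D1 = (1 / BETA) / NORM               # assumed density of mu on [1/BETA, 1]
C = 1 / BETA                         # the point where T is discontinuous

def mu(a, b):
    """mu([a, b]) for 0 <= a <= b <= 1, integrating the piecewise-constant density."""
    left = max(0.0, min(b, C) - a)   # length of [a, b] inside [0, C]
    right = max(0.0, b - max(a, C))  # length of [a, b] inside [C, 1]
    return D0 * left + D1 * right

def mu_preimage(a, b):
    """mu(T^{-1}[a, b]) for T(x) = BETA x mod 1.

    On [0, C) the map is x -> BETA x, which contributes [a/BETA, b/BETA].
    On [C, 1] the map is x -> BETA x - 1 with image [0, C], so only [a, b] ∩ [0, C]
    has preimages there, namely [(a + 1)/BETA, (min(b, C) + 1)/BETA].
    """
    total = mu(a / BETA, b / BETA)
    hi = min(b, C)
    if a < hi:
        total += mu((a + 1) / BETA, (hi + 1) / BETA)
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    a, b = sorted((rng.random(), rng.random()))
    print(mu(a, b), mu_preimage(a, b))
```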

Solution 3.8

(i) Note that

$$\mu([0,1]) = \frac{1}{\pi}\int_0^1\frac{1}{\sqrt{x(1-x)}}\,dx = \frac{2}{\pi}\int_0^{\pi/2}d\theta = 1,$$

using the substitution $x = \sin^2\theta$.

(ii) By the Hahn-Kolmogorov Extension Theorem it is sufficient to prove that $\mu([a,b]) = \mu(T^{-1}[a,b])$ for all intervals $[a,b]$.

Note that

$$T^{-1}[a,b] = \left[\frac{1-\sqrt{1-a}}{2}, \frac{1-\sqrt{1-b}}{2}\right] \cup \left[\frac{1+\sqrt{1-b}}{2}, \frac{1+\sqrt{1-a}}{2}\right]$$

(as the graph of $T$ is decreasing on $[1/2, 1]$, the order of $a, b$ is reversed in the second sub-interval). It is sufficient to prove that

$$\mu\left(\left[\frac{1-\sqrt{1-a}}{2}, \frac{1-\sqrt{1-b}}{2}\right]\right) = \frac{1}{2}\mu([a,b])$$

and

$$\mu\left(\left[\frac{1+\sqrt{1-b}}{2}, \frac{1+\sqrt{1-a}}{2}\right]\right) = \frac{1}{2}\mu([a,b]).$$

We prove the first equality (the second is similar). Now

$$\mu\left(\left[\frac{1-\sqrt{1-a}}{2}, \frac{1-\sqrt{1-b}}{2}\right]\right) = \frac{1}{\pi}\int_{(1-\sqrt{1-a})/2}^{(1-\sqrt{1-b})/2}\frac{1}{\sqrt{x(1-x)}}\,dx. \tag{12.0.3}$$

Consider the substitution $u = 4x(1-x)$. Then $du = 4(1-2x)\,dx$ and, as $x$ ranges between $(1-\sqrt{1-a})/2$ and $(1-\sqrt{1-b})/2$, $u$ ranges between $a$ and $b$. Note also that a simple manipulation shows that $(1-2x)^2 = 1-u$, and that $x(1-x) = u/4$. Hence the right-hand side of (12.0.3) is equal to

$$\frac{1}{2\pi}\int_a^b\frac{1}{\sqrt{u(1-u)}}\,du = \frac{1}{2}\mu([a,b]).$$

Similarly,

$$\mu\left(\left[\frac{1+\sqrt{1-b}}{2}, \frac{1+\sqrt{1-a}}{2}\right]\right) = \frac{1}{2}\mu([a,b])$$

and the result follows.
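A numerical companion to Solution 3.8, assuming (consistently with the preimage formula above) that $T(x) = 4x(1-x)$. The Python sketch uses the closed form $F(x) = \frac{2}{\pi}\arcsin\sqrt{x}$ for $\mu([0,x])$, whose derivative is $1/(\pi\sqrt{x(1-x)})$, and checks on random intervals that each of the two preimage intervals carries exactly half of $\mu([a,b])$.

```python
import math
import random

def T(x):
    """Logistic map T(x) = 4x(1 - x) (assumed to be the map in the exercise)."""
    return 4 * x * (1 - x)

def F(x):
    """mu([0, x]) = (2/pi) arcsin(sqrt(x)); its derivative is 1/(pi sqrt(x(1 - x)))."""
    return (2 / math.pi) * math.asin(math.sqrt(x))

def mu(a, b):
    """mu([a, b]) for 0 <= a <= b <= 1."""
    return F(b) - F(a)

def preimage(a, b):
    """The two intervals making up T^{-1}[a, b], as in Solution 3.8(ii)."""
    left = ((1 - math.sqrt(1 - a)) / 2, (1 - math.sqrt(1 - b)) / 2)
    right = ((1 + math.sqrt(1 - b)) / 2, (1 + math.sqrt(1 - a)) / 2)
    return left, right

if __name__ == "__main__":
    rng = random.Random(1)
    a, b = sorted((rng.random(), rng.random()))
    left, right = preimage(a, b)
    print(mu(a, b), mu(*left) + mu(*right))
```

Writing $a = \sin^2\varphi$ shows why this works: the left endpoint $(1-\cos\varphi)/2 = \sin^2(\varphi/2)$, so $F$ halves the angle.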

Solution 3.9

Note that

$$X = \bigcup_{n=1}^\infty\left(\frac{1}{n+1}, \frac{1}{n}\right] \cup \{0\}$$

and that this is a disjoint union. Hence, denoting Lebesgue measure by $\mu$,

$$1 = \mu(X) = \sum_{n=1}^\infty\mu\left(\left(\frac{1}{n+1}, \frac{1}{n}\right]\right) = \sum_{n=1}^\infty\left(\frac{1}{n} - \frac{1}{n+1}\right) = \sum_{n=1}^\infty\frac{1}{n(n+1)}.$$

By the Hahn-Kolmogorov Extension Theorem, it is sufficient to check that $\mu(T^{-1}[a,b]) = \mu([a,b])$ for all intervals $[a,b]$. It is straightforward to check that

$$T^{-1}[a,b] = \bigcup_{n=1}^\infty\left[\frac{a+n}{n(n+1)}, \frac{b+n}{n(n+1)}\right]$$

and that this is a disjoint union. Hence

$$\mu(T^{-1}[a,b]) = \sum_{n=1}^\infty\mu\left(\left[\frac{a+n}{n(n+1)}, \frac{b+n}{n(n+1)}\right]\right) = \sum_{n=1}^\infty\frac{b-a}{n(n+1)} = b - a = \mu([a,b]).$$

Solution 3.10

(i) Clearly $d(x,y) \ge 0$ with equality if and only if $x = y$. It is also clear that $d(x,y) = d(y,x)$. It remains to prove the triangle inequality: $d(x,z) \le d(x,y) + d(y,z)$ for all $x, y, z \in \Sigma$.

If any of $x, y, z$ are equal then the triangle inequality is clear, so we can assume that $x, y, z$ are all distinct. Suppose that $x$ and $y$ agree in the first $n$ places and that $y$ and $z$ agree in the first $m$ places. Then $x$ and $z$ agree in at least the first $\min\{n, m\}$ places. Hence $n(x,z) \ge \min\{n(x,y), n(y,z)\}$. Hence

$$d(x,z) = \frac{1}{2^{n(x,z)}} \le \frac{1}{2^{\min\{n(x,y),\,n(y,z)\}}} \le \frac{1}{2^{n(x,y)}} + \frac{1}{2^{n(y,z)}} = d(x,y) + d(y,z).$$

(ii) Let $\varepsilon > 0$ and choose $n \ge 1$ such that $1/2^n < \varepsilon$. Choose $\delta = 1/2^{n+1}$. Suppose that $d(x,y) < \delta = 1/2^{n+1}$. Then $n(x,y) > n+1$, i.e. $x$ and $y$ agree in at least the first $n+1$ places. Hence $\sigma(x)$ and $\sigma(y)$ agree in at least the first $n$ places. Hence $n(\sigma(x), \sigma(y)) > n$. Hence $d(\sigma(x), \sigma(y)) < 1/2^n < \varepsilon$.

(iii) We show that $[i_0, \ldots, i_{n-1}]$ is open. Let $x \in [i_0, \ldots, i_{n-1}]$, so that $x_j = i_j$ for $j = 0, 1, \ldots, n-1$. Choose $\varepsilon = 1/2^n$. Suppose that $d(x,y) < \varepsilon$. Then $n(x,y) > n$, i.e. $x$ and $y$ agree in at least the first $n$ places. Hence $x_j = y_j$ for $j = 0, 1, \ldots, n-1$. Hence $y_j = i_j$ for $j = 0, 1, \ldots, n-1$, so that $y \in [i_0, \ldots, i_{n-1}]$. Hence $[i_0, \ldots, i_{n-1}]$ is open.

To see that $[i_0, \ldots, i_{n-1}]$ is closed, note that

$$\Sigma\setminus[i_0, \ldots, i_{n-1}] = \bigcup[i'_0, \ldots, i'_{n-1}]$$

where the union is over all $n$-tuples $(i'_0, i'_1, \ldots, i'_{n-1}) \ne (i_0, i_1, \ldots, i_{n-1})$. This is a finite union of open sets, and so is open. Hence $[i_0, \ldots, i_{n-1}]$, as the complement of an open set, is closed.

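Solution 3.11

First note that

$$P = \begin{pmatrix} 0 & 1 & 0 & 0 & 0\\ 1/4 & 0 & 3/4 & 0 & 0\\ 0 & 1/2 & 0 & 1/2 & 0\\ 0 & 0 & 3/4 & 0 & 1/4\\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}, \qquad P^2 = \begin{pmatrix} 1/4 & 0 & 3/4 & 0 & 0\\ 0 & 5/8 & 0 & 3/8 & 0\\ 1/8 & 0 & 3/4 & 0 & 1/8\\ 0 & 3/8 & 0 & 5/8 & 0\\ 0 & 0 & 3/4 & 0 & 1/4 \end{pmatrix},$$

$$P^3 = \begin{pmatrix} 0 & 5/8 & 0 & 3/8 & 0\\ 5/32 & 0 & 3/4 & 0 & 3/32\\ 0 & 1/2 & 0 & 1/2 & 0\\ 3/32 & 0 & 3/4 & 0 & 5/32\\ 0 & 3/8 & 0 & 5/8 & 0 \end{pmatrix}, \qquad P^4 = \begin{pmatrix} 5/32 & 0 & 3/4 & 0 & 3/32\\ 0 & 17/32 & 0 & 15/32 & 0\\ 1/8 & 0 & 3/4 & 0 & 1/8\\ 0 & 15/32 & 0 & 17/32 & 0\\ 3/32 & 0 & 3/4 & 0 & 5/32 \end{pmatrix}.$$

As for each $i, j$ there exists $n$ for which $P^n(i,j) > 0$, it follows that $P$ is irreducible. Recall that the period of $P$ is the highest common factor of $\{n > 0 \mid P^n(i,i) > 0\}$. As all the diagonal entries of $P^2$ are positive, the period divides 2; as $P^n(i,i) = 0$ for every odd $n$ (by the decomposition below, a path of odd length always moves between $S_0$ and $S_1$), it follows that $P$ has period 2.

Decompose $\{1,2,3,4,5\} = \{1,3,5\} \cup \{2,4\} = S_0 \cup S_1$. If $P(i,j) > 0$ then either $i \in S_0$ and $j \in S_1$, or $i \in S_1$ and $j \in S_0$, i.e. $i \in S_\ell$ and $j \in S_{\ell+1 \bmod 2}$. When restricted to the indices $\{1,3,5\}$, $P^2$ has the form

$$\begin{pmatrix} 1/4 & 3/4 & 0\\ 1/8 & 3/4 & 1/8\\ 0 & 3/4 & 1/4 \end{pmatrix}$$

which is easily seen to be irreducible and aperiodic. When restricted to the indices $\{2,4\}$, $P^2$ has the form

$$\begin{pmatrix} 5/8 & 3/8\\ 3/8 & 5/8 \end{pmatrix}$$

which is clearly irreducible and aperiodic.

The eigenvalues of $P$ are found by evaluating the determinant

$$\det(P - \lambda I) = \begin{vmatrix} -\lambda & 1 & 0 & 0 & 0\\ 1/4 & -\lambda & 3/4 & 0 & 0\\ 0 & 1/2 & -\lambda & 1/2 & 0\\ 0 & 0 & 3/4 & -\lambda & 1/4\\ 0 & 0 & 0 & 1 & -\lambda \end{vmatrix}.$$

After simplifying this expression, we obtain

$$(1-\lambda)(1+\lambda)\lambda\left(\lambda^2 - \frac{1}{4}\right),$$

so that the eigenvalues of $P$ are $1, -1, 0, 1/2, -1/2$. (Note that, as $P$ has period 2, we expect from the Perron-Frobenius Theorem that the square roots of 1 are the eigenvalues of modulus 1 for $P$.)

A left eigenvector $p = (p(1), p(2), p(3), p(4), p(5))$ for the eigenvalue 1 is determined by

$$(p(1), p(2), p(3), p(4), p(5))\begin{pmatrix} 0 & 1 & 0 & 0 & 0\\ 1/4 & 0 & 3/4 & 0 & 0\\ 0 & 1/2 & 0 & 1/2 & 0\\ 0 & 0 & 3/4 & 0 & 1/4\\ 0 & 0 & 0 & 1 & 0 \end{pmatrix} = (p(1), p(2), p(3), p(4), p(5))$$

which simplifies to

$$\frac{1}{4}p(2) = p(1), \quad p(1) + \frac{1}{2}p(3) = p(2), \quad \frac{3}{4}p(2) + \frac{3}{4}p(4) = p(3), \quad \frac{1}{2}p(3) + p(5) = p(4), \quad \frac{1}{4}p(4) = p(5).$$

Setting $p(1) = 1$ we obtain $(p(1), p(2), p(3), p(4), p(5)) = (1, 4, 6, 4, 1)$, and normalising this to form a probability vector we obtain

$$p = \left(\frac{1}{16}, \frac{1}{4}, \frac{3}{8}, \frac{1}{4}, \frac{1}{16}\right).$$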
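All the matrix computations in Solution 3.11 can be confirmed in exact rational arithmetic. The Python sketch below (illustrative; the helper names are ours) uses `fractions.Fraction` to form powers of $P$, checks that $pP = p$, and evaluates $\det(P - \lambda I)$ by the Leibniz formula, which is perfectly adequate for a $5\times 5$ matrix.

```python
from fractions import Fraction as F
from itertools import permutations

# Transition matrix from Solution 3.11 (each row sums to 1).
P = [
    [F(0), F(1), F(0), F(0), F(0)],
    [F(1, 4), F(0), F(3, 4), F(0), F(0)],
    [F(0), F(1, 2), F(0), F(1, 2), F(0)],
    [F(0), F(0), F(3, 4), F(0), F(1, 4)],
    [F(0), F(0), F(0), F(1), F(0)],
]

def matmul(A, B):
    """Exact product of two square matrices of Fractions."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def left_multiply(v, A):
    """Row vector times matrix: (vA)(j) = sum_i v(i) A(i, j)."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def det(A):
    """Leibniz expansion; exact, and fast enough for a 5 x 5 matrix (120 permutations)."""
    n = len(A)
    total = F(0)
    for perm in permutations(range(n)):
        inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
        term = F(-1) if inversions % 2 else F(1)
        for i in range(n):
            term *= A[i][perm[i]]
        total += term
    return total

def char_poly_at(lam):
    """det(P - lam * I)."""
    return det([[P[i][j] - (lam if i == j else 0) for j in range(5)] for i in range(5)])

if __name__ == "__main__":
    P2 = matmul(P, P)
    print([P2[i][i] for i in range(5)])  # the diagonal of P^2, all positive
    p = [F(1, 16), F(1, 4), F(3, 8), F(1, 4), F(1, 16)]
    print(left_multiply(p, P) == p)
```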

Solution 3.12

Let $p = (p(1), \ldots, p(k))$ be a probability vector. Let $P$ be the matrix

$$P = \begin{pmatrix} p(1) & p(2) & \cdots & p(k)\\ p(1) & p(2) & \cdots & p(k)\\ \vdots & \vdots & & \vdots\\ p(1) & p(2) & \cdots & p(k) \end{pmatrix}.$$

Then $P$ is a stochastic matrix. As each $p(j) > 0$, every entry of $P$ is positive, so that $P$ is irreducible and aperiodic. It is straightforward to check that $pP = p$: indeed, $(pP)(j) = \sum_{i=1}^kp(i)p(j) = p(j)$.

As $P(i,j) = p(j)$, the Markov measure determined by the matrix $P$ is the same as the Bernoulli measure determined by the probability vector $p$.



Solution 4.1

Note that

$$\chi_{T^{-1}B}(x) = 1 \iff x \in T^{-1}B \iff T(x) \in B \iff \chi_B(T(x)) = 1.$$

Hence $\chi_{T^{-1}B} = \chi_B \circ T$.

Solution 4.2

Note that $T^n(x) = x$ if and only if $2^nx = x \bmod 1$, i.e. $2^nx = x + p$ for some integer $p$. Hence $x = p/(2^n - 1)$. We get distinct values of $x$ in $\mathbb{R}/\mathbb{Z}$ when $p = 0, 1, \ldots, 2^n - 2$ (note that when $p = 2^n - 1$ then $x = 1$, which is the same as 0 in $\mathbb{R}/\mathbb{Z}$). Thus there are exactly $2^n - 1$ points $x$ with $T^n(x) = x$.

Hence there are infinitely many distinct periodic orbits for the doubling map (each periodic orbit is finite, but the number of periodic points is unbounded as $n$ grows). If $x, Tx, \ldots, T^{n-1}x$ is a periodic orbit of period $n$ then let $\delta(x) = \frac{1}{n}\sum_{j=0}^{n-1}\delta_{T^jx}$ denote the periodic orbit measure supported on the orbit of $x$. As there are infinitely many distinct periodic orbits, there are infinitely many distinct measures supported on periodic orbits.
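The count in Solution 4.2 can be reproduced exactly with rational arithmetic. The Python sketch below (illustrative only) lists the solutions $x = p/(2^n - 1)$ of $T^n(x) = x$, confirms that each one returns to itself after $n$ steps, and groups them into orbits, keeping those whose least period is exactly $n$; the growing number of such orbits is what gives infinitely many distinct periodic orbit measures.

```python
from fractions import Fraction

def doubling(x):
    """The doubling map T(x) = 2x mod 1, computed exactly on rationals in [0, 1)."""
    return (2 * x) % 1

def iterate(x, n):
    """T^n(x)."""
    for _ in range(n):
        x = doubling(x)
    return x

def points_with_period(n):
    """All solutions of T^n(x) = x in R/Z: x = p/(2^n - 1), p = 0, ..., 2^n - 2."""
    return [Fraction(p, 2**n - 1) for p in range(2**n - 1)]

def orbits_of_least_period(n):
    """Group the solutions of T^n(x) = x into orbits and keep those of length exactly n."""
    seen, orbits = set(), []
    for x in points_with_period(n):
        if x in seen:
            continue
        orbit = [x]
        y = doubling(x)
        while y != x:
            orbit.append(y)
            y = doubling(y)
        seen.update(orbit)
        if len(orbit) == n:
            orbits.append(orbit)
    return orbits

if __name__ == "__main__":
    for n in range(1, 7):
        print(n, len(points_with_period(n)), len(orbits_of_least_period(n)))
```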

Solution 4.3

Recall that $\mathbb{R}/\mathbb{Z}$ can be regarded as $[0,1]$ where 0 and 1 are identified. Suppose that $f : [0,1] \to \mathbb{R}$ is integrable and that $f(0) = f(1)$, so that $f$ is a well-defined function on $\mathbb{R}/\mathbb{Z}$. Then

$$\begin{aligned}
\int f\circ T\,d\mu &= \int_0^{1/2}f\circ T\,d\mu + \int_{1/2}^1f\circ T\,d\mu\\
&= \int_0^{1/2}f(2x)\,dx + \int_{1/2}^1f(2x-1)\,dx \qquad\text{(12.0.4)}\\
&= \frac{1}{2}\int_0^1f(x)\,dx + \frac{1}{2}\int_0^1f(x)\,dx\\
&= \int f\,d\mu
\end{aligned}$$

where we have used the substitution $u(x) = 2x$ for the first integral and $u(x) = 2x - 1$ for the second integral in (12.0.4).

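Solution 4.4

It is straightforward to check that $T : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{R}^2/\mathbb{Z}^2$ is a diffeomorphism.

Recall that we can identify functions $f : \mathbb{R}^2/\mathbb{Z}^2 \to \mathbb{C}$ with functions $f : \mathbb{R}^2 \to \mathbb{C}$ that satisfy $f(x + n) = f(x)$ for all $n \in \mathbb{Z}^2$. We apply the change of variables formula with the substitution $u(x) = T(x)$. Note that

$$DT(x) = \begin{pmatrix} 1 & 0\\ 1 & 1 \end{pmatrix}$$

so that $|\det DT| = 1$. Hence, by the change of variables formula,

$$\int f\circ T\,d\mu = \int_{\mathbb{R}^2/\mathbb{Z}^2}f\circ T\,|\det DT|\,d\mu = \int_{T(\mathbb{R}^2/\mathbb{Z}^2)}f\,d\mu = \int f\,d\mu.$$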

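Solution 5.1

Let $\alpha = p/q$ with $p, q \in \mathbb{Z}$, $q \ne 0$, $\operatorname{hcf}(p,q) = 1$. Let

$$B = \bigcup_{j=0}^{q-1}\left[\frac{j}{q}, \frac{j}{q} + \frac{1}{2q}\right].$$

Then

$$T^{-1}\left[\frac{j}{q}, \frac{j}{q} + \frac{1}{2q}\right] = \left[\frac{j-p}{q}, \frac{j-p}{q} + \frac{1}{2q}\right]$$

(with $j - p$ read mod $q$), so that $T^{-1}B = B$ (draw a picture to understand this better). However $\mu(B) = 1/2$, so that $T$ is not ergodic with respect to Lebesgue measure.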

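Solution 5.2

Suppose that $f \in L^2(X, \mathcal{B}, \mu)$ is $T$-invariant, i.e. $f\circ T = f$ $\mu$-a.e., and that $f$ has Fourier series

$$\sum_{(n,m)\in\mathbb{Z}^2}c_{(n,m)}e^{2\pi i(nx+my)}.$$

Then $f\circ T$ has Fourier series

$$\sum_{(n,m)\in\mathbb{Z}^2}c_{(n,m)}e^{2\pi i(n(x+\alpha)+m(x+y))} = \sum_{(n,m)\in\mathbb{Z}^2}c_{(n,m)}e^{2\pi in\alpha}e^{2\pi i((n+m)x+my)}.$$

Comparing coefficients (as $f\circ T = f$) we see that

$$c_{(n+m,m)} = e^{2\pi in\alpha}c_{(n,m)}.$$

Suppose that $m \ne 0$. Then for each $j > 0$,

$$|c_{(n+jm,m)}| = \cdots = |c_{(n+m,m)}| = |c_{(n,m)}|,$$

as $|e^{2\pi in\alpha}| = 1$. Note that if $m \ne 0$ then $(n+jm, m) \to \infty$ as $j \to \infty$. By the Riemann-Lebesgue Lemma (Proposition 5.3.2(ii)), we must have that $c_{(n,m)} = 0$ if $m \ne 0$. Hence $f$ has Fourier series

$$\sum_{n\in\mathbb{Z}}c_{(n,0)}e^{2\pi inx}$$

and $f\circ T$ has Fourier series

$$\sum_{n\in\mathbb{Z}}c_{(n,0)}e^{2\pi in\alpha}e^{2\pi inx}.$$

Comparing coefficients again, we see that

$$c_{(n,0)} = c_{(n,0)}e^{2\pi in\alpha}.$$

Suppose that $n \ne 0$. As $\alpha \notin \mathbb{Q}$, $e^{2\pi in\alpha} \ne 1$. Hence $c_{(n,0)} = 0$ unless $n = 0$. Hence $f$ has Fourier series $c_{(0,0)}$, i.e. $f$ is constant a.e. Hence $T$ is ergodic with respect to Lebesgue measure.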

Solution 5.3

Suppose that $T : X \to X$ has a periodic point $x$ with period $n$. Let

$$\mu = \frac{1}{n}\sum_{j=0}^{n-1}\delta_{T^jx}.$$

Let $B \in \mathcal{B}$ and suppose that $T^{-1}B = B$. We must show that $\mu(B) = 0$ or 1.

Suppose that $x \in B$. Then $x \in T^{-1}B$. Hence $T(x) \in B$. Continuing inductively, we see that $T^j(x) \in B$ for $j = 0, 1, \ldots, n-1$. Hence

$$\mu(B) = \frac{1}{n}\sum_{j=0}^{n-1}\delta_{T^jx}(B) = \frac{1}{n}\sum_{j=0}^{n-1}1 = 1.$$

Similarly, if $x \in X\setminus B$ then $T^j(x) \in X\setminus B$ for $j = 0, 1, \ldots, n-1$ (we have used the fact that if $B$ is $T$-invariant then $X\setminus B$ is $T$-invariant). Hence $\mu(B) = 0$.

Solution 5.4

(i) Recall that the determinant of a matrix is equal to the product of all the eigenvalues.

Let $T$ be a linear toral automorphism with corresponding matrix $A$. Suppose that $A$ has an eigenvalue of modulus 1. By considering $A^2$ if necessary, there is no loss in generality in assuming that $\det A = 1$. If this eigenvalue is real then it is $\pm 1$, a root of unity, so we may assume that it is not real.

Suppose $k = 2$. Then the matrix $A$ has two eigenvalues, $\lambda, \bar\lambda$. As $A$ has an eigenvalue of modulus 1, we must have that $\lambda = e^{2\pi i\theta}$ for some $\theta \in [0,1)$. Then $\lambda, \bar\lambda$ satisfy the equation $\lambda^2 - 2\cos(2\pi\theta)\lambda + 1 = 0$. However the matrix $A = \begin{pmatrix} a & b\\ c & d \end{pmatrix}$ has characteristic equation $\lambda^2 - (a+d)\lambda + 1 = 0$. Hence $2\cos(2\pi\theta) = a + d$, an integer. Thus $\cos(2\pi\theta) \in \{0, \pm 1/2, \pm 1\}$, so that $\theta$ is a multiple of $1/12$ and $\lambda^{12} = 1$. Hence $\lambda$ is a root of unity and so $T$ cannot be ergodic.

Now suppose $k = 3$. Then, assuming that $A$ has a non-real eigenvalue of modulus 1, the eigenvalues must be $\lambda = e^{2\pi i\theta}$, $\bar\lambda$ and $\mu \in \mathbb{R}$. As $\det A = 1$, we must have that $\lambda\bar\lambda\mu = 1$. As $\lambda\bar\lambda = 1$, it follows that $\mu = 1$. Hence $A$ has 1 as an eigenvalue and so $T$ cannot be ergodic.

Thus $k \ge 4$.

(ii) $A$ has integer entries and it is easy to see that $\det A = 1$. Hence $A$ determines a linear toral automorphism of $\mathbb{R}^4/\mathbb{Z}^4$.

(iii) It is straightforward to calculate that the characteristic equation for $A$ is

$$\lambda^4 - 8\lambda^3 + 6\lambda^2 - 8\lambda + 1 = 0.$$

Clearly, $\lambda \ne 0$. Dividing by $\lambda^2$ and substituting $u = \lambda + \lambda^{-1}$ (so that $\lambda^2 + \lambda^{-2} = u^2 - 2$) we see that

$$u^2 - 8u + 4 = 0.$$

Hence

$$u = 4 \pm 2\sqrt{3}.$$

Multiplying $\lambda + \lambda^{-1} = u$ by $\lambda$ we obtain a quadratic in $\lambda$ with solution

$$\lambda = \frac{u \pm \sqrt{u^2 - 4}}{2}.$$

Substituting the two different values of $u$ gives four values of $\lambda$, namely:

$$2 + \sqrt{3} \pm \sqrt{6 + 4\sqrt{3}}, \qquad 2 - \sqrt{3} \pm i\sqrt{4\sqrt{3} - 6}.$$

The first two are real and not of unit modulus, whereas the second two are complex numbers of unit modulus.



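(iv) This question is not part of the course and is included for completeness only. The solution requires ideas from Galois theory.

We first claim that $f(\lambda) = \lambda^4 - 8\lambda^3 + 6\lambda^2 - 8\lambda + 1$ is irreducible over $\mathbb{Q}$. To see this, recall (Gauss's Lemma) that for a monic polynomial with integer coefficients, irreducibility over $\mathbb{Q}$ is equivalent to irreducibility over $\mathbb{Z}$. Any rational root would be $\pm 1$, but $f(1) = -8$ and $f(-1) = 24$, so $f$ has no linear factor. Suppose that $f(\lambda) = (\lambda^2 + a\lambda + b)(\lambda^2 + c\lambda + d)$ with $a, b, c, d \in \mathbb{Z}$. Then $bd = 1$. If $b = d = 1$ then comparing coefficients gives $a + c = -8$ and $ac = 4$, so that $a, c = -4 \pm 2\sqrt{3}$, which are not integers; if $b = d = -1$ then $a + c = -8$ and $ac = 8$, so that $a, c = -4 \pm 2\sqrt{2}$, again not integers. Hence $f(\lambda)$ is irreducible over $\mathbb{Z}$, and so over $\mathbb{Q}$.

Now suppose that an eigenvalue $\lambda$ of $A$ were a root of unity, say $\lambda^n = 1$. As $f$ is irreducible and $f(\lambda) = 0$, $f$ is the minimal polynomial of $\lambda$ over $\mathbb{Q}$, so $f(\lambda)$ divides $\lambda^n - 1$. Then every root of $f$ would be a root of unity, contradicting the fact that the root $2 + \sqrt{3} + \sqrt{6 + 4\sqrt{3}}$ has modulus greater than 1. Hence $\lambda$ is not a root of unity.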
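The algebra in Solution 5.4(iii) and (iv) can be double-checked numerically. The Python sketch below (illustrative only; the helper names are ours) recomputes the four roots through the substitution $u = \lambda + \lambda^{-1}$, confirms that they satisfy the characteristic equation and that exactly two lie on the unit circle, and runs a bounded search showing that no factorisation into monic integer quadratics exists. The search bound is arbitrary, but any integer solution would have $|a| \le 8$.

```python
import cmath
import math

COEFFS = (1, -8, 6, -8, 1)  # lambda^4 - 8 lambda^3 + 6 lambda^2 - 8 lambda + 1

def char_poly(lam):
    """Evaluate the characteristic polynomial by Horner's rule (works for complex lam)."""
    value = 0
    for c in COEFFS:
        value = value * lam + c
    return value

def roots_via_u():
    """Solve u^2 - 8u + 4 = 0, then lambda^2 - u lambda + 1 = 0, as in Solution 5.4(iii)."""
    roots = []
    for u in (4 + 2 * math.sqrt(3), 4 - 2 * math.sqrt(3)):
        disc = cmath.sqrt(u * u - 4)  # purely imaginary when u^2 < 4
        roots += [(u + disc) / 2, (u - disc) / 2]
    return roots

def monic_quadratic_factorisations(search=50):
    """Integer (a, b, c, d) with (x^2 + a x + b)(x^2 + c x + d) equal to the polynomial."""
    found = []
    for b, d in ((1, 1), (-1, -1)):       # bd must equal the constant term 1
        for a in range(-search, search + 1):
            c = -8 - a                     # forced by the x^3 coefficient
            expanded = (1, a + c, a * c + b + d, a * d + b * c, b * d)
            if expanded == COEFFS:
                found.append((a, b, c, d))
    return found

if __name__ == "__main__":
    for lam in roots_via_u():
        print(lam, abs(lam), abs(char_poly(lam)))
    print(monic_quadratic_factorisations())
```

A numerical check like this cannot, of course, show that the unit-modulus roots are not roots of unity; that is what the irreducibility argument above is for.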

Solution 6.1

(i) Let $\mu$ denote Lebesgue measure. We prove that $T_*\mu = \mu$ by using the Hahn-Kolmogorov Extension Theorem. It is sufficient to prove that $T_*\mu([a,b]) = \mu([a,b])$ for all intervals $[a,b]$. Note that

$$T^{-1}[a,b] = \left[\frac{a}{2}, \frac{b}{2}\right] \cup \left[1 - \frac{b}{2}, 1 - \frac{a}{2}\right].$$

Hence

$$T_*\mu([a,b]) = \frac{b}{2} - \frac{a}{2} + \left(1 - \frac{a}{2}\right) - \left(1 - \frac{b}{2}\right) = b - a = \mu([a,b]).$$

Hence $\mu$ is a $T$-invariant measure.

(ii) Define $I(0) = [0, 1/2]$, $I(1) = [1/2, 1]$ and define the maps

$$\phi_0 : [0,1] \to I(0) : x \mapsto \frac{x}{2}, \qquad \phi_1 : [0,1] \to I(1) : x \mapsto 1 - \frac{x}{2}.$$

Then $T\phi_0(x) = x$ and $T\phi_1(x) = x$.

Given $i_0, \ldots, i_{n-1} \in \{0,1\}$ define

$$\phi_{i_0,i_1,\ldots,i_{n-1}} = \phi_{i_0}\phi_{i_1}\cdots\phi_{i_{n-1}}$$

and note that $T^n\phi_{i_0,i_1,\ldots,i_{n-1}}(x) = x$ for all $x \in [0,1]$.

Define $I(i_0, i_1, \ldots, i_{n-1}) = \phi_{i_0,i_1,\ldots,i_{n-1}}([0,1])$ and call this a cylinder of rank $n$. It is easy to see that cylinders of rank $n$ are dyadic intervals (although the labelling of these cylinders is not the same as the labelling that one gets when using the doubling map: for example, for the tent map $I(1,1) = [1/2, 3/4]$ whereas for the doubling map $I(1,1) = [3/4, 1]$). Hence the algebra $\mathcal{A}$ of finite unions of cylinders generates the Borel σ-algebra.

Let $B \in \mathcal{B}$ be such that $T^{-1}B = B$. Note that $T^{-n}B = B$. Let $I = I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder of rank $n$ and let $\phi = \phi_{i_0,i_1,\ldots,i_{n-1}}$. Then $T^n\phi(x) = x$. Note also that $\mu(I) = 1/2^n$. We will also need the fact that $|\phi'(x)| = 1/2^n$ (this follows from noting that $|\phi_0'(x)| = |\phi_1'(x)| = 1/2$ and using the chain rule).

Finally, we observe that

$$\begin{aligned}
\mu(B\cap I) &= \int\chi_{B\cap I}(x)\,dx = \int\chi_B(x)\chi_I(x)\,dx = \int_I\chi_B(x)\,dx\\
&= \int_0^1\chi_B(\phi(x))|\phi'(x)|\,dx && \text{by the change of variables formula}\\
&= \int_0^1\chi_{T^{-n}B}(\phi(x))|\phi'(x)|\,dx && \text{as } T^{-n}B = B\\
&= \int_0^1\chi_B(T^n(\phi(x)))|\phi'(x)|\,dx\\
&= \int_0^1\chi_B(x)|\phi'(x)|\,dx && \text{as } T^n\phi(x) = x\\
&= \frac{1}{2^n}\int_0^1\chi_B(x)\,dx && \text{as } |\phi'(x)| = 1/2^n\\
&= \mu(I)\mu(B) && \text{as } \mu(I) = 1/2^n.
\end{aligned}$$

Hence $\mu(B\cap I) = \mu(B)\mu(I)$ for all sets $I$ in the algebra of cylinders. By Lemma 6.1.1 it follows that $\mu(B) = 0$ or 1. Hence Lebesgue measure is an ergodic measure for $T$.

Solution 6.2

For each $n \ge 1$ define $I(n) = [1/(n+1), 1/n]$ and define the maps

$$\phi_n : [0,1] \to I(n) : x \mapsto \frac{x+n}{n(n+1)}.$$

Note that $T\phi_n(x) = x$ for all $x \in [0,1]$. Given $i_0, i_1, \ldots, i_{n-1} \in \mathbb{N}$ define

$$\phi_{i_0,i_1,\ldots,i_{n-1}} = \phi_{i_0}\phi_{i_1}\cdots\phi_{i_{n-1}}$$

and note that $T^n\phi_{i_0,i_1,\ldots,i_{n-1}}(x) = x$ for all $x \in [0,1]$. Define $I(i_0, i_1, \ldots, i_{n-1}) = \phi_{i_0,i_1,\ldots,i_{n-1}}([0,1])$ and call this a cylinder of rank $n$. Note that

$$\phi_n'(x) = \frac{1}{n(n+1)} \le \frac{1}{2}$$

so that, by the chain rule,

$$\phi_{i_0,i_1,\ldots,i_{n-1}}'(x) = \prod_{j=0}^{n-1}\frac{1}{i_j(i_j+1)} \le \frac{1}{2^n}.$$

As $\phi_{i_0,i_1,\ldots,i_{n-1}}$ is a composition of affine maps, $I(i_0, i_1, \ldots, i_{n-1})$ is an interval of length no more than $1/2^n$. For each $n$, the cylinders of rank $n$ partition $[0,1]$. Let $x, y \in [0,1]$ and suppose that $x \ne y$. Choose $n$ such that $|x-y| > 1/2^n$. Then $x, y$ must lie in different cylinders of rank $n$. Hence the cylinders separate the points of $[0,1]$. By Proposition 2.4.2 it follows that the algebra $\mathcal{A}$ of finite unions of cylinders generates the Borel σ-algebra.

Let $B \in \mathcal{B}$ be such that $T^{-1}B = B$. Note that $T^{-n}B = B$. Let $I = I(i_0, i_1, \ldots, i_{n-1})$ be a cylinder of rank $n$ and let $\phi = \phi_{i_0,i_1,\ldots,i_{n-1}}$. Then $T^n\phi(x) = x$. Note that

$$\mu(I) = \prod_{j=0}^{n-1}\frac{1}{i_j(i_j+1)} = \phi'(x) \tag{12.0.5}$$

for any $x \in [0,1]$. Finally, we observe that

$$\begin{aligned}
\mu(B\cap I) &= \int\chi_{B\cap I}(x)\,dx = \int\chi_B(x)\chi_I(x)\,dx = \int_I\chi_B(x)\,dx\\
&= \int_0^1\chi_B(\phi(x))|\phi'(x)|\,dx && \text{by the change of variables formula}\\
&= \int_0^1\chi_{T^{-n}B}(\phi(x))|\phi'(x)|\,dx && \text{as } T^{-n}B = B\\
&= \int_0^1\chi_B(T^n(\phi(x)))|\phi'(x)|\,dx\\
&= \int_0^1\chi_B(x)|\phi'(x)|\,dx && \text{as } T^n\phi(x) = x\\
&= \mu(I)\mu(B) && \text{by (12.0.5)}.
\end{aligned}$$

Hence $\mu(B\cap I) = \mu(B)\mu(I)$ for all sets $I$ in the algebra of cylinders. By Lemma 6.1.1 it follows that $\mu(B) = 0$ or 1. Hence Lebesgue measure is an ergodic measure for $T$.

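Solution 6.3

(i) First note that

$$\frac{1}{x_0} = \frac{P_1}{Q_1}, \qquad \frac{1}{x_0 + \frac{1}{x_1}} = \frac{x_1}{x_0x_1 + 1} = \frac{P_2}{Q_2}.$$

If we define $P_0 = 0$, $Q_0 = 1$ then we have that $P_2 = x_1P_1 + P_0$ and $Q_2 = x_1Q_1 + Q_0$.

Similarly,

$$\frac{1}{x_0 + t} = \frac{P_1(x_0; t)}{Q_1(x_0; t)}, \qquad \frac{1}{x_0 + \frac{1}{x_1 + t}} = \frac{x_1 + t}{x_0x_1 + 1 + tx_0} = \frac{P_2(x_0, x_1; t)}{Q_2(x_0, x_1; t)}.$$

Then

$$P_2(x_0, x_1; t) = P_2 + tP_1, \qquad Q_2(x_0, x_1; t) = Q_2 + tQ_1.$$

Suppose that $P_n(x_0, \ldots, x_{n-1}; t) = P_n + tP_{n-1}$ and $Q_n(x_0, \ldots, x_{n-1}; t) = Q_n + tQ_{n-1}$. Then

$$\begin{aligned}
\frac{P_{n+1}(x_0, x_1, \ldots, x_n; t)}{Q_{n+1}(x_0, x_1, \ldots, x_n; t)} &= [x_0, \ldots, x_{n-1}, x_n + t]\\
&= \left[x_0, \ldots, x_{n-1} + \frac{1}{x_n + t}\right]\\
&= \frac{P_n\left(x_0, x_1, \ldots, x_{n-1}; \frac{1}{x_n + t}\right)}{Q_n\left(x_0, x_1, \ldots, x_{n-1}; \frac{1}{x_n + t}\right)}\\
&= \frac{P_n + \frac{1}{x_n + t}P_{n-1}}{Q_n + \frac{1}{x_n + t}Q_{n-1}}\\
&= \frac{x_nP_n + P_{n-1} + tP_n}{x_nQ_n + Q_{n-1} + tQ_n}.
\end{aligned}$$

Hence

$$P_{n+1}(x_0, x_1, \ldots, x_n; t) = x_nP_n + P_{n-1} + tP_n, \qquad Q_{n+1}(x_0, x_1, \ldots, x_n; t) = x_nQ_n + Q_{n-1} + tQ_n.$$

Putting $t = 0$ we obtain the recurrence relations

$$P_{n+1} = x_nP_n + P_{n-1}, \qquad Q_{n+1} = x_nQ_n + Q_{n-1}.$$

Hence

$$P_{n+1}(x_0, x_1, \ldots, x_n; t) = P_{n+1} + tP_n, \qquad Q_{n+1}(x_0, x_1, \ldots, x_n; t) = Q_{n+1} + tQ_n.$$

By induction, the recurrence relations hold.

(ii) Note that

$$\begin{aligned}
Q_nP_{n-1} - Q_{n-1}P_n &= (x_{n-1}Q_{n-1} + Q_{n-2})P_{n-1} - Q_{n-1}(x_{n-1}P_{n-1} + P_{n-2})\\
&= -(Q_{n-1}P_{n-2} - Q_{n-2}P_{n-1}) = \cdots = (-1)^{n-1}(Q_1P_0 - Q_0P_1) = (-1)^n,
\end{aligned}$$

as $Q_1P_0 - Q_0P_1 = x_0\cdot 0 - 1\cdot 1 = -1$.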
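The recurrences in Solution 6.3 are easy to test in exact arithmetic. The Python sketch below (illustrative; the digit list and the value of $t$ are arbitrary sample choices) evaluates $[x_0, \ldots, x_{n-1}] = 1/(x_0 + 1/(x_1 + \cdots + 1/x_{n-1}))$ directly with `fractions.Fraction`, builds $P_n, Q_n$ from the recurrences with $P_0 = 0$, $Q_0 = 1$, $P_1 = 1$, $Q_1 = x_0$, and checks the identities of parts (i) and (ii).

```python
from fractions import Fraction

def cf_value(digits):
    """[x_0, x_1, ..., x_{n-1}] = 1/(x_0 + 1/(x_1 + ... + 1/x_{n-1})), evaluated exactly."""
    value = Fraction(0)
    for x in reversed(digits):
        value = 1 / (x + value)
    return value

def numerators_denominators(digits):
    """P_n, Q_n from P_0 = 0, Q_0 = 1, P_1 = 1, Q_1 = x_0 and the recurrences of Solution 6.3(i)."""
    P, Q = [0, 1], [1, digits[0]]
    for n in range(1, len(digits)):
        P.append(digits[n] * P[n] + P[n - 1])
        Q.append(digits[n] * Q[n] + Q[n - 1])
    return P, Q

if __name__ == "__main__":
    digits = [2, 3, 1, 4, 5, 9, 2]
    P, Q = numerators_denominators(digits)
    for n in range(1, len(digits) + 1):
        print(n, P[n], Q[n], Q[n] * P[n - 1] - Q[n - 1] * P[n])
```

The last column of the printout alternates between $-1$ and $+1$, as part (ii) predicts.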

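Solution 6.4

(i) Let $x = (x_0, x_1, \ldots)$, $y = (y_0, y_1, \ldots) \in \Sigma$. Let $d_{\mathbb{R}/\mathbb{Z}}$ and $d_\Sigma$ denote the usual metrics on $\mathbb{R}/\mathbb{Z}$ and $\Sigma$, respectively. Now

$$\begin{aligned}
d_{\mathbb{R}/\mathbb{Z}}(\pi(x), \pi(y)) &\le |\pi(x_0, x_1, \ldots) - \pi(y_0, y_1, \ldots)| \qquad\text{(12.0.6)}\\
&= \left|\frac{x_0 - y_0}{2} + \frac{x_1 - y_1}{2^2} + \cdots\right|\\
&\le \frac{|x_0 - y_0|}{2} + \frac{|x_1 - y_1|}{2^2} + \cdots. \qquad\text{(12.0.7)}
\end{aligned}$$

Now if $d_\Sigma(x,y) < 1/2^n$ then $x_j = y_j$ for $j = 0, \ldots, n$. Hence we can bound the right-hand side of (12.0.7) by

$$\frac{|x_{n+1} - y_{n+1}|}{2^{n+2}} + \frac{|x_{n+2} - y_{n+2}|}{2^{n+3}} + \cdots \le \frac{1}{2^{n+2}} + \frac{1}{2^{n+3}} + \cdots = \frac{1}{2^{n+2}}\left(1 + \frac{1}{2} + \frac{1}{2^2} + \cdots\right) \le \frac{1}{2^{n+1}},$$

summing the geometric progression. This implies that $\pi$ is continuous. To see this, let $\varepsilon > 0$. Choose $n$ such that $1/2^{n+1} < \varepsilon$. Choose $\delta = 1/2^n$. If $d_\Sigma(x,y) < \delta$ then $d_{\mathbb{R}/\mathbb{Z}}(\pi(x), \pi(y)) < \varepsilon$.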

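(ii) Observe that if $x = (x_j)_{j=0}^\infty \in \Sigma$ then

$$\pi(\sigma(x)) = \pi(\sigma(x_0, x_1, \ldots)) = \pi(x_1, x_2, \ldots) = \frac{x_1}{2} + \frac{x_2}{2^2} + \cdots$$

and

$$\begin{aligned}
T(\pi(x)) &= T(\pi(x_0, x_1, \ldots))\\
&= T\left(\frac{x_0}{2} + \frac{x_1}{2^2} + \cdots\right)\\
&= x_0 + \frac{x_1}{2} + \frac{x_2}{2^2} + \cdots \bmod 1\\
&= \frac{x_1}{2} + \frac{x_2}{2^2} + \cdots.
\end{aligned}$$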



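(iii) We must show that $T_*(\pi_*\mu) = \pi_*\mu$. To see this, observe that

$$\begin{aligned}
T_*(\pi_*\mu)(B) &= \pi_*\mu(T^{-1}B)\\
&= \mu(\pi^{-1}T^{-1}B)\\
&= \mu(\sigma^{-1}\pi^{-1}B) && \text{as } \pi\sigma = T\pi\\
&= (\sigma_*\mu)(\pi^{-1}B)\\
&= \mu(\pi^{-1}B) && \text{as } \mu \text{ is } \sigma\text{-invariant}\\
&= (\pi_*\mu)(B).
\end{aligned}$$

(iv) Suppose that $\mu$ is an ergodic measure for $\sigma$. We claim that $\pi_*\mu$ is an ergodic measure for $T$, i.e. if $B \in \mathcal{B}(\mathbb{R}/\mathbb{Z})$ is such that $T^{-1}B = B$ then $\pi_*\mu(B) = 0$ or 1.

First observe that $\pi^{-1}(B)$ is $\sigma$-invariant. This follows as

$$\sigma^{-1}(\pi^{-1}(B)) = \pi^{-1}T^{-1}(B) = \pi^{-1}(B).$$

As $\mu$ is an ergodic measure for $\sigma$, we must have that $\mu(\pi^{-1}(B)) = 0$ or 1. Hence $\pi_*\mu(B) = 0$ or 1.

(v) There are uncountably many different Bernoulli measures $\mu_p$ for $\Sigma$, given by the family of probability vectors $(p, 1-p)$, $0 < p < 1$. These are ergodic for $\sigma$. To see that the measures $\pi_*\mu_p$ are all different, notice that $\pi_*\mu_p([0, 1/2)) = \mu_p(\pi^{-1}[0, 1/2)) = \mu_p([0]) = p$, where $[0]$ denotes the cylinder consisting of all sequences that start with 0. (The sets $\pi^{-1}[0, 1/2)$ and $[0]$ differ only by the single sequence $(0, 1, 1, 1, \ldots)$, which has $\mu_p$-measure zero.)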

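Solution 7.1

Suppose that $x_n \to x$. We must show that $\delta_{x_n} \rightharpoonup \delta_x$. Let $f \in C(X, \mathbb{R})$. Then

$$\int f\,d\delta_{x_n} = f(x_n) \to f(x) = \int f\,d\delta_x$$

as $f$ is continuous. Hence $\delta_{x_n} \rightharpoonup \delta_x$.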

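Solution 7.2

Suppose that $\mu_n \rightharpoonup \mu$. We must show that $T_*\mu_n \rightharpoonup T_*\mu$. Let $f \in C(X, \mathbb{R})$. Then

$$\lim_{n\to\infty}\int f\,d(T_*\mu_n) = \lim_{n\to\infty}\int f\circ T\,d\mu_n = \int f\circ T\,d\mu = \int f\,d(T_*\mu)$$

as $f\circ T$ is continuous. Hence $T_*\mu_n \rightharpoonup T_*\mu$ as $n \to \infty$.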

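Solution 7.3

(i) Suppose that $\mu_n \to \mu$ as $n \to \infty$ (in the strong topology). We claim that $\mu_n \rightharpoonup \mu$. To show this, we have to prove that if $f \in C(X, \mathbb{R})$ then $\int f\,d\mu_n \to \int f\,d\mu$ as $n \to \infty$.

Let $f \in C(X, \mathbb{R})$; we may assume that $f \ne 0$, as otherwise there is nothing to prove. Note that $f/\|f\|_\infty \in C(X, \mathbb{R})$ and that $\|f/\|f\|_\infty\|_\infty = 1$. Hence

$$\begin{aligned}
\left|\int f\,d\mu_n - \int f\,d\mu\right| &= \|f\|_\infty\left|\int\frac{f}{\|f\|_\infty}\,d\mu_n - \int\frac{f}{\|f\|_\infty}\,d\mu\right|\\
&\le \|f\|_\infty\sup_{g\in C(X,\mathbb{R}),\,\|g\|_\infty\le 1}\left|\int g\,d\mu_n - \int g\,d\mu\right|\\
&= \|f\|_\infty\|\mu_n - \mu\|,
\end{aligned}$$

which tends to 0 as $n \to \infty$.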



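(ii) Suppose that $x_n \to x$ but that $x_n \ne x$ for all $n$. We claim that $\delta_{x_n} \not\to \delta_x$. Note that

$$\|\delta_{x_n} - \delta_x\| = \sup_{f\in C(X,\mathbb{R}),\,\|f\|_\infty\le 1}|f(x_n) - f(x)|.$$

For each $n$, we can choose a continuous function $f_n \in C(X, \mathbb{R})$ such that $f_n(x) = 1$, $f_n(x_n) = 0$ and $\|f_n\|_\infty \le 1$ (for example, $f_n(y) = \max\{0, 1 - d(y,x)/d(x_n,x)\}$). Hence

$$\|\delta_{x_n} - \delta_x\| = \sup_{f\in C(X,\mathbb{R}),\,\|f\|_\infty\le 1}|f(x_n) - f(x)| \ge |f_n(x_n) - f_n(x)| = 1.$$

Hence $\delta_{x_n} \not\to \delta_x$.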

(iii) First note that if $f \in C(X, \mathbb{R})$ is any continuous function with $\|f\|_\infty \le 1$, then

$$\left|\int f\,d\delta_x - \int f\,d\delta_y\right| = |f(x) - f(y)| \le |f(x)| + |f(y)| \le 2.$$

Hence

$$\|\delta_x - \delta_y\| = \sup_{f\in C(X,\mathbb{R}),\,\|f\|_\infty\le 1}\left|\int f\,d\delta_x - \int f\,d\delta_y\right| \le 2.$$

Conversely, suppose that $x \ne y$. By Urysohn's Lemma, there exist continuous functions $g_1, g_2$ such that $g_1(x) = g_2(y) = 1$, $g_1(y) = g_2(x) = 0$ and $0 \le g_1, g_2 \le 1$. Let $h = g_1 - g_2$. Then $h(x) = 1$, $h(y) = -1$ and $-1 \le h \le 1$ (so that $\|h\|_\infty = 1$). Hence

$$\begin{aligned}
2 = |h(x) - h(y)| &= \left|\int h\,d\delta_x - \int h\,d\delta_y\right|\\
&\le \sup_{f\in C(X,\mathbb{R}),\,\|f\|_\infty\le 1}\left|\int f\,d\delta_x - \int f\,d\delta_y\right| = \|\delta_x - \delta_y\|.
\end{aligned}$$

Hence if $x \ne y$ then $\|\delta_x - \delta_y\| = 2$.

Suppose that $X$ is infinite. Let $x_n \in X$ be pairwise distinct and consider the sequence $\mu_n = \delta_{x_n}$. Then $\|\mu_n - \mu_m\| = 2$ if $n \ne m$. Hence $\mu_n$ cannot have a convergent subsequence, and so $M(X)$ is not compact in the strong topology when $X$ is infinite.

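Solution 7.4

Let $x_n \in X$ be a sequence such that $x_n \to x$ and $x_n \ne x$ for all $n$. Let $\mu_n = \delta_{x_n}$ and $\mu = \delta_x$. Then $\mu_n \rightharpoonup \mu$ (by Exercise 7.1). Take $B = \{x\}$. Then $\mu_n(B) = 0$ but $\mu(B) = 1$. Hence $\mu_n(B) \not\to \mu(B)$.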

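Solution 7.5

Let $\mu_1, \mu_2 \in M(X, T)$ and suppose that $\alpha \in [0,1]$. Then $\alpha\mu_1 + (1-\alpha)\mu_2 \in M(X)$. To check that $\alpha\mu_1 + (1-\alpha)\mu_2 \in M(X, T)$, note that for each $B \in \mathcal{B}$

$$\begin{aligned}
(T_*(\alpha\mu_1 + (1-\alpha)\mu_2))(B) &= (\alpha\mu_1 + (1-\alpha)\mu_2)(T^{-1}B)\\
&= \alpha\mu_1(T^{-1}B) + (1-\alpha)\mu_2(T^{-1}B)\\
&= \alpha\mu_1(B) + (1-\alpha)\mu_2(B)\\
&= (\alpha\mu_1 + (1-\alpha)\mu_2)(B).
\end{aligned}$$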



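Solution 7.6

Let $S \subset C(X, \mathbb{R})$ be uniformly dense, and suppose that $\int g\,d\mu_n \to \int g\,d\mu$ for every $g \in S$. Let $f \in C(X, \mathbb{R})$. Let $\varepsilon > 0$. Choose $g \in S$ such that $\|f - g\|_\infty < \varepsilon$. Choose $N$ such that if $n \ge N$ then $\left|\int g\,d\mu_n - \int g\,d\mu\right| < \varepsilon$. Then, for $n \ge N$,

$$\begin{aligned}
\left|\int f\,d\mu_n - \int f\,d\mu\right| &\le \left|\int f\,d\mu_n - \int g\,d\mu_n\right| + \left|\int g\,d\mu_n - \int g\,d\mu\right| + \left|\int f\,d\mu - \int g\,d\mu\right|\\
&\le \int|f - g|\,d\mu_n + \left|\int g\,d\mu_n - \int g\,d\mu\right| + \int|f - g|\,d\mu\\
&\le 3\varepsilon.
\end{aligned}$$

As $\varepsilon > 0$ is arbitrary, the result follows.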

Solution 7.7
(i) Recall that $x \in \Sigma$ is a periodic point with period $n$ if $\sigma^n(x) = x$. If $x$ is periodic with period $n$ then $x_{j+n} = x_j$ for all $j = 0, 1, 2, \ldots$. Hence $x$ is determined by the first $n$ symbols, which then repeat. As there are two choices for each $x_j$, there are $2^n$ periodic points with period $n$.

(ii) First note that $\mu_n$ is a Borel probability measure.

Let $[i_0, i_1, \ldots, i_{m-1}]$ be a cylinder. Let $n \ge m$. Then the periodic points $x$ of period $n$ in $[i_0, i_1, \ldots, i_{m-1}]$ have the form

$$x = (i_0, i_1, \ldots, i_{m-1}, x_m, \ldots, x_{n-1}, i_0, i_1, \ldots, i_{m-1}, x_m, \ldots, x_{n-1}, i_0, \ldots)$$

where the finite string of symbols $i_0, i_1, \ldots, i_{m-1}, x_m, \ldots, x_{n-1}$ repeats. The symbols $x_m, \ldots, x_{n-1}$ can be chosen arbitrarily. Hence there are $2^{n-m}$ such periodic points.

Hence, if $n \ge m$,

$$\int \chi_{[i_0, i_1, \ldots, i_{m-1}]}\,d\mu_n = \frac{1}{2^n} \times 2^{n-m} = \frac{1}{2^m} = \int \chi_{[i_0, i_1, \ldots, i_{m-1}]}\,d\mu.$$

(iii) To prove that $\chi_{[i_0, i_1, \ldots, i_{m-1}]}$ is continuous we need to show that if $x_n \to x$ then $\chi_{[i_0, i_1, \ldots, i_{m-1}]}(x_n) \to \chi_{[i_0, i_1, \ldots, i_{m-1}]}(x)$.

First suppose that $x \in [i_0, i_1, \ldots, i_{m-1}]$. As $x_n \to x$, it follows from the definition of the metric on $\Sigma$ that there exists $N \in \mathbb{N}$ such that if $n \ge N$ then $x_n$ and $x$ agree in the first $m$ places. Hence if $n \ge N$ then $x_n \in [i_0, i_1, \ldots, i_{m-1}]$. Hence, if $n \ge N$, then $\chi_{[i_0, i_1, \ldots, i_{m-1}]}(x_n) = 1 = \chi_{[i_0, i_1, \ldots, i_{m-1}]}(x)$.

Now suppose that $x \notin [i_0, i_1, \ldots, i_{m-1}]$, so that $x_j \neq i_j$ for some $j \in \{0, 1, \ldots, m-1\}$. As $x_n \to x$, it follows from the definition of the metric on $\Sigma$ that there exists $N \in \mathbb{N}$ such that if $n \ge N$ then $x_n$ and $x$ agree in the first $m$ places. Hence if $n \ge N$ then $(x_n)_j = x_j \neq i_j$; that is, if $n \ge N$, then $x_n \notin [i_0, i_1, \ldots, i_{m-1}]$. Hence, if $n \ge N$, then $\chi_{[i_0, i_1, \ldots, i_{m-1}]}(x_n) = 0 = \chi_{[i_0, i_1, \ldots, i_{m-1}]}(x)$.

Hence $\chi_{[i_0, i_1, \ldots, i_{m-1}]}$ is continuous.

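(iv) Let $S$ denote the set of finite linear combinations of characteristic functions of cylinders. By (iii), $S$ is an algebra of continuous functions; it contains the constants and separates points, so by the Stone-Weierstrass Theorem $S$ is uniformly dense in $C(\Sigma,\mathbb{R})$. By (ii) above and linearity, if $g \in S$ then $\int g\,d\mu_n \to \int g\,d\mu$ as $n \to \infty$. Let $f \in C(\Sigma,\mathbb{R})$ and let $\varepsilon > 0$. Choose $g \in S$ such that $\|f - g\|_\infty < \varepsilon$. Then a $3\varepsilon$ argument as in the solution to Exercise 7.6 proves that $\limsup_{n\to\infty} \left| \int f\,d\mu_n - \int f\,d\mu \right| \le 3\varepsilon$, and the result follows.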
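The counting in (i) and (ii) can be checked by brute force for small $n$. The Python sketch below assumes, as the computation in (ii) indicates, that $\mu_n$ gives mass $2^{-n}$ to each of the $2^n$ points with $\sigma^n(x) = x$; it represents each such point by its repeating block $(x_0, \ldots, x_{n-1})$ and confirms that $\mu_n([i_0, \ldots, i_{m-1}]) = 2^{-m}$ for every $n \ge m$. The helper names are illustrative only.

```python
from fractions import Fraction
from itertools import product


def periodic_points(n):
    """All period-n points of the full 2-shift, each stored as its repeating block (x_0, ..., x_{n-1})."""
    return list(product((0, 1), repeat=n))


def mu_n_of_cylinder(cylinder, n):
    """mu_n([i_0, ..., i_{m-1}]), where mu_n puts mass 1/2^n on each period-n point."""
    m = len(cylinder)
    if n < m:
        raise ValueError("the count 2^(n-m) in (ii) needs n >= m")
    # For n >= m, membership of the cylinder only depends on the first m symbols of the block.
    hits = sum(1 for block in periodic_points(n) if block[:m] == tuple(cylinder))
    return Fraction(hits, 2 ** n)


for n in range(3, 9):
    print(n, len(periodic_points(n)), mu_n_of_cylinder((1, 0, 1), n))  # 2**n points, mass 1/8
```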



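Solution 7.8
As trigonometric polynomials are uniformly dense in $C(X,\mathbb{R})$, it is sufficient to prove that $\int g \circ T\,d\mu = \int g\,d\mu$ for all trigonometric polynomials $g$. Let

$$g(x) = \sum_{j=0}^{r} c_j e^{2\pi i \langle n^{(j)}, x \rangle}, \qquad c_j \in \mathbb{C},\ n^{(j)} = (n^{(j)}_1, n^{(j)}_2, n^{(j)}_3) \in \mathbb{Z}^3,$$

be a trigonometric polynomial. We label the coefficients so that $n^{(j)} = 0$ if and only if $j = 0$. Then $\int g\,d\mu = c_0$. Note that

$$\begin{aligned}
g \circ T\left( (x, y, z) + \mathbb{Z}^3 \right) &= g\left( (\alpha + x, x + y, y + z) + \mathbb{Z}^3 \right) \\
&= \sum_{j=0}^{r} c_j e^{2\pi i \langle (n^{(j)}_1, n^{(j)}_2, n^{(j)}_3), (\alpha + x, x + y, y + z) \rangle} \\
&= \sum_{j=0}^{r} c_j e^{2\pi i n^{(j)}_1 \alpha} e^{2\pi i \left( n^{(j)}_1 x + n^{(j)}_2 (x + y) + n^{(j)}_3 (y + z) \right)} \\
&= \sum_{j=0}^{r} c_j e^{2\pi i n^{(j)}_1 \alpha} e^{2\pi i \langle (n^{(j)}_1 + n^{(j)}_2,\ n^{(j)}_2 + n^{(j)}_3,\ n^{(j)}_3), (x, y, z) \rangle}.
\end{aligned}$$

Hence

$$\begin{aligned}
\int g \circ T\,d\mu &= \int \sum_{j=0}^{r} c_j e^{2\pi i n^{(j)}_1 \alpha} e^{2\pi i \langle (n^{(j)}_1 + n^{(j)}_2,\ n^{(j)}_2 + n^{(j)}_3,\ n^{(j)}_3), (x, y, z) \rangle}\,d\mu \\
&= \sum_{j=0}^{r} c_j e^{2\pi i n^{(j)}_1 \alpha} \int e^{2\pi i \langle (n^{(j)}_1 + n^{(j)}_2,\ n^{(j)}_2 + n^{(j)}_3,\ n^{(j)}_3), (x, y, z) \rangle}\,d\mu.
\end{aligned}$$

The integral is equal to zero unless $(n^{(j)}_1 + n^{(j)}_2, n^{(j)}_2 + n^{(j)}_3, n^{(j)}_3) = (0, 0, 0)$, i.e. unless $n^{(j)}_1 = n^{(j)}_2 = n^{(j)}_3 = 0$. By our choice of labelling the coefficients, this only happens if $j = 0$. Hence

$$\int g \circ T\,d\mu = c_0 = \int g\,d\mu.$$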

Solution 8.1
(i) Let $B \in \mathcal{B}$ and let $f = \chi_B$. Note that

$$\int f\,d\nu = \int \chi_B\,d\nu = \int_B d\nu = \nu(B) = \int_B \frac{d\nu}{d\mu}\,d\mu = \int \chi_B \frac{d\nu}{d\mu}\,d\mu = \int f \frac{d\nu}{d\mu}\,d\mu.$$

Hence the result holds for characteristic functions, and hence for simple functions (finite linear combinations of characteristic functions). Let $f \in L^1(X,\mathcal{B},\mu)$ be such that $f \ge 0$. By considering an increasing sequence of simple functions and applying the Monotone Convergence Theorem, the result follows for positive $L^1$ functions. By splitting an arbitrary real-valued $L^1$ function into its positive and negative parts, and then an arbitrary $L^1(X,\mathcal{B},\mu)$ function into its real and imaginary parts, the result holds.

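(ii) Now $d\nu_1/d\mu$ and $d\nu_2/d\mu$ are the unique functions such that

$$\nu_1(B) = \int_B \frac{d\nu_1}{d\mu}\,d\mu, \qquad \nu_2(B) = \int_B \frac{d\nu_2}{d\mu}\,d\mu,$$

respectively. Hence

$$\nu_1(B) + \nu_2(B) = \int_B \frac{d\nu_1}{d\mu}\,d\mu + \int_B \frac{d\nu_2}{d\mu}\,d\mu = \int_B \left( \frac{d\nu_1}{d\mu} + \frac{d\nu_2}{d\mu} \right) d\mu.$$

However,

$$(\nu_1 + \nu_2)(B) = \int_B \frac{d(\nu_1 + \nu_2)}{d\mu}\,d\mu.$$

Hence, by uniqueness in the Radon-Nikodym theorem, we have that

$$\frac{d(\nu_1 + \nu_2)}{d\mu} = \frac{d\nu_1}{d\mu} + \frac{d\nu_2}{d\mu}.$$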

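(iii) Suppose that $\mu(B) = 0$. As $\nu \ll \mu$, $\nu(B) = 0$. As $\lambda \ll \nu$, $\lambda(B) = 0$. Hence $\lambda \ll \mu$.

Now as $\lambda \ll \mu$ we have

$$\lambda(B) = \int_B \frac{d\lambda}{d\mu}\,d\mu.$$

As $\lambda \ll \nu$ we have

$$\lambda(B) = \int_B \frac{d\lambda}{d\nu}\,d\nu = \int \chi_B \frac{d\lambda}{d\nu}\,d\nu.$$

By part (i), using the fact that $\nu \ll \mu$, it follows that

$$\int \chi_B \frac{d\lambda}{d\nu}\,d\nu = \int \chi_B \frac{d\lambda}{d\nu} \frac{d\nu}{d\mu}\,d\mu = \int_B \frac{d\lambda}{d\nu} \frac{d\nu}{d\mu}\,d\mu.$$

Hence, by uniqueness in the Radon-Nikodym theorem,

$$\frac{d\lambda}{d\mu} = \frac{d\lambda}{d\nu} \frac{d\nu}{d\mu}.$$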
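On a finite set every measure is a list of point masses, and if $\mu$ gives positive mass to every point then $d\nu/d\mu(x) = \nu(\{x\})/\mu(\{x\})$. The following Python sketch uses this to check (ii) and (iii) on one small example with exact rational arithmetic. It is an illustration, not a proof: the four-point space, the masses and the helper names are hypothetical choices, and on points of $\nu$-measure zero we set $d\lambda/d\nu = 0$ (any value would do there, as such points form a $\nu$-null set).

```python
from fractions import Fraction as F


def rn_derivative(nu, mu):
    """d(nu)/d(mu) for measures on a finite set, given as dicts point -> mass.

    Requires nu << mu. Where mu({x}) = 0 the derivative is only determined up to a
    mu-null set, so it is set to 0 there.
    """
    for x, mass in nu.items():
        if mass != 0 and mu.get(x, 0) == 0:
            raise ValueError("nu is not absolutely continuous with respect to mu")
    return {x: (F(nu.get(x, 0)) / m if m != 0 else F(0)) for x, m in mu.items()}


def add(nu1, nu2):
    """The measure nu1 + nu2."""
    return {x: nu1.get(x, 0) + nu2.get(x, 0) for x in set(nu1) | set(nu2)}


mu = {"a": F(1, 4), "b": F(1, 4), "c": F(1, 4), "d": F(1, 4)}
nu = {"a": F(1, 2), "b": F(1, 2), "c": F(0), "d": F(0)}      # nu << mu
nu2 = {"a": F(0), "b": F(1, 8), "c": F(3, 8), "d": F(1, 2)}  # nu2 << mu
lam = {"a": F(1, 3), "b": F(2, 3), "c": F(0), "d": F(0)}     # lam << nu

# Part (ii): d(nu + nu2)/d(mu) = d(nu)/d(mu) + d(nu2)/d(mu).
lhs = rn_derivative(add(nu, nu2), mu)
rhs = {x: rn_derivative(nu, mu)[x] + rn_derivative(nu2, mu)[x] for x in mu}

# Part (iii): d(lam)/d(mu) = d(lam)/d(nu) * d(nu)/d(mu).
dlam_dnu = rn_derivative(lam, nu)
dnu_dmu = rn_derivative(nu, mu)
chain = {x: dlam_dnu[x] * dnu_dmu[x] for x in mu}

print(lhs == rhs, chain == rn_derivative(lam, mu))  # True True
```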

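Solution 8.2
The claimed formula is easily seen to be valid for $n = 1$. Suppose the formula is valid for $n$. Then

$$\begin{aligned}
T^{n+1}\left( \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \mathbb{Z}^3 \right)
&= T\,T^n\left( \begin{pmatrix} x \\ y \\ z \end{pmatrix} + \mathbb{Z}^3 \right) \\
&= T\left( \begin{pmatrix} \binom{n}{1}\alpha + x \\ \binom{n}{2}\alpha + \binom{n}{1}x + y \\ \binom{n}{3}\alpha + \binom{n}{2}x + \binom{n}{1}y + z \end{pmatrix} + \mathbb{Z}^3 \right) \\
&= \begin{pmatrix} \binom{n}{1}\alpha + x + \alpha \\ \binom{n}{2}\alpha + \binom{n}{1}x + y + \binom{n}{1}\alpha + x \\ \binom{n}{3}\alpha + \binom{n}{2}x + \binom{n}{1}y + z + \binom{n}{2}\alpha + \binom{n}{1}x + y \end{pmatrix} + \mathbb{Z}^3 \\
&= \begin{pmatrix} \binom{n+1}{1}\alpha + x \\ \binom{n+1}{2}\alpha + \binom{n+1}{1}x + y \\ \binom{n+1}{3}\alpha + \binom{n+1}{2}x + \binom{n+1}{1}y + z \end{pmatrix} + \mathbb{Z}^3,
\end{aligned}$$

using $\binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k}$.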

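Hence the claimed formula holds by induction.

Let $f(x, y, z) = e^{2\pi i (kx + \ell y + mz)}$. Then

$$\frac{1}{n} \sum_{j=0}^{n-1} f\left( T^j((x, y, z) + \mathbb{Z}^3) \right) = \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i p^{(k,\ell,m)}_{x,y,z}(j)}$$

where $p^{(k,\ell,m)}_{x,y,z}(n)$ is a polynomial in $n$. When $m \neq 0$, $p^{(k,\ell,m)}_{x,y,z}(n)$ is a degree 3 polynomial with leading coefficient $m\alpha/6 \notin \mathbb{Q}$. When $m = 0$, $\ell \neq 0$, $p^{(k,\ell,m)}_{x,y,z}(n)$ is a degree 2 polynomial with leading coefficient $\ell\alpha/2 \notin \mathbb{Q}$. When $m = \ell = 0$, $k \neq 0$, $p^{(k,\ell,m)}_{x,y,z}(n)$ is a degree 1 polynomial with leading coefficient $k\alpha \notin \mathbb{Q}$. In all three cases, $p^{(k,\ell,m)}_{x,y,z}(n)$ is uniformly distributed mod 1, by Weyl’s Theorem on Polynomials (Theorem 2.3.1). Hence by Weyl’s Criterion (Theorem 1.2.1), for all $(k, \ell, m) \in \mathbb{Z}^3 \setminus \{(0, 0, 0)\}$ we have

$$\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i p^{(k,\ell,m)}_{x,y,z}(j)} \to 0$$

as $n \to \infty$. When $k = \ell = m = 0$ we trivially have that

$$\frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i p^{(k,\ell,m)}_{x,y,z}(j)} = \frac{1}{n} \sum_{j=0}^{n-1} 1 \to 1$$

as $n \to \infty$. As $\int f\,d\mu = 0$ when $(k, \ell, m) \neq (0, 0, 0)$ and $\int f\,d\mu = 1$ when $(k, \ell, m) = (0, 0, 0)$, it follows that

$$\frac{1}{n} \sum_{j=0}^{n-1} f\left( T^j((x, y, z) + \mathbb{Z}^3) \right) \to \int f\,d\mu$$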

whenever $f(x, y, z) = e^{2\pi i (kx + \ell y + mz)}$. By taking finite linear combinations of exponential functions we see that

$$\sup_{x \in X} \left| \frac{1}{n} \sum_{j=0}^{n-1} g(T^j(x)) - \int g\,d\mu \right| \to 0$$

as $n \to \infty$ for all trigonometric polynomials $g$. By the Stone-Weierstrass Theorem (Theorem 1.2.2), trigonometric polynomials are uniformly dense in $C(X,\mathbb{R})$. Let $f \in C(X,\mathbb{R})$ and let $\varepsilon > 0$. Then there exists a trigonometric polynomial $g$ such that $\|f - g\|_\infty < \varepsilon$. Hence for any $x \in X$ we have

$$\begin{aligned}
\left| \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) - \int f\,d\mu \right|
&\le \left| \frac{1}{n} \sum_{j=0}^{n-1} \left( f(T^j(x)) - g(T^j(x)) \right) \right| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(T^j(x)) - \int g\,d\mu \right| + \left| \int (g - f)\,d\mu \right| \\
&\le \frac{1}{n} \sum_{j=0}^{n-1} \left| f(T^j(x)) - g(T^j(x)) \right| + \left| \frac{1}{n} \sum_{j=0}^{n-1} g(T^j(x)) - \int g\,d\mu \right| + \int |g - f|\,d\mu \\
&\le 2\varepsilon + \left\| \frac{1}{n} \sum_{j=0}^{n-1} g \circ T^j - \int g\,d\mu \right\|_\infty.
\end{aligned}$$

Hence, taking the supremum over all $x \in X$, we have

$$\left\| \frac{1}{n} \sum_{j=0}^{n-1} f \circ T^j - \int f\,d\mu \right\|_\infty \le 2\varepsilon + \left\| \frac{1}{n} \sum_{j=0}^{n-1} g \circ T^j - \int g\,d\mu \right\|_\infty.$$

Letting $n \to \infty$ we see that

$$\limsup_{n \to \infty} \left\| \frac{1}{n} \sum_{j=0}^{n-1} f \circ T^j - \int f\,d\mu \right\|_\infty \le 2\varepsilon.$$

As $\varepsilon > 0$ is arbitrary, it follows that

$$\lim_{n \to \infty} \left\| \frac{1}{n} \sum_{j=0}^{n-1} f \circ T^j - \int f\,d\mu \right\|_\infty = 0.$$

Hence statement (ii) in Oxtoby’s Ergodic Theorem holds. As (i) and (ii) in Oxtoby’s Ergodic Theorem are equivalent, it follows that $T$ is uniquely ergodic and Lebesgue measure is the unique invariant measure.
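The closed form for $T^n$ established by induction in this solution can be sanity-checked with exact arithmetic. The Python sketch below iterates $T(x, y, z) = (\alpha + x, x + y, y + z) \bmod 1$ and compares the result with the binomial formula. Since that formula is an algebraic identity, the check uses a rational stand-in for $\alpha$ (a hypothetical choice made only so that `Fraction` arithmetic is exact); the irrationality of $\alpha$ matters for unique ergodicity but not for the formula itself.

```python
from fractions import Fraction
from math import comb


def T(p, alpha):
    """One step of T(x, y, z) = (alpha + x, x + y, y + z) mod 1."""
    x, y, z = p
    return ((alpha + x) % 1, (x + y) % 1, (y + z) % 1)


def T_power_formula(p, alpha, n):
    """Closed form for T^n from Solution 8.2, reduced mod 1."""
    x, y, z = p
    return (
        (comb(n, 1) * alpha + x) % 1,
        (comb(n, 2) * alpha + comb(n, 1) * x + y) % 1,
        (comb(n, 3) * alpha + comb(n, 2) * x + comb(n, 1) * y + z) % 1,
    )


alpha = Fraction(355, 113)  # rational stand-in for alpha, see the note above
start = (Fraction(1, 7), Fraction(2, 9), Fraction(5, 11))

p = start
ok = True
for n in range(1, 60):
    p = T(p, alpha)  # p is now T^n(start)
    ok = ok and p == T_power_formula(start, alpha, n)
print(ok)  # True
```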

Solution 8.3
Let $T$ be a uniquely ergodic homeomorphism with unique invariant measure $\mu$.

Suppose that every orbit is dense. Let $U$ be a non-empty open set. Then for all $x \in X$, there exists $n \in \mathbb{Z}$ such that $T^n(x) \in U$. Hence $X = \bigcup_{n=-\infty}^{\infty} T^{-n}U$. Hence

$$1 = \mu(X) = \mu\left( \bigcup_{n=-\infty}^{\infty} T^{-n}U \right) \le \sum_{n=-\infty}^{\infty} \mu(T^{-n}U) = \sum_{n=-\infty}^{\infty} \mu(U)$$

as $\mu$ is $T$-invariant. Hence $\mu(U) > 0$.

Conversely, suppose that $\mu(U) > 0$ for all non-empty open sets $U$. Suppose for a contradiction that there exists $x_0 \in X$ such that the orbit of $x_0$ is not dense. Clearly $\{T^n(x_0) \mid n \in \mathbb{Z}\}$ is $T$-invariant. As $T$ is a homeomorphism, the set

$$Y = \operatorname{cl}\{T^n(x_0) \mid n \in \mathbb{Z}\}$$

is also $T$-invariant. As the orbit of $x_0$ is not dense, $Y$ is a proper subset of $X$. As $Y$ is closed and $X$ is compact, it follows that $Y$ is compact. By Theorem 7.5.1 there exists an invariant probability measure $\nu$ for the map $T : Y \to Y$. Extend $\nu$ to $X$ by setting $\nu(B) = \nu(B \cap Y)$ for Borel subsets $B \subset X$. Noting that $X \setminus Y$ is also $T$-invariant, it follows that $\nu$ is an invariant measure for $T : X \to X$. This contradicts unique ergodicity as $\nu(X \setminus Y) = 0$ but $\mu(X \setminus Y) > 0$ (as $X \setminus Y$ is a non-empty open set).

Solution 9.1
Let $X = \mathbb{R}$ equipped with the Borel $\sigma$-algebra and Lebesgue measure. Define $T(x) = x + 1$. Then Lebesgue measure is $T$-invariant. Take $A = [0, 1)$. Then $A$ has positive measure, but no point of $A$ returns to $A$ under $T$.

Solution 9.2
Take $X = \{0, 1\}$ to be a set consisting of two elements. Let $\mathcal{B}$ be the set of all subsets of $X$ and equip $X$ with the measure $\mu = \frac{1}{2}\delta_0 + \frac{1}{2}\delta_1$ that assigns measure $1/2$ to both $0$ and $1$. Take $T(x) = x$ to be the identity. Then $T$ is a measure-preserving transformation. Let $A = \{0\}$, $B = \{1\}$. Then $\mu(A) = \mu(B) = 1/2 > 0$. However, $T^j(0)$ never lands in $B$.

Solution 9.3
Recall that $E(f \mid \mathcal{A})$ is determined as being the unique $\mathcal{A}$-measurable function such that

$$\int_A E(f \mid \mathcal{A})\,d\mu = \int_A f\,d\mu$$

for all $A \in \mathcal{A}$.

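(i) We need to show that

$$E(\alpha f + \beta g \mid \mathcal{A}) = \alpha E(f \mid \mathcal{A}) + \beta E(g \mid \mathcal{A}).$$

Note that $\alpha E(f \mid \mathcal{A}) + \beta E(g \mid \mathcal{A})$ is $\mathcal{A}$-measurable. Moreover, as

$$\begin{aligned}
\int_A \left( \alpha E(f \mid \mathcal{A}) + \beta E(g \mid \mathcal{A}) \right) d\mu &= \alpha \int_A E(f \mid \mathcal{A})\,d\mu + \beta \int_A E(g \mid \mathcal{A})\,d\mu \\
&= \alpha \int_A f\,d\mu + \beta \int_A g\,d\mu \\
&= \int_A (\alpha f + \beta g)\,d\mu \\
&= \int_A E(\alpha f + \beta g \mid \mathcal{A})\,d\mu
\end{aligned}$$

for all $A \in \mathcal{A}$, the claim follows.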

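(ii) First note that $E(f \mid \mathcal{A}) \circ T$ is $T^{-1}\mathcal{A}$-measurable. To see this, note that $E(f \mid \mathcal{A})$ is $\mathcal{A}$-measurable, i.e.

$$\{x \in X \mid E(f \mid \mathcal{A})(x) \le c\} \in \mathcal{A} \quad \text{for all } c \in \mathbb{R}.$$

Hence

$$\{x \in X \mid E(f \mid \mathcal{A})(Tx) \le c\} = T^{-1}\{x \in X \mid E(f \mid \mathcal{A})(x) \le c\} \in T^{-1}\mathcal{A}$$

so that $E(f \mid \mathcal{A}) \circ T$ is $T^{-1}\mathcal{A}$-measurable.

Note that for any $A \in \mathcal{A}$

$$\begin{aligned}
\int_{T^{-1}A} E(f \mid \mathcal{A}) \circ T\,d\mu &= \int \chi_{T^{-1}A}\, E(f \mid \mathcal{A}) \circ T\,d\mu \\
&= \int \chi_A \circ T \cdot E(f \mid \mathcal{A}) \circ T\,d\mu \\
&= \int \chi_A\, E(f \mid \mathcal{A})\,d\mu \quad \text{as } \mu \text{ is } T\text{-invariant} \\
&= \int_A E(f \mid \mathcal{A})\,d\mu \\
&= \int_A f\,d\mu.
\end{aligned}$$

Moreover

$$\begin{aligned}
\int_{T^{-1}A} E(f \circ T \mid T^{-1}\mathcal{A})\,d\mu &= \int_{T^{-1}A} f \circ T\,d\mu \\
&= \int \chi_{T^{-1}A}\, f \circ T\,d\mu \\
&= \int \chi_A \circ T \cdot f \circ T\,d\mu \\
&= \int \chi_A f\,d\mu \\
&= \int_A f\,d\mu.
\end{aligned}$$

Hence

$$\int_{T^{-1}A} E(f \circ T \mid T^{-1}\mathcal{A})\,d\mu = \int_{T^{-1}A} E(f \mid \mathcal{A}) \circ T\,d\mu$$

for all $A \in \mathcal{A}$. As every set in $T^{-1}\mathcal{A}$ is of the form $T^{-1}A$ with $A \in \mathcal{A}$, it follows from the characterisation of conditional expectation that

$$E(f \circ T \mid T^{-1}\mathcal{A}) = E(f \mid \mathcal{A}) \circ T.$$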

(iii) That $E(f \mid \mathcal{B}) = f$ is immediate from the above characterisation of conditional expectation, as $f$ is itself $\mathcal{B}$-measurable.

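(iv) Recall that a function $f : X \to \mathbb{R}$ is $\mathcal{A}$-measurable if $f^{-1}(-\infty, c) \in \mathcal{A}$ for all $c \in \mathbb{R}$.

Suppose that $f$ is $\mathcal{N}$-measurable. Let $B_c = f^{-1}(-\infty, c) \in \mathcal{N}$. Hence $\mu(B_c) = 0$ or $1$. Note that $c_1 < c_2$ implies $B_{c_1} \subset B_{c_2}$. Hence there exists $c_0$ such that

$$c_0 = \sup\{c \mid \mu(B_c) = 0\} = \inf\{c \mid \mu(B_c) = 1\}.$$

We claim that $f(x) = c_0$ $\mu$-a.e. If $c < c_0$ then $\mu(\{x \in X \mid f(x) < c\}) = 0$. Taking $c = c_0 - 1/k$ for $k \in \mathbb{N}$, it follows that $f(x) \ge c_0$ $\mu$-a.e. Let $c > c_0$. Then

$$\mu(\{x \in X \mid f(x) \ge c\}) = \mu(X \setminus \{x \in X \mid f(x) < c\}) = 1 - \mu(\{x \in X \mid f(x) < c\}) = 0.$$

Taking $c = c_0 + 1/k$ for $k \in \mathbb{N}$, we have $\mu(\{x \in X \mid f(x) > c_0\}) = 0$. Hence $f(x) = c_0$ $\mu$-a.e.

Suppose that $f$ is constant almost everywhere, say $f(x) = a$ $\mu$-a.e. Then $f^{-1}(-\infty, c) = \emptyset$ $\mu$-a.e. if $c \le a$ and $f^{-1}(-\infty, c) = X$ $\mu$-a.e. if $c > a$. Hence $\mu(f^{-1}(-\infty, c)) = 0$ or $1$. Hence $f^{-1}(-\infty, c) \in \mathcal{N}$ for all $c \in \mathbb{R}$. Hence $f$ is $\mathcal{N}$-measurable.

If $N \in \mathcal{N}$ has measure $0$ then

$$\int_N f\,d\mu = 0 = \int_N \left( \int f\,d\mu \right) d\mu$$

and if $N \in \mathcal{N}$ has measure $1$ then

$$\int_N f\,d\mu = \int f\,d\mu = \int_N \left( \int f\,d\mu \right) d\mu.$$

As the constant function $\int f\,d\mu$ is $\mathcal{N}$-measurable, it follows that $E(f \mid \mathcal{N}) = \int f\,d\mu$.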

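Solution 9.4
(i) Let $\alpha = \{A_1, \ldots, A_n\}$ be a finite partition of $X$ into sets $A_j \in \mathcal{B}$ and let $\mathcal{A}$ be the set of all finite unions of sets in $\alpha$.

Trivially $\emptyset \in \mathcal{A}$.

Let $B_j = \bigcup_{i=1}^{\ell_j} A_{i,j}$, $A_{i,j} \in \alpha$, be a countable collection of finite unions of sets in $\alpha$. Then $\bigcup_j B_j$ is a union of sets in $\alpha$. As there are only finitely many sets in $\alpha$, we have that $\bigcup_j B_j$ is a finite union of sets in $\alpha$. Hence $\bigcup_j B_j \in \mathcal{A}$.

As $\alpha$ is a partition, the complement of a union of sets in $\alpha$ is the union of the remaining sets in $\alpha$. Hence $\mathcal{A}$ is closed under taking complements.

Hence $\mathcal{A}$ is a $\sigma$-algebra.

(ii) Recall that $g : X \to \mathbb{R}$ is $\mathcal{A}$-measurable if $g^{-1}(-\infty, c) \in \mathcal{A}$ for all $c \in \mathbb{R}$.

Suppose that $g$ is constant on each $A_j \in \alpha$ and write $g(x) = \sum_j c_j \chi_{A_j}(x)$. Then $g^{-1}(-\infty, c) = \bigcup A_j$ where the union is taken over the sets $A_j$ for which $c_j < c$. Hence $g$ is $\mathcal{A}$-measurable.

Conversely, suppose that $g$ is $\mathcal{A}$-measurable. For each $c \in \mathbb{R}$, let $A_c = g^{-1}(-\infty, c)$. Then $A_c \in \mathcal{A}$, so $A_c$ is a union of sets in $\alpha$; hence for each $A \in \alpha$ either $A \subset A_c$ or $A \cap A_c = \emptyset$. Moreover, $A_c \downarrow \emptyset$ as $c \to -\infty$ (in the sense that $\bigcap_{c \in \mathbb{R}} A_c = \emptyset$) and $A_c \uparrow X$ as $c \to \infty$ (in the sense that $\bigcup_c A_c = X$). Let $A \in \alpha$. Then there exists $c_0$ such that $A \cap A_c = \emptyset$ for $c < c_0$ and $A \subset A_c$ for $c > c_0$. Hence $g(x) = c_0$ for all $x \in A$. Hence $g$ is constant on each element of $\alpha$.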

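(iii) Define $g$ by

$$g(x) = \sum_{j=1}^{n} \chi_{A_j}(x) \frac{\int_{A_j} f\,d\mu}{\mu(A_j)}$$

(here we assume that $\mu(A_j) > 0$ for each $j$). Then $g$ is constant on each set in $\alpha$, hence $g$ is $\mathcal{A}$-measurable.

Let $A_i \in \alpha$. Then

$$\int_{A_i} g\,d\mu = \sum_{j=1}^{n} \int \chi_{A_i} \chi_{A_j} \frac{\int_{A_j} f\,d\mu}{\mu(A_j)}\,d\mu = \sum_{j=1}^{n} \int \chi_{A_i \cap A_j} \frac{\int_{A_j} f\,d\mu}{\mu(A_j)}\,d\mu = \int_{A_i} f\,d\mu,$$

since $A_i \cap A_j = \emptyset$ for $j \neq i$. As every $A \in \mathcal{A}$ is a finite disjoint union of sets in $\alpha$, we have $\int_A g\,d\mu = \int_A f\,d\mu$ for all $A \in \mathcal{A}$. It follows that $g = E(f \mid \mathcal{A})$.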
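The formula in (iii) says that $E(f \mid \mathcal{A})$ replaces $f$ on each $A_j$ by its $\mu$-average over $A_j$. The Python sketch below checks the defining property $\int_A g\,d\mu = \int_A f\,d\mu$ for every $A \in \mathcal{A}$ on a hypothetical six-point probability space, assuming that every block of the partition has positive measure; the space, the weights and $f$ are illustrative choices only.

```python
from fractions import Fraction as F
from itertools import combinations


def cond_exp(f, mu, partition):
    """E(f | A) for the sigma-algebra A generated by a finite partition.

    f and mu are dicts point -> value / mass; partition is a list of disjoint blocks
    covering the space, each assumed to have positive mu-measure.
    """
    g = {}
    for block in partition:
        mass = sum(mu[x] for x in block)
        avg = sum(f[x] * mu[x] for x in block) / mass  # (integral of f over block) / mu(block)
        for x in block:
            g[x] = avg
    return g


def integral_over(h, mu, A):
    """Integral of h over the set A with respect to mu."""
    return sum(h[x] * mu[x] for x in A)


mu = {0: F(1, 10), 1: F(1, 5), 2: F(1, 10), 3: F(1, 5), 4: F(1, 10), 5: F(3, 10)}
f = {0: 3, 1: -1, 2: 4, 3: 1, 4: 5, 5: 9}
partition = [[0, 1], [2, 3, 4], [5]]
g = cond_exp(f, mu, partition)

# Every A in the sigma-algebra is a union of blocks; check the defining property on all of them.
checks = []
for r in range(len(partition) + 1):
    for blocks in combinations(partition, r):
        A = [x for block in blocks for x in block]
        checks.append(integral_over(g, mu, A) == integral_over(f, mu, A))
print(all(checks), g[0], g[2], g[5])  # True 1/3 11/4 9
```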



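Solution 9.5
Clearly $\emptyset \in \mathcal{I}$.

Let $I \in \mathcal{I}$, so that $T^{-1}(I) = I$. Then $T^{-1}(X \setminus I) = X \setminus I$, so that the complement of $I$ is in $\mathcal{I}$.

Let $I_n \in \mathcal{I}$. Then $T^{-1}\left( \bigcup_n I_n \right) = \bigcup_n T^{-1}I_n = \bigcup_n I_n$, so that $\bigcup_n I_n \in \mathcal{I}$. Hence $\mathcal{I}$ is a $\sigma$-algebra.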

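Solution 9.6
Recall that $E(f \mid \mathcal{I})$ is determined by the requirements that $E(f \mid \mathcal{I})$ is $\mathcal{I}$-measurable and that

$$\int_I E(f \mid \mathcal{I})\,d\mu = \int_I f\,d\mu$$

for all $I \in \mathcal{I}$. Let $P_{\mathcal{I}} : L^2(X,\mathcal{B},\mu) \to L^2(X,\mathcal{I},\mu)$ denote the orthogonal projection onto the subspace of $\mathcal{I}$-measurable functions. As $P_{\mathcal{I}}f$ is $\mathcal{I}$-measurable, to show that $P_{\mathcal{I}}f = E(f \mid \mathcal{I})$ it is sufficient to check that for each $I \in \mathcal{I}$ we have

$$\int_I P_{\mathcal{I}}f\,d\mu = \int_I f\,d\mu$$

for all $f \in L^2(X,\mathcal{B},\mu)$.

Note that $\int_I P_{\mathcal{I}}f\,d\mu = \int \chi_I P_{\mathcal{I}}f\,d\mu = \langle \chi_I, P_{\mathcal{I}}f \rangle$ and, similarly, $\int_I f\,d\mu = \langle \chi_I, f \rangle$, where we use $\langle \cdot, \cdot \rangle$ to denote the inner product on $L^2(X,\mathcal{B},\mu)$. Hence it is sufficient to prove that, for all $I \in \mathcal{I}$, $\langle \chi_I, f - P_{\mathcal{I}}f \rangle = 0$.

It is proved in the proof of Theorem 9.6.1 that $L^2(X,\mathcal{B},\mu) = L^2(X,\mathcal{I},\mu) \oplus C$, where $C$ denotes the norm-closure of the subspace $\{w \circ T - w \mid w \in L^2(X,\mathcal{B},\mu)\}$. As $f - P_{\mathcal{I}}f \in C$, it is sufficient to prove that $\langle \chi_I, g \rangle = 0$ for all $g \in C$. To see this, first note that for $w \in L^2(X,\mathcal{B},\mu)$ we have that

$$\begin{aligned}
\langle \chi_I, w \circ T - w \rangle &= \langle \chi_I, w \circ T \rangle - \langle \chi_I, w \rangle \\
&= \langle \chi_{T^{-1}I}, w \circ T \rangle - \langle \chi_I, w \rangle \\
&= \langle \chi_I \circ T, w \circ T \rangle - \langle \chi_I, w \rangle = 0,
\end{aligned}$$

using the facts that $I = T^{-1}I$ a.e. and that $T$ is measure-preserving. By continuity of the inner product, it follows that $\langle \chi_I, g \rangle = 0$ for all $g \in C$.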
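A finite example makes the identity $P_{\mathcal{I}}f = E(f \mid \mathcal{I})$ concrete. If $T$ is a permutation of a finite set carrying the uniform measure, then the $T$-invariant sets are exactly the unions of cycles, and the orthogonal projection onto invariant functions averages $f$ over each cycle. The Python sketch below (the permutation, $f$ and $w$ are hypothetical choices) checks that $\langle \chi_I, f - P_{\mathcal{I}}f \rangle = 0$ and $\langle \chi_I, w \circ T - w \rangle = 0$ for every invariant set $I$.

```python
from fractions import Fraction as F
from itertools import combinations

perm = [1, 2, 0, 4, 3, 5, 6]  # T(x) = perm[x]: cycles (0 1 2)(3 4)(5)(6)
N = len(perm)
MASS = F(1, N)                # uniform measure, which every permutation preserves


def cycles(perm):
    """Decompose the permutation into its cycles."""
    seen, out = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = perm[x]
        out.append(cyc)
    return out


def inner(u, v):
    """<u, v> = integral of u * v with respect to the uniform measure (real-valued u, v)."""
    return sum(u[x] * v[x] for x in range(N)) * MASS


def project_invariant(f):
    """Orthogonal projection onto T-invariant functions: average f over each cycle."""
    g = [F(0)] * N
    for cyc in cycles(perm):
        avg = F(sum(f[x] for x in cyc), len(cyc))
        for x in cyc:
            g[x] = avg
    return g


f = [3, 1, 4, 1, 5, 9, 2]
w = [2, 7, 1, 8, 2, 8, 1]
Pf = project_invariant(f)
results = []
cyc = cycles(perm)
for r in range(len(cyc) + 1):
    for chosen in combinations(cyc, r):
        I = {x for c in chosen for x in c}  # an invariant set: a union of cycles
        chi = [1 if x in I else 0 for x in range(N)]
        results.append(inner(chi, [f[x] - Pf[x] for x in range(N)]) == 0)
        results.append(inner(chi, [w[perm[x]] - w[x] for x in range(N)]) == 0)
print(all(results))  # True
```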

Solution 10.1
Let $T$ be an ergodic measure-preserving transformation of the probability space $(X,\mathcal{B},\mu)$ and let $f \in L^1(X,\mathcal{B},\mu)$. Let

$$S_n = \sum_{j=0}^{n-1} f(T^j x).$$

By Birkhoff’s Ergodic Theorem, there exists a set $N$ such that $\mu(N) = 0$ and if $x \notin N$ then $S_n/n \to \int f\,d\mu$ as $n \to \infty$. Let $x \notin N$. Note that

$$\frac{n+1}{n} \frac{S_{n+1}}{n+1} = \frac{f(T^n x)}{n} + \frac{S_n}{n}.$$

Letting $n \to \infty$ we have that $(n+1)/n \to 1$, $\frac{1}{n+1}S_{n+1} \to \int f\,d\mu$ and $\frac{1}{n}S_n \to \int f\,d\mu$. Hence if $x \notin N$ then $f(T^n x)/n \to 0$ as $n \to \infty$. Hence $f(T^n x)/n \to 0$ as $n \to \infty$ for $\mu$-a.e. $x \in X$.



Solution 10.2
Let $f \ge 0$ be measurable and suppose that $\int f\,d\mu = \infty$. For each integer $M > 0$ define $f_M(x) = \min\{f(x), M\}$. Then $0 \le f_M \le M$, hence $f_M \in L^1(X,\mathcal{B},\mu)$. Moreover $f_M(x) \uparrow f(x)$ as $M \to \infty$ for all $x \in X$. Hence by the Monotone Convergence Theorem (Theorem 3.1.2), $\int f_M\,d\mu \to \int f\,d\mu = \infty$.

By Birkhoff’s Ergodic Theorem, there exists $N_M \subset X$ with $\mu(N_M) = 0$ such that for all $x \notin N_M$ we have

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f_M(T^j x) = \int f_M\,d\mu. \tag{12.0.8}$$

Let $N = \bigcup_{M=1}^{\infty} N_M$. Then $\mu(N) = 0$. Moreover, for any $M > 0$ we have that if $x \notin N$ then (12.0.8) holds.

Let $K \ge 0$ be arbitrary. As $\int f_M\,d\mu \to \infty$, there exists $M > 0$ such that $\int f_M\,d\mu \ge K$. As $f \ge f_M$, for all $x \notin N$ we have

$$\liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) \ge \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f_M(T^j x) = \int f_M\,d\mu \ge K.$$

As $K$ is arbitrary, we have that for all $x \notin N$

$$\liminf_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \infty.$$

Hence $\frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) \to \infty$ for $\mu$-a.e. $x \in X$.

Solution 10.3
We prove that (i) implies (ii). Suppose that $T$ is an ergodic measure-preserving transformation of the probability space $(X,\mathcal{B},\mu)$. Recall from Proposition 10.2.2 that for all $A, B \in \mathcal{B}$,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \mu(T^{-j}A \cap B) = \mu(A)\mu(B).$$

Equivalently, for all $A, B \in \mathcal{B}$ we have

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int \chi_A(T^j x) \chi_B(x)\,d\mu = \int \chi_A\,d\mu \int \chi_B\,d\mu. \tag{12.0.9}$$

Let $f(x) = \sum_{k=1}^{r} c_k \chi_{A_k}(x)$ be a simple function. Then taking linear combinations of expressions of the form (12.0.9) we have that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int f(T^j x) \chi_B(x)\,d\mu = \int f\,d\mu \int \chi_B\,d\mu.$$

If $f \ge 0$ is a positive measurable function then we can choose a sequence of simple functions $f_n \uparrow f$ that increase pointwise to $f$. By the Monotone Convergence Theorem (Theorem 3.1.2) we have that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int f(T^j x) \chi_B(x)\,d\mu = \int f\,d\mu \int \chi_B\,d\mu \tag{12.0.10}$$



for all positive measurable functions $f$. Suppose that $f \in L^1(X,\mathcal{B},\mu)$ is real-valued. Then by writing $f = f^+ - f^-$ where $f^+, f^-$ are positive, we have that (12.0.10) holds when $f$ is integrable and real-valued. By taking real and imaginary parts of $f$, we have that (12.0.10) holds for all $f \in L^1(X,\mathcal{B},\mu)$.

By taking finite linear combinations of characteristic functions in (12.0.10) we have that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int f(T^j x) g(x)\,d\mu = \int f\,d\mu \int g\,d\mu \tag{12.0.11}$$

for all simple functions $g$. By taking an increasing sequence of simple functions and applying the Monotone Convergence Theorem as above, we have that (12.0.11) holds for all positive measurable functions $g$. By writing $g = g^+ - g^-$ where $g^+, g^-$ are positive, we have that (12.0.11) holds for any real-valued integrable function $g$. By taking real and imaginary parts, we have that (12.0.11) holds for any $g \in L^1(X,\mathcal{B},\mu)$.

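We prove that (ii) implies (i). Suppose that for all $f, g \in L^2(X,\mathcal{B},\mu)$ we have that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int f(T^j x) g(x)\,d\mu = \int f\,d\mu \int g\,d\mu.$$

Suppose that $T^{-1}B = B$, $B \in \mathcal{B}$. Then $\chi_B \in L^2(X,\mathcal{B},\mu)$. Taking $f = g = \chi_B$ we have that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \int \chi_B(T^j x) \chi_B(x)\,d\mu = \int \chi_B\,d\mu \int \chi_B\,d\mu = \mu(B)^2.$$

Note that $\chi_B(T^j x)\chi_B(x) = \chi_{T^{-j}B \cap B}(x) = \chi_B(x)$ as $T^{-j}B = B$. Hence

$$\frac{1}{n} \sum_{j=0}^{n-1} \int \chi_B(T^j x) \chi_B(x)\,d\mu = \frac{1}{n} \sum_{j=0}^{n-1} \int \chi_B\,d\mu = \int \chi_B\,d\mu = \mu(B).$$

Hence $\mu(B) = \mu(B)^2$, so that $\mu(B) = 0$ or $1$. Hence $T$ is ergodic.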

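Solution 10.4
Choose a countable dense set of continuous functions $\{f_i\}_{i=1}^{\infty} \subset C(X,\mathbb{R})$. By Birkhoff’s Ergodic Theorem there exists $Y_i \in \mathcal{B}$ such that $\mu(Y_i) = 1$ and

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f_i(T^j x) = \int f_i\,d\mu$$

for all $x \in Y_i$. Let $Y = \bigcap_{i=1}^{\infty} Y_i$. Then $Y \in \mathcal{B}$ and $\mu(Y) = 1$.

Let $f \in C(X,\mathbb{R})$, $\varepsilon > 0$, $x \in Y$. Choose $i$ such that $\|f - f_i\|_\infty < \varepsilon$. Choose $N$ such that if $n \ge N$ then

$$\left| \frac{1}{n} \sum_{j=0}^{n-1} f_i(T^j x) - \int f_i\,d\mu \right| < \varepsilon.$$

Then, for $n \ge N$,

$$\begin{aligned}
\left| \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) - \int f\,d\mu \right|
&\le \left| \frac{1}{n} \sum_{j=0}^{n-1} \left( f(T^j x) - f_i(T^j x) \right) \right| + \left| \frac{1}{n} \sum_{j=0}^{n-1} f_i(T^j x) - \int f_i\,d\mu \right| + \left| \int f_i\,d\mu - \int f\,d\mu \right| \\
&< 3\varepsilon.
\end{aligned}$$

As $\varepsilon > 0$ is arbitrary, we have that for all $f \in C(X,\mathbb{R})$ and for all $x \in Y$,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \int f\,d\mu.$$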

Solution 10.5
Let $S = \{A, B, C, \ldots, Z\}$ denote the finite set of letters (symbols) in the alphabet. Let $\Sigma = \{x = (x_j)_{j=0}^{\infty} \mid x_j \in S,\ j = 0, 1, 2, \ldots\}$ denote the space of all infinite sequences of symbols. For each $s \in S$, let $p(s) = 1/26$ denote the probability of choosing symbol $s$. Let $\mathcal{B}$ denote the Borel $\sigma$-algebra on $\Sigma$ and equip $\Sigma$ with the Bernoulli probability measure $\mu$ defined on cylinders by

$$\mu([i_0, i_1, \ldots, i_{n-1}]) = p(i_0)p(i_1)\cdots p(i_{n-1}).$$

Define $\sigma : \Sigma \to \Sigma$ by $(\sigma(x))_j = x_{j+1}$. We regard an element $x \in \Sigma$ as one possible outcome of the monkey typing an infinite sequence of letters.

Let $B$ denote the cylinder $[M, O, N, K, E, Y]$. Then $\mu(B) = 1/26^6 > 0$. As the Bernoulli shift is ergodic, Birkhoff’s Ergodic Theorem gives, for $\mu$-a.e. $x \in \Sigma$,

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} \chi_B(\sigma^j(x)) = \mu(B) > 0.$$

Hence, almost surely, the infinite sequence of letters $x$ will contain ‘MONKEY’. Hence, with probability 1, the monkey will type the word ‘MONKEY’. (Indeed, with probability one he will type ‘MONKEY’ infinitely often.)

By Kac’s Lemma, the expected first time at which ‘MONKEY’ appears is $1/\mu(B) = 26^6$. If the monkey types 1 letter a second, then one would expect to wait $26^6 = 308{,}915{,}776$ seconds (about 9.8 years) until ‘MONKEY’ first appears in a block of 6.
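The numbers in the last paragraph, and Kac’s Lemma itself, can be checked quickly. The Python sketch below converts $26^6$ seconds into years (assuming a year of 365.25 days) and then, since a six-letter word is far too rare to simulate, tests Kac’s prediction on a hypothetical small case: for i.i.d. fair bits and the word ‘10’ we have $\mu([1,0]) = 1/4$, so the average gap between successive occurrences (the mean return time to the cylinder) should be close to $4$.

```python
import random

SECONDS_PER_YEAR = 365.25 * 24 * 3600
wait_seconds = 26 ** 6  # 1/mu(B) for B = [M, O, N, K, E, Y]
print(wait_seconds, round(wait_seconds / SECONDS_PER_YEAR, 2))  # 308915776 9.79


def mean_return_time(word, alphabet, n, seed=0):
    """Average gap between successive occurrences of `word` in n i.i.d. uniform letters.

    By Birkhoff's theorem the occurrences have frequency mu([word]), so this
    estimates 1/mu([word]), the mean return time given by Kac's Lemma.
    """
    rng = random.Random(seed)
    text = [rng.choice(alphabet) for _ in range(n)]
    k = len(word)
    target = list(word)
    hits = [j for j in range(n - k + 1) if text[j:j + k] == target]
    gaps = [b - a for a, b in zip(hits, hits[1:])]
    return sum(gaps) / len(gaps)


print(mean_return_time("10", "01", 200_000))  # close to 4
```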

Solution 11.1
We first claim that for each integer $b \ge 2$, $T(x) = T_b(x) = bx \bmod 1$ is ergodic with respect to Lebesgue measure $\mu$ (we already know that Lebesgue measure is invariant by Exercise 3.6). To see this, we use Fourier series, following the argument that was used to prove that the doubling map is ergodic with respect to Lebesgue measure.

Suppose that $f \in L^2(\mathbb{R}/\mathbb{Z},\mathcal{B},\mu)$ is such that $f \circ T = f$ $\mu$-a.e. Then $f \circ T^p = f$ $\mu$-a.e. for all $p \in \mathbb{N}$. Associate to $f$ its Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n x}$. Then $f \circ T^p$ has Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{2\pi i n b^p x}$. Comparing Fourier coefficients we see that $c_{b^p n} = c_n$. Suppose that $n \neq 0$. Then $|b^p n| \to \infty$ as $p \to \infty$. By the Riemann-Lebesgue Lemma (Proposition 5.3.1(ii)), $c_n = c_{b^p n} \to 0$ as $p \to \infty$. Hence $c_n = 0$ if $n \neq 0$. Hence $f$ has Fourier series $c_0$, i.e. $f$ is constant a.e. Hence $T$ is ergodic with respect to Lebesgue measure.

Solution 11.2
(i) Let

$$X_b = \{x \in [0, 1) \mid x \text{ is simply normal in base } b\}.$$

Then for each $b \ge 2$, $X_b$ has Lebesgue measure $\mu(X_b) = 1$. Hence

$$X_\infty = \bigcap_{b=2}^{\infty} X_b$$

consists of all numbers that are simply normal in every base $b \ge 2$. As a countable intersection of sets of full measure, $\mu(X_\infty) = 1$.

(ii) Let $X(b)$ denote the set of numbers that are normal in base $b$, $b \ge 2$. Then $X_\infty = \bigcap_{b=2}^{\infty} X(b)$ consists of all normal numbers. As before, $\mu(X_\infty) = 1$.

Alternatively, note that $x \in [0, 1]$ is simply normal in base $b^k$ if and only if every word of length $k$ occurs with frequency $1/b^k$ in the base $b$ expansion of $x$. Hence a number is normal in every base if and only if it is simply normal in every base.

Solution 11.3
Let $T(x) = rx \bmod 1$, $T : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/\mathbb{Z}$. From Exercise 11.1 we know that $T$ is ergodic with respect to Lebesgue measure $\mu$. Let $x \in [0, 1]$ and let $x_n = r^n x$. Then $\{x_n\}$, the fractional part of $x_n$, is equal to $T^n x$. Let $\ell \in \mathbb{Z} \setminus \{0\}$ and let $f_\ell(x) = e^{2\pi i \ell x}$. Then there exists $N_\ell \in \mathcal{B}$, $\mu(N_\ell) = 0$, such that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f_\ell(T^j x) = \int f_\ell(x)\,dx = 0$$

for all $x \notin N_\ell$. Let $N = \bigcup_{\ell \in \mathbb{Z} \setminus \{0\}} N_\ell$. As $\mu(N_\ell) = 0$ and this is a countable union, we have that $\mu(N) = 0$. Hence if $x \notin N$ we have for all $\ell \in \mathbb{Z} \setminus \{0\}$

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i \ell x_j} = 0.$$

By Weyl’s Criterion it follows that if $x \notin N$ then $x_n$ is uniformly distributed mod 1.

(Aside: you might wonder why we had to use Weyl’s Criterion and did not just use the definition of uniform distribution. Whilst it is certainly true that

$$\frac{1}{n} \operatorname{card}\{j \in \{0, 1, \ldots, n-1\} \mid \{x_j\} \in [a, b]\} = \frac{1}{n} \sum_{j=0}^{n-1} \chi_{[a,b]}(T^j x) \to \int \chi_{[a,b]}\,d\mu = b - a$$

for $\mu$-a.e. $x \in X$, the set of measure zero for which this fails depends on the interval $[a, b]$. We need a set of measure zero that works for all intervals. As there are uncountably many intervals, we cannot just take the union of all the sets of measure zero as we did above. One can make an argument along these lines work, by considering intervals with rational endpoints (and so a countable collection of intervals) and then approximating an arbitrary interval.)
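The statement can be explored numerically, with one caveat: iterating $x \mapsto rx \bmod 1$ in floating point destroys the orbit after a few dozen steps. The Python sketch below therefore works with digits, assuming $r = 2$ and a Lebesgue-random $x$ (equivalently, i.i.d. uniform binary digits $d_0, d_1, \ldots$), and uses the fact that $\{2^j x\}$ has binary digits $d_j, d_{j+1}, \ldots$, truncated here to 40 places. It then evaluates the Weyl averages $\left| \frac{1}{n} \sum_{j<n} e^{2\pi i \ell x_j} \right|$ for a few $\ell$; for a random $x$ their mean square is exactly $1/n$, so they should be roughly of size $n^{-1/2}$.

```python
import cmath
import random


def fractional_parts(r, num_terms, precision=40, seed=1):
    """Approximate {r^j x} for j < num_terms, where x has i.i.d. uniform base-r digits.

    {r^j x} has base-r digits d_j, d_{j+1}, ..., so we keep `precision` of them.
    """
    rng = random.Random(seed)
    digits = [rng.randrange(r) for _ in range(num_terms + precision)]
    values = []
    for j in range(num_terms):
        value = 0.0
        for i in reversed(range(precision)):  # Horner's rule, starting from the last kept digit
            value = (value + digits[j + i]) / r
        values.append(value)
    return values


def weyl_average(xs, ell):
    """|(1/n) * sum of exp(2 pi i ell x_j)| over the sequence xs."""
    return abs(sum(cmath.exp(2j * cmath.pi * ell * t) for t in xs)) / len(xs)


xs = fractional_parts(2, 20_000)
for ell in (1, 2, 3, 5):
    print(ell, weyl_average(xs, ell))  # each of rough size 1/sqrt(20000), about 0.007
```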

Solution 11.4
Let $T(x) = 10x \bmod 1$. From Exercise 11.1 we know that $T$ is ergodic with respect to Lebesgue measure. Let $x \in [0, 1]$ have decimal expansion

$$x = \sum_{j=0}^{\infty} \frac{x_j}{10^{j+1}}$$

with $x_j \in \{0, 1, \ldots, 9\}$. Let

$$f(x) = \sum_{k=0}^{9} k \chi_{[k/10, (k+1)/10)}(x)$$

so that $f(x) = k$ precisely when $x_0 = k$. Note that $f(T^j x) = k$ precisely when $x_j = k$. Then

$$\frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x).$$

Hence by Birkhoff’s Ergodic Theorem, for almost every $x$,

$$\lim_{n\to\infty} \frac{1}{n}(x_0 + x_1 + \cdots + x_{n-1}) = \lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = \int f(x)\,dx = \sum_{k=0}^{9} \frac{k}{10} = 4.5.$$
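A short computation illustrates both the value of the integral and the role of ‘almost every’. In the Python sketch below the integral is evaluated exactly, a Lebesgue-random $x$ is simulated through i.i.d. uniform decimal digits (an assumption equivalent to $x$ being uniformly distributed on $[0,1)$), and the rational number $x = 1/3 = 0.333\ldots$ is included as an exceptional point whose digit average is $3$ rather than $4.5$.

```python
import random
from fractions import Fraction

# Integral of f = sum_k k * chi_[k/10, (k+1)/10) with respect to Lebesgue measure.
integral = sum(Fraction(k, 10) for k in range(10))
print(integral)  # 9/2


def digit_average_random(n, seed=2):
    """Mean of the first n decimal digits of a Lebesgue-random x (i.i.d. uniform digits)."""
    rng = random.Random(seed)
    return sum(rng.randrange(10) for _ in range(n)) / n


def decimal_digits(x, n):
    """First n decimal digits x_0, x_1, ... of a rational x in [0, 1), i.e. f(T^j x) for j < n."""
    digits = []
    for _ in range(n):
        x *= 10
        d = int(x)  # floor, as x >= 0
        digits.append(d)
        x -= d
    return digits


print(digit_average_random(100_000))                 # close to 4.5
print(sum(decimal_digits(Fraction(1, 3), 20)) / 20)  # 3.0: 1/3 lies in the exceptional null set
```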

Solution 11.5
If $x \in [0, 1]$ then write the continued fraction expansion of $x$ as $[x_0, x_1, \ldots]$.

Define $f : [0, 1] \to \mathbb{R}$ by

$$f(x) = \sum_{k=1}^{\infty} k \chi_{(1/(k+1), 1/k]}(x)$$

if $x \in (0, 1]$ and set $f(0) = 0$. Then for $x \in (0, 1]$ we have that $f(x) = k$ precisely when $1/(k+1) < x \le 1/k$, i.e. $f(x) = k$ when $x_0 = k$. Hence $f(T^j x) = k$ precisely when $x_j = k$.

We can write

$$\frac{1}{n}(x_0 + \cdots + x_{n-1}) = \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x).$$

Clearly $f \ge 0$ and is measurable. However $f \notin L^1(X,\mathcal{B},\mu)$. To see this, using Exercise 3.5(iii), it is sufficient to show that $f \notin L^1(X,\mathcal{B},\lambda)$ where $\lambda$ denotes Lebesgue measure. Note that

$$\int f\,d\lambda = \sum_{k=1}^{\infty} k\,\lambda\left( \left( \frac{1}{k+1}, \frac{1}{k} \right] \right) = \sum_{k=1}^{\infty} k \left( \frac{1}{k} - \frac{1}{k+1} \right) = \sum_{k=1}^{\infty} \frac{1}{k+1} = \infty.$$

By the results of Exercise 10.2, it follows that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \infty$$

for $\mu$-a.e. $x \in X$. As Gauss’ measure and Lebesgue measure have the same sets of measure zero, we have that

$$\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j(x)) = \infty$$

for Lebesgue almost every point $x \in X$.
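The divergence of $\int f\,d\lambda$ is slow, which is worth seeing numerically. The Python sketch below checks exactly that the level set $\{f = k\}$ contributes $k\,\lambda((1/(k+1), 1/k]) = 1/(k+1)$, and evaluates the truncated integrals $\int_{(1/(K+1), 1]} f\,d\lambda = \sum_{k=1}^{K} 1/(k+1)$, which grow like $\log K$; the particular values of $K$ are arbitrary.

```python
import math
from fractions import Fraction


def level_set_contribution(k):
    """k * lambda((1/(k+1), 1/k]), computed exactly."""
    return k * (Fraction(1, k) - Fraction(1, k + 1))


def truncated_integral(K):
    """Integral of f over (1/(K+1), 1] with respect to Lebesgue measure."""
    return sum(1.0 / (k + 1) for k in range(1, K + 1))


print(all(level_set_contribution(k) == Fraction(1, k + 1) for k in range(1, 200)))  # True
for K in (10, 1_000, 100_000):
    # The harmonic-type sum exceeds log(K + 2) - 1, so it is unbounded in K.
    print(K, truncated_integral(K), math.log(K + 2) - 1)
```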
