Topics in Ergodic Theory- Part III - maths-notes · An interesting problem from number theory and combinatorics is Szemer edi’s theorem. In this course, we will give an ergodic

Part III – Topics in Ergodic Theory (Ongoing course,

rough)

Based on lectures by Dr. P. VarjuNotes taken by Bhavik Mehta

Michaelmas 2018

Contents

1 Measure preserving systems 2

2 Furstenberg correspondence 42.1 Aside on weak limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3 Ergodicity 7

4 Ergodic theorems 10

5 Proof of pointwise ergodic theorem 13

6 Unique ergodicity 16

7 Equidistribution of polynomials 21

8 Mixing properties 258.1 Cutting and stacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

9 Entropy 33

10 Rudolph’s Theorem 46

Index 53

1

1 Measure preserving systems

ErgodicLecture 1 theory is all about measure preserving systems.

Definition (Measure preserving system). A measure preserving system (X,B, µ, T )with X a set, B a σ-algebra, µ a probability measure (µ(A) ≥ 0 ∀A ∈ B and µ(X) = 1) andT is a measure preserving transformation. Recall a measure preserving transformationT : X → X is a measurable function such that µ(T−1(A)) = µ(A) ∀A ∈ B.

If Y is a random element of X with distribution µ, then T (Y ) also has distribution µ.

Example. For example, consider a circle rotation. We have X = R/Z, B is the Borel sets,µ the Lebesgue measure, and T = Rα, with x 7→ x+ α and α ∈ R/Z is a parameter.

We also have the ‘times 2’ or ‘doubling’ map, with the same X,B, µ and T = T2, x 7→ 2·x.

Proof that T2 is measure preserving. First check for intervals: Let I = (a, b), then µ(I) =b− a. Also, µ(T−1

2 I) = µ((a2 ,

b2 ) ∪ (a2 + 1

2 ,b2 + 1

2 ))

= b2 −

a2 + b

2 −a2 = b− a, as required.

Now, let U ⊂ R/Z be open. Then U = I1 t I2 t · · · is a disjoint union of intervals:

µ(T−1U) = µ(⋃

T−1Ij

)=∑

µ(T−1Ij)

=∑

µ(Ij)

= µ(U).

Let K ⊂ R/Z be a compact set.

µ(T−1K) = 1− µ((T−1K)c) = 1− µ(T−1Kc) = 1− µ(Kc) = µ(K).

Now let A ∈ B be arbitrary. Let ε > 0. ∃U open and ∃K compact such that K ⊂ A ⊂ Uand µ(U \K) < ε.

µ(K) = µ(T−1K) ≤ µ(T−1A) ≤ µ(T−1U) = µ(U).

We also have µ(K) ≤ µ(A) ≤ µ(U). Since µ(U)− µ(K) < ε, |µ(A)− µ(T−1A)| < ε. ε wasarbitrary, so µ(A) = µ(T−1A).

The two examples generalise to the Haar measure on a topological group and to endo-morphisms respectively.

In ergodic theory, we study the long term behaviour of orbits.

Definition (Orbit). The orbit of x ∈ X is the sequence

x, Tx, T 2x, . . .

Some questions we might ask are:

• Let A ∈ B and x ∈ A. Does the orbit of x visit A infinitely often? (Recurrence)

• What is the proportion of times n such that Tnx ∈ A? (Ergodicity)

• What is µ({x ∈ A | Tnx ∈ A }) if n is large? (Mixing)

Ongoing course, rough 2 Updated online

Example. Let A = [0, 14 ) ⊂ R/Z. Then Tn2 x ∈ A ⇐⇒ the n + 1th and n + 2th ‘binary

digits’ of x are 0:For x = 0.x1x2x3 . . .2, x ∈ A corresponds to x1, x2 both being 0 and the doubling map

sends x to T2x = x2x3 . . .2, showing the required property.For example, x = 1

6 = 0.00101010 . . .2 starts in A but never comes back to A, but ‘mostpoints’ do return to A. Also, we have µ({x ∈ A | Tn2 x }) = 1

16 for any n ≥ 2.

Example (Markov shift). Let (P1, P2, . . . , Pn) be a probability vector. Let A ∈ Rn×n≥0 bethe ‘matrix of transition probabilities’. Assume

A

11...1

=

11...1

,(P1 P2 . . . Pn

)A =

(P1 P2 . . . Pn

)

Take X = {1, . . . , n}Z, B = Borel σ-algebra generated by the product topology of thediscrete topology on {1, . . . , n}, T = σ the shift map: (σx)m = xm+1. Finally, set themeasure

µ({x ∈ X | xm = i0, xm+1 = i1, . . . , xm+n = in }) = Pi0ai0i1 · · · ain−1in .

On the example sheet, we will confirm this extends to a measure and that it is invariantunder T .


2 Furstenberg correspondence

An interesting problem from number theory and combinatorics is Szemeredi’s theorem.In this course, we will give an ergodic theory proof of this, using Furstenberg’s multiplerecurrence theorem.

Theorem (Szemeredi’s theorem). Let S ⊂ Z of positive upper Banach density. That is,

d(S) := lim supN,M :M−N→∞

1

M −N∣∣S ∩ [N,M − 1]

∣∣and d(S) > 0. Then S contains arbitrarily long arithmetic progressions. That is, ∀l,∃a ∈Z, d ∈ Z>0,

a, a+ d, . . . , a+ (l − 1)d ∈ S.

Theorem (Furstenberg, multiple recurrence). Let (X,B, µ, T ) be a measure preservingsystem. Let A ∈ B such that µ(A) > 0. Let l ∈ Z>0. Then

lim infN→∞

1

N

N∑n=1

µ(A ∩ T−nA ∩ · · · ∩ T−(l−1)nA) > 0.

ToLecture 2 prove Furstenberg’s theorem requires more preparation, and we will do it later in thiscourse. However, we can now prove Szemeredi assuming Furstenberg multiple recurrence.

Proof of Szemeredi using Furstenberg. Aim to construct a measure preserving system in or-der to use Furstenberg multiple recurrence, using the given set S.

Take X = {0, 1}Z, B = Borel σ-algebra and σ = the shift map (σ(x))n = xn+1. LetxS ∈ X be defined by

xSn =

{1 n ∈ S0 n /∈ S

(so xS is effectively the indicator function of S). Also let A ∈ B be given by A ={x ∈ X | x0 = 1 }. Observe then that

n ∈ S ⇐⇒ xSn = 1 ⇐⇒ (σnxS)0 = 1 ⇐⇒ σnxS ∈ A.

The equivalence n ∈ S ⇐⇒ σnxS ∈ A allows us to relate the set S to the dynamics of σ.Now, we begin to construct the invariant measure µ. Let {Mm} and {Nm} be sequences

s.t. Mm −Nm →∞ and

d(S) = limm→∞

1

Mm −Nm|S ∩ [Nm,Mm − 1]| .

Let

µm =1

Mm −Nm

Mm−1∑n=Nm

δσnxS

where δx is a measure on X defined as

δx(B) =

{1 x ∈ B0 x /∈ B.

Let µ be the weak limit of a subsequence of µm. Note that the µ could be differentdependent on subsequence choice.


2.1 Aside on weak limits

We pause the proof to briefly recap material on weak limits.

Definition (Weak limit). Let X be a compact metric space. Let µm be a sequence of Borelmeasures on X, and let µ be another Borel measure. Then µm converges weakly to µ iffor any f ∈ C(X), we have

limn→∞

∫X

f dµn =

∫X

f dµ.

We write lim-wn→∞ µn = µ.

Theorem (Banach-Alaoglu, or Helly). Let X be a compact metric space. ThenM(X), theset of Borel probability measures on X, endowed with the topology of weak convergence, iscompact and metrizable. That is, there is a weakly convergent subsequence in any sequenceof Borel probability measures.

We are ready to return to the main proof: next we check that the system constructed isindeed a measure preserving system.

Lemma. (X,B, µ, σ) as defined above is a measure preserving system.

Proof sketch. Let B ∈ B. Then

µm(B) =1

Mm −Nm∣∣{n ∈ [Nm,Mm − 1] | σnxS ∈ B }

∣∣µm(σ−1B) =

1

Mm −Nm∣∣{n ∈ [Nm,Mm − 1] | σnxS ∈ σ−1B }

∣∣=

1

Mm −Nm∣∣{n ∈ [Nm + 1,Mm] | σnxS ∈ B }

∣∣So the difference is such that∣∣µm(B)− µm(σ−1B)

∣∣ ≤ 1

Mm −Nm→ 0

It can be shown that we can pass to the limit on m and conclude that µ(B) = µ(σ−1B).

Remark. If B is a cylinder set, i.e. ∃L ∈ Z>0 and B ⊆ {0, 1}2L+1 such that

B = {x ∈ X | (x−L, . . . , xL) ∈ B } ,

then B is both closed and open. Therefore χB , the characteristic function of B is continuous.Hence limn→∞ µm(B) = µ(B), since µm(B) =

∫χB dµm and µ(B) =

∫χB dµ.

Approximating any Borel set by such cylinder sets would help complete the proof, butwe in fact can get this result on spaces where χ is not continuous on a nice set of sets. Sowe leave full proof till a more general theorem.

Proposition. Let S ⊆ Z, xS , A, (X,B, µ, σ) as defined above. Let l ∈ Z>0. Suppose that∃n ∈ Z>0 such that

µ(A ∩ σ−n(A) ∩ · · · ∩ σ−n(l−1)(A)

)> 0.

Then S contains an arithmetic progression of length l.


Proof. Without loss of generality, we can assume µ = lim-w µm - if not, pass to a subse-quence. Let B = A ∩ σ−nA ∩ · · · ∩ σ−n(l−1)(A). Observe that B is a cylinder set. Then bythe remark, µ(B) = limµm(B), hence ∃m such that µm(B) > 0.

By definition of µm, ∃k ∈ [Nm,Mm − 1] such that σkxS ∈ B. Hence

σkxS ∈ A, σkxS ∈ σ−n(A), . . . , σkxS ∈ σ−n(l−1)(A).

Thus, k, k + n, . . . , k + n(l − 1) ∈ S.

Returning to the overall proof, we note A is a cylinder set. Then µm(A)→ µ(A), i.e.

µ(A) = limm→∞

1

Mm −Nm|{n ∈ [Nm,Mm − 1] : n ∈ S }|︸︷︷︸

d(S)

> 0

where the inequality comes from satisfying the conditions of Szemeredi, and Furstenberg’smultiple recurrence finishes the argument.


3 Ergodicity

Lemma.Lecture 3 Let (X,B, µ, T ) be a measure preserving system. Let A ∈ B with µ(A) > 0. Then∃n ∈ Z>0 such that µ(A ∩ T−nA) > 0.

Proof. Suppose for contradiction that µ(A ∩ T−nA) = 0 for all n > 0. Then

µ(T−kA ∩ T−nA) = µ(A ∩ T−(n−k)A) = 0

for all n > k ≥ 0. Thus the sets A, T−1A, . . . are ‘almost pairwise disjoint’.Then

µ(A ∪ T−1A ∪ · · · ∪ T−nA) = µ(A)

+ µ(T−1A)− µ(T−1A ∩A)︸︷︷︸=0

+ µ(T−2A)− µ(T−2A ∩ (A ∪ T−1A))︸︷︷︸=0

...

+ µ(T−nA)− µ(T−nA ∩ (A ∪ T−1A ∪ · · · ∪ T−(n−1)A))︸︷︷︸=0

= (n+ 1)µ(A),

a contradiction if n+ 1 > µ(A)−1.

Theorem (Poincare recurrence). Let (X,B, µ, T ) be a measure preserving system. LetA ∈ B with µ(A) > 0. Then almost every x ∈ A returns to A infinitely often. That is:

µ

(A \

∞⋂N=1

∞⋃n=N

T−nA

)= 0.

Remark. x ∈ T−nA ⇐⇒ Tnx ∈ A.⋃∞n=N T

−nA are the points that visit A at least onceafter time N .

Proof. Let A0 be the set of points in A that never return to A. We first show µ(A0) = 0.Note that

µ(A0 ∩ T−nA0) ≤ µ(A0 ∩ T−nA) = µ(∅) = 0

for all n > 0. By the lemma, µ(A0) = 0.Note that if

x ∈ A \∞⋂N=1

∞⋃n=N

T−nA,

then there is a maximal m ∈ Z≥0 such that Tmx ∈ A0. This means that

A \⋂⋃

T−nA ⊂∞⋃m=0

T−mA0

where the right hand side has measure 0.


This effectively answers the first question we asked earlier - the orbit of almost every xvisits A infinitely often if µ(A) > 0.

But what if the point does not start in A? The main issue that can occur is that Xsplits into parts, which are preserved under T :

So, we define a notion of ‘irreducible’ for measure preserving systems.

Definition (Ergodic). A measure preserving system is called ergodic if A = T−1A impliesµ(A) = 0 or 1 for all A ∈ B.

If a measure preserving system is not ergodic, we have A ∈ B with 0 < µ(A) < 1such that T−1A = A, then we can restrict the measure preserving system to A. That is,we consider the measure preserving system (A,BA, µA, T |A) where BA = {B ∈ B | B ⊆ A }and µA(B) = µ(B)

µ(A) for all B ∈ BA.

Theorem. The following are equivalent for an measure preserving system (X,B, µ, T ).

(1) (X,B, µ, T ) is ergodic.

(2) For all A ∈ B with µ(A) > 0,

µ

( ∞⋂N=1

∞⋃n=N

T−nA

)= 1.

(3) µ(A4 T−1A) = 0 implies µ(A) = 0 or 1 ∀A ∈ B.

(4) For all bounded measurable functions f : X → R, f = f ◦T almost everywhere impliesf is constant almost everywhere.

(5) For all bounded measurable functions f : X → C, f = f ◦T almost everywhere impliesf is constant almost everywhere.

Proof. (1) ⇒ (2). Let A ∈ B with µ(A) > 0. Let B =⋂⋃

T−nA. By Poincare recurrence,µ(B) ≥ µ(A) > 0. So if we show that B = T−1B, then µ(B) = 1 follows by ergodicity. But

x ∈ B ⇐⇒ x visits A infinitely often ⇐⇒ Tx visits A infinitely often ⇐⇒ Tx ∈ B.

So B = T−1B.(2) ⇒ (3). Let A ∈ B such that µ(A4 T−1A) = 0. If µ(A) = 0, there is nothing to

prove. Suppose µ(A) > 0. Let B =⋂⋃

T−nA. By (2), we know that µ(B) = 1.We show µ(B \ A) = 0, which completes the proof. Let x ∈ B \ A, then there is a first

time m such that Tmx ∈ A, and m > 0. Hence x ∈ T−mA \ T−(m−1)A. Thus

B \A ⊆⋃m

T−mA \ T−(m−1)A,


and µ(T−mA \ T−(m−1)A) = µ(T−1A \A) = 0, so µ(B \A) = 0.(3)⇒ (4). Let f : X → R be a bounded measurable function such that f = f ◦T almost

everywhere. For all t ∈ R, let At = {x ∈ X | f(x) ≤ t }. Then µ(At4 T−1At) = 0. By (3),we have µ(At) = 0 or 1 for all t.

If t is very small, then µ(At) = 0. If t is very large, µ(At) = 1. t 7→ µ(At) is monotone,hence ∃c ∈ R such that µ(At) = 0 for all t < c and µ(At) = 1 ∀t > c. Then f(x) = c almosteverywhere.

(4) ⇔ (5) is left as an exercise.(4) ⇒ (1). Let A ∈ B with A = T−1A. Then χA = χA ◦ T everywhere so χA is constant

almost everywhere.

Example. The circle rotation map (R/Z,B, µ,Rα) is ergodic iff α is irrational. Let f :X → R be measurable. We can write f(x) =

∑n∈Z an exp(2πinx).

f ◦Rα(x) = f(x+ α) =∑n∈Z

an exp(2πin(x+ α))

=∑n∈Z

an exp(2πinα) exp(2πinx)

So f = f ◦Rα ⇐⇒ an = an exp(2πinα) ∀n. If α is irrational, then exp(2πinα) 6= 1 for alln 6= 0, so an = 0.


4 Ergodic theorems

InLecture 4 the previous section, we discussed recurrence, which is concerned with orbits visiting aparticular set infinitely often. However, we have not addressed how frequent these visitsare: the second question we asked earlier.

Theorem (Mean Ergodic Theorem, von Neumann). Let (X,B, µ, T ) be a measure preserv-ing system, and write

I := { f ∈ L2(X) | f ◦ T = f a.e. }

for the (closed) subspace of T -invariant functions. Denote by PT : L2(X)→ I the orthogonalprojection. Then for every f ∈ L2(X):

1

N

N−1∑n=0

f ◦ Tn → PT f in L2(X).

Theorem (Pointwise Ergodic Theorem, Birkhoff). Let (X,B, µ, T ) be a measure preservingsystem. Then for every f ∈ L1(X), there is a T -invariant function f∗ ∈ L1(X) such that

1

N

N−1∑n=0

f ◦ Tn → f∗(x) pointwise a.e.

When f ∈ L2(X), then the function f∗ in the Pointwise Ergodic Theorem is PT f . TheMean Ergodic Theorem also holds in Lp for 1 ≤ p <∞, see the example sheet. We call thequantity 1

N

∑N−1n=0 f ◦ Tn the ergodic average.

Lemma. Let (X,B, µ) be a probability space, and let T : X → X be a measurable trans-formation. Then T is measure preserving if and only if∫

X

f ◦ T dµ =

∫X

f dµ (∗)

for all f ∈ L1(X).

Proof. (⇒). Let A ∈ B and note that x ∈ T−1(A) is equivalent to Tx ∈ A. Hence we canwrite

µ(T−1(A)) =

∫χT−1A dµ =

∫χA ◦ T dµ =

∫χA dµ = µ(A)

so T is measure preserving.(⇐). As in (⇒), we can show (∗) for characteristic functions. Then by linearity of

integration, it also holds for simple functions. Now let f ∈ L1(X) be non-negative. Takingf1, f2, . . . an increasing sequence of simple functions with f = lim fn almost everywhere.Then f ◦ T = lim fn ◦ T almost everywhere (since T is measure preserving). Hence usingthe definition of integration and (∗) for simple functions∫

f ◦ T dµ = lim

∫fn ◦ T dµ = lim

∫fn dµ =

∫f dµ.

Finally, a general f ∈ L1(X) can be written as the difference of two non-negative functions,and conclude (∗) by linearity.


Definition (Koopman operator). Let (X,B, µ, T ) be a measure preserving system. Forf : X → C a measurable function, define

UT f = f ◦ T

and call UT the Koopman operator.

Lemma. The Koopman operator UT is an isometry on the Hilbert space L2(X), i.e.

〈f, g〉 = 〈UT f, UT g〉

for all f, g ∈ L2(X).

Proof. Apply the previous lemma to f · g ∈ L1(X).

Definition (Invertible). A measure preserving system (X,B, µ, T ) is said to be invertibleif there is a measure preserving map S : X → X such that T ◦ S = S ◦ T = idX almosteverywhere. If such a map exists, denote it as T−1.

Example. The circle rotation is invertible, but the doubling map is not.

Lemma. If the measure preserving system (X,B, µ, T ) is invertible, then UT is unitary onL2(X) and U∗T = UT−1 .

Proof. We use the earlier lemma to the function f · (g ◦ T−1):

〈f, UT−1g〉 =

∫f · g ◦ T−1 dµ

=

∫(f · (g ◦ T−1)) ◦ T dµ

=

∫f ◦ T · g dµ

= 〈UT f, g〉

This shows that U∗T = UT−1 . Clearly UTUT−1 = UT−1UT = idL2(X), so UT is unitary.

The proofs we will give of the ergodic theorems rely on the idea that convergence is easyfor certain special functions.

First, the ergodic averages of a T -invariant function f ∈ I are equal to f , so they convergeto f . If f = UT g− g for some g, then the ergodic averages become telescopic sums and onlythe boundary terms remain, that is:

1

N

N−1∑n=0

UnT (UT g − g) =1

N(UNT g − g)

and the right hand side is easily seen to converge to 0. The following lemma shows thatthese two type of functions are enough to look at.

Lemma. WriteB := {UT g − g | g ∈ L2(X) } .

Then I = B⊥.


It is important to note that the space B is not closed. So L2(X) = I ⊕ B, but not everyfunction in L2(X) is the sum of a T -invariant function and an element of B.

Proof. We can write

f ∈ B⊥ ⇐⇒ 〈f, UT g − g〉 = 0 ∀g ∈ L2(X)

⇐⇒ 〈f, g〉 = 〈f, UT g〉 = 〈U∗T f, g〉 ∀g ∈ L2(X)

⇐⇒ f = U∗T f.

If the system was invertible, then we could finish the proof by applying UT = (U∗T )−1 toboth sides of the last equation.

But, we can prove that f = UT f ⇐⇒ f = U∗T f in the general case:

f = UT f ⇐⇒ 〈f − UT f, f − UT f〉 = 0

⇐⇒ ‖f‖22 + ‖UT f‖22 − 〈f, UT f〉 − 〈UT f, f〉 = 0

⇐⇒ ‖f‖22 + ‖U∗T ‖22 − 〈U∗T f, f〉 − 〈f, U∗T f〉+ (‖UT f‖22 − ‖U∗T f‖22) = 0

⇐⇒ ‖f − U∗T f‖22 + (‖UT f‖22 − ‖U∗T f‖22) = 0

Note that both terms in the left hand side of the final equation are non-negative, since‖UT f‖2 = ‖f‖2 since UT is an isometry, and ‖U∗T f‖2 ≤ ‖f‖2 as ‖U∗T ‖ = ‖UT ‖ = 1.

Thus,f = UT f ⇐⇒ f = U∗T f and ‖f‖2 = ‖U∗T f‖2.

The second statement of the right follows from the first one, hence f = UT f ⇐⇒ f = U∗T f ,as required.

Proof of Mean Ergodic Theorem. Let f ∈ L2(X) and fix ε > 0. By the lemma, we can writef = PT f + UT g − g + e for some e, g ∈ L2(X) such that ‖e‖2 < ε. Thus

lim supn→∞

∥∥∥∥∥ 1

N

N−1∑n=0

UnT f − PT f

∥∥∥∥∥2

≤ lim supn→∞

∥∥∥∥∥ 1

N

(UNT g − g

)+

1

N

N−1∑n=0

UnT e

∥∥∥∥∥2

< ε

and conclude by taking ε→ 0.


5 Proof of pointwise ergodic theorem

ItLecture 5 is first useful to prove:

Theorem (Maximal Ergodic Theorem, Wiener). Let (X,B, µ, T ) be a measure preservingsystem. Let f ∈ L1, α ∈ R>0. Let

Eα =

{x ∈ X

∣∣∣∣∣ supN>0

1

N

N−1∑n=0

f(Tnx) > α

}.

Then µ(Eα) ≤ α−1‖f‖1.

First prove a useful proposition.

Proposition. Let (X,B, µ, T ) be a measure preserving system. Let f ∈ L1. Set

f0 = 0

f1 = f

f2 = f ◦ T + f...fn = f ◦ Tn−1 + · · ·+ f ◦ T + f

andFN = max

n=0,...,Nfn.

Then ∫EN

f dµ ≥ 0 ∀N.

where EN = {x ∈ X | FN (x) > 0 }.

Proof. Suppose that FN (x) > 0 for some x. Then FN (x) = fn(x) for some n ∈ {1, . . . , N}.Then

FN (x) = fn−1(Tx) + f(x) ≤ FN (Tx) + f(x)

=⇒ f(x) ≥ FN (x)− FN (Tx)∫EN

f(x) dµ ≥∫EN

(FN (x)− FN (Tx)) dµ

note if FN (x) ≮ 0, then FN (x)− FN (Tx) ≤ 0, so

≥∫X

FN (x)− FN (Tx) dµ = 0.

Proof of Maximal Ergodic Theorem. Define

Eα,M =

{x ∈ X

∣∣∣∣∣ maxN=1,...,M

1

N

N−1∑n=0

f(Tnx) > α

}

=

{x ∈ X

∣∣∣∣∣ maxN=1,...,M

N−1∑n=0

(f(Tnx)− α) > 0

}.


We apply the proposition for the function f − α. Then∫Eα,M

(f(x)− α) dµ ≥ 0.

Then ∫Eα,M

f(x) dµ ≥ αµ(Eα,M )

and∫Eα,M

f(x) dµ ≤ ‖f‖1. Note that Eα =⋃M Eα,M , and this is an increasing union.

Proof of Pointwise Ergodic Theorem. Fix ε > 0.Using the density of L2 ∩L1 in L1 (since simple functions are dense), we have ∃fε ∈ L2,

eε,1 ∈ L1 such that f = fε + eε1 , with ‖eε,1‖1 < ε.Next, recall the subspace I from the Mean Ergodic Theorem, and the result that L2(X) =

I ⊕ B. Thus we can say ∃gε ∈ L2, eε,2 ∈ L1 such that fε = PT fε + gε ◦ T − gε + eε,2 and‖eε,2‖1 ≤ ‖eε,2‖2 < ε (L1 norm is bounded by the L2 norm in probability spaces).

Finally, ∃hε ∈ L∞, eε,3 ∈ L1 such that gε = hε + eε,3 and ‖eε,3‖1 < ε since L∞ is densein L1, and gε is in L1.

Thus, f = PT fε + hε ◦ T − hε + eε, where eε ∈ L1 with ‖eε‖1 < 4ε.

1

N

N1∑n=0

f(Tnx) = PT fε(x) +1

N

(hε(T

Nx)− hε(x))

+1

N

N−1∑n=0

eε(Tnx).

Let

Eε,α =

{x ∈ X

∣∣∣∣∣ lim supN→∞

∣∣∣∣∣ 1

N

N−1∑n=0

f(Tnx)− PT fε(x)

∣∣∣∣∣ > α

}.

Applying the Maximal Ergodic Theorem for the function eε:

µ(Eε,α) ≤ α−1‖eε‖1 ≤4ε

α.

Let F be the set of points x such that 1N

∑N−1n=0 f(Tnx) does not converge at x. Then

F ⊂⋃Fα where

Fα =

{x ∈ X

∣∣∣∣∣ lim supN1,N2→∞

∣∣∣∣∣ 1

N1

N1−1∑n=0

f(Tnx)− 1

N2

N2−1∑n=0

f(Tnx)

∣∣∣∣∣ > 2α

}.

Notice Fα ⊂ Eε,α for all ε > 0.

µ(Fα) ≤ µ(Eε,α) ≤ 4ε

α.

Therefore µ(Fα) = 0.We can take a countable sequence of α’s and conclude µ(F ) = 0. We proved that

1N

∑N−1n=0 f(Tnx)→ f∗(x) for some function f∗.

By Fatou’s lemma, f∗ ∈ L1. It remains to prove f∗(x) = f∗(Tx) almost everywhere.


For almost every x,

f∗(x) = limN→∞

1

N

N−1∑n=0

f(Tnx)

f∗(Tx) = limN→∞

1

N

N−1∑n=0

f(Tn+1x)

= limN→∞

1

N

N∑n=1

f(Tnx)

= limN→∞

1

N − 1

N−1∑n=1

f(Tnx)

= limN→∞

1

N

N−1∑n=1

f(Tnx)

Then f∗(x)− f∗(Tx) = lim 1N f(x) = 0.


6 Unique ergodicity

Definition (Normal). ALecture 6 number x ∈ [0, 1) is called normal in base K, if for everyb1, b2, . . . , bM ∈ {0, . . . ,K}, we have

1

N|{n ∈ {0, . . . , N − 1} | xn+1 = b1, . . . , xn+M = bM }| →

1

KM

where x = 0.x1x2 · · ·(K) is the base K expansion.

Theorem. Almost every number (with respect to Lebesgue measure) is normal in everybase k ≥ 2.

Proof. Consider the measure preserving system (R/Z,B,m, TK) where TK(x) = K · x andm refers to Lebesgue measure. On the example sheet, we show this is an ergodic measurepreserving system. Now fix M and b1, . . . , bM as in the definition and consider the set

A =

[0.b1b2 . . . bM (K), 0.b1b2 . . . bM (K) +

1

KM

).

Note: Tnx ∈ A ⇐⇒ xn+1 = b1, . . . , xn+M = bM . To see that x is normal we need:

1

N

N−1∑n=0

χA(Tnx)→ 1

KM.

This holds by the Pointwise Ergodic Theorem for almost every x. Since there are countablymany choices for K,M, b1, . . . , bm, the theorem follows.

The Pointwise Ergodic Theorem as applied here shows that a property holds for almostevery number, but gives no information about any specific numbers. Unique ergodicityattempts to ask when does the Pointwise Ergodic Theorem hold for all points.

If there is more than one T -invariant measure for a given transformation T , then wecan apply the Pointwise Ergodic Theorem for both measures, which can give different limitsfor the ergodic averages. (This is not a contradiction, because the set where convergenceholds in one case may be null with respect to the other measure). However, this suggeststhat everywhere convergence of ergodic averages prohibits the existence of multiple invariantmeasures.

Definition (Topological dynamical system). A topological dynamical system is a tuple(X,T ), where X is a compact metric space and T : X → X is a continuous map.

Definition (Unique ergodicity). We say that a topological dynamical system (X,T ) isuniquely ergodic, if there is only one T -invariant Borel probability measure on X.

Theorem. Let (X,T ) be a topological dynamical system. The following are equivalent:

(1) (X,T ) is uniquely ergodic

(2) For every f ∈ C(X), there is cf ∈ C such that

1

N

N−1∑n=0

f(Tnx)→ cf

uniformly on X.


(3) There is a dense A ⊂ C(X) such that for each f ∈ A there is cf ∈ C such that

1

N

N−1∑n=0

f(Tnx)→ cf

for all x ∈ X, not necessarily uniformly.

Theorem (Riesz representation theorem). Let X be a compact metric space. Then to eachfinite Borel measure on X, we associate bounded linear functional on C(X) as follows:

Lµf =

∫f dµ.

Then µ 7→ Lµ is a bijection from the space of Borel measureM(X) on X to bounded linearfunctionals on C(X).

Corollary. Let µ1, µ2 be Borel measures on a compact metric space. Then µ1 = µ2 iff∫f dµ1 =

∫f dµ2 ∀f ∈ C(X).

Definition (Push-forward measure). Let (X,T ) be a topological dynamical system, let µbe a Borel measure. The push-forward of µ via T is the measure

T∗µ(A) = µ(T−1A) ∀A ∈ B.

Lemma. Let X,T, µ be as above. Then∫f dT∗µ =

∫f ◦ T dµ

for every bounded measurable f .

Proof sketch. First prove this for characteristic functions of sets. Let A ∈ B.∫χA dT∗µ = T∗µ(A)

= µ(T−1A)

=

∫χT−1A dµ

=

∫χA ◦ T dµ.

Using the standard argument, this can be extended to all bounded measurable f .

Lemma. µ is T -invariant iff∫f dµ =

∫f ◦ T dµ ∀f ∈ C(X). (∗)

Proof. We have already seen that µ being T -invariant ⇒ (∗). For the other direction:Suppose that (∗) holds. Then∫

f dT∗µ =

∫f ◦ T dµ =

∫f dµ ∀f ∈ C(X).

By the corollary, we have µ = T∗µ.


Theorem. Let (X,T ) be a topological dynamical system. Let νj be a sequence of Borelprobability measures on X. Let {Nj} ⊂ Z≥0 be a sequence such that Nj → ∞. Let µ bethe weak limit of a subsequence of

1

Nj

Nj−1∑n=0

Tn∗ νj .

Then µ is T -invariant.

Proof. Fix f ∈ C(X). Without loss of generality, assume

lim-w1

Nj

Nj−1∑n=0

Tn∗ νj = µ

(if not, pass to a subsequence). Now,

∫f ◦ T dµ = lim

j→∞

∫f ◦ T d

1

Nj

Nj−1∑n=0

Tn∗ νj

= limj→∞

1

Nj

Nj−1∑n=0

∫f ◦ T dTn∗ νj

= limj→∞

1

Nj

Nj−1∑n=0

∫f ◦ Tn+1 dνj

= limj→∞

1

Nj

Nj∑n=1

∫f ◦ Tn dνj

We can expand∫f dµ similarly:

∫f dµ = lim

j→∞

1

Nj

Nj−1∑n=0

∫f ◦ Tndνj

Then∣∣∣∣∫ f dµ−∫f ◦ T dµ

∣∣∣∣ ≤ lim supj→∞

‖f‖∞ + ‖f ◦ T‖∞Nj

= lim supj→∞

‖f‖∞ + ‖f‖∞Nj

→ 0.

Lecture 7 Now, return to prove the equivalent conditions for unique ergodicity stated earlier inthis section.

Proof. (1) ⇒ (2). Suppose that (2) fails with cf =∫f dµ, where µ is the unique invariant

measure. Then

∃ε > 0, ∃x1, x2, . . . X, ∃Nj ∈ Z such that∣∣∣∣∣∣ 1

Nj

Nj−1∑n=0

f(Tnxj)−∫f dµ

∣∣∣∣∣∣ > ε. (∗)


We may suppose that

1

Nj

Nj−1∑n=0

f(Tnxj)→ a

for some a ∈ C. Moreover, we can also assume that

1

Nj

Nj−1∑n=0

Tn∗ δxjweak−−−→ ν

for some probability measure ν. By the theorem earlier, ν is T -invariant. Also

∫f dν = lim

j→∞

1

Nj

Nj−1∑n=0

∫f dTn∗ δxj

= limj→∞

1

Nj

Nj−1∑n=0

f(Tnxj) = a.

By (∗), |a−∫f dµ| > ε, so

∫fdµ 6=

∫f dν, hence µ 6= ν, a contradiction.

(3) ⇒ (1). Let µ, ν be T -invariant probability measures. We will show that∫f dµ =∫

f dν ∀f ∈ A. Since A is dense, this also holds ∀f ∈ C(X). By the corollary to RieszRepresentation theorem, µ = ν. We know

1

N

N−1∑n=0

f(Tnx)→ cf ∀x ∈ X.

By dominated convergence, ∫1

N

N−1∑n=0

f(Tnx) dµ︸︷︷︸=∫f dµ ∀N

→ cf

Thus∫fdµ = cf . The same argument gives

∫fdν = cf .

Clearly (2) ⇒ (3), closing the loop.

Example. Let α ∈ R/Z be irrational. Then the circle rotation (R/Z, Rα) is uniquelyergodic. Indeed, let µ be an Rα-invariant measure. Then∫

exp(2πinx) dµ =

∫exp(2πinRα(x)) dµ

=

∫exp(2πin(x+ α)) dµ

= exp(2πinα) exp(2πinx) dµ.

Since α is irrational, exp(2πinx) 6= 1 if n 6= 0. Then∫exp(2πinx) dµ = 0 ∀u 6= 0

∫1 dµ = 1. (∗∗)


Let f be a trigonometric polynomial, that is a finite linear combination of the functionsexp(2πinx) when n ∈ Z. Then (∗∗) implies∫

f dµ =

∫f(x) dx

Now use the fact that trigonometric polynomials are dense in C(X): Stone-Weierstrasstheorem.

Definition (Equidistributed). A sequence x1, x2, . . . ∈ [0, 1) is said to be equidistributedif

1

N

N−1∑n=0

f(xn)→∫R/Z

f(x) dx ∀f ∈ C(R/Z).

Remark. Let 0 ≤ a < b < 1. Then x1, x2, . . . is equidistributed if and only if

1

N|{n ∈ [0, N − 1] | xn ∈ [a, b] }| → b− a.

Corollary. {nα+ x} is equidistributed for all α irrational ∀x ∈ [0, 1].

Proof. This follows from the unique ergodicity of the circle rotation.


7 Equidistribution of polynomials

Definition (Generic). Let (X,B, µ, T ) be a measure preserving system. Suppose that X isa compact metric space, and T : X → X is continuous. Then x ∈ X is called generic withrespect to µ if the following holds

1

N

N−1∑n=0

Tn∗ δxweak−−−→ µ.

Equivalently,

1

N

N−1∑n=0

f(Tnx)→∫f dµ ∀f ∈ C(X). (†)

Lemma. µ-almost every x ∈ X is µ-generic.

Proof. By the Pointwise Ergodic Theorem, ∀f ∈ C(X), ∃Xf with µ(Xf ) = 1 such that (†)holds. Observe that every point in

⋂f∈AXf , where A ⊂ C(X) is dense and countable is

µ-generic.

Theorem (Furstenberg’s skew-product theorem). Let (X,T )Lecture 8 be a uniquely ergodic topolog-ical dynamical system. Denote by µ the invariant measure. Write S : X ×R/Z→ X ×R/Zdefined by

S(x, y) = (Tx, y + c(x))

where c : X → R/Z is a fixed continuous function. Then µ ×m, where m is the Lebesguemeasure on R/Z is S-invariant. In addition, if µ × m is S-ergodic, then (X × R/Z, S) isuniquely ergodic.

This result holds with any compact metrizable topological group in place of R/Z, calledthe skew-product construction.

Proof. Let f ∈ C(X × R/Z).∫∫f(S(x, y)) dµ(x) dy =

∫X

∫ 1

0

f(Tx, y + c(x)) dy dµ(x)

=

∫X

∫ 1−c(x)

−c(x)

f(Tx, y) dy dµ(x)

=

∫R/Z

∫X

f(Tx, y) dµ(x) dy

=

∫∫f(x, y) dµ(x) dy.

So µ ×m is indeed S-invariant. Now we assume that µ ×m is S-ergodic. We show that(X × R/Z, S) is uniquely ergodic. Let E be the set of µ×m-generic points. We have seenthat ergodicity implies µ×m(E) = 1.

Claim: If (x, y) ∈ E, then (x, t+ y) ∈ E for all t ∈ R/Z.Proof: Observe S ◦ Ut = Ut ◦ S, where Ut(x, y) = (x, t+ y). Indeed:

S ◦ Ut(x, y) = S(x, t+ y) = (Tx, t+ y + c(x))

= Ut(Tx, y + c(x)) = Ut ◦ S(x, y).


Let f ∈ C(X × R/Z). Write

1

N

N−1∑n=0

f(Sn(x, t+ y)) =1

N

N−1∑n=0

f(SnUt(x, y))

=1

N

N−1∑n=0

f(UtSn(x, y))

using that (x, y) ∈ E

→∫∫

f ◦ Ut dmdµ

=

∫∫f dmdµ

So (x, t+ y) is indeed generic, i.e. (x, t+ y) ∈ E. �This means E = A×R/Z for some A ⊂ X Borel. Note µ(A) = µ×m(E) = 1. Let ν be

an S-invariant measure on X × R/Z.Next, aim to show ν(E) = 1. Write P for the projection X × R/Z → X. We show

P∗ν = µ. It is enough to show that P∗ν is T -invariant. Let B ⊂ X Borel.

P∗ν(T−1(B)) = ν(T−1(B)× R/Z) = ν(S−1(B × R/Z)) = ν(B × R/Z) = P∗ν(B)

So indeed P∗ν = µ. Then P∗ν(A) = µ(A) = 1, and ν(E) = ν(A× R/Z) = 1.Finally, we show that∫

f dν =

∫∫f dµ dm ∀f ∈ C(X × R/Z),

and this proves ν = µ×m. If (x, y) ∈ E, then

1

N

N−1∑n=0

f(Sn(x, y))→∫∫

f dµdm. (∗)

Since ν(E) = 1, (∗) holds ν-almost everywhere. By dominated convergence,∫1

N

N−1∑n=0

f(Sn(x, y)) dν︸︷︷︸=∫f dν ∀N

→∫∫

f dµ dm

So∫f dν =

∫∫f dµ dm indeed.

Corollary. Let S : (R/Z)d → (R/Z)d given by

S(x1, x2, . . . , xd) = (x1 + α, x2 + x1, . . . , xd + xd−1)

where α ∈ R/Z is a fixed irrational number. Then ((R/Z)d, S) is uniquely ergodic.

Proof. By induction on d. d = 1 case is the circle rotation that has already been discussed.Suppose d ≥ 2, and the claim holds for d − 1. By Furstenberg’s theorem, it is enough toshow that ((R/Z)d,B,md, S) is ergodic (where m is Lebesgue measure).


Let f be a bounded measurable function on (R/Z)d. Then

f(x) =∑n∈Zd

an exp(2πin · x)

f(Sx) =∑n∈Zd

an exp(2πi(n1(x1 + α) + n2(x2 + x1) + · · ·+ nd(xd + xd−1)))

=∑n∈Zd

exp (2πin1α) an exp(

2πi((n1 + n2)x1 + (n2 + n3)x2 + · · ·

+ (nd−1 + nd)xd−1 + ndxd))

=∑n∈Zd

e2πin1αane2πiS(n)·x,

whereS(n) = (n1 + n2, n2 + n3, . . . , nd−1 + nd, nd).

Suppose f = f ◦ S almost everywhere. Then

aS(n) = exp(2πiαn1)an,

in particular |aS(n)| = |an|.Suppose that am 6= 0 for some m ∈ Zd. We aim to show m = 0, which implies that f

is constant. By Parseval’s formula,∑n∈Z|an|2 = ‖f‖22 <∞.

This means that there are at most finite n’s such that |am| = |an|. In particular, the orbitm, S(m), S2(m), . . . must be periodic. Note(

Sj(m))d−1

= md−1 + jmd.

Thus md = 0. A similar argument gives mj = 0 ∀j = 2, 3, . . . , d. Then Sj(m) = m. Wehave

am = exp(2πiαm1)am.

Since am 6= 0, we must haveexp(2πiαm1) = 1.

As α is irrational, m1 = 0.

Theorem (Weyl). LetLecture 9 P (x) = adxd+· · ·+a1x+a0 be a polynomial such that aj is irrational

for at least one j 6= 0. Then the sequence ({P (n)})n≥Z>0is equidistributed in [0, 1).

Proof. First consider the case where ad is irrational. Recall that the system ((R/Z)d, S)is uniquely ergodic, where S(x1, . . . , xd) = (x1 + α, x2 + x1, . . . , xd + xd−1) and α ∈ R/Zirrational. Note

Sn

x1

x2

...xd

=

(n0

)x1 +

(n1

)α(

n0

)x2 +

(n1

)x1 +

(n2

)α

...(n0

)xd +

(n1

)xd−1 + · · ·+

(nd−1

)x1 +

(nd

)α


This can be proved by induction on n. Consider the polynomials

qj =t(t− 1) · · · (t− j + 1)

j!.

The polynomials q0, q1, . . . , qd form a basis in the vector space of polynomials of degree ≤ d.In particular there are x1, x2, . . . , xd, α ∈ R such that

p(t) = αqd(t) + x1qd−1(t) + . . .+ xdq0(t), (∗)

and α = αd · d! is irrational. Then there are α, x1, . . . , xd ∈ R/Z such that

p(n) = α

(n

d

)+ x1

(n

d− 1

)+ · · ·+ xd

(n

0

)(mod Z)

for all n ∈ Z. Let f ∈ C(R/Z) and let g(x1, . . . , xd) = f(xd) ∈ C((R/Z)d).Now we have

1

N

N−1∑n=0

f(p(n)) =1

N

N−1∑n=0

g(Sn(x1, . . . , xd)) (∗∗)

where x1, . . . , xd are as in (∗) and α in the definition of S is also coming from (∗).By unique ergodicity,

(∗∗) =⇒∫· · ·∫g(t1, . . . , td) dt1 . . . dtd =

∫f(t) dt.

This proves equidistribution.General case: Let j be maximal such that aj is irrational. Let q ∈ Z>0 be such that

qad, qad−1 . . . , qaj+1 ∈ Z. Fix b ∈ {0, 1, . . . , q − 1}. Note:

p(qn+ b) = adbd + · · ·+ aj+1b

j+1 + aj(qn+ b)j + · · ·+ a1(qn+ j) + a0 (mod Z)

This is a polynomial in n with irrational leading coefficient. By the special case, {P (qn+b)}equidistributes for each fixed b.


8 Mixing properties

Definition (Mixing). An measure preserving system (X,B, µ, T ) is called mixing if

∀A,B ∈ B, ∀ε > 0, ∃N such that

|µ(A ∩ T−nB)− µ(A)µ(B)| < ε ∀n > N

Definition (Mixing on k sets). A measure-preserving system (X,B, µ, T ) is called mix-ing!on k setsmixing on k sets if ∀A0, A1, . . . , Ak−1 ∈ B and ∀ε > 0, ∃N such that

|µ(A0 ∩ T−n1A1 ∩ · · · ∩ T−nk−1Ak−1)− µ(A0)µ(A1) · · ·µ(Ak−1)| < ε

∀n1, . . . , nk−1 if n1 > N,n2 − n1 > N, . . . , nk−1 − nk−2 > N .

Open problem: Is there a measure preserving system that is mixing on 2 sets but not on3 sets?

Definition (Density). A subset S ⊂ Z>0 has full density if

|S ∩ [1, N ]|N

→ 1 as N →∞.

We say that a sequence of complex numbers an converges in density to a ∈ C if:{n | |an − a| < ε } has full density ∀ε > 0. In notation, write D-lim an = a.

Definition (Cesaro convergence). We say that an Cesaro-converges to a if

1

N

N∑n=1

an → a as N →∞.

Denote this C-lim an = a.

Definition. A measure preserving system (X,B, µ, T ) is weak mixing if ∀A,B ∈ B, wehave

D-limn→∞

µ(A ∩ T−nB) = µ(A)µ(B).

Lemma. LetLecture 10 (an) ⊂ R be a bounded sequence, let a ∈ R. Then the following are equivalent.

(1) D-limn→∞ an = a.

(2) C-limn→∞ |an − a| = 0.

(3) C-limn→∞(an − a)2 = 0.

(4) C-limn→∞ an = a and C-lim a2n = a2.

Proof.

• (1) ⇒ (2). Fix ε > 0. Let M = sup |an|. Suppose that N is large enough such that

1

N|{n ∈ [1, N ] | |an − a| > ε }| < ε.


We estimate

1

N

N∑n=1

|an − a| ≤1

N(ε ·N + 2M · εN)

= ε(1 + 2M).

M is constant, ε is arbitrary, so

1

N

N∑n=1

|an − a| → 0 as N →∞.

• (2) ⇒ (1). Fix ε > 0.

ε |{n ∈ [1, N ] | |an − a| > ε }| ≤N∑n=1

|an − a|

By (2), we get εN |{n ∈ [1, N ] | |an − a| > ε }| → 0. This proves (1).

• The argument (1) ⇔ (3) is the same.

• (1),(2) ⇒ (4). ∣∣∣∣∣ 1

N

N∑n=1

(an − a)

∣∣∣∣∣ ≤ 1

N

N∑n=1

|an − a| → 0

by (2). This implies C-lim an = a.

By the definition of D-lim:

D-lim an = a =⇒ D-lim a2n = a2

So using (1) and the implication (1) ⇒ (3), C-lim |a2n − a2| = 0 hence C-lim a2

n = a2.

• (4) ⇒ (3).

1

N

N∑n=1

(an − a)2 =1

N

N∑n=1

a2n +

1

N

N∑n=1

a2 − 21

N

N∑n=1

aan

→ a2 + a2 − 2a · a = 0.

Theorem. Let (X,B, µ, T ) be a measure preserving system. The following are equivalent

(1) (X,B, µ, T ) is weak mixing.

(2) (X × Y,B × C, µ × ν, T × S) is ergodic for any ergodic measure preserving system(Y, C, ν, S).

(3) (X ×X,B × B, µ× µ, T × T ) is ergodic.

(4) (X ×X,B × B, µ× µ, T × T ) is weak mixing.

(5) UT has no non-constant eigenfunctions, i.e. if f : X → C is measurable and λ ∈ Csuch that f ◦ T = λf almost everywhere, then f is constant a.e.


We will come back to prove this in a moment.

Lemma. Let (X,B, µ, T ) be a measure preserving system. Let S ⊂ B be a semi-algebra(π-system) that generates B. Then (X,B, µ, T ) is weak mixing iff

D-limn→∞

µ(T−nA ∩B) = µ(A)µ(B) ∀A,B ∈ S

and (X,B, µ, T ) is ergodic iff

C-limn→∞

µ(T−nA ∩B) = µ(A)µ(B).

Proof. See example sheet 2, question 2.

Proof. (1) ⇒ (2). Let S be the set of measurable rectangles, i.e. sets of the form B × C,where B ∈ B, C ∈ C. We write for B1 × C1, B2 × C2 ∈ S:∣∣∣∣∣ 1

N

N∑n=1

[µ× ν((T × S)−n(B1 × C1) ∩B2 × C2)− µ× ν(B1 × C1)µ× ν(B2 × C2)]

∣∣∣∣∣=

∣∣∣∣∣ 1

N

N∑n=1

[µ(T−nB1 ∩B2)ν(S−nC1 ∩ C2)− µ(B1)µ(B2)ν(C1)ν(C2)]

∣∣∣∣∣≤ 1

N

∣∣∣∣∣N∑n=1

µ(T−nB1 ∩B2)ν(S−nC1 ∩ C2)− µ(B1)µ(B2)ν(S−nC1 ∩ C2)

∣∣∣∣∣+

1

N

∣∣∣∣∣N∑n=1

[µ(B1)µ(B2)ν(S−nC1 ∩ C2)− µ(B1)µ(B2)ν(C1)ν(C2)

]∣∣∣∣∣≤ 1

N

N∑n=1

|µ(T−nB1 ∩B2)− µ(B1)µ(B2)|

+1

Nµ(B1)µ(B2)

∣∣∣∣∣N∑n=1

[ν(S−nC1 ∩ C2)− ν(C1)ν(C2)

]∣∣∣∣∣→ 0.

(2) ⇒ (3). (3) is a special case of (2) if we show that (X,B, µ, T ) is ergodic. That canbe seen by applying (2) with |Y | = 1, S = idY .

(3) ⇒ (1). Let A,B ∈ B. We have

C-limµ× µ((T × T )−n(A×X) ∩ (B ×X)) = µ× µ(A×X)µ× µ(B ×X)

C-limµ(T−nA ∩B)µ(T−nX ∩X) = µ(A)µ(B)µ(X)2

C-limµ(T−nA ∩B) = µ(A)µ(B)

Same argument with A×A and B ×B in place of A×X and B ×X gives

C-limµ(T−nA ∩B)2 = (µ(A)µ(B))2.

ThenD-limµ(T−nA ∩B) = µ(A)µ(B).

This proves (1).


(1) ⇔ (4) is the same arguments.Lecture 11 (3)⇒ (5). Suppose that f ◦T = λ ·f µ−a.e. Consider the function f(x, y) = f(x) ·f(y).

We will show that f is T × T -invariant. If f is not constant, then f is also not constant, so(X ×X, . . . ) is not ergodic.

f(Tx, Ty) = f(Tx) · f(Ty) = λf(x) · λf(y) = |λ|2f(x, y).

Since UT is an isometry,

〈f, f〉 = 〈UT f, UT f〉 = 〈λf, λf〉 = |λ|2〈f, f〉.

So |λ| = 1, and f ◦ T × T = f indeed.Showing condition (5) implies weak mixing is more difficult, and is non-examinable. Two

approaches are given in questions 7 and 8 of example sheet 2.

Theorem. Let (X,B, µ, T ) be a weak mixing measure preserving system and let k ∈ Z>0.Let f1, . . . , fk ∈ L∞. Then

1

N

N∑n=1

UnT f1 · U2nT f2 · · ·UknT fk︸︷︷︸

gN

L2

−→∫f1 dµ · · ·

∫fk dµ︸︷︷︸

γ

.

Corollary. Let notation be as above, and let f0 ∈ L∞. Then

1

N

N∑n=1

f0UnT f1 · U2n

T f2 · · ·UknT fk −→∫f0 dµ

∫f1 dµ · · ·

∫fk dµ.

In particular, let f0 = f1 = · · · = fk = χA for some A ∈ B. Then

1

N

N∑n=1

µ(A ∩ T−nA ∩ · · · ∩ T−nkA)→ µ(A)k+1,

which gives Furstenberg’s multiple recurrence for weak mixing systems.

Note the theorem says gn → γ in L2, and the corollary says 〈f0, gn〉 → 〈f0, γ〉, so iscertainly weaker.

Lemma (van der Corput). Let u1, u2, . . . be a bounded sequence in a Hilbert space H. Foreach h ∈ Z≥0, write

sh = lim supN→∞

∣∣∣∣∣ 1

N

N∑n=1

〈un, un+h〉

∣∣∣∣∣ .Suppose that D-limn→∞ sh = 0. Then∥∥∥∥∥ 1

N

N∑n=1

un

∥∥∥∥∥→ 0.


Proof idea. ∥∥∥∥∥ 1

N

N∑n=1

un

∥∥∥∥∥2

=1

N2

N∑n1=1

N∑n2=1

〈un1 , un2〉

=1

N2

(N∑n=1

‖un‖2 +

N−1∑n=1

N−h∑n=1

2 Re(〈un, un+h〉)

).

Proof. Fix ε > 0. Let H be such that

1

H

H−1∑h=0

sh < ε.

Write ∥∥∥∥∥ 1

N

N∑n=1

un −1

NH

N∑n=1

H∑h=1

un+h

∥∥∥∥∥ ≤ 1

N

[H∑n=1

‖un‖+

N+H∑n=N+1

‖un‖

]< ε

if N is large enough. unfinished

Lemma. Let (X,B, µ, T ) be a weak mixing measure preserving system. Let f, g ∈ L2.Then

D-limn→∞

〈UnT f, g〉 =

∫f dµ ·

∫g dµ.

Lemma. If (X,B, µ, T ) is weak mixing then so is (X,B, µ, T k), the k-fold iteration.

Both these lemmas are proved on the example sheet.

Proof of theorem. InductLecture 12 on k. If k = 1, then this reduces to the Mean Ergodic Theorem.Suppose that k > 1, and the theorem and corollary hold for k− 1. We first consider the

special case∫fk dµ = 0, and apply van der Corput’s lemma for

un = UnT f1 · U2nT f2 · · ·UknT fk ∈ L2(X).

It remains to check the conditions of van der Corput, so we compute sk.

〈un, un+h〉 =

∫UnT f1 · · ·UknT fk · Un+h

T f1 · · ·Uk(n+h)T fk dµ

=

∫f1 · UhT f1 · UnT (f2U

2hT f2) · · ·U (k−1)n

T (fkUkhT fk) dµ.

Apply the corollary for k − 1 and fjUjkT fj in the role of fj−1. Then we get:

limN→∞

1

N

N∑n=1

〈un, un+h〉 =

∫f1U

hT f1 dµ · · ·

∫fkU

khT fk dµ

lim supN→∞

1

N

∣∣∣∣∣N∑k=1

〈uk, un+k〉

∣∣∣∣∣ ≤ ‖f1‖2∞ · · · ‖fk−1‖2∞ ·∣∣∣∣∫ fkU

khT fk dµ

∣∣∣∣Ongoing course, rough 29 Updated online

By an earlier lemma, (X,β, µ, T k) is weak mixing. By the other lemma, we then have:

D-lim

∫fkU

khT fk dµ =

∫fkfk dµ = 0.

Therefore D-lim sk = 0. Then van der Corput’s lemma applies.This completes the proof of the special case. General case: We can write fk =

∫fk dµ+

f ′k, where∫f ′k dµ = 0. Then

1

N

N−1∑n=0

UnT f1U2nT f2 · · ·UknT fk

=1

N

N−1∑n=0

UnT f1U2nT f2 · · ·UknT f ′k︸︷︷︸→0

+

∫fk dµ ·

1

N

N−1∑k=0

UnT f1 · · ·U (k−1)nT fk−1︸︷︷︸

→∫f1 dµ···

∫fk−1 dµ

.

8.1 Cutting and stacking

For each n ∈ N, let I(n)1 , . . . , I

(n)N(n) be a collection of subintervals I

(n)j = [anj , b

nj ) ⊂ [0, 1).

Suppose they satisfy the following properties.

(1) For each fixed n, I(n)j are pairwise disjoint and they have the same length.

(2) length(I(n)1 )→ 0 as n→∞ and

µ

[0, 1) \N(n)⋃j=1

I(n)j

→ 0 as n→∞.

(3)N(n)⋃j=1

I(n)j ⊂

N(n+1)⋃j=1

I(n+1)j .

(4) If I(n+1)j′ ∩ I(n)

j 6= ∅ for some n, j, j′ then I(n+1)j′ ⊂ I(n)

j and if j 6= N(n) then a(n+1)j′+1 −

a(n+1)j′ = a

(n)j+1 − a

(n)j .

Lemma. Given I(n)j satisfying the above conditions, there is a unique (up to a set measure

0) transformation T : [0, 1)→ [0, 1) such that T maps I(n)j onto I

(n)j+1 by a translation.That

is,

T |I(n)j

: x 7→ x+ a(n)j+1 − a

(n)j .

This holds for all n and for all j = 1, . . . , N(n)−1. Moreover T preserves Lebesgue measureon [0, 1).

For almost every point x, eventually x ∈ I(n)j with j 6= N(n), then we know I

(n)j → I

(n)j+1 is

a translation, giving Tx.


Definition (Chacon map). Let I(1)1 = [0, 2

3 ), N(1) = 1. Suppose that I(n)j are defined for

some n. Cut I(n)j into 3 equal pieces. Define

• I(n+1)j : the piece on the left hand side

• I(n+1)j+N(n): the piece in the middle

• I(n+1)j+2N(n)+1: the piece on the right hand side.

• N(n+ 1) = 3N(n) + 1.

• Finally, define I(n+1)2N(n)+1: cut off an interval of the same length as the other I

(n+1)j ’s

from the remaining part of [0, 1) \⋃N(n)j=1 I

(n)j , and let this be I

(n+1)2N(n)+1.

Lemma. ConsiderLecture 13 ([0, 1),B,m, T ) where m is the Lebesgue measure and T is the Chaconmap. Then ∀n, j:

m(I

(n)j ∩ TN(n)I

(n)j

)≥ 1

3m(I

(n)j

)m(I

(n)j ∩ TN(n)+1I

(n)j

)≥ 1

3m(I

(n)j

)

Proof. Note

I(n)j = I

(n+1)j ∪ I(n+1)

j+N(n) ∪ I(n+1)j+2N(n)+1

and TN(n)I(n+1)j = I

(n+1)j+N(n) and TN(n)+1I

(n+1)j+N(n) = I

(n+1)j+2N(n)+1. Thus I

(n)j ∩ TN(n)I

(n)j ⊇

I(n+1)j+N(n) and I

(n)j ∩ TN(n)+1I

(n)j ⊇ I(n+1)

j+2N(n)+1.

Theorem. ([0, 1),B,m, T ) is not mixing.

Proof. Consider A = I(2)1 (recall I

(1)1 = [0, 2

3 ) ), m(A) = 29 . For n ≥ 2, A is the union of

some intervals of the form I(n)j . Apply the lemma to each of these.

m(A ∩ TN(n)A) ≥ 1

3m(A) =

3

2m(A)2

This contradicts the definition of mixing.

Theorem. ([0, 1),B,m, T ) is weak mixing.

Proof. Let f ∈ L2 such that λ ◦ T = λf a.e. We aim to show that f is constant a.e. Notethat |λ| = 1. Fix ε > 0 sufficiently small: ε < 1

12 , also

m(x ∈ [0, 1) : |f(x)| > 2) > 10ε.

This might require replacing f by a suitable constant multiple.By Luzin’s theorem, there is h : [0, 1]→ C continuous such that

m(x ∈ [0, 1) : f(x) 6= h(x)) < ε.


Then if n is sufficiently large, then |h(x1)−h(x2)| < ε whenever |x1−x2| < 23n , in particular

this holds when x1, x2 ∈ I(n)j for some j.

The intervals I(n)j fill at least half of [0, 1). Therefore ∃j such that m(x ∈ I(n)

j |f(x) 6=h(x)) < 2εm(I

(n)j ). Let z = f(x) for some x ∈ I(n)

j such that f(x) = h(x).

m(x ∈ Ij : |f(x)− λN(n)z| < ε︸︷︷︸B

) ≥ m(y ∈ I(n)j : TN(n)y ∈ I(n)

j andf(y) = h(y)︸︷︷︸A

)

≥ m(I(n)j ∩ TN(n)I

(n)j )−m(y ∈ I(n)

j : f(y) 6= h(y))

≥ 1

3m(I

(n)j )− 2εm(I

(n)j ).

(If y ∈ A, then x = TN(n)y ∈ I(n)j and f(TN(n)y) = λN(n)f(y) = λN(n)h(y), and the LHS

is exactly f(x). Since |h(y)− z| < ε, |f(x)− λN(n)z| < ε. So x ∈ B.)We will show in a minute that |z| ≥ 1. Then |1−λN(n)| < 2ε. Same argument using the

second claim in the lemma gives

|1− λN(n+1)| < 2ε.

Hence |λN(n) − λN(n)+1| < 4ε, and |1− λ| < 4ε. Taking ε↘ 0, we get λ = 1.Lecture 14 Recall:

m(x ∈ I(n)j : ||f(x)| − |z|| > ε) < 2εm(I

(n)j ).

Using T i−jI(n)j = I

(n)i , and

|f(T i−jx)| = |f(x)| (∗)we get

m(x ∈ I(n)j : |f(x)− λi−jz| > ε) < 2εm(I

(n)i )

Summing for i = 1, . . . , N(n), we get

m(x ∈ [0, 1)) ≥ (1− 2ε)

N(n)∑i=1

m(I(n)i )

= (1− 2ε)(1− 1

3n).

If n is sufficiently large, then

m(x ∈ [0, 1) : |f(x)| ≤ |z|+ ε) ≥ 1− 10ε.

comparing this with our assumption on ε at the beginning of the proof, we get |z| ≥ 1, soλ = 1.

Thus fTi−jx)=f(x). The same argument using this instead of (∗) gives:

m(x ∈ [0, 1)|f(x)− zn,ε ≤ ε) ≥ (1− 2ε)(1− 1

3n) (∗∗)

Choose sequences εm → 0, nm → ∞ such that znm,em converges to a complex number z.(Possible by Bolzano-Weierstrass). If m is sufficiently large depending on δ > 0, then

m(x ∈ [0, 1) : |f(x)− z| < δ) > 1− δ

if we use (∗∗) for nm and εm.


9 Entropy

Definition (Bernoulli shift). Let (p1, p2, . . . , pd) be a probability vector. The (p1, . . . , pd)-Bernoulli shift is the measure preserving system

({1, . . . , d}Z,B, µp1,...,pd , σ)

where σ is the shift map and µp1,...,pd is the product measure with (p1, . . . , pd) on thecoordinates.

This is just a special case of the Markov shift.

Definition (Isomorphic). Measure preserving systems (X1,B1, µ1, T1) and (X2,B2, µ2, T2)are called isomorphic if there are maps

S1 : X1 −→ X2

S2 : X2 −→ X1

such that S1 ◦ S2 = idX2and S2 ◦ S1 = idX1

, the pushforwards S1∗µ1 = µ2, S2∗µ2 = µ1,and T1 ◦ S2 = S1 ◦ T2 almost everywhere.

Question: Are the (12 ,

12 ) and ( 1

3 ,13 ,

13 ) Bernoulli shifts isomorphic? This question was

open for a long time, but was solved by Kolmogorov when he defined the notion of entropy.

• These two shifts have ‘the same’ Koopman operators, so attempts using operatortheory are unlikely to succeed.

• After Kolmogorov’s proof, Meshalkin proved ( 14 ,

14 ,

14 ,

14 ) and ( 1

2 ,18 ,

18 ,

18 ,

18 ) Bernoulli

shifts are isomorphic. At first glance, these two shifts do not appear more differentthan the earlier two shifts, but these have the same entropy.

• Much later, Ornstein proved two systems have the same entropy if and only if theyare isomorphic.

Roughly, the entropy measures how difficult it is to predict the next element of the orbitif you know a long segment.

Definition (Partition). Let (X,B, µ) be a probability space. A countable measurable par-tition is a collection of measurable sets A1, A2, . . . such that Ai ∩Aj = ∅ for all i 6= j and⋃Ai = X. The sets Ai are called the atoms of the partition. The join or coarsest common

refinement of two countable measurable partitions ξ, η is

ξ ∨ η = {A ∩B | A ∈ ξ,B ∈ η } .

Definition (Entropy). Define

H(p1, . . . , pd) = −p1 log p1 − · · · − pd log pd

for all probability vectors (p1, . . . , pd) with the convention 0 log 0 = 0 (and extended tocountable vectors also). The entropy of a countable measurable partition ξ is

Hµ(ξ) = H(µ(A1), µ(A2), . . . )


where ξ = {A1, A2, . . . }The conditional entropy of ξ = {A1, A2, . . . } relative to η = {B1, B2, . . . } is

Hµ(ξ | η) =

∞∑n=1

µ(Bn) ·H(µ(A1 ∩Bn)

µ(Bn),µ(A2 ∩Bn)

µ(Bn), . . .

).

Lemma.

(1) Hµ(ξ) ≥ 0.

(2) The value of Hµ(ξ) is maximal among partitions ξ with k atoms if all atoms have thesame measure 1

k .

(3)Hµ({A1, A2, . . . , Ak}) = Hµ(Aρ1 , . . . , Aρk)

for all permutations ρ : {1, . . . , k} → {1, . . . , k}.

(4) Hµ(ξ ∨ η) = Hµ(ξ) +Hµ(η | ξ), sometimes called the chain rule.

Definition (Convex). ALecture 15 function [a, b] → R ∪ {∞} is convex if ∀x ∈ (a, b),∃αx ∈ R suchthat

f(y) ≥ f(x) + αx(y − x) ∀y ∈ (a, b),

f is strictly convex if equality occurs only for x = y.

Remark. If f is C2((a, b)) and C([a, b]) and f ′′(x) > 0 for all x ∈ (a, b) then f is strictlyconvex.

Lemma (Jensen’s inequality). Let f : [a, b]→ R∪{∞} be a convex function. Let p1, p2, . . .be a probability vector (possibly infinite). Let x1, x2, . . . ∈ [a, b]. Then

f(p1x1 + p2x2 + · · · ) ≤∑i

pif(xi).

If f is convex, then equality occurs iff those xi for which pi > 0 coincide.

Proof. The proof of Jensen’s inequality is not part of this course, but is still worth reading.

Proof of lemma. (1) is immediate by definition. For (2), let (X,B, µ) be a probability space,let ξ be a measurable partition with k atoms. Apply Jensen’s inequality to the functionx 7→ x log x with weights pi = 1

k and the points µ(Ai) where Ai are the atoms of ξ. Note∑piµ(Ai) =

1

k

∑µ(Ai) =

1

k1

klog

1

k≤∑ 1

kµ(Ai) logµ(Ai)

log k ≥∑−µ(Ai) logµ(Ai) = Hµ(ξ).

(3) is immediate from definition, and we will prove (4) in greater generality below.


Definition (Information function). Let (X,B, µ) be a probability space. Let ξ be a count-able measurable partition. The information function of ξ is

Iµ(ξ) : X −→ R ∪ {∞}x 7−→ − logµ([x]ξ),

where [x]ξ is the tom of ξ where ξ belongs.If η is another partition, then the conditional information fn of ξ relative to η is

Iµ(ξ | η)(x) = − logµ([x]ξ∨η)

µ([x]η).

Lemma. With notation as above,

Hµ(ξ) =

∫Iµ(ξ) dµ

Hµ(ξ | η) =

∫Iµ(ξ | η) dµ

Proof. The first is a special case of the second, where η = {X}, so we only prove the second.∫Iµ(ξ | η) dµ =

∑A∈ξ,B∈η

∫A∩B

Iµ(ξ | η) dµ

=∑

A∈ξ,B∈η

µ(A ∩B) logµ(A ∩B)

µ(B)

=∑B∈η

µ(B) ·∑A∈ξ

−µ(A ∩B)

µ(B)log

µ(A ∩B)

µ(B).

Lemma (Chain rule). Let (X,B, µ) be a probability space and let ξ, η, λ be countablemeasurable partitions. Then

Iµ(ξ ∨ η | λ)(x) = Iµ(ξ | λ)(x) + Iµ(η | ξ ∨ λ)(x) ∀xHµ(ξ ∨ η | λ)(x) = Hµ(ξ | λ)(x) +Hµ(η | ξ ∨ λ)(x) ∀x

Proof. Note the second immediately follows from the first by the previous lemma.

Iµ(ξ ∨ η | λ)(x) = − logµ([x]ξ∨η∨λ)

µ([x]λ)

Iµ(ξ | λ)(x) = − logµ([x]ξ∨λ)

µ([x]λ)

Iµ(η | ξ ∨ λ)(x) = − logµ([x]ξ∨η∨λ)

µ([x]ξ∨λ).

Lemma. Let notation be as above. Then

Hµ(ξ | η) ≥ Hµ(ξ | η ∨ λ).


Proof.

Hµ(ξ | η ∨ λ) =∑

A∈ξ,B∈η,C∈λ

µ(A ∩B ∩ C) logµ(B ∩ C)

µ(A ∩B ∩ C)

Hµ(ξ | η) =∑

A∈ξ,B∈η

µ(A ∩B) logµ(B)

µ(A ∩B).

It is enough to show that for all fixed A ∈ ξ, B ∈ η, we have:

µ(A ∩B) logµ(B)

µ(A ∩B)≥∑C∈λ

µ(A ∩B ∩ C) logµ(B ∩ C)

µ(A ∩B ∩ C).

Apply Jensen’s inequality for x 7→ x log x at the points µ(A∩B∩C)µ(B∩C) for C ∈ λ with weights

µ(B∩C)µ(B) . Write

∑C∈λ

µ(B ∩ C)

µ(B)· µ(A ∩B ∩ C)

µ(B ∩ C)=

1

µ(B)

∑C∈λ

µ(A ∩B ∩ C) =µ(A ∩B)

µ(B)

thusµ(A ∩B)

µ(B)· log

µ(A ∩B)

µ(B)≤∑C∈λ

µ(B ∩ C)

µ(B)· µ(A ∩B ∩ C)

µ(B ∩ C)· log

µ(A ∩B ∩ C)

µ(B ∩ C)

µ(A ∩B) · logµ(A ∩B)

µ(B)≤∑C∈λ

µ(A ∩B ∩ C) · logµ(A ∩B ∩ C)

µ(B ∩ C).

Corollary.Hµ(ξ) ≤ Hµ(ξ ∨ η) ≤ Hµ(ξ) +Hµ(η).

Proof. Using the chain rule:

Hµ(ξ ∨ η) = Hµ(ξ) +Hµ(η | ξ).

Lemma. LetLecture 16 (X,B, µ, T ) be a measure preserving system. Let ξ, η be countable measurablepartitions. Then

Iµ(T−1ξ | T−1η)(x) = Iµ(ξ | η)(Tx)

Hµ(T−1ξ | T−1η) = Hµ(ξ | η)

where T−1ξ is the partition whose atoms are T−1([x]ξ).

Proof.

Iµ(T−1ξ | T−1η)(x) = − logµ([x]T−1ξ∨T−1η)

µ([x]T−1η)

Note

T−1ξ ∨ T−1η = T−1(ξ ∨ η)

[x]T−1ξ∨T−1η = T−1[Tx]ξ∨η.


µ([x]T−1ξ∨T−1η) = µ([Tx]ξ∨η) by the measure preserving property. Similarly,

µ([x]T−1η) = µ([Tx]η).

Then

Iµ(T−1ξ | T−1η)(x) = − logµ([Tx]ξ∨η)

µ([Tx]η)

= Iµ(ξ | η)(Tx).

Corollary. Writing ξnm = T−mξ ∨ T−(m+1)ξ ∨ · · · ∨ T−nξ,

Hµ(ξn+m−10 ) ≤ Hµ(ξn−1

0 ) +Hµ(ξm−10 ).

Proof. Note that ξn+m−10 = ξn−1

0 ∨ ξn+m−1n . So we have

Hµ(ξn+m−10 ) ≤ Hµ(ξn−1

0 ) +Hµ(ξn+m−1n )

= Hµ(ξn−10 ) +Hµ(T−nξm−1

0 )

= Hµ(ξn−10 ) +Hµ(ξm−1

0 ).

Lemma (Fekete’s lemma). Let (an) ⊂ R be a subadditive sequence, i.e.

an+m ≤ an + am ∀n,m.

Then limn→∞ann = inf ann .

Proof sketch. Need to show that lim supn→∞ann ≤

an0

n0for all n0. For each fixed n0, we

can write n = j(n)n0 + i(n), where 0 ≤ i(n) < n0. Iterate subadditivity: an ≤ j(n)an0+

ai(n).

Definition (Entropy of a system). Let (X,B, µ, T ) be a measure preserving system. Let ξbe a countable measurable partition such that Hµ(ξ) < ∞. The entropy of the measurepreserving system with respect to ξ is

hµ(T, ξ) = limn→∞

Hµ(ξn−10 )

n= inf

Hµ(ξn−10 )

n.

The entropy of the measure preserving system is

hµ(T ) = sup {hµ(T, ξ) | Hµ(ξ) <∞}

Definition (2-sided generator). Let (X,B, µ, T ) be an invertible measure preserving system.Let ξ ⊂ B be a countable measurable partition. We say that ξ is a 2-sided generator if∀A ∈ B and ∀ε > 0, ∃k ∈ Z>0 such that ∃A′ ∈ B(ξk−k) (the σ-algebra generated by ξk−k)and µ(A4A′) < ε.

Theorem (Kolmogorov-Sinai). Let (X,B, µ, T ) be an invertible measure preserving system.Let ξ be a countable measurable partition with Hµ(ξ) < ∞ which is a 2-sided generator.Then hµ(T ) = hµ(T, ξ).

We return to the proof later; for now we calculate the entropy of Bernoulli shift.Let ({1, 2, . . . , k}Z,B, µ, σ) be the (p1, . . . , pk)-Bernoulli shift. Claim: The partition

ξ = { {x ∈ X | x0 = j } | j = 1, . . . , k } is a 2-sided generator.


Proof. The collection of sets

{A ∈ B | ∀ε ∃k ∃A′ ∈ ξk−k with µ(A4A′) < ε }

is a σ-algebra, and it contains cylinder sets. Hence it is equal to B.

Claim. With ξ defined as before, we have

Hµ(ξ | ξn1 ) = H(p1, p2, . . . , pk)

= −p1 log p1 − · · · − pk log pk

for all n ∈ Z≥0.

Proof. Calculate the information function:

Iµ(ξ | ξn1 )(x) = logµ([x]ξn1 )

µ([x]ξn0 )

Note [x]ξn0 = { y ∈ X | y0 = x0, y1 = x1, . . . , yn = xn }. µ([x]ξn0 ) = px0· · · pxn . Similarly

µ([x]ξn1 ) = px1· · · pxn

Iµ(ξ | ξn1 )(x) = − log px0

Hµ(ξ | ξn1 )(x) =

k∑j=1

pj · (− log pj) = H(p1, . . . , pk).

Hµ(ξn−10 ) = Hµ(ξn−1

n−1) +H(ξn−2n−2 | ξ

n−1n−1) +H(ξn−3

n−3 | ξn−1n−2) + · · ·+Hµ(ξ | ξn−1

1 )

= Hµ(ξ) +Hµ(ξ | ξ11) +Hµ(ξ | ξ2

1) + · · ·+Hµ(ξ | ξn−11 )

= nH(p1, . . . , pk).

Divide by n and take the limit:

hµ(T ) = hµ(T, ξ) = H(p1, . . . , pk).

Thus, the entropy of ( 12 ,

12 ) shift is log 2, and entropy of ( 1

3 ,13 ,

13 ) shift is log 3.

Lemma 1.Lecture 17 Let (X,B, µ, T ) be an invertible measure preserving system. Let ξ ⊂ B be acountable measurable partition. Then

hµ(T, ξn−n) = hµ(T, ξ).

Lemma 2. Let (X,B, µ, T ) be a measure preserving system. Let ξ, η ⊂ B be two countablemeasurable partitions. Then:

hµ(T, η) ≤ hµ(T, ξ) +Hµ(η | ξ)

Lemma 3. For all ε > 0, k ∈ Z>0, there is δ > 0 such that the following holds. Let (X,B, µ)be a probability space. Let ξ ⊂ B be a countable and η ⊂ B a finite partition. Suppose thatη has k atoms and for each A ∈ η, ∃B ∈ B(ξ) such that µ(A4B) < δ. Then Hµ(η | ξ) ≤ ε.


Proof of Kolmogorov-Sinai theorem. We first show hµ(T, ξ) = supη finite hµ(T, η). We needto show that for all finite partitions η ⊂ B, we have hµ(T, η) ≤ hµ(T, ξ). Fix ε > 0. ByLemma 3, and definition of 2-sided generator, ∃n such that Hµ(η | ξn−n) ≤ ε. Then we have

hµ(T, η) ≤ hµ(T, ξn−n) + ε

by Lemma 2. e are done if ε sufficiently small, by Lemma 1.Now it is left to show that ∀ε > 0, for every countable partition η ⊂ B, ∃η ⊂ B finite

such that hµ(T, η) ≤ hµ(T, η) + ε. By Lemma 2, enough to show Hµ(η | η) ≤ ε. Letη = (A1, A2, . . . , An,

⋃j>nAj) for sufficiently large n, when η = {A1, A2, . . . }. Then

Hµ(η) = Hµ(η ∨ η) = Hµ(η) +Hµ(η | η)

Thus

Hµ(η | η) =∑j>n

−µ(Aj) logµ(Aj) + µ

∞⋃j=n+1

Aj

· logµ

∞⋃j=n+1

Aj

≤∑j>n

−µ(Aj) logµ(Aj) ≤ ε

if n is sufficiently large.

Proof of Lemma 1.


1

nHµ(ξn−1

0 )

hµ(T, ξm−m) = limn→∞

1

nHµ(ξn+m−1−m)

= limn→∞

1

nHµ(ξn+2m−1

0 )

= limn→∞

1

n+ 2m− 1Hµ(ξn+2m−1

0 ) = hµ(T, ξ).

Proof of Lemma 2.

hµ(T, η) = lim1

nHµ(ηn−1

0 ) ≤ lim1

nHµ((ξ ∨ η)n−1

0 )

= lim1

n

(Hµ(ξn−1

0 ) +Hµ(ηn−10 | ξn−1

0 ))

= hµ(T, ξ) + lim1

n

n−1∑j=0

Hµ

(η′j | ξn−1

0 ∨ ηn−1j+1

)≤ hµ(T, ξ) + lim

1

n

n−1∑j=0

Hµ(ηjj | ξjj )︸︷︷︸

Hµ(η|ξ)

= hµ(T, ξ) +Hµ(η | ξ).

Proof of Lemma 3. Write A1, . . . , Ak for the atoms of η. And for each i ∈ {1, . . . , k}, letBi ∈ B(ξ) such that µ(Ai 4 Bi) < δ. We consider the partition λ which has the following


k + 1 atoms.

C0 =

k⋃i=1

Ai ∩Bi \⋃

j 6=i

Bj

Ci = Ai \ C0 for i = 1, . . . , k.

C0 has two crucial properties we note here:

– C0 is big:

µ(C0) ≥∑

(µ(Ai)− kδ)

= 1− k2δ

– If x ∈ C0, then x ∈ Ai ⇐⇒ x ∈ Bi.

Note

Hµ(λ) = −µ(C0) logµ(C0)−k∑i=1

µ(Ci) logµ(Ci)

≤ −µ(C0) logµ(C0)−

(k∑i=1

µ(Ci)

)log

(∑ki=1 µ(Ci)

k

).

If δ is sufficiently small, then µ(C0) is as close to 1 as we want, and∑ki=1 µ(Ci) is as close

to 0 as we want. Then Hµ(λ) < ε if δ is sufficiently small. Now it is enough to show thatHµ(η | ξ ∨ λ) = 0. Then,

Hµ(η | ξ) ≤ Hµ(η ∨ λ | ξ)≤ Hµ(λ | ξ)︸︷︷︸≤Hµ(λ)<ε

+Hµ(η | ξ ∨ λ)︸︷︷︸=0

It is now enough to showIµ(η | ξ ∨ λ)(x) = 0 ∀x.

To this end, it is enough to show

[x]ξ∨λ ⊂ [x]η ∀x.

We consider two cases.

1. x ∈ C0. Let Ai = [x]η. This means that x ∈ Ai. Because x ∈ C0, we must havex ∈ Bi. Now C0, Bi ∈ B(ξ ∨ λ). Then [x]ξ∨λ ⊂ C0 ∩Bi ⊂ Ai.

2. x ∈ Ci for some I. Then [x]η = Ai.

x ∈ Ci ⇒ [x]η∨λ ⊂ Ci ⊂ Ai.


Theorem (Shannon-McMillan-Brieman). LetLecture 18 (X,B, µ, T ) be an ergodic measure preservingsystem. Let ξ ⊂ B be a countable measurable partition with Hµ(ξ) <∞. Then

I(ξN−10 )

N→∞−−−−→ hµ(T, ξ)

pointwise µ-almost everywhere and in L1.

RecallI(ξN−1

0 )(x) = − logµ([x]ξN−10

),

so this theorem is loosely saying

exp(−Nhµ(T, ξ)) ≈ µ([x]ξN−10 ).

Idea: using the chain rule and then invariance:

1

NIµ(ξN−1

0 )(x) =1

N

N−1∑n=0

Iµ(ξnn | ξN−1n+1 )(x)

=1

N

N−1∑n=0

Iµ(ξ | ξN−n−11 )(Tnx).

Definition. Let (X,B, µ) be a probability space, let A ⊂ B be a sub-σ-algebra. Then forall f ∈ L1(X,B, µ), ∃f∗ ∈ L1(X,B, µ) such that

(1) f∗ is A-measurable

(2)∫Af dµ =

∫Af∗ dµ ∀A ∈ A.

If f∗1 and f∗2 both satisfy (1) and (2) in the role of f∗, then f∗1 = f∗2 µ-almost everywhere.The function f∗ is called the conditional expectation of f relative to A and it is denoted

by E[f | A].

Remark. Writing VA for the closed subspace of L2(X,B, µ) consisting of A-measurablefunctions, E[f | A] is the orthogonal projection of f to VA provided f ∈ L2. If A isgenerated by a countable partition ξ, then

E[f | A](x) = µ([x]ξ)−1

∫[x]ξ

f dµ.

Theorem. Let (X,B, µ) be a probability space, let

f, f1, f2 ∈ L1(X,B, µ), A,A1,A2 ⊆ B be sub-σ-algebras.

Then

(1) E[f1 + f2 | A] = E[f1 | A] + E[f2 | A]

(2) If f1 is A-measurable, then

E[f1f2 | A] = f1E[f2 | A]

(3) If A1 ⊆ A2 thenE[f | A1] = E[E[f | A2] | A1].


Theorem (Martingale theorems). Let (X,B, µ) be a probability space, let f ∈ L1 and letA,A1,A2, . . . ⊂ B be sub-σ-algebras. Assume that either

(1) A1 ⊂ A2 ⊂ A3 ⊂ · · · and A = B(A1 ∪ A2 ∪ · · · )

or

(2) A1 ⊃ A2 ⊃ A3 ⊃ · · · and A =⋂An

Thenlimn→∞

E[f | An] = E[f | A]

pointwise µ-almost everywhere and in L1.

We now aim to define conditional information and entropy relative to σ-algebras.

Definition (Conditional information relative to sub-σ-algebra). Let (X,B, µ) be a proba-bility space, let η ⊂ B be a countable partition, let A ⊂ B be a sub-σ-algebra. Then theconditional information function of η relative to A is

Iµ(η | A) =∑A∈η−χA · logE[χA | A].

Example. Suppose that A is generated by a countable partition ξ ⊂ B. Then:

Iµ(η | A) =∑A∈η−χA(x) logE[χA | A]

= − logE[χ[x]η | A](x)

= − log1

µ([x]ξ)

∫[x]ξ

χ[x]η dµ

= − logµ([x]η ∩ [x]ξ)

µ([x]ξ)= − log

µ([x]η∨ξ)

µ([x]ξ).

Definition.

Hµ(η | A) :=

∫Iµ(η | A) dµ.

Theorem (Maximal inequality). LetLecture 19 (X,B, µ) be a probability space, and let

A1 ⊆ A2 ⊆ · · ·

be a sequence of sub-σ-algebras of B and let η ⊂ B be another countable partition withHµ(η) <∞. Then

Iµ(η | An)→ Iµ(η | A)

as n→∞ pointwise and in L1, where A = B(⋃An). Moreover,

I∗(x) := supn∈Z>0

Iµ(η | An) ∈ L1(X,B, µ). (∗)


Proof. We do only the case where η is a finite partition, the countable case is dealt with onthe example sheet. By definition and martingale convergence:

Iµ(η | An)(x) = − logE[χ[x]η | An](x)n→∞−−−−→ − logE[χ[x]η | A](x)

= Iµ(η | A)(x).

This proves pointwise convergence.By dominated convergence, L1 convergence follows by (∗). Enough to show (∗). Fix

A ∈ η. Fix α > 0. For x ∈ X, let n(x) be the smallest positive integer n such that

− log(E[χA | An](x)) > α.

If the above does not hold for any n, then we write n(x) =∞. We write:

Bn = {x ∈ X : n(x) = n } .

Note that Bn is An-measurable. Indeed

Bn = {x ∈ X | E[χA | An](x) < exp(−α) but E[χA | Aj ](x) ≥ exp(−α) ∀j < n }

Then: ∫Bn

E[χA | An] dµ =

∫Bn

χA = µ(Bn ∩A)

by definition of conditional expectation, but

µ(Bn) · exp(−α) ≥∫Bn

E[χA | An] dµ.

Define

A∗ = {x ∈ A | I∗(x) > α } ⊂ A ∩(⋃

Bn

)so

µ(A∗) ≤∞∑n=1

µ(A ∩Bn) ≤ exp(−α)

∞∑n=1

µ(Bn) ≤ exp(−α)

µ({x ∈ X | I∗(x) > α }) ≤ |η| exp(−α).

Now, ∫I∗(x) dµ ≤

∞∑n=1

µ(x ∈ X | I∗(x) > n− 1) · n

≤∞∑n=1

|η| exp(−(n− 1))n <∞.

Lemma. Let (X,B, µ, T ) be a measure preserving system. Let ξ ⊂ B be a countablepartition with Hµ(ξ) <∞. Then


Hµ(ξ | ξn1 ).


Proof.

1

nHµ(ξn−1

0 ) =1

n

n−1∑j=0

Hµ(ξ | ξj1)

hµ(T, ξ) = C-limn→∞

Hµ(ξ | ξn1 )

Observe that Hµ(ξ | ξn1 ) is a monotone decreasing sequence, hence it converges. Then theCesaro limit is equal to the ordinary limit, so done.

Proof of Shannon-McMillan-Brieman.

1

NIµ(ξN−1

0 )(x) =1

N

N−1∑n=0

Iµ(ξnn | ξN−1n+1 )(x)

=1

N

N−1∑n=0

Iµ(ξ | ξN−n−11 )(Tnx)

=1

N

N−1∑n=0

Iµ(ξ | B(ξ∞1 ))(Tnx)

+1

N

N−1∑n=0

(Iµ(ξ | ξN−n−1

1 )(Tnx)− Iµ(ξ | B(ξ∞1 ))(Tnx))

︸︷︷︸(∗∗)

where B(ξ∞1 ) is the σ-algebra generated by⋃∞n=1 ξ

n1 .

By the Pointwise Ergodic Theorem:

1

N

N−1∑n=0

Iµ(ξ | B(ξ∞1 ))(Tnx)→∫Iµ(ξ | B(ξ∞1 )) dµ

= limn→∞

∫Iµ(ξ | ξn1 ) dµ

= limn→∞

H(ξ | ξn1 )

= hµ(T, ξ).

WriteI∗K(x) = sup

k≥K

∣∣Iµ (ξ | ξk1 ) (x)− Iµ (ξ | B(ξ∞1 )) (x)∣∣ .

By the maximal inequality: I∗K ∈ L1 for all K, and I∗K(x)→ 0 for µ-almost every x. I∗K arepointwise monotone decreasing, so we have∫

I∗K dµ→ 0 as K →∞.

Now,

|(∗∗)| ≤ 1

N

N−K−1∑n=0

I∗K(Tnx)︸︷︷︸→

∫I∗K dµ

+1

N

N∑n=N−K

I∗0 (Tnx).


The right hand term is

1

N

N1∑n=0

I∗0 (Tnx)︸︷︷︸→

∫I∗0 dµ

− N −KN︸︷︷︸→1

1

N −K

N−K−1∑n=0

I∗0 (Tnx)︸︷︷︸→

∫I∗0 dµ

which converges to 0. Thus lim supN→∞ |(∗∗)| ≤∫I∗K dµ. K is arbitrary, and∫

I∗K dµK→∞−−−−→ 0,

so lim supN→∞ |(∗∗)| = 0.

Lecture 20 Missing contentLecture 21 Left to prove: If hµ(T, ξ) > 0 ∀ξ ∈ B finite with Hµ(ξ) > 0 then T (ξ) istrivial ∀ξ ⊆ B finite.

Proposition. Let (X,B, µ, T ) be a measure preserving system. Let ξ, η ⊂ B be two finitepartitions. Then

hµ(T, ξ) = Hµ(ξ | B(ξ∞1 ) ∨ T (η)).

Lemma. Let (X,B, µ, T ) be a measure preserving system. Let ξ ⊂ B be a finite partition.Then

hµ(T, ξ) =1

nHµ(ξn−1

0 | B(ξ∞n ))

Proof. By chain rule and invariance:

Hµ(ξn−10 | B(ξ∞n )) =

n−1∑j=0

Hµ(ξjj | B(ξ∞j+1))

= nHµ(ξ | B(ξ∞1 ))

= nhµ(T, ξ).

Lemma. Let (X,B, µ, T ) be a measure preserving system. Let η, ξ ⊂ B be finite measurablepartitions. Then


1

nHµ(ξn−1

0 | B(ξ∞n ) ∨ B(η∞n ))

Proof. Observe that ∀n ∈ Z>0,

Hµ(ξn−10 | B(ξ∞n ) ∨ B(η∞n )) ≤ Hµ(ξn−1

0 | B(ξ∞n ))

= nhµ(T, ξ).

We do the following. We choose two sequences an and bn such that an ≤ bn ∀n and

lim1

n[Hµ(ξn−1

0 | B(ξ∞n ) ∨ B(η∞n )) + an] = lim1

n[Hµ(ξn−1

0 | B(ξ∞n )) + bn]. (1)

Suppose to the contrary that

lim inf1

n[Hµ(ξn−1

0 | B(ξ∞n ) ∨ B(η∞n ))] < hµ(T, ξ) = Hµ(ξn−10 | B(ξ∞n )).


Then an ≤ bn and this contradicts (1). This means that

hµ(T, ξ) ≤ lim inf1

n[Hµ(ξn−1

0 | B(ξ∞n ) ∨ B(η∞n ))]

≤ lim sup1

n[Hµ(ξn−1

0 | B(ξ∞n ) ∨ B(η∞n ))]

≤ hµ(T, ξ).

This proves the lemma.We take

an = Hµ(ηn−10 | B(ξ∞0 ) ∨ B(η∞n ))

bn = Hµ(ηn−10 | B(ξ∞0 )).

Clearly an ≤ bn indeed.

Hµ(ξn−10 | B(ξ∞n ) ∨ B(η∞n )) + an = Hµ(ξn−1

0 ∨ ηn−10 | B(ξ∞n ) ∨ B(η∞n ))

Hµ(ξn−10 | B(ξ∞n )) + bn = Hµ(ξn−1

0 ∨ ηn−10 | B(ξ∞n ))

By the earlier lemma,

Hµ(ξn−10 | B(ξ∞n ) ∨ B(η∞n )) + an

n= hµ(T, ξ ∨ η)

so we have

hµ(T, ξ∨η) =Hµ(ξn−1

0 | B(ξ∞n ) ∨ B(η∞n )) + ann

=Hµ(ξn−1

0 | B(ξ∞n )) + bnn

≤ Hµ(ξn−10 ∨ ηn−1

0 )

n→ hµ(T, ξ∨η).

Suppose that hµ(T, ξ) > 0 ∀ξ ⊂ B finite with Hµ(ξ) > 0. Suppose to the contrary thatT (ξ) is non-trivial for some ξ ⊂ B finite. Then ∃A ⊂ T (ξ) such that 0 < µ(A) < 1. Defineη = {A,X \A}. Apply the proposition with ξ ↔ η:

hµ(T, η) = Hµ(η | B(η∞1 ) ∨ T (ξ)).

We prove that hµ(T, η) = 0, which is contradiction, as Hµ(η) > 0.In fact we will show

Iµ(η | B(η∞1 ) ∨ T (ξ))(x) = 0

for a.e. x.

Iµ(η | B(η∞1 ) ∨ T (ξ)) = − logE[ξ[x]η | B(η∞1 ) ∨ T (ξ)]

= − log ξ[x]η = 0.

10 Rudolph’s Theorem

Theorem. Let µ be a probability measure on R/Z that is invariant and ergodic with respectto the joint action of T2 : x 7→ 2x and T3 : x 7→ 3x. That is to say, if A ⊂ B such thatT−1

2 A = A and T−13 A = A, then µ(A) = 0 or 1. Suppose that hµ(T2) or hµ(T3) > 0. Then

µ is the Lebesgue measure on R/Z.


Theorem (Host). Let µ be an ergodic T3-invariant measure on R/Z with hµ(T3) > 0.Then µ-a.e. x ∈ R/Z is normal in base 2, i.e. the sequence Tn2 x is equidistributed in R/Z.Moreover, if µ is not necessarily ergodic, but the other assumptions hold, then

µ(x ∈ R/Z : x is normal in base 2) ≥ hµ(T3)

log 3.

Lecture 22

Lemma. We have ord3k(2) = 2 · 3k−1, where ord3k(2) is the order of 2 mod 3k, that is, thesmallest integer n such that 3k | 2n − 1.

Proof. By induction on k, we prove that ord3k(2) = 2 · 3k−1 and 3k+1 - 22·3k−1 − 1 Easy tocheck for k = 0, 1. Assume that k ≥ 2 and the claim holds for k − 1. 3k | 2ord

3k(2) − 1, so

3k−1 | 2ord3k

(2) − 1, hence ord3k(2) = a · 2 · 3k−2 for some a ∈ Z>0. We also know

22·3k−1

= b · 3k−1 + 1

for some b ∈ Z>0 and 3 - b. Then by the binomial theorem, we have:

22·2·3k−2

= b2 · 32(k−1) + 2b · 3k−1 + 1

23·2·3k−2

= b3 · 3(k−1) + 3b2 · 32(k−1) + 3b · 3k−1 + 1

From this we see that 3k - 22·2·3k−2 − 1 but 3k | 23·2·3k−2 − 1 and 3k+1 - 23·2·3k−2 − 1.

Remark. Look at the T2 orbit of a3k

where a ∈ Z and 3 - a. This orbit is

{ b

3k| b ∈ [0, . . . , 3k − 1], 3 - b } .

Also Tn2 (x+ a3k

) = Tn2 (x) + Tn2 ( a3k

).Another idea: If hµ(T3) > 0, the µ ‘will not concentrate’ on a few elements of the set

{x+a

3k: a ∈ [0, . . . , 3k − 1], 3 - a }

We need to show that for µ-a.e. x and for all f ∈ C(R/Z), we have

1

N

N−1∑n=0

f(Tn2 x)→∫f(x) dx.

In fact, it is enough to prove this for functions of the form: f(x) = exp(2πimx) for m ∈ Z(by density and linearity).

Notation. For N ∈ Z>0, m ∈ Z, let

PN,m(x) =1

N

N−1∑n=0

exp(2πim2nx).

Now it is enough to show: for all fixed m 6= 0, and for µ-a.e. x, we have PN,m(x)→ 0.


Lemma. Fix m 6= 0 and x ∈ R/Z. Let α be the largest exponent such that 3α | m. LetN < 2 · 3k−α−1 for some k ∈ Z>0. Then

3k−1∑a=0

∣∣∣PN,m(x+a

3k)∣∣∣2 =

3k

N.

Proof.

|PN,m(x)|2 =

∣∣∣∣∣ 1

N

N−1∑n=0

exp 2πim2nx

∣∣∣∣∣2

=1

N2

N−1∑n1=0

N−1∑n2=0

exp(2πim2n1x) · exp(2πim2n2x)

=1

N2

N−1∑n1=0

N−1∑n2=0

exp(2πim(2n1 − 2n2)x).

Note that for some b ∈ Z:

3k−1∑a=0

exp(2πib(x+a

3k)) = exp(2πibx) ·

3k−1∑a=0

exp(2πiba

3k) (∗)

If 3k - b, then (∗) is 0. If 3k | b, then (∗) = 3k exp(2πibx). When does it happen that3k | m · (2n1 − 2n2). Suppose n1 > n2. Then m · (2n1 − 2n2) = m · 2n2(2n1−n2 − 1). Sincen1 − n2 ≤ N < 2 · 3k−α−1, we have 3k−α - 2n1−n2 − 1 so 3k - m · (2n1 − 2n2). Same appliesif n2 > n1. We get that 3k | m(2n1 − 2n2) ⇐⇒ n1 = n2. Therefore,

|PN,m(x)|2 =1

N2

N−1∑n=0

3k =3k

N.

Lecture 23

Proposition. Let µ be an ergodic T3-invariant probability measure on R/Z. Let k ∈ Z>0.Write

µk(A) =

3k−1∑a=0

µ(A+a

3k) ∀A ∈ B.

Let ξ = {[0, 13 ), [ 1

3 ,23 ), [ 2

3 , 1)}. Let ε > 0. Then

µk([x]ξK0 ) ≥ exp((hµ(T3)− ε)k)µ([x]ξK0 )

holds for µ-almost every x provided k is sufficiently large depending on ε, x and K is suffi-ciently large depending on ε, x, k.

Remark.

[x]ξK0 = { y ∈ R/Z = [0, 1) | y = 0.y1y2y3 . . .(3) , y1 = x1, . . . , yK+1 = xK+1 }


where x = 0.x1x2x3 . . .(3). This is an interval of length 3−(K+1) containing x.

µk([x]ξK0 ) = µ(

3k−1⋃a=0

[x]ξK0 +a

3k).

Lemma. In the above setting, we have

1

kIµ(ξk−1

0 |B(ξ∞k ))(x)→ hµ(T3)

µ-almost everywhere.

Proof. Using chain rule and invariance,

1

kIµ(ξk−1

0 | B(ξ∞k ))(x) =1

k

k−1∑j=0

Iµ(ξjj | B(ξ∞j+1))(x)

=1

k

k−1∑j=0

Iµ(ξ | B(ξ∞1 ))(T jx)

by pointwise ergodic theorem →∫Iµ(ξ | B(ξ∞1 )) dµ

= Hµ(ξ | B(ξ∞1 )) = hµ(T3, ξ).

The proof will be complete with the following.

Lemma. The partition ξ is a 1-sided generator for (R/Z,B, µ, T3). That is, ∀A ∈ B andε > 0, ∃n ∈ Z>0 and B ∈ B(ξk0 ) such that µ(A4B) < ε.

Proof. Consider:

C = {A ∈ B | ∀ε > 0,∃n ∈ Z>0, B ∈ B(ξn0 ) such that µ(A4B) < ε } .

This is a σ-algebra, so it is enough to show that it contains all open sets. Let U ⊆ R/Z beopen. Claim that

U =⋃

x∈U,n∈Z>0|[x]ξn0⊂U

[x]ξn0 .

Note that this is a countable union and [x]ξn0 ∈ C ∀x∀n.

Corollary. Let µ be a T3-invariant ergodic probability measure on R/Z. Then for allk ∈ Z>0,

∃Ak ∈ B

such that

(1) µ-almost every x is contained in Ak for k sufficiently large.

(2)µk(U) ≥ exp(khµ(T3)/2)µ(Ak ∩ U)

holds for all open sets U ⊂ R/Z.


Proof of proposition. By the lemma, if k is sufficiently large depending on x, ε, then

Iµ(ξk−10 | B(ξ∞k ))(x) ≥ k(hµ(T3)− ε/2).

If K sufficiently large, then

Iµ(ξk−10 | ξKk )(x) ≥ k(hµ(T3)− ε)

by SMB, and the LHS is

logµ([x]ξKk )

µ([x]ξK0 ).

Observe[x]ξKk =

⋃a=0,...,3k−1

([x]ξK0 +a

3k)

Thus µk([x]ξK0 ) = µ([x]ξKk ), and so we have

logµk([x]ξK0 )

µ([x]ξK0 )≥ k(hµ(T3)− ε).

Proof of corollary. Let Ak be the set of points x such that

µk([x]ξK0 ) ≥ exp(k · µk(T3)/2)µ([x]ξK0 ) (∗)

for all sufficiently large K. Claim (1) follows from the proposition.To prove (2), fix k ∈ Z>0 and fix U ⊆ R/Z open. For K ∈ Z>k, write AK for the

collection of atoms [x]ξK0 such that [x]ξK0 ⊂ U and ((∗)) holds for this atom. Since U is open

and by the definition of Ak, if x ∈ Ak ∩U and K is sufficiently large, then [x]ξK0 ∈ AK . Let

BK =⋃A∈AK A.

µK(BK) ≥ exp(k · hµ(T3)/2)µ(BK)

follows by summing ((∗)). Also, µk(BK) ≤ µk(U). Now consider⋃K≥LBK for L ∈ Z>k.

These sets increase as L grows, and their union is Ak ∩ U : Recall that ∀x ∈ Ak ∩ U , wehave [x]ξK0 ∈ AK for K sufficiently large. Therefore x ∈ BK if K is sufficiently large. So

x ∈⋂K≥LBK if L is sufficiently large.

Since⋂K≥LBK ⊂ BL, we have

µk(U) ≥ · · · ≥ µ(⋂K≥L

BK).

Lecture 24 We have reduced Host’s theorem (in the ergodic case) to showing PN,m(x)→ 0 for µ-a.e.x for all m 6= 0.

Recall


Lemma (Borel-Cantelli). Let (X,B, µ) be a probability space. Let B1, B2, . . . ,∈ B. Sup-pose

∞∑n=1

µ(Bn) <∞.

Then for µ-almost every x, x ∈ Bn holds for at most finitely many n.

Lemma. For all m 6= 0 and c ∈ Z>0,

PN,m(x)N→∞−−−−→ 0 ⇐⇒ PLc,m(x)

L→∞−−−−→ 0.

Proof. It is enough to show that PLc,m(x)→ 0 as L→∞, for some C (to be chosen later)and all m 6= 0. Fix m 6= 0 and ε > 0. For N ∈ Z>0, we set k = k(N) to be an integer suchthat 2 · 3k−2−α ≤ N < 2 · 3k−1−α. By Lemma 1, we have:

3k−1∑a=0

|PN,m(x+a

3k)|2 =

3k

N≤ N

2· 3α+2/N =

3α+2

2.

By the definition of µk, we have∫|PN,m(x)|2 dµk(x) =

∫ 3k−1∑a=0

|PN,m(x+a

3k)|2 dµ ≤ 3α+2

2.

By Markov’s inequality:

µk(x : PN,m(x) > ε) ≤ 3α+2

2ε2.

By the corollary,

µ(x : |PN,m(x)| > ε, x ∈ Ak) ≤ exp(−khµ(T3)/2) · 3α+2

2ε2

where k = k(N) defined earlier. Let

Bε = {x | |PLc,m(x)| > ε for infinitely many L } .

Our goal is to show that µ(Bε) = 0, and this completes the proof. Write

DL,ε = {x | x ∈ Ak(L), |PLc,m(x)| > ε } .

Note that x ∈ Bε implies x ∈ DL,ε for infinitely many Ls. So by B-C lemma, it is enoughto show

∑∞L=1 µ(DL,ε) <∞.

Recall we have:

µ(DL,ε) ≤ exp(−k(Lc)hµ(T3)/2) · 3α+2

2ε2.

If we choose c sufficiently large, then

k(Lc) > 4 logL/hµ(T3).

So ∑µ(DL,ε) ≤

∑L−2 · 3α+2

2 · ε2<∞.


Theorem (Rudolph’s Theorem). Let µ be a probability measure on R/Z that is invariantand ergodic under the joint action of T2 and T3. If hµ(T3) > 0 then µ is the Lebesguemeasure.

Proof. WriteA = {x | normal in base 2 } .

By Host’s theorem, µ(A) > 0.First observe that x ∈ A ⇐⇒ T2x ∈ A, so A = T−1

2 A. Claim: if x ∈ A, then T3x ∈ A.Proof. If x ∈ A, then ∀f ∈ C(X), we have

1

N

N−1∑n=0

f(Tn2 x)→∫f(x) dx.

Now we can write

1

N

N−1∑n=0

f(Tn2 T3x) =1

N

N−1∑n=0

f(T3Tn2 x)

→∫f ◦ T3(x) dx =

∫f(x) dx.

using f ◦ T3 in the role of f This proves that T3x ∈ A.The claim gives A ⊂ T−1

3 A. Since µ(A) = µ(T−13 A), we have that µ(T−1

3 A \A) = 0.Using these, it is possible to show that for B =

⋃∞n=0 T

−n3 A, we have T−1

2 (B) = B andT−1

3 B and µ(B \A) = 0. By ergodicity µ(B) = 1, hence µ(A) = 1. Then

1

N

N1∑n=0

f(Tn2 x)→∫f(x) dx

∀f ∈ C(X) and for µ-almost every x.


Index

atoms, 33

Banach-Alaoglu theorem, 5Bernoulli shift, 33

Cesaro convergence, 25Chacon map, 31circle rotation map, 2convergence in density, 25convex, 34

strictly, 34

doubling map, 2

entropyof partition, 33of system, 37

equidistributed, 20ergodic, 8

uniquely, 16ergodic average, 10ergodic theorem

maximal, 13mean, 10pointwise, 10

Furstenberg multiple recurrence, 4

generator, 37generic, 21

information function, 35invertible, 11isomorphic, 33

join, 33

Kolmogorov-Sinai theorem, 37Koopman operator, 11

Markov shift, 3maximal ergodic theorem, 13mean ergodic theorem, 10measure preserving

system, 2invertible, 11

transformation, 2mixing, 25

on k sets, 25weak, 25

normal, 16

orbit, 2

partition, 33Poincare recurrence, 7pointwise ergodic theorem, 10push-forward, 17

Riesz representation theorem, 17

skew product, 21strictly convex, 34Szemererdi’s theorem, 4

topological dynamical system, 16

uniquely ergodic, 16

weak limit, 5weak mixing, 25

53

Documents

Topics in Ergodic Theory- Part III - maths-notes · An interesting problem from number theory and combinatorics is Szemer edi’s theorem. In this course, we will give an ergodic