ERGODIC-THEORETIC IMPLEMENTATIONS OF THE ROTH DENSITY-INCREMENT ARGUMENT

TIM AUSTIN

ABSTRACT. We exhibit proofs of Furstenberg’s Multiple Recurrence Theorem and of a special case of Furstenberg and Katznelson’s multidimensional version of this theorem, using an analog of the density-increment argument of Roth and Gowers. The second of these results requires also an analog of some recent finitary work by Shkredov.

Many proofs of these multiple recurrence theorems are already known. However, the approach of this paper sheds some further light on the well-known heuristic correspondence between the ergodic-theoretic and combinatorial aspects of multiple recurrence and Szemerédi’s Theorem. Focusing on the density-increment strategy highlights several close points of connection between these settings.

CONTENTS

1. Introduction
1.1. Ergodic Ramsey Theory
1.2. The density-increment argument
1.3. Outline of this note
2. Powers of a single transformation
2.1. Preliminary discussion
2.2. The density-increment proof
3. Two commuting transformations
3.1. The density increment in higher dimensions
3.2. A closer look at the characteristic factors and the main estimate
3.3. The main estimate
3.4. Shkredov’s version of the density increment
4. Further discussion
References

1. INTRODUCTION

In 1975 Szemerédi published the first proof of a long-standing conjecture of Erdős and Turán concerning arithmetic progressions in dense arithmetic sets.

Work supported by fellowships from Microsoft Corporation and from the Clay Mathematics Institute.


Theorem 1.1 (Szemerédi’s Theorem). If $E \subset \mathbb{Z}$ admits some $\delta > 0$ for which there are arbitrarily long intervals $[M,N]$ with
\[
|E \cap [M,N]| \ge \delta (N - M)
\]
(that is, $E$ has ‘upper Banach density’ equal to at least $\delta$), then $E$ also contains for every $k \ge 1$ a nondegenerate arithmetic progression of length $k$:
\[
E \supseteq \{a, a+n, a+2n, \ldots, a+(k-1)n\} \quad \text{for some } a \in \mathbb{Z},\ n \ge 1.
\]

Separate proofs for various special cases were given earlier by Roth and by Szemerédi himself. The thirty-five years subsequent to Szemerédi’s breakthrough have seen the emergence of a host of alternative approaches to this theorem and several generalizations.

The many techniques that have been brought to bear in this investigation are loosely drawn from three areas of mathematics:

• graph and hypergraph theory (in work of Szemerédi, Solymosi, Nagle, Rödl, Schacht, Skokan, Gowers and others),

• ergodic theory (largely building on ideas of Furstenberg and Katznelson),

• harmonic analysis (following Roth, Bourgain, Gowers, Green, Tao and Shkredov).

The alternative arguments constructed from these three bodies of theory sometimes correspond much more closely than is initially apparent, owing to many differences in technical detail that turn out to be quite superficial. No really comprehensive overview of the relations among these approaches is yet available, but fragments of the picture can be found in the papers [Kra07, GT12, Tao06a] and in Chapters 10 and 11 of Tao and Vu’s book [TV06].

The purpose of the present note is to extract one aspect of the harmonic analytic approach, the ‘density-increment argument’ originating in the early work of Roth [Rot53], and present a natural analog of it in the rather different setting of ergodic theory. No new theorems will be proved except for some technical results needed en route, but I hope that this alternative presentation of existing ideas will contribute to enhancing the toolkits of those working on this class of problems, and also shed some light on the open questions that remain concerning the density-increment approach.

1.1. Ergodic Ramsey Theory. Two years after Szemerédi’s proof of Theorem 1.1 appeared, Furstenberg offered in [Fur77] a very different approach to the same result based on a conversion to a problem in ergodic theory, using what is now referred to as ‘Furstenberg’s correspondence principle’.

A precise formulation of the general correspondence principle can be found, for example, in Bergelson [Ber96]. Here we simply recall that Furstenberg proved the equivalence of Szemerédi’s Theorem to the following:

Theorem 1.2 (Multiple Recurrence Theorem). If $T : \mathbb{Z} \curvearrowright (X,\mu)$ is a probability-preserving action on a standard Borel probability space and $A \subset X$ is measurable and has $\mu(A) > 0$, then also
\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(A \cap T^{-n}A \cap \cdots \cap T^{-(k-1)n}A\bigr) > 0 \quad \forall k \ge 1.
\]

Furstenberg’s proof of Theorem 1.2 relied on a powerful structural classification of probability-preserving dynamical systems developed independently by Furstenberg and by Zimmer ([Zim76b, Zim76a]).

Shortly after that proof appeared, Furstenberg and Katznelson realized that only a modest adaptation yields a significantly stronger result.

Theorem 1.3 (Multidimensional Multiple Recurrence Theorem). If $T_1, T_2, \ldots, T_d : \mathbb{Z} \curvearrowright (X,\mu)$ are commuting probability-preserving actions on a standard Borel probability space and $A \subset X$ has $\mu(A) > 0$ then also
\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(A \cap T_1^{-n}A \cap \cdots \cap T_d^{-n}A\bigr) > 0.
\]

This appeared in [FK78]. Theorem 1.2 follows from Theorem 1.3 by setting $d := k - 1$ and $T_i := T^i$ for $i \le k - 1$. On the other hand, Theorem 1.3 also has a combinatorial consequence that strengthens Szemerédi’s Theorem:

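Theorem 1.4. If $E \subset \mathbb{Z}^d$ admits some $\delta > 0$ for which there are cuboids $\prod_{i \le d} [M_i, N_i]$ with $\min_{i \le d} |N_i - M_i|$ arbitrarily large and
\[
\Bigl| E \cap \prod_{i \le d} [M_i, N_i] \Bigr| \ge \delta \prod_{i \le d} (N_i - M_i),
\]
then $E$ also contains the set of vertices of a nondegenerate upright right-angled isosceles simplex:
\[
E \supseteq \{\mathbf{a}, \mathbf{a} + n\mathbf{e}_1, \ldots, \mathbf{a} + n\mathbf{e}_d\} \quad \text{for some } \mathbf{a} \in \mathbb{Z}^d,\ n \ge 1.
\]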

Interestingly, this result went unproven by purely combinatorial means until the development of hypergraph analogs of Szemerédi’s famous Regularity Lemma by Nagle, Rödl and Schacht [NRS06], Gowers [Gow07] and Tao [Tao06b], more than twenty years later. In addition, several other purely combinatorial assertions have now been accessed through ‘Ergodic Ramsey Theory’, the subject that emerged from Furstenberg and Katznelson’s early developments, including a density version of the Hales-Jewett Theorem [FK91] and a density Ramsey Theorem for subtrees of trees [FW03].

Within ergodic theory, a great deal of energy has now been spent on obtaining the most precise possible understanding of the averages whose limit infima are the subjects of Theorems 1.2 and 1.3; we will return to some of these developments later.

1.2. The density-increment argument. The ‘density-increment argument’ was first used by Roth for his early proof of the case $k = 3$ of Theorem 1.1. Much more recently, Gowers developed in [Gow98, Gow01] an extremely sophisticated extension of Roth’s approach, and using this was able to give a density-increment proof of the full Szemerédi Theorem.

We will not spend time here on the many technical accomplishments involved in Gowers’ work, requiring a call to tools from yet other parts of arithmetic combinatorics such as Freiman’s Theorem. Rather we record just a simple statement of the density-increment proposition that lies at its heart.

Proposition 1.5. Suppose that $\delta > 0$, that $N$ is sufficiently large and that $E \subset \{1, 2, \ldots, N\}$ has $|E| \ge \delta N$ but contains no $k$-term arithmetic progression. Then there is an arithmetic progression $P \subset \{1, 2, \ldots, N\}$ of size at least
\[
N^{((\delta/2)^{k 2^k})^{2^{2^{k+8}}}}
\]
such that
\[
|E \cap P| \ge \Bigl( \delta + \bigl((\delta/2)^{k 2^k}\bigr)^{2^{2^{k+8}}} \Bigr) |P|. \qquad \square
\]

This proposition is implicit in [Gow01], but does not appear in the above form because Gowers presents his argument in terms of the crucial auxiliary notion of ‘higher-degree uniformity’, and splits the above result into several pieces that are connected via this auxiliary notion.

This kind of uniformity is defined in terms of the Gowers uniformity norms (Section 3 of [Gow01]; see also Chapter 11 of Tao and Vu [TV06]) that have since become widely used in additive combinatorics. Uniformity of degree 1 can be described simply in terms of the presence of some large values among the Fourier coefficients of $1_E$, regarded as a function on the group $\mathbb{Z}/N\mathbb{Z}$; this is essentially the notion that Roth uses in his approach for $k = 3$. Higher-degree uniformity extends this property, although it is not so easily described using Fourier analysis. In his more general setting, Gowers proves on the one hand that if $E$ is sufficiently uniform of degree $k - 2$ then it contains a $k$-term arithmetic progression (Corollary 3.6 in [Gow01]), and on the other that if $E$ is not sufficiently uniform of degree $k - 2$ then we may partition $\{1, 2, \ldots, N\}$ into long arithmetic subprogressions such that $E$ has a relative density inside some of these subprogressions that is substantially larger than $\delta$ (Theorem 18.1 in [Gow01]). This fact can then be used to pick out one such subprogression satisfying the above conclusion (Lemma 5.15 in [Gow01]). Proposition 1.5 amounts to the conjunction of these facts.

Our proof of Theorem 1.2 below takes a similar form (although we should stress that our task is much simpler than Gowers’), using an ergodic-theoretic analog of the notion of ‘uniformity’ arising in work of Host and Kra. Similarly to the presentation in [Gow01], we will find that handling the consequences of non-uniformity is the more complicated of the two steps involved.

From Proposition 1.5 a proof of Szemerédi’s Theorem follows quickly by contradiction. If $E$ is a counterexample of density $\delta$ and $N$ is sufficiently large, then for a subprogression $P$ as given by Proposition 1.5 we see that $E \cap P$, identified with a subset of $\{1, 2, \ldots, |P|\}$ by the obvious affine map, is another counterexample with density that exceeds $\delta$ by an amount depending only on $\delta$ and $k$. It is contained in a discrete interval of length $|P| \ge N^{\kappa(\delta,k)}$ for some small fixed $\kappa(\delta,k) > 0$. Therefore, provided $N$ was sufficiently large to begin with, iterating this construction must eventually turn a counterexample of density at least $\delta$ into a counterexample of density greater than 1: an obvious contradiction.
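The iteration just described can be made explicit. The following is only a sketch: the symbols $\delta_j$, $N_j$, $J$, $c(\delta)$ and $\kappa(\delta)$ are introduced here for illustration, with $c(\delta) > 0$ the density increment and $\kappa(\delta) \in (0,1]$ the exponent in the length bound $|P| \ge N^{\kappa(\delta)}$ supplied by Proposition 1.5 for fixed $k$. Both are assumed to be nondecreasing in $\delta$, as any positive power of $\delta/2$ is.

```latex
% requires amsmath
\begin{align*}
\delta_{j+1} &\ge \delta_j + c(\delta_j) \ge \delta_j + c(\delta_0)
  && \text{(density increment; $c$ nondecreasing)} \\
\delta_J &\ge \delta_0 + J\,c(\delta_0) > 1
  && \text{once } J > (1-\delta_0)/c(\delta_0) \\
N_{j+1} &\ge N_j^{\kappa(\delta_j)} \ge N_j^{\kappa(\delta_0)}
  && \text{(length of the chosen subprogression)} \\
N_J &\ge N_0^{\kappa(\delta_0)^{J}}
  && \text{(by induction on $j$)}
\end{align*}
```

So a density above 1 would be reached after at most $\lfloor (1-\delta_0)/c(\delta_0) \rfloor + 1$ increments, and the argument goes through provided the initial length $N_0 = N$ is so large that $N_0^{\kappa(\delta_0)^J}$ is still large enough for Proposition 1.5 to apply at every stage.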

In addition to its aesthetic value, Gowers’ new proof of Szemerédi’s Theorem gives much the best known bound on how large $N$ must be taken in order that a $k$-term arithmetic progression is certain to be found in a density-$\delta$ subset $E \subset \{1, 2, \ldots, N\}$. In view of this, it was natural to ask whether this approach could also be brought to bear on the multidimensional Theorem 1.4 in order to give a similarly striking improvement to the bounds available there. Gowers poses this problem explicitly and offers some discussion of it in his survey [Gow00]. Recently Shkredov has made the first serious progress on this problem by essentially solving the case $d = 2$ in [Shk06b], applying some important new technical ideas that are needed to prove and then use a relative of Proposition 1.5. However, a further enhancement of these ideas that will yield a density-increment proof of the full Theorem 1.4, with or without improved bounds, still seems relatively distant.

1.3. Outline of this note. The centrepieces of this note are ‘density-increment’ proofs of the Multiple Recurrence Theorem 1.2 and the case $d = 2$ of Theorem 1.3, corresponding to Gowers’ and Shkredov’s combinatorial implementations of the density-increment argument respectively.

The main steps taken by Gowers and Shkredov do have counterparts in these proofs, but we need different structural results from within ergodic theory to enable them. These will largely be drawn from recent studies of the ‘nonconventional ergodic averages’ whose limit infima appear in Theorems 1.2 and 1.3. In particular we rely on the method of ‘characteristic factors’, which has emerged through the works of several researchers since Furstenberg’s original paper [Fur77], and especially on some of the technical steps in Host and Kra’s proof ([HK05]) of convergence for the averages of Theorem 1.2 and in the work of Conze and Lesigne [CL84, CL88a, CL88b] and the subsequent works [Aus09, Aus10a] on the multi-dimensional case. Many other researchers have contributed to this story within ergodic theory, including Rudolph, Zhang, Katznelson, Weiss, Ziegler, Frantzikinakis and Chu, and the reader is referred to [Aus10b] for a more complete discussion.

The basic ergodic theoretic version of the density-increment argument for Theorem 1.2 will be introduced in Subsection 2.2 and then used to complete the proof of that theorem later in Section 2. Although a density increment is central to Shkredov’s proof as well, he uses it in a slightly more complicated way, and so in Section 3 we introduce the ergodic theoretic analog of this separately and then use it to prove the case $d = 2$ of Theorem 1.3.

On the one hand, I hope that these proofs shed some light on the nature of the density-increment argument. On the other, it seems that recent progress in ergodic theory is beginning to address some of the problems of extending this approach to give a density-increment proof of the whole of Theorem 1.3 (and so, one might hope, also to give a finitary density-increment proof of Theorem 1.4, as requested by Gowers). In the final Section 4 we will draw on results from [Aus10b, Ausa, Ausb] to sketch some of the further developments suggested by this progress.


2. POWERS OF A SINGLE TRANSFORMATION

2.1. Preliminary discussion. In this section we show how the density-increment strategy can be used to give a proof of Theorem 1.2, building on two important ergodic-theoretic ingredients. Let us first recall a convenient definition.

Definition 2.1 (Process). We will refer to a probability-preserving $\mathbb{Z}$-system $(X, \mu, T)$ together with a distinguished subset $A$ as a process and denote it by $(X \supseteq A, \mu, T)$.

Definition 2.2. An ergodic process $(X \supseteq A, \mu, T)$ has no $k$-APs in its return times if
\[
\mu\bigl(A \cap T^{-n}A \cap \cdots \cap T^{-(k-1)n}A\bigr) = 0 \quad \text{for all } n \in \mathbb{Z} \setminus \{0\}.
\]

Clearly if $\mu(A) > 0$ then the above property is stronger than being a counterexample to the Multiple Recurrence Theorem, which requires that the relevant intersections have positive measure on average, not just for a single nonzero $n$. Since that theorem turns out to be true, the above definition is essentially vacuous, but it will be a convenient handle at various points during the proofs that follow.

The first ingredient we need is a corollary of the recent result of Host and Kra [HK05] that the limiting values of the multiple recurrence averages are precisely controlled by certain special nilrotation factors of a system $(X, \mu, T)$.

Definition 2.3 (Nilrotations). For any $k \ge 1$ a $k$-step nilrotation is a $\mathbb{Z}$-system on a homogeneous space $G/\Gamma$ for $G$ a $k$-step nilpotent Lie group and $\Gamma \le G$ a cocompact discrete subgroup, where $G/\Gamma$ is endowed with its normalized Haar measure $m$ and the transformation is given by
\[
R_g : h\Gamma \mapsto gh\Gamma
\]
for some $g \in G$.
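Two standard examples may help fix ideas. They are well-known illustrations rather than part of the original text, and the names $\alpha$ and $H$ are introduced here only for this purpose; in the first example the group law is written additively.

```latex
% requires amsmath, amssymb
% 1-step case: a circle rotation.
\[
G = \mathbb{R}, \quad \Gamma = \mathbb{Z}, \quad g = \alpha:
\qquad R_\alpha(x + \mathbb{Z}) = x + \alpha + \mathbb{Z}
\quad \text{on } \mathbb{R}/\mathbb{Z}.
\]
% 2-step case: the Heisenberg nilmanifold.
\[
H = \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix}
  : x, y, z \in \mathbb{R} \right\},
\qquad
\Gamma = H \cap \mathrm{GL}_3(\mathbb{Z}).
\]
```

Here $\mathbb{R}$ is abelian, hence 1-step nilpotent, and $R_\alpha$ is ergodic exactly when $\alpha$ is irrational. The group $H$ is 2-step nilpotent because all of its commutators lie in the central subgroup $\{x = y = 0\}$, and $\Gamma$ is discrete and cocompact in $H$.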

Theorem 2.4 (Host-Kra Theorem). For each $k \ge 2$, any ergodic $\mathbb{Z}$-system $\mathbf{X} = (X, \mu, T)$ has a factor map $\pi_{k-2} : \mathbf{X} \to \mathbf{Z}_{k-2}$ onto a system generated by an inverse sequence of $(k-2)$-step nilrotations such that
\[
\frac{1}{N} \sum_{n=1}^{N} \int_X f_0 \cdot (f_1 \circ T^{n}) \cdots (f_{k-1} \circ T^{(k-1)n}) \, d\mu
\sim
\frac{1}{N} \sum_{n=1}^{N} \int_X E_\mu(f_0 \,|\, \pi_{k-2}) \cdot (E_\mu(f_1 \,|\, \pi_{k-2}) \circ T^{n}) \cdots (E_\mu(f_{k-1} \,|\, \pi_{k-2}) \circ T^{(k-1)n}) \, d\mu
\]
as $N \to \infty$ for any $f_0, f_1, \ldots, f_{k-1} \in L^\infty(\mu)$, where the notation $\sim$ asserts that the difference between these two sequences of averages tends to 0 as $N \to \infty$. $\square$

Remark. The above result is often expressed by asserting that the factor $\pi_{k-2}$ is characteristic for the averages in question. This theorem first appears in [HK05], where its proof invokes a family of seminorms on $L^\infty(\mu)$ that Host and Kra introduce for this purpose and that are closely analogous to Gowers’ uniformity seminorms from [Gow01], so offering another point of proximity between the ergodic theoretic and quantitative approaches. Another proof of Theorem 2.4 has now been given by Ziegler in [Zie07], who also shows that the maximal factor of $\mathbf{X}$ generated by $(k-2)$-step nilrotations is also the unique minimal factor that is characteristic in the above sense. $\lhd$
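The smallest case of Theorem 2.4 may be worth recording as a sanity check. This is a sketch resting on the standard fact, not spelled out above, that a 0-step nilrotation is the one-point system, so that for $k = 2$ the conditional expectation onto $\pi_0$ is just the integral:

```latex
% requires amsmath
\begin{align*}
k = 2: \qquad
\frac{1}{N}\sum_{n=1}^{N} \int_X f_0 \cdot (f_1 \circ T^{n}) \, d\mu
 &\sim \frac{1}{N}\sum_{n=1}^{N} \int_X
   \Bigl(\int_X f_0 \, d\mu\Bigr) \Bigl(\int_X f_1 \, d\mu\Bigr) \, d\mu \\
 &= \int_X f_0 \, d\mu \cdot \int_X f_1 \, d\mu .
\end{align*}
```

This is the mean ergodic theorem for an ergodic system. For $k = 3$ the factor $\pi_1$ is generated by rotations on compact abelian Lie groups, which matches the classical role of the Kronecker factor in the study of three-term recurrence.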

With the Host-Kra Theorem in mind, the second result that we use is simply the fact that multiple recurrence does hold for nilrotations.

Theorem 2.5 (Multiple recurrence for nilrotations). If $R_g \curvearrowright G/\Gamma$ is an ergodic nilrotation, $A \subset G/\Gamma$ has positive measure and $K \ge 1$ then there is some $r \ge 1$ such that
\[
m\bigl(g^{-Kr}A \cap g^{-(K-1)r}A \cap \cdots \cap g^{Kr}A\bigr) > 0. \qquad \square
\]

In fact, this result is considerably simpler than Theorem 2.4, which really does the heavy lifting in what follows. The point is that the orbit of the diagonal $\Delta := \{(x, x, \ldots, x) : x \in G/\Gamma\} \subset (G/\Gamma)^{2K+1}$ (or rather, its normalized surface measure $m_\Delta$) under the off-diagonal transformation $R_{(g^{-K}, g^{-K+1}, \ldots, g^{K})}$ (which is clearly still a nilrotation acting on $(G/\Gamma)^{2K+1}$) can be shown to equidistribute in some finite union of closed connected nilsubmanifolds of $(G/\Gamma)^{2K+1}$ that contains the whole of this diagonal set. This follows from strong results classifying all ergodic invariant measures for nilrotations. From this point a fairly elementary argument gives the positivity of

\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} m\bigl(g^{-Kn}A \cap g^{-(K-1)n}A \cap \cdots \cap g^{Kn}A\bigr)
= \liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \int 1_{A \times A \times \cdots \times A} \circ R^{n}_{(g^{-K}, g^{-K+1}, \ldots, g^{K})} \, dm_\Delta,
\]
and also the fact that these averages actually converge, so this limit infimum is really a limit. A related instance of this argument can be found presented in detail in Section 2 of the work [BLL08] by Bergelson, Leibman and Lesigne, who use it for the related end of proving multiple recurrence along certain families of polynomials. Equidistribution results for nilrotations on which this reasoning can be founded are available in either Ziegler [Zie05] or Bergelson, Host and Kra [BHK05], which in turn build on older works of Parry [Par69, Par70, Par73], Lesigne [Les91] and Leibman [Lei98, Lei05].

With Theorems 2.4 and 2.5 at our disposal, it is relatively easy to lay out a density-increment proof of the full Multiple Recurrence Theorem. However, it is important to observe right away that this is a rather perverse thing to do, because the above two ingredients also imply that theorem through the following even quicker argument:

• given our process $(X \supseteq A, \mu, T)$, we wish to prove that
\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(A \cap T^{-n}A \cap \cdots \cap T^{-(k-1)n}A\bigr) > 0,
\]
so by Theorem 2.4 it suffices to prove instead that
\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \int_X \prod_{i=0}^{k-1} \bigl(E(A \,|\, \pi_{k-2}) \circ T^{in}\bigr) \, d\mu > 0
\]
with $\pi_{k-2} : \mathbf{X} \to \mathbf{Z}_{k-2}$ the inverse limit of nilrotation factors from that theorem (and where we write $E(A \,|\, \pi_{k-2})$ as short for $E(1_A \,|\, \pi_{k-2})$);

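• this, in turn, will follow if we prove that
\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(B \cap T^{-n}B \cap \cdots \cap T^{-(k-1)n}B\bigr) > 0
\]
where $B := \{E(A \,|\, \pi_{k-2}) > \varepsilon\}$ for any positive $\varepsilon$ chosen so small that $\mu(B) > 0$ (for example, $\varepsilon \le \mu(A)/2$ will do);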

• finally, importing a simple trick from [FK78], this follows by choosing a further factor $\alpha : \mathbf{Z}_{k-2} \to G/\Gamma$ onto a finite-dimensional nilrotation such that
\[
\bigl\| E(A \,|\, \pi_{k-2}) - E(A \,|\, \alpha \circ \pi_{k-2}) \bigr\|_1 < \frac{\varepsilon}{100(k+1)}
\]
(which is possible because $\pi_{k-2}$ is generated by an inverse sequence of such further factors $\alpha$) and letting
\[
C := \Bigl\{ E(B \,|\, \alpha \circ \pi_{k-2}) > 1 - \frac{1}{k+1} \Bigr\},
\]
for which we now easily deduce that
\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(B \cap T^{-n}B \cap \cdots \cap T^{-(k-1)n}B\bigr)
\ge \frac{1}{2} \liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(C \cap T^{-n}C \cap \cdots \cap T^{-(k-1)n}C\bigr),
\]
which Theorem 2.5 shows is strictly positive.

(This proof is also essentially that used in [BLL08] for their instance of polynomial recurrence.)
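The second step of the quick argument above rests on a pointwise inequality that is left implicit. A sketch, writing $h := E(A \,|\, \pi_{k-2})$ (a shorthand introduced here) and using only that $0 \le h \le 1$ and $B = \{h > \varepsilon\}$:

```latex
% requires amsmath
\begin{align*}
h &\ge \varepsilon \, 1_B
  && \text{(as $h > \varepsilon$ on $B$ and $h \ge 0$ elsewhere)} \\
\int_X \prod_{i=0}^{k-1} (h \circ T^{in}) \, d\mu
  &\ge \varepsilon^{k} \int_X \prod_{i=0}^{k-1} (1_B \circ T^{in}) \, d\mu \\
  &= \varepsilon^{k} \, \mu\bigl(B \cap T^{-n}B \cap \cdots \cap T^{-(k-1)n}B\bigr).
\end{align*}
```

Averaging over $n$ and taking the limit infimum shows that positivity for $B$ implies positivity for the averages of conditional expectations. The suggested choice $\varepsilon \le \mu(A)/2$ works because $\mu(A) = \int_X h \, d\mu \le \varepsilon + \mu(B)$, which forces $\mu(B) \ge \mu(A) - \varepsilon > 0$.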

Therefore the point of this section is not to provide a serious new approach to the Multiple Recurrence Theorem, but rather to exhibit the density-increment strategy in a setting familiar to ergodic theorists.

The reason why the approach to multiple recurrence just sketched does not have a clear analog among quantitative proofs of Szemerédi’s Theorem is hidden in our appeal to Theorem 2.4. In fact, the technical result that drives Gowers’ work is really more analogous to the following easy corollary of Theorem 2.4 than to Theorem 2.4 itself:


Corollary 2.6. If an ergodic system $(X, \mu, T)$ and measurable functions $f_0, f_1, \ldots, f_{k-1} : X \to [-1, 1]$ are such that
\[
\limsup_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \int_X f_0 \cdot (f_1 \circ T^{n}) \cdots (f_{k-1} \circ T^{(k-1)n}) \, d\mu \ge \delta > 0,
\]
then there is a factor map $\pi : (X, \mu, T) \to (G/\Gamma, m, R_g)$ onto an ergodic $(k-2)$-step nilrotation such that
\[
\| E(f_i \,|\, \pi) \|_2 \ge \frac{1}{2}\delta \quad \text{for each } i = 0, 1, \ldots, k-1.
\]
In particular, if $(X \supseteq A, \mu, T)$ is a process with $\mu(A) \ge \delta$ but no $k$-APs in its return times then there is such a factor map $\pi$ for which
\[
\| E(A \,|\, \pi) - \mu(A) \|_2 \ge \frac{1}{2k}\delta^{k}.
\]

Proof. By Theorem 2.4, the averages in question have the same asymptotic behaviour as the averages
\[
\frac{1}{N} \sum_{n=1}^{N} \int_X f'_0 \cdot (f'_1 \circ T^{n}) \cdots (f'_{k-1} \circ T^{(k-1)n}) \, d\mu
\]
with $f'_i := E(f_i \,|\, \pi_{k-2})$, and now all these functions still lie in the unit ball of $L^\infty(\mu)$, and so for any $i$ we can apply the Cauchy-Schwarz inequality to $f'_i \circ T^{in}$ and the product of the remaining factors to deduce that the above average is bounded in absolute value by $\|f'_i\|_2$. Since these averages must be greater than $\delta/2$ for infinitely many $N$, this requires that $\|f'_i\|_2 > \delta/2$ for each $i$; finally, since $\pi_{k-2}$ is generated by further factor maps onto finite-dimensional $(k-2)$-step nilrotations, letting $\pi$ be a large enough one of these gives the first conclusion.
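The Cauchy-Schwarz step in this proof can be written out as follows. This is a sketch in which $f'_j := E(f_j \,|\, \pi_{k-2})$ as in the proof; it uses only that $T$ preserves the probability measure $\mu$ and that $|f'_j| \le 1$ for every $j$.

```latex
% requires amsmath
\begin{align*}
\Bigl| \int_X \prod_{j=0}^{k-1} (f'_j \circ T^{jn}) \, d\mu \Bigr|
 &\le \bigl\| f'_i \circ T^{in} \bigr\|_2 \,
      \Bigl\| \prod_{j \ne i} (f'_j \circ T^{jn}) \Bigr\|_2
   && \text{(Cauchy-Schwarz)} \\
 &\le \bigl\| f'_i \circ T^{in} \bigr\|_2
   && \text{(each $|f'_j| \le 1$)} \\
 &= \| f'_i \|_2
   && \text{($T$ preserves $\mu$)}
\end{align*}
```

The bound holds for every $n$, so it survives averaging over $1 \le n \le N$.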

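To derive the second conclusion, first use Theorem 2.4 to obtain
\[
\frac{1}{N} \sum_{n=1}^{N} \mu\bigl(A \cap T^{-n}A \cap \cdots \cap T^{-(k-1)n}A\bigr)
\sim \frac{1}{N} \sum_{n=1}^{N} \int_X \prod_{i=0}^{k-1} E(A \,|\, \pi_{k-2}) \circ T^{in} \, d\mu
\]
as $N \to \infty$, so that if $A$ contains no $k$-APs in its return times then both of these expressions must vanish as $N \to \infty$. Now use the decomposition
\[
E(A \,|\, \pi_{k-2}) = \bigl(E(A \,|\, \pi_{k-2}) - \mu(A)\bigr) + \mu(A)
\]
to form the telescoping sum
\[
\frac{1}{N} \sum_{n=1}^{N} \int_X \prod_{i=0}^{k-1} E(A \,|\, \pi_{k-2}) \circ T^{in} \, d\mu
= \sum_{i=0}^{k-1} \frac{1}{N} \sum_{n=1}^{N} \int_X \mu(A)^{i} \cdot \bigl(E(A \,|\, \pi_{k-2}) \circ T^{in} - \mu(A)\bigr) \cdot \prod_{i < j \le k-1} \bigl(E(A \,|\, \pi_{k-2}) \circ T^{jn}\bigr) \, d\mu
+ \mu(A)^{k}.
\]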
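The telescoping sum above is an instance of a purely algebraic identity. As a sketch, write $g_j := E(A \,|\, \pi_{k-2}) \circ T^{jn}$ and $c := \mu(A)$ (both shorthands introduced here); empty products are read as 1.

```latex
% requires amsmath
\begin{align*}
\sum_{i=0}^{k-1} c^{i} (g_i - c) \prod_{i < j \le k-1} g_j
 &= \sum_{i=0}^{k-1} \Bigl( c^{i} \prod_{i \le j \le k-1} g_j
      - c^{i+1} \prod_{i+1 \le j \le k-1} g_j \Bigr) \\
 &= \prod_{j=0}^{k-1} g_j - c^{k}.
\end{align*}
```

Consecutive terms cancel, leaving only the first part of the $i = 0$ term and the second part of the $i = k-1$ term. For $k = 2$ this reads $g_0 g_1 - c^2 = (g_0 - c) g_1 + c (g_1 - c)$. Integrating over $X$ and averaging over $n$ gives the displayed decomposition.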

Since this tends to 0, the first $k$ terms of the sum must asymptotically cancel the term $\mu(A)^{k} \ge \delta^{k}$, and hence at least one of these first terms must have magnitude at least $\frac{1}{k}\delta^{k}$, up to an arbitrarily small error, for infinitely many $N$. The first part of the corollary therefore gives some $(k-2)$-step nilrotation factor $\pi$ for which
\[
\| E(A \,|\, \pi) - \mu(A) \|_2 \ge \frac{1}{2k}\delta^{k},
\]
as required. $\square$

Within Roth’s and Gowers’ works lie quantitative analogs of the above result: this is what drives Gowers’ proof that a failure of uniformity of degree $k - 2$ for a density-$\delta$ set $E \subset \{1, 2, \ldots, N\}$ gives a partition of $\{1, 2, \ldots, N\}$ into fairly long subprogressions on which $E$ enjoys an enlarged relative density (Theorem 18.1 in [Gow01]).

Heuristically, Gowers shows first that a failure of uniformity of degree $k - 2$ implies a nontrivial correlation between $1_E - \delta$ and a function on $\{1, 2, \ldots, N\}$ which behaves like the exponential of $i$ times a real polynomial of degree $(k - 2)$ on many large subprogressions of $\{1, 2, \ldots, N\}$. He then converts this correlation into the desired partition of $\{1, 2, \ldots, N\}$ into long subprogressions. This correlation with a function that behaves ‘locally’ like a degree-$(k - 2)$ polynomial is the analog of having a nontrivial conditional expectation onto a $(k - 2)$-step nilsystem. The exact formulation of the finitary ‘inverse theorem’ for the failure of higher-degree uniformity is rather complicated, and we omit it here, but again a gentle introduction with many further references can be found in the book [TV06] of Tao and Vu.

In the infinitary ergodic-theoretic setting the implication of Corollary 2.6 by Theorem 2.4 can easily be reversed: given any indicator function $1_A$ we can decompose it as $1_A = (1_A - E(A \,|\, \pi_{k-2})) + E(A \,|\, \pi_{k-2})$, and now if we form a telescoping sum for the expression $\frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^{-n}A \cap \cdots \cap T^{-(k-1)n}A)$ similar to the above then (the contrapositive of) Corollary 2.6 implies that all the terms involving $1_A - E(A \,|\, \pi_{k-2})$ must vanish as $N \to \infty$, leaving us with Theorem 2.4:
\[
\frac{1}{N} \sum_{n=1}^{N} \mu\bigl(A \cap T^{-n}A \cap \cdots \cap T^{-(k-1)n}A\bigr)
\sim \frac{1}{N} \sum_{n=1}^{N} \int_X \prod_{i=0}^{k-1} E(A \,|\, \pi_{k-2}) \circ T^{in} \, d\mu.
\]

However, difficulties emerge when one tries to develop a quantitative analog of this reverse implication, and so provide a truer analog of Theorem 2.4 in the finitary setting. In order to make sense of either conditional expectations such as $E(A \,|\, \pi_{k-2})$, or of the structure of $\pi_{k-2}$ as an inverse limit of a possibly-infinite collection of nilrotation factors, one needs a quantitative analog of taking a limit in $L^2(\mu)$. In practice this leads to an explosion in the bounds obtained. Although something in this vein is possible (see Tao [Tao06a]), in general it is much less efficient than the density-increment strategy, for which (the finitary analog of) Corollary 2.6 suffices. On the other hand, our quick presentation above of the deduction of multiple recurrence from Theorems 2.4 and 2.5 clearly uses the full strength of Theorem 2.4, and so if instead we started from Corollary 2.6 it would require us to prove Theorem 2.4 first before proceeding as above. In the next subsection we will see that the density-increment strategy, by contrast, uses only the conjunction of Corollary 2.6 and Theorem 2.5, and it is this feature that accounts for its greater efficiency (leading to better bounds) in the finitary world.

2.2. The density-increment proof. The strategy here is to prove Theorem 1.2 or 1.3 by ‘induction on $\mu(A)$’. The technical result underlying this is an ergodic-theoretic analog of Proposition 1.5.

Proposition 2.7 (Ergodic-theoretic density-increment). For each $k \ge 1$ there is a function $c_k : (0, 1] \to (0, 1]$ that is bounded away from 0 on compact subsets such that the following holds: if $(X \supseteq A, \mu, T)$ is a process with $\mu(A) = \delta > 0$ but no $k$-APs in its return times, then for every $\varepsilon > 0$ and $N \ge 1$ there are some non-negligible $B \subset X$ and integer $r \ge 1$ such that
\[
\mu(B \,\triangle\, T^{r}B) < \varepsilon \mu(B)
\]
and
\[
\mu(A \,|\, T^{-rn}B) \ge \delta + c_k(\delta) \quad \text{for all } -N \le n \le N.
\]

Before proving this result let us see why it implies Theorem 1.2.

Corollary 2.8. With $c_k$ as in Proposition 2.7 the following holds: if there exists a process $(X \supseteq A, \mu, T)$ having $\mu(A) = \delta > 0$ but no $k$-APs in its return times, then there is another process $(Y \supseteq B, \nu, S)$ having $\nu(B) \ge \delta + c_k(\delta)$ but no $k$-APs in its return times.

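Proof of Corollary from Proposition 2.7. By Proposition 2.7, for any $N$ we can find a non-negligible $B_N \subset X$ and $r_N \ge 1$ such that
\[
\mu(B_N \,\triangle\, T^{r_N}B_N) < \frac{\mu(B_N)}{N}
\]
and
\[
\mu\bigl(A \,|\, T^{-r_N n}(B_N)\bigr) \ge \delta + c_k(\delta) \quad \text{for all } -N \le n \le N.
\]
Let $\nu_N$ be the probability measure on $Y = \{0, 1\}^{\mathbb{Z}}$ that is the law of the random sequence
\[
\varphi_N : x \mapsto \bigl(1_A(T^{r_N n}(x))\bigr)_{n \in \mathbb{Z}}
\]
for $x$ drawn at random from $\mu(\,\cdot\,|\,B_N)$ (that is, $x$ is chosen ‘uniformly from $B_N$’). Let $S : Y \to Y$ be the coordinate left-shift and
\[
A_a := \{(\omega_i)_{i \in \mathbb{Z}} : \omega_a = 1\} \subset Y \quad \text{for } a \in \mathbb{Z},
\]
so $A_a = S^{-a}(A_0)$. The lower bound on the measures $\mu(A \,|\, T^{-r_N n}(B_N))$ for $-N \le n \le N$ implies that any vague accumulation point $\nu$ of the sequence $\nu_N$, say $\nu = \lim_{i\to\infty} \nu_{N_i}$, must satisfy
\[
\nu(A_a) = \lim_{i\to\infty} \mu\bigl(A \,|\, T^{r_{N_i} a}(B_{N_i})\bigr) \ge \delta + c_k(\delta) \quad \forall a.
\]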
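The identity for $\nu(A_a)$ comes from unwinding the definition of $\nu_N$. A sketch, using only that $T$ is invertible and preserves $\mu$:

```latex
% requires amsmath
\begin{align*}
\nu_N(A_a)
 &= \mu\bigl( \{ x : 1_A(T^{r_N a} x) = 1 \} \,\big|\, B_N \bigr)
  = \frac{\mu(T^{-r_N a} A \cap B_N)}{\mu(B_N)} \\
 &= \frac{\mu(A \cap T^{r_N a} B_N)}{\mu(T^{r_N a} B_N)}
  = \mu\bigl( A \,\big|\, T^{r_N a} B_N \bigr).
\end{align*}
```

For fixed $a$ and every $N \ge |a|$ the index $n = -a$ lies in $[-N, N]$, so the lower bound from Proposition 2.7 applies to the right-hand side. Each $A_a$ is a clopen cylinder set in $Y$, so $\nu_{N_i}(A_a) \to \nu(A_a)$ along the convergent subsequence.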


On the other hand, the assumption that there are no $k$-APs in the return times of $A$ implies that
\[
\nu_N\bigl(A_a \cap A_{a+r} \cap \cdots \cap A_{a+(k-1)r}\bigr)
= \nu_N\{(\omega_i)_{i \in \mathbb{Z}} : \omega_a = \omega_{a+r} = \cdots = \omega_{a+(k-1)r} = 1\} = 0
\]
for all $a \in \mathbb{Z}$, $r \ge 1$ and all $N$, and so the same is true for $\nu$.

Finally, the inequality $\mu(B_N \,\triangle\, T^{r_N}B_N) < \mu(B_N)/N$ implies for any Borel $C \subset Y$ that
\begin{align*}
|\nu_N(C) - \nu_N(S^{-1}C)|
 &= \bigl|\mu(\varphi_N^{-1}C \,|\, B_N) - \mu(T^{-r_N}\varphi_N^{-1}C \,|\, B_N)\bigr| \\
 &= \bigl|\mu(\varphi_N^{-1}C \,|\, B_N) - \mu(\varphi_N^{-1}C \,|\, T^{r_N}B_N)\bigr| \\
 &\le \frac{\mu\bigl(\varphi_N^{-1}C \cap (B_N \,\triangle\, T^{r_N}B_N)\bigr)}{\mu(B_N)} < 1/N,
\end{align*}
so the vague limit $\nu$ is also $S$-invariant. Letting $B := A_0$, this gives a process $(Y \supseteq B, \nu, S)$ with no $k$-APs in its return times and the desired improved bounds. $\square$

Proof of Theorem 1.2 from Corollary 2.8. Step 1. If $(X \supseteq A, \mu, T)$ is any counterexample to Theorem 1.2 with $\delta_0 := \mu(A) > 0$, then a simple vague limit argument can enhance it to an example with the same density value $\delta_0$ but no $k$-APs in its return times. This construction forms the bulk of this proof.

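We first transfer our initially-given example onto the space $Y := \{0, 1\}^{\mathbb{Z}}$ with the left-shift $S$. Let
\[
B := \{(\omega_i)_{i \in \mathbb{Z}} \in Y : \omega_0 = 1\},
\]
and now consider the map
\[
f_A : X \to Y : x \mapsto \bigl(1_A(T^{i}(x))\bigr)_{i \in \mathbb{Z}}.
\]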

This intertwines $T$ with $S$, so the pushforward $\nu_1 := (f_A)_{\#}\mu$ is an $S$-invariant Borel measure on $Y$ for which $\nu_1(B) = \mu(A)$ and
\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \nu_1\bigl(B \cap S^{-n}B \cap \cdots \cap S^{-(k-1)n}B\bigr)
= \liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(A \cap T^{-n}A \cap \cdots \cap T^{-(k-1)n}A\bigr) = 0.
\]
This implies that the corresponding averages along any subset $a \cdot \mathbb{N} \subset \mathbb{N}$, $a \ne 1$, also tend subsequentially to zero:
\[
\liminf_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} \nu_1\bigl(B \cap S^{-an}B \cap \cdots \cap S^{-a(k-1)n}B\bigr) = 0,
\]
because for large $N$ the terms corresponding to $n \in a \cdot \mathbb{N}$ account for about $1/a$ of the full average.
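The passage to multiples of $a$ can be made quantitative. In this sketch $F(m) \ge 0$ denotes the $m$-th summand $\nu_1(B \cap S^{-m}B \cap \cdots \cap S^{-(k-1)m}B)$, $M$ is a length at which the full average is small, and $N := \lfloor M/a \rfloor$. The names $F$ and $M$ are introduced here for illustration, and $a \ge 1$ and $M \ge 2a$ are assumed.

```latex
% requires amsmath
\begin{align*}
\frac{1}{N}\sum_{n=1}^{N} F(an)
 &\le \frac{1}{N}\sum_{m=1}^{M} F(m)
   && \text{(as $aN \le M$ and $F \ge 0$)} \\
 &= \frac{M}{N} \cdot \frac{1}{M}\sum_{m=1}^{M} F(m) \\
 &\le 2a \cdot \frac{1}{M}\sum_{m=1}^{M} F(m)
   && \text{(as $N \ge M/a - 1 \ge M/(2a)$)}
\end{align*}
```

Letting $M$ run through a sequence along which the full averages tend to 0 shows that the averages along multiples of $a$ also have limit infimum 0.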

Now let $\nu_k$ be the image measure of $\nu_1$ under the coordinate-dilation
\[
\mathrm{dil}_k : Y \to Y : (\omega_i)_{i \in \mathbb{Z}} \mapsto (\omega_{ki})_{i \in \mathbb{Z}},
\]
so that each $\nu_k$ is still shift-invariant and satisfies $\nu_k(B) = \nu_1(B)$, since $B = \mathrm{dil}_k^{-1}(B)$ because it depends only on the zeroth coordinate. Letting $\nu$ be any limit point of the averages $\frac{1}{K} \sum_{k=1}^{K} \nu_k$ in the vague topology, the above convergence tells us that
\[
\nu\bigl(B \cap S^{-a}B \cap \cdots \cap S^{-(k-1)a}B\bigr) = 0
\]
whenever $a \ne 0$.

Step 2. Having made this simplification, Corollary 2.8 gives a new counterexample with density $\delta_1 \ge \delta_0 + c_k(\delta_0)$. Since $c_k$ is bounded away from 0 on the subinterval $[\delta_0, 1] \subset (0, 1]$, after finitely many iterations this procedure gives a counterexample with density greater than 1, and hence a contradiction. $\square$

Before presenting the proof of Proposition 2.7 we need one further enabling result, for which we will make our appeal to Theorem 2.5. From that theorem we need the consequence that one can approximately decompose an arbitrary positive-measure $U \subset G/\Gamma$ into a collection of almost-invariant sets for different powers of $R_g$.

Proposition 2.9. If $R_g \curvearrowright (G/\Gamma, m)$ is an ergodic nilrotation, $U \subset G/\Gamma$ is measurable and of positive measure and $K \ge 1$, then there is a countable set of pairs $\{(V_1, r_1), (V_2, r_2), \ldots\}$ (which could be finite or infinite) such that

(i) each $V_i$ has positive measure;

(ii) if $i \ne i'$ then the unions $\bigcup_{k=-K}^{K} g^{r_i k}V_i$ and $\bigcup_{k=-K}^{K} g^{r_{i'} k}V_{i'}$ are disjoint;

(iii) $U \supseteq \bigcup_{i \ge 1} \bigcup_{k=-K}^{K} g^{r_i k}V_i$;

(iv) and $m\bigl(U \setminus \bigcup_{i \ge 1} \bigcup_{k=-K}^{K} g^{r_i k}V_i\bigr) = 0$.

Proof. The proof given here invokes Zorn’s Lemma, although a more careful argument shows that it only really needs the ability to induct transfinitely below $\omega_1$. Frustratingly, I have not been able to find a proof that avoids this kind of induction entirely, although in the finitary analog of this step all sets are finite and so the issue does not arise.

Let $\mathcal{A}$ be the set of all countable families of pairs $\{(V_1, r_1), (V_2, r_2), \ldots\}$ that have properties (i–iii) above (but possibly not (iv)), and order $\mathcal{A}$ by inclusion of families. If $\mathcal{F} = \{(V_1, r_1), (V_2, r_2), \ldots\} \in \mathcal{A}$ then set
\[
m(\mathcal{F}) := m\Bigl( \bigcup_{i \ge 1} \bigcup_{k=-K}^{K} g^{r_i k}V_i \Bigr).
\]
Since $m(U) > 0$, Theorem 2.5 promises some $r$ such that
\[
m\bigl(g^{-Kr}U \cap g^{-(K-1)r}U \cap \cdots \cap g^{Kr}U\bigr) > 0.
\]
Therefore the set $V := g^{-Kr}U \cap g^{-(K-1)r}U \cap \cdots \cap g^{Kr}U$ has positive measure and satisfies $g^{rk}V \subset U$ for every $-K \le k \le K$, so $\{(V, r)\} \in \mathcal{A}$ and hence $\mathcal{A}$ is nonempty.
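The containment $g^{rk}V \subset U$ is immediate from the definition of $V$. Spelled out as a one-line check, with no assumptions beyond those in the text:

```latex
% requires amsmath
\[
V = \bigcap_{j=-K}^{K} g^{jr} U \subset g^{-kr} U
\quad \Longrightarrow \quad
g^{kr} V \subset U
\qquad (-K \le k \le K).
\]
```

The single pair $(V, r)$ therefore satisfies (i) and (iii), while (ii) holds vacuously for a one-element family.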

Now suppose that $(\mathcal{F}_\alpha)_\alpha$ is a totally ordered family in $\mathcal{A}$. Since $m(V) > 0$ for any $(V, r) \in \mathcal{F}_\alpha$, the values of the measures $m(\mathcal{F}_\alpha)$ are totally ordered, are all distinct and are bounded by 1. We may therefore extract a non-decreasing sequence $\mathcal{F}_{\alpha_1} \subset \mathcal{F}_{\alpha_2} \subset \cdots$ such that $m(\mathcal{F}_{\alpha_i}) \to \sup_\alpha m(\mathcal{F}_\alpha)$ as $i \to \infty$.

Since each $\mathcal{F}_\alpha$ is countable and they are totally ordered, it follows that $\mathcal{G} := \bigcup_{i \ge 1} \mathcal{F}_{\alpha_i}$ is still countable, and in fact is still a member of $\mathcal{A}$. Moreover, if $(V, r) \in \mathcal{F}_\alpha$ for some $\alpha$, then this pair must actually appear in some $\mathcal{F}_{\alpha_i}$, for otherwise we would have $m(\mathcal{F}_{\alpha_i}) \le m(\mathcal{F}_\alpha) - m\bigl(\bigcup_{k=-K}^{K} g^{rk}V\bigr)$ for every $i$, contradicting our construction. Hence $\mathcal{G} \supseteq \mathcal{F}_\alpha$ for all $\alpha$, and so $\mathcal{G}$ is an upper bound for the chain $(\mathcal{F}_\alpha)_\alpha$.

Therefore by Zorn’s Lemma the whole family $\mathcal{A}$ has a maximal element, say $\mathcal{F} = \{(V_1, r_1), (V_2, r_2), \ldots\}$. Now we need simply observe that this must have $m(\mathcal{F}) = m(U)$ (which implies property (iv)), since otherwise another appeal to Theorem 2.5 would give $V' \subset U \setminus \bigcup \mathcal{F}$ and $r' \ge 1$ such that $\mathcal{F} \cup \{(V', r')\} \in \mathcal{A}$, contradicting the maximality of $\mathcal{F}$. Therefore $\mathcal{F}$ has all the desired properties, and the proof is complete. $\square$

Remark. The finitary analog of this result in [Gow01] (see his Corollary 5.6) is very elementary and quantitative. I suspect that a version of Gowers’ proof could be adapted to the present setting (perhaps with some additional assumptions on $U$, such as that it be open with piecewise-smooth boundary), but that this would require the use of a Mal’cev basis for $G$ and the ability to study orbits of $R_g$ in terms of ‘explicit’ generalized polynomials using the resulting coordinate system. Such a more quantitative argument would probably be considerably longer than the proof given above. $\lhd$

We can now complete the density-increment proof of multiple recurrence usingthe above proposition and Corollary 2.6.

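Proof of Proposition 2.7. Suppose that $(X \supset A, \mu, T)$ is a process having $\mu(A) =: \delta > 0$ but no $k$-APs in its return times. Then Corollary 2.6 gives

\[ \|\mathsf{E}(A \,|\, \pi) - \mu(A)\|_2 \ge \frac{1}{2k}\delta^k \]

for some factor map $\pi : (X, \mu, T) \to (G/\Gamma, m, R_g)$ onto a $(k-2)$-step nilrotation.

Since $G/\Gamma$ is compact and its Borel $\sigma$-algebra is generated by its open sets, we can find a finite Borel partition $\mathcal{U}$ of $G/\Gamma$ into small-diameter positive-measure pieces such that

\[ \frac{1}{m(U)} \int_U \bigl|\mathsf{E}(A \,|\, \pi) - \mu(A \,|\, \pi^{-1}U)\bigr| \, dm < \frac{1}{20k}\delta^k \]

for all $U \in \mathcal{U} \setminus \mathcal{U}_{\mathrm{bad}}$, where $\mathcal{U}_{\mathrm{bad}}$ is a subcollection such that $m\bigl(\bigcup \mathcal{U}_{\mathrm{bad}}\bigr) < \delta^k/20k$.

Combined with the preceding inequality, this implies that there is some $U \in \mathcal{U}$ for which

\[ \mu(A \,|\, \pi^{-1}U) > \delta + \frac{1}{10k}\delta^k. \]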

Now given $N \ge 1$ choose $K := LN - 1$ with $L \ge 1$ so large that $1/L \le \delta^k/20k$ and $L > 2/\varepsilon + 1$. Apply Proposition 2.9 to the set $U$ to obtain pairs $(V_1, r_1), (V_2, r_2), \ldots$ with each $V_i$ having positive measure and such that the unions $\bigcup_{k=-K}^{K} g^{r_i k}V_i$ for $i \ge 1$ are pairwise disjoint, all contained in $U$ and together fill up $m$-almost all of $U$. In view of the convex combination

\[ \mu(A \,|\, \pi^{-1}U) = \sum_{i \ge 1} \frac{m\bigl(\bigcup_{k=-K}^{K} g^{r_i k}V_i\bigr)}{m(U)} \, \mu\Bigl(A \,\Big|\, \pi^{-1}\Bigl(\bigcup_{k=-K}^{K} g^{r_i k}V_i\Bigr)\Bigr), \]

there is some $i$ for which

\[ \mu\Bigl(A \,\Big|\, \pi^{-1}\Bigl(\bigcup_{k=-K}^{K} g^{r_i k}V_i\Bigr)\Bigr) \ge \delta + \frac{1}{10k}\delta^k. \]

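Letting $B' = \bigcup_{k=-K}^{K-2N-1} g^{r_i k}V_i$ and

\[ C := \bigcup_{k=-K}^{K} g^{r_i k}V_i \,\Big\backslash\, B', \]

the shifts $C, g^{-r_i(2N+1)}C, g^{-2r_i(2N+1)}C, \ldots, g^{-(L-1)r_i(2N+1)}C$ are pairwise disjoint and contained in $\bigcup_{k=-K}^{K} g^{r_i k}V_i$, so each has measure at most $\frac{1}{L}\, m\bigl(\bigcup_{k=-K}^{K} g^{r_i k}V_i\bigr)$ and therefore

\[ m(B') \ge \frac{L-1}{L}\, m\Bigl(\bigcup_{k=-K}^{K} g^{r_i k}V_i\Bigr). \]

In addition, the set difference $g^{r_i}B' \setminus B'$ is contained in $C$ and so has measure at most

\[ \frac{1}{L}\, m\Bigl(\bigcup_{k=-K}^{K} g^{r_i k}V_i\Bigr) \le \frac{1}{L-1}\, m(B'), \]

and a symmetrical argument controls the measure of $B' \setminus g^{r_i}B'$, so together we obtain

\[ m(B' \triangle g^{r_i}B') \le \frac{2}{L-1}\, m(B') < \varepsilon\, m(B'). \]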

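Finally, letting $B := g^{-r_i N}B'$, it follows that $g^{r_i n}B \subset \bigcup_{k=-K}^{K} g^{r_i k}V_i$ for all $-N \le n \le N$, that

\[
\begin{aligned}
\mu(A \,|\, \pi^{-1}g^{r_i n}B)
&\ge \frac{\mu(A \cap \pi^{-1}g^{r_i n}B)}{m\bigl(\bigcup_{k=-K}^{K} g^{r_i k}V_i\bigr)} \\
&\ge \mu\Bigl(A \,\Big|\, \pi^{-1}\Bigl(\bigcup_{k=-K}^{K} g^{r_i k}V_i\Bigr)\Bigr) - \frac{m(C)}{m\bigl(\bigcup_{k=-K}^{K} g^{r_i k}V_i\bigr)} \\
&\ge \delta + \frac{1}{10k}\delta^k - \frac{1}{L} \ge \delta + \frac{1}{20k}\delta^k,
\end{aligned}
\]

and that $B$ enjoys the same approximate $g^{r_i}$-invariance as $B'$, completing the proof of Proposition 2.7 with $c_k(\delta) := \frac{1}{20k}\delta^k$. □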
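As a quick check of the arithmetic behind the last two displays, the two conditions imposed on $L$ can be unpacked as follows. This is only a sketch using the quantities $\delta$, $k$, $L$ and $\varepsilon$ already fixed above; no new hypotheses are introduced.

```latex
\begin{align*}
\frac{1}{L} \le \frac{\delta^k}{20k}
  &\;\Longrightarrow\;
  \delta + \frac{\delta^k}{10k} - \frac{1}{L}
  \;\ge\; \delta + \frac{\delta^k}{10k} - \frac{\delta^k}{20k}
  \;=\; \delta + \frac{\delta^k}{20k}, \\[4pt]
L > \frac{2}{\varepsilon} + 1
  &\;\Longleftrightarrow\;
  L - 1 > \frac{2}{\varepsilon}
  \;\Longleftrightarrow\;
  \frac{2}{L-1} < \varepsilon .
\end{align*}
```

The first line accounts for the final density bound, and the second for the approximate invariance $m(B' \triangle g^{r_i}B') < \varepsilon\, m(B')$.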


3. TWO COMMUTING TRANSFORMATIONS

3.1. The density increment in higher dimensions. With the appearance of Gowers' density-increment proof of Szemerédi's Theorem, it became natural to ask whether a similar approach can yield improved upper bounds for any cases of the multidimensional Szemerédi Theorem. Gowers discusses this question explicitly in [Gow00]. It poses significant new challenges, and remains mostly open. For the analogous ergodic-theoretic study of multiple recurrence we will see that the difficulty arises from the nature of the characteristic factors in multiple dimensions, which are rather more complicated than the pro-nilsystems that give the complete picture for powers of a single ergodic transformation.

In the context of finitary proofs, it is still possible to set up a 'directional' variant of the norms (actually now just seminorms) that Gowers introduced to define uniformity, and to show that the resulting new notion of uniformity does control the count of the desired patterns in a subset $E \subset \{1, 2, \ldots, N\}^d$. The difficulty is in handling those sets, or more generally functions $f : \{1, 2, \ldots, N\}^d \to [-1, 1]$, which are not uniform in the sense of this seminorm. Extending the approach of Roth and Gowers requires one to find the appropriate class of functions against which an arbitrary function must see a large correlation if it is not uniform. For uniformity of degree $k$ in the one-dimensional setting, these were the functions which on many long arithmetic subprogressions of $\{1, 2, \ldots, N\}$ agree with the exponential of $i$ times some degree-$k$ real polynomials (see the discussion following Corollary 2.6), but in the multi-dimensional setting they are much more complicated. Part of the difficulty in extending Gowers' approach lies in the problem of identifying the most appropriate class of functions to use here, and part of it lies in establishing some necessary properties of those functions once they have been found (properties which are fairly classical in the case of the one-dimensional 'local' polynomial functions).

However, in spite of these difficulties, Gowers-like bounds have now been obtained in the following special case of Theorem 1.4 by Shkredov:

Theorem 3.1. There is some absolute constant $C > 0$ such that if $\delta > 0$, $N \ge 2^{2^{2^{1/\delta^{C}}}}$ and $A \subset \{1, 2, \ldots, N\}^2$ has $|A| \ge \delta N^2$, then $A$ contains a corner:

\[ A \supseteq \{\mathbf{a},\ \mathbf{a} + r\mathbf{e}_1,\ \mathbf{a} + r\mathbf{e}_2\} \]

for some $\mathbf{a} \in \{1, 2, \ldots, N\}^2$ and $r \ge 1$, where $\mathbf{e}_1, \mathbf{e}_2$ are the standard basis vectors in $\mathbb{Z}^2$.
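It can help to read the hypothesis on $N$ in Theorem 3.1 as a lower bound on the density instead. The following is a routine rearrangement, with $C$ the absolute constant of the theorem and base-2 logarithms chosen to match the tower of 2s; it applies for $N$ large enough that the triple logarithm is defined and positive.

```latex
\[
N \;\ge\; 2^{2^{2^{1/\delta^{C}}}}
\;\iff\;
\log_2\log_2\log_2 N \;\ge\; \delta^{-C}
\;\iff\;
\delta \;\ge\; \bigl(\log_2\log_2\log_2 N\bigr)^{-1/C}.
\]
```

In this form the theorem says that every subset of $\{1, 2, \ldots, N\}^2$ of density at least $(\log_2\log_2\log_2 N)^{-1/C}$ contains a corner.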

In fact, since the appearance of his original article [Shk06b], in [Shk06a] Shkredov has improved the above bound further to the form $2^{2^{1/\delta^{C}}}$, effectively by replacing a repeated descent to arithmetic subprogressions with a descent through a nested sequence of Bohr sets, following Bourgain's use of these for his improved bounds in Roth's Theorem [Bou99, Bou08]. In addition, Shkredov has shown in [Shk09] how this latter argument can also be implemented in the setting of arbitrary finite Abelian groups (see also Section 5 of Green's survey [Gre05] for a treatment of the case of high-dimensional vector spaces over a finite field). However, for the sake of simplicity this note will focus on analogs of the original paper [Shk06b], and where appropriate make comparisons to the steps taken there.

Thus, we here present a new proof of the following special case of Theorem 1.3:

Theorem 3.2. If $T_1, T_2 : \mathbb{Z} \curvearrowright (X, \mu)$ commute and $A \subset X$ has $\mu(A) > 0$ then

\[ \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T_1^{-n}A \cap T_2^{-n}A) > 0. \]

Henceforth we will generally refer to the quadruple $(X, \mu, T_1, T_2)$ as a $\mathbb{Z}^2$-system, in reference to the action of the whole group generated by $T_1$ and $T_2$.

In contrast with our work in the previous section, the analog of Theorem 2.4 that will appear in this setting does not reduce our study to a class of systems for which multiple recurrence can simply be proved directly, as was the case using Theorem 2.5. For this reason, although Theorem 3.2 has of course been known since Furstenberg and Katznelson's work, the proof presented here is not quite so redundant as is the density-increment proof in one dimension (recall the discussion following the statement of Theorem 2.5).

An important aspect of Shkredov's proof is the introduction, in addition to $E \subset \{1, 2, \ldots, N\}^2$, of a superset of it which is a product set $F_1 \times F_2$ which must also be manipulated as the proof proceeds. We will employ a similar idea in the following, where for a system $(X, \mu, T_1, T_2)$ the structure of a 'product set' is replaced by that of an intersection of sets which are invariant under either $T_1$ or $T_2$. The importance of these special sets corresponds to the emergence of the factor generated by the $T_1$- or $T_2$-invariant sets within the structure of the characteristic factors. With this in mind, we make the following analog of Definition 2.1.

Definition 3.3 (Augmented process). An augmented process is a $\mathbb{Z}^2$-system $(X, \mu, T_1, T_2)$ together with distinguished measurable subsets $A$, $E_1$ and $E_2$ satisfying $A \subset E_1 \cap E_2$ and such that $E_i$ is $T_i$-invariant. We shall sometimes denote these data by $(X \supset E_1 \cap E_2 \supset A, \mu, T_1, T_2)$.

Definition 3.4. An augmented process has no corners in its return set if

\[ \mu(A \cap T_1^{-n}A \cap T_2^{-n}A) = 0 \quad \forall n \ne 0. \]

In addition, the following notation will be used throughout the sequel.

Definition 3.5 (Partially invariant sets). If $(X, \mu, T_1, T_2)$ is a $\mathbb{Z}^2$-system, then a subset $A \subset X$ is partially invariant if it is invariant under $T_1^{n_1}T_2^{n_2}$ for some $(n_1, n_2)$. The $\sigma$-algebra of $(T_1^{n_1}T_2^{n_2})$-invariant measurable sets is denoted by $\Sigma^{(n_1,n_2)}$, and in addition we let $\zeta_0^{(n_1,n_2)}$ be some factor map $X \to Z_0^{(n_1,n_2)}$ onto an auxiliary system where the transformation in direction $(n_1, n_2)$ is trivial and which generates $\Sigma^{(n_1,n_2)}$.

(This correspondence between globally invariant $\sigma$-subalgebras of $\Sigma$ and factor maps onto other systems is standard in ergodic theory; see, for instance, Chapter 2 of [Aus10b] and the references given there.)


Definition 3.6 (Kronecker factors). If $(X, \mu, T_1, T_2)$ is an ergodic $\mathbb{Z}^2$-system then $\zeta_1^T$ will denote some choice of a factor map from $X$ onto an action by rotations on a compact Abelian group which generates the Kronecker factor of $(X, \mu, T_1, T_2)$, and similarly for $\mathbb{Z}$-systems.

Definition 3.7 (Arithmetic of factors). Given two factor maps $\pi_i : (X, \mu, T_1, T_2) \to (Y_i, \nu_i, S_{1,i}, S_{2,i})$ of a $\mathbb{Z}^2$-system, we let $\pi_1 \vee \pi_2$ denote a factor map which generates the same $\sigma$-algebra as $\pi_1$ and $\pi_2$ together (for example, the Cartesian product map $(\pi_1, \pi_2) : X \to Y_1 \times Y_2$ will do), and $\pi_1 \wedge \pi_2$ denote a factor map which generates the $\sigma$-algebra of all sets that are both $\pi_1$- and $\pi_2$-measurable.

In his setting, Shkredov considered nested inclusions

\[ E \subset F_1 \times F_2 \subset \{1, 2, \ldots, N\}^2. \]

His main innovation is the result that in order to count approximately the number of corners in $E$ it suffices to control the non-uniformity of $E$ relative to its superset $F_1 \times F_2$, and crucially to an extent which depends only on the relative density $\frac{|E|}{|F_1||F_2|}$, provided the sets $F_1$ and $F_2$ have some uniformity properties of their own. He effectively formulated this latter uniformity condition in terms of a uniform bound on the one-dimensional Fourier coefficients of the $F_i$, but for our sets $E_i \in \Sigma^{T_i}$ it turns out that a stronger condition is more convenient, formulated in terms of the independence of their shifts under $T_j$ for $j \ne i$; this condition will appear shortly.

The need for the $E_i$ below becomes natural upon understanding the analog of Theorem 2.4 for the averages of Theorem 3.2. However, in the ergodic theoretic world this involves another new twist, which has no real analog in the finitary setting. It turns out that simply-described characteristic factors for the averages of Theorem 3.2 may be obtained only after ascending to some extension of the initially-given system. (The original system will certainly have characteristic factors, but they may be much more complicated to describe.) Of course, it suffices to prove multiple recurrence for such an extension, and so this is quite adequate for our proof strategy. The following result is specialized from the construction of so-called 'pleasant and isotropized extensions' in [Aus09, Aus10a].

Theorem 3.8. Any $\mathbb{Z}^2$-system $(X^\circ, \mu^\circ, T_1^\circ, T_2^\circ)$ has an extension

\[ \pi : (X, \mu, T_1, T_2) \to (X^\circ, \mu^\circ, T_1^\circ, T_2^\circ) \]

with the property that

\[
\frac{1}{N} \sum_{n=1}^{N} \int_X f_0 \cdot (f_1 \circ T_1^n) \cdot (f_2 \circ T_2^n) \, d\mu
\;\sim\;
\frac{1}{N} \sum_{n=1}^{N} \int_X \mathsf{E}_\mu(f_0 \,|\, \pi_0) \cdot (\mathsf{E}_\mu(f_1 \,|\, \pi_1) \circ T_1^n) \cdot (\mathsf{E}_\mu(f_2 \,|\, \pi_2) \circ T_2^n) \, d\mu
\]

as $N \to \infty$ for any $f_0, f_1, f_2 \in L^\infty(\mu)$, where

\[
\begin{aligned}
\pi_0 &:= \zeta_0^{(1,0)} \vee \zeta_0^{(0,1)}, \\
\pi_1 &:= \zeta_0^{(1,0)} \vee \zeta_0^{(1,-1)}, \\
\pi_2 &:= \zeta_0^{(1,-1)} \vee \zeta_0^{(0,1)}.
\end{aligned}
\]

□

Definition 3.9 (Pleasant system). Essentially following the nomenclature of [Aus09], we will refer to a system having the property of the extension constructed above as pleasant.

Replacing an initially-given $\mathbb{Z}^2$-system with an extension if necessary, we may henceforth concentrate on pleasant systems.

With this description of the characteristic factors in hand, we can now offer our ergodic theoretic translation of Shkredov's main estimate (Theorem 7 in [Shk06b]).

Proposition 3.10. Suppose that $(X \supset E_1 \cap E_2 \supset A, \mu, T_1, T_2)$ is a pleasant augmented process with $\mu(A) > 0$, that

• the return-set of $A$ contains no corners, and
• $E_1 \perp T_2^n(E_1)$ and $E_2 \perp T_1^n(E_2)$ for all $n \ne 0$, where $\perp$ denotes independence,

and let $\pi_0 := \zeta_0^{(1,0)} \vee \zeta_0^{(0,1)}$. Then

\[ \|\mathsf{E}_\mu(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2)\|_{L^2(\mu(\cdot \,|\, E_1 \cap E_2))} \ge \mu(A \,|\, E_1 \cap E_2)^3. \]

The benefit of working with the conditions $E_1 \perp T_2^n(E_1)$ is that they will be relatively easy to recover for the new process that we construct during the coming density increment. We will see shortly (Corollary 3.13) that this condition implies that $E_1$ is orthogonal to the Kronecker factor $\zeta_1^T$, and this orthogonality is a truer ergodic-theoretic analog of Shkredov's condition of degree-1 uniformity on his sets $F_i$.

Proposition 3.10 will be proved in Subsection 3.3.

3.2. A closer look at the characteristic factors and the main estimate. Before proving Proposition 3.10 we need some simple auxiliary results about the factors appearing in Theorem 3.8.

Lemma 3.11. If $(X, \mu, T_1, T_2)$ is ergodic as a $\mathbb{Z}^2$-system, then any two of the factors $\zeta_0^{(1,0)}$, $\zeta_0^{(0,1)}$, $\zeta_0^{(1,-1)}$ are independent, and the three together are relatively independent over their intersections with the Kronecker factor:

\[ \zeta_1^T \wedge \zeta_0^{(1,0)}, \qquad \zeta_1^T \wedge \zeta_0^{(0,1)}, \qquad \zeta_1^T \wedge \zeta_0^{(1,-1)}. \]

Proof. The first assertion is an immediate consequence of the commutativity of $T_1$ and $T_2$. We prove it for $\zeta_0^{(1,0)}$ and $\zeta_0^{(0,1)}$, the other pairs being similar: since $T_1$ and $T_2$ commute, if $A_1$ is $T_1$-invariant then the conditional expectation $\mathsf{E}(A_1 \,|\, \zeta_0^{(0,1)})$ is invariant under both $T_1$ and $T_2$ and hence constant, by ergodicity, and must therefore simply equal $\mu(A_1)$.


Handling the three factors together is only a little trickier. If $A_1 \in \zeta_0^{(1,0)}$, $A_2 \in \zeta_0^{(0,1)}$ and $A_{12} \in \zeta_0^{(1,-1)}$, then by the first assertion the target of the factor map $\zeta_0^{(1,0)} \vee \zeta_0^{(0,1)}$ can simply be identified with a Cartesian product system

\[ (Y_1 \times Y_2,\ \nu_1 \otimes \nu_2,\ S_2 \times \mathrm{id},\ \mathrm{id} \times S_1) \]

where $S_2$ is an ergodic transformation of the first coordinate alone and $S_1$ an ergodic transformation of the second. The fact that the invariant measure of this target system is a product $\nu_1 \otimes \nu_2$ corresponds to the independence of $\zeta_0^{(1,0)}$ and $\zeta_0^{(0,1)}$. In this picture the set $A_i$ is lifted from some subset $A_i' \subset Y_i$ under the further coordinate projection $Y_1 \times Y_2 \to Y_i$. Since $A_{12}$ is $\zeta_0^{(1,-1)}$-measurable one has

\[ \mu(A_1 \cap A_2 \cap A_{12}) = \int_X \mathsf{E}(A_1 \cap A_2 \,|\, \zeta_0^{(1,-1)}) \cdot 1_{A_{12}} \, d\mu, \]

and on $Y_1 \times Y_2$ the conditional expectation $\mathsf{E}(A_1 \cap A_2 \,|\, \zeta_0^{(1,-1)})$ is identified with the conditional expectation of $A_1' \times A_2'$ onto the sets invariant under $S_2^{-1} \times S_1$.

It is standard that the invariant sets of a product of ergodic systems depend only on the product of their Kronecker factors (see, for instance, the more general Theorem 7.1 in Furstenberg's original paper [Fur77]), and so our conditional expectation of $A_1' \times A_2'$ is actually onto the invariant sets of $\zeta_1^{S_2} \times \zeta_1^{S_1}$, whose lifts back up to $X$ must all be measurable with respect to $\zeta_1^T$. Therefore $\mathsf{E}(A_1 \cap A_2 \,|\, \zeta_0^{(1,-1)})$ is actually $\zeta_1^T$-measurable, and so the above integral is equal to

\[ \int_X \mathsf{E}(A_1 \cap A_2 \,|\, \zeta_0^{(1,-1)}) \cdot \mathsf{E}(A_{12} \,|\, \zeta_1^T \wedge \zeta_0^{(1,-1)}) \, d\mu = \int_X 1_{A_1 \cap A_2} \cdot \mathsf{E}(A_{12} \,|\, \zeta_1^T \wedge \zeta_0^{(1,-1)}) \, d\mu. \]

Applying a symmetric argument to the other sets $A_i$ now shows that this equals

\[ \int_X \mathsf{E}(A_1 \,|\, \zeta_1^T \wedge \zeta_0^{(1,0)}) \cdot \mathsf{E}(A_2 \,|\, \zeta_1^T \wedge \zeta_0^{(0,1)}) \cdot \mathsf{E}(A_{12} \,|\, \zeta_1^T \wedge \zeta_0^{(1,-1)}) \, d\mu, \]

which is the desired assertion of relative independence. □

Remark. The second part of the above lemma, although a very simple consequence of classical results in ergodic theory, has an important counterpart in Lemma 1 (4) of [Shk06b]. It corresponds to the assertion that if sets $F_1, F_2, F_{12} \subseteq \mathbb{Z}/N\mathbb{Z}$ are lifted through the coordinate projections

\[ (n_1, n_2) \mapsto n_1,\ n_2,\ n_1 + n_2 \quad \text{respectively} \]

and if in addition they are all linearly uniform (meaning that their Fourier coefficients are all small), then their lifts are approximately independent. In his paper Shkredov phrases this in terms of the approximate constancy of a certain convolution of two functions that are lifted from $\mathbb{Z}/N\mathbb{Z}$ in this way. ◁

Lemma 3.12. Suppose that $(Y, \nu, S)$ is an ergodic $\mathbb{Z}$-system and let $\zeta_1^S : (Y, \nu, S) \to (Z, m_Z, R)$ be its Kronecker factor. Then for any $f, g \in L^\infty(\nu)$, any $B \subset Y$ with $\nu(B) > 0$ that is $\zeta_1^S$-measurable, and any $\varepsilon > 0$, the set

\[ \Bigl\{ n \in \mathbb{Z} : \Bigl| \int_B f \cdot (g \circ S^n) \, d\nu - \int_B \mathsf{E}_\nu(f \,|\, \zeta_1^S) \cdot \mathsf{E}_\nu(g \circ S^n \,|\, \zeta_1^S) \, d\nu \Bigr| \le \varepsilon \Bigr\} \]

has density 1 in $\mathbb{Z}$.

Remark. The conclusion of this lemma may be re-phrased as asserting that

\[ \mathsf{E}_\nu(f \cdot (g \circ S^n) \,|\, \zeta_1^S) \sim \mathsf{E}_\nu(f \,|\, \zeta_1^S) \cdot \mathsf{E}_\nu(g \circ S^n \,|\, \zeta_1^S) \]

weakly in $L^2(m_Z) \circ \zeta_1^S \subset L^2(\nu)$ as $n \to \infty$ along some full-density subset of $\mathbb{Z}$. Strong convergence here for all $f$ and $g$, rather than weak convergence, would be equivalent to $(Y, \nu, S)$ being relatively weakly mixing over its Kronecker factor, which is not always the case. ◁

Proof. On the one hand

\[ \int_B f \cdot (g \circ S^n) \, d\nu = \int_Y (f 1_B) \cdot (g \circ S^n) \, d\nu \]

and on the other $\mathsf{E}_\nu(f 1_B \,|\, \zeta_1^S) = \mathsf{E}_\nu(f \,|\, \zeta_1^S) 1_B$, because $B$ is already $\zeta_1^S$-measurable, so after replacing $f$ with $f 1_B$ if necessary it suffices to treat the case $B = Y$. The desired assertion is now simply that

\[ \langle f, g \circ S^n \rangle \sim \langle \mathsf{E}(f \,|\, \zeta_1^S), \mathsf{E}(g \,|\, \zeta_1^S) \circ S^n \rangle \]

as $n \to \infty$ outside some zero-density set of 'exceptional times' in $\mathbb{Z}$, and this is a well-known property of the Kronecker factor (see, for instance, Furstenberg [Fur81]).

□

Corollary 3.13. If $(Y, \nu, S)$ is an ergodic $\mathbb{Z}$-system and $E \subset Y$ is such that $E \perp S^n(E)$ for all $n \ne 0$ then $E$ is independent from the $\sigma$-algebra generated by $\zeta_1^S$ under $\nu$.

Proof. The degenerate case $B = Y$ of the preceding lemma shows that asymptotically for most $n$ we have

\[ \nu(E \cap S^{-n}E) \approx \int_Y \mathsf{E}_\nu(E \,|\, \zeta_1^S) \cdot (\mathsf{E}_\nu(E \,|\, \zeta_1^S) \circ S^n) \, d\nu. \]

Since the Kronecker factor $(Z, m_Z, R)$ is a compact system, for any $\varepsilon > 0$ there is some nonempty Bohr set in $\mathbb{Z}$ along which the right-hand values above return within $\varepsilon$ of

\[ \int_Y \mathsf{E}_\nu(E \,|\, \zeta_1^S)^2 \, d\nu = \|\mathsf{E}_\nu(E \,|\, \zeta_1^S)\|_2^2. \]

This Bohr set must have positive density and therefore contain a further subset of values of $n$ where our first approximation above is also good. This implies that for any $\varepsilon > 0$ there are infinitely many $n$ for which

\[ \bigl| \nu(E \cap S^n(E)) - \|\mathsf{E}_\nu(E \,|\, \zeta_1^S)\|_2^2 \bigr| < \varepsilon, \]

but on the other hand our assumption on $E$ implies that

\[ \nu(E \cap S^n(E)) = \nu(E)^2 = \|\mathsf{E}_\nu(E \,|\, \zeta_1^S)\|_1^2 \quad \forall n \ne 0. \]

This is possible only if $\|\mathsf{E}_\nu(E \,|\, \zeta_1^S)\|_1 = \|\mathsf{E}_\nu(E \,|\, \zeta_1^S)\|_2$, which in turn requires that $\mathsf{E}_\nu(E \,|\, \zeta_1^S)$ be constant, as required. □
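The last step, that equality of the $L^1$ and $L^2$ norms forces the conditional expectation to be constant, can be seen from the usual variance identity. In the sketch below, $h$ is shorthand introduced here (not notation from the text) for the conditional expectation of $1_E$ onto $\zeta_1^S$, which is non-negative and integrates to $\nu(E)$ against the probability measure $\nu$.

```latex
\[
\|h\|_2^2 - \|h\|_1^2
\;=\; \int_Y h^2 \, d\nu - \Bigl(\int_Y h \, d\nu\Bigr)^2
\;=\; \int_Y \Bigl(h - \int_Y h \, d\nu\Bigr)^2 d\nu .
\]
```

The right-hand side vanishes only if $h$ equals its mean $\nu(E)$ almost everywhere, which is exactly the asserted independence of $E$ from the Kronecker factor.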


Lemma 3.14. If $(X, \mu, T_1, T_2)$ is an ergodic $\mathbb{Z}^2$-system and $E_1 \in \Sigma^{(1,0)}$, $E_2 \in \Sigma^{(0,1)}$ are such that $E_i \perp T_j^n(E_i)$ for all $n \ne 0$ whenever $\{i, j\} = \{1, 2\}$, then also $E_1$ (resp. $E_2$) is independent from $\zeta_0^{(0,1)} \vee \zeta_0^{(1,-1)}$ (resp. $\zeta_0^{(1,0)} \vee \zeta_0^{(1,-1)}$).

Remark. For us this is analogous to the way Shkredov uses his Lemma 1 to estimate the second term in equation (21) in his Theorem 7. ◁

Proof. The second part of Lemma 3.11 implies

\[ \mathsf{E}_\mu(E_1 \,|\, \zeta_0^{(0,1)} \vee \zeta_0^{(1,-1)}) = \mathsf{E}_\mu\bigl(\mathsf{E}_\mu(E_1 \,|\, \zeta_1^T \wedge \zeta_0^{(1,0)}) \,\big|\, \zeta_0^{(0,1)} \vee \zeta_0^{(1,-1)}\bigr). \]

Corollary 3.13 now gives that $\mathsf{E}_\mu(E_1 \,|\, \zeta_1^T \wedge \zeta_0^{(1,0)})$ is constant, and hence so is the conditional expectation of interest. The proof for $E_2$ is exactly similar. □

3.3. The main estimate.

Proof of Proposition 3.10. Define the trilinear form $\Lambda$ on $L^\infty(\mu)^3$ by

\[ \Lambda(f_0, f_1, f_2) := \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int_X f_0 \cdot (f_1 \circ T_1^n) \cdot (f_2 \circ T_2^n) \, d\mu. \]

(In fact this is the integral of the function $f_0 \otimes f_1 \otimes f_2$ against a certain three-fold self-joining of the system $(X, \mu, T_1, T_2)$ called the 'Furstenberg self-joining'. We will not use that more elaborate formalism here, but refer the reader to [Aus10b] and the references given there for a detailed explanation, as well as a proof that the limit exists.)

Our assumptions include that $\Lambda(A, A, A) = 0$ (where we have simply written $A$ in place of $1_A$), but on the other hand by Theorem 3.8 we have

\[
\begin{aligned}
\Lambda(A, A, A) &= \Lambda(\mathsf{E}(A \,|\, \pi_0), A, A) \\
&= \Lambda\bigl(\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2},\ A,\ A\bigr) + \mu(A \,|\, E_1 \cap E_2) \cdot \Lambda(E_1 \cap E_2, A, A).
\end{aligned}
\]

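We now estimate these two terms separately.

First term. Directly from the definition of $\Lambda$ we deduce that

\[
\begin{aligned}
\bigl|\Lambda\bigl(\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2},\ A,\ A\bigr)\bigr|
&\le \Lambda\bigl(|\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}|,\ A,\ A\bigr) \\
&\le \Lambda\bigl(|\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}|,\ E_1 \cap E_2,\ E_1 \cap E_2\bigr),
\end{aligned}
\]

where the second inequality uses that these three functions are non-negative and that $1_A \le 1_{E_1 \cap E_2}$. Now another appeal to Theorem 3.8 shows that this last upper bound is equal to

\[ \Lambda\bigl(|\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}|,\ \mathsf{E}(E_1 \cap E_2 \,|\, \pi_1),\ \mathsf{E}(E_1 \cap E_2 \,|\, \pi_2)\bigr). \]

From our hypothesis that $E_1 \perp T_2^n(E_1)$ for all nonzero $n$ and Lemma 3.14 it follows that $E_2$ is $\pi_2$-measurable whereas $E_1$ is independent from $\pi_2$, and hence that

\[ \mathsf{E}(E_1 \cap E_2 \,|\, \pi_2) = \mu(E_1) 1_{E_2}, \]

and similarly with the two indices reversed. Given this we can re-write the above term as

\[
\begin{aligned}
&\mu(E_1)\mu(E_2)\, \Lambda\bigl(|\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}|,\ E_1,\ E_2\bigr) \\
&\quad = \mu(E_1)\mu(E_2) \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \int_X |\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}| \cdot 1_{T_1^{-n}(E_1)} \cdot 1_{T_2^{-n}(E_2)} \, d\mu \\
&\quad = \mu(E_1)\mu(E_2) \int_X |\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}| \, d\mu,
\end{aligned}
\]

where for the second equality we have now used that $E_i$ is $T_i$-invariant and that

\[ |\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}| \cdot 1_{E_1} \cdot 1_{E_2} = |\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}|, \]

which in turn holds because $E_1 \cap E_2$ is $\pi_0$-measurable while $A \subset E_1 \cap E_2$, so that both $\mathsf{E}(A \,|\, \pi_0)$ and $1_{E_1 \cap E_2}$ are still supported on $E_1 \cap E_2$.

This integral (which no longer involves the trilinear form $\Lambda$) may now be identified as

\[
\begin{aligned}
&\mu(E_1 \cap E_2)^2\, \|\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}\|_{L^1(\mu(\cdot \,|\, E_1 \cap E_2))} \\
&\quad \le \mu(E_1 \cap E_2)^2\, \|\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2) 1_{E_1 \cap E_2}\|_{L^2(\mu(\cdot \,|\, E_1 \cap E_2))},
\end{aligned}
\]

using the fact that $\Sigma^{(1,0)}$ and $\Sigma^{(0,1)}$ are independent to write $\mu(E_1)\mu(E_2) = \mu(E_1 \cap E_2)$ and using Hölder's inequality for the final upper bound.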

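Second term. This is much simpler: since $A \subset E_1 \cap E_2$ and $E_i$ is $T_i$-invariant we have

\[
\begin{aligned}
\Lambda(E_1 \cap E_2, A, A)
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl((E_1 \cap E_2) \cap T_1^{-n}A \cap T_2^{-n}A\bigr) \\
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(T_1^{-n}E_1 \cap T_1^{-n}A \cap T_2^{-n}E_2 \cap T_2^{-n}A\bigr) \\
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(T_1^{-n}A \cap T_2^{-n}A\bigr) \\
&= \lim_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu\bigl(A \cap (T_2 T_1^{-1})^{-n}A\bigr) \\
&= \|\mathsf{E}_\mu(A \,|\, \zeta_0^{(1,-1)})\|_2^2
\end{aligned}
\]

(using the Mean Ergodic Theorem for the last equality), and by another appeal to Hölder's inequality this is bounded below by

\[ \|\mathsf{E}_\mu(A \,|\, \zeta_0^{(1,-1)})\|_1^2 = \mu(A)^2 = \mu(A \,|\, E_1 \cap E_2)^2 \mu(E_1 \cap E_2)^2. \]

Combining the estimates. Using the inequalities just obtained in our original decomposition of $\Lambda(A, A, A)$ we find that

\[
\begin{aligned}
0 = \Lambda(A, A, A) \ge{}& \mu(A \,|\, E_1 \cap E_2)^3 \mu(E_1 \cap E_2)^2 \\
&- \|\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2)\|_{L^2(\mu(\cdot \,|\, E_1 \cap E_2))} \cdot \mu(E_1 \cap E_2)^2,
\end{aligned}
\]

so re-arranging gives the desired result. □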
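For readers who want the final rearrangement spelled out, here is the same computation in compressed form. The abbreviations $\alpha$, $\beta$ and $D$ are introduced only for this sketch and do not appear in the original argument.

```latex
% Shorthand (assumed for this sketch only):
%   \alpha := \mu(A | E_1 \cap E_2),   \beta := \mu(E_1 \cap E_2) > 0,
%   D := \| E(A | \pi_0) - \alpha \|_{L^2(\mu(\cdot | E_1 \cap E_2))}.
\begin{align*}
0 = \Lambda(A,A,A)
  &= \underbrace{\Lambda\bigl(\mathsf{E}(A\,|\,\pi_0) - \alpha 1_{E_1\cap E_2},\,A,\,A\bigr)}_{\ge\, -\beta^2 D}
   \;+\; \alpha\,\underbrace{\Lambda(E_1\cap E_2,\,A,\,A)}_{\ge\, \alpha^2\beta^2} \\
  &\ge \beta^2\bigl(\alpha^3 - D\bigr)
  \quad\Longrightarrow\quad D \ge \alpha^3 .
\end{align*}
```

Here $\beta > 0$ because $A \subset E_1 \cap E_2$ and $\mu(A) > 0$, so dividing by $\beta^2$ is legitimate.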

3.4. Shkredov's version of the density increment. We can now present Shkredov's main increment result (which corresponds roughly to the conjunction of Proposition 2.7 and Corollary 2.8 in the one-dimensional setting):

Proposition 3.15. There is a nondecreasing function $c : (0, 1] \to (0, 1]$ which is bounded away from 0 on compact subsets of $(0, 1]$ and has the following property. If $(X \supset E_1 \cap E_2 \supset A, \mu, T_1, T_2)$ is such that

(i) $\mu(A) > 0$,
(ii) the return-set of $A$ contains no nontrivial corners, and
(iii) $E_1 \perp T_2^{-n}(E_1)$ for all $n \ne 0$ and similarly for $E_2$,

and if we set $\delta := \mu(A \,|\, E_1 \cap E_2)$, then there exists another augmented process

\[ (X' \supset E_1' \cap E_2' \supset A', \mu', T_1', T_2') \]

having the analogous properties (i)-(iii) and such that

\[ \mu'(A' \,|\, E_1' \cap E_2') \ge \delta + c(\delta). \]

Remark. Shkredov's argument does not give any effective control over the size of the sets $E_i'$ in terms of the $E_i$ (in particular, it could happen that they are very much smaller), but the point is that this is not needed. ◁

Proof. This breaks naturally into two steps.

Step 1. Extending $(X, \mu, T_1, T_2)$ and lifting $A$ and the $E_i$ if necessary, we may assume the system is pleasant. Now by Proposition 3.10 conditions (i) and (ii) imply that

\[ \|\mathsf{E}(A \,|\, \pi_0) - \mu(A \,|\, E_1 \cap E_2)\|_{L^2(\mu(\cdot \,|\, E_1 \cap E_2))} \ge \delta^3, \]

and hence there is some non-negligible $\pi_0$-measurable set $F$ such that

\[ \mu(A \,|\, F) > \delta + \delta^3/2. \]

Moreover, since $\pi_0$ is generated by $\zeta_0^{(1,0)}$ and $\zeta_0^{(0,1)}$, after approximating this $F$ by a disjoint union of intersections of $T_1$- or $T_2$-invariant sets we may assume that it is itself of the form $F_1 \cap F_2$ for some $F_1 \in \zeta_0^{(1,0)}$, $F_2 \in \zeta_0^{(0,1)}$.

Naively we should like to replace $A \subset E_1 \cap E_2$ with $A \cap F_1 \cap F_2 \subset F_1 \cap F_2$, but these sets $F_i$ may not satisfy $F_i \perp T^n F_i$ for $n \ne 0$. We resolve this by another conditioning and a vague limit construction. Note at this point that this selection of the sets $F_i$ will be responsible for our lack of control over $\mu'(E_i')$ in terms of $\mu(E_i)$.

Step 2. Let

\[ X' := (\{0, 1\} \times \{0, 1\} \times \{0, 1\})^{\mathbb{Z}^2} \]

with its product Borel space structure, let $T_1', T_2'$ be the two coordinate-shifts on this space, and let $E_1'$, $E_2'$ and $A'$ be the three obvious time-zero cylinder sets of $X'$:

\[ E_1' := \{(\omega^1_{\mathbf{n}}, \omega^2_{\mathbf{n}}, \omega^3_{\mathbf{n}})_{\mathbf{n}} \in X' : \omega^1_{\mathbf{0}} = 1\} \quad \text{and similarly.} \]

We will show that for any $\varepsilon > 0$ and $K \ge 1$ there is a probability measure $\nu$ on $X'$ such that

• $\nu$ is approximately invariant: $|\nu(C) - \nu((T_i')^{-1}C)| < \varepsilon$ for any $C \subset X'$ and $i = 1, 2$,
• $\nu(E_1' \cap E_2') \ge (\delta^3/20)\mu(F_1 \cap F_2)$,
• $\nu(E_i' \triangle T_i'E_i') = 0$ for $i = 1, 2$,
• $|\nu(E_i' \cap (T_j')^{-k}E_i') - \nu(E_i')^2| < \varepsilon$ for all nonzero $-K \le k \le K$ for $\{i, j\} = \{1, 2\}$, and
• $\nu(A' \,|\, E_1' \cap E_2') \ge \delta + \delta^3/4$.

Given this, we may take a sequence of such measures as $\varepsilon \downarrow 0$ and $K \to \infty$ and let $\mu'$ be a vague limit of some subsequence to obtain an augmented process

\[ (X' \supset E_1' \cap E_2' \supset A', \mu', T_1', T_2') \]

having all the desired properties. The $T_i'$-invariance of $\mu'$ follows from the approximate invariance of the measures $\nu$, and the $T_i'$-invariance of $E_i'$ is only up to a $\mu'$-negligible set, but this may then be repaired by replacing $E_i'$ with $\bigcup_n (T_i')^n E_i'$, which differs from $E_i'$ only by a $\mu'$-negligible set. The second of the above points ensures that the limit $\mu'$ is non-trivial insofar as $\mu'(A'), \mu'(E_1' \cap E_2') > 0$.

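Now fix $\varepsilon$ and $K$. To obtain such a $\nu$, let $(Z, m, R_1, R_2)$ be a compact group rotation isomorphic to the Kronecker factor of $(X, \mu, T_1, T_2)$ with factor map $\zeta_1^T =: \zeta : X \to Z$, and let $\mathcal{U}$ be a Borel partition of $Z$ into sufficiently small pieces that

\[ \bigl\| \mathsf{E}_\mu(F_i \,|\, \zeta)|_U - \mu(F_i \,|\, U) \bigr\|_{L^2(\mu(\cdot \,|\, U))} < \varepsilon/4 \]

for all $U \in \mathcal{U} \setminus \mathcal{U}_{\mathrm{bad}}$ where $m\bigl(\bigcup \mathcal{U}_{\mathrm{bad}}\bigr) < (\delta^3/20)\mu(F_1 \cap F_2)$.

Considering the convex combination

\[ \mu(A \,|\, F_1 \cap F_2) = \sum_{U \in \mathcal{U}} \frac{\mu(F_1 \cap F_2 \cap \zeta^{-1}U)}{\mu(F_1 \cap F_2)} \, \mu(A \cap \zeta^{-1}U \,|\, F_1 \cap F_2 \cap \zeta^{-1}U), \]

the terms indexed by $\mathcal{U}_{\mathrm{bad}}$ must contribute very little (because their sum cannot be more than $\delta^3/20$ if we estimate by simply ignoring the factors of $\mu(A \cap \zeta^{-1}U \,|\, F_1 \cap F_2 \cap \zeta^{-1}U) \le 1$). Similarly, the terms for which

\[ \mu(F_1 \cap F_2 \cap \zeta^{-1}U \,|\, F_1 \cap F_2) < (\delta^3/20)\, m(U) \]

must also contribute very little (their sum is also less than $\delta^3/20$). Therefore there must be some $U \in \mathcal{U} \setminus \mathcal{U}_{\mathrm{bad}}$ for which

\[ \mu(F_1 \cap F_2 \cap \zeta^{-1}U \,|\, F_1 \cap F_2) \ge (\delta^3/20)\, m(U) \]

and

\[ \mu(A \cap \zeta^{-1}U \,|\, F_1 \cap F_2 \cap \zeta^{-1}U) \ge \delta + \delta^3/4. \]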


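Using Bayes' formula, the first of these inequalities implies that

\[ \mu(F_1 \cap F_2 \,|\, \zeta^{-1}U) = \mu(F_1 \cap F_2 \cap \zeta^{-1}U \,|\, F_1 \cap F_2) \cdot \frac{\mu(F_1 \cap F_2)}{m(U)} \ge (\delta^3/20)\mu(F_1 \cap F_2). \]

Now let $V \subset \mathbb{Z}$ be the Bohr set

\[ \{n \in \mathbb{Z} : m(U \triangle R_1^n U) < \varepsilon\, m(U)/2 \text{ and } m(U \triangle R_2^n U) < \varepsilon\, m(U)/2\}. \]

This is nontrivial because the rotation orbit $z \mapsto 1_{z+U}$ is continuous from $Z$ to $L^2(m)$, and so $V$ has some (perhaps very small) positive density in $\mathbb{Z}$. In view of this positive density, Lemma 3.12 implies that each of the sets

\[ V_{j,k} := \Bigl\{ n \in V : \Bigl| \mu(F_i \cap T_j^{-kn}F_i \,|\, \zeta^{-1}U) - \frac{1}{m(U)} \int_U \mathsf{E}_\mu(F_i \,|\, \zeta) \cdot \mathsf{E}_\mu(T_j^{-kn}F_i \,|\, \zeta) \, dm \Bigr| \le \varepsilon/2 \Bigr\} \]

still has relative density 1 inside $V$ for any $k \ne 0$ and $j = 1$ or $2$, because the set of all $n \in \mathbb{Z}$ for which the displayed inequality fails has density zero. Hence we may choose some $r \in V$, $r \ge 1$ that lies in every $V_{j,k}$ for $j = 1, 2$ and $k \in \{-K, -K+1, \ldots, K\} \setminus \{0\}$. On the other hand, the approximation that defines the members of $\mathcal{U} \setminus \mathcal{U}_{\mathrm{bad}}$ and the approximate return of $U$ to itself under $R_j^n$ for $n \in V$ imply that

\[
\begin{aligned}
\frac{1}{m(U)} \int_U \mathsf{E}_\mu(F_i \,|\, \zeta) \cdot \mathsf{E}_\mu(T_j^{-kr}F_i \,|\, \zeta) \, dm
&\approx \frac{1}{m(U)} \int_U \mu(F_i \,|\, \zeta^{-1}U) \cdot \mathsf{E}_\mu(T_j^{-kr}F_i \,|\, \zeta) \, dm \\
&= \mu(F_i \,|\, \zeta^{-1}U) \cdot \mu(F_i \,|\, T_j^{kr}\zeta^{-1}U) \approx \mu(F_i \,|\, \zeta^{-1}U)^2
\end{aligned}
\]

for all nonzero $-K \le k \le K$, where the error incurred is at most $\varepsilon/4 + \varepsilon/4 = \varepsilon/2$.

Now consider the map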

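\[ \varphi : X \to X' : x \mapsto \bigl(1_{F_1}(T_2^{rn_2}x),\ 1_{F_2}(T_1^{rn_1}x),\ 1_{A \cap F_1 \cap F_2}(T_1^{rn_1}T_2^{rn_2}x)\bigr)_{(n_1,n_2) \in \mathbb{Z}^2} \]

and let $\nu$ be the image measure $\varphi_\# \mu(\,\cdot \,|\, \zeta^{-1}U)$ on $X'$. We will show that this has the five desired properties:

• approximate invariance of $\nu$ follows from approximate invariance of $U$ along $V$:

\[
\begin{aligned}
|\nu(C) - \nu((T_i')^{-1}C)| &= |\mu(\varphi^{-1}C \,|\, \zeta^{-1}U) - \mu(T_i^{-r}\varphi^{-1}C \,|\, \zeta^{-1}U)| \\
&\le \frac{\mu\bigl(\varphi^{-1}C \cap (\zeta^{-1}U \triangle T_i^{r}\zeta^{-1}U)\bigr)}{\mu(\zeta^{-1}U)} \le \varepsilon/2 < \varepsilon
\end{aligned}
\]

for any $C \subset X'$;

• a simple calculation gives

\[ \nu(E_1' \cap E_2') = \mu(F_1 \cap F_2 \,|\, \zeta^{-1}U), \]

and this is at least $(\delta^3/20)\mu(F_1 \cap F_2)$ by our choice of $U$;

• similarly,

\[ \nu(E_i' \triangle T_i'E_i') = \mu(F_i \triangle T_iF_i \,|\, \zeta^{-1}U) = 0 \]

for $i = 1, 2$;

• for any nonzero $-K \le k \le K$ we have

\[ \nu(E_i' \cap (T_j')^{-k}E_i') = \mu(F_i \cap T_j^{-kr}F_i \,|\, \zeta^{-1}U), \]

and by our selection of $r$ this is within $\varepsilon/2$ of

\[ \frac{1}{m(U)} \int_U \mathsf{E}_\mu(F_i \,|\, \zeta) \cdot \mathsf{E}_\mu(T_j^{-kr}F_i \,|\, \zeta) \, dm, \]

which in turn is within $\varepsilon/2$ of

\[ \mu(F_i \,|\, \zeta^{-1}U)^2 = \nu(E_i')^2, \]

giving the required estimate;

• lastly, our choice of $U$ also guarantees that

\[ \nu(A' \,|\, E_1' \cap E_2') = \mu(A \cap \zeta^{-1}U \,|\, F_1 \cap F_2 \cap \zeta^{-1}U) \ge \delta + \delta^3/4, \]

as required.

This completes the proof with $c(\delta) := \delta^3/4$. □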

Remark. The two steps above can also be loosely identified with two steps in Shkredov's work. The first is similar to the conjunction of Lemma 11 and Proposition 3 in Section 3 of [Shk06b], whose use appears at the beginning of the proof of Theorem 4. The second, rather more involved, amounts to Corollary 1 and the various auxiliary results needed to reach it in Section 4 of [Shk06b], which then underpin the second step of each increment in the proof of Shkredov's Theorem 4. ◁

Proof of Theorem 3.2. This now proceeds almost exactly as for Theorem 1.2.

Suppose there exists an augmented process $(X \supset E_1 \cap E_2 \supset A, \mu, T_1, T_2)$ such that $\mu(A) > 0$ and hence $\mu(A \,|\, E_1 \cap E_2) =: \delta_0 > 0$, $E_i \perp T_j^n(E_i)$ for all $n \ne 0$, and for which

\[ \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T_1^{-n}A \cap T_2^{-n}A) \to 0. \]

In particular, if $(X \supset A, \mu, T_1, T_2)$ is a process violating Theorem 3.2, then $(X \supset X \cap X \supset A, \mu, T_1, T_2)$ is an augmented process with these properties.

From these data one can construct another augmented process $(Y \supset G_1 \cap G_2 \supset B, \nu, S_1, S_2)$ such that $\nu(B) = \mu(A)$, $\nu(G_i) = \mu(E_i)$ and this new process actually has no corners in its return set. This construction proceeds in exact analogy with Step 1 in the proof of Theorem 1.3 from Corollary 2.8: the initial process is transferred to the symbolic space

\[ Y := (\{0, 1\} \times \{0, 1\} \times \{0, 1\})^{\mathbb{Z}^2}, \]

where now the three copies of $\{0, 1\}$ above the coordinate $(n_1, n_2)$ receive the values of the indicator functions $1_{E_1} \circ T_1^{n_1}T_2^{n_2}$, $1_{E_2} \circ T_1^{n_1}T_2^{n_2}$ and $1_A \circ T_1^{n_1}T_2^{n_2}$ respectively; and then averaging over dilations constructs a new shift-invariant measure on this symbolic space that retains the properties of the original system but actually has no corners in its return set. A quick check shows that if $G_1$, $G_2$ and $B$ denote the one-dimensional cylinder sets defined by the three different $\{0, 1\}$-valued coordinates above $(0, 0)$ in $Y$, then the $G_i$ retain the property of $\nu$-a.s. invariance under $S_i$ and also the property that $G_i \perp S_j^n(G_i)$ for all $n \ne 0$ (because the measure $\nu(G_i \cap S_j^n(G_i))$ is obtained as an average over $m$ of $\mu(E_i \cap T_j^{nm}(E_i))$, and these are all equal to $\mu(E_i)^2 = \nu(G_i)^2$ by assumption).

Now implementing Proposition 3.15, one can construct from $(Y \supset G_1 \cap G_2 \supset B, \nu, S_1, S_2)$ a new augmented process $(X' \supset E_1' \cap E_2' \supset A', \mu', T_1', T_2')$ which still has all the properties (i)-(iii) and for which $\mu'(A' \,|\, E_1' \cap E_2') \ge \delta_0 + c(\delta_0)$. Since $c$ is uniformly positive on $[\delta_0, 1]$, after iterating this construction finitely many times we obtain an example of an augmented process for which this relative density is greater than 1, a contradiction. □

Remark. The above treatment bears comparison with how Shkredov assembles the various components of the proof of his main result, Theorem 4, in [Shk06b]. ◁

4. FURTHER DISCUSSION

Theorem 3.2 remains the most elaborate higher-dimensional case of Theorem 1.4 to be successfully proved using a density-increment argument, or to be given bounds that improve over the hypergraph-regularity proofs of the general theorem obtained in [Gow07] and [NRS06]. Perhaps the most obvious obstruction to further progress is that the various 'inverse theorems' that are known for the relevant notions of uniformity remain incomplete. However, in the ergodic-theoretic world these correspond to 'characteristic factor' theorems such as Theorem 3.8, and recent work has in fact taken these a little further. The following result appears (in a slightly more general form) as Theorem 1.1 in [Ausb], where it is used for a different purpose.

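Theorem 4.1. Any ergodic $\mathbb{Z}^2$-system $(X^\circ, \mu^\circ, T_1^\circ, T_2^\circ)$ admits an ergodic extension

\[ \pi : (X, \mu, T_1, T_2) \to (X^\circ, \mu^\circ, T_1^\circ, T_2^\circ) \]

with the property that

\[
\frac{1}{N}\sum_{n=1}^N \int_X f_0 \cdot (f_1 \circ T_1^n) \cdot (f_2 \circ T_2^n) \cdot (f_3 \circ T_1^n T_2^n)\, d\mu
\sim \frac{1}{N}\sum_{n=1}^N \int_X \mathsf{E}(f_0 \mid \pi_0) \cdot (\mathsf{E}(f_1 \mid \pi_1) \circ T_1^n) \cdot (\mathsf{E}(f_2 \mid \pi_2) \circ T_2^n) \cdot (\mathsf{E}(f_3 \mid \pi_3) \circ T_1^n T_2^n)\, d\mu
\]

in $L^2(\mu)$ as $N \to \infty$ for any $f_0, f_1, f_2, f_3 \in L^\infty(\mu)$, where

\[
\pi_0 = \pi_3 := \zeta_0^{(1,0)} \vee \zeta_0^{(0,1)} \vee \zeta_0^{(1,1)} \vee \zeta^{T}_{2,\mathrm{nil}},
\qquad
\pi_1 = \pi_2 := \zeta_0^{(1,0)} \vee \zeta_0^{(1,-1)} \vee \zeta_0^{(0,1)} \vee \zeta^{T}_{2,\mathrm{nil}},
\]

and where $\zeta^{T}_{2,\mathrm{nil}}$ denotes a factor generated by an inverse limit of a sequence of actions of $\mathbb{Z}^2$ by two-step nilrotations.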

Once again, these $\pi_i$ are referred to as the ‘characteristic’ factors for these multiple averages.

Moreover, a relatively simple extension of Lemma 3.11 shows that the four factors $\zeta_0^{(1,0)}$, $\zeta_0^{(0,1)}$, $\zeta_0^{(1,1)}$ and $\zeta_0^{(1,-1)}$ that appear above are relatively independent over their further intersections with $\zeta^{T}_{2,\mathrm{nil}}$ (see Proposition 5.3 in [Ausb]). Theorem 4.1 and this second result are both known special cases of a general conjecture on the joint distributions of partially invariant factors of $\mathbb{Z}^d$-systems, which may be found formulated carefully in Section 6 of [Aus10b] and which suggests that an inverse theory for all higher-dimensional notions of uniformity generalizing the Gowers norms will ultimately be available.

Theorem 4.1 itself bears on the special case of multiple recurrence asserting that

\[
\mu(A) > 0 \quad \Longrightarrow \quad \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^N \mu(A \cap T_1^{-n}A \cap T_2^{-n}A \cap T_1^{-n}T_2^{-n}A) > 0,
\]

which in the finitary world corresponds to finding squares in dense subsets of $\mathbb{Z}^2$. The above structural results offer hope that some analog of Shkredov’s density-increment approach may be possible through the study of pleasant augmented processes of the form

\[ (X \supseteq E_1 \cap E_2 \cap E_3 \cap E_4 \supseteq A, \mu, T_1, T_2) \]

where $E_1$, $E_2$, $E_3$ and $E_4$ are measurable with respect to $\zeta_0^{(1,0)}$, $\zeta_0^{(0,1)}$, $\zeta_0^{(1,1)}$ and $\zeta_0^{(1,-1)}$ respectively. Of course, more ideas would still be needed to give a new density-increment proof of this instance of multiple recurrence, even in the infinitary setting of ergodic theory. For example, Proposition 3.10 must be replaced with some more complicated estimate, and then arguments in the previous section which used some conditioning on the Kronecker factor would presumably be replaced by conditioning on $\zeta^{T}_{2,\mathrm{nil}}$, which can have much more complicated behaviour.
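To make the finitary side of this correspondence concrete, the following minimal sketch counts squares in a finite subset of $\mathbb{Z}^2$ by brute force. The function name `count_squares` is our own illustrative choice; the count runs over pairs $(p, n)$ with $n \geq 1$, mirroring the sum over $n$ in the averages above.

```python
from itertools import product
from typing import Iterable, Tuple

Point = Tuple[int, int]


def count_squares(points: Iterable[Point]) -> int:
    """Count pairs (p, n), n >= 1, with p, p+(n,0), p+(0,n), p+(n,n) all in the set."""
    E = set(points)
    if not E:
        return 0
    xs = [x for x, _ in E]
    max_side = max(xs) - min(xs)  # no square can be wider than the set itself
    count = 0
    for x, y in E:
        for n in range(1, max_side + 1):
            if (x + n, y) in E and (x, y + n) in E and (x + n, y + n) in E:
                count += 1
    return count


if __name__ == "__main__":
    grid = set(product(range(3), repeat=2))  # the full 3 x 3 grid
    print(count_squares(grid))  # 4 squares of side 1 and 1 of side 2, so 5
```

This is quadratic-time brute force and is meant only to fix the combinatorial pattern being sought, not to suggest anything about bounds.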

Another interesting issue on which ergodic theory can shed some light concerns the difference between the problems of proving multiple recurrence for the above averages and for the averages

\[
\frac{1}{N}\sum_{n=1}^N \mu(A \cap T_1^{-n}A \cap T_2^{-n}A \cap T_3^{-n}A)
\]

arising from a $\mathbb{Z}^3$-system $(X, \mu, T_1, T_2, T_3)$. We offer only a very informal discussion of this here, since precise results on these more complex problems are still in their infancy. In the finitary world, these latter averages correspond to finding three-dimensional corners in dense subsets of $\mathbb{Z}^3$, rather than squares in $\mathbb{Z}^2$. Since a triple of the form $(T_1, T_2, T_1T_2)$ does formally generate an action of $\mathbb{Z}^3$, it is clear that multiple recurrence for these $\mathbb{Z}^3$-system averages is at least as strong as its counterpart for the averages of Theorem 4.1. However, the identification of characteristic factors for the case of $\mathbb{Z}^3$-systems is also apparently simpler: the main result of [Aus10a] shows that, after passing to a suitable extension if necessary, one has

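\[
\frac{1}{N}\sum_{n=1}^N \int_X f_0 \cdot (f_1 \circ T_1^n) \cdot (f_2 \circ T_2^n) \cdot (f_3 \circ T_3^n)\, d\mu
\sim \frac{1}{N}\sum_{n=1}^N \int_X \mathsf{E}(f_0 \mid \pi'_0) \cdot (\mathsf{E}(f_1 \mid \pi'_1) \circ T_1^n) \cdot (\mathsf{E}(f_2 \mid \pi'_2) \circ T_2^n) \cdot (\mathsf{E}(f_3 \mid \pi'_3) \circ T_3^n)\, d\mu
\]

with

\[
\pi'_0 := \zeta_0^{(1,0,0)} \vee \zeta_0^{(0,1,0)} \vee \zeta_0^{(0,0,1)}, \qquad
\pi'_1 := \zeta_0^{(1,0,0)} \vee \zeta_0^{(1,-1,0)} \vee \zeta_0^{(1,0,-1)},
\]
\[
\pi'_2 := \zeta_0^{(1,-1,0)} \vee \zeta_0^{(0,1,0)} \vee \zeta_0^{(0,1,-1)} \quad \text{and} \quad
\pi'_3 := \zeta_0^{(1,0,-1)} \vee \zeta_0^{(0,1,-1)} \vee \zeta_0^{(0,0,1)},
\]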

and these are the minimal factors with this property. These factors are ‘simpler’ in that they involve only partially-invariant factors, and not compact group rotations or nilsystems. The fact that some of the ingredients needed in Theorem 4.1 no longer appear here does not contradict the fact that a triple such as $(T_1, T_2, T_1T_2)$ generates a $\mathbb{Z}^3$-system, because after passing to a suitable extension the algebraic relations among the generators of this $\mathbb{Z}^3$-system will usually be lost.
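The observation that a triple $(T_1, T_2, T_1T_2)$ generates a $\mathbb{Z}^3$-action has an elementary finitary counterpart, which the sketch below checks on a small range: the homomorphism $\mathbb{Z}^3 \to \mathbb{Z}^2$ sending $e_1 \mapsto (1,0)$, $e_2 \mapsto (0,1)$, $e_3 \mapsto (1,1)$ carries every three-dimensional corner onto a square with the same side. The names `project`, `corner` and `square` are illustrative only.

```python
from typing import Tuple

Vec3 = Tuple[int, int, int]
Vec2 = Tuple[int, int]


def project(p: Vec3) -> Vec2:
    """The homomorphism Z^3 -> Z^2 with e1 -> (1,0), e2 -> (0,1), e3 -> (1,1)."""
    a, b, c = p
    return (a + c, b + c)


def corner(p: Vec3, n: int) -> Tuple[Vec3, Vec3, Vec3, Vec3]:
    """The three-dimensional corner p, p + n*e1, p + n*e2, p + n*e3."""
    a, b, c = p
    return (p, (a + n, b, c), (a, b + n, c), (a, b, c + n))


def square(q: Vec2, n: int) -> Tuple[Vec2, Vec2, Vec2, Vec2]:
    """The square q, q + (n,0), q + (0,n), q + (n,n)."""
    x, y = q
    return (q, (x + n, y), (x, y + n), (x + n, y + n))


if __name__ == "__main__":
    p, n = (2, -1, 3), 4
    # The image of the corner at p is the square at project(p) with the same n.
    print(tuple(project(v) for v in corner(p, n)) == square(project(p), n))
```

Consequently a set $E \subseteq \mathbb{Z}^2$ contains a square exactly when its preimage under `project` contains a corner, which is the sense in which the corners problem is at least as strong as the squares problem.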

It thus appears that the analysis of the more general averages might actually be easier, and in fact for deploying some of the methods at our disposal this is true. The ergodic-theoretic proof of convergence of these averages in [Aus09] (reproving a result of Tao from [Tao08]) implicitly needs the linear independence of the group elements corresponding to $T_1$, $T_2$ and $T_3$. In the finitary world, the hypergraph-regularity proofs of the multidimensional Szemerédi Theorem must first lift the problem into a group $\mathbb{Z}^d$ for $d$ large enough so that one is looking for the corners of a $d$-dimensional simplex (rather than any more complicated $d$-dimensional constellations) before this search can be correctly recast in the language of extremal hypergraph theory.

However, both of these arguments use only the most basic, ‘rough’ structure for the data being studied, and by contrast the more refined density-increment approach is simpler in the case of two-dimensional squares than that of three-dimensional corners. Each of the superficially-simpler characteristic factors $\pi'_i$ for the three-dimensional problem is assembled from ingredients of the form $\zeta_0^{v}$ for some $v \in \mathbb{Z}^3$, and each of these is a factor map onto a factor of $(X, \mu, T_1, T_2, T_3)$ on which the acting group is essentially $\mathbb{Z}^3/\mathbb{Z}v \cong \mathbb{Z}^2$ (owing to the partial invariance). In order to mimic Shkredov’s approach to these results, it is then necessary to know how all of these essentially two-dimensional systems are jointly distributed as factors of $(X, \mu, T_1, T_2, T_3)$ (in order to generalize our use of Lemma 3.11 in the proof of Proposition 3.10, for example). It turns out that to understand this joint distribution one needs the same kind of machinery as for the identification of a tuple of characteristic factors in the first place (the reason why these are essentially equivalent problems is discussed in detail in Chapter 4 of [Aus10b]); and the particular problem of describing the joint distribution of these ‘two-dimensional’ factors turns out to be of a similar level of complexity to the problem of describing characteristic factors for multiple recurrence across squares in a $\mathbb{Z}^2$-action.
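The identification $\mathbb{Z}^3/\mathbb{Z}v \cong \mathbb{Z}^2$ used above holds when $v$ is primitive (its coordinates have greatest common divisor 1), as is the case for each direction appearing in the factors $\pi'_i$. The sketch below verifies this for those six directions using a standard sufficient criterion: if two integer rows $r, s$ have cross product $\pm v$ with $v$ primitive, then $x \mapsto (r \cdot x, s \cdot x)$ maps $\mathbb{Z}^3$ onto $\mathbb{Z}^2$ with kernel exactly $\mathbb{Z}v$. The particular rows in `EXAMPLES` are our own hand-picked choices, not taken from the paper.

```python
from math import gcd
from typing import Dict, Tuple

Vec3 = Tuple[int, int, int]


def cross(r: Vec3, s: Vec3) -> Vec3:
    """Cross product of two integer vectors."""
    return (r[1] * s[2] - r[2] * s[1],
            r[2] * s[0] - r[0] * s[2],
            r[0] * s[1] - r[1] * s[0])


def is_primitive(v: Vec3) -> bool:
    """True if the coordinates of v have greatest common divisor 1."""
    return gcd(gcd(v[0], v[1]), v[2]) == 1


def realizes_quotient(v: Vec3, r: Vec3, s: Vec3) -> bool:
    """Sufficient check that x -> (r.x, s.x) is onto Z^2 with kernel exactly Zv."""
    c = cross(r, s)
    minus_v = (-v[0], -v[1], -v[2])
    return is_primitive(v) and c in (v, minus_v)


# Hand-picked rows (an assumption of this sketch) for the directions in the text.
EXAMPLES: Dict[Vec3, Tuple[Vec3, Vec3]] = {
    (1, 0, 0): ((0, 1, 0), (0, 0, 1)),
    (0, 1, 0): ((0, 0, 1), (1, 0, 0)),
    (0, 0, 1): ((1, 0, 0), (0, 1, 0)),
    (1, -1, 0): ((1, 1, 0), (0, 0, 1)),
    (1, 0, -1): ((1, 0, 1), (0, 1, 0)),
    (0, 1, -1): ((1, 0, 0), (0, 1, 1)),
}

if __name__ == "__main__":
    for v, (r, s) in EXAMPLES.items():
        print(v, realizes_quotient(v, r, s))
```

For instance, for $v = (1,-1,0)$ the map $(a,b,c) \mapsto (a+b, c)$ exhibits the quotient: it kills exactly the multiples of $v$, so the system seen through $\zeta_0^{v}$ carries an action of a group isomorphic to $\mathbb{Z}^2$.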

So the finer information required for the density-increment strategy forces one to understand not only the ‘top-level’ structural result that is contained in the identification of a characteristic tuple of factors, but also how all the ingredients appearing in those characteristic factors are jointly distributed. This can be of similar difficulty to a lower-dimensional problem of identifying characteristic factors. For understanding multiple recurrence across translates and dilates of some complicated constellation in $\mathbb{Z}^d$, one might need to work with a large partially ordered family of factors of a given system, where the characteristic factors appear as the maximal elements, and several layers of smaller factors (including group rotations, nilsystems, or possibly something else) must also be identified in order to describe all the necessary joint distributions well enough to implement a density increment. For a density-increment proof such as in Section 3 above, this would presumably entail working with a much richer analog of the augmented processes that appear there.

These speculations notwithstanding, serious problems surround the status of finitary analogs of Theorem 4.1 or its generalizations. I believe such analogs are expected by many researchers in this field, but formulating a precise conjecture is already tricky, and at this writing I know of no higher-dimensional results beyond Shkredov’s. It is not clear what methods (extending Shkredov’s or others) are needed to establish such structural results. Without them, the prospect of a density-increment proof of the presence of prescribed constellations in dense subsets of $\mathbb{Z}^d$ seems rather remote.

REFERENCES

[Ausa] Tim Austin. Pleasant extensions retaining algebraic structure, I. Preprint, available online at arXiv.org: 0905.0518.

[Ausb] Tim Austin. Pleasant extensions retaining algebraic structure, II. Preprint, available online at arXiv.org: 0910.0907.

[Aus09] Tim Austin. On the norm convergence of nonconventional ergodic averages. Ergodic Theory Dynam. Systems, 30(2):321–338, 2009.

[Aus10a] Tim Austin. Deducing the multidimensional Szemerédi theorem from an infinitary removal lemma. J. Anal. Math., 111:131–150, 2010.

[Aus10b] Tim Austin. Multiple recurrence and the structure of probability-preserving systems. ProQuest LLC, Ann Arbor, MI, 2010. Thesis (Ph.D.)–University of California, Los Angeles.

[Ber96] Vitaly Bergelson. Ergodic Ramsey Theory – an Update. In M. Pollicott and K. Schmidt, editors, Ergodic Theory of $\mathbb{Z}^d$-actions: Proceedings of the Warwick Symposium 1993-4, pages 1–61. Cambridge University Press, Cambridge, 1996.

[BHK05] Vitaly Bergelson, Bernard Host, and Bryna Kra. Multiple recurrence and nilsequences. Invent. Math., 160(2):261–303, 2005. With an appendix by Imre Ruzsa.

[BLL08] V. Bergelson, A. Leibman, and E. Lesigne. Intersective polynomials and the polynomial Szemerédi theorem. Adv. Math., 219(1):369–388, 2008.

[Bou99] J. Bourgain. On triples in arithmetic progression. Geom. Funct. Anal., 9(5):968–984, 1999.

[Bou08] Jean Bourgain. Roth’s theorem on progressions revisited. J. Anal. Math., 104:155–192, 2008.

[CL84] Jean-Pierre Conze and Emmanuel Lesigne. Théorèmes ergodiques pour des mesures diagonales. Bull. Soc. Math. France, 112(2):143–175, 1984.

[CL88a] Jean-Pierre Conze and Emmanuel Lesigne. Sur un théorème ergodique pour des mesures diagonales. In Probabilités, volume 1987 of Publ. Inst. Rech. Math. Rennes, pages 1–31. Univ. Rennes I, Rennes, 1988.

[CL88b] Jean-Pierre Conze and Emmanuel Lesigne. Sur un théorème ergodique pour des mesures diagonales. C. R. Acad. Sci. Paris Sér. I Math., 306(12):491–493, 1988.

[FK78] Hillel Furstenberg and Yitzhak Katznelson. An ergodic Szemerédi Theorem for commuting transformations. J. d’Analyse Math., 34:275–291, 1978.

[FK91] Hillel Furstenberg and Yitzhak Katznelson. A Density Version of the Hales-Jewett Theorem. J. d’Analyse Math., 57:64–119, 1991.

[Fur77] Hillel Furstenberg. Ergodic behaviour of diagonal measures and a theorem of Szemerédi on arithmetic progressions. J. d’Analyse Math., 31:204–256, 1977.

[Fur81] Hillel Furstenberg. Recurrence in Ergodic Theory and Combinatorial Number Theory. Princeton University Press, Princeton, 1981.

[FW03] Hillel Furstenberg and Benjamin Weiss. Markov processes and Ramsey theory for trees. Combin. Probab. Comput., 12(5-6):547–563, 2003. Special issue on Ramsey theory.

[Gow98] W. T. Gowers. A new proof of Szemerédi’s theorem for arithmetic progressions of length four. Geom. Funct. Anal., 8(3):529–551, 1998.

[Gow00] W. T. Gowers. Rough structure and classification. Geom. Funct. Anal., (Special Volume, Part I):79–117, 2000. GAFA 2000 (Tel Aviv, 1999).

[Gow01] W. T. Gowers. A new proof of Szemerédi’s theorem. Geom. Funct. Anal., 11(3):465–588, 2001.

[Gow07] W. T. Gowers. Hypergraph regularity and the multidimensional Szemerédi theorem. Ann. of Math. (2), 166(3):897–946, 2007.

[Gre05] Ben Green. Finite field models in additive combinatorics. In Surveys in combinatorics 2005, volume 327 of London Math. Soc. Lecture Note Ser., pages 1–27. Cambridge Univ. Press, Cambridge, 2005.

[GT12] Ben Green and Terence Tao. The quantitative behaviour of polynomial orbits on nilmanifolds. Ann. of Math. (2), 175(2):465–540, 2012.

[HK05] Bernard Host and Bryna Kra. Nonconventional ergodic averages and nilmanifolds. Ann. Math., 161(1):397–488, 2005.

[Kra07] Bryna Kra. Ergodic methods in additive combinatorics. In Additive combinatorics, volume 43 of CRM Proc. Lecture Notes, pages 103–143. Amer. Math. Soc., Providence, RI, 2007.

[Lei98] A. Leibman. Polynomial sequences in groups. J. Algebra, 201(1):189–206, 1998.

[Lei05] A. Leibman. Pointwise convergence of ergodic averages for polynomial sequences of translations on a nilmanifold. Ergodic Theory Dynam. Systems, 25(1):201–213, 2005.

[Les91] Emmanuel Lesigne. Sur une nil-variété, les parties minimales associées à une translation sont uniquement ergodiques. Ergodic Theory Dynam. Systems, 11(2):379–391, 1991.

[NRS06] Brendan Nagle, Vojtěch Rödl, and Mathias Schacht. The counting lemma for regular k-uniform hypergraphs. Random Structures Algorithms, 28(2):113–179, 2006.

[Par69] William Parry. Ergodic properties of affine transformations and flows on nilmanifolds. Amer. J. Math., 91:757–771, 1969.

[Par70] William Parry. Dynamical systems on nilmanifolds. Bull. London Math. Soc., 2:37–40, 1970.

[Par73] William Parry. Dynamical representations in nilmanifolds. Compositio Math., 26:159–174, 1973.

[Rot53] K. F. Roth. On certain sets of integers. J. London Math. Soc., 28:104–109, 1953.

[Shk06a] I. D. Shkredov. On a generalization of Szemerédi’s theorem. Proc. London Math. Soc. (3), 93(3):723–760, 2006.

[Shk06b] I. D. Shkredov. On a problem of Gowers. Izv. Ross. Akad. Nauk Ser. Mat., 70(2):179–221, 2006.

[Shk09] I. D. Shkredov. On a two-dimensional analogue of Szemerédi’s theorem in abelian groups. Izv. Ross. Akad. Nauk Ser. Mat., 73(5):181–224, 2009.

[Tao06a] Terence Tao. A quantitative ergodic theory proof of Szemerédi’s theorem. Electron. J. Combin., 13(1):Research Paper 99, 49 pp. (electronic), 2006.

[Tao06b] Terence Tao. A variant of the hypergraph removal lemma. J. Combin. Theory Ser. A, 113(7):1257–1280, 2006.

[Tao08] Terence Tao. Norm convergence of multiple ergodic averages for commuting transformations. Ergodic Theory and Dynamical Systems, 28:657–688, 2008.

[TV06] Terence Tao and Van Vu. Additive combinatorics. Cambridge University Press, Cambridge, 2006.

[Zie05] T. Ziegler. A non-conventional ergodic theorem for a nilsystem. Ergodic Theory Dynam. Systems, 25(4):1357–1370, 2005.

[Zie07] Tamar Ziegler. Universal characteristic factors and Furstenberg averages. J. Amer. Math. Soc., 20(1):53–97 (electronic), 2007.

[Zim76a] Robert J. Zimmer. Ergodic actions with generalized discrete spectrum. Illinois J. Math., 20(4):555–588, 1976.

[Zim76b] Robert J. Zimmer. Extensions of ergodic group actions. Illinois J. Math., 20(3):373–409, 1976.

COURANT INSTITUTE, NEW YORK UNIVERSITY, NEW YORK, NY 10012, USA
E-mail address: [email protected]
URL: http://www.cims.nyu.edu/~tim
