

Logarithmic Combinatorial Structures:

a Probabilistic Approach

Richard Arratia,¹ A. D. Barbour² and Simon Tavaré

DRAFT. November 26, 2002

¹ RAA and ST: Department of Mathematics, University of Southern California, 1042 W. 36th Place, DRB 155, Los Angeles, CA 90089–1113, USA.

² ADB: Abteilung für Angewandte Mathematik, Universität Zürich, Winterthurerstrasse 190, CH–8057 Zürich, Switzerland


Acknowledgements

The research described in this book has evolved over many years after extensive discussions with many colleagues. Our thanks go in particular to Persi Diaconis, Jennie Hansen, Lars Holst, Svante Janson, Jim Pitman, Gian-Carlo Rota and Dudley Stark for discussions on the combinatorial side of the work, and to Warren Ewens, Bob Griffiths and Geoff Watterson for discussions about the population genetics aspects of the work. We thank Monash University and the Institute for Mathematical Sciences at the National University of Singapore for their support.


Contents

0 Preface vi

1 Permutations and primes 1
  1.1 Random permutations and their cycles 1
  1.2 Random integers and their prime factors 18
  1.3 Contrasts between permutations and primes 23

2 Decomposable combinatorial structures 26
  2.1 Some combinatorial examples 27
  2.2 Assemblies, multisets and selections 37
  2.3 The probabilistic perspective 39
  2.4 Refining and coloring 45
  2.5 Tilting 49

3 Probabilistic preliminaries 56
  3.1 Total variation and Wasserstein distances 58
  3.2 Rates of convergence 59
  3.3 Results for classical logarithmic structures 61
  3.4 Stein's method 64

4 The Ewens Sampling Formula 67
  4.1 Size-biasing 69
  4.2 The random variable Xθ 71
  4.3 The random variable Xθ(α) 76

  4.4 Point probabilities for Tbn 80
  4.5 Weak laws for small cycles 83
  4.6 The number of cycles 88
  4.7 The shortest cycles 91
  4.8 The ordered cycles 93
  4.9 The largest cycles 95
  4.10 The Erdős–Turán Law 102
  4.11 The Poisson–Dirichlet and GEM distributions 104

5 Logarithmic combinatorial structures 111
  5.1 Results for general logarithmic structures 111
  5.2 Verifying the local limit conditions 123
  5.3 Refinements and extensions 132

6 General setting 134
  6.1 Strategy 134
  6.2 Basic framework 138
  6.3 Working conditions 140
  6.4 Tilting 145
  6.5 d-fractions 147
  6.6 Illustrations 149
  6.7 Main theorems 150

7 Consequences 156
  7.1 Functional central limit theorems 156
  7.2 Poisson–Dirichlet limits 160
  7.3 The number of components 170
  7.4 Erdős–Turán laws 183
  7.5 Additive function theory 185

8 A Stein Equation 208
  8.1 Stein's method for T0m(Z∗) 208
  8.2 Stein's method for Pθ 213
  8.3 Applying Stein's method 216

9 Point probabilities 221
  9.1 Bounds on individual probabilities 221
  9.2 Differences of point probabilities 229

10 Distributional comparisons with Pθ 247
  10.1 Comparison of L(Tvm(Z)) and L(Tvm(Z∗)) 247
  10.2 Comparing L(m−1Tvm(Z∗)) with Pθ 257
  10.3 Comparing L(m−1Tvm(Z∗)) with Pθ(α) 262

11 Comparisons with Pθ: point probabilities 264
  11.1 Local limit theorems for Tvm(Z) 264

  11.2 Comparison of Tvm(Z) with Tvm(Z∗): point probabilities 267
  11.3 Comparison with pθ 276

12 Proofs 279
  12.1 Proof of Theorem 6.6 279
  12.2 Proof of Theorem 6.7 280
  12.3 Proof of Theorem 6.8 282
  12.4 Proof of Theorem 6.9 285
  12.5 Proof of Theorem 6.10 288
  12.6 Proof of Theorem 6.11 289
  12.7 Proof of Theorem 6.12 290
  12.8 Proof of Theorem 6.13 292
  12.9 Proof of Theorem 6.14 297
  12.10 Proof of Theorem 7.10 300

13 Technical complements 306

References 316
Notation Index 329
Author Index 331
Subject Index 334

0 Preface

This book comes in two parts. The first is an introduction to the asymptotic theory of combinatorial structures that can be decomposed into component elements. The second is a detailed study of such structures, in the style of a research monograph. The reason for this split is that the main ideas are rather straightforward, and can be relatively simply explained. However, using these ideas and a fair amount of technical application, there are many sharp results that can be derived as consequences. We present some of these, to illustrate that the method is not only simple but also powerful.

We are specifically concerned with the component frequency spectrum, that is, with the numbers and sizes of the component elements, of a 'typical' structure of given (large) size n. A classic example of a decomposable combinatorial structure is that of permutations of n elements, with cycles as the component elements; here, the component spectrum is just the cycle type. Our approach is to take 'typical' to mean 'with high probability', when a structure is chosen at random according to some given probability distribution from the set of all structures of size n; most commonly, but not necessarily, according to the uniform distribution. This enables us to introduce ideas from probability theory, such as conditioning, Stein's method and distributional approximation, as tools in our investigation.

We gain our understanding of the component spectrum by comparison with simpler random objects. Sometimes these objects are discrete; indeed, our fundamental comparisons are with sequences of independent random variables and with the Ewens Sampling Formula. However, we also use continuous approximations, such as Brownian motion, the scale invariant Poisson process and the Poisson–Dirichlet process. Our comparisons are formulated not only as limit theorems as n → ∞, but also as approximations with concrete error bounds, valid for any fixed n. In the first eight chapters, we introduce our approach, prove some of the basic approximations, and outline the more detailed results and their consequences. From Chapter 9 onwards, the treatise becomes (unashamedly) technical.

In a decomposable structure of size n, the component spectrum consists of the numbers C1(n) counting components of size 1, C2(n) counting components of size 2, . . . , Cn(n) counting components of size n, where the Ci(n) have to satisfy the equation

C1(n) + 2C2(n) + · · · + nCn(n) = n, (0.1)

because the structure has total size n. A quantity of traditional interest and frequent study is then

K0n = C1(n) + C2(n) + · · · + Cn(n),

the number of components in the structure. The vector of component counts (C1(n), C2(n), . . . , Cn(n)) can be viewed as a stochastic process, if the structure is chosen at random from among the p(n) structures of size n. A 'typical' property then corresponds to an event, defined in terms of the stochastic process, which has 'high' probability; for the uniform distribution over all possible structures of size n, this is equivalent to a property of the structure which is true of a 'high' proportion of all such structures.

We are thus concerned with the behavior of the discrete dependent nonnegative integer valued random processes

C(n) = (C1(n), C2(n), . . . , Cn(n)), n = 1, 2, . . . ,

arising from randomly chosen combinatorial structures. These processes have to satisfy (0.1), of course, but all the classical examples have much more in common. A key common feature is that, for each n ≥ 1, the joint distribution L(C1(n), . . . , Cn(n)) satisfies the Conditioning Relation

L(C1(n), . . . , Cn(n)) = L(Z1, Z2, . . . , Zn | T0n = n), (0.2)

for a fixed sequence of independent random variables Z1, Z2, . . . taking values in ZZ+, where

T0n = T0n(Z) = Z1 + 2Z2 + · · · + nZn.

For the classical combinatorial structures, the random variables Zi have either Poisson, negative binomial or binomial distributions. For example, random permutations, discussed in detail in Chapter 1.1, satisfy the Conditioning Relation for random variables Zi that have Poisson distributions with means IEZi = 1/i.
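For random permutations, the Conditioning Relation can be checked exactly for small n by brute force. The following sketch (our own illustration, not code from the book; all function names are ours) compares the exact cycle-type law of a uniform permutation of S5 with the law of independent Poisson(1/i) variables conditioned on T05 = 5:

```python
import itertools
import math
from fractions import Fraction

def cycle_type(perm):
    """Cycle counts (c_1, ..., c_n) of a permutation, given 0-based as i -> perm[i]."""
    n = len(perm)
    seen, counts = [False] * n, [0] * n
    for i in range(n):
        if not seen[i]:
            length, j = 0, i
            while not seen[j]:
                seen[j] = True
                j = perm[j]
                length += 1
            counts[length - 1] += 1
    return tuple(counts)

def uniform_permutation_law(n):
    """Exact distribution of the cycle type of a uniform random element of S_n."""
    law = {}
    for perm in itertools.permutations(range(n)):
        c = cycle_type(perm)
        law[c] = law.get(c, Fraction(0)) + Fraction(1, math.factorial(n))
    return law

def conditioned_poisson_law(n):
    """Law of (Z_1, ..., Z_n), Z_i independent Poisson(1/i), given T_0n = sum i*Z_i = n.
    The constant factors exp(-1/i) cancel in the conditioning, so each cycle type c
    gets unnormalised weight prod_j (1/j)^{c_j} / c_j!."""
    weight = {}

    def rec(i, remaining, partial):
        # enumerate all (c_1, ..., c_n) in ZZ^n_+ with sum j*c_j = n
        if i > n:
            if remaining == 0:
                c = tuple(partial)
                w = Fraction(1)
                for j, cj in enumerate(c, start=1):
                    w *= Fraction(1, j ** cj) / math.factorial(cj)
                weight[c] = w
            return
        for ci in range(remaining // i + 1):
            rec(i + 1, remaining - i * ci, partial + [ci])

    rec(1, n, [])
    total = sum(weight.values())
    return {c: w / total for c, w in weight.items()}

assert uniform_permutation_law(5) == conditioned_poisson_law(5)
```

The exact agreement for n = 5 is just Cauchy's formula in disguise; the same check passes for any n small enough to enumerate.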

Page 8: This is page i Logarithmic Combinatorial Structures: a ..._Publications...This is page i Printer: Opaque this Logarithmic Combinatorial Structures: a Probabilistic Approach Richard

viii 0. Preface

Our unifying approach is developed in a context motivated by a large sub-class of classical combinatorial structures that share, in addition to the Conditioning Relation, the following common feature. We assume that the random variables (Zi, i ≥ 1) are such as to satisfy the Logarithmic Condition

i IP[Zi = 1] → θ and i IEZi → θ as i → ∞, (0.3)

for some θ > 0. In our probabilistic setting, there is no need to be more specific about the distributions of the Zi, so that we are free to move away from the classical Poisson, binomial and negative binomial families; this added flexibility has its uses, for example when investigating random characteristic polynomials over a finite field. And even within the classical families, we can choose θ to take a value different from that normally associated with the uniform distribution over a well known set of combinatorial objects. The simplest example of this arises when the Zj have Poisson distributions with mean IEZj = θ/j, for any θ > 0; the special case θ = 1 corresponds to the uniform distribution. In the general case, the distribution of C(n) is called the Ewens Sampling Formula. This distribution, discussed in detail in Chapter 4, plays a central role in our work.
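The point probabilities of the Ewens Sampling Formula can be computed exactly. The sketch below is our own illustration, using the standard formula IP[C(n) = c] = (n!/θ(n)) ∏_j (θ/j)^{cj}/cj!, where θ(n) denotes the rising factorial (the formula itself is developed in Chapter 4; the code is not from the book):

```python
import math
from fractions import Fraction

def esf_probability(theta, c):
    """Exact ESF(theta) point probability for cycle type c = (c_1, ..., c_n),
    via  P[C = c] = (n! / theta^{(n)}) * prod_j (theta/j)^{c_j} / c_j!,
    where theta^{(n)} = theta (theta + 1) ... (theta + n - 1) is the rising factorial."""
    n = sum(j * cj for j, cj in enumerate(c, start=1))
    rising = Fraction(1)
    for r in range(n):
        rising *= Fraction(theta) + r
    p = Fraction(math.factorial(n)) / rising
    for j, cj in enumerate(c, start=1):
        p *= (Fraction(theta) / j) ** cj / math.factorial(cj)
    return p

# theta = 1 recovers the uniform case: the identity permutation of S_5
# (five fixed points) has probability 1/5! = 1/120.
assert esf_probability(1, (5, 0, 0, 0, 0)) == Fraction(1, 120)
```

For any θ the probabilities sum to 1 over the cycle types of a given n, which makes a convenient sanity check.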

Our main theme is that the Conditioning Relation and the Logarithmic Condition are together enough to ensure that the component spectrum of a large decomposable combinatorial structure has a prescribed, universal form; the numbers of small components of different sizes are almost independent, and the sizes of the large components are jointly distributed almost as those of the Ewens Sampling Formula. We complement this broad picture with many detailed refinements.

History

The comparison of the component spectrum of a combinatorial structure to an independent process, with or without further conditioning, has a long history. Perhaps the best known example is the representation of the multinomial distribution with parameters n and p1, . . . , pk as the joint law of independent Poisson random variables with means λp1, . . . , λpk, conditional on their sum being equal to n.

Holst (1979a) provides an approach to urn models that unifies multinomial, hypergeometric and Pólya sampling. The joint laws of the dependent counts of the different types sampled are represented, respectively, as the joint distribution of independent Poisson, negative binomial, and binomial random variables, conditioned on their sum. See also Holst (1979b, 1981). The quality of such approximations is assessed using metrics, including the total variation distance, by Stam (1978) and Diaconis and Freedman (1980).

The Conditioning Relation also appears in the context of certain models of migration and clustering. In that setting, n individuals are classified as belonging to different groups, the number of individuals belonging to groups of size j being Cj(n). At stationarity, the distribution of C(n) satisfies the Conditioning Relation for independent Poisson random variables Z1, Z2, . . . . See Whittle (1967, 1986) and Kelly (1979) for further details.

The books by Kolchin, Sevast'yanov and Chistyakov (1978) and Kolchin (1986, 1998) use the representation of the component spectrum of combinatorial structures, including random permutations and random mappings, in terms of independently distributed random variables, conditioned on the value of their sum. However, Kolchin's technique uses independent random variables that are identically distributed, and the number of components Ci(n) of size i is the number of random variables which take on the value i.

Conditioning was exploited by Shepp and Lloyd (1966) in their seminal paper on the asymptotics of random permutations, and also used by Watterson (1974) in a study of the Ewens Sampling Formula. The unpublished lecture notes of Diaconis and Pitman (1986) also emphasize the role of conditioning and probabilistic methods. Hansen (1989, 1990) uses conditioning to study the Ewens Sampling Formula and random mappings. Fristedt (1992, 1993) exploits conditioning to study random partitions of a set and random partitions of an integer; the sharpest results for random partitions of sets and integers are given in Pittel (1997a,b). Hansen (1994) has a systematic treatment of the behavior of the large components of logarithmic combinatorial structures.

Logarithmic combinatorial structures are usually studied without appeal to the conditioning relation, but using generating function methods instead. General discussions focussing on probabilistic properties include Knopfmacher (1979), Flajolet and Soria (1990), Flajolet and Odlyzko (1990a), Odlyzko (1995), Hwang (1994, 1998a,b), Zhang (1996a,b, 1998), Gourdon (1998), Panario and Richmond (2001) and Flajolet and Sedgewick (1999). For further treatment of the algebraic aspects of decomposable structures, the reader is referred to Foata (1974) and Joyal (1981) and to the books by Goulden and Jackson (1983), Stanley (1986) and Bergeron, Labelle and Leroux (1997).

Organization of the book

We begin in Chapter 1 with a survey of the joint behavior of the numbers of cycles of different sizes in a random permutation, to give a concrete and simple illustration of phenomena which occur throughout the class of logarithmic combinatorial structures. Then, for the sake of historical perspective, we outline the analogous results for the prime factorization of a random integer, even though this is not an example of the class of combinatorial structures studied in our book.

Chapter 2 gives the combinatorial description of decomposable combinatorial structures, both logarithmic and non-logarithmic, first by way of specific examples such as mappings, partitions, and trees, and then in terms of general classes: assemblies, multisets, and selections. Next we give the probabilistic description of these classic combinatorial objects, focusing first on the Conditioning Relation (0.2), which is an algebraic condition; and then on the Logarithmic Condition (0.3), an analytic condition which characterizes the Logarithmic Class. We provide a combinatorial perspective on refining and coloring, including for example wreath products, and we discuss tilting, which may be considered as a probabilistic extension of coloring.

Chapter 3 begins the discussion of logarithmic combinatorial structures in the full generality of an arbitrary sequence of independent nonnegative integer valued random variables Zi satisfying the Logarithmic Condition (0.3), and made into a dependent process by the Conditioning Relation (0.2), so that the classical combinatorial examples are included as a special case. We discuss the probability metrics (total variation distance and various Wasserstein distances) used to assess the accuracy of our probabilistic approximations. We then give a brief survey of the results that we are able to derive, and conclude with an introduction to Stein's method, a technique that is essential for many of our proofs.

A central family of discrete distributions, the Ewens Sampling Formula, is introduced in Chapter 4, together with the associated tools and limit processes used to understand it: size biasing, certain infinitely divisible random variables, the scale invariant Poisson process, the GEM distribution, and the Poisson–Dirichlet distribution. These are used to give a rather extensive asymptotic description of its properties.

The same tools are used in Chapter 5 to extend the asymptotic description to more general logarithmic combinatorial structures. A single, relatively simple technical condition, the local limit theorem (LLT) of (5.6), is shown to imply the naive limit laws (3.4) for small components and (3.5) for large components. We then show that for combinatorial structures such as assemblies, multisets and selections, the mild Logarithmic Condition (0.3) is already enough to imply (LLT).

For more general logarithmic combinatorial structures, this simple approach fails, and more sophisticated tools are needed. Chapter 6 sets the scene. We use the Conditioning Relation to show that the joint distribution of the large components of a logarithmic combinatorial structure is close to that of the large components of the Ewens Sampling Formula, provided that, for large i, the distribution of Zi is close to that of Z∗i, which has the Poisson distribution with mean θ/i, and that the distribution of T0n(Z) is close to that of T0n(Z∗). We then discuss how to measure the difference between L(Zi) and Po(θ/i), and establish working conditions under which the influence of these differences can be controlled. Under these conditions, Stein's method can be used to show the closeness of L(T0n(Z)) to L(T0n(Z∗)), and it turns out that this also enables one to show the closeness of the joint distributions L(C1(n), . . . , Cb(n)) and L(Z1, . . . , Zb) for b = o(n), thus treating the small components as well. We illustrate the conditions as applied to some of the basic examples, such as random mappings and random polynomials. Then we present the statements of our main approximation theorems, refining the naive limit theorems such as (3.4) for small components and (3.5) for large components by giving error bounds. We state both local and global approximations. The proofs themselves are presented in Chapters 8 through 12, which constitute the technical core of this monograph.

Chapter 7 gives a number of consequences of the approximation theorems of the preceding chapter, illustrating the power inherent in discrete functional limit theorems and approximations. Each is based on earlier limiting results, improving upon them in two ways. First, the context is broadened from an often quite restrictive setting to that of a very general logarithmic combinatorial structure. Secondly, the limit theorems are supplied with error bounds. The first setting is that of the usual "functional (Brownian motion) central limit theorem" for the number of components in various size ranges. Then we give several metrized comparison results relating to the Poisson–Dirichlet limit for the sizes of large components. For the very simplest functional of the component counting process, the total number of components, we investigate the accuracy of Poisson and related approximations to its distribution. Another famous theorem that we consider is the Erdős–Turán law for the order of a random permutation. Finally, we extend the theory of additive functions on additive arithmetic semigroups to general logarithmic structures.

The number theory connection

Our fascination with the component spectrum of logarithmic combinatorial structures is based partly on similarities to the prime factorization of a random integer selected uniformly from {1, 2, . . . , n}, as observed in Knuth and Trabb Pardo (1976). The similarities include: having an independent process limit for small component counts; having Poisson–Dirichlet and GEM process limits for large components, as in Billingsley (1972, 1973, 1999), Bach (1985), Vershik (1987) and Donnelly and Grimmett (1993); and having a conditioning relation, here a related bias relation, to construct the dependent system from the independent system. The celebrated Dickman and Buchstab functions familiar to number theorists (cf. Tenenbaum (1995)) also arise in the combinatorial setting, described in Chapter 2. A further similarity involves the "Fundamental Lemma of Kubilius" in number theory; see Kubilius (1962), and Elliott (1979, 1980). This lemma corresponds to Theorem 6.7 for logarithmic combinatorial structures, stating that the total variation distance between the law of (C1(n), . . . , Cb(n)) and the law of the approximating process (Z1, . . . , Zb) tends to zero when b/n → 0, and giving an explicit upper bound for the error.

To see these similarities, one must view an integer as a multiset of primes. The most basic difference then lies in the sizes allowed for components: for the combinatorial structures considered in this monograph, the sizes allowed are 1, 2, 3, . . ., while, for prime factorizations, the sizes allowed are log p for primes p. For example, the integer 1848 = 2³ · 3 · 7 · 11 is the instance having three components of size log 2, one component each of sizes log 3, log 7, and log 11, and no other components. This brief description suffices for a preface; for a somewhat longer discussion of the connections, see Chapter 1, or Arratia, Barbour and Tavaré (1997) and Arratia (1998).
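The multiset view can be made concrete in a few lines (our own sketch, not from the book): factor an integer into its prime multiset, and check that the component sizes log p add up to log n.

```python
import math

def prime_multiset(n):
    """Prime factorization of n as a multiset (a sorted list with repetition) of primes,
    by trial division."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors

# 1848 = 2^3 * 3 * 7 * 11: three components of size log 2, and one each of
# sizes log 3, log 7 and log 11; the sizes sum to log 1848.
primes = prime_multiset(1848)
assert primes == [2, 2, 2, 3, 7, 11]
assert abs(sum(math.log(p) for p in primes) - math.log(1848)) < 1e-12
```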

Notation

We end the preface with a brief description of our notation. A more extensive list may be found in the corresponding index. We write IN for the natural numbers {1, 2, 3, . . .}, ZZ+ for the nonnegative integers {0, 1, 2, . . .}, and for the set of the first n natural numbers we write either [n] or {1, 2, . . . , n}. We write A ⊂ B for the relation that A is a subset of B, allowing A = B. We denote the falling factorial by x[r] = x(x − 1) · · · (x − r + 1) and the rising factorial by x(r) = x(x + 1) · · · (x + r − 1); in both cases, the value is 1 if r = 0. For the harmonic numbers we use h(n + 1) = 1 + 1/2 + · · · + 1/n.

The first order asymptotic relation is written an ∼ bn, meaning lim an/bn = 1. We write a ≐ b to denote a deliberately vague approximation, for heuristics or crude numerical values. We use the standard big-oh and little-oh notation: an = O(bn) means that lim supn |an/bn| < ∞, and an = o(bn) means that limn an/bn = 0. We write an ≍ bn for the symmetric relation that both an = O(bn) and bn = O(an). We use = to show that alternative notations may be used for a single object, for example C(n) = (C1(n), . . . , Cn(n)).

We write L(X) for the law (probability distribution) of a random object X, so that L(X) = L(Y) means that X and Y have the same distribution. To state that Xn converges in distribution to X we write Xn →d X. We use ∼ when specifying the distribution of a random element; for example Zi ∼ Po(1/i) states that Zi has the Poisson distribution with mean 1/i.

1 Permutations and primes

This chapter supplies some historical perspective and motivation, by considering the two oldest and most significant examples of logarithmic structures. These are permutations and primes: permutations referring to a random permutation of n objects, decomposed into cycles, and primes referring to a random integer chosen from 1 to n, factored into primes. These two systems have been studied independently of each other, but there are uncanny similarities between them. In a broad sense, the combinatorial structures that we study in this book are precisely those that share these similarities. The reader is invited to browse this chapter, or skip directly to Chapter 2 for the broader range of examples.

1.1 Random permutations and their cycles

Random permutations implicitly made their appearance in the first edition of Pierre-Rémond de Montmort's Essai d'Analyse sur les Jeux de Hasard, published in Paris in 1708. David (1962, p. 145) gives this translation from Problèmes divers sur le jeu du treize:

The players draw first of all as to who shall be the Bank. Let us suppose that this is Pierre, and the number of players whatever one likes. Pierre having a complete pack of 52 shuffled cards, turns them up one after the other. Naming and pronouncing one when he turns the first card, two when he turns the second, three when he turns the third, and so on until the thirteenth which is the King. Now if in all this proceeding there is no card of rank agreeing with the number called, he pays each one of the Players taking part and yields the Bank to the player on his right.

But if it has happened in the turning of the thirteen cards that there has been an agreement, for example turning up an ace at the time he has called one, or a two at the time he has called two, or three when he has called three, he takes all the stakes and begins again as before calling one, then two, etc.

Pierre’s game is related to the number of fixed points of a random per-mutation. If Pierre had a pack consisting of p cards all of a single suit, thenPierre would win if the permutation induced by shuffling the cards had atleast one fixed point.

Most students of probability meet random permutations in the context ofthe so–called Hat–Check Problem: n mathematicians drop off their hats ata restaurant before having a meal. After the meal, their hats are returnedat random. What is the chance that no one gets back their own hat? Feller(1968, Chapter IV) gives a number of equivalent descriptions. We mightalso want to know the distribution of the number of mathematicians whoget back their own hats, its mean, variance and so on. Questions like thiscan be formulated in terms of random permutations, as follows. The returnof the hats induces a random permutation of n objects: label the math-ematicians 1, . . . , n, and let πj be the label of the mathematician whosehat was returned to j. The solution to our problems is then provided bythe distribution of the number of singleton cycles in this random permu-tation. In the next sections, we review a variety of results about randompermutations.
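The classical answer to the hat-check question, sketched here in our own code (not the book's), comes from inclusion-exclusion: the probability that none of the n hats returns to its owner is the partial sum ∑_{k=0}^{n} (−1)^k/k!, which converges rapidly to e^{−1} ≈ 0.3679.

```python
import math
from fractions import Fraction

def p_no_fixed_point(n):
    """Exact probability that a uniform random permutation of n objects has no
    fixed point, by inclusion-exclusion: sum_{k=0}^{n} (-1)^k / k!."""
    return sum(Fraction((-1) ** k, math.factorial(k)) for k in range(n + 1))

# For n = 4 there are 9 derangements among the 24 permutations.
assert p_no_fixed_point(4) == Fraction(9, 24)
# The probability converges quickly to 1/e.
assert abs(float(p_no_fixed_point(10)) - math.exp(-1)) < 1e-7
```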

Cycles of permutations

We write Sn for the set of permutations of {1, 2, . . . , n}. A permutation π ∈ Sn is often written in two-line notation of the form

π = ( 1  2  3  · · ·  n
      π1 π2 π3 · · · πn ),

so that the image of 1 is π1, the image of 2 is π2 and so on. A permutation π ∈ Sn can be written as an (ordered) product of cycles in the following way: start the first cycle with the integer 1, followed by its image π1, the image of π1 and so on. Once this cycle is completed, the second cycle starts with the smallest unused integer followed by its images, and so on. For example, the permutation π ∈ S10 given by

π = ( 1 2 3 4 5 6 7 8 9 10
      9 1 7 4 3 2 5 8 10 6 )     (1.1)

Page 15: This is page i Logarithmic Combinatorial Structures: a ..._Publications...This is page i Printer: Opaque this Logarithmic Combinatorial Structures: a Probabilistic Approach Richard

1.1. Random permutations and their cycles 3

is decomposed as

π = (1 9 10 6 2)(3 7 5)(4)(8),

a permutation with two singleton cycles (or fixed points), one cycle of length 3, and one of length 5. Its cycle type is c = (c1, . . . , c10) = (2, 0, 1, 0, 1, 0, . . . , 0); here, cj is the number of cycles of length j in π.
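The decomposition just described is easy to mechanize. The following sketch (our own code, with the permutation π of (1.1) stored 0-based) recovers both the ordered cycles and the cycle type:

```python
def cycles_and_type(pi):
    """Ordered cycle decomposition of a permutation pi (0-based: i -> pi[i]),
    each new cycle starting from the smallest unused element, as in the text."""
    n = len(pi)
    seen, cycles = [False] * n, []
    for start in range(n):
        if not seen[start]:
            cycle, j = [], start
            while not seen[j]:
                seen[j] = True
                cycle.append(j + 1)  # report elements 1-based to match (1.1)
                j = pi[j]
            cycles.append(tuple(cycle))
    counts = [0] * n
    for cycle in cycles:
        counts[len(cycle) - 1] += 1
    return cycles, tuple(counts)

# The permutation pi of (1.1), stored 0-based: 1 -> 9, 2 -> 1, 3 -> 7, ...
pi = [8, 0, 6, 3, 2, 1, 4, 7, 9, 5]
cycles, c = cycles_and_type(pi)
assert cycles == [(1, 9, 10, 6, 2), (3, 7, 5), (4,), (8,)]
assert c == (2, 0, 1, 0, 1, 0, 0, 0, 0, 0)
```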

For c = (c1, . . . , cn) ∈ ZZ^n_+, the number of permutations N(n, c) in Sn having ci cycles of size i, 1 ≤ i ≤ n, i.e. cycle type c, is given by Cauchy's formula:

N(n, c) = 1l{ ∑_{j=1}^{n} j cj = n } n! ∏_{j=1}^{n} (1/j)^{cj} (1/cj!), (1.2)

where the indicator 1l{·} is defined by

1l{A} = 1 if A is true, and 0 otherwise.

The joint distribution of cycle counts

If a permutation is chosen uniformly and at random from the n! possible permutations in Sn, then the counts Cj(n) of cycles of length j (1 ≤ j ≤ n) are dependent random variables; we take Cj(n) = 0 if j > n. The joint distribution of C(n) = (C1(n), . . . , Cn(n)) is given by

IP[C(n) = c] = (1/n!) N(n, c) = 1l{ ∑_{j=1}^{n} j cj = n } ∏_{j=1}^{n} (1/j)^{cj} (1/cj!), (1.3)

for c ∈ ZZ^n_+. We refer to the distribution in (1.3) as the Ewens Sampling Formula with parameter θ = 1, written ESF(1). The ESF(θ) distribution, which plays a central role in the theory, is introduced in Example 2.19 and discussed in detail in Chapter 4.

Watterson (1974) derived the joint moments of $(C_1^{(n)}, \ldots, C_n^{(n)})$. We use the notation
$$x_{[r]} = x(x-1)\cdots(x-r+1)$$
for falling factorials.

Lemma 1.1 For non-negative integers $m_1, \ldots, m_n$,
$$\mathbb{E}\Big[\prod_{j=1}^n (C_j^{(n)})_{[m_j]}\Big] = \prod_{j=1}^n \Big(\frac{1}{j}\Big)^{m_j}\, \mathbf{1}\Big\{\sum_{j=1}^n j m_j \le n\Big\}. \tag{1.4}$$


Proof. This can be established directly by exploiting cancellation of the form $(c_j)_{[m_j]}/c_j! = 1/(c_j - m_j)!$ when $c_j \ge m_j$, which occurs between the ingredients in Cauchy's formula and the falling factorials in the moments. Write $m = \sum j m_j$. Then, with the first sum indexed by $c = (c_1, \ldots, c_n) \in \mathbb{Z}_+^n$ and the last sum indexed by $d = (d_1, \ldots, d_n) \in \mathbb{Z}_+^n$ via the correspondence $d_j = c_j - m_j$, we have
$$\begin{aligned}
\mathbb{E}\Big[\prod_{j=1}^n (C_j^{(n)})_{[m_j]}\Big]
&= \sum_c \mathbb{P}[C^{(n)} = c] \prod_{j=1}^n (c_j)_{[m_j]} \\
&= \sum_{c:\, c_j \ge m_j\ \forall j} \mathbf{1}\Big\{\sum_{j=1}^n j c_j = n\Big\} \prod_{j=1}^n \frac{(c_j)_{[m_j]}}{j^{c_j}\, c_j!} \\
&= \prod_{j=1}^n \frac{1}{j^{m_j}} \sum_d \mathbf{1}\Big\{\sum_{j=1}^n j d_j = n - m\Big\} \prod_{j=1}^n \frac{1}{j^{d_j}\, d_j!}.
\end{aligned}$$
This last sum simplifies to the indicator $\mathbf{1}\{m \le n\}$, corresponding to the fact that if $n - m \ge 0$, then $d_j = 0$ for $j > n - m$, and a random permutation in $S_{n-m}$ must have some cycle structure $(d_1, \ldots, d_{n-m})$. $\square$

The moments of $C_j^{(n)}$ follow immediately as
$$\mathbb{E}(C_j^{(n)})_{[r]} = j^{-r}\, \mathbf{1}\{jr \le n\}. \tag{1.5}$$
We note for future reference that (1.4) can also be written in the form
$$\mathbb{E}\Big[\prod_{j=1}^n (C_j^{(n)})_{[m_j]}\Big] = \mathbb{E}\Big[\prod_{j=1}^n (Z_j)_{[m_j]}\Big]\; \mathbf{1}\Big\{\sum_{j=1}^n j m_j \le n\Big\}, \tag{1.6}$$
where the $Z_j$ are independent Poisson-distributed random variables satisfying $\mathbb{E}(Z_j) = 1/j$.
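The falling-factorial moment identity (1.5) can also be verified by brute force. The sketch below (helper names `cycle_counts` and `falling` are ours) enumerates all of $S_6$ and checks $\mathbb{E}(C_j^{(n)})_{[r]} = j^{-r}\mathbf{1}\{jr \le n\}$ with exact rational arithmetic.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def cycle_counts(perm):
    """counts[j-1] = number of cycles of length j (perm maps i to perm[i-1])."""
    n = len(perm)
    seen, counts = [False] * n, [0] * n
    for start in range(n):
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j] - 1
            length += 1
        if length:
            counts[length - 1] += 1
    return counts

def falling(x, r):
    """Falling factorial x_[r] = x (x-1) ... (x-r+1)."""
    out = 1
    for k in range(r):
        out *= x - k
    return out

n = 6
perms = list(permutations(range(1, n + 1)))
for j in range(1, n + 1):
    for r in range(1, 4):
        mean = Fraction(sum(falling(cycle_counts(p)[j - 1], r) for p in perms),
                        factorial(n))
        # equation (1.5): E (C_j)_[r] = j^{-r} if jr <= n, else 0
        assert mean == (Fraction(1, j**r) if j * r <= n else 0)
```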

The marginal distribution of cycle counts

Although (1.3) provides a formula for the joint distribution of the cycle counts $C_j^{(n)}$, it is not simple to deduce their marginal distributions from it. For illustration, we find the distribution of $C_j^{(n)}$ using a combinatorial approach combined with the inclusion-exclusion formula. Goncharov (1944) established the following lemma.

Lemma 1.2 For $1 \le j \le n$,
$$\mathbb{P}[C_j^{(n)} = k] = \frac{j^{-k}}{k!} \sum_{l=0}^{\lfloor n/j \rfloor - k} (-1)^l\, \frac{j^{-l}}{l!}. \tag{1.7}$$


Proof. Consider the set $I$ of all possible cycles of length $j$, formed with elements chosen from $\{1, 2, \ldots, n\}$, so that $|I| = n_{[j]}/j$. For each $\alpha \in I$, consider the "property" $G_\alpha$ of having cycle $\alpha$; that is, $G_\alpha$ is the set of permutations $\pi \in S_n$ such that $\alpha$ is one of the cycles of $\pi$. We then have $|G_\alpha| = (n-j)!$, since the elements of $\{1, 2, \ldots, n\}$ not in $\alpha$ must be permuted among themselves.

To use the inclusion-exclusion formula we need to calculate the term $S_r$, which is the sum of the probabilities of the $r$-fold intersections of properties, summing over all sets of $r$ distinct properties. There are two cases to consider. If the $r$ properties are indexed by $r$ cycles having no elements in common, then the intersection specifies how $rj$ elements are moved by the permutation, and there are $(n - rj)!\ \mathbf{1}\{rj \le n\}$ permutations in the intersection. There are $n_{[rj]}/(j^r r!)$ such intersections. In the other case, some two distinct properties name some element in common, so no permutation can have both these properties, and the $r$-fold intersection is empty. Thus
$$S_r = (n - rj)!\ \mathbf{1}\{rj \le n\} \times \frac{n_{[rj]}}{j^r r!}\, \frac{1}{n!} = \mathbf{1}\{rj \le n\}\, \frac{1}{j^r r!}.$$
Finally, the inclusion-exclusion series for the number of permutations having exactly $k$ properties is (see Feller (1968, p. 106))
$$\sum_{l \ge 0} (-1)^l \binom{k+l}{l} S_{k+l},$$
which simplifies to (1.7). $\square$

Returning to the original hat-check problem, we substitute $j = 1$ in (1.7) to obtain the distribution of the number of fixed points of a random permutation. For $k = 0, 1, \ldots, n$,
$$\mathbb{P}[C_1^{(n)} = k] = \frac{1}{k!} \sum_{l=0}^{n-k} (-1)^l\, \frac{1}{l!}, \tag{1.8}$$
and the moments of $C_1^{(n)}$ follow from (1.5) with $j = 1$. In particular, for $n \ge 2$, the mean and variance of $C_1^{(n)}$ are both equal to 1.
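The fixed-point distribution (1.8) can be checked against a direct count. In the sketch below (the name `p_fixed_points` is ours), the formula is compared with an exhaustive enumeration of $S_7$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def p_fixed_points(n, k):
    """P[C_1^{(n)} = k] from equation (1.8), as an exact rational."""
    s = sum(Fraction((-1) ** l, factorial(l)) for l in range(n - k + 1))
    return s / factorial(k)

# exhaustive check: count permutations of {0,...,6} with exactly k fixed points
n = 7
for k in range(n + 1):
    hits = sum(1 for p in permutations(range(n))
               if sum(p[i] == i for i in range(n)) == k)
    assert p_fixed_points(n, k) == Fraction(hits, factorial(n))
```

For $k = 0$ this recovers the classical derangement probability, which tends to $e^{-1}$ as $n \to \infty$.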

The joint distribution of $(C_1^{(n)}, \ldots, C_b^{(n)})$ for any $b \ge 1$ has an expression similar to (1.7); this too can be derived by inclusion-exclusion. For any $c = (c_1, \ldots, c_b) \in \mathbb{Z}_+^b$ with $m = \sum i c_i$,
$$\mathbb{P}[(C_1^{(n)}, \ldots, C_b^{(n)}) = c] = \prod_{i=1}^b \Big(\frac{1}{i}\Big)^{c_i} \frac{1}{c_i!} \sum_{\substack{l \ge 0 \text{ with} \\ \sum i l_i \le n - m}} (-1)^{l_1 + \cdots + l_b} \prod_{i=1}^b \Big(\frac{1}{i}\Big)^{l_i} \frac{1}{l_i!}. \tag{1.9}$$
The joint moments of the first $b$ counts $C_1^{(n)}, \ldots, C_b^{(n)}$ can be obtained directly from (1.4) and (1.6) by setting $m_{b+1} = \cdots = m_n = 0$.


The limit distribution of cycle counts

It follows immediately from Lemma 1.2 that for each fixed $j$, as $n \to \infty$,
$$\mathbb{P}[C_j^{(n)} = k] \to \frac{j^{-k}}{k!}\, e^{-1/j}, \qquad k = 0, 1, 2, \ldots,$$
so that $C_j^{(n)}$ converges in distribution to a random variable $Z_j$ having a Poisson distribution with mean $1/j$; we use the notation
$$C_j^{(n)} \to_d Z_j, \quad \text{where } Z_j \sim \mathrm{Po}(1/j),$$
to describe this. In fact, the limit random variables are independent, as the following result of Goncharov (1944) and Kolchin (1971) shows.

Theorem 1.3 The process of cycle counts converges in distribution to a Poisson process on $\mathbb{N}$ with intensity $j^{-1}$. That is, as $n \to \infty$,
$$(C_1^{(n)}, C_2^{(n)}, \ldots) \to_d (Z_1, Z_2, \ldots), \tag{1.10}$$
where the $Z_j$, $j = 1, 2, \ldots$, are independent Poisson-distributed random variables with
$$\mathbb{E}(Z_j) = \frac{1}{j}.$$

Proof. To establish the convergence in distribution given in Theorem 1.3, one shows that for each fixed $b \ge 1$, as $n \to \infty$,
$$\mathbb{P}[(C_1^{(n)}, \ldots, C_b^{(n)}) = c] \to \mathbb{P}[(Z_1, \ldots, Z_b) = c].$$
This can be verified from (1.9). An alternative proof exploits (1.6) and the method of moments. $\square$

Error rates

The proof of Theorem 1.3 says nothing about the rate of convergence. Elementary analysis can be used to estimate this rate when $b = 1$. Using properties of alternating series with decreasing terms, David and Barton (1962, p. 105) show that, for $k = 0, 1, \ldots, n$,
$$\frac{1}{k!}\Big(\frac{1}{(n-k+1)!} - \frac{1}{(n-k+2)!}\Big) \le \big|\mathbb{P}[C_1^{(n)} = k] - \mathbb{P}[Z_1 = k]\big| \le \frac{1}{k!\,(n-k+1)!}.$$
It follows that
$$\frac{2^{n+1}}{(n+1)!}\, \frac{n}{n+2} \le \sum_{k=0}^n \big|\mathbb{P}[C_1^{(n)} = k] - \mathbb{P}[Z_1 = k]\big| \le \frac{2^{n+1} - 1}{(n+1)!}. \tag{1.11}$$


Since
$$\mathbb{P}[Z_1 > n] = \frac{e^{-1}}{(n+1)!}\Big(1 + \frac{1}{n+2} + \frac{1}{(n+2)(n+3)} + \cdots\Big) < \frac{1}{(n+1)!},$$
we see from (1.11) that the total variation distance between the distribution $\mathcal{L}(C_1^{(n)})$ of $C_1^{(n)}$ and the distribution $\mathcal{L}(Z_1)$ of $Z_1$, defined by
$$d_{TV}(\mathcal{L}(C_1^{(n)}), \mathcal{L}(Z_1)) = \frac{1}{2} \sum_{k=0}^\infty \big|\mathbb{P}[C_1^{(n)} = k] - \mathbb{P}[Z_1 = k]\big|,$$
satisfies the inequalities
$$\frac{2^n}{(n+1)!}\, \frac{n}{n+2} \le d_{TV}(\mathcal{L}(C_1^{(n)}), \mathcal{L}(Z_1)) \le \frac{2^n}{(n+1)!} \tag{1.12}$$
for $n = 1, 2, \ldots$. Therefore the rate of convergence to the Poisson distribution is super-exponential in $n$ as $n \to \infty$. A similar calculation based on (1.7) shows that, for each fixed $j$, the total variation distance between $\mathcal{L}(C_j^{(n)})$ and $\mathcal{L}(Z_j)$ satisfies
$$d_{TV}(\mathcal{L}(C_j^{(n)}), \mathcal{L}(Z_j)) \sim j^{-(r+1)}\, \frac{2^r}{(r+1)!}, \qquad r = \lfloor n/j \rfloor. \tag{1.13}$$
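The two-sided bound (1.12) is easy to confirm numerically from the definitions. The sketch below (helper names `p_c1` and `dtv_c1` are ours) computes $d_{TV}(\mathcal{L}(C_1^{(n)}), \mathrm{Po}(1))$ directly and checks it against both sides of (1.12).

```python
from math import exp, factorial

def p_c1(n, k):
    """P[C_1^{(n)} = k], equation (1.8)."""
    return sum((-1) ** l / factorial(l) for l in range(n - k + 1)) / factorial(k)

def dtv_c1(n, tail=80):
    """d_TV between C_1^{(n)} and Z_1 ~ Po(1), from the definition."""
    po = [exp(-1) / factorial(k) for k in range(n + tail)]
    d = sum(abs(p_c1(n, k) - po[k]) for k in range(n + 1))
    d += sum(po[n + 1:])  # C_1^{(n)} never exceeds n, so the tail contributes P[Z_1 > n]
    return d / 2

for n in range(2, 12):
    upper = 2 ** n / factorial(n + 1)
    # equation (1.12): lower and upper bounds on the total variation distance
    assert upper * n / (n + 2) <= dtv_c1(n) <= upper
```

Already at $n = 10$ the distance is below $10^{-5}$, illustrating the super-exponential rate.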

There are many circumstances in which it is useful to go further, and examine the total variation distance $d_b(n)$ between the joint laws of $(C_1^{(n)}, \ldots, C_b^{(n)})$ and $(Z_1, \ldots, Z_b)$. If a bound for $d_b(n)$ can be established that depends on $b$ and $n$ in a reasonably explicit way, then it may be possible also to let $b = b(n)$ tend to infinity with $n$, in such a way that $d_{b(n)}(n)$ still tends to zero. This proves to be a very effective tool when approximating complicated functions of the process $C^{(n)}$.

To simplify the task, we first define the quantity
$$T_{0b} = Z_1 + 2Z_2 + \cdots + bZ_b, \tag{1.14}$$
for any $b \ge 1$. Then, by comparing (1.3) to the joint distribution of the (independent) $Z_j$'s that are defined as the limit random variables in (1.10), we see that the distribution of $C^{(n)}$ satisfies the Conditioning Relation
$$\mathcal{L}(C_1^{(n)}, \ldots, C_n^{(n)}) = \mathcal{L}\big((Z_1, \ldots, Z_n) \mid T_{0n} = n\big). \tag{1.15}$$
This relation lies at the heart of our approach to the analysis of our more general combinatorial structures. In particular, it implies (Lemma 3.1) that the total variation distance satisfies
$$d_{TV}\big(\mathcal{L}(C_1^{(n)}, \ldots, C_b^{(n)}),\, \mathcal{L}(Z_1, \ldots, Z_b)\big) = d_{TV}\big(\mathcal{L}(T_{0b}),\, \mathcal{L}(T_{0b} \mid T_{0n} = n)\big), \tag{1.16}$$
thus reducing the total variation distance between two vectors to a distance between two one-dimensional random variables. Using the result in (1.16), Arratia and Tavaré (1992a) established inter alia that, for $1 \le b \le n$,
$$d_{TV}\big(\mathcal{L}(C_1^{(n)}, \ldots, C_b^{(n)}),\, \mathcal{L}(Z_1, \ldots, Z_b)\big) \le F(n/b)$$


for an (explicit) function $F$ satisfying $\log F(x) \sim -x \log x$ as $x \to \infty$. Thus the counts of the first $b$ cycle sizes are well approximated by independent Poisson random variables, as long as $b = o(n)$ as $n \to \infty$. Exact asymptotics in this case are not yet known. In the next section we describe another approach to estimating distances between the cycle counts and the independent Poisson process.
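The Conditioning Relation (1.15) immediately suggests a (not particularly efficient) way to simulate ESF(1) cycle counts: draw the independent $Z_j \sim \mathrm{Po}(1/j)$ and accept when $T_{0n} = n$. The sketch below does exactly that; the helper names `sample_poisson` and `cycle_counts_by_conditioning` are ours, and the acceptance probability is of order $1/n$, so this is only practical for moderate $n$.

```python
import math
import random

def sample_poisson(mu, rng):
    """Inverse-transform Poisson sampler (adequate for the small means 1/j)."""
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def cycle_counts_by_conditioning(n, seed=0):
    """Sample (C_1,...,C_n) for a uniform random n-permutation by rejection,
    using (1.15): draw independent Z_j ~ Po(1/j) until Z_1 + 2 Z_2 + ... + n Z_n = n."""
    rng = random.Random(seed)
    while True:
        z = [sample_poisson(1 / j, rng) for j in range(1, n + 1)]
        if sum(j * zj for j, zj in enumerate(z, start=1)) == n:
            return z

c = cycle_counts_by_conditioning(12)
assert sum(j * cj for j, cj in enumerate(c, start=1)) == 12
```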

The Feller coupling

The following construction, which we call the Feller coupling, is based on Feller (1945) and Rényi (1962); see also Arratia, Barbour and Tavaré (1992). Consider the canonical cycle notation for a permutation $\pi$. For example, the permutation $\pi$ in (1.1) is written as $\pi = (1\ 9\ 10\ 6\ 2)(3\ 7\ 5)(4)(8)$. In writing the canonical cycle notation for a random $\pi \in S_{10}$, one always starts with "(1 ", and then makes a ten-way choice, between "(1)(2 ", "(1 2 ", ..., and "(1 10 ". One continues with a nine-way choice, an eight-way choice, ..., a two-way choice, and finally a one-way choice. Define $\xi_i$ as the indicator
$$\xi_i = \mathbf{1}\{\text{close off a cycle when there is an } i\text{-way choice}\}.$$
Thus
$$\mathbb{P}[\xi_i = 1] = \frac{1}{i}, \qquad \mathbb{P}[\xi_i = 0] = 1 - \frac{1}{i}, \qquad i \ge 1, \tag{1.17}$$
and
$$\xi_1, \xi_2, \ldots, \xi_n \text{ are independent.} \tag{1.18}$$
An easy way to see the independence of the $\xi_i$ is to take $D_i$ chosen uniformly from $\{1, \ldots, i\}$ to make the $i$-way choice, so that $\xi_i = \mathbf{1}\{D_i = 1\}$. Absolutely no computation is needed to verify that the map constructing canonical cycle notation, $(D_1, D_2, \ldots, D_n) \mapsto \pi$, from $[1] \times [2] \times \cdots \times [n]$ to $S_n$, is a bijection. The variables $D_1, D_2, \ldots, D_n$ determine the random permutation on $n$ points, while the Bernoulli variables $\xi_1, \xi_2, \ldots, \xi_n$ determine the cycle structure.

To construct random $n$-permutations simultaneously for $n = 1, 2, \ldots$, simply use the same $D_1, D_2, \ldots$, and hence the same $\xi_1, \xi_2, \ldots$, for all $n$. The Feller coupling, as motivated by the process of writing out canonical cycle notation, reads $\xi_1\xi_2\cdots\xi_n$ from right to left: the length of the first cycle is the waiting time to the first of $\xi_n, \xi_{n-1}, \ldots$ to take the value 1, the length of the next cycle is the waiting time to the next 1, and so on. The cycle lengths can also be determined without regard to right or left by placing an artificial 1 in position $n+1$. Then every $i$-spacing in $1\xi_2\xi_3\cdots\xi_n 1$, that is, every pattern of two ones separated by $i-1$ zeros, corresponds to a cycle of length $i$; note that $\xi_1 = 1$ a.s. The spacing from the rightmost 1 in $1\xi_2\xi_3\cdots\xi_n$ to the artificial 1 at position $n+1$ corresponds to the first cycle in canonical cycle notation.
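Since the map $(D_1, \ldots, D_n) \mapsto \pi$ is a bijection, the spacing construction can be checked exhaustively: enumerating all $1 \cdot 2 \cdots 5 = 120$ choice tuples for $n = 5$ must reproduce the ESF(1) distribution (1.3). A Python sketch (the helper name `counts_from_xi` is ours):

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import factorial

def counts_from_xi(xi):
    """Cycle counts via the spacing rule: each i-spacing between successive
    ones in xi_1 ... xi_n 1 (artificial 1 at position n+1) is a cycle of
    length i."""
    n = len(xi)
    counts, last = [0] * n, 1
    for pos in range(2, n + 2):
        bit = 1 if pos == n + 1 else xi[pos - 1]
        if bit:
            counts[pos - last - 1] += 1
            last = pos
    return tuple(counts)

n = 5
tally = Counter()
for d in product(*[range(1, i + 1) for i in range(1, n + 1)]):
    xi = tuple(1 if di == 1 else 0 for di in d)  # xi_i = 1{D_i = 1}
    tally[counts_from_xi(xi)] += 1
for c, hits in tally.items():  # compare with (1.3): P[C = c] = N(n, c)/n!
    denom = 1
    for j, cj in enumerate(c, start=1):
        denom *= j ** cj * factorial(cj)
    assert Fraction(hits, factorial(n)) == Fraction(1, denom)
```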


In the Feller coupling, the cycle structure of a random $\pi \in S_n$ has been realized via
$$C_i^{(n)} = \#\{i\text{-spacings in } 1\xi_2\xi_3\cdots\xi_n 1\}. \tag{1.19}$$
For comparison, consider
$$C_i^{(\infty)} = \#\{i\text{-spacings in } 1\xi_2\xi_3\cdots\} = \sum_{k > i} \mathbf{1}\{\xi_{k-i}\cdots\xi_k = 1\,0^{i-1}1\}. \tag{1.20}$$
Note that for the event that an $i$-spacing occurs with right end at $k$, the probability is a simple telescoping product:
$$\mathbb{P}[\xi_{k-i}\cdots\xi_k = 1\,0^{i-1}1] = \frac{1}{k-i} \cdot \frac{k-i}{k-(i-1)} \cdots \frac{k-3}{k-2} \cdot \frac{k-2}{k-1} \cdot \frac{1}{k} = \frac{1}{(k-1)k}. \tag{1.21}$$
We can have $C_i^{(n)} < C_i^{(\infty)}$, due to $i$-spacings whose right end occurs after position $n+1$; the expected number of times that this occurs is
$$\sum_{k > n+1} \mathbb{P}[\xi_{k-i}\cdots\xi_k = 1\,0^{i-1}1] = \sum_{k > n+1} \frac{1}{(k-1)k} = \frac{1}{n+1}. \tag{1.22}$$
The only way that $C_i^{(n)} > C_i^{(\infty)}$ can occur is if the artificial 1 at position $n+1$ in $1\xi_2\cdots\xi_n 1$ creates a (single) extra $i$-spacing; for each $n$ this can occur for at most one $i$, and it occurs for each $1 \le i \le n$ with the same probability,
$$\mathbb{P}[\xi_{n-i+1}\cdots\xi_n\xi_{n+1} = 1\,0^{i-1}0] = \frac{1}{n-i+1} \cdot \frac{n-i+1}{n-i+2} \cdots \frac{n}{n+1} = \frac{1}{n+1}. \tag{1.23}$$
This allows us to prove the following lemma, due to Diaconis and Pitman (1986) and Barbour (1990).

Lemma 1.4
$$d_{TV}\big(\mathcal{L}(C_1^{(n)}, \ldots, C_b^{(n)}),\, \mathcal{L}(Z_1, \ldots, Z_b)\big) \le \frac{2b}{n+1}. \tag{1.24}$$

Proof. Equations (1.22) and (1.23) show that, for $1 \le i \le n$,
$$\mathbb{P}\big[C_i^{(n)} \ne C_i^{(\infty)}\big] \le \frac{2}{n+1},$$
so that
$$\mathbb{P}\big[(C_1^{(n)}, \ldots, C_b^{(n)}) \ne (C_1^{(\infty)}, \ldots, C_b^{(\infty)})\big] \le \frac{2b}{n+1}. \tag{1.25}$$
It follows that $(C_1^{(n)}, \ldots, C_b^{(n)}) \to_d (C_1^{(\infty)}, \ldots, C_b^{(\infty)})$ for each fixed $b$, and hence $(C_1^{(n)}, C_2^{(n)}, \ldots) \Rightarrow (C_1^{(\infty)}, C_2^{(\infty)}, \ldots)$. Comparison with Theorem 1.3 shows that the $C_i^{(\infty)}$ are therefore independent Poisson-distributed random variables, with $\mathbb{E} C_i^{(\infty)} = 1/i$. Thus the $C_i^{(\infty)}$ have the same distribution as the $Z_i$, and it follows from (1.25) that
$$d_{TV}\big(\mathcal{L}(C_1^{(n)}, \ldots, C_b^{(n)}),\, \mathcal{L}(Z_1, \ldots, Z_b)\big) = \inf_{\text{couplings}} \mathbb{P}\big[(C_1^{(n)}, \ldots, C_b^{(n)}) \ne (Z_1, \ldots, Z_b)\big] \le \frac{2b}{n+1}. \qquad \square$$

Remark. The proof of Lemma 1.4 shows that we may construct the independent Poisson random variables $Z_i$ via $Z_i = C_i^{(\infty)}$. Equations (1.22) and (1.23) then show that
$$\mathbb{E}\big|C_i^{(n)} - Z_i\big| \le \mathbb{P}\big[C_i^{(n)} > C_i^{(\infty)}\big] + \mathbb{E}\big(C_i^{(\infty)} - C_i^{(n)}\big)_+ \le \frac{2}{n+1},$$
so that
$$\mathbb{E}\sum_{i=1}^n \big|C_i^{(n)} - Z_i\big| \le \frac{2n}{n+1} < 2. \tag{1.26}$$
We will exploit this result in the coming sections.

The number of cycles

The distribution of the number $K_{0n} = C_1^{(n)} + \cdots + C_n^{(n)}$ of cycles in a random permutation is easily found. Since the number of permutations in $S_n$ having $k$ cycles is $|S_n(k)|$, the absolute value of a Stirling number of the first kind, we see that
$$\mathbb{P}[K_{0n} = k] = |S_n(k)|/n!, \qquad k = 1, 2, \ldots, n.$$
It follows that the probability generating function of $K_{0n}$ is given by
$$\mathbb{E}(u^{K_{0n}}) = \sum_{k=1}^n \mathbb{P}[K_{0n} = k]\, u^k = \frac{u^{(n)}}{n!} = \prod_{j=1}^n \Big(1 - \frac{1}{j} + u\,\frac{1}{j}\Big), \tag{1.27}$$
where we have defined $x^{(n)} = x(x+1)\cdots(x+n-1)$. Recall that if $\xi$ is a Bernoulli random variable with parameter $p$, so that $\mathbb{P}[\xi = 1] = p$ and $\mathbb{P}[\xi = 0] = 1 - p$, then $\mathbb{E} u^\xi = 1 - p + pu$. It follows that $K_{0n}$ is distributed as a sum of independent Bernoulli random variables $\xi_j$ that satisfy $\mathbb{P}[\xi_j = 1] = 1 - \mathbb{P}[\xi_j = 0] = 1/j$. The Feller coupling provides a construction of the $\xi_j$. We also note that
$$\mathbb{E}(K_{0n}) = \sum_{j=1}^n \frac{1}{j}, \qquad \mathrm{Var}(K_{0n}) = \sum_{j=1}^n \frac{j-1}{j^2}.$$
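The product form of the pgf (1.27) gives an easy exact algorithm for the distribution of $K_{0n}$: multiply out the $n$ linear factors. The sketch below (helper names `num_cycles` and `pgf_coeffs` are ours) does this with rational arithmetic and checks the result against an exhaustive count over $S_6$.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def num_cycles(perm):
    """Number of cycles of perm, where perm[i-1] is the image of i."""
    n = len(perm)
    seen, k = [False] * n, 0
    for start in range(n):
        if not seen[start]:
            k += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j] - 1
    return k

def pgf_coeffs(n):
    """[P[K_0n = k]]_{k=0..n} by expanding prod_j ((j-1)/j + u/j), eq. (1.27)."""
    coeffs = [Fraction(1)]
    for j in range(1, n + 1):
        a, b = Fraction(j - 1, j), Fraction(1, j)
        new = [Fraction(0)] * (len(coeffs) + 1)
        for k, ck in enumerate(coeffs):
            new[k] += a * ck
            new[k + 1] += b * ck
        coeffs = new
    return coeffs

n = 6
hist = Counter(num_cycles(p) for p in permutations(range(1, n + 1)))
probs = pgf_coeffs(n)
assert all(probs[k] == Fraction(hist[k], factorial(n)) for k in range(n + 1))
```

The coefficients are the Stirling numbers $|S_n(k)|/n!$, and the mean of the resulting distribution is the harmonic number $\sum_{j \le n} 1/j$, as noted above.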

Local limit theorems.

A number of authors have derived local limit theorems for the distribution of $K_{0n}$. The simplest of these follows immediately from the asymptotics of the Stirling numbers. Moser and Wyman (1958) demonstrated that, for $k = o(\log n)$,
$$|S_n(k)| \sim (n-1)!\, (\gamma + \log n)^{k-1}/(k-1)!, \tag{1.28}$$
from which it follows that, for $k = o(\log n)$,
$$\mathbb{P}[K_{0n} = k] \sim \frac{(\log n)^{k-1} e^{-\log n}}{(k-1)!}. \tag{1.29}$$
In particular, for $k$ fixed, asymptotics similar to (1.29) hold for all the logarithmic structures considered in this book; see Theorem 5.9. Furthermore, conditional on having $k$ components, for fixed $k$, the joint distribution of the sizes of those $k$ components has a universal limit; see Theorems 4.20 and 5.9.

To address the case $k \sim \beta \log n$, Hwang (1995) showed that
$$\frac{|S_n(k)|}{n!} = \frac{(\log n)^{k-1} e^{-\log n}}{(k-1)!} \Big\{\frac{1}{\Gamma(1+r)} + O\Big(\frac{k}{(\log n)^2}\Big)\Big\}, \tag{1.30}$$
where $r = (k-1)/\log n$, uniformly over $k$ in the range $2 \le k \le \eta \log n$, for any $\eta > 0$. When $\beta = 1$, we recover the local central limit theorem due to Kolchin (1971) and Pavlov (1988).

Central limit theorems.

Several authors have studied the asymptotic distribution of $K_{0n}$. Goncharov (1944) and Shepp and Lloyd (1966) used generating functions to show that
$$\frac{K_{0n} - \log n}{\sqrt{\log n}} \to_d N(0, 1) \tag{1.31}$$
as $n \to \infty$, $N(0,1)$ denoting the normal distribution with mean 0 and variance 1. Feller (1945) and Rényi (1962) established this by using other Bernoulli representations and applying the Lindeberg-Feller central limit theorem. Kolchin (1971, 1986) uses a representation in terms of random allocations of particles into cells.

Remark. We note that the Bernoulli representation for $K_{0n}$ can be exploited to establish the Poisson approximation (Barbour and Hall, 1984)
$$d_{TV}\big(\mathcal{L}(K_{0n}),\, \mathrm{Po}(\mathbb{E} K_{0n})\big) \asymp \frac{1}{\log n},$$
a result that is much stronger than (1.31).

There is also a functional central limit theorem describing the counts of cycles of sizes up to $n^t$, $0 \le t \le 1$. Define the process $B_n(\cdot)$ by
$$B_n(t) = \frac{\sum_{j=1}^{\lfloor n^t \rfloor} C_j^{(n)} - t \log n}{\sqrt{\log n}}, \qquad 0 \le t \le 1.$$
DeLaurentis and Pittel (1983) showed that $B_n(\cdot) \to_d B(\cdot)$ as $n \to \infty$, where $B$ is standard Brownian motion.

A theme of this monograph is that such results may be guessed, and proved, using comparison with a process of independent components. For example, we can write
$$\frac{\sum_{j=1}^{\lfloor n^t \rfloor} C_j^{(n)} - t\log n}{\sqrt{\log n}} = \frac{\sum_{j=1}^{\lfloor n^t \rfloor} Z_j - t\log n}{\sqrt{\log n}} + \frac{\sum_{j=1}^{\lfloor n^t \rfloor} \big(C_j^{(n)} - Z_j\big)}{\sqrt{\log n}}.$$
The first term on the right, a functional of independent random variables, is readily shown to be asymptotically standard Brownian motion. The second, an error term, tends to 0 in probability, as is readily shown using (1.26).

Another theme of this book is the development of bounds for such approximations. We show in Theorem 3.5 that $C^{(n)}$ and $B$ can be constructed on the same probability space in such a way that
$$\mathbb{E}\Big[\sup_{0 \le t \le 1} |B_n(t) - B(t)| \wedge 1\Big] = O\Big(\frac{\log\log n}{\sqrt{\log n}}\Big).$$

Non-linear functionals.

For another application, consider the difference $D_n$ between the number of cycles $K_{0n}$ and the number of distinct cycle lengths. Wilf (1983) showed that
$$\mathbb{E} D_n \to \sum_{j \ge 1} \Big(\frac{1}{j} - 1 + \exp(-1/j)\Big).$$
Clearly,
$$D_n = \sum_{j \le n} \big(C_j^{(n)} - 1\big)_+,$$
and the heuristic suggests that, as $n \to \infty$,
$$D_n \to_d D = \sum_{j \ge 1} (Z_j - 1)_+, \qquad \mathbb{E} D_n \to \mathbb{E} D; \tag{1.32}$$
this is proved in Arratia and Tavaré (1992b, Theorem 9). It follows from this result that the number of distinct cycle lengths also asymptotically has a normal distribution with mean and variance $\log n$.

The small cycles

There is also an extensive theory for the smallest and largest cycles. Let $Y_r^{(n)}$ be the length of the $r$th smallest cycle (defined to be $+\infty$ if $K_{0n} < r$), and observe that
$$Y_r^{(n)} > l \quad \text{if and only if} \quad \sum_{j=1}^l C_j^{(n)} < r.$$
Then the independent process approximation heuristic suggests that we have $Y_r^{(n)} \to_d Y_r$ for each fixed $r$, where
$$\mathbb{P}[Y_r > l] = \mathbb{P}\Big[\sum_{j=1}^l Z_j < r\Big].$$
Since the sum of independent Poisson random variables also has a Poisson distribution, this probability is just $\mathrm{Po}(h(l+1))\{[0, r-1]\}$, the probability assigned to the set $\{0, 1, \ldots, r-1\}$ by the Poisson distribution $\mathrm{Po}(h(l+1))$, where
$$h(l+1) = 1 + \frac{1}{2} + \cdots + \frac{1}{l},$$
the $l$-th harmonic number. This was established by analytical methods in Shepp and Lloyd (1966); it also follows directly from (1.24). The joint distribution of the $r$ smallest cycles may be approximated in the same way; see Kolchin (1986) and Arratia and Tavaré (1992b) for example.

Instead of considering the limit distribution for small cycles, one may consider large deviations: what if the smallest cycle is large? For fixed $u \ge 1$, as $n \to \infty$,
$$\mathbb{P}[Y_1^{(n)} > n/u] \sim \frac{u\,\omega(u)}{n}, \tag{1.33}$$
where $\omega$ is Buchstab's function. The local version also holds: for $u \ge 2$ and $k \sim n/u$,
$$\mathbb{P}[Y_1^{(n)} = k] \sim \frac{u^2\, \omega(u-1)}{n^2}; \tag{1.34}$$
see Panario and Richmond (2001).

Buchstab's function $\omega$ (Buchstab, 1937) is continuous on $[1, \infty)$, with $(u\omega(u))' = \omega(u-1)$ for $u > 2$, $u\omega(u) = 1$ for $1 \le u \le 2$, and $\omega(u) \to e^{-\gamma}$ as $u \to \infty$; see Tenenbaum (1995). Furthermore, for $u > 2$,
$$\omega(u) = u^{-1}\Big\{1 + \sum_{2 \le k \le u} \frac{1}{k!} \int\!\cdots\!\int_{J_k(u)} \frac{dy_1 \cdots dy_{k-1}}{y_1 \cdots y_{k-1}(1 - y_1 - \cdots - y_{k-1})}\Big\},$$
where
$$J_k(u) = \{u^{-1} < y_i < 1,\ 1 \le i \le k-1;\ u^{-1} < 1 - y_1 - \cdots - y_{k-1} < 1\}.$$
For further details see Chapters 4.3 and 4.7.

The large cycles

We turn next to the longest cycles. Let $L_1^{(n)}$ denote the length of the longest cycle in an $n$-permutation. Goncharov (1944) and Shepp and Lloyd (1966) showed that $n^{-1} L_1^{(n)} \to_d L_1$ as $n \to \infty$. The distribution of $L_1$ is determined by the Dickman function $\rho$, studied by Dickman (1930) to describe the largest prime factor of a random integer. The function $\rho$ is characterized as the unique solution of the equation
$$u\rho'(u) + \rho(u - 1) = 0, \qquad u > 0,$$
satisfying $\rho(u) = 0$, $u < 0$, and $\rho(u) = 1$, $0 \le u \le 1$. The function $\rho$ is also given by
$$\rho(u) = 1 + \sum_{k \ge 1} \frac{(-1)^k}{k!} \int\!\cdots\!\int_{I_k(u)} \frac{dy_1 \cdots dy_k}{y_1 \cdots y_k}, \tag{1.35}$$
where $I_k(u) = \{uy_1 > 1, \ldots, uy_k > 1,\ y_1 + \cdots + y_k < 1\}$. Writing $F_1$, $f_1$ for the distribution function and density of $L_1$ respectively, we have
$$F_1(1/u) = \rho(u); \qquad f_1(x) = \frac{1}{x} F_1\Big(\frac{x}{1-x}\Big), \qquad u, x > 0.$$
See Chapters 4.2 and 4.9 for further details. Dickman evaluated $\mathbb{E} L_1$ as approximately 0.6243, and Golomb (1964) noted that
$$\frac{\mathbb{E} L_1^{(n)}}{n} \downarrow \mathbb{E} L_1, \qquad \frac{\mathbb{E} L_1^{(n)}}{n+1} \uparrow \mathbb{E} L_1.$$
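The delay equation $u\rho'(u) = -\rho(u-1)$ makes $\rho$ easy to tabulate numerically, since the delayed value on the right is always already known. The sketch below (the function name `dickman` is ours) integrates it with a trapezoidal step and checks against the closed form $\rho(u) = 1 - \log u$ on $[1, 2]$.

```python
import math

def dickman(u_max, h=1e-4):
    """Tabulate Dickman's rho on [0, u_max] by integrating the delay ODE
    u rho'(u) = -rho(u-1) with a trapezoidal step; rho = 1 on [0, 1]."""
    n = int(round(u_max / h))
    one = int(round(1 / h))
    rho = [1.0] * (n + 1)           # rho[i] ~ rho(i*h); exact on [0, 1]
    for i in range(one, n):
        u = i * h
        f_i = -rho[i - one] / u
        f_next = -rho[i + 1 - one] / (u + h)  # delayed value already computed
        rho[i + 1] = rho[i] + 0.5 * h * (f_i + f_next)
    return rho

h = 1e-4
rho = dickman(3.0, h)
# on [1, 2] the solution is rho(u) = 1 - log u; in particular rho(2) = 1 - log 2
assert abs(rho[int(2 / h)] - (1 - math.log(2))) < 1e-5
```

Continuing the table to larger $u$ shows the extremely rapid decay of $\rho$, which is what makes permutations with no long cycle so rare.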

Now define $L_r^{(n)}$ to be the length of the $r$th longest cycle in the permutation, setting $L_r^{(n)} = 0$ if the permutation has fewer than $r$ cycles. Kingman (1977) and Vershik and Shmidt (1977) showed that
$$n^{-1}\big(L_1^{(n)}, L_2^{(n)}, \ldots\big) \to_d (L_1, L_2, \ldots) \quad \text{in } \mathbb{R}^\infty; \tag{1.36}$$
that is, for each fixed $r \ge 1$,
$$n^{-1}\big(L_1^{(n)}, \ldots, L_r^{(n)}\big) \to_d (L_1, \ldots, L_r),$$
where the random vector $(L_1, L_2, \ldots)$ has a distribution known as the Poisson-Dirichlet distribution with parameter $\theta = 1$. See Chapter 4.9 for another proof. Error bounds for this approximation can be derived under a number of metrics; cf. Theorem 3.5. For example, it is shown in Arratia, Barbour and Tavaré (1997b) that $L^{(n)}$ and $L$ can be constructed on a common probability space in such a way that
$$\mathbb{E}\sum_{j \ge 1} \big|n^{-1} L_j^{(n)} - L_j\big| \sim \frac{\log n}{4n},$$
and furthermore that no construction can achieve a better rate. The joint density $f_{1(r)}$ of $(L_1, \ldots, L_r)$ is given by
$$f_{1(r)}(x_1, \ldots, x_r) = \frac{1}{x_1 \cdots x_r}\, \rho\Big(\frac{1 - x_1 - \cdots - x_r}{x_r}\Big), \qquad 1 > x_1 > \cdots > x_r > 0.$$
The joint moments of $(L_1, \ldots, L_r)$ are given by
$$\mathbb{E}\big(L_1^{j_1} \cdots L_r^{j_r}\big) = \frac{1}{j!} \int_{y_1 > \cdots > y_r > 0} y_1^{j_1 - 1} \cdots y_r^{j_r - 1} \exp\Big(-\sum_{l=1}^r y_l - E_1(y_r)\Big)\, dy_1 \cdots dy_r,$$
where $j = j_1 + \cdots + j_r$ and
$$E_1(s) = \int_s^\infty w^{-1} e^{-w}\, dw.$$
From this follows Shepp and Lloyd's (1966) result that
$$\mathbb{E} L_r^j = \int_0^\infty \frac{y^{j-1}}{j!}\, \frac{E_1(y)^{r-1}}{(r-1)!}\, \exp\big(-y - E_1(y)\big)\, dy;$$
in particular, $\mathrm{Var}(L_1) \doteq 0.0369$. Many other results about the Poisson-Dirichlet family are collected in Chapter 4.11.

The age-ordered list of cycle lengths

In the Feller coupling, the sequence $\xi_1\xi_2\cdots\xi_n \in \{0,1\}^n$ specifies not only the cycle structure via (1.19), but also an ordered list of cycle lengths $A_1^{(n)}, A_2^{(n)}, \ldots$, where $A_j^{(n)}$ is the length of the $j$th cycle in the canonical cycle notation, with $A_j^{(n)} = 0$ if $j$ is greater than the number of cycles of the random $n$-permutation. This list $A_1^{(n)}, A_2^{(n)}, \ldots$ is called the age-ordered process of cycle lengths. It may also be described as a size-biased permutation of the multiset of cycle lengths; see Chapter 4.11.

The length $A_1^{(n)}$ of the first cycle is
$$A_1^{(n)} = n + 1 - \max\{i \le n : \xi_i = 1\},$$
which is exactly uniformly distributed on the set $\{1, 2, \ldots, n\}$, since, for $1 \le k \le n$,
$$\mathbb{P}[A_1^{(n)} = k] = \mathbb{P}[\xi_{n-k+1} = 1,\ \xi_{n-k+2} = \cdots = \xi_{n-1} = \xi_n = 0] = \frac{1}{n-k+1} \cdot \frac{n-k+1}{n-k+2} \cdots \frac{n-2}{n-1} \cdot \frac{n-1}{n} = \frac{1}{n}. \tag{1.37}$$

The procedure for canonical cycle notation is recursive: if $a_1$ elements are used in the first cycle of a random $n$-permutation, then the remaining cycles are produced, in order, as the canonical cycle notation of a random permutation of the remaining $n - a_1$ elements. It follows from this (and it also follows directly by calculation with $\xi_1, \xi_2, \ldots$ as in (1.37)) that, for each $j \ge 1$, conditional on $A_1^{(n)} = a_1, \ldots, A_{j-1}^{(n)} = a_{j-1}$ and writing $m = a_1 + \cdots + a_{j-1}$, the distribution of $A_j^{(n)}$ is uniform on the set $\{1, 2, \ldots, n - m\}$, with the interpretation that, for $m = n$, instead of the uniform distribution on the empty set, we have the constant zero with probability one. This conditional uniformity may also be expressed as follows: for any $j \le n$ and for any $a_1, a_2, \ldots, a_j \ge 1$ with $a_1 + \cdots + a_j \le n$,
$$\mathbb{P}[A_1^{(n)} = a_1, \ldots, A_j^{(n)} = a_j] = \frac{1}{n(n - a_1) \cdots (n - a_1 - \cdots - a_{j-1})}. \tag{1.38}$$
Note that, if $U_j$ is uniformly distributed on $[0,1]$, then, for any $0 \le m \le n$, the random variable $\lceil (n-m) U_j \rceil$ is uniform on $\{1, 2, \ldots, n-m\}$. Hence, with $U_1, U_2, \ldots$ independent and all uniformly distributed on $[0, 1]$, the process $(A_1^{(n)}, A_2^{(n)}, \ldots)$ has the same distribution as
$$\big(\lceil n U_1 \rceil,\ \lceil (n - \lceil n U_1 \rceil) U_2 \rceil,\ \ldots\big), \tag{1.39}$$
for any $n \ge 1$. Thus the simple observation that, for all $(u_1, u_2, \ldots) \in \mathbb{R}^\infty$,
$$\lim_{n \to \infty} n^{-1}\big(\lceil n u_1 \rceil,\ \lceil (n - \lceil n u_1 \rceil) u_2 \rceil,\ \ldots\big) = \big(u_1,\ (1 - u_1) u_2,\ \ldots\big)$$
shows that the distributional limit of the process $(A_1^{(n)}, A_2^{(n)}, \ldots)$ normalized by $n$ is given by
$$n^{-1}\big(A_1^{(n)}, A_2^{(n)}, \ldots\big) \to_d (A_1, A_2, \ldots),$$
where
$$A_1 = U_1, \qquad A_r = U_r \prod_{j=1}^{r-1} (1 - U_j), \qquad r \ge 1.$$
Note the structural simplicity of this law of $(A_1, A_2, \ldots)$, known as the GEM distribution with parameter $\theta = 1$, compared to the Poisson-Dirichlet law; for further details, see Chapters 4.8 and 4.11.
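The product formula (1.38) can be verified exhaustively through the Feller coupling: enumerate all choice tuples $(D_1, \ldots, D_5)$, read off the age-ordered lengths from the positions of the ones, and compare frequencies. A sketch (the helper name `age_ordered` is ours):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def age_ordered(xi):
    """Age-ordered cycle lengths from xi = (xi_1,...,xi_n): the first length is
    the spacing from the rightmost 1 to position n+1, the next is the spacing
    to the next 1 to the left, and so on; xi_1 = 1 always."""
    n = len(xi)
    ones = [i + 1 for i, b in enumerate(xi) if b]
    lengths, right = [], n + 1
    for pos in reversed(ones):
        lengths.append(right - pos)
        right = pos
    return tuple(lengths)

n = 5
tally, total = Counter(), 0
for d in product(*[range(1, i + 1) for i in range(1, n + 1)]):
    xi = tuple(1 if di == 1 else 0 for di in d)
    tally[age_ordered(xi)] += 1
    total += 1
# check (1.38) for complete lists (a_1 + ... + a_j = n):
# P = 1 / (n (n - a_1) ... (n - a_1 - ... - a_{j-1}))
for a, hits in tally.items():
    prob, used = Fraction(1), 0
    for k in a:
        prob /= n - used
        used += k
    assert Fraction(hits, total) == prob
```

For instance, the list $(5)$ (one 5-cycle first) occurs with probability $1/5$, matching the uniformity of $A_1^{(n)}$.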

The Erdős-Turán Law

One of the most beautiful results about random permutations concerns the asymptotic distribution of the order $O_n$, the least common multiple of the cycle lengths. Erdős and Turán (1965) showed that, as $n \to \infty$,
$$\frac{\log O_n - \tfrac{1}{2}\log^2 n}{\sqrt{\tfrac{1}{3}\log^3 n}} \to_d N(0, 1).$$
Erdős and Turán noted that

"Our proof is a direct one and rather long; but a first proof can be as long as it wants to be. It would be however of interest to deduce it from the general principles of probability theory."

There are now several probabilistic proofs of this result, among them those of Best (1970), Kolchin (1977), Bovey (1980), Pavlov (1980), Nicolas (1985), DeLaurentis and Pittel (1985) and Stein. Arguably the simplest proof of the normal limit law appears in Arratia and Tavaré (1992b), where the Feller coupling is used. A more detailed description may be found in Chapter 4.10.


This coupling is further exploited in Barbour and Tavaré (1994), where it is proved that
$$\sup_x \bigg| \mathbb{P}\bigg[\frac{\log O_n - \tfrac{1}{2}\log^2 n + \log n \log\log n}{\sqrt{\tfrac{1}{3}\log^3 n}} \le x\bigg] - \Phi(x) \bigg| = O\Big(\frac{1}{\sqrt{\log n}}\Big),$$
where $\Phi$ denotes the distribution function of a $N(0,1)$ random variable. See Chapter 7.4 for further details.

Random permutations continue to be the subject of active research interest. Connections between the eigenvalues of a random permutation matrix (which are determined by the cycle structure of the permutation), the eigenvalues of a random unitary matrix and the Riemann zeta function are uncovered by Wieand (1998, 2000) and Hambly et al. (1999).

More on the Conditioning Relation

We close this introductory section with some observations about the Conditioning Relation (1.15). Suppose that the $Z_j$, $j \ge 1$, are independent Poisson random variables with $\mathbb{E} Z_j = x^j/j$ for any fixed $x > 0$, and not necessarily $x = 1$, as in (1.15). Then, as calculated below in (2.6)-(2.8),
$$\mathcal{L}^{(x)}\big((Z_1, \ldots, Z_n) \mid T_{0n} = n\big) = \mathcal{L}\big(C_1^{(n)}, \ldots, C_n^{(n)}\big) \tag{1.40}$$
is true, irrespective of the value of $x$. Thus the Conditioning Relation (1.15) does not uniquely specify the distributions of the $Z_j$, $j \ge 1$. Now, for $x < 1$, it is also true that the random variable $T_{0\infty}$ is almost surely finite, and since, for $c_1, \ldots, c_n$ satisfying $\sum_{j=1}^n j c_j = n$,
$$\begin{aligned}
\mathbb{P}[Z_1 = c_1, \ldots, Z_n = c_n,\, T_{0\infty} = n]/\mathbb{P}[T_{0\infty} = n]
&= \frac{\mathbb{P}[Z_1 = c_1, \ldots, Z_n = c_n,\, T_{0n} = n;\ Z_j = 0,\ j > n]}{\mathbb{P}[T_{0n} = n;\ Z_j = 0,\ j > n]} \\
&= \mathbb{P}[Z_1 = c_1, \ldots, Z_n = c_n,\, T_{0n} = n]/\mathbb{P}[T_{0n} = n],
\end{aligned}$$
by independence and the definition of $T_{0n}$, it follows that, for $x < 1$,
$$\mathcal{L}^{(x)}\big((Z_1, \ldots, Z_n) \mid T_{0\infty} = n\big) = \mathcal{L}\big(C_1^{(n)}, \ldots, C_n^{(n)}\big) \tag{1.41}$$
also. The relation (1.41) was exploited by Shepp and Lloyd (1966). The advantage of using (1.40) over (1.41) is that it allows the use of $x = 1$. Further connections between the two versions of the conditioning relation, (1.40) and (1.41), are given in Arratia, Barbour and Tavaré (1999a). For $x = 1$, $\mathbb{E} T_{0n} = n$, and the conditioning event $\{T_{0n} = n\}$ has relatively large probability, of order $n^{-1}$, improving the precision of the results derived from the Conditioning Relation.

For our purposes, the Conditioning Relation (1.15) with $\mathbb{E} Z_j \sim \theta/j$, which applies for a wide variety of combinatorial objects (once the appropriate independent $Z_j$ have been identified), is the important one. We exploit it to prove results analogous to those discussed here for a much broader class of combinatorial structures, using a probabilistic rather than analytic approach. One of our aims is to exploit this approach to provide bounds on rates of convergence for many of the limit laws discussed above.

1.2 Random integers and their prime factors

This section on prime factorization is deliberately written in parallel to the previous section on random permutations, in order to bring out the similarities and differences between these fundamental examples. We impose a notation which is natural for extending the analogy between integers and decomposable combinatorial structures, and we also indicate the standard number-theoretic notation. A useful reference for results in this section is Tenenbaum (1995).

Any integer decomposes uniquely as a product of primes. For example, the integer $m = 220$ has two factors of 2, one factor of 5, and one factor of 11. To specify the factorization of an integer, one can specify the multiplicity $c_p$ of $p$ for each prime $p$; when discussing number theory, the dummy variable $p$ usually denotes a prime. Thus the "type" of an integer $m$ is given by a vector $c = (c_2, c_3, c_5, \ldots) \in \mathbb{Z}_+^\infty$, with $m = \prod p^{c_p}$. Our example $m = 220$ matches the example of a permutation on page 3, in that both have type $c = (2, 0, 1, 0, 1, 0, 0, \ldots)$.

The joint distribution of prime factor counts

In contrast to Cauchy’s formula for the number of permutations of a given type, the number of integers in [n] having type c is one or zero, depending on whether or not ∏_{p≤n} p^{cp} ≤ n. We write [n] for the set of integers {1, 2, . . . , n}, and choose a random integer N = N(n) uniformly at random from [n]. The multiplicities Cp(n) of the primes p as factors of N(n) are random variables. Note that if p > n, then Cp(n) = 0. Thus

N(n) = ∏_{p≤n} p^{Cp(n)} = ∏_p p^{Cp(n)}

is uniformly distributed from 1 to n, and the joint distribution of C(n) = (C2(n), C3(n), . . .) is given by

IP[C(n) = c] = (1/n) 1l{∑_p cp log p ≤ log n}, c ∈ ZZ∞+. (1.42)


The marginal distribution of prime factor counts

The marginal distribution of Cp(n) for primes is notably simpler than the corresponding expression (1.7) for random permutations. For k ≥ 0, the event that Cp(n) ≥ k is exactly the event that N(n) is a multiple of p^k, with ⌊n/p^k⌋ possible values for N(n) ∈ [n], so that for k = 0, 1, 2, . . .,

IP[Cp(n) ≥ k] = (1/n) ⌊n/p^k⌋

and hence, by differencing,

IP[Cp(n) = k] = (1/n) (⌊n/p^k⌋ − ⌊n/p^{k+1}⌋). (1.43)

The joint distribution of an initial segment of the coordinates can also be simply expressed, in terms of its upper tail probabilities: with π(b) denoting the number of primes less than or equal to b, for any c = (cp, p ≤ b) ∈ ZZ^{π(b)}+,

IP[Cp(n) ≥ cp, p ≤ b] = (1/n) ⌊n/d⌋, with d = ∏_{p≤b} p^{cp}. (1.44)

The point probabilities for this joint distribution are then given by differencing, with 2^{π(b)} terms. Expressions for the moments and joint moments are complicated, and do not seem useful, in contrast with (1.5) and (1.4) for random permutations.
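These floor-function identities can be checked by brute-force enumeration. A minimal Python sketch (ours, not from the book), counting integers in [n] with a given p-multiplicity:

```python
def multiplicity(m, p):
    """Exponent of the prime p in the factorization of m."""
    c = 0
    while m % p == 0:
        m //= p
        c += 1
    return c

def count_exact(n, p, k):
    """Number of integers in [1, n] whose p-multiplicity is exactly k."""
    return sum(1 for m in range(1, n + 1) if multiplicity(m, p) == k)

# (1.43): n * IP[Cp(n) = k] = floor(n/p^k) - floor(n/p^(k+1))
n = 1000
for p in (2, 3, 5, 7):
    for k in range(6):
        assert count_exact(n, p, k) == n // p**k - n // p**(k + 1)
```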

The limit distribution of prime factor counts

It follows immediately from (1.43) that, for any prime p and for any k ∈ ZZ+,

IP[Cp(n) = k] → (1 − 1/p)(1/p)^k as n → ∞,

so that Cp(n) converges in distribution to a random variable Zp having a geometric distribution with parameter 1/p. Similarly, from the differenced version of (1.44), it follows that for c = (cp, p ≤ b) ∈ ZZ^{π(b)}+, as n → ∞,

IP[Cp(n) = cp, p ≤ b] → ∏_{p≤b} (1 − 1/p)(1/p)^{cp}, (1.45)

where the right side is exactly IP[Zp = cp, p ≤ b], taking the geometrically distributed Zp to be independent. Thus, as n → ∞,

(C2(n), C3(n), C5(n), . . .) →d (Z2, Z3, Z5, . . .) in ZZ∞+.

This can be found explicitly in Billingsley (1974); the result has been well known for a long time, and it would be interesting to find the earliest explicit statement of this convergence in distribution.
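The geometric limit is also visible numerically: the exact point probability from (1.43) differs from (1 − 1/p)(1/p)^k by less than 2/n, since each floor is off by less than one. A quick Python check (our sketch):

```python
def point_prob(n, p, k):
    """Exact IP[Cp(n) = k], computed from formula (1.43)."""
    return (n // p**k - n // p**(k + 1)) / n

n = 10**6
for p in (2, 3, 5):
    for k in range(5):
        geometric = (1 - 1/p) * (1/p)**k  # limiting geometric pmf
        assert abs(point_prob(n, p, k) - geometric) < 2 / n
```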


Error rates

We see from (1.24) that, for permutations, the total variation distance dTV(L(C1(n), . . . , Cb(n)), L(Z1, . . . , Zb)) → 0 if b/n → 0. The corresponding result for primes, known in number theory as the “fundamental lemma of Kubilius”, is that

dTV(L((Cp(n), p ≤ b)), L((Zp, p ≤ b))) → 0 if (log b)/(log n) → 0. (1.46)

Writing u = log n/log b, and denoting the total variation distance in (1.46) by d(b, n), Kubilius (1964) gave an upper bound of the form d(b, n) = O(e^{−cu}) for some c > 0, and Barban and Vinogradov (1964) improved this to d(b, n) = O(e^{−cu log u}) for some c > 0. See also Elliott (1979), Arratia and Stark (1996) and Tenenbaum (1997).

The number of prime factors

Corresponding to the total number of cycles K0n and the number of distinct cycle lengths K0n − Dn for a random permutation, we have the total number of prime factors of m, denoted by Ω(m) for counting with multiplicity, and ω(m) for counting without multiplicity. Thus, we write

K0n = ∑_p Cp(n) = Ω(N(n)), K0n − Dn = ∑_p (Cp(n) ∧ 1) = ω(N(n)).
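In code, Ω and ω are immediate from the prime factorization; a small Python sketch (ours), using the running example m = 220 = 2^2 · 5 · 11:

```python
def prime_multiplicities(m):
    """Return {p: c_p} for the prime factorization of m, by trial division."""
    factors = {}
    d = 2
    while d * d <= m:
        while m % d == 0:
            factors[d] = factors.get(d, 0) + 1
            m //= d
        d += 1
    if m > 1:
        factors[m] = factors.get(m, 0) + 1
    return factors

def big_omega(m):
    """Omega(m): number of prime factors counted with multiplicity."""
    return sum(prime_multiplicities(m).values())

def little_omega(m):
    """omega(m): number of distinct prime factors."""
    return len(prime_multiplicities(m))

assert prime_multiplicities(220) == {2: 2, 5: 1, 11: 1}
assert big_omega(220) == 4 and little_omega(220) == 3
```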

Local limit theorems.

Landau (1909) showed that for fixed k = 1, 2, . . .,

IP[K0n = k] ∼ (1/log n) (log log n)^{k−1}/(k − 1)!, (1.47)

the case k = 1 being the prime number theorem. This suggests a comparison between K0n − 1 and a Poisson random variable with parameter log log n. Such a comparison can be carried out using the Selberg–Delange method, from Selberg (1954) and Delange (1954, 1971). For example, in our notation, Tenenbaum (1995, Theorem II.6.5, formula (20)) is the statement that, for any δ > 0,

IP[K0n = k] = e^{−log log n} ((log log n)^{k−1}/(k − 1)!) { ν((k − 1)/log log n) + O(k/(log log n)^2) },

uniformly in n ≥ 3 and 1 ≤ k ≤ (2 − δ) log log n, where

ν(z) = (1/Γ(z + 1)) ∏_p (1 − z/p)^{−1} (1 − 1/p)^z.


Since ν is continuous, the uniform estimate above implies that the point probabilities for K0n − 1 are asymptotic to the corresponding point probabilities for a Poisson random variable with mean log log n, in the range k = o(log log n) using ν(0) = 1, and in the range k ∼ log log n using ν(1) = 1.

Central limit theorems.

Corresponding to the central limit theorem for the number of cycles of a random permutation in (1.31) is the celebrated central limit theorem of Erdős and Kac (1940); see also Kac (1959), and Billingsley (1969) for an easy proof by the method of moments. The theorem was originally given for the number of distinct prime divisors rather than the number with multiplicity, but as the difference Dn has a limit distribution, so that Dn/√(log log n) →d 0, the two versions of the central limit theorem are easily seen to be equivalent. Thus, taking a small liberty, the Erdős–Kac central limit theorem is the following analog of (1.31): for primes, as n → ∞,

(K0n − log log n)/√(log log n) →d N(0, 1). (1.48)

There is also a functional central limit theorem describing the counts of prime factors p of sizes log p up to (log n)^t, 0 ≤ t ≤ 1. Define the process Bn(·) by

Bn(t) = (∑_{log p ≤ (log n)^t} Cp(n) − t log log n)/√(log log n), 0 ≤ t ≤ 1.

Philipp (1973) and Billingsley (1974) showed that Bn(·) →d B(·) as n → ∞, where B is standard Brownian motion. A proof of this, with an error bound of order O(log log log n/√(log log n)), is given in Arratia (1996).

Non-linear functionals.

Just as for permutations (1.32), the difference Dn between the number of prime factors, with and without multiplicity, has a limit distribution, as discussed by Rényi (1955). It follows from the lemma of Kubilius, together with a truncation argument, that

Dn = ∑_p (Cp(n) − 1)+ →d ∑_p (Zp − 1)+,

where the limit distribution has finite mean:

IE ∑_p (Zp − 1)+ = ∑_p ∑_{k≥2} p^{−k}.


The small and large prime factors

The smallest and largest prime factors of the integer m are denoted by P−(m) and P+(m), with the conventions that P−(1) = ∞ and P+(1) = 1. Thus the smallest and largest prime factors of our random integer N(n) are Y1(n) = P−(N(n)) and L1(n) = P+(N(n)). Number theorists write Φ(x, y) for the number of integers less than or equal to x with smallest prime factor strictly larger than y, and Ψ(x, y) for the number of integers less than or equal to x with largest prime factor less than or equal to y. Corresponding to (1.33) is the result that for u > 1, as n → ∞,

IP[Y1(n) > n^{1/u}] = (1/n) Φ(n, n^{1/u}) ∼ u ω(u)/log n,

and corresponding to (1.36), restricted to the first coordinate, is the result that for u ≥ 1, as n → ∞,

IP[L1(n) ≤ n^{1/u}] = (1/n) Ψ(n, n^{1/u}) → ρ(u).

For the full analog of (1.36), write Lr(n) for the rth largest prime factor of N(n), so that, in the example of N(n) = 220, we have L1(n) = 11, L2(n) = 5, L3(n) = L4(n) = 2, and Lr(n) = 0 for r > 4. Billingsley (1972) showed the Poisson–Dirichlet limit

(log n)^{−1} (log L1(n), log L2(n), . . .) →d (L1, L2, . . .). (1.49)

Donnelly and Grimmett (1993) use a size-biased permutation of the process of large prime factors, and a comparison with the GEM process, to give another proof of (1.49). This size-biased permutation was also used by Bach (1984) to give an algorithm to find a uniformly distributed random integer, factored into primes; the GEM distribution is implicit in this work, in that the first component has a uniformly distributed size, and there is a recursive structure.

Error bounds for the approximation of the cumulative distribution function implicit in (1.49) are given in Knuth and Trabb Pardo (1976) for the rth coordinate by itself, and in Tenenbaum (1999) for the first r coordinates jointly. It is shown in Arratia (1998) that L(n) and L can be constructed on a common probability space in such a way that

IE ∑_{j≥1} |(log n)^{−1} log Lj(n) − Lj| = O(log log n/log n),

but it is conjectured that an optimal construction can achieve O(1/log n).


1.3 Contrasts between permutations and primes

We have focussed on parallels between the cycle type of a random permutation and the prime factorization of a random integer, which reflect a strong structural similarity between the two settings. However, there are important differences. These can be identified by comparing Cauchy’s formula (1.2) and its corollary, the conditioning relation (1.15), for random n-permutations, with the corresponding expression (1.42) for uniformly distributed random integers not exceeding n. Within these formulas, we focus on the arguments in the indicator function of an admissible type c:

∑_{j=1}^{n} j cj = n for permutations, ∑_p cp log p ≤ log n for primes. (1.50)

The first difference is that component labels change from positive integers i to primes p. More precisely, the possible component sizes, which show up as the weights on the left sides of the expressions in (1.50), are 1, 2, 3, . . . for permutations, and log 2, log 3, log 5, . . . for primes.

The second difference is seen in the right sides of the expressions in (1.50): n for permutations, and log n for primes. We view this as the “system size”. A random n-permutation obviously has size s = n, but at first it seems perverse to view a random integer less than or equal to n as a system of size s = log n. Knuth and Trabb Pardo (1976) make this seem natural by referring to a random integer at most s digits long. Indeed it is natural to consider log n as the system size when considering a random integer from 1 to n as a multiset; one picks an integer by picking a multiset of primes, and if the prime p is an object of weight log p, then an integer not exceeding n is a multiset of weight not exceeding log n. This point of view helps to explain the systematic appearance of an extra “log” in every result about primes, when compared to the corresponding result about permutations. Thus, for example, the Hardy–Ramanujan theorem, that a typical integer n has around log log n prime divisors, is like the statement that a random permutation of n objects typically has about log n cycles. Both statements say that a system of size s has typically about log s components.

The third difference between permutations and primes is seen in the relation appearing in the expressions in (1.50): equality for permutations, and inequality for primes. For random n-permutations, the particular random choice always has size n, but for prime factorizations, the size of a particular random choice N(n) within the system of size n is not constant, but rather

log N(n) = ∑ Cp(n) log p,

which is uniformly distributed over the set {0, log 2, log 3, log 4, . . . , log n}. Since conditioning on the value of a weighted sum T0n of independent random variables is central to this book, we now explain how the analogy between permutations and primes even includes the conditioning. The analog of T0n = ∑_{1≤i≤n} i Zi for random permutations is

Tn = ∑_{p≤n} Zp log p.

This is the logarithm of a random integer M = M(n) distributed nonuniformly, namely

M(n) = exp(Tn) = ∏_{p≤n} p^{Zp}.

In contrast to our random integer distributed uniformly from 1 to n, namely

N(n) = ∏_{p≤n} p^{Cp(n)},

the random integer M(n) may be larger than n, but it is always free of primes p > n. For i ≤ n, IP[N = i] = 1/n does not vary with i, while for an integer i free of primes larger than n, say i = ∏_{p≤n} p^{cp}, we have

IP[M = i] = ∏_{p≤n} IP[Zp = cp] = ∏_{p≤n} (1 − 1/p) p^{−cp} = k(n)/i,

with a normalizing constant k(n) = ∏_{p≤n} (1 − 1/p). Thus, to convert from the distribution of the independent process, encoded as the values of IP[M(n) = i], into the distribution of the dependent process, encoded as the values IP[N = i] = (1/n) 1l{i ≤ n}, not only do we condition on i ≤ n, which corresponds to conditioning on the event {Tn ≤ log n}, but we also bias with a factor proportional to i: for all positive integers i,

IP[N(n) = i] = IP[M(n) = i] (∏_{p≤n} (1 − 1/p)^{−1}) 1l{i ≤ n} (i/n). (1.51)
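Both the formula IP[M = i] = k(n)/i and the biasing identity (1.51) can be verified exactly in rational arithmetic. A Python sketch (ours, taking n = 20):

```python
from fractions import Fraction

n = 20
primes = [2, 3, 5, 7, 11, 13, 17, 19]          # the primes p <= n

def prob_M(i):
    """IP[M(n) = i] for i free of primes > n, with independent Zp
    geometric: IP[Zp = c] = (1 - 1/p) p^{-c}."""
    prob = Fraction(1)
    for p in primes:
        c = 0
        while i % p == 0:
            i //= p
            c += 1
        prob *= (1 - Fraction(1, p)) * Fraction(1, p) ** c
    assert i == 1, "i had a prime factor exceeding n"
    return prob

k_n = Fraction(1)
for p in primes:
    k_n *= 1 - Fraction(1, p)                  # k(n) = prod_{p<=n} (1 - 1/p)

for i in range(1, n + 1):
    assert prob_M(i) == k_n / i                # IP[M = i] = k(n)/i
    # right side of (1.51): renormalize by k(n)^{-1}, condition on i <= n, bias by i/n
    assert prob_M(i) / k_n * Fraction(i, n) == Fraction(1, n)
```

The final assertion is exactly the statement that the tilted, conditioned law of M(n) is uniform on [n].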

We view biasing and conditioning in a unified framework as follows. In the context of random elements A, B of a discrete space X, one says that “the distribution of B is the h-bias of the distribution of A” if, for all α ∈ X, IP[B = α] = ch h(α) IP[A = α], where the normalizing constant ch may be expressed as ch = (IE h(A))^{−1}. Starting from a given distribution for A, this h-biased distribution can be formed if and only if h(α) ≥ 0 for all α such that IP[A = α] > 0, and 0 < IE h(A) < ∞. Conditioning on an event of the form {A ∈ S}, where S ⊂ X, is exactly the case of biasing where h is an indicator function, h(α) = 1l{α ∈ S}, and the normalizing constant is ch = 1/IP[A ∈ S]. In our examples, A is the independent process, either A = (Z1, Z2, . . . , Zn) for permutations, or A = (Zp, p ≤ n), which can be encoded as M(n), for prime factorizations. Similarly, B is the dependent process, either B = (C1(n), . . . , Cn(n)) for permutations, or, for the prime factorizations, B = (Cp(n), p ≤ n), which can be encoded as N(n).


The Conditioning Relation (1.15) can be viewed as the statement that the distribution of B is the h-bias of the distribution of A, where h is the indicator function of the event {T0n = n}. The relation (1.51) also says that the distribution of B is the h-bias of the distribution of A, but now h(A) = 1l{Tn ≤ log n} exp(Tn − log n), corresponding to the last two factors of (1.51).
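On a finite space, h-biasing is a one-line operation. A toy Python sketch (ours, with a made-up three-point distribution), showing conditioning as the special case of an indicator h:

```python
def h_bias(pmf, h):
    """Given pmf = {alpha: IP[A = alpha]} and h >= 0, return the h-biased
    pmf {alpha: c_h * h(alpha) * IP[A = alpha]} with c_h = 1/IE h(A)."""
    norm = sum(h(a) * p for a, p in pmf.items())   # IE h(A)
    assert norm > 0, "IE h(A) must be positive"
    return {a: h(a) * p / norm for a, p in pmf.items()}

pmf = {0: 0.5, 1: 0.3, 2: 0.2}

# conditioning on A in S is h-biasing with h = indicator of S
S = {1, 2}
cond = h_bias(pmf, lambda a: 1.0 if a in S else 0.0)
assert abs(cond[1] - 0.6) < 1e-12 and abs(cond[2] - 0.4) < 1e-12
```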

The close similarity between these two versions of biasing shows in the asymptotics of the normalizing factor. For random permutations, the constant is

ch(n) = IP[T0n = n]^{−1} ∼ e^γ n, (1.52)

the exponential of Euler’s constant, times the system size; for a derivation of this, see (4.16) and (4.18). For prime factorizations, the constant is the first factor on the right side of (1.51), ch(n) = ∏_{p≤n} (1 − 1/p)^{−1}, with ch(n) ∼ e^γ log n by Mertens’ theorem. Reading this as e^γ times the system size, the normalizing constants for prime factorizations have exactly the same asymptotics as the normalizing constants for permutations, given by (1.52).


2 Decomposable combinatorial structures

Many combinatorial structures decompose naturally into components. Given an instance of size n, the most basic description reports only the number k of components. We are interested in the full component spectrum, specifying how many components there are of sizes one, two, three, and so on. For a given combinatorial structure, the natural model assumes that n is given and that all p(n) instances of size n are equally likely. For such a random instance, we write Ci(n) for the number of components of size i, so that the stochastic process

C(n) = (C1(n), C2(n), . . . , Cn(n))

specifies the entire component size counting process, the random variable

K0n = K0n(C(n)) = C1(n) + C2(n) + · · · + Cn(n)

is the total number of components, and the linear combination

C1(n) + 2C2(n) + · · · + nCn(n)

is identically n.

The fifteen well known combinatorial structures in Chapter 2.1 share the probabilistic property that we call the Conditioning Relation:

L(C1(n), . . . , Cn(n)) = L(Z1, Z2, . . . , Zn | T0n = n),

for a fixed sequence of independent random variables Z1, Z2, . . . taking values in ZZ+, where

T0n = T0n(Z) = Z1 + 2Z2 + · · · + nZn.
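For small n, the Conditioning Relation can be checked exactly in rational arithmetic. The Python sketch below (ours) takes n = 3 with Zi Poisson(1/i), the choice appropriate for permutations, and compares the conditioned law to the cycle type of a uniform random permutation:

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

n = 3

def cycle_type(perm):
    """Cycle type (c_1, ..., c_n) of a permutation of {0,...,n-1} as a tuple."""
    counts = [0] * n
    seen = set()
    for start in range(n):
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i]
            length += 1
        counts[length - 1] += 1
    return tuple(counts)

# left side: law of the cycle type of a uniform random permutation of [n]
lhs = {}
for perm in permutations(range(n)):
    c = cycle_type(perm)
    lhs[c] = lhs.get(c, Fraction(0)) + Fraction(1, factorial(n))

# right side: independent Z_i ~ Poisson(1/i), conditioned on Z_1 + 2Z_2 + 3Z_3 = n.
# The common factor exp(-sum 1/i) cancels, leaving weights
# prod_i (1/i)^{c_i}/c_i! over admissible types c; every admissible type
# occurs for some permutation, so the keys of lhs list them all.
weights = {}
for c in lhs:
    w = Fraction(1)
    for i, ci in enumerate(c, start=1):
        w *= Fraction(1, i) ** ci / factorial(ci)
    weights[c] = w
total = sum(weights.values())
rhs = {c: w / total for c, w in weights.items()}

assert lhs == rhs  # the Conditioning Relation holds exactly for n = 3
```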


However, some of these examples are what we call logarithmic, and some are not. Logarithmic structures have the additional property that the expected number of components grows logarithmically with the size n:

IEK0n ∼ θ log n, (2.1)

for some constant θ ∈ (0, ∞) depending on the structure. A more precise definition of the logarithmic class is given in (2.15).

Most of these examples fall into one of three families, the assemblies, multisets and selections; these are discussed further in Chapter 2.2. A more probabilistic perspective on these structures appears in Chapter 2.3. In Chapter 2.4 we discuss a number of combinatorial methods for producing new decomposable structures from old ones, and in Chapter 2.5 we discuss ways in which non-uniform decomposable random structures can be produced.

2.1 Some combinatorial examples

We begin with fifteen examples of decomposable combinatorial structures. For most of these structures we give an instance of size n = 10, with (2, 0, 1, 0, 1, 0, 0, 0, 0, 0) as the value of the component counting process (C1(10), C2(10), . . . , C10(10)). Some references are listed with each example, to serve as pointers to the literature. The notation p(n) is used for the number of possible instances of size n; for random permutations, as in the previous chapter, p(n) = n!.

Example 2.1 Integer partitions. Consider an integer n partitioned as n = l1 + l2 + · · · + lk with l1 ≥ l2 ≥ · · · ≥ lk ≥ 1. Here p(n) is the traditional notation for the number of such partitions, as for example in Hardy and Wright (1960). Euler showed that the p(n) are determined by the generating function ∑ p(n) x^n = ∏_{i≥1} (1 − x^i)^{−1}. Algorithms to enumerate the p(n) partitions of n, and for simulating a random partition, may be found in Chapters 9 and 10 of Nijenhuis and Wilf (1978). An instance for n = 10 is given below.


10 = 5 + 3 + 1 + 1


The asymptotic formula p(n) ∼ exp(2π√(n/6))/(4n√3) was given by Hardy and Ramanujan (1918), and extended to an asymptotic expansion, practical for the exact calculation of p(n), by Rademacher (1937).
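Euler's product translates into a simple recursion for p(n): multiply in the factors (1 − x^i)^{−1} one part size at a time. A minimal Python sketch (ours):

```python
def partition_counts(n_max):
    """Return [p(0), ..., p(n_max)] via Euler's product, multiplying in the
    factor 1/(1 - x^i) for each allowed part size i."""
    p = [1] + [0] * n_max
    for i in range(1, n_max + 1):
        for j in range(i, n_max + 1):
            p[j] += p[j - i]      # allow one more part of size i
    return p

p = partition_counts(100)
assert p[10] == 42            # the 42 partitions of 10 include 5 + 3 + 1 + 1
assert p[100] == 190569292    # classical value
```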

Integer partitions are an example of the general multiset construction described below in Meta-example 2.2: for given nonnegative integers m1, m2, . . ., imagine a kingdom in which there are mi different types of coin worth i cents, and define p(n) to be the number of ways to produce an unordered handful of change for the total amount n cents, that is, a multiset of total weight n. Integer partitions are simply the case mi = 1 for i = 1, 2, 3, . . . .

We write Ci(n) for the number of parts which are equal to i. Integer partitions form an example in which the component counting process C(n) completely specifies the combinatorial structure. Although combinatorially simple, integer partitions are not an example of a logarithmic combinatorial structure. One aspect of their non-logarithmic behavior is that IEK0n/log n → ∞ as n → ∞. For recent probabilistic treatments, see for example Fristedt (1993) and Pittel (1997a).

Example 2.2 Set partitions. Partition the set [n] = {1, 2, . . . , n} into blocks. Each block is a nonempty set, with no additional structure; these blocks may be viewed as the equivalence classes of an arbitrary equivalence relation. For set partitions, p(n) = Bn is the nth Bell number, with generating function ∑ Bn x^n/n! = exp(e^x − 1). Algorithms for enumerating set partitions, and for simulating a random set partition, may be found in Chapters 11 and 12 of Nijenhuis and Wilf (1978). See also Pitman (1997). An instance for n = 10 is given below. On the left we show the partition, as a set of sets, and on the right we show the equivalence relation involved, plotting a point at (i, j) whenever i and j are in the same block.

{4}, {8}, {3, 5, 7}, {1, 2, 6, 9, 10}

A formula for the asymptotics of Bn was given by Moser and Wyman (1955); see Table 2.2.

The component counting process C(n) only reports how many blocks there are of each size, and does not specify the extra information as to which elements form the blocks. Set partitions are not an example of a logarithmic combinatorial structure; just as for integer partitions, IEK0n/log n → ∞ as n → ∞. For recent probabilistic treatments, see for example Sachkov (1974, 1997) and Pittel (1997b).

Example 2.3 Graphs. Consider a random graph on [n], with all p(n) = 2^{n(n−1)/2} possibilities equally likely. By comparison with the previous example, each block is enriched with the additional structure of a connected graph, and Ci(n) is the number of connected components having i vertices. Bollobás (1985, 2001) gives an introduction to the field. Our instance with n = 10 has edges {1, 6}, {1, 9}, {2, 6}, {2, 9}, {2, 10}, {9, 10}, {3, 5}, {3, 7}; vertices 4 and 8 are isolated. The illustration shows both a traditional graph picture, and also the incidence relation of the graph.


Random graphs are also not an example of a logarithmic combinatorial structure, since IEK0n/log n → 0 as n → ∞. In fact, IEK0n → 1, reflecting the fact that a random graph is connected with very high probability.

Example 2.4 Permutations. Consider the cycle decomposition of a permutation on [n], with Ci(n) being the number of cycles of length i. For permutations, p(n) = n!. Once a set of i elements has been specified as elements of a cycle, there are mi = (i − 1)! ways to place them in a cycle. Algorithms for enumerating permutations may be found in Chapter 7 of Nijenhuis and Wilf (1978). An instance for n = 10 is the function π with π(1) = 9, π(2) = 1, π(3) = 7, π(4) = 4, π(5) = 3, π(6) = 2, π(7) = 5, π(8) = 8, π(9) = 10, π(10) = 6, whose canonical cycle notation is π = (1 9 10 6 2)(3 7 5)(4)(8). The picture on the right below shows the graph {(i, π(i))} of this permutation. In this picture, as well as those in the next two examples, all cycles should be read clockwise.


An algorithm for simulating a random permutation in standard form is given in Chapter 8 of Nijenhuis and Wilf (1978), while the Feller coupling on page 8 can be used to produce random permutations decomposed into cycles. We mention here another algorithm, known as the Chinese Restaurant Process, that may also be used for this purpose. Start the first cycle with the integer 1. The integer 2 either joins the first cycle (to the right of 1) with probability 1/2, or starts the second cycle. Suppose that k − 1 integers have been assigned to cycles. Integer k either starts a new cycle with probability 1/k, or is inserted immediately to the right of any one of the integers already assigned to cycles, the choice being uniform. After n integers have been assigned, it is easy to check that the resulting random permutation is uniformly distributed over Sn. An extensive review of this process, including the connection with record values, is given by Diaconis and Pitman (1986) and Pitman (1996).
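A minimal Python implementation of the Chinese Restaurant Process (our sketch, not the Nijenhuis–Wilf algorithm) maintains the successor map of the permutation and inserts each new integer to the right of a uniformly chosen one:

```python
import random

def chinese_restaurant_permutation(n, rng=random):
    """Build pi, a uniform random permutation of {1, ..., n}, cycle by cycle."""
    successor = {1: 1}  # integer 1 starts the first cycle
    for k in range(2, n + 1):
        if rng.random() < 1.0 / k:
            successor[k] = k          # k starts a new cycle, probability 1/k
        else:
            j = rng.randrange(1, k)   # uniform over the k-1 integers placed so far
            successor[k] = successor[j]
            successor[j] = k          # insert k immediately to the right of j
    return successor

def cycle_counts(successor):
    """Return (C_1, ..., C_n), the number of cycles of each length."""
    n = len(successor)
    counts = [0] * n
    seen = set()
    for start in successor:
        if start in seen:
            continue
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = successor[i]
            length += 1
        counts[length - 1] += 1
    return counts
```

Since integer k starts a new cycle independently with probability 1/k, the expected number of cycles after n steps is 1 + 1/2 + · · · + 1/n, matching IEK0n for a uniform random permutation.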

Permutations are the archetypical example of a logarithmic combinatorial structure, with IEK0n/log n → 1 as n → ∞. In some sense, we view the component structure of any logarithmic combinatorial structure as a perturbation of the cycle structure of random permutations. Some of the extensive probabilistic literature concerning the cycle structure of random permutations was given in Chapter 1.1.

Example 2.5 Mappings. Consider all mappings from [n] to itself, so that there are p(n) = n^n possibilities. A mapping f corresponds to a directed graph with edges (i, f(i)), 1 ≤ i ≤ n, where every vertex has outdegree one, and the “components” of f are precisely the connected components of the underlying undirected graph. Once a set of i elements has been specified as elements of a component, there are mi = (i − 1)! ∑_{j=0}^{i−1} i^j/j! ways to place them in a component; see Katz (1955), and Bollobás (1985, p. 365) for further introduction and historical references. Each component in a mapping is a directed cycle of rooted labelled trees. An instance for n = 10 is the function f with f(1) = 10, f(2) = 6, f(3) = 5, f(4) = 4, f(5) = 3, f(6) = 6, f(7) = 3, f(8) = 8, f(9) = 2, f(10) = 2. Note that the number of fixed points, 3 in this instance, is not C1.


Note that mi/i! ∼ e^i/(2i); this follows from the fact that mi/(i − 1)! is exactly e^i times the probability that a Poisson random variable with mean i is strictly less than its mean, and this probability tends to 1/2 by the central limit theorem. Random mappings are an example of a logarithmic combinatorial structure with IEK0n/log n → 1/2 as n → ∞. There is an extensive probabilistic literature concerning random mappings. See for example Stepanov (1969), Aldous (1985), Kolchin (1986), Flajolet and Odlyzko (1990b) and Aldous and Pitman (1994).

Example 2.6 Mapping patterns. In the previous example, instead of the labelled mapping digraph, consider only the underlying topology. In other words, consider the equivalence classes of mappings f under the relation: f ∼ g if there exists a permutation π with the property that f ∘ π = π ∘ g. An instance for n = 10 is shown below. It represents the equivalence class of the mapping from the previous example. This equivalence class has 10!/2 labelled representatives; the factor of two reflects the presence of two fixed points, and no other symmetry relations for this instance.


Pick a random equivalence class, with all equivalence classes equally likely. (Note that this is very different from picking a mapping with all of the n^n possibilities equally likely, and then taking its equivalence class.) Let mi be the number of topologies for a component on i points; a random mapping pattern for n points is simply a multiset of components having total weight n, where there are mi types of component of weight i. Having asymptotics of the form mi ∼ ρ^{−i}/(2i) identifies mapping patterns as a logarithmic combinatorial structure with IEK0n/log n → 1/2. The value of ρ is not important, the essential feature being rather that, after taking out the exponential growth of mi, the remaining factor decays as a constant θ over i^p, where the power p is exactly one. See Meir and Moon (1984) and Mutafciev (1988).

Example 2.7 Forests of labelled (unrooted) trees. Partition the set [n] into blocks, and on each block form a tree, that is, a connected acyclic graph. The number of ways that i given points can form a tree is given by Cayley’s formula, mi = i^{i−2}; see Cayley (1889) and Moon (1970) for example. An instance of an unrooted forest with n = 10 and edges {1, 10}, {2, 6}, {2, 9}, {2, 10}, {3, 5}, {3, 7} is shown below.


Example 2.8 Forests of labelled rooted trees. Partition the set [n] into blocks, and on each block form a rooted tree; to root a tree means to pick out one vertex as distinguished. The number of ways that i given points can form a component is i times as large as it was in the previous example, so now mi = i^{i−1}. There is a natural bijection, taking a single (unrooted) tree on vertices {0, 1, 2, . . . , n} into a forest of rooted trees on vertices {1, 2, . . . , n}, given by considering each vertex adjacent to 0 to be the root of a tree; this shows that p(n) for the current example equals m_{n+1} for the previous example, so that the number of forests of labelled rooted trees is p(n) = (n + 1)^{n−1}; see Moon (1970). An instance of a rooted forest with n = 10, as in the previous example, but with roots (indicated by arrowheads) placed at 4, 5, 6, and 8, is shown below.


Example 2.9 Forests of unlabelled (unrooted) trees. In Example 2.7, consider only the underlying topology, with all topologies equally likely. An instance with n = 10 is shown in the figure below. This particular instance is an equivalence class consisting of 10!/2^3 different forests of labelled trees on the set [10].


The number mi of unlabelled unrooted trees on i points was studied in detail by Otter (1948), who established that mi ∼ c ρ^{−i} i^{−5/2}, where ρ ≈ 0.3383219 and c ≈ 0.5349485. Palmer and Schwenk (1979) gave the asymptotics of p(n) in the form p(n) ∼ d mn, with d ≈ 1.91. Explicit values for mi and p(n) are found by recursion; see for example Palmer and Schwenk (1979) and sequences A000055 and A005195 of the On-Line Encyclopedia of Integer Sequences (cf. Sloane and Plouffe (1995)).

Example 2.10 Forests of unlabelled rooted trees. In Example 2.8, consider only the underlying topology, with all topologies equally likely. The instance with n = 10 shown in the figure below is one such. This particular instance is an equivalence class consisting of 10!/2 different forests of labelled rooted trees on the set [10].


Otter (1948) also studied the number mi of unlabelled rooted trees on i points. It is easy to see that p(n) = m_{n+1} from the bijection which takes a rooted unlabelled tree on n + 1 points, and removes the root to create a forest, in which each neighbor of the root in the original tree becomes a root in the forest. The asymptotics of p(n) in this case follow from those for mn given by Otter (1948); see also Harary and Palmer (1973), Palmer and Schwenk (1979) and Table 2.2. Explicit values for mn may again be found by recursion. See for example sequence A000081 of the On-Line Encyclopedia of Integer Sequences, and Chapter 29 of Nijenhuis and Wilf (1978).

Forests (of trees, whether rooted or not, and whether labelled or not) do not form a logarithmic structure, since IE K_{0n}/log n → 0. In each case there is a constant c ∈ (1, ∞), the limiting average number of trees in a forest, such that IE K_{0n} → c.

There is now an extensive literature relating to random trees and forests. For a flavor of this see Aldous and Pitman (1998), Mutafciev (1998), Kolchin (1999), Pitman (1998, 2001) and Pavlov (2000).

Example 2.11 Polynomials over GF(q). Consider monic polynomials of degree n over the finite field GF(q). Writing

f(x) = x^n + a_{n−1} x^{n−1} + · · · + a_1 x + a_0,

we see that there are p(n) = q^n possibilities. Such a polynomial can be uniquely factored into a product of monic irreducible polynomials, and C_i^{(n)} reports the number of irreducible factors of degree i. See the next example, necklaces, as well as Flajolet and Soria (1990), Hansen (1993) and Arratia, Barbour and Tavaré (1993). For the case q = 2, an instance with n = 10 is

f(x) = x^{10} + x^8 + x^5 + x^4 + x + 1 = (x + 1)^2 (x^3 + x^2 + 1)(x^5 + x^4 + x^3 + x + 1).

This structure is logarithmic and, for any q, IE K_{0n}/log n → 1.

Example 2.12 Necklaces over an alphabet of size q. In Example 2.11, there is a field GF(q) if and only if q is a prime power. However, the component spectra of the combinatorial structures of Example 2.11 are

determined solely by the numbers m_i of monic irreducible polynomials of degree i, which can be combined to form the p(n) = q^n distinct monic polynomials of degree n, n ≥ 0. It is thus conceivable that, even if q is not a prime power, there may exist integers m_i(q), i ≥ 1, that can be used to define an analogous ‘multiset’ structure with p(n) = q^n for each n ≥ 0. Just as was shown by Gauss for the case where q is a prime power, the unknown numbers m_i(q) satisfy the generating function relation

∏_{i≥1} (1 − x^i)^{−m_i(q)} = ∑_{n≥0} p(n) x^n = ∑_{n≥0} q^n x^n

if and only if q^n = ∑_{d|n} d m_d(q) for n ≥ 1, which by Möbius inversion is equivalent to i m_i(q) = ∑_{d|i} µ(d) q^{i/d} for i ≥ 1. For any integer q ≥ 2, these m_i(q) are all nonnegative integers, so that declaring that there are m_i(q) different types of object of weight i, for i = 1, 2, . . ., creates a multiset construction having p(n) = q^n for n ≥ 0. In the special case that q is a prime power, these m_i(q) objects may naturally be taken to be the monic irreducible polynomials of degree i over GF(q), so that a multiset of weight n of these objects is identified with their product, a monic polynomial of degree n.

For an arbitrary positive integer q, not necessarily a prime power, there is in fact a natural combinatorial structure having m_i(q) objects of weight i, with q^n = ∑_{d|n} d m_d(q). Consider aperiodic words of length i over an alphabet of size q; such words form equivalence classes of size i under rotation, so that, if m_i(q) is the number of circular equivalence classes, then i m_i(q) is the number of aperiodic words of length i. From the correspondence between an arbitrary word of length n and the shortest initial segment of the word, say of length d, which repeated n/d times gives back the original word, we see that q^n = ∑_{d|n} d m_d(q). These circular equivalence classes are called necklaces; see Metropolis and Rota (1983) or van Lint and Wilson (1992) for example. They also arise naturally in a study of card shuffling; see Diaconis, McGrath and Pitman (1995). The structures having p(n) = q^n here could be called “multisets of necklaces.”
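The Möbius-inversion formula for the m_i(q) is easy to check numerically. The sketch below is our illustration, not the authors' code (the helper names are ours); it computes m_i(q) = (1/i) ∑_{d|i} µ(d) q^{i/d} and verifies the defining relation q^n = ∑_{d|n} d m_d(q).

```python
def mobius(n):
    """Moebius function mu(n), computed by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0  # repeated prime factor
            result = -result
        d += 1
    return -result if n > 1 else result

def necklaces(i, q):
    """m_i(q): necklaces of length i over a q-letter alphabet.

    i * m_i(q) counts the aperiodic words of length i, and equals
    sum over d | i of mu(d) * q^(i/d) by Moebius inversion."""
    total = sum(mobius(d) * q ** (i // d) for d in range(1, i + 1) if i % d == 0)
    return total // i

# The defining relation q^n = sum_{d|n} d * m_d(q) should hold for every n >= 1.
q, n = 3, 12
rhs = sum(d * necklaces(d, q) for d in range(1, n + 1) if n % d == 0)
print(q ** n == rhs)  # True
```

For q a prime power, necklaces(i, q) also counts the monic irreducible polynomials of degree i over GF(q), as in Example 2.11.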

Example 2.13 Additive arithmetic semigroups. Following Knopfmacher (1975, 1979) and Zhang (1996a,b, 1998), an “additive arithmetic semigroup” G is, by definition, a free commutative semigroup with identity element 1 such that G has a countable free generating set of “primes” and such that G admits a “degree” mapping ∂ : G → ZZ_+ satisfying

(1) ∂(ab) = ∂(a) + ∂(b) for all a, b ∈ G, and

(2) The number of elements a ∈ G with ∂(a) ≤ x is finite for every x.

It follows immediately from (1) and (2) that ∂(a) = 0 if and only if a = 1.

To paraphrase, “free commutative semigroup with a countable free generating set of primes” means that every element of G has a unique factorization as a finite product of primes, and every finite product of primes is an element of G. Example 2.11 is an example of an additive arithmetic semigroup, in which the monic irreducible polynomials are the primes, and the degree mapping is the usual degree of a polynomial. We revisit this example in Meta-example 2.2, where we use combinatorial terminology: the set of primes is called the universe of objects, an element of G is a finite multiset of such objects, degree is called weight, and 1 is the empty multiset.

Knopfmacher’s examples include monic polynomials over GF(q), finite modules over GF(q), semisimple finite algebras over GF(q), integral divisors in algebraic function fields, ideals in the principal order of an algebraic function field, and finite modules or semisimple finite algebras over a ring of integral functions. We consider additive functions on additive arithmetic semigroups, together with some extensions, in detail in Chapter 7.5.

Allowing the degree to take values in [0, ∞) rather than ZZ_+, one has the “generalized primes” considered by Beurling (1937); see also Diamond (1973) and Lagarias (1999).

Example 2.14 Squarefree polynomials over GF(q). This example is similar to Example 2.11, but we require also that all the monic irreducible factors be distinct; see Flajolet and Soria (1990). For the case q = 2, an instance with n = 10 is

f(x) = x^{10} + x^9 + x^5 + x = x(x + 1)(x^3 + x^2 + 1)(x^5 + x^4 + x^3 + x + 1).

In this case, p(n) is not easy to determine, but p(n)/q^n, which is the probability that a random polynomial of degree n over GF(q) is squarefree, has a limit c(q) ∈ (0, 1) as n → ∞.
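For small q and n, the squarefree count can be checked by brute force, using the standard criterion that f is squarefree if and only if gcd(f, f′) is a nonzero constant in GF(q)[x]. The sketch below is our illustration (polynomials over GF(2) are encoded as bitmasks, function names ours); it recovers the classical exact count q^n − q^{n−1} for n ≥ 2, so that c(2) = 1/2.

```python
def poly_mod(a, b):
    """Remainder of a divided by b in GF(2)[x]; polynomials are bitmasks."""
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        a ^= b << (a.bit_length() - 1 - db)
    return a

def poly_gcd(a, b):
    while b:
        a, b = b, poly_mod(a, b)
    return a

def derivative(f):
    """Formal derivative in GF(2)[x]: only odd-degree terms survive."""
    g, k = 0, 1
    while f >> k:
        if (f >> k) & 1 and k % 2 == 1:
            g |= 1 << (k - 1)
        k += 1
    return g

def is_squarefree(f):
    # Over GF(2) the only nonzero constant is 1.
    return poly_gcd(f, derivative(f)) == 1

def count_squarefree(n):
    """Number of squarefree monic polynomials of degree n over GF(2)."""
    return sum(is_squarefree((1 << n) | rest) for rest in range(1 << n))

for n in range(2, 9):
    print(n, count_squarefree(n), 2 ** n - 2 ** (n - 1))
```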

Example 2.15 Characteristic polynomials of nonsingular matrices over GF(q). Pick one of the (q^n − 1)(q^n − q)(q^n − q^2) · · · (q^n − q^{n−1}) nonsingular n by n matrices over GF(q), with all possibilities equally likely. Take its characteristic polynomial, and divide by the leading coefficient, if necessary, to get a random monic polynomial. As in Examples 2.11 and 2.14, this can be factored uniquely as a product of monic irreducibles, and C_i^{(n)} is the number of irreducible factors of degree i; see Stong (1988), Goh and Schmutz (1991, 1993) and Hansen and Schmutz (1993). For an instance

with q = 2 and n = 10, our choice of matrix is

1 0 1 0 1 0 0 0 1 0
1 1 1 0 1 0 0 0 1 1
0 0 1 0 0 0 0 0 1 0
1 0 1 1 1 1 0 0 1 1
0 0 1 0 1 0 0 0 1 0
1 0 1 0 1 1 1 0 1 1
1 0 1 1 1 1 1 0 1 1
1 0 1 0 1 0 0 1 1 1
0 0 0 0 0 0 0 0 1 1
1 0 1 0 1 0 0 0 1 1

whose characteristic polynomial is

f(x) = x^{10} + x^8 + x^5 + x^4 + x + 1 = (x + 1)^2 (x^3 + x^2 + 1)(x^5 + x^4 + x^3 + x + 1).

2.2 Assemblies, multisets and selections

With the exception of Example 2.15, the combinatorial examples above, and indeed most of the classical combinatorial structures, belong to one of the three classes given in Meta-examples 2.1, 2.2 and 2.3 below. Table 2.2 summarizes the examples from this chapter.

Meta-example 2.1 Assemblies. This is a class of examples including set partitions, permutations, mappings, forests of labelled unrooted trees, and forests of labelled rooted trees. The underlying set [n] is partitioned into blocks, and then for each block of size i one of m_i possible structures is chosen; the combinatorial structure is thus determined by the sequence m_1, m_2, . . .. The fundamental example is Example 2.2, set partitions, with 1 = m_1 = m_2 = · · ·. For Example 2.4, permutations, we have m_i = (i − 1)!, corresponding to choosing a cyclic order on the points in a block of size i. For Example 2.5, mappings, the structure imposed on a block is a “directed cycle of rooted labelled trees,” and m_i = (i − 1)! ∑_{j<i} i^j/j!. For Example 2.7, a labelled unrooted tree structure is chosen for each block, and so m_i = i^{i−2}. In general, with

M(x) = ∑_{i≥1} m_i x^i/i!,   P(x) = ∑_{n≥0} p(n) x^n/n!,

assemblies are characterized by the exponential formula

P(x) = exp(M(x)).

Assemblies are also known as exponential families, or as uniform structures compounded with some other structure, and their study is part of the theory of species. See Foata (1974), Joyal (1981), van Lint and Wilson (1992) and

Arratia and Tavaré (1994). Wilf (1990) cites the first appearance of the exponential formula as Riddell and Uhlenbeck (1953).

The number R(n, c) of instances of size n which have c = (c_1, c_2, . . . , c_n) components of sizes (1, 2, . . . , n) respectively, for any vector c such that ∑_{i=1}^n i c_i = n, may be calculated as follows. There are

n! / ∏_{i=1}^n (i!)^{c_i} c_i!

different ways of assigning the n labelled elements to subsets of the requisite sizes, and, for each such assignment, ∏_{i=1}^n m_i^{c_i} structures can be distinguished. Thus the number of instances which have component counts given by c is just

R(n, c) = n! ∏_{i=1}^n (m_i/i!)^{c_i} (1/c_i!).   (2.2)
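As a check on (2.2), the sketch below (our code, with hypothetical helper names) enumerates all component count vectors c with ∑_i i c_i = n and sums R(n, c); for permutations (m_i = (i − 1)!) the total must be p(n) = n!, and for set partitions (m_i = 1) it must be the Bell number.

```python
from fractions import Fraction
from math import factorial

def component_vectors(n, i=1):
    """Yield every c with sum_i i*c_i = n, represented as a dict {i: c_i}."""
    if n == 0:
        yield {}
        return
    if i > n:
        return
    for ci in range(n // i + 1):
        for rest in component_vectors(n - i * ci, i + 1):
            yield {i: ci, **rest} if ci else rest

def R_assembly(n, c, m):
    """Cauchy-style count (2.2): R(n, c) = n! * prod_i (m_i/i!)^{c_i} / c_i!."""
    r = Fraction(factorial(n))
    for i, ci in c.items():
        r *= Fraction(m(i), factorial(i)) ** ci
        r /= factorial(ci)
    return r

# Permutations: m_i = (i-1)!; the counts over all c must sum to p(n) = n!.
n = 7
total = sum(R_assembly(n, c, lambda i: factorial(i - 1)) for c in component_vectors(n))
assert total == factorial(n)
print(total)  # 5040
```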

Meta-example 2.2 Multisets. This is a class of examples including integer partitions, mapping patterns, forests of unlabelled trees, polynomials over GF(q), and necklaces. There is a universe of objects with positive integer weights, with exactly m_i different objects of weight i. Pick a multisubset of that universe, with total weight n. Equivalently, the integer n is partitioned into parts, and for each part of size i one of the m_i objects of weight i is assigned. Thus Example 2.1, partitions of an integer, is the fundamental example, with 1 = m_1 = m_2 = · · ·; the universe is the set of positive integers, and a random multisubset of weight n is chosen. For Example 2.6, mapping patterns, the universe consists of directed cycles of one or more unlabelled rooted trees, and the weight of an element is the number of vertices. For Example 2.9, forests of unlabelled trees, the underlying universe is the set of all unlabelled trees on one or more vertices, with m_i given by Otter’s formula (Otter, 1948). For Example 2.11, the universe is all monic irreducible nonconstant polynomials over GF(q), and m_i counts the number of such polynomials of degree i. For multisets in general, with p(n) being the number of multisubsets of weight n, P(x) = ∑_{n≥0} p(n) x^n and M(x) = ∑_{i≥1} m_i x^i, we have

P(x) = ∏_{i≥1} (1 − x^i)^{−m_i} = exp(∑_{j≥1} M(x^j)/j).

See for example Joyal (1981), Flajolet and Soria (1990) and Arratia and Tavaré (1994).

For multisets, the number of ways of choosing an unordered c_i-tuple with replacement from m_i objects is \binom{m_i + c_i − 1}{c_i}, and hence the number of instances of size n with component size counts c is

R(n, c) = ∏_{i=1}^n \binom{m_i + c_i − 1}{c_i},   (2.3)

for any c such that ∑_{i=1}^n i c_i = n.

Meta-example 2.3 Selections. This class is like multisets, as described above, except that now we take subsets rather than multisubsets of the universe: that is, all parts must be distinct. “Subsets of a universe of objects with positive integer weights” would be a natural alternative description. The selection corresponding to Example 2.1, partitions of n, is the structure “partitions of n with all parts distinct”. Example 2.14, squarefree polynomials over GF(q), is the selection corresponding to the multiset in Example 2.11, unrestricted polynomials. For selections in general, with p(n) being the number of subsets of weight n, P(x) = ∑_{n≥0} p(n) x^n and M(x) = ∑_{i≥1} m_i x^i, we have

P(x) = ∏_{i≥1} (1 + x^i)^{m_i} = exp(∑_{j≥1} (−1)^{j−1} M(x^j)/j).

See for example Joyal (1981), Flajolet and Soria (1990) and Arratia and Tavaré (1994).

For selections, the number of instances of size n with component size counts c is given by

R(n, c) = ∏_{i=1}^n \binom{m_i}{c_i},   (2.4)

for any c such that ∑_{i=1}^n i c_i = n.
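Formulas (2.3) and (2.4) can be checked in the same spirit. With m_i = 1 for all i, summing (2.3) over all c with ∑_i i c_i = n counts the integer partitions of n, while summing (2.4) counts the partitions into distinct parts. A sketch (helper names ours):

```python
from math import comb, prod

def component_vectors(n, i=1):
    """Yield every c with sum_i i*c_i = n, represented as a dict {i: c_i}."""
    if n == 0:
        yield {}
        return
    if i > n:
        return
    for ci in range(n // i + 1):
        for rest in component_vectors(n - i * ci, i + 1):
            yield {i: ci, **rest} if ci else rest

def R_multiset(c, m):
    """(2.3): prod_i binom(m_i + c_i - 1, c_i)."""
    return prod(comb(m(i) + ci - 1, ci) for i, ci in c.items())

def R_selection(c, m):
    """(2.4): prod_i binom(m_i, c_i)."""
    return prod(comb(m(i), ci) for i, ci in c.items())

one = lambda i: 1
p10 = sum(R_multiset(c, one) for c in component_vectors(10))
q10 = sum(R_selection(c, one) for c in component_vectors(10))
print(p10, q10)  # 42 partitions of 10, of which 10 have all parts distinct
```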

2.3 The probabilistic perspective

The examples in Chapter 2.1 have the property that the joint distribution of the C^{(n)} can be expressed as the joint distribution of a collection of independent random variables Z_1, Z_2, . . . , Z_n, conditioned on the value of the weighted sum T_{0n} = Z_1 + 2Z_2 + · · · + nZ_n being exactly n. We refer to this as the Conditioning Relation:

L(C_1^{(n)}, . . . , C_n^{(n)}) = L(Z_1, . . . , Z_n | T_{0n} = n).   (2.5)

We start by identifying the distribution of the random variables Z_i for the three major classes of combinatorial structures: the assemblies, multisets and selections described in Meta-examples 2.1–2.3.

To do this, we compute the fraction of instances of size n which have c = (c_1, c_2, . . . , c_n) components of sizes (1, 2, . . . , n) respectively, for any c such that ∑_{i=1}^n i c_i = n. This fraction is the same as the probability that an instance chosen uniformly at random from the set of all possible instances of size n should have component counts given by c.

Assemblies

Taking assemblies first, the fraction of instances which have component counts given by c is, from (2.2),

f_A^{(n)}(c) = k_A(n) ∏_{i=1}^n (m_i/i!)^{c_i} (1/c_i!),   (2.6)

where the constant k_A(n) = n!/p(n) is such that ∑^{(n)} f_A^{(n)}(c) = 1, the sum ∑^{(n)} being taken over all c ∈ ZZ_+^n such that ∑_{i=1}^n i c_i = n. On the other hand, for any x > 0, if Z_1, . . . , Z_n are independent Poisson random variables with

Z_i ∼ Po(m_i x^i/i!),   (2.7)

so that

IP[Z_i = l] = e^{−m_i x^i/i!} (m_i x^i/i!)^l (1/l!),   l = 0, 1, 2, . . . ,

we have

IP[Z_1 = c_1, . . . , Z_n = c_n] = exp(−∑_{j=1}^n m_j x^j/j!) ∏_{i=1}^n (m_i x^i/i!)^{c_i} (1/c_i!)
 = x^n exp(−∑_{j=1}^n m_j x^j/j!) ∏_{i=1}^n (m_i/i!)^{c_i} (1/c_i!),

for any c ∈ ZZ_+^n such that ∑_{i=1}^n i c_i = n. Hence it follows that

f̃_A^{(n)}(c) = IP[Z_1 = c_1, . . . , Z_n = c_n | ∑_{i=1}^n i Z_i = n] = k̃_A(n) ∏_{i=1}^n (m_i/i!)^{c_i} (1/c_i!)   (2.8)

for some constant k̃_A(n). However, since ∑^{(n)} f̃_A^{(n)}(c) = 1 as well, it must be the case that k̃_A(n) = k_A(n), and thus also that f̃_A^{(n)}(c) = f_A^{(n)}(c) for all c: the component size counts for assemblies satisfy the Conditioning Relation, with Poisson distributed Z_i.
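The Conditioning Relation for assemblies can be verified exactly in small cases. For permutations, take x = 1, so that Z_i ∼ Po(1/i); the sketch below (our illustration, in exact rational arithmetic) checks that L(Z_1, . . . , Z_n | ∑_i i Z_i = n) coincides with Cauchy's formula for the cycle type probabilities.

```python
from fractions import Fraction
from math import factorial

def component_vectors(n, i=1):
    if n == 0:
        yield {}
        return
    if i > n:
        return
    for ci in range(n // i + 1):
        for rest in component_vectors(n - i * ci, i + 1):
            yield {i: ci, **rest} if ci else rest

def poisson_weight(c):
    """P[Z_i = c_i for all i] for Z_i ~ Po(1/i), up to the factor
    exp(-sum_j 1/j), which is the same for every c and so cancels."""
    w = Fraction(1)
    for i, ci in c.items():
        w *= Fraction(1, i) ** ci / factorial(ci)
    return w

def cycle_type_prob(n, c):
    """Cauchy's formula: the fraction of permutations of [n] with cycle counts c."""
    p = Fraction(1)
    for i, ci in c.items():
        p /= Fraction(i) ** ci * factorial(ci)
    return p

n = 6
weights = {tuple(sorted(c.items())): poisson_weight(c) for c in component_vectors(n)}
norm = sum(weights.values())
for c in component_vectors(n):
    assert weights[tuple(sorted(c.items()))] / norm == cycle_type_prob(n, c)
print("Conditioning Relation verified exactly for n =", n)
```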

Multisets

For multisets, the fraction of instances of size n with component size counts c is, from (2.3),

f_M^{(n)}(c) = k_M(n) ∏_{i=1}^n \binom{m_i + c_i − 1}{c_i},   (2.9)

for any c such that ∑_{i=1}^n i c_i = n; once again, k_M(n) = 1/p(n) is an appropriate normalizing constant. On the other hand, for any 0 < x < 1, if Z_1, . . . , Z_n are independent negative binomial random variables with

Z_i ∼ NB(m_i, x^i),   (2.10)

so that

IP[Z_i = l] = (1 − x^i)^{m_i} \binom{m_i + l − 1}{l} x^{il},   l = 0, 1, 2, . . . ,

we have

IP[Z_1 = c_1, . . . , Z_n = c_n] = ∏_{i=1}^n [(1 − x^i)^{m_i} \binom{m_i + c_i − 1}{c_i} x^{i c_i}]
 = x^n ∏_{j=1}^n (1 − x^j)^{m_j} ∏_{i=1}^n \binom{m_i + c_i − 1}{c_i},

for any c such that ∑_{i=1}^n i c_i = n, so that

f̃_M^{(n)}(c) = IP[Z_1 = c_1, . . . , Z_n = c_n | ∑_{i=1}^n i Z_i = n] = k̃_M(n) ∏_{i=1}^n \binom{m_i + c_i − 1}{c_i}.   (2.11)

Once again it follows that k̃_M(n) = k_M(n), and thus also that f̃_M^{(n)}(c) = f_M^{(n)}(c) for all c: the component size counts for multisets satisfy the Conditioning Relation, with negative binomially distributed Z_i.
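For the fundamental multiset example, integer partitions (m_i = 1 for all i), each Z_i is geometric with parameter x^i, and every c with ∑_i i c_i = n receives the same unconditional weight x^n ∏_i (1 − x^i); conditioning on T_{0n} = n therefore yields the uniform distribution on the partitions of n, whatever x ∈ (0, 1) is chosen. A short exact check (our sketch):

```python
from fractions import Fraction

def partitions(n, i=1):
    """All c = {i: c_i} with sum_i i*c_i = n: partitions of n in multiplicity form."""
    if n == 0:
        yield {}
        return
    if i > n:
        return
    for ci in range(n // i + 1):
        for rest in partitions(n - i * ci, i + 1):
            yield {i: ci, **rest} if ci else rest

def geometric_weight(c, x):
    """P[Z_i = c_i for all i] for independent geometric Z_i ~ NB(1, x^i),
    up to the factor prod_i (1 - x^i), which does not depend on c."""
    w = Fraction(1)
    for i, ci in c.items():
        w *= x ** (i * ci)
    return w

n, x = 9, Fraction(1, 3)
weights = [geometric_weight(c, x) for c in partitions(n)]
assert all(w == x ** n for w in weights)  # every partition equally weighted
print(len(weights), "partitions of", n, "- all equally likely after conditioning")
```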

Selections

For selections, from (2.4), the fraction of instances of size n with component size counts c is given by

f_S^{(n)}(c) = k_S(n) ∏_{i=1}^n \binom{m_i}{c_i},   (2.12)

as long as ∑ i c_i = n; as earlier, k_S(n) = 1/p(n) is a normalizing constant. For any x > 0, note that, if Z_1, . . . , Z_n are independent with

Z_i ∼ Bi(m_i, x^i/(1 + x^i)),   (2.13)

so that

IP[Z_i = l] = \binom{m_i}{l} (x^i/(1 + x^i))^l (1/(1 + x^i))^{m_i − l},   l = 0, 1, . . . , m_i,

then we have

IP[Z_1 = c_1, . . . , Z_n = c_n] = ∏_{i=1}^n [\binom{m_i}{c_i} (x^i/(1 + x^i))^{c_i} (1/(1 + x^i))^{m_i − c_i}]
 = x^n ∏_{j=1}^n (1 + x^j)^{−m_j} ∏_{i=1}^n \binom{m_i}{c_i},

as long as ∑ i c_i = n. Thus

f̃_S^{(n)}(c) = IP[Z_1 = c_1, . . . , Z_n = c_n | ∑_{i=1}^n i Z_i = n] = k̃_S(n) ∏_{i=1}^n \binom{m_i}{c_i},   (2.14)

and once again it follows that k̃_S(n) = k_S(n), and from this also that f̃_S^{(n)}(c) = f_S^{(n)}(c) for all c. Thus the component size counts for selections satisfy the Conditioning Relation, with binomially distributed Z_i.

Table 2.1 summarizes these three classes.

Table 2.1. Three basic classes of combinatorial structure.

assembly: ∑_{n≥0} p(n) z^n/n! = exp(∑_{i≥1} m_i z^i/i!); Z_i ∼ Poisson(m_i x^i/i!), any x > 0; IE Z_i = m_i x^i/i!.

multiset: ∑_{n≥0} p(n) z^n = ∏_{i≥1} (1 − z^i)^{−m_i}; Z_i ∼ negative binomial (m_i, x^i), any x ∈ (0, 1); IE Z_i = m_i x^i/(1 − x^i).

selection: ∑_{n≥0} p(n) z^n = ∏_{i≥1} (1 + z^i)^{m_i}; Z_i ∼ binomial (m_i, x^i/(1 + x^i)), any x ∈ (0, ∞); IE Z_i = m_i x^i/(1 + x^i).

Logarithmic assemblies, multisets and selections

For certain choices of the m_i, the free parameter x in the definitions of the random variables Z_i above can be so chosen that, for i large, we have

i IP[Z_i = 1] → θ and i IE Z_i → θ,   (2.15)

for some finite constant θ > 0. If this is possible, we call the structure logarithmic.

Logarithmic assemblies are just those that have

m_i/i! ∼ θ y^i/i as i → ∞,   (2.16)

for some y > 0, θ > 0, since we can then take x = 1/y in (2.7). For example, permutations have y = θ = 1, and random mappings have y = e, θ = 1/2. Forests of labelled trees do not form a logarithmic combinatorial structure, but are close to the borderline, in that m_i/i! = i^{i−2}/i! ∼ e^i/√(2π i^5), so that, after taking out the exponential growth of m_i/i!, the remaining factor decays as a constant θ divided by i^p; but for the power p we have p = 5/2, rather than p = 1. Switching to rooted trees increases m_i exactly by a factor of i, causing p to change from 5/2 to 3/2, and we still fail to have a logarithmic combinatorial structure.

Logarithmic multisets and selections satisfy

m_i ∼ θ y^i/i as i → ∞,   (2.17)

for some y > 1, θ > 0, again by taking x = 1/y in (2.10) or (2.13). For example, polynomials over a field of q elements have y = q, θ = 1, and random mapping patterns have y = ρ^{−1}, θ = 1/2, where ρ = 0.3383 . . . .
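For polynomials over GF(q) the logarithmic condition can be inspected numerically: with x = 1/q, i IE Z_i = i m_i q^{−i}/(1 − q^{−i}) → 1 = θ. A sketch (helper names ours):

```python
def mobius(n):
    """Moebius function mu(n), by trial division."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result

def m_poly(i, q):
    """Number of monic irreducible polynomials of degree i over GF(q)."""
    return sum(mobius(d) * q ** (i // d) for d in range(1, i + 1) if i % d == 0) // i

q = 2
x = 1.0 / q
for i in (5, 10, 20, 40):
    ez = m_poly(i, q) * x ** i / (1 - x ** i)  # IE Z_i for NB(m_i, x^i)
    print(i, i * ez)  # approaches theta = 1
```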

Table 2.2. Some basic combinatorial structures, with p(n) = number of instances of size n, and m_i = number of instances of size i having only one component.

integer partitions (multiset): p(n) ∼ e^{2π√(n/6)}/(4n√3); m_i = 1; not logarithmic.

integer partitions with all parts distinct (selection): p(n) ∼ e^{π√(n/3)}/(4 · 3^{1/4} n^{3/4}); m_i = 1; not logarithmic.

set partitions (assembly): p(n) ∼ e^{n(r−1+1/r)−1}/√(log n), where r e^r = n; m_i = 1; not logarithmic.

graphs (assembly): p(n) = 2^{n(n−1)/2}; m_i ∼ p(i); not logarithmic.

2-regular graphs (assembly): p(n) ∼ √2 e^{−3/4} (n/e)^n; m_i = (i − 1)!/2, i ≥ 3; logarithmic, θ = 1/2.

permutations (assembly): p(n) = n!; m_i = (i − 1)!; logarithmic, θ = 1.

mappings (assembly): p(n) = n^n; m_i = (i − 1)! ∑_{k<i} i^k/k! ∼ (1/2) e^i (i − 1)!; logarithmic, θ = 1/2.

mapping patterns (multiset): p(n) ∼ c_0 ρ^{−n}/√n (ρ ≐ .3383; c_0 ≐ .442); m_i ∼ ρ^{−i}/(2i); logarithmic, θ = 1/2.

forests of labelled (unrooted) trees (assembly): p(n) ∼ √e n^{n−2}; m_i = i^{i−2}; not logarithmic.

forests of labelled rooted trees (assembly): p(n) = (n + 1)^{n−1}; m_i = i^{i−1}; not logarithmic.

forests of unlabelled (unrooted) trees (multiset): p(n) ∼ c_1 ρ^{−n} n^{−5/2} (c_1 ≐ 1.02); m_i ∼ c ρ^{−i} i^{−5/2} (ρ ≐ .3383; c ≐ .5349); not logarithmic.

forests of unlabelled rooted trees (multiset): p(n) ∼ c_2 ρ^{−n} n^{−3/2} (c_2 ≐ 1.3003); m_i ∼ c_r ρ^{−i} i^{−3/2} (ρ ≐ .3383; c_r ≐ .4399); not logarithmic.

monic polynomials over GF(q) (multiset): p(n) = q^n; m_i = (1/i) ∑_{d|i} µ(i/d) q^d ∼ q^i/i; logarithmic, θ = 1.

squarefree polynomials over GF(q) (selection): p(n) ∼ c(q) q^n; m_i = (1/i) ∑_{d|i} µ(i/d) q^d; logarithmic, θ = 1.

Ewens Sampling Formula, any θ > 0 (assembly): p(n) = ∏_{i=1}^n (θ + i − 1); m_i = θ (i − 1)!; logarithmic.

2.4 Refining and coloring

A number of new combinatorial structures can be built from the basic models described above, using various refining and coloring operations. These we now describe.

Refining a structure

In each of the examples discussed so far, the component counting process C^{(n)} = (C_1^{(n)}, . . . , C_n^{(n)}) specifies the number of components of each weight i in an instance of total weight ∑_{i=1}^n i C_i^{(n)} = n. We now suppose that the m_i possibilities for a component of weight i have been labelled 1, 2, . . . , m_i, and we let D_{ij}^{(n)} count the number of occurrences of weight i having label j. Thus the counts C_i^{(n)} may be refined as

C_i^{(n)} = ∑_{j=1}^{m_i} D_{ij}^{(n)},   1 ≤ i ≤ n.

For the fully refined process corresponding to a random object of size n, we denote the combinatorial process by

D^{(n)} = (D_{ij}^{(n)}, 1 ≤ i ≤ n, 1 ≤ j ≤ m_i).

In this construction, the structure has been refined as much as possible. Intermediate refinements are also of interest. We suppose that the m_i objects of weight i are divided into r_i classes, defined by a partition ∆_{i1}, . . . , ∆_{ir_i}, say, of {1, 2, . . . , m_i}. We define

m_{ij} = |∆_{ij}|,   j = 1, . . . , r_i;  i = 1, . . . , n.

The quantity D_{ij}^{(n)} now counts the number of occurrences of objects belonging to the jth class of objects of weight i, and the refined process is

D^{(n)} = (D_{ij}^{(n)}, 1 ≤ i ≤ n, 1 ≤ j ≤ r_i).

When ∆_{ij} = {j}, j = 1, . . . , m_i, so that |∆_{ij}| = 1, we recover the fully refined process.

It is convenient to generalize to a situation that handles weighted sums with an arbitrary finite index set. We assume that I is a finite set, and that w is a given weight function with values in IR, such that, for α ∈ I, w(α) is the weight of α. For any a = (a(α))_{α∈I} ∈ ZZ_+^I, we use vector dot product notation for the weighted sum

w · a = ∑_{α∈I} a(α) w(α).

We denote by D^{(n)} = (D_α^{(n)})_{α∈I} the process that counts the number of objects having total weight w · D^{(n)} = n, with D_α^{(n)} components of type α ∈ I. For example, for the fully refined process we may take

I = {α = (i, j) : 1 ≤ i ≤ n, 1 ≤ j ≤ m_i},   (2.18)

and weight function

w(α) = i for α = (i, j) ∈ I;   (2.19)

for the partially refined process,

I = {α = (i, j) : 1 ≤ i ≤ n, 1 ≤ j ≤ r_i}.   (2.20)

The classical combinatorial structures.

The partially refined combinatorial processes D^{(n)} also satisfy the Conditioning Relation, as may be seen using much the same argument as above.

For the index set I given by (2.20) and weight function w given by (2.19), and for b = (b_{ij}) ∈ ZZ_+^I having weight b · w = n, consider the number R(n, b) of objects having b_{ij} components of type (i, j) for (i, j) ∈ I. For assemblies, the refined generalization of Cauchy’s formula is that

R(n, b) = |{assemblies on [n] : D^{(n)} = b}| = 1l{b · w = n} n! ∏_{(i,j)∈I} (m_{ij}/i!)^{b_{ij}} (1/b_{ij}!).   (2.21)

Fixing any x > 0, and letting Z_{ij}, (i, j) ∈ I, denote independent Poisson random variables with Z_{ij} ∼ Po(m_{ij} x^i/i!), we see by comparison with (2.6) that

L(D_{ij}^{(n)}, (i, j) ∈ I) = L(Z_{ij}, (i, j) ∈ I | ∑_{(i,j)∈I} i Z_{ij} = n).   (2.22)

In the special case of complete refinement with index set (2.18), it follows that Z_{ij} ∼ Po(x^i/i!).

There is another construction of the refined process which is useful in applications. Write C_i^{(n)} = ∑_{j:(i,j)∈I} D_{ij}^{(n)}, 1 ≤ i ≤ n, and let b be an array (b_{ij}, (i, j) ∈ I) ∈ ZZ_+^I satisfying b · w = ∑_{(i,j)∈I} i b_{ij} = n. Define the column sums b_{i+} = ∑_{j:(i,j)∈I} b_{ij}, 1 ≤ i ≤ n, and set b_+ = (b_{1+}, . . . , b_{n+}). Then the refined combinatorial structure satisfies

IP[D^{(n)} = b] = IP[D^{(n)} = b | C^{(n)} = b_+] IP[C^{(n)} = b_+]
 = IP[C^{(n)} = b_+] ∏_{i=1}^n [\binom{b_{i+}}{b_{i1}, . . . , b_{ir_i}} ∏_{j=1}^{r_i} (m_{ij}/m_i)^{b_{ij}}].

Hence we may think of generating a refined structure in two stages: first an unrefined instance of weight n is generated, giving component counts b_{1+}, . . . , b_{n+}. Then each of the b_{i+} components of size i is assigned a label independently and at random, receiving label j with probability p_{ij} = m_{ij}/m_i. Notice that, for wreath products in Example 2.16, the quantity p_{ij} = |C_j|/|G| depends on j but not on i.
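The two-stage construction is straightforward to implement: generate the unrefined counts b_{1+}, . . . , b_{n+} by any means, then label each component of size i independently, using label j with probability p_{ij} = m_{ij}/m_i. A sketch of the second stage (the function name and the example class probabilities are hypothetical):

```python
import random
from collections import Counter

def refine(counts, probs, rng=random):
    """Stage two: counts[i] is the unrefined count b_{i+}, and probs[i][j]
    is p_{ij} = m_{ij}/m_i; each size-i component gets an independent label."""
    refined = Counter()
    for i, b in counts.items():
        labels = list(probs[i])
        w = [probs[i][j] for j in labels]
        for lab in rng.choices(labels, weights=w, k=b):
            refined[(i, lab)] += 1
    return refined

# Each size split into two classes with probabilities 2/3 and 1/3;
# start from 3 components of size 1 and 2 components of size 2.
probs = {1: {1: 2 / 3, 2: 1 / 3}, 2: {1: 2 / 3, 2: 1 / 3}}
d = refine({1: 3, 2: 2}, probs, random.Random(1))
print(dict(d))
```

The refinement only redistributes components among labels, so the row sums of the output always reproduce the unrefined counts.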

For multisets,

R(n, b) = |{multisets of weight n : D^{(n)} = b}| = 1l{b · w = n} ∏_{α∈I} \binom{m_α + b_α − 1}{b_α},   (2.23)

so that, fixing any 0 < x < 1, we can take independent Z_{ij}, (i, j) ∈ I, with negative binomial distributions NB(m_{ij}, x^i), and (2.22) holds. In the special case of complete refinement, the Z_{ij} have geometric distributions with parameter x^i. This time the conditional distribution of D^{(n)} given C^{(n)} is negative hypergeometric:

IP[D^{(n)} = b | C^{(n)} = b_+] = ∏_{i=1}^n [\binom{m_i + b_{i+} − 1}{b_{i+}}^{−1} ∏_{j=1}^{r_i} \binom{m_{ij} + b_{ij} − 1}{b_{ij}}].

For selections,

R(n, b) = |{selections of weight n : D^{(n)} = b}| = 1l{b · w = n} ∏_{α∈I} \binom{m_α}{b_α}.   (2.24)

It follows that we can fix any x > 0, and take the independent Z_{ij} to have the binomial distributions Bi(m_{ij}, x^i/(1 + x^i)), in which case (2.22) holds once more. For complete refinement, the Z_{ij} have independent Bernoulli distributions with parameter x^i/(1 + x^i). Finally, the conditional distribution of D^{(n)} given C^{(n)} is hypergeometric:

IP[D^{(n)} = b | C^{(n)} = b_+] = ∏_{i=1}^n [\binom{m_i}{b_{i+}}^{−1} ∏_{j=1}^{r_i} \binom{m_{ij}}{b_{ij}}].

Thus one could consider the fully refined versions of the counting formulas (2.21), (2.23) and (2.24) to be the basic counting formulas, with (2.6), (2.9) and (2.12) as corollaries derived by summing; and thus consider the Poisson, geometric and Bernoulli distributions to be the basic distributions, with the Poisson, negative binomial and binomial distributions arising in (2.7), (2.10) and (2.13) through convolution.

Coloring

Another way to induce new combinatorial structures is by coloring the components of a given structure. We consider first the case of an assembly, with m_i possible components of size i and p(n) instances of total weight n. We are given a collection of t colors, and we label each element of a component of size i with one of those colors. This results in t^i ways of labelling a component of size i. The new instance is of the same weight as the old one, but the number m′_i of possible components of size i in the new structure is now

m′_i = m_i t^i,

and the total number p′(n) of instances of weight n is

p′(n) = p(n) t^n.

Various refinements of this basic structure are of interest. For example, we may suppose that an overall color for a component is determined by the colors assigned to each element of a component. One way to do this is to assume that, for a set C of c labels, t_{ij} of the t^i colorings of any component of size i result in label j ∈ C. Then

∑_{j=1}^c t_{ij} = t^i,   i ≥ 1.

The structure that keeps track of the number of components of size i and label j is then a refinement as described in the last section. The index set is

I = {(i, j) : 1 ≤ i ≤ n; 1 ≤ j ≤ c},

and the component counts are

m_α = m_{ij} = m_i t_{ij},   (i, j) ∈ I.

Example 2.16 Wreath products. A classical example of this type is provided by wreath products of the form G wr S_n, the complete monomial groups over the finite group G; cf. James and Kerber (1981), Chapter 4.2, for example. The original structure corresponds to permutations on n elements, so that m_i = (i − 1)! and p(n) = n!. Suppose that G has t = |G| elements; the group elements correspond to the t colors. The refined structure arises as follows. Suppose that G has c conjugacy classes C_1, . . . , C_c, with sizes |C_1|, . . . , |C_c|. For any cycle (ν_1, . . . , ν_i) of size i, we pick group elements g_{ν_1}, . . . , g_{ν_i}, and form the product f = g_{ν_1} · · · g_{ν_i}. We say that the colored cycle has label j if f ∈ C_j. We show that

t_{ij} = t^{i−1} |C_j|

as follows. To obtain a colored cycle with label j, note that any h ∈ C_j may be reached by choosing the t^{i−1} possible group elements g_{ν_1}, . . . , g_{ν_{i−1}} freely, letting g denote their product, and setting g_{ν_i} = g^{−1}h; the label of the colored cycle is then j.

We see from this that, if G is a finite group with c conjugacy classes, then the number of elements of type b = (b_{ij}) in G wr S_n is

R(n, b) = 1l{∑_{i,j} i b_{ij} = n} n! ∏_{i,j} ((i − 1)! t^{i−1} |C_j|/i!)^{b_{ij}} (1/b_{ij}!)
 = 1l{∑_{i,j} i b_{ij} = n} |G wr S_n| ∏_{i,j} (|C_j|/(i|G|))^{b_{ij}} (1/b_{ij}!),   (2.25)

as given in Lemma 4.2.10 of James and Kerber (1981).

Example 2.17 Linear coloring. This example applies to assemblies, multisets and selections. Now any component is colored with one of t possible colors. In this case, m_i is replaced by

m′_i = m_i t,   i ≥ 1.

The refined process that keeps track of the numbers D_{ij}^{(n)} of components of weight i and color j in an instance of total weight n has index set

I = {(i, j) : 1 ≤ i ≤ n; 1 ≤ j ≤ t},

while

m′_{ij} = m_i,   (i, j) ∈ I.

Example 2.18 Structures with all component sizes distinct. For any component size counting process C^{(n)}, as in any of the previous examples, condition on the event that all component sizes are distinct. For Example 2.1, this yields the example of partitions of an integer with all parts distinct; but for all our other examples, notice that “all component sizes are distinct” is not the same as “all parts distinct”. See Stark (1994) and Hwang (1994) for further examples.

2.5 Tilting

In the previous discussion, we have assumed that the particular instance of a combinatorial structure is chosen uniformly from among the possible instances of a given weight. In this section, we discuss the case in which an instance is chosen with probability proportional to ϕ^{# components}, for some ϕ > 0.

Denote by IP_ϕ the probability under this model, and write IP = IP_1 for the uniformly chosen case. Then, for c = (c_1, . . . , c_n),

IP_ϕ[C^{(n)} = c] ∝ ϕ^{c_1+···+c_n} IP[C^{(n)} = c],

where here a(n, ϕ, c) ∝ b(n, ϕ, c) means that the ratio a/b is the same for all choices of c. On the other hand, suppose that Z_i(ϕ), i ≥ 1, are independent random variables with distributions given by

IP[Z_i(ϕ) = k] = ϕ^k IP[Z_i = k]/IE(ϕ^{Z_i}),   k ≥ 0,   (2.26)

where (Z_i, i ≥ 1) are the independent random variables associated with C^{(n)} in the Conditioning Relation. Then, writing Z^{(n)}(ϕ) for the vector (Z_1(ϕ), . . . , Z_n(ϕ)), we have

IP[Z^{(n)}(ϕ) = c] ∝ ϕ^{c_1+···+c_n} IP[Z^{(n)} = c].

It follows that, if $\sum_{i=1}^n i c_i = n$, then
\[
\begin{aligned}
\mathbb{P}_\varphi[C^{(n)} = c] &\propto \varphi^{c_1+\cdots+c_n}\, \mathbb{P}[C^{(n)} = c]
= \varphi^{c_1+\cdots+c_n}\, \mathbb{P}[Z^{(n)} = c \mid T_{0n} = n] \\
&\propto \varphi^{c_1+\cdots+c_n}\, \mathbb{P}[Z^{(n)} = c]
\propto \mathbb{P}\Bigl[Z^{(n)}(\varphi) = c \,\Big|\, \sum_{j=1}^n j Z_j(\varphi) = n\Bigr],
\end{aligned}
\]
and since both $\mathcal{L}(C^{(n)})$ and $\mathcal{L}\bigl(Z^{(n)}(\varphi) \,\big|\, \sum_{j=1}^n j Z_j(\varphi) = n\bigr)$ are concentrated on the set of $c$ satisfying $\sum_{j=1}^n j c_j = n$, it follows that
\[
\mathbb{P}_\varphi[C^{(n)} = c] = \mathbb{P}\Bigl[Z^{(n)}(\varphi) = c \,\Big|\, \sum_{j=1}^n j Z_j(\varphi) = n\Bigr]. \tag{2.27}
\]
Thus the Conditioning Relation holds under $\mathbb{P}_\varphi$ as well, with the $Z_i$ replaced by the $Z_i(\varphi)$.

It remains to identify the distribution of the $Z_i(\varphi)$. If $Z_i \sim \mathrm{Po}(\lambda_i)$, then $\mathbb{E}(\varphi^{Z_i}) = \exp\{-\lambda_i(1-\varphi)\}$, so that $Z_i(\varphi) \sim \mathrm{Po}(\lambda_i\varphi)$. Hence, for assemblies, $Z_i(\varphi) \sim \mathrm{Po}(\varphi m_i x^i/i!)$. The joint falling factorial moments of $C^{(n)}$ are given by the following formula (cf. Arratia and Tavaré, 1994, (126)). For $(r_1, \dots, r_b) \in \mathbb{Z}_+^b$ with $m = r_1 + 2r_2 + \cdots + b r_b$,
\[
\mathbb{E}_\varphi\Biggl\{\prod_{j=1}^b (C_j^{(n)})_{[r_j]}\Biggr\}
= \mathbf{1}\{m \le n\}\, x^{-m}\, \frac{n!}{p_\varphi(n)}\, \frac{p_\varphi(n-m)}{(n-m)!} \prod_{j=1}^b \biggl(\frac{\varphi\, m_j x^j}{j!}\biggr)^{r_j}, \tag{2.28}
\]
where, if $K_{0n}$ denotes the number of components, and $p(n)$ is the number of instances of weight $n$ in an assembly, we define
\[
p_\varphi(n) = p(n)\, \mathbb{E}(\varphi^{K_{0n}}).
\]


Note that, as in the special case (1.4), the product term on the right is precisely $\mathbb{E}\bigl\{\prod_{j=1}^b (Z_j(\varphi))_{[r_j]}\bigr\}$.

If $Z_i \sim \mathrm{NB}(r_i, p_i)$ then, for $\varphi p_i < 1$, $Z_i(\varphi) \sim \mathrm{NB}(r_i, \varphi p_i)$. Hence, for multisets, the $Z_i(\varphi)$ are $\mathrm{NB}(m_i, \varphi x^i)$, as long as $\varphi x < 1$. Finally, if we have $Z_i \sim \mathrm{Bi}(r_i, p_i)$, then $Z_i(\varphi) \sim \mathrm{Bi}(r_i, \varphi p_i/(1 - p_i + p_i\varphi))$, so that, for selections, the $Z_i(\varphi)$ are $\mathrm{Bi}(m_i, \varphi x^i/(1 + \varphi x^i))$.
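These identifications are easy to confirm numerically. The sketch below (plain Python; the parameter values and the truncation point `N` are illustrative assumptions, not from the text) tilts a Poisson and a negative binomial pmf by $\varphi$ as in (2.26) and compares the results with the claimed $\mathrm{Po}(\lambda\varphi)$ and $\mathrm{NB}(r, \varphi p)$ laws.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def nb_pmf(r, p, k):
    # negative binomial: pmf proportional to C(r+k-1, k) p^k, k = 0, 1, ...
    return math.comb(r + k - 1, k) * p ** k * (1 - p) ** r

def tilt(pmf, phi, N):
    # exponential tilting as in (2.26): P_phi[k] proportional to phi^k P[k]
    w = [phi ** k * pmf(k) for k in range(N)]
    s = sum(w)
    return [x / s for x in w]

phi, lam, N = 0.7, 1.3, 60
tilted = tilt(lambda k: poisson_pmf(lam, k), phi, N)
target = [poisson_pmf(lam * phi, k) for k in range(N)]
err_po = max(abs(a - b) for a, b in zip(tilted, target))

r, p = 3, 0.4                    # note phi * p < 1, as required
tilted_nb = tilt(lambda k: nb_pmf(r, p, k), phi, N)
raw = [nb_pmf(r, phi * p, k) for k in range(N)]
s = sum(raw)
target_nb = [x / s for x in raw]  # renormalize over the truncated range
err_nb = max(abs(a - b) for a, b in zip(tilted_nb, target_nb))
```

Both errors are at the level of machine precision, since the omitted tails beyond `N` are negligible for these parameter choices.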

Example 2.19 The Ewens Sampling Formula. A very important example of tilting is provided by the Ewens Sampling Formula with parameter $\theta$, denoted by $\mathrm{ESF}(\theta)$. For each $n$, this is the one-parameter family of distributions over the vectors $C^{(n)} = (C_1^{(n)}, C_2^{(n)}, \dots, C_n^{(n)})$ with $n = \sum_{i=1}^n i C_i^{(n)}$, defined to be the joint distribution of the component counts for a random permutation of $n$ objects, chosen with probability biased by $\theta^{K_{0n}}$, where $K_{0n}$ is the number of cycles. The parameter $\theta$ can take any positive value, and the choice $\theta = 1$ gives exactly the component size distribution for the cycles of a random permutation, as given in (1.3).

We saw in Theorem 1.3 that, for random permutations, the $Z_i$ have distributions $\mathrm{Po}(1/i)$; hence, for $\mathrm{ESF}(\theta)$, the $Z_i(\theta)$ have $\mathrm{Po}(\theta/i)$ distributions, and it follows from (2.27) that
\[
\mathbb{P}_\theta[C^{(n)} = c] = \mathbf{1}\Bigl\{\sum_{j=1}^n j c_j = n\Bigr\}\, \frac{n!}{\theta_{(n)}} \prod_{j=1}^n \Bigl(\frac{\theta}{j}\Bigr)^{c_j} \frac{1}{c_j!}, \qquad c \in \mathbb{Z}_+^n, \tag{2.29}
\]
where $\theta_{(n)} := \theta(\theta+1)\cdots(\theta+n-1)$ denotes the rising factorial.

Note also that $\mathrm{ESF}(\theta)$, for general $\theta$, could be described as the assembly with $m_i = \theta\,(i-1)!$, if we ignored the requirement that the $m_i$ be nonnegative integers. The $\mathrm{ESF}(\theta)$ family of distributions plays a substantial part in all that follows, and so we normally distinguish any associated random quantities with an asterisk:
\[
C^{*(n)} := (C_1^{*(n)}, \dots, C_n^{*(n)}) \sim \mathrm{ESF}(\theta). \tag{2.30}
\]

To generate a $\theta$-biased permutation, we can use a variant of the Chinese Restaurant Process described in Example 2.4. Start the first cycle with the integer 1. The integer 2 either joins the first cycle (to the right of 1), with probability $1/(\theta+1)$, or starts the second cycle, with probability $\theta/(\theta+1)$. Suppose that $k-1$ integers have been assigned to cycles. Integer $k$ either starts a new cycle, with probability $\theta/(\theta+k-1)$, or is inserted to the right of a randomly chosen integer already assigned to a cycle, each such choice having probability $1/(\theta+k-1)$. After $n$ integers have been assigned, it is easy to check that the resulting random permutation $\pi$ has probability $\mathbb{P}_\theta[\pi] = \theta^{|\pi|}/\theta_{(n)}$, for $\pi \in S_n$, where $|\pi|$ denotes the number of cycles of $\pi$. See Chapter 4 for related material, as well as Diaconis and Pitman (1986) and Pitman (1997).

We shall see in Chapter 4.6 that the number $K_{0n}$ of cycles has distribution
\[
\mathbb{P}_\theta[K_{0n} = k] = \frac{\theta^k\, |S_n^{(k)}|}{\theta_{(n)}}, \qquad k = 1, 2, \dots, n, \tag{2.31}
\]
where $|S_n^{(k)}|$ denotes the number of permutations of $n$ objects with exactly $k$ cycles, an unsigned Stirling number of the first kind.


In (1.38) we found the distribution of the ordered cycle sizes $A_1^{(n)}, A_2^{(n)}, \dots$ under $\mathbb{P}_1$. We show in Chapter 4.8 that the analogous distribution under $\mathbb{P}_\theta$ is
\[
\mathbb{P}_\theta[A_1^{(n)} = a_1, \dots, A_k^{(n)} = a_k,\ K_{0n} = k]
= \frac{\theta^k}{\theta_{(n)}}\, \frac{n!}{a_k(a_k + a_{k-1}) \cdots (a_k + \cdots + a_1)}. \tag{2.32}
\]

The Ewens Sampling Formula arose originally in population genetics in Ewens (1972); see the article by Ewens and Tavaré in Johnson, Kotz and Balakrishnan (1997, Chapter 41) for an historical overview. For a recent application in the area of disclosure risk assessment, see Samuels (1998) and Fienberg and Makov (2001). Properties of $\mathrm{ESF}(\theta)$ form the focus of much of Chapter 4.

Example 2.20 Binary search trees. We consider a random binary search tree on $n+1$ nodes constructed from a random sample $X_1, \dots, X_{n+1}$ from the uniform distribution on $(0,1)$; cf. Lynch (1965), Mahmoud (1992, Chapter 2), Devroye (1988). $X_1$ occupies the first node, at level 0. Subsequent values are used sequentially, joining the left or right subtree according as they are less than the root value or not. This splitting occurs recursively until a final location is found. For an example with $n = 10$, we observe $X_1 = 0.670$, $X_2 = 0.583$, $X_3 = 0.717$, $X_4 = 0.465$, $X_5 = 0.487$, $X_6 = 0.222$, $X_7 = 0.383$, $X_8 = 0.213$, $X_9 = 0.987$, $X_{10} = 0.356$, $X_{11} = 0.493$. This produces the search tree shown in Figure 2.1. The last observation, $X_{11} = 0.493$, is inserted at level 4.

We note that the same tree results by replacing $X_i$ by its position in the sorted list $X_{(1)} < \cdots < X_{(n+1)}$. Thus we define $\pi \in S_{n+1}$ by $X_{(\pi_i)} = X_i$, $i = 1, \dots, n+1$, and form the search tree from the successive values $\pi_1, \dots, \pi_{n+1}$. In our example, $\pi = (\pi_1, \dots, \pi_{11}) = (9\ 8\ 10\ 5\ 6\ 2\ 4\ 1\ 11\ 3\ 7)$. We note that the resulting $\pi$ is uniformly distributed over $S_{n+1}$.

Let $L(n)$ be the level of the last node added to the random binary search tree, so that the possible values of $L(n)$ are $1, 2, \dots, n$. $L(n)$ is the number of comparisons needed to insert a new key into a tree formed from $n$ keys. Lynch (1965) proved that
\[
\mathbb{P}[L(n) = k] = \frac{2^k\, |S_n^{(k)}|}{(n+1)!}, \qquad k = 1, 2, \dots, n.
\]
Noting that $2_{(n)} = (n+1)!$, we see from (2.31) that $L(n)$ has the same distribution as $K_{0n}$, the number of cycles in a random $n$-permutation under $\mathrm{ESF}(2)$.

This intriguing parallel suggests that we look further, for some aspect of the structure having the same joint distribution as the cycle lengths under $\mathrm{ESF}(2)$. This search is made easier by looking for something distributed as $(A_1^{(n)}, A_2^{(n)}, \dots)$, the ordered list of cycle lengths. Their joint law is given in (2.32).

[Figure 2.1: A binary search tree constructed from 11 items.]

We proceed as follows. Follow the path from the root to the last node inserted. Label the root of the tree $v_1$, and suppose that $L(n) = k$. The path from the root may be labelled $v_1, v_2, \dots, v_k, v_{k+1}$, where $v_i$ is a node at height $i-1$. For $1 \le i \le k$, let $B_i$ be the size of the left-or-right subtree, including $v_i$, hanging from $v_i$ in the direction away from the path to $v_{k+1}$, so that $B_1 + \cdots + B_k = n$. We define $B_i = 0$ for $i > L(n)$. For the example with $n = 10$, the subtrees are indicated by the dotted lines in Figure 2.1, and $B_1 = 3$, $B_2 = 1$, $B_3 = 5$, $B_4 = 1$.
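The example can be reproduced directly. The sketch below (plain Python, standard binary-search-tree insertion) rebuilds the tree of Figure 2.1 from the sample values in the text, and recovers $L(n) = 4$ and $(B_1, \dots, B_4) = (3, 1, 5, 1)$.

```python
xs = [0.670, 0.583, 0.717, 0.465, 0.487, 0.222,
      0.383, 0.213, 0.987, 0.356, 0.493]

tree = {xs[0]: (None, None)}          # value -> (left child, right child)

def insert(x):
    """Standard binary-search-tree insertion; returns the level of x."""
    node, level = xs[0], 0
    while True:
        left, right = tree[node]
        child = left if x < node else right
        if child is None:
            tree[node] = (x, right) if x < node else (left, x)
            tree[x] = (None, None)
            return level + 1
        node, level = child, level + 1

for x in xs[1:-1]:
    insert(x)
L = insert(xs[-1])                    # level of the last key, X_11 = 0.493

def size(v):
    """Number of nodes in the subtree rooted at v."""
    if v is None:
        return 0
    left, right = tree[v]
    return 1 + size(left) + size(right)

# Path v_1, ..., v_k from the root to the parent of the last node; then
# B_i counts v_i plus the subtree hanging off v_i away from the path.
path, node = [], xs[0]
while node != xs[-1]:
    path.append(node)
    left, right = tree[node]
    node = left if xs[-1] < node else right
B = []
for i, v in enumerate(path):
    left, right = tree[v]
    towards = path[i + 1] if i + 1 < len(path) else xs[-1]
    B.append(1 + size(right if towards == left else left))
```

Running this gives `L == 4` and `B == [3, 1, 5, 1]`, with `sum(B) == n == 10`, matching the text.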

Let $k \ge 1$ and $a_1, \dots, a_k > 0$ be given, with $a_1 + \cdots + a_k = n$. For $1 \le i \le k$, let $s_i = a_i + \cdots + a_k$. Recalling (2.32), the goal is to prove that, for the search tree on $n+1$ nodes,
\[
\mathbb{P}[B_1 = a_1, \dots, B_k = a_k] = \frac{2^k}{n+1} \prod_{i=1}^k \frac{1}{s_i}.
\]
Multiplying by $(n+1)!$, we have to show that the number of search trees on $n+1$ nodes having $B_1 = a_1, \dots, B_k = a_k$ is
\[
n!\, 2^k \prod_{i=1}^k \frac{1}{s_i}.
\]

To form a search tree to fit the specification above, start with a skeleton — an unlabelled path of $k$ edges to serve as the path from the last element inserted back to the root. The right-left choice for each edge, to fit the binary tree structure, accounts for the factor of $2^k$. The partition of $[n]$ into blocks $R_1, \dots, R_k$ with $|R_i| = a_i$ is then determined by the search tree requirement; for example, if the edge from $v_1$ down to $v_2$ goes left, then $R_1 = \{n - a_1 + 1, \dots, n\}$, while if the edge goes right, $R_1 = [a_1] = \{1, \dots, a_1\}$. In the left-or-right subtrees hung from height $i-1$, the element $v_i$ on the skeleton is the extreme element of $R_i$, either max or min, as determined by the left-right choice already made.

Given $R_1, \dots, R_k$, the search tree constraint is that the element $v_i$ on the skeleton at height $i-1$ comes first among the $s_i$ elements of $R_i \cup \cdots \cup R_k$. The number of $n$-permutations consistent with this is $n! \prod_{i=1}^k 1/s_i$, completing our proof that the joint distribution of $(B_1, B_2, \dots)$ is given by (2.32) with $\theta = 2$.

Example 2.21 Coagulation–fragmentation processes. Discrete Markov coagulation–fragmentation processes are used to model the time evolution of processes such as polymerization or social grouping, in which units associate in clusters. The models that we consider here are treated in much more detail in Kelly (1979, Chapter 8) and Whittle (1986, Chapters 13–17), and go back to Whittle (1965). The state of a simple $n$ particle system is described by the vector $c := (c_1, \dots, c_n)$, in which $c_j$ denotes the number of clusters of size $j$, so that $\sum_{j=1}^n j c_j = n$. The transition rates are given by
\[
\begin{aligned}
\text{Coagulation:}\quad & c \to c - \varepsilon^{(i)} - \varepsilon^{(j)} + \varepsilon^{(i+j)} && \text{at rate } \phi_i(c_i)\phi_j(c_j)\lambda(i,j), && 1 \le i \ne j,\ i+j \le n; \\
& c \to c - 2\varepsilon^{(i)} + \varepsilon^{(2i)} && \text{at rate } \phi_i(c_i)\phi_i(c_i - 1)\lambda(i,i), && 2 \le i \le \lfloor n/2 \rfloor; \\
\text{Fragmentation:}\quad & c \to c - \varepsilon^{(i+j)} + \varepsilon^{(i)} + \varepsilon^{(j)} && \text{at rate } \phi_{i+j}(c_{i+j})\mu(i,j), && i, j \ge 1,\ i+j \le n.
\end{aligned}
\]

In the standard mass action model of chemical kinetics, the functions $\phi_i$, which determine the relative dependence of the transition rates on the abundances of the clusters of different sizes, are all simply taken to be the identity function — $\phi_i(l) = l$ for all $i$ and $l$ — with the homogeneous mixing interpretation that the overall encounter rate between $i$- and $j$-clusters is proportional to the product of the numbers of $i$- and $j$-clusters in the system, and that the overall dissociation rate of $(i+j)$-clusters into $i$- and $j$-clusters is proportional to the number of $(i+j)$-clusters in the system. However, more general functions are also allowed in the above formulation. The resulting Markov process is time reversible if the reaction specific coagulation and fragmentation rates $\lambda(i,j)$ and $\mu(i,j)$ are such as to satisfy the equations
\[
\frac{\lambda(i,j)}{\mu(i,j)} = \frac{a_{i+j}}{a_i a_j}, \qquad i, j \ge 1,\ i+j \le n,
\]


for some positive constants $(a_1, a_2, \dots, a_n)$, and the equilibrium distribution is then given by
\[
\pi(c) = B_n \prod_{j=1}^n \frac{a_j^{c_j}}{\phi_j!(c_j)}
\]
on the set $\{c : \sum_{j=1}^n j c_j = n\}$, where $B_n$ is the appropriate normalizing constant, and we use the notation
\[
\phi!(l) := \prod_{s=1}^l \phi(s);
\]
without loss of generality, we can always take $\phi_j(1) = 1$ for all $j$.

Thus, in the mass action model, the equilibrium distribution $\pi$ satisfies

the Conditioning Relation with $Z_j \sim \mathrm{Po}(a_j)$, $j \ge 1$, and the Logarithmic Condition is then satisfied if $a_n \sim \theta n^{-1}$ for some $\theta > 0$; the resulting combinatorial structures are logarithmic assemblies. However, whatever the choice of functions $\phi_j$, $\pi$ still satisfies the Conditioning Relation, in general with
\[
\mathbb{P}[Z_j = r] = b_j^{-1}\, \frac{a_j^r}{\phi_j!(r)}, \qquad r \ge 0, \tag{2.33}
\]
whenever $b_j := \sum_{r \ge 0} a_j^r/\phi_j!(r) < \infty$ for all $j \ge 1$, and the Logarithmic Condition is satisfied if, for instance, $a_n \sim \theta n^{-1}$ and $\phi_j(l) \ge 1$ for all $j$ and $l$. Since (2.33) is equivalent to
\[
\mathbb{P}[Z_j = r+1]/\mathbb{P}[Z_j = r] = a_j/\phi_j(r+1), \qquad r \ge 0,
\]
any combinatorial structure satisfying the Conditioning Relation can be realized in this way, by appropriate choices of the functions $\phi_j$, provided that the random variables $Z_j$ have distributions with support $\mathbb{Z}_+$ or $\{0, 1, \dots, m\}$ for some $m < \infty$.
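As a sanity check on the reversibility claim, the sketch below (plain Python; the constants $a_j$ and $\lambda(i,j)$ are arbitrary illustrative choices) enumerates all states of total mass $n = 6$ in the mass-action case $\phi_j(l) = l$, sets $\mu(i,j) = \lambda(i,j)\, a_i a_j / a_{i+j}$, and verifies detailed balance $\pi(c)\,q(c, c') = \pi(c')\,q(c', c)$ across every coagulation/fragmentation pair.

```python
import math

n = 6
a = dict(zip(range(1, n + 1), [1.0, 0.7, 0.5, 0.9, 0.3, 0.6]))
lam = {(i, j): 1.0 + 0.1 * i + 0.07 * j
       for i in range(1, n) for j in range(1, n) if i + j <= n}
mu = {(i, j): lam[i, j] * a[i] * a[j] / a[i + j] for (i, j) in lam}

def states(rem, j=1):
    """All count vectors (c_1, ..., c_n) with sum_j j*c_j = n."""
    if j > n:
        return [()] if rem == 0 else []
    return [(c,) + tail for c in range(rem // j + 1)
            for tail in states(rem - j * c, j + 1)]

def pi(c):
    """Unnormalized equilibrium weight; mass action gives phi_j!(l) = l!."""
    return math.prod(a[j + 1] ** cj / math.factorial(cj)
                     for j, cj in enumerate(c))

worst = 0.0
for c in states(n):
    for i in range(1, n):
        for j in range(i, n - i + 1):
            # coagulation rate c -> c', with the i == j case handled too
            coag = c[i - 1] * (c[j - 1] - (i == j)) * lam[i, j]
            if coag <= 0:
                continue
            cp = list(c)
            cp[i - 1] -= 1; cp[j - 1] -= 1; cp[i + j - 1] += 1
            frag = cp[i + j - 1] * mu[i, j]      # reverse fragmentation rate
            worst = max(worst, abs(pi(c) * coag - pi(tuple(cp)) * frag))
```

The maximal detailed-balance discrepancy `worst` is zero up to rounding, for any positive choice of the $a_j$ and $\lambda(i,j)$.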


3 Probabilistic preliminaries

This book now focuses exclusively on logarithmic structures which satisfy the Conditioning Relation. Thus we are concerned with the asymptotic behavior of discrete dependent nonnegative integer valued random processes $C^{(n)} = (C_1^{(n)}, C_2^{(n)}, \dots, C_n^{(n)})$ satisfying
\[
C_1^{(n)} + 2C_2^{(n)} + \cdots + nC_n^{(n)} = n, \qquad n = 1, 2, \dots,
\]
whose joint distribution fulfills the Conditioning Relation
\[
\mathcal{L}(C_1^{(n)}, \dots, C_n^{(n)}) = \mathcal{L}(Z_1, Z_2, \dots, Z_n \mid T_{0n} = n), \tag{3.1}
\]
for a sequence of independent random variables $Z_1, Z_2, \dots$ taking values in $\mathbb{Z}_+$, where
\[
T_{0n} = Z_1 + 2Z_2 + \cdots + nZ_n. \tag{3.2}
\]
The random variables $(Z_i,\ i \ge 1)$ are also assumed to be such as to satisfy the Logarithmic Condition
\[
i\,\mathbb{P}[Z_i = 1] \to \theta, \qquad i\,\mathbb{E} Z_i \to \theta \quad \text{as } i \to \infty, \tag{3.3}
\]

for some $\theta > 0$.

In this probabilistic setting, there is no need to be more specific about the distributions of the $Z_i$, so that we are free to move away from the classical Poisson, binomial and negative binomial families; this added flexibility has its uses, for example when investigating random characteristic polynomials over a finite field. However, if the classical families of distributions are abandoned, we need to impose some slight uniformity in the tail behavior of the distributions of the $Z_i$ instead, in order to get the best results. The way that this is done is discussed in detail in Chapter 6. Even within the classical families, we are free to allow $\theta$ in (3.3) to take values different from that normally associated with the uniform distribution over a well known set of combinatorial objects. The simplest example of this arises when the $Z_j$ have Poisson distributions with mean $\mathbb{E} Z_j = \theta/j$, for $\theta \ne 1$, when the distribution of $C^{(n)}$ is the Ewens Sampling Formula given in (2.29).

The first property generally common to such structures is that the counts of small components are asymptotically independent, in the sense that, as $n \to \infty$,
\[
(C_1^{(n)}, C_2^{(n)}, \dots) \to_d (Z_1, Z_2, \dots) \quad \text{in } \mathbb{Z}_+^\infty. \tag{3.4}
\]

This follows from Theorem 5.5 and Theorem 11.1, under the minor restriction (6.11). In contrast, the large components $(C_{b+1}^{(n)}, \dots, C_n^{(n)})$ are essentially dependent, typically having a joint distribution that is close to that of $(C_{b+1}^{*(n)}, \dots, C_n^{*(n)})$, where the distribution of $C^{*(n)}$ is given by the Ewens Sampling Formula in (2.29). Both of these properties describe the approximation of one discrete process by another, simpler discrete process. The former is expressed as a limit theorem; however, to express the latter in terms of convergence to a limit requires normalization, and the limiting process then no longer lies in $\mathbb{Z}_+^\infty$, suggesting that some precision is lost in doing so. However, if $L_1^{(n)} \ge L_2^{(n)} \ge \cdots$ are the sizes of the largest, second largest, \dots\ components, and $L_r^{(n)} = 0$ if $r > K_{0n} = C_1^{(n)} + \cdots + C_n^{(n)}$, then, as $n \to \infty$,
\[
n^{-1}(L_1^{(n)}, L_2^{(n)}, \dots) \to_d (L_1, L_2, \dots) \tag{3.5}
\]

in the simplex
\[
\Delta = \Bigl\{x \in \mathbb{R}_+^\infty : \sum_{i \ge 1} x_i = 1\Bigr\} \subset [0,1]^\infty,
\]
where $L$ has the Poisson–Dirichlet distribution with parameter $\theta$, denoted by $\mathrm{PD}(\theta)$, whose properties are described in Chapter 4. This, and more, is established in Theorem 5.8, once again using Theorem 11.1 and assuming that (6.11) is satisfied. Note that neither (3.4) nor (3.5) need any extra condition in the classical settings of assemblies, multisets and selections: see Chapter 5.2.

However, our principal aim is to go further, and to use a number of different metrics to quantify the accuracy of the discrete approximations to the distributions of both the small and the large components. To prepare the groundwork for this, we discuss some of these metrics in further detail.


3.1 Total variation and Wasserstein distances

For a treatment of the total variation distance and the Wasserstein distances in probability theory in general, see for example Dudley (1989), Lindvall (1992), or the appendix to Barbour, Holst and Janson (1992). Most of the use of total variation distance in this book involves discrete spaces. The total variation distance between the laws $\mathcal{L}(X)$ and $\mathcal{L}(Y)$ of random elements $X$, $Y$ taking values in a discrete space $S$ is defined by
\[
d_{TV}(\mathcal{L}(X), \mathcal{L}(Y)) = \sup_{B \subset S}\, (\mathbb{P}[X \in B] - \mathbb{P}[Y \in B]). \tag{3.6}
\]

Defining
\[
A_> = \{s \in S : \mathbb{P}[X = s] > \mathbb{P}[Y = s]\}
\]
and
\[
A_\ge = \{s \in S : \mathbb{P}[X = s] \ge \mathbb{P}[Y = s]\},
\]
it is easy to see that a set $B$ achieves the supremum in (3.6) if and only if $A_> \subset B \subset A_\ge$; in particular,
\[
d_{TV}(\mathcal{L}(X), \mathcal{L}(Y)) = \mathbb{P}[X \in A_>] - \mathbb{P}[Y \in A_>] = \sum_{s \in S} (\mathbb{P}[X = s] - \mathbb{P}[Y = s])_+.
\]

We have written $x_+$ and $x_-$ for the positive and negative parts of a real number $x$, so that $x = x_+ - x_-$ and $|x| = x_+ + x_-$. The relation
\[
\sum_{s \in S} \mathbb{P}[X = s] = 1 = \sum_{s \in S} \mathbb{P}[Y = s]
\]
implies that
\[
\sum_{s \in S} (\mathbb{P}[X = s] - \mathbb{P}[Y = s])_+ = \sum_{s \in S} (\mathbb{P}[X = s] - \mathbb{P}[Y = s])_- = \tfrac12 \sum_{s \in S} |\mathbb{P}[X = s] - \mathbb{P}[Y = s]|.
\]

Thus it follows also that
\[
d_{TV}(\mathcal{L}(X), \mathcal{L}(Y)) = \sum_{s \in S} (\mathbb{P}[X = s] - \mathbb{P}[Y = s])_+ = \tfrac12 \sum_{s \in S} |\mathbb{P}[X = s] - \mathbb{P}[Y = s]|. \tag{3.7}
\]

A further relation is that
\[
d_{TV}(\mathcal{L}(X), \mathcal{L}(Y)) = \min \mathbb{P}[\hat X \ne \hat Y], \tag{3.8}
\]
the minimum being taken over all couplings of $X$ and $Y$; that is, over all constructions of $(\hat X, \hat Y)$ on a common probability space such that $\mathcal{L}(\hat X) = \mathcal{L}(X)$ and $\mathcal{L}(\hat Y) = \mathcal{L}(Y)$. The minimum is achieved by any coupling in which $\mathbb{P}[\hat X = \hat Y = s] = \min(\mathbb{P}[X = s], \mathbb{P}[Y = s])$ for all $s \in S$; and there is at least one such coupling.
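For discrete laws, the formulas above are all directly computable. The toy example below (plain Python; the two pmfs are arbitrary illustrative choices) evaluates $d_{TV}$ by the half-$\ell_1$ and positive-part formulas of (3.7), and checks that the coupling placing mass $\min(p_s, q_s)$ on the diagonal attains $\mathbb{P}[\hat X \ne \hat Y] = d_{TV}$, as claimed for (3.8).

```python
# Two arbitrary pmfs on the discrete space S = {0, ..., 5}.
p = [0.30, 0.25, 0.15, 0.10, 0.10, 0.10]
q = [0.10, 0.20, 0.30, 0.25, 0.10, 0.05]

# d_TV via the half-l1 formula and via positive parts, as in (3.7)
dtv_half_l1 = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
dtv_plus = sum(max(a - b, 0.0) for a, b in zip(p, q))

# Optimal coupling of (3.8): mass min(p_s, q_s) sits on the diagonal
# {X = Y = s}; only the leftover mass can have X != Y.
prob_diag = sum(min(a, b) for a, b in zip(p, q))
prob_neq = 1.0 - prob_diag      # = IP[X != Y] under this coupling
```

For these two pmfs all three quantities equal 0.3.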

The Wasserstein distance between the laws of random elements $X$, $Y$ taking values in a complete separable metric space $(S, d)$ may be characterized by a relation similar to (3.8), namely
\[
d_W(\mathcal{L}(X), \mathcal{L}(Y)) = \min \mathbb{E}\, d(\hat X, \hat Y), \tag{3.9}
\]
with the minimum over all couplings. In contrast to formula (3.7) for the total variation distance, there is no simple direct formula for the Wasserstein distance, with the one notable exception of the case in which $(S, d)$ is the real line, with the usual distance $d(x, y) = |x - y|$, for which the Wasserstein distance is the area enclosed between the cumulative distribution functions,
\[
d_W(\mathcal{L}(X), \mathcal{L}(Y)) = \int_{-\infty}^{\infty} |\mathbb{P}[X \le s] - \mathbb{P}[Y \le s]|\, ds.
\]

Thus a typical way of proving an upper bound $u$ for the Wasserstein distance $d_W(\mathcal{L}(X), \mathcal{L}(Y))$ is to exhibit a particular coupling with the property that $\mathbb{E}\, d(\hat X, \hat Y) \le u$. The conclusions of Theorems 3.5 and 7.3 can hence be interpreted as giving bounds on the Wasserstein distance between the normalized partial sum processes and their Brownian limits, in terms of a Wasserstein distance on $S = C[0,1]$, with $d(x, y) = \|x - y\| \wedge 1$. Another example is given by the relation (1.26) for the Feller coupling, which implies a bound for the Wasserstein distance between the processes $C^{(n)}$ and $(Z_1, Z_2, \dots, Z_n)$, where now $S = \mathbb{Z}_+^n$ and $d$ is the $l_1$ metric.
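On the real line, the CDF-area formula can be checked against the quantile (inverse-CDF) coupling, which is comonotone and attains the minimum in (3.9). A small sketch (plain Python; the two integer-supported pmfs are arbitrary illustrative choices):

```python
p = [0.2, 0.5, 0.3]            # pmf of X on {0, 1, 2}
q = [0.4, 0.1, 0.2, 0.3]       # pmf of Y on {0, 1, 2, 3}

def cdf(pmf, s):
    return sum(pmf[:s + 1])

# area between the two CDFs: integral of |F_X - F_Y| over the line
area = sum(abs(cdf(p, s) - cdf(q, s)) for s in range(max(len(p), len(q))))

def quantile(pmf, u):
    """Generalized inverse CDF."""
    acc = 0.0
    for k, w in enumerate(pmf):
        acc += w
        if u < acc:
            return k
    return len(pmf) - 1

# exact IE|X - Y| under the quantile coupling X = Q_p(U), Y = Q_q(U):
# both quantile functions are constant between consecutive CDF jumps
cuts = sorted({0.0}
              | {cdf(p, k) for k in range(len(p))}
              | {cdf(q, k) for k in range(len(q))})
mean_dist = sum((hi - lo) * abs(quantile(p, (lo + hi) / 2)
                                - quantile(q, (lo + hi) / 2))
                for lo, hi in zip(cuts, cuts[1:]))
```

Both computations give the same value, illustrating that exhibiting a good coupling is equivalent to bounding the CDF area.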

3.2 Rates of convergence

Under the uniform strengthenings of the Logarithmic Condition described in Chapter 6, we can substantially sharpen the limit theorems (3.4) for the small components and (3.5) for the large components. We are able to show that
\[
d_{TV}(\mathcal{L}(C^{(n)}[1,b]), \mathcal{L}(Z[1,b])) = o(1) \quad \text{if } b = o(n), \tag{3.10}
\]
and that
\[
d_{TV}(\mathcal{L}(C^{(n)}[b+1,n]), \mathcal{L}(C^{*(n)}[b+1,n])) = o(1) \quad \text{if } b \to \infty, \tag{3.11}
\]
as well as to give bounds for the errors involved for given $b$ and $n$: see Theorems 6.7 and 6.9.

In proving estimates such as (3.10) and (3.11), we are helped by the fact that these total variation distances between joint distributions can be reduced to distances between pairs of one-dimensional distributions, by using independence and the Conditioning Relation. To demonstrate this, we need to generalize the notation of (3.2), writing
\[
T_B(x) = \sum_{i \in B} i x_i, \quad B \subset \mathbb{N}; \qquad T_{bn}(x) = \sum_{i=b+1}^n i x_i, \quad 0 \le b < n, \tag{3.12}
\]
for any $x \in \mathbb{Z}_+^\infty$; $T_B(Z)$ is frequently abbreviated to $T_B$, and $T_{bn}(Z)$ to $T_{bn}$, when no confusion is likely to occur.

Lemma 3.1 For any $B \subset [n] = \{1, 2, \dots, n\}$, we have
\[
d_{TV}(\mathcal{L}(C^{(n)}(B)), \mathcal{L}(Z(B))) = d_{TV}(\mathcal{L}(T_B), \mathcal{L}(T_B \mid T_{0n} = n)).
\]

Proof. Direct computation gives
\[
\begin{aligned}
2\, d_{TV}(\mathcal{L}(C^{(n)}(B)), \mathcal{L}(Z(B)))
&= \sum_a \bigl|\mathbb{P}[C^{(n)}(B) = a] - \mathbb{P}[Z(B) = a]\bigr| \\
&= \sum_a \bigl|\mathbb{P}[Z(B) = a \mid T_{0n} = n] - \mathbb{P}[Z(B) = a]\bigr| \\
&= \sum_a \biggl|\frac{\mathbb{P}[Z(B) = a,\ T_{0n} = n]}{\mathbb{P}[T_{0n} = n]} - \mathbb{P}[Z(B) = a]\biggr|,
\end{aligned}
\]
by the Conditioning Relation. Rewriting the joint probability and using independence, this yields
\[
\begin{aligned}
2\, d_{TV}(\mathcal{L}(C^{(n)}(B)), \mathcal{L}(Z(B)))
&= \sum_k \sum_{a : T_B(a) = k} \biggl|\frac{\mathbb{P}[Z(B) = a]\, \mathbb{P}[T_{[n] \setminus B} = n - k]}{\mathbb{P}[T_{0n} = n]} - \mathbb{P}[Z(B) = a]\biggr| \\
&= \sum_k \sum_{a : T_B(a) = k} \mathbb{P}[Z(B) = a]\, \biggl|\frac{\mathbb{P}[T_{[n] \setminus B} = n - k]}{\mathbb{P}[T_{0n} = n]} - 1\biggr| \\
&= \sum_k \mathbb{P}[T_B = k]\, \biggl|\frac{\mathbb{P}[T_{[n] \setminus B} = n - k]}{\mathbb{P}[T_{0n} = n]} - 1\biggr|.
\end{aligned}
\]
But now, retracing the argument, we find that
\[
2\, d_{TV}(\mathcal{L}(C^{(n)}(B)), \mathcal{L}(Z(B))) = \sum_k \bigl|\mathbb{P}[T_B = k \mid T_{0n} = n] - \mathbb{P}[T_B = k]\bigr| = 2\, d_{TV}(\mathcal{L}(T_B), \mathcal{L}(T_B \mid T_{0n} = n)). \qquad \square
\]
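Lemma 3.1 is easy to confirm numerically in a small case. The sketch below (plain Python; $n = 5$, $\theta = 1$ and $B = \{1, 2\}$ are illustrative choices, with $Z_i \sim \mathrm{Po}(\theta/i)$ as for random permutations) computes both total variation distances exactly by enumeration; the positive-part formula (3.7) lets us sum only over the finite support of the conditioned law.

```python
import math

def po(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, theta = 5, 1.0
lam = [theta / i for i in range(1, n + 1)]     # Z_i ~ Po(theta/i)
B = [1, 2]

def vectors(rem, i=1):
    """All c with sum_i i*c_i = n."""
    if i > n:
        return [()] if rem == 0 else []
    return [(c,) + t for c in range(rem // i + 1)
            for t in vectors(rem - i * c, i + 1)]

def joint(c):
    return math.prod(po(lam[i], ci) for i, ci in enumerate(c))

support = vectors(n)
Zn = sum(joint(c) for c in support)            # IP[T_0n = n]

# left side: d_TV(L(C^(n)(B)), L(Z(B))), via positive parts (3.7)
margC = {}
for c in support:
    key = tuple(c[i - 1] for i in B)
    margC[key] = margC.get(key, 0.0) + joint(c) / Zn
d_left = sum(max(pc - math.prod(po(lam[i - 1], k)
                                for i, k in zip(B, key)), 0.0)
             for key, pc in margC.items())

# right side: d_TV(L(T_B | T_0n = n), L(T_B)), with T_B = Z_1 + 2 Z_2
pT = {}
for c in support:
    k = sum(i * c[i - 1] for i in B)
    pT[k] = pT.get(k, 0.0) + joint(c) / Zn
def pT_free(k):                                # IP[Z_1 + 2 Z_2 = k]
    return sum(po(lam[0], k - 2 * a2) * po(lam[1], a2)
               for a2 in range(k // 2 + 1))
d_right = sum(max(pT[k] - pT_free(k), 0.0) for k in pT)
```

The two distances agree to machine precision, as the lemma asserts. (The unconditioned sum `pT_free` is hard-coded for $B = \{1,2\}$.)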

Discrete approximations such as (3.10) and (3.11) are at the heart of this monograph. The approximation of one discrete process with dependent coordinates by another having a simpler structure (with either independent or dependent coordinates) shows itself to be an extremely powerful and useful technique, the more so since we can prove tight bounds to accompany (3.10) and (3.11). In the next section, we illustrate the possibilities with a number of consequences and extensions.

3.3 Results for classical logarithmic structures

The main theorems of the monograph, stated in Section 6.7, are expressed in very general form. They apply to almost any combinatorial structure derived from the Conditioning Relation, for which the underlying independent random variables $Z_i$, $i \ge 1$, satisfy the Logarithmic Condition; very little more needs to be assumed. However, for many structures, much more can be said of the $Z_i$'s, and the results can in consequence be more simply expressed. Here, we concentrate on the classical combinatorial classes: assemblies, multisets and selections. For logarithmic assemblies, which satisfy (2.16), the $Z_i$ are distributed as $\mathrm{Po}(i^{-1}\theta_i)$, with
\[
i\,\mathbb{E} Z_i = \theta_i = m_i y^{-i}/(i-1)! \to \theta;
\]
logarithmic multisets satisfy (2.17), with the $Z_i$ distributed as $\mathrm{NB}(m_i, p_i)$, where $p_i = y^{-i}$ for some $y > 1$, and we set
\[
\theta_i = i m_i p_i \sim i\,\mathbb{E} Z_i \sim \theta;
\]
logarithmic selections satisfy (2.17), with the $Z_i$ distributed as $\mathrm{Bi}(m_i, p_i)$, where $p_i = y^{-i}/(1 + y^{-i})$ for some $y > 1$, and we set
\[
\theta_i = i\,\mathbb{E} Z_i = i m_i p_i \sim \theta.
\]

For these combinatorial structures, we prove three kinds of main theorems. The first is the fundamental global approximation theorem, which gives a description of the accuracy of the discrete approximations in (3.10) and (3.11).

Theorem 3.2 For assemblies satisfying (2.16) and for multisets and selections satisfying (2.17), under the additional conditions
\[
|\theta^{-1}\theta_i - 1| = O(i^{-g_1}) \quad \text{and} \quad |\theta_i - \theta_{i+1}| = O(i^{-g_2}) \tag{3.13}
\]
for some $g_1 > 0$, $g_2 > 1$, we have

1. $d_{TV}\bigl(\mathcal{L}(C[1,b]), \mathcal{L}(Z[1,b])\bigr) = O(n^{-1}b)$;

2. $d_{TV}\bigl(\mathcal{L}(C[b+1,n]), \mathcal{L}(C^*[b+1,n])\bigr) = O(b^{-\beta_0} \log^2 b)$.

Here, $\beta_0$ denotes $\min\{1, \theta, g_1\}$.

The second main group involves refinement of the basic theorems. The first such refinement is in a complementary pair of local approximations, which also reflect the distributional convergence in (3.4) and (3.5). Both are proved using estimates developed for the total variation results.

Theorem 3.3 Under the conditions of Theorem 3.2, and as before with $\beta_0 = \min\{1, \theta, g_1\}$:

1. For the small components, uniformly over those $y \in \mathbb{Z}_+^b$ for which $T_{0b}(y) \le n/2$, we have
\[
\Biggl|\frac{\mathbb{P}[C[1,b] = y]}{\mathbb{P}[Z[1,b] = y]} - 1\Biggr| = O\biggl(\frac{b + T_{0b}(y)}{n}\biggr);
\]

2. For the $r$ largest components $L_1^{(n)}, \dots, L_r^{(n)}$, we have
\[
\Biggl|\frac{n^r\, \mathbb{P}[L_1^{(n)} = m_1, \dots, L_r^{(n)} = m_r]}{f_\theta^{(r)}(n^{-1}m_1, \dots, n^{-1}m_r)} - 1\Biggr| = O\bigl(n^{-\beta_0}\bigr),
\]
uniformly over all choices $n > m_1 > \cdots > m_r \ge n\eta$ such that also $n - \sum_{s=1}^r m_s \le n\eta$, for any fixed $\eta > 0$. Here, $f_\theta^{(r)}$ denotes the joint density of the $r$ largest components of the Poisson–Dirichlet process $\mathrm{PD}(\theta)$.

The second refinement strengthens the total variation approximation for the small components.

Theorem 3.4 Under the conditions of Theorem 3.2,
\[
d_{TV}\bigl(\mathcal{L}(C[1,b]), \mathcal{L}(Z[1,b])\bigr) = \frac{|1-\theta|}{2n}\, \mathbb{E}|T_{0b} - \mathbb{E} T_{0b}| + O\biggl(\frac{b}{n}\Bigl[\frac{b}{n} + n^{-\beta_1 + \delta}\Bigr]\biggr)
\]
for any $\delta > 0$, where $\beta_1 = \min\{\tfrac12, \tfrac{\theta}{2}, g_1, g_2 - 1\}$.

The third and final main group of results involves coarsenings of the basic theorems, rather than refinements. Here, the interest lies in showing that the more traditional functional limit theorems can be deduced from ours. The functions of $C^{(n)}$ which we consider are the random elements $B_n$ and $W_n$ of the space $D[0,1]$ of cadlag functions on $[0,1]$, defined for $0 \le t \le 1$ by
\[
B_n(t) = \frac{\sum_{i=1}^{[n^t]} C_i^{(n)} - \theta t \log n}{\sqrt{\theta \log n}},
\]
\[
W_n(t) = \frac{\log \mathrm{l.c.m.}\{i : 1 \le i \le [n^t],\ C_i^{(n)} \ge 1\} - \tfrac12 \theta t^2 \log^2 n}{\sqrt{\tfrac13 \theta \log^3 n}},
\]


and the random purely atomic measure $\Psi^{(n)}$ on $(0,1]$ defined by
\[
\Psi^{(n)} = \sum_{j=1}^n C_j^{(n)}\, \delta_{n^{-1}j}.
\]
The first two are close to standard Brownian motion, the last one to the formulation of the Poisson–Dirichlet distribution $\mathrm{PD}(\theta)$ in Chapter 7.2 as a random measure $\Psi^* = \Psi^*_\theta$, obtained by setting
\[
\Psi^* = \sum_{m \ge 1} \delta_{L_m}.
\]
The accuracy of these approximations is lower than those of the total variation approximations, but, perhaps only for historical reasons, they have a more immediate appeal.

Theorem 3.5 Under the conditions of Theorem 3.2, and as before with $\beta_0 = \min\{1, \theta, g_1\}$:

1. It is possible to construct $C^{(n)}$ and a standard Brownian motion $B$ on the same probability space, in such a way that
\[
\mathbb{E}\Bigl\{\sup_{0 \le t \le 1} |B_n(t) - B(t)| \wedge 1\Bigr\} = O\biggl(\frac{\log\log n}{\sqrt{\log n}}\biggr).
\]

2. It is possible to construct $C^{(n)}$ and a standard Brownian motion $B$ on the same probability space, in such a way that
\[
\mathbb{E}\Bigl\{\sup_{0 \le t \le 1} |W_n(t) - B(t^3)| \wedge 1\Bigr\} = O\biggl(\frac{\log\log n}{\sqrt{\log n}}\biggr).
\]

3. For any $0 < \alpha \le 1$, it is possible to construct $C^{(n)}$ and a Poisson–Dirichlet process $\Psi^*$ on the same probability space, in such a way that
\[
\mathbb{E} \sup_{g \in G_\alpha} \Bigl|\int g\, d\Psi^{(n)} - \int g\, d\Psi^*\Bigr| = O\bigl(n^{-\alpha\beta_0/(1+\beta_0)} \log^3 n\bigr);
\]
here, $G_\alpha = \{g : (0,1] \to \mathbb{R} : g(0) = 0,\ |g(x) - g(y)| \le |x - y|^\alpha\}$. In particular,
\[
\mathbb{E}\Bigl\{\sum_{j \ge 1} |n^{-1}L_j^{(n)} - L_j|\Bigr\} = O\bigl(n^{-\beta_0/(1+\beta_0)} \log^3 n\bigr).
\]

There are also a number of general approximation results for additive arithmetic functions on such structures: see Chapter 7.5.


3.4 Stein’s method

The conclusions of Theorems 3.2–3.5 all hold under much weaker conditions. We prove in the later chapters that it is enough to suppose that a combinatorial structure satisfies the Conditioning Relation for random variables $Z_i$ which satisfy
\[
\mathbb{P}[Z_i = l] \le C\, i^{-a_1 - 1} l^{-a_2 - 1} \quad \text{for all } i \ge 1 \text{ and } l \ge 2,
\]
where $a_1 > 1$ and $a_2 > 2$, and such that (3.13) holds with $i\,\mathbb{P}[Z_i = 1]$ in place of $\theta_i$. For some of these results, even weaker conditions are sufficient; see Chapter 6.7 for the corresponding statements. However, to obtain the theorems in such a general setting, one needs to be able to estimate certain probabilities very accurately. We achieve this by using Stein's method.

Stein’s method is a powerful approximation technique, introduced byStein (1970) for normal approximation in the context of sums of dependentrandom variables. The version appropriate for the Poisson approxima-tion of sums of dependent indicator random variables was developed byChen (1975a), and has been successfully exploited in a wide variety of appli-cations: see also Stein (1986, 1992). Here, we actually use Stein’s method toestablish approximation by certain compound Poisson distributions. How-ever, in order to describe the method as simply as possible, we begin withthe Stein–Chen method for Poisson approximation.

The Stein–Chen method is based on the following observations. First, for any subset $A \subset \mathbb{Z}_+$ and any real $\lambda > 0$, the indicator function $\mathbf{1}_A$ of $A$ can be expressed in the form
\[
\mathbf{1}_A(j) = \lambda g_{\lambda,A}(j+1) - j g_{\lambda,A}(j) + \mathrm{Po}(\lambda)\{A\}, \qquad j \ge 0. \tag{3.14}
\]

The values of the function $g_{\lambda,A} : \mathbb{N} \to \mathbb{R}$ can be successively determined by applying (3.14) with $j = 0, 1, \dots$, since the equation for $j = 0$ only involves the value of $g_{\lambda,A}$ at $j = 1$; a value of $g_{\lambda,A}$ at $j = 0$ is never needed. What is more, it is shown in Barbour and Eagleson (1983) that $\|g_{\lambda,A}\|$ and $\|\Delta g_{\lambda,A}\|$ can be bounded uniformly in $A \subset \mathbb{Z}_+$: in fact,
\[
\sup_{A \subset \mathbb{Z}_+} \|g_{\lambda,A}\| \le \min\{1, \lambda^{-1/2}\}; \qquad
\sup_{A \subset \mathbb{Z}_+} \|\Delta g_{\lambda,A}\| \le \lambda^{-1}(1 - e^{-\lambda}) \le \min\{1, \lambda^{-1}\}. \tag{3.15}
\]

Now, if $W$ is any random variable on $\mathbb{Z}_+$, it follows from (3.14) that
\[
\mathbb{P}[W \in A] - \mathrm{Po}(\lambda)\{A\} = \mathbb{E}\{\lambda g_{\lambda,A}(W+1) - W g_{\lambda,A}(W)\},
\]
and thus that
\[
d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \sup_{A \subset \mathbb{Z}_+} \bigl|\mathbb{E}\{\lambda g_{\lambda,A}(W+1) - W g_{\lambda,A}(W)\}\bigr|. \tag{3.16}
\]


It is also immediate that, if $Z \sim \mathrm{Po}(\lambda)$ and $g = g_{\lambda,A}$ for some $A \subset \mathbb{Z}_+$, then
\[
\mathbb{E}\{Z g(Z)\} = \lambda\, \mathbb{E} g(Z+1), \tag{3.17}
\]
a relation which is easily directly checked to be true for all bounded functions $g$, because of size-biasing: see Chapter 4.1. This latter fact suggests that, for random variables $W$ with distributions expected, for structural reasons, to be close to $\mathrm{Po}(\lambda)$, the expression
\[
\mathbb{E}\{\lambda g_{\lambda,A}(W+1) - W g_{\lambda,A}(W)\}
\]
might well turn out almost automatically to be small. This would then enable statements about total variation approximation to be readily made, by way of (3.16). Structural reasons suggesting that Poisson approximation might be reasonable would be that $W$ was expressible as a sum of weakly dependent indicator random variables, each of which had small probability of taking the value 1.

In order to see that this can in fact be the case, take $W = \sum_{i=1}^n I_i$, where the $I_i \sim \mathrm{Be}(p_i)$ are independent, and take $\lambda = \sum_{i=1}^n p_i$. Then it follows for any bounded $g$ that
\[
\mathbb{E}\{I_i g(W)\} = p_i\, \mathbb{E}\{g(W) \mid I_i = 1\} = p_i\, \mathbb{E} g(W_i + 1),
\]
where $W_i = \sum_{i' \ne i} I_{i'} = W - I_i$, this last because $I_i$ and $W_i$ are independent. Hence, and from the definitions of $\lambda$ and $W$, we find that
\[
\mathbb{E}\{\lambda g(W+1) - W g(W)\}
= \Bigl(\sum_{i=1}^n p_i\Bigr) \mathbb{E} g(W+1) - \sum_{i=1}^n p_i\, \mathbb{E} g(W_i + 1)
= \sum_{i=1}^n p_i\, \mathbb{E}\{g(W_i + I_i + 1) - g(W_i + 1)\}. \tag{3.18}
\]

But the quantity in braces is zero if $I_i = 0$, and is in modulus at most $\|\Delta g\|$ if $I_i = 1$, so that
\[
|\mathbb{E}\{\lambda g(W+1) - W g(W)\}| \le \sum_{i=1}^n p_i^2\, \|\Delta g\|.
\]
Thus, from (3.15) and (3.16), it follows that
\[
d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda)) \le \min\{1, \lambda^{-1}\} \sum_{i=1}^n p_i^2 \le \max_{1 \le i \le n} p_i.
\]

This bound is of optimal order, as was shown by Barbour and Hall (1984), who established that $(1/32)\min\{1, \lambda^{-1}\}\sum_{i=1}^n p_i^2$ is a lower bound for $d_{TV}(\mathcal{L}(W), \mathrm{Po}(\lambda))$.
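None of the following appears in the book, but the chain of bounds above is easy to exercise numerically: the law of $W$ can be computed by exact convolution and compared against $\mathrm{Po}(\lambda)$, checking both the upper bound just derived and the Barbour–Hall lower bound. A minimal Python sketch (function names are ours):

```python
import math

def poisson_pmf(lam, kmax):
    # Po(lam) point probabilities on {0, ..., kmax}, computed iteratively
    out = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        out.append(out[-1] * lam / k)
    return out

def bernoulli_sum_pmf(ps):
    # exact pmf of W = I_1 + ... + I_n, I_i ~ Be(p_i) independent, by convolution
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, q in enumerate(pmf):
            new[k] += q * (1.0 - p)
            new[k + 1] += q * p
        pmf = new
    return pmf

def dtv_to_poisson(ps, kmax=100):
    # d_TV(L(W), Po(lambda)) = (1/2) sum_k |IP[W = k] - Po(lambda){k}|
    lam = sum(ps)
    w = bernoulli_sum_pmf(ps) + [0.0] * kmax
    po = poisson_pmf(lam, kmax)
    return 0.5 * sum(abs(a - b) for a, b in zip(w[:kmax + 1], po))

ps = [0.1] * 20                                        # lambda = 2
lam = sum(ps)
upper = min(1.0, 1.0 / lam) * sum(p * p for p in ps)   # the bound proved above
lower = upper / 32.0                                   # Barbour and Hall (1984)
d = dtv_to_poisson(ps)
```

The truncation at `kmax` only discards a negligible Poisson tail, so the computed distance sits between the two theoretical bounds.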


The approximations that we need for the core of this book are not Poisson approximations, but approximations by particular compound Poisson distributions. We denote by $\mathrm{CP}(\lambda_1, \ldots, \lambda_m)$ the distribution of a sum $Z = \sum_{i=1}^m iZ_i$, where $Z_1 \sim \mathrm{Po}(\lambda_1), \ldots, Z_m \sim \mathrm{Po}(\lambda_m)$ are independent. For such a compound Poisson distribution, it is easily deduced from (3.17) that
$$\mathbb{E}\{Z g(Z)\} = \sum_{i=1}^m i\lambda_i\,\mathbb{E} g(Z+i), \tag{3.19}$$
suggesting that there may be an analogue of the Stein–Chen method above, starting from the equation
$$1\!\!1_A(j) = \sum_{i=1}^m i\lambda_i g_A(j+i) - jg_A(j) + \mathrm{CP}(\lambda_1, \ldots, \lambda_m)\{A\}, \quad j \ge 0, \tag{3.20}$$
to be solved for the function $g_A$ for each given subset $A \subset \mathbb{Z}_+$. This is indeed the case, as was shown by Barbour, Chen and Loh (1992), but there is in general a difficulty in exploiting it: there are no bounds comparable to those given in (3.15) for the solutions $g_A$, except for restrictive classes of compound Poisson distributions. Unfortunately, the distributions in which we are interested, with $\lambda_i = \theta/i$, $1 \le i \le m$, for some $\theta > 0$, do not fall into either of the amenable classes currently known. As a result, we have to establish counterparts to (3.15) which are valid specifically for our purposes; this is the substance of Chapter 8. Once we have done this, we can apply Stein's technique very much as illustrated above: the detail is more complicated, but the basic ideas are the same.
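As an aside (our illustration, not the book's), identity (3.19) can be checked directly for the distributions of interest here, $\lambda_i = \theta/i$, by computing the pmf of $Z$ exactly by convolution and comparing both sides for an arbitrary bounded $g$:

```python
import math

def poisson_pmf(lam, kmax):
    out = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        out.append(out[-1] * lam / k)
    return out

def cp_pmf(lams, kmax):
    # pmf of Z = sum_i i*Z_i on {0,...,kmax}, Z_i ~ Po(lams[i-1]) independent
    pmf = [1.0] + [0.0] * kmax
    for i, lam in enumerate(lams, start=1):
        po = poisson_pmf(lam, kmax // i)
        new = [0.0] * (kmax + 1)
        for k in range(kmax + 1):
            for z, q in enumerate(po):
                if i * z > k:
                    break
                new[k] += pmf[k - i * z] * q
        pmf = new
    return pmf

theta, m, kmax = 1.5, 4, 120
lams = [theta / i for i in range(1, m + 1)]      # lambda_i = theta / i
pmf = cp_pmf(lams, kmax)
g = lambda j: math.sin(j) + 2.0                  # an arbitrary bounded test function
lhs = sum(k * pmf[k] * g(k) for k in range(kmax + 1))
rhs = sum(i * lams[i - 1] * sum(pmf[k] * g(k + i) for k in range(kmax + 1))
          for i in range(1, m + 1))
```

With `kmax` well beyond the mean $\theta m$, the truncated tail mass is negligible and the two sides agree to numerical precision.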


4 The Ewens Sampling Formula

On page 8 in Chapter 1.1 we introduced the Feller coupling as a device for generating a uniform random permutation as an ordered product of cycles. A similar device may be used to generate $\theta$-biased permutations, as introduced in Example 2.19. To do this, let $D_i$, $i \ge 1$, be independent random variables, with $D_i$ taking values in the set $[i]$, satisfying
$$\mathbb{P}_\theta[D_i = 1] = \frac{\theta}{\theta+i-1}, \qquad \mathbb{P}_\theta[D_i = j] = \frac{1}{\theta+i-1}, \quad 2 \le j \le i.$$
We use the value of $D_n$ to make the $n$-way choice between "(1)(2", "(1 2", \ldots, "(1 n". Here $D_n = 1$ corresponds to closing off the current cycle and starting the next with the smallest unused integer, whereas $D_n = j$ for $j \ge 2$ produces the partial cycle "(1 j". Continuing in this way using $D_{n-1}, \ldots, D_1$ produces a permutation in $S_n$ in ordered cycle notation. By considering where cycles end, it is straightforward to show that the probability of getting a particular $\pi$ after $n$ steps is
$$\mathbb{P}_\theta[\pi] = \frac{\theta^{|\pi|}}{\theta^{(n)}}, \quad \pi \in S_n, \tag{4.1}$$

where $|\pi|$ denotes the number of cycles in $\pi$. The sizes of the cycles are determined by the spacings between the 1's in realizations of the independent Bernoulli random variables
$$\xi_i = 1\!\!1\{D_i = 1\}, \quad i \ge 1,$$
which satisfy
$$\mathbb{P}_\theta[\xi_i = 1] = \frac{\theta}{\theta+i-1}, \qquad \mathbb{P}_\theta[\xi_i = 0] = \frac{i-1}{\theta+i-1}, \quad i \ge 1; \tag{4.2}$$
thus we have
$$C_i^{(n)} = \#\{i\text{-spacings in } 1\,\xi_2\xi_3\cdots\xi_n 1\}. \tag{4.3}$$
Just as in (1.20), it can be shown that
$$Z_i = C_i^{(\infty)} = \#\{i\text{-spacings in } 1\,\xi_2\xi_3\cdots\}, \quad i \ge 1, \tag{4.4}$$
are independent Poisson-distributed random variables, with $\mathbb{E} Z_i = \theta/i$. In the remainder of this chapter, we suppress the dependence on $\theta$, writing $\mathbb{P}$ in place of $\mathbb{P}_\theta$.

The distribution of $C^{(n)} = (C_1^{(n)}, C_2^{(n)}, \ldots, C_n^{(n)})$, the counts of cycles of sizes $1, 2, \ldots, n$ in a permutation of size $n$ under this $\theta$-weighted construction, is given by the Ewens Sampling Formula $\mathrm{ESF}(\theta)$ (Ewens, 1972) of Example 2.19: for any $a \in \mathbb{Z}_+^n$,
$$\mathbb{P}[C^{(n)} = a] = 1\!\!1\Big\{\sum_{i=1}^n ia_i = n\Big\}\,\frac{n!}{\theta^{(n)}}\prod_{j=1}^n \Big(\frac{\theta}{j}\Big)^{a_j}\frac{1}{a_j!}. \tag{4.5}$$
This follows from (4.1) by using Cauchy's formula (1.2) to count the number of permutations having cycle index $a$. Throughout this chapter, to simplify the notation, we omit the asterisk introduced in (2.30).
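The construction and formula (4.5) can be confirmed by brute force for small $n$: enumerate all realizations of $(\xi_2, \ldots, \xi_n)$, read off cycle counts from the spacings as in (4.3), and compare with the Ewens Sampling Formula. A Python sketch of ours (names hypothetical):

```python
import math
from itertools import product

def rising(theta, n):
    out = 1.0
    for j in range(n):
        out *= theta + j
    return out

def esf(a, theta):
    # Ewens Sampling Formula (4.5) for cycle counts a = (a_1, ..., a_n)
    n = len(a)
    if sum(i * ai for i, ai in enumerate(a, 1)) != n:
        return 0.0
    val = math.factorial(n) / rising(theta, n)
    for j, aj in enumerate(a, 1):
        val *= (theta / j) ** aj / math.factorial(aj)
    return val

def counts_from_xi(xi):
    # cycle counts read off as spacings of 1's in 1 xi_2 ... xi_n 1, cf. (4.3)
    n = len(xi) + 1
    s = [1] + list(xi) + [1]
    ones = [k for k, v in enumerate(s) if v == 1]
    a = [0] * n
    for u, v in zip(ones, ones[1:]):
        a[v - u - 1] += 1            # a gap of length i is a cycle of size i
    return tuple(a)

def dist_via_feller(n, theta):
    # exact distribution of the cycle counts under the D_i construction
    p = [theta / (theta + i - 1) for i in range(1, n + 1)]   # IP[xi_i = 1], (4.2)
    dist = {}
    for xi in product([0, 1], repeat=n - 1):                 # xi_2, ..., xi_n
        pr = 1.0
        for i, v in enumerate(xi, start=2):
            pr *= p[i - 1] if v else 1.0 - p[i - 1]
        a = counts_from_xi(xi)
        dist[a] = dist.get(a, 0.0) + pr
    return dist

n, theta = 5, 1.3
dist = dist_via_feller(n, theta)
```

Every reachable cycle-count vector receives exactly the probability (4.5) assigns it.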

As noted in Example 2.19, the Conditioning Relation now holds for independent Poisson random variables $Z_i$ with means $\lambda_i = \theta/i$, $i = 1, 2, \ldots, n$:
$$\mathbb{P}[C^{(n)} = a] = \mathbb{P}[Z[1,n] = a \mid T_{0n} = n], \tag{4.6}$$
where, as in (3.12),
$$T_{0n} = Z_1 + 2Z_2 + \cdots + nZ_n.$$

We can also specialize the result in (2.28) to the present context to show that, for $(r_1, \ldots, r_b) \in \mathbb{Z}_+^b$ with $m = r_1 + 2r_2 + \cdots + br_b$,
$$\mathbb{E}\Big\{\prod_{j=1}^b (C_j^{(n)})_{[r_j]}\Big\} = 1\!\!1\{m \le n\}\binom{\theta+n-m-1}{n-m}\binom{\theta+n-1}{n}^{-1}\prod_{j=1}^b \Big(\frac{\theta}{j}\Big)^{r_j}, \tag{4.7}$$
as established by Watterson (1974). Note in particular that, in (4.7),
$$\prod_{j=1}^b \Big(\frac{\theta}{j}\Big)^{r_j} = \mathbb{E}\Big\{\prod_{j=1}^b (Z_j)_{[r_j]}\Big\}.$$

The asymptotic distribution of $n^{-1}T_{0n}$ plays a crucial role in what follows. In the next section, we develop some results about size-biasing that help in analyzing its distribution.


4.1 Size-biasing

Suppose that $X$ is a non-negative random variable with finite mean $\mu > 0$ and distribution $F$. The notation $X^\star$ is used to denote a random variable with distribution $F^\star$ given by
$$F^\star(dx) = \frac{xF(dx)}{\mu}, \quad x > 0. \tag{4.8}$$
We call $X^\star$ (resp. $F^\star$) the size-biased version of $X$ (resp. $F$). Size biasing arises naturally in statistical sampling theory (cf. Hansen and Hurwitz (1943), Midzuno (1952) and Gordon (1993)), and the results we present below are all well known in the folk literature.

By standard arguments, (4.8) is equivalent to
$$\mathbb{E} g(X^\star) = \frac{\mathbb{E}\{Xg(X)\}}{\mathbb{E} X} \quad \text{for all bounded measurable } g: \mathbb{R}_+ \to \mathbb{R}. \tag{4.9}$$
For example, this may be used to show that for any $c \in \mathbb{R}_+$, $(cX)^\star = cX^\star$.

Lemma 4.1 If $X \sim \mathrm{Po}(\lambda)$, then $X^\star \sim \mathrm{Po}(\lambda) + 1$.

Proof. For bounded measurable $g$,
$$\mathbb{E} g(X+1) = \sum_{j=0}^\infty g(j+1)\frac{\lambda^j e^{-\lambda}}{j!} = \sum_{j=0}^\infty g(j+1)\frac{\lambda^{j+1}e^{-\lambda}}{(j+1)!}\cdot\frac{j+1}{\lambda} = \frac{1}{\lambda}\sum_{j=1}^\infty jg(j)\frac{\lambda^j e^{-\lambda}}{j!} = \frac{\mathbb{E}\{Xg(X)\}}{\mathbb{E} X}. \qquad\square$$
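Lemma 4.1 amounts to the elementary pmf identity $k\,\mathrm{Po}(\lambda)\{k\}/\lambda = \mathrm{Po}(\lambda)\{k-1\}$, which the following fragment (ours, purely for illustration) verifies term by term:

```python
import math

def po_pmf(lam, k):
    # Po(lam) point probability at k
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam = 2.7
# size-biased pmf, IP[X* = k] = k IP[X = k] / IEX (cf. (4.9)), versus Po(lam) shifted by 1
star = [k * po_pmf(lam, k) / lam for k in range(30)]
shifted = [0.0] + [po_pmf(lam, k - 1) for k in range(1, 30)]
```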

There is an elementary way to realize the size-biased version of a sum of independent random variables. The key case is contained in

Lemma 4.2 Let $X_1$ and $X_2$ be independent positive random variables with finite means $\mu_1$ and $\mu_2$ respectively, and set $\mu = \mu_1 + \mu_2$. Then
$$\mathcal{L}((X_1+X_2)^\star) = \frac{\mu_1}{\mu}\mathcal{L}(X_1^\star + X_2) + \frac{\mu_2}{\mu}\mathcal{L}(X_1 + X_2^\star), \tag{4.10}$$
where $X_1^\star$ and $X_2$ are independent, and $X_1$ and $X_2^\star$ are independent.


Proof. For bounded measurable $g$,
$$\begin{aligned}
\mathbb{E} g((X_1+X_2)^\star) &= \frac{\mu_1}{\mu}\mathbb{E} g(X_1^\star + X_2) + \frac{\mu_2}{\mu}\mathbb{E} g(X_1 + X_2^\star) \\
&= \frac{\mu_1}{\mu}\int\!\!\int g(x_1+x_2)\frac{x_1 F_1(dx_1)}{\mu_1}F_2(dx_2) + \frac{\mu_2}{\mu}\int\!\!\int g(x_1+x_2)\frac{x_2 F_2(dx_2)}{\mu_2}F_1(dx_1) \\
&= \frac{1}{\mu}\int\!\!\int (x_1+x_2)g(x_1+x_2)F_1(dx_1)F_2(dx_2) \\
&= \frac{\mathbb{E}\{(X_1+X_2)g(X_1+X_2)\}}{\mathbb{E}(X_1+X_2)}. \qquad\square
\end{aligned}$$

Corollary 4.3 If $X$ is a positive random variable with mean $\mu$, then for any $c \in \mathbb{R}_+$,
$$(X+c)^\star =_d \begin{cases} X^\star + c & \text{with probability } \mu/(\mu+c); \\ X + c & \text{with probability } c/(\mu+c). \end{cases}$$

For the size-biased version of the sum of $n$ independent positive random variables, Lemma 4.2 may be used to establish

Theorem 4.4 Let $X_1, \ldots, X_n$ be independent positive random variables with finite means $\mu_1, \ldots, \mu_n$ respectively, and write $\mu = \mu_1 + \cdots + \mu_n$. Then
$$\mathcal{L}((X_1 + \cdots + X_n)^\star) = \sum_{j=1}^n \frac{\mu_j}{\mu}\mathcal{L}(X_1 + \cdots + X_n - X_j + X_j^\star), \tag{4.11}$$
where $X_1, \ldots, X_n, X_1^\star, \ldots, X_n^\star$ are independent.

The case of independent and identically distributed summands is implicit in Barouch and Kaufman (1976).

In what follows, we are particularly interested in random variables of the form
$$T_B = \sum_{j \in B} jZ_j, \quad B \subset [n],$$
where the $Z_j$ are independent Poisson random variables with mean $\lambda_j$. Since
$$\sum_{j \in B,\, j \ne l} jZ_j + (lZ_l)^\star = \sum_{j \in B,\, j \ne l} jZ_j + lZ_l^\star =_d \sum_{j \in B,\, j \ne l} jZ_j + l(Z_l + 1) = T_B + l,$$
it follows from Theorem 4.4 that
$$T_B^\star =_d T_B + J_B, \tag{4.12}$$
where $J_B$ is independent of $T_B$ and
$$\mathbb{P}[J_B = j] = \frac{j\lambda_j}{\sum_{i \in B} i\lambda_i} = \frac{j\lambda_j}{\mathbb{E} T_B}, \quad j \in B. \tag{4.13}$$

This leads to the following size-biasing equation for the point probabilities associated with the random variables $T_B$:
$$\begin{aligned}
k\,\mathbb{P}[T_B = k] &= \mathbb{E} T_B\,\mathbb{P}[T_B^\star = k] = \mathbb{E} T_B\sum_{l \in B}\mathbb{P}[J_B = l]\,\mathbb{P}[T_B = k-l] \\
&= \sum_{l \in B} l\lambda_l\,\mathbb{P}[T_B = k-l], \quad k = 1, 2, \ldots. \qquad (4.14)
\end{aligned}$$
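The recursion (4.14) determines all the point probabilities of $T_B$ from $\mathbb{P}[T_B = 0] = \exp(-\sum_{j\in B}\lambda_j)$. The sketch below (an illustration of ours, not from the book) computes them this way and cross-checks against direct convolution of the laws of the $jZ_j$:

```python
import math

def poisson_pmf(lam, kmax):
    out = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        out.append(out[-1] * lam / k)
    return out

def tb_pmf_recursive(B, lam, kmax):
    # k IP[T_B = k] = sum_{l in B} l*lam[l]*IP[T_B = k-l], cf. (4.14)
    p = [math.exp(-sum(lam[j] for j in B))] + [0.0] * kmax
    for k in range(1, kmax + 1):
        p[k] = sum(l * lam[l] * p[k - l] for l in B if l <= k) / k
    return p

def tb_pmf_convolve(B, lam, kmax):
    # independent check: convolve the laws of the j*Z_j directly
    pmf = [1.0] + [0.0] * kmax
    for j in B:
        po = poisson_pmf(lam[j], kmax // j)
        new = [0.0] * (kmax + 1)
        for k in range(kmax + 1):
            for z, q in enumerate(po):
                if j * z > k:
                    break
                new[k] += pmf[k - j * z] * q
        pmf = new
    return pmf

B = [2, 3, 5]
lam = {2: 0.7, 3: 0.4, 5: 0.25}
pr = tb_pmf_recursive(B, lam, 40)
pc = tb_pmf_convolve(B, lam, 40)
```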

We can study the asymptotic behavior of the size-biased version of sums of independent random variables via

Lemma 4.5 If $X_n$, $n \ge 1$, is a sequence of positive random variables such that $X_n \to_d X$ as $n \to \infty$, and if $\mathbb{E} X_n \to \mathbb{E} X < \infty$, then $X_n^\star \to_d X^\star$.

Proof. $X_n^\star \to_d X^\star$ if $\mathbb{E} g(X_n^\star) \to \mathbb{E} g(X^\star)$ for all $g \in C_K$, the bounded continuous functions of compact support. By (4.9),
$$\mathbb{E} g(X_n^\star) = \frac{1}{\mathbb{E}(X_n)}\mathbb{E}\{X_n g(X_n)\}.$$
The function $xg(x) \in C_K$, and therefore $\mathbb{E}\{X_n g(X_n)\} \to \mathbb{E}\{Xg(X)\}$ since $X_n \to_d X$. Since $\mathbb{E} X_n \to \mathbb{E} X$ by assumption, the proof is complete. $\square$

4.2 The random variable $X_\theta$

When $Z_j$ is Poisson with mean $\theta/j$ and $B = [n]$, the random variable $J_n$ in (4.12) has the uniform distribution on $[n]$ and $\mathbb{E} T_{0n} = n\theta$. From the size-biasing equation (4.14) we see that the point probabilities $\mathbb{P}[T_{0n} = k]$ satisfy
$$k\,\mathbb{P}[T_{0n} = k] = \theta\sum_{j=1}^n \mathbb{P}[T_{0n} = k-j], \quad k = 1, 2, \ldots. \tag{4.15}$$


It follows from this, for $k \le n$, that
$$k\,\mathbb{P}[T_{0n} = k] = \theta\,\mathbb{P}[T_{0n} = k-1] + \theta\sum_{j=1}^n \mathbb{P}[T_{0n} = k-1-j] = (\theta+k-1)\,\mathbb{P}[T_{0n} = k-1], \quad k = 1, 2, \ldots, n,$$
so that
$$\mathbb{P}[T_{0n} = k] = \frac{\theta^{(k)}}{k!}\,\mathbb{P}[T_{0n} = 0] = \exp(-\theta h(n+1))\,\frac{\theta^{(k)}}{k!}, \quad k \le n, \tag{4.16}$$
where $h(n+1)$ denotes the $n$th harmonic number:
$$h(n+1) = \sum_{j=1}^n 1/j, \quad n \in \mathbb{Z}_+;$$
we extend the definition of $h(\cdot)$ to a continuous argument by
$$h(t+1) = \gamma + \Gamma'(t+1)/\Gamma(t+1), \quad t \in \mathbb{R}_+, \tag{4.17}$$
where $\gamma$ is Euler's constant and $\Gamma$ denotes the Gamma function. Hence
$$\mathbb{P}[T_{0n} = k] \sim \frac{e^{-\gamma\theta}x^{\theta-1}}{n\Gamma(\theta)} \quad \text{if } k \le n,\ k/n \to x \in (0,1], \tag{4.18}$$
and from (4.15) and (4.18) we conclude that
$$\mathbb{P}[T_{0n} \le k] \sim \frac{x^\theta e^{-\gamma\theta}}{\Gamma(\theta+1)} \tag{4.19}$$
if $k \le n$, $k/n \to x \in (0,1]$. As this suggests, $n^{-1}T_{0n}$ has a limit law.
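As a numerical illustration (ours), the recursion (4.15) can be run forward from $\mathbb{P}[T_{0n} = 0] = \exp(-\theta h(n+1))$ and compared with the closed form (4.16) for $k \le n$:

```python
import math

def t0n_pmf(n, theta, kmax):
    # IP[T_0n = k] via the recursion (4.15), starting from
    # IP[T_0n = 0] = exp(-theta * h(n+1))
    h = sum(1.0 / j for j in range(1, n + 1))
    p = [math.exp(-theta * h)] + [0.0] * kmax
    for k in range(1, kmax + 1):
        p[k] = theta * sum(p[k - j] for j in range(1, min(n, k) + 1)) / k
    return p

def closed_form(n, theta, k):
    # exp(-theta*h(n+1)) * theta^{(k)} / k!  for k <= n, cf. (4.16)
    h = sum(1.0 / j for j in range(1, n + 1))
    val = math.exp(-theta * h)
    for j in range(k):
        val *= (theta + j) / (j + 1)     # build theta^{(k)}/k! factor by factor
    return val

n, theta = 30, 0.6
p = t0n_pmf(n, theta, n)
cf = [closed_form(n, theta, k) for k in range(n + 1)]
```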

Theorem 4.6 If $Z_i \sim \mathrm{Po}(\theta/i)$, $i \ge 1$, then, as $n \to \infty$, the random variables $n^{-1}T_{0n}(Z)$ converge in distribution to a random variable $X_\theta$ having distribution $P_\theta$ with Laplace transform given by
$$\mathbb{E} e^{-sX_\theta} = \exp\Big(-\int_0^1 (1-e^{-sx})\frac{\theta}{x}\,dx\Big), \tag{4.20}$$
and
$$\mathbb{E} X_\theta = \theta. \tag{4.21}$$

Proof. Let $\mu_n$ be the measure that puts mass $n^{-1}$ at each of the points $in^{-1}$, $i = 1, 2, \ldots, n$, and note that $\mu_n$ converges weakly to Lebesgue measure on $(0,1)$. The Laplace transform of the random variable $n^{-1}T_{0n}$ is
$$\mathbb{E} e^{-sT_{0n}/n} = \exp\Big(-\sum_{i=1}^n \frac{\theta}{i}(1-e^{-si/n})\Big) = \exp\Big(-\int_0^1 (1-e^{-sx})\frac{\theta}{x}\,\mu_n(dx)\Big) \to \exp\Big(-\int_0^1 (1-e^{-sx})\frac{\theta}{x}\,dx\Big),$$
the last step following by dominated convergence. The result in (4.21) follows by differentiating (4.20) with respect to $s$ and letting $s \to 0$. $\square$

Using the classical identity
$$\int_0^x \frac{1-e^{-y}}{y}\,dy = E_1(x) + \log x + \gamma, \quad x > 0, \tag{4.22}$$
where
$$E_1(s) = \int_s^\infty \frac{e^{-x}}{x}\,dx,$$
the transform in (4.20) can also be written in the form
$$\mathbb{E} e^{-sX_\theta} = e^{-\gamma\theta}s^{-\theta}e^{-\theta E_1(s)}. \tag{4.23}$$
This representation provides a formula for the density function $p_\theta(\cdot)$ of $X_\theta$; see Vervaat (1972), p. 90.
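The equality of (4.20) and (4.23) is easy to check by quadrature; the following sketch (ours; Euler's constant is hard-coded, and $E_1$ is truncated far out in its tail) compares the two representations at a few values of $s$:

```python
import math

GAMMA = 0.5772156649015329          # Euler's constant

def simpson(f, a, b, m=10000):
    # composite Simpson's rule with m (even) subintervals
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def lt_via_420(s, theta):
    # exp(-int_0^1 (1 - e^{-sx}) theta/x dx); the integrand tends to s*theta at 0
    f = lambda x: (1.0 - math.exp(-s * x)) * theta / x if x > 0 else s * theta
    return math.exp(-simpson(f, 0.0, 1.0))

def lt_via_423(s, theta):
    # e^{-gamma*theta} s^{-theta} e^{-theta*E1(s)}, with E1 truncated at s + 60
    e1 = simpson(lambda x: math.exp(-x) / x, s, s + 60.0)
    return math.exp(-GAMMA * theta) * s ** (-theta) * math.exp(-theta * e1)

theta = 1.5
pairs = [(lt_via_420(s, theta), lt_via_423(s, theta)) for s in (0.5, 1.0, 3.0)]
```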

Lemma 4.7 The random variable $X_\theta$ has density $p_\theta(x)$, $x > 0$, given by
$$p_\theta(x) = \frac{e^{-\gamma\theta}x^{\theta-1}}{\Gamma(\theta)}\Bigg\{1 + \sum_{k=1}^\infty \frac{(-\theta)^k}{k!}\int\!\cdots\!\int_{I_k(x)}\Big(1 - \sum_{j=1}^k y_j\Big)^{\theta-1}\frac{dy_1\cdots dy_k}{y_1\cdots y_k}\Bigg\},$$
where
$$I_k(x) = \{y_1 > x^{-1}, \ldots, y_k > x^{-1},\ y_1 + \cdots + y_k < 1\}.$$

Proof. From (4.23), note that
$$\mathbb{E} e^{-sX_\theta} = e^{-\theta\gamma}s^{-\theta}\sum_{k=0}^\infty \frac{(-\theta)^k}{k!}\Big(\int_s^\infty \frac{e^{-y}}{y}\,dy\Big)^k. \tag{4.24}$$
Recalling that $s^{-\theta}$ is the Laplace transform of the function $x^{\theta-1}/\Gamma(\theta)$, and that if $f(x) \ge 0$ has transform $\hat f(s)$ then $f(x-u)$, $x \ge u$, has transform $e^{-su}\hat f(s)$, we see that, if $v_+ = v_1 + \cdots + v_k$,
$$s^{-\theta}e^{-sv_+} = \int_{v_+}^\infty e^{-sx}\frac{(x-v_+)^{\theta-1}}{\Gamma(\theta)}\,dx.$$
Hence, for $k \ge 1$,
$$\begin{aligned}
s^{-\theta}\Big(\int_s^\infty \frac{e^{-y}}{y}\,dy\Big)^k
&= \int_1^\infty\!\cdots\!\int_1^\infty \frac{s^{-\theta}e^{-s(v_1+\cdots+v_k)}}{v_1\cdots v_k}\,dv_1\cdots dv_k \\
&= \int_1^\infty\!\cdots\!\int_1^\infty\Bigg[\int_{v_+}^\infty e^{-sx}\frac{(x-v_+)^{\theta-1}}{\Gamma(\theta)}\,dx\Bigg]\frac{dv_1\cdots dv_k}{v_1\cdots v_k} \\
&= \int_0^\infty e^{-sx}\Bigg[\int_1^\infty\!\cdots\!\int_1^\infty 1\!\!1\{v_+ < x\}\frac{(x-v_+)^{\theta-1}}{\Gamma(\theta)}\frac{dv_1\cdots dv_k}{v_1\cdots v_k}\Bigg]dx \\
&= \int_0^\infty e^{-sx}\frac{x^{\theta-1}}{\Gamma(\theta)}\Bigg[\int\!\cdots\!\int_{I_k(x)}(1-y_+)^{\theta-1}\frac{dy_1\cdots dy_k}{y_1\cdots y_k}\Bigg]dx. \qquad (4.25)
\end{aligned}$$
Combining (4.24) and (4.25) completes the proof. $\square$

Corollary 4.8 The functions $p_\theta$ and $P_\theta$ satisfy
$$p_\theta(x) = \frac{e^{-\gamma\theta}x^{\theta-1}}{\Gamma(\theta)}; \qquad P_\theta[0,x] = \frac{e^{-\gamma\theta}x^\theta}{\Gamma(\theta+1)}, \quad 0 \le x \le 1, \tag{4.26}$$
so that
$$p_\theta(1) = \frac{e^{-\gamma\theta}}{\Gamma(\theta)}; \qquad P_\theta[0,1] = \frac{e^{-\gamma\theta}}{\Gamma(\theta+1)}. \tag{4.27}$$

Remark. The probability density $p_\theta$ of $X_\theta$ turns up frequently in what follows, as does a related probability density $f_\theta$, defined in (4.78) below. The density $p_\theta(\cdot)$ is plotted for various values of $\theta$ in Figures 4.1 and 4.2.

Size-biasing may be used to find further properties of $p_\theta$. From Lemma 4.5 we see that $(T_{0n}/n)^\star \to_d X_\theta^\star$. Note that, from (4.12),
$$\Big(\frac{T_{0n}}{n}\Big)^\star = \frac{T_{0n}^\star}{n} = \frac{T_{0n}}{n} + \frac{J_n}{n},$$
and
$$\frac{J_n}{n} \to_d U,$$
where $U \sim U(0,1)$. We conclude that
$$X_\theta^\star =_d X_\theta + U, \tag{4.28}$$
where $X_\theta$ and $U$ are independent.

This relationship has several useful consequences, among them the fact that the density $p_\theta$ satisfies the equation
$$xp_\theta(x) = \theta\int_{x-1}^x p_\theta(u)\,du, \tag{4.29}$$


Figure 4.1. Probability density $p_\theta(\cdot)$. Solid line: $\theta = 0.5$; dotted line: $\theta = 1.0$; dash-dot line: $\theta = 2.0$.

Figure 4.2. Probability density $p_\theta(\cdot)$. Solid line: $\theta = 1.0$; dotted line: $\theta = 1.1$; dash-dot line: $\theta = 0.9$.


with $p_\theta(x) = 0$ if $x < 0$. Hence it follows that $\lim_{x\to\infty} xp_\theta(x) = 0$, and indeed that
$$\sup_{y \ge x} p_\theta(y) \le \frac{\theta}{x}\sup_{y \ge x-1} p_\theta(y),$$
so that $\sup_{y \ge n} p_\theta(y) \le \theta^n/n!$. It also follows from (4.29) that $p_\theta(x)$ is differentiable for $x \notin \{0, 1\}$, and that
$$xp_\theta'(x) + (1-\theta)p_\theta(x) + \theta p_\theta(x-1) = 0; \tag{4.30}$$
see Vervaat (1972) and Watterson (1976). In particular, for $x \notin \{0,1\}$,
$$\frac{d}{dx}\big[x^{1-\theta}p_\theta(x)\big] = -\theta x^{-\theta}p_\theta(x-1), \tag{4.31}$$
so that $x^{1-\theta}p_\theta(x)$ is strictly decreasing in $x > 1$.

The density $p_\theta$ can in principle be calculated using Lemma 4.7, although the following representation is more useful numerically (see Watterson (1976) and Griffiths (1988)). Integrating (4.31), we see that, for $k = 1, 2, \ldots$,
$$x^{1-\theta}p_\theta(x) = k^{1-\theta}p_\theta(k) - \theta\int_k^x z^{-\theta}p_\theta(z-1)\,dz, \quad k \le x \le k+1. \tag{4.32}$$
Numerical methods in the case $\theta = 1$, where $p_1(x) = e^{-\gamma}\rho(x)$ and $\rho$ is Dickman's function (1.35), are discussed in van de Lune and Wattel (1969). Asymptotics are treated in Hensley (1986), Hildebrand (1990), Wheeler (1990) and Hildebrand and Tenenbaum (1993).
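This stepping scheme is simple to implement; the sketch below (ours) computes $p_\theta$ on $[1,2]$ from (4.32) with $k = 1$, using the explicit form (4.26) on $(0,1]$, and, for $\theta = 1$, recovers $e^{-\gamma}\rho(x)$ with $\rho(x) = 1 - \log x$ on that interval:

```python
import math

GAMMA = 0.5772156649015329

def p_theta_01(x, theta):
    # p_theta on (0, 1], from (4.26)
    return math.exp(-GAMMA * theta) * x ** (theta - 1.0) / math.gamma(theta)

def p_theta_12(x, theta, m=4000):
    # p_theta on (1, 2] by integrating (4.32) with k = 1 (Simpson's rule);
    # written for theta >= 1, so the integrand stays bounded near z = 1
    f = lambda z: z ** (-theta) * p_theta_01(z - 1.0, theta)
    h = (x - 1.0) / m
    s = f(1.0) + f(x)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(1.0 + i * h)
    integral = s * h / 3.0
    return x ** (theta - 1.0) * (p_theta_01(1.0, theta) - theta * integral)

# for theta = 1 this must reproduce e^{-gamma} * (1 - log x) on [1, 2]
pairs = [(p_theta_12(x, 1.0), math.exp(-GAMMA) * (1.0 - math.log(x)))
         for x in (1.2, 1.5, 1.9)]
```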

4.3 The random variable $X_\theta^{(\alpha)}$

We now move to partial sums of the form
$$T_{bn} = \sum_{j=b+1}^n jZ_j, \quad 0 \le b < n, \tag{4.33}$$
of which $T_{0n}$ is just a special case. In this section, we investigate the asymptotic distribution of $n^{-1}T_{bn}$, when $b, n \to \infty$ in such a way that $b/n \to \alpha \in (0,1]$.

Theorem 4.9 If $Z_i \sim \mathrm{Po}(\theta/i)$, $i \ge 1$, then, as $b, n \to \infty$ in such a way that $b/n \to \alpha \in (0,1]$, the random variables $n^{-1}T_{bn}(Z)$ converge in distribution to a random variable $X_\theta^{(\alpha)}$ having distribution $P_\theta^{(\alpha)}$ with Laplace transform given by
$$\mathbb{E} e^{-sX_\theta^{(\alpha)}} = \exp\Big(-\int_\alpha^1 (1-e^{-sx})\frac{\theta}{x}\,dx\Big), \tag{4.34}$$
$$\mathbb{P}[X_\theta^{(\alpha)} = 0] = \alpha^\theta \tag{4.35}$$
and
$$\mathbb{E} X_\theta^{(\alpha)} = (1-\alpha)\theta. \tag{4.36}$$

Proof. The first part follows just as in the proof of Theorem 4.6. The second part follows by letting $s \to \infty$ in (4.34), and the third part by differentiating (4.34) with respect to $s$ and letting $s \to 0$. $\square$

The transform in (4.34) can also be written in the form
$$\mathbb{E} e^{-sX_\theta^{(\alpha)}} = \alpha^\theta \exp\Big(\theta\int_\alpha^1 \frac{e^{-sy}}{y}\,dy\Big), \tag{4.37}$$
a representation that can be used to find the density $p_\theta^{(\alpha)}$ of $X_\theta^{(\alpha)}$ on $(0,\infty)$.

Lemma 4.10 The random variable $X_\theta^{(\alpha)}$ has defective density $p_\theta^{(\alpha)}(x)$ on $x > 0$, which is given by $p_\theta^{(\alpha)}(x) = 0$ on $0 < x < \alpha$, and on $x \ge \alpha$ by
$$p_\theta^{(\alpha)}(x) = \alpha^\theta\Bigg\{\frac{\theta}{x}1\!\!1\{\alpha \le x \le 1\} + \sum_{2 \le k \le x/\alpha}\frac{\theta^k}{k!}\int\!\cdots\!\int_{J_k(\alpha,x)}\Big(x - \sum_{j=1}^{k-1}y_j\Big)^{-1}\frac{dy_1\cdots dy_{k-1}}{y_1\cdots y_{k-1}}\Bigg\}, \tag{4.38}$$
where, for $k \ge 2$,
$$J_k(\alpha, x) = \{\alpha < y_i < 1,\ 1 \le i \le k-1;\ \alpha < x - y_1 - \cdots - y_{k-1} < 1\}.$$

Remark. Note that $J_k(\alpha, k\alpha) = \emptyset$, and that
$$p_\theta^{(\alpha)}(x) = \frac{\theta\alpha^\theta}{x}, \quad \alpha \le x \le \min(1, 2\alpha). \tag{4.39}$$
Indeed, the sum in (4.38) is a continuous function of $x$, so that $p_\theta^{(\alpha)}$ has discontinuities only at $\alpha$ and 1. We choose the definitions $p_\theta^{(\alpha)}(x) = \theta x^{-1}\alpha^\theta$ at $x = \alpha$ and at $x = 1$ for convenience in the statement of subsequent results, in particular in Corollary 4.11 below.

Proof. Expanding the right side of (4.37), we see that
$$\int_0^\infty e^{-sx}p_\theta^{(\alpha)}(x)\,dx = \alpha^\theta\Bigg\{\theta\int_\alpha^1 \frac{e^{-sx}}{x}\,dx + \sum_{n \ge 2}\frac{\theta^n}{n!}\int_\alpha^1\!\cdots\!\int_\alpha^1 \frac{e^{-sy_1}}{y_1}\cdots\frac{e^{-sy_n}}{y_n}\,dy_1\cdots dy_n\Bigg\}.$$


Changing variables in the multiple integral to $y_1, \ldots, y_{n-1}$ and $x = \sum_{i=1}^n y_i$, it follows that
$$\int_0^\infty e^{-sx}p_\theta^{(\alpha)}(x)\,dx = \alpha^\theta\int_0^\infty e^{-sx}\Bigg\{\frac{\theta}{x}1\!\!1\{\alpha \le x \le 1\} + \sum_{n=2}^{\lfloor x/\alpha\rfloor}\frac{\theta^n}{n!}\int\!\cdots\!\int_{J_n(\alpha,x)}\Big(x - \sum_{j=1}^{n-1}y_j\Big)^{-1}\frac{dy_1\cdots dy_{n-1}}{y_1\cdots y_{n-1}}\Bigg\}dx,$$
which completes the proof. $\square$

The Buchstab function $\omega(u)$ introduced on p. 13 is given by
$$\omega(u) = p_1^{(1/u)}(1), \quad u \ge 1. \tag{4.40}$$
We shall also later use the generalized version
$$\omega_\theta(u) = p_\theta^{(1/u)}(1), \quad u \ge 1. \tag{4.41}$$

Corollary 4.11 For $0 < \alpha < \beta \le 1$, we have
$$p_\theta^{(\alpha)}(\beta) = \beta^{\theta-1}p_\theta^{(\alpha/\beta)}(1) = \beta^{\theta-1}\omega_\theta(\beta/\alpha). \tag{4.42}$$

Proof. From (4.38), we see that
$$p_\theta^{(\alpha)}(\beta) = \frac{\alpha^\theta}{\beta}\Bigg\{\theta + \sum_{2 \le k \le \beta/\alpha}\frac{\theta^k}{k!}\int\!\cdots\!\int_{J_k(\alpha,\beta)}\Big(1 - \beta^{-1}\sum_{j=1}^{k-1}y_j\Big)^{-1}\frac{dy_1\cdots dy_{k-1}}{y_1\cdots y_{k-1}}\Bigg\}.$$
Changing variables to $w_i = y_i/\beta$ shows that
$$p_\theta^{(\alpha)}(\beta) = \beta^{\theta-1}\Big(\frac{\alpha}{\beta}\Big)^\theta\Bigg\{\theta + \sum_{2 \le k \le 1/(\alpha/\beta)}\frac{\theta^k}{k!}\int\!\cdots\!\int_{\beta^{-1}J_k(\alpha,\beta)}\Big(1 - \sum_{j=1}^{k-1}w_j\Big)^{-1}\frac{dw_1\cdots dw_{k-1}}{w_1\cdots w_{k-1}}\Bigg\}.$$
The proof is completed by noting that, for $\gamma, x \le 1$,
$$J_k(\gamma, x) = \Big\{y_i > \gamma,\ 1 \le i \le k-1;\ \sum_{i=1}^{k-1}y_i < x - \gamma\Big\},$$
so that, for $\alpha \le \beta \le 1$,
$$\beta^{-1}J_k(\alpha, \beta) = \beta^{-1}\Big\{y_i > \alpha,\ 1 \le i \le k-1;\ \sum_{i=1}^{k-1}y_i < \beta - \alpha\Big\} = \Big\{y_i > \alpha/\beta,\ 1 \le i \le k-1;\ \sum_{i=1}^{k-1}y_i < 1 - \alpha/\beta\Big\} = J_k(\alpha/\beta, 1). \qquad\square$$

Remark. The statement of Corollary 4.11 is also formally true for $\alpha = \beta$, as can be checked directly from (4.38). We provide another proof of the last two results, using the scale-invariant Poisson process, in Chapter 4.11. Note also that $p_\theta^{(\alpha)}$ is continuous on $(\alpha, 1)$, which implies from Corollary 4.11 that $\omega_\theta(u)$ is continuous in $(1, 1/\alpha)$. Since $\alpha$ can take any value in $(0,1)$, it follows that $\omega_\theta$ is continuous throughout $(1, \infty)$.

Further properties of the distribution of $X_\theta^{(\alpha)}$ follow from the size-biasing equation (4.14), with $B = \{b+1, \ldots, n\}$. Since here $j\lambda_j = \theta$ is constant, $J_B$ has a uniform distribution on $B$, so that, from (4.12) and Lemma 4.5, we have
$$(X_\theta^{(\alpha)})^\star =_d X_\theta^{(\alpha)} + U,$$
where $U \sim U(\alpha, 1)$. It follows from (4.35) and (4.36) that the density $p_\theta^{(\alpha)}(x)$ satisfies
$$xp_\theta^{(\alpha)}(x) = \begin{cases} \theta\alpha^\theta + \theta\displaystyle\int_0^{x-\alpha}p_\theta^{(\alpha)}(v)\,dv, & \alpha \le x \le 1, \\[6pt] \theta\displaystyle\int_{x-1}^{x-\alpha}p_\theta^{(\alpha)}(v)\,dv, & x > 1, \end{cases} \tag{4.43}$$
with $p_\theta^{(\alpha)}(x) = 0$ if $x < \alpha$. We see that $p_\theta^{(\alpha)}(x)$ is differentiable whenever $x \notin \{\alpha, 1, 2\alpha, 1+\alpha, 2\}$, and that then
$$x(p_\theta^{(\alpha)})'(x) + p_\theta^{(\alpha)}(x) - \theta p_\theta^{(\alpha)}(x-\alpha) + \theta p_\theta^{(\alpha)}(x-1) = 0. \tag{4.44}$$

Returning to the function $\omega_\theta$, it follows from Corollary 4.11 that
$$u^\theta\omega_\theta(u) = u^\theta p_\theta^{(\alpha)}(u\alpha)(u\alpha)^{1-\theta} = u\alpha^{1-\theta}p_\theta^{(\alpha)}(u\alpha), \quad 1 < u < 1/\alpha. \tag{4.45}$$
Thus it follows from (4.44) that
$$\frac{d}{du}\{u^\theta\omega_\theta(u)\} = \alpha^{1-\theta}\{u\alpha(p_\theta^{(\alpha)})'(u\alpha) + p_\theta^{(\alpha)}(u\alpha)\} = \alpha^{1-\theta}\theta\{p_\theta^{(\alpha)}((u-1)\alpha) - p_\theta^{(\alpha)}(u\alpha-1)\} = \alpha^{1-\theta}\theta p_\theta^{(\alpha)}((u-1)\alpha),$$
if $1 < u < 1/\alpha$, since then also $u\alpha - 1 < 0$. Hence, again from (4.45), it follows that
$$\frac{d}{du}\{u^\theta\omega_\theta(u)\} = \theta(u-1)^{\theta-1}\omega_\theta(u-1), \quad 2 < u < 1/\alpha, \tag{4.46}$$
and, since $\alpha$ can take any value in $(0,1)$, it follows that (4.46) is satisfied for all $u > 2$. Note that (4.46) is also satisfied for $1 < u < 2$, if the right hand side is interpreted as zero, because of (4.39) and (4.41). Thus (4.46) generalizes the differential equation used to define the Buchstab function $\omega$ on p. 13.

4.4 Point probabilities for $T_{bn}$

In this section, we continue the asymptotic analysis of the distribution of $T_{bn}$, concentrating now on point probabilities. We begin with some bounds for the probabilities $\mathbb{P}[T_{bn} = k]$, derived in the next lemma by elementary methods.

Lemma 4.12 For $0 \le b < n$:

(i) If $0 < \theta \le 1$, then
$$\max_{k \ge 0}\mathbb{P}[T_{bn} = k] \le \mathbb{P}[T_{bn} = 0] = e^{-\theta(h(n+1)-h(b+1))}; \tag{4.47}$$

(ii) If $\theta > 1$, then
$$\max_{k \ge 0}\mathbb{P}[T_{bn} = k] \le e^{-(h(n+1)-h(b+1))}; \tag{4.48}$$

(iii) For any $\theta > 0$,
$$\mathbb{P}[T_{bn} = k] \le \frac{\theta}{k}, \quad k \ge 1; \tag{4.49}$$

(iv) For any $\theta > 0$,
$$\mathbb{P}[T_{bn} = k] \le \frac{n\theta^2}{k(k-n)}, \quad k > n. \tag{4.50}$$

Proof. Taking $B = \{b+1, \ldots, n\}$ for any $0 \le b < n$ in the size-biasing equation (4.14) shows that
$$k\,\mathbb{P}[T_{bn} = k] = \theta\sum_{l=b+1}^n \mathbb{P}[T_{bn} = k-l]. \tag{4.51}$$
To establish (i), use (4.51) to see that, for $k \ge 1$,
$$\mathbb{P}[T_{bn} = k] \le \frac{1}{k}\sum_{j=0}^{k-1}\mathbb{P}[T_{bn} = j].$$


Thus $\mathbb{P}[T_{bn} = k]$ is at most the average of the previous $k$ values, and so, by induction, (4.47) holds.

To establish (ii), let $\tilde Z_j$, $j \ge 1$, be independent Poisson random variables with $\mathbb{E}\tilde Z_j = 1/j$, and define $\tilde T_{bn} = \sum_{j=b+1}^n j\tilde Z_j$. Define $T'_{bn} = \sum_{j=b+1}^n jZ'_j$, where the $Z'_j$ are independent Poisson random variables with mean
$$\mathbb{E} Z'_j = (\theta-1)/j,$$
independent of the $\tilde Z_j$. Then we can write
$$T_{bn} = \tilde T_{bn} + T'_{bn}.$$
Convolution of independent random variables is a smoothing operation, so that
$$\mathbb{P}[T_{bn} = k] = \sum_{j=0}^k \mathbb{P}[\tilde T_{bn} = j]\,\mathbb{P}[T'_{bn} = k-j] \le \max_{0 \le j \le k}\mathbb{P}[\tilde T_{bn} = j] \le e^{-(h(n+1)-h(b+1))},$$
the last step following from (i). Part (iii) follows immediately from (4.51), and (iv) from (4.51) and Markov's inequality. This completes the proof. $\square$

ut

The next result concerns the asymptotic behavior of Tbn when b = o(n).Theorem 4.6 shows that n−1T0n →d Xθ, and n−1Tbn →d Xθ also, ifb = o(n), because then n−1IE|T0n − Tbn| = n−1IET0b = n−1θb → 0. Thissuggests that approximation at the level of point probabilities may also befeasible.

Lemma 4.13 Suppose that $m = m_n \in \mathbb{Z}_+$ satisfies $m/n \to y \in (0,\infty)$ as $n \to \infty$, and that $b = b_n = o(n)$. Then
$$n\,\mathbb{P}[T_{bn} = m] \sim p_\theta(y), \quad n \to \infty. \tag{4.52}$$

Proof. We write the size-biasing equation (4.51) in the form
$$m\,\mathbb{P}[T_{bn} = m] = \theta\,\mathbb{P}[m-n \le T_{bn} < m-b].$$
Now multiply by $n/m$ and take the limit as $n \to \infty$; because of Theorem 4.6, $n^{-1}T_{bn} \to_d X_\theta$ with continuous distribution $P_\theta$, so that
$$\lim_{n\to\infty} n\,\mathbb{P}[T_{bn} = m] = y^{-1}\theta\,\mathbb{P}[y-1 \le X_\theta \le y] = p_\theta(y),$$
the last equality in view of (4.29), completing the proof. $\square$

In the case b ∼ αn for 0 < α < 1, a different local limit applies.


Lemma 4.14 Suppose that $b = b_n \sim \alpha n$ for some $\alpha \in (0,1)$, and that $m = m_n \in \mathbb{Z}_+$ satisfies $m/n \to y \in (0,\infty)$ as $n \to \infty$. Then, if $y \notin \{\alpha, 1\}$,
$$n\,\mathbb{P}[T_{bn} = m] \sim p_\theta^{(\alpha)}(y) \quad \text{as } n \to \infty. \tag{4.53}$$
If $y = 1$, then
$$\lim_{n\to\infty}\big|n\,\mathbb{P}[T_{bn} = m] - p_\theta^{(\alpha)}(m/n)\big| = 0,$$
but $p_\theta^{(\alpha)}(1) \ne p_\theta^{(\alpha)}(1+)$; if $y = \alpha$, then
$$\lim_{n\to\infty}\big|n\,\mathbb{P}[T_{bn} = m] - p_\theta^{(\alpha)}(\alpha)1\!\!1\{m > b\}\big| = 0.$$

Proof. We give the proof for $y \notin \{\alpha, 1\}$; the remaining parts are similar. The main tool is the size-biasing equation (4.51). There are several cases to consider.

First, if $1 \le m \le b$, (4.51) implies that $\mathbb{P}[T_{bn} = m] = 0$, so that, for $0 < y < \alpha$,
$$\lim_{n\to\infty} n\,\mathbb{P}[T_{bn} = m] = 0 = p_\theta^{(\alpha)}(y).$$
Next, if $b < m < 2b+1$ and $m \le n$, (4.51) reduces to
$$m\,\mathbb{P}[T_{bn} = m] = \theta\,\mathbb{P}[T_{bn} = 0] = \theta\,\mathbb{P}[Z_{b+1} = \cdots = Z_n = 0] \sim \theta\alpha^\theta,$$
whereas, if $2b+1 \le m \le n$, then (4.51) gives
$$m\,\mathbb{P}[T_{bn} = m] = \theta\,\mathbb{P}[0 \le T_{bn} < m-b] = \theta\,\mathbb{P}[T_{bn} = 0] + \theta\,\mathbb{P}[b < T_{bn} < m-b].$$
Now Theorem 4.9 shows that $n^{-1}T_{bn} \to_d X_\theta^{(\alpha)}$, whose distribution $P_\theta^{(\alpha)}$ is continuous except at 0; hence, for $y > \alpha$,
$$\mathbb{P}[b < T_{bn} < m-b] \sim \mathbb{P}[\alpha < X_\theta^{(\alpha)} < y-\alpha],$$
and so, for $\alpha < y < 1$,
$$m\,\mathbb{P}[T_{bn} = m] \sim \theta\alpha^\theta + \theta\,\mathbb{P}[\alpha < X_\theta^{(\alpha)} < y-\alpha] = \theta\alpha^\theta + \theta\int_\alpha^{y-\alpha}p_\theta^{(\alpha)}(u)\,du = y\,p_\theta^{(\alpha)}(y),$$
the last equality following from (4.43).

Finally, if $m > n$ and $y > 1$, then (4.51) implies that
$$m\,\mathbb{P}[T_{bn} = m] = \theta\,\mathbb{P}[m-n \le T_{bn} < m-b] \sim \theta\,\mathbb{P}[y-1 < X_\theta^{(\alpha)} < y-\alpha] = \theta\int_{y-1}^{y-\alpha}p_\theta^{(\alpha)}(u)\,du = y\,p_\theta^{(\alpha)}(y),$$
using (4.43) once more. Noting that $m \sim ny$ now completes the proof in each case. $\square$

Lemmas 4.13 and 4.14 can be used to determine the limiting behaviour of $\omega_\theta(u)$ as $u \to \infty$. We start with the following continuity theorem.

Theorem 4.15 For each fixed $y > 0$, $\lim_{\alpha\to 0}p_\theta^{(\alpha)}(y) = p_\theta(y)$.

Proof. Take any sequence $\alpha_k \downarrow 0$. For each $k$, choose $n_k > n_{k-1}$ such that, with $m_k = \lfloor yn_k\rfloor$ and $b_k = \lfloor \alpha_k n_k\rfloor$,
$$\big|n_k\,\mathbb{P}[T_{b_k n_k} = m_k] - p_\theta^{(\alpha_k)}(y)\big| < 1/k,$$
as we may, because of Lemma 4.14. Now observe that $b_k = o(n_k)$ as $k \to \infty$, so that, applying Lemma 4.13, it follows that
$$\lim_{k\to\infty} n_k\,\mathbb{P}[T_{b_k n_k} = m_k] = p_\theta(y).$$
Hence $\lim_{k\to\infty}p_\theta^{(\alpha_k)}(y) = p_\theta(y)$, and the theorem follows. $\square$

As a corollary, we obtain the limit of $\omega_\theta(u)$ as $u \to \infty$. This generalizes the corresponding result for the Buchstab function, stated on p. 13.

Corollary 4.16 For the generalized Buchstab function $\omega_\theta$,
$$\lim_{u\to\infty}\omega_\theta(u) = e^{-\gamma\theta}/\Gamma(\theta).$$

Proof. It follows from (4.45) that, for $u > 1$,
$$\omega_\theta(u) = (u\alpha)^{1-\theta}p_\theta^{(\alpha)}(u\alpha)$$
for any $\alpha < 1/u$. So take $\alpha = 1/2u$ to give
$$\omega_\theta(u) = 2^{-(1-\theta)}p_\theta^{(1/2u)}(1/2),$$
and let $u \to \infty$, giving
$$\lim_{u\to\infty}\omega_\theta(u) = 2^{-(1-\theta)}p_\theta(1/2) = e^{-\gamma\theta}/\Gamma(\theta),$$
from Corollary 4.8. $\square$

4.5 Weak laws for small cycles

Having established the necessary properties of $T_{0n}$, we return to the main object of our study, the asymptotic distribution of the joint cycle counts $C^{(n)}$ of a $\theta$-biased random permutation. The first result is a verification that (3.4) holds in this setting, showing that the joint distribution of the small cycles is essentially that of the $Z_j$, in the following sense:

Theorem 4.17 For $\theta$-biased random permutations, as $n \to \infty$,
$$(C_1^{(n)}, C_2^{(n)}, \ldots) \to_d (Z_1, Z_2, \ldots)$$
in $\mathbb{Z}_+^\infty$. In this case, $Z_j \sim \mathrm{Po}(\theta/j)$ for $j \ge 1$.

Proof. We need only show that $C^{(n)}[1,b] \to_d Z[1,b]$ as $n \to \infty$ for every fixed $b \in \mathbb{N}$. Note that, for any $a = (a_1, \ldots, a_b) \in \mathbb{Z}_+^b$,
$$\mathbb{P}[C^{(n)}[1,b] = a] = \mathbb{P}[Z[1,b] = a \mid T_{0n} = n] = \mathbb{P}[Z[1,b] = a]\,\frac{\mathbb{P}[T_{bn} = n - T_{0b}(a)]}{\mathbb{P}[T_{0n} = n]}, \tag{4.54}$$
where, as in (3.12), $T_{bn} = T_{bn}(Z) = \sum_{j=b+1}^n jZ_j$ and $T_{0b}(a) = \sum_{j=1}^b ja_j$. Since $a$ and $b$ are fixed, we may apply Lemma 4.13 directly to see that
$$\lim_{n\to\infty}\frac{\mathbb{P}[T_{bn} = n - T_{0b}(a)]}{\mathbb{P}[T_{0n} = n]} = 1,$$
completing the proof. $\square$

The proof of Theorem 4.17 simply uses the Conditioning Relation in conjunction with Lemma 4.13. In the next chapter, it is shown that the conclusion of Lemma 4.13 holds for a wide variety of logarithmic combinatorial structures, and hence that the weak convergence (3.4) of the small components also takes place in all these structures. The key step in this generalization is to demand that some rough approximation to the size-biasing equation (4.51) is valid.

The next theorem, whose proof is based on the same general approach, sharpens Theorem 4.17 in the setting of $\theta$-biased random permutations, verifying the bound on the total variation distance
$$d_b(n) = d_{TV}(\mathcal{L}(C^{(n)}[1,b]), \mathcal{L}(Z[1,b]))$$
given in Theorem 3.2. However, the size-biasing equation is now used at (4.61) to make more delicate estimates than are required for Lemma 4.13, and extension to more general logarithmic combinatorial structures has to wait until Chapter 6.7.

Theorem 4.18 There exists a constant $c_0(\theta)$ such that, for any $1 \le b \le n$, $d_b(n) \le c_0(\theta)\,b/n$.

Proof. Since $d_b(n) \le 1$ for all $b$, we clearly have $d_b(n) \le 4b/n$ for all $b \ge n/4$. Hence it is enough to examine $b \le n/4$ in what follows. First note that, in the light of (4.51) and (4.19), there is a constant $c_1 = c_1(\theta) > 0$ such that
$$\mathbb{P}[T_{0n} = n] = n^{-1}\theta\,\mathbb{P}[n^{-1}T_{0n} < 1] \ge c_1 n^{-1}. \tag{4.55}$$

From Lemma 3.1 and (3.7), we have
$$d_b(n) = \sum_{r \ge 0}\mathbb{P}[T_{0b} = r]\,\frac{(\mathbb{P}[T_{0n} = n] - \mathbb{P}[T_{bn} = n-r])^+}{\mathbb{P}[T_{0n} = n]}.$$
We break the sum into two parts, $U_1$ and $U_2$, corresponding to the ranges $r \le n/2$ and $r > n/2$. For $U_2$, note that
$$(\mathbb{P}[T_{0n} = n] - \mathbb{P}[T_{bn} = n-r])^+ \le \mathbb{P}[T_{0n} = n],$$
so that
$$U_2 = \sum_{r > n/2}\mathbb{P}[T_{0b} = r]\,\frac{(\mathbb{P}[T_{0n} = n] - \mathbb{P}[T_{bn} = n-r])^+}{\mathbb{P}[T_{0n} = n]} \le \mathbb{P}[T_{0b} > n/2] \le \frac{2\,\mathbb{E} T_{0b}}{n} = \frac{2\theta b}{n}. \tag{4.56}$$

To bound $U_1$, note that
$$\big(\mathbb{P}[T_{0n} = n] - \mathbb{P}[T_{bn} = n-r]\big)^+ = \Big(\sum_{s=0}^n \mathbb{P}[T_{0b} = s]\,\mathbb{P}[T_{bn} = n-s] - \mathbb{P}[T_{bn} = n-r]\Big)^+ \le \sum_{s=0}^n \mathbb{P}[T_{0b} = s]\,\{\mathbb{P}[T_{bn} = n-s] - \mathbb{P}[T_{bn} = n-r]\}^+,$$
and hence
$$\begin{aligned}
U_1 &= \sum_{0 \le r \le n/2}\mathbb{P}[T_{0b} = r]\,\frac{(\mathbb{P}[T_{0n} = n] - \mathbb{P}[T_{bn} = n-r])^+}{\mathbb{P}[T_{0n} = n]} \qquad (4.57) \\
&\le \frac{n}{c_1(\theta)}\sum_{0 \le r \le n/2}\mathbb{P}[T_{0b} = r]\sum_{s=0}^n \mathbb{P}[T_{0b} = s]\,\{\mathbb{P}[T_{bn} = n-s] - \mathbb{P}[T_{bn} = n-r]\}^+.
\end{aligned}$$

Once more we break this sum into two parts, $U_3$ and $U_4$, corresponding to the ranges $s > n/2$ and $s \le n/2$. For the first, we have
$$\begin{aligned}
U_3 &= \frac{n}{c_1(\theta)}\sum_{n/2 < s \le n}\mathbb{P}[T_{0b} = s]\sum_{0 \le r \le n/2}\mathbb{P}[T_{0b} = r]\,\{\mathbb{P}[T_{bn} = n-s] - \mathbb{P}[T_{bn} = n-r]\}^+ \\
&\le \frac{n}{c_1(\theta)}\sum_{n/2 < s \le n}\mathbb{P}[T_{0b} = s]\sum_{0 \le r \le n/2}\mathbb{P}[T_{0b} = r]\,\mathbb{P}[T_{bn} = n-s] \\
&\le \frac{n}{c_1(\theta)}\sum_{n/2 < s \le n}\mathbb{P}[T_{0b} = s]\,\mathbb{P}[T_{bn} = n-s]. \qquad (4.58)
\end{aligned}$$
Now use (4.50) with $b = 0$, $n = b$ and $k = s$, recalling that $b \le n/4$ and $s > n/2$, to see that
$$\mathbb{P}[T_{0b} = s] \le \frac{b\theta^2}{s(s-b)} \le \frac{8b\theta^2}{n^2}.$$
Hence, from (4.58),
$$U_3 \le \frac{n}{c_1(\theta)}\cdot\frac{8b\theta^2}{n^2}\sum_{n/2 < s \le n}\mathbb{P}[T_{bn} = n-s] \le \frac{8b\theta^2}{c_1(\theta)n}. \tag{4.59}$$

We now turn to the second term from (4.57),
$$U_4 = \frac{n}{c_1(\theta)}\sum_{0 \le r, s \le n/2}\mathbb{P}[T_{0b} = r]\,\mathbb{P}[T_{0b} = s]\,\{\mathbb{P}[T_{bn} = n-s] - \mathbb{P}[T_{bn} = n-r]\}^+. \tag{4.60}$$
To bound this expression, we note first that, from (4.51), for $0 \le r \le s$,
$$(n-r)\{\mathbb{P}[T_{bn} = n-r] - \mathbb{P}[T_{bn} = n-s]\} = \theta\,\mathbb{P}[n-s-b \le T_{bn} < n-r-b] + (r-s)\,\mathbb{P}[T_{bn} = n-s]. \tag{4.61}$$
Writing $\sum{}'$ to denote $\sum_{0 \le r < s \le n/2}$, it thus follows that
$$\begin{aligned}
U_4 &= \frac{n}{c_1(\theta)}\sum{}'\;\mathbb{P}[T_{0b} = r]\,\mathbb{P}[T_{0b} = s]\,\big|\mathbb{P}[T_{bn} = n-s] - \mathbb{P}[T_{bn} = n-r]\big| \\
&\le \frac{n}{c_1(\theta)}\sum{}'\;\mathbb{P}[T_{0b} = r]\,\mathbb{P}[T_{0b} = s]\,\frac{1}{n-r}\Big\{(s-r)\,\mathbb{P}[T_{bn} = n-s] + \theta\,\mathbb{P}[n-s-b \le T_{bn} < n-r-b]\Big\} \\
&\le \frac{n}{c_1(\theta)}\sum{}'\;\mathbb{P}[T_{0b} = r]\,\mathbb{P}[T_{0b} = s]\,\frac{2}{n}\Big\{(s-r)\,\mathbb{P}[T_{bn} = n-s] + \theta\sum_{j=n-s-b}^{n-r-b-1}\mathbb{P}[T_{bn} = j]\Big\}.
\end{aligned}$$

Now we invoke Lemma 4.12 (iii) to give

\[
U_4 \le \frac{2}{c_1(\theta)}\sum{}' IP[T_{0b}=r]\,IP[T_{0b}=s]
\Bigl\{\frac{(s-r)\theta}{n-s} + \theta\sum_{j=n-s-b}^{n-r-b-1}\frac{\theta}{j}\Bigr\}
\]
\[
\le \frac{2}{c_1(\theta)}\sum{}' IP[T_{0b}=r]\,IP[T_{0b}=s]
\Bigl\{\frac{(s-r)\theta}{n/2} + \frac{\theta^2(s-r)}{n-s-b}\Bigr\}
\]
\[
\le \frac{2}{c_1(\theta)}\sum{}' IP[T_{0b}=r]\,IP[T_{0b}=s]\,(s-r)\,\frac{2\theta(1+2\theta)}{n}.
\]

Replacing \(s-r\) by \(s\) in the sum then yields
\[
U_4 \le \frac{4\theta(1+2\theta)}{n\,c_1(\theta)}\,IE\,T_{0b} = \frac{4b\theta^2(1+2\theta)}{n\,c_1(\theta)}, \tag{4.62}
\]

whence, combining (4.56), (4.59) and (4.62), we see that
\[
d_{TV}\bigl(\mathcal{L}(C^{(n)}[1,b]),\mathcal{L}(Z[1,b])\bigr)
\le \frac{2\theta b}{n}\Bigl\{1 + \frac{2\theta}{c_1(\theta)}(3+2\theta)\Bigr\}
\]
for \(b \le n/4\), completing the proof. □

Remark. The previous proof is based on an analysis of point probabilities, an approach that can be extended to general logarithmic structures. In the case of ESF(θ), the result of Theorem 4.18 can be found much more simply using the Feller coupling. By bounding terms analogous to (1.22) and (1.23) for the ξi defined in (4.2), Arratia, Barbour and Tavare (1992) established the following result.

Lemma 4.19 In the Feller coupling, for \(1 \le b \le n\),
\[
IP[C^{(n)}[1,b] \neq Z[1,b]] \le \sum_{j=1}^{b} IE\bigl|C_j^{(n)} - Z_j\bigr|
\le \begin{cases}
\dfrac{b\theta(\theta+1)}{\theta+n}, & \theta \ge 1;\\[1.5ex]
\dfrac{b\theta(\theta+1)}{\theta+n-b}, & 0 < \theta < 1.
\end{cases} \tag{4.63}
\]

In addition, for \(\theta > 0\) and \(n \ge 3\),
\[
\sum_{j=1}^{n} IE\bigl|C_j^{(n)} - Z_j\bigr| \le 1 + \frac{n\theta(\theta+1)}{2\theta+n} + 2\log\Bigl(\frac{2n}{n-2}\Bigr) = O(1). \tag{4.64}
\]


4.6 The number of cycles

The distribution of the number \(K_{0n} = C_1^{(n)} + \cdots + C_n^{(n)}\) of cycles in a θ-biased permutation follows directly from (4.1):

\[
IP[K_{0n}=k] = \sum_{\pi\colon |\pi|=k} IP[\pi]
= \frac{\theta^k}{\theta_{(n)}}\sum_{\pi\colon |\pi|=k} 1
= \frac{\theta^k\,|S_n^{(k)}|}{\theta_{(n)}}, \qquad k = 1,2,\ldots,n. \tag{4.65}
\]

It follows that the probability generating function of \(K_{0n}\) is given by
\[
IE(u^{K_{0n}}) = \frac{(\theta u)_{(n)}}{\theta_{(n)}}
= \prod_{j=1}^{n}\Bigl(\frac{j-1}{\theta+j-1} + \frac{\theta u}{\theta+j-1}\Bigr). \tag{4.66}
\]

Hence \(K_{0n}\) is distributed as a sum of independent Bernoulli random variables \(\xi_j\) satisfying \(IE\xi_j = \theta/(\theta+j-1)\), and the Feller coupling provides a construction of the \(\xi_j\). The mean and variance of \(K_{0n}\) are given by
\[
IE(K_{0n}) = \sum_{j=1}^{n}\frac{\theta}{\theta+j-1}, \qquad
\mathrm{Var}(K_{0n}) = \theta\sum_{j=1}^{n}\frac{j-1}{(\theta+j-1)^2}.
\]
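The identities (4.65) and (4.66) can be checked mechanically for small n: the distribution computed from the unsigned Stirling numbers \(|S_n^{(k)}|\) must coincide with the convolution of the Bernoulli(θ/(θ+j−1)) laws, and its mean must match the formula above. A minimal sketch in Python with exact rational arithmetic (the function names are ours, not the book's):

```python
from fractions import Fraction

def stirling_first(n):
    """Unsigned Stirling numbers of the first kind |S_m^{(k)}|, via the
    recurrence |S_m^{(k)}| = |S_{m-1}^{(k-1)}| + (m-1)|S_{m-1}^{(k)}|."""
    s = [[0] * (n + 1) for _ in range(n + 1)]
    s[0][0] = 1
    for m in range(1, n + 1):
        for k in range(1, m + 1):
            s[m][k] = s[m - 1][k - 1] + (m - 1) * s[m - 1][k]
    return s

n, theta = 8, Fraction(3, 2)
rising = 1
for j in range(n):
    rising *= theta + j                       # theta_(n) = theta(theta+1)...(theta+n-1)

s = stirling_first(n)
via_stirling = [theta**k * s[n][k] / rising for k in range(n + 1)]   # (4.65)

# convolve Bernoulli(theta/(theta+j-1)) distributions, as in (4.66)
via_bernoulli = [Fraction(1)]
for j in range(1, n + 1):
    p = theta / (theta + j - 1)
    new = [Fraction(0)] * (len(via_bernoulli) + 1)
    for k, q in enumerate(via_bernoulli):
        new[k] += q * (1 - p)
        new[k + 1] += q * p
    via_bernoulli = new

mean = sum(k * p for k, p in enumerate(via_bernoulli))
mean_formula = sum(theta / (theta + j - 1) for j in range(1, n + 1))
```

The two point-probability vectors agree exactly, as does the mean, confirming that (4.65) and (4.66) describe the same law.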

Local limit theorems

Asymptotics for the point probabilities \(IP[K_{0n}=k]\) follow from (4.65), (1.30) and the asymptotics of the gamma function. If \(k/\log n \to \beta\theta\) and \(\beta \in [0,\infty)\), we have
\[
IP[K_{0n}=k] \sim \frac{(\theta\log n)^{k-1}\,e^{-\theta\log n}}{(k-1)!}\cdot\frac{\Gamma(\theta+1)}{\Gamma(\beta\theta+1)}. \tag{4.67}
\]
In contrast to the result for uniform random permutations in (1.29), this formula is not the same for the local central limit case \(\beta = 1\) and the case \(k = o(\log n)\), for which \(\beta = 0\), because, for \(\theta \neq 1\), \(\Gamma(\theta+1) \neq \Gamma(1)\).

The event \(\{K_{0n} = k\}\) for fixed k but large n represents a large deviation, and we want a description of the random permutation conditional on this large deviation taking place. We give a limit law for this conditional distribution, saying essentially that there are k−1 medium sized components and one large component. Here "medium sized" means tending to infinity but small relative to n. In more detail, the k−1 small components have sizes distributed like \(n^{U_1}, n^{U_2}, \ldots, n^{U_{k-1}}\), where the \(U_j\) are independent and uniformly distributed in (0,1). We observe, without proof, that a similar conditional limit law holds for the k prime factors in the situation governed by Landau's asymptotics. Later, in Chapter 5.1, we will consider the analogous questions for logarithmic combinatorial structures in general.

Theorem 4.20 Let \(Y_j^{(n)}\) be the length of the jth smallest cycle in a random n-permutation chosen according to the ESF with parameter θ > 0. Let \(K_{0n}\) be the total number of cycles. Then for fixed k, as \(n\to\infty\),
\[
\mathcal{L}\Bigl(\Bigl(\frac{\log Y_1^{(n)}}{\log n},\ldots,\frac{\log Y_{k-1}^{(n)}}{\log n}\Bigr)\Bigm|\ K_{0n}=k\Bigr)
\to_d \bigl(U_{[1]}, U_{[2]}, \ldots, U_{[k-1]}\bigr), \tag{4.68}
\]
where \(U_{[j]}\) is the jth smallest of k−1 independent random variables distributed uniformly in (0,1).

Proof. First note that the conditional distribution of \((C_1^{(n)},\ldots,C_n^{(n)})\) given \(K_{0n}=k\) is exactly the same for all θ > 0, so without loss of generality we take θ = 1.

Let \(0 < a_1 < b_1 \le a_2 < b_2 \le \cdots \le a_{k-1} < b_{k-1} < 1\). The joint law of \((U_{[1]}, U_{[2]}, \ldots, U_{[k-1]})\) is determined by
\[
IP[U_{[j]} \in (a_j,b_j),\ 1\le j\le k-1] = (k-1)!\prod_{1\le j\le k-1}(b_j-a_j),
\]
and convergence to these limit values determines convergence to this distribution. Thus we only need to show that for events \(E_n\) of the form
\[
E_n = \bigl\{Y_j^{(n)} \in (n^{a_j}, n^{b_j}),\ 1\le j\le k-1\bigr\}
\]
we have, as \(n\to\infty\),
\[
IP[E_n \mid K_{0n}=k] \to (k-1)!\prod_{1\le j\le k-1}(b_j-a_j).
\]

For \(i_1 < i_2 < \cdots < i_{k-1} < i_k = n-(i_1+\cdots+i_{k-1})\), let \(G_i\) be the event
\[
G_i = \bigl\{C_{i_1}^{(n)} = \cdots = C_{i_k}^{(n)} = 1\bigr\},
\]
so that \(G_i \subset \{K_{0n}=k\}\) and, by Cauchy's formula (1.2),
\[
IP[G_i] = \frac{1}{i_1 i_2\cdots i_{k-1} i_k}.
\]

Once n is so large that \(k\,n^{b_{k-1}} < n\), \(E_n\) becomes the union of the disjoint events \(G_i\) over all \(i_1,\ldots,i_{k-1}\) with \(i_j \in (n^{a_j}, n^{b_j})\) for j = 1 to k−1. Summing the probabilities \(IP[G_i]\) over this same range of \(i_1,\ldots,i_{k-1}\), it follows that, for sufficiently large n,
\[
IP[E_n] = \sum_{i_1,\ldots,i_{k-1}} \frac{1}{i_1 i_2\cdots i_{k-1} i_k}.
\]

Now, since \(b_{k-1} < 1\) and \(i_k = n-(i_1+\cdots+i_{k-1})\), we have
\[
1 > i_k/n > 1 - k\,n^{b_{k-1}-1},
\]
so that \(i_k \sim n\) uniformly over the range of summation, and hence
\[
IP[E_n] \sim \frac{1}{n}\sum_{i_1,\ldots,i_{k-1}}\frac{1}{i_1 i_2\cdots i_{k-1}}
= \frac{1}{n}\prod_{1\le j\le k-1}\ \sum_{n^{a_j} < m < n^{b_j}}\frac{1}{m}
\sim \frac{1}{n}\prod_{1\le j\le k-1}\bigl\{(b_j-a_j)\log n\bigr\}.
\]

Since \(E_n \subset \{K_{0n}=k\}\), combining the above with (1.29) yields
\[
IP[E_n \mid K_{0n}=k] = \frac{IP[E_n]}{IP[K_{0n}=k]} \to (k-1)!\prod_{1\le i\le k-1}(b_i-a_i),
\]
as required. □

Central limit theorems

The representation of \(K_{0n}\) as a sum of independent Bernoulli random variables shows that the asymptotic distribution of \(K_{0n}\) is Normal:
\[
\frac{K_{0n} - \theta\log n}{\sqrt{\theta\log n}} \to_d N(0,1) \tag{4.69}
\]
as \(n\to\infty\). The case θ = 1 specializes to the result given in (1.31).

There is also a functional central limit theorem describing the counts of cycles of sizes up to \(n^t\), \(0\le t\le 1\). Define the process \(B_n(\cdot)\) by
\[
B_n(t) = \frac{\sum_{j=1}^{\lfloor n^t\rfloor} C_j^{(n)} - \theta t\log n}{\sqrt{\theta\log n}}, \qquad 0\le t\le 1.
\]
Hansen (1990) showed that \(B_n(\cdot) \to_d B(\cdot)\) as \(n\to\infty\), where B is standard Brownian motion. See also the proofs of Donnelly, Kurtz and Tavare (1991) and Arratia and Tavare (1992b). The analog of this result for logarithmic combinatorial structures is given in Theorem 7.3, together with an error rate.

Finally, as in (1.32) for θ = 1, we consider the difference \(D_n\) between the number of cycles \(K_{0n}\) and the number of distinct cycle lengths. Since
\[
D_n = \sum_{j\le n}\bigl(C_j^{(n)} - 1\bigr)_+,
\]
the Poisson approximation heuristic suggests that, as \(n\to\infty\),
\[
D_n \to_d D = \sum_{j\ge 1}(Z_j-1)_+.
\]
This is proved in Arratia and Tavare (1992b), together with the fact that
\[
IED_n \to IED = \sum_{j\ge 1}\bigl(\theta/j - 1 + \exp(-\theta/j)\bigr).
\]
As in the case θ = 1, it follows that the number of distinct cycle lengths also asymptotically has a normal distribution, with mean and variance θ log n.

4.7 The shortest cycles

As in Chapter 1.1, let \(Y_r^{(n)}\) denote the length of the rth smallest cycle (defined to be +∞ if \(K_{0n} < r\)), and recall that
\[
Y_r^{(n)} > l \quad\text{if and only if}\quad \sum_{j=1}^{l} C_j^{(n)} < r.
\]
The independent process approximation heuristic and Theorem 4.18 can be used to prove that \(Y_r^{(n)} \to_d Y_r\) for each fixed r, where
\[
IP[Y_r > l] = IP\Bigl[\sum_{j=1}^{l} Z_j < r\Bigr],
\]
and this last expression simplifies to \(\mathrm{Po}(\theta h(l+1))\{[0, r-1]\}\). The joint distribution of the r smallest cycles may be approximated in the same way; see Arratia and Tavare (1992b) for example.
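For θ = 1 the limit \(IP[Y_1 > l] \to e^{-h(l+1)}\), with \(h(l+1) = \sum_{j\le l} 1/j\), can be compared against an exact finite-n computation: count the permutations of [n] whose cycles all exceed l by conditioning on the size of the cycle containing a fixed element. A small sketch (our own check, not from the book):

```python
from fractions import Fraction
from math import comb, exp, factorial

def count_min_cycle_gt(n, l):
    """Number of permutations of [m] all of whose cycles have length > l:
    a(m) = sum_{j>l} C(m-1, j-1) (j-1)! a(m-j), conditioning on the size j
    of the cycle containing element 1."""
    a = [0] * (n + 1)
    a[0] = 1
    for m in range(1, n + 1):
        a[m] = sum(comb(m - 1, j - 1) * factorial(j - 1) * a[m - j]
                   for j in range(l + 1, m + 1))
    return a[n]

n, l = 12, 2
p_exact = Fraction(count_min_cycle_gt(n, l), factorial(n))  # IP[Y_1^{(n)} > 2] at theta = 1
p_limit = exp(-sum(1.0 / j for j in range(1, l + 1)))       # e^{-h(3)} = e^{-3/2}
```

Already at n = 12 the exact probability is within a fraction of a percent of the Poisson limit.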

In the case that the length \(Y_1^{(n)}\) of the smallest cycle is required to be of order n, we have:

Lemma 4.21 For fixed u > 1, as \(n\to\infty\),
\[
IP[Y_1^{(n)} > n/u] \sim n^{-\theta}\,\Gamma(\theta)\,u^{\theta}\,\omega_\theta(u), \tag{4.70}
\]
where \(\omega_\theta(u) = p_\theta^{(1/u)}(1)\) is as in (4.41).

Remark. When θ = 1, this proves (1.33).

Proof. The proof is another elementary argument based on conditioning. We have
\[
IP[Y_1^{(n)} > b] = IP[C_1^{(n)} = \cdots = C_b^{(n)} = 0]
= IP[Z_1 = \cdots = Z_b = 0 \mid T_{0n}=n]
\]
\[
= IP[Z_1 = \cdots = Z_b = 0]\,\frac{IP[T_{bn}=n]}{IP[T_{0n}=n]}
= e^{-\theta h(b+1)}\,\frac{IP[T_{bn}=n]}{IP[T_{0n}=n]}.
\]

For \(b \sim n/u\), the first term in the product is asymptotic to \(u^{\theta} e^{-\gamma\theta} n^{-\theta}\). From Lemma 4.13 with b = 0 and m = n, the denominator of the fraction in the product is asymptotic to \(n^{-1}p_\theta(1) = n^{-1}e^{-\gamma\theta}/\Gamma(\theta)\), while from Lemma 4.14 with m = n, the numerator is asymptotic to \(n^{-1}p_\theta^{(1/u)}(1)\). Collecting terms and simplifying completes the proof. □

The local version can be proved in a similar way. We have

Lemma 4.22 For fixed u > 2, as \(n\to\infty\) with \(b \sim n/u\),
\[
IP[Y_1^{(n)} = b] \sim n^{-\theta-1}\,\Gamma(\theta+1)\,(u-1)^{\theta-1} u^{2}\,\omega_\theta(u-1). \tag{4.71}
\]

Remark. When θ = 1, this proves (1.34).

Proof. We note that
\[
IP[Y_1^{(n)} = b] = IP[C_1^{(n)} = \cdots = C_{b-1}^{(n)} = 0,\ C_b^{(n)} \ge 1]
\]
\[
= IP[C_1^{(n)} = \cdots = C_{b-1}^{(n)} = 0,\ C_b^{(n)} = 1]
+ IP[C_1^{(n)} = \cdots = C_{b-1}^{(n)} = 0,\ C_b^{(n)} \ge 2]. \tag{4.72}
\]
The first term is
\[
IP[C_1^{(n)} = \cdots = C_{b-1}^{(n)} = 0,\ C_b^{(n)} = 1]
= IP[Z_1 = \cdots = Z_{b-1} = 0,\ Z_b = 1 \mid T_{0n} = n]
\]
\[
= IP[Z_1 = \cdots = Z_{b-1} = 0,\ Z_b = 1]\,\frac{IP[T_{bn} = n-b]}{IP[T_{0n}=n]}
= (\theta/b)\,e^{-\theta h(b+1)}\,\frac{IP[T_{bn} = n-b]}{IP[T_{0n}=n]}.
\]

Using the same steps as in the previous proof, for \(b \sim n/u\), the first term in the product is asymptotic to \(\theta u^{\theta+1} e^{-\gamma\theta} n^{-\theta-1}\), the denominator of the fraction in the product is asymptotic to \(n^{-1}e^{-\gamma\theta}/\Gamma(\theta)\), and, from Lemma 4.14 with \(m = n-b\), the numerator is asymptotic to \(n^{-1}p_\theta^{(1/u)}(1-1/u)\). Because u > 2, Corollary 4.11 implies that
\[
p_\theta^{(1/u)}(1-1/u) = (1-1/u)^{\theta-1}\,p_\theta^{(1/(u-1))}(1) = (1-1/u)^{\theta-1}\,\omega_\theta(u-1),
\]
so that, collecting the terms together,
\[
IP[C_1^{(n)} = \cdots = C_{b-1}^{(n)} = 0,\ C_b^{(n)} = 1]
\sim \frac{\Gamma(\theta+1)\,(u-1)^{\theta-1} u^{2}\,\omega_\theta(u-1)}{n^{\theta+1}}.
\]

The proof is completed by showing that the second term on the right of (4.72) is of order \(o(n^{-\theta-1})\). To see this, note that
\[
IP[C_1^{(n)} = \cdots = C_{b-1}^{(n)} = 0,\ C_b^{(n)} \ge 2]
= \sum_{j=2}^{\lfloor n/b\rfloor} IP[Z_1 = \cdots = Z_{b-1} = 0,\ Z_b = j \mid T_{0n}=n]
\]
\[
= \sum_{j=2}^{\lfloor n/b\rfloor} IP[Z_1 = \cdots = Z_{b-1} = 0,\ Z_b = j]\,\frac{IP[T_{bn} = n - jb]}{IP[T_{0n}=n]}
= \frac{IP[Z_1 = \cdots = Z_{b-1} = 0]}{IP[T_{0n}=n]}\sum_{j=2}^{\lfloor n/b\rfloor} IP[Z_b = j]\,IP[T_{bn} = n - jb].
\]

Using Lemma 4.13 for the probability \(IP[T_{0n}=n]\), the factor before the sum is asymptotic to
\[
e^{-\theta h(b)}\,n/p_\theta(1) \sim e^{-\gamma\theta} n^{-\theta} u^{\theta}\cdot n\,\Gamma(\theta)\,e^{\gamma\theta} = \Gamma(\theta)\,u^{\theta}\,n^{-\theta+1},
\]
from (4.27). Next, for the sum, if \(u \notin \mathbb{Z}\), the index j takes only values in the fixed set \(\{2,3,\ldots,\lfloor u\rfloor\}\) for all n sufficiently large, and the jth term is asymptotic to
\[
\frac{(\theta u/n)^j}{j!}\,n^{-1}\,p_\theta^{(1/u)}(1-j/u) = O(n^{-j-1}) = O(n^{-3}),
\]
from Lemma 4.14 and since \(j \ge 2\). Hence, for \(u \notin \mathbb{Z}\), multiplying these two estimates, the second term on the right of (4.72) is of order \(O(n^{-\theta-2})\). If \(u \in \mathbb{Z}\), the range of j in the sum includes the value u whenever \(b \le n/u\), and the corresponding term in the sum is then bounded, for all n sufficiently large, by
\[
IP[Z_b = u]\,IP[T_{bn} = 0] \sim \frac{(\theta u/n)^u}{u!}\,u^{-\theta} = O(n^{-3}),
\]
this last because \(u \in \mathbb{Z}\) and u > 2, and so the second term on the right of (4.72) is of order \(O(n^{-\theta-2})\) once again. This completes the proof. □

4.8 The ordered cycles

In the introduction to this chapter, we described a natural ordering of the cycles determined by the Feller coupling. As in Chapter 1.1, we denote the sizes of the first, second, ... cycles by \(A_1^{(n)}, A_2^{(n)}, \ldots\). Then formula (1.38) can be used in conjunction with (4.1) to see that
\[
IP[A_1^{(n)} = a_1, \ldots, A_k^{(n)} = a_k,\ K_{0n} = k]
= \frac{\theta^k}{\theta_{(n)}}\cdot\frac{n!}{a_k(a_k+a_{k-1})\cdots(a_k+\cdots+a_1)}, \tag{4.73}
\]
since \(\{K_{0n} = k\}\) is equivalent to \(\sum_{j=1}^k a_j = n\) and \(IP_\theta[\pi]/IP_1[\pi] = n!\,\theta^{|\pi|}/\theta_{(n)}\).

Using the fact that
\[
\sum \frac{(n-m)!}{a_k(a_k+a_{k-1})\cdots(a_k+\cdots+a_{r+1})} = |S_{n-m}^{(k-r)}|,
\]
where the sum is over positive integers \(a_{r+1},\ldots,a_k\) with sum \(n-m\), we see that for \(r \le k\) and \(a_1+\cdots+a_r = m < n\),
\[
IP[A_1^{(n)} = a_1,\ A_2^{(n)} = a_2,\ \ldots,\ A_r^{(n)} = a_r,\ K_{0n} = k]
\]
\[
= \frac{n!\,\theta^k}{\theta_{(n)}(n-m)!}\cdot\frac{1}{(n-a_1-\cdots-a_{r-1})\cdots(n-a_1)\,n}
\times\sum \frac{(n-m)!}{a_k(a_k+a_{k-1})\cdots(a_k+\cdots+a_{r+1})}
\]
\[
= \frac{n!\,\theta^k\,|S_{n-m}^{(k-r)}|}{\theta_{(n)}(n-m)!}\cdot\frac{1}{(n-a_1-\cdots-a_{r-1})\cdots(n-a_1)\,n}.
\]

Summing the last expression over \(k = r+1, \ldots, n-m+r\) shows that
\[
IP[A_1^{(n)} = a_1,\ A_2^{(n)} = a_2,\ \ldots,\ A_r^{(n)} = a_r,\ K_{0n} > r]
= \frac{n!\,\theta^r\,\theta_{(n-m)}}{\theta_{(n)}(n-m)!}\cdot\frac{1}{(n-a_1-\cdots-a_{r-1})\cdots(n-a_1)\,n}. \tag{4.74}
\]
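Formula (4.74) with r = 1 can be checked exactly by brute force for small n, under the assumption (consistent with the size-biased ordering used here) that \(A_1^{(n)}\) has the law of the size of the cycle containing a distinguished element, and that each permutation π carries ESF weight \(\theta^{|\pi|}/\theta_{(n)}\). A sketch with exact rationals (θ = 2, n = 5; helper names are ours):

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def num_cycles(perm):
    seen, k = set(), 0
    for start in range(len(perm)):
        if start not in seen:
            k += 1
            j = start
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return k

def first_cycle_size(perm):
    """Size of the cycle containing element 0 (playing the role of '1')."""
    size, j = 0, 0
    while True:
        j = perm[j]
        size += 1
        if j == 0:
            return size

n, theta = 5, Fraction(2)
rising_n = 1
for i in range(n):
    rising_n *= theta + i                      # theta_(n)

dist = [Fraction(0)] * (n + 1)                 # IP_theta[A_1^{(n)} = a]
for perm in permutations(range(n)):
    dist[first_cycle_size(perm)] += theta ** num_cycles(perm) / rising_n

def formula(a):                                # (4.74) with r = 1, m = a
    rising_rest = 1
    for i in range(n - a):
        rising_rest *= theta + i               # theta_(n-m)
    return (factorial(n) * theta * rising_rest
            / (rising_n * factorial(n - a) * n))
```

For every a from 1 to n, the enumerated probability agrees exactly with (4.74).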

From this, we can deduce the asymptotic behavior of \(n^{-1}(A_1^{(n)}, A_2^{(n)}, \ldots)\). For fixed r, and \(x_1, x_2, \ldots, x_r > 0\) satisfying \(x_1+\cdots+x_r < 1\), we see that
\[
\lim_{n\to\infty} n^r\,IP\bigl[A_1^{(n)} = \lfloor nx_1\rfloor,\ A_2^{(n)} = \lfloor nx_2\rfloor,\ \ldots,\ A_r^{(n)} = \lfloor nx_r\rfloor,\ K_{0n} > r\bigr]
= f_\theta^{[r]}(x_1,\ldots,x_r)
\]
\[
= \frac{\theta^r\,(1-x_1-\cdots-x_r)^{\theta-1}}{(1-x_1)(1-x_1-x_2)\cdots(1-x_1-\cdots-x_{r-1})}. \tag{4.75}
\]

Hence
\[
n^{-1}(A_1^{(n)}, A_2^{(n)}, \ldots) \to_d (A_1, A_2, \ldots),
\]
where the densities \(f_\theta^{[r]}\) give the joint density of \((A_1,\ldots,A_r)\) for each \(r = 1,2,\ldots\). A direct calculation shows that
\[
A_1 = Y_1, \qquad A_r = Y_r\prod_{j=1}^{r-1}(1-Y_j), \quad r \ge 1,
\]
where \(Y_i\), \(i \ge 1\), are independent and identically distributed random variables having the Beta(1, θ) distribution with density \(\theta(1-y)^{\theta-1}\), 0 < y < 1.

The random variables \(A_1, A_2, \ldots\), with densities \(f_\theta^{[r]}\) as in (4.75), are said to have the GEM distribution GEM(θ) with parameter θ, so named after Griffiths, Engen and McCloskey. In (4.87), we give the density \(f_\theta^{(r)}\) of the first r components of the Poisson-Dirichlet distribution PD(θ), which is the vector of decreasing order statistics of \((A_1, A_2, \ldots)\); this density is much less explicit than the density \(f_\theta^{[r]}\). See also Chapter 4.11 for further details.
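The stick-breaking description above translates directly into a sampler; a rough sketch (not from the book — the truncation tolerance eps and function name are ours):

```python
import random

def sample_gem(theta, eps=1e-10, rng=random):
    """GEM(theta) via stick breaking: A_r = Y_r * prod_{j<r}(1 - Y_j) with
    Y_j i.i.d. Beta(1, theta), sampled by inverting the cdf 1 - (1-y)^theta.
    The (a.s. infinite) sequence is truncated once the unbroken stick < eps."""
    parts, stick = [], 1.0
    while stick > eps:
        y = 1.0 - (1.0 - rng.random()) ** (1.0 / theta)   # Beta(1, theta)
        parts.append(stick * y)
        stick *= 1.0 - y
    return parts

rng = random.Random(7)
samples = [sample_gem(2.0, rng=rng) for _ in range(20000)]
mean_a1 = sum(s[0] for s in samples) / len(samples)       # IE A_1 = 1/(1+theta)
```

Each truncated sample sums to 1 up to the tolerance, and the empirical mean of \(A_1\) matches \(IE\,Y_1 = 1/(1+\theta)\).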


4.9 The largest cycles

Having established (3.4), the convergence of the joint distribution of the numbers of small cycles, and its refinements, all in the context of θ–biased permutations, we now turn to the large cycles. The basic total variation estimate (3.11) is a tautology in this setting, but the limiting approximation (3.5) has content, and it is this that we now explore. Our first step is to find the asymptotic behavior of the longest cycle. The main tool is Lemma 4.13. We begin with the following result, due to Kingman (1977).

Lemma 4.23 As \(n\to\infty\), \(n^{-1}L_1^{(n)} \to_d L_1\), a random variable with distribution function \(F_\theta\) given by
\[
F_\theta(x) = e^{\gamma\theta}\,x^{\theta-1}\,\Gamma(\theta)\,p_\theta(1/x), \qquad x > 0, \tag{4.76}
\]
where \(p_\theta\) is as defined in Lemma 4.7.

Proof. Notice that for \(1 \le m \le n\),
\[
IP[L_1^{(n)} \le m] = IP[C_{m+1}^{(n)} + \cdots + C_n^{(n)} = 0]
= IP[Z_{m+1} + \cdots + Z_n = 0 \mid T_{0n} = n]
\]
\[
= IP[Z_{m+1}=0]\cdots IP[Z_n=0]\,\frac{IP[T_{0m}=n]}{IP[T_{0n}=n]} \tag{4.77}
\]
\[
= \exp\{-\theta(h(n+1)-h(m+1))\}\,\frac{IP[T_{0m}=n]}{IP[T_{0n}=n]}.
\]

Now use Lemma 4.13 to see that, for \(x \in (0,1]\),
\[
IP[n^{-1}L_1^{(n)} \le x] = IP[L_1^{(n)} \le \lfloor nx\rfloor] \sim F_\theta(x) = \frac{x^{\theta-1}\,p_\theta(1/x)}{p_\theta(1)}.
\]
The proof is completed by substituting for \(p_\theta(1)\) from Corollary 4.8. For x > 1, the right hand side of (4.76) is 1, using Corollary 4.8 once more. □

Remark. The probability density function \(f_\theta\) of \(L_1\) may be found from (4.76) and (4.30), because
\[
\frac{e^{-\gamma\theta}}{\Gamma(\theta)}\,f_\theta(x)
= (\theta-1)\,x^{\theta-2}\,p_\theta\Bigl(\frac1x\Bigr) - x^{\theta-2}\,\frac1x\,p_\theta'\Bigl(\frac1x\Bigr)
\]
\[
= \theta x^{\theta-2}\,p_\theta\Bigl(\frac1x\Bigr) - x^{\theta-2}\Bigl[p_\theta\Bigl(\frac1x\Bigr) + \frac1x\,p_\theta'\Bigl(\frac1x\Bigr)\Bigr]
\]
\[
= \theta x^{\theta-2}\,p_\theta\Bigl(\frac1x\Bigr) - \theta x^{\theta-2}\Bigl[p_\theta\Bigl(\frac1x\Bigr) - p_\theta\Bigl(\frac1x - 1\Bigr)\Bigr]
= \theta x^{\theta-2}\,p_\theta\Bigl(\frac1x - 1\Bigr).
\]

Figure 4.3. Probability density (4.78) of \(L_1\). Solid line: θ = 0.5; dotted line: θ = 1.0; dash-dot line: θ = 2.0. [Plot omitted.]

Hence (cf. Watterson (1976))
\[
f_\theta(x) = e^{\gamma\theta}\,\Gamma(\theta+1)\,x^{\theta-2}\,p_\theta\Bigl(\frac1x - 1\Bigr), \qquad 0 < x \le 1. \tag{4.78}
\]

The density fθ(·) is plotted for various values of θ in Figures 4.3 and 4.4.

Remark. Recall that, for any positive random variable X with Laplace transform \(\phi(s)\), we have
\[
IE(1+X)^{-\alpha} = \int_0^\infty \frac{s^{\alpha-1}e^{-s}}{\Gamma(\alpha)}\,\phi(s)\,ds.
\]
For example, use of the representation (4.23) shows that
\[
IE(1+X_\theta)^{-\theta} = \frac{1}{e^{\gamma\theta}\,\Gamma(\theta+1)}. \tag{4.79}
\]

This result may be used to check directly that \(f_\theta(x)\) is indeed a density function, since
\[
\int_0^1 f_\theta(x)\,dx = \int_0^1 e^{\gamma\theta}\,\Gamma(\theta+1)\,x^{\theta-2}\,p_\theta\Bigl(\frac1x - 1\Bigr)dx
= e^{\gamma\theta}\,\Gamma(\theta+1)\int_0^\infty (1+v)^{-\theta}\,p_\theta(v)\,dv = 1,
\]
the last equality following from (4.79).

Figure 4.4. Probability density (4.78) of \(L_1\). Solid line: θ = 1.0; dotted line: θ = 1.1; dash-dot line: θ = 0.9. [Plot omitted.]

Remark. Combining Corollary 4.8 and (4.78) shows that
\[
f_\theta(x) = \theta x^{-1}(1-x)^{\theta-1}, \qquad 1/2 \le x < 1,
\]
as shown in Watterson (1976).
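For θ = 1 this piece of the density has a well-known finite-n counterpart: in a uniform random permutation, \(IP[L_1^{(n)} = m] = 1/m\) exactly whenever m > n/2, since at most one cycle can be that long. A quick exhaustive check for n = 8 (our own verification, not from the book):

```python
from itertools import permutations
from math import factorial

def longest_cycle(perm):
    """Length of the longest cycle of a permutation given as a tuple."""
    seen, best = set(), 0
    for start in range(len(perm)):
        if start in seen:
            continue
        size, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            size += 1
        best = max(best, size)
    return best

n = 8
counts = [0] * (n + 1)
for perm in permutations(range(n)):
    counts[longest_cycle(perm)] += 1
```

For each m in {5, 6, 7, 8} the count is exactly n!/m, i.e. \(IP[L_1^{(8)} = m] = 1/m\), matching the 1/x shape of \(f_1\) on [1/2, 1).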

Remark. Using (4.76) and (4.78), it can be seen that
\[
f_\theta(x) = \theta x^{-1}(1-x)^{\theta-1}\,F_\theta\Bigl(\frac{x}{1-x}\Bigr); \tag{4.80}
\]
see Vershik and Shmidt (1977) if θ = 1, and Ignatov (1982) for general θ. Ignatov shows that \(f_\theta\) is the unique solution of (4.80) that is the density function of a random variable with values in (0,1).

We now change strategies, and derive a local limit theorem in place of the distributional result of Lemma 4.23. To set the scene, we have

Lemma 4.24 Suppose that \(m \le n\) and \(m\to\infty\) with n in such a way that \(m/n \to x \in (0,1)\). Then
\[
IP[C_n^{(n)} + \cdots + C_{m+1}^{(n)} = 0,\ C_m^{(n)} = 1] \sim \frac{f_\theta(x)}{n}. \tag{4.81}
\]

Proof. From the Conditioning Relation (4.6) and properties of the Poisson distribution,
\[
IP[C_n^{(n)} + \cdots + C_{m+1}^{(n)} = 0,\ C_m^{(n)} = 1]
= IP[Z_n + \cdots + Z_{m+1} = 0,\ Z_m = 1 \mid T_{0n} = n]
\]
\[
= IP[Z_n=0]\cdots IP[Z_{m+1}=0]\,IP[Z_m=1]\,\frac{IP[T_{0,m-1} = n-m]}{IP[T_{0n}=n]}
= \exp\{-\theta(h(n+1)-h(m))\}\,\frac{\theta}{m}\,\frac{IP[T_{0,m-1} = n-m]}{IP[T_{0n}=n]}.
\]

Hence, invoking Lemma 4.13, we have
\[
IP[C_n^{(n)} + \cdots + C_{m+1}^{(n)} = 0,\ C_m^{(n)} = 1]
\sim x^{\theta}\,\frac{\theta}{nx}\,\frac{p_\theta(1/x-1)}{x\,p_\theta(1)}
= \frac1n\,\theta x^{\theta-2}\,\frac{p_\theta(1/x-1)}{p_\theta(1)} = f_\theta(x)/n,
\]
the last step following from (4.78). □

This leads to the local limit theorem for \(L_1^{(n)}\).

Lemma 4.25 Suppose that \(m \le n\) satisfies \(m/n \to x \in (0,1)\) as \(n\to\infty\). Then
\[
n\,IP[L_1^{(n)} = m] \to f_\theta(x). \tag{4.82}
\]

Proof. Note first that
\[
IP[L_1^{(n)} = m] = IP[C_n^{(n)} + \cdots + C_{m+1}^{(n)} = 0,\ C_m^{(n)} \ge 1].
\]
The probability on the right can in turn be written as
\[
IP[C_n^{(n)} + \cdots + C_{m+1}^{(n)} = 0,\ C_m^{(n)} = 1]
+ \sum_{l\ge 2} IP[C_n^{(n)} + \cdots + C_{m+1}^{(n)} = 0,\ C_m^{(n)} = l]. \tag{4.83}
\]
The asymptotic behavior of the first term is given in Lemma 4.24, so the proof will be completed if we can show that the second term is \(o(n^{-1})\). The Conditioning Relation (3.1) shows that this term is
\[
IP[Z_n=0]\cdots IP[Z_{m+1}=0]\sum_{l\ge 2}\frac{IP[Z_m=l]\,IP[T_{0,m-1} = n-ml]}{IP[T_{0n}=n]}, \tag{4.84}
\]
and, as before, if \(m/n \to x > 0\),
\[
IP[Z_n=0]\cdots IP[Z_{m+1}=0] = \exp\{-\theta(h(n+1)-h(m+1))\} \to x^{\theta}.
\]

Now use Lemma 4.12 (i) and (ii) with b = 0 to see that
\[
\sum_{l\ge 2}\frac{IP[Z_m=l]\,IP[T_{0,m-1} = n-ml]}{IP[T_{0n}=n]}
\le \frac{e^{-\theta' h(m)}}{IP[T_{0n}=n]}\,IP[Z_m \ge 2], \tag{4.85}
\]
where \(\theta' = \min(1,\theta)\). Under the assumptions of the lemma, and using Lemma 4.13 to show that \(IP[T_{0n}=n] \sim n^{-1}p_\theta(1)\), this last term is of order
\[
O\bigl(n^{-\theta'}\cdot n\cdot n^{-2}\bigr) = O(n^{-1-\theta'}) = o(n^{-1}),
\]
as required. □

Remark. The order \(O(n^{-1-\theta'})\) of the bound derived using (4.85) for the second term in (4.84) cannot be improved uniformly for all m, even though for specific values of m, particularly m > n/2, the error may be much smaller. To see this, note that the remainder term on the right of (4.83) is bounded below by its first term,
\[
\rho_n = IP[C_n^{(n)} + \cdots + C_{m+1}^{(n)} = 0,\ C_m^{(n)} = 2]
= IP[Z_n=0]\cdots IP[Z_{m+1}=0]\,IP[Z_m=2]\,\frac{IP[T_{0,m-1} = n-2m]}{IP[T_{0n}=n]}.
\]
For \(\theta \le 1\), consider the case where \(m = \lfloor n/2\rfloor\). If n is even,
\[
IP[T_{0,m-1} = n-2m] = IP[T_{0,m-1} = 0] = e^{-\theta h(m)} \asymp n^{-\theta},
\]
whereas if n is odd, then
\[
IP[T_{0,m-1} = n-2m] = IP[T_{0,m-1} = 1] = \theta\,e^{-\theta h(m)} \asymp n^{-\theta}
\]
also. Since, from (4.18), \(IP[T_{0n}=n] \sim n^{-1}e^{-\gamma\theta}/\Gamma(\theta) \asymp n^{-1}\), we see that
\[
\rho_n \asymp n^{-(1+\theta)}. \tag{4.86}
\]
Then, for any θ, including θ > 1, taking \(m = \lceil (n+1)/3\rceil\), we have
\[
IP[T_{0,m-1} = n-2m] \sim IP[T_{0,m-1} = m-1] \asymp n^{-1},
\]
and hence \(\rho_n \asymp n^{-2}\). Thus no uniform bound of order smaller than \(O(n^{-1-\theta'})\) is possible.

The same approach is now exploited to understand the asymptotics of the r largest cycle lengths \(L_1^{(n)},\ldots,L_r^{(n)}\), the density of the limiting random vector (Watterson, 1976) emerging naturally in the course of the proof. We establish the following local limit theorem.

Theorem 4.26 For \(r \ge 2\), suppose that \(0 < x_r < x_{r-1} < \cdots < x_1 < 1\) satisfy \(0 < x_1+\cdots+x_r < 1\). Then, if the integers \(m_i = m_i(n)\) are such that \(n^{-1}m_i \to x_i\), \(1 \le i \le r\), it follows that
\[
\lim_{n\to\infty} n^r\,IP[L_i^{(n)} = m_i,\ 1\le i\le r]
= f_\theta^{(r)}(x_1,\ldots,x_r)
= \frac{e^{\gamma\theta}\,\theta^r\,\Gamma(\theta)\,x_r^{\theta-1}}{x_1 x_2\cdots x_r}\,
p_\theta\Bigl(\frac{1-x_1-\cdots-x_r}{x_r}\Bigr). \tag{4.87}
\]

Proof. The proof is essentially the same as that of Lemma 4.25. First assume that n is large enough to ensure that the integers \(m_1, m_2, \ldots, m_r\) satisfy the conditions
\[
1 \le m_r < m_{r-1} < \cdots < m_1 < n, \qquad m = m_1+\cdots+m_r \le n,
\]
and let \(A_n(C^{(n)}) = A_n(C^{(n)}; m_1, m_2, \ldots, m_{r-1}, m_r)\) denote the event
\[
\bigl\{C_n^{(n)} = 0, \ldots, C_{m_1+1}^{(n)} = 0,\ C_{m_1}^{(n)} = 1,\ C_{m_1-1}^{(n)} = 0, \ldots, C_{m_2+1}^{(n)} = 0,\ C_{m_2}^{(n)} = 1,\ C_{m_2-1}^{(n)} = 0, \ldots,
\]
\[
C_{m_{r-1}+1}^{(n)} = 0,\ C_{m_{r-1}}^{(n)} = 1,\ C_{m_{r-1}-1}^{(n)} = 0, \ldots, C_{m_r+1}^{(n)} = 0\bigr\}.
\]
Then
\[
IP[L_1^{(n)} = m_1, \ldots, L_r^{(n)} = m_r] = IP[A_n(C^{(n)}),\ C_{m_r}^{(n)} \ge 1].
\]

This last probability can be split into two terms,
\[
IP[A_n(C^{(n)}),\ C_{m_r}^{(n)} = 1] + \sum_{l\ge 2} IP[A_n(C^{(n)}),\ C_{m_r}^{(n)} = l].
\]
Using the Conditioning Relation, the first of these can be expressed in terms of the \(Z_i\) as
\[
IP[A_n(Z),\ Z_{m_r} = 1 \mid T_{0n} = n]
= IP[A_n(Z)]\,IP[Z_{m_r} = 1]\,\frac{IP[T_{0,m_r-1} = n-m]}{IP[T_{0n}=n]}, \tag{4.88}
\]
which reduces to
\[
\frac{IP[T_{0,m_r-1} = n-m]}{IP[T_{0n}=n]}\cdot\frac{\theta^r\,e^{-\theta(h(n+1)-h(m_r))}}{m_1\cdots m_r}. \tag{4.89}
\]
Applying the result of Lemma 4.13 and simplifying shows that
\[
\lim_{n\to\infty} n^r\,IP[A_n(C^{(n)}),\ C_{m_r}^{(n)} = 1] = f_\theta^{(r)}(x_1,\ldots,x_r).
\]

It remains to show that \(\sum_{l\ge 2} IP[A_n(C^{(n)}),\ C_{m_r}^{(n)} = l] = o(n^{-r})\). But this probability is just
\[
IP[A_n(Z)]\sum_{l\ge 2} IP[Z_{m_r} = l]\,\frac{IP[T_{0,m_r-1} = n-m-(l-1)m_r]}{IP[T_{0n}=n]}
\le IP[A_n(Z)]\,\frac{e^{-\theta' h(m_r)}}{IP[T_{0n}=n]}\,IP[Z_{m_r} \ge 2], \tag{4.90}
\]
using Lemma 4.12 (i),(ii). Now, from Lemma 4.13, we have
\[
IP[T_{0n}=n] \sim n^{-1}p_\theta(1),
\]
and direct calculation shows further that \(IP[A_n(Z)] \le \theta^{r-1}/(m_1\cdots m_{r-1})\), and that \(IP[Z_{m_r} \ge 2] \le \theta^2/(2m_r^2)\). Hence, again writing \(\theta' = \theta\wedge 1\), the expression (4.90) is of order
\[
O\bigl(n^{-(r-1)}\cdot n\cdot n^{-\theta'}\cdot n^{-2}\bigr) = O(n^{-r-\theta'}) = o(n^{-r}),
\]
as required, completing the proof. □

The densities \(f_\theta^{(r)}(\cdot)\), defined by \(f_\theta^{(1)}(x) = f_\theta(x)\) for r = 1 and in (4.87) for \(r \ge 2\), satisfy the natural consistency condition
\[
\int_0^{x_r\wedge(1-s_r)} f_\theta^{(r+1)}(x_1,\ldots,x_r,y)\,dy = f_\theta^{(r)}(x_1,\ldots,x_r), \qquad r = 1,2,\ldots, \tag{4.91}
\]
where \(s_r = \sum_{j=1}^r x_j\). To verify this, consider the two cases \(x_r < 1-s_r\) and its complement separately; make use of the identity in (4.32), and, in the latter case, also of (4.26). Since \(f_\theta\) is indeed a probability density function, we deduce the same for \(f_\theta^{(r)}\), for each \(r = 2,3,\ldots\). As a consequence of Theorem 4.26, we then have the following result of Kingman (1977):

Corollary 4.27 For θ–biased random permutations, as \(n\to\infty\),
\[
n^{-1}(L_1^{(n)}, L_2^{(n)}, \ldots) \to_d (L_1, L_2, \ldots)
\]
in ∆, where for each \(r = 1,2,\ldots\), \((L_1, L_2, \ldots, L_r)\) has density \(f_\theta^{(r)}(\cdot)\) given in (4.87), and where ∆ is as defined for (3.5).

Proof. For any \(r \ge 1\), the convergence in distribution of the random vector \(n^{-1}(L_1^{(n)},\ldots,L_r^{(n)})\) follows from Theorem 4.26 by Scheffe's theorem (Scheffe, 1947). To see this, for each n, let \(U_{n1},\ldots,U_{nr}\) be independent uniform random variables on (0,1), independent of \((L_1^{(n)},\ldots,L_r^{(n)})\), and define smoothed random variables by
\[
\tilde L_i^{(n)} = n^{-1}\bigl(L_i^{(n)} + U_{ni}\bigr), \qquad i = 1,\ldots,r.
\]
The density of \((\tilde L_1^{(n)},\ldots,\tilde L_r^{(n)})\) at \((x_1,\ldots,x_r)\) is given precisely by \(n^r\,IP[L_i^{(n)} = \lfloor nx_i\rfloor,\ 1\le i\le r]\), and so Theorem 4.26 shows that the density of \((\tilde L_1^{(n)},\ldots,\tilde L_r^{(n)})\) converges pointwise to the density of \((L_1,\ldots,L_r)\). Scheffe's theorem then shows that \((\tilde L_1^{(n)},\ldots,\tilde L_r^{(n)}) \to_d (L_1,\ldots,L_r)\). Finally,
\[
(\tilde L_1^{(n)},\ldots,\tilde L_r^{(n)}) - n^{-1}(L_1^{(n)},\ldots,L_r^{(n)}) = n^{-1}(U_{n1},\ldots,U_{nr}) \to_d 0,
\]
and so \(n^{-1}(L_1^{(n)},\ldots,L_r^{(n)}) \to_d (L_1,\ldots,L_r)\) as well. □

Remark. The proof of Theorem 4.26 relies only on Lemmas 4.13 and 4.12 (i) and (ii), both of which can easily be extended to wider classes of logarithmic combinatorial structures.


Remark. The random vector \((L_1, L_2, \ldots)\) having distribution determined by the marginal densities \(f_\theta^{(r)}\) in (4.87) is said to have the Poisson-Dirichlet distribution with parameter θ. In Section 4.11 we give several other representations of the distribution.

4.10 The Erdos-Turan Law

In Chapter 1.1, we described the result of Erdos and Turan (1965) concerning the asymptotic normality of the log of the order \(O_n\) of a randomly chosen element of \(S_n\). Their proof is based on showing that \(\log O_n\) is relatively close to \(\log P_n = \sum_{j=1}^n C_j^{(n)}\log j\) which, suitably centered and scaled, is asymptotically normally distributed.

Here we give a proof of this result when \(C^{(n)}\) is distributed as ESF(θ), following the treatment of Arratia and Tavare (1992). The proof has three steps. First the Feller coupling is used to show that \(\log P_n - \log O_n\) is readily controlled by the corresponding functional of the \(Z_j\), \(j \ge 1\). The second step uses a moment calculation for the Poisson process to show that this functional of the Poisson process is negligible relative to \(\log^{3/2} n\). The last step shows that \(\log P_n\) is close to the corresponding functional \(\sum_{j=1}^n Z_j\log j\) of the Poisson process. We begin with the following lemma.

Let \(a = (a_1,\ldots,a_n) \in \mathbb{Z}_+^n\), and define
\[
r(a) = \frac{\prod_{i\le n} i^{a_i}}{\mathrm{l.c.m.}\{i : a_i > 0\}}.
\]

Lemma 4.28 For \(a, b \in \mathbb{Z}_+^n\) satisfying \(a \le b + e_j\), where \(e_j\) denotes the jth coordinate vector, we have
\[
1 \le r(a) \le n\,r(b). \tag{4.92}
\]

Proof. The leftmost inequality in (4.92) is immediate. To establish the second inequality, note that \(r(a+e_i)/r(a) \in [1, i]\), since if a is increased by \(e_i\), then the numerator of \(r(a)\) is multiplied by i, whereas the denominator of \(r(a)\) is multiplied by a divisor of i. In particular, \(r(\cdot)\) is an increasing function. Finally, \(r(a) \le r(b+e_j) \le j\,r(b) \le n\,r(b)\), completing the proof. □
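The quantity r(a) and the inequality (4.92) are simple enough to check exhaustively for a small n; a sketch (the helper name r_of is ours):

```python
from itertools import product
from math import lcm, prod

def r_of(a):
    """r(a) = (prod_{i<=n} i^{a_i}) / l.c.m.{i : a_i > 0}; 1-based indices.
    The lcm divides the product, so the value is a positive integer;
    math.lcm() with no arguments returns 1, covering a = 0."""
    num = prod(i ** ai for i, ai in enumerate(a, start=1))
    den = lcm(*(i for i, ai in enumerate(a, start=1) if ai > 0))
    return num // den

n = 4
violations = 0
for b in product(range(3), repeat=n):
    rb = r_of(b)
    for j in range(n):
        top = tuple(bi + (1 if i == j else 0) for i, bi in enumerate(b))
        for a in product(*(range(t + 1) for t in top)):   # all a <= b + e_j
            if not 1 <= r_of(a) <= n * rb:
                violations += 1
```

The exhaustive search over all a ≤ b + e_j with n = 4 finds no violation of (4.92).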

This lemma is used to establish the following result.

Lemma 4.29 Let \(C^{(n)}\), distributed as in (4.5), and \(Z_j\), \(j \ge 1\), distributed as independent Po(θ/j) random variables, be realized as in (4.3) and (4.4). Then, for every n,
\[
0 \le \log r(C^{(n)}) = \log P_n - \log O_n \le \log n + \log r(Z[1,n]). \tag{4.93}
\]


Proof. Recall from (4.3) that
\[
C_i^{(n)} = \#\{i\text{-spacings in } 1\,\xi_2\,\xi_3\ldots\xi_n\,1\},
\]
and that \(Z_i = C_i^{(\infty)}\) satisfies
\[
Z_i = \#\{i\text{-spacings in } 1\,\xi_2\,\xi_3\ldots\}.
\]
It follows that
\[
C_i^{(n)} \le Z_i + 1\!\mathrm{l}\{J_n = i\},
\]
where \(J_n \in [n]\) is the position of the last 1 in \((1, \xi_2, \ldots, \xi_n)\). Hence it follows that \(C^{(n)}[1,n] \le Z[1,n] + e_{J_n}\). Now apply Lemma 4.28. □
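The construction in this proof is concrete enough to code: sample \(\xi_2, \ldots, \xi_n\) with \(IP[\xi_i = 1] = \theta/(\theta+i-1)\), append the final 1, and read off the spacings. The spacings always partition n, so \(\sum_j j\,C_j^{(n)} = n\) serves as a built-in check (a sketch with our own function name):

```python
import random

def feller_cycle_counts(n, theta, rng):
    """C_i^{(n)} = number of i-spacings in the 0/1 string 1 xi_2 ... xi_n 1,
    where the xi_i are independent with IP[xi_i = 1] = theta/(theta+i-1)."""
    xi = [1] + [int(rng.random() < theta / (theta + i - 1))
                for i in range(2, n + 1)]
    ones = [i for i, x in enumerate(xi) if x == 1] + [n]  # the appended final 1
    counts = [0] * (n + 1)
    for left, right in zip(ones, ones[1:]):
        counts[right - left] += 1
    return counts

rng = random.Random(3)
samples = [feller_cycle_counts(40, 0.7, rng) for _ in range(500)]
```

Every sample satisfies the partition identity exactly, whatever θ.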

The next result is based on a direct calculation for the Poisson random variables \(Z_j\), \(j \ge 1\); for a proof, see Arratia and Tavare (1992).

Lemma 4.30 As \(n\to\infty\),
\[
IE\log r(Z[1,n]) = O\bigl(\log n\,(\log\log n)^2\bigr). \tag{4.94}
\]

The Erdos-Turan Law for the Ewens Sampling Formula is

Theorem 4.31 As \(n\to\infty\),
\[
\frac{\log O_n - \frac{\theta}{2}\log^2 n}{\sqrt{\frac{\theta}{3}\log^3 n}} \to_d N(0,1). \tag{4.95}
\]

Proof. First we combine (4.93) and (4.94) to conclude that
\[
0 \le \frac{IE(\log P_n - \log O_n)}{\sqrt{\log^3 n}} = O\Bigl(\frac{(\log\log n)^2}{\sqrt{\log n}}\Bigr),
\]
from which it follows that the theorem will be proved if we establish that
\[
\frac{\sum_{j=1}^n C_j^{(n)}\log j - \frac{\theta}{2}\log^2 n}{\sqrt{\frac{\theta}{3}\log^3 n}} \to_d N(0,1). \tag{4.96}
\]

As earlier, we prove the result with the (dependent) \(C_j^{(n)}\) replaced by the (independent) \(Z_j\), and show that the error in this approximation is negligible. Observe that \(\sum_{j=1}^n \log j\ IEZ_j \sim (\theta/2)\log^2 n\) and that
\[
\sum_{j=1}^n \mathrm{Var}(Z_j\log j) = \theta\sum_{j=1}^n j^{-1}\log^2 j \sim \theta\log^3 n/3;
\qquad
\sum_{j=1}^n IE|Z_j - IEZ_j|^3\log^3 j = O(\log^4 n).
\]
Lyapounov's theorem then establishes that
\[
\frac{\sum_{j=1}^n Z_j\log j - \frac{\theta}{2}\log^2 n}{\sqrt{\frac{\theta}{3}\log^3 n}} \to_d N(0,1). \tag{4.97}
\]

The absolute value of the error \(R_n\) in the approximation of the left side of (4.96) by the left side of (4.97) is
\[
|R_n| = \frac{\bigl|\sum_{j=1}^n \log j\,(C_j^{(n)} - Z_j)\bigr|}{\sqrt{\frac{\theta}{3}\log^3 n}}
\le \sqrt{3}\,\frac{\sum_{j=1}^n |C_j^{(n)} - Z_j|}{\sqrt{\theta\log n}}.
\]
The proof is completed by noting that \(R_n \to_P 0\), using (4.64). □

For rates of convergence, see Chapter 7.4.
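The relation \(\log P_n - \log O_n = \log r(C^{(n)}) \ge 0\) of Lemma 4.29 just says that the order \(O_n\) (the l.c.m. of the cycle lengths) divides the product \(P_n\); this is immediate to confirm on random permutations (a sketch, θ = 1 via uniform shuffles):

```python
import random
from math import lcm, prod

def cycle_lengths(perm):
    """Multiset of cycle lengths of a permutation given as a list."""
    seen, sizes = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        size, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            size += 1
        sizes.append(size)
    return sizes

rng = random.Random(11)
checks = []
for _ in range(200):
    perm = list(range(60))
    rng.shuffle(perm)
    sizes = cycle_lengths(perm)
    p_n = prod(sizes)      # P_n = prod_j j^{C_j^{(n)}}
    o_n = lcm(*sizes)      # O_n = order of the permutation
    checks.append((sum(sizes), p_n % o_n))
```

In every sample the cycle lengths sum to n and \(O_n\) divides \(P_n\), so \(r(C^{(n)})\) is a positive integer, as the lemma asserts.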

4.11 The Poisson-Dirichlet and GEM distributions

The Poisson-Dirichlet distribution, denoted by PD(θ), was named by Kingman (1975), who defined it as the distribution of the normalized points \(\sigma_1 > \sigma_2 > \cdots\) of a Poisson process with intensity \(\theta e^{-x}/x\), x > 0:
\[
\mathrm{PD}(\theta) = \mathcal{L}(L_1, L_2, \ldots) = \mathcal{L}\bigl((\sigma_1/\sigma, \sigma_2/\sigma, \ldots)\bigr), \tag{4.98}
\]
where \(\sigma = \sigma_1 + \sigma_2 + \cdots\) is independent of \((L_1, L_2, \ldots)\) and has a Gamma distribution with density \(y^{\theta-1}e^{-y}/\Gamma(\theta)\), y > 0. Kingman also showed that, since σ and \((\sigma_1/\sigma, \sigma_2/\sigma, \ldots)\) are independent,
\[
\mathrm{PD}(\theta) = \mathcal{L}\bigl((\sigma_1, \sigma_2, \ldots) \mid \sigma = 1\bigr). \tag{4.99}
\]

Griffiths (1979) showed that, for \(r \ge 1\) and \(j_1 + \cdots + j_r = j\),
\[
IE\bigl(L_1^{j_1}\cdots L_r^{j_r}\bigr) = \frac{IE(\sigma_1^{j_1}\cdots\sigma_r^{j_r})}{IE(\sigma^j)} \tag{4.100}
\]
\[
= \frac{\theta^r\,\Gamma(\theta)}{\Gamma(\theta+j)}\int y_1^{j_1-1}\cdots y_r^{j_r-1}
\exp\Bigl(-\sum_{l=1}^r y_l - \theta E_1(y_r)\Bigr)dy_1\cdots dy_r,
\]
where \(E_1(x) = \int_x^\infty y^{-1}e^{-y}\,dy\) as before, and the integral is taken over the set \(y_1 > \cdots > y_r > 0\). In particular,
\[
IE L_r^{j} = \frac{\Gamma(\theta+1)}{\Gamma(\theta+j)}\int_0^\infty \frac{(\theta E_1(x))^{r-1}}{(r-1)!}\,x^{j-1}e^{-x-\theta E_1(x)}\,dx. \tag{4.101}
\]
Values of the mean and variance of \(L_1\) are given in Table 4.1 for several values of θ.
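The entries for \(IE L_1\) in Table 4.1 can be reproduced from (4.101) with r = j = 1, i.e. \(IE L_1 = \int_0^\infty e^{-x-\theta E_1(x)}\,dx\); for θ = 1 this is the Golomb–Dickman constant 0.6243.... A rough numerical sketch (the substitution x = e^t and the trapezoidal rule are our choices, not the book's):

```python
from math import exp

def mean_L1(theta, t_lo=-14.0, t_hi=4.2, steps=40000):
    """IE L_1 = int_0^inf exp(-x - theta*E1(x)) dx, (4.101) with r = j = 1.
    With x = e^t both integrands are smooth; E1(x) = int_{log x}^{inf}
    exp(-e^s) ds is accumulated right-to-left on the same t-grid."""
    h = (t_hi - t_lo) / steps
    t = [t_lo + k * h for k in range(steps + 1)]
    g = [exp(-exp(tk)) for tk in t]
    e1 = [0.0] * (steps + 1)
    for k in range(steps - 1, -1, -1):
        e1[k] = e1[k + 1] + 0.5 * h * (g[k] + g[k + 1])
    # outer integral, dx = e^t dt
    f = [exp(-exp(tk) - theta * e1k + tk) for tk, e1k in zip(t, e1)]
    return sum(0.5 * h * (f[k] + f[k + 1]) for k in range(steps))
```

The values for θ = 0.5, 1 and 2 agree with the tabulated means to the precision shown.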

If \(T_1 < T_2 < \cdots\) are the points of a Poisson process of rate θ on (0,∞), and \(E_1, E_2, \ldots\) are i.i.d. exponential random variables with mean 1, then Tavare (1987) showed that
\[
s_i = e^{-T_i}E_i, \qquad i \ge 1,
\]

Table 4.1. Mean and variance of \(L_1\)

θ      IE L1     Var(L1)
0.5    0.7578    0.0370
1.0    0.6243    0.0369
2.0    0.4756    0.0271
5.0    0.2973    0.0116
10.0   0.1949    0.0047

is an enumeration of the points of the Poisson process with intensity \(\theta e^{-x}/x\), x > 0, and that the GEM distribution can be represented as
\[
\mathrm{GEM}(\theta) = \mathcal{L}\bigl((s_1/s, s_2/s, \ldots)\bigr),
\]
where \(s = s_1 + s_2 + \cdots\). Thus the decreasing order statistics of GEM(θ) have the PD(θ) distribution. To go in the other direction, write \(L = (L_1, L_2, \ldots)\) and let \(\eta_1, \eta_2, \ldots\) be conditionally independent and identically distributed with \(IP[\eta_1 = k \mid L] = L_k\), \(k \ge 1\). The sequence \(I_1, I_2, \ldots\) of distinct values observed in \(\eta_1, \eta_2, \ldots\) induces a random permutation \(L^\# = (L_{I_1}, L_{I_2}, \ldots)\) of L, known as the size-biased permutation of L. Patil and Taillie (1977) and Donnelly and Joyce (1989) show that \(\mathcal{L}(L^\#) = \mathrm{GEM}(\theta)\). McCloskey (1965) and Engen (1975) show that GEM(θ) is invariant under size-biased permutation. For more on this, and on size-biasing of Poisson processes more generally, see Perman, Pitman and Yor (1992).

Ignatov (1982) gave a representation of GEM(θ) as the distribution of the spacings 1 − τ1, τ1 − τ2, . . . between the points τi = e^{−Ti}, i ≥ 1, of the Poisson process on (0,1) with intensity θ/x. Now letting Ei be independent exponential random variables with mean θ^{−1}, we see that

e^{−Ti} − e^{−T_{i+1}} = e^{−(E1+···+Ei)} − e^{−(E1+···+E_{i+1})}
    = e^{−E1} · · · e^{−Ei}(1 − e^{−E_{i+1}})
    = (1 − Y1) · · · (1 − Yi) Y_{i+1},

where Yi = 1 − e^{−Ei} has the Beta(1, θ) distribution.

The finite-dimensional distributions fθ[r] of GEM(θ) defined in (4.75) are

clearly much simpler to describe than those of PD(θ). Suppose for example that V = (V1, V2, . . .) is a random vector with Vi ≥ 0 for i ≥ 1 and ∑ Vi = 1 almost surely. We define the residual fractions Ri by

R1 = V1;   Rj = Vj/(Vj + V_{j+1} + · · ·) = Vj/(1 − V1 − · · · − V_{j−1}),   j ≥ 2;

note that this can be inverted to give

Vj = (1 − R1)(1 − R2) · · · (1 − R_{j−1}) Rj,   j ≥ 1.
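The two maps (Vj) ↦ (Rj) and (Rj) ↦ (Vj) are mutually inverse, which is easy to confirm numerically; the vector v below is an arbitrary illustration, not an example from the text.

```python
def residuals(v):
    """R_1 = V_1;  R_j = V_j / (1 - V_1 - ... - V_{j-1})."""
    r, rem = [], 1.0
    for x in v:
        r.append(x / rem)
        rem -= x
    return r

def from_residuals(r):
    """V_j = (1 - R_1)(1 - R_2)...(1 - R_{j-1}) R_j."""
    v, prod = [], 1.0
    for x in r:
        v.append(prod * x)
        prod *= 1.0 - x
    return v

v = [0.4, 0.3, 0.2, 0.1]
back = from_residuals(residuals(v))
print(all(abs(a - b) < 1e-12 for a, b in zip(v, back)))  # True
```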

We have seen that when V ∼ GEM(θ), the Ri are independent, with Ri ∼ Beta(1, θ). What happens when V ∼ PD(θ)? In this case, a straightforward calculation using (4.87) shows that Ri, i ≥ 1, forms a homogeneous Markov chain on [0,1] with transition probabilities determined by

IP(R_{i+1} ∈ dy | Ri = w) = qθ(y | w) dy   (4.102)
    = θ w^{−1} (1 − w)^{θ−1} [fθ(y)/fθ(w)] dy,

for 0 < y < w/(1 − w) ∧ 1, where fθ(·) is the density of the largest component of PD(θ), defined in (4.78).

Following work of Vershik and Shmidt (1977) for the case θ = 1, Ignatov (1982) showed that the only continuous stationary distribution of the chain is the law of 1/(1 + Xθ), which has density π(·) given by

π(y) = y^{−θ} fθ(y)/(e^{γθ} Γ(θ + 1)) = (1/y²) pθ((1 − y)/y),   0 < y < 1.   (4.103)

That π(y) ∝ y^{−θ} fθ(y) is indeed a stationary measure may be checked immediately by showing that ∫_0^1 π(w) qθ(y | w) dw = π(y), 0 < y < 1. When the Markov chain R1, R2, . . . is stationary, the resulting V1, V2, . . . have the property that the ratios Dn = V_{n+1}/Vn, n ≥ 1, are independent and identically distributed Beta(θ, 1) random variables. Further properties of the Vershik-Shmidt chain may be found in Pitman and Yor (1997).

The scale invariant Poisson process

What happens if we rescale the limit process (Z1, Z2, . . .) arising in the Ewens Sampling Formula? Formally, consider the Poisson random measure Mn on (0,∞) with mass Zj at the point j/n, for j ≥ 1:

Mn(·) = ∑_{j=1}^∞ Zj δ_{j/n}(·).

The independence of the Zj means that, for any system of non-overlapping subintervals of (0,∞), the random measure Mn assigns independent masses. The expected mass assigned to an interval (a, b) is

IE ∑_{i/n ∈ (a,b)} Zi = ∑_{na < i < nb} θ/i ∼ θ log(b/a) = ∫_a^b θ x^{−1} dx.

Since the Zi are independent Poisson random variables, it follows that

Mn →d M,   (4.104)

where M is the Poisson process on (0,∞) with intensity θ x^{−1} dx. The convergence in (4.104) is in the sense of point processes: for every f ∈ Cb0((0,∞)), the set of bounded continuous functions on (0,∞) with bounded support, the random variable ∫ f dMn converges in distribution to ∫ f dM; see Daley and Vere-Jones (1988).

Observe that, for the random variable T0n that appears in the conditioning (4.6), we have

T0n/n = ∑_{i≤n} (i/n) Zi = ∫_{(0,1]} x Mn(dx).

We have seen that T0n/n →d Xθ; thus it follows that Xθ can be realized as

Xθ = ∫_{(0,1]} x M(dx).

The limit process M is fundamental. Unlike the Mn, which can have mass two or more at a single point, the process M has an intensity measure which is absolutely continuous with respect to Lebesgue measure. Hence it can be considered as a random discrete subset of (0,∞). Its points can be labelled τi, i ∈ ZZ, with

0 < · · · < τ2 < τ1 < 1 < τ0 < τ_{−1} < τ_{−2} < · · · < ∞   (4.105)

almost surely. In terms of the labelling (4.105),

Xθ = ∑_{i>0} τi

is the sum of the locations of all points of M in (0,1).

For each θ, the random measure M is scale invariant, meaning that, for any c > 0, the random set {cτi : i ∈ ZZ} has the same distribution as the set {τi : i ∈ ZZ}. The easiest way to get comfortable with the scale invariant Poisson process is to start with the ordinary, translation invariant Poisson process on (−∞,∞) with intensity θ. This latter process has the property that the number of points in an interval (a, b) has a Poisson distribution with mean θ(b − a), with independent numbers of points for disjoint intervals. The points of the translation invariant Poisson process can be labelled Ti for i ∈ ZZ with

−∞ < · · · < T_{−2} < T_{−1} < T0 < 0 < T1 < T2 < · · · < ∞

almost surely. Starting from the Ti, the scale invariant Poisson process, with the specific labelling (4.105), can be constructed by setting τi = exp(−Ti) for all i ∈ ZZ.

For the reader who wants a concrete handle on the scale invariant Poisson process, the following should be helpful. For the special case θ = 1, the arrival points T1, T2, . . . of the translation invariant Poisson process, restricted to (0,∞), can be constructed as Tk = W1 + · · · + Wk. Here, the inter-arrival times Wi are independent, and exponentially distributed with mean one. Thus, defining Ui = exp(−Wi), the Ui are independent and uniformly distributed over [0, 1]. Thus the points of the scale invariant Poisson process restricted to (0, 1), i.e. τ1, τ2, . . ., have been constructed as products of independent uniforms:

τk = U1 U2 · · · Uk.   (4.106)
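Representation (4.106) makes the restriction of the process to (0, 1) easy to simulate. The sketch below (θ = 1 generalized via the 1/θ power; the truncation threshold, interval and replication count are arbitrary choices, not from the text) checks that the expected number of points in (a, b) is θ log(b/a):

```python
import math, random

def points_in_01(theta, floor=1e-9, rng=random):
    """tau_k = (U_1 U_2 ... U_k)^(1/theta): the points of the scale invariant
    Poisson process in (0, 1), truncated once they drop below `floor`."""
    pts, t = [], 1.0
    while True:
        t *= rng.random() ** (1.0 / theta)
        if t < floor:
            return pts
        pts.append(t)

random.seed(2)
theta, a, b, reps = 1.0, 0.1, 0.5, 10000
avg = sum(sum(a < p < b for p in points_in_01(theta)) for _ in range(reps)) / reps
print(abs(avg - theta * math.log(b / a)) < 0.1)  # mean count over (a,b) is theta*log(b/a)
```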

To get the case for general θ from the standard θ = 1 case, divide the inter-arrival times Wi by θ, which changes them to exponentials with mean 1/θ, and take the 1/θ power of the uniforms Ui, which changes them to random variables Di having density θx^{θ−1}, 0 < x < 1. Thus

Xθ = U1^{1/θ} + U1^{1/θ} U2^{1/θ} + U1^{1/θ} U2^{1/θ} U3^{1/θ} + · · ·
   = D1 + D1 D2 + D1 D2 D3 + · · · ,

where the Di are independent with common distribution Beta(θ, 1). This approach provides another representation of PD(θ) (Theorem 4.32).
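Since the intensity is θ/x, IEXθ = ∫_0^1 x (θ/x) dx = θ and Var Xθ = ∫_0^1 x² (θ/x) dx = θ/2, so the series of Beta(θ, 1) products can be checked by a short Monte Carlo run (a sketch; the truncation level and sample size are arbitrary choices):

```python
import random

def x_theta(theta, floor=1e-12, rng=random):
    """X_theta = D_1 + D_1 D_2 + D_1 D_2 D_3 + ..., D_i = U_i^(1/theta) ~ Beta(theta, 1)."""
    total, prod = 0.0, 1.0
    while prod > floor:
        prod *= rng.random() ** (1.0 / theta)
        total += prod
    return total

random.seed(5)
theta, reps = 0.5, 20000
est = sum(x_theta(theta) for _ in range(reps)) / reps
print(abs(est - theta) < 0.05)  # IE X_theta = theta
```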

Theorem 4.32 For any θ > 0, let the scale invariant Poisson process on (0,∞), with intensity θ dx/x, have its points labelled so that (4.105) holds. Let (L1, L2, . . .) have the Poisson-Dirichlet distribution with parameter θ. Then

L((L1, L2, . . .)) = L((τ1, τ2, . . .) | Xθ = 1).   (4.107)

Proof. Let pθ denote the density function of Xθ, and let T(x) denote the sum of the locations of the points in (0, x], so that Xθ = T(1), T(x)/x has the same distribution as T(1), and T(x) is independent of the process M restricted to (x,∞). Let (x1, . . . , xk) satisfy x1 > · · · > xk > 0, and also x1 + · · · + xk < 1.

The joint density of (τ1, . . . , τk, Xθ) at (x1, . . . , xk, y) is therefore given by

exp(−∫_{x1}^1 θz^{−1} dz) (θ/x1) · · · exp(−∫_{xk}^{x_{k−1}} θz^{−1} dz) (θ/xk) f(y; x1, . . . , xk),   (4.108)

where f(· ; x1, . . . , xk) is the conditional density of T(1), given that the first points are at x1, . . . , xk. But

IP[T(1) ≤ y | τ1 = x1, . . . , τk = xk] = IP[T(xk−) + x1 + · · · + xk ≤ y]
    = IP[T(xk−) ≤ y − x1 − · · · − xk]
    = IP[T(1) ≤ (y − x1 − · · · − xk)/xk],

so that

f(y; x1, . . . , xk) = xk^{−1} pθ((y − x1 − · · · − xk)/xk).

Substituting into (4.108) and simplifying, it follows that the conditional density of (τ1, . . . , τk), given Xθ = 1, is

[θ^k/(x1 · · · xk)] xk^θ (1/xk) pθ((1 − x1 − · · · − xk)/xk) / pθ(1),

which reduces to the expression in (4.87). □

Here we give a final representation of PD(θ) that involves elementary conditioning.

Lemma 4.33 Let 1 > τ1 > τ2 > · · · be the points of M in (0, 1), and set τ = τ1 + τ2 + · · ·. Then

PD(θ) = L((τ1/τ, τ2/τ, . . .) | τ ≤ 1).

Proof. Using scale invariance and independence of M on disjoint intervals, the result in (4.107) shows that

L(L1, L2, . . .) = L(t^{−1}(τ1, τ2, . . .) | τ = t).

Mixing over the distribution of τ completes the proof. □

Further connections between the various representations of the PD(θ) law appear in Arratia, Barbour and Tavare (1999a).

We conclude this chapter with another proof of Lemma 4.10, which gives the density pθ^{(α)}(x) of the random variable Xθ^{(α)}. The formula (4.38) for the density pθ^{(α)}(x) can be understood directly in terms of the scale invariant Poisson process with intensity θ dx/x, so that each term has a probabilistic interpretation. In terms of the labelling (4.105), the random variable Xθ^{(α)} is realized as

Xθ^{(α)} = ∫_{(α,1]} x M(dx) = ∑_{i>0: τi>α} τi,

the sum of the locations of all the points of M in (α, 1).

Proof. Consider first the number K of points of M in (α, 1), so that IEK = ∫_α^1 θ x^{−1} dx = θ log(1/α), and K ∼ Po(θ log(1/α)). We then have

IP[Xθ^{(α)} = 0] = IP[K = 0] = exp(−θ log(1/α)) = α^θ,

accounting for (4.35). For α < x < 1,

IP[K = 1, τ1 ∈ (x, x + dx)] = α^θ (θ/x) dx,

giving the first term of (4.38); making the choice (θ/x) 1l{α ≤ x ≤ 1} in (4.38) just amounts to modifying a density at two single points, which we may.

For any 0 < α < x and k ≥ 2, the event {K = k, Xθ^{(α)} = x} is possible only if kα < x. If kα < x and K = k, and if τ1 = y1, . . . , τ_{k−1} = y_{k−1}, then we must have τk = yk, where yk is defined as a function of x, y1, . . . , y_{k−1} by yk = x − (y1 + · · · + y_{k−1}). For points (y1, . . . , y_{k−1}) ∈ Jk(α, x) which also satisfy y1 > · · · > y_{k−1} > yk, we have

IP[K = k; τi ∈ (yi, yi + dyi), i = 1, . . . , k − 1; Xθ^{(α)} ∈ (x, x + dx)]
    = IP[K = k; τi ∈ (yi, yi + dyi), i = 1, . . . , k − 1;
          τk ∈ (x − (τ1 + · · · + τ_{k−1}), x − (τ1 + · · · + τ_{k−1}) + dx)]
    = α^θ (θ dy1/y1) · · · (θ dy_{k−1}/y_{k−1}) (θ dx/yk)
    = α^θ θ^k [dy1 · · · dy_{k−1}/(y1 · · · y_{k−1}(x − (y1 + · · · + y_{k−1})))] dx.

This accounts for the integrand in (4.38); the factor of 1/k! arises from the correspondence between Jk, which consists of (y1, . . . , y_{k−1}, yk) without the restriction that y1 > · · · > yk, and the subset of Jk on which y1 > · · · > yk. □

Remark. The Poisson-Dirichlet and GEM distributions arise in the context of the two-parameter Ewens Sampling Formula of Pitman and Yor (1997). For more on this, and further connections with size-biasing and the Chinese Restaurant Process, see Pitman (2002).

5 Logarithmic combinatorial structures

In this chapter, we take the methods which were applied to the Ewens Sampling Formula in the previous chapter, and adapt them for use with more general logarithmic combinatorial structures. Thus we consider structures which satisfy the Conditioning Relation (3.1) for independent random variables Zi taking values in ZZ+, but which do not now necessarily satisfy Zi ∼ Po(θ/i), as in the previous chapter. Instead, we simply require the Logarithmic Condition (3.3):

i IP[Zi = 1] → θ;   i IEZi → θ   (5.1)

for some θ ∈ (0,∞), and thus also

θ = sup_{i≥1} i IEZi < ∞.   (5.2)

The main results that we extend to this setting are the limit theorems (3.4) and (3.5) for the small and largest components.

5.1 Results for general logarithmic structures

To achieve these extensions, we first prove some consequences of the Logarithmic Condition, which, although elementary, lie at the heart of the subsequent argument. Combining them with the analogue (LLT) of Lemma 4.13, stated in (5.6) below, the limit theorems (3.4) and (3.5) for the small and largest components follow immediately. However, establishing that (LLT) holds involves proving approximate analogues of the size biasing equation (4.51), and this makes more demands on the structure of the problem.

Lemma 5.1 Suppose the Zi satisfy the Logarithmic Condition. Then, as i → ∞,

IP[Zi ≥ 2] = o(i^{−1}),   (5.3)

and

IP[Zi = 0] = 1 − θi^{−1} + o(i^{−1}).   (5.4)

Proof. To establish (5.3),

IEZi = IP[Zi = 1] + ∑_{j≥2} j IP[Zi = j]
     ≥ IP[Zi = 1] + 2 ∑_{j≥2} IP[Zi = j]
     = IP[Zi = 1] + 2 IP[Zi ≥ 2].

Hence

0 ≤ 2i IP[Zi ≥ 2] ≤ i(IEZi − IP[Zi = 1]) → 0.

The relation (5.4) follows immediately. □

From this we see that, for large i, the distribution of Zi is close to Poisson with mean θ/i.

Corollary 5.2 Suppose the Zi satisfy the Logarithmic Condition, and let Z∗i be independent Poisson random variables with IEZ∗i = θ/i, i ≥ 1. Then

dTV(L(Zi), L(Z∗i)) ≤ ε(i) i^{−1},

where ε(i) ↓ 0 as i → ∞.

Proof. From the definition of total variation distance, we have

2 dTV(L(Zi), L(Z∗i)) = ∑_{j≥0} |IP[Zi = j] − IP[Z∗i = j]|
    ≤ |IP[Zi = 0] − IP[Z∗i = 0]| + |IP[Zi = 1] − IP[Z∗i = 1]| + IP[Zi ≥ 2] + IP[Z∗i ≥ 2].

The result now follows from Lemma 5.1. □
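As a concrete illustration of Corollary 5.2, take a hypothetical structure with Zi ∼ Bernoulli(θ/i) (an assumption made for this example, not a case from the text): the total variation distance to Po(θ/i) can be computed exactly, and i·dTV(L(Zi), L(Z∗i)) plays the role of ε(i), decreasing to 0.

```python
import math

def dtv_bernoulli_poisson(lam, kmax=40):
    """Total variation distance between Bernoulli(lam) and Poisson(lam)."""
    po = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(kmax)]
    be = [1.0 - lam, lam] + [0.0] * (kmax - 2)
    return 0.5 * sum(abs(p - q) for p, q in zip(be, po))

theta = 1.0
scaled = [i * dtv_bernoulli_poisson(theta / i) for i in (10, 100, 1000)]
print(all(x > y for x, y in zip(scaled, scaled[1:])))  # eps(i) decreases towards 0
```

Here i·dTV works out to θ(1 − e^{−θ/i}) ≈ θ²/i, consistent with the i^{−1} rate in the corollary.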

We make frequent use of

Theorem 5.3 Let Z1, Z2, . . . be independent random variables taking values in ZZ+ and satisfying the Logarithmic Condition. Then, if b = bn = o(n) as n → ∞, it follows that n^{−1}Tbn →d Xθ, where Tbn = Tbn(Z) is as defined in (3.12).

Proof. Let Z∗i be independent Poisson random variables with IEZ∗i = θ/i, and set T∗bn = Tbn(Z∗). From Corollary 5.2, dTV(L(Zi), L(Z∗i)) ≤ ε(i) i^{−1}. Choose any sequence b = bn = o(n) such that ε(b) log(n/b) → 0 as n → ∞. Then we immediately find that

dTV(L(Tbn), L(T∗bn)) ≤ dTV(L(Z[b + 1, n]), L(Z∗[b + 1, n])) ≤ ∑_{j=b+1}^n ε(j) j^{−1} ≤ ε(b) log(n/b),   (5.5)

and hence, since n^{−1}T∗bn →d Xθ by Theorem 4.6, it follows also that n^{−1}Tbn →d Xθ.

If b = bn = o(n) is an arbitrary sequence, let b′ = b′n ≥ bn satisfy b′n = o(n) and ε(b′) log(n/b′) → 0, and write

n^{−1}Tbn = n^{−1}Tbb′ + n^{−1}Tb′n.

Since IE(n^{−1}Tbb′) = n^{−1} ∑_{i=b+1}^{b′} i IEZi ≤ θ n^{−1} b′ → 0, it follows that n^{−1}Tbb′ →d 0, and n^{−1}Tb′n →d Xθ by the first part. □
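In the Poisson special case Zi ∼ Po(θ/i) of the previous chapter, IE(T0n/n) = n^{−1}∑_{i≤n} i(θ/i) = θ, matching IEXθ = θ, so Theorem 5.3 can be sanity-checked by simulation. This is a sketch with arbitrary n and replication count, not a computation from the text:

```python
import math, random

def poisson(lam, rng=random):
    """Knuth's multiplicative sampler; adequate for the small means used here."""
    target, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= target:
            return k
        k += 1

random.seed(9)
theta, n, reps = 1.0, 300, 2000
est = sum(sum(i * poisson(theta / i) for i in range(1, n + 1))
          for _ in range(reps)) / (n * reps)
print(abs(est - theta) < 0.1)  # IE(T0n / n) = theta
```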

In the case b/n → α ∈ (0, 1), we have

Theorem 5.4 Let Z1, Z2, . . . be independent random variables taking values in ZZ+ and satisfying the Logarithmic Condition. If b = bn ∼ αn as n → ∞ for some α ∈ (0, 1), then n^{−1}Tbn →d Xθ^{(α)}.

Proof. Just as in the proof of Theorem 5.3,

dTV(L(Tbn), L(T∗bn)) ≤ ε(b) log(n/b);

this is O(ε(b)) and so tends to zero. □

The distributional limit theorems 5.3 and 5.4 strongly suggest that their local analogues may also be true: that is, as in Lemmas 4.13 and 4.14, if m = mn is such that m/n → y ∈ (0,∞) and b = bn = o(n) as n → ∞, then

(LLT)   lim_{n→∞} n IP[Tbn = m] = pθ(y),   (5.6)

while, if b ∼ αn, 0 < α < 1, and m/n → y ∈ (0,∞), then

(LLTα)   lim_{n→∞} n IP[Tbn = m] = pθ^{(α)}(y),   y ∉ {α, 1},   (5.7)

with n IP[Tbn = m] − pθ^{(α)}(α) 1l{m > b} → 0 in the case where y = α, and with n IP[Tbn = m] ∼ pθ(m/n) when y = 1.

In the next section, we show that (LLT) and (LLTα) are both true for logarithmic assemblies, multisets and selections. Theorems 11.1 and 11.2 in fact establish that (LLT) and (LLTα) both hold for almost every logarithmic combinatorial structure, requiring only the minor additional uniformity condition (6.11). Note that, for y ∈ (0, α), (5.7) is immediate, since IP[Tbn ∈ {0} ∪ [b + 1,∞)] = 1.

For now, we shall assume that both (LLT) and (LLTα) hold, and investigate their immediate consequences. We show in particular that (LLT) is enough to establish the limiting approximations (3.4) and (3.5) for the small and largest components. Other immediate consequences of (LLT) and (LLTα) are the counterparts of Lemmas 4.21 and 4.22 and of Theorem 4.20. We do not, however, discuss limit theorems for the total number K0n of components at this level of generality. This is because the two conditions are enough to show that the behaviour of C(n)[1, b1(n)] is like that of Z[1, b1(n)] for some sequence b1(n) → ∞, and that C(n)[b2(n), n] behaves like ESF(θ) for some sequence b2(n) → ∞ such that b2(n) = o(n) as n → ∞. The possibility that there may be a large gap between b1(n) and b2(n) is not excluded without imposing some further condition. The results that we prove here concern only the very smallest and the very largest components, including the situation where the smallest components are actually large. However, since K0n is influenced by the whole range of component sizes, the discussion of its distribution is more complicated: we refer the reader to Theorems 7.3 and 7.21 for the analogs of (4.69) and (4.67).

We begin by showing that the limiting approximation (3.4) holds for the joint distribution of the small cycles.

Theorem 5.5 Let C(n) be a combinatorial model satisfying the Conditioning Relation (3.1) and the Logarithmic Condition (3.3). Assume also that (LLT) holds. Then, as n → ∞,

(C1(n), C2(n), . . .) →d (Z1, Z2, . . .)

in ZZ+^∞.

Proof. The proof follows that of Theorem 4.17, but using (LLT) in place of Lemma 4.13. □

Theorem 5.5 can be applied immediately to find the limit distribution of the rth smallest component size, Yr(n), using the familiar duality result that {Yr(n) > l} = {∑_{j=1}^l Cj(n) < r}. Hence, under the assumptions of Theorem 5.5, for each fixed r and l,

lim_{n→∞} IP[Yr(n) > l] = IP[∑_{j=1}^l Zj < r].

When the smallest component is required to be large, we make use of the asymptotic behavior of the quantity IP[Z1 = · · · = Zn = 0]. To simplify the discussion that follows, we assume that the combinatorial structure satisfies

χ = ∑_{j≥1} {− log IP[Zj = 0] − θ/j}   exists and is finite.   (5.8)

We then have the general analogue of Lemma 4.21.

Theorem 5.6 Let C(n) be a combinatorial model satisfying the Conditioning Relation (3.1), and suppose that (LLT) and (LLTα) hold for 0 < α < 1. If in addition (5.8) holds, then for fixed u > 1, as n → ∞,

IP[Y1(n) > n/u] ∼ n^{−θ} e^{−χ} Γ(θ) u^θ ωθ(u).

Proof. As in the proof of Lemma 4.21,

IP[Y1(n) > b] = IP[Z1 = · · · = Zb = 0] IP[Tbn = n]/IP[T0n = n].

Under assumption (5.8), we see that

IP[Z1 = · · · = Zb = 0] ∼ (u/n)^θ e^{−χ} e^{−γθ}   (5.9)

when n/b ∼ u. The asymptotics of IP[Tbn = n] are covered by (LLTα), those of IP[T0n = n] by (LLT). □

The analogue of Lemma 4.22 is also true, provided that u is not integral.

Theorem 5.7 Let C(n) be a combinatorial model satisfying the Conditioning Relation (3.1) and the Logarithmic Condition (3.3). Suppose further that (LLT) and (LLTα) hold for 0 < α < 1, and that (5.8) holds. Then, if b ∼ n/u for fixed u ∈ (2,∞) \ ZZ and n → ∞,

IP[Y1(n) = b] ∼ n^{−θ−1} e^{−χ} Γ(θ + 1) (u − 1)^{θ−1} u² ωθ(u − 1).

Proof. The proof follows closely that of Lemma 4.22. The first term on the right of (4.72) is

IP[C1(n) = · · · = C_{b−1}(n) = 0, Cb(n) = 1]
    = IP[Z1 = · · · = Z_{b−1} = 0, Zb = 1] IP[Tbn = n − b]/IP[T0n = n]
    = IP[Zb = 1] IP[Z1 = · · · = Z_{b−1} = 0] IP[Tbn = n − b]/IP[T0n = n]
    ∼ (θu/n) (u/n)^θ e^{−χ} e^{−γθ} [pθ^{(1/u)}(1 − 1/u)/n] / [e^{−γθ}/(nΓ(θ))],

using the Logarithmic Condition, (5.9), (LLT_{1/u}) and (LLT) respectively; in applying (LLT_{1/u}), note that 1 − 1/u ∉ {1/u, 1}, because 2 < u < ∞. Simplifying as before, we obtain

IP[C1(n) = · · · = C_{b−1}(n) = 0, Cb(n) = 1] ∼ e^{−χ} Γ(θ + 1) (u − 1)^{θ−1} u² ωθ(u − 1)/n^{θ+1}.

The second term on the right of (4.72) is

IP[C1(n) = · · · = C_{b−1}(n) = 0, Cb(n) ≥ 2]
    = (IP[Z1 = · · · = Z_{b−1} = 0]/IP[T0n = n]) ∑_{j=2}^{⌊n/b⌋} IP[Zb = j] IP[Tbn = n − jb]
    ≤ (IP[Z1 = · · · = Z_{b−1} = 0]/IP[T0n = n]) IP[Zb ≥ 2] max_{2≤j≤⌊n/b⌋} IP[Tbn = n − jb].

That this term is of order o(n^{−θ−1}) for u > 2, u ∉ ZZ, follows from (5.9) and the Logarithmic Condition (3.3), together with (LLT) applied to the probability IP[T0n = n] and (LLT_{1/u}) applied to each IP[Tbn = n − jb]; note that, since u ∉ ZZ,

n^{−1}(n − jb) → 1 − j/u ∉ {0, 1/u, 1}

for any j ≥ 1. □

Note that the need for conditions strengthening the Logarithmic Condition is already making itself felt. If integral u ≥ 3 are to be allowed, one needs to assume the additional condition that IP[Zi ≥ 3] = o(i^{−2}).

The next result provides a local limit theorem for the largest components, L1(n) ≥ L2(n) ≥ · · ·, analogous to that in (3.5).

Theorem 5.8 Let C(n) be a combinatorial model satisfying the Conditioning Relation (3.1) and the Logarithmic Condition (3.3). Assume also that (LLT) holds. For r ≥ 1, suppose that 0 < xr < x_{r−1} < · · · < x1 < 1 satisfy 0 < sr < 1 and xr^{−1}(1 − sr) ∉ ZZ+, where sr = ∑_{i=1}^r xi. Then, if the integers mi = mi(n) are such that n^{−1}mi → xi, 1 ≤ i ≤ r, it follows that

lim_{n→∞} n^r IP[Li(n) = mi, 1 ≤ i ≤ r] = fθ^{(r)}(x1, . . . , xr),

where the density fθ^{(r)} is given in (4.87). As a consequence,

n^{−1}(L1(n), L2(n), . . .) →d (L1, L2, . . .) ∼ PD(θ).

Proof. The proof mimics that of Theorem 4.26 down to (4.88), at which point (4.89) has to be replaced by

[IP[T_{0,mr−1} = n − m]/IP[T0n = n]] ∏_{i=mr}^n IP[Zi = 0] ∏_{j=1}^r IP[Z_{mj} = 1]/IP[Z_{mj} = 0];

here, m = ∑_{j=1}^r mj. From (LLT), the first term is asymptotic to xr^{−1} pθ((1 − sr)/xr)/pθ(1); then, from (5.4), the first product is asymptotic to xr^θ, while from assumption (5.1) the second product is asymptotic to n^{−r} θ^r x1^{−1} · · · xr^{−1}. Combining these terms and using the definition (4.87) of fθ^{(r)} shows that

lim_{n→∞} n^r IP[An(C(n)), C_{mr}(n) = 1] = fθ^{(r)}(x1, . . . , xr).

To show that ∑_{l≥2} IP[An(C(n)), C_{mr}(n) = l] = o(n^{−r}), note that the left side is

IP[An(Z)] ∑_{l≥2} IP[Z_{mr} = l] IP[T_{0,mr−1} = n − m − (l − 1)mr]/IP[T0n = n]
    ≤ (IP[An(Z)] IP[Z_{mr} ≥ 2]/IP[T0n = n]) × max_{2≤l≤⌊xr^{−1}(1−sr)⌋} IP[T_{0,mr−1} = n − m − (l − 1)mr],   (5.10)

for all n sufficiently large. Since

IP[An(Z)] ≤ IP[Z_{m1} = 1] · · · IP[Z_{m_{r−1}} = 1] = O(n^{−(r−1)})

and IP[Z_{mr} ≥ 2] = o(n^{−1}) from Lemma 5.1, it follows from (LLT) that the first factor in (5.10) is of order o(n^{−(r−1)}); the second is of order O(n^{−1}) by the (LLT), because of the assumptions on x1, . . . , xr. The weak convergence of n^{−1}(L1(n), L2(n), . . .) to (L1, L2, . . .) now follows by Scheffé's theorem. □

Remark. Hansen (1994) proves the weak convergence of n^{−1}(L1(n), L2(n), . . .) to PD(θ) for a variety of assemblies and multisets, using complex analytic methods. Her assumptions are rather more restrictive than ours. Both assemblies and multisets automatically satisfy the Conditioning Relation, and she restricts attention to a proper subset of those which fulfill the Logarithmic Condition, by requiring that an additional condition involving the analytic continuation of a generating function should be satisfied. In the next section, we show that the (LLT) holds for all assemblies and multisets satisfying the Logarithmic Condition, as well as for all selections, so that Theorem 5.8 and, in particular, the weak convergence of n^{−1}(L1(n), L2(n), . . .) to PD(θ), hold for all such structures. Further results appear in Chapter 7.2.

We conclude this section with a discussion of the distribution of the component sizes of a logarithmic structure known to have a given number of components. This provides the generalization of Theorem 4.20 to the general logarithmic class. We assume only the Conditioning Relation (3.1) and the Logarithmic Condition (3.3).

Our first task is to identify the correct analog of Landau's formula (1.47) for primes and its counterpart (1.29) for permutations, when we have a more general logarithmic structure. This requires consideration of the minimum achievable values for the random variables Zi. For i = 1, 2, . . . let

li = min{j : IP[Zi = j] > 0},

so that 0 ≤ li < ∞. The Logarithmic Condition implies that

N0 = min{n ≥ 0 : ∀i > n, IP[Zi = 0] > 0 and IP[Zi = 1] > 0}

is finite. Thus

l0 = ∑_{i≥1} li < ∞;   t0 = ∑_{i≥1} i li < ∞.

Note that for assemblies, multisets, and selections, l1 = l2 = · · · = 0 and hence l0 = t0 = 0.

If n is large and a random permutation is constrained to have only some fixed number k of cycles, we have seen that all of these cycles are quite large — typically of sizes of orders n^{a1}, . . . , n^{a_{k−1}} and n, where a1, . . . , a_{k−1} is a random (k − 1)–sample from U[0, 1]. Here, however, since Ci(n) ≥ li a.s. for each i ≥ 1, any instance of large enough size must have a minimum of l0 small components, of total weight at least t0, corresponding to having Ci(n) = li for each i, 1 ≤ i ≤ N0. Thus the minimal number of components possible for an instance of large size n is l0 + 1, made up of li components of size i, 1 ≤ i ≤ N0, and one remaining component of size n − t0, which is certainly a possible instance provided that n − t0 > N0. We shall prove that, but for these unavoidable small components, the remaining components have sizes of orders similar to those of a random permutation: an instance with k + l0 components, for k fixed and as n → ∞, has li small components of size i, 1 ≤ i ≤ N0, and the remaining k components are of sizes of orders n^{a1}, . . . , n^{a_{k−1}} and n, as above. This is the substance of Theorem 5.9.

Before proceeding to the theorem, we consider the asymptotics of the probability IP[K0n = l0 + 1] that the number of components is equal to its smallest possible value. Now it follows directly from the Conditioning Relation, for n > N0 + t0, that

IP[K0n = l0 + 1] = (IP[Z_{n−t0} = 1]/IP[Z_{n−t0} = 0]) (∏_{1≤i≤n} IP[Zi = li]/IP[T0n = n]).   (5.11)

Thus the asymptotics for IP[K0n = l0 + 1] depend to some extent on the structure, unlike the universal behavior proved below in (5.13) and (5.15). Using only the Logarithmic Condition, the first fraction simplifies:

IP[Z_{n−t0} = 1]/IP[Z_{n−t0} = 0] ∼ θ/n.

Under the (LLT) (5.6), the factor on the bottom of the second fraction also simplifies:

IP[T0n = n] ∼ pθ(1)/n = 1/(e^{γθ} Γ(θ) n).

In Chapter 5.2 we show that for assemblies, multisets, and selections, the Logarithmic Condition (3.3) already implies the (LLT). Condition (5.8) generalizes to the condition that

χ0 = ∑_{i≥1} {− log IP[Zi = li] − θ/i}

exists and is finite, which implies the following asymptotics for the product:

∏_{1≤i≤n} IP[Zi = li] ∼ e^{−χ0} e^{−γθ} n^{−θ}.

Thus, when the local limit theorem holds, and χ0 exists and is finite, we have the net result that

IP[K0n = l0 + 1] ∼ θ Γ(θ) e^{−χ0} n^{−θ},   (5.12)

depending on the structure through l0 and χ0. See Hwang (1994, p. 116) for a result which agrees with this, in those cases which are common to both discussions.

For assemblies, multisets and selections, l0 = 0 and

IP[K0n = l0 + 1] = IP[K0n = 1] = mn/p(n),

for which asymptotics are often known directly. For example, random mappings form the assembly with

mn = (n − 1)! ∑_{j<n} n^j/j! ∼ n! e^n/(2n),

p(n) = n^n and θ = 1/2, so that mn/n^n ∼ √(2πn)/(2n), and comparison with (5.12) implies that √(2π) = Γ(1/2) e^{−χ0}, so that χ0 = −(ln 2)/2. Polynomials over GF(q) form the multiset with p(n) = q^n, mn ∼ q^n/n and θ = 1, so comparison with (5.12) implies that χ0 = 0. Squarefree polynomials over GF(q) form the selection with the same mn and θ as in the previous example, but with p(n) reduced by a factor which is the probability (un, say) that a random polynomial of degree n is squarefree, so that now χ0 = lim_n(log un) ∈ (−∞, 0), a limit which always exists, but varies with the choice of q. For more about the probability of having a single component, see Bell et al. (2000).
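For polynomials over GF(q), mn is the number of monic irreducible polynomials of degree n, given by the classical formula mn = n^{−1} ∑_{d|n} μ(d) q^{n/d} (standard background, not a display from the text); this makes mn ∼ q^n/n, and hence θ = 1, easy to verify numerically:

```python
def mobius(m):
    """Mobius function by trial division."""
    result, d = 1, 2
    while d * d <= m:
        if m % d == 0:
            m //= d
            if m % d == 0:
                return 0            # repeated prime factor
            result = -result
        d += 1
    return -result if m > 1 else result

def num_irreducible(q, n):
    """Monic irreducible polynomials of degree n over GF(q)."""
    return sum(mobius(d) * q ** (n // d) for d in range(1, n + 1) if n % d == 0) // n

print([num_irreducible(2, n) for n in (1, 2, 3, 4, 5)])  # [2, 1, 2, 3, 6]
```

For q = 2, the ratio n·mn/2^n is already about 0.999 at n = 20, illustrating i IEZi → θ = 1.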

Theorem 5.9 For fixed k ≥ 1, as n → ∞,

IP[K0n = l0 + k]/IP[K0n = l0 + 1] ∼ (θ log n)^{k−1}/(k − 1)!,   (5.13)

where IP[K0n = l0 + 1] is given exactly by (5.11) and asymptotics are typically given by (5.12).

Let Yj(n) be the size of the jth smallest component. For the conditional distribution of (Y1(n), Y2(n), . . . , Y_{l0+k}(n)) given that K0n = l0 + k, there is a deterministic limit for the l0 smallest components, and the logarithms to base n of the next k − 1 smallest components converge to a sample of k − 1 uniform [0,1] random variables:

IP[Ci(n) = li for 1 ≤ i ≤ N0 | K0n = l0 + k] → 1,   (5.14)

and, conditionally on K0n = l0 + k,

(log Y_{1+l0}(n)/log n, . . . , log Y_{k−1+l0}(n)/log n) →d (U[1], U[2], . . . , U[k−1]),   (5.15)

where U[j] is the jth smallest of k − 1 independent random variables distributed uniformly in (0,1).

Proof. Assume throughout this proof that n > N0 + t0, and fix k ≥ 1. Write An,k for the good event that Ci(n) ∈ {li, li + 1} for 1 ≤ i ≤ n, with exactly k indices i1, . . . , ik where the value li + 1 is achieved, and note that, since n > N0, we have An,k ⊂ {K0n = l0 + k}. Observe also that there is equality when k = 1: An,1 = {K0n = l0 + 1}. We first show that

IP[An,k]/IP[K0n = l0 + 1] ∼ (θ log n)^{k−1}/(k − 1)!,

and then we show that

IP[{K0n = l0 + k} \ An,k]/IP[K0n = l0 + 1] = O((log n)^{k−2})

to conclude that (5.13) holds.

For indices i1, . . . , ik specifying events belonging to An,k, it is necessary that

ik = n − t0 − (i1 + · · · + i_{k−1}).   (5.16)

In the following argument, unions and sums indexed by i = (i1, . . . , i_{k−1}) will always be taken under the assumption that

1 ≤ i1 < i2 < · · · < i_{k−1} < ik,   (5.17)

with ik as above; in the case k = 1 there is exactly one value of i satisfying this — a null tuple, for which (5.16) then specifies i1 = n − t0.

Given i satisfying (5.17), let

Gi = {C_{i1}(n) = l_{i1} + 1, . . . , C_{ik}(n) = l_{ik} + 1; K0n = l0 + k}.

It follows from ∑_{i=1}^n i Ci(n) = n together with (5.16) and the indexing convention (5.17) that

An,k = ⋃_i Gi,

and it then follows from the Conditioning Relation (3.1) that

IP[An,k] = [∏_{1≤i≤n} IP[Zi = li]/IP[T0n = n]] ∑_i ∏_{1≤j≤k} r(ij),

where, for i ≥ 1,

r(i) = IP[Zi = li + 1]/IP[Zi = li] ∈ [0,∞),   (5.18)

and, by the Logarithmic Condition,

r(i) ∼ θ/i as i → ∞,   (5.19)

since li = 0 and r(i) = IP[Zi = 1]/IP[Zi = 0] for sufficiently large i. In particular, for k = 1, there is only one term in the i–sum, and the asymptotics are given by the Logarithmic Condition:

IP[K0n = l0 + 1] = IP[An,1] = [∏_{1≤i≤n} IP[Zi = li]/IP[T0n = n]] r(n − t0)
    ∼ [∏_{1≤i≤n} IP[Zi = li]/IP[T0n = n]] (θ/n).

Thus showing that IP[An,k]/IP[K0n = l0 + 1] ∼ (θ log n)^{k−1}/(k − 1)! is equivalent to showing that

∑_i ∏_{1≤j≤k} r(ij) ∼ [(θ log n)^{k−1}/(k − 1)!] (θ/n)   (5.20)

for fixed k > 1 as n → ∞.

for fixed k > 1 as n→∞.To conclude (5.20) from (5.19) is direct, as follows. Take m = m(n) to

satisfy m = o(n), logm ∼ log n and n/m = o(√

log n); for example, we canuse m = n(log n)−1/3. First we have an asymptotic lower bound∑

i

∏1≤j≤k

r(ij) ≥∑

i: ik−1<m

∏1≤j≤k

r(ij) (5.21)

∼ θ

n

∑i: ik−1<m

∏1≤j≤k−1

r(ij) (5.22)

∼ θ

n

(∑i<m

r(i)

)k−11

(k − 1)!(5.23)

∼ θ

n

(θ log n)k−1

(k − 1)!(5.24)

where (5.22) is justified using (5.19) and m = o(n), so that ik ∼ n uniformly over all terms in the sum; (5.23) holds because

∑_{i1<···<ik−1<m} r(i1) · · · r(ik−1) ∼ (r(1) + · · · + r(m − 1))^{k−1}/(k − 1)!

as m → ∞ whenever sup_{i≥1} r(i) < ∞ and ∑_{i≥1} r(i) = ∞; and (5.24) holds by (5.19) again, together with log m ∼ log n. The difference between the two sides of (5.21) is

∑_{i: ik−1≥m} ∏_{1≤j≤k} r(ij) ≤ (max_{i≥m} r(i))^2 n ∑_{i1<···<ik−2<n} ∏_{1≤j≤k−2} r(ij)
∼ (θ/m)^2 n (θ log n)^{k−2}/(k − 2)!,

and this last expression is o((log n)^{k−1}/n), using n/m = o(√(log n)). Thus we have proved (5.20).
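The symmetric-function estimate invoked at (5.23) is easy to watch numerically. The sketch below is our own illustration, with r(i) = θ/i and θ = 1: it compares the ordered (k − 1)-fold sum with (r(1) + · · · + r(m − 1))^{k−1}/(k − 1)!, and the ratio climbs slowly towards 1 as m grows.

```python
import math
from itertools import combinations

def ratio(m, k=3, theta=1.0):
    vals = [theta / i for i in range(1, m)]          # r(i) = theta/i
    sym = sum(math.prod(c) for c in combinations(vals, k - 1))
    return sym / (sum(vals) ** (k - 1) / math.factorial(k - 1))

print([round(ratio(m), 4) for m in (50, 200, 800)])  # climbs slowly towards 1
```

The slow convergence reflects the fact that the correction is of relative order (∑ r(i)²)/(∑ r(i))², i.e. O(1/log m) here.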

The argument that

IP[{K0n = l0 + k} \ An,k]/IP[K0n = l0 + 1] = O((log n)^{k−2})

is essentially the same as the above argument, in the following sense. Recall that a "composition of k into m parts" is an ordered m-tuple (e1, . . . , em) of positive integers with e1 + · · · + em = k; there are (k−1 choose m−1) such compositions, and a total of 2^{k−1} compositions of k, not restricting the number of parts. The event {K0n = l0 + k} is partitioned naturally into 2^{k−1} subevents, indexed by the compositions of k. The subevent A(e1,...,em) indexed by (e1, . . . , em) is the event that K0n = l0 + k and there are indices

1 ≤ i1 < · · · < im ≤ n

for which Cij(n) = lij + ej for j = 1 to m, and the good event An,k is the subevent indexed by the composition with m = k and e1 = · · · = ek = 1. Observing that sup_{e≥1} IP[Zi = li + e]/IP[Zi = li] ∼ θ/i can be used in place of (5.19), the first part of the proof essentially shows that, for any composition of k into m parts,

IP[A(e1,...,em)]/IP[K0n = l0 + 1] = O((log n)^{m−1}).

Adding these bounds over the 2^{k−1} − 1 subevents corresponding to compositions with m ≤ k − 1 yields the desired bound.

We next show (5.14) together with (5.15), using a straightforward generalization of the proof of Theorem 4.20. Let

0 < a1 < b1 ≤ a2 < b2 ≤ · · · ≤ ak−1 < bk−1 < 1.

Let En be the event

{K0n = l0 + k; Ci(n) = li, 1 ≤ i ≤ N0; Yl0+j(n) ∈ (n^{aj}, n^{bj}), 1 ≤ j ≤ k − 1},

specifying the sizes of the l0 + k − 1 smallest components. Both (5.14) and (5.15) will follow if we show that, as n → ∞,

IP[En]/IP[K0n = l0 + k] → (k − 1)! ∏_{1≤j≤k−1} (bj − aj). (5.25)


For i1 > N0, the event Gi defined following (5.17) is such that

Gi = {K0n = l0 + k, Ci(n) = li, 1 ≤ i ≤ N0, Ci1(n) = · · · = Cik(n) = 1}.

Once n is so large that t0 + k n^{bk−1} < n, we find that En is the union of the disjoint events Gi over all i1, . . . , ik−1 with ij ∈ (n^{aj}, n^{bj}) for j = 1 to k − 1. Using ∑′_i to denote a sum over this same range of i1, . . . , ik−1, it follows that, for sufficiently large n,

IP[En] = (∏_{1≤i≤n} IP[Zi = li] / IP[T0n = n]) ∑′_i ∏_{1≤j≤k} r(ij).

Since bk−1 < 1, and recalling that ik = n − t0 − (i1 + · · · + ik−1), we have ik ∼ n uniformly over the range of summation; and using a1 > 0 and (5.19) we have ∏_{1≤j≤k−1} r(ij) ∼ ∏_{1≤j≤k−1} (θ/ij), again uniformly. Thus

∑′_i ∏_{1≤j≤k} r(ij) ∼ (θ/n) ∑′_i θ^{k−1}/(i1 i2 · · · ik−1)
∼ (θ/n) ∏_{1≤j≤k−1} ∑_{n^{aj}<m<n^{bj}} (θ/m)
∼ (θ/n) ∏_{1≤j≤k−1} (bj − aj) θ log n.

Combining the above with (5.13) and (5.11) yields (5.25), as required. □

5.2 Verifying the local limit conditions

We begin the section with two lemmas which show that probabilities of the form IP[Tbn = k] are typically small. The bounds are very crude, and in no way comparable to the limiting results of (LLT) and (LLTα), but they prove to be useful tools when establishing the stronger results. Sharper bounds are established in Corollary 9.3.

Lemma 5.10 Let Z1, Z2, . . . be independent random variables taking values in ZZ+ and satisfying the Logarithmic Condition. If b = bn = o(n) as n → ∞, then max_{k≥0} IP[Tbn = k] → 0.

Proof. Given the sequence b = bn with b = o(n) as n → ∞, choose b′ = b′n ≥ bn so that b′ = o(n) and ε(b′) log(n/b′) → 0 as n → ∞, where ε(·) is determined in Corollary 5.2. Then, since Tbn = Tbb′ + Tb′n and the two summands are independent,

max_{k≥0} IP[Tbn = k] ≤ max_{k≥0} IP[Tb′n = k].


Now, from the previous argument,

IP[Tb′n = k] ≤ IP[T∗b′n = k] + ε(b′) log(n/b′),

and, by Lemma 4.12 (i) and (ii),

max_{k≥0} IP[T∗b′n = k] ≤ e^{−θ(h(n+1)−h(b′+1))} = O((b′/n)^θ).

Hence max_{k≥0} IP[Tbn = k] → 0 as n → ∞. □
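In the simplest logarithmic case, Zj ∼ Po(θ/j), the decay of max_k IP[Tbn = k] can be observed directly: the point probabilities are computable exactly via the compound Poisson recursion k IP[T0n = k] = ∑_{j≤min(k,n)} jλj IP[T0n = k − j], a form of the size biasing equation used repeatedly below. A sketch, with θ = 1 and b = 0 purely for illustration:

```python
import math

def max_point_prob(n, theta=1.0):
    # IP[T0n = k] for T0n = sum_{j=1}^n j*Z_j, Z_j ~ Po(theta/j) independent,
    # computed exactly via the recursion k*P(k) = sum_j (j*lambda_j)*P(k-j).
    lam = [theta / j for j in range(1, n + 1)]
    p = [math.exp(-sum(lam))]
    for k in range(1, 2 * n + 1):
        p.append(sum((j + 1) * lam[j] * p[k - j - 1]
                     for j in range(min(k, n))) / k)
    return max(p)

for n in (20, 80, 320):
    print(n, max_point_prob(n))   # decays roughly like 1/n
```

The observed decay like 1/n is consistent with the limiting density pθ of n^{−1}T0n appearing in Theorem 5.3.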

If the sequence bn grows faster with n, the probability IP[Tbn = 0] is not typically small; hence the next lemma has a slightly different statement.

Lemma 5.11 Let Z1, Z2, . . . be independent random variables taking values in ZZ+ and satisfying the Logarithmic Condition. If b = bn ∼ αn for some α ∈ (0, 1) as n → ∞, then max_{k≥1} IP[Tbn = k] → 0.

Proof. From (5.5) and the inequality in (4.49), and because IP[T∗bn = k] = 0, k = 1, . . . , b, we see that

IP[Tbn = k] ≤ IP[T∗bn = k] + ε(b) log(n/b) ≤ θ/b + ε(b) log(n/b).

Since b ∼ αn, the proof is complete. □

Equipped with these two lemmas, we now turn to the main object of the section, proving that the local limit conditions hold for assemblies, multisets and selections.

Assemblies

Random assemblies are decomposable combinatorial structures for which the counts Cj(n) of components of size j satisfy the conditioning relation (4.6) for Poisson distributed Zj with means

IEZj = λj = mj x^j / j!, for some x > 0.

In these models, the mj are prescribed in advance, and the probabilities

IP[C(n) = a] = IP[Z[1, n] = a | T0n = n]
= IP[Z[1, n] = a] 1l{∑_{j=1}^n j aj = n} / IP[T0n = n]
= ∏_{i=1}^n [(mi x^i/i!)^{ai}/ai!] / ∑_{a′: ∑_j j a′j = n} ∏_{i=1}^n [(mi x^i/i!)^{a′i}/a′i!]

are the same for any value of x > 0. Hence, to be in the logarithmic class, it is enough that mj ∼ θ(j − 1)! y^j for some y > 0, since we can then take x = y^{−1}, so that jλj → θ and the Logarithmic Condition is satisfied.
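As a concrete check, for permutations, the prototype assembly, one has mj = (j − 1)! (the number of j-cycles on a j-set), and taking x = 1 gives λj = 1/j, so that jλj = 1 = θ for every j. A minimal sketch:

```python
import math

# Permutations as an assembly: m_j = (j-1)! cycles of size j; with x = 1,
# lambda_j = m_j * x**j / j! = 1/j, so j*lambda_j = 1 = theta for all j.
def lam(j, x=1.0):
    return math.factorial(j - 1) * x ** j / math.factorial(j)

for j in (1, 5, 50):
    print(j, j * lam(j))   # equal to 1 up to floating-point rounding
```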

Theorem 5.12 Both (LLT) and (LLTα) hold for all assemblies satisfying the Logarithmic Condition.

Proof. For (LLT), the proof depends on applying the size biasing equation (4.14) for the density of Tbn. It here takes the form

k IP[Tbn = k] = ∑_{j=b+1}^n IP[Tbn = k − j] jλj, k = 0, 1, . . . . (5.26)

The difference between this equation and Equation (4.51) is small for large n, because jλj → θ. To verify that the local limit law (LLT) for Tbn indeed holds, we proceed as follows. According to equation (5.26),

k IP[Tbn = k] = ∑_{j=b+1}^n IP[Tbn = k − j] jλj = θ IP[k − n ≤ Tbn < k − b] + rn(k), (5.27)

where

rn(k) = ∑_{j=b+1}^n IP[Tbn = k − j] (jλj − θ). (5.28)

The argument now concludes in the same way as that of Lemma 4.13, but using Theorem 5.3 instead of Theorem 4.6, provided that we can show that |rn(k)| → 0 as n → ∞ when k/n → y > 0. To do this, let ε > 0 be arbitrary, and choose j0 = j0(ε) such that |jλj − θ| < ε for all j > j0. Then, for n > j0,

|rn(k)| ≤ ∑_{j=1}^{j0} IP[Tbn = k − j] |jλj − θ| + ε ∑_{j>j0} IP[Tbn = k − j]
≤ max_{j≥1} |jλj − θ| IP[k − j0 ≤ Tbn ≤ k − 1] + ε.

Hence

lim sup_{n→∞} |rn(k)| ≤ max_{j≥1} |jλj − θ| lim sup_{n→∞} sup_{x>0} IP[x − j0/n ≤ n^{−1}Tbn < x] + ε = ε,

because Tbn/n converges in distribution to Xθ, a random variable with continuous distribution function.

For (LLTα), we start from (5.27), but provide a different analysis of the error term rn(k) in (5.28). Let ε > 0 be given, and choose j0 such that |jλj − θ| < ε for all j > j0. Then, for b > j0,

|rn(k)| ≤ ∑_{j=b+1}^n IP[Tbn = k − j] ε ≤ ε.

The proof is then completed just as for Lemma 4.14, using Theorem 5.4 in place of Theorem 4.9. □
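Read forwards, the size biasing equation (5.26) is also an exact algorithm: starting from IP[Tbn = 0] = exp(−∑_{j=b+1}^n λj), it yields the point probabilities of Tbn one at a time. A sketch, with λj = θ/j chosen purely for illustration:

```python
import math

def tbn_probs(b, n, theta, kmax):
    """IP[Tbn = k], k = 0..kmax, for Tbn = sum_{j=b+1}^n j*Z_j with
    independent Z_j ~ Po(lambda_j), computed via the recursion (5.26)."""
    lam = {j: theta / j for j in range(b + 1, n + 1)}
    p = [math.exp(-sum(lam.values()))]       # IP[Tbn = 0]
    for k in range(1, kmax + 1):
        p.append(sum(j * lam[j] * p[k - j] for j in lam if j <= k) / k)
    return p

p = tbn_probs(0, 8, 1.0, 60)
print(sum(p) > 0.9999)   # essentially all the mass of T_{0,8} lies below 60
```

The recursion costs O(kmax · n) arithmetic operations, against the exponential cost of enumerating the joint law of the Zj.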

Remark. In the case of assemblies, IP[Zi = 0] = e^{−λi}, so the quantity χ defined in (5.8) reduces to

χ = ∑_{j≥1} (λj − θ/j).

Multisets

For combinatorial multisets, the random variables Zi have negative binomial distributions NB(mi, x^i), with

IP[Zi = k] = (mi + k − 1 choose k) (1 − x^i)^{mi} x^{ik}, k = 0, 1, . . . ,

for any x ∈ (0, 1); once again, the mi are prescribed in the structure, and the joint distribution of the component sizes is the same for any choice of x. Thus we have

IEZi = mi x^i/(1 − x^i) and Var Zi = mi x^i/(1 − x^i)^2,

and the logarithmic class consists of those structures for which

mi ∼ θ y^i/i for some y > 1, θ ∈ (0,∞),

since we then take x = y^{−1}. We record that then

lim_{i→∞} i IEZi = lim_{i→∞} i mi x^i = θ. (5.29)

Theorem 5.13 Both (LLT) and (LLTα) hold for all multisets satisfying the Logarithmic Condition.

Proof. For the (LLT), the recursion analogous to (4.15) and (5.26) for the distribution of Tbn is, from Arratia and Tavaré (1994),

k IP[Tbn = k] = ∑_{j=b+1}^k gn(j) IP[Tbn = k − j], (5.30)


where

gn(j) = x^j ∑_{b+1≤l≤n: l|j} l ml. (5.31)

This is already substantially more complicated than (5.26). However, we note that, for j ≤ n,

gn(j) = g(j) = x^j ∑_{b+1≤l≤j: l|j} l ml = j mj x^j + O(x^{j/2}),

and hence that

lim_{j→∞} g(j) = θ; (5.32)

thus also

∑_{j=b+1}^k gn(j) IP[Tbn = k − j] = ∑_{j=b+1}^n g(j) IP[Tbn = k − j] (5.33)
+ 1l{k > n} ∑_{j=n+1}^k gn(j) IP[Tbn = k − j].

Now, for j > n, we have

gn(j) = x^j ∑_{b+1≤l≤n: l|j} l ml ≤ x^j ∑_{1≤l≤n} l ml
= x^{j−n} ∑_{1≤l≤n} (l ml x^l) x^{n−l} ≤ θ̄ x^{j−n} ∑_{r=0}^{n−1} x^r ≤ θ̄ x^{j−n}/(1 − x), (5.34)

where

θ̄ = sup_{j≥1} j mj x^j < ∞

under the Logarithmic Condition. Applying Lemma 5.10 when k > n and using (5.34) thus shows that, as n → ∞,

∑_{i=n+1}^k gn(i) IP[Tbn = k − i] ≤ ∑_{i=n+1}^k (θ̄ x^{i−n}/(1 − x)) IP[Tbn = k − i]
≤ (θ̄ x/(1 − x)^2) max_{l≥0} IP[Tbn = l] = o(1), (5.35)

uniformly in k > n, and hence

k IP[Tbn = k] = ∑_{i=b+1}^n g(i) IP[Tbn = k − i] + o(1), (5.36)


uniformly in k ≥ 0. The method of proof used for assemblies, together with (5.32) and (5.36), now shows that

k IP[Tbn = k] = θ IP[k − n ≤ Tbn < k − b] + rn(k),

where rn(k) → 0 as n → ∞, uniformly in k, and the result, as for assemblies, now follows from Theorem 5.3.

In the case b ∼ αn, we need a bound different from (5.35), because the quantity IP[Tbn = 0] is substantial. Noting that, for k > n,

gn(k) = x^k ∑_{j=b+1}^n j mj 1l{j|k} ≤ x^k ∑_{j=1}^{k/2} j mj ≤ x^k θ̄ ∑_{j=1}^{k/2} x^{−j} ≤ (θ̄/(1 − x)) x^{n/2},

we see that

gn(k) IP[Tbn = 0] ≤ θ̄ (1 − x)^{−1} x^{n/2}, k > n.

For the remaining terms, we apply Lemma 5.11 for k > n to get

∑_{i=n+1}^{k−1} gn(i) IP[Tbn = k − i] ≤ (max_{l≥1} IP[Tbn = l]) θ̄ x/(1 − x)^2 = o(1),

uniformly in k > n. It follows that (5.36) holds with the remainder uniformly small in k ≥ 0, and the proof is completed as for the previous part, using Theorem 5.4. □

Remark. In the case of multisets, IP[Zi = 0] = (1 − x^i)^{mi}, so the quantity χ defined in (5.8) reduces to

χ = ∑_{j≥1} (−mj log(1 − x^j) − θ/j);

if ∑_{j≥1} (mj x^j − θ/j) exists and is finite, the same is true for χ.

Selections

The next case we consider is that of combinatorial selections, for which the Zj are binomially distributed, with

IP[Zi = k] = (mi choose k) (x^i/(1 + x^i))^k (1/(1 + x^i))^{mi−k}, k = 0, 1, . . . , mi,

for any x > 0. Once more, the assumption that

mi ∼ θ y^i/i for some y > 1

is the crucial one. In this case, we take x = y^{−1} ∈ (0, 1), and the Logarithmic Condition holds.

Theorem 5.14 Both (LLT) and (LLTα) hold for all selections satisfying the Logarithmic Condition.

Proof. For (LLT), the method of proof is as before, but is this time based on the recurrence in (5.30), where

gn(j) = x^j ∑_{b+1≤l≤n: l|j} (−1)^{(j/l)−1} l ml; (5.37)

see Arratia and Tavaré (1994). The steps that lead to (5.36) follow immediately, with appropriate modification for the alternating signs in the sum defining gn(j). □

General combinatorial structures

Now suppose that the Zi are arbitrary ZZ+-valued random variables with means IEZi = λi, satisfying the Logarithmic Condition. In those combinatorial settings that we are aware of (for example Hansen and Schmutz (1993)), Zj can be decomposed into the sum of mj i.i.d. random variables Yj1, . . . , Yjmj, each with probability generating function φj(s) and mean

IEYj1 = yj,

with the yj eventually decreasing, and such that

j IEZj = j mj yj → θ ∈ (0,∞).

The previous proofs made use of Theorems 5.3 and 5.4 and Lemma 5.10, which apply quite generally, together with the size biasing equation (4.51) for the point probabilities IP[Tbn = k], which needs to be replaced with a more general recursion. Since

IE s^{Tbn} = ∏_{j=b+1}^n (φj(s^j))^{mj},

logarithmic differentiation leads to

k IP[Tbn = k] = ∑_{l=1}^k gn(l) IP[Tbn = k − l], (5.38)

where

gn(l) = ∑_{j=b+1}^n j mj [s^{l−j}] (φ′j(s^j)/φj(s^j)), (5.39)

[x^l] f(x) denoting the coefficient of x^l in f(x). This is superficially promising, but the following example shows that the recursion is not easy to use.

Example 5.15 Suppose that a combinatorial structure is conditioned to have at most one component of each size. If the original structure C(n) satisfies (3.1), then

IP[C(n) = a | C(n) ≤ ~1] = IP[Z[1, n] = a | Z[1, n] ≤ ~1, T0n = n] = IP[Z̃[1, n] = a | T̃0n = n],

where ~1 = (1, 1, . . . , 1) and Z̃ = (Z̃1, Z̃2, . . .) is a vector of independent Bernoulli random variables satisfying

IP[Z̃j = a] = IP[Zj = a | Zj ≤ 1], a = 0, 1. (5.40)

If the original Zj satisfy the Logarithmic Condition (5.1), then so too do the Z̃j. That is,

πj = IP[Z̃j = 1] ∼ θ/j, θ ∈ (0,∞), (5.41)

and then j IEZ̃j = j IP[Z̃j = 1] → θ automatically.

To try to establish (LLT) for the Z̃i, note that the point probabilities IP[T̃bn = k] satisfy an equation of the form (5.38). Adapting (5.39) to the present setting with mj = 1, yj = πj and φj(s) = 1 − πj + πj s leads, after some simplification, to the fact that

gn(l) = − ∑_{b+1≤j≤n: j|l} (−1)^{l/j} j hj^{l/j}, hj = πj/(1 − πj). (5.42)

It seems difficult to make progress with this approach, although a direct attack as in the earlier sections may work.
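The recursion (5.38) with gn from (5.42) can at least be verified mechanically: for small n, enumerate all 2^n outcomes of the Bernoulli vector and compare both sides. A sketch (the particular πj below are our own choice, capped at 1/2 only to keep the hj bounded):

```python
import math
from itertools import product

n, b, theta = 8, 0, 1.2
pi = {j: min(theta / j, 0.5) for j in range(b + 1, n + 1)}
h = {j: pi[j] / (1 - pi[j]) for j in pi}

# exact law of T = sum_j j*Z_j by enumerating all 2^n Bernoulli outcomes
probs = {}
for z in product((0, 1), repeat=n):
    t = sum(j * zj for j, zj in zip(pi, z))
    w = math.prod(pi[j] if zj else 1 - pi[j] for j, zj in zip(pi, z))
    probs[t] = probs.get(t, 0.0) + w

def g(l):
    # (5.42): g_n(l) sums over the divisors j of l with b < j <= n
    return -sum((-1) ** (l // j) * j * h[j] ** (l // j)
                for j in pi if l % j == 0)

for k in range(1, 15):
    lhs = k * probs.get(k, 0.0)
    rhs = sum(g(l) * probs.get(k - l, 0.0) for l in range(1, k + 1))
    assert abs(lhs - rhs) < 1e-10
print("recursion (5.38) with (5.42) checked for n =", n)
```

The check confirms that (5.38) and (5.42) hold as formal power series identities; the analytical difficulty lies in the oscillating, number-theoretic structure of gn, not in its correctness.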

A Stein recursion

To replace (5.39), we use an alternative recursion, derived from Stein's method, that makes the analysis entirely transparent. Suppose that we take Yi, i = 1, 2, . . . , n, to be independent Bernoulli random variables with

IP[Yj = 1] = πj,

and let

W = Wb = (b + 1)Yb+1 + · · · + nYn; W(i) = W − iYi, b < i ≤ n.

Then straightforward calculation shows that

IEWg(W) = IE ∑_{i=b+1}^n iYi g(W)
= ∑_{i=b+1}^n i ∑_{l=0}^{1} IE(Yi g(W) | Yi = l) IP[Yi = l]
= ∑_{i=b+1}^n iπi IE(g(W) | Yi = 1)
= ∑_{i=b+1}^n iπi IE g(W(i) + i)
= ∑_{i=b+1}^n iπi IE g(W + i) + ∑_{i=b+1}^n iπi [IE g(W(i) + i) − IE g(W + i)].

But

IE g(W + i) = πi IE g(W(i) + 2i) + (1 − πi) IE g(W(i) + i),

whence

IEWg(W) = ∑_{i=b+1}^n iπi IE g(W + i) + ∑_{i=b+1}^n iπi^2 [IE g(W(i) + i) − IE g(W(i) + 2i)]. (5.43)

Specializing to the case g = 1l{k} gives

k IP[W = k] = ∑_{i=b+1}^n iπi IP[W = k − i] (5.44)
+ ∑_{i=b+1}^n iπi^2 [IP[W(i) = k − i] − IP[W(i) = k − 2i]].

We apply the recursion in (5.44) to deduce the (LLT), taking for W the random variable T̃bn of Example 5.15, giving a much simpler scheme than the one determined by (5.30) and (5.42). We start by obtaining a bound on the second sum in (5.44) when b = bn = o(n), for which we do not even need to exploit the differing signs. Let I be the fixed (finite) set of indices i for which πi > 1/2, and split the indices 1 ≤ i ≤ n into three ranges: i ∈ I, i ∈ [cn] \ I and cn < i ≤ n, where cn is any sequence satisfying cn ≥ max{i: i ∈ I} and lim_{n→∞} cn = ∞. Note that

IP[T̃bn = j] = IP[T̃bn(i) = j − i] πi + IP[T̃bn(i) = j] (1 − πi),

where T̃bn(i) = T̃bn − iZ̃i, so that, if πi > 1/2,

IP[T̃bn(i) = j − i] ≤ πi^{−1} IP[T̃bn = j] ≤ 2 IP[T̃bn = j],


whereas, if πi ≤ 1/2,

IP[T̃bn(i) = j] ≤ 2 IP[T̃bn = j].

Hence we observe that

∑_{i=1}^n iπi^2 IP[T̃bn(i) = k − i]
≤ 2θ̄^2 ( ∑_{i∈I} IP[T̃bn = k] + ∑_{i∈[cn]\I} IP[T̃bn = k − i] + cn^{−1} ∑_{i=cn+1}^n IP[T̃bn = k − i] )
≤ 2θ̄^2 ( |I| IP[n^{−1}T̃bn = k/n] + IP[k/n − cn/n ≤ n^{−1}T̃bn < k/n] + cn^{−1} IP[k/n − 1 ≤ n^{−1}T̃bn < k/n − cn/n] ),

where θ̄ = sup_{i≥1} iπi. Now since, by Theorem 5.3, n^{−1}T̃bn converges in distribution to the limit Xθ with density pθ whenever bn = o(n), the first two probabilities converge to zero as n → ∞, provided that we choose cn = o(n); the third term converges to zero because cn → ∞. The sum ∑_{i=1}^n iπi^2 IP[T̃bn(i) = k − 2i] is shown to be small in similar fashion. The verification of (LLT) is now completed using the recursion in (5.44), with the first term handled just as in the case of assemblies.

5.3 Refinements and extensions

The results of this chapter show that much can be achieved by relatively elementary means, but the limitations of the standard recursive techniques for obtaining tractable substitutes for the size biasing equation (4.51) have already become apparent, restricting us in effect to the classical combinatorial structures of assemblies, multisets and selections. Moreover, only the simplest of the claims of Chapter 2 have so far been proved, even in these reduced circumstances. To establish Theorems 3.2, 3.3, 3.4 and 3.5 in full generality, more detailed and systematic analogues of the size biasing equation are needed. Fortunately, Stein's method turns out to yield exactly what is required; the theoretical basis for the method is presented in Chapter 8.

There are a number of building blocks in the general argument, most of them having precursors in this and the previous chapters. The first is that presaged by the convergence n^{−1}Tbn →d Xθ of Theorem 5.3; this is garnished with error estimates in Chapter 10. The second has its roots in the upper bounds on point probabilities of Lemma 5.10 and earlier of Lemma 4.12; their equivalents in a general setting are proved in Chapter 9. Thirdly, there are refinements of the size biasing equation (4.51), which add error estimates to the asymptotic approximation of IP[Tbn = m] in (LLT) in Chapter 11. Finally, for the sharpest theorems, the differences (IP[Tbn = m + 1] − IP[Tbn = m]) also have to be accurately estimated, a task carried out in Chapter 9, necessitating very precise analogues of the size biasing equation. Much of this argument requires painstaking and detailed calculation, but the results are well worth the effort; by using Stein's method in place of the traditional generating function techniques, a quite astonishing level of generality and precision is achieved. However, the reader should beware; progress through the later chapters is heavy going.


6 General setting

We now start on our more detailed study of the combinatorial structures C = C(n) which satisfy the Conditioning Relation and the Logarithmic Condition. Our primary aim is to prove sharper forms of the two main discrete approximation theorems stated in (3.10) and (3.11), with error estimates of the kind given in Theorem 3.2. To obtain the most useful error bounds, it is necessary to make some uniformity assumptions about the distributions of the underlying independent random variables Zi, i ≥ 1. In this chapter, we investigate what extra assumptions may be necessary, while still operating in as general a setting as possible.

6.1 Strategy

To start with, we investigate the broad requirements for theorems of this nature to be possible. We begin with the second of the discrete approximations. This is a genuine invariance principle: for all the combinatorial structures satisfying our conditions, L(C[b+1, n]) is close to L(C∗[b+1, n]) in total variation, for large b, where C∗ = C∗(n) denotes the vector of counts of the numbers of cycles of lengths 1, . . . , n in a θ–biased random permutation of n objects, and θ = lim_{i→∞} i IEZi. Thus the Ewens Sampling Formula gives a valid approximation to the joint distribution of the sizes of the large components for all such structures.


To prove the invariance principle, we proceed as follows. For any y ∈ ZZ^n_+ such that ∑_{i=1}^n iyi = n, the Conditioning Relation gives

IP[C[b+1, n] = y[b+1, n]] = IP[Z[b+1, n] = y[b+1, n]] IP[T0b(Z) = l] / IP[T0n(Z) = n], (6.1)

where l = n − ∑_{i=b+1}^n iyi and, here and in all that follows, for any x ∈ ZZ^∞_+, we use the notation

Tvm(x) = ∑_{i=v+1}^m ixi; Kvm(x) = ∑_{i=v+1}^m xi: (6.2)

the representation (6.1) is the same as that already exploited in (4.88) and (4.90). Using (6.1) also for C∗, it thus follows that the ratio of the probability densities of C[b+1, n] and C∗[b+1, n] at y[b+1, n] is given by

IP[C[b+1, n] = y[b+1, n]] / IP[C∗[b+1, n] = y[b+1, n]] (6.3)
= (IP[Z[b+1, n] = y[b+1, n]] / IP[Z∗[b+1, n] = y[b+1, n]]) × (IP[T0b(Z) = l] / IP[T0b(Z∗) = l]) × (IP[T0n(Z∗) = n] / IP[T0n(Z) = n]).

The first of these ratios is close to 1 if the probability densities of Zi and Z∗i are close enough for i > b, because independence reduces it to a simple product of the ratios of individual probabilities. The second and third ratios are also close to 1, provided that the probability densities of T0m(Z) and T0m(Z∗) are close enough.

Take the first of these requirements. The distribution of Z∗i is Poisson Po(θ/i), and so an obvious way of measuring the difference between the probability densities of Zi and Z∗i is in terms of the quantities

IP[Zi = l] − e^{−θ/i} (θ/i)^l / l!, l ≥ 0.

However, in many of the classical examples of logarithmic combinatorial structures, Zi has the distribution of a sum of ri independent and identically distributed integer valued random variables, each of which takes the value 0 with high probability, and this structure in itself makes the distribution of Zi more like the Poisson. We exploit any such structure as follows.

First, we observe that Z∗i ∼ Po(θ/i) can be interpreted as a sum ∑_{j=1}^N Yj, where N ∼ Po(θ/i) and the Yj are independent, with IP[Yj = 1] = 1 for all j. We then express the closeness of the distributions of Zi and Z∗i by expressing Zi in similar form, but now with N ∼ Bi(ri, θ(1 + Ei0)/(iri)) for some ri ≥ 1, and with

IP[Yj = 1] = (1 + εi1)/(1 + Ei0); IP[Yj = l] = εil/(1 + Ei0), l ≥ 2, (6.4)

where Ei0 = ∑_{l≥1} εil. Any random variable Zi on ZZ+ can have its distribution represented in this way, by taking ri = 1 and Ei0 to satisfy i^{−1}θ(1 + Ei0) = IP[Zi ≥ 1], and then by defining

εil = (1 + Ei0) IP[Zi = l | Zi ≥ 1] = iθ^{−1} IP[Zi = l], l ≥ 2;
εi1 = iθ^{−1} IP[Zi = 1] − 1.

However, the greater the value of ri that can be taken in representing the distribution of Zi, the closer the distribution of N is to a Poisson distribution, now with mean i^{−1}θ(1 + Ei0); the smaller the value of |Ei0|, the closer the mean is to the ideal θ/i. Indeed, if the Zi have infinitely divisible distributions, the ri can be chosen to be arbitrarily large, making N precisely Poisson distributed, though, if Ei0 ≠ 0, still not with the desired mean. The remaining aim is to make IP[Yj = 1] close to 1, achieved if the |εil| are small for all l ≥ 1. Thus the Logarithmic Condition (3.3) emerges as a natural requirement, if (3.11) and its refinements are to hold.

In what follows, we derive bounds for the errors in our approximations as explicit formulae expressed in terms of the quantities {εil; i, l ≥ 1} and of the {ri; i ≥ 1}, for any particular representation of the distributions of the Zi that may be valid; where there are many, one is free to choose the representation which gives the best results. Most of the theorems are however stated in a more readable form, with the approximation errors expressed as order statements under asymptotic regimes. The Logarithmic Condition is a part of all these regimes, but it is not actually a condition needed for the explicit estimates to hold; rather, if it does not hold, the error estimates will not become small in the limit.

Supposing that the εil are small enough and the ri large enough to make the first ratio of probabilities in (6.3) close enough to 1, it then remains to be shown that the densities of T0n(Z) and T0n(Z∗) are close to one another. In Chapter 5, this was accomplished by proving the (LLT), which states that the limiting asymptotics for n^{−1}IP[T0n(Z) = m] are the same as those for n^{−1}IP[T0n(Z∗) = m], as established for assemblies, multisets and selections in Theorems 5.12, 5.13 and 5.14. The proofs of the (LLT) made essential use of size–biasing, and this accounted for the major part of the argument. However, the size–biasing technique seems to be of only limited usefulness, and, for more general logarithmic combinatorial structures, another approach is needed.

Showing the asymptotic equivalence of the densities of T0n(Z) and T0n(Z∗) can be viewed as a one–dimensional local limit problem for a sum of independent random variables. However, it is not of standard form, since, as observed in Theorem 4.6, the limit Xθ of n^{−1}T0n(Z∗) is not normally distributed. In fact, T0n(Z∗) has a compound Poisson distribution, and Xθ is infinitely divisible with Lévy measure θx^{−1}1l{0 < x ≤ 1} dx. In order to address this local limit problem, we now make direct comparison between IEf(T0n(Z)) and IEf(T0n(Z∗)) for test functions f, for instance f(w) = 1l{w = m}, w ∈ ZZ+, for values of m = 0, 1, . . ., using Stein's method for compound Poisson approximation. Even in this context, the approximation problem is not a standard one, and new bounds for the solution of the Stein Equation are required; the necessary theory is given in Chapter 8.

Sharp estimates of the difference between the probability densities of T0n(Z) and T0n(Z∗) thus lie at the heart of the proof of the main approximation theorem for the large components. Very similar estimates are needed for the approximation of the joint distribution of the small components, showing that L(C[1, b]) is close to L(Z[1, b]), the joint distribution of the small components in the independent process, if b is not too large compared to n; as is intuitively plausible, the conditioning on {T0n(Z) = n} then makes little difference for the small components. The key observation, that

L(C[1, b] | T0b(C) = l) = L(Z[1, b] | T0b(Z) = l),

means that

dTV(L(C[1, b]), L(Z[1, b])) = dTV(L(T0b(C)), L(T0b(Z)))
= (1/2) ∑_{r≥0} |IP[T0b(Z) = r] − IP[T0b(Z) = r | T0n(Z) = n]| (6.5)
= (1/2) ∑_{r≥0} (IP[T0b(Z) = r]/IP[T0n(Z) = n]) |IP[T0n(Z) = n] − IP[Tbn(Z) = n − r]|,

by the independence of the Zi. Then, dissecting IP[T0n(Z) = n] according to the value of T0b(Z), this expression can be rewritten to give

dTV(L(C[1, b]), L(Z[1, b])) = (1/2) ∑_{r≥0} (IP[T0b(Z) = r]/IP[T0n(Z) = n]) (6.6)
× |∑_{s≥0} IP[T0b(Z) = s] {IP[Tbn(Z) = n − s] − IP[Tbn(Z) = n − r]}|.

Now Theorem 5.3 shows that n^{−1}T0n(Z) converges in distribution to a proper random variable Xθ with density pθ under the Logarithmic Condition alone, suggesting that the denominator IP[T0n(Z) = n] is of order n^{−1}, as is indeed the case whenever (LLT) holds. Because n^{−1} is so small, we need extremely accurate bounds on the differences |IP[Tbn(Z) = n − s] − IP[Tbn(Z) = n − r]|, for all but exceptional values r of T0b(Z), if we are to show that dTV(L(C[1, b]), L(Z[1, b])) is small. We also need very precise bounds on the probability that T0b(Z) takes exceptional values, again because of the small denominator in (6.6); exceptional translates in practice into values bigger than n/2. Since we already need the Stein machinery for compound Poisson approximation to prove the closeness of the distributions of T0n(Z) and T0n(Z∗), when approximating the large components, it is convenient to use it for these estimates as well. For instance, we use the Stein Equation to relate IP[T0n(Z) = n] to n^{−1}IP[T0n(Z) < n], and then, comparing the probabilities IP[T0n(Z) < n] and IP[T0n(Z∗) < n], conclude that IP[T0n(Z) = n] is indeed of strict order n^{−1}.
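Both the dissection of IP[T0n(Z) = n] over the value of T0b(Z) and the equality of the two expressions for the total variation distance in (6.5) can be confirmed numerically in the Poisson case Zi ∼ Po(θ/i), using the compound Poisson recursion of Chapter 5 for the point probabilities. A sketch, truncating the r–sum at n:

```python
import math

theta, b, n = 1.0, 3, 12

def cp_probs(lo, hi, kmax):
    """IP[T = k] for T = sum_{j=lo+1}^{hi} j*Z_j, Z_j ~ Po(theta/j)."""
    lam = {j: theta / j for j in range(lo + 1, hi + 1)}
    p = [math.exp(-sum(lam.values()))]
    for k in range(1, kmax + 1):
        p.append(sum(j * lam[j] * p[k - j] for j in lam if j <= k) / k)
    return p

t0b = cp_probs(0, b, n)    # law of T_{0b}(Z) up to n
tbn = cp_probs(b, n, n)    # law of T_{bn}(Z)
t0n = cp_probs(0, n, n)    # law of T_{0n}(Z)

# dissection of IP[T_{0n}(Z) = n] over the value of T_{0b}(Z):
conv = sum(t0b[r] * tbn[n - r] for r in range(n + 1))
assert abs(conv - t0n[n]) < 1e-12

# the two expressions for the total variation distance in (6.5) agree:
lhs = 0.5 * sum(abs(t0b[r] - t0b[r] * tbn[n - r] / t0n[n]) for r in range(n + 1))
rhs = 0.5 * sum(t0b[r] / t0n[n] * abs(t0n[n] - tbn[n - r]) for r in range(n + 1))
assert abs(lhs - rhs) < 1e-12
print("truncated dTV:", round(lhs, 4))
```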

6.2 Basic framework

Our general framework, an alternative version of (6.4), is formulated as follows. For each i ≥ 1, we suppose that Zi = ∑_{j=1}^{ri} Zij for some ri ≥ 1, where the family Z = (Zij; 1 ≤ j ≤ ri, i ≥ 1) consists of independent random variables over ZZ+, with distributions given by

IP[Zij = 0] = 1 − (θ/(iri))(1 + Ei0);
IP[Zij = 1] = (θ/(iri))(1 + εi1); IP[Zij = l] = (θ/(iri)) εil, l ≥ 2; (6.7)

here,

Ei0 = ∑_{l≥1} εil, (6.8)

and we also define

ε∗il = max_{j>i} |εjl|, l ≥ 1. (6.9)

The Zij thus have distributions which are close to Be(θ/(iri)) for large i, if the εil are small, and Zi then has a distribution close to Po(θ/i).

Various quantities are used in our estimates to portray the smallness of the εil, the chief of which are combinations analogous to moment and tail sums:

ρi(Z) = |εi1| + ∑_{l≥2} εil ≥ Ei0;  ρ∗i(Z) = ∑_{l≥1} ε∗il ≥ max_{j>i} ρj(Z);
Eij(Z) = ∑_{l>j} εil;  E∗ij(Z) = ∑_{l>j} ε∗il;
Fij(Z) = ∑_{l>j} lεil;  F∗ij(Z) = ∑_{l>j} lε∗il;
μi(Z) = |εi1| + Fi1 = ∑_{l≥1} l|εil|;  μ∗i(Z) = ε∗i1 + F∗i1 = ∑_{l≥1} lε∗il;
νi(Z) = ∑_{l≥1} l^2|εil|;  ν∗i(Z) = ∑_{l≥1} l^2 ε∗il;  ∆i(Z) = |εi1 − εi+1,1|,

where the last of these is small if the εi1 are small, but is even smaller if successive values of the εi1 differ relatively little from one another. There are also some more complicated combinations. For 0 ≤ α ≤ 1, we define

\[
\begin{aligned}
\chi^{(\alpha)}_{i1}(n,Z) &= \sum_{l=\lfloor (n+1)/2i \rfloor + 1}^{\lfloor n/i \rfloor} \Bigl(\frac{n+1}{n-il+1}\Bigr)^{1-\alpha} l\varepsilon_{il}, & 1 \le i \le \lfloor (n+1)/2 \rfloor; \\
\chi^{(\alpha)}_{i2}(n,Z) &= \sum_{l=\lfloor (n+1)/2i \rfloor + 1}^{\lfloor n/i \rfloor} \Bigl(\frac{n+1}{n-il+1}\Bigr)^{1-\alpha} l\varepsilon_{i,l-1}, & 1 \le i \le \lfloor (n+1)/4 \rfloor;
\end{aligned}
\]
\[
\phi^\alpha_1(n) = \sum_{i=1}^{\lfloor (n+1)/2 \rfloor} \chi^{(\alpha)}_{i1}(n); \qquad
\phi^\alpha_2(n) = \sum_{i=1}^{\lfloor (n+1)/4 \rfloor} r_i^{-1} \chi^{(\alpha)}_{i2}(n); \qquad
\phi^\alpha_3(n) = \sum_{i=1}^{\lfloor (n+1)/4 \rfloor} r_i^{-1} |\varepsilon_{i1}|\, \chi^{(\alpha)}_{i2}(n),
\]

and we also need two number-theoretic quantities derived from the distributions of the $Z_{ij}$:
\[
u_1(b,s) = (s+1) \sum_{i=b+1}^{\lfloor s/2 \rfloor}\ \sum_{\substack{l \ge 2 \\ il = s}} l\varepsilon_{il}, \qquad
u^*_1(n) = \max_{n/4 \le s \le n} u_1(0,s);
\]
\[
u_2(b,s) = (s+1) \sum_{i=b+1}^{\lfloor s/3 \rfloor}\ \sum_{\substack{l \ge 2 \\ i(l+1) = s}} (l+1)\varepsilon_{il} r_i^{-1}, \qquad
u^*_2(n) = \max_{n/4 \le s \le n} u_2(0,s).
\]

These latter combinations are all functions of the $\varepsilon_{il}$ for $l \ge 2$, and all reduce to $0$ if the $Z_{ij}$ are Bernoulli distributed; also, if the $Z_{ij}$ are infinitely divisible, so that the $r_i$ may be taken arbitrarily large, only $\phi^\alpha_1$ and $u_1$ are possibly not zero. If the $\varepsilon_{il}$ are as small as in most of the classical examples, all of these combinations are extremely small; see, for example, Condition (G) below. We also define

\[
E_n(Z) = \max_{1 \le j \le n}\Bigl\{-\sum_{i=j+1}^{n} i^{-1}E_{i0}\Bigr\}; \qquad
p^-_i(Z) = \inf_{j > i} \mathbb{P}[Z_{ij} = 0];
\]
\[
r^-_i = \min_{j > i} r_j; \qquad S(n) = \sum_{i=1}^{n} 1/(i r_i). \qquad (6.10)
\]
As is implicit in the Logarithmic Condition (3.3), $p^-_i \to 1$ as $i \to \infty$. The quantity $E_n$ approaches a finite limit under all our working conditions, and it is in any case of order $o(\log n)$ as $n \to \infty$ if $E_{i0} \to 0$. The $r^-_i$ are usually large enough that $S(\infty)$ is finite, and $S(n) = O(\log n)$ under all circumstances.

In what follows, we use the notation $Z[u,v]$ to denote $(Z_u, \ldots, Z_v)$, and $\mathbf{Z}[u,v]$ denotes $(Z_{ij};\, 1 \le j \le r_i,\, u \le i \le v)$. The combinatorial quantity of primary interest is then a vector $C = C^{(n)} = (C_1, \ldots, C_n)$ of counts of elements of sizes $1, 2, \ldots, n$, related to the $Z$'s through satisfying the Conditioning Relation
\[
\mathcal{L}(C^{(n)}) = \mathcal{L}(Z[1,n] \,|\, T_{0n}(Z) = n),
\]
where $T_{0n}$ is as defined in (6.2). $C$ can also be constructed via a 'dissected' vector $\mathbf{C}$, defined to have the distribution $\mathcal{L}(\mathbf{Z}[1,n] \,|\, T_{0n}(Z) = n)$, and the elements of $\mathbf{C}$ may then also have a combinatorial interpretation, though this need not be the case. The notation $\mathbf{Z}^*$ is used to denote an array of independent random variables of the same dimensions as $\mathbf{Z}$, but satisfying $Z^*_{ij} \sim \mathrm{Po}(\theta/(i r_i))$. The corresponding combinatorial quantity, denoted by $C^* = C^{*(n)}$, is distributed according to the Ewens Sampling Formula with parameter $\theta$.
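The Conditioning Relation is easy to check numerically in the classical permutation case $\theta = 1$, where $Z_i \sim \mathrm{Po}(1/i)$, $r_i = 1$, and $C^{(n)}$ is the cycle type of a uniform random permutation: there $\mathbb{P}[T_{0n}(Z) = n] = e^{-H_n}$ with $H_n$ the $n$th harmonic number. The sketch below (our own illustration; `prob_T_equals_n` is a hypothetical helper) verifies this by exact convolution.

```python
import math

def poisson_pmf(lam, k):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def prob_T_equals_n(n, theta=1.0):
    """P[T_0n(Z) = n] for independent Z_i ~ Po(theta/i), by exact convolution.

    Values with i*Z_i > n cannot contribute to the event and are dropped.
    """
    dist = [1.0] + [0.0] * n           # dist[t] = P[sum_{j<=i} j*Z_j = t]
    for i in range(1, n + 1):
        lam = theta / i
        pmf = [poisson_pmf(lam, k) for k in range(n // i + 1)]
        new = [0.0] * (n + 1)
        for t, p in enumerate(dist):
            for k, q in enumerate(pmf):
                if t + i * k <= n:
                    new[t + i * k] += p * q
        dist = new
    return dist[n]

# theta = 1 (uniform random permutations): P[T_0n(Z) = n] = exp(-H_n).
n = 6
H_n = sum(1.0 / i for i in range(1, n + 1))
print(prob_T_equals_n(n), math.exp(-H_n))   # the two numbers agree
```

Conditioning the independent $Z_i$ on this event of probability $e^{-H_n}$ reproduces exactly the cycle-type distribution of a uniform permutation of $n$ objects.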

Remark. The following conventions are adopted throughout. Quantities denoted by $\varepsilon$ with indices are small under best circumstances; those denoted by $\phi$ with indices are of order 1 under best circumstances. In either case, some conditions should be satisfied to ensure this; note, in particular, that $\phi_{11.7}(n)$ can easily be of order $n^\alpha$ for some $\alpha < 1$. An index such as that in $\phi_{11.7}(n)$ denotes the lemma or theorem in which the quantity is defined; $c_{(7.46)}$ refers to the corresponding equation number. We use the notation $\bar\theta$ for $\min\{1, \theta\}$ throughout.

6.3 Working conditions

In the chapters that follow, we establish error bounds for our approximations which are valid for (almost) any $n$, irrespective of particular asymptotic settings. However, their form turns out to be rather complicated, the structure becoming clearer only if certain asymptotic conditions are fulfilled. For this reason, to lighten the presentation, the bounds on the approximation errors that we derive in our theorems are stated in terms of asymptotic order estimates, with the detailed formulae for the estimates left in the body of the proofs. In this section, we discuss the assumptions that we make in order to justify such order estimates.

Simplifying assumptions

It is natural to require that $p^-_i \to 1$ and that $\mu_i \to 0$, which are equivalent to the Logarithmic Condition. However, it is also convenient to presuppose some uniformity in the behavior of the $\varepsilon_{il}$; we assume throughout that
\[
\mu^*_0 < \infty. \qquad (6.11)
\]

In most applications, stronger simplifying assumptions can be made. For instance, if the $Z_i \sim \sum_{l \ge 1} l\,\mathrm{Po}(i^{-1}\theta\lambda_{il})$ are infinitely divisible, with $\lambda_{i1} \to 1$ and $\sum_{l \ge 2} l\lambda_{il} \to 0$, then $\mathbf{Z}$ can be chosen to have the $r_i$ arbitrarily large, even though, in the combinatorial context, the resulting elements of $\mathbf{C}$ will usually have no direct meaning. The corresponding quantities $\varepsilon_{il}$ are then given by
\[
\varepsilon_{i1} = \lambda_{i1} - 1; \qquad \varepsilon_{il} = \lambda_{il}, \ l \ge 2,
\]
and formulae involving the $p_i$ and $r_i$ are interpreted as if $p_i = 1$ and $r_i = \infty$. An important example is that when $Z_i \sim \mathrm{NB}(m_i, q_i)$ and $i m_i q_i \to \theta$ as $i \to \infty$, in which case $\lambda_{il} = i m_i q_i^l / l\theta$ for $l \ge 1$. In particular, for infinitely divisible $Z_i$,
\[
\phi^\alpha_2(n) = \phi^\alpha_3(n) = 0 \quad\text{and}\quad u_2(0,n) = 0 \qquad (6.12)
\]

for all $n$.

Another important simplification occurs when the $Z_{ij}$ are all Bernoulli random variables, when $\varepsilon_{il} = 0$ for all $i \ge 1$ and $l \ge 2$, so that, for all $i$ and $r$,
\[
E_{i1} = F_{i1} = 0; \qquad \rho_i = \mu_i = \nu_i = |\varepsilon_{i1}|; \qquad \chi^{(\alpha)}_{ir}(n) = 0; \qquad \phi^\alpha_r(n) = 0.
\]
The case where the $Z_i$ have Poisson distributions is equivalent to the combination of these two conditions. Further simplification occurs when, in addition, $\varepsilon_{i1} = 0$ for all $i$, so that all the above measures of departure from the ideal are identically zero.

The classical combinatorial structures covered in Theorem 3.2 all satisfy rather weaker simplifying assumptions. In all cases, there exist $C, g_1 > 0$ and $0 < c < 1$ such that
\[
\varepsilon_{i1} = O(i^{-g_1}); \qquad \sum_{l \ge m} l\varepsilon_{il} \le C c^{i(m-1)}, \ m \ge 2; \qquad\text{and}\quad r_i \ge c^{-i}: \qquad (6.13)
\]
we refer to this as Condition (G). However, almost all the estimates that we prove are of best order under much weaker assumptions still. For this reason, we define alternative conditions:

Condition $(A_r)$: $\varepsilon_{i1} = O(i^{-g_1})$ for some $g_1 > r$;

Condition $(D_r)$: $\Delta_i = O(i^{-g_2})$ for some $g_2 > r$;

Condition $(B_{rs})$: for $l \ge 2$, $l\varepsilon_{il} \le C i^{-a_1} l^{-a_2}$ for some fixed $C > 0$, $a_1 > r$ and $a_2 > s$.

The combination of $(A_0)$, $(D_1)$ and $(B_{12})$ suffices for the best order statements in all cases, though, for many purposes, even weaker assumptions are enough. Note that $(B_{01})$ is the weakest condition of its kind to be possible, if $\mu^*_0 < \infty$ is to hold automatically, and that $(A_r)$ always implies $(D_r)$.


Order estimates

In terms of the above conditions, the $E_{ij}$, $F_{ij}$ and $\mu_i$, $\rho_i$ are easily estimated in the following proposition, for which we give no proof.

Proposition 6.1 If Conditions $(A_0)$ and $(B_{01})$ hold, we have
\[
\begin{aligned}
\varepsilon^*_{ij}(Z) &= O\bigl((i+1)^{-a_1} j^{-(a_2+1)}\bigr), \ j \ge 2; &
\varepsilon^*_{i1}(Z) &= O\bigl((i+1)^{-g_1}\bigr); \\
\rho^*_i(Z) &= O\bigl(i^{-(g_1 \wedge a_1)}\bigr); &
\mu^*_i(Z) &= O\bigl(i^{-(g_1 \wedge a_1)}\bigr);
\end{aligned}
\]
\[
E^*_{ij}(Z) = O\bigl(i^{-a_1} j^{-a_2}\bigr) \quad\text{and}\quad F^*_{ij}(Z) = O\bigl(i^{-a_1} j^{-(a_2-1)}\bigr), \quad j \ge 1.
\]

The quantities $\phi^\alpha_r(s)$ and $u_r(0,s)$ are less transparent. For ideal rates, they should be uniformly bounded in $s$, and this turns out to be true under rather weak conditions. Note that these quantities depend only on the $\varepsilon_{il}$ for $l \ge 2$, so that a B–condition is all that need be specified; note also the simple estimate $\phi^\alpha_3(s) \le \varepsilon^*_{01}\phi^\alpha_2(s)$.

Proposition 6.2 If Condition $(B_{01})$ holds, then, for any $0 \le \alpha \le 1$ and any $\delta > 0$, we have
\[
\text{(a)}\quad u_r(0,s),\ \phi^\alpha_r(s) = O\bigl(s^{1-(a_1 \wedge a_2)+\delta}\bigr), \quad r = 1, 2;
\]
\[
\text{(b)}\quad \sum_{i=1}^{m} \chi^{(\alpha)}_{i1}(s) = O\bigl(s^{1-a_2} m^{\delta+(a_2-a_1)_+}\bigr), \quad 1 \le m \le s;
\]
\[
\text{(c)}\quad \sum_{i=1}^{\lfloor (s+1)/2 \rfloor} i^{-1}\chi^{(\alpha)}_{i1}(s) = O\bigl(s^{-[a_1 \wedge (a_2-1)]+\delta}\bigr).
\]
In particular, under $(B_{11})$, the $u_r(0,s)$ and the $\phi^\alpha_r(s)$ are uniformly bounded in $s$.

Proof. First, check the $u_r(0,s)$, which are bounded by
\[
(s+1)\sum_{l \mid s} (l/s)^{a_1} l^{-a_2}. \qquad (6.14)
\]
Now, from elementary properties of the divisor functions (Tenenbaum (1995), Section I.5), for any $\delta > 0$,
\[
\sum_{l \mid s} l^\kappa = \begin{cases} O(s^{\kappa+\delta}), & \kappa \ge 0; \\ O(s^\delta), & \kappa < 0. \end{cases}
\]
Hence (6.14) is of order $s^{1-a_1} s^{\delta+(a_1-a_2)_+} = s^{1-(a_1 \wedge a_2)+\delta}$ for any $\delta > 0$.
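The divisor-function bound used here is easy to explore numerically (our own illustration; the helper `divisor_power_sum` is hypothetical):

```python
def divisor_power_sum(s, kappa):
    """Sum of l**kappa over the divisors l of s (brute force)."""
    return sum(l ** kappa for l in range(1, s + 1) if s % l == 0)

print(divisor_power_sum(12, 1))   # 1 + 2 + 3 + 4 + 6 + 12 = 28
print(divisor_power_sum(12, 0))   # d(12) = 6 divisors
print(divisor_power_sum(12, -1))  # sigma(12)/12 = 28/12; stays sub-polynomial in s
```

The three cases illustrate $\kappa > 0$, $\kappa = 0$ and $\kappa < 0$ in the displayed estimate: for $\kappa \ge 0$ the sum grows like $s^{\kappa+\delta}$, while for $\kappa < 0$ it is $O(s^\delta)$ for every $\delta > 0$.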


Turning to the $\phi^\alpha_r$, observe first that
\[
\begin{aligned}
\chi^{(\alpha)}_{i1}(s) &= \sum_{l=\lfloor (s+1)/2i \rfloor + 1}^{\lfloor s/i \rfloor} \Bigl(\frac{s+1}{s-il+1}\Bigr)^{1-\alpha} l\varepsilon_{il} \\
&\le \Bigl(\max_{l > (s+1)/2i} l\varepsilon_{il}\Bigr) \sum_{l=\lfloor (s+1)/2i \rfloor + 1}^{\lfloor s/i \rfloor} \Bigl(\frac{s+1}{s-il+1}\Bigr)^{1-\alpha}.
\end{aligned}
\]
Letting $R_{si} = s - i\lfloor s/i \rfloor$ be the remainder when dividing $s$ by $i$, we have, for any $0 < \alpha \le 1$ and $i \le (s+1)/2$,
\[
\begin{aligned}
\sum_{l=\lfloor (s+1)/2i \rfloor + 1}^{\lfloor s/i \rfloor} \Bigl(\frac{1}{s-il+1}\Bigr)^{1-\alpha}
&\le (1+R_{si})^{-(1-\alpha)} + \sum_{j=1}^{\lfloor (s+1)/2i \rfloor} (ij)^{-(1-\alpha)} \\
&\le (1+R_{si})^{-(1-\alpha)} + \alpha^{-1}2^{-\alpha} i^{-1}(s+1)^{\alpha}.
\end{aligned}
\]
Thus, under $(B_{01})$, for $0 < \alpha \le 1$ and $1 \le i \le (s+1)/2$,
\[
\begin{aligned}
\chi^{(\alpha)}_{i1}(s) &\le C i^{-a_1}\Bigl(\frac{s+1}{2i}\Bigr)^{-a_2}(s+1)^{1-\alpha}
\Bigl\{(1+R_{si})^{-(1-\alpha)} + \alpha^{-1}2^{-\alpha} i^{-1}(s+1)^{\alpha}\Bigr\} & (6.15) \\
&= O\bigl(s^{1-\alpha-(a_1 \wedge a_2)} + s^{-a_1 \wedge (a_2-1)}\bigr) = O\bigl(s^{1-a_2 \wedge (\alpha+a_1)}\bigr). & (6.16)
\end{aligned}
\]
Similar computations give $\chi^{(0)}_{i1}(s) = O(s^{1-(a_1 \wedge a_2)}\log s)$ and the same order estimates for $\chi^{(\alpha)}_{i2}(s)$.

Now $\phi^\alpha_1(s) = \sum_{i=1}^{\lfloor (s+1)/2 \rfloor} \chi^{(\alpha)}_{i1}(s)$ is of the same form as the more general sum in (b), but with a specific choice of $m$. So taking the sum in (b), we bound it using (6.15). For the first element, we simply use the inequality $i^{-(a_1-a_2)} \le 1$ if $a_1 \ge a_2$, giving
\[
\begin{aligned}
\sum_{i=1}^{m} \frac{(s+1)^{1-\alpha-a_2}}{i^{a_1-a_2}(1+R_{si})^{1-\alpha}}
&\le (s+1)^{1-\alpha-a_2} \sum_{i=1}^{m} (1+R_{si})^{-(1-\alpha)} \\
&\le (s+1)^{1-\alpha-a_2} \sum_{r=1}^{m} r^{-(1-\alpha)} \max_{1 \le i \le s} \sum_{l \ge 1} \mathbf{1}\{l \mid s-i\}
= O\bigl(s^{1-\alpha-a_2+\delta} m^{\alpha}\bigr),
\end{aligned}
\]
for any $\delta > 0$, by the properties of the divisor function. If $a_1 < a_2$, bound $i^{-(a_1-a_2)}$ by $m^{a_2-a_1}$, and argue in the same way. For the second element, by simple summation,
\[
\sum_{i=1}^{m} (s+1)^{1-a_2} i^{a_2-a_1-1} = O\bigl(s^{1-a_2}\bigl[m^{(a_2-a_1)_+} + \mathbf{1}\{a_1 = a_2\}\log m\bigr]\bigr).
\]

Combining these two estimates completes the proof of the bound in part (b), and thereby of $\phi^\alpha_1(s)$, for $0 < \alpha \le 1$. The remaining computations, for $\alpha = 0$ and for $\phi^\alpha_2(s)$, are analogous.

For the last part, argue as in the proof of the estimates for $\phi^\alpha_1(s)$, starting from (6.15). The first element gives
\[
\sum_{i=1}^{\lfloor (s+1)/2 \rfloor} \frac{(s+1)^{1-\alpha-a_2}}{i^{a_1-a_2+1}(1+R_{si})^{1-\alpha}} = O\bigl(s^{\delta-(a_1 \wedge (a_2-1))}\bigr),
\]
for any $\delta > 0$, arguing separately according to the sign of $a_1 - a_2 + 1$. For the second element, by simple summation,
\[
\sum_{i=1}^{\lfloor (s+1)/2 \rfloor} (s+1)^{1-a_2} i^{a_2-a_1-2} = O\bigl(s^{-(a_1 \wedge (a_2-1))}\bigl[1 + \mathbf{1}\{a_1 = a_2-1\}\log s\bigr]\bigr). \qquad\square
\]

Remark. The quantities $u_r(0,s)$ and $\phi^\alpha_r(s)$ are of much smaller order if Condition (G) is satisfied. The argument in the proof of Proposition 6.2(a) can be adapted to show that $u_r(0,s) = O(s c^{s/2})$ and that $\phi^\alpha_r(s) = O(s^2 c^{s/4})$.

The following simple moment bounds are also widely used, and require no special conditions.

Lemma 6.3 Let $T_{vm}(Z) = \sum_{i=v+1}^{m} iZ_i$ be as defined in (6.2). Then, for any $0 \le v < m$,
\[
\mathbb{E} T_{vm}(Z) \le \theta \sum_{i=v+1}^{m} (1+\mu_i); \qquad
\mathrm{Var}\, T_{vm}(Z) \le \theta \sum_{i=v+1}^{m} i(1+\nu_i).
\]
If $\mu^*_0 < \infty$, $\mathbb{E} T_{vm}(Z) = O(m)$; if $\nu^*_0 < \infty$, $\mathrm{Var}\, T_{vm}(Z) = O(m^2)$.

Proof. Using the dissected random variables, we have
\[
T_{vm} = \sum_{i=v+1}^{m} \sum_{j=1}^{r_i} iZ_{ij},
\]
where the $Z_{ij}$ are independent, with distributions as in (6.7). Hence
\[
\mathbb{E} T_{vm} = \sum_{i=v+1}^{m} \theta\Bigl(1 + \varepsilon_{i1} + \sum_{l \ge 2} l\varepsilon_{il}\Bigr) \le \theta \sum_{i=v+1}^{m} (1+\mu_i)
\]
and
\[
\mathrm{Var}\, T_{vm} = \sum_{i=v+1}^{m} i^2 \sum_{j=1}^{r_i} \mathrm{Var}\, Z_{ij}
\le \sum_{i=v+1}^{m} i^2 r_i\, \mathbb{E} Z_{ij}^2
= \sum_{i=v+1}^{m} i\theta\Bigl(1 + \varepsilon_{i1} + \sum_{l \ge 2} l^2\varepsilon_{il}\Bigr)
\le \theta \sum_{i=v+1}^{m} i(1+\nu_i). \qquad\square
\]
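In the ideal case $Z^*_i \sim \mathrm{Po}(\theta/i)$, both bounds of Lemma 6.3 are exact: $\mathbb{E} T_{vm}(Z^*) = \sum_{i=v+1}^m i\,(\theta/i) = \theta(m-v)$ and $\mathrm{Var}\, T_{vm}(Z^*) = \sum_{i=v+1}^m i^2(\theta/i) = \theta\sum_{i=v+1}^m i$, consistent with the stated orders $O(m)$ and $O(m^2)$. A direct check (our own sketch, not from the book):

```python
theta, v, m = 0.5, 3, 50

# Moments of T_vm for the ideal Poisson variables Z*_i ~ Po(theta/i):
mean = sum(i * (theta / i) for i in range(v + 1, m + 1))       # = theta*(m - v)
var = sum(i ** 2 * (theta / i) for i in range(v + 1, m + 1))   # = theta*sum(i)

print(mean, theta * (m - v))                   # both equal 23.5
print(var, theta * sum(range(v + 1, m + 1)))   # both equal 634.5
```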

6.4 Tilting

The Ewens Sampling Formula $\mathrm{ESF}(\varphi)$ is derived from the uniform distribution over permutations of $n$ objects by giving each permutation a weight proportional to $\varphi^\kappa$, where $\kappa = \sum_{j=1}^{n} C^{*(n)}_j$ is the number of cycles. The same weighting can be used to generate non-uniform distributions on other combinatorial structures, as was discussed for assemblies, multisets and selections in Chapter 2.3. These new distributions have much the same form as that of the original structure. The effect of tilting on our working conditions can be described as follows.

Given $Z = (Z_1, Z_2, \ldots)$ and $\varphi > 0$ such that $\mathbb{E}\varphi^{Z_i} < \infty$ for all $i$, we define the “$\varphi$-tilted structure” to be based on independent random variables $Z_i(\varphi)$ with joint distribution determined by the equations
\[
\mathbb{P}[Z_i(\varphi) = l] = \frac{\varphi^l\,\mathbb{P}[Z_i = l]}{\mathbb{E}\varphi^{Z_i}}, \qquad i \ge 1, \ l \ge 0.
\]

Tilting can equally well be applied to the dissected family $\mathbf{Z}$; if we write $M_i(\varphi) = \bigl(\mathbb{E}\varphi^{Z_i}\bigr)^{1/r_i}$, then the law of $\mathbf{Z}(\varphi)$ has independent $Z_{ij}(\varphi)$ with
\[
\mathbb{P}[Z_{ij}(\varphi) = l] = \frac{\varphi^l\,\mathbb{P}[Z_{ij} = l]}{\mathbb{E}\varphi^{Z_{ij}}} = \frac{\varphi^l\,\mathbb{P}[Z_{ij} = l]}{M_i(\varphi)}
\]
for $i \ge 1$, $1 \le j \le r_i$ and $l \ge 0$. The quantities $\varepsilon_{il}$ specified by (6.7) then have corresponding versions $\varepsilon_{il}(\varphi)$ for the $Z_i(\varphi)$, with $\varphi\theta$ now in the role of $\theta$. In particular, $\varepsilon_{i1}(\varphi)$ is defined by
\[
\frac{\varphi\theta}{ir_i}(1 + \varepsilon_{i1}(\varphi)) = \mathbb{P}[Z_{i1}(\varphi) = 1] = \frac{\varphi\,\mathbb{P}[Z_{i1} = 1]}{M_i(\varphi)} = \frac{\varphi}{M_i(\varphi)}\,\frac{\theta}{ir_i}(1 + \varepsilon_{i1}), \qquad (6.17)
\]
so that
\[
\varepsilon_{i1}(\varphi) = \frac{1 + \varepsilon_{i1}}{M_i(\varphi)} - 1, \qquad (6.18)
\]
and, for $l \ge 2$, $\varepsilon_{il}(\varphi)$ is defined by
\[
\frac{\varphi\theta}{ir_i}\,\varepsilon_{il}(\varphi) = \mathbb{P}[Z_{i1}(\varphi) = l] = \frac{\varphi^l\,\mathbb{P}[Z_{i1} = l]}{M_i(\varphi)} = \frac{\varphi^l}{M_i(\varphi)}\,\frac{\theta}{ir_i}\,\varepsilon_{il},
\]
so that
\[
\varepsilon_{il}(\varphi) = \frac{\varphi^{l-1}}{M_i(\varphi)}\,\varepsilon_{il}. \qquad (6.19)
\]
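The tilting recipe is easy to check numerically. The sketch below (our own illustration; the helper `tilt` is hypothetical) tilts a truncated Poisson pmf by $\varphi$ and verifies relation (6.19) for a component with $r_i = 1$, so that $M_i(\varphi) = \mathbb{E}\varphi^{Z_i}$.

```python
import math

def tilt(pmf, phi):
    """phi-tilt a pmf: P[Z(phi) = l] proportional to phi**l * P[Z = l]."""
    M = sum(phi ** l * p for l, p in pmf.items())   # = E[phi^Z]
    return {l: phi ** l * p / M for l, p in pmf.items()}, M

theta, i, r_i, phi = 1.0, 4, 1, 1.5
lam = theta / i
pmf = {l: math.exp(-lam) * lam ** l / math.factorial(l) for l in range(12)}
tilted, M = tilt(pmf, phi)

c, c_phi = theta / (i * r_i), phi * theta / (i * r_i)
for l in range(2, 8):
    eps_l = pmf[l] / c                 # eps_il, read off from (6.7)
    eps_l_phi = tilted[l] / c_phi      # eps_il(phi), from (6.7) for the tilted law
    # relation (6.19): eps_il(phi) = phi**(l-1) * eps_il / M_i(phi)
    assert abs(eps_l_phi - phi ** (l - 1) * eps_l / M) < 1e-12
print("relation (6.19) verified")
```

The identity holds exactly, truncation and all, because both sides are computed from the same renormalizing constant $M$.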

The quantities $\varepsilon_{il}(\varphi)$ can be uniformly bounded for $\varphi \in [0, \varphi_0]$ such that $\varphi_0 \ge 1$, if Conditions $(A_0)$ and $(B_{01})$ are satisfied by the $\varphi_0$–tilted variables $Z_{ij}(\varphi_0)$. To see this, we begin by showing that the $M_i(\varphi)$ are uniformly close to 1 for $\varphi \le \varphi_0$. First, note that, for $1 \le \varphi \le \varphi_0$,
\[
\begin{aligned}
M_i(\varphi) &= 1 + (\varphi - 1)\frac{\theta}{ir_i}(1 + \varepsilon_{i1}) + \sum_{l \ge 2} (\varphi^l - 1)\frac{\theta}{ir_i}\,\varepsilon_{il} \\
&\le 1 + (\varphi_0 - 1)M_i(\varphi_0)\frac{\theta}{ir_i}(1 + \varepsilon_{i1}(\varphi_0))
+ \sum_{l \ge 2} \frac{\theta}{ir_i}\,\varphi_0\,\varepsilon_{il}(\varphi_0)M_i(\varphi_0), & (6.20)
\end{aligned}
\]
from (6.17) and (6.19), and
\[
\sum_{l \ge 2} \varepsilon_{il}(\varphi_0) \le \sum_{l \ge 2} C i^{-a_1} l^{-a_2} \le C' i^{-a_1},
\]
from Condition $(B_{01})$ for $\mathbf{Z}(\varphi_0)$. It thus follows from (6.20) that $M_i(\varphi_0)$ is uniformly bounded in $i$, and that, for some constant $C_1 < \infty$,
\[
0 \le M_i(\varphi) - 1 \le C_1/ir_i, \qquad i \ge 1, \qquad (6.21)
\]
uniformly in $1 \le \varphi \le \varphi_0$. A similar argument also shows that
\[
0 \le 1 - M_i(\varphi) \le \frac{\theta}{ir_i}\Bigl(1 + \sum_{l \ge 1} |\varepsilon_{il}|\Bigr)
\le \frac{M_i(\varphi_0)\theta}{ir_i}(1 + \rho^*_0(\varphi_0)) \le C_2/ir_i, \qquad i \ge 1, \qquad (6.22)
\]
in $0 \le \varphi \le 1$.

in 0 ≤ ϕ ≤ 1.Turning to the εil(ϕ), it is now immediate from (6.18) and (6.21) that

εi1 = Mi(ϕ0)εi1(ϕ0) +Mi(ϕ0)− 1 = O(i−(g1∧1)), (6.23)

and that

|εi1(ϕ)| ≤ 1 + εi1(ϕ0)Mi(ϕ0)− 1= O(i−(g1∧1)), (6.24)

Page 159: This is page i Logarithmic Combinatorial Structures: a ..._Publications...This is page i Printer: Opaque this Logarithmic Combinatorial Structures: a Probabilistic Approach Richard

6.5. d–fractions 147

uniformly in 1 ≤ ϕ ≤ ϕ0; and, uniformly in ϕ < 1, we have

|εi1(ϕ)| ≤ |εi1|+ 1−Mi(ϕ)/Mi(ϕ)= O(i−(g1∧1)), (6.25)

now using (6.22) and (6.24). Then, for l ≥ 2 and 0 ≤ ϕ ≤ ϕ0, it followsfrom (6.19), (6.21) and (6.22) that

εil(ϕ) = (ϕ/ϕ0)l−1εil(ϕ0)Mi(ϕ0)/Mi(ϕ)≤ C3εil(ϕ0). (6.26)

Thus, if the ϕ0–tilted structure satisfies Conditions (A0) and (B01), thenall the ϕ–tilted structures for 0 ≤ ϕ ≤ ϕ0 satisfy Conditions (A0) and (B01)with the same exponents and the same constants, the exponents a1 and a2

being as for the ϕ0–tilted structure, and with the exponent g1 from theϕ0–tilted structure being replaced by (g1 ∧ 1).

This is enough to prove the following proposition.

Proposition 6.4 Assume that Conditions $(A_0)$ and $(B_{01})$ hold for the $\varphi_0$–tilted structure. Then the order estimates in Propositions 6.1 and 6.2 hold uniformly in $0 \le \varphi \le \varphi_0$ for the $\varphi$–tilted structures, with the same values of $a_1$ and $a_2$ as for the $\varphi_0$–tilted structure, and with the exponent $g_1$ from the $\varphi_0$–tilted structure being replaced by $(g_1 \wedge 1)$.

Proof. All the quantities considered are bounded by linear combinations of the $|\varepsilon_{il}(\varphi)|$ with non-negative coefficients, and the inequalities (6.24)–(6.26) give uniform bounds for them. If $g_1 > 1$ and the $r_i$ grow faster than $i^{g_1-1}$, the original exponent $g_1$ can be retained. $\square$

A similar argument proves the following proposition.

Proposition 6.5 Suppose that a logarithmic combinatorial structure satisfies Condition (G), and that $\varphi_0 < c^{-1}$. Then Condition (G) is satisfied by the $\varphi$–tilted structures uniformly in $0 \le \varphi \le \varphi_0$, with $c$ replaced by $c\varphi_0$.

6.5 d–fractions

It can be of interest to analyze the structure of a multiset, under the condition that it is made up only of elementary objects from a restricted subset of all the possible elementary objects; this setting was studied by Car (1984) in the context of random polynomials, when the elementary objects are the irreducible polynomials of the various degrees. Here we show that, if only a fraction $d$ or thereabouts of the elementary objects of each weight are allowed, our methods can be applied with $\theta$ replaced by $\theta d$. We generalize the setting by considering any logarithmic combinatorial structure based on random variables $Z = (Z_i,\, i \ge 1)$ which have a dissection $\mathbf{Z}$ satisfying $\lim_{i\to\infty} r_i = \infty$.

Let $(r'_i,\, i \ge 1)$ be positive integers such that $\lim_{i\to\infty} r'_i/r_i = d$, for some $0 < d < 1$, and set $\zeta_i = (r'_i/r_i d) - 1$, $i \ge 1$. Let
\[
J' = \{(i,j)\colon i \ge 1,\ 1 \le j \le r'_i\} \quad\text{and}\quad \mathbf{Z}' = (Z_{ij};\, (i,j) \in J');
\]
then define $Z' = \bigl(\sum_{j=1}^{r'_i} Z_{ij},\, i \ge 1\bigr)$ in the usual way. Set
\[
A_n(\mathbf{Z}) = \bigcap_{\substack{(i,j) \notin J' \\ 1 \le i \le n}} \{Z_{ij} = 0\}.
\]
Then we are interested in the distribution of $C^{(n)}$ conditional on $A_n(\mathbf{C}^{(n)})$, which, by the Conditioning Relation and independence, is given by
\[
\mathcal{L}(C^{(n)} \,|\, A_n(\mathbf{C}^{(n)})) = \mathcal{L}\bigl(Z[1,n] \,|\, T_{0n}(Z) = n,\, A_n(\mathbf{Z})\bigr) = \mathcal{L}\bigl(Z'[1,n] \,|\, T_{0n}(Z') = n\bigr).
\]

Now the collection $\mathbf{Z}'$ is of the form (6.7), with $\theta d$ for $\theta$ and
\[
(1 + \varepsilon'_{i1})\frac{\theta d}{ir'_i} = (1 + \varepsilon_{i1})\frac{\theta}{ir_i}; \qquad
\frac{\theta d}{ir'_i}\,\varepsilon'_{il} = \frac{\theta}{ir_i}\,\varepsilon_{il}; \qquad
(1 + E'_{i0})\frac{\theta d}{ir'_i} = (1 + E_{i0})\frac{\theta}{ir_i},
\]
so that
\[
E'_{i0} = (1+\zeta_i)(1+E_{i0}) - 1; \qquad
\varepsilon'_{i1} = (1+\zeta_i)(1+\varepsilon_{i1}) - 1; \qquad
\varepsilon'_{il} = (1+\zeta_i)\varepsilon_{il}.
\]
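Since $\theta$ cancels from the defining relations, the passage from $(\varepsilon_{il}, r_i)$ to $(\varepsilon'_{il}, r'_i)$ is a pure rescaling by $1+\zeta_i$, and is easy to verify numerically (our own sketch; the helper `restricted_epsilons` is hypothetical):

```python
def restricted_epsilons(eps, r_i, r_prime_i, d):
    """Map eps = {1: eps_i1, 2: eps_i2, ...} of the full structure to the eps'
    of the d-fraction.  From theta*d/(i*r'_i) eps'_il = theta/(i*r_i) eps_il and
    r'_i/(r_i*d) = 1 + zeta_i:  eps'_il = (1+zeta_i) eps_il for l >= 2, and
    eps'_i1 = (1+zeta_i)(1+eps_i1) - 1."""
    zeta = r_prime_i / (r_i * d) - 1.0
    out = {1: (1.0 + zeta) * (1.0 + eps[1]) - 1.0}
    for l, e in eps.items():
        if l >= 2:
            out[l] = (1.0 + zeta) * e
    return out

eps = {1: -0.1, 2: 0.02, 3: 0.005}
out = restricted_epsilons(eps, r_i=100, r_prime_i=52, d=0.5)
print(out)   # zeta_i = 0.04, so approximately {1: -0.064, 2: 0.0208, 3: 0.0052}
```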

In particular,
\[
\rho'^*_i \le (1 + \zeta^*_0)\rho^*_i + \zeta^*_i \quad\text{and}\quad F'_{i1} \le (1 + \zeta^*_0)F_{i1},
\]
where $\zeta^*_i = \max_{j>i} |\zeta_j|$. With these measures of departure from the Bernoulli, all the preceding theorems can be applied with $\theta d$ for $\theta$, and the errors of the corresponding approximations deduced. For instance, if Condition (G) is satisfied by $Z$, and if $\zeta^*_i = O(i^{-g'})$ for some $g' > 0$, then $Z'$ satisfies Condition (G) with $g_1 \wedge g'$ for $g_1$, and Theorem 3.2 can be applied. More generally, if Conditions $(A_0)$ and $(B_{01})$ are satisfied by $Z$ and if $\zeta^*_i = O(i^{-g'})$ for some $g' > 0$, then $Z'$ also satisfies Conditions $(A_0)$ and $(B_{01})$, with $(g_1 \wedge g')$ for $g_1$, and with the original values of $a_1$ and $a_2$.

It is also possible to estimate the probability $\mathbb{P}[A_n(\mathbf{C}^{(n)})]$ that such a multiset arises in the original model. The simple heuristic (roughly $\theta\log n$ components — see (4.69) — each with probability $d$ of belonging to the restricted set) suggests something of order $O(d^{\theta\log n})$, but realizations with unusually few components carry relatively more weight, making this a bad guess. The correct asymptotics are given in the following theorem, whose proof is deferred to Chapter 12.1: see also Car (1984), Theorem 1.

Theorem 6.6 Under Conditions $(A_0)$ and $(B_{01})$, and if also $\zeta^*_i = O(i^{-g'})$ for some $g' > 0$, then $\mathbb{P}[A_n(\mathbf{C}^{(n)})] \sim Kn^{-\theta(1-d)}$ for a constant $K > 0$, which can be computed using (12.1)–(12.3).

6.6 Illustrations

We illustrate the above framework by evaluating the $\varepsilon_{il}$ in three well known examples. The first, the sizes of the components in a random mapping, is an example of an assembly; the second, the degrees of the factors of a random polynomial over $\mathrm{GF}(q)$, is a multiset; the third, the degrees of the factors of a random square free polynomial over $\mathrm{GF}(q)$, is a selection.

Random mappings

The $Z_i$ are given by
\[
Z_i \sim \mathrm{Po}(\theta_i/i), \quad\text{where}\ \theta_i = \mathrm{Po}(i)\{[0, i-1]\} \to \tfrac12 = \theta. \qquad (6.27)
\]
Here, we take $r_i = \infty$, and, using the estimates in Gordon (1992),
\[
E_{i0} = \varepsilon_{i1} = 2\,\mathrm{Po}(i)\{[0, i-1]\} - 1 < 0; \qquad \varepsilon_{il} = 0, \ l \ge 2;
\]
\[
\rho_i,\ \rho^*_i \le \frac{2}{3\sqrt{2\pi}}\,i^{-1/2} + \frac{13}{10}\,i^{-3/2} \quad\text{and}\quad F_{i1} = 0. \qquad (6.28)
\]
Thus $g_1 = 1/2$ and $g_2 = 3/2$, and $a_1$ and $a_2$ can be taken arbitrarily large.
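The quantity $\theta_i = \mathrm{Po}(i)\{[0, i-1]\}$ can be computed directly from the Poisson pmf; the sketch below (our own illustration; `poisson_cdf_below` is a hypothetical helper) confirms that $\varepsilon_{i1}$ is negative and shrinks as $i$ grows, consistent with $g_1 = 1/2$.

```python
import math

def poisson_cdf_below(i):
    """P[Po(i) <= i-1], summing the pmf term by term."""
    lam = float(i)
    p, total = math.exp(-lam), 0.0
    for k in range(i):          # k = 0, ..., i-1
        total += p              # add P[Po(lam) = k]
        p *= lam / (k + 1)
    return total

for i in (1, 10, 100):
    eps_i1 = 2.0 * poisson_cdf_below(i) - 1.0
    print(i, eps_i1)            # negative, decreasing in magnitude
```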

Random polynomials over GF(q).

The $Z_i$ are given by $Z_i \sim \mathrm{NB}(N_q(i), q^{-i})$, where the numbers $N_q(i)$ of irreducible polynomials of degree $i$ satisfy
\[
N_q(1) = q \quad\text{and}\quad 0 \le 1 - iN_q(i)q^{-i} \le 2q^{-i/2}, \ i \ge 2. \qquad (6.29)
\]
Here, we again take $r_i = \infty$, with
\[
\theta = 1; \qquad \varepsilon_{i1} = iN_q(i)q^{-i} - 1 < 0; \qquad (6.30)
\]
\[
0 \le \varepsilon_{il} = l^{-1}iN_q(i)q^{-il} \le l^{-1}q^{-i(l-1)}, \ l \ge 2,
\]
and hence
\[
\rho_i,\ \rho^*_i \le 4q^{-i/2} \quad\text{and}\quad F_{i1} \le 2q^{-i}; \qquad (6.31)
\]
thus each of $g_1$, $g_2$, $a_1$ and $a_2$ can be taken arbitrarily large.
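The counts $N_q(i)$ can be computed from Gauss's classical formula $N_q(i) = i^{-1}\sum_{d \mid i} \mu(d)\,q^{i/d}$, where $\mu$ is the Möbius function; the sketch below (our own illustration) evaluates it and checks the bound in (6.29) for $q = 2$.

```python
def mobius(n):
    """Moebius function, by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # square factor: mu(n) = 0
            result = -result
        p += 1
    return -result if n > 1 else result

def N(q, i):
    """Number of monic irreducible polynomials of degree i over GF(q):
    Gauss's formula N_q(i) = (1/i) * sum_{d | i} mu(d) * q**(i/d)."""
    return sum(mobius(d) * q ** (i // d) for d in range(1, i + 1) if i % d == 0) // i

print([N(2, i) for i in range(1, 7)])   # [2, 1, 2, 3, 6, 9]
for i in range(2, 12):                  # check (6.29): 0 <= 1 - i N_q(i) q^{-i} <= 2 q^{-i/2}
    gap = 1.0 - i * N(2, i) * 2.0 ** (-i)
    assert 0.0 <= gap <= 2.0 * 2.0 ** (-i / 2)
```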

Page 162: This is page i Logarithmic Combinatorial Structures: a ..._Publications...This is page i Printer: Opaque this Logarithmic Combinatorial Structures: a Probabilistic Approach Richard

150 6. General setting

Random square free polynomials.

The $Z_i$ are given by
\[
Z_i \sim \mathrm{Bi}\Bigl(N_q(i), \frac{q^{-i}}{1+q^{-i}}\Bigr), \qquad (6.32)
\]
and we take $r_i = N_q(i)$ with $\theta = 1$, giving
\[
E_{i0} = \varepsilon_{i1} = \frac{iN_q(i)q^{-i}}{1+q^{-i}} - 1 < 0; \qquad \varepsilon_{il} = 0, \ l \ge 2, \qquad (6.33)
\]
and thus $\rho_i,\ \rho^*_i \le 3q^{-i/2}$ and $F_{i1} = 0$; again, $g_1$, $g_2$, $a_1$ and $a_2$ can be taken arbitrarily large.
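Since $\varepsilon_{il} = 0$ for $l \ge 2$ here, $\rho_i = |\varepsilon_{i1}|$, and the bound $3q^{-i/2}$ can be checked directly against the known small-degree counts of monic irreducibles over $\mathrm{GF}(2)$ (our own numerical sketch):

```python
# Known counts of monic irreducible polynomials over GF(2), degrees 1..6.
N2 = {1: 2, 2: 1, 3: 2, 4: 3, 5: 6, 6: 9}
q = 2.0
for i, n_qi in N2.items():
    eps_i1 = i * n_qi * q ** (-i) / (1.0 + q ** (-i)) - 1.0   # as in (6.33)
    assert eps_i1 < 0
    assert abs(eps_i1) <= 3.0 * q ** (-i / 2)                 # rho_i bound
print("(6.33) and the bound rho_i <= 3 q^{-i/2} check out for degrees 1..6")
```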

6.7 Main theorems

In this section, we state our more explicit and general versions of Theorems 3.2, 3.3 and 3.4 of Chapter 3.3. The use of these theorems to deduce limiting approximations such as those given in Theorem 3.5 is the substance of Chapter 7. The proofs make heavy use of the detailed estimates in the later chapters, and are deferred until Chapter 12. We couch our conclusions in terms of order statements which are valid under conditions such as Conditions $(A_0)$ and $(B_{01})$, but all the error estimates are actually specified in such a way that they can be evaluated for any given logarithmic combinatorial structure. We begin with the progenitor of Theorem 3.2(1), describing the global behavior of the small components.

Theorem 6.7 Suppose that $0 \le b < n/8$ and that $n \ge n_0$, where the constant $n_0 = n_0(\mathbf{Z}) \ge \max\{18, 2\theta\}$ is defined in (11.14). Then
\[
d_{TV}\bigl(\mathcal{L}(C[1,b]), \mathcal{L}(Z[1,b])\bigr) \le d_{TV}\bigl(\mathcal{L}(\mathbf{C}[1,b]), \mathcal{L}(\mathbf{Z}[1,b])\bigr) \le \varepsilon_{6.7}(n,b),
\]
where $\varepsilon_{6.7}(n,b) = \varepsilon_{6.7}(n,b,\mathbf{Z}) = O(b/n)$ under Conditions $(A_0)$, $(D_1)$ and $(B_{11})$; $\varepsilon_{6.7}(n,b)$ is specified in (12.5) below.

Remark. The stated order $O(b/n)$ holds for random mappings, polynomials and square free polynomials. Under weaker conditions than those of the theorem, $\varepsilon_{6.7}(n,b)$ may still be small; for instance, under Conditions $(A_0)$ and $(B_{01})$, $\varepsilon_{6.7}(n,b) = O\bigl(bn^{-\beta_2+\delta}\bigr)$ for any $\delta > 0$, in view of (12.5), (9.25) and Corollary 9.5, where $\beta_2 = (1 \wedge g_1 \wedge a_1)$.

The next theorem makes the error of approximation in Theorem 6.7 clearer, by separating out the asymptotically leading term. Under reasonable assumptions, $\varepsilon_{6.8}(n,b)$ is of smaller order than the main correction of order $n^{-1}\mathbb{E}|T_{0b} - \mathbb{E}T_{0b}|$. Note also that, if $\theta = 1$, as is the case in many classical examples, this term of leading order exactly vanishes, showing that the approximation in Theorem 6.7 is in fact of sharper order than $O(b/n)$. See Arratia and Tavaré (1992a), Arratia, Stark and Tavaré (1995) and Stark (1997b).

Theorem 6.8 For $0 \le b < n/32$ and $n \ge n_0$, where $n_0$ is as defined in (11.14), we have
\[
\Bigl| d_{TV}\bigl(\mathcal{L}(C[1,b]), \mathcal{L}(Z[1,b])\bigr) - \tfrac12(n+1)^{-1}|1-\theta|\,\mathbb{E}|T_{0b} - \mathbb{E}T_{0b}| \Bigr| \le \varepsilon_{6.8}(n,b),
\]
where
\[
\varepsilon_{6.8}(n,b) = \varepsilon_{6.8}(n,b,\mathbf{Z}) = O\bigl(n^{-1}b\,[n^{-1}b + n^{-\beta_{11}+\delta}]\bigr)
\]
for any $\delta > 0$ under Conditions $(A_0)$, $(D_1)$ and $(B_{12})$, with
\[
\beta_{11} = \min\{1/2,\ \theta/2,\ g_1,\ (g_2 \wedge a_1) - 1\};
\]
$\varepsilon_{6.8}(n,b)$ is specified in (12.10) below.

Remark. For random mappings, $\beta_{11} = 1/4$; for polynomials and square free polynomials, $\beta_{11} = 1/2$.

We now turn to the global approximation of the distribution of the large components of $C$, where the conditioning plays a significant part. Here, the right choice of process to approximate $C[b+1,n]$ is the corresponding conditioned process $C^*[b+1,n]$, derived from the Poisson random variables $Z^*$, rather than the independent process $Z[b+1,n]$, as was the case for the small components. The prototype for our theorem is (3.11), which we now sharpen considerably, by exploiting the argument sketched near (6.3). We actually compare the distributions of the dissected processes $\mathbf{C}[b+1,n]$ and $\mathbf{C}^*[b+1,n]$; for convenience, we assume from now on that $b$ is big enough to satisfy $b \ge 3\theta$, so that
\[
\theta/br^-_b \le \theta/b \le 1/3, \qquad (6.34)
\]
irrespective of the values taken by the $r_i$.

Theorem 6.9 For $n \ge n_0$ and $3\theta \le b < n$, we have
\[
d_{TV}\bigl(\mathcal{L}(C[b+1,n]), \mathcal{L}(C^*[b+1,n])\bigr) \le d_{TV}\bigl(\mathcal{L}(\mathbf{C}[b+1,n]), \mathcal{L}(\mathbf{C}^*[b+1,n])\bigr) \le \varepsilon_{6.9}(n,b),
\]
where
\[
\varepsilon_{6.9}(n,b) = \varepsilon_{6.9}(n,b,\mathbf{Z}) = O\Bigl(\Bigl[b \wedge \Bigl(\frac{n}{\log n}\Bigr)\Bigr]^{-\beta_{01}+\delta}\Bigr)
\]
for any $\delta > 0$, under Conditions $(A_0)$ and $(B_{01})$; $\varepsilon_{6.9}$ is specified in (12.15) below, and $\beta_{01} = (1 \wedge \theta \wedge g_1 \wedge a_1)$. In particular,
\[
\varepsilon_{6.9}\Bigl(n, \Bigl(\frac{n}{\log n}\Bigr)\Bigr) = O(n^{-t}), \qquad (6.35)
\]
for any $t < \beta_{01}$.

Remark. For $b \le n/\log n$, if $a_1 > 1$, it in fact follows from (12.15) that
\[
\varepsilon_{6.9}(n,b) = O\bigl(b^{-\beta_0}\log^{1+s(\theta)} b\,\{1 + S(b)\mathbf{1}\{g_1 > 1\}\}\bigr),
\]
where $s(\theta) = \mathbf{1}\{\theta = 1\} + \mathbf{1}\{g_1 = \theta \le 1\}$, $S(b)$ is as in (6.10) and $\beta_0 = (1 \wedge \theta \wedge g_1)$. In particular, in the same range of $b$, $\varepsilon_{6.9}(n,b)$ is of order $O(b^{-1/2}\log^2 b)$ for random mappings and of order $O(b^{-1}\log^2 b)$ for polynomials and square free polynomials.

Theorems 6.7 and 6.9 provide separate but overlapping approximations for the small and large components. It is therefore natural to try to combine them into a single approximation for $C^{(n)}$. In view of Theorems 6.7 and 6.9, the simplest choice of approximating process is of the form $Z^{(b,n)}$, where
\[
\mathcal{L}(Z^{(b,n)}[1,b]) = \mathcal{L}(Z[1,b])
\]
and
\[
\mathcal{L}\bigl(Z^{(b,n)}[b+1,n] \,\big|\, Z^{(b,n)}[1,b]\bigr) = \mathcal{L}\Bigl(C^{*(n)}[b+1,n] \,\Big|\, T_{0b}(C^*) = \sum_{i=1}^{b} iZ^{(b,n)}_i\Bigr),
\]
and $b = b(n)$ can be chosen more or less at will. Thus the distribution of $Z^{(b,n)}$ is formed from that of independent small components, necessarily with the $Z$–distribution, and the conditional Ewens Sampling Formula distribution for the remaining components, given that $T_{0b}(C^*) = T_{0b}(Z^{(b,n)})$.

Theorem 6.10 For any $b$ such that $3\theta \le b < n/8$,
\[
d_{TV}\bigl(\mathcal{L}(C^{(n)}), \mathcal{L}(Z^{(b,n)})\bigr) \le \varepsilon_{6.10}(n,b),
\]
where
\[
\varepsilon_{6.10}(n,b) = \varepsilon_{6.10}(n,b,\mathbf{Z})
= O\bigl(n^{-1}b + b^{-\bar g_1} + n^{-\bar g_1}\{\log n\,\mathbf{1}\{g_1 = 1\} + S(n)\mathbf{1}\{g_1 > 1\}\}\bigr),
\]
under Conditions $(A_0)$, $(D_1)$ and $(B_{11})$, where $\bar g_1 = 1 \wedge g_1$; the choice of $b = n^{1/(1+\bar g_1)}$ gives an approximation of order $O\bigl(n^{-\bar g_1/(1+\bar g_1)}\bigr)$. $\varepsilon_{6.10}(n,b)$ is specified in (12.20) below.

Remark. For random mappings, $\bar g_1 = 1/2$; for polynomials and square free polynomials, $\bar g_1 = 1$.


An alternative way of interpreting Theorem 6.9 is that the small components are special, but the large components are much as if they followed the Ewens Sampling Formula. This suggests another approximation, useful in Chapter 7.5.

Theorem 6.11 Let $C^{(b,n)} = (C^{(b,n)}_1, \ldots, C^{(b,n)}_n)$ be defined using the Conditioning Relation from the random variables
\[
Z^{(b,n)} = (Z_1, \ldots, Z_b, Z^*_{b+1}, \ldots, Z^*_n).
\]
Then for $n \ge n_0$ and $3\theta \le b < n$, we have
\[
d_{TV}\bigl(\mathcal{L}(C^{(n)}), \mathcal{L}(C^{(b,n)})\bigr) \le \varepsilon_{6.11}(n,b),
\]
where $\varepsilon_{6.11}(n,b)$ is specified in (12.21) below. If Conditions $(A_0)$ and $(B_{01})$ hold, then $\varepsilon_{6.11}(n,b)$ is of order $O(b^{-\beta_2+\delta})$ for any $\delta > 0$, where $\beta_2 = (1 \wedge g_1 \wedge a_1)$ as before.

Theorems 6.7–6.11 give bounds on the accuracy of global approximations to the joint distributions, and are more detailed versions of Theorems 3.2 and 3.4. We now turn to local approximations for the joint distributions of the small and large component sizes, proving theorems which specialize to Theorem 3.3. We begin with the small components.

Theorem 6.12 For $0 \le b < (n/8)\min\{1, 2/[\theta(1+\mu^*_0)]\}$ and $n \ge n_0$, and for any $y \in \mathbb{Z}^b_+$ for which $T_{0b}(y) = \sum_{i=1}^{b} iy_i \le n/2$, we have
\[
\Bigl| \frac{\mathbb{P}[C[1,b] = y]}{\mathbb{P}[Z[1,b] = y]} - 1 \Bigr| \le \varepsilon_{6.12}(n,b),
\]
where $\varepsilon_{6.12}(n,b) = O\bigl(n^{-1}\{b + S(n)T_{0b}(y)\}\bigr)$ under Conditions $(A_0)$, $(D_1)$ and $(B_{11})$; $\varepsilon_{6.12}(n,b)$ is specified in (12.22) below.

Remark. For random mappings, polynomials and square free polynomials, the order is $O\bigl(n^{-1}\{b + T_{0b}(y)\}\bigr)$.

We now examine the local approximation of the joint distribution of the large components. Picking any $r \ge 1$ and $n > m_1 > \cdots > m_r \ge 1$, we wish to approximate the joint probability $\mathbb{P}[L^{(n)}_j = m_j,\ 1 \le j \le r]$. We rephrase this probability by defining $y = (y_i,\ m_r + 1 \le i \le n)$ as
\[
y_{m_l} = 1, \ 1 \le l \le r-1; \qquad y_i = 0 \ \text{otherwise}, \qquad (6.36)
\]
when it becomes the probability $\mathbb{P}[C^{(n)}_{m_r} \ge 1;\ C^{(n)}[m_r+1, n] = y]$. It of course makes no sense to consider choices of the $m_j$ for which $M_r = \sum_{l=1}^{r} m_l > n$, but our asymptotics are actually a little more restrictive, requiring that $n^{-1}M_r$ be uniformly bounded away from 1 from below.


Theorem 6.13 Fix $0 < \eta < 1$ and $r \ge 1$. Choose $n > m_1 > \cdots > m_r > n\eta$ which satisfy $M_r = \sum_{l=1}^{r} m_l \le n(1-\eta)$. Define $y$ as in (6.36), and write $x_l = n^{-1}m_l$, $1 \le l \le r$; $X_r = \sum_{l=1}^{r} x_l$. Then
\[
\Bigl| \frac{n^r\,\mathbb{P}[L^{(n)}_1 = m_1, \ldots, L^{(n)}_r = m_r]}{f^{(r)}_\theta(x_1, \ldots, x_r)} - 1 \Bigr|
= \Bigl| \frac{n^r\,\mathbb{P}[C^{(n)}_{m_r} \ge 1,\ C^{(n)}[m_r+1, n] = y]}{f^{(r)}_\theta(x_1, \ldots, x_r)} - 1 \Bigr| \le \varepsilon_{6.13}(n,\eta),
\]
where the joint densities $f^{(r)}_\theta$, $r \ge 1$, are as defined in (4.78) and (4.87), and where
\[
\varepsilon_{6.13}(n,\eta) = \varepsilon_{6.13}(n,\eta,\mathbf{Z}) = O\bigl(n^{-\beta_0+\delta}\bigr)
\]
for any $\delta > 0$, for each fixed $0 < \eta < 1$, under Conditions $(A_0)$ and $(B_{11})$; $\beta_0 = (1 \wedge \theta \wedge g_1)$, and $\varepsilon_{6.13}(n,\eta)$ is specified in (12.25) below.

Remark. For random mappings, $\varepsilon_{6.13}(n,\eta)$ is of order $O(n^{-1/2})$; for polynomials and square free polynomials, $\varepsilon_{6.13}(n,\eta)$ is of order $O(n^{-1}\log n)$.

The restriction to $\eta > 0$ is not nugatory, not least because, if $X_r = 1$, the density $f^{(r)}_\theta(x_1, \ldots, x_r)$ in the denominator is zero when $\theta > 1$ and infinite when $\theta < 1$. The simplest example to consider is the approximation of $\mathbb{P}[L^{(n)}_1 = n]$, which is the case $r = 1$ and $m_1 = n$, for which $M_r = m_1 = n$ does not satisfy $M_r \le n(1-\eta)$ for any $\eta > 0$. Here, under Conditions $(A_0)$ and $(B_{01})$, it is shown in Section 12.8 that, for any $\delta > 0$,
\[
n^\theta\,\mathbb{P}[L^{(n)}_1 = n] = \Gamma(\theta+1)e^{-\chi}\bigl(1 + O(n^{-\beta_{01}+\delta})\bigr), \qquad (6.37)
\]
where
\[
\chi = \chi(\mathbf{Z}) = \sum_{i \ge 1} \bigl\{-r_i\log \mathbb{P}[Z_{i1} = 0] - \theta/i\bigr\}
\]
as in (5.8), and $\beta_{01} = (1 \wedge \theta \wedge g_1 \wedge a_1)$. This shows that $\mathbb{P}[L^{(n)}_1 = n]$ is only of exact order $O(n^{-r}) = O(n^{-1})$ if $\theta = 1$, and that, even then, the constant $e^{-\chi}$, which depends on the detail of the distributions of the $Z_i$ for small $i$, multiplies the value $f_1(1) = 1$ in the asymptotic formula: see also Theorem 5.6.

The final result of this chapter concerns the behaviour of the total variation distance $d_{TV}\bigl(\mathcal{L}(C[1,b]), \mathcal{L}(Z[1,b])\bigr)$ when $b$ is not small with $n$. In this case, we have the following approximation; see also Stark (1997a).

Theorem 6.14 Under Conditions $(A_0)$ and $(B_{01})$, if also $b/n \to \alpha \in (0,1)$, then
\[
\begin{aligned}
2\,d_{TV}\bigl(\mathcal{L}(C[1,b]), \mathcal{L}(Z[1,b])\bigr)
&= P_\theta\Bigl(\frac{1}{\alpha} - 1, \infty\Bigr) + \frac{\alpha^{\theta-1}p_\theta(1/\alpha)}{p_\theta(1)}
+ \int_0^{1-\alpha} p_\theta(x)\,\Bigl|1 - \frac{p^{(\alpha)}_\theta(1-\alpha x)}{p_\theta(1)}\Bigr|\,dx \\
&\qquad + O\bigl(|bn^{-1} - \alpha|^{\bar\theta} + n^{-\beta_{01}+\delta}\bigr)
\end{aligned}
\]
for any $\delta > 0$, with $\beta_{01} = (1 \wedge \theta \wedge g_1 \wedge a_1)$ as above; here, as always, $\bar\theta = \min(\theta, 1)$.

Remark. For random mappings, $\bar\theta = 1/2 = \beta_{01}$; for polynomials and square free polynomials, $\bar\theta = 1 = \beta_{01}$.


7 Consequences

This chapter develops approximations that can be obtained for summaries of the component distribution of a logarithmic combinatorial structure that are coarser than those treated in Chapter 6.7. We consider in turn functional central limit theorems, Poisson–Dirichlet limits, asymptotics for the number of components and Erdős–Turán laws, assessing the errors in the distributional approximations under appropriate metrics. The chapter concludes with a discussion of additive functions on logarithmic combinatorial structures.

7.1 Functional central limit theorems

We begin with a functional central limit theorem for the numbers of components of different sizes. The first result of this type was given by DeLaurentis and Pittel (1983) for the case of random permutations. Hansen (1989, 1990) gave the corresponding result for random mappings and the Ewens Sampling Formula, respectively. Donnelly et al. (1991) and Arratia and Tavaré (1992b) provided alternative proofs for the Ewens Sampling Formula, the latter being similar in spirit to the approach taken here. Goh and Schmutz (1993) derived the corresponding result for the degree sequence of the characteristic polynomial of a matrix over a finite field. Arratia, Barbour and Tavaré (1993) and Hansen (1993) studied the case for random polynomials over a finite field.

We begin with two preparatory lemmas.


Lemma 7.1 If Conditions (A0), (D1) and (B11) hold, there exists for each n ≥ 1 a coupling of C^{(n)}[1, n] and Z[1, n] such that, if

R_{n,1} = (log n)^{−1/2} ∑_{i=1}^n |C_i^{(n)} − Z_i|,

then

IE(R_{n,1} ∧ 1) = O(log log n/√(log n))

as n → ∞.

Proof. As a consequence of Theorem 6.7, there exists a coupling of C^{(n)}[1, b] and Z[1, b] such that

IP[C^{(n)}[1, b] ≠ Z[1, b]] = O(log^{−1} n),        (7.1)

if b = b(n) = n/log n and Conditions (A0), (D1) and (B11) hold. Extend this in any way to a coupling of C^{(n)}[1, n] and Z[1, n]. Then

IE(R_{n,1} ∧ 1) ≤ IP[C^{(n)}[1, b] ≠ Z[1, b]] + IP[⋃_{i=b+1}^{⌊n/6⌋} {C_i^{(n)} > 3}]

    + (log n)^{−1/2} {∑_{i=b+1}^{⌊n/6⌋} IE(C_i^{(n)} I[C_i^{(n)} ≤ 3]) + ∑_{i=⌊n/6⌋+1}^n IE C_i^{(n)} + ∑_{i=b+1}^n IE Z_i}.        (7.2)

Now

IP[C_{i1}^{(n)} = l] = IP[Z_{i1} = l] IP[T_{0n}^{(i)}(Z) = n − il]/IP[T_{0n}(Z) = n] ≤ IP[Z_{i1} = l] (3nK_0/(θ P_θ[0, 1] (n − il))),

because of Lemma 9.2 and Theorem 11.10, and hence

∑_{i=b+1}^{⌊n/6⌋} IE(C_i^{(n)} I[C_i^{(n)} ≤ 3]) ≤ (6K_0/(θ P_θ[0, 1])) ∑_{i=b+1}^{⌊n/6⌋} IE Z_i = O(∑_{i=b+1}^n IE Z_i).

Furthermore, since ∑_{i=1}^n i C_i^{(n)} = n, it follows that ∑_{i=⌊n/6⌋+1}^n C_i^{(n)} ≤ 6. Then also

∑_{i=b+1}^n IE Z_i ≤ ∑_{i=b+1}^n θ i^{−1}(1 + μ_i) = O(log(n/b)) = O(log log n).


Combining these estimates with (7.1), and putting them into (7.2), the lemma is proved, if it can be shown that

IP[⋃_{i=b+1}^{⌊n/6⌋} {C_i^{(n)} > 3}] = O(log^{−1/2} n).

However, by Theorem 6.9, the latter probability is no greater than

ε_{6.9}(n, n/log n) + IP[⋃_{i=b+1}^{⌊n/6⌋} {C_i^{*(n)} > 3}].

Under the stated conditions, and in view of (6.35),

ε_{6.9}(n, n/log n) = O(log^{−1/2} n),

and the remaining probability, from Arratia, Barbour and Tavaré (1992), Section 3, is no greater than

IP[⋃_{i=b+1}^{⌊n/6⌋} {Z_i^* > 2}] ≤ ∑_{i=b+1}^{⌊n/6⌋} θ^3/(6i^3) = O(n^{−2} log^2 n).        □

Lemma 7.2 Under the same conditions, the coupling of Lemma 7.1 can be extended to include Z^*[1, n], in such a way that, if

R_{n,2} = (log n)^{−1/2} ∑_{i=1}^n |Z_i^* − Z_i|,

then

IE R_{n,2} = O((log n)^{−1/2}).

Proof. Use a coupling of Z[1, n] and Z^*[1, n] which achieves the Wasserstein l_1 distance between them, so that, from Lemma 10.2 (1),

IE ∑_{i=1}^n |Z_i^* − Z_i| = dW(L(Z[1, n]), L(Z^*[1, n])) ≤ ∑_{i=1}^n {(θ/i) μ_i + θ/(i r_i)} = O(1),

under Conditions (A0) and (B01).        □

Now define

u_n(t) = θ h(⌊nt⌋ + 1),

where h(j + 1) = ∑_{r=1}^j r^{−1} as usual, and observe that

sup_{0≤t≤1} |u_n(t) − θt log n| ≤ θ(c + 1),

where c = sup_{x>0} |h(x + 1) − log x| < ∞. Embed Z^* in a Poisson process P of unit rate, in such a way that

P(θ h(j + 1)) = ∑_{i=1}^j Z_i^*,   j = 1, 2, . . . .

Letting

B_n(t) = (∑_{i=1}^{⌊nt⌋} C_i^{(n)} − θt log n)/√(θ log n),   0 ≤ t ≤ 1,

and

B_n^*(t) = (∑_{i=1}^{⌊nt⌋} Z_i^* − θt log n)/√(θ log n) = (P(u_n(t)) − θt log n)/√(θ log n),   0 ≤ t ≤ 1,

it is clear that B_n^* is close to Brownian motion if n is large, and that B_n should be close to B_n^*. This is the substance of the following theorem, whose proof follows that of Arratia, Barbour and Tavaré (1992), Theorem 4.4.
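As a quick numerical illustration (ours, not from the book; it assumes only numpy), one can simulate the independent Z_i^* ~ Po(θ/i) directly and check that B_n^*(1) is approximately standard normal; the O(1/√(log n)) discrepancy coming from centring with θt log n rather than θh(n + 1) is visible at moderate n.

```python
import numpy as np

# Monte Carlo sketch (not from the book): Z_i^* ~ Po(theta/i) independent,
# and B_n^*(1) = (sum_{i<=n} Z_i^* - theta log n) / sqrt(theta log n)
# should be approximately N(0, 1) for large n.
rng = np.random.default_rng(0)
theta, n, reps = 1.0, 10_000, 500

lam = theta / np.arange(1, n + 1)            # IE Z_i^* = theta / i
totals = rng.poisson(lam, size=(reps, n)).sum(axis=1)
b1 = (totals - theta * np.log(n)) / np.sqrt(theta * np.log(n))

print(b1.mean(), b1.std())   # mean offset ~ gamma/sqrt(log n), std near 1
```

The sample mean sits near γ/√(log n) ≈ 0.19 rather than at 0, which is exactly the θ(c + 1)/√(log n) centring error allowed for in the proof below.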

Theorem 7.3 If Conditions (A0), (D1) and (B11) hold, it is possible to construct B_n and a standard Brownian motion B on the same probability space, in such a way that

IE{sup_{0≤t≤1} |B_n(t) − B(t)| ∧ 1} = O(log log n/√(log n)).

Remark. The conditions of the theorem are satisfied by mappings and by polynomials and square free polynomials.

Proof. As in Kurtz (1978), Lemma 3.1, construct a standard Brownian motion b in such a way that

sup_{t≥0} |P(t) − t − b(t)|/(2 ∨ log t) = K < ∞,

where IE K < ∞; then

|P(u_n(t)) − u_n(t) − b(u_n(t))| ≤ K(2 + log u_n(1))

for all 0 ≤ t ≤ 1. Set B(t) = b(θt log n)/√(θ log n). Then, by the triangle inequality,

√(θ log n) |B_n(t) − B(t)| ≤ |∑_{i=1}^{⌊nt⌋} C_i^{(n)} − ∑_{i=1}^{⌊nt⌋} Z_i| + |∑_{i=1}^{⌊nt⌋} Z_i − ∑_{i=1}^{⌊nt⌋} Z_i^*|

    + |P(u_n(t)) − u_n(t) − b(u_n(t))| + |u_n(t) − θt log n| + |b(u_n(t)) − b(θt log n)|.

Hence

√θ sup_{0≤t≤1} |B_n(t) − B(t)| ≤ R_{n,1} + R_{n,2} + K(2 + log u_n(1))/√(log n) + θ(c + 1)/√(log n) + sup_{0≤t≤1} |b(u_n(t)) − b(θt log n)|/√(log n),

and the theorem follows from Lemmas 7.1 and 7.2 and from Csörgő and Révész (1981), Lemma 1.2.1.        □

7.2 Poisson–Dirichlet limits

Theorem 7.3, which gives not only a functional central limit theorem for the component counts but also an estimate of the error involved in such an approximation, uses a standardization which is appropriate for describing the behaviour of all the medium sized components. For the very small components, Theorem 6.7, which strengthens (3.4), is already in the form of a limit theorem with an error estimate, the limiting process being the process of independent random variables (Z_i, i ≥ 1). For the very large components, the appropriate standardization is that leading to the Poisson–Dirichlet approximation of (3.5).

The Poisson–Dirichlet distribution PD(1) was introduced in Chapter 1 in the context of random permutations and prime factorization; see pages 13–14 and 22 for the early history. The Poisson–Dirichlet distribution for arbitrary θ arose in the context of population genetics in Watterson (1976) and Kingman (1977), the latter establishing PD(θ) limits for the Ewens Sampling Formula. Convergence of the (normalized) ordered component counts of a random mapping to PD(1/2) appeared in Aldous (1985), and Mutafciev (1990) established the corresponding result for a random mapping pattern. Arratia, Barbour and Tavaré (1993) studied the case of random polynomials over a finite field, while Hansen and Schmutz (1993) examined the corresponding result for the characteristic polynomial of a matrix over a finite field, and provided an estimate of the error in the approximation. Hansen (1994) developed a general approach covering these examples and more; further discussion appears on page 117. Local limit theorems were given in Arratia, Barbour and Tavaré (1999b) using the approach described in Chapter 5.1.
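For readers who want to experiment with PD(θ) itself, the following sketch (our illustration, not part of the text) uses the standard stick-breaking construction: break off independent Beta(1, θ) fractions of the remaining stick to obtain GEM(θ), and sort; the ranked sticks approximate a PD(θ) sample once enough sticks are taken.

```python
import numpy as np

# Sketch: approximate a PD(theta) sample by sorting GEM(theta) sticks.
# Each stick is a Beta(1, theta) fraction of what remains of [0, 1].
def poisson_dirichlet_sample(theta, n_sticks, rng):
    beta = rng.beta(1.0, theta, size=n_sticks)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - beta)[:-1]))
    sticks = beta * remaining              # GEM(theta) stick sizes
    return np.sort(sticks)[::-1]           # ranked: approximates PD(theta)

rng = np.random.default_rng(1)
L = poisson_dirichlet_sample(0.5, 400, rng)
print(L[:3], L.sum())                      # largest parts; total just below 1
```

With 400 sticks the unbroken remainder is negligible, so the output sums to essentially 1, as a PD(θ) sequence must.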

New examples of Poisson–Dirichlet approximations continue to be found. Hansen and Jaworski (2000) study bipartite random mappings of two sets of K and L elements respectively, showing that no matter how K and L tend to infinity, the proportion of the n = K + L elements in the largest, second largest, . . . components converges to PD(1/2) as n → ∞. Andersson (2002) studies random circuit decompositions of complete graphs, showing that the proportion of edges in the largest, second largest, . . . circuits converges in distribution to PD(1/2) for the undirected complete graph, and to PD(1) for the directed complete graph.

Here is another interesting example. To set the scene, we first consider the core of a random mapping, the set of elements contained in cycles of the mapping. It is well known that the number N_n of points in the core satisfies N_n/√n →_d W, where W has density w exp(−w^2/2), w > 0; see Bollobás (1985) for example. Furthermore, given N_n = r, the r points in the core are assigned to cycles exactly as r points are assigned to cycles by a uniform random permutation. It follows that the proportions of points in the core that fall in the longest cycle, the second longest cycle and so on have asymptotically the PD(1) distribution. On the other hand, we have Aldous's (1985) PD(1/2) limit for the sizes of the entire components. Thus PD(1/2) and PD(1) both arise in this model, the former in the mapping itself and the latter in its core. A similar result arises in the context of random graphs. Consider the random graph G(n, m) with n vertices and m = n/2 ± s edges, where s = o(n) and sn^{−2/3} → ∞, and look at the set of vertices belonging to the unicyclic components (recall that, in the random mapping, all components are unicyclic). Janson (2000) shows that the proportions of these vertices in the largest unicyclic component, the second largest and so on have asymptotically the PD(1/4) distribution. Considering those vertices belonging to the cycles in the unicyclic components, Janson shows that the proportion of them in the largest cycle, the second largest cycle, . . . have asymptotically the PD(1/2) distribution. Thus PD(1/4) and PD(1/2) arise in the graph and its core, respectively.

The starting point for our discussion is Theorem 5.8, which shows that, for any fixed r, the joint distribution of (n^{−1}L_1^{(n)}, . . . , n^{−1}L_r^{(n)}) converges to that of the first r components (L_1, . . . , L_r) of the Poisson–Dirichlet distribution PD(θ) if (LLT) holds, and weak convergence in ∆ ⊂ [0, 1]^∞ of (n^{−1}L_j^{(n)}, j ≥ 1) to (L_j, j ≥ 1) is then immediate. Theorem 11.1 shows that (LLT) in fact holds for all structures satisfying the Conditioning Relation and the Logarithmic Condition for which also μ_0^* < ∞. Under the stronger Conditions (A0) and (B11), Theorem 6.13 sharpens the convergence in Theorem 5.8, giving a uniform error bound for the approximation of the joint densities of the sizes of the largest r components, for any fixed r.

Considering the infinite sequence as a whole, convergence in [0, 1]^∞ is very weak; there are few useful continuous functionals. For applications of the Ewens Sampling Formula in genetics, it is therefore desirable to strengthen these results to cover the approximation of quantities such as the infinite sums

IE ∑_{j≥1} g(n^{−1}L_j^{(n)})        (7.3)

by their Poisson–Dirichlet counterparts

IE ∑_{j≥1} g(L_j),        (7.4)

uniformly over suitable families of functions g. For such a purpose, it is natural to view the arrays (n^{−1}L_j^{(n)}, j ≥ 1) and (L_j, j ≥ 1) as point processes on (0, 1], represented by the random measures

Ψ^{(n)} = ∑_{j=1}^n δ_{n^{−1}L_j^{(n)}} = ∑_{j=1}^n C_j^{(n)} δ_{n^{−1}j}   and   Ψ^* = ∑_{j≥1} δ_{L_j},        (7.5)

so that (7.3) and (7.4) become simply

IE ∫ g dΨ^{(n)}   and   IE ∫ g dΨ^*.

In the same way, the point process

Ψ^{*(n)} = ∑_{j=1}^n C_j^{*(n)} δ_{n^{−1}j}        (7.6)

has distribution induced by ESF(θ).
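The two ways of writing Ψ^{(n)} in (7.5) can be checked mechanically: summing g over the normalized component sizes agrees with weighting g(i/n) by the counts C_i^{(n)}. A small sketch (ours, with a made-up partition):

```python
from collections import Counter

# Sketch: the two representations of the integral of g against Psi^(n)
# agree -- summing g over normalized component sizes equals summing
# g(i/n) weighted by the counts C_i.  The component sizes are made up.
n = 20
sizes = [9, 5, 3, 1, 1, 1]             # a partition of n = 20
assert sum(sizes) == n
C = Counter(sizes)                      # C[i] = number of components of size i

g = lambda x: x ** 0.5                  # any test function with g(0) = 0
by_parts = sum(g(l / n) for l in sizes)
by_counts = sum(g(i / n) * c for i, c in C.items())
print(by_parts, by_counts)              # identical
```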

Approximating Ψ^{(n)} by Ψ^{*(n)}

Both Ψ^{(n)} and Ψ^* consist of discrete atoms, and give finite mass to any interval (a, 1] for a > 0, but Ψ^*(0, 1] = ∞ a.s. Thus, for IE ∑_{j≥1} g(n^{−1}L_j^{(n)}) to converge to IE ∑_{j≥1} g(L_j), it is necessary to impose some growth condition at the origin on g. Let g^*(x) = sup_{0≤y≤x} |g(y)|, and let m_g denote the concave majorant of g^* on [0, 1]. The essential requirement on g is that ∫_0^1 x^{−1} m_g(x) dx < ∞, which merely asks that IE ∫ m_g dΨ^* < ∞; for many functions g, this is no stronger than requiring that IE ∫ g dΨ^* exists. If this is the case, we can immediately show that IE ∫ g dΨ^{(n)} is well approximated by the ESF(θ) approximation IE ∫ g dΨ^{*(n)}.

Theorem 7.4 Under Conditions (A0) and (B01), it follows that

dW(L(∫ g dΨ^{(n)}), L(∫ g dΨ^{*(n)}))

    = O(min_{3θ≤b≤(n+1)/2} [(nb^{−1−β_{01}+δ} + n^{−[a_1∧(a_2−1)]+δ}) m_g(b/n) + ∫_0^{b/n} x^{−1} m_g(x) dx]),

for any δ > 0, small with n if ∫_0^1 x^{−1} m_g(x) dx < ∞; here, as usual, we have β_{01} = (1 ∧ θ ∧ g_1 ∧ a_1).

Proof. By definition,

∫ g dΨ^{(n)} = ∑_{j≥1} g(n^{−1}L_j^{(n)}) = ∑_{i=1}^n g(i/n) C_i^{(n)}.

We thus need to realize C^{(n)}[1, n] and C^{*(n)}[1, n] on the same probability space in such a way that

IE |∑_{i=1}^n g(i/n) C_i^{(n)} − ∑_{i=1}^n g(i/n) C_i^{*(n)}|

can be shown to be suitably small. So fix any b ≥ 3θ, and realize C^{(n)}[1, n] and C^{*(n)}[1, n] on the same probability space to minimize IP[D_{bn}], where D_{bn} = {C^{(n)}[b + 1, n] ≠ C^{*(n)}[b + 1, n]}. Then, splitting the sums into the ranges i ≤ b and i > b, we immediately have the upper bound

|∑_{i=1}^n g(i/n) C_i^{(n)} − ∑_{i=1}^n g(i/n) C_i^{*(n)}|

    ≤ 2(n/b) m_g(b/n) I[D_{bn}] + ∑_{i=1}^b m_g(i/n) {C_i^{(n)} + C_i^{*(n)}},        (7.7)

because, by the concavity of m_g,

∑_{i=b+1}^n |g(i/n)| n_i ≤ ∑_{i=b+1}^n n_i m_g(i/n) ≤ (n/b) m_g(b/n)

for all choices of n_i ≥ 0 such that ∑_{i=b+1}^n i n_i ≤ n.

Now, from Lemma 13.3 and Proposition 6.2 (c), and if b ≤ (n + 1)/2, we have

∑_{i=1}^b m_g(i/n) {IE C_i^{(n)} + IE C_i^{*(n)}}

    = O(m_g(b/n) n^{−[a_1∧(a_2−1)]+δ} + ∑_{i=1}^b i^{−1} m_g(i/n))

    = O(m_g(b/n) n^{−[a_1∧(a_2−1)]+δ} + ∫_0^{b/n} x^{−1} m_g(x) dx)

for any δ > 0, under Conditions (A0) and (B01). Hence, and from Theorem 6.9 and (6.35),

IE |∑_{i=1}^n g(i/n) C_i^{(n)} − ∑_{i=1}^n g(i/n) C_i^{*(n)}|

    = O((nb^{−1−β_{01}+δ} + n^{−[a_1∧(a_2−1)]+δ}) m_g(b/n) + ∫_0^{b/n} x^{−1} m_g(x) dx),

as required.        □

Remark. Under Condition (G), where a_1 and a_2 can be taken arbitrarily large and S(∞) < ∞, the order can be improved to

O(min_{3θ≤b≤(n+1)/2} [(nb^{−1−β_0} log^{1+s(θ)} b) m_g(b/n) + ∫_0^{b/n} x^{−1} m_g(x) dx]),

using the remark following Theorem 6.9; in this expression, β_0 = (1 ∧ θ ∧ g_1) as usual, and s(θ) = 1l{θ = 1} + 1l{g_1 = θ ≤ 1}.

Corollary 7.5 If m_g(x) ≤ x^α for some 0 < α ≤ 1, then, under Conditions (A0) and (B01),

|IE ∫ g dΨ^{(n)} − IE ∫ g dΨ^{*(n)}| = O(n^{−αβ_{01}/(β_{01}+1)+δ})

for any δ > 0. Under Condition (G), the order can be improved to O(n^{−αβ_0/(β_0+1)} log^{1+s(θ)} n).

Proof. In general, take b = n^{1/(β_{01}+1)+δ′} for any δ′ > 0. Under Condition (G), take b = n^{1/(β_0+1)}.        □

Approximating Ψ^{*(n)} by Ψ^*

Theorem 7.4 relates IE ∫ g dΨ^{(n)} to IE ∫ g dΨ^{*(n)}; it thus remains to strengthen Theorem 4.26 by showing that IE ∫ g dΨ^{*(n)} is close to IE ∫ g dΨ^* for similarly large classes of functions g. The argument is slightly different from that for Theorem 7.4, which relies on a total variation coupling implied by Theorem 6.9. Here, since n^{−1}L_j^{*(n)} always takes rational values but L_j a.s. never does, exact matching in total variation is no longer appropriate, and the functionals under consideration must be robust with respect to small perturbations of the measures. Accordingly, we restrict consideration to continuous functions g, denoting the modulus of continuity by

w_g(x) = sup_{0≤y,z≤1, |y−z|≤x} |g(y) − g(z)|.


The accuracy of the approximation of IE ∫ g dΨ^{*(n)} by IE ∫ g dΨ^* now depends not only on m_g, but also on w_g. The main result is as follows.

Theorem 7.6 In the above setting,

dW(L(∫ g dΨ^{*(n)}), L(∫ g dΨ^*))

    ≤ min_{3≤k≤n} {2(θ ∨ 1) ∫_0^{(k+|θ−1|)/n} x^{−1} m_g(x) dx

      + [w_g(1/n) + w_g(|1 − θ|/n)] (1 + θ log((n − 1)/(k − 1)))

      + (θ^2/(2(k − 2))) [m_g(1/n) + w_g(|1 − θ|/n)]}.

Before proving the theorem, we note two immediate consequences.

Corollary 7.7 If ∫_0^1 x^{−1} m_g(x) dx < ∞ and g is continuous, then

lim_{n→∞} |IE ∫ g dΨ^{*(n)} − IE ∫ g dΨ^*| = 0.        □

Corollary 7.8 If w_g(x) ≤ x^α and m_g(x) ≤ x^α for all 0 ≤ x ≤ 1 and for some 0 < α ≤ 1, then

|IE ∫ g dΨ^{*(n)} − IE ∫ g dΨ^*| = O(n^{−α} log n).        □

Proof. Let M be the scale invariant Poisson process with rate θ/x on (0, 1], and denote its points by W_1 > W_2 > · · ·; define W_0 = 1. Then Ψ^* can be constructed from M by setting

Ψ^* = ∑_{j≥1} δ_{(W_{j−1} − W_j)}.

There is an analogous realization of Ψ^{*(n)}, the Feller coupling, defined at the start of Chapter 4. As in (4.2), let (ξ_j, j ≥ 1) be independent Be(θ/(θ + j − 1)) random variables, and set

J_0^{(n)} = n + 1;   J_m^{(n)} = max{j < J_{m−1}^{(n)} : ξ_j = 1} if J_{m−1}^{(n)} > 1, and J_m^{(n)} = 1 if J_{m−1}^{(n)} = 1;

then write C_j^{*(n)} = ∑_{m≥1} I[J_{m−1}^{(n)} − J_m^{(n)} = j]. As for (4.5), it follows that (C_1^{*(n)}, . . . , C_n^{*(n)}) so defined is indeed distributed according to the Ewens Sampling Formula with parameter θ. Scaling down to [0, 1], we write


Y_m^{(n)} = n^{−1}(J_m^{(n)} − 1), m ≥ 0, so that Y_m^{(n)} ∈ {0, 1/n, . . . , 1}, Y_0^{(n)} = 1 and Y_m^{(n)} = 0 for all m large enough; then

Ψ^{*(n)} = ∑_{m≥1} δ_{(Y_{m−1}^{(n)} − Y_m^{(n)})}.

The proof consists of coupling the (W_j, j ≥ 1) to the (Y_m^{(n)}, m ≥ 1), in such a way that their differences match closely enough for

IE |∫ g dΨ^{*(n)} − ∫ g dΨ^*|

to be bounded as in the statement of the theorem.

Our argument actually uses two couplings. The first is a coupling of (Y_m^{(n)}, m ≥ 1) to a Poisson process M_n on [0, 1] which has intensity μ_n(x) satisfying

IP[M_n[(j − 1)/n, j/n) = 0] = exp{−∫_{(j−1)/n}^{j/n} μ_n(x) dx} = (j − 1)/(θ + j − 1) = IP[ξ_j = 0]        (7.8)

for 1 ≤ j ≤ n; a suitable choice of μ_n is given by

μ_n(y) = n[h(ny + θ) − h(ny)],        (7.9)

where, as in (4.17), h(x) = γ + Γ′(x)/Γ(x) in x > 0; note, in particular, that h is concave and h(x + 1) − h(x) = 1/x for all x > 0. Then, for fixed y > 0, μ_n(y) ∼ θ/y as n → ∞, so that M_n apparently differs little from the Poisson process M if n is large. The second coupling makes this heuristic precise.

Let the points of M_n be denoted by (W_m^{(n)}, m ≥ 1), and set W_0^{(n)} = 1. In view of (7.8), M_n and (Y_m^{(n)}, m ≥ 0) can be constructed on the same probability space in such a way that M_n[(j − 1)/n, j/n) = 0 exactly when ξ_j = 0, where the ξ_j's are as in the construction of the Y_m^{(n)}. Thus the points W_m^{(n)} occur only in intervals [(j − 1)/n, j/n) for values of j such that ξ_j = 1 (and hence (j − 1)/n = Y_{m′}^{(n)} for some m′ ≤ m), but there can be more than one of the W_m^{(n)} in any such interval, so that m ≠ m′ in general. However, if W_m^{(n)} and W_{m+1}^{(n)} are in the same interval [(j − 1)/n, j/n), then |g(W_m^{(n)} − W_{m+1}^{(n)})| ≤ m_g(1/n), whereas, if W_m^{(n)} and W_{m+1}^{(n)} are in different intervals, there is an m′ such that

|(Y_{m′}^{(n)} − Y_{m′+1}^{(n)}) − (W_m^{(n)} − W_{m+1}^{(n)})| ≤ 1/n,

implying that

|g(Y_{m′}^{(n)} − Y_{m′+1}^{(n)}) − g(W_m^{(n)} − W_{m+1}^{(n)})| ≤ w_g(1/n).

So, fixing any 1 ≤ k ≤ n, we define

N_0(k) = 1 + M_n[k/n, 1),

N_1(k) = 1 + ∑_{m≥1} I[Y_m^{(n)} ≥ k/n] = 1 + ∑_{j=k+1}^n ξ_j

and

N_2(k) = ∑_{j=k+1}^n {M_n[(j − 1)/n, j/n) − ξ_j},

noting that N_1(k) + N_2(k) = N_0(k); then, almost surely, with the above construction,

|∫ g dΨ^{*(n)} − ∑_{m=1}^{N_0(k)} g(W_{m−1}^{(n)} − W_m^{(n)})|

    = |∑_{m≥1} g(Y_{m−1}^{(n)} − Y_m^{(n)}) − ∑_{m=1}^{N_0(k)} g(W_{m−1}^{(n)} − W_m^{(n)})|

    ≤ N_1(k) w_g(1/n) + N_2(k) m_g(1/n) + ∑_{m≥N_1(k)} m_g(Y_m^{(n)})

    = N_1(k) w_g(1/n) + N_2(k) m_g(1/n) + ∑_{j=1}^k ξ_j m_g((j − 1)/n).        (7.10)

We now investigate the difference between ∑_{m=1}^{N_0(k)} g(W_{m−1}^{(n)} − W_m^{(n)}) and ∫ g dΨ^* by matching the Poisson processes M_n and M point by point, using an operational time coupling. We first define M_n(x) = ∫_x^1 μ_n(y) dy and M(x) = ∫_x^1 θy^{−1} dy = −θ log x, and then realize M by defining its points as displacements of the points of M_n:

W_m = M^{−1}(M_n(W_m^{(n)})),   m ≥ 0.

The closeness of this coupling can thus be deduced from the properties of the function M^{−1}(M_n(·)) on [0, 1].

Take first the case θ > 1. Then, from the concavity of h and by considering the slopes of chords, the definition (7.9) of μ_n implies that

θ/w ≥ μ_n(w) ≥ nθ/(nw + θ − 1),   0 < w < 1,

and hence that

M(w) ≥ M_n(w) ≥ −θ log((nw + θ − 1)/(n + θ − 1)).

Thus

w ≤ M^{−1}(M_n(w)) ≤ w + (θ − 1)(1 − w)/(n + θ − 1) ≤ w + (θ − 1)/n

for all w ∈ (0, 1), implying in turn that

0 ≤ W_m − W_m^{(n)} ≤ n^{−1}(θ − 1) for all m.

If θ < 1, an analogous argument leads to the conclusion that

0 ≥ W_m − W_m^{(n)} ≥ −n^{−1}(1 − θ) for all m.

Hence, whatever the value of θ,

|g(W_m − W_{m+1}) − g(W_m^{(n)} − W_{m+1}^{(n)})| ≤ w_g(n^{−1}|1 − θ|)

for all m, implying as a result that

|∑_{m=1}^{N_0(k)} g(W_{m−1}^{(n)} − W_m^{(n)}) − ∫ g dΨ^*|

    = |∑_{m=1}^{N_0(k)} g(W_{m−1}^{(n)} − W_m^{(n)}) − ∑_{m≥1} g(W_{m−1} − W_m)|

    ≤ N_0(k) w_g(n^{−1}|1 − θ|) + ∑_{m≥N_0(k)} m_g(W_m),        (7.11)

and hence from (7.10) that, for each 1 ≤ k ≤ n,

|∫ g dΨ^{*(n)} − ∫ g dΨ^*|

    ≤ N_1(k) w_g(1/n) + N_2(k) m_g(1/n) + ∑_{j=1}^k ξ_j m_g((j − 1)/n)

      + N_0(k) w_g(n^{−1}|1 − θ|) + ∑_{m≥N_0(k)} m_g(W_m).        (7.12)

It now remains to take expectations on the right hand side of (7.12), and to make simplifying estimates. Clearly,

IE N_1(k) = 1 + ∑_{j=k+1}^n θ/(θ + j − 1) ≤ 1 + θ log((n − 1)/(k − 1)).

Then

IE N_2(k) ≤ ∑_{j=k+1}^n (1/2) IE{M_n[(j − 1)/n, j/n)}^2 ≤ ∑_{j=k+1}^n (1/2)(θ/(j − 2))^2 ≤ θ^2/(2(k − 2))

if k ≥ 3, since μ_n(x) ≤ nθ/(j − 2) when j ≥ 3 and (j − 1)/n ≤ x < j/n. For the remainder, because m_g is increasing and n/(θ + j − 1) ≤ 1/(xθ̄) when j ≥ 1 and (j − 1)/n ≤ x < j/n, we have

IE ∑_{j=1}^k ξ_j m_g((j − 1)/n) = ∑_{j=1}^k (θ/(θ + j − 1)) m_g((j − 1)/n) ≤ (θ ∨ 1) ∫_0^{k/n} x^{−1} m_g(x) dx;        (7.13)

and since |W_{N_0(k)} − W_{N_0(k)}^{(n)}| ≤ n^{−1}|1 − θ| and also W_{N_0(k)}^{(n)} < k/n, the latter from the definition of N_0(k), it follows that

IE{∑_{m≥N_0(k)} m_g(W_m)} ≤ θ ∫_0^{(k+|1−θ|)/n} x^{−1} m_g(x) dx.        (7.14)

The theorem follows by putting these bounds into (7.12).        □
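The displacement bound 0 ≤ W_m − W_m^{(n)} ≤ n^{−1}(θ − 1) used above can be checked numerically: since h(x) = γ + Γ′(x)/Γ(x), the function M_n has the closed form M_n(w) = log(Γ(n + θ)/Γ(n)) − log(Γ(nw + θ)/Γ(nw)), and M^{−1}(u) = e^{−u/θ}. A sketch (ours, standard library only):

```python
from math import lgamma, exp

# Numerical sanity check (ours, not from the book) of the bound
# 0 <= M^{-1}(M_n(w)) - w <= (theta - 1)/n for theta > 1, using the
# closed form of M_n in terms of log-gamma, and M^{-1}(u) = exp(-u/theta).
theta, n = 2.5, 1000

def M_n(w):
    return (lgamma(n + theta) - lgamma(n)) - (lgamma(n * w + theta) - lgamma(n * w))

for k in range(1, 100):
    w = k / 100.0
    shifted = exp(-M_n(w) / theta)       # M^{-1}(M_n(w))
    assert w - 1e-9 <= shifted <= w + (theta - 1) / n + 1e-9
print("displacement bound verified on a grid")
```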

Wasserstein distance

The almost sure inequalities (7.7) and (7.12), which are satisfied by our couplings, enable us to strengthen Theorems 7.4 and 7.6, because they give stochastic bounds which are uniform over large classes of functions g. For instance, define

G_α = {g : g(0) = 0, |g(x) − g(y)| ≤ |x − y|^α}

for any 0 < α ≤ 1; then, for all g ∈ G_α, m_g(x) ≤ x^α and w_g(x) ≤ x^α. Hence, for our couplings, we find that, for 3θ ≤ b ≤ (n + 1)/2,

sup_{g∈G_α} |∫ g dΨ^{(n)} − ∫ g dΨ^{*(n)}| ≤ 2(n/b)^{1−α} I[D_{bn}] + ∑_{i=1}^b (i/n)^α {C_i^{(n)} + C_i^{*(n)}},        (7.15)

and, for 1 ≤ k ≤ n, that

sup_{g∈G_α} |∫ g dΨ^{*(n)} − ∫ g dΨ^*|        (7.16)

    ≤ N_0(k) n^{−α} {1 + |1 − θ|^α} + ∑_{j=1}^k ξ_j {(j − 1)/n}^α + ∑_{m≥N_0(k)} W_m^α.

Now the expectations of the right hand sides of (7.15) and (7.16) have already been bounded, for suitable choices of b and k, in Corollaries 7.5 and 7.8, uniformly for all g ∈ G_α. This fact can itself be translated into statements about the distances between L(Ψ^{(n)}), L(Ψ^{*(n)}) and L(Ψ^*).

For any 0 < α ≤ 1, let N_α be the set of measures Ψ on (0, 1] consisting of positive integer valued atoms, which assign finite mass to any interval (a, 1] for a > 0, and which have ∫ x^α Ψ(dx) < ∞. Define a distance on N_α by

d_α(Ψ_1, Ψ_2) = sup_{g∈G_α} |∫ g dΨ_1 − ∫ g dΨ_2|.

The corresponding Wasserstein distance between the distributions of random elements of N_α we denote by

ρ_α(L(Ψ_1), L(Ψ_2)) = inf IE d_α(Ψ_1, Ψ_2),

where the infimum is taken over all realizations of Ψ_1 and Ψ_2 on a common probability space. The above considerations then imply the following theorem.

Theorem 7.9 Under Conditions (A0) and (B01), for any 0 < α ≤ 1,

(1) ρ_α(L(Ψ^{(n)}), L(Ψ^{*(n)})) = O(n^{−αβ_{01}/(β_{01}+1)+δ});

(2) ρ_α(L(Ψ^{*(n)}), L(Ψ^*)) = O(n^{−α} log n),

for any δ > 0, where β_{01} = (1 ∧ θ ∧ g_1 ∧ a_1). Thus also

(3) ρ_α(L(Ψ^{(n)}), L(Ψ^*)) = O(n^{−αβ_{01}/(β_{01}+1)+δ})

for any δ > 0. Under Condition (G), (3) becomes

(3′) ρ_α(L(Ψ^{(n)}), L(Ψ^*)) = O(n^{−αβ_0/(β_0+1)} log^{1+s(θ)} n),

with s(θ) = 1l{θ = 1} + 1l{g_1 = θ ≤ 1} and β_0 = (1 ∧ θ ∧ g_1).        □

Part (1) gives an estimate of the ρ_α–distance between the distributions of the normalized large component sizes in the combinatorial structure and that of the normalized large cycle lengths under the Ewens Sampling Formula; Part (2) relates the latter distribution to the Poisson–Dirichlet distribution. Part (3) combines the two, and gives a direct estimate of the distance in distribution between the large component sizes and the Poisson–Dirichlet distribution, sharpening the IR^∞ weak convergence conclusion in Theorem 5.8.

7.3 The number of components

As in (6.2), let K_{0n} = K_{0n}(C^{(n)}) denote the total number of components in a random combinatorial structure. For random permutations under the Ewens Sampling Formula, we saw in Chapter 4.6 that K_{0n} has the distribution of ∑_{i=1}^n ξ_i, where the ξ_i are independent Be(θ/(θ + i − 1)) random variables. Hence, in particular, K_{0n} approximately has a Poisson distribution (cf. Barbour and Hall (1984)):

dTV(L(K_{0n}), Po(κ_{0n})) = O(1/log n),        (7.17)

where κ_{0n} = ∑_{i=1}^n θ/(θ + i − 1).
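For moderate n the comparison in (7.17) can be carried out exactly: convolve the Bernoulli laws of the ξ_i to get the law of K_{0n}, and compute the total variation distance to Po(κ_{0n}) directly. A sketch (ours, standard library only, shown for θ = 1; the Poisson mass above n is negligible here and is ignored):

```python
from math import exp, log, lgamma

# Illustration (ours): K_0n for the ESF is a sum of independent
# Bernoulli(theta/(theta + i - 1)) variables; build its exact law by
# convolution and compare with Po(kappa_0n) in total variation.
theta, n = 1.0, 200
p = [theta / (theta + i - 1) for i in range(1, n + 1)]
kappa = sum(p)

dist = [1.0]                                  # law of the partial sum
for pi in p:
    new = [0.0] * (len(dist) + 1)
    for k, mass in enumerate(dist):
        new[k] += mass * (1 - pi)
        new[k + 1] += mass * pi
    dist = new

def po_pmf(k, lam):
    return exp(-lam + k * log(lam) - lgamma(k + 1))

dtv = 0.5 * sum(abs(dist[k] - po_pmf(k, kappa)) for k in range(n + 1))
print(round(dtv, 4))                          # small, of order 1/log n
```

The Stein–Chen method guarantees dTV ≤ min(1, 1/κ_{0n}) ∑ p_i^2 here, so the computed value is well below 0.3 for these parameters.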

Distributional approximations to K_{0n} have a long history. Goncharov (1942, 1944), Kolchin (1971) and Pavlov (1988) study random permutations, the latter implying that

dTV(L(K_{0n}), Po(log n)) = O((log n)^{−1/2+ε}),        (7.18)

for any ε > 0. Analogous results for random mappings were proved by Stepanov (1969), Kolchin (1976) and Pavlov (1988), for random mapping patterns by Mutafciev (1988), for random polynomials over a finite field by Car (1982), Hansen (1993) and Arratia, Barbour and Tavaré (1993), and for the irreducible factors of the characteristic polynomial of a matrix T ∈ GL_n(F_q) by Goh and Schmutz (1991). Brenti's (1989, Theorem 6.4.2) remarkable representation of the law of K_{0n} for random mappings as the law of ∑_{i=1}^n ξ_i for independent Bernoulli random variables ξ_i implies a Poisson approximation analogous to (7.17), and Hwang (1999) proves a Poisson approximation of the same accuracy to the distribution of K_{0n}, for a wide class of logarithmic assemblies, multisets and selections, and determines the leading asymptotic term in the error. We now consider approximations similar to that of (7.17) for our more general logarithmic combinatorial structures.

A shortcut approach to proving distributional approximations to L(K_{0n}) would be to take t = 1 in Theorem 7.3, leading to a normal approximation for K_{0n} with error of order O(log log n (log n)^{−1/2}) with respect to a bounded Wasserstein metric, suitable for measuring the accuracy of weak convergence in a very general setting. However, this order is inferior to that in (7.17), and such a result gives no information at all about approximation in total variation, which is a much stronger concept, and is correspondingly more difficult to work with.

We are nonetheless able to establish two analogues of (7.17) in Theorems 7.12 and 7.15, under very general conditions. Our proofs of these theorems separate the treatment of the small and the large components. The small components are handled by using Theorem 6.7, which implies that the distributions of K_{0b}(C^{(n)}) and K_{0b}(Z) can be matched, for suitable choice of b; the conditional distribution of K_{bn}(C^{(n)}) given {C^{(n)}[1, b] = c[1, b]} is close to that of K_{bn}(C^{*(n)}) given {C^{*(n)}[1, b] = c[1, b]}, where C^{*(n)} has the distribution ESF_n(θ), by (12.19), which is the essence of the proof of Theorem 6.10. The remaining ingredient is the following theorem, showing that, for all c_1, . . . , c_b outside a set of small probability under ESF_n(θ), the conditional distribution of K_{bn}(C^{*(n)}) given {C^{*(n)}[1, b] = c[1, b]} is close to the same fixed Poisson distribution; the proof is given in Chapter 12.10.


We recall the definition (4.17) for the harmonic numbers h(·):

h(t + 1) = γ + Γ′(t + 1)/Γ(t + 1),   t ∈ IR_+,

and we define

λ_{bn} = θ{h(n + 1) − h(b + 1)}.

Theorem 7.10 The estimate

dTV(L(K_{bn}(C^{*(n)}) | T_{bn}(C^{*(n)}) = l), Po(λ_{bn} − θh(θ + 1) + 1)) = O(λ_{bn}^{−1} + λ_{bn}^{−1/2} n^{−1}(n − l))

is valid uniformly in n/2 ≤ l ≤ n and 0 ≤ b ≤ n/4.

Approximation by K_{0,⌊αn⌋}(Z)

The detailed argument which proves Theorems 7.12 and 7.15 runs as follows. We assume that Conditions (A0) and (B01) hold, and we choose b = b(n) = ⌊n^β⌋, for some fixed β < (1/2)(g_1 ∧ a_1 ∧ 1). We then begin by approximating the distribution of K_{0n} = K_{0n}(C^{(n)}) by Q_n^*, the convolution of R_{nb} = Po(λ_{bn} + θ log α), where α = α_θ = exp{θ^{−1} − h(θ + 1)}, and of L(K_{0b}(Z)).

Lemma 7.11 For any combinatorial structure satisfying Conditions (A0) and (B01),

dTV(L(K_{0n}(C^{(n)})), Q_n^*) = O(1/log n).

Proof. Writing p_{kt}(X) = IP[K_{0b}(X) = k, T_{0b}(X) = t], and suppressing the superscript (n), direct calculation shows that

∆_2 = 2 dTV(L(K_{0n}(C)), Q_n^*)

    ≤ ∑_{k≥0} ∑_{t≥0} ∑_{s≥0} |IP[K_{0b}(C) = k, T_{0b}(C) = t, K_{bn}(C) = s] − IP[K_{0b}(Z) = k, T_{0b}(Z) = t] R_{nb}{s}|

    ≤ ∑_{k≥0} ∑_{t≥0} p_{kt}(C) × ∑_{s≥0} |IP[K_{bn}(C) = s | K_{0b}(C) = k, T_{0b}(C) = t] − R_{nb}{s}|

      + ∑_{k≥0} ∑_{t≥0} |p_{kt}(C) − p_{kt}(Z)|.

Now the latter sum is just 2 dTV(L(K_{0b}(C), T_{0b}(C)), L(K_{0b}(Z), T_{0b}(Z))), which is bounded by O(ε_{6.7}(n, n^β)) = O(n^{−δ}) for some δ > 0, from Theorem 6.7 and the remark following it. Furthermore, by the Conditioning Relation and independence,

IP[K_{bn}(C) = s | K_{0b}(C) = k, T_{0b}(C) = t] = IP[K_{bn}(C) = s | T_{bn}(C) = n − t] = IP[K_{bn}(Z) = s | T_{bn}(Z) = n − t];

the full argument is much as in (12.40) below. Hence we reach the estimate

∆_2 ≤ ∑_{k≥0} ∑_{t≥0} p_{kt}(C) × {2 dTV(L(K_{bn}(Z) | T_{bn}(Z) = n − t), L(K_{bn}(Z^*) | T_{bn}(Z^*) = n − t))

      + 2 dTV(L(K_{bn}(Z^*) | T_{bn}(Z^*) = n − t), R_{nb})} + O(n^{−δ}),

for some δ > 0. For 0 ≤ t ≤ n/2, bound the second of these distances using Theorem 7.10, giving a quantity of order

O(1/log n) + O(λ_{bn}^{−1/2} n^{−1} IE T_{0b}) = O(1/log n),

by Lemma 6.3 and the definition of b = b(n); in the same range of t, the first of the distances is bounded by O(n^{−δ}) for some δ > 0, from (12.19). Then, finally,

∑_{k≥0} ∑_{t>n/2} p_{kt}(C) = IP[T_{0b}(C) > n/2] ≤ n^{−1} IE T_{0b}(Z) + O(ε_{6.7}(n, n^β)) = O(n^{−δ}),

for some δ > 0, by Theorem 6.7 and Lemma 6.3 and from the choice of b. This completes the proof of the lemma.        □

Theorem 7.12 For any combinatorial structure satisfying Conditions (A0) and (B01), we have

dTV(L(K_{0n}(C^{(n)})), L(K_{0,⌊αn⌋}(Z))) = O(1/log n),        (7.19)

where α = α_θ = exp{θ^{−1} − h(θ + 1)}.

Proof. By Lemma 10.1, taking b = ⌊n^β⌋ as for Lemma 7.11, it follows that

dTV(L(K_{b,⌊αn⌋}(Z)), Po(θ{h(⌊αn⌋ + 1) − h(b + 1)})) = O(n^{−δ})        (7.20)

for some δ > 0, under Conditions (A0) and (B01). However, from the definition of α, we have

θ{h(⌊αn⌋ + 1) − h(b + 1)} = θ(h(⌊αn⌋ + 1) − h(n + 1)) + θ(h(n + 1) − h(b + 1)) = θ(log α + O(n^{−1})) + λ_{bn},        (7.21)

uniformly in b, and thus

dTV(Po(θ{h(⌊αn⌋ + 1) − h(b + 1)}), Po(λ_{bn} + θ log α)) = O(n^{−1}).        (7.22)

Combining (7.20) and (7.22) with Lemma 7.11, the estimate (7.19) is established.        □

Poisson approximation

The next step is to investigate when approximation by a Poisson distribution is appropriate. We give two such approximations; the second is usually sharper, but requires the existence of the second moments of the Zi. The main effort is in proving the next two lemmas.

Lemma 7.13 If Conditions (A0) and (B01) hold, then

dTV( Q*n, Po(λ*0n) ) = O( (log n)^{−1/2} ),

where

λ*0n = Σ_{i=1}^{b} θ i^{−1}(1 + εi1) + λbn + θ log α_θ,

and b = ⌊n^β⌋ as for Lemma 7.11.

Proof. Let W1 = K0b(Z), and let W2 ∼ Po(λbn + θ log α) be independent of Z[1, b]; write

λ1 = Σ_{i=1}^{b} θ i^{−1}(1 + εi1) = Σ_{i=1}^{b} ri IP[Zi1 = 1];   λ2 = λbn + θ log α.

Then, from the Stein–Chen method (Barbour, Holst and Janson (1992)), we have, for any A ⊂ ZZ+,

Q*n(A) − Po(λ1 + λ2){A}
  = IE{ (λ1 + λ2)gA(W1 + W2 + 1) − (W1 + W2)gA(W1 + W2) }
  = IE{ λ1 gA(W1 + W2 + 1) − W1 gA(W1 + W2) },   (7.23)

where ‖gA‖ ≤ (λ1 + λ2)^{−1/2} and ‖∆gA‖ ≤ (λ1 + λ2)^{−1}, and the last equality follows because W2 ∼ Po(λ2) is independent of W1. Now, for each fixed w2,

IE{ W1 gA(W1 + w2) }
  = Σ_{i=1}^{b} ri { IP[Zi1 = 1] IE gA(W1^{(i)} + w2 + 1)
      + Σ_{s≥2} s IP[Zi1 = s] IE gA(W1^{(i)} + w2 + s) },   (7.24)

where W1^{(i)} = W1 − Zi1; thus, by the independence of Z[1, b] and W2,

| IE{ λ1 gA(W1 + W2 + 1) − W1 gA(W1 + W2) } |
  ≤ Σ_{i=1}^{b} ri IP[Zi1 = 1] | IE{ gA(W1 + W2 + 1) − gA(W1^{(i)} + W2 + 1) } |
      + (λ1 + λ2)^{−1/2} Σ_{i=1}^{b} ri IE{ Zi1 I[Zi1 ≥ 2] }
  ≤ (λ1 + λ2)^{−1} Σ_{i=1}^{b} ri IP[Zi1 = 1] IE Zi1 + (λ1 + λ2)^{−1/2} θ Σ_{i=1}^{b} i^{−1} Fi1
  = O( (log n)^{−1/2} ),

by Conditions (A0) and (B01). This proves the lemma. □
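The Stein–Chen bound invoked here can be seen in action in its simplest form: for a sum W of independent Bernoulli(p_i) indicators with λ = Σ p_i, one has dTV(L(W), Po(λ)) ≤ min(1, λ^{−1}) Σ p_i² (Barbour, Holst and Janson (1992)). The following sketch (plain Python; the function names and the choice p_i ≈ θ/i are ours, not from the text) computes both sides of the inequality exactly for a small example:

```python
import math

def bernoulli_sum_pmf(ps):
    # Exact pmf of W = sum of independent Bernoulli(p) indicators, by convolution.
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += (1 - p) * mass
            new[k + 1] += p * mass
        pmf = new
    return pmf

def stein_chen_check(ps):
    lam = sum(ps)
    pmf = bernoulli_sum_pmf(ps)
    poisson = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(len(pmf))]
    # Total variation distance; beyond len(pmf) - 1 the Bernoulli-sum pmf is 0,
    # so the remaining Poisson tail mass contributes directly.
    dtv = 0.5 * (sum(abs(a - b) for a, b in zip(pmf, poisson)) + (1 - sum(poisson)))
    bound = min(1.0, 1.0 / lam) * sum(p * p for p in ps)
    return dtv, bound

# p_i roughly theta/i with theta = 1, mimicking a logarithmic structure
dtv, bound = stein_chen_check([1.0 / i for i in range(2, 32)])
print(dtv <= bound)
```

With λ ≍ θ log n and Σ p_i² convergent, the bound is of order 1/log n, matching the rates appearing in Lemmas 7.13 and 7.14.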

Under stronger hypotheses, sharper rates can be obtained.

Lemma 7.14 Suppose that Conditions (A0) and (B02) hold. Then

dTV( Q*n, Po(µ*0n) ) = O(1/log n),

where µ*0n = Σ_{i=1}^{b} IE Zi + λbn + θ log α_θ, and b = ⌊n^β⌋ as for Lemma 7.11.

Proof. Take λ1 = Σ_{i=1}^{b} IE Zi = Σ_{i=1}^{b} ri IE Zi1, keeping λ2 = λbn + θ log α as above, and argue as in Lemma 7.13 to reach (7.23) and (7.24). Now, with the new definition of λ1, we can write

| IE{ λ1 gA(W1 + W2 + 1) − W1 gA(W1 + W2) } |
  ≤ Σ_{i=1}^{b} ri Σ_{s≥1} s IP[Zi1 = s] | IE{ gA(W1 + W2 + 1) − gA(W1^{(i)} + W2 + s) } |
  ≤ (λ1 + λ2)^{−1} Σ_{i=1}^{b} ri { IP[Zi1 ≥ 1] IE Zi1 + IE Zi1(Zi1 − 1) }
  = O(1/log n),

which proves the lemma. □

Lemmas 7.13 and 7.14 can be used to deduce the following Poisson approximations.

Theorem 7.15 Under Conditions (A0) and (B01),

(1) dTV( L(K0n(C(n))), Po(θ log n) ) = O( (log n)^{−1/2} );

if also Condition (B02) holds, then

(2) dTV( L(K0n(C(n))), Po( Σ_{i=1}^{n} IE Zi + θ log α_θ ) ) = O( (log n)^{−1} ).

Proof. We begin by observing that

dTV( Po(µ), Po(ν) ) ≤ ν^{−1/2} |µ − ν|,   (7.25)

for any µ and ν (Yannaros; see Barbour, Holst and Janson (1992), Theorem 1.C (i)). Take b = ⌊n^β⌋ in Lemmas 7.11 and 7.13, and then apply (7.25) with µ = θ log n and ν = λ*0n: with these definitions, it follows that |µ − ν| = O(1 + Σ_{i=1}^{n} i^{−1}µi) = O(1) under Conditions (A0) and (B01), and ν ≍ log n. This proves Part (1).

For Part (2), take b = ⌊n^β⌋ in Lemmas 7.11 and 7.14, and then apply (7.25) with µ = Σ_{i=1}^{n} IE Zi + θ log α_θ and ν = µ*0n: once again, ν ≍ log n, and now

|µ − ν| = O( Σ_{i=b+1}^{n} i^{−1}µi ) = O( b^{−(g1∧a1)} ) = O(1/log n),

completing the proof. □

There is no obvious equivalent to Theorem 7.15 for the limiting Poisson–Dirichlet process PD(θ), since it has a.s. infinitely many points in (0, 1]. However, Hirth (1997) has shown the next best thing: that if Kε denotes the number of points of the Poisson–Dirichlet process in (ε, 1], then

dTV( L(Kε), Po(IEKε) ) = O( 1/log(ε^{−1}) )

uniformly as ε ↓ 0, with IEKε ≍ log(ε^{−1}).

Asymptotics of the mean

Theorems 7.12 and 7.15 above are directed to distributional approximation. However, the term 'logarithmic' as applied to combinatorial structures was motivated in (2.1) of Chapter 2 by appealing to the asymptotic relation IEK0n(C(n)) ∼ θ log n for some θ > 0. This relation is indeed true under our conditions, as the following theorem shows.

Theorem 7.16 Under Conditions (A0) and (B01), as n → ∞,

IEK0n(C(n)) ∼ θ log n.

Proof. It is clearly enough to show that |IEK0n(C(n)) − θh(n + 1)| = o(log n) as n → ∞. We begin by writing

IEK0n − θh(n + 1) = Σ_{j=1}^{n} { IECj(n) − θj^{−1} } = U1 + U2 + U3,   (7.26)

where

U1 = Σ_{j=⌊n/4⌋+1}^{n} IECj(n) − θ Σ_{j=⌊n/4⌋+1}^{n} 1/j;
U2 = Σ_{j=1}^{⌊n/4⌋} { rj IP[Cj1(n) = 1] − θj^{−1} },

and

U3 = Σ_{j=1}^{⌊n/4⌋} rj IE{ Cj1(n) I[Cj1(n) ≥ 2] }.

It is immediate that |U1| ≤ 4 + θ log 4. Then, arguing much as in the proof of Lemma 13.3, it follows that

|U3| ≤ Kn(2) Σ_{j=1}^{⌊n/4⌋} j^{−1}( µj + χ^{(θ)}_{j1} ),

and this is bounded under Conditions (A0) and (B01), in view of Propositions 6.1 and 6.2 (c). This leaves

U2 = Σ_{j=1}^{⌊n/4⌋} θj^{−1}(1 + εj1) { IP[T0n^{(j)}(Z) = n − j] / IP[T0n(Z) = n] − 1 },

because of the Conditioning Relation.

Now, from Lemma 9.2 and Theorem 11.10, it follows that

Σ_{j=1}^{⌊n/4⌋} j^{−1} |εj1| IP[T0n^{(j)}(Z) = n − j] / IP[T0n(Z) = n] = O(1)   (7.27)

as n → ∞, under Conditions (A0) and (B01), bounding the contribution to U2 from the part involving |εj1|. Then, by considering the possible values of Zj1, we have

IP[T0n(Z) = n − j] = ( 1 − (θ/jrj)(1 + Ej0) ) IP[T0n^{(j)}(Z) = n − j]
  + Σ_{r≥1} (θ/jrj) εjr IP[T0n^{(j)}(Z) = n − (r + 1)j] + (θ/jrj) IP[T0n^{(j)}(Z) = n − 2j],

so that

| IP[T0n(Z) = n − j] − IP[T0n^{(j)}(Z) = n − j] |
  ≤ (θ/jrj) { | IP[T0n^{(j)}(Z) = n − j] − IP[T0n^{(j)}(Z) = n − 2j] |
      + Σ_{r≥1} |εjr| | IP[T0n^{(j)}(Z) = n − (r + 1)j] − IP[T0n^{(j)}(Z) = n − j] | }.

Now, for s ≤ n/2, again from Lemma 9.2, nIP[T0n^{(j)}(Z) = n − s] ≤ 2K0θ; hence it follows that

n | IP[T0n(Z) = n − j] − IP[T0n^{(j)}(Z) = n − j] |
  ≤ 2K0θ^2 j^{−1} { 1 + Σ_{r≥1} |εjr| } + Σ_{r > ½⌊n/2j⌋} |εjr| θnj^{−1}
  ≤ 2K0θ^2 j^{−1} { 1 + ρj } + Σ_{r > ½⌊n/2j⌋} θnj^{−1} |εjr| · 4jr/n,

with the latter term bounded by Σ_{r>⌊log n⌋} 4θr ε*0r in j ≤ n/(4 log n) and by µ*0 < ∞ otherwise. Hence it follows from Theorem 11.1 that

Σ_{j=1}^{⌊n/4⌋} | IP[T0n(Z) = n − j] − IP[T0n^{(j)}(Z) = n − j] | / { j IP[T0n(Z) = n] } = o(log n).   (7.28)

Furthermore, by Theorem 11.1, we have nIP[T0n(Z) = n − j] ∼ pθ(1 − j/n) uniformly in 0 ≤ j ≤ n/2, and pθ is continuous at 1; hence also

Σ_{j=1}^{⌊n/4⌋} (1/j) | IP[T0n(Z) = n − j] / IP[T0n(Z) = n] − 1 | = o(log n).   (7.29)

Combining (7.27)–(7.29) gives |U2| = o(log n), and the theorem is proved. □
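For the Ewens Sampling Formula itself, the relation IEK0n ∼ θ log n can be seen directly: under ESF(θ), the number of components is distributed as a sum of independent Bernoulli(θ/(θ + i − 1)) indicators, i = 1, …, n (the Feller coupling), so the mean is available in closed form. A quick numeric sketch (plain Python, function name ours):

```python
import math

def esf_mean_components(n, theta):
    # IE K_n = sum_{i=1}^n theta / (theta + i - 1) under ESF(theta),
    # via the Feller coupling's independent Bernoulli indicators.
    return sum(theta / (theta + i - 1) for i in range(1, n + 1))

for n in (10**3, 10**5):
    print(n, esf_mean_components(n, 1.0) / math.log(n))  # ratio approaches 1
```

The slow O(1/log n) approach of the ratio to 1 is exactly why the error rates throughout this section are logarithmic rather than polynomial in n.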

Point probabilities

The accuracy of Theorem 7.15 (2) is sufficient to imply good estimates for point probabilities as well, which, in the body of the distribution, are of magnitude O((log n)^{−1/2}). Combining this observation with tilting, large deviation estimates for the point probabilities IP[K0n(C(n)) = k] can be derived, with relative errors uniformly small in ranges of the form

εθ log n ≤ k ≤ ε^{−1}θ log n,   (7.30)

for suitable ε > 0, provided that the tilted structures satisfy appropriate conditions. Asymptotic expansions for such probabilities have been obtained by Hwang (1998c) under rather different conditions, using generating function techniques.

To establish such estimates, we assume that the ϕ0–tilted structure satisfies Conditions (A0) and (B01), for some ϕ0 > 1. We then take any fixed ε > 1/ϕ0, and consider values of k in the corresponding range (7.30). We begin with some useful lemmas.

Lemma 7.17 In the above setting,

| Σ_{i=1}^{n} IE Zi(ϕ) + θ log α_θ − ϕθ log n | = O(1),

uniformly in 0 ≤ ϕ ≤ ε^{−1}.

Proof. Using the notation of Chapter 6.4, we have

IE Zi(ϕ) = ri IE Zi1(ϕ) = (θ / (iMi(ϕ))) { ϕ(1 + εi1) + Σ_{l≥2} lϕ^l εil },

where, as before, Mi(ϕ) = IE ϕ^{Zi1}. Hence

| Σ_{i=1}^{n} IE Zi(ϕ) − ϕθ log n |
  ≤ ϕθ | Σ_{i=1}^{n} 1/i − log n | + Σ_{i=1}^{n} (ϕθ / (iMi(ϕ))) |Mi(ϕ) − 1|
      + Σ_{i=1}^{n} (ϕθ / (iMi(ϕ))) Σ_{l≥1} lϕ^{l−1} |εil|.

Now, from inequalities (6.21) and (6.22), |Mi(ϕ) − 1| = O(1/iri) uniformly in 0 ≤ ϕ ≤ ϕ0; from (6.23), |εi1| = O(i^{−(g1∧1)}); and from (6.19), for l ≥ 2,

lϕ^{l−1} εil = l(ϕ/ϕ0)^{l−1} Mi(ϕ0) εil(ϕ0).

Since also the ϕ0–tilted structure satisfies Conditions (A0) and (B01) and (ϕ/ϕ0) ≤ 1/(εϕ0) < 1, the conclusion of the lemma is immediate. □

Lemma 7.18 Let β2 = (1 ∧ g1 ∧ a1) > 0 be as for Theorem 6.11. Then, in the above setting,

(1) | log IE ϕ^{Zi} − θi^{−1}(ϕ − 1) | = O( i^{−(1+β2)} );

(2) | (d/dϕ) log IE ϕ^{Zi} − θi^{−1} | = O( i^{−(1+β2)} );

(3) h^{−1} | Mi(ϕ + h) − Mi(ϕ) − hθ/iri | = O( i^{−(1+β2)} ri^{−1} ),

uniformly in 0 ≤ ϕ ≤ ε^{−1} and in 0 ≤ h ≤ ½(ϕ0 − ε^{−1}).

Proof. Again using the notation of Chapter 6.4, we have

log IE ϕ^{Zi} = ri log Mi(ϕ) = ri(Mi(ϕ) − 1) + O(i^{−2}),

uniformly in 0 ≤ ϕ ≤ ϕ0, by (6.21) and (6.22). Then, much as before,

| ri(Mi(ϕ) − 1) − θi^{−1}(ϕ − 1) |
  = | ri Σ_{l≥1} (ϕ^l − 1) IP[Zi1 = l] − θi^{−1}(ϕ − 1) |
  ≤ θi^{−1} { |ϕ − 1| |εi1| + Σ_{l≥2} (ϕ ∨ 1)^l εil }
  ≤ θi^{−1} { |ϕ − 1| |εi1| + Σ_{l≥2} ((ϕ ∨ 1)/ϕ0)^l ϕ0 Mi(ϕ0) εil(ϕ0) };

part (1) now follows because the ϕ0–tilted structure satisfies Conditions (A0) and (B01) and (ϕ ∨ 1)/ϕ0 ≤ 1/(εϕ0) < 1.

For part (2), we can differentiate term by term, since we only consider values of ϕ within the radius of convergence of the power series. This gives

(d/dϕ) log IE ϕ^{Zi} = (ri / Mi(ϕ)) Σ_{l≥1} lϕ^{l−1} IP[Zi1 = l],

so that

| (d/dϕ) log IE ϕ^{Zi} − θi^{−1} | ≤ (θ / (iMi(ϕ))) { |Mi(ϕ) − 1| + Σ_{l≥1} lϕ^{l−1} |εil| },

bounded uniformly much as above. For part (3), the argument is similar:

| Mi(ϕ + h) − Mi(ϕ) − hθ/iri | ≤ (hθ/iri) Σ_{l≥1} l(ϕ + h)^{l−1} |εil|. □

A consequence of Lemma 7.18 is that

K(ϕ) = Σ_{i≥1} ( log IE ϕ^{Zi} − θi^{−1}(ϕ − 1) )   (7.31)

is well defined and finite, and has uniformly bounded derivative in the range 0 ≤ ϕ ≤ ε^{−1}; furthermore,

Σ_{i≥n+1} | log IE ϕ^{Zi} − θi^{−1}(ϕ − 1) | = O(n^{−β2}),   (7.32)

uniformly in 0 ≤ ϕ ≤ ε^{−1}.

Lemma 7.19 In the above setting, uniformly for k in the range (7.30), we have

IPϕ[K0n(C(n)) = k] = Po(ϕθ log n){k} { 1 + O( 1/√(log n) ) },

where ϕ = ϕ(k) = k/(θ log n).

Proof. The statement follows from Theorem 7.15 (2) applied to the ϕ–tilted process. Note that, because Condition (B01) holds for the ϕ0–tilted process, Condition (B02) holds for each ϕ–tilted process with ϕ < ϕ0, so that the conditions of the theorem are indeed satisfied for any 0 ≤ ϕ ≤ ε^{−1}. It thus remains to check that the order term in Theorem 7.15 (2) applied to the ϕ–tilted process is uniform in ϕ in the range ε ≤ ϕ ≤ ε^{−1}, and that shifting the mean of the Poisson distribution in the theorem from Σ_{i=1}^{n} IE Zi(ϕ) + θ log α_θ to ϕθ log n has uniformly small effect.

This latter point is assured by Lemma 7.17. For the former, we need to check back through Lemmas 7.11 and 7.14. However, the error bounds appearing there, which derive from Theorem 6.7, (12.19) and Lemma 6.3, involve only quantities expressed in terms of |εi1(ϕ)| and εil(ϕ), l ≥ 2, together with coefficients which are uniformly bounded for all ε ≤ ϕ ≤ ε^{−1}. Hence, in view of (6.24)–(6.26) and of Proposition 6.4 in particular, the order term in Theorem 7.15 (2) applied to the ϕ–tilted processes is indeed uniform in ε ≤ ϕ ≤ ε^{−1}, for any fixed ε > 1/ϕ0, completing the proof. □

Lemma 7.20 In the above setting,

nIP[T0n(Z(ϕ)) = n] = pϕθ(1) { 1 + O(n^{−β02+δ}) },

for any δ > 0, uniformly in ε ≤ ϕ ≤ ε^{−1}, where β02 = (1 ∧ θ ∧ g1 ∧ a1 ∧ a2).

Proof. Using the results of Chapter 6.4, the statement of the lemma is immediate from Theorem 11.11 (ii) applied to the ϕ–tilted process, provided that the quantities ε_{10.12}(n) and n^{−1}φ_{11.6}(n, n) which appear in the resulting error bounds are both uniformly of order O(n^{−β02+δ}) in the range ε ≤ ϕ ≤ ε^{−1}. Once again, this follows from (6.24)–(6.26), and in particular from Proposition 6.4. □

We now return to estimating the probability IP[K0n(C(n)) = k]. Tilting makes life easy, since, for any z ∈ ZZ∞+ satisfying K0n(z) = k, we have

IP[Z = z] / IP[Z(ϕ) = z] = ϕ^{−k} Π_{1≤i≤n} IE ϕ^{Zi}.

Hence

IP[K0n(C(n)) = k] = IP[K0n(Z) = k, T0n(Z) = n] / IP[T0n(Z) = n]
  = { IP[K0n(Z(ϕ)) = k, T0n(Z(ϕ)) = n] / IP[T0n(Z) = n] } ϕ^{−k} Π_{1≤i≤n} IE ϕ^{Zi}   (7.33)
  = IPϕ[K0n(C(n)) = k] { ϕ^{−k} Π_{1≤i≤n} IE ϕ^{Zi} } { IP[T0n(Z(ϕ)) = n] / IP[T0n(Z) = n] }.

Taking ϕ = k/(θ log n), it follows from (7.31) and (7.32) that

ϕ^{−k} Π_{1≤i≤n} IE ϕ^{Zi} = ϕ^{−k} e^{θ(ϕ−1)h(n+1)+K(ϕ)} { 1 + O(n^{−β2}) }
  = ϕ^{−k} n^{θ(ϕ−1)} e^{γθ(ϕ−1)+K(ϕ)} { 1 + O(n^{−β2}) },

uniformly in ε ≤ ϕ ≤ ε^{−1}. From Lemma 7.20, we have

IP[T0n(Z(ϕ)) = n] / IP[T0n(Z) = n] = ( pϕθ(1) / pθ(1) ) { 1 + O(n^{−β02+δ}) }
  = ( e^{γθ}Γ(θ) / (e^{γϕθ}Γ(ϕθ)) ) { 1 + O(n^{−β02+δ}) },

from (4.26), uniformly in ε ≤ ϕ ≤ ε^{−1}.

from (4.26), uniformly in ε ≤ ϕ ≤ ε−1. Finally, from Lemma 7.19, we have

IPϕ[K0n(C(n)) = k] = Po(ϕθ log n)k

1 +O

(1√

log n

),

again uniformly in ε ≤ ϕ ≤ ε−1. Combining these with (7.33), we thus have

IP[K0n(C(n)) = k]

= Po(ϕθ log n)knθ(ϕ−1)ϕ−keK(ϕ) Γ(θ)Γ(ϕθ)

1 +O

(1√

log n

).

This expression simplifies somewhat because, for any λ > 0,

Po(ϕλ)k = e−(ϕ−1)λϕkPo(λ)k;

Page 195: This is page i Logarithmic Combinatorial Structures: a ..._Publications...This is page i Printer: Opaque this Logarithmic Combinatorial Structures: a Probabilistic Approach Richard

7.4. Erdos–Turan laws 183

taking λ = θ log n, this implies the following theorem.

Theorem 7.21 If the ϕ0–tilted structure satisfies Conditions (A0) and (B01) for some ϕ0 > 1, then, for any fixed ε > 1/ϕ0,

IP[K0n(C(n)) = k] = Po(θ log n){k} e^{K(ϕ)} ( Γ(θ) / Γ(ϕθ) ) { 1 + O( 1/√(log n) ) }
  = Po(θ log n){k − 1} e^{K(ϕ)} ( Γ(1 + θ) / Γ(1 + ϕθ) ) { 1 + O( 1/√(log n) ) },

uniformly in εθ log n ≤ k ≤ ε^{−1}θ log n, where ϕ = k/(θ log n).

Remark. For the Ewens Sampling Formula, we have K(ϕ) = 0 for all ϕ, so that this formula agrees (as it must) with (4.67).
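The Poisson tilting identity used in the simplification above is elementary, and easy to confirm numerically; a sketch (plain Python, function name ours):

```python
import math

def po(lam, k):
    # Poisson point probability Po(lam){k} = e^{-lam} lam^k / k!
    return math.exp(-lam) * lam ** k / math.factorial(k)

lam, phi = 3.0, 1.7
for k in range(10):
    lhs = po(phi * lam, k)
    rhs = math.exp(-(phi - 1) * lam) * phi ** k * po(lam, k)
    assert abs(lhs - rhs) < 1e-12
print("tilting identity verified")
```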

7.4 Erdős–Turán laws

If σ is a permutation of n objects with cycle counts c = (cj, j ≥ 1), where cj is the number of cycles of length j, then the order of σ is On = On(c), defined by

On(c) = l.c.m.{ i : 1 ≤ i ≤ n, ci > 0 }.

Erdős and Turán (1967) showed that, under the uniform distribution on the set of permutations, log On is approximately normally distributed. As indicated on page 16, their theorem has been progressively sharpened; the following versions are proved in Barbour and Tavaré (1994).
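Computationally, On(c) is just a least common multiple over the occupied cycle lengths; a minimal sketch (plain Python, function name ours):

```python
from math import gcd
from functools import reduce

def order_from_cycle_counts(c):
    # c maps cycle length j to the count c_j; the order of the permutation
    # is the l.c.m. of the lengths j with c_j > 0 (multiplicities are irrelevant).
    lengths = [j for j, cj in c.items() if cj > 0]
    return reduce(lambda a, b: a * b // gcd(a, b), lengths, 1)

# one 2-cycle and two 3-cycles (a permutation of 8 objects): order lcm(2, 3) = 6
print(order_from_cycle_counts({2: 1, 3: 2}))  # → 6
```

Note that log On depends on c only through which cycle lengths occur, which is why the counts cj themselves play no role beyond the indicators {cj > 0}.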

Proposition 7.22 If C*(n) is distributed according to the Ewens Sampling Formula ESF(θ), and

Rn(c) = log On(c) − (θ/2) log^2 n + θ log n log log n,

then

sup_x | IP[ {(θ/3) log^3 n}^{−1/2} Rn(C*(n)) ≤ x ] − Φ(x) | = O( (log n)^{−1/2} ).

Proposition 7.23 It is possible to construct C*(n) and a standard Brownian motion W on the same probability space, in such a way that

IE{ sup_{0≤t≤1} |W*n(t) − W(t^3)| } = O( log log n / √(log n) ),

where W*n(t) = {(θ/3) log^3 n}^{−1/2} ( log O_{[nt]}(C*(n)) − (θt^2/2) log^2 n ).

The theorems carry over almost unchanged to any combinatorial structure C(n) satisfying Conditions (A0) and (B01).

Theorem 7.24 Under Conditions (A0) and (B01),

sup_x | IP[ {(θ/3) log^3 n}^{−1/2} Rn(C(n)) ≤ x ] − Φ(x) | = O( (log n)^{−1/2} ).

Furthermore, it is possible to construct C(n) and a standard Brownian motion W on the same probability space, in such a way that

IE{ sup_{0≤t≤1} |Wn(t) − W(t^3)| ∧ 1 } = O( log log n / √(log n) ),

where Wn(t) = {(θ/3) log^3 n}^{−1/2} ( log O_{[nt]}(C(n)) − (θt^2/2) log^2 n ).

Proof. The quantities involving C(n) are replaced by those with C*(n), by appealing to Theorem 6.9 with b = log^m n, for any m > 3/(2β01), where β01 = (1 ∧ θ ∧ g1 ∧ a1). Propositions 7.22 and 7.23 are then applied.

Indeed, from Theorem 6.9 with b as above, C(n) and C*(n) can be constructed on the same probability space in such a way that

IP[ C(n)[b + 1, n] ≠ C*(n)[b + 1, n] ] = O(log^{−3/2} n).

Then, if C(n)[b + 1, n] = C*(n)[b + 1, n], it follows that

| log O_{[nt]}(C(n)) − log O_{[nt]}(C*(n)) | ≤ Σ_{j=1}^{b} ( Cj(n) + C*j(n) ) log j

for all t ∈ [0, 1]. Hence

IE{ sup_{0≤t≤1} |Wn(t) − W*n(t)| ∧ 1 }
  = O( log^{−3/2} n + log^{−3/2} n Σ_{j=1}^{log^m n} log j { IECj(n) + IEC*j(n) } )
  = O( (m log log n)^2 log^{−3/2} n ),

where the last estimate uses Lemma 13.3 and Proposition 6.2 (c). The remainder of the proof is immediate. □

Remark. For random polynomials over GF(q), the theorem settles a conjecture of Erdős and Nicolas (1984).

7.5 Additive function theory

Knopfmacher (1979) formalized the idea of an additive arithmetic semigroup, and used it as a general setting for an algebraic analogue of number theory. An additive arithmetic semigroup G is a free commutative semigroup with identity element 1, having a countable free generating set P of primes p and a degree mapping ∂ : G → ZZ+ satisfying

(1) ∂(ab) = ∂(a) + ∂(b) for all a, b ∈ G;
(2) G(n) < ∞ for each n ≥ 0,

where G(n) denotes the number of elements of degree n in G; in view of (1) and (2), ∂(a) = 0 if and only if a = 1. A real function f on G is additive if f(ab) = f(a) + f(b) for each coprime pair a, b ∈ G; f is strongly additive if f(p^k) = f(p) for each p ∈ P, k ≥ 1, and f is completely additive if f(p^k) = kf(p). For example, with f(p) = 1 for all p ∈ P and f completely additive, f(a) is the number of prime factors of a; if instead f is strongly additive, then f(a) is the number of distinct prime factors of a. The asymptotic properties of f(a) as n = ∂(a) → ∞ have been much studied for additive functions f, in particular in Zhang (1996b), which is a good source of further references.
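For the most familiar semigroup of this kind of flavour, the positive integers under multiplication, these two choices of f are the classical arithmetic functions Ω (all prime factors, counted with multiplicity; completely additive) and ω (distinct prime factors; strongly additive). A quick illustrative sketch (plain Python, function names ours):

```python
def factor_counts(a):
    # Trial division: returns {prime: exponent} for a positive integer a > 1.
    counts, p = {}, 2
    while p * p <= a:
        while a % p == 0:
            counts[p] = counts.get(p, 0) + 1
            a //= p
        p += 1
    if a > 1:
        counts[a] = counts.get(a, 0) + 1
    return counts

def big_omega(a):
    # Completely additive: f(p^k) = k f(p) with f(p) = 1.
    return sum(factor_counts(a).values())

def small_omega(a):
    # Strongly additive: f(p^k) = f(p) = 1.
    return len(factor_counts(a))

# 360 = 2^3 * 3^2 * 5, so Omega(360) = 6 and omega(360) = 3
print(big_omega(360), small_omega(360))  # → 6 3
```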

The connection with this monograph is that, with primes as irreducible elements and with degree as the size of an element, an additive arithmetic semigroup is just a multiset. Choosing an element a ∈ G with ∂(a) = n at random is then just sampling an element of size n uniformly at random from the multiset. However, the value f(a) of an additive function at a ∈ G, in our language, depends not only on the component structure of a, but also on which irreducible elements of the different component sizes it is composed of. For instance, if Cj(n) = 1, then f(a) contains a contribution f(p) from one of the mj primes p with ∂(p) = j; if Cj(n) = 2, there is either a contribution f(p) + f(p′) from one of the (mj choose 2) distinct pairs of primes of degree j, or a contribution f(p^2) from a repeated prime p of degree j. Because, in choosing a random instance of a multiset, the particular irreducible elements of each component size that are chosen are also random, there is randomness additional to that of the component structure, and it is carried over into the distribution of f(a). This motivates consideration of the following general construct, which can be defined for any logarithmic combinatorial structure, and not just for multisets:

X(n) = Σ_{j=1}^{n} 1l{Cj(n) ≥ 1} Uj(Cj(n)),   (7.34)

where the (Uj(l), j, l ≥ 1) are independent random variables which are also independent of C(n).

For an additive function f on an additive arithmetic semigroup, X(n) constructed as above indeed models f(a), for randomly chosen elements a ∈ G with ∂(a) = n, if the distributions of the random variables Uj(l) are appropriately specified. The distribution of Uj(1) assigns probability 1/mj to f(p) for each of the mj irreducible elements p of degree j; Uj(2) gives probability 2/{mj(mj + 1)} to f(p) + f(p′) for each of the (mj choose 2) pairs of distinct irreducible elements p and p′ of degree j, and probability 2/{mj(mj + 1)} to each f(p^2); and so on. In the example with f(p) = 1 for all primes p and f completely additive, counting the total number of prime factors, then Uj(l) = l a.s. for all j; if instead f is strongly additive, counting the number of distinct prime factors, then Uj(l) has a more complicated distribution.

Alternatively, considering the decomposition of multisets into conditioned geometric random variables, one could write

X(n) = Σ_{j=1}^{n} Σ_{r=1}^{mj} Σ_{s≥0} 1l{Cjr(n) = s} f(p^s_{jr}),   (7.35)

where pjr denotes the r'th irreducible element of degree j, thus replacing the random variables Uj(s) by constants. In particular, for completely additive functions, this gives the representation

X(n) = Σ_{j=1}^{n} Σ_{r=1}^{mj} Cjr(n) f(pjr),

and for strongly additive functions the representation

X(n) = Σ_{j=1}^{n} Σ_{r=1}^{mj} 1l{Cjr(n) ≥ 1} f(pjr),

both of which are agreeably simple. However, we do not need this extra simplification in our argument, and we therefore work only in terms of the more general (7.34).

Our goal here is to describe the limiting behaviour of X(n) for a general logarithmic combinatorial structure. In Sections 7.5.1 and 7.5.2, we mimic the results that Zhang (1996b) proves for additive arithmetic semigroups. In each case, our proof consists of showing that only the small components contribute significantly to the result; once this has been shown, Theorem 6.7 reduces the problem to that of a sum of independent random variables, to which classical theory can be applied. Our conditions are different in form from those of Zhang (1996b), since the Logarithmic Condition for multisets is expressed in terms of the numbers mj of irreducible elements of size j, whereas Zhang formulates his conditions in terms of the total number G(n) of elements of size n. One advantage of our approach, even for additive arithmetic semigroups, is that our results are valid for all θ > 0; Zhang has to restrict attention to situations where θ ≥ 1, and at times to θ = 1. The final section concerns the setting in which the behaviour of X(n) is dominated by that of the large components, and the dependence becomes all important; here, the approximations are formulated in terms of the Ewens Sampling Formula.

The classical definition of an additive function also allows f to be complex valued. For complex valued f, both real and imaginary parts are real valued additive functions, and for our purposes such an f can be treated using a two dimensional generalization of (7.34). More generally, we can consider the construction

X(n) = Σ_{j=1}^{n} 1l{Cj(n) ≥ 1} Uj(Cj(n))   (7.36)

with Uj(l) = (Uj1(l), . . . , Ujd(l)), j, l ≥ 1, now independent d–dimensional random vectors.

7.5.1 Convergence

The first set of results concerns conditions under which the random variables X(n) have a limit in distribution, without normalization. The theorem is thus an analogue of the Erdős–Wintner theorem in additive number theory. Hereafter, we write Uj for Uj(1).

Theorem 7.25 Suppose that Conditions (A0), (D1) and (B11) hold. Then X(n) converges in distribution if and only if the series

Σ_{j≥1} j^{−1} IP[|Uj| > 1];   Σ_{j≥1} j^{−1} IE{ Uj 1l{|Uj| ≤ 1} };   Σ_{j≥1} j^{−1} IE{ Uj^2 1l{|Uj| ≤ 1} }   (7.37)

all converge. If so, then

lim_{n→∞} L(X(n)) = L( Σ_{j≥1} 1l{Zj ≥ 1} Uj(Zj) ).

Proof. The three series (7.37) are equivalent to those of Kolmogorov's three series criterion (Loève 1977a, p. 249) for the sum of independent random variables Σ_{j≥1} 1l{Zj = 1} Uj, since, from the Logarithmic Condition, IP[Zj = 1] ≍ j^{−1}. Since also, under Condition (B01), Σ_{j≥1} IP[Zj ≥ 2] < ∞, it follows that Σ_{j≥1} 1l{Zj = 1} Uj and Σ_{j≥1} 1l{Zj ≥ 1} Uj(Zj) are convergence equivalent. Hence it is enough to show that, for some sequence b(n) → ∞, X(n) and W_{0,b(n)}(Z) are asymptotically close to one another, where, for any y ∈ ZZ∞+ and any 0 ≤ l < m,

Wlm(y) = Σ_{j=l+1}^{m} 1l{yj ≥ 1} Uj(yj).

That this is the case follows from Lemmas 7.26 and 7.27 below.

Lemma 7.26 If Conditions (A0), (D1) and (B11) hold, and if

lim_{n→∞} n^{−1} Σ_{j=1}^{n} IP[|Uj| > δ] = 0 for all δ > 0,   (7.38)

then there exists a sequence b(n) → ∞ with b(n) = o(n) such that X(n) and W_{0,b(n)}(Z) are convergence equivalent.

Proof. First, we note that, for any b, X(n) = W0b(C(n)) + Wbn(C(n)). By Theorem 6.7,

dTV( L(W0b(C(n))), L(W0b(Z)) ) = O(b/n),

and so W_{0,b(n)}(C(n)) and W_{0,b(n)}(Z) are convergence equivalent for any sequence b(n) such that b(n) = o(n) as n → ∞. It thus remains to show that W_{b(n),n}(C(n)) →d 0 for some such sequence b(n).

Now, from Theorem 6.9, under Conditions (A0) and (B01), it follows that

dTV( L(W_{b(n),n}(C(n))), L(W_{b(n),n}(C*(n))) ) → 0

provided only that b(n) → ∞. Furthermore, defining

W̄lm(y) = Σ_{j=l+1}^{m} 1l{yj = 1} Uj   (7.39)

for any y ∈ ZZ∞+ and any 0 ≤ l < m, we have

dTV( L(W̄_{b(n),n}(C*(n))), L(W_{b(n),n}(C*(n))) ) ≤ IP[ ∪_{j=b(n)+1}^{n} {C*j(n) ≥ 2} ] ≤ b(n)^{−1} c13.2   (7.40)

from Lemma 13.2. Hence, so long as b(n) → ∞, W_{b(n),n}(C(n)) →d 0 follows, if we can show that W̄_{b(n),n}(C*(n)) →d 0.

Because of the assumption (7.38), there exists a sequence δn → 0 such that

ηn = n^{−1} Σ_{j=1}^{n} IP[|Uj| > δn] → 0

as n → ∞. Thus, defining

An(b) = ∪_{j=b+1}^{n} { {C*j(n) = 1} ∩ {|Uj| > δn} },

we have

IP[An(b)] ≤ Σ_{j=b+1}^{n} IP[C*j(n) = 1] IP[|Uj| > δn]
  ≤ Σ_{j=b+1}^{n} c13.4 j^{−1} ( n/(n − j + 1) )^{1−θ} IP[|Uj| > δn],   (7.41)

from Lemma 13.4. Thus, for any n/2 < m < n, it follows that

IP[An(b)] ≤ c13.4 { (n/b) ( n^{−1} Σ_{j=1}^{n} IP[|Uj| > δn] ) ( n/(n − m + 1) )^{1−θ} + n^{−θ}(n − m + 1)^{θ} }
  ≤ c13.4 { (nηn/b) ( n/(n − m) ) + ( (n − m + 1)/n )^{θ} }.   (7.42)

Now, if An(b) does not occur, then

W̄bn(C*(n)) = Ŵbn = Σ_{j=b+1}^{n} 1l{C*j(n) = 1} Uj 1l{|Uj| ≤ δn},

and

IE|Ŵbn| ≤ δn Σ_{j=b+1}^{n} IP[C*j(n) = 1].   (7.43)

Again, from Lemma 13.4, arguing much as above, we thus have

IE|Ŵbn| ≤ c13.4 δn { ( n/(n − m) ) log( (n + 1)/(b + 1) ) + ( (n − m + 1)/n )^{θ} }.   (7.44)

So pick b(n) = o(n) so large that

η′n = max{ nηn/b(n), δn log(n/b(n)) } → 0,

and then pick m(n) such that n − m(n) = o(n) and yet nη′n/(n − m(n)) → 0; for these choices, it follows from (7.42) and (7.44) that

lim_{n→∞} IE|Ŵ_{b(n),n}| = 0 and lim_{n→∞} IP[ W̄_{b(n),n}(C*(n)) ≠ Ŵ_{b(n),n} ] = 0,

and hence that W̄_{b(n),n}(C*(n)) →d 0, completing the proof. □

Lemma 7.27 If the three series (7.37) converge, or if Conditions (A0), (D1) and (B11) hold and X(n) converges in distribution, then

lim_{n→∞} n^{−1} Σ_{j=1}^{n} IP[|Uj| > δ] = 0 for all δ > 0.

Proof. The first part is standard, using Chebyshev's inequality and Kronecker's lemma. For the second, we begin by showing that X(n) is close in total variation to X̄(b,n), for suitably chosen b = b(n), where

X̄(b,n) = Σ_{j=1}^{b} 1l{Cj(b,n) ≥ 1} Uj(Cj(b,n)) + Σ_{j=b+1}^{n} Σ_{l=1}^{Cj(b,n)} U′jl,   (7.45)

with C(b,n) defined as for Theorem 6.11 and with (U′jl, j ≥ 1, l ≥ 1) independent of one another and of C(b,n) and such that L(U′jl) = L(Uj). This is true because Theorem 6.11 shows that, if Conditions (A0), (D1) and (B11) hold, then dTV( L(X(n)), L(X(b,n)) ) → 0 whenever b(n) → ∞, where X(b,n) = Σ_{j=1}^{n} 1l{Cj(b,n) ≥ 1} Uj(Cj(b,n)); and then

dTV( L(X(b,n)), L(X̄(b,n)) ) ≤ Σ_{j=b+1}^{n} IP[Cj(b,n) ≥ 2],

which is of order b^{−1}, uniformly in b and n, by the argument proving Lemma 13.2, since, for b + 1 ≤ j ≤ n, Zj(b,n) = Z*j, and hence

IP[Cj(b,n) = l] ≤ IP[Zj(b,n) = l] IP[T0n(Z(b,n)) = n − jl] / { IP[Zj(b,n) = 0] IP[T0n(Z(b,n)) = n] }
  = (1/l!) (θ/j)^l IP[T0n(Z(b,n)) = n − jl] / IP[T0n(Z(b,n)) = n]
  ≤ (1/l!) (θ/j)^l c(7.46) ( (n + 1)/(n − jl + 1) )^{1−θ},   (7.46)

by Corollary 9.3 and Theorem 11.10, for suitable choice of c(7.46). Hence, for any f ∈ FBL, where

FBL = { f : IR → [−½, ½]; ‖f′‖ ≤ 1 },   (7.47)

it follows that

| IEf(X(n)) − IEf(X̄(b,n)) | ≤ η1(n, b),   (7.48)

where η1(n, b) is increasing in n for each fixed b, and, if b(n) → ∞, then lim_{n→∞} η1(n, b(n)) = 0.

Now let R(b,n) denote a size–biased choice from C(b,n): that is,

IP[R(b,n) = j | C(b,n)] = jCj(b,n)/n.   (7.49)
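A size–biased choice as in (7.49) is straightforward to realize computationally; a small sketch (plain Python, function names ours), with the exact conditional distribution exposed for checking:

```python
import random

def size_biased_pmf(c):
    # c[j-1] = number of components of size j; IP[R = j | C] = j * c_j / n,
    # where n = sum_j j * c_j is the total size.
    n = sum(j * cj for j, cj in enumerate(c, start=1))
    return [j * cj / n for j, cj in enumerate(c, start=1)]

def size_biased_choice(c, rng=random):
    # Sample a component size with probability proportional to
    # (size) x (number of components of that size).
    return rng.choices(range(1, len(c) + 1), weights=size_biased_pmf(c))[0]

# a structure of total size n = 10 with counts c = (2, 1, 2): weights 2, 2, 6
print(size_biased_pmf([2, 1, 2]))  # → [0.2, 0.2, 0.6]
```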

Then a simple calculation shows that, for b + 1 ≤ j ≤ n, and for any c ∈ ZZ∞+ with Σ_{j≥1} jcj = n,

IP[C(b,n) = c | R(b,n) = j] = IP[C(b,n−j) + εj = c],

where εj denotes the j'th coordinate vector in ZZ∞+. Hence, for any f ∈ FBL, the equation

IEf(X̄(b,n)) = Σ_{j=1}^{n} IP[R(b,n) = j] IE{ f(X̄(b,n)) | R(b,n) = j }

implies that

IEf(X̄(b,n)) Σ_{j=1}^{n} IP[R(b,n) = j] = Σ_{j=1}^{n} IP[R(b,n) = j] IEf(X̄(b,n−j) + Ūj),

where Ūj is independent of X̄(b,n−j) and L(Ūj) = L(Uj). Hence, for any m ∈ [b + 1, n], we have

| Σ_{j=b+1}^{m} IP[R(b,n) = j] { IEf(X̄(b,n)) − IEf(X̄(b,n−j) + Ūj) } |
  ≤ IP[R(b,n) ≤ b] + IP[R(b,n) > m].   (7.50)

If X(n) converges in distribution to some X∞, then

η2(m) = sup_{n≥m} sup_{f∈FBL} | IEf(X(n)) − IEf(X∞) |

exists and satisfies lim_{m→∞} η2(m) = 0, by Dudley (1976, Theorem 8.3). Hence, from (7.50), it follows that if V(b,n) is independent of X∞ and satisfies

IP[V(b,n) ∈ A] = Σ_{j=1}^{n} IP[R(b,n) = j] IP[Uj ∈ A],

then

| IEf(X∞) − IEf(X∞ + V(b,n)) |
  = | Σ_{j=1}^{n} IP[R(b,n) = j] { IEf(X∞) − IEf(X∞ + Ūj) } |
  ≤ IP[R(b,n) ≤ b] + IP[R(b,n) > m]
      + | Σ_{j=b+1}^{m} IP[R(b,n) = j] { IEf(X∞) − IEf(X∞ + Ūj) } |
  ≤ IP[R(b,n) ≤ b] + IP[R(b,n) > m] + η2(n) + η1(n, b)
      + Σ_{j=b+1}^{m} IP[R(b,n) = j] ( η2(n − j) + η1(n − j, b) )
      + | Σ_{j=b+1}^{m} IP[R(b,n) = j] { IEf(X̄(b,n)) − IEf(X̄(b,n−j) + Ūj) } |
  ≤ 2IP[R(b,n) ≤ b] + 2IP[R(b,n) > m] + 2η1(n, b) + 2η2(n − m).   (7.51)

Furthermore, if Conditions (A0) and (B01) hold, then

IP[R(b,n) ≤ b] = n^{−1} Σ_{j=1}^{b} j IECj(b,n) = O(bn^{−1}),

by Lemma 13.3 and Proposition 6.2 (b), and, for m > n/2,

IP[R(b,n) > m] = n^{−1} Σ_{j=m+1}^{n} j IP[Cj(b,n) = 1] ≤ c13.4 ( (n − m + 1)/n )^{θ},

from Lemma 13.4.

from Lemma 13.4. Hence, for any choice of b(n) such that b(n) →∞ withb(n) = o(n), we can choose m(n) such that n − m(n) → ∞ and thatn−m(n) = o(n), and deduce that

limn→∞

|IEf(X∞)− IEf(X∞ + V(bn))| = 0,

for all f ∈ FBL. Thus, considering complex exponentials in place of f , itfollows easily that V(bn) →d 0 (Loeve 1977a, Application 3, p 210), andhence that

IP[|V(bn)| > δ] =n∑j=1

IP[R(bn) = j]IP[|Uj | > δ] → 0,

for all δ > 0.Finally, from the definition of R(b,n), for b+ 1 ≤ j ≤ n/2, we have

IP[R(b,n) = j] =θ

n

IP[T0n(Z(b,n)) = n− j]IP[T0n(Z(b,n)) = n]

nexp−θ[h(n+ 1)− h(n− j + 1)] IP[T0,n−j(Z(b,n)) = n− j]

IP[T0n(Z(b,n)) = n]

≥ 2−θθ

n

Pθ[0, 1]3K

,

from Lemma 9.2 and Theorem 11.10, where the constant K can be takento be K0(Z) +K0(Z∗). Hence we have proved that

limn→∞

n−1

n/2∑j=b(n)+1

IP[|Uj | > δ] = 0,

and, since b(n) = o(n), the lemma follows. ut

Zhang (1996b) works under conditions specifying the asymptotic behaviour of the total number G(n) of different elements of size n: for instance, for his counterpart of Theorem 7.25 for additive arithmetic semigroups, he assumes that

Σ_{n≥1} sup_{m≥n} | q^{−m}G(m) − Q(m) | < ∞,   (7.52)

where Q(n) = Σ_{i=1}^{r} Ai n^{ρi−1}, with ρ1 < ρ2 < · · · < ρr, ρr ≥ 1 and Ar > 0. In our formulation, applying Theorem 7.25 to multisets, Conditions (A0), (D1) and (B11) relate instead to the numbers mj of irreducible elements of size j: if θj = jmjq^{−j}, then we merely require that

|θj − θ| = O(j^{−s}) and |θj+1 − θj| = O(j^{−1−s}) for some s > 0,   (7.53)

without any more detailed specification of the exact form of the θj. Note also that, in our formulation, essential conditions such as mi ≥ 0 are instantly visible, whereas, with conditions such as Zhang's, there is an additional implicit condition on the G(n)'s, that they are in fact generated by non–negative (and, in his setting, integral) mi's. Thus, for example, G(n) = Aq^n is not admissible if A ≥ 2, because it would require m2 < 0: see (7.55) below.

Translation between the two sorts of conditions is made possible by observing that, in our terms, for multisets,

  m_n / G(n) = IP[C_n^{(n)} = 1] = ( IP[Z_n = 1] / IP[T_{0n}(Z) = n] ) Π_{j=1}^{n−1} IP[Z_j = 0]
    = ( m_n q^{−n} / IP[T_{0n}(Z) = n] ) Π_{j=1}^n (1 − q^{−j})^{m_j},

so that we have

  G(n) q^{−n} = IP[T_{0n}(Z) = n] { Π_{j=1}^n (1 − q^{−j})^{m_j} }^{−1}.   (7.54)

From Theorem 11.1, it follows that

  n IP[T_{0n}(Z) = n] ~ θ P_θ[0,1],

whereas, in view of (7.53),

  Π_{j=1}^n (1 − q^{−j})^{m_j} ~ c exp{−θ h(n+1)}

for some constant c, implying that G(n) q^{−n} ~ c′ n^{θ−1}. Since, in (7.52), Zhang always assumes that ρ_r ≥ 1, he is restricted to cases where θ ≥ 1; our conditions allow the possibility of having θ < 1.

A more precise description of the values G(n) implied by (7.53) can be derived using the size-biasing equation (5.30), which gives

  n IP[T_{0n}(Z) = n] = Σ_{j=1}^n g(j) IP[T_{0n}(Z) = n−j]
    = Σ_{j=1}^n g(j) IP[T_{0,n−j} = n−j] Π_{l=n−j+1}^n (1 − q^{−l})^{m_l},

for

  g(j) = q^{−j} Σ_{l | j, 1≤l≤j} l m_l = θ_j + O(q^{−j/2}) ~ θ,

as in (5.32), and with IP[T_{00} = 0] interpreted as 1. This, with (7.54), implies that

  F(n) = n^{−1} Σ_{j=1}^n g(j) F(n−j),   (7.55)

where F(n) = G(n) q^{−n}, n ≥ 1, and F(0) = 1. Equation (7.55) gives a recursive formula for F(n), and hence for G(n), in terms of the values of g(j), 1 ≤ j ≤ n, and of F(j), 0 ≤ j < n; it also enables generating function methods, such as singularity theory (Odlyzko 1995, Theorem 11.4), to be applied, in order to deduce properties of the g(j) from those of G(n). Equation (7.55) is at the heart of Zhang's method; under his conditions on the G(n), the solutions g(j) can have non-trivial oscillations (Zhang 1998, Theorem 1.1), in which case the Logarithmic Condition is not even satisfied; hence his results cover some cases not included in Theorem 7.25.
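The recursion (7.55) is straightforward to compute with. As an illustrative sketch (ours, not part of the text): taking g(j) ≡ θ constant, the recursion produces F(n) = C(n+θ−1, n), and for integer θ this can be checked exactly; θ = 2 gives F(n) = n+1.

```python
def F_from_g(n_max, g):
    """Compute F(0..n_max) from the recursion (7.55):
    F(0) = 1,  F(n) = n^{-1} * sum_{j=1}^{n} g(j) * F(n-j)."""
    F = [1.0]
    for n in range(1, n_max + 1):
        F.append(sum(g(j) * F[n - j] for j in range(1, n + 1)) / n)
    return F

theta = 2.0
# constant g corresponds to theta_j = theta for all j
F = F_from_g(10, lambda j: theta)
# with g == theta, F(n) = C(n + theta - 1, n); here theta = 2, so F(n) = n + 1
```

The same routine, run in the other direction, is how properties of the g(j) can be extracted numerically from a given sequence G(n).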

Theorem 7.25 has a d-dimensional analogue. Since each component of a d-dimensional additive function is a real additive function, the sequence of random vectors X^{(n)} defined in (7.36) has a limit if and only if, for all 1 ≤ s ≤ d, the three series in (7.37) with U_j replaced by U_{js} all converge. It is then not hard to see that this criterion is equivalent to the convergence of the three series

  Σ_{j≥1} j^{−1} IP[|U_j| > 1];   Σ_{j≥1} j^{−1} IE{ U_j 1l{|U_j| ≤ 1} };   Σ_{j≥1} j^{−1} IE{ |U_j|² 1l{|U_j| ≤ 1} },   (7.56)

only the second of which is IR^d-valued. For complex-valued U_j, the third series can also be replaced by Σ_{j≥1} j^{−1} IE{ U_j² 1l{|U_j| ≤ 1} }, recovering the same form as in (7.37).

If more detailed assumptions are made to strengthen (7.37), Theorem 7.25 can be complemented by a convergence rate. The following result is an example, expressed in terms of the bounded Wasserstein distance: for probability measures P, Q on IR,

  d_BW(P, Q) = sup_{f ∈ F_BL} | ∫ f dP − ∫ f dQ |,

with F_BL as defined in (7.47) (see also (8.19) below).

Theorem 7.28 Suppose that Conditions (A0), (D1) and (B11) hold, and that, in addition,

  IP[|U_j| > 1] ≤ C j^{−ζ};   IE{ U_j² 1l{|U_j| ≤ 1} } ≤ C j^{−2ζ}   (7.57)

for some 0 < ζ < 1/2. Then

  d_BW( L(X^{(n)}), L(X) ) = O( ψ_n^{2θ/(4θ+3)} + n^{−6g_1/7} log³ n ),

where X = Σ_{j≥1} 1l{Z_j ≥ 1} U_j(Z_j) and ψ_n = n^{−ζ} log n.

Proof. We first observe that the random variable X is well defined under Condition (B11). In fact, for any b ≥ 1, by Condition (B11), the Logarithmic Condition and (7.57),

  IP[ Σ_{j=b+1}^∞ 1l{Z_j ≥ 1} U_j(Z_j) ≠ Σ_{j=b+1}^∞ 1l{Z_j = 1} U_j 1l{|U_j| ≤ 1} ]
    ≤ IP[ ⋃_{j≥b+1} {Z_j ≥ 2} ] + IP[ ⋃_{j≥b+1} ( {Z_j = 1} ∩ {|U_j| > 1} ) ]
    = O( b^{−a_1} + Σ_{j≥b+1} j^{−1−ζ} ) = O(b^{−ζ}),

and

  IE| Σ_{j=b+1}^∞ 1l{Z_j = 1} U_j 1l{|U_j| ≤ 1} | ≤ Σ_{j=b+1}^∞ IP[Z_j = 1] IE{ |U_j| 1l{|U_j| ≤ 1} } = O( Σ_{j≥b+1} j^{−1−ζ} ) = O(b^{−ζ}).

Hence

  d_BW( L(X), L(W_{0b}(Z)) ) = O(b^{−ζ}),   (7.58)

and, from Theorem 6.7,

  d_BW( L(W_{0b}(Z)), L(W_{0b}(C^{(n)})) ) = O(n^{−1} b).   (7.59)

Since X^{(n)} = W_{0b}(C^{(n)}) + W_{bn}(C^{(n)}), it only remains to bound IE{ |W_{bn}(C^{(n)})| ∧ 1 }, following the steps of Lemma 7.26.

First, from the remark following Theorem 6.9, under Condition (B11),

  d_TV( L(W_{b(n),n}(C^{(n)})), L(W_{b(n),n}(C*^{(n)})) ) = O( b^{−(θ∧g_1)} log³ b ),   (7.60)

and, from (7.40),

  d_TV( L(W_{b(n),n}(C*^{(n)})), L(W̄_{b(n),n}(C*^{(n)})) ) = O(b^{−1}).   (7.61)

Then, from (7.57),

  η_n = n^{−1} Σ_{j=1}^n IP[|U_j| > δ_n] = O( n^{−ζ} + (n^ζ δ_n)^{−2} ).   (7.62)

So, choosing b(n) = ⌊n ψ_n^{2θ/(4θ+3)}⌋, n − m(n) = ⌊n ψ_n^{2/(4θ+3)}⌋ and δ_n = ψ_n^{2(θ+1)/(4θ+3)} log^{−1} n, it follows from (7.42) and (7.62) that

  d_TV( L(W̄_{b(n),n}(C*^{(n)})), L(W̄_{b(n),n}) ) = O( ψ_n^{2θ/(4θ+3)} ),   (7.63)

and from (7.44) that

  IE|W̄_{b(n),n}| = O( ψ_n^{2θ/(4θ+3)} )   (7.64)

also. Combining (7.60)–(7.64), it follows that

  d_BW( L(X^{(n)}), L(W_{0b(n)}(C^{(n)})) ) ≤ IE{ |W_{b(n),n}(C^{(n)})| ∧ 1 } = O( n^{−6(θ∧g_1)/7} log³ n + ψ_n^{2θ/(4θ+3)} ),

since b(n)^{−1} = O(n^{−1+2ζ/7}) = O(n^{−6/7}) when ζ < 1/2. Hence, and from (7.58) and (7.59),

  d_BW( L(X), L(X^{(n)}) ) = O( n^{−6(θ∧g_1)/7} log³ n + ψ_n^{2θ/(4θ+3)} + n^{−6ζ/7} ),

and the theorem follows. □

7.5.2 Slow growth

In this section, we consider situations in which X^{(n)} converges, after appropriate normalization, to some infinitely divisible limit. We assume that

  σ²(m) = Σ_{j=1}^m j^{−1} IE U_j² → ∞ as m → ∞;   σ² is slowly varying at ∞,   (7.65)

where U_j = U_j(1) as before; these conditions are equivalent, in the setting of Zhang (1996b), to his Condition H.

Lemma 7.29 Suppose that (7.65) and Conditions (A0) and (B01) hold. Then there exists a sequence b(n) → ∞ with b(n) = o(n) such that

  σ(n)^{−1} W′_{b(n),n}(C^{(n)}) →_d 0,

where, for y ∈ ZZ_+^∞,

  W′_{lm}(y) = Σ_{j=l+1}^m 1l{y_j ≥ 1} |U_j(y_j)|.   (7.66)

Proof. As in the proof of Lemma 7.26, we have

  d_TV( L(W′_{b(n),n}(C^{(n)})), L(W̄′_{b(n),n}(C*^{(n)})) ) → 0

as n → ∞, provided only that b(n) → ∞, where

  W̄′_{lm}(y) = Σ_{j=l+1}^m 1l{y_j = 1} |U_j|;   (7.67)

hence we need only consider W̄′_{b(n),n}(C*^{(n)}). Now, for any n/2 ≤ m ≤ n, by Lemma 13.4,

  IP[ Σ_{j=m+1}^n 1l{C*_j^{(n)} = 1} |U_j| ≠ 0 ] ≤ (2/n) Σ_{j=m+1}^n c_{13.4} ( n/(n−j+1) )^{1−θ}
    ≤ 2 θ^{−1} c_{13.4} ( (n−m+1)/n )^θ,   (7.68)

so that the sum from m+1 to n contributes with asymptotically small probability, provided that n − m is small compared to n. On the other hand, again from Lemma 13.4,

  σ^{−1}(n) IE{ Σ_{j=b+1}^m 1l{C*_j^{(n)} = 1} |U_j| } ≤ σ^{−1}(n) Σ_{j=b+1}^m IP[C*_j^{(n)} = 1] IE|U_j|
    ≤ σ^{−1}(n) c_{13.4} ( n/(n−m+1) )^{1−θ} Σ_{j=b+1}^m j^{−1} IE|U_j|,   (7.69)

and, by the Cauchy–Schwarz inequality,

  Σ_{j=b+1}^m j^{−1} IE|U_j| ≤ { Σ_{j=b+1}^m j^{−1} }^{1/2} { Σ_{j=b+1}^m j^{−1} IE U_j² }^{1/2}
    ≤ σ(n) { log(n/b) (1 − σ²(b)/σ²(n)) }^{1/2}.   (7.70)

Since σ² is slowly varying at ∞, we can pick β(n) → ∞, β(n) = o(n), in such a way that σ²(β(n))/σ²(n) → 1. Hence we can pick b(n) → ∞ with β(n) ≤ b(n) = o(n) such that log(n/b(n)) (1 − σ²(β(n))/σ²(n)) → 0, and thus such that

  η_n = { log(n/b(n)) (1 − σ²(b(n))/σ²(n)) }^{1/2} → 0.   (7.71)

Now pick m = m(n) in such a way that n − m(n) = o(n) and such that also ( n/(n−m(n)) )^{1−θ} η_n → 0. Then, from (7.68)–(7.70), it follows that

  σ^{−1}(n) W̄′_{b(n),n}(C*^{(n)}) →_d 0,

and the lemma is proved. □
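To see that a choice of b(n) with η_n → 0 exists in a concrete case, take the toy example (ours, not the book's) σ²(m) = log m, which is slowly varying (it arises, up to constants, from U_j ≡ 1), together with b(n) = ⌊n/log n⌋; then η_n of (7.71) behaves like log log n / √log n, which tends to 0, though very slowly:

```python
from math import log, sqrt, floor

def eta(n):
    # eta_n from (7.71) for the toy choice sigma^2(m) = log(m), b(n) = n/log(n)
    b = max(2, floor(n / log(n)))
    return sqrt(log(n / b) * (1 - log(b) / log(n)))

vals = [eta(10 ** k) for k in (3, 6, 9, 12)]
# decreases slowly towards 0, roughly like log(log n)/sqrt(log n)
```

The very slow decay is typical of the slowly varying regime, and is the reason the rates in Theorem 7.32 below involve log log n / √log n.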

Thus, under the conditions of Lemma 7.29, there is a sequence b(n) → ∞ with b(n) = o(n) such that the asymptotic behaviour of σ^{−1}(n) X^{(n)} is equivalent to that of σ^{−1}(n) W_{1,b(n)}(C^{(n)}). Under Conditions (A0), (D1) and (B11),

  d_TV( L(C^{(n)}[1, b(n)], U[1, b(n)]), L(Z[1, b(n)], U[1, b(n)]) ) = O(n^{−1} b(n)) → 0,   (7.72)

by Theorem 6.7. Note also that

  IP[ sup_{m≥1} | Σ_{j=1}^m 1l{Z_j ≥ 1} U_j(Z_j) − Σ_{j=1}^m 1l{Z_j = 1} U_j | > ε σ(n) ]
    ≤ IP[ Σ_{j=1}^∞ 1l{Z_j ≥ 2} |U_j(Z_j)| > ε σ(n) ],   (7.73)

where the infinite sum is finite a.s. by the Borel–Cantelli lemma, since Σ_{j=1}^∞ IP[Z_j ≥ 2] ≤ Σ_{j=1}^∞ j^{−1} ρ_j < ∞ under Conditions (A0) and (B01). Then one can also define independent Bernoulli random variables Z̄_j ~ Be(θ/j), on the same probability space as the Z_j's and U_j's and independent also of the U_j's, in such a way that

  Σ_{j≥1} IP[ Z̄_j ≠ 1l{Z_j = 1} ] < ∞,

because, from Condition (A0) and using the Bonferroni inequalities,

  |IP[Z_j = 1] − θ j^{−1}| = O( j^{−(1+(a_1∧1))} ).   (7.74)

With this construction, we have

  IP[ sup_{m≥1} | Σ_{j=1}^m 1l{Z_j = 1} U_j − Σ_{j=1}^m Z̄_j U_j | > ε σ(n) ]
    ≤ IP[ Σ_{j=1}^∞ 1l{Z̄_j ≠ 1l[Z_j = 1]} |U_j| > ε σ(n) ],   (7.75)

with the infinite sum finite a.s. by the Borel–Cantelli lemma. Since also σ(n) → ∞, the right hand sides of both (7.73) and (7.75) converge to zero as n → ∞. Finally, as in the proof of Lemma 7.29,

  σ(n)^{−1} IE{ Σ_{j=b(n)+1}^n Z̄_j |U_j| } ≤ σ(n)^{−1} Σ_{j=b(n)+1}^n θ j^{−1} IE|U_j| ≤ θ η_n,   (7.76)

where η_n is as defined in (7.71), and lim_{n→∞} η_n = 0. Hence the asymptotic behaviour of σ^{−1}(n) X^{(n)} is equivalent to that of σ^{−1}(n) X̄^{(n)}, where

  X̄^{(n)} = Σ_{j=1}^n Z̄_j U_j,   (7.77)

in the following sense.

Theorem 7.30 Suppose that (7.65) and Conditions (A0), (D1) and (B11) hold. Then if, for any sequence M(n) of centering constants, either of the sequences L( σ^{−1}(n)(X^{(n)} − M(n)) ) or L( σ^{−1}(n)(X̄^{(n)} − M(n)) ) converges as n → ∞, so too does the other, and to the same limit.

Note that X̄^{(n)} is just a sum of independent random variables, with distribution depending only on θ and the distributions of the U_j, to which standard theory can be applied. Note also that the theorem remains true as stated for d-dimensional random vectors U_j(l), if, in (7.65), IE U_j² is replaced by IE|U_j|².

As an example, take the following analogue of the Kubilius Main Theorem, proved under very much more restrictive conditions with θ = 1 in Zhang (1996b). Define μ_j = θ j^{−1} IE U_j and M(n) = Σ_{j=1}^n μ_j.

Theorem 7.31 Suppose that (7.65) and Conditions (A0), (D1) and (B11) hold. Then σ^{−1}(n)(X^{(n)} − M(n)) converges in distribution as n → ∞ if and only if there is a distribution function K such that

  lim_{n→∞} σ^{−2}(n) Σ_{j=1}^n j^{−1} IE{ U_j² 1l{U_j ≤ x σ(n) √θ} } = K(x)   (7.78)

for all continuity points x of K; the limit then has characteristic function ψ satisfying

  log ψ(t) = ∫ (e^{itx} − 1 − itx) x^{−2} K(dx).

Proof. The theorem follows because of the asymptotic equivalence of σ^{−1}(n) X^{(n)} and σ^{−1}(n) X̄^{(n)} of Theorem 7.30, together with Theorem 22.2A in Loève (1977). Applied to the random variables Y_j = Z̄_j U_j − μ_j, the necessary and sufficient condition in the above theorem, for the convergence of the row sums of uniformly asymptotically negligible arrays, is that

  lim_{n→∞} σ₁^{−2}(n) Σ_{j=1}^n IE{ Y_j² 1l{Y_j ≤ x σ₁(n)} } = K(x)   (7.79)

for all continuity points x of K, where

  σ₁²(n) = Σ_{j=1}^n Var Y_j = Σ_{j=1}^n θ j^{−1} IE U_j² − Σ_{j=1}^n θ² j^{−2} (IE U_j)².

Note that

  ζ_n = σ^{−2}(n) |σ₁²(n) − θ σ²(n)| ≤ σ^{−2}(n) Σ_{j=1}^n θ² j^{−2} IE U_j² = o(1)   (7.80)

as n → ∞. It then follows from (7.65) that

  lim_{n→∞} σ^{−2}(n) max_{1≤j≤n} Var Y_j = 0,

since

  σ^{−2}(n) Var Y_n ≤ θ { 1 − σ²(n−1)/σ²(n) } → 0

and σ²(n) is increasing in n; hence the random variables σ₁^{−1}(n) Y_j, for 1 ≤ j ≤ n and n ≥ 1, indeed form a uniformly asymptotically negligible array.

To show the equivalence of (7.78) and (7.79), we start by writing

  IE{ Y_j² 1l{Y_j ≤ x σ₁(n)} } = θ j^{−1} IE{ (U_j − μ_j)² 1l{U_j ≤ x σ₁(n) + μ_j} } + (1 − θ j^{−1}) μ_j² 1l{−μ_j ≤ x σ₁(n)}.

Now observe that

  Σ_{j=1}^n μ_j² 1l{−μ_j ≤ x σ₁(n)} ≤ Σ_{j=1}^n (θ j^{−1} IE U_j)² ≤ θ² Σ_{j=1}^n j^{−2} IE U_j² = o(σ²(n)),   (7.81)

and that

  Σ_{j=1}^n μ_j θ j^{−1} IE{ |U_j| 1l{U_j ≤ x σ₁(n) + μ_j} } ≤ θ² Σ_{j=1}^n j^{−2} (IE U_j)² = o(σ²(n))

also; hence

  lim_{n→∞} σ₁^{−2}(n) Σ_{j=1}^n IE{ Y_j² 1l{Y_j ≤ x σ₁(n)} }
    = lim_{n→∞} θ^{−1} σ^{−2}(n) Σ_{j=1}^n θ j^{−1} IE{ U_j² 1l{U_j ≤ x σ₁(n) + μ_j} }.

Finally, for any 1 ≤ n′ ≤ n,

  σ^{−2}(n) Σ_{j=n′}^n j^{−1} IE{ U_j² 1l{U_j ≤ (x − η′_{n′}) σ(n) √θ} }
    ≤ σ^{−2}(n) Σ_{j=n′}^n j^{−1} IE{ U_j² 1l{U_j ≤ x σ₁(n) + μ_j} }
    ≤ σ^{−2}(n) Σ_{j=n′}^n j^{−1} IE{ U_j² 1l{U_j ≤ (x + η′_{n′}) σ(n) √θ} },   (7.82)

where

  η′_l = sup_{j≥l} { (|μ_j| / σ(j)) + (ζ_j / 2θ) } → 0

as l → ∞, from (7.80) and (7.81). The equivalence of the convergence in (7.78) and (7.79) at continuity points of K is now immediate. □
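The relation (7.80), which makes σ₁²(n) interchangeable with θ σ²(n) in the limit, is easy to check numerically. A small sketch (ours) in the simplest case U_j ≡ 1, for which σ²(n) is the harmonic sum and Var Y_j = (θ/j)(1 − θ/j):

```python
theta = 0.5

def variance_ratio(n):
    # sigma^2(n) = sum j^{-1} E U_j^2 with U_j = 1
    s2 = sum(1.0 / j for j in range(1, n + 1))
    # sigma_1^2(n) = sum Var(Y_j) = sum (theta/j)(1 - theta/j)
    s1 = sum((theta / j) * (1 - theta / j) for j in range(1, n + 1))
    return s1 / (theta * s2)

r = [variance_ratio(10 ** k) for k in (1, 3, 5)]
# ratios increase towards 1, i.e. zeta_n of (7.80) tends to 0
```

The defect 1 − σ₁²(n)/(θσ²(n)) is exactly θ Σ_{j≤n} j^{−2} / σ²(n) here, a bounded quantity divided by a divergent one.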

The approximations in Theorems 7.30 and 7.31 both have process counterparts. Define W^{(n)} and W̄^{(n)} for t ∈ [0,1] by

  W^{(n)}(t) = σ^{−1}(n) Σ_{j: σ²(j) ≤ t σ²(n)} ( 1l{C_j^{(n)} ≥ 1} U_j(C_j^{(n)}) − μ_j )   (7.83)

and

  W̄^{(n)}(t) = σ^{−1}(n) Σ_{j: σ²(j) ≤ t σ²(n)} ( Z̄_j U_j − μ_j ).   (7.84)

Then it follows from Lemma 7.29 and (7.72)–(7.76) that

  IP[ sup_{0≤t≤1} |W^{(n)}(t) − W̄^{(n)}(t)| > ε ] → 0

for each ε > 0, so that the whole process W^{(n)} is asymptotically equivalent to W̄^{(n)}, the normalized partial-sum process of a sequence of independent random variables. In particular, if K is the distribution function of the degenerate distribution at 0, the limiting distribution of σ^{−1}(n)(X^{(n)} − M(n)) is standard normal, and W^{(n)} converges to standard Brownian motion (Loève 1977b, 42.2C). The special case U_j(l) = l a.s. for all j, counting the total number of components, and its analogue which counts the number of distinct components, both come into this category, and we recover the functional limit theorems of Chapter 7.1; more precise asymptotics in this case are given in Chapter 7.3. The process version of Theorem 7.30 also carries over to d dimensions.


The theorems above can also be complemented by rates, under additional assumptions. The approximation errors for the standard approximations to L(W̄^{(n)}) are well known (see, for example, Csörgő and Révész (1981)), so we content ourselves with bounding the error in approximating W^{(n)} by W̄^{(n)}.

Theorem 7.32 Suppose that (7.65) and Conditions (A0), (D1) and (B11) hold, and that

  IE U_j² ~ c log^α j as j → ∞, for some α > −1;   (7.85)
  Σ_{j≥1} Σ_{l≥2} IP[Z_j = l] IE|U_j(l)| < ∞.   (7.86)

Then

  IE{ sup_{0≤t≤1} |W^{(n)}(t) − W̄^{(n)}(t)| ∧ 1 } = O( (log log n / √log n) ∨ σ^{−1}(n) ).

Proof. First, from (7.85), it follows that σ²(m) ~ c(α+1)^{−1} (log m)^{α+1} → ∞. Taking b = ⌊n/log n⌋, define t_b = σ²(b)/σ²(n), noting that t_b is then asymptotically close to 1. From Theorem 6.7, as for (7.59),

  IE{ sup_{0≤t≤t_b} |W^{(n)}(t) − W_Z^{(n)}(t)| ∧ 1 } = O(n^{−1} b) = O(1/log n),   (7.87)

where

  W_Z^{(n)}(t) = σ^{−1}(n) Σ_{j: σ²(j) ≤ t σ²(n)} ( 1l{Z_j ≥ 1} U_j(Z_j) − μ_j ).

Then, from (7.85) and (7.86), it follows that

  IE{ sup_{m≥1} | Σ_{j=1}^m 1l{Z_j ≥ 1} U_j(Z_j) − Σ_{j=1}^m 1l{Z_j = 1} U_j | } ≤ Σ_{j≥1} Σ_{l≥2} IP[Z_j = l] IE|U_j(l)| < ∞,

and, from (7.74), that

  IE{ sup_{m≥1} | Σ_{j=1}^m 1l{Z_j = 1} U_j − Σ_{j=1}^m Z̄_j U_j | } ≤ Σ_{j≥1} |IP[Z_j = 1] − θ j^{−1}| IE|U_j| < ∞,

which, combined with (7.87), shows that

  IE{ sup_{0≤t≤t_b} |W^{(n)}(t) − W̄^{(n)}(t)| ∧ 1 } = O( (log n)^{−1} + σ^{−1}(n) ).

For the remaining t > t_b, note first that, from (7.76),

  IE{ sup_{t_b≤t≤1} |W̄^{(n)}(t) − W̄^{(n)}(t_b)| ∧ 1 } ≤ σ^{−1}(n) IE{ Σ_{j=b+1}^n Z̄_j |U_j| } = O( log log n / √log n ).   (7.88)

Then

  IE{ sup_{t_b≤t≤1} |W^{(n)}(t) − W^{(n)}(t_b)| ∧ 1 } ≤ IE{ σ^{−1}(n) W′_{bn}(C^{(n)}) ∧ 1 },

with W′ defined as in (7.66); this is bounded by arguing as for (7.60) and (7.61) with b = ⌊n/log n⌋, and then noting that

  σ^{−1}(n) Σ_{j=b+1}^{⌊n/2⌋} IP[C*_j^{(n)} = 1] IE|U_j| = O( log log n / √log n )

from Lemma 13.1, and finally that

  IE{ Σ_{j=⌊n/2⌋+1}^n 1l{C*_j^{(n)} = 1} |U_j| } ≤ 2 (log n)^{α/2},

since Σ_{j=⌊n/2⌋}^n 1l{C*_j^{(n)} = 1} ≤ 2 a.s. □

7.5.3 Regular growth

In this section, we explore the consequences of replacing the slow growth of σ²(n) in (7.65) by regular variation:

  σ²(m) = Σ_{j=1}^m j^{−1} IE U_j² is regularly varying at ∞, with exponent α > 0,   (7.89)

so that, in particular, σ²(b(n))/σ²(n) → 0 for all sequences b(n) = o(n) as n → ∞. Our aim is to approximate X^{(n)} by

  X*^{(n)} = Σ_{j=1}^n 1l{C*_j^{(n)} = 1} U_j,   (7.90)

which is a standard quantity, the same for all Z_j sequences satisfying Conditions (A0), (D1) and (B11), defined solely in terms of the U_j's and the ESF(θ).
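Since X*^{(n)} is determined by ESF(θ) and the U_j's alone, it can be simulated directly. A minimal sketch (ours, not part of the text): the Chinese restaurant process is a standard way to sample component counts with the ESF(θ) distribution, and the choice U_j = j^{1/2} below is purely illustrative.

```python
import random

def esf_counts(n, theta, rng):
    """Sample component-size counts (C*_1,...,C*_n) from ESF(theta)
    via the Chinese restaurant process."""
    tables = []
    for i in range(n):
        if rng.random() < theta / (theta + i):
            tables.append(1)                                  # start a new component
        else:
            k = rng.choices(range(len(tables)), weights=tables)[0]
            tables[k] += 1                                    # join an existing one
    counts = [0] * (n + 1)
    for size in tables:
        counts[size] += 1
    return counts

rng = random.Random(1)
n, theta = 50, 1.5
C = esf_counts(n, theta, rng)
# X*(n) of (7.90) for an illustrative sequence U_j = sqrt(j)
x_star = sum(j ** 0.5 for j in range(1, n + 1) if C[j] == 1)
```

Whatever the sample, the counts satisfy the weighted-sum constraint Σ_j j C*_j = n, which is a convenient sanity check on the sampler.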


The first step is to show that the small components play little part. Under Conditions (A0), (D1) and (B11), it follows from Theorem 6.7 that

  d_TV( L(W′_{1b}(C^{(n)})), L(W′_{1b}(Z)) ) = O(b/n),   (7.91)

where W′ is as defined in (7.66), and also, as in (7.73), that

  σ^{−1}(n) | W′_{1,b(n)}(Z) − Σ_{j=1}^{b(n)} 1l{Z_j = 1} |U_j| | →_d 0,   (7.92)

whatever the choice of b(n). But now

  Var( Σ_{j=1}^b 1l{Z_j = 1} |U_j| )
    = IE{ Σ_{j=1}^b 1l{Z_j = 1} Var|U_j| } + Var( Σ_{j=1}^b 1l{Z_j = 1} IE|U_j| )
    ≤ θ (1 + ε*_{01}) Σ_{j=1}^b j^{−1} ( Var|U_j| + (IE|U_j|)² ) = θ (1 + ε*_{01}) σ²(b),   (7.93)

and, from the Cauchy–Schwarz inequality as in (7.70),

  IE{ Σ_{j=1}^b 1l{Z_j = 1} |U_j| } ≤ { (1 + log b) σ²(b) }^{1/2} = O( σ(b) log^{1/2} b ).   (7.94)

Combining (7.91)–(7.94), and in view of (7.89), it follows that

  σ^{−1}(n) W′_{1,b(n)}(C^{(n)}) →_d 0   (7.95)

under Conditions (A0), (D1) and (B11), provided that b(n) = O(n^{1−δ}) for some δ > 0; and (7.93) and (7.94) then imply that

  σ^{−1}(n) Σ_{j=1}^{b(n)} 1l{C*_j^{(n)} = 1} |U_j| → 0   (7.96)

also.

Now define the normalized process X^{(n)} by

  X^{(n)}(t) = σ^{−1}(n) Σ_{j: σ²(j) ≤ t σ²(n)} 1l{C_j^{(n)} ≥ 1} U_j(C_j^{(n)}), 0 ≤ t ≤ 1,

and a process analogue X*^{(n)}(·) of X*^{(n)} by

  X*^{(n)}(t) = σ^{−1}(n) Σ_{j: σ²(j) ≤ t σ²(n)} 1l{C*_j^{(n)} = 1} U_j, 0 ≤ t ≤ 1.

From (7.95) and (7.96), the contributions from indices j ≤ b(n) are asymptotically negligible. Then, from Theorem 6.9, under Conditions (A0), (D1) and (B11) and for b = b(n) = O(n/log n), it follows that

  d_TV( L(C^{(n)}[b+1, n], U[b+1, n]), L(C*^{(n)}[b+1, n], U[b+1, n]) ) = O( b^{−(g_1∧θ)} log³ b ) → 0,   (7.97)

whereas, from Lemma 13.2,

  d_TV( L(C*^{(n)}[b+1, n], U[b+1, n]), L( (1l{C*_j^{(n)} = 1}, b+1 ≤ j ≤ n), U[b+1, n] ) ) = O(b^{−1}) → 0.   (7.98)

Combining (7.95)–(7.98), it follows that X^{(n)} and X*^{(n)} are asymptotically equivalent. This leads to the following result.

Theorem 7.33 Suppose that (7.89) and Conditions (A0), (D1) and (B11) hold. Then if, for some sequence of centering functions M_n : [0,1] → IR, either of L(X*^{(n)} − M_n) or L(X^{(n)} − M_n) converges, it follows that the other also converges, and to the same limit.

This theorem remains true in d dimensions if, in (7.89), IE U_j² is replaced by IE|U_j|².

The choice of normalization is rather natural. For instance, if U_j = c j^{α/2} a.s. for each j, then σ²(n) ~ c² α^{−1} n^α, and

  X*^{(n)}(1) ~ α^{1/2} ∫_0^1 x^{α/2} Ψ*^{(n)}(dx) →_d α^{1/2} ∫_0^1 x^{α/2} Ψ*(dx),

where Ψ*^{(n)} and Ψ* are the ESF(θ) and PD(θ) random measures defined in (7.6) and (7.5). Thus, in this case, X*^{(n)}(1) has a limit in distribution, without any centering. However, if α = 2, the limit actually has a degenerate distribution, since ∫_0^1 x Ψ*(dx) = 1 a.s. More generally, X*^{(n)} can only be expected to have a non-degenerate limit if

  v(n) = Var X*^{(n)} ≥ c σ²(n)   (7.99)

for some c > 0 and for all n. This condition is satisfied if the random variables U_j are centered, or, more generally, if Var U_j ≥ c′ IE U_j² for some c′ > 0 and for all j, since

  Var X*^{(n)} = Var( Σ_{j=1}^n 1l{C*_j^{(n)} = 1} U_j )
    = IE{ Σ_{j=1}^n 1l{C*_j^{(n)} = 1} Var U_j } + Var( Σ_{j=1}^n 1l{C*_j^{(n)} = 1} IE U_j )
    ≥ c′ Σ_{j=1}^{⌊n/2⌋} IP[C*_j^{(n)} = 1] IE U_j² ≥ c″ σ²(n),   (7.100)

for suitable constants c′ and c″, since j IP[C*_j^{(n)} = 1] is bounded below in j ≤ n/2, by Lemma 13.1. On the other hand, the dependence between the random variables C*_j^{(n)} can result in v(n) being of smaller order than σ²(n), as in the example considered above. In particular, if U_j(s) = sj a.s. for all j and s, then σ²(n) = n(n+1)/2 is regularly varying with exponent α = 2, but X^{(n)} − n is a.s. zero, and the distribution of X*^{(n)} − n ≤ 0 has a non-trivial limit. In such circumstances, the non-degenerate normalization for X^{(n)} may not be σ^{−1}(n), nor need X*^{(n)} be appropriate for describing its limiting behaviour.

Even when (7.99) holds, so that the asymptotics of X^{(n)} are the same as those of X*^{(n)}, the limit theory is complicated. For one thing, there is still the dependence between the C*_j^{(n)}, which leads to the Poisson–Dirichlet approximations of Chapter 7.2, rather than to approximations based on processes with independent increments. But, even allowing for this, there is no universal approximation valid for a wide class of U_j sequences, as was the case with slow growth and the Gaussian approximations. For example, take the case in which IE U_j² ~ c j^α for some α > 0. Then σ²(n) ~ c α^{−1} n^α is of the same order as IE U_j² for n/2 < j ≤ n, and there is an asymptotically non-trivial probability that one such j will have C*_j^{(n)} = 1. Hence the distribution of the sum X*^{(n)} typically depends in detail on the distributions of the individual U_j's.

Rates for the approximation of X^{(n)} by X*^{(n)} can of course be deduced, under suitable assumptions.

Theorem 7.34 Under Conditions (A0), (D1) and (B11), if Condition (7.89) is strengthened to

  IE U_j² ~ c j^α for some c, α > 0;   (7.101)
  Σ_{j=1}^m Σ_{l≥2} IP[Z_j = l] IE|U_j(l)| = O(m^{α/2}),   (7.102)

then

  σ^{−1}(n) IE{ sup_{0≤t≤1} |X^{(n)}(t) − X*^{(n)}(t)| ∧ 1 } = O( n^{−rβ_0/(r+β_0)} log³ n ),

where r = 1 ∧ (α/2) and β_0 = 1 ∧ θ ∧ g_1.

Proof. Collecting the bounds in (7.91) to (7.96) gives a contribution to the error of order O( n^{−1} b + (n^{−1} b)^{α/2} √log b ), since σ²(m) ~ c α^{−1} m^α and

  IE| W′_{1b}(Z) − Σ_{j=1}^b 1l{Z_j = 1} |U_j| | = O(b^{α/2}),

from (7.102). Then the errors arising from (7.97) and (7.98) are of orders O( b^{−(g_1∧θ)} log³ b ) and O(b^{−1}) respectively. Choosing b(n) = n^{r/(r+β_0)}, for r and β_0 as defined above, completes the proof. □

8 A Stein Equation

We now turn to the foundations on which the results of Chapters 6.7 and 7 are built. In this chapter, we introduce the main building block, Stein's method for the compound Poisson distributions of T_{0m}(Z*) and X_θ. A short, elementary introduction to the method is to be found in Chapter 3.4.

8.1 Stein’s method for T0m(Z∗)

An essential part of the argument is to be able to compare the distributionof a sum like Tvm(Z) to that of T0m(Z∗). The distribution of T0m(Z∗) iscompound Poisson CP(λ(m)), with rates λi = θ/i, 1 ≤ i ≤ m, and λi = 0otherwise, since T0m(Z∗) =

∑mi=1 iZ

∗i , and the Z∗i are independent Poisson

Po(θ/i) random variables. Thus, in order to obtain approximations, we canuse Stein’s method for the compound Poisson distribution, described inBarbour, Chen and Loh (1992). The Stein Operator Sm can be written inthe form

(Smg)(w) =m∑i=1

iλig(w + i)− wg(w) = θm∑i=1

g(w + i)− wg(w), w ≥ 0,

(8.1)and the Stein Equation as

(Smg)(w) = f(w)− CP(λ(m))f, w ≥ 0, (8.2)

(c.f. (3.20)), where the test function f is any function whose expectationCP(λ(m))f with respect to the compound Poisson distribution CP(λ(m))

Page 221: This is page i Logarithmic Combinatorial Structures: a ..._Publications...This is page i Printer: Opaque this Logarithmic Combinatorial Structures: a Probabilistic Approach Richard

8.1. Stein’s method for T0m(Z∗) 209

exists, and g = gf is a function determined by solving (8.2). Then, if W isany nonnegative integer valued random variable, it follows from (8.2) that

IEf(W )− CP(λ(m))f = IE(Smgf )(W ),

provided that the expectations exist, and hence that

supf∈F

|IEf(W )− CP(λ(m))f| ≤ supf∈F

|IE(Smgf )(W )|, (8.3)

for any choice of family F of test functions. There are a number of choices of F for which the left hand side of (8.3) corresponds to a useful metric on the set of probability distributions on ZZ_+: for instance,

  F = { 1l_A : A ⊂ ZZ_+ } gives the total variation distance, d_TV;
  F = { f : ZZ_+ → IR : sup_{j≥0} |f(j+1) − f(j)| ≤ 1 } gives the Wasserstein distance, d_W;   (8.4)
  F = { f : ZZ_+ → IR : sup_{j≥0} |f(j+1) − f(j)| ≤ 1, sup_{j≥0} |f(j)| ≤ 1/2 } gives the bounded Wasserstein distance, d_BW;
  F = { 1l_{[0,j]} : j ≥ 0 } gives the Kolmogorov distance, d_K.

The essence of Stein's method is that, if L(W) is in fact close to CP(λ^{(m)}), then the right hand side of (8.3) can often be shown to be small by rather direct arguments, thus providing concrete upper bounds for the accuracy of approximation with respect to these metrics.

Taking F to be { 1l_A : A ⊂ ZZ_+ }, for total variation approximation, we denote the solutions g_f by g_{mA}. Then, if W is a nonnegative integer valued random variable, it follows from (8.1) and (8.2) that

  IP[W ∈ A] − CP(λ^{(m)}){A} = θ IE Σ_{i=1}^m g_{mA}(W+i) − IE{ W g_{mA}(W) }.   (8.5)

For our choices of W, the right hand side of (8.5) can relatively easily be shown to be composed of small terms, multiplied by coefficients depending only on the functions g_{mA}; indeed, if W exactly has the distribution CP(λ^{(m)}), the right hand side of (8.5) is exactly zero, if g_{mA} is replaced by any bounded function g, so that, in particular,

  IE{ T_{0m}(Z*) g(T_{0m}(Z*)) } = θ Σ_{i=1}^m IE g(T_{0m}(Z*) + i)   (8.6)

for all bounded g : ZZ_+ → IR. In order to find bounds for these coefficients, we need to know the size of |g_{mA}(w)|, w ∈ ZZ_+; we thus begin by deriving some appropriate estimates.
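Identity (8.6) can be checked numerically: taking g = 1l{· = n} turns it into the recursion n IP[T = n] = θ Σ_{i=1}^{m∧n} IP[T = n−i] for T = T_{0m}(Z*). A sketch (ours, not part of the text), computing the exact pmf by convolving the laws of the summands i Z*_i and then verifying the recursion:

```python
from math import exp, factorial

def t0m_pmf(m, theta, nmax):
    """pmf of T_{0m}(Z*) = sum_i i*Z_i* on {0,...,nmax},
    with independent Z_i* ~ Po(theta/i)."""
    pmf = [1.0] + [0.0] * nmax
    for i in range(1, m + 1):
        lam = theta / i
        po = [exp(-lam) * lam ** k / factorial(k) for k in range(nmax // i + 1)]
        # convolve the current pmf with the law of i * Z_i*
        pmf = [sum(po[k] * pmf[n - i * k] for k in range(n // i + 1))
               for n in range(nmax + 1)]
    return pmf

theta, m = 1.5, 6
p = t0m_pmf(m, theta, 40)
# (8.6) with g = 1l{. = n} reads: n * P[T=n] = theta * sum_{i=1}^{min(m,n)} P[T=n-i]
```

The recursion holds exactly on the computed range, since each value p[n] involves only smaller indices, so the truncation at nmax does not disturb it.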

An attractive starting point is apparently given by the estimates in Barbour, Chen and Loh, but although the sequence {i λ_i}_{i≥1} is non-increasing, their bounds are of little use, because λ_1 − 2λ_2 = 0. However, the re-expression of S_m in terms of the generator of a Markov process is useful. Setting g(w) = q(w) − q(w−1), we have

  (S_m g)(w) = (A_m q)(w) = θ{ q(w+m) − q(w) } + w{ q(w−1) − q(w) },   (8.7)

where A_m is the infinitesimal generator of ζ, an immigration–death process with per-capita death rate 1 and immigration at rate θ in groups of size m: ζ has CP(λ^{(m)}) as its equilibrium distribution. Defining q_{mA} by g_{mA}(w) = q_{mA}(w) − q_{mA}(w−1), the corresponding Stein equation (8.2) becomes

  (A_m q_{mA})(w) = 1l{w ∈ A} − CP(λ^{(m)}){A}, w ≥ 0,   (8.8)

and a solution is then given by

  q_{mA}(w) = − ∫_0^∞ ( IP_w[ζ(t) ∈ A] − CP(λ^{(m)}){A} ) dt, w ≥ 0,   (8.9)

where IP_w denotes the distribution of ζ conditional on ζ(0) = w.

Lemma 8.1 For any k ≥ 1, A ⊂ ZZ_+ and for all w ≥ 0,

  | Σ_{i=1}^k g_{mA}(w+i) | ≤ h(k+1),   (8.10)

where, as usual, h(r+1) = Σ_{j=1}^r j^{−1}.

Proof. For any k ≥ 1 and w ≥ 0, from (8.9),

  q_{mA}(w) − q_{mA}(w+k) = ∫_0^∞ ( IP_{w+k}[ζ(t) ∈ A] − IP_w[ζ(t) ∈ A] ) dt
    = ∫_0^∞ IE( 1l{ζ_0(t) + D(t) ∈ A} − 1l{ζ_0(t) ∈ A} ) dt,   (8.11)

where ζ_0 denotes a realization of the immigration–death process with ζ_0(0) = w, and D is an independent realization of a pure death process with per-capita death rate 1 and D(0) = k. Hence, if ν_0 denotes the first time at which D(t) = 0,

  | Σ_{i=1}^k g_{mA}(w+i) | = | q_{mA}(w) − q_{mA}(w+k) | ≤ ∫_0^∞ IP[D(t) > 0] dt = IE ν_0 = h(k+1),

and the lemma follows. □

Lemma 8.2 For all A ⊂ ZZ_+ and w ≥ 1,

  |g_{mA}(w)| ≤ 1 ∧ ( κ(m)/w ),

where κ(m) = 1 + θ h(m+1).

Proof. The first part is just (8.10) with k = 1. For the second, observe that (8.2) gives

  w g_{mA}(w) = θ Σ_{i=1}^m g_{mA}(w+i) − 1l{w ∈ A} + CP(λ^{(m)}){A},

and then invoke (8.10) with k = m. □

Lemma 8.3 Suppose that A is of the form [0, x−1]. Then, denoting the corresponding function g by g_{mx},

  0 ≤ g_{mx}(w) ≤ ( (1+θ)/(x+θ) ) ( 1 ∧ x/(w+1) ), w ≥ 1.

Proof. For any w ≥ 0, from (8.11) with k = 1,

  g_{mx}(w+1) = ∫_0^∞ IE( 1l{ζ_0(t) = x−1} 1l{D(t) > 0} ) dt,

where ζ_0 and D are as in (8.11). Let the jump times of ζ_0 + D be {τ_r}_{r≥1}, and let R = min{ r : D(τ_r) = 0 }; let {σ_j}_{j≥1} index the subsequence of the τ_r's generated by the upward jumps of ζ_0. Set σ_0 = τ_0 = 0. Then, using the Markov property,

  0 ≤ ∫_0^∞ IE{ 1l{ζ_0(t) = x−1} 1l{D(t) > 0} } dt
    = IE Σ_{r≥0} 1l{r < R} 1l{ζ_0(τ_r) = x−1} (τ_{r+1} − τ_r)
    = (x+θ)^{−1} IE Σ_{r≥0} 1l{r < R} 1l{ζ_0(τ_r) = x−1}
    = (x+θ)^{−1} IE Σ_{r≥0} Σ_{j≥1} 1l{r < R} 1l{ζ_0(τ_r) = x−1} 1l{σ_{j−1} ≤ r < σ_j}
    ≤ (x+θ)^{−1} IE Σ_{j≥1} Σ_{σ_{j−1}≤r<σ_j} 1l{σ_{j−1} < R} 1l{ζ_0(τ_r) = x−1}
    ≤ (x+θ)^{−1} IE Σ_{j≥1} 1l{σ_{j−1} < R},

where the last inequality holds because ζ_0 can visit x−1 at most once between upward jumps. Thus

  0 ≤ g_{mx}(w+1) ≤ (x+θ)^{−1} ( 1 + Σ_{j≥1} IP[τ_{σ_j} < ν_0] ) = (1+θ)/(x+θ),   (8.12)

since the τ_{σ_j} form a Poisson process of rate θ, independent of ν_0.

For w > x, let A_x denote the event that ζ_0 reaches x−1 at least once before ν_0. Then, by the Markov property,

  ∫_0^∞ IE( 1l{ζ_0(t) = x−1} 1l{D(t) > 0} ) dt = IP[A_x] ∫_0^∞ IE( 1l{ζ̄_0(t) = x−1} 1l{D̄(t) > 0} ) dt,   (8.13)

where ζ̄_0 has the distribution of the immigration–death process starting with ζ̄_0(0) = x−1, and D̄ is an independent death process with D̄(0) = 1. Now, in reaching x−1 from w, ζ_0 must make each of the transitions j → j−1 for x ≤ j ≤ w at least once, and the chance that D still takes the value 1 at the end of this time is at most

  Π_{j=x}^w ( j/(j+1) ) = x/(w+1),

so that IP[A_x] ≤ x/(w+1). Using this estimate in (8.13), together with the estimate (8.12) for the integral, completes the inequality. □

Lemmas 8.1 and 8.2 provide the information about |g_{mA}(w)| that is needed when establishing approximations with respect to total variation distance; Lemma 8.3 is used for Kolmogorov distance. However, when sharpening the approximation of m^{−1} T_{bm}(Z) by X_θ in Theorem 5.3, we also need to consider the Wasserstein distance between m^{−1} T_{vm}(Z) and m^{−1} T_{0m}(Z*), as in Theorem 10.8. Here, because of the normalizations m^{−1} in the random variables being compared, we need to consider the solutions g = g_{mf} of (8.2) for functions f satisfying |f(j) − f(k)| ≤ m^{−1} |j − k|. The analogue of (8.9) now gives

  g_{mf}(w+1) = − ∫_0^∞ IE{ f(ζ_0(t) + D(t)) − f(ζ_0(t)) } dt,

where ζ_0(0) = w and D(0) = 1, and hence, for such f, we immediately obtain the simple inequality

  |g_{mf}(w)| ≤ m^{−1}, w ≥ 1.   (8.14)

To give a quick illustration of how we apply Stein's method using these estimates, we prove the following result, which shows how close in distribution T_{vm}(Z*) is to T_{0m}(Z*).


Example 8.4 For $0 \le v \le m$ and $t \ge 0$,
\[
0 \le \mathbb{P}[T_{vm}(Z^*) \le t] - \mathbb{P}[T_{0m}(Z^*) \le t] \le \frac{1+\theta}{t+1+\theta}\,\theta v.
\]

Proof. The analogues of (8.1) and (8.5) for $T_{vm}(Z^*)$, which itself has a compound Poisson distribution, yield the equality
\[
\mathbb{E}\{T_{vm}(Z^*)\,g(T_{vm}(Z^*))\} = \theta \sum_{i=v+1}^m \mathbb{E} g(T_{vm}(Z^*) + i)
\qquad (8.15)
\]
for all bounded functions $g$; (8.6) is the special case $v = 0$. Thus, taking $W = T_{vm}(Z^*)$ in (8.5), we find that
\[
\mathbb{P}[T_{vm}(Z^*) \in A] - \mathrm{CP}(\lambda^{(m)})\{A\} = \theta \sum_{i=1}^{v} \mathbb{E} g(T_{vm}(Z^*) + i).
\qquad (8.16)
\]
Taking $A = [0,t]$, and applying Lemma 8.3 with $x = t+1$, we obtain the result claimed. $\Box$

Note that, for $t$ of order $m$, the difference between the probabilities is of order $v/m$, which is small whenever $m \gg v$; and $t$ of order $m$ is precisely the range of primary interest to us, since $m^{-1}T_{0m}(Z^*)$ converges in distribution to the positive random variable $X_\theta$, as proved in Theorem 4.6.
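Relation (8.15) lends itself to a direct numerical check in the special case $Z^*_i \sim \mathrm{Po}(\theta/i)$, for which $T_{vm}(Z^*) = \sum_{i=v+1}^m i Z^*_i$. The following Python sketch is our own illustration (the helper names and the bounded test function $g(w) = 1/(1+w)$ are arbitrary choices, not from the text); it estimates both sides of (8.15) from the same simulated sample.

```python
import math
import random

def poisson(lam, rng):
    # Inverse-transform sampling for a Poisson(lam) variate (lam is small here).
    u, k, p, c = rng.random(), 0, math.exp(-lam), math.exp(-lam)
    while u > c:
        k += 1
        p *= lam / k
        c += p
    return k

def check_identity(theta=1.0, m=10, v=0, n_samples=50_000, seed=1):
    # Estimates IE[W g(W)] and theta * sum_{i=v+1}^m IE[g(W + i)]
    # for W = T_{vm}(Z*) = sum_{i=v+1}^m i Z*_i, Z*_i ~ Po(theta/i) independent,
    # using the bounded test function g(w) = 1/(1+w); (8.15) says they are equal.
    rng = random.Random(seed)
    g = lambda w: 1.0 / (1.0 + w)
    lhs = rhs = 0.0
    for _ in range(n_samples):
        w = sum(i * poisson(theta / i, rng) for i in range(v + 1, m + 1))
        lhs += w * g(w)
        rhs += theta * sum(g(w + i) for i in range(v + 1, m + 1))
    return lhs / n_samples, rhs / n_samples
```

With the default parameters the two estimates agree to within Monte Carlo error of a few parts in a thousand, since both sides are computed from the same sample.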

8.2 Stein's method for $P_\theta$

A second essential ingredient in our proofs is a sharpening of Theorem 4.6, to give estimates of the error in the limiting approximation of $m^{-1}T_{0m}(Z^*)$ by $X_\theta$. In order to obtain information about the approximation error, we again need Stein's method.

Defining $g_m$ by $g_m(w) = g(w/m)$, for any fixed function $g : \mathbb{R}_+ \to \mathbb{R}$, and taking $g = g_m$ in (8.1), we obtain
\[
\frac{1}{m}(S_m g_m)(w) = \frac{\theta}{m}\sum_{i=1}^m g\Bigl(\frac{w+i}{m}\Bigr) - \frac{w}{m}\,g\Bigl(\frac{w}{m}\Bigr).
\]
This suggests a Stein operator for $P_\theta$ defined by
\[
(Sg)(u) = \theta\int_0^1 g(u+t)\,dt - u\,g(u),
\qquad (8.17)
\]
with the defining property that $P_\theta\{Sg\} = 0$ for all bounded $g$. Consideration of $g(u) = \mathbf{1}\{x < u \le x+\varepsilon\}$ for small $\varepsilon$ shows that (8.17) is the Stein operator appropriate to the probability distribution $P$ on $\mathbb{R}_+$ satisfying

$P\{0\} = 0$ and
\[
x\,\frac{d}{dx}P[0,x] = \theta\,P[(x-1)^+, x), \qquad x > 0,
\]

so that $P$ is indeed the distribution $P_\theta$ introduced in Theorem 4.6; see, for example, Equation (4.29). Hence, if we know enough about the solutions $g_f$ of the corresponding Stein equation
\[
(Sg_f)(u) = f(u) - P_\theta(f),
\qquad (8.18)
\]
for suitable classes of test function $f$, we can use Stein's method to obtain estimates of the error in approximating the distribution of $m^{-1}T_{0m}(Z^*)$ by $P_\theta$, just as in (8.3). The only difference is that we are now concerned with probability distributions over $\mathbb{R}_+$, rather than over $\mathbb{Z}_+$, so that the definitions of the distances need minor modification:
\[
\mathcal{F} = \{f : \mathbb{R}_+ \to \mathbb{R} : \|f'\| \le 1\}
\]
gives the Wasserstein distance, $d_W$;
\[
\mathcal{F} = \bigl\{f : \mathbb{R}_+ \to \mathbb{R} : \|f'\| \le 1,\ \|f\| \le \tfrac12\bigr\}
\qquad (8.19)
\]
gives the bounded Wasserstein distance, $d_{BW}$;
\[
\mathcal{F} = \bigl\{\mathbf{1}_{[0,x]},\ x \ge 0\bigr\}
\]
gives the Kolmogorov distance, $d_K$.

In particular, from (8.17) and (8.6),
\[
\mathbb{E}(Sg_f)(m^{-1}T_{0m}(Z^*))
= \theta\,\mathbb{E}\Bigl\{\int_0^1 g_f(m^{-1}T_{0m}(Z^*) + t)\,dt
- \frac{1}{m}\sum_{i=1}^m g_f\bigl(m^{-1}[T_{0m}(Z^*) + i]\bigr)\Bigr\},
\qquad (8.20)
\]
indicating that $\mathbb{E} f(m^{-1}T_{0m}(Z^*)) - P_\theta(f)$ is small for all functions $f$ such that $g_f$ is bounded and $\int_0^1 g_f(u+t)\,dt$ is well approximated by the discretization $\frac{1}{m}\sum_{i=1}^m g_f(u + i/m)$, for $m$ large and all $u$. We now derive bounds for the accuracy of this discretization, when the function $f$ is uniformly Lipschitz continuous, as for Wasserstein distance, and when it is the indicator of a half line, as for Kolmogorov distance.

As when establishing the properties of the solutions of (8.2), it is useful first to find an explicit probabilistic formula for $g_f$, analogous to (8.9). Here, we define $h_f$ such that $h_f'(u) = g_f(u)$, so that, from (8.17),
\[
(Sg_f)(u) = (\mathcal{A}h_f)(u) = \theta\{h_f(u+1) - h_f(u)\} - u\,h_f'(u).
\qquad (8.21)
\]
$\mathcal{A}$ is then the infinitesimal generator of a process $\xi$ on $\mathbb{R}_+$ which at the point $u$ has a deterministic drift at rate $u$ towards the origin, together with unit positive jumps at the points $(\sigma_j,\ j \ge 1)$ of a Poisson process with intensity $\theta$: thus
\[
\xi(t) = \xi(0)e^{-t} + \sum_{j\ge 1} e^{-(t-\sigma_j)}\,\mathbf{1}\{\sigma_j \le t\}.
\qquad (8.22)
\]
In these terms, $h_f$ can be expressed in the form
\[
h_f(u) - h_f(v) = \int_0^\infty \mathbb{E}\{f(\xi(t) + ve^{-t}) - f(\xi(t) + ue^{-t})\}\,dt.
\qquad (8.23)
\]
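The operator $\mathcal{A}$ generates $\xi$, and letting $t \to \infty$ in (8.22) with $\xi(0) = 0$ identifies the equilibrium law of $\xi$ as that of $\sum_{j\ge 1} e^{-\sigma_j}$, which should then satisfy the defining property $P_\theta\{Sg\} = 0$ of (8.17). That chain of identifications is our reading of the display rather than a claim made explicitly here, but it is easy to probe numerically; the sketch below (our illustration, with the test function $g(u) = 1/(1+u)$, whose inner integral is available in closed form) simulates the limit and checks that the Stein operator has expectation approximately zero.

```python
import math
import random

def xi_infinity(theta, rng, horizon=30.0):
    # Samples sum_j exp(-sigma_j) over the points sigma_j of a rate-theta
    # Poisson process on [0, horizon]; the tail beyond the horizon contributes
    # at most theta*exp(-horizon) in expectation, which is negligible here.
    x, t = 0.0, 0.0
    while True:
        t += rng.expovariate(theta)
        if t > horizon:
            return x
        x += math.exp(-t)

def stein_expectation(theta=1.0, n_samples=50_000, seed=7):
    # Estimates IE[(Sg)(X)] for X distributed as xi(infinity) and g(u) = 1/(1+u),
    # where (Sg)(u) = theta * int_0^1 g(u+t) dt - u*g(u)
    #              = theta * log((u+2)/(u+1)) - u/(1+u)   (exact inner integral).
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(n_samples):
        x = xi_infinity(theta, rng)
        acc += theta * math.log((x + 2.0) / (x + 1.0)) - x / (1.0 + x)
    return acc / n_samples
```

The estimate comes out zero to within Monte Carlo error; the empirical mean of the samples should also be close to $\theta$, giving a second quick check.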

Lemma 8.5 If $f$ is uniformly Lipschitz, satisfying $|f(u) - f(v)| \le |u-v|$ for all $u, v \ge 0$, then, for all $u \ge 0$,
\[
\Bigl|\int_0^1 g_f(u+t)\,dt - \frac{1}{m}\sum_{i=1}^m g_f(u + i/m)\Bigr|
\le \frac{1+\theta}{m}\sum_{i=1}^m \frac{1}{mu+i}.
\]

Proof. It is immediate from (8.23) that, under the given condition on $f$,
\[
|h_f(u) - h_f(v)| \le |v - u|,
\qquad (8.24)
\]
so that $|g_f(u)| = |h_f'(u)| \le 1$. Then, for $x, u \ge 0$, using (8.18) and (8.21),
\begin{align*}
g_f(u+x) - g_f(u) &= h_f'(u+x) - h_f'(u) \\
&= \frac{1}{u+x}\bigl[(u+x)h_f'(u+x) - u h_f'(u) - x h_f'(u)\bigr] \\
&= \frac{1}{u+x}\bigl[\theta\{h_f(u+x+1) - h_f(u+x) - h_f(u+1) + h_f(u)\} \\
&\qquad\qquad - f(u+x) + f(u) - x h_f'(u)\bigr],
\qquad (8.25)
\end{align*}
and thus, from (8.24),
\[
|g_f(u+x) - g_f(u)| \le \frac{2x(\theta+1)}{u+x}.
\qquad (8.26)
\]
This in turn implies that
\begin{align*}
\Bigl|\int_0^1 g_f(u+t)\,dt - \frac{1}{m}\sum_{i=1}^m g_f(u+i/m)\Bigr|
&\le \sum_{i=1}^m \int_0^{1/m} |g_f(u+i/m) - g_f(u+i/m-y)|\,dy \\
&\le \frac{\theta+1}{m^2}\sum_{i=1}^m \frac{1}{u+i/m},
\end{align*}
as required. $\Box$
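The final integration step of the proof can be reproduced numerically with any function obeying a modulus bound of the form (8.26). The following is a small self-contained check (our illustration; the choice $g(u) = -1/(1+u)$, which satisfies $|g(u+x) - g(u)| \le x/(u+x)$, stands in for $g_f$ with the constant $2(\theta+1)$ replaced by 1, so the resulting bound is $(1/2m)\sum_{i=1}^m 1/(mu+i)$):

```python
import math

def riemann_error_and_bound(u, m):
    # g(t) = -1/(1+t) satisfies |g(a+x) - g(a)| <= x/(a+x) for a, x >= 0,
    # playing the role of the modulus bound (8.26) with 2(theta+1) replaced by 1.
    # Exact integral: int_0^1 g(u+t) dt = -log((u+2)/(u+1)).
    exact = -math.log((u + 2.0) / (u + 1.0))
    # Right-endpoint Riemann sum, as in the discretization of (8.20).
    riemann = sum(-1.0 / (1.0 + u + i / m) for i in range(1, m + 1)) / m
    err = abs(exact - riemann)
    # Bound obtained by integrating the modulus over each subinterval.
    bound = sum(1.0 / (m * u + i) for i in range(1, m + 1)) / (2.0 * m)
    return err, bound
```

For $u = 0$ the bound is of order $(\log m)/m$, matching the order of the estimate in Lemma 8.5; for fixed $u > 0$ both error and bound are of order $1/m$.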


Lemma 8.6 If $f(u) = \mathbf{1}\{u \le y\}$ then, for any $u \ge 0$,
\[
\Bigl|\int_0^1 g_f(u+t)\,dt - \frac{1}{m}\sum_{i=1}^m g_f(u+i/m)\Bigr|
\le \frac{(2\theta+1)(\theta+1)}{2my}\sum_{i=1}^m \frac{1}{mu+i} + \frac{1}{my}.
\]

Proof. By (8.22) and (8.23), if $u > v \ge 0$,
\begin{align*}
|h_f(u) - h_f(v)| &= \int_0^\infty \mathbb{E}\,\mathbf{1}\{\xi(t) + ve^{-t} \le y < \xi(t) + ue^{-t}\}\,dt \\
&\le y^{-1}|u-v|\Bigl\{1 + \mathbb{E}\Bigl(\sum_{j\ge 1} e^{-\sigma_j}\Bigr)\Bigr\},
\end{align*}
since the indicator is positive for at most one $t$-interval between successive points of the Poisson process, and if $\tau$ is the initial point of such an interval, its length cannot exceed $y^{-1}|u-v|e^{-\tau}$. Hence
\[
|h_f(u) - h_f(v)| \le y^{-1}|u-v|(1+\theta),
\qquad (8.27)
\]
and thus
\[
|g_f(u)| \le y^{-1}(1+\theta)
\qquad (8.28)
\]
for all $u$. Arguing in the same way as for (8.25), it then follows that
\[
|g_f(u+x) - g_f(u)| \le \frac{1}{u+x}\bigl\{y^{-1}(2\theta+1)(1+\theta)x + \mathbf{1}\{u \le y < u+x\}\bigr\},
\qquad (8.29)
\]
after which integration gives
\begin{align*}
\Bigl|\int_0^1 g_f(u+t)\,dt - \frac{1}{m}\sum_{i=1}^m g_f(u+i/m)\Bigr|
&\le \frac{(2\theta+1)(1+\theta)}{2my}\sum_{i=1}^m \frac{1}{mu+i}
+ \sum_{i=1}^m \frac{\mathbf{1}\{y - i/m < u \le y - (i-1)/m\}}{mu+i} \\
&\le \frac{(2\theta+1)(1+\theta)}{2my}\sum_{i=1}^m \frac{1}{mu+i} + \frac{1}{my}.
\end{align*}
$\Box$

The estimates of Lemmas 8.5 and 8.6 are used in Theorems 10.10 and 10.12 to give rates of convergence in Theorem 4.6.

8.3 Applying Stein’s method

In exploiting Stein's method, we replace $w$ in (8.2) by a random variable $W$ of interest, and take expectations; hence the key step is to be able to make good estimates of the result of replacing $w$ in the definition of the Stein operator (8.1) by the random variable $W$, and then taking expectations. If $W$ has exactly the distribution to which the Stein operator corresponds, here the compound Poisson distribution of $T_{0m}(Z^*)$, then $\mathbb{E}(S_m g)(W) = 0$ for all bounded functions $g$. If $W$ has a distribution which is close to that of $T_{0m}(Z^*)$, then $\mathbb{E}(S_m g)(W)$ should be close to zero. If we are able to demonstrate this effectively, then we can choose $g$ to be the solution of the Stein equation (8.2) corresponding to any member $f$ of a family of test functions, and hence deduce explicit estimates of the closeness of the distributions of $W$ and $T_{0m}(Z^*)$. This is a strategy that we shall use extensively in the later parts of this monograph, but it is not our only resource. If we have good bounds on $|\mathbb{E}(S_m g)(W)|$ for any $g$, we can make direct choices of $g$, rather than choosing an $f$ and solving the Stein equation; this procedure can also yield valuable information.

In our current problem, the random variables $W$ of interest are the $T_{vm}(Z)$. Whichever route is taken, we thus need to be able to evaluate the expectation $\mathbb{E}(S_m g)(T_{vm}(Z))$ as accurately as possible. Now, from (8.1),
\[
(S_m g)(w) = \theta\sum_{i=1}^m g(w+i) - w\,g(w), \qquad w \ge 0.
\qquad (8.30)
\]
The following lemma shows how to evaluate the expectation of the second element, in terms which closely match the expectation of the first element. Let $T_{vm}^{(i)}(Z)$, $v+1 \le i \le m$, denote the sum $T_{vm}(Z) - iZ_{i1}$, having the first $i$-contribution deleted. Then we have the following formula for the contribution to (8.30) of the product $\mathbb{E}\{W g(W)\}$, when $W = T_{vm}(Z)$.

Lemma 8.7 For any bounded $g$,
\[
\mathbb{E}\{T_{vm}(Z)\,g(T_{vm}(Z))\} = \theta\sum_{i=v+1}^m \mathbb{E} g(T_{vm}(Z) + i) + \theta\sum_{j=1}^3 \eta_j^{(vm)}(g),
\]
where
\begin{align*}
\eta_1^{(vm)}(g) &= \sum_{i=v+1}^m \sum_{l\ge 1} l\varepsilon_{il}\,\mathbb{E} g(T_{vm}^{(i)}(Z) + il); \\
\eta_2^{(vm)}(g) &= -\sum_{i=v+1}^m \frac{\theta}{ir_i}
\Bigl\{\sum_{l\ge 2}\varepsilon_{il}\,\mathbb{E} g(T_{vm}^{(i)}(Z) + i(l+1)) - E_{i1}\,\mathbb{E} g(T_{vm}^{(i)}(Z) + i)\Bigr\}
\end{align*}
and
\[
\eta_3^{(vm)}(g) = \sum_{i=v+1}^m \frac{\theta}{ir_i}(1+\varepsilon_{i1})
\bigl\{\mathbb{E} g(T_{vm}^{(i)}(Z) + i) - \mathbb{E} g(T_{vm}^{(i)}(Z) + 2i)\bigr\}.
\]


Proof. Write $W$ for $T_{vm}(Z)$ and $W_{ij}$ for $W - iZ_{ij}$. Then, for any pair $(i,j)$ such that $v+1 \le i \le m$ and $1 \le j \le r_i$, we have
\begin{align*}
\mathbb{E}\{Z_{ij}\, g(W)\} &= \frac{\theta}{ir_i}\,\mathbb{E} g(W_{ij} + i) + \sum_{l\ge 1}\frac{\theta}{ir_i}\,l\varepsilon_{il}\,\mathbb{E} g(W_{ij} + il) \\
&= \frac{\theta}{ir_i}\,\mathbb{E} g(W_{i1} + i) + \sum_{l\ge 1}\frac{\theta}{ir_i}\,l\varepsilon_{il}\,\mathbb{E} g(W_{i1} + il).
\end{align*}
Hence, multiplying by $i$ and adding over pairs $(i,j)$ satisfying $1 \le j \le r_i$ and $v+1 \le i \le m$, we obtain
\begin{align*}
\mathbb{E}\{W g(W)\} &= \theta\sum_{i=v+1}^m \Bigl\{\mathbb{E} g(W_{i1} + i) + \sum_{l\ge 1} l\varepsilon_{il}\,\mathbb{E} g(W_{i1} + il)\Bigr\} \\
&= \theta\Bigl\{\sum_{i=v+1}^m \mathbb{E} g(W_{i1} + i) + \eta_1^{(vm)}(g)\Bigr\}.
\qquad (8.31)
\end{align*}
Furthermore,
\begin{align*}
\mathbb{E} g(W + i) &= \mathbb{E} g(W_{i1} + iZ_{i1} + i) \\
&= \mathbb{E} g(W_{i1} + i) + \frac{\theta}{ir_i}(1+\varepsilon_{i1})\bigl\{\mathbb{E} g(W_{i1} + 2i) - \mathbb{E} g(W_{i1} + i)\bigr\} \\
&\quad + \frac{\theta}{ir_i}\sum_{l\ge 2}\varepsilon_{il}\bigl\{\mathbb{E} g(W_{i1} + i(l+1)) - \mathbb{E} g(W_{i1} + i)\bigr\},
\end{align*}
from which the lemma follows. $\Box$

Applying Lemma 8.7 to the evaluation of $\mathbb{E}(S_m g)(T_{vm}(Z))$, we thus obtain
\[
\mathbb{E}(S_m g)(T_{vm}(Z)) = \theta\sum_{i=1}^v \mathbb{E} g(T_{vm}(Z) + i) - \theta\sum_{j=1}^3 \eta_j^{(vm)}(g).
\qquad (8.32)
\]
The first term is an element which describes how much of the accuracy of approximation has been lost by matching $T_{vm}(Z)$ with $T_{0m}(Z^*)$ instead of with $T_{vm}(Z^*)$. It vanishes when $v = 0$, and it remains small for $v$ not too large, whenever the expectations $\mathbb{E} g(T_{vm}(Z) + i)$ are small; the bounds on $|g|$ given in Chapter 8 show that this last requirement is often satisfied for the choices of $g$ useful in practice. The second element summarizes the departure of the distributions of the random variables $Z_i$ from those of $Z^*_i$, and it vanishes if all of the $\varepsilon_{il}$ are zero and the $r_i$ are infinite. If all the $r_i$ are infinite, corresponding to infinitely divisible $Z_i$, then only the term $\eta_1^{(vm)}(g)$ remains; if all the $\varepsilon_{il}$ are zero, corresponding to Bernoulli $Z_{ij}$ with exactly the right mean, then only a contribution from $\eta_3^{(vm)}(g)$ is left.


To relate Lemma 8.7 to the arguments of Chapter 5, we state the following simple consequence, which establishes an effective analogue of the size biasing equation (4.51) in very general circumstances.

Corollary 8.8 Taking $g = \mathbf{1}_s$, for any $s \in \mathbb{Z}_+$, we have
\[
s\,\mathbb{P}[T_{vm}(Z) = s] = \theta\,\mathbb{P}[s-m \le T_{vm}(Z) < s-v] + \theta\sum_{j=1}^3 \eta_j^{(vm)}(\mathbf{1}_s),
\qquad (8.33)
\]
where
\begin{align*}
\eta_1^{(vm)}(\mathbf{1}_s) &= \sum_{i=v+1}^m \sum_{l\ge 1} l\varepsilon_{il}\,\mathbb{P}[T_{vm}^{(i)}(Z) = s - il], \\
\eta_2^{(vm)}(\mathbf{1}_s) &= -\sum_{i=v+1}^m \frac{\theta}{ir_i}
\Bigl\{\sum_{l\ge 2}\varepsilon_{il}\,\mathbb{P}[T_{vm}^{(i)}(Z) = s - i(l+1)] - E_{i1}\,\mathbb{P}[T_{vm}^{(i)}(Z) = s - i]\Bigr\}
\end{align*}
and
\[
\eta_3^{(vm)}(\mathbf{1}_s) = \sum_{i=v+1}^m \frac{\theta}{ir_i}(1+\varepsilon_{i1})
\bigl\{\mathbb{P}[T_{vm}^{(i)}(Z) = s-i] - \mathbb{P}[T_{vm}^{(i)}(Z) = s-2i]\bigr\}.
\]
The closeness of (8.33) to (4.51) is thus determined entirely by the size of the $\eta_j^{(vm)}$'s; the better the bounds that can be established for these error terms, the stronger the consequences that can be proved.

A simple observation which is often useful when bounding the $\eta_j$'s is that, for any $0 \le r \le s$ and $v+1 \le i \le m$,
\[
\mathbb{P}[T_{vm}^{(i)}(Z) = r]\,\mathbb{P}[iZ_{i1} = s-r] \le \mathbb{P}[T_{vm}(Z) = s].
\qquad (8.34)
\]
Hence, for any $g$,
\[
\mathbb{E}|g(T_{vm}^{(i)}(Z))| \le \mathbb{E}|g(T_{vm}(Z))| \big/ \mathbb{P}[Z_{i1} = 0].
\qquad (8.35)
\]
This is frequently used in what follows. Since we always assume that $p_i = \mathbb{P}[Z_{i1} = 0] \to 1$ as $i \to \infty$, the probability in the denominator causes no asymptotic loss of precision; for convenience, we define
\[
w_0 = \min\{i' \ge 0 : p_i \ge 1/2 \text{ for all } i > i'\},
\qquad (8.36)
\]
so that
\[
\mathbb{E}|g(T_{vm}^{(i)}(Z))| \le 2\,\mathbb{E}|g(T_{vm}(Z))|
\qquad (8.37)
\]
for all $i > w_0$. A further consequence of (8.34) is then that, for all $r \ge 0$,
\[
\mathbb{P}[T_{vm}^{(i)}(Z) = r] \le \max_{s\ge r}\mathbb{P}[T_{vm}(Z) = s]\Big/ \max_{l\ge 0}\mathbb{P}[Z_{i1} = l],
\]

implying that, for any $r_0 \ge 0$,
\[
\max_{r\ge r_0}\mathbb{P}[T_{vm}^{(i)}(Z) = r] \le c(w_0)\max_{s\ge r_0}\mathbb{P}[T_{vm}(Z) = s],
\qquad (8.38)
\]
where
\[
c(w_0) = \max\Bigl\{2,\ \max_{1\le i\le w_0}\Bigl(1\Big/\max_{l\ge 0}\mathbb{P}[Z_{i1} = l]\Bigr)\Bigr\}.
\]

9 Point probabilities

In this chapter, we examine the individual probabilities $\mathbb{P}[T_{vm}(Z) = s]$, and their differences for successive values of $s$. Accurate bounds for point probabilities are used almost everywhere, and in particular in controlling the error terms in the analogues of the size biasing equation (4.51), such as Corollary 8.8 and the more sophisticated versions (9.26) and (9.41), which are central to our argument. Their differences are critical in proving total variation estimates for the small components, as indicated already in (6.6).

9.1 Bounds on individual probabilities

We start with point probabilities of the form $\mathbb{P}[T_{vm}(Z) = s]$, for any $s \ge 1$. The bounds that we prove are extensions of those of Lemma 4.12, now valid in full generality, and not just when $Z_j \sim \mathrm{Po}(\theta/j)$, as previously. In Lemma 4.12, the proof is based on exploiting (4.51). Here, we work using its analogue, given in Corollary 8.8.

Simple bounds

The first step is to show that $s\,\mathbb{P}[T_{vm}(Z) = s]$ is bounded in $s$, the counterpart of Lemma 4.12 (iii).

Lemma 9.1 For any $s \ge 1$, we have
\[
\bigl|\mathbb{P}[T_{vm}(Z) = s] - \theta s^{-1}\mathbb{P}[s-m \le T_{vm}(Z) < s-v]\bigr| \le \theta s^{-1}\varepsilon_{9.1}(v),
\]
where $\varepsilon_{9.1}(v) = \varepsilon_{9.1}(v, Z)$ is bounded in $v$ if $\mu^*_0 < \infty$, and is of order $O\bigl((v+1)^{-(1\wedge g_1\wedge a_1)}\bigr)$ under Conditions (A0) and (B01).

Proof. Applying Corollary 8.8, we have
\[
\bigl|\mathbb{P}[T_{vm}(Z) = s] - \theta s^{-1}\mathbb{P}[s-m \le T_{vm}(Z) < s-v]\bigr| \le \theta s^{-1}\sum_{j=1}^3 |\eta_j|,
\]
where, from (8.35),
\[
\eta_1 \le \frac{1}{p^-_v}\sum_{i=v+1}^m \sum_{l\ge 1} l|\varepsilon_{il}|\,\mathbb{P}[T_{vm} = s - il]
\le \frac{1}{p^-_v}\sum_{l\ge 1} l\varepsilon^*_{vl}\,\mathbb{P}[T_{vm} < s] \le \mu^*_v/p^-_v;
\qquad (9.1)
\]
\[
\eta_2 \le \frac{\theta}{v r^-_v}\sum_{i=v+1}^m \Bigl\{E_{i1}\,\mathbb{P}[T_{vm}^{(i)} = s-i] + \sum_{l\ge 2}\varepsilon_{il}\,\mathbb{P}[T_{vm}^{(i)} = s - i(l+1)]\Bigr\}
\le \frac{2E^*_{v1}\theta}{v r^-_v\, p^-_v},
\qquad (9.2)
\]
and finally
\[
\eta_3 \le \frac{\theta(1+\varepsilon^*_{v1})}{p^-_v\, v r^-_v}\,\mathbb{P}[T_{vm} < s-v]
\le \frac{\theta(1+\varepsilon^*_{v1})}{p^-_v\, v r^-_v}.
\qquad (9.3)
\]
Hence it follows that
\[
\bigl|\mathbb{P}[T_{vm}(Z) = s] - \theta s^{-1}\mathbb{P}[s-m \le T_{vm}(Z) < s-v]\bigr| \le \theta s^{-1}\varepsilon_{9.1}(v),
\]
for
\[
\varepsilon_{9.1}(v) = \varepsilon_{9.1}(v, Z) = 2\mu^*_v + \frac{2\theta}{v r^-_v}\bigl(1 + \varepsilon^*_{v1} + 2E^*_{v1}\bigr),
\]
provided that $v \ge w_0$, where $w_0$ is defined as in (8.36). It is immediate that $\varepsilon_{9.1}(v)$ is bounded if $\mu^*_0 < \infty$; furthermore, by appealing to Proposition 6.1, it can be seen to be of order $O\bigl(v^{-(g_1\wedge a_1\wedge 1)}\bigr)$ as $v \to \infty$ under Conditions (A0) and (B01). If $v < w_0$, split the $\eta$ sums in Corollary 8.8 into the ranges $v+1 \le i \le w_0$ and $i > w_0$, and simply bound all point probabilities for $T_{vm}^{(i)}$ in the former range by 1. This gives
\[
\eta_1 \le \sum_{i=v+1}^{w_0}\mu_i + 2\mu^*_{w_0}; \qquad
\eta_2 \le \sum_{i=v+1}^{w_0}\frac{2E_{i1}\theta}{ir_i} + \frac{4\theta E^*_{w_0 1}}{w_0 r^-_{w_0}},
\qquad (9.4)
\]
and
\[
\eta_3 \le \sum_{i=v+1}^{w_0}\frac{\theta}{ir_i}(1+\varepsilon_{i1}) + \frac{2\theta(1+\varepsilon^*_{w_0 1})}{w_0 r^-_{w_0}},
\qquad (9.5)
\]


showing that we can take
\[
\varepsilon_{9.1}(v) = \varepsilon_{9.1}(w_0) + \sum_{i=v+1}^{w_0}\Bigl\{\mu_i + \frac{\theta}{ir_i}\bigl(1+\varepsilon_{i1}+2E_{i1}\bigr)\Bigr\}
\]
in $v < w_0$, completing the proof. $\Box$

Remark. Note that for $Z^*$ we can take all the $\varepsilon_{il}$ to be zero and the $r_i$ to be infinite. This gives
\[
\mathbb{P}[T_{vm}(Z^*) = s] = \theta s^{-1}\,\mathbb{P}[s-m \le T_{vm}(Z^*) < s-v]
\qquad (9.6)
\]
for all $s \ge 1$; this equation should be compared with (4.29). For mappings, $\varepsilon_{9.1}(v) = O(v^{-1/2})$; for polynomials and for square free polynomials, $\varepsilon_{9.1}(v) = O(q^{-v/2})$.
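For $v = 0$, (9.6) yields a recursion which determines the whole distribution of $T_{0m}(Z^*)$ from $\mathbb{P}[T_{0m}(Z^*) = 0] = e^{-\theta\sum_{i=1}^m 1/i}$, the probability that all of the independent $Z^*_i \sim \mathrm{Po}(\theta/i)$ vanish. A minimal Python sketch of this computation (our illustration; the function name is ours):

```python
import math

def point_probs(theta, m, s_max):
    # p[s] = P[T_{0m}(Z*) = s] for T = sum_{i=1}^m i Z*_i, Z*_i ~ Po(theta/i),
    # computed via (9.6) with v = 0:  s p[s] = theta * sum_{j=s-m}^{s-1} p[j],
    # started from p[0] = exp(-theta * H_m).
    p = [0.0] * (s_max + 1)
    p[0] = math.exp(-theta * sum(1.0 / i for i in range(1, m + 1)))
    for s in range(1, s_max + 1):
        p[s] = (theta / s) * sum(p[j] for j in range(max(0, s - m), s))
    return p
```

The computed masses sum to 1 once the truncation point is taken moderately large, and they satisfy $s\,p_s \le \theta$, as (9.6) forces; this is consistent with $K_0(Z^*) = 1$ in the remark following Lemma 9.2 below.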

Lemma 9.1 shows in particular that $s\,\mathbb{P}[T_{vm} = s] \le \theta\{1 + \varepsilon_{9.1}(0)\}$, uniformly in $v$, $s$ and $m$, provided that $\mu^*_0 < \infty$. A slight modification of the argument now shows that $s\,\mathbb{P}[T_{vm}^{(k)}(Z) = s]$ and $s\,\mathbb{P}[T_{vm}^{(k,k')}(Z) = s]$ are also uniformly bounded in $v$, $s$, $m$ and $k, k' \in [v+1, m]$, under the same conditions: here $T_{vm}^{(k)} = T_{vm} - kZ_{k1}$ as before, and $T_{vm}^{(k,k')}$ denotes $T_{vm} - kZ_{k1} - k'Z_{k'1}$ when $k \ne k'$, and $T_{vm} - kZ_{k1} - kZ_{k2}$ when $k = k'$, this latter only allowed if $r_k \ge 2$. These bounds are necessary for the subsequent approximation arguments.

Lemma 9.2 Let $T$ denote any one of $T_{vm}(Z)$, $T_{vm}^{(k)}(Z)$ or $T_{vm}^{(k,k')}(Z)$, for any $s \ge 1$ and any $k, k' \in [v+1, m]$: then
\[
s\,\mathbb{P}[T = s] \le K_0\theta,
\]
where $K_0 = K_0(Z) = 3 + \varepsilon_{9.1}(0)$.

Remark. For infinitely divisible distributions, for which the $r_i$ can be taken arbitrarily large, the proof actually shows that $K_0 = 1 + \varepsilon_{9.1}(0)$ will do. For $Z^*$, where $\varepsilon_{9.1}(0) = 0$, this gives $K_0(Z^*) = 1$.

Proof. For $T_{vm}$, as already observed, the result follows from Lemma 9.1 with $K_0 - 2$ in place of $K_0$. For $T_{vm}^{(k)}$, we use the proof of Lemma 8.7 with $g(w) = \mathbf{1}\{w = s\}$, but taking $W = T_{vm}^{(k)}$ in place of $T_{vm}$. This gives
\begin{align*}
\mathbb{E}\{W g(W)\} &= \theta\sum_{i=v+1}^m \Bigl(1 - \frac{\delta_{ik}}{r_i}\Bigr)\mathbb{E} g(W_{i1} + i) \\
&\quad + \theta\sum_{i=v+1}^m \Bigl(1 - \frac{\delta_{ik}}{r_i}\Bigr)\sum_{l\ge 1} l\varepsilon_{il}\,\mathbb{E} g(W_{i1} + il),
\end{align*}


or
\[
\mathbb{E}\{W g(W)\} - \theta\sum_{i=v+1}^m \mathbb{E} g(W + i) = -\frac{\theta}{r_k}\,\mathbb{E} g(W + k) + \theta H_v,
\qquad (9.7)
\]
where
\begin{align*}
H_v &= \sum_{i=v+1}^m \sum_{l\ge 1} l\varepsilon_{il}\Bigl(1 - \frac{\delta_{ik}}{r_i}\Bigr)\mathbb{E} g(T_{vm}^{(i,k)} + il) \\
&\quad - \sum_{i=v+1}^m \frac{\theta}{ir_i}\Bigl(1 - \frac{\delta_{ik}}{r_i}\Bigr)
\times\Bigl[\sum_{l\ge 2}\varepsilon_{il}\bigl\{\mathbb{E} g(T_{vm}^{(i,k)} + i(l+1)) - \mathbb{E} g(T_{vm}^{(i,k)} + i)\bigr\} \\
&\qquad\qquad + (1+\varepsilon_{i1})\bigl\{\mathbb{E} g(T_{vm}^{(i,k)} + 2i) - \mathbb{E} g(T_{vm}^{(i,k)} + i)\bigr\}\Bigr]
\qquad (9.8)
\end{align*}
is bounded by the sum of the estimates of $|\eta_t(g)|$, $1 \le t \le 3$, given in (9.1), (9.2) and (9.3). Hence
\begin{align*}
s\,\mathbb{P}[T_{vm}^{(k)} = s]
&\le \theta\bigl\{\mathbb{P}[T_{vm}^{(k)} < s-v] + r_k^{-1}\mathbb{P}[T_{vm}^{(k)} = s-k] + \varepsilon_{9.1}(0)\bigr\} \\
&\le (K_0 - 1)\theta.
\end{align*}
For $T_{vm}^{(k,k')}$, a similar argument leads to
\begin{align*}
s\,\mathbb{P}[T_{vm}^{(k,k')} = s]
&\le \theta\bigl\{\mathbb{P}[T_{vm}^{(k,k')} < s-v] + r_k^{-1}\mathbb{P}[T_{vm}^{(k,k')} = s-k] \\
&\qquad + r_{k'}^{-1}\mathbb{P}[T_{vm}^{(k,k')} = s-k'] + \varepsilon_{9.1}(0)\bigr\} \\
&\le K_0\theta,
\end{align*}
and the lemma follows. $\Box$

Refined bounds

Lemma 9.1 shows that $s\,\mathbb{P}[T_{vm}(Z) = s]$ is bounded by $\theta[1 + \varepsilon_{9.1}(0)]$ for all $s$. The next result shows that, for values of $s < n$, bounds of better order for $\mathbb{P}[T_{vm}(Z) = s]$ can be deduced. Taking their maximum over $s$, achieved at $s = 0$, they imply bounds of the same order as those of Lemma 4.12, but now for all $Z_i$ such that $K_m^{(1)}$, defined below in (9.10), is bounded in $m$. This corresponds to a huge class of combinatorial structures, though not quite all those that satisfy the Logarithmic Condition: we require also the mild conditions that $\mu^*_0 < \infty$ and that the $E_m$ are bounded, which are always satisfied under Conditions (A0) and (B01). Even if the $E_m$ are not bounded, but $\mu^*_0 < \infty$ still holds, then $K_m^{(1)}$ grows more slowly than any power of $m$, under the minimal condition that $E_{m0} \to 0$: see the discussion following (6.10).


Corollary 9.3 Let $T$ denote any one of $T_{vm}(Z)$, $T_{vm}^{(k)}(Z)$ or $T_{vm}^{(k,k')}(Z)$. Then, for any $s \ge 1$ and any $k, k' \in [v+1, m]$,
\begin{align*}
&\mathbb{P}[T = 0] \le K_m^{(1)} m^{-\bar\theta}(v+1)^{\bar\theta}; \qquad \mathbb{P}[T = s] = 0, \quad 1 \le s \le v; \\
&\mathbb{P}[T = s] \le K_m^{(1)} m^{-\bar\theta}(s+1)^{-(1-\bar\theta)}, \quad s \ge v+1,
\qquad (9.9)
\end{align*}
where
\[
K_m^{(1)} = K_m^{(1)}(Z) = (2K_0\theta \vee 1)\,e^{2+\theta E_m}
\qquad (9.10)
\]
and $\bar\theta = \min\{\theta, 1\}$.

Proof. We give only the argument for $T_{vm}^{(k,k')}$; the others are almost identical. Fix any $k, k' \in [v+1, m]$. Then, if $v < s \le m$, we have
\begin{align*}
\mathbb{P}[T_{vm}^{(k,k')} = s] &= \mathbb{P}[T_{vs}^{(k,k')} = s]
\times \prod_{j=s+1}^m \mathbb{P}[Z_j = 0]\,\bigl(1 - \delta_{jk}\mathbb{P}[Z_{j1} > 0]\bigr)^{-1}\bigl(1 - \delta_{jk'}\mathbb{P}[Z_{j1} > 0]\bigr)^{-1} \\
&\le \frac{2K_0\theta}{s+1}\exp\Bigl\{-\sum_{j=s+1}^m \Bigl(1 - \frac{\delta_{jk} + \delta_{jk'}}{r_j}\Bigr)\frac{\theta}{j}(1 + E_{j0})\Bigr\} \\
&\le \frac{2K_0\theta}{s+1}\,e^{2+\theta E_m}\Bigl(\frac{s+1}{m+1}\Bigr)^{\bar\theta},
\end{align*}
where we use the fact that $(\theta/(tr_t))(1 + E_{t0}) = \mathbb{P}[Z_{t1} > 0] \le 1$, as well as using Lemma 9.2 to bound $\mathbb{P}[T_{vs}^{(k,k')} = s]$; note also that $r_k \ge 2$ whenever it is possible to have $k = k'$. This estimate improves on that of Lemma 9.2, because of the factor $\{(s+1)/(m+1)\}^{\bar\theta}$. A similar argument also shows that
\[
\mathbb{P}[T_{vm}^{(k,k')} = 0] \le e^{2+\theta E_m}\Bigl(\frac{v+1}{m+1}\Bigr)^{\bar\theta};
\qquad (9.11)
\]
furthermore, $\mathbb{P}[T_{vm}^{(k,k')} = s] = 0$ for $0 < s \le v$, and
\[
\mathbb{P}[T_{vm}^{(k,k')} = s] \le \frac{2K_0\theta}{s+1}
\qquad (9.12)
\]
for $s > m$, directly from Lemma 9.2. These inequalities together are enough to establish the corollary. $\Box$

Corollary 9.3 gives bounds on $\mathbb{P}[T_{vm}(Z) = s]$ which are of the correct order when $s \le m$. For larger values of $s$, sharper bounds can be proved, showing that $\mathbb{P}[T_{vm}(Z) = s]$ decreases quite fast as $s$ becomes bigger. This extra precision is useful for handling the probability of occurrence of exceptional values of $T_{0b}(Z)$, when proving the main approximation theorems for the distribution of the smaller components. We prove two such pairs of bounds, generalizing Lemma 4.12 (iv). The first of these is useful for approximating $\mathbb{P}[T_{vm}(Z) = s]$ only for values of $v$ such that $p^-_v$ is not small, and this need not be the case for $v = 0$, which is of special interest to us. We therefore also prove a second pair of bounds, derived from the first, which apply particularly to $\mathbb{P}[T_{0b}(Z) = s]$.

Lemma 9.4 For any $s \ge 2m$ and any $0 \le v < m$,
\begin{align*}
(1)\quad s\,\mathbb{P}[T_{vm}(Z) = s]
&\le \frac{2\theta}{(s-1)p^-_v}(1+\mu^*_v)\,\mathbb{E} T_{vm}(Z)
+ \frac{\theta(2K_0\theta \vee 1)}{s+1}\sum_{i=v+1}^m \chi_{i1}^{(0)}(s); \\
(2)\quad s\,\mathbb{P}[T_{vm}(Z) = s]
&\le \frac{4\theta}{(s-1)^2 p^-_v}(1+\mu^*_v)\,\mathbb{E} T^2_{vm}(Z)
+ \frac{\theta(2K_0\theta \vee 1)}{s+1}\sum_{i=v+1}^m \chi_{i1}^{(0)}(s).
\end{align*}

Proof. Taking $g = \mathbf{1}_s$ in the proof of Lemma 8.7, we find using (8.35) and Lemma 9.2 that
\begin{align*}
s\,\mathbb{P}[T_{vm}(Z) = s]
&= \theta\sum_{i=v+1}^m \Bigl\{(1+\varepsilon_{i1})\,\mathbb{P}[T_{vm}^{(i)} = s-i] + \sum_{l\ge 2} l\varepsilon_{il}\,\mathbb{P}[T_{vm}^{(i)} = s-il]\Bigr\} \\
&\le \frac{\theta}{p^-_v}\Bigl\{(1+\varepsilon^*_{v1})\,\mathbb{P}[s-m \le T_{vm} < s-v]
+ \sum_{i=v+1}^m \sum_{l=2}^{\lfloor (s+1)/2i\rfloor} l\varepsilon^*_{vl}\,\mathbb{P}[T_{vm} = s-il]\Bigr\} \\
&\quad + \frac{\theta(2K_0\theta \vee 1)}{s+1}\sum_{i=v+1}^m \sum_{l=\lfloor (s+1)/2i\rfloor+1}^{\lfloor s/i\rfloor} l\varepsilon_{il}\Bigl(\frac{s+1}{s-il+1}\Bigr) \\
&\le \frac{\theta}{p^-_v}(1+\mu^*_v)\,\mathbb{P}[T_{vm} \ge (s-1)/2]
+ \frac{\theta(2K_0\theta \vee 1)}{s+1}\sum_{i=v+1}^m \chi_{i1}^{(0)}(s).
\end{align*}
Now bound $\mathbb{P}[T_{vm} \ge (s-1)/2]$ using moment inequalities. $\Box$

Corollary 9.5 As before, let $w_0 = \min\{i \ge 0 : p^-_i \ge 1/2\}$. Then, for all $b \ge w_0$,
\begin{align*}
(1)\quad & s\,\mathbb{P}[T_{0b}(Z) = s] \le \varepsilon^{(1)}_{9.5}(s, b), \quad s \ge 4b; \\
(2)\quad & s\,\mathbb{P}[T_{0b}(Z) = s] \le \varepsilon^{(2)}_{9.5}(s, b), \quad s \ge 16b,
\end{align*}
where $\varepsilon^{(1)}_{9.5}(s, b) = O(b/s)$ if Condition (B01) holds, and $\varepsilon^{(2)}_{9.5}(s, b) = O(b/s^2)$ if Condition (B02) holds.

Proof. By the definition of $T_{0b} = T_{0b}(Z)$, and since $T_{0b} = T_{0w_0} + T_{w_0 b}$ with independent summands,
\begin{align*}
\mathbb{P}[T_{0b} = s] &= \sum_{r=0}^s \mathbb{P}[T_{0w_0} = r]\,\mathbb{P}[T_{w_0 b} = s-r] \\
&\le \max_{s/2 \le r \le s}\mathbb{P}[T_{w_0 b} = r] + \max_{s/2 \le r \le s}\mathbb{P}[T_{0w_0} = r].
\qquad (9.13)
\end{align*}
For the first of these terms, we use Lemma 9.4 (1) to show that
\begin{align*}
\max_{s/2 \le r \le s}\mathbb{P}[T_{w_0 b} = r]
&\le \frac{16\theta}{s(s-2)}(1+\mu^*_{w_0})\,\mathbb{E} T_{w_0 b} \\
&\quad + \frac{4\theta(2K_0\theta \vee 1)}{s(s+2)}\max_{s/2 \le r \le s}\sum_{i=w_0+1}^b \chi_{i1}^{(0)}(r),
\qquad (9.14)
\end{align*}

for $s \ge 4b$. For the second term, argue as in the first line of the proof of Lemma 9.4, for $b \ge w_0$, $s \ge 4b$ and $r \ge s/2$, to give
\begin{align*}
r\,\mathbb{P}[T_{0w_0} = r]
&\le \theta\sum_{i=1}^{w_0}\Bigl\{(1+\varepsilon_{i1})\Bigl(\frac{K_0\theta}{r-i+1}\Bigr)
+ \sum_{l=2}^{\lfloor (r+1)/2i\rfloor} l\varepsilon_{il}\Bigl(\frac{K_0\theta}{r-il+1}\Bigr)\Bigr\} \\
&\quad + \frac{\theta(2K_0\theta \vee 1)}{r+1}\sum_{i=1}^{w_0}\sum_{l=\lfloor (r+1)/2i\rfloor+1}^{\lfloor r/i\rfloor} l\varepsilon_{il}\Bigl(\frac{r+1}{r-il+1}\Bigr),
\end{align*}
in view of Lemma 9.2, giving
\[
\max_{s/2 \le r \le s}\mathbb{P}[T_{0w_0} = r]
\le \frac{8K_0\theta^2}{s(s+2)}(1+\mu^*_0)w_0
+ \frac{4\theta(2K_0\theta \vee 1)}{s(s+2)}\max_{s/2\le r\le s}\sum_{i=1}^{w_0}\chi_{i1}^{(0)}(r).
\qquad (9.15)
\]
Combining (9.13) with (9.14) and (9.15) completes the first part of the proof, with

\begin{align*}
\varepsilon^{(1)}_{9.5}(s, b) &= \varepsilon^{(1)}_{9.5}(s, b, Z) \\
&= \frac{16\theta}{s-2}(1+\mu^*_0)\,\mathbb{E} T_{w_0 b}(Z)
+ \frac{8K_0\theta^2 w_0}{s+2}(1+\mu^*_0) \\
&\quad + \frac{8\theta(2K_0\theta \vee 1)}{s+2}\max_{s/2\le r\le s}\sum_{i=1}^b \chi_{i1}^{(0)}(r).
\qquad (9.16)
\end{align*}

Note that the above argument also applies almost without change to give
\[
s\,\mathbb{P}[T_{0b}^{(i)}(Z) = s] \le \varepsilon^{(1)}_{9.5}(s, b), \qquad s \ge 4b.
\qquad (9.17)
\]


For the second part, use Lemma 9.4 (2) to give
\begin{align*}
\max_{s/2\le r\le s}\mathbb{P}[T_{w_0 b} = r]
&\le \frac{64\theta}{s(s-2)^2}(1+\mu^*_{w_0})\,\mathbb{E} T^2_{w_0 b} \\
&\quad + \frac{4\theta(2K_0\theta \vee 1)}{s(s+2)}\max_{s/2\le r\le s}\sum_{i=w_0+1}^b \chi_{i1}^{(0)}(r).
\qquad (9.18)
\end{align*}

Then, for $r\,\mathbb{P}[T_{0w_0} = r]$, use the first line of the proof of Lemma 9.4 once more, but with $\mathbb{P}[T_{vm}^{(i)} = t]$ bounded by (9.17) for $t \ge r/2$, as is possible because we now require $s \ge 16b$; this gives, for $s/2 \le r \le s$,
\begin{align*}
r\,\mathbb{P}[T_{0w_0} = r]
&\le 2\theta w_0 r^{-1}(1+\mu^*_0)\,\varepsilon^{*(1)}_{9.5}(r, w_0) \\
&\quad + \frac{\theta(2K_0\theta \vee 1)}{r+1}\sum_{i=1}^{w_0}\sum_{l=\lfloor (r+1)/2i\rfloor+1}^{\lfloor r/i\rfloor} l\varepsilon_{il}\Bigl(\frac{r+1}{r-il+1}\Bigr),
\end{align*}
where we define
\[
\varepsilon^{*(1)}_{9.5}(n, b) = \max_{n/2\le s\le n}\varepsilon^{(1)}_{9.5}(s, b);
\qquad (9.19)
\]

from this, it follows that
\begin{align*}
\max_{s/2\le r\le s}\mathbb{P}[T_{0w_0} = r]
&\le \frac{256\theta^2 w_0(1+\mu^*_0)}{s^2(s+8)}
\Bigl\{K_0\theta w_0(1+\mu^*_0) + (2K_0\theta \vee 1)\max_{s/8\le r\le s}\sum_{i=1}^{w_0}\chi_{i1}^{(0)}(r)\Bigr\}
\qquad (9.20) \\
&\quad + \frac{4\theta(2K_0\theta \vee 1)}{s(s+2)}\max_{s/2\le r\le s}\sum_{i=1}^{w_0}\chi_{i1}^{(0)}(r).
\qquad (9.21)
\end{align*}

Combining (9.18) and (9.21) with (9.13) proves the second part, with
\begin{align*}
\varepsilon^{(2)}_{9.5}(s, b) &= \varepsilon^{(2)}_{9.5}(s, b, Z) \\
&= \frac{64\theta}{(s-2)^2}(1+\mu^*_0)\,\mathbb{E} T^2_{w_0 b}(Z)
+ \frac{256 K_0\theta^3 w_0^2(1+\mu^*_0)^2}{s(s+8)} \\
&\quad + \frac{8\theta(2K_0\theta \vee 1)}{s+2}\max_{s/8\le r\le s}\sum_{i=1}^b \chi_{i1}^{(0)}(r).
\qquad (9.22)
\end{align*}

The order statements follow because the moments $\mathbb{E} T_{0b}$ and $\mathbb{E} T^2_{0b}$ are of order $b$ and $b^2$ if $\mu^*_0 < \infty$ and $\nu^*_0 < \infty$, respectively, by Lemma 6.3. The quantity
\[
(s+2)^{-1}\max_{s/8\le r\le s}\sum_{i=1}^b \chi_{i1}^{(0)}(r)
\]
is of order $(b/s)^{a_2-\delta}$ for any $\delta > 0$ under Condition (B01), from Propositions 6.1 and 6.2 (2). $\Box$


Remark. The faster rate in part (2) comes at the expense of requiring slightly faster decay in the $\varepsilon_{il}$ as $l$ increases, though still at a rate very much slower than is the case with the classical logarithmic combinatorial structures. In particular, for random mappings, polynomials and square free polynomials, $\varepsilon^{(1)}_{9.5}(n/2, b) = O(b/n)$, and $\varepsilon^{(2)}_{9.5}(n/2, b) = O((b/n)^2)$.

9.2 Differences of point probabilities

This section is concerned with more delicate estimates than those considered so far. There is almost no example of their use in the easier arguments of Chapters 4 and 5, because the results proved there are not the sharpest, and do not require as much precision. The one exception is Theorem 4.18, a theorem proved only for $\theta$-biased random permutations, where the argument was already sufficiently difficult that generalization to more general assemblies, multisets and selections was not attempted in the framework of Chapter 5. The key element in its proof is equation (4.61), which accurately describes the difference between the pair of point probabilities $\mathbb{P}[T_{bn} = n-r]$ and $\mathbb{P}[T_{bn} = n-s]$. In the setting of Theorem 4.18, it follows directly from the size biasing equation (4.51). In more general settings, Equation (4.51) does not hold exactly, and finding an accurate enough analogue involves careful use of Stein's method; Corollary 8.8 is not sufficient for the purpose. This to some extent explains why the argument of this section is rather awkward.

Bounds for first differences

The first lemma of the section involves a sum of the differences between probabilities of the form $\mathbb{P}[T_{bn}^{(i)} = s-r]$ and their complete counterparts $\mathbb{P}[T_{bn} = s-r]$, needed in the subsequent arguments. Recall that $\bar\theta = \theta \wedge 1$.

Lemma 9.6 For $s \ge 1$ and $0 \le b < m \le n$, we have
\begin{align*}
(1)\quad &\sum_{i=b+1}^{n\wedge s} |\varepsilon_{i1}|\,\bigl|\mathbb{P}[T_{bn}^{(i)}(Z) = s-i] - \mathbb{P}[T_{bn}(Z) = s-i]\bigr|
\le n^{-\bar\theta}(s+1)^{-(1-\bar\theta)}\,\phi_{9.6}(n, s); \\
(2)\quad &\sum_{i=b+1}^{n\wedge s} |\varepsilon_{i1}|\,\bigl|\mathbb{P}[T_{bn}^{(i,m)}(Z) = s-i] - \mathbb{P}[T_{bn}^{(m)}(Z) = s-i]\bigr|
\le n^{-\bar\theta}(s+1)^{-(1-\bar\theta)}\,\phi_{9.6}(n, s),
\end{align*}
where $\phi_{9.6}(n, s) = \phi_{9.6}(n, s, Z)$ is bounded under Conditions (A0) and (B01), uniformly in $n$ and $s$.


Proof. We give only the proof of part (1). Since $T_{bn} = T_{bn}^{(i)} + iZ_{i1}$, where the latter two random variables are independent, it follows that
\begin{align*}
&\bigl|\mathbb{P}[T_{bn}^{(i)} = s-i] - \mathbb{P}[T_{bn} = s-i]\bigr| \\
&\quad\le \frac{\theta}{ir_i}(1+\varepsilon_{i1})\bigl|\mathbb{P}[T_{bn}^{(i)} = s-2i] - \mathbb{P}[T_{bn}^{(i)} = s-i]\bigr| \\
&\qquad + \frac{\theta}{ir_i}\Bigl|\sum_{l\ge 2}\varepsilon_{il}\bigl\{\mathbb{P}[T_{bn}^{(i)} = s-i(l+1)] - \mathbb{P}[T_{bn}^{(i)} = s-i]\bigr\}\Bigr| \\
&\quad\le \frac{\theta}{ir_i}\Bigl\{(1+\rho_i)\,\mathbb{P}[T_{bn}^{(i)} = s-i] + (1+\varepsilon_{i1})\,\mathbb{P}[T_{bn}^{(i)} = s-2i] \\
&\qquad + \sum_{l\ge 2}\varepsilon_{il}\,\mathbb{P}[T_{bn}^{(i)} = s-i(l+1)]\Bigr\}.
\qquad (9.23)
\end{align*}

The probabilities are all bounded using Corollary 9.3, so that Lemma 13.7 can be used with $K_n^{(1)}$ for $K$ to complete the estimates. First, by applying Lemma 13.7 (4,5) with $v_i = |\varepsilon_{i1}|(1+\rho_i)$ and $\alpha = 1/2$, we have
\begin{align*}
&\sum_{i=b+1}^n |\varepsilon_{i1}|(1+\rho_i)\frac{\theta}{ir_i}\,\mathbb{P}[T_{bn}^{(i)} = s-i] \\
&\quad\le K_n^{(1)}\theta n^{-\bar\theta}(s+1)^{-(1-\bar\theta)}
\Bigl\{2^{1-\bar\theta}\sum_{i=1}^{\lfloor (s+1)/2\rfloor}\frac{|\varepsilon_{i1}|(1+\rho_i)}{ir_i}
+ \frac{\varepsilon^*_{s/2,1}(1+\rho^*_{s/2})}{\theta r^-_{s/2}}\Bigr\};
\end{align*}
then, from Lemma 13.7 (6,7) with $v_i = |\varepsilon_{i1}|(1+\varepsilon_{i1})$ and $\alpha = 1/4$, we have
\begin{align*}
&\sum_{i=b+1}^n |\varepsilon_{i1}|(1+\varepsilon_{i1})\frac{\theta}{ir_i}\,\mathbb{P}[T_{bn}^{(i)} = s-2i] \\
&\quad\le K_n^{(1)}\theta n^{-\bar\theta}(s+1)^{-(1-\bar\theta)}
\Bigl\{2^{1-\bar\theta}\sum_{i=1}^{\lfloor (s+1)/4\rfloor}\frac{|\varepsilon_{i1}|(1+\varepsilon_{i1})}{ir_i}
+ \frac{2\varepsilon^*_{s/4,1}(1+\varepsilon^*_{s/4,1})}{\theta r^-_{s/4}}\Bigr\};
\end{align*}
finally, from Lemma 13.7 (8,9,10,11) with $v_i = |\varepsilon_{i1}|$, we have
\begin{align*}
&\sum_{i=b+1}^n |\varepsilon_{i1}|\frac{\theta}{ir_i}\sum_{l\ge 2}\varepsilon_{il}\,\mathbb{P}[T_{bn}^{(i)} = s-i(l+1)] \\
&\quad\le K_n^{(1)}\theta n^{-\bar\theta}(s+1)^{-(1-\bar\theta)}
\Bigl\{2^{1-\bar\theta}\sum_{i=1}^{\lfloor (s+1)/6\rfloor}\frac{|\varepsilon_{i1}|E_{i1}}{ir_i}
+ \frac{4^{1-\bar\theta}\varepsilon^*_{s/4,1}\varepsilon^*_{s/4,2}}{\theta r^-_{s/4}}
+ s^{-1}\bigl[2\phi^\theta_3(s) + \varepsilon^*_{01}u^*_2(s)\bigr]\Bigr\}.
\end{align*}

.

Adding these three estimates gives (1) and (2), with
\begin{align*}
\phi_{9.6}(n, s) &= K_n^{(1)}\theta\Bigl\{2^{2-\bar\theta}\sum_{i=1}^{\lfloor (s+1)/2\rfloor}\frac{|\varepsilon_{i1}|(1+\rho_i)}{ir_i}
+ \frac{6\varepsilon^*_{s/4,1}(1+\rho^*_{s/4})}{\theta r^-_{s/4}} \\
&\qquad + s^{-1}\bigl[2\phi^\theta_3(s) + \varepsilon^*_{01}u^*_2(s)\bigr]\Bigr\}.
\qquad (9.24)
\end{align*}
The order estimates then follow easily from Propositions 6.1 and 6.2. $\Box$

The order estimates then follow easily from Propositions 6.1 and 6.2. ut

Remark. If the $Z_i$ have infinitely divisible distributions, taking the $r_i$ to be arbitrarily large and using (6.12), it follows that $\phi_{9.6}(n, s) = 0$.

In the next result, the closeness of successive point probabilities to one another is made explicit, in a form used in the proof of Theorem 6.7. The basis for the argument is Equation (9.26), which can be seen to be the necessary generalization of (4.61). Recall that $\Delta_i = |\varepsilon_{i1} - \varepsilon_{i+1,1}|$.

Theorem 9.7 For $b \ge 0$ and $s \ge b+1$,
\begin{align*}
(1)\quad &\bigl|\mathbb{P}[T_{bn}(Z) = s] - \mathbb{P}[T_{bn}(Z) = s+1]\bigr|
\le n^{-\bar\theta}(s+1)^{-(2-\bar\theta)}\,\phi_{9.7}(n, s); \\
(2)\quad &\bigl|\mathbb{P}[T_{bn}^{(m)}(Z) = s] - \mathbb{P}[T_{bn}^{(m)}(Z) = s+1]\bigr|
\le n^{-\bar\theta}(s+1)^{-(2-\bar\theta)}\bigl(\phi_{9.7}(n, s) + \phi_{9.7}(n, s, m, b)\bigr),
\end{align*}
where
\[
\phi_{9.7}(n, s) = \phi_{9.7}(n, s, Z) = O\bigl(s^{1-g_2} + s^{1-(a_1\wedge a_2)+\delta} + S(s)\bigr)
\qquad (9.25)
\]
for any $\delta > 0$ under Conditions (A0) and (B01), and
\[
\phi_{9.7}(n, s, m, b) = \phi_{9.7}(n, s, m, b, Z)
= K_n^{(1)}\theta(1+\varepsilon^*_{01})\times
\begin{cases}
\bigl(\frac{s+1}{s+1-m}\bigr)^{1-\bar\theta}, & \text{if } m < s; \\
(s+1)^{1-\bar\theta}(b+1)^{\bar\theta}, & \text{if } m \in \{s, s+1\}; \\
0, & \text{if } m > s+1.
\end{cases}
\]

Proof. We give the proof of part (2), in which some extra terms enter: for part (1), the argument is almost the same, but a little simpler.


As for Lemma 9.2, we use the proof of Lemma 8.7 with $W = T_{bn}^{(m)}$ in place of $T_{bn}$, but we now have $g(w) = \mathbf{1}\{w = s\} - \mathbf{1}\{w = s+1\}$ in place of $g(w) = \mathbf{1}\{w = s\}$. This, from (9.7) and (9.8), now gives
\begin{align*}
&s\,\mathbb{P}[T_{bn}^{(m)} = s] - (s+1)\,\mathbb{P}[T_{bn}^{(m)} = s+1] + \theta\,\mathbb{P}[T_{bn}^{(m)} = s-b] \\
&\quad= -\theta r_m^{-1}\bigl\{\mathbb{P}[T_{bn}^{(m)} = s-m] - \mathbb{P}[T_{bn}^{(m)} = s+1-m]\bigr\}
+ \theta\{\eta_4 + \eta_1 + \eta_2 + \eta_3\},
\end{align*}
or
\begin{align*}
&(s+1)\bigl\{\mathbb{P}[T_{bn}^{(m)} = s] - \mathbb{P}[T_{bn}^{(m)} = s+1]\bigr\} \\
&\quad= \mathbb{P}[T_{bn}^{(m)} = s] - \theta\,\mathbb{P}[T_{bn}^{(m)} = s-b] \\
&\qquad - \theta r_m^{-1}\bigl\{\mathbb{P}[T_{bn}^{(m)} = s-m] - \mathbb{P}[T_{bn}^{(m)} = s+1-m]\bigr\}
+ \theta\{\eta_4 + \eta_1 + \eta_2 + \eta_3\},
\qquad (9.26)
\end{align*}

where
\[
\eta_4 = \sum_{i=b+1}^{n}\Bigl(1 - \frac{\delta_{im}}{r_m}\Bigr)\varepsilon_{i1}
\bigl\{IP[T_{bn}^{(i,m)} = s-i] - IP[T_{bn}^{(i,m)} = s+1-i]\bigr\}; \tag{9.27}
\]
\[
\eta_1 = \sum_{i=b+1}^{n}\Bigl(1 - \frac{\delta_{im}}{r_m}\Bigr)\sum_{l\ge 2} l\varepsilon_{il}
\bigl\{IP[T_{bn}^{(i,m)} = s-il] - IP[T_{bn}^{(i,m)} = s+1-il]\bigr\}; \tag{9.28}
\]
\[
\eta_2 = \sum_{i=b+1}^{n}\Bigl(1 - \frac{\delta_{im}}{r_m}\Bigr)\frac{\theta}{ir_i}\sum_{l\ge 2}\varepsilon_{il}
\bigl\{IP[T_{bn}^{(i,m)} = s-i] - IP[T_{bn}^{(i,m)} = s+1-i]
 - IP[T_{bn}^{(i,m)} = s-i(l+1)] + IP[T_{bn}^{(i,m)} = s+1-i(l+1)]\bigr\}; \tag{9.29}
\]
\[
\eta_3 = \sum_{i=b+1}^{n}\Bigl(1 - \frac{\delta_{im}}{r_m}\Bigr)\frac{\theta}{ir_i}(1+\varepsilon_{i1})
\bigl\{IP[T_{bn}^{(i,m)} = s-i] - IP[T_{bn}^{(i,m)} = s+1-i]
 - IP[T_{bn}^{(i,m)} = s-2i] + IP[T_{bn}^{(i,m)} = s+1-2i]\bigr\}. \tag{9.30}
\]

Invoking Lemma 9.6, it is immediate that
\[
\Bigl|\eta_4 - \sum_{i=b+1}^{n}\varepsilon_{i1}\bigl\{IP[T_{bn}^{(m)} = s-i] - IP[T_{bn}^{(m)} = s+1-i]\bigr\}\Bigr|
\le r_m^{-1}|\varepsilon_{m1}|\bigl|IP[T_{bn}^{(m)} = s-m] - IP[T_{bn}^{(m)} = s+1-m]\bigr|
 + n^{-\theta}(s+1)^{-(1-\theta)}\bigl\{\phi_{9.6}(n,s) + \phi_{9.6}(n,s+1)\bigr\}, \tag{9.31}
\]


9.2. Differences of point probabilities 233

and then, matching terms with the same probability, we deduce from Corollary 9.3 that
\[
\Bigl|\sum_{i=b+1}^{n}\varepsilon_{i1}\bigl\{IP[T_{bn}^{(m)} = s-i] - IP[T_{bn}^{(m)} = s+1-i]\bigr\}\Bigr|
\le |\varepsilon_{b+1,1}|\,IP[T_{bn}^{(m)} = s-b] + \sum_{i=b+1}^{s+1}\Delta_i\,IP[T_{bn}^{(m)} = s+1-i]
\]
\[
\le 2^{1-\theta}K_n^{(1)} n^{-\theta}(s+1)^{-(1-\theta)}\Bigl\{\varepsilon^*_{01} + \sum_{i=1}^{\lfloor(s+1)/2\rfloor}\Delta_i\Bigr\}
 + K_n^{(1)}\theta^{-1}\Delta^*_{(s+1)/2}\Bigl(\frac{s+1}{n}\Bigr)^{\theta}, \tag{9.32}
\]
noting in particular that $IP[T_{bn}^{(m)} = s-b] = 0$ if $b+1 \le s \le 2b$. Hence
\[
\theta\Bigl|\sum_{i=b+1}^{n}\varepsilon_{i1}\bigl\{IP[T_{bn}^{(m)} = s-i] - IP[T_{bn}^{(m)} = s+1-i]\bigr\}\Bigr|
\le K_n^{(1)}\theta n^{-\theta}(s+1)^{-(1-\theta)}
\times\Bigl\{2^{1-\theta}\Bigl(\varepsilon^*_{01} + \sum_{i=1}^{\lfloor(s+1)/2\rfloor}\Delta_i\Bigr) + (s+1)\theta^{-1}\Delta^*_{(s+1)/2}\Bigr\}. \tag{9.33}
\]

Next, using simple estimates from Corollary 9.3, we have
\[
\theta r_m^{-1}(1+|\varepsilon_{m1}|)\bigl|IP[T_{bn}^{(m)} = s-m] - IP[T_{bn}^{(m)} = s+1-m]\bigr|
\le K_n^{(1)}\theta(1+\varepsilon^*_{01})n^{-\theta}\times
\begin{cases}
(s+1-m)^{-(1-\theta)} & \text{if } m < s;\\
(b+1)^{\theta} & \text{if } m \in \{s, s+1\};\\
0 & \text{if } m > s+1,
\end{cases} \tag{9.34}
\]
and
\[
\bigl|\theta IP[T_{bn}^{(m)} = s-b] - IP[T_{bn}^{(m)} = s]\bigr|
\le K_n^{(1)}\max\{1,\theta\}2^{1-\theta}n^{-\theta}(s+1)^{-(1-\theta)}. \tag{9.35}
\]

Then Corollary 9.3 and Lemma 13.7 (1,2,3) give
\[
\theta|\eta_1| \le K_n^{(1)}\theta n^{-\theta}(s+1)^{-(1-\theta)}
\times\Bigl\{2^{1-\theta}\sum_{i=1}^{\lfloor(s+2)/4\rfloor}F_{i1} + \bar\phi^{\theta}_1(s) + \bar u^*_1(s)\Bigr\}, \tag{9.36}
\]
where $\bar\phi^{\alpha}_r(s) = \max\{\phi^{\alpha}_r(s), \phi^{\alpha}_r(s+1)\}$ and $\bar u^*_r(s) = \max\{u^*_r(s), u^*_r(s+1)\}$. Then Corollary 9.3 and parts (8,9,10,11) of Lemma 13.7 with $v_i = 1$ and


parts (4,5) with $v_i = E_{i1}$ and $\alpha = 1/2$ give
\[
\theta|\eta_2| \le K_n^{(1)}\theta^2 n^{-\theta}(s+1)^{-(1-\theta)}
\times\Bigl\{2^{2-\theta}\sum_{i=1}^{\lfloor(s+2)/2\rfloor}\frac{E_{i1}}{ir_i}
 + \frac{2\bar\phi^{\theta}_2(s) + \bar u^*_2(s)}{s}
 + \frac{6E^*_{s/4,1}}{\theta r^-_{s/4}}\Bigr\}; \tag{9.37}
\]

and Corollary 9.3 and Lemma 13.7 (4,5,6,7) with $v_i = (1+\varepsilon_{i1})$ and $\alpha = 1/4$ give
\[
\theta|\eta_3| \le K_n^{(1)}\theta^2 n^{-\theta}(s+1)^{-(1-\theta)}
\times\Bigl\{4\sum_{i=1}^{\lfloor(s+2)/4\rfloor}\frac{1+\varepsilon_{i1}}{ir_i}
 + \frac{8(1+\varepsilon^*_{s/4,1})}{\theta r^-_{s/4}}\Bigr\}. \tag{9.38}
\]

From (9.26), (9.31), (9.33)–(9.38) and the definition (9.24) of $\phi_{9.6}(n,s)$, and noting that $\theta 2^{1-\theta} \le 1$, we have thus proved Part (2) of the lemma, with
\[
\phi_{9.7}(n,s)
= K_n^{(1)}(\theta/\theta)\Bigl\{\sum_{i=1}^{\lfloor(s+2)/2\rfloor}\Bigl[\Delta_i + F_{i1} + \frac{4(1+\rho_i)(1+|\varepsilon_{i1}|)\theta}{ir_i}\Bigr]
 + (s+1)\Delta^*_{s/2} + \frac{4\theta(1+\rho^*_{s/4})(2+3\varepsilon^*_{s/4,1})}{r^-_{s/4}}
 + \bar\phi^{\theta}_1(s) + \bar u^*_1(s) + (1+\varepsilon^*_{01}) + s^{-1}\theta(1+2\varepsilon^*_{01})\bigl[2\bar\phi^{\theta}_2(s) + \bar u^*_2(s)\bigr]\Bigr\};
\]

the order estimates now follow from Propositions 6.1 and 6.2. □

Remark. $\phi_{9.7}(n,s)$ is bounded uniformly in $n$ and $s$, under Conditions (A0), (D1) and (B11) and if, in addition, $S(\infty) < \infty$. For any $t > 1/(s+1)$,
\[
\phi_{9.7}(n,s,m,b) \le K_n^{(1)}\theta(1+\varepsilon^*_{01})t^{-(1-\theta)} \le \theta t^{-(1-\theta)}\phi_{9.7}(n,s), \tag{9.39}
\]
if $m \le (1-t)(s+1)$; we use this bound most frequently with $t = 1/2$, noting also that $\theta 2^{1-\theta} \le 1$.

Corollary 9.8 Uniformly in $0 \le b \le (n-2)/4$ and $n/2 \le s \le n$,
\[
|IP[T_{bn}(Z) = s] - IP[T_{bn}(Z) = s+1]| \le 4n^{-2}\phi^*_{9.8}(n),
\]
where
\[
\phi^*_{9.8}(n) = \phi^*_{9.8}(n,Z) = \max_{1\le s\le n}\phi_{9.7}(n,s) = O(S(n)),
\]
under Conditions (A0), (D1) and (B11).


Remark. It is proved in Corollary 9.11 below that $\phi^*_{9.8}(n)$ can be replaced by an alternative expression, which is more complicated to state, but is of order $O(1)$ under Conditions (A0), (D1) and (B11). The proof of this still makes use of Theorem 9.7.

Approximating first differences

Suppose that (A0), (D1) and (B11) hold, and that $S(\infty) < \infty$. Then, from Corollary 9.8, the difference $|IP[T_{bn} = n-r] - IP[T_{bn} = n-s]|$ is of order at most $n^{-2}|r-s|$, uniformly in $0 \le r, s \le n/2$, and this is the main element in the proof of Theorem 6.7. For Theorem 6.8, a more refined evaluation of the difference is needed, which expresses it as an explicitly determined element of order $O(n^{-2})$, together with a remainder of smaller order. The improvement that we achieve has the added advantage of removing the condition $S(\infty) < \infty$, as observed above. To establish it, we start by sharpening Lemma 9.6, using Theorem 9.7 (2) to do so.

Lemma 9.9 For $0 \le b < n$ and $2b+3 \le s \le n$, we have
\[
\sum_{i=b+1}^{n}|\varepsilon_{i1}|\,\bigl|IP[T_{bn}^{(i)}(Z) = s-i] - IP[T_{bn}(Z) = s-i]\bigr|
\le n^{-\theta}(s+1)^{-(1-\theta)}\varepsilon_{9.9}(n,s),
\]
where
\[
\varepsilon_{9.9}(n,s) = \varepsilon_{9.9}(n,s,Z) = O\bigl(S(n)s^{-(g_1\wedge 1)}\bigr)
\]
under Conditions (A0), (D1) and (B11).

Proof. As in the proof of Lemma 9.6,
\[
\sum_{i=b+1}^{n}|\varepsilon_{i1}|\,\bigl|IP[T_{bn}^{(i)} = s-i] - IP[T_{bn} = s-i]\bigr|
\le \sum_{i=b+1}^{n}\frac{\theta|\varepsilon_{i1}|}{ir_i}(1+|\varepsilon_{i1}|)\bigl|IP[T_{bn}^{(i)} = s-2i] - IP[T_{bn}^{(i)} = s-i]\bigr|
 + \sum_{i=b+1}^{n}\frac{\theta|\varepsilon_{i1}|}{ir_i}\sum_{l\ge 2}\varepsilon_{il}\bigl|IP[T_{bn}^{(i)} = s-i(l+1)] - IP[T_{bn}^{(i)} = s-i]\bigr|
 =: U_1 + U_2,
\]
say. We split $U_1$ into the ranges $i \le \lfloor(s+1)/4\rfloor$ and $i \ge \lfloor(s+1)/4\rfloor + 1$. In the first range, we can apply Theorem 9.7 (2) to bound the differences of probabilities, noting that $s - 2i \ge b+1$ is satisfied here, because $s \ge 2b+3$. Since also $(s+1)/4 \le (1-t)(s+1)/2$ for $t = 1/2$, it follows from (9.39), Corollary 9.3 and Theorem 9.7 (2) that the conditions of Lemma 13.9 (1)


are satisfied with $C = 2\phi^*_{9.8}(n)$. In the second range, Corollary 9.3 allows Lemma 13.7 (5,7) to be applied with $K = K_n^{(1)}$, $v_i = |\varepsilon_{i1}|(1+|\varepsilon_{i1}|)$ and $\alpha = 1/4$. Together, these give
\[
U_1 \le 2\phi^*_{9.8}(n)\,\theta n^{-\theta}(s+1)^{-(2-\theta)}2^{2-\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor} r_i^{-1}|\varepsilon_{i1}|(1+|\varepsilon_{i1}|)
 + K_n^{(1)}\theta n^{-\theta}(s+1)^{-(1-\theta)}\Bigl(\frac{8\varepsilon^*_{s/4,1}(1+\varepsilon^*_{s/4,1})}{\theta r^-_{s/4}}\Bigr). \tag{9.40}
\]

Turning to $U_2$, we find that
\[
U_2 \le \sum_{i=1}^{\lfloor(s+1)/6\rfloor}\frac{\theta|\varepsilon_{i1}|}{ir_i}\sum_{l=2}^{\lfloor(s+1)/2i\rfloor-1}\varepsilon_{il}\bigl|IP[T_{bn}^{(i)} = s-i(l+1)] - IP[T_{bn}^{(i)} = s-i]\bigr|
 + \sum_{i=1}^{\lfloor(s+1)/4\rfloor}\frac{\theta|\varepsilon_{i1}|}{ir_i}\sum_{\substack{l=\lfloor(s+1)/2i\rfloor\\ i(l+1)<s}}^{\lfloor s/i\rfloor-1}\varepsilon_{il}\,IP[T_{bn}^{(i)} = s-i(l+1)]
 + \sum_{i=\lfloor(s+1)/4\rfloor+1}^{\lfloor s/3\rfloor}\varepsilon_{i2}\,IP[T_{bn}^{(i)} = s-3i]
\]
\[
 + \sum_{i=1}^{\lfloor s/3\rfloor}\frac{\theta|\varepsilon_{i1}|}{ir_i}\sum_{\substack{l\ge 2\\ i(l+1)=s}}\varepsilon_{il}\,IP[T_{bn}^{(i)} = 0]
 + \sum_{i=1}^{\lfloor(s+1)/6\rfloor}\frac{\theta|\varepsilon_{i1}|}{ir_i}\sum_{l\ge\lfloor(s+1)/2i\rfloor}\varepsilon_{il}\,IP[T_{bn}^{(i)} = s-i]
 + \sum_{i=\lfloor(s+1)/6\rfloor+1}^{s}\frac{\theta|\varepsilon_{i1}|}{ir_i}E_{i1}\,IP[T_{bn}^{(i)} = s-i];
\]

as above, we use (9.39) with $t = 1/2$ and Theorem 9.7 (2) in combination with Lemma 13.9 (2) on the first of these sums, then invoking Corollary 9.3 and Lemma 13.7 (9,10,11) with $K = K_n^{(1)}$ and $v_i = |\varepsilon_{i1}|$ for the next three and Lemma 13.7 (5) with $\alpha = 1/6$ and $v_i = |\varepsilon_{i1}|E_{i1}$ for the last of them, leaving
\[
\sum_{i=1}^{\lfloor(s+1)/6\rfloor}\frac{\theta|\varepsilon_{i1}|}{ir_i}\sum_{l\ge\lfloor(s+1)/2i\rfloor}\varepsilon_{il}\,IP[T_{bn}^{(i)} = s-i]
\le \theta\sum_{i=1}^{\lfloor(s+1)/6\rfloor}r_i^{-1}|\varepsilon_{i1}|
\times\sum_{l\ge\lfloor(s+1)/2i\rfloor}\frac{4l}{s+1}\,\varepsilon_{il}\,K_n^{(1)}n^{-\theta}(s+1)^{-(1-\theta)}(6/5)^{1-\theta}
\le K_n^{(1)}\theta n^{-\theta}(s+1)^{-(2-\theta)}\sum_{i=1}^{\lfloor(s+1)/6\rfloor}\frac{5|\varepsilon_{i1}|F_{i2}}{r_i},
\]

which has been directly estimated using Corollary 9.3. This gives the statement of the lemma, with
\[
\varepsilon_{9.9}(n,s) = 2\phi^*_{9.8}(n)\,\theta(s+1)^{-1}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}\frac{2^{2-\theta}|\varepsilon_{i1}|(1+\mu_i)}{r_i}
 + K_n^{(1)}\theta s^{-1}\Bigl\{5\sum_{i=1}^{\lfloor(s+1)/6\rfloor}\frac{|\varepsilon_{i1}|F_{i1}}{r_i} + 2\phi^{\theta}_3(s) + \varepsilon^*_{01}u^*_2(s)\Bigr\}
 + K_n^{(1)}\theta\Bigl(\frac{9\varepsilon^*_{s/6,1}(1+\rho^*_{s/6})}{\theta r^-_{s/6}}\Bigr)
= O\bigl(\phi^*_{9.8}(n)s^{-(g_1\wedge 1)} + s^{-(a_1\wedge a_2)+\delta}\bigr),
\]
for any $\delta > 0$, from Propositions 6.1 and 6.2. This concludes the proof. □

Remark. If the Zi’s are infinitely divisible, ε9.9(n, s) = 0 for all n and s.

Using the preceding lemma together with Theorem 9.7 (2), we can now sharpen the result of Theorem 9.7 (1) in the way that we require.

Theorem 9.10 Uniformly in $n \ge 18$, $w_0 \le b \le n/8$ and $n/2 \le s < n$,
\[
\bigl|IP[T_{bn}(Z) = s] - IP[T_{bn}(Z) = s+1] - (1-\theta)(s+1)^{-1}IP[T_{bn}(Z) = s]\bigr|
\le n^{-\theta}(s+1)^{-(2-\theta)}\bigl\{\varepsilon_{9.10}(n) + \theta 2^{1-\theta}\min\{4\phi^*_{9.8}(n)bn^{-1},\ K_n^{(1)}\}\bigr\},
\]
with
\[
\varepsilon_{9.10}(n) = \varepsilon_{9.10}(n,Z)
= O\bigl(\phi^*_{9.8}(n)n^{-(g_1\wedge a_1\wedge 1)} + n^{-\theta/2}\log n + n^{1-(g_1\wedge a_1\wedge a_2)+\delta}\bigr),
\]
under Conditions (A0) and (B01). If Conditions (A0), (D1) and (B11) hold, then $\varepsilon_{9.10}(n) = O(n^{-\beta_{12}+\delta})$ for any $\delta > 0$, where
\[
\beta_{12} = \min\bigl\{1/2,\ \theta/2,\ g_1,\ [(g_2\wedge a_1\wedge a_2) - 1]\bigr\}.
\]

Proof. We start once again with (9.26) from the proof of Theorem 9.7, but without the index $m$, giving
\[
(s+1)\bigl\{IP[T_{bn} = s] - IP[T_{bn} = s+1]\bigr\} + \theta IP[T_{bn} = s-b] - IP[T_{bn} = s]
= \theta\eta_4 + \eta_1 + \eta_2 + \eta_3, \tag{9.41}
\]

where
\[
\eta_4 = \sum_{i=b+1}^{n}\varepsilon_{i1}\bigl\{IP[T_{bn}^{(i)} = s-i] - IP[T_{bn}^{(i)} = s+1-i]\bigr\}; \tag{9.42}
\]
\[
\eta_1 = \sum_{i=b+1}^{n}\sum_{l\ge 2}l\varepsilon_{il}\bigl\{IP[T_{bn}^{(i)} = s-il] - IP[T_{bn}^{(i)} = s+1-il]\bigr\}; \tag{9.43}
\]
\[
\eta_2 = \sum_{i=b+1}^{n}\frac{\theta}{ir_i}\sum_{l\ge 2}\varepsilon_{il}\Bigl[\bigl\{IP[T_{bn}^{(i)} = s-i] - IP[T_{bn}^{(i)} = s+1-i]\bigr\}
 - \bigl\{IP[T_{bn}^{(i)} = s-i(l+1)] - IP[T_{bn}^{(i)} = s+1-i(l+1)]\bigr\}\Bigr]; \tag{9.44}
\]
\[
\eta_3 = \sum_{i=b+1}^{n}\frac{\theta}{ir_i}(1+\varepsilon_{i1})\Bigl[\bigl\{IP[T_{bn}^{(i)} = s-i] - IP[T_{bn}^{(i)} = s+1-i]\bigr\}
 - \bigl\{IP[T_{bn}^{(i)} = s-2i] - IP[T_{bn}^{(i)} = s+1-2i]\bigr\}\Bigr]. \tag{9.45}
\]

Note that the formulae for the $\eta$'s all contain differences of probabilities, a feature hardly exploited in the proof of Theorem 9.7. Now, in appropriate ranges of $i$, we can use Theorem 9.7 (2) and Lemma 13.9 to obtain better estimates of the differences than is possible using Corollary 9.3 and Lemma 13.7. For instance, taking $\eta_1$, we split the ranges of the sums in such a way that Theorem 9.7 (2) and (9.39) with $t = 1/2$ can be applied in the first sum; using Lemmas 13.9 (5) and 13.7 (2,3), this then gives

\[
|\eta_1| \le \sum_{i=b+1}^{\lfloor(s+1)/4\rfloor}\sum_{l=2}^{\lfloor(s+1)/2i\rfloor}l\varepsilon_{il}\bigl|IP[T_{bn}^{(i)} = s-il] - IP[T_{bn}^{(i)} = s+1-il]\bigr|
 + \Bigl|\sum_{i=b+1}^{\lfloor s/2\rfloor}\sum_{\substack{l=\lfloor(s+1)/2i\rfloor+1\\ il<s}}^{\lfloor s/i\rfloor}l\varepsilon_{il}\,IP[T_{bn}^{(i)} = s-il]
 - \sum_{i=b+1}^{\lfloor(s+1)/2\rfloor}\sum_{\substack{l=\lfloor(s+1)/2i\rfloor+1\\ il<s+1}}^{\lfloor(s+1)/i\rfloor}l\varepsilon_{il}\,IP[T_{bn}^{(i)} = s+1-il]\Bigr|
 + \Bigl|\sum_{i=b+1}^{\lfloor s/2\rfloor}\sum_{\substack{l\ge 2\\ il=s}}l\varepsilon_{il}\,IP[T_{bn}^{(i)} = 0]
 - \sum_{i=b+1}^{\lfloor(s+1)/2\rfloor}\sum_{\substack{l\ge 2\\ il=s+1}}l\varepsilon_{il}\,IP[T_{bn}^{(i)} = 0]\Bigr|
\]
\[
\le 2\phi^*_{9.8}(n)n^{-\theta}(s+1)^{-(2-\theta)}2^{2-\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}F_{i1}
 + K_n^{(1)}n^{-\theta}(s+1)^{-(1-\theta)}
\times\Bigl\{\bar\phi^{\theta}_1(s) + \bar u^*_1(s) + 2^{1-\theta}(b+1)^{\theta}(s+1)^{1-\theta}\sum_{i=1}^{\lfloor(s+2)/4\rfloor}\sum_{\substack{l\ge 2\\ 2il=s+2}}l\varepsilon_{il}\Bigr\}
\]
\[
\le 2\phi^*_{9.8}(n)n^{-\theta}(s+1)^{-(2-\theta)}2^{2-\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}F_{i1}
 + K_n^{(1)}n^{-\theta}(s+1)^{-(1-\theta)}
\times\bigl\{\bar\phi^{\theta}_1(s) + \bar u^*_1(s) + 4u_1(0,(s+2)/2)\bigr\}; \tag{9.46}
\]
the last term enters because the range of summation in Lemma 13.7 (2) with $s+1$ for $s$ does not exactly match that needed here. Note that, if $s$ is odd, $u_1(0,(s+2)/2)$ can be defined to be 0.

Next, taking $\eta_2$, we separate the pairs of differences, and first apply Lemmas 13.9 (3) and 13.7 (5) with $\alpha = 1/4$ and $v_i = E_{i1}$ to show that
\[
\Bigl|\sum_{i=b+1}^{n}\frac{\theta}{ir_i}E_{i1}\bigl\{IP[T_{bn}^{(i)} = s-i] - IP[T_{bn}^{(i)} = s+1-i]\bigr\}\Bigr|
\le 2\phi^*_{9.8}(n)\theta n^{-\theta}(s+1)^{-(2-\theta)}(4/3)^{2-\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}\frac{E_{i1}}{ir_i}
 + K_n^{(1)}\theta n^{-\theta}(s+1)^{-(1-\theta)}\frac{E^*_{s/4,1}}{\theta r^-_{s/4}}
\times\Bigl\{4(3/4)^{\theta} + \frac{4}{s+2}\Bigl(\frac{4(s+1)}{3(s+2)}\Bigr)^{1-\theta}\Bigr\}
\]
\[
\le 2\phi^*_{9.8}(n)\theta n^{-\theta}(s+1)^{-(2-\theta)}\,2\sum_{i=1}^{\lfloor(s+1)/4\rfloor}\frac{E_{i1}}{ir_i}
 + K_n^{(1)}\theta n^{-\theta}(s+1)^{-(1-\theta)}\,\frac{4E^*_{s/4,1}}{\theta r^-_{s/4}}, \tag{9.47}
\]
again taking careful account of the ranges of summation, here in Lemma 13.7 (5) if $(s+2)/4$ is an integer.

again taking careful account of the ranges of summation, here inLemma 13.7 (5) if (s + 2)/4 is an integer. In a similar fashion, usingLemmas 13.9 (6) and 13.7 (9,10,11) with vi = 1, we then obtain∣∣∣∣∣∣

n∑i=b+1

θ

iri

∑l≥2

εilIP[Tbn(i) = s− i(l + 1)]− IP[Tbn(i) = s+ 1− i(l + 1)]

∣∣∣∣∣∣≤ 2φ∗9.8(n)θn−θ(s+ 1)−(2−θ)2

b(s+1)/6c∑i=1

Ei1iri

+Kn(1)θn−θ(s+ 1)−(1−θ)

Page 252: This is page i Logarithmic Combinatorial Structures: a ..._Publications...This is page i Printer: Opaque this Logarithmic Combinatorial Structures: a Probabilistic Approach Richard

240 9. Point probabilities

×

2(φθ2(s) + u∗2(s))s+ 1

+4ε∗s/4,1θr−s/4

+ 21−θb(s+2)/6c∑i=b+1

∑l≥2

2i(l+1)=s+2

εiliri

+4

s+ 2

(4(s+ 1)s+ 2

)1−θ ε∗s/4,2

θr−s/4

≤ 2φ∗9.8(n)θn−θ(s+ 1)−(2−θ)2b(s+1)/6c∑

i=1

Ei1iri

+Kn(1)θn−θ(s+ 1)−(1−θ) (9.48)

×

2(φθ2(s) + u∗2(s))

s+ 1+

4ε∗s/4,1 + ε∗s/4,2

θr−s/4+

8u2(0, (s+ 2)/2)(s+ 1)2

;

here, the range of summation needed care in Lemma 13.7 (9) whenever(s+ 2)/2i was an integer, and in Lemma 13.7 (10) when (s+ 2)/4 was aninteger.

The treatment of $\eta_3$ is slightly more complicated. The aim is to find an estimate which becomes small fast enough as $n$ becomes large, without necessarily assuming that $r_n \to \infty$, thus increasing the generality of Theorem 6.8. In order to achieve this, we separate the pairs of differences in (9.45), and estimate each using parts of Lemmas 13.9 and 13.7 with values of $\alpha$ which change with $s$. The values of $\alpha$ that we need are larger than before, and, as a result, the uniform bounds on $\phi_{9.7}(n,s,m,b)$ given in (9.39) are no longer adequate. Instead, we consider the contribution from $\phi_{9.7}$ separately, when applying Theorem 9.7 (2).

For instance, for the first pair, from Theorem 9.7 (2) and Lemmas 13.9 (3) and 13.7 (5) with $v_i = 1+\varepsilon_{i1}$, we have
\[
\Bigl|\sum_{i=b+1}^{n}\frac{\theta}{ir_i}(1+\varepsilon_{i1})\bigl\{IP[T_{bn}^{(i)} = s-i] - IP[T_{bn}^{(i)} = s+1-i]\bigr\}\Bigr|
\le \theta n^{-\theta}(s+1)^{-(2-\theta)}(1-\alpha)^{-(2-\theta)}\sum_{i=1}^{\lfloor\alpha(s+1)\rfloor}\frac{1+\varepsilon_{i1}}{ir_i}\,\phi^*_{9.8}(n)
 + \sum_{i=1}^{\lfloor\alpha(s+1)\rfloor}\frac{1+\varepsilon_{i1}}{ir_i}\,\theta n^{-\theta}(s-i+1)^{-(2-\theta)}\phi_{9.7}(n,s-i,i,b)
\]
\[
 + K_n^{(1)}\theta n^{-\theta}(s+1)^{-(1-\theta)}\,\frac{2(1+\varepsilon^*_{s/2,1})(1-\alpha)^{-(1-\theta)}}{(s+1)r^-_{s/2}}\,\mathbf{1}\{\lfloor\alpha(s+1)\rfloor \ge s-b\}
 + K_n^{(1)}\theta n^{-\theta}(s+1)^{-(1-\theta)}\,\frac{(1+\varepsilon^*_{\alpha s,1})(1-\alpha)^{\theta}}{\theta\alpha r^-_{\alpha s}}\Bigl\{1 + \frac{\theta}{(1-\alpha)(s+2)}\Bigr\};
\]


taking $\alpha = 1 - (s+1)^{-1/2}$, this gives
\[
\Bigl|\sum_{i=b+1}^{n}\frac{\theta}{ir_i}(1+\varepsilon_{i1})\bigl\{IP[T_{bn}^{(i)} = s-i] - IP[T_{bn}^{(i)} = s+1-i]\bigr\}\Bigr|
\le \theta n^{-\theta}(s+1)^{-(1-\theta/2)}
\times\Bigl\{\phi^*_{9.8}(n)\sum_{i=1}^{(s+1)\wedge n}\frac{1+\varepsilon_{i1}}{ir_i}
 + 4K_n^{(1)}\frac{1+\varepsilon^*_{s/2,1}}{\theta r^-_{s/2}}
 + 2^{2-\theta}K_n^{(1)}(1+\varepsilon^*_{01})\bigl(2^{2-\theta}h(\lfloor(s+1)/4\rfloor+1) + 5\bigr)\Bigr\}, \tag{9.49}
\]
with $h(i+1) = \sum_{j=1}^{i}j^{-1}$ as usual, since, from the definition (9.25) of $\phi_{9.7}$, $\phi_{9.7}(n,s-i,i,b) = 0$ for $2i > s+1$, and
\[
\sum_{i\ge 1}i^{-1}\phi_{9.7}(n,s-i,i,b)
\le K_n^{(1)}\theta(1+\varepsilon^*_{01})\Bigl\{\sum_{i=1}^{\lfloor(s+1)/4\rfloor}\frac{1}{i}\Bigl(\frac{s+1}{s-2i+1}\Bigr)^{1-\theta}
 + \sum_{(s+1)/4<i<s/2}\frac{1}{i}\Bigl(\frac{s+1}{s-2i+1}\Bigr)^{1-\theta}
 + 4s^{-1}(b+1)^{\theta}(s+1)^{1-\theta}\Bigr\},
\]
with the quantity in braces at most
\[
2^{1-\theta}h(\lfloor(s+1)/4\rfloor+1)
 + \frac{4}{2^{1-\theta}(s+1)^{\theta}}\sum_{(s+1)/4<i<s/2}\bigl\{(s+1)/2 - i\bigr\}^{-(1-\theta)}
 + 5\Bigl(\frac{b+1}{s+1}\Bigr)^{\theta}
\le 2^{1-\theta}h(\lfloor(s+1)/4\rfloor+1)
 + \frac{4}{2^{1-\theta}(s+1)^{\theta}}\Bigl(\frac{s+1}{4}\Bigr)^{\theta}\sum_{j=1}^{\lfloor(s+1)/4\rfloor}j^{-1}
 + 5\Bigl(\frac{b+1}{s+1}\Bigr)^{\theta}
\le 2^{2-\theta}h(\lfloor(s+1)/4\rfloor+1) + 5\Bigl(\frac{b+1}{s+1}\Bigr)^{\theta}.
\]

The second of the differences is estimated using Theorem 9.7 and Lemmas 13.9 (4) and 13.7 (7) with $v_i = 1+\varepsilon_{i1}$, giving
\[
\Bigl|\sum_{i=b+1}^{n}\frac{\theta}{ir_i}(1+\varepsilon_{i1})\bigl\{IP[T_{bn}^{(i)} = s-2i] - IP[T_{bn}^{(i)} = s+1-2i]\bigr\}\Bigr|
\le \theta n^{-\theta}(s+1)^{-(2-\theta)}(1-2\alpha)^{-(2-\theta)}\sum_{i=1}^{\lfloor\alpha(s+1)\rfloor}\frac{1+\varepsilon_{i1}}{ir_i}\,\phi^*_{9.8}(n)
 + \sum_{i=1}^{\lfloor\alpha(s+1)\rfloor}\frac{1+\varepsilon_{i1}}{ir_i}\,\theta n^{-\theta}(s-2i+1)^{-(2-\theta)}\phi_{9.7}(n,s-2i,i,b)
\]
\[
 + K_n^{(1)}\theta n^{-\theta}(s+1)^{-(1-\theta)}\,\frac{4(1+\varepsilon^*_{s/4,1})(1-2\alpha)^{-(1-\theta)}}{(s+1)r^-_{s/4}}\,\mathbf{1}\{2\lfloor\alpha(s+1)\rfloor \ge s-b\}
 + K_n^{(1)}\theta n^{-\theta}(s+1)^{-(1-\theta)}\,\frac{(1+\varepsilon^*_{\alpha s,1})(1-2\alpha)^{\theta}}{\theta\alpha r^-_{\alpha s}}\Bigl\{1 + \frac{\theta}{(1-2\alpha)(s+2)}\Bigr\};
\]

choosing $\alpha = \tfrac12\{1 - (s+1)^{-1/2}\}$ then gives
\[
\Bigl|\sum_{i=b+1}^{n}\frac{\theta}{ir_i}(1+\varepsilon_{i1})\bigl\{IP[T_{bn}^{(i)} = s-2i] - IP[T_{bn}^{(i)} = s+1-2i]\bigr\}\Bigr|
\le \theta n^{-\theta}(s+1)^{-(1-\theta/2)}
\times\Bigl\{\phi^*_{9.8}(n)\sum_{i=1}^{(s+1)\wedge n}\frac{1+\varepsilon_{i1}}{ir_i}
 + 8K_n^{(1)}\frac{1+\varepsilon^*_{s/2,1}}{\theta r^-_{s/2}}
 + 3^{2-\theta}K_n^{(1)}(1+\varepsilon^*_{01})\bigl(2^{2-\theta}h(\lfloor(s+1)/6\rfloor+1) + 7\bigr)\Bigr\}, \tag{9.50}
\]
where the contribution from $\phi_{9.7}$ is bounded much as for the previous estimate.

To bound $\eta_4$, we first invoke Lemma 9.9 to give
\[
\Bigl|\eta_4 - \sum_{i=b+1}^{n}\varepsilon_{i1}\bigl\{IP[T_{bn} = s-i] - IP[T_{bn} = s+1-i]\bigr\}\Bigr|
\le n^{-\theta}(s+1)^{-(1-\theta)}\bigl\{\varepsilon_{9.9}(n,s) + \varepsilon_{9.9}(n,s+1)\bigr\}. \tag{9.51}
\]
Then, from Lemma 13.9 (3) with $\alpha = 1/4$ and $v_i = ir_i|\varepsilon_{i1}|/\theta$, we have
\[
\sum_{i=b+1}^{\lfloor(s+1)/4\rfloor}|\varepsilon_{i1}|\,|IP[T_{bn} = s-i] - IP[T_{bn} = s+1-i]|
\le 4\phi^*_{9.8}(n)n^{-\theta}(s+1)^{-(2-\theta)}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}|\varepsilon_{i1}|; \tag{9.52}
\]

for the remainder, matching terms with equal probabilities and using Corollary 9.3 to bound the remaining element gives
\[
\Bigl|\sum_{i=\lfloor(s+1)/4\rfloor+1}^{s+1}\varepsilon_{i1}\bigl\{IP[T_{bn} = s-i] - IP[T_{bn} = s+1-i]\bigr\}\Bigr|
\le K_n^{(1)}n^{-\theta}(s+1)^{-(1-\theta)}(4/3)^{1-\theta}\varepsilon^*_{s/2,1}
 + \sum_{i=\lfloor(s+1)/4\rfloor+1}^{s+1}IP[T_{bn} = s+1-i]\,\Delta_i
\le K_n^{(1)}n^{-\theta}(s+1)^{-(1-\theta)}\bigl\{(4\varepsilon^*_{s/2,1}/3) + 2\theta^{-1}(s+1)\Delta^*_{s/4}\bigr\}, \tag{9.53}
\]


the last line once again from Corollary 9.3, used to show that
\[
IP[T_{bn} \le 3(s+1)/4] \le K_n^{(1)}n^{-\theta}\Bigl\{(b+1)^{\theta} + \Bigl(\frac34\Bigr)^{\theta}(s+1)^{\theta}\Bigr\}.
\]

Combining (9.46)–(9.53) into (9.41), we have established that
\[
\bigl|(s+1)\bigl\{IP[T_{bn} = s] - IP[T_{bn} = s+1]\bigr\} + \theta IP[T_{bn} = s-b] - IP[T_{bn} = s]\bigr|
\le n^{-\theta}(s+1)^{-(1-\theta)}\varepsilon_{9.10}(n),
\]
whenever $n \ge 18$, $w_0 \le b < (n-6)/4$ and $n/2 \le s < n$, where
\[
\varepsilon_{9.10}(n) = \varepsilon_{9.10}(n,Z)
= 4n^{-1}\phi^*_{9.8}(n)(\theta/\theta)\sum_{i=1}^{\lfloor(n+2)/4\rfloor}\Bigl\{2\Bigl(\mu_i + \frac{2\theta E_{i1}}{ir_i}\Bigr) + \frac{4|\varepsilon_{i1}|(1+\mu_i)}{r_i}\Bigr\}
 + 3n^{-\theta/2}\phi^*_{9.8}(n)\,2\theta\sum_{i=1}^{n}\frac{1+\varepsilon_{i1}}{ir_i} + K_n^{(1)}\theta
\]
\[
\times\Bigl\{2\theta n^{-1}\Bigl[10\sum_{i=1}^{\lfloor(n+2)/4\rfloor}\frac{|\varepsilon_{i1}|F_{i1}}{r_i} + (1+2\varepsilon^*_{01})\,2\bigl(\phi^{*\theta}_2(n) + u^*_2(n)\bigr)\Bigr]
 + \phi^{*\theta}_1(n) + 5u^*_1(n) + 32n^{-2}u^*_2(n) + 2\varepsilon^*_{n/4,1}
 + 2\theta^{-1}n\Delta^*_{n/8} + 2^{\theta}\theta n^{-\theta/2}(1+\varepsilon^*_{01})\bigl\{12h(\lfloor n/4\rfloor+1) + 25\bigr\}
 + \frac{15\theta\rho^*_{n/12}(1+\rho^*_{n/12})}{\theta r^-_{n/12}}\Bigr\},
\]
estimated under Conditions (A0) and (B01) using Propositions 6.1 and 6.2. Then finally, from Theorem 9.7 and Corollary 9.3,
\[
|IP[T_{bn} = s-b] - IP[T_{bn} = s]|
\le \min\bigl\{2^{2-\theta}\phi^*_{9.8}(n)\,bn^{-\theta}(s+1)^{-(2-\theta)},\ K_n^{(1)}2^{1-\theta}n^{-\theta}(s+1)^{-(1-\theta)}\bigr\}, \tag{9.54}
\]
and the theorem is proved. □

Theorem 9.10 can now be combined with Lemma 9.2 to give a bound on the first differences $IP[T_{bn}(Z) = s] - IP[T_{bn}(Z) = s+1]$ in the range $n/2 \le s \le n$, which improves upon Corollary 9.8 in its asymptotic order.

Corollary 9.11 Uniformly in $n \ge 18$, $w_0 \le b \le n/8$ and $n/2 \le s < n$,
\[
|IP[T_{bn}(Z) = s] - IP[T_{bn}(Z) = s+1]| \le n^{-2}\phi^*_{9.11}(n),
\]
where
\[
\phi^*_{9.11}(n) = \phi^*_{9.11}(n,Z) = 4K_0\theta|1-\theta| + 4\varepsilon_{9.10}(n) + 2\theta K_n^{(1)}
\]
is bounded, irrespective of the values of the $r_i$, under Conditions (A0), (D1) and (B11).

Simplified approximations

It is convenient to simplify the conclusion of Theorem 9.10 a little further, replacing $IP[T_{bn} = s]$ by $IP[T_{0n} = n]$ and $(s+1)^{-1}$ by $(n+1)^{-1}$ in the principal approximation to the first differences, which should make little difference if $b$ is small and $s$ is close to $n$. The first step in this direction is to establish how close the two probabilities are.

Lemma 9.12 For $n \ge 18$, $w_0 \le b \le n/8$ and $0 \le l \le n/2$,
\[
|IP[T_{bn}(Z) = n-l] - IP[T_{0n}(Z) = n]| = O\bigl(n^{-2}(l+b)\bigr)
\]
under Conditions (A0), (D1) and (B11).

Proof. Direct calculation shows that
\[
|IP[T_{bn} = n-l] - IP[T_{0n} = n]|
= \Bigl|\sum_{r\ge 0}IP[T_{0b} = r]\bigl\{IP[T_{bn} = n-r] - IP[T_{bn} = n-l]\bigr\}\Bigr|
\le \sum_{r=0}^{\lfloor n/2\rfloor}IP[T_{0b} = r]\,|IP[T_{bn} = n-r] - IP[T_{bn} = n-l]|
 + \max\Bigl\{\max_{n/2<r\le n}IP[T_{0b} = r],\ IP[T_{bn} = n-l]\Bigr\}IP[T_{0b} > n/2].
\]
Now, from Corollary 9.8, the first term is no bigger than
\[
\sum_{r\ge 0}IP[T_{0b} = r]\,|r-l|\,4n^{-2}\phi^*_{9.8}(n) \le 4(IE T_{0b} + l)n^{-2}\phi^*_{9.8}(n);
\]
then Corollary 9.5 (1) implies that
\[
\max_{n/2<r\le n}IP[T_{0b} = r] \le 2n^{-1}\varepsilon^*_{9.5(1)}(n,b),
\]
and Corollary 9.3 and Chebyshev's inequality give
\[
IP[T_{bn} = n-l]\,IP[T_{0b} > n/2] \le 4n^{-1}IE T_{0b}\,K_n^{(1)}n^{-1},
\]
from which it follows that
\[
|IP[T_{bn} = n-l] - IP[T_{0n} = n]| \le n^{-1}\bigl\{\varepsilon_{9.12}(n,b) + 4\phi^*_{9.8}(n)(l/n)\bigr\},
\]


where
\[
\varepsilon_{9.12}(n,b) = 4n^{-1}IE T_{0b}\bigl\{\phi^*_{9.8}(n) + K_n^{(1)}\bigr\} + 2\varepsilon^*_{9.5(1)}(n,b).
\]
From Lemma 6.3 and Corollaries 9.8 and 9.5 (1), if Conditions (A0), (D1) and (B11) hold, then $\varepsilon_{9.12}(n,b)$ is of order $O(b/n)$ provided also that $S(\infty) < \infty$; this last condition is shown to be superfluous by using Corollary 9.11 in place of Corollary 9.8. □

The next lemma also bounds the error incurred in replacing $(s+1)^{-1}$ by $(n+1)^{-1}$.

Lemma 9.13 For $n \ge 18$, $w_0 \le b \le n/8$ and $0 \le l \le n/2$,
\[
\Bigl|\frac{IP[T_{bn}(Z) = n-l]}{n-l+1} - \frac{IP[T_{0n}(Z) = n]}{n+1}\Bigr| = O\bigl(n^{-3}(b+l)\bigr),
\]
under Conditions (A0), (D1) and (B11).

Proof. We simply write
\[
\Bigl|\frac{IP[T_{bn} = n-l]}{n-l+1} - \frac{IP[T_{0n} = n]}{n+1}\Bigr|
= \Bigl|\frac{(n+1)\bigl\{IP[T_{bn} = n-l] - IP[T_{0n} = n]\bigr\} + l\,IP[T_{0n} = n]}{(n+1)(n-l+1)}\Bigr|
\le 2n^{-2}\bigl\{\varepsilon_{9.12}(n,b) + 4\phi^*_{9.8}(n)(l/n) + l\,IP[T_{0n} = n]\bigr\},
\]
after making the obvious estimates. The first two elements are estimated as for Lemma 9.12, and Lemma 9.2 is used for the third; together, this shows that the difference is of order $O(n^{-3}(b+l))$ under Conditions (A0), (D1) and (B11). □

The simplified approximation to differences of point probabilities can now be established.

Theorem 9.14 For $n \ge 18$, $w_0 \le b \le n/8$ and $0 < l \le n/2$,
\[
\bigl|IP[T_{bn}(Z) = n-l] - IP[T_{bn}(Z) = n-l+1] - (1-\theta)(n+1)^{-1}IP[T_{0n}(Z) = n]\bigr|
\le n^{-2}\bigl\{2ln^{-1}\bigl[|1-\theta|K_0\theta + 4\phi^*_{9.8}(n)\bigr] + \varepsilon_{9.14}(n,b)\bigr\}, \tag{9.55}
\]
where
\[
\varepsilon_{9.14}(n,b) = \varepsilon_{9.14}(n,b,Z)
= 2^{2-\theta}\varepsilon_{9.10}(n) + \theta 2^{5-2\theta}\phi^*_{9.8}(n)bn^{-1} + 2|1-\theta|\,\varepsilon_{9.12}(n,b).
\]
The estimate (9.55) is of order $O\bigl(n^{-2}\{n^{-1}(b+l) + n^{-\beta_{12}+\delta}\}\bigr)$ for any $\delta > 0$, under Conditions (A0), (D1) and (B11) and if $S(\infty) < \infty$, where $\beta_{12}$ is given in Theorem 9.10. Using Corollary 9.11 in place of Corollary 9.8, it follows that the condition $S(\infty) < \infty$ is in fact superfluous.

Proof. We combine Theorem 9.10 and Lemma 9.13 to give
\[
\bigl|IP[T_{bn} = n-l] - IP[T_{bn} = n-l+1] - (1-\theta)(n+1)^{-1}IP[T_{0n} = n]\bigr|
\le 2^{2-\theta}n^{-2}\varepsilon_{9.10}(n) + \theta 2^{3-\theta}bn^{-1}\phi^*_{9.8}(n)
 + 2|1-\theta|n^{-2}\bigl\{\varepsilon_{9.12}(n,b) + 4\phi^*_{9.8}(n)(l/n) + l\,IP[T_{0n} = n]\bigr\},
\]
which is enough. □


10 Distributional comparisons with Pθ

This chapter is concerned with sharpening Theorem 5.3, which proves the convergence of $\mathcal{L}(m^{-1}T_{vm}(Z))$ to $P_\theta$ under the Logarithmic Condition, by providing estimates of the rate of convergence. We accomplish this in two stages. In the first, culminating in Theorems 10.5–10.8, the difference between the distributions of $T_{vm}(Z)$ and $T_{vm}(Z^*)$ is analysed. In the remainder of the chapter, $\mathcal{L}(m^{-1}T_{vm}(Z^*))$ is compared to $P_\theta$. The argument involves Stein's method throughout.

10.1 Comparison of L(Tvm(Z)) and L(Tvm(Z∗))

We begin with two simple comparisons of the distributions of $T_{vm}(Z)$ and $T_{vm}(Z^*)$. Using elementary arguments, we show that they become close with respect to total variation distance as $v$ and $m \to \infty$ with $v < m$, under very mild conditions; and that, under more stringent conditions, approximation in Wasserstein distance is also good. To dispense with the assumption that $v \to \infty$, more sophisticated methods are required: see Theorem 10.5 below.

Elementary comparisons

The two lemmas in this section give bounds on the error in approximating $\mathcal{L}(T_{vm}(Z))$ by $\mathcal{L}(T_{vm}(Z^*))$ which are small provided that $v \to \infty$ with $m$.


Lemma 10.1 For any $0 \le v < m$,
\[
d_{TV}\bigl(\mathcal{L}(T_{vm}(Z)),\mathcal{L}(T_{vm}(Z^*))\bigr) \le d_{TV}\bigl(\mathcal{L}(Z[v+1,m]),\mathcal{L}(Z^*[v+1,m])\bigr)
\le \sum_{i=v+1}^{m}\frac{\theta}{i}\Bigl\{\rho_i(Z) + \frac{2\theta}{ir_i}\Bigr\}
= O\bigl(v^{-(g_1\wedge a_1\wedge 1)}\bigr)
\]
under Conditions (A0) and (B01).

Proof. By the independence of the components of $Z$ and $Z^*$,
\[
d_{TV}\bigl(\mathcal{L}(Z[v+1,m]),\mathcal{L}(Z^*[v+1,m])\bigr) \le \sum_{i=v+1}^{m}\sum_{j=1}^{r_i}d_{TV}\bigl(\mathcal{L}(Z_{ij}),\mathcal{L}(Z^*_{ij})\bigr). \tag{10.1}
\]
Now, since, from (6.7),
\[
d_{TV}\bigl(\mathcal{L}(Z_{ij}),\mathrm{Be}\,(\theta/ir_i)\bigr)
= \frac{\theta}{2ir_i}\Bigl\{|E_{i0}| + |\varepsilon_{i1}| + \sum_{l\ge 2}\varepsilon_{il}\Bigr\} \le \theta\rho_i(Z)/ir_i,
\]
and since also
\[
\rho_i(Z^*) = \bigl(1 - e^{-\theta/ir_i}\bigr) + \sum_{l\ge 2}e^{-\theta/ir_i}\Bigl(\frac{\theta}{ir_i}\Bigr)^{l-1}\frac{1}{l!} \le \frac{2\theta}{ir_i},
\]
it follows that
\[
d_{TV}\bigl(\mathcal{L}(Z_{ij}),\mathcal{L}(Z^*_{ij})\bigr) \le \frac{\theta}{ir_i}\Bigl\{\rho_i(Z) + \frac{2\theta}{ir_i}\Bigr\},
\]
proving the lemma. □

Remark. For mappings, the estimate in Lemma 10.1 is of order $O(v^{-1/2})$; for polynomials and for square free polynomials, it is of order $O(v^{-1}q^{-v/2})$. In general, the estimate becomes small as $v$ increases, uniformly in $n$, if $\sum_{i\ge 1}i^{-1}\rho_i(Z) < \infty$.
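The product-form bound underlying Lemma 10.1 — that the total variation distance between the laws of the two sums is at most the sum of the componentwise distances — can be checked numerically in a small case. The sketch below is not from the text: it takes $r_i = 1$ and hypothetical values $\theta = 0.5$, $v = 5$, $m = 40$, with $Z_i$ approximately Bernoulli$(\theta/i)$ and $Z^*_i$ Poisson$(\theta/i)$, and builds the laws of the weighted sums $\sum_i iZ_i$ by explicit convolution.

```python
import numpy as np
from math import exp, factorial

SIZE = 256  # truncation level for the law of the weighted sum

def dtv(p, q):
    # total variation distance between two pmfs on {0, 1, ...}
    n = max(len(p), len(q))
    p = np.pad(p, (0, n - len(p)))
    q = np.pad(q, (0, n - len(q)))
    return 0.5 * np.abs(p - q).sum()

def conv(pmfs):
    # law of the sum of independent components, truncated to [0, SIZE)
    out = np.zeros(SIZE)
    out[0] = 1.0
    for p in pmfs:
        out = np.convolve(out, p)[:SIZE]
    return out

def lift(pmf, i):
    # law of i*Z on the lattice {0, i, 2i, ...}, truncated to [0, SIZE)
    out = np.zeros(SIZE)
    for k, pk in enumerate(pmf):
        if i * k < SIZE:
            out[i * k] = pk
    return out

theta, v, m = 0.5, 5, 40        # hypothetical toy parameters
rhs, Z, Zstar = 0.0, [], []
for i in range(v + 1, m + 1):
    lam = theta / i
    be = np.array([1 - lam, lam])                                          # Bernoulli(theta/i)
    po = np.array([exp(-lam) * lam**k / factorial(k) for k in range(20)])  # Poisson(theta/i)
    rhs += dtv(be, po)          # sum of componentwise distances
    Z.append(lift(be, i))
    Zstar.append(lift(po, i))

lhs = dtv(conv(Z), conv(Zstar))  # distance between the laws of the sums
assert lhs <= rhs + 1e-9         # subadditivity over independent components
```

The convolution smooths out the componentwise discrepancies, so in practice `lhs` is much smaller than `rhs`; the lemma's content is that the crude componentwise sum already decays as $v \to \infty$.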

Lemma 10.1 gives bounds in terms of total variation distance. The next lemma bounds the same quantities in Wasserstein $l_1$ distance. Here, it is no longer the case that $d_W(\mathcal{L}(Z[v+1,m]),\mathcal{L}(Z^*[v+1,m]))$ necessarily dominates $d_W(\mathcal{L}(T_{vm}(Z)),\mathcal{L}(T_{vm}(Z^*)))$, because a change of 1 in $Z_{ij}$ occasions a change of $i$ in $T_{vm}(Z)$, and this is reflected in $d_W$.

Lemma 10.2 For any $0 \le v < m$,
\[
\text{(1)}\quad d_W\bigl(\mathcal{L}(Z[v+1,m]),\mathcal{L}(Z^*[v+1,m])\bigr) \le \theta\sum_{i=v+1}^{m}i^{-1}\bigl\{\mu_i(Z) + \theta/ir_i\bigr\};
\]
\[
\text{(2)}\quad d_W\bigl(\mathcal{L}(T_{vm}(Z)),\mathcal{L}(T_{vm}(Z^*))\bigr) \le \theta\sum_{i=v+1}^{m}\bigl\{\mu_i(Z) + \theta/ir_i\bigr\};
\]
the bound in Part (1) is of order $O(v^{-\beta_2})$ under Conditions (A0) and (B01), with $\beta_2 = (1\wedge g_1\wedge a_1)$ as before.

Proof. By the independence of the components of $Z$ and $Z^*$,
\[
d_W\bigl(\mathcal{L}(Z[v+1,m]),\mathcal{L}(Z^*[v+1,m])\bigr) \le \sum_{i=v+1}^{m}\sum_{j=1}^{r_i}d_W\bigl(\mathcal{L}(Z_{ij}),\mathcal{L}(Z^*_{ij})\bigr),
\]
and
\[
d_W\bigl(\mathcal{L}(T_{vm}(Z)),\mathcal{L}(T_{vm}(Z^*))\bigr) \le \sum_{i=v+1}^{m}\sum_{j=1}^{r_i}i\,d_W\bigl(\mathcal{L}(Z_{ij}),\mathcal{L}(Z^*_{ij})\bigr).
\]
Now, much as for Lemma 10.1, we compute
\[
d_W\bigl(\mathcal{L}(Z_{ij}),\mathcal{L}(Z^*_{ij})\bigr)
\le d_W\bigl(\mathcal{L}(Z_{ij}),\mathrm{Be}\,((1+E_{i0})\theta/ir_i)\bigr)
 + d_W\bigl(\mathrm{Be}\,((1+E_{i0})\theta/ir_i),\mathrm{Be}\,(1-e^{-\theta/ir_i})\bigr)
 + d_W\bigl(\mathrm{Be}\,(1-e^{-\theta/ir_i}),\mathcal{L}(Z^*_{ij})\bigr).
\]
Since $\mathcal{L}(Z_{ij})$ is stochastically greater than $\mathrm{Be}\,((1+E_{i0})\theta/ir_i)$ and $\mathcal{L}(Z^*_{ij})$ is greater than $\mathrm{Be}\,(1-e^{-\theta/ir_i})$, we easily have
\[
d_W\bigl(\mathcal{L}(Z_{ij}),\mathrm{Be}\,((1+E_{i0})\theta/ir_i)\bigr) = IE Z_{ij} - (1+E_{i0})\theta/ir_i = \frac{\theta}{ir_i}\bigl\{F_{i1} - E_{i1}\bigr\}
\]
and
\[
d_W\bigl(\mathrm{Be}\,(1-e^{-\theta/ir_i}),\mathcal{L}(Z^*_{ij})\bigr) = IE Z^*_{ij} - 1 + e^{-\theta/ir_i} \le \frac{\theta^2}{2i^2r_i^2}.
\]
Finally,
\[
d_W\bigl(\mathrm{Be}\,((1+E_{i0})\theta/ir_i),\mathrm{Be}\,(1-e^{-\theta/ir_i})\bigr)
= \bigl|1 - e^{-\theta/ir_i} - (1+E_{i0})\theta/ir_i\bigr| \le \frac{\theta^2}{2i^2r_i^2} + \frac{\theta}{ir_i}|E_{i0}|,
\]
completing the proof. □

Remark. Note that direct calculation gives the slightly smaller estimate
\[
|IE T_{vm}(Z) - IE T_{vm}(Z^*)| \le \theta\sum_{i=v+1}^{m}\mu_i(Z) \tag{10.2}
\]
for the difference between the means.
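The weighting by $i$ in Part (2) of Lemma 10.2 can be made concrete. For integer-valued laws, $d_W(P,Q) = \sum_{j\ge 0}|F_P(j) - F_Q(j)|$, and replacing a component $Z_{ij}$ by $iZ_{ij}$ multiplies its contribution by exactly $i$. A small numerical sketch (not from the text; the toy parameter `lam` stands in for $\theta/ir_i$):

```python
import numpy as np
from math import exp, factorial

def dw(p, q):
    # Wasserstein l1 distance between integer-valued laws:
    # sum over j of |F_p(j) - F_q(j)|
    n = max(len(p), len(q))
    p = np.pad(p, (0, n - len(p)))
    q = np.pad(q, (0, n - len(q)))
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum()

def lift(pmf, i):
    # law of i*Z on the lattice {0, i, 2i, ...}
    out = np.zeros(i * (len(pmf) - 1) + 1)
    out[::i] = pmf
    return out

lam = 0.12                                   # hypothetical, stands in for theta/(i r_i)
be = np.array([1 - lam, lam])                # Bernoulli(lam), cf. L(Z_ij)
po = np.array([exp(-lam) * lam**k / factorial(k) for k in range(12)])  # Poisson(lam)

base = dw(be, po)
for i in (2, 5, 9):
    # a unit change in Z_ij moves T_vm(Z) by i, so the distance scales by i
    assert abs(dw(lift(be, i), lift(po, i)) - i * base) < 1e-9
```

This is why Part (2) loses the factor $i^{-1}$ that appears in Part (1): the componentwise distances are summed with weight $i$ rather than weight 1.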


Improved comparisons

In order to dispense with the requirement that $v \to \infty$, we use Stein's method. In the previous sections, which were all concerned with point probabilities, the Stein Operator (8.30) has been applied only to functions $g$ which are indicators of singletons. For comparisons of whole distributions, a greater range of functions $g$ is required, including all those functions which solve the Stein Equation (8.2) for bounded and Lipschitz test functions $f$. In particular, it is necessary to be able to estimate the size of quantities such as $|IE g(T_{vm}^{(i)}(Z)+i)|$, for the functions $g$ arising in this way. Chapter 8 gives bounds on the individual values $|g(w)|$; the next lemma is a step in converting them into estimates of $|IE g(T_{vm}^{(i)}(Z)+i)|$.

Lemma 10.3 For all $v < i \le m$ and $k \ge 1$,
\[
IE\Bigl\{1 \wedge \frac{K}{T_{vm}^{(i)}(Z)+k}\Bigr\} \le e^{(1+\theta E_m)}\eta(v,m,k,K,\theta),
\]
where, if $K \ge k+v+1$,
\[
\eta(v,m,k,K,\alpha) = \frac{K}{m} +
\begin{cases}
\bigl(\frac{2-\alpha}{1-\alpha}\bigr)\frac{K^{\alpha}}{m^{\alpha}} & \text{if } \alpha < 1;\\[3pt]
h(m+1)\frac{K}{m} & \text{if } \alpha = 1;\\[3pt]
\bigl(\frac{1}{\alpha-1}\bigr)\frac{K}{m} & \text{if } \alpha > 1,
\end{cases}
\]
and, if $K < k+v+1$,
\[
\eta(v,m,k,K,\alpha) = \frac{K}{m} +
\begin{cases}
\bigl(\frac{2-\alpha}{1-\alpha}\bigr)\frac{[(v+1)\vee k]^{\alpha-1}K}{m^{\alpha}} + \bigl(\frac{v+1}{m}\bigr)^{\alpha}\bigl(1\wedge\frac{K}{k}\bigr) & \text{if } \alpha < 1;\\[3pt]
\frac{K}{m}h(m+1) + \bigl(\frac{v+1}{m}\bigr)\bigl(1\wedge\frac{K}{k}\bigr) & \text{if } \alpha = 1;\\[3pt]
\bigl(\frac{1}{\alpha-1}\bigr)\frac{K}{m} + \bigl(\frac{v+1}{m}\bigr)^{\alpha}\bigl(1\wedge\frac{K}{k}\bigr) & \text{if } \alpha > 1.
\end{cases}
\]

Remark. The lemma is used mostly with $K = 1+\theta h(m+1)$, and sometimes with $K$ bounded in $m$. As before, $h(m+1) = \sum_{j=1}^{m}j^{-1}$. Note also that the bound given for $K < k+v+1$ exceeds that given for $K \ge k+v+1$ in all cases, so that the bound for $K < k+v+1$ is in fact valid for all $K$.

Proof. For any $v < j \le m$,
\[
IP[T_{vm}^{(i)}(Z) \le j] \le IP[Z_{j+1} = \cdots = Z_m = 0]\big/\bigl(1 - IP[Z_{i1} > 0]\,\mathbf{1}\{i>j\}\bigr)
\le \exp\Bigl\{-\theta\sum_{t=j+1}^{m}t^{-1}(1-\delta_{it}/r_i)(1+E_{t0})\Bigr\}
\le e^{(1+\theta E_m)}\Bigl(\frac{j+1}{m+1}\Bigr)^{\theta}. \tag{10.3}
\]
For $0 \le j \le v$,
\[
IP[T_{vm}^{(i)}(Z) \le j] = IP[T_{vm}^{(i)}(Z) = 0] \le e^{(1+\theta E_m)}\Bigl(\frac{v+1}{m+1}\Bigr)^{\theta}. \tag{10.4}
\]
The lemma now follows from Lemma 10.4 below. □

Lemma 10.4 Suppose that $W$ is a non-negative integer valued random variable satisfying
\[
IP[W \le j] \le c\Bigl(\frac{j+1}{m+1}\Bigr)^{\alpha},\quad j > v;
\qquad
IP[W \le j] = IP[W = 0] \le c\Bigl(\frac{v+1}{m+1}\Bigr)^{\alpha},\quad 0 \le j \le v,
\]
for some $c \ge 1$, and that $k \ge 1$. Then
\[
c^{-1}IE\Bigl\{1 \wedge \frac{K}{W+k}\Bigr\} \le \eta(v,m,k,K,\alpha).
\]

Proof. Since the function $u(j) = 1 \wedge \frac{K}{j+k}$ is decreasing in $j$, it follows by partial integration that
\[
IE u(W) = \sum_{j\ge 0}\bigl\{u(j) - u(j+1)\bigr\}IP[W \le j]
\le \sum_{j=(K-k)\vee(v+1)}^{m-1}\frac{K\,IP[W \le j]}{(j+k)(j+k+1)}
 + \frac{cK}{m}
 + c\Bigl(\frac{v+1}{m+1}\Bigr)^{\alpha}\Bigl\{\Bigl(1\wedge\frac{K}{k}\Bigr) - \Bigl(1\wedge\frac{K}{k+v+1}\Bigr)\Bigr\}
\]
\[
\le cK\Bigl\{\frac{1}{(m+1)^{\alpha}}\sum_{j=(K-k)\vee(v+1)}^{m-1}\frac{(j+1)^{\alpha}}{(j+k)(j+k+1)} + \frac{1}{m}\Bigr\}
 + c\Bigl(\frac{v+1}{m+1}\Bigr)^{\alpha}\Bigl(1\wedge\frac{K}{k}\Bigr)\mathbf{1}\{K < k+v+1\}.
\]
For $\alpha < 1$ and $K \ge k+v+1$, we now use the inequality
\[
\sum_{j=K-k}^{m-1}\frac{(j+1)^{\alpha}}{(j+k)(j+k+1)} \le \sum_{l\ge K}l^{\alpha-2} \le \frac{(K-1)^{\alpha-1}}{1-\alpha}
\le \frac{2^{1-\alpha}K^{\alpha-1}}{1-\alpha} \le \Bigl(\frac{2-\alpha}{1-\alpha}\Bigr)K^{\alpha-1}.
\]

For $\alpha < 1$, $K < k+v+1$ and $v+1 \le k$, we estimate
\[
\sum_{j=v+1}^{m-1}\frac{(j+1)^{\alpha}}{(j+k)(j+k+1)}
\le \sum_{j=v+1}^{k-1}\frac{k^{\alpha}}{(j+k)(j+k+1)} + \sum_{l>k}l^{\alpha-2}
\le \frac{k^{\alpha}}{v+k+1} + \frac{k^{\alpha-1}}{1-\alpha}
\le \Bigl(\frac{2-\alpha}{1-\alpha}\Bigr)k^{\alpha-1}.
\]

For $\alpha < 1$, $K < k+v+1$ and $v+1 > k$ we have
\[
\sum_{j=v+1}^{m-1}\frac{(j+1)^{\alpha}}{(j+k)(j+k+1)} \le \sum_{l>v+1}l^{\alpha-2} \le \frac{(v+1)^{\alpha-1}}{1-\alpha}.
\]
For $\alpha \ge 1$, we simply estimate
\[
\sum_{j=(K-k)\vee(v+1)}^{m-1}\frac{(j+1)^{\alpha}}{(j+k)(j+k+1)} \le \sum_{l=1}^{m}l^{\alpha-2}
\le
\begin{cases}
h(m+1) & \text{if } \alpha = 1;\\[3pt]
\frac{(m+1)^{\alpha-1}}{\alpha-1} & \text{if } \alpha > 1.
\end{cases}
\]
□

Remark. If the $E_m$ are bounded above,
\[
IE\Bigl\{1 \wedge \frac{K}{T_{vm}^{(i)}+k}\Bigr\} = O(\eta(v,m,k,K,\theta)),
\]
uniformly in $i$, $v$, $m$, $k$ and $K$.
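The partial-integration (Abel summation) identity that opens the proof of Lemma 10.4, $IE u(W) = \sum_{j\ge 0}\{u(j)-u(j+1)\}IP[W\le j]$ for the decreasing function $u(j) = 1\wedge K/(j+k)$, is easy to verify numerically. The sketch below is not from the text: it uses an arbitrary toy law and hypothetical values of $K$ and $k$; since the distribution function equals one beyond the support, the tail of the infinite sum telescopes to $u(N)$.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 59
pmf = rng.random(N + 1)
pmf /= pmf.sum()                       # an arbitrary law on {0, ..., N}
cdf = np.cumsum(pmf)

K, k = 7.0, 3                          # hypothetical parameters
def u(j):
    return min(1.0, K / (j + k))       # decreasing, u(j) -> 0 as j -> infinity

# direct expectation IE u(W)
direct = sum(pmf[j] * u(j) for j in range(N + 1))

# Abel summation: sum_{j >= 0} {u(j) - u(j+1)} IP[W <= j];
# for j >= N the cdf is 1, and that part of the sum telescopes to u(N)
parts = sum((u(j) - u(j + 1)) * cdf[j] for j in range(N)) + u(N)

assert abs(direct - parts) < 1e-12
```

The same rearrangement is what lets the proof trade the expectation for a weighted sum of the tail bounds (10.3) and (10.4).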

Using the estimates from Lemma 10.3, we are now in a position to prove the main total variation estimate of the section.

Theorem 10.5 For any $m \ge 1$,
\[
d_{TV}\bigl(\mathcal{L}(T_{0m}(Z)),\mathcal{L}(T_{0m}(Z^*))\bigr) \le \varepsilon_{10.5}(m),
\]
where
\[
\varepsilon_{10.5}(m) = \varepsilon_{10.5}(m,Z)
= O\bigl(m^{-\beta_{01}}\log^{1+s(\theta)}m\bigr)\bigl\{1 + U(m)\bigr\}
\]
under Conditions (A0) and (B01), where we have $\beta_{01} = (1\wedge\theta\wedge g_1\wedge a_1)$, $s(\theta) = \mathbf{1}\{\theta=1\}$ and
\[
U(m) = S(m)\,\mathbf{1}\{\theta\ge 1\}\mathbf{1}\{(g_1\wedge a_1)>1\} + \log m\,\mathbf{1}\{(g_1\wedge a_1)=\theta\}.
\]

Remark. For mappings, $\varepsilon_{10.5}(m) = O(m^{-1/2}\log^2(m+1))$; for polynomials and square free polynomials, $\varepsilon_{10.5}(m) = O(m^{-1}\log^2(m+1))$.

Proof. By (8.2) and Lemma 8.7,
\[
IP[T_{0m}(Z) \in A] - IP[T_{0m}(Z^*) \in A] = IE(S_m g_{mA})(T_{0m}(Z))
= -\theta\sum_{j=1}^{3}\eta^{(0m)}_j(g_{mA}).
\]

Now, from Lemma 8.2, we have the bound
\[
|g_{mA}(w)| \le 1 \wedge w^{-1}(1+\theta h(m+1)).
\]
Thus, from Lemma 10.3, it follows that
\[
|\eta^{(0m)}_1(g_{mA})| \le \sum_{i=1}^{m}\sum_{l\ge 1}l|\varepsilon_{il}|\,IE\Bigl\{1 \wedge \frac{1+\theta h(m+1)}{T_{0m}^{(i)}(Z)+il}\Bigr\}
\le e^{(1+\theta E_m)}\sum_{i=1}^{m}\mu_i\,\eta(0,m,i,1+\theta h(m+1),\theta),
\]
where $\eta(v,m,k,K,\alpha)$ is as in Lemma 10.3. Similarly,
\[
|\eta^{(0m)}_2(g_{mA})| \le 2e^{(1+\theta E_m)}\sum_{i=1}^{m}E_{i1}\frac{\theta}{ir_i}\,\eta(0,m,i,1+\theta h(m+1),\theta)
\]
and
\[
|\eta^{(0m)}_3(g_{mA})| \le 2e^{(1+\theta E_m)}\sum_{i=1}^{m}\frac{\theta}{ir_i}(1+|\varepsilon_{i1}|)\,\eta(0,m,i,1+\theta h(m+1),\theta).
\]
Hence, and because
\[
\eta(0,m,i,1+\theta h(m+1),\theta) \le \bar\eta(m,i,\theta)
= m^{-1}(1+\theta h(m+1)) + (1+\theta h(m+1))
\begin{cases}
m^{-\theta}i^{-(1-\theta)}(3-2\theta)/(1-\theta) & \text{if } \theta < 1;\\
m^{-1}\{1+h(m+1)\} & \text{if } \theta = 1;\\
m^{-1}\theta/(\theta-1) & \text{if } \theta > 1,
\end{cases}
\]
we have
\[
d_{TV}\bigl(\mathcal{L}(T_{0m}(Z)),\mathcal{L}(T_{0m}(Z^*))\bigr) \le \varepsilon_{10.5}(m)
= \theta e^{(1+\theta E_m)}\sum_{i=1}^{m}\Bigl\{\mu_i + 2(1+\rho_i)\frac{\theta}{ir_i}\Bigr\}\bar\eta(m,i,\theta).
\]
The order estimates follow from Proposition 6.1. □

There is a corresponding estimate for Kolmogorov distance. Here, we broaden our horizons a little, comparing $T_{0m}(Z^*)$ with $T_{vm}(Z)$, for a general value of $v$. For Kolmogorov distance, dividing both random variables by $m$ leaves the distance unchanged.

Theorem 10.6 For any $0 \le v < m$,
\[
d_K\bigl(\mathcal{L}(T_{vm}(Z)),\mathcal{L}(T_{0m}(Z^*))\bigr)
= d_K\bigl(\mathcal{L}(m^{-1}T_{vm}(Z)),\mathcal{L}(m^{-1}T_{0m}(Z^*))\bigr)
\le \varepsilon_{10.6}(m,v),
\]
where
\[
\varepsilon_{10.6}(m,v) = \varepsilon_{10.6}(m,v,Z)
= O\bigl(m^{-\beta_{01}}\log^{s(\theta)}m\,[1 + U(m)] + (v/m)^{\theta}\log m\bigr),
\]
under Conditions (A0) and (B01), where $s(\theta)$ and $U(m)$ are as for Theorem 10.5.

Proof. Again using the Stein Equation (8.2) and Lemma 8.7, now with $f(w) = \mathbb{1}\{w < x\}$, we obtain
$$\mathbb{P}[T_{vm}(Z) < x] - \mathbb{P}[T_{0m}(Z^*) < x] = \mathbb{E}(S_m g_{mx})(T_{vm}(Z)) = \theta\sum_{i=1}^{v}\mathbb{E} g_{mx}(T_{vm}(Z)+i) - \theta\sum_{j=1}^{3}\eta_j^{(vm)}(g_{mx}),$$
with $g_{mx}$ as defined in Lemma 8.3. In the previous theorem, we used the bound
$$|g_{bA}(j)| \le 1 \wedge j^{-1}(1+\theta h(b+1)),$$
from Lemma 8.2. Here, for $g_{mx}$, we can apply Lemma 8.3, to give
$$0 \le g_{mx}(j) \le \frac{1+\theta}{x+\theta}\Big(1 \wedge \frac{x}{j+1}\Big).$$

This, with Lemma 10.3, gives
$$|\eta_1^{(vm)}(g_{mx})| \le e^{(1+\theta E_m)}\sum_{i=v+1}^{m}\mu_i\Big(\frac{1+\theta}{x+\theta}\Big)\eta(v,m,i,x,\theta),$$
$$|\eta_2^{(vm)}(g_{mx})| \le 2e^{(1+\theta E_m)}\sum_{i=v+1}^{m}E_{i1}\,\frac{\theta}{ir_i}\Big(\frac{1+\theta}{x+\theta}\Big)\eta(v,m,i,x,\theta),$$
$$|\eta_3^{(vm)}(g_{mx})| \le 2e^{(1+\theta E_m)}\sum_{i=v+1}^{m}\frac{\theta}{ir_i}(1+|\varepsilon_{i1}|)\Big(\frac{1+\theta}{x+\theta}\Big)\eta(v,m,i,x,\theta),$$
since, from the definitions of the $\eta_j^{(vm)}$ in Lemma 8.7, the function $g_{mx}$ is evaluated only at arguments $T_{vm}^{(i)}(Z)+ir$ for $r \ge 1$. Note also that, again from Lemma 10.3,
$$0 \le \theta\sum_{i=1}^{v}\mathbb{E} g_{mx}(T_{vm}(Z)+i) \le \theta e^{(1+\theta E_m)}\sum_{i=1}^{v}\Big(\frac{1+\theta}{x+\theta}\Big)\eta(v,m,i,x,\theta). \tag{10.5}$$

It then remains to bound the quantities $\eta(v,m,i,x,\theta)/(x+\theta)$ using the definition of $\eta(v,m,k,K,\alpha)$ in Lemma 10.3.

Consider first the case $1 \le i \le v$, relevant to (10.5); then, for $\theta < 1$,
$$\frac{1}{x+\theta}\,\eta(v,m,i,x,\theta) \le \frac{x}{m(x+\theta)} + \Big(\frac{v+1}{m}\Big)^{\theta}\frac{1}{x+\theta}\Big(1 \wedge \frac{x}{i}\Big) + \frac{1}{m^{\theta}}\Big(\frac{2-\theta}{1-\theta}\Big)\frac{x^{\theta}}{x+\theta}\Big\{\Big(\frac{x}{v+1}\Big)^{1-\theta} \wedge 2\Big\}.$$

Splitting the bound for the third element according to whether or not $x/(v+1) \le 2$, this in turn implies that
$$\sum_{i=1}^{v}\Big(\frac{1}{x+\theta}\Big)\eta(v,m,i,x,\theta) \le \frac{v}{m} + \Big(\frac{2-\theta}{1-\theta}\Big)\Big(\frac{2v}{m}\Big)^{\theta} + \Big(\frac{v+1}{m}\Big)^{\theta}h(v+1) \le \Big(\frac{2v}{m}\Big)^{\theta}\Big\{\frac{3-2\theta}{1-\theta} + h(v+1)\Big\}.$$

Similar considerations give
$$\sum_{i=1}^{v}\Big(\frac{1}{x+\theta}\Big)\eta(v,m,i,x,\theta) \le \Big(\frac{v}{m}\Big)\{1 + 2h(v+1) + h(m+1)\}$$
if $\theta = 1$, and
$$\sum_{i=1}^{v}\Big(\frac{1}{x+\theta}\Big)\eta(v,m,i,x,\theta) \le \Big(\frac{v}{m}\Big)\Big(\frac{\theta}{\theta-1}\Big) + 2\Big(\frac{v+1}{m}\Big)^{\theta-1}h(v+1)$$
if $\theta > 1$.

If now $v+1 \le i \le m$ and $\theta < 1$, Lemma 10.3 yields
$$\frac{1}{x+\theta}\,\eta(v,m,i,x,\theta) \le \frac{1}{m} + \frac{2}{m^{\theta}}\Big(\frac{2-\theta}{1-\theta}\Big)\frac{1}{(x\vee i)^{1-\theta}} + \Big(\frac{v+1}{m}\Big)^{\theta}\frac{1}{(x\vee i)}$$
$$\le \frac{1}{m} + \frac{1}{m^{\theta}}\Big(\frac{5-3\theta}{1-\theta}\Big)\frac{1}{(x\vee i)^{1-\theta}} \tag{10.6}$$
$$\le \frac{1}{m} + \frac{1}{m^{\theta}i^{1-\theta}}\Big(\frac{5-3\theta}{1-\theta}\Big) \le \frac{2(3-2\theta)}{m^{\theta}i^{1-\theta}(1-\theta)};$$
for $\theta = 1$ we have the estimate
$$\frac{1}{x+\theta}\,\eta(v,m,i,x,\theta) \le \frac{1}{m}[2 + h(m+1)];$$
and for $\theta > 1$
$$\frac{1}{x+\theta}\,\eta(v,m,i,x,\theta) \le \frac{1}{m}\Big(\frac{2\theta-1}{\theta-1}\Big).$$

This completes the proof of the theorem, with
$$\varepsilon_{10.6}(m,v) = \theta(1+\theta)e^{(1+\theta E_m)}\Big\{\eta(v,m,\theta) + \sum_{i=v+1}^{m}\Big[\mu_i + \frac{2\theta}{ir_i}(1+\rho_i)\Big]\eta_i(m,\theta)\Big\};$$
here,
$$\eta_i(m,\theta) = \begin{cases} \dfrac{2}{m^{\theta}i^{1-\theta}}\Big(\dfrac{3-2\theta}{1-\theta}\Big) & \text{if } \theta < 1;\\[1ex] \dfrac{1}{m}[2+h(m+1)] & \text{if } \theta = 1;\\[1ex] \dfrac{1}{m}\Big(\dfrac{2\theta-1}{\theta-1}\Big) & \text{if } \theta > 1,\end{cases}$$
and
$$\eta(v,m,\theta) = \begin{cases} \Big(\dfrac{2v}{m}\Big)^{\theta}\Big\{\dfrac{3-2\theta}{1-\theta}+h(v+1)\Big\} & \text{if } \theta < 1;\\[1ex] \dfrac{v}{m}\{1+2h(v+1)+h(m+1)\} & \text{if } \theta = 1;\\[1ex] \dfrac{v}{m}\,\dfrac{\theta}{\theta-1} + 2\Big(\dfrac{v+1}{m}\Big)^{\theta-1}h(v+1) & \text{if } \theta > 1.\end{cases}$$

The order estimates follow from Proposition 6.1. $\Box$

Remark. For mappings,
$$\varepsilon_{10.6}(m,v) = O((v+1)^{1/2}m^{-1/2}\log m),$$
and for polynomials and square-free polynomials
$$\varepsilon_{10.6}(m,v) = O((v+1)m^{-1}\log m).$$

Corollary 10.7 For $v = 0$, we have
$$|\mathbb{P}[T_{0m}(Z) < m] - \mathbb{P}[T_{0m}(Z^*) < m]| \le \frac{\theta(1+\theta)}{m}\,e^{(1+\theta E_m)}\sum_{i=1}^{m}\Big\{\mu_i + \frac{2\theta}{ir_i}(1+\rho_i)\Big\} \times \begin{cases} \Big(\dfrac{2(3-2\theta)}{1-\theta}\Big) & \text{if } \theta < 1;\\[1ex] [2+h(m+1)] & \text{if } \theta = 1;\\[1ex] \Big(\dfrac{2\theta-1}{\theta-1}\Big) & \text{if } \theta > 1.\end{cases}$$

Proof. If $\theta \ge 1$, the result is a consequence of Theorem 10.6. If $\theta < 1$, it follows from (10.6) with $v = 0$ and $x = m$, because now $x \vee i = m$. $\Box$

For Wasserstein distance, the normalization of the random variables is important.

Theorem 10.8 For any $0 \le v < m$,
$$d_W\big(\mathcal{L}(m^{-1}T_{vm}(Z)),\mathcal{L}(m^{-1}T_{0m}(Z^*))\big) \le \theta m^{-1}\sum_{i=1}^{m}\Big[\mu_i + \frac{2\theta}{ir_i}(1+\rho_i)\Big] + \theta m^{-1}v(1+\mu^*_0) = O\big(m^{-\beta_2}\{1 + \mathbb{1}_{\{(g_1\wedge a_1)=1\}}\log m + S(m)\,\mathbb{1}_{\{(g_1\wedge a_1)>1\}}\}\big),$$
under Conditions (A0) and (B01), with $\beta_2 = (1 \wedge g_1 \wedge a_1)$ as before.

Proof. Because $T_{0m}(Z)$ is the sum of the independent non-negative random variables $T_{0v}(Z)$ and $T_{vm}(Z)$,
$$d_W\big(\mathcal{L}(m^{-1}T_{vm}(Z)),\mathcal{L}(m^{-1}T_{0m}(Z))\big) \le m^{-1}\mathbb{E} T_{0v}(Z) \le \theta(1+\mu^*_0)vm^{-1},$$
by Lemma 6.3. Then, to compare $m^{-1}T_{0m}(Z)$ and $m^{-1}T_{0m}(Z^*)$ in Wasserstein distance, we use (8.32) with $v = 0$ and $g = g_f$, for any $f$ satisfying $|f(j)-f(k)| \le m^{-1}|j-k|$; from the definition of $\eta_l^{(vm)}(g)$ in Lemma 8.7 and from (8.14), we conclude that
$$d_W\big(\mathcal{L}(m^{-1}T_{0m}(Z)),\mathcal{L}(m^{-1}T_{0m}(Z^*))\big) \le \theta m^{-1}\sum_{i=1}^{m}\Big[\mu_i + \frac{2\theta}{ir_i}(1+\rho_i)\Big],$$
which is enough. $\Box$
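The coupling step in this proof can be checked directly. In the following sketch (not from the book) we take $\theta = 1$ and $Z^*_i \sim$ Poisson$(1/i)$, compute the laws of $T_{0m}(Z^*)$ and $T_{vm}(Z^*)$ by convolution, and evaluate $d_W$ via $d_W(F,G) = \int |F - G|$.

```python
from math import exp

def pois_pmf_T(a, b, N, theta=1.0):
    """Law of sum_{i=a+1}^b i*Z*_i, Z*_i ~ Poisson(theta/i), on {0,...,N}."""
    p = [1.0] + [0.0] * N
    for i in range(a + 1, b + 1):
        lam = theta / i
        w = [exp(-lam)]
        for l in range(1, N // i + 1):
            w.append(w[-1] * lam / l)   # Poisson recursion
        q = [0.0] * (N + 1)
        for t, pt in enumerate(p):
            if pt > 0.0:
                for l, wl in enumerate(w):
                    if t + i * l > N:
                        break
                    q[t + i * l] += pt * wl
        p = q
    return p

m, v, N = 30, 5, 600
p0, pv = pois_pmf_T(0, m, N), pois_pmf_T(v, m, N)
acc0 = acc_v = dW = 0.0
for t in range(N + 1):
    acc0 += p0[t]
    acc_v += pv[t]
    dW += abs(acc0 - acc_v)
dW /= m                      # Wasserstein distance of the m^{-1}-scaled laws
print(dW, v / m)
```

Because $T_{0m}(Z^*) = T_{vm}(Z^*) + T_{0v}(Z^*)$ with an independent non-negative summand, the two laws are stochastically ordered, so here $d_W$ equals the difference of means, $v\theta/m$; the computed value matches it to within the tiny truncation error.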

10.2 Comparing $\mathcal{L}(m^{-1}T_{vm}(Z^*))$ with $P_\theta$

The second part of the chapter consists of showing how close the distribution of $m^{-1}T_{vm}(Z^*)$ is to $P_\theta$. The argument uses Stein's method for $P_\theta$, much as illustrated in (8.20). In particular, it is necessary to be able to estimate the quantities appearing in the bounds given in Lemmas 8.5 and 8.6, when $u$ is replaced by $m^{-1}T_{vm}(Z^*)$. The following lemma provides the necessary information, leading on to the results of Theorems 10.10 and 10.12.

Lemma 10.9 If $X \sim T_{0m}(Z^*)$, then

(i) $\mathbb{E}\Big(\dfrac{1}{X+1}\Big) \le \begin{cases} \Big(\dfrac{3-2\theta}{1-\theta}\Big)\Big/(m+1)^{\theta} & \text{if } \theta < 1;\\[1ex] (1+h(m+1))/(m+1) & \text{if } \theta = 1;\\[1ex] \Big(\dfrac{\theta}{\theta-1}\Big)\Big/(m+1) & \text{if } \theta > 1,\end{cases}$

(ii) $\displaystyle\sum_{i=1}^{m}\mathbb{E}\Big(\frac{1}{X+i}\Big) \le (1+\theta)/\theta.$

Also, if $X \sim T_{vm}(Z^*)$, $1 \le v < m$, then

(iii) $\displaystyle\sum_{i=1}^{v}\mathbb{E}\Big\{\frac{1}{X+i}\,\mathbb{1}_{\{X \ge v+1\}}\Big\} \le \varepsilon_{10.9}(m,v),$

where
$$\varepsilon_{10.9}(m,v) = \begin{cases} \Big(\dfrac{v+1}{m+1}\Big)^{\theta}\Big(\dfrac{2-\theta}{1-\theta}\Big) & \text{if } \theta < 1;\\[1ex] \Big(\dfrac{v+1}{m+1}\Big)\Big\{1+\log\Big(\dfrac{m+1}{v+1}\Big)\Big\} & \text{if } \theta = 1;\\[1ex] \Big(\dfrac{v+1}{m+1}\Big)\Big(\dfrac{\theta}{\theta-1}\Big) & \text{if } \theta > 1.\end{cases}$$

Proof. First, letting $F$ denote the distribution function of $X \sim T_{vm}(Z^*)$, note that, for $j \ge v$,
$$F(j) = \mathbb{P}[X \le j] \le \mathbb{P}[Z^*_{j+1} = Z^*_{j+2} = \cdots = Z^*_m = 0] = \exp\Big\{-\sum_{r=j+1}^{m}\frac{\theta}{r}\Big\} \le \Big(\frac{j+1}{m+1}\Big)^{\theta}. \tag{10.7}$$

Thus, for $X \sim T_{0m}(Z^*)$, direct computation yields
$$\mathbb{E}\Big(\frac{1}{X+i}\Big) = \sum_{j\ge0}\frac{F(j)-F(j-1)}{j+i} = \sum_{j\ge0}\frac{F(j)}{(j+i)(j+i+1)} \le \sum_{j=0}^{m-1}\Big(\frac{j+1}{m+1}\Big)^{\theta}\frac{1}{(j+i)(j+i+1)} + \frac{1}{m+i}. \tag{10.8}$$

For Part (i), taking $i = 1$, it then follows that
$$\mathbb{E}\Big(\frac{1}{X+1}\Big) \le \sum_{j=0}^{m-1}\frac{(j+1)^{\theta-2}}{(m+1)^{\theta}} + \frac{1}{m+1},$$
whereas, for Part (ii), it follows from (10.8) that
$$\sum_{i=1}^{m}\mathbb{E}\Big(\frac{1}{X+i}\Big) \le \sum_{j=0}^{m-1}\Big(\frac{j+1}{m+1}\Big)^{\theta}\frac{1}{j+1} + \frac{m}{m+1};$$
standard integral estimates now complete the proofs of these estimates. For the last part, as for (10.8), if $X \sim T_{vm}(Z^*)$, then
$$\mathbb{E}\Big\{\frac{1}{X+i}\,\mathbb{1}_{\{X \ge v+1\}}\Big\} \le \sum_{j=v+1}^{m-1}\Big(\frac{j+1}{m+1}\Big)^{\theta}\frac{1}{(j+i)(j+i+1)} + \frac{1}{m+i},$$
leading easily to the bound
$$\sum_{i=1}^{v}\mathbb{E}\Big\{\frac{1}{X+i}\,\mathbb{1}_{\{X \ge v+1\}}\Big\} \le \frac{v}{(m+1)^{\theta}}\int_{v+1}^{m+1}x^{\theta-2}\,dx + \frac{v}{m+1},$$
from which Part (iii) follows. $\Box$
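Both the tail bound (10.7) and the bound of Lemma 10.9(ii) can be verified numerically. The following sketch (not from the book) does so for $X \sim T_{0m}(Z^*)$, with the illustrative choices $\theta = 0.7$ and $m = 25$, computing the law of $X$ by convolution.

```python
from math import exp

theta, m, N = 0.7, 25, 400

# Law of X = T_0m(Z*), Z*_i ~ Poisson(theta/i), by convolution on {0,...,N}.
p = [1.0] + [0.0] * N
for i in range(1, m + 1):
    lam = theta / i
    w = [exp(-lam)]
    for l in range(1, N // i + 1):
        w.append(w[-1] * lam / l)   # Poisson recursion
    q = [0.0] * (N + 1)
    for t, pt in enumerate(p):
        if pt > 0.0:
            for l, wl in enumerate(w):
                if t + i * l > N:
                    break
                q[t + i * l] += pt * wl
    p = q

# (10.7): F(j) <= ((j+1)/(m+1))^theta for 0 <= j < m.
F = 0.0
for j in range(m):
    F += p[j]
    assert F <= ((j + 1.0) / (m + 1.0)) ** theta + 1e-12

# Lemma 10.9(ii): sum_{i=1}^m E[1/(X+i)] <= (1+theta)/theta.
total = sum(pt / (t + i) for i in range(1, m + 1) for t, pt in enumerate(p))
print(total, (1 + theta) / theta)   # left side stays below the bound
```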

The estimates of Lemma 10.9 enable us to show how close $\mathcal{L}(m^{-1}T_{vm}(Z^*))$ is to $P_\theta$. We first prove an error bound in terms of Wasserstein distance, strengthening the convergence proved in Theorem 4.6.

Theorem 10.10 For $0 \le v < m$,
$$d_W(\mathcal{L}(m^{-1}T_{vm}(Z^*)), P_\theta) \le \frac{(1+\theta)^2 + v\theta}{m}.$$

Proof. Combining (8.18), (8.20) and Lemma 8.5, if $|f(u)-f(v)| \le |u-v|$ for all $u, v \ge 0$, then
$$|\mathbb{E} f(m^{-1}T_{0m}(Z^*)) - P_\theta(f)| \le \frac{\theta(\theta+1)}{m}\sum_{i=1}^{m}\mathbb{E}\Big(\frac{1}{T_{0m}(Z^*)+i}\Big),$$
and the conclusion for $v = 0$ follows from Lemma 10.9(ii). For the remainder, coupling $T_{vm}(Z^*) + T_{0v}(Z^*) = T_{0m}(Z^*)$, we have
$$d_W(\mathcal{L}(m^{-1}T_{0m}(Z^*)),\mathcal{L}(m^{-1}T_{vm}(Z^*))) \le v\theta/m. \qquad \Box$$

We now give an alternative measure of closeness, expressed in terms of Kolmogorov distance. However, in order to deduce the result for $T_{vm}(Z^*)$ with $v \ge 1$ from that with $v = 0$, we first need to sharpen Example 8.4.

Lemma 10.11 For $1 \le v < m$, we have
$$d_K(\mathcal{L}(T_{0m}(Z^*)),\mathcal{L}(T_{vm}(Z^*))) \le \varepsilon_{10.11}(m,v),$$
where
$$\varepsilon_{10.11}(m,v) = \varepsilon_{10.11}(m,v,Z) = O\Big(\Big(\frac{v+1}{m+1}\Big)^{\theta}[1 + \log m\,\mathbb{1}_{\{\theta=1\}}]\Big).$$

Proof. For any $t \ge v+1$, we use the argument of Example 8.4, and in particular (8.16), to bound $|\mathbb{P}[T_{vm}(Z^*) \le t] - \mathbb{P}[T_{0m}(Z^*) \le t]|$, but now with the full strength of Lemma 8.3, giving
$$|\mathbb{P}[T_{vm}(Z^*) \le t] - \mathbb{P}[T_{0m}(Z^*) \le t]| \le \theta\sum_{i=1}^{v}\mathbb{E}\Big\{\Big(\frac{1+\theta}{t+1+\theta}\Big)\wedge\Big(\frac{1+\theta}{T_{vm}(Z^*)+i}\Big)\Big\}. \tag{10.9}$$

Applying Lemma 10.9 together with the bound
$$\mathbb{P}[T_{vm}(Z^*) \le v] \le \{(v+1)/(m+1)\}^{\theta},$$
from (10.7), we conclude that
$$|\mathbb{P}[T_{vm}(Z^*) \le t] - \mathbb{P}[T_{0m}(Z^*) \le t]| \le \theta(1+\theta)\Big\{\Big(\frac{v+1}{m+1}\Big)^{\theta} + \varepsilon_{10.9}(m,v)\Big\},$$
for any $t \ge v+1$. For $0 \le t \le v$, use the simple inequality
$$0 \le \mathbb{P}[T_{vm}(Z^*) \le t] - \mathbb{P}[T_{0m}(Z^*) \le t] \le \mathbb{P}[T_{vm}(Z^*) = 0] \le \{(v+1)/(m+1)\}^{\theta};$$
taking
$$\varepsilon_{10.11}(m,v) = (1+\theta)^2\Big\{\Big(\frac{v+1}{m+1}\Big)^{\theta} + \varepsilon_{10.9}(m,v)\Big\},$$
the statement of the lemma follows, in view of the definition of $\varepsilon_{10.9}(m,v)$ in Lemma 10.9. $\Box$

Theorem 10.12
$$d_K(\mathcal{L}(m^{-1}T_{0m}(Z^*)), P_\theta) \le \varepsilon_{10.12}(m),$$
where
$$\varepsilon_{10.12}(m) = O\big(m^{-(\theta\wedge1)}\big).$$
Furthermore, for $v \ge 1$,
$$d_K(\mathcal{L}(m^{-1}T_{vm}(Z^*)), P_\theta) \le \varepsilon_{10.12}(m) + \varepsilon_{10.11}(m,v) = O\Big(\Big(\frac{v+1}{m+1}\Big)^{\theta}[1 + \mathbb{1}_{\{\theta=1\}}\log m]\Big).$$

Proof. Take $f(u) = \mathbb{1}\{u \le y\}$ for any $y \ge 1$, and use (8.18), (8.20) and Lemma 8.6, giving
$$|\mathbb{P}[m^{-1}T_{0m}(Z^*) \le y] - P_\theta[0,y]| \le \frac{(2\theta+1)(\theta+1)}{2m}\sum_{i=1}^{m}\mathbb{E}\Big(\frac{1}{T_{0m}(Z^*)+i}\Big) + \frac{1}{m} \le m^{-1}\{1 + c_{10.12b}(\theta)\},$$
where $c_{10.12b}(\theta) = \frac{(2\theta+1)(1+\theta)^2}{2\theta}$, by Lemma 10.9(ii). For $y = j/m$ with $1 \le j \le m$, we then relate the discrepancy to that for $y = 1$. First, we observe that, from (4.15), for $1 \le j < m$,
$$j\,\mathbb{P}[T_{0m}(Z^*) = j] = \theta\,\mathbb{P}[T_{0m}(Z^*) \le j-1],$$
and thus, adding $\theta\,\mathbb{P}[T_{0m}(Z^*) = j]$ to each side, that
$$(j+\theta)\,\mathbb{P}[T_{0m}(Z^*) = j] = \theta\,\mathbb{P}[T_{0m}(Z^*) \le j];$$
hence it follows that
$$\mathbb{P}[T_{0m}(Z^*) \le j] = (1+\theta j^{-1})\,\mathbb{P}[T_{0m}(Z^*) \le j-1], \qquad j \ge 1,$$
and so
$$\mathbb{P}[T_{0m}(Z^*) \le j] = \mathbb{P}[T_{0m}(Z^*) \le m]\prod_{r=j+1}^{m}\Big(1+\frac{\theta}{r}\Big)^{-1}, \qquad 0 \le j < m. \tag{10.10}$$

Correspondingly, for the limit of $m^{-1}T_{0m}(Z^*)$, we have
$$P_\theta[0,j/m] = \Big(\frac{j}{m}\Big)^{\theta}P_\theta[0,1],$$
from Corollary 4.8. Together, these give
$$|\mathbb{P}[T_{0m}(Z^*) \le j] - P_\theta[0,j/m]|$$
$$\le \Big(\frac{j}{m}\Big)^{\theta}\big|\mathbb{P}[m^{-1}T_{0m}(Z^*) \le 1] - P_\theta[0,1]\big| + \Bigg|1 - \exp\Bigg\{\theta\Big(\log m - \log j - \sum_{r=j+1}^{m}\frac{1}{r}\Big) + \sum_{r=j+1}^{m}\Big[\frac{\theta}{r} - \log\Big(1+\frac{\theta}{r}\Big)\Big]\Bigg\}\Bigg|$$
$$\le \Big(\frac{j}{m}\Big)^{\theta}\big|\mathbb{P}[m^{-1}T_{0m}(Z^*) \le 1] - P_\theta[0,1]\big| + j^{-1}c_{10.12a}(\theta),$$
where $c_{10.12a}(\theta) = \frac{1}{2}(\theta^2+2\theta)\exp\{\frac{1}{2}(\theta^2+2\theta)\}$.

Finally, because $m^{-1}T_{0m}(Z^*)$ puts all its mass on $m^{-1}\mathbb{Z}_+$, there is an additional continuity correction of at most
$$\sup_{0<x<1}P_\theta(x, x+m^{-1}] \le \begin{cases} m^{-\theta}, & \theta \le 1;\\ m^{-1}\theta, & \theta > 1,\end{cases} \tag{10.11}$$
again from Corollary 4.8. This yields the first part of the theorem, with
$$\varepsilon_{10.12}(m) = \begin{cases} m^{-1}\{1+c_{10.12b}(\theta)\} + m^{-\theta}\{1+c_{10.12a}(\theta)\} & \text{if } \theta \le 1;\\ m^{-1}\{c_{10.12b}(\theta) + (1+\theta) + c_{10.12a}(\theta)\} & \text{if } \theta > 1.\end{cases}$$

The estimate for $d_K(\mathcal{L}(m^{-1}T_{vm}(Z^*)), P_\theta)$ now follows from Lemma 10.11 and the triangle inequality. $\Box$
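The identity $(j+\theta)\,\mathbb{P}[T_{0m}(Z^*) = j] = \theta\,\mathbb{P}[T_{0m}(Z^*) \le j]$, from which (10.10) follows by iteration, holds exactly for every finite $m$, and is easy to confirm numerically. The following sketch (not from the book) checks it, together with (10.10), for the illustrative values $\theta = 1.5$, $m = 20$.

```python
from math import exp

theta, m = 1.5, 20

# Exact law of T = T_0m(Z*) on {0,...,m} (larger values are not needed here).
p = [1.0] + [0.0] * m
for i in range(1, m + 1):
    lam = theta / i
    w = [exp(-lam)]
    for l in range(1, m // i + 1):
        w.append(w[-1] * lam / l)   # Poisson recursion
    q = [0.0] * (m + 1)
    for t, pt in enumerate(p):
        if pt > 0.0:
            for l, wl in enumerate(w):
                if t + i * l > m:
                    break
                q[t + i * l] += pt * wl
    p = q

# (j + theta) P[T = j] = theta P[T <= j] for 1 <= j <= m.
cum = p[0]
for j in range(1, m + 1):
    cum += p[j]
    assert abs((j + theta) * p[j] - theta * cum) < 1e-10

# (10.10): P[T <= j] = P[T <= m] * prod_{r=j+1}^m (1 + theta/r)^{-1}.
j = m // 2
Pj, Pm = sum(p[:j + 1]), sum(p)
prod = 1.0
for r in range(j + 1, m + 1):
    prod /= 1.0 + theta / r
assert abs(Pj - Pm * prod) < 1e-10
print("both identities hold (to rounding) for theta =", theta, "and m =", m)
```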

Remark. If $v = 0$, the error in Theorem 10.12 for Kolmogorov distance is of order $m^{-(\theta\wedge1)}$. This cannot be improved upon, in view of (10.11). Theorem 10.10 gives an error of order $m^{-1}$ throughout for Wasserstein distance, which is smaller if $\theta < 1$.

Combining the results of Theorems 10.8 and 10.6 with those of Theorems 10.10 and 10.12, we have the following corollary.

Corollary 10.13 For any $0 \le v < m$, under Conditions (A0) and (B01),
$$d_W(\mathcal{L}(m^{-1}T_{vm}(Z)), P_\theta) = O\big(m^{-\beta_2}\log m + v/m\big)$$
and
$$d_K(\mathcal{L}(m^{-1}T_{vm}(Z)), P_\theta) = O\big([m^{-\beta_{01}}\log m][1 + \mathbb{1}_{\{\theta=1\}}\log m] + (v/m)^{\theta}\log m\big),$$
where $\beta_2 = (1 \wedge g_1 \wedge a_1)$ and $\beta_{01} = (1 \wedge \theta \wedge g_1 \wedge a_1)$ as before.

10.3 Comparing $\mathcal{L}(m^{-1}T_{vm}(Z^*))$ with $P_\theta(\alpha)$

The approximation of $\mathcal{L}(m^{-1}T_{vm}(Z^*))$ by $P_\theta$ cannot be expected to be good, unless $v = o(m)$. For larger $v$, it makes sense to look instead for an approximation which also includes $v$ in its specification. Now, if $M$ denotes the scale invariant Poisson process on $(0,\infty)$ with rate $\theta x^{-1}$, $P_\theta$ is the distribution of $\int_0^1 x\,M(dx)$: see Chapter 4.11. Since each $Z^*_i$ has a Poisson distribution with mean $\theta i^{-1}$, which is close to the mean of $M((i-1)/m, i/m]$ if $i$ is large, whatever the value of $m$, this suggests approximating $\mathcal{L}(m^{-1}T_{vm}(Z^*))$ by $P_\theta(v/m)$, where
$$P_\theta(\alpha) = \mathcal{L}\Big(\int_\alpha^1 x\,M(dx)\Big), \qquad 0 < \alpha < 1, \tag{10.12}$$
is the distribution discussed in Chapter 4.3. Unfortunately, for $\alpha > 0$, there is no simple probabilistic interpretation of the solutions to the Stein Equation for $P_\theta(\alpha)$, and arguments similar to those above cannot be used. However, once $v$ is large, it is easy to argue directly, as in the following theorem.

Theorem 10.14 Suppose that $|\alpha - v/m| \le \alpha/2$ and that $1 \le v < m$. Then
$$d_K(\mathcal{L}(m^{-1}T_{vm}(Z^*)), P_\theta(\alpha)) \le \theta\{2\alpha^{-1}|\alpha - v/m| + 3v^{-1}\}.$$

Proof. The chance that $M$ has a point in the interval between $\alpha$ and $v/m$ is at most
$$\theta|\alpha - v/m|\{\alpha - |\alpha - v/m|\}^{-1} \le 2\theta\alpha^{-1}|\alpha - v/m|, \tag{10.13}$$
because $|\alpha - v/m| \le \alpha/2$, and thus
$$d_{TV}(P_\theta(\alpha), P_\theta(v/m)) \le 2\theta\alpha^{-1}|\alpha - v/m|. \tag{10.14}$$

Now $P_\theta(v/m)$ is the distribution of $T = \sum_{i=v+1}^{m}\int_{I_{im}}x\,M(dx)$, where $I_{im}$ denotes the interval $((i-1)/m, i/m]$, and clearly
$$\sum_{i=v+2}^{m}m^{-1}(i-1)M(I_{im}) \le T \le \sum_{i=v+1}^{m}m^{-1}i\,M(I_{im}).$$

Again, much as for (10.13), for any $i \ge 2$,
$$d_{TV}(\mathcal{L}(Z^*_i),\mathcal{L}(M(I_{im}))) \le \Big|\frac{\theta}{i} - \int_{I_{im}}\frac{\theta}{x}\,dx\Big| = \theta|i^{-1} + \log(1 - i^{-1})| \le 2\theta i^{-2}$$
and
$$d_{TV}(\mathcal{L}(Z^*_{i-1}),\mathcal{L}(M(I_{im}))) \le \Big|\frac{\theta}{i-1} - \int_{I_{im}}\frac{\theta}{x}\,dx\Big| = \theta|(i-1)^{-1} - \log(1 + (i-1)^{-1})| \le 2\theta i^{-2}.$$

Hence, if $v \ge 1$, for any $y \ge 0$, it follows that
$$\mathbb{P}[T \le y] \ge \mathbb{P}[m^{-1}T_{vm}(Z^*) \le y] - 2\theta v^{-1}$$
and that
$$\mathbb{P}[T \le y] \le \mathbb{P}[m^{-1}T_{v,m-1}(Z^*) \le y] + 2\theta v^{-1} \le \mathbb{P}[m^{-1}T_{vm}(Z^*) \le y] + 2\theta v^{-1} + \theta m^{-1},$$
since $\mathbb{P}[Z^*_m \ne 0] \le \theta m^{-1}$, completing the proof. $\Box$
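The distribution $P_\theta(\alpha)$ of (10.12) is straightforward to simulate, since $M$ restricted to $(\alpha, 1]$ has a Poisson$(\theta\log(1/\alpha))$ number of points which, given their count, are i.i.d. with density proportional to $1/x$. The following Monte Carlo sketch (not from the book) samples $\int_\alpha^1 x\,M(dx)$ and checks two exact features of $P_\theta(\alpha)$: its mean $\theta(1-\alpha)$ and its atom $\alpha^\theta$ at zero.

```python
import random
from math import log

def sample_P_theta_alpha(theta, alpha, rng):
    """One draw from P_theta(alpha) via the point process representation."""
    mu = theta * log(1.0 / alpha)          # expected number of points in (alpha,1]
    n, t = 0, -log(1.0 - rng.random())     # Poisson count via exponential spacings
    while t < mu:
        n += 1
        t += -log(1.0 - rng.random())
    # inverse transform for density prop. to 1/x on (alpha,1]: X = alpha**U
    return sum(alpha ** rng.random() for _ in range(n))

rng = random.Random(4)
theta, alpha = 1.0, 0.3
xs = [sample_P_theta_alpha(theta, alpha, rng) for _ in range(100000)]
mean = sum(xs) / len(xs)
atom = sum(x == 0.0 for x in xs) / len(xs)
print(mean, theta * (1 - alpha))   # E integral_alpha^1 x M(dx) = theta*(1-alpha)
print(atom, alpha ** theta)        # atom of P_theta(alpha) at zero is alpha^theta
```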

11 Comparisons with $P_\theta$: point probabilities

In Chapter 9, we established upper bounds for point probabilities of the form $\mathbb{P}[T_{vm}(Z) = s]$, sufficient to generalize the results of Lemma 4.12. Here, we look for accurate approximations to the same probabilities, sharpening the simple asymptotics of (LLT). The argument again proceeds in two stages, first matching $\mathbb{P}[T_{vm}(Z) = s]$ with $\mathbb{P}[T_{vm}(Z^*) = s]$, and then estimating the difference between $m\,\mathbb{P}[T_{vm}(Z^*) = s]$ and the density $p_\theta(y)$ of $P_\theta$, evaluated at $y = s/m$. First, however, we establish both (LLT) and (LLT$\alpha$) in great generality.

11.1 Local limit theorems for $T_{vm}(Z)$

In this section, we show that both (LLT) and (LLT$\alpha$) are true under very weak conditions. We assume only the Logarithmic Condition together with the condition $\mu^*_0 < \infty$.

Theorem 11.1 Suppose that the Logarithmic Condition holds and that $\mu^*_0 < \infty$. Then, if $v = v_m = o(m)$ as $m \to \infty$, it follows that
$$|s\,\mathbb{P}[T_{vm}(Z) = s] - \theta\,\mathbb{P}[s-m \le T_{vm}(Z) < s-v]| \to 0,$$
uniformly in $s \ge 0$. Furthermore, for any $\eta > 0$,
$$\lim_{m\to\infty}\sup_{s\ge\eta m}|m\,\mathbb{P}[T_{vm}(Z) = s] - p_\theta(s/m)| = 0.$$

Thus the (LLT) holds for all combinatorial structures satisfying the Conditioning Relation and the Logarithmic Condition for which $\mu^*_0 < \infty$.

Proof. The proof is a modification of the proof of Corollary 8.8. By Lemma 5.10, if $v = o(m)$ as $m \to \infty$, then $\max_{r\ge0}\mathbb{P}[T_{vm} = r] \to 0$, and so we can pick $w_m \ge w_0$, with $w_0$ as defined in (8.36), in such a way that $w_m \to \infty$ but still
$$\Big\{h(w_m+1) + \sum_{i=1}^{w_m}\sum_{l\ge1}l|\varepsilon_{il}|\Big\}\max_{r\ge0}\mathbb{P}[T_{vm} = r] \to 0. \tag{11.1}$$

Next, the Logarithmic Condition (3.3) implies that $\varepsilon^*_{il} \to 0$ as $i \to \infty$ for each $l \ge 1$, whereas $\varepsilon^*_{il} \le \varepsilon^*_{0l}$ for each $l$ and $\mu^*_0 = \sum_{l\ge1}l\varepsilon^*_{0l} < \infty$. Thus, by dominated convergence,
$$\lim_{i\to\infty}\sum_{l\ge1}l\varepsilon^*_{il} = 0. \tag{11.2}$$

With these preliminaries out of the way, we turn to estimating the errors $\eta_j$, $1 \le j \le 3$, in the proof of Corollary 8.8. For the first, we have
$$|\eta_1| \le \sum_{i=1}^{w_m}\sum_{l\ge1}l|\varepsilon_{il}|\,\mathbb{P}[T_{vm}^{(i)} = s-il] + \sum_{i=w_m+1}^{m}\sum_{l\ge1}l|\varepsilon_{il}|\,\mathbb{P}[T_{vm}^{(i)} = s-il].$$

Applying (8.38) and (8.37), we thus have
$$|\eta_1| \le c(w_0)\sum_{i=1}^{w_m}\sum_{l\ge1}l|\varepsilon_{il}|\max_{r\ge0}\mathbb{P}[T_{vm} = r] + 2\sum_{l\ge1}l\varepsilon^*_{w_m,l}\sum_{i=w_m+1}^{m}\mathbb{P}[T_{vm} = s-il],$$
and this tends to zero by (11.1) and (11.2) and because $\sum_{i\ge1}\mathbb{P}[T_{vm} = s-il] \le 1$ for all $l \ge 1$.

Turning to $\eta_2$, we have
$$|\eta_2| \le \Big\{\sum_{i=1}^{w_m} + \sum_{i=w_m+1}^{m}\Big\}\frac{\theta}{ir_i}\sum_{l\ge2}\varepsilon_{il}\big\{\mathbb{P}[T_{vm}^{(i)} = s-i(l+1)] + \mathbb{P}[T_{vm}^{(i)} = s-i]\big\}.$$

The first $i$–sum as above contributes at most
$$2\theta c(w_0)\sum_{i=1}^{w_m}\sum_{l\ge2}l|\varepsilon_{il}|\max_{r\ge0}\mathbb{P}[T_{vm} = r] \to 0;$$
the second at most $(4\theta/w_m)\sum_{l\ge2}\varepsilon^*_{w_m,l} \to 0$. Finally, we have
$$|\eta_3| \le \Big\{\sum_{i=1}^{w_m} + \sum_{i=w_m+1}^{m}\Big\}\frac{\theta}{ir_i}(1+\varepsilon_{i1})\big\{\mathbb{P}[T_{vm}^{(i)} = s-2i] + \mathbb{P}[T_{vm}^{(i)} = s-i]\big\},$$
with the first sum bounded by
$$4\theta c(w_0)(1+\varepsilon^*_{01})h(w_m+1)\max_{r\ge0}\mathbb{P}[T_{vm} = r] \to 0,$$
and the second by $(4\theta/w_m)(1+\varepsilon^*_{01}) \to 0$.

The second conclusion follows from the convergence of $m^{-1}T_{vm}(Z)$ to $X_\theta$, established in Theorem 5.3. Since $P_\theta$ is continuous, and has bounded density in $[\eta,\infty)$ for any $\eta > 0$, it follows that
$$\Big|\frac{m}{s}\,\mathbb{P}\Big[\frac{s}{m}-1 \le m^{-1}T_{vm}(Z) < \frac{s-v}{m}\Big] - \frac{m}{s}\,\mathbb{P}\Big[\frac{s}{m}-1 \le X_\theta < \frac{s}{m}\Big]\Big| \le \frac{2}{\eta}\,d_K(\mathcal{L}(m^{-1}T_{vm}(Z)), P_\theta) + \frac{v}{m\eta}\sup_{y\ge\eta}p_\theta(y) \to 0,$$
uniformly in $s \ge \eta m$, and
$$\frac{\theta m}{s}\,\mathbb{P}\Big[\frac{s}{m}-1 \le X_\theta < \frac{s}{m}\Big] = p_\theta(s/m)$$
from (4.29). The final statement follows on letting $s/m \to y > 0$. $\Box$
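For $\theta = 1$ (the Ewens/permutation case, in which the independent counts are exactly $Z^*_i \sim$ Poisson$(1/i)$), the limit density satisfies $p_1(y) = e^{-\gamma}$ for $0 < y \le 1$, and the local limit theorem can be watched converging numerically. A sketch (not from the book):

```python
from math import exp

GAMMA = 0.5772156649015329        # Euler's constant

def pmf_T(m):
    """Exact law of T_0m(Z*) on {0,...,m} for theta = 1, Z*_i ~ Poisson(1/i)."""
    p = [1.0] + [0.0] * m
    for i in range(1, m + 1):
        w = [exp(-1.0 / i)]
        for l in range(1, m // i + 1):
            w.append(w[-1] / (i * l))   # Poisson recursion, lam = 1/i
        q = [0.0] * (m + 1)
        for t, pt in enumerate(p):
            if pt > 0.0:
                for l, wl in enumerate(w):
                    if t + i * l > m:
                        break
                    q[t + i * l] += pt * wl
        p = q
    return p

for m in (10, 40):
    p = pmf_T(m)
    print(m, m * p[m], exp(-GAMMA))   # m*P[T_0m = m] -> p_1(1) = e^{-gamma}
```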

Theorem 11.2 Suppose that the Logarithmic Condition holds and that $\mu^*_0 < \infty$. Then (LLT$\alpha$) holds: if, as $m \to \infty$, $v \sim \alpha m$ for some $0 < \alpha < 1$ and $s/m \to y \in (0,\infty)$, then, for $y \notin \{\alpha, 1\}$,
$$\lim_{m\to\infty}m\,\mathbb{P}[T_{vm}(Z) = s] = p_{\theta(\alpha)}(y).$$
If $y = 1$, then
$$\lim_{m\to\infty}|m\,\mathbb{P}[T_{vm}(Z) = s] - p_{\theta(\alpha)}(s/m)| = 0,$$
but $p_{\theta(\alpha)}(1) \ne p_{\theta(\alpha)}(1+)$; if $y = \alpha$, then
$$\lim_{m\to\infty}|m\,\mathbb{P}[T_{vm}(Z) = s] - p_{\theta(\alpha)}(\alpha)\,\mathbb{1}\{m > b\}| = 0.$$

Proof. The proof runs much as for Theorem 11.1, adapted because Lemma 5.11 only gives $\max_{r\ge1}\mathbb{P}[T_{vm} = r] \to 0$. Thus $w_m$ is chosen as in (11.1), but with $\max_{r\ge1}\mathbb{P}[T_{vm} = r]$ replacing $\max_{r\ge0}\mathbb{P}[T_{vm} = r]$, and also with $w_m \le \sqrt{m}$. There are now two extra terms to consider, arising for $i \le w_m$ and $il = s$ when bounding $|\eta_1|$, and for $i \le w_m$ and $i(l+1) = s$ when bounding $|\eta_2|$. For $|\eta_1|$, we observe that, if $s \sim ym$, then
$$\sum_{i=1}^{w_m}\sum_{l\ge1}\mathbb{1}_{\{il=s\}}\,l|\varepsilon_{il}| \le \sum_{l\ge s/w_m}l\varepsilon^*_{0l} \to 0$$
as $m \to \infty$, because $\mu^*_0 < \infty$ and $s/w_m \to \infty$; for $|\eta_2|$, we have
$$\sum_{i=1}^{w_m}\frac{\theta}{ir_i}\sum_{l\ge2}\mathbb{1}_{\{i(l+1)=s\}}\,\varepsilon_{il} \le \theta s^{-1}\sum_{l\ge(s/w_m-1)\vee2}(l+1)\varepsilon^*_{0l} \to 0$$
also. This shows that, if $s/m \to y \in (0,\infty)$ as $m \to \infty$, then
$$\Big|s\,\mathbb{P}[T_{vm}(Z) = s] - \theta\,\mathbb{P}\Big[\frac{s}{m}-1 \le m^{-1}T_{vm}(Z) < \frac{s}{m}-\frac{v}{m}\Big]\Big| \to 0$$
as $m \to \infty$. For $y \notin \{\alpha,1\}$, the theorem follows from Theorem 5.4 and (4.43), since then both $y-\alpha$ and $y-1$ are continuity points of $P_\theta(\alpha)$. For $y \in \{\alpha,1\}$, we also need the fact that
$$\lim_{m\to\infty}\mathbb{P}[T_{vm}(Z) = 0] = \mathbb{P}[X_{\theta(\alpha)} = 0],$$
which is immediate from Theorem 5.4 because, in addition,
$$\mathbb{P}[m^{-1}T_{vm}(Z) \in (-\infty, v/m]\setminus\{0\}] = \mathbb{P}[X_{\theta(\alpha)} \in (-\infty,\alpha]\setminus\{0\}] = 0. \qquad \Box$$

11.2 Comparison of $T_{vm}(Z)$ with $T_{vm}(Z^*)$: point probabilities

Replacing point probabilities with interval probabilities

In order to derive the concrete bounds given in our main approximation theorems, the convergence in Theorem 11.1 has to be sharpened by the addition of error bounds, which are of the required smallness when conditions such as (A0) and (B01) hold. This sharpening requires the comparison of point probabilities with low relative error, which can be a difficult task when the individual probabilities are small, as is the case for $T_{vm}(Z^*)$ when $m$ is large. However, we have already seen in Lemma 9.1 that the quantity $s\,\mathbb{P}[T_{vm} = s]$ can be well approximated by an interval probability of the form $\mathbb{P}[T_{vm} < s-v]$, at least when $v$ is large. This opens the door to making comparisons of point probabilities by instead comparing interval probabilities, which is rather easier. However, the restriction to large $v$ has first to be removed. This is the substance of our next result. To state it, we define
$$\kappa(v,s,m) = \kappa(v,s,m;Z) = \sum_{i=v+1}^{m\wedge s}\varepsilon_{i1}\,\mathbb{P}[T_{vm}(Z) = s-i]. \tag{11.3}$$

Theorem 11.3 For $s \ge 1$ and $0 \le v < m$, we have the estimate
$$|\mathbb{P}[T_{vm}(Z) = s] - \theta s^{-1}\mathbb{P}[s-m \le T_{vm}(Z) < s-v] + \kappa(v,s,m;Z)| \le \theta s^{-1}m^{-\theta}(s+1)^{-(1-\theta)}\phi_{11.3}(m,s),$$
for all $s \ge 1$, where
$$\phi_{11.3}(m,s) = \phi_{11.3}(m,s,Z) = O\big(S(s) + s^{1-(a_1\wedge a_2)+\delta}\big)$$
for any $\delta > 0$, under Conditions (A0) and (B01).

Proof. If $1 \le s \le v$, there is nothing to prove. For $s > v$, we follow the proof of Corollary 8.8, but now use Corollary 9.3 to bound the quantities $\mathbb{P}[T_{vm}^{(i)} = s-k]$. Because of the correction $\kappa(v,s,m)$, we need to replace $\eta_1 = \eta_1(\mathbb{1}_s)$ in the proof by
$$\tilde\eta_1 = \sum_{i=v+1}^{m\wedge s}\sum_{l\ge2}l\varepsilon_{il}\,\mathbb{P}[T_{vm}^{(i)} = s-il],$$
then using the inequality $|\eta_1 - \tilde\eta_1 - \kappa(v,s,m)| \le \eta_4$, where
$$\eta_4 = \sum_{i=v+1}^{m\wedge s}|\varepsilon_{i1}|\,\big|\mathbb{P}[T_{vm}^{(i)} = s-i] - \mathbb{P}[T_{vm} = s-i]\big| \tag{11.4}$$
is bounded using Lemma 9.6.

The quantity $\tilde\eta_1$ is bounded by
$$|\tilde\eta_1| \le K_m^{(1)}m^{-\theta}(s+1)^{-(1-\theta)} \times \Big\{2^{1-\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}F_{i1} + \phi_{\theta1}(s) + u^*_1(s)\Big\}, \tag{11.5}$$
by Corollary 9.3 and Lemma 13.7 (1,2,3) with $K = K_m^{(1)}$. For
$$\eta_3(\mathbb{1}_s) = \sum_{i=v+1}^{m}\frac{\theta}{ir_i}(1+\varepsilon_{i1})\big\{\mathbb{P}[T_{vm}^{(i)} = s-i] - \mathbb{P}[T_{vm}^{(i)} = s-2i]\big\},$$
observe that $\frac{\theta}{ir_i}(1+\varepsilon_{i1}) = \mathbb{P}[Z_{i1} = 1] \ge 0$, so that $|\eta_3|$ is no larger than either of the sums. Hence, from Corollary 9.3 and Lemma 13.7 (4,5) (with $\alpha = 1/2$) and (6,7) (with $\alpha = 1/4$), also with $K = K_m^{(1)}$ and $v_i = (1+\varepsilon_{i1})$, we have
$$|\eta_3(\mathbb{1}_s)| \le K_m^{(1)}\theta m^{-\theta}(s+1)^{-(1-\theta)}2^{1-\theta} \times \Big\{\sum_{i=1}^{\lfloor(s+1)/2\rfloor}\frac{1+\varepsilon_{i1}}{ir_i} + \frac{2(1+\varepsilon^*_{s/4,1})}{\theta r^-_{s/4}}\Big\}. \tag{11.6}$$

In much the same way,
$$|\eta_2(\mathbb{1}_s)| = \Big|\sum_{i=v+1}^{m}\frac{\theta}{ir_i}\Big\{\sum_{l\ge2}\varepsilon_{il}\,\mathbb{P}[T_{vm}^{(i)} = s-i(l+1)] - E_{i1}\,\mathbb{P}[T_{vm}^{(i)} = s-i]\Big\}\Big|$$
is bounded using Corollary 9.3 and Lemma 13.7 (8,9,10,11) with $v_i = 1$ and (4,5) with $v_i = E_{i1}$ and $\alpha = 1/2$, giving
$$|\eta_2(\mathbb{1}_s)| \le K_m^{(1)}\theta m^{-\theta}(s+1)^{-(1-\theta)} \times \Big\{2^{2-\theta}\sum_{i=1}^{\lfloor(s+1)/2\rfloor}\frac{E_{i1}}{ir_i} + \frac{2^{3-2\theta}E^*_{s/4,1}}{\theta r^-_{s/4}} + s^{-1}\{2\phi_{\theta2}(s) + u^*_2(s)\}\Big\}. \tag{11.7}$$

Combining (11.4)–(11.7) with (8.33), we thus find that
$$|\mathbb{P}[T_{vm}(Z) = s] - \theta s^{-1}\mathbb{P}[s-m \le T_{vm}(Z) < s-v] + \kappa(v,s,m)| \le \theta s^{-1}\{|\tilde\eta_1| + |\eta_2| + |\eta_3| + \eta_4\}$$
$$\le \theta s^{-1}m^{-\theta}(s+1)^{-(1-\theta)} \times \Big[\phi_{9.6}(m,s) + K_m^{(1)}\Big\{2^{1-\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}\Big(F_{i1} + \frac{\theta}{ir_i}(1+2\rho_i)\Big) + \frac{\theta2^{3-2\theta}(1+\rho^*_{s/4})}{\theta r^-_{s/4}} + \phi_{\theta1}(s) + u^*_1(s) + \theta\,\frac{2\phi_{\theta2}(s)+u^*_2(s)}{s}\Big\}\Big]$$
$$\le \theta s^{-1}m^{-\theta}(s+1)^{-(1-\theta)}\phi_{11.3}(m,s),$$
where
$$\phi_{11.3}(m,s) = K_m^{(1)}\Big\{2^{1-\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}\Big[F_{i1} + \frac{\theta}{ir_i}(1+2\rho_i)(1+2|\varepsilon_{i1}|)\Big] + \phi_{\theta1}(s) + u^*_1(s) + \frac{8\theta(1+\rho^*_{s/4})(1+\varepsilon^*_{s/4,1})}{\theta r^-_{s/4}} + \theta s^{-1}(2\phi_{\theta2}(s)+u^*_2(s))(1+\varepsilon^*_{01})\Big\},$$
completing the proof. $\Box$

Corollary 11.4 Defining
$$\phi^*_{11.4}(n,Z) = \max_{1\le s\le n}\phi_{11.3}(n,s,Z),$$
it follows that $\phi^*_{11.4}(n)$ is bounded if $\sup_n E_n$, $\rho^*_0$, $S(\infty)$,
$$\sup_s\{\phi_{\theta1}(s) + u^*_1(s) + s^{-1}[\phi_{\theta2}(s) + u^*_2(s)]\}$$
and $\sum_{i\ge1}F_{i1}$ are all finite. Under Conditions (A0) and (B01), $\phi^*_{11.4}(n) = O(n^{1-a_1+\delta})$ for any $\delta > 0$ if $a_1 \le 1$, and $\phi^*_{11.4}(n) = O(S(n))$ if $a_1 > 1$. Under Condition (G), $\phi^*_{11.4}(n)$ is bounded.

Comparing interval probabilities

It follows from Theorem 11.3 that any point probability for $T_{vm}(Z)$ can be approximated through probabilities of the form $\mathbb{P}[T_{vm}(Z) < x]$. These can in turn be compared to the probabilities $\mathbb{P}[T_{vm}(Z^*) < x]$. This comparison is the same as that needed when approximating with respect to Kolmogorov distance, and suggests that the results of Chapter 10, and in particular Theorem 10.6, should be applicable. However, the situation here is different. On the one hand, we only need to compare the probabilities when $x$ is large, where they are closer to one another than when $x$ is small. On the other, we would like to be able to take $v$ quite large without degrading the result, and the situation in Theorem 10.6 is not appropriate, because there $T_{vm}(Z)$ is approximated by $T_{0m}(Z^*)$, resulting in a discrepancy (10.5) of typical order $O\{(v/m)^{\theta}\log m\}$ reflecting this difference. So now we prove a direct approximation of $T_{vm}(Z)$ by $T_{vm}(Z^*)$, even though still using the Stein Equation
$$\theta\sum_{i=1}^{m}g(w+i) - wg(w) = \mathbb{1}\{w < x\} - \mathbb{P}[T_{0m}(Z^*) < x] \tag{11.8}$$
appropriate to $T_{0m}(Z^*)$, which has a solution $g = g_{mx}$ satisfying
$$0 \le g_{mx}(w) \le \frac{1+\theta}{x+\theta} \quad \text{for all } w, \tag{11.9}$$
by Lemma 8.3. The detailed result is as follows.

Lemma 11.5 For $x \ge 1$,
$$|\mathbb{P}[T_{vm}(Z) < x] - \mathbb{P}[T_{vm}(Z^*) < x]| \le \theta(1+\theta)x^{-1}\phi_{11.5}(m),$$
where
$$\phi_{11.5}(m) = \phi_{11.5}(m,Z) = 2\theta^2 + \sum_{i=1}^{m}\Big\{\mu_i(1+\theta) + \frac{\theta}{ir_i}(1+\rho_i)\Big\}$$
is of order $O\big(m^{[1-(g_1\wedge a_1)]_+}\{1 + \mathbb{1}_{\{(g_1\wedge a_1)=1\}}\log m\} + S(m)\big)$, under Conditions (A0) and (B01).

Proof. Combining (11.8), (8.15) and Lemma 8.7, it follows that
$$\mathbb{P}[T_{vm}(Z) < x] - \mathbb{P}[T_{vm}(Z^*) < x] = \theta\sum_{i=1}^{v}\{\mathbb{E} g(T_{vm}(Z)+i) - \mathbb{E} g(T_{vm}(Z^*)+i)\} - \theta\sum_{s=1}^{3}\eta_s(g),$$
where $g$ satisfies (11.9) and the $\eta_s(g) = \eta_s^{(vm)}(g)$ are defined as in Lemma 8.7. Note that therefore
$$|\eta_1(g)| \le \frac{1+\theta}{x+\theta}\sum_{i=v+1}^{m}\mu_i; \qquad |\eta_2(g)| \le \frac{1+\theta}{x+\theta}\sum_{i=v+1}^{m}\frac{\theta E_{i1}}{ir_i},$$
and
$$|\eta_3(g)| \le \frac{1+\theta}{x+\theta}\sum_{i=v+1}^{m}\frac{\theta}{ir_i}(1+\varepsilon_{i1}).$$

Using Lemma 10.1, the remaining term is in modulus at most
$$\frac{1+\theta}{x+\theta}\,v\theta\,d_{TV}(\mathcal{L}(T_{vm}(Z)),\mathcal{L}(T_{vm}(Z^*))) \le \frac{1+\theta}{x+\theta}\,v\theta\Big\{1 \wedge \sum_{i=v+1}^{m}\frac{\theta}{i}\Big(\rho_i + \frac{2\theta}{ir_i}\Big)\Big\} \le \theta^2\,\frac{1+\theta}{x+\theta}\Big(2\theta + \sum_{i=v+1}^{m}\rho_i\Big),$$
and the result follows by combining the estimates. $\Box$

As a result of Theorem 11.3, Lemma 11.5 and (9.6), we now have the following comparison of point probabilities.

Theorem 11.6 For $0 \le v < m$ and any $s \ge 2v+3$,
$$s\theta^{-1}|\mathbb{P}[T_{vm}(Z) = s] - \mathbb{P}[T_{vm}(Z^*) = s]| \le m^{-\theta}(s+1)^{-(1-\theta)}\phi_{11.6}(m,s),$$
where
$$\phi_{11.6}(m,s) = \phi_{11.6}(m,s,Z) = O\big(\phi_{11.3}(m,s) + (m/s)^{\theta}\phi_{11.5}(m) + s^{1-(g_1\wedge1)}(1+\log s\,\mathbb{1}_{\{g_1=1\}})\big)$$
under Conditions (A0) and (B01). In particular, under these conditions,
$$\phi^*_{11.6}(n) = \sup_{n/4\le s\le n}\phi_{11.6}(n,s)$$
is of order $n^{1-(g_1\wedge a_1)+\delta}$ for any $\delta > 0$, when $(g_1\wedge a_1) \le 1$, and is of order $O(S(n))$ if $(g_1\wedge a_1) > 1$.

Proof. Applying Theorem 11.3 and (9.6),
$$s\theta^{-1}|\mathbb{P}[T_{vm}(Z) = s] - \mathbb{P}[T_{vm}(Z^*) = s]| \le |\mathbb{P}[s-m \le T_{vm}(Z) < s-v] - \mathbb{P}[s-m \le T_{vm}(Z^*) < s-v]| + m^{-\theta}(s+1)^{-(1-\theta)}\phi_{11.3}(m,s) + |\kappa(v,s,m)|. \tag{11.10}$$

Lemma 11.5 is used for the difference of probabilities, with the condition $s \ge 2v+3$ ensuring that $1/(s-v) \le 2/(s+1)$; for the remainder, we use the definition (11.3) of $\kappa$, together with Corollary 9.3, to give the bound
$$|\kappa(v,s,m)| \le \sum_{i=v+1}^{m\wedge s}|\varepsilon_{i1}|\,\mathbb{P}[T_{vm} = s-i] \le K_m^{(1)}m^{-\theta}\Big(\frac{2}{s+1}\Big)^{1-\theta}\Big\{\sum_{i=1}^{\lfloor s/2\rfloor}|\varepsilon_{i1}| + \varepsilon^*_{s/2,1}\,\theta^{-1}\Big(\frac{s+1}{2}\Big)^{\theta}\Big\}.$$

This gives the estimate of the theorem, with
$$\phi_{11.6}(m,s) = \phi_{11.3}(m,s) + 2\theta(1+\theta)\Big(\frac{m}{s+1}\Big)^{\theta}\phi_{11.5}(m) + K_m^{(1)}2^{1-\theta}(1+\theta^{-1})\sum_{i=1}^{\lfloor s/2\rfloor+1}\varepsilon^*_{i1};$$
the order estimates follow from Proposition 6.1, Theorem 11.3 and Lemma 11.5. $\Box$

Remark. For all $3 \le s \le m$, we have
$$\phi_{11.6}(m,s) \le \phi^*_{11.4}(m) + 2K_m^{(1)}(1+\theta^{-1})\sum_{i=1}^{\lfloor m/2\rfloor+1}\varepsilon^*_{i1} + 2\theta(1+\theta)\Big(\frac{m}{s+1}\Big)^{\theta}\phi_{11.5}(m). \tag{11.11}$$

The following result is an immediate consequence.

Corollary 11.7 If $m \ge 18$, $m/2 \le s \le m$ and $0 \le v \le m/6$, then
$$|\mathbb{P}[T_{vm}(Z) = s] - \mathbb{P}[T_{vm}(Z^*) = s]| \le \theta s^{-1}m^{-1}\phi_{11.7}(m),$$
where
$$\phi_{11.7}(m) = \phi_{11.7}(m,Z) = 2\phi^*_{11.4}(m) + 4\theta(1+\theta)\phi_{11.5}(m) + 4K_m^{(1)}(1+\theta^{-1})\sum_{i=0}^{\lfloor m/2\rfloor+1}\varepsilon^*_{i1};$$
$\phi_{11.7}(m)$ is of order $O\big(m^{[1-(g_1\wedge a_1)]_++\delta}\big)$ if $(g_1\wedge a_1) \le 1$, and of order $O(S(m))$ if $(g_1\wedge a_1) > 1$, under Conditions (A0) and (B01).

Remark. For mappings, $\phi_{11.7}(m)$ is of order $O(m^{1/2})$, and for polynomials and square-free polynomials of order $O(1)$.

Corollary 11.7 shows that $\mathbb{P}[T_{vm}(Z) = s]$ and $\mathbb{P}[T_{vm}(Z^*) = s]$ are close if $v \le m/6$ and $s \ge m/2$. In Theorem 6.14, we are interested in values of $v$ which may be larger than $m/6$, and in any value of $s$ between $v+1$ and $m$. For $1 \le s \le v$, both probabilities are zero, and for $s = 0$ and $v$ large, the total variation approximation of Lemma 10.1 is already adequate; in order to get a bound which is valid in all remaining circumstances, all that need be done is to adjust the proof of Theorem 11.6.

Theorem 11.8 For any $0 \le v \le m$ and any $v+1 \le s \le m$, we have
$$s\theta^{-1}|\mathbb{P}[T_{vm}(Z) = s] - \mathbb{P}[T_{vm}(Z^*) = s]| \le (v+1)^{-1}\phi_{11.8}(m),$$
where
$$\phi_{11.8}(m) = \phi_{11.8}(m,Z) = \phi^*_{11.4}(m) + 4\theta^2 + \theta + 2K_m^{(1)}(1+\theta^{-1})\sum_{i=1}^{m}\rho^*_i = O\big(m^{1-a_1+\delta} + m^{[1-(g_1\wedge a_1)]_+}(1+\mathbb{1}_{\{(g_1\wedge a_1)=1\}}\log m) + S(m)\big)$$
for any $\delta > 0$, under Conditions (A0) and (B01).

Proof. As for (11.10),
$$s\theta^{-1}|\mathbb{P}[T_{vm}(Z) = s] - \mathbb{P}[T_{vm}(Z^*) = s]| \le |\mathbb{P}[s-m \le T_{vm}(Z) < s-v] - \mathbb{P}[s-m \le T_{vm}(Z^*) < s-v]| + m^{-\theta}(s+1)^{-(1-\theta)}\phi^*_{11.4}(m) + |\kappa(v,s,m)|.$$

Estimating $|\kappa(v,s,m)|$ as for Theorem 11.6, and now using Lemma 10.1 for the difference in probabilities, gives
$$s\theta^{-1}|\mathbb{P}[T_{vm}(Z) = s] - \mathbb{P}[T_{vm}(Z^*) = s]| \le m^{-\theta}(s+1)^{-(1-\theta)}\Big\{\phi^*_{11.4}(m) + 2K_m^{(1)}(1+\theta^{-1})\sum_{i=1}^{\lfloor m/2\rfloor+1}\varepsilon^*_{i1}\Big\} + \sum_{i=v+1}^{m}\frac{\theta}{i}\Big(\rho_i + \frac{2\theta}{ir_i}\Big) \le (v+1)^{-1}\phi_{11.8}(m),$$
as required. $\Box$

Bounding the relative error

It is useful in the argument that follows to have the result of Corollary 11.7 expressed instead in terms of relative error, which is conveniently possible if $s\,\mathbb{P}[T_{vm}(Z^*) = s]$ is bounded away from 0. Conditions under which this is true are given in the next lemma.

Lemma 11.9 There exists a constant $c_{11.9} = c_{11.9}(\theta) > 0$ such that, for all $m \ge s \ge m/2 \ge 1$ and $0 \le v < m/2$,
$$s\theta^{-1}\,\mathbb{P}[T_{vm}(Z^*) = s] \ge c_{11.9}.$$
In the particular case $v = 0$, $s = m$, we have the sharper estimate
$$(m+\theta)\,\mathbb{P}[T_{0m}(Z^*) = m] \ge \theta P_\theta[0,1],$$
where $P_\theta$ is as in Theorem 4.6.

Proof. It follows from (9.6) that, for $s \le m$,
$$s\theta^{-1}\,\mathbb{P}[T_{vm}(Z^*) = s] = \mathbb{P}[T_{vm}(Z^*) < s-v]. \tag{11.12}$$

If $m/4 \le v < m/2$, the inequality
$$\mathbb{P}[T_{vm}(Z^*) < s-v] \ge \mathbb{P}[T_{vm}(Z^*) = 0] = \exp\Big\{-\theta\sum_{i=v+1}^{m}\frac{1}{i}\Big\} \ge \exp\Big\{-\theta\int_{v}^{m}\frac{dx}{x}\Big\} = \Big(\frac{v}{m}\Big)^{\theta}$$
shows that $s\theta^{-1}\,\mathbb{P}[T_{vm}(Z^*) = s] \ge 4^{-\theta}$. If $0 \le v < m/4$, observe that
$$\mathbb{P}[T_{vm}(Z^*) < s-v] \ge \mathbb{P}[T_{0m}(Z^*) < m/4],$$
and that, from (10.10),
$$\mathbb{P}[T_{0m}(Z^*) \le x] = \prod_{j=1}^{x}\Big(1+\frac{\theta}{j}\Big)\,\mathbb{P}[T_{0m}(Z^*) = 0] = \prod_{j=1}^{x}\Big(1+\frac{\theta}{j}\Big)\exp\Big\{-\theta\sum_{j=1}^{m}\frac{1}{j}\Big\} \tag{11.13}$$
$$\ge \exp\Big\{-\theta\sum_{j=x+1}^{m}\frac{1}{j} - \frac{\theta^2}{2}\sum_{j\ge1}\frac{1}{j^2}\Big\}.$$

Thus, for $0 \le v < m/4$, taking $x+1 = m/4$, we have
$$s\theta^{-1}\,\mathbb{P}[T_{vm}(Z^*) = s] \ge \Big(\frac{1}{4}-\frac{1}{m}\Big)^{\theta}e^{-\pi^2\theta^2/12} \ge 8^{-\theta}e^{-\pi^2\theta^2/12},$$
whenever $m \ge 8$. The extension to include $m < 8$ is immediate.

For the case $v = 0$, $s = m$, (11.12) and (11.13) give
$$m\,\mathbb{P}[T_{0m}(Z^*) = m] = \frac{\theta m}{m+\theta}\prod_{j=1}^{m}(1+\theta j^{-1})e^{-\theta/j} \ge \frac{\theta m}{m+\theta}\prod_{j\ge1}(1+\theta j^{-1})e^{-\theta/j} = \frac{\theta m}{m+\theta}\,\frac{e^{-\gamma\theta}}{\Gamma(\theta+1)} = \frac{\theta m}{m+\theta}\,P_\theta[0,1],$$
from Corollary 4.8. $\Box$
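The exact identity obtained in the last step of the proof holds for every finite $m$ and any $\theta$, and can be confirmed numerically. A sketch (not from the book), with the illustrative non-integer choice $\theta = 0.6$:

```python
from math import exp

theta, m = 0.6, 30

# Exact law of T = T_0m(Z*) on {0,...,m}, Z*_i ~ Poisson(theta/i).
p = [1.0] + [0.0] * m
for i in range(1, m + 1):
    lam = theta / i
    w = [exp(-lam)]
    for l in range(1, m // i + 1):
        w.append(w[-1] * lam / l)   # Poisson recursion
    q = [0.0] * (m + 1)
    for t, pt in enumerate(p):
        if pt > 0.0:
            for l, wl in enumerate(w):
                if t + i * l > m:
                    break
                q[t + i * l] += pt * wl
    p = q

# m P[T = m] = (theta*m/(m+theta)) * prod_{j=1}^m (1+theta/j) e^{-theta/j}
lhs = m * p[m]
rhs = theta * m / (m + theta)
for j in range(1, m + 1):
    rhs *= (1.0 + theta / j) * exp(-theta / j)
print(lhs, rhs)    # the two sides agree to rounding error
```

As $m \to \infty$ the product converges to $e^{-\gamma\theta}/\Gamma(\theta+1) = P_\theta[0,1]$, recovering the lower bound of the lemma.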

Theorem 11.10 Let the ratio $\mathbb{P}[T_{vm}(Z) = s]/\mathbb{P}[T_{vm}(Z^*) = s]$ be denoted by $r^{(m)}_{vs} = r^{(m)}_{vs}(Z)$. Then
$$\max\big\{|r^{(m)}_{vs} - 1|, |\{r^{(m)}_{vs}\}^{-1} - 1|\big\} \le 2c^{-1}_{11.9}m^{-1}\phi_{11.7}(m),$$
uniformly in $m \ge n_0$, $0 \le v < m/6$ and $m/2 \le s \le m$, where $n_0$ is the smallest $m \ge \max\{18, 2\theta\}$ such that
$$\phi_{11.7}(m) \le \tfrac12 mc_{11.9}. \tag{11.14}$$
In particular,
$$\mathbb{P}[T_{0m}(Z) = m] \ge \theta P_\theta[0,1]/3m, \qquad m \ge n_0.$$

Proof. Combine Corollary 11.7 with Lemma 11.9 to give
$$|r^{(m)}_{vs} - 1| \le \delta_m = c^{-1}_{11.9}m^{-1}\phi_{11.7}(m);$$
the inequality for $|\{r^{(m)}_{vs}\}^{-1} - 1|$ then follows, so long as $\delta_m \le \frac12$, which is true for all $m \ge n_0$. In the case of $\mathbb{P}[T_{0m}(Z) = m]$, this inequality implies that, for all $m \ge n_0$,
$$\mathbb{P}[T_{0m}(Z) = m] \ge \tfrac12\,\mathbb{P}[T_{0m}(Z^*) = m] \ge \theta P_\theta[0,1]/\{2(m+\theta)\},$$
this last from Lemma 11.9. But now $m \ge 2\theta$ implies that $m+\theta \le 3m/2$. $\Box$

Remark. Under Conditions (A0) and (B01), $m^{-1}\phi_{11.7}(m) \to 0$ as $m \to \infty$, in view of Corollary 11.7. Thus $n_0$ is some fixed number, depending on the particular logarithmic combinatorial structure being considered.


11.3 Comparison with $p_\theta$

In the previous section, we have shown that the point probabilities $\mathbb{P}[T_{vm}(Z)=s]$ and $\mathbb{P}[T_{vm}(Z^*)=s]$ are close to being equal when $m$ is large. The next theorem sharpens (LLT) and (LLT$\alpha$), showing that the probabilities are both close to the values to be expected from the facts that $\mathcal{L}(m^{-1}T_{vm}(Z))$ converges to $P_\theta$, with density $p_\theta$ given in (4.29), if $v=o(m)$, and to $P^{(\alpha)}_\theta$ defined in Theorem 4.9 if $v/m\to\alpha>0$.

Let $p^{(\alpha)}_\theta$ denote the density of $P^{(\alpha)}_\theta$ on $(0,\infty)$; then, as in Chapter 4.3,
$$x\,p^{(\alpha)}_\theta(x) = \theta P^{(\alpha)}_\theta[x-1,\,x-\alpha], \qquad x>0, \eqno(11.15)$$
and so $p^{(\alpha)}_\theta(x)=0$ for $0<x<\alpha$ and $p^{(\alpha)}_\theta(x)\le\theta x^{-1}$ for $x\ge\alpha$; $P^{(\alpha)}_\theta$ also has an atom of probability $\alpha^\theta$ at zero.

Theorem 11.11 For any $s\ge v+1$, $0\le v<m$,

(i) $\;|m\,\mathbb{P}[T_{vm}(Z^*)=s] - p_\theta(s/m)| \le \theta m s^{-1}\varepsilon_{11.11}(m,v)$,

where
$$\varepsilon_{11.11}(m,v) = O\Bigl(\Bigl(\frac{v+1}{m+1}\Bigr)^{\theta}\bigl(1+\mathbf{1}\{\theta=1\}\log m\bigr)\Bigr)$$
under Conditions (A0) and (B01), and

(ii) $\;|m\,\mathbb{P}[T_{vm}(Z)=s] - p_\theta(s/m)| \le \theta m s^{-1}\varepsilon_{11.11}(m,v) + \theta s^{-1}\begin{cases}\{m/(s+1)\}^{1-\theta}\varphi_{11.6}(m,s), & \text{if } s\ge 2v+3;\\ 2\{m/s\}\varphi_{11.8}(m), & \text{if } v+1\le s\le 2(v+1).\end{cases}$

Furthermore, if also $v\ge1$, then

(iii) $\;|m\,\mathbb{P}[T_{vm}(Z^*)=s] - p^{(v/m)}_\theta(s/m)| \le 4\theta^2 m s^{-1}v^{-1}$;

(iv) $\;|m\,\mathbb{P}[T_{vm}(Z)=s] - p^{(v/m)}_\theta(s/m)| \le 6\theta^2 m s^{-1}v^{-1} + \theta m s^{-1}v^{-1}\varphi_{11.8}(m)$:

note that, for $1\le s\le v$, $\mathbb{P}[T_{vm}(Z^*)=s] = \mathbb{P}[T_{vm}(Z)=s] = 0$.

Proof. For Part (i), from (9.6), Theorem 10.12 and Corollary 4.8, we have
$$s\theta^{-1}\,\mathbb{P}[T_{vm}(Z^*)=s] = \mathbb{P}[s-m\le T_{vm}(Z^*)<s-v] \eqno(11.16)$$
and
$$\Bigl|\mathbb{P}[s-m\le T_{vm}(Z^*)<s-v] - P_\theta\Bigl(\frac sm-1,\,\frac sm\Bigr)\Bigr| \;\le\; 2\varepsilon_{10.12}(m) + \varepsilon_{10.11}(m,v) + (v/m)(\theta\vee1);$$
note that $\varepsilon_{10.11}(m,0)$ is taken to be zero. Now, using (4.29), Part (i) follows with
$$\varepsilon_{11.11}(m,v) = 2\varepsilon_{10.12}(m) + \varepsilon_{10.11}(m,v) + (v/m)(\theta\vee1), \eqno(11.17)$$
so that
$$\varepsilon_{11.11}(m,0) = 2\varepsilon_{10.12}(m), \eqno(11.18)$$
and Part (ii) follows also, from Theorems 11.6 and 11.8; the order estimate follows from Lemma 10.11 and Theorem 10.12. For Part (iii), use (9.6) and Theorem 10.14 to give
$$\Bigl|s\theta^{-1}\,\mathbb{P}[T_{vm}(Z^*)=s] - P^{(v/m)}_\theta\Bigl[\frac sm-1,\,\frac sm-\frac vm\Bigr]\Bigr| \;\le\; \frac{3\theta}{v} + P^{(v/m)}_\theta\Bigl(\frac{s-v-1}m,\,\frac{s-v}m\Bigr],$$
in which the last term is zero for $s\le 2v$, and is at most
$$\frac1m\,\theta\Bigl(\frac{s-v-1}m\Bigr)^{-1} \;\le\; \frac\theta v$$
for $s\ge 2v+1$, from the properties of the density $p^{(\alpha)}_\theta$ listed after (11.15); the proof of this part is now completed using (11.15). Part (iv) then follows from Theorem 11.8. □

The final result of this chapter shows that the influence of $v$ on the value of $\mathbb{P}[T_{vm}(Z^*)=s]$ is of little significance, provided that $v$ is not too large and $s$ is not too far from $m$, as is to be expected, in view of (LLT).

Lemma 11.12 There exists a constant $c_{11.12} = c_{11.12}(\theta)$ such that, uniformly in $m\ge18$, $0\le v<m/4$ and $0\le l\le m/2$,
$$\Bigl|\frac{\mathbb{P}[T_{vm}(Z^*)=m-l]}{\mathbb{P}[T_{0m}(Z^*)=m]}-1\Bigr| \;\le\; c_{11.12}\Bigl(\frac{l+v}{m}\Bigr).$$

Proof. Replace $w$ by $W = T_{vm}(Z^*)+l+v$ and $x$ by $m$ in (11.8) and take expectations, using (8.15) and (11.9), to give
$$|\mathbb{P}[T_{vm}(Z^*)<m-l-v] - \mathbb{P}[T_{0m}(Z^*)<m]| = \Bigl|\theta\sum_{i=1}^{v}\mathbb{E}g(W+i) - (l+v)\,\mathbb{E}g(W)\Bigr| \;\le\; \Bigl(\frac{1+\theta}{m+\theta}\Bigr)\{v(\theta+1)+l\}.$$
It then follows from (9.6) that
$$\Bigl|\frac{\mathbb{P}[T_{vm}(Z^*)=m-l]}{\mathbb{P}[T_{0m}(Z^*)=m]}-1\Bigr| = \Bigl|\Bigl(\frac m{m-l}\Bigr)\frac{\mathbb{P}[T_{vm}(Z^*)<m-l-v]}{\mathbb{P}[T_{0m}(Z^*)<m]}-1\Bigr|$$
$$\le\; \frac{l}{m-l} + \Bigl(\frac m{m-l}\Bigr)\Bigl(\frac{1+\theta}{m+\theta}\Bigr)\frac{v(\theta+1)+l}{\mathbb{P}[T_{0m}(Z^*)<m]} \;\le\; c_{11.12}\Bigl(\frac{l+v}{m}\Bigr),$$
for an appropriately chosen
$$c_{11.12} \;\le\; 2(1+\theta)(2+\theta)\exp\Bigl\{\frac{\theta}{18}+\frac{\pi^2\theta^2}{12}\Bigr\},$$
in the stated ranges of $m$, $l$ and $v$, since, from (11.13),
$$\mathbb{P}[T_{0m}(Z^*)<m] \;\ge\; \exp\Bigl\{-\frac{\theta}{m}-\frac{\pi^2\theta^2}{12}\Bigr\}. \qquad\square$$


12 Proofs

In this chapter, we give the proofs of Theorems 6.6–6.14 and 7.10.

12.1 Proof of Theorem 6.6

We wish to establish the asymptotics of $\mathbb{P}[A_n(C^{(n)})]$ under Conditions (A0) and (B01), where
$$A_n(C^{(n)}) = \bigcap_{1\le i\le n}\;\bigcap_{r_i'+1\le j\le r_i}\{C^{(n)}_{ij}=0\},$$
and $\zeta_i = (r_i'/r_i d)-1 = O(i^{-g'})$ as $i\to\infty$, for some $g'>0$. We start with the expression
$$\mathbb{P}[A_n(C^{(n)})] = \frac{\mathbb{P}[T_{0n}(Z')=n]}{\mathbb{P}[T_{0n}(Z)=n]}\prod_{\substack{1\le i\le n\\ r_i'+1\le j\le r_i}}\Bigl\{1-\frac{\theta}{ir_i}(1+E_{i0})\Bigr\}. \eqno(12.1)$$
Now, from (11.12) and (11.13) in the proof of Lemma 11.9 and from Theorem 11.10, we have
$$\mathbb{P}[T_{0n}(Z')=n] = \frac{\theta d}{n}\exp\Bigl\{\sum_{i\ge1}[\log(1+i^{-1}\theta d)-i^{-1}\theta d]\Bigr\}\{1+O(n^{-1}\varphi'_{11.7}(n))\} \eqno(12.2)$$
and
$$\mathbb{P}[T_{0n}(Z)=n] = \frac{\theta}{n}\exp\Bigl\{\sum_{i\ge1}[\log(1+i^{-1}\theta)-i^{-1}\theta]\Bigr\}\{1+O(n^{-1}\varphi_{11.7}(n))\}, \eqno(12.3)$$
where $\varphi'_{11.7}(n)$ refers to the quantity derived from $Z'$. It thus follows that $\mathbb{P}[A_n(C^{(n)})] \sim Kn^{-\theta(1-d)}$ for a constant $K$, depending on $Z$ and the $r_i'$ and computable explicitly from (12.1)–(12.3), if Conditions (A0) and (B01) are satisfied and if $\zeta^*_i = O(i^{-g'})$ for some $g'>0$, since, under these circumstances, both $n^{-1}\varphi'_{11.7}(n)$ and $n^{-1}\varphi_{11.7}(n)$ tend to zero as $n\to\infty$. In particular, for polynomials and square free polynomials, the relative error in this asymptotic approximation is of order $n^{-1}$ if $g'>1$; see Theorem 1 of Car (1984).
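The point probability asymptotics (12.3) can be checked numerically in the pure Poisson case $Z_i\sim\mathrm{Po}(\theta/i)$, where the proof of Lemma 11.9 even yields the finite-$n$ identity $n\,\mathbb{P}[T_{0n}=n] = \{\theta n/(n+\theta)\}\prod_{j\le n}(1+\theta/j)e^{-\theta/j}$; the sketch below (ours, not the book's) verifies both the identity and the limiting constant $\theta e^{-\gamma\theta}/\Gamma(\theta+1)$.

```python
import math

def t0n_point_prob(theta, n):
    # P[T_{0n}(Z) = n] in the pure Poisson case Z_i ~ Po(theta/i),
    # via the size-biasing recursion s*p(s) = theta * sum_{i=1}^{min(n,s)} p(s-i), cf. (9.6).
    p = [0.0] * (n + 1)
    p[0] = math.exp(-theta * sum(1.0 / i for i in range(1, n + 1)))
    for s in range(1, n + 1):
        p[s] = (theta / s) * sum(p[s - i] for i in range(1, min(n, s) + 1))
    return p[n]

theta, n = 1.5, 300
pn = t0n_point_prob(theta, n)

# Finite-n identity from the proof of Lemma 11.9:
#   n * P[T_{0n} = n] = (theta*n/(n+theta)) * prod_{j=1}^n (1 + theta/j) e^{-theta/j}.
exact = (theta * n / (n + theta)) * math.prod((1 + theta / j) * math.exp(-theta / j)
                                              for j in range(1, n + 1))
assert abs(n * pn - exact) < 1e-9 * exact

# Limiting constant in (12.3): theta * exp{sum_i [log(1+theta/i) - theta/i]}
#                            = theta * e^{-gamma*theta} / Gamma(theta+1).
gamma_e = 0.5772156649015329
limit = theta * math.exp(-gamma_e * theta) / math.gamma(theta + 1)
assert abs(n * pn / limit - 1) < 0.05
print("asymptotics of P[T0n = n] verified")
```

The relative deviation from the limit is of order $n^{-1}$ here, matching the $\{1+O(n^{-1}\varphi_{11.7}(n))\}$ factor in (12.3).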

12.2 Proof of Theorem 6.7

We wish to prove that, for $0\le b<n/8$ and $n\ge n_0$, with $n_0$ as defined in Theorem 11.10,
$$d_{TV}\bigl(\mathcal{L}(C[1,b]),\mathcal{L}(Z[1,b])\bigr) \le \varepsilon_{6.7}(n,b),$$
where $\varepsilon_{6.7}(n,b) = O(b/n)$ under Conditions (A0), (D1) and (B11).

The proof follows the same line as that of Theorem 4.18, but now uses estimates which are valid for more general structures than $\theta$-biased random permutations. Since, by the Conditioning Relation,
$$\mathcal{L}(C[1,b]\,|\,T_{0b}(C)=l) = \mathcal{L}(Z[1,b]\,|\,T_{0b}(Z)=l),$$
it follows by direct calculation that
$$d_{TV}\bigl(\mathcal{L}(C[1,b]),\mathcal{L}(Z[1,b])\bigr) = d_{TV}\bigl(\mathcal{L}(T_{0b}(C)),\mathcal{L}(T_{0b}(Z))\bigr) = \max_A\sum_{r\in A}\mathbb{P}[T_{0b}(Z)=r]\Bigl\{1-\frac{\mathbb{P}[T_{bn}(Z)=n-r]}{\mathbb{P}[T_{0n}(Z)=n]}\Bigr\}. \eqno(12.4)$$
Suppressing the argument $Z$ from now on, we thus obtain
$$d_{TV}\bigl(\mathcal{L}(C[1,b]),\mathcal{L}(Z[1,b])\bigr) = \sum_{r\ge0}\mathbb{P}[T_{0b}=r]\Bigl\{1-\frac{\mathbb{P}[T_{bn}=n-r]}{\mathbb{P}[T_{0n}=n]}\Bigr\}_+$$
$$\le \sum_{r>n/2}\mathbb{P}[T_{0b}=r] + \sum_{r=0}^{\lfloor n/2\rfloor}\frac{\mathbb{P}[T_{0b}=r]}{\mathbb{P}[T_{0n}=n]}\Bigl\{\sum_{s=0}^{n}\mathbb{P}[T_{0b}=s]\bigl(\mathbb{P}[T_{bn}=n-s]-\mathbb{P}[T_{bn}=n-r]\bigr)\Bigr\}_+$$
$$\le \sum_{r>n/2}\mathbb{P}[T_{0b}=r] + \sum_{r=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=r]\sum_{s=0}^{\lfloor n/2\rfloor}\frac{\mathbb{P}[T_{0b}=s]\bigl\{\mathbb{P}[T_{bn}=n-s]-\mathbb{P}[T_{bn}=n-r]\bigr\}_+}{\mathbb{P}[T_{0n}=n]}$$
$$\qquad + \sum_{r=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=r]\sum_{s=\lfloor n/2\rfloor+1}^{n}\mathbb{P}[T_{0b}=s]\,\mathbb{P}[T_{bn}=n-s]/\mathbb{P}[T_{0n}=n].$$
The first sum is at most $2n^{-1}\mathbb{E}T_{0b}$; the third is bounded by
$$\Bigl(\max_{n/2<s\le n}\mathbb{P}[T_{0b}=s]\Bigr)\Bigl/\mathbb{P}[T_{0n}=n] \;\le\; \frac{2\varepsilon_{9.5(1)}(n/2,b)}{n}\cdot\frac{3n}{\theta P_\theta[0,1]},$$
in view of Corollary 9.5(1) and Theorem 11.10; the second, by Theorem 11.10 and Corollary 9.8, is at most
$$\frac{3n}{\theta P_\theta[0,1]}\,4n^{-2}\varphi^*_{9.8}(n)\sum_{r=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=r]\sum_{s=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=s]\,\tfrac12|r-s| \;\le\; \frac{12\varphi^*_{9.8}(n)}{\theta P_\theta[0,1]}\cdot\frac{\mathbb{E}T_{0b}}{n}.$$
Hence we may take
$$\varepsilon_{6.7}(n,b) = 2n^{-1}\mathbb{E}T_{0b}(Z)\Bigl\{1+\frac{6\varphi^*_{9.8}(n)}{\theta P_\theta[0,1]}\Bigr\} + \frac{6}{\theta P_\theta[0,1]}\,\varepsilon_{9.5(1)}(n/2,b). \eqno(12.5)$$
In view of Lemma 6.3 and Corollaries 9.8 and 9.5(1), $\varepsilon_{6.7}(n,b)$ has the required order under Conditions (A0), (D1) and (B11), if $S(\infty)<\infty$. If not, $\varphi^*_{9.8}(n)$ can be replaced by $\varphi^*_{9.11}(n)$ in the above, which by Corollary 9.11 has the required order, without the restriction on the $r_i$ implied by $S(\infty)<\infty$. □

Examining the Conditions (A0), (D1) and (B11), it is perhaps surprising to find that (B11) is required instead of just (B01); that is, that $\sum_{l\ge2}l\varepsilon_{il} = O(i^{-a_1})$ for some $a_1>1$ should be necessary. A first observation is that a similar problem arises with the rate of decay of $\varepsilon_{i1}$ as well. For this reason, $\eta_1$ is replaced by a modified version of itself in the proofs of Theorems 9.7 and 9.10. This makes it possible to replace condition (A1) by the weaker pair of conditions (A0) and (D1) in the eventual assumptions needed for $\varepsilon_{6.7}(n,b)$ to be of order $O(b/n)$; the decay rate requirement of order $i^{-1-\gamma}$ is shifted from $\varepsilon_{i1}$ itself to its first differences. This is needed to obtain the right approximation error for the random mappings example. However, since all the classical applications make far more stringent assumptions about the $\varepsilon_{il}$, $l\ge2$, than are made in (B11), we have not attempted any similar modification to weaken (B11).

The critical point of the proof is to be seen in Theorem 9.10, where the initial estimate of the difference $\mathbb{P}[T^{(m)}_{bn}=s]-\mathbb{P}[T^{(m)}_{bn}=s+1]$ given in Theorem 9.7(2) is used to derive a refined estimate of the difference $\mathbb{P}[T_{bn}=s]-\mathbb{P}[T_{bn}=s+1]$ which improves upon that given in Theorem 9.7(1). The factor $\varepsilon_{9.10}(n)$, which should be small, contains a far tail element from $\eta_1$ of the form $\varphi^{*(\theta)}_1(n)+u^*_1(n)$, which is only small if $a_1>1$, being otherwise of order $O(n^{1-a_1+\delta})$ for any $\delta>0$, since $a_2>1$ is in any case assumed. For $s\ge n/2$, this gives rise to a contribution of order $O(n^{-1-a_1+\delta})$ in the estimate of the difference $\mathbb{P}[T_{bn}=s]-\mathbb{P}[T_{bn}=s+1]$, which, in the remainder of the proof, is translated into a contribution of order $O(tn^{-1-a_1+\delta})$ for differences of the form $\mathbb{P}[T_{bn}=s]-\mathbb{P}[T_{bn}=s+t]$, finally leading to a contribution of order $bn^{-a_1+\delta}$ for any $\delta>0$ in $\varepsilon_{6.7}(n,b)$.

At the expense of further complicating the proofs, some improvement would seem to be possible. Using the proof of Theorem 9.10, but defining the function $g$ by $g(w) = \mathbf{1}\{w=s\}-\mathbf{1}\{w=s+t\}$, differences that are of the form $\mathbb{P}[T_{bn}=s]-\mathbb{P}[T_{bn}=s+t]$ can be directly estimated, at a cost of only a single contribution of the form $\varphi^{*(\theta)}_1(n)+u^*_1(n)$. Then, iterating the cycle from Theorem 9.7 to Theorem 9.10, in which one estimate of a difference in point probabilities is improved to an estimate of smaller order, a bound of the form
$$|\mathbb{P}[T_{bn}=s]-\mathbb{P}[T_{bn}=s+t]| = O(n^{-2}t+n^{-1-a_1+\delta})$$
for any $\delta>0$ could perhaps be attained, leading to a final error estimate in Theorem 6.7 of order $O(bn^{-1}+n^{-a_1+\delta})$ for any $\delta>0$, to replace $\varepsilon_{6.7}(n,b)$. This would be of the ideal order $O(b/n)$ for large enough $b$, but would still be coarser for small $b$. Any further improvement would seem to entail fundamental changes to the method of proof.

12.3 Proof of Theorem 6.8

With $b$ and $n$ as in the previous section, we wish to show that
$$\bigl|d_{TV}\bigl(\mathcal{L}(C[1,b]),\mathcal{L}(Z[1,b])\bigr) - \tfrac12(n+1)^{-1}|1-\theta|\,\mathbb{E}|T_{0b}-\mathbb{E}T_{0b}|\bigr| \le \varepsilon_{6.8}(n,b),$$
where
$$\varepsilon_{6.8}(n,b) = O\bigl(n^{-1}b[n^{-1}b+n^{-\beta_{12}+\delta}]\bigr)$$
for any $\delta>0$ under Conditions (A0), (D1) and (B12), with $\beta_{12}$ as in Theorem 9.10. The proof has much the same structure as that of Theorem 6.7, but uses the sharper estimates of Theorem 9.14 in place of those of Theorem 9.7.

As before, we begin with the formula
$$d_{TV}\bigl(\mathcal{L}(C[1,b]),\mathcal{L}(Z[1,b])\bigr) = \sum_{r\ge0}\mathbb{P}[T_{0b}=r]\Bigl\{1-\frac{\mathbb{P}[T_{bn}=n-r]}{\mathbb{P}[T_{0n}=n]}\Bigr\}_+.$$
Now we observe that
$$\Bigl|\sum_{r\ge0}\mathbb{P}[T_{0b}=r]\Bigl\{1-\frac{\mathbb{P}[T_{bn}=n-r]}{\mathbb{P}[T_{0n}=n]}\Bigr\}_+ - \sum_{r=0}^{\lfloor n/2\rfloor}\frac{\mathbb{P}[T_{0b}=r]}{\mathbb{P}[T_{0n}=n]}\Bigl\{\sum_{s=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=s]\bigl(\mathbb{P}[T_{bn}=n-s]-\mathbb{P}[T_{bn}=n-r]\bigr)\Bigr\}_+\Bigr|$$
$$\le \mathbb{P}[T_{0b}>n/2] + \sum_{r=0}^{\lfloor n/2\rfloor}\frac{\mathbb{P}[T_{0b}=r]}{\mathbb{P}[T_{0n}=n]}\Bigl|\sum_{s=\lfloor n/2\rfloor+1}^{n}\mathbb{P}[T_{0b}=s]\bigl(\mathbb{P}[T_{bn}=n-s]-\mathbb{P}[T_{bn}=n-r]\bigr)\Bigr|$$
$$\le 4n^{-2}\mathbb{E}T^2_{0b} + \Bigl(\max_{n/2<s\le n}\mathbb{P}[T_{0b}=s]\Bigr)\Bigl/\mathbb{P}[T_{0n}=n] + \mathbb{P}[T_{0b}>n/2]$$
$$\le 8n^{-2}\mathbb{E}T^2_{0b} + \frac{3\varepsilon_{9.5(2)}(n/2,b)}{\theta P_\theta[0,1]}, \eqno(12.6)$$
using Corollary 9.5(2) for the last estimate. Then, from Theorem 9.14, we have
$$\Bigl|\sum_{r=0}^{\lfloor n/2\rfloor}\frac{\mathbb{P}[T_{0b}=r]}{\mathbb{P}[T_{0n}=n]}\Bigl\{\sum_{s=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=s]\bigl(\mathbb{P}[T_{bn}=n-s]-\mathbb{P}[T_{bn}=n-r]\bigr)\Bigr\}_+ - \sum_{r=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=r]\Bigl\{\sum_{s=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=s]\frac{(s-r)(1-\theta)}{n+1}\Bigr\}_+\Bigr|$$
$$\le \frac{1}{n^2\,\mathbb{P}[T_{0n}=n]}\sum_{r\ge0}\mathbb{P}[T_{0b}=r]\sum_{s\ge0}\mathbb{P}[T_{0b}=s]\,|s-r|\Bigl\{\varepsilon_{9.14}(n,b)+2(r\vee s)|1-\theta|n^{-1}\bigl[K_{0\theta}+4\varphi^*_{9.8}(n)\bigr]\Bigr\}$$
$$\le \frac{6}{\theta P_\theta[0,1]}\,n^{-1}\mathbb{E}T_{0b}\,\varepsilon_{9.14}(n,b) + 4|1-\theta|n^{-2}\mathbb{E}T^2_{0b}\bigl[K_{0\theta}+4\varphi^*_{9.8}(n)\bigr]\Bigl(\frac{3}{\theta P_\theta[0,1]}\Bigr), \eqno(12.7)$$
the last line using Theorem 11.10. The approximation in (12.7) is further simplified by noting that
$$\sum_{r=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=r]\,\Bigl|\Bigl\{\sum_{s=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=s]\frac{(s-r)(1-\theta)}{n+1}\Bigr\}_+ - \Bigl\{\sum_{s\ge0}\mathbb{P}[T_{0b}=s]\frac{(s-r)(1-\theta)}{n+1}\Bigr\}_+\Bigr|$$
$$\le \sum_{r=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=r]\sum_{s>\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=s]\,(s-r)\,\frac{|1-\theta|}{n+1} \le |1-\theta|\,n^{-1}\mathbb{E}\bigl(T_{0b}\mathbf{1}\{T_{0b}>n/2\}\bigr) \le 2|1-\theta|n^{-2}\mathbb{E}T^2_{0b}, \eqno(12.8)$$
and then by observing that
$$\sum_{r>\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=r]\Bigl\{\sum_{s\ge0}\mathbb{P}[T_{0b}=s]\frac{|(s-r)(1-\theta)|}{n+1}\Bigr\} \le n^{-1}|1-\theta|\bigl(\mathbb{E}T_{0b}\,\mathbb{P}[T_{0b}>n/2]+\mathbb{E}(T_{0b}\mathbf{1}\{T_{0b}>n/2\})\bigr) \le 4|1-\theta|n^{-2}\mathbb{E}T^2_{0b}. \eqno(12.9)$$
Combining the contributions of (12.6)–(12.9), we thus find that
$$\Bigl|d_{TV}\bigl(\mathcal{L}(C[1,b]),\mathcal{L}(Z[1,b])\bigr) - (n+1)^{-1}\sum_{r\ge0}\mathbb{P}[T_{0b}=r]\Bigl\{\sum_{s\ge0}\mathbb{P}[T_{0b}=s](s-r)(1-\theta)\Bigr\}_+\Bigr| \le \varepsilon_{6.8}(n,b)$$
$$= \frac{3}{\theta P_\theta[0,1]}\Bigl\{\varepsilon_{9.5(2)}(n/2,b)+2n^{-1}\mathbb{E}T_{0b}\,\varepsilon_{9.14}(n,b)\Bigr\} + 2n^{-2}\mathbb{E}T^2_{0b}\Bigl\{4+3|1-\theta|+\frac{24|1-\theta|\varphi^*_{9.8}(n)}{\theta P_\theta[0,1]}\Bigr\}. \eqno(12.10)$$
The quantity $\varepsilon_{6.8}(n,b)$ is seen to be of the order claimed under Conditions (A0), (D1) and (B12), by invoking Lemma 6.3, Theorem 9.14 and Corollaries 9.8 and 9.5(2), provided that $S(\infty)<\infty$; this supplementary condition can be removed if $\varphi^*_{9.8}(n)$ is replaced by $\varphi^*_{9.11}(n)$ in the definition of $\varepsilon_{6.8}(n,b)$, which by Corollary 9.11 has the required order without the restriction on the $r_i$ implied by assuming that $S(\infty)<\infty$.

Finally, a direct calculation now shows that
$$\sum_{r\ge0}\mathbb{P}[T_{0b}=r]\Bigl\{\sum_{s\ge0}\mathbb{P}[T_{0b}=s](s-r)(1-\theta)\Bigr\}_+ = \tfrac12|1-\theta|\,\mathbb{E}|T_{0b}-\mathbb{E}T_{0b}|,$$
completing the proof. □
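The "direct calculation" behind the final identity is the standard fact that $\mathbb{E}(\mu-T)_+ = \tfrac12\mathbb{E}|T-\mu|$ when $\mu=\mathbb{E}T$. A small numerical illustration (ours, using an arbitrary truncated Poisson pmf in place of $\mathcal{L}(T_{0b})$):

```python
import math

# Check: sum_r p(r) {sum_s p(s)(s-r)(1-theta)}_+ = (1/2)|1-theta| * E|T - ET|,
# for an arbitrary pmf p; the identity does not depend on the particular law.
weights = [math.exp(-3.0) * 3.0 ** r / math.factorial(r) for r in range(40)]
total = sum(weights)
p = [w / total for w in weights]
mu = sum(r * p[r] for r in range(40))

for theta in (0.5, 3.0):
    # Inner sum collapses: sum_s p(s)(s-r)(1-theta) = (1-theta)(mu - r).
    lhs = sum(p[r] * max(0.0, (1 - theta) * (mu - r)) for r in range(40))
    rhs = 0.5 * abs(1 - theta) * sum(p[r] * abs(r - mu) for r in range(40))
    assert abs(lhs - rhs) < 1e-12
print("mean absolute deviation identity verified")
```

Note that for $\theta<1$ the positive part selects $r<\mu$, and for $\theta>1$ it selects $r>\mu$; in either case the identity holds because $\mathbb{E}(T-\mu)=0$.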

12.4 Proof of Theorem 6.9

For $n\ge n_0$ and $3\theta\le b<n$, we have to show that
$$d_{TV}\bigl(\mathcal{L}(C[b+1,n]),\mathcal{L}(C^*[b+1,n])\bigr) \le \varepsilon_{6.9}(n,b),$$
where
$$\varepsilon_{6.9}(n,b) = O\Bigl(\Bigl[b\wedge\Bigl(\frac{n}{\log n}\Bigr)\Bigr]^{-(1\wedge\theta\wedge g_1\wedge a_1)+\delta}\Bigr)$$
for any $\delta>0$, under Conditions (A0) and (B01). So fix $b$, $3\theta\le b<n/4$. For any $0\le l\le n$, take any $y\in\{0,1\}^R$ such that $\sum_{i,r}iy_{ir}=n-l$, where $R = R_{bn} = \sum_{i=b+1}^n r_i$. Then
$$\frac{\mathbb{P}[Z[b+1,n]=y]}{\mathbb{P}[Z^*[b+1,n]=y]} = \prod_{i=b+1}^{n}\prod_{r=1}^{r_i}\Biggl[\Biggl\{\frac{1-\frac{\theta}{ir_i}(1+E_{i0})}{1-\frac{\theta}{ir_i}(1+E_{i0}(Z^*))}\Biggr\}^{1-y_{ir}}\Biggl\{\frac{1+\varepsilon_{i1}}{1+\varepsilon_{i1}(Z^*)}\Biggr\}^{y_{ir}}\Biggr]$$
$$\ge 1 - \sum_{i=b+1}^{n}\frac{\theta r_i\{|E_{i0}|+\theta/(2ir_i)\}}{ir_i-\theta(1+\theta/(2ir_i))} - \sum_{i=b+1}^{n}\sum_{r=1}^{r_i}\frac{y_{ir}|\varepsilon_{i1}|}{1-\theta/(ir_i)}. \eqno(12.11)$$
Hence, recalling (6.3),
$$\frac{\mathbb{P}[C[b+1,n]=y]}{\mathbb{P}[C^*[b+1,n]=y]} = \frac{\mathbb{P}[Z[b+1,n]=y]}{\mathbb{P}[Z^*[b+1,n]=y]}\cdot\frac{\mathbb{P}[T_{0b}(Z)=l]}{\mathbb{P}[T_{0b}(Z^*)=l]}\cdot\frac{\mathbb{P}[T_{0n}(Z^*)=n]}{\mathbb{P}[T_{0n}(Z)=n]}$$
$$\ge 1 - \frac{18\theta}{11}\sum_{i=b+1}^{n}i^{-1}\bigl(\rho_i+\theta/(2ir_i)\bigr) - \frac32\sum_{i=b+1}^{n}\sum_{r=1}^{r_i}y_{ir}|\varepsilon_{i1}| - 2c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n) - D(l),$$
using Theorem 11.10 and (6.34), where
$$0\le D(l) = \Bigl\{1-\frac{\mathbb{P}[T_{0b}(Z)=l]}{\mathbb{P}[T_{0b}(Z^*)=l]}\Bigr\}_+ \le 1.$$
Hence, for any $A\subset\mathbb{Z}^R_+$,
$$\mathbb{P}[C[b+1,n]\in A] \ge \mathbb{P}[C^*[b+1,n]\in A] - \mathbb{P}[C^*[b+1,n]\in A\setminus\{0,1\}^R]$$
$$- \frac{18\theta}{11}\sum_{i=b+1}^{n}i^{-1}\bigl(\rho_i+\theta/(2ir_i)\bigr) - \frac32\sum_{i=b+1}^{n}\sum_{r=1}^{r_i}|\varepsilon_{i1}|\,\mathbb{P}[C^*_{i1}=1] - 2c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n) - \sum_{l=0}^{n}\mathbb{P}[T_{0b}(C^*)=l]D(l). \eqno(12.12)$$
The remaining elements of (12.12) are now estimated as follows. First, from Lemma 13.2,
$$\mathbb{P}[C^*[b+1,n]\in A\setminus\{0,1\}^R] \le b^{-1}c_{13.2}. \eqno(12.13)$$
Then, using (9.6),
$$\mathbb{P}[C^*_{i1}=1] = e^{\theta/ir_i}\,\mathbb{P}[Z^*_{i1}=1]\,\frac{\mathbb{P}[T^{(i)}_{0n}(Z^*)=n-i]}{\mathbb{P}[T_{0n}(Z^*)=n]}\,e^{-\theta/ir_i} \le \frac{\theta}{ir_i}\,\frac{\mathbb{P}[T_{0n}(Z^*)=n-i]}{\mathbb{P}[T_{0n}(Z^*)=n]} \le \frac{2\theta}{ir_i}$$
for $i\le n/2$, so that
$$\frac32\sum_{i=b+1}^{\lfloor n/2\rfloor}\sum_{r=1}^{r_i}|\varepsilon_{i1}|\,\mathbb{P}[C^*_{i1}=1] \le 3\theta\sum_{i=b+1}^{n}i^{-1}\rho_i.$$
For $i>n/2$, the simple inequality
$$\sum_{i=\lfloor n/2\rfloor+1}^{n}\sum_{r=1}^{r_i}\mathbf{1}\{C^*_{ir}=1\} \le 1$$
suffices to prove that
$$\frac32\sum_{i=\lfloor n/2\rfloor+1}^{n}\sum_{r=1}^{r_i}|\varepsilon_{i1}|\,\mathbb{P}[C^*_{i1}=1] \le \frac32\,\rho^*_{n/2}.$$
Finally, for $l\le n/2$ and $b<n/4$,
$$\frac{\mathbb{P}[T_{0b}(C^*)=l]}{\mathbb{P}[T_{0b}(Z^*)=l]} = \frac{\mathbb{P}[T_{bn}(Z^*)=n-l]}{\mathbb{P}[T_{0n}(Z^*)=n]} \le 1+c_{11.12}, \eqno(12.14)$$
from Lemma 11.12, giving
$$\sum_{l=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}(C^*)=l]D(l) \le (1+c_{11.12})\sum_{l=0}^{n}\mathbb{P}[T_{0b}(Z^*)=l]D(l) = (1+c_{11.12})\,d_{TV}\bigl(\mathcal{L}(T_{0b}(Z^*)),\mathcal{L}(T_{0b}(Z))\bigr) \le (1+c_{11.12})\,\varepsilon_{10.5}(b),$$
from Theorem 10.5. For $l>n/2$,
$$\frac{\mathbb{P}[T_{0b}(C^*)=l]}{\mathbb{P}[T_{0b}(Z^*)=l]} \le \frac{1}{\mathbb{P}[T_{0n}(Z^*)=n]} \le \frac{n}{\theta c_{11.9}},$$
from Lemma 11.9 and (12.14), and $D(l)\le1$; also, if $b\le n/\{4\theta\log n\}$ and $b<n$, then $\theta\log(b+1)\le n/(4b)$, and hence
$$\mathbb{P}[T_{0b}(Z^*)>n/2] \le \mathrm{Po}(\theta\log(b+1))\{(n/2b,\infty)\} \le \frac{2}{\sqrt{2\pi}}\exp\Bigl\{-\frac{n}{24b}\Bigr\},$$
from Barbour, Holst and Janson (1992), Proposition A.2.3 (i) and (ii); if also $b\le b(n) = n/(72\log n)$, then $\exp\{-n/(24b)\}\le n^{-3}$. Together, these give
$$\sum_{l=\lfloor n/2\rfloor+1}^{n}\mathbb{P}[T_{0b}(C^*)=l]D(l) \le \{n^2\theta c_{11.9}\}^{-1}.$$
Collecting the various estimates, and substituting them into (12.12), we find, for any $A\subset\mathbb{Z}^R_+$, that
$$\mathbb{P}[C^*[b+1,n]\in A] - \mathbb{P}[C[b+1,n]\in A] \le \varepsilon_{6.9}(n,b)$$
$$= \varepsilon_{10.5}(b)(1+c_{11.12}(\theta)) + \frac{1}{n^2\theta c_{11.9}} + 5\theta\sum_{i=b+1}^{n}i^{-1}\rho_i + \frac32\,\rho^*_{n/2} + 2c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n) + b^{-1}(\theta^2+c_{13.2}), \eqno(12.15)$$
provided that $b\le b(n)$; if $b\ge b(n)$, define $\varepsilon_{6.9}(n,b) = \varepsilon_{6.9}(n,b(n))$, since restricting the vector of random variables under consideration can only make the approximation better. Under Conditions (A0) and (B01), $\varepsilon_{6.9}(n,b)$ is of the required asymptotic order, as can be seen from Proposition 6.1, Theorem 10.5 and Corollary 11.7, and the theorem is proved. □


12.5 Proof of Theorem 6.10

For any $b$ such that $3\theta\le b<n/8$, we wish to prove that
$$d_{TV}\bigl(\mathcal{L}(C^{(n)}),\mathcal{L}(Z^{(b,n)})\bigr) \le \varepsilon_{6.10}(n,b),$$
where $\varepsilon_{6.10}(n,b)$ is of order $O\bigl(n^{-1}b+b^{-\tilde g_1+\delta}\bigr)$ for any $\delta>0$, under Conditions (A0), (D1) and (B11), where $\tilde g_1 = 1\wedge g_1$.

Let
$$P_{bl} = \mathcal{L}\bigl(C[b+1,n]\,\bigm|\,C[1,b],\;T_{0b}(C)=l\bigr) = \mathcal{L}\bigl(C[b+1,n]\,\bigm|\,T_{0b}(C)=l\bigr)$$
and
$$P^*_{bl} = \mathcal{L}\bigl(C^*[b+1,n]\,\bigm|\,C^*[1,b],\;T_{0b}(C^*)=l\bigr) = \mathcal{L}\bigl(C^*[b+1,n]\,\bigm|\,T_{0b}(C^*)=l\bigr)$$
denote the conditional distributions of the large components of $C$ and $C^*$, given the small components. Then, from the definition of $Z^{(b,n)}$, it follows that
$$d_{TV}\bigl(\mathcal{L}(C^{(n)}),\mathcal{L}(Z^{(b,n)})\bigr) \le d_{TV}\bigl(\mathcal{L}(C[1,b]),\mathcal{L}(Z[1,b])\bigr) + \sup_{0\le l\le n/2}d_{TV}(P_{bl},P^*_{bl}) + \mathbb{P}[T_{0b}(Z)>n/2].$$
The first of these terms is at most $\varepsilon_{6.7}(n,b)$, by Theorem 6.7, and the third is at most $2n^{-1}\mathbb{E}T_{0b}$, by Chebyshev's inequality. It thus remains to bound $d_{TV}(P_{bl},P^*_{bl})$ for $0\le l\le n/2$.

So fix $0\le l\le n/2$, and set $R = R_{bn} = \sum_{i=b+1}^n r_i$ as before. Then, for any $y\in\{0,1\}^R$ such that $\sum_{i,r}iy_{ir}=n-l$,
$$P^*_{bl}(y) = \frac{\mathbb{P}[Z^*[b+1,n]=y]}{\mathbb{P}[T_{bn}(Z^*)=n-l]} \quad\text{and}\quad P_{bl}(y) = \frac{\mathbb{P}[Z[b+1,n]=y]}{\mathbb{P}[T_{bn}(Z)=n-l]},$$
and hence
$$\frac{dP_{bl}}{dP^*_{bl}}(y) = \prod_{i=b+1}^{n}\prod_{r=1}^{r_i}\Biggl[\Biggl\{\frac{1-\frac{\theta}{ir_i}(1+E_{i0})}{1-\frac{\theta}{ir_i}(1+E^*_{i0})}\Biggr\}^{1-y_{ir}}\Biggl\{\frac{1+\varepsilon_{i1}}{1+\varepsilon^*_{i1}}\Biggr\}^{y_{ir}}\Biggr]\times\frac{\mathbb{P}[T_{bn}(Z^*)=n-l]}{\mathbb{P}[T_{bn}(Z)=n-l]}$$
$$\ge 1 - \frac{18\theta}{11}\sum_{i=b+1}^{n}i^{-1}\bigl(\rho_i+\theta/(2ir_i)\bigr) - \frac32\sum_{i=b+1}^{n}\sum_{r=1}^{r_i}y_{ir}|\varepsilon_{i1}| - 2c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n),$$
much as for (12.11), but where the last inequality now also uses Theorem 11.10. Now argue as for (12.12), incorporating (12.13), to obtain
$$d_{TV}(P_{bl},P^*_{bl}) \le \frac{18\theta}{11}\sum_{i=b+1}^{n}i^{-1}\rho_i + \frac32\sum_{i=b+1}^{n}r_i\rho_i\,\mathbb{P}[C^*_{i1}=1\,|\,T_{0b}(C^*)=l] + 2c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n) + b^{-1}(\theta^2+c_{13.2}), \eqno(12.16)$$
since, for $i>3\theta$, $\theta\le ir_i/3$. Now, in view of (9.6),
$$\mathbb{P}[C^*_{i1}=1\,|\,T_{0b}(C^*)=l] = e^{\theta/ir_i}\,\mathbb{P}[Z^*_{i1}=1]\,\frac{\mathbb{P}[T_{bn}(Z^*)-iZ^*_{i1}=n-l-i]}{\mathbb{P}[T_{bn}(Z^*)=n-l]}\,e^{-\theta/ir_i}$$
$$\le \frac{\theta}{ir_i}\,\frac{\mathbb{P}[T_{bn}(Z^*)=n-l-i]}{\mathbb{P}[T_{bn}(Z^*)=n-l]} \le \frac{\theta}{ir_i}\Bigl(\frac{n-l}{n-l-i}\Bigr)\frac{\mathbb{P}[T_{bn}(Z^*)<n-l-i-b]}{\mathbb{P}[T_{bn}(Z^*)<n-l-b]} \le \frac{2\theta}{ir_i}, \eqno(12.17)$$
if $i\le(n-l)/2$. On the other hand, on $\{T_{bn}(Z^*)=n-l\}$,
$$\sum_{i=\lfloor(n-l)/2\rfloor+1}^{n-l}\sum_{r=1}^{r_i}\mathbf{1}\{Z^*_{ir}=1\} \le 1,$$
and thus
$$\sum_{i=\lfloor(n-l)/2\rfloor+1}^{n-l}r_i\,\mathbb{P}[C^*_{i1}=1\,|\,T_{0b}(C^*)=l] \le 1. \eqno(12.18)$$
Combining (12.16)–(12.18), it follows that, for $0\le l\le n/2$,
$$d_{TV}(P_{bl},P^*_{bl}) \le 5\theta\sum_{i=b+1}^{n}i^{-1}\rho_i + \frac32\,\rho^*_{(n-l)/2} + 2c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n) + b^{-1}(\theta^2+c_{13.2}), \eqno(12.19)$$
and taking
$$\varepsilon_{6.10}(n,b) = 5\theta\sum_{i=b+1}^{n}i^{-1}\rho_i + \frac32\,\rho^*_{n/4} + 2c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n) + b^{-1}(\theta^2+c_{13.2}) + \varepsilon_{6.7}(n,b) + 2n^{-1}\mathbb{E}T_{0b} \eqno(12.20)$$
proves the theorem. The order estimates under Conditions (A0), (D1) and (B11) follow from Proposition 6.1, Corollary 11.7, Theorem 6.7 and Lemma 6.3. □

12.6 Proof of Theorem 6.11

Taking $Z^{(b,n)} = (Z_1,\ldots,Z_b,Z^*_{b+1},\ldots,Z^*_n)$, and defining $C^{(b,n)}$ from the Conditioning Relation, we are to show how close $\mathcal{L}(C^{(b,n)})$ is to $\mathcal{L}(C^{(n)})$. The proof is much as for Theorem 6.9, but is a little easier.

Using the dissection $(r_i,\,i\ge1)$ appropriate to $Z$ also for the random variables $Z^*_j$, we obtain, as in (12.11),
$$\frac{\mathbb{P}[Z[b+1,n]=y]}{\mathbb{P}[Z^{(b,n)}[b+1,n]=y]} \ge 1 - \frac{18\theta}{11}\sum_{i=b+1}^{n}i^{-1}\bigl(\rho_i+\theta/(2ir_i)\bigr) - \frac32\sum_{i=b+1}^{n}\sum_{r=1}^{r_i}y_{ir}|\varepsilon_{i1}|,$$
where $y\in\{0,1\}^R$ and $R = R_{bn} = \sum_{i=b+1}^n r_i$. Hence, for any $c\in\mathbb{Z}^{R'}_+$, where $R' = R_{0b} = \sum_{i=1}^b r_i$, such that $\sum_{i=1}^b\sum_{j=1}^{r_i}ic_{ij} + \sum_{i=b+1}^n\sum_{j=1}^{r_i}iy_{ij} = n$, it follows that
$$\frac{\mathbb{P}[C^{(n)}[1,n]=(c,y)]}{\mathbb{P}[C^{(b,n)}[1,n]=(c,y)]} = \frac{\mathbb{P}[Z[b+1,n]=y]}{\mathbb{P}[Z^*[b+1,n]=y]}\cdot\frac{\mathbb{P}[T_{0n}(Z^{(b,n)})=n]}{\mathbb{P}[T_{0n}(Z)=n]}$$
$$\ge 1 - \frac{18\theta}{11}\sum_{i=b+1}^{n}i^{-1}\bigl(\rho_i+\theta/(2ir_i)\bigr) - \frac32\sum_{i=b+1}^{n}\sum_{r=1}^{r_i}y_{ir}|\varepsilon_{i1}| - 4c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n),$$
from Theorem 11.10. The remaining argument is as in the proof of Theorem 6.9, giving
$$d_{TV}\bigl(\mathcal{L}(C^{(n)}),\mathcal{L}(C^{(b,n)})\bigr) \le \varepsilon_{6.11}(n,b) = 5\theta\sum_{i=b+1}^{n}i^{-1}\rho_i + \frac32\,\rho^*_{n/2} + 4c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n) + b^{-1}(\theta^2+c_{13.2}). \qquad\square \eqno(12.21)$$
If Conditions (A0) and (B01) hold, then $\varepsilon_{6.11}(n,b) = O\bigl(b^{-(1\wedge g_1\wedge a_1)+\delta}\bigr)$ for any $\delta>0$, by Proposition 6.1 and Corollary 11.7.

12.7 Proof of Theorem 6.12

For $0\le b<(n/8)\min\{1,\,2/[\theta(1+\mu^*_0)]\}$ and $n\ge n_0$, and for any $y\in\mathbb{Z}^b_+$ for which $T_{0b}(y) = \sum_{i=1}^b iy_i\le n/2$, we must show that
$$\Bigl|\frac{\mathbb{P}[C[1,b]=y]}{\mathbb{P}[Z[1,b]=y]}-1\Bigr| \le \varepsilon_{6.12}(n,b),$$
where $\varepsilon_{6.12}(n,b) = O\bigl(n^{-1}(b+T_{0b}(y))\bigr)$ under Conditions (A0), (D1) and (B11).

From the definition of the distribution of $C$, and dropping the argument $Z$ where possible, we have
$$1-\frac{\mathbb{P}[C[1,b]=y]}{\mathbb{P}[Z[1,b]=y]} = 1-\frac{\mathbb{P}[T_{bn}=n-T_{0b}(y)]}{\mathbb{P}[T_{0n}=n]} = \frac{1}{\mathbb{P}[T_{0n}=n]}\sum_{s\ge0}\mathbb{P}[T_{0b}=s]\bigl(\mathbb{P}[T_{bn}=n-s]-\mathbb{P}[T_{bn}=n-T_{0b}(y)]\bigr).$$
Now, much as in the proof of Theorem 6.7, we deduce that
$$\Bigl|\frac{\mathbb{P}[C[1,b]=y]}{\mathbb{P}[Z[1,b]=y]}-1\Bigr| \le \sum_{s=0}^{\lfloor n/2\rfloor}\frac{\mathbb{P}[T_{0b}=s]\,\bigl|\mathbb{P}[T_{bn}=n-s]-\mathbb{P}[T_{bn}=n-T_{0b}(y)]\bigr|}{\mathbb{P}[T_{0n}=n]} + \sum_{s=\lfloor n/2\rfloor+1}^{n}\frac{\mathbb{P}[T_{0b}=s]\,\bigl(\mathbb{P}[T_{bn}=n-s]+\mathbb{P}[T_{bn}=n-T_{0b}(y)]\bigr)}{\mathbb{P}[T_{0n}=n]}.$$
The second term is bounded by
$$\frac{1}{\mathbb{P}[T_{0n}=n]}\Bigl(\max_{n/2<s\le n}\mathbb{P}[T_{0b}=s]\Bigr) + \mathbb{P}[T_{0b}>n/2]\,\frac{\mathbb{P}[T_{bn}=n-T_{0b}(y)]}{\mathbb{P}[T_{0n}=n]},$$
in which the first element, from Corollary 9.5(1) and Theorem 11.10, is no larger than $6\varepsilon^*_{9.5(1)}(n,b)/\{\theta P_\theta[0,1]\}$, and the second element is at most
$$2n^{-1}\mathbb{E}T_{0b}\,\mathbb{P}[T_{bn}=n-T_{0b}(y)]/\mathbb{P}[T_{0n}=n] \le 2n^{-1}\mathbb{E}T_{0b} + \tfrac12\bigl|1-\mathbb{P}[T_{bn}=n-T_{0b}(y)]/\mathbb{P}[T_{0n}=n]\bigr|,$$
by Chebyshev's inequality, noting that $2n^{-1}\mathbb{E}T_{0b}\le1/2$, in view of Lemma 6.3 and the restriction on the range of $b$. The first term, using Theorem 11.10 and Corollary 9.8, is at most
$$\frac{3n}{\theta P_\theta[0,1]}\,n^{-2}\varphi^*_{9.8}(n)\sum_{s=0}^{\lfloor n/2\rfloor}\mathbb{P}[T_{0b}=s]\,|T_{0b}(y)-s| \le \frac{3\varphi^*_{9.8}(n)}{\theta P_\theta[0,1]}\Bigl(\frac{\mathbb{E}T_{0b}+T_{0b}(y)}{n}\Bigr).$$
Combining these bounds, we find that
$$\Bigl|\frac{\mathbb{P}[C[1,b]=y]}{\mathbb{P}[Z[1,b]=y]}-1\Bigr| \le \varepsilon_{6.12}(n,b) = 2\Biggl\{\frac{6\varepsilon^*_{9.5(1)}(n,b)}{\theta P_\theta[0,1]} + \frac{3\varphi^*_{9.8}(n)}{\theta P_\theta[0,1]}\Bigl(\frac{\mathbb{E}T_{0b}+T_{0b}(y)}{n}\Bigr) + \frac{2\mathbb{E}T_{0b}}{n}\Biggr\}, \eqno(12.22)$$
proving the theorem. The order estimates under Conditions (A0), (D1) and (B11) follow from Lemma 6.3, Corollary 9.5(1) and Corollary 9.8, if $S(\infty)<\infty$; otherwise, use Corollary 9.11 instead, to replace $\varphi^*_{9.8}(n)$ by $\varphi^*_{9.11}(n)$. □


12.8 Proof of Theorem 6.13

Fix $0<\eta<1$ and $r\ge1$. Choose any $n>m_1>\cdots>m_r>n\eta$ for which $M_r = \sum_{l=1}^r m_l\le n(1-\eta)$, and write $x_l = n^{-1}m_l$, $1\le l\le r$, and $X_r = \sum_{l=1}^r x_l$. Then we must show that
$$\Bigl|\frac{n^r\,\mathbb{P}[C_{m_r}\ge1,\;C[m_r+1,n]=y]}{f^{(r)}_\theta(x_1,\ldots,x_r)}-1\Bigr| \le \varepsilon_{6.13}(n,\eta),$$
where $f^{(r)}_\theta$ is as in (4.87) and where, under Conditions (A0) and (B11), $\varepsilon_{6.13}(n,\eta) = O\bigl(n^{-(\theta\wedge g_1)+\delta}\bigr)$ for any $\delta>0$, for each fixed $0<\eta<1$; here, for $m_r+1\le i\le n$, $y_i = 1$ if $i = m_l$, $1\le l<r$, and $y_i = 0$ otherwise. To do so, we need the following lemma, which involves a calculation analogous to those in (4.85), (4.90) and (5.10).

Lemma 12.1 Let $m_0 = m_0(Z)$ be such that, for all $m\ge m_0$,
$$|\varepsilon_{m1}| + \theta m^{-1}(1+|E_{m0}|) \le 1/2.$$
Then, for all $s\ge m\ge m_0$,
$$\frac{\sum_{l\ge2}\mathbb{P}[Z_m=l]\,\mathbb{P}[T_{0,m-1}=s-lm]}{\mathbb{P}[Z_m=1]\,\mathbb{P}[T_{0,m-1}=s-m]} \le \varphi_{12.1}(m,s),$$
where
$$\varphi_{12.1}(m,s) = \varphi_{12.1}(m,s,Z) = O\bigl(m^{(1-a_1)_+-\theta}\bigr)$$
under Conditions (A0) and (B01), uniformly in $s/m\ge1+\eta$, for any fixed $\eta>0$.

Proof. For $s=m$ there is nothing to prove. Otherwise, direct computation gives
$$\mathbb{P}[Z_m\ge2] \le r_m\,\mathbb{P}[Z_{m1}\ge2] + \tfrac12 r_m(r_m-1)\,\mathbb{P}[Z_{m1}=Z_{m2}=1],$$
and hence
$$\sum_{l\ge2}\mathbb{P}[Z_m=l]\,\mathbb{P}[T_{0,m-1}=s-lm] \le \mathbb{P}[Z_m\ge2]\,\max_{l\ge2}\mathbb{P}[T_{0,m-1}=s-lm] \le \Bigl\{\frac{\theta^2}{2m^2}(1+|\varepsilon_{m1}|)^2 + \frac{\theta}{m}E_{m1}\Bigr\}\max_{l\ge2}\mathbb{P}[T_{0,m-1}=s-lm],$$
whereas
$$\mathbb{P}[Z_m=1] = r_m\,\mathbb{P}[Z_{m1}=1]\,\mathbb{P}[Z_{m1}=0]^{r_m-1} \ge \theta m^{-1}(1-|\varepsilon_{m1}|)\Bigl\{1-\frac{\theta}{mr_m}(1+E_{m0})\Bigr\}^{r_m} \ge \theta m^{-1}\bigl\{1-|\varepsilon_{m1}|-\theta m^{-1}(1+|E_{m0}|)\bigr\} \ge \theta/(2m).$$
Now use Corollary 9.3 to bound $\mathbb{P}[T_{0,m-1}=s-lm]$ from above, and Theorem 11.11(ii) to bound $\mathbb{P}[T_{0,m-1}=s-m]$ from below, giving the lemma, with
$$\varphi_{12.1}(m,s) = \frac{K^{(1)}_m\,m^{-\theta}(s-m)^{\theta}\,\theta m^{-1}\bigl\{(1+|\varepsilon_{m1}|)^2+2E_{m1}\bigr\}}{\tilde\varphi_{12.1}(m,s)},$$
where
$$\tilde\varphi_{12.1}(m,s) = \theta P_\theta\Bigl(\frac{s-m}{m-1}-1,\,\frac{s-m}{m-1}\Bigr) - 2\varepsilon_{10.12}(m-1) - (m-1)^{-\theta}(s-m+1)^{-(1-\theta)}\varphi_{11.6}(m-1,\,s-m).$$
The order estimates follow from Proposition 6.1, Theorem 10.12 and Theorem 11.6. □

We now complete the proof in two steps. First, we show that
$$\Bigl|\frac{\mathbb{P}[C_{m_r}\ge1;\;C[m_r+1,n]=y]}{\mathbb{P}[C^*_{m_r}=1;\;C^*[m_r+1,n]=y]}-1\Bigr| \le e\,\varphi^{(1)}_{6.13}(n,y), \eqno(12.23)$$
where
$$\varphi^{(1)}_{6.13}(n,y) = \varphi_{12.1}(m_r,\,n-M_r+m_r) + 2c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n) + 2\Bigl[\theta\sum_{i=m_r}^{n}i^{-1}\rho_i + \sum_{l=1}^{r}|\varepsilon_{m_l,1}| + \frac{\theta^2}{2m_r} + \frac{r\theta}{m_r}\Bigr]$$
$$\qquad + \frac{(m_r-1)^{-\theta}(n-M_r+1)^{-(1-\theta)}\varphi_{11.6}(m_r-1,\,n-M_r)}{\theta P_\theta\Bigl(\dfrac{n-M_r}{m_r-1}-1,\,\dfrac{n-M_r}{m_r-1}\Bigr) - 2\varepsilon_{10.12}(m_r-1)},$$
provided that $n$ and $y$ are such that $\varphi^{(1)}_{6.13}(n,y)\le1$. Under Conditions (A0) and (B11), for each fixed $\eta$, $\varphi^{(1)}_{6.13}(n,y)$ is of order $O\bigl(n^{-(\theta\wedge g_1)+\delta}\bigr)$ for any $\delta>0$, uniformly over the admissible choices of $y$, as follows from Proposition 6.1, Theorem 11.6, Corollary 11.7, Lemma 12.1 and Theorem 10.12; hence this latter restriction is immaterial for the order statements. The estimation of (12.23) is similar to that in Theorem 6.9, except that now large as well as small values of the ratio have to be taken into account.

First, by Lemma 12.1, we have
$$\Bigl|\frac{\mathbb{P}[C_{m_r}\ge1;\;C[m_r+1,n]=y]}{\mathbb{P}[C_{m_r}=1;\;C[m_r+1,n]=y]}-1\Bigr| \le \varphi_{12.1}(m_r,\,n-M_r+m_r).$$
Then, as for (12.11) in the proof of Theorem 6.9, we compute that
$$\frac{\mathbb{P}[Z_{m_r}=1;\;Z[m_r+1,n]=y]}{\mathbb{P}[Z^*_{m_r}=1;\;Z^*[m_r+1,n]=y]} \le \exp\Bigl[\sum_{i=m_r}^{n}\frac{\theta}{i}|E_{i0}| + \sum_{l=1}^{r}\Bigl(|\varepsilon_{m_l,1}|+\frac{\theta}{m_l}\Bigr)\Bigr] \le \exp\Bigl[\theta\sum_{i=m_r}^{n}i^{-1}\rho_i + \sum_{l=1}^{r}|\varepsilon_{m_l,1}| + \frac{r\theta}{m_r}\Bigr],$$
and that
$$\frac{\mathbb{P}[Z_{m_r}=1;\;Z[m_r+1,n]=y]}{\mathbb{P}[Z^*_{m_r}=1;\;Z^*[m_r+1,n]=y]} \ge 1-2\Bigl[\theta\sum_{i=m_r}^{n}i^{-1}\rho_i + \sum_{l=1}^{r}|\varepsilon_{m_l,1}| + \frac{\theta^2}{2m_r} + \frac{r\theta}{m_r}\Bigr].$$
For the remaining elements, use Theorem 11.10 to show that
$$\Bigl|\frac{\mathbb{P}[T_{0n}(Z^*)=n]}{\mathbb{P}[T_{0n}(Z)=n]}-1\Bigr| \le 2c^{-1}_{11.9}n^{-1}\varphi_{11.7}(n),$$
and then observe, by Theorems 11.6 and 10.12 and by (9.6), that
$$\Bigl|\frac{\mathbb{P}[T_{0,m_r-1}(Z)=n-M_r]}{\mathbb{P}[T_{0,m_r-1}(Z^*)=n-M_r]}-1\Bigr| \le \frac{(m_r-1)^{-\theta}(n-M_r+1)^{-(1-\theta)}\varphi_{11.6}(m_r-1,\,n-M_r)}{\theta P_\theta\Bigl(\dfrac{n-M_r}{m_r-1}-1,\,\dfrac{n-M_r}{m_r-1}\Bigr) - 2\varepsilon_{10.12}(m_r-1)}.$$
Combining these estimates, and observing that
$$\Bigl|\prod_{t=1}^{4}(1+\eta_t)-1\Bigr| \le e\sum_{t=1}^{4}|\eta_t|$$
if $\sum_{t=1}^{4}|\eta_t|\le1$, the bound given in (12.23) is proved.

The second step is to show that
$$\Bigl|\frac{n^r\,\mathbb{P}[C^*_{m_r}=1,\;C^*[m_r+1,n]=y]}{f^{(r)}_\theta(x_1,\ldots,x_r)}-1\Bigr| \le e\,\varphi^{(2)}_{6.13}(n,y), \eqno(12.24)$$
where
$$\varphi^{(2)}_{6.13}(n,y) = \varphi^{(2)}_{6.13}(n,y,Z) = \frac{\theta}{nx_r} + \frac{2\varepsilon_{10.12}(m_r-1) + (\theta\vee1)\Bigl\{\dfrac{1-X_r}{x_r(nx_r-1)}\Bigr\}^{\theta}}{\theta P_\theta\Bigl(\dfrac{1-X_r}{x_r}-1,\,\dfrac{1-X_r}{x_r}\Bigr)} + \frac{2\varepsilon_{10.12}(n)}{P_\theta[0,1]},$$
for any $n$ and $y$ such that $\varphi^{(2)}_{6.13}(n,y)\le1$ and that $\varepsilon_{10.12}(n)<\tfrac12 P_\theta[0,1]$. Since $\varphi^{(2)}_{6.13}(n,y)$ and $\varepsilon_{10.12}(n)$ are uniformly of order $O(n^{-\theta})$ as $n\to\infty$, both by Theorem 10.12, these restrictions are unimportant for the asymptotic order statements. Now, because $Z^*_i\sim\mathrm{Po}(\theta/i)$, and using (9.6), we have
$$\mathbb{P}[C^*_{m_r}=1,\;C^*[m_r+1,n]=y] = \frac{\mathbb{P}[Z^*_{m_r}=1,\;Z^*[m_r+1,n]=y,\;T_{0,m_r-1}(Z^*)=n-M_r]}{\mathbb{P}[T_{0n}(Z^*)=n]}$$
$$= \exp\Bigl\{-\theta\sum_{i=m_r}^{n}i^{-1}\Bigr\}\prod_{l=1}^{r}\Bigl(\frac{\theta}{m_l}\Bigr)\Bigl(\frac{n}{n-M_r}\Bigr)\times\frac{\mathbb{P}[n-M_r-m_r+1\le T_{0,m_r-1}(Z^*)<n-M_r]}{\mathbb{P}[T_{0n}(Z^*)<n]}$$
$$= n^{-r}\Biggl[\theta^r\prod_{l=1}^{r}\Bigl(\frac{1}{x_l}\Bigr)\Bigl(\frac{x_r^{\theta}}{1-X_r}\Bigr)\frac{P_\theta\Bigl(\dfrac{1-X_r}{x_r}-1,\,\dfrac{1-X_r}{x_r}\Bigr)}{P_\theta[0,1]}\Biggr]\times\exp\Bigl\{-\theta\Bigl(\sum_{i=nx_r}^{n}i^{-1}+\log x_r\Bigr)\Bigr\}$$
$$\qquad\times\;\frac{\mathbb{P}\Bigl[\dfrac{1-X_r}{x_r-n^{-1}}-1\le\dfrac{1}{m_r-1}T_{0,m_r-1}(Z^*)<\dfrac{1-X_r}{x_r-n^{-1}}\Bigr]}{P_\theta\Bigl(\dfrac{1-X_r}{x_r}-1,\,\dfrac{1-X_r}{x_r}\Bigr)}\times\frac{P_\theta[0,1]}{\mathbb{P}[n^{-1}T_{0n}(Z^*)<1]}.$$
The factor in square brackets is just $f^{(r)}_\theta(x_1,\ldots,x_r)$, from (4.87) and Corollary 4.8, and we also have
$$0\le\sum_{i=nx_r}^{n}i^{-1}+\log x_r\le\frac{1}{nx_r};$$
then, using Corollary 4.8 for $x\le1$ and (4.29), it follows that
$$\Bigl|P_\theta\Bigl(\frac{1-X_r}{x_r-n^{-1}}-1,\,\frac{1-X_r}{x_r-n^{-1}}\Bigr)-P_\theta\Bigl(\frac{1-X_r}{x_r}-1,\,\frac{1-X_r}{x_r}\Bigr)\Bigr| \le (\theta\vee1)\Bigl(\frac{1-X_r}{x_r}\Bigr)^{\theta}\Bigl(\frac{1}{nx_r-1}\Bigr)^{\theta}.$$
Finally, from Theorem 10.12,
$$\Bigl|\mathbb{P}\Bigl[\frac{1-X_r}{x_r-n^{-1}}-1\le\frac{1}{m_r-1}T_{0,m_r-1}(Z^*)<\frac{1-X_r}{x_r-n^{-1}}\Bigr]-P_\theta\Bigl(\frac{1-X_r}{x_r-n^{-1}}-1,\,\frac{1-X_r}{x_r-n^{-1}}\Bigr)\Bigr| \le 2\varepsilon_{10.12}(m_r-1)$$
and
$$\bigl|\mathbb{P}[n^{-1}T_{0n}(Z^*)<1]-P_\theta[0,1]\bigr| \le \varepsilon_{10.12}(n),$$
completing the proof of (12.24). The theorem now follows by combining (12.23) and (12.24), taking
$$\varepsilon_{6.13}(n,\eta) = e^2\bigl\{\varphi^{(1)}_{6.13}(n,y)+\varphi^{(2)}_{6.13}(n,y)\bigr\}; \eqno(12.25)$$
the order statements have already been discussed at (12.23) and (12.24). □

We also prove (6.37), that

nθIP[L1(n) = n] = Γ(θ + 1)e−χ(1 +O(n−β01+δ))

for any δ > 0 under Conditions (A0) and (B01), where β01 = (1∧θ∧g1∧a1)as usual, and χ is as defined in (12.26) below. Indeed, direct calculationusing the Conditioning Relation gives

\begin{align*}
\mathbb{P}[L_1(n) = n] &= \frac{\mathbb{P}[Z_n = 1]}{\mathbb{P}[T_{0n} = n]}\prod_{i=1}^{n-1}\mathbb{P}[Z_{i1} = 0]^{r_i}\\
&= \frac{\mathbb{P}[Z_n = 1]\,e^{-\theta h(n)}}{\mathbb{P}[T_{0n} = n]}\prod_{i=1}^{n-1}\bigl\{e^{\theta/i}\,\mathbb{P}[Z_{i1} = 0]^{r_i}\bigr\}.
\end{align*}

Now
\[
\log \mathbb{P}[Z_{i1} = 0] = \log(1 - \mathbb{P}[Z_{i1} \ge 1]) = -\frac{\theta}{ir_i}\bigl(1 + O(i^{-(g_1\wedge a_1\wedge 1)})\bigr)
\]
under Conditions (A0) and (B01), and hence
\[
\chi = \chi(Z) = \sum_{i\ge1}\bigl\{-r_i\log\mathbb{P}[Z_{i1} = 0] - \theta/i\bigr\} \qquad(12.26)
\]
is finite and positive, and
\[
\Bigl|e^{\chi}\prod_{i=1}^{n-1}\bigl\{e^{\theta/i}\,\mathbb{P}[Z_{i1} = 0]^{r_i}\bigr\} - 1\Bigr| = O(n^{-(g_1\wedge a_1\wedge1)});
\]
in addition,
\[
\mathbb{P}[Z_n = 1] = n^{-1}\theta(1 + \varepsilon_{n1})\,\mathbb{P}[Z_{n1} = 0]^{r_n-1} = n^{-1}\theta\bigl(1 + O(n^{-(g_1\wedge1)})\bigr),
\]
and, from Theorem 11.11 (ii),
\[
\mathbb{P}[T_{0n} = n] = n^{-1}\bigl\{p_\theta(1) + O(n^{-\beta_{01}+\delta})\bigr\}
\]

for any $\delta > 0$. Combining these estimates with Corollary 4.8 and the asymptotics of $h(n)$ gives (6.37).
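The calculation above can be checked numerically in the Ewens Sampling Formula special case $Z_i \sim \mathrm{Po}(\theta/i)$, where $r_i = 1$ and $\chi = 0$. A minimal Python sketch; the recurrence $k\,\mathbb{P}[T_{0n}=k] = \theta\sum_{i=1}^{\min(k,n)}\mathbb{P}[T_{0n}=k-i]$ is the Poisson form of the size-biasing identity (4.14), and the closed form $(n-1)!/\{(\theta+1)\cdots(\theta+n-1)\}$ for the single-cycle probability is the classical Ewens Sampling Formula value; both are assumptions of this sketch rather than formulas quoted above.

```python
import math

def esf_single_cycle_prob(n, theta):
    """P[L1(n) = n] for the Ewens Sampling Formula, via the Conditioning Relation:
    P[C*_n = 1] = P[Z*_n = 1] P[T_{0,n-1} = 0] / P[T_{0n} = n]."""
    # Law of T_{0n} = sum_{i<=n} i Z*_i, Z*_i ~ Po(theta/i), via the size-biasing recurrence
    p = [math.exp(-theta * sum(1.0 / i for i in range(1, n + 1)))]
    for k in range(1, n + 1):
        p.append(theta * sum(p[k - i] for i in range(1, min(k, n) + 1)) / k)
    # P[Z*_n = 1] * P[T_{0,n-1} = 0] = (theta/n) exp(-theta * sum_{i<=n} 1/i)
    num = (theta / n) * math.exp(-theta * sum(1.0 / i for i in range(1, n + 1)))
    return num / p[n]

n, theta = 8, 0.7
exact = math.factorial(n - 1) / math.prod(theta + j for j in range(1, n))
assert abs(esf_single_cycle_prob(n, theta) - exact) < 1e-12 * exact
```

For $\theta = 1$ (uniform random permutations) the routine returns the familiar value $1/n$.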


12.9 Proof of Theorem 6.14

We wish to show that, if $b/n \to \alpha \in (0,1)$ and Conditions (A0) and (B01) hold, then
\begin{align*}
2d_{TV}\bigl(\mathcal{L}(C[1,b]), \mathcal{L}(Z[1,b])\bigr)
&= P_\theta\Bigl(\frac1\alpha - 1, \infty\Bigr) + \alpha^{\theta-1}\frac{p_\theta(1/\alpha)}{p_\theta(1)}
+ \int_0^{(1/\alpha)-1} p_\theta(x)\Bigl|1 - \frac{p_\theta^{(\alpha)}(1-\alpha x)}{p_\theta(1)}\Bigr|\,dx\\
&\qquad{}+ O\bigl(|bn^{-1}-\alpha|^{\theta} + n^{-(\theta\wedge g_1\wedge a_1)+\delta}\bigr),
\end{align*}
for any $\delta > 0$. We suppose throughout that $n \ge n_0$ is large enough that $|bn^{-1}-\alpha| \le \alpha/2$ and $(\alpha/2)^{\theta} \ge 3\theta b^{-1} + K_0\theta n^{-1}$ are satisfied.

We start once again from the total variation formula (12.4), though now in absolute value form, writing
\begin{align*}
2d_{TV}\bigl(\mathcal{L}(C[1,b]), \mathcal{L}(Z[1,b])\bigr)
&= \sum_{r\ge0}\mathbb{P}[T_{0b} = r]\Bigl|1 - \frac{\mathbb{P}[T_{bn} = n-r]}{\mathbb{P}[T_{0n} = n]}\Bigr|\\
&= \mathbb{P}[T_{0b} \ge n-b] + \frac{\mathbb{P}[T_{0b} = n]}{\mathbb{P}[T_{0n} = n]}\mathbb{P}[T_{bn} = 0] - 2\mathbb{P}[T_{0b} = n]\\
&\qquad{}+ \sum_{r=0}^{n-b-1}\mathbb{P}[T_{0b} = r]\Bigl|1 - \frac{\mathbb{P}[T_{bn} = n-r]}{\mathbb{P}[T_{0n} = n]}\Bigr|, \qquad(12.27)
\end{align*}
provided that $\mathbb{P}[T_{bn} = 0] \ge \mathbb{P}[T_{0n} = n]$. Now $\mathbb{P}[T_{0m} = n] \le K_0\theta n^{-1}$ from Lemma 9.2, whatever the values of $m$ and $n$; hence $\mathbb{P}[T_{0b} = n] \le K_0\theta n^{-1}$, and also, since (12.30) below implies that $\mathbb{P}[T_{bn} = 0] \ge (b/n)^{\theta} - 3\theta b^{-1}$, it follows that $\mathbb{P}[T_{bn} = 0] \ge \mathbb{P}[T_{0n} = n]$ indeed holds under our assumptions.

The various terms in (12.27) are estimated by making intensive use of Theorem 11.11. First, for the sum, it follows from Theorem 11.11(ii) that
\begin{align*}
&\biggl|\sum_{r=0}^{n-b-1}\mathbb{P}[T_{0b} = r]\biggl\{\Bigl|1 - \frac{\mathbb{P}[T_{bn} = n-r]}{\mathbb{P}[T_{0n} = n]}\Bigr| - \Bigl|1 - \frac{n\,\mathbb{P}[T_{bn} = n-r]}{p_\theta(1)}\Bigr|\biggr\}\biggr|\\
&\qquad\le \sum_{r=0}^{n-b-1}\mathbb{P}[T_{0b} = r]\,\frac{\mathbb{P}[T_{bn} = n-r]}{\mathbb{P}[T_{0n} = n]}\Bigl|1 - \frac{n\,\mathbb{P}[T_{0n} = n]}{p_\theta(1)}\Bigr|\\
&\qquad\le \bigl[2\theta\varepsilon_{10.12}(n) + \theta n^{-1}\varphi_{11.6}(n,n)\bigr]/p_\theta(1); \qquad(12.28)
\end{align*}
then, by Theorem 11.11(iv),
\[
|n\,\mathbb{P}[T_{bn} = n-r] - p_\theta^{(b/n)}(1 - r/n)| \le n\theta b^{-2}(6\theta + \varphi_{11.8}(n)).
\]
Combining these estimates, we have
\begin{align*}
&\biggl|\sum_{r=0}^{n-b-1}\mathbb{P}[T_{0b} = r]\Bigl|1 - \frac{\mathbb{P}[T_{bn} = n-r]}{\mathbb{P}[T_{0n} = n]}\Bigr|
- \sum_{r=0}^{n-b-1}\mathbb{P}[T_{0b} = r]\Bigl|1 - \frac{p_\theta^{(b/n)}(1 - r/n)}{p_\theta(1)}\Bigr|\biggr| \qquad(12.29)\\
&\qquad\le \bigl[2\theta\varepsilon_{10.12}(n) + \theta n^{-1}\varphi_{11.6}(n,n) + n\theta b^{-2}(6\theta + \varphi_{11.8}(n))\bigr]/p_\theta(1).
\end{align*}

The second term is then estimated by observing that, from Theorem 11.11(ii) and Theorem 11.10,
\[
\Bigl|\frac{\mathbb{P}[T_{0b} = n] - b^{-1}p_\theta(n/b)}{\mathbb{P}[T_{0n} = n]}\Bigr|
\le \frac{3n}{\theta P_\theta[0,1]\,b}\bigl\{2\theta bn^{-1}\varepsilon_{10.12}(b) + \theta n^{-1}\varphi_{11.6}(b,n)\bigr\},
\]
and that
\[
\Bigl|\frac{p_\theta(n/b)}{b\,\mathbb{P}[T_{0n} = n]} - \frac{n\,p_\theta(n/b)}{b\,p_\theta(1)}\Bigr|
\le \frac{3nb^{-1}p_\theta(n/b)}{\theta P_\theta[0,1]}\bigl\{2\theta\varepsilon_{10.12}(n) + \theta n^{-1}\varphi_{11.6}(n,n)\bigr\}/p_\theta(1),
\]
whereas, from Theorem 10.14,
\[
|\mathbb{P}[T_{bn} = 0] - (b/n)^{\theta}| \le 3b^{-1}\theta; \qquad(12.30)
\]
thus, and using the fact that $xp_\theta(x) \le \theta$ and that $p_\theta(1) = \theta P_\theta[0,1]$, we have
\begin{align*}
&\Bigl|\frac{\mathbb{P}[T_{0b} = n]}{\mathbb{P}[T_{0n} = n]}\mathbb{P}[T_{bn} = 0] - \Bigl(\frac bn\Bigr)^{\theta-1}\frac{p_\theta(n/b)}{p_\theta(1)}\Bigr|\\
&\qquad\le 3P_\theta[0,1]^{-1}\bigl[2\varepsilon_{10.12}(b) + b^{-1}\varphi_{11.6}(b,n) + \theta b^{-1} + \theta\bigl(2\varepsilon_{10.12}(n) + n^{-1}\varphi_{11.6}(n,n)\bigr)\bigr]. \qquad(12.31)
\end{align*}

It then remains to replace expectations involving $T_{0b}$ with expectations involving $X_\theta$, which can be accomplished because of the approximations with respect both to Kolmogorov and to Wasserstein distances in Corollary 10.13:
\[
|\mathbb{P}[T_{0b} \ge n-b] - \mathbb{P}[X_\theta \ge (n/b)-1]| \le d_K(\mathcal{L}(b^{-1}T_{0b}), P_\theta),
\]
and
\begin{align*}
&\biggl|\sum_{r\ge0}\mathbb{P}[T_{0b} = r]\Bigl|1 - \frac{p_\theta^{(b/n)}\bigl((1-r/n)\vee(b/n)\bigr)}{p_\theta(1)}\Bigr|
- \int_0^\infty p_\theta(x)\Bigl|1 - \frac{p_\theta^{(b/n)}\bigl((1-bx/n)\vee(b/n)\bigr)}{p_\theta(1)}\Bigr|\,dx\biggr|\\
&\qquad\le \theta(1+\theta)n^2b^{-2}\,d_W(\mathcal{L}(b^{-1}T_{0b}), P_\theta)/p_\theta(1),
\end{align*}
since, from (4.44) and (11.15),
\[
\Bigl|\frac{d}{dx}p_\theta^{(b/n)}(1-x)\Bigr| \le (\theta\vee1)(n/b)\sup_{b/n\le y<1}p_\theta^{(b/n)}(y) \le \theta(1+\theta)(n/b)^2 \qquad(12.32)
\]

in $0 < x \le 1 - b/n$. Combining these observations with (12.29) and (12.31), together with the pair of additional inequalities $p_\theta^{(b/n)}(b/n) \le n\theta/b$ and $p_\theta(1) = \theta P_\theta[0,1] \le \theta$, we thus find that
\begin{align*}
2d_{TV}\bigl(\mathcal{L}(C[1,b]), \mathcal{L}(Z[1,b])\bigr)
&= P_\theta\Bigl(\frac nb - 1, \infty\Bigr) + \Bigl(\frac bn\Bigr)^{\theta-1}\frac{p_\theta(n/b)}{p_\theta(1)}\\
&\qquad{}+ \int_0^{(n/b)-1}p_\theta(x)\Bigl|1 - \frac{p_\theta^{(b/n)}(1 - bx/n)}{p_\theta(1)}\Bigr|\,dx + \eta, \qquad(12.33)
\end{align*}
where
\begin{align*}
\eta &\le d_K(\mathcal{L}(b^{-1}T_{0b}), P_\theta)\bigl\{1 + n/(bP_\theta[0,1])\bigr\} + d_W(\mathcal{L}(b^{-1}T_{0b}), P_\theta)(n/b)^2(1+\theta)/P_\theta[0,1]\\
&\qquad{}+ P_\theta[0,1]^{-1}\bigl\{n^{-1}\varphi_{11.6}(n,n) + 2\varepsilon_{10.12}(n) + nb^{-2}(6\theta + \varphi_{11.8}(n))\bigr\} + 2K_0\theta n^{-1}\\
&\qquad{}+ 3P_\theta[0,1]^{-1}\bigl\{2\varepsilon_{10.12}(b) + b^{-1}\varphi_{11.6}(b,n) + \theta b^{-1} + \theta\bigl(2\varepsilon_{10.12}(n) + n^{-1}\varphi_{11.6}(n,n)\bigr)\bigr\}. \qquad(12.34)
\end{align*}
Bounds on $d_K(\mathcal{L}(b^{-1}T_{0b}), P_\theta)$ and $d_W(\mathcal{L}(b^{-1}T_{0b}), P_\theta)$ are given in Corollary 10.13, and the orders of the remaining terms follow from Theorems 10.12, 11.8 and 11.6.

It remains to replace $b/n$ by $\alpha$ in the expressions in (12.33), which is accomplished at a cost of order $O(|bn^{-1}-\alpha|^{\theta})$. We assume throughout that $|bn^{-1}-\alpha| \le \alpha/2$, and consider first the case $bn^{-1} < \alpha$. Then, from (4.29),
\[
\Bigl(\frac bn\Bigr)^{\theta}\Bigl|\frac nb p_\theta(n/b) - \frac1\alpha p_\theta(1/\alpha)\Bigr|
\le \theta\Bigl(\frac bn\Bigr)^{\theta}\max\Bigl\{P_\theta\Bigl(\frac1\alpha - 1, \frac nb - 1\Bigr), P_\theta\Bigl(\frac1\alpha, \frac nb\Bigr)\Bigr\} = O(|bn^{-1}-\alpha|^{\theta}) \qquad(12.35)
\]
and
\[
\frac{p_\theta(1/\alpha)}{\alpha\,p_\theta(1)}\bigl|(b/n)^{\theta} - \alpha^{\theta}\bigr|
\le 2\theta\,\frac{p_\theta(1/\alpha)}{\alpha\,p_\theta(1)}\,\alpha^{\theta-1}|(b/n) - \alpha|
\le 2\theta P_\theta[0,1]^{-1}\alpha^{\theta-1}|(b/n) - \alpha|. \qquad(12.36)
\]

For the integral, we immediately have
\[
\int_{(1/\alpha)-1}^{(n/b)-1}p_\theta(x)\Bigl|1 - \frac{p_\theta^{(b/n)}(1 - bx/n)}{p_\theta(1)}\Bigr|\,dx
\le \frac{n}{bP_\theta[0,1]}P_\theta\Bigl(\frac1\alpha - 1, \frac nb - 1\Bigr),
\]
since $p_\theta^{(b/n)}(y) \le \theta n/b$ in $0 < y < 1$. Now, for $0 < x < (1/\alpha)-1$, writing $y = 1 - bx/n > \alpha$, we find from (11.15) that
\begin{align*}
&|p_\theta^{(b/n)}(1 - bx/n) - p_\theta^{(\alpha)}(1 - bx/n)|\\
&\qquad= (\theta/y)\bigl|P_\theta^{(b/n)}[0, y - b/n) - P_\theta^{(\alpha)}[0, y - \alpha)\bigr|\\
&\qquad\le (\theta/\alpha)\bigl|P_\theta^{(b/n)}[0, y - b/n) - P_\theta^{(\alpha)}[0, y - b/n)\bigr| + (\theta/\alpha)^2|(b/n) - \alpha|\\
&\qquad\le \frac\theta\alpha\Bigl[\frac\theta\alpha\Bigl|\frac bn - \alpha\Bigr| + d_{TV}(P_\theta^{(b/n)}, P_\theta^{(\alpha)})\Bigr]
\le 3\Bigl(\frac\theta\alpha\Bigr)^2\Bigl|\frac bn - \alpha\Bigr|, \qquad(12.37)
\end{align*}
this last by (10.14), and also that
\[
|p_\theta^{(\alpha)}(1 - bx/n) - p_\theta^{(\alpha)}(1 - \alpha x)| \le x|(b/n) - \alpha|\,\theta(1+\theta)\alpha^{-2}, \qquad(12.38)
\]
from (12.32), so that
\[
\int_0^{(1/\alpha)-1}p_\theta(x)\biggl|\,\Bigl|1 - \frac{p_\theta^{(b/n)}(1 - bx/n)}{p_\theta(1)}\Bigr| - \Bigl|1 - \frac{p_\theta^{(\alpha)}(1 - \alpha x)}{p_\theta(1)}\Bigr|\,\biggr|\,dx
\le \Bigl(\frac{3\theta + \alpha^{-1}(1+\theta)}{\alpha^2P_\theta[0,1]}\Bigr)\Bigl|\frac bn - \alpha\Bigr|. \qquad(12.39)
\]
For the case $\alpha < b/n$, swap the order of $n/b$ and $1/\alpha$ in the probabilities in (12.35), use estimates (12.37) and (12.38) in $0 < x < (n/b)-1$, and then bound the remaining integral by $\frac{1}{\alpha P_\theta[0,1]}P_\theta\bigl(\frac nb - 1, \frac1\alpha - 1\bigr)$. Finally, note that $P_\theta(x,y) \le \max\{\theta, P_\theta[0,1]\}\,|x - y|^{\theta}$.

12.10 Proof of Theorem 7.10

To prove Theorem 7.10, some preparations are needed. First, note that the Conditioning Relation and the independence of the $Z^*_i$ imply that we can equally consider the conditional distribution of $K^*_{bn} = K_{bn}(Z^*)$ given $T^*_{bn} = T_{bn}(Z^*) = l$. Indeed, using the notation $X[r,s] = (X_r,\dots,X_s)$ and suppressing the superscript $(n)$, we have, for any $y \in \mathbb{Z}_+^n$, and whatever the distributions of the individual $Z_i$,
\begin{align*}
&\mathbb{P}[C[b+1,n] = y[b+1,n] \,|\, T_{bn}(C) = l]\\
&\qquad= \frac{\mathbb{P}[C[b+1,n] = y[b+1,n],\; T_{bn}(C) = l]}{\mathbb{P}[T_{bn}(C) = l]}\\
&\qquad= \frac{\mathbb{P}[Z[b+1,n] = y[b+1,n],\; T_{bn}(Z) = l,\; T_{0n}(Z) = n]}{\mathbb{P}[T_{bn}(Z) = l,\; T_{0n}(Z) = n]}\\
&\qquad= \frac{\mathbb{P}[Z[b+1,n] = y[b+1,n],\; T_{bn}(Z) = l]\,\mathbb{P}[T_{0b}(Z) = n-l]}{\mathbb{P}[T_{bn}(Z) = l]\,\mathbb{P}[T_{0b}(Z) = n-l]}\\
&\qquad= \mathbb{P}[Z[b+1,n] = y[b+1,n] \,|\, T_{bn}(Z) = l]. \qquad(12.40)
\end{align*}
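Identity (12.40) holds whatever the distributions of the $Z_i$, and can be confirmed by exhaustive enumeration on a small example. A sketch assuming $Z_i \sim \mathrm{Po}(\theta/i)$; the parameters $n$, $b$, $l$ and the target vector $y$ are illustrative choices, and the truncation $Z_i \le \lfloor n/i\rfloor$ either is forced by the conditioning or cancels between numerator and denominator.

```python
import itertools
from math import exp, factorial

theta, n, b, l = 1.5, 9, 2, 7
y = (1, 1, 0, 0, 0, 0, 0)                 # target counts for sizes 3..9; 3*1 + 4*1 = l

def po(lam, k):                           # Poisson point probability
    return exp(-lam) * lam ** k / factorial(k)

def weight(z):                            # joint Poisson weight of z = (z_1,...,z_n)
    w = 1.0
    for i, zi in enumerate(z, start=1):
        w *= po(theta / i, zi)
    return w

def T0n(z):
    return sum(i * zi for i, zi in enumerate(z, start=1))

def Tbn(z):
    return sum(i * zi for i, zi in enumerate(z, start=1) if i > b)

configs = itertools.product(*[range(n // i + 1) for i in range(1, n + 1)])
data = [(z, weight(z)) for z in configs]

# Left side: C = (Z | T0n = n), then condition further on Tbn(C) = l
num = sum(w for z, w in data if T0n(z) == n and z[b:] == y)
den = sum(w for z, w in data if T0n(z) == n and Tbn(z) == l)
lhs = num / den

# Right side: condition Z directly on Tbn(Z) = l
num2 = sum(w for z, w in data if z[b:] == y)
den2 = sum(w for z, w in data if Tbn(z) == l)
rhs = num2 / den2

assert abs(lhs - rhs) < 1e-12
```

The two conditional probabilities agree exactly, as (12.40) asserts.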


To derive the joint distribution of $K^*_{bn}$ and $T^*_{bn}$, note first that the unconditional distribution of $K^*_{bn}$ is $\mathrm{Po}(\lambda_{bn})$, where
\[
\lambda_{bn} = \theta\{h(n+1) - h(b+1)\},
\]
and that, conditional on $K^*_{bn} = s$, the distribution of $Z^*[b+1,n]$ is multinomial:
\[
\mathcal{L}(Z^*[b+1,n] \,|\, K^*_{bn} = s) = \mathrm{MN}(s; \mathcal{L}(U)), \qquad(12.41)
\]
where
\[
\mathbb{P}[U = r] = \theta/(r\lambda_{bn}), \qquad b+1 \le r \le n. \qquad(12.42)
\]
Thus, conditional on $K^*_{bn} = s$, $T^*_{bn}$ has the distribution of $W_s = \sum_{j=1}^s U_j$, where the $(U_j,\, j \ge 1)$ are independent and identically distributed with the distribution of $U$. Hence
\[
\mathbb{P}[K^*_{bn} = s,\; T^*_{bn} = l] = \mathrm{Po}(\lambda_{bn})\{s\}\,\mathbb{P}[W_s = l], \qquad(12.43)
\]

and further progress depends on understanding the distribution of $W_s$. Let $g\colon \mathbb{Z}_+ \to \mathbb{R}$ be any bounded function, and take any $s \ge 1$; then
\[
\mathbb{E}\,U_jg(W_s) = \sum_{r=b+1}^n \frac{r}{r\{h(n+1) - h(b+1)\}}\,\mathbb{E}g(W_{s-1} + r), \qquad 1 \le j \le s,
\]
implying that
\[
\mathbb{E}\,W_sg(W_s) = \theta_s\sum_{r=b+1}^n \mathbb{E}g(W_{s-1} + r), \qquad(12.44)
\]
where $\theta_s = s/\{h(n+1) - h(b+1)\}$. Hence, taking $g = \mathbf{1}_{\{l\}}$ for any $1 \le l \le n$, it follows that
\[
l\,\mathbb{P}[W_s = l] = \theta_s\,\mathbb{P}[W_{s-1} \le l - b - 1]. \qquad(12.45)
\]
On the other hand, because $T^*_{bn}$ is a weighted sum of Poisson random variables, we have
\[
\mathbb{E}\,T^*_{bn}g(T^*_{bn}) = \theta\sum_{r=b+1}^n \mathbb{E}g(T^*_{bn} + r), \qquad(12.46)
\]
from which, again taking $g = \mathbf{1}_{\{l\}}$, we find that
\[
l\,\mathbb{P}[T^*_{bn} = l] = \theta\,\mathbb{P}[T^*_{bn} \le l - b - 1]; \qquad(12.47)
\]
combining this with (12.43) and (12.45), it follows that
\begin{align*}
\mathbb{P}[K^*_{bn} = s \,|\, T^*_{bn} = l]
&= \mathrm{Po}(\lambda_{bn})\{s\}\,\frac{\theta_s}{\theta}\,\frac{\mathbb{P}[W_{s-1} \le l - b - 1]}{\mathbb{P}[T^*_{bn} \le l - b - 1]}\\
&= \mathrm{Po}(\lambda_{bn})\{s-1\}\,\frac{\mathbb{P}[W_{s-1} \le l - b - 1]}{\mathbb{P}[T^*_{bn} \le l - b - 1]}. \qquad(12.48)
\end{align*}
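The three facts just derived — that (12.42) is a probability distribution, and the renewal-type identities (12.45) and (12.47) — can all be verified exactly on a small example, computing the laws of $W_s$ and of $T^*_{bn}$ by finite convolution. A sketch with illustrative parameters; the restriction $l \le n$ matches the range used in the text.

```python
from math import exp, factorial

theta, b, n = 1.3, 2, 8

def po(lam, k):
    return exp(-lam) * lam ** k / factorial(k)

# (12.42) sums to 1, since lambda_bn = theta * sum_{r=b+1}^n 1/r
lam = theta * sum(1.0 / r for r in range(b + 1, n + 1))
u = {r: theta / (r * lam) for r in range(b + 1, n + 1)}
assert abs(sum(u.values()) - 1.0) < 1e-12

# (12.45) with s = 2: l P[W_2 = l] = theta_2 P[W_1 <= l-b-1], by exact convolution
w1 = dict(u)
w2 = {}
for k1, p1 in w1.items():
    for k2, p2 in u.items():
        w2[k1 + k2] = w2.get(k1 + k2, 0.0) + p1 * p2
theta_s = 2 * theta / lam                  # theta_s = s/{h(n+1)-h(b+1)} with s = 2
for l in range(2 * (b + 1), n + 1):        # identity applied for l <= n
    lhs = l * w2.get(l, 0.0)
    rhs = theta_s * sum(p for k, p in w1.items() if k <= l - b - 1)
    assert abs(lhs - rhs) < 1e-12

# (12.47): l P[T = l] = theta P[T <= l-b-1] for T = sum_{r>b} r Z*_r, Z*_r ~ Po(theta/r);
# the law of T on {0,...,n} is exact, since configurations reaching k <= n have Z*_r <= n//r
t = {0: 1.0}
for r in range(b + 1, n + 1):
    nt = {}
    for k, pk in t.items():
        for z in range(n // r + 1):
            if k + r * z <= n:
                nt[k + r * z] = nt.get(k + r * z, 0.0) + pk * po(theta / r, z)
    t = nt
for l in range(b + 1, n + 1):
    assert abs(l * t.get(l, 0.0) - theta * sum(t.get(k, 0.0) for k in range(l - b))) < 1e-12
```

All three checks pass to floating-point accuracy, the identities being exact.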


We next use Stein's method to compare the probabilities involving $W_{s-1}$ and $T^*_{bn}$ with corresponding probabilities from the limiting distribution $P_\varphi$ as $n \to \infty$ of $n^{-1}T_{0n}(Z^\varphi)$, where the $Z^\varphi_i \sim \mathrm{Po}(\varphi/i)$ are independent. For $W_{s-1}$ we use $\varphi = \theta_s$, and for $T^*_{bn}$ we take $\varphi = \theta$; the Stein equation is given by combining (8.17) and (8.18).

Lemma 12.2 The following estimates hold uniformly in $n/2 \le l \le n$ and $0 \le b \le n/4$ and in $2 \le s \le 2\lambda_{bn}$:

(i) $\mathbb{P}[T^*_{bn} \le l-b-1] = P_\theta[0, n^{-1}(l-b-1)] + O(n^{-1}(b + \log n))$;

(ii) $\mathbb{P}[W_{s-1} \le l-b-1] = P_{\theta_s}[0, n^{-1}(l-b-1)] + O(\lambda_{bn}^{-1})$.

Proof. Substitute $u = n^{-1}T^*_{bn}$ and $f = \mathbf{1}_{[0,x]}$ with $x = n^{-1}(l-b-1)$ into the Stein Equation (8.18), and take expectations: this gives
\begin{align*}
&\mathbb{P}[T^*_{bn} \le l-b-1] - P_\theta[0, n^{-1}(l-b-1)]\\
&\qquad= \mathbb{E}\Bigl\{\theta\int_0^1 g_x(n^{-1}T^*_{bn} + t)\,dt - n^{-1}T^*_{bn}\,g_x(n^{-1}T^*_{bn})\Bigr\}\\
&\qquad= \theta\,\mathbb{E}\Bigl\{\int_0^1 g_x(n^{-1}T^*_{bn} + t)\,dt - n^{-1}\sum_{i=b+1}^n g_x(n^{-1}(T^*_{bn} + i))\Bigr\},
\end{align*}
from (8.15). But now, from Lemma 8.6 and (8.28), for any $w > 0$,
\begin{align*}
\Bigl|\int_0^1 g_x(w + t)\,dt - n^{-1}\sum_{i=b+1}^n g_x(w + n^{-1}i)\Bigr|
&\le \frac{b(1+\theta)}{nx} + \frac{(1+2\theta)(1+\theta)}{2nx}\sum_{i=1}^n\Bigl(\frac{1}{nw+i}\Bigr) + \frac{1}{nx}\\
&= O(n^{-1}(b + \log n)),
\end{align*}
uniformly in $x \ge 1/8$, proving part (i).

For part (ii), take $u = n^{-1}W_{s-1}$ and $\theta_s$ in place of $\theta$ in the Stein Equation (8.18), with $f$ and $x$ as above, and use (12.44) to give
\begin{align*}
&\mathbb{P}[W_{s-1} \le l-b-1] - P_{\theta_s}[0, n^{-1}(l-b-1)]\\
&\qquad= \mathbb{E}\Bigl\{\theta_s\int_0^1 g_x(n^{-1}W_{s-1} + t)\,dt - n^{-1}W_{s-1}\,g_x(n^{-1}W_{s-1})\Bigr\}\\
&\qquad= \mathbb{E}\Bigl\{\theta_s\int_0^1 g_x(n^{-1}W_{s-2} + n^{-1}U_{s-1} + t)\,dt \qquad(12.49)\\
&\qquad\qquad{}- n^{-1}\Bigl(\theta_s - \frac{1}{h(n+1) - h(b+1)}\Bigr)\sum_{i=b+1}^n g_x(n^{-1}(W_{s-2} + i))\Bigr\}.
\end{align*}
Arguing from Lemma 8.6 and (8.28) much as above, we now find that
\begin{align*}
&\theta_s\Bigl|\int_0^1 g_x(w + n^{-1}U_{s-1} + t)\,dt - n^{-1}\Bigl(1 - \frac{1}{\theta_s\{h(n+1) - h(b+1)\}}\Bigr)\sum_{i=b+1}^n g_x(w + n^{-1}i)\Bigr|\\
&\qquad= \theta_s\Bigl|\int_0^1 g_x(w + n^{-1}U_{s-1} + t)\,dt - n^{-1}\sum_{i=1}^n g_x(w + n^{-1}i)\Bigr| + O(n^{-1}b + \lambda_{bn}^{-1})\\
&\qquad= \theta_s\Bigl|\int_0^1 [g_x(w + n^{-1}U_{s-1} + t) - g_x(w + t)]\,dt\Bigr| + O(n^{-1}b + \lambda_{bn}^{-1}),
\end{align*}
and then
\[
\theta_s\Bigl|\int_0^1 [g_x(w + n^{-1}U_{s-1} + t) - g_x(w + t)]\,dt\Bigr|
\le \int_0^{n^{-1}U_{s-1}}\theta_s\bigl\{|g_x(w + t)| + |g_x(w + 1 + t)|\bigr\}\,dt = O(n^{-1}U_{s-1}),
\]
again by (8.28). Part (ii) now follows upon taking expectations, since we have $\mathbb{E}\{n^{-1}U_{s-1}\} = O(\lambda_{bn}^{-1})$, and $\lambda_{bn} \sim \theta\log(n/b) = o(n/b)$. $\square$

The estimates of Lemma 12.2 could be substituted directly into (12.48). However, a simpler result is obtained if they are both first expressed in terms of $P_\theta[0,1]$.

Lemma 12.3 For any $\varphi > 0$ and $0 \le \varepsilon < 1$, we have
\[
0 \le 1 - \frac{P_\varphi[0, 1-\varepsilon]}{P_\varphi[0,1]} \le \varphi\varepsilon
\]
and
\[
P_{\varphi+\delta}[0,1] = P_\varphi[0,1]\bigl(1 - \delta h(\varphi+1)\bigr) + O(\delta^2),
\]
uniformly in $0 < \varepsilon < 1/(2\varphi)$ and $|\delta| \le \varphi$.

Proof. The formula $P_\varphi[0,x] = e^{-\gamma\varphi}x^\varphi/\Gamma(\varphi+1)$, $0 \le x \le 1$, is given in Corollary 4.8. This then implies that
\[
P_\varphi[0,1] \ge P_\varphi[0, 1-\varepsilon] = P_\varphi[0,1](1-\varepsilon)^\varphi \ge P_\varphi[0,1](1 - \varphi\varepsilon),
\]
proving the first part. The second then follows because
\[
\frac{d}{d\varphi}P_\varphi[0,1] = -h(\varphi+1)\frac{e^{-\gamma\varphi}}{\Gamma(\varphi+1)} = -h(\varphi+1)P_\varphi[0,1]
\]
and because $\bigl|\frac{d^2}{d\psi^2}P_\psi[0,1]\bigr| \le \bigl\{h(\psi+1)^2 + \bigl|\frac{d}{d\psi}h(\psi+1)\bigr|\bigr\}$ is uniformly bounded in $0 \le \psi \le 2\varphi$. $\square$
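The two differentiations in the proof can be confirmed numerically at an integer argument, where $h(\varphi+1)$ is the harmonic sum $1 + \tfrac12 + \cdots + \tfrac1\varphi$. A sketch using finite differences; the choice $\varphi = 2$ and the step sizes are arbitrary.

```python
import math

GAMMA = 0.5772156649015329                      # Euler's constant

def P01(phi):                                   # P_phi[0,1] = e^{-gamma*phi}/Gamma(phi+1)
    return math.exp(-GAMMA * phi) / math.gamma(phi + 1)

phi, d = 2.0, 1e-5
h = 1.0 + 0.5                                   # h(3) = 1 + 1/2

# d/dphi P_phi[0,1] = -h(phi+1) P_phi[0,1], checked by central difference
fd = (P01(phi + d) - P01(phi - d)) / (2 * d)
assert abs(fd + h * P01(phi)) < 1e-8

# first-order expansion P_{phi+delta}[0,1] = P_phi[0,1](1 - delta h(phi+1)) + O(delta^2)
delta = 0.01
assert abs(P01(phi + delta) - P01(phi) * (1 - delta * h)) < 1e-3
```
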

Substituting the results of Lemmas 12.2 and 12.3 into (12.48), we have thus proved that, as $n \to \infty$,
\begin{align*}
\mathbb{P}[K^*_{bn} = s \,|\, T^*_{bn} = l]
&= \mathrm{Po}(\lambda_{bn})\{s-1\}\Bigl(\frac{l-b-1}{n}\Bigr)^{\theta_s-\theta}\frac{P_{\theta_s}[0,1]}{P_\theta[0,1]}\bigl(1 + O(\lambda_{bn}^{-1})\bigr)\\
&= \mathrm{Po}(\lambda_{bn})\{s-1\}\bigl[1 - (\theta_s - \theta)h(\theta+1)
+ O\bigl(\lambda_{bn}^{-1} + n^{-1}|\theta_s - \theta|(n - l + b) + (\theta_s - \theta)^2\bigr)\bigr],
\end{align*}
with
\[
\theta_s - \theta = \theta(s\lambda_{bn}^{-1} - 1),
\]
where the order terms are uniform in $n/2 \le l \le n$, in $0 \le b \le n/4$ and in $2 \le s \le 2\lambda_{bn}$. Hence
\[
\sum_{s=2}^{\lfloor2\lambda_{bn}\rfloor}\bigl|\mathbb{P}[K^*_{bn} = s \,|\, T^*_{bn} = l] - \mathrm{Po}(\lambda_{bn})\{s-1\}\bigl\{1 - \theta h(\theta+1)(s\lambda_{bn}^{-1} - 1)\bigr\}\bigr|
= O\bigl(\lambda_{bn}^{-1} + \lambda_{bn}^{-1/2}n^{-1}(n-l)\bigr), \qquad(12.50)
\]
uniformly in $n/2 \le l \le n$ and $0 \le b \le n/4$. Of particular importance is that the approximating expression in (12.50) is free of $l$, so that, to order $O(\lambda_{bn}^{-1} + \lambda_{bn}^{-1/2}n^{-1}(n-l))$, the conditional distribution of $K^*_{bn}$ is independent of the value $l$ of $T^*_{bn}$, for $n/2 \le l \le n$. The approximation can in fact be further simplified to a Poisson approximation, by judicious modification of the mean.

Lemma 12.4 For $c \ge 0$ and $\lambda \ge 1$,
\[
\Bigl|\frac{\mathrm{Po}(\lambda)\{s-1\}\bigl(1 - c(\lambda^{-1}s - 1)\bigr)}{\mathrm{Po}(\lambda-c)\{s-1\}} - 1\Bigr|
= O\Bigl(\frac{sc^2}{\lambda^2} + c^2\Bigl(\frac s\lambda - 1\Bigr)^2 + \frac c\lambda\Bigr),
\]
uniformly in $0 \le \lambda^{-1}s \le 2$ and $0 \le c^2 \le \lambda$.

Proof. From the formula for the Poisson density, the fraction is just
\begin{align*}
e^{-c}(1 - c/\lambda)^{-(s-1)}\bigl(1 - c(\lambda^{-1}s - 1)\bigr)
&= \exp\bigl\{-c + \lambda^{-1}(s-1)c - c(\lambda^{-1}s - 1)\bigr\}
\times\Bigl\{1 + O\Bigl(\frac{sc^2}{\lambda^2} + c^2\Bigl(\frac s\lambda - 1\Bigr)^2\Bigr)\Bigr\}\\
&= 1 + O\Bigl(\frac{sc^2}{\lambda^2} + c^2\Bigl(\frac s\lambda - 1\Bigr)^2 + \frac c\lambda\Bigr),
\end{align*}
uniformly in the given ranges. $\square$
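The manipulation in Lemma 12.4 is elementary and easy to test: the pmf ratio agrees with the closed form $e^{-c}(1-c/\lambda)^{-(s-1)}\{1 - c(\lambda^{-1}s-1)\}$, and stays within a constant multiple of the stated order term. A sketch with illustrative $\lambda$, $c$; the constant 5 in the final assertion is an empirical cushion, not a claim from the text.

```python
import math

def po_pmf(lam, k):                 # Poisson point probability via log-gamma
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam, c = 50.0, 1.2
for s in range(30, 71):             # covers 0 <= s/lam <= 2 comfortably
    ratio = po_pmf(lam, s - 1) * (1 - c * (s / lam - 1)) / po_pmf(lam - c, s - 1)
    closed = math.exp(-c) * (1 - c / lam) ** (-(s - 1)) * (1 - c * (s / lam - 1))
    assert abs(ratio - closed) < 1e-9 * abs(closed)   # the algebraic identity
    bound = s * c * c / lam ** 2 + c * c * (s / lam - 1) ** 2 + c / lam
    assert abs(ratio - 1) < 5 * bound                 # the asserted order
```
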

This allows Theorem 7.10 to be proved. Apply Lemma 12.4 to the approximation formula for the conditional distribution of $K^*_{bn}$ given in (12.50), use the Stein–Chen method to show that, for any $\lambda > 0$,
\[
d_{TV}(\mathrm{Po}(\lambda+1),\, 1 + \mathrm{Po}(\lambda)) \le 1/(\lambda+1),
\]
and note that values of $s$ not in the range $2 \le s \le 2\lambda_{bn}$ are covered by
\[
\mathrm{Po}(\lambda_{bn} - \theta h(\theta+1) + 1)\bigl\{\{0,1\} \cup [2\lambda_{bn}, \infty)\bigr\} = O(\lambda_{bn}^{-1}),
\]
by Chebyshev's inequality. Collected, these give the bound
\[
d_{TV}\bigl(\mathcal{L}(K^*_{bn} \,|\, T^*_{bn} = l),\, \mathrm{Po}(\lambda_{bn} - \theta h(\theta+1) + 1)\bigr) = O\bigl(\lambda_{bn}^{-1} + \lambda_{bn}^{-1/2}n^{-1}(n-l)\bigr),
\]
uniformly in $n/2 \le l \le n$ and $0 \le b \le n/4$, from which Theorem 7.10 follows.

Note that to replace the approximating distribution $\mathrm{Po}(\lambda_{bn} - \theta h(\theta+1) + 1)$ in Theorem 7.10 by the simpler $\mathrm{Po}(\lambda_{bn})$ would incur an error of the larger order $O(\lambda_{bn}^{-1/2})$.
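The Stein–Chen ingredient $d_{TV}(\mathrm{Po}(\lambda+1),\, 1+\mathrm{Po}(\lambda)) \le 1/(\lambda+1)$ quoted above can be confirmed numerically by direct summation; a sketch, with the truncation point chosen so that the neglected tails are negligible.

```python
import math

def po_pmf(lam, k):
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

for lam in (1.0, 2.0, 5.0, 10.0):
    K = int(10 * lam) + 50                     # generous truncation
    # total variation distance between Po(lam+1) and the law of 1 + Po(lam)
    tv = 0.5 * sum(abs(po_pmf(lam + 1, k) - (po_pmf(lam, k - 1) if k >= 1 else 0.0))
                   for k in range(K))
    assert tv <= 1.0 / (lam + 1) + 1e-9
```

The computed distances sit well inside the bound, which is far from sharp for large $\lambda$.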


13 Technical complements

This chapter collects a number of technical lemmas, which are used elsewhere in the proofs. The first few involve properties of the distributions of the components of $C^{(n)}$ and $C^{*(n)}$. We begin with those of $C^{*(n)}$. The following result reflects the extent to which the components $C^{*(n)}_j$ mimic the logarithmic property of the $Z^*_j$, expressed in terms of a lower bound.

Lemma 13.1 There exists a constant $c_{13.1} > 0$ such that, for all $n \ge 3$,
\[
j\,\mathbb{P}[C^{*(n)}_j = 1] \ge c_{13.1}\Bigl(\frac{n+1}{n-j+1}\Bigr)^{1-\theta}, \qquad 1 \le j \le n. \qquad(13.1)
\]

Proof. By the Conditioning Relation and Lemma 4.12(iii), for $1 \le j \le n$,
\begin{align*}
j\,\mathbb{P}[C^{*(n)}_j = 1]
&= \frac{j\,\mathbb{P}[Z^*_j = 1]\,\mathbb{P}[T_{0n}(Z^*) - jZ^*_j = n-j]}{\mathbb{P}[T_{0n}(Z^*) = n]}\\
&\ge ne^{-\theta/j}e^{-\theta/(n-j+1)}\Bigl(\frac{n-j+1}{n+1}\Bigr)^{\theta}\,\mathbb{P}[T_{0,n-j}(Z^*) - jZ^*_j\mathbf{1}_{\{j\le n-j\}} = n-j]\\
&\ge ne^{-2\theta}\Bigl(\frac{n-j+1}{n+1}\Bigr)^{\theta}\,\mathbb{P}[T_{0,n-j}(Z^*) - jZ^*_j\mathbf{1}_{\{j\le n-j\}} = n-j]. \qquad(13.2)
\end{align*}
Thus, for $j = n$ or $j = n-1$, if $n \ge 3$, we have
\[
j\,\mathbb{P}[C^{*(n)}_j = 1] \ge c\Bigl(\frac{n+1}{n-j+1}\Bigr)^{1-\theta}, \qquad(13.3)
\]
with $c = (3/4)e^{-2\theta}\min\{1, 2\theta e^{-\theta}\}$. Note that, for $n = 2$, $\mathbb{P}[C^{*(n)}_1 = 1] = 0$.

To exploit (13.2) for other values of $j$, note that, from (4.55) or Lemma 11.9,
\[
m\,\mathbb{P}[T_{0m}(Z^*) = m] = \theta\,\mathbb{P}[T_{0m}(Z^*) < m] \ge c_1 > 0
\]
for some $c_1 > 0$ and for all $m \ge 1$, because $m^{-1}T_{0m}(Z^*) \to X_\theta$. Now, if $m \ge 2$ and $k > m$,
\[
m\,\mathbb{P}[T_{0m}(Z^*) - kZ^*_k\mathbf{1}_{\{k\le m\}} = m] = m\,\mathbb{P}[T_{0m}(Z^*) = m] \ge c_1; \qquad(13.4)
\]
if $1 \le k \le m$, then the size-biasing equation (4.14) gives
\begin{align*}
m\,\mathbb{P}[T_{0m}(Z^*) - kZ^*_k = m]
&= \theta\,\mathbb{P}[T_{0m}(Z^*) - kZ^*_k < m] - \theta\,\mathbb{P}[T_{0m}(Z^*) - kZ^*_k = m-k] \qquad(13.5)\\
&\ge \theta\,\mathbb{P}[T_{0m}(Z^*) < m] - \theta\,\mathbb{P}[T_{0m}(Z^*) - kZ^*_k = m-k]\\
&\ge c_1 - \theta\,\mathbb{P}[T_{0m}(Z^*) - kZ^*_k = m-k]. \qquad(13.6)
\end{align*}
But, essentially as for Corollary 9.3, for $1 \le k \le m$,
\[
\mathbb{P}[T_{0m}(Z^*) - kZ^*_k = m-k]
\le e^{\theta}\Bigl(\frac{m-k+1}{m+1}\Bigr)^{\theta}\,\mathbb{P}[T_{0,m-k}(Z^*) - kZ^*_k\mathbf{1}_{\{2k\le m\}} = m-k],
\]
and, by (13.5),
\[
\mathbb{P}[T_{0,m-k}(Z^*) - kZ^*_k\mathbf{1}_{\{2k\le m\}} = m-k] \le (2\theta\vee1)/(m-k+1),
\]
so that $\mathbb{P}[T_{0m}(Z^*) - kZ^*_k = m-k] = O(m^{-\theta})$ uniformly in $1 \le k \le m$; hence, from (13.6), there exists an $m_0 \ge 2$ such that
\[
m\,\mathbb{P}[T_{0m}(Z^*) - kZ^*_k = m] \ge \tfrac12c_1 \qquad(13.7)
\]
for all $1 \le k \le m$ and for all $m \ge m_0$. Finally, it is immediate that
\[
\min_{2\le m\le m_0}\,\min_{1\le k\le m}\,m\,\mathbb{P}[T_{0m}(Z^*) - kZ^*_k = m] =: c_2 > 0; \qquad(13.8)
\]
combining (13.3)--(13.8), it follows that $(m+1)\,\mathbb{P}[T_{0m}(Z^*) - kZ^*_k = m]$ is bounded below by $\min\{c_2, c_1/2\}$, uniformly in $m \ge 2$ and in $1 \le k \le m$. Using this and (13.4) to bound the right hand side of (13.2) in the range $1 \le j \le n-2$, the lemma follows. $\square$

The next lemma shows that $C^{*(n)}$ usually has at most one component of any given size, whenever the size is big enough.


Lemma 13.2 For any $0 \le b \le n$,
\[
\mathbb{P}\Bigl[\bigcup_{j=b+1}^n\{C^{*(n)}_j \ge 2\}\Bigr] \le b^{-1}c_{13.2},
\]
where
\[
c_{13.2} =
\begin{cases}
\theta^2\Bigl(\dfrac{2\theta\vee1}{\theta c_{11.9}}\Bigr)\Bigl\{\dfrac{13}{6} + \dfrac{9}{2\theta}\Bigr\} & \text{if } \theta < 1;\\[2mm]
\tfrac12\theta^2e^{\theta} & \text{if } \theta \ge 1.
\end{cases}
\]

Proof. By the Conditioning Relation, and because $Z^*_j \sim \mathrm{Po}(\theta/j)$,
\begin{align*}
\mathbb{P}[C^{*(n)}_j = l]
&= \frac{\mathbb{P}[Z^*_j = l]\,\mathbb{P}[T_{0n}(Z^*) - jZ^*_j = n-jl]}{\mathbb{P}[T_{0n}(Z^*) = n]}\\
&\le \frac{\mathbb{P}[Z^*_j = l]\,\mathbb{P}[T_{0n}(Z^*) = n-jl]}{\mathbb{P}[Z^*_j = 0]\,\mathbb{P}[T_{0n}(Z^*) = n]}
= \frac1{l!}\Bigl(\frac\theta j\Bigr)^l\frac{\mathbb{P}[T_{0n}(Z^*) = n-jl]}{\mathbb{P}[T_{0n}(Z^*) = n]}. \qquad(13.9)
\end{align*}
For $\theta \ge 1$, the ratio of the probabilities is at most $1$, from (4.16), giving
\[
\mathbb{P}[C^{*(n)}_j \ge 2] \le \sum_{l\ge2}\frac1{l!}\Bigl(\frac\theta j\Bigr)^l \le \tfrac12j^{-2}\theta^2e^{\theta},
\]
proving the lemma when $\theta \ge 1$.

For $\theta < 1$, observe that
\begin{align*}
\mathbb{P}[T_{0n}(Z^*) = n-jl]
&= \exp\bigl\{-\theta[h(n+1) - h(n-jl+1)]\bigr\}\,\mathbb{P}[T_{0,n-jl}(Z^*) = n-jl]\\
&\le \exp\bigl\{-\theta[h(n+1) - h(n-jl+1)]\bigr\}\min\{1, \theta/(n-jl)\},
\end{align*}
by (4.15), giving
\[
\mathbb{P}[T_{0n}(Z^*) = n-jl] \le (2\theta\vee1)/\bigl\{(n-jl+1)^{1-\theta}(n+1)^{\theta}\bigr\},
\]
whereas $\mathbb{P}[T_{0n}(Z^*) = n] \ge n^{-1}\theta c_{11.9}$ by Lemma 11.9. These bounds are now used in combination with (13.9), distinguishing different ranges of $j$.

First, it is clear that there is no contribution from $j > n/2$. Then, in the range $n/(s+1) < j \le n/s$ for any integer $s \ge 2$, we have
\[
\mathbb{P}[C^{*(n)}_j \ge 2] \le \sum_{l=2}^s\frac1{l!}\Bigl(\frac\theta j\Bigr)^l\Bigl(\frac{2\theta\vee1}{\theta c_{11.9}}\Bigr)\frac{n}{(n-jl+1)^{1-\theta}(n+1)^{\theta}}. \qquad(13.10)
\]
For $s \ge 3$, recalling that $\theta < 1$, we bound this by
\begin{align*}
\mathbb{P}[C^{*(n)}_j \ge 2]
&\le \Bigl(\frac{2\theta\vee1}{\theta c_{11.9}}\Bigr)\Bigl\{\sum_{l=2}^{s-1}\frac1{l!}\Bigl(\frac\theta j\Bigr)^l\Bigl(\frac{s}{s-l}\Bigr)^{\theta} + \frac1{s!}\Bigl(\frac\theta j\Bigr)^sn^{1-\theta}\Bigr\}\\
&\le \Bigl(\frac{2\theta\vee1}{\theta c_{11.9}}\Bigr)\Bigl\{\sum_{l=2}^{s-1}\frac1{l!}\Bigl(\frac\theta j\Bigr)^l\Bigl(\frac{s}{s-l}\Bigr) + \frac{\theta}{s!}\Bigl(\frac\theta j\Bigr)^{s-1}\frac{s+1}{n^{\theta}}\Bigr\}.
\end{align*}
For $s \ge 5$, the expression in braces is bounded above by
\[
\Bigl(\frac\theta j\Bigr)^2\Bigl(\frac{s}{s-2} + \frac{s+1}{s!}\Bigr) \le 2\Bigl(\frac\theta j\Bigr)^2;
\]
for $s = 4$, by $45(\theta/j)^2/24$; and for $s = 3$, by $13(\theta/j)^2/6$. For $s = 2$, we simply use (13.10). Adding over $j$, this gives
\begin{align*}
\sum_{j=b+1}^n\mathbb{P}[C^{*(n)}_j \ge 2]
&\le \Bigl(\frac{2\theta\vee1}{\theta c_{11.9}}\Bigr)\Bigl\{\frac{13\theta^2}{6b} + \frac12\Bigl(\frac{3\theta}{n}\Bigr)^2\frac n\theta\Bigl(\frac{(n/3)+1}{n+1}\Bigr)^{\theta}\Bigr\}\\
&\le \frac{\theta^2}{b}\Bigl(\frac{2\theta\vee1}{\theta c_{11.9}}\Bigr)\Bigl\{\frac{13}{6} + \frac{9}{2\theta}\Bigr\},
\end{align*}
completing the proof. $\square$

We now turn to the components of $C^{(n)}$, proving results related to but different from those just proved for $C^{*(n)}$. The first two are related to the logarithmic property, but now involve upper bounds.

Lemma 13.3 If $\mu^*_0 < \infty$, there exists a constant
\[
K_n^{(2)} = K_n^{(2)}(\theta, Z) = 3K_n^{(1)}/P_\theta[0,1]
\]
such that
\[
i\,\mathbb{E}C^{(n)}_i \le K_n^{(2)}\bigl\{1 + \mu_i + \chi^{(\theta)}_{i1}(n)\bigr\},
\]
for all $n \ge n_0$ and $1 \le i \le (n+1)/2$.

Proof. It is immediate that $\mathbb{E}C^{(n)}_i = r_i\mathbb{E}C^{(n)}_{i1}$ and that
\[
\mathbb{P}[C^{(n)}_{i1} = l] = \frac{\mathbb{P}[Z_{i1} = l]\,\mathbb{P}[T_{0n}^{(i)} = n-il]}{\mathbb{P}[T_{0n} = n]}.
\]
Using Lemma 9.2 when $il \le (n+1)/2$, we have
\[
\mathbb{P}[T_{0n}^{(i)} = n-il] \le 2(n+1)^{-1}K_0\theta;
\]
then, from Theorem 11.10, for $n \ge n_0$,
\[
\mathbb{P}[T_{0n} = n] \ge \theta P_\theta[0,1]/3n.
\]
Hence, if $l \le (n+1)/2i$,
\[
\mathbb{P}[C^{(n)}_{i1} = l] \le \frac{6K_0\theta}{\theta P_\theta[0,1]}\,\mathbb{P}[Z_{i1} = l].
\]
Thus
\[
\mathbb{E}C^{(n)}_{i1}I[C^{(n)}_{i1} \le (n+1)/2i] \le \frac{6K_0\theta}{\theta P_\theta[0,1]}\,\mathbb{E}Z_{i1} \le \frac{3K_n^{(1)}}{ir_iP_\theta[0,1]}(1 + \mu_i), \qquad(13.11)
\]
from (6.7) and the definition of $K_n^{(1)}$ in Corollary 9.3.

Furthermore, for $i \le (n+1)/2$, from Corollary 9.3 and Theorem 11.10,
\begin{align*}
\mathbb{E}C^{(n)}_{i1}I[C^{(n)}_{i1} > (n+1)/2i]
&= \frac{\theta}{ir_i}\sum_{l=\lfloor(n+1)/2i\rfloor+1}^{\lfloor n/i\rfloor}l\varepsilon_{il}\frac{\mathbb{P}[T_{0n}^{(i)} = n-il]}{\mathbb{P}[T_{0n} = n]}\\
&\le \frac{\theta}{ir_i}\,\frac{3n}{\theta P_\theta[0,1]}\sum_{l=\lfloor(n+1)/2i\rfloor+1}^{\lfloor n/i\rfloor}l\varepsilon_{il}\,\mathbb{P}[T_{0n}^{(i)} = n-il]\\
&\le n^{-1}K_n^{(1)}\,\frac{\theta}{ir_i}\,\frac{3n}{\theta P_\theta[0,1]}\sum_{l=\lfloor(n+1)/2i\rfloor+1}^{\lfloor n/i\rfloor}l\varepsilon_{il}\Bigl(\frac{n}{n-il+1}\Bigr)^{1-\theta}\\
&= \frac{3K_n^{(1)}}{ir_iP_\theta[0,1]}\chi^{(\theta)}_{i1}(n),
\end{align*}
whenever $i \le (n+1)/2$. Hence
\[
ir_i\,\mathbb{E}C^{(n)}_{i1}I[C^{(n)}_{i1} > (n+1)/2i] \le \bigl(3K_n^{(1)}/P_\theta[0,1]\bigr)\chi^{(\theta)}_{i1}(n)
\]
whenever $i \le (n+1)/2$, and adding this to (13.11) completes the proof. $\square$

Lemma 13.4 For all $n \ge n_0$ and all $1 \le j \le n$,
\[
j\,\mathbb{P}[C^{(n)}_j = 1] \le c_{13.4}\Bigl(\frac{n}{n-j+1}\Bigr)^{1-\theta},
\]
where
\[
c_{13.4} = 3K_n^{(1)}(1 + \varepsilon^*_{01})/P_\theta[0,1]
\]
is uniformly bounded in $n$ under Conditions (A0) and (B01).

Proof. From Corollary 9.3 and Theorem 11.10, for all $n \ge n_0$,
\begin{align*}
\mathbb{P}[C^{(n)}_j = 1]
&\le \frac{r_j\,\mathbb{P}[Z_{j1} = 1]\,\mathbb{P}[T_{0n}^{(j)}(Z) = n-j]}{\mathbb{P}[T_{0n}(Z) = n]}\\
&\le K_n^{(1)}j^{-1}\theta(1 + |\varepsilon_{j1}|)\,\frac{3n}{\theta P_\theta[0,1]}\,(n-j+1)^{-(1-\theta)}n^{-\theta}\\
&\le \frac{3K_n^{(1)}(1 + \varepsilon^*_{01})}{jP_\theta[0,1]}\Bigl(\frac{n}{n-j+1}\Bigr)^{1-\theta},
\end{align*}
giving the required result. $\square$

The final lemma concerning the components of $C^{(n)}$ limits the probability of having very large values of $C^{(n)}_{ij}$ for small values of $i$; it can be thought of as restricting the possibility that there may be very many components of a particular small size, an event which depends in detail on the tails of the distributions of the $Z_i$.

Lemma 13.5 If $b \le n/4$, then
\[
\mathbb{P}\Bigl[\bigcup_{i=1}^b\bigcup_{j=1}^{r_i}\bigl\{C^{(n)}_{ij} > (n+1)/2i\bigr\}\Bigr]
\le \frac{6K_n^{(1)}}{(n+1)P_\theta[0,1]}\sum_{i=1}^b\chi^{(\theta)}_{i1}(n).
\]

Proof. Observe that
\begin{align*}
\mathbb{P}\Bigl[\bigcup_{i=1}^b\bigcup_{j=1}^{r_i}\bigl\{C^{(n)}_{ij} > (n+1)/2i\bigr\}\Bigr]
&\le \sum_{i=1}^br_i\,\frac{\theta}{ir_i}\sum_{l=\lfloor(n+1)/2i\rfloor+1}^{\lfloor n/i\rfloor}\varepsilon_{il}\frac{\mathbb{P}[T_{0n}^{(i)} = n-il]}{\mathbb{P}[T_{0n} = n]}\\
&\le \sum_{i=1}^b\theta i^{-1}\sum_{l=\lfloor(n+1)/2i\rfloor+1}^{\lfloor n/i\rfloor}\Bigl(\frac{2il}{n+1}\Bigr)\varepsilon_{il}\frac{\mathbb{P}[T_{0n}^{(i)} = n-il]}{\mathbb{P}[T_{0n} = n]},
\end{align*}
and then argue as in Lemma 13.3. $\square$

The remaining results in this chapter consist of routine calculations, all of them similar in structure, which are widely used in the technical parts of the proofs. We assume the usual notation, and write $\bar\theta = \theta \wedge 1$.

We suppose throughout the remainder of the chapter that the nonnegative real numbers $p_t$, $t \in \mathbb{Z}$, are defined by
\[
p_0 = Kn^{-\bar\theta}(b+1)^{\bar\theta}; \qquad p_t = Kn^{-\bar\theta}(t+1)^{-(1-\bar\theta)}, \quad t \ge b+1; \qquad p_t = 0 \text{ otherwise}, \qquad(13.12)
\]
for some $K > 0$. Because of Corollary 9.3, the $p_t$ act as bounds for probabilities such as $\mathbb{P}[T_{bn}(Z) = t]$.

Lemma 13.6 For $p_t$ as in (13.12), $\sum_{t=0}^{s-1}p_t \le K\bar\theta^{-1}n^{-\bar\theta}s^{\bar\theta}$.

Proof. A simple integral inequality suffices. $\square$
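Lemma 13.6 is easily confirmed numerically; a sketch with $K = 1$ and illustrative $\bar\theta$, $n$, $b$, checking the inequality for $s \ge b+1$, the range in which it is applied.

```python
tb, n, b = 0.6, 50, 3                       # theta-bar, n, b; K = 1 throughout
# p_t as in (13.12): p_0, zeros for 1 <= t <= b, then the power-law tail
p = [n ** (-tb) * (b + 1) ** tb] + [0.0] * b \
    + [n ** (-tb) * (t + 1) ** (-(1 - tb)) for t in range(b + 1, n + 1)]

for s in range(b + 1, n + 1):
    assert sum(p[:s]) <= n ** (-tb) * s ** tb / tb + 1e-12
```

The inequality follows from comparing the sum with $\int_{b+1}^s x^{-(1-\bar\theta)}\,dx$ and absorbing $p_0$, exactly as the "simple integral inequality" of the proof.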

Lemma 13.7 For any $v_i \ge 0$, $i \ge 1$, we have
\begin{align*}
(1)&: \sum_{i=b+1}^{\lfloor(s+1)/4\rfloor}\sum_{l=2}^{\lfloor(s+1)/2i\rfloor}l\varepsilon_{il}p_{s-il}
\le Kn^{-\bar\theta}(s+1)^{-(1-\bar\theta)}2^{1-\bar\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}F_{i1};\\
(2)&: \sum_{i=b+1}^{\lfloor s/2\rfloor}\sum_{\substack{l=\lfloor(s+1)/2i\rfloor+1\\ il<s}}^{\lfloor s/i\rfloor}l\varepsilon_{il}p_{s-il}
\le Kn^{-\bar\theta}(s+1)^{-(1-\bar\theta)}\varphi^{\bar\theta}_1(s);\\
(3)&: \sum_{i=b+1}^{\lfloor s/2\rfloor}\sum_{\substack{l\ge2\\ il=s}}l\varepsilon_{il}p_0
\le Kn^{-\bar\theta}(s+1)^{-(1-\bar\theta)}u^*_1(s)\Bigl(\frac{b+1}{s+1}\Bigr)^{\bar\theta};\\
(4)&: \sum_{i=b+1}^{\lfloor\alpha(s+1)\rfloor}\frac{v_i\theta}{ir_i}p_{s-i}
\le K\theta n^{-\bar\theta}\bigl\{(1-\alpha)(s+1)\bigr\}^{-(1-\bar\theta)}\sum_{i=1}^{\lfloor\alpha(s+1)\rfloor}\frac{v_i}{ir_i}, \quad 0<\alpha<1;\\
(5)&: \sum_{i=\lfloor\alpha(s+1)\rfloor+1}^{s}\frac{v_i\theta}{ir_i}p_{s-i}
\le \frac{K\theta n^{-\bar\theta}\bigl\{(1-\alpha)(s+1)\bigr\}^{\bar\theta}}{\bar\theta\,\alpha(s+1)}\max_{i>\alpha(s+1)}\frac{v_i}{r_i}, \quad 0<\alpha<1;\\
(6)&: \sum_{i=b+1}^{\lfloor\alpha(s+1)\rfloor}\frac{v_i\theta}{ir_i}p_{s-2i}
\le K\theta n^{-\bar\theta}\bigl\{(1-2\alpha)(s+1)\bigr\}^{-(1-\bar\theta)}\sum_{i=1}^{\lfloor\alpha(s+1)\rfloor}\frac{v_i}{ir_i}, \quad 0<\alpha<1/2;\\
(7)&: \sum_{i=\lfloor\alpha(s+1)\rfloor+1}^{\lfloor s/2\rfloor}\frac{v_i\theta}{ir_i}p_{s-2i}
\le \frac{K\theta n^{-\bar\theta}\bigl\{(1-2\alpha)(s+1)\bigr\}^{\bar\theta}}{\bar\theta\,\alpha(s+1)}\max_{i>\alpha(s+1)}\frac{v_i}{r_i}, \quad 0<\alpha<1/2;\\
(8)&: \sum_{i=b+1}^{\lfloor(s+1)/6\rfloor}\frac{v_i\theta}{ir_i}\sum_{l=2}^{\lfloor(s+1)/2i\rfloor-1}\varepsilon_{il}p_{s-i(l+1)}
\le K\theta n^{-\bar\theta}(s+1)^{-(1-\bar\theta)}2^{1-\bar\theta}\sum_{i=1}^{\lfloor(s+1)/6\rfloor}\frac{E_{i1}v_i}{ir_i};\\
(9)&: \sum_{i=b+1}^{\lfloor(s+1)/4\rfloor}\frac{v_i\theta}{ir_i}\sum_{\substack{l=\lfloor(s+1)/2i\rfloor\\ i(l+1)<s}}^{\lfloor s/i\rfloor-1}\varepsilon_{il}p_{s-i(l+1)}
\le 2K\theta n^{-\bar\theta}(s+1)^{-(2-\bar\theta)}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}\frac{v_i}{r_i}\chi^{(\bar\theta)}_{i2}(s);\\
(10)&: \sum_{i=\lfloor(s+1)/4\rfloor+1}^{\lfloor s/3\rfloor}\frac{v_i\theta}{ir_i}\varepsilon_{i2}p_{s-3i}
\le K\theta n^{-\bar\theta}(s+1)^{-(1-\bar\theta)}\bar\theta^{-1}4^{1-\bar\theta}\bigl(\varepsilon^*_{s/4,2}/r^-_{s/4}\bigr)\max_{i>s/4}v_i;\\
(11)&: \sum_{i=b+1}^{\lfloor s/3\rfloor}\frac{v_i\theta}{ir_i}\sum_{\substack{l\ge2\\ i(l+1)=s}}\varepsilon_{il}p_0
\le K\theta n^{-\bar\theta}(s+1)^{-(1-\bar\theta)}s^{-1}u^*_2(s)\Bigl(\frac{b+1}{s+1}\Bigr)^{\bar\theta}\max_{i>0}v_i.
\end{align*}

Proof. For parts (1), (4), (6) and (8), simply use (13.12) to bound $p_t$ by the largest value it could take in the ranges of $t$ under consideration. For example, in (8), $s - i(l+1) \ge (s-1)/2$, which gives $p_{s-i(l+1)} \le Kn^{-\bar\theta}\{2/(s+1)\}^{1-\bar\theta}$; then note that $E_{i1} = \sum_{l\ge2}\varepsilon_{il}$.

For parts (5), (7) and (10), bound the coefficient of $p_t$ by the largest value it can take in the given range of indices, and then use Lemma 13.6 to bound the sum of the $p_t$. For example, in (10),
\[
v_i\varepsilon_{i2}/ir_i \le 4(s+1)^{-1}\bigl(\varepsilon^*_{s/4,2}/r^-_{s/4}\bigr)\max_{i>s/4}v_i
\]
for each $i \ge \lfloor(s+1)/4\rfloor+1$, and the largest possible index $t$ for $p_t$ in the given range is $s - 3\{\lfloor(s+1)/4\rfloor+1\} \le (s-3)/4$, so that their sum is at most $K\bar\theta^{-1}n^{-\bar\theta}\{(s+1)/4\}^{\bar\theta}$, by Lemma 13.6.

For parts (3) and (11), (13.12) bounds $p_0$, and the definitions of $u^*_j(s)$, $j = 1, 2$, complete the proof; in (11), observe also that $i^{-1} = s^{-1}(l+1)$. For part (2), use the bound for $p_t$ and the definition of $\varphi^{\bar\theta}_1(s)$; do the same for part (9), but multiply $\varepsilon_{il}$ by $2i(l+1)/(s+1) \ge 1$ and then use the definition of $\chi^{(\bar\theta)}_{i2}(s)$. $\square$

In the next two lemmas, the non-negative real numbers $q_t$ are defined by
\[
q_t = 0, \quad t < b; \qquad q_b = Kn^{-\bar\theta}(b+1)^{-(1-\bar\theta)}; \qquad q_t = Cn^{-\bar\theta}(t+1)^{-(2-\bar\theta)}, \quad t > b, \qquad(13.13)
\]
for some $C, K > 0$. The $q_t$ act as (elements of the) bounds for differences of probabilities such as $|\mathbb{P}[T_{bn} = t] - \mathbb{P}[T_{bn} = t+1]|$, in view of Theorem 9.7. The first lemma requires no proof.

Lemma 13.8 For $q_t$ as defined in (13.13),
\[
\sum_{j=t}^{t+m-1}q_j \le Cn^{-\bar\theta}m(t+1)^{-(2-\bar\theta)}
\]
for any $t \ge b+1$ and $m \ge 1$.
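Lemma 13.8 holds because $q_j$ is decreasing in $j$ for $j \ge b+1$, so each of the $m$ summands is at most $q_t$; a quick numerical confirmation with $C = K = 1$ and illustrative parameters.

```python
tb, n, b, C, K = 0.7, 40, 4, 1.0, 1.0          # theta-bar, n, b, and the two constants

def q(t):                                      # q_t as defined in (13.13)
    if t < b:
        return 0.0
    if t == b:
        return K * n ** (-tb) * (b + 1) ** (-(1 - tb))
    return C * n ** (-tb) * (t + 1) ** (-(2 - tb))

for t in range(b + 1, n):
    for m in range(1, n - t + 1):
        total = sum(q(j) for j in range(t, t + m))
        assert total <= C * n ** (-tb) * m * (t + 1) ** (-(2 - tb)) + 1e-15
```
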

Lemma 13.9 Suppose that $s \ge 2b+3$, and that the $q_t$ are as defined in (13.13). Then
\begin{align*}
(1)&: \sum_{i=b+1}^{\lfloor(s+1)/4\rfloor}\frac{\theta|\varepsilon_{i1}|}{ir_i}(1 + |\varepsilon_{i1}|)\sum_{t=s-2i}^{s-i-1}q_t
\le C\theta n^{-\bar\theta}(s+1)^{-(2-\bar\theta)}2^{2-\bar\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}r_i^{-1}|\varepsilon_{i1}|(1 + |\varepsilon_{i1}|);\\
(2)&: \sum_{i=b+1}^{\lfloor(s+1)/6\rfloor}\frac{\theta|\varepsilon_{i1}|}{ir_i}\sum_{l=2}^{\lfloor(s+1)/2i\rfloor-1}\varepsilon_{il}\sum_{t=s-i(l+1)}^{s-i-1}q_t
\le C\theta n^{-\bar\theta}(s+1)^{-(2-\bar\theta)}2^{2-\bar\theta}\sum_{i=1}^{\lfloor(s+1)/6\rfloor}r_i^{-1}|\varepsilon_{i1}|F_{i1};\\
(3)&: \sum_{i=b+1}^{\lfloor\alpha(s+1)\rfloor}\frac{v_i\theta}{ir_i}q_{s-i}
\le C\theta n^{-\bar\theta}\bigl\{(1-\alpha)(s+1)\bigr\}^{-(2-\bar\theta)}\sum_{i=1}^{\lfloor\alpha(s+1)\rfloor}\frac{v_i}{ir_i}\\
&\qquad{}+ 2K\theta n^{-\bar\theta}(s+1)^{-(2-\bar\theta)}(1-\alpha)^{-(1-\bar\theta)}\bigl(r^-_{s/2}\bigr)^{-1}\Bigl(\max_{i>s/2}v_i\Bigr)\mathbf{1}_{\{\lfloor\alpha(s+1)\rfloor\ge s-b\}}, \quad 0\le\alpha<1;\\
(4)&: \sum_{i=b+1}^{\lfloor\alpha(s+1)\rfloor}\frac{v_i\theta}{ir_i}q_{s-2i}
\le C\theta n^{-\bar\theta}\bigl\{(1-2\alpha)(s+1)\bigr\}^{-(2-\bar\theta)}\sum_{i=1}^{\lfloor\alpha(s+1)\rfloor}\frac{v_i}{ir_i}\\
&\qquad{}+ 4K\theta n^{-\bar\theta}(s+1)^{-(2-\bar\theta)}(1-2\alpha)^{-(1-\bar\theta)}\bigl(r^-_{s/4}\bigr)^{-1}\Bigl(\max_{i>s/4}v_i\Bigr)\mathbf{1}_{\{2\lfloor\alpha(s+1)\rfloor\ge s-b\}}, \quad 0\le\alpha<1/2;\\
(5)&: \sum_{i=b+1}^{\lfloor(s+1)/4\rfloor}\sum_{l=2}^{\lfloor(s+1)/2i\rfloor}l\varepsilon_{il}q_{s-il}
\le Cn^{-\bar\theta}(s+1)^{-(2-\bar\theta)}2^{2-\bar\theta}\sum_{i=1}^{\lfloor(s+1)/4\rfloor}F_{i1};\\
(6)&: \sum_{i=b+1}^{\lfloor(s+1)/6\rfloor}\frac{v_i\theta}{ir_i}\sum_{l=2}^{\lfloor(s+1)/2i\rfloor-1}\varepsilon_{il}q_{s-i(l+1)}
\le C\theta n^{-\bar\theta}(s+1)^{-(2-\bar\theta)}2^{2-\bar\theta}\sum_{i=1}^{\lfloor(s+1)/6\rfloor}\frac{E_{i1}v_i}{ir_i}.
\end{align*}

Proof. For parts (1) and (2), we use Lemma 13.8 to bound the $q_t$-sums; in each case, the lower limit for $t$ is at least $\tfrac12(s-1)$, and we recall that $\tfrac12(s-1) \ge b+1$ if $s \ge 2b+3$. In parts (5) and (6), we also have the upper bound $Cn^{-\bar\theta}\{2/(s+1)\}^{2-\bar\theta}$ for all elements $q_t$ in the sums. For parts (3) and (4), the argument is analogous, except when indices $t \le b$ are possible for $q_t$. Taking part (3), this happens if $s - \alpha(s+1) \le b$, in which case the value $i = s-b$ gives a contribution of $\theta\{v_{s-b}/((s-b)r_{s-b})\}q_b$, to be bounded using (13.13) and the facts that $s \ge 2b+3$ and $b+1 \ge (s+1)(1-\alpha)$; since $q_{s-i} = 0$ for all $i > s-b$, and the remaining positive values of $q_t$ are no greater than
\[
Cn^{-\bar\theta}\bigl\{(1-\alpha)(s+1)\bigr\}^{-(2-\bar\theta)},
\]
the stated result follows. A similar argument applies in part (4) if $s-b$ is even and $i = \tfrac12(s-b) \le \lfloor\alpha(s+1)\rfloor$. $\square$

Page 328: This is page i Logarithmic Combinatorial Structures: a ..._Publications...This is page i Printer: Opaque this Logarithmic Combinatorial Structures: a Probabilistic Approach Richard

This is page 316Printer: Opaque this

References

[1] D. J. Aldous. Exchangeability and related topics. In Ecole d’ete de proba-bilites de Saint-Flour XIII, volume 1117 of Lecture Notes in Mathematics,pages 1–198. Springer, Berlin, 1985.

[2] D. J. Aldous and J. W. Pitman. Brownian bridge asymptotics for randommappings. Random Structures and Algorithms, 5:487–512, 1994.

[3] D. J. Aldous and J. W. Pitman. Tree-valued markov chains derived fromgalton-watson processes. Annales de l’Institut Henri Poincare, 34:637–686,1998.

[4] P. Andersson. Random circuit decompositions of complete graphs. Unpub-lished, 2002.

[5] R. Arratia. Independence of prime factors: total variation and Wasser-stein metrics, insertions and deletions, and the Poisson-Dirichlet process.Preprint, 1996.

[6] R. Arratia. On the amount of dependence in the prime factorizationof a uniform random integer. In First Erdos Workshop on ProbabilisticCombinatorics, 1998. In press.

[7] R. Arratia, A. D. Barbour, and S. Tavare. Poisson process approximationsfor the Ewens Sampling Formula. Annals of Applied Probability, 2:519–535,1992.

[8] R. Arratia, A. D. Barbour, and S. Tavare. On random polynomials overfinite fields. Mathematical Proceedings of the Cambridge PhilosophicalSociety, 114:347–368, 1993.

[9] R. Arratia, A. D. Barbour, and S. Tavare. Random combinatorial structuresand prime factorizations. Notices of the American Mathematical Society,44:903–910, 1997a.

Page 329: This is page i Logarithmic Combinatorial Structures: a ..._Publications...This is page i Printer: Opaque this Logarithmic Combinatorial Structures: a Probabilistic Approach Richard

References 317

[10] R. Arratia, A. D. Barbour, and S. Tavaré. Expected l1 distance in Poisson-Dirichlet approximations for random permutations: a tale of three couplings. Preprint, 1997b.

[11] R. Arratia, A. D. Barbour, and S. Tavaré. The Poisson-Dirichlet distribution and the scale invariant Poisson process. Combinatorics, Probability and Computing, 8:407–416, 1999a.

[12] R. Arratia, A. D. Barbour, and S. Tavaré. On Poisson-Dirichlet limits for random decomposable combinatorial structures. Combinatorics, Probability and Computing, 8:193–208, 1999b.

[13] R. Arratia and D. Stark. A total variation distance invariance principle for primes, permutations, polynomials and Poisson-Dirichlet. Preprint, 1996.

[14] R. Arratia, D. Stark, and S. Tavaré. Total variation asymptotics for Poisson process approximations of logarithmic combinatorial assemblies. Annals of Probability, 23:1347–1388, 1995.

[15] R. Arratia and S. Tavaré. The cycle structure of random permutations. Annals of Probability, 20:1567–1591, 1992a.

[16] R. Arratia and S. Tavaré. Limit theorems for combinatorial structures via discrete process approximations. Random Structures and Algorithms, 3:321–345, 1992b.

[17] R. Arratia and S. Tavaré. Independent process approximations for random combinatorial structures. Advances in Mathematics, 104:90–154, 1994.

[18] E. Bach. Analytic Methods in the Analysis and Design of Number-theoretic Algorithms. ACM Distinguished Dissertations. MIT Press, Cambridge, MA, 1985.

[19] M. B. Barban and A. I. Vinogradov. On the number theoretic basis of probabilistic number theory. Doklady Akademii Nauk SSSR, 154:495–496, 1964.

[20] A. D. Barbour. Comment on a paper of Arratia, Goldstein and Gordon. Statistical Science, 5:425–427, 1990.

[21] A. D. Barbour, L. H. Y. Chen, and W.-L. Loh. Compound Poisson approximation for nonnegative random variables via Stein’s method. Annals of Probability, 20:1843–1866, 1992.

[22] A. D. Barbour and P. G. Hall. On the rate of Poisson convergence. Mathematical Proceedings of the Cambridge Philosophical Society, 95:473–480, 1984.

[23] A. D. Barbour, L. Holst, and S. Janson. Poisson Approximation. Oxford University Press, Oxford, 1992.

[24] A. D. Barbour and S. Tavaré. A rate for the Erdős-Turán law. Combinatorics, Probability and Computing, 3:167–176, 1994.

[25] E. Barouch and G. M. Kaufman. Probabilistic modelling of oil and gas discovery. In F. S. Roberts, editor, Energy: Mathematics and Models, pages 133–150. SIAM, Philadelphia, PA, 1976.

[26] J. P. Bell, E. A. Bender, P. J. Cameron, and L. B. Richmond. Asymptotics for the probability of connectedness and the distribution of number of components. Electronic Journal of Combinatorics, 7, 2000. Research Paper 33.

[27] F. Bergeron, G. Labelle, and P. Leroux. Combinatorial Species and Tree-like Structures, volume 67 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1998.

[28] M. R. Best. The distribution of some variables on symmetric groups. Koninklijke Nederlandse Akademie van Wetenschappen. Indagationes Mathematicae, 73:385–402, 1970.

[29] A. Beurling. Analyse de la loi asymptotique de la distribution des nombres premiers généralisés, I. Acta Mathematica, 68:255–291, 1937.

[30] P. Billingsley. On the central limit theorem for the prime divisor function. American Mathematical Monthly, 76:132–139, 1969.

[31] P. Billingsley. On the distribution of large prime divisors. Periodica Mathematica Hungarica, 2:283–289, 1972.

[32] P. Billingsley. The 1973 Wald memorial lecture: The probability theory of additive arithmetic functions. Annals of Probability, 2:749–791, 1974.

[33] P. Billingsley. Convergence of Probability Measures. Wiley, New York, second edition, 1999.

[34] B. Bollobás. Random Graphs. Academic Press, London, 1985.

[35] B. Bollobás. Random Graphs. Cambridge University Press, second edition, 2001.

[36] J. D. Bovey. An approximate probability distribution for the order of elements of the symmetric group. Bulletin of the London Mathematical Society, 12:41–46, 1980.

[37] F. Brenti. Unimodal, log-concave, and Pólya frequency sequences in combinatorics, volume 413 of Memoirs of the American Mathematical Society. American Mathematical Society, Providence, RI, 1989.

[38] A. A. Buchstab. An asymptotic estimation of a general number-theoretic function. Matematicheskii Sbornik, 44:1239–1246, 1937.

[39] M. Car. Factorisation dans Fq(X). Comptes Rendus de l'Académie des Sciences. Série I. Mathématique, 294:147–150, 1982.

[40] M. Car. Ensembles de polynômes irréductibles et théorèmes de densité. Acta Arithmetica, 44:323–342, 1984.

[41] A. Cayley. A theorem on trees. Quarterly Journal of Pure and Applied Mathematics, 23:376–378, 1889.

[42] M. Csörgő and P. Révész. Strong Approximation in Probability and Statistics. Academic Press, New York, 1981.

[43] D. J. Daley and D. Vere-Jones. An Introduction to the Theory of Point Processes. Springer-Verlag, New York, 1988.

[44] F. N. David. Games, Gods and Gambling. Hafner Publishing Co., New York, 1962.

[45] F. N. David and D. E. Barton. Combinatorial Chance. Hafner Publishing Co., New York, 1962.


[46] H. Delange. Généralisation du théorème de Ikehara. Annales Scientifiques de l'École Normale Supérieure, 71:213–242, 1954.

[47] H. Delange. Sur des formules de Atle Selberg. Acta Arithmetica, 19:105–146, 1971.

[48] J. M. DeLaurentis and B. G. Pittel. Counting subsets of the random partition and the ‘Brownian bridge’ process. Stochastic Processes and their Applications, 15:155–167, 1983.

[49] J. M. DeLaurentis and B. G. Pittel. Random permutations and Brownian motion. Pacific Journal of Mathematics, 119:287–301, 1985.

[50] L. Devroye. Applications of the theory of records in the study of random trees. Acta Informatica, 26:123–130, 1988.

[51] P. Diaconis and D. Freedman. Finite exchangeable sequences. Annals of Probability, 8:745–764, 1980.

[52] P. Diaconis, M. McGrath, and J. W. Pitman. Riffle shuffles, cycles and descents. Combinatorica, 15:11–29, 1995.

[53] P. Diaconis and J. W. Pitman. Permutations, record values and random measures. Unpublished lecture notes, Statistics Department, University of California, Berkeley, 1986.

[54] H. G. Diamond. Chebychev estimates for Beurling generalized prime numbers. Proceedings of the American Mathematical Society, 39:503–508, 1973.

[55] K. Dickman. On the frequency of numbers containing prime factors of a certain relative magnitude. Arkiv för Matematik, Astronomi och Fysik, 22:1–14, 1930.

[56] P. Donnelly and G. Grimmett. On the asymptotic distribution of large prime factors. Journal of the London Mathematical Society, 47:395–404, 1993.

[57] P. Donnelly and P. Joyce. Continuity and weak convergence of ranked and size-biased permutations on the infinite simplex. Stochastic Processes and their Applications, 31:89–103, 1989.

[58] P. Donnelly, T. G. Kurtz, and S. Tavaré. On the functional central limit theorem for the Ewens Sampling Formula. Annals of Applied Probability, 1:539–545, 1991.

[59] R. M. Dudley. Probabilities and metrics. Convergence of laws on metric spaces, with a view to statistical testing. Number 45 in Lecture Notes Series. Matematisk Institut, Aarhus Universitet, Aarhus, 1976.

[60] R. M. Dudley. Real Analysis and Probability. Wadsworth and Brooks/Cole, Pacific Grove, California, 1989.

[61] P. D. T. A. Elliott. Probabilistic Number Theory I, volume 239 of Grundlehren der math. Wissenschaften. Springer, Berlin, 1979.

[62] P. D. T. A. Elliott. Probabilistic Number Theory II, volume 240 of Grundlehren der math. Wissenschaften. Springer, Berlin, 1980.

[63] S. Engen. A note on the geometric series as a species frequency model. Biometrika, 62:697–699, 1975.


[64] P. Erdős and M. Kac. The Gaussian law of errors in the theory of additive number theoretic functions. American Journal of Mathematics, 62:738–742, 1940.

[65] P. Erdős and P. Turán. On some problems of a statistical group-theory. I. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 4:175–186, 1965.

[66] P. Erdős and P. Turán. On some problems of a statistical group-theory. III. Acta Mathematica Academiae Scientiarum Hungaricae, 18:309–320, 1967.

[67] W. J. Ewens. The sampling theory of selectively neutral alleles. Theoretical Population Biology, 3:87–112, 1972.

[68] W. Feller. The fundamental limit theorems in probability. Bulletin of the American Mathematical Society, 51:800–832, 1945.

[69] W. Feller. An Introduction to Probability Theory and its Applications, volume 1. Wiley, third edition, 1968.

[70] S. E. Fienberg and U. E. Makov. Uniqueness and disclosure risk: Urn models and simulation. In Bayesian Methods with Applications to Science, Policy and Official Statistics (Selected papers from ISBA 2000), Monographs in Official Statistics, pages 135–144. 2001.

[71] P. Flajolet and A. M. Odlyzko. Singularity analysis of generating functions. SIAM Journal on Discrete Mathematics, 3:216–240, 1990a.

[72] P. Flajolet and A. M. Odlyzko. Random mapping statistics. In J.-J. Quisquater, editor, Proc. Eurocrypt ’89, volume 434 of Lecture Notes in Computer Science, pages 329–354. Springer-Verlag, Berlin, 1990b.

[73] P. Flajolet and R. Sedgewick. Analytic Combinatorics. Book in preparation, 1999.

[74] P. Flajolet and M. Soria. Gaussian limiting distributions for the number of components in combinatorial structures. Journal of Combinatorial Theory. Series A, 53:165–182, 1990.

[75] D. Foata. La série génératrice exponentielle dans les problèmes d'énumération. Séminaire de Mathématiques Supérieures. Les Presses de l'Université de Montréal, Montréal, Québec, 1974.

[76] B. Fristedt. The structure of random partitions of large integers. Transactions of the American Mathematical Society, 337:703–735, 1993.

[77] W. M. Y. Goh and E. Schmutz. A central limit theorem on GLn(Fq). Random Structures and Algorithms, 2:47–53, 1991.

[78] W. M. Y. Goh and E. Schmutz. Random matrices and Brownian motion. Combinatorics, Probability and Computing, 2:157–180, 1993.

[79] S. W. Golomb. Research problems 11. Random permutations. Bulletin of the American Mathematical Society, 70:747, 1964.

[80] V. L. Goncharov. On the distribution of cycles in permutations. Doklady Akademii Nauk SSSR, 35:299–301, 1942.

[81] V. L. Goncharov. Some facts from combinatorics. Izvestia Akademii Nauk SSSR, Ser. Mat., 8:3–48, 1944. See also: On the field of combinatory analysis. Translations of the American Mathematical Society, 19:1–46.


[82] L. Gordon. Estimation of large successive samples with unknown inclusion probabilities. Advances in Applied Mathematics, 14:89–122, 1993.

[83] I. P. Goulden and D. M. Jackson. Combinatorial Enumeration. Wiley, New York, 1983.

[84] X. Gourdon. Largest component in random combinatorial structures. Discrete Mathematics, 180:185–209, 1998.

[85] R. C. Griffiths. On the distribution of allele frequencies in a diffusion model. Theoretical Population Biology, 15:140–158, 1979.

[86] R. C. Griffiths. On the distribution of points in a Poisson-Dirichlet process. Journal of Applied Probability, 25:336–345, 1988.

[87] B. M. Hambly, P. Keevash, N. O’Connell, and D. Stark. The characteristic polynomial of a random permutation matrix. Stochastic Processes and their Applications, 90:335–346, 2000.

[88] J. C. Hansen. A functional central limit theorem for random mappings. Annals of Probability, 17:317–332, 1989.

[89] J. C. Hansen. A functional central limit theorem for the Ewens Sampling Formula. Journal of Applied Probability, 27:28–43, 1990.

[90] J. C. Hansen. Factorization in Fq[x] and Brownian motion. Combinatorics, Probability and Computing, 2:285–299, 1993.

[91] J. C. Hansen. Order statistics for decomposable combinatorial structures. Random Structures and Algorithms, 5:517–533, 1994.

[92] J. C. Hansen and J. Jaworski. Large components of bipartite random mappings. Random Structures and Algorithms, 17:317–342, 2000.

[93] J. C. Hansen and E. Schmutz. How random is the characteristic polynomial of a random matrix? Mathematical Proceedings of the Cambridge Philosophical Society, 114:507–515, 1993.

[94] M. H. Hansen and W. N. Hurwitz. On the theory of sampling from finite populations. Annals of Mathematical Statistics, 14:333–362, 1943.

[95] F. Harary and E. M. Palmer. Graphical Enumeration. Academic Press, London, 1973.

[96] G. H. Hardy and S. Ramanujan. Asymptotic formulae in combinatory analysis. Proceedings of the London Mathematical Society, 17:75–115, 1918.

[97] G. H. Hardy and E. M. Wright. An Introduction to the Theory of Numbers. Clarendon Press, England, 1960. 4th edition.

[98] D. Hensley. The convolution powers of the Dickman function. Journal of the London Mathematical Society, 33:395–406, 1986.

[99] A. Hildebrand. The asymptotic behavior of the solutions of a class of differential-difference equations. Journal of the London Mathematical Society, 42:11–31, 1990.

[100] A. Hildebrand and G. Tenenbaum. On a class of differential-difference equations arising in number theory. Journal d'Analyse Mathématique, 61:145–179, 1993.

[101] U. M. Hirth. A Poisson approximation for the Dirichlet law, the Ewens sampling formula and the Griffiths-Engen-McCloskey law by the Stein-Chen coupling method. Bernoulli, 3:225–232, 1997.


[102] L. Holst. Two conditional limit theorems with applications. Annals of Statistics, 7:551–557, 1979a.

[103] L. Holst. A unified approach to limit theorems for urn models. Journal of Applied Probability, 16:154–162, 1979b.

[104] L. Holst. Some conditional limit theorems in exponential families. Annals of Probability, 9:818–830, 1981.

[105] H.-K. Hwang. Théorèmes limites pour les structures combinatoires et les fonctions arithmétiques. PhD thesis, École Polytechnique, Palaiseau, 1994.

[106] H.-K. Hwang. Asymptotic expansions for the Stirling numbers of the first kind. Journal of Combinatorial Theory. Series A, 71:343–351, 1995.

[107] H.-K. Hwang. On convergence rates in the central limit theorems for combinatorial structures. European Journal of Combinatorics, 19:329–343, 1998a.

[108] H.-K. Hwang. A Poisson ∗ geometric convolution law for the number of components in unlabelled combinatorial structures. Combinatorics, Probability and Computing, 7:89–110, 1998b.

[109] H.-K. Hwang. Large deviations of combinatorial distributions. II. Local limit theorems. Annals of Applied Probability, 8:163–181, 1998c.

[110] H.-K. Hwang. Asymptotics of Poisson approximation to random discrete distributions: an analytic approach. Advances in Applied Probability, 31:448–491, 1999.

[111] T. Ignatov. A constant arising in the asymptotic theory of symmetric groups, and Poisson–Dirichlet measures. Theory of Probability and its Applications, 27:136–147, 1982.

[112] G. James and A. Kerber. The Representation Theory of the Symmetric Group, volume 16 of Encyclopedia of Mathematics and its Applications. Addison-Wesley, Reading, Massachusetts, 1981.

[113] S. Janson. Cycles and unicyclic components in random graphs. Preprint, 2000.

[114] N. S. Johnson, S. Kotz, and N. Balakrishnan. Discrete Multivariate Distributions. Wiley, New York, 1997.

[115] A. Joyal. Une théorie combinatoire des séries formelles. Advances in Mathematics, 42:1–82, 1981.

[116] M. Kac. Statistical Independence in Probability, Analysis and Number Theory, volume 12 of The Carus Mathematical Monographs. Wiley, New York, 1959.

[117] L. Katz. Probability of indecomposability of a random mapping function. Annals of Mathematical Statistics, 26:512–517, 1955.

[118] F. P. Kelly. Reversibility and Stochastic Networks. Wiley, New York, 1979.

[119] J. F. C. Kingman. Random discrete distributions. Journal of the Royal Statistical Society, Series B, 37:1–22, 1975.

[120] J. F. C. Kingman. The population structure associated with the Ewens sampling formula. Theoretical Population Biology, 11:274–283, 1977.


[121] J. Knopfmacher. Abstract Analytic Number Theory, volume 12 of North-Holland Mathematical Library. North-Holland Publishing Company, Amsterdam, 1975.

[122] J. Knopfmacher. Analytic Arithmetic of Algebraic Function Fields, volume 50 of Lecture Notes in Pure and Applied Mathematics. Marcel Dekker, Inc., New York and Basel, 1979.

[123] D. E. Knuth and L. Trabb Pardo. Analysis of a simple factorization algorithm. Theoretical Computer Science, 3:321–348, 1976.

[124] V. F. Kolchin. A problem of the allocation of particles in cells and cycles of random permutations. Theory of Probability and its Applications, 16:74–90, 1971.

[125] V. F. Kolchin. A problem of the allocation of particles in cells and random mappings. Theory of Probability and its Applications, 21:48–63, 1976.

[126] V. F. Kolchin. A new proof of asymptotic lognormality of the order of a random substitution. In Proceedings Combinatorial and Asymptotical Analysis, pages 82–93. Krasnoyarsk State University Press, 1977. (In Russian).

[127] V. F. Kolchin. Random Mappings. Optimization Software, Inc., New York, 1986.

[128] V. F. Kolchin. Random Graphs. Number 53 in Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1999.

[129] V. F. Kolchin, B. A. Sevastyanov, and V. P. Chistyakov. Random Allocations. Halsted Press, Washington, 1978.

[130] J. Kubilius. Probabilistic Methods in the Theory of Numbers, volume 11 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1964.

[131] T. G. Kurtz. Strong approximation theorems for density dependent Markov chains. Stochastic Processes and their Applications, 6:223–240, 1978.

[132] J. C. Lagarias. Beurling generalized integers with the Delone property. Forum Mathematicum, 11:295–312, 1999.

[133] E. Landau. Handbuch der Lehre von der Verteilung der Primzahlen. Teubner, Leipzig, 1909. 3rd edition: Chelsea, New York, 1974.

[134] T. Lindvall. Lectures on the Coupling Method. Wiley, New York, 1992.

[135] J. H. van Lint and R. M. Wilson. A Course in Combinatorics. Cambridge University Press, Cambridge, 1992.

[136] M. Loève. Probability Theory, volume 1. Springer, New York, 4th edition, 1977a.

[137] M. Loève. Probability Theory, volume 2. Springer, New York, 4th edition, 1977b.

[138] J. van de Lune and E. Wattel. On the numerical solution of a differential-difference equation arising in analytic number theory. Mathematics of Computation, 23:417–421, 1969.

[139] W. C. Lynch. More combinatorial properties of certain trees. Computer Journal, 7:299–302, 1965.


[140] H. M. Mahmoud. Evolution of Random Search Trees. Wiley-Interscience, New York, 1992.

[141] J. W. McCloskey. A model for the distribution of individuals by species in an environment. PhD thesis, Michigan State University, 1965.

[142] A. Meir and J. W. Moon. On random mapping patterns. Combinatorica, 4:61–70, 1984.

[143] N. Metropolis and G.-C. Rota. Witt vectors and the algebra of necklaces. Advances in Mathematics, 50:95–125, 1983.

[144] H. Midzuno. On the sampling system with probability proportionate to sum of sizes. Annals of the Institute of Statistical Mathematics, Tokyo, 3:99–107, 1952.

[145] P.-R. de Montmort. Essai d'Analyse sur les Jeux de Hasard. Paris, 1708.

[146] J. W. Moon. Counting Labelled Trees. Canadian Mathematical Monographs, Vol. 1. Clowes and Sons, London, 1970.

[147] L. Moser and M. Wyman. An asymptotic formula for the Bell numbers. Transactions of the Royal Society of Canada. Section III, 49:49–54, 1955.

[148] L. Moser and M. Wyman. Asymptotic development of the Stirling numbers of the first kind. Journal of the London Mathematical Society, 33:133–146, 1958.

[149] L. R. Mutafciev. Limit theorem concerning random mapping patterns. Combinatorica, 8:345–356, 1988.

[150] L. R. Mutafciev. Large components and cycles in a random mapping pattern. In M. Karoński, J. Jaworski, and A. Ruciński, editors, Random Graphs ’87, pages 189–202, New York, 1990. John Wiley and Sons.

[151] L. R. Mutafciev. The largest tree in certain models of random forests. Random Structures and Algorithms, 13:211–228, 1998.

[152] J.-L. Nicolas. A Gaussian law on FQ[X]. Colloquia Mathematica Societatis János Bolyai, 34:1127–1162, 1984.

[153] J.-L. Nicolas. Distribution statistique de l'ordre d'un élément du groupe symétrique. Acta Mathematica Hungarica, 45:69–84, 1985.

[154] A. Nijenhuis and H. S. Wilf. Combinatorial Algorithms. Academic Press, Inc., Orlando, FL, second edition, 1978.

[155] A. M. Odlyzko. Asymptotic enumeration methods. In Handbook of Combinatorics, pages 1063–1229. Elsevier, Amsterdam, 1995.

[156] R. Otter. The number of trees. Annals of Mathematics, 49:583–599, 1948.

[157] E. M. Palmer and A. J. Schwenk. On the number of trees in a random forest. Journal of Combinatorial Theory, Series B, 27:109–121, 1979.

[158] D. Panario and B. Richmond. Exact largest and smallest size of components. Algorithmica, 31:413–432, 2001.

[159] G. P. Patil and C. Taillie. Diversity as a concept and its implications for random communities. Bulletin of the International Statistical Institute, XLVII:497–515, 1977.

[160] A. I. Pavlov. On a theorem by Erdős and Turán. Problems of Cybernetics, 64:57–66, 1980. (In Russian).


[161] A. I. Pavlov. Local limit theorems for the number of components of random permutations and mappings. Theory of Probability and its Applications, 33:183–187, 1988.

[162] Y. L. Pavlov. Random Forests. VSP, The Netherlands, 2000.

[163] M. Perman, J. W. Pitman, and M. Yor. Size-biased sampling of Poisson point processes and excursions. Probability Theory and Related Fields, 92:21–39, 1992.

[164] W. Philipp. Arithmetic functions and Brownian motion. In Analytic Number Theory, pages 233–246. American Mathematical Society, Providence, RI, 1973.

[165] J. W. Pitman. Some developments of the Blackwell-MacQueen urn scheme. In T. S. Ferguson, L. S. Shapley, and J. B. MacQueen, editors, Statistics, Probability and Game Theory, volume 30 of IMS Lecture Notes-Monograph Series, pages 245–267. Institute of Mathematical Statistics, Hayward, CA, 1996.

[166] J. W. Pitman. Some probabilistic aspects of set partitions. American Mathematical Monthly, 104:201–209, 1997.

[167] J. W. Pitman. Enumerations of trees and forests related to branching processes and random walks. In D. Aldous and J. Propp, editors, Microsurveys in Discrete Probability, volume 41, pages 163–180. American Mathematical Society, Providence, RI, 1998.

[168] J. W. Pitman. Random mappings, forests, and subsets associated with Abel-Cayley-Hurwitz multinomial expansions. Séminaire Lotharingien de Combinatoire, 46, 2001. Article B46h.

[169] J. W. Pitman. Combinatorial stochastic processes. Draft of lecture notes for Saint-Flour 2002, 2002.

[170] J. W. Pitman and M. Yor. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25:855–900, 1997.

[171] B. G. Pittel. On a likely shape of the random Ferrers diagram. Advances in Applied Mathematics, 18:432–488, 1997a.

[172] B. G. Pittel. Random set partitions: asymptotics of subset counts. Journal of Combinatorial Theory. Series A, 79:326–359, 1997b.

[173] H. Rademacher. On the partition function p(n). Proceedings of the London Mathematical Society, 43:241–254, 1937.

[174] A. Rényi. On the density of certain sequences of integers. Académie Serbe des Sciences. Publications de l'Institut Mathématique, 8:157–162, 1955.

[175] A. Rényi. On the outliers of a series of observations. A Magyar Tudományos Akadémia Matematikai és Fizikai Tudományok Osztályának Közleményei, 12:105–121, 1962. Reprinted in Selected Papers of Alfréd Rényi, Vol. 3, pp. 50–65, 1976. Published by Akadémiai Kiadó.

[176] R. J. Riddell Jr. and G. E. Uhlenbeck. On the theory of the virial development of the equation of state of mono-atomic gases. Journal of Chemical Physics, 21:2056–2064, 1953.


[177] V. N. Sachkov. Random partitions of sets. Theory of Probability and its Applications, 19:184–190, 1974.

[178] V. N. Sachkov. Probabilistic Methods in Combinatorial Analysis, volume 56 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1997.

[179] S. M. Samuels. A Bayesian, species-sampling-inspired approach to the uniques problem in microdata disclosure risk assessment. Journal of Official Statistics, 14:373–383, 1998.

[180] H. Scheffé. A useful convergence theorem for probability distributions. Annals of Mathematical Statistics, 18:434–438, 1947.

[181] A. Selberg. Note on the paper by L. G. Sathe. Journal of the Indian Mathematical Society, 18:83–87, 1954.

[182] L. A. Shepp and S. P. Lloyd. Ordered cycle lengths in a random permutation. Transactions of the American Mathematical Society, 121:340–357, 1966.

[183] N. J. A. Sloane and S. Plouffe. The Encyclopedia of Integer Sequences. Academic Press, San Diego, 1995. The On-Line Encyclopedia of Integer Sequences may be found at http://www.research.att.com/~njas/sequences/index.html.

[184] A. J. Stam. Distance between sampling with and without replacement. Statistica Neerlandica, 32:81–91, 1978.

[185] R. P. Stanley. Enumerative Combinatorics, volume 1. Wadsworth and Brooks/Cole, Pacific Grove, CA, 1986.

[186] D. Stark. Total variation distance for independent process approximations of random combinatorial objects. PhD thesis, University of Southern California, 1994.

[187] D. Stark. Explicit limits of total variation distance in approximations of random logarithmic assemblies by related Poisson processes. Combinatorics, Probability and Computing, 6:87–106, 1997a.

[188] D. Stark. Total variation asymptotics for independent process approximations of logarithmic multisets and selections. Random Structures and Algorithms, 11:51–80, 1997b.

[189] C. Stein. The order of a random permutation. Unpublished manuscript.

[190] V. E. Stepanov. Limit distributions of certain characteristics of random mappings. Theory of Probability and its Applications, 14:612–626, 1969.

[191] R. Stong. Some asymptotic results on finite vector spaces. Advances in Applied Mathematics, 9:167–199, 1988.

[192] S. Tavaré. The birth process with immigration, and the genealogical structure of large populations. Journal of Mathematical Biology, 25:161–168, 1987.

[193] G. Tenenbaum. Introduction to Analytic and Probabilistic Number Theory, volume 46 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1995.

[194] G. Tenenbaum. Crible d'Ératosthène et modèle de Kubilius. In Number Theory in Progress (Zakopane-Kościelisko, 1997), volume 2, pages 1099–1129. De Gruyter, Berlin, 1999.


[195] G. Tenenbaum. A rate estimate in Billingsley’s theorem for the size distribution of large prime factors. Oxford Quarterly Journal of Mathematics, 51:385–403, 2000.

[196] A. M. Vershik. The asymptotic distribution of factorizations of natural numbers into prime divisors. Soviet Mathematics. Doklady, 34:57–61, 1987.

[197] A. M. Vershik and A. A. Shmidt. Limit measures arising in the asymptotic theory of symmetric groups. I. Theory of Probability and its Applications, 22:70–85, 1977.

[198] W. Vervaat. Success Epochs in Bernoulli Trials with Applications in Number Theory. Mathematical Center Tracts, vol. 42. Mathematisch Centrum, Amsterdam, 1972.

[199] G. A. Watterson. The sampling theory of selectively neutral alleles. Advances in Applied Probability, 6:463–488, 1974.

[200] G. A. Watterson. The stationary distribution of the infinitely-many neutral alleles diffusion model. Journal of Applied Probability, 13:639–651, 1976.

[201] F. S. Wheeler. Two differential-difference equations arising in number theory. Transactions of the American Mathematical Society, 318:491–523, 1990.

[202] P. Whittle. Statistical processes of aggregation and polymerisation. Proceedings of the Cambridge Philosophical Society, 61:475–495, 1965.

[203] P. Whittle. Nonlinear migration processes. Bulletin of the International Statistical Institute, 42:642–647, 1967.

[204] P. Whittle. Systems in stochastic equilibrium. John Wiley and Sons, Chichester, 1986.

[205] K. L. Wieand. Eigenvalue distributions of random matrices in the permutation group and compact Lie groups. PhD thesis, Harvard University, 1998.

[206] K. L. Wieand. Eigenvalue distributions of random permutation matrices. Annals of Probability, 28:1563–1587, 2000.

[207] H. Wilf. Three problems in combinatorial asymptotics. Journal of Combinatorial Theory. Series A, 35:199–207, 1983.

[208] H. Wilf. Generatingfunctionology. Academic Press, San Diego, CA, 1990.

[209] W.-B. Zhang. The prime element theorem in additive arithmetic semigroups, I. Illinois Journal of Mathematics, 40:245–280, 1996a.

[210] W.-B. Zhang. Probabilistic number theory in additive arithmetic semigroups I. In B. C. Berndt, H. G. Diamond, and A. J. Hildebrand, editors, Analytic number theory: proceedings of a conference in honor of Heini Halberstam, volume 139 of Progress in Mathematics, pages 839–885. Birkhäuser, Boston, Mass., 1996b.

[211] W.-B. Zhang. The prime element theorem in additive arithmetic semigroups, II. Illinois Journal of Mathematics, 42:198–229, 1998.


Notation Index

1l(·), 3
a1, 141
a2, 141
Ar, 16
c(r.s), 140
c(w0), 220
mC, 140
C(n), vii, 3, 26
Ci(n), vii, 3, 26
C(b,n), 153
C∗(n), 57, 134, 140
dBW, 195, 209, 214
dK, 209, 214
dTV, 7, 58, 209
dW, 209, 214
Dn, 12
Eij, 138
E∗ij, 138
jEn, 139
ESF(θ), 51
FBL, 190
fθ, 14, 95
f[r]θ, 94
fθ(r), 14, 62, 100
F?, 69
Fij, 138
F∗ij, 138
Fθ, 95
G, 35, 185
Gα, 63
g1, 141
g2, 141
G(n), 185
GEM(θ), 94
GF(q), 34
h(n+1), 13
K0n, vii, 10, 20, 26
Kvm(·), 26, 135
Lr, 14
Lr(n), 14
mg, 162
mi, 28, 44
M, 107
Mn, 106
On, 16
p−i, 139
p(n), 27, 44
pθ, 73
pθ(α)(x), 77
Pθ, 72
Pθ(α), 77, 262
Pbl, 288
Pbl∗, 288
PD(θ), 14, 104
Po(λ), 13
Q∗n, 172
ri, 138
r−i, 139
rvs(m), 275
Rnb, 172
R(n, c), 38
Sn, 2
S, 213
Sm, 208
S(n), 139
Sn(k), 10
T0n, 7, 26
T∗bn, 113
Tvm(·), 60, 135
ul(b, s), 139
u∗l, 139
Uj(l), 185
wg, 164
w0, 219
W(n), 201
jW(n), 201
x(n), 10
x[r], 3
X?, 69
X(n), 185
jX(n), 199
mX(n), 204
X∗(n), 203
mX∗(n), 204
Xθ, 71
Xθ(α), 76
Yr, 13
Yr(n), 12
Zj, 6, 17, 19, 26, 111
Zij, 138
mZ, 138
mZ[a, b], 139
jZj, 198
Z∗, 112
Z∗j, 112
mZ∗, 140
Z(b,n), 153
EZ(b,n), 152
αθ, 172
β0, 61
β01, 152
β02, 181
β1, 62
β11, 151
β12, 237
β2, 150
∆, 57
∆i, 138
εil, 138
ε∗il, 138
εr.s, 140
θ, viii, 56, 111
θ, 99, 140
θi, 61
Eθ, 111
κ(v, s, m), 267
λbn, 172, 301
λ∗0n, 174
µi, 138
µ∗i, 138
µ∗0n, 175
νi, 138
ν∗i, 138
ξj, 10
ρα, 170
ρ(·), 14
ρi, 138
ρ∗i, 138
φαl, 139
φr.s, 140
χ, 115, 154, 296
χ0, 119
χi1(α), 139
χi2(α), 139
Ψ(n), 63, 162
Ψ∗, 63, 162
ω(·), 13
ωθ(·), 91
∂, 35, 185


Author Index

Aldous, D., 34Aldous, D. J., 31, 160Arratia, R., xii, 7, 8, 13, 14, 17,

20–22, 34, 38, 39, 87, 90,91, 102, 109, 126, 129,151, 156, 158–160, 171

Bach, E., xi, 22Balakrishnan, N., 52Barban, M. B., 20Barbour, A. D., xii, 8, 9, 11, 12,

14, 17, 34, 58, 87, 109,158–160, 171, 174, 176,183, 208, 287

Barouch, E., 70Barton, D., 6Bell, J. P., 119Bender, E. A., 119Bergeron, F., ixBest, M. R., 17Beurling, A., 36Billingsley, P., xi, 19, 21, 22Bollobas, B., 29, 30, 161Bovey, J. D., 17Brenti, F., 171Buchstab, A. A., xi, 13

Cameron, P. J., 119
Car, M., 147, 149, 171, 280
Cayley, A., 32
Chen, L. H. Y., 64, 208
Chistyakov, V. P., ix
Csörgő, M., 160, 202

Daley, D. J., 107
David, F. N., 6
Delange, H., 20
DeLaurentis, J. M., 12, 17, 156
Devroye, L., 52
Diaconis, P., viii, ix, 9, 30, 35, 51
Diamond, H. G., 36
Dickman, K., xi, 14

Donnelly, P., xi, 22, 90, 105, 156
Dudley, R. M., 58, 191

Elliott, P. D. T. A., xi, 20
Engen, S., 94, 105
Erdős, P., xi, 16, 21, 102, 183, 184
Ewens, W. J., 52, 68

Feller, W., 8, 11, 87
Fienberg, S. E., 52
Flajolet, P., ix, 31, 34, 36, 38, 39
Foata, D., ix, 38
Freedman, D., viii
Fristedt, B., ix, 28

Goh, W. M. Y., 36, 156, 171
Golomb, S. W., 14
Goncharov, V. L., 4, 6, 11, 13, 171
Gordon, L., 69, 149
Goulden, I. P., ix
Gourdon, X., ix
Griffiths, R. C., 76, 94, 104
Grimmett, G., xi, 22

Hall, P. G., 11, 171
Hambly, B. M., 17
Hansen, J. C., ix, 34, 36, 90, 117, 129, 156, 160, 171
Hansen, M. H., 69
Harary, F., 34
Hardy, G. H., 23, 27
Hensley, D., 76
Hildebrand, A., 76
Hirth, U. M., 176
Holst, L., viii, 58, 174, 176, 287
Hurwitz, W. N., 69
Hwang, H.-K., ix, 11, 49, 119, 171, 179

Ignatov, T., 97, 105

Jackson, D. M., ix
James, G., 48, 49



Janson, S., 58, 161, 174, 176, 287
Jaworski, J., 160
Johnson, N. S., 52
Joyal, A., ix, 38, 39
Joyce, P., 105

Kac, M., 21
Katz, L., 30
Kaufman, G. M., 70
Keevash, P., 17
Kelly, F. P., ix, 54
Kerber, A., 48, 49
Kingman, J. F. C., 14, 95, 101, 104, 160
Knopfmacher, J., ix, 35, 185
Knuth, D. E., xi, 22, 23
Kolchin, V. F., ix, 6, 11, 13, 17, 31, 34, 171
Kotz, S., 52
Kubilius, J., xi, 20, 21
Kurtz, T. G., 90, 156, 159

Labelle, G., ix
Lagarias, J. C., 36
Landau, E., 20, 89, 118
Leroux, P., ix
Lindvall, T., 58
Lint, J. H. van, 35, 38
Lloyd, S. P., ix, 11, 13, 17
Loève, M., 187, 192, 200, 201
Loh, W.-L., 208
Lune, J. van de, 76
Lynch, W. C., 52

Mahmoud, H. M., 52
Makov, U. E., 52
McCloskey, J. W., 94, 105
McGrath, M., 35
Meir, A., 32
Metropolis, N., 35
Midzuno, H., 69
Montmort, P.-R. de, 1
Moon, J. W., 32
Moser, L., 11, 28
Mutafciev, L. R., 32, 160, 171
Mutavciev, L. R., 34

Nicolas, J.-L., 17, 184
Nijenhuis, A., 27–29, 34

O’Connell, N., 17
Odlyzko, A., ix, 31, 194
Otter, R., 33

Palmer, E. M., 33, 34
Panario, D., ix, 13
Patil, G. P., 105
Pavlov, A. I., 11, 17, 171
Pavlov, Y. L., 34
Perman, M., 105
Philipp, W., 21
Pitman, J. W., ix, 9, 28, 30, 31, 34, 35, 51, 105, 106, 110
Pittel, B. G., ix, 12, 17, 28, 29, 156
Plouffe, S., 33

Rényi, A., 8, 11, 21
Révész, P., 160, 202
Rademacher, H., 28
Ramanujan, S., 23, 28
Richmond, L. B., ix, 13, 119
Riddell, R. J., 38
Rota, G.-C., 35

Sachkov, V. N., 29
Samuels, S. M., 52
Scheffé, H., 101, 117
Schmutz, E., 36, 129, 156, 160, 171
Schwenk, A. J., 33, 34
Sedgewick, R., ix
Selberg, A., 20
Sevast’yanov, B. A., ix
Shepp, L. A., ix, 11, 13, 17
Shmidt, A. A., 14, 97, 106
Sloane, N. J. A., 33
Soria, M., ix, 34, 36, 38, 39
Stam, A. J., viii
Stanley, R. P., ix
Stark, D., 17, 20, 49, 151, 154



Stein, C., 17, 64, 208
Stepanov, V. E., 31, 171
Stong, R., 36

Taillie, C., 105
Tavaré, S., xii, 7, 8, 12–14, 17, 34, 38, 39, 52, 87, 90, 91, 102, 104, 109, 126, 129, 151, 156, 158–160, 171, 183

Tenenbaum, G., xi, 13, 18, 20, 22, 76, 142

Trabb Pardo, L., xi, 22, 23
Turán, P., xi, 16, 102, 183

Vere-Jones, D., 107
Vershik, A. M., xi, 14, 97, 106
Vervaat, W., 73, 76
Vinogradov, A. I., 20

Wattel, E., 76
Watterson, G. A., ix, 68, 76, 96, 97, 160
Wheeler, F. S., 76
Whittle, P., ix, 54
Wieand, K. L., 17
Wilf, H. S., 12, 27–29, 34, 38
Wilson, R. M., 35, 38
Wright, E. M., 27
Wyman, M., 11, 28

Yor, M., 105, 106, 110

Zhang, W.-B., ix, 35, 185, 186, 192, 194, 196, 199



Subject Index

Pθ, 257, 262
Xθ, 71–76, 107, 113, 136, 137, 213, 257–263
  density pθ, 73, 108
Xθ(α), 76–80, 109, 113
  density pθ(α), 77, 109
pθ, 95, 137

additive functions, 185–207
additive semigroup
  arithmetic, 35, 185–207
  Erdős–Wintner theorem, 187
    rate, 195
  Kubilius Main Theorem, 199
    functional, 201
    functional rate, 202
  regular variation
    convergence, 205
    functional rate, 206
  slow variation
    convergence, 199
    functional rate, 202

age-ordering, 15
approximation
  Brownian, 156–160
    group order, 183
  discrete, 134
  global, 61, 84
    all components, 152, 153
    large components, 151
    small components, 150
  local, 62, 136
    large components, 153
    small components, 153
  normal, 171
    group order, 183
  Poisson, 171, 175, 304
  Poisson-Dirichlet, 160–170
  total variation, 209

approximation of m−1Tvm(Z) by Xθ
  Kolmogorov, 262
  Wasserstein, 262
approximation of m−1Tvm(Z∗) by Xθ
  in distribution, 257–263
  Kolmogorov, 260
  Wasserstein, 259
approximation of m−1Tvm(Z∗) by Xθ(α)
  v/m ∼ α, 262
approximation of T0m(Z) by T0m(Z∗)
  Kolmogorov, 256
  total variation, 252
approximation of Tvm(Z) by T0m(Z∗)
  Kolmogorov, any v, 254
  Wasserstein, any v, 257

approximation of Tvm(Z) by Tvm(Z∗)
  in distribution, 247–257
  interval probabilities, 270
  point probabilities, 267–275
    from interval probabilities, 268
    large v, 273
    main theorem, 271
    ratios, 275
    uniform bound, 273
  total variation, large v, 248
  Wasserstein, large v, 249

approximation of Tvm(Z) by Xθ
  in distribution, 247–263
  point probabilities, 264–278
approximation of Tvm(Z∗) by Xθ
  point probabilities, 276–278
assemblies, 37–40, 46, 50, 51, 61–63, 119, 124–126
  logarithmic, 43, 61

asymptotic independence, 57, 62

Bell numbers, 28
Bernoulli
  random variables, 10, 47, 130, 141, 171, 218
  representation, 10, 11, 88



Beta random variables, 94, 105, 106, 108

Beurling generalized primes, 36
binomial random variables, 42
Brenti’s representation, 171
Brownian motion, 12, 201
Buchstab’s function
  generalized, 78–80
Buchstab’s function ω, 13, 78–80

card shuffling, 35
central limit theorem, 90
  functional, 11, 21, 90, 156–160, 201
  permutations, 11
  prime factorizations, 21

coagulation, 54–55
coloring, 47–49
completely additive, 185
component spectrum, vi, 26
components
  fixed number of, 89, 119
  large, 57, 62–63, 134–137, 151, 187
  largest, 116
  number of, 170–183
    functional limit theorem, 201
  small, 57, 62–63, 83–87, 137–138, 150, 186
  smallest, 114
Conditioning Relation, vii, 7, 17, 23, 25, 39, 59, 68, 84, 98, 100, 111, 135, 140

  general statement, 26, 56
conditions
  G, A, D, B, 141
  general, 140–141
conventions
  general, 140
coupling, 166–170
  Feller, 87, 165
  operational time, 167
  Poisson process, 166, 167

decomposable, 26
derangement, 5
Dickman’s function ρ, 14
dissected representation, 138, 140, 148, 151
distance
  bounded Wasserstein, 159, 171, 194, 209, 214
  Kolmogorov, 209, 214
  total variation, 7, 58–61, 209
  Wasserstein, 170, 209, 214

distinct
  component sizes, 49
  parts of a multiset, 49
divisor function, 142

Erdős–Turán theorem
  functional rate, 183
  rate, 183
Erdős–Wintner theorem, 187
Ewens Sampling Formula, viii, 3, 51–55, 57, 67–110, 134, 140, 152, 161, 165, 170, 183, 187
exponential random variables, 104, 108

factorial
  falling, 3
  moments, 4, 50, 68
  rising, 10
Feller coupling, 67, 87, 165
finite fields
  random nonsingular matrices, 36
  random polynomials, 34
functional central limit theorem
  general, 156–160
  permutations, 11, 90
  prime factorizations, 21

GEM
  density, 94, 105
  distribution, 16, 22, 94, 104–110



  process, 104–110
generalized primes, 36
generating function, 10
generator approach, 210
geometric random variables, 19

harmonic number h(n+1), 13, 72
hat-check problem, 2

immigration-death process, 210–212
indicator notation, 3
infinitely divisible, 136, 139–141, 196, 218, 223
infinitesimal generator, 210, 214
integer partitions, 27
invariance principle
  discrete, 134

Kolmogorov
  distance, 209, 214
  three series criterion, 187
Kubilius
  fundamental lemma, 20
  Main Theorem, 199

Landau’s formula, 20, 89, 118
large deviations, 149
  fixed number of components, 88–90, 118–123
  number of components, 179–183
  smallest components, 115–116
  smallest cycles, 13
limit distribution, 6
limit theorem
  central, 11, 21, 90
  functional, 11, 21, 62–63, 90, 156–160
  local, 10, 62, 81, 82, 88, 98, 100, 113, 116, 136, 137, 265, 266

LLT, 113–119, 123–132, 136, 161, 265, 266, 277
  definition, 113
  sharpening, 276

local limit theorem, 62, 81, 82, 88, 98, 100, 113, 116, 136, 137, 265, 266
  permutations, 10
logarithmic class, 42, 84, 134–145
Logarithmic Condition, viii, 111–133, 137, 140
  general statement, 56

Markov process, 210–212
measures of smallness, 138–139
Mertens’ theorem, 25
multisets, 28, 38–41, 47, 51, 61–63, 119, 126–128, 147, 185, 193
  logarithmic, 43, 61

necklaces, 34
negative binomial
  random variables, 41, 141
normal approximation, 171

partitions
  integer, 27
  set, 28

permutations, 1–18, 29, 51
  age-ordering, 15–16
  canonical cycle notation, 2
  Cauchy’s formula, 3, 23, 89
  cycle type, 3, 51, 68
  distinct cycle lengths, 12
  functional central limit theorem, 11, 90
  group order, 16, 102, 183–184
  limit distribution, 6
  local limit theorem, 10
  longest cycles, 13–15, 95–102
  moments, 3
  number of cycles, 10, 51, 88–90
  Poisson approximation, 11
  ordered cycles, 93–94



  short cycles, 83
  shortest cycles, 12–13, 87, 91–93
  size-biased, 105
  total variation distance, 7–8

point probabilities
  approximation, 264–278
  bounds, 221–246
  bounds on differences, 229–246
  first bound, 223
  large argument, 225–229
  second bound, 225
  successive differences
    first bound, 231
    first uniform bound, 234
    second bound, 237
    second uniform bound, 244
    simplified bound, 246

Poisson
  approximation, 11, 64–65, 171, 175, 304
  compound, 65–66, 136, 208–220
Poisson process, 6, 104–110, 212, 215, 216
  coupling, 166, 167
  scale invariant, 106–110, 165, 262
  spacings, 105
  translation invariant, 107

Poisson random variables, 40, 68, 112, 140, 141, 208, 301

Poisson-Dirichlet
  approximation, 160–170
  density, 14, 99
  distribution (L1, L2, . . .), 14, 102, 104–110
  limit, 22, 57, 63
  process, 63, 104–110, 176

population genetics, 52
prime factorizations, 18–22
  conditioning relation, 23–25
  functional central limit theorem, 21
  generalized, 36
  Mertens’ theorem, 25
  number of factors, 20, 185
    asymptotics, 20
    central limit theorem, 21
  Poisson-Dirichlet, 22
  size-biased permutation, 22
  total variation distance, 20

prime factors
  largest, 22
  smallest, 22
pure death process, 210–212

random graphs, 29
random mapping patterns, 31, 43, 171
random mappings, 30, 149, 171
random polynomials, 34, 36, 43, 147, 149, 150, 171, 184
refinement, 45–47
regular variation, 203
rising factorial, 10

Selberg–Delange method, 20
selections, 39, 41–42, 47, 51, 61–63, 119, 128–129
  logarithmic, 43, 61

set partitions, 28
singularity theory, 194
size-biased permutation
  cycle lengths of permutations, 15
  prime factorizations, 22
size-biasing, 69–71, 74, 79–82, 84, 105, 125, 126, 129, 136, 190, 194, 229, 307
  equation, 71, 112
  Stein analogue, 219

slow variation, 196
species, 37
Stein Equation, 210
  for Pθ, 214
  for Pθ(α), 262
  for T0m(Z∗), 208, 250, 270
Stein Operator



  for Pθ, 213
  for T0m(Z∗), 208, 216, 250

Stein’s method, 64–66, 130–132, 136, 137, 208–220, 229
  for Pθ, 213–216, 257–263, 302
  for T0m(Z∗), 208–213, 250–257
Stein-Chen method, 64–65, 174, 305
Stirling numbers, 10
  asymptotics, 10–11
strongly additive, 185
subsets, 39

theta, viii
tilting, 49–55, 145–147, 179–183
total variation
  approximation, 209
  asymptotics, 154

trees, 32
  search, 52–54

uniform random variables, 108
uniform structures, 37

Wasserstein distance, 170
  bounded, 159, 171, 194

wreath products, 47, 48