
THE UBIQUITOUS DIGITAL TREE

PHILIPPE FLAJOLET

Abstract. The digital tree, also known as trie, made its first appearance as a general-purpose data structure in the late 1950’s. Its principle is a recursive partitioning based on successive bits or digits of data items. Under various guises, it has then surfaced in the management of very large data bases, in the design of efficient communication protocols, in quantitative data mining, in the leader election problem of distributed computing, in data compression, as well as in some corners of computational geometry. The algorithms are invariably very simple, easy to implement, and in a number of cases surprisingly efficient. The corresponding quantitative analyses pose challenging mathematical problems and have triggered a flurry of research works. Generating functions and symbolic methods, singularity analysis, the saddle-point method, transfer operators of dynamical systems theory, and the Mellin transform have all been found to have a bearing on the probabilistic behaviour of trie algorithms. We offer here a perspective on the rich algorithmic, analytic, and probabilistic aspects of tries, culminating with a connection between a sorting problem and the Riemann hypothesis.

Invited lecture, STACS06, Marseille, February 2006. Proceedings in Lecture Notes in Computer Science.

While, in the course of the 1980s and 1990s, a large portion of the theoretical computer science community was massively engaged in worst-case design and analysis issues, the discovery of efficient algorithms continued to make tangible progress. Such algorithms are often based on simple and elegant ideas, and, accordingly, their study is likely to reveal structures of great mathematical interest. Also, efficiency is much better served by probabilistic analyses¹ than by the teratological constructions of worst-case theory. I propose to illustrate this point of view by discussing a fundamental process shared by algorithmics, combinatorics, and discrete probability theory: the digital tree process. Because of space-time limitations, this text, an invited lecture at STACS’06, cannot be but a brief guide to a rich subject whose proper development would require a book of full length.

¹ To be mitigated by common sense and a good feel for algorithmic engineering, of course!

1. The basic structure

Consider first as domain of our data items the set of all infinitely long binary strings, $B = \{0, 1\}^{\infty}$. The goal is to devise a data structure in which elements of $B$ can be stored and easily retrieved. Given a finite set $\omega \subset B$ like

\[ \omega = \{110100\cdots,\ 01011\cdots,\ 01101\cdots\}, \]

a natural idea is to form a tree in which the left subtree will contain all the elements starting with 0, all elements starting with 1 going to the right subtree. (On the example, the last two strings would then go to the left subtree, the first one to the right subtree.) The splitting process is repeated, with the next bit of data becoming discriminant. Formally, given any $\omega \subset B$, we define

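\[ \omega \setminus 0 := \{\sigma \mid 0\sigma \in \omega\}, \qquad \omega \setminus 1 := \{\sigma \mid 1\sigma \in \omega\}. \]

The motto here is thus simply “filter and shift left”. The digital tree or trie associated to $\omega$ is then defined by the recursive rule:

\[ \mathrm{trie}(\omega) := \begin{cases} \emptyset & \text{if } \omega = \emptyset \\ \sigma & \text{if } \omega = \{\sigma\} \\ \langle \bullet, \mathrm{trie}(\omega \setminus 0), \mathrm{trie}(\omega \setminus 1) \rangle & \text{otherwise.} \end{cases} \tag{1} \]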

The tree $\mathrm{trie}(\omega)$ makes it possible to search for elements of $\omega$: in order to access $\sigma \in B$, simply follow a path in the tree dictated by the successive bits of $\sigma$, going left on a 0 and right on a 1. This continues until an external node is encountered, which either contains one element or is empty. Insertion proceeds similarly (split an external node if the position is already occupied), while deletion is implemented by a dual process (merging a node with its newly vacant brother). The tree $\mathrm{trie}(\omega)$ can be either constructed from scratch by a sequence of insertions or built by a top-down procedure reflecting the inductive definition (1). In summary:

Tries serve to implement dictionaries, that is, they support the operations of insertion, deletion, and query.
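The recursive rule (1) and the search procedure can be made concrete in a few lines. What follows is only a minimal sketch in Python, under the assumption that keys are distinct strings over '0'/'1' of equal finite length, standing in for the infinite strings of the model. The names `build_trie` and `search` are illustrative and do not come from the text.

```python
def build_trie(keys, depth=0):
    """Top-down construction following rule (1).

    Assumes keys are distinct, equal-length strings over '0'/'1'
    (finite stand-ins for the infinite strings of the model).
    Returns None for the empty set, the key itself for a singleton,
    and a pair (left, right) for an internal node.
    """
    if not keys:
        return None
    if len(keys) == 1:
        return keys[0]
    zeros = [k for k in keys if k[depth] == "0"]   # plays the role of omega \ 0
    ones = [k for k in keys if k[depth] == "1"]    # plays the role of omega \ 1
    return (build_trie(zeros, depth + 1), build_trie(ones, depth + 1))


def search(trie, key):
    """Follow the bits of key: left on 0, right on 1."""
    node, depth = trie, 0
    while isinstance(node, tuple):
        node = node[0] if key[depth] == "0" else node[1]
        depth += 1
    return node == key   # external node: compare with the stored key, if any
```

Instead of physically shifting the strings left, the sketch keeps a `depth` index into them, which amounts to the same filtering.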

A trie thus bears some resemblance to the Binary Search Tree (BST), with the basic BST navigation based on relative order being replaced by decisions based on values (bits) of the data items:

\[ \text{BST: } \langle x,\ \text{``$< x$''},\ \text{``$> x$''} \rangle; \qquad \text{Trie: } \langle \bullet,\ \text{``$= 0$''},\ \text{``$= 1$''} \rangle. \]

Equivalently, if infinite binary strings are interpreted as real numbers of $[0, 1]$, the separation at the root is based on the predicates $< \frac{1}{2}$, $\geq \frac{1}{2}$. As for the BST, a left-to-right traversal of the external nodes provides the set $\omega$ in sorted order: the resulting sorting algorithm is then essentially isomorphic to Radix Exchange Sort [43]. (This parallels the close relationship that BSTs entertain with the Quicksort algorithm.) The books by Knuth [43] and Sedgewick [54] serve as an excellent introduction to these questions.

There are many basic variations on the trie principle (1).

- Multiway branching. The alphabet $\{0, 1\}$ has so far been binary. An $m$-ary alphabet can be accommodated by means of multiway branching, with internal nodes being $m$-ary.

- Paging. Recursion may be halted as soon as the set $\omega$ has cardinality at most some fixed threshold $b$. The standard case is $b = 1$. The general case $b \geq 1$ corresponds to “bucketing” or paging and is used for retrieval from secondary memory.

- Finite-length data. Naturally occurring data tend to be of finite length. The trie can then be implemented by appending a terminator symbol to each data item, which causes branching to stop immediately.

- Digital search trees (DSTs). These are a hybrid between BSTs and tries. Given a sequence of elements of $B$, place the first element at the root of the tree, partition the rest according to the leading digit and proceed recursively. DSTs are well described and analysed in Knuth’s volume [43]. Their interest as a general purpose data structure has faded, but they have been found to play an important role in connection with data compression algorithms.

- Patricia tries. They are obtained from tries by adding skip fields in order to collapse sequences of one-way branches.

Let us point out at this stage that, as a general purpose data structure, tries and their kin are useful for performing not only dictionary operations, but also set intersection and set union. This fact was recognized early by Trabb Pardo [58]. The corresponding algorithms are analysed in [27], which also contains a thorough discussion of the algebra of finite-length models.

Complexity issues. Under the basic model of infinitely long strings, the worst-case complexity of the algorithms can be arbitrarily large. In the more realistic case of finite-length strings, the worst-case search cost may equal the length of the longest item, and this may well be quite a large quantity. As for many probabilistic algorithms, what is in fact relevant is the “typical” behaviour of the trie, measured either on average or in probability under realistic data models. Analysis of algorithms plays here a critical role in helping us decide in which contexts a trie can be useful and how parameters should be dimensioned for best effect. This is the topic we address next.

2. Random tries

The field of analysis of algorithms has evolved over the past two decades. The old-style recurrence approach is nowadays yielding ground to modern “symbolic methods” that replace the study of sequences of numbers (counting sequences, probabilities of events, average-case values or moments of parameters) by the study of generating functions. The algebra of series and the analysis of functions, mostly in the complex domain $\mathbb{C}$, then provide precise asymptotic information on the original sequence. For an early comparative analysis of tries and digital search trees in this perspective, see [29].

The algebra of generating functions. Let $(f_n)$ be a numeric sequence; its exponential generating function (EGF) is by definition the formal sum

\[ f(z) = \sum_{n \geq 0} f_n \frac{z^n}{n!}. \tag{2} \]

(We systematically use the same groups of letters for numeric sequences and their EGFs.) Consider a parameter $\phi$ defined inductively over tries by

\[ \phi[\tau] = t[\tau] + \phi[\tau_0] + \phi[\tau_1]. \tag{3} \]

There, the trie $\tau$ is of the form $\langle \bullet, \tau_0, \tau_1 \rangle$, and the quantity $t[\tau]$, called the “toll” function, often only depends on the number of items stored in $\tau$ (so that $t[\tau] = t_n$ if $\tau$ contains $n$ data). Our goal is to determine the expectation (E) of the parameter $\phi$, when the set $\omega$ on which the trie is built comprises $n$ elements.

The simplest probabilistic model assumes bits in strings to be independent and identically distributed,

\[ P(0) = p, \qquad P(1) = q = 1 - p, \]

the $n$ strings of $\omega$ furthermore being drawn independently. This model is known as the Bernoulli model. The unbiased model, also known as the uniform model, corresponds to the further condition $P(0) = P(1) = \frac{1}{2}$. Under the general Bernoulli model, the inductive definition (3) admits a direct translation

\[ \phi(z) = t(z) + e^{qz}\phi(pz) + e^{pz}\phi(qz). \tag{4} \]

There $\phi(z)$ is the EGF of the sequence $(\phi_n)$ and $\phi_n = E_n(\phi)$ is the expectation of the parameter $\phi[\cdot]$ taken over all trees comprising $n$ data items. The verification from simple rules of series manipulation is easy: it suffices to see that, given $n$ elements, the probability that $k$ of them go into the left subtree (i.e., start with a 0) is the binomial probability $\binom{n}{k} p^k q^{n-k}$, so that, as regards expectations,

\[ \phi_n = t_n + \sum_{k} \binom{n}{k} p^k q^{n-k} \left( \phi_k + \phi_{n-k} \right). \]

For the number of binary nodes in the tree, a determinant of storage complexity, the toll is $t_n = 1 - \delta_{n0} - \delta_{n1}$. For path length, which represents the total access cost of all elements, it becomes $t_n = n - \delta_{n1}$. The functional equation (4) can then be solved by iteration. Under the unbiased Bernoulli model, we have for instance

\[ \phi(z) = t(z) + 2e^{z/2}\, t\!\left(\frac{z}{2}\right) + 4e^{3z/4}\, t\!\left(\frac{z}{4}\right) + \cdots. \]
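The recurrence on expectations can also be solved numerically, which gives a quick way of experimenting with tolls and with the bias $p$. The sketch below is an illustration only: the function names are hypothetical, and it assumes a toll with $t_0 = t_1 = 0$, so that $\phi_0 = \phi_1 = 0$. The two occurrences of $\phi_n$ on the right-hand side (for $k = 0$ and $k = n$) are moved to the left, which yields the factor $1 - p^n - q^n$.

```python
from math import comb


def expected_values(n_max, toll, p=0.5):
    """Expectations phi_0..phi_{n_max} under the Bernoulli(p) model.

    Solves phi_n = t_n + sum_k C(n,k) p^k q^(n-k) (phi_k + phi_{n-k})
    after moving the two phi_n terms (k = 0 and k = n) to the left side.
    Assumes toll(0) = toll(1) = 0, hence phi_0 = phi_1 = 0.
    """
    q = 1.0 - p
    phi = [0.0] * (n_max + 1)
    for n in range(2, n_max + 1):
        acc = float(toll(n))
        for k in range(1, n):  # phi_0 = 0, so the k = 0 term vanishes
            weight = comb(n, k) * (p**k * q**(n - k) + p**(n - k) * q**k)
            acc += weight * phi[k]
        phi[n] = acc / (1.0 - p**n - q**n)
    return phi


def size_toll(n):
    """One unit per internal (binary) node."""
    return 1 if n >= 2 else 0


def path_toll(n):
    """Each of the n items below an internal node pays one unit."""
    return n if n >= 2 else 0
```

For instance, `expected_values(50, size_toll)[50]` gives the expected number of binary nodes of a trie built on 50 uniform keys, and passing `p=0.3` switches to a biased model.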

Then, expansion around $z = 0$ yields coefficients, that is, expectations. We quote, under the unbiased Bernoulli model:

The expected size (number of binary nodes) and the expected path length of a trie built out of $n$ uniform independent random keys admit the explicit expressions

\[ S_n = \sum_{k \geq 0} 2^k \left[ 1 - (1 - 2^{-k})^n - \frac{n}{2^k} (1 - 2^{-k})^{n-1} \right], \qquad P_n = n \sum_{k \geq 0} \left[ 1 - (1 - 2^{-k})^{n-1} \right]. \]

This result was first discovered by Knuth in the mid 1960’s.
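The explicit expressions lend themselves to direct numerical evaluation. The following sketch is one possible way to do it, not a prescribed method: the function names and the truncation parameter `kmax` are assumptions of this illustration. The sums are truncated at `kmax` terms, and `log1p`/`expm1` are used because a naive evaluation of $1 - (1 - 2^{-k})^n$ loses all precision for large $k$.

```python
import math


def expected_size(n, kmax=120):
    """S_n from the explicit sum, truncated at kmax terms."""
    if n < 2:
        return 0.0
    total = 1.0  # k = 0 term: 2^0 * (1 - 0 - 0) when n >= 2
    for k in range(1, kmax):
        x = 2.0 ** -k
        log_1mx = math.log1p(-x)                 # log(1 - 2^-k)
        a = -math.expm1(n * log_1mx)             # 1 - (1 - 2^-k)^n
        b = n * x * math.exp((n - 1) * log_1mx)  # (n / 2^k) (1 - 2^-k)^(n-1)
        total += 2.0 ** k * (a - b)
    return total


def expected_path_length(n, kmax=120):
    """P_n from the explicit sum, truncated at kmax terms."""
    if n < 2:
        return 0.0
    total = 1.0  # k = 0 term: 1 - 0^(n-1) when n >= 2
    for k in range(1, kmax):
        total += -math.expm1((n - 1) * math.log1p(-(2.0 ** -k)))
    return n * total
```

Printing `expected_size(n) / n` and `expected_path_length(n) / n - math.log2(n)` for growing `n` is an easy way to produce the kind of plot discussed next.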

Asymptotic analysis and the Mellin transform. A plot of the averages, $S_n$ and $P_n$, is instructive. It strongly suggests that $S_n$ is asymptotically linear, $S_n \sim cn$ (?), while $P_n \sim n \lg n$ (?), where $\lg x := \log_2 x$. As a matter of fact, the conjecture on size is false, but by an amazingly tiny amount. What we have is the following property:

The expected size (number of binary nodes) and the expected path length of a trie built out of $n$ uniform independent random keys satisfy

\[ S_n = \frac{n}{\log 2} \left( 1 + \epsilon(\lg n) \right) + o(n), \qquad P_n = n \lg n + O(n). \tag{5} \]

There $\epsilon(x)$ is a continuous periodic function with amplitude $< 10^{-5}$.

[Plot of the periodic fluctuations: vertical axis ticks from −2e−06 to 1e−06, horizontal axis ticks at 1000 and 2000.]

We can only give here a few indications on the proof techniques and refer the reader to our long survey [24]. The idea, suggested to Knuth by the great analyst De Bruijn, is to appeal to the theory of integral transforms. Precisely, the Mellin transform associates to a function $f(x)$ (with $x \in \mathbb{R}_{\geq 0}$) the function $f^*(s)$ (with $s \in \mathbb{C}$) defined by

\[ f^*(s) = M[f(x), x \mapsto s] := \int_0^{\infty} f(x)\, x^{s-1}\, dx. \]

Figure 1. A random trie of size n = 500 built over uniform data.

For instance $M[e^{-x}] = \Gamma(s)$, the familiar Gamma function [61]. Mellin transforms have two strikingly powerful properties. First, they establish a correspondence between the asymptotic expansion of a function at $+\infty$ (resp. 0) and singularities of the transform in a right (resp. left) half-plane. Second, they factorize harmonic sums, which correspond to a linear superposition of models taken at different scales.

Consider the function $s(x) = e^{-x}S(x)$, where $S(z)$ is the EGF of the sequence $(S_n)$ of expectations of size. (This corresponds to adopting a Poisson rather than Bernoulli model; such a choice does not affect our asymptotic conclusions since, as can be proved elementarily [43, p. 131], $S_n - s(n) = o(n)$.) A simple calculation shows that

\[ s(x) = \sum_{k \geq 0} 2^k \left[ 1 - \left( 1 + \frac{x}{2^k} \right) e^{-x/2^k} \right], \qquad s^*(s) = -\frac{(1 + s)\Gamma(s)}{1 - 2^{1+s}}. \]

The asymptotic estimates (5) result from there, given that a pole at $\alpha$ of the transform corresponds to a term $x^{-\alpha}$ in the asymptotic expansion of the original function. It is the existence of complex poles at

\[ s = -1 + \frac{2ik\pi}{\log 2}, \qquad k \in \mathbb{Z}, \]

that, in a sense, “creates” the periodic fluctuations present in $s(x)$ (and hence in $S_n$).
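As an indication of how the transform of $s(x)$ arises, here is a sketch of the computation based on the harmonic sum property. The auxiliary names $g$, $\lambda_k$, $\mu_k$ are introduced only for this illustration, and the validity strips are stated without proof. Write $s(x) = \sum_{k \geq 0} \lambda_k\, g(\mu_k x)$ with

\[ g(x) = 1 - (1 + x)e^{-x}, \qquad \lambda_k = 2^k, \qquad \mu_k = 2^{-k}. \]

Since $M[g(\mu x), x \mapsto s] = \mu^{-s} g^*(s)$, the transform factorizes:

\[ g^*(s) = -\Gamma(s) - \Gamma(s+1) = -(1 + s)\Gamma(s) \quad (-2 < \Re s < 0), \qquad \sum_{k \geq 0} 2^k \cdot 2^{ks} = \frac{1}{1 - 2^{1+s}} \quad (\Re s < -1), \]

\[ s^*(s) = -\frac{(1 + s)\Gamma(s)}{1 - 2^{1+s}} \quad (-2 < \Re s < -1). \]

Near $s = -1$, the factor $(1 + s)/(1 - 2^{1+s})$ tends to $-1/\log 2$ while $\Gamma(s)$ has a simple pole of residue $-1$, so that

\[ s^*(s) \sim -\frac{1}{\log 2} \cdot \frac{1}{s + 1} \quad (s \to -1), \]

which corresponds to the main term $x / \log 2$ of $s(x)$. The other zeros of $1 - 2^{1+s}$ are not cancelled by the factor $1 + s$ and give the complex poles responsible for the fluctuations.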


Models. Relative to the unbiased model (unbiased 0-1 bits in independent data), the expected size estimate expresses the fact that storage occupation is at most of linear growth, despite the absence of convergence to a constant occupation ratio. The path length estimate means that the trie is nearly optimal in some information theoretic sense, since an element is typically found after $\sim \lg n$ binary questions. The profile of a random trie under this model is displayed in Figure 1.

The $\epsilon$-fluctuation, with an amplitude of $10^{-5}$, in the asymptotic behaviour of size tends to be quite puzzling to programmers. Undeniably, such fluctuations will never be detected on simulations, not to mention executions on real-life data. However, mathematically, their presence implies that most elementary strategies for analysing trie algorithms are doomed to failure. (See however [55, p. 403] for an elementary approach.) It is a fact that no coherent theory of tries can be developed without taking such fluctuations into account. For instance, the exact order of the variance of trie size and trie path length must involve them [41, 42]. As a matter of fact, some analyses, which were developed in the late 1970s and ignored fluctuations, led to wrong conclusions, even regarding the order of growth of important characteristics of tries.

Back to modelling issues, the uniform model seems at first sight to be of little value. It is however fully justified in situations where elements are accessed via hashing, and the indications it provides are precious: see for instance the discussion of dynamic and extendible hashing in Section 4. Also, the Mellin transform technology is equally suitable for extracting asymptotic information from the biased Bernoulli model ($p \neq q$). In that case, it is found that, asymptotically²,

\[ S_n \approx \frac{n}{H}, \qquad P_n \sim \frac{n}{H} \log n, \tag{6} \]

where $H \equiv H(p, q) = -p \log p - q \log q$ is the entropy function of the Bernoulli $(p, q)$ model. The formulæ admit natural generalizations to $m$-ary alphabets.

The estimates (6) indicate that trees become less efficient roughly in proportion to entropy. For instance, for a four-symbol alphabet, where each letter has probability larger than 0.10 (this is true of most {A,G,C,T} genomic sequences), the degradation of performance is by less than a factor of 1.5 (i.e., a 50% loss at most). In particular, linear storage and logarithmic access costs are preserved. Equally importantly, more realistic and considerably more general data models can be analysed precisely: see Section 8 relative to dynamical sources, which encapsulate the framework of Markov chains as a particular case.
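The factor of 1.5 quoted for four-letter alphabets can be checked with a few lines of arithmetic. The sketch below assumes that the $m$-ary generalization of (6) keeps the same shape, with $H = -\sum_i p_i \log p_i$, and that the least favourable distribution with all probabilities at least 0.10 is $(0.7, 0.1, 0.1, 0.1)$ (entropy being concave, its minimum over that region is reached at a vertex). The names are illustrative.

```python
import math


def entropy(probs):
    """Entropy H = -sum p log p (natural logarithm) of a memoryless source."""
    return -sum(p * math.log(p) for p in probs if p > 0)


uniform = entropy([0.25] * 4)            # log 4: best case for four letters
skewed = entropy([0.7, 0.1, 0.1, 0.1])   # every letter has probability >= 0.10
degradation = uniform / skewed           # ratio of the two n/H estimates
```

Under these assumptions the ratio stays a little below 1.5, in line with the claim.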

Amongst the many fascinating techniques that have proved especially fruitful for trie analyses, we should also mention: Rice’s method from the calculus of finite differences [30, 43]; analytic depoissonization specifically developed by Jacquet and Szpankowski [40], which has led to marked successes in the analysis of dictionary-based compression algorithms. Complex analysis, that is, the theory of analytic (holomorphic) functions, is central to most serious works in the area. Books that discuss relevant methods include those of Sedgewick-Flajolet [31], Hofri [35], Mahmoud [45], and Szpankowski [57].

² The symbol ‘$\sim$’ is used throughout in the strict sense of asymptotic equivalence; the symbol ‘$\approx$’ is employed here to represent a numerical approximation up to tiny fluctuations.


3. Multidimensional tries

Finkel and Bentley [21] adapted the BST to multidimensional data as early as 1974. Their ideas can be easily transposed to tries. Say you want to maintain sets of points in $d$-dimensional space. For $d = 2$, this gives rise to the standard quadtrie, which associates to a finite set $\omega \subset [0, 1]^2$ a tree defined as follows.

(i) If $\mathrm{card}(\omega) = 0$, then $\mathrm{quadtrie}(\omega) = \emptyset$;
(ii) if $\mathrm{card}(\omega) = 1$, then $\mathrm{quadtrie}(\omega)$ consists of a single external node containing $\omega$;
(iii) else, partition $\omega$ into the four subsets determined by their position with respect to the center $(\frac{1}{2}, \frac{1}{2})$ of space, and attach a root to the subtrees recursively associated to the four subsets (NW, NE, SW, SE, where NE stands for North-East, etc.).

A moment’s reflection shows that the quadtrie is equivalent to the 4-way trie built over an alphabet of cardinality 4: given any point $P = (x, y)$, write its coordinates in binary, $x = x_1 x_2 \cdots$ and $y = y_1 y_2 \cdots$, then encode the pair of coordinates “in parallel” over the alphabet $\{a, b, c, d\}$, where, say, $a = (0, 0)$, $b = (0, 1)$, $c = (1, 0)$, $d = (1, 1)$. The quadtrie is none other than the 4-way trie built over the set of such encodings.

Another idea of Bentley [6] gives rise to k-d-tries. For $d = 2$, associate to each point $P = (x, y)$, where $x = x_1 x_2 \cdots$ and $y = y_1 y_2 \cdots$, the binary string $z = x_1 y_1 x_2 y_2 \cdots$ obtained by interleaving bits of both coordinates. The k-d-trie is the binary trie built on the z-codes of points.
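Both encodings are easy to state in code. The sketch below assumes coordinates given as finite bit strings of equal length (truncated binary expansions); the function names and the dictionary `QUAD_LETTER` are illustrative only.

```python
def z_code(x_bits, y_bits):
    """Interleave two equal-length bit strings: x1 y1 x2 y2 ..."""
    return "".join(xb + yb for xb, yb in zip(x_bits, y_bits))


# a = (0, 0), b = (0, 1), c = (1, 0), d = (1, 1)
QUAD_LETTER = {"00": "a", "01": "b", "10": "c", "11": "d"}


def quad_code(x_bits, y_bits):
    """Encode the pair of coordinates 'in parallel' over {a, b, c, d}."""
    return "".join(QUAD_LETTER[xb + yb] for xb, yb in zip(x_bits, y_bits))
```

Each letter of the quadtrie encoding corresponds to two consecutive bits of the z-code, which is why one level of the quadtrie matches two levels of the k-d-trie.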

Given these equivalences, the analytic methods of Section 2 apply verbatim:

Over uniform independent data, the $d$-dimensional quadtrie requires on average $\approx cn$ pointers, where $c = 2^d / \log 2^d$; for k-d-tries this number of pointers is $\approx c'n$, where $c' = 2 / \log 2$. The mean number of bit accesses needed by a search is $\sim \log_2 n$.

Roughly, multidimensional tries grant us fast access to multidimensional data. The storage requirements of quadtries may however become prohibitive when the dimension of the underlying data space grows, owing to a large number of null pointers that carry little information but encumber memory.

Quadtries and k-d-tries also serve to implement partial-match queries in an elegant way. This corresponds to the situation, in $d$-dimensional space, where $s$ out of $d$ coordinates are specified and all points matching the $s$ known coordinates are to be retrieved³. Put otherwise, one wants to reconstruct data given partial knowledge of their attributes. It is easy to set up recursive procedures reflected by inductive definitions for the cost parameters of such queries. The analytic methods of Section 2 are then fully operational. The net result, due to Flajolet and Puech [26], is:

The mean number of operations needed to retrieve objects of $d$-dimensional space, when $s$ out of $d$ of their coordinates are known, is $O(n^{1-s/d})$. The estimate holds both for quadtries and for k-d-tries.

³ For an excellent discussion of spatial data structures, we refer the reader to Samet’s books [53, 52].

Figure 2. Left: a trie. Middle: a corresponding TST. Right: cost of TST search on Moby Dick (number of letter comparisons against number of words scanned).

In contrast, quadtrees and k-d-trees (based on the BST concept) require

\[ O\!\left( n^{1 - s/d + \theta(s/d)} \right) \]

operations for some function $\theta > 0$. For instance, for comparison-based structures, the case $s = 1$ and $d = 2$ entails a complexity $O(n^{\alpha})$, where $\alpha = \frac{\sqrt{17} - 3}{2} \doteq 0.56155$, which is of higher order than the $O(\sqrt{n})$ attached to bit-based structures. The better balancing of bit-based structures pays, at least on uniform enough data.

Devroye [11] has provided an insightful analysis of tries ($d = 1$) under a density model, where data are drawn independently according to a probability density function spread over the unit interval. A study of multidimensional search along similar lines would be desirable.

Ternary search tries (TST). Multiway tries require a massive amount of storage when the alphabet cardinality is large. For instance, a dictionary that would contain the word Apple should have null pointers corresponding to the non-existent forms Appla, Applb, Applc, etc. When attempting to address this problem, Bentley and Sedgewick [5] made a startling discovery: it is possible to design a highly efficient hybrid of the trie and the BST. In essence, you build an $m$-way trie (with $m$ the alphabet cardinality), but implement the local decision structure at each node by a BST. The resulting structure, known as a TST, is simply a ternary tree, where, at each node, a comparison between letters is performed. Upon equality, go down one level, i.e., examine the next letter of the item to be retrieved; else proceed to the left or the right, depending on the outcome of the comparison between letters. It’s as simple as that!
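A TST along the lines just described fits in a few dozen lines. The following is a minimal sketch rather than the implementation of [5]: the class and function names, the `end` flag marking complete words, and the comparison counter are choices made for this illustration, and words are assumed to be non-empty.

```python
class Node:
    """One TST node: a letter, three children, and an end-of-word flag."""
    __slots__ = ("ch", "lo", "eq", "hi", "end")

    def __init__(self, ch):
        self.ch, self.lo, self.eq, self.hi, self.end = ch, None, None, None, False


def insert(node, word, i=0):
    """Insert a non-empty word; returns the (possibly new) subtree root."""
    ch = word[i]
    if node is None:
        node = Node(ch)
    if ch < node.ch:
        node.lo = insert(node.lo, word, i)
    elif ch > node.ch:
        node.hi = insert(node.hi, word, i)
    elif i + 1 < len(word):
        node.eq = insert(node.eq, word, i + 1)   # equality: move to next letter
    else:
        node.end = True                          # last letter of the word
    return node


def contains(node, word):
    """Search a non-empty word; returns (found, number of letter comparisons)."""
    i, comparisons = 0, 0
    while node is not None:
        comparisons += 1
        ch = word[i]
        if ch < node.ch:
            node = node.lo
        elif ch > node.ch:
            node = node.hi
        elif i + 1 < len(word):
            node, i = node.eq, i + 1
        else:
            return node.end, comparisons
    return False, comparisons
```

A dictionary is built by starting from `root = None` and repeating `root = insert(root, word)`; no node ever carries more than three pointers, whatever the alphabet size.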

The TST was analysed by Clément, Flajolet, and Vallée [7, 8]. Quoting from [7]: Ternary search tries are an efficient data structure from the information theoretic point of view since a search costs typically about $\log n$ comparisons. For an alphabet of cardinality 26, the storage cost of ternary search tries is about 9 times smaller than standard array-tries. (Based on extensive natural language data.)

4. Hashing and height

Paged tries, also known as bucket tries, are digital trees defined like in (1), but with recursion stopped as soon as at most $b$ elements have been isolated. This technique is useful in the context of a two-level memory. The tree itself can then be stored in core memory as an index. Its end-nodes then point to pages or buckets in secondary memory. The technique can then be applied to hashed values of records, rather than records themselves, which may be rather non-uniformly distributed in practice. The resulting algorithm is known as dynamic hashing and is due to Larson [44]. It is interesting to note that it was first discovered without an explicit reference to tries, the author viewing it as an evolution of hashing with separate chaining (in a paged environment), and with the splitting of buckets replacing costly chains of linked pages. The analysis methods of Section 2 show that the mean number of pages is

\[ \approx \frac{n}{b \log 2}. \]

In other words, the pages are approximately 69% filled, a score that is comparable to the one of B-trees.

For very large data bases, the index of dynamic hashing may become too large to fit in primary memory. Fagin et al. [16] discovered an elegant way to remedy the situation, known as extendible hashing and based on the following principle:

Perfect tree embedding. Let a tree $\tau$ of some height $H$ be given, with only the external nodes of $\tau$ containing information. Form the perfect tree $P$ of height $H$ (i.e., all external nodes are at distance $H$ from the root). The tree $\tau$ can be embedded into the perfect tree with any information being pushed to the external nodes of $P$. (This in general involves duplications.) The perfect tree with decorated external nodes can then be represented as an array of dimension $2^H$, thereby granting direct access to its leaves.

In this way, in most practical situations, only two disc accesses suffice to reach any item stored in the structure: one for the index, which is a paged array, the other for the referenced page itself. This algorithm is the definite solution to the problem of maintaining very large hashed tables.
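The embedding principle can be illustrated on a small scale. The sketch below is an in-memory toy, not the algorithm of [16]: it assumes distinct hashed keys given as integers of `bits` bits, builds the paged trie, and fills an array of $2^H$ entries in which a page found at depth $d$ is referenced by $2^{H-d}$ consecutive entries (the "duplications" of the text). All names are hypothetical.

```python
def build_pages(keys, b, bits, depth=0, prefix=0):
    """Leaves of the paged trie as (prefix, depth, page) triples.

    Assumes distinct integer keys of `bits` bits and a page capacity b >= 1.
    """
    if len(keys) <= b:
        return [(prefix, depth, keys)]
    shift = bits - depth - 1                      # position of the next bit
    zeros = [k for k in keys if not (k >> shift) & 1]
    ones = [k for k in keys if (k >> shift) & 1]
    return (build_pages(zeros, b, bits, depth + 1, 2 * prefix)
            + build_pages(ones, b, bits, depth + 1, 2 * prefix + 1))


def build_directory(keys, b, bits):
    """Embed the paged trie into an array of 2**H page references."""
    leaves = build_pages(keys, b, bits)
    height = max(depth for _, depth, _ in leaves)
    directory = [None] * (2 ** height)
    for prefix, depth, page in leaves:
        span = 2 ** (height - depth)              # entries sharing this page
        for i in range(prefix * span, (prefix + 1) * span):
            directory[i] = page                   # same object: a shared page
    return height, directory


def lookup(directory, height, key, bits):
    """One array access for the index, then one scan of the page."""
    return key in directory[key >> (bits - height)]
```

The first `height` bits of a hashed key index the array directly, which is the source of the two-access guarantee mentioned above.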

In its time, extendible hashing posed a new problem to analysts. Is the size of the index of linear or of superlinear growth? That question brings the analysis of height into our algorithmic games. General methods of combinatorial enumeration [31, 56, 33, 62] are relevant to derive the basic equations. The starting point is ($[z^n]f(z)$ represents the coefficient of $z^n$ in the expansion of $f(z)$ at 0)

\[ P_n(H \leq h) = n!\, [z^n]\, e_b\!\left( \frac{z}{2^h} \right)^{2^h}, \qquad e_b(z) := 1 + \frac{z}{1!} + \cdots + \frac{z^b}{b!}. \]

The problem is thus to extract coefficients of large index in the large power of a fixed function (here, the truncated exponential, $e_b(z)$). The saddle point method [10, 31] of complex analysis comes to mind. It is based on Cauchy’s coefficient formula,

\[ [z^n] f(z) = \frac{1}{2i\pi} \oint f(z)\, \frac{dz}{z^{n+1}}, \]

which relates values of an analytic function to its coefficients, combined with the choice of a contour that crosses a saddle point of the integrand (Figure 3). The net result of the analysis [22] is the following:

Height of a paged $b$-trie is of the form

\[ \left( 1 + \frac{1}{b} \right) \log_2 n + O(1) \]

both on average and in probability. The limit distributions are in the form of a double exponential function. The size ($2^H$) of the extendible-hashing index is on average of the form $C(\log n)\, n^{1+1/b}$, with $C(\cdot)$ a bounded function. In particular, it grows nonlinearly with $n$.

Figure 3. A saddle point of the modulus of an analytic function.

(See also Yao [63] and Régnier [50] for earlier results under the Poisson model.) The ideas of extendible hashing are also susceptible of being generalized to higher dimensional data: see Régnier’s analysis of grid-file algorithms in [51].

Level compressed tries. Nilsson and Karlsson [48] made a sensation when they discovered the “LC trie” (in full: Level Compressed trie): they demonstrated that their data structure could handle address lookup in routing tables with a standard PC in a way that can compete favorably with dedicated hardware embedded into routers. One of their beautifully simple ideas consists in compressing the perfect tree contained in a trie (starting from the root) into a single node of high degree; this principle is then used recursively. It is evocative of a partial realization of extendible hashing. The decisive advantage in terms of execution time stems from the fact that chains of pointers are replaced by a single array access, while the search depth decreases to $O(\log \log n)$ for a large class of distributions [12, 13, 48].

5. Leader election and protocols

Tries have found unexpected applications as an abstract structure underlying several algorithms of distributed computing. We discuss here leader election and the tree protocol due to Capetanakis-Tsybakov-Mikhailov (also known as the CTM protocol or the stack protocol). In both cases, what is assumed is a shared channel on which a number of stations are hooked. At any discrete instant, a station can broadcast a message of unit duration. It can also sense the channel and get a ternary feedback: 0 for silence; 1 for a successful transmission; 2+ for a collision between an unknown number of individuals.

The leader election protocol in its bare version is as follows:

Basic leader election. At time $t = 0$ the group $G$ of all the $n$ stations⁴ on the channel are ready to start a round for electing a leader. Each one transmits its name (an identifier) at time 1. If $n = 0$, the channel has remained silent and nothing happens. If $n = 1$, the channel feedback is 1 and the corresponding individual is elected. Else all contenders in $G$ flip a coin. Let $G_H$ (resp. $G_T$) be the subgroup of those who flipped heads (resp. tails). Members of the group $G_T$ withdraw instantly from the competition. Members of $G_H$ repeat the process over the next time slot.

⁴ The number $n$ is unknown.

We expect the size of $G$ to decrease by roughly a factor of 2 each time, which suggests that the number of rounds should be close to $\log_2 n$. The basic protocol described above may fail with probability close to 0.27865, see [55, p. 407], but it is easily amended: it suffices to let the last nonempty group of contenders (likely to be of small size) start the process again, and repeat, until a group of size 1 comes out.
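The amended protocol is simple enough to simulate directly. The sketch below is a centralized simulation written for illustration only: it assumes a non-empty list of station identifiers, counts coin-flipping rounds rather than channel slots, and the name `elect_leader` is hypothetical.

```python
import random


def elect_leader(stations, rng):
    """Amended leader election; returns (leader, number of coin-flip rounds).

    Assumes a non-empty collection of station identifiers. `rng` is a
    random.Random instance supplying the coin flips.
    """
    contenders = list(stations)
    rounds = 0
    while len(contenders) > 1:            # feedback 2+: collision
        rounds += 1
        heads = [s for s in contenders if rng.random() < 0.5]
        if heads:                         # those who flipped tails withdraw
            contenders = heads
        # if nobody flipped heads, the same group simply starts again
    return contenders[0], rounds
```

Averaging `rounds` over many runs, for instance with `rng = random.Random(1)`, gives an empirical view of the growth in $\log_2 n$ suggested above.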

This leader election protocol is a perfect case for the analytic methods evoked earlier. The number of rounds for instance coincides with the length of the leftmost branch of the tree, a parameter easily amenable to the analytic techniques described in Section 2. The complete protocol has been analysed thoroughly by Prodinger [49] and Fill et al. [20]. Fluctuations are once more everywhere to be found.

The tree protocol was invented around 1977 independently in the USA and in the Soviet Union. For background, references, and results, we recommend the special issue of the IEEE Transactions on Information Theory edited by Jim Massey [46]. The idea is very simple: instead of developing only the leftmost branch of a trie, develop cooperatively the whole trie.

Basic tree protocol. Let $G$ be the group of stations initially waiting to transmit a message. During the first available slot, all stations of $G$ transmit. If the channel feedback is 0 or 1, transmission is complete. Else, $G$ is split into $G_H$, $G_T$. All the members of $G_H$ are given precedence and resolve their collisions between themselves, by a recursive application of the protocol. Once this phase has been completed, the group $G_T$ takes its turn and proceeds similarly.

Our description presupposes a perfect knowledge of the system’s state by all protagonists at every instant. It is a notable fact that the protocol can be implemented in a fully decentralized manner, each station only needing to monitor the channel feedback (and maintain a priority stack, in fact, a simple counter). The time it takes to resolve the contention between $n$ initial colliders coincides with the total number of nodes in the corresponding trie (think of stations as having predetermined an infinite sequence of coin flips), that is, $2S_n + 1$ on average. The unbiased Bernoulli model is exactly applicable, given a decent random number generator. All in all, the resolution of a collision of multiplicity $n$ takes time asymptotic to (cf. Equation (5))

\[ \frac{2}{\log 2}\, n, \]

upon neglecting the usual tiny fluctuations. In other words, the service time per customer is about $2 / \log 2$. By standard queuing theory arguments, the protocol is demonstrably stable for all arrival rates $\lambda$ satisfying $\lambda < \lambda_{\max}$, where

\[ \lambda_{\max} = \frac{\log 2}{2} \pm 10^{-5} \doteq 0.34657. \]

In contrast, the Ethernet protocol has been proved unstable by Aldous in a stunning study [1].
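The identification of the resolution time with the number of nodes of a random trie can be checked by simulation. The following sketch follows the centralized, recursive description of the basic protocol rather than the decentralized counter-based implementation; the function name is an assumption of this illustration.

```python
import random


def resolve(n, rng):
    """Number of slots used by the basic tree protocol on n colliders.

    `rng` is a random.Random instance supplying the coin flips.
    """
    if n <= 1:
        return 1                       # feedback 0 or 1: one slot, done
    heads = sum(rng.random() < 0.5 for _ in range(n))
    # one collision slot, then the heads group, then the tails group
    return 1 + resolve(heads, rng) + resolve(n - heads, rng)
```

Each call corresponds to one node of the trie, so the result is always odd (internal nodes plus external nodes), and its average over many runs should be close to $2n / \log 2$ for large $n$.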

We have described above a simplified version (the one with so-called “blocked arrivals”) of the tree protocol. An improved version allows competitors to enter the game as soon as they are ready. This “free arrivals” version leads to nonlocal functional equations of the form

\[ \psi(z) = t(z) + \psi(\lambda + pz) + \psi(\lambda + qz), \]

whose treatment involves interesting properties of iterated function systems (IFS) and associated Dirichlet series: see G. Fayolle et al. [17, 18] and the account in Hofri’s book [35]. The best protocol in this class (Mathys-Flajolet [47]) was largely discovered thanks to analytic techniques, which revealed the following: a throughput of $\lambda_{\max} = 0.40159$ is achievable when combining free arrivals and ternary branching.

6. Probabilistic counting algorithms

A problem initially coming from query optimization in data bases led Nigel Martin and me to investigate, at an early stage, the following problem: Given a multiset $M$ of data of sorts, estimate the number of distinct records, also called cardinality, that $M$ contains. The cardinality estimation problem is nowadays of great relevance to data mining and to network management. (We refer to [23] for a general discussion accompanied by references.)

The idea consists in applying a hash function $h$ to each record. Then bits of hashed values are observed. The detection of patterns in observed hashed values can serve as a fair indicator of cardinality. Note that, by construction, such algorithms are totally insensitive to the actual structure of repetitions in the original file (usually, no probabilistic assumption regarding these can be made). Also, once a hash function of good quality has been chosen, the hashed values can legitimately be identified with uniform random strings. This makes it possible to trigger a virtuous cycle, involving probabilistic analysis of observables and suitably tuned cardinality estimators.

The original algorithm, called probabilistic counting [25], was based on a simple observable: the length $L$ of the longest initial run of 1-bits in $h(M)$. This quantity can be computed with an auxiliary memory of a single 32-bit word, for reasonable file cardinalities, say $n \leq 10^9$. We expect $L_n \approx \log_2 n$, which suggests $2^{L_n}$ as a rough estimator of $n$. The analysis of $L_n$ is attached to that of tries: we are in a way developing the leftmost branch of a pseudo-trie to which the methods of Section 2 apply perfectly. It involves the Thue-Morse sequence, which is familiar to aficionados of combinatorics on words. However, not too surprisingly, the rough estimate just described is likely to be typically off, by a little more than one binary order of magnitude, from the actual (unknown) value of $n$. Improvements are called for.

The idea encapsulated into the complete Probabilistic Counting algorithm is to emulate at barely any cost the effect of m independent experiments. There is a simple device, called “stochastic averaging”, which makes it possible to do so by distribution into buckets and then averaging. The resulting algorithm estimates cardinalities using m words of memory, with a relative accuracy of about $0.78/\sqrt{m}$. It is pleasant to note that a multiplicative correction constant, provided by a Mellin transform analysis,

$$\varphi = \frac{e^{\gamma}}{\sqrt{2}}\,\frac{2}{3}\,\prod_{p=1}^{\infty}\left[\frac{(4p+1)(4p+2)}{(4p)(4p+3)}\right]^{\varepsilon(p)}$$

($\gamma$ is Euler’s constant, $\varepsilon(p) \in \{-1,+1\}$ indicates the parity of the number of 1-bits in the binary representation of p) enters the very design of the algorithm by ensuring that it is free of any systematic bias.
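
As a hedged illustration, the following Python sketch evaluates a truncation of the correction constant and uses it in a bucketed estimator. It assumes that the exponent is +1 when p has an even number of 1-bits and -1 otherwise, that the bucket is selected by the hashed value modulo m, and that SHA-256 stands in for the hash function. All names are hypothetical.

```python
import hashlib
import math

EULER_GAMMA = 0.5772156649015329

def correction_constant(terms: int = 1 << 14) -> float:
    """Truncated product for the correction constant (assumed sign convention)."""
    prod = 1.0
    for p in range(1, terms + 1):
        factor = (4 * p + 1) * (4 * p + 2) / ((4 * p) * (4 * p + 3))
        eps = 1 if bin(p).count("1") % 2 == 0 else -1  # parity of the 1-bits of p
        prod *= factor ** eps
    return math.exp(EULER_GAMMA) / math.sqrt(2) * (2 / 3) * prod

def hash64(record: str) -> int:
    """64-bit hashed value; SHA-256 is an assumed stand-in for a good hash."""
    return int.from_bytes(hashlib.sha256(record.encode()).digest()[:8], "big")

def pcsa_estimate(records, m: int = 64) -> float:
    """Probabilistic counting with stochastic averaging over m buckets."""
    phi = correction_constant()
    bitmaps = [0] * m
    for r in records:
        h = hash64(r)
        bucket, rest = h % m, h // m          # bucket choice, then remaining bits
        pos = (rest & -rest).bit_length() - 1 if rest else 63
        bitmaps[bucket] |= 1 << pos
    total = 0
    for b in bitmaps:
        L = 0
        while (b >> L) & 1:                   # initial run of 1-bits in this bucket
            L += 1
        total += L
    return m / phi * 2 ** (total / m)
```

With m = 64 the expected relative error is of the order of 0.78/8, that is, roughly 10%.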


eddcdfdddddfcfdeeeeeefeedfedeffeffdefefeb fedefceffdefdfefeecfdedeeededffefeffeecddefcfcddccedddcfddedeccdefddfcedddfdfedecddfedcfedcdfdeedegddcfededggfffdggdfgfegdgddddegddffededceeeefdedgfgdddeefdceeeefeeddeedefcffffdh

hcgdccgchdfdchdehdgeeegfeedccfdedfddf

Figure 4. The LogLog algorithm associates to a text a signature, from which the number of different words can be inferred. Here, the signature of Hamlet uses m = 256 bytes, with which the cardinality of the vocabulary is estimated to an accuracy of 6.6%.

Recently, Marianne Durand and I were led to revisit the question, given the revival of interest in the area of network monitoring and following stimulating exchanges with Estan and Varghese (see, e.g., [15]). We realized that a previously neglected observable, the position $\widetilde{L}$ of the rightmost 1-bit in hashed values, though it has inferior probabilistic properties (e.g., a higher variance), can be maintained as a register in binary, thereby requiring very few bits. Our algorithm [14], called LogLog Counting, estimates cardinalities using m bytes of memory, with a relative accuracy of about $1.3/\sqrt{m}$. Given that a word is four bytes, the overall memory requirement is divided by a factor of 3, when compared to Probabilistic Counting. This is, to the best of my knowledge, the most efficient algorithm available for cardinality estimation. Once more the analysis can be reduced to trie parameters, and the Mellin transform as well as the saddle point method play an important part.
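
A minimal Python sketch of a LogLog-style estimator follows. It assumes that each of the m = 2^k registers keeps the maximum 1-based position of the rightmost 1-bit seen in its bucket, and it uses the asymptotic correction constant 0.39701 reported in [14] rather than a constant tuned to m. The helper names and the SHA-256 hash are illustrative assumptions.

```python
import hashlib

ALPHA_INF = 0.39701  # asymptotic correction constant, as reported in [14]

def hash64(record: str) -> int:
    """64-bit hashed value; SHA-256 is an assumed stand-in for a good hash."""
    return int.from_bytes(hashlib.sha256(record.encode()).digest()[:8], "big")

def loglog_estimate(records, k: int = 8) -> float:
    """LogLog-style estimate using m = 2**k small registers."""
    m = 1 << k
    registers = [0] * m
    for r in records:
        h = hash64(r)
        bucket, rest = h & (m - 1), h >> k    # low k bits select the register
        # 1-based position of the rightmost 1-bit among the remaining bits
        rank = (rest & -rest).bit_length() if rest else 64 - k + 1
        registers[bucket] = max(registers[bucket], rank)
    return ALPHA_INF * m * 2 ** (sum(registers) / m)
```

Each register holds a value below 64, so it fits comfortably in one byte, which is where the saving over word-sized bitmaps comes from.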

For a highly valuable complexity-theoretic perspective on such questions see the study [2] by Alon, Matias, and Szegedy. In recent years, Piotr Indyk and his collaborators have introduced radically novel ideas in quantitative data mining, based on the use of stable distributions, but these are unfortunately outside of our scope, since tries do not intervene at all there.

7. Suffix tries and compression

Say you want to compress a piece of text, like the statement of Pythagoras’ Theorem:

In any right triangle, the area of the square whose side is the hypotenuse (the side of the triangle opposite the right angle) is equal to the sum of the areas of the squares of the other two sides.

It is a good idea to notice that several words appear repeated. They could then be encoded once and for all by numbers. For instance:

1the,2angle,3triangle,4area,5square,6side,7right|In any 7 3, 1 4 of 1 5 whose 6 is 1 hypotenuse (1 6 of 1 3 opposite 1 7 2) is equal to 1 sum of 1 4s of 1 5s of 1 other two 6s.

A dictionary of frequently encountered terms has been formed. That dictionary could even be recursive as in

1the,2angle,3tri2,4area,5square,6side,7right| ...

Around 1977–78, Lempel and Ziv developed ideas that were to have a profound impact on the field of data compression. They proposed two algorithms that make it possible to build a dictionary on the fly, and in a way that adapts nicely to the contents of the text. The first algorithm, known as LZ’78, is the following:

LZ’78 algorithm. Scan the text left to right. The text is segmented into phrases. At any given time, the cursor is on a yet unsegmented portion of the text. Find the longest phrase seen so far that matches the continuation of the text, starting from the cursor. A new phrase is created that contains this longest phrase plus one new character. Encode the new phrase with the rank of the previously matching phrase and the new character.

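For instance, “abracadabraabracadabra” (the word “abracadabra” written twice) is segmented as follows, where each phrase is displayed as its longest previously seen prefix, the rank of that prefix (0 for the empty phrase), and the new character:

0a | 0b | 0r | a1c | a1d | a1b | r3a | ab6r | ac4a | 0d | abr8a |

resulting in the encoding

0a0b0r1c1d1b3a6r4a0d8a.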

As is well known, the algorithm can be implemented by means of a trie whose nodes store the ranks of the corresponding phrases.
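
The trie-based implementation can be sketched in Python as follows. This is a minimal illustration, not an optimized coder: the trie is a nested dictionary (an implementation choice assumed here), a trailing incomplete phrase is emitted with an empty character (also an assumption), and the function names are hypothetical.

```python
def lz78_encode(text: str):
    """Return the list of (rank of longest matching phrase, new character) pairs."""
    trie = {}                 # maps a character to (phrase rank, child trie)
    pairs = []
    node, rank = trie, 0      # rank 0 denotes the empty phrase (the root)
    next_rank = 1
    for ch in text:
        if ch in node:        # keep following the longest phrase seen so far
            rank, node = node[ch]
        else:                 # new phrase = matched phrase + one new character
            node[ch] = (next_rank, {})
            pairs.append((rank, ch))
            next_rank += 1
            node, rank = trie, 0
    if node is not trie:      # text ended in the middle of a known phrase
        pairs.append((rank, ""))
    return pairs

def lz78_decode(pairs) -> str:
    """Rebuild the text from the (rank, character) pairs."""
    phrases = [""]
    for rank, ch in pairs:
        phrases.append(phrases[rank] + ch)
    return "".join(phrases[1:])
```

On the word “abracadabra” written twice, the encoder yields eleven pairs, the last one being (8, 'a') because “abr” is the eighth phrase created.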

From a mathematical perspective, the tree built in this way obeys the same laws as a digital search tree (DST), so that we’ll start with examining them. The DST parameters can be analysed on average by the methods of Section 2, with suitable adjustments [28, 29, 43, 45, 57]. For instance, an additive parameter $\phi$ associated to a toll function t gives rise, at EGF level, to a functional equation of the form (in the unbiased case)

$$\phi(z) = t(z) + 2\int_0^z e^{t/2}\,\phi\!\left(\frac{t}{2}\right)dt,$$

which is now a difference-differential equation. The treatment is a bit more difficult, but the equation eventually succumbs to the Mellin technology. In particular, path length under a general Bernoulli model is found to satisfy

$$P^{\circ}_n = \frac{1}{H}\, n \log n + O(n), \tag{7}$$

with H the entropy of the model.

Back to the LZ’78 algorithm. Equation (7) means that when n phrases have been produced by the algorithm, the total number of characters scanned is $N \sim H^{-1} n \log n$ on average. Inverting this relation⁵ suggests the following relation between number of characters read and number of phrases produced:

$$n \sim \frac{H N}{\log N}.$$

Since a phrase requires at most $\log_2 N$ bits to be encoded, the total length of the compressed text should be $\sim hN$ with $h = H/\log 2$ the binary entropy. This handwaving argument suggests the true fact: for a memoryless (Bernoulli) source, the entropic (optimal) rate is achieved by LZ compression.
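
The inversion step can be spelled out as follows. This is only the heuristic computation behind the relation, under the assumption that the average-case estimate (7) may be treated as an exact asymptotic equivalence between N and n.

```latex
\begin{align*}
N \sim \frac{1}{H}\, n \log n
  &\;\Longrightarrow\; \log N = \log n + \log\log n - \log H + o(1) \sim \log n \\
  &\;\Longrightarrow\; n \sim \frac{H N}{\log n} \sim \frac{H N}{\log N}, \\[4pt]
n \log_2 N
  &\sim \frac{H N}{\log N}\cdot\frac{\log N}{\log 2}
   \;=\; \frac{H}{\log 2}\, N \;=\; h N .
\end{align*}
```

The last line is the count of n phrases at about $\log_2 N$ bits each, which gives a compressed length of about h bits per source character.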

The previous argument is quite unrigorous. In addition, information theorists are interested not only in dominant asymptotics but in quantifying redundancy, which measures the distance to the entropic optimum. The nature of stochastic fluctuations is also of interest in this context. Jacquet and Szpankowski have solved these difficult questions in an important work [39]. Their treatment starts from the bivariate generating function $P^{\circ}(z, u)$ of path length in DSTs, which satisfies a nonlinear functional equation,

$$\frac{\partial P^{\circ}(z, u)}{\partial z} = P^{\circ}(puz, u) \cdot P^{\circ}(quz, u).$$

⁵ This is in fact a renewal type of argument.


They deduce asymptotic normality via a combination of inductive bounds, bootstrapping, analytic depoissonization, and the Quasi-Powers Theorem of analytic combinatorics [31, 36, 57]. (Part of the treatment makes use of ideas developed earlier by Jacquet and Régnier in their work establishing asymptotic normality of path length and size in tries [37].) From there, the renewal argument can be put on a sound footing. A full characterization of redundancy results, and the fluctuations in the compression rate are determined to be asymptotically normal.

Suffix trees and antidictionaries. Given an infinitely long text $T \in \mathcal{B} \equiv \{0, 1\}^{\infty}$, the suffix tree (or suffix trie) of index n is the trie built on the first n suffixes of T. Such trees are important as an indexing tool for natural language data. They are also closely related to a variant, known as LZ’77, of the LZ algorithm that we have just presented. A new difficulty in the analysis comes from the fact that suffix trees are tries built on data that are intrinsically correlated, due to the overlapping structure of suffixes of a single text. Jacquet and Szpankowski [38] are responsible for some of the early analyses in this area. Their treatment relies on the autocorrelation polynomial of Guibas and Odlyzko [34] and on complex-analytic techniques. Julien Fayolle [19] has recently extended this methodology. His methods also provide insight on the quantitative behaviour of a new scheme due to Crochemore et al. [9], which is based on the surprising idea of using antidictionaries, that is, a description of some of the patterns that are avoided by the text.
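
To make the definition concrete, here is a small Python sketch that computes the depth at which each of the first n suffixes is stored in the suffix trie. It uses the observation that a key sits one level below the longest prefix it shares with another key. It assumes n is at least 2 and a finite text long enough that none of the first n suffixes is a prefix of another. The function name is hypothetical.

```python
def suffix_trie_depths(text: str, n: int) -> list[int]:
    """Leaf depth of each of the first n suffixes of text in the suffix trie."""
    def lcp(i: int, j: int) -> int:
        # length of the longest common prefix of the suffixes starting at i and j
        k = 0
        while max(i, j) + k < len(text) and text[i + k] == text[j + k]:
            k += 1
        return k

    return [1 + max(lcp(i, j) for j in range(n) if j != i) for i in range(n)]
```

For the text 0010111 and n = 3, the suffixes 0010111 and 010111 separate on their second bit while 10111 is alone under the root, so the depths are 2, 2, 1. The overlap between neighbouring suffixes is what correlates these depths.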

8. Dynamical sources

So far, tries have been analysed when data are provided by a source, but one of a simple type. A new paradigm in this area is Brigitte Vallée’s concept of dynamical sources. Such sources most likely constitute the widest class of models that can be subjected to a complete analytic treatment. To a large extent, Vallée’s ideas [59] evolved from the realization that methods originally developed for the purpose of analysing continued fraction algorithms [60] could be of a much wider scope.

Consider a transformation T of the unit interval that is piecewise differentiable and expanding: $|T'(x)| > 1$. Such a transformation is called a shift. It consists of several branches, as does the multivalued inverse function $T^{-1}$, which is formed of a collection of contractions. Given an initial value $x_0$, the sequence of iterates $(T^j(x_0))$ can then be encoded by recording at each iteration which branch is selected; this is a fundamental notion of symbolic dynamics. For instance, the function $T(x) = \{2x\}$ (with $\{w\}$ representing the fractional part of w) generates in this way the binary representation of numbers; from a metric (or probabilistic) point of view, it also describes the unbiased Bernoulli model. Via a suitable design, any biased Bernoulli model is associated with a shift, which is a piecewise-linear function. Markov chains (of any order) also appear as a special case. Finally, the continued fraction representation of numbers itself arises from the transformation

$$T(x) := \left\{\frac{1}{x}\right\} = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor.$$
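
The symbolic encoding attached to a shift can be illustrated by the following Python sketch, which iterates the binary shift and the continued fraction shift on exact rationals. The function names are hypothetical, and stopping as soon as the iterate reaches 0 is a convention assumed here for rational starting points.

```python
from fractions import Fraction
from math import floor

def binary_shift(x):
    """T(x) = {2x}; the emitted digit is the branch index floor(2x)."""
    y = 2 * x
    d = floor(y)
    return d, y - d

def gauss_shift(x):
    """T(x) = {1/x}; the emitted digit is the branch index floor(1/x)."""
    y = 1 / x
    d = floor(y)
    return d, y - d

def encode(x0, shift, max_digits: int):
    """Digits produced by iterating the shift from x0 (stops when x reaches 0)."""
    digits, x = [], x0
    while len(digits) < max_digits and x != 0:
        d, x = shift(x)
        digits.append(d)
    return digits
```

For example, 5/8 is encoded as 1, 0, 1 under the binary shift, and 3/7 = 1/(2 + 1/3) is encoded as 2, 3 under the continued fraction shift.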

A dynamical source is specified by a shift (which determines a symbolic encoding) as well as by an initial density on the unit interval. The theory thus unites Devroye’s density model, Bernoulli and Markov models, as well as continued fraction representations of real numbers and a good deal more. As opposed to earlier models, such sources take into account correlations between letters at an unbounded distance.

Figure 5. Dynamical sources: [left] the shift associated with continued fractions; [right] a rendering of fundamental intervals.

Tries under dynamical sources. Vallée’s theory has been applied to tries, in particular in a joint work with Clément [8]. What it brings to the field is the unifying notion of fundamental intervals, which are the subintervals of [0, 1] associated to places corresponding to potential nodes of tries. Much transparency is gained by this way of viewing a trie process as a succession of refined partitions of the unit interval, and the main parameters of tries can be expressed simply in this framework.
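
As an illustration in the continued fraction case, the fundamental interval attached to a finite prefix of digits can be computed by composing the inverse branches x -> 1/(m + x) and looking at the images of 0 and 1. The Python sketch below does this with exact rationals. The function name is hypothetical, and the interval is returned closed, without settling which endpoint actually belongs to it.

```python
from fractions import Fraction

def fundamental_interval(digits):
    """Endpoints of the set of x in [0, 1] whose continued fraction
    expansion starts with the given digits."""
    lo, hi = Fraction(0), Fraction(1)
    for m in reversed(digits):
        # apply the inverse branch x -> 1/(m + x) to both endpoints
        lo, hi = 1 / (m + lo), 1 / (m + hi)
    return min(lo, hi), max(lo, hi)
```

The prefix (2, 3) gives the interval from 3/7 to 4/9, of length 1/63. Under a uniform initial density, this length is the probability that a key passes through the corresponding place of the trie.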

Technically, a central role is played by Ruelle’s transfer operator. Given a shift T with $\mathcal{H}$ the collection of its inverse branches, the transfer operator is defined over a suitable space of functions by

$$G_s[f](x) := \sum_{h \in \mathcal{H}} |h'(x)|^s\, f \circ h(x).$$

For instance, in the continued fraction case, one has

$$G_s[f](x) := \sum_{m \ge 1} \frac{1}{(m + x)^{2s}}\, f\!\left(\frac{1}{m + x}\right).$$

The quantity s there is a parameter that is a priori allowed to assume complex values. As Vallée showed, by considering iterates of $G_s$, it then becomes possible to construct generating functions, usually of the Dirichlet type, associated with partitions into fundamental intervals. (In a way, the transfer operator is a supergenerating operator.) Equipped with these, it then becomes possible to express expectations and probability distributions of trie parameters, after a Mellin transform round. Then functional analysis comes into play (the operators have a discrete spectrum), to the effect that asymptotic properties of tries built on dynamical sources are explicitly related to spectral properties of the transfer operator.
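
A quick numerical sanity check of the continued fraction operator can be sketched in Python. It relies on the classical fact that the Gauss density 1/(1 + x) is left invariant by the operator at s = 1, since the series then telescopes. The truncation level and the function names are assumptions of this sketch.

```python
def transfer(s, f, x, terms: int = 100_000):
    """Truncated continued fraction transfer operator:
    sum over m >= 1 of (m + x)**(-2s) * f(1 / (m + x))."""
    return sum((m + x) ** (-2 * s) * f(1 / (m + x)) for m in range(1, terms + 1))

def gauss_density(x):
    """Unnormalized invariant density of the continued fraction shift."""
    return 1 / (1 + x)
```

Evaluating transfer(1, gauss_density, 0.3) returns a value within about 1e-5 of gauss_density(0.3), the gap being the tail of the truncated series.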

As an example of unified formulæ, we mention here the mean value estimates of (6) which are seen to hold for an arbitrary dynamical source. The role of entropy in these formulæ comes out neatly: entropy is bound to be crucial under any dynamical source model. The analysis of height becomes almost trivial; the characteristic constant turns out to be in all generality none other than $\lambda_1(2)$, the dominant eigenvalue of operator $G_2$.


We conclude this section with a brief mention of an algorithm due to Gosper, first described in the celebrated “Hacker’s memorandum” also known as HAKMEM [4, Item 101A]. The problem is to compare two fractions $\frac{a}{b}$ and $\frac{c}{d}$. It is mathematically trivial, since

$$\frac{a}{b} - \frac{c}{d} = \frac{ad - bc}{bd},$$

but the algorithms that this last formula suggests either involve going to multiprecision routines or operating with floating point arithmetic at the risk of reaching a wrong conclusion.

Gosper’s comparison algorithm. In order to compare $\frac{a}{b}$ and $\frac{c}{d}$, perform a continued fraction expansion of both fractions. Proceed in lazy mode. Stop as soon as a discrepant digit is encountered.

Gosper’s solution operates within the set precision of data and is error-free (as opposed to the use of floating point approximations). For this and other reasons⁶, it has been found to be of interest by the community of computational geometers engaged in the design of robust algorithms [3]. (It also makes an appearance in the source code of Knuth’s Metafont, for similar reasons.)
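
The comparison procedure can be sketched in Python with generators, which provide the lazy evaluation. The sketch assumes positive denominators and makes two conventions explicit that the statement above leaves implicit: a larger digit means a larger number at even positions and a smaller one at odd positions, and an expansion that has ended is treated as having an infinite next digit. The function names are hypothetical.

```python
from itertools import zip_longest

def cf_digits(a: int, b: int):
    """Lazily yield the continued fraction digits of a/b (b > 0) via Euclid."""
    while b:
        q, r = divmod(a, b)
        yield q
        a, b = b, r

def compare(a: int, b: int, c: int, d: int) -> int:
    """Return -1, 0 or 1 according to whether a/b is <, = or > c/d."""
    inf = float("inf")        # stands for the digit of an exhausted expansion
    pairs = zip_longest(cf_digits(a, b), cf_digits(c, d), fillvalue=inf)
    for k, (x, y) in enumerate(pairs):
        if x != y:            # first discrepant digit decides the comparison
            sign = 1 if x > y else -1
            return sign if k % 2 == 0 else -sign
    return 0
```

Only integers of the size of the inputs are manipulated, and the loop stops at the first discrepant digit, so no product such as ad or bc is ever formed.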

In our perspective, the algorithm can be viewed as the construction of the digital tree associated to two elements accessible via their continued fraction representations. Vallée and I give a thorough discussion of the fascinating mathematics that surround its analysis in [32]. The algorithm extends to the lazy and robust comparison of a system of n fractions: it suffices to build, by lazy evaluation, the trie associated to continued fraction representations of the entries. What we found in [32] is the following result: The expected cost of sorting n uniform random real numbers by lazy evaluation of their continued fraction representations satisfies

$$P_n = K_0\, n \log n + K_1 n + Q(n) + K_2 + o(1),$$

where ($\zeta(s)$ is the Riemann zeta function)

$$K_0 = \frac{6 \log 2}{\pi^2}, \qquad K_1 = 18\,\frac{\gamma \log 2}{\pi^2} + 9\,\frac{(\log 2)^2}{\pi^2} - 72\,\frac{\log 2\;\zeta'(2)}{\pi^4} - \frac{1}{2},$$

and Q(u) is an oscillating function with mean value 0 whose order is

$$Q(u) = O\!\left(u^{\delta/2}\right),$$

where $\delta$ is any number such that $\delta > \sup\left\{\Re(s) \;\middle|\; \zeta(s) = 0\right\}$.

The Riemann hypothesis has just made an entry into the world of tries!

References

1. David J. Aldous, Ultimate instability of exponential back-off protocol for acknowledgement-based transmission control of random access communication channels, IEEE Transactions on Information Theory 33 (1987), no. 2, 219–223.

2. Noga Alon, Yossi Matias, and Mario Szegedy, The space complexity of approximating the frequency moments, Journal of Computer and System Sciences 58 (1999), no. 1, 137–147.

3. F. Avnaim, J.-D. Boissonnat, O. Devillers, F. P. Preparata, and M. Yvinec, Evaluating signs of determinants using single-precision arithmetic, Algorithmica 17 (1997), no. 2, 111–132.

4. M. Beeler, R. W. Gosper, and R. Schroeppel, HAKMEM, Memorandum 239, M.I.T., Artificial Intelligence Laboratory, February 1972, Available on the World Wide Web at http://www.inwap.com/pdp10/hbaker/hakmem/hakmem.html.

5. Jon Bentley and Robert Sedgewick, Fast algorithms for sorting and searching strings, Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, SIAM Press, 1997.

⁶ The sign question is equivalent to determining the orientation of a triangle in the plane.


6. Jon Louis Bentley, Multidimensional binary search trees used for associative searching, Communications of the ACM 18 (1975), no. 9, 509–517.

7. Julien Clément, Philippe Flajolet, and Brigitte Vallée, The analysis of hybrid trie structures, Proceedings of the Ninth Annual ACM–SIAM Symposium on Discrete Algorithms (Philadelphia), SIAM Press, 1998, pp. 531–539.

8. Julien Clément, Philippe Flajolet, and Brigitte Vallée, Dynamical sources in information theory: A general analysis of trie structures, Algorithmica 29 (2001), no. 1/2, 307–369.

9. M. Crochemore, F. Mignosi, A. Restivo, and S. Salemi, Text compression using antidictionaries, Automata, languages and programming (Prague, 1999), Lecture Notes in Computer Science, vol. 1644, Springer, Berlin, 1999, pp. 261–270.

10. N. G. de Bruijn, Asymptotic methods in analysis, Dover, 1981, A reprint of the third North Holland edition, 1970 (first edition, 1958).

11. Luc Devroye, A probabilistic analysis of the height of tries and of the complexity of triesort, Acta Informatica 21 (1984), 229–237.

12. Luc Devroye, An analysis of random LC tries, Random Structures & Algorithms 19 (2001), 359–375.

13. Luc Devroye and Wojtek Szpankowski, Probabilistic behaviour of level compressed tries, Random Structures & Algorithms 27 (2005), no. 2, 185–200.

14. Marianne Durand and Philippe Flajolet, LogLog counting of large cardinalities, Annual European Symposium on Algorithms (ESA03) (G. Di Battista and U. Zwick, eds.), Lecture Notes in Computer Science, vol. 2832, 2003, pp. 605–617.

15. Cristian Estan and George Varghese, New directions in traffic measurement and accounting: Focusing on the elephants, ignoring the mice, ACM Transactions on Computer Systems 21 (2003), no. 3, 270–313.

16. R. Fagin, J. Nievergelt, N. Pippenger, and R. Strong, Extendible hashing: A fast access method for dynamic files, A.C.M. Transactions on Database Systems 4 (1979), 315–344.

17. Guy Fayolle, Philippe Flajolet, and Micha Hofri, On a functional equation arising in the analysis of a protocol for a multiaccess broadcast channel, Advances in Applied Probability 18 (1986), 441–472.

18. Guy Fayolle, Philippe Flajolet, Micha Hofri, and Philippe Jacquet, Analysis of a stack algorithm for random access communication, IEEE Transactions on Information Theory IT-31 (1985), no. 2, 244–254, (Special Issue on Random Access Communication, J. Massey editor).

19. Julien Fayolle, An average-case analysis of basic parameters of the suffix tree, Mathematics and Computer Science III: Algorithms, Trees, Combinatorics and Probabilities (M. Drmota et al., ed.), Trends in Mathematics, Birkhäuser Verlag, 2004, pp. 217–227.

20. James Allen Fill, Hosam M. Mahmoud, and Wojciech Szpankowski, On the distribution for the duration of a randomized leader election algorithm, The Annals of Applied Probability 6 (1996), no. 4, 1260–1283.

21. R. A. Finkel and J. L. Bentley, Quad trees, a data structure for retrieval on composite keys, Acta Informatica 4 (1974), 1–9.

22. Philippe Flajolet, On the performance evaluation of extendible hashing and trie searching, Acta Informatica 20 (1983), 345–369.

23. Philippe Flajolet, Counting by coin tossings, Proceedings of ASIAN’04 (Ninth Asian Computing Science Conference) (M. Maher, ed.), Lecture Notes in Computer Science, vol. 3321, 2004, (Text of Opening Keynote Address.), pp. 1–12.

24. Philippe Flajolet, Xavier Gourdon, and Philippe Dumas, Mellin transforms and asymptotics: Harmonic sums, Theoretical Computer Science 144 (1995), no. 1–2, 3–58.

25. Philippe Flajolet and G. Nigel Martin, Probabilistic counting algorithms for data base applications, Journal of Computer and System Sciences 31 (1985), no. 2, 182–209.

26. Philippe Flajolet and Claude Puech, Partial match retrieval of multidimensional data, Journal of the ACM 33 (1986), no. 2, 371–407.

27. Philippe Flajolet, Mireille Régnier, and Dominique Sotteau, Algebraic methods for trie statistics, Annals of Discrete Mathematics 25 (1985), 145–188, In Analysis and Design of Algorithms for Combinatorial Problems, G. Ausiello and M. Lucertini Editors.

28. Philippe Flajolet and Bruce Richmond, Generalized digital trees and their difference–differential equations, Random Structures & Algorithms 3 (1992), no. 3, 305–320.

29. Philippe Flajolet and Robert Sedgewick, Digital search trees revisited, SIAM Journal on Computing 15 (1986), no. 3, 748–767.


30. Philippe Flajolet and Robert Sedgewick, Mellin transforms and asymptotics: finite differences and Rice’s integrals, Theoretical Computer Science 144 (1995), no. 1–2, 101–124.

31. Philippe Flajolet and Robert Sedgewick, Analytic combinatorics, October 2005, Chapters I–IX of a book to be published, 688p.+x, available electronically from P. Flajolet’s home page.

32. Philippe Flajolet and Brigitte Vallée, Continued fractions, comparison algorithms, and fine structure constants, Constructive, Experimental, and Nonlinear Analysis (Providence) (Michel Théra, ed.), Canadian Mathematical Society Conference Proceedings, vol. 27, American Mathematical Society, 2000, pp. 53–82.

33. Ian P. Goulden and David M. Jackson, Combinatorial enumeration, John Wiley, New York, 1983.

34. L. J. Guibas and A. M. Odlyzko, String overlaps, pattern matching, and nontransitive games, Journal of Combinatorial Theory. Series A 30 (1981), no. 2, 183–208.

35. Micha Hofri, Analysis of algorithms: Computational methods and mathematical tools, Oxford University Press, 1995.

36. Hsien-Kuei Hwang, On convergence rates in the central limit theorems for combinatorial structures, European Journal of Combinatorics 19 (1998), no. 3, 329–343.

37. Philippe Jacquet and Mireille Régnier, Trie partitioning process: Limiting distributions, CAAP’86 (P. Franchi-Zanetacchi, ed.), Lecture Notes in Computer Science, vol. 214, 1986, Proceedings of the 11th Colloquium on Trees in Algebra and Programming, Nice, France, March 1986, pp. 196–210.

38. Philippe Jacquet and Wojciech Szpankowski, Autocorrelation on words and its applications: analysis of suffix trees by string-ruler approach, Journal of Combinatorial Theory. Series A 66 (1994), no. 2, 237–269.

39. Philippe Jacquet and Wojciech Szpankowski, Asymptotic behavior of the Lempel-Ziv parsing scheme and digital search trees, Theoretical Computer Science 144 (1995), no. 1–2, 161–197.

40. Philippe Jacquet and Wojciech Szpankowski, Analytical de-Poissonization and its applications, Theoretical Computer Science 201 (1998), no. 1–2, 1–62.

41. Peter Kirschenhofer and Helmut Prodinger, On some applications of formulæ of Ramanujan in the analysis of algorithms, Mathematika 38 (1991), 14–33.

42. Peter Kirschenhofer, Helmut Prodinger, and Wojciech Szpankowski, On the variance of the external path length in a symmetric digital trie, Discrete Applied Mathematics 25 (1989), 129–143.

43. Donald E. Knuth, The art of computer programming, 2nd ed., vol. 3: Sorting and Searching, Addison-Wesley, 1998.

44. P. A. Larson, Dynamic hashing, BIT 18 (1978), 184–201.

45. Hosam M. Mahmoud, Evolution of random search trees, John Wiley, New York, 1992.

46. James L. Massey (ed.), Special issue on random-access communications, vol. IT-31, IEEE Transactions on Information Theory, no. 2, March 1985.

47. Peter Mathys and Philippe Flajolet, Q–ary collision resolution algorithms in random access systems with free or blocked channel access, IEEE Transactions on Information Theory IT-31 (1985), no. 2, 217–243.

48. S. Nilsson and G. Karlsson, IP–address lookup using LC tries, IEEE Journal on Selected Areas in Communications 17 (1999), no. 6, 1083–1092.

49. Helmut Prodinger, How to select a loser, Discrete Mathematics 120 (1993), 149–159.

50. Mireille Régnier, On the average height of trees in digital search and dynamic hashing, Information Processing Letters 13 (1982), 64–66.

51. Mireille Régnier, Analysis of grid file algorithms, BIT 25 (1985), 335–357.

52. Hanan Samet, Applications of spatial data structures, Addison–Wesley, 1990.

53. Hanan Samet, The design and analysis of spatial data structures, Addison–Wesley, 1990.

54. Robert Sedgewick, Algorithms in C, Parts 1–4, third ed., Addison–Wesley, Reading, Mass., 1998.

55. Robert Sedgewick and Philippe Flajolet, An introduction to the analysis of algorithms, Addison-Wesley Publishing Company, 1996.

56. Richard P. Stanley, Enumerative combinatorics, vol. II, Cambridge University Press, 1998.

57. Wojciech Szpankowski, Average-case analysis of algorithms on sequences, John Wiley, New York, 2001.


58. Luis Trabb Pardo, Set representation and set intersection, Tech. report, Stanford University, 1978.

59. Brigitte Vallée, Dynamical sources in information theory: Fundamental intervals and word prefixes, Algorithmica 29 (2001), no. 1/2, 262–306.

60. Brigitte Vallée, Euclidean dynamics, October 2005, 69p. Submitted to Discrete and Continuous Dynamical Systems.

61. E. T. Whittaker and G. N. Watson, A course of modern analysis, fourth ed., Cambridge University Press, 1927, Reprinted 1973.

62. Herbert S. Wilf, Generatingfunctionology, Academic Press, 1990.

63. Andrew Chi Chih Yao, A note on the analysis of extendible hashing, Information Processing Letters 11 (1980), 84–86.

Algorithms Project, INRIA Rocquencourt, F-78153 Le Chesnay, France

E-mail address: Philippe.Flajolet AT inria.fr
