
A Fast Monte Carlo Algorithm for Site or Bond Percolation
M. E. J. Newman
R. M. Ziff

SFI WORKING PAPER: 2001-02-010

SFI Working Papers contain accounts of scientific work of the author(s) and do not necessarily represent the views of the Santa Fe Institute. We accept papers intended for publication in peer-reviewed journals or proceedings volumes, but not papers that have already appeared in print. Except for papers by our external faculty, papers must be based on work done at SFI, inspired by an invited visit to or collaboration at SFI, or funded by an SFI grant.

NOTICE: This working paper is included by permission of the contributing author(s) as a means to ensure timely distribution of the scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the author(s). It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright. These works may be reposted only with the explicit permission of the copyright holder.

www.santafe.edu
SANTA FE INSTITUTE

A fast Monte Carlo algorithm for site or bond percolation

M. E. J. Newman(1) and R. M. Ziff(2)

(1) Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
(2) Michigan Center for Theoretical Physics and Department of Chemical Engineering, University of Michigan, Ann Arbor, MI 48109

We describe in detail a new and highly efficient algorithm for studying site or bond percolation on any lattice. The algorithm can measure an observable quantity in a percolation system for all values of the site or bond occupation probability from zero to one in an amount of time which scales linearly with the size of the system. We demonstrate our algorithm by using it to investigate a number of issues in percolation theory, including the position of the percolation transition for site percolation on the square lattice, the stretched exponential behavior of spanning probabilities away from the critical point, and the size of the giant component for site percolation on random graphs.

    I. INTRODUCTION

Percolation [1] is certainly one of the best studied problems in statistical physics. Its statement is trivially simple: every site on a specified lattice is independently either "occupied," with probability p, or not, with probability 1 - p. The occupied sites form contiguous clusters which have some interesting properties. In particular, the system shows a continuous phase transition at a finite value of p which, on a regular lattice, is characterized by the formation of a cluster large enough to span the entire system from one side to the other in the limit of infinite system size. We say such a system "percolates." As the phase transition is approached from small values of p, the average cluster size diverges in a way reminiscent of the divergence of fluctuations in the approach to a thermal continuous phase transition, and indeed one can define correlation functions and a correlation length in the obvious fashion for percolation models, and hence measure critical exponents for the transition.

One can also consider bond percolation, in which the bonds of the lattice are occupied (or not) with probability p (or 1 - p); this system shows behavior qualitatively similar to, though different in some details from, site percolation.

Site and bond percolation have found a huge variety of uses in many fields. Percolation models appeared originally in studies of actual percolation in materials [1], such as the percolation of fluids through rock [2-4], but have since been used in studies of many other systems, including granular materials [5,6], composite materials [7], polymers [8], concrete [9], aerogel and other porous media [10,11], and many others. Percolation also finds uses outside of physics, where it has been employed to model resistor networks [12], forest fires [13] and other ecological disturbances [14], epidemics [15,16], robustness of the Internet and other networks [17,18], biological evolution [19], and social influence [20], amongst other things. It is one of the simplest and best understood examples of a phase transition in any system, and yet there are many things about it that are still not known. For example, despite decades of effort, no exact solution of the site percolation problem yet exists on the simplest two-dimensional lattice, the square lattice, and no exact results are known on any lattice in three dimensions or above. Because of these and many other gaps in our current understanding of percolation, numerical simulations have found wide use in the field.

Computer studies of percolation are simpler than most simulations in statistical physics, since no Markov process or other importance sampling mechanism is needed to generate states of the lattice with the correct probability distribution. We can generate states simply by occupying each site or bond on an initially empty lattice with independent probability p. Typically, we then want to find all contiguous clusters of occupied sites or bonds, which can be done using a simple depth- or breadth-first search. Once we have found all the clusters, we can easily calculate, for example, average cluster size, or look for a system spanning cluster. Both depth- and breadth-first searches take time O(M) to find all clusters, where M is the number of bonds on the lattice. This is optimal, since one cannot check the status of M individual bonds or adjacent pairs of sites in any less than O(M) operations [21]. On a regular lattice, for which M = zN/2, where z is the coordination number and N is the number of sites, O(M) is also equivalent to O(N).
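For concreteness, the following is a minimal sketch (ours, not the authors' code) of such a breadth-first cluster search in C for site percolation on a square lattice with open boundaries; the lattice size and the array names occ, label and queue are illustrative assumptions:

    #define LL 128                    /* linear dimension of the lattice (illustrative) */
    #define NS (LL*LL)                /* number of sites */

    int occ[NS];                      /* 1 if a site is occupied, 0 if empty */
    int label[NS];                    /* cluster label of each site, -1 = not yet labeled */
    int queue[NS];                    /* FIFO queue for the breadth-first search */

    /* Label every cluster of occupied sites in time O(N) and return the
       number of clusters found. */
    int label_clusters(void)
    {
        int nclusters = 0;
        int dx[4] = {1, -1, 0, 0};
        int dy[4] = {0, 0, 1, -1};

        for (int i = 0; i < NS; i++) label[i] = -1;

        for (int i = 0; i < NS; i++) {
            if (!occ[i] || label[i] >= 0) continue;
            int head = 0, tail = 0;
            label[i] = nclusters;                 /* seed a new cluster */
            queue[tail++] = i;
            while (head < tail) {                 /* grow it breadth-first */
                int s = queue[head++];
                int x = s % LL, y = s / LL;
                for (int d = 0; d < 4; d++) {
                    int nx = x + dx[d], ny = y + dy[d];
                    if (nx < 0 || nx >= LL || ny < 0 || ny >= LL) continue;
                    int nn = ny*LL + nx;
                    if (occ[nn] && label[nn] < 0) {
                        label[nn] = nclusters;
                        queue[tail++] = nn;
                    }
                }
            }
            nclusters++;
        }
        return nclusters;
    }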

But now consider what happens if, as is often the case, we want to calculate the values of some observable quantity Q (e.g., average cluster size) over a range of values of p. Now we have to perform repeated simulations at many different closely-spaced values of p in the range of interest, which makes the calculation much slower. Furthermore, if we want a continuous curve of Q(p) in the range of interest then in theory we have to measure Q at an infinite number of values of p. More practically, we can measure it at a finite number of values and then interpolate between them, but this inevitably introduces some error into the results.

The latter problem can be solved easily enough. The trick [22,23] is to measure Q for fixed numbers of occupied sites (or bonds) n in the range of interest. Let us refer to the ensemble of states of a percolation system with exactly n occupied sites or bonds as a "microcanonical percolation ensemble," the number n playing the role of the energy in thermal statistical mechanics. The more normal case in which only the occupation probability p is fixed is then the "canonical ensemble" for the problem. (If one imagines the occupied sites or bonds as representing particles instead, then the two ensembles would be "canonical" and "grand canonical," respectively. Some authors [24] have used this nomenclature.) Taking site percolation as an example, the probability of there being exactly n occupied sites on the lattice for a canonical percolation ensemble is given by the binomial distribution:

B(N,n,p) = \binom{N}{n} p^n (1-p)^{N-n} .    (1)

(The same expression applies for bond percolation, but with N replaced by M, the total number of bonds.) Thus, if we can measure our observable within the microcanonical ensemble for all values of n, giving a set of measurements {Q_n}, then the value in the canonical ensemble will be given by

Q(p) = \sum_{n=0}^{N} B(N,n,p) Q_n = \sum_{n=0}^{N} \binom{N}{n} p^n (1-p)^{N-n} Q_n .    (2)

Thus we need only measure Q_n for all values of n, of which there are N + 1, in order to find Q(p) for all p. Since filling the lattice takes time O(N), and construction of clusters O(M), we take time O(M + N) for each value of n, and O(N^2 + MN) for the entire calculation, or more simply O(N^2) on a regular lattice. Similar considerations apply for bond percolation.

In fact, one frequently does not want to know the value of Q(p) over the entire range 0 ≤ p ≤ 1, but only in the critical region, i.e., the region in which the correlation length ξ is greater than the linear dimension L of the system. If ξ ~ |p - p_c|^{-ν} close to p_c, where ν is a (constant) critical exponent, then the width of the critical region scales as L^{-1/ν}. Since the binomial distribution, Eq. (1), falls off very fast away from its maximum (approximately as a Gaussian with variance of order N^{-1}), it is safe also only to calculate Q_n within a region scaling as L^{-1/ν}, and there are of order L^{d-1/ν} values of n in such a region, where d is the dimensionality of the lattice. Thus the calculation of Q(p) in the critical region will take time of order N L^{d-1/ν} = N^{2-1/νd}. In two dimensions, for example, ν is known to take the value 4/3, and so we can calculate any observable quantity over the whole critical region in time of order N^{13/8}.

In this paper we describe a new algorithm which can perform the same calculation much faster. Our algorithm can calculate the values of Q_n for all n in time O(N). This is clearly far superior to the O(N^2) of the simple algorithm above, and even in the case where we only want to calculate Q in the critical region, our algorithm is still substantially faster doing all values of n than the standard one doing only those in the critical region. (Because of the way it works, our algorithm cannot be used to calculate Q_n only for the critical region; one is obliged to calculate O(N) values of Q_n.) A typical lattice size for percolation systems is about a million sites. On such a lattice our algorithm would be on the order of a million times faster than the simple O(N^2) method, and even just within the critical region it would still be on the order of a hundred times faster in two dimensions.

The outline of this paper is as follows. In Section II we describe our algorithm. In Section III we give results from the application of the algorithm to three different example problems. In Section IV we give our conclusions. Some of the results reported here have appeared previously in Ref. [25].

    II. THE ALGORITHM

Our algorithm is based on a very simple idea. In the standard algorithms for percolation, one must recreate an entire new state of the lattice for every different value of n one wants to investigate, and construct the clusters for that state. As various authors have pointed out [22,23,26,27], however, if we want to generate states for each value of n from zero up to some maximum value, then we can save ourselves some effort by noticing that a correct sample state with n + 1 occupied sites or bonds is given by adding one extra randomly chosen site or bond to a correct sample state with n sites or bonds. In other words, we can create an entire set of correct percolation states by adding sites or bonds one by one to the lattice, starting with an empty lattice.

Furthermore, the configuration of clusters on the lattice changes little when only a single site or bond is added, so that, with a little ingenuity, one can calculate the new clusters from the old with only a small amount of computational effort. This is the idea at the heart of our algorithm.

Let us consider the case of bond percolation first, for which our algorithm is slightly simpler. We start with a lattice in which all M bonds are unoccupied, so that each of the N sites is its own cluster with just one element. As bonds are added to the lattice, these clusters will be joined together into larger ones. The first step in our algorithm is to decide an order in which the bonds will be occupied. We wish to choose this order uniformly at random from all possible such orders, i.e., we wish to choose a random permutation of the bonds. One way of achieving this is the following:

1. Create a list of all the bonds in any convenient order. Positions in this list are numbered from 1 to M.

2. Set i ← 1.

3. Choose a number j uniformly at random in the range i ≤ j ≤ M.


FIG. 1. Two examples of bonds (dotted lines) being added to bond-percolation configurations. (a) The added bond joins two clusters together. (b) The added bond does nothing, since the two joined sites were already members of the same cluster.

4. Exchange the bonds in positions i and j. (If i = j then nothing happens.)

5. Set i ← i + 1.

6. Repeat from step 3 until i = M.

An implementation of this algorithm in C is given in Appendix A. It is straightforward to convince oneself that all permutations of the bonds are generated by this procedure with equal probability, in time O(M), i.e., going linearly with the number of bonds on the lattice.
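The following minimal C sketch (ours, not the Appendix A program) illustrates the shuffle with bonds indexed from 0 rather than 1; the drand() macro is a placeholder assumption and should be replaced by a good random number generator for production use:

    #include <stdlib.h>

    #define drand() (rand()/(RAND_MAX + 1.0))   /* placeholder uniform deviate on [0,1) */

    /* Fill order[0..M-1] with a random permutation of the bond labels
       0..M-1, following steps 1-6 above. */
    void permute_bonds(int order[], int M)
    {
        for (int i = 0; i < M; i++) order[i] = i;        /* step 1 */
        for (int i = 0; i < M; i++) {                    /* steps 2-6 */
            int j = i + (int)(drand()*(M - i));          /* step 3: uniform in i..M-1 */
            int temp = order[i];                         /* step 4: exchange */
            order[i] = order[j];
            order[j] = temp;
        }
    }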

Having chosen an order for our bonds, we start to occupy them in that order. The first bond added will, clearly, join together two of our single-site clusters to form a cluster of two sites, as will, almost certainly, the second and third. However, not all bonds added will join together two clusters, since some bonds will join pairs of sites which are already part of the same cluster; see Fig. 1. Thus, in order to correctly keep track of the cluster configuration of the lattice, we must do two things for each bond added:

Find: when a bond is added to the lattice, we must find which clusters the sites at either end belong to.

Union: if the two sites belong to different clusters, those clusters must be amalgamated into a single cluster; otherwise, if the two belong to the same cluster, we need do nothing.

Algorithms which achieve these steps are known as "union/find" algorithms [28,29]. Union/find algorithms are widely used in data structures, in calculations on graphs and trees, and in compilers. They have been extensively studied in computer science, and we can make profitable use of a number of results from the computer science literature to implement our percolation algorithm simply and efficiently.

It is worth noting that measured values for lattice quantities such as, say, average cluster size, are not statistically independent in our method, since the configuration of the lattice changes at only one site from one step of the algorithm to the next. This contrasts with the standard algorithms, in which a complete new configuration is generated for every value of p investigated, and hence all data points are statistically independent. While this is in some respects a disadvantage of our method, it turns out that in most cases of interest it is not a problem, since almost always one is concerned only with statistical independence of the results from one run to another. This our algorithm clearly has, which means that an error estimate on the results made by calculating the variance over many runs will be a correct error estimate, even though the errors on the observed quantity for successive values of n or p will not be independent.

In the next two sections we apply a number of different and increasingly efficient union/find algorithms to the percolation problem, culminating with a beautifully simple and almost linear algorithm associated with the names of Michael Fischer [30] and Robert Tarjan [31].

A. Trivial (but possibly quite efficient) union/find algorithms

Perhaps the simplest union/find algorithm which we can employ for our percolation problem is the following. We add a label to each site of our lattice, an integer for example, which tells us to which cluster that site belongs. Initially, all such labels are different (e.g., they could be set equal to their site label). Now the "find" portion of the algorithm is simple: we examine the labels of the sites at each end of an added bond to see if they are the same. If they are not, then the sites belong to different clusters, which must be amalgamated into one as a result of the addition of this bond. The amalgamation, the "union" portion of the algorithm, is more involved. To amalgamate two clusters we have to choose one of them (in the simplest case we just choose one at random) and set the cluster labels of all sites in that cluster equal to the cluster label of the other cluster.
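A minimal C sketch of this scheme (our illustration, with illustrative array names and lattice size) might look as follows; for clarity it relabels by scanning the whole lattice, whereas a practical implementation would keep a list of the members of each cluster so that only those sites need be visited:

    #define NS 16384                  /* number of lattice sites (illustrative) */

    int label[NS];                    /* cluster label of each site */

    /* Initially every site is a cluster of its own. */
    void init_labels(void)
    {
        for (int i = 0; i < NS; i++) label[i] = i;
    }

    /* "Find": the two ends of the added bond are in the same cluster if and
       only if their labels agree.  "Union": otherwise, relabel one of the
       two clusters (here the one containing s2) with the label of the other. */
    void add_bond_relabel(int s1, int s2)
    {
        int from = label[s2], to = label[s1];
        if (from == to) return;                  /* already the same cluster */
        for (int i = 0; i < NS; i++)
            if (label[i] == from) label[i] = to;
    }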

In the initial stages of the algorithm, when most bonds are unoccupied, this amalgamation step will be quick and simple; most bonds added will amalgamate clusters, but the clusters will be small and only a few sites will have to be relabeled. In the late stages of the algorithm, when most bonds are occupied, all or most sites will belong to the system-size percolating cluster, and hence cluster unions will rarely be needed, and again the algorithm is quick. Only in the intermediate regime, close to the percolation point, will any significant amount of work be required. In this region, there will in general be many large clusters, and much relabeling may have to be performed when two clusters are amalgamated. Thus, the algorithm displays a form of critical slowing down as it passes through the critical regime. We illustrate this in Fig. 2 (solid line), where we show the number of site relabelings taking place, as a function of the fraction of occupied bonds, for bond percolation on a square lattice of 32 × 32 sites. The results are averaged over a million runs of the algorithm, and show a clear peak in the number of relabelings in the region of the known percolation transition at bond occupation probability p_c = 1/2.

    In Fig. 3 (circles) we show the average total number ofrelabelings which are carried out during the entire run of


FIG. 2. Number of relabelings of sites performed, per bond added, as a function of the fraction of occupied bonds, for the two algorithms described in Section II A, on a square lattice of 32 × 32 sites.

the algorithm on square lattices of N = L × L sites as a function of N. Given that, as the number of relabelings becomes large, the time taken to relabel will be the dominant factor in the speed of this algorithm, this graph gives an indication of the expected scaling of run-time with system size. As we can see, the results lie approximately on a straight line on the logarithmic scales used in the plot, and a fit to this line gives a run-time which scales as a power of N with exponent 1.91 ± 0.01. This is a little better than the O(N^2) behavior of the standard algorithms, and in practice can make a significant difference to running time on large lattices. With a little ingenuity however, we can do a lot better.

The algorithm's efficiency can be improved considerably by one minor modification: we maintain a separate list of the sizes of the clusters, indexed by their cluster label, and when two clusters are amalgamated, instead of relabeling one of them at random, we look up their sizes in this list and then relabel the smaller of the two. This "weighted union" strategy ensures that we carry out the minimum possible number of relabelings, and in particular avoids repeated relabelings of the large percolating cluster when we are above the percolation transition, which is a costly operation. The average number of relabelings performed in this algorithm as a function of the number of occupied bonds is shown as the dotted line in Fig. 2, and it is clear that this change in the algorithm has saved us a great deal of work. In Fig. 3 (squares) we show the total number of relabelings as a function of system size, and again performance is clearly much better. The run-time of the algorithm now appears to scale as a power of N with exponent 1.06 ± 0.01. In fact, we can prove that the worst-case running time goes as N log N, a form which frequently manifests itself as an apparent power

FIG. 3. Total number of relabelings of sites performed in a single run as a function of system size for the two algorithms (circles and squares) of Section II A. Each data point is an average over 1000 runs; error bars are much smaller than the points themselves. The solid lines are straight-line fits to the rightmost five points in each case.

law with exponent slightly above 1. To see this, consider that with weighted union the size of the cluster to which a site belongs must at least double every time that site is relabeled. The maximum number of such doublings is limited by the size of the lattice to log₂ N, and hence the maximum number of relabelings of a site has the same limit. An upper bound on the total number of relabelings performed for all N sites during the course of the algorithm is therefore N log₂ N. (A similar argument applied to the unweighted relabeling algorithm implies that N^2 is an upper bound on the running time of this algorithm. The measured exponent 1.91 ± 0.01 indicates either that the experiments we have performed have not reached the asymptotic scaling regime, or else that the worst-case running time is not realized for the particular case of union of percolation clusters on a square lattice.)

An algorithm whose running time scales as O(N log N) is a huge advance over the standard methods. However, it turns out that, by making use of some known techniques and results from computer science, we can make our algorithm better still while at the same time actually making it simpler. This delightful development is described in the next section.

B. Tree-based union/find algorithms

Almost all modern union/find algorithms make use of data trees to store the sets of objects (in this case clusters) which are to be searched. The idea of using a tree in this way seems to have been suggested first by Galler and Fischer [32]. Each cluster is stored as a separate tree,


FIG. 4. An example of the tree structure described in the text, here representing two clusters. The shaded sites are the root sites and the arrows represent pointers. When a bond is added (dotted line in center) that joins sites which belong to different clusters (i.e., whose pointers lead to different root sites), the two clusters must be amalgamated by making one (right) a subtree of the other (left). This is achieved by adding a new pointer from the root of one tree to the root of the other.

with each vertex of the tree being a separate site in the cluster. Each cluster has a single "root" site, which is the root of the corresponding tree, and all other sites possess pointers either to that root site or to another site in the cluster, such that by following a succession of such pointers we can get from any site to the root. By traversing trees in this way, it is simple to ascertain whether two sites are members of the same cluster: if their pointers lead to the same root site then they are, otherwise they are not. This scheme is illustrated for the case of bond percolation on the square lattice in Fig. 4.

The union operation is also simple for clusters stored as trees: two clusters can be amalgamated simply by adding a pointer from the root of one to any site in the other (Fig. 4). Normally, one chooses the new pointer to point to the root of the other cluster, since this reduces the average distance that will have to be traversed across the tree to reach the root site from other sites. There are many varieties of tree-based union/find algorithms, which differ from one another in the details of how the trees are updated as the algorithm progresses. However, the best known performance for any such algorithm is for a very simple one, the "weighted union/find with path compression," which was first proposed by Fischer [28-30]. Its description is brief:

Find: In the find part of the algorithm, trees are traversed to find their root sites. If two initial sites lead to the same root, then they belong to the same cluster. In addition, after the traversal is completed, all pointers along the path traversed are changed to point directly to the root of their tree. This is called "path compression." Path compression makes traversal faster the next time we perform a find operation on any of the same sites.

Union: In the union part of the algorithm, two clusters are amalgamated by adding a pointer from the root of one to the root of the other, thus making one a subtree of the other. We do this in a "weighted" fashion as in Section II A, meaning that we always make the smaller of the two a subtree of the larger. In order to do this, we keep a record at the root site of each tree of the number of sites in the corresponding cluster. When two clusters are amalgamated the size of the new composite cluster is the sum of the sizes of the two from which it was formed.

Thus our complete percolation algorithm can be summarized as follows.

1. Initially all sites are clusters in their own right. Each is its own root site, and contains a record of its own size, which is 1.

2. Bonds are occupied in random order on the lattice.

3. Each bond added joins together two sites. We follow pointers from each of these sites separately until we reach the root sites of the clusters to which they belong. Then we go back along the paths we followed through each tree and adjust all pointers along those paths to point directly to the corresponding root sites.

4. If the two root sites are the same site, we need do nothing further.

5. If the two root nodes are different, we examine the cluster sizes stored in them, and add a pointer from the root of the smaller cluster to the root of the larger, thereby making the smaller tree a subtree of the larger one. If the two are the same size, we may choose whichever tree we like to be the subtree of the other. We also update the size of the larger cluster by adding the size of the smaller one to it.

These steps are repeated until all bonds on the lattice have been occupied. At each step during the run, the tree structures on the lattice correctly describe all of the clusters of joined sites, allowing us to evaluate observable quantities of interest. For example, if we are interested in the size of the largest cluster on the lattice as a function of the number of occupied bonds, we simply keep track of the largest cluster size we have seen during the course of the algorithm.
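As a concrete illustration, here is a minimal C sketch of these steps for bond percolation (our sketch, not the Appendix A program); the lattice size, the array bond[m][2] holding the two endpoints of bond m, and all other names are illustrative assumptions:

    #define NS 1048576                /* number of sites, e.g. a 1024 x 1024 lattice */

    int ptr[NS];                      /* parent pointer, or -(cluster size) for root sites */

    /* Step 3: follow pointers to the root, compressing the path on the way back. */
    int findroot(int i)
    {
        if (ptr[i] < 0) return i;
        return ptr[i] = findroot(ptr[i]);
    }

    /* Steps 4 and 5: amalgamate the clusters containing s1 and s2, making the
       smaller tree a subtree of the larger.  Returns the root of the result. */
    int add_bond(int s1, int s2)
    {
        int r1 = findroot(s1), r2 = findroot(s2);
        if (r1 == r2) return r1;                 /* step 4: same cluster, nothing to do */
        if (ptr[r1] > ptr[r2]) {                 /* r2 is the larger cluster (sizes are negative) */
            ptr[r2] += ptr[r1];                  /* update the size of the larger cluster */
            ptr[r1] = r2;                        /* smaller tree becomes a subtree */
            return r2;
        } else {
            ptr[r1] += ptr[r2];
            ptr[r2] = r1;
            return r1;
        }
    }

    /* Steps 1 and 2: start with N single-site clusters and occupy the M bonds
       in the randomly chosen order, measuring observables as we go. */
    void percolate_bonds(int bond[][2], const int order[], int M)
    {
        int biggest = 1;                         /* size of the largest cluster seen so far */
        for (int i = 0; i < NS; i++) ptr[i] = -1;
        for (int b = 0; b < M; b++) {
            int r = add_bond(bond[order[b]][0], bond[order[b]][1]);
            if (-ptr[r] > biggest) biggest = -ptr[r];
            /* ... record biggest (or any other observable) for n = b+1 occupied bonds ... */
        }
    }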

Although this algorithm may seem more involved than the relabeling algorithm of Section II A, it turns out that its implementation is actually simpler, for two reasons. First, there is no relabeling of sites, which eliminates the need for the code portion which tracks down all the sites belonging to a cluster and updates them. Second, the only difficult parts of the algorithm, the tree traversal and path compression, can, it turns out, be accomplished together by a single function which, by artful use of recursion, can be coded in only three lines (two in C). The implementation of the algorithm is discussed in detail in Section II D and in Appendix A.

    C. Performance of the algorithm

Without either weighting or path compression each step of the algorithm above is known to take time linear in the tree size [30], while with either weighting or path compression alone it takes logarithmic time [30,31]. When both are used together, however, each step takes an amount of time which is very nearly constant, in a sense which we now explain.

The worst-case performance of the weighted union/find algorithm with path compression has been analyzed by Tarjan [31] in the general case in which one starts with n individual single-element sets and performs all n - 1 unions (each one reducing the number of clusters by 1, from n down to 1), and any number m ≥ n of intermediate finds. The finds are performed on any sequence of items, and the unions may be performed in any order. Our case is more restricted. We always perform exactly 2M finds, and only certain unions and certain finds may be performed. For example, we only ever perform finds on pairs of sites which are adjacent on the lattice. And unions can only be performed on clusters which are adjacent to one another along at least one bond. However, it is clear that the worst-case performance of the algorithm in the most general case also provides a bound on the performance of the algorithm in any more restricted case, and hence Tarjan's result applies to our algorithm.

Tarjan's analysis is long and complicated. However, the end result is moderately straightforward. The union part of the algorithm clearly takes constant time. The find part takes time which scales as the number of steps taken to traverse the tree to the root. Tarjan showed that the average number of steps is bounded above by kα(m, n), where k is an unknown constant and the function α(m, n) is the smallest integer value of z greater than or equal to 1 such that A(z, 4⌈m/n⌉) > log₂ n. The function A(i, j) in this expression is a slight variant on Ackermann's function [33] defined by

A(0, j) = 2j                           for all j,
A(i, 0) = 0                            for i ≥ 1,
A(i, 1) = 2                            for i ≥ 1,
A(i, j) = A(i - 1, A(i, j - 1))        for i ≥ 1, j ≥ 2.    (3)

This function is a monstrously quickly growing function of its arguments, faster than any multiple exponential, which means that α(m, n), which is roughly speaking the functional inverse of A(i, j), is a very slowly growing function of its arguments. So slowly growing, in fact, that for all practical purposes it is a constant. In particular, note that α(m, n) is maximized by setting m = n, in which case if log₂ n < A(z, 4) then α(m, n) ≤ α(n, n) ≤ z. Setting z = 3 and making use of Eq. (3), we find that

A(3, 4) = 2^2^···^2, a tower of 65 536 twos.    (4)

This number is ludicrously large. For example, with only the first 5 out of the 65 536 twos, we have

2^2^2^2^2 = 2.0 × 10^19728.    (5)

This number is far, far larger than the number of atoms in the known universe. Indeed, a generous estimate of

FIG. 5. Total number of steps taken through trees during a single run of our algorithm as a function of system size. Each point is averaged over 1000 runs. Statistical errors are much smaller than the data points. The solid line is a straight-line fit to the last five points, and gives a slope of 1.006 ± 0.011.

the number of Planck volumes in the universe would put it at only around 10^200. So it is entirely safe to say that log₂ n will never exceed A(3, 4), bearing in mind that n in our case is equal to N, the number of sites on the lattice. Hence α(m, n) ≤ 3 for all practical values of m and n, and certainly for all lattices which fit within our universe. Thus the average number of steps needed to traverse the tree is at least one and at most 3k, where k is an unknown constant. The value of k is found experimentally to be of order 1.

What does this mean for our percolation algorithm? It means that the time taken by both the union and find steps is O(1) in the system size, and hence that each bond added to the lattice takes time O(1). Thus it takes time O(M) to add all M bonds, while maintaining an up-to-date list of the clusters at all times. (To be precise, the time to add all M bonds is bounded above by O(Mα(2M, N)), where α(2M, N) is not expected to exceed 3 this side of kingdom come.)

In Fig. 5 we show the actual total number of steps taken in traversing trees during the entire course of a run of the algorithm, averaged over several runs for each data point, as a function of system size. A straight-line fit to the data shows that the algorithm runs in time proportional to a power of N with exponent 1.006 ± 0.011. (In actual fact, if one measures the performance of the algorithm in real ("wallclock") time, it will on most computers (circa 2001) not be precisely linear in system size because of the increasing incidence of cache misses as N becomes large. This however is a hardware failing, rather than a problem with the algorithm.)


We illustrate the comparative superiority of performance of our algorithm in Table I with actual timing results for it and the three other algorithms described in this paper, for square lattices of 1000 × 1000 sites, a typical size for numerical studies of percolation. All programs were compiled using the same compiler and run on the same computer. As the table shows, our algorithm, in addition to being the simplest to program, is also easily the fastest. In particular, we find it to be about 2 million times faster than the simple depth-first search [34], and about 50% faster than the best of our relabeling algorithms, the weighted relabeling of Section II A. For larger systems the difference will be greater still.

Site percolation can be handled by only a very slight modification of the algorithm. In this case, sites are occupied in random order on the lattice, and each site occupied is declared to be a new cluster of size 1. Then one adds, one by one, all bonds between this site and the occupied sites, if any, adjacent to it, using the algorithm described above. The same performance arguments apply: generation of a permutation of the sites takes time O(N), and the adding of all the bonds takes time O(M), and hence the entire algorithm takes time O(M + N). On a regular lattice, for which M = zN/2, where z is the coordination number, this is equivalent to O(N).
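A sketch of this modification, reusing findroot() and add_bond() from the bond-percolation sketch above and assuming a precomputed neighbor table nn (an illustrative name, filled in elsewhere), could look like this:

    #define EMPTY (-NS - 1)           /* marker for a site that is not yet occupied */

    int nn[NS][4];                    /* nn[i][j] = label of the j-th neighbor of site i,
                                         assumed to be set up elsewhere */

    /* Occupy the sites one by one in the order given by the permutation
       `order`, joining each new site to any already occupied neighbors. */
    void percolate_sites(const int order[])
    {
        for (int i = 0; i < NS; i++) ptr[i] = EMPTY;
        for (int i = 0; i < NS; i++) {
            int s1 = order[i];
            ptr[s1] = -1;                        /* a new cluster of size 1 */
            for (int j = 0; j < 4; j++) {        /* four neighbors on the square lattice */
                int s2 = nn[s1][j];
                if (ptr[s2] != EMPTY) add_bond(s1, s2);
            }
            /* ... measure observables for n = i+1 occupied sites here ... */
        }
    }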

It is worth also mentioning the memory requirements of our algorithm. In its simplest form, the algorithm requires two arrays of size N for its operation; one is used to store the order in which sites will be occupied (for bond percolation this array is of size M), and one to store the pointers and cluster sizes. The standard depth-first search also requires two arrays of size N (a cluster label array and a stack), as does the weighted relabeling algorithm (a label array and a list of cluster sizes). Thus all the algorithms are competitive in terms of memory use. (The well-known Hoshen-Kopelman algorithm [35], which is a variation on depth-first search and runs in O(N^2 log N) time, has significantly lower memory requirements, of the order of N^{1-1/d}, which makes it useful for studies of particularly large low-dimensional systems.)

algorithm               time in seconds
depth-first search            4 500 000
unweighted relabeling            16 000
weighted relabeling                 4.2
tree-based algorithm                2.9

TABLE I. Time in seconds for a single run of each of the algorithms discussed in this paper, for bond percolation on a square lattice of 1000 × 1000 sites.

FIG. 6. Initial configuration of occupied (gray) and empty (white) sites for checking for the presence of a spanning cluster. In this example there are no periodic boundary conditions on the lattice. As the empty sites are filled up during the course of the run, the two marked sites will have the same cluster root site if and only if there is a spanning cluster on the lattice.

    D. Implementation

In Appendix A we give a complete computer program for our algorithm for site percolation on the square lattice, written in C. The program is also available for download on the Internet [36]. As the reader will see, the implementation of the algorithm itself is quite simple, taking only a few lines of code. In this section, we make a few points about the algorithm's implementation which may be helpful to those wishing to make use of it.

First, we note that the "find" operation, in which a tree is traversed to find its root, and the path compression can be conveniently performed together using a recursive function which in pseudocode would look like this:

function find(i : integer) : integer
    if ptr[i] < 0 return i
    ptr[i] := find(ptr[i])
    return ptr[i]
end function

This function takes an integer argument, which is the label of a site, and returns the label of the root site of the cluster to which that site belongs. The pointers are stored in an array ptr, which is negative for all root nodes and contains the label of the site pointed to otherwise. A version of this function in C is included in the appendix. A non-recursive implementation of the function is of course possible (as is the case with all uses of recursion), and indeed we have experimented with such an implementation. However, it is considerably more complicated, and with the benefit of a modern optimizing compiler is found to offer little speed advantage.
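For reference, a non-recursive version under the same ptr[] convention (negative entries mark roots) can be written with two passes, one to locate the root and one to compress the path; this is our sketch, not the appendix code:

    extern int ptr[];                 /* pointer array described in the text */

    /* Iterative equivalent of the recursive find function above. */
    int findroot_iterative(int i)
    {
        int r = i;
        while (ptr[r] >= 0) r = ptr[r];          /* first pass: walk up to the root */
        while (ptr[i] >= 0) {                    /* second pass: path compression */
            int next = ptr[i];
            ptr[i] = r;
            i = next;
        }
        return r;
    }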

The version of the program we have given in Appendix A measures the largest cluster size as a function of the number of occupied sites. In the limit of large system size, this is equal to the size of the giant component, and is for this reason a quantity of some interest. However, there are many other quantities which one might want to measure using our algorithm. Doing so usually involves keeping track of some other quantities as the algorithm runs. Here are three examples.

1. Average cluster size: In order to measure the average number of sites per cluster in site percolation, for example, it is sufficient to keep track only of the total number c of clusters on the lattice. Then the average cluster size is given by n/c, where n is the number of occupied sites. To keep track of c, we simply set it equal to zero initially, increase it by one every time a site is occupied, and decrease it by one every time we perform a union operation. Similarly, for cluster size averages weighted by cluster size, one can keep track of higher moments of the cluster-size distribution.

2. Cluster spanning: In many calculations one would like to detect the onset of percolation in the system as sites or bonds are occupied. One way of doing this is to look for a cluster of occupied sites or bonds which spans the lattice from one side to the other. Taking the example of site percolation, one can test for such a cluster by starting the lattice in the configuration shown in Fig. 6, in which there are occupied sites along two edges of the lattice and no periodic boundary conditions. (Notice that the lattice is now not square.) Then one proceeds to occupy the remaining sites of the lattice one by one in our standard fashion. At any point, one can check for spanning simply by performing "find" operations on each of the two initial clusters, starting for example at the marked sites. If these two clusters have the same root site then spanning has occurred. Otherwise it has not.

3. Cluster wrapping: An alternative criterion for percolation is to use periodic boundary conditions and look for a cluster which wraps all the way around the lattice. This condition is somewhat more difficult to detect than simple spanning, but with a little ingenuity it can be done, and the extra effort is, for some purposes, worthwhile. An example is in the measurement of the position p_c of the percolation transition. As we have shown elsewhere [25], estimates of p_c made using a cluster wrapping condition display significantly smaller finite-size errors than estimates made using cluster spanning on open systems. A clever method for detecting cluster wrapping has been employed by Machta et al. [37] in simulations of Potts models, and can be adapted to the case of percolation in a straightforward manner. We describe the method for bond percolation, although it is easily applied to site percolation also. We add to each site two integer variables giving the x- and y-displacements from that site to the site's parent in the appropriate tree. When we traverse the tree, we sum these displacements along the path traversed to find the total displacement to the root

FIG. 7. Method for detecting cluster wrapping on periodic boundary conditions. When a bond is added (a) which joins together two sites which belong to the same cluster, it is possible, as here, that it causes the cluster to wrap around the lattice. To detect this, the displacements to the root site of the cluster (shaded) are calculated (arrows). If the difference between these displacements is not equal to a single lattice spacing, then wrapping has taken place. Conversely, if the bond (b) were added, the displacements to the root site would differ by only a single lattice spacing, indicating that wrapping has not taken place.

site. (We also update all displacements along the path when we carry out the path compression.) When an added bond connects together two sites which are determined to belong to the same cluster, we compare the total displacements to the root site for those two sites. If these displacements differ by just one lattice spacing, then cluster wrapping has not occurred. If they differ by any other amount, it has. This method is illustrated in Fig. 7, and a code sketch is given below.
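The following C sketch (ours, with illustrative names, assuming the same negative-root ptr[] convention as above) shows one way to combine the displacement bookkeeping with the find and union operations; (bx, by) is the single lattice vector of the added bond, not folded through the periodic boundaries:

    #define NS 1048576                /* number of sites (illustrative) */

    int ptr[NS];                      /* parent pointer, or -(cluster size) for root sites */
    int dx[NS], dy[NS];               /* displacement from each site to its parent */

    /* Find the root of site i; on return (*rx, *ry) is the total displacement
       from i to the root.  Path compression re-points every site on the path
       directly at the root and updates its stored displacement to match. */
    int findroot_xy(int i, int *rx, int *ry)
    {
        int px, py;
        if (ptr[i] < 0) { *rx = 0; *ry = 0; return i; }
        int r = findroot_xy(ptr[i], &px, &py);
        dx[i] += px;                  /* displacement i -> parent -> ... -> root */
        dy[i] += py;
        ptr[i] = r;
        *rx = dx[i];
        *ry = dy[i];
        return r;
    }

    /* Add a bond from s1 to s2 separated by the single lattice vector (bx, by).
       Returns 1 if the bond makes its cluster wrap around the boundaries. */
    int add_bond_wrapping(int s1, int s2, int bx, int by)
    {
        int x1, y1, x2, y2;
        int r1 = findroot_xy(s1, &x1, &y1);
        int r2 = findroot_xy(s2, &x2, &y2);

        if (r1 == r2)                 /* same cluster: compare tree separation with the bond */
            return (x1 - x2 != bx) || (y1 - y2 != by);

        if (ptr[r1] > ptr[r2]) {      /* r2 is the larger cluster; attach r1 beneath it */
            ptr[r2] += ptr[r1];
            ptr[r1] = r2;
            dx[r1] = x2 + bx - x1;    /* displacement from r1 to its new parent r2 */
            dy[r1] = y2 + by - y1;
        } else {                      /* attach r2 beneath r1 */
            ptr[r1] += ptr[r2];
            ptr[r2] = r1;
            dx[r2] = x1 - bx - x2;    /* displacement from r2 to its new parent r1 */
            dy[r2] = y1 - by - y2;
        }
        return 0;                     /* joining two distinct clusters cannot cause wrapping */
    }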

It is worth noting also that, if one's object is only to detect the onset of percolation, then one can halt the algorithm as soon as percolation is detected. There is no need to fill in any more sites or bonds once the percolation point is reached. This typically saves about 50% on the run-time of the algorithm, the critical occupation probability p_c being of the order of 1/2. A further small performance improvement can often be achieved in this case by noting that it is no longer necessary to generate a complete permutation of the sites (bonds) before starting the algorithm. In many cases it is sufficient simply to choose sites (bonds) one by one at random as the algorithm progresses. Sometimes in doing this one will choose a site (bond) which is already occupied, in which case one must generate a new random one. The probability that this happens is equal to the fraction p of occupied sites (bonds), and hence the average number of attempts we must make before we find an empty site is 1/(1 - p). The total number of attempts made before we reach p_c is therefore

N \int_0^{p_c} \frac{dp}{1-p} = -N \log(1 - p_c),    (6)

or -M \log(1 - p_c) for bond percolation. If p_c = 0.5, for example, this means we will have to generate N log 2 ≈ 0.693N random numbers during the course of the run,


rather than the N we would have had to generate to make a complete permutation of the sites. Thus it should be quicker not to generate the complete permutation. Only once p_c becomes large enough that -\log(1 - p_c) > 1 does it start to become profitable to calculate the entire permutation, i.e., when p_c > 1 - 1/e ≈ 0.632. If one were to use a random selection scheme for calculations over the whole range 0 ≤ p ≤ 1, then the algorithm would take time

N \int_0^{1-1/N} \frac{dp}{1-p} = N \log N    (7)

to find and occupy all empty sites, which means overall operation would be O(N log N), not O(N), so generating the permutation is crucial in this case.
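A sketch of this random-selection variant, assuming the ptr[]/EMPTY convention of the site-percolation sketch in Section II C and the same placeholder drand() generator, is simply:

    /* Choose an unoccupied site uniformly at random by repeated trial.
       On average 1/(1 - p) attempts are needed when a fraction p of the
       sites are already occupied. */
    int choose_empty_site(void)
    {
        int s;
        do {
            s = (int)(NS*drand());
        } while (ptr[s] != EMPTY);               /* occupied: draw again */
        return s;
    }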

One further slightly tricky point in the implementation of our scheme is the performance of the convolution, Eq. (2), of the results of the algorithm with the binomial distribution. Since the number of sites or bonds on the lattice can easily be a million or more, direct evaluation of the binomial coefficients using factorials is not possible. And for high-precision studies, such as the calculations presented in Section III, a Gaussian approximation to the binomial is not sufficiently accurate. Instead, therefore, we recommend the following method of evaluation. The binomial distribution, Eq. (1), has its largest value for given N and p when n = n_max = pN. We arbitrarily set this value to 1. (We will fix it in a moment.) Now we calculate B(N, n, p) iteratively for all other n from

B(N,n,p) = B(N,n-1,p) \, \frac{N-n+1}{n} \, \frac{p}{1-p}      for n > n_max,
B(N,n,p) = B(N,n+1,p) \, \frac{n+1}{N-n} \, \frac{1-p}{p}      for n < n_max.    (8)

Then we calculate the normalization coefficient C = \sum_n B(N,n,p) and divide all the B(N,n,p) by it, to correctly normalize the distribution.
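A compact C sketch of this procedure (our illustration; it assumes 0 < p < 1 and works in double precision throughout) is:

    #include <stdlib.h>

    /* Compute the canonical average Q(p) of Eq. (2) from the microcanonical
       measurements Qn[0..N], building the binomial weights iteratively from
       Eq. (8) so that no factorials are ever evaluated. */
    double canonical_average(const double Qn[], int N, double p)
    {
        double *B = malloc((N + 1)*sizeof(double));
        int nmax = (int)(p*N);                   /* peak of the binomial distribution */
        double norm = 0.0, Q = 0.0;

        B[nmax] = 1.0;                           /* arbitrary value, fixed by normalization below */
        for (int n = nmax + 1; n <= N; n++)
            B[n] = B[n-1]*(N - n + 1)/n*p/(1 - p);
        for (int n = nmax - 1; n >= 0; n--)
            B[n] = B[n+1]*(n + 1)/(N - n)*(1 - p)/p;

        for (int n = 0; n <= N; n++) norm += B[n];       /* normalization coefficient C */
        for (int n = 0; n <= N; n++) Q += B[n]*Qn[n]/norm;

        free(B);
        return Q;
    }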

    III. APPLICATIONS

In this section we consider a number of applications of our algorithm to open problems in percolation theory. Not all of the results given here are new; some have appeared previously in Refs. [18] and [25]. They are gathered here however to give an idea of the types of problems for which our algorithm is an appropriate tool.

A. Measurement of the position of the percolation transition

Our algorithm is well suited to the measurement of critical properties of percolation systems, such as the position of the percolation transition and critical exponents. Our first example application is the calculation of the position p_c of the percolation threshold for site percolation on the square lattice, a quantity for which we currently have no exact result.

FIG. 8. Two topologically distinct ways in which a percolation cluster can wrap around both axes of a two-dimensional periodic lattice.

There are a large number of possible methods for determining the position of the percolation threshold numerically on a regular lattice [38-40], different methods being appropriate with different algorithms. As discussed in Ref. [25], our algorithm makes possible the use of a particularly attractive method based on lattice wrapping probabilities, which has substantially smaller finite-size scaling corrections than methods employed previously.

We define R_L(p) to be the probability that for site occupation probability p there exists a contiguous cluster of occupied sites which wraps completely around a square lattice of L × L sites with periodic boundary conditions. As in Section II D, cluster wrapping is here taken to be an estimator of the presence or absence of percolation on the infinite lattice. There are a variety of possible ways in which cluster wrapping can occur, giving rise to a variety of different definitions for R_L:

R_L^{(h)} and R_L^{(v)} are the probabilities that there exists a cluster which wraps around the boundary conditions in the horizontal and vertical directions respectively. Clearly for square systems these two are equal. In the rest of this paper we refer only to R_L^{(h)}.

R_L^{(e)} is the probability that there exists a cluster which wraps around either the horizontal or vertical direction, or both.

R_L^{(b)} is the probability that there exists a cluster which wraps around both the horizontal and vertical directions. Notice that there are many topologically distinct ways in which this can occur. Two of them, the "cross" configuration and the "single spiral" configuration, are illustrated in Fig. 8. R_L^{(b)} is defined to include both of these and all other possible ways of wrapping around both axes.

R_L^{(1)} is the probability that a cluster exists which wraps around one specified axis but not the other axis. As with R_L^{(h)}, it does not matter, for the square systems studied here, which axis we specify.


FIG. 9. Plots of the cluster wrapping probability functions R_L(p) for L = 32, 64, 128 and 256 in the region of the percolation transition for percolation (a) along a specified axis, (b) along either axis, (c) along both axes, and (d) along one axis but not the other. Note that (d) has a vertical scale different from the other frames. The dotted lines denote the expected values of p_c and R_∞(p_c).

These four wrapping probabilities satisfy the equalities

R_L^{(e)} = R_L^{(h)} + R_L^{(v)} - R_L^{(b)} = 2 R_L^{(h)} - R_L^{(b)},    (9)

R_L^{(1)} = R_L^{(h)} - R_L^{(b)} = R_L^{(e)} - R_L^{(h)} = \frac{1}{2} \big[ R_L^{(e)} - R_L^{(b)} \big],    (10)

so that only two of them are independent. They also satisfy the inequalities R_L^{(b)} ≤ R_L^{(h)} ≤ R_L^{(e)} and R_L^{(1)} ≤ R_L^{(h)}.

Each run of our algorithm on the L × L square system gives one estimate of each of these functions for the complete range of p. It is a crude estimate however: since an appropriate wrapping cluster either exists or doesn't for all values of p, the corresponding R_L(p) in the microcanonical ensemble is simply a step function, except for R_L^{(1)}, which has two steps, one up and one down. All four functions can be calculated in the microcanonical ensemble simply by knowing the numbers n_c^{(h)} and n_c^{(v)} of occupied sites at which clusters first appear which wrap horizontally and vertically. (This means that, as discussed in Section II D, the algorithm can be halted once wrapping in both directions has occurred, which for the square lattice gives a saving of about 40% in the amount of CPU time used.)

Our estimates of the four functions are improved by averaging over many runs of the algorithm, and the results are then convolved with the binomial distribution, Eq. (2), to give smooth curves of R_L(p) in the canonical ensemble. (Alternatively, one can perform the convolution first and average over runs second; both are linear operations, so order is unimportant. However, the convolution is quite a lengthy computation, so it is sensible to choose the order that requires it to be performed only once.)

In Fig. 9 we show results from calculations of R_L(p) using our algorithm for the four different definitions, for systems of a variety of sizes, in the vicinity of the percolation transition.

The reason for our interest in the wrapping probabilities is that their values can be calculated exactly. Exact expressions can be deduced from the work of Pinson [41] and written directly in terms of the Jacobi θ-function θ₃(q) and the Dedekind η-function η(q). Pinson calculated the probability, which he denoted π(Z × Z), of occurrence of a wrapping cluster with the "cross" topology shown in the left panel of Fig. 8. By duality, our probability R_∞^{(e)}(p_c) is just 1 - π(Z × Z), since if there is no wrapping around either axis, then there must necessarily be a cross configuration on the dual lattice. This yields [42,43]

R_\infty^{(e)}(p_c) = 1 - \frac{\theta_3(e^{-3\pi/8}) \theta_3(e^{-8\pi/3}) - \theta_3(e^{-3\pi/2}) \theta_3(e^{-2\pi/3})}{2 [\eta(e^{-2\pi})]^2}.    (11)

The probability R_∞^{(1)}(p_c) for wrapping around exactly one axis is equal to the quantity denoted π(1, 0) by Pinson, which for a square lattice can be written

R_\infty^{(1)}(p_c) = \frac{3 \theta_3(e^{-6\pi}) + \theta_3(e^{-2\pi/3}) - 4 \theta_3(e^{-8\pi/3})}{\sqrt{6}\,[\eta(e^{-2\pi})]^2}.    (12)

The remaining two probabilities R_∞^{(h)} and R_∞^{(b)} can now be calculated from Eq. (10). To ten figures, the resulting values for all four are:

R_∞^{(h)}(p_c) = 0.521058290,    R_∞^{(e)}(p_c) = 0.690473725,
R_∞^{(b)}(p_c) = 0.351642855,    R_∞^{(1)}(p_c) = 0.169415435.    (13)

If we calculate the solution p of the equation

R_L(p) = R_∞(p_c),    (14)

then we must have p → p_c as L → ∞, and hence this solution is an estimator for p_c. Furthermore, it is a very good estimator, as we now demonstrate.

In Fig. 10, we show numerical results for the finite-size convergence of R_L(p_c) to the L = ∞ values for both site and bond percolation. (For site percolation we used the current best-known value for p_c from Ref. [25].) Note that the expected statistical error on R_L(p_c) is known analytically to good accuracy, since each run of the algorithm gives an independent estimate of R_L which is either 1 or 0, depending on whether or not wrapping has occurred in the appropriate fashion when a fraction p_c of the sites are occupied. Thus our estimate of the mean of R_L(p_c) over n runs is drawn from a simple binomial distribution which has standard deviation


FIG. 10. Convergence with increasing system size of R_L(p_c) to its known value at L = ∞ for R_L^{(e)} (circles) and R_L^{(1)} (squares) under (a) site percolation and (b) bond percolation. Filled symbols in panel (a) are exact enumeration results for system sizes L = 2…6. The dotted lines show the expected slope if, as we conjecture, R_L(p_c) converges as L^{-2} with increasing L. The data for R_L^{(1)} in panel (b) have been displaced upwards by a factor of two for clarity.

\Delta R_L = \sqrt{ \frac{ R_L(p_c) [1 - R_L(p_c)] }{ n } }.    (15)

If we approximate R_L(p_c) by the known value of R_∞(p_c), then we can evaluate this expression for any n.

As the figure shows, the finite-size corrections to R_L decay approximately as L^{-2} with increasing system size. For example, fits to the data for R_L^{(1)} (for which our numerical results are cleanest) give slopes of -1.95(17) (site percolation) and -2.003(5) (bond percolation). On the basis of these results we conjecture that R_L(p_c) converges to its L = ∞ value exactly as L^{-2}.

At the same time, the width of the critical region is decreasing as L^{-1/ν}, so that the gradient of R_L(p) in the critical region goes as L^{1/ν}. Thus our estimate p of the critical occupation probability from Eq. (14) converges to p_c according to

p - p_c \sim L^{-2-1/\nu} = L^{-11/4},    (16)

where the last equality makes use of the known value ν = 4/3 for percolation on the square lattice. This convergence is substantially better than that for any other known estimator of p_c. The best previously known convergence was p - p_c ~ L^{-1-1/ν} for certain estimates based upon crossing probabilities in open systems, while many other estimates, including the renormalization-group estimate, converge as p - p_c ~ L^{-1/ν} [40]. It implies that we should be able to derive accurate results for p_c from simulations on quite small lattices. Indeed we expect that

FIG. 11. Finite size scaling of estimates for p_c on square lattices of L × L sites using measured probabilities of cluster wrapping along one axis (circles), either axis (squares), both axes (upward-pointing triangles), and one axis but not the other (downward-pointing triangles). Inset: results of a similar calculation for cubic lattices of L × L × L sites using the probabilities of cluster wrapping along one axis but not the other two (circles), and two axes but not the other one (squares).

statistical error will overwhelm the finite size corrections at quite small system sizes, making larger lattices not only unnecessary, but also worthless.

Statistical errors for the calculation are estimated in conventional fashion. From Eq. (15) we know that the error on the mean of R_L(p) over n runs of the algorithm goes as n^{-1/2}, independent of L, which implies that our estimate of p_c carries an error Δp_c scaling according to Δp_c ~ n^{-1/2} L^{-1/ν}. With each run taking time O(N) = O(L^d), the total time T ~ nL^d taken for the calculation is related to Δp_c according to

\Delta p_c \sim \frac{L^{d/2 - 1/\nu}}{\sqrt{T}} = \frac{L^{1/4}}{\sqrt{T}},    (17)

where the last equality holds in two dimensions. Thus in order to make the statistical errors on systems of different size the same, we should spend an amount of time which scales as T_L ~ L^{d-2/ν} on systems of size L, or T_L ~ √L in two dimensions.

In Fig. 11 we show the results of a finite-size scaling calculation of this type for p_c. The four different definitions of R_L give four (non-independent) estimates of p_c: 0.59274621(13) for R_L^{(h)}, 0.59274636(14) for R_L^{(e)}, 0.59274606(15) for R_L^{(b)}, and 0.59274629(20) for R_L^{(1)}. The first of these is the best, and is indeed the most accurate current estimate of p_c for site percolation on the square lattice. This calculation involved the simulation of more than 7 × 10^9 separate samples, about half of which were for systems of size 128 × 128.


The wrapping probability R_L^{(1)} is of particular interest, because one does not in fact need to know its value for the infinite system in order to use it to estimate p_c. Since this function is non-monotonic it may in fact intercept its L = ∞ value at two different values of p, but its maximum must necessarily converge to p_c at least as fast as either of these intercepts. In fact, in the calculation shown in Fig. 11, we used the maximum of R_L^{(1)} to estimate p_c and not the intercept, since in this case R_L^{(1)} was strictly lower than R_∞^{(1)} for all p, so that there were no intercepts.

The criterion of a maximum in R_L^{(1)} can be used to find the percolation threshold in other systems for which exact results for the wrapping probabilities are not known. As an example, we show in the inset of Fig. 11 a finite-size scaling calculation of p_c for three-dimensional percolation on the cubic lattice (with periodic boundary conditions in all directions) using this measure. Here there are two possible generalizations of our wrapping probability: R_L^{(1)}(p) is the probability that wrapping occurs along one axis and not the other two, and R_L^{(2)}(p) is the probability that wrapping occurs along two axes and not the third. We have calculated both in this case.

Although neither the exact value of ν nor the expected scaling of this estimate of p_c is known for the three-dimensional case, we can estimate p_c by varying the scaling exponent until an approximately straight line is produced. This procedure reveals that our estimates of p_c scale approximately as L^{−2} in three dimensions, and we derive estimates of p_c = 0.3097(3) and 0.3105(2) for the position of the transition. Combining these results we estimate that p_c = 0.3101(10), which is in reasonable agreement with the best known result for this quantity of 0.3116080(4) [44,45]. (Only a short run was performed to obtain our result; better results could presumably be derived with greater expenditure of CPU time.)

    B. Scaling of the wrapping probability functions

It has been proposed [46,47] that below the percolation transition the probability of finding a cluster which spans the system, or wraps around it in the case of periodic boundary conditions, should scale according to

    R_L(p) ∼ exp(−L/ξ) = exp[−cL(p_c − p)^ν],    (18)

where c is a constant. In other words, as a function of p, the wrapping probability should follow a stretched exponential with exponent ν. This contrasts with previous conjectures that R_L(p) has Gaussian tails [1,48,49].

The behavior (18) is only seen when the correlation length ξ is substantially smaller than the system dimension, but also greater than the lattice spacing, i.e., 1 ≪ ξ ≪ L. This means that in order to observe it clearly we need to perform simulations on large systems. In Fig. 12 we show curves of R_L^{(h)}(p) for site percolation on square lattices of 4096 × 4096 sites.

FIG. 12. Demonstration of the stretched exponential behavior, Eq. (18), in simulation results for 4096 × 4096 square systems, plotted as log R_L^{(h)}(p) against (p_c − p)^ν. The results are averaged over 18 000 runs of the algorithm. The curves are for (top to bottom) ν = 1.2, 1.3, 1.4, 1.5, 1.6, and 1.7.

We plot log R_L against (p_c − p)^ν for various values of ν, to look for straight-line behavior. The best straight line is observed for ν = 1.4 ± 0.1, in agreement with the expected ν = 4/3, and the Gaussian behavior is strongly ruled out.
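The search for the exponent giving the straightest line can be automated in a simple way: for each trial value of ν, transform the measured points to (x, y) = ((p_c − p)^ν, log R_L) and compute the linear correlation coefficient, taking the ν that brings it closest to −1 (the slope is negative). The sketch below in C is an illustration of this idea only; the data arrays, the grid of trial exponents, the synthetic test data, and the use of the Pearson coefficient are our own choices and not a description of how the comparison in Fig. 12 was performed.

#include <math.h>
#include <stdio.h>

/* Pearson correlation coefficient between x[i] and y[i], i = 0..n-1. */
static double pearson(const double *x, const double *y, int n)
{
  double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
  int i;
  for (i = 0; i < n; i++) {
    sx += x[i];  sy += y[i];
    sxx += x[i] * x[i];  syy += y[i] * y[i];  sxy += x[i] * y[i];
  }
  return (n * sxy - sx * sy) /
         sqrt((n * sxx - sx * sx) * (n * syy - sy * sy));
}

/* For each trial exponent nu, print the correlation of log R against
   (pc - p)^nu.  Points with R = 0 or p >= pc are skipped. */
void straight_line_test(const double *p, const double *R, int n, double pc)
{
  static double x[10000], y[10000];
  double nu;
  int i, m;

  for (nu = 1.2; nu <= 1.7 + 1e-9; nu += 0.1) {
    m = 0;
    for (i = 0; i < n && m < 10000; i++) {
      if (R[i] > 0.0 && p[i] < pc) {
        x[m] = pow(pc - p[i], nu);
        y[m] = log(R[i]);
        m++;
      }
    }
    printf("nu = %.1f  correlation = %.6f\n", nu, pearson(x, y, m));
  }
}

int main(void)
{
  /* Synthetic data exactly following Eq. (18) with nu = 4/3, for demonstration;
     the constant cL is an arbitrary illustrative value. */
  enum { NPTS = 200 };
  static double p[NPTS], R[NPTS];
  double pc = 0.592746, cL = 80000.0;
  int i;

  for (i = 0; i < NPTS; i++) {
    p[i] = pc - 0.0008 * (i + 1) / NPTS;
    R[i] = exp(-cL * pow(pc - p[i], 4.0 / 3.0));
  }
  straight_line_test(p, R, NPTS, pc);
  return 0;
}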

    C. Percolation on random graphs

For our last example, we demonstrate the use of our algorithm on a system which is not built upon a regular lattice. The calculations of this section are instead performed on random graphs, i.e., collections of sites (or "vertices") with random bonds (or "edges") between them.

Percolation can be considered a simple model for the robustness of networks [17,18]. In a communications network, messages are routed from source to destination by passing them from one vertex to another through the network. In the simplest approximation, the network between a given source and destination is functional so long as a single path exists from source to destination, and non-functional otherwise. In real communications networks such as the Internet, a significant fraction of the vertices (routers in the case of the Internet) are non-functional at all times, and yet the majority of the network continues to function because there are many possible paths from each source to each destination, and it is rare that all such paths are simultaneously interrupted. The question therefore arises: what fraction of vertices must be non-functional before communications are substantially disrupted? This question may be rephrased as a site percolation problem in which occupied vertices


represent functional routers and unoccupied vertices non-functional ones. So long as there is a giant component of connected occupied vertices (the equivalent of a percolating cluster on a regular lattice) then long-range communication will be possible on the network. Below the percolation transition, where the giant component disappears, the network will fragment into disconnected clusters. Thus the percolation transition represents the point at which the network as a whole becomes non-functional, and the size of the giant component, if there is one, represents the fraction of the network which can communicate effectively.

Both the position of the phase transition and the size of the giant component have been calculated exactly by Callaway et al. [18] for random graphs with arbitrary degree distributions. The degree k of a vertex in a network is the number of other vertices to which it is connected. If p_k is the probability that a vertex has degree k and q is the occupation probability for vertices, then the percolation transition takes place at

    q_c = Σ_{k=0}^{∞} k p_k / Σ_{k=0}^{∞} k(k − 1) p_k,    (19)

and the fraction S of the graph filled by the giant component is the solution of

    S = q − q Σ_{k=0}^{∞} p_k u^k,    u = 1 − q + q [Σ_{k=0}^{∞} k p_k u^{k−1}] / [Σ_{k=0}^{∞} k p_k].    (20)

For the Internet, the degree distribution has been found to be power-law in form [50], though in practice the power law must have some cutoff at finite k. Thus a reasonable form for the degree distribution is

    p_k = C k^{−τ} e^{−k/κ}    for k ≥ 1.    (21)

The exponent τ is found to take values between 2.1 and 2.5, depending on the epoch in which the measurement was made and whether one looks at the network at the router level or at the coarser domain level. Vertices with degree zero are excluded from the graph, since a vertex with degree zero is necessarily not a functional part of the network.

We can generate a random graph of N vertices with this degree distribution in O(N) time using the prescription given in Ref. [51], and then use our percolation algorithm to calculate, for example, the size of the largest cluster for all values of q. In Fig. 13 we show the results of such a calculation for N = 1 000 000 and τ = 2.5 for three different values of the cutoff parameter κ, along with the exact solution derived by numerical iteration of Eq. (20). As the figure shows, the two are in excellent agreement. The simulations for this figure took about an hour in total. We would expect a simulation performed using the standard depth-first search and giving results of similar accuracy to take about a million times as long, or about a century.
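For concreteness, here is a minimal sketch in C of the kind of fixed-point iteration that can be used to evaluate Eq. (20) for the degree distribution (21). The truncation KMAX of the sums, the convergence tolerance, and the function names are our own illustrative choices; the exact solution plotted in Fig. 13 is that of Callaway et al. [18], and this sketch merely evaluates their formulas.

#include <math.h>
#include <stdio.h>

#define KMAX 10000           /* truncation of the sums over k (illustrative) */

static double pk[KMAX + 1];  /* degree distribution p_k */

/* Fill pk[] with the normalized distribution p_k = C k^-tau e^{-k/kappa}, k >= 1. */
static void make_degree_dist(double tau, double kappa)
{
  double norm = 0.0;
  int k;
  pk[0] = 0.0;
  for (k = 1; k <= KMAX; k++) {
    pk[k] = pow(k, -tau) * exp(-k / kappa);
    norm += pk[k];
  }
  for (k = 1; k <= KMAX; k++) pk[k] /= norm;
}

/* Size of the giant component at occupation probability q, from Eq. (20):
   iterate u = 1 - q + q sum_k k p_k u^{k-1} / sum_k k p_k to a fixed point,
   then return S = q (1 - sum_k p_k u^k). */
static double giant_component(double q)
{
  double meank = 0.0, u = 0.0, unew, num, g0, uk;
  int k, it;

  for (k = 1; k <= KMAX; k++) meank += k * pk[k];

  for (it = 0; it < 10000; it++) {
    num = 0.0;
    uk = 1.0;                          /* running value of u^{k-1} */
    for (k = 1; k <= KMAX; k++) {
      num += k * pk[k] * uk;
      uk *= u;
    }
    unew = 1.0 - q + q * num / meank;
    if (fabs(unew - u) < 1e-12) { u = unew; break; }
    u = unew;
  }

  g0 = 0.0;
  uk = u;                              /* running value of u^k */
  for (k = 1; k <= KMAX; k++) {
    g0 += pk[k] * uk;
    uk *= u;
  }
  return q * (1.0 - g0);
}

int main(void)
{
  double q;
  make_degree_dist(2.5, 100.0);        /* tau = 2.5, kappa = 100 as in Fig. 13 */
  for (q = 0.0; q <= 1.0001; q += 0.05)
    printf("%4.2f  %8.5f\n", q, giant_component(q));
  return 0;
}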

FIG. 13. Simulation results (points) for site percolation on random graphs with degree distribution given by Eq. (21), with τ = 2.5 and three different values of κ (κ = 10, 20, and 100). The solid lines are the exact solution of Callaway et al. [18], Eq. (20), for the same parameter values. The size S of the giant component is plotted against the fraction q of occupied sites.

A number of authors [18,52,53] have examined the resilience of networks to the selective removal of the vertices with highest degree. This scenario can also be simulated efficiently using our algorithm. The only modification necessary is that the vertices are now occupied in order of increasing degree, rather than in random order as in the previous case. We note however that the average time taken to sort the vertices in order of increasing degree scales as O(N log N) when using standard sorting algorithms such as quicksort [28], and hence this calculation is dominated by the time to perform the sort for large N, making the overall running time O(N log N) rather than O(N).
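As an illustration, the occupation order for this variant can be produced by sorting the vertex labels on their degrees with the standard library routine qsort. The sketch below assumes a global array degree[] holding the degree of each vertex; the array and function names, and the placeholder degrees in main(), are our own and are not part of the program given in the appendix.

#include <stdio.h>
#include <stdlib.h>

#define NVERT 1000000        /* number of vertices (illustrative) */

int degree[NVERT];           /* degree of each vertex, filled elsewhere */
int order[NVERT];            /* resulting occupation order */

/* Comparison function for qsort: sort vertex labels by increasing degree. */
static int by_degree(const void *a, const void *b)
{
  int va = *(const int *) a, vb = *(const int *) b;
  if (degree[va] != degree[vb]) return degree[va] - degree[vb];
  return va - vb;            /* tie-break for a deterministic order */
}

/* Fill order[] so that vertices are occupied from lowest to highest degree,
   i.e., the highest-degree vertices are the last to be occupied (equivalently,
   the first to be removed). */
void degree_order(void)
{
  int i;
  for (i = 0; i < NVERT; i++) order[i] = i;
  qsort(order, NVERT, sizeof(int), by_degree);
}

int main(void)
{
  int i;
  for (i = 0; i < NVERT; i++) degree[i] = i % 10;   /* placeholder degrees */
  degree_order();
  printf("first vertex to be occupied: %i (degree %i)\n",
         order[0], degree[order[0]]);
  return 0;
}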

    IV. CONCLUSIONS

We have described in detail a new algorithm for studying site or bond percolation on any lattice which can calculate the value of an observable quantity for all values of the site or bond occupation probability from zero to one in time which, for all practical purposes, scales linearly with lattice volume. We have presented a time complexity analysis demonstrating this scaling, empirical results showing the scaling and comparing running time to other algorithms for percolation problems, and a description of the details of implementation of the algorithm. A full working program is presented in the following appendix and is also available for download. We have given three example applications for our algorithm: the measurement of the position of the percolation transition for site percolation on a square lattice, for which


we derive the most accurate result yet for this quantity; the confirmation of the expected 4/3-power stretched exponential behavior in the tails of the wrapping probability functions for percolating clusters on the square lattice; and the calculation of the size of the giant component for site percolation on a random graph, which confirms the recently published exact solution for the same quantity.

    V. ACKNOWLEDGEMENTS

The authors would like to thank Cris Moore, Barak Pearlmutter, and Dietrich Stauffer for helpful comments. This work was supported in part by the National Science Foundation and Intel Corporation.

    APPENDIX A: PROGRAM

In this appendix we give a complete program in C for our algorithm for site percolation on a square lattice of N = L × L sites with periodic boundary conditions. This program prints out the size of the largest cluster on the lattice as a function of the number of occupied sites n, for values of n from 1 to N. The entire program consists of 73 lines of code.

    First we set up some constants and global variables:

#include <stdlib.h>
#include <stdio.h>

#define L 128          /* Linear dimension */
#define N (L*L)
#define EMPTY (-N-1)

int ptr[N];            /* Array of pointers */
int nn[N][4];          /* Nearest neighbors */
int order[N];          /* Occupation order */

Sites are indexed with a single signed integer label for speed, taking values from 0 to N − 1. Note that on computers which represent integers in 32 bits, this program can, for this reason, only be used for lattices of up to 2^31 ≃ 2 billion sites. While this is adequate for most purposes, longer labels will be needed if you wish to study larger lattices.

The array ptr[] serves triple duty: for non-root occupied sites it contains the label of the site's parent in the tree (the "pointer"); root sites are recognized by a negative value of ptr[], and that value is equal to minus the size of the cluster; for unoccupied sites ptr[] takes the value EMPTY.
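Given this encoding, a find operation that locates the root of the cluster containing a given occupied site, using path compression, can be written in a few lines. The version below is a sketch consistent with the pointer encoding just described; the authors' complete program is available from Ref. [36] and may differ in detail.

/* Return the root of the cluster containing occupied site i, making every
   site visited along the way point directly at that root (path compression). */
int findroot(int i)
{
  if (ptr[i]<0) return i;             /* a negative entry marks a root */
  return ptr[i] = findroot(ptr[i]);   /* compress the path on the way back */
}

A non-recursive variant (for example, one using path halving) can be substituted if deep recursion is a concern on very large lattices; either choice preserves the near-linear running time of the algorithm.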

Next we set up the array nn[][], which contains a list of the nearest neighbors of each site. Only this array need be changed in order for the program to work with a lattice of different topology.

void boundaries()
{
  int i;

  for (i=0; i<N; i++) {
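    /* Reconstruction of the remainder of this function; the authors'
       published code, available from Ref. [36], may differ in detail.
       It assumes sites are labeled row by row, so that site i lies at
       column i%L and row i/L, and gives each site its right, left,
       down, and up neighbors with periodic (wrap-around) boundaries. */
    nn[i][0] = (i+1)%N;                 /* right neighbor */
    nn[i][1] = (i+N-1)%N;               /* left neighbor */
    nn[i][2] = (i+L)%N;                 /* neighbor in the row below */
    nn[i][3] = (i+N-L)%N;               /* neighbor in the row above */
    if (i%L==0) nn[i][1] = i+L-1;       /* left edge wraps to right edge */
    if ((i+1)%L==0) nn[i][0] = i-L+1;   /* right edge wraps to left edge */
  }
}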

  int s1,s2;
  int r1,r2;
  int big=0;

  for (i=0; i<N; i++) ptr[i] = EMPTY;
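  /* Reconstruction of the main loop of the cluster-finding routine, written
     to match the algorithm described in the text; it is not necessarily
     identical to the authors' published code (available from Ref. [36]).
     It assumes that i and j are loop counters declared at the top of the
     enclosing function, that order[] holds a random permutation of the
     site labels, and that findroot() is the find operation sketched
     earlier.  At each step the next site in the permutation is occupied
     as a new cluster of size 1, amalgamated with any occupied neighbors
     by size-weighted union, and the size of the largest cluster seen so
     far is printed. */
  for (i=0; i<N; i++) {
    r1 = s1 = order[i];                /* occupy the next site */
    ptr[s1] = -1;                      /* it starts as a root of size 1 */
    for (j=0; j<4; j++) {
      s2 = nn[s1][j];
      if (ptr[s2]!=EMPTY) {            /* occupied neighbor: join clusters */
        r2 = findroot(s2);
        if (r2!=r1) {
          if (ptr[r1]>ptr[r2]) {       /* cluster at r2 is the larger one */
            ptr[r2] += ptr[r1];        /* add the smaller to the larger */
            ptr[r1] = r2;
            r1 = r2;
          } else {
            ptr[r1] += ptr[r2];
            ptr[r2] = r1;
          }
          if (-ptr[r1]>big) big = -ptr[r1];   /* track the largest cluster */
        }
      }
    }
    printf("%i %i\n",i+1,big);         /* n occupied sites, largest cluster */
  }
}                                      /* end of the enclosing routine */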

puter Simulation Methods, 2nd edition, p. 444, Addison-Wesley, Reading, MA (1996).

[24] L. N. Shchur and O. A. Vasilyev, "Number of bonds in the site-diluted lattices: sampling and fluctuations," cond-mat/0005448.

[25] M. E. J. Newman and R. M. Ziff, "Efficient Monte Carlo algorithm and high-precision results for percolation," Phys. Rev. Lett. 85, 4104-4107 (2000).

[26] C. Moukarzel, "A fast algorithm for backbones," Int. J. Mod. Phys. C 9, 887-895 (1998).

[27] J. E. de Freitas, L. S. Lucena, and S. Roux, "Percolation as a dynamical phenomenon," Physica A 266, 81-85 (1999).

[28] R. Sedgewick, Algorithms, 2nd edition, Addison-Wesley, Reading, Mass. (1988).

[29] D. E. Knuth, The Art of Computer Programming, Vol. 1: Fundamental Algorithms, 3rd edition, Addison-Wesley, Reading, Mass. (1997).

[30] M. J. Fischer, "Efficiency of equivalence algorithms," in Complexity of Computer Computations, R. E. Miller and J. W. Thatcher (eds.), Plenum Press, New York (1972).

[31] R. E. Tarjan, "Efficiency of a good but not linear set union algorithm," J. ACM 22, 215-225 (1975).

[32] B. A. Galler and M. J. Fischer, "An improved equivalence algorithm," Comm. ACM 7, 301-303 (1964).

[33] W. Ackermann, "Zum Hilbertschen Aufbau der reellen Zahlen," Math. Ann. 99, 118-133 (1928).

[34] We did not, as the reader may have guessed, actually run the depth-first search algorithm for 4.5 million seconds (about 2 months), since this would have been wasteful of computing resources. Instead, we ran the algorithm for 100 representative values of n, the number of occupied bonds, and scaled the resulting timing up to estimate the run-time for all 1 000 000 values.

[35] J. Hoshen and R. Kopelman, "Percolation and cluster distribution: Cluster multiple labelling technique and critical concentration algorithm," Phys. Rev. B 14, 3438-3445 (1976).

[36] http://www.santafe.edu/~mark/percolation/

[37] J. Machta, Y. S. Choi, A. Lucke, T. Schweizer, and L. M. Chayes, "Invaded cluster algorithm for Potts models," Phys. Rev. E 54, 1332-1345 (1996).

[38] P. J. Reynolds, H. E. Stanley, and W. Klein, "Percolation by position-space renormalisation group with large cells," J. Phys. A 11, L199-207 (1978).

[39] P. J. Reynolds, H. E. Stanley, and W. Klein, "Large-cell Monte Carlo renormalization group for percolation," Phys. Rev. B 21, 1223-1245 (1980).

[40] R. M. Ziff, "Spanning probability in 2D percolation," Phys. Rev. Lett. 69, 2670-2673 (1992).

[41] H. T. Pinson, "Critical percolation on the torus," J. Stat. Phys. 75, 1167-1177 (1994).

[42] R. M. Ziff, C. D. Lorenz, and P. Kleban, "Shape-dependent universality in percolation," Physica A 266, 17-26 (1999).

[43] Sometimes (for example, in the symbolic manipulation program Mathematica) the η-function is defined in complex-argument form. If we were to use this definition, the η-functions in the denominators of Eqs. (11) and (12) would become η(i).

[44] C. D. Lorenz and R. M. Ziff, "Universality of the excess number of clusters and the crossing probability function in three-dimensional percolation," J. Phys. A 31, 8147-8157 (1998).

[45] H. G. Ballesteros, L. A. Fernandez, V. Martín-Mayor, A. Muñoz Sudupe, G. Parisi, and J. J. Ruiz-Lorenzo, "Scaling corrections: Site percolation and Ising model in three dimensions," J. Phys. A 32, 1-13 (1999).

[46] L. Berlyand and J. Wehr, "The probability distribution of the percolation threshold in a large system," J. Phys. A 28, 7127-7133 (1995).

[47] J.-P. Hovi and A. Aharony, "Scaling and universality in the spanning probability for percolation," Phys. Rev. E 53, 235-253 (1996).

[48] M. E. Levinshtein, B. I. Shklovskii, M. S. Shur, and A. L. Efros, "The relation between the critical exponents of percolation theory," Sov. Phys. JETP 42, 197-203 (1976).

[49] F. Wester, "Non-Gaussian distribution of percolation thresholds in finite lattice systems," Int. J. Mod. Phys. C 11, 843-850 (2000).

[50] M. Faloutsos, P. Faloutsos, and C. Faloutsos, "On power-law relationships of the Internet topology," Comp. Comm. Rev. 29, 251-262 (1999).

[51] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, "Random graphs with arbitrary degree distributions and their applications," cond-mat/0007235.

[52] R. Albert, H. Jeong, and A.-L. Barabási, "Error and attack tolerance of complex networks," Nature 406, 378-382 (2000).

[53] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener, "Graph structure in the web," Computer Networks 33, 309-320 (2000).
