Observations on the generation of permutations from random sequences

This article was downloaded by: [University of Auckland Library] On: 05 December 2014, At: 16:14
Publisher: Taylor & Francis
Informa Ltd Registered in England and Wales Registered Number: 1072954
Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

International Journal of Computer Mathematics
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gcom20

Observations on the generation of permutations from random sequences
Gerald W. Kimble, Department of Mathematics, University of Nevada, Reno, Reno, NV 89557, USA
Published online: 19 Mar 2007.

To cite this article: Gerald W. Kimble (1989) Observations on the generation of permutations from random sequences, International Journal of Computer Mathematics, 29:1, 11-19, DOI: 10.1080/00207168908803745

To link to this article: http://dx.doi.org/10.1080/00207168908803745

Intern. J. Computer Math., Vol. 29, pp. 11-19
Reprints available directly from the publisher
Photocopying permitted by license only
© 1989 Gordon and Breach, Science Publishers, Inc.
Printed in Great Britain

OBSERVATIONS ON THE GENERATION OF PERMUTATIONS FROM RANDOM SEQUENCES

GERALD W. KIMBLE

University of Nevada, Reno, Department of Mathematics, Reno, NV 89557, USA

(Received 1 June 1988)

Several algorithms for generating random permutations are examined relative to their random characteristics and their efficiency. It is noted that uniformly-distributed permutations cannot be generated by sampling a finite portion of a random sequence. An algorithm is constructed which includes as a special case the generation of permutations using $n\lceil \log n \rceil - (2^{\lceil \log n \rceil} - 1)$ bits per permutation.

KEY WORDS: Random permutations, efficient algorithms, sampling without replacement, shuffling.

C.R. CATEGORIES: F.2.2, G.2.1, G.3, G.4

Diverse methods for generating random permutations were common before electronic computation: drawing objects without replacement; shuffling cards; sampling pairs, triples, etc., from tables of random digits; even consulting directly tables of random permutations [10]. By contrast, present-day computer methods for this problem have reduced virtually to the one first published by Durstenfeld [5] in 1964. It alone appears in Knuth [9], and in Nijenhuis and Wilf [12].

The algorithm is: Given the objects $a_1, \ldots, a_n$, and random (real) values $x_i$, $i = 1, \ldots, n-1$, in the interval $[0, 1)$, compute $c_i = \lfloor x_i(i+1) \rfloor$; with $m = n - i$ and $j = i + c_m$, exchange $a_i$ with $a_j$. (This method is based on the factorial representation of integers $c_1 1! + c_2 2! + c_3 3! + \cdots + c_k k!$, $0 \le c_i \le i$, from which a direct 1-1 correspondence is possible between the $n!$ values represented by $(n-1)$ places and the $n!$ permutations of $n$ objects.) In practice each x-value is truncated to, say, $t$ bits; so perhaps we should refer to this as a computer implementation of the method. Instead, references below to the "standard method" will imply truncation.
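The steps just described can be sketched in Python. This is an illustrative sketch only, under stated assumptions: the function name `standard_method`, the default `t=24`, and the `rand_bits` hook are hypothetical, and the exchange index is read as $j = i + c_m$ with $m = n - i$, the form that Method Q uses later in the paper. Each $x_i$ is represented exactly as a $t$-bit integer $h_i$ with $x_i = h_i / 2^t$, so the floor is taken in integer arithmetic.

```python
import random

def standard_method(items, t=24, rand_bits=random.getrandbits):
    """Shuffle a copy of `items` using one t-bit truncated sample per place."""
    a = list(items)
    n = len(a)
    # c[i] = floor(x_i * (i + 1)) with x_i = h_i / 2**t, in exact integer arithmetic.
    c = [0] * n                       # index 0 unused; places are 1..n-1
    for i in range(1, n):
        h = rand_bits(t)              # t-bit sample, 0 <= h < 2**t
        c[i] = (h * (i + 1)) >> t     # 0 <= c[i] <= i
    for i in range(1, n):
        m = n - i
        j = i + c[m]                  # i <= j <= n (1-based positions)
        a[i - 1], a[j - 1] = a[j - 1], a[i - 1]
    return a

print(standard_method("abcdef"))
```

Passing a deterministic `rand_bits` (for example `lambda t: 0`) makes the mapping from samples to exchanges easy to trace by hand.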

The standard method is technically deficient in two respects: It does not preserve in the permutations the randomness of the x-values (due to the aforementioned truncations); and it is not as efficient as possible relative to the number of random values it does use. A useful analogy is that of a set of $(n-1)$ "wheels of fortune", each wheel corresponding to a place in the factorial representations of the integers from 0 to $(n!-1)$. The first wheel is divided in half by two sectors, the second into thirds by three sectors, the last divided evenly by $n$ sectors. Turning the wheels, then when they come to rest observing the positions of the sectors, corresponds to selecting the numbers $c_i$. Restricting the rest positions of the wheels by equally-spaced pegs on the rims of the wheels corresponds to truncating the x-values. For instance, 64 pegs per wheel simulates the standard method as it would be implemented on a computer having a 6-bit floating-point mantissa (unnormalized). In general, the presence of $2^{k_i}$ pegs corresponds to a sample of $k_i$ bits. The analogy dramatizes by extreme cases that the standard method: (1) is not random (for sixty-four pegs are insufficient on a wheel divided into a hundred sectors); (2) is not efficient (only two pegs are needed on the wheel divided into two sectors).
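The peg analogy can be checked by direct counting. The following minimal sketch assumes that peg number $h$ (out of `pegs`) is assigned to sector $\lfloor h \cdot \text{sectors} / \text{pegs} \rfloor$, which mirrors $c_i = \lfloor x_i(i+1) \rfloor$; the helper name `sector_counts` is hypothetical.

```python
from collections import Counter

def sector_counts(pegs, sectors):
    """Number of pegs assigned to each sector when peg h maps to floor(h * sectors / pegs)."""
    hits = Counter((h * sectors) // pegs for h in range(pegs))
    return [hits.get(s, 0) for s in range(sectors)]

print(sector_counts(64, 3))              # [22, 21, 21]: slightly biased
print(sector_counts(64, 100).count(0))   # 36 sectors can never be selected
print(sector_counts(64, 2))              # [32, 32]: unbiased, but 2 pegs would do
```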

The method used by Moses and Oakford [10] has been neglected in the computing literature on random permutations since 1964, although in other contexts the approach has been called the rejection technique. The randomness of the generating sequence is preserved even in the computer implementation of the method; the penalty is that the number of samples required per permutation may approach infinity. In Section 2 we review and extend their work concerning the average number of samples used to generate a permutation.

In Section 3 the impossibility of a uniform distribution of permutations generated from a finite set of random numbers is shown to follow from a theorem of Weyl. This result explains the emphasis placed on the rejection method; for if a uniform distribution is to be preserved there is no alternative to the rejection method (that is, a method that involves a potentially infinite number of random samples, such as appears in [1], based on a model of card shuffling due to Shannon).

A generalization of the standard method is introduced in Section 4. It exhibits quantitatively the bias in the distribution of the permutations due to truncation. By varying a numerical parameter the bias can be reduced to an arbitrarily small level, at the expense of more sampling.

Section 5 presents an efficient algorithm. This is made possible by changing the goal to that of obtaining permutations having only a uniform distribution in the limit. The term "quasirandom" is used to describe this distribution. The principal result is that by use of a cycling technique it is possible to generate at least quasirandom permutations from finite random sequences with optimal efficiency.

2. THE REJECTION METHOD

For each value of $i$ from 1 to $n-1$ read $k_i = \lceil \log_b (i+1) \rceil$ consecutive elements (constituting the $i$th sample) from a random radix-$b$ sequence to determine an integer $h_i$ in the range $0 \le h_i < b^{k_i}$. Set $c_i$ equal to $h_i$ if $h_i \le i$; otherwise reject the sample and repeat the procedure, indefinitely if necessary, until $h_i \le i$ occurs. Then, with $m = n - i$ and $j = i + c_m$, exchange $a_i$ with $a_j$.
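A minimal Python sketch of the rejection method follows. It is an illustration under assumptions, not the author's program: the names `digits_needed`, `rejection_shuffle`, and the `next_digit` callback (which stands in for the random radix-$b$ sequence) are hypothetical, and the exchange index is read as $j = i + c_m$ with $m = n - i$, as in Method Q. The function also returns the number of radix-$b$ elements consumed, so the cost discussed in this section can be observed.

```python
import random

def digits_needed(m, b):
    """Smallest k with b**k >= m, i.e. ceil(log_b m), computed in integers."""
    k, power = 0, 1
    while power < m:
        power *= b
        k += 1
    return k

def rejection_shuffle(items, b=2, next_digit=None):
    """Return (shuffled copy, number of radix-b elements read)."""
    if next_digit is None:
        next_digit = lambda: random.randrange(b)
    a = list(items)
    n = len(a)
    used = 0
    c = [0] * n                          # index 0 unused
    for i in range(1, n):
        k = digits_needed(i + 1, b)
        while True:
            h = 0
            for _ in range(k):           # read one k-digit sample
                h = h * b + next_digit()
            used += k
            if h <= i:                   # accept; otherwise reject and resample
                break
        c[i] = h
    for i in range(1, n):
        j = i + c[n - i]                 # 1-based positions, i <= j <= n
        a[i - 1], a[j - 1] = a[j - 1], a[i - 1]
    return a, used

perm, used = rejection_shuffle(range(8))
print(perm, used)
```

Because no sample is truncated or rescaled, every accepted $h_i$ is uniform on $0, \ldots, i$; the price is that `used` has no finite upper bound.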

Moses and Oakford [10] gave an upper bound, $n \ln 4$, for the expected number of random binary integers (samples) needed to generate a permutation, with $n$ restricted to integral powers of two. It is convenient to extend their work in three stages: first, to generalize from a binary sequence to a radix-$b$ sequence; second, to drop the requirement that $n = b^r$; and third, to obtain a much sharper upper bound.

The greatest lower bound on the number of samples, expressed in terms of the number of radix-$b$ elements, is relevant to the question of efficient algorithms. It is attained when no samples are actually rejected; hence it equals $\sum_{i=1}^{n-1} k_i$. For the case $n = b^r$ the following formula is easily established by induction on $r$ (or as a special case of Theorem 2 below):

$$M_b = \sum_{i=1}^{n-1} k_i = nr - \frac{b^r - 1}{b - 1}. \tag{1}$$

An upper bound is given by the following lemma and theorem. (The lemma is not restricted to the case $n = b^r$.)

LEMMA The expected number of random selections of integers from the interval $[0, b^k - 1]$ to obtain one in the interval $[0, m - 1]$, for $m \le b^k$, is $b^k/m$.

Proof The probability of success in each trial is $p = m/b^k$, and the expected number of trials for one success is $1/p$. □

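THEOREM 1 The expected number $E_n$ of radix-$b$ elements required to generate a permutation of $n = b^r$ objects is bounded by

$$nr - \frac{b^r - 1}{b - 1} \le E_n \le \frac{b \ln b}{b - 1}\left(nr - \frac{b^r - 1}{b - 1}\right). \tag{2}$$

Proof The lower bound is a direct consequence of (1). To obtain the upper bound apply the lemma for the values $m = b^{k-1} + 1, \ldots, b^k$; then, with $s_k = b^k - b^{k-1}$, the expected number of samples for the $s_k$ places of the factorial expansion having index values $b^{k-1} < m \le b^k$ is

$$\sum_{j=0}^{s_k - 1} \frac{b^k}{b^k - j}. \tag{3}$$

Note next that the average expected number per place, denoted by $a_b(k)$, is $1/s_k$ times this sum, namely:

$$a_b(k) = \frac{1}{s_k} \sum_{j=0}^{s_k - 1} \frac{b^k}{b^k - j} = \frac{b}{b - 1} \sum_{j=0}^{s_k - 1} \frac{1}{b^k - j}. \tag{4}$$

The sum $\sum_{k=1}^{r} s_k a_b(k)$ is the expected total number of samples. Since $k$ radix-$b$ elements convert to one integer in the range $[0, b^k - 1]$, the expected total number of radix-$b$ elements required is

$$E_n = \sum_{k=1}^{r} k \, s_k \, a_b(k). \tag{5}$$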

An upper bound for $a_b(k)$ is obtained by considering successive lower sums for $\int_1^b (1/x)\,dx$ over $(b-1)b^{k-1}$ equally-spaced subintervals. (This is equivalent to one term of the Euler-Maclaurin summation used by Moses and Oakford.) The integral is an upper bound for the lower sums, and in the limit then:

$$\lim_{k \to \infty} \sum_{j=0}^{s_k - 1} \frac{1}{b^k - j} = \ln b. \tag{6}$$

From (4) and (6) therefore $a_b(k)$ is bounded above by $b \ln b/(b-1)$. The replacement of $a_b(k)$ by this bound in every term of (5) establishes the upper bound in (2). □
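The bounds can be compared numerically with the exact expectation. The sketch below applies the lemma place by place: a place with $m$ possible values uses samples of $k = \lceil \log_b m \rceil$ elements and needs $b^k/m$ samples on average. The function names are hypothetical, and the upper bound is taken to be $b \ln b/(b-1)$ times the lower bound, which is what replacing $a_b(k)$ by its bound in every term of the sum gives.

```python
import math
from fractions import Fraction

def digits_needed(m, b):
    """Smallest k with b**k >= m, i.e. ceil(log_b m)."""
    k, power = 0, 1
    while power < m:
        power *= b
        k += 1
    return k

def lower_bound(n, b):
    """Elements read when no sample is rejected: sum of k over all places."""
    return sum(digits_needed(m, b) for m in range(2, n + 1))

def expected_elements(n, b):
    """Exact expected number of radix-b elements for the rejection method."""
    total = Fraction(0)
    for m in range(2, n + 1):
        k = digits_needed(m, b)
        total += Fraction(k * b ** k, m)   # k elements per sample, b**k / m samples
    return total

n, b = 4, 2
upper = b * math.log(b) / (b - 1) * lower_bound(n, b)
print(lower_bound(n, b), expected_elements(n, b), upper)   # 5, 17/3, about 6.93
```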

It is worthwhile to generalize the bounds to arbitrary positive integral values of $n > 1$. (A basis for the extension can be found, for example, in Knuth [8, p. 43, ex. 42b].)

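THEOREM 2 With $r = \log_b n$, the greatest lower bound identified in Theorem 1 is

$$M_b = n\lceil r \rceil - \frac{b^{\lceil r \rceil} - 1}{b - 1}.$$

Proof For $b \ge 2$,

Hence

If $\log_b n$ is an integer then $\lfloor r \rfloor = \lceil r \rceil = r$ and $M_b = nr - (b^r - 1)/(b - 1) = n\lceil r \rceil - (b^{\lceil r \rceil} - 1)/(b - 1)$. If $\log_b n$ is not an integer, then $\lfloor r \rfloor + 1 = \lceil r \rceil$, and $M_b = n\lceil r \rceil - (b^{\lceil r \rceil} - 1)/(b - 1)$. □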
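As a sanity check, the closed form $n\lceil r \rceil - (b^{\lceil r \rceil} - 1)/(b - 1)$ can be compared against the defining sum $\sum_{i=1}^{n-1} \lceil \log_b (i+1) \rceil$ for small cases. The helper names below are hypothetical; logarithms are avoided in favour of integer arithmetic so that no floating-point rounding enters.

```python
def ceil_log(m, b):
    """ceil(log_b m) for an integer m >= 1, computed without floating point."""
    k, power = 0, 1
    while power < m:
        power *= b
        k += 1
    return k

def m_direct(n, b):
    """Sum of k_i = ceil(log_b (i + 1)) over i = 1 .. n-1."""
    return sum(ceil_log(i + 1, b) for i in range(1, n))

def m_closed(n, b):
    """Closed form n*p - (b**p - 1)/(b - 1) with p = ceil(log_b n)."""
    p = ceil_log(n, b)
    return n * p - (b ** p - 1) // (b - 1)   # the quotient is an exact integer

for n, b in ((5, 2), (10, 10), (11, 10), (100, 3)):
    print(n, b, m_direct(n, b), m_closed(n, b))
```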

The upper bound as expressed above by the inequality (2) is not a close estimate of $E_n$. The computation of another term in the Euler-Maclaurin summation will sharpen the right-hand side and reinforce an asymptotic value that could be inferred from it.

THEOREM 3 The expected value $E_n$ satisfies the inequality $E_n \le U_b$, where

COROLLARY Asymptotically, as a function of $n$ with $b$ fixed and $n = b^r$, $r$ a positive integer, it is true that

$$M_b \sim \frac{n \ln n}{\ln b}, \qquad U_b \sim \frac{n \ln n}{1 - b^{-1}}.$$

Proof The limits follow from the expressions for $M_b$ and $U_b$ given in Theorems 2 and 3.

3. THE NECESSITY OF POTENTIALLY-INFINITE SEQUENCES

Is there a way to generate permutations from a random sequence while avoiding the rejection method's potentially infinite number of samples? The problem can be reduced to providing an algorithm to generate place numbers of the factorial representation of the integers from a random binary sequence, subject to the conditions: (1) no bits are wasted; (2) the randomness of the sequence is preserved; (3) only a finite number of bits are required to generate a finite number of factorial places. Or prove that this is impossible [7].

The following result shows that no such algorithm exists [6, Theorem 445].

THEOREM 4 No finite number $N$ of random radix-$b$ ($b > 1$) elements can determine each of the integers in the range 0 through $b$ with probability $1/(b+1)$.

Proof Any $N$ radix-$b$ elements scaled represent one of $b^N$ possible values of a variable $v$ in the interval $[0, 1)$. Suppose that of the integers $0, 1, \ldots, b$ the integer 0 is identified with the $k$ values of $v$ that are in the interval $[0, 1/(b+1))$. Then the probability of selecting 0 as the result of an $N$-element sample is $k/b^N$. Now $b+1$ is not a divisor of $b^N$ since $b > 1$ is assumed, implying $b < b+1 < b^2$ (indeed, $b$ and $b+1$ are coprime, so $b+1$ shares no prime factor with $b^N$); hence $k/b^N$ is not equal to $1/(b+1)$ for any of the possible values of $k$.
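The counting argument in the proof can be illustrated by enumeration. The sketch below assumes the natural assignment in which the scaled value $v = h/b^N$ is mapped to $\lfloor v(b+1) \rfloor$; the function name `outcome_counts` is hypothetical. Since $b^N$ is never a multiple of $b+1$, the $b^N$ equally likely samples cannot be split evenly among the $b+1$ outcomes.

```python
def outcome_counts(b, N):
    """How many of the b**N equally likely N-digit samples map to each of 0..b."""
    total = b ** N
    counts = [0] * (b + 1)
    for h in range(total):
        counts[(h * (b + 1)) // total] += 1   # floor(v * (b + 1)) with v = h / b**N
    return counts

print(outcome_counts(2, 3))   # [3, 3, 2]
for N in range(1, 8):
    print(N, 2 ** N % 3)      # the remainder is never 0
```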

COROLLARY For $b > 1$ and $n > b$, a finite number of random radix-$b$ elements cannot generate a permutation of $n$ objects that is equally likely as any other.

4. THE STANDARD METHOD AND VARIATIONS

The following generalization of the standard method formalizes the truncation that always accompanies that method's computer implementations in a manner that shows a relationship to the rejection method.

Method S Introduce a real-valued parameter $\lambda \ge 1$, possibly depending on $n$ and $i$. For each value of $i$ from 1 to $n-1$ read $k_i = \lceil \log_b \lambda(i+1) \rceil$ consecutive elements from a radix-$b$ sequence to determine an integer $h_i$ in the range $0 \le h_i < b^{k_i}$; scale it by $b^{-k_i}$ to determine $x_i$, $0 \le x_i < 1$; compute $c_i = \lfloor x_i(i+1) \rfloor$; then with $m = n - i$ and $j = i + c_m$, exchange $a_i$ with $a_j$.
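Method S might be sketched in Python as follows. The sketch makes several assumptions that are not in the text: $\lambda$ is a single constant `lam` (the paper allows it to depend on $n$ and $i$), the random radix-$b$ sequence is supplied by a hypothetical `next_digit` callback, and the exchange index is read as $j = i + c_m$ with $m = n - i$, as in Method Q. The scaling and floor are combined into one integer division, since $\lfloor (h_i / b^{k_i})(i+1) \rfloor = \lfloor h_i (i+1) / b^{k_i} \rfloor$.

```python
import random

def digits_for(threshold, b):
    """Smallest k >= 0 with b**k >= threshold, i.e. ceil(log_b threshold)."""
    k, power = 0, 1
    while power < threshold:
        power *= b
        k += 1
    return k

def method_s(items, lam=1, b=2, next_digit=None):
    """Shuffle a copy of `items`, reading ceil(log_b(lam * (i + 1))) digits for place i."""
    if next_digit is None:
        next_digit = lambda: random.randrange(b)
    a = list(items)
    n = len(a)
    c = [0] * n                              # index 0 unused
    for i in range(1, n):
        k = digits_for(lam * (i + 1), b)
        h = 0
        for _ in range(k):                   # one k-digit sample, never rejected
            h = h * b + next_digit()
        c[i] = (h * (i + 1)) // b ** k       # floor(x_i * (i + 1)), 0 <= c[i] <= i
    for i in range(1, n):
        j = i + c[n - i]                     # 1-based positions, i <= j <= n
        a[i - 1], a[j - 1] = a[j - 1], a[i - 1]
    return a

print(method_s("abcdef", lam=1))
print(method_s("abcdef", lam=1024))
```

Unlike the rejection method, every sample is used, so the number of elements read is fixed in advance by `lam`, `b`, and the number of objects.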

The results of the previous section show that the permutations have only an approximately uniform distribution.

The standard method (as implemented with a $t$-bit floating-point mantissa) can be viewed as a special case of Method S: choose $\lambda = 2^t/(i+1)$ and $b = 2$. Then $k_i = t$ for all values of $i$, and exactly $t(n-1)$ bits are read. (Alternatively, choose $b = 2^t$, $\lambda = b/(i+1)$; that is, consider each $t$-bit mantissa as corresponding to one random element of a radix-$2^t$ sequence; then $k_i = 1$ for all values of $i$, in which case $(n-1)$ elements are read.) The standard method therefore generates only approximately uniformly-distributed permutations.

Moreover, the standard method uses more bits than the lower bound $M_b$. The following theorem shows that it is not efficient for any value of $t$ for which the method would apply, namely $t \ge \lceil \log_2 n \rceil$.

THEOREM 5 For every value of $t \ge \lceil \log_2 n \rceil$, the following inequality holds:

Proof By hypothesis, $t \ge \lceil \log_2 n \rceil > \lceil \log_2 j \rceil$ for $j = 2, 3, \ldots, n-1$; hence

from Theorem 2. □

(A method of the past is closely related to another choice of $\lambda$ in Method S. The choices $\lambda = 10^k/(i+1)$ and $b = 10$ in Method S yield the variation: Read $k_i = \lceil \log_{10} 10^k \rceil = k$ random digits to determine each $x_i$, etc. This method, as presented in [4], is biased; for instance, if $n = 11$ and $k = 2$, then of the remainders from dividing a pair of random digits by $n$ (a list of distinct such remainders constituting a permutation) the remainder 0 occurs more frequently than 1 through 10 do.)
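The bias in the $n = 11$, $k = 2$ example is easy to tabulate: the 100 equally likely digit pairs 00 through 99 are reduced modulo 11. The variable name `counts` is only illustrative.

```python
from collections import Counter

# Each two-digit value 00..99 is equally likely; reduce it modulo n = 11.
counts = Counter(pair % 11 for pair in range(100))

print(counts[0])                              # 10 pairs give remainder 0
print([counts[r] for r in range(1, 11)])      # 9 pairs for each of 1..10
```

Remainder 0 therefore has probability 10/100, against 9/100 for each of the other ten remainders.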

The choice $\lambda = 1$ yields a submethod of Method S which requires the smallest possible number of random elements. For a fixed value of $b$, as the value of $\lambda$ approaches infinity, the permutations so generated clearly approach having a uniform distribution. Thus the parameter $\lambda$ provides some measure of the approximation to a uniform distribution. But it is not really a measure of the complexity of the permutations. The degree of randomness of a finite sequence of numbers has been defined by Chaitin [3] in terms of the shortest computer algorithm capable of generating the sequence. Even with $\lambda = 1$ in Method S the permutations (when considered as integers in the range 1 through $n!$) clearly have an indefinitely large degree of randomness in Chaitin's sense.
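How the parameter trades sampling for bias can be made concrete for a single place of the factorial representation. The sketch below is an illustration under assumptions: `place_bias` is a hypothetical helper, the parameter is a constant `lam`, and "bias" is measured as the largest deviation of any value's probability from the uniform value $1/(i+1)$. Exact rational arithmetic is used so the results are not affected by rounding.

```python
from fractions import Fraction

def place_bias(i, lam, b=2):
    """Largest deviation from 1/(i+1) among the probabilities of c_i = 0..i in Method S."""
    values = i + 1
    k, power = 0, 1
    while power < lam * values:          # power = b**k with k = ceil(log_b(lam * (i + 1)))
        power *= b
        k += 1
    counts = [0] * values
    for h in range(power):               # every k-digit sample is equally likely
        counts[(h * values) // power] += 1
    return max(abs(Fraction(cnt, power) - Fraction(1, values)) for cnt in counts)

for lam in (1, 8, 64):
    print(lam, place_bias(2, lam))       # 1/6, 1/48, 1/384
```

For the place with three values the deviation drops from 1/6 to 1/384 as `lam` grows from 1 to 64, at the cost of reading 8 bits instead of 2.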

5. A HYBRID METHOD

The form of Method S makes possible a combination of features of the standard method and the rejection method reflecting a different compromise concerning the distribution of the permutations. Suppose we ask only that as more permutations are generated the relative frequencies of occurrence of each of the $n!$ distinct permutations become equal in the limit, rather than that each permutation be equally likely to occur as it is produced. Then a procedure for this that uses only a finite number of samples is possible. (By analogy to certain developments in the theory of random sequences it would seem appropriate to apply the phrase quasirandom distribution to these permutations; however, we do not explore this connection here [11].) The algorithm is easily expressed in terms of modular addition.

Method Q. As in Method S, for $\lambda \ge 1$, read $k_i = \lceil \log_b \lambda(i+1) \rceil$ elements of a radix-$b$ sequence and scale them by $b^{-k_i}$ to determine $x_i$, $i = 1, 2, \ldots, n-1$. To obtain the $p$th permutation, $p = 0, 1, 2, \ldots$, let $m = n - i$ and compute $c_{mp} = \lfloor x_m(m+1) \rfloor + p \bmod (m+1)$; finally, with $j = i + c_{mp}$, exchange object $a_i$ with $a_j$.

The essence of Method Q is to cycle the assignment of sample values from the sequence through the possible values of each place of the factorial representation. This may be clearer through use of the introductory analogy to wheels of fortune. Imagine that the rims, with their (labelled) pegs, are moveable relative to the enclosed sectors; then the cycling of assignments corresponds to rotating the rims one sector before generating the next permutation. (If the wheel with three sectors labelled 0, 1, 2 has four pegs labelled 00, 01, 10, 11, then the outcomes 00 and 01 would be assigned 0, outcome 10 assigned 1 and outcome 11 assigned 2 during the generation of the first permutation; after rotation of the rim, the outcomes 00 and 01 would be assigned 1, outcome 10 assigned 2 and outcome 11 assigned 0 during the generation of the second permutation, and so on.)
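The cycling can be sketched in Python as below. The sketch rests on assumptions not stated in the text: the parameter is a constant `lam`, the sequence is supplied by a hypothetical `next_digit` callback, fresh samples are read for every permutation, and the coefficient is interpreted as $(\lfloor x_m(m+1) \rfloor + p) \bmod (m+1)$. The helper `cycled_value` reproduces the rotating-rim example with three sectors and four pegs.

```python
import random

def cycled_value(h, pegs, values, p):
    """Value assigned to peg h on a wheel with `values` sectors after p rim rotations."""
    return ((h * values) // pegs + p) % values

def method_q(items, p, lam=1, b=2, next_digit=None):
    """Produce the p-th permutation (p = 0, 1, 2, ...) of a copy of `items`."""
    if next_digit is None:
        next_digit = lambda: random.randrange(b)
    a = list(items)
    n = len(a)
    c = [0] * n                               # index 0 unused
    for i in range(1, n):
        k, power = 0, 1
        while power < lam * (i + 1):          # power = b**k, k = ceil(log_b(lam * (i + 1)))
            power *= b
            k += 1
        h = 0
        for _ in range(k):
            h = h * b + next_digit()
        c[i] = cycled_value(h, power, i + 1, p)
    for i in range(1, n):
        j = i + c[n - i]                      # 1-based positions, i <= j <= n
        a[i - 1], a[j - 1] = a[j - 1], a[i - 1]
    return a

print([cycled_value(h, 4, 3, 0) for h in range(4)])   # [0, 0, 1, 2]
print([cycled_value(h, 4, 3, 1) for h in range(4)])   # [1, 1, 2, 0]
print([method_q("abc", p) for p in range(3)])
```

Over any three consecutive values of `p`, each of the four pegs is assigned to each of the three sectors exactly once, which is the source of the equal limiting frequencies.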

THEOREM 6 Method Q generates quasirandom permutations from a finite number of samples of a random or quasirandom sequence.

Proof The computation of the factorial coefficients $c_{mp}$, as given by Method Q, produces in the limit ($p \to \infty$) equal relative frequencies for the values of each coefficient. This implies equal relative frequencies of the $n!$ distinct permutations [2, p. 711].

The above proof is independent of the choice of the parameter $\lambda$. The case $\lambda = 1$ therefore may be singled out profitably. For in this case Method Q uses only the minimum number of elements $M_b$ of the rejection method (see Theorem 2). Since no samples are rejected this algorithm is optimally efficient.

References

[1] D. Aldous and P. Diaconis, Shuffling cards and stopping times, Amer. Math. Monthly 93 (1986), 333-348.

[2] G. De Balbine, Note on random permutations, Math. Comp. 21 (1967), 710-712.

[3] G. J. Chaitin, On the length of programs for computing finite binary sequences, J. ACM 13 (1966), 547-569.

[4] W. G. Cochran and G. M. Cox, Experimental Designs, 2nd ed., Wiley, New York, 1957.

[5] R. Durstenfeld, Random permutation (algorithm 235), Comm. ACM 7 (1964), 420.

[6] G. H. Hardy and E. M. Wright, An Introduction to the Theory of Numbers, 4th ed., Clarendon Press, Oxford, 1960.

[7] G. W. Kimble, Research problem (no. 12). In: Algorithmic Aspects of Combinatorics (B. Alspach, P. Hell and D. J. Miller, eds.), North-Holland, Amsterdam, 1978.

[8] D. E. Knuth, Fundamental Algorithms, 2nd ed., Addison-Wesley, Reading, MA, 1973.

[9] D. E. Knuth, Seminumerical Algorithms, 2nd ed., Addison-Wesley, Reading, MA, 1981.

[10] L. E. Moses and R. V. Oakford, Tables of Random Permutations, Stanford University Press, Stanford, 1963.

[11] H. Niederreiter, Quasi-Monte Carlo methods and pseudo-random numbers, Bull. Amer. Math. Soc. 84 (1978), 957-1041.

[12] A. Nijenhuis and H. S. Wilf, Combinatorial Algorithms, 2nd ed., Academic Press, New York, 1978.

APPENDIX

Proof of Theorem 3 Let $s_k = b^k - b^{k-1}$, $p = \lceil r \rceil$, and $t = n - b^{p-1}$. From (5) in the proof of Theorem 1

where

and

Now apply Euler-Maclaurin summation with $f(x) = 1/(b^k - x)$ to obtain

with $R_2 \le 0$. Thus, for $k = 1, 2, \ldots, p-1$,

$$a_b(k) \le \frac{b \ln b}{b - 1} - \frac{1}{2b^{k-1}} + \frac{b + 1}{12 b^{2k-1}}.$$

Similarly, in the case k = p, since

then

with $|R_2'| \le |R_2|$. Hence

It follows that

Replace $a_b(k)$ and $a_b(p)$ in (7) to obtain

Use routine algebraic simplification and $\ln n = r \ln b$ to get the equivalent expression

A further rearrangement of terms together with the interpretation $p = \lceil r \rceil$ proves the result.
