25
Random Numbers in HEP Tests of Random Engines in CLHEP Heinrich’s note about RandEngine

Random Numbers in HEP

Embed Size (px)

DESCRIPTION

Random Numbers in HEP. Tests of Random Engines in CLHEP Heinrich’s note about RandEngine. Essential Concepts. Instances Use of a random in one module need not affect the sequence of randoms seen by another Engines vs Distributions An Engine is a source of uniform (pseudo-) randomness - PowerPoint PPT Presentation

Citation preview

Page 1: Random Numbers in HEP

Random Numbers in HEP

Tests of Random Engines in CLHEP

Heinrich’s note about RandEngine

Page 2: Random Numbers in HEP

Essential Concepts• Instances

– Use of a random in one module need not affect the sequence of randoms seen by another

• Engines vs Distributions– An Engine is a source of uniform (pseudo-) randomness– A Distribution uses an engine to fire off variates

• Flat()• Gaussian()

• Validity of Randomness – How {unbiased, ergodic, unpredictable} is the engine?

Page 3: Random Numbers in HEP

How Engines Can be Bad

• Wrong single-number statistics– Wrong mean (c.f. the buggy Linux RandEngine)– Wrong sigma (or higher moment)– Very easy to detect such a major deficiency– Non-uniform distribution with right moments

• Possible! (but mathematically pathological)• And certainly possible to an excellent approximation *

• Sequence-order dependent flaws– Can be harder to detect– Can still lead to incorrect physics!

Page 4: Random Numbers in HEP

An engine with good moments but flawed distribution

• Every engine on a real computer!– (But if the granularity is small enough this

“flaw” becomes moot)

Page 5: Random Numbers in HEP

Typical Sequence Flaws• Tom Nash, doing a thesis in the 70’s, found

that pairs of numbers from the “random” engine in use lied in stripes.

• He was far from the first to notice this!

• But it was affecting the physics

Page 6: Random Numbers in HEP

Testing an Engine• The general test:

– Start with a mathematical observation: “If Sn is an ergodic sequence, uniformly distributed on (0,1), then for all subsequences T, some property P should hold.”

– Property P should hold, to accuracy (n) and with probability p(n), where (n)0 and p(n) 1 for large n.

– Apply the “Does P hold” test to a bunch of sequences generated by the engine.

• An Engine that passes every possible test based on every possible property to perfect precision would be a perfect random engine– But mere mortals have to pick some set of tests– And have to settle for some finite precision on each test

Page 7: Random Numbers in HEP

The DIEHARD Suite• Invented (more like collected) by Marsaglia.• 13 tests (for 13 properties of the sequence).• Each test uses the sequence to generate a group of

numbers, with mathematically known distribution for random sequences.

• The distribution obtained is compared via a Kolmogorov-Smyrnoff test with the proper distribution.

• If any P-value is extremely close to 1, this is proof of non-randomness of the sequence.

Page 8: Random Numbers in HEP

Interpreting the Results of a test suite

• The longer the sequences used, the more discrimination power.

• If no test fails, that is not proof that the sequence is random:– It is proof that either the tester was not clever enough to

find and expose non-random properties, or that the sequence used was not long enough.

– But for practical uses, some suite of tests at some level of N is “good enough.”

Page 9: Random Numbers in HEP

The DIEHARD Tests

• Birthday Spacings– The likelihood of j birthdays being shared among m people in an n-day

year is Poisson distributed with mean m3/(4n).– Not order-sensitive.

• Overlapping 5-permutation Test– Each 5-number subsequence forms one of 120 possible ordering numbers.

The count for each should be equal and uncorrelated.

• Binary Rank Test (32x32 and 6x8)– The distribution of ranks of 32x32 bit matrices is known theoretically.

This detects many linear realations involving up to 1000 bits of the sequence.

• Bitstream Test• For a random stream of 221 bits, the number of “missing” 20-bit “words”

should be normally distributed with mean 142,000 and sigma 428.

Page 10: Random Numbers in HEP

and…

• Overlapping Pairs and Quadruples, Sparse Occupancy– Missing 2- and 4-”letter” words in a long sequence of “letters” from a

1024-letter alphabet

– Not very order sensitive past N=20 or 40

• DNA Test– Missing “10-genome” sequences in the 4-letter genome alphabet

• Count-the-1-’s test on a stream of or specific bytes– Uses the number of 1’s in 8-bit sequences, to form pseudo-letters, which

are then grouped into 5-letter words.

• Runs Test– The theoretical distributions and covariances fro up and down runs of N

numbers are known.

Page 11: Random Numbers in HEP

and…

• Overlapping Pairs and Quadruples, Sparse Occupancy– Missing 2- and 4-”letter” words in a long sequence of “letters” from a

1024-letter alphabet

– Not very order sensitive past N=20 or 40

• DNA Test– Missing “10-genome” sequences in the 4-letter genome alphabet

• Count-the-1-’s test on a stream of or specific bytes– Uses the number of 1’s in 8-bit sequences, to form pseudo-letters, which

are then grouped into 5-letter words.

• Runs Test– The theoretical distributions and covariances fro up and down runs of N

numbers are known.

Page 12: Random Numbers in HEP

and…

• Craps test– A long sequence of games of craps is simulated. The distribution of wins

and losses, and of the number of dice throws for each game, is known.

• Parking Lot Test– Make a number of attempts to park a circle in a random “lot” of a square

of side 100. Re-try if there are collisions. After 120,000 tries, the number of successes shuld match some (experimentally determined) distribution.

• Overlapping Sums Test– The sums of 100 variables should be virtually normally distributed.– Sensitive to medium-scale imperfections.

• Squeeze Test– Number of iterations needed to reduce 231 down to 1, by doing

k = ceiling(k*random()).

Page 13: Random Numbers in HEP

The Minimum Distance Test

• The last two tests are the MD test in 2 and 3 dimensions:– Use N pairs (triplets) of randoms as coordinates

of N points in the unit volume. – The closest pair of points has some distance d.– d has a “known” theoretical distribution

• As applied by Marsaglia, most generators semi-fail the 2-D Minimum distance test!

Page 14: Random Numbers in HEP

The Minimum Distance Test• The last two tests are the MD test in 2 and 3 dimensions:

– Use N pairs (triplets) of randoms as coordinates of N points in the unit volume.

– The closest pair of points has some distance d.– d has a “known” theoretical distribution

• As applied by Marsaglia, most generators semi-fail the 2-D Minimum distance test!– P-values of .99 are typical.– They did not semi-fail when the FORTRAN versions of the tests

were applied.– The f2c version showed this behavior.– ?

Page 15: Random Numbers in HEP

The Minimum Distance Test

• Marsaglia speculated this was due to some funny business (error) with converting array code using f2c.

• Turns out it was a function of being able to do more iterations on later processors.

• The theoretical distribution had been calculated only to lowest order. By the time ZOOM did these tests, we were exposing the fact that the actual distribution differed.

• See my Technical memo TM2170, where I do the next few orders. This resolves the discrepancy.

Page 16: Random Numbers in HEP

The CLHEP Engines

• JamesRandom– Fred James’ adaptation of Marsaglia’s add-with-carry generator. Large state.

• Ranecu– Multiplicative Congruential generator using formula constants of L'Ecuyer. Large

state.

• Ranecu– Multiplicative Congruential generator using formula constants of L'Ecuyer. Loarge

state.

• Ranlux and Ranlux64– Luescher’s “folding” generator. For luxury levels of 2+ it is excellent; for luxury 4

is “provably” ergodic. Moderate state.

• TripleRand– Used in the Canopy Lattice gauge computations, this combines a Hurd primitive

polynomial generator with two other unrelated generators to form a family of trustworthy yet distinct sequences. Moderate state.

Page 17: Random Numbers in HEP

The CLHEP Engines

• JamesRandom– Fred James’ adaptation of Marsaglia’s add-with-carry generator. Large state.

• Ranecu– Multiplicative Congruential generator using formula constants of L'Ecuyer. Large

state.

• Ranecu– Multiplicative Congruential generator using formula constants of L'Ecuyer. Loarge

state.

• Ranlux and Ranlux64– Luescher’s “folding” generator. For luxury levels of 2+ it is excellent; for luxury 4

is “provably” ergodic. Moderate state.

• TripleRand– Used in the Canopy Lattice gauge computations, this combines a Hurd primitive

polynomial generator with two other unrelated generators to form a family of trustworthy yet distinct sequences. Moderate state.

Page 18: Random Numbers in HEP

More CLHEP Engines• Hurd288 and Hurd 160

– Based on primitive polynomials over Z(2). Provably ergodic (but I don’t know how well to trust the proof). Used as basis for TripleRand and DualRand. Small state.

• RanshiEngine– Modeled on spinning balls. Very large state, but fast and excellent.

• MTwistEngine– Based on number-theory considerations involving large Mersenne primes.

Very Large state. Many randomness properties proven, though the math is difficult. Fast and excellent (can be made even faster…)

• All the above engines have excellent randomness properties.

Page 19: Random Numbers in HEP

More CLHEP Engines• Hurd288 and Hurd 160

– Based on primitive polynomials over Z(2). Provably ergodic (but I don’t know how well to trust the proof). Used as basis for TripleRand and DualRand. Small state.

• RanshiEngine– Modeled on spinning balls. Very large state, but fast and excellent.

• MTwistEngine– Based on number-theory considerations involving large Mersenne primes.

Very Large state. Many randomness properties proven, though the math is difficult. Fast and excellent (can be made even faster…)

• All the above engines have excellent randomness properties.

Page 20: Random Numbers in HEP

And some Low-quality CLHEP Engines• DRand48Engine

– Based on a common Unix system random number generator. Very small state and fast, but fails several randomness tests.

• RandEngine– Directly calls the system function rand(). Not portable; state not savable. Its claim to

fame is that the usual implementation of rand() is dissected in Knuth and shown to be a terrible generator!!

– RandEngine, when coded properly, fails half of the DIEHARD tests• NonRandomEngine

– Lets people test program behavior on a set forced sequence of randoms. Requires user to specify a sequence. Useful for testing. If you use this to try to generate true random numbers, you get what you deserve.

• Why was RandEngine left in CLHEP?– According to a web page circa 1997, because Geant4 insisted on it.– Fortunately, they then proceeded not to use it.

Page 21: Random Numbers in HEP

How the CLHEP Engines do on DIEHARD

• DRand48Engine failed the three “alphabet sequences” tests– indicating bad correllations at the range of 5-

number subsequences

• Ranlux at luxury level 0 (which was never recommended) failed the Birthday and Craps tests.– At luxury level above 1 it is fine

Page 22: Random Numbers in HEP

How the CLHEP Engines do on DIEHARD

• Several engines semi-failed the Minimum Distance test– The more numbers we could afford to generate, the

more likely the failure– We now understand the flaw– A summer student in 1998 coded the test in C++ from

scratch, with my correction terms in the distribution, and all the “good” engines passed.

• All engines other than RandEngine and DRand48 pass the entire DIEHARD Suite.

Page 23: Random Numbers in HEP

RandEngine• On systems until Linux, rand() used a

standardized 15-bit linear congruence– Lame, see Knuth– But kind of OK.

• At some point, an attempt was made in CLHEP to combine 2 calls to rand() by a shift and OR – to get 3 bits of randomness.– Still just as lame

Page 24: Random Numbers in HEP

RandEngine

• This coding was blown, in that when Linux came along with a 32-bit result for rand(), the OR caused the average value returned to be 5/8!!!!

• Results from programs run on Linux which used RandEngine must be discarded.

• Fortunately, it appears nobody trusted RandEngine anyway– CDF uses Ranecu– Geant4 does not have RandENgine anywhere in there

source code.

Page 25: Random Numbers in HEP

Recommendations

• Ranecu is perfectly fine, no need to change.• Ranlux on level 2 is about as fast but “provably”

reliable.• MTwist (if you have only a few instances so can

afford a very large state), TripleRand, and JamesEngine are about twice as fast and very probably just as good or better.

• Ranshi is faster still but has a huge state, and I’m not 100% convinced it can’t have flaws.