Outline
- Serial RNG
- Background
- LCG, LFG, crypto-hash
- Parallel RNG
- Leapfrog, splitting, crypto-hash
Serial RNG: LCG
- Linear-congruential generator (LCG)
- $X_n = (a \cdot X_{n-1} + c) \bmod M$
- a, c and M must be chosen carefully!
- Never choose $M = 2^{31}$! M should be a prime
- Park & Miller: $a = 16807$, $M = 2147483647 = 2^{31} - 1$. M is a Mersenne prime!
- Most likely in your C runtime
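As a minimal sketch of the Park & Miller generator named above (with c = 0), the step below uses a 64-bit product so the multiplication cannot overflow. The function and macro names are made up for this example.

```c
#include <stdint.h>

/* Park & Miller "minimal standard" constants from the slide. */
#define PM_A 16807u
#define PM_M 2147483647u   /* 2^31 - 1 */

/* One LCG step with c = 0: returns (a * x) mod M.
   The 64-bit product avoids overflow; x must be in [1, M-1]. */
uint32_t pm_next(uint32_t x)
{
    return (uint32_t)(((uint64_t)PM_A * x) % PM_M);
}
```

Park & Miller give a check value for implementations: starting from 1, the 10,000th step should yield 1043618065.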
LCG: the good and bad
- Good:
  - Simple and efficient, even with the mod
  - Single word of state
- Bad:
  - Short period: at most M
  - Low bits are correlated, especially if $M = 2^k$
  - Purely serial
Mersenne Prime modulo
- IDIV can take 40~80 cycles for a 32b/32b divide
- $x \bmod M$ where $M = 2^p - 1$:
- r = (x & M) + (x >> p);
- return (r >= M) ? r - M : r;
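The division-free reduction can be sketched in C as follows, assuming p = 31 and an input below M squared (for example, the product of two values below M). The names are illustrative. The explicit parentheses matter, because in C `+` binds tighter than `>>`, which in turn binds tighter than `&`.

```c
#include <stdint.h>

#define MP_P 31
#define MP_M ((1u << MP_P) - 1u)   /* 2^31 - 1 */

/* Reduce x modulo M = 2^P - 1 without a division.
   Because 2^P is congruent to 1 (mod M), the high bits can be folded
   onto the low bits. Assumes x < M * M, so the folded sum is below 2M
   and a single conditional subtraction finishes the job. */
uint32_t mod_mersenne31(uint64_t x)
{
    uint64_t r = (x & MP_M) + (x >> MP_P);
    return (uint32_t)(r >= MP_M ? r - MP_M : r);
}
```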
Lagged-Fibonacci Generator
- $X_n = X_{n-p} \odot X_{n-q}$; p and q are the lags
- $\odot$ is +, -, * mod M (or XOR)
- ALFG: $X_n = X_{n-p} + X_{n-q} \pmod{2^b}$
- * gives the best quality
- Period $= (2^p - 1) \cdot 2^{b-3}$; $M = 2^b$
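A minimal sketch of the additive variant (ALFG) with a circular lag table. The lags (17, 5), the modulus 2^32, and the LCG-based seeding are assumptions chosen for the example, not values from the slides.

```c
#include <stdint.h>

#define LAG_P 17   /* long lag (assumed example value) */
#define LAG_Q 5    /* short lag (assumed example value) */

typedef struct {
    uint32_t s[LAG_P];  /* the last LAG_P outputs, stored circularly */
    int pos;            /* index of X_{n-p}, the oldest entry */
} alfg_t;

/* Fill the lag table from a seed with a simple LCG (an assumption of
   this sketch) and force one odd entry so the low bit is not stuck at 0. */
void alfg_seed(alfg_t *g, uint32_t seed)
{
    for (int i = 0; i < LAG_P; i++) {
        seed = seed * 1664525u + 1013904223u;
        g->s[i] = seed;
    }
    g->s[0] |= 1u;
    g->pos = 0;
}

/* X_n = X_{n-p} + X_{n-q} mod 2^32; unsigned wraparound performs the mod. */
uint32_t alfg_next(alfg_t *g)
{
    int iq = g->pos + (LAG_P - LAG_Q);   /* index of X_{n-q} */
    if (iq >= LAG_P) iq -= LAG_P;
    uint32_t x = g->s[g->pos] + g->s[iq];
    g->s[g->pos] = x;                    /* overwrite the oldest value */
    if (++g->pos == LAG_P) g->pos = 0;
    return x;
}
```

The state is the whole lag table, which is the storage cost the slides list as a drawback.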
LFG
- The good:
  - Very efficient: 2 ops + power-of-2 mod
  - Much longer period than LCG
  - Works directly in floats
  - Higher quality than LCG
  - ALFG can skip ahead
LFG: the bad
- Need to store max(p, q) floats
- Purely sequential
- Multiplicative LFG can't jump ahead.
Mersenne Twister
- Gold standard?
- Large state (624 ints)
- Lots of flops
- Hard to leapfrog
- Limited parallelism
[Figure: power spectrum]
Parallel RNG
- Maintain the RNG's quality
- Same result regardless of the # of cores
- Minimal state, especially for GPU.
- Minimal correlation among the streams.
Random Tree
• 2 LCGs (L and R) with different $a$
• L is used to generate a seed for R
• No need to know the number of generators or the # of values per thread
• GG
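One way to read the random-tree scheme is sketched below: L hands out seeds, and each seed starts an independent stream driven by R. The modulus 2^31 - 1 and the two multipliers are assumptions made for illustration, as are all function names.

```c
#include <stdint.h>

#define RT_M  2147483647u  /* 2^31 - 1 (assumed modulus) */
#define RT_AL 16807u       /* multiplier of the "left" generator L (assumed) */
#define RT_AR 48271u       /* multiplier of the "right" generator R (assumed) */

/* (a * x) mod M with a 64-bit intermediate product. */
static uint32_t rt_step(uint32_t a, uint32_t x)
{
    return (uint32_t)(((uint64_t)a * x) % RT_M);
}

/* Advance L and return its new value, used as the seed of a new stream. */
uint32_t rt_spawn(uint32_t *l_state)
{
    *l_state = rt_step(RT_AL, *l_state);
    return *l_state;
}

/* Advance one stream with R and return its next value. */
uint32_t rt_next(uint32_t *r_state)
{
    *r_state = rt_step(RT_AR, *r_state);
    return *r_state;
}
```

New streams can be spawned on demand, so neither the stream count nor the draws per stream has to be known up front.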
Leapfrog with 3 cores
• Each thread leaps ahead by $n$ using L
• Each thread uses its own R to generate its own sequence
• $n = \mathit{cores} \times \mathit{seqpercore}$
Leapfrog
- Basic LCG without c:
- $L_{k+1} = a L_k \bmod M$
- $R_{k+1} = a^n R_k \bmod M$
- LCG with c: $A = a^n$ and $C = c(a^n - 1)/(a - 1)$, so each core jumps ahead by n (# of cores)
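The jump-ahead constants can be built by composing the LCG step with itself n times, which avoids the modular division in the closed form for C. This is a sketch assuming M = 2^31 - 1 and a, c < M; the type and function names are hypothetical.

```c
#include <stdint.h>

#define LF_M 2147483647ull   /* 2^31 - 1 (assumed modulus) */

typedef struct { uint64_t A, C; } lcg_jump_t;

/* Compose x -> (a*x + c) mod M with itself n times.
   Result: A = a^n mod M, C = c*(a^(n-1) + ... + a + 1) mod M,
   which equals c*(a^n - 1)/(a - 1) mod M. */
lcg_jump_t lcg_leap_params(uint64_t a, uint64_t c, unsigned n)
{
    lcg_jump_t j = { 1, 0 };
    for (unsigned i = 0; i < n; i++) {
        j.C = (a * j.C + c) % LF_M;
        j.A = (a * j.A) % LF_M;
    }
    return j;
}

/* One step of x -> (A*x + C) mod M. All operands are below 2^31,
   so the 64-bit arithmetic cannot overflow. */
uint64_t lcg_step(uint64_t A, uint64_t C, uint64_t x)
{
    return (A * x + C) % LF_M;
}
```

With n cores, core t would start from the t-th serial value and then step with (A, C), visiting every n-th element of the serial sequence.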
Leapfrog with 3 cores
• The sequences do not overlap
• The final sequence is the same as in the serial code
Leapfrog: the good
- Same sequence as the serial code
- Limited choice of RNG (e.g. no MLFG)
- No need to fix the # of random values used per core (but n must be fixed)
Leapfrog: the bad
- $a^n$ no longer has the good qualities of $a$
- A power-of-2 n produces correlated sub-sequences
- Need to fix n, the # of generators/sequences
- The period of the original RNG is shortened by a factor of n. A 32-bit LCG has a short period to start with.
Sequence Splitting
• If we know the # of values per thread, $n$:
• $L_{k+1} = a^n L_k \bmod M$
• $R_{k+1} = a R_k \bmod M$
• Each thread's sequence is a subsequence of the serial sequence
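For sequence splitting, the starting state of each block can be computed directly with modular exponentiation. The sketch below assumes c = 0, M = 2^31 - 1 and a = 16807; the function names are made up for this example.

```c
#include <stdint.h>

#define SS_M 2147483647ull  /* 2^31 - 1 (assumed modulus) */
#define SS_A 16807ull       /* assumed multiplier */

/* a^e mod M by square-and-multiply. */
uint64_t ss_powmod(uint64_t a, uint64_t e)
{
    uint64_t r = 1;
    a %= SS_M;
    while (e) {
        if (e & 1) r = (r * a) % SS_M;
        a = (a * a) % SS_M;
        e >>= 1;
    }
    return r;
}

/* Starting state of thread t when every thread draws n values:
   seed * a^(n*t) mod M. Each thread then steps with the plain multiplier a. */
uint64_t ss_block_start(uint64_t seed, uint64_t n, uint64_t t)
{
    return ((seed % SS_M) * ss_powmod(SS_A, n * t)) % SS_M;
}
```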
Leapfrog and Splitting
- Only guarantee that the sequences do not overlap; they say nothing about quality
- Not invariant to the degree of parallelism
- Results change when the # of cores changes
- Serial and parallel code do not match
Lagged-Fibonacci Leapfrog
- LFG has a very long period: Period $= (2^p - 1) \cdot 2^{b-3}$; $M = 2^b$
- M can be a power of two!
- Much better quality than LCG
- No leapfrog for the best variant (*)
- Luckily the ALFG supports leapfrogging
Issues with Leapfrog & Splitting
- LCG's period gets even shorter
- Questionable quality
- ALFG is much better but has to store more state for the "lag".
Core Idea
1. input trivially prepared in parallel, e.g. linear ramp
2. feed input value into hash, independently and in parallel
3. output white noise
[Figure: input, hash, output]
Magic "delta"
- $\mathit{delta} = (\sqrt{5} - 1) \cdot 2^{31}$
- Avalanche in 6 cycles (often in 4)
- * mixes better than ^ but makes TEA twice as slow
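A minimal sketch of hashing a linear ramp with TEA, where every output depends only on its own index. The key words are arbitrary placeholders, the round count is a parameter, and the function names are hypothetical. TEA here is used as a mixing function, not as a vetted cipher configuration.

```c
#include <stdint.h>

#define TEA_DELTA 0x9E3779B9u  /* floor((sqrt(5) - 1) * 2^31) */

/* Mix the pair (v0, v1) with `rounds` TEA cycles under a fixed key and
   return the first output word. The key is a placeholder for this sketch. */
uint32_t tea_hash(uint32_t v0, uint32_t v1, int rounds)
{
    const uint32_t k[4] = { 0xA341316Cu, 0xC8013EA4u, 0xAD90777Du, 0x7E95761Eu };
    uint32_t sum = 0;
    for (int i = 0; i < rounds; i++) {
        sum += TEA_DELTA;
        v0 += ((v1 << 4) + k[0]) ^ (v1 + sum) ^ ((v1 >> 5) + k[1]);
        v1 += ((v0 << 4) + k[2]) ^ (v0 + sum) ^ ((v0 >> 5) + k[3]);
    }
    return v0;
}

/* Fill out[i] with the hash of index i. No element depends on another,
   so each iteration could be handled by a separate GPU thread. */
void white_noise(uint32_t *out, uint32_t count, int rounds)
{
    for (uint32_t i = 0; i < count; i++)
        out[i] = tea_hash(i, 0u, rounds);
}
```

Because there is no carried state, the result is the same for any number of threads, which is the invariance that leapfrog and splitting lack.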
SPRNG
- Good package by Michael Mascagni
- http://www.sprng.org/
References
- [Mascagni 99] Some Methods for Parallel Pseudorandom Number Generation, 1999.
- [Park & Miller 88] Random Number Generators: Good Ones Are Hard to Find, CACM, 1988.
- [Pryor 94] Implementation of a Portable and Reproducible Parallel Pseudorandom Number Generator, SC, 1994.
- [Tzeng & Li 08] Parallel White Noise Generation on a GPU via Cryptographic Hash, I3D, 2008.
- [Wheeler 95] TEA, a Tiny Encryption Algorithm, 1995.