1
Random Number Generation and Testing
W. W. Tsang (曾衛寰)
Department of Computer Science
The University of Hong Kong
http://www.cs.hku.hk/~tsang/RNGT.ppt
7
Random Number Generation and Testing (RNGT) is an interdisciplinary area
Mathematics
Computer Science
Statistical Computing, Computational Statistics
8
Overview
1. Random numbers and their applications
2. Early random number generators (RNGs) in computers
3. Criteria of good RNGs
4. Good RNGs
5. Goodness-of-fit tests
6. Statistical tests for RNGs
7. Conversions of uniform random integers to variates of other distributions
9
1. Random numbers and their applications
Entertainment
  Gambling
  Lottery, lucky draw
  Games
Cryptography
  Key generation
Computer simulation
Software testing
  Generating testing data
Randomized algorithms
  Avoiding worst cases
10
2. Early RNGs in computers
Reading a large file of random numbers
  Deterministic
  A 10-billion-bit file is available with the Diehard Battery of Tests of Randomness v0.2 beta:
  http://www.csis.hku.hk/~diehard/
Reading the last few bits of a fast-ticking clock
  Unpredictable
11
2. Early RNGs in computers
Mid-square method, 1940s
  Suggested by John von Neumann (1903-1957) during the development of the first atomic bomb
  $X_{n+1} = \text{middle\_digits}(X_n \times X_n)$
  X = 45086273
  X × X = 2032772013030529
  new X = 77201303
  Deterministic
  The period depends on the seed and is hard to determine
  Obsolete
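The mid-square step can be sketched in a few lines of Python. This is only an illustration: the function name `mid_square` and the choice of zero-padding the square to twice the digit count are assumptions, not part of the slides.

```python
def mid_square(x, digits=8):
    """One mid-square step: square x and keep the middle `digits` digits.

    The square is zero-padded to 2*digits characters so that the middle
    is well defined even when the square has leading zeros (assumption).
    """
    squared = str(x * x).zfill(2 * digits)
    start = digits // 2
    return int(squared[start:start + digits])


# The example from the slide: 45086273 squared is 2032772013030529.
x = 45086273
print(mid_square(x))
```

Iterating `x = mid_square(x)` from different seeds shows how the sequence can collapse into short cycles, which is one reason the method is obsolete.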
12
2. Early RNGs in computers
Congruential generator, 1951
  Suggested by Lehmer; the simplest and most commonly used RNG
  $X_{n+1} = (a X_n + c) \bmod m$
  X = 45086273
  (X × 7654321 + 1) mod $10^8$ = 345104806235634 mod $10^8$
  new X = 06235634
  Simple and the fastest
  For 32-bit words, the period can reach $2^{32}$
  Insecure: the formula can be worked out from the output
  Fails many tests
  Sufficiently random for many applications
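A minimal Python sketch of the congruential step, using the constants of the worked example (a = 7654321, c = 1, m = 10^8). The function name `lcg_next` is hypothetical, and these constants are for illustration only, not a recommended parameter set.

```python
def lcg_next(x, a=7654321, c=1, m=10**8):
    """One step of a linear congruential generator: (a*x + c) mod m."""
    return (a * x + c) % m


# Reproduce the example on the slide, then generate a few more values.
x = 45086273
for _ in range(3):
    x = lcg_next(x)
    print(f"{x:08d}")
```

Because the whole state is the last output, anyone who sees a few outputs and knows (or solves for) a, c and m can predict the rest, which is why the generator is insecure.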
13
2. Early RNGs in computers
3D points generated using a congruential RNG
Points fall on planes: the output is patterned and not random enough
Ideal random points, for comparison
14
2. Early RNGs in computers
Lagged Fibonacci generator, 1958
  Suggested by Mitchell and Moore
  $X_n = (X_{n-24} + X_{n-55}) \bmod 2^{32}$, $n \ge 55$
  The period is $2^{31}(2^{55} - 1)$, a long period
  Fails the birthday spacings test
Knuth, The Art of Computer Programming, vol 2, 1998.
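$X_n = (X_{n-p} \oplus X_{n-p+q}) \bmod m$, where $\oplus$ denotes exclusive-or
[Figure: a register of 55 words, positions 0 to 55 with a tap at 31, feeding a + operator]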
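The additive recurrence with lags 24 and 55 can be sketched as follows. The function name `lagged_fibonacci` and the way the 55 seed words are supplied are assumptions; in practice the seed table would be filled by another generator, with at least one odd word.

```python
from collections import deque


def lagged_fibonacci(seed_words, count):
    """Generate `count` values of X_n = (X_{n-24} + X_{n-55}) mod 2**32.

    `seed_words` must hold the 55 initial values X_0 .. X_54.
    """
    if len(seed_words) != 55:
        raise ValueError("exactly 55 seed words are required")
    state = deque(seed_words, maxlen=55)  # always the last 55 values
    output = []
    for _ in range(count):
        x = (state[-24] + state[-55]) & 0xFFFFFFFF
        state.append(x)  # the oldest value drops out automatically
        output.append(x)
    return output


# Toy seed (X_i = i + 1) chosen only so the first outputs are easy to check.
print(lagged_fibonacci(list(range(1, 56)), 5))
```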
15
3. Criteria of good RNGs
Fast, especially in simulation
Well distributed
  Pass all known statistical tests
Independent
Portable and reproducible on different computers (for verifying simulation results)
Long periods (for deterministic RNGs)
Unpredictable and irreproducible (for cryptography)
Secure (for cryptography)
Large seed spaces (for deterministic RNGs), so that there are enough seeds to choose from
16
4. Good RNGs
Mersenne Twister, 1998
  Makoto Matsumoto & Takuji Nishimura
  Output: $x_{k+624} T$
  Period: $2^{19937} - 1$
  Evenly distributed in high dimensions
  Fast, passes all tests, insecure
Matsumoto, M., and Nishimura, T., 1998, Mersenne twister: A 623-dimensionally equidistributed uniform pseudo-random number generator, ACM Trans. Model. Comput. Simul. 8, No. 1, 3-30.
http://www.math.sci.hiroshima-u.ac.jp/%7Em-mat/MT/emt.html
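$x_{k+624} = x_{k+397} \oplus \big( (x_k^{u} \mid x_{k+1}^{l}) A \big)$
[Figure: state array of 624 words (positions 0, 1, ..., 397, ..., 624); the recurrence uses the matrix A and the output passes through the matrix T]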
17
4. Good RNGs
Combined generators
  Combine the outputs of two or more RNGs by a simple operation such as addition or exclusive-or
  More evenly distributed, more independent, longer period, more secure
The universal generators
  Combine 2 generators: a lagged Fibonacci generator, and $X_{n+1} = (X_n - k) \bmod 16777213$
  Portable, pass all tests
G. Marsaglia, A current view of random number generators, Keynote Address, Computer Science and Statistics: 16th Symposium on the Interface, Atlanta, 1984.
G. Marsaglia, A. Zaman and W.W. Tsang, Toward a universal random number generator, Statistics & Probability Letters, Vol. 9, Issue 1, pp. 35-39, January 1990.
G. Marsaglia and W.W. Tsang, The 64-bit universal RNG, Statistics & Probability Letters, Vol. 66, Issue 2, pp. 183-187, January 2004.
George Marsaglia, 1924 -
18
4. Good RNGs
Combined generators
The KISS generator (Keep It Simple, Stupid)
  Suggested by George Marsaglia
  Combines three simple generators
    A congruential generator
    A 3-shift generator
    A multiply-with-carry generator
  Passes all tests, popular
  Period: ~$2^{124}$
http://oldmill.uchicago.edu/~wilder/Code/random/Papers/Marsaglia_2003.html
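#include <stdint.h>

/* Fixed-width types keep the arithmetic 32-bit even where unsigned long is 64-bit. */
uint32_t KISS(void)
{
    static uint32_t x = 123456789, y = 362436, z = 521288629, c = 7654321;
    const uint64_t a = 698769069ULL;
    uint64_t t;

    x = 69069 * x + 12345;                          /* congruential */
    y ^= (y << 13); y ^= (y >> 17); y ^= (y << 5);  /* 3-shift */
    t = a * z + c; c = (uint32_t)(t >> 32);         /* multiply-with-carry */
    return x + y + (z = (uint32_t)t);
}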
19
4. Good RNGs
Andrew Chi-Chih Yao (姚期智), 1946 -
  Turing Award winner in 2000
  Contributions
    Theory of computation
    Complexity
    Theory of RNGs
Alan Turing, 1912 - 1954
If there is no practical way to predict the next bit of an RNG with better than a 50% chance, the RNG will pass all practical (polynomial-time) statistical tests.
20
4. Good RNGs
Blum-Blum-Shub (BBS) generators, 1986
  First generator that fulfills Andrew Yao's RNG theory
  $X_{n+1} = X_n^2 \bmod m$
  $m = pq$, $|p| = |q|$
  p and q are distinct primes of the form 4z+3
  m has 1024 to 4096 bits
  Output the last bit of $X_{n+1}$
  Well distributed: passes all tests in theory
  Secure but very slow
  The period depends on the seed and can only be worked out using an algorithm.
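The BBS recurrence can be sketched in Python with toy parameters. The primes 11 and 23 and the seed 3 are assumptions chosen only so the arithmetic can be followed by hand; they offer no security, and a real modulus has 1024 to 4096 bits. The function name `bbs_bits` is hypothetical.

```python
from math import gcd


def bbs_bits(p, q, seed, count):
    """Return `count` bits from X_{n+1} = X_n**2 mod (p*q), taking the last bit."""
    if p == q or p % 4 != 3 or q % 4 != 3:
        raise ValueError("p and q must be distinct and of the form 4z+3")
    m = p * q
    if gcd(seed, m) != 1:
        raise ValueError("seed must be coprime to m")
    x = seed
    bits = []
    for _ in range(count):
        x = (x * x) % m
        bits.append(x & 1)  # output only the least significant bit
    return bits


# Toy run: the states are 9, 81, 236, 36, 31 (mod 253).
print(bbs_bits(11, 23, 3, 5))
```

One modular squaring per output bit is what makes the generator slow compared with the others in this section.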
21
4. Good RNGs
The HAVEGE generator
  HArdware Volatile Entropy Gathering and Expansion
  André Seznec
  Reads the fast-changing states in the computer in real time, e.g., cache, pipeline states, TLB, etc.
  Hardware dependent
  Unpredictable
  Irreproducible
http://www.irisa.fr/caps/projects/hipsor/HAVEGE1.0.html
22
5. Goodness-of-fit tests
The following shows 176 outcomes of a die. Is the die fair?
Face values  1   2   3   4   5   6
Observed     14  35  28  25  39  35
A goodness-of-fit test measures the discrepancy between the sample distribution and the purported distribution.
23
5. Goodness-of-fit tests
Pearson's chi-square test
Compute a statistic, $X^2$, that summarizes the difference
Face values  1     2     3     4     5     6
Expected     29.3  29.3  29.3  29.3  29.3  29.3
Observed     14    35    28    25    39    35
$X^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = 14.1, \quad p = 0.985$
24
5. Goodness-of-fit tests
The chi-square test
If the samples are distributed as expected, $X^2$ follows the chi-square distribution with 5 degrees of freedom.
The p-value is the chance that $X^2$ is smaller than 14.1.
p = Pr[ $X^2$ ≤ 14.1 ] = 0.985
If the p-value is greater than a pre-determined threshold (e.g., 0.95), the hypothesis that the die is fair is rejected.
[Figure: density curve with the observed value 14.1 marked]
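The die example can be checked numerically. The sketch below computes Pearson's statistic and the p-value in the sense used here, Pr[X² ≤ observed]. To stay within the standard library it uses a closed form of the chi-square CDF that is valid only for 5 degrees of freedom; the function names are hypothetical.

```python
import math


def pearson_statistic(observed, expected):
    """Pearson's X^2 = sum of (o_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_cdf_5dof(x):
    """CDF of the chi-square distribution with exactly 5 degrees of freedom."""
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * (1 + x / 3)
    return math.erf(math.sqrt(x / 2)) - tail


observed = [14, 35, 28, 25, 39, 35]
n = sum(observed)                         # 176 throws in total
expected = [n / 6] * 6                    # 29.33 per face for a fair die
statistic = pearson_statistic(observed, expected)
p_value = chi_square_cdf_5dof(statistic)
print(round(statistic, 1), round(p_value, 3))
```

For other degrees of freedom a general routine, such as the regularized incomplete gamma function from a numerical library, would be needed.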
25
5. Goodness-of-fit tests
Let X be a random variable that is uniformly distributed in [0,1). The cumulative distribution function (CDF) of X, $F(x) = x$, is the diagonal from (0,0) to (1,1).
Suppose x(1), x(2), …, x(n) are samples of X.
Let $x_1 \le x_2 \le x_3 \le \dots \le x_n$ be the ordered x(i)'s.
The empirical distribution
$F_n(x) = \frac{\text{number of } x_i\text{'s such that } x_i \le x}{n}$
is the staircase.
[Figure: staircase $F_n(x)$ for $n = 5$, rising from 0 to 1 in steps of $1/n$ at $x_1, \dots, x_5$]
26
5. Goodness-of-fit tests
If the samples truly follow the uniform distribution, the staircase will be close to the diagonal most of the time.
[Figure: two staircases for $n = 5$ drawn against the diagonal, one labelled "Good fit" and one labelled "Bad fit"]
27
5. Goodness-of-fit tests
The Kolmogorov-Smirnov (KS) test
  Most commonly used
  Measures the maximum absolute distance between the staircase and the diagonal
  $D_n = \max_x | F(x) - F_n(x) |$
Andrey Kolmogorov, 1903 - 1987
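For the uniform case, where F(x) = x, the maximum distance is always attained at one of the steps of the staircase, so it is enough to check just before and just after each ordered sample. A minimal sketch, with a hypothetical function name:

```python
def ks_statistic_uniform(samples):
    """D_n = max |F(x) - F_n(x)| for F(x) = x on [0, 1)."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # F_n jumps from (i-1)/n to i/n at x; compare both levels with F(x) = x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


print(ks_statistic_uniform([0.1, 0.5, 0.9]))
```

Turning D_n into a p-value requires the CDF of D_n, which is a separate and harder computation.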
28
5. Goodness-of-fit tests
The KS test
The p-value is the CDF of $D_n$ evaluated at the observed distance. The CDF of $D_n$ is difficult to evaluate.
In 2003, we found the long forgotten matrix formula derived by Durbin. It is computationally stable and efficient except in the extreme right tail. We fixed the problem using an approximation. The resulting program evaluates the CDF with 13-digit accuracy for 2 ≤ n ≤ 16000.
G. Marsaglia, W.W. Tsang and J. Wang, Evaluating Kolmogorov's distribution, Journal of Statistical Software, Vol. 8, Issue 18, Pages 1-4, November, 2003. (available at http://www.jstatsoft.org/ )
W. W. Tsang and J. Wang, Evaluating the CDF of the Kolmogorov statistic for normality testing, Proceedings of the COMPSTAT 2004, 16th Symposium of IASC, Prague, August 23-27, 2004, 1893-1900.
29
5. Goodness-of-fit tests
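The Anderson-Darling (AD) test
  Summation of the weighted squares of the vertical differences (a weighted squared area)
  $A_n = n \int_0^1 \frac{\{F_n(x) - x\}^2}{x(1-x)} \, dx$
  More powerful than the KS test
  The CDF of $A_n$ is harder to evaluate than the CDF of $D_n$
(A Chinese pun on the name: Anderson as 安徒生, the storyteller Andersen, and Darling as 心愛的人, "beloved".)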
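For samples from [0,1) the integral has a well-known computational form, $A_n = -n - \frac{1}{n} \sum_{i=1}^{n} (2i-1)\,[\ln x_i + \ln(1 - x_{n+1-i})]$, with the $x_i$ in increasing order. A minimal sketch of that sum; the function name is hypothetical, and samples equal to 0 or 1 are assumed not to occur because the logarithm is undefined there.

```python
import math


def anderson_darling_uniform(samples):
    """A_n for testing uniformity on (0, 1), via the standard summation form."""
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i in range(1, n + 1):
        # xs[i - 1] is x_i and xs[n - i] is x_{n+1-i} (0-based indexing).
        total += (2 * i - 1) * (math.log(xs[i - 1]) + math.log(1.0 - xs[n - i]))
    return -n - total / n


print(anderson_darling_uniform([0.1, 0.5, 0.9]))
```

The weight 1/(x(1-x)) shows up here as the two logarithms, which grow large for samples near 0 or 1; this is why the test is more sensitive in the tails than the KS test.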
30
5. Goodness-of-fit tests
The AD test
In 2004, Marsaglia published a recursive procedure for computing the CDF of $A_\infty$ with 13-digit accuracy. He also gave an approximation formula for evaluating the CDF of $A_n$ with 3-digit accuracy for n > 35.
G. Marsaglia and J. Marsaglia, Evaluating the Anderson-Darling distribution, Journal of Statistical Software, 9(2): 1-5, 2004.
31
6. Statistical tests for RNGs
Statistical tests are used to reject poor RNGs
The collision test (an example)
Suppose we throw n balls at random into m cells. A collision occurs when a ball falls into a cell that is already occupied. The test counts the number of collisions (c). A generator passes this test if it doesn't induce too many or too few collisions.
32
6. Statistical tests for RNGs
The collision test
The probability that c collisions occur is
$\Pr[c] = \frac{m(m-1)\cdots(m-n+c+1)}{m^n} \left\{ {n \atop n-c} \right\}$
where $\left\{ {n \atop k} \right\}$ is a Stirling number of the second kind:
$\left\{ {n \atop k} \right\} = 1$ if $k = 1$ or $k = n$; $\left\{ {n \atop k} \right\} = \left\{ {n-1 \atop k-1} \right\} + k \left\{ {n-1 \atop k} \right\}$, otherwise
A p-value is computed from c. If it is greater than a threshold (e.g., 0.99), the RNG is rejected.
Knuth, The Art of Computer Programming, vol 2, 1998.
W.W. Tsang, L.C.K. Hui, K.P. Chow, C.F. Chong, and C.W. Tso, Tuning the collision test for power, Conferences in Research and Practice in Information Series, Vol. 26, No. 1, pp. 23-30, 2004. (Proceedings of the 27th Australasian Computer Science Conference, Dunedin, New Zealand, 2004.)
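The collision probabilities can be tabulated exactly for small n and m. The sketch below uses exact rational arithmetic and the usual recurrence for Stirling numbers of the second kind; the function names are hypothetical, and the plain recursion is only suitable for small n (realistic test sizes need an iterative or floating-point scheme).

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n, k):
    """Stirling number of the second kind: partitions of n items into k blocks."""
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def collision_pmf(n, m):
    """Exact probabilities of c = 0 .. n-1 collisions for n balls in m cells."""
    pmf = []
    for c in range(n):
        occupied = n - c                  # number of distinct cells hit
        falling = 1
        for j in range(occupied):         # m (m-1) ... (m-occupied+1)
            falling *= m - j
        pmf.append(Fraction(falling * stirling2(n, occupied), m ** n))
    return pmf


for c, prob in enumerate(collision_pmf(4, 8)):
    print(c, prob)
```

Summing the table up to the observed c gives the p-value used by the test.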
33
6. Statistical tests for RNGs
Criteria of good tests
  Powerful
  Efficient
  The experiment is similar to certain important applications. E.g., the collision test is similar to insertion into a hash table.
34
6. Statistical tests for RNGs
Knuth's collection
The most well-known collection of tests for RNGs is the one compiled by Knuth. It comprises 11 tests.
Knuth, The Art of Computer Programming, vol 2, 1998.
Donald Knuth, 1938 -
35
6. Statistical tests for RNGs
The National Institute of Standards and Technology (NIST) of the USA has suggested 16 statistical tests for checking cryptographic RNGs
  Frequency (Monobit) Test
  Frequency Test within a Block
  Runs Test
  Tests for the Longest Run of Ones in a Block
  Binary Matrix Rank Test
  Discrete Fourier Transform (Spectral) Test
  Non-overlapping Template Matching Test
  Overlapping Template Matching Test
36
6. Statistical tests for RNGs
The NIST collection
  Maurer's "Universal Statistical" Test
  Lempel-Ziv Compression Test
  Linear Complexity Test
  Serial Test
  Approximate Entropy Test
  Cumulative Sums (Cusum) Test
  Random Excursions Test
  Random Excursions Variant Test
Official Website: Random number generation and testing
<http://csrc.nist.gov/rng/>.
37
6. Statistical tests for RNGs
Diehard is the most widely used testing package for examining RNGs.
Developed by George Marsaglia
  Birthday Spacings
  GCD
  Gorilla
  Overlapping Permutations
  Binary Rank n × n
  Binary Rank 6 × 8
  Monkey Tests OPSO, OQSO, DNA
  Count the 1's
  Count the 1's specific
Most powerful
An RNG that passes these tests passes all other tests
38
6. Statistical tests for RNGs
Diehard
  Parking Lot
  Minimum Distance
  Random Spheres
  The Squeeze
  Overlapping Sums
  Runs Up and Down
  The Craps
Diehard Battery of Tests of Randomness v0.2 beta http://www.csis.hku.hk/~diehard/
G. Marsaglia and W.W. Tsang, Some difficult-to-pass tests of randomness, Journal of Statistical Software, Vol. 7, Issue 3, Pages 1-8, January, 2002. (available at http://www.jstatsoft.org/ ).
39
7. Conversions of uniform random integers to variates of other distributions
An RNG outputs random integers that are uniformly distributed, e.g., in [0, $2^{32}-1$]
In applications, we often need random numbers of other distributions, e.g.,
  Uniform in [0, 1)
  Normal
  Exponential
  Gamma
  Poisson
  Binomial
Fast methods are needed for the conversions
40
7. Conversions
Given I, a random integer uniformly distributed in [0, $2^{32}-1$], generate U, a uniform random number in [0,1):
U = I / $2^{32}$
Generate points that are uniformly distributed in a rectangle, using a fresh U for each coordinate:
X = a U
Y = c U
[Figure: rectangle with corners (0,0) and (a,c) on the X and Y axes]
41
7. Conversions
If we generate points that are uniformly distributed under a density function, the x-coordinates of the points follow that density.
[Figure: points scattered under a bell-shaped density curve, x-axis from -3.5 to 3.5]
42
7. Conversions
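The acceptance-rejection method
To generate X with the density f(x), 0 ≤ x ≤ a, where f(x) ≤ c:
1. X = a * U
2. Y = c * U (a fresh U)
3. If (Y < f(X)) return X
4. Go to Step 1.
[Figure: density f(x) inside the rectangle [0,a] × [0,c]; points under the curve are accepted (Acc), points above it are rejected (Rej)]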
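The four steps translate almost line for line into Python. In this sketch the triangular density f(x) = 2x on [0, 1] and the bounding height 2 are assumptions picked only because the result is easy to check (the mean should be close to 2/3); the function names are hypothetical.

```python
import random


def acceptance_rejection(f, a, height, rng=random):
    """Sample X with density f on [0, a], assuming f(x) <= height everywhere."""
    while True:
        x = a * rng.random()        # step 1: candidate x-coordinate
        y = height * rng.random()   # step 2: independent y-coordinate
        if y < f(x):                # step 3: accept points under the curve
            return x
        # step 4: otherwise try again


def triangular_density(x):
    """Example density f(x) = 2x on [0, 1]."""
    return 2.0 * x


rng = random.Random(12345)          # fixed seed so the run is repeatable
samples = [acceptance_rejection(triangular_density, 1.0, 2.0, rng)
           for _ in range(20000)]
print(sum(samples) / len(samples))
```

The expected number of iterations per sample is the area of the rectangle divided by the area under f, here 2, so a tighter bounding shape means fewer rejections.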
43
7. Conversions
The Monty Python method
Put a unit rectangle on top of a density (blue area). Flip the cap onto the empty area in the top-right corner.
To generate X with the density f(x) (each U is a fresh uniform number):
1. X = b U
2. If (X < a) return X
3. Y = c U
4. If (Y < f(X)) return X
5. If (Y > f'(X)) return b-X
6. Sample from the tail
f'(x) is f(x) after flipping over:
f'(x) = c - [f(b-x) - c]
[Figure: f(x) and f'(x) inside the rectangle [0,b] × [0,c], with a marked on the x-axis]
44
7. Conversions
The Monty Python method
A tail can be sampled using the acceptance-rejection method. Instead of using a rectangle, use an easy-to-sample density function g(x) that dominates the tail of f(x) and is close to it.
[Figure: tail of f(x) lying under the dominating curve g(x)]
45
7. Conversions
The Monty Python method can be used to generate variates of various distributions, including normal, exponential, gamma, Student's t, etc.
G. Marsaglia and W.W. Tsang, The Monty Python method for generating random variables, ACM Transactions on Mathematical Software, Vol. 24, No. 3, Pages 341-350, September, 1998.
G. Marsaglia and W.W. Tsang, The Monty Python method for generating gamma variables, Journal of Statistical Software, Vol. 3, Issue 3, Pages 1-8, January 1999. (available at http://www.jstatsoft.org/ )
G. Marsaglia and W.W. Tsang, A simple method for generating gamma variables, ACM Transactions on Mathematical Software, Vol. 26, No. 3, Pages 363-372, September, 2000.
46
7. Conversions
The Ziggurat method is a sophisticated version of the Monty Python method. Instead of using a unit rectangle, it uses a staircase curve that is close to the density being sampled.
The method leads to the fastest way to sample from the normal and exponential distributions. It is used in Matlab and other software.
www.mathworks.com/company/newsletters/news_notes/clevescorner/spring01_cleve.html
47
7. Conversions The Ziggurat method
References
G. Marsaglia and W.W. Tsang, The ziggurat method for generating random variables, Journal of Statistical Software, Vol. 5, Issue 8, Pages 1-7, October, 2000. (available at http://www.jstatsoft.org/ )
G. Marsaglia and W.W. Tsang, A fast, easily implemented method for sampling from decreasing or symmetric unimodal density functions, SIAM J. Sci. Stat. Comput., Vol. 5, No. 2, June 1984.
48
7. Conversions
The alias method for generating discrete variates
Suggested by A. J. Walker in the 1970s
First convert a histogram into a rectangle.
Then stack up the bars into a unit rod.
[Figure: histogram over the discrete outcomes 0 to 4, vertical axis from 0 to 1, levelled into a rectangle of height 0.2]
49
7. Conversions
The alias method
Sample from the rod using a single U.
First find out which bar the U lands in.
Then determine whether it lands on the upper or the lower segment.
// First set up V[ ] and K[ ]
u = U;
L = 1 + floor(5*u);
if (u > V[L]) then return(K[L]); else return(L);
[Figure: unit rod from 0 to 1 cut into five bars for the outcomes 0 to 4; example landings return 0 and 4]
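A minimal Python sketch of the alias method with 0-based outcomes. The setup shown is the common "pair a short bar with a tall bar" construction, which is an assumption here since the slides do not give the setup; the names `build_alias`, `alias_sample`, `cut` and `alias` are hypothetical (they play the roles of V[ ] and K[ ], except that `cut` is stored relative to each bar rather than as a position on the rod).

```python
def build_alias(probs):
    """Level a histogram into n equal bars, each split between at most 2 outcomes."""
    n = len(probs)
    scaled = [p * n for p in probs]       # bar heights relative to 1/n
    cut = [1.0] * n                       # fraction of bar i that returns i
    alias = list(range(n))                # outcome returned by the rest of bar i
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        cut[s], alias[s] = scaled[s], l   # top up short bar s from tall bar l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    return cut, alias


def alias_sample(cut, alias, u):
    """Map a single uniform u in [0, 1) to an outcome."""
    n = len(cut)
    position = u * n
    bar = min(int(position), n - 1)       # which bar u lands in
    return bar if position - bar < cut[bar] else alias[bar]


cut, alias = build_alias([0.1, 0.2, 0.4, 0.2, 0.1])
print(cut, alias)
```

After the setup, each sample costs one multiplication, one table index and one comparison, regardless of the number of outcomes.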
50
7. Conversions
The straightforward table look-up method for generating discrete variates
Let the distribution of Y be
Pr[Y=1] = 0.345
Pr[Y=2] = 0.103
Pr[Y=3] = 0.276
Pr[Y=4] = 0.050
Pr[Y=5] = 0.226
Generation
Y = V[floor(1000*U)]
V is a table of 1000 entries, indexed 0 to 999, holding 345 1's, 103 2's, 276 3's, 50 4's and 226 5's:
11.....122...233........344..455........5
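The single-table method is short enough to write out in full. A sketch in Python, where the names `COUNTS`, `V` and `table_lookup` are hypothetical:

```python
# Number of table entries per outcome: probability times 1000.
COUNTS = {1: 345, 2: 103, 3: 276, 4: 50, 5: 226}

# V holds each outcome repeated according to its probability.
V = [y for y, count in COUNTS.items() for _ in range(count)]


def table_lookup(u):
    """Map a uniform u in [0, 1) to an outcome with one multiplication and one index."""
    return V[int(1000 * u)]


print(len(V), table_lookup(0.0), table_lookup(0.5), table_lookup(0.999))
```

The cost is memory: the table size grows by a factor of ten for every extra decimal digit of precision in the probabilities.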
51
7. Conversions
The table look-up method
Split each probability by decimal digit, and build one small table per digit position:
Pr[Y=1] = 0.3 + 0.04 + 0.005
Pr[Y=2] = 0.1 + 0.00 + 0.003
Pr[Y=3] = 0.2 + 0.07 + 0.006
Pr[Y=4] = 0.0 + 0.05 + 0.000
Pr[Y=5] = 0.2 + 0.02 + 0.006
Sum     = 0.8 + 0.18 + 0.020
V1 (8 entries, indices 0 to 7): 1 1 1 2 3 3 5 5
V2 (18 entries, indices 0 to 17): 1 1 1 1 3 3 ... 3 4 ... 4 5 5
V3 (20 entries, indices 0 to 19): 1 1 ... 1 2 2 2 3 3 ... 3 5 5 ... 5
u = U;
if u < 0.8 return( V1[ floor(10*u) ] );
if u < 0.98 return( V2[ floor(100*(u-0.8)) ] );
otherwise return( V3[ floor(1000*(u-0.98)) ] )
G. Marsaglia, W. W. Tsang and J. Wang, Fast Generation of Discrete Random Variables, Journal of Statistical Software, Volume 11, Issue 3, July, 2004.
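The three-table scheme replaces the 1000-entry table with 8 + 18 + 20 = 46 entries. The sketch below is a variant that works on the integer floor(1000 U) instead of on U itself, an assumption made so that the branch boundaries at 0.8 and 0.98 are exact and free of floating-point rounding; the function name `digit_table_lookup` is hypothetical.

```python
from collections import Counter

# One table per decimal digit position of the probabilities.
V1 = [1, 1, 1, 2, 3, 3, 5, 5]                  # tenths:      0.3 0.1 0.2 0.0 0.2
V2 = [1] * 4 + [3] * 7 + [4] * 5 + [5] * 2     # hundredths:  .04 .00 .07 .05 .02
V3 = [1] * 5 + [2] * 3 + [3] * 6 + [5] * 6     # thousandths: 5 3 6 0 6


def digit_table_lookup(i):
    """Map an integer i uniform on 0..999 (i.e. floor(1000*U)) to an outcome."""
    if i < 800:
        return V1[i // 100]          # same as V1[floor(10*u)]
    if i < 980:
        return V2[(i - 800) // 10]   # same as V2[floor(100*(u - 0.8))]
    return V3[i - 980]               # same as V3[floor(1000*(u - 0.98))]


# Enumerating all 1000 inputs recovers the target distribution exactly.
print(sorted(Counter(digit_table_lookup(i) for i in range(1000)).items()))
```

Most samples (80%) are resolved by the first and smallest table, so the average cost stays close to that of the single-table method.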
52
8. A summary
For simulation, software testing, randomization or games, use Mersenne Twister or KISS.
For cryptography, use the RNGs recommended in the standards. To boost security, combine them with the HAVEGE or BBS generator, or both.
For goodness-of-fit testing, use the AD test.
For testing RNGs, use the new version of Diehard.
For generating continuous variates, use the Monty Python method.
For generating discrete variates, use the alias method or the table look-up methods.
Times for generating 100M numbers (sec)
RNG               Time
Mersenne Twister  3.296
KISS              3.235
HAVEGE            2.281