33
Massively Parallel Random Number Generators Holger Dammertz | September 15, 2010

Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

  • Upload
    others

  • View
    4

  • Download
    1

Embed Size (px)

Citation preview

Page 1: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Massively Parallel Random Number Generators

Holger Dammertz | September 15, 2010

Page 2: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 2 Massively Parallel Random Number Generators | Overview | September 15, 2010

Overview

Applications for Pseudo-Random NumbersI Monte Carlo Simulation, Integration

I Test and Content Generation

I These applications are often easily parallelizableI MRIP paradigm: multiple replications in parallel

Generating Pseudo-Random NumbersI Generate i.i.d. uniform random numbers (for example 32bit)

I Transform into (0,1)

I Additional transformation to target distribution (for example, normaldistributed)

Page 3: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 3 Massively Parallel Random Number Generators | Overview | September 15, 2010

Structure of a RNG

Formal Definition [L’E06]

(S ,µ, f ,U,g)

I S state space

I µ prob. distr. on S to select initial state (seed) s0 ∈ S

I f : S → S transition function

I g : S → U output function

I U = (0,1) output space

I si = f (si−1), i ≥ 1 and ui = g(si )

Page 4: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 4 Massively Parallel Random Number Generators | Overview | September 15, 2010

Structure of a RNG

In a parallel Implementation

(S ,µ, f ,U,g)

I S needed per generating stream (usually per thread)

I S if possible hold in fast memory

I S store in global memory after finishing for multiple calls

I f and g are device functions (usually in a single function)

I output space often unsigned int → transform needed

Example: Linear Congruence Generator LCG [Knu81]

I si ∈ S is an integer (for example 32bit), U = S ,g = id

I f : si = (asi−1 + c) mod m

I Needs well chosen a, c and m

Page 5: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 5 Massively Parallel Random Number Generators | Overview | September 15, 2010

Structure of a RNG

Required properties of a RNGI Speed

I Repeatability

I Minimal statistical bias

Additional propertiesI Random access on uiI Independent number streams

I Long period

Page 6: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 6 Massively Parallel Random Number Generators | Overview | September 15, 2010

Structure of a RNG

Why is a long period important?

From [SPM05]: for a cycle length of n a single simulation should use at most

16 3√n

random numbers (to trust the results of statistical simulation).Assume period of 248 and simple parallel cuda app with 4096 threads:

163√

248

4096≈ 256

random numbers per thread.

Page 7: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 7 Massively Parallel Random Number Generators | Overview | September 15, 2010

Choice of RNG parameters are important

Simple LCG, 210−3 points

Page 8: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 8 Massively Parallel Random Number Generators | Parallelization Techniques | September 15, 2010

Two different views on parallel random numbers

1) Single Stream for all ThreadsI Each thread computes parts of one problem

I Result should not depend on number of threads and should be repeatable

I Ideally RNG update is also parallel (→ speed)

Page 9: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 9 Massively Parallel Random Number Generators | Parallelization Techniques | September 15, 2010

Two different views on parallel random numbers

2) Each Thread uses it’s own RNGI Each thread computes individual solutions

I Need to guarantee independence of streams

I If you could have a true RNG both methods would behave exactly the same(but repeatability would be lost).

I Using Pseudo RNGs this needs to be explicitly designed in the program

Page 10: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 10 Massively Parallel Random Number Generators | Parallelization Techniques | September 15, 2010

Parallelizing RNGs

Random SeedingI Easy to implement

I Generally very bad parallelization methodI Need to seed valid statesI No guarantee of independence

I Can work for generators with a long period

I Better alternative: well chosen seeds (per thread)I For example: Mersenne Twister dcmt libraryI New GPU Mersenne Twister (MTGP) provides seed tables

ParametrizationI n different RNG configurations for n threads

I Needs to be especially developed and tested

I Often restricted to specific n

Page 11: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 11 Massively Parallel Random Number Generators | Parallelization Techniques | September 15, 2010

Parallelizing RNGs

Block SplittingI Can guarantee non overlapping sequences of length m

I m needs to be known in advance

I Starting states need to be known or computed

Page 12: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 12 Massively Parallel Random Number Generators | Parallelization Techniques | September 15, 2010

Parallelizing RNGs

Leap-FroggingI Needs RNG to be able to skip n numbers (or else quite inefficient)

Page 13: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 13 Massively Parallel Random Number Generators | Application Scenarios | September 15, 2010

Application Design Considerations

On the fly Computation

Compute RN when needed in kernel

I State needs to be stored per stream

I RNG uses additional resources (registers and memory)

I Needs properly parallelized implementation

Pre-computation

Store RN in main memory

I Only memory access per RN

I Easily parallelizable (same access as leap frog)

I Memory requirements can be huge

I Bandwidth need to be considered

Page 14: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 14 Massively Parallel Random Number Generators | Application Scenarios | September 15, 2010

Upload vs. Computation Example

Intel(R) Xeon(R) E5420 @ 2.50GHz, GTX 480, 256MB random numbers,Mersenne Twister: MTGP/SFMT, 214 threads, blocksize 256

PrecomputationI Upload CPU-computed RN

I Precompute on CPU (SFMT): 180msI Upload: 44ms

I Precompute on GPUI Precompute on GPU (MTGP): 9.18ms

I Consume (Asian Options): 783.5ms

Produce and consume (MTGP)I Single Kernel: 1058.1ms

Page 15: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 15 Massively Parallel Random Number Generators | Overview of some RNGs | September 15, 2010

Overview of some RNGs

LCGs

si = (asi−1 + c) mod m

I Combining multiple LCGs can give longer period

I Independent streams: Wichmann-Hill (273 threads)

Multiple Recursive Generator

si =

(k

∑ξ=1

aξ si−ξ

)mod m

I Larger period (for k=1 equal to LCG)

I Blocking: MRG32k3a [LSCK02]

Page 16: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 16 Massively Parallel Random Number Generators | Overview of some RNGs | September 15, 2010

Overview of some RNGs

RNGs based on Cryptographic functionsI Creates white noise from input

I MD5 [TW08]: hash function

I Tiny Encryption Algorithm (TEA) [ZOC10]

I Different configuration per thread (parametrization)I Transform counter and thread id

I Random Access in the sequence

Mersenne TwisterI New GPU version [Sai10]

I Long period: 32bit version provides 211213−1,223209−1,244497−1

I Good seeding strategies (see MTGPDC)

Page 17: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 17 Massively Parallel Random Number Generators | Testing of RNGs | September 15, 2010

Testing RNGs

Why use testsI Implementation of RNGs is very sensitive

I Before using any RNG implementation it should be tested

I Failed tests: very likely bad sequence

I Passed tests: guarantees nothing

Test SuitsI Use statistical tests to find flaws

I DIEHARD + NIST (both integrated in DIEHARDER [Bro09])

I TestU01 [LS07]

Page 18: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 18 Massively Parallel Random Number Generators | Our Collection | September 15, 2010

Collection of Parallel Random Number Generators

Selecting suitable RNGsI Domain specific problem

I Depends on current compiler and hardware

Our CollectionI CUDA implementation of several different RNGs

I Easy to use

I Currently tested on Linux

I Most of the code: MIT License

I Copy-paste ready code

Available athttp://mprng.sf.net

Page 19: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 19 Massively Parallel Random Number Generators | Our Collection | September 15, 2010

Collection of Parallel Random Number Generators

Integrated BenchmarkI Benchmarks compiles and runs on your machine

I Configure threads and blocks according to your target application

Integrated Test SuitsI DIEHARDER and TestU01

I Automatic report generation

Easy to extendI Integrate your own RNG

I Integrate your own tests

Page 20: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 20 Massively Parallel Random Number Generators | Some Results | September 15, 2010

Comparisons: TestU01 SmallCrush Battery

RNG Name Bir

thd

ayS

pac

ings

Col

lisio

n

Gap

Sim

pP

oker

Cou

pon

Col

lect

or

Max

Oft

Max

Oft

AD

Wei

ghtD

istr

ib

Mat

rixR

ank

Ham

min

gIn

dep

Ran

dom

Wal

k1H

Ran

dom

Wal

k1M

Ran

dom

Wal

k1J

Ran

dom

Wal

k1R

Ran

dom

Wal

k1C

combinedLCGTaus P P P P P P P P P P P P P P P

drand48gpu P P P P P F F P P P F F F F P

kiss07 P P P P P P P P P P P P P P P

lfsr113 P P P P P P P P P P P P P P P

md5 P P P P P P P P P P P P P P P

mtgp P P P P P P P P P P P P P P P

park miller F F P P P F F P P P F F F F F

ranecu P F P P P F F P P P F F F F F

tea P P P P P P P P P P P P P P P

tt800 P P P P P P P P P P P P P P P

Page 21: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 21 Massively Parallel Random Number Generators | Some Results | September 15, 2010

Raw Performance Test

0

0.005

0.01

0.015

0.02

0.025

0.03

0.035

combinedLCGTaus

drand48gpu

kiss07

lfsr113

mtgp

park_miller

ranecu

tt800

seco

nds

Raw Performance, GeForce GTX 480, 335544320 Random Numbers

Page 22: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 21 Massively Parallel Random Number Generators | Some Results | September 15, 2010

Raw Performance Test

0

0.05

0.1

0.15

0.2

0.25

0.3

combinedLCGTaus

drand48gpu

kiss07

lfsr113

md5

mtgp

park_miller

ranecu

teatt800

seco

nds

Raw Performance, GeForce GTX 480, 335544320 Random Numbers

Page 23: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 22 Massively Parallel Random Number Generators | Some Results | September 15, 2010

Performance Test: Asian Option Example from [HT07]

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

0.45

0.5

combinedLCGTaus

drand48gpu

kiss07

lfsr113

mtgp

park_miller

ranecu

tt800

seco

nds

Asian Options, GeForce GTX 480, 96 runs, 163840 simulations

Page 24: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 22 Massively Parallel Random Number Generators | Some Results | September 15, 2010

Performance Test: Asian Option Example from [HT07]

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

combinedLCGTaus

drand48gpu

kiss07

lfsr113

md5

mtgp

park_miller

ranecu

teatt800

seco

nds

Asian Options, GeForce GTX 480, 96 runs, 163840 simulations

Page 25: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 23 Massively Parallel Random Number Generators | Some Results | September 15, 2010

Quality vs. Speed

Reduced rounds of MD5/TEA RNGI For some applications speed more important than quality

I MD5 and TEA allow to reduce number of iterationsI Quality of random numbers may degradeI Avalanche effect

I MD5: passes most of the DIEHARDER tests after 16 rounds

Page 26: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 24 Massively Parallel Random Number Generators | Some Results | September 15, 2010

Quality vs. Speed, TEA, DIEHARDER tests

0 5

10 15 20 25 30 35 40 45 50 55 60 65 70 75 80 85 90 95

100 105 110 115 120

2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32

Die

hard

er te

sts

Number of rounds

Tiny Encryption Algorithm

weakpassed

Page 27: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 25 Massively Parallel Random Number Generators | Some Results | September 15, 2010

Quality vs. Speed, TEA performance

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32

Run

time

in m

s

Number of rounds

Raw Performance, 335544320 Random Numbers

GTX480, Tiny Encryption Algorithm

Page 28: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 26 Massively Parallel Random Number Generators | Some Results | September 15, 2010

Quality vs. Speed, MD5 performance

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0 4 8 12 16 20 24 28 32 36 40 44 48 52 56 60 64

Run

time

in m

s

Number of rounds

Raw Performance, 335544320 Random Numbers

GTX480, MD5

Page 29: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 27 Massively Parallel Random Number Generators | Conclusion | September 15, 2010

Conclusion

I Many good (parallel) RNGs exist

I Several different properties

I Choice of fitting RNG application dependent

Some PicksI KISS

+ Simple code and state management− Random seeding: may be ok for non-critical applications

I MTGPU

+ Very good quality and sophisticated seeding+ Long period− Relatively complex code− Fixed block/thread layout

I MD5, TEA

+ Random access+ No Seeding− Slow

Page 30: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 28 Massively Parallel Random Number Generators | Conclusion | September 15, 2010

Conclusion

Precomputing on GPUI May be an alternative to in kernel computation

RNG CollectionI Always evaluate your RNG choice and implementation

I Our framework provides an easy platform for testinghttp://mprng.sf.net

Questions?

AcknowledgementsI Framework: Christoph Schied

I This work was supported by a NVIDIA Professor Partnership Award with Hendrik P.A.Lensch

Page 31: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 28 Massively Parallel Random Number Generators | Conclusion | September 15, 2010

Conclusion

Precomputing on GPUI May be an alternative to in kernel computation

RNG CollectionI Always evaluate your RNG choice and implementation

I Our framework provides an easy platform for testinghttp://mprng.sf.net

Questions?

AcknowledgementsI Framework: Christoph Schied

I This work was supported by a NVIDIA Professor Partnership Award with Hendrik P.A.Lensch

Page 32: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 28 Massively Parallel Random Number Generators | References | September 15, 2010

Robert G. Brown.Dieharder: A random number test suite.http://www.phy.duke.edu/~rgb/General/dieharder.php, 2009.

Lee Howes and David Thomas.Efficient Random Number Generation and Application Using CUDA.In GPU Gems 3, pages 805–830, 2007.

D.E. Knuth.The art of computer programming: Seminumerical algorithms, volume 2,1981.

P. L’Ecuyer.Uniform random number generation.Handbooks in Operations Research and Management Science, 13:55–81,2006.

P. L’Ecuyer and R. Simard.TestU01: AC library for empirical testing of random number generators.ACM Transactions on Mathematical Software (TOMS), 33(4):22, 2007.

Pierre L’Ecuyer, Richard Simard, E. Jack Chen, and W. David Kelton.An object-oriented random-number package with many long streams andsubstreams.

Page 33: Massively Parallel Random Number Generators · Page 4 Massively Parallel Random Number Generators jOverview jSeptember 15, 2010 Structure of a RNG In a parallel Implementation (S;m;f;U;g)

Page 28 Massively Parallel Random Number Generators | References | September 15, 2010

Oper. Res., 50(6):1073–1075, 2002.

Mutsuo Saito.A variant of mersenne twister suitable for graphic processors.CoRR, abs/1005.4973, 2010.

Marcus Schoo, Krzysztof Pawlikowski, and Donald C. Mcnickle.A survey and empirical comparison of modern pseudo-random numbergenerators for distributed stochastic simulations, 2005.

S. Tzeng and L.Y. Wei.Parallel white noise generation on a GPU via cryptographic hash.In Proceedings of the 2008 symposium on Interactive 3D graphics and games,pages 79–87. ACM, 2008.

F. Zafar, M. Olano, and A. Curtis.GPU Random Numbers via the Tiny Encryption Algorithm.2010.