
APM 541: Stochastic Modelling in Biology
Random Number Generation

Jay Taylor (ASU)
Fall 2013

Motivation

A Fairly Simple Stochastic SIR Model

Suppose that we are interested in the spread of a communicable infectious disease through a population. Let us make the following assumptions:

The infection is non-lethal.

The population contains $N$ individuals and the timescale of the epidemic is short enough that we can neglect births and deaths. We will also assume that the population is closed so that we can ignore both immigration and emigration.

The probability that any two individuals come into contact on a given day is $\gamma$. Furthermore, contacts occur independently between different pairs of individuals.

Given that a susceptible individual comes into contact with an infected individual, the probability that they are infected is $\beta$.

Infected individuals recover with probability $\rho$ per day and recovery provides complete and lasting immunity against subsequent infections.



Before proceeding further, we introduce the following notation. Let $S_t$, $I_t$, and $R_t$ denote the numbers of susceptible, infected, and recovered individuals on day $t$. Under the assumptions made above, we know that

$$S_t + I_t + R_t = N.$$

However, these variables change at random from day to day and we would like to understand these dynamics. We first calculate the probability that an individual who is susceptible on day $t$ is infected on that day. In fact, it will be easier to calculate the probability that they are not infected on day $t$.

To escape infection on a given day, a susceptible individual must avoid being infected by each of the $I_t$ infected individuals that are present in the population on that day. For each infected individual, this can happen either because they avoid contact altogether (probability $1 - \gamma$) or because they have contact but no transmission occurs (probability $\gamma(1 - \beta)$). Since contacts and infections occur independently of one another, the probability that the susceptible individual is not infected is:

$$P(\text{not infected}) = (1 - \gamma + \gamma(1 - \beta))^{I_t} = (1 - \gamma\beta)^{I_t}.$$



Since each susceptible individual either does or does not contract the disease independently of the others on a given day, the number of new infections that occur on day $t$, say $w_t$, is binomially distributed with parameters $S_t$ and $1 - (1 - \gamma + \gamma(1 - \beta))^{I_t}$.

Similarly, since each infected individual recovers independently of the others with probability $\rho$ on a given day, the number of recoveries that occur on day $t$, say $r_t$, is binomially distributed with parameters $I_t$ and $\rho$.

Using these observations, we can write the following stochastic recursive equations for our population:

$$\begin{aligned}
S_{t+1} &= S_t - w_t \\
I_{t+1} &= I_t + w_t - r_t \\
R_{t+1} &= R_t + r_t \\
w_t &\sim \text{Binomial}\left(S_t,\ 1 - (1 - \gamma + \gamma(1 - \beta))^{I_t}\right) \\
r_t &\sim \text{Binomial}(I_t, \rho)
\end{aligned}$$

As simplistic as this model is, there are few exact analytical results and many questions can only be addressed through Monte Carlo simulations.
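The recursion above translates almost directly into code. The following is a minimal Python sketch, not the code used to produce the lecture's figures: the function names `binomial` and `simulate_sir` are illustrative, and the binomial draws are made by summing Bernoulli trials, which is adequate for small populations.

```python
import random


def binomial(n, p, rng):
    """Draw a Binomial(n, p) variate as the number of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_sir(N, I0, gamma, beta, rho, days, seed=None):
    """Simulate one sample path (S_t, I_t, R_t), t = 0..days, of the stochastic SIR model."""
    rng = random.Random(seed)
    S, I, R = N - I0, I0, 0
    path = [(S, I, R)]
    for _ in range(days):
        # Probability that a given susceptible is infected today.
        p_infect = 1.0 - (1.0 - gamma + gamma * (1.0 - beta)) ** I
        w = binomial(S, p_infect, rng)  # new infections
        r = binomial(I, rho, rng)       # recoveries
        S, I, R = S - w, I + w - r, R + r
        path.append((S, I, R))
    return path


if __name__ == "__main__":
    path = simulate_sir(N=100, I0=1, gamma=0.02, beta=0.3, rho=1 / 3, days=60, seed=1)
    print(path[-1])
```

Repeating the call with different seeds and recording `N - S` on the final day would give an empirical distribution of the total number of infections.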



Monte Carlo Simulations of the Stochastic SIR Model

[Figure: Stochastic SIR Dynamics. Curves for S, I, and R plotted against time (days). A single sample path of the stochastic SIR model: $N = 100$, $I_0 = 1$, $\gamma = 0.02$, $\beta = 0.3$, $\rho = 0.3333$.]

[Figure: Distribution of the total number of infections in $10^4$ simulations of the stochastic SIR model.]


Pseudorandom Number Generators


Although hardware exists that can be used to generate 'truly' random sequences of numbers, most computer simulations of random processes rely on pseudorandom number generators. Informally, a pseudorandom number generator is an algorithm that can be used to generate a sequence of numbers that appears to be random.

The initial value of the sequence is determined by a seed value that can be selected by the user.

Given the seed value, the sequence of pseudorandom values is fully deterministic, i.e., each time that we choose a particular seed, we recover exactly the same sequence of pseudorandom values.

The pseudorandom sequence satisfies certain statistical tests of randomness. For example, for a sufficiently large class of sets $E$ and for small values of $\varepsilon > 0$, there exists a positive integer $N$ such that whenever $n \ge N$,

$$\left| \frac{\#\{i \le n : X_i \in E\}}{n} - P(E) \right| < \varepsilon.$$



Example: Linear congruential generators (LCGs) generate pseudorandom sequences of integers using the following recursion:

$$X_{n+1} = (aX_n + c) \bmod m.$$

Here $a, c, m$ are non-negative integers which are chosen so that the sequence $X_0, X_1, \ldots$ appears to be uniformly distributed on the set $\{0, 1, \ldots, m - 1\}$, with each variable seemingly independent of the others.

One of the strengths of this class of generators is that they are very fast.

One of the limitations is that the sequences are periodic, with a period that is at most $m$ and that may depend on the choice of seed.

A sequence of pseudorandom standard uniform variables can be obtained by dividing through by $m$, i.e., let $U_n = X_n/m$.

The pseudorandom number generator ran1 recommended in the Numerical Recipes books (Press et al., 1992) combines the "minimal standard" LCG of Park and Miller (1988) with a pseudorandom shuffle of blocks of 32 consecutive iterates.
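To make the recursion concrete, here is a small Python sketch of an LCG. The default parameters $a = 16807$, $c = 0$, $m = 2^{31} - 1$ are those of the Park and Miller minimal standard generator; the function names are illustrative, and the shuffle used by ran1 is not implemented.

```python
def lcg(seed, a=16807, c=0, m=2**31 - 1):
    """Yield X_1, X_2, ... from the recursion X_{n+1} = (a * X_n + c) mod m, with X_0 = seed."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def lcg_uniforms(seed, count, a=16807, c=0, m=2**31 - 1):
    """Return `count` pseudorandom uniforms U_n = X_n / m."""
    gen = lcg(seed, a, c, m)
    return [next(gen) / m for _ in range(count)]


if __name__ == "__main__":
    print(lcg_uniforms(seed=12345, count=5))
```

With $c = 0$ the seed must not be a multiple of $m$, since the sequence would otherwise be stuck at zero.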



Although LCGs can be effective pseudorandom number generators, it is crucial that the coefficients $a, c, m$ are chosen carefully. Failure to do so can result in sequences of numbers that have strong serial correlations. For example, one algorithm that was widely used in the 1960s was an LCG called RANDU, which produced sequences satisfying the recursion

$$U_{n+2} = (6U_{n+1} - 9U_n) \bmod 1.$$

In other words, given any two consecutive iterates of this method, the third could be predicted using a simple linear formula. This hardly looks random by any set of criteria and use of this algorithm in Monte Carlo simulations may have compromised the results of many scientific papers from the period.

See: G. Marsaglia, 1968, "Random Numbers Fall Mainly in the Planes", PNAS 61:25-28.
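The linear relation can be checked numerically. The sketch below assumes the usual definition of RANDU, $X_{n+1} = 65539\,X_n \bmod 2^{31}$ (these constants are not given on the slide), and counts how many consecutive triples violate the integer form of the recursion, $X_{n+2} = (6X_{n+1} - 9X_n) \bmod 2^{31}$.

```python
def randu(seed, count):
    """Return `count` iterates of RANDU: X_{n+1} = 65539 * X_n mod 2**31."""
    m = 2**31
    x = seed
    out = []
    for _ in range(count):
        x = (65539 * x) % m
        out.append(x)
    return out


def count_violations(xs, m=2**31):
    """Count triples for which X_{n+2} differs from (6 X_{n+1} - 9 X_n) mod m."""
    return sum(
        1
        for n in range(len(xs) - 2)
        if xs[n + 2] != (6 * xs[n + 1] - 9 * xs[n]) % m
    )


if __name__ == "__main__":
    print(count_violations(randu(seed=1, count=1000)))  # prints 0
```

The count is zero because $65539^2 = (2^{16} + 3)^2 \equiv 6 \cdot 65539 - 9 \pmod{2^{31}}$.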



Modern pseudorandom number generators tend to use algorithms that are more sophisticated than the LCG. Arguably the most important is the Mersenne twister, which is the default random number generator implemented in MATLAB and R.

The Mersenne twister is based on a matrix linear recursion over a finite field.

There are several implementations that differ in word length. The MATLAB implementation (known as MT19937ar) generates 32-bit integers, which are approximately uniformly distributed on the set $\{0, 1, \ldots, 2^{32} - 1\}$, where $2^{32} - 1 \approx 4 \times 10^{9}$. The period is $2^{19937} - 1 \approx 4 \times 10^{6001}$. (For comparison, the age of the universe is estimated to be approximately $4.35 \times 10^{17}$ seconds.)



Some of the relevant MATLAB commands to generate pseudorandom sequences of uniformly distributed variates are listed below:

rand returns a pseudorandom number from the standard uniform distribution U[0, 1]; rand(1,n) generates an array of n independent samples from this distribution.

randi(N) returns a pseudorandom integer drawn from the discrete uniform distribution on {1, ..., N}. randi(N,1,n) generates an array of n independent samples with this distribution.

The default seed used by MATLAB for the Mersenne twister is 0; this is used whenever a new MATLAB session is invoked. You can also reset the seed to 0 by entering the command rng default.

The command rng shuffle can be used to initialize the twister with a 'random seed' that is chosen based on the current clock time.
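The effect of the seed can be illustrated outside MATLAB as well. Python's standard `random` module also uses the Mersenne twister, so the following sketch shows the same behavior: a fixed seed reproduces the same sequence. The helper name `draws` is illustrative, and because the seeding procedures differ, the numbers produced will not match those of MATLAB for the same seed.

```python
import random


def draws(seed, count):
    """Return `count` standard uniform variates from a generator initialized with `seed`."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


if __name__ == "__main__":
    print(draws(0, 3) == draws(0, 3))  # same seed, same sequence
    print(draws(0, 3) == draws(1, 3))  # different seeds, different sequences
```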


Non-uniform Variable Generation

Non-Uniform Random Variable Generation

Computing software like MATLAB and R also include functions to generate random variables with many of the familiar non-uniform distributions. The MATLAB commands used to sample from some of the more common distributions are listed below:

Distribution               Support           MATLAB command
Binomial(n, p)             {0, 1, ..., n}    binornd(n,p)
Geometric(p)               {0, 1, 2, ...}    geornd(p)
Negative Binomial(r, p)    {0, 1, 2, ...}    nbinrnd(r,p)
Poisson(lambda)            {0, 1, 2, ...}    poissrnd(lambda)
Beta(a, b)                 [0, 1]            betarnd(a,b)
Exponential(lambda)        [0, ∞)            exprnd(1/lambda)
Gamma(a, lambda)           [0, ∞)            gamrnd(a,1/lambda)
Normal(mu, sigma^2)        (−∞, ∞)           normrnd(mu,sigma)

Here lambda denotes a rate parameter. MATLAB's exprnd expects the mean and gamrnd expects the scale parameter, hence the argument 1/lambda; normrnd expects the standard deviation sigma rather than the variance sigma^2.

How are these distributions sampled, and what do we do if we need to sample from a distribution that isn't supported by our chosen software?



The inversion method for sampling non-uniform variates is based on the following result.

Theorem. Let $X$ be a real-valued random variable with values in $[a, b] \subseteq \mathbb{R}$ and cumulative distribution function $F(x) = P(X \le x)$. Suppose that $F$ is a continuous, one-to-one function from $[a, b]$ onto $[0, 1]$ with inverse $F^{-1} : [0, 1] \to [a, b]$. If $U$ is a standard uniform random variable, then

$$Y = F^{-1}(U)$$

has the same distribution as $X$.

Proof: If $x \in [a, b]$, then

$$P(Y \le x) = P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x),$$

since $F$ is invertible and increasing and $U$ is uniform on $[0, 1]$. Thus $Y$ has the same cumulative distribution function as $X$ and therefore the same distribution.



Example: The cumulative distribution function of an exponentially distributed random variable $X$ with parameter $\lambda$ is

$$F(x) = 1 - e^{-\lambda x}.$$

Since $F$ is invertible with inverse

$$F^{-1}(x) = -\frac{1}{\lambda}\log(1 - x),$$

the inversion method can be used to simulate a random variable with the desired exponential distribution by setting

$$X = -\frac{1}{\lambda}\log(1 - U).$$

In fact, because $1 - U$ is also a standard uniform random variable, it suffices to take

$$X = -\frac{1}{\lambda}\log(U).$$
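A minimal Python sketch of this example follows; the function name `sample_exponential` is illustrative. It uses the form $-\log(1 - U)/\lambda$ rather than $-\log(U)/\lambda$ because Python's `random()` returns values in $[0, 1)$, so $1 - U$ is never zero and the logarithm is always defined.

```python
import math
import random


def sample_exponential(rate, rng=random):
    """Sample an Exponential(rate) variate by inverting F(x) = 1 - exp(-rate * x)."""
    u = rng.random()  # uniform on [0, 1)
    return -math.log(1.0 - u) / rate


if __name__ == "__main__":
    rng = random.Random(42)
    xs = [sample_exponential(2.0, rng) for _ in range(10000)]
    print(sum(xs) / len(xs))  # should be close to the mean 1 / rate = 0.5
```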



Remarks:

If $U_1, U_2, \ldots$ is a sequence of pseudorandom standard uniform variates, then $F^{-1}(U_1), F^{-1}(U_2), \ldots$ will be a sequence of pseudorandom variates with distribution $F$.

The weakness of the inversion method is that we need to be able to calculate the inverse of the cumulative distribution function $F(x)$. This is problematic because the inverse may not exist in closed form, in which case it will need to be evaluated numerically. For example, the c.d.f. of the standard normal distribution,

$$F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-t^2/2}\, dt,$$

is invertible, but the inverse has no simple expression.

The inversion method is one of a general class of transformation methods, in which we generate a random variable $Y$ with one distribution and then transform it to obtain a variable $X = \Phi(Y)$ with another distribution. The Box-Muller method described next is an example.



Box-Muller Method

The Box-Muller transform uses a pair of independent standard uniform random variables $U_1, U_2$ to generate a pair of independent standard normal random variables $Z_1, Z_2$. It is based on the observation that if we let $(R, \Theta)$ denote the polar coordinates of the random vector $(Z_1, Z_2)$,

$$R = \sqrt{Z_1^2 + Z_2^2}, \qquad \Theta = \arctan(Z_2/Z_1) \in [0, 2\pi),$$

where the branch of the arctangent is chosen according to the quadrant of $(Z_1, Z_2)$, then $R$ and $\Theta$ are independent random variables, $R^2$ is exponentially distributed with parameter $1/2$, and $\Theta$ is uniform on $[0, 2\pi)$. Furthermore, if we invert this transformation by setting

$$Z_1 = R\cos(\Theta) = \sqrt{-2\log(U_1)}\,\cos(2\pi U_2)$$

$$Z_2 = R\sin(\Theta) = \sqrt{-2\log(U_1)}\,\sin(2\pi U_2),$$

then $Z_1$ and $Z_2$ will be independent standard normal random variables.
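The transform is short enough to write out directly. The following Python sketch (the function name `box_muller` is illustrative) replaces $U_1$ by $1 - U_1$, which has the same distribution, so that the logarithm is never evaluated at zero.

```python
import math
import random


def box_muller(rng=random):
    """Return a pair of independent standard normal variates via the Box-Muller transform."""
    u1 = 1.0 - rng.random()  # uniform on (0, 1], so log(u1) is finite
    u2 = rng.random()
    r = math.sqrt(-2.0 * math.log(u1))  # R^2 is exponential with rate 1/2
    theta = 2.0 * math.pi * u2          # Theta is uniform on [0, 2*pi)
    return r * math.cos(theta), r * math.sin(theta)


if __name__ == "__main__":
    rng = random.Random(0)
    zs = [z for _ in range(5000) for z in box_muller(rng)]
    mean = sum(zs) / len(zs)
    var = sum((z - mean) ** 2 for z in zs) / len(zs)
    print(mean, var)  # should be close to 0 and 1
```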



Rejection sampling provides a powerful alternative to the inversion method. In its simplest form, it is based on the following result.

Theorem. Let $X$ be a continuous random variable with density $f(x)$ on $[a, b] \subseteq \mathbb{R}$ and let $Y$ be a continuous random variable with density $g(x)$ on this same interval. Suppose that there is a finite number $M$ such that

$$\max\left\{ \frac{f(x)}{g(x)} : a \le x \le b \right\} \le M.$$

Let $Y_1, Y_2, \ldots$ be a sequence of i.i.d. random variables with the same distribution as $Y$ and let $U_1, U_2, \ldots$ be an independent sequence of i.i.d. standard uniform random variables. If $N \ge 1$ is the smallest positive integer such that

$$U_N \le \frac{f(Y_N)}{M \cdot g(Y_N)},$$

then the variable $Z = Y_N$ has the same distribution as $X$.



Suppose that we wish to generate a random variable $Z$ with target density $f(x)$. This can be accomplished through the use of the following rejection sampling algorithm:

1. Set $t = 1$ and select a proposal distribution with density $g(x)$ and then choose a constant $M$ such that

$$M \ge \max\left\{ \frac{f(x)}{g(x)} : a \le x \le b \right\}.$$

2. Next, generate a pair of independent random variables $U_t$ and $Y_t$ such that $U_t$ is standard uniform and $Y_t$ has density $g(x)$. These variables should be independent of the other variables generated at previous steps of the algorithm.

3. If

$$U_t \le \frac{f(Y_t)}{M \cdot g(Y_t)},$$

then stop the algorithm and let $Z = Y_t$. Otherwise, increase $t$ to $t + 1$ and return to the second step.



Example: Suppose that we wish to sample from the distribution with density

$$f(x) = \frac{1}{C} e^{x} (1 + \sin(2\pi x)), \quad x \in [0, 1],$$

where $C \approx 1.4516$ is the normalizing constant. We will take the proposal distribution to be uniform on $[0, 1]$, so that $g(x) = 1$ for all $x \in [0, 1]$. Next, we need to choose $M$ so that

$$M \ge \max_{0 \le x \le 1} \frac{f(x)}{g(x)} = \max_{0 \le x \le 1} f(x).$$

Plotting the density $f(x)$ reveals that it attains its maximum over $[0, 1]$ at $x = 1$, where $f(1) \approx 1.8727$. However, since $C$ was determined numerically, it is safer to take $M$ slightly larger than this value, say $M = 1.9$.

Having chosen the proposal distribution and the constant $M$, we next need to generate two independent sequences of i.i.d. standard uniform random variables, $Y_1, Y_2, \ldots$ and $U_1, U_2, \ldots$, until the first time $N$ that the inequality

$$U_N \le \frac{1}{1.9} f(Y_N)$$

is satisfied. Then the variable $Z = Y_N$ has density $f(x)$.
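This example can be sketched in Python as follows. The constants `C` and `M` are the approximate values quoted above; the function names and the choice to return the number of proposals alongside the sample are illustrative additions.

```python
import math
import random

C = 1.4516  # approximate normalizing constant quoted in the text
M = 1.9     # bound on f(x) / g(x), with g(x) = 1 on [0, 1]


def f(x):
    """Target density on [0, 1] (normalized only approximately, since C is approximate)."""
    return math.exp(x) * (1.0 + math.sin(2.0 * math.pi * x)) / C


def rejection_sample(rng=random):
    """Return (Z, number of proposals used), where Z has density f."""
    proposals = 0
    while True:
        y = rng.random()  # proposal Y_t, uniform on [0, 1)
        u = rng.random()  # independent uniform U_t
        proposals += 1
        if u <= f(y) / M:
            return y, proposals


if __name__ == "__main__":
    rng = random.Random(7)
    results = [rejection_sample(rng) for _ in range(10000)]
    print(sum(n for _, n in results) / len(results))  # average proposals, close to M
```

The average number of proposals per accepted sample should be close to $M = 1.9$, since each proposal is accepted with probability $1/M$.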



Apart from the requirement that the support of the target distribution be contained in the support of the proposal distribution, the user has considerable flexibility in the choice of the proposal distribution. However, this choice will influence the efficiency of the algorithm. To minimize the average time $T_f$ required to generate a single sample with density $f(x)$, we should take into account the following criteria when designing the proposal distribution:

The proposal density $g(x)$ should be as similar to the target density $f(x)$ as possible.

The time required to sample from $g(x)$ should be as small as possible.

Unfortunately, there are usually trade-offs to be made in satisfying these two criteria, e.g., the more similar $g(x)$ is to $f(x)$, the harder it is to sample from $g(x)$. Indeed, if $N$ is the average number of samples required from the proposal distribution and if $T_g$ is the average time required to generate each such sample, then

$$T_f = N \times T_g.$$

Since each proposal is accepted with probability $1/M$, the number of proposals needed is geometrically distributed with mean $N = M$, so $M$ should be chosen as close to $\max f(x)/g(x)$ as possible.



Sampling from Discrete Distributions

Suppose that $X$ is a discrete random variable that takes values in the finite set $E = \{x_1, \ldots, x_n\}$ with probabilities $p_k = P(X = x_k) > 0$. The following algorithm can be used to generate a random variable $Z$ with the same distribution as $X$.

1. Define an increasing sequence $0 = F_0 < F_1 < F_2 < \cdots < F_{n-1} < F_n = 1$ by setting

$$F_k = \sum_{i=1}^{k} p_i.$$

2. Generate a standard uniform random variable $U$.

3. Let $Z = x_k$ for the unique value of $k$ such that $F_{k-1} < U \le F_k$.

Since $U$ is uniform on $[0, 1]$ and $F_k - F_{k-1} = p_k$, it follows that

$$P(Z = x_k) = P(F_{k-1} < U \le F_k) = F_k - F_{k-1} = p_k,$$

which shows that $Z$ has the same distribution as $X$.
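The three steps above can be sketched in Python as follows. The name `make_discrete_sampler` is illustrative; the cumulative sums are computed once and a binary search locates the index $k$ with $F_{k-1} < U \le F_k$. The final index is capped to guard against floating-point rounding that leaves the last cumulative sum slightly below 1.

```python
import bisect
import itertools
import random


def make_discrete_sampler(values, probs, rng=random):
    """Return a function that samples from `values` with probabilities `probs`."""
    cumulative = list(itertools.accumulate(probs))  # F_1, ..., F_n
    last = len(values) - 1

    def sample():
        u = rng.random()
        # Smallest index k (0-based) with cumulative[k] >= u.
        k = bisect.bisect_left(cumulative, u)
        return values[min(k, last)]

    return sample


if __name__ == "__main__":
    sample = make_discrete_sampler(["a", "b", "c"], [0.2, 0.5, 0.3], random.Random(3))
    print([sample() for _ in range(10)])
```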



Example: To generate a Bernoulli($p$) distributed random variable, let $U$ be a standard uniform random variable and set $X = 1$ if $U \le p$ and $X = 0$ otherwise.

Remarks:

This method works best when $n$ is small and the probabilities $p_k$ are easy to compute.

If we need to generate a large number of independent samples from the same discrete distribution, then the sequence $F_1, F_2, \ldots, F_n$ should be pre-computed and saved.

It may also be advantageous to put the elements of $E$ in order from most probable to least probable since this will minimize the expected number of $F_k$'s that must be checked to determine the value of $Z$. Again, this is useful only when we will repeatedly sample from the same distribution.

In many cases, we can use idiosyncratic features of the target distribution to formulate more efficient sampling algorithms. This is illustrated by the examples that follow.



Example: Suppose that we wish to sample from the geometric distribution with success probability $p$. Let $X$ denote the target variable and let $Y = X + 1$, so that $Y$ takes values in the positive integers $E = \{1, 2, \ldots\}$ with probability mass function

$$p_k = P(Y = k) = (1 - p)^{k-1} p.$$

(The only reason for introducing $Y$ is so that the indices used in our description of the sampling algorithm match with the values of the variable being sampled.) Then

$$F_k = \sum_{i=1}^{k} p_i = P(Y \le k) = 1 - P(Y > k) = 1 - (1 - p)^k.$$

Thus, if $U$ is a standard uniform random variable, we need to find the unique integer $k \ge 1$ such that

$$1 - (1 - p)^{k-1} < U \le 1 - (1 - p)^k,$$

which is equivalent to the condition

$$k - 1 < \frac{\log(1 - U)}{\log(1 - p)} \le k.$$



If we let $\lfloor x \rfloor$ denote the greatest integer less than or equal to $x$ (the so-called floor function), then, with probability one, $Y$ can be expressed as a simple function of $U$:

$$Y = \left\lfloor \frac{\log(1 - U)}{\log(1 - p)} \right\rfloor + 1,$$

while $X = Y - 1 \sim \text{Geometric}(p)$ is the desired variable.

Thus, in this case, we can directly transform $U$ into a geometric random variable. Indeed, we could have also derived this result by observing that if $W$ is an exponential random variable with rate $\lambda = -\log(1 - p)$, then the integer part of $W$ is geometrically distributed with parameter $p$. This follows from the memorylessness of both the exponential and the geometric distributions.
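The formula for $X = Y - 1$ gives a one-line sampler. The Python sketch below (the function name `sample_geometric` is illustrative) assumes $0 < p < 1$ and returns the number of failures before the first success, so its values lie in $\{0, 1, 2, \ldots\}$.

```python
import math
import random


def sample_geometric(p, rng=random):
    """Sample X ~ Geometric(p) on {0, 1, 2, ...} as floor(log(1 - U) / log(1 - p))."""
    u = rng.random()  # uniform on [0, 1), so 1 - u > 0
    return math.floor(math.log(1.0 - u) / math.log(1.0 - p))


if __name__ == "__main__":
    rng = random.Random(5)
    xs = [sample_geometric(0.25, rng) for _ in range(10000)]
    print(sum(xs) / len(xs))  # should be close to (1 - p) / p = 3
```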



Example: Samples from the Poisson distribution with parameter $\lambda$ can be generated with the help of the following result.

Lemma. Suppose that $X_1, X_2, \ldots$ is a sequence of i.i.d. exponential random variables with rate parameter 1. Then, for each $\lambda > 0$, the variable

$$N_\lambda = \max\{n \ge 0 : X_1 + \cdots + X_n \le \lambda\}$$

is Poisson distributed with parameter $\lambda$.

To turn this into a sampling algorithm, recall that we can generate a sequence of i.i.d. exponential random variables by setting

$$X_i = -\log(U_i),$$

where $U_1, U_2, \ldots$ is a sequence of standard uniform random variables.



When the $X_i$'s are generated in this way, the formula defining the Poisson variable $N_\lambda$ can be rewritten as:

$$\begin{aligned}
N_\lambda &= \max\{n \ge 0 : X_1 + \cdots + X_n \le \lambda\} \\
&= \max\{n \ge 0 : -\log(U_1 U_2 \cdots U_n) \le \lambda\} \\
&= \max\{n \ge 0 : U_1 U_2 \cdots U_n \ge e^{-\lambda}\}.
\end{aligned}$$

The advantage of the last formulation is that it requires only a single evaluation of the exponential function, $e^{-\lambda}$, in place of one logarithm for every uniform variate. This leads to the following algorithm.

1. Calculate $e^{-\lambda}$ and set $t = 1$ and $R = 1$.

2. Generate an independent standard uniform random variable $U_t$ and set $R = R U_t$.

3. If $R \ge e^{-\lambda}$, increase $t$ to $t + 1$ and return to (2).

4. Otherwise, halt the algorithm and return $X = t - 1 \sim \text{Poisson}(\lambda)$.
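A Python sketch of this algorithm is given below; the function name `sample_poisson` is illustrative. It is intended for moderate values of $\lambda$ only: for very large $\lambda$ (roughly $\lambda > 745$ in double precision) $e^{-\lambda}$ underflows to zero and the loop would not terminate.

```python
import math
import random


def sample_poisson(lam, rng=random):
    """Sample a Poisson(lam) variate by multiplying uniforms until the product drops below exp(-lam)."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()  # U_1
    while product >= threshold:
        count += 1
        product *= rng.random()  # multiply in the next uniform
    return count


if __name__ == "__main__":
    rng = random.Random(9)
    xs = [sample_poisson(3.5, rng) for _ in range(10000)]
    print(sum(xs) / len(xs))  # should be close to 3.5
```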



Example: Recall that the binomial distribution with parameters $n$ and $p$ can be interpreted as the number of successes in a sequence of $n$ i.i.d. random trials with success probability $p$. This leads to the following simple binomial random variable generator.

1. Generate $n$ i.i.d. standard uniform random variables $U_1, \ldots, U_n$.

2. Let $X$ be equal to the number of these variables that are less than or equal to $p$:

$$X = \#\{1 \le i \le n : U_i \le p\}.$$

This method works well when $n$ is not too large, but otherwise becomes very slow. Fortunately, when $n$ is large, the binomial distribution can be approximated either by the normal distribution if $p$ is not close to 0 or 1, or by the Poisson distribution if $p$ is close to 0 or 1.



Before presenting the algorithm, we first explain how the Poisson approximation can be used in the case when $n$ is large and $p$ is close to 1. This relies on the following result, which holds for all $n$ and $p$.

Lemma. If $X \sim \text{Binomial}(n, p)$, then $n - X \sim \text{Binomial}(n, 1 - p)$.

Proof: Since $X$ is the number of successes in a series of $n$ i.i.d. trials, each with success probability $p$, it is clear that $n - X$ is the number of failures in this same series of trials. Thus, if we redefine a success to be a failure and vice versa, then it is clear that $n - X$ is binomially distributed with parameters $n$ and $1 - p$.

Now, if $n$ is large and $p$ is close to 1, then because $1 - p$ is close to zero, we can approximate $n - X$ by the Poisson distribution with mean $n(1 - p)$. Equivalently, $X$ can be approximated by $n - Z$, where $Z$ has this Poisson distribution.



A Fast Approximate Binomial Random Number Generator

Given parameters $n$ and $p$ and a cutoff parameter $K$, do one of the following:

If $n \le K$, then generate $n$ independent Bernoulli($p$) random variables $X_1, \ldots, X_n$ and return $X = X_1 + \cdots + X_n$.

If $n > K$ and $np < 1$, simulate a Poisson distributed random variable $Y$ with mean $np$ and return $X = Y$.

If $n > K$ and $n(1 - p) < 1$, simulate a Poisson distributed random variable $Y$ with mean $n(1 - p)$ and return $X = n - Y$.

If $n > K$, $np \ge 1$ and $n(1 - p) \ge 1$, simulate a standard normal random variable $Z$ and return

$$X = \text{round}\left(np + \sqrt{np(1 - p)}\, Z\right).$$

Observe that the parameter $K$ determines when the algorithm uses an approximation to the binomial distribution. The choice of $K$ depends on the intended application, but values in the range 25 to 50 suffice for many purposes.
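The four cases can be sketched in Python as follows. The function names and the default cutoff `K=50` are illustrative choices. Clamping the approximate results to the range $\{0, \ldots, n\}$ is an added safeguard that is not part of the algorithm as stated, since the Poisson and normal approximations can occasionally fall outside the support of the binomial distribution.

```python
import math
import random


def _poisson(mean, rng):
    """Poisson sampler based on products of uniforms (suitable for moderate means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product >= threshold:
        count += 1
        product *= rng.random()
    return count


def fast_binomial(n, p, K=50, rng=random):
    """Approximate Binomial(n, p) sampler; exact when n <= K."""
    if n <= K:
        # Exact: sum of n Bernoulli(p) trials.
        return sum(1 for _ in range(n) if rng.random() <= p)
    if n * p < 1:
        x = _poisson(n * p, rng)            # Poisson approximation, p near 0
    elif n * (1 - p) < 1:
        x = n - _poisson(n * (1 - p), rng)  # Poisson approximation, p near 1
    else:
        z = rng.gauss(0.0, 1.0)             # normal approximation
        x = round(n * p + math.sqrt(n * p * (1 - p)) * z)
    return min(max(x, 0), n)  # clamp to {0, ..., n}; an assumption, not in the source


if __name__ == "__main__":
    rng = random.Random(11)
    print([fast_binomial(1000, 0.4, rng=rng) for _ in range(5)])
```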

