1
Computational Statistics Ch 3 Methods for generating random variables Prof. Donna Pauler Ankerst
Book Statistical Computing with R Maria L. Rizzo Chapman & Hall/CRC, 2008 Please review yourself: Chapter 1: Introduction to the R environment Chapter 2: Probability and statistics review
2
Random variable simulation
The fundamental tool required in computational statistics is the ability to simulate random variables from specified probability distributions.
All random variable generation starts with uniform random variable generation.
A uniform distribution means all values in the domain space of consideration have equal probability of occurrence.
3
Discrete uniform
Let the random variable Y be the outcome from the roll of a single fair die. What is the distribution of Y?
Answer: Y ~ Discrete Uniform on {1,2,3,4,5,6} where p(Y = i) = 1/6 for i = 1,...,6. Write R code to simulate a random sample of 600 observations of Y and show a histogram to prove it follows the correct distribution.
Answer: >ysamp=sample(1:6,600,replace=T)
>hist(ysamp)
4
Discrete uniform >ysamp=sample(1:6,600,replace=T) >hist(ysamp)
I expect the bars to be of the same height, 100 of each, so I try again.
This must be random variation; double-check by increasing from 600 to 60,000.
5
Discrete uniform >ysamp=sample(1:6,60000,replace=T) >hist(ysamp)
That is better, now I am convinced the sample command is doing what I think it is doing.
6
More about the R sample function
The multinomial distribution is not uniform in general; it allows different values to have different probabilities as long as they sum to one.
7
Continuous uniform
Let the random variable Y come from the continuous uniform distribution on the interval (0,1) [the U(0,1) density]. Write down the density function of Y.
Answer: Y ~ f(y), where f(y) = 1 for 0 < y < 1, and 0 otherwise.
Write R code to simulate a random sample of 1000 observations of Y and show an empirical density plot to prove it follows the correct distribution.
Answer:
>ysamp=runif(1000)
>hist(ysamp,prob=T)
The height is near 1, correct for the U(0,1) density.
8
Uniform(0,1)
The U(0,1) distribution provides the basis for generating most distributions.
Most programming languages, such as C, and statistical packages, such as R, include a U(0,1) generator.
There are many computer algorithms for generating U(0,1) random variables based on congruential methods. These fall more in the realm of informatics and are beyond the scope of this course.
For this course we will assume that we have a method for generating a U(0,1) random variable.
9
Uniform(a,b)
Question: How would you generate Z ~ U(a,b), the uniform distribution on (a,b), if you only had a U(0,1) generator available?
Answer: Generate Y ~ U(0,1) and let Z = a + (b-a)Y.
Write the R code to verify this for a = 1, b = 4:
>hist(1+3*runif(1000),prob=T)
The height of a U(a,b) density is 1/(b-a), and the height appears to be near 1/3, as expected for U(1,4).
10
Generators in R
In general, p returns the cdf and d the pdf evaluated at a given value, q returns a quantile, and r returns a random number from the distribution. Try these functions out yourself. Use the help command, e.g. help(runif).
11
Discrete random variables
Although R now contains generators for most discrete distributions, and the list is constantly growing, we will learn the algorithms behind them. Specifically, we will now cover how to generate from the following distributions, assuming that a U(0,1) generator is available: Bernoulli, Binomial, and Discrete.
12
Bernoulli(p)
X ~ Ber(p) ⇒ X ∈ {0,1}, P(X = 1) = p, P(X = 0) = 1 - p, for p ∈ (0,1).
Algorithm:
1.) Generate U ~ U(0,1).
2.) Set X = I(U ≤ p).
Proof the algorithm works:
P(X = 1) = P(U ≤ p) = p and P(X = 0) = P(U > p) = 1 - p, since U ~ U(0,1).
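The two-step algorithm above can be sketched in R (a minimal illustration; the helper name rbern is ours):

```r
# Generate n Bernoulli(p) draws from U(0,1) via the indicator X = I(U <= p)
rbern <- function(n, p) {
  u <- runif(n)       # step 1: U ~ U(0,1)
  as.integer(u <= p)  # step 2: X = I(U <= p)
}

set.seed(1)
x <- rbern(100000, 0.3)
mean(x)  # should be close to p = 0.3
```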
13
Binomial(n,p)
X ~ Bin(n,p) if P(X = k) = (n choose k) p^k (1-p)^(n-k), k = 0,1,...,n.
Algorithm:
1.) Generate U_i ~ U(0,1) for i = 1,...,n.
2.) Set X = Σ_{i=1}^n I(U_i ≤ p).
Proof the algorithm works: By the previous slide, I(U_i ≤ p) ~ Ber(p) for i = 1,...,n, and the sum of n independent Ber(p) random variables is distributed Bin(n,p).
Therefore X = Σ_{i=1}^n X_i, where X_i iid ~ Ber(p).
You may have forgotten your probability distributions. These are reviewed in Ch. 2 of the Rizzo book or can be found in Wikipedia. It is important to know the distributions well to be able to simulate from them. The Binomial distribution counts the number of successes in n independent Bernoulli trials.
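As a sketch, the Bernoulli-sum construction can be vectorized in R (rbinom_unif is a hypothetical helper name; rbinom is the built-in generator):

```r
# Binomial(n, p) draws built from uniforms: each column holds n indicators I(U_i <= p)
rbinom_unif <- function(N, n, p) {
  u <- matrix(runif(N * n), nrow = n)  # n uniforms per Binomial draw
  colSums(u <= p)                      # X = sum_i I(U_i <= p)
}

set.seed(1)
x <- rbinom_unif(20000, n = 10, p = 0.4)
mean(x)  # should be close to n*p = 4
```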
14
Discrete(x1,x2,...,xn)
X follows a discrete distribution on x_1, x_2, ..., x_n, X ~ Discrete(x_1, x_2, ..., x_n), if P(X = x_i) = p_i with probabilities p_i > 0 and Σ_{i=1}^n p_i = 1.
Let F_0 = 0 and F_j = Σ_{i=1}^j p_i for j = 1,...,n, and partition the interval [0,1] into the subintervals [F_0, F_1), [F_1, F_2), ..., [F_{n-1}, F_n]. Note that F(x_j) = F_j is the cdf of X.
Algorithm to generate X:
1.) Generate U ~ U(0,1).
2.) Set X = Σ_{i=1}^n x_i I(U ∈ [F_{i-1}, F_i)).
Indicator function: I_A(x) = 1 if x ∈ A, 0 if x ∈ A^c.
15
Discrete(x1,x2,...,xn)
Let F_0 = 0 and F_j = Σ_{i=1}^j p_i for j = 1,...,n, and partition the interval [0,1] into the subintervals [F_0, F_1), [F_1, F_2), ..., [F_{n-1}, F_n].
Algorithm to generate X:
1.) Generate U ~ U(0,1).
2.) Set X = Σ_{i=1}^n x_i I(U ∈ [F_{i-1}, F_i)).
Proof: P(X = x_i) = P(U ∈ [F_{i-1}, F_i)) = F_i - F_{i-1} = p_i for i = 1,...,n.
16
Discrete(x1,x2,...,xn)
Let F_0 = 0 and F_j = Σ_{i=1}^j p_i for j = 1,...,n, and partition the interval [0,1] into the subintervals [F_0, F_1), [F_1, F_2), ..., [F_{n-1}, F_n].
Algorithm to generate X:
1.) Generate U ~ U(0,1).
2.) Set X = Σ_{i=1}^n x_i I(U ∈ [F_{i-1}, F_i)).
REMARK This algorithm can be extended for a very large and even infinitely countable n, but the search for the correct interval can become numerically infeasible. For these cases binary and indexed searches can be used (Ripley 1987).
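A minimal R sketch of the interval-search algorithm, using findInterval for the vectorized search (the helper name rdiscrete is ours):

```r
# Discrete(x_1,...,x_n): locate the subinterval [F_{i-1}, F_i) containing U
rdiscrete <- function(N, x, p) {
  Fj <- cumsum(p)            # F_1,...,F_n
  Fj[length(Fj)] <- 1        # guard against floating-point drift in the last cdf value
  u <- runif(N)              # U ~ U(0,1)
  # findInterval performs the search over the breakpoints (0, F_1, ..., F_n)
  x[findInterval(u, c(0, Fj), rightmost.closed = TRUE)]
}

set.seed(1)
y <- rdiscrete(60000, x = 1:6, p = rep(1/6, 6))
table(y) / 60000  # each proportion close to 1/6
```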
17
Inverse Transform Method
To move on to generating from common continuous distributions or infinite discrete distributions we need a new general and useful technique.
The inverse transform method works for any random variable that has an invertible cdf.
18
Theorem: Probability Integral Transformation
Def. of cdf of X: F_X(x) = P(X ≤ x).
First note, if X is a continuous random variable with cdf F_X(x), then U = F_X(X) ~ U(0,1).
Proof: For 0 ≤ u ≤ 1, the cdf of a uniform(0,1) variable U satisfies P(U ≤ u) = u. We show that F_X(X) follows the cdf of a uniform:
P(U ≤ u) = P(F_X(X) ≤ u) = P(F_X^{-1}(F_X(X)) ≤ F_X^{-1}(u)) = P(X ≤ F_X^{-1}(u)) = F_X(F_X^{-1}(u)) = u.
19
Theorem: Probability Integral Transformation
Define the inverse transformation F_X^{-1}(u) = inf{x : F_X(x) = u} for u ∈ (0,1).
Then to generate X with cdf F_X(x), and hence pdf f_X(x):
1.) Generate U ~ U(0,1).
2.) Set X = F_X^{-1}(U).
Proof: P(X ≤ x) = P(F_X^{-1}(U) ≤ x) = P(F_X(F_X^{-1}(U)) ≤ F_X(x)) = P(U ≤ F_X(x)) = F_X(x), by definition of the cdf of the U(0,1) distribution.
20
Example from book
21
Good fit!
22
Example: Exponential distribution
The exponential distribution is often used in industrial research to model time until failure of machines, light bulbs, etc.
23
Example: Exponential distribution
Generate U ~ U(0,1). Set F_X(X) = U and solve for X:
F_X(X) = 1 - exp(-λX) = U
exp(-λX) = 1 - U
-λX = log(1 - U)
X = -log(1 - U)/λ.
This is equivalent to setting X = -log(U)/λ, since U and 1 - U are both U(0,1).
Proof: P(1 - U ≤ u) = P(U ≥ 1 - u) = 1 - P(U < 1 - u) = 1 - (1 - u) = u.
To generate n Exp(lambda) random variables in R:
>-log(runif(n))/lambda
R has an exponential generator:
>rexp(n,lambda)
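As a quick sketch of the inverse-transform generator (the rate lambda = 2 is illustrative):

```r
# Inverse-transform Exponential(lambda): X = -log(U)/lambda
set.seed(1)
lambda <- 2  # illustrative rate
x <- -log(runif(10000)) / lambda
mean(x)  # should be near 1/lambda = 0.5
# qqplot(x, rexp(10000, lambda)); abline(0, 1)  # compare with R's built-in generator
```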
24
Inverse Transformation Method, Discrete Case
Less well known, but the inverse transform can also be applied for discrete distributions.
If X is a discrete random variable and x_1 < x_2 < ... are the points of discontinuity of F_X(x), then the inverse transform is F_X^{-1}(u) = x_i, where F_X(x_{i-1}) < u ≤ F_X(x_i).
The algorithm is (U ~ U(0,1)):
• Generate U ~ U(0,1).
• Find x_i where F_X(x_{i-1}) < U ≤ F_X(x_i).
• Set X = x_i.
25
Example: Poisson distribution
Although there are more efficient algorithms, the inverse transform could be used to generate from the Poisson distribution.
Calculate the cdf's sequentially:
F(0) = f(0) = e^{-λ}
F(1) = F(0) + f(1) = e^{-λ} + λe^{-λ}
...
F(x) = F(x-1) + f(x), where f(x+1) = λ f(x)/(x+1).
The algorithm is (U ~ U(0,1)):
• Generate U ~ U(0,1).
• Find x where F(x-1) < U ≤ F(x).
• Set X = x.
To generate n Poisson(lambda) random variables in R: >rpois(n,lambda)
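A sketch of the inverse-transform Poisson generator using the cdf recursion above (rpois_inv is our illustrative name; rpois is what you would use in practice):

```r
# Inverse-transform Poisson(lambda): walk up the cdf until U <= F(x)
rpois_inv <- function(n, lambda) {
  sapply(runif(n), function(u) {
    x  <- 0
    f  <- exp(-lambda)  # f(0)
    Fx <- f             # F(0)
    while (u > Fx) {    # find the smallest x with U <= F(x)
      x  <- x + 1
      f  <- f * lambda / x  # f(x) = lambda * f(x-1) / x
      Fx <- Fx + f          # F(x) = F(x-1) + f(x)
    }
    x
  })
}

set.seed(1)
mean(rpois_inv(20000, 3))  # should be close to lambda = 3
```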
26
Acceptance-Rejection Algorithm
You want to generate from some distribution f(y) but just do not know how.
27
Acceptance-Rejection Algorithm
What you need is a density g(y) that you can sample from and such that cg(y) covers f(y) in the sense that cg(y) ≥ f(y) for all y such that f(y) > 0.
[Figure: the envelope cg(y) lying above the target density f(y).]
28
Acceptance-Rejection Algorithm
To generate a random variable X ~ f(x):
• Find a density g(t) satisfying f(t) ≤ c g(t) for all t with f(t) > 0, and that you can sample from.
For each random variable required:
(a) Generate Y ~ g(y).
(b) Generate U ~ U(0,1).
(c) If U ≤ f(Y)/(c g(Y)), accept and return X = Y; otherwise go back to (a).
29
Acceptance-Rejection Algorithm
For each random variable required:
(a) Generate Y ~ g(y).
(b) Generate U ~ U(0,1).
(c) If U ≤ f(Y)/(c g(Y)), accept and return X = Y; otherwise go back to (a).
In (c) we see that P(accept | Y) = P(U ≤ f(Y)/(c g(Y)) | Y) = f(Y)/(c g(Y)),
since P(U ≤ a) = a for U ~ U(0,1). Note f(Y)/(c g(Y)) ≤ 1 since f(Y) ≤ c g(Y).
So the closer cg(Y) is to f(Y) the higher the acceptance probability and the more efficient the algorithm.
30
Acceptance-Rejection Algorithm
[Figure: f(y) and the envelope cg(y).]
The closer cg(Y) is to f(Y), the higher the acceptance probability and the more efficient the algorithm. It can be a challenge to find an efficient g(Y); regions where cg(y) sits far above f(y) have high rejection probabilities.
31
Acceptance-Rejection Algorithm
[Figure: piecewise envelope g(y) over f(y).]
Adaptive rejection, implemented in WinBUGS, uses g(Y) as a piecewise spline approximation to densities f(Y) that are log-concave [not covered in detail here]. Sometimes g(Y) is called the envelope density.
32
Acceptance-Rejection Algorithm
Back to P(accept | Y) = f(Y)/(c g(Y)), where Y ~ g(y).
Assume g(y) is continuous. This implies that the acceptance probability on average is
P(accept) = ∫ P(accept | y) g(y) dy = ∫ [f(y)/(c g(y))] g(y) dy = (1/c) ∫ f(y) dy = 1/c.
So c = 1 is ideal for maximizing the average acceptance probability.
On average the number of iterations required to generate a single X ~ f(x) is c.
[The number of iterations until a success with probability p = 1/c on each try is a geometric random variable, which has mean 1/p = 1/(1/c) = c.]
33
Acceptance-Rejection Algorithm
Proof that the r.v. X generated by the acceptance-rejection algorithm follows the right distribution f(x) [continuous case; discrete case in book]:
Since we only take generations that are accepted, we show P(X) = f(X).
P(X) = P(Y | accept), where Y ~ g(y)
= P(accept | Y) g(Y)/P(accept)   [by Bayes rule]
= [f(Y)/(c g(Y))] g(Y)/(1/c)   [by what was just shown]
= f(Y) = f(X), since X = Y.
34
Acceptance-Rejection Algorithm
The first step to successfully using acceptance-rejection is knowing what your density f(y) looks like. For multivariate densities this can be very difficult.
35
Example: Beta distribution
Beta(1,1) = U(0,1)
Use the acceptance-rejection algorithm to generate n = 1,000 observations ~ Beta(2,2).
X ~ Beta(α,β) ⇒ f(x) = [Γ(α+β)/(Γ(α)Γ(β))] x^{α-1}(1-x)^{β-1} for 0 ≤ x ≤ 1, α > 0, β > 0.
For natural numbers n ≥ 1, Γ(n) = (n-1)!, 0! = 1.
E(X) = α/(α+β); Var(X) = αβ/[(α+β)²(α+β+1)].
36
Example: Beta distribution
Use the acceptance-rejection algorithm to generate n = 1,000 observations ~ Beta(2,2).
First step: get to know your distribution, write it out analytically.
f(x) = [Γ(2+2)/(Γ(2)Γ(2))] x^{2-1}(1-x)^{2-1} = (3!/(1!1!)) x(1-x) = 6x(1-x).
This distribution has a max at its mean 2/(2+2) = 1/2, with a value of 6(1/2)(1/2) = 3/2.
Since the support of f(x) is [0,1], a natural choice is g(x) = U(0,1) and c = 3/2. This means the average acceptance probability would be 1/c = 2/3 ≈ 0.67, and on average 3/2 iterations would be needed for every one acceptance, i.e. 1,500 iterations to get the requested 1,000 observations.
Beta(2,2) is symmetric and unimodal in [0,1].
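A minimal sketch of this acceptance-rejection scheme in R, vectorized in batches rather than one draw at a time (the helper name rbeta22 is ours):

```r
# Acceptance-rejection for Beta(2,2): f(x) = 6x(1-x), g = U(0,1), c = 3/2
rbeta22 <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- runif(n)  # (a) Y ~ g
    u <- runif(n)  # (b) U ~ U(0,1)
    # (c) accept if U <= f(Y)/(c g(Y)) = 6y(1-y)/(3/2) = 4y(1-y)
    out <- c(out, y[u <= 4 * y * (1 - y)])
  }
  out[1:n]
}

set.seed(1)
x <- rbeta22(1000)
mean(x)  # should be close to E(X) = 1/2
```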
37
The book hastily chooses the same U(0,1) generating density but c = 6 instead of 3/2. It therefore requires 4 times the iterations.
Book
38
R Code for Example 3.7
5873 iterations required to get 1000 r.v.'s.
39
Errors in the book
Variance of the sample quantile: missing a square, as shown next. The correct s.e. yields similar conclusions.
Ch. 2, not Ch. 1. Shown next.
40
Variance of the qth sample quantile
The variance of the qth sample quantile x̂_q is
Var(x̂_q) = q(1-q) / (n f(x_q)²),
where n is the sample size and f is the density of the sampled distribution.
• Note: Decreases as n increases, so you can increase n to get more accuracy.
• The numerator has a maximum of 1/4 at q = 0.5 (the median) and equals 0 at the tails of the distribution (q = 0, 1). However, f(x_q) also approaches 0 at the tails of the distribution.
• For heavy-tailed distributions, the denominator typically approaches zero faster than the numerator, causing higher variance for extreme quantiles, as is to be expected.
41
Transformations
Transformations, if applicable, can provide a more efficient method for generating random variables.
Examples (proofs not required; can be found in various books, sources):
1.) Z ~ N(0,1) ⇒ Z² ~ χ²_1, chi-square with 1 degree of freedom (df)
2.) U ~ χ²_m, V ~ χ²_n, independent ⇒ F = (U/m)/(V/n) ~ F_{m,n}
3.) Z ~ N(0,1), V ~ χ²_n, independent ⇒ T = Z/sqrt(V/n) ~ t_n, Student's t-distribution with n df
4.) U, V ~ U(0,1) independent ⇒
Z1 = sqrt(-2 log(U)) cos(2πV)
Z2 = sqrt(-2 log(U)) sin(2πV)
are independent N(0,1).
Error in these formulas in the book.
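Example 4.) above (the Box-Muller transform) can be sketched directly:

```r
# Box-Muller: two independent N(0,1) variables from two U(0,1) variables
set.seed(1)
n  <- 5000
u  <- runif(n)
v  <- runif(n)
z1 <- sqrt(-2 * log(u)) * cos(2 * pi * v)
z2 <- sqrt(-2 * log(u)) * sin(2 * pi * v)
c(mean(z1), sd(z1))  # should be near 0 and 1
```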
42
Transformations
5.) U ~ Gamma(r,λ), V ~ Gamma(s,λ) independent ⇒ X = U/(U+V) ~ Beta(r,s).
We have just seen the Beta distribution, which has domain (0,1).
X > 0 follows a Gamma distribution with shape parameter r > 0 and rate parameter λ > 0 if the pdf of X is
f(x) = [λ^r/Γ(r)] x^{r-1} e^{-λx}, x > 0.
The mean and variance are E(X) = r/λ, Var(X) = r/λ².
The Gamma(1,λ) distribution is the Exp(λ) distribution. We reviewed the exponential distribution earlier in this lecture.
43
> u = rgamma(1000,shape=3,rate=1) > v = rgamma(1000,shape=2,rate=1) > qqplot(qbeta(ppoints(1000),3,2),u/(u+v)) > abline(0,1)
ppoints(): Generates a symmetric series in [0,1]. You could have used seq(0,1,length=1000) instead to get evenly spaced numbers in [0,1], or rbeta(1000,3,2) to just get a random sample.
Example 3.8
44
Transformations
X > 0 follows a Gamma distribution with shape parameter r > 0 and rate parameter λ > 0 if the pdf of X is f(x) = [λ^r/Γ(r)] x^{r-1} e^{-λx}, x > 0. The mean is r/λ and the variance is r/λ². Note that Gamma(1,λ) = Exp(λ).
6.) The sum of r i.i.d. Exp(λ) random variables is Gamma(r,λ).
So, to simulate a single Gamma(r,λ) random variable, simulate r Exp(λ) variables and add them.
Distributions of sums of random variables are called convolutions. Convolutions and mixtures are special transformations that deserve special attention. Next →
45
Convolutions
X_1, X_2, ..., X_n iid ~ F
S = X_1 + X_2 + ... + X_n ~ F_S
F_S is called the (n-fold) convolution of F.
It is straightforward to simulate a random variable with a distribution that is a convolution by generating X_1, X_2, ..., X_n and taking their sum.
46
Distributions related by convolution
• For integer N > 0, the χ²_N distribution is the convolution of N squared N(0,1) variables.
• The negative binomial distribution NegBin(r,p) counts the number of failures until the rth success is obtained among independent trials, with probability of success on each trial equal to p.
X ~ NegBin(r,p): P(X = x) = (x+r-1 choose x) p^r (1-p)^x for x = 0,1,...
This distribution is the convolution of r independent Geometric(p) distributions.
The Geometric distribution counts the number of failures until the first success: X ~ Geom(p): P(X = x) = p(1-p)^x for x = 0,1,...
• The convolution of r independent Exp(λ) distributions is Gamma(r,λ).
There are now R functions for these (rchisq,rnbinom, rgamma) but this is not just an R course.
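For instance, the chi-square convolution can be simulated by direct summation (a sketch; rchisq would normally be used):

```r
# chi-square with N df as a convolution of N squared N(0,1) variables
set.seed(1)
N <- 4
n <- 10000
x <- rowSums(matrix(rnorm(n * N)^2, nrow = n))  # each row sums N squared normals
mean(x)  # should be near the df, N = 4
```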
47
Mixtures
A random variable X is a discrete mixture if the distribution of X is a weighted sum
F_X(x) = Σ_i π_i F_{X_i}(x)
for some sequence of independent random variables X_1, X_2, ... and π_i > 0 such that Σ_i π_i = 1. The π_i's are called the mixing weights or mixing probabilities.
A random variable X follows a continuous mixture if the distribution of X is
F_X(x) = ∫_{-∞}^{∞} F_{X|Y=y}(x) f_Y(y) dy
for a family of distributions X|Y = y indexed by the real numbers y and a weighting function f_Y such that ∫_{-∞}^{∞} f_Y(y) dy = 1.
48
Example: Mixture of Normals
To generate a random variable that follows a 0.5:0.5 mixture of independent N(0,1) and N(3,1) distributions: 1.) Generate an integer k in {1,2}, where P(1) = P(2) = 0.5. 2.) If k = 1, draw X ~ N(0,1); otherwise X ~ N(3,1).
A mixture of Normals is different from a convolution of Normals. A mixture of Normals may not be Normal; it may be bimodal or multimodal. Is a convolution of Normals Normal?
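A loop-free sketch of the two-step mixture algorithm above:

```r
# 0.5:0.5 mixture of N(0,1) and N(3,1), vectorized
set.seed(1)
n  <- 5000
k  <- sample(1:2, n, replace = TRUE, prob = c(0.5, 0.5))  # step 1: component label
mu <- c(0, 3)[k]                                          # component means
x  <- rnorm(n, mean = mu, sd = 1)                         # step 2: draw from the chosen component
# hist(x)  # typically bimodal, with modes near 0 and 3
```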
49
Example: Convolution of Normals
What is the distribution of the convolution of independent N(0,1) and N(3,1) distributions? Let X = X1 + X2, where X1 ~ N(0,1) and X2 ~ N(3,1) Then from Statistics courses we know that E(X) = E(X1) + E(X2) = 0 + 3 = 3 ...linearity of expectation Var(X) = Var(X1) + Var(X2) = 1 + 1 = 2 ...independence X ~ Normal ...characteristic functions
Therefore, X ~ N(3,2).
50
Example: Mixture of Gammas
Consider the following mixture of 5 Gamma distributions:
F_X = Σ_{j=1}^5 π_j F_{X_j},
where X_j ~ Gamma(r = 3, λ = 1/j) are independent and the mixing probabilities are π_j = j/15, for j = 1,...,5. Write efficient R code without loops to simulate 5000 observations from F_X and overlay an empirical density estimate of the mixture over the component densities.
WAKE-UP CALL: A question like this could appear on the exam!
51
Example: Mixture of Gammas
F_X = Σ_{j=1}^5 π_j F_{X_j}, where X_j ~ Gamma(r = 3, λ_j = 1/j), with mixing probabilities π_j = j/15 for j = 1,...,5.
Get to know what you are simulating... Recall for the Gamma distribution, E(X_j) = r/λ_j = 3j for j = 1,...,5.
The means are 3, 6, 9, 12, 15, with respective mixing probabilities 1/15, 2/15, 3/15, 4/15, 5/15. So the distributions with higher means get higher weights.
52
Ex. 3.12 R code
density(): Generates a kernel density estimate, like a smoothed histogram, for a sample of data.
Your code would not have to appear exactly like this but would need to work and not contain loops.
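A loop-free sketch in the spirit of Ex. 3.12 (the book's exact code may differ):

```r
# Mixture of 5 Gammas: X_j ~ Gamma(shape = 3, rate = 1/j), pi_j = j/15
set.seed(1)
n <- 5000
j <- sample(1:5, n, replace = TRUE, prob = (1:5) / 15)  # mixing component labels
x <- rgamma(n, shape = 3, rate = 1 / j)                 # rate is vectorized over j
# plot(density(x))  # kernel density estimate of the mixture
mean(x)  # should be near sum_j pi_j * 3j = 11
```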
53
Ex. 3.12 Mixture of several gamma distributions
54
Ex. 3.15 Poisson-gamma mixture
The Negative Binomial distribution (counts the number of failures until the rth success) is also a continuous mixture of Poisson(λ) distributions, where λ ~ Gamma(r, β). Specifically,
X|λ ~ Poisson(λ), λ ~ Gamma(r, β) ⇒ X ~ NegBin(r, β/(1+β)).
Question: Simulate 5000 observations of a variable that counts the number of failures until the 4th success, where the probability of success on each independent trial is 0.75 using a continuous mixture.
55
Ex. 3.15 Poisson-gamma mixture
Question: Simulate 5000 observations of a variable that counts the number of failures until the 4th success, where the probability of success on each independent trial is 0.75, using a continuous mixture.
Solution: From the description we recognize this as a NegBin random variable. By the continuous mixture requirement, we see that we need to use the Poisson-Gamma method.
X|λ ~ Poisson(λ), λ ~ Gamma(r, β) ⇒ X ~ NegBin(r, β/(1+β)); ∴ we need r = 4 and β = 3.
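A minimal sketch of the continuous-mixture simulation:

```r
# Poisson-Gamma mixture: lambda ~ Gamma(4, rate = 3), X|lambda ~ Poisson(lambda),
# so X ~ NegBin(r = 4, p = beta/(1+beta) = 3/4)
set.seed(1)
n      <- 5000
lambda <- rgamma(n, shape = 4, rate = 3)
x      <- rpois(n, lambda)
mean(x)  # should be near r(1-p)/p = 4/3
```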
56
Ex. 3.15 Poisson-gamma mixture
[Histogram of x: frequencies of the counts 0-8.]
> hist(x) Check that the simulations make sense: if X counts the number of failures until the 4th success where each success probability is 0.75, expect lots of 0’s.
57
Ex. Poisson-gamma mixture
R code and output comparing the simulation probability mass function with a NegBin(4,.75) mass function:
58
The Empirical Cumulative Distribution Function (ecdf, Ch 2 of book) is a good method for comparing distributions.
The ecdf F_n(x) is an unbiased estimate of F(x) = P(X ≤ x) and is defined for an observed ordered sample x_(1) ≤ x_(2) ≤ ... ≤ x_(n) by:
F_n(x) = 0 for x < x_(1),
F_n(x) = i/n for x_(i) ≤ x < x_(i+1), i = 1,...,n-1,
F_n(x) = 1 for x ≥ x_(n).
The standard error of F_n(x) is [F(x)(1 - F(x))/n]^{0.5}.
Comparing distributions
59
Ex. 3.15 Poisson-gamma mixture
s.e.[F_n(x)] = [F(x)(1 - F(x))/n]^{0.5} ≤ 0.5/n^{0.5}.
R code and output comparing the ecdf of the simulated sample with the probability mass function of a NegBin(4,.75):
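A sketch of such a comparison, re-simulating the Poisson-Gamma sample from the previous slides (the book's exact code may differ):

```r
# Compare the ecdf of the simulated NegBin(4, 0.75) sample with the true cdf
set.seed(1)
lambda <- rgamma(5000, shape = 4, rate = 3)
x  <- rpois(5000, lambda)               # X ~ NegBin(4, 3/4)
Fn <- ecdf(x)                           # empirical cdf of the simulated sample
k  <- 0:8
round(cbind(k, ecdf = Fn(k), NegBin = pnbinom(k, size = 4, prob = 0.75)), 3)
```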
60
Multivariate Normal distribution
A random vector X = (X_1,...,X_d)' has a d-dimensional multivariate Normal distribution with mean vector μ = (μ_1,...,μ_d)' and positive definite symmetric variance matrix Σ = (σ_ij), denoted X ~ N_d(μ, Σ), if it has density function
f(x) = (2π)^{-d/2} |Σ|^{-1/2} exp{ -(1/2)(x - μ)' Σ^{-1} (x - μ) }, x ∈ R^d,
where |Σ| is the determinant of Σ and Σ^{-1} its inverse.
Covered in Multivariate Statistics.
61
Multivariate Normal distribution
R already contains efficient functions to generate multivariate normal random variables: • mvrnorm in the MASS package • rmvnorm in the mvtnorm package. However, it is useful to see how these are traditionally generated from univariate N(0,1) random variables.
62
Multivariate Normal distribution
An observation following a multivariate Normal distribution N_d(μ, Σ) may be generated from i.i.d. N(0,1) observations. To show this we begin with the univariate (d = 1) case, N(μ, σ²), and show how this is simulated from a N(0,1).
Let Z ~ N(0,1) and Y = μ + σZ. Then applying the rules of expectation and variance for linear transformations of variables yields
E(Y) = E(μ + σZ) = μ + σE(Z) = μ
Var(Y) = Var(μ + σZ) = σ²Var(Z) = σ².
A bit more difficult to show is that a linear combination of a Normally distributed random variable is still Normal. This requires characteristic functions and might have been done in an early statistics course. It is not shown here.
63
Multivariate Normal distribution
∴ Z ~ N(0,1) and Y = μ + σZ ⇒ Y ~ N(μ, σ²), and this shows how to generate a N(μ, σ²) random variable from a N(0,1):
1.) Generate Z ~ N(0,1).
2.) Set Y = μ + σZ.
To generate a multivariate Normal random variable we proceed similarly. Suppose we want to generate Y ~ N_d(μ, Σ). We first generate d i.i.d. N(0,1) random variables Z_1,...,Z_d and collect these into a vector Z = (Z_1,...,Z_d)'. Then Z ~ N_d(0, I), where I is the d×d identity matrix.
Now we want to transform Z to have the desired distribution, N_d(μ, Σ).
64
Multivariate Normal distribution
The following results generalize the univariate case just shown. It requires a little bit of matrix algebra and is covered in the Multivariate Statistics course.
Suppose Z ~ N_d(μ, Σ) and Y = CZ + b, where C is a constant q×d matrix in R^{q×d} and b is a constant vector in R^q. Note that Y can have a different dimension (q) than Z (d). The linearity rules for expectation and variance are:
E(Y) = E(CZ + b) = E(CZ) + E(b) = CE(Z) + b = Cμ + b
Var(Y) = Var(CZ + b) = Var(CZ) = C Var(Z) C' = CΣC'
and Y is also Normal on q dimensions.
65
Multivariate Normal distribution
Suppose that Σ can be factorized as Σ = CC'. Then if Z ~ N_d(0, I) and Y = CZ + μ,
E(Y) = E(CZ + μ) = E(CZ) + E(μ) = CE(Z) + μ = μ
Var(Y) = Var(CZ + μ) = Var(CZ) = C Var(Z) C' = CC' = Σ,
and Y is also Normal on d dimensions. This gives the algorithm for generating the random variable, but how is the decomposition of Σ performed?
Three methods in R: • eigen: spectral or eigenvalue decomposition • chol: Cholesky factorization • svd: Singular value decomposition
66
Definitions and examples of how to use these from the book
Three methods in R: • eigen: spectral or eigenvalue decomposition • chol: Cholesky factorization • svd: Singular value decomposition
You will not need to memorize the specific decompositions for the exam; they would be provided and you would have to write R code.
67
Ex. 3.16 Spectral decomposition method
68
Ex. 3.16 Spectral decomposition method
69
Ex. 3.16 Spectral decomposition method
Compare simulated means, covariances to the truth (here covariances = correlations since the variances = 1).
70
Ex. 3.17 Singular value decomposition method
71
Ex. 3.18 Choleski factorization method
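A sketch of the Choleski approach (the book's Example 3.18 code may differ; the helper name rmvn_chol is ours):

```r
# Multivariate Normal via the Choleski factorization Sigma = LL'
rmvn_chol <- function(n, mu, Sigma) {
  d <- length(mu)
  L <- t(chol(Sigma))                  # chol() returns the upper factor; transpose to get L
  Z <- matrix(rnorm(n * d), nrow = d)  # d x n matrix of i.i.d. N(0,1)
  t(L %*% Z + mu)                      # each row is one N_d(mu, Sigma) draw
}

set.seed(1)
Sigma <- matrix(c(1, 0.8, 0.8, 1), 2, 2)
x <- rmvn_chol(10000, mu = c(0, 1), Sigma = Sigma)
colMeans(x)  # should be close to (0, 1)
cov(x)       # should be close to Sigma
```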
72
Wishart distribution
In addition to random vectors there exist random matrices, and the Wishart is the most commonly used distribution for matrices. In the 1-dimensional case the Wishart collapses to the chi-square distribution, which is a commonly assumed distribution for sample variances.
You might recall learning that if X_1,...,X_n iid ~ N(μ, σ²), then
Σ_{i=1}^n (X_i - X̄)²/σ² ~ χ²_{n-1},
where X̄ = (1/n) Σ_{i=1}^n X_i is the sample mean.
74
Wishart distribution
Suppose that M = X'X, where X is an n×d data matrix with each row an independent N_d(0, Σ) observation. Then the distribution of the matrix M is Wishart with scale matrix Σ and degrees of freedom (df) n, written as W_d(Σ, n).
• It is not necessary to memorize, but for your information the density looks like:
f_W(M) = |M|^{(n-d-1)/2} exp{-(1/2) trace(Σ^{-1}M)} / [2^{nd/2} π^{d(d-1)/4} |Σ|^{n/2} Π_{j=1}^d Γ([n+1-j]/2)],
where Γ is the gamma function and M belongs to the space of symmetric positive-definite matrices.
• For d = 1 and Σ = σ²: M = Σ_{i=1}^n X_i² ~ σ²χ²_n, and W_1(σ², n) = σ²χ²_n.
• The Wishart introduced is actually technically a central Wishart; there exists a non-central Wishart just as there exists a non-central chi-square distribution, but it is hardly used.
75
Wishart distribution
Suppose that M = X'X, where X is an n×d data matrix with each row an independent N_d(0, Σ) observation. Then the distribution of the matrix M is Wishart with scale matrix Σ and degrees of freedom (df) n, written as W_d(Σ, n).
Given the definition above, and that we just learned how to generate multivariate Normal distributions, generating a Wishart random variable is straightforward.
However, this algorithm would be inefficient since we have to generate n multivariate Normals for every one Wishart variable.
There is a more efficient algorithm called Bartlett's decomposition, which is provided next, with proof beyond the scope of this course.
76
Bartlett’s Decomposition (not required to memorize for the exam)
Let T = (T_ij) be a d×d lower triangular random matrix with independent entries satisfying
1.) T_ij ~ N(0,1) for i > j
2.) T_ii ~ (χ²_{n-i+1})^{1/2} for i = 1,...,d.
Then A = TT' ~ W_d(I, n).
To generate a W_d(Σ, n):
• Obtain the Choleski factorization Σ = LL', where L is lower triangular.
• Generate A by 1.) and 2.).
• Then LAL' ~ W_d(Σ, n).
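A sketch of Bartlett's decomposition in R for a single draw (the helper name rwishart_bartlett is ours; dimensions and parameters are illustrative):

```r
# Bartlett's decomposition for one W_d(Sigma, n) draw
rwishart_bartlett <- function(n, Sigma) {
  d  <- nrow(Sigma)
  Tm <- matrix(0, d, d)
  Tm[lower.tri(Tm)] <- rnorm(d * (d - 1) / 2)      # T_ij ~ N(0,1) for i > j
  diag(Tm) <- sqrt(rchisq(d, df = n - (1:d) + 1))  # T_ii ~ sqrt(chi^2_{n-i+1})
  L <- t(chol(Sigma))                              # Choleski: Sigma = LL'
  A <- Tm %*% t(Tm)                                # A ~ W_d(I, n)
  L %*% A %*% t(L)                                 # LAL' ~ W_d(Sigma, n)
}

set.seed(1)
M <- rwishart_bartlett(10, diag(2))
# E(M) = n * Sigma = 10 * I for these parameters
```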
77
End of Chapter 3
Chapter 4: Visualization of Multivariate Data Has some interesting and straightforward concepts but the R packages used in the chapter are now outdated. There are now many advanced R packages for plotting data, including ggplot. Chapter 4 is not covered in this course.