8/12/2019 Conjugacy Print
Conjugate Models
Patrick Lam
Outline
Conjugate Models
  What is Conjugacy?
  The Beta-Binomial Model
The Normal Model
  Normal Model with Unknown Mean, Known Variance
  Normal Model with Known Mean, Unknown Variance
Conjugacy
Suppose we have a Bayesian model with a likelihood $p(y|\theta)$ and a prior $p(\theta)$.
If we multiply our likelihood and prior, we get our posterior $p(\theta|y)$ up to a constant of proportionality.
If our posterior is a distribution of the same family as our prior, then we have conjugacy. We say that the prior is conjugate to the likelihood.
Conjugate models are great because we know the exact distribution of the posterior, so we can easily simulate from it or derive quantities of interest analytically.
In practice, we rarely have conjugacy.
Brief List of Conjugate Models

Likelihood                           Prior           Posterior
Binomial                             Beta            Beta
Negative Binomial                    Beta            Beta
Poisson                              Gamma           Gamma
Geometric                            Beta            Beta
Exponential                          Gamma           Gamma
Normal (mean unknown)                Normal          Normal
Normal (variance unknown)            Inverse Gamma   Inverse Gamma
Normal (mean and variance unknown)   Normal/Gamma    Normal/Gamma
Multinomial                          Dirichlet       Dirichlet
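Any row of this table can be checked numerically. As a sketch (not from the slides; the toy counts and hyperparameters are made up for illustration), the Python code below verifies the Poisson-Gamma row: with a Gamma(shape $a$, rate $b$) prior, likelihood times prior should match a Gamma($a + \sum y_i$, $b + n$) density up to a constant.

```python
import numpy as np
from math import lgamma

# Toy Poisson counts and an arbitrary Gamma(shape=a, rate=b) prior
y = np.array([3, 1, 4, 1, 5])
a, b = 2.0, 1.0
lam = np.linspace(0.1, 10, 500)  # grid of lambda values

def log_gamma_pdf(x, shape, rate):
    # log density of a Gamma(shape, rate) distribution
    return shape * np.log(rate) - lgamma(shape) + (shape - 1) * np.log(x) - rate * x

# Unnormalized log posterior: Poisson log-likelihood + Gamma log-prior
log_post = np.sum(y) * np.log(lam) - len(y) * lam + (a - 1) * np.log(lam) - b * lam
log_post -= log_post.max()  # remove the unknown normalizing constant

# The conjugate posterior claimed by the table
log_conj = log_gamma_pdf(lam, a + y.sum(), b + len(y))
log_conj -= log_conj.max()

print(np.allclose(log_post, log_conj))  # True: the two curves coincide
```

The same grid comparison works for the Beta-Binomial and Normal rows, since conjugacy is exactly the statement that likelihood times prior has the posterior family's functional form.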
A Binomial Example
Suppose we have a vector of data on voter turnout for a random sample of $n$ voters in the 2004 US Presidential election.
We can model the voter turnout with a binomial model.
$$Y \sim \text{Binomial}(n, \pi)$$
Quantity of interest: $\pi$ (voter turnout)
Assumptions:
Each voter's decision to vote follows the Bernoulli distribution.
Each voter has the same probability of voting. (unrealistic)
Each voter's decision to vote is independent. (unrealistic)
The Conjugate Beta Prior
We can use the beta distribution as a prior for $\pi$, since the beta distribution is conjugate to the binomial distribution.
$$p(\pi|y) \propto p(y|\pi)\,p(\pi)$$
$$= \text{Binomial}(n, \pi) \times \text{Beta}(\alpha, \beta)$$
$$= \binom{n}{y} \pi^{y} (1-\pi)^{n-y} \times \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \pi^{\alpha-1} (1-\pi)^{\beta-1}$$
$$\propto \pi^{y} (1-\pi)^{n-y}\, \pi^{\alpha-1} (1-\pi)^{\beta-1}$$
$$p(\pi|y) \propto \pi^{y+\alpha-1} (1-\pi)^{n-y+\beta-1}$$
The posterior distribution is simply a $\text{Beta}(y+\alpha,\; n-y+\beta)$ distribution. Effectively, our prior is just adding $\alpha - 1$ successes and $\beta - 1$ failures to the dataset.
The Uninformative (Flat) Uniform Prior
Suppose we have no strong prior beliefs about the parameters. We can choose a prior that gives equal weight to all possible values of the parameters, essentially an uninformative or flat prior:
$$p(\pi) = \text{constant}$$
for all values of $\pi$.
For the binomial model, one example of a flat prior is the Beta(1, 1) prior:
$$p(\pi) = \frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}\, \pi^{1-1} (1-\pi)^{1-1} = 1$$
which is the Uniform distribution over the $[0, 1]$ interval.
Since we know that a Binomial likelihood and a Beta(1, 1) prior produce a $\text{Beta}(y+1,\; n-y+1)$ posterior, we can simulate the posterior in R.
Suppose our turnout data had 500 voters, of which 285 voted.
> table(turnout)
turnout
  0   1
215 285
Setting our prior parameters at $\alpha = 1$ and $\beta = 1$,
> a <- 1
> b <- 1
we get the posterior
> posterior.unif.prior <- rbeta(10000, 285 + a, 215 + b)
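The slides simulate this posterior in R with rbeta(); an equivalent sketch in Python (numpy assumed available) draws from the exact Beta posterior and checks the simulated mean against the analytic one:

```python
import numpy as np

# Data from the slides: n = 500 voters, of which y = 285 voted
n, y = 500, 285
a, b = 1, 1  # flat Beta(1, 1) prior

# Conjugacy: the posterior is exactly Beta(y + a, n - y + b)
alpha_post, beta_post = y + a, n - y + b

# Analytic posterior mean of a Beta(alpha, beta) is alpha / (alpha + beta)
post_mean = alpha_post / (alpha_post + beta_post)

# Or simulate, as the slides do in R
rng = np.random.default_rng(0)
draws = rng.beta(alpha_post, beta_post, size=10_000)
print(post_mean, draws.mean())  # the two agree closely
```

Because the posterior family is known exactly, simulation here is a convenience for computing quantities of interest (quantiles, intervals), not a necessity.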
Normal Model with Unknown Mean, Known Variance
Suppose we wish to estimate a model where the likelihood of the data is normal with an unknown mean $\mu$ and a known variance $\sigma^2$.
Our parameter of interest is $\mu$.
We can use a conjugate Normal prior on $\mu$, with mean $\mu_0$ and variance $\tau_0^2$.
$$p(\mu|y, \sigma^2) \propto p(y|\mu, \sigma^2)\,p(\mu)$$
$$\text{Normal}(\mu_1, \tau_1^2) = \text{Normal}(\mu, \sigma^2) \times \text{Normal}(\mu_0, \tau_0^2)$$
Let $\theta$ represent our parameter of interest, in this case $\mu$.
$$p(\theta|y) \propto \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \theta)^2}{2\sigma^2}\right) \times \frac{1}{\sqrt{2\pi\tau_0^2}} \exp\left(-\frac{(\theta - \mu_0)^2}{2\tau_0^2}\right)$$
$$\propto \exp\left[-\left(\sum_{i=1}^{n}\frac{(y_i - \theta)^2}{2\sigma^2} + \frac{(\theta - \mu_0)^2}{2\tau_0^2}\right)\right]$$
$$= \exp\left[-\frac{1}{2}\left(\frac{\sum_{i=1}^{n}(y_i - \theta)^2}{\sigma^2} + \frac{(\theta - \mu_0)^2}{\tau_0^2}\right)\right]$$
$$= \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2\sum_{i=1}^{n}(y_i - \theta)^2 + \sigma^2(\theta - \mu_0)^2\right)\right]$$
$$= \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2\sum_{i=1}^{n}(y_i^2 - 2y_i\theta + \theta^2) + \sigma^2(\theta^2 - 2\theta\mu_0 + \mu_0^2)\right)\right]$$
We can multiply the $2y_i\theta$ term in the summation by $\frac{n}{n}$ in order to get the equations in terms of the sufficient statistic $\bar{y}$.
$$p(\theta|y) \propto \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2\sum_{i=1}^{n}\left(y_i^2 - \frac{2n}{n}y_i\theta + \theta^2\right) + \sigma^2(\theta^2 - 2\theta\mu_0 + \mu_0^2)\right)\right]$$
$$= \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2\sum_{i=1}^{n}y_i^2 - 2\tau_0^2 n\bar{y}\theta + \tau_0^2 n\theta^2 + \sigma^2\theta^2 - 2\sigma^2\theta\mu_0 + \sigma^2\mu_0^2\right)\right]$$
We can then factor the terms into several parts. Since $\sigma^2\mu_0^2$ and $\tau_0^2\sum_{i=1}^{n}y_i^2$ do not contain $\theta$, we can represent them as some constant $k$, which we will drop into the normalizing constant.
$$p(\theta|y) \propto \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\theta^2(\sigma^2 + \tau_0^2 n) - 2\theta(\sigma^2\mu_0 + \tau_0^2 n\bar{y})\right) + k\right]$$
$$= \exp\left[-\frac{1}{2}\,\frac{\sigma^2 + \tau_0^2 n}{\sigma^2\tau_0^2}\left(\theta^2 - 2\theta\,\frac{\sigma^2\mu_0 + \tau_0^2 n\bar{y}}{\sigma^2 + \tau_0^2 n}\right) + k\right]$$
$$= \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta^2 - 2\theta\,\frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right) + k\right]$$
Let's multiply by $\frac{1/\tau_0^2 + n/\sigma^2}{1/\tau_0^2 + n/\sigma^2}$ in order to simplify the $\theta^2$ term. Completing the square in $\theta$ (the extra constant this introduces is absorbed into $k$, and $k$ itself drops into the normalizing constant) gives
$$p(\theta|y) \propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta - \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right)^2\right]$$
Finally, we have something that looks like the density function of a Normal distribution!
$$p(\theta|y) \propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta - \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right)^2\right]$$
Posterior Mean: $\mu_1 = \dfrac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}$
Posterior Variance: $\tau_1^2 = \left(\dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}\right)^{-1}$
Posterior Precision: $\dfrac{1}{\tau_1^2} = \dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}$
Posterior precision is just the sum of the prior precision and the data precision.
We can also look more closely at how the prior mean $\mu_0$ and the posterior mean $\mu_1$ relate to each other.
$$\mu_1 = \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}} = \frac{\frac{\mu_0\sigma^2 + \tau_0^2 n\bar{y}}{\tau_0^2\sigma^2}}{\frac{\sigma^2 + n\tau_0^2}{\tau_0^2\sigma^2}} = \frac{\mu_0\sigma^2 + \tau_0^2 n\bar{y}}{\sigma^2 + n\tau_0^2} = \mu_0\,\frac{\sigma^2}{\sigma^2 + n\tau_0^2} + \bar{y}\,\frac{n\tau_0^2}{\sigma^2 + n\tau_0^2}$$
As $n$ increases, the data mean dominates the prior mean.
As $\tau_0^2$ decreases (less prior variance, greater prior precision), our prior mean becomes more important.
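This weighting can be made concrete with a small Python helper (the function and the illustrative numbers are mine, not the slides'): with few observations the posterior mean sits between the prior mean and the sample mean, and with many observations it approaches the sample mean.

```python
def normal_posterior(y_bar, n, sigma_sq, mu0, tau_sq0):
    """Conjugate update for a Normal likelihood with known variance.

    Returns the posterior mean and variance of mu given the sufficient
    statistics (y_bar, n), the known data variance sigma_sq, and a
    Normal(mu0, tau_sq0) prior.
    """
    precision = 1 / tau_sq0 + n / sigma_sq  # posterior precision = prior + data precision
    post_mean = (mu0 / tau_sq0 + n * y_bar / sigma_sq) / precision
    post_var = 1 / precision
    return post_mean, post_var

# Illustrative numbers only: prior mean 60, sample mean 68.
# As n grows, the posterior mean moves toward the sample mean.
m_small, _ = normal_posterior(y_bar=68, n=10, sigma_sq=16, mu0=60, tau_sq0=4)
m_large, _ = normal_posterior(y_bar=68, n=1000, sigma_sq=16, mu0=60, tau_sq0=4)
print(m_small, m_large)
```

Shrinking tau_sq0 instead pulls both results back toward mu0, mirroring the second bullet above.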
A Simple Example
Suppose we have some (fake) data on the heights (in inches) of a random sample of 100 individuals in the U.S. population.
> known.sigma.sq
> unknown.mean
> n
> heights
> mu0
> tau.sq0
Our posterior is a Normal distribution with Mean $\dfrac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}$ and Variance $\left(\dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}\right)^{-1}$.
> post.mean <- (mu0/tau.sq0 + n*mean(heights)/known.sigma.sq) /
+   (1/tau.sq0 + n/known.sigma.sq)
> post.mean
[1] 68.03969
> post.var <- 1/(1/tau.sq0 + n/known.sigma.sq)
> post.var
[1] 0.1592920
Normal Model with Known Mean, Unknown Variance
Now suppose we wish to estimate a model where the likelihood of the data is normal with a known mean $\mu$ and an unknown variance $\sigma^2$.
Now our parameter of interest is $\sigma^2$.
We can use a conjugate inverse gamma prior on $\sigma^2$, with shape parameter $\alpha_0$ and scale parameter $\beta_0$.
$$p(\sigma^2|y, \mu) \propto p(y|\mu, \sigma^2)\,p(\sigma^2)$$
$$\text{Invgamma}(\alpha_1, \beta_1) = \text{Normal}(\mu, \sigma^2) \times \text{Invgamma}(\alpha_0, \beta_0)$$
Let $\theta$ represent our parameter of interest, in this case $\sigma^2$.
$$p(\theta|y, \mu) \propto \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\theta}}\exp\left(-\frac{(y_i - \mu)^2}{2\theta}\right) \times \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\,\theta^{-(\alpha_0 + 1)}\exp\left(-\frac{\beta_0}{\theta}\right)$$
$$\propto \prod_{i=1}^{n}\theta^{-1/2}\exp\left(-\frac{(y_i - \mu)^2}{2\theta}\right) \times \theta^{-(\alpha_0 + 1)}\exp\left(-\frac{\beta_0}{\theta}\right)$$
$$= \theta^{-n/2}\exp\left(-\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\theta}\right)\theta^{-(\alpha_0 + 1)}\exp\left(-\frac{\beta_0}{\theta}\right)$$
$$= \theta^{-(\alpha_0 + \frac{n}{2} + 1)}\exp\left(-\frac{\beta_0}{\theta} - \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\theta}\right)$$
$$= \theta^{-(\alpha_0 + \frac{n}{2} + 1)}\exp\left[-\frac{2\beta_0 + \sum_{i=1}^{n}(y_i - \mu)^2}{2\theta}\right]$$
$$= \theta^{-(\alpha_0 + \frac{n}{2} + 1)}\exp\left[-\frac{\beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}}{\theta}\right]$$
This looks like the density of an inverse gamma distribution!
$$p(\theta|y, \mu) \propto \theta^{-(\alpha_0 + \frac{n}{2} + 1)}\exp\left[-\frac{\beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}}{\theta}\right]$$
$$\alpha_1 = \alpha_0 + \frac{n}{2}$$
$$\beta_1 = \beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}$$
Our posterior is an $\text{Invgamma}\left(\alpha_0 + \frac{n}{2},\; \beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}\right)$ distribution.
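As a sketch of this update in Python (the function and toy data are mine, not the slides'), note that an Inverse-Gamma($a$, $b$) draw is the reciprocal of a Gamma(shape $a$, rate $b$) draw, which lets the simulation below avoid needing an inverse-gamma sampler like MCMCpack's rinvgamma():

```python
import numpy as np

def invgamma_update(y, mu, alpha0, beta0):
    """Conjugate update for the variance of a Normal with known mean mu.

    Prior: sigma^2 ~ InvGamma(alpha0, beta0) (shape/scale).
    Posterior: InvGamma(alpha0 + n/2, beta0 + sum((y - mu)^2)/2).
    """
    y = np.asarray(y, dtype=float)
    alpha1 = alpha0 + len(y) / 2
    beta1 = beta0 + np.sum((y - mu) ** 2) / 2
    return alpha1, beta1

rng = np.random.default_rng(0)
y = rng.normal(loc=68, scale=4, size=100)  # fake heights, true variance 16
a1, b1 = invgamma_update(y, mu=68, alpha0=1, beta0=1)

# Posterior mean of InvGamma(a, b) is b/(a-1) for a > 1; draws are
# reciprocals of Gamma(shape=a, scale=1/b) draws.
draws = 1 / rng.gamma(shape=a1, scale=1 / b1, size=10_000)
print(b1 / (a1 - 1), draws.mean())  # both near the true variance
```

The analytic posterior mean and the simulated one agree, and both concentrate around the variance used to generate the fake data.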
A Simple Example
Again suppose we have some (fake) data on the heights (in inches) of a random sample of 100 individuals in the U.S. population.
> known.mean
> unknown.sigma.sq
> n
> heights
> alpha0
> beta0
Our posterior is an inverse gamma distribution with shape $\alpha_0 + \frac{n}{2}$ and scale $\beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}$.
> alpha1 <- alpha0 + n/2
> beta1 <- beta0 + sum((heights - known.mean)^2)/2
> library(MCMCpack)
> posterior <- rinvgamma(10000, alpha1, beta1)
> post.mean <- mean(posterior)
> post.mean
[1] 12.88139
> post.var <- var(posterior)
> post.var
[1] 3.136047
Hmm . . . what if we increased our sample size?
> n
> heights
> alpha1 <- alpha0 + n/2
> beta1 <- beta0 + sum((heights - known.mean)^2)/2
> posterior <- rinvgamma(10000, alpha1, beta1)
> post.mean <- mean(posterior)
> post.mean
[1] 15.92281
> post.var <- var(posterior)
> post.var
[1] 0.5058952