8/12/2019 Conjugacy Print
Conjugate Models
Patrick Lam
Outline
Conjugate Models
  What is Conjugacy?
  The Beta-Binomial Model
The Normal Model
  Normal Model with Unknown Mean, Known Variance
  Normal Model with Known Mean, Unknown Variance
Conjugacy
Suppose we have a Bayesian model with a likelihood $p(y|\theta)$ and a prior $p(\theta)$.
If we multiply our likelihood and prior, we get our posterior $p(\theta|y)$ up to a constant of proportionality.
If our posterior is a distribution of the same family as our prior, then we have conjugacy. We say that the prior is conjugate to the likelihood.
Conjugate models are great because we know the exact distribution of the posterior, so we can easily simulate from it or derive quantities of interest analytically.
In practice, we rarely have conjugacy.
Brief List of Conjugate Models

Likelihood                           Prior           Posterior
Binomial                             Beta            Beta
Negative Binomial                    Beta            Beta
Poisson                              Gamma           Gamma
Geometric                            Beta            Beta
Exponential                          Gamma           Gamma
Normal (mean unknown)                Normal          Normal
Normal (variance unknown)            Inverse Gamma   Inverse Gamma
Normal (mean and variance unknown)   Normal/Gamma    Normal/Gamma
Multinomial                          Dirichlet       Dirichlet
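Any row of this table can be checked numerically. As a sketch (not from the slides; the toy counts and hyperparameters are made up for illustration), the Python code below verifies the Poisson-Gamma row: with a Gamma(shape $a$, rate $b$) prior, likelihood times prior should match a Gamma($a + \sum y_i$, $b + n$) density up to a constant.

```python
import numpy as np
from math import lgamma

# Toy Poisson counts and an arbitrary Gamma(shape=a, rate=b) prior
y = np.array([3, 1, 4, 1, 5])
a, b = 2.0, 1.0
lam = np.linspace(0.1, 10, 500)  # grid of lambda values

def log_gamma_pdf(x, shape, rate):
    # log density of a Gamma(shape, rate) distribution
    return shape * np.log(rate) - lgamma(shape) + (shape - 1) * np.log(x) - rate * x

# Unnormalized log posterior: Poisson log-likelihood + Gamma log-prior
log_post = np.sum(y) * np.log(lam) - len(y) * lam + (a - 1) * np.log(lam) - b * lam
log_post -= log_post.max()  # remove the unknown normalizing constant

# The conjugate posterior claimed by the table
log_conj = log_gamma_pdf(lam, a + y.sum(), b + len(y))
log_conj -= log_conj.max()

print(np.allclose(log_post, log_conj))  # True: the two curves coincide
```

The same grid comparison works for the Beta-Binomial and Normal rows, since conjugacy is exactly the statement that likelihood times prior has the posterior family's functional form.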
A Binomial Example
Suppose we have a vector of data on voter turnout for a random sample of $n$ voters in the 2004 US Presidential election.
We can model the voter turnout with a binomial model.
$$Y \sim \text{Binomial}(n, \pi)$$
Quantity of interest: $\pi$ (voter turnout)
Assumptions:
Each voter's decision to vote follows the Bernoulli distribution.
Each voter has the same probability of voting. (unrealistic)
Each voter's decision to vote is independent. (unrealistic)
The Conjugate Beta Prior
We can use the beta distribution as a prior for $\pi$, since the beta distribution is conjugate to the binomial distribution.
$$p(\pi|y) \propto p(y|\pi)\,p(\pi)$$
$$= \text{Binomial}(n, \pi) \times \text{Beta}(\alpha, \beta)$$
$$= \binom{n}{y} \pi^{y} (1-\pi)^{n-y} \times \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \pi^{\alpha-1} (1-\pi)^{\beta-1}$$
$$\propto \pi^{y} (1-\pi)^{n-y}\, \pi^{\alpha-1} (1-\pi)^{\beta-1}$$
$$p(\pi|y) \propto \pi^{y+\alpha-1} (1-\pi)^{n-y+\beta-1}$$
The posterior distribution is simply a $\text{Beta}(y+\alpha,\; n-y+\beta)$ distribution. Effectively, our prior is just adding $\alpha - 1$ successes and $\beta - 1$ failures to the dataset.
The Uninformative (Flat) Uniform Prior
Suppose we have no strong prior beliefs about the parameters. We can choose a prior that gives equal weight to all possible values of the parameters, essentially an uninformative or flat prior:
$$p(\pi) = \text{constant}$$
for all values of $\pi$.
For the binomial model, one example of a flat prior is the Beta(1, 1) prior:
$$p(\pi) = \frac{\Gamma(2)}{\Gamma(1)\Gamma(1)}\, \pi^{1-1} (1-\pi)^{1-1} = 1$$
which is the Uniform distribution over the $[0, 1]$ interval.
Since we know that a Binomial likelihood and a Beta(1, 1) prior produce a $\text{Beta}(y+1,\; n-y+1)$ posterior, we can simulate the posterior in R.
Suppose our turnout data had 500 voters, of which 285 voted.
> table(turnout)
turnout
  0   1
215 285
Setting our prior parameters at $\alpha = 1$ and $\beta = 1$,
> a <- 1
> b <- 1
we get the posterior
> posterior.unif.prior <- rbeta(10000, 285 + a, 215 + b)
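The slides simulate this posterior in R with rbeta(); an equivalent sketch in Python (numpy assumed available) draws from the exact Beta posterior and checks the simulated mean against the analytic one:

```python
import numpy as np

# Data from the slides: n = 500 voters, of which y = 285 voted
n, y = 500, 285
a, b = 1, 1  # flat Beta(1, 1) prior

# Conjugacy: the posterior is exactly Beta(y + a, n - y + b)
alpha_post, beta_post = y + a, n - y + b

# Analytic posterior mean of a Beta(alpha, beta) is alpha / (alpha + beta)
post_mean = alpha_post / (alpha_post + beta_post)

# Or simulate, as the slides do in R
rng = np.random.default_rng(0)
draws = rng.beta(alpha_post, beta_post, size=10_000)
print(post_mean, draws.mean())  # the two agree closely
```

Because the posterior family is known exactly, simulation here is a convenience for computing quantities of interest (quantiles, intervals), not a necessity.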
Normal Model with Unknown Mean, Known Variance
Suppose we wish to estimate a model where the likelihood of the data is normal with an unknown mean $\mu$ and a known variance $\sigma^2$.
Our parameter of interest is $\mu$.
We can use a conjugate Normal prior on $\mu$, with mean $\mu_0$ and variance $\tau_0^2$.
$$p(\mu|y, \sigma^2) \propto p(y|\mu, \sigma^2)\,p(\mu)$$
$$\text{Normal}(\mu_1, \tau_1^2) = \text{Normal}(\mu, \sigma^2) \times \text{Normal}(\mu_0, \tau_0^2)$$
Let $\theta$ represent our parameter of interest, in this case $\mu$.
$$p(\theta|y) \propto \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \theta)^2}{2\sigma^2}\right) \times \frac{1}{\sqrt{2\pi\tau_0^2}} \exp\left(-\frac{(\theta - \mu_0)^2}{2\tau_0^2}\right)$$
$$\propto \exp\left[-\left(\sum_{i=1}^{n}\frac{(y_i - \theta)^2}{2\sigma^2} + \frac{(\theta - \mu_0)^2}{2\tau_0^2}\right)\right]$$
$$= \exp\left[-\frac{1}{2}\left(\frac{\sum_{i=1}^{n}(y_i - \theta)^2}{\sigma^2} + \frac{(\theta - \mu_0)^2}{\tau_0^2}\right)\right]$$
$$= \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2\sum_{i=1}^{n}(y_i - \theta)^2 + \sigma^2(\theta - \mu_0)^2\right)\right]$$
$$= \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2\sum_{i=1}^{n}(y_i^2 - 2y_i\theta + \theta^2) + \sigma^2(\theta^2 - 2\theta\mu_0 + \mu_0^2)\right)\right]$$
We can multiply the $2y_i\theta$ term in the summation by $\frac{n}{n}$ in order to get the equations in terms of the sufficient statistic $\bar{y}$.
$$p(\theta|y) \propto \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2\sum_{i=1}^{n}\left(y_i^2 - \frac{2n}{n}y_i\theta + \theta^2\right) + \sigma^2(\theta^2 - 2\theta\mu_0 + \mu_0^2)\right)\right]$$
$$= \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\tau_0^2\sum_{i=1}^{n}y_i^2 - 2\tau_0^2 n\bar{y}\theta + \tau_0^2 n\theta^2 + \sigma^2\theta^2 - 2\sigma^2\theta\mu_0 + \sigma^2\mu_0^2\right)\right]$$
We can then factor the terms into several parts. Since $\sigma^2\mu_0^2$ and $\tau_0^2\sum_{i=1}^{n}y_i^2$ do not contain $\theta$, we can represent them as some constant $k$, which we will drop into the normalizing constant.
$$p(\theta|y) \propto \exp\left[-\frac{1}{2\sigma^2\tau_0^2}\left(\theta^2(\sigma^2 + \tau_0^2 n) - 2\theta(\sigma^2\mu_0 + \tau_0^2 n\bar{y})\right) + k\right]$$
$$= \exp\left[-\frac{1}{2}\,\frac{\sigma^2 + \tau_0^2 n}{\sigma^2\tau_0^2}\left(\theta^2 - 2\theta\,\frac{\sigma^2\mu_0 + \tau_0^2 n\bar{y}}{\sigma^2 + \tau_0^2 n}\right) + k\right]$$
$$= \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta^2 - 2\theta\,\frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right) + k\right]$$
Let's multiply by $\frac{1/\tau_0^2 + n/\sigma^2}{1/\tau_0^2 + n/\sigma^2}$ in order to simplify the $\theta^2$ term. Completing the square in $\theta$ (the extra constant this introduces is absorbed into $k$, and $k$ itself drops into the normalizing constant) gives
$$p(\theta|y) \propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta - \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right)^2\right]$$
Finally, we have something that looks like the density function of a Normal distribution!
$$p(\theta|y) \propto \exp\left[-\frac{1}{2}\left(\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}\right)\left(\theta - \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}\right)^2\right]$$
Posterior Mean: $\mu_1 = \dfrac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}$
Posterior Variance: $\tau_1^2 = \left(\dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}\right)^{-1}$
Posterior Precision: $\dfrac{1}{\tau_1^2} = \dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}$
Posterior precision is just the sum of the prior precision and the data precision.
We can also look more closely at how the prior mean $\mu_0$ and the posterior mean $\mu_1$ relate to each other.
$$\mu_1 = \frac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}} = \frac{\frac{\mu_0\sigma^2 + \tau_0^2 n\bar{y}}{\tau_0^2\sigma^2}}{\frac{\sigma^2 + n\tau_0^2}{\tau_0^2\sigma^2}} = \frac{\mu_0\sigma^2 + \tau_0^2 n\bar{y}}{\sigma^2 + n\tau_0^2} = \mu_0\,\frac{\sigma^2}{\sigma^2 + n\tau_0^2} + \bar{y}\,\frac{n\tau_0^2}{\sigma^2 + n\tau_0^2}$$
As $n$ increases, the data mean dominates the prior mean.
As $\tau_0^2$ decreases (less prior variance, greater prior precision), our prior mean becomes more important.
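This weighting can be made concrete with a small Python helper (the function and the illustrative numbers are mine, not the slides'): with few observations the posterior mean sits between the prior mean and the sample mean, and with many observations it approaches the sample mean.

```python
def normal_posterior(y_bar, n, sigma_sq, mu0, tau_sq0):
    """Conjugate update for a Normal likelihood with known variance.

    Returns the posterior mean and variance of mu given the sufficient
    statistics (y_bar, n), the known data variance sigma_sq, and a
    Normal(mu0, tau_sq0) prior.
    """
    precision = 1 / tau_sq0 + n / sigma_sq  # posterior precision = prior + data precision
    post_mean = (mu0 / tau_sq0 + n * y_bar / sigma_sq) / precision
    post_var = 1 / precision
    return post_mean, post_var

# Illustrative numbers only: prior mean 60, sample mean 68.
# As n grows, the posterior mean moves toward the sample mean.
m_small, _ = normal_posterior(y_bar=68, n=10, sigma_sq=16, mu0=60, tau_sq0=4)
m_large, _ = normal_posterior(y_bar=68, n=1000, sigma_sq=16, mu0=60, tau_sq0=4)
print(m_small, m_large)
```

Shrinking tau_sq0 instead pulls both results back toward mu0, mirroring the second bullet above.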
A Simple Example
Suppose we have some (fake) data on the heights (in inches) of a random sample of 100 individuals in the U.S. population.
> known.sigma.sq
> unknown.mean
> n
> heights
> mu0
> tau.sq0
Our posterior is a Normal distribution with Mean $\dfrac{\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}}{\frac{1}{\tau_0^2} + \frac{n}{\sigma^2}}$ and Variance $\left(\dfrac{1}{\tau_0^2} + \dfrac{n}{\sigma^2}\right)^{-1}$.
> post.mean <- (mu0/tau.sq0 + n*mean(heights)/known.sigma.sq) /
+   (1/tau.sq0 + n/known.sigma.sq)
> post.mean
[1] 68.03969
> post.var <- 1/(1/tau.sq0 + n/known.sigma.sq)
> post.var
[1] 0.1592920
Normal Model with Known Mean, Unknown Variance
Now suppose we wish to estimate a model where the likelihood of the data is normal with a known mean $\mu$ and an unknown variance $\sigma^2$.
Now our parameter of interest is $\sigma^2$.
We can use a conjugate inverse gamma prior on $\sigma^2$, with shape parameter $\alpha_0$ and scale parameter $\beta_0$.
$$p(\sigma^2|y, \mu) \propto p(y|\mu, \sigma^2)\,p(\sigma^2)$$
$$\text{Invgamma}(\alpha_1, \beta_1) = \text{Normal}(\mu, \sigma^2) \times \text{Invgamma}(\alpha_0, \beta_0)$$
Let $\theta$ represent our parameter of interest, in this case $\sigma^2$.
$$p(\theta|y, \mu) \propto \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi\theta}}\exp\left(-\frac{(y_i - \mu)^2}{2\theta}\right) \times \frac{\beta_0^{\alpha_0}}{\Gamma(\alpha_0)}\,\theta^{-(\alpha_0 + 1)}\exp\left(-\frac{\beta_0}{\theta}\right)$$
$$\propto \prod_{i=1}^{n}\theta^{-1/2}\exp\left(-\frac{(y_i - \mu)^2}{2\theta}\right) \times \theta^{-(\alpha_0 + 1)}\exp\left(-\frac{\beta_0}{\theta}\right)$$
$$= \theta^{-n/2}\exp\left(-\frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\theta}\right)\theta^{-(\alpha_0 + 1)}\exp\left(-\frac{\beta_0}{\theta}\right)$$
$$= \theta^{-(\alpha_0 + \frac{n}{2} + 1)}\exp\left(-\frac{\beta_0}{\theta} - \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2\theta}\right)$$
$$= \theta^{-(\alpha_0 + \frac{n}{2} + 1)}\exp\left[-\frac{2\beta_0 + \sum_{i=1}^{n}(y_i - \mu)^2}{2\theta}\right]$$
$$= \theta^{-(\alpha_0 + \frac{n}{2} + 1)}\exp\left[-\frac{\beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}}{\theta}\right]$$
This looks like the density of an inverse gamma distribution!
$$p(\theta|y, \mu) \propto \theta^{-(\alpha_0 + \frac{n}{2} + 1)}\exp\left[-\frac{\beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}}{\theta}\right]$$
$$\alpha_1 = \alpha_0 + \frac{n}{2}$$
$$\beta_1 = \beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}$$
Our posterior is an $\text{Invgamma}\left(\alpha_0 + \frac{n}{2},\; \beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}\right)$ distribution.
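As a sketch of this update in Python (the function and toy data are mine, not the slides'), note that an Inverse-Gamma($a$, $b$) draw is the reciprocal of a Gamma(shape $a$, rate $b$) draw, which lets the simulation below avoid needing an inverse-gamma sampler like MCMCpack's rinvgamma():

```python
import numpy as np

def invgamma_update(y, mu, alpha0, beta0):
    """Conjugate update for the variance of a Normal with known mean mu.

    Prior: sigma^2 ~ InvGamma(alpha0, beta0) (shape/scale).
    Posterior: InvGamma(alpha0 + n/2, beta0 + sum((y - mu)^2)/2).
    """
    y = np.asarray(y, dtype=float)
    alpha1 = alpha0 + len(y) / 2
    beta1 = beta0 + np.sum((y - mu) ** 2) / 2
    return alpha1, beta1

rng = np.random.default_rng(0)
y = rng.normal(loc=68, scale=4, size=100)  # fake heights, true variance 16
a1, b1 = invgamma_update(y, mu=68, alpha0=1, beta0=1)

# Posterior mean of InvGamma(a, b) is b/(a-1) for a > 1; draws are
# reciprocals of Gamma(shape=a, scale=1/b) draws.
draws = 1 / rng.gamma(shape=a1, scale=1 / b1, size=10_000)
print(b1 / (a1 - 1), draws.mean())  # both near the true variance
```

The analytic posterior mean and the simulated one agree, and both concentrate around the variance used to generate the fake data.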
A Simple Example
Again suppose we have some (fake) data on the heights (in inches) of a random sample of 100 individuals in the U.S. population.
> known.mean
> unknown.sigma.sq
> n
> heights
> alpha0
> beta0
Our posterior is an inverse gamma distribution with shape $\alpha_0 + \frac{n}{2}$ and scale $\beta_0 + \frac{\sum_{i=1}^{n}(y_i - \mu)^2}{2}$.
> alpha1 <- alpha0 + n/2
> beta1 <- beta0 + sum((heights - known.mean)^2)/2
> library(MCMCpack)
> posterior <- rinvgamma(10000, alpha1, beta1)
> post.mean <- mean(posterior)
> post.mean
[1] 12.88139
> post.var <- var(posterior)
> post.var
[1] 3.136047
Hmm . . . what if we increased our sample size?
> n
> heights
> alpha1 <- alpha0 + n/2
> beta1 <- beta0 + sum((heights - known.mean)^2)/2
> posterior <- rinvgamma(10000, alpha1, beta1)
> post.mean <- mean(posterior)
> post.mean
[1] 15.92281
> post.var <- var(posterior)
> post.var
[1] 0.5058952