
Module 1: Introduction to Bayesian Statistics, Part I

Rebecca C. Steorts

Agenda

- Motivations
- Traditional inference
- Bayesian inference
- Bernoulli, Beta
- Connection to the Binomial distribution
- Posterior of Beta-Bernoulli
- Example with 2012 election data
- Marginal likelihood
- Posterior prediction

Social networks

Precision Medicine

Estimating Rent Prices in Small Domains

[Figure: two panels, “Benchmarked estimates” and “Benchmarked estimates with smoothing”; vertical axis from 6.5 to 10.0.]

Traditional inference

You are given data X, and there is an unknown parameter θ that you wish to estimate. How would you estimate θ?

- Find an unbiased estimator of θ.
- Find the maximum likelihood estimate (MLE) of θ by looking at the likelihood of the data.
- If you cannot remember the definition of an unbiased estimator or the MLE, review these before our next class.

Bayesian Motivation

[credit: Peter Orbanz, Columbia University]

Bayesian inference

Bayesian methods trace their origins to the 18th century and the English Reverend Thomas Bayes, who, along with Pierre-Simon Laplace, discovered what we now call Bayes’ Theorem.

- p(x | θ): likelihood
- p(θ): prior
- p(θ | x): posterior
- p(x): marginal distribution

p(θ | x) = p(θ, x) / p(x) = p(x | θ) p(θ) / p(x) ∝ p(x | θ) p(θ)
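To see the proportionality numerically, here is a minimal R sketch (my addition, not from the slides) that evaluates likelihood × prior on a grid of θ values and normalizes the result; the Beta(2, 2) prior and the single observation x = 1 are purely illustrative choices.

# Hypothetical grid illustration of posterior ∝ likelihood × prior
th <- seq(0.001, 0.999, length.out = 999)   # grid over (0, 1)
prior <- dbeta(th, 2, 2)                    # illustrative Beta(2, 2) prior
like <- dbinom(1, size = 1, prob = th)      # Bernoulli likelihood of x = 1
post <- like * prior                        # unnormalized posterior
post <- post / sum(post * (th[2] - th[1]))  # normalize to integrate to 1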

Bernoulli distribution

The Bernoulli distribution is very common due to binary outcomes.

- Consider flipping a coin (heads or tails).
- We can represent this as a binary random variable, where the probability of heads is θ and the probability of tails is 1 − θ.

We write the random variable as X ∼ Bernoulli(θ), where 0 < θ < 1. It follows that the likelihood is

p(x | θ) = θ^x (1 − θ)^(1 − x) 1(0 < θ < 1).

- Exercise: what are the mean and the variance of X?

Bernoulli distribution

- Suppose that X1, . . . , Xn iid ∼ Bernoulli(θ). Then for x1, . . . , xn ∈ {0, 1}, what is the likelihood?

Notation

- ∝ means “proportional to”
- x1:n denotes x1, . . . , xn

Likelihood

p(x1:n | θ) = P(X1 = x1, . . . , Xn = xn | θ)
            = ∏_{i=1}^n P(Xi = xi | θ)
            = ∏_{i=1}^n p(xi | θ)
            = ∏_{i=1}^n θ^(xi) (1 − θ)^(1 − xi)
            = θ^(∑ xi) (1 − θ)^(n − ∑ xi).
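As a quick numerical companion (my addition, not from the slides), this likelihood can be coded directly; the function name bern_lik and the data below are hypothetical.

# Bernoulli likelihood theta^(sum x) * (1 - theta)^(n - sum x), vectorized in theta
bern_lik <- function(theta, x) {
  theta^sum(x) * (1 - theta)^(length(x) - sum(x))
}
x <- c(1, 0, 1, 1, 0)            # illustrative data, n = 5
bern_lik(c(0.3, 0.5, 0.7), x)    # likelihood at a few values of theta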

Beta distribution

Given a, b > 0, we write θ ∼ Beta(a, b) to mean that θ has pdf

p(θ) = Beta(θ | a, b) = (1 / B(a, b)) θ^(a−1) (1 − θ)^(b−1) 1(0 < θ < 1),

i.e., p(θ) ∝ θ^(a−1) (1 − θ)^(b−1) on the interval from 0 to 1.

- Here, B(a, b) = Γ(a) Γ(b) / Γ(a + b).
- The mean is E(θ) = ∫ θ p(θ) dθ = a / (a + b).
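As a sanity check (my addition, not in the slides), the mean formula can be verified numerically in R; the shape values a = 3, b = 2 are arbitrary.

a <- 3; b <- 2
integrate(function(th) th * dbeta(th, a, b), 0, 1)$value  # numerical E(theta)
a / (a + b)                                               # closed form: 0.6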

Posterior of Bernoulli-Beta

Let’s derive the posterior of θ | x1:n:

p(θ | x1:n) ∝ p(x1:n | θ) p(θ)
            = θ^(∑ xi) (1 − θ)^(n − ∑ xi) · (1 / B(a, b)) θ^(a−1) (1 − θ)^(b−1) 1(0 < θ < 1)
            ∝ θ^(a + ∑ xi − 1) (1 − θ)^(b + n − ∑ xi − 1) 1(0 < θ < 1)
            ∝ Beta(θ | a + ∑ xi, b + n − ∑ xi).
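Since the update only shifts the Beta parameters, the computation is one line of code. A minimal R sketch (the function name beta_bern_update is my own):

# Posterior Beta parameters given a Beta(a, b) prior and 0/1 data x
beta_bern_update <- function(a, b, x) {
  c(a = a + sum(x), b = b + length(x) - sum(x))
}
beta_bern_update(1, 1, c(1, 0, 1, 1, 0))  # uniform prior, 3 successes in 5 trials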

Approval ratings of Obama

What is the proportion of people that approve of President Obama in PA?

- We take a random sample of 10 people in PA and find that 6 approve of President Obama.
- The national approval rating (Zogby poll) of President Obama in mid-September 2015 was 45%. We’ll assume that in PA his approval rating is approximately 50%.
- Based on this prior information, we’ll use a Beta prior for θ and we’ll choose a and b.

Obama Example

n = 10
# Fixing values of a, b.
a = 21/8
b = 0.04
th = seq(0, 1, length.out = 500)
x = 6

# We set the likelihood, prior, and posterior with
# th as the sequence that we plot on the x-axis.
# dbeta(th, shape1, shape2) takes the two Beta shape parameters.
like = dbeta(th, x + 1, n - x + 1)
prior = dbeta(th, a, b)
post = dbeta(th, x + a, n - x + b)

Likelihood

plot(th, like, type = 'l', ylab = "Density",
     lty = 3, lwd = 3, xlab = expression(theta))

[Figure: likelihood curve over θ ∈ (0, 1); x-axis θ, y-axis Density.]

Prior

plot(th, prior, type = 'l', ylab = "Density",
     lty = 3, lwd = 3, xlab = expression(theta))

[Figure: prior density over θ ∈ (0, 1); x-axis θ, y-axis Density.]

Posterior

plot(th, post, type = 'l', ylab = "Density",
     lty = 3, lwd = 3, xlab = expression(theta))

[Figure: posterior density over θ ∈ (0, 1); x-axis θ, y-axis Density.]

Likelihood, Prior, and Posterior

[Figure: prior, likelihood, and posterior densities overlaid over θ ∈ (0, 1); x-axis θ, y-axis Density; legend: Prior, Likelihood, Posterior.]

Cast of characters

- Observed data: x
- Note this could consist of many data points, e.g., x = x1:n = (x1, . . . , xn).

likelihood                p(x | θ)
prior                     p(θ)
posterior                 p(θ | x)
marginal likelihood       p(x)
posterior predictive      p(xn+1 | x1:n)
loss function             ℓ(s, a)
posterior expected loss   ρ(a, x)
risk / frequentist risk   R(θ, δ)
integrated risk           r(δ)

Marginal likelihood

The marginal likelihood is

p(x) = ∫ p(x | θ) p(θ) dθ

- What is the marginal likelihood for the Bernoulli-Beta?

Posterior predictive distribution

- We may wish to predict a new data point xn+1.
- We assume that x1:(n+1) are independent given θ.

p(xn+1 | x1:n) = ∫ p(xn+1, θ | x1:n) dθ
               = ∫ p(xn+1 | θ, x1:n) p(θ | x1:n) dθ
               = ∫ p(xn+1 | θ) p(θ | x1:n) dθ.

Example: Back to the Beta-Bernoulli

Suppose θ ∼ Beta(a, b) and X1, . . . , Xn | θ iid ∼ Bernoulli(θ). Then the marginal likelihood is

p(x1:n) = ∫ p(x1:n | θ) p(θ) dθ
        = ∫₀¹ θ^(∑ xi) (1 − θ)^(n − ∑ xi) · (1 / B(a, b)) θ^(a−1) (1 − θ)^(b−1) dθ
        = B(a + ∑ xi, b + n − ∑ xi) / B(a, b),

by the integral definition of the Beta function.
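As a numerical check (my addition, not in the slides), we can compare the closed form against brute-force integration using the values from the Obama example:

a <- 21/8; b <- 0.04; n <- 10; s <- 6    # s = sum of the x_i
beta(a + s, b + n - s) / beta(a, b)      # closed-form marginal likelihood
integrate(function(th) th^s * (1 - th)^(n - s) * dbeta(th, a, b), 0, 1)$value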

Example continued

Let an = a + ∑ xi and bn = b + n − ∑ xi. It follows that the posterior distribution is

p(θ | x1:n) = Beta(θ | an, bn).

The posterior predictive can be derived to be

P(Xn+1 = 1 | x1:n) = ∫ P(Xn+1 = 1 | θ) p(θ | x1:n) dθ
                   = ∫ θ Beta(θ | an, bn) dθ
                   = an / (an + bn),

hence, the posterior predictive p.m.f. is

p(xn+1 | x1:n) = (an^(xn+1) bn^(1 − xn+1) / (an + bn)) 1(xn+1 ∈ {0, 1}).
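Applied to the Obama example, the predictive probability that an 11th sampled person approves is an/(an + bn); a minimal R sketch (my addition), with a Monte Carlo check using posterior draws:

a <- 21/8; b <- 0.04; n <- 10; s <- 6
an <- a + s; bn <- b + n - s
an / (an + bn)              # closed-form P(X_{n+1} = 1 | x_{1:n})
mean(rbeta(1e5, an, bn))    # Monte Carlo check: mean of posterior draws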