8/9/2019 — first class of Bayesian statistics

Bayesian Statistics, Lecture 1: Motivation
Alvaro Mauricio Montenegro Díaz
August, 2011
Table of contents
1. Brief History
2. Bayesian-frequentist controversy
3. Some basic Bayesian models
Brief History
Classical Statistics
Classical Age
1. Probability is old. Several hundred years ago, most mathematicians supported by rich noblemen advised them on how to maximize their winnings in games of chance.
2. Statistics is young. Linear regression first appeared in the work of Francis Galton in the late 1800s. Karl Pearson added correlation and goodness-of-fit measures around the turn of the last century.
3. Statistics blossomed. Between 1920 and 1930, R. A. Fisher developed the notion of likelihood for general estimation, and Jerzy Neyman and Egon Pearson developed the basis for classical hypothesis testing.
4. The war was an engine. A flurry of research activity was energized by World War II.
Bayesian Statistics
Bayesian Age
1. Bayesian methods are much older. The original paper dates to 1763, by the Rev. Thomas Bayes, a minister and amateur mathematician.
2. Laplace, Gauss, and others were interested in the 19th century.
3. The Bayesian approach was ignored or actively opposed in the early 20th century.
4. Several non-statisticians, such as Jeffreys (a physicist) and Bolye (an econometrician), continued to lobby on behalf of Bayesian ideas (which they referred to as "inverse probability").
5. Bayesian statistics is here. Beginning around 1950, statisticians such as Savage, de Finetti, Lindley, and many others began advocating Bayesian methods as remedies for certain deficiencies in the classical approach.
Bayesian-frequentist controversy
Example 1. Frequentist interval estimation
Suppose $X_i \stackrel{iid}{\sim} N(\theta, \sigma^2)$, $i = 1, \ldots, n$. We desire a 95% interval estimate for the population mean $\theta$.

Provided $n$ is sufficiently large (say, bigger than 30), a classical approach would use the confidence interval

$$\delta(x) = \bar{x} \pm 1.96\, s/\sqrt{n}.$$
Interpretation. Before any data are collected, the probability that the interval contains the true value is 0.95, for any value of $\theta$ and $\sigma^2$.

After collecting the data and computing $\delta(x)$, the interval either contains the true $\theta$ or it does not; its coverage probability is not 0.95 but either 0 or 1.

It is not correct to say that the true $\theta$ has a 95% chance of falling in $\delta(x)$.
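The frequentist interpretation can be illustrated with a small simulation (a minimal sketch; the true $\theta$, $\sigma$, sample size, and seed below are arbitrary choices): before the data are collected, the procedure covers the true mean in roughly 95% of repeated experiments.

```python
import random
import statistics

# Repeat the experiment many times: the interval x̄ ± 1.96 s/√n is a
# pre-data procedure whose long-run coverage of the true θ is about 0.95.
random.seed(42)
theta, sigma, n, reps = 5.0, 2.0, 50, 10_000

covered = 0
for _ in range(reps):
    x = [random.gauss(theta, sigma) for _ in range(n)]
    xbar = statistics.mean(x)
    s = statistics.stdev(x)
    half = 1.96 * s / n ** 0.5
    if xbar - half <= theta <= xbar + half:
        covered += 1

print(covered / reps)  # long-run coverage, close to 0.95
```

Once one particular interval has been computed from one dataset, however, it either contains $\theta$ or it does not; the 0.95 refers only to the procedure.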
Example 1. Bayesian point of view
Bayesian interval estimation
Bayesian confidence intervals are called credible sets. They are free of the awkward frequentist interpretation.

In the Bayesian approach, the data are first observed. Then, conditional on the observed data, the posterior distribution $p(\theta|x)$ is obtained. A 95% credible interval can be obtained by taking the 2.5 and 97.5 percentiles of the posterior distribution.

Interpretation. The actual value of $\theta$ has a 95% chance of falling in the credible interval. However, there is a price to pay: it is necessary to specify a (possibly vague) prior distribution for $\theta$.
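As a sketch of this recipe, assuming the conjugate normal/normal posterior derived later in the lecture (the data summary and the deliberately vague prior below are hypothetical):

```python
from statistics import NormalDist

# Hypothetical data summary (known sigma) and a vague normal prior for theta.
n, xbar, sigma = 30, 4.8, 2.0
mu0, tau = 0.0, 10.0

# Conjugate update: posterior precision = prior precision + data precision.
post_var = 1.0 / (1.0 / tau**2 + n / sigma**2)
post_mean = post_var * (mu0 / tau**2 + n * xbar / sigma**2)

# 95% credible interval: the 2.5 and 97.5 percentiles of the posterior.
post = NormalDist(post_mean, post_var ** 0.5)
ci = (post.inv_cdf(0.025), post.inv_cdf(0.975))
print(ci)
```

Under the posterior, $\theta$ really does have a 95% chance of lying in this interval, which is the interpretation most users want.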
Example 2. Test of hypothesis
Frequentist test of hypothesis

Suggested by Lindley and Phillips (1976).

Suppose that in 12 independent tosses of a coin, we observe 9 heads and 3 tails.

We wish to test the hypothesis $H_0: \theta = 1/2$ versus the alternative $H_a: \theta > 1/2$, where $\theta$ is the true probability of heads.
Possible models. Given only this information, two choices for the sampling distribution emerge:
1. Binomial: the number $n = 12$ of tosses was fixed beforehand, and the random quantity $X$ was the number of heads observed in the $n$ tosses. The likelihood function is

$$L_1(\theta) = \binom{n}{x} \theta^x (1-\theta)^{n-x} = \binom{12}{9} \theta^9 (1-\theta)^3.$$
2. Negative binomial: data collection involved flipping the coin until the third tail appeared. Here $X$ is the number of heads required to complete the experiment, so that $X \sim \text{NegBin}(r = 3, \theta)$, with likelihood

$$L_2(\theta) = \binom{r+x-1}{x} \theta^x (1-\theta)^r = \binom{11}{9} \theta^9 (1-\theta)^3.$$
Computations. The p-values for the rejection region "reject $H_0$ if $X \geq c$" are computed as follows. Using the binomial likelihood,

$$\alpha_1 = P_{\theta = 1/2}(X \geq 9) = \sum_{j=9}^{12} \binom{12}{j} \theta^j (1-\theta)^{12-j} = 0.075.$$

Using the negative binomial likelihood,

$$\alpha_2 = P_{\theta = 1/2}(X \geq 9) = \sum_{j=9}^{\infty} \binom{2+j}{j} \theta^j (1-\theta)^3 = 0.0325.$$

Oops!

Using the usual Type I error level $\alpha = 0.05$, the two model assumptions lead to two different decisions. The problem is that there is not sufficient information in the problem setting to help us select the correct model.
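The two tail sums can be checked with a few lines of standard-library Python (a quick check; the exact sums come out to about 0.073 and 0.033, which supports the same conclusion: one p-value lies above 0.05 and the other below):

```python
from math import comb

theta = 0.5  # null value of the probability of heads

# Binomial model: X ~ Bin(12, theta); p-value = P(X >= 9).
alpha1 = sum(comb(12, j) * theta**j * (1 - theta)**(12 - j)
             for j in range(9, 13))

# Negative binomial model: X = number of heads before the 3rd tail,
# X ~ NegBin(r = 3, theta); p-value = P(X >= 9) = 1 - P(X <= 8).
alpha2 = 1 - sum(comb(2 + j, j) * theta**j * (1 - theta)**3
                 for j in range(0, 9))

print(alpha1, alpha2)  # alpha1 > 0.05 > alpha2
```

Both models assign the data the same likelihood up to a constant, yet they yield different frequentist decisions; a Bayesian analysis, which conditions only on the observed data, would not distinguish between them.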
Some basic Bayesian models
Bayes theorem
The most basic Bayesian model has three stages:

1. Likelihood specification: $Y|\theta \sim f(y|\theta)$.
2. Prior specification: $\theta \sim p(\theta)$, where $Y$ or $\theta$ can be vectors. $p(\theta)$ can be completely specified (the simplest case) or not.
3. Computation of the posterior distribution. The posterior distribution of $\theta$ is given by

$$\pi(\theta|y) = \frac{f(y|\theta)\, p(\theta)}{m(y)},$$

where $m(y) = \int f(y|\theta)\, p(\theta)\, d\theta$ is the marginal distribution of $Y$. This is Bayes' Theorem.
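The three stages can be illustrated numerically by discretizing $\theta$ on a grid, so the integral in $m(y)$ becomes a sum (a minimal sketch; the uniform prior and the coin data from Example 2 are illustrative choices):

```python
from math import comb

# Stage 2: grid of candidate theta values with a uniform prior over them.
thetas = [i / 100 for i in range(1, 100)]
prior = [1 / len(thetas)] * len(thetas)

# Stage 1: likelihood of y = 9 heads in n = 12 tosses for each theta.
y, n = 9, 12
like = [comb(n, y) * t**y * (1 - t)**(n - y) for t in thetas]

# Stage 3: Bayes' theorem, posterior = likelihood * prior / marginal.
m_y = sum(l * p for l, p in zip(like, prior))
posterior = [l * p / m_y for l, p in zip(like, prior)]

print(sum(posterior))                            # normalizes to 1
print(thetas[posterior.index(max(posterior))])   # mode at 9/12 = 0.75
```

With a flat prior the posterior is proportional to the likelihood, so its mode coincides with the maximum likelihood estimate $9/12$.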
Distributions in the Bayesian perspective.
A Gaussian/Gaussian (normal/normal) model
$Y \sim N(\theta, \sigma^2)$, with $\sigma^2$ known. In this case both the prior and the likelihood (the observational model) are Gaussian distributions, namely

$$\theta \sim N(\mu, \tau^2), \qquad Y|\theta \sim N(\theta, \sigma^2).$$

Thus the marginal distribution of $Y$ is $N(\mu, \sigma^2 + \tau^2)$, and the posterior distribution of $\theta$ is Gaussian with mean and variance given by

$$E[\theta|Y] = B\mu + (1-B)Y, \qquad Var[\theta|Y] = (1-B)\sigma^2,$$

where $B = \sigma^2/(\sigma^2 + \tau^2)$.
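These formulas can be sketched directly; the example below (with arbitrary numerical values for $Y$, $\mu$, $\sigma^2$, $\tau^2$) shows how the posterior mean shrinks the observation toward the prior mean, and how the shrinkage factor $B$ fades as the prior becomes vaguer ($\tau^2$ grows):

```python
def posterior_moments(y, mu, tau2, sigma2):
    """Posterior mean and variance of theta in the normal/normal model."""
    B = sigma2 / (sigma2 + tau2)          # shrinkage factor
    return B * mu + (1 - B) * y, (1 - B) * sigma2

y, mu, sigma2 = 10.0, 0.0, 4.0            # one observation, prior mean 0
for tau2 in (0.1, 4.0, 100.0):            # increasingly vague prior
    mean, var = posterior_moments(y, mu, tau2, sigma2)
    print(tau2, round(mean, 3), round(var, 3))
```

With $\tau^2 = \sigma^2$ the posterior mean sits exactly halfway between $\mu$ and $Y$; as $\tau^2 \to \infty$ it approaches the data value $Y$.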
A beta/binomial model
$Y \sim \text{Bin}(n, \theta)$. This is a model that could be used in the hypothesis-testing problem presented above. After observing $Y$, the number of successes in $n$ independent trials, the sampling distribution (observational model) is given by

$$P(Y = y|\theta) = f(y|\theta) = \binom{n}{y} \theta^y (1-\theta)^{n-y}.$$

To obtain a closed form for the marginal distribution, we use the $\text{Beta}(a, b)$ prior distribution, that is,

$$p(\theta) = \frac{1}{B(a, b)}\, \theta^{a-1} (1-\theta)^{b-1},$$

where $B(a, b)$ is the beta function.

For convenience we use the reparametrization $(\mu, M)$, where $\mu = a/(a+b)$ is the prior mean and $M = a + b$ is a measure of the prior precision.
A beta/binomial model II
$Y \sim \text{Bin}(n, \theta)$. The prior variance, given by $\mu(1-\mu)/(M+1)$, is a decreasing function of $M$. The marginal distribution of $Y$ is called beta-binomial, and it can be shown that

$$E\left[\frac{Y}{n}\right] = \mu, \qquad Var\left[\frac{Y}{n}\right] = \frac{\mu(1-\mu)}{n}\left(1 + \frac{n-1}{M+1}\right).$$

The posterior distribution of $\theta$ is again a beta distribution, with

$$\hat\theta = E[\theta|Y] = \frac{M}{M+n}\,\mu + \frac{n}{M+n}\,\frac{Y}{n}, \qquad Var[\theta|Y] = \frac{\hat\theta(1-\hat\theta)}{M+n+1}.$$
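As a sketch in the $(\mu, M)$ parametrization, applied to the coin data of Example 2 ($y = 9$ heads in $n = 12$ tosses) under a uniform $\text{Beta}(1, 1)$ prior, i.e. $\mu = 1/2$ and $M = 2$ (an illustrative choice):

```python
def beta_binomial_posterior(y, n, mu, M):
    """Posterior mean and variance of theta for the beta/binomial model."""
    theta_hat = (M / (M + n)) * mu + (n / (M + n)) * (y / n)
    var = theta_hat * (1 - theta_hat) / (M + n + 1)
    return theta_hat, var

# Uniform Beta(1, 1) prior: mu = 1/2, M = 2; data: 9 heads in 12 tosses.
theta_hat, var = beta_binomial_posterior(9, 12, 0.5, 2.0)
print(theta_hat, var)  # matches the Beta(10, 4) posterior: mean 10/14
```

The posterior mean is a weighted average of the prior mean $\mu$ and the observed proportion $y/n$, with weights $M/(M+n)$ and $n/(M+n)$: more data pulls the estimate toward the sample proportion.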
Poisson distribution
$Y_i \sim \text{Po}(\lambda)$, $i = 1, \ldots, n$. It is usual to assume a gamma distribution for the prior of $\lambda$, say $G(\alpha, \beta)$. For this example, we assume the parameterization

$$p(x; \alpha, \beta) = \frac{x^{\alpha-1} e^{-\beta x} \beta^{\alpha}}{\Gamma(\alpha)},$$

so $E[X] = \alpha/\beta$ and $Var[X] = \alpha/\beta^2$. The posterior distribution satisfies

$$p(\lambda|y) \propto \prod_{i=1}^{n} e^{-\lambda} \lambda^{y_i} \cdot \lambda^{\alpha-1} e^{-\beta\lambda} = e^{-(n+\beta)\lambda}\, \lambda^{\sum_i y_i + \alpha - 1}.$$
Example 1. Bayesian estimate of λ
$Y_i \sim \text{Po}(\lambda)$, $i = 1, \ldots, n$. Then

$$p(\lambda|y) = G\left(\sum_{i=1}^{n} y_i + \alpha,\; n + \beta\right).$$

Hence, the EAP (expected a posteriori) estimate of $\lambda$ is

$$\hat\lambda = \frac{\sum_{i=1}^{n} y_i + \alpha}{n + \beta} = \frac{n}{n+\beta}\,\frac{\sum_{i=1}^{n} y_i}{n} + \frac{\beta}{n+\beta}\,\frac{\alpha}{\beta}.$$
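A minimal sketch of this estimate (the counts and the prior values $\alpha$, $\beta$ below are hypothetical); it also checks that the EAP equals the stated weighted average of the sample mean $\bar{y}$ and the prior mean $\alpha/\beta$:

```python
def poisson_gamma_eap(y, alpha, beta):
    """EAP estimate of lambda from the posterior G(sum(y) + alpha, n + beta)."""
    n = len(y)
    return (sum(y) + alpha) / (n + beta)

y = [3, 5, 4, 6, 2]          # hypothetical Poisson counts
alpha, beta = 2.0, 1.0       # gamma prior with mean alpha/beta = 2
lam_hat = poisson_gamma_eap(y, alpha, beta)

# Equivalent weighted-average form: (n/(n+beta))*ybar + (beta/(n+beta))*(alpha/beta).
n, ybar = len(y), sum(y) / len(y)
weighted = n / (n + beta) * ybar + beta / (n + beta) * (alpha / beta)
print(lam_hat, weighted)
```

As in the beta/binomial case, the estimate shrinks the sample mean toward the prior mean, with the data weight $n/(n+\beta)$ growing as more observations arrive.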