Bayesian Case Studies, week 1
Robin J. Ryder
7 January 2013
Robin J. Ryder Bayesian Case Studies, week 1
About this course
Two aims:
1 Implement computational algorithms
2 Analyse real datasets
6 × 3 hours.
E-mail: [email protected]. Office B627.
Evaluation: written-up analysis of a dataset, to hand in by the end of March. The project topic will be given in February.
Exponential family
A family of distributions (= a model) is an exponential family if the density can be written as

f_X(x|θ) = h(x) exp[η(θ) · T(x) − A(θ)]

where h, η, T and A are known functions.

Then T(x) is a sufficient statistic. For iid x_1, ..., x_n, Σ T(x_i) is a sufficient statistic for the sample: it encapsulates all the information about the parameters contained in the data. The posterior depends on the sample only through the sufficient statistic.
η(θ) is called the natural parameter.
A(θ) is the log-partition, the log of the normalizing factor.
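The sufficient-statistic property can be checked numerically. The sketch below is an example chosen here (not taken from the slide): it assumes a Poisson(θ) model, which is an exponential family with h(x) = 1/x!, η(θ) = log θ, T(x) = x and A(θ) = θ, together with a Gamma(a, b) prior.

```python
# Sketch under assumed Poisson(θ) model and Gamma(a, b) prior:
# the posterior after observing x_1, ..., x_n is Gamma(a + Σ x_i, b + n),
# so it depends on the data only through the sufficient statistic Σ x_i.

def poisson_gamma_posterior(data, a=1.0, b=1.0):
    """Posterior (shape, rate) of θ for iid Poisson data under a Gamma(a, b) prior."""
    return a + sum(data), b + len(data)

# Two samples of the same size with the same Σ x_i yield identical
# posteriors, even though the raw data differ.
post1 = poisson_gamma_posterior([0, 2, 4])
post2 = poisson_gamma_posterior([2, 2, 2])
assert post1 == post2 == (7.0, 4.0)
```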
Conjugate prior
A family of distributions is a conjugate prior for a given model if the posterior belongs to the same family of distributions.
This is mostly a computational advantage.
If the model is an exponential family, then a conjugate prior exists.
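A standard illustration (assumed here, not from the slide) is the Beta family as a conjugate prior for the Bernoulli model: the posterior is again a Beta distribution, with updated hyperparameters.

```python
# Sketch of Beta-Bernoulli conjugacy (assumed example): starting from a
# Beta(a, b) prior, observing k successes out of n trials gives a
# Beta(a + k, b + n - k) posterior -- the same family, updated in closed form.

def beta_bernoulli_update(a, b, successes, trials):
    """Return the posterior Beta(a', b') hyperparameters."""
    return a + successes, b + trials - successes

a_post, b_post = beta_bernoulli_update(1, 1, successes=7, trials=10)
assert (a_post, b_post) == (8, 4)  # Beta(8, 4) posterior from a uniform prior
```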
Jeffreys’ prior
Jeffreys’ prior, also called the uninformative prior, is invariant under reparameterization. In the one-dimensional case, it is defined as

π(θ) ∝ √I(θ)

where I(θ) is the Fisher information, defined in terms of the log-likelihood ℓ:

I(θ) = E_X[(∂ℓ/∂θ)² | θ] = −E_X[∂²ℓ/∂θ² | θ]

(under certain regularity conditions)
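As a worked check (an example assumed here, not from the slide), take the Poisson(θ) model: ℓ(θ) = x log θ − θ − log x!, so ∂ℓ/∂θ = x/θ − 1 and ∂²ℓ/∂θ² = −x/θ². Both definitions give I(θ) = 1/θ, hence Jeffreys’ prior π(θ) ∝ θ^(−1/2). The two expectations can be verified by Monte Carlo:

```python
# Monte Carlo check (assumed Poisson example) that the two Fisher
# information formulas agree: both should be close to 1/θ.
import random

random.seed(0)
theta = 3.0
n = 200_000

def poisson_sample(lam):
    # Knuth's algorithm; fine for small lam
    L, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

xs = [poisson_sample(theta) for _ in range(n)]
score_sq = sum((x / theta - 1) ** 2 for x in xs) / n  # E[(∂ℓ/∂θ)² | θ]
neg_hess = sum(x / theta ** 2 for x in xs) / n        # −E[∂²ℓ/∂θ² | θ]

# Both estimates should be close to 1/θ ≈ 0.333
assert abs(score_sq - 1 / theta) < 0.01
assert abs(neg_hess - 1 / theta) < 0.01
```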
Jeffreys’ prior (contd)
Jeffreys’ prior may be improper, which means that it integrates to infinity.
This is not an issue as long as the corresponding posterior is proper. This point should always be checked.
Data: Ship accidents
The dataset ShipAccidents includes data on accidents of 40 classes of ships. Each row corresponds to one class. Each class of ship is defined by 3 attributes: type of ship (5 modalities), period of construction (4 modalities), period of operation (2 modalities).
For each class of ship, we are given the cumulative number of months in operation and the cumulative number of incidents, which we expect to follow a Poisson distribution.
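A minimal sketch of the resulting conjugate analysis, using toy numbers in place of the real file (the model below, incidents_i ~ Poisson(λ · months_i) with a common rate λ and a Gamma(a, b) prior, is one simple choice and is assumed here, not prescribed by the slide):

```python
# Sketch under assumptions: with incidents_i ~ Poisson(λ · months_i) and a
# Gamma(a, b) prior on the per-month accident rate λ, the posterior is
# Gamma(a + Σ incidents_i, b + Σ months_i).

def ship_rate_posterior(incidents, months, a=1.0, b=1.0):
    """Posterior Gamma(shape, rate) for the per-month accident rate λ."""
    return a + sum(incidents), b + sum(months)

# Toy numbers, not the real ShipAccidents data:
shape, rate = ship_rate_posterior(incidents=[3, 0, 5], months=[1200, 800, 2000])
post_mean = shape / rate  # posterior mean accident rate per month
assert (shape, rate) == (9.0, 4001.0)
```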
ABC
Approximate Bayesian Computation is a computational method to draw approximate samples from a posterior distribution in cases where the likelihood is intractable, but where it is easy to simulate new datasets.
Given observed data Dobs and a prior π(θ), we wish to sample θ from the posterior π(θ|Dobs) ∝ π(θ)L(Dobs|θ).
The non-approximate version of the algorithm is:
1 Simulate θ from the prior π.
2 Simulate a new dataset Dsim from the model, with parameter θ.
3 If Dobs = Dsim, then accept θ; else reject θ.
4 Repeat until we get a large enough sample of θ’s.
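The four steps above can be sketched on a toy model (the Poisson(θ) likelihood and Exponential(1) prior are assumptions made here for illustration). Because the data are discrete, the exact acceptance test Dobs = Dsim is feasible, though only for very small datasets:

```python
# Exact-match ABC on a toy Poisson(θ) model with an Exponential(1) prior.
import random

random.seed(1)

def poisson_sample(lam):
    # Knuth's algorithm; fine for small lam
    L, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

d_obs = [2, 1, 3]  # tiny observed dataset

accepted = []
while len(accepted) < 500:
    theta = random.expovariate(1.0)                 # 1. simulate θ from the prior
    d_sim = [poisson_sample(theta) for _ in d_obs]  # 2. simulate a new dataset
    if d_sim == d_obs:                              # 3. accept iff Dobs = Dsim
        accepted.append(theta)                      # 4. repeat until enough θ's

# The accepted θ's are exact draws from the posterior, here Gamma(7, 4)
# (since Σ x_i = 6 and n = 3), whose mean is 7/4 = 1.75.
```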
ABC (contd)
It is clear that this algorithm gives samples which follow exactly the posterior distribution, but the acceptance probability at step 3 is very small, making the algorithm very slow. Instead, an approximate version is used, by introducing a distance d on datasets and a tolerance parameter ε:
1 Simulate θ from the prior π.
2 Simulate a new dataset Dsim from the model, with parameter θ.
3 If d(Dobs ,Dsim) < ε, then accept θ; else reject θ.
4 Repeat until we get a large enough sample of θ’s.
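The tolerance version can be sketched on the same toy Poisson(θ) model with Exponential(1) prior (both assumptions made here for illustration), taking d as the absolute difference between sample means; the sample mean is sufficient for the Poisson model, so little information is lost:

```python
# Tolerance-based ABC on a toy Poisson(θ) model with an Exponential(1) prior.
import random

random.seed(2)

def poisson_sample(lam):
    # Knuth's algorithm; fine for small lam
    L, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

d_obs = [2, 1, 3]
obs_mean = sum(d_obs) / len(d_obs)
eps = 0.5  # tolerance parameter ε

accepted = []
while len(accepted) < 500:
    theta = random.expovariate(1.0)                    # 1. simulate θ from the prior
    d_sim = [poisson_sample(theta) for _ in d_obs]     # 2. simulate a new dataset
    if abs(sum(d_sim) / len(d_sim) - obs_mean) < eps:  # 3. accept iff d(Dobs, Dsim) < ε
        accepted.append(theta)                         # 4. repeat until enough θ's

# The accepted θ's approximately follow the Gamma(7, 4) posterior;
# shrinking ε tightens the approximation at the cost of more rejections.
```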
ABC (contd)
In the limit ε → 0, this algorithm is exact.
In practice, the distance is usually computed on a summary statistic of the data. Ideally, the summary statistic is sufficient, thus incurring no loss of information.