Bayesian Case Studies, week 1
Robin J. Ryder
7 January 2013
Robin J. Ryder Bayesian Case Studies, week 1
About this course
Two aims:
1 Implement computational algorithms
2 Analyse real datasets
6 × 3 hours.
E-mail: [email protected]. Office B627.
Evaluation: written-up analysis of a dataset, to hand in by the end of March. The project topic will be given in February.
Exponential family
A family of distributions (= a model) is an exponential family if the density can be written as

f_X(x|θ) = h(x) exp[η(θ) · T(x) − A(θ)]

where h, η, T and A are known functions.

Then T(x) is a sufficient statistic. For iid x_1, ..., x_n, Σ T(x_i) is a sufficient statistic for the sample: it encapsulates all the information about the parameters contained in the data. The posterior depends on the sample only through the sufficient statistic.
η(θ) is called the natural parameter.
A(θ) is the log-partition, the log of the normalizing factor.
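The sufficient-statistic property can be checked numerically. The sketch below is an example chosen here (not taken from the slide): it assumes a Poisson(θ) model, which is an exponential family with h(x) = 1/x!, η(θ) = log θ, T(x) = x and A(θ) = θ, together with a Gamma(a, b) prior.

```python
# Sketch under assumed Poisson(θ) model and Gamma(a, b) prior:
# the posterior after observing x_1, ..., x_n is Gamma(a + Σ x_i, b + n),
# so it depends on the data only through the sufficient statistic Σ x_i.

def poisson_gamma_posterior(data, a=1.0, b=1.0):
    """Posterior (shape, rate) of θ for iid Poisson data under a Gamma(a, b) prior."""
    return a + sum(data), b + len(data)

# Two samples of the same size with the same Σ x_i yield identical
# posteriors, even though the raw data differ.
post1 = poisson_gamma_posterior([0, 2, 4])
post2 = poisson_gamma_posterior([2, 2, 2])
assert post1 == post2 == (7.0, 4.0)
```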
Conjugate prior
A family of distributions is a conjugate prior for a given model if the posterior belongs to the same family of distributions.
This is mostly a computational advantage.
If the model is an exponential family, then a conjugate prior exists.
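A standard illustration (assumed here, not from the slide) is the Beta family as a conjugate prior for the Bernoulli model: the posterior is again a Beta distribution, with updated hyperparameters.

```python
# Sketch of Beta-Bernoulli conjugacy (assumed example): starting from a
# Beta(a, b) prior, observing k successes out of n trials gives a
# Beta(a + k, b + n - k) posterior -- the same family, updated in closed form.

def beta_bernoulli_update(a, b, successes, trials):
    """Return the posterior Beta(a', b') hyperparameters."""
    return a + successes, b + trials - successes

a_post, b_post = beta_bernoulli_update(1, 1, successes=7, trials=10)
assert (a_post, b_post) == (8, 4)  # Beta(8, 4) posterior from a uniform prior
```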
Jeffreys’ prior
Jeffreys’ prior, also called the uninformative prior, is invariant under reparameterization. In the one-dimensional case, it is defined as

π(θ) ∝ √I(θ)

where I(θ) is the Fisher information, defined in terms of the log-likelihood ℓ:

I(θ) = E_X[(∂ℓ/∂θ)² | θ] = −E_X[∂²ℓ/∂θ² | θ]

(under certain regularity conditions)
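As a worked check (an example assumed here, not from the slide), take the Poisson(θ) model: ℓ(θ) = x log θ − θ − log x!, so ∂ℓ/∂θ = x/θ − 1 and ∂²ℓ/∂θ² = −x/θ². Both definitions give I(θ) = 1/θ, hence Jeffreys’ prior π(θ) ∝ θ^(−1/2). The two expectations can be verified by Monte Carlo:

```python
# Monte Carlo check (assumed Poisson example) that the two Fisher
# information formulas agree: both should be close to 1/θ.
import random

random.seed(0)
theta = 3.0
n = 200_000

def poisson_sample(lam):
    # Knuth's algorithm; fine for small lam
    L, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

xs = [poisson_sample(theta) for _ in range(n)]
score_sq = sum((x / theta - 1) ** 2 for x in xs) / n  # E[(∂ℓ/∂θ)² | θ]
neg_hess = sum(x / theta ** 2 for x in xs) / n        # −E[∂²ℓ/∂θ² | θ]

# Both estimates should be close to 1/θ ≈ 0.333
assert abs(score_sq - 1 / theta) < 0.01
assert abs(neg_hess - 1 / theta) < 0.01
```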
Jeffreys’ prior (contd)
Jeffreys’ prior may be improper, which means that it integrates to infinity.
This is not an issue as long as the corresponding posterior is proper. This point should always be checked.
Data: Ship accidents
The dataset ShipAccidents includes data on accidents of 40 classes of ships. Each row corresponds to one class. Each class of ship is defined by 3 attributes: type of ship (5 modalities), period of construction (4 modalities), period of operation (2 modalities).
For each class of ship, we are given the cumulative number of months in operation and the cumulative number of incidents, which we expect to follow a Poisson distribution.
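A minimal sketch of the resulting conjugate analysis, using toy numbers in place of the real file (the model below, incidents_i ~ Poisson(λ · months_i) with a common rate λ and a Gamma(a, b) prior, is one simple choice and is assumed here, not prescribed by the slide):

```python
# Sketch under assumptions: with incidents_i ~ Poisson(λ · months_i) and a
# Gamma(a, b) prior on the per-month accident rate λ, the posterior is
# Gamma(a + Σ incidents_i, b + Σ months_i).

def ship_rate_posterior(incidents, months, a=1.0, b=1.0):
    """Posterior Gamma(shape, rate) for the per-month accident rate λ."""
    return a + sum(incidents), b + sum(months)

# Toy numbers, not the real ShipAccidents data:
shape, rate = ship_rate_posterior(incidents=[3, 0, 5], months=[1200, 800, 2000])
post_mean = shape / rate  # posterior mean accident rate per month
assert (shape, rate) == (9.0, 4001.0)
```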
ABC
Approximate Bayesian Computation is a computational method to draw approximate samples from a posterior distribution in cases where the likelihood is intractable, but where it is easy to simulate new datasets.
Given observed data Dobs and a prior π(θ), we wish to sample θ from the posterior π(θ|Dobs) ∝ π(θ)L(Dobs|θ).
The non-approximate version of the algorithm is:
1 Simulate θ from the prior π.
2 Simulate a new dataset Dsim from the model, with parameter θ.
3 If Dobs = Dsim, then accept θ; else reject θ.
4 Repeat until we get a large enough sample of θ’s.
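The four steps above can be sketched on a toy model (the Poisson(θ) likelihood and Exponential(1) prior are assumptions made here for illustration). Because the data are discrete, the exact acceptance test Dobs = Dsim is feasible, though only for very small datasets:

```python
# Exact-match ABC on a toy Poisson(θ) model with an Exponential(1) prior.
import random

random.seed(1)

def poisson_sample(lam):
    # Knuth's algorithm; fine for small lam
    L, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

d_obs = [2, 1, 3]  # tiny observed dataset

accepted = []
while len(accepted) < 500:
    theta = random.expovariate(1.0)                 # 1. simulate θ from the prior
    d_sim = [poisson_sample(theta) for _ in d_obs]  # 2. simulate a new dataset
    if d_sim == d_obs:                              # 3. accept iff Dobs = Dsim
        accepted.append(theta)                      # 4. repeat until enough θ's

# The accepted θ's are exact draws from the posterior, here Gamma(7, 4)
# (since Σ x_i = 6 and n = 3), whose mean is 7/4 = 1.75.
```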
ABC (contd)
It is clear that this algorithm gives samples which follow exactly the posterior distribution, but the acceptance probability at step 3 is very small, making the algorithm very slow. Instead, an approximate version is used, by introducing a distance d on datasets and a tolerance parameter ε:
1 Simulate θ from the prior π.
2 Simulate a new dataset Dsim from the model, with parameter θ.
3 If d(Dobs ,Dsim) < ε, then accept θ; else reject θ.
4 Repeat until we get a large enough sample of θ’s.
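The tolerance version can be sketched on the same toy Poisson(θ) model with Exponential(1) prior (both assumptions made here for illustration), taking d as the absolute difference between sample means; the sample mean is sufficient for the Poisson model, so little information is lost:

```python
# Tolerance-based ABC on a toy Poisson(θ) model with an Exponential(1) prior.
import random

random.seed(2)

def poisson_sample(lam):
    # Knuth's algorithm; fine for small lam
    L, k, p = pow(2.718281828459045, -lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

d_obs = [2, 1, 3]
obs_mean = sum(d_obs) / len(d_obs)
eps = 0.5  # tolerance parameter ε

accepted = []
while len(accepted) < 500:
    theta = random.expovariate(1.0)                    # 1. simulate θ from the prior
    d_sim = [poisson_sample(theta) for _ in d_obs]     # 2. simulate a new dataset
    if abs(sum(d_sim) / len(d_sim) - obs_mean) < eps:  # 3. accept iff d(Dobs, Dsim) < ε
        accepted.append(theta)                         # 4. repeat until enough θ's

# The accepted θ's approximately follow the Gamma(7, 4) posterior;
# shrinking ε tightens the approximation at the cost of more rejections.
```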
ABC (contd)
In the limit ε → 0, this algorithm is exact.
In practice, the distance is usually computed on a summary statistic of the data. Ideally, the summary statistic is sufficient, thus incurring no loss of information.