SPH6004 Advanced Biostatistics Part 1: Bayesian Statistics Chapter 1: Introduction to Bayesian Statistics


Page 1: SPH6004  Advanced Biostatistics

SPH6004 Advanced Biostatistics

Part 1: Bayesian Statistics
Chapter 1: Introduction to Bayesian Statistics

Page 2: SPH6004  Advanced Biostatistics

Golden rule: please stop me to ask questions

Page 3: SPH6004  Advanced Biostatistics

Week  Starting  Tuesday    Friday
1     13 Jan    Alex       [Alex]
2     20 Jan
3     27 Jan
4     3 Feb     Alex       [Alex]
5     10 Feb    Alex       [Alex]
6     17 Feb    Alex       [Hyungwon]
R     24 Feb
7     3 Mar     Hyungwon   Hyungwon
8     10 Mar    Hyungwon   Hyungwon
9     17 Mar    Hyungwon   Hyungwon
10    24 Mar    YY         YY
11    1 Apr     YY         YY
12    7 Apr     YY         YY

Page 4: SPH6004  Advanced Biostatistics

Week  Starting  Tuesday                                      Friday
1     13 Jan    Introduction to Bayesian statistics          Importance sampling
2     20 Jan
3     27 Jan
4     3 Feb     Markov chain Monte Carlo                     JAGS and STAN
5     10 Feb    Hierarchical modelling                       Variable selection and model checking
6     17 Feb    Bayesian inference for mathematical models

Page 5: SPH6004  Advanced Biostatistics

Objectives
● Describe differences between Bayesian and classical statistics
● Develop appropriate Bayesian solutions to non-standard problems, describe the model, fit it, relate analysis to problem
● Describe differences between computational methods used in Bayesian inference, understand how they work, implement them in a programming language
● Understand modelling and data analytic principles

Page 6: SPH6004  Advanced Biostatistics

Expectations

Know already
● Basic and intermediate statistics
● Likelihood function
● Pick up programming in R
● Generalised linear models
● Able to read notes

Page 7: SPH6004  Advanced Biostatistics
Page 8: SPH6004  Advanced Biostatistics

The fundamental theorem of statistics?

Page 9: SPH6004  Advanced Biostatistics

Why the profundity?
● Bayes' rule is THE way to invert conditional probabilities
● ALL probabilities are conditional
● Bayes' rule therefore provides the 'calculus' to manipulate probability, moving from p(A|B) to p(B|A).

Page 10: SPH6004  Advanced Biostatistics
Page 11: SPH6004  Advanced Biostatistics

Prof Gerd Gigerenzer

For early detection of breast cancer, starting at some age, women are encouraged to have routine screening, even if they have no symptoms.

Imagine you conduct such screening using mammography.

The following information is available about asymptomatic women aged 40 to 50 in your region who have mammography screening:

Page 12: SPH6004  Advanced Biostatistics

• The probability a woman has breast cancer is 0.8%
• If she has breast cancer, the probability is 90% that she has a positive mammogram
• If she does not have breast cancer, the probability is 7% that she still has a positive mammogram

The challenge:
• Imagine a woman who has a positive mammogram
• What is the probability she actually has breast cancer?

Page 13: SPH6004  Advanced Biostatistics

Their answers...

I never inform my patients about statistical data. I would tell the

patient that mammography is not so exact, and I would in any case

perform a biopsy.

Page 14: SPH6004  Advanced Biostatistics

• The probability a woman has breast cancer is 0.8%
• If she has breast cancer, the probability is 90% that she has a positive mammogram
• If she does not have breast cancer, the probability is 7% that she still has a positive mammogram

Can we write the above mathematically?

The following information is available about asymptomatic women aged 40 to 50 in your region who have mammography screening

Page 15: SPH6004  Advanced Biostatistics
Page 16: SPH6004  Advanced Biostatistics

Key point 1

• p(B = 1 | A = 1): the probability prior to observing the mammogram

• p(B = 1 | M = 1, A = 1): the probability after observing it

• Bayes’ rule provides the way to update the prior probability to reflect the new information to get the posterior probability

• (Even the prior is a posterior)
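The mammography challenge can be worked through numerically. The sketch below is in Python rather than the R used in the course (an assumed substitution), and reads B as breast cancer status, M as the mammogram result and A as the age class; the variable names are illustrative only.

```python
# Figures quoted on the slides for asymptomatic women aged 40 to 50.
prior = 0.008          # p(B = 1 | A = 1): probability of breast cancer
sensitivity = 0.90     # p(M = 1 | B = 1, A = 1)
false_positive = 0.07  # p(M = 1 | B = 0, A = 1)

# Law of total probability: overall chance of a positive mammogram.
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' rule: invert p(M | B) into p(B | M).
posterior = sensitivity * prior / p_positive

print(f"p(M = 1 | A = 1)        = {p_positive:.4f}")
print(f"p(B = 1 | M = 1, A = 1) = {posterior:.3f}")
```

Under these figures the posterior probability is a little over 9%, because the many false positives among the 99.2% of women without cancer swamp the true positives.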

Page 17: SPH6004  Advanced Biostatistics

Key point 2
● Bayes' rule allows you to switch from
– pr(something known | something unknown)
● to
– pr(something unknown | something known)

Page 18: SPH6004  Advanced Biostatistics

Bayesians and frequentists

Bayesians
Bayes' rule is used to switch to pr(unknowns|knowns) in all situations in which there is uncertainty, including parameter estimation

Frequentists
Bayes' rule is only used to make probability statements about events that could, in principle, be repeatedly observed

Parameter estimation is done using methods that perform well under some arbitrary desiderata, such as being unbiased, and uncertainty is quantified by appealing to large samples

Page 19: SPH6004  Advanced Biostatistics
Page 20: SPH6004  Advanced Biostatistics

The Thai AIDS vaccine trial

Page 21: SPH6004  Advanced Biostatistics
Page 22: SPH6004  Advanced Biostatistics

The modified intention to treat analysis

               Vaccine arm   Placebo arm
Seroconverted  51            74
Participated   8197          8198

Q: what is the “underlying” probability pv of infection over this time window for those on the vaccine arm?

Page 23: SPH6004  Advanced Biostatistics

What does that actually mean?

• Participants are not randomly selected from the population: they are referred or volunteer
• Participants must meet eligibility requirements
• Not representative of Thai population
• Risk of infection different in Thailand and, eg, Singapore
• Nebulous: risk of infection in a hypothetical second trial in same group of participants
• Hope pv/pu has some relevance in other settings

Page 24: SPH6004  Advanced Biostatistics

Model for data

• Seems appropriate to assume Xv ~ Bin(Nv,pv)

• Xv = 51 = number vaccinees infected

• Nv = 8197 = number vaccinees

• pv = ?

Point estimate to summarise the data

Interval estimate to summarise uncertainty

(later) measure of evidence that the vaccine is effective

Page 25: SPH6004  Advanced Biostatistics

Refresher: frequentist approach

• Traditional approach to estimate pv:
– find the value of pv that maximises the probability of the data given that the hypothetical value were the true value
– using calculus
– numerically (Newton-Raphson, simulated annealing, cross entropy etc)
– in either case, use the log likelihood

Page 26: SPH6004  Advanced Biostatistics

Refresher: frequentist approach

• Differentiating wrt argument we want to max over

• setting derivative to zero, adding hat, solving, gives

• which is just the empirical proportion infected
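The equations for these steps did not survive in the slide text. A reconstruction under the binomial model Xv ~ Bin(Nv, pv) stated earlier would read as follows; the symbol ℓ for the log likelihood is an assumed notation.

```latex
\begin{align*}
\ell(p) &= \log\binom{N_v}{X_v} + X_v \log p + (N_v - X_v)\log(1 - p) \\
\frac{\mathrm{d}\ell}{\mathrm{d}p} &= \frac{X_v}{p} - \frac{N_v - X_v}{1 - p} \\
0 &= \frac{X_v}{\hat{p}_v} - \frac{N_v - X_v}{1 - \hat{p}_v}
\;\Longrightarrow\;
\hat{p}_v = \frac{X_v}{N_v} = \frac{51}{8197} \approx 0.0062
\end{align*}
```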

Page 27: SPH6004  Advanced Biostatistics

Refresher: frequentist approach

• To quantify the uncertainty might take a 95% interval

• You probably know

• (involves cheating: assuming you know pv and assuming the sample size is close to infinity; there are actually better equations for small samples)
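The interval formula itself is missing from the slide text. A minimal sketch of the usual large-sample (Wald) interval, which is assumed here to be the one intended, is given below in Python rather than the course's R. It plugs the estimate in for the unknown pv, which is the "cheating" the slide refers to.

```python
import math

x_v, n_v = 51, 8197          # seroconversions and participants, vaccine arm

p_hat = x_v / n_v            # maximum likelihood estimate
# Large-sample standard error, plugging p_hat in for the unknown pv.
se = math.sqrt(p_hat * (1 - p_hat) / n_v)

z = 1.96                     # approximate 97.5% point of the standard normal
lower, upper = p_hat - z * se, p_hat + z * se

print(f"MLE: {100 * p_hat:.2f}%")
print(f"95% interval: ({100 * lower:.2f}, {100 * upper:.2f})%")
```

This gives (0.45, 0.79)%, the interval quoted on the interpretation slide.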

Page 28: SPH6004  Advanced Biostatistics
Page 29: SPH6004  Advanced Biostatistics

Interpretation

• The maximum likelihood estimate of pv is not the most likely value of pv

• Classical statisticians cannot make probabilistic statements about parameters

• There is not a 95% chance that pv lies in the interval (0.45, 0.79)%

• Rather, 95% of such intervals over your lifetime (assuming no systematic error and no small samples) will contain the true value

Page 30: SPH6004  Advanced Biostatistics

...we know all that oredy... this is so boring, tell us something new dr cook

Page 31: SPH6004  Advanced Biostatistics
Page 32: SPH6004  Advanced Biostatistics

Tackling it Bayesianly

• Target: point and interval estimate
• Intermediate: probability of the parameter pv given the data Xv and Nv, ie the posterior for pv
• Likelihood function is same as before
• What is the prior?

[Equation: posterior for pv = likelihood fn × prior for pv ÷ normalising integral over dummy variable π]
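The equation these labels annotate is not present in the slide text. Reconstructed from the labels (posterior, likelihood function, prior, dummy variable π), Bayes' rule for pv would take the form below; the integration limits 0 to 1 are assumed from the support of a probability.

```latex
p(p_v \mid X_v, N_v)
  = \frac{p(X_v \mid N_v, p_v)\, p(p_v \mid N_v)}
         {\int_0^1 p(X_v \mid N_v, \pi)\, p(\pi \mid N_v)\, \mathrm{d}\pi}
```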

Page 33: SPH6004  Advanced Biostatistics

What is the prior?

• There is no "the" prior
• There is a prior: you choose it just as you choose a Binomial model for the data
• It represents information on the parameter (proportion of vaccinees that would be infected) before the data are in hand
• Perhaps justifiable to assume all probabilities in [0,1] are equiprobable before the data are observed

Page 34: SPH6004  Advanced Biostatistics

What is the prior?

• 1{A}=1 if A true and 0 if A false

• Nv can be dropped from the condition as I assume sample size and probability of infection are independent

Page 35: SPH6004  Advanced Biostatistics

What is the posterior?

• pv on the range (0,1)
• C a constant

1. Smart way (later)
2. Dumb way (now)

Page 36: SPH6004  Advanced Biostatistics

The dumb way

• Grid of values for pv, finely spaced, on sensible range
• Evaluate log posterior +C
• Transform to posterior ×C
• Approximate integral by sum over grid
• Scale to get rid of C exploiting fact that posterior is a pdf and integrates to 1
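The grid steps above can be sketched as follows. This is a pure-Python illustration rather than the R code used in the course, and it assumes the U(0,1) prior discussed earlier; the grid range (0% to 2%) and the number of grid points are arbitrary choices made for this sketch.

```python
import math

x_v, n_v = 51, 8197                      # data from the vaccine arm

# 1. Finely spaced grid for pv on a sensible range (0% to 2%, assumed).
n_grid = 20000
width = 0.02 / n_grid
grid = [(i + 0.5) * width for i in range(n_grid)]   # cell midpoints

# 2. Log posterior up to an additive constant C. With a flat prior only
#    the binomial log-likelihood kernel remains.
log_post = [x_v * math.log(p) + (n_v - x_v) * math.log(1 - p) for p in grid]

# 3. Back to the posterior scale, up to a multiplicative constant.
#    Subtracting the maximum first avoids numerical underflow.
top = max(log_post)
unnorm = [math.exp(lp - top) for lp in log_post]

# 4. Approximate the integral by a sum over the grid, then
# 5. rescale so that the density integrates to 1.
area = sum(unnorm) * width
density = [u / area for u in unnorm]

peak = max(density)
print(f"integral of density: {sum(density) * width:.6f}")
print(f"peak density {peak:.1f} at pv = {grid[density.index(peak)]:.5f}")
```

Working on the log scale and subtracting the maximum before exponentiating is a design choice: the raw likelihood values are far too small to represent directly in floating point.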

Page 37: SPH6004  Advanced Biostatistics

The dumb way

Page 38: SPH6004  Advanced Biostatistics

The posterior

can take values >1

note asymmetry

Page 39: SPH6004  Advanced Biostatistics

Point estimates

• If you have a sample x1, x2, ... from a distribution, can represent overall location using:
– mean
– median
– mode

• Similarly can report as point estimate mean, median or mode of posterior

Page 40: SPH6004  Advanced Biostatistics

In R

Method   Estimate
Mean     0.63%
Mode     0.62%
Median   0.63%
MLE      0.62%

Page 41: SPH6004  Advanced Biostatistics

Uncertainty

• Two common methods to get uncertainty intervals/credible intervals:
– quantiles of the posterior (eg 2.5%ile, 97.5%ile)
– highest posterior density interval

• Since a parameter value drawn from the posterior has a 95% chance of falling in this interval, the interpretation is the one many people give to confidence intervals
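Both kinds of interval, along with the point estimates, can be read off a gridded posterior. The sketch below is pure Python rather than the course's R and assumes a U(0,1) prior for pv and an arbitrary grid from 0% to 2%. The highest posterior density construction shown (adding grid cells in order of decreasing density) is one simple approach, not necessarily the one used in the lectures.

```python
import math

x_v, n_v = 51, 8197
n_grid = 20000
width = 0.02 / n_grid                     # grid covers 0% to 2% (assumed range)
grid = [(i + 0.5) * width for i in range(n_grid)]
log_post = [x_v * math.log(p) + (n_v - x_v) * math.log(1 - p) for p in grid]
top = max(log_post)
unnorm = [math.exp(lp - top) for lp in log_post]
total = sum(unnorm)
mass = [u / total for u in unnorm]        # posterior probability of each grid cell

# Point estimates.
mean = sum(p * m for p, m in zip(grid, mass))
mode = grid[mass.index(max(mass))]

def quantile(q):
    """Smallest grid value whose cumulative posterior mass reaches q."""
    running = 0.0
    for p, m in zip(grid, mass):
        running += m
        if running >= q:
            return p
    return grid[-1]

median = quantile(0.5)

# Equal-tailed 95% interval from the 2.5% and 97.5% quantiles.
equal_tailed = (quantile(0.025), quantile(0.975))

# Highest posterior density interval: add cells in order of decreasing
# density until they hold 95% of the mass. For a unimodal posterior the
# chosen cells form one contiguous interval.
kept, running = [], 0.0
for i in sorted(range(n_grid), key=lambda i: mass[i], reverse=True):
    kept.append(grid[i])
    running += mass[i]
    if running >= 0.95:
        break
hpd = (min(kept), max(kept))

print(f"mean {100 * mean:.2f}%, median {100 * median:.2f}%, mode {100 * mode:.2f}%")
print(f"equal-tailed: ({100 * equal_tailed[0]:.2f}, {100 * equal_tailed[1]:.2f})%")
print(f"HPD:          ({100 * hpd[0]:.2f}, {100 * hpd[1]:.2f})%")
```

Because the posterior is skewed to the right, the highest posterior density interval sits slightly to the left of the equal-tailed one and is no wider.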

Page 42: SPH6004  Advanced Biostatistics

Highest posterior density intervals

need to draw sketch

Page 43: SPH6004  Advanced Biostatistics

In R

(0.47,0.82)%

(0.47,0.81)%

(0.45,0.79)%

Page 44: SPH6004  Advanced Biostatistics
Page 45: SPH6004  Advanced Biostatistics

Important points

• In some situations it doesn’t really matter if you do a Bayesian or a classical analysis as the results are effectively the same
– sample size is large, asymptotic theory justified
– no prior/external information for analysis
– someone has already developed a classical routine
• In other situations, Bayesian methods come into their own!

Page 46: SPH6004  Advanced Biostatistics

Philosophical points

• If you really love frequentism and hate Bayesianism, you can pragmatically use Bayesian approaches and interpret them like classical ones

• If vice versa, you can
– use classical estimates from literature as if Bayesian
– arguably interpret classical point/interval estimates the way you want to

Page 47: SPH6004  Advanced Biostatistics
Page 48: SPH6004  Advanced Biostatistics

Priors and posteriors

• A prior probability of BC reflects the information you have before observing the mammogram: all you know is the risk class the patient sits in

• The posterior probability of BC reflects the information after observing the mammogram

• A prior probability density function for pv reflects the information you have before the study results are known

• The posterior probability density function reflects the information after the study, including anything known before and everything from the study itself

How much knowledge, how much uncertainty

Page 49: SPH6004  Advanced Biostatistics

Justification

• Statistician, Ms A, is analysing some data. She comes up with a model for the data based on some simplifying assumptions. She must justify this choice if others are to believe her

• Bayesian statistician, Mr B, is analysing some data. He must come up with a model for the data and for the parameters. He too must justify his choice.

For instance, Ms A wants to do a logistic regression on the following data

outcome: got infected by H1N1 as measured by serology

predictors: age, gender, recent overseas travel, number of children in household, ...

There is no reason why the effect of age on the risk of infection should be linear in the logit of risk. There is no reason why each predictor’s effect is additive on the logit of risk. There is no reason why individuals should be taken to be independent. These are choices made by the statistician

Page 50: SPH6004  Advanced Biostatistics

Support

• Each parameter of a model has a support

X ~ Bin(n, p), p ∈ [0, 1]

• The prior should match this

• All a bit silly:

p ~ N(0, 100²)

Page 51: SPH6004  Advanced Biostatistics

Priors for multiple parameters

Yi ~ N(a + b·xi, σ²), a ∈ ℝ, b ∈ ℝ, σ ∈ ℝ⁺

• You must specify a joint prior for all parameters, eg p(a,b,σ)
• Often easiest to assume the parameters are a priori independent, eg p(a,b,σ) = p(a) p(b) p(σ)
• (note this does not force them to be independent a posteriori)
• But you can incorporate dependency if appropriate, eg if you analyse dataset 1 and use its posterior as a prior for dataset 2

Page 52: SPH6004  Advanced Biostatistics

Aim for this part

• Look at different classes of priors:
– informative, non-informative
– proper, improper
– conjugate

Page 53: SPH6004  Advanced Biostatistics

Informative and noninformative priors

Informative
Encapsulates information beyond that available solely in the data directly at hand

For instance, if someone has previously estimated the risk of infection by HIV in Thai adults and reported point and interval estimates, you could take those and convert into an appropriate prior distribution

Non-informative
Opposite: a distribution that is flat or approximately flat over the range of parameter values with high likelihood values

Eg pv ~ U(0,1) is non-informative as it is flat over the range 0.5% to 1.5% where the data tell you pv should be

Eg mu ~ U(-1000000, 1000000) might be non-informative for a parameter on the real line; as might N(0, 1000²)

Page 54: SPH6004  Advanced Biostatistics
Page 55: SPH6004  Advanced Biostatistics

When to choose which?

Use a non-informative prior if:
• Your primary data set has so much information in it you can estimate the parameter with no problems
• You only have one data set
• You have no really solid estimates from the literature that you can supplement the information from your primary data
• You want to approximate a frequentist analysis

Use an informative prior if:
• Your primary data set doesn’t give enough information to estimate all unknowns well (see next chapter for an example)
• You have multiple data sets and can best analyse them one at a time
• You have really good estimates from the literature that everyone accepts
• You are analysing the data for your own benefit, to make a decision, say, and do not need the acceptance of others

Page 56: SPH6004  Advanced Biostatistics

Q: I’ve decided I want a non-informative prior. But what form?

Parameter support            Possible non-informative prior
[0,1]                        U(0,1), Be(1,1), Be(1/2,1/2)
Positive part of real line   U(0, ∞), U(0, big number), exp(big mean), gamma(big variance?), log N(mean 1, big variance?), truncated N(0, big variance)
Real line                    U(−∞, ∞), U(−big number, big number), N(0, big variance)

Exact choice rarely makes a difference

Page 57: SPH6004  Advanced Biostatistics

Q: I’ve decided I want an informative prior and have found an estimate in the

literature. So, how?

Bit of writing needed
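The slide leaves the conversion to be worked out by hand. One common option, offered here as an assumption and not as the lecturer's method, is to match the moments of a beta distribution to the reported point estimate and interval. The Python sketch below treats the point estimate as the prior mean and the 95% interval as roughly ±1.96 standard deviations; the function name and the example figures are hypothetical.

```python
def beta_from_estimate(point, lower, upper):
    """Match a Be(alpha, beta) to a reported estimate and 95% interval.

    Assumes the point estimate is the prior mean and that the interval
    is roughly point +/- 1.96 standard deviations (both are assumptions).
    """
    sd = (upper - lower) / (2 * 1.96)
    # Method of moments: a beta with mean m and variance v satisfies
    # alpha + beta = m(1 - m)/v - 1.
    strength = point * (1 - point) / sd**2 - 1
    if strength <= 0:
        raise ValueError("interval too wide for a beta with this mean")
    return point * strength, (1 - point) * strength

# Hypothetical literature estimate: 1.0% with 95% interval (0.6, 1.4)%.
alpha, beta = beta_from_estimate(0.010, 0.006, 0.014)
print(f"Be({alpha:.1f}, {beta:.1f})")
```

The sum alpha + beta acts like a prior sample size, so a narrow reported interval translates into a prior that takes more data to overturn.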

Page 58: SPH6004  Advanced Biostatistics

Aim for this part

• Look at different classes of priors:
– informative, non-informative
– proper, improper
– conjugate

Page 59: SPH6004  Advanced Biostatistics

Proper and improper priors

• Recall: ∫ fX(x) dx = 1
• Distributions are supposed to integrate to 1
• Prior distributions p(·) really should, too
• A prior that integrates to 1 is proper
• One that doesn’t is improper

Page 60: SPH6004  Advanced Biostatistics

Proper and improper posteriors

An improper posterior is a bad outcome!

Prior      Posterior
Proper     Proper
Improper   Proper
Improper   Improper

Page 61: SPH6004  Advanced Biostatistics

Bad likelihoods

• If the likelihood is ‘badly behaved’ then not only do you need a proper prior, you need an informative prior, as there is insufficient information in the data to estimate that parameter (or those parameters)

Page 62: SPH6004  Advanced Biostatistics

Aim for this part

• Look at different classes of priors:
– informative, non-informative
– proper, improper
– conjugate

Bit of writing needed

Page 63: SPH6004  Advanced Biostatistics

Conjugate priors

• So, with our binomial model, we moved
– from a prior for pv that was beta
– to a posterior for pv that was beta

• We therefore say that the beta is conjugate to the binomial
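The working behind this statement was left to be written out in class. A sketch of the argument, with a general Be(α, β) prior assumed in place of the specific prior on the slide, is:

```latex
\begin{align*}
p(p_v) &\propto p_v^{\alpha - 1} (1 - p_v)^{\beta - 1}
  && \text{prior: } \mathrm{Be}(\alpha, \beta) \\
p(X_v \mid N_v, p_v) &\propto p_v^{X_v} (1 - p_v)^{N_v - X_v}
  && \text{likelihood: } \mathrm{Bin}(N_v, p_v) \\
p(p_v \mid X_v, N_v) &\propto p_v^{\alpha + X_v - 1} (1 - p_v)^{\beta + N_v - X_v - 1}
  && \text{posterior: } \mathrm{Be}(\alpha + X_v,\ \beta + N_v - X_v)
\end{align*}
```

With the U(0,1) prior, which is Be(1,1), and the vaccine-arm data, this gives a Be(52, 8147) posterior.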

Page 64: SPH6004  Advanced Biostatistics


Page 65: SPH6004  Advanced Biostatistics

Conjugate priors

• There are a handful of other data models with conjugate priors
• May encounter some later in the course
• Most real problems do not have conjugate priors though
• If yours does, it makes sense to exploit it
• Eg for the Thai vaccine, once you realise pv is beta a posteriori, you can summarise the posterior directly

Page 66: SPH6004  Advanced Biostatistics

Summarising a posterior directly

Posterior mean: (1+xv)/(2+nv)
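Once the posterior is recognised as a beta distribution, its summaries follow from standard closed-form expressions with no grid needed. The Python sketch below (the course itself uses R) assumes the Be(1,1) prior, so the posterior is Be(1 + xv, 1 + nv − xv); the variable names are illustrative.

```python
import math

x_v, n_v = 51, 8197
a, b = 1 + x_v, 1 + n_v - x_v        # Be(1, 1) prior updated by the data

mean = a / (a + b)                   # equals (1 + x_v) / (2 + n_v)
mode = (a - 1) / (a + b - 2)         # equals x_v / n_v under a flat prior
sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

print(f"mean {100 * mean:.2f}%, mode {100 * mode:.2f}%, sd {100 * sd:.3f}%")
```

Under a flat prior the posterior mode coincides with the maximum likelihood estimate, while the mean is pulled slightly towards 1/2.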

Page 67: SPH6004  Advanced Biostatistics

Different kinds of priors

[Diagram: priors classified as non-informative or informative, improper or proper, conjugate or non-conjugate]

Page 68: SPH6004  Advanced Biostatistics


Page 69: SPH6004  Advanced Biostatistics


Page 70: SPH6004  Advanced Biostatistics
Page 71: SPH6004  Advanced Biostatistics

Information to Bayesians

prior + data → posterior

Page 72: SPH6004  Advanced Biostatistics

Information to Bayesians

prior + data 1 → posterior 1

posterior 1 + data 2 → posterior 2

Page 73: SPH6004  Advanced Biostatistics

Information to Bayesians

prior + data 1 → posterior 1

posterior 1 + data 2 → posterior 2

?

Page 74: SPH6004  Advanced Biostatistics

A Gedanken

• Consider experiments to estimate a probability p given a series of Bernoulli trials, xi, with yi = Σj=1:i xj
• Use a Be(α,β) prior for p
• Experimenter 1, instead of waiting for all the data to come in, recalculates the posterior from scratch based on yi and (α,β) each time a data point comes in
• Experimenter 2 uses his last posterior and xi to recalculate the posterior

Recall: beta is conjugate to binomial

Page 75: SPH6004  Advanced Biostatistics

Experimenter 1

Page 76: SPH6004  Advanced Biostatistics

Experimenter 2

Page 77: SPH6004  Advanced Biostatistics

• The two experimenters, using the same prior and same data, end with the same posterior

• Experimenter 1 started afresh each time with the original prior and all data

• Experimenter 2 updated the old posterior with the new datum
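The equivalence of the two approaches can be checked numerically using the beta-binomial conjugate update. In the Python sketch below (the course itself uses R), the Bernoulli data and the Be(1,1) prior are hypothetical values chosen only for illustration.

```python
# Hypothetical Bernoulli data and prior, chosen only for illustration.
x = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
alpha, beta = 1.0, 1.0

# Experimenter 1: after each datum, restart from the original prior
# and use the running total y_i of successes among the first i trials.
batch = []
for i in range(1, len(x) + 1):
    y_i = sum(x[:i])
    batch.append((alpha + y_i, beta + i - y_i))

# Experimenter 2: treat the previous posterior as the prior and
# update it with the single new datum x_i.
sequential = []
a, b = alpha, beta
for x_i in x:
    a, b = a + x_i, b + (1 - x_i)
    sequential.append((a, b))

print(batch == sequential)
print(sequential[-1])
```

Both lists hold the same sequence of beta parameters, because each conjugate update only adds the new successes and failures to the running counts.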

Page 78: SPH6004  Advanced Biostatistics

Implications

• If data come to you piecemeal, it doesn’t matter if you analyse them once at the end, or at each intermediate point and update your prior

• (In practice one or the other may be convenient: eg if posterior is not analytic, makes sense to estimate/approximate once, rather than once per datum)

You can always treat an old posterior obtained elsewhere as a prior

You can take estimates from the literature and convert them into priors

Page 79: SPH6004  Advanced Biostatistics

What did we learn in chapter 1?

• Bayes' rule: applied to probability of a state of nature (BC) given evidence (MG) and background risk (age)
• Refresher on frequentist estimation: estimating a proportion given x, n
• Saw how Bayes' rule could be used to derive posterior probability density of parameter given data
• Priors
• Accumulation of evidence

Page 80: SPH6004  Advanced Biostatistics

What did we NOT learn in chapter 1?

• Don’t know how to do Bayesian inference for problems with >1 parameter!

Chapters 2 & 3: computing posteriors
• Importance sampling
• Markov chain Monte Carlo