A Bayesian χ² test for goodness of fit

10/23/09

Multilevel RIT


Overview

• Talk about the basic χ² test. Review with some examples.

• Talk about the paper with examples.


Basic χ² test

• The χ² test is used to test whether a sample of data came from a population with a specific distribution.

• An attractive feature of the χ² goodness-of-fit test is that it can be applied to any univariate distribution for which you can calculate the CDF.

[Figure: observations y1, y2, …, yn marked along the real line.]


The value of the χ² statistic depends on how you partition the support.

The sample size needs to be large enough for the χ² approximation to be valid.


n is the sample size

K is the number of partitions (bins) specified over the sample space

p_k is the probability assigned by the null model to the kth interval

m_k is the number of observations within the kth bin

The χ² statistic, in the case of a simple hypothesis, is

$$R^0 = \sum_{k=1}^{K} \frac{(m_k - n p_k)^2}{n p_k},$$

which is asymptotically χ² with K − 1 degrees of freedom as n → ∞.
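To make the formula concrete, here is a minimal Python sketch of R^0 for a fully specified null, assuming equiprobable bins (p_k = 1/K) defined through the null CDF. The function names are hypothetical. Because the standard library has no chi-square CDF, the p-value helper handles only even degrees of freedom (odd K), where the upper tail has the closed form e^{-x/2} Σ_{j<df/2} (x/2)^j / j!.

```python
import math
from statistics import NormalDist

def equiprobable_counts(data, null_cdf, K):
    """m_k: number of observations whose null CDF value falls in bin k of K equal-probability bins."""
    counts = [0] * K
    for y in data:
        k = min(int(null_cdf(y) * K), K - 1)  # 0-based bin index; guards against a CDF value of 1.0
        counts[k] += 1
    return counts

def pearson_r0(counts, probs):
    """R^0 = sum_k (m_k - n p_k)^2 / (n p_k)."""
    n = sum(counts)
    return sum((m - n * p) ** 2 / (n * p) for m, p in zip(counts, probs))

def chi2_sf_even(x, df):
    """P(X > x) for X ~ chi-square(df), df even: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 1.0
    half, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

# Usage with K = 5 equiprobable bins.
K = 5
counts = [25, 15, 20, 20, 20]           # illustrative counts, n = 100, so n p_k = 20
r0 = pearson_r0(counts, [1.0 / K] * K)  # (25 + 25) / 20 = 2.5
p_value = chi2_sf_even(r0, K - 1)       # upper tail of chi-square(4)
example = equiprobable_counts([-2.0, -0.5, 0.1, 0.3, 1.5, 2.5], NormalDist().cdf, 2)
print(r0, p_value, example)
```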


4 examples

We generate four sets of random variables:

1) 1000 normal

2) 1000 double exponential

3) 1000 t distribution with 3 degrees of freedom

4) 1000 lognormal

We use the χ² test to check whether each of the data sets fits a normal distribution. H0: the data come from a normal distribution.
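The four-example experiment can be sketched with only the Python standard library. This is an illustration under assumptions, not the original analysis: the null is taken to be the fully specified standard normal N(0, 1), the double exponential is Laplace(0, 1), the lognormal has log-mean 0 and log-sd 1, K = 11 equiprobable bins are used, and the seed is arbitrary. The helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def chi2_sf_even(x, df):
    """P(X > x) for X ~ chi-square(df) with even df (closed form)."""
    half, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def chi_square_gof(data, null_cdf, K):
    """Pearson statistic with K equiprobable bins under a fully specified null."""
    counts = [0] * K
    for y in data:
        counts[min(int(null_cdf(y) * K), K - 1)] += 1
    expected = len(data) / K
    stat = sum((m - expected) ** 2 / expected for m in counts)
    return stat, chi2_sf_even(stat, K - 1)

rng = random.Random(2009)
n = 1000
samples = {
    "normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
    # Laplace(0, 1): difference of two independent Exp(1) draws
    "double exponential": [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)],
    # t with 3 df: Z / sqrt(V / 3), V ~ chi-square(3) = Gamma(shape 1.5, scale 2)
    "t (3 df)": [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
                 for _ in range(n)],
    "lognormal": [rng.lognormvariate(0.0, 1.0) for _ in range(n)],
}

K = 11                                  # odd K, so K - 1 = 10 is even and chi2_sf_even applies
null_cdf = NormalDist(0.0, 1.0).cdf     # H0 taken as the standard normal (assumption)
results = {name: chi_square_gof(ys, null_cdf, K) for name, ys in samples.items()}
for name, (stat, p) in results.items():
    print(f"{name:>18}: R0 = {stat:8.2f}, p = {p:.3g}")
```

With K = 11, every lognormal value lands above the null median, so the five lowest bins are empty and R^0 is forced above 450. The heavier-tailed samples are also expected to be rejected at this sample size, while the normal sample should usually not be.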


p̂_k = p_k(θ̂) are the estimates of the bin probabilities, based on either the MLE for the grouped data or the minimum χ² method.

The χ² statistic, in the case of a composite hypothesis, is

$$\hat R = \sum_{k=1}^{K} \frac{(m_k - n \hat p_k)^2}{n \hat p_k},$$

which is asymptotically χ² with K − s − 1 degrees of freedom as n → ∞,

where s is the dimension of the underlying parameter vector θ.


= 5.73


The MLE for the grouped data means maximizing the multinomial log-likelihood

$$\sum_{k=1}^{K} m_k \log p_k(\theta)$$

with respect to θ, while minimum χ² estimation involves finding the value of θ that minimizes a function related to R^g.
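A rough sketch of both estimators for a normal model on fixed cells, assuming K = 5 cells with edges at ±0.5 and ±1.5 and hypothetical grouped counts. A coarse grid search stands in for a proper optimizer, and the minimum χ² objective is taken to be Pearson's form Σ (m_k − n p_k(θ))² / (n p_k(θ)); that choice is an assumption about the function the slide calls R^g. With s = 2 parameters, R̂ is referred to χ² with K − s − 1 = 2 degrees of freedom.

```python
import math
from statistics import NormalDist

def cell_probs(edges, mu, sigma):
    """p_k(theta) for fixed cells under N(mu, sigma^2); edges run from -inf to +inf."""
    F = NormalDist(mu, sigma).cdf
    cdf = [0.0] + [F(e) for e in edges[1:-1]] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]

def neg_grouped_loglik(counts, probs):
    """Minus the grouped-data log-likelihood sum_k m_k log p_k(theta)."""
    return -sum(m * math.log(max(p, 1e-300)) for m, p in zip(counts, probs))

def pearson_chi2(counts, probs):
    """Pearson form sum_k (m_k - n p_k)^2 / (n p_k), used here as the minimum chi-square objective."""
    n = sum(counts)
    return sum((m - n * p) ** 2 / (n * max(p, 1e-300)) for m, p in zip(counts, probs))

def grid_fit(counts, edges, objective, mus, sigmas):
    """Crude grid search over theta = (mu, sigma); returns (objective value, mu, sigma)."""
    return min((objective(counts, cell_probs(edges, mu, s)), mu, s)
               for mu in mus for s in sigmas)

edges = [-math.inf, -1.5, -0.5, 0.5, 1.5, math.inf]   # K = 5 fixed cells
counts = [67, 242, 382, 242, 67]                     # hypothetical grouped data, n = 1000
mus = [i / 100 for i in range(-50, 51)]              # mu grid on [-0.5, 0.5]
sigmas = [0.5 + i / 100 for i in range(101)]         # sigma grid on [0.5, 1.5]

_, mu_ml, s_ml = grid_fit(counts, edges, neg_grouped_loglik, mus, sigmas)  # grouped-data MLE
r_hat, mu_mc, s_mc = grid_fit(counts, edges, pearson_chi2, mus, sigmas)    # minimum chi-square

K, s = len(counts), 2
df = K - s - 1                     # 5 - 2 - 1 = 2
p_value = math.exp(-r_hat / 2.0)   # the chi-square(2) upper tail is exp(-x/2)
print(mu_ml, s_ml, mu_mc, s_mc, r_hat, df, p_value)
```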


A Bayesian χ² statistic.

Let y1, …, yn (= y) denote scalar-valued, continuous, identically distributed, conditionally independent observations drawn from the pdf f(y | θ).

f is indexed by an s-dimensional parameter vector θ ∈ ℝ^s.

We want to generate a sampled value θ̃ from the posterior p(θ | y).

To do that, we can apply the inverse probability integral transform (inverse-CDF) method.
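The inverse transform is easiest to see in one dimension. The sketch below assumes a hypothetical model y_i ~ N(θ, 1) with known variance and a flat prior, so that θ | y ~ N(ȳ, 1/n); a draw θ̃ is obtained by solving F(θ̃ | y) = u for a uniform u, which `NormalDist.inv_cdf` does directly. The function names are illustrative only.

```python
import random
from statistics import NormalDist, fmean, stdev

def posterior_draw_inverse_cdf(y, u):
    """Solve F(theta~ | y) = u, assuming y_i ~ N(theta, 1) and a flat prior, so theta | y ~ N(ybar, 1/n)."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    posterior = NormalDist(mu=fmean(y), sigma=(1.0 / len(y)) ** 0.5)
    return posterior.inv_cdf(u)   # inverse of the posterior CDF

def sample_posterior(y, size, rng):
    """Draw `size` values theta~ by pushing uniforms through the inverse posterior CDF."""
    draws = []
    while len(draws) < size:
        u = rng.random()          # uniform on [0, 1)
        if u > 0.0:               # inv_cdf requires 0 < u < 1
            draws.append(posterior_draw_inverse_cdf(y, u))
    return draws

y = [1.0, 2.0, 3.0, 4.0]
draws = sample_posterior(y, 20000, random.Random(1))
print(fmean(draws), stdev(draws))   # should be close to ybar = 2.5 and 1/sqrt(4) = 0.5
```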


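Draw independent uniforms u1, …, us on (0, 1) and set up the integrals

$$
\begin{aligned}
u_1 &= \int_{-\infty}^{\tilde\theta_1} p(\theta_1 \mid y)\,d\theta_1,\\
u_2 &= \int_{-\infty}^{\tilde\theta_2} p(\theta_2 \mid \tilde\theta_1, y)\,d\theta_2,\\
&\;\;\vdots\\
u_s &= \int_{-\infty}^{\tilde\theta_s} p(\theta_s \mid \tilde\theta_1, \ldots, \tilde\theta_{s-1}, y)\,d\theta_s,
\end{aligned}
$$

and then solve them in turn for the θ̃'s.

In practice, the θ̃'s are generally obtained using the Gibbs sampler.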
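For the normal example with prior p(μ, σ²) ∝ 1/σ² (the prior assumed in the example later in the talk), a Gibbs sampler alternates between two standard full conditionals: μ | σ², y ~ N(ȳ, σ²/n) and σ² | μ, y ~ Inv-Gamma(n/2, Σ(y_i − μ)²/2). This is a minimal sketch; the function name, burn-in length, seeds, and simulated data are arbitrary choices.

```python
import random
from statistics import fmean, variance

def gibbs_normal(y, n_iter, rng, sigma2_init=1.0):
    """Gibbs sampler for (mu, sigma^2) assuming the prior p(mu, sigma^2) ∝ 1/sigma^2."""
    n, ybar = len(y), fmean(y)
    sigma2 = sigma2_init
    draws = []
    for _ in range(n_iter):
        # mu | sigma^2, y ~ N(ybar, sigma^2 / n)
        mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)
        # sigma^2 | mu, y ~ Inv-Gamma(n/2, SS/2), i.e. SS / chi-square(n),
        # where chi-square(n) = Gamma(shape n/2, scale 2)
        ss = sum((yi - mu) ** 2 for yi in y)
        sigma2 = ss / rng.gammavariate(n / 2.0, 2.0)
        draws.append((mu, sigma2))
    return draws

data_rng = random.Random(1)
y = [data_rng.gauss(3.0, 2.0) for _ in range(200)]
draws = gibbs_normal(y, 3000, random.Random(2))[500:]   # drop 500 burn-in iterations
mu_draws = [mu for mu, _ in draws]
sigma2_draws = [s2 for _, s2 in draws]
print(fmean(mu_draws), fmean(sigma2_draws), variance(y))
```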


Notation considerations

θ̃ denotes a value of θ sampled from the posterior distribution based on y.

θ̂ denotes the MLE.


This is interesting: contrasting RB with R̂, we see that R̂ has K − s − 1 degrees of freedom while RB has K − 1 degrees of freedom. The reference distribution of RB does not depend on the number of parameters s.
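The K − 1 claim can be checked by simulation. This sketch assumes the normal model with prior ∝ 1/σ² and equiprobable bins with K = 5: it repeatedly generates data from the model, takes one exact posterior draw (σ² as SS/χ²_{n−1}, then μ from N(ȳ, σ²/n)), and averages RB. If RB behaves like χ²_{K−1}, the average should be near K − 1 = 4 rather than K − s − 1 = 2. Function names, n, and the seed are illustrative.

```python
import random
from statistics import NormalDist, fmean

def rb_statistic(y, mu, sigma, K):
    """R^B for a N(mu, sigma^2) draw: Pearson statistic on counts of F(y_i | theta~) in K equiprobable bins."""
    F = NormalDist(mu, sigma).cdf
    counts = [0] * K
    for yi in y:
        counts[min(int(F(yi) * K), K - 1)] += 1
    expected = len(y) / K
    return sum((m - expected) ** 2 / expected for m in counts)

def posterior_draw(y, rng):
    """Exact draw of (mu, sigma) assuming the prior p(mu, sigma^2) ∝ 1/sigma^2."""
    n, ybar = len(y), fmean(y)
    ss = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = ss / rng.gammavariate((n - 1) / 2.0, 2.0)   # SS / chi-square(n - 1)
    mu = rng.gauss(ybar, (sigma2 / n) ** 0.5)             # mu | sigma^2, y ~ N(ybar, sigma^2 / n)
    return mu, sigma2 ** 0.5

rng = random.Random(10232009)
K, n, reps = 5, 100, 2000
values = []
for _ in range(reps):
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]   # data generated from the assumed model
    mu_t, sigma_t = posterior_draw(y, rng)          # one posterior draw per data set
    values.append(rb_statistic(y, mu_t, sigma_t, K))

# chi-square(K - 1) has mean K - 1 = 4; chi-square(K - s - 1) would have mean 2 (s = 2).
print(f"average R^B = {fmean(values):.2f}")
```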


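The process is:

1) Start with data y1, …, yn.

2) Generate θ̃ from the posterior given y1, …, yn (by the inverse integral transform or the Gibbs sampler).

3) Compute the bin counts m_k(θ̃).

4) Calculate RB(θ̃).

5) Repeat steps 2 to 4 to get many RB values, RB(θ̃_1), …, RB(θ̃_N).

6) By the LLN, for large N,

$$P_N(a,b) = \frac{1}{N}\sum_{i=1}^{N} I\left[R^B(\tilde\theta_i) \in (a,b)\right] \approx \Pr\left(\chi^2_{K-1} \in (a,b)\right).$$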
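Steps 1 to 6 map onto a short loop. The sketch below is generic in the posterior sampler and the model CDF. To keep it self-contained it uses a hypothetical exponential model with prior p(λ) ∝ 1/λ, whose posterior is Gamma(shape n, rate Σy_i), and equiprobable bins (p_k = 1/K); none of these choices come from the talk.

```python
import math
import random

def rb_values(y, draw_theta, cdf, K, N, rng):
    """Steps 2-5: N posterior draws theta~, each turned into one R^B(theta~) value."""
    expected = len(y) / K                         # n p_k with equiprobable bins, p_k = 1/K
    values = []
    for _ in range(N):                            # step 5: repeat steps 2-4
        theta = draw_theta(y, rng)                # step 2: one posterior draw
        counts = [0] * K                          # step 3: bin counts m_k(theta~)
        for yi in y:
            counts[min(int(cdf(yi, theta) * K), K - 1)] += 1
        values.append(sum((m - expected) ** 2 / expected for m in counts))  # step 4
    return values

# Hypothetical model: y_i ~ Exponential(rate lam), prior p(lam) ∝ 1/lam,
# so lam | y ~ Gamma(shape n, rate sum(y)).
def draw_rate(y, rng):
    return rng.gammavariate(len(y), 1.0 / sum(y))   # random.gammavariate takes a scale, not a rate

def exp_cdf(yi, lam):
    return 1.0 - math.exp(-lam * yi)

rng = random.Random(7)
y = [rng.expovariate(2.0) for _ in range(200)]      # step 1: the data
K, N = 5, 1000
rb = rb_values(y, draw_rate, exp_cdf, K, N, rng)

# Step 6: P_N(a, b) is the fraction of R^B draws falling in (a, b).
a, b = 0.0, 9.4877      # 9.4877 is approximately the 0.95 quantile of chi-square(4)
p_n = sum(a < r < b for r in rb) / N
print(f"P_N({a}, {b}) = {p_n:.3f}; the chi-square(4) reference value is 0.95")
```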


We can then report the proportion of RB values that exceed the 95th percentile of the reference χ² distribution with K − 1 degrees of freedom.

If this proportion is higher than expected (about 0.05), the excess can be attributed either to dependence between the RB values or to lack of fit.

If the RB values were independent draws from the χ² distribution with K − 1 degrees of freedom, the expected proportion of values falling in the critical region of the test would equal the size of the test.
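Reporting the exceedance proportion needs the 95th percentile of χ² with K − 1 degrees of freedom. Without SciPy, one option (assumed here) is to restrict to odd K so that K − 1 is even, where the χ² upper tail has a closed form, and to invert it by bisection. The helper names are hypothetical.

```python
import math

def chi2_sf_even(x, df):
    """P(X > x) for X ~ chi-square(df), df a positive even integer (closed form)."""
    if x <= 0:
        return 1.0
    half, term, total = x / 2.0, 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def chi2_quantile_even(q, df, tol=1e-10):
    """Bisection for the q-quantile of chi-square(df), df even."""
    lo, hi = 0.0, 1.0
    while 1.0 - chi2_sf_even(hi, df) < q:   # grow hi until CDF(hi) >= q
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_sf_even(mid, df) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def exceedance_proportion(rb_values, K, level=0.95):
    """Fraction of R^B values above the `level` quantile of chi-square(K - 1); needs odd K here."""
    if (K - 1) % 2:
        raise ValueError("this sketch needs K - 1 to be even")
    critical = chi2_quantile_even(level, K - 1)
    return sum(r > critical for r in rb_values) / len(rb_values)

print(chi2_quantile_even(0.95, 4))                          # 95th percentile of chi-square(4)
print(exceedance_proportion([1.0, 5.0, 10.0, 20.0], K=5))   # 2 of the 4 values exceed it
```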


The statistic A is used when formal significance tests must be performed to assess model adequacy.

A is related to a commonly used quantity in signal detection theory: it represents the area under the ROC curve [e.g., Hanley and McNeil (1982)] for comparing the joint posterior distribution of RB values to a χ² random variable with K − 1 degrees of freedom.

The expected value of A, if taken with respect to the joint sampling distribution of y and the posterior distribution of θ given y, would be 0.5. Large deviations of the expected value of A from 0.5, when the expectation is taken with respect to the posterior distribution of θ for a fixed value of y, indicate model lack of fit.
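Assuming A is estimated as an empirical area under the ROC curve, one standard estimator is the Mann–Whitney proportion of pairs in which an RB draw exceeds an independent χ²_{K−1} reference draw, with ties counted as one half. The sketch below implements that estimator; the function name, number of reference draws, and seeds are arbitrary, and this is not necessarily how the paper computes A.

```python
import random
from bisect import bisect_left, bisect_right

def a_statistic(rb_values, K, n_ref, rng):
    """Mann-Whitney estimate of P(R^B > X) + 0.5 P(R^B = X), with X ~ chi-square(K - 1)."""
    # Reference draws: chi-square(K - 1) = Gamma(shape (K - 1)/2, scale 2).
    ref = sorted(rng.gammavariate((K - 1) / 2.0, 2.0) for _ in range(n_ref))
    total = 0.0
    for r in rb_values:
        below = bisect_left(ref, r)              # reference draws strictly below r
        ties = bisect_right(ref, r) - below      # exact ties count one half
        total += below + 0.5 * ties
    return total / (len(rb_values) * len(ref))

rng = random.Random(3)
# Sanity check: values that really are chi-square(4) should give A near 0.5.
fitting = [rng.gammavariate(2.0, 2.0) for _ in range(2000)]
a_fit = a_statistic(fitting, K=5, n_ref=2000, rng=rng)
# Values far in the upper tail give A near 1, signalling lack of fit.
a_misfit = a_statistic([100.0] * 10, K=5, n_ref=500, rng=rng)
print(a_fit, a_misfit)
```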


Some things to keep in mind

• Unfortunately, approximating the sampling distribution of A can be difficult.

• How do you decide how many bins to make and how to assign probabilities to these bins? Consistency of tests against general alternatives requires that K → ∞ as n → ∞.

• Having too many bins can result in a loss of power.

• Mann and Wald suggested using 3.8(n − 1)^0.4 equiprobable cells.
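The Mann and Wald rule of thumb, as quoted on the slide, is simple arithmetic; a one-line helper (hypothetical name) shows how slowly the suggested number of cells grows with n.

```python
def mann_wald_cells(n):
    """Rule of thumb quoted above: about 3.8 (n - 1)^0.4 equiprobable cells."""
    return 3.8 * (n - 1) ** 0.4

for n in (100, 1000, 10000):
    print(n, round(mann_wald_cells(n)))
```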


Example

Let y = (y1, …, yn) denote a random sample from a normal distribution with unknown mean μ and variance σ².

Let us assume a joint prior for (μ, σ²) proportional to 1/σ².


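For a given data vector y and posterior sample (μ̃, σ̃), bin counts m_k(μ̃, σ̃) are determined by counting the number of observations y_i that fall into the interval

$$\left(\tilde\sigma\,\Phi^{-1}(a_{k-1}) + \tilde\mu,\; \tilde\sigma\,\Phi^{-1}(a_k) + \tilde\mu\right),$$

where Φ^{-1}(·) denotes the standard normal quantile function and 0 = a_0 < a_1 < … < a_K = 1 are the bin boundaries on the probability scale.

Based on these counts, RB(μ̃, σ̃) is calculated according to

$$R^B(\tilde\mu,\tilde\sigma) = \sum_{k=1}^{K} \frac{\left[m_k(\tilde\mu,\tilde\sigma) - n p_k\right]^2}{n p_k}, \qquad p_k = a_k - a_{k-1}.$$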
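A minimal sketch of this binning for a single posterior draw (μ̃, σ̃), assuming equiprobable boundaries a_k = k/K. The interior edges σ̃Φ^{-1}(a_k) + μ̃ are computed with `NormalDist().inv_cdf`, the outer edges are −∞ and +∞, and `bisect_left` returns the index of the interval (lower edge, upper edge] containing y_i. The function name is hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist

def rb_normal(y, mu_t, sigma_t, K):
    """R^B(mu~, sigma~) with a_k = k/K; bin k is (sigma~ Phi^-1(a_{k-1}) + mu~, sigma~ Phi^-1(a_k) + mu~]."""
    phi_inv = NormalDist().inv_cdf     # standard normal quantile function
    # Interior edges for k = 1..K-1; Phi^-1(0) = -inf and Phi^-1(1) = +inf are implicit.
    edges = [sigma_t * phi_inv(k / K) + mu_t for k in range(1, K)]
    counts = [0] * K
    for yi in y:
        counts[bisect_left(edges, yi)] += 1   # number of edges below yi = bin index
    n, p = len(y), 1.0 / K
    stat = sum((m - n * p) ** 2 / (n * p) for m in counts)
    return stat, counts

stat, counts = rb_normal([-3.0, -0.1, 0.2, 0.9, 3.0], mu_t=0.0, sigma_t=1.0, K=2)
print(stat, counts)
```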


[Figure: χ² density with 2 degrees of freedom, dchisq(x, df = 2), plotted for x from 0 to 10.]


[Figure: χ² density with 4 degrees of freedom, dchisq(x, df = 4), plotted for x from 0 to 10.]


Power Calculation

• The next figure displays the proportion of times, in 10,000 draws of t samples, that the test statistic A was larger than the 0.95 quantile of the sampled values of A_pp (A_pp is computed from posterior predictive observations of y).


Essentially, the only requirement for their use is that observations be conditionally independent.

Main advantages:

Goodness-of-fit tests based on the statistic RB provide a simple way of assessing the adequacy of model fit in many Bayesian models.

Values of RB generated from a posterior distribution may prove useful both as a convergence diagnostic for MCMC algorithms and for detecting errors in the computer code written to implement these algorithms.

From a computational perspective, such statistics can be calculated in a straightforward way using output from existing MCMC algorithms.


There is a later paper written in 2007 that uses the same methodology, but applied to censored data.


Bayesian Chi-square TTE fit

Using Bayesian chi-square tests to assess goodness of fit for time-to-event data. This software computes the Bayesian chi-square test of Valen Johnson [1] for right-censored time-to-event data. It tests the goodness of fit of the best fit to the data from each of the following distribution families:

exponential, gamma, inverse gamma, Weibull, log normal, log logistic, log odds rate


Bayesian chi square test results

Input options

File sample1.txt

Number of bins 16 (default)

Discrete time yes

RNG seed from system time

Notation 0 for alive and 1 for dead

Bayesian chi square and related statistics

| Distribution | mean X2 | var X2 | 95th percentile | p-value bound | BIC | DIC | DIC # parameters |
|---|---|---|---|---|---|---|---|
| Gamma | 11.2919 | 6.20126 | 15.7188 | 1 | 9009.4 | 8997.49 | 0.973041 |
| LogOddsRate | 11.9972 | 12.7518 | 18.875 | 1 | 9019.83 | 9002.04 | 1.49818 |
| LogLogistic | 20.9959 | 32.4916 | 31.75 | 0.136506 | 9027.91 | 9016.12 | 1.03674 |
| LogNormal | 25.9143 | 35.2128 | 37.0938 | 0.0240434 | 9042.18 | 9030.31 | 0.996002 |
| Weibull | 29.3764 | 9.01371 | 34.6563 | 0.0539539 | 9035 | 9023.08 | 0.97273 |
| InverseGamma | 113.822 | 145.183 | 133.813 | 0 | 9210 | 9198.14 | 1.00249 |
| Exponential | 379.835 | 75.5927 | 397.438 | 0 | 9469.93 | 9463.99 | 0.493292 |


mean X2 is the Bayesian chi-square (BCS) value, the mean of the chi-square values from 1000 samples from the posterior.

var X2 is the corresponding sample variance of the chi-square values.

95th percentile is the corresponding order statistic of the chi-square samples.

p-value bound is the upper bound on the p-value corresponding to the order statistic, obtained using Rychlik's inequality.

BIC is the 'Bayesian' information criterion.

DIC is the deviance information criterion.

DIC # parameters is the effective number of parameters as measured by the DIC.
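The first three summaries can be reproduced from a list of posterior chi-square values. The rank used below for the "95th percentile" order statistic, ⌈0.95N⌉, is an assumed convention, since the output does not say which order statistic BCSTTE uses, and the Rychlik p-value bound is not attempted. The function name is hypothetical.

```python
from statistics import fmean, variance

def bcs_summary(chi2_draws, level_percent=95):
    """mean X2, var X2 and an upper order statistic of chi-square values from posterior draws."""
    ordered = sorted(chi2_draws)
    n = len(ordered)
    rank = (level_percent * n + 99) // 100    # integer ceil(level * n / 100), an assumed rank convention
    return fmean(ordered), variance(ordered), ordered[rank - 1]

mean_x2, var_x2, pct_x2 = bcs_summary(range(1, 101))
print(mean_x2, var_x2, pct_x2)
```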


Distribution parameters

Distribution param1 param2 param3

Gamma 2.97519 17.4145

LogOddsRate 2.31743 49.9121 0.481747

LogLogistic 2.73695 -10.4045

LogNormal 3.77847 0.644426

Weibull 1.88126 58.0321

InverseGamma 2.18072 75.8742

Exponential 54.1108

This output was produced by BCSTTE, Bayesian Chi-Square TTE fit, available at http://biostatistics.mdanderson.org/SoftwareDownload/.


Bayesian chi square test results

Input options

File sample2.txt

Number of bins 5

Discrete time no

RNG seed 12345

Notation 0 for uncensored and 1 for censored

Bayesian chi square and related statistics

| Distribution | mean X2 | var X2 | 95th percentile | p-value bound | BIC | DIC | DIC # parameters |
|---|---|---|---|---|---|---|---|
| Gamma | 4.04367 | 7.75087 | 8.66667 | 1 | 1075.5 | 1067.84 | 0.952195 |
| LogLogistic | 4.44592 | 11.2346 | 13.0833 | 0.213249 | 1081.61 | 1074.01 | 0.987576 |
| LogOddsRate | 4.58767 | 6.40555 | 8.91667 | 1 | 1079.92 | 1068.04 | 1.19743 |
| LogNormal | 4.83717 | 10.6833 | 12.3333 | 0.294848 | 1085.41 | 1077.74 | 0.950352 |
| Weibull | 5.2845 | 6.15882 | 9.75 | 0.879533 | 1075.42 | 1067.83 | 0.990863 |
| InverseGamma | 22.4472 | 86.2438 | 37.5833 | 2.6779e-006 | 1115.82 | 1108.23 | 0.99144 |
| Exponential | 31.9989 | 6.84955 | 37.6667 | 2.57403e-006 | 1107.98 | 1104.22 | 0.508292 |


 Distribution parameters

Distribution param1 param2 param3

Gamma 2.34858 20.0585  

LogLogistic 2.3886 -8.79073  

LogOddsRate 1.79345 49.7335 0.134152

LogNormal 3.63348 0.753531  

Weibull 1.68663 52.402  

InverseGamma 1.55293 42.0575  

Exponential 48.4923    


• Here is the math.

That’s most of it…


Thanks for coming to the talk.

Cao, J., Moosman, A., and Johnson, V. E. (2008). ‘A Bayesian Chi-Squared Goodness-of-Fit Test for Censored Data Models.’ UT MD Anderson Cancer Center Department of Biostatistics Working Paper Series.