
Correlation With Errors-In-Variables 3/28/2002 1

Correlation with Errors-In-Variables and an Application to Galaxies

William H. Jefferys

University of Texas at Austin, USA


A Problem in Correlation

• A graduate student at Maryland asked me to assist her on a problem involving galaxy data. She wanted to know if the data showed clear evidence of correlation, and if so, what the correlation was and how strong the evidence for it was.


The Data


Comments on Data

• At first glance the correlation seems obvious.

• But there is an unusual feature of her problem: she knew that the data were imperfect, and for each data point had an error bar in both x and y. Standard treatments of correlation do not address this situation.


The Data


Comments on Data

• The presence of the error bars contributes to uncertainty as to how big the correlation is and how well it is determined.

• The data are sparse, so we also need to be concerned about small-number statistics.

• The student also was concerned about how the lowest point affected any correlation. What would happen, she wondered, if it were not included in the sample? [She was afraid that the ellipticity of the particular galaxy was so low that it might not have been measured accurately and in fact that the galaxy might belong to a different class of galaxies]


The Data


Bayesian Analysis and Astronomy

• She had been trying to use our program GaussFit to analyze the data, but it is not designed for tasks such as measuring correlation.

• I, of course, suggested to her that a Bayesian approach might be appropriate

• Bayesian methods offer many advantages for astronomical research and have attracted much recent interest.

• Astronomy and Astrophysics Abstracts lists 169 articles with the keywords ‘Bayes’ or ‘Bayesian’ in the past 5 years, and the number is increasing rapidly (there were 53 in 2000 alone, up from 33 in 1999).


Advantages of Bayesian Methods

• Bayesian methods allow us to do things that would be difficult or impossible with standard (frequentist) statistical analysis.

• It is simple to incorporate prior physical or statistical information

• Interpretation of results is very natural

• Model comparison is easy and straightforward. (This is such a problem)

• It is a systematic way of approaching statistical problems, rather than a collection of ad hoc techniques. Very complex problems (difficult or impossible to handle classically) are straightforwardly analyzed within a Bayesian framework.


Bayesian Model Selection/Averaging

• Given models $M_i$, each of which depends on a vector of parameters $\theta$, and given data $Y$, Bayes’ theorem tells us that

$$p(\theta, M_i \mid Y) \propto p(Y \mid \theta, M_i)\, p(\theta \mid M_i)\, p(M_i).$$

• The probabilities $p(\theta \mid M_i)$ and $p(M_i)$ are the prior probabilities of the parameters given the model and of the model, respectively; $p(Y \mid \theta, M_i)$ is the likelihood function, and $p(\theta, M_i \mid Y)$ is the joint posterior probability distribution of the parameters and models, given the data.

• Note that some parameters may not appear in some models, and there is no requirement that the models be nested.
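A step the slide leaves implicit, written out here as a standard consequence of the relation above (with $\theta$ standing for the parameter vector): integrating the parameters out gives the posterior probability of each model, and for two models the ratio of these is the posterior odds, which is the quantity reported later as "odds on the model with correlation".

$$p(M_i \mid Y) \propto p(M_i) \int p(Y \mid \theta, M_i)\, p(\theta \mid M_i)\, d\theta,
\qquad
\frac{p(M_1 \mid Y)}{p(M_0 \mid Y)} = \frac{p(Y \mid M_1)}{p(Y \mid M_0)} \times \frac{p(M_1)}{p(M_0)}.$$

With equal prior model probabilities the last factor is 1, so the posterior odds equal the ratio of marginal likelihoods.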


Strategy

• I do not see a simple frequentist approach to this student’s problem

• A reasonable Bayesian approach is fairly straightforward:

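• Assume that the underlying “true” (but unknown) galaxy parameters $\xi_i$ and $\eta_i$ (corresponding to the observed $x_i$ and $y_i$) are distributed as a bivariate normal distribution:

$$p(\xi_i, \eta_i \mid \rho, a, b, \sigma_\xi, \sigma_\eta) \propto \frac{1}{\sigma_\xi \sigma_\eta \sqrt{1-\rho^2}}
\exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2} - 2\rho\,\frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi \sigma_\eta}\right)\right]$$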
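As an illustration only, the bivariate normal prior above can be written as a log-density function. This is a minimal sketch, not the code used for the talk; the function name `log_bvn_prior` and its argument names are hypothetical.

```python
import numpy as np

def log_bvn_prior(xi, eta, rho, a, b, sig_xi, sig_eta):
    """Log of the bivariate normal density for one latent pair (xi, eta),
    up to an additive constant (the -log(2*pi) term is dropped)."""
    u = (xi - a) / sig_xi          # standardized xi
    v = (eta - b) / sig_eta        # standardized eta
    one_minus_r2 = 1.0 - rho**2
    quad = (u**2 + v**2 - 2.0 * rho * u * v) / one_minus_r2
    return (-np.log(sig_xi) - np.log(sig_eta)
            - 0.5 * np.log(one_minus_r2) - 0.5 * quad)
```

Working on the log scale avoids underflow when many such terms are multiplied together later.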


Strategy

• Since we do not know $\xi_i$ and $\eta_i$ for each galaxy but instead only the observed values $x_i$ and $y_i$, we introduced the $\xi_i$ and $\eta_i$ for each galaxy as latent variables. These are parameters to be estimated.

• Here a and b give the true center of the distribution; ρ is the true correlation coefficient, and $\sigma_\xi$ and $\sigma_\eta$ are the true standard deviations. None of these quantities are known. They are also parameters which must be estimated.


Strategy

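• [Since we are using a bivariate normal, the variance-covariance matrix is

$$V = \begin{bmatrix} \sigma_\xi^2 & \rho\,\sigma_\xi\sigma_\eta \\ \rho\,\sigma_\xi\sigma_\eta & \sigma_\eta^2 \end{bmatrix}$$

with inverse

$$V^{-1} = \frac{1}{1-\rho^2} \begin{bmatrix} \dfrac{1}{\sigma_\xi^2} & -\dfrac{\rho}{\sigma_\xi\sigma_\eta} \\[2ex] -\dfrac{\rho}{\sigma_\xi\sigma_\eta} & \dfrac{1}{\sigma_\eta^2} \end{bmatrix}$$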


Strategy

• So the density is

$$p(\phi) \propto |V|^{-1/2} \exp\left[-\tfrac{1}{2}\, \phi' V^{-1} \phi\right],$$

where $\phi = (\xi - a,\ \eta - b)'$.]
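A quick numerical check of the matrix form can be reassuring. The sketch below uses made-up values for ρ, the two standard deviations and the vector φ (all hypothetical) and confirms that the closed-form inverse, the determinant, and the quadratic form agree with the explicit expression in the exponent of the bivariate normal.

```python
import numpy as np

# Hypothetical parameter values, chosen only for the check.
rho, sig_xi, sig_eta = -0.8, 0.3, 1.5

cov = rho * sig_xi * sig_eta
V = np.array([[sig_xi**2, cov],
              [cov, sig_eta**2]])

# Closed-form inverse as written on the slide.
V_inv_closed = (1.0 / (1.0 - rho**2)) * np.array(
    [[1.0 / sig_xi**2, -rho / (sig_xi * sig_eta)],
     [-rho / (sig_xi * sig_eta), 1.0 / sig_eta**2]])

inverse_ok = np.allclose(np.linalg.inv(V), V_inv_closed)
det_ok = np.isclose(np.linalg.det(V), sig_xi**2 * sig_eta**2 * (1.0 - rho**2))

# Quadratic form phi' V^{-1} phi versus the explicit three-term expression.
phi = np.array([0.2, -0.7])  # hypothetical (xi - a, eta - b)
quad_matrix = phi @ V_inv_closed @ phi
quad_explicit = (phi[0]**2 / sig_xi**2 + phi[1]**2 / sig_eta**2
                 - 2.0 * rho * phi[0] * phi[1] / (sig_xi * sig_eta)) / (1.0 - rho**2)
```

The determinant identity is what turns $|V|^{-1/2}$ into the factor $1/(\sigma_\xi\sigma_\eta\sqrt{1-\rho^2})$.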


Strategy

• This expression may be regarded as our prior on the latent variables $\xi_i$ and $\eta_i$. It depends on the other parameters (a, b, ρ, $\sigma_\xi$, $\sigma_\eta$). We can regard these as hyperparameters, which will in turn require their own priors.

• The joint prior on all the latent variables $\xi_i$ and $\eta_i$ can be written as a product:

$$p(\Xi, \mathrm{H} \mid \rho, a, b, \sigma_\xi, \sigma_\eta) \propto \prod_i p(\xi_i, \eta_i \mid \rho, a, b, \sigma_\xi, \sigma_\eta),$$

where Ξ and Η are vectors whose components are $\xi_i$ and $\eta_i$ respectively.


Strategy

• We know the distributions of the data $x_i$ and $y_i$ conditional on $\xi_i$ and $\eta_i$. Their joint distribution is given by

$$p(x_i, y_i \mid \xi_i, \eta_i, s_i, t_i) \propto \exp\left[-\frac{(x_i-\xi_i)^2}{2 s_i^2}\right] \exp\left[-\frac{(y_i-\eta_i)^2}{2 t_i^2}\right]$$

• Here $s_i$ and $t_i$ are the standard deviations of the data points, assumed known perfectly for this analysis (these are the basis of the error bars I showed earlier...).


The Data

The “true” value is somewhere near this error ellipse


Strategy

• Now we can write down the likelihood, the joint probability of observing the data, conditional on the parameters (here only the latent parameters appear; the others are implicit through the prior on the latent parameters):

$$p(X, Y \mid \Xi, \mathrm{H}, S, T) \propto \prod_i p(x_i, y_i \mid \xi_i, \eta_i, s_i, t_i),$$

where X, Y, S, and T are vectors whose components are $x_i$, $y_i$, $s_i$, and $t_i$, respectively.
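To make the two-level structure concrete, here is a small sketch that generates synthetic data the way the model assumes (latent values from the bivariate normal, then independent measurement noise) and evaluates the log-likelihood. The function names and all numerical inputs are hypothetical, and this is not the galaxy data set.

```python
import numpy as np

def simulate_eiv_data(n, rho, a, b, sig_xi, sig_eta, s, t, rng):
    """Draw latent (xi, eta) from the bivariate normal, then add
    Gaussian measurement noise with known std devs s (in x) and t (in y)."""
    cov = rho * sig_xi * sig_eta
    V = np.array([[sig_xi**2, cov], [cov, sig_eta**2]])
    latent = rng.multivariate_normal([a, b], V, size=n)
    xi, eta = latent[:, 0], latent[:, 1]
    x = xi + s * rng.standard_normal(n)
    y = eta + t * rng.standard_normal(n)
    return xi, eta, x, y

def log_likelihood(x, y, xi, eta, s, t):
    """Log of p(X, Y | Xi, Eta, S, T), up to an additive constant."""
    return -0.5 * np.sum(((x - xi) / s) ** 2 + ((y - eta) / t) ** 2)
```

For example, `simulate_eiv_data(12, -0.8, 0.0, 0.0, 1.0, 1.0, np.full(12, 0.3), np.full(12, 0.3), np.random.default_rng(0))` produces a sparse sample whose observed scatter is wider than the latent scatter, which is why ignoring the error bars tends to bias a naive correlation estimate toward zero.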


Priors

• The next step is to assign priors for each of the parameters, including the latent variables. Lacking special information, I chose conventional priors for all but ξ and η. Thus, I assign

• Improper constant flat priors on a and b.

• Improper Jeffreys priors $1/\sigma_\xi$ and $1/\sigma_\eta$ on $\sigma_\xi$ and $\sigma_\eta$.

• We have two models, one with correlation (M1) and one without (M0). I assign p(M1) = p(M0) = 1/2.

• We will compare M1 and M0 by computing their posterior probabilities. I chose the prior p(ρ|M1) on ρ to be flat and normalized on [–1,1] and zero elsewhere; I chose a delta-function prior p(ρ|M0) = δ(ρ–0) under M0.

• Priors on ξ and η were displayed earlier.


Posterior Distribution

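• The posterior distribution is proportional to the prior times the likelihood, as Bayes instructs us:

$$\begin{aligned}
p(\rho, a, b, \sigma_\xi, \sigma_\eta, \Xi, \mathrm{H}, M_k \mid X, Y, S, T)
\propto{} & \frac{p(\rho \mid M_k)\, p(M_k)}{\sigma_\xi \sigma_\eta} \\
& \times p(\Xi, \mathrm{H} \mid \rho, a, b, \sigma_\xi, \sigma_\eta) \\
& \times p(X, Y \mid \Xi, \mathrm{H}, S, T)
\end{aligned}$$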
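Putting the pieces together, the unnormalized log posterior might be coded as below. This is a sketch under stated assumptions: the function name and signature are hypothetical, additive constants are dropped, and the delta-function prior under M0 is represented simply by requiring ρ = 0 and contributing weight 1.

```python
import numpy as np

def log_posterior(rho, a, b, sig_xi, sig_eta, xi, eta, model, x, y, s, t):
    """Unnormalized log posterior; model is 0 (rho fixed at 0) or 1."""
    if sig_xi <= 0 or sig_eta <= 0:
        return -np.inf
    if model == 0:
        if rho != 0.0:
            return -np.inf
        log_p_rho = 0.0              # point mass at rho = 0
    else:
        if not -1.0 < rho < 1.0:
            return -np.inf
        log_p_rho = np.log(0.5)      # U(-1, 1) density
    log_p_model = np.log(0.5)        # p(M0) = p(M1) = 1/2
    log_hyper = -np.log(sig_xi) - np.log(sig_eta)   # Jeffreys priors; flat on a, b

    n = len(x)
    u = (xi - a) / sig_xi
    v = (eta - b) / sig_eta
    one_minus_r2 = 1.0 - rho**2
    # Joint prior on the latent variables (product of bivariate normals).
    log_latent = (-n * np.log(sig_xi) - n * np.log(sig_eta)
                  - 0.5 * n * np.log(one_minus_r2)
                  - 0.5 * np.sum(u**2 + v**2 - 2.0 * rho * u * v) / one_minus_r2)
    # Likelihood of the observed data given the latent values.
    log_like = -0.5 * np.sum(((x - xi) / s) ** 2 + ((y - eta) / t) ** 2)
    return log_p_rho + log_p_model + log_hyper + log_latent + log_like
```

A function like this is not needed for the Gibbs steps, but it is handy for checking each conditional: differences of `log_posterior` in one block of parameters should match differences of that block's conditional.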


Simulation Strategy

• We used simulation to generate a sample from the posterior distribution through a combination of Gibbs and Metropolis-Hastings samplers (“Metropolis-within-Gibbs”).

• The sample can be used to calculate quantities of interest:

» Compute posterior mean and variance of the correlation coefficient ρ, (calculate sample mean and variance of ρ)

» Plot the posterior distribution of ρ, (plot a histogram of ρ from the sample).

» Determine quantiles of the posterior distribution of ρ (use quantiles of the sample).

» Compute posterior probabilities of each model (calculate the frequency of the model in the sample).
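The summaries listed above are simple operations on the stored chain. A minimal sketch, assuming the sampler has saved one value of ρ and one model indicator (0 or 1) per iteration; the function name is hypothetical, and summarizing ρ using only the draws with M = 1 is one possible convention, not necessarily the one used for the talk.

```python
import numpy as np

def summarize_chain(rho, model):
    """Posterior summaries from MCMC output.

    rho   : sampled correlation coefficients (0.0 whenever model == 0)
    model : sampled model indicators, 0 or 1
    """
    rho = np.asarray(rho, dtype=float)
    model = np.asarray(model, dtype=int)
    in_m1 = model == 1
    p_m1 = in_m1.mean()                 # frequency of M1 in the sample
    rho_m1 = rho[in_m1]                 # draws of rho under the correlated model
    return {
        "p_m1": p_m1,
        "odds_m1": p_m1 / (1.0 - p_m1) if p_m1 < 1.0 else np.inf,
        "mean": rho_m1.mean(),
        "sd": rho_m1.std(ddof=1),
        "quantiles": np.quantile(rho_m1, [0.025, 0.5, 0.975]),
    }
```

A histogram of `rho[model == 1]` gives the plotted posterior distribution of ρ. When the odds are large, few iterations visit M0, so the odds estimate itself is noisy unless the chain is long.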


Posterior Conditionals

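• The conditional distribution on $\xi_i$ and $\eta_i$ looks like

$$\begin{aligned}
p(\xi_i, \eta_i \mid \rho, a, b, \sigma_\xi, \sigma_\eta, X, Y, S, T) \propto{}
& \exp\left[-\frac{1}{2}\left(\frac{(\xi_i-x_i)^2}{s_i^2} + \frac{(\eta_i-y_i)^2}{t_i^2}\right)\right] \\
& \times \exp\left[-\frac{1}{2(1-\rho^2)}\left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2} - 2\rho\,\frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi\sigma_\eta}\right)\right]
\end{aligned}$$

• By combining terms and completing the square we can sample $\xi_i$ and $\eta_i$ from a bivariate normal.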
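Completing the square here amounts to adding precisions: the conditional is a product of two Gaussians in (ξ_i, η_i), one centered on the data with covariance diag(s_i², t_i²) and one centered on (a, b) with covariance V. The sketch below works this out; the function names are hypothetical and this is an illustration rather than the original implementation.

```python
import numpy as np

def latent_conditional(x_i, y_i, s_i, t_i, rho, a, b, sig_xi, sig_eta):
    """Mean and covariance of the bivariate normal conditional of (xi_i, eta_i)."""
    cov = rho * sig_xi * sig_eta
    V_inv = np.linalg.inv(np.array([[sig_xi**2, cov], [cov, sig_eta**2]]))
    D_inv = np.diag([1.0 / s_i**2, 1.0 / t_i**2])   # measurement precision
    post_cov = np.linalg.inv(V_inv + D_inv)          # precisions add
    post_mean = post_cov @ (D_inv @ np.array([x_i, y_i])
                            + V_inv @ np.array([a, b]))
    return post_mean, post_cov

def sample_latent(x_i, y_i, s_i, t_i, rho, a, b, sig_xi, sig_eta, rng):
    """One Gibbs draw of (xi_i, eta_i)."""
    mean, cov = latent_conditional(x_i, y_i, s_i, t_i, rho, a, b, sig_xi, sig_eta)
    return rng.multivariate_normal(mean, cov)
```

The conditional mean is a precision-weighted compromise: a point with small error bars stays near its observed position, while a poorly measured point is pulled toward the center (a, b) along the direction set by ρ.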


Posterior Conditionals

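• Similarly, the posterior conditional on a, b is

$$p(a, b \mid \rho, \sigma_\xi, \sigma_\eta, \Xi, \mathrm{H}) \propto
\exp\left[-\frac{1}{2(1-\rho^2)} \sum_i \left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2} - 2\rho\,\frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi\sigma_\eta}\right)\right]$$

• Again, by completing the square we can sample a and b from a bivariate normal.

• Note that if this were not an EIV problem, we would have x’s and y’s instead of ξ’s and η’s.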
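With flat priors on a and b, completing the square in the sum over i gives a bivariate normal centered on the mean of the latent values with covariance V/N. A minimal sketch of that Gibbs step follows; the function names are hypothetical.

```python
import numpy as np

def center_conditional(xi, eta, rho, sig_xi, sig_eta):
    """Mean and covariance of the bivariate normal conditional of (a, b)."""
    n = len(xi)
    cov = rho * sig_xi * sig_eta
    V = np.array([[sig_xi**2, cov], [cov, sig_eta**2]])
    mean = np.array([np.mean(xi), np.mean(eta)])
    return mean, V / n

def sample_center(xi, eta, rho, sig_xi, sig_eta, rng):
    """One Gibbs draw of (a, b)."""
    mean, cov = center_conditional(xi, eta, rho, sig_xi, sig_eta)
    return rng.multivariate_normal(mean, cov)
```

Note that the inputs are the current latent values, not the observed data, which is the point of the remark about the errors-in-variables structure.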


Posterior Conditionals

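• The posterior conditional on $\sigma_\xi$ and $\sigma_\eta$ is

$$p(\sigma_\xi, \sigma_\eta \mid \rho, a, b, \Xi, \mathrm{H}) \propto \sigma_\xi^{-(N+1)} \sigma_\eta^{-(N+1)}
\exp\left[-\frac{1}{2(1-\rho^2)} \sum_i \left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2} - 2\rho\,\frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi\sigma_\eta}\right)\right]$$

• It wasn’t obvious to me that we could sample this in a Gibbs step, but maybe there’s a way to do it. I just used independent M-H steps with a uniform symmetric proposal distribution, tuning the step size for good mixing, and this worked fine.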
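The M-H updates described above could look like the following sketch, which updates each standard deviation in turn with a uniform proposal centered on the current value. The function names and the `step` parameter (the maximum step, to be tuned) are hypothetical.

```python
import numpy as np

def log_cond_sigma(sig_xi, sig_eta, xi, eta, rho, a, b):
    """Log of the conditional density of (sig_xi, sig_eta), up to a constant."""
    if sig_xi <= 0 or sig_eta <= 0:
        return -np.inf                      # outside the support
    n = len(xi)
    u = (xi - a) / sig_xi
    v = (eta - b) / sig_eta
    quad = np.sum(u**2 + v**2 - 2.0 * rho * u * v) / (1.0 - rho**2)
    return -(n + 1) * (np.log(sig_xi) + np.log(sig_eta)) - 0.5 * quad

def mh_update_sigmas(sig_xi, sig_eta, xi, eta, rho, a, b, step, rng):
    """Two successive M-H steps, one per standard deviation."""
    cur = log_cond_sigma(sig_xi, sig_eta, xi, eta, rho, a, b)

    prop = sig_xi + rng.uniform(-step, step)     # symmetric uniform proposal
    lp = log_cond_sigma(prop, sig_eta, xi, eta, rho, a, b)
    if np.log(1.0 - rng.uniform()) < lp - cur:   # accept with prob min(1, ratio)
        sig_xi, cur = prop, lp

    prop = sig_eta + rng.uniform(-step, step)
    lp = log_cond_sigma(sig_xi, prop, xi, eta, rho, a, b)
    if np.log(1.0 - rng.uniform()) < lp - cur:
        sig_eta = prop
    return sig_xi, sig_eta
```

Because the proposal is symmetric, the proposal densities cancel and only the ratio of conditional densities enters the acceptance test. Proposals that land at or below zero are rejected automatically.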


Posterior Conditionals

• We do a reversible-jump step on ρ and M simultaneously. Here the idea is to propose a model M and at the same time a correlation ρ in an M-H step and either accept or reject according to the M-H ratio.

• Proposing from M0 to M0 or from M1 to M1 is basically simple, just an ordinary M-H step.

• If we are proposing between models, then things are a bit more complicated. This is due to the fact that the dimensionalities of the parameter spaces are different between the two models.


Posterior Conditionals

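• The posterior conditional of ρ under M1 is

$$\begin{aligned}
p(\rho \mid a, b, \sigma_\xi, \sigma_\eta, \Xi, \mathrm{H}, M_1) \propto{}
& \frac{1}{2} \times \frac{1}{(1-\rho^2)^{N/2}} \\
& \times \exp\left[-\frac{1}{2(1-\rho^2)} \sum_i \left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2}\right)\right] \\
& \times \exp\left[\frac{\rho}{1-\rho^2} \sum_i \frac{(\xi_i-a)(\eta_i-b)}{\sigma_\xi\sigma_\eta}\right]
\end{aligned}$$

• (The leading factor of 1/2 comes from the prior on ρ and is very important.)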


Posterior Conditionals

• The posterior conditional of ρ under M0 is

$$p(\rho \mid a, b, \sigma_\xi, \sigma_\eta, \Xi, \mathrm{H}, M_0) \propto \delta(\rho)
\times \exp\left[-\frac{1}{2} \sum_i \left(\frac{(\xi_i-a)^2}{\sigma_\xi^2} + \frac{(\eta_i-b)^2}{\sigma_\eta^2}\right)\right]$$

• The δ function guarantees that ρ = 0.

• Here, the proportionality factor is chosen so as to match the factor of 1/2 under M1. The factors come from the priors [p(ρ|M1) ~ U(–1,1), which has an implicit factor 1/2, and p(ρ|M0) ~ δ(ρ)].


Posterior Conditionals

• The M-H ratio when jumping from (M, ρ) to (M*, ρ*) is therefore (where the q’s are the proposals):

$$\frac{p(\rho^* \mid M^*, \ldots)\; p(M^* \mid \ldots)\; q(\rho \mid M, \ldots)\; q(M \mid \ldots)}
{p(\rho \mid M, \ldots)\; p(M \mid \ldots)\; q(\rho^* \mid M^*, \ldots)\; q(M^* \mid \ldots)}$$

• We sampled using a beta proposal q(ρ|M1,…) with parameters tuned by experiment for efficiency and good mixing under the complex model, and with a proposal q(M1|…) that also was chosen by experiment with an eye to getting an accurate estimate of the posterior odds on M1.

• The idea is that a beta proposal on ρ matches the actual conditional pretty well so will be accepted with high probability; the M-H ratio will be close to 1.
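A hedged sketch of such a joint step on (M, ρ) follows. Several choices here are assumptions for illustration and are not stated on the slides: the beta proposal is rescaled from (0, 1) to (–1, 1), the model is proposed with a fixed probability `q1` for M1 regardless of the current state, and the inputs `suu`, `svv`, `suv` are the sums over i of the squared and cross terms of the standardized residuals (ξ_i – a)/σ_ξ and (η_i – b)/σ_η. All function names are hypothetical.

```python
import math
import numpy as np

def log_beta_on_rho(rho, alpha, beta):
    """Log density of rho = 2w - 1 with w ~ Beta(alpha, beta)."""
    w = 0.5 * (rho + 1.0)
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return ((alpha - 1.0) * math.log(w) + (beta - 1.0) * math.log(1.0 - w)
            - log_norm - math.log(2.0))      # log(2) is the Jacobian of the rescaling

def log_target(rho, model, suu, svv, suv, n):
    """Log conditional of rho under each model; p(M) = 1/2 cancels in the ratio."""
    if model == 0:
        return -0.5 * (suu + svv)            # delta function contributes weight 1
    one_minus_r2 = 1.0 - rho**2
    return (math.log(0.5) - 0.5 * n * math.log(one_minus_r2)
            - 0.5 * (suu + svv - 2.0 * rho * suv) / one_minus_r2)

def log_proposal(rho, model, q1, alpha, beta):
    if model == 0:
        return math.log(1.0 - q1)
    return math.log(q1) + log_beta_on_rho(rho, alpha, beta)

def rj_step(rho, model, suu, svv, suv, n, q1, alpha, beta, rng):
    """Propose (M*, rho*) jointly and accept or reject with the M-H ratio."""
    new_model = 1 if rng.uniform() < q1 else 0
    new_rho = 2.0 * rng.beta(alpha, beta) - 1.0 if new_model == 1 else 0.0
    log_ratio = (log_target(new_rho, new_model, suu, svv, suv, n)
                 + log_proposal(rho, model, q1, alpha, beta)
                 - log_target(rho, model, suu, svv, suv, n)
                 - log_proposal(new_rho, new_model, q1, alpha, beta))
    if math.log(1.0 - rng.uniform()) < log_ratio:
        return new_rho, new_model
    return rho, model
```

Because the proposal does not depend on the current state, a beta shape that tracks the conditional of ρ makes the target and proposal terms nearly cancel, which is the reason the acceptance rate under M1 can be high.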


Sampling Strategy for Our Problem

• To summarize:

• We sampled the a, b, $\xi_j$, $\eta_j$ in Gibbs steps (a and b appear in the posterior distribution as a bivariate normal distribution, as do the $\xi_j$, $\eta_j$).

• We sampled $\sigma_\xi$, $\sigma_\eta$ with M-H steps using symmetric uniform proposals centered on the current point, adjusting the maximum step for good mixing.

• We sampled ρ and M in a simultaneous reversible-jump M-H step, using a beta proposal on ρ with parameters tuned by experiment for efficiency and good mixing under the complex model, and with a proposal on M that also was chosen by experiment with an eye to getting an accurate estimate of the posterior odds on M.


Results

• For the data set including the circled point, we obtained

• Odds on model with correlation = 207 (assumes prior odds equal to 1)

• Median rho = -0.81

• Mean rho = -0.79 ± 0.10


Posterior distribution of ρ (Including all points)


Results

• For the data set without the circled point we obtained

• Odds on model with correlation = 9.9

• Median rho = -0.70

• Mean rho = -0.68 ± 0.16
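For readers who prefer probabilities to odds, the reported odds convert directly, given the equal prior model probabilities assumed throughout. This is simple arithmetic on the numbers above, not an additional result:

$$P(M_1 \mid \text{data}) = \frac{O}{1+O}: \qquad \frac{207}{208} \approx 0.995 \ \text{(all points)}, \qquad \frac{9.9}{10.9} \approx 0.908 \ \text{(one point excluded)}.$$

So dropping the single low point lowers the posterior probability of correlation from about 99.5% to about 91%.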


Posterior distribution of ρ (Excluding 1 point)


Final Comments

• This problem combines several interesting features:

• Latent variables, introduced because this is an errors-in-variables problem

• Model selection, implemented through reversible-jump MCMC simulation

• A combination of Gibbs and Metropolis-Hastings steps to implement the sampler (“Metropolis-within-Gibbs”)

• It is a good example of how systematic application of basic Bayesian analysis can yield a satisfying solution of a problem that, when looked at frequentistically, seems almost intractable


Final Comments

• One final comment: If you look at the tail area in either of the two cases investigated, you will see that it is much less than the 1/200 or 1/10 odds ratio that we calculated for the odds of M0 against M1. This is an example of how tail areas in general are not reliable statistics for deciding whether a hypothesis should be selected.