Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Empirical and Fully Bayesian HierarchicalModels : Two Applications in the Stellar
Evolution
Shijing Si∗, David van Dyk∗, Ted von Hippel†
∗Imperial College London
†Embry-Riddle Aeronautical University
February 10, 2015
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Overview
1 Introduction
2 Example
3 Fitting the Distributions of Halo WDs
4 Fitting the Variability in IFMRs Among Stellar Clusters
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Hierarchical Modelling
• Many data are of hierarchical structure, whichmotivates hierarchical models.
• Hierarchical modelling usually produces ashrinkage estimate, which is widely considered tobe more reasonable.
• Also, it can reduce the mean squared errors.
• However, fitting a hierarchical model may becomputationally intensive.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Hierarchical Models
• One commonly used and simple hierarchicalmodel is:
Yi = µi + εi ; εi ∼ N(0, σ2i ), (1)
µi ∼ N(γ, τ2), i = 1, · · · , I . (2)
in which:
• εi s are independent of µj s and σ2i s are known.
• µi s are group effects, γ pooling effects, and τ2
between-group variation.
• What we want to learn from this model is µi s andγ and τ2.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Statistical Inference to Hierarchical Models
• Commonly used ways are Maximum likelihood,fully Bayesian and empirical Bayes.
• MLE: Find the MLE of hyper-parameters bymaximising their likelihood function;
• Fully Bayesian: Find the fully joint posterior forall parameters and make inferences from thispostrior, usually via MCMC;
• Empirical Bayesian: Find the MLE or MAP ofhyper-parameters by maximising their likelihoodfunction and then fit group-level parameters in ageneral Bayesian way.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Pros and Cons
• Fully Bayesian will yield a shrinkage estimator forall µi s. However, it often brings computationalchallenges, eg. high-dimensional sampling.
• Empirical Bayes is the combination of theprevious two. It will produce a shrinkageestimator but not much as fully Bayesian, alsoless intensive than it.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Fully Bayesian Approach
We fit a dataset with these two approaches and compare theresult later.Fully Bayesian: The joint posterior density
p(γ, τ2, µ1, · · · , µI |Y) ∝ p(γ, τ2)I∏
i=1
N(Yi |µi , σi )N(µi |γ, τ2).
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Shrinkage Estimator
• For brevity, we take the hyper-prior p(γ, τ2) ∝ 1.Given γ and τ2, the conditional densities for µi sare
p(µi |γ, τ2,Y) ∼ N(µ̂i , σ̂2i ), (3)
with µ̂i =τ2Yi+σ2
i γ
σ2i +τ2
and σ̂2i =σ2
i τ2
σ2i +τ2
.
p(γ|τ2, µ1, · · · , µI ,Y) ∼ N(1
I
I∑i=1
µi ,τ2
n),
p(τ2|γ, µ1, · · · , µI ,Y) ∼ inv − Γ(n − 2
2,
1
2
I∑i=1
(µi − γ)2).
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Empirical Bayes Estimation
• It proceeds by obtaining an estimation for hyperparameters γ and τ2 first. Usually, a MLE orMAP estimator. Then, µi s are fitted in a generalBayesian framework.
• EM-type algorithms can be employed to find theMLE of hyper-parameters. µi s are treated asmissing data.
The log complete data likelihood function:
L(γ, τ2|Y, µ1, · · · , µI ) =I∑
i=1
[− 1
2log(2πσ2i )− 1
2σ2i(Yi − µi )
2
−1
2log(2πτ2)− 1
2τ2(µi − γ)2
].
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
EM algorithm
• Given the t-th iteration γ(t) and τ2(t), conditionaldistribution of µi s are normal with mean
E (µi |Y, γ(t), τ2(t)) =τ2(t)Yi+σ2
i γ(t)
σ2i +τ2(t)
and variance
Var(µi |Y, γ(t), τ2(t)) =τ2(t)σ2
i
σ2i +τ2(t)
.
• The t + 1-st update:
γ(t+1) =1
I
I∑i=1
E (µi |Y, γ(t), τ2(t));
τ2(t+1) =1
I
I∑i=1
E [(µi − γ(t+1))2|Y, γ(t), τ2(t)].
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
One Example
• We take one dataset in Rstan (with modificationson σ). It is a survey conducted by ETS to checkthe effect of coaching on students’ performancein eight states in US. The dataset is presented asfollows:Y 28 8 -3 7 -1 1 18 12
σ 5 1 5 2 3 1 2 3
• The hierarchical model:
Yi = µi + εi ; εi ∼ N(0, σ2i ),
µi ∼ N(γ, τ2), i = 1, · · · , I .
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Results
We use Rstan to fit this hierarchical model in a fully Bayesianway and code in R to find the MLE of hyper-parameters by EMalgorithm.
histogram of gamma
gamma
Den
sity
−10 0 10 20
0.00
0.04
0.08 Empirical Bayes Estimate
histogram of tau
gamma
Den
sity
5 10 15 20 25 30
0.00
0.04
0.08
0.12
Empirical Bayes Estimate
Figure 1 : Inferences on γ and τ 2 from fully Bayesian (Histograms)and empirical Bayes (Red line)
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Cont’d
mean score of 1st school
mu1
Den
sity
10 15 20 25 30 35 40
0.00
0.04
0.08
Empirical
Bayes
mean score of 2nd school
mu2
Den
sity
5 6 7 8 9 10 11
0.0
0.1
0.2
0.3
0.4
mean score of 3rd school
mu3
Den
sity
−15 −5 0 5 10 15
0.00
0.04
0.08
mean score of 4th school
mu4
Den
sity
2 4 6 8 10 12 14
0.00
0.10
0.20
mean score of 5th school
mu5
Den
sity
−10 −5 0 5 10
0.00
0.05
0.10
0.15
mean score of 6th school
mu6
Den
sity
−2 −1 0 1 2 3 4
0.0
0.2
0.4
mean score of 7th school
mu7
Den
sity
12 14 16 18 20 22 24
0.00
0.10
0.20
mean score of 8th school
mu8
Den
sity
5 10 15 20
0.00
0.05
0.10
0.15
Figure 2 : Comparisons on indivual means from fully Bayesian(Histograms) and empirical Bayes (Red line)
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Diagnostics about EM algorithm
8.55 8.60 8.65 8.70 8.75
7090
110
Trace plot of EM updates
gamma
tauS
quar
e
0 100 200 300 400 500
8.55
8.65
8.75
iterations
gam
ma
8.535824
0 100 200 300 400 500
7090
110
iterations
tauS
quar
e
69.78020
0 50 100 150 2000.0e
+00
1.5e
−13
profile likelihood function of tauSquare
tauSquare
prof
ile li
kelih
ood
69.76
Figure 3 : Diagnostic pictures about EM algorithm
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Contour plot of Marginal likelihood
0 5 10 15
050
100
150
200
Contour of gamma and tau−square
gamma
tau−
squa
re
Figure 4 : Contour plot of marginal likelihood of γ and τ 2
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Fitting the Distributions of Age of Halo WDs
• The goal here is the distribution of the populationage of galactic halo WDs.
• µ the mean of the population logAge.
• Here is the simple hierarchical structure:
populationage
WDnWD1 WD2
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Mathematical Formula
• For the i-th star, denote its logAge, distancemodulus, metallicity, mass as Ai ,Di ,Ti ,Mi .
• The data available is its photometry, Yi , thebrightness on a range of wavelengths.
• Assume that the logAge of the Galactic halo isnormal with mean γ and variance τ2.
(Yi |Ai ,Di ,Ti ,Mi ) ∼ N(G (Ai ,Di ,Ti ,Mi ),Σi );
Ai ∼ N(γ, τ2), i = 1, · · · , I ,
with G (·) is a computer model and Σi s are known fromastronomers. Also, astronomers have very good priors on otherparameters, Di ,Ti ,Mi , i = 1, · · · , I .
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Photometry Data
Photometry is the intensity of an astronomical object’selectromagnetic radiation, over a braod band of wavelengths.
Figure 5 : The photometry over several filters.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Fitting one Star at a Time
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Empirical Bayes Fitting
BASE9 can fit one star at a time, namely, it can draw a verygood sample from the i-th star’s posterior density for given γand τ2:
p(Ai ,Di ,Ti ,Mi |Yi ) ∼ p(Yi |Ai ,Di ,Ti ,Mi )×p(Ai |γ, τ2)p(Mi )p(Di )p(Ti ),
where p(Ai |γ, τ2), p(Mi ), p(Di ), p(Ti ) are prior densities.We incorporate this fit-one-at-a-time algorithm in our fitting toa more complicated hierarchical model.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Monte Carlo EM algorithm
Monte Carlo EM (MCEM) algorithm is used to find the MLEof γ and τ2.
MC-step: Given the t-th update γ(t) and τ2(t), draw asample of size N,
(A[j]i ,D
[j]i ,M
[j]i ,T
[j]i ), j = 1, 2, · · · ,N, i = 1, 2, · · · , I from
its posterior;
E-step: Replace expectations with sample means;
M-step:
γ(t+1) =1
I · N
I∑i=1
N∑j=1
A[j]i ;
τ2(t+1) =1
I · N
I∑i=1
N∑j=1
(A[j]i − γ
(t+1))2.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Primary Results
9.8 9.9 10.0 10.1 10.2
010
2030
40
Distribution of logAge for DB white dwarfs
Den
sity
Empirical Hierarchical
Individual
J0346
J1102
J2137
J2145N
J2145S
Figure 6 : Fit 5 stars in a hierarchical model.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
IFMRs
• In stellar evolution, IFMR, initial-final massrelationship connects the mass of a white dwarfwith the mass of its progenitor in themain-sequence.
• Commonly used IFMRs: William, Weidemann,Salaris I and Salaris II.
• Assume Mfinal = f (Minitial ,β), usually f (·) istaken as linear or piecewise linear.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Contour plot
Initial Mass
WD
Mas
s
0 2 4 6 8
0.6
0.8
1.0
1.2
HyadesNGC 2477WeidemannWilliamSalaris ISalaris II
Figure 7 : Contour of the joint distribution of final mass and initialmass, which contains 95% of the distribution.
Empirical andFully BayesianHierarchical
Models : TwoApplicationsin the StellarEvolution
Shijing Si∗,David vanDyk∗, Tedvon Hippel†
Introduction
Example
Fitting theDistributionsof Halo WDs
Fitting theVariability inIFMRs AmongStellarClusters
Hierarchical Modelling of IFMRs
(Xi |βi ) ∼ N(G (Ai , θi ,βi ),Σi ), i = 1, 2, · · · , I ;(βi |ξ,Ψ) ∼ N(ξ,Ψ),
• Σi are known;
• βi s are coefficients in IFMRs;
• We want to learn from some dataset about ξ andΨ.