Bayesian Generalized Product Partition Model

By David Dunson and Ju-Hyun Park

Presentation by Eric Wang 2/15/08

Outline

• Introduce Product Partition Models (PPM).

• Relate PPM to DP via the Blackwell-MacQueen Polya Urn scheme.

• Introduce predictor dependence into PPM to form Generalized PPM (GPPM).

• Discussion and Results

• Conclusion

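Product Partition Model

• A PPM is formally defined as

$$\pi(S^*) = c_0 \prod_{h=1}^{k} c(S_h^*), \qquad f(y \mid S^*) = \prod_{h=1}^{k} f_h(y_h^*) \tag{1}$$

– Here $S^* = (S_1^*, \ldots, S_k^*)$ is a partition of $\{1, \ldots, n\}$, $c(\cdot)$ is a nonnegative cohesion, and $c_0$ is a normalizing constant.

– Let $y_h^* = \{y_i : i \in S_h^*\}$ denote the data for subjects in cluster h, h = 1,…,k.

– The probability of a partition is therefore proportional to the product of the cohesions of its subsets, and the data in different subsets are independent given the partition.

– The posterior on $S^*$ after seeing the data is also a PPM, with each cohesion updated to $c(S_h^*)\, f_h(y_h^*)$.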
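The PPM definition above can be sketched numerically for very small n by enumerating every partition. This is a minimal illustration, not the paper's code: the names `set_partitions` and `ppm_prior` and the factorial cohesion are assumptions made for the example.

```python
import math
from typing import Callable, Dict, Iterator, List, Tuple

Partition = Tuple[Tuple[int, ...], ...]


def set_partitions(items: List[int]) -> Iterator[List[List[int]]]:
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for j in range(len(smaller)):
            yield smaller[:j] + [[first] + smaller[j]] + smaller[j + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller


def ppm_prior(n: int, cohesion: Callable[[List[int]], float]) -> Dict[Partition, float]:
    """Return pi(S*) = c_0 * prod_h c(S_h*) for every partition of {1, ..., n}."""
    weights: Dict[Partition, float] = {}
    for part in set_partitions(list(range(1, n + 1))):
        key = tuple(sorted(tuple(sorted(block)) for block in part))
        weights[key] = math.prod(cohesion(block) for block in part)
    c0 = 1.0 / sum(weights.values())  # normalizing constant
    return {key: c0 * w for key, w in weights.items()}


# Hypothetical cohesion depending only on block size: c(S_h) = |S_h|!
prior = ppm_prior(3, lambda block: float(math.factorial(len(block))))
for partition, prob in sorted(prior.items()):
    print(partition, round(prob, 4))
```

Passing a cohesion that already multiplies in a marginal likelihood for the block gives the posterior over partitions in the same way, since the posterior is again a PPM. Full enumeration is only feasible for a handful of subjects.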

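Product Partition Model

• A PPM can also be induced hierarchically:

$$y_i \mid \theta, S \sim f(\theta_{S_i}), \qquad S_i \sim \sum_{h=1}^{k} \nu_h \delta_h, \qquad \theta_h \sim G_0$$

– Here $S_i = h$ if $i \in S_h^*$, $S = (S_1, \ldots, S_n)'$, and $\nu = (\nu_1, \ldots, \nu_k)'$ are the weights on the clusters.

• Taking $k \to \infty$ induces a nonparametric PPM.

• A prior on the weights $\nu$ imposes a particular form on the cohesion: a convenient choice corresponds to the Dirichlet Process.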

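Relating DP and PPM

• In DP, $\phi_i \sim G$ with $G \sim DP(\alpha G_0)$.

– G appears explicitly in the stick-breaking representation. If it is marginalized out, it yields the Blackwell-MacQueen (1973) formulation:

$$\phi_i \mid \phi_1, \ldots, \phi_{i-1} \sim \frac{\alpha}{\alpha + i - 1}\, G_0 + \frac{1}{\alpha + i - 1} \sum_{j=1}^{i-1} \delta_{\phi_j}$$

– Here $\phi_i$ is the parameter value taken by the ith data point and $\delta_{\phi}$ is a point mass at $\phi$.

– The joint distribution of a particular set $(\phi_1, \ldots, \phi_n)$ is therefore the product of these sequential conditionals:

$$\pi(\phi_1, \ldots, \phi_n) = \prod_{i=1}^{n} \frac{\alpha\, G_0(\phi_i) + \sum_{j=1}^{i-1} \delta_{\phi_j}(\phi_i)}{\alpha + i - 1}$$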
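The urn scheme on this slide can be simulated directly. The sketch below assumes a standard normal base measure $G_0$ purely for illustration; the function name `polya_urn` is hypothetical.

```python
import random
from typing import List, Tuple


def polya_urn(n: int, alpha: float, rng: random.Random) -> Tuple[List[float], List[int]]:
    """Draw phi_1..phi_n from the Blackwell-MacQueen urn, assuming G_0 = N(0, 1)."""
    phis: List[float] = []
    labels: List[int] = []     # cluster label of each draw, in order of appearance
    uniques: List[float] = []  # unique values seen so far
    for i in range(n):         # i draws have already been made
        if rng.random() < alpha / (alpha + i):
            # New unique value from the base measure.
            uniques.append(rng.gauss(0.0, 1.0))
            labels.append(len(uniques) - 1)
        else:
            # Copy the value of a uniformly chosen earlier draw.
            j = rng.randrange(i)
            labels.append(labels[j])
        phis.append(uniques[labels[-1]])
    return phis, labels


phis, labels = polya_urn(20, alpha=1.0, rng=random.Random(0))
print(labels)
```

Copying a uniformly chosen earlier draw selects an existing unique value with probability proportional to the number of subjects already holding it, which is the second term of the urn.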

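Relating DP and PPM

• It can be shown directly that the Blackwell-MacQueen formulation leads to

$$\pi(\phi_1, \ldots, \phi_n) = c_0 \prod_{h=1}^{k} \Big\{ \alpha\, G_0(\theta_h)\, \Gamma(n_h) \prod_{l=2}^{n_h} \delta_{\theta_h}(\phi_{h,l}) \Big\} \tag{2}$$

• Here $n_h$ is the number of data points taking the unique value $\theta_h$.

• $\phi_{h,l}$ is the value of the $l$th subject in cluster h, with subjects re-sorted by their ids, so that $\theta_h = \phi_{h,1}$:

$$\{\phi_{1,1}, \ldots, \phi_{1,n_1}, \ldots, \phi_{h,l}, \ldots, \phi_{k,1}, \ldots, \phi_{k,n_k}\}$$

• Also, $c_0 = \Gamma(\alpha)/\Gamma(\alpha + n)$ is a normalizing constant and the cohesion is $c(S_h^*) = \alpha\, \Gamma(n_h)$. Then the induced prior on the partition is $\pi(S^*) = c_0 \prod_{h=1}^{k} c(S_h^*)$, which is the PPM form.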

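Relating DP and PPM

• From slide 3, writing the prior and likelihood together:

$$\pi(S^*)\, f(y \mid S^*) = c_0 \prod_{h=1}^{k} c(S_h^*)\, f_h(y_h^*)$$

• Notice that from (1), G can be marginalized out to get the same form, with

$$f_h(y_h^*) = \int \prod_{i \in S_h^*} f(y_i \mid \theta_h)\, dG_0(\theta_h)$$

• Specifically, integrate over all possible unique values which can be taken by $\theta_h$ for subset h.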

Relating DP and PPM

• Therefore, DP is a special case of PPM with cohesion $c(S_h^*) = \alpha\, \Gamma(n_h)$ and normalizing constant $c_0 = \Gamma(\alpha)/\Gamma(\alpha + n)$, where $n_h$ is the number of subjects in cluster h.

• However, (2) follows the premise of DP that the data are exchangeable and does not incorporate dependence on predictors.

• Next, PPMs will be generalized such that predictor dependence is incorporated.
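The claim that the DP is a PPM with cohesion $\alpha\,\Gamma(n_h)$ and constant $\Gamma(\alpha)/\Gamma(\alpha+n)$ can be checked numerically. This is a minimal sketch with hypothetical function names: it multiplies the sequential urn probabilities for one sequence of cluster labels and compares the result with the product form.

```python
import math
from collections import Counter
from typing import Sequence


def urn_prob(labels: Sequence[int], alpha: float) -> float:
    """Probability of the cluster labels under sequential urn draws."""
    prob = 1.0
    counts: Counter = Counter()
    for i, h in enumerate(labels):  # i = number of earlier subjects
        # New cluster contributes alpha; an existing one contributes its current size.
        numer = counts[h] if counts[h] > 0 else alpha
        prob *= numer / (alpha + i)
        counts[h] += 1
    return prob


def ppm_prob(labels: Sequence[int], alpha: float) -> float:
    """c_0 * prod_h alpha * Gamma(n_h), with c_0 = Gamma(alpha) / Gamma(alpha + n)."""
    n = len(labels)
    log_c0 = math.lgamma(alpha) - math.lgamma(alpha + n)
    log_cohesions = sum(math.log(alpha) + math.lgamma(n_h)
                        for n_h in Counter(labels).values())
    return math.exp(log_c0 + log_cohesions)


labels = [0, 1, 0, 0, 2, 1]
print(urn_prob(labels, 0.7), ppm_prob(labels, 0.7))
```

The two numbers agree for any ordering of the same clusters, which reflects the exchangeability noted on this slide.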

Generalized PPM

• The goal of the paper is to formulate (1) such that the cohesion depends on the subjects' predictors:

$$\pi(S^* \mid X) \propto \prod_{h=1}^{k} c(S_h^*, x_h^*), \qquad x_h^* = \{x_i : i \in S_h^*\}$$

• This can be done following a process very similar to the non-predictor case above.

• Once again, the connection between DP and PPM will be used. The resulting model will henceforth be referred to as the GPPM.

• The formulation is interesting because the predictors are treated as random variables rather than known fixed values (as in KSBP).

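GPPM

• Consider the following hierarchical model

$$y_i \mid x_i, \phi_i \sim f_1(y_i \mid x_i, \phi_i), \qquad x_i \mid \gamma_i \sim f_2(x_i \mid \gamma_i), \qquad (\phi_i, \gamma_i) \sim G, \qquad G \sim DP(\alpha G_0)$$

– Here $G_0 = G_{0\phi} \otimes G_{0\gamma}$ constitutes a base measure on $\phi$ and $\gamma$, the parameters of the data and predictor, respectively.

– This model will segment the subjects {1,…,n} into k clusters. As before, $S_i = h$ denotes that subject i belongs to cluster h.

– $\theta_h$ and $\gamma_h^*$ denote the unique values of the parameters associated with the subjects in cluster h and their predictors, so that $(\phi_i, \gamma_i) = (\theta_h, \gamma_h^*)$ whenever $S_i = h$.

GPPM

• The joint distribution of $(\phi_i, \gamma_i)$, i = 1,…,n, can be developed in a similar manner to (2):

$$\pi(\phi, \gamma) = c_0 \prod_{h=1}^{k} \Big\{ \alpha\, G_{0\phi}(\theta_h)\, G_{0\gamma}(\gamma_h^*)\, \Gamma(n_h) \prod_{l=2}^{n_h} \delta_{(\theta_h, \gamma_h^*)}(\phi_{h,l}, \gamma_{h,l}) \Big\}$$

• The conditional distribution of $\phi = (\phi_1, \ldots, \phi_n)$ given the predictors X, with the predictor parameters integrated out, is

$$\pi(\phi \mid X) \propto \prod_{h=1}^{k} \Big\{ \alpha\, \Gamma(n_h)\, G_{0\phi}(\theta_h) \prod_{l=2}^{n_h} \delta_{\theta_h}(\phi_{h,l}) \int \prod_{i \in S_h^*} f_2(x_i \mid \gamma)\, dG_{0\gamma}(\gamma) \Big\} \tag{6}$$

• For comparison, (2) is shown below:

$$\pi(\phi_1, \ldots, \phi_n) = c_0 \prod_{h=1}^{k} \Big\{ \alpha\, G_0(\theta_h)\, \Gamma(n_h) \prod_{l=2}^{n_h} \delta_{\theta_h}(\phi_{h,l}) \Big\}$$

• The cohesion in (6) is

$$c(S_h^*, x_h^*) = \alpha\, \Gamma(n_h) \int \prod_{i \in S_h^*} f_2(x_i \mid \gamma)\, dG_{0\gamma}(\gamma) \tag{7}$$

• (7) meets the criteria originally set out: the cohesion depends on the predictors of the subjects in cluster h.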

GPPM

• Some thoughts on GPPM so far:

– As noted earlier, the posterior distribution of a PPM is still in the class of PPMs, but with updated cohesion $c(S_h^*)\, f_h(y_h^*)$.

– Similarly, the posterior of a GPPM also takes the form of a GPPM.

– (2) and (6) are quite similar. The extra portion of (6) is the marginal probability of the predictors in each cluster.

– If the marginal likelihood of a predictor is the same under every cluster, then the GPPM reverts to the Blackwell-MacQueen formulation, seen clearly in the following theorem.

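Generalized Polya Urn Scheme

• The following theorem shows that the GPPM can induce a Blackwell-MacQueen Polya Urn scheme, generalized for predictor dependence:

Theorem 1. Given the predictors and the parameters of the other subjects,

$$\phi_i \mid \cdot \sim w_{i0}\, G_{0\phi} + \sum_{h} w_{ih}\, \delta_{\theta_h}, \qquad w_{i0} \propto \alpha\, f_0(x_i), \qquad w_{ih} \propto n_h^{(i)} f_h(x_i)$$

where $n_h^{(i)}$ is the size of cluster h without subject i, $f_0(x_i)$ is the marginal likelihood of $x_i$ under the base measure on the predictor parameters, $f_h(x_i)$ is its marginal likelihood given the other predictors in cluster h, and the weights sum to one.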

Generalized Polya Urn Scheme

• By the above theorem, subject i will do either 1) or 2):

– 1) Draw a previously unseen unique value, with probability proportional to the concentration parameter times the marginal likelihood of its predictor under the base measure.

– 2) Draw a previously used unique value equal to the parameters of cluster h, with probability proportional to the number of subjects which have previously chosen that unique value times the marginal likelihood of its predictor value under cluster h.

• Further, since the predictors are treated as random variables, updating the posteriors on each cluster’s predictor parameters means that GPPM is a flexible, non-parametric way to adapt the distance measure in predictor space.

• In this paper G is always integrated out; however, Dunson alludes to variational techniques which could still be developed in similar fashion following the fast Variational DP proposed by Kurihara et al. (2006).
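The two options of the generalized urn can be sketched for a scalar predictor. To keep the example short it assumes a Normal predictor likelihood with known variance `s2` and a Normal prior on each cluster mean, which is simpler than the Normal-Wishart prior used in the paper; the function names and default hyperparameters are hypothetical.

```python
import math
from typing import List


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def urn_weights(x_new: float, clusters: List[List[float]], alpha: float,
                m0: float = 0.0, v0: float = 1.0, s2: float = 0.05) -> List[float]:
    """Return [w_0, w_1, ..., w_k]: w_0 for a new cluster, w_h for existing cluster h."""
    # Option 1: alpha times the prior predictive density of x_new.
    scores = [alpha * normal_pdf(x_new, m0, v0 + s2)]
    for xs in clusters:
        n_h = len(xs)
        v_n = 1.0 / (1.0 / v0 + n_h / s2)     # posterior variance of the cluster mean
        m_n = v_n * (m0 / v0 + sum(xs) / s2)  # posterior mean of the cluster mean
        # Option 2: cluster size times the posterior predictive density of x_new.
        scores.append(n_h * normal_pdf(x_new, m_n, v_n + s2))
    total = sum(scores)
    return [s / total for s in scores]


clusters = [[0.10, 0.15, 0.20], [0.80, 0.85, 0.90]]
print(urn_weights(0.12, clusters, alpha=1.0))
```

A predictor close to the members of one cluster raises that cluster's weight, while a predictor far from every cluster shifts the weight toward a new unique value. If the predictive densities were equal across options, the weights would reduce to the ordinary urn proportions.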

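Generalized Polya Urn Scheme

• Consider, for example, a Normal-Wishart prior on the predictor as follows

$$x_i \mid \mu_h, \Sigma_h \sim N_p\big(\mu_h, (\tau_x \Sigma_h)^{-1}\big), \qquad \mu_h \mid \Sigma_h \sim N_p\big(\mu_0, (\tau_\mu \Sigma_h)^{-1}\big), \qquad \Sigma_h \sim W(\nu_0, \Sigma_0)$$

• Here $\tau_x$ and $\tau_\mu$ are multiplicative constants, $\Sigma_h$ is the precision matrix of cluster h, and $W(\nu_0, \Sigma_0)$ is a Wishart distribution with degrees of freedom $\nu_0$ and mean $\Sigma_0$.

• Notice that this formulation adds another multiplier to the precision of the predictor distribution. This is analogous to the kernel width in KSBP, and encourages tight local clustering in predictor space.

• The marginal distributions on the predictors from Theorem 1 take the forms shown on the next slide.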

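Generalized Polya Urn Scheme

• The marginal distribution of the predictor in the first weight is a non-central multivariate t-distribution with degrees of freedom $\nu_0^*$, mean $\mu_0^*$ and scale $\Sigma_0^*$:

$$f(x_i \mid \nu_0^*, \mu_0^*, \Sigma_0^*) = \frac{\Gamma\big((\nu_0^* + p)/2\big)}{\Gamma(\nu_0^*/2)\, (\nu_0^* \pi)^{p/2}\, |\Sigma_0^*|^{1/2}} \Big[ 1 + \frac{1}{\nu_0^*} (x_i - \mu_0^*)' \Sigma_0^{*-1} (x_i - \mu_0^*) \Big]^{-(\nu_0^* + p)/2}$$

• The marginal distribution of the predictor in the second weight has the same functional form but with updated hyperparameters, which involve the empirical mean of the predictors in cluster h, without predictor i.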
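The multivariate t density on this slide can be evaluated on the log scale. The sketch below restricts the scale matrix to be diagonal so that it needs only the standard library; that restriction and the function name are assumptions made for the example, not part of the paper.

```python
import math
from typing import Sequence


def mvt_logpdf_diag(x: Sequence[float], df: float, mean: Sequence[float],
                    scale_diag: Sequence[float]) -> float:
    """Log density of a p-dimensional t distribution with a diagonal scale matrix."""
    p = len(x)
    # Quadratic form (x - mean)' Sigma^{-1} (x - mean) for diagonal Sigma.
    quad = sum((xi - mi) ** 2 / si for xi, mi, si in zip(x, mean, scale_diag))
    log_det = sum(math.log(si) for si in scale_diag)
    log_norm = (math.lgamma((df + p) / 2.0) - math.lgamma(df / 2.0)
                - 0.5 * p * math.log(df * math.pi) - 0.5 * log_det)
    return log_norm - 0.5 * (df + p) * math.log1p(quad / df)


# With p = 1 and one degree of freedom this is the standard Cauchy density.
print(math.exp(mvt_logpdf_diag([0.0], 1.0, [0.0], [1.0])), 1.0 / math.pi)
```

Working on the log scale avoids underflow when the density is multiplied by cluster sizes and normalized across many clusters.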

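Generalized Polya Urn Scheme

• Posterior updating in this model is straightforward using MCMC. The conditional posterior of the parameters is

$$\phi_i \mid \cdot \sim \tilde{w}_{i0}\, G_{i,0} + \sum_{h} \tilde{w}_{ih}\, \delta_{\theta_h}$$

where $G_{i,0}$ is the base prior updated with the data likelihood of subject i, and $\tilde{w}_{i0}$, $\tilde{w}_{ih}$ are the weights from Theorem 1 multiplied by the marginal likelihood of $y_i$ under the base prior and by its likelihood under $\theta_h$, respectively, then normalized.

• The indicators $S_i$ are updated separately from the cluster parameters $\theta_h$. The membership indicators are sampled from their multinomial posterior, with probability $\tilde{w}_{ih}$ for existing cluster h and $\tilde{w}_{i0}$ for a new cluster.

• Next, update the parameters $\theta_h$ conditioned on the indicators and the number of clusters k.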

Results

• Dunson and Park demonstrate results using the following model on conditional density regression problems

$$f_1(y_i \mid x_i, \phi_i) = N\big(y_i;\, x_i'\beta_i,\, \tau_i^{-1}\big)$$

• Here $f_1$ is the data likelihood, $x_i$ is the p-dimensional predictor, and $\theta_h = (\beta_h, \tau_h)$ are the parameters of cluster h.

• Results are demonstrated on 3 datasets:

– Simulated single Gaussian (p = 2)

– Simulated mixture of two Gaussians (p = 2)

– Epidemiology data (p = 3)

Results

• Simulated single Gaussian data, 500 data points

– The predictor is generated iid from a uniform distribution over (0,1).

– The response was simulated from a single Gaussian conditional on the predictor.

• The algorithm was run for 10,000 iterations with a 1,000 iteration burn-in. It showed fast mixing and good estimates.

Figure: raw data, and conditional distributions of y for two different values of x. The dotted line is the truth, the solid line is the estimate, and the dashed lines are 99% credible intervals.

Results

• Simulated mixture of two Gaussians, 500 data points

– The predictor is generated iid from a uniform distribution over (0,1).

– The response was simulated from a mixture of two Gaussians conditional on the predictor.

Figure: the left column of plots is for a PPM (non-generalized), while the right column is the GPPM on the same dataset. Notice the much better fit in the bottom plots, and that the GPPM is not dragged toward 0 as the second peak appears when the predictor approaches 0.

Results

• Epidemiologic application:

• DDE has been shown to increase the rate of pre-term birth. The two predictors correspond to the DDE dose for child i and the mother’s age after normalization, respectively.

• Dataset size was 2,313 subjects.

• MCMC for the GPPM was run for 30,000 iterations with a 10,000 iteration burn-in.

• The results confirmed earlier findings of a slightly decreasing trend in the response as DDE level rises.

• These findings are similar to previous KSBP work on the same dataset, but the implementation was simpler.

Results

Figure: raw data and estimates for the epidemiologic application. Dashed lines indicate 99% credible intervals.

Conclusion

• A GPPM was formulated beginning with the Blackwell-MacQueen Polya Urn scheme.

• The GPPM incorporates predictor dependence by treating the predictor as a random variable.

– It is similar in spirit to the KSBP, but is able to bypass issues such as kernel width selection and the inability to implement a continuous distribution in predictor space.

• Future research directions could explore Dunson’s mention of a variational method similar to the formulation proposed in this paper.
