Applying Latent Profile Analysis to Classify Chicago Neighborhoods Oksana Pugach, PhD Institute for Health Research and Policy University of Illinois at Chicago December, 2017




Cluster Analysis

• Identifies groups of individuals or objects that are similar to each other but different from individuals or objects in other groups

• Cluster analysis and discriminant analysis both classify objects into categories

• In a nutshell:

– select cases

– select variables (standardize?)

– select clustering procedure

• hierarchical clustering

• k-means clustering

• two-step clustering

• "Cluster analysis" does not refer to one particular statistical method; it covers a family of procedures


Cluster Analysis

• Different clustering methods can produce different, even conflicting, solutions. Choosing the final cluster solution and the number of clusters is informal and subjective

• An alternative approach to clustering postulates a formal statistical model for the population: the model assumes that the population consists of subpopulations ('clusters'), in each of which the variables have a different multivariate probability density function, resulting in a finite mixture density for the population as a whole.

• Problem: estimate parameters of the density functions and mixing probabilities

• Calculate: posterior probability of cluster membership

• How to determine number of clusters: model selection by objective procedures


Latent Profile Analysis

• Latent profile models are commonly attributed to Lazarsfeld and Henry (1968).

• Cluster analysis methods based on finite mixture models (FMM) are also known as model-based clustering methods (Banfield & Raftery, 1993)

• FMM can be seen as a form of latent variable analysis (Skrondal & Rabe-Hesketh, 2004) in which the subpopulation is a latent categorical variable; this is also known as latent class cluster analysis

Names of different kinds of latent variable models

              Models for means                                 Regression models
Observed      Latent continuous      Latent discrete           Latent continuous     Latent discrete
Continuous    Factor analysis        Latent profile analysis   Random effects        Regression mixture
Discrete      Item response theory   Latent class analysis     Logistic ran. eff.    Logistic reg. mix.

Source: Oberski, D. (2016). Mixture Models: Latent Profile and Latent Class Analysis. In Modern Statistical Methods for HCI (pp. 275–287). Springer, Cham. https://doi.org/10.1007/978-3-319-26633-6_12


Finite Mixture Densities

• Model

$$f(\mathbf{x}; \mathbf{p}, \boldsymbol{\theta}) = \sum_{j=1}^{c} p_j \, g_j(\mathbf{x}; \boldsymbol{\theta}_j), \qquad \sum_{j=1}^{c} p_j = 1$$

• x – p-dimensional random vector

• p_j – mixing probabilities

• g_j() – component densities

• c – number of clusters

Assumption for finite mixture as model for cluster analysis: each group of observations in a dataset comes from a population with a different probability distribution
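As a minimal sketch of the mixture density on this slide, the Python snippet below evaluates f(x; p, θ) for a univariate mixture of normal components. The choice of normal components, the function names, and the parameter values are assumptions made for illustration; they do not come from the talk.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def mixture_pdf(x, probs, means, sds):
    """f(x; p, theta) = sum_j p_j * g_j(x; theta_j) with normal g_j."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("mixing probabilities must sum to 1")
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(probs, means, sds))

# Hypothetical two-cluster example (values chosen only for illustration).
probs, means, sds = [0.6, 0.4], [0.0, 4.0], [1.0, 1.5]
density_at_1 = mixture_pdf(1.0, probs, means, sds)
print(density_at_1)
```

Because the mixing probabilities sum to 1 and each component is a proper density, the mixture itself integrates to 1.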


Cluster allocation

Having estimated the parameters of the assumed mixture density, observations can be associated with particular clusters on the basis of the maximum value of the posterior probability

$$\Pr(\text{cluster } j \mid \mathbf{x}_i) = \frac{\hat{p}_j \, g_j(\mathbf{x}_i; \hat{\boldsymbol{\theta}}_j)}{f(\mathbf{x}_i; \hat{\mathbf{p}}, \hat{\boldsymbol{\theta}})}$$
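The allocation rule can be sketched in a few lines of Python. This is an illustrative sketch that assumes univariate normal components with already-estimated parameters; the function names and the numeric values are hypothetical.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_probs(x, probs, means, sds):
    """Posterior probability of each cluster given x (Bayes' rule)."""
    weighted = [p * normal_pdf(x, m, s) for p, m, s in zip(probs, means, sds)]
    total = sum(weighted)  # the mixture density f(x; p, theta)
    return [w / total for w in weighted]

def allocate(x, probs, means, sds):
    """Return the 1-based index of the cluster with the largest posterior."""
    post = posterior_probs(x, probs, means, sds)
    return max(range(len(post)), key=post.__getitem__) + 1

# Hypothetical estimates for a two-cluster model.
probs, means, sds = [0.6, 0.4], [0.0, 4.0], [1.0, 1.0]
for x in (1.0, 3.5):
    print(x, posterior_probs(x, probs, means, sds), allocate(x, probs, means, sds))
```

The posterior probabilities for one observation always sum to 1, so the largest of them also indicates how confident the allocation is.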


Maximum Likelihood Estimation

$$l(\mathbf{p}, \boldsymbol{\theta}) = \sum_{i=1}^{n} \ln f(\mathbf{x}_i; \mathbf{p}, \boldsymbol{\theta})$$

Estimation by:

• Expectation Maximization algorithm (usually used)

• Bayesian estimation methods using Gibbs sampler or other MCMC methods
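To make the EM idea concrete, here is a small Python sketch for a univariate mixture of normals. It is not the mclust implementation; the simulated data, the starting values, and the function names are assumptions for illustration. The E-step computes posterior membership probabilities, and the M-step re-estimates mixing probabilities, means, and standard deviations from them.

```python
import math
import random

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def log_likelihood(data, probs, means, sds):
    """l(p, theta) = sum_i ln f(x_i; p, theta)."""
    return sum(
        math.log(sum(p * normal_pdf(x, m, s) for p, m, s in zip(probs, means, sds)))
        for x in data
    )

def em_step(data, probs, means, sds):
    # E-step: posterior probability of each cluster for each observation.
    resp = []
    for x in data:
        w = [p * normal_pdf(x, m, s) for p, m, s in zip(probs, means, sds)]
        total = sum(w)
        resp.append([wj / total for wj in w])
    # M-step: weighted updates of mixing probabilities, means, and sds.
    n = len(data)
    new_probs, new_means, new_sds = [], [], []
    for j in range(len(probs)):
        nj = sum(r[j] for r in resp)
        mean_j = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var_j = sum(r[j] * (x - mean_j) ** 2 for r, x in zip(resp, data)) / nj
        new_probs.append(nj / n)
        new_means.append(mean_j)
        new_sds.append(math.sqrt(var_j))
    return new_probs, new_means, new_sds

# Simulated data from two hypothetical clusters.
random.seed(1)
data = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(5, 1) for _ in range(200)]

params = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])  # arbitrary starting values
history = [log_likelihood(data, *params)]
for _ in range(50):
    params = em_step(data, *params)
    history.append(log_likelihood(data, *params))
print(params)
```

Each EM iteration cannot decrease the log-likelihood, which is a useful check when debugging an implementation.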


Maximum Likelihood Estimation for mixtures of multivariate normal

• As the number of clusters increases, the number of model parameters increases rapidly. Restrictions on the class-specific covariance matrix Σ_j can be imposed to obtain more parsimony and stability.

• Banfield & Raftery (1993) proposed reparameterizing the class-specific covariance matrix by its principal component (eigenvalue) decomposition

$$\boldsymbol{\Sigma}_j = \lambda_j \mathbf{D}_j \mathbf{A}_j \mathbf{D}_j^{\top}$$

Geometrical interpretation of the decomposition

λ_j, D_j, and A_j describe the volume, orientation, and shape of cluster j

Restrictions applied can be directly interpreted in terms of the geometrical form of a cluster
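How quickly the parameter count grows can be checked with a short Python sketch. The counting formulas are the standard ones for Gaussian mixtures; the function names are hypothetical, and the choice of 8 variables is an assumption made so that the counts can be compared with the df values (134 and 179) in the mclust output shown later in the talk.

```python
def n_params_unrestricted(n_clusters, n_vars):
    """Free parameters when every cluster has its own full covariance matrix."""
    mixing = n_clusters - 1                       # probabilities sum to 1
    means = n_clusters * n_vars
    covariances = n_clusters * n_vars * (n_vars + 1) // 2
    return mixing + means + covariances

def n_params_spherical_equal(n_clusters, n_vars):
    """Free parameters when Sigma_j = lambda * I is shared by all clusters."""
    return (n_clusters - 1) + n_clusters * n_vars + 1

for g in (1, 2, 3, 4, 5, 6):
    print(g, n_params_unrestricted(g, 8), n_params_spherical_equal(g, 8))
```

With 8 variables the unrestricted model needs 134 parameters for 3 clusters and 179 for 4, while the most restricted spherical form needs 27 and 36.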


Parameterisations of the within-group covariance matrix for multidimensional data available in the mclust package, and the corresponding geometric characteristics (Scrucca, Fop, Murphy, & Raftery, 2016)


Example of mixture of two normals


Other finite mixture models

• Mixture of multivariate t-distributions – robust to outliers and skewed distributions

• Mixtures for categorical data – latent class analysis.

• Multivariate Bernoulli densities, with the assumption that, given the class, the categorical variables are independent of each other.
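The conditional independence assumption of latent class analysis can be illustrated with a brief Python sketch for binary items. The class probabilities and item probabilities below are hypothetical values chosen for illustration only.

```python
from itertools import product

def class_conditional_prob(pattern, item_probs):
    """P(pattern | class): product of independent Bernoulli terms."""
    prob = 1.0
    for y, pi in zip(pattern, item_probs):
        prob *= pi if y == 1 else (1.0 - pi)
    return prob

def pattern_prob(pattern, class_probs, item_probs_by_class):
    """Marginal probability of a response pattern under the mixture."""
    return sum(
        p * class_conditional_prob(pattern, item_probs)
        for p, item_probs in zip(class_probs, item_probs_by_class)
    )

# Hypothetical two-class model with three binary items.
class_probs = [0.7, 0.3]
item_probs_by_class = [[0.9, 0.8, 0.7], [0.2, 0.1, 0.3]]

total = sum(
    pattern_prob(pat, class_probs, item_probs_by_class)
    for pat in product([0, 1], repeat=3)
)
print(total)
```

Within a class the items are independent, but marginally (after mixing over classes) they are correlated; that correlation is what the latent class explains.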


Model selection and Inference

• Log-likelihood ratio test

• Unfortunately, this does not lead to a suitable statistical test, because the usual regularity conditions do not hold: the null hypothesis lies on the edge of the parameter space, and when components coincide their mixing probabilities become unidentifiable. The test tends to overestimate the number of clusters. The preferred alternative is the parametric bootstrap. Both are available only for nested models.

• Information theoretic approaches

• These use a measure of the information lost when a particular model is used to approximate the true model. AIC and BIC are both penalized log-likelihoods, and the smaller value is preferred. Both depend heavily on regularity conditions, which do not necessarily hold in FMM, and their robustness has not been studied. It is recommended to use multiple criteria along with theoretical and practical considerations.

• Bayes factors

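• The Bayes factor is the ratio of the marginal likelihoods of two models; with equal prior odds it equals the posterior odds of one model against the other. Estimation requires integration to obtain the marginal likelihood (limitation).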

• MCMC method using reversible jump MCMC


Statistical Software

• R: mclust by Fraley and Raftery

• R: flexmix by Grün and Leisch

• R: CAMAN by Schlattmann

• Latent GOLD (Statistical Innovations): a powerful latent class and finite mixture program with a very user-friendly point-and-click interface (GUI).

• Mplus by Muthén and Muthén

• gllamm in Stata

• PROC FMM in SAS (experimental)


Application

• Project: Measuring Disparities in the Chain of Survival in Latino Communities

• PI: Marina Del Rios Rivera, MD, MSc

• Funding Agency: American Heart Association (Award No. 16MCPRP30960065)

• Purpose: Explore the relationship between neighborhood-level variables (i.e., language, educational attainment, and residential instability) and out-of-hospital cardiac arrest (OHCA) outcomes in Hispanics.

• Data: Surveillance data prospectively submitted to the Cardiac Arrest Registry to Enhance Survival (CARES) will be geocoded to Census Tracts.


Concentrated disadvantage: composite measure of census-tract-level socioeconomic composition in Chicago

Variable                    Sampson et al., 1997   Cagney and Browning, 2004   Current Analysis, N=797, mean (sd)
Age Dependency Ratio                                                            52.90 (20.01)
% Unemployed                                                                    14.52 (10.15)
% Female-headed HH                                                              20.12 (14.34)
Median HH Income, $1K                                                           49.66 (26.68)
% Vacant Housing                                                                13.90 (8.81)
% Below Poverty                                                                 24.07 (14.65)
% on Public Assistance                                                          24.44 (17.66)
% Less Than High School                                                         18.39 (12.95)
% Less Than Age 18
% Black

Census tract characteristics from the 2010-2014 5-year ACS estimates


> library(mclust)
> mod <- Mclust(mydata2[,-1])
> summary(mod$BIC)
Best BIC values:
             VVV,4        VVE,6        VVV,3
BIC      -45562.89 -45592.95763 -45606.92200
BIC diff      0.00    -30.06785    -44.03223
> summary(mod)
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------

Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 4 components:

 log.likelihood   n  df       BIC       ICL
      -22183.51 797 179 -45562.89 -45638.06

Clustering table:
  1   2   3   4
260 331 185  21

The fourth class contains only 21 tracts, which is 2.6% of the 797 tracts


BIC plot


Fitting mixture model with 3 classes

> mod.3 <- Mclust(mydata2[,-1], G=3)
> summary(mod.3)
----------------------------------------------------
Gaussian finite mixture model fitted by EM algorithm
----------------------------------------------------

Mclust VVV (ellipsoidal, varying volume, shape, and orientation) model with 3 components:

 log.likelihood   n  df       BIC       ICL
      -22355.84 797 134 -45606.92 -45689.57

Clustering table:
  1   2   3
328 273 196
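The BIC values in the two summaries can be reproduced from the reported log-likelihood, df, and n. The Python sketch below uses the sign convention that mclust follows, BIC = 2 ln L − df ln n, under which larger values are preferred; this is the negative of the more common "smaller is better" form. The function name is hypothetical.

```python
import math

def mclust_bic(log_likelihood, n_params, n_obs):
    """BIC in the mclust convention: 2*logLik - df*log(n); larger is better."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)

# Values copied from the 4-component and 3-component summaries above.
bic_4 = mclust_bic(-22183.51, 179, 797)
bic_3 = mclust_bic(-22355.84, 134, 797)
print(round(bic_4, 2), round(bic_3, 2), round(bic_3 - bic_4, 2))
```

The recomputed values agree with the reported BIC of -45562.89 and -45606.92 up to rounding of the log-likelihood, so the 4-component model is favored by BIC even though the 3-class solution is the one carried forward.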


Mixture probabilities and mean (sd) for Census tract characteristics

Component                   Class 1        Class 2        Class 3
Mixing Probabilities        41.3%          34.3%          24.4%
Age Dependency Ratio        49.6 (14.7)    67.6 (15.4)    37.9 (19.9)
% Less Than High School     24.2 (14.3)    20.4 (8.21)    5.6 (4.48)
% Unemployed                11.3 (4.27)    24.8 (9.8)     5.45 (2.67)
% Female-headed HH          15.4 (6.71)    35.8 (11)      6.03 (4.02)
Median HH Income, $1K       44.7 (12.5)    29.4 (10.3)    86.5 (22.9)
% Vacant Housing            10.8 (4.65)    21 (9.63)      9.11 (6.44)
% Below Poverty             22.6 (9.58)    36.4 (13.6)    9.22 (4.94)
% on Public Assistance      21.1 (9.89)    42.2 (13.6)    5.2 (4.23)
Labels                      poor           distressed     affluent
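A quick consistency check on the class profiles: under a mixture model the overall mean of a variable is the mixing-probability-weighted sum of the class means. The Python sketch below applies this to three rows of the table and compares the result with the overall means reported for the 797 tracts (52.90, 14.52, and 24.07). The variable and function names are hypothetical.

```python
mixing = [0.413, 0.343, 0.244]  # mixing probabilities for classes 1-3

class_means = {
    "Age Dependency Ratio": [49.6, 67.6, 37.9],
    "% Unemployed": [11.3, 24.8, 5.45],
    "% Below Poverty": [22.6, 36.4, 9.22],
}

def mixture_mean(probs, means):
    """Overall mean implied by the mixture: sum_j p_j * mu_j."""
    return sum(p * m for p, m in zip(probs, means))

overall = {name: mixture_mean(mixing, m) for name, m in class_means.items()}
for name, value in overall.items():
    print(name, round(value, 2))
```

The implied overall means match the sample means to within rounding, which is what one expects when the fitted mixture reproduces the first moments of the data.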


Uncertainty plot


> drmod <- MclustDR(mod.3, lambda=1)
> summary(drmod)
> plot(drmod, what='contour')
> plot(drmod, what='contour')
> miscl <- mod.3$uncertainty > 0.3
> points(drmod$dir[miscl,], pch=1, cex=2)
> table(miscl)
miscl
FALSE  TRUE
  761    36

Contour plot of estimated mixture densities on a projection subspace
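The uncertainty used to flag tracts is one minus the largest posterior membership probability, so a threshold of 0.3 flags observations whose best class has posterior probability below 0.7. The Python sketch below shows the calculation on a few made-up posterior rows; the numbers are hypothetical and are not taken from the fitted model.

```python
def uncertainty(posterior_row):
    """Classification uncertainty: 1 - max_j Pr(cluster j | x_i)."""
    return 1.0 - max(posterior_row)

# Hypothetical posterior probabilities for four observations, three classes.
posteriors = [
    [0.98, 0.01, 0.01],
    [0.55, 0.40, 0.05],
    [0.10, 0.72, 0.18],
    [0.34, 0.33, 0.33],
]

flags = [uncertainty(row) > 0.3 for row in posteriors]
print(flags)
```

In the talk's output, 36 of the 797 tracts (about 4.5%) exceed this threshold.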


Chicago Map


Classification by %Race


Concentrated disadvantage as continuous variable

• Calculated as the factor-loading-weighted sum of the components with loadings above 0.3

• Mean (range) = 210.60 (37.81 – 406.15)

• Density Plot

Class    n     mean     sd      min      max
1        328   203.92   39.08   114.39   315.19
2        273   290.47   50.17   155.18   406.15
3        196   110.53   29.66   37.81    194.17


Thank you!

• This work was supported by Award No. 16MCPRP30960065 from the NIH – American Heart Association and by the Methodology Research Core at IHRP, UIC.


References

• Banfield, J. D, & Raftery, A. E. (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49, 803–821.

• Browning, C. R., & Cagney, K. A. (2002). Neighborhood structural disadvantage, collective efficacy, and self-rated physical health in an urban setting. Journal of Health and Social Behavior, 43(4), 383–399.

• Cagney, K. A., & Browning, C. R. (2004). Exploring Neighborhood-level Variation in Asthma and other Respiratory Diseases. Journal of General Internal Medicine, 19(3), 229–236. https://doi.org/10.1111/j.1525-1497.2004.30359.x

• Hagenaars, J. A., & McCutcheon, A. L. (2002). Applied Latent Class Analysis. New York: Cambridge University Press. Retrieved from http://ebookcentral.proquest.com/lib/uic/detail.action?docID=217833

• Oberski, D. (2016). Mixture Models: Latent Profile and Latent Class Analysis. In Modern Statistical Methods for HCI (pp. 275–287). Springer, Cham. https://doi.org/10.1007/978-3-319-26633-6_12

• Sampson, R. J., Raudenbush, S. W., & Earls, F. (1997). Neighborhoods and Violent Crime: A Multilevel Study of Collective Efficacy. Science, 277(5328), 918–924. https://doi.org/10.1126/science.277.5328.918

• Scrucca, L., Fop, M., Murphy, T. B., & Raftery, A. E. (2016). mclust 5: Clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8(1), 289.

• Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models (1st ed.). Boca Raton: Chapman and Hall/CRC.

• Everitt, B. S., Landau, S., Leese, M., et al. (n.d.). Cluster Analysis (5th ed.). Wiley. Retrieved November 30, 2017, from http://www.wiley.com/WileyCDA/WileyTitle/productCd-EHEP002266.html