Old and new approaches for the analysis of categorical ...yrosseel/lavaan/lavaan_2015_Berlin.pdf · Department of Data Analysis Ghent University example SEM framework: u = binary,

Department of Data Analysis Ghent University

Old and new approaches for the analysis ofcategorical data in a SEM framework

Yves RosseelDepartment of Data Analysis – Ghent University – Belgium

Myrsini KatsikatsouDepartment of Statistics – London Scool of Economics – UK

Meeting of the Working Group Structural Equation Modeling26 February 2015 – Freie Universitat Berlin

Yves Rosseel Old and new approaches for the analysis of categorical data in a SEM framework 1 / 32


two approaches for handling categorical data in a SEM framework

• limited information approach

– only univariate and bivariate information is used– mainly developed in the SEM literature– perhaps the best known method: three-stage least squares (in Mplus:

estimator WLSMV)– new approach: pairwise likelihood estimation

• full information approach

– all information is used– frequentist approach: marginal maximum likelihood estimation∗ requires numerical integration (number of dimensions = number

of latent variables)∗ mainly developed in the IRT literature (and GLMM literature)∗ only recently incorporated in modern SEM software

– Bayesian approach



example SEM framework: u = binary, o = ordered, y = numeric

u1

u2

u3

u4

o1

o2

o3

o4

y1 y2 y3

f2

f3

f1

x1 x2



full information approach

• three approaches:

1. marginal maximum likelihood (MML)

2. latent response approach

3. (Bayesian estimation)



full information approach: marginal maximum likelihood

• origins: IRT models (eg Bock & Lieberman, 1970) and GLMMs

• the marginal likelihood for the response vector yi can be written as

Li(θ) = f(yi|xi;θ) =

∫D(η)

f(yi|η,xi;θ)f(η|xi;θ)dη

where yi are observed endogenous variables, xi are observed exogenouscovariates, and η are latent variables; D(η) is the domain of integration; θis the parameter vector

• numerical integration

– Gauss-Hermite quadrature– adaptive quadrature– Laplace approximation– Monte Carlo integration

• some clever ‘dimension reduction’ techniques exist for special cases



available software for the marginal maximum likelihood approach

• commercial software:

– SEM software: Mplus, . . .

– IRT software: BILOG-MG, MUTLILOG, PARSCALE, TESTFACT,EQSIRT, IRTPRO, flexMIRT, . . .

• non-commercial, open-source software

– the Stata module ‘gllamm’

– R packages for IRT: TAM, mirt, . . . (see the CRAN Task View: Psy-chometric Models and Methods) and lme4

– R packages for SEM: OpenMx, lavaan (since 0.5-16, but slow)



full information approach: latent response approach (1)

• an observed variable y can often be viewed as a partial observation of a latentcontinuous response y?; eg ordinal variable withK = 4 response categories:

latent continuous response y*

−1.4 0.8 1.8

0.0

0.1

0.2

0.3

0.4

y=1 y=2 y=3 y=4

t1

t2

t3



full information approach: latent response approach (2)

• assumption: both latent continuous responses (y?) and latent variables (η)are multivariate normal

• the likelihood contribution for observation i is given by

Li(θ) = f(yi|xi;θ) =

∫T(y?

i)

N [y?i |µi(θ),Σi(θ)] dy?i

where yi are observed endogenous variables, xi are observed exogenouscovariates; T (y?i ) is the integration region (defined by the thresholds)

• the order of integration equals the number of (non-continuous) observedvariables

• some examples in the literature exist, up to 4 variables



available software for the full information latent response approach


– none?


– R package lavaan (since version 0.5-15)

– estimator="FML"

– integration is done by the sadmvn() function in the R package mnormt– no analytical gradient (for now)

– just for fun



limited information approaches

1. three stage least squares (Mplus WLSMV)

2. pairwise likelihood estimation



the three stage least squares estimator

• developed by Bengt Muthen, in a series of papers; the seminal paper is

Muthen, B. (1984). A general structural equation model withdichotomous, ordered categorical, and continuous latent variableindicators. Psychometrika, 49, 115–132

• this approach has been the ‘golden standard’ in the SEM literature for almostthree decades

• first available in LISCOMP (Linear Structural Equations using a Compre-hensive Measurement Model), distributed by SSI, 1987 – 1997

• follow up program: Mplus (Version 1: 1998), currently version 7.3

• other authors (Joreskog 1994; Lee, Poon, Bentler 1992) have proposed sim-ilar approaches (implemented in LISREL and EQS respectively)

• another great program: MECOSA (Arminger, G., Wittenberg, J., Schepers,A.) written in the GAUSS language (mid 90’s)



stage 1 – estimating the thresholds

• estimating the thresholds: maximum likelihood using univariate data

– if no exogenous variables, this is just# generate ‘ordered’ data with 4 categoriesY <- sample(1:4, size = 100, replace = TRUE)

prop <- table(Y)/sum(table(Y))cprop <- c(0, cumsum(prop))

th <- qnorm(cprop)

– in the presence of exogenous covariates, this is just ordered probit re-gressionlibrary(MASS)X1 <- rnorm(100); X2 <- rnorm(100); X3 <- rnorm(100)

fit <- polr(ordered(Y) ˜ X1 + X2 + X3, method = "probit")fit$zeta



stage 2 – estimating tetrachoric, polychoric, . . . , correlations

• estimate tetrachoric/polychoric/. . . correlation from bivariate data:

– tetrachoric (binary – binary)

– polychoric (ordered – ordered)

– polyserial (ordered – numeric)

– biserial (binary – numeric)

– pearson (numeric – numeric)

• ML estimation is available (see eg. Olsson 1979 and 1982)

– two-step: first estimate thresholds using univariate information only;then, keeping the thresholds fixed, estimate the correlation

– one-step: estimate thresholds and correlation simultaneously

• if exogenous covariates are involved, the correlations are based on the resid-ual values of y? (eg bivariate probit regression)



stage 3 – estimating the SEM model

• third stage uses weighted least squares:

FWLS = (s− σ)>W−1(s− σ)

where s and σ are vectors containing all relevant sample-based and model-based statistics respectively

• s contains: thresholds, correlations, optionally regression slopes of exoge-nous covariates, optionally variances and means of continuous variables

• the weight matrix W is (a consistent estimator of) the asymptotic covariancematrix of the sample statistics (s)

• robust version: WLSMV

– use the diagonal of W only for estimation (DWLS)– use the full matrix for inference (standard errors and test statistic)– ‘MV’ stands for the Satterthwaite’s mean and variance corrected test

statistic



available software for the WLSMV estimator


– golden standard: Mplus (since 1998)

– LISREL and EQS have similar capabilities (but less general)

– MECOSA (mid ’90s, not available anymore)


– R package lavaan (since version 0.5)

– estimator="WLSMV" is the default estimator if some of the ob-served (endogenous) variables are categorical

– full implementation including ‘delta’ and ‘theta’ parameterization formultiple groups and/or longitudinal data



pairwise likelihood (PL) estimation

• special case of the broader framework of ‘composite’ likelihood estimation

– key idea: the complex likelihood is broken down as a (weighted) prod-uct of component likelihoods which are easier to handle (computation-ally)

– composite ML estimators are asymptotically unbiased, consistent, andnormally distributed

– key references:Lindsay, B. (1998). Composite likelihood methods. Contem-porary Mathematics, 80, 221–239Varin, C. (2008). On composite marginal likelihoods. Ad-vances in Statistical Analysis, 92(1), 1–28.

• introduced in the SEM literature by Joreskog & Moustaki (2001), De Leon(2005), Liu (2007), Xi (2011), Katsikatsou et al. (2012)

• computational complexity can be kept low regardless the number of ob-served and latent variables



pairwise likelihood (PL) estimation in SEM (1)

• in PL estimation, all model parameters are estimated in a single step

• for a random sample of N observations, the pairwise loglikelihood (pl) isdefined as the sum of all bivariate log-likelihood functions given by:

pl (θ;y) =

N∑n=1

∑i<i′

lnL (θ; (yin, yi′n)) =

=∑i<i′

ci∑a=1

ci′∑b=1

n (yi = a, yi′ = b) lnπ (yi = a, yi′ = b;θ) ,

where

π (yi = a, yi′ = b;θ) =

∫ τi,a

τi,a−1

∫ τi′,b

τi′,b−1

f (y?i , y?i′) dy

?i dy

?i′ .

• ‘robust’ standard errors are based on the Godambe/sandwich information



pairwise likelihood (PL) estimation in SEM (2)

• a recent simulation study illustrates the many pleasant properties of PL:

– bias and MSE of PL estimators and their (sandwich type) standard er-rors are found to be small in all experimental conditions, and decreas-ing with the sample size

– Katsikatsou, M., Moustaki, I., Yang-Wallentin, F., & Joreskog, K. G.(2012). Pairwise likelihood estimation for factor analysis models withordinal data. Computational Statistics & Data Analysis, 56(12), 4243–4258.

• a follow-up study illustrates how PL can be used in a ‘large’ SEM setting (7latent variables, many indicators)

• available in lavaan since 0.5-11 (dec 2012)

– estimation and ‘robust’ standard errors only– no support for mixed item types– no support for exogenous covariates



inference under PL in a SEM framework

• reference:

Katsikatsou, M., Moustaki, I. (under revision). Inference underpairwise likelihood for structural equation models with ordinalvariables.

• inference:

– Wald test

– Pairwise Likelihood Ratio Test (PLRT) for overall fit

– PL-AIC and PL-BIC

– PLRT for comparing nested models

• available in the development version of lavaan (0.5-18)



lavaan example

• simulated dataset (N = 500)

– 7 latent variables (2 endogenous, 5 exogenous)

– 45 ordinal indicators (4 response categories)

– structural part:

ξ6 ∼ ξ1 + ξ2 + ξ3 + ξ4 + ξ5

ξ7 ∼ ξ5 + ξ6

• timings:

– lavaan estimator = “PML”: currently takes about 36 minutes (3 minestimation, 17min standard errors, 16min test statistic)

– lavaan estimator = “WLSMV” about 3 minutes

– Mplus estimator = “ML”, integration = montecarlo (700), default set-tings: 1h 17min, but failed with THE MODEL ESTIMATION DID NOT TERMINATE NORMALLY



lavaan inputlibrary(lavaan)

Data <- read.csv("rx.ord")Data[,] <- lapply(Data[,], ordered)

simModel <- ’# exogenous lvksi1 =˜ V1 + V2 + V3 + V4 + V5ksi2 =˜ V6 + V7 + V8ksi3 =˜ V9 + V10 + V11 + V12 + V13ksi4 =˜ V14 + V15 + V16ksi5 =˜ V17 + V18 + V19 + V20 + V21

# endogenous lvksi6 =˜ V22 + V23 + V24 + V25 + V26 + V27 + V28 + V29 +

V30 + V31 + V32 + V33 + V34 + V35 + V36 + V37 + V38ksi7 =˜ V39 + V40 + V41 + V42 + V43 + V44 + V45

# structural modelksi6 ˜ ksi1 + ksi2 + ksi3 + ksi4 + ksi5ksi7 ˜ ksi5 +ksi6

’

fit <- sem(model = simModel, data = Data, estimator = "PML")



lavaan output (header only)lavaan (0.5-18.789) converged normally after 105 iterations

Number of observations 500

Estimator PML RobustMinimum Function Test Statistic 594.086 132.368Degrees of freedom 927 125.245P-value NA 0.314Scaling correction factor 0.223for the mean and variance adjusted correction

Parameter estimates:

Information ObservedStandard Errors Robust.huber.white

Estimate Std.err Z-value P(>|z|)Latent variables:ksi1 =˜V1 1.000V2 0.974 0.055 17.612 0.000V3 1.165 0.047 24.705 0.000V4 1.146 0.048 23.682 0.000V5 0.946 0.055 17.186 0.000

ksi2 =˜...



last slide

• PL estimation is a (relatively) new approach for handling categorical data ina SEM framework

– PL can handle a large number of observed and latent variables

– PL has many attractive statistical properties

– support for the ‘full’ SEM framework

• the R package lavaan:

– still catching-up in the ‘full information ML’ area (but wait for 0.6!)

– full support for WLSMV (and friends)

– PL is currently only implemented in lavaan∗ no support for inference in multiple groups yet!∗ complete data only (PL with missing values is ongoing research)



Thank you!

(questions?)

http://lavaan.org



PL estimation: likelihood

• basic assumption:(y?iy?i′

)∼ N2

((0

0

),

(1

ρi′i 1

))

• the pl for N independent observations:

pl (θ;y) =

N∑n=1

∑i<i′

lnL (θ; (yin, yi′n)) =

=∑i<i′

ci∑a=1

ci′∑b=1

n (yi = a, yi′ = b) lnπ (yi = a, yi′ = b;θ) ,

where

π (yi = a, yi′ = b;θ) =

∫ τi,a

τi,a−1

∫ τi′,b

τi′,b−1

f (y?i , y?i′) dy

?i dy

?i′ .



properties of the pairwise likelihood estimator

• θPL = maxθpl (θ;y)

• under regularity conditions upon the component likelihoods,

√N(θPL − θ

)→ N

(0, G−1(θ)

)where

G(θ) = H(θ)J−1(θ)H(θ)

and

H(θ) = E

{− ∂2

∂θ′∂θpl(θ;y)

}J(θ) = V ar

{∂

∂θ′pl(θ;y)

}



PLRT for nested models (1)

• let θ be partitioned as θ = (ψ′,ω′)′, whereψ : vector of parameters of interest of dimension d,ω : vector of nuisance parameters.

• consider the hypothesis:

H0 : ψ = ψ0 vs H1 : ψ 6= ψ0

• Pace et al. (2011),

PLRT (ψ0) = 2(pl(θ)− pl

(θ))

,

where θ = (ψ′, ω′)′ and θ = (ψ′0, ω′ψ0

)′ are the PL estimators under H1

and H0, respectively.

• the standard asymptotic result that, under H0, PLRT → χ2diff cannot be

used.



PLRT for nested models (2)

• instead, we use a Satterthwaite approximation; under H0,

E (PLRT (ψ0))12 ∗ V ar (PLRT (ψ0))

PLRT (ψ0)→ χ2v

where v = [E(PLRT (ψ0))]2

12∗V ar(PLRT (ψ0))

,

E (PLRT (ψ0))→ tr(Gψψ

[Hψψ

]−1),

V ar (PLRT (ψ0))→ 2 ∗ tr(Gψψ

[Hψψ

]−1Gψψ

[Hψψ

]−1),

• Gψψ and Hψψ are the parts of G−1 and H−1 that refer to ψ.



PL version of AIC and BIC

• AICPL based on Varin & Vidoni (2005):

AICPL = −2 ∗ pl(θPL;y

)+ 2 ∗ tr(J(θPL)H−1(θPL))

• BICPL based on Gao and Song (2009):

BICPL = −2 ∗ pl(θPL;y

)+ logN ∗ tr(J(θPL)H−1(θPL))



PLRT for overall fit (1)

• let θ be partitioned as θ = (ϕ′, τ ′)′, where τ : the vector of thresholds,ϕ : the vector of the rest SEM parameters, dimension d.

• recall P ≡ Corr (y?); let ρ = vech (P ), dimension p

• consider the hypothesis:

H0 : ρ = g(ϕ) versus H1 : ρ unconstrained

where g : Rd → Rp

• under H1, the parameter vector ϑ is partitioned as ϑ = (ρ′, τ ′)′

• τ : nuisance parameter




• letPLRTSEM = 2

(pl(ϑ)− pl

(θ))

,

where ϑ = (ρ′, τ ′)′ and θ = (ϕ′, τ ′)′ are the PL estimates under H1 andH0, respectively

• under H0,E (PLRTSEM )

12 ∗ V ar (PLRTSEM )

PLRTSEM → χ2v

where

v =[E (PLRTSEM )]

2

12 ∗ V ar (PLRTSEM )




• mean:

E (PLRTSEM )→ tr(Gρρ [Hρρ]

−1)− tr

(Gϕϕ [Hϕϕ]

−1)

• variance:

V ar (PLRTSEM )→ 2 ∗ tr(Gρρ [Hρρ]

−1Gρρ [Hρρ]

−1)

+ 2 ∗ tr(Gϕϕ [Hϕϕ]

−1Gϕϕ [Hϕϕ]

−1)

− 4 ∗ tr(M ′ [Hρρ]

−1MGϕϕ [Hϕϕ]

−1Gϕϕ

)where

M =∂

∂ϕg (ϕ)

∣∣∣∣ϕ=ϕ0


Documents

Old and new approaches for the analysis of categorical ...yrosseel/lavaan/lavaan_2015_Berlin.pdf · Department of Data Analysis Ghent University example SEM framework: u = binary,