

Finite Mixtures of Quantile and M-quantile regression models

Marco Alfò (1), Nicola Salvati (2), M.G. Ranalli (3)

(1) Sapienza Università di Roma, (2) Università di Pisa, (3) Università di Perugia

Workshop on “Recent Advances in Quantile and M-quantile Regression”

Università di Pisa, July 15th, 2016




Essential References

Alfò, M., Salvati, N., Ranalli, M.G. (2016). Finite mixtures of quantile and M-quantile regression models. Statistics and Computing.

Tzavidis, N., Salvati, N., Schmid, T., Flouri, E., Midouhas, E. (2016). Longitudinal analysis of the Strengths and Difficulties Questionnaire scores of the Millennium Cohort Study children in England using M-quantile random-effects regression. Journal of the Royal Statistical Society: Series A.





The presentation at a glance

Data are seldom i.i.d. and without outliers!

Dependent Observations (multilevel, longitudinal, panel data)

Quantile and M-quantile regression models

Introducing Finite Mixtures (nonparametric distribution for the random effects)

Maximum Likelihood Estimation

Multivariate extension





Outline

1 Introduction on Finite Mixtures
    Dependent Observations
    Finite mixtures of regression models

2 Finite Mixtures for Quantile and M-Quantile regression models
    Likelihood Inference (focus on MQ)

3 Applications
    Pain Labor Data & Treatment of lead-exposed children
    The Millennium Cohort Study (joint work with M.F. Marino & N. Tzavidis)

4 Conclusions





Hierarchically structured data

Regression model for multilevel data

$$E(y_{ij} \mid x_{ij}, b_i) = x_{ij}'\beta + w_{ij}'b_i, \qquad i = 1, \dots, n, \quad j = 1, \dots, r_i$$

$y_{ij}$: observed response variable

$x_{ij} = (x_{ij1}, \dots, x_{ijp})'$: vector of explanatory variables, with $x_{ij1} \equiv 1$

Linear models are used for ease of notation; the approach extends to GLMs






$w_{ij}$ is a subset of $x_{ij}$ containing the $p_1 \le p$ variables whose effects are assumed to be individual-specific

the effects $b_i$, $i = 1, \dots, n$, vary across individuals according to a distribution $h(\cdot)$





Likelihood

Local independence assumption

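$$L(\Phi) = \prod_{i=1}^{n} \left\{ \int_{\mathcal{B}} \prod_{j=1}^{r_i} f(y_{ij} \mid x_{ij}, b_i)\, dH(b_i) \right\}$$

$\Phi$ is the global set of parameters,

$f(\cdot)$ is the Gaussian density,

$H(\cdot)$ is the random coefficient cdf and $\mathcal{B}$ the corresponding support

In the general case, the integral defining the likelihood cannot be computed analytically and must be approximated, e.g. by Gaussian quadrature (GQ), adaptive Gaussian quadrature (aGQ), Monte Carlo maximum likelihood (MCML) or composite likelihood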




Finite mixtures of regression models

Nonparametric distribution for the random coefficients

Leave h(·) unspecified

Approximate $h(\cdot)$ by a discrete distribution on $G < n$ locations $\{b_1, \dots, b_G\}$, with associated probabilities $\pi_k = \Pr(b_i = b_k)$, $i = 1, \dots, n$ and $k = 1, \dots, G$:

$$b_i \sim \sum_{k=1}^{G} \pi_k \delta_{b_k}$$

where $\delta_\theta$ is a one-point distribution putting unit mass at $\theta$.





Comparing the Likelihoods

Nonparametric distribution for the random effects

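$$L(\Phi) = \prod_{i=1}^{n} \left\{ \sum_{k=1}^{G} \prod_{j} f(y_{ij} \mid x_{ij}, b_k)\, \pi_k \right\} =: \prod_{i=1}^{n} \left\{ \sum_{k=1}^{G} \prod_{j} f_{ijk}\, \pi_k \right\}$$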

Parametric distribution for the random effects

$$L(\Phi) = \prod_{i=1}^{n} \left\{ \int_{\mathcal{B}} \prod_{j=1}^{r_i} f(y_{ij} \mid x_{ij}, b_i)\, dH(b_i) \right\}$$






In the nonparametric likelihood, $\Phi = \{\beta, b_1, \dots, b_G, \pi_1, \dots, \pi_G\}$

$f_{ijk}$ is the density of the response for the $j$-th measurement in the $i$-th cluster under the $k$-th component of the finite mixture, $k = 1, \dots, G$

The nonparametric likelihood resembles the likelihood function of a finite mixture of Gaussian distributions
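The nonparametric likelihood can be evaluated directly by accumulating the within-cluster products on the log scale. What follows is a minimal Python sketch, assuming Gaussian component densities with a common scale `sigma` and discrete random intercepts only (that is, $w_{ij} \equiv 1$); the function name `mixture_loglik` and its argument layout are illustrative choices, not taken from the talk.

```python
import numpy as np

def mixture_loglik(y, X, cluster, beta, b, pi, sigma):
    """Log-likelihood of a finite mixture of linear models with
    discrete random intercepts b[0..G-1] and masses pi[0..G-1].

    cluster[m] in {0, ..., n-1} is the cluster index i of observation m.
    """
    b, pi = np.asarray(b, dtype=float), np.asarray(pi, dtype=float)
    n, G = cluster.max() + 1, len(b)
    # Residual of every observation under every component: shape (N, G)
    r = (y - X @ beta)[:, None] - b[None, :]
    log_f = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (r / sigma) ** 2
    # log prod_j f_ijk: sum the log densities within each cluster, shape (n, G)
    log_f_i = np.zeros((n, G))
    np.add.at(log_f_i, cluster, log_f)
    # log sum_k pi_k prod_j f_ijk, via log-sum-exp for numerical stability
    a = log_f_i + np.log(pi)[None, :]
    m = a.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1))))
```

With a single component ($G = 1$, $\pi_1 = 1$) the function reduces to the ordinary Gaussian log-likelihood of a linear model, which is a convenient sanity check.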





Regression model

a semi-parametric approximation to a fully parametric, possibly continuous, distribution for the random coefficients

a model-based clustering approach, where the population of interest is assumed to be divided into $G$ homogeneous sub-populations that differ in the values of the regression parameters

Considering the $k$-th component of the mixture,

$$E(y_{ij} \mid x_{ij}, b_k) = x_{ij}'\beta + w_{ij}'b_k.$$





Estimation of model parameters (1)

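The score function can be written as the posterior expectation of the score function corresponding to a standard LM:

$$S(\Phi) = \frac{\partial \log L(\Phi)}{\partial \Phi} = \sum_{i=1}^{n} \sum_{k=1}^{G} \tau_{ik} \sum_{j} \frac{\partial \log f_{ijk}}{\partial \Phi},$$

where the weights

$$\tau_{ik} = \frac{\prod_{j} f_{ijk}\, \pi_k}{\sum_{l} \prod_{j} f_{ijl}\, \pi_l}$$

represent the posterior probabilities of component membership.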
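The posterior weights are easiest to compute on the log scale, because the products $\prod_j f_{ijk}$ underflow quickly as the number of measurements per cluster grows. A minimal sketch, assuming the cluster-level log densities $\sum_j \log f_{ijk}$ are already stored in an array; the name `posterior_weights` is hypothetical.

```python
import numpy as np

def posterior_weights(log_f_i, pi):
    """tau[i, k] = pi_k * prod_j f_ijk / sum_l pi_l * prod_j f_ijl.

    log_f_i[i, k] holds sum_j log f_ijk (cluster-level log densities).
    """
    a = np.asarray(log_f_i, dtype=float) + np.log(np.asarray(pi, dtype=float))[None, :]
    a -= a.max(axis=1, keepdims=True)   # subtract the row maximum to avoid underflow
    tau = np.exp(a)
    return tau / tau.sum(axis=1, keepdims=True)
```

When all components give a cluster the same density, the posterior weights equal the prior masses $\pi_k$, as the formula implies.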





Estimation of model parameters (2)

The likelihood equations are essentially weighted sums of the likelihood equations for a standard LM, with weights $\tau_{ik}$.

The basic EM algorithm is defined by solving these equations for a given set of weights, and updating the weights as a function of the current parameter estimates.
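The alternation between weighted likelihood equations and weight updates can be sketched for the simplest case. The code below is an illustrative implementation under stated assumptions, not the authors' software: Gaussian errors with a common `sigma`, discrete random intercepts only, and a design matrix `X` without an intercept column (the intercept is absorbed into the locations `b`). The function name and the starting values are arbitrary choices.

```python
import numpy as np

def em_mixture_lm(y, X, cluster, G, n_iter=500, tol=1e-8):
    """EM for y_ij = x_ij' beta + b_k + e_ij, e_ij ~ N(0, sigma^2),
    where b_i equals b_k with probability pi_k (X has no intercept column)."""
    N, p = X.shape
    n = cluster.max() + 1
    # Crude starting values: pooled least squares and quantiles of its residuals
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    b = np.quantile(resid, (np.arange(G) + 0.5) / G)
    pi, sigma = np.full(G, 1.0 / G), resid.std()
    # Data stacked G times (component-major), with one dummy column per component
    Z = np.vstack([np.hstack([X, np.tile(np.eye(G)[k], (N, 1))]) for k in range(G)])
    y_rep = np.tile(y, G)
    history = []
    for _ in range(n_iter):
        # E-step: posterior membership probabilities tau_ik
        r = (y - X @ beta)[:, None] - b[None, :]
        log_f = -0.5 * np.log(2 * np.pi * sigma**2) - 0.5 * (r / sigma) ** 2
        log_f_i = np.zeros((n, G))
        np.add.at(log_f_i, cluster, log_f)
        a = log_f_i + np.log(pi)[None, :]
        m = a.max(axis=1, keepdims=True)
        history.append(float(np.sum(m[:, 0] + np.log(np.exp(a - m).sum(axis=1)))))
        tau = np.exp(a - m)
        tau /= tau.sum(axis=1, keepdims=True)
        if len(history) > 1 and history[-1] - history[-2] < tol:
            break
        # M-step: weighted least squares with weights tau_ik, then sigma and pi
        w = tau[cluster].T.reshape(-1)
        sw = np.sqrt(w)
        theta = np.linalg.lstsq(Z * sw[:, None], y_rep * sw, rcond=None)[0]
        beta, b = theta[:p], theta[p:]
        sigma = np.sqrt(np.sum(w * (y_rep - Z @ theta) ** 2) / N)
        pi = np.clip(tau.mean(axis=0), 1e-12, None)  # floor avoids log(0)
    return {"beta": beta, "b": b, "pi": pi, "sigma": sigma, "tau": tau, "loglik": history}
```

The returned `loglik` history should be non-decreasing, which is a useful diagnostic for any EM implementation. In practice several starting points are advisable, since the mixture likelihood can have multiple local maxima.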




Finite Mixtures for Quantile and M-Quantile regression models






Quantile and M-Quantile regression models for dependent observations

Linear Quantile Random Effect models (Geraci & Bottai, 2007, 2014; Liu & Bottai, 2009)

$$Q_q(y_{ij} \mid x_{ij}, b_{i,q}) = x_{ij}'\beta_q + w_{ij}'b_{i,q}$$

Linear M-Quantile Random Effect models (Tzavidis et al., 2016)

$$MQ_q(y_{ij} \mid x_{ij}, b_{i,q}, \psi, c) = x_{ij}'\beta_q + w_{ij}'b_{i,q}$$

Note that both fixed and random coefficients vary with $q \in (0, 1)$

Random effects are assumed to be normally distributed





Finite Mixtures of Q and MQ regression models

Approximate the distribution of the random coefficients through a discrete distribution defined on a finite set of $G$ locations. Then, conditional on $k$,

$$Q_q(y_{ij} \mid x_{ij}, b_{k,q}) = x_{ij}'\beta_q + w_{ij}'b_{k,q}$$

$$MQ_q(y_{ij} \mid x_{ij}, b_{k,q}, \psi, c) = x_{ij}'\beta_q + w_{ij}'b_{k,q}$$

for $k = 1, \dots, G$.

Each component of the mixture is characterised by a different (sub-)vector of regression coefficients, $b_{k,q}$, $k = 1, \dots, G$

Note that the distribution of $b_{k,q}$ may vary across quantiles
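The two model classes differ in the loss function that defines them. As a hedged illustration, the sketch below codes the quantile check loss and one common parameterisation of the asymmetric Huber loss used for M-quantiles, $\rho_q(u) = 2\,|q - I(u \le 0)|\,\rho_{\text{Huber}}(u)$. The function names and the default tuning constant `c = 1.345` are assumptions made for this example and are not stated in the slides.

```python
import numpy as np

def check_loss(u, q):
    """Quantile (check) loss: u * (q - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (q - (u < 0))

def huber_mq_loss(u, q, c=1.345):
    """Asymmetric Huber loss for standardised residuals u:
    quadratic for |u| <= c, linear beyond, tilted by 2|q - 1{u <= 0}|."""
    u = np.asarray(u, dtype=float)
    a = np.abs(u)
    huber = np.where(a <= c, 0.5 * u**2, c * a - 0.5 * c**2)
    return 2.0 * np.abs(q - (u <= 0)) * huber
```

For very large `c` the Huber part is purely quadratic and the loss becomes the expectile loss $|q - I(u \le 0)|\,u^2$, while small values of `c` make it behave more like the check loss.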




Likelihood Inference for MQ

Estimation of model parameters (focus on MQ)

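$$L(\Phi_q) = \prod_{i=1}^{n} \left\{ \sum_{k=1}^{G} \prod_{j} f_q(y_{ij} \mid x_{ij}, b_{k,q})\, \pi_{k,q} \right\}$$

$\Phi_q = \{\beta_q, b_{1,q}, \dots, b_{G,q}, \sigma_q, \pi_{1,q}, \dots, \pi_{G,q}\}$

$f_q(\cdot)$ is the ALID (Asymmetric Least Informative Density, Bianchi et al., 2015):

$$f_q(\cdot) = \frac{1}{B_q(\sigma_q, c)} \exp\{-\rho_q(\cdot)\}$$

$B_q(\sigma_q, c)$ is a normalising constant that ensures the density integrates to one

$\rho_q(\cdot)$ is the Huber loss function.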
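To make the role of the normalising constant concrete, the sketch below evaluates an ALID-type log density. It assumes the parameterisation $\rho_q(u) = 2\,|q - I(u \le 0)|\,\rho_{\text{Huber}}(u)$ with $u = (y - \mu)/\sigma$, and obtains $B_q(\sigma, c)$ by integrating $\exp\{-\rho_q(u)\}$ piecewise (quadratic centre, linear tails). That closed form is derived here for illustration and should be checked against Bianchi et al. before being relied upon; all function names are hypothetical.

```python
import math
import numpy as np

def rho_q(u, q, c):
    """Asymmetric Huber loss (assumed parameterisation)."""
    u = np.asarray(u, dtype=float)
    a = np.abs(u)
    huber = np.where(a <= c, 0.5 * u**2, c * a - 0.5 * c**2)
    return 2.0 * np.abs(q - (u <= 0)) * huber

def alid_norm_const(q, sigma, c):
    """B_q(sigma, c): integral over y of exp{-rho_q((y - mu) / sigma)}."""
    def half(w):
        # One side of zero, where the loss equals w * Huber(u):
        # Gaussian-type centre on [0, c] plus exponential tail on (c, inf)
        centre = math.sqrt(math.pi / (2 * w)) * math.erf(c * math.sqrt(w / 2))
        tail = math.exp(-w * c**2 / 2) / (w * c)
        return centre + tail
    return sigma * (half(2 * q) + half(2 * (1 - q)))

def alid_logpdf(y, mu, sigma, q, c=1.345):
    """Log density log f_q(y) = -rho_q(u) - log B_q(sigma, c)."""
    u = (np.asarray(y, dtype=float) - mu) / sigma
    return -rho_q(u, q, c) - math.log(alid_norm_const(q, sigma, c))
```

Two checks support the derivation: for $q = 0.5$ and large `c` the constant tends to $\sigma\sqrt{2\pi}$ (the Gaussian case), and numerical integration of `exp(alid_logpdf(...))` over a wide grid returns a value close to one.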





Missing data approach

$$z_{ik,q} = \begin{cases} 1 & \text{if unit } i \text{ is in component } k \text{ of the mixture} \\ 0 & \text{otherwise} \end{cases}$$

$P(z_{ik,q} = 1) = \pi_{k,q} = P(b_{i,q} = b_{k,q})$

$z_{i,q} = (z_{i1,q}, \dots, z_{iG,q})'$, $i = 1, \dots, n$, are considered as missing data

Complete data log-likelihood

Had we observed $(y_i, z_{i,q})$ for each $i$, the log-likelihood for the complete data would have been:

$$\ell_c(\Phi_q) = \sum_{i=1}^{n} \sum_{k=1}^{G} z_{ik,q} \left\{ \log\left[ f_q(y_i \mid \beta_q, b_{k,q}, \sigma_q) \right] + \log(\pi_{k,q}) \right\}$$








Maximum Likelihood via the EM algorithm – E-step

Expected value of $\ell_c(\Phi_q)$ over $z_{i,q}$, conditional on the observed data and the current parameter estimates:

$$Q(\Phi_q \mid \Phi_q^{(t)}) = E_{\Phi_q^{(t)}}\left[ \ell_c(\Phi_q) \mid y_i \right] = \sum_{i=1}^{n} \sum_{k=1}^{G} \tau_{ik,q}^{(t+1)} \left\{ \log\left[ f_q(y_i \mid \beta_q, b_{k,q}, \sigma_q) \right] + \log(\pi_{k,q}) \right\}.$$

That is, the unobservable indicators are replaced by their conditional expectations, which at iteration $(t+1)$ are given by

$$\tau_{ik,q}^{(t+1)} = \frac{\pi_{k,q}^{(t)} f_{ik,q}(\Phi_q^{(t)})}{\sum_{l} \pi_{l,q}^{(t)} f_{il,q}(\Phi_q^{(t)})}, \qquad i = 1, \dots, n, \quad k = 1, \dots, G.$$










Maximum Likelihood via the EM algorithm – M-step

Maximise the function $Q(\cdot)$ w.r.t. $\Phi_q$ to update the parameter estimates. Then $\Phi_q^{(t+1)}$ is defined as the solution to the following score equations:

$$\frac{\partial Q(\Phi_q \mid \Phi_q^{(t)})}{\partial \Phi_q} = 0,$$

which are equivalent to the score equations for the observed data, $S(\Phi_q) = 0$.
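For the regression coefficients, score equations of this kind are weighted M-quantile estimating equations, which are commonly solved by iteratively reweighted least squares (IRLS). The sketch below is one such solver under simplifying assumptions: the scale `sigma` is held fixed rather than updated, the influence function is the asymmetric Huber $\psi_q$, and `prior_w` stands in for the posterior weights $\tau_{ik,q}$ attached to each stacked observation. The name `mq_wls_step` and its defaults are hypothetical.

```python
import numpy as np

def mq_wls_step(y, X, q, c=1.345, sigma=1.0, prior_w=None, n_iter=200, tol=1e-10):
    """IRLS solution of sum_m prior_w[m] * psi_q((y_m - x_m' beta) / sigma) * x_m = 0."""
    prior_w = np.ones(len(y)) if prior_w is None else np.asarray(prior_w, dtype=float)
    sw0 = np.sqrt(prior_w)
    beta = np.linalg.lstsq(X * sw0[:, None], y * sw0, rcond=None)[0]
    for _ in range(n_iter):
        u = (y - X @ beta) / sigma
        # psi_q(u) / u: asymmetric weight times the Huber down-weighting factor
        w = 2.0 * np.abs(q - (u <= 0)) * np.minimum(1.0, c / np.maximum(np.abs(u), 1e-12))
        sw = np.sqrt(prior_w * w)
        new = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
        if np.max(np.abs(new - beta)) < tol:
            return new
        beta = new
    return beta
```

With $q = 0.5$ and a very large `c` every weight equals one, so the routine returns the ordinary least squares fit; increasing $q$ moves the fitted intercept upwards.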





Standard Errors

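Oakes' (1999) identity

$$I(\hat\Phi_q) = -\left\{ \underbrace{\left. \frac{\partial^2 Q(\Phi_q \mid \hat\Phi_q)}{\partial \Phi_q\, \partial \Phi_q'} \right|_{\Phi_q = \hat\Phi_q}}_{A} + \underbrace{\left. \frac{\partial^2 Q(\Phi_q \mid \tilde\Phi_q)}{\partial \Phi_q\, \partial \tilde\Phi_q'} \right|_{\Phi_q = \tilde\Phi_q = \hat\Phi_q}}_{B} \right\}$$

A: conditional expectation of the complete-data Hessian given the observed data (available from the EM algorithm)

B: first derivative of the conditional expectation of the complete-data score given the observed data (computed numerically, e.g. with numDeriv in R)

Sandwich estimator:

$$\mathrm{Cov}(\hat\Phi_q) = I(\hat\Phi_q)^{-1}\, V(\hat\Phi_q)\, I(\hat\Phi_q)^{-1}, \qquad \text{where } V(\hat\Phi_q) = \sum_{i=1}^{n} S_i(\hat\Phi_q)\, S_i(\hat\Phi_q)'.$$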
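Once the information matrix and the unit-level score contributions are available, the sandwich formula is a few lines of linear algebra. A minimal sketch, assuming both ingredients have already been computed at the estimate; the function name and argument layout are illustrative.

```python
import numpy as np

def sandwich_cov(info, scores):
    """Sandwich covariance I^{-1} V I^{-1} with V = sum_i S_i S_i'.

    info   : (d, d) observed information matrix evaluated at the estimate
    scores : (n, d) array whose i-th row is the score contribution S_i
    """
    V = scores.T @ scores              # sum of outer products S_i S_i'
    info_inv = np.linalg.inv(info)
    return info_inv @ V @ info_inv
```

Standard errors are then the square roots of the diagonal, `np.sqrt(np.diag(sandwich_cov(info, scores)))`.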









Classical Datasets

Applications

Univariate response (Alfo, Salvati, Ranalli, Stat. Comp., 2016)

Pain Labor Data

Treatment of lead-exposed children

Multivariate response (joint work with M.F. Marino & N. Tzavidis)

The Millennium Cohort Study






Longitudinal study on children's emotional/behavioural problems measured via the Strengths and Difficulties Questionnaire (SDQ)

n = 9021 children born in the UK between Sept. 2000 and Sept. 2001

First information collected when children were around 9 months old. Waves 2, 3, 4 took place around ages 2, 5, and 7





Outcome variables

internalising SDQ, i-SDQ (emotional problems): total score on 5 emotional symptom items + 5 peer problem items (range 0 to 20)

externalising SDQ, e-SDQ (behavioural problems): total score on 5 conduct problem items + 5 hyperactivity items (range 0 to 20)

[Figure: observed distributions of the i-SDQ and e-SDQ scores, each on the 0 to 20 scale]





Multivariate Extension

$y_{ijh}$, $h = 1, 2$: observed outcomes

The joint conditional distribution for unit $i$ is

$$f_q(y_i \mid \beta_q, b_{i,q}, \sigma_q) = \prod_{h=1}^{H} \prod_{j} f_q(y_{ijh} \mid \beta_{h,q}, b_{ih,q}, \sigma_{h,q}).$$

Conditional independence assumption





Covariates

ALE11: number of potentially Adverse Life Events (0 to 11)

SED4: family poverty score measured on the SED scale (0 to 4)

KESSM: maternal depression score measured on the Kessler scale (0 to 24)

IMD: neighbourhood deprivation rank measured by the Index of Multiple Deprivation, with lower values corresponding to higher deprivation (1 to 10)

Age: child's age

Maternal education: no qualification (baseline), degree, GCSE

Ethnicity: non-white (baseline), white

Gender: female (baseline), male

Stratification: advantaged (baseline), ethnic, disadvantaged





Modeling details

Focus on more severe emotional and behavioural problems, i.e. $q \in \{0.50, 0.75, 0.90\}$

Discrete random intercepts to account for dependence

Age is centred around the mean and a squared effect is also considered

ALE11, SED4, KESSM, and IMD are centred around their individual means to account for between/within individual effects

BIC is used to select the optimal model ($G = 1, \dots, 15$)
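The between/within decomposition mentioned in the list above splits each time-varying covariate into an individual mean (the between part, labelled "mean" in the result tables) and the deviation from that mean (the within part). A small sketch, assuming individuals are coded as integers 0, ..., n-1 with at least one record each; the function name is hypothetical.

```python
import numpy as np

def within_between(x, cluster):
    """Split x into its individual mean and the deviation from it.

    x       : (N,) covariate values, one per measurement occasion
    cluster : (N,) integer individual index in {0, ..., n-1}
    Returns (x_mean, x_dev), both of length N, with x = x_mean + x_dev.
    """
    x = np.asarray(x, dtype=float)
    counts = np.bincount(cluster)
    means = np.bincount(cluster, weights=x) / counts
    x_mean = means[cluster]
    return x_mean, x - x_mean
```

Both columns then enter the linear predictor, so the coefficient on `x_mean` captures differences between individuals and the coefficient on `x_dev` captures changes within an individual over time.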





Discrete distributions of random effects

[Figure: estimated cdf of the discrete random intercepts against their locations, one panel for i-SDQ and one for e-SDQ, with curves for q = 0.50, q = 0.75 and q = 0.90]

Higher dispersion for e-SDQ intercepts

The probability of higher components increases with q

The random intercept distribution is quite far from symmetry and unimodality





Model for the M-median

                  i-SDQ            e-SDQ
                  Est     se       Est     se
Age              -0.02   0.04    -0.45   0.05
Age²              0.07   0.01     0.21   0.02
ALE11 mean        0.09   0.23     0.19   0.04
ALE11             0.06   0.02     0.09   0.06
SED4 mean         0.12   0.05     0.17   0.14
SED4             -0.04   0.06    -0.01   0.07
Kessm mean        0.17   0.08     0.23   0.09
Kessm             0.08   0.01     0.11   0.02
Degree           -0.66   0.74    -1.17   0.44
Gcse             -0.41   0.34    -0.50   0.27
White            -0.31   0.11     0.17   0.16
Male              0.05   0.12     0.75   0.16
IMD mean         -0.02   0.04    -0.04   0.04
IMD              -0.00   0.03    -0.03   0.04
Ethnic st.        0.18   0.10    -0.05   0.22
Disadv st.        0.07   0.39     0.11   0.32
σu                1.72            2.52

Both i-SDQ and e-SDQ decrease over time until children are 5 years old and start to increase afterwards

Adverse life events (ALE11) and maternal depression (KESSM) are positively associated with both responses

Family poverty (SED4) seems to affect i-SDQ only

White children have lower i-SDQ

Males have higher e-SDQ





Model for M-q = 0.75


                  i-SDQ            e-SDQ
                  Est     se       Est     se
Age              -0.01   0.01    -0.47   0.01
Age²              0.08   0.01     0.24   0.01
ALE11 mean        0.19   0.04     0.32   0.06
ALE11             0.08   0.02     0.10   0.03
SED4 mean         0.12   0.05     0.23   0.07
SED4             -0.03   0.04     0.00   0.05
Kessm mean        0.24   0.01     0.26   0.02
Kessm             0.10   0.01     0.13   0.01
Degree           -0.78   0.12    -1.40   0.18
Gcse             -0.48   0.11    -0.60   0.15
White            -0.34   0.12     0.42   0.22
Male              0.17   0.05     0.97   0.10
IMD mean         -0.05   0.02    -0.05   0.02
IMD              -0.01   0.02    -0.03   0.03
Ethnic st.        0.22   0.13    -0.05   0.25
Disadv st.        0.06   0.07     0.18   0.12
σu                1.73            2.54

ALE11, SED4, and Kessm positively affect both responses, and their impact is higher than at q = 0.50

Males have more severe internalising and externalising problems than females

Children living in less deprived areas (higher IMD) have lower i-SDQ and e-SDQ





Model for M-q = 0.90


                  i-SDQ            e-SDQ
                  Est     se       Est     se
Age               0.04   0.01    -0.46   0.02
Age²              0.09   0.01     0.25   0.01
ALE11 mean        0.37   0.06     0.51   0.08
ALE11             0.10   0.03     0.10   0.04
SED4 mean         0.21   0.08     0.34   0.10
SED4             -0.05   0.06     0.01   0.07
Kessm mean        0.35   0.02     0.36   0.03
Kessm             0.13   0.02     0.16   0.02
Degree           -1.05   0.14    -1.65   0.21
Gcse             -0.63   0.13    -0.75   0.19
White            -0.42   0.13     0.37   0.24
Male              0.35   0.09     1.25   0.14
IMD mean         -0.09   0.02    -0.07   0.03
IMD              -0.00   0.04    -0.03   0.04
Ethnic st.        0.14   0.16    -0.18   0.25
Disadv st.       -0.02   0.11     0.25   0.18
σu                1.70            2.40

The effect of ALE11, SED4, maternal depression (KESSM), and neighbourhood deprivation (IMD) becomes much stronger for high SDQ scores

Severe problems are less likely with higher maternal education levels

The effect of race and gender becomes more evident for higher percentiles




Conclusions


We have developed Q and MQ regression models that can deal with dependent observations: the dependence between observations from the same individual is modelled via individual-specific discrete random parameters

By setting the tuning constant c to a suitably large value, we get Finite Mixtures of Expectile regression models

A nonparametric distribution for the random effects is more in the spirit of Q and MQ models

It is possible to carry out ML inference and obtain analytical SEs

The approach can be extended to handle multivariate outcomes





Future developments

Consider time-varying random parameters to model sources of unobserved heterogeneity that evolve over time, e.g. via Latent Markov Models (Farcomeni, 2012)

Extension to zero-inflated data

Extension to count data

Application in the small area estimation setting (focus is on prediction, rather than estimation)


