
Summer School on Longitudinal and Life Course Studies

A (short) introduction to Multilevel (and Longitudinal) Modelling – 1

August 2014 Francesco C. Billari

Lecture topics
•  Multilevel and longitudinal data structures
•  Smoking and birthweight data
•  The variance-components model
•  Linear random intercept model
•  Random-coefficients model (introduction)

•  Key reference: Sophia Rabe-Hesketh and Anders Skrondal, Multilevel and Longitudinal Modeling Using Stata, Stata Press. Second Edition (2008) or Third Edition (2012).

Hierarchical data structures

•  Level 1: Individual 1, class 1, school 1; Individual 2, class 1, school 1; Individual 3, class 1, school 1; …
•  Level 2: Class 1, school 1; …
•  Level 3: School 1; …

Longitudinal (discrete-time) data structures

•  Level 1: Time 1, individual 1, region 1; Time 2, individual 1, region 1; Time 3, individual 1, region 1; …
•  Level 2: Individual 1, region 1; …
•  Level 3: Region 1; …

Data: countries/regions

Fieldhouse, E., Tranmer, M., & Russell, A. (2007). "Something about young people or something about elections? Electoral participation of young people in Europe: Evidence from a multilevel analysis of the European Social Survey." European Journal of Political Research, 46(6), 797-822.

Data: neighborhood

Cerdá, M., S. L. Buka, et al. (2008). "Neighborhood influences on the association between maternal age and birthweight: A multilevel investigation of age-related disparities in health." Social Science & Medicine 66(9): 2048-2060.

Data: schools

Goldstein, H. and D. J. Spiegelhalter (1996). "League Tables and Their Limitations: Statistical Issues in Comparisons of Institutional Performance." Journal of the Royal Statistical Society. Series A (Statistics in Society) 159(3): 385-443.

Data: panel surveys, repeated measures

Yang, M., H. Goldstein, et al. (2000). "Multilevel Models for Repeated Binary Outcomes: Attitudes and Voting over the Electoral Cycle." Journal of the Royal Statistical Society. Series A (Statistics in Society) 163(1): 49-62.

Data: growth curves

Steele, F. (2008). "Multilevel models for longitudinal data." Journal of the Royal Statistical Society: Series A (Statistics in Society) 171(1): 5-19.

Data: surveys with multiple-stage sampling

McNay, K., P. Arokiasamy, et al. (2003). "Why Are Uneducated Women in India Using Contraception? A Multilevel Analysis." Population Studies 57(1): 21-40.

Vocabulary

•  Populations
1.  Hierarchical
2.  Nested
3.  Cross-classified
4.  Multilevel

1. and 2. are interchangeable; 4. usually incorporates 1., 2., 3.

•  Models
–  Multilevel
–  Hierarchical (linear – HLM)
–  Mixed
–  Random coefficients, intercept, effects
–  Variance components
–  Subject/unit specific

Smoking and birthweight data

•  Does smoking during pregnancy affect infant birthweight?

•  Here level 1 is the child, level 2 is the mother
–  Level 1: Child 1, Mother 1; Child 2, Mother 1
–  Level 2: Mother 1

•  Notation: $x_{ij}$ is the value for child $i = 1, \dots, n_j$ of mother $j = 1, \dots, J$

How much variance at each level?

•  When we have two levels we can define the overall variance, computing deviations from the overall mean across the whole dataset

$$s_{xO}^2 = \frac{1}{N-1}\sum_{j=1}^{J}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_{..}\right)^2, \qquad \bar{x}_{..} = \frac{1}{N}\sum_{j=1}^{J}\sum_{i=1}^{n_j} x_{ij}$$


•  The between variance (level 2) is

$$s_{xB}^2 = \frac{1}{J-1}\sum_{j=1}^{J}\left(\bar{x}_{.j}-\bar{x}_{..}\right)^2, \qquad \bar{x}_{.j} = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ij}$$


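•  The within variance (level 1) is

$$s_{xW}^2 = \frac{1}{N-1}\sum_{j=1}^{J}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_{.j}\right)^2$$

•  With these definitions the two components add up to the overall variance approximately (not exactly, because $s_{xB}^2$ gives every cluster the same weight and divides by $J-1$)

$$s_{xO}^2 \approx s_{xB}^2 + s_{xW}^2$$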
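For reference, the exact sum-of-squares identity behind this decomposition can be written out as follows. It is a standard ANOVA result stated in the notation of these slides, with $N = \sum_j n_j$ the total number of observations; it is added as a supporting derivation and is not part of the original deck.

```latex
\begin{align*}
\sum_{j=1}^{J}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_{..}\right)^2
 &= \sum_{j=1}^{J}\sum_{i=1}^{n_j}\left[(x_{ij}-\bar{x}_{.j}) + (\bar{x}_{.j}-\bar{x}_{..})\right]^2 \\
 &= \sum_{j=1}^{J}\sum_{i=1}^{n_j}(x_{ij}-\bar{x}_{.j})^2
  + \sum_{j=1}^{J} n_j\,(\bar{x}_{.j}-\bar{x}_{..})^2
\end{align*}
```

The cross term vanishes because $\sum_{i}(x_{ij}-\bar{x}_{.j}) = 0$ within every cluster. Dividing by $N-1$ gives $s_{xO}^2 = s_{xW}^2 + \frac{1}{N-1}\sum_j n_j(\bar{x}_{.j}-\bar{x}_{..})^2$; the last term is close to, but in general not identical to, $s_{xB}^2$, so the between and within sample variances add up to the overall variance only approximately.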


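STATA CODE

Data at http://www.stata-press.com/data/mlmus3.html

    * Load the smoking and birthweight data
    use smoking
    * Overall, between and within summary statistics, clustering by mother
    xtsum birwt smoke black, i(momid)
    * Variance-components model estimated by maximum likelihood
    xtreg birwt, i(momid) mle

NOTE THE i/j inversion! In Stata's xt commands the i() option identifies the cluster (here the mother), which these slides index by j.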



The variance-components model

•  $y_{ij}$ is the measurement of subject $j$ on occasion $i$

•  A regression model without covariates is

$$y_{ij} = \beta + \xi_{ij}$$


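•  A more appropriate two-level regression model, which decomposes the error term into occasion-specific and subject-specific components, is

$$y_{ij} = \beta + \zeta_j + \varepsilon_{ij}$$

•  $\zeta_j$ is the random deviation of subject $j$'s mean measurement from the overall mean $\beta$

•  → random effect or random intercept

$$E(\zeta_j) = 0, \quad V(\zeta_j) = \psi, \qquad E(\varepsilon_{ij}) = 0, \quad V(\varepsilon_{ij}) = \theta$$

[Figure: the overall mean $\beta$, the subject-specific deviation $\zeta_j$, and the occasion-specific deviations $\varepsilon_{1j}$ and $\varepsilon_{2j}$ for a subject $j$.]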


•  Usually a normality (and across-level independence) assumption is made

$$\begin{pmatrix}\zeta_j \\ \varepsilon_{ij}\end{pmatrix} \sim N\left(\begin{pmatrix}0 \\ 0\end{pmatrix}, \begin{pmatrix}\psi & 0 \\ 0 & \theta\end{pmatrix}\right)$$

•  The total variance is the sum of the two variance components

$$V(y_{ij}) = V(\zeta_j + \varepsilon_{ij}) = \psi + \theta$$

•  The between-subjects share of variance is

$$\rho = \frac{V(\zeta_j)}{V(y_{ij})} = \frac{\psi}{\psi + \theta}$$


•  Within a subject, conditionally on the subject-specific random effect, observations are independent

$$\operatorname{corr}(y_{ij}, y_{i'j} \mid \zeta_j) = 0, \qquad i \neq i'$$


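•  Let us get the unconditional correlation, starting from the covariance

$$\operatorname{cov}(y_{ij}, y_{i'j}) = E\left[\left(y_{ij} - E(y_{ij})\right)\left(y_{i'j} - E(y_{i'j})\right)\right] = E\left[(\zeta_j + \varepsilon_{ij})(\zeta_j + \varepsilon_{i'j})\right] = E(\zeta_j^2) = \psi$$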
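The middle step of this covariance can be spelled out. Since $E(y_{ij}) = \beta$, each deviation from the mean equals $\zeta_j + \varepsilon_{ij}$. Expanding the product gives four terms, three of which vanish under the assumptions already stated in these slides: zero means, no correlation between $\zeta_j$ and the level-1 errors, and no correlation between level-1 errors at different occasions.

```latex
\begin{align*}
E\left[(\zeta_j+\varepsilon_{ij})(\zeta_j+\varepsilon_{i'j})\right]
 &= E(\zeta_j^2) + E(\zeta_j\,\varepsilon_{i'j})
  + E(\varepsilon_{ij}\,\zeta_j) + E(\varepsilon_{ij}\,\varepsilon_{i'j}) \\
 &= \psi + 0 + 0 + 0 = \psi, \qquad i \neq i'
\end{align*}
```

For $i = i'$ the last term would instead equal $\theta$, which recovers the total variance $\psi + \theta$.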


•  Let us get the unconditional correlation

$$\operatorname{corr}(y_{ij}, y_{i'j}) = \frac{\psi}{\sqrt{V(y_{ij})}\,\sqrt{V(y_{i'j})}} = \frac{\psi}{\sqrt{\psi+\theta}\,\sqrt{\psi+\theta}} = \frac{\psi}{\psi+\theta}$$


•  This is the intraclass correlation coefficient

$$\frac{\psi}{\psi+\theta} = \rho$$

•  Estimated through…

$$\hat{\rho} = \frac{\hat{\psi}}{\hat{\psi} + \hat{\theta}}$$


•  If the intraclass correlation is low (i.e. not significantly different from zero) there is no need for a more complex variance-components model → the need for a multilevel model is testable
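A hedged Stata sketch of how this check could be run on the smoking data. It assumes that `smoking.dta`, from the data page cited in these slides, is in the working directory. `xtreg, mle` prints a likelihood-ratio test of sigma_u = 0 at the foot of its output, and its stored results `e(sigma_u)` and `e(sigma_e)` are standard deviations, so the variance components are their squares. The scalar names `psi_hat` and `theta_hat` are chosen here for illustration.

```stata
* Variance-components model: children (level 1) within mothers (level 2)
use smoking, clear
xtreg birwt, i(momid) mle

* Estimated variance components: psi-hat = sigma_u^2, theta-hat = sigma_e^2
scalar psi_hat   = e(sigma_u)^2
scalar theta_hat = e(sigma_e)^2

* Estimated intraclass correlation: psi-hat / (psi-hat + theta-hat)
display "rho-hat = " psi_hat / (psi_hat + theta_hat)
```

Because the null hypothesis puts the variance on the boundary of the parameter space, Stata labels the test statistic `chibar2` rather than `chi2`.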


Random-intercept models with covariates

•  How to extend a linear regression model to a multilevel setting?

•  How to show the relative importance of level-1 and level-2 covariates?

•  → insert xs on the right side of the equation


•  $y_{ij}$ is the measurement of subject $j$ on occasion $i$

•  A regression model with covariates is

$$y_{ij} = \beta_1 + \beta_2 x_{2ij} + \beta_3 x_{3ij} + \dots + \beta_p x_{pij} + \xi_{ij}$$


•  If the variance can be decomposed into level-1 and level-2 components then:

$$y_{ij} = \beta_1 + \beta_2 x_{2ij} + \beta_3 x_{3ij} + \dots + \beta_p x_{pij} + \zeta_j + \varepsilon_{ij}$$

•  With the same assumptions on the random components as we had for the variance-components model without covariates


•  Another way is to show the subject-specific random intercept explicitly

$$y_{ij} = (\beta_1 + \zeta_j) + \beta_2 x_{2ij} + \beta_3 x_{3ij} + \dots + \beta_p x_{pij} + \varepsilon_{ij}$$

•  With the same assumptions on the random components as we had for the variance-components model without covariates


•  Exogeneity assumptions

$$E(\zeta_j \mid x_{ij}) = 0, \qquad E(\varepsilon_{ij} \mid \zeta_j, x_{ij}) = 0$$

•  So that

$$E(y_{ij} \mid x_{ij}) = \beta_1 + \beta_2 x_{2ij} + \beta_3 x_{3ij} + \dots + \beta_p x_{pij}$$


•  And

$$E(y_{ij} \mid x_{ij}, \zeta_j) = \beta_1 + \beta_2 x_{2ij} + \beta_3 x_{3ij} + \dots + \beta_p x_{pij} + \zeta_j$$

•  On distributions → normality assumption, absence of correlation at the same level and across levels (we keep the same notation)
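The population-averaged mean $E(y_{ij} \mid x_{ij})$ and the subject-specific mean $E(y_{ij} \mid x_{ij}, \zeta_j)$ are linked by the law of iterated expectations. The short derivation below is a supporting step added here, using only the two exogeneity assumptions of the random-intercept model.

```latex
\begin{align*}
E(y_{ij}\mid x_{ij})
 &= E\left[\,E(y_{ij}\mid x_{ij},\zeta_j)\,\middle|\,x_{ij}\right] \\
 &= \beta_1+\beta_2x_{2ij}+\dots+\beta_px_{pij} + E(\zeta_j\mid x_{ij}) \\
 &= \beta_1+\beta_2x_{2ij}+\dots+\beta_px_{pij}
\end{align*}
```

The last line uses $E(\zeta_j \mid x_{ij}) = 0$. If the random intercept were correlated with a covariate, this term would not vanish and the coefficients would no longer describe the conditional mean.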



•  Variances and covariances here are conditional

$$V(y_{ij} \mid x_{ij}) = \psi + \theta$$

$$\rho = \operatorname{corr}(y_{ij}, y_{i'j} \mid x_{ij}, x_{i'j}) = \frac{\psi}{\psi + \theta}$$


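    * Random-intercept model with covariates, maximum likelihood
    xtreg birwt smoke male hsgrad married black, i(momid) mle
    * The same model with xtmixed (random intercept for each mother)
    xtmixed birwt smoke male hsgrad married black || momid:, mle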





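•  What is the proportion of variance explained (R-squared)? There are two, one per level (subscript 1 denotes the model with covariates and subscript 0 the model without covariates):

$$R_2^2 = \frac{\hat{\psi}_0 - \hat{\psi}_1}{\hat{\psi}_0}, \qquad R_1^2 = \frac{\hat{\theta}_0 - \hat{\theta}_1}{\hat{\theta}_0}$$

•  Here

$$R_2^2 = \frac{\hat{\psi}_0 - \hat{\psi}_1}{\hat{\psi}_0} = 0.142, \qquad R_1^2 = \frac{\hat{\theta}_0 - \hat{\theta}_1}{\hat{\theta}_0} = 0.025$$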
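A minimal Stata sketch of how the two R-squared measures could be computed from two `xtreg, mle` fits. It assumes `smoking.dta` is in the working directory and uses the covariate list from the command shown in these slides. The scalar names are illustrative, and the slides do not state which covariate set produced the reported values, so the output is not guaranteed to reproduce them.

```stata
use smoking, clear

* Model 0: variance-components model without covariates
quietly xtreg birwt, i(momid) mle
scalar psi0   = e(sigma_u)^2
scalar theta0 = e(sigma_e)^2

* Model 1: random-intercept model with covariates
quietly xtreg birwt smoke male hsgrad married black, i(momid) mle
scalar psi1   = e(sigma_u)^2
scalar theta1 = e(sigma_e)^2

* Proportional reduction in each variance component
display "Level-2 R-squared: " (psi0 - psi1) / psi0
display "Level-1 R-squared: " (theta0 - theta1) / theta0
```

`e(sigma_u)` and `e(sigma_e)` are standard deviations, which is why they are squared before the comparison.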


•  Covariates that vary only at level 2 affect only the level-2 variance (apart from computational differences)

•  Covariates that vary at level 1 might affect both variances (because part of the level-2 variance might be due to compositional effects related to level-1 values…)

Between and within effects

•  The random-intercept estimate of the effect of smoking neither purely compares different mothers (between) nor purely compares children of the same mother when her smoking status changes (within)… Indeed, random-intercept model estimates are weighted averages of the between and within estimates

•  We might want to estimate the effect by comparing mothers who smoke with mothers who do not (between-mother effect) → equivalent to running a model using only the mother-level averages

•  We might want to estimate the effect when a mother switches status (within-mother effect) → run a model after subtracting the mother means

•  Covariates are centered around each mother's mean (random effect version)

•  The fixed effect alternative is to build J dummies (J can be very large)
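The between and within estimators described above can be obtained directly with the `be` and `fe` options of `xtreg`. The sketch below assumes `smoking.dta` is in the working directory and reuses the covariate list from the random-intercept command in these slides; it is an illustration, not a command taken from the deck.

```stata
use smoking, clear

* Between-mother estimator: regression on the mother means
xtreg birwt smoke male hsgrad married black, i(momid) be

* Within-mother (fixed-effects) estimator: deviations from mother means,
* equivalent to including one dummy per mother
xtreg birwt smoke male hsgrad married black, i(momid) fe
```

In the `fe` fit, any covariate that does not vary across the children of the same mother is dropped automatically, because it is collinear with the mother-specific intercepts.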



•  There may be endogeneity problems, e.g. when cluster-level residuals are correlated with a covariate: mothers who smoke during pregnancy might also be more prone to adopt other behaviors that negatively affect birth weight

•  This can be addressed by using the difference from the cluster mean of a variable instead of the variable itself

•  The difference from the cluster mean acts as an instrumental variable because it is correlated with the variable but, by construction, uncorrelated with the cluster mean and with any other cluster-level term, including the random intercept $\zeta_j$

•  If you are concerned with endogeneity this has to be done for all covariates…
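Why the deviation from the cluster mean cannot be correlated with anything that is constant within a cluster can be shown in one line. Here $c_j$ is notation introduced for this illustration; it stands for any cluster-level quantity, such as the cluster mean $\bar{x}_{.j}$ or the random intercept $\zeta_j$.

```latex
\begin{align*}
\sum_{j=1}^{J}\sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_{.j}\right)c_j
 = \sum_{j=1}^{J} c_j \sum_{i=1}^{n_j}\left(x_{ij}-\bar{x}_{.j}\right)
 = \sum_{j=1}^{J} c_j \left(n_j\bar{x}_{.j}-n_j\bar{x}_{.j}\right)
 = 0
\end{align*}
```

Since the deviations also sum to zero over the whole sample, their sample covariance with $c_j$ is exactly zero.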


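    * Mother-level mean of smoking (between component)
    egen mn_smok = mean(smoke), by(momid)
    * Deviation from the mother mean (within component)
    gen dev_smok = smoke - mn_smok
    * Random-intercept model with separate within and between effects
    xtreg birwt dev_smok mn_smok male hsgrad married black, i(momid) mle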
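Once the within and between components enter the model separately, one natural follow-up is to check whether the two coefficients differ. The sketch below is a suggestion rather than part of the original slides; it assumes `smoking.dta` is in the working directory and uses `lincom` to estimate the difference between the between-mother and within-mother effects of smoking.

```stata
use smoking, clear

* Split smoking into its mother mean and the deviation from that mean
egen mn_smok = mean(smoke), by(momid)
gen dev_smok = smoke - mn_smok

* Random-intercept model with separate within and between effects
quietly xtreg birwt dev_smok mn_smok male hsgrad married black, i(momid) mle

* Estimated difference between the between and within effects, with a test of zero
lincom mn_smok - dev_smok
```

A difference that is clearly non-zero would suggest that the single smoking coefficient of the standard random-intercept model mixes two distinct effects.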


Random-coefficient models

•  Now we add random coefficients, or random slopes, to the random intercept

•  The effect of covariates might therefore vary across level-2 units

•  Typical application: school effectiveness
–  In Britain the GCSE (General Certificate of Secondary Education) is a standardized exam taken at age 16
–  The LRT (London Reading Test) is a standardized test at age 11


•  How can we study the relationship between GCSE and LRT scores?

•  We could start on a school-by-school basis, with a linear regression

•  E.g. school 1


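    * Load the GCSE data and fit an OLS regression for school 1 only
    use gcse, clear
    reg gcse lrt if school==1
    * Fitted values, then a scatterplot with the fitted line
    predict p_gcse, xb
    twoway (scatter gcse lrt) (line p_gcse lrt, sort) if school==1, xtitle(LRT) ytitle(GCSE)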





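•  Now, all schools

    * Assumed definition, not shown on the slide: number of students per school
    egen num = count(gcse), by(school)
    * School-by-school OLS intercepts and slopes (schools with more than 4 students)
    statsby inter=_b[_cons] slope=_b[lrt], by(school) saving(ols_gcse): reg gcse lrt if num>4
    sort school
    * Old merge syntax; in Stata 11 or later: merge m:1 school using ols_gcse
    merge school using ols_gcse
    twoway scatter slope inter, xtitle(Intercept) ytitle(Slope)
    * Keep one observation per school when summarizing the estimates
    egen pickone = tag(school)
    sum inter slope if pickone==1
    corr inter slope if pickone==1, covariance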




•  Now, all schools: fitted regression lines

    * Fitted line for each school from its own OLS intercept and slope
    gen pred = inter + slope*lrt
    sort school lrt
    twoway (line pred lrt, connect(ascending)), xtitle(LRT) ytitle(Fitted regression lines)



•  We now create a true multilevel model with a random slope

$$y_{ij} = \beta_1 + \beta_2 x_{ij} + \zeta_{1j} + \zeta_{2j} x_{ij} + \varepsilon_{ij}$$

$$y_{ij} = (\beta_1 + \zeta_{1j}) + (\beta_2 + \zeta_{2j})\, x_{ij} + \varepsilon_{ij}$$
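A hedged sketch of how this random-coefficient model could be fitted in Stata with `xtmixed`, the command already used in these slides (renamed `mixed` from Stata 13 onwards). It assumes `gcse.dta` is in the working directory. The `covariance(unstructured)` option is a modelling choice made here so that the random intercept and the random slope are allowed to be correlated.

```stata
use gcse, clear

* Random intercept and random slope of lrt across schools;
* covariance(unstructured) estimates their covariance as well as both variances
xtmixed gcse lrt || school: lrt, covariance(unstructured) mle

* Estimated covariance matrix of the random intercept and slope
estat recovariance
```

Without `covariance(unstructured)` the default structure sets the covariance between intercept and slope to zero, which is rarely a sensible restriction for these data structures.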


•  We assume

$$E(\zeta_{1j} \mid x_{ij}) = 0, \qquad E(\zeta_{2j} \mid x_{ij}) = 0, \qquad E(\varepsilon_{ij} \mid x_{ij}, \zeta_{1j}, \zeta_{2j}) = 0$$


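•  We also assume a joint normal distribution for the random intercept and the random slope, with covariance matrix

$$\operatorname{Cov}\begin{pmatrix}\zeta_{1j} \\ \zeta_{2j}\end{pmatrix} = \begin{bmatrix}\psi_{11} & \psi_{12} \\ \psi_{12} & \psi_{22}\end{bmatrix}$$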
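One consequence of this covariance structure is that the total residual variance is no longer constant. It depends on the covariate. The derivation below is a supporting step added here. It assumes, as in the random-intercept model, that $\varepsilon_{ij}$ has variance $\theta$ and is uncorrelated with $\zeta_{1j}$ and $\zeta_{2j}$.

```latex
\begin{align*}
V\left(\zeta_{1j} + \zeta_{2j}x_{ij} + \varepsilon_{ij} \,\middle|\, x_{ij}\right)
 &= V(\zeta_{1j}) + 2x_{ij}\operatorname{Cov}(\zeta_{1j},\zeta_{2j})
  + x_{ij}^2\,V(\zeta_{2j}) + V(\varepsilon_{ij}) \\
 &= \psi_{11} + 2\psi_{12}\,x_{ij} + \psi_{22}\,x_{ij}^2 + \theta
\end{align*}
```

With $\psi_{22} = 0$ and $\psi_{12} = 0$ this reduces to the constant variance $\psi_{11} + \theta$ of the random-intercept model.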


