Page 1: SESSION 3 · 2015-12-24 · rifodvv0vshfl˜fsuredelolwlhv˙ Ev ’ tmf ’ |˝1 Wkhzhljkw˙ Ef ’ |˝ lvwkhsuredelolw| wkdwshuvrq lqjurxsˆ ehorqjvwrodwhqwfodvv|1 Dvfdqehvhhqiurpwkhvhfrqgolqh

All rights reserved. No part of this material may be reproduced or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission from Statistical Innovations Inc.

Introduction to Latent Class Modeling using Latent GOLD

Copyright © 2012 by Statistical Innovations Inc.





Excerpts from:

Multilevel Latent Class Models

Jeroen K. Vermunt

Department of Methodology and Statistics, Tilburg University

Address of correspondence:

Jeroen K. Vermunt
Department of Methodology and Statistics
Tilburg University
P.O.Box 90153
5000 LE Tilburg
The Netherlands

E-mail: J.K.Vermunt@uvt.nl

Accepted by Sociological Methodology (November 4, 2002)


Latent class (LC) models developed so far assume that observations are independent. Parametric and nonparametric random-coefficient LC models are proposed that make it possible to relax this assumption. The models can, for example, be used for the analysis of data collected with complex sampling designs, data with a multilevel structure, and multiple-group data for more than a few groups. An adapted EM algorithm is presented that makes maximum likelihood estimation feasible. The new model is illustrated with examples from organizational, educational, and cross-national comparative research.

Assigned Reading: Latent Class: C1


1 Introduction

In the past decade, latent class (LC) analysis (Lazarsfeld, 1950; Goodman, 1974) has become a more widely used technique in social sciences research. One of its applications is clustering or constructing typologies with observed categorical variables. Another related application is dealing with measurement error in nominal and ordinal indicators. An important limitation of the LC models developed so far is, however, that they assume that observations are independent, an assumption that is often violated. This paper presents a random-coefficients or multilevel LC model that makes it possible to relax this assumption.

Random-coefficients models can be used to deal with various types of dependent observations (Agresti et al., 2000). An example is dependent observations in data sets collected by two-stage cluster sampling or longitudinal designs. A more theory based use of random-coefficients models that is popular in fields such as educational and organizational research is often referred to as multilevel or hierarchical modeling, a method that is intended to disentangle group-level from individual-level effects (Bryk and Raudenbush, 1992; Goldstein, 1995; Snijders and Bosker, 1999). Another interesting application of these methods is in the context of multiple-group analysis, such as in cross-national comparative research based on data from a large number of countries (see, for example, Wong and Mason, 1985).

In a standard LC model, it is assumed that the model parameters are the same for all persons (level-1 units). The basic idea of a multilevel LC model is that some of the model parameters are allowed to differ across groups, clusters, or level-2 units. For example, the probability of belonging to a certain latent class may differ across organizations or countries. Such differences can be modelled by including group dummies in the model, as is done in multiple-group LC analysis (Clogg and Goodman, 1984), which amounts to using what is called a fixed-effects approach. Alternatively, in a random-effects approach, the group-specific coefficients are assumed to come from a particular distribution, whose parameters should be estimated. Depending on whether the form of the mixing distribution is specified or not, either a parametric or a nonparametric random-effects approach is obtained.

The proposed multilevel LC model is similar to a random-coefficients logistic regression model (Wong and Mason, 1985; Hedeker and Gibbons, 1996; Hedeker, 1999; Agresti et al., 2000). A difference is that the dependent variable is not directly observed, but a latent variable with several observed indicators. The model can, therefore, be seen as an extension of a random-coefficients logistic regression model in which there is measurement error in the dependent variable. It is well-known that LC models can be used to combine the information contained in multiple outcome variables (Bandeen-Roche et al., 1997).

The model is also similar to the multilevel item response theory (IRT) model that was recently proposed by Fox and Glas (2001). A conceptual difference is, however, that in IRT models the underlying latent variables are assumed to be continuous instead of discrete. Because of the similarity between LC and IRT models (see, for example, Heinen, 1996), it is not surprising that restricted multilevel LC models can be used to approximate multilevel IRT


The idea of introducing random effects in LC analysis is not new: see, for example, Qu, Tan, and Kutner (1996), and Lenk and DeSarbo (2000). The models proposed by these authors are, however, not multilevel models, and therefore conceptually and mathematically very different from the models described in this paper. Although there is no option for combining LC structures with random effects in the current version of the GLLAMM program of Rabe-Hesketh, Pickles, and Skrondal (2001), the model I propose fits very naturally into the general multilevel modeling framework developed by these authors.

The multilevel LC model can be represented as a graphical or path model containing one latent variable per random coefficient and one latent variable per level-1 unit within a level-2 unit. The fact that the model contains so many latent variables makes the use of a standard EM algorithm for maximum likelihood (ML) estimation impractical. The ML estimation problem can, however, be solved by making use of the conditional independence assumptions implied by the graphical model. More precisely, I adapted the E step of the EM algorithm to the structure of the multilevel LC model.

The next section describes the multilevel LC model. Then, attention is paid to estimation issues that are specific for this new model. Section 4 presents applications from organizational, educational, and cross-national comparative research. The paper ends with a short discussion.

2 The multilevel LC model

Let $Y_{ijk}$ denote the response of individual or level-1 unit $i$ within group or level-2 unit $j$ on indicator or item $k$. The number of level-2 units is denoted by $J$, the number of level-1 units within level-2 unit $j$ by $n_j$, and the number of items by $K$. A particular level of item $k$ is denoted by $s_k$ and its number of categories by $S_k$. The latent class variable is denoted by $X_{ij}$, a particular latent class by $t$, and the number of latent classes by $T$. Notation $\mathbf{Y}_{ij}$ is used to refer to the full vector of responses of case $i$ in group $j$, and $\mathbf{s}$ to refer to a possible answer


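The probability structure defining a simple LC model can be written down as follows:

$$
\begin{aligned}
P(\mathbf{Y}_{ij} = \mathbf{s}) &= \sum_{t=1}^{T} P(X_{ij} = t)\, P(\mathbf{Y}_{ij} = \mathbf{s} \mid X_{ij} = t) \\
&= \sum_{t=1}^{T} P(X_{ij} = t) \prod_{k=1}^{K} P(Y_{ijk} = s_k \mid X_{ij} = t).
\end{aligned}
\tag{1}
$$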

The probability of observing a particular response pattern, $P(\mathbf{Y}_{ij} = \mathbf{s})$, is a weighted average of class-specific probabilities $P(\mathbf{Y}_{ij} = \mathbf{s} \mid X_{ij} = t)$. The weight $P(X_{ij} = t)$ is the probability that person $i$ in group $j$ belongs to latent class $t$. As can be seen from the second line, the indicators $Y_{ijk}$ are assumed to be independent of each other given class membership, which is often referred to as the local independence assumption. The term $P(Y_{ijk} = s_k \mid X_{ij} = t)$ is the probability of observing response $s_k$ on item $k$ given that the person concerned belongs to latent class $t$. These conditional response probabilities are used to name the latent classes.
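The weighted-average structure of equation (1) can be illustrated with a minimal Python sketch. The class sizes and conditional response probabilities below are hypothetical values chosen for illustration only; they are not estimates from the paper, and the function name `pattern_probability` is likewise an assumption of this sketch.

```python
import itertools
import numpy as np

def pattern_probability(class_probs, item_probs, pattern):
    """P(Y = s) = sum_t P(X = t) * prod_k P(Y_k = s_k | X = t)."""
    total = 0.0
    for t, p_t in enumerate(class_probs):
        cond = 1.0
        for k, s_k in enumerate(pattern):
            # local independence: multiply item probabilities within class t
            cond *= item_probs[k][t][s_k]
        total += p_t * cond
    return total

# Hypothetical two-class model with three dichotomous items.
class_probs = [0.4, 0.6]                      # P(X = t)
item_probs = np.array([                       # shape (K, T, S_k)
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.9, 0.1], [0.4, 0.6]],
])

p_111 = pattern_probability(class_probs, item_probs, (1, 1, 1))

# The probabilities of all 2^3 response patterns should add up to one.
total = sum(pattern_probability(class_probs, item_probs, s)
            for s in itertools.product([0, 1], repeat=3))
```

Because each class contributes a product of item probabilities, the pattern probabilities sum to one whenever the class sizes and each item's conditional probabilities do.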

The general definition in equation (1) applies to both the standard and the multilevel LC model. In order to be able to distinguish the two, the model probabilities have to be written in the form of logit equations. In the standard LC model,

$$
P(X_{ij} = t) = \frac{\exp(\gamma_t)}{\sum_{r=1}^{T} \exp(\gamma_r)}, \tag{2}
$$

$$
P(Y_{ijk} = s_k \mid X_{ij} = t) = \frac{\exp(\beta^{k}_{s_k t})}{\sum_{r=1}^{S_k} \exp(\beta^{k}_{r t})}. \tag{3}
$$

As always, identifying constraints have to be imposed on the logit parameters, for example, $\gamma_1 = \beta^{k}_{1t} = 0$.

The fact that the $\gamma$ and $\beta$ parameters appearing in equations (2) and (3) do not have an index $j$ indicates that their values are assumed to be independent of the group to which one belongs. Taking into account the multilevel structure involves relaxing this assumption. The most general multilevel LC model is obtained by assuming that all model parameters are group specific; that is,

$$
P(X_{ij} = t) = \frac{\exp(\gamma_{tj})}{\sum_{r=1}^{T} \exp(\gamma_{rj})}, \tag{4}
$$

$$
P(Y_{ijk} = s_k \mid X_{ij} = t) = \frac{\exp(\beta^{k}_{s_k t j})}{\sum_{r=1}^{S_k} \exp(\beta^{k}_{r t j})}. \tag{5}
$$

Without further restrictions, this model is equivalent to an unrestricted multiple-group LC model (Clogg and Goodman, 1984). A more restricted multiple-group LC model is obtained by assuming that the item conditional probabilities do not depend on the level-2 unit; that is, by combining specifications (4) and (3). In practice, such a partially heterogeneous model assuming invariant measurement error is the most useful specification, although it is not a problem to relax this assumption for some of the indicators.

It will, however, be clear that such a multiple-group or fixed-effects approach may be problematic if there are more than a few groups because group-specific estimates have to be obtained for certain model parameters. Not only the number of parameters to be estimated increases rapidly with the number of level-2 units, the estimates may also be very unstable with group sizes that are typical in multilevel research. Another disadvantage of the fixed-effects approach is that all group differences are "explained" by the group dummies, making it impossible to determine the effects of level-2 covariates on the probability of belonging to a certain latent class. Below I show how to include such covariates in the model.

A parametric approach

The problems associated with the multiple-group approach can be tackled by adopting a random-effects approach: rather than estimating a separate set of parameters for each group, the group-specific effects are assumed to come from a certain distribution. Let us look at the simplest case: a two-class model with group-specific class-membership probabilities as defined by equation (4), and with $\gamma_{1j} = 0$ for identification. Typically, random coefficients are assumed to come from a normal distribution, yielding a LC model in which

$$
\gamma_{2j} = \gamma_2 + \tau_2 \cdot u_j, \tag{6}
$$

with $u_j \sim N(0, 1)$. Note that this amounts to assuming that the between-group variation in the log odds of belonging to the second instead of the first latent class follows a normal distribution with a mean equal to $\gamma_2$ and a standard deviation equal to $\tau_2$.

With more than two latent classes, one has to specify the distribution of the $T - 1$ random-


A nonparametric approach

A disadvantage of the presented random-effects approach is that it makes quite strong assumptions about the mixing distribution. An attractive alternative is, therefore, to work with a discrete unspecified mixing distribution. This yields a nonparametric random-coefficients LC model in which there are not only latent classes of level-1 units but also latent classes of level-2 units sharing the same parameter values. Such an approach does not only have the advantage of less strong distributional assumptions and less computational burden (Vermunt and Van Dijk, 2001), it may also fit better to the substantive research problem at hand. In many settings, it is more natural to classify groups (for example, countries) into a small number of types than to place them on a continuous scale.

It should be noted that with nonparametric I do not mean "distribution free". In fact, the normal distribution assumption is replaced by a multinomial distribution assumption. According to Laird (1978), a nonparametric characterization of the mixing distribution is obtained by increasing the number of mass points till a saturation point is reached. In practice, however, one will work with fewer latent classes than the maximum number that can be identified.

Let $W_j$ denote the value of group $j$ on the latent class variable defining the discrete mixing distribution. In a nonparametric approach, the model for the latent class probability equals

$$
P(X_{ij} = t \mid W_j = m) = \frac{\exp(\gamma_{tm})}{\sum_{r=1}^{T} \exp(\gamma_{rm})}, \tag{9}
$$

where $m$ denotes a particular mixture component. Besides the $M$ component-specific coefficients, we have to estimate the size of each component, denoted by $\pi_m$. Note that we can write $\gamma_{tm}$ as

$$
\gamma_{tm} = \gamma_t + u_{tm}, \tag{10}
$$

where the $u_{tm}$ come from an unspecified distribution with $M$ mass points.
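As a rough illustration of equation (9), the sketch below evaluates the class-membership probabilities within each mixture component. The log odds (-0.35 and 1.29) and component sizes (37 and 63 percent) are the Model 5 values quoted in the organizational example of this paper. The function name and the final averaging over components are assumptions of this sketch and not a computation reported by the author.

```python
import numpy as np

def class_probabilities(gamma):
    """Equation (9): row m holds P(X = t | W = m) for t = 1..T."""
    gamma = np.asarray(gamma, dtype=float)
    # subtract the row maximum before exponentiating for numerical stability
    expo = np.exp(gamma - gamma.max(axis=1, keepdims=True))
    return expo / expo.sum(axis=1, keepdims=True)

# Two mixture components (types of groups) and two latent classes,
# with the first class as reference (gamma_1m = 0).
gamma = np.array([[0.0, -0.35],
                  [0.0, 1.29]])
pi = np.array([0.37, 0.63])        # component sizes pi_m

probs = class_probabilities(gamma)  # P(X = t | W = m)
marginal = pi @ probs               # P(X = t), averaged over components
```

Under these values the second-class probabilities in the two components come out near .41 and .78, matching the figures quoted in the text.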

Assigned Reading: Latent Class: C2


A natural extension of the random-coefficient LC model involves including level-1 and level-2 covariates to predict class membership. Suppose that there is one level-2 covariate $Z_{1j}$ and one level-1 covariate $Z_{2ij}$. A multinomial logistic regression model for $X_{ij}$ with a random intercept is obtained by

$$
P(X_{ij} = t \mid Z_{1j}, Z_{2ij}) = \frac{\exp(\gamma_{0tj} + \gamma_{1t} Z_{1j} + \gamma_{2t} Z_{2ij})}{\sum_{r=1}^{T} \exp(\gamma_{0rj} + \gamma_{1r} Z_{1j} + \gamma_{2r} Z_{2ij})}.
$$

This model is an extension of the LC model with concomitant variables proposed by Dayton and McReady (1988); that is, a model containing not only fixed but also random effects.

Not only the intercept, but also the effects of the level-1 covariates may be assumed to be random coefficients. A model with a random slope is obtained by replacing $\gamma_{2t}$ with $\gamma_{2tj}$, and making certain distributional assumptions about $\gamma_{2tj}$. In fact, any multilevel model that can be specified for an observed nominal outcome variable can also be applied with the latent class variable, which is in fact an indirectly observed nominal outcome variable.

Also the nonparametric approach can easily be extended to include level-1 and level-2 covariates. An example is

$$
P(X_{ij} = t \mid Z_{1j}, Z_{2ij}, W_j = m) = \frac{\exp(\gamma_{0tm} + \gamma_{1t} Z_{1j} + \gamma_{2tm} Z_{2ij})}{\sum_{r=1}^{T} \exp(\gamma_{0rm} + \gamma_{1r} Z_{1j} + \gamma_{2rm} Z_{2ij})}.
$$

In this model, both the intercept and the slope of the level-1 covariate are assumed to depend on the mixture variable $W_j$.

Item bias

The last extension I would like to mention is the possibility to allow for group differences in the class-specific conditional response probabilities, as was already indicated in equation (5). It may happen that certain items are responded in a different manner by individuals belonging to different groups, a phenomenon that is sometimes referred to as item bias.


Standard errors

Contrary to Newton-like methods, the EM algorithm does not provide standard errors of the model parameters as a by-product. Estimated asymptotic standard errors can be obtained by computing the observed information matrix, the matrix of second-order derivatives of the log-likelihood function towards all model parameters. The inverse of this matrix is the estimated variance-covariance matrix. For the examples presented in the next section, I computed the necessary derivatives numerically.

The information matrix can also be used to check identifiability. A sufficient condition for local identification is that all the eigenvalues of the information matrix are larger than zero.
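The two uses of the information matrix described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the central-difference scheme, the step size, the eigenvalue tolerance, and the toy binomial log-likelihood are choices made for this sketch and are not the procedure implemented for the paper.

```python
import numpy as np

def numerical_hessian(f, theta, h=1e-4):
    """Central-difference approximation of the matrix of second derivatives."""
    theta = np.asarray(theta, dtype=float)
    p = theta.size
    hess = np.zeros((p, p))
    for a in range(p):
        for b in range(p):
            ea = np.zeros(p); ea[a] = h
            eb = np.zeros(p); eb[b] = h
            hess[a, b] = (f(theta + ea + eb) - f(theta + ea - eb)
                          - f(theta - ea + eb) + f(theta - ea - eb)) / (4 * h * h)
    return hess

def standard_errors(loglik, theta_hat, tol=1e-8):
    """Return (standard errors, locally identified?) at the ML estimate."""
    info = -numerical_hessian(loglik, theta_hat)       # observed information
    eigvals = np.linalg.eigvalsh((info + info.T) / 2)  # symmetrise first
    identified = bool(np.all(eigvals > tol))
    if not identified:
        return np.full(len(theta_hat), np.nan), False
    cov = np.linalg.inv(info)                          # variance-covariance matrix
    return np.sqrt(np.diag(cov)), True

# Toy example (assumed): binomial log-likelihood with a logit parameter.
n, y = 100, 30
def loglik(theta):
    return y * theta[0] - n * np.log1p(np.exp(theta[0]))

theta_hat = np.array([np.log(0.3 / 0.7)])
se, identified = standard_errors(loglik, theta_hat)
```

For this toy model the analytic information is n p (1 - p) = 21, so the numerical standard error should be close to 1 / sqrt(21).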

Software implementation

The multilevel LC model cannot be estimated with standard software for LC analysis. The upward-downward algorithm described in this section was implemented in an experimental version of the Latent GOLD program (Vermunt and Magidson, 2000). The method will become available in a next version of this program for LC analysis.

4 Three applications

Three applications of the proposed new method are presented. These not only illustrate three interesting application fields, but also the most important model specification options. In the first example, I use data from a Dutch survey in which employees of various teams are asked about their work conditions. Multilevel LC analysis is used to construct a task-variety scale and to determine the between-team heterogeneity of the latent class probabilities. The second example uses a Dutch data set containing information on the mathematical skills of grade 8 pupils from various schools. The research question of interest is as to whether school differences remain after controlling for individual characteristics such as non-verbal intelligence and socioeconomic status of the family. In the third example, I use data from the 1999 European Values Survey. Country differences in the proportion of post-materialists are modelled by means of random effects. Contrary to the previous applications, I am not only interested in the overall between-group differences, but also in the latent distribution for each of the groups (countries).

Assigned Reading: Latent Class: C3

4.1 Organizational research

In a Dutch study on the effect of autonomous teams on individual work conditions, data were collected from 41 teams of two organizations, a nursing home and a domiciliary care organization. These teams contained 886 employees. For the example, I took five dichotomized items of a scale measuring perceived task variety (Van Mierlo et al., 2002). The item wording is as follows (translated from Dutch):

1. Do you always do the same things in your work?

2. Does your work require creativity?

3. Is your work diverse?

4. Does your work make enough usage of your skills and capacities?

5. Is there enough variation in your work?

The original items contained four answer categories. In order to simplify the analysis, I collapsed the first two and the last two categories. Because some respondents had missing values on one or more of the indicators, I adapted the ML estimation procedure to deal with such partially observed indicators.

I will analyze this data set by means of LC analysis. This means that I am assuming that the researcher is interested in building a typology of employees based on their perceived task variety. On the other hand, if one would be interested in constructing a continuous scale, a latent trait analysis would be more appropriate. Of course, also in that situation the multilevel structure should be taken into account.

In the analysis of this data set, I used a simple unrestricted LC model combined with three types of specifications for the between-team variation in the class-membership probabilities: no random effects, parametric random effects as defined in equation (6), and nonparametric random effects as defined in equation (9).

Table 1 reports the log-likelihood (LL) value, the number of parameters, and the BIC value for the models that were estimated. I first estimated models without random effects. The BIC values for the one to three class model (Models 1-3) without random effects show that a solution with two classes suffices. Subsequently, I introduced random effects in the two-class model (Models 4-6). From the results obtained with Models 4 and 5, it can be seen that there is clear evidence for between-team variation in the latent distribution: These models have much lower BIC values than the two-class model without random effects. The two-class finite-mixture representation (Model 5) yields a slightly lower LL than the parametric representation of the between-team variation (Model 4), but a somewhat higher BIC because it uses one additional parameter. The three-class finite-mixture model (Model 6) has almost the same LL value as Model 5, which indicates that no more than two latent classes of teams can be identified.


Table 2 reports the parameter estimates obtained with Models 4 and 5, where the conditional response probabilities describing the relationship between the latent variable and the indicators correspond to the high task-variety response (disagree for item 1 and agree for the others). The first class has a much lower conditional response probability than class two on each of the indicators. The two classes can therefore be named "low task-variety" and "high task-variety". Note that the conditional response probabilities take on almost the same values in both random-effects models, which shows that the definition of the classes is quite robust for the specification of the mixture distribution.

In the parametric specification of the between-team heterogeneity (Model 4), the mean and standard deviation of the log odds of belonging to class 2 equal 0.73 and 0.87, respectively. In order to get an impression of the meaning of these numbers on the probability scale, one can fill in a value for $u_j$ in equation (6) and substitute the obtained value for $\gamma_{2j}$ in equation (4). For example, with $u_j$ equal to -1.28 and 1.28, the $z$ values corresponding to the lower and upper 10% tails of the normal distribution, we get latent class probabilities of .41 and .86. These numbers indicate that there is a quite large team effect on the perceived task variety. The intraclass correlation obtained with equation (8) equals .19, which means that 19% of the total variance is explained by team membership.

The mixture components in the two-class finite-mixture model (Model 5) contained 37 and 63 percent of the teams. Their log odds of belonging to the high task-variety class are -.35 and 1.29, respectively. These log odds correspond to probabilities of .41 and .78. Although the parametric and non-parametric approach capture the variation across teams in a somewhat different manner, both show that there are large between-team differences in the individual perception of the variety of the work. The substantive conclusion based on Model 5 would be that there are two types of employees and two types of teams. The two types of teams differ with respect to the distribution of the team members over the two types of employees.
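The Model 4 figures quoted in this example can be retraced with a few lines of Python. The mean (0.73) and standard deviation (0.87) of the log odds are taken from the text. Equation (8) is not part of this excerpt, so the intraclass correlation formula used below, tau^2 / (tau^2 + pi^2 / 3), is an assumption: it is the usual latent-variable form for logit models, and it happens to reproduce the reported value.

```python
import math

def logistic(x):
    """Inverse logit: converts log odds into a probability."""
    return 1.0 / (1.0 + math.exp(-x))

gamma_2, tau_2 = 0.73, 0.87   # Model 4 estimates quoted in the text

# Class-2 probability for teams at the lower and upper 10% tails
# of the standard normal distribution (u = -1.28 and u = 1.28).
p_low = logistic(gamma_2 - 1.28 * tau_2)
p_high = logistic(gamma_2 + 1.28 * tau_2)

# Assumed form of the intraclass correlation for a logit model.
icc = tau_2 ** 2 / (tau_2 ** 2 + math.pi ** 2 / 3)

print(round(p_low, 2), round(p_high, 2), round(icc, 2))
```

Rounded to two decimals, the three quantities agree with the .41, .86, and .19 reported for Model 4.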


Excerpts from:

Technical Guide for Latent GOLD 5.0: Basic and Advanced

Jeroen K. Vermunt and Jay Magidson

Statistical Innovations Inc.

(617) 489-4490

This document should be cited as “J.K. Vermunt and J. Magidson (2013). Technical Guide for Latent GOLD 5.0: Basic and Advanced. Belmont Massachusetts: Statistical Innovations Inc.”

sum of the parameters of the three other categories. This guarantees that the parameters sum to zero since $\sum_{p=1}^{3} \beta_p - \sum_{p=1}^{3} \beta_p = 0$.

Instead of using effect coding, it is also possible to use dummy coding. Depending on whether one uses the first or the last category as reference category, the design matrix will look like this

category 1    0 0 0
category 2    1 0 0
category 3    0 1 0
category 4    0 0 1

or this

category 1    1 0 0
category 2    0 1 0
category 3    0 0 1
category 4    0 0 0


Whereas in effect coding the category-specific effects should be interpreted in terms of deviation from the average, in dummy coding their interpretation is in terms of difference from the reference category. Note that the parameter for the reference category is omitted, which implies that it is equated to 0.
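The two coding schemes can be compared with a short Python sketch. The helper names are hypothetical, and the effect-coding matrix assumes that the last category is the one whose parameter equals minus the sum of the others; the excerpt does not show which category plays that role.

```python
import numpy as np

def dummy_coding(n_categories, reference="first"):
    """Design matrix with an all-zero row for the reference category."""
    eye = np.eye(n_categories - 1, dtype=int)
    zero = np.zeros((1, n_categories - 1), dtype=int)
    if reference == "first":
        return np.vstack([zero, eye])
    return np.vstack([eye, zero])

def effect_coding(n_categories):
    """Design matrix whose last row is -1, so the effects sum to zero."""
    eye = np.eye(n_categories - 1, dtype=int)
    last = -np.ones((1, n_categories - 1), dtype=int)
    return np.vstack([eye, last])

beta = np.array([0.5, -0.2, 0.1])          # hypothetical free parameters

effects = effect_coding(4) @ beta          # last entry is -(0.5 - 0.2 + 0.1)
dummy_first = dummy_coding(4, "first") @ beta
dummy_last = dummy_coding(4, "last") @ beta
```

With effect coding the four category effects sum to zero, whereas with dummy coding the reference category's effect is fixed at 0 and the others are differences from it.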

2.5 Known-Class Indicator

Sometimes, one has a priori information (for instance, from an external source) on the class membership of some individuals. For example, in a four-class situation, one may know that case 5 belongs to latent class 2 and case 11 to latent class 3. Similarly, one may have a priori information on which class cases do not belong to. For example, again in a four-class situation, one may know that case 19 does not belong to latent class 2 and that case 41 does not belong to latent classes 3 or 4. In Latent GOLD, there is an option, called “Known Class”, for indicating which latent classes cases do not belong to.

Let $\tau_i$ be a vector of 0-1 variables containing the “Known Class” information for case $i$, where $\tau_{ix} = 0$ if it is known that case $i$ does not belong to class $x$, and $\tau_{ix} = 1$ otherwise. The vector $\tau_i$ modifies the general probability structure defined in equation (1) as follows:

$$
f(y_i \mid z_i, \tau_i) = \sum_{x=1}^{K} \tau_{ix}\, P(x \mid z_i)\, f(y_i \mid x, z_i).
$$

Assigned Reading: Technical Guide: Sec. 2.5


As a result of this modification, the posterior probability of belonging to class $x$ will be equal to 0 if $\tau_{ix} = 0$.
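The effect of the known-class vector on the posterior can be sketched in Python. This is only an illustration: the function name, the equal prior class sizes, and the class-specific densities are hypothetical, and the example mimics a case that is known not to belong to classes 3 or 4.

```python
import numpy as np

def known_class_posterior(prior, density, tau):
    """Posterior class membership with a 0/1 known-class vector tau."""
    prior = np.asarray(prior, dtype=float)      # P(x | z_i)
    density = np.asarray(density, dtype=float)  # f(y_i | x, z_i)
    tau = np.asarray(tau, dtype=float)          # 0 = class ruled out
    joint = tau * prior * density               # terms of the modified sum
    return joint / joint.sum()

prior = [0.25, 0.25, 0.25, 0.25]
density = [0.02, 0.06, 0.10, 0.01]
tau = [1, 1, 0, 0]                              # not in class 3 or class 4

posterior = known_class_posterior(prior, density, tau)
```

The excluded classes receive posterior probability 0, and the remaining mass is renormalized over the classes that are still allowed.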

The known-class option has three important applications.

1. It can be used to estimate models with training cases; that is, cases for which class membership has been determined using a gold standard method. Depending on how this training information is obtained, the missing data mechanism will be MCAR (Missing Completely At Random, where the known-class group is a random sample from all cases), MAR (Missing At Random, where the known-class group is a random sample given observed responses and covariate values), or NMAR (Not Missing At Random, where the known-class group is a non-random sample and thus may depend on class membership itself). MAR occurs, for example, in clinical applications in which cases with more than a certain number of symptoms are subjected to further examination to obtain a perfect classification (diagnosis). NMAR may, for example, occur if training cases that do not belong to the original sample under investigation are added to the data file.

Both in the MAR and MCAR situation, parameter estimates will be unbiased. In the NMAR situation, however, unbiased estimation requires that separate class sizes are estimated for training and non-training cases (McLachlan and Peel, 2000). This can easily be accomplished by expanding the model of interest with a dichotomous covariate that takes on the value 0 for training cases and 1 for non-training cases.

2. Another application is specifying models with a partially missing discrete variable that affects one or more response variables. An important example is the complier average causal effect (CACE) model proposed by Imbens and Rubin (1997), which can be used to determine the effect of a treatment conditional on compliance with the treatment. Compliance is, however, only observed in the treatment group, and is missing in the control group. In Latent GOLD, this CACE model can be specified as a LC Regression model, in which class membership (compliance) is known for the treatment group, and in which a treatment effect is specified only for the compliance class.

3. The known-class indicator can also be used to specify multiple-group LC models. Suppose we have a three-class model and two groups, say males and females. A multiple-group LC model is obtained by indicating that there are six latent classes, where males may belong to classes 1–3 and females to classes 4–6. To get the correct output, the grouping variable should not only be used as the known-class indicator, but also as a nominal covariate.

3 Latent Class Cluster Models


proposed using random effects in LC Regression models for ranking data, and Muthen (2004) and Vermunt (2006) proposed including random effects in LC growth models.

It has been observed that the solution of a LC Regression analysis may be strongly affected by heterogeneity in the intercept. In rating-based conjoint studies, for example, it is almost always the case that respondents differ with respect to the way they use the, say, 7-point rating scale: some respondents tend to give higher ratings than others, irrespective of the characteristics of the rated products. A LC Regression model captures this response heterogeneity phenomenon via Classes with different intercepts. However, most likely, the analyst is looking for a relatively small number of latent classes that differ in more meaningful ways with respect to predictor effects on the ratings. By including a random intercept in the LC Regression model, for example,

$$
\eta_{m,x,z_{it},F_{1i}} = \beta_{xm0} + \sum_{q} \beta_{x.q} \cdot y^{*}_{m} \cdot z^{pred}_{itq} + \lambda_{x01} \cdot y^{*}_{m} \cdot F_{1i},
$$

it is much more likely that one will succeed in finding such meaningful Classes (segments). The random intercept, which may have a different effect in each latent class, will filter out (most of) the “artificial” variation in the intercept.

Another interesting application of random effects within latent classesoccurs in the context of LC growth modeling (Muthen, 2004; Vermunt, 2006).Suppose we have a model for a binary response variable measured at Toccasions in which zpred

it1 equals time and zpredit2 time squared. A LC growth

model with a random intercept and a random slope for the linear time effectwould be of the form:

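$$\eta_{x,z_{it},F_{i}} = \beta_{x0} + \beta_{x1} \cdot z^{pred}_{it1} + \beta_{x2} \cdot z^{pred}_{it2} + \lambda_{01} \cdot F_{1i} + \lambda_{02} \cdot F_{2i} + \lambda_{12} \cdot F_{2i} \cdot z^{pred}_{it1},$$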

where we assume that the β parameters are Class dependent and the λ parameters Class independent. Similarly, LC growth models can be formulated for dependent variables of other scale types.
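As a minimal sketch of the growth-model linear predictor above, the following Python functions evaluate η for one latent class at one time point and map it to a response probability through the logistic link. The function names, argument layout, and all numeric values are illustrative assumptions, not Latent GOLD syntax or output.

```python
import math


def growth_eta(beta, lam, time, f1, f2):
    """Linear predictor of the LC growth model for one class.

    beta: (beta_x0, beta_x1, beta_x2), the Class-dependent intercept,
          linear time effect, and quadratic time effect.
    lam:  (lambda_01, lambda_02, lambda_12), the Class-independent
          coefficients of the two CFactors.
    f1, f2: CFactor scores F_1i and F_2i for the individual.
    """
    b0, b1, b2 = beta
    l01, l02, l12 = lam
    return (b0 + b1 * time + b2 * time ** 2
            + l01 * f1 + l02 * f2 + l12 * f2 * time)


def logistic(eta):
    """Probability of the binary response given the linear predictor."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical parameter values for a single class.
eta = growth_eta(beta=(0.5, 0.2, -0.1), lam=(1.0, 0.0, 0.5),
                 time=2.0, f1=1.0, f2=2.0)
prob = logistic(eta)
```

With both CFactor scores set to zero, the function returns the class-specific growth curve, which is a convenient way to inspect the fixed part of the model.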

10 Multilevel LC Model

10.1 Model Components and Estimation Issues

Assigned Reading: Technical Guide: Sec. 10

To be able to explain the multilevel LC model implemented in Latent GOLD, we have to introduce and clarify some terminology. Higher-level observations will be referred to as groups and lower-level observations as cases. The records of cases belonging to the same group are connected by the Group ID variable. It should, however, be noted that higher-level observations can also be individuals, for example, in longitudinal applications. “Cases” would then be the multiple time points within individuals and replications (or indicators) the multiple responses of an individual at the various time points.

The index j is used to refer to a particular group and $I_j$ to denote the number of cases in group j. With $y_{jit}$ we denote the response on indicator (at replication) t of case i belonging to group j, with $y_{ji}$ the full vector of responses of case i in group j, and with $y_j$ the responses of all cases in group j. Rather than expanding the notation with new symbols, group-level quantities will be referred to using a superscript g: Group-level classes (GClasses), group-level continuous factors (GCFactors), and group-level covariates (GCovariates) are denoted by $x^g$, $F^g_j$, and $z^g_j$, and group-level parameters by $\gamma^g$, $\beta^g$, and $\lambda^g$.

The most general Latent GOLD probability structure for a multilevel LC model equals

$$f(y_j|z_j, z^g_j) = \sum_{x^g=1}^{K^g} \int_{F^g_j} f(F^g_j) \, P(x^g|z^g_j) \, f(y_j|z_j, x^g, F^g_j) \, dF^g_j, \qquad (33)$$

where

$$f(y_j|z_j, x^g, F^g_j) = \prod_{i=1}^{I_j} f(y_{ji}|z_{ji}, x^g, F^g_j).$$

Assuming that the model of interest may also contain CFactors, for each case i, $f(y_{ji}|z_{ji}, x^g, F^g_j)$ has a structure similar to the one described in equation (29); that is,

$$f(y_{ji}|z_{ji}, x^g, F^g_j) = \sum_{x=1}^{K} \int_{F_{ji}} f(F_{ji}) \, P(x|z_{ji}, x^g, F^g_j) \, f(y_{ji}|x, z_{ji}, F_{ji}, x^g, F^g_j) \, dF_{ji}.$$

$$f(y_{ji}|x, z_{ji}, F_{ji}, x^g, F^g_j) = \prod_{h=1}^{H} f(y_{jih}|x, z_{ji}, F_{ji}, x^g, F^g_j)$$

These four equations show that a multilevel LC model is a model

• for $f(y_j|z_j, z^g_j)$, which is the marginal density of all responses in group j given all exogenous variable information in group j,



• containing GClasses ($x^g$) and/or (at most three mutually independent) GCFactors ($F^g_j$),

• containing GCovariates $z^g_j$ affecting the group classes $x^g$,

• assuming that the $I_j$ observations for the cases belonging to group j are independent of one another given the GClasses and GCFactors,

• allowing the GClasses and GCFactors to affect the case-level latent classes x and/or the responses $y_{ji}$.

GCFactors enter in exactly the same manner in the linear predictors for the various types of response variables as case-level CFactors. We will refer to their coefficients as $\lambda^{t,g}_{d}$ (Cluster and DFactor) and $\lambda^{g}_{xqd}$ (Regression), where we add a subscript m when needed. GCFactors can also be used in the model for the latent classes. These terms are similar to those for nominal (Cluster and Regression) or ordinal (DFactor) dependent variables. We will denote a GCFactor effect on the latent classes as $\lambda^{0,g}_{xrd}$, $0 \le r \le R$, where the superscript 0 refers to the model for the latent classes.

GClasses enter in the linear predictors of the models for the indicators as $\beta^{t,g}_{x^g}$ and in the one of the model for the dependent variable as $\beta^{g}_{x0,x^g} + \sum_{q=1}^{Q} \beta^{g}_{xq,x^g} \cdot z^{pred}_{jitq}$. Inclusion of GClasses in the model for the Clusters, DFactors, or Classes implies that the γ parameters become GClass dependent; that is, $\eta_{x|z_{ji},x^g} = \gamma_{x^g,x0} + \sum_{r=1}^{R} \gamma_{x^g,xr} \cdot z^{cov}_{jir}$. Note that this is similar to a LC Regression analysis, where $x^g$ now plays the role of x, and x the role of a nominal or ordinal y variable.

The remaining linear predictor is the one appearing in the multinomial logistic regression model for the GClasses. It has the form $\eta_{x^g|z^g_j} = \gamma^{g}_{x^g,0} + \sum_{r=1}^{R^g} \gamma^{g}_{x^g,r} \cdot z^{g,cov}_{jr}$. This linear predictor is similar to the one for the Clusters or Classes (in a standard LC model), showing that GCovariates may be allowed to affect GClasses in the same way that covariates may affect Classes.
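The multinomial logistic model for the GClasses can be illustrated with a short Python sketch that turns the linear predictors into GClass probabilities. The function name `gclass_probs`, the parameter layout, and the numbers are assumptions made for illustration; fixing the first GClass's parameters at zero is one common identification choice and is assumed here rather than taken from the text.

```python
import math


def gclass_probs(gamma0, gamma, zg):
    """P(x^g | z^g_j) from eta = gamma0[k] + sum_r gamma[k][r] * zg[r].

    gamma0: intercepts, one per GClass.
    gamma:  list of covariate-effect lists, one list per GClass.
    zg:     GCovariate values for group j.
    """
    etas = [g0 + sum(g * z for g, z in zip(g_row, zg))
            for g0, g_row in zip(gamma0, gamma)]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(etas)
    expo = [math.exp(e - top) for e in etas]
    total = sum(expo)
    return [e / total for e in expo]


# Hypothetical two-GClass model with one GCovariate; the first GClass is
# the reference category (all of its parameters are zero).
probs = gclass_probs(gamma0=[0.0, 1.0], gamma=[[0.0], [2.0]], zg=[0.5])
```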

Below we will describe the most relevant special cases of this very general latent variable model,⁴³ most of which were described in Vermunt (2002b, 2003, 2004, and 2005) and Vermunt and Magidson (2005b). We then devote more attention to the expressions for the exact forms of the various linear predictors in models with GClasses, GCFactors, and GCovariates.

⁴³ In fact, the multilevel LC model implemented in Latent GOLD is so general that many possibilities remain unexplored as of this date. It is up to Latent GOLD Advanced users to further explore its possibilities.



Model restrictions. Both in Cluster and in DFactor models, one can equate the λ’s across indicators of the same type for selected CFactors (“Equal Effects”) and/or fix some of the λ’s to zero. The same applies to the β’s corresponding to the GClasses.

In Regression, one can use the parameter constraints “Class Independent”, “No Effect”, and “Merge Effects”, implying equal λ’s (β’s) among all Classes, zero λ’s (β’s) in selected Classes, and equal λ’s (β’s) in selected Classes.

ML (PM) estimation and technical settings. Similar to what was discussed in the context of CFactors, with GCFactors, the marginal density $f(y_j|z_j, z^g_j)$ described in equation (33) is approximated using Gauss-Hermite quadrature. With three GCFactors and B quadrature nodes per dimension, the approximate density equals

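$$f(y_j|z_j, z^g_j) \approx \sum_{x^g=1}^{K^g} \sum_{b_1=1}^{B} \sum_{b_2=1}^{B} \sum_{b_3=1}^{B} P(x^g|z^g_j) \, f(y_j|z_j, x^g, F^g_{b_1}, F^g_{b_2}, F^g_{b_3}) \, P^g_{b_1} P^g_{b_2} P^g_{b_3}.$$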

ML (PM) estimates are found by a combination of the upward-downward variant of the EM algorithm developed by Vermunt (2003, 2004) and Newton-Raphson with analytic first-order derivatives.⁴⁴

The only new technical setting in multilevel LC models is the same one as in models with CFactors; that is, the number of quadrature nodes to be used in the numerical integration concerning the GCFactors. As already explained in the context of models with CFactors, the default value is 10, the minimum 2, and the maximum 50.

10.2 Application Types

10.2.1 Two-level LC or FM model

The original multilevel LC model described by Vermunt (2003) and Vermunt and Magidson (2005b) was meant as a tool for multiple-group LC analysis in situations in which the number of groups is large. The basic idea was to formulate a model in which the latent class distribution (class sizes) is allowed to differ between groups by using a random-effects approach rather

⁴⁴ Numeric second-order derivatives are computed using the analytical first-order derivatives.



than by estimating a separate set of class sizes for each group, as is done in a traditional multiple-group analysis.

When adopting a nonparametric random-effects approach (using GClasses), one obtains the following multilevel LC model:

$$f(y_j) = \sum_{x^g=1}^{K^g} P(x^g) \prod_{i=1}^{I_j} \sum_{x=1}^{K} P(x|x^g) \prod_{t=1}^{T} f(y_{jit}|x),$$

in which the linear predictor in the logistic model for $P(x|x^g)$ equals $\eta_{x|x^g} = \gamma_{x^g,x0}$. Here, we are in fact assuming that the intercept of the model for the latent classes differs across GClasses.
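To make the nested sum-product structure of the nonparametric two-level model concrete, here is a minimal Python sketch that evaluates the density of one group's responses. It assumes binary indicators and takes the probabilities P(x^g), P(x|x^g), and P(y_t = 1|x) as given inputs instead of deriving them from γ parameters; the function name and all values are hypothetical.

```python
def group_likelihood(y_group, p_gclass, p_class_given_gclass, p_item):
    """Density of all responses in one group under a two-level LC model.

    y_group: list of cases, each a list of 0/1 responses (one per indicator).
    p_gclass[g]: P(x^g = g).
    p_class_given_gclass[g][x]: P(x | x^g = g).
    p_item[x][t]: P(y_t = 1 | x).
    """
    total = 0.0
    for g, pg in enumerate(p_gclass):            # sum over GClasses
        prod_cases = 1.0
        for y in y_group:                        # product over cases i
            case_density = 0.0
            for x, px in enumerate(p_class_given_gclass[g]):  # sum over classes
                f = 1.0
                for t, resp in enumerate(y):     # product over indicators t
                    p = p_item[x][t]
                    f *= p if resp == 1 else 1.0 - p
                case_density += px * f
            prod_cases *= case_density
        total += pg * prod_cases
    return total


# Hypothetical model: 2 GClasses, 2 classes, 2 binary indicators.
P_G = [0.4, 0.6]
P_X_GIVEN_G = [[0.8, 0.2], [0.3, 0.7]]
P_ITEM = [[0.9, 0.8], [0.2, 0.1]]
lik = group_likelihood([[1, 1], [0, 1]], P_G, P_X_GIVEN_G, P_ITEM)
```

Because cases are independent only conditionally on the GClass, the product over cases sits inside the sum over GClasses; moving it outside would turn the model back into a standard single-level LC model.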

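When adopting a parametric random-effects approach (GCFactors), one obtains

$$f(y_j) = \int_{-\infty}^{\infty} f(F^g_{1j}) \prod_{i=1}^{I_j} \sum_{x=1}^{K} P(x|F^g_{1j}) \prod_{t=1}^{T} f(y_{jit}|x) \, dF^g_{1j},$$

where the linear term in the model for $P(x|F^g_{1j})$ equals $\eta_{x|F^g_{1j}} = \gamma_{x0} + \lambda^{0,g}_{x01} \cdot F^g_{1j}$.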

Note that this specification is the same as in a random-intercept model for a nominal dependent variable.
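The random-intercept specification can be sketched as follows: class probabilities are a multinomial logit in the GCFactor, and averaging them over the factor distribution gives the overall class sizes. This Python example uses a 3-node Gauss-Hermite rule for the standard normal for brevity; the function names, the 3-node rule, and the parameter values are assumptions chosen for illustration.

```python
import math

# 3-point Gauss-Hermite nodes and weights for a standard normal GCFactor.
NODES = (-math.sqrt(3.0), 0.0, math.sqrt(3.0))
WEIGHTS = (1.0 / 6.0, 2.0 / 3.0, 1.0 / 6.0)


def class_probs(gamma0, lam, f):
    """P(x | F) with eta_x = gamma0[x] + lam[x] * F (multinomial logit)."""
    etas = [g + l * f for g, l in zip(gamma0, lam)]
    top = max(etas)
    expo = [math.exp(e - top) for e in etas]
    total = sum(expo)
    return [e / total for e in expo]


def marginal_class_sizes(gamma0, lam):
    """Average P(x | F) over the GCFactor distribution by quadrature."""
    sizes = [0.0] * len(gamma0)
    for node, weight in zip(NODES, WEIGHTS):
        for x, p in enumerate(class_probs(gamma0, lam, node)):
            sizes[x] += weight * p
    return sizes


# Hypothetical 2-class model; class 1 is the reference category.
low_group = class_probs([0.0, 0.0], [0.0, 1.0], -1.0)   # group with F = -1
high_group = class_probs([0.0, 0.0], [0.0, 1.0], 1.0)   # group with F = +1
sizes = marginal_class_sizes([0.0, 0.0], [0.0, 1.0])
```

Groups with different factor scores get different class sizes (`low_group` versus `high_group`), which is the sense in which the latent class distribution varies across groups.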

Vermunt (2005) expanded the above parametric approach with covariates and random slopes, yielding a standard random-effects multinomial logistic regression model, but now for a latent categorical outcome variable. With covariates and multiple random effects, we obtain

$$f(y_j|z_j) = \int_{F^g_j} f(F^g_j) \prod_{i=1}^{I_j} \sum_{x=1}^{K} P(x|z_{ji}, F^g_j) \prod_{t=1}^{T} f(y_{jit}|x) \, dF^g_j,$$

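where the linear predictor for x equals

$$\eta_{x|z_{ji},F^g_j} = \gamma_{x0} + \sum_{r=1}^{R} \gamma_{xr} \cdot z^{cov}_{jir} + \sum_{d=1}^{D^g} \lambda^{0,g}_{x0d} \cdot F^g_{dj} + \sum_{d=1}^{D^g} \sum_{r=1}^{R} \lambda^{0,g}_{xrd} \cdot F^g_{dj} \cdot z^{cov}_{jir}.$$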

Whereas in the Cluster and Regression Modules, this is a random-effects multinomial logistic regression model, in the DFactor Module, we use a random-effects ordinal logistic regression model for each of the discrete ordinal factors.



Also when adopting a nonparametric random-effects approach, one may include covariates in the multilevel LC model; that is,

$$\eta_{x|z_{ji},x^g} = \gamma_{x^g,x0} + \sum_{r=1}^{R} \gamma_{x^g,xr} \cdot z^{cov}_{jir}.$$

This yields a model for the latent classes in which the intercept and the covariate effects may differ across GClasses. In fact, we have a kind of LC Regression structure in which the latent classes serve as a nominal dependent variable and the GClasses as latent classes.

An important extension of the above nonparametric multilevel LC models is the possibility to regress the GClasses on group-level covariates. This part of the model has the same form as the multinomial logistic regression model for the Clusters or Classes in a standard LC or FM model.

In the Cluster and DFactor Modules, it is possible to allow GCFactors and/or GClasses to have direct effects on the indicators. As suggested by Vermunt (2003), this is a way to deal with item bias. Below, we will discuss various other applications of this option.

10.2.2 LC (FM) regression models for three-level data

Another application type of the Latent GOLD multilevel LC option is three-level regression modeling (Vermunt, 2004). This application type concerns the Regression Module.

A three-level LC (FM) Regression model would be of the form

$$f(y_j|z_j) = \sum_{x^g=1}^{K^g} P(x^g) \prod_{i=1}^{I_j} \sum_{x=1}^{K} P(x) \prod_{t=1}^{T_i} f(y_{jit}|x, z^{pred}_{jit}, x^g).$$


Suppose we have a LC Regression model for a binary outcome variable. The simplest linear predictor in a model that includes GClasses would then be

$$\eta_{z_{jit},x,x^g} = \beta_{x0} + \sum_{q=1}^{Q} \beta_{xq} \cdot z^{pred}_{jitq} + \beta^{g}_{0,x^g},$$

which is a model in which (only) the intercept is affected by the GClasses. A more extended model is obtained by assuming that also the predictor effects vary across GClasses; that is,

$$\eta_{z_{jit},x,x^g} = \beta_{x0} + \sum_{q=1}^{Q} \beta_{xq} \cdot z^{pred}_{jitq} + \beta^{g}_{0,x^g} + \sum_{q=1}^{Q} \beta^{g}_{q,x^g} \cdot z^{pred}_{jitq}.$$
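A small Python sketch shows how the Class-level and GClass-level coefficients combine in this linear predictor. The function name, the list-based parameter layout, and the values are hypothetical; the example also assumes the GClass coefficients are fixed to zero for the first GClass, which is one of several possible identifying constraints.

```python
def three_level_eta(beta_class, beta_gclass, z_pred):
    """Linear predictor with additive Class and GClass effects.

    beta_class:  [beta_x0, beta_x1, ..., beta_xQ] for the case's Class x.
    beta_gclass: [beta^g_0, beta^g_1, ..., beta^g_Q] for the group's GClass.
    z_pred:      [z_1, ..., z_Q], predictor values at replication t.
    """
    eta = beta_class[0] + beta_gclass[0]          # the two intercept terms
    for q, z in enumerate(z_pred, start=1):
        eta += (beta_class[q] + beta_gclass[q]) * z
    return eta


# Hypothetical coefficients: one predictor, two GClasses.
BETA_CLASS = [0.5, 1.0]
BETA_GCLASS = {1: [0.0, 0.0],    # reference GClass (assumed zero effects)
               2: [0.3, -0.5]}
eta_g1 = three_level_eta(BETA_CLASS, BETA_GCLASS[1], [2.0])
eta_g2 = three_level_eta(BETA_CLASS, BETA_GCLASS[2], [2.0])
```

Setting every slope entry of `beta_gclass` to zero reduces this to the simpler model in which only the intercept depends on the GClass.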



In practice, it seems to be most natural to allow effects of predictors that change values across replications to be Class dependent and effects of predictors that change values across cases to depend on the GClasses.

The most extended specification is obtained if all the effects are assumed to be Class dependent, which implies including Classes-GClasses ($x$-$x^g$) interactions. Such a model is defined as

$$\eta_{z_{jit},x,x^g} = \beta_{x0} + \sum_{q=1}^{Q} \beta_{xq} \cdot z^{pred}_{jitq} + \beta^{g}_{x0,x^g} + \sum_{q=1}^{Q} \beta^{g}_{xq,x^g} \cdot z^{pred}_{jitq}.$$

It should be noted that in each of the above three models, identifying constraints have to be imposed on the parameters involving the GClasses. In the most general model, this is $\sum_{x^g=1}^{K^g} \beta^{g}_{xq,x^g} = 0$, $\beta^{g}_{xq,1} = 0$, or $\beta^{g}_{xq,K^g} = 0$, for $0 \le q \le Q$ and $1 \le x \le K$. In other words, the parameters in the model for the dependent variable either sum to zero across GClasses, are equal to zero for the first GClass, or are equal to zero for the last GClass.

10.2.4 LC growth models for multiple indicators

Suppose one has a longitudinal data set containing multiple indicators (response variables) for each time point. The multiple responses could be used to build a time-specific latent classification, while the pattern of (latent) change over time could be described using a (LC) growth model. Specification of such a model would involve using the index i for the time points and the index j for the cases (time points are nested within cases). The LC model for the time points may have the form of a LC Cluster model, but also of one of the other types of Latent GOLD models. The multinomial logistic regression model for the (time-specific) latent classes will have the form of a LC growth model: class membership depends on time, where the intercept and possibly also the time slope is allowed to vary across individuals. This variation can be modelled using continuous random effects (GCFactors) and/or discrete random effects (GClasses).




Jeroen K. Vermunt

& Jay Magidson


Statistical Innovations

Thinking outside the brackets!™


ClassPred Tab: Restricting Cases Known (Not) to Belong to a Certain Class or Classes

With this option one can specify that one or more specific cases can belong to a certain class or certain classes only. To use this feature, select a variable from the list box in the ClassPred Tab to be used as the Known Class Indicator and click Known Class. The variable moves to the Known Class Indicator Box and the Assignment Table becomes active. For each category of the Known Class Indicator you then specify to which classes the cases with that category code may belong (or not belong) using the Assignment Table. For example, Figure 5-12 illustrates a 4-Cluster model (4 columns) where the variable 'classind' is used as the Known Class Indicator. Cases for which 'classind=1' are allowed to be in cluster 1 only; those for which 'classind=2' are allowed to be in cluster 2 only; all other cases ('classind=3') may be assigned to any of the 4 clusters.

Figure 5-12. ClassPred Tab for Cluster Model with a Known Class Indicator

This option is useful if you have a priori class membership information for some cases (pre-assigned or pre-classified cases) or if membership to certain classes is very implausible for some combinations of observed scores.


In applications where a subset of the cases are known with certainty not to belong to a particular class, or particular classes, you can take advantage of this information to restrict their posterior membership probability to 0 for one or more classes and hence classify these cases into one of the remaining class(es) with a total probability of 1. This feature allows more control over the segment definitions to ensure that the resulting classes




are most meaningful. Common applications include:

1) using new data to refine old segmentation models while maintaining the segment classifications of the original sample

2) archetypal analysis: define class membership a priori based on extreme response patterns that reflect theoretical "archetypes"

3) partial classification: high cost (or other factors) may preclude all but a small sample of cases from being classified with certainty. These cases can be assigned to their respective classes with 100% certainty, and the remaining cases would be classified by the LC model in the usual way

4) certain cases may be known to be "type 1 OR type 2" (e.g., 'clinically depressed' or 'troubled'). By excluding such cases from being in, say, class 3 = 'healthy', such cases can be pre-assigned to be in class 1 or 2, while additional cases may be freely classified into any class

5) post-hoc refinement of class assignment where modal assignment for certain cases is judged to be implausible based on the desired interpretation of the classes.

In addition, this option may also be used to specify multiple-group models by including the group variable as both a Known Class Indicator and an active covariate. For further details, see section 2.5 of the Technical Guide.

Note: The Known Class option is not available in Cluster and Regression if the Range option has been used in the Variables Tab.

For DFactor models, this option applies only to levels of DFactor 1.

To select known classes (clusters/classes/DFactor1 levels):

Select one variable from those appearing in the Variables List Box (located in the upper left-hand portion of the ClassPred Tab). Variables appearing here are those that have not been previously selected as Indicators or Covariates.

Click Known Class to move that variable to the Known Class Box. The class assignment window beneath the Known Class Box becomes active.

A separate row appears for each category/code/value taken on by the known class indicator.

A separate column appears for each class.

Click on the appropriate boxes to select or deselect the possible assignment of the categories to certain classes.

An unchecked box means that the posterior membership probability is restricted to zero for that class for cases in that category of the known class indicator.
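The effect of an unchecked box can be sketched as a small Python function. It assumes the restriction amounts to zeroing the joint probability of every disallowed class and renormalizing over the remaining ones, so that the allowed classes share a total probability of 1. The function name and inputs are hypothetical and say nothing about how Latent GOLD implements the restriction internally.

```python
def restricted_posterior(prior, likelihood, allowed):
    """Posterior class membership under known-class restrictions.

    prior[x]:      class size P(x).
    likelihood[x]: density of the case's responses given class x.
    allowed[x]:    True if the box for class x is checked for this case.
    """
    joint = [p * l if ok else 0.0
             for p, l, ok in zip(prior, likelihood, allowed)]
    total = sum(joint)
    if total == 0.0:
        raise ValueError("no allowed class has positive probability")
    return [j / total for j in joint]


# Hypothetical 3-class example: the case is known not to be in class 3.
post = restricted_posterior(prior=[0.5, 0.3, 0.2],
                            likelihood=[0.1, 0.4, 0.2],
                            allowed=[True, True, False])
```

Checking exactly one box assigns the case to that class with probability 1, whatever its responses; checking all boxes leaves the ordinary posterior unchanged.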

By default, the checks are assigned as follows:

For a K-class model, a category with code k (1 ≤ k ≤ K) on the Known Class Indicator is assigned to class k only. Categories coded less than 1, greater than K, or missing are assigned to all classes (i.e., no restrictions). Missing values are not shown in the table.

Note: For the example in Figure 5-12 above, all cases are coded either 1, 2 or 3 on the variable 'classind' (i.e., no missing values). Those coded 'classind=1' and 'classind=2' are maintained at their default specifications on the table, while the default specification for cases coded 'classind=3' was changed (from 'cluster 3 only' to 'any cluster', with all 4 cluster columns checked). This specification would be obtained by default if those coded '3' on the classind variable were instead coded as 'missing'. In this situation, the table would differ from that shown in Figure 5-12, in that the 3rd row of the table would not appear, since that category would be coded 'missing'.
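The default rule for the Assignment Table can be mimicked with a short Python sketch. The function name and the dictionary representation of the table are assumptions for illustration; `None` stands in for a missing value, which gets no row.

```python
def default_assignment(codes, n_classes):
    """Default checkbox pattern per Known Class Indicator category.

    Returns a dict mapping each non-missing code to a list of booleans,
    one per class (True = box checked, i.e. membership allowed).
    """
    table = {}
    for code in codes:
        if code is None:
            continue  # missing values are not shown in the table
        if isinstance(code, int) and 1 <= code <= n_classes:
            # Code k is assigned to class k only.
            table[code] = [c == code for c in range(1, n_classes + 1)]
        else:
            # Codes below 1 or above K carry no restriction.
            table[code] = [True] * n_classes
    return table


# The 'classind' variable from the example with a 4-cluster model.
table = default_assignment([1, 2, 3], n_classes=4)
# Reproduce the Figure 5-12 specification by unrestricting category 3.
table[3] = [True, True, True, True]
```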

For further information, see section 2.5 of the Technical Guide.