Missing Data; Missing Data Methods in ML; Multiple Imputation via Predictive Mean Matching
EPSY 905: Multivariate Analysis, Spring 2016
Lecture #11: April 13, 2016
EPSY 905: Lecture 11 -- Missing Data Methods
Today's Lecture
• The basics of missing data:
  - Types of missing data
• How NOT to handle missing data
  - Deletion methods (both pairwise and listwise)
  - Mean substitution
  - Single imputation
• How maximum likelihood works with missing data
• Multiple imputation for missing data
  - How imputation works
  - How to conduct analyses with missing data using imputation
Example Data #1
• To demonstrate some of the ideas of types of missing data, let's consider a situation where you have collected two variables:
  - IQ scores
  - Job performance
• Imagine you are an employer looking to hire employees for a job where IQ is important
Complete Data From Enders (2010)

IQ     Performance
78     9
84     13
84     10
85     8
87     7
91     7
92     9
94     9
94     11
96     7
99     7
105    10
105    11
106    15
108    10
112    10
113    12
115    14
118    16
134    12
TYPES OF MISSING DATA
Our Notational Setup
• Let's let $\mathbf{D}$ denote our data matrix, which will include dependent ($\mathbf{Y}$) and independent ($\mathbf{X}$) variables
$$\mathbf{D} = \{\mathbf{X}, \mathbf{Y}\}$$
• Problem: some elements of $\mathbf{D}$ are missing
Missingness Indicator Variables
• We can construct an alternate matrix $\mathbf{M}$ consisting of indicators of missingness for each element in our data matrix $\mathbf{D}$
  - $M_{ij} = 0$ if the $i$th observation's $j$th variable is not missing
  - $M_{ij} = 1$ if the $i$th observation's $j$th variable is missing
• Let $\mathbf{D}_{obs}$ and $\mathbf{D}_{mis}$ denote the observed and missing parts of $\mathbf{D}$
$$\mathbf{D} = \{\mathbf{D}_{obs}, \mathbf{D}_{mis}\}$$
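The indicator matrix described on this slide can be sketched in a few lines of Python. This is only an illustration: the four rows are a hypothetical toy subset, `None` is an assumed encoding for a missing value, and `missingness_indicators` is a made-up helper name.

```python
# Hypothetical toy rows: (IQ, performance); None marks a missing value.
data = [(78, None), (84, 13), (84, None), (85, 8)]

def missingness_indicators(rows):
    """Return M with M[i][j] = 1 if rows[i][j] is missing, else 0."""
    return [[1 if value is None else 0 for value in row] for row in rows]

M = missingness_indicators(data)
print(M)
```

Each row of `M` lines up with a row of the data, so the column sums give the number of missing values per variable.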
Types of Missing Data
• A very rough typology of missing data puts missing observations into three categories:
  1. Missing Completely At Random (MCAR)
  2. Missing At Random (MAR)
  3. Missing Not At Random (MNAR)
Missing Completely At Random (MCAR)
• Missing data are MCAR if the events that lead to missingness are independent of:
  - The observed variables
  -and-
  - The unobserved parameters of interest
• Examples:
  - Planned missingness in survey research
    - Some large-scale tests are sampled using booklets
    - Students receive only a few of the total number of items
    - The items not received are treated as missing, but that is completely a function of sampling and no other mechanism
A (More) Formal MCAR Definition
• Our missing data indicators, $\mathbf{M}$, are statistically independent of our observed data $\mathbf{D}$
$$P(\mathbf{M} \mid \mathbf{D}) = P(\mathbf{M})$$
  - This comes from how independence works with pdfs
• Like saying a missing observation is due to pure randomness (i.e., flipping a coin)
Implications of MCAR
• Because the mechanism of missingness is not due to anything other than chance, MCAR missingness in your data will not bias your results
  - Can use methods based on listwise deletion, multiple imputation, or maximum likelihood
• Your effective sample size is lowered, though
  - Less power, less efficiency
MCAR Data
Missing data are dispersed randomly throughout the data ("-" marks a missing value)

IQ     Performance
78     -
84     13
84     -
85     8
87     7
91     7
92     9
94     9
94     11
96     -
99     7
105    10
105    11
106    15
108    10
112    -
113    12
115    14
118    16
134    -

Mean IQ of complete cases: 99.7
Mean IQ of incomplete cases: 100.8
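The two means reported for the MCAR data can be checked directly. The sketch below re-enters the slide's MCAR table, with `None` as an assumed stand-in for a missing performance score, and compares mean IQ for complete and incomplete cases; similar means are what MCAR would lead you to expect.

```python
from statistics import mean

# MCAR example data from the slide: (IQ, performance); None marks a missing value.
mcar = [(78, None), (84, 13), (84, None), (85, 8), (87, 7), (91, 7), (92, 9),
        (94, 9), (94, 11), (96, None), (99, 7), (105, 10), (105, 11), (106, 15),
        (108, 10), (112, None), (113, 12), (115, 14), (118, 16), (134, None)]

complete_iq = [iq for iq, perf in mcar if perf is not None]
incomplete_iq = [iq for iq, perf in mcar if perf is None]

print(round(mean(complete_iq), 1))    # mean IQ of complete cases
print(round(mean(incomplete_iq), 1))  # mean IQ of incomplete cases
```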
Missing At Random (MAR)
• Data are MAR if the probability of missingness depends only on some (or all) of the observed data
• Given $\mathbf{D}_{obs}$, $\mathbf{M}$ is independent of $\mathbf{D}_{mis}$:
$$P(\mathbf{M} \mid \mathbf{D}) = P(\mathbf{M} \mid \mathbf{D}_{obs})$$
MAR Data
Missing data are related to other data: any IQ less than 90 did not have a performance variable

IQ     Perf   Indicator
78     -      1
84     -      1
84     -      1
85     -      1
87     -      1
91     7      0
92     9      0
94     9      0
94     11     0
96     7      0
99     7      0
105    10     0
105    11     0
106    15     0
108    10     0
112    10     0
113    12     0
115    14     0
118    16     0
134    12     0

Mean IQ of incomplete cases: 83.6
Mean IQ of complete cases: 105.5
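As a rough sketch, the MAR mechanism on this slide can be reproduced by applying the stated rule (performance is missing whenever IQ is below 90) to the complete data from Enders (2010). `None` is an assumed encoding for a missing value. The last line also shows why complete-case analysis is risky here: the mean of the observed performance scores drifts above the complete-data mean of 10.35.

```python
from statistics import mean

# Complete example data: (IQ, performance).
complete = [(78, 9), (84, 13), (84, 10), (85, 8), (87, 7), (91, 7), (92, 9),
            (94, 9), (94, 11), (96, 7), (99, 7), (105, 10), (105, 11), (106, 15),
            (108, 10), (112, 10), (113, 12), (115, 14), (118, 16), (134, 12)]

# MAR rule from the slide: performance is missing whenever IQ < 90.
mar = [(iq, None if iq < 90 else perf) for iq, perf in complete]
indicator = [1 if perf is None else 0 for _, perf in mar]

incomplete_iq = [iq for iq, perf in mar if perf is None]
complete_iq = [iq for iq, perf in mar if perf is not None]
observed_perf = [perf for _, perf in mar if perf is not None]

print(round(mean(incomplete_iq), 1), round(mean(complete_iq), 1))
print(round(mean(observed_perf), 2), mean(perf for _, perf in complete))
```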
Implications of MAR
• If data are missing at random, biased results could occur
• Inferences based on listwise deletion will be biased and inefficient
  - Fewer data points = more error in analysis
• Inferences based on maximum likelihood will be unbiased but inefficient
• We will focus on methods for MAR data today
Missing Not At Random (MNAR)
• Data are MNAR if the probability of missing data is related to values of the variable itself
$$P(\mathbf{M} \mid \mathbf{D}) = P(\mathbf{M} \mid \mathbf{D}_{obs}, \mathbf{D}_{mis})$$
• Often called non-ignorable missingness
  - Inferences based on listwise deletion or maximum likelihood will be biased and inefficient
• Need to provide a statistical model for the missing data simultaneously with estimation of the original model
SURVIVING MISSING DATA: A BRIEF GUIDE
Using Statistical Methods with Missing Data
• Missing data can alter your analysis results dramatically depending upon:
  1. The type of missing data
  2. The type of analysis algorithm
• The choice of an algorithm and missing data method is important in avoiding issues due to missing data
The Worst Case Scenario: MNAR
• The worst case scenario is when data are MNAR: missing not at random
  - Non-ignorable missingness
• You cannot easily get out of this mess
  - Instead you have to be clairvoyant
• Analysis algorithms must incorporate models for missing data
  - And these models must also be right
The Reality
• In most empirical studies, MNAR as a condition is an afterthought
• It is impossible to know definitively if data truly are MNAR
  - So data are treated as MAR or MCAR
• Hypothesis tests do exist for MCAR
  - Although they have some issues
The Best Case Scenario: MCAR
• Under MCAR, pretty much anything you do with your data will give you the “right” (unbiased) estimates of your model parameters
• MCAR is very unlikely to occur
  - In practice, MCAR is treated as equally unlikely as MNAR
The Middle Ground: MAR
• MAR is the common compromise used in most empirical research
  - Under MAR, maximum likelihood algorithms are unbiased
• Maximum likelihood is available for many methods:
  - Linear mixed models in PROC MIXED, gls, lavaan
  - Models with “latent” random effects (CFA/SEM models) in Mplus, lavaan
MISSING DATA IN MAXIMUM LIKELIHOOD
Missing Data with Maximum Likelihood
• Handling missing data in maximum likelihood is much more straightforward due to the calculation of the log-likelihood function
  - Each subject contributes a portion due to their observations
• If some of the data are missing, the log-likelihood function uses a reduced form of the MVN distribution
  - Capitalizing on the property of the MVN that subsets of variables from an MVN distribution are also MVN
• The total log-likelihood is then maximized
  - Missing data are just “skipped”: they do not contribute
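Each Person's Contribution to the Log-Likelihood
• For a person $p$, the MVN log-likelihood can be written:
$$\log L_p = -\frac{V}{2}\log(2\pi) - \frac{1}{2}\log\left|\boldsymbol{\Sigma}_p\right| - \frac{(\mathbf{y}_p - \boldsymbol{\mu}_p)^T \boldsymbol{\Sigma}_p^{-1} (\mathbf{y}_p - \boldsymbol{\mu}_p)}{2}$$
  - From our examples with missing data, subjects could either have all of their data, so their input into $\log L_p$ uses:
$$\mathbf{y}_p = \begin{bmatrix} y_{p,IQ} \\ y_{p,Perf} \end{bmatrix}; \quad \boldsymbol{\mu}_p = \mathbf{X}_p\boldsymbol{\beta} = \begin{bmatrix} 1 & 1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 \\ \beta_0 \end{bmatrix} = \begin{bmatrix} \mu_{IQ} \\ \mu_{Perf} \end{bmatrix}; \quad \boldsymbol{\Sigma}_p = \begin{bmatrix} \sigma^2_{IQ} & \sigma_{IQ,Perf} \\ \sigma_{IQ,Perf} & \sigma^2_{Perf} \end{bmatrix}$$
  - ...or could be missing the performance variable, yielding:
$$\mathbf{y}_p = \begin{bmatrix} y_{p,IQ} \end{bmatrix}; \quad \boldsymbol{\mu}_p = \mathbf{X}_p\boldsymbol{\beta} = \begin{bmatrix} 1 & 1 \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = \begin{bmatrix} \beta_0 + \beta_1 \end{bmatrix} = \mu_{IQ}; \quad \boldsymbol{\Sigma}_p = \begin{bmatrix} \sigma^2_{IQ} \end{bmatrix}$$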
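To make the "reduced form" concrete, here is a minimal Python sketch of one person's log-likelihood contribution for the two-variable IQ/performance case. It is an illustration only, not how PROC MIXED or any package implements it: `loglik_person` is a hypothetical helper, `None` is an assumed encoding for a missing value, and the mean vector and covariance matrix are simply the complete-data values quoted on the slides, plugged in rather than estimated.

```python
import math

def loglik_person(y, mu, sigma):
    """MVN log-likelihood for one person, using only the observed variables.

    y holds two values (None = missing); mu is the mean vector; sigma is the
    2x2 covariance matrix. Missing variables are dropped from y, mu, and sigma.
    """
    observed = [j for j, value in enumerate(y) if value is not None]
    n_obs = len(observed)
    if n_obs == 0:
        return 0.0  # nothing observed: this person contributes nothing
    dev = [y[j] - mu[j] for j in observed]
    if n_obs == 1:
        var = sigma[observed[0]][observed[0]]
        log_det, quad = math.log(var), dev[0] ** 2 / var
    else:
        (a, b), (c, d) = sigma
        det = a * d - b * c
        log_det = math.log(det)
        # dev' * inverse(sigma) * dev, written out for the 2x2 case
        quad = (d * dev[0] ** 2 - (b + c) * dev[0] * dev[1] + a * dev[1] ** 2) / det
    return -n_obs / 2 * math.log(2 * math.pi) - 0.5 * log_det - 0.5 * quad

mu = [100.0, 10.35]                   # [mean IQ, mean performance]
sigma = [[189.6, 19.5], [19.5, 6.8]]  # complete-data covariance matrix
people = [(78, None), (84, 13), (85, 8)]
total = sum(loglik_person(list(p), mu, sigma) for p in people)
print(total)
```

The first person is missing performance, so only the univariate IQ density is used; the other two use the full bivariate density. The total is the sum of whatever each person can contribute.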
Evaluation of Missing Data in PROC MIXED (and pretty much all other packages)
• If the dependent variables are missing, PROC MIXED automatically skips those variables in the likelihood
  - The REPEATED statement specifies observations with the same subject ID, and uses the non-missing observations from that subject only
• If independent variables are missing, however, PROC MIXED uses listwise deletion
  - If you have missing IVs, this is a problem
  - You can sometimes phrase IVs as DVs, though
• SAS syntax (identical to when you have complete data):
Analysis of MCAR Data with PROC MIXED
• Covariance matrices for the Enders (2010) example data (MIXED is closer to complete):

MCAR Data (Pairwise Deletion)
              IQ       Performance
IQ            115.6    19.4
Performance   19.4     8.0

Complete Data
              IQ       Performance
IQ            189.6    19.5
Performance   19.5     6.8

• Estimated $\mathbf{R}$ matrix from PROC MIXED:
• Output for each observation (obs #1 = missing, obs #2 = complete):
MCAR Analysis: Estimated Fixed Effects
• Estimated mean vectors:

Variable      MCAR Data (pairwise deletion)   Complete Data
IQ            93.73                           100
Performance   10.6                            10.35

• Estimated fixed effects:
• Means: IQ = 89.36 + 10.64 = 100; Performance = 10.64
Analysis of MAR Data with PROC MIXED
• Covariance matrices for the Enders (2010) example data (MIXED is closer to complete):

Complete Data
              IQ       Performance
IQ            189.6    19.5
Performance   19.5     6.8

MAR Data (Pairwise Deletion)
              IQ       Performance
IQ            130.2    19.5
Performance   19.5     7.3

• Estimated $\mathbf{R}$ matrix from PROC MIXED:
• Output for each observation (obs #1 = missing, obs #10 = complete):
MAR Analysis: Estimated Fixed Effects
• Estimated mean vectors:

Variable      MAR Data (pairwise deletion)   Complete Data
IQ            105.4                          100
Performance   10.7                           10.35

• Estimated fixed effects:
• Means: IQ = 90.15 + 9.85 = 100; Performance = 9.85
Additional Issues with Missing Data and Maximum Likelihood
• Given the structure of the missing data, the standard errors of the estimated parameters may be computed differently
  - Standard errors come from -1 * inverse information matrix
    - Information matrix = matrix of second derivatives = Hessian
• Several versions of this matrix exist
  - Some based on what is expected under the model
    - The default in SAS: good only for MCAR data
  - Some based on what is observed from the data
    - Empirical option in SAS: works for MAR data (only for fixed effects)
• Implication: some SEs may be biased if data are MAR
  - May lead to incorrect hypothesis test results
  - Correction needed for likelihood ratio/deviance test statistics
    - Not available in SAS; available for some models in Mplus
When ML Goes Bad…
• For linear models with missing dependent variable(s), PROC MIXED and almost every other stat package work great
  - ML “skips” over the missing DVs in the likelihood function, using only the data you have observed
• For linear models with missing independent variable(s), PROC MIXED and almost every other stat package use listwise deletion
  - Gives biased parameter estimates under MAR
Options for MAR for Linear Models with Missing Independent Variables
1. Use ML estimators and hope for MCAR
2. Rephrase IVs as DVs
  - In SAS: hard to do, but possible for some models
    - Dummy coding, correlated random effects
    - Rely on properties of how correlations/covariances are related to linear model coefficients $\beta$
  - In Mplus: much easier…looks more like a structural equation model
    - Predicted variables then function like DVs in MIXED
3. Impute IVs (multiple times) and then use ML estimators
  - Not usually a great idea…but often the only option
HOW NOT TO HANDLE MISSING DATA
Bad Ways to Handle Missing Data
• Dealing with missing data is important, as the mechanisms you choose can dramatically alter your results
• This point was not fully realized when the first methods for missing data were created
  - Each of the methods described in this section should never be used
  - They are given to show perspective, and to allow you to understand what happens if you were to choose each
Deletion Methods
• Deletion methods are just that: methods that handle missing data by deleting observations
  - Listwise deletion: delete the entire observation if any values are missing
  - Pairwise deletion: delete a pair of observations if either of the values is missing
• Assumptions: data are MCAR
• Limitations:
  - Reduction in statistical power if MCAR
  - Biased estimates if MAR or MNAR
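The difference between the two deletion rules is easiest to see by counting what each one keeps. The sketch below uses a hypothetical three-variable toy data set (not from the lecture), with `None` as an assumed encoding for a missing value; `listwise` and `pairwise` are illustrative helper names.

```python
# Hypothetical toy data: three variables per row; None marks a missing value.
rows = [
    (1.0, 2.0, 3.0),
    (2.0, None, 1.0),
    (3.0, 4.0, None),
    (4.0, 5.0, 6.0),
    (None, 1.0, 2.0),
]

def listwise(rows):
    """Keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r)]

def pairwise(rows, j, k):
    """Keep (x_j, x_k) pairs from rows where both variables j and k are observed."""
    return [(r[j], r[k]) for r in rows if r[j] is not None and r[k] is not None]

print(len(listwise(rows)))        # rows usable under listwise deletion
print(len(pairwise(rows, 0, 1)))  # pairs usable for variables 0 and 1
print(len(pairwise(rows, 0, 2)))  # pairs usable for variables 0 and 2
```

Pairwise deletion keeps more data for each pair, but each covariance is then computed from a different subset of rows, which is one way the resulting covariance matrix can end up non-invertible.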
Listwise Deletion
• Listwise deletion discards all of the data from an observation if one or more variables are missing
• Most frequently used in statistical software packages that are not optimizing a likelihood function (need ML)
• In linear models:
  - SAS GLM listwise deletes cases where IVs or DVs are missing
Pairwise Deletion
• Pairwise deletion discards a pair of observations if either one is missing
  - Different from listwise: uses more data (rest of data not thrown out)
• Assumes: MCAR
• Limitations:
  - Reduction in statistical power if MCAR
  - Biased estimates if MAR or MNAR
• Can be an issue when forming covariance/correlation matrices
  - May make them non-invertible, a problem if used as input into statistical procedures
Single Imputation Methods
• Single imputation methods replace missing data with some type of value
  - Single: one value used
  - Imputation: replace missing data with value
• Upside: can use entire data set if missing values are replaced
• Downside: biased parameter estimates and standard errors (even if missing is MCAR)
  - Type-I error issues
• Still: never use these techniques
Unconditional Mean Imputation
• Unconditional mean imputation replaces the missing values of a variable with its estimated mean
  - Unconditional = mean value without any input from other variables
• Example: missing Oxygen = 47.1; missing RunTime = 10.7; missing RunPulse = 171.9
• Notice: uniformly smaller standard deviations
[Figure panels: Before Single Imputation; After Single Imputation]
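The shrinking standard deviation is easy to reproduce. Since the Oxygen/RunTime/RunPulse data are not included here, this sketch substitutes the performance column from the MCAR example earlier in the lecture, with `None` as an assumed encoding for a missing value.

```python
from statistics import mean, stdev

# Performance scores from the MCAR example; None marks a missing value.
perf = [None, 13, None, 8, 7, 7, 9, 9, 11, None,
        7, 10, 11, 15, 10, None, 12, 14, 16, None]

observed = [v for v in perf if v is not None]
fill = mean(observed)  # unconditional mean of the observed scores
imputed = [fill if v is None else v for v in perf]

print(round(fill, 1))
print(round(stdev(observed), 2), round(stdev(imputed), 2))
```

The mean is unchanged by construction, while the standard deviation drops because every imputed value sits exactly at the center of the distribution.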
Conditional Mean Imputation (Regression)
• Conditional mean imputation uses regression analyses to impute missing values
  - The missing values are imputed using the predicted values in each regression (conditional means)
• For our data we would form regressions for each outcome using the other variables
  - OXYGEN = β01 + β11*RUNTIME + β21*PULSE
  - RUNTIME = β02 + β12*OXYGEN + β22*PULSE
  - PULSE = β03 + β13*OXYGEN + β23*RUNTIME
• More accurate than unconditional mean imputation
  - But still provides biased parameters and SEs
Stochastic Conditional Mean Imputation
• Stochastic conditional mean imputation adds a random component to the imputation
  - Representing the error term in each regression equation
  - Assumes MAR rather than MCAR
• Again, uses regression analyses to impute data:
  - OXYGEN = β01 + β11*RUNTIME + β21*PULSE + Error
  - RUNTIME = β02 + β12*OXYGEN + β22*PULSE + Error
  - PULSE = β03 + β13*OXYGEN + β23*RUNTIME + Error
• Error is random: drawn from a normal distribution
  - Zero mean and variance equal to the residual variance $\sigma_e^2$ for the respective regression
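A minimal sketch of the idea, using a single predictor rather than the three-variable fitness regressions above: performance is regressed on IQ in the complete cases of the MAR example, and each missing score is replaced by its predicted value plus a normal draw with the residual variance. The helper name `fit_simple_regression`, the seed, and the `None` encoding are all assumptions made for illustration.

```python
import random
from statistics import mean

# MAR example data: (IQ, performance); None marks a missing value.
mar = [(78, None), (84, None), (84, None), (85, None), (87, None), (91, 7),
       (92, 9), (94, 9), (94, 11), (96, 7), (99, 7), (105, 10), (105, 11),
       (106, 15), (108, 10), (112, 10), (113, 12), (115, 14), (118, 16), (134, 12)]

def fit_simple_regression(pairs):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, residual variance)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum((x - x_bar) ** 2 for x in xs)
    b0 = y_bar - b1 * x_bar
    resid_var = sum((y - (b0 + b1 * x)) ** 2 for x, y in pairs) / (len(pairs) - 2)
    return b0, b1, resid_var

rng = random.Random(905)  # arbitrary seed so the run is repeatable
complete_cases = [(iq, perf) for iq, perf in mar if perf is not None]
b0, b1, resid_var = fit_simple_regression(complete_cases)

# Conditional mean plus a random error term drawn from N(0, residual variance).
imputed = [(iq, perf if perf is not None else b0 + b1 * iq + rng.gauss(0, resid_var ** 0.5))
           for iq, perf in mar]
print(imputed[:5])
```

Dropping the `rng.gauss(...)` term turns this back into plain conditional mean imputation, where every imputed point falls exactly on the regression line.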
Imputation by Proximity: Hot Deck Matching
• Hot deck matching uses real data from other observations as its basis for imputing
• Observations are “matched” using similar scores on variables in the data set
  - Imputed values come directly from matched observations
• Upside: helps to preserve univariate distributions; gives data in an appropriate range
• Downside: biased estimates (especially of regression coefficients), too-small standard errors
Scale Imputation by Averaging
• In psychometric tests, a common method of imputation has been to use a scale average rather than a total score
  - Can re-scale to total score by taking # items * average score
• Problem: treating missing items this way is like using the person mean
  - Reduces standard errors
  - Makes calculation of reliability biased
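The claim that averaging is "like using the person mean" can be checked with a small sketch. The item responses are hypothetical, `None` is an assumed encoding for an unanswered item, and both function names are made up for illustration.

```python
def rescaled_total(item_responses):
    """Number of items times the average of the answered items."""
    answered = [r for r in item_responses if r is not None]
    return len(item_responses) * sum(answered) / len(answered)

def person_mean_imputed_total(item_responses):
    """Total score after replacing each missing item with the person's own mean."""
    answered = [r for r in item_responses if r is not None]
    person_mean = sum(answered) / len(answered)
    return sum(person_mean if r is None else r for r in item_responses)

responses = [4, 5, None, 3]  # hypothetical four-item scale, third item unanswered
print(rescaled_total(responses), person_mean_imputed_total(responses))
```

Both routes give the same total, which is why scale averaging inherits the problems of person-mean imputation.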
Longitudinal Imputation: Last Observation Carried Forward
• A commonly used imputation method in longitudinal data has been to treat observations that dropped out by carrying forward the last observation
  - More common in medical studies and clinical trials
• Assumes scores do not change after dropout: bad idea
  - Thought to be conservative
• Can exaggerate group differences
  - Limits standard errors that help detect group differences
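For reference, last observation carried forward amounts to the following. This is a sketch with hypothetical scores, shown only to make the "scores do not change after dropout" assumption visible; `locf` is an illustrative helper name and `None` an assumed encoding for a missed wave.

```python
def locf(scores):
    """Replace each missing score (None) with the most recent observed score."""
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)  # stays None until a first score has been observed
    return filled

# Hypothetical person measured at four waves who drops out after wave 2.
print(locf([12, 10, None, None]))
```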
Why Single Imputation Is Bad Science
• Overall, the methods described in this section are not useful for handling missing data
• If you use them you will likely get a statistical answer that is an artifact
  - Actual estimates you interpret (parameter estimates) will be biased (in either direction)
  - Standard errors will be too small
    - Leads to Type-I errors
• Putting this together: you will likely end up making conclusions about your data that are wrong
WHAT TO DO WHEN ML WON'T GO: MULTIPLE IMPUTATION
Multiple Imputation
• Rather than using single imputation, a better method is to use multiple imputation
  - The multiply imputed values will end up adding variability to analyses, helping with biased parameter and SE estimates
• Multiple imputation is a mechanism by which you “fill in” your missing data with “plausible” values
  - End up with multiple data sets, so you need to run multiple analyses
  - Missing data are predicted using a statistical model using the observed data (the MAR assumption) for each observation
• MI is possible due to statistical assumptions
  - The most often used assumption is that the observed data are multivariate normal
Multiple Imputation Steps
1. The missing data are filled in a number of times (say, $m$ times) to generate $m$ complete data sets
2. The $m$ complete data sets are analyzed using standard statistical analyses
3. The results from the $m$ complete data sets are combined to produce inferential results
Distributions: The Key to Multiple Imputation
• The key idea behind multiple imputation is that each missing value has a distribution of likely values
  - The distribution reflects the uncertainty about what the variable may have been
• Multiple imputation can be accomplished using variables outside an analysis
  - All contribute to the multivariate normal distribution
  - Harder to justify why unimportant variables are omitted
• Single imputation, by any method, disregards the uncertainty in each missing data point
  - Results from singly imputed data sets may be biased or have higher Type-I errors
Multiple Imputation in R
• R has two well-known packages for multiple imputation: MICE and Amelia
  - MICE: Multiple Imputation by Chained Equations
  - Amelia: MI via a bootstrapped EM algorithm, assuming multivariate normal
• The MICE package has a number of methods available for multiple imputation
  - We restrict our focus today to those that are called “predictive mean matching” as these are very commonly used in research
IMPUTATION PHASE
Multiple Imputation via Chained Equations
• The key to MI via chained equations is creating a set of univariate regression models
  - One for each variable with missing data
• The process starts by randomly imputing values for all missing observations
• For each variable with missing data, in sequence, a regression is run whereby a predicted mean is created for all observations with missing data
  - The predicted mean comes from applying the sampled values of the posterior distribution of the regression weights (so sampling error doesn't get forgotten)
• Each missing observation is then imputed via one of a host of methods (up next)
• This process continues for all variables with missing data
  - Each time through, newly imputed values aid in the estimation of the betas
• At the end of the process, a number of “imputed” (complete) data sets now exist
Example of MI via CE and PMM
• To demonstrate multiple imputation with chained equations using predictive mean matching we will consider three variables from our oft-used math performance data
  - Performance (PERF), Sex (MALE), and College Credit Hours (CC)
• Amount of missing data in sample:
• We will impute for PERF and for CC
Regressions for PERF and CC
• To make the chained equations work, we need regression models for each variable with missing data:
$$PERF_p = \beta_{0,P} + \beta_{M,P} M_p + \beta_{CC,P} CC_p + \beta_{M*CC,P} M_p CC_p + e_{p,P}$$
$$CC_p = \beta_{0,C} + \beta_{M,C} M_p + \beta_{P,C} P_p + \beta_{M*P,C} M_p P_p + e_{p,C}$$
• Each variable's prediction model can be made up of anything (we'll just use all other variables plus the interaction)
• We'll continue this process in our R example
MI via Predictive Mean Matching
• In the imputation phase, missing values are imputed using matching on the predicted mean (from R)
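The matching step can be sketched as follows, in Python for illustration rather than the R used in class. The sketch assumes predicted means are already available for both the observed and the missing cases; it leaves out the regression fit and the posterior draw of the regression weights described earlier. The function name `pmm_impute`, the donor-pool size `k`, and the toy numbers are all hypothetical.

```python
import random

def pmm_impute(pred_obs, y_obs, pred_mis, k=5, rng=None):
    """Predictive mean matching for one variable.

    For each missing case, find the k observed cases whose predicted means are
    closest to the missing case's predicted mean, then copy the observed value
    of one randomly chosen donor.
    """
    rng = rng or random.Random()
    imputed = []
    for pm in pred_mis:
        donors = sorted(range(len(y_obs)), key=lambda i: abs(pred_obs[i] - pm))[:k]
        imputed.append(y_obs[rng.choice(donors)])
    return imputed

# Hypothetical predicted means and observed values.
pred_obs = [1.0, 2.0, 3.0, 4.0]
y_obs = [10, 20, 30, 40]
pred_mis = [2.1, 3.9]

print(pmm_impute(pred_obs, y_obs, pred_mis, k=1))  # nearest donor only
print(pmm_impute(pred_obs, y_obs, pred_mis, k=3, rng=random.Random(11)))
```

Because every imputed value is borrowed from a real observation, imputations always fall inside the range of the observed data, much like hot deck matching.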
Running MICE
• You don't have to use my clunky R code to make an imputation run…you can use MICE:
• Here are some MICE imputation distributions:
MULTIPLE IMPUTATION: ANALYSIS PHASE
Up Next: Multiple Analyses
• Once you run MICE, the next step is to use each of the imputed data sets in its own analysis
  - Called the analysis phase
  - For our example, that would be 100 times
• The multiple analyses are then compiled and processed into a single result
  - Yielding the answers to your analysis questions (estimates, SEs, and p-values)
• GOOD NEWS: MICE will automate all of this for you
Analysis Phase
• Analysis phase: run the analysis on all imputed data sets
• Syntax runs for each data set (with())
• Model01 now has all 100 models contained within it
MULTIPLE IMPUTATION: POOLING PHASE
Pooling Parameters from Analyses of Imputed Data Sets
• In the pooling phase, the results are pooled and reported
• For parameter estimates, the pooling is straightforward
  - The estimated parameter is the average parameter value across all imputed data sets
• For standard errors, pooling is more complicated
  - Have to worry about sources of variation:
    - Variation from sampling error that would have been present had the data not been missing
    - Variation from sampling error resulting from missing data
Pooling Standard Errors Across Imputation Analyses
• Standard error information comes from two sources of variation from imputation analyses (for $m$ imputations)
• Within-imputation variation:
$$V_W = \frac{1}{m}\sum_{i=1}^{m} SE_i^2$$
• Between-imputation variation (here $\hat{\theta}_i$ is an estimated parameter from an imputation analysis):
$$V_B = \frac{1}{m-1}\sum_{i=1}^{m}\left(\hat{\theta}_i - \bar{\theta}\right)^2$$
• Then, the total sampling variance is:
$$V_T = V_W + V_B + \frac{V_B}{m}$$
• The subsequent (imputation pooled) SE is $SE = \sqrt{V_T}$
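These pooling formulas translate directly into code. The sketch below is written in Python for illustration and is not MICE's implementation; `pool_estimates` is a hypothetical helper, and the three estimates and standard errors are made-up numbers for one parameter across $m = 3$ imputations.

```python
from math import sqrt
from statistics import mean, variance

def pool_estimates(estimates, std_errors):
    """Pool one parameter across m imputed-data analyses.

    Returns the pooled estimate, the pooled SE, and the within (V_W),
    between (V_B), and total (V_T) variance components.
    """
    m = len(estimates)
    theta_bar = mean(estimates)                # average estimate across imputations
    v_w = mean([se ** 2 for se in std_errors]) # within-imputation variance
    v_b = variance(estimates)                  # between-imputation variance (m - 1 divisor)
    v_t = v_w + v_b + v_b / m                  # total sampling variance
    return theta_bar, sqrt(v_t), v_w, v_b, v_t

# Hypothetical results for a single parameter from m = 3 imputed data sets.
theta_bar, pooled_se, v_w, v_b, v_t = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(theta_bar, pooled_se)
```

Note that the pooled SE is larger than any single-analysis SE whenever the estimates disagree across imputations, which is exactly the uncertainty that single imputation ignores.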
Pooling Phase in R: MICE's pool() Function
• MICE has a function that will pool all results (results for using certain classes of models, that is)
• Behold, the results:
Additional Pooling Information
• The decomposition of imputation variance leads to two helpful diagnostic measures about the imputation:
• Fraction of Missing Information:
$$FMI = \frac{V_B + V_B/m}{V_T}$$
  - Measure of influence of missing data on sampling variance
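As a small worked sketch of the FMI formula, using hypothetical variance components rather than values from the lecture example (the function name is also made up):

```python
def fraction_missing_information(v_w, v_b, m):
    """FMI = (V_B + V_B / m) / V_T, where V_T = V_W + V_B + V_B / m."""
    v_t = v_w + v_b + v_b / m
    return (v_b + v_b / m) / v_t

# Hypothetical components: V_W = 0.25, V_B = 1.0, m = 3 imputations.
print(fraction_missing_information(0.25, 1.0, 3))
# With no between-imputation variance, the missing data add nothing.
print(fraction_missing_information(0.25, 0.0, 3))
```

An FMI near 0 says the imputations barely disagree; an FMI near 1 says most of the sampling variance comes from the missing data.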
ISSUES WITH IMPUTATION
Common Issues that can Hinder Imputation
• MCMC convergence
  - Need “stable” mean vector/covariance matrix
• Non-normal data: counts, skewed distributions, categorical (ordinal or nominal) variables
  - Mplus is a good option
  - Some claim it doesn't matter as much with many imputations
• Preservation of model effects
  - Imputation can strip out effects in data
    - Interactions are most difficult: form as auxiliary variable
• Imputation of multilevel data
  - Differing covariance matrices
Number of Imputations
• The number of imputations ($m$ from the previous slides) is important: bigger is better
  - Basically, run as many as you can (100s)
• Take a look at the SEs for our parameters as I varied the number of imputations:

Parameter   m = 1    m = 10   m = 30   m = 100
Intercept   8.722    9.442    9.672    9.558
RunTime     0.366    0.386    0.399    0.389
RunPulse    0.053    0.053    0.057    0.056
WRAPPING UP
Wrapping Up
• Missing data are common in statistical analyses
• They are frequently neglected
  - MNAR: hard to model missing data and observed data simultaneously
  - MCAR: doesn't often happen
  - MAR: most missing data imputation assumes MVN
• More often than not, ML is the best choice
  - Software is getting better at handling missing data