Missing covariate data in matched case-control studies: Do the usual paradigms apply? Bryan Langholz USC Department of Preventive Medicine Joint work with Mulugeta Gebregziabher Larry Goldstein Mark Huberman


Missing covariate data in matched case-control studies: Do the usual paradigms apply?

Bryan Langholz

USC Department of Preventive Medicine

Joint work with Mulugeta Gebregziabher

Larry Goldstein

Mark Huberman


CONTEXT AND SCOPE

• Missing covariate data in nested case-control studies.

• Epidemiologic individually matched case-control data.

• 1:1 (single case-single control).

• Single covariate that has missing values.

• Missing indicator methods.


MISSING INDICATOR METHOD IN THE LITERATURE

Reference | Context | Comment
Jones (1996) JASA 91:222-230. | Least squares regression | “The missing-indicator methods show unacceptably large biases in practical situations and are not advisable in general.”
Vach and Blettner (1991) AJE 134:895-907. | Unmatched case-control studies (logistic) | “Making use of data on subjects who are neglected in complete case analysis by creating an additional category always results in biased estimation.”
Greenland and Finkle (1995) AJE 142:1255-1264. | Unmatched case-control studies (logistic) | “The authors recommend that epidemiologists avoid using the missing-indicator method...”
Li, et al. (2004) AJE 159:603-10. | Individually matched case-control | “...the missing-indicator method should be used cautiously.”


THREADS

• THREAD I: Some background: Missing data in matched pairs.

• THREAD II: Some theory: Missing data induced intensity and models.

• THREAD III: Missing indicator methods: A bad rap?

• THREAD IV: Is MCAR/MAR/NI classification “a square peg in a round hole?”


THREAD I: SOME BACKGROUND

• Motivation: Childhood leukemia study.

• Old work: “Retain complete pairs” method for matched pairs.

• Data analysis.


CASE-CONTROL STUDY OF MAGNETIC FIELDS AND CHILDHOOD LEUKEMIA

• Individually matched population-based case-control study in Los Angeles County.

• Cases identified retrospectively from Los Angeles SEER Registry.

• Controls individually age matched with friends or random digit dial.

• 232 matched pairs.

• Meters used to collect 24-hour magnetic field profile in child’s bedroom.

• Spot measurements.

London et al. (1991) AJE 134:923-937.


LOS ANGELES CHILDHOOD LEUKEMIA STUDY

Information | Cases | Controls | Missing subjects | Complete matched pairs
Interview data | 232 | 232 | 0% | 232
24-hour magnetic field measurements | 164 | 144 | 34% | 108
Spot magnetic field measurement in child’s bedroom | 140 | 109 | 46% | 71


REASONS FOR MISSING DATA

• Most because the family had moved from the “relevant” home, so it was not possible to measure fields.

• Instrument problems.

• Some refusal.


COMPLETE CASE ANALYSIS

• Drop subjects with missing data.

• Analyze as matched case-control study.

Note that:

• Pair is dropped if either case or control is missing.

Large loss of data.


BREAK THE MATCHING

• Drop subjects with missing data.

• Break the matching.

• Analyze as unmatched case-control study; control for confounding by matching factors by modelling.

Note that:

• Can model age but

• Quantification of “friend” or “telephone number” matching not possible.

Potential residual confounding.


SOME PAST WORK: A BETTER METHOD

Compromise between complete case and break the matching:

• Retain matching for those pairs in which both subjects have covariate data.

• Break matching for subjects with data in pairs with “missing data partner.”

Avoids large loss of data (complete case analysis) and minimizes potential uncontrolled confounding (break the matching).


ANALYSIS

• Matched pairs part contributes conditional logistic likelihood part.

• Unmatched part contributes unconditional logistic likelihood part (perhaps control for confounding by modelling matching variables).

Estimation based on product of conditional and unconditional logistic likelihood parts.
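The product likelihood described above can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names, the single exposure coefficient `beta`, and the single `intercept` for the unmatched part (with no terms for matching variables) are assumptions made for the example.

```python
import math

def log_expit(u):
    """Numerically stable log of 1 / (1 + exp(-u))."""
    if u >= 0:
        return -math.log1p(math.exp(-u))
    return u - math.log1p(math.exp(u))

def combined_loglik(beta, intercept, complete_pairs, unmatched):
    """Log of (conditional logistic part) x (unconditional logistic part).

    complete_pairs: list of (x_case, x_control) for pairs with both exposures observed.
    unmatched: list of (d, x) for subjects whose partner is missing (d = 1 case, 0 control).
    """
    ll = 0.0
    # Conditional logistic term for a 1:1 pair:
    # exp(b*x_case) / (exp(b*x_case) + exp(b*x_control)).
    for x_case, x_control in complete_pairs:
        ll += log_expit(beta * (x_case - x_control))
    # Unconditional logistic term for subjects whose matching was broken.
    for d, x in unmatched:
        u = intercept + beta * x
        ll += log_expit(u) if d == 1 else log_expit(-u)
    return ll
```

Maximizing `combined_loglik` over `beta` and `intercept` (for example with a general-purpose optimizer) would give the combined estimate; adding age terms to the unmatched part corresponds to controlling for the quantifiable matching factor by modelling.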


SIMULATION STUDY PARAMETERS

• Three age groups: 50%, 40%, 10%, youngest to oldest.

• Exposure: $X_i \sim N(\mu_s, 1)$, with $\mu_s$ the stratum-specific mean.

• $\mu_s = \mu_{age} + U(-.5, .5)$, where $\mu_{age} = \text{age} = 0$, 1, or 2.

• Age is the quantifiable matching factor; the $U(-.5, .5)$ term represents the unquantifiable one.

• Independent “risk sets” of size 50 were generated for 400 cases.


• Disease status: probability based on rate ratios, multiplicative in age and exposure, with $rr_{age} = 2$, $rr_X = \sqrt{2}$, $\beta_X = .35$.

• A single individual was randomly sampled from the 49 in the risk set to serve as the control in the matched pair set.

• 25% missing X completely at random: 56% of pairs had complete data.

• Results based on 500 trials.
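The data-generating steps above might be sketched as follows. This is a rough reading of the slide, not the authors' simulation code: the function names are hypothetical, each risk set is assumed to share one age group and one unquantifiable shift, and the age distribution of the sets is simply drawn 50/40/10 (the effect of $rr_{age}$ on which sets produce cases is ignored here).

```python
import math
import random

def simulate_pair(rng, beta_x=0.5 * math.log(2.0), p_missing=0.25, set_size=50):
    """Generate one 1:1 matched set; returns (age, x_case, x_control), None = missing."""
    # Assumption: the whole risk set shares one age group, drawn 50/40/10.
    age = rng.choices([0, 1, 2], weights=[0.5, 0.4, 0.1])[0]
    # Stratum-specific mean: age effect plus an unquantifiable U(-.5, .5) shift.
    mu_s = age + rng.uniform(-0.5, 0.5)
    xs = [rng.gauss(mu_s, 1.0) for _ in range(set_size)]
    # Age is constant within the set, so the case is chosen with probability
    # proportional to the exposure rate ratio exp(beta_x * x).
    weights = [math.exp(beta_x * x) for x in xs]
    case = rng.choices(range(set_size), weights=weights)[0]
    # One control sampled at random from the remaining 49 subjects.
    control = rng.choice([i for i in range(set_size) if i != case])

    def observe(x):
        # MCAR: each exposure is deleted independently with probability p_missing.
        return None if rng.random() < p_missing else x

    return age, observe(xs[case]), observe(xs[control])

def fraction_complete(n_pairs=400, seed=1):
    """Share of simulated pairs in which both exposures are observed."""
    rng = random.Random(seed)
    pairs = [simulate_pair(rng) for _ in range(n_pairs)]
    complete = sum(1 for _, xc, x0 in pairs if xc is not None and x0 is not None)
    return complete / n_pairs
```

Two of the slide's numbers follow directly from the setup: $\log\sqrt{2} \approx 0.35$ is the value of $\beta_X$, and with 25% of values missing independently the expected share of complete pairs is $0.75^2 \approx 56\%$.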


SIMULATION RESULTS

Method | Mean parameter | Empirical s.e. | Estimated s.e. | √MSE
No missing | 0.35 | 0.080 | 0.075 | 0.075
Complete case | 0.35 | 0.098 | 0.101 | 0.101
Break the matching: pooled over age | 0.19 | 0.052 | 0.061 | 0.171
Break the matching: age stratified | 0.26 | 0.065 | 0.073 | 0.114
Retain matching in complete pairs: pooled over age | 0.29 | 0.078 | 0.077 | 0.099
Retain matching in complete pairs: age-stratified | 0.32 | 0.082 | 0.083 | 0.087
Retain matching in complete pairs: age-trend | 0.31 | 0.081 | 0.082 | 0.088


“RETAIN THE MATCHING” METHOD

• Efficiency is at least as good as complete-case analysis.

• Bias is never as bad as breaking the matching.

Good compromise between the two most commonly used alternatives.


DATA ANALYSIS: FITTING THE “RETAIN THE MATCHING” METHOD

Missing indicator analysis is exactly the “retain the matching in complete pairs” analysis.

Huberman and Langholz (1999) AJE 150:1340-1345.
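One way to see the equivalence is a short calculation. As simplifying assumptions for illustration only, take a single exposure with log-linear rate ratio $e^{\beta x}$ and a single missing-indicator coefficient $\gamma$, with the missing exposure coded 0, so that a subject's term in the conditional logistic likelihood is $e^{\beta x (1-m) + \gamma m}$. Writing $x_1$ for the case and $x_0$ for the control, the four types of pair contribute:

```latex
\begin{align*}
\text{both observed:}\quad
  & \frac{e^{\beta x_1}}{e^{\beta x_1}+e^{\beta x_0}}
  && \text{(usual conditional logistic term)}\\
\text{control missing:}\quad
  & \frac{e^{\beta x_1}}{e^{\beta x_1}+e^{\gamma}}
    = \frac{e^{-\gamma+\beta x_1}}{1+e^{-\gamma+\beta x_1}}
  && \text{(unconditional logistic term for a case, intercept } -\gamma)\\
\text{case missing:}\quad
  & \frac{e^{\gamma}}{e^{\gamma}+e^{\beta x_0}}
    = \frac{1}{1+e^{-\gamma+\beta x_0}}
  && \text{(unconditional logistic term for a control, intercept } -\gamma)\\
\text{both missing:}\quad
  & \frac{e^{\gamma}}{e^{\gamma}+e^{\gamma}} = \tfrac{1}{2}
  && \text{(constant, carries no information)}
\end{align*}
```

Under these assumptions, complete pairs keep their matching, subjects with a missing partner contribute unconditional logistic terms with intercept $-\gamma$, and letting the indicator coefficient vary with age corresponds to age-stratified or age-trend intercepts in the unmatched part.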


MISSING INDICATOR ANALYSIS DATASET

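C-C set | Case (1) / control (0) | Age group | Exposure Z (0 if missing) | Missing indicator | Age-stratified MI, age=1 (M1) | age=2 (M2) | age=3 (M3) | Age-trend (Mtrend)
38 | 1 | 3 | 2.99688 | 0 | 0 | 0 | 0 | 0
38 | 0 | 3 | 0.00000 | 1 | 0 | 0 | 1 | 3
39 | 1 | 3 | 2.91090 | 0 | 0 | 0 | 0 | 0
39 | 0 | 3 | 0.00000 | 1 | 0 | 0 | 1 | 3
40 | 1 | 2 | 0.92814 | 0 | 0 | 0 | 0 | 0
40 | 0 | 2 | 0.00000 | 1 | 0 | 1 | 0 | 2
41 | 1 | 1 | 0.62488 | 0 | 0 | 0 | 0 | 0
41 | 0 | 1 | 0.90149 | 0 | 0 | 0 | 0 | 0
42 | 1 | 1 | 0.51196 | 0 | 0 | 0 | 0 | 0
42 | 0 | 1 | -0.76174 | 0 | 0 | 0 | 0 | 0
43 | 1 | 3 | 2.48709 | 0 | 0 | 0 | 0 | 0
43 | 0 | 3 | 0.00000 | 1 | 0 | 0 | 1 | 3
44 | 1 | 3 | 2.65434 | 0 | 0 | 0 | 0 | 0
44 | 0 | 3 | 0.00000 | 1 | 0 | 0 | 1 | 3
45 | 1 | 2 | 0.00000 | 1 | 0 | 1 | 0 | 2
45 | 0 | 2 | 0.00000 | 1 | 0 | 1 | 0 | 2
46 | 1 | 1 | 0.00000 | 1 | 1 | 0 | 0 | 1
46 | 0 | 1 | 0.00000 | 1 | 1 | 0 | 0 | 1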
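The coding in this dataset can be reproduced with a small helper. The sketch below is illustrative only: the function name, the record layout `(set_id, case, age, x)`, the use of `None` for a missing exposure, and the dictionary keys are all assumptions, chosen to mirror the columns shown above.

```python
def missing_indicator_rows(records, n_age_groups=3):
    """Build missing-indicator covariates for matched case-control records.

    records: iterable of (set_id, case, age, x); x is None when exposure is missing
             and age is coded 1..n_age_groups.
    """
    rows = []
    for set_id, case, age, x in records:
        missing = x is None
        row = {
            "set": set_id,
            "case": case,
            "age": age,
            "z": 0.0 if missing else x,   # exposure, set to 0 when missing
            "m": 1 if missing else 0,     # single missing indicator
        }
        # Age-stratified indicators: one per age group, switched on only if missing.
        for g in range(1, n_age_groups + 1):
            row[f"m{g}"] = 1 if (missing and age == g) else 0
        # Age-trend indicator: the age score if missing, 0 otherwise.
        row["mtrend"] = age if missing else 0
        rows.append(row)
    return rows

example = missing_indicator_rows([
    (38, 1, 3, 2.99688),
    (38, 0, 3, None),
    (46, 1, 1, None),
])
```

The resulting columns would then be entered as ordinary covariates in standard conditional logistic regression software, stratified on `set`: `z` with `m` for the pooled analysis, `z` with `m1`-`m3` for the age-stratified analysis, or `z` with `m` and `mtrend` for the age-trend analysis (the last grouping is a guess at how the trend model was specified).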


THREAD II: SOME THEORY

• Develop a theory for predictable missingness (doesn’t depend on case-control status).

– Consider complete data cohort intensity (rates) under proportional hazards.

– Use the double expectation formula to derive general form of missing data “induced” intensity.

– Extend the proportional hazards model to encompass the possible form(s) of the induced intensity.

– Propose a semi-parametric version of this model.

– Fitting the model.

– Complete case analysis.

• Adapted (depends on case-control status) missingness.


KEY ASSUMPTION FOR THIS PART

• I assume that missingness is predictable: it does not depend on case-control status.


COHORT - CASE-CONTROL STUDIES

Key point (Ørnulf’s talk):

• If the true intensity is a member of the specified proportional hazards model, then the partial likelihood(s) provide valid estimation of $\beta_0$.

• Works for both Cox (for full cohort) and conditional logistic (for individually matched case-control data) regression.


INTENSITY FOR COMPLETE COVARIATE DATA

• Let $(N_i, Y_i, Z_i)$ be the counting, censoring, and covariate processes (i.i.d.).

• $\mathcal{Z}$: the corresponding filtration.

• Then the “complete data” $\mathcal{Z}$-intensity is

$$\lambda_i(t;\mathcal{Z})\,dt = E[dN_i(t) \mid \mathcal{Z}_{t-}]$$

Andersen et al. (1992) Statistical Models Based on Counting Processes, Springer.


PROPORTIONAL HAZARDS MODEL FOR COMPLETE DATA

• Semi-parametric model for the rate of disease as a function of factors (e.g., radiation, genes).

• $\lambda(t, y, z; \alpha(\cdot), \beta) = y\,\alpha(t)\,r(z;\beta)$.

– $\alpha$: rate of disease in “unexposed” as a function of time (nuisance function).

– $r$: “rate ratio” as a function of $z$.

– $\beta$: rate ratio parameters to be estimated.

• Assume that there exist $\lambda_0$, $\beta_0$ such that $\lambda_i(t;\mathcal{Z}) = Y_i(t)\,\lambda_0(t)\,r(Z_i(t);\beta_0)$.


DATA MODEL FOR (PREDICTABLE) MISSING DATA

• $Z_i(t)$: covariate with no missing data.

• $X_i(t)$: single covariate with missing data.

• $M_i(t)$: missing indicator, independent and left continuous.

• Representation for missing covariate data: $(N_i, Y_i, (1-M_i)X_i, M_i, Z_i)$.

• $\mathcal{M}$: filtration generated by $\{N_i(t), Y_i(t), [1-M_i(t)]X_i(t), M_i(t), Z_i(t)\}$.

Note: $M_i$ is $\mathcal{M}$-predictable.


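INTENSITY PROCESS FOR (PREDICTABLE) MISSING DATA

• The “Innovation Theorem” says that the $\mathcal{M}$ (missing data)-intensity is

$$\lambda_i(t;\mathcal{M})\,dt = E[dN_i(t) \mid \mathcal{M}_{t-}] = E[\lambda_i(t;\mathcal{Z}) \mid \mathcal{M}_{t-}]\,dt$$

(Double expectation formula for processes)

• Proportional hazards:

$$\lambda_i(t;\mathcal{M}) = Y_i(t)\,\lambda_0(t)\,E[r(X_i(t), Z_i(t);\beta_0) \mid \mathcal{M}_{t-}]$$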


MISSING DATA INDUCED INTENSITY

Assumptions:

• Distribution of X only depends on summaries of other factors at t.

• Recall i.i.d.

Define:

• $F_X(\cdot; t, z; \beta_0, \lambda_0)$: distribution function for X conditional on $Y(t) = 1$, $M(t) = 1$, and $Z(t) = z$.


MISSING DATA INDUCED INTENSITY

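$$\lambda_i(t;\mathcal{M}) = Y_i(t)\,\lambda_0(t)\,r(X_i(t), Z_i(t);\beta_0)^{1-M_i(t)} \times \eta[t, Z_i(t);\beta_0, \lambda_0, F_X(\cdot; t, Z_i(t);\beta_0, \lambda_0)]^{M_i(t)}$$

where

$$\eta[t, z;\beta_0, \lambda_0(\cdot), F_X(\cdot; t, z;\beta_0, \lambda_0)] = E[r(X, z;\beta_0) \mid Y(t) = 1, M(t) = 1] = \int r(x, z;\beta_0)\,dF_X(x; t, z;\beta_0, \lambda_0).$$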


MISSING DATA INDUCED MODEL

• We will need to expand the complete data model with a component to accommodate $\eta$.

• In (very) general form, the model would look like:

$$\lambda(t, y, (x, z), m; \alpha, \beta) = y\,\alpha(t)\,r(x, z;\beta)^{1-m}\,\eta[t, z;\beta, \alpha, F_\psi(\cdot; t, z;\beta, \alpha)]^{m}$$

where

$$\eta(t, z;\beta, \alpha) = \int r(x, z;\beta)\,dF_\psi(x; t, z;\beta, \alpha)$$

and $F_\psi$ is a family of distribution functions.


“MODELLING THE MISSINGNESS”

• Modelling the missingness is modelling $\eta$, the “expected rate ratio.”

• $\eta(t, z;\beta, \alpha, F_\psi(\cdot; t, z;\beta, \alpha))$ depends on $\beta$, $\alpha$ and $F_\psi$.

• Missing data methods for failure time data can be seen as different ways of structuring $\eta$.

• Prentice (1982) assumes a distributional form, $X \mid Z \sim N(\mu_Z, \sigma^2)$, and takes the expectation as a function of $\beta$.

Prentice (1982) Biometrika 69:331-342.
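As a worked illustration of what “structuring $\eta$” can mean, suppose (assumptions for this example, not stated on the slide) that the rate ratio is log-linear, $r(x, z;\beta) = \exp(\beta_X x + \beta_Z z)$, and that among at-risk subjects with missing data $X$ given $Z = z$ is $N(\mu_z, \sigma^2)$. The normal moment generating function then gives the expected rate ratio in closed form:

```latex
\eta(z;\beta)
  = \int e^{\beta_X x + \beta_Z z}\, dF_\psi(x; z)
  = \exp\!\left(\beta_Z z + \beta_X \mu_z + \tfrac{1}{2}\beta_X^2 \sigma^2\right).
```

If, in addition, $\mu_z = \psi_0 + \psi_1 z$, then $\log \eta = \beta_Z z + (\beta_X\psi_0 + \tfrac12\beta_X^2\sigma^2) + \beta_X\psi_1 z$: a missing-indicator main effect plus a missing-indicator-by-$z$ interaction, which suggests why missing indicator-covariate interaction terms are natural candidates for capturing the variation in $\eta$.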


A SEMI-PARAMETRIC APPROACH

Model that encompasses the “missing data” induced intensity under proportional hazards:

$$\lambda(t, y, (x, z), m; \alpha, \beta) = y\,\alpha(t)\,r(x, z;\beta)^{1-m}\,\eta(t, z;\gamma)^{m},$$

where $\eta(t, z;\gamma)$ corresponds to the “expected rate ratio” $E[r(X, Z;\beta) \mid Y = 1, M = 1, Z = z]$.

• Since the model is still proportional hazards, standard partial likelihood methods are valid for both cohort and case-control data.

• Need to capture the variation in expected rate ratio $\eta$ with $t$ and (known) modelled covariates $Z$.


SOME OBSERVATIONS

• “Non-parametric” with respect to distribution of X (just like complete data model).

• Probably don’t need to model $\eta$ too well as it is a nuisance parameter in the model.


DATA ANALYSIS: PARTIAL LIKELIHOOD FOR THE SEMI-PARAMETRIC MISSINGNESS MODEL

“Missing indicator” method.


COMPLETE CASE ANALYSIS

Theorem: If $\eta(t, z;\gamma) = \gamma(t, z)$ (“unstructured”), then the partial likelihood estimator is the complete case analysis.

(Pseudo) Proof:

$$y\,\alpha(t)\,r(x, z;\beta)^{1-m}\,\gamma(t, z)^{m} = \begin{cases} y\,\alpha(t)\,r(x, z;\beta) & \text{if } m = 0 \\ y\,\bar\gamma(t, z) & \text{if } m = 1 \end{cases}$$

where $\bar\gamma(t, z) = \alpha(t)\gamma(t, z)$ captures the disease rate in subjects with missing data.

• “Missing data” stratified model.

• PL contribution from subjects with missing data is constant, so “dropped” from analysis.

• Complete case analysis.


CONSEQUENCES OF THE THEOREM

• Complete case analysis is the “infinite parameter” missing indicator-t-Z interaction model.

• Complete case analysis is the non-parametric limiting case of missing indicator modelling.

• Spectrum:

– Single parameter - most bias, least variance.

– Complete case - least bias, most variance.

Model missingness with “enough but not too many” parameters.


SIMULATION STUDY: PREDICTABLE MISSINGNESS

• X - Dichotomous, can be missing (RR=2).

• Z - Three levels, no missing (RR=4/level).

• 1:1 matched data (as before), matching factor is not a confounder.

• X-missingness dependence (predictable): none, Z, X, Z−X.

• Missing data methods:

– Complete case.

– Z-specific missing indicators (right model).

• Results based on 500 trials.


SIMULATION RESULTS: X MISSING IN 50%: BIAS

Missing X dependence | $E\hat\beta_X$, complete case | $E\hat\beta_X$, missing indicators | $E\hat\beta_Z$, complete case | $E\hat\beta_Z$, missing indicators
none | 0.72 | 0.71 | 1.44 | 1.41
Z | 0.72 | 0.69 | 1.44 | 1.42
X | 0.73 | 0.71 | 1.44 | 1.42
Z−X | 0.71 | 0.70 | 1.43 | 1.42

No missing: $E\hat\beta_X = 0.69$, $E\hat\beta_Z = 1.40$.


SIMULATION RESULTS: X MISSING IN 50%: VARIANCE

Missing X dependence | var $\hat\beta_X$, complete case | var $\hat\beta_X$, missing indicators | var $\hat\beta_Z$, complete case | var $\hat\beta_Z$, missing indicators
none | 0.138 | 0.064 | 0.068 | 0.029
Z | 0.121 | 0.062 | 0.081 | 0.040
X | 0.129 | 0.064 | 0.062 | 0.027
Z−X | 0.105 | 0.061 | 0.049 | 0.026

Variance well estimated by usual inverse information estimator.


ADAPTED MISSINGNESS

Definition:

• Missingness depends on case-control status.


ADAPTED MISSINGNESS SIMULATION RESULTS: X MISSING IN 50%: BIAS

Missing X dependence | $E\hat\beta_X$, complete case | $E\hat\beta_X$, missing indicators | $E\hat\beta_Z$, complete case | $E\hat\beta_Z$, missing indicators
D | 0.72 | 0.71 | 1.46 | 1.43
D−Z | 0.72 | 0.72 | 0.70 | 0.69
D−X | -0.26 | -0.24 | 1.46 | 1.42


THEORY FOR ADAPTED MISSINGNESS: SECOND STAGE SAMPLING

• $r_1$: case-control set.

• $r_2$: subjects in the case-control set with complete data (“2nd stage sample”).

• $N_{i,r_1,r_2}$: counts $(i, r_1, r_2)$ “events.”

• $\pi(r_2 \mid i, r_1)$: probability of complete data ($r_2$) given case-control status.

Proportional hazards:

$$\lambda_{i,r_1,r_2}(t) = Y_i\,\lambda_0(t)\,r(X_i, Z_i;\beta_0)\,\pi(r_1 \mid i)\,\pi(r_2 \mid i, r_1)$$


CONJECTURE: PREDICTABLE VS. ADAPTED MISSINGNESS

• Predictable (missingness does not depend on case-control status): can estimate all parameters.

• Adapted (missingness depends on case-control status): can’t estimate all parameters. (With D-dependent missingness, can’t estimate $\lambda_0$.)

Conjecture: To estimate all parameters in the model, predictability is sufficient and necessary.


THREAD III: HAVE MISSING INDICATOR METHODS GOTTEN A BAD RAP?


SO, IS EVERYONE ELSE WRONG ABOUT MISSING INDICATORS?

Two considerations:

• Lack of understanding that one needs to “model the missingness”: model the variation in expected rate ratio.

• Something special about the individually matched case-control setting.


“FULL COHORT” ANALYSIS

Method | Mean parameter | Empirical s.e.
No missing | 0.34 | 0.05
Complete case | 0.34 | 0.11
Missing indicator model: single MI | 0.24 | 0.09
Missing indicator model: Z-MI interaction | 0.34 | 0.11
Missing indicator model: Z-MI trend | 0.32 | 0.11

80% missing, MCAR.


LIKELY SITUATION: WE’RE ALL “RIGHT”

• In other settings, is there less (no?!?) efficiency gain from using missing indicators?

– Survival analysis.

– Single stratum unmatched case-control studies.

Need to evaluate each setting.


THREAD IV: “CLASSICAL” MCAR/MAR/NI CLASSIFICATION OF MISSINGNESS

Reference | Method
Paik and Sacco (2000) | Regression calibration (imputation)
Rathouz, et al. (2002) | Missing data likelihood
Paik (2004) | Regression calibration (imputation)

All assume MAR.


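SIMULATION RESULTS: X MISSING IN 50%: BIAS

Missing X dependence | Missingness type | $E\hat\beta_X$, complete case | $E\hat\beta_X$, missing indicators | $E\hat\beta_Z$, complete case | $E\hat\beta_Z$, missing indicators
Predictable missingness:
none | MCAR | 0.72 | 0.71 | 1.44 | 1.41
Z | MAR | 0.72 | 0.69 | 1.44 | 1.42
X | NI | 0.73 | 0.71 | 1.44 | 1.42
Z−X | NI | 0.71 | 0.70 | 1.43 | 1.42
Adapted missingness:
D | MAR | 0.72 | 0.71 | 1.46 | 1.43
D−Z | MAR | 0.72 | 0.72 | 0.70 | 0.69
D−X | NI | -0.26 | -0.24 | 1.46 | 1.42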


CLASSICAL CLASSIFICATIONS: UNBIASEDNESS OF COMPLETE CASE OR MISSING INDICATOR

• MCAR OK, but too strong (sufficient but not necessary).

• MAR can be unbiased (e.g., MAR(Z)) or biased (e.g., MAR(D−Z)).

• NI can be unbiased (e.g., NI(X)) or biased (e.g., NI(D−X)).

Not very useful for predicting validity of methods.


IS MCAR/MAR REASONABLE?

Even if asked prior to disease occurrence, missingness of the following is likely to depend on the actual value (NI):

• Did you smoke during pregnancy?

• What is your current income?

• Radiation dose as abstracted from medical records.

Better not to make an MCAR/MAR assumption.


MULTIPLE IMPUTATION APPLIED TO MATCHED PAIRS: SIMULATION STUDY

• X and Z covariates, X with missing.

• Used multiple imputation (SAS PROCs MI, PHREG, and MIANALYZE).

• Imputation model has case-control status and Z.

• Only looked at performance for estimating βX = .69.
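For readers unfamiliar with the final pooling step, the combination of per-imputation fits is conventionally done with Rubin's rules, which is what PROC MIANALYZE applies. The Python sketch below only illustrates that arithmetic; the function name and the numbers in the usage line are hypothetical and are not taken from the simulation.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine m >= 2 per-imputation estimates with Rubin's rules.

    estimates: point estimates of one coefficient, one per imputed dataset.
    variances: the corresponding squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total

# Hypothetical values for three imputed datasets.
pooled_beta, pooled_var = pool_rubin([0.6, 0.7, 0.8], [0.04, 0.05, 0.06])
```

In the workflow on this slide, each imputed dataset would be analysed by conditional logistic regression (PROC PHREG stratified on matched set) before pooling.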


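MISSING INDICATOR AND MULTIPLE IMPUTATION

Missing dependence | Missingness type | Missing indicators: $\hat\beta_X$ | Missing indicators: var $\hat\beta_X$ | Multiple imputation: $\hat\beta_X$ | Multiple imputation: var $\hat\beta_X$
none | MCAR | 0.71 | 0.26 | 0.70 | 0.27
Z | MAR | 0.71 | 0.27 | 0.71 | 0.25
X | NI | 0.70 | 0.27 | 0.70 | 0.28
Z−X | NI | 0.70 | 0.26 | 0.69 | 0.28
D | MAR | 0.70 | 0.27 | 0.72 | 0.30
D−Z | MAR | 0.74 | 0.27 | 0.70 | 0.28
D−X | NI | 1.06 | – | 2.04 | –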


MISSING INDICATOR AND MULTIPLE IMPUTATION FOR 1:1 DATA

Comparison:

• Performance of multiple imputation, including precision, very similar to missing indicators.

• Both methods require “modelling missingness.”

Classical classification and multiple imputation:

• MCAR/MAR/NI not relevant.


CLOSING THOUGHTS


MISSING INDICATOR FOR INDIVIDUALLY MATCHED CASE-CONTROL STUDIES

• Standard conditional logistic software using standard modelling techniques.

• Predictability is key assumption for valid estimation.

• Need to model the variation in the expected rate ratio (missing indicator-covariate interactions).


GENERAL COMMENTS

• Classical classification of missingness was not very useful in assessing estimator behavior or in guiding the method of analysis. Some research to be done here?

• Predictable vs. adapted classifies well and is closely associated with the concept of information bias.

• Adding MAR or MCAR assumptions about missing type may yield estimators with better efficiency.

• MAR and MCAR should be evaluated carefully.
