
via C. Battisti, 241 - 35121 Padova, Italia - tel 0498274168 - fax 0498274170

Doctoral School in Statistical Sciences (Scuola di Dottorato di Ricerca in Scienze Statistiche), XXII cycle

Seminar series on

MISCLASSIFICATION AND MEASUREMENT ERROR IN REGRESSION MODELS

given by

Prof. Helmut Küchenhoff, Ludwig-Maximilians-Universität München

Schedule

Tuesday 2 October 2007: 10.00-12.00, Aula Uggè; 14.30-16.30, Aula Cucconi

Wednesday 3 October 2007: 10.00-12.00, Aula Cucconi; 14.30-16.30, Aula Cucconi

http.//www.stat.unipd.it/seminaridottorato

Prof. Alessandra Salvan, Director of the School

MISCLASSIFICATION AND MEASUREMENT ERROR IN REGRESSION MODELS

Short course at DIPARTIMENTO DI SCIENZE STATISTICHE

Università degli Studi di Padova

Helmut Küchenhoff
Statistical Consulting Unit

Ludwig-Maximilians-Universität München

Padova, 2./3.10.2007

Schedule

Tuesday
10-12: 1. Misclassification: Basic Models
14.30-16.30: 2. Measurement error: Effect and Models

Wednesday
10-12: 3. Methods for Estimation in the presence of Measurement error
14.30-16.30: 4. Simulation and Extrapolation (SIMEX) for Misclassification and measurement error: Concept and Examples

Padova 10/2007 1


Material

• Carroll, R. J., D. Ruppert, L. Stefanski and C. Crainiceanu: Measurement Error in Nonlinear Models. A Modern Perspective. Chapman & Hall, London 2006.

• Kuha, J., C. Skinner and J. Palmgren: Misclassification error. In: Encyclopedia of Biostatistics, ed. by Armitage, P. and Colton, T., 2615-2621. Wiley, Chichester.

Outline 1. Misclassification

• Examples

• One sample
  – Model
  – Effect
  – Correction

• Bivariate analysis
  – Error in disease status
  – Error in exposure status
  – Methods for correction

• Regression
  – Misclassification in binary response
  – Misclassification in regressors

Outline 2. Measurement error: Models and effect

• Examples

• Models for the error

• Effect of measurement error

  – Response error
  – Linear model
  – Logistic model
  – Survival

Outline 3. Methods for correction

• Direct correction, method of moments

• Regression calibration

• Likelihood

• Quasi likelihood

• Bayes

• Corrected score


Outline 4. SIMulation and EXtrapolation

• SIMEX for measurement error

• Misclassification SIMEX


1. Misclassification: Examples

• Wrong diagnosis: "Not diseased" instead of "diseased"

• Wrong answer in a questionnaire: "No drugs", "Do not smoke"

• Technical problems, e.g. classification of genes

• Problem of definition, e.g. caries

• Randomized response

• Anonymisation of data

Notation

We have to distinguish between the true (correctly measured) variable and its (possibly incorrect) measurement.

X, W, Z notation (Carroll et al.)
X: correctly measured (unobservable) variable
W: possibly incorrect measurement of X
Z: further correctly measured variables

ξ, X notation (Schneeweiß, Fuller)
ξ: correctly measured (unobservable) variable
X: possibly incorrect measurement of ξ

* notation (HK)
X, Z: correctly measured (unobservable) variables
X*, Z*: corresponding possibly incorrect measurements

1.2 One sample binary

Model for misclassification

X: true binary variable, gold standard
X*: observed value of X, surrogate

$$P(X^* = 1 \mid X = 1) = \pi_{11} \quad \text{(sensitivity)}$$

$$P(X^* = 0 \mid X = 0) = \pi_{00} \quad \text{(specificity)}$$

$$P(X^* = 0 \mid X = 1) = 1 - \pi_{11} = \pi_{01}$$

$$P(X^* = 1 \mid X = 0) = 1 - \pi_{00} = \pi_{10}$$

→ misclassification matrix

$$\Pi = \begin{pmatrix} \pi_{00} & \pi_{01} \\ \pi_{10} & \pi_{11} \end{pmatrix}$$

Effect of misclassification

Naive analysis: simply ignore misclassification.

We want to estimate $P(X = 1)$. We use $\frac{1}{n}\sum_{i=1}^{n} X_i^*$.

$$P(X^* = 1) = \pi_{11} P(X = 1) + \pi_{10} P(X = 0)$$

$$P(X^* = 1) - P(X = 1) = \pi_{10} P(X = 0) - \pi_{01} P(X = 1)$$

→ Examples:

No bias if $P(X = 1) = 1/2$ and $\pi_{00} = \pi_{11}$

Neg. bias if $P(X = 1) = 0.9$ and $\pi_{00} = \pi_{11} = 0.9$ → Bias $= -0.1 \cdot 0.9 + 0.1 \cdot 0.1 = -0.08$

Pos. bias if $P(X = 1) = 0.8$ and $\pi_{11} = 0.99$, $\pi_{00} = 0.9$ → Bias $= -0.01 \cdot 0.8 + 0.1 \cdot 0.2 = 0.012$
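The bias of the naive estimate can be checked numerically. The following is a minimal Python sketch of the formula $P(X^* = 1) - P(X = 1) = \pi_{10} P(X = 0) - \pi_{01} P(X = 1)$; the function name `naive_prevalence_bias` and its arguments are illustrative and not part of the course material.

```python
def naive_prevalence_bias(p, sens, spec):
    """Bias of the naive estimate, P(X*=1) - P(X=1).

    p    : true prevalence P(X=1)
    sens : sensitivity pi_11
    spec : specificity pi_00
    """
    # P(X*=1) = pi_11 * P(X=1) + pi_10 * P(X=0), with pi_10 = 1 - pi_00
    p_star = sens * p + (1.0 - spec) * (1.0 - p)
    return p_star - p


# The three situations from the slide: no, negative and positive bias.
for p, sens, spec in [(0.5, 0.9, 0.9), (0.9, 0.9, 0.9), (0.8, 0.99, 0.9)]:
    print(p, sens, spec, round(naive_prevalence_bias(p, sens, spec), 4))
```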

Effect of Misclassification

Everything can happen, depending on $\pi_{11}$, $\pi_{00}$ and $P(X = 1)$.

If $\pi_{00} = \pi_{11}$ (unrealistic in most cases) then

$$\text{Bias} = (1 - \pi_{00})\,(1 - 2P(X = 1))$$

$P(X = 1) > 0.5 \Longrightarrow$ bias $< 0$

$P(X = 1) < 0.5 \Longrightarrow$ bias $> 0$

Attenuation towards 0.5

Correction

Idea: solve the bias equation. Note that $X^*$ is still binomial and $P(X^* = 1)$ can be consistently estimated from the observed data.

$$P(X^* = 1) = \pi_{11} P(X = 1) + \pi_{10}\,(1 - P(X = 1))$$

$$\Rightarrow P(X = 1) = \frac{P(X^* = 1) - \pi_{10}}{\pi_{11} + \pi_{00} - 1}$$

Assumptions

• $\pi_{11}$ and $\pi_{00}$ known

• $\pi_{11} + \pi_{00} > 1$

Variance factor $(\pi_{11} + \pi_{00} - 1)^{-2}$
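As an illustration of the correction, here is a minimal Python sketch that inverts the bias equation for known sensitivity and specificity. The function names are hypothetical, and the check `sens + spec > 1` simply mirrors the assumption stated on the slide.

```python
def corrected_prevalence(p_star_hat, sens, spec):
    """Solve P(X*=1) = pi_11 * p + pi_10 * (1 - p) for p = P(X=1)."""
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("the correction assumes sens + spec > 1")
    # pi_10 = 1 - specificity
    return (p_star_hat - (1.0 - spec)) / denom


def variance_factor(sens, spec):
    """Factor by which the variance of the naive estimate is multiplied."""
    return (sens + spec - 1.0) ** -2


# Round trip: start from a true prevalence, misclassify, then correct.
p_true, sens, spec = 0.3, 0.9, 0.8
p_star = sens * p_true + (1.0 - spec) * (1.0 - p_true)
p_back = corrected_prevalence(p_star, sens, spec)
```

Note that the corrected value can fall outside [0, 1] when the observed proportion is smaller than $\pi_{10}$ or larger than $\pi_{11}$.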

Multinomial case

$X$ is multinomial with categories $1, \ldots, k$.

$X^*$ is observed.

The error model is given by the misclassification matrix

$$\Pi = \{\pi_{ij}\}$$

with $\pi_{ij} = P(X^* = i \mid X = j)$. Then we get for the probability vectors

$$P_X = (p_{x1}, \ldots, p_{xk})'$$

$$P_{X^*} = (p^*_{x1}, \ldots, p^*_{xk})'$$

$$P_{X^*} = \Pi \, P_X$$

The matrix method

The correction method is given by

$$P_X := \Pi^{-1} P_{X^*} \qquad (1)$$

Properties

• Misclassification matrix has to be known or estimated

• Sometimes gives probabilities > 1 or < 0

• Variance calculation is straightforward

• Use the delta method in the case of estimated $\Pi$, Greenland (1988)
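The matrix method can be sketched in a few lines of Python with NumPy. The 3-category matrix `Pi` and the probability vectors below are made-up illustration values, not data from the course; `Pi[i, j]` stands for $P(X^* = i \mid X = j)$, so each column sums to one.

```python
import numpy as np


def matrix_method(p_star, Pi):
    """Return the solution p of Pi @ p = p_star (equation (1))."""
    return np.linalg.solve(np.asarray(Pi, dtype=float),
                           np.asarray(p_star, dtype=float))


# Hypothetical misclassification matrix, columns sum to 1.
Pi = np.array([[0.9, 0.1, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.1, 0.9]])

# Round trip: true probabilities -> observed -> corrected.
p_true = np.array([0.5, 0.3, 0.2])
p_star = Pi @ p_true
p_back = matrix_method(p_star, Pi)

# An observed vector that is incompatible with Pi yields values
# outside [0, 1], as noted in the properties on the slide.
p_odd = matrix_method([0.05, 0.90, 0.05], Pi)
```

`np.linalg.solve` is used instead of forming the inverse explicitly; the result is the same as applying $\Pi^{-1}$.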

Prevalence estimation from the Signal-Tandmobiel study

• Oral health study involving 4468 children in Flanders

• Y = 1 if the tooth is decayed, missing due to caries or filled

• 16 examiners with high misclassification (MC) on Y

• Validation study also used for two regions

• Validation data from 3 validation studies

• Simple correction in two regions: East and West

[Figure: two points per year for the years 1996 to 2001; vertical axis from 0.1 to 0.6.]

[Figure: second plot with the same layout, two points per year for the years 1996 to 2001; vertical axis from 0.1 to 0.6.]

Results

Estimated prevalence using data from the validation study

• Corrections

• Huge confidence limits

• MC Matrix possibly overestimated


Information about misclassification

There are three basic strategies:

• Assumption, external validation data

• Internal validation data

• Replication data


Assumption, external validation data

Examples

• Certain type of diagnosis

• Technical applications

• Results from other studies (be very careful!)

• Interpretation as sensitivity analysis

Note that ignoring misclassification assumes $\pi_{ij} = 0$ for all $i \neq j$!

Internal validation data

Examples

• Caries study: examiners were compared to a gold standard

• Controlling a part of a questionnaire by a doctor

• Ex post check of a diagnosis


Calibration Model

X: true binary variable, gold standard examiner
X*: observed value of X, surrogate

$P(X = 1 \mid X^* = 1)$ (positive predictive value)

$P(X = 0 \mid X^* = 0)$ (negative predictive value)

can be calculated from the MC matrix and the marginal distribution of $X$ (i.e. from $P(X = 1)$)
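The calculation of the predictive values from the MC matrix and $P(X = 1)$ is an application of Bayes' theorem. A minimal Python sketch, with a hypothetical function name and illustration values:

```python
def predictive_values(p, sens, spec):
    """Return (PPV, NPV) = (P(X=1 | X*=1), P(X=0 | X*=0)).

    p    : P(X=1)
    sens : pi_11 = P(X*=1 | X=1)
    spec : pi_00 = P(X*=0 | X=0)
    """
    p_star = sens * p + (1.0 - spec) * (1.0 - p)   # P(X*=1)
    ppv = sens * p / p_star
    npv = spec * (1.0 - p) / (1.0 - p_star)
    return ppv, npv


ppv, npv = predictive_values(p=0.2, sens=0.7, spec=0.9)
```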

Replication

If no gold standard is available, measurements are replicated.

• Requirement: measurements have to be conditionally independent given the true value

• Identifiability conditions for multinomial case

Two independent measurements

We observe $X^*_{i1}, X^*_{i2}$, i.e. a 2×2 table:

$$\begin{array}{l|cc} & X^*_1 = 0 & X^*_1 = 1 \\ \hline X^*_2 = 0 & n_{00} & n_{10} \\ X^*_2 = 1 & n_{01} & n_{11} \end{array}$$

Assuming independence and constant MC we get:

$$P(X^*_1 = 0, X^*_2 = 0) = P(X = 0)\,\pi_{00}^2 + P(X = 1)\,\pi_{01}^2$$

$$P(X^*_1 = 1, X^*_2 = 1) = P(X = 0)\,\pi_{10}^2 + P(X = 1)\,\pi_{11}^2$$

$$P(X^*_1 = 0, X^*_2 = 1) = P(X = 0)\,\pi_{00}\pi_{10} + P(X = 1)\,\pi_{01}\pi_{11}$$

$$P(X^*_1 = 1, X^*_2 = 0) = P(X^*_1 = 0, X^*_2 = 1)$$

Two independent equations, but three unknown parameters!

⇒ We cannot estimate the MC matrix and $P(X = 1)$!
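The counting argument can be made concrete with a small Python sketch of the cell probabilities. The function name is hypothetical. The four probabilities sum to one and the two off-diagonal cells coincide, which leaves only two free equations for the three unknowns $P(X = 1)$, $\pi_{11}$ and $\pi_{00}$.

```python
def cell_probabilities(p, sens, spec):
    """Cell probabilities of two conditionally independent measurements.

    Keys are (x1_star, x2_star); p = P(X=1), sens = pi_11, spec = pi_00.
    """
    pi11, pi01 = sens, 1.0 - sens      # P(X*=1 | X=1), P(X*=0 | X=1)
    pi00, pi10 = spec, 1.0 - spec      # P(X*=0 | X=0), P(X*=1 | X=0)
    p00 = (1.0 - p) * pi00 ** 2 + p * pi01 ** 2
    p11 = (1.0 - p) * pi10 ** 2 + p * pi11 ** 2
    p01 = (1.0 - p) * pi00 * pi10 + p * pi01 * pi11
    return {(0, 0): p00, (0, 1): p01, (1, 0): p01, (1, 1): p11}


cells = cell_probabilities(p=0.2, sens=0.7, spec=0.9)
```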

Identified problems

Literature about diagnostic tests

Three independent measurements: three independent equations, three unknowns. Explicit solution available.

Further assumptions: error in haplotype reconstruction, same MC matrix for each gene (Heid, HK et al., work in progress)

Kappa Statistics

Basic idea: evaluate agreement and adjust for agreement by chance

Measuring agreement:

$$\frac{n_{00} + n_{11}}{n}$$

$$\begin{array}{l|cc} & X^*_1 = 0 & X^*_1 = 1 \\ \hline X^*_2 = 0 & 10 & 2 \\ X^*_2 = 1 & 2 & 0 \end{array} \qquad \begin{array}{l|cc} & X^*_1 = 0 & X^*_1 = 1 \\ \hline X^*_2 = 0 & 5 & 2 \\ X^*_2 = 1 & 2 & 5 \end{array}$$

Same proportion of agreement, but different situations!

Definition of Kappa

$$P_o = \frac{n_{00} + n_{11}}{n}$$

$$P_e = \frac{n_{0\cdot}\, n_{\cdot 0}}{n^2} + \frac{n_{1\cdot}\, n_{\cdot 1}}{n^2}$$

$$\kappa = (P_o - P_e)/(1 - P_e)$$

$$\begin{array}{l|cc} & X^*_1 = 0 & X^*_1 = 1 \\ \hline X^*_2 = 0 & 10 & 2 \\ X^*_2 = 1 & 2 & 0 \end{array} \qquad \begin{array}{l|cc} & X^*_1 = 0 & X^*_1 = 1 \\ \hline X^*_2 = 0 & 5 & 2 \\ X^*_2 = 1 & 2 & 5 \end{array}$$

Left table: $\kappa < 0$. Right table: $\kappa = 0.428$.
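The definition of kappa translates directly into code. The following minimal Python sketch (hypothetical function name) takes a 2×2 table of counts, with rows indexed by $X^*_2$ and columns by $X^*_1$, and applies $\kappa = (P_o - P_e)/(1 - P_e)$ to the two example tables.

```python
def cohen_kappa(table):
    """Kappa for a 2x2 table of counts; table[i][j] = n with X*_2=i, X*_1=j."""
    n = sum(sum(row) for row in table)
    p_o = (table[0][0] + table[1][1]) / n                 # observed agreement
    row_tot = [sum(row) for row in table]
    col_tot = [table[0][j] + table[1][j] for j in range(2)]
    # agreement expected by chance from the marginal totals
    p_e = sum(row_tot[k] * col_tot[k] for k in range(2)) / n ** 2
    return (p_o - p_e) / (1.0 - p_e)


kappa_left = cohen_kappa([[10, 2], [2, 0]])
kappa_right = cohen_kappa([[5, 2], [2, 5]])
```

Both tables have the same observed agreement of 10/14, but the chance agreement differs because the marginal totals differ.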

Kappa and MC-Matrix

Kappa depends on the MC matrix and the marginal distribution $P(X = 1)$.

Left panel: fixed MC matrix $\pi_{00} = 0.9$, $\pi_{11} = 0.7$.

Right panel: sensitivity and specificity which result in $\kappa = 0.5$ for $P(X = 1) = 0.2$.

[Figure: left panel with axes "prx" and "Kappa"; right panel with axes "sens" and "spec", labelled "Kappa=0.5".]

1.3 Bivariate analysis

2×2 tables, misclassification in disease status:

Binary exposure: $X$
Disease status: $Y$
Measurement of disease: $Y^*$

Model for misclassification:

$$\pi_{110} = P(Y^* = 1 \mid Y = 1, X = 0)$$

$$\pi_{111} = P(Y^* = 1 \mid Y = 1, X = 1)$$

$$\pi_{100} = P(Y^* = 1 \mid Y = 0, X = 0)$$

$$\pi_{101} = P(Y^* = 1 \mid Y = 0, X = 1)$$

Non differential misclassification if

$$\pi_{110} = \pi_{111} \quad \text{and} \quad \pi_{100} = \pi_{101},$$

i.e. misclassification independent of exposure

Effect and correction

Use the results of the one sample case:

$$P(Y^* = 1 \mid X = 1) = \pi_{111} P(Y = 1 \mid X = 1) + \pi_{101} P(Y = 0 \mid X = 1)$$

$$P(Y^* = 1 \mid X = 0) = \pi_{110} P(Y = 1 \mid X = 0) + \pi_{100} P(Y = 0 \mid X = 0)$$

If the misclassification error is non differential, then, with $\pi_{11}$ and $\pi_{00}$ denoting the common sensitivity and specificity:

$$P(Y^* = 1 \mid X = 1) - P(Y^* = 1 \mid X = 0) = \left[P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)\right](\pi_{11} + \pi_{00} - 1)$$

• Attenuation to 0

• Also for OR

• Correction by matrix method

Misclassification in exposure

We observe $X^*$ instead of $X$.

Model for misclassification:

$$\pi_{110} = P(X^* = 1 \mid X = 1, Y = 0)$$

$$\pi_{111} = P(X^* = 1 \mid X = 1, Y = 1)$$

$$\pi_{100} = P(X^* = 1 \mid X = 0, Y = 0)$$

$$\pi_{101} = P(X^* = 1 \mid X = 0, Y = 1)$$

Non differential misclassification if $\pi_{110} = \pi_{111}$ and $\pi_{100} = \pi_{101}$, i.e. misclassification independent of disease.

This is fulfilled in most cohort studies, but could be violated in case control studies.

Example for non differential misclassification error

$$\begin{array}{l|cc|cc|cc} & \multicolumn{2}{c|}{\text{Correct classification}} & \multicolumn{2}{c|}{\text{20\% of No say Yes}} & \multicolumn{2}{c}{\text{20\% of No say Yes, 20\% of Yes say No}} \\ \text{high fat} & \text{No} & \text{Yes} & \text{No} & \text{Yes} & \text{No} & \text{Yes} \\ \hline \text{cases} & 450 & 250 & 360 & 340 & 410 & 290 \\ \text{controls} & 900 & 100 & 720 & 280 & 740 & 260 \\ \hline \text{Odds ratio} & \multicolumn{2}{c|}{5.0} & \multicolumn{2}{c|}{2.4} & \multicolumn{2}{c}{2.0} \end{array}$$

Attenuation to OR = 1

Note: Everything can happen in case of differential misclassification
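The numbers in this example can be reproduced with a short Python sketch. The helper names `misclassify` and `odds_ratio` are illustrative; the same misclassification rates are applied to cases and controls, which is what makes the error non differential.

```python
def misclassify(no, yes, no_to_yes, yes_to_no):
    """Expected observed (No, Yes) counts after misclassifying exposure."""
    new_no = no * (1.0 - no_to_yes) + yes * yes_to_no
    new_yes = yes * (1.0 - yes_to_no) + no * no_to_yes
    return new_no, new_yes


def odds_ratio(cases, controls):
    """Odds ratio for exposure from (No, Yes) counts of cases and controls."""
    case_no, case_yes = cases
    ctrl_no, ctrl_yes = controls
    return (case_yes * ctrl_no) / (case_no * ctrl_yes)


true_cases, true_controls = (450, 250), (900, 100)

or_true = odds_ratio(true_cases, true_controls)

# Scenario 2: 20% of "No" answer "Yes".
or_one_way = odds_ratio(misclassify(*true_cases, 0.2, 0.0),
                        misclassify(*true_controls, 0.2, 0.0))

# Scenario 3: additionally 20% of "Yes" answer "No".
or_two_way = odds_ratio(misclassify(*true_cases, 0.2, 0.2),
                        misclassify(*true_controls, 0.2, 0.2))
```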

Likelihood

We assume non differential misclassification error

$$\begin{aligned} P(Y = 1, X^* = x^*) &= \sum_x P(Y = 1, X^* = x^*, X = x) \\ &= \sum_x P(Y = 1 \mid X^* = x^*, X = x)\, P(X^* = x^*, X = x) \\ &= \sum_x P(Y = 1 \mid X = x)\, P(X^* = x^* \mid X = x)\, P(X = x) \end{aligned}$$

We have three components of the likelihood:

Main model: $P(Y = 1 \mid X = x)$
Measurement model: $P(X^* = x^* \mid X = x)$
Exposure model: $P(X = x)$

Observed probabilities

$$P(Y = 1 \mid X^* = w) = \frac{P(Y = 1, X^* = w)}{P(X^* = w)}$$

$$\frac{P(Y = 1 \mid X^* = 1) - P(Y = 1 \mid X^* = 0)}{P(Y = 1 \mid X = 1) - P(Y = 1 \mid X = 0)} = \frac{(\pi_{11} + \pi_{00} - 1)\, P(X = 1)\, P(X = 0)}{P(X^* = 1)\, P(X^* = 0)}$$

Bias to 1 if $(\pi_{11} + \pi_{00} - 1) > 0$
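The ratio formula can be verified numerically. In the Python sketch below (hypothetical names and illustration values), the observed risk difference is computed from the likelihood decomposition under non differential misclassification and compared with the attenuation factor $(\pi_{11} + \pi_{00} - 1)\,P(X = 1)P(X = 0) / (P(X^* = 1)P(X^* = 0))$.

```python
def observed_risk_difference(p, sens, spec, risk0, risk1):
    """P(Y=1 | X*=1) - P(Y=1 | X*=0) under non differential error.

    p = P(X=1), risk0 = P(Y=1 | X=0), risk1 = P(Y=1 | X=1).
    """
    q = sens * p + (1.0 - spec) * (1.0 - p)        # P(X*=1)
    x1_given_1 = sens * p / q                      # P(X=1 | X*=1)
    x1_given_0 = (1.0 - sens) * p / (1.0 - q)      # P(X=1 | X*=0)
    r_star1 = risk0 + (risk1 - risk0) * x1_given_1
    r_star0 = risk0 + (risk1 - risk0) * x1_given_0
    return r_star1 - r_star0


def attenuation_factor(p, sens, spec):
    """Right-hand side of the ratio formula."""
    q = sens * p + (1.0 - spec) * (1.0 - p)
    return (sens + spec - 1.0) * p * (1.0 - p) / (q * (1.0 - q))


p, sens, spec, risk0, risk1 = 0.3, 0.9, 0.8, 0.1, 0.4
lhs = observed_risk_difference(p, sens, spec, risk0, risk1)
rhs = attenuation_factor(p, sens, spec) * (risk1 - risk0)
```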

Misclassification in a confounder

X: misclassified confounder
Z: exposure
Y: response

Even in the case of non differential measurement error with respect to Y and Z:

• Bias in both directions possible

• Residual confounding

• e.g. Savitz and Baron (1989)

Correction methods

• Matrix method: The two by two table can be seen as one multinomial variable

• Variance estimation: see Greenland (1988)

• MLE for unrestricted sampling

• Alternatives by Tenenbein (1972)

Misclassification in regression

General Regression Model

$$E(Y \mid X_1, \ldots, X_k) = h(\beta_0 + \beta_1 X_1 + \ldots + \beta_k X_k)$$

$h$: link function

Misclassification possibly on

• binary covariates: observe $X^*$ instead of $X$

• binary response: observe $Y^*$ instead of $Y$

Handling misclassification in Y in binary regression

• Hausman et al. (Journal of Econometrics, 1998)

• Neuhaus (Biometrika, 1999)

We observe $Y^*$ instead of $Y$ with misclassification matrix $\Pi$

$$P(Y^* = 1 \mid X = x) = \pi_{11} G(x'\beta) + (1 - \pi_{00})(1 - G(x'\beta)) = H(x'\beta)$$

$$H(t) = \pi_{11} G(t) + (1 - \pi_{00})(1 - G(t))$$
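To see how misclassification changes the response curve, the following Python sketch evaluates $H(t) = \pi_{11} G(t) + (1 - \pi_{00})(1 - G(t))$, which follows from the one sample formula with $\pi_{10} = 1 - \pi_{00}$. It assumes a logistic $G$; the function names are illustrative.

```python
import math


def logistic(t):
    """Logistic response function G(t)."""
    return 1.0 / (1.0 + math.exp(-t))


def observed_response_prob(t, sens, spec):
    """H(t) = P(Y*=1) when the true response probability is G(t)."""
    g = logistic(t)
    return sens * g + (1.0 - spec) * (1.0 - g)


# The observed curve runs between 1 - spec and sens instead of 0 and 1.
lower = observed_response_prob(-50.0, sens=0.9, spec=0.8)
middle = observed_response_prob(0.0, sens=0.9, spec=0.8)
upper = observed_response_prob(50.0, sens=0.9, spec=0.8)
```

The asymptotes at $1 - \pi_{00}$ and $\pi_{11}$ are what makes $H$ differ from an ordinary logistic curve.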

Logistic regression with misclassified Y

[Figure: curves plotted against X from -4 to 4; vertical axis from 0 to 1.]

Misclassification in regressors

One binary regressor, normal outcome:

$$Y = \beta_0 + \beta_1 I_1, \quad \beta_0 = \mu_0, \quad \beta_1 = \mu_1 - \mu_0$$

where $I_1$ is the indicator of $X = 1$ and $\mu_0$, $\mu_1$ are the group means.

Naive analysis:

$$E(Y \mid X^* = 0) = P(X = 0 \mid X^* = 0)\,\mu_0 + P(X = 1 \mid X^* = 0)\,\mu_1$$

$$E(Y \mid X^* = 1) = P(X = 0 \mid X^* = 1)\,\mu_0 + P(X = 1 \mid X^* = 1)\,\mu_1$$

These equations can be solved for $\mu_0$ and $\mu_1$ when the MC matrix and $P(X = 0)$ are known.

Matrix method
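Solving the two equations for the group means is a 2×2 linear system. The sketch below is a minimal Python illustration with NumPy; the function names and the numerical values are assumptions for the example, and the predictive probabilities $P(X = x \mid X^* = x^*)$ are obtained from sensitivity, specificity and $P(X = 1)$ via Bayes' theorem.

```python
import numpy as np


def predictive_probs(p, sens, spec):
    """Return (P(X=1 | X*=1), P(X=0 | X*=0)) for p = P(X=1)."""
    q = sens * p + (1.0 - spec) * (1.0 - p)        # P(X*=1)
    return sens * p / q, spec * (1.0 - p) / (1.0 - q)


def naive_group_means(mu0, mu1, p, sens, spec):
    """E(Y | X*=0) and E(Y | X*=1) implied by the true means mu0, mu1."""
    ppv, npv = predictive_probs(p, sens, spec)
    return npv * mu0 + (1.0 - npv) * mu1, (1.0 - ppv) * mu0 + ppv * mu1


def corrected_group_means(m0_star, m1_star, p, sens, spec):
    """Solve the two naive-mean equations for (mu0, mu1)."""
    ppv, npv = predictive_probs(p, sens, spec)
    A = np.array([[npv, 1.0 - npv],
                  [1.0 - ppv, ppv]])
    mu0, mu1 = np.linalg.solve(A, np.array([m0_star, m1_star]))
    return float(mu0), float(mu1)


m0_star, m1_star = naive_group_means(1.0, 3.0, p=0.3, sens=0.9, spec=0.8)
mu0_hat, mu1_hat = corrected_group_means(m0_star, m1_star,
                                         p=0.3, sens=0.9, spec=0.8)
```

In this example the naive difference `m1_star - m0_star` is smaller than the true difference of 2, and the correction recovers the true means.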

Likelihood

$$\begin{aligned} L(Y, X^* = x^*) &= \sum_x L(Y, X^* = x^*, X = x) \\ &= \sum_x L(Y \mid X^* = x^*, X = x)\, P(X^* = x^*, X = x) \\ &= \sum_x L(Y \mid X = x)\, P(X^* = x^* \mid X = x)\, P(X = x) \end{aligned}$$

Likelihood for many regression models numerically easy to handle

Components of the misclassification model and its components can be added.

Effects of misclassification

• Biased and inconsistent estimates for parameters

• In most cases attenuation to 0

• In complex settings bias in any direction possible

• Effect dependent on the misclassification matrix

• Similar to effect of measurement error in continuous variables in regression

Hypothesis testing

Attenuation →

• Naive tests (e.g. for no true effect in a 2×2 table) still have the correct significance level

• Power reduction

• Sample size calculation has to be corrected

Outlook

• Use of validation data

• Latent class analysis


MISCLASSIFICATION AND MEASUREMENT ERROR IN REGRESSION MODELS

Part 2

Helmut Küchenhoff
Statistical Consulting Unit

Ludwig-Maximilians-Universität München

Padova, 2./3.10.2007

Padova 2007 Part 2


Outline 2. Measurement error: Models and effect

• Examples

• Models for the error

• Effect of measurement error

  – Response error
  – Linear model
  – Logistic model
  – Survival

2. Measurement error Introduction

"Measurement is the contact of reason with nature" (Henry Margenau)

"Nearly all the grandest discoveries of science have been but the rewards of accurate measurement" (Lord Kelvin)

Measurement is the basis for producing data

Literature: David Hand: Measurement Theory and Practice. The World Through Quantification (Arnold, 2004)

Types of measurement

• Representational measurement
  Measurements relate to existing attributes of the objects
  Examples: length, weight, blood parameter

• Pragmatic measurement
  An attribute is defined by its measuring procedure, no 'real' existence beyond that
  Examples: pain score, intelligence

Sources of measurement error

• Induced by an instrument (laboratory value, blood pressure)

• Induced by medical doctors or patients

• Measurement error induced by definition, e.g. "long term mean of daily fat intake"

• Surrogate variables, e.g. "mean of exposure in a region where the study participant lives" instead of individual exposure

Examples

• Framingham heart study, Munich blood pressure study:
  Blood pressure, long term average
  Single measurement

• Munich bronchitis study:
  Average occupational dust exposure
  Single measurement, expert ratings

• Silicosis study:
  Occupational dust exposure
  Job exposure matrix

• MONICA study:
  Long term fat intake
  One week diary

• German radon study:
  Residential radon exposure
  Measurements in flats and estimation depending on the home

• Uranium miners study:
  Radon exposure
  Job exposure matrix

• Erfurt study:
  Individual exposure to a pollutant
  Data from two gauging stations

Models for measurement error

• Systematic vs random

• Classical vs Berkson

• Additive vs multiplicative

• Homoscedastic vs heteroscedastic

• Differential vs non differential


Classical additive random measurement error

$X_i$: true value
$X_i^*$: measurement of $X_i$

$$X_i^* = X_i + U_i, \quad (U_i, X_i) \text{ indep.}$$

$$E(U_i) = 0$$

$$V(U_i) = \sigma_U^2$$

$$U_i \sim N(0, \sigma_U^2)$$

This model is suitable for

• Instrument measurement error

• A single measurement used in place of a mean

Accuracy, Validity and Reliability

• Accuracy: General term, describing how closely a measurement reproduces the attribute being measured

• Validity: How well the measurement captures the true attribute or how well it captures the concept which is targeted to be measured

• Reliability describes the differences between multiple measurements of an attribute

Statistical point of view:
Accuracy: mean square error
Validity: bias $E(U)$
Reliability: measurement error variance $\sigma_U^2$

Reliability measures

Two measurements

$$X^*_{ij} = X_i + U_{ij}, \quad j = 1, 2$$

Assuming independence of the measurement errors $U_{ij}$

$$Var(X^*_{ij}) = Var(X_i) + Var(U_{ij})$$

$$R = \frac{Var(X_i)}{Var(X^*_{ij})}$$

$$Cor(X^*_{i1}, X^*_{i2}) = \frac{Cov(X^*_{i1}, X^*_{i2})}{\sqrt{Var(X^*_{i1})\, Var(X^*_{i2})}} = R$$

$$Cor(X^*_{i1}, X_i) = \frac{Cov(X^*_{i1}, X_i)}{\sqrt{Var(X^*_{i1})\, Var(X_i)}} = \sqrt{R}$$

Intraclass Correlation and Reliability

Interpretation:
R: informative part of measurement (variance decomposition)
R: correlation between two independent measurements of the same unit
R: square of the correlation between true value and measurement

Estimation of reliability when two measurements per unit are available:

Use

$$Corr(X^*_{i1}, X^*_{i2})$$

Use

$$Var(X^*_{i1} - X^*_{i2}) = Var(U_{i1} - U_{i2}) = 2\sigma_U^2$$
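Both estimation ideas can be tried on simulated data. The following Python sketch with NumPy is an illustration under the classical additive model; the sample size, the seed and the variances ($Var(X_i) = 4$, $\sigma_U^2 = 1$, so $R = 0.8$) are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# True values and two replicate measurements with independent errors.
x = rng.normal(loc=0.0, scale=2.0, size=n)        # Var(X) = 4
x1 = x + rng.normal(scale=1.0, size=n)            # sigma_U^2 = 1
x2 = x + rng.normal(scale=1.0, size=n)

# Measurement error variance from the differences: Var(X1* - X2*) / 2.
sigma_u2_hat = np.var(x1 - x2, ddof=1) / 2.0

# Reliability as the correlation between the two measurements.
r_hat = np.corrcoef(x1, x2)[0, 1]

# The squared correlation with the true value is also close to R
# (only available in a simulation, since X is unobservable).
r_from_truth = np.corrcoef(x1, x)[0, 1] ** 2
```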

General case

More than 2 measurements per unit, different measurement tools etc. Use a variance component model:

$$X^*_{ij} = X_i + U_{ij}\,(+\,\tau_j)$$

$X_i$: random true value

$\tau_j$: random or fixed effect of the $j$th measurement tool

Then the variances and $R$ can be estimated e.g. by ML or REML.

Problems

• Reliability depends on $Var(X_i)$

• Intraclass correlation is invariant to a change of scale in one of the measurements

• Measurement error variance is the primary and intuitive characteristic for the simple measurement model

• Measurement error variance can be estimated from two independent (!!) measurements

Bland-Altman Plots

Main idea: explore the relationship between measurement error and true value

Data: two types of measurement

Plot the difference between the two measurements against their mean
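The quantities behind such a plot are easy to compute. The following Python sketch returns the per-unit means and differences; the mean difference and the limits of agreement (mean difference ± 1.96 standard deviations) are the conventional additions to the plot and are included here as an assumption, since the slide itself only mentions the difference and the mean. The function name and the data are made up for illustration.

```python
import numpy as np


def bland_altman(m1, m2):
    """Return (means, differences, mean difference, limits of agreement)."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    means = (m1 + m2) / 2.0          # horizontal axis of the plot
    diffs = m1 - m2                  # vertical axis of the plot
    bias = diffs.mean()
    sd = diffs.std(ddof=1)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits


means, diffs, bias, limits = bland_altman([1.0, 2.0, 3.0, 4.0],
                                          [1.5, 1.5, 3.5, 3.5])
```

Plotting `diffs` against `means` (for instance with a scatter plot) gives the display described above.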

Approaches for Assessment of agreement

Choudhary and Ng (Biometrics 2006): Two measurement methods

• Find a model $D = X_{i1} - X_{i2} = f((X_{i1} + X_{i2})/2)$

• Find a simultaneous p% probability range for the difference

• Use parametric or nonparametric (splines) regression models

• Bootstrap and approximations

Useful for assessment, but correction methods cannot be derived

Additive Berkson-error

$$X_i = X_i^* + U_i, \qquad (U_i, X_i^*) \text{ indep.}$$

$$E(U_i) = 0$$

$$V(U_i) = \sigma_U^2$$

$$U_i \sim N(0, \sigma_U^2)$$

The model is suitable for

• Mean exposure of a region X∗ instead of individual exposure X.

• Working place measurement

• Dose in a controlled experiment


Classical and Berkson

Note that in the Berkson case

E(X|X∗) = X∗

Var(X) = Var(X∗) + Var(U)

Var(X) > Var(X∗)

Note that in the classical additive case

E(X∗|X) = X

Var(X∗) = Var(X) + Var(U)

Var(X∗) > Var(X)


Multiplicative measurement error

Classical:

$$X_i^* = X_i \cdot U_i, \qquad (U_i, X_i) \text{ indep.}$$

Berkson:

$$X_i = X_i^* \cdot U_i, \qquad (U_i, X_i^*) \text{ indep.}$$

$$E(U_i) = 1, \qquad U_i \sim \text{Lognormal}$$

• Additive on the logarithmic scale

• Used for exposure by chemicals or radiation
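
A short sketch of the classical multiplicative model in Python. For a lognormal U with log U ~ N(mu, sigma^2), E(U) = exp(mu + sigma^2/2), so E(U) = 1 requires mu = -sigma^2/2. The value of `sigma` and the lognormal distribution of the true exposure are illustrative assumptions.

```python
import math
import random

sigma = 0.5                  # std of log(U); hypothetical value
mu = -sigma ** 2 / 2         # chosen so that E(U) = exp(mu + sigma^2 / 2) = 1
rng = random.Random(2)
n = 100_000

u = [rng.lognormvariate(mu, sigma) for _ in range(n)]
x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]    # true exposure (assumed lognormal)
x_star = [xi * ui for xi, ui in zip(x, u)]              # classical multiplicative surrogate

mean_u = sum(u) / n
# On the log scale the error is additive: log X* - log X = log U
log_err = [math.log(a) - math.log(b) for a, b in zip(x_star, x)]
mean_log_err = sum(log_err) / n
print(round(mean_u, 3), round(mean_log_err, 3))
```

Note that the additive error on the log scale has mean mu, not zero, when U is normalised to E(U) = 1.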


Measurement error in response

Simple linear regression

Y = β0 + β1X + ε

Y ∗ = Y + U additive measurement error

−→ Y∗ = β0 + β1X + ε + U

New equation error: ε + U

Assumption: U and X independent, U and ε independent

−→ Higher variance of the equation error

−→ Inference still correct

Error in equation and measurement error cannot be distinguished.


Measurement error in covariates

We focus on covariate measurement error in regression models

Main model: E(Y) = f(β, X, Z)

We are interested in inference on β1

Z is a further covariate measured without error

Error model: X ←→ X∗

Observed model: E(Y) = f∗(X∗, Z, β∗)

Naive estimation: observed model = main model, but in most cases f∗ ≠ f and β∗ ≠ β


Differential and non differential measurement error

The assumption of non-differential measurement error relates to the response:

[Y |X, X∗] = [Y |X]

For Y there is no further information in U or X∗ when X is known. Then the error model and the main model can be split:

[Y, X∗, X] = [Y |X][X∗|X][X]

From the substantive point of view:

• Measurement process and Y are independent, given X


• Blood pressure on a particular day is irrelevant for CHD if the long-term average is known

• Mean exposure is irrelevant if individual exposure is known

• But people with CHD can have a different view of their nutrition behavior (differential error)


Simple linear regression

We assume a classical non differential additive normal measurement error

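$$Y = \beta_0 + \beta_1 X + \varepsilon$$

$$X^* = X + U, \qquad (U, X, \varepsilon) \text{ indep.}$$

$$U \sim N(0, \sigma_u^2)$$

$$\varepsilon \sim N(0, \sigma_\varepsilon^2)$$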


SAS-Simulation for linear m.e. model

/* simulation of x ~ N(0,1) */
data sim;
  do i = 1 to 100;
    x = rannor(137);
    output;
  end;
run;

/* simulation of y = 1 + 2*x + epsilon                 */
/* and of the surrogate w with additive m.e.           */
data sim;
  set sim;
  su = 2;                          /* measurement error std */
  y = 1 + 2*x + 0.3*rannor(123);
  w = x + su*rannor(167);
run;

/* Plot options: green for the correct values, blue for the surrogates */
symbol1 c=green v=dot i=none;
symbol2 c=blue  v=dot i=none;

proc gplot data=sim;
  plot y*x y*w / overlay;          /* scatter */
run;
  symbol1 i=r;                     /* add regression lines */
  symbol2 i=r;
  plot y*x y*w / overlay;
run;
quit;


Figure 1: Effect of additive measurement error on linear regression


The observed model in linear regression

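$$E(Y|X^*) = \beta_0 + \beta_1 E(X|X^*)$$

Assuming $X \sim N(\mu_x, \sigma_x^2)$, the observed model is:

$$E(Y|X^*) = \beta_0^* + \beta_1^* X^*$$

$$\beta_1^* = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}\,\beta_1$$

$$\beta_0^* = \beta_0 + \left(1 - \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}\right)\beta_1 \mu_x$$

$$Y - \beta_0^* - \beta_1^* X^* \sim N\left(0,\; \sigma_\varepsilon^2 + \frac{\beta_1^2 \sigma_u^2 \sigma_x^2}{\sigma_x^2 + \sigma_u^2}\right)$$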

• The observed model is still a linear regression !


• Attenuation of β1 by the factor $\sigma_x^2 / (\sigma_x^2 + \sigma_u^2)$, the "reliability ratio"

• Loss of precision (higher error term)


Identification

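$$(\beta_0, \beta_1, \mu_x, \sigma_x^2, \sigma_u^2, \sigma_\varepsilon^2) \longrightarrow [Y, X^*] \longrightarrow \mu_y, \mu_{x^*}, \sigma_y^2, \sigma_{x^*}^2, \sigma_{x^*y}$$

The parameter sets

$$(\beta_0, \beta_1, \mu_x, \sigma_x^2, \sigma_u^2, \sigma_\varepsilon^2) \quad\text{and}\quad (\beta_0^*, \beta_1^*, \mu_x, \sigma_x^2 + \sigma_u^2, 0, \sigma_{\varepsilon^*}^2),$$

where $\sigma_{\varepsilon^*}^2$ is the residual variance of the observed model, yield identical distributions of (Y, X∗). =⇒ The model parameters are not identifiable.

We need extra information, e.g.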

• σu is known or can be estimated


• σu/σε is known (orthogonal regression)

The model with another distribution for X is identifiable by higher moments.


The observed model in linear regression (2)

Note that the observed model depends on the distribution of X. It is not a linear regression if X is not normal. Example: X is a mixture of normals


Naive LS- estimation

For the slope :

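$$\hat\beta_{1n} = \frac{S_{yx^*}}{S_{x^*}^2}$$

$$\operatorname{plim}(\hat\beta_{1n}) = \frac{\sigma_{yx^*}}{\sigma_{x^*}^2} = \frac{\sigma_{yx}}{\sigma_x^2 + \sigma_u^2} = \beta_1 \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}$$

For the intercept:

$$\hat\beta_{0n} = \bar{Y} - \hat\beta_{1n} \bar{X}^*$$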


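$$\operatorname{plim}(\hat\beta_{0n}) = \mu_y - \beta_1 \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}\,\mu_{x^*} = \beta_0 + \beta_1 \left(1 - \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}\right)\mu_x$$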


Naive LS- estimation (2)

For the residual variance:

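$$MSE = S^2_{Y - \hat\beta_{0n} - \hat\beta_{1n} X^*}$$

$$\operatorname{plim}(MSE) = \sigma_\varepsilon^2 + \frac{\beta_1^2 \sigma_u^2 \sigma_x^2}{\sigma_x^2 + \sigma_u^2}$$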


Multiple linear regression

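The generalization from the simple model is straightforward:

$$Y = \beta_0 + X'\beta_x + Z'\beta_z + \varepsilon$$

$$X^* = X + U$$

$$U \sim N(0, \Sigma_u)$$

Z is observed without error. If we use X∗ instead of X then

$$\begin{pmatrix} \hat\beta_{x^*n} \\ \hat\beta_{zn} \end{pmatrix} \longrightarrow \begin{pmatrix} \Sigma_x + \Sigma_u & \Sigma_{xz} \\ \Sigma_{zx} & \Sigma_z \end{pmatrix}^{-1} \begin{pmatrix} \Sigma_x & \Sigma_{xz} \\ \Sigma_{zx} & \Sigma_z \end{pmatrix} \begin{pmatrix} \beta_x \\ \beta_z \end{pmatrix}$$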


Multiple linear regression (2)

If Z and X are correlated then

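• The attenuation factor is now $\sigma_{x|z}^2 / (\sigma_{x|z}^2 + \sigma_u^2)$

$\sigma_{x|z}^2$ is the residual variance from regressing X on Z

• $\hat\beta_{zn}$ is also biased

$$\hat\beta_{zn} \longrightarrow \beta_z + \left(1 - \frac{\sigma_{x|z}^2}{\sigma_{x|z}^2 + \sigma_u^2}\right)\beta_x \gamma_z$$

$\gamma_z$ is the regression coefficient when regressing X on Z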


Correction for attenuation

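We have a first method: Solve the bias equation:

$$\beta_1 = \operatorname{plim}(\hat\beta_{1n})\,\frac{\sigma_x^2 + \sigma_u^2}{\sigma_x^2}$$

$$\hat\beta_1 = \hat\beta_{1n}\,\frac{S_{x^*}^2}{S_{x^*}^2 - \sigma_u^2}$$

$$\hat\beta_0 = \bar{Y} - \hat\beta_1 \bar{X}^* = \hat\beta_{0n} - \hat\beta_1\,\frac{\sigma_u^2}{S_{x^*}^2}\,\bar{X}^*$$

Correction by reliability ratio.

$V(\hat\beta_1) > V(\hat\beta_{1n})$: bias-variance trade-off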
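
The correction can be tried out on simulated data. The following Python sketch assumes that the measurement error variance `var_u` is known; all parameter values are hypothetical and chosen so that the reliability ratio is 0.5.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance; cov(a, a) is the sample variance."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(137)
n = 50_000
beta0, beta1, mu_x, var_u = 1.0, 2.0, 3.0, 1.0      # hypothetical true values, Var(X) = 1
x = [rng.gauss(mu_x, 1.0) for _ in range(n)]
y = [beta0 + beta1 * xi + rng.gauss(0, 0.3) for xi in x]
w = [xi + rng.gauss(0, var_u ** 0.5) for xi in x]   # surrogate X* = X + U

# Naive least squares of Y on X*
s2_w = cov(w, w)
b1_naive = cov(w, y) / s2_w
b0_naive = mean(y) - b1_naive * mean(w)

# Correction for attenuation with known measurement error variance
b1_corr = b1_naive * s2_w / (s2_w - var_u)
b0_corr = mean(y) - b1_corr * mean(w)
print(round(b1_naive, 2), round(b0_naive, 2), round(b1_corr, 2), round(b0_corr, 2))
```

With these values the naive slope converges to 1 and the naive intercept to 4, while the corrected estimates are close to the true values 2 and 1.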


Berkson-Error in simple linear regression

Y = β0 + β1X + ε

$$X = X^* + U, \qquad U \text{ and } (X^*, \varepsilon) \text{ indep.}, \qquad E(U) = 0$$


Observed Model

$$E(Y|X^*) = \beta_0 + \beta_1 X^*$$

$$V(Y|X^*) = \sigma_\varepsilon^2 + \beta_1^2 \sigma_u^2$$

• Regression model with identical β

• Measurement error ignorable

• Loss of precision
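
A small simulation sketch in Python illustrates these three points for the Berkson model. The parameter values are hypothetical; the expected residual variance is 0.3^2 + 2^2 * 1 = 4.09.

```python
import random

rng = random.Random(11)
n = 50_000
beta0, beta1, sd_eps, sd_u = 1.0, 2.0, 0.3, 1.0       # hypothetical true values
x_star = [rng.gauss(0, 1) for _ in range(n)]           # assigned (observed) value
x = [xs + rng.gauss(0, sd_u) for xs in x_star]         # true value X = X* + U
y = [beta0 + beta1 * xi + rng.gauss(0, sd_eps) for xi in x]

# Least squares of Y on the observed X*
m_s, m_y = sum(x_star) / n, sum(y) / n
sxx = sum((a - m_s) ** 2 for a in x_star)
sxy = sum((a - m_s) * (b - m_y) for a, b in zip(x_star, y))
slope = sxy / sxx
intercept = m_y - slope * m_s
resid_var = sum((b - intercept - slope * a) ** 2 for a, b in zip(x_star, y)) / (n - 2)
print(round(slope, 2), round(intercept, 2), round(resid_var, 2))
```

Slope and intercept are recovered without correction, while the residual variance is far larger than the equation error variance 0.09.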


Binary Regression

Logistic with additive non differential measurement error

$$P(Y = 1|X) = G(\beta_0 + \beta_1 X)$$

$$G(t) = (1 + \exp(-t))^{-1}$$

$$X^* = X + U$$

Observed model:

$$P(Y = 1|X^*) = \int P(Y = 1|X = x, X^*)\, f_{X|X^*}(x)\,dx = \int P(Y = 1|X = x)\, f_{X|X^*}(x)\,dx$$


If we have additive measurement error and X and U are normal, then X|X∗ is also normal:

$$P(Y = 1|X^*) = \int G(\beta_0 + \beta_1 x)\, f_{X|X^*}(x)\,dx$$

[Figure: true and observed response curves, plotted against w]

[Figure: observed response curve and its linear approximation, plotted against w]

Probit Model

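This integral is not easy to handle, but for the Probit model we can evaluate it:

$$P(Y = 1|X^*) = \Phi\left(\frac{\beta_0^* + \beta_1^* X^*}{\sqrt{1 + \beta_1^2 v}}\right)$$

$$\beta_1^* = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}\,\beta_1$$

$$\beta_0^* = \beta_0 + \left(1 - \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2}\right)\beta_1 \mu_x$$

$$v = Var(X|X^*)$$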

This gives an exact correction for the Probit model
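
The closed form can be checked numerically. The Python sketch below integrates the probit response over the normal distribution of X given X∗ with a midpoint rule and compares the result with the formula; function names and parameter values are illustrative assumptions.

```python
import math

def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def conditional_moments(x_star, mu_x, var_x, var_u):
    """Mean and variance of X given X* for normal X and normal additive error."""
    lam = var_x / (var_x + var_u)
    return mu_x + lam * (x_star - mu_x), lam * var_u

def observed_probit_numeric(x_star, b0, b1, mu_x, var_x, var_u, steps=4000):
    m, v = conditional_moments(x_star, mu_x, var_x, var_u)
    sd = math.sqrt(v)
    h = 16 * sd / steps                       # integrate over m +/- 8 sd
    total = 0.0
    for k in range(steps):
        x = m - 8 * sd + (k + 0.5) * h
        dens = math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        total += Phi(b0 + b1 * x) * dens * h
    return total

def observed_probit_closed(x_star, b0, b1, mu_x, var_x, var_u):
    lam = var_x / (var_x + var_u)
    b1_star = lam * b1
    b0_star = b0 + (1 - lam) * b1 * mu_x
    v = lam * var_u
    return Phi((b0_star + b1_star * x_star) / math.sqrt(1 + b1 ** 2 * v))

params = dict(b0=-0.5, b1=1.2, mu_x=0.3, var_x=1.0, var_u=0.5)
for xs in (-2.0, 0.0, 1.5):
    print(xs, observed_probit_numeric(xs, **params), observed_probit_closed(xs, **params))
```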


Probit approximation for logistic regression

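$$G(t) = (1 + \exp(-t))^{-1} \approx \Phi(t/h_*) \quad\text{with } h_* \approx 1.70$$

$$E(Y|X^*) = \int G(\beta_0 + \beta_1 x)\, f_{X|X^*}(x)\,dx \approx \int \Phi\left((\beta_0 + \beta_1 x)\,h_*^{-1}\right) f_{X|X^*}(x)\,dx \approx G\left(\frac{\beta_0^* + \beta_1^* X^*}{\sqrt{1 + \beta_1^2 v h_*^{-2}}}\right)$$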


Effect of measurement error in logistic regression

• Similar to the linear Model

• Further attenuation by $\sqrt{1 + \beta_1^2 v h_*^{-2}}$
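
How well the probit approximation works for the logistic model can be examined numerically. The Python sketch below compares the integral of the logistic response over the normal distribution of X given X∗ with the approximate closed form; it assumes normal X and U, and the parameter values are hypothetical.

```python
import math

H_STAR = 1.70   # scaling constant of the probit approximation

def G(t):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-t))

def observed_logistic_numeric(x_star, b0, b1, mu_x, var_x, var_u, steps=4000):
    lam = var_x / (var_x + var_u)
    m = mu_x + lam * (x_star - mu_x)          # E(X | X*)
    v = lam * var_u                           # Var(X | X*)
    sd = math.sqrt(v)
    h = 16 * sd / steps                       # midpoint rule over m +/- 8 sd
    total = 0.0
    for k in range(steps):
        x = m - 8 * sd + (k + 0.5) * h
        dens = math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
        total += G(b0 + b1 * x) * dens * h
    return total

def observed_logistic_approx(x_star, b0, b1, mu_x, var_x, var_u):
    lam = var_x / (var_x + var_u)
    b1_star = lam * b1
    b0_star = b0 + (1 - lam) * b1 * mu_x
    v = lam * var_u
    return G((b0_star + b1_star * x_star) / math.sqrt(1 + b1 ** 2 * v / H_STAR ** 2))

params = dict(b0=-0.5, b1=1.2, mu_x=0.3, var_x=1.0, var_u=0.5)
gaps = [abs(observed_logistic_numeric(xs, **params) - observed_logistic_approx(xs, **params))
        for xs in (-3.0, -1.0, 0.0, 1.0, 3.0)]
print(max(gaps))
```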


Effect of measurement error in survival

Example: Uranium miner study (HK, Langner , Bender, LDA, 2007)

• 60 000 male workers

• Date of birth

• Job history (job periods with exact dates)

• Response: cancer mortality

• Radiation exposure by means of job exposure matrix

• Multiplicative or additive Berkson error


True and observed survivor function

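Cox regression model:

Survivor function: $S_{true}(t, X) = P(T > t|X)$

Hazard: $h_{true}(t, x) = \exp(x\beta)\,h_0(t)$

For an additive non-differential Berkson error $U \sim N(0, \sigma_u^2)$ we get

$$S_{obs}(t, X^*) := P(T > t|X^*) = \int S_{true}(t, X^* + u)\, f_u(u)\,du$$

$$h_{obs}(t, X^*) = -\frac{\partial}{\partial t} S_{obs}(t, X^*) \Big/ S_{obs}(t, X^*)$$

$$h_{obs}(t, X^*) \approx \int h_{true}(t, X^* + u)\, f_u(u)\,du = \exp(X^*\beta + \beta^2\sigma_u^2/2)\,h_0(t)$$

The last relation holds only under the rare disease assumption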
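
The role of the rare disease assumption can be made visible numerically. The Python sketch below assumes a constant baseline hazard `h0` (an assumption made only for this illustration) and compares the exact observed hazard with the rare-disease expression; all values are hypothetical.

```python
import math

def observed_hazard(t, x_star, beta, sigma_u, h0=0.01, steps=4000):
    """Exact observed hazard -S_obs'(t)/S_obs(t) under a constant baseline hazard h0."""
    h = 16 * sigma_u / steps                  # midpoint rule over u in +/- 8 sigma_u
    num = den = 0.0
    for k in range(steps):
        u = -8 * sigma_u + (k + 0.5) * h
        f = math.exp(-u * u / (2 * sigma_u ** 2)) / (sigma_u * math.sqrt(2 * math.pi))
        haz = h0 * math.exp(beta * (x_star + u))
        surv = math.exp(-haz * t)             # S_true(t, x* + u)
        num += haz * surv * f * h
        den += surv * f * h
    return num / den

def rare_disease_hazard(x_star, beta, sigma_u, h0=0.01):
    return h0 * math.exp(beta * x_star + beta ** 2 * sigma_u ** 2 / 2)

approx = rare_disease_hazard(1.0, 0.5, 1.0)
ratio_early = observed_hazard(0.0, 1.0, 0.5, 1.0) / approx
ratio_late = observed_hazard(200.0, 1.0, 0.5, 1.0) / approx
print(ratio_early, ratio_late)
```

At t = 0 the two expressions agree; once a substantial share of subjects has failed, the exact observed hazard falls below the rare-disease expression.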

[Figure: survival function P(T > t) against time t]

[Figure: survival function P(T > t) against exposure x, w]

[Figures: hazard against exposure x, w]

[Figures: hazard against time t]

[Figures: hazard against exposure x, w, for exposures between 0 and 3500]

Effects of measurement error

• No effect for rare disease

• Attenuation


Effects of measurement error

• Baseline hazard different

• Proportional hazard assumption does not hold in the observed model

Calculations for the situation of the GUM study


MISCLASSIFICATION AND MEASUREMENT ERROR IN REGRESSION MODELS

Part 3

Helmut Küchenhoff
Statistical Consulting Unit

Ludwig-Maximilians-Universität München

Padova, 2./3.10.2007


3. Methods

• Functional and structural

• Direct correction, method of moments, and orthogonal regression

• Regression calibration

• Likelihood

• Quasi likelihood

• Bayes

• Corrected score


Functional and structural

• Functional: X fixed unknown constants. No assumptions about the distribution of X

• Structural: X latent random variable. Use assumptions about the distribution of X


Direct correction

• Find $\operatorname{plim} \hat\beta_n = \beta^* = f(\beta, \gamma)$

• Then $\hat\beta := f^{-1}(\hat\beta_n)$

• Correction for attenuation in linear model

• General case Ku (1997)


Method of moments

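Moments of observed data can be estimated

Solve the moment equations

Simple linear regression:

$$\mu_{x^*} = \mu_x$$

$$\mu_y = \beta_0 + \beta_1 \mu_x$$

$$\sigma_{x^*}^2 = \sigma_x^2 + \sigma_u^2$$

$$\sigma_y^2 = \beta_1^2 \sigma_x^2 + \sigma_\varepsilon^2$$

$$\sigma_{yx^*} = \beta_1 \sigma_x^2$$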
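
With five equations and six unknowns, one extra piece of information is needed. The Python sketch below solves the equations under the assumption that the measurement error variance is known; the function name `moments_to_parameters` and the numbers are hypothetical, and in practice the moments would be replaced by their sample estimates.

```python
def moments_to_parameters(mu_xs, mu_y, var_xs, var_y, cov_yxs, var_u):
    """Solve the moment equations of simple linear regression for known var_u."""
    var_x = var_xs - var_u                 # sigma_x^2 = sigma_x*^2 - sigma_u^2
    beta1 = cov_yxs / var_x                # from sigma_yx* = beta1 * sigma_x^2
    beta0 = mu_y - beta1 * mu_xs           # from mu_y = beta0 + beta1 * mu_x
    var_eps = var_y - beta1 ** 2 * var_x   # from sigma_y^2 = beta1^2 sigma_x^2 + sigma_eps^2
    return {"mu_x": mu_xs, "var_x": var_x, "beta0": beta0, "beta1": beta1, "var_eps": var_eps}

# Moments implied by beta0 = 1, beta1 = 2, mu_x = 3, var_x = 1, var_u = 1, var_eps = 0.09
est = moments_to_parameters(mu_xs=3.0, mu_y=7.0, var_xs=2.0, var_y=4.09, cov_yxs=2.0, var_u=1.0)
print(est)
```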


Orthogonal regression, total least squares

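In linear regression with classical additive measurement error:

• Assume $\eta = \sigma_\varepsilon^2 / \sigma_u^2$ is known, e.g. no equation error and $\sigma_\varepsilon^2$ is the measurement error variance in Y. Minimize

$$\sum_{i=1}^{n} \left\{ (Y_i - \beta_0 - \beta_1 X_i)^2 + \eta\,(X_i^* - X_i)^2 \right\}$$

in $(\beta_0, \beta_1, X_1, X_2, \ldots, X_n)$.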

• Total least squares, Van Huffel (1997)


• Technical, symmetric applications

• A problem in other applications: the assumption of no equation error is not realistic
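
For simple linear regression the minimisation has a closed-form solution for the slope (often called Deming regression), which is used in the Python sketch below. The closed form is a standard result and is not taken from the slides; the simulation assumes no equation error, and all numbers are hypothetical.

```python
import math
import random

def deming(x_star, y, eta):
    """Orthogonal (Deming) regression; eta = error variance in Y / error variance in X."""
    n = len(y)
    mx, my = sum(x_star) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x_star) / (n - 1)
    syy = sum((b - my) ** 2 for b in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x_star, y)) / (n - 1)
    d = syy - eta * sxx
    slope = (d + math.sqrt(d * d + 4 * eta * sxy * sxy)) / (2 * sxy)
    return my - slope * mx, slope, sxy / sxx   # intercept, slope, naive slope

rng = random.Random(3)
n = 20_000
x = [rng.gauss(3, 1) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.5) for xi in x]     # error in Y only, no equation error
x_star = [xi + rng.gauss(0, 0.5) for xi in x]            # classical additive error in X
intercept, slope, naive = deming(x_star, y, eta=1.0)
print(round(intercept, 2), round(slope, 2), round(naive, 2))
```

If the data contained equation error in addition to the measurement error in Y, the assumed ratio η would be too small and the estimate would be biased.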


Regression calibration

This simple method has been widely applied. It was suggested by different authors: Rosner et al. (1989), Carroll and Stefanski (1990)

1. Find a model for E(X|X∗, Z) by validation data or replication

2. Replace the unobserved X by estimate E(X|X∗, Z) in the main model

3. Adjust variance estimates by bootstrap or asymptotic methods

• Good method in many practical situations

• Calibration data can be incorporated

• Problems in highly nonlinear models


Regression calibration

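• Berkson case: E(X|X∗) = X∗ −→ Naive estimation = Regression calibration

• Classical: Linear regression of X on X∗

$$E(X|X^*) = \frac{\sigma_x^2}{\sigma_{x^*}^2}\,X^* + \mu_x \left(1 - \frac{\sigma_x^2}{\sigma_{x^*}^2}\right)$$

This corresponds to the correction for attenuation in the linear model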
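
The three steps of regression calibration can be sketched in Python for the classical case with two replicate measurements per subject. Estimating the error variance from the replicate differences and calibrating the replicate mean are assumptions of this illustration, as are all parameter values; the variance adjustment of step 3 is omitted.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(21)
n = 20_000
x = [rng.gauss(3, 1) for _ in range(n)]                  # unobserved true covariate
y = [1.0 + 2.0 * xi + rng.gauss(0, 0.3) for xi in x]
w1 = [xi + rng.gauss(0, 1) for xi in x]                  # replicate 1
w2 = [xi + rng.gauss(0, 1) for xi in x]                  # replicate 2

# Step 1: model for E(X | X*) from the replicates
w_bar = [(a + b) / 2 for a, b in zip(w1, w2)]
var_u_hat = sum((a - b) ** 2 for a, b in zip(w1, w2)) / (2 * n)   # error variance of one replicate
s2_w = cov(w_bar, w_bar)
lam_hat = (s2_w - var_u_hat / 2) / s2_w                  # reliability of the replicate mean
m_w = mean(w_bar)
x_hat = [m_w + lam_hat * (v - m_w) for v in w_bar]       # estimated E(X | X*)

# Step 2: fit the main model with x_hat in place of X
slope_rc = cov(x_hat, y) / cov(x_hat, x_hat)
intercept_rc = mean(y) - slope_rc * mean(x_hat)
slope_naive = cov(w_bar, y) / s2_w
print(round(slope_naive, 2), round(slope_rc, 2), round(intercept_rc, 2))
```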


Survival

Appropriate for the Cox model under the rare disease assumption

Example: MONICA study, Augustin (2002)

• CHD and fat intake

• Cox-Regression

• Quadratic model

• Classical additive measurement error

• Heteroscedastic measurement error

• Replication (7 days) for estimating measurement error variance


Results

Correction by regression calibration, animal protein

[Figure: Morbidity and Animal Protein]

[Figure: Morbidity and Plant Protein]

Results differ between the assumptions of homoscedastic and heteroscedastic measurement error


Likelihood methods

• Standard inference can be done with standard errors and likelihood ratiotests

• Efficiency

• Combination of different data types is possible

• Sometimes more accurate than approximations

• Difficult to calculate

• Software not available


• Parametric model for the unobserved predictor necessary

• Robustness to strong parametric assumptions


The classical error likelihood

Main model [Y |X, Z, ζ]

Error model [X∗ |X, η]

Exposure model [X |Z, λ]

$$[Y, X^*|Z, \theta] = \prod_{i=1}^{n} \int [y_i|x, z_i, \zeta]\,[x_i^*|x, \eta]\,[x|z_i, \lambda]\,d\mu(x),$$

where θ = (ζ, η, λ)

• Evaluation by numerical integration


Berkson likelihood

3 components, but the third component contains no information

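Main model [Y |X, Z, ζ]

Error model [X |X∗, η]

Exposure model [X∗ |Z, λ]

$$[Y, X^*|Z, \theta] = \prod_{i=1}^{n} \int [y_i|x, z_i, \zeta]\,[x|x_i^*, \eta]\,[x_i^*|z_i, \lambda]\,d\mu(x)$$

$$[Y, X^*|Z, \theta] = \prod_{i=1}^{n} \int [y_i|x, z_i, \zeta]\,[x|x_i^*, \eta]\,d\mu(x) \cdot const$$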

The Berkson likelihood does not depend on the exposure model


Quasi likelihood

If calculation of the likelihood is too complicated use

$$E(Y|X^*, Z) = \int g(x, Z)\, f_{x|x^*}(x)\,dx$$

$$V(Y|X^*, Z) = \int v(x, Z)\, f_{x|x^*}(x)\,dx + Var[g(X, Z)|X^*]$$

This can be done e.g. for exponential g:

• Poisson regression model

• Parametric survival


Bayes

Richardson and Green (2002)

• Evaluation by MCMC techniques

• Conditional independence assumptions on the three model parts, as seen in the likelihood approach

• The latent variable X is treated as an unknown parameter

• Different data types can be combined

• Prior distributions for the error model

• Flexible handling of the exposure model


Corrected Score

Functional method: X fixed constants, no distributional assumptions

Estimating (score) equations:

$$E(\psi_{true}(Y, X, \beta)) = 0, \qquad \sum_{i=1}^{n} \psi_{true}(Y_i, X_i, \beta) = 0$$

Find a corrected score function ψCS with

$$E_U\left(\psi_{CS}(Y, X^*, \beta) \mid Y, X\right) = \psi_{true}(Y, X, \beta)$$


Corrected Score: Examples

Linear regression:

$$\psi_{CS} = (Y - \beta_0 - \beta_1 X^*)\,(1, X^*)^T + (0, \sigma_u^2 \beta_1)^T$$

Corrected score equations are available for polynomial regression, Poisson regression, and GLMs with misclassification

General simulation method by Stefanski
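
The defining property of the corrected score can be checked by Monte Carlo for the linear regression example. The Python sketch below fixes one pair (Y, X), averages the corrected score over simulated measurement errors, and compares the average with the true score; all values are hypothetical.

```python
import random

def psi_true(y, x, b0, b1):
    """Least squares score for one observation with the true covariate."""
    r = y - b0 - b1 * x
    return (r, r * x)

def psi_cs(y, x_star, b0, b1, var_u):
    """Corrected score: naive score plus the correction term (0, var_u * b1)."""
    r = y - b0 - b1 * x_star
    return (r, r * x_star + var_u * b1)

rng = random.Random(5)
y, x, b0, b1, sd_u = 3.0, 1.5, 1.0, 2.0, 0.5
draws = 200_000
s0 = s1 = 0.0
for _ in range(draws):
    p = psi_cs(y, x + rng.gauss(0, sd_u), b0, b1, sd_u ** 2)
    s0 += p[0]
    s1 += p[1]
avg = (s0 / draws, s1 / draws)       # Monte Carlo estimate of E_U(psi_CS | Y, X)
target = psi_true(y, x, b0, b1)      # equals (-1.0, -1.5) for these values
print(avg, target)
```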


Case study : Occupational Dust and chronic bronchitis

Ku/Carroll (1997) and Gossl/Ku (2001)

Research question: Relationship between occupational dust and chronic bronchitis

Data from N = 1246 workers:

X: log(1 + average occupational dust exposure)
Y: Chronic bronchitis (CBR)
X∗: Measurements and expert ratings
Z1: Smoking
Z2: Duration of exposure

No validation or replication data available!


The Model

Segmented logistic regression with an unknown threshold limiting value (TLV) τ

P (Y = 1|X = x,Z = z) = G(z′βk− + βk(x− τ)+),

(x− τ)+ = max(0, x− τ).
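
A minimal Python sketch of this response function. The argument names `beta_z` (coefficients of the covariates z, including an intercept) and `beta_x` (slope above the threshold) are hypothetical labels introduced for the illustration, as are the numbers.

```python
import math

def segmented_logistic(x, z, beta_z, beta_x, tau):
    """P(Y = 1 | X = x, Z = z) with no exposure effect below the threshold tau."""
    eta = sum(zj * bj for zj, bj in zip(z, beta_z)) + beta_x * max(0.0, x - tau)
    return 1.0 / (1.0 + math.exp(-eta))

z = [1.0, 0.0]            # intercept and one further covariate
beta_z = [-1.0, 0.3]
p_below_a = segmented_logistic(0.5, z, beta_z, beta_x=2.0, tau=1.5)
p_below_b = segmented_logistic(1.0, z, beta_z, beta_x=2.0, tau=1.5)
p_above = segmented_logistic(2.0, z, beta_z, beta_x=2.0, tau=1.5)
print(p_below_a, p_below_b, p_above)
```

Below τ the probability does not depend on x; above τ it increases with slope `beta_x` on the logit scale.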


Effect of measurement error

[Figure: curves for error variance = 0.0, 0.5, and 1.0]

[Figure: Munich data. Chronic bronchitis (0/1) against log(1+dust concentration), with fitted curves: naive GAM, naive threshold, ordinary logistic]

Likelihood

• Probit approximation

• Calculation of the integrals

• Assumption of a mixture of two normals for the exposure model

• Fixed additive measurement error


Regression calibration

• Assumption of a mixture of two normals for the exposure model

• Fixed additive measurement error


Results

Method                   TLV τ0   Nom. s.e.   Boot. s.e.

Naive                    1.27     .41         .24

Pseudo-MLE               1.76     .17         .21

Regression Calibration   1.75     .12         .19

Table 1: Estimated TLV in the Munich data, when σ²u = 0.035.


Bayes

• Fixed additive measurement error: sensitivity analysis

• Flat prior for the measurement error: no convergence

• Assumption of a mixture of normals for the exposure model

• Both error models: Berkson and classical additive


Results: Estimation of TLV

[Figure: classical error model]

[Figure: Berkson error model]

Conclusions

1. The measurement model is essential: in most cases the effects of Berkson and classical measurement error differ greatly!

2. Additive classical non differential measurement error leads to attenuation

3. Many methods available

4. Regression calibration works in many cases

5. ML should be taken into account for Berkson error

6. Bayesian analysis is useful especially if model structure is complex


MISCLASSIFICATION AND MEASUREMENT ERROR IN REGRESSION MODELS

Part 4

Helmut Küchenhoff
Statistical Consulting Unit

Ludwig-Maximilians-Universität München

Padova, 2./3.10.2007


2. GENERAL SIMEX IDEA

Linear regression with additive measurement error

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \dots, n$

$X_i^* = X_i + U_i$ with $U_i, X_i, \varepsilon_i$ independent

$\mathrm{Var}(X_i) = \sigma_X^2$ & $\varepsilon_i \sim N(0, \sigma_\varepsilon^2)$

$U_i \sim N(0, \sigma_u^2)$

• Ignoring measurement error ($U_i$) ⇒ naïve estimation in $Y_i = \beta_0^* + \beta_1^* X_i^* + \varepsilon_i^*$

$\operatorname{plim} \hat\beta_{1,\text{naive}} = \beta_1^* = \frac{\sigma_X^2}{\sigma_X^2 + \sigma_u^2}\,\beta_1$

⇒ Attenuation increases with measurement error variance


[Figure: limit of the naive estimator (vertical axis, 0.4 to 1.0) plotted against the variance of the measurement error (horizontal axis, 0.0 to 2.0); linear regression with Var(X) = 1; reference level: true β1]
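The attenuation result can be checked numerically. The following is a minimal sketch, not part of the original slides: it simulates a linear regression with Var(X) = 1 as in the figure and compares the naive slope with the theoretical limit β1 σ_X² / (σ_X² + σ_u²). The sample size, seed, true slope, residual variance and the helper names `ols_slope` and `attenuation_factor` are illustrative assumptions.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y on x (simple linear regression)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def attenuation_factor(var_x, var_u):
    """Theoretical limit of the naive slope divided by the true slope."""
    return var_x / (var_x + var_u)


rng = random.Random(2007)        # arbitrary seed (assumption)
n, beta0, beta1 = 20000, 0.0, 1.0
var_x, var_eps = 1.0, 0.25       # Var(X) = 1 as in the figure; var_eps is illustrative

x = [rng.gauss(0.0, var_x ** 0.5) for _ in range(n)]
y = [beta0 + beta1 * xi + rng.gauss(0.0, var_eps ** 0.5) for xi in x]

results = {}
for var_u in (0.0, 0.5, 1.0, 2.0):
    # Observe X* = X + U instead of X and fit the naive regression of Y on X*.
    x_star = [xi + rng.gauss(0.0, var_u ** 0.5) for xi in x]
    naive = ols_slope(x_star, y)
    limit = beta1 * attenuation_factor(var_x, var_u)
    results[var_u] = (naive, limit)
    print(f"var_u={var_u:.1f}  naive slope={naive:.3f}  theoretical limit={limit:.3f}")
```

With a large sample the naive slope should track the theoretical limit closely, falling from about β1 at zero error variance to about β1/3 at error variance 2.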


SIMEX idea (Cook & Stefanski, 1994)

• Assume $\sigma_u^2$ is known

• Observe $(Y_i, X_i^*, Z_i)_{i=1}^n$ instead of $(Y_i, X_i, Z_i)_{i=1}^n$

• SIMulation step: generate more measurement error + calculate naïve estimators

1. Simulate pseudo-data $X_{b,i}^*(\lambda) = X_i^* + \sqrt{\lambda}\,\sigma_u U_{b,i}$ with $U_{b,i} \sim N(0,1)$, for a fixed grid $\lambda \in (\lambda_0 \equiv 0, \lambda_1, \lambda_2, \dots, \lambda_m)$

⇒ $\mathrm{Var}(X_{b,i}^*(\lambda)) = \sigma_X^2 + (1 + \lambda)\,\sigma_u^2$

2. Do this B times (b = 1, …, B)

3. Calculate mean: $\hat\beta(\lambda_k) = \frac{1}{B} \sum_{b=1}^{B} \hat\beta_{\text{NAIVE}}\big[(Y_i, X_{b,i}^*(\lambda_k), Z_i)_{i=1}^n\big]$


• EXtrapolation step: extrapolate back to λ = −1 to estimate β

1. Fit parametrically the relation $(\lambda_k, \hat\beta(\lambda_k)), \quad k = 0, \dots, m$

2. Find $\hat\beta_{\text{SIMEX}} \equiv \hat\beta(-1)$ for all regression coefficients


Example

$Y_i = 1 + 5 X_i + \varepsilon_i, \quad i = 1, \dots, 200$, with $X_i \sim N(0, 2^2)$ & $\varepsilon_i \sim N(0, 1)$

Naive model: $Y_i = \beta_0^* + \beta_1^* (X_i + U_i) + \varepsilon_i^*$, with $U_i \sim N(0, 1)$

[Figure, built up over five slides: naive estimator (vertical axis, 2 to 6) plotted against λ (horizontal axis, −1 to 2)]

Average of extrapolated estimate: $\hat\beta_{1,\text{SIMEX}} = 4.86$
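The simulation and extrapolation steps can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the code behind the slides: it uses the example's setup (n = 200, β0 = 1, β1 = 5, X ~ N(0, 2²), U ~ N(0, 1)), a quadratic extrapolant, a λ grid of 0, 0.5, …, 2 and B = 50, all of which are choices made here. The helper names (`ols_slope`, `solve_linear`, `fit_quadratic`, `simex_slope`) are hypothetical, and the seed is arbitrary, so the result will not reproduce the value 4.86 exactly.

```python
import random


def ols_slope(x, y):
    """Naive estimator: least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sum((a - mx) ** 2 for a in x)


def solve_linear(a, b):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(lams, betas):
    """Least-squares fit of g(lam) = g0 + g1*lam + g2*lam**2 (normal equations)."""
    rows = [[1.0, l, l * l] for l in lams]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, betas)) for i in range(3)]
    return solve_linear(ata, atb)


def simex_slope(x_star, y, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), B=50, rng=None):
    """SIMEX estimate of the slope with a quadratic extrapolant."""
    rng = rng or random.Random(0)
    means = []
    for lam in lambdas:
        extra_sd = (lam ** 0.5) * sigma_u          # sqrt(lambda) * sigma_u
        estimates = []
        for _ in range(B):
            # SIMulation: add extra error so the total error variance is (1 + lam) * sigma_u^2
            x_b = [xs + extra_sd * rng.gauss(0.0, 1.0) for xs in x_star]
            estimates.append(ols_slope(x_b, y))
        means.append(sum(estimates) / B)
    g0, g1, g2 = fit_quadratic(lambdas, means)
    return g0 - g1 + g2, means                     # EXtrapolation: g(-1)


rng = random.Random(2007)                          # arbitrary seed (assumption)
n, sigma_u = 200, 1.0
x = [rng.gauss(0.0, 2.0) for _ in range(n)]
y = [1.0 + 5.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
x_star = [xi + rng.gauss(0.0, sigma_u) for xi in x]

naive = ols_slope(x_star, y)
simex, curve = simex_slope(x_star, y, sigma_u, rng=rng)
print(f"naive slope = {naive:.2f}, SIMEX slope = {simex:.2f}, true slope = 5")
```

The quadratic extrapolant is used here because it has a closed-form least-squares fit; it typically removes most, but not all, of the attenuation in this setting.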


Extrapolation functions

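Linear: $g(\lambda) = \gamma_0 + \gamma_1 \lambda$

Quadratic: $g(\lambda) = \gamma_0 + \gamma_1 \lambda + \gamma_2 \lambda^2$

Nonlinear: $g(\lambda) = \gamma_1 + \frac{\gamma_2}{\gamma_3 + \lambda}$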

• Nonlinear is motivated by linear regression

• Quadratic works fine in many examples

• Motivation by Taylor Series expansions
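The remark that the nonlinear extrapolant is motivated by linear regression can be made explicit with a short derivation. It is a sketch based on the attenuation formula for the linear model with additive error, writing the nonlinear extrapolant as g(λ) = γ1 + γ2/(γ3 + λ) and using σ_u² for the known error variance; the notation β1*(λ) for the limit of the naive slope at grid value λ is introduced here for illustration.

```latex
% Limit of the naive slope after adding simulated error with variance \lambda\sigma_u^2
\[
\beta_1^*(\lambda)
  = \frac{\sigma_X^2}{\sigma_X^2 + (1+\lambda)\,\sigma_u^2}\,\beta_1
  = \frac{\gamma_2}{\gamma_3 + \lambda},
\qquad
\gamma_1 = 0,\quad
\gamma_2 = \frac{\sigma_X^2}{\sigma_u^2}\,\beta_1,\quad
\gamma_3 = \frac{\sigma_X^2 + \sigma_u^2}{\sigma_u^2}.
\]
% Extrapolating to \lambda = -1 removes the error variance entirely
\[
g(-1) = \frac{\gamma_2}{\gamma_3 - 1}
      = \frac{\sigma_X^2\,\beta_1/\sigma_u^2}{\sigma_X^2/\sigma_u^2}
      = \beta_1 .
\]
```

For the example with β1 = 5, σ_X² = 4 and σ_u² = 1 this gives γ2 = 20 and γ3 = 5, so g(0) = 4 is the naive limit and g(−1) = 5 is the true slope. In the linear model the nonlinear extrapolant is therefore exact in the limit, while the linear and quadratic forms are approximations.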


Variance estimation

• Delta method (Carroll et al. (1996))

• For known error variance, the variance can also be estimated by extrapolation (Stefanski and Cook (1995))

• Bootstrap (computer intensive)


Case study: Occupational dust and chronic bronchitis

Küchenhoff/Carroll (1997) and Gössl/Küchenhoff (2001)

Research question: Relationship between occupational dust and chronic bronchitis

Data from N = 1246 workers:

X: log(1 + average occupational dust exposure)
Y: Chronic bronchitis (CBR)
X∗: Measurements and expert ratings
Z1: Smoking
Z2: Duration of exposure


Results for the TLV

Method                   TLV-τ0   Nominal s.e.   Bootstrap s.e.
Naive                    1.27     .41            .24
Pseudo-MLE               1.76     .17            .21
Regression Calibration   1.75     .12            .19
SIMEX: linear            1.37     .23            .23
SIMEX: quadratic         1.40     .23            .34
SIMEX: nonlinear         1.40     .23            .86


Misclassification SIMEX

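General regression model with misclassification matrix Π

$\beta^*(\Pi) := \operatorname{plim} \hat\beta_{\text{naive}}$

$\beta^*(I_{k \times k}) = \beta$

Problem: $\beta^*(\Pi)$ is a function of a matrix. We define:

$\lambda \to \beta^*(\Pi^\lambda)$

$\Pi^\lambda$ is defined by $\Pi^0 = I_{k \times k}$, $\Pi^{n+1} = \Pi^n \Pi$ for λ = 0, 1, 2, ...

$\Pi^\lambda := E \Lambda^\lambda E^{-1}$

E := Matrix of eigenvectors

Λ := Diagonal matrix of eigenvalues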
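The definition of the matrix power through the eigendecomposition can be tried out on a binary misclassification matrix. The following is a minimal sketch restricted to 2×2 matrices with two distinct, positive eigenvalues, written without external libraries. The function names and the column convention (entry [i][j] = P(X∗ = i | X = j)) are assumptions made here, and the matrix with π00 = π11 = 0.8 is only an illustrative choice.

```python
def matmul2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mc_matrix_power(pi, lam):
    """Pi^lam = E diag(eig^lam) E^{-1} for a non-diagonal 2x2 matrix
    with two distinct, positive, real eigenvalues."""
    (a, b), (c, d) = pi
    if b == 0 and c == 0:
        raise ValueError("diagonal matrix: take powers of the diagonal directly")
    disc = (a - d) ** 2 + 4.0 * b * c              # trace^2 - 4 * determinant
    if disc <= 0:
        raise ValueError("eigenvalues must be real and distinct")
    root = disc ** 0.5
    eig = [(a + d + root) / 2.0, (a + d - root) / 2.0]
    if min(eig) <= 0:
        raise ValueError("a real power for every lam needs positive eigenvalues")
    # An eigenvector v for eigenvalue l solves (Pi - l*I) v = 0.
    vecs = [(b, l - a) if b != 0 else (l - d, c) for l in eig]
    e = [[vecs[0][0], vecs[1][0]],                 # columns of E are the eigenvectors
         [vecs[0][1], vecs[1][1]]]
    det_e = e[0][0] * e[1][1] - e[0][1] * e[1][0]
    e_inv = [[e[1][1] / det_e, -e[0][1] / det_e],
             [-e[1][0] / det_e, e[0][0] / det_e]]
    lam_diag = [[eig[0] ** lam, 0.0], [0.0, eig[1] ** lam]]
    return matmul2(matmul2(e, lam_diag), e_inv)


def is_valid_mc_matrix(m, tol=1e-9):
    """True if all entries are non-negative and each column sums to one
    (assumed convention: entry [i][j] = P(X* = i | X = j))."""
    nonneg = all(v >= -tol for row in m for v in row)
    cols_ok = all(abs(m[0][j] + m[1][j] - 1.0) < tol for j in range(2))
    return nonneg and cols_ok


pi = [[0.8, 0.2], [0.2, 0.8]]                      # illustrative: pi00 = pi11 = 0.8
for lam in (0.0, 0.5, 1.0, 2.0, -1.0):
    p = mc_matrix_power(pi, lam)
    rounded = [[round(v, 4) for v in row] for row in p]
    print(f"lambda={lam:+.1f}  Pi^lambda={rounded}  valid MC-matrix: {is_valid_mc_matrix(p)}")
```

For integer λ the result agrees with repeated multiplication, and λ = 0 gives the identity. For λ = −1 the result is the matrix inverse, which in this example has negative entries and so is not itself a misclassification matrix; it only serves as the extrapolation target.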


Parameter estimation in relation to the amount of misclassification

Logistic regression with misclassified X (π11 = π00 = 0.8)

[Figure: limit of the naive estimator (vertical axis, 0 to 1) plotted against λ (horizontal axis, 0.5 to 3)]


Properties of the function $\beta^*(\Pi^\lambda)$

• $\beta^*(\Pi^0) = \beta$

• differentiable

• If X∗ is misclassified in relation to X by MC-matrix $\Pi$, and X∗(λ) is misclassified in relation to X∗ by MC-matrix $\Pi^\lambda$, ⇒ X∗(λ) is misclassified in relation to X by MC-matrix $\Pi^{\lambda+1}$


The MC-SIMEX Procedure

Data $(Y_i, X_i^*, Z_i)_{i=1}^n$; X∗ is observed instead of X with MC-matrix Π

Naive estimator: $\hat\beta_{\text{naive}}\big[(Y_i, X_i^*, Z_i)_{i=1}^n\big]$

Simulation

For a fixed grid $\lambda_1, \dots, \lambda_m$, B new pseudo data sets are generated by

$X_{b,i}^*(\lambda_k) := MC[\Pi^{\lambda_k}](X_i^*), \quad i = 1, \dots, n; \; b = 1, \dots, B$

Here $MC[M](X_i^*)$ is simulated from $X_i^*$ using the misclassification matrix M.

$\hat\beta(\lambda_k) := B^{-1} \sum_{b=1}^{B} \hat\beta_{\text{naive}}\big[(Y_i, X_{b,i}^*(\lambda_k), Z_i)_{i=1}^n\big], \quad k = 1, \dots, m$


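Extrapolation

Parametric model:

$\beta^*(\Pi^\lambda) = G(\lambda, \Gamma) = \gamma_0 + \gamma_1 \lambda + \gamma_2 \lambda^2$

Fit by least squares from data $[\lambda_k, \hat\beta(\lambda_k)]_{k=0}^m$.

$\hat\beta_{\text{SIMEX}} := G(-1, \hat\Gamma)$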
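The simulation and extrapolation steps of MC-SIMEX can be sketched for the simplest case of one binary covariate. This is an illustrative sketch under several assumptions that are not in the slides: a linear model Y = β0 + β1 X + ε stands in for the general regression model, the naive estimator is the difference of group means (the OLS slope for a binary covariate), and the sample size, P(X = 1) = 0.5, π00 = π11 = 0.8, λ grid and B = 20 are arbitrary choices. For a 2×2 MC-matrix the power Π^λ is written in closed form from its eigenvalues 1 and π00 + π11 − 1. All function names are hypothetical.

```python
import random


def power_binary_mc(pi00, pi11, lam):
    """Diagonal (pi00, pi11) of Pi^lam for a 2x2 MC-matrix, using its
    eigenvalues 1 and delta = pi00 + pi11 - 1 (requires 0 < delta < 1)."""
    delta = pi00 + pi11 - 1.0
    if not 0.0 < delta < 1.0:
        raise ValueError("sketch assumes 0 < pi00 + pi11 - 1 < 1")
    s0 = (1.0 - pi11) / (2.0 - pi00 - pi11)   # eigenvector for eigenvalue 1
    s1 = 1.0 - s0
    w = delta ** lam
    return s0 + w * (1.0 - s0), s1 + w * (1.0 - s1)


def misclassify(x, pi00, pi11, rng):
    """Draw MC[M](x): keep a 0 with prob. pi00 and a 1 with prob. pi11, else flip."""
    return [xi if rng.random() < (pi11 if xi == 1 else pi00) else 1 - xi for xi in x]


def naive_slope(x, y):
    """OLS slope for a binary covariate = difference of the two group means."""
    y1 = [b for a, b in zip(x, y) if a == 1]
    y0 = [b for a, b in zip(x, y) if a == 0]
    return sum(y1) / len(y1) - sum(y0) / len(y0)


def solve_linear(a, b):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(lams, betas):
    """Least-squares fit of G(lam) = g0 + g1*lam + g2*lam**2."""
    rows = [[1.0, l, l * l] for l in lams]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * b for r, b in zip(rows, betas)) for i in range(3)]
    return solve_linear(ata, atb)


def mc_simex(x_star, y, pi00, pi11, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), B=20, rng=None):
    """MC-SIMEX slope estimate with quadratic extrapolation to lambda = -1."""
    rng = rng or random.Random(0)
    means = []
    for lam in lambdas:
        p00, p11 = power_binary_mc(pi00, pi11, lam)   # lam = 0 gives the identity
        est = [naive_slope(misclassify(x_star, p00, p11, rng), y) for _ in range(B)]
        means.append(sum(est) / B)
    g0, g1, g2 = fit_quadratic(lambdas, means)
    return g0 - g1 + g2, means                         # G(-1, Gamma-hat)


rng = random.Random(2007)                              # arbitrary seed (assumption)
n, beta0, beta1 = 4000, 1.0, 2.0
x = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
y = [beta0 + beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]
x_star = misclassify(x, 0.8, 0.8, rng)                 # observed, misclassified covariate

naive = naive_slope(x_star, y)
simex, curve = mc_simex(x_star, y, 0.8, 0.8, rng=rng)
print(f"naive = {naive:.2f}, MC-SIMEX = {simex:.2f}, true beta1 = {beta1}")
```

In this setup the limit of the naive estimator decays geometrically in λ, so the quadratic extrapolant is only an approximation and some bias is expected to remain.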


Extrapolation function

Calculate the true function β∗(Π) in several examples and simulation studies

• Function is monotonic

• Quadratic extrapolation is suitable

• Log-linear extrapolant


[Figure: limit of the naive estimator (vertical axis, 0.0 to 1.0) plotted against λ (horizontal axis, 0.0 to 3.0)]


Delta Method Variance Estimation

• All estimators are solutions of (biased) estimating equations

• Asymptotic expansions on averages of different estimating equations

• Extrapolation is a differentiable transformation

• Estimation of the misclassification matrix can be included


7. APPLICATION TO THE SIGNAL TANDMOBIEL® STUDY

• Oral health study involving 4468 children in Flanders (Belgium)

• Children were examined annually by one of 16 dental examiners

• Binary response Y = 1 if tooth is decayed, filled or extracted due to caries

• GEE analysis for caries (combined response & individual teeth) on the 4 first molars as a function of covariates

• Questions:

East-West gradient in caries experience on the 4 first molars?

Does the trend remain the same over time?

But: dental examiners showed high & differing misclassification relative to a benchmark scorer


STS: East-West trend in caries experience (1st year's cross-sectional results)

[Map of the provinces West Flanders, East Flanders, Brabant, Antwerp and Limburg, shaded from low caries to high caries; key: [0.49, 1.10], (1.10, 1.40], (1.40, 2.25]]


STS: Dental examiners are active in restricted geographical areas

[Map of the provinces West Flanders, East Flanders, Brabant, Antwerp and Limburg, shaded from underscoring to overscoring; key: (−15%, −5%], (−5%, 5%], (5%, 18%)]

⇒ East-West gradient?


Results SIMEX approach (individual teeth)

• X-coordinate


Remarks

• Existence of $\Pi^\lambda$ for λ < 1 has to be checked

• If β is a vector, use MC-SIMEX for every component

• The procedure also works for misclassified Y or more general cases

• $\hat\beta_{\text{SIMEX}}$ is consistent if the extrapolating function is correctly specified.

• In general MC-SIMEX is approximately consistent if $G(\lambda, \Gamma)$ is a good approximation of $\beta^*(\Pi^\lambda)$.


Results SIMEX approach (individual teeth)

• East-West gradient confirmed

• East-West gradient increases over the years

• ….


Software

• R-Package available (W. Lederer)

• Flexible statement for the main model

• Misclassification and additive measurement error

• Graphic display for the results

Lederer and Küchenhoff, R News (2006)


Summary

• Very general computer intensive method

• Illustration of the effect of misclassification

• MC in X, Y or both, differential MC etc. can be handled

• Misclassification known or can be estimated by validation data
