
MEASUREMENT ERROR IN HEALTH STUDIEScarroll/talks/Carroll_Short_Course.pdf · † Lecture 2 — Data Types, Nondifferential Error, Estimating Attentuation, Exact Predictors, Berkson/Classical


MEASUREMENT ERROR IN HEALTH STUDIES

• Lecture 1: Introduction, Examples, Effects of Measurement Error in Linear Models

• Lecture 2: Data Types, Nondifferential Error, Estimating Attenuation, Exact Predictors, Berkson/Classical

• Lecture 3: Distinguishing Berkson from Classical, Structural/Functional, Regression Calibration, Non-Additive Errors

• Lecture 4: SIMEX and Instrumental Variables

• Lecture 5: Bayesian Methods

• Lecture 6: Nonparametric Regression with Measurement Error

LECTURE 1: INTRODUCTION, EXAMPLES AND LINEAR MEASUREMENT ERROR MODELS

OUTLINE

• Why We Need Special Methods For Measurement Errors

• Measurement Error Examples

• Structure of a Measurement Error Problem

• Classical Error Model in Linear Regression

WHY WE NEED SPECIAL METHODS TO HANDLE MEASUREMENT ERRORS

• These lectures are about strategies for regression when some predictors are measured with error.

• Remember your introductory regression text...
  ∗ Snedecor and Cochran (1967), “Thus far we have assumed that X-variable in regression is measured without error. Since no measuring instrument is perfect this assumption is often unrealistic.”
  ∗ Steel and Torrie (1980), “... if the X’s are also measured with error, ... an alternative computing procedure should be used...”
  ∗ Neter and Wasserman (1974), “Unfortunately, a different situation holds if the independent variable X is known only with measurement error.”

WHY SPECIAL METHODS ARE NEEDED (CONT.)

• Measurement error in outcome: If

    Y = β0 + β1X + ε

  and we observe Y∗ = Y + U, then

    Y∗ = β0 + β1X + (ε + U)

  ∗ All that has happened is that the error variance is bigger
  ∗ Standard regression applies (provided U has mean zero and is unrelated to X)

WHY SPECIAL METHODS ARE NEEDED (CONT.)

• Consider now:

    Y = β0 + β1X + ε
    W = X + U

  and only W and Y are observed. Then

    Y = β0 + β1W + (ε − β1U)

  ∗ This seems like a standard regression problem with additional error.
    ▸ Can we analyze it that way?

WHY SPECIAL METHODS ARE NEEDED (CONT.)

• From previous page: X = W − U implies

    Y = β0 + β1W + (ε − β1U)

• Problem: W is correlated with the “error” ε − β1U

• Better to replace W by a function of W, namely E(X|W), since

    X = E(X|W) + V, where E(X|W) and V are uncorrelated

  and then we have the model

    Y = β0 + β1E(X|W) + (ε + β1V)

  ∗ special case of “regression calibration”
  ∗ leads to unbiased estimates
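
The replacement of W by E(X|W) can be illustrated with a small simulation. This is only a sketch under assumptions that are not on the slide: X and U are normal, and µx, σx², σu² are treated as known, so that E(X|W) = λW + (1 − λ)µx with λ = σx²/(σx² + σu²). The helper names `ols` and `simulate_rc` and all parameter values are hypothetical.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate_rc(n=100_000, beta0=1.0, beta1=2.0, mu_x=3.0,
                sd_x=1.0, sd_u=1.0, sd_eps=0.5, seed=1):
    """Fit Y on W (naive) and Y on E(X|W) (calibrated); values are illustrative."""
    rng = random.Random(seed)
    x = [rng.gauss(mu_x, sd_x) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, sd_eps) for xi in x]
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]
    # Assumption: lambda and mu_x are known, so E(X|W) is available exactly.
    lam = sd_x**2 / (sd_x**2 + sd_u**2)
    x_hat = [lam * wi + (1.0 - lam) * mu_x for wi in w]
    return ols(w, y), ols(x_hat, y)

naive, calibrated = simulate_rc()
print("naive      (intercept, slope):", naive)
print("calibrated (intercept, slope):", calibrated)
```

With these settings the naive fit should land near intercept 4 and slope 1, while the calibrated fit should land near the true values β0 = 1 and β1 = 2. In practice λ and µx have to be estimated, which later lectures address.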

EXAMPLES OF MEASUREMENT ERROR PROBLEMS

• Different measures of nutrient intake, e.g., by a food frequency questionnaire (FFQ)

• Systolic Blood Pressure

• Radiation Dosimetry

• Exposure to arsenic in drinking water, dust in the workplace, radon gas in the home, and other environmental hazards

MEASURES OF NUTRIENT INTAKE

• Y = average daily % calories from fat by a FFQ.

• X = true long-term average daily percentage of calories from fat

• Assume Y = β0 + βxX + ε
  ∗ we do not assume that the FFQ is unbiased

• X is never observable. It is measured with error:
  ∗ Along with the FFQ, on 6 days over the course of a year women are interviewed by phone and asked to recall their food intake over the past 24 hours (24-hour recalls). Their average is recorded and denoted by W.
    ▸ W is an “alloyed gold standard”
  ∗ The analysis of 24-hour recall introduces some error =⇒ analysis error
  ∗ Measurement error = sampling error + analysis error
  ∗ Classical measurement error model:

      Wi = Xi + Ui, Ui are measurement errors

HEART DISEASE VS SYSTOLIC BLOOD PRESSURE

• Y = indicator of Coronary Heart Disease (CHD)

• X = true long-term average systolic blood pressure (SBP) (maybe transformed)

• Assume P(Y = 1) = H(β0 + βxX), H is the logistic or probit function

• Data are CHD indicators and determinations of systolic blood pressure for n = 1600 in Framingham Heart Study

HEART DISEASE VS SYSTOLIC BLOOD PRESSURE (CONT.)

• X measured with error:
  ∗ SBP measured at two exams (and averaged) =⇒ sampling error
  ∗ The determination of SBP is subject to machine and reader variability =⇒ analysis error
  ∗ Measurement error = sampling error + analysis error
  ∗ Measurement error model

      Wi = Xi + Ui, Ui are measurement errors

GENERAL STRUCTURE OF A MEASUREMENT ERROR PROBLEM

• Y = response, Z = error-free predictor, X = error-prone predictor

• E(Y|Z, X) = f(Z, X, β) (outcome model)

• Observed data: (Yi, Zi, Wi), i = 1, ..., n
  ∗ E(Y|Z, W) ≠ f(Z, W, β) (source of our worries)

• Error model relating Wi and Xi (measurement model)
  ∗ Wi = Xi + Ui (classical error model)
  ∗ \( W_i = \gamma_{0,em} + \gamma_{x,em}^{t} X_i + \gamma_{z,em}^{t} Z_i + U_i \) (error calibration model)

A CLASSICAL ERROR MODEL

• Wi = Xi + Ui (additive)

• Ui are:
  ∗ independent of all Yi, Zi and Xi
  ∗ IID with mean 0 and variance \( \sigma_u^2 \)

• In addition, we may have a model for the distribution of (X, Z) (exposure model)
  ∗ called a structural model

SIMPLE LINEAR REGRESSION WITH A CLASSICAL ERROR MODEL

• Y = response, X = error-prone predictor

• Y = β0 + βxX + ε

• Observed data: (Yi, Wi), i = 1, ..., n

• Wi = Xi + Ui (additive)

• Ui are:
  ∗ independent of all Yi and Xi (there is no Z in this simple model)
  ∗ IID with mean 0 and variance \( \sigma_u^2 \)

What are the effects of measurement error on the usual analysis?

SIMULATION STUDY

• Generate X1, ..., X50, IID N(0, 1)

• Generate Yi = β0 + βxXi + εi
  ∗ εi IID N(0, 1/9)
  ∗ β0 = 0
  ∗ βx = 1

• Generate U1, ..., U50, IID N(0, 1)

• Set Wi = Xi + Ui

• Regress Y on X and Y on W and compare
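
The recipe above can be sketched in a few lines of Python. Two assumptions are made here that the slide does not state: N(0, 1/9) is read as variance 1/9 (standard deviation 1/3), and the sample size is raised from 50 to 100,000 so that the limiting slopes are visible rather than sampling noise. The function names are hypothetical.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate(n, seed=0):
    """X ~ N(0,1), eps ~ N(0,1/9), U ~ N(0,1), beta0 = 0, betax = 1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0 / 3.0) for xi in x]  # sd 1/3 = sqrt(1/9)
    w = [xi + rng.gauss(0.0, 1.0) for xi in x]
    return ols_slope(x, y), ols_slope(w, y)

slope_x, slope_w = simulate(n=100_000)
print("slope of Y on X:", slope_x)
print("slope of Y on W:", slope_w)
```

Because σx² = σu² = 1 in this design, the slope of Y on W should settle near one half of the slope of Y on X. Calling `simulate(n=50)` reproduces the sample size on the slide, with correspondingly noisier estimates.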

[Figure: “Effects of Measurement Error”. Scatterplot of the reliable data. Caption: True Data Without Measurement Error.]

[Figure: “Effects of Measurement Error”. Scatterplot of the reliable data and the error-prone data. Caption: Observed Data With Measurement Error.]

THEORY BEHIND THE PICTURES: THE NAIVE ANALYSIS

• Least Squares Estimate of Slope:

\[ \hat\beta_x = \frac{S_{y,w}}{S_w^2} = \frac{n^{-1}\sum (Y - \bar Y)(W - \bar W)}{n^{-1}\sum (W - \bar W)^2} \]

where

\[ S_{y,w} \longrightarrow \mathrm{Cov}(Y, W) = \mathrm{Cov}(Y, X + U) = \mathrm{Cov}(Y, X) = \sigma_{y,x} \]

\[ S_w^2 \longrightarrow \mathrm{Var}(W) = \mathrm{Var}(X + U) = \sigma_x^2 + \sigma_u^2 \]

So

\[ \hat\beta_x \longrightarrow \frac{\sigma_{y,x}}{\sigma_x^2 + \sigma_u^2} = \left( \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} \right) \frac{\sigma_{y,x}}{\sigma_x^2} = \lambda \beta_x \]

where

\[ \lambda = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} = \frac{\sigma_x^2}{\sigma_w^2} = \text{attenuation factor} = \text{reliability ratio} \]

  ∗ It is the relative size of the error that matters

THEORY BEHIND THE PICTURES: THE NAIVE ANALYSIS

• Least Squares Estimate of Intercept:

\[ \hat\beta_0 = \bar Y - \hat\beta_x \bar W \longrightarrow \mu_y - \lambda \beta_x \mu_x = \beta_0 + (1 - \lambda) \beta_x \mu_x \]

• Estimate of Residual Variance:

\[ \mathrm{MSE} \longrightarrow \sigma_\varepsilon^2 + (1 - \lambda) \beta_x^2 \sigma_x^2 \]

MORE THEORY: JOINT NORMALITY

• Y, X, W jointly normal =⇒
  ∗ Y | W ∼ Normal
  ∗ \( E(Y \mid W) = \beta_0 + (1 - \lambda) \beta_x \mu_x + \lambda \beta_x W \)
  ∗ \( \mathrm{Var}(Y \mid W) = \sigma_\varepsilon^2 + (1 - \lambda) \beta_x^2 \sigma_x^2 \)

• Intercept is shifted by \( (1 - \lambda) \beta_x \mu_x \)

• Slope is attenuated by the factor λ

• Residual variance is inflated by \( (1 - \lambda) \beta_x^2 \sigma_x^2 \)

MORE THEORY: IMPLICATIONS FOR TESTING HYPOTHESES

• Because λ > 0,

    βx = 0 iff λβx = 0

  it follows that

    [H0 : βx = 0] ≡ [H0 : λβx = 0]

  so the naive test of βx = 0 is valid (correct Type I error rate).

• The naive test of H0 : βx = 0 is asymptotically efficient when E(X | W) is linear in W.

• The discussion of naive tests when there are multiple predictors measured with error, or error-free predictors, is more complicated

[Figure: Sample Size for 80% Power. True slope βx = 0.75. Variances σx² = σε² = 1. Horizontal axis: Measurement Error Variance. Vertical axis: Sample Size.]

MULTIPLE LINEAR REGRESSION WITH ERROR

• Model

\[ Y = \beta_0 + \beta_z^t Z + \beta_x^t X + \varepsilon, \qquad W = X + U \]

• Regressing Y on Z and W estimates

\[ \begin{pmatrix} \beta_{z*} \\ \beta_{x*} \end{pmatrix} = \Lambda \begin{pmatrix} \beta_z \\ \beta_x \end{pmatrix} \quad \left[ \neq \begin{pmatrix} \beta_z \\ \beta_x \end{pmatrix} \right] \]

• Λ is the attenuation matrix or reliability matrix

\[ \Lambda = \begin{pmatrix} \sigma_{zz} & \sigma_{zx} \\ \sigma_{xz} & \sigma_{xx} + \sigma_{uu} \end{pmatrix}^{-1} \begin{pmatrix} \sigma_{zz} & \sigma_{zx} \\ \sigma_{xz} & \sigma_{xx} \end{pmatrix} \]

  – Biases in components of βx∗ and βz∗ can be multiplicative or additive =⇒
    ∗ Naive test of H0 : βx = 0, βz = 0 is valid

From previous page:

• \( \begin{pmatrix} \beta_{z*} \\ \beta_{x*} \end{pmatrix} = \Lambda \begin{pmatrix} \beta_z \\ \beta_x \end{pmatrix} \quad \left[ \neq \begin{pmatrix} \beta_z \\ \beta_x \end{pmatrix} \right] \)

• \( \Lambda = \begin{pmatrix} \sigma_{zz} & \sigma_{zx} \\ \sigma_{xz} & \sigma_{xx} + \sigma_{uu} \end{pmatrix}^{-1} \begin{pmatrix} \sigma_{zz} & \sigma_{zx} \\ \sigma_{xz} & \sigma_{xx} \end{pmatrix} \)
  ∗ Naive test of H0 : βx = 0 is valid
  ∗ Naive test of H0 : βx,1 = 0 is typically not valid (βx,1 denotes a subvector of βx)
  ∗ Naive test of H0 : βz = 0 is typically not valid (same is true for subvectors)

MULTIPLE LINEAR REGRESSION WITH ERROR

• For X scalar, attenuation factor in βx∗ is

\[ \lambda_1 = \frac{\sigma_{x|z}^2}{\sigma_{x|z}^2 + \sigma_u^2} \]

  ∗ \( \sigma_{x|z}^2 \) = residual variance in regression of X on Z
  ∗ \( \sigma_{x|z}^2 \le \sigma_x^2 \) =⇒

\[ \lambda_1 = \frac{\sigma_{x|z}^2}{\sigma_{x|z}^2 + \sigma_u^2} \le \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} = \lambda \]

  ∗ =⇒ Collinearity accentuates attenuation

• Biased estimates of βz:

\[ \beta_{z*} = \beta_z + (1 - \lambda_1) \beta_x \Gamma_z \]

  ∗ Γz is from \( E(X \mid Z) = \Gamma_1 + \Gamma_z^t Z \)
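
For scalar Z and scalar X, the attenuation matrix and the two scalar formulas can be checked against each other numerically. The sketch below is only an illustration: the covariance values and coefficients are made up, and the 2×2 helpers `inv2` and `matmul2` are hypothetical names written out by hand to keep the example dependency-free.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def attenuation_matrix(s_zz, s_zx, s_xx, s_uu):
    """Lambda = Cov(Z, W)^{-1} Cov(Z, X) for scalar Z and X, W = X + U."""
    cov_zw = [[s_zz, s_zx], [s_zx, s_xx + s_uu]]
    cov_zx = [[s_zz, s_zx], [s_zx, s_xx]]
    return matmul2(inv2(cov_zw), cov_zx)

# Illustrative values (assumptions, not taken from the slides).
s_zz, s_zx, s_xx, s_uu = 1.0, 0.5, 1.0, 1.0
beta_z, beta_x = 1.0, 2.0

Lam = attenuation_matrix(s_zz, s_zx, s_xx, s_uu)
beta_z_star = Lam[0][0] * beta_z + Lam[0][1] * beta_x
beta_x_star = Lam[1][0] * beta_z + Lam[1][1] * beta_x

# Scalar formulas from the slide.
var_x_given_z = s_xx - s_zx**2 / s_zz          # residual variance of X on Z
lam1 = var_x_given_z / (var_x_given_z + s_uu)  # attenuation factor lambda_1
gamma_z = s_zx / s_zz                          # slope of E(X | Z) in Z

print("beta_x*:", beta_x_star, "vs lambda_1 * beta_x:", lam1 * beta_x)
print("beta_z*:", beta_z_star, "vs formula:", beta_z + (1 - lam1) * beta_x * gamma_z)
```

With these numbers λ1 = 0.75/1.75 ≈ 0.43, which is smaller than the simple-regression value λ = 0.5, consistent with the statement that collinearity accentuates attenuation.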

ANALYSIS OF COVARIANCE

• These results have implications for the two group ANCOVA.
  ∗ X = true covariate
  ∗ Z = dummy indicator of group 1, say

• We are interested in estimating βz, the group effect. Biased estimates of βz:

\[ \beta_{z*} = \beta_z + (1 - \lambda_1) \beta_x \Gamma_z \]

  ∗ Γz is from \( E(X \mid Z) = \Gamma_1 + \Gamma_z^t Z \)
  ∗ Γz is the difference in the mean of X among the two groups.
  ∗ Thus, biased unless X and Z are unrelated.
    ▸ Use a randomized study!!!
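
A small simulation can show the spurious group effect. This is a sketch under assumed settings that do not come from the slides: two equal-sized groups whose true covariate means differ by Γz = 1, no true group effect (βz = 0), βx = 1, and unit error variance. The function names are hypothetical.

```python
import random

def two_predictor_ols(z, w, y):
    """Coefficients of z and w in the least-squares fit of y on (1, z, w)."""
    n = len(y)
    mz, mw, my = sum(z) / n, sum(w) / n, sum(y) / n
    szz = sum((a - mz) ** 2 for a in z)
    sww = sum((b - mw) ** 2 for b in w)
    szw = sum((a - mz) * (b - mw) for a, b in zip(z, w))
    szy = sum((a - mz) * (c - my) for a, c in zip(z, y))
    swy = sum((b - mw) * (c - my) for b, c in zip(w, y))
    det = szz * sww - szw**2
    # Solve the 2x2 normal equations by Cramer's rule.
    return (sww * szy - szw * swy) / det, (szz * swy - szw * szy) / det

def simulate_ancova(n=100_000, gamma_z=1.0, beta_x=1.0, seed=2):
    rng = random.Random(seed)
    z = [float(i % 2) for i in range(n)]                       # group indicator
    x = [gamma_z * zi + rng.gauss(0.0, 1.0) for zi in z]       # covariate differs by group
    y = [beta_x * xi + rng.gauss(0.0, 1.0 / 3.0) for xi in x]  # beta_z = 0: no group effect
    w = [xi + rng.gauss(0.0, 1.0) for xi in x]                 # error-prone covariate
    return two_predictor_ols(z, x, y), two_predictor_ols(z, w, y)

(true_bz, true_bx), (naive_bz, naive_bw) = simulate_ancova()
print("adjusting for X: group effect", true_bz, "slope", true_bx)
print("adjusting for W: group effect", naive_bz, "slope", naive_bw)
```

Here σ²x|z = σu² = 1, so λ1 = 0.5 and the formula predicts an apparent group effect of (1 − λ1)βxΓz = 0.5 even though the true effect is zero. Setting `gamma_z=0.0`, as randomization would achieve on average, removes the bias.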

[Figure: Unbalanced ANCOVA. Red = True Data, Blue = Observed. Solid = First Group, Open = Second Group. No Difference In Groups. Horizontal axis: Predictors. Vertical axis: Response.]

WHAT IF WE WILL PREDICT FUTURE OUTCOMES BASED ON THE OBSERVED COVARIATE?

• If we wish to predict Y based on W, then the regression of Y on W is the correct model to use
  ∗ Though this assumes that the distribution of X and the measurement error distribution will be the same in the future as in the study
    ▸ since \( \lambda = \sigma_x^2 / (\sigma_x^2 + \sigma_u^2) \)

• In public health, interventions will change the true X, not W, so we are interested in the regression of Y on X

SUMMARY OF THE EFFECTS OF MEASUREMENT ERROR IN SIMPLE LINEAR REGRESSION

• Regression Model

    Y = β0 + βxX + ε
    W = X + U

• Attenuation Factor (Reliability Ratio)

\[ \lambda = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} \]

  ∗ 0 < λ ≤ 1
  ∗ λ = 1 ⇐⇒ σu² = 0

Regression   Intercept             Slope   Residual Variance
Y on X       β0                    βx      σε²
Y on W       β0 + (1 − λ)βxµx      λβx     σε² + (1 − λ)βx²σx²
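
The second row of the table can be wrapped in a small helper for quick what-if calculations. This is a convenience sketch; the function name `naive_limits` and its argument names are hypothetical.

```python
def naive_limits(beta0, betax, mu_x, var_x, var_u, var_eps):
    """Large-sample limits of the naive regression of Y on W = X + U."""
    lam = var_x / (var_x + var_u)  # attenuation factor (reliability ratio)
    return {
        "lambda": lam,
        "intercept": beta0 + (1.0 - lam) * betax * mu_x,
        "slope": lam * betax,
        "residual_variance": var_eps + (1.0 - lam) * betax**2 * var_x,
    }

# Settings of the Lecture 1 simulation study, reading N(0, 1/9) as variance 1/9.
limits = naive_limits(beta0=0.0, betax=1.0, mu_x=0.0,
                      var_x=1.0, var_u=1.0, var_eps=1.0 / 9.0)
print(limits)
```

For the simulation settings this gives λ = 0.5, a limiting slope of 0.5, an unchanged intercept (because µx = 0), and a residual variance of 1/9 + 1/2.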

END OF LECTURE 1

LECTURE 2: DATA TYPES, NONDIFFERENTIAL ERROR, ESTIMATING ATTENUATION, EXACT PREDICTORS, BERKSON MODEL

OUTLINE

• Nondifferential measurement error

• Estimating the attenuation

• Replication and validation data

• Internal and external subsets

• Transportability across data sets

• Is there an “exact” predictor?

• Berkson and classical measurement error

THE BASIC DATA

• A response Y

• Predictors X measured with error (unobserved).

• Predictors Z measured without error.

• A major proxy W for X.

• Sometimes, a second proxy T for X.

NONDIFFERENTIAL ERROR

• Roughly speaking, the error is said to be nondifferential if W and T would not be measured if one could have measured X.

• More formally, (W, T) are conditionally independent of Y given (X, Z).
  ⇒ (W, T) would provide no additional information about Y if X were observed

• This often makes sense, but it may be fairly subtle in each application.

NONDIFFERENTIAL ERROR

• Many crucial theoretical calculations revolve around nondifferential error.

• Consider simple linear regression: Y = β0 + βxX + ε, ε independent of X.

\[ E(Y \mid W) = \overbrace{E[\{E(Y \mid X, W)\} \mid W] = E[\{E(Y \mid X)\} \mid W]}^{(1)} = \beta_0 + \beta_x E(X \mid W). \]

  ∗ This reduces the problem to estimating E(X|W). For example,

      E(X|W) = λW + (1 − λ)µx (under joint normality and classical error)

    where \( \lambda = \sigma_x^2 / \sigma_w^2 \).

• If the error is differential, then (1) fails, and no simplification is possible.

HEART DISEASE VS SYSTOLIC BLOOD PRESSURE

• Y = indicator of Coronary Heart Disease (CHD)

• X = true long-term average systolic blood pressure (SBP) (maybe transformed)

• Assume P(Y = 1) = H(β0 + βxX)

• Data are CHD indicators and determinations of systolic blood pressure for n = 1600 in Framingham Heart Study

• X measured with error:
  ∗ SBP measured at two exams (and averaged) =⇒ sampling error
  ∗ The determination of SBP is subject to machine and reader variability

• It is hard to believe that the short term average of two days carries any additional information about the subject’s chance of CHD over and above true SBP.

• Hence, nondifferential


IS THIS NONDIFFERENTIAL?

• From Tosteson et al. (1989).

• Y = I{wheeze}.

• X is personal exposure to NO2.

• W = (NO2 in kitchen, NO2 in bedroom) is observed in the primary study.

IS THIS NONDIFFERENTIAL?

• From Kuchenhoff & Carroll

• Y = I{lung irritation}.

• X is actual personal long-term dust exposure

• W is dust exposure as measured by occupational epidemiology techniques.
  ∗ They sampled the plant for dust.
  ∗ Then they tried to match the person to where he/she worked.

WHAT IS NECESSARY TO DO AN ANALYSIS?

• In linear regression with classical additive error W = X + U, one needs:
  ∗ Nondifferential error
  ∗ An estimate of the error variance var(U), since \( \lambda = (\sigma_w^2 - \sigma_u^2) / \sigma_w^2 \).

• How do we get the latter information?

• The best way is to get a subsample of the study in which X is observed. This is called validation.
  ∗ In many applications not possible.

• Another method is to do replications of the process, often called calibration.

• A third way is to get the value from another similar study.

REPLICATION

• In a replication study, for some of the study participants you measure more than one W.

• The standard model is

    Wij = Xi + Uij, j = 1, ..., mi.

• This is a one-factor ANOVA with mean squared error \( \sigma_u^2 \) estimated by

\[ \hat\sigma_u^2 = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m_i} (W_{ij} - \bar W_{i\bullet})^2}{\sum_{i=1}^{n} (m_i - 1)}. \]

• As the proxy for Xi one would use \( \bar W_{i\bullet} \) where

\[ \bar W_{i\bullet} = X_i + \bar U_{i\bullet}, \qquad \mathrm{var}(\bar U_{i\bullet}) = \sigma_u^2 / m_i. \]
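
The pooled within-subject estimator above is straightforward to compute. The sketch below assumes the replicates are stored as one list per subject (possibly of unequal length); the function name and the toy data are hypothetical.

```python
def pooled_error_variance(replicates):
    """Within-subject mean square: sum_i sum_j (W_ij - Wbar_i)^2 / sum_i (m_i - 1)."""
    ss, df = 0.0, 0
    for reps in replicates:
        m = len(reps)
        mean = sum(reps) / m
        ss += sum((w - mean) ** 2 for w in reps)
        df += m - 1  # subjects with a single measurement contribute nothing
    if df == 0:
        raise ValueError("need at least one subject with two or more replicates")
    return ss / df

# Toy data: three subjects with 2, 3 and 1 replicates.
reps = [[1.0, 3.0], [2.0, 2.0, 5.0], [4.0]]
sigma2_u_hat = pooled_error_variance(reps)
print(sigma2_u_hat)
```

For the toy data the within-subject sums of squares are 2, 6 and 0 on 1, 2 and 0 degrees of freedom, so the estimate is 8/3. If the subject mean of m replicates is used as the proxy, its error variance is estimated by `sigma2_u_hat / m`.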

REPLICATION

• Replication allows you to test whether your model is additive with constant error variance.

• If Wij = Xi + Uij with Uij symmetrically distributed about zero and independent of Xi, we have a major fact:
  ∗ The sample mean and sample standard deviation of each subject's replicates are uncorrelated.
    ▸ Eckert, Carroll, and Wang (1997) use this fact to find a transformation to additivity

• Also, if Uij are normally distributed, then so too are differences Wi1 − Wi2 = Ui1 − Ui2.

• Graphical diagnostics can be implemented easily in any package.
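
As a numerical companion to the graphical diagnostics, one can compute the correlation between the per-subject mean and standard deviation before and after a transformation. The sketch below uses simulated data with multiplicative error on the raw scale; the data-generating values (5000 subjects, 6 replicates, log-scale standard deviations of 0.3) are assumptions chosen for illustration and are not the WISH data.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def mean_sd_correlation(replicates):
    """Correlation across subjects between replicate mean and replicate s.d."""
    means, sds = [], []
    for reps in replicates:
        m = len(reps)
        mean = sum(reps) / m
        means.append(mean)
        sds.append(math.sqrt(sum((w - mean) ** 2 for w in reps) / (m - 1)))
    return pearson(means, sds)

rng = random.Random(3)
# Hypothetical data: log W_ij = log X_i + U_ij, i.e. multiplicative error.
raw = [[math.exp(x + rng.gauss(0.0, 0.3)) for _ in range(6)]
       for x in (rng.gauss(7.5, 0.3) for _ in range(5000))]
logged = [[math.log(w) for w in reps] for reps in raw]

corr_raw = mean_sd_correlation(raw)
corr_log = mean_sd_correlation(logged)
print("raw scale:", corr_raw, " log scale:", corr_log)
```

On the raw scale the standard deviation grows with the mean, so the correlation is clearly positive; on the log scale the error is additive and symmetric, and the correlation should be close to zero. A scatterplot of s.d. against mean conveys the same information visually.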

REPLICATION: WISH

• The WISH study measured caloric intake using a 24-hour recall.
  ∗ There were 6 replicates per woman in the study.

• A plot of the caloric intake data showed that W was nowhere close to being normally distributed in the population.
  ∗ If additive, then either X or U is not normal.

• When plotting standard deviation versus the mean, one often uses the rule that the method “passes” the test if the max-to-min ratio of the standard deviations is less than 2.0.
  ∗ A little bit of non-constant variance does not seem to hurt. See Carroll & Ruppert (1988)
  ∗ Caveat: Carroll & Ruppert studied effects of heteroscedasticity on efficiency, not on bias due to measurement error.

[Figure: WISH Calories, FFQ, Normal QQ-Plot. Caption: WISH, Caloric Intake, Q–Q plot of W.]

[Figure: WISH Calories, 24-hour recalls, Normal QQ-Plot of Pairwise Differences. Caption: WISH, Caloric Intake, Q–Q plot of Differenced Ws.]

[Figure: WISH Calories, 24-hour recalls, s.d. versus mean. Horizontal axis: Mean Calories. Vertical axis: s.d. Caption: WISH, Caloric Intake, plot for additivity, loess and OLS.]

REPLICATION: WISH

• Taking logarithm of W improves all the plots.

[Figure: WISH Log-Calories, FFQ, Normal QQ-Plot. Caption: WISH, Log Caloric Intake, Q–Q Plot of Observed W.]

[Figure: WISH Log-Calories, 24-hour recalls, Normal QQ-Plot of Pairwise Differences. Caption: WISH, Log Caloric Intake, Q–Q Plot of Differenced Ws.]

[Figure: WISH Log-Calories, 24-hour recalls, s.d. versus mean. Horizontal axis: Mean Log-Calories. Vertical axis: s.d. Caption: WISH, Log Caloric Intake, Plot For Additivity, Loess and OLS.]

• Perhaps a bit over-transformed; square-root might be better.

TRANSPORTABILITY OF ESTIMATES

• In linear regression, we only need the measurement error variance (after checking for semi-constant variance, additivity, normality).

• In general though, more is needed. Let’s remember that if we observe W instead of X, then the observed data have a regression of Y on W that effectively acts as if

    E(Y|W) = β0 + βxE(X|W)
           ≈ β0 + βx{λW + (1 − λ)µx}.

• As we will see later, in general problems we can do a likelihood analysis if we know the distribution of X given W.

TRANSPORTABILITY OF ESTIMATES

• It is tempting to try to use outside data and transport an estimate of λ to your problem.
  ∗ Bad idea!!!

\[ \lambda = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2} \]

  ∗ Note how this depends on the distribution of X.
  ∗ It is rarely the case that two populations have the same X distribution, even when the same instrument is used.
    ▸ the instrument affects only \( \sigma_u^2 \), not \( \sigma_x^2 \)

• Maybe one should transport the estimate of \( \sigma_u^2 \) and use

\[ \lambda = \frac{\sigma_w^2 - \sigma_{u,\mathrm{transported}}^2}{\sigma_w^2} \]

EXTERNAL DATA AND TRANSPORTABILITY

• We say that a model is transportable across studies if the model holds with the same parameters in the two studies.
  ∗ Internal data is ideal since there is no question about transportability.

• With external data, transportability back to the primary study cannot be taken for granted.
  ∗ Sometimes transportability clearly will not hold. Then the value of the external data is, at best, questionable.
  ∗ Even if transportability seems to be a reasonable assumption, it is still just that, an assumption.

EXTERNAL DATA AND TRANSPORTABILITY

• As an illustration, consider two nutrition data sets which use exactly the same FFQ

• Nurses Health Study
  ∗ Nurses in the Boston Area

• American Cancer Society
  ∗ National sample

• Since the same instrument is used, error properties should be about the same.
  ∗ But maybe not the entire distribution!!
  ∗ var(differences, NHS) = 47
  ∗ var(differences, ACS) = 45
  ∗ var(sum, NHS) = 152
  ∗ var(sum, ACS) = 296
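
One way to read these four numbers is sketched below. It rests on an assumption the slide does not spell out: that “differences” and “sum” refer to two administrations W1 = X + U1 and W2 = X + U2 of the FFQ to the same person, with independent classical errors, so that var(W1 − W2) = 2σu² and var(W1 + W2) = 4σx² + 2σu². The function name is hypothetical.

```python
def moments_from_pair(var_diff, var_sum):
    """Method-of-moments (var_x, var_u, lambda) from var(W1 - W2) and var(W1 + W2).

    Assumes W1 = X + U1 and W2 = X + U2 with independent classical errors.
    """
    var_u = var_diff / 2.0
    var_x = (var_sum - var_diff) / 4.0
    return var_x, var_u, var_x / (var_x + var_u)

nhs_var_x, nhs_var_u, nhs_lambda = moments_from_pair(var_diff=47.0, var_sum=152.0)
acs_var_x, acs_var_u, acs_lambda = moments_from_pair(var_diff=45.0, var_sum=296.0)

# Transporting the error variance (not lambda) from NHS to ACS:
acs_var_w = acs_var_x + acs_var_u
acs_lambda_transported = (acs_var_w - nhs_var_u) / acs_var_w

print("NHS:", nhs_var_x, nhs_var_u, nhs_lambda)
print("ACS:", acs_var_x, acs_var_u, acs_lambda)
print("ACS lambda using NHS error variance:", acs_lambda_transported)
```

Under this reading the error variances are nearly identical (23.5 and 22.5), but the variance of X is 26.25 in NHS against 62.75 in ACS, so λ is about 0.53 in NHS and 0.74 in ACS. Transporting λ from NHS to ACS would be badly off, whereas transporting σu² gives about 0.72.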

[Figure: FFQ Histograms in NHS and ACS. Panels: NHS (pccal.nhs) and ACS (pccal.acs).]

IS THERE AN “EXACT” PREDICTOR?

• One can distinguish between two concepts of X:
  ∗ the actual exact predictor, which may never be observable under any circumstances.
  ∗ the operationally-defined exact predictor which could be observed, albeit at great cost and effort.

• Example: X is long-term caloric intake.
  ∗ actual X is long-term intake over lifetime.
  ∗ operationally-defined X is long-term intake since inception of the study, as measured by the best available instrument.

IS THERE AN “EXACT” PREDICTOR? (CONTINUED)

• One could probably construct hypothetical examples where the effects of actual X and of operationally-defined X are quite different, even opposite in sign.

• Obviously, a policy to change X, say to reduce intake of saturated fat or of calories, affects the actual X, not the operationally-defined X.

• Nonetheless, the operationally-defined X is all we can work with.

• “Gold standard” generally means the operationally-defined X.
  ∗ We will take “exact predictor” and “gold standard” to both refer to operationally-defined X.
  ∗ However, for some authors, “gold standard” means the actual exact X even if not operationally-defined.

THE BERKSON MODEL

• The classical Berkson model says that True Exposure = Observed Exposure + Mean Zero Error:

$$X = W + U_b \quad (\text{or } X = W \times U_b).$$

• It is assumed that W and $U_b$ are independent and $E(U_b) = 0$ (additive error) or $E(U_b) = 1$ (multiplicative error), so that

$$E(X \mid W) = W.$$

• Compare with the classical measurement error model, where

$$W = X + U \quad \text{and} \quad E(X \mid W) = \lambda W + (1 - \lambda)\mu_x.$$
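The contrast between the two conditional means can be checked with a small simulation. This is only a sketch under assumed values: normal errors and unit standard deviations are hypothetical choices, not taken from the slides. Regressing X on W should give a slope near 1 under Berkson error and near λ under classical error.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(1)
n = 20_000
sigma_x, sigma_u = 1.0, 1.0  # hypothetical standard deviations

# Classical error: W = X + U, with U independent of X
x_c = [random.gauss(0, sigma_x) for _ in range(n)]
w_c = [x + random.gauss(0, sigma_u) for x in x_c]

# Berkson error: X = W + U_b, with U_b independent of W
w_b = [random.gauss(0, sigma_x) for _ in range(n)]
x_b = [w + random.gauss(0, sigma_u) for w in w_b]

# Slope of the regression of X on W in each setting
_, slope_classical = ols(w_c, x_c)
_, slope_berkson = ols(w_b, x_b)
lam = sigma_x ** 2 / (sigma_x ** 2 + sigma_u ** 2)

print("classical slope:", round(slope_classical, 3), "lambda:", lam)
print("Berkson slope:  ", round(slope_berkson, 3))
```

With these assumed variances λ = 0.5, so the classical slope should sit near 0.5 while the Berkson slope stays near 1.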

THE BERKSON MODEL (CONT.)

• From the previous page, E(X|W) = W.

• In the linear regression model,

∗ Ignoring error still leads to unbiased intercept and slope estimates,

∗ but the error about the line is increased.

– To see this, note that

$$Y = \beta_0 + \beta_1 (W + U_b) + \varepsilon = \beta_0 + \beta_1 W + (\beta_1 U_b + \varepsilon).$$

• In the logistic model with normally distributed measurement error, ignoring error

∗ Leads to a bias in slope (and intercept):

$$\text{Est. Slope} \approx \frac{\text{True Slope}}{\sqrt{1 + \beta_1^2 \sigma_{u,b}^2 / 2.9}}.$$

∗ For many problems, this attenuation is only minor.
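Both claims can be illustrated numerically. The sketch below uses hypothetical parameter values (they are not from the slides): it simulates linear regression under Berkson error to show an unbiased slope with inflated residual variance, and it evaluates the logistic attenuation factor from the formula above.

```python
import math
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def berkson_logistic_attenuation(beta1, sigma2_ub):
    """Approximate (estimated slope) / (true slope) from the slide's formula."""
    return 1.0 / math.sqrt(1.0 + beta1 ** 2 * sigma2_ub / 2.9)


random.seed(2)
n = 40_000
beta0, beta1 = 1.0, 2.0          # hypothetical true coefficients
sigma_b, sigma_eps = 0.5, 0.3    # hypothetical Berkson and equation error SDs

w = [random.gauss(0, 1) for _ in range(n)]
x = [wi + random.gauss(0, sigma_b) for wi in w]             # X = W + U_b
y = [beta0 + beta1 * xi + random.gauss(0, sigma_eps) for xi in x]

# Naive fit of Y on W, ignoring the Berkson error
b0_hat, b1_hat = ols(w, y)
resid_var = sum((yi - b0_hat - b1_hat * wi) ** 2 for wi, yi in zip(w, y)) / (n - 2)
expected_resid_var = beta1 ** 2 * sigma_b ** 2 + sigma_eps ** 2

print("slope:", round(b1_hat, 3), "(true", beta1, ")")
print("residual variance:", round(resid_var, 3), "expected:", expected_resid_var)
print("logistic attenuation factor:", round(berkson_logistic_attenuation(1.0, 0.5), 4))
```

The residual variance should be close to β1²σb² + σε² rather than σε² alone, which is the "increased error about the line".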

END OF LECTURE 2


LECTURE 3: BERKSON/CLASSICAL,

STRUCTURAL/FUNCTIONAL,

REGRESSION CALIBRATION

OUTLINE

• What is Berkson? What is classical?

• Functional versus structural modeling

∗ Classical and flexible structural modeling

• Regression calibration

• Multiplicative error

WHAT'S BERKSON? WHAT'S CLASSICAL?

• Berkson: $X = W + U_b$, and W and $U_b$ are independent

• Classical: $W = X + U_c$, and X and $U_c$ are independent

• In practice, it may be hard to distinguish between the classical and the Berkson error models.

∗ In some instances, neither holds exactly.

∗ In some complex situations, errors may have both Berkson and classical components, e.g., when the observed predictor is a combination of 2 or more error-prone predictors.

• Berkson model: a nominal value is assigned.

∗ Direct measures cannot be taken, nor can replicates.

• Classical error structure: direct individual measurements are taken, and can be replicated but with variability.

WHAT'S BERKSON? WHAT'S CLASSICAL?

• The reason why people have trouble distinguishing the two is that in real life, it

takes hard thought to do so.


LET’S PLAY STUMP THE EXPERTS!

• Framingham Heart Study

∗ Predictor is systolic blood pressure

• All workers with the same job classification and age are assigned the same exposure based on job exposure studies.

• Using a phantom, all persons of a given height and weight with a given recorded

dose are assigned the same radiation exposure.

• The radon gas concentration in houses that have been torn down is estimated by

the average concentration in a sample of nearby houses

BERKSON ERROR IN NONLINEAR MODELS

• Suppose that $Y = f(X, \beta) + \varepsilon$

• Then, expanding f to second order around W (with $X = W + U_b$),

$$Y \approx f(W, \beta) + \tfrac{1}{2} f''(W, \beta)\,\sigma_b^2 + \left\{ f'(W, \beta)\,U_b + \tfrac{1}{2} f''(W, \beta)\,(U_b^2 - \sigma_b^2) + \varepsilon \right\}$$

∗ Here

$$f'(W, \beta) = \frac{\partial}{\partial W} f(W, \beta)$$

and so forth

∗ The nature of the bias (the term $\tfrac{1}{2} f''(W, \beta)\,\sigma_b^2$) depends on the sign of $f''(W, \beta)$

∗ In the linear case $f''(W, \beta) = 0$ and we get the previous result: no bias
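A quick Monte Carlo check of the bias term is sketched below. The choice f(x) = exp(x) and the values of w and σ_b are hypothetical, picked only because f'' is easy to write down. The simulated bias E{f(w + U_b)} − f(w) should be close to ½ f''(w) σ_b².

```python
import math
import random


def f(x):
    """Hypothetical nonlinear mean function."""
    return math.exp(x)


def f_second(x):
    """Second derivative of f with respect to x."""
    return math.exp(x)


random.seed(3)
w, sigma_b, n = 1.0, 0.3, 200_000  # assumed values for illustration

# Monte Carlo estimate of E{f(W + U_b) | W = w} with U_b ~ N(0, sigma_b^2)
mc_mean = sum(f(w + random.gauss(0, sigma_b)) for _ in range(n)) / n
mc_bias = mc_mean - f(w)

# Second-order approximation to the bias
taylor_bias = 0.5 * f_second(w) * sigma_b ** 2

print("Monte Carlo bias:", round(mc_bias, 4))
print("Taylor bias:     ", round(taylor_bias, 4))
```

Because f'' > 0 here, ignoring the Berkson error understates the mean response at every w; a concave f would give the opposite sign.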

MIXTURES OF BERKSON AND CLASSICAL ERROR

• Let L be long-term average radon concentration for all houses in a neighborhood

• X is long-term average radon concentration in a specific house in this neighborhood

• W is average measured radon concentration for a sample of houses in this neighborhood

• Model of Tosteson and Tsiatis (1988)

$$X = L + U_b$$

$$W = L + U_c$$

$L$, $U_b$, $U_c$ are independent

∗ Used by Reeves, Cox, Darby, and Whitley (1998) and Mallick, Hoffman, and Carroll (2002)

MIXTURES OF BERKSON AND CLASSICAL ERROR

• From previous page: Model of Tosteson and Tsiatis (1988)

$$X = L + U_b$$

$$W = L + U_c$$

• $U_c = 0 \Rightarrow X = W + U_b \Rightarrow$ Berkson

• $U_b = 0 \Rightarrow W = X + U_c \Rightarrow$ Classical

• More generally,

$$W = X + U_c - U_b$$

and $U_c$ is independent of X while $U_b$ is independent of W
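The mixture model is easy to simulate. The sketch below assumes normal components with hypothetical variances and a linear outcome model (neither is specified on the slide). It illustrates a standard least-squares consequence of the model: the naive slope of Y on W is attenuated by σ_L²/(σ_L² + σ_c²), so only the classical component drives the attenuation.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(5)
n = 40_000
sigma_l, sigma_b, sigma_c = 1.0, 0.7, 0.7  # hypothetical SDs of L, U_b, U_c
beta0, beta1 = 0.0, 1.0                    # hypothetical outcome model

ell = [random.gauss(0, sigma_l) for _ in range(n)]
x = [li + random.gauss(0, sigma_b) for li in ell]   # X = L + U_b
w = [li + random.gauss(0, sigma_c) for li in ell]   # W = L + U_c
y = [beta0 + beta1 * xi + random.gauss(0, 0.2) for xi in x]

_, slope_w = ols(w, y)
expected_attenuation = sigma_l ** 2 / (sigma_l ** 2 + sigma_c ** 2)

print("naive slope / true slope:", round(slope_w / beta1, 3))
print("sigma_L^2 / (sigma_L^2 + sigma_c^2):", round(expected_attenuation, 3))
```

Setting sigma_c to 0 in this sketch recovers the pure Berkson case (no attenuation), and setting sigma_b to 0 recovers the pure classical case.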

FUNCTIONAL AND STRUCTURAL MODELING CLASSICAL ERROR MODELS

• The common linear regression texts make the distinction:

∗ Functional: X's are fixed constants

∗ Structural: X's are random variables

• If you pretend that the X's are fixed constants, it seems plausible to try to estimate them as well as all the other model parameters.

• This is the functional maximum likelihood estimator.

∗ Every textbook has the linear regression functional maximum likelihood estimator.

• Unfortunately, the functional MLE in nonlinear problems has two defects.

∗ It's really nasty to compute.

∗ It's a lousy estimator (badly inconsistent).

• Structural ⇒ empirical Bayes type analysis

FUNCTIONAL AND STRUCTURAL MODELING CLASSICAL ERROR MODELS

• More useful distinction:

∗ Functional modeling: No assumptions made about the X's (could be random or fixed)

∗ Classical structural modeling: Strong parametric assumptions made about the distribution of X. Generally normal, lognormal or gamma.

∗ Flexible structural modeling: Structural, but flexible parametric family. Tries to get the best of both worlds.

• Flexible is similar in spirit to Quasi-structural modeling (Pierce et al., 1992)

∗ Assume that the distribution of X is the (unknown) empirical distribution of the X values in the sample

CHOOSING BETWEEN FUNCTIONAL OR STRUCTURAL MODELS

• Key questions:

∗ How sensitive are inferences to an assumed STRUCTURAL model?

∗ How much does it “cost” to be functional?

• FUNCTIONAL: No need to perform extensive sensitivity analyses.

• Many functional methods are simple to implement (and some are computed using

little more than standard software).

• Functionality focuses emphasis on the error model.

• Because of “latent-model” robustness, a functional analysis serves as a useful

check on a parametric structural model.

CHOOSING BETWEEN FUNCTIONAL OR STRUCTURAL MODELS

• Structural models can be viewed as an empirical Bayes method of dealing with a large number of nuisance parameters (the true covariate values) (Whittemore, 1989)

• Best known functional methods can be inefficient (missing data, thresholds, nonparametric regression).

CHOOSING BETWEEN FUNCTIONAL OR STRUCTURAL MODELS

• One should consider the distribution of the true X's

∗ Example: A-bomb survivor study (Pierce et al., 1992):

– All subjects survived the acute effects

– This provides some information

– A high radiation measurement is likely to be due to a high measurement error

∗ This information can, perhaps, be captured by structural modeling

– The distribution of true doses is different among survivors than among all exposed to the attacks

CHOOSING BETWEEN FUNCTIONAL OR STRUCTURAL MODELS

• My opinion is that one should use structural models

∗ But care is needed when modeling the distribution of X


SOME FUNCTIONAL METHODS

• Regression Calibration/Substitution

∗ Replaces true exposure X by an estimate of it based only on covariates but not on the response.

∗ In linear model with additive errors, this is the classical correction for attenuation.

∗ In Berkson model, one simply ignores the measurement error.

• SIMEX is a fairly general functional method.

∗ It assumes only that you have an error model and that you can “add on” measurement error to make the problem worse.

REGRESSION CALIBRATION: BASIC IDEAS

• Key idea: replace the unknown X by E(X|Z, W), which depends only on the known (Z, W).

∗ This provides an approximate model for Y in terms of (Z, W).

∗ Called the “conditional expectation approach” by Lyles and Kupper (1997)

• Generally applicable.

∗ Depends on the measurement error being “not too large” in order for the approximation to be sufficiently accurate.

∗ For some models, if var(X|Z, W) is constant then the only bias due to the regression calibration approximation is in the intercept parameter.

REGRESSION CALIBRATION: BASIC IDEAS

Why does regression calibration work?

• $X = E(X \mid Z, W) + U$, where U is uncorrelated with any function of (Z, W)

• If

$$Y = X^T \beta_x + Z^T \beta_z + \varepsilon$$

then

$$Y = E(X \mid Z, W)^T \beta_x + Z^T \beta_z + (U^T \beta_x + \varepsilon)$$

where $(U^T \beta_x + \varepsilon)$ is uncorrelated with the regressors Z and E(X|Z, W)

• Therefore, regression of Y on E(X|Z, W) and Z gives unbiased estimates

– E(X|Z, W) could be replaced by the linear regression (best linear predictor) of X using (Z, W)

THE REGRESSION CALIBRATION ALGORITHM

• The general algorithm is:

∗ Using replication, validation, or instrumental data, develop a model $E(X \mid Z, W) = m(Z, W, \gamma_{cm})$ and estimate $\gamma_{cm}$.

∗ Replace X by $m(Z, W, \gamma_{cm})$ and run your favorite analysis.

∗ Obtain standard errors by the bootstrap or the “sandwich method.”

– Can automatically correct for uncertainty about $\gamma_{cm}$

• In linear regression, regression calibration is equivalent to the “correction for attenuation.”
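The three steps can be sketched end to end for the simplest setting. Everything specific below is an assumption made for illustration: a scalar X with no Z, classical normal error, an internal validation subsample in which X is observed, a linear calibration model m(W, γ) = γ0 + γ1 W, and ordinary least squares as the "favorite analysis". The helper `rc_slope` is hypothetical, not a library function.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def rc_slope(data):
    """Regression-calibration slope; data is a list of (y, w, x_or_None)."""
    # Step 1: estimate the calibration model E(X|W) from validation records
    val = [(w, x) for _, w, x in data if x is not None]
    g0, g1 = ols([w for w, _ in val], [x for _, x in val])
    # Step 2: replace X by m(W, gamma_hat) for everyone and run the analysis
    m = [g0 + g1 * w for _, w, _ in data]
    _, slope = ols(m, [y for y, _, _ in data])
    return slope


random.seed(4)
n, n_val = 2000, 400
beta0, beta1 = 0.5, 1.0  # hypothetical true coefficients
data = []
for i in range(n):
    x = random.gauss(0, 1)
    w = x + random.gauss(0, 1)                     # classical error
    y = beta0 + beta1 * x + random.gauss(0, 0.5)
    data.append((y, w, x if i < n_val else None))  # X seen only in validation

_, naive_slope = ols([w for _, w, _ in data], [y for y, _, _ in data])
rc_estimate = rc_slope(data)

# Step 3: bootstrap the whole procedure so that uncertainty in gamma_hat
# is carried into the standard error
B = 200
boot = [rc_slope([random.choice(data) for _ in range(n)]) for _ in range(B)]
mean_boot = sum(boot) / B
se_boot = (sum((b - mean_boot) ** 2 for b in boot) / (B - 1)) ** 0.5

print("naive:", round(naive_slope, 3), "RC:", round(rc_estimate, 3),
      "bootstrap SE:", round(se_boot, 3))
```

Resampling whole records, including the validation indicator, is what lets the bootstrap reflect the calibration step as well as the outcome regression.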

THE REGRESSION CALIBRATION ALGORITHM

• Easily adjusts for different amounts of replication

• If $\bar{W}_i$ is the average of $m_i$ replicates then

$$\lambda_i = \frac{\sigma_x^2}{\sigma_x^2 + \sigma_u^2 / m_i}$$

$$E(X_i \mid \bar{W}_i) = \lambda_i \bar{W}_i + (1 - \lambda_i)\mu_x$$
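As a small worked illustration of these two formulas, the sketch below evaluates λ_i and the calibrated value for a few replicate counts. The function names and the numeric inputs are hypothetical.

```python
def reliability(sigma2_x, sigma2_u, m):
    """lambda_i for a mean of m replicates with error variance sigma2_u each."""
    return sigma2_x / (sigma2_x + sigma2_u / m)


def calibrated_value(wbar, m, mu_x, sigma2_x, sigma2_u):
    """E(X_i | Wbar_i) = lambda_i * Wbar_i + (1 - lambda_i) * mu_x."""
    lam = reliability(sigma2_x, sigma2_u, m)
    return lam * wbar + (1.0 - lam) * mu_x


# With sigma_x^2 = sigma_u^2 = 1, more replicates mean less shrinkage to mu_x
for m in (1, 2, 4):
    print(m, reliability(1.0, 1.0, m), calibrated_value(2.0, m, 0.0, 1.0, 1.0))
```

A subject with four replicates is shrunk toward the mean much less than a subject with one, which is how the method adjusts for unequal replication.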


LOGISTIC REGRESSION, NORMAL X

• Consider the logistic regression model

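$$\Pr(Y = 1 \mid X) = \{1 + \exp(-\beta_0 - \beta_x X)\}^{-1} = H(\beta_0 + \beta_x X).$$

• Suppose that

∗ X and U are normally distributed.

∗ Then X given W is normal with mean $E(X \mid W) = \mu_x(1 - \lambda) + \lambda W$ and variance $\lambda \sigma_u^2$.

• If the event is “rare”, then $H(\beta_0 + \beta_x X) \approx \exp(\beta_0 + \beta_x X)$.

• Then, by using moment generating functions, the observed data follow

$$\begin{aligned}
\Pr(Y = 1 \mid W) &= E[\{\Pr(Y = 1 \mid X, W)\} \mid W] = E[\{\Pr(Y = 1 \mid X)\} \mid W] \quad \text{(why? nondifferential error)} \\
&\approx E\{\exp(\beta_0 + \beta_x X) \mid W\} = \exp\{\beta_0 + (1/2)\beta_x^2 \lambda \sigma_u^2 + \beta_x E(X \mid W)\} \\
&\approx H\{\beta_0 + (1/2)\beta_x^2 \lambda \sigma_u^2 + \beta_x E(X \mid W)\}.
\end{aligned}$$

• Typically, the regression calibration approximation works fine in this case.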
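The quality of the rare-event approximation can be checked numerically. The sketch below picks hypothetical parameter values (β0 = −5 so that the event is rare), computes Pr(Y = 1|W) by trapezoid integration over the normal distribution of X given W, and compares it with the closed-form approximation.

```python
import math


def H(t):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-t))


def conditional_moments(w, mu_x, s2_x, s2_u):
    """Mean and variance of X given W under normal X and classical normal U."""
    lam = s2_x / (s2_x + s2_u)
    return mu_x * (1.0 - lam) + lam * w, lam * s2_u


def exact_prob(w, beta0, beta_x, mu_x, s2_x, s2_u, n_grid=4001):
    """Pr(Y=1|W=w) by trapezoid integration of H over X | W."""
    mean, var = conditional_moments(w, mu_x, s2_x, s2_u)
    sd = math.sqrt(var)
    lo, hi = -8.0, 8.0
    h = (hi - lo) / (n_grid - 1)
    total = 0.0
    for i in range(n_grid):
        z = lo + i * h
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * H(beta0 + beta_x * (mean + sd * z)) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def approx_prob(w, beta0, beta_x, mu_x, s2_x, s2_u):
    """Rare-event approximation H{beta0 + (1/2) beta_x^2 var(X|W) + beta_x E(X|W)}."""
    mean, var = conditional_moments(w, mu_x, s2_x, s2_u)
    return H(beta0 + 0.5 * beta_x ** 2 * var + beta_x * mean)


# Hypothetical values: rare event, reliability lambda = 0.5
args = (1.0, -5.0, 1.0, 0.0, 1.0, 1.0)
p_exact, p_approx = exact_prob(*args), approx_prob(*args)
print("exact:", round(p_exact, 5), "approximation:", round(p_approx, 5))
```

Making β0 less negative (a more common event) in this sketch is a simple way to see where the approximation starts to degrade.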


LOGISTIC REGRESSION, NORMAL X

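• A different approximation uses the fact that $H(x) \approx \Phi(kx)$ with $k = 0.588 \approx 1/1.70$

Figure: three panels comparing the normal and logistic curves: both distribution functions for x from −10 to 10; the error (within ±0.01) over the same range; the relative error (within ±0.2) for x from −3 to 3.

error = $\{\Phi(kx) - H(x)\}$ and relative error = error / $H(x)$.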


LOGISTIC REGRESSION, NORMAL X

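$$\Pr(Y = 1 \mid X) \approx \Phi\{k(\beta_0 + \beta_x X)\} = \Pr\{U \le k(\beta_0 + \beta_x X)\},$$

where U here denotes a standard normal variable.

Therefore

$$\begin{aligned}
\Pr(Y = 1 \mid W) &= E[\Pr\{U \le k(\beta_0 + \beta_x X)\} \mid W] \\
&= \Pr[U - \beta_x k U_b \le k\{\beta_0 + \beta_x E(X \mid W)\}] \\
&= \Phi\left[ \frac{k\{\beta_0 + \beta_x E(X \mid W)\}}{(1 + \beta_x^2 k^2 \sigma_{X|W}^2)^{1/2}} \right] \approx H\left\{ \frac{\beta_0 + \beta_x E(X \mid W)}{(1 + \beta_x^2 k^2 \sigma_{X|W}^2)^{1/2}} \right\}
\end{aligned}$$

Here $X = E(X \mid W) + U_b$ with

$$U_b \sim N(0, \sigma_{X|W}^2) = N(0, \lambda \sigma_u^2)$$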
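A short numerical sketch of the two ingredients: the size of the error in H(x) ≈ Φ(kx) on a grid, and the attenuated linear predictor from the final display. The grid and the helper name are assumptions for illustration; Φ is computed from the standard library error function.

```python
import math

K = 0.588  # scaling constant quoted on the slide


def H(t):
    """Logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-t))


def Phi(t):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


# Largest absolute difference between Phi(kx) and H(x) on [-10, 10]
grid = [i / 100.0 for i in range(-1000, 1001)]
max_abs_error = max(abs(Phi(K * t) - H(t)) for t in grid)


def attenuated_linear_predictor(e_x_given_w, beta0, beta_x, var_x_given_w):
    """Argument of H in the probit-based approximation to Pr(Y=1|W)."""
    eta = beta0 + beta_x * e_x_given_w
    return eta / math.sqrt(1.0 + beta_x ** 2 * K ** 2 * var_x_given_w)


print("max |Phi(kx) - H(x)|:", round(max_abs_error, 4))
print("attenuated predictor:", round(attenuated_linear_predictor(0.5, -1.0, 1.0, 0.5), 4))
```

The maximum error stays at roughly the 0.01 level, consistent with the error panel described for the figure.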


LOGISTIC REGRESSION, NORMAL X

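A third approximation is

$$\begin{aligned}
\Pr(Y = 1 \mid W) &= E\big( H[\beta_0 + \beta_x\{E(X \mid W) + U_b\}] \,\big|\, W \big) \\
&\approx H\{\beta_0 + \beta_x E(X \mid W)\} + \tfrac{1}{2} \beta_x^2 H''\{\beta_0 + \beta_x E(X \mid W)\}\, \sigma_{X|W}^2
\end{aligned}$$

Note that

$H''(x) > 0$ if $x < 0$

$H''(x) < 0$ if $x > 0$

• This approximation is applicable to most models, not just logistic regression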


ESTIMATING THE CALIBRATION FUNCTION

• Need to estimate E(X|Z, W).

∗ How this is done depends, of course, on the type of auxiliary data available.

• Easy case: validation data

∗ Suppose one has internal validation data.

∗ Then one can simply regress X on (Z, W) and transport the model to the non-validation data.

∗ Of course, for the validation data one regresses Y on (Z, X), and this estimate must be combined with the one from the non-validation data.

• Same approach can be used for external validation data, but with the usual concern for non-transportability.

ESTIMATING THE CALIBRATION FUNCTION: INSTRUMENTAL DATA: ROSNER'S METHOD

• Internal unbiased instrumental data:

∗ Suppose E(T|Z, X) = E(T|Z, X, W) = X so that T is an unbiased instrument.

∗ If T is expensive to measure, then T might be available for only a subset of the study. W will generally be available for all subjects.

∗ Then E(T|Z, W) = E{E(T|Z, X, W)|Z, W} = E(X|Z, W).

• Thus, T regressed on (Z, W) follows the same model as X regressed on (Z, W), although with greater variance.

∗ So one regresses T on (Z, W) to estimate the parameters in the regression of X on (Z, W).
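The argument can be illustrated by simulation. The sketch below assumes a scalar X, no Z, normal errors, and an unbiased instrument T = X + U_T observed on a subsample (all values hypothetical). In real data X is not observed; it is kept here only so that the two regressions can be compared.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(11)
n, n_sub = 20_000, 5_000
x = [random.gauss(0, 1) for _ in range(n)]
w = [xi + random.gauss(0, 1) for xi in x]            # error-prone measurement
t = [xi + random.gauss(0, 1) for xi in x[:n_sub]]    # unbiased instrument, subset only

# Regression of X on W (not available in practice, shown for comparison)
int_x, slope_x = ols(w, x)
# Regression of T on W using only the subset where T is available
int_t, slope_t = ols(w[:n_sub], t)

print("X on W:", round(int_x, 3), round(slope_x, 3))
print("T on W:", round(int_t, 3), round(slope_t, 3))
```

The two fits target the same coefficients; the fit based on T is simply noisier, matching the "greater variance" remark.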

ROSNER'S METHOD, CONT.

• More formally, suppose we have the model

$$W_i = \gamma_{0,em} + \gamma_{x,em}^t X_i + \gamma_{z,em}^t Z_i + U_i$$

(error calibration model)

• Under joint normality, this implies a calibration model

$$X_i = \gamma_{0,cm} + \gamma_{w,cm}^t W_i + \gamma_{z,cm}^t Z_i + U_{cm,i}$$

∗ Sometimes one can use validation data to regress X on Z and W

∗ Suppose we only have an “alloyed gold standard” $T_i = X_i + U_{T,i}$ on a “validation sample”

– then regress T on Z and W

– This works! Think of $T_i = X_i + U_{T,i}$ as outcome measurement error


ESTIMATING THE CALIBRATION FUNCTION: REPLICATION DATA

• Suppose that one has unbiased internal replicate data:

∗ $W_{ij} = X_i + U_{ij}$, $i = 1, \ldots, n$ and $j = 1, \ldots, k_i$, where $E(U_{ij} \mid Z_i, X_i) = 0$.

∗ $\bar{W}_{i\cdot} := \frac{1}{k_i} \sum_j W_{ij}$.

∗ Notation: $\mu_z$ is E(Z), $\Sigma_{xz}$ is the covariance (matrix) between X and Z, etc.

• Use standard least squares theory to get the best linear unbiased predictor of X from $(\bar{W}, Z)$:

$$E(X \mid Z, \bar{W}) \approx \mu_x + \begin{pmatrix} \Sigma_{xx} & \Sigma_{xz} \end{pmatrix} \begin{pmatrix} \Sigma_{xx} + \Sigma_{uu}/k & \Sigma_{xz} \\ \Sigma_{xz}^t & \Sigma_{zz} \end{pmatrix}^{-1} \begin{pmatrix} \bar{W} - \mu_w \\ Z - \mu_z \end{pmatrix}$$

(best linear approximation = exact conditional expectation under joint normality).

• Need to estimate the unknown μ's and Σ's.

• Essentially a flexible structural method since existence of $\mu_x$ and $\Sigma_{xx}$ is assumed.

ESTIMATING THE CALIBRATION FUNCTION: REPLICATION DATA, CONTINUED

• $\hat{\mu}_z$ and $\hat{\Sigma}_{zz}$ are the “usual” estimates since the Z's are observed.

• $\hat{\mu}_x = \hat{\mu}_w = \sum_{i=1}^n k_i \bar{W}_{i\cdot} \big/ \sum_{i=1}^n k_i$.

• $\hat{\Sigma}_{xz} = \sum_{i=1}^n k_i (\bar{W}_{i\cdot} - \hat{\mu}_w)(Z_i - \hat{\mu}_z)^t / \nu$, where $\nu = \sum k_i - \sum k_i^2 \big/ \sum k_i$.

•
$$\hat{\Sigma}_{uu} = \frac{\sum_{i=1}^n \sum_{j=1}^{k_i} (W_{ij} - \bar{W}_{i\cdot})(W_{ij} - \bar{W}_{i\cdot})^t}{\sum_{i=1}^n (k_i - 1)}.$$

•
$$\hat{\Sigma}_{xx} = \left[ \left\{ \sum_{i=1}^n k_i (\bar{W}_{i\cdot} - \hat{\mu}_w)(\bar{W}_{i\cdot} - \hat{\mu}_w)^t \right\} - (n - 1)\hat{\Sigma}_{uu} \right] \Big/ \nu.$$
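These moment estimators are straightforward to code in the scalar case. The sketch below is restricted to a scalar X with no Z (an assumption made to keep it short), so the Σ's reduce to variances; the function names and the simulated values are hypothetical.

```python
import random


def replication_moments(w):
    """Moment estimates from replicates; w[i] lists the k_i replicates of subject i.

    Returns (mu_w_hat, sigma2_x_hat, sigma2_u_hat) for scalar X and no Z.
    """
    n = len(w)
    k = [len(wi) for wi in w]
    wbar = [sum(wi) / len(wi) for wi in w]
    sum_k = sum(k)
    mu_w = sum(ki * wb for ki, wb in zip(k, wbar)) / sum_k
    nu = sum_k - sum(ki ** 2 for ki in k) / sum_k
    # Within-subject variation estimates the error variance
    s2_u = (sum((wij - wb) ** 2 for wi, wb in zip(w, wbar) for wij in wi)
            / sum(ki - 1 for ki in k))
    # Between-subject variation, corrected for the error it contains
    s2_x = (sum(ki * (wb - mu_w) ** 2 for ki, wb in zip(k, wbar)) - (n - 1) * s2_u) / nu
    return mu_w, s2_x, s2_u


def calibrate(wbar_i, k_i, mu_w, s2_x, s2_u):
    """Estimated E(X | Wbar_i) for a subject with k_i replicates."""
    lam = s2_x / (s2_x + s2_u / k_i)
    return mu_w + lam * (wbar_i - mu_w)


random.seed(12)
mu_x, sd_x, sd_u = 10.0, 2.0, 1.0  # hypothetical truth
w = []
for i in range(5000):
    xi = random.gauss(mu_x, sd_x)
    k_i = 2 if i % 2 == 0 else 3   # unequal replication
    w.append([xi + random.gauss(0, sd_u) for _ in range(k_i)])

mu_hat, s2_x_hat, s2_u_hat = replication_moments(w)
print(round(mu_hat, 2), round(s2_x_hat, 2), round(s2_u_hat, 2))
print(round(calibrate(14.0, 2, mu_hat, s2_x_hat, s2_u_hat), 2))
```

In small samples the estimate of σ_x² can come out negative; the sketch does not guard against that.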

REGRESSION CALIBRATION VERSUS CORRECTING THE NAIVE ESTIMATOR FOR ATTENUATION

• If

$$g\{E(Y \mid X)\} = \beta_0 + \beta_1 X \quad \text{(GLM)}$$

and

$$E(X \mid W) = \alpha_0 + \lambda W$$

then these two slope estimators are equivalent:

∗ Regress Y on E(X|W) (e.g., Rosner, Spiegelman, Willett, 1989)

∗ Regress Y on W and divide the slope by λ (e.g., Carroll, Ruppert, Stefanski, 1995)

• Equivalence does not hold in general, e.g., for multiplicative error, when E(X|W) is nonlinear.
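In the linear-regression case the equivalence is an algebraic identity, which the sketch below confirms numerically. The simulated data and the values of α0 and λ are hypothetical; the identity holds for any data because E(X|W) is just a linear rescaling of W.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(13)
n = 1000
x = [random.gauss(0, 1) for _ in range(n)]
w = [xi + random.gauss(0, 1) for xi in x]
y = [1.0 + 2.0 * xi + random.gauss(0, 0.5) for xi in x]

alpha0, lam = 0.0, 0.5                      # assumed calibration function
e_x_given_w = [alpha0 + lam * wi for wi in w]

_, slope_rc = ols(e_x_given_w, y)           # regress Y on E(X|W)
_, slope_naive = ols(w, y)                  # regress Y on W ...
slope_corrected = slope_naive / lam         # ... then divide by lambda

print(slope_rc, slope_corrected)
```

The two numbers agree to floating-point precision; with a nonlinear E(X|W), as under multiplicative error, they would not.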


MULTIPLICATIVE ERROR

• The multiplicative lognormal error model is

$$W = X\,U, \qquad \log(U) \sim N(\mu_u, \sigma_u^2) \qquad (1)$$

• If log(X) is normal then we can convert (1) to a model for X given W

$$\log(X) = \alpha + \lambda \log(W) + U_b$$

where $U_b$ is $N(0, \sigma_b^2)$ and independent of W

• If Y were linear in log(X) then we would work with log(X) and have an additive error model

∗ But typically it is assumed that Y is linear in X, not log(X)


MULTIPLICATIVE ERROR

• The multiplicative error model with outcome linear in exposure is well-supported by empirical work:

∗ Lyles and Kupper (1997, Biometrics):

– “there is much evidence for this model”

– “the more biologically relevant predictor is true mean exposure on the original scale”


MULTIPLICATIVE ERROR

∗ Pierce, Stram, Vaeth, Schafer (1992, JASA)

– data from Radiation Effects Research Foundation (RERF) in Hiroshima

– “On both radiobiological and empirical grounds, focus is on models where the expected response is linear or quadratic in dose rather than linear on logistic and logarithmic scales.”

– “It is accepted that radiation dose-estimation errors are more homogeneous on a multiplicative than on an additive scale”

– “The distribution of true doses is extremely nonnormal”

– However, it seemed unlikely that a lognormal model for the true doses would fit as well as the Weibull model that they used


MULTIPLICATIVE ERROR

∗ Seychelles study (preliminary analysis of Thurston)

– validation data (brain versus maternal hair MeHg): error appears to be multiplicative and lognormal

– researchers use MeHg concentration, not log concentration, as the exposure


MULTIPLICATIVE ERROR

• From previous page:

$$\log(X) = \alpha + \lambda \log(W) + U_b \qquad (2)$$

• From (2)

$$X = W^{\lambda} \exp(\alpha + U_b)$$

and

$$E(X \mid W) = W^{\lambda} \exp(\alpha + \sigma_b^2 / 2)$$

Not linear in W


MULTIPLICATIVE ERROR

• Therefore if $E(Y \mid X) = \beta_0 + \beta_1 X$ then

$$E(Y \mid W) = \beta_0 + \left\{ \beta_1 \exp(\alpha + \sigma_b^2 / 2) \right\} W^{\lambda}$$

• Therefore, regress Y on $W^{\lambda}$ and divide the slope estimate by $\exp(\alpha + \sigma_b^2 / 2)$

∗ This is regression calibration again

• Note that the regression of Y on W is not linear even though the regression of Y on X is linear

∗ This is because the regression of X on W is not linear
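A simulation sketch of this recipe follows, using the settings quoted for the deck's simulation figures: log(X) ~ N(0, 1/4), log(U) ~ N(−1/8, 1/4), Y = X/2 + N(0, 0.04). Two things are assumptions here: 0.04 is read as a variance, and α, λ and σ_b² are derived from the lognormal moments (in practice they would be estimated).

```python
import math
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(14)
n = 100_000
beta0, beta1 = 0.0, 0.5
mu_lx, var_lx = 0.0, 0.25        # log(X) ~ N(0, 1/4)
mu_lu, var_lu = -0.125, 0.25     # log(U) ~ N(-1/8, 1/4)

x = [math.exp(random.gauss(mu_lx, math.sqrt(var_lx))) for _ in range(n)]
u = [math.exp(random.gauss(mu_lu, math.sqrt(var_lu))) for _ in range(n)]
w = [xi * ui for xi, ui in zip(x, u)]                       # W = X U
y = [beta0 + beta1 * xi + random.gauss(0, 0.2) for xi in x]

# Calibration model log(X) = alpha + lam * log(W) + U_b implied by normality
lam = var_lx / (var_lx + var_lu)
alpha = mu_lx - lam * (mu_lx + mu_lu)
sigma2_b = var_lx - lam ** 2 * (var_lx + var_lu)
factor = math.exp(alpha + sigma2_b / 2.0)

_, naive_slope = ols(w, y)                          # ignores the error
_, slope_wlam = ols([wi ** lam for wi in w], y)     # regress Y on W^lambda
rc_slope = slope_wlam / factor                      # regression calibration

print("lambda:", lam, "naive:", round(naive_slope, 3), "RC:", round(rc_slope, 3))
```

With these settings λ = 1/2, so the regressor is the square root of W, and the corrected slope should land near the true value 1/2 while the naive slope is much smaller.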


MULTIPLICATIVE ERROR

Figure: outcome versus predictor, n = 50, plotted against both the true x and the surrogate w.

Simulation: W = XU, log(X) ∼ N(0, 1/4), log(U) ∼ N(−1/8, 1/4), Y = X/2 + N(0, 0.04)


MULTIPLICATIVE ERROR

Figure: outcome versus w, n = 10000, showing the w–y data, the fit to x–y, and the fit to w–y.

W = XU, log(X) ∼ N(0, 1/4), log(U) ∼ N(−1/8, 1/4), Y = X/2 + N(0, 0.04)


MULTIPLICATIVE ERROR

Figure: outcome versus $w^{1/2}$, n = 10000, showing the $w^{1/2}$–y data and the fit.

W = XU, log(X) ∼ N(0, 1/4), log(U) ∼ N(−1/8, 1/4), Y = X/2 + N(0, 0.04). Note: λ = 1/2


MULTIPLICATIVE ERROR

Other work

• Hwang (1986) regresses Y on W (not $W^{\lambda}$) and then corrects

∗ This is the “correction method” that is equivalent to regression calibration only when E(X|W) is linear in W, which is not the case here

∗ Consistent but badly biased in simulations of Lyles and Kupper

∗ Bias is still noticeable for n = 10,000

∗ Previous plots suggest why

∗ These results are consistent with those of Iturria, Carroll, and Firth (1999)


MULTIPLICATIVE ERROR

Other work

• Lyles and Kupper (1999) propose a method based on quasi-likelihood

∗ somewhat superior to regression calibration in a simulation study

– especially when measurement error is large relative to equation error (plus measurement error in Y)

∗ it is a weighted version of regression calibration

– more efficient than regression calibration because it weights according to inverse conditional variances

∗ a special case of expanded regression calibration (Carroll, Ruppert, Stefanski, 1995)


MULTIPLICATIVE ERROR

Other work

• Iturria, Carroll, and Firth (1999) study polynomial regression with multiplicative error

∗ One of their general methods is a special case of Hwang's

∗ Their “partial regression” estimator assumes lognormality and generalizes the estimator discussed earlier

∗ In a simulation with lognormal X and U, the partial regression estimator is superior to the ones that make fewer assumptions

– this is similar to the results of Lyles and Kupper

TRANSFORMATIONS TO ADDITIVE ERROR

• Model is h(W) = h(X) + U

∗ Includes additive and multiplicative error as special cases

• Eckert, Carroll, and Wang (1997) consider parametric (power) and nonparametric (monotonic cubic spline) models for h

∗ can transform to a specific error distribution

∗ could, instead, transform to constant error variance

∗ if model holds, then both methods have the same target transformation

∗ it could be more efficient to use both sources of information (Ruppert and Aldershof, 1989)

• Nusser et al. (1996) use transformations to estimate the distribution of X, where X is the daily intake of a dietary component

END OF LECTURE 3


LECTURE 4: SIMEX AND INSTRUMENTAL VARIABLES

OUTLINE

• The Key Idea Behind SIMEX (Simulation/Extrapolation)

• An Empirical Version of SIMEX

• Details of Simulation Extrapolation Algorithm

• Example: Measurement Error in SBP in Framingham Study

• IV (Instrumental Variables): Rationale

• IV via prediction and the IV algorithm

• Example: Calibrating a FFQ

• Example: CHD and Cholesterol


ABOUT SIMULATION EXTRAPOLATION

• A functional method

∗ no assumptions about the true X values

• For bias reduction and variance estimation

∗ like bootstrap and jackknife

• Not model dependent

∗ like bootstrap and jackknife

• Handles complicated problems

• Computer intensive

• Approximate, less efficient for certain problems


THE KEY IDEA

• The effects of measurement error on an estimator can be studied with a simulation experiment in which additional measurement error is added to the measured data and the estimator recalculated.

• “Response variable” is the estimator under study

• “Independent factor” is the measurement error variance

∗ “Factor levels” are the variances of the added measurement errors

• Objective is to study how the estimator depends on the variance of the measurement error


OUTLINE OF THE ALGORITHM

• Add measurement error to variable measured with error

∗ Λ controls amount of added measurement error

∗ $\sigma_u^2$ increased to $(1 + \Lambda)\sigma_u^2$

∗ Average over many simulations to remove Monte Carlo variation

• Recalculate estimates, called pseudo-estimates

• Plot Monte Carlo average pseudo-estimates versus Λ

• Extrapolate to Λ = −1

∗ Λ = −1 corresponds to case of no measurement error
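The outline above can be sketched for simple linear regression. Several choices here are assumptions made for illustration: the error variance σ_u² is treated as known, the simulated data values are hypothetical, only Λ = 0, 1, 2 are used, and the extrapolant is the quadratic through those three points (the slides do not fix a particular extrapolant at this stage).

```python
import random


def ols_slope(x, y):
    """Slope of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


random.seed(15)
n, B = 4000, 40
beta1, sigma_u = 1.0, 0.5        # hypothetical truth; sigma_u assumed known
x = [random.gauss(0, 1) for _ in range(n)]
w = [xi + random.gauss(0, sigma_u) for xi in x]
y = [beta1 * xi + random.gauss(0, 0.3) for xi in x]


def pseudo_estimate(lam_big):
    """Monte Carlo average slope after adding error of variance lam_big * sigma_u^2."""
    if lam_big == 0:
        return ols_slope(w, y)   # the naive estimate
    sd_extra = (lam_big ** 0.5) * sigma_u
    total = 0.0
    for _ in range(B):
        w_b = [wi + random.gauss(0, sd_extra) for wi in w]
        total += ols_slope(w_b, y)
    return total / B


g0, g1, g2 = (pseudo_estimate(L) for L in (0, 1, 2))
naive_estimate = g0

# Quadratic through (0, g0), (1, g1), (2, g2) evaluated at Lambda = -1
simex_estimate = 3.0 * g0 - 3.0 * g1 + g2

print("naive:", round(naive_estimate, 3), "SIMEX:", round(simex_estimate, 3))
```

The extrapolated value removes most, though not all, of the attenuation here, because a quadratic only approximates the true dependence of the slope on Λ; this is the sense in which SIMEX is approximate.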

Figure: Illustration of SIMEX. Coefficient (vertical axis, 0.0 to 1.2) versus measurement error variance (horizontal axis, 0.0 to 3.0), with the naive estimate marked.

Your estimate when you ignore measurement error.

Figure: Illustration of SIMEX. Coefficient versus measurement error variance, with the naive estimate marked.

What happens to your estimate when you have more error, which you add on by simulation, but you still ignore the error.

Figure: Illustration of SIMEX. Coefficient versus measurement error variance, with the naive estimate marked.

What statistician can resist fitting a curve?

Figure: Illustration of SIMEX. Coefficient versus measurement error variance, with the naive estimate and the SIMEX estimate marked.

Now extrapolate to the case of no measurement error.


AN EMPIRICAL VERSION OF SIMEX: FRAMINGHAM DATA EXAMPLE

• Data

∗ $Y$ = indicator of CHD

∗ $W_k$ = SBP at Exam $k$, $k = 1, 2$

∗ $X$ = “true” SBP

∗ Data: $(Y_j, W_{1,j}, W_{2,j})$, $j = 1, \ldots, 1660$

• Model Assumptions

∗ $W_1, W_2 \mid X$ iid $N(X, \sigma_u^2)$

∗ $\Pr(Y = 1 \mid X) = H(\alpha + \beta X)$, $H$ logistic


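FRAMINGHAM DATA EXAMPLE: THREE NAIVE ANALYSES

• Regress $Y$ on $W_{\bullet} = (W_1 + W_2)/2$ $\longmapsto$ $\beta_A$

• Regress $Y$ on $W_1$ $\longmapsto$ $\beta_1$

• Regress $Y$ on $W_2$ $\longmapsto$ $\beta_2$

Λ      Measurement error variance, $(1 + \Lambda)\sigma_u^2/2$      Slope estimate

−1     0                                                     ?

0      $\sigma_u^2/2$                                        $\beta_A$

1      $\sigma_u^2$                                          $\beta_1$, $\beta_2$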


Logistic regression fits in Framingham using

first replicate, second replicate and average of both


A SIMEX–type plot for the Framingham data,

where the errors are not computer–generated.


A SIMEX–type extrapolation for the Framingham data,

where the errors are not computer–generated.


SIMULATION AND EXTRAPOLATION STEPS: ADDING MEASUREMENT ERROR

• Framingham Example:

∗ $W_{\bullet} \longmapsto W_1 = W_{\bullet} + (W_1 - W_2)/2$

∗ $W_{\bullet} \longmapsto W_2 = W_{\bullet} + (W_2 - W_1)/2$

• In General:

∗ Best Estimator(data) $\longmapsto$ Best Estimator(data + $\sqrt{\Lambda}$ {Independent $N(0, \sigma_u^2)$ Error})

∗ Λ controls amount (variance) of added measurement error


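SIMULATION AND EXTRAPOLATION ALGORITHM: ADDING MEASUREMENT ERROR (DETAILS)

• For $\Lambda \in \{\Lambda_1, \ldots, \Lambda_M\}$

• For $b = 1, \ldots, B$, compute:

∗ bth pseudo data set

$$W_{b,i}(\Lambda) = W_i + \sqrt{\Lambda}\, \mathrm{Normal}(0, \sigma_u^2)_{b,i}$$

∗ bth pseudo estimate

$$\theta_b(\Lambda) = \theta\left(\{Y_i, W_{b,i}(\Lambda)\}_1^n\right)$$

∗ the average of the pseudo estimates

$$\theta(\Lambda) = B^{-1} \sum_{b=1}^{B} \theta_b(\Lambda) \approx E\left(\theta_b(\Lambda) \mid \{Y_j, X_j\}_1^n\right)$$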


SIMULATION AND EXTRAPOLATION STEPS: EXTRAPOLATION

• Framingham Example: (two points, Λ = 0, 1)

∗ Linear Extrapolation: $a + b\Lambda$

• In General: (multiple Λ points)

∗ Linear: $a + b\Lambda$

∗ Quadratic: $a + b\Lambda + c\Lambda^2$

∗ Rational Linear: $(a + b\Lambda)/(c + \Lambda)$ (exact for linear regression)
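As a worked illustration of the two-point case, suppose (an assumption made here for concreteness) that the two single-replicate slopes are combined into one value $\bar\beta = (\beta_1 + \beta_2)/2$ at Λ = 1, while $\beta_A$ sits at Λ = 0. The line through these two points and its value at Λ = −1 are then

$$a = \beta_A, \qquad b = \bar\beta - \beta_A, \qquad a + b(-1) = 2\beta_A - \bar\beta .$$

So the linearly extrapolated slope moves away from $\bar\beta$, past $\beta_A$, by the same amount that averaging the two replicates moved it.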


SIMULATION AND EXTRAPOLATION ALGORITHM: EXTRAPOLATION (DETAILS)

• Plot $\theta(\Lambda)$ vs Λ (Λ > 0)

• Extrapolate to Λ = −1 to get $\theta(-1) = \theta_{SIMEX}$

• For certain models and assuming the extrapolant function is chosen correctly,

$$E\{\theta(-1) \mid \text{True Data}\} = \theta_{TRUE}$$
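The simulation and extrapolation steps can be sketched in a few lines for the simplest case, a straight-line regression with classical error. This is only a minimal sketch under stated assumptions: the data are simulated, `simex_slope` is a hypothetical helper name, σ_u is treated as known, and the quadratic extrapolant is one choice among those listed above.

```python
import numpy as np

def simex_slope(y, w, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), B=100, seed=0):
    """SIMEX sketch for the slope of y on w (hypothetical helper, sigma_u assumed known)."""
    rng = np.random.default_rng(seed)
    lambdas = np.asarray(lambdas, dtype=float)
    avg = np.empty(len(lambdas))
    for m, lam in enumerate(lambdas):
        slopes = np.empty(B)
        for b in range(B):
            # Simulation step: W_b(lam) = W + sqrt(lam) * Normal(0, sigma_u^2)
            w_b = w + np.sqrt(lam) * rng.normal(0.0, sigma_u, size=w.shape)
            # Naive (error-ignoring) slope computed on the pseudo data
            slopes[b] = np.polyfit(w_b, y, 1)[0]
        avg[m] = slopes.mean()
    # Extrapolation step: fit a quadratic in lam and evaluate it at lam = -1
    coef = np.polyfit(lambdas, avg, 2)
    return np.polyval(coef, -1.0), avg

# Simulated example: true slope 1, classical error W = X + U
rng = np.random.default_rng(1)
n, beta_x, sigma_u = 2000, 1.0, 0.7
x = rng.normal(0.0, 1.0, n)
y = 0.5 + beta_x * x + rng.normal(0.0, 0.3, n)
w = x + rng.normal(0.0, sigma_u, n)

naive = np.polyfit(w, y, 1)[0]
simex, curve = simex_slope(y, w, sigma_u)
print("naive:", naive, "SIMEX:", simex)
```

The quadratic extrapolant is not exact for linear regression, so the SIMEX value typically lands between the naive slope and the true slope rather than on the true slope.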


EXAMPLE: MEASUREMENT ERROR IN SYSTOLIC BLOOD PRESSURE

• Framingham Data: $(Y_j, \mathrm{Age}_j, \mathrm{Smoke}_j, \mathrm{Chol}_j, W_{A,j})$, $j = 1, \ldots, 1615$

∗ Y = indicator of CHD

∗ Age (at Exam 2)

∗ Smoking Status (at Exam 1)

∗ Serum Cholesterol (at Exam 3)

∗ Transformed SBP: $W_A = (W_1 + W_2)/2$, where $W_k = \ln(\mathrm{SBP} - 50)$ at Exam $k$


EXAMPLE : PARAMETER ESTIMATION

• Consider logistic regression of Y on Age, Smoke, Chol and SBP with transformed SBP measured with error

∗ The plots on the following page illustrate the simulation extrapolation method for estimating the parameters in the logistic regression model


[Figure: SIMEX extrapolation plots, coefficient versus Lambda (−1.0 to 2.0), in four panels. Age (coefficient × 1e+2, axis 4.36 to 6.72; annotated values 0.0536 and 0.0554). Smoking (coefficient × 1e+1, axis 3.43 to 8.43; annotated values 0.601 and 0.593). Cholesterol (coefficient × 1e+3, axis 5.76 to 9.98; annotated values 0.00782 and 0.00787). Log(SBP−50) (coefficient × 1e+0, axis 1.29 to 2.12; annotated values 1.93 and 1.71).]


INSTRUMENTAL VARIABLES: RATIONALE

• Remember, $W = X + U$, $U \sim \mathrm{Normal}(0, \sigma_u^2)$.

• The most direct and efficient way to get information about $\sigma_u^2$ is to observe $X$ on a subset of the data.

• The next best way is via replication, namely to take ≥ 2 independent replicates

∗ $W_1 = X + U_1$ and $W_2 = X + U_2$.

∗ If these are indeed replicates, then we can estimate $\sigma_u^2$ via a components of variance analysis.

• The third method is to use Instrumental Variables.

∗ Sometimes $X$ cannot be observed and replicates cannot be taken.

∗ Then IVs can help.
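For the replication route, a components of variance estimate of $\sigma_u^2$ is simply the pooled within-subject variance of the replicates. The sketch below assumes simulated data and a balanced design (the same number of replicates per subject); `sigma_u2_from_replicates` is a hypothetical helper name.

```python
import numpy as np

def sigma_u2_from_replicates(w):
    """Pooled within-subject variance; w is an (n, J) array with W_ij = X_i + U_ij."""
    w = np.asarray(w, dtype=float)
    n, J = w.shape
    # Deviations of each replicate from its own subject mean remove X_i entirely
    resid = w - w.mean(axis=1, keepdims=True)
    return (resid ** 2).sum() / (n * (J - 1))

# Simulated example: two replicates per subject, true sigma_u^2 = 0.64
rng = np.random.default_rng(0)
n, sigma_u = 5000, 0.8
x = rng.normal(2.0, 1.0, n)
w = x[:, None] + rng.normal(0.0, sigma_u, (n, 2))
est = sigma_u2_from_replicates(w)
print("estimated sigma_u^2:", est)
```

With two replicates this reduces to the average of $(W_1 - W_2)^2/2$, which is why correlated "replicates" would bias the estimate downward.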


WHAT IS AN INSTRUMENTAL VARIABLE ?

$$Y = \beta_0 + \beta_x X + \varepsilon; \qquad W = X + U; \qquad U \sim \mathrm{Normal}(0, \sigma_u^2).$$

• In linear regression, an instrumental variable $T$ is a random variable which has three properties:

∗ $T$ is independent of $\varepsilon$

∗ $T$ is independent of $U$

∗ $T$ is related to $X$.

∗ You only measure $T$ to get information about measurement error: it is not part of the model.

∗ In our parlance, $T$ is a surrogate for $X$!

• Whether $T$ qualifies as an instrumental variable can be a difficult question.


AN EXAMPLE : CALIBRATING A QUESTIONNAIRE

X = usual (long–term) average intake of Fat (log scale);

Y = Fat as measured by a questionnaire;

W = Fat as measured by 6 days of 24–hour recalls

T = Fat as measured by a diary record

• In this example, the time ordering was:

∗ Questionnaire

∗ Then one year later, the recalls were done fairly close together in time.

∗ Then 6 months later, the diaries were measured.

• One could think of the recalls as replicates, but some researchers have worried that substantial correlations exist, i.e., they are not independent replicates.

• The 6–month gap with the recalls and the 18–month gap with the questionnaire make the diary records a good candidate for an instrument.


USING INSTRUMENTAL VARIABLES: MOTIVATION

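• In what follows, we will use underscores to denote what coefficients go where.

• For example, $\beta_{Y|1\underline{X}}$ is the coefficient for $X$ in the regression of $Y$ on $X$, and $\beta_{Y|\underline{1}X}$ is the intercept.

• Let’s do a little algebra:

$$Y = \beta_{Y|\underline{1}X} + \beta_{Y|1\underline{X}} X + \varepsilon; \qquad W = X + U; \qquad (\varepsilon, U) \text{ independent of } T.$$

• This means

$$E(Y \mid T) = \beta_{Y|\underline{1}T} + \beta_{Y|1\underline{T}} T = \beta_{Y|\underline{1}X} + \beta_{Y|1\underline{X}} E(X \mid T)$$

$$= \beta_{Y|\underline{1}X} + \beta_{Y|1\underline{X}} E(W \mid T) = \beta_{Y|\underline{1}T} + \beta_{Y|1\underline{X}}\, \beta_{W|1\underline{T}}\, T.$$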


MOTIVATION — CONT.

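• From previous page

$$E(Y \mid T) = \beta_{Y|\underline{1}T} + \beta_{Y|1\underline{X}}\, \beta_{W|1\underline{T}}\, T.$$

• We want to estimate $\beta_{Y|1\underline{X}}$

∗ From above

$$\beta_{Y|1\underline{T}} = \beta_{Y|1\underline{X}}\, \beta_{W|1\underline{T}}$$

∗ Equivalently,

$$\beta_{Y|1\underline{X}} = \frac{\beta_{Y|1\underline{T}}}{\beta_{W|1\underline{T}}}.$$

∗ Regress $Y$ on $T$ and divide its slope by the slope of the regression of $W$ on $T$!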
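The ratio-of-slopes recipe can be checked on simulated data. The following is a minimal sketch, assuming a made-up data-generating model in which $T$ is related to $X$ and independent of both error terms; `ols_slope` is a hypothetical helper name.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from the least squares regression of y on x (with intercept)."""
    return np.polyfit(x, y, 1)[0]

rng = np.random.default_rng(0)
n = 20000
x = rng.normal(0.0, 1.0, n)               # true predictor, unobserved in practice
t = 0.8 * x + rng.normal(0.0, 0.6, n)     # instrument: related to X, independent of errors
w = x + rng.normal(0.0, 1.0, n)           # error-prone measurement, W = X + U
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)

naive = ols_slope(w, y)                    # attenuated toward zero
iv = ols_slope(t, y) / ols_slope(t, w)     # slope of Y on T divided by slope of W on T
print("naive:", naive, "IV:", iv)
```

In this setup the naive slope is attenuated by roughly one half, while the ratio recovers a value near the true slope of 2 without using $\sigma_u^2$ at all.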


THE DANGERS OF A WEAK INSTRUMENT

• Remember that we get the IV estimate using the relationship

$$\beta_{Y|1\underline{X}} = \frac{\beta_{Y|1\underline{T}}}{\beta_{W|1\underline{T}}}.$$

• The division causes increased variability.

∗ If the instrument is very weak, the slope $\beta_{W|1\underline{T}}$ will be near zero.

∗ This will make the IV estimate very unstable.
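A small simulation makes the instability visible. This is a sketch under assumed settings (sample size 200, 300 repetitions, true slope 2); `iv_spread` is a hypothetical helper, and the interquartile range is used as the spread measure because a ratio with a near-zero denominator has very heavy tails.

```python
import numpy as np

def iv_slope(y, w, t):
    """IV estimate: slope of y on t divided by slope of w on t."""
    return np.polyfit(t, y, 1)[0] / np.polyfit(t, w, 1)[0]

def iv_spread(strength, n=200, reps=300, seed=0):
    """Interquartile range of IV estimates over repeated simulated data sets."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for r in range(reps):
        x = rng.normal(size=n)
        t = strength * x + rng.normal(size=n)   # strength controls how related T is to X
        w = x + rng.normal(size=n)
        y = 1.0 + 2.0 * x + rng.normal(size=n)
        est[r] = iv_slope(y, w, t)
    q25, q75 = np.percentile(est, [25, 75])
    return q75 - q25

strong = iv_spread(1.0)
weak = iv_spread(0.05)
print("IQR with strong instrument:", strong)
print("IQR with weak instrument:  ", weak)
```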


IV VIA PREDICTION

• Here’s another way to do the same thing and get the IV estimator without doing division explicitly:

∗ Regress $W$ on $T$.

∗ Form predicted values: $\beta_{W|\underline{1}T} + \beta_{W|1\underline{T}} T$.

∗ Regress $Y$ on these predicted values.

∗ The slope is the instrumental variables estimate.
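That the prediction route and the division route give the same number can be verified directly. The sketch below uses simulated data with assumed parameter values; the agreement is algebraic, so it holds up to floating-point error for any data set.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
t = 0.8 * x + rng.normal(0.0, 0.6, n)
w = x + rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)

# Regress W on T and form predicted values
slope_wt, intercept_wt = np.polyfit(t, w, 1)
w_hat = intercept_wt + slope_wt * t

# Regress Y on the predicted values; the slope is the IV estimate
iv_prediction = np.polyfit(w_hat, y, 1)[0]

# Same estimate via explicit division of the two slopes
iv_ratio = np.polyfit(t, y, 1)[0] / slope_wt
print(iv_prediction, iv_ratio)
```

The prediction form is the one that carries over when covariates measured without error are added or when the outcome regression is not linear.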


LOGISTIC REGRESSION EXAMPLE

• These data come from a paper by Satten and Kupper.

Y = Evidence of Coronary Heart Disease (binary)

W = Cholesterol Level

T = LDL

Z = Age and Smoking Status

• It’s not particularly clear that T is really an instrument. While it may well be uncorrelated with the error in W and the random variation in Y, it might also be included as part of the model itself!

∗ There are no replicates here to compare with.

• Note here that we have added variables, Z, which are measured without error.

• The algorithm does not change: all regressions are done with Z as an extra covariate.


IV ALGORITHM: NON-GAUSSIAN OUTCOME

• The IV algorithm in linear regression is:

STEP 1: Regress W on T and Z (may be a multivariate regression)

STEP 2: Form the predicted values of this regression

STEP 3: Regress Y on the predicted values and Z.

STEP 4: The regression coefficients are the IV estimates.

• Only Step 3 changes if you do not have linear regression but instead have logistic regression or a GLM.

∗ Then the “regression” is logistic or GLM.

∗ Very simple to compute.

∗ Easily bootstrapped.

• This method is “valid” in GLM’s to the extent that regression calibration is valid.
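The four steps with a logistic Step 3 can be sketched as follows. Everything here is an assumption for illustration: the data are simulated, `fit_logistic` is a hypothetical Newton-Raphson helper written only so the block is self-contained, and, as noted above, the result is only approximately valid, to the extent regression calibration is.

```python
import numpy as np

def fit_logistic(D, y, n_iter=25):
    """Logistic regression by Newton-Raphson; D must include an intercept column."""
    beta = np.zeros(D.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-D @ beta))
        grad = D.T @ (y - p)
        hess = (D * (p * (1.0 - p))[:, None]).T @ D
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(0)
n = 20000
z = rng.binomial(1, 0.5, n).astype(float)   # covariate measured without error
x = 0.5 * z + rng.normal(size=n)            # true predictor (unobserved in practice)
w = x + rng.normal(size=n)                  # error-prone measurement
t = x + rng.normal(size=n)                  # instrument
p_true = 1.0 / (1.0 + np.exp(-(-1.0 + 0.8 * x + 0.3 * z)))
y = rng.binomial(1, p_true).astype(float)

ones = np.ones(n)
# STEP 1: regress W on T and Z
A = np.column_stack([ones, t, z])
coef = np.linalg.lstsq(A, w, rcond=None)[0]
# STEP 2: form the predicted values
w_hat = A @ coef
# STEP 3: logistic regression of Y on the predicted values and Z
beta_iv = fit_logistic(np.column_stack([ones, w_hat, z]), y)
# For comparison: naive logistic regression of Y on W and Z
beta_naive = fit_logistic(np.column_stack([ones, w, z]), y)
# STEP 4: the coefficients from Step 3 are the IV estimates
print("naive slope:", beta_naive[1], "IV slope:", beta_iv[1])
```

Standard errors are not produced by this sketch; resampling subjects and repeating all steps is the bootstrap the slides refer to.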


LOGISTIC REGRESSION EXAMPLE: RESULTS

• We used LDL/100, Cholesterol/100, Age/100

• The naive logistic regression leads to the following analysis:

∗ Slope and bootstrap s.e. for Cholesterol: 0.65 and 0.29

∗ Slope and bootstrap s.e. for Smoking: 0.065 and 0.260

∗ Slope and bootstrap s.e. for Age: 7.82 and 4.26

• The IV logistic regression leads to the following analysis:

∗ Slope and bootstrap s.e. for Cholesterol: 0.91 and 0.33

∗ Slope and bootstrap s.e. for Smoking: 0.056 and 0.259

∗ Slope and bootstrap s.e. for Age: 7.79 and 4.31

• Note here how only the coefficient for cholesterol is affected by the measurement error.


END OF LECTURE 4


LECTURE 5: BAYESIAN METHODS

OUTLINE

• Likelihood: Outcome, measurement, and exposure models

• Priors and posteriors

• MCMC

• Example: linear regression

• Example: Mixtures of Berkson and classical error and application to thyroid cancer and exposure to fallout


BAYESIAN METHODS

Outcome model: (true disease model)

$$f(Y \mid X, Z, \theta_O)$$

Could be a GLM or a “nonparametric” spline model

Measurement model: (measurement error model)

$$f(W \mid Y, X, Z, \theta_M) = f(W \mid X, Z, \theta_M) \quad \text{if nondifferential (which is assumed here)}$$

Could be a classical Gaussian measurement error model

Exposure model: (predictor distribution model)

$$f(X \mid Z, \theta_X)\, f(Z \mid \theta_Z) \quad \text{(usually } f(Z \mid \theta_Z) \text{ is ignored)}$$

Could be a low-dimensional parametric model or a more flexible model (e.g., normal mixture model)


PRIOR

$\pi(\theta_O)$, $\pi(\theta_M)$, $\pi(\theta_X)$, $\pi(\theta_Z)$ (usually $\pi(\theta_Z)$ is ignored)


LIKELIHOOD

$$f(Y, W \mid \theta_O, \theta_M, \theta_X, Z) = \int f(Y \mid X, Z, \theta_O)\, f(W \mid X, Z, \theta_M)\, f(X \mid Z, \theta_X)\, dX$$

Major problem: computing the integral

We will go in another direction — Bayesian analysis by MCMC.


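POSTERIOR AND MCMC

$$f(\theta_O, \theta_M, \theta_X, X \mid Y, W, Z) \propto f(Y, W, Z, X, \theta_O, \theta_M, \theta_X \mid Z)$$

$$= f(Y \mid X, Z, \theta_O)\, f(W \mid X, Z, \theta_M)\, f(X \mid Z, \theta_X)\, \pi(\theta_O)\, \pi(\theta_M)\, \pi(\theta_X)$$

MCMC: for example, we might sample successively from:

1. $[\theta_O \mid \text{others}]$

2. $[\theta_M \mid \text{others}]$

3. $[\theta_X \mid \text{others}]$

4. $[X \mid \text{others}]$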


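MCMC: LINEAR REGRESSION WITH REPLICATE MEASUREMENTS

$$[Y_i \mid X_i, Z_i, \beta, \sigma_\varepsilon] \sim N(Z_i^T \beta_z + X_i \beta_x, \sigma_\varepsilon^2) \quad \text{(outcome model)}$$

$$[W_{i,j} \mid X_i] \sim \mathrm{IN}(X_i, \sigma_u^2), \; j = 1, \ldots, J_i \quad \text{(measurement model)}$$

$$[X_i \mid Z_i] = [X_i] \sim N(\mu_x, \sigma_x^2) \quad \text{(exposure model)}$$

$$[\beta_x] = N(0, \sigma_\beta^2), \quad [\beta_z] = N(0, \sigma_\beta^2 I), \quad [\mu_x] = N(0, \sigma_\mu^2)$$

$$[\sigma_\varepsilon^2] = \mathrm{IG}(\delta, \delta), \quad [\sigma_u^2] = \mathrm{IG}(\delta, \delta), \quad [\sigma_x^2] = \mathrm{IG}(\delta, \delta)$$

$\sigma_\beta$ and $\sigma_\mu$ are “large” and $\delta$ is “small”

unknowns: $(\beta_x, \beta_z, \sigma_\varepsilon)$, $(\sigma_u)$, $(X_1, \ldots, X_N)$, $(\mu_x, \sigma_x)$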


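MCMC: LINEAR REGRESSION - LIKELIHOOD

Let

$$C_i = \begin{pmatrix} Z_i \\ X_i \end{pmatrix}, \qquad Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_N \end{pmatrix}, \qquad \text{and} \qquad \beta = \begin{pmatrix} \beta_z \\ \beta_x \end{pmatrix}$$

Likelihood:

$$[Y_i, W_{i,1}, \ldots, W_{i,J_i} \mid \text{others}] \propto \frac{1}{\sigma_\varepsilon \sigma_u^{J_i}} \exp\left\{ -\frac{1}{2} \left( \frac{(Y_i - C_i^T \beta)^2}{\sigma_\varepsilon^2} + \frac{\sum_{j=1}^{J_i} (W_{i,j} - X_i)^2}{\sigma_u^2} \right) \right\}$$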


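MCMC: LINEAR REGRESSION - POSTERIOR FOR $\beta$

$$[\beta \mid \text{others}] \propto \exp\left\{ -\frac{1}{2\sigma_\varepsilon^2} \sum_{i=1}^{N} (Y_i - C_i^T \beta)^2 - \frac{1}{2\sigma_\beta^2} \|\beta\|^2 \right\}$$

Let $C$ have $i$th row $C_i^T$ and let $\lambda = \sigma_\varepsilon^2 / \sigma_\beta^2$. Then

$$[\beta \mid \text{others}] = N\left( \{C^T C + \lambda I\}^{-1} C^T Y, \; \sigma_\varepsilon^2 (C^T C + \lambda I)^{-1} \right)$$

• Exactly what we get for linear regression without measurement error.

∗ Except now $C$ will vary on each iteration of MCMC, since it contains imputed $X$s.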


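MCMC: LINEAR REGRESSION - POSTERIOR FOR $\mu_x$

$$[\mu_x \mid \text{others}] \propto \exp\left\{ -\frac{\sum_{i=1}^{N} (X_i - \mu_x)^2}{2\sigma_x^2} - \frac{\mu_x^2}{2\sigma_\mu^2} \right\}$$

Let $\sigma_{\bar X}^2 = \sigma_x^2 / N$. Then

$$[\mu_x \mid \text{others}] \sim N\left\{ \left( \frac{1}{\sigma_{\bar X}^2} + \frac{1}{\sigma_\mu^2} \right)^{-1} \left( \frac{\bar X}{\sigma_{\bar X}^2} + \frac{0}{\sigma_\mu^2} \right), \; \left( \frac{1}{\sigma_{\bar X}^2} + \frac{1}{\sigma_\mu^2} \right)^{-1} \right\}$$

$$\approx N(\bar X, \sigma_{\bar X}^2) \quad \text{if } \sigma_\mu \text{ is large}$$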


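MCMC: LINEAR REGRESSION - POSTERIOR FOR $X_i$

Define

$$\sigma_{w,i}^2 = \frac{\sigma_u^2}{J_i}$$

Then

$$[X_i \mid \text{others}] \propto \exp\left[ -\frac{1}{2} \left\{ \frac{(Y_i - X_i \beta_x - Z_i^T \beta_z)^2}{\sigma_\varepsilon^2} + \frac{(X_i - \mu_x)^2}{\sigma_x^2} + \frac{(\bar W_i - X_i)^2}{\sigma_{w,i}^2} \right\} \right]$$

After some algebra, $[X_i \mid \text{others}]$ is normal with mean

$$\frac{\{(Y_i - Z_i^T \beta_z)/\beta_x\}(\beta_x^2/\sigma_\varepsilon^2) + \mu_x (1/\sigma_x^2) + \bar W_i (1/\sigma_{w,i}^2)}{(\beta_x^2/\sigma_\varepsilon^2) + (1/\sigma_x^2) + (1/\sigma_{w,i}^2)}$$

and variance

$$\left\{ (\beta_x^2/\sigma_\varepsilon^2) + (1/\sigma_x^2) + (1/\sigma_{w,i}^2) \right\}^{-1}$$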


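MCMC: LINEAR REGRESSION - POSTERIOR FOR $\sigma_\varepsilon^2$

Prior:

$$[\sigma_\varepsilon^2] = \mathrm{IG}(\delta, \delta)$$

$\mathrm{IG}(\alpha, \beta)$: density $\propto x^{-(\alpha+1)} \exp(-\beta/x)$; the precision $1/x$ is then Gamma distributed with mean $\alpha/\beta$ and variance $\alpha/\beta^2$.

Posterior:

$$[\sigma_\varepsilon^2 \mid \text{others}] \propto \frac{1}{(\sigma_\varepsilon^2)^{\delta + N/2 + 1}} \exp\left\{ -\frac{\delta + \frac{1}{2} \sum_{i=1}^{N} (Y_i - X_i \beta_x - Z_i^T \beta_z)^2}{\sigma_\varepsilon^2} \right\}$$

$$\Rightarrow [\sigma_\varepsilon^2 \mid \text{others}] = \mathrm{IG}\left\{ \delta + \frac{N}{2}, \; \delta + \frac{1}{2} \sum_{i=1}^{N} (Y_i - X_i \beta_x - Z_i^T \beta_z)^2 \right\}$$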


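MCMC: LINEAR REGRESSION - POSTERIOR FOR $\sigma_x^2$

$$[\sigma_x^2 \mid \text{others}] \propto \frac{1}{(\sigma_x^2)^{\delta + N/2 + 1}} \exp\left\{ -\frac{\delta + \frac{1}{2} \sum_{i=1}^{N} (X_i - \mu_x)^2}{\sigma_x^2} \right\}$$

$$\Rightarrow [\sigma_x^2 \mid \text{others}] = \mathrm{IG}\left\{ \delta + \frac{N}{2}, \; \delta + \frac{1}{2} \sum_{i=1}^{N} (X_i - \mu_x)^2 \right\}$$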


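MCMC: LINEAR REGRESSION - POSTERIOR FOR $\sigma_u^2$

$$[\sigma_u^2 \mid \text{others}] \propto \frac{1}{(\sigma_u^2)^{\delta + \sum_{i=1}^{N} J_i/2 + 1}} \exp\left\{ -\frac{\delta + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{J_i} (W_{i,j} - X_i)^2}{\sigma_u^2} \right\}$$

$$\Rightarrow [\sigma_u^2 \mid \text{others}] = \mathrm{IG}\left\{ \delta + \frac{\sum_{i=1}^{N} J_i}{2}, \; \delta + \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{J_i} (W_{i,j} - X_i)^2 \right\}$$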
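The full conditionals for $\beta$, $\mu_x$, $X_i$, $\sigma_\varepsilon^2$, $\sigma_x^2$ and $\sigma_u^2$ can be cycled through in a short Gibbs sampler. The sketch below is a minimal illustration under assumptions not in the slides: simulated data, $Z_i$ reduced to an intercept, equal $J_i = J$, and specific values for the "large" prior variances and "small" $\delta$; `gibbs` is a hypothetical function name.

```python
import numpy as np

def gibbs(y, w, n_iter=2000, burn=500, delta=0.01, s2_beta=1e6, s2_mu=1e6, seed=0):
    """Gibbs sampler sketch; returns post-burn-in draws of beta_x."""
    rng = np.random.default_rng(seed)
    n, J = w.shape
    wbar = w.mean(axis=1)
    x = wbar.copy()                       # initialise the latent X_i at the replicate means
    s2_e = s2_u = s2_x = 1.0
    draws = []
    for it in range(n_iter):
        # [beta | others]: ridge-type normal with lambda = s2_e / s2_beta
        C = np.column_stack([np.ones(n), x])
        V = np.linalg.inv(C.T @ C + (s2_e / s2_beta) * np.eye(2))
        L = np.linalg.cholesky(s2_e * V)
        beta = V @ C.T @ y + L @ rng.standard_normal(2)
        # [mu_x | others]: precision-weighted combination of the mean of X and the prior mean 0
        prec = n / s2_x + 1.0 / s2_mu
        mu_x = rng.normal((x.sum() / s2_x) / prec, np.sqrt(1.0 / prec))
        # [X_i | others]: combines the outcome, exposure and measurement models
        s2_w = s2_u / J
        prec_x = beta[1] ** 2 / s2_e + 1.0 / s2_x + 1.0 / s2_w
        mean_x = ((y - beta[0]) * beta[1] / s2_e + mu_x / s2_x + wbar / s2_w) / prec_x
        x = rng.normal(mean_x, np.sqrt(1.0 / prec_x))
        # Variances: an IG(a, b) draw is 1 / Gamma(shape=a, scale=1/b)
        rss_e = np.sum((y - beta[0] - beta[1] * x) ** 2)
        s2_e = 1.0 / rng.gamma(delta + n / 2, 1.0 / (delta + 0.5 * rss_e))
        s2_x = 1.0 / rng.gamma(delta + n / 2, 1.0 / (delta + 0.5 * np.sum((x - mu_x) ** 2)))
        s2_u = 1.0 / rng.gamma(delta + n * J / 2,
                               1.0 / (delta + 0.5 * np.sum((w - x[:, None]) ** 2)))
        if it >= burn:
            draws.append(beta[1])
    return np.array(draws)

# Simulated example: true beta_x = 1, two replicates per subject
rng = np.random.default_rng(1)
n, J = 1000, 2
x_true = rng.normal(0.0, 1.0, n)
y = 0.5 + 1.0 * x_true + rng.normal(0.0, 0.5, n)
w = x_true[:, None] + rng.normal(0.0, 1.0, (n, J))

draws = gibbs(y, w)
post_mean = draws.mean()
naive = np.polyfit(w.mean(axis=1), y, 1)[0]
print("naive slope:", naive, "posterior mean of beta_x:", post_mean)
```

No convergence diagnostics are included; in practice one would inspect trace plots and run several chains before trusting the draws.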


MIXTURES OF BERKSON AND CLASSICAL MEASUREMENT ERROR

Mallick, Hoffman, and Carroll (2002) study models where both Berkson and classical measurement errors are present.

• W = observed dose of radiation from fallout due to nuclear testing in Nevada

• W = C × DCF × I × TD × FP

∗ C = time-integrated radioiodine concentration of milk

▸ specific to producer, not individual (Berkson)

▸ but one component is deposition rate of $I^{131}$ across regions of study (Classical)


MIXTURES OF BERKSON AND CLASSICAL MEASUREMENT ERROR

∗ DCF = ingestion dose conversion factor

▸ specific to age and isotope (Berkson) (Classical)

∗ I = individual milk intake rate in liters per day, measured by a food frequency questionnaire (FFQ) (Berkson)

∗ TD = time delay (depends on source of milk, from FFQ) (Berkson)

∗ FP = frequency of purchase correction factor (from FFQ) (Berkson)


LATENT VARIABLE MODEL

• W = measured dose

• X = true dose

• L = latent intermediate variable

• Berkson and classical errors are combined in:

log(X) = log(L) + Ub

log(W ) = log(L) + Uc

∗ $U_b = 0 \Rightarrow \log(X) = \log(L) \Rightarrow \log(W) = \log(X) + U_c$ (Classical)

∗ $U_c = 0 \Rightarrow \log(W) = \log(L) \Rightarrow \log(X) = \log(W) + U_b$ (Berkson)
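One way to see the difference between the two limiting cases is to regress log(X) on log(W) in simulated data. The sketch below assumes normal log(L) with unit variance and made-up error standard deviations; `slope_logx_on_logw` is a hypothetical helper. With purely classical error the slope is attenuated to $\sigma_L^2/(\sigma_L^2 + \sigma_c^2)$, while with purely Berkson error it stays at 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma_L = 200000, 1.0

def slope_logx_on_logw(sigma_b, sigma_c):
    """Simulate the latent variable model and return the slope of log(X) on log(W)."""
    log_l = rng.normal(0.0, sigma_L, n)
    log_x = log_l + rng.normal(0.0, sigma_b, n)   # Berkson component U_b
    log_w = log_l + rng.normal(0.0, sigma_c, n)   # classical component U_c
    return np.cov(log_w, log_x)[0, 1] / np.var(log_w, ddof=1)

classical = slope_logx_on_logw(0.0, 0.5)    # U_b = 0
berkson = slope_logx_on_logw(0.5, 0.0)      # U_c = 0
mixture = slope_logx_on_logw(0.35, 0.35)

print("classical:", classical)   # theory: 1 / (1 + 0.25) = 0.8
print("Berkson:  ", berkson)     # theory: 1
print("mixture:  ", mixture)     # theory: 1 / (1 + 0.1225), about 0.89
```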


LATENT VARIABLE MODEL

• $\sigma_b^2 + \sigma_c^2$ is known

• $p = \sigma_b^2 / (\sigma_b^2 + \sigma_c^2)$ is unknown

∗ informative priors:

▸ $p = 1$ (Berkson)

▸ $p = 0$ (Classical)

▸ $p$ is uniform on [0.2, 0.8] (Mixture)

▸ compare results from these three priors (sensitivity analysis)

• Z = age at exposure, sex (known exactly)

• state = state of residence (Utah, Nevada, Arizona) (known exactly)


EXPOSURE AND OUTCOME MODELS

• Exposure model:

$$[\log(L) \mid Z, \text{state} = s] \sim N(\alpha_{s0} + Z^T \alpha_{s1}, \sigma_s^2)$$

• Outcome model:

∗ Y = indicator of thyroid cancer

∗ $\mathrm{logit}\{\mathrm{pr}(Y = 1 \mid Z, X)\} = \beta_0 + Z^T \beta_1 + \log(1 + \theta X)$


APPROXIMATE MODEL FOR [Y |W,Z]

• Outcome model (from previous page):

∗ Y = indicator of thyroid cancer

∗ logit {pr(Y = 1|Z, X)} = β0 + ZTβ1 + log(1 + θX)

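• Outcome model based on $W$:

∗ $\mathrm{logit}\{\mathrm{pr}(Y = 1 \mid Z, W)\} \approx \beta_0 + Z^T \beta_1 + \log\left(1 + \theta \gamma W^{\lambda_{x|w,t}}\right)$

where

$$\gamma = \exp\left\{ (1 - \lambda_{x|w,t}) \left( \mu_{L|Z} + \sigma_{L|Z}^2/2 \right) + \sigma_b^2/2 \right\}$$

▸ $\mu_{L|Z}$ and $\sigma_{L|Z}^2$ are the mean and variance of $\log(L)$ given $Z$

and

$$\lambda_{x|w,t} = \frac{\sigma_{L|Z}^2}{\sigma_{L|Z}^2 + \sigma_c^2}$$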


BERKSON CASE

• $\lambda_{x|w,t} = 1$

• $\gamma = \exp(\sigma_b^2/2)$

and model on previous page simplifies to

$$\mathrm{logit}\{\mathrm{pr}(Y = 1 \mid Z, W)\} \approx \beta_0 + Z^T \beta_1 + \log(1 + \theta \gamma W)$$

∗ $\gamma > 1$, so the effect of Berkson error is for the naive analysis to overestimate the effect of dose

∗ In the regression, replace $W$ by $\gamma W$ to correct for bias (regression calibration)
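The factor $\gamma = \exp(\sigma_b^2/2)$ is the mean of the multiplicative Berkson error, so that $E(X \mid W) = \gamma W$. A quick numerical check on simulated data (the distribution of $W$ and the value of $\sigma_b$ are assumptions made for the illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma_b = 200000, 0.5

w = np.exp(rng.normal(0.0, 1.0, n))            # measured dose
x = w * np.exp(rng.normal(0.0, sigma_b, n))    # pure Berkson: log(X) = log(W) + U_b

gamma_theory = np.exp(sigma_b ** 2 / 2)
gamma_hat = np.mean(x / w)                     # average multiplicative error
print("theory:", gamma_theory, "simulated:", gamma_hat)
```

Because the true dose is on average $\gamma$ times the measured dose, a fit that plugs in $W$ itself attributes that extra factor to $\theta$.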


BAYESIAN ANALYSIS

• Several parametric and semiparametric models were used

• nonparametric model for dose is constrained to be monotonically increasing

• nonparametric model for the distribution of log(L) uses a Polya-tree prior (Lavine, 1992)


BAYESIAN ANALYSIS: SOME RESULTS FOR PARAMETRIC MODELS

Error model    Post. mean of θ    RR at 1 Gy    95% credible interval for RR

No error       38.9               9.4           (4.5, 13.8)

Classical      74.1               17.1          (8.5, 24.6)

Berkson        31.9               7.9           (3.8, 11.4)

Mixture        56.1               13.2          (5.0, 23.1)

• Effects on a naive analysis:

∗ Berkson error ⇒ overestimate effect

∗ Classical error ⇒ underestimate effect

∗ Mixture of errors ⇒ underestimate effect (but less than if only classical error)


BAYESIAN ANALYSIS: SEMIPARAMETRIC MODELS

• the semiparametric models have better fit by DIC

• the semiparametric model for log(L):

∗ leads to slightly smaller estimated posterior means of θ and RR

∗ credible intervals also widen

• the semiparametric outcome model:

∗ leads to slightly higher estimated posterior means of θ and RR

∗ credible intervals also widen

• Overall effect of both semiparametric models on mixture of errors model:

∗ raise RR from 13.2 to 14.2

∗ change CI from (5.0, 23.1) to (1.7, 33.6)


BAYESIAN ANALYSIS: ASSUMPTION OF INDEPENDENT ERROR

• almost certainly false

• C (radioiodine concentration) includes

∗ deposition of $I^{131}$ by region

∗ mass interception on vegetation

∗ consumption of vegetation by cows

∗ effective half-life of $I^{131}$ in vegetation

∗ milk transfer coefficient (MTC)

• Utah study generated a distribution of log-normally distributed MTCs with estimated mean and variance

∗ if parameters known, then error is Berkson

∗ parameters estimated from historical data and literature review

⇒ shared classical errors


SENSITIVITY ANALYSIS

• For the six gender/state combinations, Berkson errors for individuals in a group had a common correlation of ρ

ρ              0.0            0.2            0.4            0.6

E(θ | data)    56.1           65.9           84.1           95.1

CI             (18.6, 102)    (21.4, 120)    (30.4, 143)    (38.2, 152)


END OF LECTURE 5


LECTURE 6: NONPARAMETRIC REGRESSION WITH

MEASUREMENT ERROR

OUTLINE

• Earlier approaches: deconvolution kernels, SIMEX, regression calibration

• New Bayesian spline approach (Berry, Carroll, and Ruppert, 2002, JASA)

• Simulation results

• Example: A clinical trial

• Example: TSP and mortality, a sensitivity study


THE PROBLEM OF MEASUREMENT ERROR: ILLUSTRATION

[Figure: two scatterplot panels. Panel “Gold Standard”: y versus x (x from −4 to 4), showing the data, the true curve, and the spline fit. Panel “Data with measurement error”: y versus w (w from −4 to 4).]


THE PROBLEM OF MEASUREMENT ERROR: ILLUSTRATION

[Figure: two scatterplot panels. Panel “Gold Standard”: y versus x (x from −4 to 4), showing the data, the true curve, and the spline fit. Panel “Data with measurement error”: y versus w (w from −4 to 4).]


THE PROBLEM OF MEASUREMENT ERROR

• The regression model is

$$Y = m(X) + \varepsilon,$$

where $m$ is only known to be smooth

• Observe $Y$ and $W = X + U$, where

– $E(U \mid X) = 0$ and $\mathrm{var}(U \mid X) = \sigma_u^2$

– $U \mid X$ normally distributed

• Measurement error variance $\sigma_u^2$ is estimated from internal replicate data. (Observe $W_{ij}$, $j = 1, \ldots, J_i$.)


AVAILABLE METHODS

• Deconvolution kernels

• SIMEX

• Structural spline

• Bayesian spline


DECONVOLUTION KERNELS

• Globally consistent nonparametric regression by deconvolution kernels (Fan and Truong, 1993, Annals)

– does not work so well

∗ Fan & Truong show very poor asymptotic rates of convergence

∗ simulations show poor finite-sample behavior

– no methodology for bandwidth selection or inference


SIMEX

• The SIMEX method is due to Cook & Stefanski (1995, JASA).

• SIMEX has been previously applied to parametric problems.

• Makes no assumptions about the true $X$’s. (Functional)

• Results in estimators which are approximately consistent, i.e., consistent at least to order $O(\sigma_u^6)$.


SIMEX

• Carroll, Maca, Ruppert (2001, Biometrika) (CMR) applied the SIMEX to nonparametric regression.

• CMR have asymptotic theory in the local polynomial regression (LPR) context.

∗ The estimators have the usual rates of convergence.

∗ They are approximately consistent, to order $O(\sigma_u^6)$.

• An asymptotic theory with rates seems very difficult for splines

∗ but, simulations in CMR indicate that SIMEX/splines works a little better than SIMEX/kernel

∗ problem seems due to undersmoothing

• Staudenmayer and Ruppert (2004, JRSS-B) look at bandwidth selection for SIMEX/LPR.

∗ With better bandwidth selection, SIMEX/LPR is competitive with other methods.


STRUCTURAL MODELING

• The regression of $Y$ on the observed $W$ is

$$E(Y \mid W) = E\{m(X) \mid W\} = \int m(x) f(x \mid W)\, dx.$$

• Suppose that we had:

∗ convenient flexible form for $m(x; \beta)$

∗ convenient flexible distribution for $f(x \mid W)$.

• Then we could estimate $m(X; \beta)$ by minimizing over the data

$$\sum_{i=1}^{n} \left\{ Y_i - \int m(x; \beta) f(x \mid W_i)\, dx \right\}^2.$$
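When $m(x; \beta)$ is linear in $\beta$, the minimization reduces to ordinary least squares on the conditional expectations of the basis functions given $W$. The sketch below illustrates this for a quadratic $m$ with a single normal distribution for $X$ and known $\mu_x$, $\sigma_x^2$, $\sigma_u^2$; all of these are simplifying assumptions for the illustration, and the data are simulated.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50000
mu_x, s2_x, s2_u = 0.0, 1.0, 0.5
beta_true = np.array([1.0, 2.0, -0.5])     # m(x) = 1 + 2x - 0.5x^2

x = rng.normal(mu_x, np.sqrt(s2_x), n)
w = x + rng.normal(0.0, np.sqrt(s2_u), n)
y = beta_true[0] + beta_true[1] * x + beta_true[2] * x ** 2 + rng.normal(0.0, 0.5, n)

# Under normality, X | W is normal with mean m_w and variance v_w
lam = s2_x / (s2_x + s2_u)
m_w = mu_x + lam * (w - mu_x)              # E(X | W)
v_w = lam * s2_u                           # var(X | W)

# Replace X by E(X | W) and X^2 by E(X^2 | W) = m_w^2 + v_w, then use least squares
D_struct = np.column_stack([np.ones(n), m_w, m_w ** 2 + v_w])
beta_struct = np.linalg.lstsq(D_struct, y, rcond=None)[0]

# Naive fit that ignores the measurement error
D_naive = np.column_stack([np.ones(n), w, w ** 2])
beta_naive = np.linalg.lstsq(D_naive, y, rcond=None)[0]

print("structural:", beta_struct)
print("naive:     ", beta_naive)
```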


REGRESSION SPLINES: “PLUS FUNCTION” BASIS

• Model

$$E(Y \mid X) = m(X; \beta) := \sum_{j=0}^{J} \beta_j X^j + \sum_{k=1}^{K} \beta_{k+J} (X - \xi_k)_+^J$$

[Figure: four panels plotted over the interval 0 to 1.]


10-KNOT QUADRATIC SPLINE APPROXIMATIONS

[Figure: four panels over x from 0 to 1, each showing a function and its 10-knot quadratic spline approximation: sin(15x); sin{5(1+x+x2)}; Φ{18(x−.05)}; 0.8x+exp{−35(x−0.5)}.]

“blue” = function, “red” = spline approximation
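A 10-knot quadratic spline approximation of this kind can be reproduced with the plus-function basis and ordinary least squares. The sketch below treats sin(15x) on a fine grid; the equally spaced knots and the grid are assumptions (the slides do not state the knot locations), and `plus_basis` is a hypothetical helper name.

```python
import numpy as np

def plus_basis(x, knots, degree=2):
    """Columns 1, x, ..., x^J followed by (x - xi_k)_+^J for each knot xi_k."""
    x = np.asarray(x, dtype=float)
    poly = np.vander(x, degree + 1, increasing=True)
    trunc = np.clip(x[:, None] - np.asarray(knots)[None, :], 0.0, None) ** degree
    return np.hstack([poly, trunc])

x = np.linspace(0.0, 1.0, 401)
knots = np.linspace(0.0, 1.0, 12)[1:-1]      # 10 equally spaced interior knots
B = plus_basis(x, knots, degree=2)

f = np.sin(15 * x)
beta = np.linalg.lstsq(B, f, rcond=None)[0]   # unpenalized least squares fit
rms = np.sqrt(np.mean((B @ beta - f) ** 2))
print("basis shape:", B.shape, "RMS error:", rms)
```

The truncated power basis is easy to write down but can be poorly conditioned with many knots; a penalty on the knot coefficients, as in P-splines, is the usual remedy.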


REGRESSION SPLINES

• Model

$$E(Y \mid X) = m(X; \beta) := \sum_{j=0}^{J} \beta_j X^j + \sum_{k=1}^{K} \beta_{k+J} (X - \xi_k)_+^J$$

• $X^j$ is replaced by $E(X^j \mid W)$

• $(X - \xi_k)_+^J$ is replaced by $E\{(X - \xi_k)_+^J \mid W\}$

• The key remaining issue: the joint distribution of $X$ and $U$.

∗ CMR used a mixture of normals for $[X]$ and Gibbs sampling to estimate the parameters. (Flexible structural)

– This is an extension to measurement error of an idea of Roeder & Wasserman (JASA, 1997).
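The replacement of each basis function by its conditional expectation can be sketched as follows. To keep the example short it assumes a single normal distribution for $X$ with known parameters and normal $U$, which is simpler than the mixture of normals used in CMR; the function name `conditional_basis` and all argument names are hypothetical. The expectations are computed by Gauss-Hermite quadrature.

```python
import numpy as np

def conditional_basis(w, knots, degree, mu_x, var_x, var_u, n_quad=40):
    """E{basis(X) | W} when X ~ N(mu_x, var_x), W = X + U, U ~ N(0, var_u).

    Assumes a single normal for X (not a mixture) with known parameters.
    """
    w = np.asarray(w, dtype=float)
    knots = np.asarray(knots, dtype=float)
    lam = var_x / (var_x + var_u)                  # reliability ratio
    cond_mean = mu_x + lam * (w - mu_x)            # E(X | W)
    cond_sd = np.sqrt(lam * var_u)                 # SD(X | W)
    t, wt = np.polynomial.hermite.hermgauss(n_quad)
    # Quadrature points for X | W_i, one row per observation
    xq = cond_mean[:, None] + np.sqrt(2.0) * cond_sd * t[None, :]
    wq = wt / np.sqrt(np.pi)                       # normalized weights (sum to 1)
    cols = [(xq ** j) @ wq for j in range(degree + 1)]                   # E(X^j | W)
    cols += [(np.maximum(xq - xi, 0.0) ** degree) @ wq for xi in knots]  # E{(X - xi)_+^J | W}
    return np.column_stack(cols)
```

Under these assumptions, regressing $Y$ on the columns of `conditional_basis(...)` by ordinary least squares gives an estimate of $\beta$ in the structural criterion.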


FULLY BAYESIAN MODEL

What’s New? Answer: Fully Bayesian MCMC method in Berry, Carroll, and Ruppert (2002, JASA) (BCR)

• Uses splines

∗ smoothing or penalized

∗ P-splines in this lecture

• Structural

∗ Xi are iid normal

∗ but seems robust to violations of normality


FULLY BAYESIAN MODEL

• Smoothing parameter is automatic

• Inference adjusts for the data-based smoothing parameter and for measurement error

• Allows global confidence bands


FULLY BAYESIAN MODEL — PARAMETERS

• Regression Model

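$$Y_i = m(x_i; \beta) + \epsilon_i$$

– $m(x_i; \beta)$ is a P-spline

– $\epsilon_i$ iid $N(0, \sigma_\epsilon^2)$

• Measurement Error Model

$$W_{ij} = X_i + U_{ij}, \quad \text{where } U_{ij} \text{ iid } N(0, \sigma_u^2)$$

• Structural Model

$$X_i \text{ iid } N(\mu_x, \sigma_x^2)$$

• Parameters: $\beta, \sigma_e^2, \sigma_u^2, \mu_x, \sigma_x^2$ (here $\sigma_e^2$ and $\sigma_\epsilon^2$ denote the same regression error variance)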


FULLY BAYESIAN MODEL — PARAMETERS

• Priors

– $\beta$ is $N(0, (\gamma K)^{-1})$ where $K$ is known. [$\alpha := \gamma\sigma_e^2$ is the smoothing parameter.]

– $\gamma$ is Gamma($A_\gamma, B_\gamma$)

– $\sigma_e^2$ is Inv-Gamma($A_e, B_e$)

– $\sigma_u^2$ is Inv-Gamma($A_u, B_u$)

– $\mu_x$ is $N(d_x, t_x^2)$

– $\sigma_x^2$ is Inv-Gamma($A_x, B_x$)

• Hyperparameters: $A_e, B_e, A_u, B_u, A_x, B_x, d_x, t_x^2, A_\gamma, B_\gamma$

– all fixed at values making the priors noninformative

∗ E.g., $t_x^2 = 10^6$.


FULLY BAYESIAN MODEL — SMOOTHING

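• Model

$$E(Y \mid X) = m(X; \beta) := \sum_{j=0}^{J} \beta_j X^j + \sum_{k=1}^{K} \beta_{k+J} (X - \xi_k)_+^J$$

• $\beta$ is $N(0, (\gamma K)^{-1})$ where $K$ is known. [$\alpha := \gamma\sigma_e^2$ is the smoothing parameter.]

• Let

$$K = \mathrm{diag}(\Delta, \ldots, \Delta, 1, \ldots, 1)$$

∗ $\Delta$ is “small”, essentially 0

∗ $J + 1$ $\Delta$’s followed by $K$ ones (one per knot; in this count $K$ is the number of knots, not the matrix)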

∗ The prior on the jumps at the knots is that they are iid.

∗ The prior on the polynomial coefficients is “diffuse,” i.e., “non-informative”.
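A small sketch of this penalty structure, assuming a design matrix `B` built from the plus-function basis. The names `penalty_matrix` and `penalized_fit` are hypothetical, and `alpha` stands for whatever multiplier of $K$ plays the role of the smoothing parameter in the penalized criterion.

```python
import numpy as np

def penalty_matrix(degree, n_knots, delta=1e-8):
    """K = diag(delta, ..., delta, 1, ..., 1): degree + 1 deltas, then n_knots ones."""
    return np.diag(np.concatenate([np.full(degree + 1, delta), np.ones(n_knots)]))

def penalized_fit(B, y, K, alpha):
    """Minimize ||y - B beta||^2 + alpha * beta' K beta over beta."""
    return np.linalg.solve(B.T @ B + alpha * K, B.T @ y)
```

Because the first $J + 1$ diagonal entries are essentially zero, the polynomial coefficients are left almost unpenalized, while the knot coefficients are shrunk toward zero as `alpha` grows.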


GIBBS SAMPLING

• Iterate through $\beta, \sigma_e^2, \sigma_u^2, \sigma_x^2, \mu_x, \gamma, X_1, \ldots, X_n$.

• All steps except one are easy, either gamma, inverse-gamma, or normal

– E.g.,

$$[\beta \mid \text{other parameters}, Y, W] \sim \text{Normal}$$

$$\text{Mean} = (X^T X + \gamma K)^{-1} X^T Y$$

$$\text{Cov} = \sigma_e^2 (X^T X + \gamma K)^{-1}.$$

∗ Here $X$ is one of the “other parameters”

∗ Essentially we’re fitting a spline to the imputed $X$’s and the observed $Y$’s
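The normal draw for $\beta$ might be coded along these lines. This is a sketch using the mean and covariance exactly as written on the slide; `B` denotes the spline design matrix evaluated at the current imputed $X$'s (the slide's $X$), and the function name `draw_beta` is hypothetical.

```python
import numpy as np

def draw_beta(B, y, K, gamma, sigma2_e, rng):
    """One draw from N(mean, cov) with
    mean = (B'B + gamma K)^{-1} B'y and cov = sigma2_e (B'B + gamma K)^{-1}."""
    A = B.T @ B + gamma * K
    mean = np.linalg.solve(A, B.T @ y)
    L = np.linalg.cholesky(A)                 # A = L L'
    z = rng.standard_normal(B.shape[1])
    # Solving L' v = z gives v with covariance (L L')^{-1} = A^{-1}
    return mean + np.sqrt(sigma2_e) * np.linalg.solve(L.T, z)
```

Using the Cholesky factor avoids forming the inverse explicitly; inside a Gibbs sampler `B` would be rebuilt at every iteration from the newly imputed $X$'s.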


GIBBS SAMPLING

∗ Estimate of $\beta$, call it $\hat\beta$, is

$$(X^T X + \gamma K)^{-1} X^T Y$$

averaged over $\gamma$ and $X$.


GIBBS SAMPLING

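• The exception to the sampling being quick and easy is that a Metropolis-Hastings step is needed for $X_1, \ldots, X_n$.

$$[X_i \mid \mu_x, \sigma_x^2, \beta, \sigma_e^2, \sigma_u^2, Y, W] \propto \exp\left[ -\frac{1}{2\sigma_u^2} \sum_{j=1}^{m_i} (W_{ij} - X_i)^2 - \frac{1}{2\sigma_\epsilon^2} \{Y_i - m(X_i; \beta)\}^2 - \frac{1}{2\sigma_x^2} (X_i - \mu_x)^2 \right].$$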
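One way to carry out this update is a random-walk Metropolis step, sketched below. The symmetric normal proposal and its scale `step` are assumptions made for illustration; the slides do not say which proposal BCR use. The function names are hypothetical, and `m` is any callable returning $m(x; \beta)$ at the current $\beta$.

```python
import numpy as np

def log_cond_x(x, w_i, y_i, m, mu_x, s2_x, s2_u, s2_e):
    """Log of the unnormalized full conditional of X_i."""
    return (-np.sum((w_i - x) ** 2) / (2.0 * s2_u)
            - (y_i - m(x)) ** 2 / (2.0 * s2_e)
            - (x - mu_x) ** 2 / (2.0 * s2_x))

def mh_step(x_cur, w_i, y_i, m, mu_x, s2_x, s2_u, s2_e, step, rng):
    """One random-walk Metropolis update of X_i (symmetric normal proposal)."""
    x_prop = x_cur + step * rng.standard_normal()
    log_ratio = (log_cond_x(x_prop, w_i, y_i, m, mu_x, s2_x, s2_u, s2_e)
                 - log_cond_x(x_cur, w_i, y_i, m, mu_x, s2_x, s2_u, s2_e))
    # Accept with probability min(1, ratio); otherwise keep the current value
    if np.log(rng.uniform()) < log_ratio:
        return x_prop
    return x_cur
```

Because the proposal is symmetric, the acceptance ratio involves only the target density, so the normalizing constant of the full conditional is never needed.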


BAYESIAN INFERENCE

• Let $X$ be the spline basis functions evaluated on a fine grid over some interval, $[a, b]$.

• $X\beta$ is the curve on $[a, b]$.

• $X\hat\beta$ is the estimated curve.

• Let $K_\alpha$ be the $(1 - \alpha)$ MCMC sample quantile of

$$\max_{\text{grid}} \left\{ \frac{X(\beta - \hat\beta)}{SD(X\beta)} \right\}.$$

• Then,

$$X\hat\beta \pm K_\alpha\, SD(X\beta)$$

is a $100(1 - \alpha)\%$ simultaneous confidence band for the curve on $[a, b]$.
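The band construction can be sketched from stored MCMC output as follows. The input `curves` is assumed to hold one posterior draw of $X\beta$ on the grid per row; the function name is hypothetical. The sketch takes the absolute value of the standardized deviation before maximizing, which is an assumption not shown on the slide but is what a symmetric $\pm$ band requires.

```python
import numpy as np

def simultaneous_band(curves, level=0.95):
    """Simultaneous band from MCMC draws of the curve.

    curves: array of shape (n_draws, n_grid).
    Returns (lower, upper, k) where k is the sample quantile of the
    maximum absolute standardized deviation over the grid.
    """
    center = curves.mean(axis=0)                  # estimated curve
    sd = curves.std(axis=0, ddof=1)               # pointwise posterior SD
    max_dev = np.max(np.abs(curves - center) / sd, axis=1)   # one value per draw
    k = np.quantile(max_dev, level)
    return center - k * sd, center + k * sd, k
```

By construction, roughly a fraction `level` of the sampled curves lie entirely inside the band, which is what makes it simultaneous rather than pointwise. The same function applies to derivative curves if `curves` holds draws of $X'\beta$.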


BAYESIAN INFERENCE FOR DERIVATIVES

• Let $X'$ be derivatives of the spline basis functions evaluated on a fine grid over $[a, b]$.

• $X'\beta$ is the curve’s derivative on $[a, b]$.

• $X'\hat\beta$ is the estimated derivative.

• Let $K'_\alpha$ be the $(1 - \alpha)$ MCMC sample quantile of

$$\max_{\text{grid}} \left\{ \frac{X'(\beta - \hat\beta)}{SD(X'\beta)} \right\}.$$

• Then,

$$X'\hat\beta \pm K'_\alpha\, SD(X'\beta)$$

is a $100(1 - \alpha)\%$ simultaneous confidence band for the derivative on $[a, b]$.


SIMULATIONS

Six cases were considered. $n_i \equiv 2$ in each case.


SIMULATIONS

• [Case 1]

The regression function is

$$m(x) = \frac{\sin(\pi x / 2)}{1 + 2x^2\{\mathrm{sign}(x) + 1\}},$$

with $n = 100$, $\sigma_\epsilon^2 = 0.3^2$, $\sigma_u^2 = 0.8^2$, $\mu_x = 0$ and $\sigma_x^2 = 1$.
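For readers who want to reproduce the setting, here is a sketch of a Case 1 data generator. It reads the variances as $\sigma_\epsilon^2 = 0.3^2$ and $\sigma_u^2 = 0.8^2$ and uses two replicates per subject; the function names and the seed are hypothetical.

```python
import numpy as np

def m_case1(x):
    """Case 1 regression function: sin(pi x / 2) / [1 + 2 x^2 {sign(x) + 1}]."""
    return np.sin(np.pi * x / 2.0) / (1.0 + 2.0 * x ** 2 * (np.sign(x) + 1.0))

def simulate_case1(n=100, n_rep=2, seed=0):
    """Return true X, replicate measurements W (n x n_rep), and response Y."""
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, 1.0, size=n)                          # mu_x = 0, sigma_x = 1
    y = m_case1(x) + rng.normal(0.0, 0.3, size=n)             # sigma_eps = 0.3
    w = x[:, None] + rng.normal(0.0, 0.8, size=(n, n_rep))    # sigma_u = 0.8
    return x, w, y
```

Changing `n` to 200 gives the Case 2 design as described below.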


SIMULATIONS

• [Case 2] Same as Case 1 except $n = 200$.

• [Case 3] A modification of Case 1 above except that $n = 500$.

• [Case 4]

Case 1 of CMR so that

$$m(x) = 1000\, x_+^3 (1 - x)_+^3,$$

$x_+ = x I(x > 0)$, with $n = 200$, $\sigma_\epsilon^2 = 0.0015^2$, $\sigma_u^2 = (3/7)\sigma_x^2$, $\mu_x = 0.5$ and $\sigma_x^2 = 0.25^2$.


SIMULATIONS

• [Case 5]

A modification of Case 4 of CMR so that

$$m(x) = 10 \sin(4\pi x),$$

with $n = 500$, $\sigma_\epsilon^2 = 0.05^2$, $\sigma_u^2 = 0.141^2$, $\mu_x = 0.5$ and $\sigma_x^2 = 0.25^2$.


• [Case 6]

The same as Case 1 above except that $X$ is a normalized chi-square(4) random variable. (Tests robustness against violation of the structural assumptions.)


SIMULATIONS

Mean Squared Bias × 10^2

Method                      Case 1  Case 2  Case 3  Case 4  Case 5  Case 6
Naive                         5.59    4.92    5.21   1,108   3,733    4.83
Bayes                         0.78    0.38    1.04    17.4     468    1.74
Flex. Structural, 5 knots     1.38    0.62    0.46     3.7     838    1.47
Flex. Structural, 15 knots    1.44    0.60    0.66     3.3     226    1.75

Mean Squared Error × 10^2

Method                      Case 1  Case 2  Case 3  Case 4  Case 5  Case 6
Naive                         6.91    5.57    5.38   1,155   3,793    5.77
Bayes                         2.84    1.56    1.47     195   1,031    2.69
Flex. Structural, 5 knots     8.17    3.82    1.73     217   2,032    7.27
Flex. Structural, 15 knots    9.90    5.40    1.85     237     799    6.94

Results based on 200 Monte Carlo simulations for each case. SIMEX was not included in the table because it was not among the best estimators.


EXAMPLE — SIMULATED

• Y = sin(2X) + ε

• X is N(1, 1)

• σu = 1

• σe = 0.15

• n = 201

• ni = 2 for all i

• 15 knot quadratic P-splines

• 2,000 iterations of Gibbs. First 667 deleted as burn-in.


[Figure: four panels. “Gold Standard”: y versus x. “Data with measurement error”: y versus wbar. Third panel: wbar versus x. Fourth panel: mhat versus x, with legend entries Fitted curve, True curve, Gold Std.]


[Figure: nine trace plots against iteration (0 to 2000): beta(1), yhat(1), yhat(51), yhat(101), x(1), sigmau, log(gamma), log(σe), mux.]

Results of Gibbs Sampling. Every twentieth iteration.

Note: X(1) = −1.45 and W (1) = −0.8. Also, log(σe) = −1.9.


EXAMPLE — SIMULATED

Why does the Bayes approach work so well? Here’s my explanation:

Bayes uses all possible information to estimate X and, especially, m(X).

• ‖m(X) − E{m(X)|W, Y, other param.}‖ ≈ ‖m(X) − ave{m(X)}‖ = 2.47

• ‖m(X) − m(E{X|W, Y, other param.})‖ ≈ ‖m(X) − m(ave{X})‖ = 4.67

• ‖m(X)−m(E(X|W ))‖ = 10.25

• ‖m(X)−m(W )‖ = 12.36
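The last two norms do not need MCMC and can be imitated in a few lines. The sketch below follows the simulated example's settings (m(x) = sin(2x), X ~ N(1, 1), σu = 1, two replicates) but treats the structural parameters as known and uses its own random seed, so its numbers will not match the slide's; the function name is hypothetical.

```python
import numpy as np

def compare_norms(n=201, n_rep=2, seed=1):
    """Return (||m(X) - m(E(X | Wbar))||, ||m(X) - m(Wbar)||) for simulated data."""
    rng = np.random.default_rng(seed)
    mu_x, s2_x, s2_u = 1.0, 1.0, 1.0
    m = lambda x: np.sin(2.0 * x)
    x = rng.normal(mu_x, np.sqrt(s2_x), size=n)
    u_bar = rng.normal(0.0, np.sqrt(s2_u), size=(n, n_rep)).mean(axis=1)
    w_bar = x + u_bar                             # mean of the replicates
    lam = s2_x / (s2_x + s2_u / n_rep)            # reliability of the replicate mean
    x_cal = mu_x + lam * (w_bar - mu_x)           # E(X | Wbar) under normality
    return np.linalg.norm(m(x) - m(x_cal)), np.linalg.norm(m(x) - m(w_bar))

norm_cal, norm_naive = compare_norms()
```

The first two norms on the slide additionally condition on Y, which requires the posterior draws of X from the Gibbs sampler.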


EXAMPLE — CLINICAL TRIAL

• Study of a psychiatric medication.

• Treatment and control group.

• Evaluation at baseline ($W$) and at end of study ($Y$).

– smaller values → more severe disease

– scale is a combination of self-report and clinical interview so there is considerable measurement error

– it is believed that $\sigma_u^2 \approx 0.35$.

• We are interested in $\Delta(X) := m(X) - X = E(Y - W \mid X)$.

• Preliminary Wilcoxon test found a highly significant treatment effect.

• Question: How does the treatment effect depend upon the baseline value?


[Figure: Change versus True Baseline Score, with separate curves for Treatment and Control.]


[Figure: Treatment Effect versus True Baseline Score.]


MAXIMUM LIKELIHOOD VIA MCEM

Ganguli, Staudenmayer, and Wand (2005, to appear):

• use the same model as Berry, Carroll, and Ruppert (2002)

• compute MLEs using van Dyk’s (2002) nested Monte Carlo EM algorithm

• their estimator works well

∗ this suggests that the success of the Berry, Carroll, and Ruppert method is due to using the likelihood rather than to the information in the prior

• Ganguli, Staudenmayer, and Wand extend their method to additive models


EXAMPLE — A SENSITIVITY ANALYSIS

Original study from Zanobetti, Wand, Schwartz, and Ryan (2000).

This analysis is from Ganguli, Staudenmayer, and Wand (2005, to appear); also see Ruppert, Wand, and Carroll (2003, Semiparametric Regression)

• Outcome = log(natural mortality)

• Exposure

∗ TSP = total suspended particles

• Confounders

∗ DAY = days since beginning of study

∗ Mean daily temperature

∗ Relative humidity


EXAMPLE — A SENSITIVITY ANALYSIS

$$\text{reliability ratio} = \frac{\mathrm{var}\{\log(\mathrm{TSP}_i)\}}{\mathrm{var}\{\log(\mathrm{TSP}_i)\} + \sigma_u^2}.$$

Problem: No replicates so the measurement error variance $\sigma_u^2$ cannot be estimated.
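A sensitivity analysis then fixes the reliability ratio at a few plausible values and backs out the implied $\sigma_u^2$. The sketch below assumes the classical additive model, so that the variance of the observed exposure equals the true-exposure variance plus $\sigma_u^2$; the function name and the value of `var_w` are hypothetical, not taken from the study.

```python
def sigma_u2_from_reliability(var_w, rel_ratio):
    """Measurement error variance implied by an assumed reliability ratio.

    Assumes var(W) = var(X) + sigma_u^2 and reliability = var(X) / var(W),
    so sigma_u^2 = (1 - reliability) * var(W).
    """
    if not 0.0 < rel_ratio <= 1.0:
        raise ValueError("reliability ratio must be in (0, 1]")
    return (1.0 - rel_ratio) * var_w

# Hypothetical observed variance of log(TSP); not a value from the study.
var_w = 0.25
for rho in (1.0, 0.9, 0.8, 0.7):
    print(rho, sigma_u2_from_reliability(var_w, rho))
```

A reliability ratio of 1 corresponds to no measurement error, and each smaller value gives a larger assumed $\sigma_u^2$ at which the model is refit.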


EXAMPLE — A SENSITIVITY ANALYSIS

[Figure: four panels of additive model terms. Term for log(TSP) versus Log(TSP), with curves for No Measurement Error and rel. ratio = 0.9, 0.8, 0.7; term for Day versus Day; term for Temperature versus Mean Daily Temperature; term for Humidity versus Relative Humidity.]


DISCUSSION

• With the work of CMR and BCR we now have reasonably efficient estimators for nonparametric regression with measurement error.

– SIMEX (LPR and splines): in CMR

– (Flexible) Structural splines: in CMR

– Fully Bayesian (hardcore structural): in BCR


DISCUSSION

• With BCR we have a methodology that

– automatically selects the amount of smoothing

– estimates the unknown $X$’s

– allows inference that takes account of the effects of smoothing parameter selection and measurement error

• Most efficient methods appear to be structural, though SIMEX may be competitive

– hardcore structural methods seem reasonably robust


END OF LECTURE 6

THANK YOU FOR YOUR ATTENTION!!!