NONPARAMETRIC REGRESSION WITH MEASUREMENT ERROR: … · SIMEX CMR applied the SIMEX to nonparametric regression. CMR have asymptotic theory in the local polynomial re-gression (LPR)

NONPARAMETRIC REGRESSIONWITH

MEASUREMENT ERROR: SOME RECENT

PROGRESS

David Ruppert

Cornell University

www.orie.cornell.edu/˜davidr

(These transparencies, preprints, and references available —

link to “Recent Talks” and “Recent Papers”)

Work done jointly with

Scott Berry, Texas A & M

R. J. Carroll, Texas A & M

Jeff Maca, Novartis

John Staudenmayer, Cornell

1

OUTLINE

� Statement of problem — nonparametric regression with

measurement error

� Review of the currently available estimators

– deconvolution kernels (Fan & Truong, 1993, Annals)

– SIMEX (Carroll, Maca, Ruppert, 1999, Biometrika)

– structural splines (Carroll, Maca, Ruppert, 1999)

� New Bayesian spline approach (Berry, Carroll, and Rup-

pert, 2000)

� Simulation results

� Examples

– Simulated data

– Clinical trial of a psychiatric medication

2

THE PROBLEM OF MEASUREMENT ERROR —

ILLUSTRA TION

−4 −2 0 2 4−1

−0.5

0

0.5

1

1.5

x

y

Gold Standard

data true curveSpline

−4 −2 0 2 4−1

−0.5

0

0.5

1

1.5

w

y

Data with measurement error

3

THE PROBLEM OF MEASUREMENT ERROR —

ILLUSTRA TION

−4 −2 0 2 4−1.5

−1

−0.5

0

0.5

1

1.5

x

y

Gold Standard

data true curveSpline

−4 −2 0 2 4−1.5

−1

−0.5

0

0.5

1

1.5

w

y


4

THE PROBLEM OF MEASUREMENT ERROR

� We are interested in nonparametric regression when the

predictor�

cannot be observed exactly.

– The regression model is� � � �� where

�is only known to be smooth

– Observe �and � � � � ��

where�� var(

��) = ��

normally distributed

– The normality is not important.

� Measurement error variance � �� is estimated from internal

replicate data. (Observe � �� , � � ! #"$"$"# &% � .)

5

THE PROBLEM OF MEASUREMENT ERROR, CONT.

� Measurement error occurs in a wide variety of problems.

– Measuring nutrient intake

– Measuring airborne lead exposure

– Measuring blood pressure

– C ')( dating� The effects of measurement error are:

– biased estimates of the regression curve

– increase in the perceived variability about the regres-

sion line.� Other than the work of Fan and Truong (1993, Annals),

there had been little done on nonparametric regression

with measurement error until

– Carroll, Maca, and Ruppert (1999, Biometrika) (CMR )

and

– Berry, Carroll, Ruppert (2000, submitted) (BCR) — avail-

able at www.orie.cornell.edu/˜davidr.

6

REVIEW OF CURRENT ESTIMATORS

� Globally consistent nonparametric regression by deconvo-

lution kernels (Fan and Truong, 1993, Annals)

– does not work so well� Fan & Truong show very poor asymptotic rates of

convergence� we have simulations showing poor finite-sample be-

havior

– no methodology for bandwidth selection or inference� Standard measurement error method: SIMEX

– functional — no assumptions on * �,+– very general — can be applied to nearly any measure-

ment error problem, parametric or nonparametric� Structural Spline

– Regression splines for basic regression model

– Mixtures of normals for covariate density model

– Emphasis is on flexible parametric modeling, not non-

parametric modeling. (I believe there is little or no dif-

ference in practice.)

7

SIMEX

� The SIMEX method is due to Cook & Stefanski (1995,

JASA).

– The theory is in Carroll, et al. (1996, JASA)

– Also see Carroll, Ruppert, and Stefanski (1995, Mea-

surement Error in Nonlinear Models)

� SIMEX has been previously applied to parametric prob-

lems.

– makes no assumptions about the true�

’s. (Functional)

– results in estimators which are approximately consistent,

i.e., consistent at least to order - � �/.� � .� Here is the method, defined via a graph.

8

SIMEX, ILLUSTRA TED

Measurement Error Variance

Coe

ffici

ent

0.0 0.5 1.0 1.5 2.0 2.5 3.0

0.0

0.2

0.4

0.6

0.8

1.0

1.2

Naive Estimate

SIMEX Estimate

Illustration of SIMEX

9

SIMEX

� CMR applied the SIMEX to nonparametric regression.� CMR have asymptotic theory in the local polynomial re-

gression (LPR) context.

– The estimators have the usual rates of convergence.

– They are approximately consistent, to order - � �/.� � .� An asymptotic theory with rates seems very difficult for

splines

– but, simulations in CMR indicate that SIMEX/splines

works a little better than SIMEX/kernel

– problem seems due to undersmoothing

� Staudenmayer (2000, Cornell PhD thesis) is looking at band-

width selection for SIMEX/LPR.

– With better bandwidth selection, SIMEX/LPR is com-

petitive with other methods.

10

STRUCTURAL MODELING

� We can estimate � �� from our data — use ordinary

smoothing.� But, this is related to the desired function� �0� �

by

� �� 1� � 2 � �� 3� � 4 � 56� �078�:9;��7<� � �>=?71"� If we had a convenient form for

� �� , say

� �0� @BAC�, and if

we knew * � � � +, then we could estimate

� �� @BAC�by mini-

mizing over the dataDE�GFH'JI � �LK 5 � �071@BAM�N9O�07<� � � �P=?7RQ � "– similar to regression calibration which uses the approx-

imation5 � ��71@BAM�:9;��7<� � � �P=S7UT � V 5 7W9;��7<� � � �P=S7X@BAZY� We need two things to make this work:

– convenient flexible form for� �071@BAM�

(must be parametric

but flexible enough to be nonparametric for all intents

and purposes)

– convenient flexible distribution for�

.

11

WHAT IF WE HAVE A DISTRIB UTION FOR�

?

� Suppose however that we knew * � +, and that * ��,+

is nor-

mal.

– We in particular know * � � � +!

– Could we get anywhere?

� Consider regression splines of order [ with \ knots:

� �� ;� � ��71@^]��`_a� bE�:Fdc ] � � � � eE�NFf' ] �:g b �� K h � � bg� The terms h ' $"i"j"i h e are knots.

12

REGRESSIONSPLINES

� Recall from the previous slide:

� �� ;� � �� @BAC�`_a� bE�:Fdc ] � � � � eE�NFf' ] �:g b �� K h � � bg� If we know * � k�l+

, and therefore * � � � +, then in the ob-

served data we have

� �0� � � �;� � �� @BAC�3� � �� bE�:FJc ] � � �0� � � � �m� eE�:FH' ] �:g b �n2 �� K h � � bg � � 4

– This is just a linear model in the]

’s !!!!

– There are many methods to fit such splines� The key remaining issue: the joint distribution of�

and�

.

– CMR used a mixtures of normals for * �,+and Gibbs

sampling to estimate the parameters.� This is an extension to measurement error of an idea

of Roeder & Wasserman (JASA, 1997).

13

FULLY BAYESIAN MODEL

What’sNew?

Answer: Fully Bayesian MCMC method� In BCR� Uses splines

– smoothing or penalized

– P-splines in this talk

� Structural

–� � are iid normal

– but seems robust to violations of normality

� Smoothing parameter is automatic� Inference adjusts for the data-based smoothing parameter

and for measurement error

– all automatic

14

FULLY BAYESIAN MODEL — PARAMETERS

� RegressionModel� � � � �07 � @BAC�8� �–� ��7 � @BAM� is a P-spline

– � iid o �p�P � �q �� MeasurementErr or Model� �� where

� �� iid o �0�r � �� Structural Model� � iid o �0stu � �t �� Parameters:Av � �w � �� &stu � �t� Priors

–A

is o V �r $�yx{z �N| ' Y wherez

is known. [ } _a� x ��w is thesmoothing parameter.]

–x

is Gamma�0~��

– � �w is Inv-Gamma�0~ w �� w �

– �� is Inv-Gamma�0~ � ��

–st

is o �p=�t� �� t �– ��t is Inv-Gamma

�0~ t �� t �� Hyperparameters:

~ w �� w �~ � �� &~�t� ��lt� �=�t� �� t &~�� l�– all fixed at values making the priors noninformative� E.g.,

� �t � �� . .15

GIBBS SAMPLING

� Iterate throughAv � �w � �� t �st� �x� &� ' $"$"#"$ �� D .� Generate each one conditional on the current values of the

others.� All steps except one are easy, either gamma, inverse-gamma,

or normal

– E.g.,

* A��otherparameters

�� +8� �`�!�:��#��k�#�� 0� �S� � x�z � | ' � �r�

� �� w �0� �?� � x�z � | ' "� Here

�is one of the “other parameters”� Essentially we’re fitting a spline to the imputed

�’s

and the observed�

’s� Estimate ofA

is�0� �S� � x�z � | ' � �d�averaged over

xand

�.

16

GIBBS SAMPLING

� The exception to the sampling being quick and easy is that

a Metropolis-Hastings step is needed for� ' $"#"$"3 &� D .

* � � ��st� � �t NA � �w � �� ¡ +¢ ��£S¤OV K ¥ � ��

¦/§E�NFf' � � �� K � � � �K ¥ � �q 2 � � K � �� @BA¨� 4 � K ¥ � �t �� K s t � � Y?"

17

SIMULA TIONS

The six cases were considered.% �8© ¥

in each case.

Case1 The regression function is� ��78�;� ª:« �¬�0/7/® ¥ � ¯� ¥ 7 � 2 ª:«±° �/��78�8� 4 "with

% � ��!�, ��q � �r"³² � , �� P"µ´ � , s t � �

and ��t � .

−2 −1.5 −1 −0.5 0 0.5 1 1.5 2−0.4

−0.2

0

0.2

0.4

0.6

0.8

1

Case2 Same as Case 1 except% � ¥ �!�

.

Case3 A modification of Case 1 above except that% � ¶·�!�

.

18

Case4 Case 1 of CMR so that� ��78�;� ��!�¸�·7H¹g �> K 78�º¹g 7 g � 78»H��7 ¼ �½�, with

% � ¥ �!�, ��q � �r"³�¸�P �¶ � , �� p²�®u¾!� ��t ,stM� �P"¿¶

and � �t � �P" ¥ ¶ � .

−0.5 0 0.5 1 1.50

0.002

0.004

0.006

0.008

0.01

0.012

0.014

0.016

Case5 A modification of Case 4 of CMR so that� �078�� ªN« ��À½/78�Á with

% � ¶��¸�, ��q � �P"µ�½¶ � , �� r"j �Àd � , s t � �P"¿¶

and ��t ��r" ¥ ¶ � .

−0.5 0 0.5 1 1.5−10

−8

−6

−4

−2

0

2

4

6

8

10

Case6 The same as Case 1 above except that�

is a nor-

malized chi-square(4) random variable. (Tests robustness

against violation of the structural assumptions.)

19

Mean Squared Bias Â�Ã#Ä¸ÅMethod Case1 Case2 Case3 Case4 Case5 Case6

Naive 5.59 4.92 5.21 1,108 3,733 4.83

Bayes 0.78 0.38 1.04 17.4 468 1.74

Structural,5 knots 1.38 0.62 0.46 3.7 838 1.47

Structural,15knots 1.44 0.60 0.66 3.3 226 1.75

Mean Squared Err or ÂÆÃ3Ä½ÅMethod Case1 Case2 Case3 Case4 Case5 Case6

Naive 6.91 5.57 5.38 1,155 3,793 5.77

Bayes 2.84 1.56 1.47 195 1,031 2.69

Structural,5 knots 8.17 3.82 1.73 217 2,032 7.27

Structural,15knots 9.90 5.40 1.85 237 799 6.94

Table1: Resultsbasedon 200MonteCarlosimulationsfor eachcase.SIMEX was

not includedin thetable— it wasnotamongthebestestimators.

20

EXAMPLE — SIMULA TED

� � � ª:« �/� ¥ � �m� � �

is o �> ! $ �� w � �P"i �¶� % � ¥ �P � % � � ¥

for all Ç� 15 knot quadratic P-splines� 2,000 iterations of Gibbs. First 667 deleted (burn-in pe-

riod).

21

−1 0 1 2 3−1.5

−1

−0.5

0

0.5

1

1.5

x

y

Gold Standard

−4 −2 0 2 4 6−1.5

−1

−0.5

0

0.5

1

1.5

wy


−2 0 2 4−3

−2

−1

0

1

2

3

4

5

x

w

−1 0 1 2 3−1.5

−1

−0.5

0

0.5

1

1.5

mha

t

x

Fitted curveTrue curve Gold Std.

Figure1: SimulatedData.

22

0 1000 2000−4

−3

−2

−1

0

1

2

3beta(1)

0 1000 2000−0.6

−0.5

−0.4

−0.3

−0.2

−0.1

0

0.1

0.2yhat(1)

0 1000 2000−1

−0.5

0

0.5

1

1.5yhat(51)

0 1000 20000.3

0.4

0.5

0.6

0.7

0.8

0.9

1

1.1yhat(101)

0 1000 2000−1.5

−1

−0.5

0

0.5x(1)

0 1000 20000.85

0.9

0.95

1

1.05

1.1

1.15

1.2

sigmau

0 1000 2000−3

−2

−1

0

1

2

3log(gamma)

0 1000 2000−2.5

−2

−1.5

−1

−0.5

log(σe)

0 1000 20000.8

0.9

1

1.1

1.2

1.3

1.4

1.5

mux

Figure2: SimulatedData. Resultsof GibbsSampling. Every twentiethiteration

plotted.Note: È�ÉËÊ�ÌLÍ ÎÏÊ�ÐaÑ and Ò ÉËÊ�ÌLÍ Î1Ó�ÐÕÔ . Also, log É×Ö¸ØËÌ = ÎÏÊ�ÐÚÙ .

EXAMPLE — SIMULA TED

What does the Bayes approach work so well? Here’s my ex-

planation:

Bayes uses all possible information to estimate�

and,

especially,� ��

.

�ÜÛÛÛÛ � �� K ��2 � �� #�Ý¡ �� otherparam.4 ÛÛÛÛT ÛÛÛÛ � �� K �Á�u� 2 � ��Þ� � 4 ÛÛÛÛ � ¥ "aÀ�¾

� ÛÛÛÛ � �� K � � �n2 � �Ý¡ �� otherparam.4 � ÛÛÛÛT ÛÛÛÛ � �� K � �)��u� 2 Þ� 4 � ÛÛÛÛ � Àd"µß½¾

� ÛÛÛÛ � �� K � � � �0� � � �^� ÛÛÛÛ � ��P" ¥ ¶� ÛÛÛÛ � �� K � � � � ÛÛÛÛ � ¥ "µ²!ß

24

EXAMPLE — CLINICAL TRIAL

� Study of a psychiatric medication.� Treatment and control group.� Evaluation at baseline ( � ) and at end of study (�

).

– smaller values à more severe disease

– scale is a combination of self-report and clinical inter-

view so there is considerable measurement error

– it is believe that �� T �P"µ²½¶.

� We are interested in á �� `_³� � �� K � � � �� K � �� .� Preliminary Wilcoxon test found a highly significant treat-

ment effect.� Question: How does the treatment effect depend upon the

baseline value?

25

True Baseline Score

Cha

nge

-1 0 1 2

0.0

0.5

1.0

1.5

2.0

2.5

3.0

TreatmentControl

Figure3: Estimateof thefunction âãÉ×ä�Ì�Í å Éjä�ÌLÎ�ä for thecontrolgroupandthe

treatmentgroupin theexample.(Note: Positivechangeis animprovement.)

26

True Baseline Score

Trea

tmen

t Effe

ct

-1 0 1 2

-3

-2

-1

0

1

2

3

Figure 4: Estimate(solid) of the differenceof the function â¨Éjä�Ì Í å Éjä�Ì1Î äbetweenthe tr eatment group and the control group in the example(control-

treatment)with 90%pointwisecredibleintervals(dashed).

27

EXAMPLE - ')( C DATING AND SEALEVEL

� Core samples taken from four salt marshes in Maine� � �true age (or true C ')( age?) of sample� Only one � measured per core sample, but there is a

standard deviation also report.� � �sea level as determined by micro fossils in sample

– expressed as a deviation from present

� Preliminary data analysis suggests:

– Nonlinear� K �

relationship

– Site effects, which might be modeled as linear:� � � �� 8� ¹E�:FH' ] � � � »f� site� � �� "

� Methodology can be applied w/o much change.

28


0 1 2 3 4 5−8

−7

−6

−5

−4

−3

−2

−1

0

C14 age (1000 years)

Sea

Lev

el c

hang

e fr

om p

rese

nt

Wells site

0 1 2 3 4 5−8

−7

−6

−5

−4

−3

−2

−1

0


Sea

Lev

el c

hang

e fr

om p

rese

nt

Phippsburg site

0 1 2 3 4 5−8

−7

−6

−5

−4

−3

−2

−1

0


Sea

Lev

el c

hang

e fr

om p

rese

nt

Gouldsboro site

0 1 2 3 4 5−8

−7

−6

−5

−4

−3

−2

−1

0


Sea

Lev

el c

hang

e fr

om p

rese

nt

Machiasport site

29


0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5−8

−7

−6

−5

−4

−3

−2

−1

0Combined Sites


Sea

Lev

el c

hang

e fr

om p

rese

nt

Naive fit partial residuals

30

DISCUSSION

æ With the work of CMR and BCR we now have reasonably

efficient estimators for nonparametric regression with mea-

surement error.

– SIMEX (LPR and splines) — in CMR

– (Flexible) Structural splines — in CMR

– Fully Bayesian (hardcore structural) — in BCR

æ With BCR we have a methodology that

– automatically selects the amount of smoothing

– estimates the unknown ç ’s

– allows inference that takes account of the effects of

smoothing parameter selection and measurement error

æ Most efficient methods appear to be structural, though SIMEX

may be competitive

– hardcore structural methods seem reasonably robust

31

Documents

NONPARAMETRIC REGRESSION WITH MEASUREMENT ERROR: … · SIMEX CMR applied the SIMEX to nonparametric regression. CMR have asymptotic theory in the local polynomial re-gression (LPR)