STATISTICS IN MEDICINE
Statist. Med. (2008)
Published online in Wiley InterScience (www.interscience.wiley.com) DOI: 10.1002/sim.3343

Hypothesis testing in an errors-in-variables model with heteroscedastic measurement errors

Mário de Castro1,∗,†, Manuel Galea2 and Heleno Bolfarine3

1 Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Caixa Postal 668, 13560-970 São Carlos-SP, Brazil
2 Universidad de Valparaíso, Valparaíso, Chile
3 Universidade de São Paulo, Instituto de Matemática e Estatística, São Paulo-SP, Brazil

SUMMARY

In many epidemiological studies it is common to resort to regression models relating incidence of a disease and its risk factors. The main goal of this paper is to consider inference on such models with error-prone observations and variances of the measurement errors changing across observations. We suppose that the observations follow a bivariate normal distribution and the measurement errors are normally distributed. Aggregate data allow the estimation of the error variances. Maximum likelihood estimates are computed numerically via the EM algorithm. Consistent estimation of the asymptotic variance of the maximum likelihood estimators is also discussed. Test statistics are proposed for testing hypotheses of interest. Further, we implement a simple graphical device that enables an assessment of the model's goodness of fit. Results of simulations concerning the properties of the test statistics are reported. The approach is illustrated with data from the WHO MONICA Project on cardiovascular disease. Copyright © 2008 John Wiley & Sons, Ltd.

KEY WORDS: errors-in-variables models; equation-error models; maximum likelihood; hypothesis testing; goodness of fit

∗Correspondence to: Mário de Castro, Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, Caixa Postal 668, 13560-970 São Carlos-SP, Brazil.
†E-mail: [email protected]

Contract/grant sponsor: FONDECYT (Fondo Nacional de Desarrollo Científico y Tecnológico, Chile); contract/grant number: Proyecto 1070919

Received 1 September 2007; Accepted 23 April 2008

1. INTRODUCTION

Errors-in-variables models (also known as measurement error models) extend the usual regression models toward a more realistic representation of the covariate. Observed values are interpreted as an error-prone proxy of the unobservable values of the covariate. A broad coverage of topics in the subject matter is found in some books [1-4]. As early as 1989 Statistics in Medicine issued a special number dedicated to errors-in-variables models. Since then many applications have flourished in the medical literature, from either a classical or a Bayesian perspective (e.g. [5-8]).

As in Reference [5], our motivation arises from epidemiological studies with aggregate data. These data allow the estimation of the error variances (changing across observations), which are then considered as known. In fact, Kulathinal et al. [5] and Cheng and Riu [9] investigate an errors-in-variables model with varying variances, but they concentrate solely on estimation. Our endeavor covers estimation as well as hypothesis testing.

Our paper's contribution to the literature is twofold. First, test statistics with an asymptotic chi-square distribution are proposed for testing hypotheses of interest, which guarantees correct asymptotic significance levels. Second, since goodness of fit receives scarce attention in this setting, we also deal with the adequacy of the model, contributing a simple graphical device for this purpose. Our results are empirically validated through simulations and applied to data sets from the WHO MONICA Project on cardiovascular disease [5, 10].

2. MODELS AND INFERENCE

Let n be the sample size; $y_i$ the true (unobserved) response in unit i; $x_i$ the true covariate value for unit i; $X_i$ the observed value of the covariate in unit i; and $Y_i$ the observed response in unit i. Relating these variables we postulate the model

y_i = \beta_0 + \beta_1 x_i + q_i   (1)

X_i = x_i + \delta_{xi}   (2)

Y_i = y_i + \delta_{yi}   (3)

noting that in (1) the relationship between the true variables is not perfect but contaminated with an error in the equation ($q_i$), whereas $\delta_{xi}$ and $\delta_{yi}$ amount to (additive) measurement errors. This setup is called the equation-error model [1, 2], which differs from the no-equation-error model, since in the latter the error $q_i$ vanishes, $i = 1, \ldots, n$.

Letting $e_i = (\delta_{xi}, \delta_{yi})^\top$, as in Reference [5], the errors and the unobserved covariate have normal distributions, denoted by N(mean, variance). In our working framework, we assume that

e_i \text{ and } (q_j, x_j)^\top \text{ are independent}, \qquad q_i \text{ and } x_j \text{ are independent}

q_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \quad e_i \overset{\text{indep.}}{\sim} N_2\!\left(0, \begin{bmatrix} \kappa_{xi} & 0 \\ 0 & \kappa_{yi} \end{bmatrix}\right), \quad x_i \overset{\text{i.i.d.}}{\sim} N(\mu_x, \sigma^2_x), \quad i, j = 1, \ldots, n   (4)

We call attention to the fact that the measurement errors in (4) are heteroscedastic. The variances $\kappa_{xi}$ and $\kappa_{yi}$ are assumed known and greater than 0, $i = 1, \ldots, n$. Since the true covariate is treated as a random variable, we have a structural model. As in Reference [5], the functional model

($x_1, \ldots, x_n$ are unknown constants) will not be contemplated here (see Reference [9]). Denoting the observable variables by $v_i = (X_i, Y_i)^\top$ and letting $a = (0, \beta_0)^\top$ and $b = (1, \beta_1)^\top$, the model defined by equations (1)-(3) can be written as

v_i = a + x_i b + \begin{pmatrix} \delta_{xi} \\ \delta_{yi} + q_i \end{pmatrix}

Then, under the assumptions in (4), it follows that

v_i \overset{\text{indep.}}{\sim} N_2(a + \mu_x b, \Sigma_i)   (5)

where

\Sigma_i = \sigma^2_x b b^\top + D_i = \begin{bmatrix} \sigma^2_x + \kappa_{xi} & \beta_1 \sigma^2_x \\ \beta_1 \sigma^2_x & \beta_1^2 \sigma^2_x + \kappa_{yi} + \sigma^2 \end{bmatrix}   (6)

with $D_i = \mathrm{diag}(\kappa_{xi}, \kappa_{yi} + \sigma^2)$ standing for a 2×2 diagonal matrix with elements $\kappa_{xi}$ and $\kappa_{yi} + \sigma^2$, $i = 1, \ldots, n$. Using (5), it is true that $(v_i - a - \mu_x b)^\top \Sigma_i^{-1} (v_i - a - \mu_x b) \overset{\text{i.i.d.}}{\sim} \chi^2_2$, $i = 1, \ldots, n$, where $\chi^2_2$ indicates a chi-square distribution with two degrees of freedom. By applying the Wilson-Hilferty transformation [11]

r_i = 3\{[(v_i - a - \mu_x b)^\top \Sigma_i^{-1} (v_i - a - \mu_x b)/2]^{1/3} - 8/9\}   (7)

we obtain $r_i \overset{\text{i.i.d.}}{\sim} N(0, 1)$, $i = 1, \ldots, n$, approximately. Such a distributional result enables us to check the model in practice, as we will explore in Section 4. The determinant and the inverse of $\Sigma_i$ are

|\Sigma_i| = \kappa_{xi} (\kappa_{yi} + \sigma^2) c_i^{-1} \sigma^2_x \quad \text{and} \quad \Sigma_i^{-1} = D_i^{-1} - c_i D_i^{-1} b b^\top D_i^{-1}

where

c_i = \sigma^2_x (1 + \sigma^2_x b^\top D_i^{-1} b)^{-1}, \quad i = 1, \ldots, n   (8)
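To make the use of (7) concrete, the following R sketch computes the transformed residuals for given parameter values; the function and argument names (wh_resid, kx, ky) are ours for illustration and are not part of the paper or its Ox implementation.

```r
## Transformed residuals r_i of (7); a minimal sketch, assuming theta holds
## (beta0, beta1, mu_x, sigma2_x, sigma2) and kx, ky hold the known error
## variances kappa_{xi}, kappa_{yi}.
wh_resid <- function(X, Y, kx, ky, theta) {
  b0 <- theta[1]; b1 <- theta[2]; mux <- theta[3]
  s2x <- theta[4]; s2 <- theta[5]
  n <- length(X)
  r <- numeric(n)
  for (i in seq_len(n)) {
    Sigma <- matrix(c(s2x + kx[i], b1 * s2x,
                      b1 * s2x, b1^2 * s2x + ky[i] + s2), 2, 2)
    d <- c(X[i] - mux, Y[i] - b0 - b1 * mux)   # v_i - a - mu_x b
    Q <- drop(t(d) %*% solve(Sigma) %*% d)     # chi-square_2 under (5)
    r[i] <- 3 * ((Q / 2)^(1/3) - 8/9)          # Wilson-Hilferty (7)
  }
  r
}
```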

Let $\theta = (\beta_0, \beta_1, \mu_x, \sigma^2_x, \sigma^2)^\top$. The log-likelihood function corresponding to the model defined by (5) can be written as

\ell(\theta) = \sum_{i=1}^n \ell_i(\theta), \qquad \ell_i(\theta) = \text{const.} - \tfrac{1}{2} \log|\Sigma_i| - \tfrac{1}{2} Q_i   (9)

where $Q_i = (v_i - a - \mu_x b)^\top \Sigma_i^{-1} (v_i - a - \mu_x b) = Q_{i1} - c_i Q_{i2}^2$, with

Q_{i1} = (v_i - a - \mu_x b)^\top D_i^{-1} (v_i - a - \mu_x b) = \kappa_{xi}^{-1} (X_i - \mu_x)^2 + (\kappa_{yi} + \sigma^2)^{-1} (Y_i - \beta_0 - \mu_x \beta_1)^2

and

Q_{i2} = \kappa_{xi}^{-1} (X_i - \mu_x) + (\kappa_{yi} + \sigma^2)^{-1} \beta_1 (Y_i - \beta_0 - \mu_x \beta_1), \quad i = 1, \ldots, n
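For later reference (the restricted maximization in Section 2.3 and a numerical check of the information matrices in Section 2.2), the log-likelihood (9) can also be coded directly. The R sketch below, with illustrative names of our own, returns the negative log-likelihood so that it can be handed to a general-purpose optimizer such as optim.

```r
## Negative of the log-likelihood (9); theta = (beta0, beta1, mu_x,
## sigma2_x, sigma2); kx and ky are the known error variances.
negloglik <- function(theta, X, Y, kx, ky) {
  b0 <- theta[1]; b1 <- theta[2]; mux <- theta[3]
  s2x <- theta[4]; s2 <- theta[5]
  if (s2x <= 0 || s2 <= 0) return(1e10)        # keep variances positive
  ll <- 0
  for (i in seq_along(X)) {
    Sigma <- matrix(c(s2x + kx[i], b1 * s2x,
                      b1 * s2x, b1^2 * s2x + ky[i] + s2), 2, 2)
    d <- c(X[i] - mux, Y[i] - b0 - b1 * mux)
    ll <- ll - log(2 * pi) - 0.5 * log(det(Sigma)) -
          0.5 * drop(t(d) %*% solve(Sigma) %*% d)
  }
  -ll
}
```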

Maximization of the log-likelihood function (9) is quite involved. Maximum likelihood (ML)estimates are more easily computed with the EM algorithm [12]. For the equation-error model,the EM algorithm is detailed in Kulathinal et al. [5]. Define

\lambda = \begin{pmatrix} \lambda_1 \\ \lambda_2 \end{pmatrix} = \begin{pmatrix} \mu_x \\ \beta_0 + \mu_x \beta_1 \end{pmatrix} \quad \text{and} \quad R = \begin{bmatrix} \omega_{11} & \omega_{12} \\ \omega_{21} & \omega_{22} \end{bmatrix} = \begin{bmatrix} \sigma^2_x & \beta_1 \sigma^2_x \\ \beta_1 \sigma^2_x & \beta_1^2 \sigma^2_x + \sigma^2 \end{bmatrix}   (10)

so that $\Sigma_i = R + \mathrm{diag}(\kappa_{xi}, \kappa_{yi})$, $i = 1, \ldots, n$. After convergence, parameter estimates follow from the one-to-one transformation $\hat{\mu}_x = \hat{\lambda}_1$, $\hat{\beta}_1 = \hat{\omega}_{12}/\hat{\omega}_{11}$, $\hat{\beta}_0 = \hat{\lambda}_2 - \hat{\mu}_x \hat{\beta}_1$, $\hat{\sigma}^2_x = \hat{\omega}_{11}$, and $\hat{\sigma}^2 = \hat{\omega}_{22} - \hat{\omega}_{12}^2/\hat{\omega}_{11}$. Starting values of $\lambda$ and $R$ can be taken as $\lambda = \bar{v} = (\bar{X}, \bar{Y})^\top$ and $R = n^{-1} \sum_{i=1}^n (v_i - \bar{v})(v_i - \bar{v})^\top - \mathrm{diag}(\bar{\kappa}_x, \bar{\kappa}_y)$, with $\bar{X} = \sum_{i=1}^n X_i/n$ and $\bar{Y}$, $\bar{\kappa}_x$, and $\bar{\kappa}_y$ defined analogously as the averages of the $Y_i$ and of the known error variances.

When there is no error in the equation, $\sigma^2 = 0$ in (10), there are four parameters to be estimated, implying that the mapping connecting $(\lambda, R)$ and $\theta = (\mu_x, \beta_0, \beta_1, \sigma^2_x)^\top$ is not one-to-one. Thus, a specific algorithm has to be invoked. Since the distribution of $(x, X, Y)$ is such that

(x_i, X_i, Y_i)^\top \overset{\text{indep.}}{\sim} N_3\!\left( \begin{pmatrix} \mu_x \\ a + \mu_x b \end{pmatrix}, \begin{bmatrix} \sigma^2_x & \sigma^2_x b^\top \\ \sigma^2_x b & \sigma^2_x b b^\top + \mathrm{diag}(\kappa_{xi}, \kappa_{yi}) \end{bmatrix} \right)

$i = 1, \ldots, n$, the EM algorithm consists of the following steps:

1. Provide initial values for $\hat{\theta} = (\hat{\mu}_x, \hat{\beta}_0, \hat{\beta}_1, \hat{\sigma}^2_x)^\top$.
2. Compute $\tilde{x}_i = c_i \{\hat{\mu}_x \hat{\sigma}_x^{-2} + \kappa_{xi}^{-1} X_i + \kappa_{yi}^{-1} \hat{\beta}_1 (Y_i - \hat{\beta}_0)\}$ and $\widetilde{x^2_i} = c_i + \tilde{x}_i^2$, where $c_i$ comes from (8) with $\sigma^2 = 0$, $i = 1, \ldots, n$.
3. Compute

\hat{\beta}_1 = \frac{\sum_{i=1}^n \kappa_{yi}^{-1} (Y_i - Y^*) \tilde{x}_i}{\sum_{i=1}^n \kappa_{yi}^{-1} \widetilde{x^2_i} - \tilde{x}^* \sum_{i=1}^n \kappa_{yi}^{-1} \tilde{x}_i} \quad \text{and} \quad \hat{\beta}_0 = Y^* - \tilde{x}^* \hat{\beta}_1

with $Y^* = \sum_{i=1}^n \kappa_{yi}^{-1} Y_i / \sum_{i=1}^n \kappa_{yi}^{-1}$ and $\tilde{x}^* = \sum_{i=1}^n \kappa_{yi}^{-1} \tilde{x}_i / \sum_{i=1}^n \kappa_{yi}^{-1}$.
4. Compute $\hat{\mu}_x = n^{-1} \sum_{i=1}^n \tilde{x}_i$ and $\hat{\sigma}^2_x = n^{-1} \sum_{i=1}^n \widetilde{x^2_i} - \hat{\mu}_x^2$.
5. Repeat steps 2-4 until convergence.

This iterative scheme can begin with the estimates from the equation-error model or from themethod of moments [5, 9]. Our stopping criterion is based on the maximum relative differencesbetween estimates in two successive iterations.
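A minimal R sketch of steps 1-5 is given below; the function name and the crude moment-type starting values are our illustrative choices (the text above suggests starting instead from the equation-error or method-of-moments estimates).

```r
## EM iteration for the no-equation-error model (sketch of steps 1-5).
## X, Y, kx, ky are vectors with the observations and known error variances.
em_no_eq_error <- function(X, Y, kx, ky, tol = 1e-8, maxit = 1000) {
  n <- length(X)
  ## step 1: crude starting values (naive regression, ignoring the errors)
  beta1 <- cov(X, Y) / var(X); beta0 <- mean(Y) - beta1 * mean(X)
  mux <- mean(X); sigma2x <- max(var(X) - mean(kx), 0.01)
  for (it in seq_len(maxit)) {
    old <- c(beta0, beta1, mux, sigma2x)
    ## step 2 (E step): conditional mean and second moment of x_i
    ci  <- sigma2x / (1 + sigma2x * (1 / kx + beta1^2 / ky))
    xt  <- ci * (mux / sigma2x + X / kx + beta1 * (Y - beta0) / ky)
    x2t <- ci + xt^2
    ## step 3 (M step): weighted updates of beta1 and beta0
    w <- 1 / ky
    Ystar <- sum(w * Y) / sum(w); xstar <- sum(w * xt) / sum(w)
    beta1 <- sum(w * (Y - Ystar) * xt) / (sum(w * x2t) - xstar * sum(w * xt))
    beta0 <- Ystar - xstar * beta1
    ## step 4: updates of mu_x and sigma^2_x
    mux <- mean(xt); sigma2x <- mean(x2t) - mux^2
    ## step 5: stop on small maximum relative change between iterations
    if (max(abs(c(beta0, beta1, mux, sigma2x) - old) / (abs(old) + 1e-8)) < tol) break
  }
  c(beta0 = beta0, beta1 = beta1, mu_x = mux, sigma2_x = sigma2x)
}
```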


2.1. Score and observed information

After some algebraic manipulations we get from (9) the elements of the score vector $U(\theta)$; namely,

U(\theta) = (U_{\beta_0}, U_{\beta_1}, U_{\mu_x}, U_{\sigma^2_x}, U_{\sigma^2})^\top = \sum_{i=1}^n \partial \ell_i(\theta)/\partial\theta = \sum_{i=1}^n U_i(\theta)

U_i(\theta) = (U_{i\beta_0}, U_{i\beta_1}, U_{i\mu_x}, U_{i\sigma^2_x}, U_{i\sigma^2})^\top

where

U_{i\gamma} = -\frac{1}{2}\frac{\partial \log|\Sigma_i|}{\partial\gamma} - \frac{1}{2}\frac{\partial Q_i}{\partial\gamma}   (11)

$\gamma = \beta_0, \beta_1, \mu_x, \sigma^2_x, \sigma^2$, with

\frac{\partial \log|\Sigma_i|}{\partial\beta_0} = 0, \quad \frac{\partial \log|\Sigma_i|}{\partial\beta_1} = 2 c_i (\kappa_{yi}+\sigma^2)^{-1}\beta_1, \quad \frac{\partial \log|\Sigma_i|}{\partial\mu_x} = 0

\frac{\partial \log|\Sigma_i|}{\partial\sigma^2_x} = \sigma_x^{-2}(1 - c_i \sigma_x^{-2}), \quad \frac{\partial \log|\Sigma_i|}{\partial\sigma^2} = (\kappa_{yi}+\sigma^2)^{-1}\{1 - (\kappa_{yi}+\sigma^2)^{-1} c_i \beta_1^2\}

and

\frac{\partial Q_i}{\partial\gamma} = \frac{\partial Q_{i1}}{\partial\gamma} - 2 c_i Q_{i2} \frac{\partial Q_{i2}}{\partial\gamma} - Q_{i2}^2 \frac{\partial c_i}{\partial\gamma}

with

\frac{\partial Q_{i1}}{\partial\beta_0} = -2(\kappa_{yi}+\sigma^2)^{-1}(Y_i - \beta_0 - \mu_x\beta_1), \quad \frac{\partial Q_{i1}}{\partial\beta_1} = \mu_x \frac{\partial Q_{i1}}{\partial\beta_0}

\frac{\partial Q_{i1}}{\partial\mu_x} = -2\kappa_{xi}^{-1}(X_i - \mu_x) - 2\beta_1(\kappa_{yi}+\sigma^2)^{-1}(Y_i - \beta_0 - \mu_x\beta_1), \quad \frac{\partial Q_{i1}}{\partial\sigma^2_x} = 0

\frac{\partial Q_{i1}}{\partial\sigma^2} = -(\kappa_{yi}+\sigma^2)^{-2}(Y_i - \beta_0 - \mu_x\beta_1)^2, \quad \frac{\partial Q_{i2}}{\partial\beta_0} = -\beta_1(\kappa_{yi}+\sigma^2)^{-1}

\frac{\partial Q_{i2}}{\partial\beta_1} = (\kappa_{yi}+\sigma^2)^{-1}(Y_i - \beta_0 - 2\mu_x\beta_1), \quad \frac{\partial Q_{i2}}{\partial\mu_x} = -\{\kappa_{xi}^{-1} + \beta_1^2(\kappa_{yi}+\sigma^2)^{-1}\}, \quad \frac{\partial Q_{i2}}{\partial\sigma^2_x} = 0

\frac{\partial Q_{i2}}{\partial\sigma^2} = -(\kappa_{yi}+\sigma^2)^{-2}\beta_1(Y_i - \beta_0 - \mu_x\beta_1), \quad \frac{\partial c_i}{\partial\beta_0} = 0, \quad \frac{\partial c_i}{\partial\beta_1} = -2 c_i^2(\kappa_{yi}+\sigma^2)^{-1}\beta_1, \quad \frac{\partial c_i}{\partial\mu_x} = 0

\frac{\partial c_i}{\partial\sigma^2_x} = c_i^2 \sigma_x^{-4} \quad \text{and} \quad \frac{\partial c_i}{\partial\sigma^2} = c_i^2(\kappa_{yi}+\sigma^2)^{-2}\beta_1^2, \quad i = 1, \ldots, n


After lengthy algebraic manipulations we get from (11) the elements of the observed information matrix $K(\theta)$,

K(\theta) = -\sum_{i=1}^n \frac{\partial^2 \ell_i(\theta)}{\partial\theta\,\partial\theta^\top} = -\sum_{i=1}^n L_i(\theta)   (12)

with upper entries of $L_i$ organized as

L_i(\theta) = \begin{bmatrix}
L_{i\beta_0\beta_0} & L_{i\beta_0\beta_1} & L_{i\beta_0\mu_x} & L_{i\beta_0\sigma^2_x} & L_{i\beta_0\sigma^2} \\
 & L_{i\beta_1\beta_1} & L_{i\beta_1\mu_x} & L_{i\beta_1\sigma^2_x} & L_{i\beta_1\sigma^2} \\
 & & L_{i\mu_x\mu_x} & L_{i\mu_x\sigma^2_x} & L_{i\mu_x\sigma^2} \\
 & & & L_{i\sigma^2_x\sigma^2_x} & L_{i\sigma^2_x\sigma^2} \\
 & & & & L_{i\sigma^2\sigma^2}
\end{bmatrix}   (13)

$i = 1, \ldots, n$, whose presentation is postponed to the Appendix.

2.2. Expected information

In this section we provide the expected information matrix of the parameter vector $\theta$. Adopting the parameterization $\theta^* = (\lambda_1, \lambda_2, \omega_{11}, \omega_{22}, \rho)^\top$, where $\rho = \omega_{12}/\sqrt{\omega_{11}\omega_{22}}$, with $\lambda_1$, $\lambda_2$, $\omega_{11}$, and $\omega_{22}$ as in (10), Kulathinal et al. [5] furnish the expected information matrix of $\phi = (\omega_{11}, \omega_{22}, \rho)^\top$, denoted by $I(\phi)$. Their expression for $J_i$ on p. 1095 should have the factor '2' outside the square root, but the computations in [5] use the correct expression (Kulathinal, personal communication). Letting $\lambda = (\lambda_1, \lambda_2)^\top$, it can be shown in a few steps, as in Patriota et al. (A heteroscedastic structural errors-in-variables model with equation error. Unpublished manuscript, 2007), that

E\left\{-\frac{\partial^2 \ell_i(\theta^*)}{\partial\lambda\,\partial\phi^\top}\right\} = 0 \quad \text{and} \quad E\left\{-\frac{\partial^2 \ell_i(\theta^*)}{\partial\lambda\,\partial\lambda^\top}\right\} = \Sigma_i^{-1}

implying that

I(\theta^*) = \begin{bmatrix} \sum_{i=1}^n \Sigma_i^{-1} & 0 \\ 0^\top & I(\phi) \end{bmatrix}

Turning back to our parameterization, we arrive at

I(\theta) = J(\theta^*)\, I(\theta^*)\, J(\theta^*)^\top   (14)


with Jacobian given by

J(\theta^*) = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 \\
0 & \lambda_1 & 0 & 2\rho\sqrt{\omega_{11}\omega_{22}} & (1-\rho^2)\sqrt{\omega_{11}/\omega_{22}} \\
1 & \rho\sqrt{\omega_{22}/\omega_{11}} & 0 & 0 & 0 \\
0 & 0 & 1 & \rho^2\omega_{22}/\omega_{11} & \rho(1-\rho^2)/(2\omega_{11}) \\
0 & 0 & 0 & 1 & -\rho/(2\omega_{22})
\end{bmatrix}

Confidence regions for the parameters can be constructed from asymptotic results. Fahrmeir [13] handles the lack of i.i.d. observations in ML inference. Under some regularity conditions, it can be shown that the distribution of $\hat{\theta}$ approaches a multivariate normal distribution as the sample size goes to infinity ($n \to \infty$). Hence, the approximate distribution of $\hat{\theta}$ in large samples is $N_5(\theta, I(\theta)^{-1})$. The aforementioned conditions are hard to verify in complex models. This task is partially circumvented through empirical validation (Section 3). The expected information matrix $I(\theta)$ can be estimated by $I(\hat{\theta})$ or, alternatively, by $K(\hat{\theta})$, with $K(\theta)$ as in (12).
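As a practical shortcut, $K(\hat{\theta})$ can also be approximated by numerically differentiating the log-likelihood at the ML estimate, which gives a quick check on the analytic expressions in the Appendix. A brief sketch, assuming the negloglik function coded after (9) and an ML estimate theta_hat:

```r
## Numerical approximation to K(theta_hat): Hessian of the negative
## log-likelihood at the ML estimate (sketch; X, Y, kx, ky as before).
K_hat <- optimHess(theta_hat, negloglik, X = X, Y = Y, kx = kx, ky = ky)
se <- sqrt(diag(solve(K_hat)))   # asymptotic standard errors of theta_hat
```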

2.3. Hypothesis testing

Now we deal with the problem of testing

H_0: \theta_1 = \theta_{10}, \quad \theta_2 \text{ unspecified}   (15)

where $\theta_1$ is a subset of $\theta = (\theta_1^\top, \theta_2^\top)^\top$ with $p_1$ parameters ($1 \le p_1 \le 5$). The score vector, the expected information matrix, and its inverse are partitioned accordingly, resulting in

U(\theta) = \begin{pmatrix} U_1(\theta) \\ U_2(\theta) \end{pmatrix}, \quad I(\theta) = \begin{bmatrix} I_{11}(\theta) & I_{12}(\theta) \\ I_{21}(\theta) & I_{22}(\theta) \end{bmatrix} \quad \text{and} \quad I(\theta)^{-1} = \begin{bmatrix} I^{11}(\theta) & I^{12}(\theta) \\ I^{21}(\theta) & I^{22}(\theta) \end{bmatrix}

Hypothesis (15) can be tested using the following statistics [13, 14]:

Likelihood ratio: LR = -2\{\ell(\tilde{\theta}) - \ell(\hat{\theta})\}   (16)

Wald: W = (\hat{\theta}_1 - \theta_{10})^\top \{I_{11}(\hat{\theta}) - I_{12}(\hat{\theta}) I_{22}(\hat{\theta})^{-1} I_{21}(\hat{\theta})\} (\hat{\theta}_1 - \theta_{10})   (17)

Score: S = U_1(\tilde{\theta})^\top I^{11}(\tilde{\theta}) U_1(\tilde{\theta})   (18)

where $\tilde{\theta} = (\theta_{10}^\top, \tilde{\theta}_2^\top)^\top$ and $\tilde{\theta}_2$ denotes the ML estimator of $\theta_2$ restricted to $H_0$ in (15). Under some suitable regularity conditions [13], we have that under $H_0$ these statistics share the same asymptotic behavior; that is, LR, $W$, and $S \overset{d}{\longrightarrow} \chi^2_{p_1}$, as $n \to \infty$.

Score and likelihood ratio statistics require the ML estimator of $\theta$ under $H_0$. On the other hand, an alternative testing procedure that does not demand the computation of $\tilde{\theta}_2$ may be useful.


Neyman's C(α) test statistic [15, 16], given by

C(\alpha) = U(\bar{\theta})^\top I(\bar{\theta})^{-1} U(\bar{\theta}) - U_2(\bar{\theta})^\top I_{22}(\bar{\theta})^{-1} U_2(\bar{\theta})   (19)

is asymptotically equivalent to the score test under $H_0$, where $\bar{\theta} = (\theta_{10}^\top, \bar{\theta}_2^\top)^\top$ and $\bar{\theta}_2$ is a consistent estimator of $\theta_2$.

Of particular interest is the hypothesis

H_0: \beta_1 = 0   (20)

In this case, $p_1 = 1$ and ML estimates of $\beta_0$, $\mu_x$, $\sigma^2_x$, and $\sigma^2$ constrained to (20) are computed by direct numerical maximization of the log-likelihood function (9), for the M step of the EM algorithm does not furnish closed form expressions of the parameter estimates. This maximization can be accomplished, for instance, with the MaxBFGS function [17] or the optim function [18]. Let $\bar{\theta}_2$ be a consistent estimator of $\theta_2 = (\beta_0, \mu_x, \sigma^2_x, \sigma^2)^\top$. In our case, $\bar{\beta}_0 = \bar{Y}$, $\bar{\mu}_x = \bar{X}$, $\bar{\sigma}^2_x = (n-1)^{-1}\sum_{i=1}^n (X_i - \bar{X})^2 - \bar{\kappa}_x$, and $\bar{\sigma}^2 = (n-1)^{-1}\sum_{i=1}^n (Y_i - \bar{Y})^2 - \bar{\kappa}_y$, with $\bar{\kappa}_x$ and $\bar{\kappa}_y$ the averages of the known error variances, are consistent estimators of $\beta_0$, $\mu_x$, $\sigma^2_x$, and $\sigma^2$, respectively. Therefore, the test of hypothesis (20) can be carried out using the C(α) test statistic by putting $\bar{\theta} = (0, \bar{\beta}_0, \bar{\mu}_x, \bar{\sigma}^2_x, \bar{\sigma}^2)^\top$.
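As an illustration of how the restricted fit and the LR test of (20) can be put together with optim, the following R sketch assumes the negloglik function coded after (9); the simple consistent estimators above serve as starting values. Names are ours and the unrestricted fit could equally come from the EM algorithm.

```r
## Consistent estimators of theta_2 under H0: beta1 = 0 (starting values).
b0_bar  <- mean(Y)
mux_bar <- mean(X)
s2x_bar <- var(X) - mean(kx)
s2_bar  <- var(Y) - mean(ky)

## Restricted fit: beta1 fixed at 0, the other four parameters free.
fit0 <- optim(c(b0_bar, mux_bar, s2x_bar, s2_bar),
              function(p) negloglik(c(p[1], 0, p[2], p[3], p[4]), X, Y, kx, ky),
              method = "BFGS")

## Unrestricted fit of all five parameters.
fit1 <- optim(c(b0_bar, 0, mux_bar, s2x_bar, s2_bar),
              negloglik, X = X, Y = Y, kx = kx, ky = ky, method = "BFGS")

LR   <- 2 * (fit0$value - fit1$value)        # (16): value holds -loglik
pval <- pchisq(LR, df = 1, lower.tail = FALSE)
```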

3. SIMULATION STUDY

In order to state some results in Section 2 we rely on asymptotic theory. In view of this, we planned Monte Carlo simulations to evaluate the empirical level and the power of the test statistics (16)-(19) at a nominal significance level of 5 per cent when testing hypothesis (20). Simulation settings are as follows. The intercept is $\beta_0 = -1$. Three values of the slope ($\beta_1 = -1$, 0, and 1) make it possible to monitor not only the nominal level but also the power of the tests. The parameters of the distribution of the true covariate x in (4) are $\mu_x = -2$ and $\sigma^2_x = 4$, whereas the variance of the error in the equation is chosen as $\sigma^2 = 10$. Measurement error standard deviations $\sqrt{\kappa_{xi}}$ and $\sqrt{\kappa_{yi}}$ are picked from uniform distributions on (0.5, 1.5) and (0.5, 4), respectively. Sample sizes are 40, 80, and 100. For each pair $(n, \beta_1)$ we generate the error variances as above; then, holding these values fixed, observations $v_i$ are drawn from (5), $i = 1, \ldots, n$. Rejection rates of (20) are calculated from 10 000 samples. Simulated scenarios mimic some characteristics of the data sets in our example (Section 4). Computations were performed with specific purpose Ox code [17]. Computational codes are available from the first author upon request. Henceforth, the subscripts 'obs' and 'exp' will refer to statistics evaluated with the observed and expected information matrices, respectively (see the last sentence of Section 2.2). The samples in which the optimization method failed to converge (ML estimation subject to H_0) were discarded, as well as the samples yielding a negative value for some test statistic.
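A sketch of how one such sample can be generated (shown here with n = 80 and β1 = 0); drawing from the structural equations (1)-(4) is equivalent to drawing the v_i from (5). Variable names are ours.

```r
## One simulated sample under the settings of Section 3 (sketch).
set.seed(123)
n <- 80; beta0 <- -1; beta1 <- 0; mux <- -2; s2x <- 4; s2 <- 10
kx <- runif(n, 0.5, 1.5)^2   # kappa_{xi}: squared sd drawn on (0.5, 1.5)
ky <- runif(n, 0.5, 4.0)^2   # kappa_{yi}: squared sd drawn on (0.5, 4)
x  <- rnorm(n, mux, sqrt(s2x))                    # true covariate
y  <- beta0 + beta1 * x + rnorm(n, 0, sqrt(s2))   # true response, eq. (1)
X  <- x + rnorm(n, 0, sqrt(kx))                   # observed covariate, eq. (2)
Y  <- y + rnorm(n, 0, sqrt(ky))                   # observed response, eq. (3)
```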

Figure 1 displays the quantile-quantile (QQ) plots of the test statistics in samples of size 80 (in these plots the vertical segments indicate the critical value equal to 3.84). Similar aspects are also observed in the plots when sample sizes are 40 and 100 (omitted for the sake of space). From these plots we conclude that the LR, S_exp, and C(α)_exp statistics reach the closest agreement with the theoretical chi-square distribution.

Table I summarizes some results of the simulations. Restricted to H_0 in (20), rejection rates from the LR, S_exp, and C(α)_exp tests are close to 5 per cent, whatever the sample size, a further evidence of agreement between empirical and theoretical distributions under the null hypothesis (see also Figure 1). As desirable, with $\beta_1 \neq 0$ rejection rates tend to 1.000 when the sample size increases. With a nominal significance level of 1 per cent the results are similar.

[Figure 1. Chi-square probability plots of 10 000 simulated values of the test statistics (LR, W_obs, W_exp, S_obs, S_exp, C(α)_obs, and C(α)_exp) against theoretical χ² quantiles, in samples of size 80.]


Table I. Rejection rates of the hypothesis H_0: β1 = 0 (at a nominal level of 5 per cent) from the likelihood ratio (LR), Wald (W), score (S), and C(α) tests in the equation-error model.

                          Test statistic
  n   β1      LR    W_obs   W_exp   S_obs   S_exp  C(α)_obs  C(α)_exp
 40   -1   0.8162  0.8455  0.8301  0.8415  0.7839   0.8407    0.7638
 40    0   0.0583  0.0742  0.0653  0.0802  0.0504   0.0956    0.0521
 40    1   0.7869  0.8198  0.8009  0.8215  0.7604   0.8206    0.7380
 80   -1   0.9713  0.9758  0.9730  0.9766  0.9690   0.9769    0.9664
 80    0   0.0544  0.0623  0.0589  0.0645  0.0511   0.0706    0.0509
 80    1   0.9732  0.9769  0.9759  0.9785  0.9702   0.9776    0.9686
100   -1   0.9892  0.9907  0.9900  0.9909  0.9888   0.9912    0.9879
100    0   0.0493  0.0556  0.0518  0.0576  0.0462   0.0613    0.0465
100    1   0.9907  0.9919  0.9915  0.9921  0.9897   0.9919    0.9899

The remaining test statistics do not behave as well as the LR, S_exp, and C(α)_exp tests. For smaller sample sizes, rejection rates corresponding to H_0 differ from the nominal significance level (a greater sample size is needed). Hence, rejection rates from samples violating (20) should be read with caution (it is not legitimate to associate these rates with the power of the tests). The C(α)_obs statistic shows the worst performance. The improvement when the sample size increases is more noticeable for the W_exp test.

Comparing statistics evaluated with the observed and expected information matrices, as a rule, the latter yield rejection rates closer to 5 per cent when $\beta_1 = 0$. Investigating alternative forms of the score test, Bera and Bilias [19] report an analogous behavior. Bearing in mind the QQ plots in Figure 1, the significance level, and the power, our simulations reveal that the LR, S_exp, and C(α)_exp tests outperform the contenders and seem to be the recommended ones (at least within the scope of our study). It is worth remarking once again that the C(α) test does not involve any sort of iterative optimization.

Moreover, in a second batch of simulations, we generate heteroscedastic errors, but assume homoscedastic errors for inference, with known variances equal to the means of the simulated variances ($\kappa_{x0}$ and $\kappa_{y0}$, say). With this side condition, ML estimators have closed-form expressions (see Reference [2]): $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$, $\hat{\beta}_1 = m_{XY}/(m_{XX} - \kappa_{x0})$, $\hat{\mu}_x = \bar{X}$, $\hat{\sigma}^2_x = m_{XY}/\hat{\beta}_1$, and $\hat{\sigma}^2 = m_{YY} - \hat{\beta}_1 m_{XY} - \kappa_{y0}$, where $m_{XY} = \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})/n$ and so on. The impact of this (wrong) inferential option on the test statistics depends on the heteroscedasticity pattern in the errors. For instance, with $\kappa_{xi} = 0.5 + (i-1)/(n-1)$ and $\kappa_{yi} = 0.5 + 3.5(i-1)/(n-1)$, $i = 1, \ldots, n$, and increasing true variables (x and y), the probability plots of the statistics deviate from the target chi-square distribution. As a guideline, we recommend test statistics properly designed to accommodate heteroscedastic errors.
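These closed-form homoscedastic-error estimates are immediate to compute; a small sketch, with kx0 and ky0 holding the common (assumed) error variances:

```r
## ML estimates when the error variances are (wrongly) taken as constant
## and equal to kx0 and ky0 (see Reference [2]); moments use denominator n.
mXX <- mean((X - mean(X))^2); mYY <- mean((Y - mean(Y))^2)
mXY <- mean((X - mean(X)) * (Y - mean(Y)))
beta1_h <- mXY / (mXX - kx0)
beta0_h <- mean(Y) - beta1_h * mean(X)
mux_h   <- mean(X)
s2x_h   <- mXY / beta1_h
s2_h    <- mYY - beta1_h * mXY - ky0
```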

4. APPLICATION

The presented approach is illustrated with data from the WHO MONICA Project on cardiovascular disease and its risk factors [5, 10], which comprises about 40 populations in 21 countries. We will analyze estimates of trends in coronary event rates (response) and trends in the mean value of risk scores (covariate) for women and men (n = 36 and 38 populations, respectively). The risk score is a linear combination of smoking status, systolic blood pressure, body mass index, and total cholesterol. The coefficients of this combination were estimated by fitting the Cox proportional hazards model with coronary heart disease death as the response variable. On the other hand, the trend in coronary event rates is the coefficient of the linear time trend of the annual rates (on a logarithmic scale) over a five-year period. Variances of the measurement errors (changing across populations and considered as fixed values) are the sampling variances of the estimates of the trends from each population. Since the coefficients in the risk score were estimated from a different study (the Nordic Risk Assessment Study [10]), this amounts to another source of uncertainty. Additional information about the data sets can be found in References [5, 10]. Figures 2 and 3 exhibit the points, standard deviations, and regression lines adjusted by distinct methods. Graphics were built using the R system [18]. Weighted least-squares (WLS) estimates neglect the equation error and the measurement error in the covariate. The differences between the lines highlight the importance of modelling the errors.

[Figure 2. Change in event rate versus change in risk score for women's data together with measurement error standard deviations (crosses) and regression lines: solid line, equation-error; dashed line, no-equation-error; and dotted line, WLS.]

Replacing the ML estimates of $\theta$ in (7) we constructed the QQ plots and envelopes [20] in Figures 4(b) and 5(b). The points represent the pairs $(Z_{(i)}, r_{(i)})$, where $Z_{(i)} = \Phi^{-1}((i - 3/8)/(n + 1/4))$ is an estimate of the ith expected order statistic of the standard normal distribution (with $\Phi$ denoting its cumulative distribution function), $r_i$ is computed from (7) with the ML estimates $\hat{\theta}$, and $r_{(1)} \le \cdots \le r_{(i)} \le \cdots \le r_{(n)}$. Envelopes are constructed by simulation. Let $v_{m1}, \ldots, v_{mn}$ be the mth sample replicated from (5) substituting $\hat{\theta}$ for $\theta$. The transformed values $r_{mi}$ are ordered as $r_{m(i)}$, $i = 1, \ldots, n$. Atkinson [20] suggests 19 generated samples. The limits of the envelope are $\min_{m=1}^{19} r_{m(i)}$ and $\max_{m=1}^{19} r_{m(i)}$, whereas the line in the center connects the points $(Z_{(i)}, \sum_{m=1}^{19} r_{m(i)}/19)$, $i = 1, \ldots, n$. For the no-equation-error model, Figures 4(a) and 5(a) stem from an expression analogous to (7) with $\sigma^2 = 0$ in (6) and ML estimates produced from the EM algorithm described in Section 2. These plots form the basis to guide us in assessing departures from the entertained model. Many points in Figures 4(a) and 5(a) lie outside the envelopes. Compared with Figures 4(b) and 5(b), the fits achieved with the equation-error model by far outperform the ones with the model that ignores the error in the equation. Thus, the latter will be excluded from the ongoing discussion. Moreover, normality of the transformed values $r_i$, $i = 1, \ldots, n$, is tested using the Kolmogorov-Smirnov statistic. The descriptive levels in Figures 4(a) and (b) and 5(a) and (b) are 0.001, 0.935, 0.023, and 0.248, respectively, giving evidence against the no-equation-error model.
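A sketch of the envelope construction with 19 replicated samples, assuming an ML estimate theta_hat and the wh_resid helper sketched in Section 2; the last line is the Kolmogorov-Smirnov check mentioned above.

```r
## Simulated envelope for the QQ plot of the r_i (sketch).
n <- length(X)
Zi <- qnorm((1:n - 3/8) / (n + 1/4))     # expected normal order statistics
r_obs <- sort(wh_resid(X, Y, kx, ky, theta_hat))
M <- 19
r_sim <- replicate(M, {
  x  <- rnorm(n, theta_hat[3], sqrt(theta_hat[4]))
  Xs <- x + rnorm(n, 0, sqrt(kx))
  Ys <- theta_hat[1] + theta_hat[2] * x + rnorm(n, 0, sqrt(theta_hat[5] + ky))
  sort(wh_resid(Xs, Ys, kx, ky, theta_hat))
})
lower <- apply(r_sim, 1, min); upper <- apply(r_sim, 1, max)
centre <- rowMeans(r_sim)
plot(Zi, r_obs, xlab = "Theoretical N(0,1) quantiles",
     ylab = "Sample values and simulated envelope")
lines(Zi, lower); lines(Zi, upper); lines(Zi, centre, lty = 2)
ks.test(r_obs, "pnorm")                  # Kolmogorov-Smirnov check
```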

[Figure 3. Change in event rate versus change in risk score for men's data together with measurement error standard deviations (crosses) and regression lines: solid line, equation-error; dashed line, no-equation-error; and dotted line, WLS.]

[Figure 4. Normal probability plot and simulated envelope for women's data: (a) no-equation-error and (b) equation-error model.]

Estimates of $\beta_0$ and $\beta_1$ and their standard errors (SE) are summarized in Table II. For these data, there is a slight difference between the values from the observed and expected information matrices. Amongst other assumptions, error variances are considered known. Since they are typically estimated from the data (as in our example), this imposes an additional source of parameter uncertainty. Furthermore, Kulathinal et al. [5] point out that possible bias in the data employed in the estimation of the trends contributes to measurement errors (inflating their variances). As in Bernsen et al. [21], aiming to take care of this, we also present SEs obtained via the bootstrap resampling method [22]. After reestimating $\beta_0$ and $\beta_1$ in 5000 samples of size n drawn from $\{(X_i, \kappa_{xi}, Y_i, \kappa_{yi}), i = 1, \ldots, n\}$ with replacement, bootstrap SEs are computed. Except for the slope ($\beta_1$) in the men's data set, divergences between bootstrap and ML standard errors are minor (they are not larger than 10 per cent).
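A sketch of the bootstrap computation of the SEs in Table II: the quadruples (X_i, κ_xi, Y_i, κ_yi) are resampled with replacement and the model is refitted in each bootstrap sample. The refit is written here with the negloglik/optim route of Section 2.3 purely for illustration; the EM algorithm could be used instead.

```r
## Bootstrap standard errors for (beta0, beta1) in the equation-error model
## (sketch); start is a crude starting value for the refits.
B <- 5000
start <- c(mean(Y), 0, mean(X), var(X) - mean(kx), var(Y) - mean(ky))
boot_est <- matrix(NA_real_, B, 2)
for (b in seq_len(B)) {
  idx <- sample.int(length(X), replace = TRUE)   # resample quadruples
  fit <- optim(start, negloglik, X = X[idx], Y = Y[idx],
               kx = kx[idx], ky = ky[idx], method = "BFGS")
  boot_est[b, ] <- fit$par[1:2]                  # keep beta0 and beta1
}
boot_se <- apply(boot_est, 2, sd)
```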

According to Kulathinal (personal communication), rechecking the entries of Table III in Kulathinal et al. [5] gives SE($\hat{\beta}_1$) = 0.389 (women's data) and 0.229 (men's data). Referring to the figures quoted in our Table II, these small discrepancies could be ascribed to the different computational implementations.


[Figure 5. Normal probability plot and simulated envelope for men's data: (a) no-equation-error and (b) equation-error model.]

Table II. Parameter estimates and standard errors from ML and bootstrap methods in the equation-error model.

                                      Standard error
Data set   Parameter   Estimate   ML_obs   ML_exp   Bootstrap
Women      β0            0.032     1.064    1.123     1.020
           β1            0.679     0.381    0.407     0.408
Men        β0           -2.080     0.526    0.557     0.499
           β1            0.469     0.243    0.234     0.293


Table III. Results of the null slope test in the equation-error model.

                Test statistic (and descriptive level)
Data set      LR             S_exp          C(α)_exp
Women     2.86 (0.091)   2.90 (0.089)   4.34 (0.037)
Men       3.37 (0.066)   2.86 (0.091)   4.16 (0.041)

Finally we test the hypothesis H_0 in (20). Table III summarizes the statistics that demonstrated good properties in our simulations. Descriptive levels were taken from the chi-square distribution with one degree of freedom. Implications of the tests are in conflict at a significance level of 5 per cent (but not at 1 per cent). Contrary to the LR and S_exp statistics, in both data sets the C(α)_exp test leads to rejecting H_0; nevertheless, the evidence against H_0 is not very strong. Hence, our analysis casts doubt on the impact of the gathered risk scores on coronary event rates. This assertion is in accordance with Kulathinal (personal communication) and Reference [5] (see also the ratios $\hat{\beta}_1/\text{SE}(\hat{\beta}_1)$ in Table II with bootstrap SEs).

5. CONCLUDING REMARKS

We have presented inferential tools for a model with a wealth of applications in medical sciences. In particular, a graphical device for checking the model and hypothesis test procedures come in addition to the techniques in References [5, 9]. The example in Section 4 warns us that the effects of the error in the equation should not be neglected. This issue is also stressed in References [4, 23, 24].

The proposed test statistics can be readily extended to hypotheses other than (15) (linear or nonlinear in $\theta$). Since it is not possible to enunciate a general theoretical result regarding the merits of the tests, as a guidance to the practitioner having to select a statistical test, we suggest a choice based on simulations under conditions that resemble the problem at hand. Outputs such as the ones in Figure 1 and Table I might be valuable in this respect. Modern computers and existing software help bring this task to the practitioner's routine.

Our proposal in Section 2 could be named a classical measurement error model [4], for in (2) the observed covariate X is an unbiased counterpart of the true covariate x. An extended model that accounts for biases in the line of Carroll et al. [4] (and references therein) could be envisioned. A more complete formulation requires more information than is available in the data sets in Section 4. Additional information (replicated observations, for instance) would also make it possible to propose a model in which the variances of the measurement errors are estimated. Such extensions constitute areas for future research.

APPENDIX

The elements of $L_i$ in (13) have the general expression

L_{i\gamma\nu} = -\frac{1}{2} \frac{\partial^2 \log|\Sigma_i|}{\partial\gamma\,\partial\nu} - \frac{1}{2} \frac{\partial^2 Q_i}{\partial\gamma\,\partial\nu} = -\frac{1}{2}(d_{i\gamma\nu} + Q_{i\gamma\nu})


$\gamma, \nu = \beta_0, \beta_1, \mu_x, \sigma^2_x, \sigma^2$. Differentiating once more the expressions in Section 2.1 leads to

d_{i\beta_0\beta_0} = d_{i\beta_0\beta_1} = d_{i\beta_0\mu_x} = d_{i\beta_0\sigma^2_x} = d_{i\beta_0\sigma^2} = d_{i\beta_1\mu_x} = d_{i\mu_x\mu_x} = d_{i\mu_x\sigma^2_x} = d_{i\mu_x\sigma^2} = 0

d_{i\beta_1\beta_1} = 2 c_i \{-2 c_i (\kappa_{yi}+\sigma^2)^{-1}\beta_1^2 + 1\}(\kappa_{yi}+\sigma^2)^{-1}

d_{i\beta_1\sigma^2_x} = 2 c_i^2 \sigma_x^{-4}\beta_1(\kappa_{yi}+\sigma^2)^{-1}, \qquad d_{i\beta_1\sigma^2} = 2 c_i \beta_1(\kappa_{yi}+\sigma^2)^{-2}\{c_i\beta_1^2(\kappa_{yi}+\sigma^2)^{-1} - 1\}

d_{i\sigma^2_x\sigma^2_x} = -\sigma_x^{-8}(\sigma^2_x - c_i)^2, \qquad d_{i\sigma^2_x\sigma^2} = -\sigma_x^{-4} c_i^2(\kappa_{yi}+\sigma^2)^{-2}\beta_1^2

d_{i\sigma^2\sigma^2} = -\{1 - c_i\beta_1^2(\kappa_{yi}+\sigma^2)^{-1}\}^2(\kappa_{yi}+\sigma^2)^{-2}

and

Q_{i\gamma\nu} = \frac{\partial^2 Q_{i1}}{\partial\gamma\,\partial\nu} - 2 c_i \left\{ \frac{\partial Q_{i2}}{\partial\gamma}\frac{\partial Q_{i2}}{\partial\nu} + Q_{i2}\frac{\partial^2 Q_{i2}}{\partial\gamma\,\partial\nu} \right\} - 2 Q_{i2} \left\{ \frac{\partial Q_{i2}}{\partial\gamma}\frac{\partial c_i}{\partial\nu} + \frac{\partial c_i}{\partial\gamma}\frac{\partial Q_{i2}}{\partial\nu} \right\} - Q_{i2}^2 \frac{\partial^2 c_i}{\partial\gamma\,\partial\nu}

where

\frac{\partial^2 Q_{i1}}{\partial\beta_0^2} = 2(\kappa_{yi}+\sigma^2)^{-1}, \quad \frac{\partial^2 Q_{i1}}{\partial\beta_0\,\partial\beta_1} = 2\mu_x(\kappa_{yi}+\sigma^2)^{-1}, \quad \frac{\partial^2 Q_{i1}}{\partial\beta_0\,\partial\mu_x} = 2\beta_1(\kappa_{yi}+\sigma^2)^{-1}

\frac{\partial^2 Q_{i1}}{\partial\beta_0\,\partial\sigma^2_x} = 0, \quad \frac{\partial^2 Q_{i1}}{\partial\beta_0\,\partial\sigma^2} = 2(\kappa_{yi}+\sigma^2)^{-2}(Y_i-\beta_0-\mu_x\beta_1), \quad \frac{\partial^2 Q_{i1}}{\partial\beta_1^2} = 2\mu_x^2(\kappa_{yi}+\sigma^2)^{-1}

\frac{\partial^2 Q_{i1}}{\partial\beta_1\,\partial\mu_x} = -2(\kappa_{yi}+\sigma^2)^{-1}(Y_i-\beta_0-2\mu_x\beta_1), \quad \frac{\partial^2 Q_{i1}}{\partial\beta_1\,\partial\sigma^2_x} = 0

\frac{\partial^2 Q_{i1}}{\partial\beta_1\,\partial\sigma^2} = 2\mu_x(\kappa_{yi}+\sigma^2)^{-2}(Y_i-\beta_0-\mu_x\beta_1), \quad \frac{\partial^2 Q_{i1}}{\partial\mu_x^2} = 2\{\kappa_{xi}^{-1}+\beta_1^2(\kappa_{yi}+\sigma^2)^{-1}\}

\frac{\partial^2 Q_{i1}}{\partial\mu_x\,\partial\sigma^2_x} = 0, \quad \frac{\partial^2 Q_{i1}}{\partial\mu_x\,\partial\sigma^2} = 2\beta_1(\kappa_{yi}+\sigma^2)^{-2}(Y_i-\beta_0-\mu_x\beta_1), \quad \frac{\partial^2 Q_{i1}}{\partial(\sigma^2_x)^2} = \frac{\partial^2 Q_{i1}}{\partial\sigma^2_x\,\partial\sigma^2} = 0

\frac{\partial^2 Q_{i1}}{\partial(\sigma^2)^2} = 2(\kappa_{yi}+\sigma^2)^{-3}(Y_i-\beta_0-\mu_x\beta_1)^2, \quad \frac{\partial^2 Q_{i2}}{\partial\beta_0^2} = 0, \quad \frac{\partial^2 Q_{i2}}{\partial\beta_0\,\partial\beta_1} = -(\kappa_{yi}+\sigma^2)^{-1}

\frac{\partial^2 Q_{i2}}{\partial\beta_0\,\partial\mu_x} = \frac{\partial^2 Q_{i2}}{\partial\beta_0\,\partial\sigma^2_x} = 0, \quad \frac{\partial^2 Q_{i2}}{\partial\beta_0\,\partial\sigma^2} = \beta_1(\kappa_{yi}+\sigma^2)^{-2}, \quad \frac{\partial^2 Q_{i2}}{\partial\beta_1^2} = -2\mu_x(\kappa_{yi}+\sigma^2)^{-1}

\frac{\partial^2 Q_{i2}}{\partial\beta_1\,\partial\mu_x} = -2\beta_1(\kappa_{yi}+\sigma^2)^{-1}, \quad \frac{\partial^2 Q_{i2}}{\partial\beta_1\,\partial\sigma^2_x} = 0, \quad \frac{\partial^2 Q_{i2}}{\partial\beta_1\,\partial\sigma^2} = -(\kappa_{yi}+\sigma^2)^{-2}(Y_i-\beta_0-2\mu_x\beta_1)

\frac{\partial^2 Q_{i2}}{\partial\mu_x^2} = \frac{\partial^2 Q_{i2}}{\partial\mu_x\,\partial\sigma^2_x} = 0, \quad \frac{\partial^2 Q_{i2}}{\partial\mu_x\,\partial\sigma^2} = \beta_1^2(\kappa_{yi}+\sigma^2)^{-2}, \quad \frac{\partial^2 Q_{i2}}{\partial(\sigma^2_x)^2} = \frac{\partial^2 Q_{i2}}{\partial\sigma^2_x\,\partial\sigma^2} = 0

\frac{\partial^2 Q_{i2}}{\partial(\sigma^2)^2} = 2(\kappa_{yi}+\sigma^2)^{-3}\beta_1(Y_i-\beta_0-\mu_x\beta_1), \quad \frac{\partial^2 c_i}{\partial\beta_0^2} = \frac{\partial^2 c_i}{\partial\beta_0\,\partial\beta_1} = \frac{\partial^2 c_i}{\partial\beta_0\,\partial\mu_x} = \frac{\partial^2 c_i}{\partial\beta_0\,\partial\sigma^2_x} = \frac{\partial^2 c_i}{\partial\beta_0\,\partial\sigma^2} = 0

\frac{\partial^2 c_i}{\partial\beta_1^2} = -2 c_i^2\{(\kappa_{yi}+\sigma^2)^{-1} - 4 c_i(\kappa_{yi}+\sigma^2)^{-2}\beta_1^2\}, \quad \frac{\partial^2 c_i}{\partial\beta_1\,\partial\mu_x} = 0

\frac{\partial^2 c_i}{\partial\beta_1\,\partial\sigma^2_x} = -4 c_i^3 \sigma_x^{-4}(\kappa_{yi}+\sigma^2)^{-1}\beta_1, \quad \frac{\partial^2 c_i}{\partial\beta_1\,\partial\sigma^2} = 2 c_i^2(\kappa_{yi}+\sigma^2)^{-2}\beta_1\{1 - 2 c_i(\kappa_{yi}+\sigma^2)^{-1}\beta_1^2\}

\frac{\partial^2 c_i}{\partial\mu_x^2} = \frac{\partial^2 c_i}{\partial\mu_x\,\partial\sigma^2_x} = \frac{\partial^2 c_i}{\partial\mu_x\,\partial\sigma^2} = 0, \quad \frac{\partial^2 c_i}{\partial(\sigma^2_x)^2} = 2 c_i^2 \sigma_x^{-6}(c_i\sigma_x^{-2} - 1)

\frac{\partial^2 c_i}{\partial\sigma^2_x\,\partial\sigma^2} = 2 c_i^3(\kappa_{yi}+\sigma^2)^{-2}\beta_1^2\sigma_x^{-4}

and

\frac{\partial^2 c_i}{\partial(\sigma^2)^2} = 2 c_i^2(\kappa_{yi}+\sigma^2)^{-3}\beta_1^2\{c_i(\kappa_{yi}+\sigma^2)^{-1}\beta_1^2 - 1\}, \quad i = 1, \ldots, n

ACKNOWLEDGEMENTS

Helpful comments from two anonymous reviewers are acknowledged. The authors also thank Dr Kari Kuulasmaa (National Public Health Institute, Finland) for kindly supplying the data for our example in Section 4.

REFERENCES

1. Fuller WA. Measurement Error Models. Wiley: New York, 1987.
2. Cheng CL, Van Ness JW. Statistical Regression with Measurement Error. Arnold: London, 1999.
3. Gustafson P. Measurement Error and Misclassification in Statistics and Epidemiology. Chapman & Hall/CRC Press: Boca Raton, 2004.
4. Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM. Measurement Error in Nonlinear Models: A Modern Perspective (2nd edn). Chapman & Hall/CRC Press: Boca Raton, 2006.
5. Kulathinal SB, Kuulasmaa K, Gasbarra D. Estimation of an errors-in-variables regression model when the variances of the measurement errors vary between the observations. Statistics in Medicine 2002; 21:1089-1101. DOI: 10.1002/sim.1062.
6. Li L, Palta M, Shao J. A measurement error model with a Poisson distributed surrogate. Statistics in Medicine 2004; 23:2527-2536. DOI: 10.1002/sim.1838.
7. Prescott GJ, Garthwaite PH. Bayesian analysis of misclassified binary data from a matched case-control study with a validation sub-study. Statistics in Medicine 2005; 24:379-401. DOI: 10.1002/sim.2000.
8. Prescott GJ, Garthwaite PH. A Bayesian approach to prospective binary outcome studies with misclassification in a binary risk factor. Statistics in Medicine 2005; 24:3463-3477. DOI: 10.1002/sim.2192.
9. Cheng CL, Riu J. On estimating linear relationships when both variables are subject to heteroscedastic measurement errors. Technometrics 2006; 48:511-519. DOI: 10.1198/004017006000000237.
10. Kuulasmaa K, Tunstall-Pedoe H, Dobson A, Fortmann S, Tolonen H, Evans A, Ferrario M, Tuomilehto J for the WHO MONICA Project. Estimation of contribution of changes in classic risk factors to trends in coronary-event rates across the WHO MONICA Project populations. Lancet 2000; 355:675-687.
11. Johnson NL, Kotz S, Balakrishnan N. Continuous Univariate Distributions (2nd edn), vol. 1. Wiley: New York, 1994.
12. Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm (with Discussion). Journal of the Royal Statistical Society, Series B 1977; 39:1-38.
13. Fahrmeir L. A note on asymptotic testing theory for nonhomogeneous observations. Stochastic Processes and their Applications 1988; 28:267-273.
14. Cox DR, Hinkley DV. Theoretical Statistics. Chapman & Hall: London, 1974.
15. Gourieroux C, Monfort A. Statistics and Econometric Models, vol. 2. Cambridge University Press: New York, 1995.
16. Bera AK, Bilias Y. Rao's score, Neyman's C(α) and Silvey's LM tests: an essay on historical developments and some new results. Journal of Statistical Planning and Inference 2001; 97:9-44.
17. Doornik JA. Object-oriented Matrix Programming Using Ox (3rd edn). Timberlake Consultants Press: London, 2002.
18. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing: Vienna, Austria, 2008.
19. Bera AK, Bilias Y. Alternative forms and properties of the score test. Journal of Applied Statistics 1986; 13:13-25.
20. Atkinson AC. Plots, Transformations, and Regression. Oxford University Press: Oxford, 1985.
21. Bernsen RMD, Tasche MJA, Nagelkerke NJD. Some notes on baseline risk and heterogeneity in meta-analysis. Statistics in Medicine 1999; 18:233-237.
22. Efron B, Tibshirani RJ. An Introduction to the Bootstrap. Chapman & Hall: New York, 1993.
23. Oman SD, Meir N, Haim N. Comparing two measures of creatinine clearance: an application of errors-in-variables and bootstrap techniques. Applied Statistics 1999; 48:39-52.
24. Dunn G. Statistical Evaluation of Measurement Errors: Design and Analysis of Reliability Studies (2nd edn). Arnold: London, 2004.
