GOODNESS-OF-FIT PROCESSES for LOGISTIC REGRESSION:

SIMULATION RESULTS

D.W. Hosmer# and N.L. Hjort+

Department of Biostatistics and Epidemiology, University of Massachusetts, Amherst, MA,

U.S.A.#

Department of Mathematics and Statistics, University of Oslo, Oslo Norway+

Abstract

In this paper we build on the simulation results in Hosmer, Hosmer, Le Cessie and

Lemeshow (1997) and use new theoretical work in Hjort and Hosmer (2000) on weighted

goodness-of-fit processes. We compare the performance of the new weighted goodness-of-fit

processes statistics, the Hosmer-Lemeshow decile of risk statistic, the Pearson chi square and

unweighted sum-of-squares statistic. By considering different weights and grouping strategies

we consider up to 24 different test statistics. The simulations demonstrate that, in all but a few

exceptions, the statistics had the correct size. An examination of the performance of the tests

when the correct model has a quadratic term but a model containing only the linear term has

been fit shows that all tests have power close to or exceeding 50% to detect moderate departures

from linearity when the sample size is 100 and have power over 90% for these same alternatives

for samples of size 500. All tests had low power with sample size 100 when the correct model

had an interaction between a dichotomous and continuous covariate but the model containing the

continuous and dichotomous covariate was fit. Power exceeded 80 percent to detect extreme

interaction with a sample size of 500. Power to detect an incorrectly specified link was poor for

samples of size 100 and for most settings for sample size 500. Only with a sample size of 500

and an extremely asymmetric link function did power exceed 80 percent. The picture that

emerges from these simulations is that no one statistic or class of statistics performed markedly

better in all settings. However, one of the new optimally weighted tests based on the omitted

covariate had power comparable to other tests in all settings and had the highest power in the

difficult setting of an omitted interaction term. We illustrate the tests within the context of a

model for factors associated with low birth weight. We conclude the paper with specific

recommendations for practice.

Keywords: residuals, generalized linear models, chi-square tests, goodness-of-fit tests

An Example of Some Problems in Using Overall Goodness-of-Fit Tests

To illustrate some of the problems with currently available tests assessing overall

goodness-of-fit we present the results of the fit of a model using the low birth weight data from

Hosmer and Lemeshow (2000). The outcome variable was whether or not birth weight was less

than 2500 grams. Data were collected on 189 births of which 59 were low birth weight and 130

were normal birth weight. Our purpose is to illustrate problems with assessing model fit rather

than to provide a definitive analysis of these data. The independent variables used in this

example are age of the mother (AGE), weight of the mother at the last menstrual period (LWT),

race of the mother (white, black, or other, coded into two design variables using white race as

the referent group (RACE_2, RACE_3)) and whether or not the mother smoked (SMOKE, 1 = yes, 0 = no).

To avoid differences between packages when ties are present in the estimated

probabilities we jittered AGE and LWT by adding the value of an independent U(-0.5, 0.5)

variate. The jittered variables are denoted in this paper as AGEj and LWTj. We show in Table 1 the

results of fitting this logistic regression model. We note that the jittered data are different from

the jittered data used in Hosmer et al. (1997), so the fitted model in this paper is slightly different

from their model. We present the values of currently available goodness-of-fit tests computed

from a few widely used software packages in Table 2. We include the p-value for the Pearson

chi-square computed using the normal approximation as well as the unweighted sum-of-squares

statistic and its p-value computed using the normal approximation. The latter two statistics

emerged from the work of Hosmer et al. (1997) as having reasonable power among the tests

examined. The p-values are calculated using the normal approximation described in Hosmer et

al. (1997) and Hosmer and Lemeshow (2000).

The fitted model shown in Table 1 contains variables known to be important risk factors

for low birth weight. Mother's age, although not significant, was retained in the model because

of its known clinical significance. All five packages mentioned in Table 2 obtained the same

estimated coefficients and estimated standard errors.

The p-values for the goodness-of-fit statistics presented in Table 2 highlight current

problems in trying to interpret summary tests of goodness-of-fit from packaged programs. First,

the p-value for the Pearson chi-square statistic obtained from a chi square distribution with 183

degrees-of-freedom is, in this case, meaningless as it is based on a contingency table with

estimated expected cell frequencies that are all less than one. In Table 2 we also show p-values

computed using asymptotic normal approximations to the distribution of the Pearson chi square

and the unweighted sum-of-squares. The unweighted sum-of-squares test provides some

evidence of lack of model fit as p = 0.084. Second, we obtain three different values of the

Hosmer-Lemeshow goodness-of-fit statistic based on grouping subjects into deciles of risk.

Three packages produce the same statistic with p = 0.229, one has p = 0.111 and one has p = 0.041. The problem is that the packages all use slightly different algorithms to select cutpoints

that define the deciles. The results in Table 2 show the inherent difficulties in the use of

these tests. The outcome is dichotomous and the tests based on groups are sensitive to choice of

groups. The results in Table 2 show that, even with a relatively large sample, moving a positive or

negative outcome from one group to another can have a pronounced effect on the magnitude of

the test. The non-cutpoint tests have p-values based on asymptotic results that require large

sample sizes to hold.

Currently Used Overall Goodness-of-Fit Tests

The addition of goodness-of-fit tests and logistic regression diagnostic statistics to

statistical software packages has made the once difficult task of using these methods to assess the

adequacy of a fitted logistic regression model a routine step in the model building process. Any

analysis should incorporate a thorough examination of logistic regression diagnostics before

reaching a final decision on model adequacy. We do not wish to understate the importance of

the use of these statistics; but the focus of this paper is on overall goodness-of-fit tests.

We begin by setting the notation used to describe the model. Assume we are in the strictly binary case and observe n independent pairs $(x_i, y_i)$, $i = 1, \ldots, n$, where

$x_i' = (x_{0i}, x_{1i}, \ldots, x_{pi})$, $x_{0i} = 1$, denotes a vector of p + 1 assumed fixed covariates for the ith subject

and $y_i = 0, 1$ denotes an observation of the outcome random variable $Y_i$. Under the logistic

regression model we assume that $P(Y_i = 1 \mid x_i, \beta) = \pi(x_i, \beta)$, where $\pi(x) = e^{\tau(x,\beta)} / (1 + e^{\tau(x,\beta)})$,

and the logit transformation is $\tau(x_i, \beta) = x_i'\beta$. Parameter estimates are usually obtained by

maximum likelihood and are denoted by $\hat\beta' = (\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_p)$. We denote the fitted values as

$\hat\pi_i = \pi(x_i, \hat\beta)$.

As noted by Hosmer et al. (1997) the process of examining a model's goodness-of-fit has

several facets. Namely one should determine if the fitted model's residual variation is small,

displays no systematic tendency and follows the distribution postulated by the model.

The components of fit in a logistic regression model are specified by the following three

assumptions:

(A1) the logit transformation is the correct function linking the covariates with the

conditional mean, $\mathrm{logit}[\pi(x)] = x'\beta$,

(A2) the linear predictor, $x'\beta$, is correct (We do not need to include additional variables,

transformations of variables, or interactions of variables.),

and

(A3) the variance is Bernoulli, $\mathrm{var}(Y_i \mid x_i) = \pi(x_i)[1 - \pi(x_i)]$.

Evidence of lack-of-fit may come from a violation of one or more of these three characteristics.

We may assess model fit at a number of stages in the modeling process. We could use it

as an aid in model development where our goal is to find violations primarily in (A2) and/or to

verify that a "final" model does fit where the emphasis is more towards examining (A1) and

(A3). In the case of a logistic regression model we are faced with the practical problem that

assumptions A1-A3 are not mutually exclusive. Specifically, assumption A3 may be confounded

with A1 and/or A2. If we violate A2 and misspecify the linear predictor then the model-based

estimate of the variance is also incorrect. Similarly if we have the incorrect link function, with

or without linear predictor misspecification, then the model-based estimate of the variance is also

incorrect.

A useful conceptual framework for thinking about assessment of model fit is to consider

the data as described by a 2 x n contingency table. The two rows are defined by the values of

the dichotomous outcome variable y and the n columns by the assumed number of possible

distinct values taken on by the p non-constant covariates in the model. The replicated design

occurs when there are fewer than n distinct values (patterns) of the covariates. The likelihood

ratio D (Deviance) and Pearson chi-square, $X^2$, statistics that compare observed values to those

predicted by the fitted logistic regression model in the 2 x n table are

$$D = -2 \sum_{i=1}^{n} \left[ y_i \ln(\hat\pi_i) + (1 - y_i) \ln(1 - \hat\pi_i) \right] \quad \text{and} \quad X^2 = \sum_{i=1}^{n} \frac{(y_i - \hat\pi_i)^2}{\hat\pi_i (1 - \hat\pi_i)}.$$

Evidence for model lack-of-fit occurs when the values of these statistics are large. Towards this

end, many packages provide a p-value computed using the $\chi^2(n - p - 1)$ distribution. For the

situation considered in this paper, the strictly binary case, this p-value is not useful. For the

p-value to be a valid measure of model fit the number of columns in the table must be fixed and the

sample size large enough that the estimated expected values in the table all exceed some

minimum number such as five. Hosmer and Lemeshow (2000, Chapter 5) discuss grouping

subjects into groups of equal size based on the ranked estimated logistic probabilities. The

statistic, based on 10 equal sized groups (called "deciles of risk"), is denoted $\hat{C}$, and is currently

computed in most statistical packages. Hosmer and Lemeshow (1980) and Hosmer et al. (1997)

showed, via simulations, that when the logistic regression model is correct, assumptions A1-A3

hold, and the estimated expected values are "large" in all cells, the distribution of $\hat{C}$ with g groups is well approximated by the chi-square distribution with g - 2 degrees-of-freedom,

$\chi^2(g - 2)$.
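
As a minimal sketch of the grouping computation, assuming the fitted probabilities are already in hand, the Python code below ranks subjects by fitted probability, splits them into g groups of nearly equal size and sums the squared differences between observed and expected counts, each divided by the approximate group variance $n_j \bar\pi_j (1 - \bar\pi_j)$. The function names and the integer rule used to place the cutpoints are illustrative assumptions and do not reproduce any particular package. The tail probability routine is a closed form that holds only for even degrees of freedom, which covers the case g = 10.

```python
import math

def hosmer_lemeshow(y, p_hat, g=10):
    """Decile-of-risk type statistic from g groups of nearly equal size.

    The cutpoint rule (integer split of the ranked sample) is an assumption;
    other rules move subjects between groups and change the result.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: p_hat[i])  # rank by fitted probability
    stat = 0.0
    for j in range(g):
        idx = order[(j * n) // g:((j + 1) * n) // g]
        n_j = len(idx)
        o_j = sum(y[i] for i in idx)        # observed number of events
        e_j = sum(p_hat[i] for i in idx)    # expected number of events
        p_bar = e_j / n_j                   # mean fitted probability in group j
        stat += (o_j - e_j) ** 2 / (n_j * p_bar * (1.0 - p_bar))
    return stat

def chi2_sf_even(x, df):
    """Upper tail probability of the chi-square distribution, even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i                    # (x/2)^i / i!
        total += term
    return math.exp(-half) * total

# Usage with g = 10 groups and the chi-square(g - 2) reference distribution:
# c_hat = hosmer_lemeshow(y, p_hat, g=10)
# p_value = chi2_sf_even(c_hat, 10 - 2)
```

Changing the slice that defines `idx` is enough to move a subject from one group to its neighbour, which is the kind of difference between packages that produces different values of the statistic for the same fitted model.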

Based on the simulation results in Hosmer et al. (1997), Hosmer and Lemeshow (2000)

recommend that in addition to the decile of risk statistic, $\hat{C}$, one use $X^2$ with the p-value

computed using a normal approximation to its distribution derived by Osius and Rojek (1992).

The mean of the approximating normal distribution is the model degrees-of-freedom, n - p - 1,

adjusted by a correction factor described in Hosmer et al. (1997). The estimate of the variance

is calculated as the residual sum-of-squares from the linear regression of $(1 - 2\hat\pi_i)/v_i$ on $x_i$ with weights $v_i$, where $v_i = \hat\pi_i (1 - \hat\pi_i)$. In addition, we consider in this paper the unweighted

sum-of-squares statistic $S = \sum_{i=1}^{n} (y_i - \hat\pi_i)^2$ with a p-value computed using a normal

approximation to its distribution, also derived by Osius and Rojek (1992). The mean of the

distribution is $\sum_{i=1}^{n} v_i$ and the estimate of the variance is the residual sum-of-squares from the

linear regression of $(1 - 2\hat\pi_i)$ on $x_i$ with weights $v_i$. Hosmer and Lemeshow (2000) also suggest

using a score test for alternative link functions proposed by Stukel (1988). In this paper rather

than using Stukel's test we consider several tests incorporating the covariate that forms the basis

of her test.
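
The normal approximation for S described above can be sketched in a few lines. The Python code below is an illustration only, under three stated assumptions: the model has an intercept and a single covariate (so the weighted regression has a closed form), the fitted probabilities are supplied by the caller, and the p-value is two-sided. The function name is hypothetical.

```python
import math

def unweighted_ss_test(x, y, p_hat):
    """Normal approximation for S = sum (y_i - p_i)^2, one covariate plus intercept.

    Mean of S: sum of v_i = p_i (1 - p_i).
    Variance of S: residual sum-of-squares from the weighted regression of
    c_i = 1 - 2 p_i on (1, x_i) with weights v_i.
    """
    v = [p * (1.0 - p) for p in p_hat]
    c = [1.0 - 2.0 * p for p in p_hat]
    s_stat = sum((yi - pi) ** 2 for yi, pi in zip(y, p_hat))
    mean_s = sum(v)

    # Weighted simple linear regression of c on x with weights v
    sw = sum(v)
    x_bar = sum(vi * xi for vi, xi in zip(v, x)) / sw
    c_bar = sum(vi * ci for vi, ci in zip(v, c)) / sw
    sxx = sum(vi * (xi - x_bar) ** 2 for vi, xi in zip(v, x))
    sxc = sum(vi * (xi - x_bar) * (ci - c_bar) for vi, xi, ci in zip(v, x, c))
    slope = sxc / sxx
    intercept = c_bar - slope * x_bar
    rss = sum(vi * (ci - intercept - slope * xi) ** 2
              for vi, xi, ci in zip(v, x, c))

    z = (s_stat - mean_s) / math.sqrt(rss)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided (assumption)
    return s_stat, z, p_value
```

With more than one covariate the same steps apply, with the closed-form regression replaced by a general weighted least squares fit.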

Goodness-of-Fit Processes for Logistic Regression

Hjort and Hosmer (2000) consider generalized weighted goodness of fit tests that have

their foundation in statistical process theory. The tests are similar in spirit to tests proposed by

Su and Wei (1991) and Royston (1992). The main building block of these tests is the process

(1)

where $I(x_i'\hat\beta \le \tau) = 1$ if $x_i'\hat\beta \le \tau$ and 0 otherwise. We consider two types of tests based on the

process in (1).

One test is a weighted version of the Hosmer-Lemeshow decile of risk that results from

considering g values obtained by summing $W_n(\tau)$ over the respective g ordered risk groups,

(2)

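where the cutpoints $\tau_j$, j = 1,2,...,g define the risk groups. Specifically the cutpoint $\tau_j$ is such

that the (n x j/g)th largest fitted value is $\hat\pi_{(n \times j/g)} = \exp(\tau_j)/(1 + \exp(\tau_j))$, j = 1,2,...,g - 1 with

$\tau_0 = -\infty$ and $\tau_g = \infty$. The right hand side of the expression in (2) is of the form $(1/n) \times (o_j - e_j)$ where

$$o_j = \sum_{i=1}^{n} w(x_i, \hat\beta) I(\tau_{j-1} < x_i'\hat\beta \le \tau_j) y_i$$

and

$$e_j = \sum_{i=1}^{n} w(x_i, \hat\beta) I(\tau_{j-1} < x_i'\hat\beta \le \tau_j) \hat\pi_i,$$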

for j = 1,2,...,g. Hjort and Hosmer (2000) show that the estimator of the limiting covariance

matrix of the g-vector of sums is of the form $\hat\Omega = \hat{D} - \hat{B}' \hat{J}^{-1} \hat{B}$. In order to simplify the notation

some we use $I_j = \{i : I(\tau_{j-1} < x_i'\hat\beta \le \tau_j) = 1\}$ to denote the indices of the subjects whose fitted values are

in the jth risk group. The matrix $\hat{D}$ is g x g diagonal with jth diagonal element

The matrix $\hat{B}'$ is g x (p+1) with the jth row defined by the vector

The matrix $\hat{J}$ is the observed information matrix scaled by n, namely $\hat{J} = (1/n)(X'VX)$ where V is n x n diagonal with ith diagonal element $v_i$ and X is the n x (p + 1) data matrix. Hjort and

Hosmer (2000) suggest as a goodness-of-fit test the statistic

(3)

with limiting null distribution $\chi^2(df)$ with df = rank($\hat\Omega$), where $\hat\Omega^-$ denotes a generalized inverse

of $\hat\Omega$. Hjort (1990, p. 1234) shows that the generalized inverse is of the form $\hat\Omega^- = \hat{D}^{-1} + \hat{D}^{-1} \hat{B}' \hat{G}^- \hat{B} \hat{D}^{-1}$, where $\hat{G}^-$ is a generalized inverse of $\hat{G} = \hat{J} - \hat{B} \hat{D}^{-1} \hat{B}'$. Substituting

these expressions into (3) and simplifying yields that the test statistic is

(4)

The right hand side of equation (4) contains two parts. The first part is essentially a weighted

version of the Hosmer-Lemeshow goodness-of-fit test. If we use $w(x_i, \hat\beta) = 1$ then the difference

between the two tests is that an estimator of the exact variance of the sum is used in (4) whereas

Hosmer and Lemeshow (2000) approximate it by $n_j \bar\pi_j (1 - \bar\pi_j)$ where $\bar\pi_j = \frac{1}{n_j} \sum_{i \in I_j} \hat\pi_i$ and

$n_j = \sum_{i \in I_j} 1$. In the simulations we examine the use of the first part of (4) and denote it as

(5)

Based on the work of Hosmer and Lemeshow (1980) and Hosmer et al. (1997) we calculate

p-values using the $\chi^2(g - 2)$ distribution. Each of the weight functions we consider in this paper

yields a $\hat{G}$ matrix of full rank. Thus we use the $\chi^2(g)$ distribution to compute p-values for the

limiting null distribution of $X^2_w$.

The second type of test we consider is based on the maximum of the absolute values of

the terms in equation (1). Specifically we let

$W_w = \max_i |W_n(\hat\tau_i)|$. (6)

In equation (6) we take advantage of the fact that the value of (1) changes at the observed values,

$\hat\tau_i$. To obtain a p-value we use the simulation approach suggested by Su and Wei (1991). The

procedure is as follows:

1. Generate a random sample of new outcomes, $y_i^*$, i = 1,2,...,n using the fitted values $\hat\pi_i$, e.g.

$y_i^* = 1$ if $u_i \le \hat\pi_i$ and $y_i^* = 0$ otherwise, where $u_i \sim U(0,1)$.

2. Fit the model using the data $(y_i^*, x_i)$, i = 1,2,...,n to obtain $\hat\pi^*$ and $\hat\beta^*$.

3. Calculate a new value of the test statistic in (6) using $\hat\tau^* = x'\hat\beta^*$, $W_w^*$.

4. Repeat steps 1 - 3 m = 1, 2,...,M times.

5. Calculate the p-value as $p = \frac{1}{M} \sum_{m=1}^{M} I(W_{w,m}^* \ge W_w)$.
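
The five steps above can be sketched as follows. This Python code is a hedged illustration, not the implementation used for the simulations: it assumes an intercept and one covariate, fits the model with a plain Newton-Raphson routine, and uses the unweighted case w = 1. Because the display for the process is not reproduced in this text, the statistic is taken to be the largest absolute partial sum of residuals ordered by the estimated logit, scaled by $1/\sqrt{n}$, which is an assumption; the scaling cancels in the p-value. All function names are hypothetical.

```python
import math
import random

def expit(t):
    t = max(-30.0, min(30.0, t))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-t))

def fit_logistic(x, y, iters=25):
    """Newton-Raphson maximum likelihood for logit = b0 + b1 * x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        p = [expit(b0 + b1 * xi) for xi in x]
        g0 = sum(yi - pi for yi, pi in zip(y, p))                # score, intercept
        g1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))  # score, slope
        v = [pi * (1.0 - pi) for pi in p]
        a = sum(v)
        b = sum(vi * xi for vi, xi in zip(v, x))
        c = sum(vi * xi * xi for vi, xi in zip(v, x))
        det = a * c - b * b
        if det < 1e-12:          # information matrix nearly singular: stop
            break
        b0 += (c * g0 - b * g1) / det
        b1 += (a * g1 - b * g0) / det
    return b0, b1

def max_abs_partial_sum(x, y, b0, b1):
    """Largest |partial sum of residuals| over subjects ordered by logit (w = 1).

    Ties in the estimated logit are not grouped; with a continuous covariate
    they do not occur.
    """
    tau = [b0 + b1 * xi for xi in x]
    order = sorted(range(len(y)), key=lambda i: tau[i])
    running, best = 0.0, 0.0
    for i in order:
        running += y[i] - expit(tau[i])
        best = max(best, abs(running))
    return best / math.sqrt(len(y))

def simulated_p_value(x, y, m_sims=80, seed=0):
    rng = random.Random(seed)
    b0, b1 = fit_logistic(x, y)
    w_obs = max_abs_partial_sum(x, y, b0, b1)
    p_hat = [expit(b0 + b1 * xi) for xi in x]
    exceed = 0
    for _ in range(m_sims):
        y_star = [1 if rng.random() <= pi else 0 for pi in p_hat]  # step 1
        b0s, b1s = fit_logistic(x, y_star)                         # step 2
        w_star = max_abs_partial_sum(x, y_star, b0s, b1s)          # step 3
        if w_star >= w_obs:
            exceed += 1
    return exceed / m_sims                                         # step 5
```

Each of the M simulated data sets requires a full refit of the model, which is why the choice of M matters for the cost of a simulation study.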

The statistics in equations (4), (5) and (6) each involve a weight function, generically

denoted by w in the notation. As mentioned, albeit briefly, one choice of a weight function is

simply to use no weight, that is $w(x_i, \hat\beta) = 1$. The advantage of this weight function is that the

computations are simpler. As we noted above in this case we expect $HL_1 = \hat{C}$.

We derive in Hjort and Hosmer (2000) optimal weight functions in the sense that they

maximize the power of the tests to detect a particular type of alternative to the null model.

Weight functions are obtained for a missing covariate from the model and for a one parameter

generalization of the logistic model. The basic form of each weight function is the same. The

n-vector of weights is

w = (I - H)z, (7)

where H is the logistic regression hat matrix, $H = X(X'VX)^{-1}X'V$, and z is the n-vector of values

of the "omitted" covariate. The weight function for a specific omitted covariate uses z equal to

the values of the covariate. If we are trying to detect a departure from the null model due to the

omission of a quadratic term then z contains the values of the square of the particular continuous

covariate. As another example suppose we are trying to detect a departure from the null model

due to the omission of an interaction between a continuous and dichotomous covariate then z

contains the values of the product of the continuous and dichotomous covariates.

The one-parameter generalization we consider in Hjort and Hosmer (2000) is

(8)

It follows from equation (8) that if r = 0 then the generalized model is equal to the logistic

model. The omitted covariate for this type of model departure uses the values of $z_i = \hat\pi_i \ln(\hat\pi_i)$, i = 1,2,...,n. The form of this covariate is similar to one discussed in Cook and

Weisberg (1982, page 73) to assess departures from linearity in normal errors linear regression.

To our knowledge this transformation has never been used in logistic regression to assess overall

model adequacy. Cook and Weisberg (1982, page 73) discuss use of the square of the linear

model as a covariate to detect model departure from linearity. They note that in the linear

regression setting it is equivalent to Tukey's one degree-of-freedom for additivity test. Pregibon

(1984) uses it in the context of assessing model adequacy for the 1-1 matched pairs logistic

model. Stukel (1988) uses a signed version in her two degree-of-freedom test. In the same spirit

we consider weights using as the omitted covariate the values of $z_i = (x_i'\hat\beta)^2$, i = 1,2,...,n.

To compute the weights one needs to evaluate the expression in equation (7). We

consider two forms. The first and computationally simplest is to ignore the off diagonal

elements of the hat matrix, H, and use only the leverage values, yielding approximate optimal

weights, $w_h(x_i, \hat\beta) = (1 - h_i) z_i$. These are quite easy to calculate as the leverage values, $h_i$, are

routinely available from software packages following the fit of a logistic model. The second

approach uses the fact that the weights in equation (7) are the residuals from the weighted linear

regression of z on x with weights v, the n-vector with general element $v_i$. Another way to

describe the weights is they are the "x" components for the added variable plot in linear

regression. These weights are a bit more work to calculate in that one must fit a regression and

save/compute the residuals. However, linear regression programs are quite fast and this step

does not add a huge burden to the computation. Recall that to compute the p-value for the test

statistic in equation (6) one has to do all the computations M times.
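
A small Python sketch may help fix the two forms of the weights. It assumes an intercept and a single covariate, so that the 2 x 2 matrix $X'VX$ can be inverted in closed form, and it takes the fitted probabilities as given; the function name and the example values are hypothetical. The first returned vector holds the residuals $(I - H)z$ from the weighted regression of z on x, and the second holds the leverage-based approximation $(1 - h_i) z_i$.

```python
import math

def omitted_covariate_weights(x, z, p_hat):
    """Exact and approximate weights for one covariate x plus an intercept."""
    v = [p * (1.0 - p) for p in p_hat]
    a = sum(v)
    b = sum(vi * xi for vi, xi in zip(v, x))
    c = sum(vi * xi * xi for vi, xi in zip(v, x))
    det = a * c - b * b                       # determinant of X'VX

    # Weighted least squares of z on (1, x) with weights v
    r0 = sum(vi * zi for vi, zi in zip(v, z))
    r1 = sum(vi * xi * zi for vi, xi, zi in zip(v, x, z))
    g0 = (c * r0 - b * r1) / det
    g1 = (a * r1 - b * r0) / det
    exact = [zi - g0 - g1 * xi for xi, zi in zip(x, z)]   # (I - H) z

    # Leverages: diagonal elements of H = X (X'VX)^{-1} X'V
    h = [vi * (c - 2.0 * b * xi + a * xi * xi) / det for vi, xi in zip(v, x)]
    approx = [(1.0 - hi) * zi for hi, zi in zip(h, z)]    # (1 - h_i) z_i
    return exact, approx

# Example: weights aimed at an omitted quadratic term, z = x^2
x = [-2.0, -1.0, -0.5, 0.7, 1.5, 2.5]
p_hat = [1.0 / (1.0 + math.exp(-0.8 * xi)) for xi in x]  # stand-in fitted values
z = [xi * xi for xi in x]
w_exact, w_approx = omitted_covariate_weights(x, z, p_hat)
```

The exact weights are orthogonal to the columns of X under the weights v, a property the leverage-based approximation does not share; this is the price paid for avoiding the extra regression.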

The collection of previously used tests and new tests using different omitted covariates

and two forms of the weight function leads to 18 possible statistics to simulate when we do not

have a specific model omitted covariate, e.g. $z = x^2$. These are listed in the first 18 rows of

Table 3. The addition of a model specific covariate when looking at specific alternative models

leads to six more tests. These are listed in rows 19 to 24 in Table 3. We use the notation in the

second column of Table 4 in subsequent tables.

Simulation Results

We used simulations to study the properties of the goodness-of-fit tests listed in Table 3.

The goal was to assess the adequacy of the proposed null distribution of the statistics when the

fitted logistic model was the correct model and to assess the power of the tests to detect a variety

of departures from the logistic model. We performed all simulations using STATA 6.0.

Null Distribution

We considered a number of different situations to examine the performance of the tests

when the logistic model fit was the correct model. The settings we examined are similar to those

used in Hosmer et al. (1997). We chose the various distributions of the covariate to produce

distributions of probabilities in the (0,1) interval that one might encounter in practice. We

present in Table 5 the distribution of the covariate(s) and the true coefficients for the logistic

model. In addition we provide the minimum, maximum and the three quartiles of the resulting

distribution of the logistic probabilities for a sample of size 100. The Uniform distribution on

the (-6,6) interval, U(-6,6), produces a symmetric distribution with mostly small or large

probabilities, while the U(-1,1) produces probabilities mostly in the center of the (0,1) interval. A

highly right-skewed distribution (mostly small but a few large probabilities) results when the

covariate has the $\chi^2(4)$ distribution. The Normal-Bernoulli model was chosen to represent the

type of data one might typically encounter in practice, a mix of correlated continuous and

dichotomous covariates.

In all simulations we first generated a sample of size n = 100 or 500 values of the

covariate(s) and then we generated the outcome variable by comparing an independently

generated U(0,1) variate, u, to the true logistic probability using the rule y = 1 if $u \le \pi(x)$ and

y = 0 otherwise. In all settings we used 500 replications.
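
The data generation rule can be illustrated with a short Python sketch. The covariate distribution U(-6,6) and the sample size of 100 come from the settings described above; the coefficients (0 and 0.8), the seed and the function name are placeholders chosen for illustration, since the actual coefficients are given in the table of settings.

```python
import math
import random

def simulate_outcomes(x, beta0, beta1, rng):
    """Draw y_i = 1 if u_i <= pi(x_i) and 0 otherwise, with u_i ~ U(0, 1)."""
    y = []
    for xi in x:
        pi_x = 1.0 / (1.0 + math.exp(-(beta0 + beta1 * xi)))  # true probability
        u = rng.random()
        y.append(1 if u <= pi_x else 0)
    return y

rng = random.Random(2024)                         # fixed seed for reproducibility
x = [rng.uniform(-6.0, 6.0) for _ in range(100)]  # covariate from U(-6, 6), n = 100
y = simulate_outcomes(x, 0.0, 0.8, rng)           # placeholder coefficients
```

One replication of the study consists of one such sample, followed by the model fit and the computation of every test statistic.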

The computation of the p-value for the partial sum-of-residuals tests requires M simulated

values of the statistics. We performed some preliminary simulations to study the effect of the

choice of M on the accuracy of the estimate of the size of the tests over 500 replications. We

compared the results for M = 20,40,80 and 160. The results indicated that the empirical alpha

levels were unstable using 20 or 40 simulations. The results for M = 80 and M = 160 were

stable and similar. Thus we chose to use 80 simulations for each replication of the study.

We present in Table 5 the percent of time each of the first 18 statistics denoted in Table 3

rejected the hypothesis of fit at the $\alpha = 0.05$ level. These empirical alpha levels are plotted

versus the setting number in Figure 1. The plot shows that, with only a few exceptions, the empirical

rejection percents are within two percent of the desired five percent level of significance. The

two partial sum-of-residuals tests that use $(x'\hat\beta)^2$ as the omitted covariate and the optimally

weighted partial sum-of-residuals tests using $\hat\pi \ln(\hat\pi)$ as the covariate do not reject often enough

in settings 7 and 8. It is not clear exactly why this is the case, but it may be due to the narrow

range in values of the omitted covariate as x ~ U(-1,1). Further investigations into the reasons

for this behavior are planned.

Power

We use the same three settings used by Hosmer et al. (1997) to examine the power of the

tests. These are: the omission of a quadratic term in a continuous variable, the omission of the

interaction of a dichotomous variable and a continuous variable and an incorrectly specified link

function. In all settings studied the distribution of the continuous covariate, x, is U( -3,3). The

distribution of the dichotomous covariate, d, is Bernoulli(1/2) and is independent of the

continuous covariate.

We use five different models to evaluate power with omission of a quadratic term from

the model. We generate the outcome variable using a logistic model with logit $\tau(x, \beta) = \beta_0 + \beta_1 x + \beta_2 x^2$ where the values of the three coefficients are set such that

$\pi(-1.5) = 0.05$, $\pi(3) = 0.95$ and $\pi(-3) = J$ for J = 0.01, 0.05, 0.1, 0.2, and 0.4. The linear

logistic model with $\pi(-1.5) = 0.05$, $\pi(3) = 0.95$ corresponds to a value of J = 0.007. As the J

parameter increases the lack of linearity in the logit function becomes progressively more

pronounced.
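
The three constraints determine the coefficients uniquely, since a quadratic on the logit scale is fixed by its value at three points. The Python sketch below, with hypothetical function names, solves for the coefficients by divided differences and also evaluates the linear model through the first two constraints at x = -3, which gives a probability of about 0.007 as stated above.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def quadratic_coefficients(j_value):
    """Return (b0, b1, b2) such that the quadratic logit satisfies
    pi(-1.5) = 0.05, pi(3) = 0.95 and pi(-3) = j_value."""
    x1, x2, x3 = -1.5, 3.0, -3.0
    l1, l2, l3 = logit(0.05), logit(0.95), logit(j_value)
    d12 = (l2 - l1) / (x2 - x1)          # first divided differences
    d23 = (l3 - l2) / (x3 - x2)
    b2 = (d23 - d12) / (x3 - x1)         # second divided difference
    b1 = d12 - b2 * (x1 + x2)
    b0 = l1 - b1 * x1 - b2 * x1 * x1
    return b0, b1, b2

def linear_model_pi_at_minus3():
    """Probability at x = -3 for the linear logit through the first two points."""
    slope = (logit(0.95) - logit(0.05)) / 4.5
    intercept = logit(0.95) - 3.0 * slope
    return expit(intercept - 3.0 * slope)

# Coefficients for each of the five quadratic settings
settings = {j: quadratic_coefficients(j) for j in (0.01, 0.05, 0.1, 0.2, 0.4)}
```

The coefficient of the squared term is positive in all five settings and grows with J, which is one way to see that the departure from linearity becomes more pronounced.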

We use four different interaction models to study the power with omission of the

dichotomous-continuous interaction term from the model. We generated the outcome variable from a model with logit $\tau(x, d, \beta) = \beta_0 + \beta_1 x + \beta_2 d + \beta_3 xd$. The four parameters are set such that

$\pi(-3,0) = 0.1$, $\pi(-3,1) = 0.1$, $\pi(3,0) = 0.2$ and $\pi(3,1) = 0.2 + I$ where I = 0.1, 0.3, 0.5, 0.7. Thus

the four models display progressively more interaction.
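
The four constraints again fix the coefficients, because the model is a separate straight line on the logit scale for each value of d and each line is pinned at x = -3 and x = 3. A brief Python sketch, with hypothetical function names, recovers the coefficients for a given value of I.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def expit(t):
    return 1.0 / (1.0 + math.exp(-t))

def interaction_coefficients(i_value):
    """Return (b0, b1, b2, b3) such that pi(-3,0) = pi(-3,1) = 0.1,
    pi(3,0) = 0.2 and pi(3,1) = 0.2 + i_value."""
    low, high0, high1 = logit(0.1), logit(0.2), logit(0.2 + i_value)
    b0 = (low + high0) / 2.0         # intercept of the line for d = 0
    b1 = (high0 - low) / 6.0         # slope of the line for d = 0
    b2 = (low + high1) / 2.0 - b0    # shift in intercept for d = 1
    b3 = (high1 - low) / 6.0 - b1    # change in slope for d = 1
    return b0, b1, b2, b3

def prob(x, d, coef):
    b0, b1, b2, b3 = coef
    return expit(b0 + b1 * x + b2 * d + b3 * x * d)

coef = interaction_coefficients(0.5)  # third interaction setting, I = 0.5
```

Because the two lines meet at x = -3, the interaction coefficient carries the entire difference between the two groups, and it increases with I.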

We examine five different models to assess the power to detect an incorrectly specified

link function. We generate the values of the outcome variable from Stukel's generalized logistic

model using the function $\eta(x) = 0.8x$ as the linear predictor and values of the parameters $\alpha_1$ and

$\alpha_2$ as specified in Table 6. Stukel (1988) noted that if $\alpha_1 = \alpha_2 = 0.165$ then the resulting

generalized logistic model has nearly the same shape as the probit model. The model with

$\alpha_1 = 0.62$ and $\alpha_2 = -0.037$ has the same shape as the complementary log-log model. We chose

the remaining three situations to yield one model with both tails longer, one model with both

tails shorter and an asymmetric model with one tail longer and one tail shorter than the

logistic model.

The situations we use to examine the power of the tests were chosen to represent typical

logistic regression models encountered in practice. The combination of two sample sizes, 100

and 500, and the various models examined yields results that further our understanding of what

types of departures from a linear logistic model the various tests can detect with moderate to high

power.

We present in Table 7 the percent of time each of the 24 tests denoted in Table 3 rejected

the hypothesis of fit at the $\alpha = 0.05$ level.

The results for the quadratic model are presented in Table 7.1 and are plotted versus

setting in Figure 2. We see that the power is, as expected, poor when trying to detect models that

are quite close to the logistic. As the departure from linearity in the logit increases, the power

increases rapidly. High power is attained for samples of size 100 in those settings where there

are substantial differences over the entire [0,1] interval between the true quadratic model and the

fitted linear model, settings 7 and 9 in Figure 2 where J = 0.20 and J = 0.40. In setting 4 where

J = 0.05 and the sample size is 500 the power is around 80 percent for most tests. In settings 6,

8 and 10 where the sample size is 500 and $J \ge 0.1$ the power is over 90 percent for all tests. In

settings 5 and 7 where n = 100 and J = 0.1, 0.2 the range in power between the 24 different tests

is nearly 30 percent. The most powerful test, at 65 percent, is the unweighted sum-of-squared

residuals and 11 other tests have power nearly as good, over 55 percent. The least powerful tests

in these as well as most other settings are the unweighted partial sum-of-residuals test and the optimally weighted partial sum-of-residuals test using omitted covariate $\hat\pi \ln(\hat\pi)$. The pattern of

the test specific polygons in Figure 2 shows that power of all the tests increases at about the same

rate as a function of both sample size and deviation from the null model. Tests 19 - 24

in Table 3, which use weights optimal for the omitted term, $x^2$, have power that is comparable

to but not better than the other tests. In summary, the results in Table 7.1 indicate that when

there is a substantial difference between the linear and quadratic model all tests have high power

and all tests have low power when there is little difference between the fitted and true models.

The results on the power to detect an omitted dichotomous-continuous variable

interaction are presented in Table 7.2 and are plotted versus the setting number in Figure 3. As

can be seen in Figure 3 the power is low, less than about 40 percent, for all tests when the sample

size is 100, settings 1, 3, 5 and 7. One setting where there are important differences in the tests

is setting 6, n = 500 and I = 0.5. The results in Figure 3 show two clusters of tests, ones with

power over 60 percent and those with power less than 50 percent. Among those with power over

60 percent the best four tests are: the partial sum-of-residuals test with optimal omitted covariate,

$x \times d$, and optimal weights, the unweighted sum-of-squared residuals test, the Pearson chi-square

statistic and the partial sum-of-residuals test with optimal omitted covariate, $x \times d$, with

approximate optimal weights. As can be seen in Table 7.2 these four tests are the most powerful

in all eight settings. The power is high, over 80 percent, only in setting 8 and in setting 6 for the

partial sum-of-residuals test with optimal omitted covariate, $x \times d$, and optimal weights and the

unweighted sum-of-squared residuals test. In summary, we see that the power to detect the

interaction is generally low. However the new test using the optimal omitted covariate and

optimal weights seems to provide an improvement in power over the less specific tests. We

believe that this improvement in power is important as interactions of the type considered in

these settings are often difficult to detect during model building. Any test that can aid in detecting

omitted terms of this type should be used during assessment of model fit.

One exception to the performance of the tests in the quadratic model is the behavior of

the grouped sum-of-residuals tests, $HL_{oh}$ and $X^2_{oh}$, using the optimal omitted covariate and

approximate optimal weights. Further examination of the simulation results of the distribution of

these two tests suggests that the degrees-of-freedom may differ from the values of 8 and 10 used

to compute the significance levels. The results suggest that the degrees-of-freedom may be 6 and

8 respectively. Results, not shown, indicate that power when significance levels are calculated

using 6 and 8 degrees-of-freedom are in line with the other grouped process based tests.

We note that the power results in Table 7.2 indicate substantially better power to detect

an omitted interaction term than results previously reported in Hosmer et al. (1997). The

simulations performed here are slightly different in that we fit the model containing both the

continuous and dichotomous covariates while Hosmer et al. fit the model containing only the

continuous covariate. We replicated the simulations fitting the model containing only the

continuous covariate and the results were similar to those previously reported. One possible

explanation for the difference in power is that the model one obtains when fitting only the

continuous covariate essentially has a line on the logit scale intermediate between the separate

non-parallel lines in the true model. Thus it appears to fit better than a model with two

different parallel lines on the logit scale.

The results for the power to detect an alternative link function are presented in Table 7.3

and plotted versus the setting number in Figure 4. The power is always less than 30 percent for

sample size 100, settings 1, 3, 5, 7 and 9. The only exception is for the asymmetric link function

in setting 9, where the optimally weighted partial sum-of-residuals test using omitted covariate π̂ln(π̂) has power of 44.2 percent. The power is over 80 percent only in

setting 10 when n = 500 with the asymmetric link function. In general the results are quite

variable with no single test being optimal in all settings. The unweighted sum-of-squares test

performs about the best and has strikingly higher power than all other tests in setting 8, n = 500

and the short tailed link function. As noted in Hosmer et al. (1997), when both α1 and α2 are

large and positive in the Stukel model, the probability function becomes quite steep and the fitted

values, π̂, tend to be either small or large. When this occurs the Pearson chi-square and

unweighted sum-of-squares tests approach zero. However, their estimated variances become

quite large due to the range in the fitted values. The normalized goodness-of-fit tests tend to be

not significant since the numerator is small and the estimated variance is large. The same holds

true for the various "decile-of-risk" tests. We note that the power of the Pearson chi-square

statistic and the "decile-of-risk" tests in Table 7.3 is less than the nominal alpha level when

α1 = α2 = 1. Although not shown here, when the two parameters, α1 and α2, become sufficiently

large the tests degenerate. However, with a sample of size 500 there are a sufficient number of

estimated probabilities that are not near zero or one to allow the unweighted sum-of-squares test

to have a distribution which leads to relatively high power.
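
The steepening described above can be seen numerically. The sketch below assumes the usual parameterization of the generalized logistic model in Stukel (1988), in which α1 modifies the logit where it is positive and α2 where it is negative; the function names stukel_h and stukel_prob are hypothetical and the code is an illustration, not the simulation program used for Table 7.3.

```python
import math

def stukel_h(g, alpha1, alpha2):
    """Transform the linear predictor g (assumed Stukel 1988 form).

    alpha1 is applied when g >= 0 and alpha2 when g < 0;
    alpha = 0 leaves the logit unchanged (ordinary logistic model).
    """
    alpha = alpha1 if g >= 0 else alpha2
    a = abs(g)
    if alpha > 0:
        mag = (math.exp(alpha * a) - 1.0) / alpha    # shorter tail
    elif alpha < 0:
        mag = -math.log(1.0 - alpha * a) / alpha     # longer tail
    else:
        mag = a
    return mag if g >= 0 else -mag

def stukel_prob(g, alpha1, alpha2):
    """Probability under the generalized logistic model (stable sigmoid)."""
    h = stukel_h(g, alpha1, alpha2)
    if h >= 0:
        return 1.0 / (1.0 + math.exp(-h))
    e = math.exp(h)
    return e / (1.0 + e)

for g in (-2.0, -1.0, 0.0, 1.0, 2.0):
    logistic = stukel_prob(g, 0.0, 0.0)
    short = stukel_prob(g, 1.0, 1.0)     # "Short Tails" row of Table 6
    long_ = stukel_prob(g, -1.0, -1.0)   # "Long Tails" row of Table 6
    print(f"g = {g:+.1f}: logistic {logistic:.3f}, short {short:.3f}, long {long_:.3f}")
```

With α1 = α2 = 1 the probabilities move toward zero and one much faster than under the logistic model, which is the situation in which the fitted values pile up near the extremes.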

In summary, the results in Table 7 and Figures 2-4 show that overall the goodness-of-fit

tests have reasonable power for detecting a curvature type misspecification of the logit function.

The power is low for sample size 100 to detect an omitted interaction that yields a linear model

with different slopes and an incorrect but still symmetric link function. However, for sample

size 500 several tests had reasonable power to detect a moderately large interaction term.

The performance of the Pearson chi-square statistic and unweighted sum-of-squares

statistic was, overall, superior to most tests. The performances of all the "decile-of-risk"

type tests were similar. The new optimally weighted partial sum-of-residuals

test using the optimal covariate shows promise in detecting lack-of-fit due to omitted

interactions. The weighted partial sum-of-residuals tests using the generic omitted covariates

π̂ln(π̂) and (x'β̂)² did not have power better than more easily calculated tests.
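
For reference, the two ungrouped statistics singled out above can be computed directly from the outcomes and fitted probabilities. The sketch below uses the standard ungrouped definitions and made-up data; the function names are hypothetical, and the standardization of each statistic by its estimated mean and variance, which is needed to obtain a p-value, is not shown.

```python
def pearson_chi_square(y, pi_hat):
    """Ungrouped Pearson chi-square: sum of squared Pearson residuals."""
    return sum((yi - pi) ** 2 / (pi * (1.0 - pi)) for yi, pi in zip(y, pi_hat))

def unweighted_sum_of_squares(y, pi_hat):
    """Unweighted sum of squared residuals, S."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, pi_hat))

# Hypothetical outcomes and fitted probabilities, for illustration only.
y = [0, 0, 1, 0, 1, 1]
pi_hat = [0.1, 0.3, 0.4, 0.5, 0.7, 0.9]

x2 = pearson_chi_square(y, pi_hat)
s = unweighted_sum_of_squares(y, pi_hat)
print(f"X2 = {x2:.4f}, S = {s:.4f}")
```

The Pearson statistic weights each squared residual by the inverse of its binomial variance, so observations with fitted values near zero or one dominate it, whereas S gives every observation equal weight.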

When we consider computational issues, power and current availability in packages, a

practical strategy is to use the Pearson chi-square statistic and/or the unweighted sum-of-squares

statistic in conjunction with the Hosmer-Lemeshow decile-of-risk statistic, Ĉ. We recommend

obtaining the 2 by 10 table of observed and estimated expected frequencies used to compute Ĉ

as it provides a useful overall summary of the fit or lack-of-fit of the model and is easily

understood by subject matter scientists. In addition, we recommend using the optimally

weighted partial sum-of-residuals test using perhaps several "educated guesses" about possible

omitted covariates, especially interactions, from the model.
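
The 2 by 10 table recommended above can be assembled in a few lines. The sketch below is one possible implementation under stated assumptions: subjects are sorted by fitted probability and cut into equal-sized groups, ties are not treated specially, and the function name hosmer_lemeshow_table is hypothetical. Packages differ in how they form the groups, which is one reason the values of Ĉ in Table 2 are not identical.

```python
def hosmer_lemeshow_table(y, pi_hat, groups=10):
    """Return (rows, c_hat) for the decile-of-risk table.

    Each row is (n, observed y=1, expected y=1, observed y=0, expected y=0).
    c_hat is the sum over groups of (obs1 - exp1)^2 / (n * pbar * (1 - pbar)),
    where pbar is the mean fitted probability in the group.
    """
    pairs = sorted(zip(pi_hat, y))            # order subjects by fitted probability
    n = len(pairs)
    rows, c_hat = [], 0.0
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        if not chunk:
            continue                           # skip empty groups when n < groups
        size = len(chunk)
        obs1 = sum(yi for _, yi in chunk)
        exp1 = sum(pi for pi, _ in chunk)
        rows.append((size, obs1, exp1, size - obs1, size - exp1))
        pbar = exp1 / size
        c_hat += (obs1 - exp1) ** 2 / (size * pbar * (1.0 - pbar))
    return rows, c_hat

# Hypothetical data, two groups, for illustration only.
rows, c_hat = hosmer_lemeshow_table([0, 1, 0, 1], [0.2, 0.3, 0.7, 0.8], groups=2)
for size, o1, e1, o0, e0 in rows:
    print(f"n = {size}: obs1 = {o1}, exp1 = {e1:.2f}, obs0 = {o0}, exp0 = {e0:.2f}")
print(f"C-hat = {c_hat:.4f}")
```

With ten groups the statistic is conventionally referred to a chi-square distribution with 8 degrees-of-freedom, as in Table 2.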

Return to the Example

We return to evaluating the fit of the model for low birth weight shown in Table 1.

We present in Table 8 the p-values of all 24 tests. These results show that only one of the 24

tests, X2sh, has p < 0.05 and two others have p-values between 0.05 and 0.15. When we

employ the recommended strategy of using the Pearson chi-square and/or unweighted sum-of-squares

tests for power against overall non-linearity in the logit, the Hosmer-Lemeshow decile of

risk statistic and 2 by 10 table for confirmatory evidence we see that it suggests overall fit of the

model. The optimally weighted partial sum-of-residuals test using AGE2 as the omitted

covariate yields p = 0.40, further supporting model fit.

Summary

The use of overall summary measures of goodness-of-fit of logistic regression models has

become an important and easily performed step in model building. Decisions on model fit using

tests based on cutpoints may depend on choice of cutpoints. A new class of overall goodness-of-fit

tests based on weighted partial sum-of-residuals tests has been studied via simulation under

both null and alternative scenarios. The simulation results showed that, with a few exceptions,

all tests had the correct size. The optimal weighted partial sum-of-residuals test had the highest

power for omission of a quadratic term. The Pearson chi-square and unweighted sum-of-squares

statistics had power nearly as high. All tests had low power to detect continuous-dichotomous

variable interaction with a small sample size. With a large sample size, power was adequate to

detect a moderate interaction for the optimal weighted partial sum-of-residuals test as well as the

Pearson chi-square and unweighted sum-of-squares tests. All tests had more power to detect lack-of-fit

due to model misspecification when the logit was non-monotone increasing (decreasing)

under the alternative than when it was monotone under both null and alternative models. None

of the tests studied had high power to detect an incorrectly specified link function with sample

size 100. Power was high for all tests to detect an asymmetric link function with a sample size of

500.

Because of the superior power of the Pearson chi-square and unweighted sum-of-squares

statistics, we recommend their use. In addition,

the optimally weighted partial sum-of-residuals test using one or more choices for an omitted

covariate could be expected to add to the assessment of model fit. We suggest using the decile

of risk tests for confirmation of model fit or lack-of-fit and its associated 2 x 10 table of

observed and estimated expected frequencies as it is easily understood by subject matter

scientists. In all cases one must keep in mind the lack of power with small sample sizes to detect

subtle deviations from the logistic model. Thus the choice of both the logistic regression model

and its covariates should have a strong biological or clinical basis.

References

Cook, R.D. and Weisberg, S. Residuals and Influence in Regression, Chapman and Hall, New York, NY (1982).

Hosmer, D.W., Hosmer, T., le Cessie, S. and Lemeshow, S. 'A comparison of goodness-of-fit tests for the logistic regression model', Statistics in Medicine, 16, 965-980 (1997).

Hosmer, D.W. and Lemeshow, S. 'A goodness-of-fit test for the multiple logistic regression model', Communications in Statistics, A10, 1043-1069 (1980).

Hosmer, D.W. and Lemeshow, S. Applied Logistic Regression: Second Edition, John Wiley and Sons Inc., New York, NY (2000).

Hjort, N.L. 'Goodness-of-fit tests in models for life history data based on cumulative hazard rates', Annals of Statistics, 18, 1221-1258 (1990).

Hjort, N.L. and Hosmer, D.W. 'Goodness-of-fit processes for logistic regression', Technical Report, Department of Mathematics and Statistics, University of Oslo, Oslo, Norway (2000).

Royston, P. 'The use of cusums and other techniques in modeling continuous covariates in logistic regression', Statistics in Medicine, 11, 1115-1129 (1992).

Stukel, T.A. 'Generalized logistic models', Journal of the American Statistical Association, 83, 426-431 (1988).

Osius, G. and Rojek, D. 'Normal goodness-of-fit tests for multinomial models with large degrees of freedom', Journal of the American Statistical Association, 87, 1145-1152 (1992).

Table 1

Estimated Coefficients, Estimated Standard Errors and p-values from a Model Fit to the Jittered Low Birth Weight Data

Variable    Coefficient   Std. Err.   p-value
AGEj             -0.022      0.0341     0.512
LWTj             -0.013      0.0064     0.050
RACE_1            1.232      0.5171     0.017
RACE_2            0.943      0.4162     0.023
SMOKE             1.054      0.3800     0.006
CONSTANT          0.333      1.1085     0.764

Table 2

Value of the Pearson Chi Square Statistic, X2, and Values of the Hosmer-Lemeshow Decile of Risk Statistic, Ĉ, Computed by Six Different Packages

Statistic          Value     DF    p-value
X2 ~ χ²(183)      180.81    183      0.532
X2 ~ Normal       180.81      *      0.667
S ~ Normal         36.90      *      0.084
LOGXACT's Ĉ        13.02      8      0.111
SAS's Ĉ            10.55      8      0.229
SPSS's Ĉ           10.54      8      0.229
STATA's Ĉ          10.55      8      0.229
SYSTAT's Ĉ         16.10      8      0.041

Table 4

Settings Used to Examine the Null Distribution of the Tests for n = 100 and 500

The last five columns give the distributional characteristics of the logistic probabilities (n = 100).

Covariate Distribution    Logistic Coefficients                       π(1)    Q1      Q2      Q3      π(n)
U(-6,6)                   β0 = 0, β1 = 0.8                            0.009   0.083   0.5     0.917   0.991
U(-4.5,4.5)               β0 = 0, β1 = 0.8                            0.029   0.142   0.5     0.858   0.971
U(-3,3)                   β0 = 0, β1 = 0.8                            0.087   0.231   0.5     0.769   0.913
U(-1,1)                   β0 = 0, β1 = 0.8                            0.313   0.400   0.5     0.600   0.687
χ²(4)                     β0 = -4.9, β1 = 0.65                        0.009   0.025   0.062   0.202   0.965
Normal-Bernoulli Model*   β0 = 0, β1 = 0.8, β2 = -0.8, β3 = ln(2)     0.020   0.288   0.589   0.834   0.989

*: (X1, X2 | D = d) ~ N[(2d, 2d), Σ], Var(X1) = Var(X2) = 6, Cor(X1, X2) = 0.5, D ~ B(0.5)
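
The quartile columns of Table 4 for the uniform settings can be checked by hand, because the logistic function is monotone and so maps the quartiles of the covariate to the quartiles of the probabilities. The sketch below assumes population quartiles of x ~ U(-a, a); the function names are hypothetical, and the tabulated values may have been obtained differently, so agreement is expected only to about the third decimal place.

```python
import math

def logistic(g):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-g))

def uniform_quartile_probs(a, beta0=0.0, beta1=0.8):
    """Logistic probabilities at the quartiles of x ~ U(-a, a)."""
    quartiles_x = (-a / 2.0, 0.0, a / 2.0)   # Q1, median, Q3 of U(-a, a)
    return tuple(logistic(beta0 + beta1 * x) for x in quartiles_x)

for a in (6.0, 4.5, 3.0, 1.0):
    q1, q2, q3 = uniform_quartile_probs(a)
    print(f"U(-{a:g},{a:g}): Q1 = {q1:.3f}, Q2 = {q2:.3f}, Q3 = {q3:.3f}")
```

The narrower the covariate range, the closer all fitted probabilities lie to 0.5, which is why the U(-1,1) setting has so little spread.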

Table 3

Definition of Notation for Test Statistics

Test #   Notation   Description
1        X2         Pearson Chi-Square
2        S          Unweighted Sum-of-Squares
3        Ĉ          Hosmer-Lemeshow Decile of Risk
4        HL1        Hosmer-Lemeshow, weights = 1
5        X21        Full Grouped Chi-Square, weights = 1
6        HLnh       Hosmer-Lemeshow, omit cov = π̂ln(π̂), approximate weights
7        X2nh       Full Grouped Chi-Square, omit cov = π̂ln(π̂), approximate weights
8        HLng       Hosmer-Lemeshow, omit cov = π̂ln(π̂), optimal weights
9        X2ng       Full Grouped Chi-Square, omit cov = π̂ln(π̂), optimal weights
10       HLsh       Hosmer-Lemeshow, omit cov = g², approximate weights
11       X2sh       Full Grouped Chi-Square, omit cov = g², approximate weights
12       HLsg       Hosmer-Lemeshow, omit cov = g², optimal weights
13       X2sg       Full Grouped Chi-Square, omit cov = g², optimal weights
14       W1         Partial Sums-of-Residuals, weights = 1
15       Wnh        Partial Sums-of-Residuals, omit cov = π̂ln(π̂), approximate weights
16       Wng        Partial Sums-of-Residuals, omit cov = π̂ln(π̂), optimal weights
17       Wsh        Partial Sums-of-Residuals, omit cov = g², approximate weights
18       Wsg        Partial Sums-of-Residuals, omit cov = g², optimal weights
19       HLoh       Hosmer-Lemeshow, model specif. omit cov, approximate weights
20       X2oh       Full Grouped Chi-Square, model specif. omit cov, approximate weights
21       HLog       Hosmer-Lemeshow, model specif. omit cov, optimal weights
22       X2og       Full Grouped Chi-Square, model specif. omit cov, optimal weights
23       Woh        Partial Sums-of-Residuals, model specif. omit cov, approximate weights
24       Wog        Partial Sums-of-Residuals, model specif. omit cov, optimal weights

Table 5

Simulated percent rejection at the α = 0.05 level using sample sizes of 100 and 500 with 500 replications. Confidence intervals are obtained using ± 2%

Distribution   U(-6,6)     U(-4.5,4.5) U(-3,3)     U(-1,1)     χ²(4)       Normal-Bernoulli
Sample Size      100   500   100   500   100   500   100   500   100   500   100   500
Fig. 1 Setting     1     2     3     4     5     6     7     8     9    10    11    12
X2               5.4   4.2   4.2   4.6   5.4   6.8   5.0   3.8   3.8   3.0   4.4   5.0
S                5.0   5.2   4.6   3.2   5.4   6.0   4.8   3.8   4.4   4.4   5.6   4.6
Ĉ                6.2   5.8   3.4   4.2   4.8   3.6   4.0   5.6   6.6   4.2   3.0   4.8
HL1              6.8   6.4   3.4   4.6   4.8   3.6   4.0   5.6   6.8   4.4   3.4   4.8
X21              6.6   5.8   2.8   5.0   4.4   4.0   4.4   5.4   5.4   4.8   5.0   6.0
HLnh             7.4   6.2   3.2   4.8   5.4   3.8   4.0   5.6   7.6   5.2   4.0   5.2
X2nh             6.6   5.6   2.6   5.8   4.8   4.0   4.0   4.6   7.2   5.2   6.4   5.2
HLng             6.8   5.2   3.6   5.4   4.6   4.6   4.8   5.0   8.6   5.2   3.6   6.0
X2ng             5.2   5.0   3.4   4.4   3.6   5.0   4.2   5.2   7.2   3.4   3.8   5.2
HLsh             6.6   6.2   3.2   5.2   5.2   4.0   3.8   5.6   6.0   5.4   5.6   5.2
X2sh             6.4   5.6   3.2   4.8   4.2   5.0   4.2   3.4   7.6   5.2   5.4   6.0
HLsg             7.4   6.6   3.8   4.8   5.6   5.2   4.2   5.4   7.2   6.2   6.4   6.6
X2sg             5.8   6.0   3.6   5.4   5.2   4.0   4.2   5.0   6.2   6.4   6.0   6.2
W1               4.8   5.6   3.4   4.6   6.0   4.0   5.6   5.4   4.8   5.4   3.2   6.4
Wnh              5.8   5.2   2.8   5.4   4.8   3.6   6.0   5.0   6.4   5.0   3.2   6.4
Wng              6.2   4.2   3.4   5.0   3.6   3.4   0.2   1.6   6.0   4.2   3.4   5.2
Wsh              6.4   4.8   2.8   4.4   2.8   3.6   0.0   0.8   4.8   4.4   2.6   5.0
Wsg              6.8   5.0   3.4   4.2   3.2   4.0   0.2   1.2   4.4   4.2   2.6   5.4
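
The ± 2% interval quoted in the caption of Table 5 is consistent with two binomial standard errors of a rejection rate estimated from 500 replications at the nominal 5 percent level. The sketch below makes that assumption explicit; the function name is hypothetical, and the derivation of the interval is our reading of the caption rather than something the caption states.

```python
import math

def two_se_margin_percent(p, replications):
    """Two binomial standard errors of an estimated proportion, in percent."""
    return 100.0 * 2.0 * math.sqrt(p * (1.0 - p) / replications)

margin = two_se_margin_percent(0.05, 500)
lower, upper = 5.0 - margin, 5.0 + margin
print(f"margin = {margin:.2f}%, interval = ({lower:.2f}%, {upper:.2f}%)")
```

Under this reading, simulated rejection rates between roughly 3 and 7 percent are compatible with a test of the correct size.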

Table 6

Coefficients for the Generalized Logistic Model

Model                             α1        α2
Probit                         0.165     0.165
Comp. Log-Log                  0.620    -0.037
Long Tails                      -1.0      -1.0
Short Tails                      1.0       1.0
Asymmetric Long-Short Tails     -1.0       1.0

Table 7

Simulated Percent Rejection at the α = 0.05 Level Using Sample Sizes of 100 and 500 with 500 Replications. Confidence Intervals are Obtained as ± 2%

Table 7.1 Quadratic Models

Model          J = 0.01    J = 0.05    J = 0.10    J = 0.20    J = 0.40
Sample Size      100   500   100   500   100   500   100   500   100   500
Fig. 2 Setting     1     2     3     4     5     6     7     8     9    10
X2               8.2   9.4  37.6  86.2  62.2  99.0  83.8 100.0  98.6 100.0
S                4.6   8.2  36.0  88.2  65.0  99.2  87.8 100.0  99.0 100.0
Ĉ                8.8   8.0  31.2  79.6  55.2  97.4  76.0 100.0  93.4 100.0
HL1              9.0   8.4  31.4  79.6  55.4  97.4  76.0 100.0  93.4 100.0
X21              9.4   8.2  31.0  77.6  54.0  97.2  76.2 100.0  92.8 100.0
HLnh             9.2   8.2  30.8  75.2  49.4  96.0  71.0 100.0  91.8 100.0
X2nh            10.0   7.2  30.2  75.6  50.0  97.0  74.2 100.0  92.8 100.0
HLng             9.6  10.6  33.4  81.8  56.8  98.0  77.0 100.0  94.0 100.0
X2ng             7.4   8.4  29.0  77.2  52.2  97.4  73.4 100.0  92.4 100.0
HLsh             8.2   9.4  30.8  81.0  55.8  97.4  76.8 100.0  94.4 100.0
X2sh             7.6   8.0  30.2  78.4  53.2  97.4  76.4 100.0  94.0 100.0
HLsg            11.4   9.6  33.8  84.0  57.2  97.4  79.8 100.0  96.0 100.0
X2sg             9.2   8.8  30.4  80.0  52.6  97.4  77.2 100.0  94.4 100.0
W1               4.4   6.2  17.8  67.2  37.2  95.4  63.4  99.8  90.2 100.0
Wnh              4.6   7.6  24.4  78.0  46.2  97.6  68.8  99.8  92.2 100.0
Wng              6.2   6.8  24.6  82.6  45.4  98.2  59.6 100.0  74.0 100.0
Wsh              6.8   8.2  37.4  86.6  63.4  99.4  86.0 100.0  97.4 100.0
Wsg              5.8   7.6  30.2  90.0  59.0  99.8  80.4 100.0  92.2 100.0
HLoh+           10.8   9.8  33.4  81.2  54.6  97.8  79.0 100.0  94.8 100.0
X2oh+            9.0   8.4  29.8  80.0  54.2  96.8  77.6 100.0  94.2 100.0
HLog+            9.6   9.6  33.8  84.0  57.2  97.4  79.8 100.0  96.0 100.0
X2og+            9.2   8.8  30.4  80.0  52.6  97.4  77.2 100.0  94.4 100.0
Woh+             4.4   6.2  26.0  81.4  50.8  98.0  74.4 100.0  95.8 100.0
Wog+             5.8   8.4  31.8  90.0  62.4  99.4  82.6 100.0  98.4 100.0

+: Omitted Covariate x²

Table 7.2 Interaction Models

Model          I = 0.10    I = 0.30    I = 0.50    I = 0.70
Sample Size      100   500   100   500   100   500   100   500
Fig. 3 Setting     1     2     3     4     5     6     7     8
X2               5.2   5.6   8.8  29.8  23.0  68.0  38.4  93.8
S                5.6   5.0   8.2  26.8  21.8  72.0  40.2  98.6
Ĉ                4.6   5.6   5.0  11.4  10.8  40.2  23.2  84.2
HL1              4.6   5.6   5.0  12.0  10.8  40.2  23.2  84.4
X21              5.4   6.0   5.8  12.6  11.8  39.2  23.0  82.2
HLnh             4.6   5.8   5.4  12.0  11.4  38.0  23.4  82.0
X2nh             5.0   6.0   6.0  12.2  12.2  40.8  25.0  82.6
HLng             5.8   5.6   7.2  13.6  11.8  42.6  26.2  85.8
X2ng             4.6   5.8   6.6  13.8  10.8  41.2  24.4  84.4
HLsh             5.4   6.0   5.6  12.0  10.6  42.4  23.0  86.4
X2sh             5.6   5.2   5.6  13.0  11.2  39.6  23.4  85.0
HLsg             6.0   5.4   6.8  12.8  12.2  45.8  27.2  86.4
X2sg             6.0   5.4   6.4   1.3  10.2  43.8  25.0  84.6
W1               5.4   5.8   7.2  16.4  13.4  49.8  21.2  86.4
Wnh              3.0   4.8   7.0  18.2  15.8  60.0  26.8  94.6
Wng              0.0   0.4   0.8  10.8   8.4  59.4  27.8  97.0
Wsh              1.6   3.4   3.2  20.8  13.8  61.4  31.8  96.4
Wsg              0.4   0.8   1.6  14.4   8.0  62.6  27.2  97.4
HLoh+            2.2   3.4   2.4   1.6   1.4   2.2   0.4  10.6
X2oh+            1.4   3.7   2.6   4.6   2.8  26.8   4.8  71.0
HLog+            5.6   5.0   6.4  14.0  11.8  46.2  26.8  89.0
X2og+            6.0   4.2   6.8  13.2   9.6  43.4  24.0  84.6
Woh+             7.0   7.6   8.0  23.6  15.8  64.8  29.8  95.2
Wog+             6.6   8.0  10.6  32.8  24.0  76.4  38.0  98.6

+: Omitted Covariate x × d

Table 7.3 Alternative Link Functions

Model          Probit        Comp. Log-Log Long Tails    Short Tails   Asymmetric Long-Short Tail
Sample Size       100    500    100    500    100    500    100    500    100    500
Fig. 4 Setting      1      2      3      4      5      6      7      8      9     10
X2                6.4    7.6    2.6   17.6    5.0   13.0    0.4   43.6    2.4   87.2
S                 5.8   10.2    5.2   23.4    5.4   12.6   11.0   77.2   17.8   86.4
Ĉ                 4.8    6.8    3.4   27.0    4.0    7.8    3.4   19.0   12.2   92.6
HL1               5.0    6.8    3.6   27.6    4.0    7.8    3.4   19.2   12.6   92.8
X21               5.0    7.6    4.4   25.4    5.2    7.8    3.0   19.4   11.8   91.8
HLnh              5.6    6.8    4.0   26.2    3.8    7.8    5.0   19.4   12.0   91.8
X2nh              4.2    7.6    5.4   26.0    4.6    7.0    3.8   16.2   10.2   91.6
HLng              6.2    8.6    6.0   28.0    5.6    8.4    3.2   12.0   13.4   94.0
X2ng              4.0    8.0    6.6   26.4    4.2    7.8    2.6   19.0   12.0   91.0
HLsh              4.2    7.0    5.6   23.8    4.6    7.4    3.6   17.2   13.0   90.8
X2sh              3.8    6.6    6.0   24.8    3.6    7.8    3.0   17.0   11.0   91.8
HLsg              6.2    9.6    7.0   31.0    6.6    8.2    3.2   13.8   15.0   94.4
X2sg              4.0   11.2    6.8   25.0    4.4    7.6    2.4   18.4   12.0   92.0
W1                6.0    7.6    9.0   37.6    6.4    8.2   11.0   41.6   25.4   95.0
Wnh               5.8    7.2    8.4   21.4    6.0    8.0   13.4   46.2   28.0   98.0
Wng               5.6    5.2   10.8   36.0    1.4    4.4    7.6   19.6   44.2   99.8
Wsh               3.2    4.0    8.0   48.0    0.0    4.0   10.2    2.6   22.4   99.8
Wsg               2.8    4.6    9.6    5.1    0.0    4.0    0.8    2.0   28.8   99.8

Table 8

P-Values of the Goodness-of-Fit Statistics for the Low Birth Weight Model in Table 1

Statistic   p-value
X2           0.667
S            0.084
Ĉ            0.229
HL1          0.385
X21          0.543
HLnh         0.367
X2nh         0.498
HLng         0.127
X2ng         0.239
HLsh         0.340
X2sh         0.0446
HLsg         0.147
X2sg         0.270
W1           0.362
Wnh          0.438
Wng          0.325
Wsh          0.900
Wsg          0.300
HLoh+        0.359
X2oh+        0.520
HLog+        0.239
X2og+        0.053
Woh+         0.188
Wog+         0.400

+: Omitted Covariate = AGE2

Figure 1. Percent Rejected Using α = 0.05, Null Case. [Plot of percent rejected against Setting (1 to 12); recoverable legend entries: Wnh, Wng, Wsh, Wsg.]

Figure 2. Percent Rejected Using α = 0.05, Quadratic Models. [Plot of percent rejected (0 to 100) against Setting (1 to 10); legend: X2, S2, C, HL1, X21, HLnh, X2nh, HLng, X2ng, HLsh, X2sh, HLsg, X2sg, W1, Wnh, Wng, Wsh, Wsg, HLoh, X2oh, HLog, X2og, Woh, Wog.]

Figure 3. Percent Rejected Using α = 0.05, Interaction Models. [Plot of percent rejected (0 to 100) against Setting (1 to 8); legend: X2, S2, C, HL1, X21, HLnh, X2nh, HLng, X2ng, HLsh, X2sh, HLsg, X2sg, W1, Wnh, Wng, Wsh, Wsg, HLoh, X2oh, HLog, X2og, Woh, Wog.]

Figure 4. Percent Rejected Using α = 0.05, Alternative Link Models. [Plot of percent rejected (0 to 100) against Setting (1 to 10); legend: X2, S2, C, HL1, X21, HLnh, X2nh, HLng, X2ng, HLsh, X2sh, HLsg, X2sg, W1, Wnh, Wng, Wsh, Wsg.]