Nonlinear quasi-likelihood models: applications to continuous proportions

ELSEVIER Computational Statistics & Data Analysis 21 (1996) 449-461


Christopher Cox

Department of Biostatistics, School of Medicine and Dentistry, University of Rochester, Rochester, NY, 716-275-6684, USA

Received March 1994; revised May 1995

Abstract

We discuss some practical aspects of fitting and interpreting nonlinear quasi-likelihood regression models. The examples considered are for data in the form of continuous rates or proportions distributed on the unit interval. However, many of the concepts and procedures illustrated should be helpful with other models as well. Our approach to estimation for multi-parameter models is by the method of fitting expected values discussed by Jennrich and Moore (1975) and Cox (1984a). This allows the use of standard nonlinear regression computer programs. An important feature of this approach is that it does not require the existence of a link function. Thus nonlinear models are easily accommodated, including models with a parametric link function, as well as models with a link to a nonlinear predictor having multiplicative terms. After a brief discussion of the properties of these models and the method of fitting, we provide a detailed illustration of fitting a number of link-linear as well as nonlinear models to a set of data considered by Wedderburn (1974). The analysis makes use of a diagnostic tool known as the biplot to help select a multiplicative interaction model. A second example of a useful nonlinear model is provided by an experiment in toxicology. As part of the discussion of these alternative models, we illustrate the choice of link and variance functions using an extended quasi-likelihood developed by Nelder and Pregibon (1987).

Keywords: Quasi-likelihood model; Continuous proportions; Nonlinear regression

1. Introduction

The quasi-likelihood approach to estimation was introduced by Wedderburn (1974), in the generalized linear model context. Generalized linear quasi-likelihood models are also discussed by McCullagh and Nelder (1989, Chapter 9). In this paper we consider an extension of the approach to nonlinear (non-link-linear) models. Briefly, a quasi-likelihood model is one in which the variances of the observed data are known (up to a scale parameter) functions of the means. The means in turn are known functions of the unknown parameters. If $Y$ is the $n$-vector of data, then we assume that the elements of $Y$ are uncorrelated and

$$E_\beta(Y_i) = \mu_i(\beta), \qquad \mathrm{Var}_\beta(Y_i) = \sigma^2 v(\mu_i),$$

where $\beta$ is a $p$-vector ($p < n$) of unknown parameters. We denote the vector of means by $m(\beta)$ and the vector of observed values by $y_0$. The letter $y$ denotes a single observed value. The covariance matrix of $Y$ is denoted by $V(m)$, a diagonal matrix.

The general class of quasi-likelihood models includes the class of single-parameter exponential family regression models (Jennrich and Moore, 1975; Cox, 1984a). A further specialization is obtained if we also assume the existence of a known, monotonic link function $f$ and $p$ predictor variables $x$ for which $\mu = f(x'\beta)$; these are the generalized linear models of Nelder and Wedderburn (McCullagh and Nelder, 1989). The larger class of models includes models with a link function to a nonlinear predictor, for example multiplicative interaction models, as well as models for which there is no natural link function. Both of these types will be illustrated subsequently, and the point will be made that the larger class of models offers a useful generalization of link-linear quasi-likelihood models. Furthermore, such models can be easily fitted using nonlinear regression routines in standard statistical packages such as GENSTAT, BMDP and SAS.

In the case of continuous data with known variances, least squares is a common approach to parameter estimation in nonlinear models. The least-squares problem is: minimize $(y_0 - m)' V^{-1} (y_0 - m)$. If we ignore the dependence of $V(m)$ on $\beta$, then the normal equations are

$$\frac{\partial m'}{\partial \beta} V^{-1} (y_0 - m) = 0.$$

For exponential family models these are also the likelihood equations (McCullagh and Nelder, 1989; Cox, 1984a). By extension these equations are referred to in the present context as the quasi-likelihood equations, and the solutions as the quasi-likelihood estimates. They are equivalent to the generalized least-squares estimates obtained by actually solving the nonlinear least-squares problem at each iteration, using the current weights (Carroll and Ruppert, 1988). Properties of the quasi-likelihood estimates were derived by McCullagh (1983). In particular, the quasi-likelihood parameter estimates are asymptotically normal with mean $\beta$ and covariance matrix

$$\sigma^2 \left( \frac{\partial m'}{\partial \beta} V^{-1} \frac{\partial m}{\partial \beta} \right)^{-1} = O(n^{-1}).$$

The usual estimate of $\sigma^2$ in the regression context is the residual mean square. With appropriate weights this is identical to the Pearson goodness-of-fit statistic divided by its degrees of freedom,

$$\hat\sigma^2 = \chi^2/(n - p), \qquad \text{where } \chi^2 = \sum \frac{(y - \hat\mu)^2}{v(\hat\mu)}.$$
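As a small arithmetic illustration of the dispersion estimate, the sketch below divides the Pearson chi-square values reported in Table 1 by the residual degrees of freedom of the additive model (90 observations, 18 parameters). The dictionary keys and the function name `dispersion` are illustrative choices, not part of the original analysis.

```python
# Pearson chi-square values as reported in Table 1 for the four additive models.
# n = 90 observations (10 varieties x 9 sites), p = 18 parameters, so 72 df.
pearson_chi2 = {
    "logit/canonical": 6.39,
    "logit/orthogonal": 71.14,
    "cll/canonical": 5.87,
    "cll/orthogonal": 66.60,
}
n_obs, n_par = 90, 18


def dispersion(chi2, n, p):
    """Residual mean square: Pearson chi-square divided by its degrees of freedom."""
    return chi2 / (n - p)


sigma2_hat = {name: dispersion(x2, n_obs, n_par) for name, x2 in pearson_chi2.items()}
for name, s2 in sigma2_hat.items():
    print(f"{name:18s} sigma2_hat = {s2:.4f}")
```

The results agree with the dispersion column of Table 1 to within the rounding of the published chi-square values.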

If it is possible to find a function $l(\mu; y)$ such that $\partial l/\partial\mu = (y - \mu)/v(\mu)$, then by analogy we refer to $l(\mu; y)$ as the quasi-likelihood function. The quasi-likelihood function allows the definition of the deviance, $D(y; \mu) = -2(l(\mu; y) - l(y; y))$, and of quasi-likelihood ratio statistics (McCullagh and Nelder, 1989, Chapter 9) for testing the fit of restricted models. McCullagh (1983) showed these to be asymptotically distributed as $\sigma^2$ times the appropriate chi-square distributions. If an estimate of $\sigma^2$ is available, perhaps from an initial fit, then deviance statistics for testing reduced models can be divided by this estimate and the quotients treated as chi-square or, with adjustment for degrees of freedom, as F statistics. Alternatively, one could use approximate chi-square or F tests based on the difference in values of the Pearson chi-square.

An important extension of the quasi-likelihood concept was provided by Nelder and Pregibon (1987). They defined an extended quasi-likelihood function, which in its sample version is

$$Q^+(y_0; \hat m) = -\frac{1}{2} \sum [D(y; \hat\mu)/\hat\sigma^2 + \log_e\{2\pi\hat\sigma^2 v(y)\}],$$

where the summation is over corresponding elements of the data $y_0$ and the estimated mean $\hat m$, denoted by $y$ and $\hat\mu$. This extended quasi-likelihood includes the variance function $v(\mu)$ and thus may be used to compare models having alternative variance functions. A simplified version of this statistic was proposed by McCullagh and Nelder (1989). They defined

$$Q^+_{MN}(y_0; \hat m) = -\frac{1}{2} \sum [D(y; \hat\mu)/\hat\sigma^2 + \log_e(\hat\sigma^2)].$$

Both these expressions resemble the log likelihood based on a normal distribution, with $D(y_0; \hat m)$ substituted for the sum of squared residuals $\sum (y - \hat\mu)^2$. In the first case the variance is $\hat\sigma^2 v(y)$, while in the second case the variance is $\hat\sigma^2$. These two statistics will be illustrated by our first example.
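The two statistics can be sketched in a few lines of code. The function names `q_plus` and `q_mn` are hypothetical, and the check at the end assumes that $G^2$ in Table 1 is the total deviance summed over the 90 observations. Under that assumption the simplified statistic depends only on the total deviance, the dispersion estimate and the sample size, so it can be recomputed from the published summary values.

```python
import math


def q_plus(unit_deviances, v_at_y, sigma2_hat):
    """Extended quasi-likelihood: -1/2 * sum[D/sigma2 + log(2*pi*sigma2*v(y))].

    unit_deviances: per-observation deviances D(y; mu_hat)
    v_at_y: variance function evaluated at each observation, v(y)
    """
    return -0.5 * sum(
        d / sigma2_hat + math.log(2.0 * math.pi * sigma2_hat * v)
        for d, v in zip(unit_deviances, v_at_y)
    )


def q_mn(total_deviance, sigma2_hat, n):
    """Simplified statistic: -1/2 * [sum(D)/sigma2 + n*log(sigma2)]."""
    return -0.5 * (total_deviance / sigma2_hat + n * math.log(sigma2_hat))


# (sigma2_hat, G^2, reported Q_MN) for the four additive models of Table 1.
table1 = {
    "logit/canonical": (0.08878, 6.13, 74.47),
    "logit/orthogonal": (0.9881, 38.64, -19.01),
    "cll/canonical": (0.08160, 4.28, 86.57),
    "cll/orthogonal": (0.9249, 14.79, -4.48),
}
recomputed = {name: q_mn(g2, s2, 90) for name, (s2, g2, _) in table1.items()}
for name, value in recomputed.items():
    print(f"{name:18s} Q_MN = {value:8.2f}  (reported {table1[name][2]:8.2f})")
```

The recomputed values match the last column of Table 1 up to rounding of the published inputs.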

2. Model formulation, fitting and evaluation

2.1. The link, the variance and the quasi-likelihood

In general, the choice of link and variance functions depends on the data. In a particular problem there may be a number of reasonable possibilities. Indeed any given link/variance combination may be only approximately correct and a number of acceptable alternatives may exist. However, certain natural choices can be distinguished. First is the canonical link for a given variance function. Given a variance function $v(\mu)$ the natural parameter is defined as

$$\theta(\mu) = \int^{\mu} [v(z)]^{-1}\,dz \qquad (2.1)$$

(Wedderburn, 1974). The variable $\mu$ in the function $\theta(\mu)$ denotes the argument of the indefinite integral. The usual definition of the natural parameter is in terms of an exponential family likelihood. The canonical link (McCullagh and Nelder, 1989, p. 32) is then defined as the inverse function $\mu(\eta) = \theta^{-1}(\eta)$ with argument $\eta$ (the linear predictor, $\eta = x'\beta$). That is, we invert the natural parameter as a function of $\mu$ and substitute $\eta$ for the argument $\theta$. Thus the canonical link satisfies the differential equation

$$\frac{\partial \mu}{\partial \eta} = v(\mu).$$

Canonical links for generalized linear models have desirable statistical properties and it may be anticipated that they will perform well in the quasi-likelihood context as well. For example, the canonical link for the binomial variance $v(\mu) = \mu(1 - \mu)$ is the logistic function. For the quadratic variance function $v(\mu) = \mu^2(1 - \mu)^2$ we have (McCullagh and Nelder, 1983, Chapter 8)

$$\theta(\mu) = 2\log_e\frac{\mu}{1 - \mu} - \frac{1}{\mu} + \frac{1}{1 - \mu} \qquad (2.2)$$

and the canonical link is the inverse of the natural parameter function. The canonical link satisfies the differential equation $\mu' = \mu^2(1 - \mu)^2$. Conversely, given the (increasing) link function $\mu = f(x'\beta)$, we let $\eta(\mu) = f^{-1}(\mu)$ be the linear predictor and define the canonical variance by

$$v(\mu) = \left[\frac{\partial}{\partial\mu}\eta(\mu)\right]^{-1}. \qquad (2.3)$$

Canonical links may be useful if we start with a given variance function and can find a simple expression for the natural parameter. Since linearity is the more important criterion, it may be more common that we start with a link to a linear model, $\mu = f(\eta)$, and then must select a reasonable variance function. As in the case of our first example, it frequently happens in practice that the same transformation which produces additivity helps to stabilize variability. That is, $\eta = f^{-1}(y)$ is the variance stabilizing transformation. The variance function is then

$$v(\mu) = \left[\frac{\partial}{\partial\mu}\eta(\mu)\right]^{-2}. \qquad (2.4)$$

We shall say that this variance function is orthogonal to the link $\mu(\eta)$. Note that the difference between the canonical (2.3) and orthogonal variance functions is simply a matter of the exponent.

In the orthogonal case, as noted by McCullagh and Nelder (1989, p. 293), the asymptotic covariance matrix of the quasi-likelihood estimates has a particularly simple form when there is a link to a linear predictor. Let $X$ denote the $n \times p$ model matrix of predictor variables. Then if we let $\partial m/\partial\eta$ denote the diagonal matrix of partial derivatives with respect to the linear predictor, we have

$$\frac{\partial m}{\partial\beta} = \frac{\partial m}{\partial\eta} X$$

and the (quasi) information matrix is

$$I(\beta) = \frac{\partial m'}{\partial\beta} V^{-1} \frac{\partial m}{\partial\beta} = X' \frac{\partial m'}{\partial\eta} \left(\frac{\partial m}{\partial\eta}\right)^{-2} \frac{\partial m}{\partial\eta} X = X'X.$$

Thus the asymptotic covariance matrix of the quasi-likelihood estimates is the same as for the linear model with model matrix X and constant variance. In particular, any orthogonality properties in the model matrix will be preserved in the sense that the corresponding quasi-likelihood estimates will be asymptotically independent.
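The identity between the quasi-information matrix and $X'X$ can be checked numerically. The following is a minimal sketch for the logit link paired with its orthogonal (quadratic) variance function; the small model matrix `X`, the parameter vectors and the helper names are arbitrary illustrative choices.

```python
import math


def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def quasi_information(X, beta):
    """(dm/dbeta)' V^{-1} (dm/dbeta) for a logit link with v(mu) = mu^2 (1 - mu)^2."""
    p = len(beta)
    info = [[0.0] * p for _ in range(p)]
    for row in X:
        eta = sum(x * b for x, b in zip(row, beta))
        mu = logistic(eta)
        dmu_deta = mu * (1.0 - mu)      # derivative of the logistic function
        v = (mu * (1.0 - mu)) ** 2      # orthogonal variance for the logit link
        w = dmu_deta ** 2 / v           # weight on x x'; equals 1 for this pair
        for a in range(p):
            for c in range(p):
                info[a][c] += w * row[a] * row[c]
    return info


def cross_product(X):
    """Plain X'X, the information matrix of the constant-variance linear model."""
    p = len(X[0])
    return [[sum(r[a] * r[c] for r in X) for c in range(p)] for a in range(p)]


X = [[1.0, -1.5], [1.0, -0.5], [1.0, 0.0], [1.0, 0.7], [1.0, 2.0]]
xtx = cross_product(X)
for beta in ([-0.4, 0.9], [1.2, -0.3]):
    info = quasi_information(X, beta)
    max_diff = max(abs(info[a][c] - xtx[a][c]) for a in range(2) for c in range(2))
    print(beta, max_diff)
```

The difference is zero up to rounding for any value of the parameters, which is the sense in which the covariance matrix does not depend on the fitted means.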

The orthogonal variance function for the logit link is $v(\mu) = \mu^2(1 - \mu)^2$. The orthogonal variance function for the log link is $\mu^2$. For the identity link, both the canonical and orthogonal variance are identically equal to one. Another useful link for the data in our first example is the complementary log-log, $\mu(\eta) = 1 - \exp[-\exp(\eta)]$. The orthogonal variance function is $v(\mu) = [-(1 - \mu)\log_e(1 - \mu)]^2$, while for the canonical variance we omit the exponent. For small values of $\mu$ this is very similar to the quadratic variance function, $\mu^2(1 - \mu)^2$.
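The variance functions quoted above follow from (2.3) and (2.4) by differentiating the inverse link. The sketch below does this numerically with a central difference and compares the result with the closed forms; the function names and the step size are illustrative assumptions.

```python
import math


def numerical_derivative(f, x, h=1e-6):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def logit(mu):
    return math.log(mu / (1.0 - mu))


def cloglog(mu):
    return math.log(-math.log(1.0 - mu))


def canonical_variance(eta_of_mu, mu):
    """Equation (2.3): v(mu) = [d eta / d mu]^(-1)."""
    return 1.0 / numerical_derivative(eta_of_mu, mu)


def orthogonal_variance(eta_of_mu, mu):
    """Equation (2.4): v(mu) = [d eta / d mu]^(-2)."""
    return numerical_derivative(eta_of_mu, mu) ** -2


for mu in (0.05, 0.3, 0.6, 0.9):
    logit_orth = orthogonal_variance(logit, mu)
    cll_orth = orthogonal_variance(cloglog, mu)
    print(f"mu={mu:.2f}  logit/orthogonal={logit_orth:.6f}  "
          f"CLL/orthogonal={cll_orth:.6f}  ratio={cll_orth / logit_orth:.3f}")
```

The printed ratio is close to one for small means and grows as the mean increases, in line with the remark that the two orthogonal variance functions are similar for small values of the mean.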

If the variance function is given, then the inverse of the transformation (2.4) will give the inverse of the orthogonal link (not the natural parameter). This is just the usual formula for a variance stabilizing transformation, and is identical to (2.1), but with exponent $-1/2$. The property of orthogonality may be useful if the means are linked to a linear model. If there are nonlinear parameters in the model, then this property may be less important since orthogonality properties of the model matrix may no longer be preserved. In any case the proposed link/variance combination must be consistent with the data.

Using the fact that

$$\int^{\mu} \frac{z}{v(z)}\,dz = \mu\theta(\mu) - \int \theta(\mu)\,d\mu,$$

the quasi-likelihood function can be written as

$$l(\mu; y) = y\theta(\mu) - \mu\theta(\mu) + \int \theta(\mu)\,d\mu.$$

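We may write this expression in a more familiar form by using the canonical link $\mu(\theta)$ expressed as a function of the natural parameter. We then have

$$l(\mu(\theta); y) = y\theta - \mu(\theta)\theta + \int \theta \frac{d\mu}{d\theta}\,d\theta = y\theta - \int \mu(\theta)\,d\theta.$$

Thus, the function $b(\theta) = \int \mu(\theta)\,d\theta$ satisfies $\partial b/\partial\theta = \mu(\theta)$, and $\partial^2 b/\partial\theta^2 = v(\mu(\theta))$ (Wedderburn, 1974). For example, for $v(\mu) = \mu(1 - \mu)$ we have

$$\theta(\mu) = \log_e\left[\frac{\mu}{1 - \mu}\right] \quad \text{and} \quad \int \mu(\theta)\,d\theta = \log_e(1 + e^{\theta}).$$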

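In the case of the quadratic variance function $v(\mu) = \mu^2(1 - \mu)^2$ it follows from (2.4) that

$$\int \mu(\theta)\,d\theta = \log_e\frac{\mu}{1 - \mu} + \frac{1}{1 - \mu}$$

(McCullagh and Nelder, 1983). As a check in this particular case, note that

$$\frac{\partial b}{\partial\theta} = \frac{\partial b}{\partial\mu}\frac{\partial\mu}{\partial\theta} = \frac{1}{\mu(1 - \mu)^2}\,v(\mu) = \mu.$$

Therefore,

$$l(\mu; y) = y\left[2\log_e\frac{\mu}{1 - \mu} - \frac{1}{\mu} + \frac{1}{1 - \mu}\right] - \log_e\frac{\mu}{1 - \mu} - \frac{1}{1 - \mu}.$$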

If we consider $v(\mu) = ((1 - \mu)\log_e(1 - \mu))^2$, the orthogonal variance function for the complementary log-log link, then

$$\theta(\mu) = -v(\mu)^{-1/2} + \log_e(-\log_e(1 - \mu)) + \sum_{k=1}^{\infty} \frac{(-\log_e(1 - \mu))^k}{k\,k!}$$

and the quasi-likelihood function is

$$l(\mu; y) = (y - 1)\theta(\mu) + 1/\log_e(1 - \mu).$$
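Both quasi-likelihood functions derived in this section can be checked against the defining property that the derivative with respect to the mean equals the residual divided by the variance function. The sketch below does this by finite differences. The function names are hypothetical, the infinite series is truncated at 60 terms, and the sign convention of the series terms (powers of the negative logarithm) is the one that satisfies the defining property.

```python
import math


def ql_quadratic(mu, y):
    """Quasi-likelihood for v(mu) = mu^2 (1 - mu)^2, written as y*theta - b."""
    theta = 2.0 * math.log(mu / (1.0 - mu)) - 1.0 / mu + 1.0 / (1.0 - mu)
    b = math.log(mu / (1.0 - mu)) + 1.0 / (1.0 - mu)
    return y * theta - b


def theta_cll(mu, terms=60):
    """Natural parameter for v(mu) = ((1 - mu) log(1 - mu))^2, truncated series."""
    t = -math.log(1.0 - mu)                      # t > 0 for 0 < mu < 1
    series = sum(t ** k / (k * math.factorial(k)) for k in range(1, terms + 1))
    inv_sqrt_v = 1.0 / ((1.0 - mu) * t)          # v(mu)^(-1/2)
    return -inv_sqrt_v + math.log(t) + series


def ql_cll(mu, y):
    """Quasi-likelihood for the CLL orthogonal variance function."""
    return (y - 1.0) * theta_cll(mu) + 1.0 / math.log(1.0 - mu)


def score_error(ql, v, mu, y, h=1e-6):
    """Difference between the numerical derivative of ql and (y - mu)/v(mu)."""
    numeric = (ql(mu + h, y) - ql(mu - h, y)) / (2.0 * h)
    return numeric - (y - mu) / v(mu)


def v_quadratic(mu):
    return (mu * (1.0 - mu)) ** 2


def v_cll(mu):
    return ((1.0 - mu) * math.log(1.0 - mu)) ** 2


for mu in (0.1, 0.3, 0.6, 0.9):
    for y in (0.05, 0.5):
        print(mu, y,
              score_error(ql_quadratic, v_quadratic, mu, y),
              score_error(ql_cll, v_cll, mu, y))
```

The printed errors are at the level of finite-difference noise, which supports the reconstructed closed forms.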

2.2. Fitting the model

Nonlinear quasi-likelihood models can be fitted using any weighted nonlinear regression computer program with the ability to recompute the weights at each iteration (Cox, 1984a). Such programs are available in both BMDP (3R) and SAS (PROC NLIN); the GLIM statistical system can also be used. Especially useful are programs employing the Gauss-Newton algorithm, which requires the quantities $\mu(\beta)$, $\partial\mu/\partial\beta$ and $v(\mu)^{-1}$ (means, derivatives, and weights), which are specified in the control language of the program. In BMDP 3R the required derivatives are computed numerically. PROC NLIN has an alternative algorithm (DUD) which does not require specification of derivatives. For more complex models, BMDP allows the use of a FORTRAN subroutine, which, however, does require calculation of derivatives, while SAS has a built-in programming capability. In both programs, the deviance can be computed at each iteration and used as a termination criterion; this requires computation of the quasi-likelihood function. Although BMDP 3R was used primarily for the examples, multiplicative interaction models for the first example were fitted using SAS PROC NLIN. This provided a convenient check on the method and the software.
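The role of the three ingredients (means, derivatives, and weights) can be illustrated with a hand-written Gauss-Newton loop. This is only a sketch under simplifying assumptions, not the BMDP or SAS procedure: it uses a two-parameter logit model with the quadratic variance function, noise-free artificial data, a step-size convergence rule instead of the deviance, and no step halving. All names are illustrative.

```python
import math


def mean(beta, x):
    """mu(beta): logistic function of the linear predictor beta0 + beta1 * x."""
    return 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * x)))


def gradient(beta, x):
    """d mu / d beta for the logistic mean function."""
    mu = mean(beta, x)
    d = mu * (1.0 - mu)
    return (d, d * x)


def variance(mu):
    """Quadratic variance function v(mu) = mu^2 (1 - mu)^2."""
    return (mu * (1.0 - mu)) ** 2


def gauss_newton(xs, ys, beta, max_iter=50, tol=1e-10):
    """Solve the quasi-likelihood equations, recomputing weights each iteration."""
    for _ in range(max_iter):
        a11 = a12 = a22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            mu = mean(beta, x)
            d1, d2 = gradient(beta, x)
            w = 1.0 / variance(mu)          # weight from the current fitted mean
            a11 += w * d1 * d1
            a12 += w * d1 * d2
            a22 += w * d2 * d2
            g1 += w * d1 * (y - mu)
            g2 += w * d2 * (y - mu)
        det = a11 * a22 - a12 * a12
        step = ((a22 * g1 - a12 * g2) / det, (a11 * g2 - a12 * g1) / det)
        beta = (beta[0] + step[0], beta[1] + step[1])
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Artificial, noise-free proportions generated from known parameters.
true_beta = (-1.0, 0.8)
xs = [-2.0 + 0.5 * i for i in range(9)]
ys = [mean(true_beta, x) for x in xs]
fit = gauss_newton(xs, ys, beta=(0.0, 0.0))
print(fit)
```

With real data one would add a safeguard such as step halving based on the deviance, and try several sets of initial values for nonlinear parameters.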

Both programs also compute and print appropriately standardized residuals,

$$r = \frac{y - \hat\mu}{[\hat\sigma^2 v(\hat\mu)(1 - h)]^{1/2}}.$$

The scale factor h is the corresponding diagonal element of the projection (hat) matrix

$$H = A(A'A)^{-1}A', \qquad \text{where } A = V^{-1/2}\frac{\partial m}{\partial\beta}$$

is the square root of the (quasi) information matrix (McCullagh and Nelder, 1989, p. 397; Cox, 1984b). These quantities, as well as observed and predicted values, can be conveniently extracted from the output file using a text editor for further analysis and plots. Note that if we have orthogonal link/variance functions in the link-linear case, then the hat matrix is the same as for the linear model.

Nonlinear regression programs typically require initial values for the parameters, and in the examples which we discuss some care was required and alternative sets of initial values occasionally had to be tried for some of the nonlinear models. A useful feature of PROC NLIN is that a range of initial values can be specified for each parameter, and the program will then perform a grid search. For models with a link to a linear predictor a convenient method for choosing initial values is to transform the data and perform the linear regression. The GLIM statistical system has an advantage for such models since initial values are chosen automatically.

3. Examples

3.1. An agricultural field trial

As a first example, we consider the data of Wedderburn (1974). The measurement is the proportion (0-100%) of leaf blotch on 10 varieties of barley grown at nine different sites. The data are also available in McCullagh and Nelder. The logistic function is a natural link to a two-way additive model since it ensures that the predicted values lie between zero and one. It is also a familiar link function for the analysis of binomial data. The possible usefulness of this link may be examined by transforming the data and performing the two-way ANOVA. As noted by Wedderburn, the logit transformation appears to both produce additivity and stabilize the variance. Other possibilities for the link function include the log and complementary log-log links (McCullagh and Nelder, 1983).

On the basis of our previous discussion, the following models are all candidates.

(1) $\mu(\theta) = 1/(1 + e^{-\theta})$, $v(\mu) = \mu(1 - \mu)$ (Canonical pair),

(2) $\mu(\theta) = 1/(1 + e^{-\theta})$, $v(\mu) = \mu^2(1 - \mu)^2$ (Orthogonal pair),

(3) $\mu(\theta) = 1 - \exp[-\exp(\theta)]$, $v(\mu) = (\mu - 1)\log_e(1 - \mu)$ (Canonical pair),

(4) $\mu(\theta) = 1 - \exp[-\exp(\theta)]$, $v(\mu) = (1 - \mu)^2\log_e^2(1 - \mu)$ (Orthogonal pair).


We will consider the fit of each of these link/variance combinations and attempt to select one for the complete analysis. Our initial model is the additive model for the two factors "site" and "variety", having 18 parameters with 72 df for error. We then proceed to consider an extended model having a multiplicative interaction term. This possibility was suggested by Wedderburn in his original analysis. The fits of the four additive models are summarized in Table 1, which gives the estimate of residual error ($\hat\sigma^2$), the deviance ($G^2$) and the chi-square goodness-of-fit statistics ($\chi^2$), as well as the values of the $Q^+$ statistics of Nelder and Pregibon and McCullagh and Nelder. Both statistics were computed during the final iteration; this can be easily accomplished in either BMDP or SAS. Note that when $y = 0$ the value of $l(y; y)$ is undefined in some of these models. Rather than replace these zero values by a small positive number, we set $l(y; y) = 0$ as is done in logistic regression when the response is binary. This produced results similar to replacing zero observations by fitted values.

Of the four models, the values of $G^2$ and $\chi^2$ seem relatively small for the two canonical link combinations, suggesting variance functions that are too large. More important, the values of the $Q^+$ statistics are much smaller for these models than for the other two, while the values of $Q^+_{MN}$ are larger, again suggesting that the estimated variances may be relatively large. The fit of the CLL/orthogonal and logit/orthogonal models seems about equally good as judged by $Q^+$, but perhaps not $Q^+_{MN}$, which in this case reflects primarily the difference in $G^2$. The estimated dispersion parameters for both models are close to one. Since there seems little to choose between the two models we continue to use the original one, the logit/orthogonal. Based on the estimated value, we will also assume a dispersion parameter of one. Parameter estimates for this model are shown in the first column of Table 2. These values agree closely with estimates given by McCullagh and Nelder (1989) using a different parametrization. The asymptotic standard deviations of these estimates are identical to those for a linear model with $I = 10$ rows and $J = 9$ columns, parametrized to have $I$ variety parameters and $J - 1$ site parameters, namely,

$$\sigma\left[\frac{I + J - 1}{IJ}\right]^{1/2} \quad (\text{with } \sigma = 1, \; = 0.4472).$$
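The value 0.4472 can be reproduced directly from the linear model with the same model matrix. The sketch below builds the 90 by 18 model matrix for 10 varieties and 9 sites (one indicator per variety, indicators for the first eight sites, the last site as reference), inverts $X'X$ with a small hand-written Gauss-Jordan routine, and takes square roots of the diagonal with unit dispersion. The coding of the reference site and the helper `invert` are assumptions made for the illustration.

```python
import math

I, J = 10, 9          # varieties (rows) and sites (columns)
p = I + J - 1         # I variety parameters plus J - 1 site parameters

# Model matrix: a variety indicator for every cell, a site indicator for sites 1..J-1.
X = []
for i in range(I):
    for j in range(J):
        row = [0.0] * p
        row[i] = 1.0
        if j < J - 1:
            row[I + j] = 1.0
        X.append(row)

xtx = [[sum(r[a] * r[c] for r in X) for c in range(p)] for a in range(p)]


def invert(m):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == k else 0.0 for k in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [value / piv for value in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


cov = invert(xtx)     # asymptotic covariance matrix with dispersion equal to one
sd = [math.sqrt(cov[k][k]) for k in range(p)]
print([round(s, 4) for s in sd])
```

Every one of the 18 standard deviations comes out the same, for variety and site parameters alike, because the layout is balanced.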

From the estimated parameters, the first three, second three and the last three varieties appear similar to each other. Variety 7 is intermediate between the groups on either side. This observation suggests a model having four distinct variety parameters, with variety parameters within the three groups being equal to each other. Such constraints on the parameters are easily imposed by the BMDP 3R program, and are not difficult to program in SAS. Parameter estimates and standard deviations for the reduced model (having 6 constraints) are shown in the second and third columns of Table 2. The value of $\chi^2$ for this model was 74.75, with $G^2 = 43.90$, showing no evidence against the reduced model when compared to the values in Table 1. For example, the quasi-likelihood ratio statistic was 43.90 - 38.64 = 5.26 with 6 df. When variety parameter 7 was assumed equal to variety parameter 6 (respectively, 8), $G^2$ increased to 53.49 (53.20, respectively). We conclude that variety 7 is distinct from the two groups on either side. These results are similar to those presented more informally by Wedderburn. Of course, in practice, one would have to add a certain degree of skepticism if the final model were suggested solely by the data. Similar results were obtained using other models.

Table 1. Fits of four additive quasi-likelihood models to data on the percentage of leaf blotch on barley

Model (link/variance)             $\hat\sigma^2$   $G^2$    $\chi^2$   $Q^+$     $Q^+_{MN}$
Logit/canonical ($\mu(1-\mu)$)    0.08878          6.13     6.39       134.38    74.47
Logit/quadratic (orthogonal)      0.9881           38.64    71.14      183.50    -19.01
CLL/canonical                     0.08160          4.28     5.87       139.21    86.57
CLL/orthogonal                    0.9249           14.79    66.60      183.50    -4.48

Table 2. Parameter estimates and standard deviations for a full and reduced model. The model is additive with a logit link and quadratic variance function

Parameter     Estimate,     Estimate,        SD      Estimate,           SD
              full model    reduced model            reduced CR model
$v_1$         -0.85         -0.96            0.35    -0.54               0.45
$v_2$         -1.32         -0.96            0.35    -1.32               0.49
$v_3$         -0.78         -0.96            0.35    -1.10               0.48
$v_4$         0.10          0.36             0.35    0.19                0.43
$v_5$         0.50          0.36             0.35    0.67                0.43
$v_6$         0.47          0.36             0.35    0.44                0.43
$v_7$         1.49          1.47             0.45    1.55                0.44
$v_8$         2.41          2.59             0.35    2.37                0.47
$v_9$         2.28          2.59             0.35    2.19                0.46
$v_{10}$      3.03          2.59             0.35    2.89                0.49
$s_1$         -7.07         -7.07            0.45    -7.58               0.57
$s_2$         -5.68         -5.63            0.45    -6.13               0.57
$s_3$         -3.21         -3.29            0.45    -3.05               0.41
$s_4$         -3.51         -3.52            0.45    -3.92               0.57
$s_5$         -2.96         -2.90            0.45    -2.81               0.41
$s_6$         -2.76         -2.82            0.45    -2.61               0.41
$s_7$         -2.15         -2.11            0.45    -2.17               0.45
$s_8$         -1.37         -1.37            0.45    -1.36               0.45
$\theta_1$                                           1.4                 0.22
$\theta_2$                                           1.4                 0.22
$\theta_3$                                           0.76                0.16
$\theta_4$                                           1.4                 0.22
$\theta_5$                                           0.76                0.16
$\theta_6$                                           0.76                0.16
$\theta_7$                                           1.0                 -
$\theta_8$                                           1.0                 -


In his original paper Wedderburn also mentions the possibility of multiplicative interaction models for these data. One formulation of this idea for an $I \times J$ table is

$$\mathrm{link}(\mu_{ij}) = \alpha_i + \beta_j + \lambda_i\theta_j,$$

($\beta_J = 0$, $\lambda_I = 0$, $\theta_J = 0$, $\theta_{J-1} = 1$). This is a model with $2I + 2J - 4$ parameters. In the present table ($I = 10$, $J = 9$) this results in 34 parameters, a rather large number. In order to simplify this model we used a diagnostic plot known as the biplot (Bradu and Gabriel, 1978). Briefly, the biplot provides a representation of both sites (by arrows) and varieties (by plotted points). Patterns in the arrangement of points and arrows suggest various multiplicative interaction models. A biplot of the logits of the data showed that the points for the 10 varieties were roughly collinear, while the 9 sites fell into three roughly collinear groups (1-2-4, 3-5-6 and 7-8-9). The model which this plot suggested is a special case of the general multiplicative interaction model known as a columns regression model:

$$\mathrm{logit}(\mu_{ij}) = \beta_j + \lambda_i\theta_j,$$

($\beta_J = 0$, $\theta_J = 1$), having $I + 2(J - 1)$ parameters.

The value of $G^2$ for the full model, having 26 parameters, was 27.87. The quasi-likelihood ratio statistic for the 8 extra site parameters is then 38.64 - 27.87 = 10.77 with 8 df, which is not significant. As noted above, groupings among the biplot arrows suggest equalities among the multiplicative site parameters. When the corresponding 6 constraints ($\theta_1 = \theta_2 = \theta_4$; $\theta_3 = \theta_5 = \theta_6$; $\theta_7 = \theta_8 = 1$) were imposed, the value of $G^2$ was 28.75, a decrease of 9.89 from the additive model, with 2 df ($p = 0.007$). Parameter estimates and standard deviations for this model are shown in Table 2. Rather than model further relationships among sites, we imposed the same 6 constraints on the variety parameters as for the additive model, with the result that $G^2$ was 34.03, an increase of 5.28 with 6 df from the reduced columns regression model. This value may also be compared with the value of $G^2$ for the reduced additive model, 43.90 - 34.03 = 9.87 with 8 df. When variety parameter 7 was assumed equal to variety parameter 6 (respectively, 8), $G^2$ rose to 47.08 (43.92). Thus results of comparing the 10 varieties are similar to those from the additive model. In this case the biplot proved to be a convenient tool for choosing an appropriate multiplicative interaction model.
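The deviance comparisons in this example can be turned into approximate p-values by referring each difference in $G^2$ to a chi-square distribution, taking the dispersion parameter as one as assumed above. The sketch below uses the closed-form upper tail of the chi-square distribution, which is valid for even degrees of freedom only; the function name and the labels are illustrative.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))


# (label, larger G^2, smaller G^2, df) taken from the text of this example.
comparisons = [
    ("additive: reduced vs full", 43.90, 38.64, 6),
    ("additive vs full columns regression", 38.64, 27.87, 8),
    ("additive vs reduced columns regression", 38.64, 28.75, 2),
]
p_values = {}
for label, g2_small_model, g2_large_model, df in comparisons:
    stat = g2_small_model - g2_large_model
    p_values[label] = chi2_upper_tail_even_df(stat, df)
    print(f"{label:40s} stat = {stat:5.2f}  df = {df}  p = {p_values[label]:.3f}")
```

Only the third comparison gives a small p-value, which agrees with the value of 0.007 quoted for the constrained multiplicative site parameters.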

3.2. An experiment in toxicology

The data for this example were collected in an experiment to study the effects of a chelating agent, DMPS, for the removal of mercury from exposed mice. Groups of animals were given different doses of radioactive mercuric chloride and of DMPS and the amount of mercury excreted in urine was measured as a fraction of the initial body burden (normalizing for the dose of mercury). Like the previous data, the measurement is a continuous ratio varying between 0.0 and approximately 70.0%. As is often done in toxicology, we used a logit model with the $\log_{10}$ of the dose of DMPS as the predictor variable. A background level of excretion was also assumed. To complete the model we employed the quadratic variance function $v(\mu) = \mu^2(1 - \mu)^2$.

After a number of fits, including a fourth degree polynomial in the predictor variable, it became clear that a horizontal asymptote was needed in the model. The final model was therefore $\mu = k + (\gamma - k)/[1 + \exp(-(\alpha + \beta x^*))]$, where $x^*$ is the $\log_{10}$ of the dose of DMPS. The parameter $k$ ($0 < k < 1$) is the background rate, while $\gamma > k$ (assuming $\beta > 0$) is the value of the asymptote. Of course, in this case, we do not have a link to a linear predictor since $k$ and $\gamma$ are "nonlinear" parameters. The fit of this model was satisfactory compared to the degrees of freedom ($G^2 = 30.42$, $\chi^2 = 41.82$ with 80 df). The values of the parameters (SD) were $k = 0.02$ (0.0035), $\gamma = 0.459$ (0.055), $\alpha = 3.784$ (2.197), $\beta = 1.587$ (0.673). The fit is summarized in Fig. 1 which shows a plot of the fitted dose-effect curve vs. $\log_{10}$ DMPS together with the data, summarized as mean ± 2 SE. Groups of three or four animals from different experimental runs are plotted separately. The controls are arbitrarily placed. The wider intervals reflect smaller numbers of animals in some groups. Intervals have been jittered to prevent overlap. The fit of this model is clearly satisfactory. A plot of standardized residuals and predicted values did not suggest any problems.
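The fitted dose-effect curve can be evaluated directly from the reported estimates. The sketch below assumes the reading of the model given above (background rate, upper asymptote, intercept and slope on the log10 dose scale) and a dose measured in the units of the horizontal axis of Fig. 1; the function and constant names are illustrative.

```python
import math

# Reported parameter estimates: background, upper asymptote, intercept, slope.
K_HAT, GAMMA_HAT, ALPHA_HAT, BETA_HAT = 0.02, 0.459, 3.784, 1.587


def mean_response(dose, k=K_HAT, gamma=GAMMA_HAT, alpha=ALPHA_HAT, beta=BETA_HAT):
    """Expected fraction excreted at a given DMPS dose (dose must be positive)."""
    x_star = math.log10(dose)
    return k + (gamma - k) / (1.0 + math.exp(-(alpha + beta * x_star)))


def variance_function(mu):
    """Quadratic variance function used to weight the fit."""
    return (mu * (1.0 - mu)) ** 2


doses = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0]
curve = [mean_response(d) for d in doses]
for d, mu in zip(doses, curve):
    print(f"dose = {d:8.4f}  fitted mean = {mu:.4f}  weight = {1.0 / variance_function(mu):.1f}")
```

The fitted mean rises monotonically from just above the background rate toward the upper asymptote, and the weights are largest at low doses where the fitted fraction is small.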

[Figure 1: plot omitted. Horizontal axis: DMPS (mmole/kg), logarithmic scale from 0.0001 to 10; vertical axis from 0.00 to 0.70.]

Fig. 1. Data and fitted response curve for a quasi-likelihood model based on the logistic function. The four parameter model included nonlinear parameters for background (lower asymptote) and maximum effect (upper asymptote). Data from different runs are plotted as mean ± 2 SE.


The mean function for this model is identical to a standard model used in radioimmunoassay (Finney, 1976). The variance function is different, however, since these models are usually based on radioactive counts (of bound ligand). Alternatively, the ratio of bound to total could be treated as a continuous proportion and the model considered in this example could be applied. The advantage of this approach is that the present variance function does not involve unknown parameters, as is usually the case in radioimmunoassay.

4. Discussion

The method of fitting expected values has been shown to perform quite satisfactorily for two different nonlinear quasi-likelihood models. This approach facilitates model fitting by allowing the use of standard statistical packages such as SAS and BMDP. The models considered provide a useful class of alternatives for data in the form of continuous fractions. Clearly, however, concepts such as orthogonality of link and variance functions are useful in modeling other sorts of data. The quasi-likelihood deviance statistic, $G^2$, appears to be a useful measure of goodness-of-fit, giving similar results to those using $\chi^2$. The extended quasi-likelihood is useful for discrimination between models with different variance functions.

At the same time, a certain amount of caution seems appropriate when using these models in practice. The ability to choose the variance function to fit the data raises the possibility of over- or underestimating the variability, with consequences for both confidence intervals and hypothesis tests. Although our first example offers some encouragement, experience suggests that it is probably best if there is some previous experience on which to draw in choosing both the model for the mean and the variance functions. This was the case in our second example, at least as regards the model for the mean. In the case of radioimmunoassay as well, models have been suggested for the variance function. Some of these include parameters in the variance function, and large amounts of data are typically required to estimate these parameters. As with any technique, nonlinear quasi-likelihood models offer a useful family of modeling tools when used with appropriate caution.

The fitting of these models presents no difficulty using the approach we have illustrated. In every case, the required programs were short and easy to write. Standard features of both the BMDP3R and SAS programs greatly facilitated model checking and exploration. Results using the two different programs were very consistent. While both systems are batch-oriented, results can be extracted from the output for further analysis.

References

Bradu, D. and K.R. Gabriel, The biplot as a diagnostic tool for models of two-way tables, Technometrics 20 (1978) 47-68.

Carroll, R.J. and D. Ruppert, Transformation and weighting in regression (Chapman & Hall, New York, 1988).

Cox, C., Generalized linear models - the missing link, Appl. Statist. 33 (1984a) 18-24.

Cox, C., An elementary introduction to maximum likelihood estimation for multinomial models: Birch's theorem and the delta method, The Amer. Statistician 38 (1984b) 283-287.

Finney, D.J., Radioligand assay, Biometrics 32 (1976) 721-740.

Jennrich, R.I. and R.H. Moore, Maximum likelihood estimation by means of non-linear least squares, Proc. Statistical Computing Section, Amer. Statist. Assoc. (1975) 57-65.

McCullagh, P., Quasi-likelihood functions, Ann. Statist. 11 (1983) 59-67.

McCullagh, P. and J.A. Nelder, Generalized linear models (Chapman & Hall, New York, 1983).

McCullagh, P. and J.A. Nelder, Generalized linear models, 2nd edn. (Chapman & Hall, New York, 1989).

Nelder, J.A. and D. Pregibon, An extended quasi-likelihood function, Biometrika 74 (1987) 221-232.

Pregibon, D., Goodness of link tests for generalized linear models, Appl. Statist. 29 (1980) 15-24.

Wedderburn, R.W.M., Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method, Biometrika 61 (1974) 439-447.
