Classification on the Basis of Successive Observations



  • Classification on the Basis of Successive Observations
    Author(s): K. Ulm
    Source: Biometrics, Vol. 40, No. 4 (Dec., 1984), pp. 1131-1136
    Published by: International Biometric Society
    Accessed: 28/06/2014 08:14


    This content downloaded from on Sat, 28 Jun 2014 08:14:16 AMAll use subject to JSTOR Terms and Conditions

  • BIOMETRICS 40, 1131-1136 December 1984

    Classification on the Basis of Successive Observations

    K. Ulm Institute for Medical Statistics and Epidemiology, Technical University Munich,

    Sternwartstrasse 2, D-8000 Munich 80, West Germany

    SUMMARY A model is presented for the classification of an individual into one of two populations on the basis of successive measurements. We consider the model in which the measurements follow exponential decay curves. These curves are described by an autoregressive stochastic process of first order. With these assumptions the expected values and covariance matrices of the variables are estimated and used to calculate the discriminant function. This model is applied to the assessment of the diagnosis of Type-B hepatitis virus infections.

    1. Introduction

    A traditional classification problem arises in testing a diagnostic procedure that involves simple observations.

    Usually the observations are taken at one time, but we consider a special situation in which observations on one or more random variables are made on an individual at successive points in time. The goal is to classify the individual into one of two or more populations $\Pi_i$ ($i = 1, 2, \ldots$) on the basis of these observations. This situation occurs in the diagnosis or prognosis of a disease, and the classification generally affects the choice of treatment.

    An example is provided by the diagnosis of Type-B hepatitis virus infection. Very broadly, there are two different forms of hepatitis-B: acute and chronic. Each has its special treatment, which is appropriate only for that form and not for the other. Therefore it is desirable to allocate a patient to the right treatment as soon as possible. The exact diagnosis of Type-B hepatitis can be made histologically some months after the onset of the disease. The question is: are there some easily observed features that would permit an earlier diagnosis? Some enzymes which characterize the condition of the liver have been considered. The measurement of the enzymes at a single time does not reflect the previous course of the disease. Therefore these enzymes have to be monitored over a longer period.

    The times at which the observations are made are represented by $t = 0, 1, 2, \ldots, N$. The value of $N$ may be thought of as the minimum period of observation before a diagnosis can be made with a high degree of certainty. The observation, $x_t$, at time $t$ is thought of as the realization of a random variable, $X_t$. Let $X = (X_0, \ldots, X_N)$ be normally distributed in $\Pi_i$ ($i = 1, 2$), thus $X \sim N(\mu_i, \Sigma_i)$; let $C(i \mid j)$ be the cost of misclassifying $x$ in $\Pi_i$ when it should be in $\Pi_j$ ($i \neq j$); and let $q_i$ be the a priori known probability that $x$ comes from $\Pi_i$ ($i = 1, 2$).

    The decision rule that minimizes the expected costs of misclassification is well-known (see Anderson, 1958, Ch. 6). In the case of two populations $\Pi_1$ and $\Pi_2$, when the covariance matrices are equal, i.e. $\Sigma_1 = \Sigma_2 = \Sigma$, the decision rule is based on the linear discriminant function

    Key words: Discriminant analysis; Autoregressive stochastic process; Exponential growth curves; Sequential allocation.



    $D(x) = [x - \tfrac{1}{2}(\mu_1 + \mu_2)]' \Sigma^{-1} (\mu_1 - \mu_2)$. (1)

    When $\Sigma_1 \neq \Sigma_2$, one has the so-called quadratic discriminant function. A patient is classified into $\Pi_1$ if

    $D(x) > \ln c$, otherwise into $\Pi_2$, with

    $c = C(1 \mid 2) q_2 [C(2 \mid 1) q_1]^{-1}$. (2)

    The coefficients of the discriminant function depend on the parameters of the distributions of $x$ ($\mu_i$ and $\Sigma_i$, $i = 1, 2$). These parameters are usually unknown and have to be estimated from samples of $\Pi_1$ and $\Pi_2$. With the estimators $\hat{\mu}_i$ and $\hat{\Sigma}_i$, one can calculate $D(x)$. In the case of successive observations, various methods exist to estimate the parameters; see Lachenbruch (1975, p. 93) or Azen, Garcia-Pera and Afifi (1975). The purpose of this communication is to derive a new approach.
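
    The linear rule in (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (solve, linear_discriminant, classify) and all numerical values are hypothetical and are not taken from the hepatitis-B data; the parameters are treated as already estimated, and the covariance matrices are assumed equal.

```python
import math

def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of the system
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z

def linear_discriminant(x, mu1, mu2, sigma):
    """D(x) = [x - (mu1 + mu2)/2]' Sigma^{-1} (mu1 - mu2), as in (1)."""
    w = solve(sigma, [a - b for a, b in zip(mu1, mu2)])  # Sigma^{-1} (mu1 - mu2)
    centred = [xi - 0.5 * (a + b) for xi, a, b in zip(x, mu1, mu2)]
    return sum(ci * wi for ci, wi in zip(centred, w))

def classify(x, mu1, mu2, sigma, q1=0.5, q2=0.5, c12=1.0, c21=1.0):
    """Return 1 if D(x) > ln c with c = C(1|2) q2 / (C(2|1) q1), as in (2); otherwise 2."""
    c = (c12 * q2) / (c21 * q1)
    return 1 if linear_discriminant(x, mu1, mu2, sigma) > math.log(c) else 2

# Hypothetical parameters for two observation times (illustration only).
mu1, mu2 = [1000.0, 600.0], [300.0, 250.0]
sigma = [[40000.0, 12000.0], [12000.0, 40000.0]]
print(classify([900.0, 550.0], mu1, mu2, sigma))
print(classify([350.0, 260.0], mu1, mu2, sigma))
```

    With equal costs and equal prior probabilities, c = 1 and the threshold ln c is 0, so the rule reduces to the sign of D(x).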

    2. Description of the Model

    Consider the model in which the variables in each of the two populations decay almost exponentially. Figure 1 shows the decay curves for one enzyme in the diagnosis of hepatitis-B (for details, see Neiss and Ohlen, 1978).

    [Figure 1: plot of GPT against weeks after the GOT peak, with curves labelled Acute Hepatitis and Chronic Hepatitis.]

    Figure 1. Curves of the variable GPT (glutamate-pyruvate-transaminase) in 50 patients with acute hepatitis-B and 38 patients with chronic hepatitis-B. The mean values, the 25% ($Q_1$) and 75% ($Q_3$) percentiles are shown. Time is measured from peak GOT (glutamate-oxalacetate-transaminase).

    In order to simplify the notation, the subscript $i$ that denotes the population will be omitted. The exponential decrease is described by a first-order autoregressive time-series model in which $x_t$ ($0 \le t \le N$) is again the realization of a random variable $X_t$,

    $X_t - \alpha = \phi (X_{t-1} - \alpha) + \epsilon_t$ (3)

    with $E(\epsilon_t) = 0$. Using (3) recursively, we easily find the exponential decrease in $E(X_t)$ to be

    $E(X_t) = \alpha + \beta \phi^t$, with

    $E(X_0) = \alpha + \beta$.
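
    The step from the recursion (3) to the exponential mean curve can be checked numerically. The sketch below iterates the expectation of (3) and compares it with the closed form; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
def mean_curve(alpha, beta, phi, n):
    """Closed form E(X_t) = alpha + beta * phi**t for t = 0, ..., n."""
    return [alpha + beta * phi ** t for t in range(n + 1)]

def mean_by_recursion(alpha, beta, phi, n):
    """Iterate E(X_t) - alpha = phi * (E(X_{t-1}) - alpha), starting from E(X_0) = alpha + beta."""
    means = [alpha + beta]
    for _ in range(n):
        means.append(alpha + phi * (means[-1] - alpha))
    return means

# Hypothetical values: asymptote 100, initial excess 900, decay factor 0.6.
closed = mean_curve(100.0, 900.0, 0.6, 6)
recursive = mean_by_recursion(100.0, 900.0, 0.6, 6)
print(max(abs(a - b) for a, b in zip(closed, recursive)))
```

    For 0 < phi < 1 the curve decays towards alpha, which is the asymptotic level of the enzyme in this parametrization.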

    The usual assumptions are made about the error terms $\epsilon_t$:

    (i) $E(\epsilon_t) = 0$, (ii) $\mathrm{var}(\epsilon_t) = \sigma^2$ (constant), (iii) $\mathrm{corr}(\epsilon_t, \epsilon_{t'}) = 0$, $t \neq t'$.

    These assumptions lead to constant $\mathrm{var}(X_t)$, but from Fig. 1 one can see that this does not hold in the hepatitis-B example. Therefore we have to modify Assumption (ii) about the error terms $\epsilon_t$ (see Wegman, 1974) when considering a process which, in general, will be nonstationary; the assumptions to be made then are

    (i) $E(\epsilon_t) = 0$, (ii) $\mathrm{var}(\epsilon_t) = \sigma_t^2$, depending on $t$, (iii) $\mathrm{corr}(\epsilon_t, \epsilon_{t'}) = 0$, $t \neq t'$. (4)

    This means $\{\epsilon_t\}$ is an uncorrelated sequence of random variables with expectation 0 and a variance depending on $t$.
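
    To make the modified assumptions concrete, the following sketch simulates one path of the process (3) with error variances that change over time, as in (4). It assumes normally distributed errors and treats the deviation of the first observation from its mean as the error at time 0; both choices, the function name simulate_path and all numbers are assumptions made for this illustration.

```python
import random

def simulate_path(alpha, beta, phi, sigmas, rng):
    """One path of X_t - alpha = phi * (X_{t-1} - alpha) + eps_t, with sd(eps_t) = sigmas[t].

    The first value is alpha + beta plus a draw with standard deviation sigmas[0].
    """
    x = [alpha + beta + rng.gauss(0.0, sigmas[0])]
    for s in sigmas[1:]:
        x.append(alpha + phi * (x[-1] - alpha) + rng.gauss(0.0, s))
    return x

# Hypothetical settings: the error standard deviation shrinks as the curve decays.
rng = random.Random(1984)
path = simulate_path(100.0, 900.0, 0.6, [200.0, 120.0, 80.0, 50.0, 30.0], rng)
print([round(v, 1) for v in path])
```

    Setting every entry of sigmas to zero reproduces the mean curve exactly, which is a convenient sanity check.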

    2.1 Variance-Covariance Structure

    The variance-covariance structure will be derived by using the variable $Y_t = X_t - \alpha$ instead of $X_t$. From (3) it is easy to see that

    $Y_t = \phi Y_{t-1} + \epsilon_t$. (5)

    Multiplying both sides by $Y_{t-s}$, we have

    $Y_t Y_{t-s} = \phi Y_{t-1} Y_{t-s} + \epsilon_t Y_{t-s}$.

    Taking expectations, we obtain

    $E(Y_t Y_{t-s}) = \phi E(Y_{t-1} Y_{t-s})$.

    Subtracting the expectation of (5), we have for the covariance structure

    $\mathrm{cov}(Y_t, Y_{t-s}) = \phi \, \mathrm{cov}(Y_{t-1}, Y_{t-s})$. (6)

    Recursive application of (6) gives

    $\mathrm{cov}(X_t, X_{t-s}) = \mathrm{cov}(Y_t, Y_{t-s}) = \phi^s \, \mathrm{var}(Y_{t-s})$. (7)

    Thus the system of covariances depends only on $\mathrm{var}(X_t)$ and the parameter $\phi$.
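
    Relation (7) means that the whole covariance matrix can be filled in from the variances and the autoregressive parameter alone. A minimal sketch, with a hypothetical function name and made-up variances:

```python
def covariance_matrix(variances, phi):
    """Covariance matrix with entries cov(X_t, X_u) = phi**|t - u| * var(X_min(t, u)), as in (7)."""
    n = len(variances)
    return [[phi ** abs(t - u) * variances[min(t, u)] for u in range(n)]
            for t in range(n)]

# Hypothetical variances at t = 0, 1, 2 and a hypothetical autoregressive parameter.
sigma = covariance_matrix([4.0, 3.0, 2.0], 0.5)
for row in sigma:
    print(row)
```

    The variance of the earlier of the two observations enters each off-diagonal entry, so the matrix is symmetric but is not of the stationary form with a constant diagonal.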


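    For the variances it follows that

    $X_t - E(X_t) = (X_t - \alpha) - E(X_t - \alpha)$

    $= Y_t - E(Y_t)$

    $= (\phi Y_{t-1} + \epsilon_t) - \phi E(Y_{t-1})$

    $= \phi [Y_{t-1} - E(Y_{t-1})] + \epsilon_t$, and recursively,

    $Y_t - E(Y_t) = \sum_{j=0}^{t} \phi^{t-j} \epsilon_j$.

    Squaring, taking expected values and recalling that $\epsilon_t$ is uncorrelated with $\epsilon_{t'}$, $t \neq t'$, we have

    $\mathrm{var}(Y_t) = \mathrm{var}(X_t) = \sum_{j=0}^{t} \phi^{2(t-j)} \sigma_j^2$. (8)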
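
    The sum in (8) is equivalent to a one-step recursion for the variances, which the following sketch verifies numerically. It assumes that the error at time 0 stands for the deviation of the first observation from its mean, so that its variance is the variance of the first observation; the function names and values are hypothetical.

```python
def variance_closed_form(phi, sigma2):
    """var(X_t) = sum over j = 0..t of phi**(2 * (t - j)) * sigma2[j], as in (8)."""
    return [sum(phi ** (2 * (t - j)) * sigma2[j] for j in range(t + 1))
            for t in range(len(sigma2))]

def variance_by_recursion(phi, sigma2):
    """Equivalent recursion: var(X_t) = phi**2 * var(X_{t-1}) + sigma2[t]."""
    out = [sigma2[0]]
    for s2 in sigma2[1:]:
        out.append(phi ** 2 * out[-1] + s2)
    return out

# Hypothetical error variances at t = 0, 1, 2.
print(variance_closed_form(0.5, [4.0, 1.0, 1.0]))
print(variance_by_recursion(0.5, [4.0, 1.0, 1.0]))
```

    Both functions return the same sequence, so either form can be used to pass from error variances to the variances of the observations, or back again.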

    To summarize, for the estimation of all $E(X_t)$ it is enough to estimate just three parameters, $\alpha$, $\beta$ and $\phi$. For the covariance matrix it is only necessary to estimate the variances of the random variables $X_t$. For the extension from the univariate to the multivariate case, the cross-correlations between the variables are also required.

    2.2 Estimation Procedure

    To estimate the parameters it is necessary to analyze an autoregressive stochastic process by either maximum likelihood or least-squares methods (see Gallant and Goebel, 1976). In a first step the parameters $\theta = (\alpha, \beta, \phi)$ will be estimated for each sample. With $\hat{\theta}$ as the estimator for $\theta$, one can estimate $\mathrm{var}(X_t)$ and $\Sigma$ using (7).

    Gallant and Goebel suggested an iterative procedure to compute a quasi-Aitken estimator of $\theta$, with $\hat{\theta}$ as the starting value, weighted with the estimated covariance matrix $\hat{\Sigma}$. The results of an additional simulation study show that this procedure does not improve the estimation in the situations considered.
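
    As a rough illustration of the first estimation step, the sketch below fits the mean curve by ordinary least squares. It is not the procedure of Gallant and Goebel: it simply exploits the fact that, for a fixed autoregressive parameter, the curve is linear in the other two parameters, and searches a grid of values between 0 and 1. The function name fit_decay, the grid size and the data are assumptions made for this example.

```python
def fit_decay(times, values, grid=999):
    """Least-squares fit of x_t = alpha + beta * phi**t using a grid over phi in (0, 1).

    For each candidate phi, alpha and beta follow from simple linear regression
    of the values on phi**t; the candidate with the smallest residual sum of
    squares is returned as (alpha, beta, phi).
    """
    n = len(times)
    xbar = sum(values) / n
    best = None
    for k in range(1, grid + 1):
        phi = k / (grid + 1)
        z = [phi ** t for t in times]
        zbar = sum(z) / n
        szz = sum((zi - zbar) ** 2 for zi in z)
        if szz == 0.0:
            continue  # regressor is constant, so beta is not identifiable
        beta = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, values)) / szz
        alpha = xbar - beta * zbar
        rss = sum((xi - alpha - beta * zi) ** 2 for zi, xi in zip(z, values))
        if best is None or rss < best[0]:
            best = (rss, alpha, beta, phi)
    return best[1], best[2], best[3]

# Noise-free hypothetical data generated from alpha = 100, beta = 900, phi = 0.6.
times = list(range(7))
values = [100.0 + 900.0 * 0.6 ** t for t in times]
print(fit_decay(times, values))
```

    At least three observation times are needed for the three parameters to be identifiable, and with real data the pooled observations of all patients in a sample would replace the noise-free values used here.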

    3. Application

    3.1 Description of the Hepatitis-B Study

    The decision rule was applied to a sample consisting of 88 patients, 50 patients with acute hepatitis-B (AH) and 38 patients with chronic hepatitis-B (CH). All diagnoses were based on histological reports. For about two months the levels of 10 enzymes were determined weekly in all 88 patients. These enzymes are easy to monitor. Besides GPT (glutamate-pyruvate-transaminase) there were GOT (glutamate-oxalacetate-transaminase), γ-GT (glutamyl-transpeptidase), CHE (cholinesterase) etc.

    Descriptions of all enzymes have been given by Neiss and Ohlen (1978). The difference between the two populations, AH and CH, is shown for GPT in Fig. 1; less pronounced differences exist for the other enzymes.

    3.2 Allocation at a Fixed Time

    A patient is classified into Population $\Pi_1$ if the discriminant score $D(x)$ is greater than 0 and otherwise into Population $\Pi_2$. This means that the value of $c$ [defined in (2)] is set to 1 and the a posteriori probability for $\Pi_1$ is

    $\mathrm{pr}(\Pi_1 \mid x) > .5$.

    Using the GPT data up to the third week (the covariance matrices were unequal), we find that 97% of patients are classified correctly by the quadratic discriminant function with both the leaving-one-out and reclassification methods. Additional variables do not make for a better classification rate.

    Estimation of $\mu_i$ and $\Sigma_i$ in the usual way, with time dependence ignored, yields a linear discriminant function that gives the following results (see Neiss and Ohlen, 1978): (i) the classification rate with the leaving-one-out method is 90%, based on the values of GOT and GPT at the time of the GOT peak and one week later (93% with the reclassification method); (ii) if the CHE and γ-GT levels of the third week are used in addition, the classification rate is improved to 94% for both methods; (iii) the quadratic discriminant function shows higher error rates.

    3.3 Sequential Allocation

    To make a decision on the basis of successive measurements, as in the exact diagnosis of hepatitis-B, an appropriate sequential allocation rule can easily be devised. At each time $t$ one can decide to allocate a patient to one of the two populations or continue making observations (for details see, for example, Lachenbruch, 1975, p. 93). A patient is classified if the a posteriori probability $\mathrm{pr}(\Pi_i \mid x)$ is greater than .95.
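
    A sequential rule of this kind might look as follows. The sketch assumes that, at each time, the log ratio of the two population densities for the observations collected so far has already been computed (for example from a linear or quadratic discriminant function); the function names, the default prior probabilities and the example values are hypothetical.

```python
import math

def posterior_pi1(log_ratio, q1=0.5, q2=0.5):
    """pr(Pi_1 | x) from log_ratio = ln[f1(x) / f2(x)] and prior probabilities q1, q2."""
    z = math.log(q2 / q1) - log_ratio
    if z > 700.0:  # exp would overflow; the posterior is numerically zero
        return 0.0
    return 1.0 / (1.0 + math.exp(z))

def sequential_allocation(log_ratios, threshold=0.95, q1=0.5, q2=0.5):
    """Allocate at the first time a posterior probability exceeds the threshold.

    log_ratios[t] is the log density ratio based on the observations up to time t.
    If no decision has been reached at the last time, allocation is forced
    to the population with posterior probability above .5.
    Returns (population, time of decision).
    """
    if not log_ratios:
        raise ValueError("at least one observation time is required")
    last = len(log_ratios) - 1
    for t, d in enumerate(log_ratios):
        p1 = posterior_pi1(d, q1, q2)
        if p1 > threshold:
            return 1, t
        if 1.0 - p1 > threshold:
            return 2, t
        if t == last:
            return (1 if p1 > 0.5 else 2), t

# Hypothetical log ratios for three patients over successive weeks.
print(sequential_allocation([0.5, 1.0, 3.5]))
print(sequential_allocation([-4.0]))
print(sequential_allocation([-0.2, 0.1, -0.4]))
```

    With equal prior probabilities a posterior probability above .95 corresponds to a log ratio above ln 19, roughly 2.94, so the threshold can equally be applied to the discriminant score itself.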

    After the first week (use of GPT only and a quadratic discriminant function), 85% were classified, all of them correctly. Observations for the other patients had to be continued. At the sixth week, observations were ended so an allocation was forced, i.e. $\mathrm{pr}(\Pi_i \mid x) > .5$. By this procedure 2% of patients are misclassified by the leaving-one-out rule. With the exponential model of Azen, Garcia-Pera and Afifi (1975), which seems the appropriate one for the hepatitis example, one has to wait some weeks before starting classification. This procedure uses the parameters of the curves instead of the measurements themselves. For (3), which is also the basis of the model of Azen et al. (1975), three parameters, $\alpha$, $\beta$ and $\phi$, are necessary to describe a curve for one patient. To estimate these parameters one needs at least three observations, hence the wait before starting classification. Thereafter the classification rate is nearly the same.

    4. Conclusions and Discussion

    A model under the assumptions of an autoregressive stochastic process has been derived for the classification of patients into one of two populations, on the basis of successive observations. In assessing the diagnosis of Type-B hepatitis virus this model gave rise to lower error rates and allowed an earlier classification than other models.

    The assumption of an autoregressive stochastic process of first order is valid only if the data follow an exponential trend. If the measurements follow some other trend one has to modify the assumptions. However, the application of this model is not restricted to an autoregressive process of first order. This process seems to be appropriate in the example considered, but within the model it is also possible to assume that the underlying process is described by an autoregressive process of higher order or a moving average process or a mixture of both. The fit of the data is then better, but one has to estimate more parameters which may lead to a higher error rate.

    The assumptions depend on the particular situation. In the example of the diagnosis of hepatitis-B infections, the time interval between two measurements is constant. Suppose observations follow some other pattern: at the outset the intervals are short and become longer towards the end of the observation period. The proposed model applies in such situations. The time interval is reflected in the covariance between two measurements, and this is taken into account in the model.

    To summarize, models which recognize that the course of a disease is a dynamic process seem to be appropriate for constructing a decision rule, especially when the observations are taken at successive times.


    The author wishes to thank Profs Dr H.-J. Lange...

