Equivalence and Interval Testing for Lehmann's Alternative Author(s): Axel Munk Source: Journal of the American Statistical Association, Vol. 91, No. 435 (Sep., 1996), pp. 1187- 1196 Published by: American Statistical Association Stable URL: http://www.jstor.org/stable/2291737 . Accessed: 16/06/2014 15:22 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal of the American Statistical Association. http://www.jstor.org This content downloaded from 195.34.79.174 on Mon, 16 Jun 2014 15:22:31 PM All use subject to JSTOR Terms and Conditions


Equivalence and Interval Testing for Lehmann's Alternative

Axel MUNK

Equivalence and interval tests for Lehmann's alternative that extend the well-known Savage test for one-sided hypotheses are proposed. The proposed tests are shown to be unbiased with a strictly unimodal power function, provided the sample sizes in both treatment groups are equal. By means of a numerical investigation of the bias in the case of unequal sample sizes that are not too far apart, the suggested tests still turn out to provide practicable solutions. Because the computational effort to perform the suggested tests is considerable, tables containing the critical values are displayed to perform these tests easily. A numerical analysis of the power function of the interval test establishes this procedure as a powerful tool for detection of a significantly relevant difference in the small-sample case. In contrast to the case of interval testing, the fact arises that the performance of a powerful equivalence study under Lehmann's alternative requires an extensive amount of data. Because the proposed tests are based on the locally optimal scores under Lehmann's alternative, we cannot improve the suggested equivalence test essentially. Therefore, we also provide the asymptotic version of this test and display tables containing the required numerical values.

KEY WORDS: Bioequivalence; Clinically relevant difference; Population equivalence; Savage's test; Unbiased testing.

1. INTRODUCTION

A test formulation (T) of a drug and a reference formulation (R) are considered to be bioequivalent if the rate and the extent of the absorption of the drug in the systemic circulation are similar. According to the U.S. Food and Drug Administration (FDA), the ratio of two formulations $\mu_T/\mu_R$ for the pharmacokinetic parameters under study must be within some reasonable limits (which have been determined by the clinician) when bioequivalence between the two formulations is claimed. (For further methodological aspects, refer to Dobbins and Thiyagarajan 1992, Dunnett and Gent 1977, or Metzler 1974.) This is termed average bioequivalence in the literature and represents the most popular criterion for bioequivalence. Various authors have criticized this concept (see Anderson and Hauck 1990 or Holder and Hsuan 1993), because average bioequivalence focuses only on the comparison of the means of the underlying distributions. Therefore, these same authors have suggested a bioequivalence criterion that requires the entire distribution of the test formulation to be sufficiently similar to that of the reference, a concept called population bioequivalence. They came to the conclusion that for a patient being started on a new drug, population equivalence seems to be a more appropriate criterion than average bioequivalence (see Hauck and Anderson 1992). Although average bioequivalence often seems to be an inappropriate measure of equivalence, it is the most common criterion in equivalence studies up to now. (For a comparison of the most popular procedures in the case of normally distributed data, refer to Mandallaz and Mau 1981.) Jennison and Turnbull (1993) suggested a sequential procedure to guarantee a small type II error, whereas various other authors derived decision procedures based on a Bayesian approach (see, e.g., Selwyn and Hall 1984). Giani and Finner (1991) and Giani and Straßburger (1994) defined simultaneous equivalence in the k-sample case and suggested tests based on the range statistic, similar to the test of Bofinger, Hayter, and Liu (1993). In the meantime, equivalence tests are becoming a well-accepted statistical tool not only in medical research, but also in other sciences where quantitative methods are applied; for example, in psychology (Rogers, Howard, and Vessey 1993). During the last 5 years, various authors have begun to deal with the equivalence problem in semiparametric and nonparametric models, where confidence interval procedures in particular were suggested (Hsu, Hwang, Liu, and Ruberg 1994). Within the following context, we would like to draw the reader's attention to the work of Com-Nougue, Rodary, and Patte (1993) and Wellek (1993b), who provided asymptotic tests to establish the equivalence of two survivor functions under the assumption of a proportional hazard rate model with right-censored data.

Axel Munk is Assistant Professor at the Ruhr-University Bochum, Germany. This work was supported by the Deutsche Forschungsgemeinschaft Grant Br655/4-2. Parts of this article were written while the author was visiting Indiana University, Bloomington. The author would like to thank the Department of Mathematics of Indiana University for its hospitality and the Deutsche Forschungsgemeinschaft for making this visit possible. The author also thanks E. Brunner, M. Denker, H. Dette, and M. Puri for their helpful comments; O. Dannenberg, U. Munzel, and especially M. Broder for computational assistance; and an associate editor and two referees for some helpful comments that led to a substantial improvement of the presentation.

Here we consider the equivalence problem for two survivor functions without censoring. Let $X_1, \ldots, X_m; Y_1, \ldots, Y_n$ be independent random variables having continuous cdf's $F$ for the $X_i$ and $G$ for the $Y_j$. Using a purely nonparametric approach, Wellek (1993a) suggested testing the hypothesis of nonequivalence

$$H: \left| P(X < Y) - \tfrac{1}{2} \right| > \varepsilon \tag{1}$$

to claim the equivalence of $F$ and $G$,

$$K: \left| P(X < Y) - \tfrac{1}{2} \right| \le \varepsilon, \tag{2}$$

at a controlled error rate $\alpha \in (0, 1)$. Here $\varepsilon \in (0, 1/2)$ denotes the bioequivalence limit mentioned earlier. A serious drawback of this measure remains: in general, none of the above-mentioned equivalence concepts are expressible


in terms of $P$. However, assuming Lehmann's alternative (Lehmann 1953),

$$G = 1 - (1 - F)^{\delta}, \qquad \delta > 0, \tag{3}$$

Wellek's $P$ and population equivalence turn out to be the same concepts in light of the following argument (Wellek 1993b). We obtain from (3) that $P(X < Y) = 1/(\delta + 1)$, which allows us to rewrite (1) and (2) into the equivalence testing problem

$$H: \delta \notin [\Delta^{-1}, \Delta] \quad \text{against} \quad K: \delta \in [\Delta^{-1}, \Delta], \tag{4}$$

where $\Delta = (1/2 + \varepsilon)/(1/2 - \varepsilon) > 1$ is the new equivalence bound. As far as population equivalence is concerned, it is reasonable to consider $F$ and $G = 1 - (1 - F)^{\delta}$ as equivalent if we may assume that

$$\sup_{x \in \mathbb{R}} \left| F(x) - \{1 - (1 - F(x))^{\delta}\} \right| \le c \tag{5}$$

for a sufficiently small preassigned constant $c > 0$, to guarantee the similarity of the entire distributions $F$ and $G$. It follows by means of elementary calculations that (5) is equivalent to $K$ in (4), where the equivalence bound $\Delta > 1$ is the unique solution of the equation

$$h(\Delta) = \Delta^{1/(1-\Delta)} - \Delta^{\Delta/(1-\Delta)} = c. \tag{6}$$

Therefore, it is sufficient to deal with the testing problem (4), which can easily be interpreted in terms of $P$ as well as in terms of population bioequivalence by means of the relation (6).
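The relations between the three equivalence limits can be made concrete with a short computation. The following is only an illustrative sketch: the function names are hypothetical, and the bisection solver for (6) is an assumption of this example (it relies on $h$ being increasing for $\Delta > 1$), not a procedure taken from the article.

```python
import math


def bound_from_epsilon(eps):
    """Equivalence bound Delta = (1/2 + eps) / (1/2 - eps) used in (4)."""
    if not 0.0 < eps < 0.5:
        raise ValueError("eps must lie in (0, 1/2)")
    return (0.5 + eps) / (0.5 - eps)


def prob_x_less_y(delta):
    """P(X < Y) = 1 / (delta + 1) under Lehmann's alternative (3)."""
    return 1.0 / (delta + 1.0)


def sup_distance(bound):
    """h(Delta) from (6): the largest distance between F and G at delta = Delta."""
    return bound ** (1.0 / (1.0 - bound)) - bound ** (bound / (1.0 - bound))


def bound_from_sup_distance(c, upper=1e6, iterations=200):
    """Solve h(Delta) = c for Delta > 1 by bisection (h is increasing there)."""
    if not 0.0 < c < 1.0:
        raise ValueError("c must lie in (0, 1)")
    lo, hi = 1.0 + 1e-9, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sup_distance(mid) < c:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    bound = bound_from_epsilon(1.0 / 18.0)
    print("Delta:", bound)                      # close to 1.25
    print("P(X<Y) at Delta:", prob_x_less_y(bound))
    print("c = h(Delta):", sup_distance(bound))
```

For $\Delta = 1.25$ this gives $\varepsilon = 1/18$ and $c = 0.8^4 - 0.8^5 = 0.08192$, which agrees with the value $c = .08$ quoted in the example of Section 4.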

It is convenient to assume Lehmann's model to establish the equivalence of two survivor functions provided that no censoring is present. When $T$ denotes a continuous random variable (rv) representing an individual's lifetime, this problem occurs within a Cox regression model (Cox 1972), where the hazard function of $T$ given a known regression variable $x$ is of the form $h(t \mid x) = h_F(t)\exp(x\beta)$. Here $\beta$ denotes an unknown regression parameter, and proportional hazards are assumed. The hazard functions of the two distributions are $h_F(t)$ and $h_G(t) = h_F(t)\exp(\beta)$, which is equivalent to the assumption that $F$ and $G$ are related by Lehmann's alternative (3), where $\delta = \exp(\beta)$ (see Lawless 1982, p. 348, for a detailed description). In particular, when $F$ and $G$ are exponential with unknown scale parameters $\lambda$ and $\tau$, and the ratio of the scales $\rho = \tau/\lambda$ (of, e.g., two failure rates) is to be tested, we obtain a special case of the model (3).

Similar to the equivalence problem is the dual testing problem $K$ against $H$, termed "testing a (clinically) relevant difference" (Victor 1987). Various authors have suggested reformulating the testing of simple hypotheses $H: \delta = \Delta$ as interval testing problems; that is, testing $K: \delta \in [\Delta^{-1}, \Delta]$ against $H: \delta \notin [\Delta^{-1}, \Delta]$. (For a detailed discussion of this topic, refer to the introduction in Staudte and Sheather 1990.)

This article provides equivalence and interval tests for Lehmann's alternative. These tests are denoted in the sequel as "extended Savage tests" because they generalize a well-known test introduced by Savage (1956) for one-sided hypotheses. Section 2 deals with the finite-sample case and shows the proposed tests to be unbiased in the case of equal sample sizes $n = m$. A detailed numerical analysis of the power functions shows that in the case of unequal sample sizes not too far apart ($1/2 < n/m < 2$), the extended Savage test is still rather useful for practical purposes; that is, the bias is negligibly small. Performing the extended Savage test requires extensive computations. Therefore, tables with the critical values are displayed. It turns out that for small samples, the interval test represents a useful tool for the detection of a significantly relevant difference, whereas an equivalence study should be performed only when a sufficiently large sample is present, because otherwise the power is very small (or $\Delta$ must be chosen rather large). Note that we cannot hope to find another rank test with essentially improved power, because the Savage scores are asymptotically efficient. Roughly speaking, this implies that a semiparametric model, as in (3), contains enough information to perform an equivalence study only when the available data set is large. Therefore, Section 3 considers an asymptotic version of the extended Savage test. To perform this test, tables with the required asymptotic means and variances are displayed.

To illustrate the proposed procedures, Section 4 concludes the article with an application of the extended Savage test in the finite-sample case, where we detect a relevant difference between the rainfall rates in Melbourne in 1981 and 1983.

2. THE EXTENDED SAVAGE TEST IN THE FINITE-SAMPLE CASE

For the sake of simplicity, we consider here Lehmann's alternative of the form

$$G(x) = F^{\delta}(x), \qquad \delta > 0, \tag{7}$$

because the following tests and results can be easily rewritten for the model (3) by means of the following argument. From statement (a) of Shorack (1967), we obtain that

$$P(\pi_{mn} \mid G = 1 - (1 - F)^{\delta}) = P(\pi^{*}_{mn} \mid G = F^{\delta}), \tag{8}$$

where $\pi_{mn}$ denotes an ordering of the rv's $X_1, \ldots, X_m; Y_1, \ldots, Y_n$ and $\pi^{*}_{mn}$ denotes its mirror image. All tests considered in the following are based on a statistic that is a function of the ordered combined sample. Therefore, it is sufficient to deal with model (7). Observe further that Equation (6) remains the same as for the model (3); that is, population equivalence turns out to be exactly the same concept for the alternatives in (3) and in (7).

Davies (1971) compared numerically various nonparametric tests for the simple testing problem $H: F = G$ against $K: F \ne G$ under Lehmann's alternative. The results can be summarized as follows. In practice, especially when one is unsure that the alternative is exactly of Lehmann's form, one may use Savage's test, because it is the simplest test and tables for it are available. This result motivates our construction of an equivalence and interval test based on


the statistic introduced by Savage (1956):

$$T(C_{m,n}) = \sum_{i=1}^{N} C_i \sum_{j=i}^{N} \frac{1}{j}, \tag{9}$$

which leads to the locally most powerful rank test (LMPRT) in the one-sided testing problem. Here $C_{m,n} = (C_1, \ldots, C_{m+n})$ denotes a vector of length $m + n = N$ with $m$ elements zero and $n$ elements 1. $C_j$ equals zero if the $j$th element in the ordered combined sample belongs to the sample with cdf $F$; otherwise, $C_j$ equals 1. The probability of any vector $c_{m,n}$ is given as (Davies 1971)

$$P_{\delta}(C_{m,n} = c_{m,n}) = \frac{m!\, n!\, \delta^{n}}{\prod_{i=1}^{m+n} \left\{ i + (\delta - 1) \sum_{j=1}^{i} c_j \right\}}. \tag{10}$$

For numerical computations, it is convenient to use the following recursion formula due to Savage (1956):

$$T(c_{m,n}) = \begin{cases} T(c_{m-1,n}) + \dfrac{n}{m+n} & \text{if the last element of } c_{m,n} \text{ equals 0,} \\[2ex] T(c_{m,n-1}) + \dfrac{n}{m+n} & \text{if the last element of } c_{m,n} \text{ equals 1,} \end{cases} \tag{11}$$

with initial values $T(c_{1,0}) = 0$ and $T(c_{0,1}) = 1$.
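The statistic (9), the rank probabilities (10), and the recursion (11) are easy to check against each other for small samples. The sketch below is an illustration only: the function names are hypothetical, the arrangements are enumerated by brute force (feasible only for small $m$ and $n$), and the recursion is written as a running update in which appending one element adds the current number of ones divided by the current length.

```python
from itertools import combinations
from math import factorial


def savage_statistic(c):
    """T(c) = sum_i c_i * sum_{j=i}^{N} 1/j, evaluated directly from (9)."""
    total, tail = 0.0, 0.0
    for i in range(len(c), 0, -1):
        tail += 1.0 / i            # tail = sum_{j=i}^{N} 1/j
        total += c[i - 1] * tail
    return total


def savage_recursive(c):
    """T(c) built up element by element, following recursion (11)."""
    t, ones = 0.0, 0
    for length, ci in enumerate(c, start=1):
        ones += ci                 # n = number of ones among the first `length` entries
        t += ones / length         # both cases of (11) add n / (m + n)
    return t


def lehmann_probability(c, delta):
    """P_delta(C = c) under G = F**delta, following (10)."""
    n = sum(c)
    m = len(c) - n
    prob = factorial(m) * factorial(n) * delta ** n
    ones = 0
    for i, ci in enumerate(c, start=1):
        ones += ci
        prob /= i + (delta - 1.0) * ones
    return prob


def arrangements(m, n):
    """All 0/1 vectors of length m + n with exactly n ones."""
    size = m + n
    for positions in combinations(range(size), n):
        c = [0] * size
        for k in positions:
            c[k] = 1
        yield tuple(c)


if __name__ == "__main__":
    for c in arrangements(2, 2):
        print(c, round(savage_statistic(c), 4), round(lehmann_probability(c, 1.25), 4))
```

Under $\delta = 1$ all arrangements are equally likely and the expected value of $T$ equals $n$, which is why the lower and upper critical values in Table 1 are centered at $n$ when $m = n$.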

We restrict our presentation in the following to the case of interval testing, because all results can be transferred to the equivalence problem by dualization; that is, by interchanging the critical region with the region of acceptance. This leads to an unbiased equivalence test $\varphi$ at the level $1 - \alpha$ for $H$ against $K$, provided that the dual interval test $1 - \varphi$ is an unbiased test at the level $\alpha$ for $K$ against $H$. Our aim is to find upper and lower bounds $t_{v_o} > t_{v_u}$ (depending on the level $\alpha$ and the boundary $\Delta$ of the equivalence region) to determine a critical region

$$C_{\alpha} := \{ t_{v_u} \succeq T(C_{m,n}) \ \text{or} \ T(C_{m,n}) \succeq t_{v_o} \} \tag{12}$$

subject to the constraint $P_{\delta}(C_{\alpha}) \le \alpha$, $\delta \in H$. In (12) the symbol $\succeq$ is used to indicate that randomization may be required when the test statistic equals $t_{v_u}$ (resp. $t_{v_o}$) to obtain an unbiased test. However, our numerical analysis shows that the nonrandomized test is a good approximation for an unbiased procedure provided that $m$ and $n$ are not too far apart (see the end of this section). Therefore, we suppress randomization in the following but mention that in the case of very small samples, randomization sometimes may lead to a substantial reduction of the bias and hence of the type II error.

Let us first consider the case of equal sample sizes $n = m$. The testing problem (4) remains invariant under the transformation $\mathrm{inv}: \delta \mapsto \delta^{-1}$ on $\mathbb{R}^{+}$. This group induces transformations $*: c_{n,n} \mapsto c^{*}_{n,n}$ on the ordered combined sample by means of

$$P_{\delta}(C_{n,n} = c_{n,n}) = P_{\delta^{-1}}(C_{n,n} = c^{*}_{n,n}), \tag{13}$$

which can be drawn from (10). Here $c^{*}$ is a vector with elements $c^{*}_i = 1 - c_i$. If we choose the critical region $C_{\alpha}$ symmetrical with respect to (13), then we must determine the upper critical value by

$$v_{\alpha} = \max_{1 \le v \le w/2} \{ v : \gamma_{\delta}(v) \le \alpha, \ \delta \in \partial H \},$$

where $w = \binom{2n}{n}$ and $\gamma_{\delta}(v) = \sum_{i=1}^{v} \{ p_{\delta}(i) + p_{\delta}(w + 1 - i) \}$ is the power function of a symmetrical two-sided test with critical values $t_v$ and $t_{w+1-v}$. Here the probability of an event $t_i$ of the Savage statistic is abbreviated as $p_{\delta}(i) := P_{\delta}(T(C_{n,n}) = t_i)$, where $t_i \le t_j$ for $1 \le i < j \le w$. We obtain the resulting interval test as

$$\varphi(t) = \mathbf{1}\{ t \preceq t_{v_u} \ \text{or} \ t \succeq t_{v_o} \}, \tag{14}$$

where $v_u = v_{\alpha}$ and $v_o = w + 1 - v_{\alpha}$. In the sequel we will denote the power function of (14) by $\beta_v(\delta) = \gamma_{\delta}(v_{\alpha})$, $\delta > 0$. The following theorem ensures that the obtained test is unbiased for $K$ against $H$ at the level $\beta_v(\Delta) \le \alpha$. The proof is deferred to the Appendix.

Theorem 2.1 (Extended Savage Test). Assume Lehmann's alternative and let the sample sizes of the samples $(X_i)_{i=1,\ldots,m}$ and $(Y_j)_{j=1,\ldots,n}$ be equal, $n = m > 2$. Then the interval test $\varphi$ in (14) for the testing problem

$$H: \delta \in [\Delta^{-1}, \Delta] \quad \text{against} \quad K: \delta \notin [\Delta^{-1}, \Delta]$$

has the following properties: (a) the power function is log-symmetrical; that is, $\beta_v(\delta) = \beta_v(\delta^{-1})$ for all $\delta > 0$; (b) the unique minimum of the power function is attained at $\delta = 1$; (c) the power function is strictly unimodal, and $\varphi$ is an unbiased test at the level $\beta_v(\Delta) \le \alpha$.
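For very small equal sample sizes, the nonrandomized version of the interval test can be evaluated by enumerating all arrangements. The following sketch is an illustration under stated assumptions, not the algorithm used for the published tables: the function names are hypothetical, ties in $T$ are merged by rounding, the critical region is taken as $\{T \le t\} \cup \{T \ge 2n - t\}$, and the cutoff $t$ is enlarged over the observed values of $T$ as long as the rejection probability at $\delta = \Delta$ stays at or below $\alpha$.

```python
from itertools import combinations
from math import factorial


def savage_statistic(c):
    """T(c) from (9)."""
    total, tail = 0.0, 0.0
    for i in range(len(c), 0, -1):
        tail += 1.0 / i
        total += c[i - 1] * tail
    return total


def lehmann_probability(c, delta):
    """P_delta(C = c) from (10) for the model G = F**delta."""
    n = sum(c)
    m = len(c) - n
    prob = factorial(m) * factorial(n) * delta ** n
    ones = 0
    for i, ci in enumerate(c, start=1):
        ones += ci
        prob /= i + (delta - 1.0) * ones
    return prob


def distribution(n, delta):
    """Sorted (value, probability) pairs of T for equal sample sizes m = n."""
    size = 2 * n
    probs = {}
    for positions in combinations(range(size), n):
        c = [0] * size
        for k in positions:
            c[k] = 1
        t = round(savage_statistic(c), 10)   # merge numerically equal values
        probs[t] = probs.get(t, 0.0) + lehmann_probability(c, delta)
    return sorted(probs.items())


def power(n, lower, delta):
    """Rejection probability of the region {T <= lower} or {T >= 2n - lower}."""
    upper = 2 * n - lower
    return sum(p for t, p in distribution(n, delta)
               if t <= lower + 1e-9 or t >= upper - 1e-9)


def critical_values(n, bound, alpha=0.05):
    """Largest symmetric nonrandomized region whose power at `bound` is <= alpha."""
    best = None
    for t, _ in distribution(n, bound):
        if t >= n or power(n, t, bound) > alpha:
            break
        best = t
    return None if best is None else (best, 2 * n - best)


if __name__ == "__main__":
    lower, upper = critical_values(5, 1.1)
    print("critical values:", lower, upper)
    for delta in (0.5, 0.8, 1.0, 1.25, 2.0):
        print(delta, round(power(5, lower, delta), 4))
```

The printed power values at $\delta$ and $\delta^{-1}$ coincide, which is the log-symmetry in part (a) of the theorem; the critical values for $m = n = 5$ and $\Delta = 1.1$ can be compared with the corresponding row of Table 1.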

We mentioned earlier that the nonrandomized version of the extended Savage test represents a rather good approximation of an unbiased test. We found numerically, even for small sample sizes, just a small deviation from the preassigned level $\alpha$, which can be regarded as negligible from a practical standpoint. (If $m, n > 5$ and $1/2 < m/n < 2$, then the deviation from the preassigned level $\alpha = .05$ at the boundary $\Delta$ of the hypothesis was always $< .005$, provided that $1 < \Delta < 1.25$.) Note that we obtain as a consequence of Theorem 2.1 the unbiasedness of the common Savage test for the simple hypothesis $H: F = G$ against $K: F \ne G$ under Lehmann's alternative, whereas in the case of unequal sample sizes we observed a bias in the power function of the two-sided Savage test. (This is analogous to the result of Sugiura [1965] for the Wilcoxon test under shift alternatives $G(x) = F(x - \delta)$.)

Figure 1 displays the power function of the interval test for $m = n = 10$. For comparison, the figure also displays the power function of a similarly extended Mann-Whitney-Wilcoxon test. Observe that the power advantage of the first test is considerable. Figure 2 illustrates the case of unequal sample sizes ($m = 4$, $n = 8$). Observe a small bias (.005) between $\delta = .8$ and $\delta = .9$.


[Figure 1: power (vertical axis, 0.00 to 0.60) plotted against Delta (horizontal axis, 0.3 to 2.0).]

Figure 1. Power Function of the Extended Savage Test (Solid Line) and the Extended Mann-Whitney-Wilcoxon Test (Dashed Line) for Interval Testing Under Lehmann's Alternative at the Level $\alpha = .05$ in the Case of Equal Sample Sizes $n = m = 10$ and $\Delta = 1.25$.

Efficient numerical computation of the power function and the critical values is essentially based on the recursion (11), similar to that of Shorack (1966, 1967). The implementation of this algorithm can be drawn from Neumann (1988). If $m, n > 20$, then the computation time becomes considerable; therefore, in the following section we derive asymptotic tests.

To help perform the extended Savage tests easily, Table 1 displays the required critical values for the interval test for various sample sizes and interval boundaries $\Delta$ at a preassigned level of significance $\alpha = .05$. The application of this table is demonstrated in Section 4. By means of dualization, we could obtain the corresponding result to Theorem 2.1 for equivalence testing. Nevertheless, in Table 1 we suppress the critical values for the equivalence test, because we observed in the case $m, n \le 20$ that the maximum power does not exceed $2\alpha$ whenever $\Delta \in [1, 1.25]$. We mention that we cannot hope to find another rank test with essentially improved power, because the Savage test is locally optimal under Lehmann's alternative in the one-sided case. Therefore, from a practical standpoint, an equivalence study under Lehmann's alternative should be performed only when the experimenter has sufficiently large data at his disposal. (We define "sufficiently large" in terms of sample sizes required to control the type II error in the next section.)

3. THE EXTENDED ASYMPTOTIC SAVAGE TEST

[Figure 2: power (vertical axis, 0.025 to 0.175) plotted against Delta (horizontal axis, 0.3 to 2.0).]

Figure 2. Power Function of the Extended Savage Test (Solid Line) for Lehmann's Alternative at the Level $\alpha = .05$ in the Case of Unequal Sample Sizes $n = 4$, $m = 8$, and $\Delta = 1.1$.

Table 1. Lower and Upper Critical Values of the Extended Savage Test for the Interval Testing Problem in the Finite-Sample Case

Sample size | Equivalence bound Δ = 1.1 | Δ = 1.15 | Δ = 1.2 | Δ = 1.25
m | n | t_vu | t_vo | t_vu | t_vo | t_vu | t_vo | t_vu | t_vo
5 | 5 | 2.3885 | 7.6115 | 2.2814 | 7.7187 | 2.2814 | 7.7187 | 2.2814 | 7.7187
4 | 6 | 3.2341 | 8.4393 | 3.2008 | 8.5071 | 3.2008 | 8.5071 | 3.2008 | 8.5071
5 | 6 | 3.1340 | 8.7431 | 3.1006 | 8.7736 | 3.1006 | 8.7736 | 3.0768 | 8.7847
6 | 6 | 3.0307 | 8.9693 | 3.0248 | 8.9752 | 3.0173 | 8.9827 | 2.9930 | 9.0070
4 | 7 | 4.0372 | 9.5102 | 4.0372 | 9.5102 | 4.0134 | 9.5546 | 4.0134 | 9.5546
5 | 7 | 3.9851 | 9.7951 | 3.8963 | 9.8130 | 3.8601 | 9.8268 | 3.8506 | 9.8359
6 | 7 | 3.8653 | 10.0232 | 3.8133 | 10.0558 | 3.8057 | 10.0863 | 3.7744 | 10.1065
7 | 7 | 3.7648 | 10.2352 | 3.7498 | 10.2502 | 3.7046 | 10.2954 | 3.6648 | 10.3352
4 | 8 | 4.9538 | 10.5372 | 4.9161 | 10.5789 | 4.9161 | 10.5789 | 4.8705 | 10.6039
5 | 8 | 4.8089 | 10.8276 | 4.7859 | 10.8526 | 4.7339 | 10.8847 | 4.7025 | 10.9200
6 | 8 | 4.6867 | 11.0970 | 4.6502 | 11.1200 | 4.6196 | 11.1561 | 4.5752 | 11.1894
7 | 8 | 4.5815 | 11.3244 | 4.5438 | 11.3514 | 4.5018 | 11.3875 | 4.4537 | 11.4383
8 | 8 | 4.4942 | 11.5058 | 4.4498 | 11.5502 | 4.4109 | 11.5891 | 4.3589 | 11.6411
4 | 9 | 5.8128 | 11.5704 | 5.7993 | 11.5906 | 5.7084 | 11.6149 | 5.7057 | 11.6224
5 | 9 | 5.6787 | 11.8724 | 5.6212 | 11.9140 | 5.5878 | 11.9390 | 5.5454 | 11.9835
6 | 9 | 5.5291 | 12.1546 | 5.5002 | 12.1954 | 5.4470 | 12.2196 | 5.3934 | 12.2704
7 | 9 | 5.4139 | 12.3901 | 5.3793 | 12.4258 | 5.3182 | 12.4687 | 5.2643 | 12.5189
8 | 9 | 5.3171 | 12.5919 | 5.2709 | 12.6335 | 5.2141 | 12.6828 | 5.1550 | 12.7437
9 | 9 | 5.2314 | 12.7686 | 5.1824 | 12.8176 | 5.1253 | 12.8747 | 5.0574 | 12.9426
4 | 10 | 6.7271 | 12.5803 | 6.6699 | 12.6156 | 6.6561 | 12.6387 | 6.5970 | 12.6561
5 | 10 | 6.5454 | 12.9244 | 6.5212 | 12.9561 | 6.4557 | 12.9866 | 6.4184 | 13.0228
6 | 10 | 6.4005 | 13.2112 | 6.3541 | 13.2450 | 6.3002 | 13.2847 | 6.2367 | 13.3300
7 | 10 | 6.2703 | 13.4484 | 6.2220 | 13.4947 | 6.1592 | 13.5370 | 6.0965 | 13.5934
8 | 10 | 6.1654 | 13.6614 | 6.1064 | 13.7108 | 6.0424 | 13.7687 | 5.9709 | 13.8324
9 | 10 | 6.0659 | 13.8490 | 6.0089 | 13.9034 | 5.9404 | 13.9693 | 5.8616 | 14.0399
10 | 10 | 5.9820 | 14.0180 | 5.9235 | 14.0765 | 5.8520 | 14.1480 | 5.7721 | 14.2279

To avoid notational confusion, we return in the asymptotic case to the model (3) as discussed by Davies (1971, (1.2)). Remember that we can switch between the models (3) and (7) by means of Equation (8). Let $\lambda_N = m/(m + n)$, where $\lim_{n,m \to \infty} \lambda_N = \lambda \in (0, 1)$, and let $F_m$ and $G_n$ denote the empirical distribution functions of $F$ and $G$. The empirical distribution function of the combined sample is defined by $H_N = \lambda_N F_m + (1 - \lambda_N) G_n$ with mean $H = \lambda_N F + (1 - \lambda_N) G$. Consider now the scores (Savage 1956) $J_N(i/(N + 1)) = \sum_{j=N-i+1}^{N} j^{-1}$, which can be shown to be asymptotically efficient (Capon 1961), and the statistic

$$T_N = \int J_N\!\left( \frac{N}{N+1} H_N(x) \right) dF_m(x) = \frac{1}{m} \sum_{i=1}^{N} Z_i \sum_{j=N-i+1}^{N} j^{-1},$$

where $Z_i = 1 - C_i$ and the $C_i$ have been defined following (9). We know that (Chernoff and Savage 1958)

$$\frac{T_N - \mu_N(\delta)}{\sigma_N(\delta)} \xrightarrow{\;D\;} U,$$

where $U$ is a standard normally distributed rv, with mean

$$\mu_N(\delta) = \int J(H(x))\, dF(x) = -\int_0^1 \ln\{ \lambda_N - \lambda_N t + (1 - \lambda_N)(1 - t)^{\delta} \}\, dt \tag{15}$$

and variance

$$\sigma_N^2(\delta) = \frac{2(1 - \lambda_N)}{N} \left\{ \iint \frac{[1 - (1 - F(x))^{\delta}]\,(1 - F(y))^{\delta}\, 1_{\{x<y\}}}{[1 - H(x)][1 - H(y)]}\, dF(x)\, dF(y) + \frac{1 - \lambda_N}{\lambda_N} \iint \frac{F(x)[1 - F(y)]\, 1_{\{x<y\}}}{[1 - H(x)][1 - H(y)]}\, d[1 - (1 - F(x))^{\delta}]\, d[1 - (1 - F(y))^{\delta}] \right\}$$

$$= \frac{2(1 - \lambda_N)}{N} \left\{ \int_0^1 \frac{(1 - t)^{\delta}}{h(t)} \int_0^t \frac{1 - (1 - s)^{\delta}}{h(s)}\, ds\, dt + \frac{1 - \lambda_N}{\lambda_N} \int_0^1 \frac{(1 - t)^{1/\delta}}{g(t)} \int_0^t \frac{1 - (1 - s)^{1/\delta}}{g(s)}\, ds\, dt \right\}.$$

Here the functions $g$ and $h$ are defined as $h(x) := 1 - \lambda_N x - (1 - \lambda_N)(1 - (1 - x)^{\delta})$ and $g(x) := 1 - \lambda_N (1 - (1 - x)^{1/\delta}) - (1 - \lambda_N) x$. The generating function for the Savage scores is denoted by $J(u) = -\ln(1 - u)$, $u \in (0, 1)$, which is related to $J_N$ by $J_N(i/(N + 1)) = E J(U_{(i)})$, where $U_{(i)}$ is the $i$th order statistic from the uniform distribution over $(0, 1)$. Neither of the moments (15) and (16) admits a representation in closed form, with the exception of the case $\delta = 1$, where $\mu_N = 1$ and $N\sigma_N^2 = n/m$.
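The mean (15) can be evaluated with elementary numerical integration. The sketch below is illustrative and does not reproduce the Romberg and Gauss-Laguerre procedures used for the published tables. It assumes $\delta \ge 1$, uses the identity $-\int_0^1 \ln\{\lambda u + (1-\lambda)u^{\delta}\}\,du = 1 - \int_0^1 \ln\{\lambda + (1-\lambda)u^{\delta-1}\}\,du$ (obtained by substituting $u = 1 - t$ and splitting off $\ln u$), and applies composite Simpson quadrature after the smoothing substitution $u = v^{8}$; the function name and these numerical choices are assumptions of this example.

```python
import math


def asymptotic_mean(lam, delta, steps=2000, stretch=8):
    """Approximate mu_N(delta) in (15) for lambda_N = lam and delta >= 1."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie in (0, 1)")
    if delta < 1.0:
        raise ValueError("this sketch assumes delta >= 1")
    if steps % 2:
        steps += 1                      # Simpson's rule needs an even number of steps

    def integrand(v):
        # ln(lam + (1 - lam) * u**(delta - 1)) with u = v**stretch, times du/dv
        inner = lam + (1.0 - lam) * v ** (stretch * (delta - 1.0))
        return math.log(inner) * stretch * v ** (stretch - 1)

    width = 1.0 / steps
    total = integrand(0.0) + integrand(1.0)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * width)
    return 1.0 - total * width / 3.0


if __name__ == "__main__":
    for lam, delta in ((0.50, 1.10), (0.50, 1.25), (0.60, 1.25)):
        print(lam, delta, round(asymptotic_mean(lam, delta), 5))
```

The values obtained in this way can be compared with the column of means in Table 2; for example, $\lambda_N = .50$ and $\Delta = 1.10$ should give a value close to the tabulated 1.04750.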

Assume for the moment equal sample sizes, and let us construct an asymptotic interval test similarly to the finite-sample case. To this end, define a symmetrical test around $\mu_N(1) = 1$ by

$$\varphi(T_N) = \begin{cases} 1: & T_N \le 2 - v_u \ \text{or} \ T_N \ge v_u, \\ 0: & 2 - v_u < T_N < v_u, \end{cases} \tag{16}$$

where the critical value $v_u$ is to be determined from the equation

$$P(v_u) := 1 - \Phi\!\left( \frac{v_u - \mu_N(\Delta)}{\sigma_N(\Delta)} \right) + \Phi\!\left( \frac{2 - v_u - \mu_N(\Delta)}{\sigma_N(\Delta)} \right) = \alpha \tag{17}$$

for a preassigned level $\alpha \in (0, 1)$ and an equivalence bound $\Delta > 1$, where $\Phi$ denotes the standard normal cdf.

The corresponding equivalence test at the level $\alpha$ can be obtained by dualization; that is, choose the test $1 - \varphi(T_N)$ and replace $\alpha$ by $1 - \alpha$ in (16). We are now in the position to prove the asymptotic unbiasedness of the proposed test (17) in the case of equal sample sizes.

Theorem 3.1 (Extended Asymptotic Savage Test). Let the sample sizes be asymptotically equal; that is, $\lim_{N \to \infty} \lambda_N = 1/2$. For any level $\alpha \in (0, 1)$ and any arbitrary equivalence bound $\Delta > 1$, the following properties hold:

a. The test (16) is an asymptotically unbiased level $\alpha$ test for the interval testing problem

$$H: \delta \in [\Delta^{-1}, \Delta] \quad \text{against} \quad K: \delta \notin [\Delta^{-1}, \Delta].$$

The power function of this test is strictly unimodal with a unique minimum at $\delta = 1$.

b. The dual test $1 - \varphi(T_N)$ is an asymptotically unbiased level $1 - \alpha$ test for the equivalence problem $K$ against $H$ with a strictly unimodal power function that has a unique maximum at $\delta = 1$.

The proof is deferred to the Appendix. Note that in the case $\lim_{N \to \infty} \lambda_N \ne 1/2$, these properties fail; however, the tests still remain consistent. Additional detailed numerical investigations showed that for $1/3 < \lambda_N < 2/3$, the test (16) still represents a good approximation to an unbiased test, just as in the finite-sample case.

To perform the interval test, it is convenient to rewrite the critical region of (16) as

$$\{ T_N : |P(T_N)| \le \alpha \}, \tag{18}$$

where $P(\cdot)$ denotes the p-value function defined in (17). The critical region (18) coincides with that of the test (16), because $P(\cdot)$ is a strictly monotone function. In a similar manner, we obtain the critical region of a level-$\alpha$ equivalence test as $\{ T_N : |P(T_N)| \ge 1 - \alpha \}$. For the application of these tests, Table 2 shows the required numerical values of the mean $\mu_N(\Delta)$ in (15) and the scaled variance $N\sigma_N^2(\Delta)$ in (16) for various proportions of the sample sizes $m$ and $n$ and bounds $\Delta > 1$. To evaluate numerically the integrals in (15) and (16), we used Romberg's procedure (Press, Flannery, Teukolsky, and Vetterling 1989), which was checked against a Gauss-Laguerre type of quadrature with 68 points of support (Stroud and Secrest 1966). The deviation between both procedures is always less than 10'.

We mentioned already in Section 1 that a reasonable application of the equivalence test requires a considerably large sample size. To guarantee a power at the maximum alternative $\delta = 1$ of .7 (.8) (.9) for an equivalence test at the level $\alpha$ with equivalence bound $\Delta = 1.25$, we found numerically a required sample size of 284 (340) (432) in each group. Because of the above-mentioned asymptotic efficiency of the Savage scores, we cannot hope to improve this result significantly with another test. This leads to the conclusion that an equivalence study under the assumption of Lehmann's alternative should be performed only when large samples are available. Of course, we can reduce the required sample size for a given type II error rate by increasing the equivalence bound $\Delta$. But this implies a loss of information about the equivalence of the treatments from a medical standpoint. For example, $\Delta = 2$ implies, under the assumption of exponentially distributed data, that the means of the treatments may differ by as much as 100%.
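The quoted sample sizes can be cross-checked with the normal approximation. The sketch below rests on several assumptions that are not spelled out in this form in the text: the equivalence test is taken to reject nonequivalence when $T_N$ falls in a central interval $(2 - v, v)$ around 1, $v$ is chosen so that this interval has probability $\alpha = .05$ at $\delta = \Delta$, the mean and scaled variance at $\Delta = 1.25$ are read from the $\lambda_N = .50$ rows of Table 2, and the moments at $\delta = 1$ are $\mu_N = 1$ and $N\sigma_N^2 = 1$. Function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def equivalence_power(n_per_group, mean_bound, scaled_var_bound, alpha=0.05):
    """Approximate power at delta = 1 of the dual (equivalence) test, m = n."""
    total = 2 * n_per_group
    sd_bound = math.sqrt(scaled_var_bound / total)   # sigma_N at delta = Delta
    sd_center = math.sqrt(1.0 / total)               # sigma_N at delta = 1 (n/m = 1)

    def central_probability(v):
        # probability of 2 - v < T_N < v when delta = Delta
        return (normal_cdf((v - mean_bound) / sd_bound)
                - normal_cdf((2.0 - v - mean_bound) / sd_bound))

    lo, hi = 1.0, mean_bound + 10.0 * sd_bound
    for _ in range(200):                             # bisection for the cutoff v
        mid = 0.5 * (lo + hi)
        if central_probability(mid) < alpha:
            lo = mid
        else:
            hi = mid
    v = 0.5 * (lo + hi)
    return normal_cdf((v - 1.0) / sd_center) - normal_cdf((1.0 - v) / sd_center)


if __name__ == "__main__":
    for n in (284, 340, 432):
        print(n, round(equivalence_power(n, 1.10980, 0.92212), 3))
```

Under these assumptions the three sample sizes give approximate powers near .7, .8, and .9, in line with the figures stated above.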

Table 2. Asymptotic Means $\mu_N$ and Scaled Variances $N\sigma_N^2$ of the Savage Statistic $T_N$ for Various Equivalence Bounds $\Delta$

Scaling parameter λ_N | Equivalence bound Δ | Variance Nσ²_N(Δ) | Mean μ_N(Δ)
.50 | 1.10 | .98444 | 1.04750
.50 | 1.15 | .96739 | 1.06942
.50 | 1.20 | .94617 | 1.09017
.50 | 1.25 | .92212 | 1.10980
.55 | 1.10 | .79059 | 1.04255
.55 | 1.15 | .77055 | 1.06206
.55 | 1.20 | .74810 | 1.08045
.55 | 1.25 | .72421 | 1.09779
.60 | 1.10 | .63247 | 1.03764
.60 | 1.15 | .61160 | 1.05479
.60 | 1.20 | .58958 | 1.07090
.60 | 1.25 | .56711 | 1.08604
.65 | 1.10 | .50168 | 1.03279
.65 | 1.15 | .48144 | 1.04763
.65 | 1.20 | .46095 | 1.06153
.65 | 1.25 | .44071 | 1.07454
.70 | 1.10 | .39225 | 1.02798
.70 | 1.15 | .37365 | 1.04056
.70 | 1.20 | .35541 | 1.05231
.70 | 1.25 | .33782 | 1.06328

Let us conclude by comparing the extended Savage test with the log-rank test for bioequivalence assessment with right-censored survival times, as suggested by Wellek (1993b). When no censoring is present, the log-rank test for bioequivalence is based on the maximum likelihood estimate (MLE) $\hat\beta$ from the log-likelihood equation (Wellek 1993b, p. 879)

$$\log L(\beta) = n\beta - \sum_{i=1}^{N} \log(n_{1i} + n_{2i} \exp(\beta)),$$

where $n_{\nu i}$ denotes the number of items at risk in the $\nu$th sample at the $i$th smallest failure time $t_{(i)}$. The connection of the asymptotic Savage test to the corresponding score function $U(\beta) = d \log L / d\beta$ is given by $U(0) = mT_N - m$ (Lawless 1982, p. 351). If we perform the log-rank test for equivalence, then the variance of the MLE $\hat\beta$ must be estimated by the information $I(\cdot)$ evaluated at $\hat\beta$, which yields the correct variance only asymptotically. However, the absence of censoring allows us to calculate the variance $\sigma_N^2$ of $T_N$ exactly as is done in (16). This leads to the conjecture that in the uncensored case, the extended Savage test approximates the nominal level $\alpha$ more accurately than does the log-rank test. This is supported by a simulation experiment at nominal level $\alpha = .05$, where we compared the power and the level of the extended Savage test and of the log-rank test without censoring. We assumed for the underlying survivor functions two exponential distributions with ratio of scales $\delta$. For example, in the case $m = 30$, $n = 50$ with equivalence bound $\Delta = 1.25$, we obtained, with 10,000 replications, the actual level of the log-rank test as .035 at $\delta = .8$ and .56 at $\delta = 1.25$. The actual level of the extended Savage test turns out to be .47 and .51. We also observed a slight power advantage for the extended Savage test; for example, at $\delta = 1$, the power was simulated as .26, compared to .22 for the log-rank test. The power advantage of the extended Savage test over the log-rank test becomes larger as the sample size decreases. Whenever $n, m > 80$, we observed no difference of practical interest between the two tests. Another reason why the extended Savage test should be preferred in the uncensored case is that it is simpler to perform. The log-rank test needs numerical evaluation of the zero of the score function $U$ to obtain $\hat\beta$, whereas the extended Savage test requires only Table 2.

4. EXAMPLE

This section presents an illustrative application of the extended Savage test in the finite-sample case. Staudte and Sheather (1990, p. 22) discussed the following example of Melbourne's daily rainfall during the years 1981-1983. They suggested an exponential distribution with density $f_\lambda(x) = \lambda^{-1}\exp(-x/\lambda)$ as an approximation of the distribution of the daily rainfall for the winter months. This is in accordance with model (3) (which entails the family of exponential distributions). Because rainfall on consecutive days is dependent, we selected only every fifth measurement. We obtain from Staudte and Sheather (1990, p. 313) the following outcomes in tenths of millimeters:

1981: 64, 16, 110, 8, 118, 80, 6, 14, 36

1983: 4.1, 2, 6, 24, 12, 2, 16, 4.

To transform the data into the model $G = F^\delta$, we multiply each outcome by $-1$, which maps an ordering $\pi_{mn}$ of the random variables $X_1, \ldots, X_m; Y_1, \ldots, Y_n$ onto its mirror image $\pi^*_{mn}$, and hence we can apply (8). If we denote the means of the rainfall by $\lambda_i$, $i = 1, 2$, then our aim is to test the hypothesis $H: .8 < \lambda_1/\lambda_2 < 1.25$. We obtain $T(C_{9,8}) = 13.196$, which leads to the rejection of $H$ at the level $\alpha = .05$, where the critical value 12.7437 is drawn from Table 1. We observe a significantly relevant difference between rainfall in 1981 and 1983 of a relative amount that is at least 25%, even under the weak assumption of Lehmann's alternative. Just as for the model (3), we obtain the same condition as in (6) for the model $G = F^\delta$. This implies that the maximum of the difference between both distributions is at least c = .08. If we want to compare this result with that of the UMPI test (Dette and Munk 1996) under the assumption of exponentially distributed data, then we must evaluate

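$$T(x, y) = \frac{n \sum_{i=1}^{m} x_i}{m \sum_{j=1}^{n} y_j} = 5.73,$$

which is distributed according to a scaled $F$ distribution $F_{2m,2n}(\lambda_1/\lambda_2)$ with $2m$, $2n$ degrees of freedom and scaling parameter $\lambda_1/\lambda_2$. We reject $H$ whenever

$$T \notin [C_1, C_2], \qquad (19)$$

where the constants $C_i$ are determined uniquely by the two equations

$$F_{2m,2n}(C_2/\Delta_i) - F_{2m,2n}(C_1/\Delta_i) = 1 - \alpha, \qquad i = 1, 2, \qquad (20)$$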

where $\Delta_1 = .8$ and $\Delta_2 = 1.25$ denote the equivalence bounds. We mention that the procedure suggested by Bristol and Desu (1990) coincides with the UMPI test (19) only in the case of equal sample sizes, where the critical values $C_i$ are related by $C_1 = 1/C_2$. For the case of unequal sample sizes, this constraint was also assumed by Bristol and Desu (1990), which necessarily leads to biased tests or to tests that exceed the nominal level $\alpha$, depending on whether $m < n$ or $m > n$. We evaluate from (20) the critical values as $C_1 = 1/4.6 \approx .227$ and $C_2 = 4.4$, where the deviation between the actual level and the nominal level $\alpha$ is less than $10^{-3}$. In accordance with (19), we reject $H$ at level $\alpha = .05$. The minimal level for which we could reject $H$ under the assumption of exponentially distributed data turns out to be .028, and under the more general assumption of Lehmann alternatives, the minimal level for the extended Savage test turns out to be .04. We can draw from this example that under the assumption of Lehmann alternatives, the loss of level and power of the extended Savage test compared to the UMPI test in the exponential case is surprisingly small.
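The arithmetic behind the parametric comparison can be retraced with a few lines of code. The sketch below assumes that $T(x, y)$ is the ratio of the two sample means, which is the reading that reproduces the value 5.73 from the rainfall data listed above; the critical values are simply copied from the text and are not recomputed from (20). Variable names are illustrative.

```python
# Rainfall data from the example (tenths of millimeters)
x_1981 = [64, 16, 110, 8, 118, 80, 6, 14, 36]   # m = 9
y_1983 = [4.1, 2, 6, 24, 12, 2, 16, 4]          # n = 8

def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

# Assumed form of the statistic: ratio of sample means
t_stat = mean(x_1981) / mean(y_1983)

# Critical values as quoted in the text (not derived here)
c1, c2 = 0.227, 4.4

# Decision rule (19): reject H whenever T lies outside [C1, C2]
reject = not (c1 <= t_stat <= c2)

print(round(t_stat, 2), reject)   # prints: 5.73 True
```

Because the observed ratio lies above $C_2$, rule (19) rejects $H$, in agreement with the conclusion drawn in the text.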

5. DISCUSSION

We have examined the behavior of an extended version of the Savage test for the equivalence and the interval testing problem under Lehmann's alternative. Two major conclusions have emerged:


1. In the small-sample case, Lehmann's model is not appropriate for use in an equivalence study, because the power of the suggested test is too small. This limitation is caused by the generality of the model, and we cannot hope to improve the extended Savage test substantially. Therefore, the experimenter should make sure that the samples are sufficiently large.

2. In contrast to the case of equivalence testing, the proposed interval test represents a powerful tool for detecting a relevant difference between two distributions that belong to Lehmann's alternative. This test turns out to be unbiased whenever the sample sizes are equal.

In addition, we conclude that the finite (asymptotic) extended Savage test is unbiased, provided that the sample sizes are (asymptotically) equal. The bias is negligibly small whenever the sample sizes are not too far apart ($1/2 < m/n < 2$). Finally, the suggested tests are very easy to carry out, using the displayed tables that contain the critical values in the finite-sample case and the asymptotic variances and means for the large-sample tests.

APPENDIX: PROOFS

Proof of Theorem 2.1

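The proof is by induction on $\nu \in \{1, \ldots, \tfrac{1}{2}\binom{2m}{m}\}$. If we abbreviate $\pi_m := \prod_{i=1}^{m} (m + i\delta)$ and $\tilde\pi_m := \prod_{i=1}^{m} (i + m\delta)$, then we obtain for $\nu = 1$ that $\beta_1(\delta) = P_\delta(l) + P_\delta(u) = m!\,\delta^m \pi_m^{-1} + m!\,\tilde\pi_m^{-1} = \beta_1(\delta^{-1})$, where we have used (10). This proves property a. To prove properties b and c, we evaluate the derivative of $\beta_1(\delta)$ as

$$\beta_1'(\delta) = m!\,\Bigl\{ m\,\delta^{m-1}\pi_m^{-1} \sum_{i=1}^{m} (m + i\delta)^{-1} - m\,\tilde\pi_m^{-1} \sum_{i=1}^{m} (i + m\delta)^{-1} \Bigr\}$$

$$= m \cdot m! \sum_{i=1}^{m} \Bigl[ \frac{\delta^{m-1}\prod_{j=1}^{m} (j + m\delta)}{m + i\delta} - \frac{\prod_{j=1}^{m} (m + j\delta)}{i + m\delta} \Bigr] \times \prod_{j=1}^{m} \bigl[(m + j\delta)(j + m\delta)\bigr]^{-1}.$$

Taking into account that for any $\delta > 1$ and $j \le m$ it holds that $j + m\delta \ge m + j\delta$ (where equality occurs exactly in the case $j = m$), we obtain

$$\beta_1'(\delta) \begin{cases} > 0 & \text{for } \delta > 1 \text{ and } m > 1, \\ = 0 & \text{for } \delta = 1 \text{ or } m = 1, \\ < 0 & \text{for } 0 < \delta < 1 \text{ and } m > 1, \end{cases}$$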

which proves b and c for $\nu = 1$. Assume now that properties a-c hold true for $\nu - 1$. Let us denote the unique vector $c_{mm}$ for which $T(c_{mm}) = v_l$ holds as $c^l$, and analogously we define $c^u$ by means of $T(c^u) = v_u$. Obviously, we have $c^l_j = 0 \Leftrightarrow c^u_j = 1$, which leads to

$$\sum_{j=1}^{k} c^l_j + \sum_{j=1}^{k} c^u_j = k \quad \text{and} \quad \sum_{j=1}^{k} c^l_j \le \sum_{j=1}^{k} c^u_j, \qquad k = 1, \ldots, 2m. \qquad (A.1)$$

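We evaluate the power function of the extended Savage test as

$$\beta_\nu(\delta) = P_\delta(C = c^l) + P_\delta(C = c^u) + \beta_{\nu-1}(\delta).$$

Observe further that the sum of two (strictly) unimodal functions, both with minimum at $\delta = 1$, is again a (strictly) unimodal function with minimum at $\delta = 1$. It therefore remains to prove a-c for $P_\delta(C = c^l) + P_\delta(C = c^u)$, because Theorem 2.1 holds true for $\beta_{\nu-1}(\delta)$ by induction. Using (A.1), we prove property a as follows:

$$P_\delta(C = c^l) + P_\delta(C = c^u) = \frac{(m!)^2 \delta^m}{\prod_{i=1}^{2m} \bigl[i + (\delta - 1)\sum_{j=1}^{i} c^l_j\bigr]} + \frac{(m!)^2 \delta^m}{\prod_{i=1}^{2m} \bigl[i + (\delta - 1)\sum_{j=1}^{i} c^u_j\bigr]}$$

$$= \frac{(m!)^2 \delta^m}{\prod_{i=1}^{2m} \bigl[i + (\delta - 1)\bigl(i - \sum_{j=1}^{i} c^u_j\bigr)\bigr]} + \frac{(m!)^2 \delta^m}{\prod_{i=1}^{2m} \bigl[i + (\delta - 1)\bigl(i - \sum_{j=1}^{i} c^l_j\bigr)\bigr]}$$

$$= P_{1/\delta}(C = c^u) + P_{1/\delta}(C = c^l).$$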
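Property a can be checked numerically with a short sketch. The code assumes the rank-order probability in the form used in the display above, $P_\delta(C = c) = (m!)^2\delta^m / \prod_{i=1}^{2m}[i + (\delta - 1)\sum_{j \le i} c_j]$ for equal sample sizes, and takes the two extreme rank orders as $c^l$ and $c^u$. The function names and the values of $m$ and $\delta$ are illustrative choices, not part of the paper.

```python
from itertools import combinations
from math import factorial, isclose

def rank_order_prob(c, delta):
    """P_delta(C = c) for m = n; c is a 0/1 tuple of length 2m containing m ones."""
    m = len(c) // 2
    denom = 1.0
    partial = 0                      # running sum c_1 + ... + c_i
    for i, ci in enumerate(c, start=1):
        partial += ci
        denom *= i + (delta - 1.0) * partial
    return factorial(m) ** 2 * delta ** m / denom

def all_rank_orders(m):
    """All 0/1 tuples of length 2m with exactly m ones."""
    for ones in combinations(range(2 * m), m):
        yield tuple(1 if i in ones else 0 for i in range(2 * m))

m, delta = 3, 1.7
c_low = (0,) * m + (1,) * m          # extreme rank order, complement of c_up
c_up = (1,) * m + (0,) * m

# The probabilities of all rank orders should add up to one
total = sum(rank_order_prob(c, delta) for c in all_rank_orders(m))

# Property a: the two-order sum is invariant under delta -> 1/delta
lhs = rank_order_prob(c_low, delta) + rank_order_prob(c_up, delta)
rhs = rank_order_prob(c_low, 1 / delta) + rank_order_prob(c_up, 1 / delta)

print(total, lhs, rhs)
```

The check relies on the same substitution as the proof: replacing each partial sum of $c^l$ by $i$ minus the partial sum of $c^u$ turns $P_\delta(C = c^l)$ into $P_{1/\delta}(C = c^u)$.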

To prove properties b-c, we require the following result (which is eq. 7.a.6 in Savage 1956).

Lemma 5.1. When two rank orders $c_{mn}$ and $c'_{mn}$ are identical except in their $k$th and $(k+1)$th elements, which are (01) for $c_{mn}$ and (10) for $c'_{mn}$, then their probabilities are related as

$$P_\delta(C = c_{mn}) = \Bigl[1 + (\delta - 1)\Bigl\{k + (\delta - 1)\sum_{j=1}^{k} c_j\Bigr\}^{-1}\Bigr]\, P_\delta(C = c'_{mn}).$$
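A small numerical check of the lemma may help to fix the indexing. The sketch again assumes the rank-order probability $(m!)^2\delta^m / \prod_{i=1}^{2m}[i + (\delta - 1)\sum_{j \le i} c_j]$ for $m = n$, with the partial sum in the lemma taken over the first $k$ elements of the order that carries (01). The example rank orders and helper names are made up for illustration.

```python
from math import factorial, isclose

def rank_order_prob(c, delta):
    """P_delta(C = c) for m = n; c is a 0/1 tuple of length 2m containing m ones."""
    m = len(c) // 2
    denom = 1.0
    partial = 0
    for i, ci in enumerate(c, start=1):
        partial += ci
        denom *= i + (delta - 1.0) * partial
    return factorial(m) ** 2 * delta ** m / denom

def lemma_factor(c, k, delta):
    """Factor 1 + (delta - 1) / (k + (delta - 1) * (c_1 + ... + c_k))."""
    return 1.0 + (delta - 1.0) / (k + (delta - 1.0) * sum(c[:k]))

k, delta = 3, 2.0
c = (0, 1, 0, 1, 1, 0)          # elements k and k+1 are (0, 1)
c_swapped = (0, 1, 1, 0, 1, 0)  # same order with (1, 0) at positions k, k+1

ratio = rank_order_prob(c, delta) / rank_order_prob(c_swapped, delta)
factor = lemma_factor(c, k, delta)
print(ratio, factor)            # both equal 1.25 for this example
```

Only the $k$th factor of the denominator differs between the two rank orders, which is why the ratio collapses to a single term.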

With the abbreviation $u(\delta) := 1 + (\delta - 1)\bigl(k - \sum_{j=1}^{k} c^l_j + \delta \sum_{j=1}^{k} c^l_j\bigr)^{-1}$ and property a, we obtain

$$f(\delta) := P_\delta(C = c^l) + P_\delta(C = c^u) = P_\delta(C = c^l) + P_{1/\delta}(C = c^l) = u(\delta)\,P_\delta(C = c^{l'}) + u(\delta^{-1})\,P_{1/\delta}(C = c^{l'}).$$

Differentiation of $u(\delta)$ yields $u'(\delta) = k\bigl(k - \sum_{j=1}^{k} c^l_j + \delta \sum_{j=1}^{k} c^l_j\bigr)^{-2} > 0$ and $\frac{d}{d\delta}u(\delta^{-1}) = -\delta^{-2}u'(\delta^{-1}) < 0$. Hence we obtain for $\delta > 1$ that $u'(\delta) + \frac{d}{d\delta}u(\delta^{-1}) > 0$, and an easy calculation yields $u(\delta) > u(\delta^{-1})$, where we have used $\sum_{j=1}^{k} c^l_j \le k/2$, which follows from (A.1). We conclude from Lemma 5.1 that interchanging (01) into (10) (which denotes the tuple of the $k$th and $(k+1)$th elements) increases the Savage statistic; that is, by the induction hypothesis, $T(c^{l'})$ can be assumed to be an element of the critical region with critical value $\nu - 1$. In an analogous manner, we can assume, by the induction hypothesis, that $P_\delta(C = c^{l'})$ increases faster in $\delta$ for $\delta > 1$ than $P_{1/\delta}(C = c^{l'})$ decreases; that is, we obtain $f'(\delta) > 0$ for $\delta > 1$. This leads finally to $\beta_\nu'(\delta) = f'(\delta) + \beta_{\nu-1}'(\delta) > 0$, where $\delta > 1$. An application of property a yields b, and we obtain c because $\beta_\nu(\delta)$ is continuous.

Proof of Theorem 3.1

Assume that $\lambda_N = 1/2$ and let $\mu$ and $\sigma^2$ denote the mean and variance in (15) and (16), where we have replaced $\lambda_N$ by 1/2. A straightforward calculation leads, under contiguous alternatives $\mu(\delta) = \mu(\Delta) + O(n^{-1/2})$, to the asymptotic power function of the interval test (16),

$$\beta_{v_u}(\delta) = 1 - \Phi\Bigl(\frac{v_u - \mu(\delta)}{\sigma(\delta)}\Bigr) + \Phi\Bigl(\frac{-v_u - \mu(\delta) + 2}{\sigma(\delta)}\Bigr) + o(1). \qquad (A.2)$$

We immediately obtain $\beta_{v_u}(\Delta) \to \alpha$. It remains to show that $\beta_{v_u}(\delta)$ is asymptotically unimodal and log-symmetrical. Therefore, we evaluate the derivative (suppressing the dependence on $\delta$)

$$\beta_{v_u}'(\delta) = \frac{\mu'}{\sigma}\Bigl\{\varphi\Bigl(\frac{v_u - \mu}{\sigma}\Bigr) - \varphi\Bigl(\frac{-v_u - \mu + 2}{\sigma}\Bigr)\Bigr\} + \frac{\sigma'}{\sigma^2}\Bigl\{(v_u - \mu)\,\varphi\Bigl(\frac{v_u - \mu}{\sigma}\Bigr) - (2 - v_u - \mu)\,\varphi\Bigl(\frac{-v_u - \mu + 2}{\sigma}\Bigr)\Bigr\} + o(1), \qquad (A.3)$$

where $\varphi$ denotes the density of $\Phi$. A simple calculation shows that $\mu' > 0$, and hence $\mu$ maps $\mathbb{R}^+$ bijectively and continuously onto the interval $(1 - \ln 2, 1 + 2\ln 2) \subset \mathbb{R}$. This entails that $\mu$ maps intervals onto intervals, and hence an interval testing problem with corresponding interval $[\Delta^{-1}, \Delta]$ can be expressed as an analogous problem concerning the interval $[\mu(\Delta^{-1}), \mu(\Delta)]$. Now we are in the position to prove the unimodality of the asymptotic power function. Straightforward calculation shows that $\sigma$ is a strictly unimodal function with the symmetry property $\sigma(\delta) = \sigma(\delta^{-1})$, attaining its unique minimum at $\delta = 1$. Similarly, we obtain $\mu(\delta) = 1 - \mu(\delta^{-1})$, provided that $\lambda_N = 1/2$. This reveals that the power function is symmetric; that is, $\beta_{v_u}(\delta^{-1}) = \beta_{v_u}(\delta)$. Therefore, we can restrict our analysis to the case $\delta > 1$. To see that $\beta_{v_u}(\delta)$ in (A.2) is strictly increasing, we mention that $\varphi((v_u - \mu)\sigma^{-1}) > \varphi((-v_u - \mu + 2)\sigma^{-1})$ because $|v_u - \mu| < |-v_u - \mu + 2|$, where $\mu, v_u > 1$. Taking into account that $\mu' > 0$ and $\sigma' > 0$ for $\delta > 1$, we conclude that both summands in (A.3) are positive. This proves that $\beta_{v_u}'(\delta) > 0$ in (A.3), provided $\delta > 1$. The analogous statement for the equivalence test is obtained by dualization.

[Received June 1994. Revised September 1995.]

REFERENCES

Anderson, S., and Hauck, W. W. (1990), "Consideration of Individual Bioequivalence," Journal of Pharmacokinetics and Biopharmaceutics, 18, 259-273.

Bofinger, E., Hayter, A. J., and Liu, W. (1993), "The Construction of Upper Confidence Bounds on the Range of Several Location Parameters," Journal of the American Statistical Association, 88, 906-911.

Bristol, D. R., and Desu, M. M. (1990), "Comparison of Two Exponential Distributions," Biometrical Journal, 3, 267-276.

Capon, J. (1961), "Asymptotic Efficiency of Certain Locally Most Powerful Rank Tests," Annals of Mathematical Statistics, 32, 88-100.

Chernoff, H., and Savage, I. R. (1958), "Asymptotic Normality and Efficiency of Certain Nonparametric Test Statistics," Annals of Mathematical Statistics, 29, 972-994.

Com-Nogue, C., Rodary, C., and Patte, C. (1993), "How to Establish Equivalence When Data are Censored: A Randomized Trial of Treatments for B non-Hodgkin's Lymphoma," Statistics in Medicine, 12, 1353-1364.

Cox, D. R. (1972), "Regression Models and Life Tables" (with discussion), Journal of the Royal Statistical Society, Ser. B, 34, 187-202.

Davies, R. B. (1971), "Rank Tests for Lehmann's Alternative," Journal of the American Statistical Association, 66, 879-883.

Dette, H., and Munk, A. (1996), "Sign Regularity of a Generalized Cauchy Kernel With Applications," Journal of Statistical Planning and Inference, (to appear).

Dobbins, T. W., and Thiyagarajan, B. (1992), "A Retrospective Assessment of the 75/75 Rule in Bioequivalence," Statistics in Medicine, 11, 1333-1342.

Dunnett, C. W., and Gent, M. (1977), "Significance Testing to Establish Equivalence Between Treatments With Special Reference to Data in the Form of 2 x 2 Tables," Biometrics, 33, 593-602.

Giani, G., and Finner, H. (1991), "Some General Results on Least Favorable Parameter Configurations With Special Reference to Equivalence Testing and the Range Statistic," Journal of Statistical Planning and Inference, 28, 33-48.

Giani, G., and Strassburger, K. (1994), "Testing and Selecting for Equivalence With Respect to a Control," Journal of the American Statistical Association, 89, 320-329.

Hauck, W. W., and Anderson, S. (1992), "Types of Bioequivalence and Related Statistical Considerations," Biostatistics Technical Report 19, University of California, San Francisco, Dept. of Epidemiology and Biostatistics.

Holder, D. J., and Hsuan, F. (1993), "Moment-Based Criteria for Determining Bioequivalence," Biometrika, 80, 835-846.

Hsu, J. C., Hwang, G., Liu, H.-K., and Ruberg, S. J. (1994), "Confidence Intervals Associated With Tests for Bioequivalence," Biometrika, 81, 103-114.

Jennison, C., and Turnbull, B. W. (1993), "Sequential Equivalence Testing and Repeated Confidence Intervals With Applications to Normal and Binary Responses," Biometrics, 49, 31-43.

Lawless, J. F. (1982), Statistical Models and Methods for Lifetime Data, New York: John Wiley.

Lehmann, E. L. (1953), "The Power of Rank Tests," Annals of Mathematical Statistics, 24, 23-43.

Mandallaz, D., and Mau, J. (1981), "Comparison of Different Methods for Decision Making in Bioequivalence Assessment," Biometrics, 37, 213-222.

Metzler, C. M. (1974), "Bioavailability-A Problem in Equivalence," Biometrics, 30, 309-317.

Neumann, N. (1988), "Some Procedures for Calculating the Distribution of Elementary Nonparametric Test Statistics," Statistical Software Newsletter, 14, 120-126.

Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1989), Numerical Recipes in Pascal, Cambridge, U.K.: Cambridge University Press.

Rogers, J. L., Howard, K. I., and Vessey, J. T. (1993), "Using Significance Tests to Estimate Equivalence Between Two Experimental Groups," Psychological Bulletin, 113, 553-565.

Savage, I. R. (1956), "Contributions to the Theory of Rank Order Statistics-The Two-Sample Case," Annals of Mathematical Statistics, 27, 590-615.

Selwyn, M. R., and Hall, N. R. (1984), "On Bayesian Methods for Bioequivalence," Biometrics, 40, 1103-1108.

Shorack, R. A. (1966), "Recursive Generation of the Distribution of the Mann-Whitney-Wilcoxon U-Statistic Under Generalized Lehmann Alternatives," Annals of Mathematical Statistics, 37, 284-286.

Shorack, R. A. (1967), "Tables of the Distribution of the Mann-Whitney-Wilcoxon U-Statistic Under Lehmann Alternatives," Technometrics, 9, 666-677.


Stroud, A. H., and Secrest, D. (1966), Gaussian Quadrature Formulas. Englewood Cliffs, NJ: Prentice-Hall.

Sugiura, N. (1965), "An Example of the Two-Sided Wilcoxon Test Which is Not Unbiased," Annals of the Institute of Statistical Mathematics, 17, 261-265.

Staudte, R. G., and Sheather, S. J. (1990), Robust Estimation and Testing, New York: John Wiley.

Victor, N. (1987), "Relevant Differences and Shifted Nullhypotheses," Methods of Information in Medicine, 26, 109-116.

Wellek, S. (1993a), "Basing the Analysis of Comparative Bioavailability Trials on an Individualized Statistical Definition of Equivalence," Biometrical Journal, 35, 47-55.

Wellek, S. (1993b), "A Log-Rank Test for Equivalence of Two Survivor Functions," Biometrics, 49, 877-881.
