
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. 22, NO. 2, FEBRUARY 1996

On the Expected Number of Failures Detected by Subdomain Testing and Random Testing

Tsong Yueh Chen and Yuen Tak Yu

Abstract-In this paper, we investigate the efficacy of subdomain testing and random testing using the expected number of failures detected (the E-measure) as a measure of effectiveness. Simple as it is, the E-measure does provide a great deal of useful information about the fault-detecting capability of testing strategies. With the E-measure, we obtain new characterizations of subdomain testing, including several new conditions that determine whether subdomain testing is more or less effective than random testing. Previously, the efficacy of subdomain testing strategies has been analyzed using the probability of detecting at least one failure (the P-measure) for the special case of disjoint subdomains only. In contrast, our analysis makes use of the E-measure and considers also the general case in which subdomains may or may not overlap. Furthermore, we discover important relations between the two different measures. From these relations, we also derive corresponding characterizations of subdomain testing in terms of the P-measure.

Index Terms-Partition testing, random testing, software testing, subdomain testing.

1 INTRODUCTION

THE set of all relevant inputs to a program is usually referred to as the program's input domain. Exhaustive testing, which tests all possible inputs, is generally impractical since the input domain is normally very large. Typically, testers can only afford to test a small portion of the input domain.

In the past decades, a large number of test data selection strategies has been proposed. These include the most popularly known code coverage methods (such as branch testing and path testing), specification-based strategies (see, for example, [10], [11], [13]), the data flow criteria [1], [12], [15], and the domain testing strategy [18].

Almost all of these strategies share a common characteristic: the program's input domain is divided into subsets, called subdomains, and one or more representatives from each subdomain are selected to test the program. This approach to the selection of test data is commonly referred to as partition testing [5], [9], [17]. In general, however, the subdomains formed may not be all disjoint, and hence they are not true partitions in the formal mathematical sense. For this reason, the term subdomain-based testing, or simply subdomain testing, which was suggested by Frankl and Weyuker [6], will be used throughout this paper for the general case where the subdomains may or may not be disjoint. Also, the terms partition and partition testing will be reserved for the special case when all subdomains are actually disjoint.

The motivation behind the use of subdomain testing is that each subdomain formed according to a given test criterion is more or less homogeneous [9], [17], in the sense that either all its members cause the program to succeed or all cause it to fail, and therefore a representative from each subdomain is sufficient to test the program. Unfortunately, homogeneity represents the ideal, and in practice it is extremely difficult, if not impossible, to achieve. Therefore, subdomains must be sampled often enough to improve the chance of detecting faults.

• T.Y. Chen and Y.T. Yu are with the Department of Computer Science, University of Melbourne, Parkville 3052, Australia. E-mail: [email protected].

Manuscript received March 1995; revised December 1995. For information on obtaining reprints of this article, please send e-mail to: [email protected], and reference IEEECS Log Number S96001.

In contrast to the systematic approach of subdomain testing, random testing simply requires test cases to be randomly selected from the entire input domain. An advantage of using random testing is that quantitative estimates of a program's operational reliability may be inferred. Moreover, random testing does not bear the overheads of subdividing the input domain and of keeping track of which subdomains have been tested or not.

Recently, there has been much interest in the analysis of the effectiveness of subdomain testing and random testing [2], [5], [6], [7], [9], [17]. Various measures have been in use, each based on a different intuition. In this paper, we are using the expected number of failures detected (hereafter abbreviated as the E-measure) to analyze subdomain and random testing. A closely related but different measure, called the expected number of errors by Duran and Ntafos, has been used in their simulation studies [5].

The rest of this paper is organized as follows. Section 2 describes the related work. Section 3 discusses the use of the E-measure for our analysis. Section 4 outlines the underlying assumptions and the notation used in our model. Section 5 analyzes subdomain testing using the E-measure and compares it with random testing. Section 6 explores the relations between the E-measure and another metric called the P-measure defined in Section 2, and derives from these relations many results in terms of the P-measure. Section 7 summarizes and concludes this paper.

0096-5589/96$05.00 © 1996 IEEE


2 RELATED WORK

Ideally, we would like to assess the effectiveness of testing in terms of the faults detected. Faults are the software defects caused by programmer errors, while a failure is an observed departure from the specified behavior of the software. Normally, only failures are revealed by testing, and the associated faults are usually identified during debugging. A program fault may cause a number of failures, while a program failure may have more than one fault associated with it. There is no simple relationship between faults and failures, and this makes it difficult to use faults in measuring the effectiveness of testing. Many reliability metrics, for instance, are formulated in terms of failures rather than faults (see, for example, [16]).

Historically, whether random testing is a valuable strategy has been rather controversial. Random testing is regarded by many as the worst strategy, probably because it does not make use of any information about the program or its specifications. For instance, Myers [14] described random testing as "probably the poorest [test case design] methodology." On the other hand, Girard and Rault [8] proposed random testing as a valuable test case generation scheme. Recently, statistical usage testing, which is a form of random testing using the expected operational profile of the software, has been strongly advocated as an integral part of the Cleanroom Engineering process (see, for example, [4]).

Most of the related work on subdomain testing deals only with the special case when all subdomains are disjoint, that is, partition testing. In their simulation and empirical work on the comparison of partition and random testing, Duran and Ntafos [5] concluded that the two methods are almost equally effective, even under assumptions that seem to favor partition testing. Their results also indicate that random testing may often be more cost effective than partition testing.

Considering the claim that "the best systematic method is little improvement over the worst" to be counterintuitive, Hamlet and Taylor [9] performed similar experiments and obtained similar results.

Instead of performing simulation and empirical studies, Weyuker and Jeng [17] formally analyzed the conditions that affect the efficacy of partition testing and compared it to random testing. In their analysis, they used the probability of detecting at least one failure (the P-measure) to quantify the fault-detecting ability of a testing strategy. Subsequently, some of their results have been generalized by Chen and Yu [2], [3].

However, as observed by Weyuker and Jeng themselves, their analysis [17] suffers from a number of limitations. Of particular significance are two issues they raised in concluding their paper, namely the assumption of disjoint subdomains and the appropriateness of using only the P-measure. This paper is an attempt to further investigate these two issues.

3 THE E-MEASURE

Although the P-measure has been widely used in the study of testing strategies [2], [3], [5], [6], [7], [9], [17], Weyuker and Jeng [17] have expressed their reservation over the appropriateness of using it as the sole means of assessing testing effectiveness, and suggested the search for other measures.

The expected number of failures detected (the E-measure), being regarded as another reasonable measure by Frankl and Weyuker [7], has been used by them to analyze some specific subdomain testing strategies such as branch testing, mutation testing and data flow testing criteria. In this paper, we are using the E-measure to formally analyze random testing and general subdomain testing strategies rather than any specific ones.

We choose the E-measure because it is not only "reasonable" but also useful. Although the detection of more failures does not always help us identify more faults, in practice it is often desirable to reveal as many failures as possible. Intuitively, the more failures the testing reveals, the more information we are likely to get about the faults and the higher the chance of detecting more faults. From this point of view, a best or most effective testing strategy is one which results in a test suite that can discover as many failures in the program as possible. We shall show in Section 5 (Propositions 1 and 3) that the situations leading to a maximum value of the E-measure are precisely those that enable the detection of the largest possible number of failures. In fact, these best case situations are also those when all (except possibly one of) the subdomains are homogeneous, which, as explained in Section 1, is precisely the rationale underlying the use of subdomain testing. In contrast, to maximize the P-measure, it is only necessary to have one subdomain full of failure-causing inputs, even though all other subdomains may have low failure rates.

Another merit of the E-measure is that it can distinguish the capability of detecting more than one failure, while the P-measure regards one testing strategy as being as good as another so long as both can detect at least one failure.

Furthermore, there is as yet little success in extending the formal analysis to the more general and more common case of overlapping subdomains. Weyuker and Jeng seem to have foreseen the difficulty of doing this [17]. By using the E-measure, however, we are able to identify some characteristics of subdomain testing for overlapping subdomains.

More importantly, we have discovered some simple relations between the E-measure and the P-measure that enable the derivation of further properties of the latter more easily than by considering the P-measure alone (see Propositions 18-21 in Section 6). We find that when comparing subdomain and random testing, a greater or equal value of the E-measure guarantees a greater value of the P-measure. Thus, the relative values of the E-measure are also a good indication of those of the P-measure.

Still, there may be some situations in which one is more interested in the ability of the testing strategy to uncover at least one failure. In such situations, the failure rate is usually small, and so according to Propositions 22 and 23 the E-measure can be used as a first approximation to the P-measure.
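The approximation claimed here can be illustrated numerically: for random testing, P_r = 1 − (1 − θ)^n ≈ nθ = E_r whenever nθ is small. A minimal sketch (the values of θ and n are illustrative, not from the paper):

```python
# Illustration: for a small failure rate, the E-measure closely
# approximates the P-measure, since 1 - (1 - theta)^n ≈ n*theta.
theta, n = 0.001, 10          # illustrative failure rate and test count
e_r = n * theta               # E-measure for random testing
p_r = 1 - (1 - theta) ** n    # P-measure for random testing
rel_gap = abs(e_r - p_r) / e_r
print(e_r, p_r, rel_gap)      # relative gap is well under 1%
```

Note that P_r never exceeds E_r, so the E-measure is a (slight) overestimate of the P-measure in this regime.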

Finally, we believe that there is probably no single "best" measure. Whichever one is the most appropriate to use depends very much on the purpose of the testing. Different measures based on different intuition should complement each other to provide a more comprehensive understanding of subdomain and random testing.


4 THE UNDERLYING MODEL

In this section, we introduce the notation and assumptions used in our model, which is basically similar to that used by Weyuker and Jeng [17] and Chen and Yu [2], [3].

Let D denote a program's input domain with size d > 0, m denote the number of failure-causing inputs, which are the elements of D that produce incorrect outputs, and c denote the number of the remaining inputs, which are called correct inputs. Obviously, 0 ≤ c, m ≤ d and c = d − m. Let n denote the total number of test cases selected when using random testing. Define the failure rate θ and the sampling rate σ as θ = m/d and σ = n/d, respectively.

When testing is done by using subdomain testing, the subdomains formed will be denoted by D_i, i = 1, …, k, where k is the number of subdomains and k ≥ 2. Let d_i, m_i, c_i, n_i, θ_i, and σ_i denote, respectively, the size, number of failure-causing inputs, number of correct inputs, number of test cases selected, failure rate, and sampling rate in subdomain D_i, where, for all i, 0 ≤ c_i, m_i ≤ d_i, c_i = d_i − m_i, n_i ≥ 1, θ_i = m_i/d_i and σ_i = n_i/d_i. Let N denote the total number of test cases selected when using subdomain testing, that is, N = ∑_{i=1}^k n_i.

For a fair comparison of the strength of subdomain and random testing, we shall assume, unless otherwise stated, that the same total number of test cases is selected, that is, N = n.

We shall assume that test cases are selected independently and based on a uniform distribution. This implies that the probability of selecting a failure-causing input from domain D (respectively, subdomain D_i) is exactly θ (respectively, θ_i). To simplify the formulae and the computations involved, we shall for convenience assume that selections are done with replacement. Considering the fact that in practice sample sizes are normally very small when compared with the domain sizes, this assumption should have minimal effects. For more detailed justifications of these assumptions, please refer to [17].

In the rest of this paper, the terms “partition” and “partition testing” will refer only to the case of disjoint subdomains, while the terms “subdomain” and “subdomain testing” will refer to the general (including both the disjoint and nondisjoint) case. Moreover, in our subsequent analysis, many of the results are applicable to the general case; those which are not will be explicitly and unambiguously specified.

In fact, in the general case, ∑_{i=1}^k d_i ≥ d, and the equality holds if and only if the subdomains are all disjoint. Besides, in general ∑_{i=1}^k m_i ≥ m and ∑_{i=1}^k c_i ≥ c. The last two inequalities become equalities if (but not only if) all subdomains are disjoint.

The values of the P-measure for random testing and subdomain testing are denoted, respectively, by P_r and P_s, and they are given by the following formulae:

P_r = 1 − (1 − θ)^n

P_s = 1 − ∏_{i=1}^k (1 − θ_i)^{n_i}

Similarly, the values of the E-measure for random testing and subdomain testing are denoted by E_r and E_s, respectively, and they are given by the following formulae:

E_r = nθ

E_s = ∑_{i=1}^k n_i θ_i

To simplify our exposition and enhance readability, we shall denote by P_p and E_p, respectively, the values of the P-measure and E-measure when all subdomains are disjoint.

5 ANALYSIS OF SUBDOMAIN TESTING

In this section, we analyze subdomain testing using the E-measure. We shall discuss the best and worst cases, as well as the conditions under which subdomain testing performs better or worse than random testing. Our analysis usually begins with the case of disjoint subdomains, followed by the general case whenever appropriate.

5.1 Best and Worst Cases

Consider the best possible performance of partition testing. Intuitively, this occurs when the least number of test cases is "wasted," that is, "spent" on partitions containing no failure-causing inputs, and for the other partitions, the failure rates are all equal to one, so that every test case "spent" on these partitions must necessarily reveal a failure.

PROPOSITION 1. For disjoint subdomains with d_i ≥ n_i ≥ 1 for all i, the best cases of partition testing are as follows.

1) Suppose that N = ∑_{i=1}^k n_i > m. Then E_p is maximized if, for all partitions that contain failure-causing inputs, m_i = n_i = d_i. In this case, E_p = m.

2) Suppose that N ≤ m < d. Then E_p is maximized if, for all partitions D_i except one, m_i = n_i = d_i, and from the only partition D_j which contains inputs that are not failure-causing (that is, m_j < d_j), only one test case is selected (that is, n_j = 1). In this case,

E_p = N − (d − m)/(d − N + 1).

PROOF.

1) By assumption, d_i ≥ n_i ≥ 1 and N > m. Thus

E_p = ∑_{i=1}^k (m_i n_i)/d_i ≤ ∑_{i=1}^k m_i = m.

Now under the conditions specified in part 1) of this proposition, this maximum possible value of E_p is attained.

2) For any i,

d_i = d − ∑_{h≠i} d_h ≤ d − ∑_{h≠i} n_h = d − N + n_i,


and hence

n_i/d_i − 1/(d − N + 1) ≥ n_i/(d − N + n_i) − 1/(d − N + 1) = (n_i − 1)(d − N) / ((d − N + n_i)(d − N + 1)) ≥ 0.

Now let

C_p = ∑_{i=1}^k (c_i n_i)/d_i,

which is the expected number of correct inputs found. Then

E_p + C_p = ∑_{i=1}^k (m_i n_i)/d_i + ∑_{i=1}^k (c_i n_i)/d_i = ∑_{i=1}^k n_i = N.

But

C_p = ∑_{i=1}^k (c_i n_i)/d_i ≥ (1/(d − N + 1)) ∑_{i=1}^k c_i = (d − m)/(d − N + 1).

Hence

E_p = N − C_p ≤ N − (d − m)/(d − N + 1).

Now under the conditions specified in part 2) of this proposition, this maximum possible value of E_p is attained. □

Proposition 1 shows that E_p is maximized when all subdomains, except possibly one, are either full of or free of failure-causing inputs. In other words, partition testing works best when (almost) all subdomains are homogeneous. As noted before, this is precisely the rationale underlying the use of partition testing.

Remember that Weyuker and Jeng [17] observed that the maximum value of P_p is attained as long as there exists a partition full of failure-causing inputs. Clearly, the requirement of Proposition 1 is much stronger. However, the latter situation is obviously also the most desirable, because in this case, on average either as many as all failure-causing inputs are selected (when N > m), or almost every test case reveals a failure (when N ≤ m). In contrast, the corresponding conditions found by Weyuker and Jeng [17] only guarantee the detection of at least one failure.
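Proposition 1 (case N ≤ m < d) can be checked exhaustively on a tiny domain. A brute-force sketch (the parameters d = 10, m = 6, k = 3, N = 4 are illustrative, not from the paper):

```python
# Brute-force check of Proposition 1, part 2: enumerate all disjoint
# partitions, failure placements, and test allocations with
# d_i >= n_i >= 1, and compare max E_p to the predicted bound.
from itertools import product
from fractions import Fraction

d, m, k, N = 10, 6, 3, 4   # illustrative; here N <= m < d

best = Fraction(0)
for ds in product(range(1, d + 1), repeat=k):
    if sum(ds) != d:
        continue
    for ms in product(*[range(0, di + 1) for di in ds]):
        if sum(ms) != m:
            continue
        for ns in product(*[range(1, di + 1) for di in ds]):
            if sum(ns) != N:
                continue
            # E_p = sum_i n_i * m_i / d_i, kept exact with Fraction
            ep = sum(Fraction(ni * mi, di)
                     for ni, mi, di in zip(ns, ms, ds))
            best = max(best, ep)

# Proposition 1 predicts max E_p = N - (d - m)/(d - N + 1).
predicted = Fraction(N) - Fraction(d - m, d - N + 1)
print(best, predicted)
```

The maximizing configuration found matches the proposition: two partitions entirely failure-causing and fully sampled, with a single test case drawn from the remaining partition.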

Next we consider the worst case of partition testing.

PROPOSITION 2. For disjoint subdomains, the worst cases of partition testing are as follows.

1) Assume that N ≥ k ≥ 2, d > k, and 0 < m ≤ d − k + 1 (that is, there are at least k − 1 correct inputs). Then E_p is minimized when n_k = 1, d_1 = … = d_{k−1} = 1 and d_k = d − k + 1, with all of the m failure-causing inputs in D_k. In this case,

E_p = m/d_k = m/(d − k + 1).

2) Assume that N ≥ k ≥ 2, d ≥ k, 0 ≤ m ≤ d − N + 1 (that is, there are at least N − 1 correct inputs), and n_i ≤ d_i for all i. Then E_p is minimized when n_k = 1, d_k = d − N + 1 and d_i = n_i for i = 1, …, k − 1, with all of the m failure-causing inputs in D_k. In this case,

E_p = m/(d − N + 1).

PROOF.

1) For any i, d_j ≥ 1 for all j ≠ i. Thus

d_i = d − ∑_{j≠i} d_j ≤ d − ∑_{j≠i} 1 = d − k + 1.

Hence

E_p = ∑_{i=1}^k (m_i n_i)/d_i ≥ (1/(d − k + 1)) ∑_{i=1}^k m_i n_i ≥ (1/(d − k + 1)) ∑_{i=1}^k m_i (since n_i ≥ 1) = m/(d − k + 1).

Now under the conditions specified in part 1) of this proposition, this minimum possible value of E_p is attained.

2) If we further restrict n_i ≤ d_i for all i, then in a way similar to part 2) of the proof of Proposition 1, it can be shown that

n_i/d_i ≥ 1/(d − N + 1)

for all i, and consequently

E_p ≥ m/(d − N + 1).

Again, under the conditions specified in part 2) of this proposition, this minimum possible value of E_p is attained. □

Notice that the conditions for the worst cases of partition testing using the E-measure (Proposition 2) are identical to those observed by Chen and Yu [2] using the P-measure.
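The worst case of Proposition 2, part 2 can likewise be checked by exhaustive enumeration on a tiny domain. A brute-force sketch (the parameters d = 10, m = 3, k = 3, N = 4 are illustrative, not from the paper):

```python
# Brute-force check of Proposition 2, part 2: with n_i <= d_i for all i,
# the minimum E_p over all disjoint configurations is m/(d - N + 1).
from itertools import product
from fractions import Fraction

d, m, k, N = 10, 3, 3, 4   # illustrative; satisfies N >= k >= 2, m <= d-N+1

worst = None
for ds in product(range(1, d + 1), repeat=k):
    if sum(ds) != d:
        continue
    for ms in product(*[range(0, di + 1) for di in ds]):
        if sum(ms) != m:
            continue
        for ns in product(*[range(1, di + 1) for di in ds]):
            if sum(ns) != N:
                continue
            ep = sum(Fraction(ni * mi, di)
                     for ni, mi, di in zip(ns, ms, ds))
            worst = ep if worst is None else min(worst, ep)

# Proposition 2, part 2 predicts min E_p = m/(d - N + 1).
print(worst, Fraction(m, d - N + 1))
```

The minimizing configuration concentrates all failure-causing inputs in the one large, barely sampled partition, exactly as the proposition describes.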

Now we turn to the more general case of allowing subdomains to overlap. This has the effect of widening the range of possible performance of subdomain testing. In other words, at the extremes, it is possible to have an even better or poorer performance than that of partition testing.

Intuitively, since the subdomains may overlap, the best we can expect to have is that all subdomains, except possibly one, contain only failure-causing inputs. For coverage of the entire input domain, the remaining subdomain needs to include all the correct inputs. But to have the best performance, all the failure-causing inputs should also be included. Thus, the last subdomain should be the entire input domain. We state our observation in Proposition 3.

PROPOSITION 3. For overlapping subdomains with m > 0, E_s is maximized when one subdomain is the entire input domain, from which only one test case is selected, and all other subdomains contain only failure-causing inputs. In this case,

E_s = N − (d − m)/d.

PROOF. Let

C_s = ∑_{i=1}^k (c_i n_i)/d_i,

which is the expected number of correct inputs found. Then

E_s + C_s = ∑_{i=1}^k n_i = N.


Also, since ∑_{i=1}^k c_i ≥ c and, for all i, d_i ≤ d and n_i ≥ 1, we have

C_s = ∑_{i=1}^k (c_i n_i)/d_i ≥ (1/d) ∑_{i=1}^k c_i ≥ c/d = (d − m)/d.

Hence

E_s = N − C_s ≤ N − (d − m)/d.

Now under the conditions specified in this proposition, this maximum possible value of E_s is attained. □

By interchanging the roles of failure-causing inputs and correct inputs, we obtain the worst case as the dual of the best case as follows.

PROPOSITION 4. For overlapping subdomains with m < d, E_s is minimized when one subdomain is the entire input domain from which only one test case is selected, and all other subdomains contain only correct inputs. In this case,

E_s = m/d.

PROOF. This proposition is simply the dual of Proposition 3 and so the proof is also similar. □

It should be noted that in the worst situation, at most one test case (which is selected from the entire input domain) has a nonzero chance of revealing a failure. The reader may contrast this with partition testing, whose worst performance is slightly better than this (see Proposition 2).
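Propositions 3 and 4 can be checked directly by evaluating E_s on the two extreme configurations they describe. A small sketch (the parameters d = 100, m = 20, k = 3, N = 5 are illustrative, not from the paper):

```python
# Evaluating E_s at the extreme overlapping configurations of
# Propositions 3 and 4 (illustrative parameters, not from the paper).
d, m, N = 100, 20, 5

def e_s(subdomains):
    # subdomains: list of (d_i, m_i, n_i); E_s = sum_i n_i * m_i / d_i
    return sum(ni * mi / di for di, mi, ni in subdomains)

# Best case (Prop. 3): two subdomains containing only failure-causing
# inputs share N - 1 = 4 test cases; the whole domain D gets one.
best = e_s([(m, m, 2), (m, m, 2), (d, m, 1)])
assert abs(best - (N - (d - m) / d)) < 1e-12   # N - (d - m)/d = 4.2

# Worst case (Prop. 4): two subdomains containing only correct inputs
# share 4 test cases; the whole domain D gets one.
worst = e_s([(d - m, 0, 2), (d - m, 0, 2), (d, m, 1)])
assert abs(worst - m / d) < 1e-12              # m/d = 0.2

print(best, worst)
```

Note how the worst case falls below the partition-testing minimum of Proposition 2, confirming that overlap widens the performance range in both directions.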

5.2 Comparison with Random Testing

In general, there is no a priori information about the distribution of the failure-causing inputs. Despite this, Weyuker and Jeng [17] did find a way of partitioning the input domain which ensures that the P-measure of partition testing would never be less than that of random testing. They found that this happens if all partitions are of equal size and the number of test cases selected from each partition is equal. Chen and Yu [2] have generalized this condition as follows. As long as the number of test cases selected is proportional to the size of the partition (hereafter referred to as proportional sampling), partition testing will never have a smaller value of the P-measure than random testing. Thus, it is natural to investigate the effect of proportional sampling within the context of the E-measure.

PROPOSITION 5. For disjoint subdomains, if σ_1 = … = σ_k (that is, n_1/d_1 = … = n_k/d_k = N/d), then E_p = E_r.

PROOF. It can be shown that if σ_1 = … = σ_k, then σ_i = σ for all i. Thus,

E_p = ∑_{i=1}^k m_i σ_i = ∑_{i=1}^k m_i σ = σ ∑_{i=1}^k m_i = mσ = nθ = E_r. □

COROLLARY 1. For disjoint subdomains, if d_1 = … = d_k and n_1 = … = n_k, then E_p = E_r.

Thus, while proportional sampling guarantees that partition testing has no worse a chance than random testing of detecting at least one failure [2], Proposition 5 tells us that on average this strategy reveals the same number of failures as random testing.
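Both facts can be exercised together in a property-style check: under proportional sampling on disjoint subdomains, E_p equals E_r exactly for every placement of failure-causing inputs, while P_p is never smaller than P_r. A sketch (the partition sizes are illustrative; exact rational arithmetic keeps the E-comparison strict):

```python
# Randomized check of Proposition 5 and of the P-measure result of
# Chen and Yu [2] under proportional sampling (illustrative sizes).
import random
from fractions import Fraction

random.seed(0)
ds, n, d = [10, 20, 30], 6, 60          # disjoint partition of D
ns = [di * n // d for di in ds]         # proportional allocation: 1, 2, 3

for _ in range(500):
    ms = [random.randint(0, di) for di in ds]   # random failure placement
    m = sum(ms)
    e_p = sum(Fraction(ni * mi, di) for ni, mi, di in zip(ns, ms, ds))
    e_r = Fraction(n * m, d)
    assert e_p == e_r                   # Proposition 5: exactly equal
    miss = 1.0
    for ni, mi, di in zip(ns, ms, ds):
        miss *= (1 - mi / di) ** ni     # prob. of missing all failures
    p_p, p_r = 1 - miss, 1 - (1 - m / d) ** n
    assert p_p >= p_r - 1e-12           # P-measure result of [2]

print("all 500 trials passed")
```

The check passes for every random placement, consistent with both results being distribution-free over the failure-causing inputs.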

With proportional sampling, the sampling rate σ_i of subdomain D_i is independent of i. If all subdomains are disjoint and the sampling rate is uniform, then necessarily σ_i = σ for all i, provided the total number of test cases is the same for both random and subdomain testing. This is no longer true when overlapping is allowed. If there is at least one nonempty intersection among the subdomains, then the following two situations are possible:

1) If we keep σ_i = σ for all i, then N = ∑_{i=1}^k n_i > n. That is, if the sampling rate in random testing is kept equal to the uniform sampling rate in subdomain testing, then the size of the test set for random testing will be smaller.

2) If we keep N = ∑_{i=1}^k n_i = n and σ_1 = … = σ_k, then σ_i < σ for all i. That is, if the sizes of the test suites are kept the same in both random and subdomain testing, then subdomain testing will have a smaller (but uniform) sampling rate.

The next two propositions deal with these two situations separately.

PROPOSITION 6. Subdomain testing will never be worse than random testing if the entire domain and all subdomains are sampled at the same rate, that is,

E_s ≥ E_r if σ_1 = … = σ_k = σ.

In this case, Δn = σΔd, where Δn = ∑_{i=1}^k n_i − n and Δd = ∑_{i=1}^k d_i − d.

PROOF. Since some subdomains may overlap, we have

∑_{i=1}^k m_i ≥ m.

Thus

E_s = ∑_{i=1}^k m_i σ_i = σ ∑_{i=1}^k m_i ≥ σm = E_r.

In this case,

Δn = ∑_{i=1}^k n_i − n = σ ∑_{i=1}^k d_i − σd = σΔd. □

It should be noted that this result is very general in that no knowledge of the failure rates is assumed. It gives us a very general sufficient condition for achieving better performance than random testing. Unfortunately, the precondition here also implies, as mentioned above, that more test cases are used for subdomain testing. This result may be useful if we are willing to increase the number of test cases just to ensure that the effectiveness of the testing will not deteriorate. However, the price is that more testing effort is needed. Note also that intuitively more test cases probably (but certainly not always) make the testing more effective. Proposition 6 simply guarantees a simple way of using more test cases to achieve extra effectiveness.


Interestingly, Proposition 6 does not hold if the P-measure is used, as the next example demonstrates.

EXAMPLE 5.1. Consider an input domain $D$ such that $d = 1200$, $m = 181$, and $n = 12$. Then the sampling rate is $\sigma = \frac{12}{1200} = 0.01$ for random testing. Let $D$ be divided into two overlapping subdomains $D_1$ and $D_2$ with $d_1 = d_2 = 1100$, $m_1 = m_2 = 91$, and $n_1 = n_2 = 11$, so that exactly one failure-causing input lies in $D_1 \cap D_2$. Clearly,

$$E_s = 11 \times \frac{91}{1100} + 11 \times \frac{91}{1100} = 1.82 > E_r = 12 \times \frac{181}{1200} = 1.81,$$

whereas

$$P_s = 1 - \left(1 - \frac{91}{1100}\right)^{22} = 0.850 < P_r = 1 - \left(1 - \frac{181}{1200}\right)^{12} = 0.859.$$

Here $\sigma_1 = \sigma_2 = 0.01 = \sigma$, and $E_s > E_r$ but $P_s < P_r$. $\Box$
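The printed figures of this example can be recomputed directly. The subdomain parameters below ($d_1 = d_2 = 1100$, $m_1 = m_2 = 91$, $n_1 = n_2 = 11$) are reconstructed from the arithmetic shown in the example rather than quoted from the original table:

```python
# Recomputing Example 5.1 (subdomain values reconstructed, not quoted).
d, m, n = 1200, 181, 12
d1 = d2 = 1100
m1 = m2 = 91
n1 = n2 = 11

E_r = n * m / d                                      # 1.81
E_s = n1 * m1 / d1 + n2 * m2 / d2                    # 1.82
P_r = 1 - (1 - m / d) ** n                           # ~0.859
P_s = 1 - (1 - m1 / d1) ** n1 * (1 - m2 / d2) ** n2  # ~0.850

# Same sampling rate everywhere, yet the two measures disagree:
print(E_s > E_r, P_s < P_r)                          # → True True
```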

The next proposition concerns the second situation of proportional sampling, in which the sampling rate of the subdomains is less than that of the entire input domain. It shows that, in this situation, the performance of subdomain testing relative to random testing is solely determined by whether the overall failure rate of the intersection parts is higher than the failure rate of the entire input domain.

PROPOSITION 7. Consider overlapping subdomains among which there is at least one nonempty intersection. Assume $N = \sum_{i=1}^{k} n_i = n$ and $\sigma_1 = \cdots = \sigma_k$. Then $\sigma_i < \sigma$ for all $i$. Moreover, $E_s \ge E_r$ iff $\frac{\Delta m}{\Delta d} \ge \theta$, where $\Delta m = \sum_{i=1}^{k} m_i - m$ and $\Delta d = \sum_{i=1}^{k} d_i - d$.

PROOF. Let $D^* = \sum_{i=1}^{k} d_i$ and $M^* = \sum_{i=1}^{k} m_i$. Also, let $\sigma_s$ be the uniform rate of subdomain sampling, that is, $\sigma_s = \frac{n_1}{d_1} = \cdots = \frac{n_k}{d_k}$. Then $\sigma_s$ is also equal to $\frac{N}{D^*}$. Since $N = n$, we have $\sigma_s = \frac{n}{D^*}$. Clearly, $D^* > d$ and so $\sigma_s < \sigma$. Now,

$$E_s = \sum_{i=1}^{k} n_i \theta_i = \sigma_s \sum_{i=1}^{k} m_i = \sigma_s M^* = \frac{n M^*}{D^*}.$$

Therefore,

$$E_s - E_r = \frac{n M^*}{D^*} - \frac{n m}{d} = \frac{n}{D^* d}\left(M^* d - m D^*\right) = \frac{n}{D^* d}\left(d\,\Delta m - m\,\Delta d\right),$$

and hence $E_s \ge E_r$ iff $\frac{\Delta m}{\Delta d} \ge \theta$. $\Box$

Now we turn to investigate the effects of failure rates on the efficacy of subdomain testing for the general situation in which sampling rates are not necessarily uniform. First, we have a result analogous to Observation 7 of Weyuker and Jeng [17], which states that if $\theta_1 = \cdots = \theta_k$, then $\theta_i = \theta$ for all $i$ and $P_p = P_r$.

PROPOSITION 8. For disjoint subdomains, if $\theta_1 = \cdots = \theta_k$, then $\theta_i = \theta$ for all $i$ and $E_p = E_r$.

PROOF. Clearly, if $\theta_1 = \cdots = \theta_k$, then $\theta_i = \theta$ for all $i$. Thus,

$$E_p = \sum_{i=1}^{k} n_i \theta_i = \sum_{i=1}^{k} n_i \theta = \theta \sum_{i=1}^{k} n_i = n\theta = E_r. \qquad \Box$$

Obviously, under most circumstances, the failure rates of the resulting partitions will not be all equal. It should be possible, however, for a tester to choose the sampling rates. In such cases there are two opposite lines of thought. One might think that, to detect more failures, the more failure-prone subdomains should be tested more often. Another might think that fewer test cases would be needed for subdomains with higher failure rates, as their failures would be easier to detect, while more test cases would be needed to detect the failures in subdomains with lower failure rates. Our next few propositions address the effect of these two strategies of allocating test data for disjoint subdomains. To establish these results, we need the following two lemmas.

LEMMA 1. Suppose $k = 2$ and $D_1 \cap D_2 = \emptyset$. If $(\theta_1 - \theta_2)(\sigma_1 - \sigma_2) \ge 0$, then $E_p \ge E_r$, where $\theta_i$ and $\sigma_i$ are the failure rate and sampling rate of $D_i$, respectively, for $i = 1, 2$.

PROOF. Since $D_1 \cap D_2 = \emptyset$ and $k = 2$, we have $n = n_1 + n_2$, $m = m_1 + m_2$, and $d = d_1 + d_2$. Thus,

$$E_p - E_r = \left(\frac{m_1 n_1}{d_1} + \frac{m_2 n_2}{d_2}\right) - \frac{(m_1 + m_2)(n_1 + n_2)}{d_1 + d_2} = \frac{d_1 d_2 (\theta_1 - \theta_2)(\sigma_1 - \sigma_2)}{d_1 + d_2} \ge 0. \qquad \Box$$
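The closed form in this proof can be confirmed mechanically. The sketch below evaluates both sides of the identity for many randomly generated two-partition configurations:

```python
import random

# Verify: E_p - E_r = d1*d2*(th1 - th2)*(s1 - s2)/(d1 + d2)
# for random disjoint two-partition configurations.
random.seed(1)
max_err = 0.0
for _ in range(1000):
    d1, d2 = random.randint(1, 50), random.randint(1, 50)
    m1, m2 = random.randint(0, d1), random.randint(0, d2)
    n1, n2 = random.randint(1, d1), random.randint(1, d2)

    E_p = n1 * m1 / d1 + n2 * m2 / d2
    E_r = (n1 + n2) * (m1 + m2) / (d1 + d2)
    th1, th2 = m1 / d1, m2 / d2
    s1, s2 = n1 / d1, n2 / d2
    rhs = d1 * d2 * (th1 - th2) * (s1 - s2) / (d1 + d2)

    max_err = max(max_err, abs((E_p - E_r) - rhs))

print(max_err)   # numerically zero
```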

LEMMA 2. Let $D$ be partitioned into $k + 1$ mutually disjoint subdomains $D_1, \ldots, D_{k+1}$. Let $D' = \bigcup_{i=1}^{k} D_i$ with $\theta'$ and $\sigma'$ as its failure rate and sampling rate, respectively. If $\theta_i \ge \theta_{k+1}$ and $\sigma_i \ge \sigma_{k+1}$ for all $i = 1, \ldots, k$, then $(\theta' - \theta_{k+1})(\sigma' - \sigma_{k+1}) \ge 0$.

PROOF. Let m’ be the number of failure-causing inputs in D and d’ be the size of D’. Then m’ = xF=, mi and

d’ = CF=, di. By definition

m’ mk+l ef - ek+l = d’ - d k+l

and hence E, 2 E, iff s 2 8. 0 20 since Oi 2 Ok+l for i = 1, . . . . k.


Similarly, we can prove that

$$\sigma' - \sigma_{k+1} = \frac{1}{d'} \sum_{i=1}^{k} d_i (\sigma_i - \sigma_{k+1}) \ge 0.$$

Hence $(\theta' - \theta_{k+1})(\sigma' - \sigma_{k+1}) \ge 0$. $\Box$

We are now ready to present a condition for partition testing to work better than random testing.

PROPOSITION 9. For disjoint subdomains, if $(\theta_i - \theta_j)(\sigma_i - \sigma_j) \ge 0$ for all $i, j = 1, \ldots, k$, then $E_p \ge E_r$.

PROOF. This is done by induction on $k$.

Basis Step. $k = 2$. This follows immediately from Lemma 1.

Induction Step. Suppose the input domain $D$ is partitioned into $k + 1$ disjoint subdomains $D_1, \ldots, D_{k+1}$ with $(\theta_i - \theta_j)(\sigma_i - \sigma_j) \ge 0$ for all $i, j = 1, \ldots, k + 1$. Without loss of generality, assume that $D_{k+1}$ has the least failure rate, that is, $\theta_{k+1} \le \theta_i$ for $i = 1, \ldots, k$. Since $(\theta_i - \theta_{k+1})(\sigma_i - \sigma_{k+1}) \ge 0$, we must have $\sigma_{k+1} \le \sigma_i$, that is, $D_{k+1}$ must also have the least sampling rate. Let $D' = \bigcup_{i=1}^{k} D_i$ with $\theta'$ and $\sigma'$ as its failure rate and sampling rate, respectively. By Lemma 2,

$$(\theta' - \theta_{k+1})(\sigma' - \sigma_{k+1}) \ge 0.$$

Let $d'$, $m'$, and $n'$ denote the size, number of failure-causing inputs, and number of test cases selected from $D'$, respectively. Also, let

$$E_p' = \sum_{i=1}^{k} \frac{m_i n_i}{d_i} \quad \text{and} \quad E_r' = \frac{m' n'}{d'}.$$

By the induction assumption, we have $E_p' \ge E_r'$. Now,

$$E_p = E_p' + \frac{m_{k+1} n_{k+1}}{d_{k+1}} \ge \frac{m' n'}{d'} + \frac{m_{k+1} n_{k+1}}{d_{k+1}} \quad \text{since } E_p' \ge E_r'.$$

$D$ can be regarded as being partitioned into two disjoint subdomains $D'$ and $D_{k+1}$. Since we have proved that $(\theta' - \theta_{k+1})(\sigma' - \sigma_{k+1}) \ge 0$, by Lemma 1, we have

$$\frac{m' n'}{d'} + \frac{m_{k+1} n_{k+1}}{d_{k+1}} \ge \frac{(m' + m_{k+1})(n' + n_{k+1})}{d' + d_{k+1}} = E_r.$$

Hence $E_p \ge E_r$. $\Box$

Proposition 9 asserts that partition testing is better than random testing if sampling rates are higher for partitions with greater failure rates. This consolidates our common intuition that the more failure-prone partitions should be tested more often. It should be clear that, to make use of this proposition, the exact values of the failure rates and sampling rates need not be known. As long as the relative order of the failure rates can be estimated or known, the sampling rates can then be chosen to ensure that the required condition of this proposition is satisfied. Two simple ways of doing this are described as corollaries below.

COROLLARY 2. For disjoint subdomains, if there exist $\alpha, \beta \ge 0$ such that $n_i = \alpha m_i + \beta d_i$ for all $i$, then $E_p \ge E_r$.

COROLLARY 3. For disjoint subdomains, if the number of test cases selected is proportional to the number of failure-causing inputs, then $E_p \ge E_r$.

On the other hand, if the other strategy is taken, that is, the more failure-prone partitions are sampled less often, then partition testing will be less effective than random testing. This is stated in the next proposition, whose proof is omitted as it is very similar to that of Proposition 9.

PROPOSITION 10. For disjoint subdomains, if $(\theta_i - \theta_j)(\sigma_i - \sigma_j) \le 0$ for all $i, j = 1, \ldots, k$, then $E_p \le E_r$.

Rather surprisingly, Propositions 9 and 10 do not hold if $E_p$ is replaced by $E_s$, as shown by the following two examples. It appears that for overlapping subdomains the situation is much more complicated than is usually thought, and the intuitions on which the two propositions are based do not carry over to the overlapping case.

EXAMPLE 5.2. Let $D$ be divided into two overlapping subdomains $D_1$ and $D_2$ such that $d = 250$, $d_1 = 200$, and $d_2 = 100$. (Note that the size of $D_1 \cap D_2$ is $200 + 100 - 250 = 50$.) Assume that there are 25 failure-causing inputs and they are all in $D_2 \setminus D_1$. Then

$$\theta_1 = 0 < \theta_2 = \frac{25}{100} = 0.25 \quad \text{and} \quad \theta = \frac{25}{250} = 0.1.$$

Suppose $n = 24$. Then $E_r = 2.4$. If $n_1 = 15$ and $n_2 = 9$, then $\sigma_1 = \frac{15}{200} < \sigma_2 = \frac{9}{100}$, but $E_s = 2.25 < E_r$. Hence Proposition 9 does not hold for overlapping subdomains. $\Box$

EXAMPLE 5.3. Let $D$ be divided into two overlapping subdomains $D_1$ and $D_2$ such that $d = 250$, $d_1 = 200$, and $d_2 = 100$. (Note that the size of $D_1 \cap D_2$ is $200 + 100 - 250 = 50$.) Assume that there are 25 failure-causing inputs and they are all in $D_1 \cap D_2$. Then

$$\theta_1 = \frac{25}{200} = 0.125 < \theta_2 = \frac{25}{100} = 0.25 \quad \text{and} \quad \theta = \frac{25}{250} = 0.1.$$

Suppose $n = 6$. Then $E_r = 0.6$. If $n_1 = 5$ and $n_2 = 1$, then

$$\sigma_1 = \frac{5}{200} > \sigma_2 = \frac{1}{100},$$

but $E_s = 0.875 > E_r$. Hence Proposition 10 does not hold for overlapping subdomains. $\Box$
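Both counterexamples take only a few lines to recompute; note that for overlapping subdomains $E_s$ simply sums $n_i \theta_i$ over the (non-disjoint) subdomains:

```python
# Example 5.2: the 25 failure-causing inputs all lie in D2 \ D1.
th = 25 / 250                          # overall failure rate: 0.1
th1, th2 = 0.0, 25 / 100
E_r_52 = 24 * th                       # 2.4
E_s_52 = 15 * th1 + 9 * th2            # 2.25 < E_r: Proposition 9 fails

# Example 5.3: the 25 failure-causing inputs all lie in D1 ∩ D2.
th1, th2 = 25 / 200, 25 / 100
E_r_53 = 6 * th                        # 0.6
E_s_53 = 5 * th1 + 1 * th2             # 0.875 > E_r: Proposition 10 fails

print(E_s_52 < E_r_52, E_s_53 > E_r_53)   # → True True
```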

For obvious reasons, it would also be desirable to know the difference in the number of failures detected on average. When all subdomains are disjoint, this exact difference turns out to be expressible in terms of the failure rates, sampling rates, and partition sizes.


PROPOSITION 11. For disjoint subdomains, we have

$$E_p - E_r = \sum_{i=1}^{k} d_i (\theta_i - \theta)(\sigma_i - \sigma).$$

PROOF. Recall that since all subdomains are assumed to be disjoint, $\sum_{i=1}^{k} d_i = d$, $\sum_{i=1}^{k} m_i = m$, and $\sum_{i=1}^{k} n_i = n$. Also,

$$E_p = \sum_{i=1}^{k} n_i \theta_i = \sum_{i=1}^{k} d_i \theta_i \sigma_i,$$

and similarly, $E_r = d\theta\sigma$. Now,

$$\sum_{i=1}^{k} d_i (\theta_i - \theta)(\sigma_i - \sigma) = \sum_{i=1}^{k} d_i \theta_i \sigma_i - \sigma \sum_{i=1}^{k} d_i \theta_i - \theta \sum_{i=1}^{k} d_i \sigma_i + \theta \sigma \sum_{i=1}^{k} d_i$$
$$= E_p - m\sigma - n\theta + E_r = E_p - E_r,$$

since $m\sigma = n\theta = E_r$. $\Box$
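As with Lemma 1, the identity of Proposition 11 can be confirmed numerically over randomly generated disjoint partitions:

```python
import random

# Verify: E_p - E_r = sum_i d_i*(th_i - th)*(s_i - s) for disjoint partitions.
random.seed(7)
max_err = 0.0
for _ in range(500):
    k = random.randint(2, 6)
    ds = [random.randint(1, 40) for _ in range(k)]
    ms = [random.randint(0, di) for di in ds]
    ns = [random.randint(1, di) for di in ds]
    d, m, n = sum(ds), sum(ms), sum(ns)
    th, s = m / d, n / d

    E_p = sum(ni * mi / di for ni, mi, di in zip(ns, ms, ds))
    E_r = n * th
    rhs = sum(di * (mi / di - th) * (ni / di - s)
              for di, mi, ni in zip(ds, ms, ns))

    max_err = max(max_err, abs((E_p - E_r) - rhs))

print(max_err)   # numerically zero
```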

Proposition 11 is very useful. First, and obviously, when there are known or estimated values of the failure rates and sampling rates, we can compute the average difference in the number of failures detected by partition and random testing. Second, the following sufficient condition for partition testing to be better than random testing is immediate from Proposition 11.

PROPOSITION 12. For disjoint subdomains, if $(\theta_i - \theta)(\sigma_i - \sigma) \ge 0$ for all $i$, then $E_p \ge E_r$.

At first glance, the two conditions described in Propositions 9 and 12 look very similar. The first condition requires that the failure rates and sampling rates of all partitions be similarly ordered, that is, it is possible to rearrange the partitions in such a way that the failure rates and sampling rates are both in ascending order. The second condition, however, requires only that all partitions can be divided into two groups: one group having greater failure rates and sampling rates than those of the entire input domain, and another group having smaller failure rates and sampling rates. To use the first condition, we need to be able to arrange the partitions in ascending order of the failure rates. To use the second condition, we need only to know whether the failure rate of each partition is greater or smaller than that of the entire input domain. Moreover, as the next two examples show, it is possible to satisfy one condition and violate the other. Therefore, neither of the two conditions implies the other.

EXAMPLE 5.4. Let the domain $D$ be divided into three partitions $D_1$, $D_2$, and $D_3$ with $d = 34$, $m = n = 17$, and $\theta = \sigma = 0.5$. Clearly, for all $i$, $(\theta_i - \theta)(\sigma_i - \sigma) > 0$, and $E_p = 12 > E_r = 8.5$, but $(\theta_1 - \theta_2)(\sigma_1 - \sigma_2) < 0$. $\Box$

EXAMPLE 5.5. Let the domain $D$ be divided into three partitions $D_1$, $D_2$, and $D_3$ with $d = 200$, $m = 45$, $n = 34$, $\theta = 0.225$, and $\sigma = 0.17$. Clearly, for all $i, j$, $(\theta_i - \theta_j)(\sigma_i - \sigma_j) \ge 0$, and $E_p = 8.3 > E_r = 7.65$, but $(\theta_1 - \theta)(\sigma_1 - \sigma) < 0$. $\Box$

Also immediate from Proposition 11 is the following condition for random testing to be better than partition testing.

PROPOSITION 13. For disjoint subdomains, if $(\theta_i - \theta)(\sigma_i - \sigma) \le 0$ for all $i$, then $E_p \le E_r$.

Again, though Propositions 10 and 13 look similar, neither of them implies the other. Finally, the following proposition, albeit a direct consequence of the definition of the E-measure, sheds some light on the factors affecting subdomain testing, particularly when compared with random testing.

PROPOSITION 14. For the same total number of test cases, that is, $\sum_{i=1}^{k} n_i = n$,

$$E_s \ge E_r \quad \text{iff} \quad \sum_{i=1}^{k} n_i (\theta_i - \theta) \ge 0.$$

PROOF.

$$E_s - E_r = \sum_{i=1}^{k} n_i \theta_i - n\theta = \sum_{i=1}^{k} n_i \theta_i - \sum_{i=1}^{k} n_i \theta = \sum_{i=1}^{k} n_i (\theta_i - \theta),$$

and hence the proposition follows. $\Box$

Effectively, this result tells us that the crucial factor in the performance of subdomain testing relative to random testing is the aggregate of the differences between the failure rates of the subdomains and that of the entire domain, weighted by the number of test cases selected. Accordingly, a simple condition that guarantees $E_s > E_r$ is that $\theta_i \ge \theta$ for all $i$ and $\theta_j > \theta$ for some $j$. In other words, if we can divide the domain into subdomains all with failure rates not less than, and at least one greater than, that of the entire domain, subdomain testing will be better.

Note that such a subdivision is only possible if we allow overlapping of subdomains. For if all subdomains are disjoint, then $\sum_{i=1}^{k} m_i = m$, and hence the failure rates are related by the following equation:

$$\sum_{i=1}^{k} \frac{d_i}{d}\,\theta_i = \theta.$$

If we further require $\theta_i \ge \theta$ for all $i$ and $\theta_j > \theta$ for some $j$, we would have


$$\sum_{i=1}^{k} \frac{d_i}{d}\,\theta_i > \sum_{i=1}^{k} \frac{d_i}{d}\,\theta = \theta,$$

contradicting the previous equation.

Thus, we can see that allowing overlapping of subdomains opens up new possibilities that are hard to anticipate if we only attempt to generalize observations regarding disjoint subdomains.

6 RELATING THE E-MEASURE TO THE P-MEASURE

Although the E-measure and the P-measure are different metrics, they are both defined in terms of failure-causing inputs. Intuitively, more failure-causing inputs should generally lead to larger values of both measures. One may wonder whether, beyond this, there is any correlation between them. In this section, we explore such relations.

Our first two propositions in this section establish bounds on the values of the E-measure in terms of the corresponding values of the P-measure.

PROPOSITION 15. For any given program,

$$P_r \le E_r = n\left[1 - (1 - P_r)^{1/n}\right].$$

PROOF. Let $x_j$ denote the probability that exactly $j$ failures are found. Since at most $n$ failures can be detected, we have $P_r = x_1 + x_2 + \cdots + x_n$ and $E_r = x_1 + 2x_2 + \cdots + nx_n$. Clearly, $E_r \ge P_r$, as all the $x_j$'s are nonnegative. Finally,

$$E_r = n\left[1 - (1 - P_r)^{1/n}\right]$$

follows immediately from the definitions of $E_r$ and $P_r$. $\Box$

PROPOSITION 16. For any given program and any subdomain testing strategy, $P_s \le E_s \le N\left[1 - (1 - P_s)^{1/N}\right]$, where $N = \sum_{i=1}^{k} n_i$.

PROOF. The same argument used in the proof of $E_r \ge P_r$ applies to $E_s \ge P_s$ and is not repeated here.

For the other part of the inequality, consider the sequence of numbers $a_1, a_2, \ldots, a_N$ formed by repeating the number $(1 - \theta_i)$ $n_i$ times, for $i = 1, \ldots, k$. Formally, the numbers $a_j$ are defined as follows:

$$a_j = 1 - \theta_1 \quad \text{for } j = 1, \ldots, n_1;$$
$$a_{n_1 + \cdots + n_{i-1} + j} = 1 - \theta_i \quad \text{for } j = 1, \ldots, n_i;\ i = 2, \ldots, k.$$

Since $0 \le \theta_i \le 1$ for $i = 1, \ldots, k$, we have $0 \le a_j \le 1$ for $j = 1, \ldots, N$.

Let $G$ and $A$ be the geometric mean and the arithmetic mean of the numbers $a_j$, $j = 1, \ldots, N$. Then

$$G = \left(\prod_{j=1}^{N} a_j\right)^{1/N} = \left[\prod_{i=1}^{k} (1 - \theta_i)^{n_i}\right]^{1/N} = (1 - P_s)^{1/N},$$

$$A = \frac{1}{N} \sum_{j=1}^{N} a_j = \frac{1}{N} \sum_{i=1}^{k} n_i (1 - \theta_i) = \frac{1}{N} \sum_{i=1}^{k} n_i - \frac{1}{N} \sum_{i=1}^{k} n_i \theta_i = 1 - \frac{E_s}{N}.$$

Since it is well known that the arithmetic mean is never less than the geometric mean,

$$1 - \frac{E_s}{N} \ge (1 - P_s)^{1/N},$$

that is,

$$E_s \le N\left[1 - (1 - P_s)^{1/N}\right].$$

Here equality holds if and only if all the $a_j$'s are equal, that is, if and only if $\theta_i$ is independent of $i$. This completes the proof. $\Box$

In this paper, we mainly compare subdomain testing with random testing. It turns out that, in this context, a very important relation between the relative values of the two measures exists, as stated in the next proposition.

PROPOSITION 17. Suppose $\sum_{i=1}^{k} n_i = n$. If $E_s \ge E_r$, then $P_s \ge P_r$, and equality holds if and only if $\theta_i = \theta$ for all $i$.

PROOF. Suppose $N = n$. It follows from Proposition 16 that $E_s \le n\left[1 - (1 - P_s)^{1/n}\right]$. If, further, $E_s \ge E_r$, then since $E_r = n\theta$, we have $n\theta \le n\left[1 - (1 - P_s)^{1/n}\right]$, which gives $P_s \ge 1 - (1 - \theta)^n = P_r$ on rearrangement. Clearly, equality holds if and only if

$$E_s = N\left[1 - (1 - P_s)^{1/N}\right] \tag{1}$$

and

$$E_s = n\theta. \tag{2}$$

From the proof of Proposition 16, (1) holds if and only if $\theta_i$ is independent of $i$. Combining this with (2), the proposition follows. $\Box$

Effectively, Proposition 17 tells us that if a subdomain testing strategy is better than random testing according to the E-measure, then it must also be better according to the P-measure. This enables us to derive, from our corresponding results on the E-measure, several conditions for subdomain testing to have a P-measure greater than or equal to that of random testing, as follows.

PROPOSITION 18. (c.f. Proposition 9) For disjoint subdomains, if $(\theta_i - \theta_j)(\sigma_i - \sigma_j) \ge 0$ for all $i, j = 1, \ldots, k$, then $P_p \ge P_r$.

PROPOSITION 19. (c.f. Proposition 12) For disjoint subdomains, if $(\theta_i - \theta)(\sigma_i - \sigma) \ge 0$ for all $i$, then $P_p \ge P_r$.

PROPOSITION 20. (c.f. Proposition 14) If $\sum_{i=1}^{k} n_i (\theta_i - \theta) \ge 0$, then $P_s \ge P_r$.

PROPOSITION 21. (c.f. Proposition 7) Assume $N = \sum_{i=1}^{k} n_i = n$ and $\sigma_1 = \cdots = \sigma_k$. Then $\sigma_i < \sigma$ for all $i$. Moreover, if $\frac{\Delta m}{\Delta d} \ge \theta$, where $\Delta m = \sum_{i=1}^{k} m_i - m$ and $\Delta d = \sum_{i=1}^{k} d_i - d$, then $P_s \ge P_r$.

The proofs of Propositions 18-21 are all omitted, as each is simply a direct consequence of Proposition 17 and the corresponding proposition involving the E-measure.

Thus, in view of the simplicity of its mathematical formula, the E-measure is also very useful in simplifying analyses involving the P-measure. For instance, the main result of Chen and Yu (Proposition 3 in [2]), that proportional sampling implies $P_s \ge P_r$, becomes an immediate consequence of Proposition 5, whose proof is much simpler. Similarly, a result obtained by Chen and Yu (Theorem 1 in


[3]), which is equivalent to Proposition 18, becomes an immediate consequence of Proposition 9.

Following is an immediate corollary of Proposition 17.

COROLLARY 4. Suppose $\sum_{i=1}^{k} n_i = n$. If $P_s < P_r$, then $E_s < E_r$.

The converse of Corollary 4 is, however, not true, as can be seen from the following example.

EXAMPLE 6.1. Consider an input domain $D$ such that $d = 100$, $m = 5$, and $n = 2$. Suppose that $D$ is partitioned into two disjoint subdomains with $d_1 = 51$, $d_2 = 49$, $m_1 = 5$, $m_2 = 0$, and $n_1 = n_2 = 1$. Then $E_p = 0.098 < 0.1 = E_r$, but $P_p = 0.098 > 0.0975 = P_r$. $\Box$

Moreover, as illustrated in the following example, the condition $\sum_{i=1}^{k} n_i = n$ is essential in Proposition 17.

EXAMPLE 6.2. Consider the input domain $D$, which is partitioned into two disjoint subdomains with $d = 100$, $d_1 = 80$, $d_2 = 20$, $m = 10$, $m_1 = 9$, and $m_2 = 1$.

1) If $n = 6$, $n_1 = 4$, and $n_2 = 3$, then $E_s = 0.6 = E_r$, but $P_s = 0.468 < 0.469 = P_r$.

2) If $n = 10$, $n_1 = 1$, and $n_2 = 18$, then $E_s = 1.01 > 1.00 = E_r$, but $P_s = 0.647 < 0.651 = P_r$. $\Box$
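Both parts of Example 6.2 can be recomputed directly, confirming that the hypothesis $\sum_i n_i = n$ cannot be dropped from Proposition 17:

```python
# Example 6.2: two disjoint subdomains.
d, d1, d2 = 100, 80, 20
m, m1, m2 = 10, 9, 1
th, th1, th2 = m / d, m1 / d1, m2 / d2

def measures(n, n1, n2):
    """E- and P-measures of subdomain testing (n1, n2) vs. random testing (n)."""
    E_s = n1 * th1 + n2 * th2
    E_r = n * th
    P_s = 1 - (1 - th1) ** n1 * (1 - th2) ** n2
    P_r = 1 - (1 - th) ** n
    return E_s, E_r, P_s, P_r

# 1) N = 7 != n = 6: E_s = E_r = 0.6, yet P_s = 0.468 < 0.469 = P_r.
E_s1, E_r1, P_s1, P_r1 = measures(6, 4, 3)

# 2) N = 19 != n = 10: E_s = 1.0125 > 1.0 = E_r, yet P_s = 0.647 < 0.651 = P_r.
E_s2, E_r2, P_s2, P_r2 = measures(10, 1, 18)

print(round(P_s1, 3), round(P_r1, 3), round(P_s2, 3), round(P_r2, 3))
# → 0.468 0.469 0.647 0.651
```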

A natural question to ask is whether Proposition 17 can be extended to the comparison of two general subdomain testing strategies. In other words, if, for the same program, a particular subdomain testing strategy $S_1$ is expected to detect more failures on average than another strategy $S_2$, does $S_1$ also have a greater chance of detecting at least one failure? The following example shows that the answer is negative.

EXAMPLE 6.3. Consider the input domain $D$ such that $d = 15$ and $m = 8$. Suppose that strategy $S_1$ divides the input domain into two disjoint partitions of sizes 12 and 3, with 7 and 1 failure-causing inputs, respectively. Another strategy $S_2$ divides the input domain into two disjoint partitions of sizes 10 and 5, with 7 and 1 failure-causing inputs, respectively. Assume also that both strategies require one test case from each partition. Then the values of the E-measure of $S_1$ and $S_2$ are 0.917 and 0.9, respectively, whereas those of the P-measure are 0.722 and 0.76, respectively. Thus, in this example, $E_1 > E_2$ but $P_1 < P_2$. $\Box$

Finally, under certain circumstances, the E-measure can actually be used as an approximation of the P-measure. We state this result in the next two propositions.

PROPOSITION 22. When the program's failure rate $\theta$ is small, such that terms involving powers of $\theta$ equal to or higher than 2 can be ignored, $E_r \approx P_r$.

PROOF. Using the binomial expansion,

$$E_r - P_r = n\theta - \left[1 - (1 - \theta)^n\right] = n\theta - 1 + \left[1 - n\theta + O(\theta^2)\right] = O(\theta^2),$$

from which the proposition follows. $\Box$
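The $O(\theta^2)$ error term is visible numerically: with $n$ fixed, halving $\theta$ roughly quarters $E_r - P_r$, since the leading term of the difference is $\binom{n}{2}\theta^2$:

```python
# E_r - P_r = n*th - (1 - (1 - th)**n) shrinks quadratically in th.
n = 10
diffs = [n * th - (1 - (1 - th) ** n) for th in (0.02, 0.01, 0.005)]
ratios = [diffs[i] / diffs[i + 1] for i in range(2)]
print(diffs, ratios)   # each ratio is close to 4
```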

It follows from Proposition 22 that even in situations where the P-measure is more appropriate, the E-measure, which has a much simpler formula, can at least be used as a first approximation. Although the assumption that $\theta$ is small is needed, this should not be too restrictive in actual practice. For when the failure rate is high, the failures in the program will be relatively easy to detect, and both random and subdomain testing should be relatively effective. The difference in performance would then be much less significant than in the case when the failure rate is low.

For subdomain testing, a similar result holds when all subdomains have small failure rates.

PROPOSITION 23. If all subdomains of a program have small failure rates, such that terms involving the product of any two or more of the failure rates can be neglected, then $E_s \approx P_s$.

PROOF.

$$P_s = 1 - \prod_{i=1}^{k} (1 - \theta_i)^{n_i}$$
$$\approx 1 - \prod_{i=1}^{k} (1 - n_i \theta_i) \quad \text{neglecting all terms of } O(\theta_i^2)$$
$$= 1 - \left[1 - \sum_{i=1}^{k} n_i \theta_i + \sum_{i \ne j} O(\theta_i \theta_j)\right]$$
$$\approx \sum_{i=1}^{k} n_i \theta_i \quad \text{neglecting all terms of } O(\theta_i \theta_j)$$
$$= E_s. \qquad \Box$$

Note that while the failure rate of the entire input domain may be small, this may not be true for the subdomains. Indeed, as discussed before, subdomain testing performs best when some subdomains do have high failure rates. In these cases $E_s$ may not approximate $P_s$ well.

7 SUMMARY AND CONCLUSIONS

In this paper, we have extended previous analytical work [2], [3], [17] on subdomain testing and random testing in two ways. First, we make use of the E-measure to quantify the effectiveness of a testing strategy. Second, in addition to the special case of disjoint subdomains, we have also considered the more realistic case of overlapping subdomains. Furthermore, we have discovered important relations between the E-measure and the P-measure, as well as new characterizations of subdomain testing using the P-measure.

Our analysis has been based on the same mathematical model used previously in [2], [3], [17] and described in Section 4. Naturally, the validity of the derived results and conclusions is limited by the underlying simplifying assumptions that have been made for mathematical tractability.

We have identified the best and worst cases of subdomain testing, observed that a wider range of performance is possible if subdomains are allowed to overlap, and investigated the effects of proportional sampling. We have also obtained several conditions for partition testing to be at least as effective as random testing. For instance, this is true if partitions having failure rates greater (smaller) than that of the entire input domain also possess higher (lower) corresponding sampling rates (Proposition 12).


For the overlapping case, we notice that the crucial factor in the performance of subdomain testing relative to random testing is the aggregate of the differences between the subdomain failure rates and the overall failure rate, weighted by the number of test cases selected from each subdomain. Thus, unlike the disjoint case, it is possible that all subdomain failure rates are higher than the overall failure rate, in which case subdomain testing is clearly better than random testing.

Finally, we find that a subdomain testing strategy that is better than random testing according to the E-measure must also be better according to the P-measure. With this, we derive corresponding characterizations of subdomain testing in terms of the P-measure. Generally speaking, however, for the same program two subdomain testing strategies may be ranked differently by the two measures (see Example 6.3). Nevertheless, when all failure rates involved are small, the E-measure can serve as a first approximation to the P-measure. In view of the success in obtaining various characterizations of subdomain and random testing, we expect that further analysis using the E-measure should be fruitful in identifying when subdomain testing strategies are worth the effort.

ACKNOWLEDGMENT

The authors would like to thank F. T. Chan and H. Leung for their invaluable discussions and comments.

REFERENCES

[1] T.Y. Chen and J.Q. Mao, "On data flow testing strategy," Proc. Second Int'l Conf. Software Quality Management, pp. 299-309, July 1994.

[2] T.Y. Chen and Y.T. Yu, "On the relationship between partition and random testing," IEEE Trans. Software Engineering, vol. 20, pp. 977-980, Dec. 1994.

[3] T.Y. Chen and Y.T. Yu, "A more general sufficient condition for partition testing to be better than random testing," to appear in Information Processing Letters.

[4] R.H. Cobb and H.D. Mills, "Engineering software under statistical quality control," IEEE Software, pp. 44-54, Nov. 1990.

[5] J.W. Duran and S.C. Ntafos, "An evaluation of random testing," IEEE Trans. Software Engineering, vol. 10, pp. 438-444, July 1984.

[6] P.G. Frankl and E.J. Weyuker, "A formal analysis of the fault-detecting ability of testing methods," IEEE Trans. Software Engineering, vol. 19, pp. 202-213, Mar. 1993.

[7] P.G. Frankl and E.J. Weyuker, "Provable improvements on branch testing," IEEE Trans. Software Engineering, vol. 19, pp. 962-975, Oct. 1993.

[8] E. Girard and J.-C. Rault, "A programming technique for software reliability," Proc. IEEE Symp. Computer Software Reliability, 1973.

[9] R. Hamlet and R. Taylor, "Partition testing does not inspire confidence," IEEE Trans. Software Engineering, vol. 16, pp. 1402-1411, Dec. 1990.

[10] I.J. Hayes, "Specification directed module testing," IEEE Trans. Software Engineering, vol. 12, pp. 124-133, Jan. 1986.

[11] T. Higashino and G. v. Bochmann, "Automated analysis and test case derivation for a restricted class of LOTOS expressions with data parameters," IEEE Trans. Software Engineering, vol. 20, pp. 29-42, Jan. 1994.

[12] J.W. Laski and B. Korel, "A data flow oriented program testing strategy," IEEE Trans. Software Engineering, vol. 9, pp. 347-354, May 1983.

[13] G. Luo, A. Das, and G. v. Bochmann, "Software testing based on SDL specifications with save," IEEE Trans. Software Engineering, vol. 20, pp. 72-87, Jan. 1994.

[14] G. Myers, The Art of Software Testing. New York: John Wiley & Sons, 1979.

[15] S. Rapps and E.J. Weyuker, "Selecting software test data using data flow information," IEEE Trans. Software Engineering, vol. 11, pp. 367-375, Apr. 1985.

[16] I. Sommerville, Software Engineering, fourth edition. Addison-Wesley, 1992.

[17] E.J. Weyuker and B. Jeng, "Analyzing partition testing strategies," IEEE Trans. Software Engineering, vol. 17, pp. 703-711, July 1991.

[18] L.J. White and E.I. Cohen, "A domain strategy for computer program testing," IEEE Trans. Software Engineering, vol. 6, pp. 247-257, May 1980.

T.Y. Chen received the BSc and MPhil degrees in physics from the University of Hong Kong, the MSc degree and DIC in computer science from the Imperial College of Science and Technology, and the PhD degree in computer science from the University of Melbourne. Dr. Chen is currently a senior lecturer in the Department of Computer Science, University of Melbourne. His main research interests include program testing, software engineering, fixpoint theory, and logic programming.

Y.T. Yu received the BSc degree in mathematics with first class honors from the University of Hong Kong, and the graduate diploma degree in computing studies from the University of Melbourne. He is currently a PhD student in the Department of Computer Science, University of Melbourne. His current research interests include program testing and software engineering, particularly software specification and design methodologies, software reliability, and complexity measures.