
Precondition Satisfaction by Smart Object Selection in Random Testing

Master Thesis

By: Serge Gebhardt Supervised by: Yi Wei Prof. Dr. Bertrand Meyer

Student Number: 02-920-148


Abstract

A random testing strategy for object-oriented software constructs test runs by repeatedly performing the following three tasks: 1) randomly select a method under test (MUT); 2) randomly select or construct target or argument objects to feed to the chosen method; 3) invoke the test case.

Usually all the objects created for or returned by a MUT are stored in an object pool for reuse by future test cases. When testing contract-equipped software it becomes difficult for a random testing strategy to select input objects that satisfy the MUT's preconditions. The generated test cases often fail the preconditions and thus some methods are rarely tested (if at all).

We propose an improvement to random testing through a smarter selection of objects in order to satisfy more preconditions. All predicates appearing in the classes under test are collected into a pool and evaluated on the objects involved in each test case invocation. Predicates are mapped to the object combinations in the pool that satisfy them, and these combinations can then be selected as input upon a test case invocation in order to satisfy the MUT's preconditions.

Our results show that the improved strategy does indeed test more methods, especially where the original approach was failing. However, we could not observe a stable increase in the number of found faults.


Acknowledgments

My gratitude goes to my supervisor Yi Wei for his continuous support, interesting discussions, and valuable feedback, to Prof. Bertrand Meyer for giving me the opportunity to work on this interesting topic, and to my family who supported me during my whole time at ETH.


Contents

1 Introduction

2 Original AutoTest: The or-Strategy
  2.1 Overview
  2.2 Object Preparation
  2.3 Contract-Based Testing
  2.4 Difficulty With Strong Preconditions

3 Precondition Satisfaction By Smart Object Selection: The ps-Strategy
  3.1 Overview
  3.2 Smart Object Selection
  3.3 Predicate Valuation Pool Building
  3.4 Optimizations

4 Implementation
  4.1 User Interface And Settings
  4.2 Linear Constraint Solver: lpsolve

5 Evaluation
  5.1 Test Setup
  5.2 Testing More Features
  5.3 Finding More Faults
  5.4 Performance

6 Conclusion
  6.1 Conclusion
  6.2 Further Improvements
  6.3 Related Work

A Additional Results
  A.1 Correlation graphs
  A.2 Faults and success rate by time


Chapter 1

Introduction

It is universally accepted that testing is becoming increasingly important when developing software systems. However, opinions diverge when it comes to the best strategy.

The testing strategy can be manual or automated. With a manual strategy, the more traditional approach, testers prepare test suites that they think will best exercise the program. An automated testing strategy tries to remove the tediousness of this process by relying on a software tool that generates test cases from the program's specification (black box) or its actual text (white box).

Previous research work at the Chair of Software Engineering[2] has led to the development of a tool called AutoTest[1]. It is, at its core, a fully automated testing framework that produces systematic tests from contracts of object-oriented programs (Eiffel[17] in particular). It implements the concept of push-button testing: provided a set of classes and a time frame, AutoTest automatically generates and runs test cases on the public methods in those classes. Once finished it returns the test results for each method.

EiffelStudio[6], the Eiffel IDE by Eiffel Software[3], ships with an integrated testing tool. Since version 6.3[5] AutoTest is included in the standard distribution.

AutoTest relies on the contracts[18] in Eiffel classes and interprets contract violations as faults. Since AutoTest cannot check any functionality that is not specified in the contracts, the quality of the test results is highly dependent on the quality (correctness and completeness) of the contracts.

While AutoTest is already a great tool to automatically generate test cases for contract-equipped classes, it suffers from a key limitation: it has difficulty testing features with strong preconditions. Yet only test cases that pass the preconditions have a chance of discovering faults.

The original AutoTest strategy generates random input objects to use as target and arguments in test case invocations. This will inherently fail most of the time for features with strong preconditions, wasting a lot of resources running mostly useless test cases.


We therefore propose an improvement to the original strategy through a smarter selection of objects in order to satisfy more preconditions. The key concept is to choose candidates with a high chance of satisfying a feature's preconditions, thereby generating fewer invalid test cases and testing more features. Intuition dictates that running more valid test cases on more features will ultimately uncover more faults.

Summed up, the improved strategy keeps track of an object's state and can then propose candidate object combinations that in all likelihood will satisfy a feature's preconditions. Our goals for the improved strategy are:

1. Generate more valid and fewer invalid test cases.

2. Test more features.

3. Find more faults.

4. Run faster (ideally) or minimally slower (worst case).

This thesis is organized as follows: Chapter 2 explains in detail the original strategy and exposes its major weakness. The proposed improvement is presented in Chapter 3, and its implementation is then detailed in Chapter 4. The results of a thorough performance evaluation are revealed in Chapter 5. Finally, Chapter 6 concludes this thesis by synthesizing the results, listing possible future improvements, and showing related work.


Chapter 2

Original AutoTest: The or-Strategy

2.1 Overview

This chapter illustrates the original functionality of AutoTest, which we will refer to as the or-strategy or the or-mode. During runtime it repeatedly performs the following four steps (see Figure 2.1):

1. Select the next method under test.

2. Prepare the target and argument objects to feed to the method.

3. Invoke the selected method using the prepared objects.

4. Determine failure/success of test case.

Method selection. In order to test all methods evenly, AutoTest maintains the number of times each method has been tested. During method selection it randomly chooses one of the least tested methods as the next method under test.

Object preparation. Before invoking the selected method AutoTest needs to either create new objects or re-use already created objects. To do so it distinguishes between two cases: basic types and reference types. The preparation of objects is handled in more detail in Section 2.2.

Method invocation. Once AutoTest has selected the method under test and prepared the objects, it invokes the method, feeding the objects as target and arguments. The result of the execution, possible exceptions and its branch coverage information are recorded for later use.


[Figure 2.1: Overview of the or-strategy. Decision diagram: R0 AutoTest launch; R1 select method under test; R2 prepare target and argument objects, branching on the object type (R2.1): for basic types (R2.1.1), generate a random value or select a predefined value depending on the preparation mode (probabilities PGenBasic and 1 - PGenBasic); for reference types (R2.1.2), generate a new object or re-use an existing object (probabilities PGenRef and 1 - PGenRef); R2.2 return object; R3 invoke method using prepared objects; R4 assess pass/fail.]


Determine failure/success of test case. AutoTest needs an oracle in order to decide if a test case failed or succeeded. Being an automatic testing tool it cannot rely on manual specifications by the user. It works by leveraging the information included in contract-equipped classes. A more thorough explanation of contract-based testing is given in Section 2.3.

Even though the or-strategy seems well-suited for the majority of classes, it suffers from a key limitation: it has difficulty testing methods with strong preconditions. AutoTest will run a lot of test cases without ever passing the preconditions, thereby generating many invalid test cases and wasting resources. This limitation is exposed in Section 2.4.

2.2 Object Preparation

For each test case invocation AutoTest feeds randomly generated target and argument objects to the method under test. It distinguishes between the generation of basic (primitive) types and the generation of reference types.

Whenever an object of a basic type is required, AutoTest either generates a random value within the type's boundaries (with probability PGenBasic), or selects a value from the type's predefined value set (with probability 1 - PGenBasic). For instance INTEGER has the value set {{INTEGER_32}.min_value, -100, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, {INTEGER_32}.max_value}.

The possibility to choose from a set of predefined values is crucial, as in most programs some input values are known to be more important than others. However, choosing from predefined values partitions the testing space and biases the purely random nature of the strategy in favor of better performance. An adequate value for PGenBasic is 0.75, meaning that predefined values are chosen with probability 0.25. Experiments have shown that biasing randomness with this probability goes without loss of generality (see below).

Whenever an object of a reference type is required (this is the case at least for the target object of the method under test, but also for any argument of reference type), AutoTest either creates a new instance of a conforming type (with probability PGenRef), or randomly selects an existing object (with probability 1 - PGenRef).

AutoTest relies on creation procedures for the generation of new objects (as opposed to first allocating the memory for the object and then setting up the necessary memory bits). Creating a new instance of a reference type is therefore equivalent to testing one of the type's creation procedures, and the iterative steps of method selection, object preparation and test case invocation are applied recursively.

The random selection of existing objects is easy because AutoTest stores every created object inside the object pool. This pool is filled and diversified as testing proceeds. After the method under test terminates, the objects used for its invocation are returned to the pool, possibly in a modified state.
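The object preparation logic can be summarized by the following sketch. It is a simplified, language-neutral illustration in Python rather than AutoTest's actual Eiffel code; the names prepare_basic_int, prepare_reference and object_pool are ours, and the probability parameters stand for PGenBasic and PGenRef as discussed above.

import random

# Predefined value set for INTEGER (see above); the extremes of INTEGER_32 at the ends.
PREDEFINED_INT = [-(2**31), -100, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1,
                  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 2**31 - 1]

object_pool = []  # every object created for or returned by a test case


def prepare_basic_int(p_gen_basic):
    # Basic type: random value within the type's bounds with probability
    # p_gen_basic, otherwise a value from the predefined set.
    if random.random() < p_gen_basic:
        return random.randint(-(2**31), 2**31 - 1)
    return random.choice(PREDEFINED_INT)


def prepare_reference(p_gen_ref, create_new):
    # Reference type: a brand-new instance with probability p_gen_ref
    # (equivalent to testing one of the type's creation procedures),
    # otherwise a re-used object from the pool, possibly in a modified state.
    if not object_pool or random.random() < p_gen_ref:
        obj = create_new()
        object_pool.append(obj)
        return obj
    return random.choice(object_pool)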

The iterative process of choosing an existing object, invoking a method on it and returning the possibly modified object to the pool creates a very diverse set of potential input objects. The more diverse the set of potential input objects, the higher the chances of finding faults. The re-use of existing objects is therefore crucial for uncovering more faults.

The decision diagram of the object preparation process is shown as [R2] in Figure 2.1.

The probabilities PGenBasic and PGenRef are both fixed to 0.25 in our experiments, as prior work[15] showed that these settings yield the best results.

2.3 Contract-Based Testing

Completely automated testing of a program is only possible 1) if the software specifications are readily available; and 2) if the conformance of the implementation can be checked automatically. Contracts fulfill both these requirements: they are integrated with the source code and contain the intended semantics of the software. Hence contracts can be used as an oracle (the part of the testing system that decides if a test case has passed or failed). Since AutoTest cannot check any functionality that is not specified in the contracts, the quality of the test results is highly dependent on the quality (correctness and completeness) of the contracts.

Based on this insight, testing a method becomes a succession of the following steps:

1. Prepare the target and argument objects.

2. If the method’s preconditions are satisfied, execute its body.

3. If a contract is violated (in the body of the method under test, or further down the call chain) we have found a fault. If no such violation occurs, the test case has passed.

Example. Suppose we want to test method sqrt shown in Listing 2.1. In order to invoke it we must first prepare a target object (of reference type MATH_EXAMPLE) and an argument object (of basic type REAL). Assume the strategy decided to create a new target object and to randomly select a value within the boundaries of REAL for the argument object. If the random generator provides value -1 for argument x, the precondition is violated and the method's body cannot be executed. This is referred to as an invalid test case. If however the random generator returns value 9 for argument x, the precondition is satisfied (it is a valid test case) and we execute the method's body. Once we pass the precondition any contract violation corresponds to a fault, regardless of whether it occurred inside sqrt or in another method called by it.

AutoTest performs all these operations under the hood. A user only needs to specify MATH_EXAMPLE as the class under test, and the methods of the class will be tested in the manner illustrated by the example above.


Listing 2.1: Eiffel: Example of contract-based testing.

class MATH_EXAMPLE
    ...
    sqrt (x: REAL): REAL is
        require
            x >= 0
        do
            -- Implementation
            ...
        ensure
            abs (Result * Result - x) <= epsilon
        end
    ...
end
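The classification of a test case can be sketched as follows. This is an illustrative Python outline of the oracle logic only, not AutoTest code; PreconditionViolation and ContractViolation are hypothetical exception types standing in for the runtime's assertion-violation signals.

class PreconditionViolation(Exception):
    # The precondition of the method under test fails on entry: invalid test case.
    pass

class ContractViolation(Exception):
    # Any other assertion (postcondition, class invariant, or a precondition
    # further down the call chain) is violated during execution: a fault.
    pass


def run_test_case(method_under_test, target, args):
    # Returns "invalid", "fault" or "pass" for a single test case.
    try:
        method_under_test(target, *args)
    except PreconditionViolation:
        return "invalid"
    except ContractViolation:
        return "fault"
    return "pass"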

2.4 Difficulty With Strong Preconditions

A key limitation of the traditional AutoTest strategy is its difficulty in testing features with strong preconditions. A purely random selection of target and argument objects has a small chance of satisfying a strong precondition. Testing will inherently fail most of the time, thereby generating a lot of invalid test cases and wasting resources.

In order to clarify the meaning of a strong precondition let us examine a few examples. Listing 2.2 shows the weakest possible precondition because True is always satisfied. A method equipped with the weakest precondition is equivalent to one without any precondition (of course assuming the rest of the method is unchanged). Therefore the method can always be tested.

The method in Listing 2.3 has a weak precondition: a new, randomly created object of type ARRAY is usually empty and satisfies the precondition.

Listing 2.2: Eiffel: Weakest precondition.

class PRECONDITION_EXAMPLE
    ...
    weakest_precondition is
        require
            True
        do
            ...
        end
    ...
end

Listing 2.3: Eiffel: Weak precondition.

class PRECONDITION_EXAMPLE
    ...
    weak_precondition (a: ARRAY) is
        require
            a.is_empty
        do
            ...
        end
    ...
end


Listing 2.4: Eiffel: Strong precondition.

class PRECONDITION_EXAMPLE
    ...
    strong_precondition (a, b: INTEGER; c: ARRAY) is
        require
            a >= 0 and b >= 10
            c.count = 2
        do
            ...
        end
    ...
end

Listing 2.5: Eiffel: Strongest precondition.

class PRECONDITION_EXAMPLE
    ...
    strongest_precondition is
        require
            False
        do
            ...
        end
    ...
end

A strong precondition is shown in Listing 2.4. Each of the three assertions must hold in order for the precondition to be satisfied. The chances of success are very slim with arguments being prepared as explained in Section 2.2.

Listing 2.5 shows the strongest possible precondition because False is never satisfied. Therefore the method can never be tested.

Usually the strength of preconditions increases with the quality of the contracts. AutoTest relies on high-quality contracts to gain knowledge of the software's specifications and leverages this knowledge to find faults (recall that faults are exhibited by contract violations). So classes of interest to AutoTest often have strong preconditions.


Chapter 3

Precondition Satisfaction By Smart Object Selection: The ps-Strategy

3.1 Overview

An analysis of the object pool revealed that in many cases object combinations satisfying a given precondition do exist, but that frequently the or-strategy randomly chooses a non-satisfying object combination instead. For strong preconditions the set of satisfying object combinations is small compared to the size of the object pool, so or's chances of selecting objects from that set are slim. Hence the or-strategy often fails to generate a valid test case and many methods remain untested (as detailed in Section 2.4).

A solution to this limitation must focus on maximizing the probability that the selected object combination is satisfying, ideally without any performance decrease over the original approach. We propose an improvement of the or-strategy centered around a smarter object selection mechanism in order to satisfy preconditions with a higher probability. It will be referred to as the ps-strategy or the ps-mode (as opposed to the or-strategy or the or-mode).

This chapter details our proposed improvement to AutoTest. It enhances the process of object preparation, while all the other steps of the original AutoTest algorithm remain unchanged.

3.2 Smart Object Selection

In order to select objects from the object pool that satisfy the precondition of the next method under test, a predicate valuation pool is built on top of the object pool. For every precondition predicate appearing in the classes under test (CUT), the predicate valuation pool stores which object combinations in the object pool satisfy that predicate. When a feature with a precondition is to be tested, objects that the predicate valuation pool indicates satisfy the feature's precondition are selected to perform the call.

From our experience, preconditions containing linear constraints are especially difficult to satisfy with a random testing strategy, because the chance that a randomly selected integer satisfies the constraints is very small. In order to make the algorithm more effective, we use the lpsolve linear programming solver for preconditions with linear constraints.

Listing 3.1: Eiffel: Feature with preconditions.

class FOO
    ...
    bar (a, b: INTEGER) is
        require
            p1: a >= 0
            p2: b >= 2 and b <= count
            p3: b < a + min_index
        do
            ...
        end
    ...
end

Listing 3.2: Lpsolve: Proof obligation.

min: a_arg_2;

/* FOO.bar */
a_arg_1 >= 0;
a_arg_2 >= 2;
a_arg_2 <= 43;
a_arg_2 < a_arg_1 + 3;

-512 <= a_arg_1 <= 512;
-32768 <= a_arg_2 <= 32767;

a_arg_1 = 10;
/* placeholder_a_arg_2 */

int a_arg_1;
int a_arg_2;

For a linearly constrained integer variable appearing in precondition assertions, lpsolve gives a minimal and a maximal value (if any exist). These two extreme values define a range of possible values for that integer variable, and AutoTest then randomly chooses a value from this range. The existence of a minimal and a maximal value for a model does not imply that every value between them is also valid for that model, but we decided to treat the range as if it were. If AutoTest chooses a value that is in fact invalid for the model, the result is simply a precondition violation for the feature under test, which is quite normal in random testing and hence acceptable.

We introduced two biases when choosing an integer value from the range defined by an lpsolve solution, by selecting the following values with higher probability (these probabilities can be set through command line options to AutoTest):

• Values that are also in the predefined set for integers are selected with a probability of 0.25, because our previous work (citation) showed that when values from the predefined set are used with probability 0.25, AutoTest finds the most faults.

• Border values are selected with a probability of 0.125, because partition testing studies (citations) showed that border values are more likely to reveal faults. We do not know whether 0.125 is the best probability; we will try other values in future work.



3.3 Predicate Valuation Pool Building

After every successful test case execution, precondition predicates whose signature conforms to the relevant objects involved in the execution (target, possibly arguments and the returned object) are evaluated, and their evaluation results are kept in the predicate valuation pool. Put another way, the predicate valuation pool keeps a snapshot of which predicates hold on which objects at a certain time. As the test process goes on, the objects in the object pool may get changed because AutoTest reuses existing objects for new method calls. As a consequence, the information stored in the predicate valuation pool may become stale, meaning that it indicates that certain objects satisfy a predicate when they no longer do. Because we do not know the set of objects affected by the last method call, the only way to keep the valuation pool sound would be to re-evaluate the whole pool, which is too expensive for practical use. Instead, we always assume that the predicate valuation pool is sound, and use the object combinations suggested by the pool as input to the test case for a method with preconditions. Only if the test case ends with a precondition violation is the corresponding entry in the predicate valuation pool updated; that is, the pool is corrected in a post-mortem fashion. As long as the success rate of the suggested objects actually satisfying the desired precondition is high enough, the algorithm remains effective.
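The bookkeeping just described can be sketched as follows (a Python illustration with assumed names; the actual pool is part of AutoTest's Eiffel implementation):

class PredicateValuationPool:
    # Maps each precondition predicate to the object combinations currently
    # believed to satisfy it; assumed sound and corrected post mortem.

    def __init__(self):
        self._satisfying = {}  # predicate -> set of object-combination tuples

    def record(self, predicate, evaluate, combination):
        # After a successful test case, evaluate the predicate on a conforming
        # object combination and remember a positive result.
        if evaluate(predicate, combination):
            self._satisfying.setdefault(predicate, set()).add(combination)

    def candidates(self, predicates):
        # Object combinations believed to satisfy every precondition predicate
        # of the next method under test.
        sets = [self._satisfying.get(p, set()) for p in predicates]
        return set.intersection(*sets) if sets else set()

    def correct(self, predicate, combination):
        # Post-mortem correction: the suggested combination actually violated
        # the precondition, so the stale entry is dropped.
        self._satisfying.get(predicate, set()).discard(combination)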

3.4 Optimizations

Searching the predicate valuation pool for objects satisfying certain preconditions takes time, and the size of the predicate valuation pool grows during the testing process. This means the longer the test run, the longer it takes to find a suitable object combination satisfying the desired precondition. The situation becomes worse when there are linear constraints in the precondition because, as described before, a file needs to be generated for each model to solve. For some classes with many preconditions containing linear constraints, always enforcing precondition satisfaction can slow down the testing by 80%, which means much less time (than in original random testing) is spent in actual testing. Although methods with preconditions then have better chances of being tested, the overall effectiveness of the test process decreases, reflected in the fact that far fewer faults are found within the same period of testing.

One way to reduce the overhead due to precondition satisfaction is to reduce the number of times the precondition satisfaction search is performed. This means that for a method with a precondition, the search is sometimes skipped and AutoTest resorts to the original way of randomly selecting objects. Our previous work showed that AutoTest can test many methods with preconditions, albeit with relatively low probability, so turning off precondition satisfaction from time to time should not endanger the whole testing process. We introduced a probability function to decide when to turn on precondition satisfaction for a feature f:

\[ P_{ps} = 1 - \frac{T_{\mathrm{last\ test\ time}}}{T_{\mathrm{duration}}} \times \mathit{Fact} \qquad (3.1) \]


where T_last_test_time is the number of seconds, relative to the start of the current test run, at which f was last tested (0 if f has never been tested), T_duration is the length in seconds of the test run so far, and Fact is a factor in the range 0 to 1. In the experiments Fact is set to 0.8, so that AutoTest generates roughly the same number of test cases during the same period with and without the precondition satisfaction algorithm turned on (the reason is explained below); this lets us compare the effectiveness of the testing strategies both in time and in number of test cases.

This function makes sure that the precondition satisfaction algorithm is turned on for a precondition-equipped method f evenly throughout the test run: if f has not been tested for a long time, the fraction part of the formula becomes very small, because T_duration keeps increasing as testing goes on while T_last_test_time stays the same; as a result, the overall probability of the precondition satisfaction algorithm being turned on goes up. If f has been tested quite recently, the probability goes down because the fraction part of the formula will be large.
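A minimal sketch of this decision, assuming times are measured in seconds from the start of the test run (the function name and the guard for an empty run are ours):

import random

FACT = 0.8  # factor used in our experiments


def use_precondition_satisfaction(t_last_test_time, t_duration, fact=FACT):
    # Implements Equation (3.1): P_ps = 1 - (t_last_test_time / t_duration) * fact.
    # A feature that has never been tested gets probability 1; a feature tested
    # very recently gets roughly 1 - fact.
    if t_duration <= 0:
        return True  # nothing has run yet, so always use precondition satisfaction
    p_ps = 1.0 - (t_last_test_time / t_duration) * fact
    return random.random() < p_ps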

The benefits of this probability function are threefold: 1) our experiments showed that it reduces the overhead dramatically; 2) precondition-equipped methods are tested quite often, which is a nice property because AutoTest does not guarantee that every time a method is tested it is tested in a different state, so the more often a feature is tested, the more likely it is to be tested in a different state; 3) precondition-equipped methods are tested throughout the whole test run, which is also a nice property because as testing continues the object pool becomes more diversified, in the sense that there are more and more objects (possibly with different states) in the object pool, so a method tested late in the test run may be tested in different states.

Another optimization we introduced concerns linear constraint solving, the most time-consuming part of the precondition satisfaction algorithm. Every time lpsolve solves a model, the model as well as its solution is cached. The next time, before sending a model to lpsolve, the algorithm first consults the cache to see whether the same model has already been solved; if so, the expensive solving work is skipped.
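A sketch of this cache, keyed here on the textual lpsolve model (the key choice is our assumption; the thesis only states that the model and its solution are cached):

solution_cache = {}  # lpsolve model text -> previously computed solution


def solve_with_cache(model_text, solve):
    # Return the cached solution if this exact model was solved before,
    # otherwise call the expensive lpsolve-based solver and cache the result.
    if model_text not in solution_cache:
        solution_cache[model_text] = solve(model_text)
    return solution_cache[model_text]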

With these two optimizations combined there was almost no overhead anymore (0.5% on average). For some classes AutoTest even generates more test cases in the same period of time.


Chapter 4

Implementation

4.1 User Interface And Settings

We tried to minimize interference with the original AutoTest and wanted the ability to switch back and forth between both strategies. We therefore decided to control the ps-strategy with switches and options through the AutoTest command line interface.

The original AutoTest already had some options to control its behavior, so the infrastructure for interfacing with the ps-strategy already existed. The ps-strategy hooks in by adding new parsing instructions to the AUTO_TEST_COMMAND_LINE_PARSER class, which is responsible for reading the command line. The new switches and options relevant to the ps-strategy can be found in Table 4.1.

Table 4.1: Command line switches and options for the ps-strategy.

Major Options

--state <arg>
    Enables object state monitoring. The argument is a comma-separated list of name=value pairs. The supported parameters are target=true/false, argument=true/false and result=true/false.

-p / --precondition
    Enable precondition evaluation before feature call.

--cs
    Enable linear constraint solving for integers.

--linear-constraint-solver <arg>
    Set the linear constraint solver to be used with --cs. May be either smt, lpsolve or a comma-separated list of both. In the case of both, lpsolve takes precedence and falls back to smt on failure. Defaults to smt.

Performance Parameters

--use-random-cursor
    Enables randomized searching in the predicate pool. Default is to search linearly.

--max-precondition-tries <arg>
    Maximum number of tries to search for a satisfying object combination. 0 means unlimited. Defaults to 1 if --use-random-cursor is specified, to 100 otherwise.

--max-precondition-time <arg>
    Maximum time spent on searching for a satisfying object combination. 0 means unlimited.

--max-candidates <arg>
    Maximum number of satisfying candidates. 0 means unlimited. Defaults to 100.

--ps-selection-rate <arg>
    Threshold under which to enable smart object selection for precondition satisfaction.

--integer-bounds <arg>
    Lower and upper bounds for integer arguments to be solved by a linear constraint solver. The argument has the form lower,upper. Defaults to -512,512.

The first part of Table 4.1 shows the major switches of the ps-strategy. They are listed in increasing order, in the sense that a given switch requires all the ones above it to be specified as well.

The second part of the table serves only for performance tuning and analysis. The ps-strategy will run fine with the defaults.

Examples. Assuming a project defined in file project.ecf with target project, a typical run of AutoTest using the or-strategy is invoked like this:

Listing 4.1: Shell: A typical invocation of AutoTest in or-mode.

> ec -config project.ecf -target project -auto_test -t 5 ARRAY LINKED_LIST

while a typical run for ps is invoked like this:

Listing 4.2: Shell: A typical invocation of AutoTest in ps-mode.

> ec -config project.ecf -target project -auto_test -p --cs --use-random-cursor -t 5 ARRAY LINKED_LIST

4.2 Linear Constraint Solver: lpsolve

Given the complexity and error-proneness of developing our own linear constraint solver, we decided to use existing (free) software. Such solvers are well-tested by a large user community and in most cases already highly optimized. We settled on Z3[14] and lpsolve[11].

Z3 cannot compute minimum and maximum solutions for a given proof obligation, but only returns a single possible solution (which often happens to be the minimum). In contrast, lpsolve can compute minimum and maximum solutions, though only for a single variable. In addition it has a smaller set of operators than Z3; for example the not operator is missing.

To take the best of both worlds we decided to make the linear constraint solver selectable by command line options (see Section 4.1), with the possibility to select both at the same time. In that case lpsolve takes precedence, but on failure (illegal file format) AutoTest falls back on Z3. This can especially happen when preconditions include operators not understood by lpsolve.

Another reason for making lpsolve the solver of choice is the list of supported platforms. While Z3 is only available for Windows, lpsolve can be run on any platform that EiffelStudio supports.

Yet another reason lies in the fact that Z3 is only available as a command line executable, which means a lot of overhead is incurred by spawning a separate process for every proof obligation.

In contrast, lpsolve can be used in a variety of ways: as a graphical environment, a command line executable, or a library for many programming languages. The statically linked C library is the variant we target, as it integrates nicely into the Eiffel build process (Eiffel translates to C and is then compiled). We can thus make lpsolve a part of our EiffelStudio executable (remember, AutoTest is a part of EiffelStudio) and gain a significant speed-up compared to using the solver as a standalone executable.

Let us now look further into the implementation of linear constraint solving using lpsolve. (It should be noted at this point that the author did not have any exposure to the interfacing with Z3 and its implementation; therefore Z3 will not be handled in detail in this thesis.) Figure 4.1 shows an overview of the process. We will explain the process by means of the example in Listing 4.3.

Listing 4.3: Eiffel: Preconditions to solve with lpsolve.

class FOO
    ...
    bar (a, b: INTEGER) is
        require
            p1: a >= 0
            p2: b >= 2 and b <= count
            p3: b < a + min_index
        do
            ...
        end
    ...
end

We start by generating a proof obligation (steps [S1] and [G1-G5] in the diagram), as shown in Listing 4.4.


[Figure 4.1: Linear constraint solving using lpsolve. AUT_LP_BASED_LINEAR_CONSTRAINT_SOLVER: S1 generate proof obligation; S2 replace constraining queries with actual values; S3 loop over constrained operands (S3.1 solve lower bound, S3.2 solve upper bound, S3.3 create abstract integer, S3.4 get concrete integer, S3.5 insert concrete integer into list of valuations, S3.6 replace bound variable placeholders with valuations); S4 return list of valuations. AUT_LPSOLVE_CONSTRAINT_SOLVER_GENERATOR: G1 write feature comment; G2 write constraints; G3 write boundaries (1% {INTEGER_16} boundaries, 99% CLI options); G4 write bound variable placeholders; G5 write variable declarations.]


Let us go through it line by line:

1: The name of the feature the proof obligation was generated from. [G1]

2: The constraint generated from precondition p1. The argument variable a became the lpsolve variable a_arg_1; the number in the new variable name is derived from its position in the list of arguments. [G2]

3-4: The constraints generated from precondition p2. By default constraints are ANDed by lpsolve, so we simply split conjunctions into multiple lines. The query count has not yet been replaced by its actual value. The argument variable b became the lpsolve variable a_arg_2. [G2]

5: The constraint generated from precondition p3. Again the query min_index has not yet been replaced by its actual value. [G2]

7-8: The boundaries for each variable. With probability 0.01 we choose the boundaries of INTEGER_16 (as with a_arg_2), with probability 0.99 those passed in command line options (see --integer-bounds in Table 4.1, as with a_arg_1). [G3]

10-11: The placeholders for already bound variables. They will be replaced in step [S3.6] in the case of multi-variable preconditions. These additional constraints are necessary to compute more precise minimum and maximum values. [G4]

13-14: The variable declarations used by lpsolve. [G5]

Once we have the initial proof obligation we replace the queries found in the constraints with their actual values (step [S2]). In Listing 4.5, count and min_index were replaced by 43 and 3 respectively, assuming these are the actual values of the queries.

Listing 4.4: Lpsolve: Initial proof obligation for the preconditions of feature bar (Listing 4.3).

/* FOO.bar */
a_arg_1 >= 0;
a_arg_2 >= 2;
a_arg_2 <= count;
a_arg_2 < a_arg_1 + min_index;

-512 <= a_arg_1 <= 512;
-32768 <= a_arg_2 <= 32767;

/* placeholder_a_arg_1 */
/* placeholder_a_arg_2 */

int a_arg_1;
int a_arg_2;

Listing 4.5: Lpsolve: Proof obligation after replacement of constraining queries by actual values.

 1  /* FOO.bar */
 2  a_arg_1 >= 0;
 3  a_arg_2 >= 2;
 4  a_arg_2 <= 43;
 5  a_arg_2 < a_arg_1 + 3;
 6
 7  -512 <= a_arg_1 <= 512;
 8  -32768 <= a_arg_2 <= 32767;
 9
10  /* placeholder_a_arg_1 */
11  /* placeholder_a_arg_2 */
12
13  int a_arg_1;
14  int a_arg_2;

The proof obligation is now ready for solving. We iterate over all variables and solve each for its minimum and maximum value (steps [S3.1-S3.2]) through calls to the low-level C library of lpsolve. The corresponding lpsolve proof obligations for solving variable a_arg_1 are shown in Listings 4.6 and 4.7; note the min and max objectives on line 1.

Listing 4.6: Lpsolve: Proof obligation with min objective.

min: a_arg_1;

/* FOO.bar */
a_arg_1 >= 0;
a_arg_2 >= 2;
a_arg_2 <= 43;
a_arg_2 < a_arg_1 + 3;

-512 <= a_arg_1 <= 512;
-32768 <= a_arg_2 <= 32767;

/* placeholder_a_arg_1 */
/* placeholder_a_arg_2 */

int a_arg_1;
int a_arg_2;

Listing 4.7: Lpsolve: Proof obligation with max objective.

 1  max: a_arg_1;
 2
 3  /* FOO.bar */
 4  a_arg_1 >= 0;
 5  a_arg_2 >= 2;
 6  a_arg_2 <= 43;
 7  a_arg_2 < a_arg_1 + 3;
 8
 9  -512 <= a_arg_1 <= 512;
10  -32768 <= a_arg_2 <= 32767;
11
12  /* placeholder_a_arg_1 */
13  /* placeholder_a_arg_2 */
14
15  int a_arg_1;
16  int a_arg_2;

Following the example, lpsolve would compute a min value of 0 and a max value of 512 for variable a_arg_1. We then create an abstract integer of type AUT_ABSTRACT_INTEGER (step [S3.3]). In fact an abstract integer is nothing more than a range with a lower and an upper bound, but it encapsulates the functionality to semi-randomly generate a concrete integer inside that range (step [S3.4]). The task of generating the concrete integer is detailed below.

Let's assume that in this particular example the generated concrete integer is 10. It becomes the actual value for the variable a_arg_1 and we store this information in the list of valuations (step [S3.5]).

In order to correctly solve the min and max objectives for the other variables in the proof obligation, we need to let lpsolve know the values of already bound variables. This is done by replacing the afore-mentioned placeholders (see step [G4]) with direct value assignments (step [S3.6]). The updated proof obligation is shown in Listing 4.8.


Listing 4.8: Lpsolve: Proof obligation after placeholder replacement.

/* FOO.bar */
a_arg_1 >= 0;
a_arg_2 >= 2;
a_arg_2 <= 43;
a_arg_2 < a_arg_1 + 3;

-512 <= a_arg_1 <= 512;
-32768 <= a_arg_2 <= 32767;

a_arg_1 = 10;
/* placeholder_a_arg_2 */

int a_arg_1;
int a_arg_2;

Listing 4.9: Lpsolve: Proof obligation with min objective on the second iteration.

 1  min: a_arg_2;
 2
 3  /* FOO.bar */
 4  a_arg_1 >= 0;
 5  a_arg_2 >= 2;
 6  a_arg_2 <= 43;
 7  a_arg_2 < a_arg_1 + 3;
 8
 9  -512 <= a_arg_1 <= 512;
10  -32768 <= a_arg_2 <= 32767;
11
12  a_arg_1 = 10;
13  /* placeholder_a_arg_2 */
14
15  int a_arg_1;
16  int a_arg_2;

At this point the loop iteration for variable a_arg_1 is finished and we repeat the process with the second (still unbound) variable a_arg_2. Listing 4.9 shows the proof obligation with the min objective for variable a_arg_2. For completeness of the example let us note that lpsolve computes a min value of 2 and a max value of 13 for variable a_arg_2. The corresponding actual value can be anything in that range and will again be generated semi-randomly by the abstract integer.

After iterating through the constrained operands there are no more unbound variables. The mapping between variable names and actual values is stored in the list of valuations. We are left with returning that list, and have successfully solved the linear constraints in the preconditions of feature FOO.bar (step [S4]).
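The loop over the constrained operands (steps [S3.1] to [S3.6]) can be summarized by the following sketch. It shows the control flow only, in Python; solve_bound stands for the calls into the lpsolve C library and pick_concrete for the abstract-integer generation of Figure 4.2 (both names are ours).

def solve_linear_constraints(model, variables, solve_bound, pick_concrete):
    # model         : proof obligation text after query replacement (Listing 4.5)
    # variables     : constrained operand names, e.g. ["a_arg_1", "a_arg_2"]
    # solve_bound   : solve_bound(model, var, "min" | "max") -> int
    # pick_concrete : pick_concrete(lower, upper) -> int in [lower, upper]
    valuations = {}
    for var in variables:
        lower = solve_bound(model, var, "min")   # [S3.1]
        upper = solve_bound(model, var, "max")   # [S3.2]
        value = pick_concrete(lower, upper)      # [S3.3], [S3.4]
        valuations[var] = value                  # [S3.5]
        # [S3.6] fix the bound variable so the remaining bounds are computed
        # with respect to this concrete value.
        model = model.replace("/* placeholder_%s */" % var,
                              "%s = %d;" % (var, value))
    return valuations                            # [S4]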

Abstract Integers. Figure 4.2 shows the generation process of a concrete integer, based on the abstract integer with boundaries.


[Figure 4.2: Generation of a concrete integer from the abstract integer. AUT_ABSTRACT_INTEGER: A0 creation with lower and upper bound; A1 random_element; A2 choose from predefined values? 25% yes: A2.1 choose a random item from the predefined values inside the abstract range; 75% no: A2.2 choose from the range (12.5% lower bound, 12.5% upper bound, 75% random item); A3 return concrete integer.]
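The decision diagram of Figure 4.2 corresponds roughly to the following sketch (a Python illustration of AUT_ABSTRACT_INTEGER's behaviour; the predefined set is the one given in Section 2.2, and the percentages are those shown in the figure):

import random

PREDEFINED_INT = [-(2**31), -100, -10, -9, -8, -7, -6, -5, -4, -3, -2, -1,
                  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100, 2**31 - 1]


def concrete_integer(lower, upper):
    # Semi-randomly pick a concrete integer from the abstract range [lower, upper].
    in_range = [v for v in PREDEFINED_INT if lower <= v <= upper]
    if in_range and random.random() < 0.25:
        # 25%: a random item from the predefined values inside the range.
        return random.choice(in_range)
    # 75%: choose from the range itself, biased towards the borders.
    r = random.random()
    if r < 0.125:
        return lower
    if r < 0.25:
        return upper
    return random.randint(lower, upper)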


Chapter 5

Evaluation

5.1 Test Setup

We compared both strategies on a set of 55 classes from EiffelBase[4] (see Table 5.1 for the complete list). The choice was by no means random, as these classes were already evaluated in a previous paper[19] involving testing. Our particular interest in testing those exact same classes stems from the insight that where there are bugs, more bugs are expected. Furthermore we already knew these classes, making it easier to interpret the results.

We ran the tests on development snapshots of the classes from ELKS[7] revision 356[8] and from Gobo[9] revision 6661[10]. These are fairly recent snapshots and of course free of any artificially injected bugs. This clearly shows that, regardless of which strategy performs best, both find faults in production-level software.

Table 5.1: The 55 classes chosen for evaluation.

ACTIVE LIST, ARRAY, ARRAYED CIRCULAR, ARRAYED LIST, ARRAYED QUEUE, ARRAYED SET, ARRAYED TREE, BINARY TREE, BOUNDED QUEUE, COMPACT CURSOR TREE, DS ARRAYED LIST, DS AVL TREE, DS BILINKED LIST, DS BINARY SEARCH TREE, DS BINARY SEARCH TREE SET, DS HASH SET, DS LEFT LEANING RED BLACK TREE, DS LINKED LIST, DS LINKED QUEUE, DS LINKED STACK, DS MULTIARRAYED HASH SET, DS MULTIARRAYED HASH TABLE, DS RED BLACK TREE, FIXED DFA, FIXED TREE, HIGH BUILDER, KL STRING, LEXICAL, LEX BUILDER, LINKED AUTOMATON, LINKED CIRCULAR, LINKED CURSOR TREE, LINKED DFA, LINKED LIST, LINKED PRIORITY QUEUE, LINKED SET, LINKED TREE, LX DFA, LX LEX SCANNER, LX NFA, LX PROTO QUEUE, LX REGEXP PARSER, LX REGEXP SCANNER, LX SYMBOL CLASS, LX TEMPLATE LIST, MULTI ARRAY LIST, PART SORTED SET, PART SORTED TWO WAY LIST, SORTED TWO WAY LIST, SUBSET STRATEGY TREE, TWO WAY CIRCULAR, TWO WAY CURSOR TREE, TWO WAY LIST, TWO WAY SORTED SET, TWO WAY TREE

We ran AutoTest twice on each of the chosen classes, once using the original strategy (or-mode) and once using the one enhanced with precondition satisfaction (ps-mode). Each combination was tested 30 times for 1 hour. We spread the testing load over 9 dedicated machines, each with an Intel Pentium 4 CPU at 3 GHz and 1 GB of RAM running Red Hat Enterprise Linux 5.3.

This adds up to 15 days of continuous testing in order to generate the log files used in this analysis. In order to have sound results we launched this 15-day test session from a fresh start for every modification done to the source code of either the or- or the ps-strategy.

5.2 Testing More Features

The ps-strategy's primary goal is to test more methods. The main reason the original approach fails to test a given feature is its strong preconditions: AutoTest tried to test the feature but selected target and argument objects that never passed the preconditions. The chance of a randomly selected object combination satisfying a precondition declines steeply with increasing constraints.

By choosing objects in a smarter way, hence by going from a purely randomized to a smarter object selection, we stand a good chance of satisfying more preconditions. Thereby we can hope to effectively test more features.

Since all precondition-free methods (the trivial True precondition is equivalent to the absence of a precondition) are already fully testable by the original strategy, we restrict the evaluation to the precondition-equipped ones.

To further partition the method space we introduce the notion of feature hardness. The hardness of a feature f is given as follows:

\[ H(f) := \frac{N_{\mathrm{Invalid\ TC}}(f)}{N_{\mathrm{Total\ TC}}(f)} \qquad (5.1) \]

where N_Invalid_TC(f) is the number of invalid test cases generated for feature f, and N_Total_TC(f) is the total number of generated test cases for f. We recall that a test case is said to be invalid if it fails to satisfy the preconditions of the method under test.



[Figure 5.1: Class distribution by increase in number of tested methods. Histogram of the number of classes against the percental increase in the number of tested precondition-equipped features.]

We define the set of hard-to-test features (or simply hard features) as those having a hardness score greater than a fixed threshold. For this evaluation we set the threshold to 0.9, meaning that a method is hard to test if at least 90% of the test cases generated for it by AutoTest are invalid.

Though intuitively clear, it should nevertheless be stated explicitly that every hard feature has at least one non-trivial precondition. The set of hard features is therefore a subset of the precondition-equipped features.
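As a small illustration of definition (5.1) and of the threshold used here (the counter arguments are hypothetical; the real numbers come from the AutoTest logs):

def hardness(n_invalid_tc, n_total_tc):
    # H(f) = N_Invalid_TC(f) / N_Total_TC(f), Equation (5.1).
    return n_invalid_tc / n_total_tc


def is_hard(n_invalid_tc, n_total_tc, threshold=0.9):
    # A feature is hard to test if at least 90% of the test cases
    # generated for it by the or-strategy were invalid.
    return n_total_tc > 0 and hardness(n_invalid_tc, n_total_tc) >= threshold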

The evaluation of the tested methods will follow a top-down approach and begin with the more general set of precondition-equipped features. At this point we are only interested in whether we can test more methods in general. Once this is decided we shift our focus to the hard features and assess the quantitative question.

Precondition-equipped features. The set of total faults we could potentially discover grows with the test coverage of the method space. The ability to test more features is therefore of primary importance to the ps-strategy.

Figure 5.1 shows the ps-strategy's performance in testing more methods. The height of the bars corresponds to the number of classes, and the x-axis represents the percental increase in the number of features that could be newly tested. The exact numbers for each class can be found in Table 5.2.


Table 5.2: Test coverage of the method space, sorted by the percental increase.

Class name                        # [a]   or % [b]   ps % [c]   incr % [d]
LEX BUILDER                       228     35.09      49.12      40.00
LX PROTO QUEUE                    155     40.65      56.77      39.68
HIGH BUILDER                      231     36.36      50.65      39.29
LX TEMPLATE LIST                  166     43.37      55.42      27.78
LX SYMBOL CLASS                   138     48.55      60.14      23.88
FIXED TREE                        40      52.50      65.00      23.81
DS HASH SET                       101     44.55      53.47      20.00
LINKED TREE                       96      63.54      76.04      19.67
LX NFA                            226     53.54      63.27      18.18
DS LEFT LEANING RED BLACK TREE    109     37.61      44.04      17.07
DS LINKED LIST                    190     62.11      72.63      16.95
DS BILINKED LIST                  187     61.50      71.66      16.52
DS MULTIARRAYED HASH SET          100     46.00      53.00      15.22
LX DFA                            125     60.00      68.80      14.67
ACTIVE LIST                       57      66.67      75.44      13.16
DS BINARY SEARCH TREE             95      41.05      46.32      12.82
DS ARRAYED LIST                   185     63.78      71.89      12.71
DS RED BLACK TREE                 105     38.10      42.86      12.50
DS LINKED QUEUE                   39      41.03      46.15      12.50
MULTI ARRAY LIST                  53      64.15      71.70      11.76
TWO WAY TREE                      99      63.64      70.71      11.11
LINKED LIST                       74      52.70      58.11      10.26
DS MULTIARRAYED HASH TABLE        99      61.62      67.68      9.84
LINKED SET                        61      70.49      77.05      9.30
LX LEX SCANNER                    100     43.00      47.00      9.30
TWO WAY CIRCULAR                  54      66.67      72.22      8.33
LINKED CIRCULAR                   54      66.67      72.22      8.33
LINKED DFA                        166     52.41      56.63      8.05
ARRAYED TREE                      74      67.57      72.97      8.00
PART SORTED SET                   56      44.64      48.21      8.00
PART SORTED TWO WAY LIST          82      46.34      50.00      7.89
LX REGEXP PARSER                  109     39.45      42.20      6.98
KL STRING                         100     76.00      81.00      6.58
TWO WAY LIST                      92      53.26      56.52      6.12
LINKED CURSOR TREE                57      63.16      66.67      5.56
COMPACT CURSOR TREE               57      63.16      66.67      5.56
LINKED AUTOMATON                  62      61.29      64.52      5.26
LX REGEXP SCANNER                 97      40.21      42.27      5.13
SORTED TWO WAY LIST               86      50.00      52.33      4.65
TWO WAY SORTED SET                91      52.75      54.95      4.17
FIXED DFA                         180     66.67      68.89      3.33
TWO WAY CURSOR TREE               57      63.16      64.91      2.78
ARRAYED CIRCULAR                  54      70.37      72.22      2.63
ARRAYED LIST                      65      72.31      72.31      0.00
ARRAYED SET                       31      80.65      80.65      0.00
DS BINARY SEARCH TREE SET         75      74.67      74.67      0.00
DS AVL TREE                       105     46.67      46.67      0.00
LEXICAL                           23      39.13      39.13      0.00
BINARY TREE                       38      63.16      63.16      0.00
BOUNDED QUEUE                     21      61.90      61.90      0.00
ARRAYED QUEUE                     19      68.42      68.42      0.00
LINKED PRIORITY QUEUE             49      40.82      40.82      0.00
SUBSET STRATEGY TREE              68      38.24      36.76      -3.85
ARRAY                             56      76.79      69.64      -9.30
DS LINKED STACK                   40      52.50      47.50      -9.52

[a] Number of precondition-equipped methods in class.
[b] Test coverage of method space by the or-strategy (in %).
[c] Test coverage of method space by the ps-strategy (in %).
[d] Percental increase in test coverage between both strategies.


The large majority of classes (43 out of 55, 78.2% of the classes under test) is located on the positive side of the 0% mark, which is very encouraging. ps succeeded in testing many new methods and is thus increasing the chances of discovering new faults.

Only 3 out of 55 classes reveal a decrease in the number of tested methods. There is no need for concern with under 5.5% of the classes under test falling into this category, yet we are still short of an explanation and will focus on it in a future analysis.

An increase of 0% indicates that ps did not succeed in testing more methods, but did not miss those already covered by or either. 9 out of 55 classes (16.4% of the classes under test) fall into this category. The hypothesis that for some of these classes or already succeeded in testing all the methods (total test coverage of the method space), and that therefore ps cannot possibly improve coverage, is proven wrong by the numbers in Table 5.2. The exact cause of this observation will be investigated in future work.

Hard features. Intuition dictates that the chance of discovering faults increases with the number of times a method has been tested (of course assuming different inputs for each test case). The number of valid test cases is therefore a relevant measure for evaluating the performance of the ps-strategy. Even more so if we focus only on hard features (the hardness threshold is set to 0.9, meaning the or-strategy generates invalid test cases for a hard method at least 90% of the time).

Figure 5.2 shows the ps-strategy's performance in generating valid test cases for hard features. Each bar represents a hard method. The blue part corresponds to the number of valid test cases generated for that method by the or-strategy, and the red part corresponds to those generated by the ps-strategy.

There are approximately 950 hard features in the 55 classes under test; however, the or-strategy generates valid test cases for only about 550 of them (58%). In other words, for 400 methods (42%) the ps-strategy succeeded in generating at least one valid test case where or failed.

Comparing the bigger red area with the smaller blue one clearly demonstrates ps's ability to generate a lot more valid test cases than the or-strategy. However, it should be noted that the figure shows only the hard-to-test methods, which make up a minority of all methods in the 55 classes under test. We cannot extrapolate the shown results to all methods in general, as the ps-strategy's advantage at satisfying strong preconditions is lost on methods of lower hardness.

5.3 Finding More Faults

The ps-strategy's ultimate goal is to find more faults than the traditional or-strategy. Following on the findings of Section 5.2 we can formulate the following conjecture: the ps-strategy can test more methods and generate far more valid test cases for hard-to-test methods than the or-strategy, therefore it should also find more faults.

² The hardness threshold is set to 0.9, meaning the or-strategy generates invalid test cases for a hard method at least 90% of the time.

Figure 5.2: Number of valid test cases for hard-to-test features. [Bar chart omitted: x-axis: hard-to-test feature; y-axis: number of valid test cases; series: or (blue), ps (red).]

Figure 5.3: Discovery distribution of distinct faults for the classes under test. [Stacked bar chart omitted: x-axis: class under test; y-axis: discovery distribution of distinct faults (0 to 1); series: both, or, ps.]

Moreover, since objects are returned to the pool after a test case invocation, testing new methods diversifies the collection of available objects and thereby widens the choice of states for input objects. This could lead to the discovery of more faults, even in methods already easily tested by the or-strategy.

Figure 5.3 shows the discovery distribution of distinct faults for the classes under test. Each bar represents a class. The blue part corresponds to faults found by both strategies, the green part to faults found only by or, and the red part to faults found only by ps.
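The three parts of each bar can be derived from the two sets of distinct faults reported per class. The following Eiffel sketch (hypothetical names, not AutoTest code) shows one way to obtain the counts, assuming each fault is identified by a string and appears at most once in each list:

class FAULT_OVERLAP
		-- Hypothetical helper partitioning two fault sets as in Figure 5.3.

feature -- Partition

	found_by_both (or_faults, ps_faults: LIST [STRING]): INTEGER
			-- Number of distinct faults uncovered by both strategies (blue part).
		local
			i: INTEGER
		do
			from i := 1 until i > or_faults.count loop
				if ps_faults.has (or_faults.i_th (i)) then
					Result := Result + 1
				end
				i := i + 1
			end
		end

	found_only_by (first, second: LIST [STRING]): INTEGER
			-- Number of faults uncovered by `first' but missed by `second'
			-- (green part when `first' holds or's faults, red part when it holds ps's).
		local
			i: INTEGER
		do
			from i := 1 until i > first.count loop
				if not second.has (first.i_th (i)) then
					Result := Result + 1
				end
				i := i + 1
			end
		end

end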

The higher the blue bars, the bigger the overlap between the faults found by or and the ones found by ps. Ideally the green parts are very small or even non-existent, meaning that the ps-strategy uncovers all the faults found by or and, if there is a red part in the bar, even discovers new faults. This is the case in 15 out of 55 classes (27.3% of the classes under test).

As long as the red part of a bar is bigger than the green part, the ps-strategy failed to uncover some of the faults found by or, but on balance it discovers more new faults and thus performs better. This is quite often the case.

A red part matching the length of the green part indicates that both strategies overlap for some faults (blue part), but that each discovered faults the other missed. In the end both strategies are on a par regarding fault discovery. This case occurs fairly often.

In the less desirable situations the green part is bigger than the red one or, even worse, there is no red part at all. The former is the case where or finds more faults missed by ps than it itself misses of those found by ps; it happens less frequently than the other, more desirable, cases detailed above. The latter (no red part) means that or uncovers all the faults found by ps and even finds additional ones. This is the case in 5 out of 55 classes (9% of the classes under test), which is small, yet not insignificant.

The cause of the last two cases (where the ps-strategy performs worse than the or-strategy) is not yet fully understood; bad luck in the random selection is one hypothesis. We will investigate whether it is right to attribute these undesirable results to the random generator by running more and longer test sessions.

The class distribution by percental increase in the number of found faults is shown in Figure 5.4. The majority of classes (81.8% of the classes under test) is above or on the 0% mark, meaning that the ps-strategy discovers at least as many faults as the or-strategy (and often more). However, 10 out of 55 classes (18.2% of the classes under test) are located below the 0% mark. This undesirable observation cannot yet be fully explained; again we tentatively attribute it to an unfavorable random generator.

Figures 5.3 and 5.4 confirm the conjecture made earlier: the ps-strategy can test more methods and generate far more valid test cases for hard-to-test methods than the or-strategy, and therefore it also finds more faults.

Figure 5.5: Speed comparison between or and ps. [Line chart omitted: x-axis: duration of test run (minutes); y-axis: relative speed; lines: all classes (yellow), median of all classes (green), ARRAY (blue), LEX_BUILDER (red).]

This is clearly an improvement over the original or-strategy, though not a perfect one. Cases were identified where or performs better than ps. The ps-strategy should thus not (yet) be thought of as a complete replacement for the or-strategy.

5.4 Performance

Finding the total set of faults is not difficult given an infinite amount of time: if all possible inputs are tested on all possible states, we will discover every fault in the program. But since time is a limited and scarce resource, even the best strategy is worth next to nothing if it takes an absurd amount of time to find faults.

The previous sections established ps's ability to 1) test more methods, and 2) find more faults. While already an encouraging result, it has to be weighed against the incurred overhead.

Figure 5.5 shows a relative speed comparison between both strategies. In this context speed is defined as the number of valid test cases generated over time.

Each yellow line corresponds to a class under test; the one drawn in green is the average (median) over all yellow lines. The two extremes are highlighted: the maximum in blue (class ARRAY) and the minimum in red (class LEX_BUILDER).
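As a concrete illustration, the speed measure is simply valid test cases per unit of time. The exact normalisation behind "relative speed" in Figure 5.5 is not spelled out here, so the relative_speed feature in the following Eiffel sketch is only one plausible reading (0 meaning parity, positive values meaning ps is faster); all names are hypothetical and not part of AutoTest.

class SPEED_MEASURE
		-- Hypothetical sketch of the speed comparison of Figure 5.5.

feature -- Measurement

	speed (valid_test_cases: INTEGER; minutes: REAL_64): REAL_64
			-- Valid test cases generated per minute.
		require
			positive_duration: minutes > 0.0
			non_negative_cases: valid_test_cases >= 0
		do
			Result := valid_test_cases.to_double / minutes
		end

	relative_speed (ps_speed, or_speed: REAL_64): REAL_64
			-- Assumed normalisation: relative gain of ps over or.
		require
			or_speed_positive: or_speed > 0.0
		do
			Result := (ps_speed - or_speed) / or_speed
		end

end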

Lines in the upper half of the graph mean that the ps-strategy ran faster than or for that class (speedups). This is the case we aim for; an ideal improvement would have all lines above zero.

The thought of running faster while doing more work may seem surprising at first, but keep in mind that speed is defined as the number of valid test cases over time. The ps-strategy spends more time satisfying preconditions, but as a result generates more valid test cases. In contrast, the or-strategy tries more test cases, but only few of them are valid for methods with strong preconditions. Thus the speedup of ps over or increases with the density of hard features in the classes under test. This is illustrated by the class ARRAY, which has the biggest speedup of all the tested classes.

Lines in the lower half of the graph represent classes with slowdowns. A line in this area is undesirable, yet oftentimes unavoidable. Consider the case of a method with very strong (or even unsatisfiable) preconditions. The overhead incurred by the smarter object selection in the predicate pool and the query to lpsolve for solving linear constraints has a negative impact on the overall speed, and ps's extra effort is in vain if the precondition is still not satisfied. These overheads add up, especially with increasing density of methods with very strong or unsatisfiable preconditions. The class LEX_BUILDER falls into this category.

The overall performance of the ps-strategy is acceptable, as shown by the green line (median of all classes under test). It is close to zero, meaning the average speed of ps is on a par with or.

Chapter 6

Conclusion

6.1 Conclusion

The original AutoTest has difficulties testing hard features. The ps-strategy tackles this key limitation by selecting objects in a smarter way to increase the precondition satisfaction of hard features. The evaluation shows that:

• The ps-strategy generates more valid test cases. Furthermore, the or-strategy failed to generate a valid test case for 42% of the hard methods, whereas ps succeeded in doing so.

• In 78.2% of the classes under test ps succeeds in testing more methods than the or-strategy, but it performs worse in 5.5% of the classes.

• In 30% of the classes under test the ps-strategy finds more faults than or, but in 18.2% of the classes it performs worse.

• The speed of the ps-strategy is comparable to that of the original approach.

In summary, the ps-strategy is not a general replacement for the original approach: it performs better in some areas but worse in others. However, it can be used as an additional testing tool in order to find more faults than by using the original AutoTest alone.

6.2 Further Improvements

The evaluation shows encouraging results, but a deeper analysis is necessary. As stated above, the ps-strategy cannot yet be used as a replacement for the original strategy. Further research will include:

• determining the cause of the worse performance in the number of tested methods in 5.5% of the classes under test.

• investigating the reason for the unexpectedly high number of classes (18.2%) where the ps-strategy found fewer faults.

• devising further improvements, especially for the linear solver, in order to increase the number of valid test cases even more.

6.3 Related Work

1. Symbolic execution: the Pex project [12] by Microsoft Research.

2. Model-based testing: Spec Explorer [13] by Microsoft Research.

3. Adaptive random testing [16], developed at the Chair of Software Engineering, ETH Zurich.

Appendix A

Additional Results

A.1 Correlation graphs

[Four correlation scatter plots omitted: (1) % increase in number of found faults vs. % increase in number of tested precondition-equipped features; (2) % increase in number of found faults vs. relative speed; (3) % increase in number of tested precondition-equipped features vs. relative speed; (4) relative speed vs. mean success rate.]

A.2 Faults and success rate by time

[Per-class plots omitted: for each class under test, the number of found faults and the PS success rate are plotted against the duration of the test run (0 to 60 minutes), with series ps, or, and success_rate.]

Listings

2.1 Eiffel: Example of contract-based testing.
2.2 Eiffel: Weakest precondition.
2.3 Eiffel: Weak precondition.
2.4 Eiffel: Strong precondition.
2.5 Eiffel: Strongest precondition.
3.1 Eiffel: Feature with preconditions.
3.2 Lpsolve: Proof obligation.
4.1 Shell: A typical invocation of AutoTest in or-mode.
4.2 Shell: A typical invocation of AutoTest in ps-mode.
4.3 Eiffel: Preconditions to solve with lpsolve.
4.4 Lpsolve: Initial proof obligation for preconditions of feature bar (Listing 4.3).
4.5 Lpsolve: Proof obligation after replacement of constraining queries by actual values.
4.6 Lpsolve: Proof obligation with min objective.
4.7 Lpsolve: Proof obligation with max objective.
4.8 Lpsolve: Proof obligation after placeholder replacement.
4.9 Lpsolve: Proof obligation with min objective on second iteration.

List of Figures

2.1 Overview of the or-strategy.
4.1 Linear constraint solving using lpsolve.
4.2 Generation of a concrete integer from the abstract integer.
5.1 Class distribution by increase in number of tested methods.
5.2 Number of valid test cases for hard-to-test features.
5.3 Discovery distribution of distinct faults for the classes under test.
5.4 Class distribution by increase in number of found faults.
5.5 Speed comparison between or and ps.

List of Tables

4.1 Command line switches and options for the ps-strategy.
5.1 The 55 classes chosen for evaluation.
5.2 Test coverage of the method space, sorted by the percental increase.

Bibliography

[1] AutoTest. http://se.ethz.ch/research/autotest/.

[2] Chair of Software Engineering, ETH Zurich. http://se.inf.ethz.ch/.

[3] Eiffel Software Inc., California, USA. http://eiffel.com/.

[4] EiffelBase, Eiffel Software. http://www.eiffel.com/libraries/base.html.

[5] EiffelStudio 6.3 Releases. http://dev.eiffel.com/EiffelStudio_6.3_Releases.

[6] EiffelStudio, Eiffel Software Inc. http://eiffelstudio.origo.ethz.ch/.

[7] Free ELKS. http://sourceforge.net/projects/freeelks/.

[8] Free ELKS, revision 356. http://freeelks.svn.sourceforge.net/viewvc/freeelks/trunk/library/?pathrev=356.

[9] Gobo Eiffel. http://sourceforge.net/projects/gobo-eiffel/.

[10] Gobo Eiffel, revision 6661. http://gobo-eiffel.svn.sourceforge.net/viewvc/gobo-eiffel/gobo/trunk/library/?pathrev=6661.

[11] lpsolve. http://sourceforge.net/projects/lpsolve/.

[12] The Pex Project, Microsoft Research. http://research.microsoft.com/en-us/projects/Pex/.

[13] The Spec Explorer, Microsoft Research. http://research.microsoft.com/en-us/projects/SpecExplorer/.

[14] Z3, Microsoft Research. http://research.microsoft.com/en-us/um/redmond/projects/z3/.

[15] Ilinca Ciupa, Andreas Leitner, Manuel Oriol, and Bertrand Meyer. Experimental assessment of random testing for object-oriented software. In Proceedings of the International Symposium on Software Testing and Analysis 2007 (ISSTA'07), pages 84–94, 2007.

[16] Ilinca Ciupa, Andreas Leitner, Manuel Oriol, and Bertrand Meyer. ARTOO: adaptive random testing for object-oriented software. In ICSE '08: Proceedings of the 30th International Conference on Software Engineering, pages 71–80, New York, NY, USA, 2008. ACM.

[17] Bertrand Meyer. Eiffel: the language. Prentice-Hall, Inc., 1992.

[18] Bertrand Meyer. Object-oriented software construction (2nd ed.). Prentice-Hall, Inc., 1997.

[19] Yi Wei, Manuel Oriol, and Bertrand Meyer. Is Coverage a Good Measure of Testing Effectiveness? Technical report, ETH Zurich.
