Design of Configuration Testing

Yasuharu Nishi and Yoshinori Iizuka

Department of Chemical System Engineering, Graduate School of Engineering, The University of Tokyo, Tokyo, 113-8656 Japan

SUMMARY

Problems such as “DLL hell” that occur when multiple pieces of software are used on the same system cannot be detected by testing the functions of a single piece of software, but need to be detected by configuration testing. Because configuration testing is performed as the final step in the development process, there are many cases where there are not sufficient man-hours to conduct it fully, yet existing methods may not be able to detect failures quickly. This paper therefore proposes a method for designing effective configuration tests, called the resource path test method, that focuses on failure mechanisms. Tools required in this method are presented, and the configuration test design process is explained. Examples of using the proposed method are also presented. © 2006 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 89(7): 56–67, 2006; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.20280

Key words: software testing; system testing; configuration testing; resources; DLL hell.

1. Introduction

With the expansion of PC software in recent years, it is becoming common for many pieces of software to run on a single PC. However, this has also led to the appearance of unexpected problems.

One such problem is called DLL hell [1]. It occurs when software that had been operating correctly stops operating correctly because a DLL that it requires is updated during the installation of a new piece of software.

This kind of problem cannot be detected by unit testing of individual pieces of software, and instead requires that configuration testing be performed during the system test process. Configuration testing is testing that detects failures that occur due to various combinations of hardware and software.

However, because system testing is not performed until the final stage of the software development process, it is affected by delays that occur earlier in the development process, and in many cases not enough time is allocated to conducting tests. Therefore, when the tests are actually designed, there is no choice but to rely on experience, even if an existing configuration test design methodology is used as a basis.

This paper therefore proposes a configuration test design method that is more effective than existing methods because it concentrates on failure mechanisms. The existing methods are first described in Section 2, and the failure mechanisms are modeled in Section 3. Section 4 proposes the resource path test method based on this modeling. Section 5 presents compatibility as a new indicator of software quality and explains how to evaluate compatibility using the resource path test method. Section 6 presents and discusses examples of using the method.

2. Existing Methods

As a guide to the configurations that should be tested, Myers stated that, at the very least, testing should be conducted on the minimal and maximal configurations [2].

Electronics and Communications in Japan, Part 2, Vol. 89, No. 7, 2006. Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J84-D-I, No. 11, November 2001, pp. 1542–1552.

Kaner and colleagues used a printer as an example to propose a procedure for configuration testing [3] in which comparisons are made between configurations consisting of different configuration elements. The first comparison is made between different kinds of printers, a laser printer and an inkjet printer. Comparisons are then made between the same kinds but different generations of printers, and then the same kinds and generations of printers but with different driver software. Finally, comparisons are made between the same kinds and generations of printers with the same driver software, but where the printer models are different.

Beizer proposed a configuration testing method that tested rotations and permutations of the configurations [4]. In this method, the IDs of devices that can be interchanged are treated as permutations, and tests are then designed by performing rotation and substitution operations on the permutations.

For example, in a configuration test of four SCSI devices, if the physical connection order of the device IDs from the host adapter is 1234, tests are first performed by rotating the ID strings to 2341, 3412, and 4123, and then by the replacements 2143, 3412, and 4312.
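The rotation step of this example can be sketched in a few lines of Python. This is only an illustration of the rotation operation; the function name `rotations` is hypothetical, and the substitution step is not modeled because the text does not specify the rule that produces it.

```python
from typing import List


def rotations(ids: str) -> List[str]:
    """Return every left rotation of the ID string except the original order."""
    return [ids[k:] + ids[:k] for k in range(1, len(ids))]


# Connection order 1234 from the host adapter, as in the example above.
print(rotations("1234"))  # ['2341', '3412', '4123']
```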

During the test design stage, however, none of these existing procedures makes any assumptions about the failures that need to be detected. As a result, failures may not necessarily be detected quickly, and it is impossible to evaluate the trade-off between skipping test items to reduce the number of man-hours taken to conduct the tests, and the risk that a failure cannot be detected due to the test items that are skipped.

3. Resource Path Model

3.1. Conflictive failures

The problem of DLL hell, as described in Section 1, is caused by shared DLLs being updated by new software. In this paper, the software that is being tested in order to investigate whether any failures occur is called the software under testing (SUT).

However, DLL hell is not the only situation where new software causes a failure to occur in the software being tested that did not occur during stand-alone operation. For example, a failure may occur if a new piece of software is installed that makes changes to settings stored in the registry or INI files. Furthermore, because new software consumes resources when it is running, failures may occur when the network, memory, or other working areas are exclusively locked. All of these failures are caused by new software, and so cannot be detected by testing the functions of the SUT in isolation. These kinds of failures can only be found by configuration testing.

This paper is therefore not limited to DLL hell, but handles all of the problems that may be caused by new software, that is, failures that occur when the state of modules or the network changes due to the installation of new software.

In this paper, changes that lead to a failure are called conflicts, and failures that are caused by a conflict are called conflictive failures. A piece of software that causes a conflict is then called the software causing failures (SCF). Furthermore, the operation of the SCF that causes the conflict, such as running or installing the software, is called the failure condition.

In other words, a configuration test can be described as a test that detects conflictive failures that are created when a failure condition occurs and causes a conflict with the SUT. The remainder of this section presents a model for describing conflictive failures that can be used to design configuration tests.

3.2. Resources

Kaner’s group has stated that there is no need for redundancy in test items [3]. When this principle is applied to the design of configuration tests for detecting conflictive failures, it becomes the principle that multiple test items should not generate the same conflict. Conflicts therefore need to be generated in different units, or different conflicts need to be generated in the same unit.

By concentrating on the unit that generates a conflict, the change from before to after the conflict can be viewed as a change that occurs in the data transfers that are needed for the program to execute. In research into stress test design by Nishi and Iizuka [6], this kind of unit is called a resource, and is defined as “an object that transfers and maintains data that is needed for a program to execute.” This paper follows this definition and calls units that can generate conflicts resources. Examples of resources include modules such as DLLs, external devices, disk space, memory space, and networks. The transfer of resource data refers to function calls and passing of values between modules, command and data transfers between devices, reading and writing to disk and memory space, sending and receiving packets on the network, and so on.

Resources that are part of the SUT and also part of the SCF are called common resources. A failure condition is then an operation of the SCF that generates a conflict in a common resource.

Different conflicts in the same unit, that is, different changes to the same resource, can be easily observed if they are treated as changes in resource attributes.* For example, if a conflict occurs in a module such as a DLL, it is difficult to compare what kind of changes occurred in the implementation and interfaces of functions included in the module. However, by comparing attributes of the DLL such as the size, CRC, version number, and update date, the cause of the conflict can be easily observed. Resources and attributes are shown in Table 1.

*The way to understand the generation of conflicts, that is, the way to understand changes in attributes, is different for each resource and each attribute, and also depends on how easy it is to test [5] the SUT, and so is not discussed in detail.

If we execute a function that consumes resources, the capacities of the resources are reduced and the states of the resources can also temporarily change. In Ethernet, for example, collisions can occur such that transmissions cannot be made for a short fixed amount of time. In the case of disks, write operations may not function if there is insufficient free space. Attribute changes therefore also refer to this kind of temporary change in the state of a resource.
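As an illustration of observing a conflict through static attributes, the following Python sketch records the size, CRC, and update time of a file-based resource and reports which attributes differ between two snapshots. It is a minimal sketch under stated assumptions: the names `snapshot` and `changed_attributes` are hypothetical, CRC-32 is used as one possible checksum, and the version number is omitted because reading it depends on the file format and platform.

```python
import os
import zlib
from typing import Dict, Tuple


def snapshot(path: str) -> Dict[str, object]:
    """Collect static attributes of a file-based resource such as a DLL."""
    with open(path, "rb") as handle:
        data = handle.read()
    return {
        "size": len(data),                  # size in bytes
        "crc32": zlib.crc32(data),          # checksum of the contents
        "updated": os.path.getmtime(path),  # last modification time
    }


def changed_attributes(before: Dict[str, object],
                       after: Dict[str, object]) -> Dict[str, Tuple[object, object]]:
    """Return {attribute: (old, new)} for every attribute that differs."""
    return {key: (before[key], after[key])
            for key in before if before[key] != after[key]}
```

Taking one snapshot before a failure condition (for example, an installation) and one after it, and then calling `changed_attributes`, gives the kind of attribute-level description of a conflict discussed above.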

3.3. Modeling failure mechanisms

Failures refer to situations where unexpected output is produced or where irregular behavior occurs [7]. This means that a failure does not necessarily appear at the moment a conflict occurs. A failure is first discovered when, after a conflict has occurred, unexpected output or irregular behavior results from executing a function of the software under testing.

If we focus on the resource where a conflict may occur, a failure does not exist if the transfer of data performed by the resource does not change when the conflict occurs. A failure can therefore be viewed as a situation where the effects of a change that are caused by a conflict are passed along the chain of resources to become externally visible as unexpected output or irregular behavior.

In this paper, this kind of chain of resources is called a resource path. That is, conflictive failures are modeled according to the mechanism shown in Fig. 2.

In the following section, the resource path test is proposed as a method for designing configuration tests.

4. Resource Path Test

4.1. Overview of resource path test

To generate a failure under the mechanism described in Section 3.3, a conflict needs to be generated by creating a failure condition in the common resource, and a function needs to be executed along a resource path that includes the common resource where the conflict occurs.

Test design methods that follow this strategy are called resource path test methods. In other words, resource path tests are tests that change common resources by performing an installation or other operation that consumes resources, and then detect whether the operation of the SUT differs from the previous operation. Therefore, the following three items first need to be determined in the test plan.

I. What are the common resources?
II. How can the common resources be changed?
III. What functions of what resources can be executed to detect differences in operation compared to before the function was executed?

To determine item I, an “SCF selection phase” is performed that selects candidates that have a common resource from a pre-provided set of candidate SCFs. During this phase, a “resource dependency graph” is created that displays the dependency relationships between resources. Refer to Section 4.2.1 for details. Common resources are resources that are contained in multiple software and hardware dependency graphs.

To determine item II, a “failure condition enumeration phase” is performed that enumerates the failure conditions under which changes to common resource attributes are caused by the selected SCF. During this phase, a “resource change matrix” is created that displays which attributes are changed in which common resources due to failure conditions such as installations, uninstallations, changes to settings, and consumption of resources. Refer to Section 4.2.2 for details.

To determine item III, a “test function enumeration phase” is performed that enumerates the functions that can be executed to detect failures that occur due to conflicts in the common resources. During this phase, the resource paths that contain common resources in which conflicts occur are extracted from the resource dependency graphs, and the “resource path list,” which displays the functions that pass through each resource path, is created. Refer to Section 4.2.3 for details.

Fig. 1. Resource.

Failure condition arises due to software causing conflicts

↓

A conflict occurs in a resource that is common to the software causing conflicts and the software under testing

↓

A function is executed along a resource path that contains the common resource where the conflict occurred

Fig. 2. Mechanism of conflictive failure.

An additional two phases are then required in order to conduct tests.

In order to handle situations where there are not sufficient man-hours available to conduct the actual test, a “test order determination phase” is performed that considers the characteristics of the product and the structural importance of the common resources.

Finally, a “test execution phase” is performed that determines the test data, etc.

In a resource path test, assumptions are made when the test is designed regarding the changes to resources that will cause failures. This is so that failures can be discovered quickly using a small number of man-hours to conduct the tests, and so that the causes of failures can be easily identified. Furthermore, in cases where a project is running behind schedule, it is possible to evaluate the trade-off between reducing the number of man-hours needed to conduct testing by skipping test items and the risk that failures cannot be detected due to the items that are skipped.

4.2. Tools used in the resource path test

4.2.1. Resource dependency graph

During the “SCF selection phase,” a nondirectional graph that displays the dependency relationships between resources is used as a tool to identify resources that are common to multiple pieces of software and hardware. This graph is called a resource dependency graph. An overview of this concept is shown in Fig. 3. A resource dependency graph can be thought of as a graph that displays the hardware and software architecture.

Nodes on the resource dependency graph represent modules, devices, disk space, memory space, and the network. Lines in the graph represent the transfer of data between resources, such as function calls between modules, the use of devices, reading and writing to disk and memory space, and network use.

Common resources can be enumerated by comparing the resources that are contained in the resource dependency graph of the SUT to the resources contained in the resource dependency graphs of the SCFs.
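The comparison of resource dependency graphs can be sketched as a set intersection over graph nodes. The following Python fragment is only an illustration: the adjacency-set representation, the helper names (`make_graph`, `common_resources`, `select_scfs`), and all module names in the sample data are hypothetical and are not taken from the tools cited in this paper.

```python
from typing import Dict, Iterable, Set, Tuple

# A nondirectional graph: resource name -> names of adjacent resources.
Graph = Dict[str, Set[str]]


def make_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build a nondirectional resource dependency graph from data-transfer edges."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def common_resources(sut: Graph, scf: Graph) -> Set[str]:
    """Resources (nodes) that appear in both graphs."""
    return set(sut) & set(scf)


def select_scfs(sut: Graph, candidates: Dict[str, Graph]) -> Dict[str, Set[str]]:
    """Keep only the candidate SCFs that share at least one resource with the SUT."""
    selected = {}
    for name, graph in candidates.items():
        shared = common_resources(sut, graph)
        if shared:
            selected[name] = shared
    return selected


# Hypothetical sample data.
sut = make_graph([("editor.exe", "runtime.dll"), ("editor.exe", "spell.dll")])
candidates = {
    "viewer": make_graph([("viewer.exe", "runtime.dll")]),
    "player": make_graph([("player.exe", "codec.dll")]),
}
print(select_scfs(sut, candidates))  # {'viewer': {'runtime.dll'}}
```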

Furthermore, the information on the nodes and lines that is needed to create the resource dependency graph can be obtained by using software for viewing run-time state information. Software for viewing run-time state information refers to the family of tools that can dynamically view the function calls between modules and the parameters of those function calls, commands and parameters passed to devices, data written to and read from memory, and packets sent across the network, such as BugTrapper [9] and Dependency Walker [10]. One piece of software for viewing run-time state information called Apius [11] can be used to automatically generate a resource dependency graph related to a limited set of resources such as DLLs and OCXs.

Resource dependency graphs are generated for the SUT and SCFs.

4.2.2. Resource change matrix

In the “failure condition enumeration phase,” the correspondence between common resources and failure conditions is determined, and a table displaying the conflicts that occur is created and used as a tool. This table is called the resource change matrix. The resource change matrix is only created for the SUT. Therefore, failure conditions are enumerated regardless of whether they are from the SUT or the SCFs, and are entered in the resource change matrix. An example is shown in Fig. 4.

Each row in the resource change matrix displays a failure condition, and each column displays a common resource. Each cell then describes a specific conflict. If the conflict cannot be described quantitatively, it is described qualitatively.
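One possible in-memory form of such a matrix is a nested mapping from failure condition (row) to common resource (column) to a conflict description (cell). The Python sketch below is an assumption made for illustration; the class name `ResourceChangeMatrix`, its methods, and the sample entries are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class ResourceChangeMatrix:
    """Rows are failure conditions, columns are common resources, cells are conflicts."""

    def __init__(self) -> None:
        self._rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def record(self, condition: str, resource: str, conflict: str) -> None:
        """Fill one cell with a quantitative or qualitative conflict description."""
        self._rows[condition][resource] = conflict

    def conflict(self, condition: str, resource: str) -> str:
        """Return the cell contents, or an empty string if no conflict is recorded."""
        return self._rows.get(condition, {}).get(resource, "")

    def conditions_changing(self, resource: str) -> List[str]:
        """List the failure conditions that change the given common resource."""
        return [c for c, row in self._rows.items() if resource in row]


# Hypothetical entries.
matrix = ResourceChangeMatrix()
matrix.record("install viewer", "runtime.dll", "update date changed; size +4 KB")
matrix.record("run viewer", "network", "port held exclusively")
print(matrix.conditions_changing("runtime.dll"))  # ['install viewer']
```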

When creating the resource change matrix, the functions of the SUT that cause conflicts need to be clearly determined. To achieve this, all of the functions are first executed, and the static attributes of the common resources, such as size and update date, are then checked, and if there are any changes, the functions that caused the changes are identified by a binary search or other methods. Furthermore, when each function is executed, software for viewing run-time state information is used to observe and identify changes to dynamic attributes of common resources, such as exclusive access to the network and the amount of available memory.
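The binary search mentioned above can be sketched as follows. This is a hedged illustration that assumes exactly one function is responsible for the change. The callback `changes_resource` is hypothetical: it is assumed to restore the environment, execute the given functions, and report whether a static attribute of the common resource changed.

```python
from typing import Callable, Optional, Sequence


def find_changing_function(
    functions: Sequence[str],
    changes_resource: Callable[[Sequence[str]], bool],
) -> Optional[str]:
    """Locate the single function whose execution changes the resource."""
    if not functions or not changes_resource(functions):
        return None  # executing all functions causes no change
    lo, hi = 0, len(functions)  # the responsible function lies in functions[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if changes_resource(functions[lo:mid]):
            hi = mid  # change reproduced by the first half
        else:
            lo = mid  # otherwise it must be in the second half
    return functions[lo]
```

With n functions this needs roughly log2(n) re-executions after the initial full run, instead of executing and checking every function individually. If several functions change the resource, the search still returns one of them, but not necessarily all.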

Fig. 3. Common resource to two resource dependency graphs.

4.2.3. Resource path list

In the “test function enumeration phase,” functions that may operate differently after a conflict, that is, functions that use common resources in which conflicts occur, need to be enumerated. Furthermore, resource paths need to be enumerated that contain: the common resource; the resource that launches the function; and a resource that can be used to observe the result of the function that was launched. Resources that launch a function are called activation resources, and resources that observe the results of launching the function are called observation resources. The activation and observation resources are the resources at the ends of the resource path.

During this phase, a list enumerating the resource paths extracted from the resource dependency graphs and the functions that execute the extracted resource paths is used as a tool. This list is called the resource path list. An example is shown in Fig. 5.
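Extracting resource paths from a resource dependency graph can be sketched as a depth-first search for simple paths that start at an activation resource, end at an observation resource, and pass through the common resource. The function name `resource_paths`, the adjacency-set representation, and the sample graph are assumptions made for this illustration.

```python
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]


def resource_paths(graph: Graph, activation: str, observation: str,
                   common: str) -> List[List[str]]:
    """Enumerate simple paths from activation to observation that contain common."""
    found: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == observation:
            if common in path:
                found.append(path)
            return
        for nxt in sorted(graph.get(node, ())):
            if nxt not in path:  # keep the path simple (no repeated resource)
                walk(nxt, path + [nxt])

    walk(activation, [activation])
    return found


# Hypothetical graph: a print function reaches the spooler through two DLLs.
graph: Graph = {
    "keyboard": {"editor.exe"},
    "editor.exe": {"keyboard", "runtime.dll", "layout.dll"},
    "runtime.dll": {"editor.exe", "spooler"},
    "layout.dll": {"editor.exe", "spooler"},
    "spooler": {"runtime.dll", "layout.dll", "printer"},
    "printer": {"spooler"},
}
print(resource_paths(graph, "keyboard", "printer", "runtime.dll"))
# [['keyboard', 'editor.exe', 'runtime.dll', 'spooler', 'printer']]
```

Each extracted path would then be entered in the resource path list together with the functions that execute it.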

When creating the resource path list, it is necessary to determine which functions of the SCFs use which resources. If software for viewing run-time state information is used, the resource path list can be created without needing the source code because exchanges between modules, devices, memory space, networks, and so on can be monitored dynamically.

Fig. 4. Resource change matrix.

Fig. 5. Resource dependency graph and resource path list.

4.3. The testing process using the resource path test method

After the sequence of three phases corresponding to the tools described in Section 4.2 has been executed, the process of testing using the resource path test method is completed by determining the precedence of, implementing, and executing test items.

Although the test execution phase is similar to the standard black box test, the precedence determination phase uses many more criteria than the standard black box test.

In the standard black box test, the precedence is determined by making assumptions on the resulting effects for each case where a failure occurs in a function. For example, new functions that do not exist in previous versions of the SUT or competing software, functions that have been enhanced, basic functions of the SUT, and functions that are frequently used need to be given priority during testing because there will be more customer complaints if a failure occurs in these functions compared to other functions. The situation is similar for functions where the customers are unable to find a way to resolve the problem if a failure occurs. Furthermore, the increase or decrease in the number of testing man-hours due to the relationship between consecutive test items also needs to be considered.

In the resource path test method, the order of precedence is further determined by assuming the magnitude of the effect of a conflict. In other words, among functions with the same precedence, a test item is given greater precedence the larger the number of functions that use the common resource in which the conflict occurs. Likewise, among conflicts that occur in the same common resource, a test item is given greater precedence the more fatal the effect of the resulting failure is assumed to be.

For example, consider an SUT that is a word processor. Test items for conflicts that occur in the language run-time library, which is used by virtually all of the functions, have higher precedence than those for the macro conversion DLL. Furthermore, when the printer spooler is a common resource, if a conflict that causes the contents of the spooler to disappear is assumed to cause a failure where printing cannot be performed, and a conflict that causes the spooler to overflow is assumed to cause a failure where the system itself crashes, then the test item that creates an overflow is given a higher precedence.
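The ordering rule can be sketched as a sort on a composite key. In the Python fragment below, the class `Item`, its fields, and all numeric scores are hypothetical. Ranking the number of functions that use the resource ahead of the assumed severity is an assumption of this sketch, since the text does not say how the two criteria are weighed against each other.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Item:
    name: str
    function_priority: int  # black box precedence of the function (higher first)
    users: int              # number of SUT functions that use the common resource
    severity: int           # assumed severity of the failure (higher = more fatal)


def order_items(items: List[Item]) -> List[Item]:
    """Sort test items from highest to lowest precedence."""
    return sorted(
        items,
        key=lambda t: (t.function_priority, t.users, t.severity),
        reverse=True,
    )


# Hypothetical scores for the word processor example.
items = [
    Item("macro conversion DLL conflict", 1, 1, 2),
    Item("spooler contents disappear", 1, 3, 1),
    Item("spooler overflow", 1, 3, 3),
    Item("run-time library conflict", 1, 10, 2),
]
for item in order_items(items):
    print(item.name)
```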

Figure 6 shows the test process when the resource path test method is used. This figure shows a breakdown of the phases described in Section 4.1. The number of test design man-hours refers to the man-hours needed to perform the work up to determining the items that are needed to run the test, and refers to items in Fig. 6 up to item (13).

I. SCF Selection Phase
(1) The resource dependency graph G0 of the SUT is created, and the resource dependency graphs G1 to Gn of the candidate SCFs are created.
(2) G0 is compared to G1 to Gn to search for common resources.
(3) Candidate SCFs that have common resources are selected as SCFs.

II. Failure Condition Enumeration Phase
(4) Common resources are written into the resource change matrix M0 of the SUT.
(5) The failure conditions of each of the SCFs that change a common resource are written into M0.
(6) Changes that are made to attributes of the common resources are written in as much detail as possible into M0 for each failure condition.

III. Test Function Enumeration Phase
(7) Common resources are written into a resource path list L0.
(8) Functions F1 to Fk of the SUT that use the common resource are written into L0.
(9) The activation resources that activate the functions F1 to Fk are written into L0.
(10) The resources that can be used to observe the result of the execution of functions F1 to Fk are written into L0.

IV. Test Order Determination Phase
(11) Failure conditions are matched to test functions to create test items for each common resource.
(12) The precedence and execution order of test items are determined by considering how new the functions are, how frequently they are used, the magnitude of the effect of a conflict, how the order of consecutive test items will affect the number of man-hours needed to conduct the tests, etc.

V. Test Execution Phase
(13) The test data, testing environment, and other items required to run the tests are selected.
(14) The expected results for each test item are determined from the specifications.
(15) The testing environment is prepared.
(16) Test items are executed.
(17) The results of executing test items are compared to the expected results.
(18) A failure report and a list of test items for failures that could not be detected are created.

Fig. 6. Process of resource path testing.
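Step (11) of the process, matching failure conditions to test functions for each common resource, can be sketched as a Cartesian product. The function name `build_test_items`, the dictionary-based inputs, and the sample entries are assumptions made for illustration only.

```python
from itertools import product
from typing import Dict, List, Tuple


def build_test_items(
    conditions_by_resource: Dict[str, List[str]],
    functions_by_resource: Dict[str, List[str]],
) -> List[Tuple[str, str, str]]:
    """Return (common resource, failure condition, test function) triples."""
    items = []
    for resource, conditions in conditions_by_resource.items():
        functions = functions_by_resource.get(resource, [])
        for condition, function in product(conditions, functions):
            items.append((resource, condition, function))
    return items


# Hypothetical entries taken from a resource change matrix and a resource path list.
conditions = {"runtime.dll": ["install viewer", "uninstall viewer"]}
functions = {"runtime.dll": ["open a file", "print", "run the spell checker"]}
for item in build_test_items(conditions, functions):
    print(item)
```

For one common resource with two failure conditions and three test functions this yields six test items, which are then ordered in step (12).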

5. Compatibility Test Applications

Configuration testing can be thought of as a process for investigating conflictive failures that occur in the SUT when it is in an operating environment that includes SCFs. This can be interpreted as measuring the adaptability from the ISO/IEC 9126 quality characteristics. Adaptability in this context refers to “a property of software whereby the software is able to be used in environments different from the specified environment without any additional operations or procedures that had not been previously prepared for the SUT” [8].

If we concentrate on functions in which failures are detected, then adaptability is only a measure of whether the functions of the SUT operate correctly or not. However, even if the functions of the SUT operate correctly, if a failure occurs in a function of the SCF, then we must judge that a failure has occurred in the system as a whole. For example, this can occur when the installer of the SCF detects a previously installed DLL of the SUT and does not replace the DLL.

There has been very little discussion on the characterization of the quality of software from the perspective of cases where a conflict caused by the SUT results in a failure that occurs in a function of the SCF.

In this paper, we define this kind of “property whereby execution of the functions of other software on the same system is not damaged by configuration changes due to installation or uninstallation, or execution of functions of the SUT” as compatibility, and propose a compatibility test for investigating compatibility.

A compatibility test can then be thought of as a configuration test where the roles of the SUT and SCF are reversed. The resource path test can therefore be used to design compatibility tests without modification.

6. Usage Examples

This section presents examples of applying the proposed procedure to actual software that has been released commercially. The results of previous methods and of designs based on experience are presented as a reference, and are used for comparison and discussion. Two versions of a commercial word processor are used as SUTs, and three versions of a commercial Web browser are used as SCFs. Test designs using the proposed method and previous methods were performed by the author, whereas test design based on experience was performed by a technician with over 9 years’ experience. As described later, however, conflictive failures could not be detected in one of the SUTs irrespective of the procedure that was used, and so the results for that SUT are omitted from this report.

6.1. Test requirements

The test requirements assume that a system development company is developing a client for the internal information system of a customer, and that the company wants to perform a configuration test of commercial word processors and commercial Web browsers that are already being used by the customer in order to produce a client that can function in combination with the retail software. The testing conditions assume that the number of man-hours available to conduct testing has been reduced due to the project running behind schedule. It is also assumed that there are not multiple commercial Web browsers installed.

The commercial word processor already being used by the customer is assumed to be a product that has grown to take roughly half of the market.

Furthermore, the Web browsers that make up the SCFs were chosen to be three versions of a piece of software that is distributed through almost half of the market: one version from the previous generation, one after the major change to the current generation, and one after the major change in addition to minor changes that make it the latest version. These are referred to as browser A, browser B, and browser B′.

A single configuration of PC was used for the testing environment, and the OS was reinstalled before installing each SCF to avoid interference between the different SCFs. The OS was Windows NT 4.0.

Although the word processor implements several hundred functions including optional features, for simplicity, the functions were limited to the following 10 basic functions:

(1) Open a file
(2) Close a file
(3) Print to a local printer
(4) Search for a string of text
(5) Change the text size
(6) Add effects to text
(7) Change the font
(8) Execute document revision
(9) Run the spell checker
(10) Split a window

The document file used in the testing was one of the sample files provided with the SUT.

6.2. Test design using the proposed method

Tests were designed using the proposed method by following the design process proposed in Section 4.3, with resource dependency graphs first created for the SUT and each of the SCFs. To avoid the need to perform too many tasks, the resources are limited to DLL and OCX files, and resources that were provided with the OS, such as KERNEL32.DLL and other known DLLs, are ignored.

Resource dependency graphs were created with the result that MSVCRT40.DLL was identified as a common resource between the SUT and browser A, and between the SUT and browser B. There were no common resources between the SUT and browser B′. The creation date of the common resource was found to be different as a result of investigating the attributes of the common resource, with the date 1996/2/28 for the SUT, 1996/10/29 for browser A, and 1996/6/14 for browser B. The resource dependency graphs that were created are shown in Fig. 7. Common resources are shown shaded in the figure.

Details on the common resources that had been identified and changes to those resources were found by creating the resource change matrix, which revealed that the only failure condition that could change a common resource was the installation of browser B.

The resource path list that was created next revealed that the same common resource was used by all of the functions. The magnitude of the effect of a conflict in that resource was therefore also assumed to be the same, and so the precedence of test functions was determined by making assumptions about the effects of a failure occurring in each of the functions, the same as in the standard black box test. More specifically, because none of the functions corresponded to new or enhanced functions, the functions were evaluated on whether they were basic functions, frequently used functions, or functions that could be replaced by other means. The printing function is the most basic function of a word processor, and file operations are the next most basic functions, so these were given a high precedence. Searching and revising functions, and GUI operations can be replaced by manual and visual methods, and so were given a low precedence. When determining the details of the order of precedence, the usage frequency was also used as a determining standard.

From the above considerations, the order of precedence of the SCFs was browser B, A, B′, and the order of test functions for each of the configurations was:

(3) → (1) → (2) → (5) → (7) → (6) → (4) → (9) → (8) → (10)

The number of man-hours to design the test was a total of 2 hours, consisting of 1 hour to create the resource dependency graphs, and 1 hour to create the resource change matrix and resource path list, and to implement the test cases. In cases where the number of man-hours to conduct the tests needs to be reduced, it became clear that browser B′ can be omitted if the risk of not detecting conflictive failures in resources other than DLLs can be ignored, and browser A can be omitted if the risk of not detecting conflictive failures due to failure conditions that were missed during the enumeration can be ignored.

6.3. Test design using previous methods

6.3.1. Test design using the Myers guide

The design resulting from following the Myers guide [2] has a total of four test configurations, consisting of three minimal configurations (each of the configurations where only one of the browsers A, B, or B′ is installed) and the configuration where all of the browsers are installed.
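The four configurations can be enumerated mechanically, as in the following sketch. The function name `myers_configurations` is hypothetical, and treating each single-browser installation as a minimal configuration follows the interpretation used in this section.

```python
from typing import List, Sequence, Tuple


def myers_configurations(components: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return each single-component configuration plus the all-component one."""
    minimal = [(c,) for c in components]
    maximal = tuple(components)
    return minimal + [maximal]


configs = myers_configurations(["A", "B", "B'"])
print(len(configs))  # 4
print(configs)       # [('A',), ('B',), ("B'",), ('A', 'B', "B'")]
```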

It took almost no man-hours to design the test. Furthermore, the order of the test configurations is random, and there is also no way to determine the order of the test functions.

Fig. 7. RDGs of SUT and SCFs.

6.3.2. Test design using the Beizer method

The Beizer method [4] considers the permutations of installation order, and thus resulted in a design containing a total of six configurations made up of both permutations of the order of installation of the browser and SUT for each browser.
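The count of six follows from two installation orders for each of the three browsers, which can be checked with a short sketch. The function name `install_orders` and the label "SUT" are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def install_orders(sut: str, browsers: Sequence[str]) -> List[Tuple[str, str]]:
    """For each browser, list both installation orders of the SUT and that browser."""
    return [order for browser in browsers for order in permutations((sut, browser))]


orders = install_orders("SUT", ["A", "B", "B'"])
print(len(orders))  # 6
print(orders[:2])   # [('SUT', 'A'), ('A', 'SUT')]
```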

It took almost no man-hours to design the test. Furthermore, the order of the test configurations is random, and there is also no way to determine the order of the test functions.
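
The configuration counts for the Myers guide and the Beizer method can be checked by enumeration. The sketch below encodes the two designs as they are described in Sections 6.3.1 and 6.3.2; the function names are assumptions made for illustration, and the code is not a general implementation of either author's method.

```python
from itertools import permutations

BROWSERS = ["A", "B", "B'"]  # the three SCFs in this example
SUT = "SUT"


def myers_configurations(scfs):
    """One configuration per single SCF, plus one with every SCF installed."""
    singles = [(scf,) for scf in scfs]
    return singles + [tuple(scfs)]


def beizer_configurations(scfs, sut=SUT):
    """Both installation orders of (SCF, SUT) for each SCF."""
    return [order for scf in scfs for order in permutations((scf, sut))]


myers = myers_configurations(BROWSERS)
beizer = beizer_configurations(BROWSERS)
print(len(myers), len(beizer))
```

Neither enumeration says anything about the order in which the configurations or the test functions should be run, which matches the observation in the text.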

6.3.3. Test design using the Kaner method

The Kaner method [3] performs tests in order starting from the configuration elements that are most different. This results in a design where either browser B′ or browser A is tested first, then browser A or browser B′ is tested next, and finally browser B is tested.

It took almost no man-hours to design the test. Furthermore, the order of the test functions cannot be determined.

6.4. Test design based on experience

Test design based on experience was performed by a technician with over 9 years’ experience in a specialist testing organization.

The experience-based test design focused on the devices that are used during the operation of a function, and so testing started from functions that were predicted by experience to have a high tendency of generating failures. The design also concentrated on functions that were frequently used by the user. The resulting test order was:

(1) → (2) → (3) → (7) → (8) → (9) → (4) → (5) → (6) → (10)

The SCFs were ordered so that testing started from the latest version, because it was assumed that later versions are more likely to be in use owing to the possibility of security holes in the Web browsers.

The test design took 1 man-hour.

6.5. Test results

Actual tests were run to evaluate the designs that were produced by each method. The results are shown in Table 1.

Irrespective of the order of installation, conflictive failures did not occur when the SCF was browser A or B′, but a conflictive failure did occur whereby the SUT failed to start when only browser B or when all of the browsers were installed as the SCF.

By identifying the cause of the failures, as described later, it became clear that the conflictive failure was caused by a protection fault that occurred in the dynamic memory allocation routine in MSVCRT40.dll that is called from the startup routine of the software. Additional investigations that were performed by using software for viewing run-time state information revealed that all 10 test functions called the same dynamic memory allocation routine. Thus, even if the dynamic memory allocation routine had not been used during startup, and a failure had not occurred, it is assumed that the failure would have occurred in all 10 of the test functions.

The number of man-hours taken to conduct the tests was 1 hour, and executing and viewing the results of each test case took 3 minutes.

6.6. Comparison of methods

Table 2 shows the detection results for each method based on the test execution results described in Section 6.5.

In terms of the detection of failures, the proposed method resulted in the fastest detection. The proposed method also resulted in the fastest identification of the software that caused the failure.

In terms of the number of man-hours to design the test, the proposed method required the most man-hours at 2 hours, with the experience-based method taking 1 hour, and the previous methods requiring almost no time to design. In terms of the man-hours to conduct the tests, the proposed method, the experience-based method, and the Kaner method required the least labor, with three configurations each. Here, the number of man-hours to design the test refers to the man-hours taken by the tasks up to determining the items that are needed to run the test. This corresponds to the tasks up to item (13) in the resource path test method shown in Fig. 6.

Table 1. Results of all test cases

In terms of the order of test functions, the order was fixed in the proposed method in a similar way to the standard black box test. In the existing methods, however, although the order was random, it can be assumed that the order would be inferred at the actual testing site using experience, and so the order would not be expected to be very different.

In terms of reducing the number of man-hours available to conduct the tests, under the experience-based method, the tendency for failures to occur and usage frequencies of users were determined by experience, whereas the Kaner method used a standardized experience-based determination of whether the variety and behavior of products are similar. In other words, the only method that makes an objective determination is the proposed method, which considers the risk of failures remaining undetected if test items are omitted.

In terms of test design experience, although, by definition, a large amount of experience is required to design tests using the experience-based method, in the proposed method and previous methods, test design could be performed by someone with almost no experience, such as the author, with no experience required apart from the task of deciding the order of test functions.

The results collected from the example described above thus indicate that the proposed method is the most effective in terms of detecting failures, identifying the software that caused the failures, the number of man-hours to conduct the tests, and reduction in the number of man-hours to conduct the tests.

However, the results also revealed that the proposed method required a large number of man-hours for test design. This is because, although the information needed to create the resource dependency graphs could be quickly obtained using software for viewing run-time state information, the task of creating the nondirectional graph using the obtained information was performed manually. There is thus an urgent need to automate this task in order to improve the design method. In situations where a test is planned in which the only resources are DLL and OCX files, as in the current example, automation can be easily achieved by using software for viewing run-time state information to record dynamic module calls and then creating a graph in which the DLL and OCX files are nodes and the module calls are lines.
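
The automation suggested above can be sketched as follows. This is an illustrative outline under assumptions, not the authors' tool: it assumes the recorded module calls are already available as (caller, callee) pairs, and apart from MSVCRT40.dll every module name shown is hypothetical.

```python
from collections import defaultdict


def build_rdg(call_records):
    """Build a nondirectional graph: modules are nodes, recorded calls are lines."""
    graph = defaultdict(set)
    for caller, callee in call_records:
        graph[caller].add(callee)
        graph[callee].add(caller)
    return dict(graph)


def common_resources(sut_rdg, scf_rdg):
    """Modules that appear in both graphs are candidate common resources."""
    return set(sut_rdg) & set(scf_rdg)


# Hypothetical call records; only MSVCRT40.dll is taken from the example.
sut_calls = [("sut.exe", "editor.dll"), ("editor.dll", "MSVCRT40.dll")]
scf_calls = [("browser_b.exe", "sound3d.ocx"), ("sound3d.ocx", "MSVCRT40.dll")]

shared = common_resources(build_rdg(sut_calls), build_rdg(scf_calls))
print(sorted(shared))
```

The sketch matches modules by file name only. Whether an SCF actually changes a shared module, for example by installing a different version, would still have to be recorded separately.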

A large contribution to the effectiveness of the proposed method is that common resources and conflicts can be precisely identified. This is because the proposed method designs a test by considering in advance failures that may occur due to conflicts in common resources. If the common resources are DLL shared libraries, which are frequently handled by configuration tests, it is entirely possible to accurately enumerate these by using software for viewing run-time state information, as in the present example. However, it would not be possible to conclude that the proposed method is as effective as in the current example if the common resources cannot be identified.

Furthermore, in cases where many functions use a common resource, a problem is expected to arise in which it is difficult to determine the order of test functions.

Table 2. Comparison of resource path testing and other test methods

| | Proposed method | Experience-based method | Myers guide | Beizer method | Kaner method |
|---|---|---|---|---|---|
| Speed of failure detection | Detected in the first configuration | Detected in the second configuration | Detected in the first configuration | Random | Random |
| Identification of the SCF causing the failure | Identified in the first configuration | Identified in the second configuration | Identified in the second or later configuration | Identified in the detected configuration | Identified in the final configuration |
| Man-hours to design test | 2 hours | 1 hour | Almost none | Almost none | Almost none |
| Man-hours to conduct test | 3 configurations | 3 configurations | 4 configurations | 6 configurations | 3 configurations |
| Order of test functions | Determined by experience | Determined by experience | Random | Random | Random |
| Ability to reduce the number of man-hours for conducting tests | Determined objectively from the risk of not detecting failures due to skipping test items | Determined by experience from the tendency for failures to occur and user usage patterns | Not determined | Not determined | Determined by experience that failures are fewer in similar software |

For this reason, problems relating to determining the details and hierarchy of common resources, such as the problem of effectively using software for viewing run-time state information and design information to accurately enumerate resources, and the problem of treating internal module functions rather than modules themselves as common resources, are expected to arise when actually designing tests, and so will need to be studied in the future.

6.7. Applications to identifying failure sources

Attempts were also made to investigate the sources of detected failures by applying the resource path test method.

Initially, software for viewing run-time errors was used to confirm that the failure to start up was due to a protection error that occurred in the dynamic memory allocation routine in MSVCRT40.dll, which had been assumed to be the source of the failure. The software for viewing run-time errors produces a large amount of output, which normally makes the task of confirmation labor-intensive. However, because the DLL name was determined from the resource dependency graph that was created during the test design, the output could be limited and confirmation could be made in the exceptionally short time of 5 minutes.

Next, from the resource path list that was created for browser B by using software for viewing run-time state information, it was clear that the only reason that MSVCRT40.dll was included in browser B was because of the three-dimensional sound image adjustment function that is executed by the three-dimensional video output function. The DLL calls made by each function could be viewed using software for viewing run-time state information, and so the resource path list for browser B could be created using the exceptionally small amount of man-hours of 15 minutes.

In other words, by using the resource dependency graph and resource path list tools of the resource path test method, it was possible to conclude that the failure was caused by the inclusion of the three-dimensional sound image adjustment function in browser B. Furthermore, the causes of the failure could be investigated using an exceptionally small number of man-hours, which is attributed to the use of software for viewing run-time errors and software for viewing run-time state information.
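
The lookup performed on the resource path list can be illustrated with a short sketch. It assumes the list is held as a mapping from each SCF function to the resources on its path; the function names and every DLL name other than MSVCRT40.dll are hypothetical and serve only to show the query.

```python
def functions_using(resource_paths, resource):
    """Return the functions whose resource path includes the given resource."""
    return sorted(
        function
        for function, path in resource_paths.items()
        if resource in path
    )


# Hypothetical resource path list for browser B. Only MSVCRT40.dll and the
# role of the three-dimensional functions are taken from the text.
browser_b_paths = {
    "page_rendering": ["render.dll"],
    "bookmarks": ["store.dll"],
    "3d_video_output": ["video3d.dll", "sound3d_adjust.dll", "MSVCRT40.dll"],
}

suspects = functions_using(browser_b_paths, "MSVCRT40.dll")
print(suspects)
```

When only one function reaches the conflicting resource, as in this example, the source of the conflict is narrowed to that function without further investigation.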

The same conclusions cannot be drawn using the experience-based test design method or the test methods described in Section 2. That is, if tests are performed using one of these methods, a separate task is required to investigate the causes of the failures, and this requires a large number of man-hours.

Thus, even when the resource path test method was applied to large-scale commercial software, the results were that failures could be detected by using a small number of man-hours to conduct tests, and that the source of the failures could also be found. Furthermore, the causes of the failures could be found in an exceptionally small number of man-hours by using software for viewing run-time errors and software for viewing run-time state information. Considering that the system test process is performed as the final stage in development, and that this thus creates the major problem of having to reduce the number of man-hours available for conducting testing, the resource path method is thought to be an exceptionally effective method.

7. Summary

The resource path model was constructed as a model for configuration testing of the SUT. Based on this model, the resource path test method was then proposed as an effective method for designing tests. Tools for use in the resource path test and design process were also proposed. Effective test configurations can be designed using the proposed methods.

By applying the proposed method to commercial software, it was demonstrated that failures could be detected more effectively, and the man-hours to conduct testing could be reduced while retaining more precision than in previous methods or experience-based methods. The causes of failures could also be found exceptionally quickly.

Some problems for the future include investigating ways to determine the details and hierarchy of common resources, creating an automatic tool, and real test design for large-scale applications.

Acknowledgments. The authors thank Professors Hitoshi Kume and Takeshi Nakajo from the Faculty of Science and Engineering at Chuo University, and everyone at Veriserve Corporation for the important advice they gave. We also thank all of the people who reviewed this paper for their beneficial advice.

REFERENCES

1. Sitaraman M, Davis M, Devanbu P, Poulin J, Ran A, Weide B. Reuse research: Contributions, problems and non-problems. Proc 5th Symposium on Software Reusability, p 178–180, Los Angeles, 1999.

2. Myers GJ. The art of software testing. John Wiley & Sons; 1979.

3. Kaner C, Falk J, Nguyen HQ. Testing computer software. ITP; 1993.

4. Beizer B. Software system testing and quality assurance. ITP; 1996.

5. Pressman RS. Software engineering: A practitioner’s approach. McGraw–Hill; 1997.

6. Nishi Y, Iizuka Y. Design of stress testing focused on resource. Trans IEICE 2000;J83-D-I:1070–1086.

7. Beizer B. Black-box testing. John Wiley & Sons; 1995.

8. Azuma M. Software quality evaluation guide book. Japan Standards Association; 1994.

9. Mutek Solutions. BugTrapper, Mutek Solutions, http://www.mutek.com/, Or-Yehuda, 2000.

10. Miller S. Dependency Walker. Microsoft, http://www.dependencywalker.com/, Redmond, WA, 2000.

11. Sarion Systems Research. Apius Brief Guide, Sarion Systems Research, http://www.sarion.com/apius/, Tokyo, 2000.

AUTHORS (from left to right)

Yasuharu Nishi (member) received his B.S. degree (chemical systems engineering) from the University of Tokyo in 1995 and Ph.D. degree in 2001 and joined SQC Inc., and is a researcher funded by the Department of Chemical Systems Engineering at the University of Tokyo. He is conducting research into software quality, especially software testing. He is a member of the Information Processing Society of Japan, the Japanese Society for Quality Control, the Society of Project Management, IEEE-CS, and ACM.

Yoshinori Iizuka received his B.S. degree (numerical engineering) from the University of Tokyo in 1970 and Ph.D. degree in 1974 and became a teaching assistant at the University of Electro-Communications. He became a teaching assistant in reaction chemistry in 1976, and an assistant professor in 1984. He changed affiliation to chemical systems in the Graduate School of Engineering due to restructuring in 1994. He has been a professor since 1997. He is conducting research into quality control and statistical analysis. He is a member of the Japanese Society for Quality Control, the Japanese Society of Applied Statistics, the Reliability Engineering Association of Japan, the Society of Chemical Engineers, Japan, and ASQ.
