
Critical Paths for GUI Regression Testing

Alexander K. Ames and Haward Jie
{sasha,haward}@cse.ucsc.edu
Univ. of California, Santa Cruz

Abstract

Given the rising costs of manual regression testing for graphical user interfaces (GUIs), we present a new approach to the problem of regression test case selection for GUIs. Our approach uses test suite capture data from a capture/replay testing tool. With this data we can represent a test suite for a given GUI as a call graph. Weights within this graph represent recurring paths taken across test cases. Because these paths recur, they are critical to the operation of the GUI, and these critical paths therefore become important in choosing which unit tests to implement. Performing unit testing based on our data reduces the time needed to implement these tests. This is an improvement over implementing the tests alone, which takes longer and does not always cover the critical paths.

Keywords

GUI testing, GUI Regression Test, Automated test.

1. Introduction

Regression testing is applied to modified software to provide confidence that the changed parts behave as intended and that the unchanged parts have not been adversely affected by the modifications [6]. According to Ball [1], the goal of regression test selection analysis is to answer the following question in the least expensive way without degrading test quality:

Given test input t and programs old and new, does new(t) have the same observable behavior as old(t)?

Regression testing has become an expensive part of the software development life cycle. On top of testing techniques being ad hoc, testing consumes a significant amount of labor, is resource intensive, and is very time consuming. For extensive code, regression testing often accounts for 50 to 60 percent of total software development costs [7]. Therefore, researchers should focus on new approaches to reduce the cost of testing. One such approach is to apply test selection to test suites and determine the critical unit tests to perform. This paper describes our test selection system, which detects which paths through the GUI's states are critical and need to be tested. Additionally, the paper describes a unit test technique that we propose to efficiently handle the critical paths.

In describing GUI testing, we may consider a state to be the result of a single user action or manipulation of the GUI. For example, some state may occur after a user presses a button, enters text into a text field, or selects a menu item. Moreover, we may find that there are paths between states. Since each path connects two states, it can also be seen as the relation between two successive actions. In a GUI test suite, which combines many test cases for the GUI application, it is often the case that certain actions will be repeated in succession in order to properly test the GUI. This repetition indicates that such paths are critical. Hence, we have found critical paths which require further testing with unit tests.

For example, a Web browser GUI may have "back" and "refresh" buttons. Using each brings the GUI to a new state, and if they are used in succession we have a path between the two states. Since it is commonplace for the user of the browser to perform those actions in succession, as a user may need to refresh an old web page after going back to it (and a properly designed test suite ought to cover this), we would find the path between "back" and "refresh" to be a critical path.

In the next section, we discuss general background in GUI regression testing, such as the GUI call graphs that depict the test flows of the examples presented. Section 3 describes the paper's methodology in detail, including XML parsing, the Java unit tests, and the basic assumptions made. In Section 4, we present our implementation. Section 5 discusses the evaluations based on the results of our test implementations; we also report the performance of our tool when run with test suites from the Java applications Jdai, Numerical Chameleon, BeatsByDesign, and Huckster. Section 6 discusses complementary work. Section 7 discusses possible future work. Section 8 then summarizes the major points of the paper.

2. Background

In this section we shall describe the conventional approaches to testing for both non-GUI programs and GUI applications, and we contrast their processes with our testing process. We also give some formal description of our approach to building the GUI call graphs.

2.1. Non-GUI Testing

The testing of non-GUI applications is generally much simpler than what is required for GUIs. Non-GUI applications generally take all of their input as text from a command line. Testing them is made easy by scripting all the possible command line inputs. To perform regression tests, all that is necessary is to rerun the scripted test suite. Moreover, these non-GUI applications produce text output which in many cases can be checked with automatic verification. Thus, we can see an overall process close to being fully automated. However, we shall see that this process is not applicable to GUIs.

2.2. GUI Testing Techniques

Addressing the specific pitfalls of GUI testing requires a clear methodology that employs tools and techniques integrated around a GUI representation [7]. To formulate our approach, we have identified some of the more common of these techniques. We summarize some features of each in brief in Table 1 and describe them in some more detail in this section.

Manual Regression Testing GUI testing, by its nature of involving human interaction, cannot be scripted in the same fashion as testing of non-GUI programs. Thus, the conventional approach to regression testing of GUIs is manual testing of the interface. In this technique a human tester must fully exercise the GUI, manipulating every possible control to see what may cause errors or produce incorrect output. For larger GUI applications there may be many test cases required for this process. Moreover, this approach requires that the human tester manually check that his input has been properly processed by the GUI and that the application runs as expected. For proper regression testing, these steps must be repeated every time a change that requires testing is made to the application code base. Given these factors, this proves to be a time-consuming and labor-intensive process.

Automatic Regression Testing One potential remedy to this problem is capture/replay tools. These tools were developed to assist testers by allowing them to record the test suites once for a GUI and automatically replay the tests when changes have been made to the application. The tools launch the application to be tested and record the test cases by capturing the tester's interactions with the GUI. With these tools the tester may organize his test cases into test suites as he sees fit. For the "automatic" replay of the test cases, a human tester is still needed to verify that the application performs correctly under each test, but the labor involved in manually exercising the GUI, which is itself error prone, is significantly reduced.

Use of capture/replay tools alone is often inadequate to properly test the GUI components of Java applications. Our experience with the Abbot tool (discussed further in Section 3.2) shows that it works quite well in performing the "capture" portion, i.e. creating test suites for GUIs. However, the replay portion can be problematic. Nondeterminism plays a role in this, as differing conditions somehow affect the progress of these tests from one run to another. Perhaps subtleties in the larger GUI environment or windowing system affect how a test may function. Also, the replay tool itself may not be perfect software, and bugs within Abbot, or whatever tool is used, impact the resulting replay execution such that, for example, tests may not completely execute.

Unit Testing for GUIs Unit testing involves the manual implementation of test cases for various modules within the code of an application that call all the possible methods with every possible input to each. These implemented test cases can then be run automatically. In addition, it is possible to automatically verify the output of each module, but when this process is applied specifically to GUIs, certain operations that can be unit tested may still require some manual human verification of correctness.

Thus, unit testing is much more straightforward and provides answers that are more reliable in showing failures in the tested module than using an automated test harness for GUIs. However, the tricky part in performing unit tests for GUIs is knowing what to test. Testing all method calls in a GUI module may not be sufficient for the entire user interface, whose operations depend on the actions that result from manipulating the GUI itself.

Our Solution: Unit Testing with Critical Paths Given the problems with performing capture/replay and with unit testing as mentioned above, we have a new process for testing, shown in Figure 2.


Technique                      Advantages                                    Disadvantages
Manual Regression Test         No external tools required                    Very time consuming; subject to human error
Automatic Regression Test      Saves time spent repeating test cases         Replay subject to failures
Unit Test w/o critical path    No human interaction needed with GUI          Important GUI manipulations may be missed
Unit Test w/ critical path     Methodically tests important parts of GUI     Involves multiple tools and techniques; learning curve required

Table 1. Comparison of Testing Techniques

Figure 1. Conventional Process for GUI Testing

Figure 2. Modified Process for GUI Testing

This differs from the conventional means shown in Figure 1 in that it has more steps for a tester to undertake to arrive at test results. It combines stages from the various conventional techniques, specifically the capture portion of capture/replay and the stages from unit testing. Additionally, we introduce a testing aid which must be run as part of this new process. Conventionally, aided GUI testing (either capture/replay or unit) has two possible approaches, as shown by the two possible paths in Figure 1, while ours has only one. The specific parts of our process are described in more detail in Section 3.

2.3. GUI to call graphs

There are many algorithmic approaches to the problem of regression test suite size reduction for GUIs. For example, Rothermel and Harrold's algorithm [13] uses a control-flow-based representation to see which test cases are not necessary to run. The algorithm has a framework for comparing many test selection methods based on inclusiveness, precision, efficiency, and generality. This approach handles graph traversal by selecting the critical test cases in the target test suite.

Figure 3. GUI Regression Selection

Our approach is to selectively retest the critical paths by performing unit tests. We also construct a graph in our approach to test case selection, but this graph is a representation of the actions that manipulate the GUI, i.e. a call graph as opposed to a CFG. In our representation, the vertices of the graph represent the states of the GUI. The edges within the graph represent actions taken in the manipulation of the GUI that bring it from one state to the next.

In contrast to existing approaches, our call graph carries weights from one state to the next. These weights represent how many times the tests traverse the same pair of states. Thus, the edges with the greatest weights are the critical paths.

Our algorithm is defined as follows. Procedure P generates the call graph G. Since G is the output of P for a test case t, let G = (V, S, e, x) be a directed GUI call graph: V is the set of the graph's vertices, S represents the states, e is the starting point (the entry vertex), and x is the exit vertex. A vertex is a state. A path in G has the following states:

p = [s_1, s_2, ..., s_n],  n > 2    (1)

A path can also be represented as the sequence of flows from one state to another, described as follows:

p = [s_1, v_1, s_2, v_2, s_3, v_3, ..., v_{n-1}, s_n],  n > 2    (2)

Each v_i has a weight. The weight represents the number of occurrences of the transition between s_i and s_{i+1} in G that P found during its compilation. The heavier the weight, the more critical the corresponding v is in p. These critical states need to be fully tested in a unit test.

As the final output for P, we reconstruct a modified version P' with minimal redundancy among states.
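As a small worked illustration (a hypothetical toy suite, not one of the captured suites evaluated later), suppose two test cases for the browser example of Section 1 produce the paths

\[
p_1 = [\, s_{\mathrm{back}},\; s_{\mathrm{refresh}},\; s_{\mathrm{print}} \,], \qquad
p_2 = [\, s_{\mathrm{home}},\; s_{\mathrm{back}},\; s_{\mathrm{refresh}} \,].
\]

Both paths traverse the edge from \(s_{\mathrm{back}}\) to \(s_{\mathrm{refresh}}\), so that edge receives weight 2 while every other edge has weight 1; the back-to-refresh transition would therefore be reported first as a critical path.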

3. Methodology

In this section we discuss our methodology for testing with critical paths, as shown in Figure 2. We first mention the assumptions we make. We then discuss the use of a capture/replay tool to capture test cases. Next, we describe how we use the XML test case data to obtain the critical paths. Finally, we describe how we apply JUnit tests to these critical paths.

3.1. Assumptions

This methodology makes several assumptions. We assume that the test suite P is being used to test the software across different version levels and consists of at least two test cases. The test cases also have some redundant paths, although the paths occur in different flows or with different values. We assume that the standard GUI classes perform according to their specification; therefore, our approach does not test these classes. We also assume that the next version of the tested software will not have any dramatic changes, especially to its namespace, such as method names.

Another important assumption that we make about the applications being tested is that, given an implementation with separate GUI and non-GUI components (e.g. a pure frontend for presentation vs. backend or business-logic layers), there will be separate testing of such components. Thus, the non-GUI components, which in some cases may comprise major functionality for various applications, should have regression tests performed on them using conventional means, such as command-line scripting or non-GUI JUnit tests, to assure quality. Our focus in testing is strictly on the functionality of the user interface; errors encountered in testing whose source is the underlying components deserve separate attention.

3.2. Test Case Data Capture

To capture the test case data we use a capture/replay tool. Although such tools are designed to capture user interactions with the GUI for the purpose of replaying them in future testing, as mentioned before, we found the replay feature to be unreliable. For Java GUI application testing there are several capture/replay tools available, such as Abbot, Jacareto, Marathon, and Pounder [4]. We tried several tools and found that Abbot [16] works the best. Specifically, we found that Jacareto [14] had difficulty launching various Java applications that had dependencies on external libraries to handle the look-and-feel of the application. Java GUI applications written using the Swing library of GUI components have the ability to switch the look-and-feel, i.e. the general appearance of all GUI components, during execution between one library and another. For example, a popular library external to the default Java runtime environment is the Kunstoff [3] look-and-feel. It features attractive 3-D gradient shading and highlights for all GUI components. Unfortunately, Jacareto did not seem compatible with it: several applications we selected for test case capture used it and would fail to launch when we attempted to perform a capture in Jacareto.
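The look-and-feel switching that tripped up Jacareto is a standard Swing mechanism, sketched below. The fully qualified Kunststoff class name is our assumption for illustration (the paper does not give it), and any installed LookAndFeel class name could be substituted.

import javax.swing.JFrame;
import javax.swing.SwingUtilities;
import javax.swing.UIManager;

// Minimal sketch of Swing look-and-feel switching; not part of our tool.
public class LookAndFeelSketch {
    public static void main(String[] args) throws Exception {
        // Install an external look-and-feel before building the GUI.
        // The class name below is assumed for illustration only.
        UIManager.setLookAndFeel("com.incors.plaf.kunststoff.KunststoffLookAndFeel");

        JFrame frame = new JFrame("Look-and-feel demo");
        frame.setSize(300, 200);
        frame.setVisible(true);

        // An application may also switch back at runtime and repaint its component tree.
        UIManager.setLookAndFeel(UIManager.getCrossPlatformLookAndFeelClassName());
        SwingUtilities.updateComponentTreeUI(frame);
    }
}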

Capturing the test cases in Abbot should be done methodically, so as to exercise the GUI as a human tester would in manual testing. One approach is to record test cases that each comprise a single operation with a very small number of actions on the GUI. Since the Abbot capture tool writes a file for each test case, this results in a large number of files. In contrast, we could also perform many operations with a large number of actions in each test case, resulting in fewer but larger files.

At this point we no longer need the assistance of the capture/replay tool. It has provided us with the test case data in an XML format designed for its own use in replaying the tests. However, we can also use this test data for our purposes, and thus we introduce a testing aid that reads the XML data.

3.3. XML Parsing

As mentioned before, the output of P is the graph G. The flow that forms G is recorded by the Abbot tool in XML form. In this form, all the related and called components are identified. Each action during the execution of P is recorded sequentially. A single action invokes a transition from one state to another. Since more than one action is invoked during the execution of P, some of the actions are identical. These identical actions show the important relations between states. These relations are identified as critical paths.

<?xml version="1.0" encoding="UTF-8"?>
<AWTTestScript>
  <component class="javaclass1" icon="file1.gif" parent="instance1" window="Explorer1"/>
  <component class="javaclass2" icon="file2.gif" parent="instance2" window="Explorer2"/>
  <component class="javaclass3" icon="file3.gif" parent="instance3" window="Explorer3"/>
  <sequence>
    <action args="instant" class="JList" method="actionSelectRow"/>
    <action args="file1.gif" class="Button" method="actionClick"/>
    <action args="instant" class="JList" method="actionSelectRow"/>
    <action args="file2.gif" class="Button" method="actionClick"/>
  </sequence>
  <terminate/>
</AWTTestScript>

Figure 4. XML Parsing Architecture

To determine the number of occurrences between two states, Abbot's XML test scripts are used as input to our test aid tool, which parses the XML files and internally represents the test suite as a graph. Figure 4 shows the high-level architecture of the XML parsing system. There are four classes involved in the system. The GUIState class represents the states, or vertices, of graph G. The Path class stores the states' relationships in a hash table that holds the graph. These two classes are used by the StatesGraphRep class in constructing and reading the table. Finally, the graph creator, which acts as an interface class, instantiates the StatesGraphRep class to generate the final output. The output shows the number of occurrences between two states. A higher number of occurrences indicates a more critical path that needs to be validated further with unit tests.
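A minimal sketch of the edge-counting idea behind the Path and StatesGraphRep classes is shown below; the class name, method names, and string-keyed map are our own placeholders rather than the tool's actual API.

import java.util.HashMap;
import java.util.Map;

// Sketch of counting state-to-state occurrences; placeholder names throughout.
class StatesGraphSketch {
    // Key: "fromState -> toState"; value: number of times that transition occurs.
    private final Map<String, Integer> edgeWeights = new HashMap<>();

    // Record one transition observed between two successive actions.
    void addTransition(String fromState, String toState) {
        edgeWeights.merge(fromState + " -> " + toState, 1, Integer::sum);
    }

    // Print the edges ordered by weight, heaviest (most critical) first.
    void report() {
        edgeWeights.entrySet().stream()
                .sorted((a, b) -> b.getValue().compareTo(a.getValue()))
                .forEach(e -> System.out.println(e.getValue() + "  " + e.getKey()));
    }
}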

3.4. Unit Test

After the critical paths have been identified, we need to check the correctness of the states. Specifically, these states involve classes, public and private methods, and public and private members. We use the JUnit regression framework, running on top of the Eclipse IDE, to write our code for testing the critical states. For example, if the critical states involve user input, we write code that simulates the user input. Most importantly, the code compares the user input with the variable that stores that input internally to ensure its correctness. This method ensures that the tested components produce the expected outputs. Our JUnit code also uses an error seeding mechanism to change the variable values in a variety of ways and detect whether or not the changes can make the tests fail. This ensures that the tests covering the target software are reliable and adequate. Each unit test has to extend the JFCTestCase class. The unit tests use the junit.framework and junit.extensions.jfcunit packages to utilize the JUnit features. This approach also tests the private members; to access them, the test code uses the JUnit PrivateAccessor class. There are three required methods that must be included in the code: main() is used to run the test suite, suite() is used in main(), and setUp() sets up the required methods and variables to call the GUI interface.

private Obj exp;

public static void main(String[] args) {
    junit.textui.TestRunner.run(suite());
}

public static Test suite() {
    return new TestSuite(Obj.class);
}

protected void setUp() ... {
    super.setUp();
    exp = new Obj();
    ...
}

To simulate a mouse click on a specific button, the code instantiates the JFCTestHelper class and invokes the enterClickAndLeave method.

JFCTestHelper helper = new JFCTestHelper();
helper.enterClickAndLeave(new MouseEventData(this, editModeButton));

The code then compares values after exercising the GUI interfaces. These comparisons may check for equality, inequality, true, or false. While running the test, the console reports which variables are being exercised and their values, providing a trace log for the tester. If one of the comparisons fails, the test is aborted and the error is reported on the console. This unit test method is part of our strategy to reduce the regression test cases.
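The sketch below illustrates the kind of value comparison and error seeding described above; for brevity it extends plain TestCase rather than JFCTestCase, and the component and strings are illustrative rather than taken from Jdai.

import javax.swing.JTextArea;
import junit.framework.TestCase;

// Illustrative only: compares a simulated input with the stored value,
// then seeds an error to confirm the same check would fail.
public class CaptionComparisonTest extends TestCase {

    public void testCaptionStoredCorrectly() {
        JTextArea captionTextArea = new JTextArea();

        // Simulate the user input for the critical state.
        captionTextArea.setText("Student ID");

        // Compare the internally stored value with the simulated input.
        assertEquals("Student ID", captionTextArea.getText());

        // Error seeding: perturb the stored value; the original check would now fail.
        captionTextArea.setText("Student ID (seeded fault)");
        assertFalse("Student ID".equals(captionTextArea.getText()));
    }
}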

4. Implementation

To conduct experiments for this research, we implemented a utility that generates information by reading the XML test case files. This utility was written in the Java language. We decided to use Java for a number of reasons. First, for the sake of convenience, we had already been dealing with other Java tools and applications, such as JUnit, Abbot, and the various programs we consider as subjects of our GUI testing. Second, Java's portability allows our code to be easily run on Unix and Windows platforms, both of which we used to conduct our research into testing. Third, Java's type safety features and wealth of predefined objects allow for rapid development of the utility. Additionally, an advantage we found in using Java is the availability of a library that eases XML parsing.


Of course, using the Java language is not free from trade-offs. We were well aware that running on the JVM (Java Virtual Machine) would hinder the performance of our application. Implementing such an application in a language such as C or C++ that compiles to machine code should demonstrate better performance. Nonetheless, the advantages of Java for this work outweighed those costs, and we see in Section 5.1 that they are not so significant as to warrant a change.

There are a variety of XML parsers available for Java, but we chose the Dom4j parser. The specific features of that parser that we liked are that it elegantly represents XML documents as DOM (document object model) objects and that it offers the power of XPath expressions for searching for elements within the documents.
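As a hedged sketch of how Dom4j and an XPath expression can read an Abbot script like the one listed in Section 3.3 (the file name is a placeholder and this is not the tool's actual code):

import java.io.File;
import java.util.List;

import org.dom4j.Document;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

public class AbbotScriptReader {
    public static void main(String[] args) throws Exception {
        // Parse one captured Abbot test case (file name is a placeholder).
        Document doc = new SAXReader().read(new File("testcase1.xml"));

        // XPath selects every recorded action in document order.
        List<Node> actions = doc.selectNodes("/AWTTestScript/sequence/action");
        for (Node action : actions) {
            // Each action contributes one state: component class plus invoked method.
            System.out.println(action.valueOf("@class") + "." + action.valueOf("@method")
                    + "(" + action.valueOf("@args") + ")");
        }
    }
}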

The conversion of XML elements to states within our internal graph representation was, for the most part, straightforward. The exceptions to this general case are worth noting. We found that some actions described in the XML contain additional information specific to the action, tagged onto the label of the instance. In these cases the information has to be removed in order for the correct state to be counted, as opposed to a new state being generated. Fortunately, we found that this was limited to a small number (two so far) of object types and methods carrying the additional information that needed to be cleaned up. If we were to find more classes or methods for actions with this or other problems that need to be specifically handled, then perhaps we could implement a small system for configuring the software for specific cases as they are discovered (see Future Work), but for now we follow the XP mantra of YAGNI (You aren't gonna need it) and wait to make the addition.

Presently, the tool produces very simple text output. This output contains the encountered paths from state to state. Each state, as it pertains to some Java GUI component and an associated action on that component, is described using an identifier for the component, the class name of the component, and the method invoked for the action. All of this information about each state is important to provide to the tester, as he may need it to properly set up the JUnit test for that state. State-to-state pairs are presented in order of most frequent occurrence, i.e. to show the critical paths first.

5. Evaluation

To explore the value of our process, we have chosen a number of applications containing GUIs written in Java. These applications vary in complexity in both the user interface and the underlying functionality.

Jdai is a simple "photo album" application for managing a user's collection of image files. The user interface for this application has a very limited number of manipulable buttons, radio buttons, and text fields for input. It does have the interesting property of presenting the user's custom set of images as clickable items in the interface. That feature can cause difficulty when using "replay" in a tool like Abbot for regression testing, as the same group of images must be used for each test run.

Numerical Chameleon is a graphical Java utility that converts numbers among a wide array of formats. The application demonstrates that a large amount of functionality can be packed into a relatively simple user interface, as there are only a small number of buttons to manipulate the number conversions. However, the presence of some dynamically populated pull-down menus, each with a very large number of options, allows for the depth in functionality. This user interface should lend itself to systematic testing in that it should behave the same regardless of which data set, i.e. which specific conversion routine, is presently in view. The routines themselves form part of a lower, almost back-end layer of the application that does not modify the basic GUI functionality.

Huckster is a utility for the creation of slide-based presentations, similar to Microsoft PowerPoint but with far less functionality. The user interface is primarily menu driven, with a view of the current slide and editable text fields that allow the user to set up each slide.

BeatsByDesign is a "16 note" drum sequencer application. The interface contains a large number of buttons... While the application has some visible state change from its own manipulation, the primary "output" is to an audio device in the form of the drum pattern sequence.

5.1. Performance

We ran performance trials of our test aid on a Pentium 4 1.8 GHz workstation running Windows 2000 Professional. The utility ran under Sun Java JDK 1.4.1 and under the IBM Eclipse Platform version 3.0.0. There were five different test suites over which we ran our application for performance analysis: one suite each for Huckster, Numerical Chameleon, and BeatsByDesign, and two for Jdai. Each varies in the number of test cases, represented as individual XML files. This variation reflects the various plans for organizing test suites that one may encounter for different applications. For BeatsByDesign and Huckster, the strategy was to spread the operations over a larger number of test cases; thus, we find a smaller average number of actions processed per test case for each of these. The test suite for Numerical Chameleon follows what was stated above, with a moderate number of test cases per suite. Using a somewhat random testing pattern, the test suite attempts to provide good coverage.


Test Suite              # Test Cases   # Actions   Total Suite Size (KB)   Exec Time (ms)
BeatsByDesign           14             198         80.1                    853.1
Huckster                22             208         103                     812.4
Jdai short              7              55          28.5                    523.2
Jdai long               5              606         90.6                    860.8
Numerical Chameleon     10             358         67.6                    801.7

Table 2. Data for test case parsing

Figure 5. Plotting exec time against number of test cases

Our timings were measured simply by recording the system time in milliseconds, as reported by the Java Virtual Machine (JVM) through the System.currentTimeMillis() call. Given that this time may fluctuate from one execution to another at the whim of the process scheduler, we ran each performance trial ten times and report an average execution time for each test suite's processing. Timings for the various applications, along with additional data for each, are presented in Table 2.
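A minimal sketch of the timing loop, assuming a hypothetical parseSuite() call standing in for the utility's actual parsing entry point (which is not named here):

// Timing sketch only; parseSuite() is a placeholder, not the utility's real API.
public class TimingSketch {
    static void parseSuite() {
        // stand-in for parsing one captured test suite
    }

    public static void main(String[] args) {
        final int trials = 10;
        long total = 0;
        for (int i = 0; i < trials; i++) {
            long start = System.currentTimeMillis();
            parseSuite();
            total += System.currentTimeMillis() - start;
        }
        System.out.println("average ms: " + (total / (double) trials));
    }
}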

Given that running the application on each test suite gave differing average execution times, we may want to determine which characteristics of each test suite affect the execution time. In Figure 5 we see that Jdai short has the lowest execution time, 523.2 ms on average, and Jdai long has the highest, 860.8 ms. These test suites also differ the most in number of actions, with 55 and 606, respectively. However, we see no direct relationship between the number of actions and execution time in the results from the other applications. For example, Huckster has 10 more actions than BeatsByDesign, yet it shows a faster execution time.

Although we may not see an obvious metric for approximating execution time, the recorded times do show that this process is not presently very time consuming. One could probably anticipate close to linear growth in execution time for growing test suite sizes. Even a suite 100 times larger than our smallest tested suite would not take significant time in comparison with the development time required to implement the unit tests, as presented in the next section. Moreover, average build times for Java applications are often more costly, given either an infrequent software build system or one closer to continuous integration.

Figure 6. JDAI Testcase 1: Image Rotation

5.2. Test implementations

To evaluate the benefit of using our process, we present the results of two case studies performed on the Jdai and Numerical Chameleon applications. These studies looked at differences in how long it takes to implement test cases with and without the critical path data produced by our testing tool. Additionally, we compare the number of critical paths we were able to implement tests for using the data.

5.2.1. JDAI In Jdai, we built twelve test cases to evaluate the effectiveness of our methods. We ran each of the test cases, captured the actions, and saved the steps in XML file format.

Figure 7. JDAI Testcase 2: Image Info Edit

Once we had the XML output, we identified the critical paths. We first tried to count the number of occurrences between two states manually. Without the XML parsing tool, we spent four hours testing Jdai and discovered only 4 critical paths, as shown in Table 3, before we finally stopped running the remaining test cases. We spent another hour generating the graphs for two different cases, shown in Figures 6 and 7. This manual method is unreliable since it depends on human counting. We then recalculated the counts with our XML processing tool, significantly reducing the amount of time needed to generate the numbers. With this approach, we spent 95 minutes writing the code and obtaining the results, and with the XML parsing tool we discovered 14 critical paths. To identify the critical paths, we examine the number of occurrences generated by the tool. With the critical paths identified, we located the methods and members of the Jdai source code that were being executed; these methods and members were spread across the code and were both public and private. Later, we built unit tests with the help of JUnit to make certain those public and private members performed correctly.

For the JUnit test, we spent less than half an hour building a test script. This excludes the time spent learning the JUnit methods; we spent at least 16 hours understanding JUnit. We do not consider this learning period significant since it is a one-time cost.

To select JDAI images, for example, we use the following code in the JUnit test script:

try {
    JdaiPhotoList list = (JdaiPhotoList) PrivateAccessor.getField(exp, "list");
    list.selectPhoto(1);
    list.selectPhoto(2);
}
catch (Throwable t) {
    t.printStackTrace();
}

To edit the image info, for example, we use the following code in the JUnit test script:

try {
    JdaiPhotoList list = (JdaiPhotoList) PrivateAccessor.getField(exp, "list");
    list.selectPhoto(1);

    /* Input Caption */
    JTextArea captionTextArea = (JTextArea) PrivateAccessor.getField(edit, "captionTextArea");
    captionTextArea.setText("Student ID");

    /* Input Keyword */
    JTextField keywordsTextField = (JTextField) PrivateAccessor.getField(edit, "keywordsTextField");
    keywordsTextField.setText("Photo ID");

    /* Input Caption Text */
    JTextArea sectionCaptionTextArea = (JTextArea) PrivateAccessor.getField(edit, "sectionCaptionTextArea");
    sectionCaptionTextArea.setText("Student Requirement");

    /* Input Keyword */
    JTextField sectionKeywordsTextField = (JTextField) PrivateAccessor.getField(edit, "sectionKeywordsTextField");
    sectionKeywordsTextField.setText("California University of Santa Cruz");
}
catch (Throwable t) {
    t.printStackTrace();
}

5.2.2. Numerical Chameleon Without the critical path output from our test case XML processing tool, the implementation of test cases for Numerical Chameleon was simply the invocation of the methods contained in the Main class of the application. This class happens to contain all the code that builds the GUI portion of the application. Many of the methods contained here are GUI event processing methods, and they require event data objects as parameters that can only be created by the event dispatcher for the GUI. Thus, these methods could not be tested through JUnit by direct invocation.

In 2 hours we were able to implement 4 test cases that perform on average 5 method calls each using this method.¹ Much of the time was spent understanding which methods were appropriate to test and determining appropriate parameters for those methods that require them. Perhaps these tests could be expanded to cover more functionality of the GUI, but given the large number of methods that could not be tested this way, we believe we have provided an appropriate sampling of the testable methods for our purposes.

¹ These numbers for time to implement test cases do not count set-up time to prepare the application for test implementation. It is assumed that the tester has some skeleton code in place from which to implement the JUnit tests.


                                 JDAI                                     Numerical Chameleon
                                 w/o critical path   w/ critical path     w/o critical path   w/ critical path
Test case creation time (min)    240                 95                   120                 45
Critical paths tested            4                   14                   2                   7

Table 3. Unit Test Implementation Metrics


With the critical path data, the focus of test case implementation shifted to testing sequences of GUI manipulations. The most common path was two successive increment operations, followed by two successive decrement operations. This resulted in 45 minutes of implementing 7 test cases, each of which was simply two operations.

The two different approaches to manual JUnit test case generation show some small similarities in coverage, but not many. In this case the addition and subtraction functions were testable by creating new JUnit tests both with and without the critical path data, as there were specific method calls available. However, for all the other operations identified from the critical path data, there were no corresponding method calls in the "Main" class of the application to perform those functions. Thus, we were only able to test two critical paths without having the data. Perhaps a more careful study of the implementation could yield a clever way to determine those tests, but doing so appears to be much more time consuming than simply having a reference available of what to test.

6. Related work

Research in improving automated regression testing has progressed for some time now. TestTube [2] considered the problem of reducing the number of regression tests to be run. Its methodology is to examine the various modifications made to different components of some software. If the modifications fall within code that stands to be executed by a given test in a regression test suite, then that test should be rerun. Otherwise, it may be skipped, as that is a safe condition with no new code being introduced. Rothermel and Harrold [12] specifically studied the problem of regression test selection. They modelled systems as control-flow graphs in order to look for modifications in code that would trigger a need to rerun the regression test for a particular module, similar to TestTube above. When using their system, called DejaVu, they can run half as many tests in some cases, but many cases yield very few savings in terms of test cases not run.

Harrold et al. [6] consider the problem as it pertains specifically to code written in Java. They go a step further than TestTube or DejaVu, which consider the granularity of modifications to be that of individual functions declared in C. Instead, they consider individual statements within Java methods, which they label as the edges in the CFG representations of the programs. Using this finer granularity they try to demonstrate a greater reduction in the necessary test cases to run. Our work does not consider changes in code when producing a reduced set of test cases; we address a slightly different problem. Moreover, neither of the works mentioned above deals with GUI testing, which may often have to exercise much of a product to be effective and thus does not gain much benefit from those test case reduction techniques.

Some work specific to testing GUIs has been undertaken. Memon, who has done a number of different studies in GUI testing, has one specific examination of regression testing of GUIs [11]. He has analyzed the organization of GUI components that can be involved in GUI test cases in terms of control-flow graphs and GUI call graphs. This approach has definitely influenced our choice to represent GUI test cases as graphs for purposes of analysis. However, his work and ours differ in that he focuses on how to automate the modification of test cases for regression testing when the layout of the GUI changes, not on reducing the size of the test suite.

Memon et al. have done other work in GUI testing. One instance is an approach to test case generation using planners, based on sets of predetermined initial and goal states [8]. They go further and describe a system for test case generation called PATHS [10]. This system approaches test case generation hierarchically, as many GUI designs lend themselves to such organization. Such a hierarchy entails a high-level focus in test case design, in which tests target different overall goals, and a lower level, which is a sequence of individual actions that together constitute part of the higher level. There may be various lower-level cases for one higher-level one. Nonetheless, all of them must be tested to ensure some degree of quality, which is the purpose of performing such tests in the first place.

Additional work by the above-mentioned group is the implementation of an automated test oracle for PATHS [9]. This oracle derives expected states for GUI tests based on operators specified by a designer. Automated tests then compare the actual states that occur against the expected states. Takahashi also looked into creating a test oracle, but focused on the problem of verifying graphical objects that may arise in the GUI and considered API comparison [15].

The above research has not involved GUIs specific to software written in Java. jRapture was a tool for Java GUI testing that uses capture/replay techniques, but it also allows for profiling during replay through some instrumentation. However, we chose to use the Abbot tool [16] for test case creation, as it is compatible with the quite popular and easy-to-use JUnit testing framework for Java [5].

7. Future Work

One potential direction for the work done here is to enhance the capabilities of the test suite parsing tool. At present the tool works solely with the XML DTD of the Abbot capture/replay application. We chose Abbot because it proved to be the most reliable, at the time, of the tools we tried for capturing runs of various Java application GUIs. However, it is possible that brand-new tools or newer versions of existing tools may prove to be better in the future. Thus, we would want our tool to be compatible, and so we might want to build the next version to be extensible with respect to the XML DTD. Moreover, as mentioned earlier, further study of the output from other GUIs may show that new rules are needed for determining common states from different action elements in the XML. Thus, we may need to add more extensible features to handle changing rules.

We may also wish to look at more automation of the tests. Presently our process automatically provides a reference for test implementation, but the test implementation itself still must be performed manually. Since many of the tests have much code in common, there is great potential to automate the coding of the unit tests directly from processing the captured GUI test suites. Then the test developers might not have to write much code at all for unit testing; they could simply use the capture tools. This may appear similar to just using the replay portion, but by generating reliable code for JUnit test cases and suites, these test suites would be more dependable than replaying captured interactions with GUIs.

A much further direction that we could take would be to go beyond our presently simple analysis of the generated GUI call graphs. Some algorithm might exist that could, given a call graph for a GUI test suite, produce a new group of test cases that follows the critical paths and fully exercises the GUI without redundancy. We approached this problem in formulating the work for this paper; however, the algorithms we thought might best relate were only suited for directed graphs. Such an algorithm may surface, but we may find that the problem is intractable, perhaps reducible to the Travelling Salesperson Problem and NP-hard. Then we would have to determine whether a non-optimal solution would be sufficient and, of course, whether its output would be of value.

8. Conclusion

The exploration of testing techniques for graphical user interfaces is still a very young field within the still relatively young field of software testing research in general. Although we can learn some things about testing from existing approaches for both GUI and non-GUI application testing, it was appropriate to take a new approach. Part of the appeal of this approach is that it combines elements of older approaches to GUI testing in an innovative way.

We see that GUI regression tests have redundant steps which are critical to the functioning of the GUI and deserve sufficient testing. Testing them would be labor intensive if performed manually. However, since we can record the tests, capturing the flow of states from automated GUI regression tests can identify these critical paths. Finally, albeit manually, we can set up unit tests for these critical states. With the states already isolated for us, testing them is much more straightforward. Performing the unit tests for these critical states saves much time and testing effort compared with setting up the tests without knowing the critical states or going through the manual testing process. Although ideally we would wish to provide a fully automated and reliable solution to GUI testing, i.e. a silver bullet, we hope that we have shown an improvement to the field of testing pointed in that direction.

References

[1] T. Ball. On the limit of control flow analysis for regression test selection. In International Symposium on Software Testing and Analysis, pages 134-142, 1998.

[2] Y.-F. Chen, D. S. Rosenblum, and K.-P. Vo. TestTube: a system for selective regression testing. In Proceedings of the 16th International Conference on Software Engineering, pages 211-220. IEEE Computer Society Press, 1994.

[3] A. R. et al. incors.org - kunstoff look&feel, 2002. http://www.incors.org/index.php3.

[4] E. Gamma and K. Beck. GUI tools for JUnit, 2004. http://www.junit.org/news/extension/gui/index.htm.

[5] E. Gamma and K. Beck. JUnit, testing resources for extreme programming, 2004. http://www.junit.org/index.htm.

[6] M. J. Harrold, J. A. Jones, T. Li, D. Liang, and A. Gujarathi. Regression test selection for Java software. In Proceedings of the 16th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 312-326. ACM Press, 2001.

[7] A. M. Memon. GUI testing: pitfalls and process. IEEE Computer, 35(8):87-88, 2002.

[8] A. M. Memon, M. E. Pollack, and M. L. Soffa. Using a goal-driven approach to generate test cases for GUIs. In Proceedings of the 21st International Conference on Software Engineering, pages 257-266. IEEE Computer Society Press, 1999.

[9] A. M. Memon, M. E. Pollack, and M. L. Soffa. Automated test oracles for GUIs. In Proceedings of the 8th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 30-39. ACM Press, 2000.

[10] A. M. Memon, M. E. Pollack, and M. L. Soffa. Hierarchical GUI test case generation using automated planning. IEEE Trans. Softw. Eng., 27(2):144-155, 2001.

[11] A. M. Memon and M. L. Soffa. Regression testing of GUIs. In Proceedings of the 9th European Software Engineering Conference held jointly with the 10th ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 118-127. ACM Press, 2003.

[12] G. Rothermel and M. J. Harrold. Experience with regression test selection. Empirical Software Engineering: An International Journal, 2(2):178-188, 1997.

[13] G. Rothermel, M. J. Harrold, and J. Dedhia. Regression test selection for C++ software. Software Testing, Verification and Reliability, 10(2):77-109, 2000.

[14] C. Spannagel. Jacareto, 2003. http://www.ph-ludwigsburg.de/mathematik/personal/spannagel/jacareto/.

[15] J. Takahashi. An automated oracle for verifying GUI objects. SIGSOFT Softw. Eng. Notes, 26(4):83-88, 2001.

[16] T. Wall. Abbot Java GUI test framework, 2004. http://abbot.sourceforge.net/.
