

https://doi.org/10.1007/s10836-019-05844-6

Comparing Graph-Based Algorithms to Generate Test Cases from Finite State Machines

Matheus Monteiro Mariano1 · Érica Ferreira de Souza2 · André Takeshi Endo2 · Nandamudi Lankalapalli Vijaykumar3,4

Received: 10 August 2019 / Accepted: 2 December 2019
© Springer Science+Business Media, LLC, part of Springer Nature 2020

Abstract
Model-Based Testing (MBT) is a well-known technique that employs formal models to represent reactive systems’ behavior and generates test cases. Such systems have been specified and verified using mostly Finite State Machines (FSMs). There is a plethora of test generation algorithms in the literature; most of them are based on graphs once an FSM can be formally defined as a graph. Nevertheless, there is a lack of studies on analyzing cost and efficiency of FSM-based test generation algorithms. This study compares graph-based algorithms adopted to generate test cases from FSM models. In particular, we compare the Chinese Postman Problem (CPP) and H-Switch Cover (HSC) algorithms with the well-known Depth-First Search (DFS) and Breadth-First Search (BFS) algorithms in the context of covering all-transitions and all-transition-pairs criteria in an FSM. First, a systematic literature mapping was conducted to summarize the methods that have been adopted in MBT, considering FSMs. Second, the main methods found were implemented and analyzed on randomly-generated FSMs, as well as real-world models that represent embedded systems of space applications. To make comparisons, we considered analyses in terms of cost (time), efficiency (mutant analysis) and characteristics of the generated test suites (number of test cases, average length of test cases, largest and smallest test cases, standard deviation and distribution of test cases). In general, CPP presented the best results in terms of number of test cases and test suite size. In addition, CPP also presented low distribution of average length compared to other algorithms.

Keywords Software testing · Model based testing · Finite state machine · Graph-based algorithms

1 Introduction

As computing evolves, the development of software has become more complex to meet the demands for new and efficient technologies. Several products and services depend on developing software systems considered critical and complex, for instance, embedded systems in satellites. Space missions require organized and manageable test activities related to reliable software development, since failures in space systems may cause damage to the environment or even loss of lives [5].

In order to assure the quality of such software, several efforts in Academia and Industry are dedicated to designing

Responsible Editor: L.M.B. Pöhls

Matheus Monteiro Mariano
[email protected]; [email protected]

Extended author information available on the last page of the article.

more refined test strategies [54]. Among the main strategies to have formal and systematic software testing, Model-Based Testing (MBT) has drawn a lot of attention since it has shown effectiveness by using models to represent the system behavior to guide the automatic generation of test cases [54]. The system behavior is based on a formal representation of the software specification, such as Finite State Machines (FSMs) [23, 51]. FSMs have been adopted for modeling reactive systems and protocol implementations for a long time [13, 45]. Reactive systems respond to internal or external stimuli, and it is very suitable to model such systems by means of FSMs since one can basically express the input/output behavior of these applications along the transition labels of an FSM. Embedded software in space applications, for instance, falls into the category of reactive systems and can be modeled using FSMs [45].

Efforts of the scientific community have been spent on analyzing cost and efficiency of test case generation algorithms for FSMs. There exists a plethora of test

Journal of Electronic Testing (2019) 35:867–885 / Published online: 11 January 2020


generation algorithms in the literature. Most of them are graph-based once an FSM can be defined as a graph. Nevertheless, the literature lacks studies on analyzing FSM-based test generation algorithms to find innovative methods to help test designers to generate test cases in a more systematic manner.

This paper aims to compare graph-based algorithms employed to generate test cases from FSMs in the context of MBT. In particular, we compare the Chinese Postman Problem (CPP) and H-Switch Cover (HSC) algorithms with the well-known Depth-First Search (DFS) and Breadth-First Search (BFS) algorithms in the context of covering All-Transitions (AT) and All-Transition-Pairs (ATP) criteria in FSMs. In order to determine which algorithms would be analyzed in this study, we conducted a systematic mapping. A systematic mapping provides a broad overview of an area of research, determining whether there is research evidence on a particular topic and summarizing such evidence that was found [31].

The main algorithms found in the systematic mapping were analyzed with experiments. We compared the HSC, CPP, DFS and BFS algorithms, considering the number of test cases, test suite size, average length of sequences, its standard deviation and distribution, generation time and fault detection ratio using mutant analysis. Randomly generated complete machines were adopted, as well as real-world FSMs representing embedded software in space applications that fall into the category of reactive systems.

Preliminary results of this research were published in [36]. The extensions of this work are essentially threefold. First, we improved the analysis of the systematic literature mapping in order to expand the knowledge about existing approaches to automatically generate test cases in MBT. Therefore, we provide better rationale on the selection of the compared test algorithms and criteria. Second, we extended the analyses and discussion of previous measures, mainly of distribution results. Third, we analyzed the test suites generated by the algorithms with respect to their fault detection capabilities. To do so, we employed mutation analysis¹ to measure the effectiveness of a test suite in terms of its ability to detect faults [28].

The remainder of this paper is organized as follows: Section 2 presents the main concepts and related work. Section 3 presents the mapping study. Section 4 presents the analyzed algorithms. Section 5 shows the experiments. Section 6 shows the main results. Finally, Section 7 presents a general discussion and conclusions.

¹Mutation analysis is a technique that artificially introduces faults in a program, generating modified versions called mutants. Such a set of mutants can be used to build an indicator of the fault detection capabilities of a test suite.
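The fault-detection indicator mentioned in the footnote, the mutation score, can be sketched in a few lines. The toy programs, mutants, and the `run_test` execution hook below are hypothetical placeholders, not the paper's tooling:

```python
def mutation_score(mutants, test_suite, run_test):
    """Fraction of mutants killed by at least one test case.

    A mutant is 'killed' when some test case makes it produce an output
    that differs from the original program's output on the same input.
    run_test(program, test) is a hypothetical execution hook.
    """
    killed = sum(
        1 for original, mutant in mutants
        if any(run_test(mutant, t) != run_test(original, t) for t in test_suite)
    )
    return killed / len(mutants) if mutants else 0.0

# Toy "programs": functions of one input (purely illustrative).
original = lambda x: x + 1
mutants = [
    (original, lambda x: x + 2),       # killed by any input
    (original, lambda x: abs(x) + 1),  # killed only by negative inputs
]
print(mutation_score(mutants, [3, -5], lambda p, t: p(t)))  # 1.0
print(mutation_score(mutants, [3], lambda p, t: p(t)))      # 0.5
```

A higher score indicates a test suite that distinguishes more faulty variants from the original program.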

2 Background

In this section, the main concepts of this study and related work are discussed.

2.1 Definitions

One of the main software testing definitions was proposed by Myers as an execution process of a program with the intention of finding errors [39]. Testing software means checking whether the system behaves as expected, analyzing its behavior from a finite set of test cases. A test case is a set of inputs, conditions of execution and expected results developed for a specific goal [27].

Test activity can be applied at three main software levels [46]: unit testing, integration testing and system testing. Each level focuses on a specific behavior of a program, employing techniques to generate test cases. Two main techniques are: structural testing, whose objective is to analyze the code flows of the software; and functional testing, which generates tests using an abstraction of the software (based on its specification) that replaces the code, focusing on analyzing whether the software behaves in accordance with this abstraction.

Functional testing might employ an abstraction of the software, based on its specification, that represents the behavior of the system. This approach is well known as MBT [48]. One of the main features of MBT is the automated generation of test cases based on a formal representation, for example, FSMs [23, 51]. The FSM is a well-known technique to represent test models.

An FSM is a formal model used to represent the behavior of a system based on the software’s specification. An FSM has a set of states and transition arcs between these states [23, 32]. Transition arcs carry labels known as events (or inputs). When an event is stimulated, a change from one state to another takes place. A state is the representation of a system aspect. A transition is a reaction of the machine based on a specific input value stimulated from a certain state, which changes the current state to another. A test case (or test sequence) is a path starting from the initial state and returning to itself, listing all the input values along this path. A sequence can be obtained by means of a test method.

An FSM describes the dynamics of software based on the input stimuli present in the transition arcs. But, depending on the input set or length, it becomes impracticable to analyze all the possible transitions. So, strategies are needed to traverse an FSM and, at the same time, maintain a meaningful coverage. There are several approaches to generate test cases for software represented as an FSM. But, to do this, it is essential to understand some concepts of test criteria and methods.



A test criterion (or coverage criterion) is a definition of which elements of the FSM model should be covered to generate a test sequence (or transition sequence). Usually, a method has an objective to reach a coverage criterion, but a coverage criterion does not necessarily need to be attached to a method. This is an important step of test case generation, because depending on the criterion, different means can be used to traverse a graph. The main test criteria are [3, 38]:

– all-nodes: all reachable nodes of the graph (or states of the FSM) must be covered. Using Fig. 1, a path that complies with this criterion is: W → P → B → W.

– all-edges: all edges of the graph (or transitions of the FSM) must be covered; so, naturally, all nodes will be covered. An output for this criterion from Fig. 1 is: [a/0] → [b/1] → [r/0] → [b/1] → [f/2] → [c/2] → [f/2].

– edge-pairs: this criterion requires that all reachable paths of length 2 of a graph are covered. So, it verifies whether each reachable edge pair is exercised. An output for this criterion using Fig. 1 is: [a/0 − b/1] → [b/1 − r/0] → [b/1 − f/2] → [c/2 − r/0] → [c/2 − f/2].

– fault model: the test cases are generated from the model to cover all fault types identified in the domain. In FSMs, a fault model can be defined from four categories of faults: operation error, transfer error, extra-state error and absent-state error.
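The graph-coverage criteria above can be made concrete in a short sketch. The directed graph used here is a generic illustration, not the machine of Fig. 1:

```python
from collections import defaultdict

def edge_pairs(edges):
    """All edge pairs (length-2 paths): an edge followed by any edge leaving its target."""
    outgoing = defaultdict(list)
    for e in edges:
        outgoing[e[0]].append(e)
    return {(e, f) for e in edges for f in outgoing[e[1]]}

def pairs_covered(walk):
    """Edge pairs actually exercised by a walk given as a list of consecutive edges."""
    return {(walk[i], walk[i + 1]) for i in range(len(walk) - 1)}

edges = [(0, 1), (1, 2), (2, 0), (1, 0)]
print(len(edge_pairs(edges)))  # 5 edge pairs exist in this graph
walk = [(0, 1), (1, 2), (2, 0), (0, 1), (1, 0), (0, 1)]
print(pairs_covered(walk) == edge_pairs(edges))  # True: this walk satisfies edge-pairs
```

A walk satisfying edge-pairs necessarily satisfies all-edges (every edge appears in some pair), which in turn covers all reachable nodes, mirroring the hierarchy of the criteria above.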

Over the years, several algorithms for automatic generation of test cases from FSM models were proposed [11, 15, 18, 20, 24, 40, 51]. In this work, we consider a method an approach used to find a set of test cases, and an algorithm is an implementation of this method. In general, a method aims to achieve a test criterion by covering a model.

Fig. 1 Example of an FSM with three states (W, P, B) and five transitions (a/0, c/2, b/1, f/2, r/0). Source: [2]

The formal notations herein are based on [19]; Fig. 1 is used as an example. Formally, an FSM M is a 7-tuple (S, s0, I, O, D, δ, λ), where:

– S is a finite set of states with initial state s0 (W, P and B, with W being the initial state);

– I is a finite set of inputs (a, c, f, r, and b);
– O is a finite set of outputs (0, 1, and 2);
– D ⊆ S × I is a specification domain;
– δ : D → S is a transition function; and
– λ : D → O is an output function.
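A minimal executable rendering of this 7-tuple is sketched below. The exact arcs of Fig. 1 are not fully specified in the text, so the transition layout here is an assumption, chosen only to be consistent with the all-edges example sequence a/0, b/1, r/0, b/1, f/2, c/2, f/2:

```python
# delta : D -> S (next state) and lam : D -> O (output), with D a subset of S x I.
# The arcs below are an assumed layout, not a verified copy of Fig. 1.
delta = {("W", "a"): "P", ("P", "b"): "B", ("B", "r"): "P",
         ("B", "f"): "W", ("W", "c"): "B"}
lam = {("W", "a"): 0, ("P", "b"): 1, ("B", "r"): 0,
       ("B", "f"): 2, ("W", "c"): 2}

def run(state, inputs):
    """Apply an input sequence; return the output sequence and final state."""
    outputs = []
    for x in inputs:
        outputs.append(lam[(state, x)])
        state = delta[(state, x)]
    return outputs, state

outs, final = run("W", ["a", "b", "r", "b", "f", "c", "f"])
print(outs, final)  # [0, 1, 0, 1, 2, 2, 2] W — a test case returning to the initial state
```

Since D covers only 5 of the 15 possible (state, input) pairs, this sketch is a partially specified machine, a property defined next.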

FSMs have some properties that are mentioned as follows:

– Mealy and Moore machines: the theory of both machines is similar. A Mealy machine is processed as a finite automaton that generates an output for each transition, depending on the current state and its input [12], whereas a Moore machine generates an output for each state of the machine (possibly a null value), depending only on the current state. In spite of this difference, the processing of both machines is the same. In the context of the work presented here, we will always be referring to Mealy machines.

– Completely specified: an FSM is completely specified if it treats all possible inputs in all states of the machine, such that the FSM has transitions for all possible inputs in a state. Otherwise, it is considered partially specified.

– Strongly connected: an FSM is defined as strongly connected if for each pair of states there is a path between them. It is considered initially connected if from the initial state it is possible to reach all other states of the FSM.

– Minimal or reduced: an FSM is defined as minimal if there are no two equivalent states, that is, no two distinct states that produce the same outputs for every input sequence.

– Deterministic and nondeterministic: an FSM is considered deterministic if, in any state of the machine, each input triggers at most one transition to another state. Otherwise, it is nondeterministic, because some state has more than one transition for the same input.

This paper is based on deterministic Mealy machine models.

2.2 Related Work

The first step of this work was conducting a systematic mapping in order to identify which are the main approaches used in the context of MBT. Therefore, we present below some studies that have conducted other secondary studies



(systematic mappings or systematic literature reviews) in the same context. Next, we present some publications that conducted experiments to compare different FSM methods to generate test cases.

Barmi et al. [6] conducted a systematic mapping with respect to the alignment of specification and testing of functional and non-functional requirements to identify gaps in the literature. The map summarizes studies and discusses them within subcategories of MBT and traceability. The main results showed that the papers that have linked specification with test requirements focused on model-centric approaches, code-centric approaches, traceability, formal approaches and test cases, with focus on problems of MBT and traceability.

Siavashi and Truscan [50] conducted a systematic review in order to understand how an environment model can be used in MBT and its challenges. Environment modeling is an activity that specifies a part of the real world in which the system is integrated. From the characteristics and advantages, the authors conclude that using environment models can be helpful in robustness testing, safety testing and regression testing. However, methodological aspects of creating environment models are only discussed in a limited number of studies.

Gurbuz and Tekinerdogan [25] presented a systematic mapping to describe the advances in MBT for software safety. The results show that safety is a broad area, applicable to several domains, with focus on reducing costs and increasing test coverage. Nevertheless, the solutions proposed are focused on specific domains, making them less generic and less reusable.

Another systematic review, performed by Khan et al. [30], analyzed the quality of empirical studies with MBT. The results identified that the MBT community needs to provide more details in reporting guidelines, and that few papers were applied to industry artifacts.

Each presented study conducted some kind of secondary study on MBT, but with different purposes. In a similar way to the related literature, we also conducted a secondary study on MBT; however, our focus was to systematically map the methods, criteria, formal models for generating test cases and the evaluation strategies of the generated test cases.

So, now we present other publications from the literature that propose and compare methods that generate test cases from FSMs. Souza et al. [53] presented an empirical evaluation of cost and efficiency between two test criteria of the Statechart Coverage Criteria Family (FCCS) [55], the all-transitions and all-simple-paths criteria, and the Switch Cover criterion for FSMs. Mutation analysis was used to evaluate the criteria. Cost is defined according to the size of the test suite and the time needed by a test suite to kill a mutant. Efficiency is related to the ability a test suite has to kill mutants (mutation score). The results showed that the two FCCS criteria and the Switch Cover method presented the same efficiency. But the all-simple-paths criterion presented a better cost because its test set is smaller and was faster in identifying the mutants than the other test sets, meaning that the criterion can detect software faults faster than the other criteria.

In Simao et al. [52], the authors describe an experimental comparison among different coverage criteria for FSMs, such as state, transition, initialization fault, and transition fault coverage. In order to perform such a comparison, an approach was developed to generate suitable test cases for each one of the criteria without depending on the chosen method. First, a test suite that suits a chosen criterion was generated, and then it was minimized using a generalized greedy algorithm. The results showed that the proposed approach was faster than others found in the literature, despite a little loss in reduction power.

Dorofeeva et al. [18] presented an overview and an experimental evaluation of the FSM methods DS, W, Wp, HSI, UIO, and H. They analyzed the criteria on different aspects, such as test suite length and derivation time. The experiments were conducted on randomly generated specifications and on two real-world protocols. The results showed that the methods, except UIO, produced a test set almost independent of specification length, and the W-method produced a larger test set than the other methods. With respect to complete test sets, the H method surpassed the other methods, with its test set consisting of only 40% of the W-method test set.

Endo and Simao [21] presented an experimental study that compared the complete test sets generated automatically using the methods W, HSI, SPY, H, and P. They analyzed the number of test cases and their length, the total cost (i.e., the length) of each test suite, and the effectiveness of the methods using mutation testing for FSMs. In order to conduct the study, FSMs were randomly generated, varying the number of states, inputs, outputs, and transitions. The results showed that SPY produced, on average, a greater number of test cases and HSI a smaller number of test cases. This indicates that the most recent methods in the literature produce a smaller test suite than traditional methods. The P method generates a shorter test set. It was also identified that all methods had a fault detection rate of 92%.

Mariano et al. [35] conducted an experiment that compared classic algorithms for generation of test cases from FSM models, such as Depth-First Search (DFS) and Breadth-First Search (BFS), with the H-Switch Cover method. This experimental study considered randomly generated



Fig. 2 Algorithms identified in the mapping

and real-world FSMs. The analysis considered the number of test cases, length of the test suite, average length of test sequences and generation time of test cases. Generally, the three algorithms showed similar results when compared on randomly generated and real-world FSMs. In terms of number of test cases and length, the H-Switch Cover method presented better results. But, considering the average length of test cases, DFS presented the best result. Finally, in terms of time to generate test cases, DFS and BFS presented better performance.

Finally, in [54], an investigation of cost and efficiency was conducted comparing the FSM criteria H-Switch Cover, UIO and DS. In order to analyze the test cases’ efficiency, mutation analysis (mutation score) was adopted. With respect to cost, the size of the test set generated (number of events) was considered. The investigation was conducted considering two embedded software products (space applications). As a result, in terms of efficiency, the three methods presented good performances with respect to mutation score. In terms of cost, considering the quantity of events, in the first case study UIO and DS were better, but in the second case study H-Switch Cover was better. Also, in the second case study, H-Switch Cover got a better performance in relation to the average and standard deviation of mutation scores and quantity of events of the test cases generated. In relation to cost (time to detect a fault), H-Switch Cover was the fastest, because it presented a smaller number of test cases.

3 Findings of a Mapping Study

In order to determine which algorithms would be analyzed in this study, we conducted a systematic mapping. A mapping study provides a broad overview of a research area in order to determine whether there is research evidence on a particular topic. Results of a mapping study help identify gaps in order to suggest future research [31].

In addition to the main algorithms, we also investigated, by means of a mapping study, other approaches for automatic generation of test cases in MBT, such as criteria, formal models and evaluation of the generated test cases. A full version of this mapping is available in [37]. We herein summarize only the results relevant for this paper.

The research method used in this mapping study was defined based on the guidelines given in [31] and in [43]. The method involves three main phases: (i) Planning: refers to the pre-review activities, and establishes a protocol, defining the search string, research questions, inclusion and exclusion criteria, sources of studies, and mapping

Fig. 3 Criteria identified in the selected studies



Fig. 4 UML models

procedures; (ii) Conducting: focuses on searching and selecting the studies in order to extract and synthesize data from the included ones; (iii) Reporting: the final phase, which writes up the results and circulates them to potentially interested parties. In the following, the main parts of the mapping protocol used in this work are presented.

In this mapping we considered the studies published until March 2018. The following Research Questions (RQs) were investigated:

RQ1. What are the algorithms used in the generation of test cases?
RQ2. What are the test criteria used in the generation of test cases?
RQ3. What modeling technique is presented for generating test cases?
RQ4. What evaluation strategy is carried out in the selected study?

We used the following search string, which was applied on the Scopus² database:

(“MBT” OR “model based test” OR “model based testing” OR “model based tests” OR “MDT” OR “model driven test” OR “model driven testing” OR “model driven tests”) AND (“test criteria” OR “test criterion” OR “testing criteria” OR “testing criterion” OR “generation method” OR “generation algorithm” OR “test cases generation”).

As a result, 385 publications were returned by Scopus. The selection process was divided into two stages. In the 1st stage, inclusion and exclusion criteria were applied considering title, abstract and keywords, so 156 publications were eliminated. In the 2nd stage, the exclusion criteria were applied considering the entire text, resulting in 45 studies. Over these 45 studies considered relevant, we performed backward and forward snowballing. Backward snowballing refers to papers cited in the selected studies, and forward snowballing refers to those studies that cited these remaining studies [22, 31].

²https://www.scopus.com/home.uri

Backward snowballing resulted in 1173 studies. From these 1173 studies, the selection process was again applied. This stage was carried out iteratively. Three iterations were performed and 18 new studies were selected. Forward snowballing returned 670 studies. Four iterations were conducted, resulting in 34 new studies. As a result, we got a final set of 97 studies³.

We analyzed each of the selected studies in order to answer the research questions.

In RQ1, we identified algorithms that have been used in MBT, considering FSM as a model. Figure 2 shows the main algorithms. A total of 33 types of algorithms were identified. Most of the papers propose new algorithms (24.1%), always to improve an existing one in terms of efficiency and coverage. Other algorithms that are considerably used to generate test cases from FSMs are optimal-path finding (13.64%) and DFS (12.12%).

Considering the results found, we decided to compare the HSC algorithm (proposed in the study), CPP (optimal-path finding) and the classic algorithms BFS and DFS. The analysis of the algorithms is presented in Section 4.

In the following, the answers to the new RQs investigated in this mapping are presented.

RQ2 focuses on verifying the criteria used in the identified studies. Figure 3 shows the main criteria found in the selected studies. We identified 29 types of criteria; the three most mentioned criteria are related to graph coverage: all-edges of a graph (or an equivalent representation, such as transitions of an FSM) with 36.5%, followed by all-states (13.9%) and all-paths (12.4%). Many methods identified in RQ2 use a single criterion to generate test cases [10, 17, 56]. Other studies adopt a method to cover two or more criteria [9, 26].

In order to answer RQ3, a large number of formal models for generating test cases were observed. However, the large number of Unified Modeling Language (UML) models used in the papers drew our attention. So, we analyzed this RQ in two steps. First, we analyzed only UML models, in order to identify which UML diagram was the most used. Then, all other formal modeling techniques were analyzed. Figure 4 shows the distribution of UML models. The state diagram is the most used. Some studies convert a UML diagram to another formal model to generate test cases [4, 33]. Figure 5 presents the main identified formal models. In general, similar to UML diagrams, formal models based on states are very much accepted in the literature, with FSM being the most used to represent a system [34, 58].

Finally, RQ4 verifies what are the most used evaluation strategies in the literature. This evaluation strategy is related to the test cases generated and also the model used. We

³The references of the 97 selected studies can be accessed at: https://goo.gl/Yv1KKn.



Fig. 5 Formal models used

separated the evaluations into two categories (test case and model), because we observed that these were the most used in the selected studies. Evaluations related to test cases were more commonly used in the selected studies. Some evaluations are: number of test cases (18.2%), fault detection (11.5%) and mutant analysis (7.4%) [1, 7, 8]. With respect to model evaluations, they were based mainly on the number of edges (7.4%), number of states (6.8%) and fault coverage (4.1%) in a model [14, 49]. Finally, we identified evaluations for both test cases and models, such as time (24.3%) and memory used (2.7%). Time is the most mentioned evaluation in this systematic mapping. Some studies use time as a performance metric of computation time [16, 47, 57].

4 Analyzed Algorithms

In this study, considering the results found by the systematic mapping, we decided to investigate two criteria and four methods employed to generate test cases from FSM models, divided into two main groups: graph-search methods (Breadth-First Search and Depth-First Search) and Eulerian methods (H-Switch Cover and Chinese Postman Problem). BFS, DFS and CPP were chosen due to the results obtained from the systematic mapping.

In graph theory [11], BFS and DFS are used to search for a particular vertex or to traverse a graph. BFS traverses a graph, represented as a spanning tree, by exploring all neighboring vertices of a given level before moving to the neighbors of the next level. DFS, on the other hand, starts at the root and follows each branch as deeply as possible, backtracking before moving on to the next branch.
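As an illustration of how a spanning-tree traversal yields a test suite, the sketch below (our own, not the paper's implementation) runs BFS over an FSM's transition graph and emits each root-to-leaf path of the spanning tree as one test case; the integer state names and the adjacency-map encoding are assumptions made for the example, with transition labels omitted.

```java
import java.util.*;

// Illustrative sketch: BFS over an FSM's transition graph; each
// root-to-leaf path of the resulting spanning tree becomes one test case.
public class BfsTestGen {
    // fsm maps a state to its successor states (transition labels omitted)
    public static List<List<Integer>> testCases(Map<Integer, List<Integer>> fsm, int initial) {
        List<List<Integer>> suite = new ArrayList<>();
        Deque<List<Integer>> queue = new ArrayDeque<>();
        Set<Integer> visited = new HashSet<>();
        queue.add(new ArrayList<>(Collections.singletonList(initial)));
        visited.add(initial);
        while (!queue.isEmpty()) {
            List<Integer> path = queue.poll();
            int state = path.get(path.size() - 1);
            boolean leaf = true;
            for (int next : fsm.getOrDefault(state, Collections.<Integer>emptyList())) {
                if (visited.add(next)) {               // expand only unvisited states
                    List<Integer> extended = new ArrayList<>(path);
                    extended.add(next);
                    queue.add(extended);
                    leaf = false;
                }
            }
            if (leaf) suite.add(path);                 // spanning-tree leaf => one test case
        }
        return suite;
    }
}
```

Because every state is visited only once, the number of test cases equals the number of leaves of the spanning tree, which is why BFS-based suites grow with the branching of the graph.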

HSC [54] is a method, based on the classical Switch Cover [44], that requires every transition pair to be executed at least once (the ATP criterion). One of the main characteristics of Switch Cover is to generate a transition-pair balanced graph from an FSM, known as a Dual Graph. The main features of HSC are its ability to cope with complex FSMs and its use of the Hierholzer algorithm [41] as a heuristic to ensure that all edges are visited exactly once. The Hierholzer algorithm builds on Euler's theorem [40] to generate an Eulerian cycle.
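The Eulerian-cycle step can be sketched with Hierholzer's algorithm. The code below is a minimal illustration on a balanced directed graph (the encoding as an adjacency map of integer vertices is ours), not the HSC implementation described in [54]:

```java
import java.util.*;

// Hierholzer's algorithm on a balanced directed graph (every vertex has
// equal in- and out-degree): yields a cycle using each edge exactly once.
public class Hierholzer {
    public static List<Integer> eulerianCycle(Map<Integer, Deque<Integer>> adj, int start) {
        Deque<Integer> stack = new ArrayDeque<>();
        LinkedList<Integer> cycle = new LinkedList<>();
        stack.push(start);
        while (!stack.isEmpty()) {
            int v = stack.peek();
            Deque<Integer> out = adj.get(v);
            if (out != null && !out.isEmpty()) {
                stack.push(out.poll());       // follow a not-yet-used edge
            } else {
                cycle.addFirst(stack.pop());  // dead end: emit vertex, backtrack
            }
        }
        return cycle;
    }
}
```

On the balanced graph, the returned cycle visits every edge exactly once, which is what lets HSC emit a single test sequence per initial state.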

CPP is a classical optimal-path method that solves the problem of finding the best path in a graph that visits all edges [20]. The problem focuses on graphs that are not balanced, that is, where the number of incoming and outgoing edges of a vertex is not the same. The method has three major steps if the graph is not balanced: 1) find the minimal path from any state to any other state, using the Floyd-Warshall algorithm; 2) find the maximal matching between the unbalanced states, with the Hungarian Method; and 3) insert the minimal path between two unbalanced states to balance the graph. In this last step, the BFS algorithm is used to search for the path between two unbalanced nodes. If the

Fig. 6 Study overview

Fig. 7 Number of test cases. Note that, since an Eulerian algorithm generates test cases depending on the number of initial states, the number of test cases is always the same. The number of test cases generated by HSC and CPP for AT is 1 (and 4 for ATP), so one covers the other in the figure

graph is balanced, an Eulerian cycle search algorithm can be used to find the minimal path that visits all edges.
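Step 1 of CPP, finding the minimal path between every pair of states, can be sketched with Floyd-Warshall over an adjacency matrix, matching the matrix representation mentioned below; the example weights and the INF sentinel are illustrative assumptions, not values from the paper.

```java
// Floyd-Warshall all-pairs shortest paths over an adjacency matrix,
// as used in CPP's first step to guide where balancing paths are inserted.
public class FloydWarshall {
    static final int INF = Integer.MAX_VALUE / 2;  // sentinel; halved to avoid overflow on addition

    public static int[][] shortestPaths(int[][] w) {
        int n = w.length;
        int[][] d = new int[n][n];
        for (int i = 0; i < n; i++) d[i] = w[i].clone();
        for (int k = 0; k < n; k++)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (d[i][k] + d[k][j] < d[i][j])
                        d[i][j] = d[i][k] + d[k][j];  // relax the i->j path through state k
        return d;
    }
}
```

The matrix form maps naturally onto an FSM whose rows and columns index states, which is consistent with the CPP implementation choice reported later in this section.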

All four methods were implemented in the Java language (version 1.8). While BFS, DFS and H-Switch Cover were implemented using tree data structures, separating the states and transitions of an FSM into different classes, CPP was implemented using a matrix, a simpler data structure, mapping each row and column of the matrix to a state of the FSM. The implementation is available for download as an open source project at GitHub4.

5 Experimental Study Conducted

Figure 6 illustrates the study overview. From an FSM, a criterion is chosen to define how the FSM will be covered. For all-transitions no further step is needed, but for all-transition-pairs it is necessary to take the original FSM and generate a Dual Graph. After this step, a method is chosen to generate test cases: a graph-search method or an Eulerian method. With the all-transitions criterion, a graph-search method generates test cases from the original FSM; with all-transition-pairs, it generates them from the Dual Graph. Eulerian methods, in turn, must first balance the graph: for all-transitions they balance the original FSM, and for all-transition-pairs they balance the Dual Graph. After this phase, the selected method generates the test cases and the results are compared using a set of evaluations.
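As a minimal sketch of the Dual Graph step, assuming transitions encoded as (source, target) integer pairs (an encoding of ours, with transition labels omitted): each transition becomes a vertex of the Dual Graph, and an edge links two transitions when the first ends where the second begins.

```java
import java.util.*;

// Sketch of building the transition-pair ("Dual") graph used by the
// all-transition-pairs criterion: vertices are FSM transitions, and
// transition i is connected to j when i's target equals j's source.
public class DualGraph {
    // transitions[i] = {source state, target state} of transition i
    public static Map<Integer, List<Integer>> build(int[][] transitions) {
        Map<Integer, List<Integer>> dual = new HashMap<>();
        for (int i = 0; i < transitions.length; i++) {
            dual.put(i, new ArrayList<Integer>());
            for (int j = 0; j < transitions.length; j++) {
                if (transitions[i][1] == transitions[j][0]) {
                    dual.get(i).add(j);   // transition i can be followed by transition j
                }
            }
        }
        return dual;
    }
}
```

Covering all edges of this Dual Graph is equivalent to covering all transition pairs of the original FSM, which is why the ATP criterion reduces to an all-transitions problem on the Dual Graph.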

The methods were evaluated with 875 randomly generated FSMs and twenty-one models from real-world systems. For random models, we adopted the benchmark of reduced and deterministic FSMs from [21]. In particular, we selected machines with four inputs, four outputs, and the number of states ranging from four to twenty. For each FSM configuration (#states), approximately 100 different FSMs were selected, and the average values were calculated for each metric to reduce the influence of particular FSM characteristics on the results.

4https://github.com/MatheusMMariano/H-Switch Cover

With respect to the real-world FSMs, two software products embedded into computers of space projects under development at the National Institute for Space Research (INPE)5 were used in this experiment. The first is the software of the Alpha, Proton and Electron monitoring eXperiment in the magnetosphere (APEX). The second is the Software for the Payload Data Handling Computer (SWPDC), which was developed in the context of the Quality of Space Application Embedded Software (QSEE) research project. Details of the two software products and their FSMs can be found in [54].

Evaluating methods is not an easy task, because several parameters must be analyzed before deciding whether one method is more efficient than another. Each method has its peculiarities, which can be an advantage in some cases and a problem in others. So, evaluating a method means not only checking whether it is more efficient, but also identifying in which cases it is more or less advantageous. In our study, each method generated test cases to cover the AT and ATP criteria. The methods were applied to graphs generated from FSMs (AT) and to Dual Graphs (ATP). The choice of these criteria was guided by the systematic mapping (Section 3), which showed the most used methods and criteria in test case generation from FSMs. With respect to test algorithm evaluation, we analyzed:

– Number of test cases. We consider the number of test cases as the number of resets in a test suite, that is, the number of test sequences.

– Test suite size. The number of input events is considered as the test suite size.

– Length of smallest and largest test cases. Length of the smallest and largest test cases for each FSM configuration. An average is calculated since each FSM configuration has approximately 100 FSMs.

– Average length of sequences. This is the sum of the number of input values in each test case divided by the total number of test sequences.

– Standard deviation and Distribution. These metrics are calculated in relation to the average length of test sequences.

5http://www.inpe.br/

Fig. 8 Test suite size

– Generation time. This refers to the time (in milliseconds) needed to generate a test suite. Each configuration was executed 100 times and the average generation time was measured.

– Efficiency. The efficiency of a test suite is related to its capacity to uncover faults. As real faults are not readily available, researchers have relied on mutation testing as a proxy for test efficiency (i.e., fault coverage); in particular, empirical studies have shown evidence that mutants are correlated to real faults [29, 42]. Therefore, we adopted mutation testing and a supporting tool, as in [21], to measure test efficiency. For each FSM, we generate a set of faulty models (mutants) using operators that add new redundant states, change the initial state, modify the output of a transition, and redirect a transition from one state to another. Such operators simulate the most common mistakes that may occur when designing FSMs. Using mutants, we calculate the mutation score by dividing the number of mutants detected by a test suite by the total number of mutants.
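The suite metrics listed above reduce to simple computations over the generated sequences. The following sketch (a hypothetical helper of ours, not the authors' tooling) shows one way to compute them, with the population form assumed for the standard deviation:

```java
import java.util.*;

// Sketch of the suite metrics defined above: number of test cases (resets),
// test suite size (total inputs), average length, standard deviation,
// and the mutation score (killed mutants / total mutants, as a percentage).
public class SuiteMetrics {
    public static int numberOfTestCases(List<List<String>> suite) { return suite.size(); }

    public static int suiteSize(List<List<String>> suite) {
        int total = 0;
        for (List<String> tc : suite) total += tc.size();   // count input events
        return total;
    }

    public static double averageLength(List<List<String>> suite) {
        return (double) suiteSize(suite) / numberOfTestCases(suite);
    }

    public static double stdDev(List<List<String>> suite) {
        double avg = averageLength(suite), sum = 0;
        for (List<String> tc : suite) sum += Math.pow(tc.size() - avg, 2);
        return Math.sqrt(sum / suite.size());               // population standard deviation
    }

    public static double mutationScore(int killed, int total) {
        return 100.0 * killed / total;
    }
}
```

A suite consisting of a single Eulerian sequence trivially has a standard deviation of zero, which anticipates the behavior reported for HSC and CPP in Section 6.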

This experiment was run on an iMac computer (21.5-inch, Mid 2011), with an Intel Core i5 processor running at 2.7 GHz, 8 GB of memory (1333 MHz DDR3), and macOS High Sierra Version 10.13.6.

6 Results

In this section, we present the results of the experiment, applying the selected methods for each criterion. Section 6.1 shows the results of each method and criterion applied on the randomly generated FSMs. In Section 6.2, the results for the real-world FSMs are presented.

6.1 Randomly Generated FSMs

Number of Test Cases Figure 7a shows how many test cases were generated by the algorithms for each FSM configuration with the AT criterion, and Fig. 7b with the ATP criterion. BFS generated more test cases due to the construction of a spanning tree, whose number of leaf nodes increases as the graph

Fig. 9 Length of shortest test cases

Fig. 10 Length of longest test cases

size increases. HSC and CPP always produce the smallest, and the same, number of test cases in all FSM configurations, since both produce a single Eulerian cycle for each initial state.

Test Suite Size Figure 8a and b show how the test suite size varies with the number of states. CPP turned out to be the best method for the AT and ATP criteria, generating a minimum number of inputs with and without resets. DFS, on the other hand, generates more input values due to its depth-first exploration of the graph. Although both HSC and CPP balance the graph and generate an Eulerian sequence, this result was expected since CPP generates a minimal balanced graph, smaller than the one generated by HSC.

Length of Shortest and Longest Test Case Figure 9a and b show the average length of the shortest test case of each FSM configuration, and Fig. 10a and b show the average length of the longest test case. In all cases, BFS produced test cases with the shortest length. With respect to methods that

generate an Eulerian cycle, CPP was better than HSC, since its balancing process generates a minimal graph. Also, the shortest and longest test case results are the same for HSC and CPP, since an Eulerian cycle generates a minimal sequence that visits all transitions.

Average Length of Test Cases Figure 11a and b show the average length of the test cases generated for each FSM configuration. First, an average was calculated for each test suite, and then the same process was repeated for each FSM configuration. BFS was the method that generated the shortest test sequences. Among the Eulerian methods, CPP generated sequences shorter than HSC, since it generates a minimal balanced graph and, from it, an Eulerian path. Also, all sequences of CPP are shorter than or equal to those of HSC.

Standard Deviation Figure 12a and b show the standard deviation for each method by FSM configuration. In all scenarios, DFS has a high standard deviation and a high variance of test case lengths. CPP has zero standard deviation and

Fig. 11 Average length of test cases

Fig. 12 Standard deviation results. Note that the same situation as in Fig. 7 happens in this result: the standard deviation will always be 0 as the number of test cases is the same

variance in all cases, indicating that the generated test cases all had the same length.

Distribution of Average Length of Test Sequences Figures 13, 14, 15 and 16 show the distribution of the average length of test sequences by number of states. The result is presented as a violin plot, a style that shows the dispersion of the data set between the maximum and minimum points. The wider the curve, the larger the number of sequences with a similar length. The white point represents the average value in each graph. In general, CPP has a similar distribution in both AT and ATP, with a smaller range between minimum and maximum than HSC. These results show how the balancing step of an Eulerian algorithm can affect the performance of the method. BFS shows a tendency to generate test sequences that are either very long or very short as the number of states grows. In DFS the average seems to vary, but

the scale of DFS is larger than that of BFS. So, the results reflect the behavior of the standard deviation in DFS.

Generation Time Figure 17a and b show the generation time, in milliseconds, of the test cases of each method; the y-axis is in log scale. For the AT criterion, the BFS, DFS and CPP methods had similar times, with CPP being the method with the smallest oscillation as the number of states increases, and HSC having a higher increase in time. These results are not repeated for the ATP criterion, where BFS shows a higher growth than HSC. In both cases, CPP was the least oscillating algorithm in relation to generation time and, on average, it was the fastest to generate all test cases.

At a glance, it may seem unusual to observe CPP, naturally a costly algorithm, outperforming BFS and DFS. But it is worth mentioning that the data structures of the three algorithms are different. CPP uses a matrix, a simple data structure, while BFS and DFS use a graph structure. So,

Fig. 13 Dispersion of average length of test cases by BFS

Fig. 14 Dispersion of average length of test cases by DFS

this aspect may be an impact factor for the behavior of the algorithms. CPP was not implemented over a graph because of the Hungarian Method used to compute the maximal matching of unbalanced nodes, which would be more complex to implement in that form.

Efficiency Figure 18a and b show the efficiency of the test cases under mutant analysis. These results show the mutation score obtained by each method's test set. CPP performed better for the AT and ATP criteria, but HSC had results similar to CPP, despite generating larger test cases. With the AT criterion, BFS showed itself not to be efficient, with a mutation score that decreases as the number of states increases.

As expected, the mutation score is higher for the ATP criterion (over 90%) than for the AT criterion (about 80%). Another observation is that using different algorithms to cover the same criterion may produce very different results from a fault detection perspective. For instance, BFS covering AT detects around 40% of the mutants, while CPP's detection rate is around 80%.

6.2 Real-World FSMs

Figure 19a shows the results for APEX with the four methods under the AT criterion; each metric is represented by a group of bars. BFS and DFS achieved the best results in four of the analyses, but regarding the number of test cases, HSC and CPP naturally had the best results. The Eulerian methods also had zero standard deviation and variance, as in the case of the random FSMs. This scenario is repeated with the ATP criterion, as shown in Fig. 19b, except for test suite size, where HSC and CPP were better than BFS and DFS.

Figure 20 shows the efficiency analysis of each method applied to the APEX product. In general, CPP and HSC were better than DFS and BFS, with a small difference under the ATP criterion. In the test suite size result, HSC and DFS are quite similar, as are BFS and DFS. In ATP, HSC and CPP obtained the same result. Curiously, BFS performed better than DFS under the ATP criterion, the only case where this happens.

Figure 21 shows the average results of the second case study, referring to the 20 scenarios of SWPDC. To perform

Fig. 15 Dispersion of average length of test cases by HSC

Fig. 16 Dispersion of average length of test cases by CPP

Fig. 17 Generation time

Fig. 18 Efficiency with mutation testing

Fig. 19 Average result for APEX product

Fig. 20 Efficiency results for APEX product by each criterion

the analysis, we calculated the average over the 20 FSMs of SWPDC for each method. HSC and CPP achieved the best results in 4 out of 7 analyses. Between the graph-search methods, BFS presented the worst results. The Eulerian methods, again, achieved the smallest number of test cases and test suite size. The lengths of the smallest and largest test cases and the average length of test cases from the Eulerian methods are the same because they generate a single sequence with all transitions of the FSM. Again, these two methods present zero standard deviation and variance. These scenarios are repeated for the AT criterion. For the ATP criterion, the result of BFS in the test suite size analysis is much higher than the others. In general, the results of the Eulerian methods did not present a high variation between the applied criteria.

Figure 22 shows the efficiency analysis of the 20 scenarios of SWPDC, presenting the distribution of the mutation score obtained by applying each method and criterion. It should be noted that the outliers of the violin plot are only a representation and do not correspond to the maximum and minimum values of this result (the maximum and minimum are represented by the vertical line in the center, with the white point being the average). CPP and HSC obtained similar results for the AT and ATP criteria, with a wider distribution in comparison with BFS and DFS. In fact, one can see that the distribution of their results is close to the maximum and minimum values. Due to this, the BFS and DFS methods were better than HSC and CPP. This happens because SWPDC has two particular kinds of models: 1) models whose states each have only one transition to another state, without any branch; and 2) models in which only one state has a branch. In these cases, DFS and BFS can outperform HSC and CPP. In ATP, this situation is more evident in the DFS result in comparison to the others.

Figure 23a and b show the generation time in milliseconds. DFS was the fastest method with the AT criterion, and CPP was the best with the ATP criterion. Among the Eulerian methods, CPP was better than HSC in all cases, needing less time to generate test cases.

In relation to the average distribution, the Eulerian methods behave similarly under all criteria. BFS and DFS have the same distribution under the AT criterion, but in ATP the average is different; for BFS the average is close to the maximum limit.

7 Discussion

This paper presented a comparative study of test case generation with four graph-based algorithms, considered the most investigated in the literature according to the systematic mapping. Two methods are classic graph-search algorithms, BFS and DFS, and the other two are Eulerian methods, HSC and CPP. The criteria used are AT and ATP. The methods used the original graph to apply the AT criterion, whereas a Dual Graph was used to apply the ATP criterion. 875 random FSMs and real-world FSMs from two embedded software products of space applications were used in the experiment.

In general, CPP presented better results across all FSMs, both with AT and ATP. It provides the best results in terms of number of test cases and test suite size, with behavior better than or similar to HSC in all results. CPP also presents a low distribution of average length, since its algorithm generates a minimal balanced graph and, thus, a minimal test sequence. The method loses to BFS and DFS in terms of the length of the smallest and largest test cases and the average length, but these results were expected since Eulerian methods generate a single sequence containing all transitions of the graph.

Looking just at the results of APEX, CPP loses to BFS and DFS. Since APEX is a single FSM and this result does not repeat under the ATP criterion, this indicates that for small FSMs, CPP may not be a better choice than the other methods, because CPP is a brute-force algorithm. However, the generation time CPP needed to produce test cases is practically the same as that of BFS, so even in this case it is fair to say that BFS and CPP tied.

In general, we observed that the Eulerian methods (mainly CPP) performed better when applied to large FSMs, while graph-search methods are better for smaller FSMs. It is worth mentioning that, in all cases, CPP was better than HSC. This is because the balancing process of CPP is better than that of HSC, generating a minimal balanced graph. In our empirical analysis, the balanced graph of CPP is around three times smaller than the balanced graph generated by HSC.

The analyses of this study are not focused on showing, explicitly, which method and criterion are the best to generate test cases; instead, they can guide a test analyst in evaluating which are the best for a particular problem. For example, for space application software, or even embedded software in general, it is more important to generate as many test cases as possible that cover all system gaps, due to the system

Fig. 21 Average results for 20 scenarios of SWPDC product

criticality. So, Eulerian algorithms may be the better choice since they generate a minimum number of test cases that cover all system states, trying to find as many bugs as possible. Comparing HSC and CPP, the latter may be a better choice since its balancing process is more efficient and faster. If more precision is important, that is, if more test cases are necessary to cover the model, ATP can be used as an alternative. On the other hand, if the SUT is not so complex and the process has different test steps such as Technical Acceptance Testing (TAT) or User Acceptance Testing (UAT), and also as there may be

a limited budget to invest in testing, a classic graph-search algorithm may be an option since it generates as many test cases as possible quickly. Based on the results, BFS can be the better choice among the classic search algorithms. In general, the results of our study showed that the behavior of the algorithms naturally depends on the structure of the graph, as shown by the results for random and real-world FSMs. So, even for a space application, it is important to analyze how much precision is needed for the test cases and what the size of the graph to be covered is.

Fig. 22 Average mutation score results for 20 scenarios of SWPDC product

Fig. 23 Generation time results for real-world FSMs

8 Conclusion

In general, this work concludes four years of research in MBT with the objective of finding new methods for generating test cases from FSMs. Initially, this research started with an experimental study investigating switch cover test sets and classic graph methods (BFS, DFS) from the literature. With this research, we identified that the balancing heuristic of H-Switch Cover was not optimized. It was then necessary to identify other methods in the literature that balance the graph of an FSM. We generalized this search to all methods that generate test cases from an FSM, with the purpose of seeing which methods are being used in MBT; for this purpose a systematic mapping was conducted. Finally, considering only graph-based algorithms, an experimental study was conducted to compare the most used graph-based methods for MBT.

The main contributions of this work are a systematic mapping that summarizes the MBT literature to investigate the most used methods to generate test cases from FSMs, and a comparison of the most used methods on random and real-world FSMs to determine which method is best in which cases.

The different analyses of the methods' behavior can support the choice of the method that best fits a given project. The test expert can identify the trade-offs between test case characteristics. It is at the discretion of the testing specialist to decide which algorithm to use, considering the experimental results and the characteristics of the system being tested. So, we believe that the results of this work can guide a test analyst, whether at INPE or in industry, in defining the best method for a problem.

The study selection and data extraction stages were initially performed by the first author. In order to reduce subjectivity, the advisors performed these same stages over a sample of studies, and the samples were compared in order to detect possible bias.

Our review was limited by the search terms used in the Scopus database. Although Scopus is considered the largest abstract and citation database, we tried to overcome possible limitations by using backward and forward snowballing. We also considered papers from a control group to calibrate the search string. The categorization of papers was based on the research questions (method, criterion, formal model and evaluation strategies). Even so, we may have left some valuable studies out of our analysis, since we considered only papers indexed by the Scopus database and those obtained from snowballing. However, the studies discussed in this mapping provide an overview of the existing empirical research on MBT.

Another limitation of this work is the set of FSMs used to compare the methods. We cannot claim that the experimental results hold for all FSMs, since the random FSMs do not cover all types of FSMs. Also, even though the real-world FSMs come from space applications, they are relatively small compared to those of other experiments at INPE and other institutes. So, it is important to perform further studies with more real-world FSMs. Still, we expect these results to show a tendency of the methods' behavior with big and small FSMs in the use cases shown.

As future work, we intend to implement the methods investigated in this project in WEB-PerformCharts6 [5], a tool created in the Laboratory of Computing and Applied Mathematics (LABAC) at INPE that uses formal models to specify a reactive system and generates test cases from a software specification. The following methods are already implemented: DS, UIO, TT and HSC. With the incorporation of new methods into the tool, we expect to allow other researchers to conduct new experiments and to use it to generate test cases for their systems. In addition, we also intend to conduct a detailed study on the complexity issue, i.e., the scalability of the methods regarding the cost of test generation.

6https://webperformcharts.lcc.unifesspa.edu.br/nova.html

Acknowledgments The authors would like to thank CAPES for the financial support and Dr. Rafael Santos for his help with the analyses. A.T. Endo and E.F. Souza are partially financially supported by CNPq/Brazil (grant numbers 420363/2018-1 and 432247/2018-1, respectively).

References

1. Ali S, Briand LC, Rehman MJ-U, Asghar H, Iqbal MZZ, NadeemA (2007) A state-based approach to integration testing based onuml models. Inf Softw Technol 49(11-12):1087–1106

2. Amaral A (2005) Geracao de casos de testes para sistemasespecificados em statecharts. (INPE-14215-TDI/1116)

3. Ammann P, Offutt J (2008) Introduction to Software Testing.Cambridge University Press, Cambridge. [S.l.]

4. Anbunathan R, Basu A (2013) Dataflow test case generation fromuml class diagrams. In: Proc. IEEE international conference oncomputational intelligence and computing research, 2013. IEEE,pp 1-?9

5. Arantes AO, de Santiago VA, Vijaykumar NL, De Souza EF(2014) Tool support for generating model-based test cases viaweb. Int J Web Eng Technol iiWAS 9(1):62–96

6. Barmi ZA, Ebrahimi AH, Feldt R (2011) Alignment ofrequirements specification and testing: a systematic mappingstudy. In: Proc. international conference on software testing,verification and validation workshops, 2011. IEEE, pp 476?-485

7. Belli F, Beyazit M, Takagi T, Furukawa Z (2012) Model-basedmutation testing using pushdown automata. IEICE Trans Inf Syst95(9):2211–2218

8. Belli F, Beyazit M, Takagi T, Furukawa Z (2013) Mutationtesting of “go-back” functions based on pushdown automata. In:Proc. international conference on software testing, verification andvalidation, 2011. IEEE, pp 249–258

9. Belli F, Endo AT, Linschulte M, Simao A (2014) A holisticapproach to model-based testing of web service compositions.Software: Practice and Experience 44(2):201–234

10. Belli F, Hollmann A, Kleinselbeck M (2009) A graph-model-based testing method compared with the classification tree methodfor test case generation. In: Proc. international conference onsecure software integration and reliability improvement, 3., 2009.IEEE, pp 193–200

11. Bondy J, Murty U (2008) Graduate texts in mathematics: graphtheory. Springer, USA

12. Brito RC, Martendal DM, de Oliveira HEM (2003) Maquinas deestados finitos de mealy e moore

13. Broy M, Jonsson B, Katoen JP, Leucker M, Pretschner A (2005)Model-based testing of reactive systems. Springer, Berlin

14. Cartaxo EG, Machado PD, Neto FGO (2011) On the use of asimilarity function for test case selection in the context of model-based testing. Software Testing, Verification and Reliability21(2):75–100

15. Chow TS (1978) Testing software design modeled by finite-statemachines. IEEE Trans Softw Eng SE-4(3):178–187

16. Dang T, Nahhal T (2009) Coverage-guided test generation forcontinuous and hybrid systems. Formal Methods in SystemDesign 34(2):183–213

17. Devroey X, Perrouin G, Schobbens P-Y (2014) Abstract test casegeneration for behavioural testing of software product lines, 86–93

18. Dorofeeva R, El-Fakih K, Maag S, Cavalli AR, YevtushenkoN (2010) Fsm-based conformance testing methods: a surveyannotated with experimental evaluation. Inf Softw Technol52(12):1286–1297

19. Dorofeeva R, El-Fakih K, Yevtushenko N (2005) An improvedconformance testing method. In: Proc. international conference onformal techniques for networked and distributed systems, 2005,pp 204–218

20. Edmonds J, Johnson EL (1973) Matching, euler tours and thechinese postman. Math Program 5(1):88–124

21. Endo AT, Simao A (2013) Evaluating test suite characteristics,cost, and effectiveness of fsm-based testing methods. Inf SoftwTechnol 55(6):1045–1062

22. Felizardo KR, Mendes E, Kalinowski M, Souza EF, VijaykumarNL (2016) Using forward snowballing to update systematicreviews in software engineering. In: International symposium onempirical software engineering and measurement (ESEM)

23. Gill A et al (1962) Introduction to the theory of finite-statemachines. [S.I.] McGraw-Hill, New York

24. Gonenc G (1970) A method for the design of fault detectionexperiments. IEEE Trans Comput 100(6):551–558

25. Gurbuz HG, Tekinerdogan B (2017) Model-based testing forsoftware safety: a systematic mapping study. Softw Qual J26(4):1–46

26. Hessel A, Pettersson P (2007) A global algorithm for model-basedtest suite generation. Electronic Notes in Theoretical ComputerScience 190(2):47–59

27. IEEE (2005) Ieee standard for software verification and validation.IEEE Std 1012-2004 (Revision of IEEE Std 1012-1998), pp 1–110

28. Jia Y, Harman M (2011) An analysis and survey of thedevelopment of mutation testing. Trans Softw Eng 37(5):649–678

29. Just R, Jalali D, Inozemtseva L, Ernst MD, Holmes R, Fraser G (2014) Are mutants a valid substitute for real faults in software testing? In: Proceedings of the 22nd ACM SIGSOFT international symposium on foundations of software engineering (FSE-22), Hong Kong, China, November 16-22, 2014. ACM, pp 654–665

30. Khan MU, Iftikhar S, Iqbal MZ, Sherin S (2017) Empirical studies omit reporting necessary details: a systematic literature review of reporting quality in model based testing. Computer Standards & Interfaces

31. Kitchenham BA (2007) Guidelines for performing systematic literature reviews in software engineering. Keele University, Durham. Technical Report EBSE-2007-01

32. Lee D, Yannakakis M (1996) Principles and methods of testing finite state machines - a survey. Proc IEEE 84(8):1090–1123

33. Li N, Offutt J (2017) Test oracle strategies for model-based testing. IEEE Trans Softw Eng 43(4):372–395

34. Majeed S, Ryu M (2016) Model-based replay testing for event-driven software. In: Proc. Annual ACM Symposium on Applied Computing, 31., 2016. ACM, pp 1527–1533

35. Mariano MM, Souza EF, Endo AT, Vijaykumar NL (2016) A comparative study of algorithms for generating switch cover test sets. In: Anais..., Maceio. Simposio Brasileiro de Qualidade de Software, 15., 2016

36. Mariano MM, Souza EF, Endo AT, Vijaykumar NL (2019a) Analyzing graph-based algorithms employed to generate test cases from finite state machines. In: 2019 IEEE Latin American test symposium (LATS), pp 1–6

37. Mariano MM, Souza EF, Endo AT, Vijaykumar NL (2019) Identifying approaches to generate test cases in model-based testing: a systematic mapping. In: Proceedings of the Iberoamerican conference on software engineering, 22., 2019, Havana, pp 1–14

38. Mathur AP (2008) Foundations of software testing. Pearson Education, London

39. Myers GJ, Sandler C, Badgett T (1976) The art of software testing. Wiley, New York

40. Naito S (1981) Fault detection for sequential machines by transition-tours

J Electron Test (2019) 35:867–885 883


41. Neumann F (2004) Expected runtimes of evolutionary algorithms for the Eulerian cycle problem. In: Proc. Congress on Evolutionary Computation, 2004. IEEE, pp 904–910

42. Papadakis M, Shin D, Yoo S, Bae D (2018) Are mutation scores correlated with real fault detection? A large scale empirical study on the relationship between mutants and real faults. In: Proceedings of the 40th international conference on software engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018. ACM, pp 537–548

43. Petersen K, Vakkalanka S, Kuzniarz L (2015) Guidelines for conducting systematic mapping studies in software engineering: an update. Inf Softw Technol 64:1–18

44. Pimont S, Rault J-C (1976) A software reliability assessment based on a structural and behavioral analysis of programs. In: Proc. international conference on software engineering, 2., 1976. IEEE, pp 486–491

45. Pinheiro AC, Simao A, Ambrosio AM (2014) FSM-based test case generation methods applied to test the communication software on board the ITASAT university satellite: a case study. J Aerosp Technol Manag 6(4):447–461

46. Pressman RS (2005) Software engineering: a practitioner's approach. Palgrave Macmillan

47. Proch S, Mishra P (2014) Directed test generation for hybrid systems. In: Proc. International Symposium on Quality Electronic Design, 15., 2014, pp 156–162

48. Santiago V, Vijaykumar NL, Guimaraes D, Amaral AS, Ferreira E (2008) An environment for automated test case generation from statechart-based and finite state machine-based behavioral models. In: Proc. Software Testing Verification and Validation Workshop (ICSTW'08), IEEE International Conference on, pp 63–72

49. Satpathy M, Yeolekar A, Peranandam P, Ramesh S (2012) Efficient coverage of parallel and hierarchical stateflow models for test case generation. Software Testing, Verification and Reliability 22(7):457–479

50. Siavashi F, Truscan D (2014) Environment modeling in model-based testing: concepts, prospects and research challenges: a systematic literature review. In: Proc. International Conference on Evaluation and Assessment in Software Engineering, 19., 2015. ACM, New York, pp 30:1–30:6

51. Sidhu DP, Leung T-K (1989) Formal methods for protocol testing: a detailed study. IEEE Trans Softw Eng 15(4):413–426

52. Simao A, Petrenko A, Maldonado J (2009) Comparing finite state machine test coverage criteria. IET Softw 3(2):91–105

53. Souza EF, Santiago VA Jr, Guimaraes D, Vijaykumar NL (2008) Evaluation of test criteria for space application software modeling in statecharts. In: Proc. international conference on computational intelligence for modelling control and automation, 2008

54. Souza EF, Santiago VA Jr, Vijaykumar NL (2017) H-switch cover: a new test criterion to generate test case from finite state machines. Softw Qual J 25(2):373–405

55. Souza SDRSD (2000) Validacao de especificacoes de sistemas reativos: definicao e analise de criterios de teste (in Portuguese)

56. Vu T-D, Hung PN, Nguyen V-H (2015) A method for automated test data generation from sequence diagrams and object constraint language, pp 335–341

57. Wang W, Sampath S, Lei Y, Kacker R, Kuhn R, Lawrence J (2016) Using combinatorial testing to build navigation graphs for dynamic web applications. Software Testing, Verification and Reliability 26(4):318–346

58. Yao J, Wang Z, Yin X, Shiyz X, Wu J (2014) Formal modeling and systematic black-box testing of SDN data plane. In: Proc. international conference on network protocols, 22., 2014. IEEE, pp 179–190

Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Matheus Monteiro Mariano is a Technologist in Database from FATEC - Faculdade de Tecnologia do Estado de Sao Paulo (2016) and holds a Master's in Applied Computing (CAP) from INPE - National Institute for Space Research (2019). Currently, he is a Software Developer at Johnson & Johnson. He has experience in Software Engineering in general, and his main interests are research and development related to databases and Software Engineering, applying Software Testing knowledge whenever possible to improve and maintain systems and development processes. He also works with heuristics and algorithms to solve computational problems.

Erica Ferreira de Souza holds a degree in Data Processing from Faculdade de Tecnologia do Estado de Sao Paulo (2005). She has Master's (2010) and PhD (2014) degrees from the National Institute for Space Research (INPE) in Brazil. She is a Professor at the Federal University of Technology - Parana (UTFPR), Cornelio Procopio, Brazil. She has experience in software engineering and knowledge management. Her main research interests include model-based testing, knowledge management applied in software engineering, and systematic literature reviews.

Andre Takeshi Endo received a bachelor's degree in Analysis of Systems from the Universidade Federal de Mato Grosso do Sul (UFMS), Brazil, in 2005. He also holds M.Sc. and Ph.D. degrees in Computer Science from Universidade de Sao Paulo (USP), Brazil, awarded in 2008 and 2013, respectively. He is an Assistant Professor at the Federal University of Technology - Parana (UTFPR), Cornelio Procopio, Brazil. Currently, he is conducting postdoctoral studies at Aarhus University, Denmark. Since he left his hometown Alvares Machado in 2002, he has pursued his professional passions for computers, programming, and research. His areas of interest include software engineering with emphasis on software testing, dynamic analysis, mobile computing, and Web applications.

Nandamudi Lankalapalli Vijaykumar is with the National Institute for Space Research (INPE) as a Voluntary Research Collaborator at the Laboratory of Computing and Applied Mathematics (LAC), Sao Jose dos Campos, SP, Brazil. He is also with the Federal University of Sao Paulo (UNIFESP) as a Visiting Associate Professor at the Institute of Science & Technology (ICT), Sao Jose dos Campos, SP, Brazil. He holds a PhD (1999) in Informatics from the Aeronautics Institute of Technology (ITA). His areas of interest include Markov Models, Model-Based Testing, Computational Modeling of Coastal Environment, and Data Analysis.


Affiliations

Matheus Monteiro Mariano1 · Erica Ferreira de Souza2 · Andre Takeshi Endo2 · Nandamudi Lankalapalli Vijaykumar3,4

Erica Ferreira de Souza [email protected]

Andre Takeshi Endo [email protected]

Nandamudi Lankalapalli Vijaykumar [email protected]; [email protected]

1 Laboratory of Computing and Applied Mathematics, National Institute for Space Research (INPE), Av. dos Astronautas, 1.758 - Jardim da Granja, Sao Jose dos Campos, SP, 12227-010, Brazil

2 Department of Computing, Federal University of Technology (UTFPR), Av. Alberto Carazzai, 1640, Cornelio Procopio, PR, 86300-000, Brazil

3 ICT - Institute of Science & Technology, Federal University of Sao Paulo, Av. Cesare Monsueto Giulio Lattes, 1201, Sao Jose dos Campos, Brazil

4 LABAC, National Institute for Space Research, Av. dos Astronautas, 1758, Sao Jose dos Campos, Brazil
