
Machine Learning Based Framework for Software Product Line Testing

Ashish Saini#1, Rajkumar#2, Gaurav Kumar*3

#Department of Computer Science, Gurukula Kangri (Deemed to be University), Haridwar, Uttarakhand, India

*Department of Electronics and Telecommunications, College of Engineering Roorkee, Roorkee, Uttarakhand, India

Abstract— A software product line comprises a family of software products that share a common set of features. Because the number of possible products can grow exponentially with the number of features, it is not feasible to test every product of the product line individually. Since the time budget for testing is limited, or even unknown a priori, the order in which products are tested is critical to effective product line testing. Regression testing re-tests a product after it has been changed (for example, after a new version or product is developed). Due to a lack of resources, only a subset of the test cases is executed for a specific product, which risks leaving out important test cases. Regression testing techniques therefore minimize and prioritize the test cases. Existing techniques usually require source code, which makes them time-consuming and complex to apply, and access to the source code of complex applications is often restricted. Such applications can instead be tested by black-box testing. In this paper, we propose a machine learning-based technique to test a software product line. We apply Fuzzy C-Means clustering to minimize the test cases and a ranking Support Vector Machine to prioritize the remaining test cases.

Keywords— Product Line, Software Product Line Engineering,

Product Line Testing, Support Vector Machine, Fuzzy C-Means

Clustering.

I. INTRODUCTION

Testing even a single software system is a difficult and expensive phase of the software development process. Testing is the means of providing assurance about the quality of a product. The Software Product Line (SPL) is a paradigm adopted by organizations to increase their productivity within a limited time. To remain stable and sustainable in the market, organizations want to deliver products that meet both market and customer needs. Modern systems are generally very complex, so testing is an important part of the overall development of an SPL.

Testing is performed to uncover problems, which then guides further debugging. Developing a software product without testing raises doubts about the reliability of the product. In single-system software development, testing is an important and difficult phase. In the context of an SPL, testing is an even more complex and expensive task because it has to deal with a large number of variants of the software.

It is difficult to derive and execute all the variants of a software product line. Regression testing is a way to test an SPL efficiently. It can be carried out by retesting all the test cases, by minimizing or selecting test cases, or by prioritizing test cases; these are the sub-methods through which an SPL can be tested.

An SPL cannot be tested completely in a single pass. Recently, Jung et al. conducted regression testing of an SPL by selecting test cases at the code level. According to the authors, it is the first method that performs regression testing at code depth while satisfying the assumptions of an SPL. However, this method addresses only one part of SPL testing, namely test case selection. It does not cover test case prioritization or test case generation.

Testing an SPL completely requires a method that covers the whole product line. In this direction, we propose a framework that tests the SPL in a fully automated way, without human intervention. The framework consists of three phases: generation of test cases, reduction of the generated test cases, and prioritization of the test cases that remain after reduction. To make the framework fully automated, we use recent technologies.

The remainder of this paper is organized as follows. Section II gives the background on feature modelling, product configuration, and machine learning. Related work is described in Section III. We discuss our proposed method in Section IV. Section V discusses the algorithms used for testing. Section VI discusses threats to the validity of the work. Finally, Section VII concludes the paper with future plans regarding the testing of product lines.

The Engineering Journal of Application & Scopes, Volume 5, Issue 2, Dec 2020 ISSN No. 2456-0472

II. BACKGROUND

In an SPL, the fundamental artifact is the feature model, or feature diagram. A feature diagram is a compact hierarchical representation of all the products of an SPL. Features are represented by nodes, and the connections between the nodes show the relationships between the features. A mandatory relationship means that the child feature is part of every product that contains its parent. An optional relationship means that the feature may or may not be present in a product. An Alternative relationship between a parent and its children means that exactly one child is part of the product if the parent is part of the product. An Or relationship between a parent and its children means that one or more children can be included in the product if the parent is part of the product.

Fig. 1 Example of GPS Feature Model

Apart from these basic relationships between the features of an SPL, the cross-tree constraints (CTC) Requires and Excludes can also exist between features. If a feature a has a Requires relation with a feature b, then every product that contains a must also contain b. An Excludes relationship between two features, on the other hand, means that both features cannot be part of the same product.

A. Product Configuration

A configuration is a subset of the set of features F, and a product configuration is a configuration that satisfies all the constraints that exist between the features. If a product satisfies all the constraints, it is said to be valid; otherwise, it is said to be invalid.
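The validity check described above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the `FeatureModel` class, the `is_valid` function, and the feature names are hypothetical (they are not the model of Fig. 1 and not the API of any tool), and the sketch assumes the usual reading of the relationships, where an Alternative group admits exactly one child and an Or group admits at least one.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureModel:
    """A tiny feature model: tree relations plus cross-tree constraints."""
    root: str
    mandatory: list = field(default_factory=list)    # (parent, child)
    optional: list = field(default_factory=list)     # (parent, child)
    alternative: list = field(default_factory=list)  # (parent, [children])
    or_groups: list = field(default_factory=list)    # (parent, [children])
    requires: list = field(default_factory=list)     # (a, b): a requires b
    excludes: list = field(default_factory=list)     # (a, b): never both


def is_valid(model, config):
    """Return True if the selected features satisfy every constraint."""
    selected = set(config)
    if model.root not in selected:
        return False
    # A child can only be selected together with its parent.
    for parent, child in model.mandatory + model.optional:
        if child in selected and parent not in selected:
            return False
    # Mandatory: the child is present whenever the parent is.
    for parent, child in model.mandatory:
        if parent in selected and child not in selected:
            return False
    # Alternative: exactly one child if the parent is present, else none.
    for parent, children in model.alternative:
        count = sum(c in selected for c in children)
        if count != (1 if parent in selected else 0):
            return False
    # Or: at least one child if the parent is present, else none.
    for parent, children in model.or_groups:
        count = sum(c in selected for c in children)
        if parent in selected and count == 0:
            return False
        if parent not in selected and count > 0:
            return False
    # Cross-tree constraints.
    for a, b in model.requires:
        if a in selected and b not in selected:
            return False
    for a, b in model.excludes:
        if a in selected and b in selected:
            return False
    return True


# Hypothetical model, loosely themed on a GPS device (not the model of Fig. 1).
gps = FeatureModel(
    root="GPS",
    mandatory=[("GPS", "Routing")],
    optional=[("GPS", "Traffic"), ("GPS", "Keyboard")],
    alternative=[("GPS", ["Touch", "LCD"])],
    or_groups=[("Routing", ["3DMap", "AutoReroute"])],
    requires=[("Keyboard", "Touch")],
    excludes=[("Traffic", "LCD")],
)

print(is_valid(gps, {"GPS", "Routing", "3DMap", "Touch"}))         # True
print(is_valid(gps, {"GPS", "Routing", "3DMap", "Touch", "LCD"}))  # False
```

The second configuration is rejected because it selects both children of the Alternative group.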

B. Machine Learning

Machine learning is a class of computational methods that improve automatically through experience, without human assistance and without being explicitly programmed for the task. During training, we feed the machine the input data together with the expected output so that it builds its own logic; this logic is later evaluated in the testing phase. Machine learning can be categorized into supervised, unsupervised, and reinforcement learning.

III. RELATED WORK

An SPL is a set of software systems built from a common set of features. Andrés et al. [1] proposed a formal framework for SPLs. The framework provides a formal semantics for FODA-like (Feature-Oriented Domain Analysis) models and is also adaptable to new problems concerning SPLs. For this framework, the authors define SPLA, an algebraic language in which SPLs are specified. The approach provides three semantics: an operational semantics, a denotational semantics, and an axiomatic semantics. The authors prove that the three semantics are equivalent and show how FODA is translated into SPLA. They also developed a tool, AT, which uses a SAT solver to check the satisfiability of an SPL.

Model-based development is a widely used approach to implementing software, in which models replace source code as the primary executable artifacts. Pietsch et al. [2] proposed a method that addresses two challenges: writing the edit commands of a delta manually is difficult, and very few methods are available for working with deltas. The contribution of this approach is a delta-based modelling framework for SPLs. The goal is to solve the issues that arise from conventional deltas and manual programming. The authors used the SiPL framework to implement their approach.

A security requirements-based framework was proposed by Mellado et al. [3]. The method focuses on security requirements and facilitates the development of secure SPLs and their derived products. The proposed framework comprises the SREPPLine tool, which is driven by security standards, to manage security requirements. The process is defined with SPEM 2.0, and the data repository is specified by an XML grammar. The contribution of this work is a systematic technique for handling variability and security requirements in line with the most relevant security standards. Alam et al. [4] proposed a secure framework that combines aspect-oriented and feature-oriented methods. The aspect-oriented technique focuses on crosscutting concerns and functional behaviour, whereas the feature-oriented technique addresses the variability and commonality of the SPL. The authors incorporate security standards through languages such as XACML (eXtensible Access Control Markup Language) or xADL (eXtensible Architecture Description Language), and include a vocabulary for security requirements in the model.

The common and variant features of an SPL are marked with the help of variation points in the orthogonal variability model [5]. The drawbacks of this model concern its hierarchical structure and its expressiveness. Kim et al. [6] proposed a framework for domain requirements analysis and for modelling the core architecture of an SPL. The framework provides a mapping between the requirements and the reference architecture of the product line, with supporting tools and process methods. The method involves concepts such as goal-oriented domain requirements analysis, the analytic hierarchy process (AHP), the matrix method, and architecture styles. It helps to identify and build components at the business, service, interaction, and internal levels. Finally, a reference model is designed based on components and quality attributes.

Tanhaei et al. [7] presented an architecture-based method for choosing the constituent components of an SPL. The component-based method is used to control and manage the selection of components, which reduces software development cost and risk. Components are chosen from the component repository or from COTS components. If suitable components are not available, they are developed. The selected components are sent for approval, and the approved components then proceed to the integrity test.

IV. PROPOSED METHODOLOGY

The proposed methodology starts with the feature model, as shown in Figure 2. The feature model is created with FeatureIDE [8], a plug-in for Eclipse. FeatureIDE is used to create a user-defined feature diagram and also supports analyzing, editing, and testing a feature model. FeatureIDE also provides the CASA, ICPL, Chvatal, and IncLing t-wise sampling algorithms, and JUnit is used to test a feature model.

The next step is to generate the test cases, i.e. the products, from the feature diagram. To generate the products we used the FaMa tool [9], a well-known open-source command-line tool in the research community. The tool generates products that satisfy all the constraints of a feature diagram. Operations such as product generation, error detection, and validity checking can be performed in FaMa.

Fig 2: Concept of Product Line Testing using Machine Learning

In single-system software, a sizeable number of test cases is generated. In an SPL, moreover, the number of products grows exponentially. It is therefore not feasible to test all the products, because doing so consumes so much time and so many resources that it becomes very expensive. To reduce this expense, we select or minimize the test cases that proceed to the next stage.

The selected test cases proceed to the training stage, which starts with the test expert. The test expert selects a set of positive test cases, which have high importance or are significant for other reasons, and a set of negative test cases, which have low importance; this yields catalogs of valid and invalid test cases. The aim of the training phase is to generate a final list of products. The products arrive from the previous stage in no particular order, but to fulfil the requirements of testing, these unordered test cases must be prioritized. Test case minimization/selection and prioritization are parts of regression testing. In our approach, a machine learning technique is used to prioritize the test cases.

For the machine learning step, we used a supervised learning algorithm. For training, the test expert divides the products into valid and invalid test cases, and the training data is prepared on the basis of these valid and invalid products. The learning algorithm is then applied to the prepared data, and finally an ordered list of products is generated. The whole process works as a black-box testing method; white-box testing techniques require the source code and are therefore not covered in this paper.

To test the software product line, the training data is used as input for the ML algorithm. The proposed approach is compatible with different ML algorithms, provided that they meet the following requirements:

a) They emulate the decisions made by test experts based on valid and invalid test cases.

b) They are able to handle large-scale product lines.

c) They produce a ranked classification model, i.e. for each input the output is a value that indicates its priority.

V. ALGORITHM APPLIED FOR TESTING

Clustering is a mechanism that divides a population of data into groups. Clustering can be categorized into two types: hard clustering and soft clustering. When each data element belongs to exactly one cluster after the division, the clustering is said to be hard. In fuzzy or soft clustering, on the other hand, an element can belong to more than one cluster and is associated with a set of membership levels. These levels indicate the strength of the association between the data element and each cluster. Fuzzy clustering is the process of assigning these membership levels and then using them to assign data elements to one or more clusters. An element can thus belong, with different degrees of membership, to more than one cluster, and the boundaries among the clusters are vague. Hence, data elements that lie at the boundary of a cluster may belong to it with a lower degree than elements situated at the center of the cluster.

To reduce the data by clustering, both the K-means and the C-means methods can be used, and the two are similar. In the proposed method we use fuzzy C-means (FCM) clustering. It is a mathematical model for partitioning a set of data into two or more clusters. The FCM algorithm partitions a finite set of n data points Y = {Y1, Y2, Y3, ..., Yn} into a set of c fuzzy clusters with respect to some given criterion. For a finite set of data, the algorithm returns a list of cluster centers V = {V1, V2, V3, ..., Vc} and a partition matrix

$$P = [V_{ij}], \quad V_{ij} \in [0, 1], \quad i = 1, 2, \ldots, n, \quad j = 1, 2, \ldots, c$$

where Vij is the degree to which element Yi belongs to cluster Vj. The goal of the FCM algorithm is to minimize the objective function

$$O_m = \sum_{i=1}^{n} \sum_{j=1}^{c} (V_{ij})^m \, \|Y_i - V_j\|^2, \quad 1 \le m < \infty \qquad (1)$$

where ‖Yi − Vj‖ is the Euclidean distance between the ith data point and the jth cluster center, and Vij is the membership function, defined as:

$$V_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\|Y_i - V_j\|}{\|Y_i - V_k\|} \right)^{\frac{2}{m-1}}} \qquad (2)$$



Here, the objective function Om is the sum of the squared distances between the data points and the cluster centers, each weighted by the membership value Vij raised to the power of the fuzzifier m. The fuzzifier m determines the level of fuzziness of the clusters. A larger m gives smaller membership values Vij and hence fuzzier clusters. In the limit m = 1, the memberships Vij converge to 0 or 1, which gives a crisp partition. In the absence of domain knowledge or experimentation, m is commonly set to 2 or more.
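The effect of the fuzzifier can be checked numerically with a small sketch of equation (2). The `membership` helper and the two distances are hypothetical values chosen only for illustration.

```python
def membership(distances, j, m):
    """Membership of one point in cluster j, following equation (2).

    distances[k] is the (non-zero) distance from the point to center k.
    """
    exponent = 2.0 / (m - 1.0)
    return 1.0 / sum((distances[j] / d_k) ** exponent for d_k in distances)


# Hypothetical point: distance 1 from center 0 and distance 3 from center 1.
distances = [1.0, 3.0]
for m in (1.1, 2.0, 5.0):
    row = [membership(distances, j, m) for j in range(len(distances))]
    print(m, [round(v, 3) for v in row])
```

For m = 2 the memberships are 0.9 and 0.1. For m = 5 they move towards each other (about 0.634 and 0.366), and for m = 1.1 they are almost crisp (close to 1 and 0). In every case the memberships of the point sum to 1.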

The centroid of a cluster is the mean of all the points, weighted by their degree of belonging to the cluster:

$$V_j = \frac{\sum_{i=1}^{n} V_{ij}^{m} \, Y_i}{\sum_{i=1}^{n} V_{ij}^{m}} \qquad (3)$$

The degree of belonging Vij is inversely related to the distance between the data point and the cluster center. It also depends on the parameter m, which controls how much weight is given to the closest center. The algorithm can be summarized in the following steps [10]:

• Select the number of clusters.

• Randomly assign to each data point its membership coefficients for the clusters.

• Repeat the following steps until the memberships converge, i.e. max |Vij(k+1) − Vij(k)| < β, or until the maximum number of iterations (maxit) is reached, where β is the sensitivity threshold, between 0 and 1, that defines the termination criterion, and k is the iteration step:

    • Compute the centroid Vj of each cluster using (3).

    • Update the membership Vij of each data point using (2).

• Aggregate the cluster centers to calculate the standard deviation; the number of clusters at which this deviation is minimal is the optimal number of clusters that we require.
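The steps above can be sketched in a few lines of plain Python. This is a minimal illustration of the standard FCM iteration (centroids by equation (3), memberships by equation (2)), not the implementation used in [10]: the function name, its default parameters, and the sample data are assumptions, and the final step that selects the number of clusters from the standard deviation is left out. How representatives are then chosen from each cluster is not specified in the text, so the sketch stops at the memberships.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, beta=1e-5, maxit=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into `c` fuzzy clusters.

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random membership coefficients, normalised so each row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(maxit):
        # Centroids, equation (3): mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            weights = [u[i][j] ** m for i in range(n)]
            total = sum(weights)
            centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)))

        # Memberships, equation (2).
        exponent = 2.0 / (m - 1.0)
        new_u = []
        for p in points:
            dists = [math.dist(p, ctr) for ctr in centers]
            if min(dists) == 0.0:
                # The point coincides with a center: crisp membership.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_u.append([h / sum(hits) for h in hits])
                continue
            new_u.append([
                1.0 / sum((dists[j] / dk) ** exponent for dk in dists)
                for j in range(c)])

        # Termination: largest membership change below the threshold beta.
        change = max(abs(new_u[i][j] - u[i][j])
                     for i in range(n) for j in range(c))
        u = new_u
        if change < beta:
            break
    return centers, u


# Hypothetical test cases encoded as 2-D feature vectors (two obvious groups).
data = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers, u = fuzzy_c_means(data, c=2)
labels = [max(range(2), key=lambda j: row[j]) for row in u]
print(labels)
```

Taking the cluster with the highest membership, as in the last lines, turns the fuzzy partition into a crisp label per test case; the full membership rows remain available for elements that lie between clusters.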

The test case minimization problem is addressed by the clustering algorithm. To address the prioritization problem, we apply the ranking support vector machine (SVM Rank) [11]. SVM Rank can compute a ranked classification model even for large feature vectors and has provided good results in black-box regression testing [12]. Similar to an ordinary SVM, the algorithm computes a hyperplane in the n-dimensional feature vector space that creates a maximum margin between the two given class labels.
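The idea behind a ranking SVM can be illustrated with a deliberately simplified sketch. It is not the SVM Rank implementation of [11]: it is a linear model trained by plain batch subgradient descent on a pairwise hinge loss, and the function names, hyperparameters, and binary feature vectors are all assumptions made for the example. Each pair of a positive and a negative example contributes a constraint that the positive one should score higher by a margin of at least 1.

```python
def train_rank_svm(positives, negatives, lam=0.01, lr=0.1, epochs=200):
    """Learn a weight vector w so that w.x is larger for positive examples.

    Minimises lam/2 * ||w||^2 plus the mean hinge loss over all
    (positive, negative) pairs, using batch subgradient descent.
    """
    dim = len(positives[0])
    diffs = [[p[d] - q[d] for d in range(dim)]
             for p in positives for q in negatives]
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [lam * wd for wd in w]
        for diff in diffs:
            margin = sum(wd * xd for wd, xd in zip(w, diff))
            if margin < 1.0:  # pair is mis-ordered or inside the margin
                for d in range(dim):
                    grad[d] -= diff[d] / len(diffs)
        w = [wd - lr * gd for wd, gd in zip(w, grad)]
    return w


def prioritize(w, products):
    """Return product indices ordered from highest to lowest score."""
    scores = [sum(wd * xd for wd, xd in zip(w, x)) for x in products]
    return sorted(range(len(products)), key=lambda i: scores[i], reverse=True)


# Hypothetical binary feature vectors (1 = feature selected in the product).
important = [(1, 1, 0, 1), (1, 0, 0, 1)]    # labelled positive by the expert
unimportant = [(0, 1, 1, 0), (0, 0, 1, 0)]  # labelled negative by the expert
w = train_rank_svm(important, unimportant)

unordered = [(0, 1, 1, 0), (1, 1, 0, 1), (0, 0, 0, 0), (1, 0, 0, 0)]
print(prioritize(w, unordered))  # prints [1, 3, 2, 0]
```

In this toy data the first and fourth features occur only in positive examples and the third only in negative ones, so the learned weights rank products containing the former first.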

Our technique is designed to perform both test case minimization and test case prioritization. We therefore aim to evaluate whether it can indeed improve the effectiveness of regression testing. Effectiveness is measured using a specific metric, the Average Percentage of Faults Detected (APFD) [13]. It measures how quickly an ordered set of n test cases detects the m faults. It returns a value between 0 and 1, where 1 is the theoretical optimum and thus the best value. In particular, a higher value indicates that fault-revealing test cases are executed earlier in the prioritized order.

APFD is defined as follows [13]:

$$APFD = 1 - \frac{\sum_{i=1}^{m} TF_i}{n \cdot m} + \frac{1}{2n}$$

where n is the number of test cases (configurations, in our approach), m is the number of faults, and TFi is the position of the first test case in the ordering that exposes fault i.
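As a worked example, the formula can be computed directly from a fault matrix. The `apfd` function, the test case names, and the fault matrix below are hypothetical and serve only to show how the ordering changes the value.

```python
def apfd(ordering, faults_detected_by, num_faults):
    """APFD for a prioritised list of test cases.

    ordering:           test case ids in execution order
    faults_detected_by: dict mapping a test case id to the set of fault ids
                        it exposes (hypothetical fault matrix)
    num_faults:         m, the number of faults, each exposed by some test
    """
    n = len(ordering)
    first_position = {}
    for position, test in enumerate(ordering, start=1):
        for fault in faults_detected_by.get(test, ()):
            # Keep only the first position at which each fault is exposed.
            first_position.setdefault(fault, position)
    if len(first_position) != num_faults:
        raise ValueError("every fault must be exposed by at least one test")
    return 1 - sum(first_position.values()) / (n * num_faults) + 1 / (2 * n)


detects = {"t1": {"f1"}, "t2": set(), "t3": {"f1", "f2"}, "t4": {"f3"}}
good = apfd(["t3", "t4", "t1", "t2"], detects, num_faults=3)
bad = apfd(["t2", "t1", "t4", "t3"], detects, num_faults=3)
print(round(good, 3), round(bad, 3))  # prints 0.792 0.375
```

In the first ordering the three faults are first exposed at positions 1, 1, and 2, so APFD = 1 − 4/12 + 1/8 ≈ 0.792. In the second they are exposed at positions 2, 4, and 3, so APFD = 1 − 9/12 + 1/8 = 0.375.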

VI. THREATS TO VALIDITY

We aim to minimize any negative effects that may influence the evaluation results. We propose to develop a tool as part of this work, and this tool may contain faults. The functionality of the presented framework and its parameters has not yet been tested, which is a threat to the validity of the framework.

To solve our problem, we used machine learning algorithms (fuzzy clustering and SVM) in our approach. We chose popular techniques for reducing and prioritizing test cases, which have already been shown to produce desirable results; other machine learning algorithms may further improve the quality of testing.

VII. CONCLUSION AND FUTURE WORK

In this paper, we present an ML-driven approach for reducing test cases and prioritizing the remaining ones in black-box testing. Our technique combines two ML algorithms, fuzzy clustering and SVM Rank. Initially, we prepare test cases from the feature model. The test cases are reduced by the clustering technique, and the remaining test cases are prioritized with the help of the SVM. For prioritization, we prepare training data in which the test cases are divided into two classes (valid and invalid) by the expert, after which the ML algorithm performs the prioritization. The presented method has not yet been evaluated, which is a limitation of this paper and is part of our future work. In addition, we will test various feature models with other algorithms.

References

[1] C. Andrés, C. Camacho, and L. Llana, “A formal framework for

software product lines,” Inf. Softw. Technol., vol. 55, no. 11, pp.

1925–1947, 2013.

[2] C. Pietsch, T. Kehrer, U. Kelter, D. Reuling, and M. Ohrndorf,

“SiPL - A delta-based modeling framework for software product

line engineering,” in Proceedings - 30th IEEE/ACM International

Conference on Automated Software Engineering, ASE 2015, 2016,

pp. 852–857.

[3] D. Mellado, E. Fernández-Medina, and M. Piattini, “Security

requirements engineering framework for software product lines,”

Inf. Softw. Technol., vol. 52, no. 10, pp. 1094–1117, 2010.

[4] M. M. Alam, A. I. Khan, and A. Zafar, “A Secure Framework for

Software Product Line Development,” Int. J. Comput. Appl., vol.

159, no. February, pp. 33–40, 2017.

[5] K. Pohl, G. Böckle, and F. van der Linden, Software Product Line

Engineering: Foundations, Principles, and Techniques, vol. 49, no.

12. 2006.

[6] J. Kim, S. Park, and V. Sugumaran, “DRAMA: A framework for

domain requirements analysis and modeling architectures in

software product lines,” J. Syst. Softw., vol. 81, no. 1, pp. 37–55,

2008.

[7] M. Tanhaei, S. Moaven, and J. Habibi, “Toward an architecture-

based method for selecting composer components to make software

product line,” in International Conference on Information

Technology: New Generations (ITNG), 2010, pp. 1233–1236.

[8] C. Kästner et al., “FeatureIDE: A tool framework for feature-

oriented software development,” Proc. - Int. Conf. Softw. Eng., pp.

611–614, 2009.

[9] D. Benavides, S. Segura, P. Trinidad, and A. R. Cortés, “FAMA:

Tooling a Framework for the Automated Analysis of Feature

Models,” in Proceedings of the First International Workshop on

Variability Modelling of Software intensive Systems, 2007, vol. 1,

no. January, pp. 129–134.

[10] G. Kumar and P. K. Bhatia, “Software testing optimization through

test suite reduction using fuzzy clustering,” CSI Trans. ICT, vol. 1,

no. 3, pp. 253–260, 2013.

[11] T. Joachims, “Optimizing Search Engines using Clickthrough

Data,” in International Conference on Knowledge Discovery and

Data Mining, 2002, pp. 133–142.

[12] R. Lachmann, M. Nieke, C. Seidl, I. Schaefer, and S. Schulze,

“System-Level Test Case Prioritization Using Machine Learning,”

2016.

[13] G. Rothermel, R. H. Untch, C. Chu, and M. J. Harrold, “Prioritizing

test cases for regression testing,” IEEE Trans. Softw. Eng., vol. 27,

no. 10, pp. 929–948, 2001.

