
A Machine Learning approach to predict Software Defects

Chetan Hireholi

February 19, 2017


Abstract

Software engineering teams are not only involved in developing new versions of a product but are often involved in fixing customer-reported defects. Customers report issues faced with a particular software product, and some of these issues may require the engineering team to analyze the problem and potentially provide a fix. The number of defects that a software engineering team has to analyze is significant, and teams often prioritize the order of the defects based on the customer's priority and the extent to which the defect impacts the customer's business operations. Often, the engineering team may not truly understand the business impact a defect is likely to have, and this results in the customer escalating the defect up the engineering team's management chain, seeking more immediate attention to their problem.

Such escalated defects tend to consume a lot of engineering bandwidth and increase defect-handling costs; furthermore, such escalations disrupt existing plans, as all critical resources are diverted to handle these cases. Software escalations are classified under three categories: Red, Yellow and Green. A defect report with high business value is prioritized and marked Red; Yellow is the neutral category, which may be escalated if appropriate attention is not given; and Green reports have low priority. The engineering team assesses the nature of each escalation and allocates resources accordingly. The objective of this project is to analyze software defects and predict the defects that are likely to be escalated by the customer. This would allow the engineering team to be alerted early and take proactive measures that provide better support to the customers. For the purpose of our analysis, we used the defects database provided by Hewlett-Packard (HP) India. We used R to clean and pre-process the database, extracted keywords using natural language processing, and then applied machine learning (J48 decision tree, Naïve Bayes and Simple K-Means) to predict the escalations. Thus, by combining the extracted keywords with the tickets received by the team, we can predict the nature of an escalation and alert the engineering team so that they can take appropriate steps.


Acknowledgment

I would like to take this opportunity to thank the many eminent personalities without whose constant encouragement and support this endeavor would not have been successful.

Firstly, I would like to thank PES University for having the "Final Year Project" as a part of my curriculum, which gave me a wonderful opportunity to work on my research and presentation abilities, and for providing excellent facilities, without which this project could not have acquired the orientation it now has.

At the outset, I would like to venerate Prof. Nitin V. Pujari, Chairperson, PES University, who shaped my attitude towards the subject of this work.

It gives me immense pleasure to thank Dr. K. V. Subramaniam, Department of Computer Science and Engineering, PES University, for his continuous support, advice and guidance.

I would also like to thank Dr. Jayashree R., Department of Computer Science and Engineering, PES University, for her initiative and support, which made this work possible.


Contents

Abstract i

Acknowledgment ii

List of Figures v

1 Introduction 1

1.1 Problem Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Literature Survey 4

2.1 A Probabilistic Model for Software Defect Prediction . . . . . . . . . . . . . 4

2.2 Predicting Bugs from History . . . . . . . . . . . . . . . . . . . . . . . . . . 5

3 Exploring the Dataset 8

3.1 Incident Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

3.2 Change Requests (CR) Dataset . . . . . . . . . . . . . . . . . . . . . . . . 10

3.3 Cleaning the Dataset . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

4 Defect Escalation Analysis 14

4.1 Statistical Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

4.1.1 Analyzing the Incidents Dataset . . . . . . . . . . . . . . . . . . . . . 14

4.1.2 Analyzing Change Requests Dataset . . . . . . . . . . . . . . . . . . 21

4.2 Applying Machine Learning on the Dataset . . . . . . . . . . . . . . . . . . . 26

4.2.1 Classifying the Incidents Data . . . . . . . . . . . . . . . . . . . . . . 26

4.2.2 Clustering the Incidents Data . . . . . . . . . . . . . . . . . . . . . . 36

4.2.3 Text Mining and Natural Language Tool Kit (NLTK) . . . . . . . . . 42

5 Results and Conclusions 51

5.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

Bibliography 53


List of Figures

1 Predicting Bugs from History: Commonly used complexity metrics . . . . . 5

2 Life Cycle of a Defect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

3 Cleansing in OpenRefine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

4 Data Transformation in Microsoft Excel . . . . . . . . . . . . . . . . . . . . 12

5 Distribution of Incident Escalations . . . . . . . . . . . . . . . . . . . . . . . 14

6 Analyzing RED Incidents: Customers vs Escalations . . . . . . . . . . . . . 15

7 Analyzing RED Incidents: Modules vs Escalations . . . . . . . . . . . . . . . 17

8 Analyzing RED Incidents: Software release vs Escalations . . . . . . . . . . 17

9 S/w release vs Escalations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

10 Analyzing RED Incidents: OS vs Escalations . . . . . . . . . . . . . . . . . . 18

11 Analyzing RED Incidents: Developer vs Escalations . . . . . . . . . . . . . . 19

12 Other observations made on Incidents . . . . . . . . . . . . . . . . . . . . . . 20

13 Analyzing CR data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

14 Analyzing CR data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

15 Analyzing CRs: Customers vs Escalations . . . . . . . . . . . . . . . . . . . 22

16 Analyzing CRs: Modules vs Escalations . . . . . . . . . . . . . . . . . . . . . 23

17 Analyzing CRs: S/w release vs Escalations . . . . . . . . . . . . . . . . . . . 23

18 Analyzing CRs: OS vs Escalations . . . . . . . . . . . . . . . . . . . . . . . 24

19 Analyzing CRs: Developer vs Escalations . . . . . . . . . . . . . . . . . . . . 25

20 Classifying using: J48 Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

21 Classifying using: J48 Tree: Prefuse Tree . . . . . . . . . . . . . . . . . . . . 28

22 Module as root node . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

23 Probability for MODULE . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

24 Probability distribution for SEVERITY SHORT . . . . . . . . . . . . . . . 31

25 ESCALATION as the root node . . . . . . . . . . . . . . . . . . . . . . . . . 31

26 Probability distribution table for ESCALATION . . . . . . . . . . . . . . . . 32

27 Probability distribution table for EXPECTATION . . . . . . . . . . . . . . . 33

28 SEVERITY SHORT as the root node . . . . . . . . . . . . . . . . . . . . . . 33


29 Probability distribution for SEVERITY SHORT . . . . . . . . . . . . . . . . 34

30 Probability distribution for Customer Expectations . . . . . . . . . . . . . . 35

31 Final Cluster Centroids . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

32 Model and evaluation on training set . . . . . . . . . . . . . . . . . . . . . . 37

33 Cluster Centroids I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

34 Cluster Centroids II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

35 Total number of Incidents and their Escalation count . . . . . . . . . . . . 42

36 Words with highest frequency mined on GREEN tickets escalated to RED . 44

37 GREEN tickets escalated to RED . . . . . . . . . . . . . . . . . . . . . . . . 44

38 Words with highest frequency mined on GREEN tickets escalated to YELLOW 45

39 GREEN ticket escalated to YELLOW . . . . . . . . . . . . . . . . . . . . . . 45

40 Words with highest frequency mined on YELLOW tickets escalated to RED . 46

41 YELLOW ticket escalated to RED . . . . . . . . . . . . . . . . . . . . . . . 46

42 Observations made on RED tickets that were escalated . . . . . . . . . . . 47

43 Plotting the highest mined words . . . . . . . . . . . . . . . . . . . . . . . . 47

44 Words with highest frequency mined . . . . . . . . . . . . . . . . . . . . . . 48

45 Plotting the words with highest frequency mined . . . . . . . . . . . . . . . . 48

46 Words with highest frequency mined . . . . . . . . . . . . . . . . . . . . . . 49

47 Plotting the words with highest frequency mined . . . . . . . . . . . . . . . . 49

48 Output of the program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50


1 Introduction

In spite of diligent planning, documentation, and proper process adherence in software development, occurrences of defects are inevitable. In today's cutting-edge competition, it is important to make conscious efforts to control and minimize these defects by using techniques that allow in-process quality monitoring and control. Predicting the total number of defects before testing begins improves the quality of the product being delivered and helps in planning and decision making for future project releases.

Defect prediction in software is viewed as one of the most useful and cost-efficient operations. Software developers see it as a vital phase on which the quality of the product being developed depends. It has played a major part in countering the allegation that the software industry is incapable of delivering requirements within budget and on time. Besides this, clients' responses regarding product quality have shown a large shift from unsatisfactory to satisfactory.

Today, many data miners have replaced the earlier statistical approaches for defect prediction. The basis of data mining is the classification model, which places a component in one of two classes: fault prone or non fault prone.

Defects reported to the engineering team carry important information that may lead to various important decisions. These defects are reported as tickets. Each ticket contains the nature of the escalation, which is based on the business value.

These tickets may also give valuable information regarding the manner in which:

• Engineering defects are collected as part of the Quality & Analysis (QA) cycle of a software product or software application

• Product management manages the usage and direction of the product

• Escalations occur: how often an escalation happens, and which modules of the application are buggy and require more attention

• People have worked on the code-base during development, QA, and defect/ticket/incident fixing, etc.


1.1 Problem Definition

• Determine causes for defects during the engineering phase which may lead to escalation of customer support cases

While most software defects are corrected and tested as part of the prolonged software development cycle, enterprise software vendors often have to release software products before all reported defects are corrected, due to deadlines and limited resources. A small number of these reported defects will be escalated by customers whose businesses are seriously impacted. Escalated defects must be resolved immediately and individually by the software vendor at a very high cost. The total costs can be even greater, including loss of reputation, satisfaction, loyalty, and repeat revenue.

• Build a recommendation engine that alerts on escalation based on the nature of defects, creating an alert system for the team, given one such escalation that has happened

With this alerting mechanism, the team can take proactive steps to prevent an escalation before it happens.


1.2 Motivation

• Market research companies, notably Gartner and Forrester:

  – have conducted surveys predicting that 80% of IT budgets go towards maintenance of applications;

  – report a dismal success rate, with only 2% of new projects (funded from the remaining 20% of the IT budget) being successful;

  – this research hopes to change the way ticket information (in our opinion, a wealth of information that has been neglected so far) is looked at.

• Mine information that is hidden and lost in the tickets dump. The mined information will then be useful to deduce important details, which would help the project manager of the team plan out the activities appropriately.


2 Literature Survey

During this process we found many works related to software bug prediction, which helped us understand the kind of knowledge that can be captured from bugs. The following is the work carried out by other researchers in the area of Software Defect Prediction:

2.1 A Probabilistic Model for Software Defect Prediction

Although a number of approaches have been taken to quality prediction for software, none have achieved widespread applicability. The authors' aim here is to produce a single model that combines the diverse forms of, often causal, evidence available in software development in a more natural and efficient way than done previously. The authors use graphical probability models (also known as Bayesian Belief Networks) as the appropriate formalism for representing this evidence. The authors have used the subjective judgments of experienced project managers to build the probability model and use this model to produce forecasts about software quality throughout the development life cycle. Moreover, the causal or influence structure of the model mirrors the real-world sequence of events and relations more naturally than can be achieved with other formalisms. We used WEKA in order to apply the Bayesian Network Classifier: we selected the attributes Escalation, Expectation, Modules & Severity and, by rotating the attributes as the root node, captured the results. A disadvantage of a reliability model of this complexity is the amount of data needed to support a statistically significant validation study. A more detailed description of the application of Bayesian classification is given in Section 4.2.1. [PMDM]


2.2 Predicting Bugs from History

Version and bug databases contain a wealth of information about software failures: how the failure occurred, who was affected, and how it was fixed. Such defect information can be automatically mined from software archives, and it frequently turns out that some modules are far more defect-prone than others. How do these differences come to be?

The authors have researched how code properties like (a) code complexity, (b) the problem domain, (c) past history, or (d) process quality affect software defects, and how their correlation with defects in the past can be used to predict future software properties: where the defects are, how to fix them, and the associated cost. [PRBD]

Figure 1: Commonly used complexity metrics


Learning from history means learning from successes and failures, and how to make the right decisions in the future. In our case, the history of successes and failures is provided by the bug database: systematic mining uncovers which modules are most prone to defects and failures. Correlating defects with complexity metrics or the problem domain is useful in predicting problems for new or evolved components. Learning from history has one big advantage: one can focus on the aspect of history that is most relevant for the current situation. Thus the history data provided to us by Hewlett Packard (HP), which consisted of the incident and change request data, proved helpful during the statistical analysis. A more detailed coverage of the statistical analysis is given in Section 4.1, and the dataset given to us by HP is explained in detail in Section 3.

Some more research work carried out on a similar problem domain:

• Work done using the Machine Learning approach:

  – Predicting Effort to Fix Software Bugs [PEFS]:
    The authors have used the K-Nearest Neighbor approach to predict the effort put in by an engineer to fix software bugs. Their technique leverages existing issue tracking systems: given a new issue report, they search for similar, earlier reports and use their average time as a prediction. This technique is not very useful in our problem domain, since the work done there focuses on the assignment of a job to a resource for which the effort is estimated.

  – Cost-Sensitive Learning for Defect Escalation [CSDE]:
    The authors have established that Cost-Sensitive Decision Trees are the best method for producing the highest positive net profit. Our approach to the defect reports is different, since we are not focusing on the cost aspect but on the escalation of a defect report. We have used decision trees, together with Apriori algorithms, to derive rules with respect to escalation.


  – Predicting Failures with Hidden Markov Models [PHMM]:
    The authors have come up with an approach using Hidden Markov Models (HMM) to recognize patterns in failures. Since HMMs give accurate outputs only on a small number of attributes, we cannot use this approach to recognize the defect patterns in the defect reports.

  – Data Mining Based Social Network Analysis from Online Behavior [DSNA]:
    The authors have used neural networks and performed sentiment analysis on social networks to predict the online behavior of people. The approach used in sentiment analysis gave me insights into the Natural Language Tool Kit (NLTK) and how NLTK can be used to find the sentiments in user data. This motivated me to pick up NLTK to analyze the tickets dataset provided by Hewlett Packard. Detailed information on the application of NLTK is given in Section 4.2.3.


3 Exploring the Dataset

This section describes exploring the dataset acquired from Hewlett Packard (HP) and closely analyzing it. This is the beginning phase of the project, where the data is understood and meaningful analysis is done. It gives a high-level overview of the datasets. HP provided two datasets: a Customer Incident dataset and a Change Requests (CR) dataset.

The Incidents dataset contains the customer cases. These cases include troubleshooting errors, field issues, installation issues, environment issues and all other cases related to the software. When a customer logs a unique case which the team identifies as a Change Request, a corresponding entry is made in the CR dataset.

Below are the characteristics of the Incident & CR datasets:

3.1 Incident Dataset

• Contains 153 headers with 6,433 entries

A few important characteristics are:

  – Each entry has an owner assigned to it.

  – Each entry has a unique ID called ISSUEID.

  – The life line of each entry is captured and represented numerically (e.g. DAYS TO OPEN, DAYS TO FIXED, etc.).

  – Each entry has an Escalation set by the CPE Support Team on consulting the customer. The Escalation comes in 3 categories: RED, YELLOW & GREEN. RED is the highest-priority ticket, YELLOW is a potentially important ticket, and GREEN is a ticket with less business impact compared to the Red and Yellow tickets.

  – Each entry has an Expectation set by the CPE Support Team based on the inputs from the customer (e.g. Investigate Issue & Hotfix requested, Answer Question, Create Enhancement, Investigate Issue, etc.).

  – The mail communication between the developer and the customer can be found under NOTE CUSTOMER. This field contains all the information about the defect being tracked with the team.

  – Each entry has a Severity set by the CPE Support Team on consulting the customer. The Severity comes in 3 stages: Low, Medium & High.

  – Each entry has a date describing when the case was escalated, called QUIXY ESCALATED ON DATE & QUIXY ESCALATED YELLOW DATE.

  – Each entry has a RESOLUTION attribute, which describes the resolution of the case.

On observing the Incident dataset, we needed to come up with a life cycle of how a defect is tracked with the team. The figure below illustrates the life cycle of a defect. This process is currently in use by the team.

Figure 2: Life Cycle of a Defect


3.2 Change Requests (CR) Dataset

The CR dataset contains the incident cases which were identified as Change Requests.

• Contains 153 headers with 11,960 entries

A few important characteristics are:

  – Each entry has an owner assigned to it.

  – Each entry has a unique ID called ISSUEID.

  – The life line of each entry is captured and represented numerically (e.g. DAYS TO OPEN, DAYS TO FIXED, etc.).

  – Each entry has an Escalation set by the CPE Support Team on consulting the customer. The Escalation comes as Y (Escalated) or N (Not Escalated).

  – Each entry has an Expectation set by the CPE Support Team based on the inputs from the customer (e.g. Investigate Issue & Hotfix requested, Answer Question, Create Enhancement, Investigate Issue, etc.).

  – The mail communication between the developer and the customer can be found under NOTE CUSTOMER. This field contains all the information about the defect being tracked with the team.

  – Each entry has a Severity set by the CPE Support Team on consulting the customer. The Severity comes in 3 stages: Low, Medium & High.

  – Each entry has a date describing when the case was escalated, called QUIXY ESCALATED ON DATE.

  – Each entry has a RESOLUTION attribute, which describes the resolution of the case.


3.3 Cleaning the Dataset

After receiving the huge dataset, the next step was to clean the data. There were many discrepancies in the dataset, viz., the presence of non-numeric values in the date fields, and rows whose data were shifted to the left by 2-4 columns; because of this shift, the data were not aligned with their headers. The following steps were performed to clean the dataset.

• Removing Discrepancies

OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it, transforming it from one format into another, and extending it with web services and external data.

This tool greatly reduced the cleaning effort. OpenRefine made it easy to explore large datasets and provided functions to transform the data and make it uniform. For example, in the "Customer" column there were different names for a single company; the tool helps organize these varied names into a single one, unifying "Vodafone", "Vodafone Inc" and "vodafone" into the single name "Vodafone". It also helps remove special characters to make the text readable, and it takes care of case sensitivity, i.e. the contents of multiple rows can be edited using the "Text Facet" feature, shown in Figure 3. (A small Python sketch of this kind of normalization is given at the end of this section.)

• Removing the Unwanted Data

Microsoft Excel helped in converting the whole dataset into a table. By working within the table, applying filters and filtering the data, all the noise was excluded. It was then also possible to extract pivot tables, which helped in the statistical analysis of the dataset.

Figure 3: Cleansing in OpenRefine

Figure 4: Data Transformation in Microsoft Excel

• Removing Stop Words

The Text Mining (tm) package of the R language helped convert the dataset into a corpus. The pre-processing of the data is efficiently done by the tm package in R. The text transformations offered by the tm package include "removeNumbers", "removePunctuation", "removeWords", "stemDocument" and "stripWhitespace". (A Python sketch of equivalent pre-processing is given below.)
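To make the cleaning steps above concrete, the sketch below reproduces the two text-level operations in Python (the language also used later with NLTK): unifying variant customer names, as OpenRefine's Text Facet helps to do, and tm-style pre-processing (number/punctuation removal, stop-word removal, stemming, whitespace stripping). It is a minimal illustration, not the project's original OpenRefine or R code; the column name "CUSTOMER" and the sample strings are assumptions drawn from the examples above.

    # Minimal sketch, not the original OpenRefine/R code used in the project.
    import re
    import pandas as pd
    import nltk
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer

    nltk.download("stopwords", quiet=True)
    STOP = set(stopwords.words("english"))
    STEMMER = PorterStemmer()

    def normalize_customer(name):
        """Map variant spellings such as 'Vodafone Inc' and 'vodafone' to one label."""
        cleaned = re.sub(r"[^a-z0-9 ]", " ", str(name).lower())    # drop special characters
        cleaned = re.sub(r"\b(inc|ltd|llc|corp)\b", " ", cleaned)  # drop common suffixes
        return " ".join(cleaned.split()).title()                   # collapse whitespace

    def preprocess(text):
        """Python analogue of removeNumbers, removePunctuation, removeWords,
        stemDocument and stripWhitespace from the R tm package."""
        text = re.sub(r"\d+", " ", str(text).lower())               # removeNumbers
        text = re.sub(r"[^\w\s]", " ", text)                        # removePunctuation
        tokens = [STEMMER.stem(t) for t in text.split() if t not in STOP]
        return " ".join(tokens)                                     # stripWhitespace

    frame = pd.DataFrame({"CUSTOMER": ["Vodafone", "Vodafone Inc", "vodafone"]})
    print(frame["CUSTOMER"].map(normalize_customer).unique())       # ['Vodafone']
    print(preprocess("Please provide a hotfix for the 3 reported installation issues!"))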


4 Defect Escalation Analysis

After the data was cleaned, the next step was a statistical analysis of the data. The statistical analysis further unveils some under-the-hood facts.

4.1 Statistical Analysis

The initial statistical analysis of the data received from HP (incidents.csv and crs.csv) was carried out in Microsoft Excel (MS Excel). The pivot charts obtained from MS Excel helped in graphically analyzing the huge datasets.

4.1.1 Analyzing the Incidents Dataset

The incidents.csv file contains all the customer incidents reported to the team.

Figure 5: Distribution of Incident Escalations in incidents.csv

There are in total 125 RED escalated incidents, 3,831 GREEN incidents and 329 YELLOW incidents in the dataset.
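As an aside, the same distribution can also be tabulated programmatically. The hedged sketch below uses pandas rather than an Excel pivot chart; the file name and the ESCALATION column follow the names used elsewhere in this report and may differ in the actual dump.

    # Hedged sketch: counting escalation colours with pandas instead of an Excel pivot chart.
    import pandas as pd

    incidents = pd.read_csv("incidents.csv")          # path assumed
    counts = incidents["ESCALATION"].value_counts()   # column name as used in this report
    print(counts)
    # Per the figures above, this should give roughly: GREEN 3831, YELLOW 329, RED 125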


• Analyzing RED Incidents: Customers vs Escalations

In certain situations, the escalation of a case is necessary. For example, a user is performing a task and is unable to complete it within a certain period of time. In such cases, where the user/customer is completely blocked, the case is RED escalated. These RED escalated incident cases are high-priority cases and have to be addressed with high severity.

The companies RHEINENERGIE, HEWLETT PACKARD and DEUTSCHE BANK had the highest number of RED escalations.

Figure 6: Analyzing RED Incidents: Customers vs Escalations


• Company behavior analysis: RHEINENERGIE

RHEINENERGIE had the maximum RED escalations among all the customers. There were around 28 incident cases registered with the Operations Team. The patterns observed in those 28 incidents are:

  – Out of the 28 incidents, 6 were RED escalated.

  – There is a 21.28% chance that an incident logged will be a RED escalation.

  – Most reported modules: Ops - Monitor Agent (opcmona) (7, of which 3 were RED), Installation (6), Perf - Collector (3).

  – Average number of days a single incident was handled: 73.5 days.

  – Number of incidents which moved to CRs: 15; i.e. 53.57% of the incidents moved to CRs.


• Analyzing RED Incidents: Modules vs Escalations

The software modules Ops - Action Agent (opcacta) & Installation had the highest number of RED escalations reported to the Operations Team.

Figure 7: Analyzing RED Incidents: Modules vs Escalations

• Analyzing RED Incidents: Software Release vs Escalations

Out of all 125 RED escalated cases, Operations Agent version 8.6 had the maximum number of incident reports, followed by versions 11.14 & 11.02. These software release versions had the maximum escalations among the incidents.

Figure 8: Analyzing RED Incidents: Software release vs Escalations


The incident frequencies of the other software versions are shown below:

Figure 9: S/w release vs Escalations

• Analyzing RED Incidents: Operating System (OS) vs Escalations

It can be observed that 83 entries out of the 125 RED escalations had a blank OS field. After a discussion with an HP developer, we learned that customers tend to skip the OS field. We further observed that no specific OS version was given which could aid the troubleshooting: customers tend to enter just the general OS name, viz. Windows, Solaris, etc., instead of the full name with version details.

Figure 10: Analyzing RED Incidents: OS vs Escalations


• Analyzing RED Incidents: Developer vs Escalations

This describes the developer associated with each incident case. Below is the distribution of incidents among the developers. The developer who was assigned the highest number of incident cases is prasad.m.k hp.com.

Figure 11: Analyzing RED Incidents: Developer vs Escalations

• Other observations made on Incidents

Calculating the age of a single ticket was challenging. The data dump had two columns, OPEN-IN-DATE and CLOSED-IN-DATE; the difference of these two dates should give the total days taken to close the ticket. However, this contradicted the column already present in the table, DAYS SUPPORT TO CPE: the two values did not match. Below is a pictorial representation of the same.

Figure 12: Other observations made on Incidents
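The age check described above is straightforward to express programmatically. The sketch below derives a ticket's age from the two date columns and flags rows that disagree with DAYS SUPPORT TO CPE; it is illustrative only, and the column names and date formats of the actual HP dump may differ.

    # Illustrative sketch of the age consistency check; column names as quoted above.
    import pandas as pd

    incidents = pd.read_csv("incidents.csv")
    opened = pd.to_datetime(incidents["OPEN-IN-DATE"], errors="coerce")
    closed = pd.to_datetime(incidents["CLOSED-IN-DATE"], errors="coerce")
    incidents["AGE_DAYS"] = (closed - opened).dt.days    # derived ticket age

    # Rows where the derived age disagrees with the recorded column
    mismatch = incidents["AGE_DAYS"] != incidents["DAYS SUPPORT TO CPE"]
    print(mismatch.sum(), "of", len(incidents), "tickets have inconsistent ages")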


4.1.2 Analyzing Change Requests Dataset

The second dataset provided by HP was the CR data (incident cases which became Change Requests). These are the cases which are added to the product backlog. Each entry has an escalation attached to it. The three escalation values are: Showstopper, Yes and No. Below is the distribution of CRs and the nature of the escalation each carried. There were in total 10,387 CR entries. Of these, 10,219 cases did not escalate, 75 cases escalated and 93 were marked as "Showstopper".

Figure 13: Analyzing CR data

The Showstopper escalations were high-priority cases.

Figure 14: Analyzing CR data


• Analyzing CRs: Customers vs Escalations

The company TATA CONSULTANCY SERVICES LTD. had the maximum Showstopper escalations, whereas Allegis, NORTHROP GRUMMAN and PepperWeed are the companies with the highest "Y" (Yes) escalations.

Figure 15: Analyzing CRs: Customers vs Escalations

• Analyzing CRs: Modules vs Escalations

The software modules Ops - Monitor Agent (opcmona) & Installation had the highest number of "Showstopper" escalations, whereas the modules Installation & Lcore had the highest number of "Y" (Yes) escalations.


Figure 16: Analyzing CRs: Modules vs Escalations

• Analyzing CRs: Software Release vs Escalations

Software Release version 11 had the highest "Showstopper" and "Y" (Yes) escalations, while Software Release version 8.6 was second highest in "Showstopper" and "Y" (Yes) escalations.

Figure 17: Analyzing CRs: S/w release vs Escalations


• Analyzing CRs: OS vs Escalations

Software running on the Windows OS had the maximum number of both "Showstopper" and "Y" (Yes) escalations. Note: submitters of these tickets tend to fill the OS field as they wish; some choose the exact version on which the issue was seen or reported, while others choose just a high-level name. No strict rules were observed.

Figure 18: Analyzing CRs: OS vs Escalations

• Analyzing CRs: Developer vs Escalations

While analyzing the developers and the escalated tickets assigned to them: swati.sinha hp.com was assigned the highest number of "Showstopper" escalations, whereas umesh.sharoff hp.com was assigned the highest number of "Y" (Yes) escalations.


Figure 19: Analyzing CRs: Developer vs Escalations

After the statistical analysis, we used WEKA to apply a few machine learning algorithms to the dataset. In the next section we use a few data mining concepts along with machine learning algorithms to draw meaningful conclusions from the dataset.


4.2 Applying Machine Learning on the Dataset

4.2.1 Classifying the Incidents Data

In this phase, classification and clustering are applied to the data to identify the main attributes which are responsible for triggering an escalation. Using the WEKA tool, which offers various machine learning algorithms to use on the dataset, certain informative conclusions have been drawn. Classification is a data mining function that assigns items in a collection to target categories or classes. The goal of classification is to accurately predict the target class for each case in the data.

Here the target class is the Escalation attribute of each ticket. The predictor attributes are known: viz., the severity of a bug, the expectation on the defect resolution, the module of the software, etc. By considering these important attributes of a ticket, the classification algorithm finds the relationship between these values and predicts the value of the target. In this work, I have chosen the J48 Decision Tree algorithm and the Bayes Network Classifier algorithm to predict the target class: Escalation.

Classifying using: J48 Tree

The J48 decision tree classifier follows a simple algorithm. In order to classify a new item, it first creates a decision tree based on the attribute values of the available training data. Whenever it encounters a set of items (the training set), it identifies the attribute that discriminates the various instances most clearly. This feature, which tells the most about the data instances so that they can be classified best, is said to have the highest information gain. Then, among the possible values of this feature, if there is any value for which there is no ambiguity, that is, for which the data instances falling within its category all have the same value for the target variable, it terminates that branch and assigns to it the target value it has obtained.

We used WEKA to apply the decision tree to the dataset.

Attributes selected:

• Escalation (Yellow, Red)


• Expectation (contains the customer's expectation of the ticket's resolution from the support team)

• Modules

• Severity

Results:

• SEVERITY SHORT = Urgent
  – ESCALATION = Red: Ops - Monitor Agent (opcmona) (17.0/11.0)*
  – ESCALATION = Yellow: Ops - Logfile Encapsulator (opcle) (21.0/17.0)*

• SEVERITY SHORT = High: Installation (92.0/74.0)*

• SEVERITY SHORT = Medium: Installation (23.0/20.0)*

• SEVERITY SHORT = Low: Ops - Message Agent (opcmsga) (1.0)

• Number of leaves: 5

• Size of the tree: 7

• Correctly Classified Instances = 32 (20.7792%)

• Incorrectly Classified Instances = 122 (79.2208%)

• Kappa statistic = 0.0626

• Mean absolute error = 0.0595

• Root mean squared error = 0.1725

• Relative absolute error = 96.4179%

• Root relative squared error = 98.5439%

• Total Number of Instances = 154


* The first number is the total number (weight) of instances reaching the leaf; the second number is the number (weight) of those instances that are misclassified.

Figure 20: Classifying using: J48 Tree

Figure 21: Classifying using: J48 Tree: Prefuse Tree

Since the incorrectly classified instances outnumbered the correctly classified instances, the J48 decision tree did not yield the required answers.
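For readers who want to reproduce this kind of experiment outside WEKA, the sketch below trains a comparable decision tree in Python. scikit-learn has no J48/C4.5, so a CART tree with an entropy criterion is used as a rough analogue; the column names follow the attributes listed above and the file path is an assumption.

    # Rough analogue of the WEKA J48 run, not the original experiment.
    import pandas as pd
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.tree import DecisionTreeClassifier

    data = pd.read_csv("incidents.csv")                              # path assumed
    X = data[["EXPECTATION", "MODULE", "SEVERITY SHORT"]].astype(str)
    y = data["ESCALATION"]                                           # target: Red / Yellow

    model = make_pipeline(
        OneHotEncoder(handle_unknown="ignore"),                      # nominal attributes -> numeric
        DecisionTreeClassifier(criterion="entropy", min_samples_leaf=5, random_state=0),
    )
    print("mean cross-validated accuracy:", cross_val_score(model, X, y, cv=5).mean())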


Bayes Network Classifier, a Supervised Learning method

Recent work in supervised learning has shown that a surprisingly simple Bayesian classifier with strong independence assumptions among features, called naive Bayes, is competitive with state-of-the-art classifiers such as C4.5. This fact raises the question of whether a classifier with less restrictive assumptions can perform even better. The approach evaluated here induces classifiers from data based on the theory of learning Bayesian networks. These networks are factored representations of probability distributions that generalize the naive Bayesian classifier and explicitly represent statements about independence. We used WEKA to apply this classifier to the dataset.

Using this classifier, the probability distributions were found.

Attributes selected:

• ESCALATION (Yellow, Red)

• EXPECTATION (contains the customer's expectation from the support team)

• MODULES

• SEVERITY SHORT

Results I: MODULE as the root node

• Red escalation: highest probable modules: LCore - Control, Ops - Message Interceptor (opcmsgi)

• Yellow escalation: highest probable modules: Ops - Trap Interceptor, Other, LCore - BBC

Figure 22: Module as root node


Figure 23: Probability for MODULE

• The modules Ops - Action Agent (opcacta) & Installation have the highest number of RED escalations reported to the Operations Team.

• When the SEVERITY is:
  – URGENT: the most probable module is LCore - BBC
  – HIGH: the most probable module is Installation
  – MEDIUM: the most probable module is LCore - XPL
  – LOW: the most probable modules are LCore - Control, opcmsgi, LCore - Security, LCore - Deploy, Operation Agent, Agent Framework

The probability distribution of Modules vs Severity is shown below:


Figure 24: Probability distribution for SEVERITY SHORT

Results II: ESCALATION as the root node

• Correctly Classified Instances = 80 (51.9481%)

• Incorrectly Classified Instances = 74 (48.0519%)

Figure 25: ESCALATION as the root node

• Probability that it would be a RED escalation: 24.20%

• Probability that it would be a YELLOW escalation: 75.80%


Figure 26: Probability distribution table for ESCALATION

• Probability of the EXPECTATION from the customer when it is a RED escalation:
  – Answer Question: 64%
  – Investigation & Hotfix requested: 47.4%
  – Investigate Issue: 39.7%
  – Create Enhancement: 64%

• Probability of the EXPECTATION from the customer when it is a YELLOW escalation:
  – Answer Question: 63%
  – Investigation & Hotfix requested: 56.7%
  – Investigate Issue: 32.4%
  – Create Enhancement: 46%


Figure 27: Probability distribution table for EXPECTATION

• Probability of the SEVERITY of the incident when it is a RED escalation:
  – URGENT: 44.90%
  – HIGH: 44.90%
  – MEDIUM: 9.00%
  – LOW: 13.00%

• Probability of the SEVERITY of the incident when it is a YELLOW escalation:
  – URGENT: 18.10%
  – HIGH: 63.40%
  – MEDIUM: 17.20%
  – LOW: 13.00%

Results III: SEVERITY SHORT as the root node

• Correctly Classified Instances = 63 (40.9091%)

• Incorrectly Classified Instances = 91 (59.0909%)

Figure 28: SEVERITY SHORT as the root node


• Probability distribution for SEVERITY SHORT:
  – Probability that it would be URGENT: 24.7%
  – Probability that it would be HIGH: 59.3%
  – Probability that it would be MEDIUM: 15.1%
  – Probability that it would be LOW: 0.01%

Figure 29: Probability distribution for SEVERITY SHORT

• Probability of a RED escalation given the severity:
  – URGENT: 44.9%
  – HIGH: 18.8%
  – MEDIUM: 14.6%
  – LOW: 25%

• Probability of a YELLOW escalation given the severity:
  – URGENT: 55.1%
  – HIGH: 81.2%
  – MEDIUM: 85.4%
  – LOW: 75%


• Probability distribution for Customer Expectations (most probable expectation per severity):
  – URGENT: Investigate Issue & Hotfix requested
  – HIGH: Investigate Issue & Hotfix requested
  – MEDIUM: Investigate Issue
  – LOW: Investigate Issue

Figure 30: Probability distribution for Customer Expectations
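The probability tables above came from WEKA's Bayes network classifier. The same kind of conditional tables can be estimated directly with pandas crosstabs, as in the hedged sketch below; the column names follow this report and the file path is an assumption.

    # Hedged sketch: estimating the conditional probability tables with pandas.
    import pandas as pd

    data = pd.read_csv("incidents.csv")

    # Prior P(ESCALATION) over Red / Yellow
    print(data["ESCALATION"].value_counts(normalize=True))

    # P(SEVERITY SHORT | ESCALATION): each row sums to 1
    print(pd.crosstab(data["ESCALATION"], data["SEVERITY SHORT"], normalize="index"))

    # P(EXPECTATION | ESCALATION)
    print(pd.crosstab(data["ESCALATION"], data["EXPECTATION"], normalize="index"))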


4.2.2 Clustering the Incidents Data

Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).

The Simple K-Means method is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.

We used WEKA to apply this method. The following are the results obtained:

• Number of iterations: 3

• Within-cluster sum of squared errors: 261.0

• Cluster 0: Yellow, 'Investigate Issue & Hotfix requested', 'Ops - Trap Interceptor (opctrapi)', Urgent

• Cluster 1: Red, 'Investigate Issue & Hotfix requested', Perf, High

Figure 31: Final Cluster Centroids


Figure 32: Model and evaluation on training set

In the cluster centroids below, the instances are divided according to the escalation type of the tickets:

Figure 33: Cluster Centroids I

In the cluster centroids below, the instances are divided according to the severity of the tickets:

Figure 34: Cluster Centroids II
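WEKA's SimpleKMeans clusters nominal attributes directly. The hedged sketch below shows an analogous two-cluster run in Python, where the nominal ticket attributes must first be one-hot encoded; as before, the column names and file path are assumptions based on this report.

    # Illustrative two-cluster K-Means run, analogous to the WEKA SimpleKMeans result above.
    import pandas as pd
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import OneHotEncoder

    data = pd.read_csv("incidents.csv")
    cols = ["ESCALATION", "EXPECTATION", "MODULE", "SEVERITY SHORT"]
    X = OneHotEncoder().fit_transform(data[cols].astype(str)).toarray()  # nominal -> 0/1 matrix

    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    data["CLUSTER"] = km.labels_

    # Modal attribute value per cluster, similar to WEKA's centroids for nominal data
    print(data.groupby("CLUSTER")[cols].agg(lambda s: s.mode().iat[0]))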


Predictive Apriori - an Apriori variant

The Predictive Apriori algorithm trades off support against confidence and calculates the expected accuracy in a Bayesian framework. The result of this algorithm maximizes the expected accuracy of the association rules on future data.

We used WEKA to apply this algorithm to the Incidents dataset. Below are the findings:

(Try I) Attributes selected:

• ESCALATION (Yellow, Red)

• CUSTOMER ENTITLEMENT

• SEVERITY SHORT

Best rules found:

1. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT = Medium (8) ==> ESCALATION = Yellow (8); accuracy: 95.49%

2. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT = High (38) ==> ESCALATION = Yellow (34); accuracy: 83.22%

The above rules describe that when the Customer Entitlement is Premier and the Severity of the ticket is Medium, there is a 95% chance that the escalation will be Yellow.

(Try II) Attributes selected:

• ESCALATION (Yellow, Red)

• CUSTOMER ENTITLEMENT

• SEVERITY SHORT

• MODULE

• OPERATING SYSTEM

Best rules found:

1. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT = Medium (11) ==> ESCALATION = Yellow (11); accuracy: 98.84%

2. CUSTOMER ENTITLEMENT = Premier & OS = Linux (11) ==> ESCALATION = Yellow (11); accuracy: 98.46%

3. MODULE = Ops - Logfile Encapsulator (opcle) (10) ==> ESCALATION = Yellow (10); accuracy: 98.70%

The above rules describe that when the module is Ops - Logfile Encapsulator (opcle), there is a 98.70% chance that the case will be Yellow escalated, and when the Customer Entitlement is Premier and the OS is Linux, there is a 98.46% chance that it will be Yellow escalated.


Simple Apriori - the Apriori algorithm is an association rule mining technique introduced in 1994.

The Apriori algorithm works in several steps. First, the candidate item sets are generated. Then the database is scanned to check the support of these item sets, which yields the frequent 1-item sets: item sets with support below the threshold value are eliminated. In later passes, the candidates become k-item sets generated from the frequent (k-1)-item sets. The iteration of database scanning and support counting gives the support and confidence of each association rule found.

Attributes selected:

• ESCALATION (Yellow, Red)

• CUSTOMER ENTITLEMENT

• MODULES

• SEVERITY SHORT

• OS

Best rules found:

1. OS = Linux (19) ==> ESCALATION = Yellow (18); confidence: 95%

2. CUSTOMER ENTITLEMENT = Premier & SEVERITY SHORT = High (38) ==> ESCALATION = Yellow (35); confidence: 92%

The above rules describe that when the OS is Linux, there is 95% confidence that the escalation is Yellow, and when the Customer Entitlement is Premier and the Severity of the ticket is High, there is 92% confidence that the escalation is Yellow.
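The rules above were mined with WEKA's Apriori implementations. The hedged sketch below shows how the same style of "attribute = value ==> ESCALATION = colour" rules can be mined in Python with mlxtend; the support and confidence thresholds are illustrative, and the column names and file path are assumptions based on this report.

    # Illustrative association-rule mining with mlxtend, not the WEKA run reported above.
    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    data = pd.read_csv("incidents.csv")
    cols = ["ESCALATION", "CUSTOMER ENTITLEMENT", "MODULE", "SEVERITY SHORT", "OS"]
    items = pd.get_dummies(data[cols].astype(str))      # one boolean column per attribute=value

    frequent = apriori(items, min_support=0.02, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.9)

    # Keep only rules whose consequent is an escalation colour, as in the report
    rules = rules[rules["consequents"].astype(str).str.contains("ESCALATION")]
    print(rules[["antecedents", "consequents", "support", "confidence"]])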


4.2.3 Text Mining and Natural Language Tool Kit (NLTK)

After evaluating the results acquired from Phase I, a final conclusion could not be drawn: they did not answer what actually triggers an incident to escalate. This phase describes the use of text mining and natural language processing to determine the triggering factor of an incident.

Figure 35: Total number of Incidents and their Escalation count

The purpose of text mining is to process unstructured (textual) information, extract meaningful numeric indices from the text and thus make the information contained in the text accessible to the various data mining (statistical and machine learning) algorithms. Information can be extracted to derive summaries for the words contained in the documents or to compute summaries for the documents based on the words contained in them.

We used the tm package in R for text mining the incident tickets. The tm package provides methods for data import, corpus handling, pre-processing, metadata management and the creation of term-document matrices.


The main crux was hidden in finding out what makes an incident ticket get RED escalated from the other escalation states.

The following is the step-by-step process for finding out what might help identify the reason for an escalation. We took the dataset (incidents.csv) and performed the following tasks using R (a Python sketch of the same pipeline is given after the list):

• Data Import: load the text into a corpus

• Inspecting Corpora: get a concise overview of the corpus

• Transformations: modify the corpus, e.g., stemming, stop-word removal, etc.

• Creating Term-Document Matrices

• Operations on Term-Document Matrices, e.g. calculating word frequencies, plotting word frequencies, building a word cloud, etc.
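A minimal Python version of this pipeline is sketched below: it builds a document-term matrix over the NOTE CUSTOMER mail threads of the RED tickets and ranks word frequencies, mirroring what the report did in R with tm. The column names, escalation label and file path are assumptions based on this report.

    # Hedged Python sketch of the R/tm word-frequency pipeline described above.
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer

    data = pd.read_csv("incidents.csv")
    red_notes = data.loc[data["ESCALATION"] == "Red", "NOTE CUSTOMER"].fillna("")

    vectorizer = CountVectorizer(stop_words="english", lowercase=True)   # stop-word removal
    dtm = vectorizer.fit_transform(red_notes)                            # document-term matrix

    freq = pd.Series(dtm.sum(axis=0).A1, index=vectorizer.get_feature_names_out())
    print(freq.sort_values(ascending=False).head(20))                    # top mined words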


Observations made on GREEN tickets which were RED escalated

It was observed that opcle was the most discussed module. It can also be observed that the words please, hotfix & support were used the most in the mail chain exchanged between the customer and the developer.

Figure 36: Words with highest frequency mined on GREEN tickets escalated to RED

Figure 37: GREEN tickets escalated to RED


Observations made on GREEN tickets which were YELLOW escalated

It was observed that support was the most used word. It can also be observed that the words issue & time were used the most in the mail chain exchanged between the customer and the developer.

Figure 38: Words with highest frequency mined on GREEN tickets escalated to YELLOW

Figure 39: GREEN ticket escalated to YELLOW


Observations made on YELLOW tickets which were RED escalated

This set of corpora did not yield anything notable, but it did bring out the name of the developer most associated with the resolution of the tickets.

Figure 40: Words with highest frequency mined on YELLOW tickets escalated to RED

Figure 41: YELLOW ticket escalated to RED


Observations made on RED tickets that were escalated

It was observed that opcmona was the most discussed module. It can also be observed that the words waiting, hotfix & issue were used the most in the mail chain exchanged between the customer and the developer.

Figure 42: Observations made on RED tickets that were escalated

Figure 43: Plotting the highest mined words


Observations made on the whole set of RED escalated tickets

There were in total 125 RED escalated entries in the Incidents dataset. Text mining all 125 entries revealed the details below. The words issue, please, support & escalation were used the most in the mail chain exchanged between the customer and the team.

Figure 44: Words with highest frequency mined

Figure 45: Plotting the words with highest frequency mined


Observations made on the whole set of GREEN escalated tickets

There were in total 3,831 GREEN escalated entries in the Incidents dataset. Text mining all of these entries revealed the details below. Mining the whole set of GREEN escalated tickets did not yield valuable information.

Figure 46: Words with highest frequency mined

Figure 47: Plotting the words with highest frequency mined


The above observations show that mail chains in which the keywords please, hotfix & support occur frequently are the most likely to be converted to a RED escalation. We then used these keywords to build a program which takes the incident data dump as input and scans the email chains. As the density of these keywords increases, the program alerts the user once it crosses a threshold limit. The threshold limit can be adjusted by the developer based on the ongoing trend.

Figure 48: Output of the program
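The original alerting program is only shown as output in Figure 48; the hedged sketch below reconstructs the idea from the description above: count the mined keywords in a ticket's mail chain and raise an alert once their share of all words crosses an adjustable threshold. The function name, the inclusion of "escalation" in the keyword set and the default threshold are illustrative choices, not the original code.

    # Reconstruction of the alerting idea described above; not the original program.
    KEYWORDS = {"please", "hotfix", "support", "escalation"}   # mined high-frequency words

    def escalation_alert(mail_chain, threshold=0.05):
        """Return True when the keyword density suggests a likely RED escalation."""
        words = mail_chain.lower().split()
        if not words:
            return False
        hits = sum(word.strip(".,!?:;") in KEYWORDS for word in words)
        return hits / len(words) >= threshold      # threshold adjustable by the developer

    ticket = "Please expedite: we need the hotfix now, support has been waiting."
    if escalation_alert(ticket):
        print("ALERT: this ticket shows signs of turning into a RED escalation")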


5 Results and Conclusions

By text mining and applying machine learning algorithms on the incident dataset, we obtained the following results:

• The mail chain of a ticket which is going to be escalated to Red will contain the words please, hotfix and support with high frequency.

• The software modules Ops - Logfile Encapsulator (opcle), Ops - Action Agent (opcacta) & Installation had the highest number of Red escalations reported to the team as incidents, whereas Ops - Monitor Agent (opcmona) & Installation had the highest Showstopper escalations for change requests.

• By applying the Predictive Apriori algorithm to the Incidents dataset, we observed the following:

  – We got a confidence of 98.70% for an escalation of 'Yellow' when the reported module was 'opcle'.

  – We got a confidence of 98.84% for an escalation of 'Yellow' when the Customer Entitlement was 'Premier' and the Severity was 'Medium'.

• We got two clusters using the Simple K-Means method:

  – Cluster 1: Escalation is 'Yellow', customer expectation is 'Investigate Issue & Hotfix requested', software module is 'Ops - Trap Interceptor' and Severity is 'Urgent'.

  – Cluster 2: Escalation is 'Red', customer expectation is 'Investigate Issue & Hotfix requested', software module is 'Perf' and Severity is 'High'.

For an engineering team it is really important to avoid any major Red escalations. The team gets a lot of incidents which need to be resolved in a limited time. Since the number of incidents is large, it becomes hard for the team to keep track of all the incident issues with respect to the criticality and the severity of each incident. By implementing such predictive mechanisms, an incident which will turn RED can be flagged to the team. This would help the manager allocate appropriate resources based on the criticality of the incoming tickets, help resolve the incident tickets within the stipulated time, avoid unwanted escalations, and indeed help in maintaining the trust of the customer as well.

More accurate and varied results could have been achieved, but missing data in the dataset limited us. Due to discrepancies in the data (rows being shifted by 3-4 columns), we had to ignore such inconsistent entries in the dataset when performing the statistical analysis.

5.1 Future Work

The use of NLTK proved very helpful in extracting meaningful conclusions from the tickets dataset. NLTK can be used to analyze the real-time behavior of the tickets coming in to the team. This analysis can be used to provide proactive resolutions to the customers, thus preventing the tickets from getting escalated.


References

[BAG]   L. Breiman, Bagging Predictors, Machine Learning.

[SFTWR] Ishani Arora, Vivek Tetarwal, Anju Saha, Software Defect Prediction.

[PRBD]  Thomas Zimmermann, Nachiappan Nagappan, Predicting Bugs from History.

[SFRC]  Stamatia Bibi, Grigorios Tsoumakas, I. Vlahavas, Ioannis Stamelos, Software Defect Prediction Using Regression via Classification.

[HIDM]  Felix Salfner, Predicting Failures with Hidden Markov Models.

[IEMD]  Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Third Edition, Elsevier Inc., 2011.

[EFRB]  Eibe Frank (Computer Science Department, University of Waikato, New Zealand) and Remco R. Bouckaert (Xtal Mountain Information Technology, Auckland, New Zealand), Naive Bayes for Text Classification with Unbalanced Classes.

[JMJ11] Jiawei Han, Micheline Kamber, Jian Pei, Data Mining: Concepts and Techniques, Third Edition, 2011.

[IT2008] Irina Tudor, "Association Rule Mining as a Data Mining Technique", 2008.

[PMDM]  Norman Fenton, Paul Krause and Martin Neil, A Probabilistic Model for Software Defect Prediction, 2006.

[BLMK]  Billy Edward Hunt, Jr. (Overland Park, KS, US), Jennifer J. Kirkpatrick (Olathe, KS, US), Richard Allan Kloss, Software Defect Prediction, 2014.

[WEKA]  WEKA Online, www.cs.waikato.ac.nz/ml/weka.

[PEFS]  Cathrin Weiss, Rahul Premraj, Thomas Zimmermann, Andreas Zeller, Predicting Effort to Fix Software Bugs, 2006.

[CSDE]  Victor S. Sheng, Bin Gu, Wei Fang, Jian Wu, Cost-Sensitive Learning for Defect Escalation, 2001.

[DSNA]  Jaideep Srivastava, Muhammad A. Ahmad, Nishith Pathak, David Kuo-Wei Hsu, Data Mining Based Social Network Analysis from Online Behavior, 2008.

[PHMM]  Felix Salfner, Predicting Failures with Hidden Markov Models, 2005.
