Using Cluster Analysis for Characteristics Detection in Software Defect Reports
Anna Gromova, Exactpro
Open Access Quality Assurance & Related Software Development for Financial Markets
Tel: +7 495 640 2460, +1 415 830 38 49
www.exactpro.com
Defect Management
Areas of research in defect management:
• automatic defect fixing
• automatic defect detection
• metrics and predictions of defect reports
• quality of defect reports
• triaging defect reports
Examples of Testing Metrics
• time to fix / time to resolve
• which defects get reopened
• which defects get fixed
• which defects get rejected
Defect clustering helps:
● Understand the nature of defects
● Understand software weaknesses
● Improve the testing strategy
Contribution
1. Expanding the set of defect report attributes conventionally considered by researchers in the field, and including implicit data, to gain a wider perspective on bug examination.
2. Calculating the Silhouette and the Davies-Bouldin indices to find the proper number of clusters for the k-means clustering algorithm.
3. Providing a description and interpretation of the resulting clusters.
Clustering of defect reports: related work
Clustering can be used for the following tasks:
● finding bug duplicates;
● automating testing;
● predicting the testing workload;
● improving defect management practices, etc.
Defect dataset
$D = \{d_1, d_2, \ldots, d_n\}$,
where $d_j$ is a defect and $n$ is the number of defects in the project.
$d_j = \{\text{Priority}, \text{Status}, \text{Resolution}, \text{Time to resolve}, \text{Count of attachments}, \text{Count of comments}, \text{Area}_1, \ldots, \text{Area}_k\}$,
where $k$ is the number of defined areas of testing.
Area of Testing: Component/s and Summary
The Attributes for Cluster Analysis
| Attribute | Data type | Values |
|---|---|---|
| Priority [17] | Categorical | priority1, priority2, priority3, priority4 |
| Status | Categorical | status1, status2, status3, status4, status5, status6 |
| Resolution | Categorical | resolution1, resolution2, resolution3, resolution4, resolution5, resolution6, resolution7, resolution8, resolution9, resolution10 |
| Time to resolve | Numeric | |
| Count of comments | Numeric | |
| Count of attachments | Numeric | |
| Area i | Boolean | {0; 1} |
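Before clustering, each row of the attribute table has to become part of a numeric feature vector. The sketch below assumes one-hot encoding for the categorical attributes, which the talk does not specify; the `Defect` class, the `one_hot` and `encode` helpers and the sample values are hypothetical, and the category names are the placeholders from the table.

```python
from dataclasses import dataclass
from typing import List

# Placeholder category values, as listed in the attribute table.
PRIORITIES = [f"priority{i}" for i in range(1, 5)]
STATUSES = [f"status{i}" for i in range(1, 7)]
RESOLUTIONS = [f"resolution{i}" for i in range(1, 11)]


@dataclass
class Defect:
    """Hypothetical container for one defect report d_j."""
    priority: str
    status: str
    resolution: str
    time_to_resolve: float
    comments: int
    attachments: int
    areas: List[int]  # Area_1 .. Area_k, each 0 or 1


def one_hot(value: str, categories: List[str]) -> List[float]:
    """0/1 vector with a single 1 at the position of `value` (assumed encoding)."""
    if value not in categories:
        raise ValueError(f"unknown category: {value}")
    return [1.0 if c == value else 0.0 for c in categories]


def encode(d: Defect) -> List[float]:
    """Concatenate one-hot categoricals, numeric fields and Boolean area flags.

    Numeric fields are left raw here; they would still need standardization.
    """
    return (
        one_hot(d.priority, PRIORITIES)
        + one_hot(d.status, STATUSES)
        + one_hot(d.resolution, RESOLUTIONS)
        + [d.time_to_resolve, float(d.comments), float(d.attachments)]
        + [float(a) for a in d.areas]
    )


d = Defect("priority2", "status1", "resolution2", 12.5, 3, 1, [0, 0, 0, 1, 0, 0, 0, 0])
vec = encode(d)  # 4 + 6 + 10 + 3 + 8 = 31 features
```

One-hot encoding keeps distances between categories equal, at the cost of one column per category value.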
Distribution of Defect Reports (Project 1)
2,795 defects
Distribution of Defect Reports (Project 2)
5,788 defects
Objects of Classification According to the Area of Testing
$DF = \{df_1, df_2, \ldots, df_n\}$,
where $df_j$ is a defect and $n$ is the number of bugs.
$df_j = \{\text{component}, \text{area}\}$
component = concatenate(Component/s, Summary)
$\text{area} \in \{0,1\}^M$,
where $M$ is the number of areas of testing.
Project 1: n = 2,795; M = 8
Project 2: n = 5,788; M = 10
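A minimal sketch of building one classification object $df_j$. It assumes the raw report is a dictionary with hypothetical keys `components` (the Component/s field, which may hold several values) and `summary`, and that the report's areas of testing are given as 0-based indices; `make_df` is a hypothetical helper.

```python
from typing import Any, Dict, List, Set, Tuple


def make_df(report: Dict[str, Any], areas: Set[int], m: int) -> Tuple[str, List[int]]:
    """Build df_j = (component, area) for one defect report (hypothetical helper).

    `report["components"]` is assumed to be a list of Component/s values and
    `report["summary"]` the Summary text; `areas` holds 0-based area indices;
    `m` is M, the number of areas of testing.
    """
    components = " ".join(report["components"])
    # concatenate(Component/s, Summary)
    component = f"{components} {report['summary']}".strip()
    # Multi-hot vector in {0,1}^M: a report may belong to several areas.
    area = [1 if i in areas else 0 for i in range(m)]
    return component, area


component, area = make_df(
    {"components": ["Gateway", "FIX"], "summary": "Order rejected on amend"}, {1}, 8
)
```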
Preprocessing: Text Fields
● Natural language processing:
❖ Tokenization
❖ Removal of stop-words
❖ Stemming
● Bag of words (TF-IDF)
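The preprocessing steps above can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the stop-word list is a tiny hypothetical sample, `stem` is a crude suffix stripper standing in for a real stemmer such as Porter's, and TF-IDF uses the variant tf = count / document length, idf = log(N / document frequency), one of several common definitions.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Tiny illustrative stop-word list; a real pipeline would use a full list.
STOP_WORDS = {"a", "an", "the", "is", "on", "in", "of", "to", "and", "for", "when", "after"}


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def stem(token: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> List[str]:
    """Tokenize, drop stop words, then stem."""
    return [stem(t) for t in tokenize(text) if t not in STOP_WORDS]


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF with tf = count / doc length and idf = log(N / document frequency)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


docs = [
    preprocess("Order rejected when amending the price"),
    preprocess("Price amended on the gateway"),
    preprocess("Gateway rejects orders after restart"),
]
weights = tf_idf(docs)
```

A term that occurs in only one report (here "restart") receives a higher weight than terms shared across reports.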
Techniques: Feature Selection
● Information gain
● Consistency-based method
● Correlation-based method
● Simplified Silhouette Filter
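Information gain, the first technique listed, scores a feature by how much it reduces the entropy of the class label. The sketch below assumes a binary term-presence feature and a binary label for a single area of testing; `entropy`, `information_gain` and the sample data are illustrative, not a specific tool's API.

```python
import math
from collections import Counter
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """IG(Y; X) = H(Y) - sum over values x of P(X = x) * H(Y | X = x)."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


labels = [1, 1, 0, 0]       # 1 = report belongs to the area (made-up data)
informative = [1, 1, 0, 0]  # term appears exactly in the area's reports
useless = [1, 0, 1, 0]      # term presence unrelated to the area
```

Features with the highest gain would be kept; here `informative` scores 1 bit and `useless` scores 0.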
Techniques: Classifiers
● Logistic regression
● SVM
● Decision tree
● Random forest
● Naive Bayes
● Bayes Net
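As an illustration of one listed classifier, here is a minimal multinomial Naive Bayes with add-one smoothing, trained for a single area of testing (label 1 if a report belongs to the area). Treating each area as a separate binary task is an assumption suggested by the per-area results; the `TinyMultinomialNB` class and the sample tokens are hypothetical and do not reproduce the study's configuration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


class TinyMultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing; illustrative only."""

    def fit(self, docs: List[List[str]], labels: List[int]) -> "TinyMultinomialNB":
        self.classes = sorted(set(labels))
        self.vocab = {t for doc in docs for t in doc}
        n = len(labels)
        # Log prior: share of training reports in each class.
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        counts: Dict[int, Counter] = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            counts[y].update(doc)
        totals = {c: sum(counts[c].values()) for c in self.classes}
        v = len(self.vocab)
        # Smoothed log likelihood of each vocabulary term given the class.
        self.log_lik = {
            c: {t: math.log((counts[c][t] + 1) / (totals[c] + v)) for t in self.vocab}
            for c in self.classes
        }
        return self

    def predict(self, doc: List[str]) -> int:
        """Return the class with the highest posterior score; unseen terms are ignored."""
        scores = {
            c: self.log_prior[c] + sum(self.log_lik[c][t] for t in doc if t in self.vocab)
            for c in self.classes
        }
        return max(scores, key=scores.get)


train_docs = [
    ["gateway", "reject", "order"],
    ["gateway", "login", "fail"],
    ["report", "export", "csv"],
    ["report", "total", "wrong"],
]
train_labels = [1, 1, 0, 0]  # 1 = hypothetical "Gateway" area
model = TinyMultinomialNB().fit(train_docs, train_labels)
```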
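Results: Metrics
F-measure values for Project 1:
| Method | Area 1 | Area 2 | Area 3 | Area 4 | Area 5 | Area 6 | Area 7 | Area 8 |
|---|---|---|---|---|---|---|---|---|
| RF+Cons | 0.942 | 0.837 | 0.92 | 0.95 | 0.951 | 0.975 | 0.991 | 0.975 |
| SVM+Cons | 0.946 | 0.844 | 0.914 | 0.954 | 0.954 | 0.976 | 0.991 | 0.965 |

F-measure values for Project 2:
| Method | Area 1 | Area 2 | Area 3 | Area 4 | Area 5 | Area 6 | Area 7 | Area 8 | Area 9 | Area 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| RF+Cons | 0.814 | 0.885 | 0.928 | 0.909 | 0.912 | 0.924 | 0.973 | 0.936 | 0.967 | 0.929 |
| SVM+Cons | 0.824 | 0.888 | 0.931 | 0.91 | 0.908 | 0.928 | 0.973 | 0.926 | 0.971 | 0.93 |

RF: Random forest; SVM: support vector machine; Cons: consistency-based feature selection.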
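The F-measure reported per area is the harmonic mean of precision and recall. A minimal sketch for one area, assuming label 1 marks the positive class; `f_measure` and the sample predictions are illustrative.

```python
from typing import Sequence


def f_measure(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2PR / (P + R) for the positive class 1; 0.0 if there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up predictions for one area: 2 true positives, 1 false positive, 1 false negative.
score = f_measure([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0])  # P = R = 2/3, so F1 = 2/3
```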
Preprocessing: Numeric Fields
● Standardization: $z = (x - \mu)/\sigma$, where $\mu$ is the mean and $\sigma$ the standard deviation of the attribute
● The Pearson correlation coefficient
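Both numeric preprocessing steps fit in a few lines. This sketch assumes the population standard deviation (the talk does not say which estimator was used) and inputs with nonzero variance; the sample values are made up.

```python
import math
from typing import List, Sequence


def standardize(xs: Sequence[float]) -> List[float]:
    """z = (x - mu) / sigma with the population standard deviation (assumed)."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return [(x - mu) / sigma for x in xs]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson r = cov(X, Y) / (sigma_X * sigma_Y); the 1/n factors cancel."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


time_to_resolve = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values: mean 5, sigma 2
z = standardize(time_to_resolve)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

A correlation close to +1 or -1 between two numeric attributes indicates that one of them adds little information to the clustering.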
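Clustering
$C = \{c_1, c_2, \ldots, c_k, \ldots, c_g\}$, where $g$ is the number of clusters,
$c_k = \{d_j, d_q \mid d_j \in D,\ d_q \in D,\ \text{distance}(d_j, d_q) < s\}$,
where $s$ is a threshold that defines how close objects must be to be included in one cluster.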
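The talk names k-means as the clustering algorithm. Below is a plain k-means (Lloyd's algorithm) sketch on numeric vectors such as standardized attributes, using Euclidean distance. Initializing from the first k points is a simplifying assumption for reproducibility (k-means++ or several random restarts are common in practice), and the sample points are made up.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm; returns (cluster label per point, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (11, 10)]
labels, centroids = kmeans(points, k=2)
```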
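The Results of the Validity Indices
Columns 2 to 9 give the number of clusters.
| Index | Project | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Silhouette Index | 1 | 0.9993 | 0.9996 | 0.9999 | 1 | 1 | 0.6489 | 0.6335 | 0.6375 |
| Davies-Bouldin Index | 1 | 0.2733 | 0.2445 | 0.1098 | 0.0488 | 4.5635e-04 | 0.2367 | 0.3278 | 0.5492 |
| Silhouette Index | 2 | 0.9997 | 0.9997 | 0.9999 | 1 | 0.57 | 0.6284 | 0.6381 | 0.6002 |
| Davies-Bouldin Index | 2 | 0.2364 | 0.3939 | 0.1413 | 2.9406e-04 | 0.3364 | 0.4304 | 0.5329 | 0.6636 |

Higher Silhouette and lower Davies-Bouldin values indicate better-separated clusters. The Davies-Bouldin minimum, together with a Silhouette value of 1, falls at 6 clusters for Project 1 and at 5 clusters for Project 2.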
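To produce such a table, one clusters the data for each candidate number of clusters and scores every partition with both indices. The sketch below computes the two indices for a given labeling using Euclidean distance; it assumes at least two clusters, gives singleton clusters a silhouette of 0 (a common convention), and is not necessarily the implementation the authors used. The sample points are made up.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette(points: List[Point], labels: List[int]) -> float:
    """Mean of s(i) = (b - a) / max(a, b) over all points."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        same = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster
            scores.append(0.0)
            continue
        a = sum(dist(p, q) for q in same) / len(same)  # mean intra-cluster distance
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def davies_bouldin(points: List[Point], labels: List[int]) -> float:
    """DB = mean over clusters i of max over j != i of (S_i + S_j) / d(c_i, c_j)."""
    groups: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    ids = sorted(groups)
    cent = {c: [sum(col) / len(groups[c]) for col in zip(*groups[c])] for c in ids}
    # S_c: mean distance of a cluster's members to its centroid.
    scatter = {c: sum(dist(p, cent[c]) for p in groups[c]) / len(groups[c]) for c in ids}
    total = 0.0
    for i in ids:
        total += max((scatter[i] + scatter[j]) / dist(cent[i], cent[j]) for j in ids if j != i)
    return total / len(ids)


pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = [0, 0, 1, 1]  # compact, well-separated clusters
bad = [0, 1, 0, 1]   # each cluster mixes the two groups
```

For the `good` labeling the silhouette is close to 1 and Davies-Bouldin is close to 0, the same pattern as the best columns in the table.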
Approach: Clustering
Final Centroids of the First Project
| Attribute / Cluster | Cluster0 | Cluster1 | Cluster2 | Cluster3 | Cluster4 | Cluster5 |
|---|---|---|---|---|---|---|
| Number of defects | 653 | 175 | 909 | 426 | 208 | 424 |
| Priority | Priority3 | Priority3 | Priority3 | Priority2 | Priority2 | Priority2 |
| Status | Status1 | Status1 | Status1 | Status1 | Status1 | Status1 |
| Resolution | Resolution2 | Resolution2 | Resolution2 | Resolution2 | Resolution2 | Resolution2 |
| Time to resolve | -0.112 | 0.2846 | 0.075 | -0.2365 | -0.2172 | 0.2385 |
| Count of comments | -0.0398 | -0.3405 | -0.0426 | 0.1529 | -0.2418 | 0.2581 |
| Count of attachments | -0.1479 | -0.1257 | 0.1182 | 0.1012 | -0.1763 | 0.0111 |
| Area 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Area 2 | 0 | 0 | 0 | 0 | 1 | 0 |
| Area 3 | 0 | 0 | 0 | 0 | 0 | 0 |
| Area 4 | 0 | 0 | 1 | 0 | 0 | 0 |
| Area 5 | 0 | 0 | 0 | 0 | 0 | 1 |
| Area 6 | 0 | 0 | 0 | 1 | 0 | 0 |
| Area 7 | 0 | 1 | 0 | 0 | 0 | 0 |
| Area 8 | 1 | 0 | 0 | 0 | 0 | 0 |
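In the centroid tables each categorical row shows a single category per cluster, which is consistent with the common convention of reporting the most frequent value (mode) for categorical attributes and the mean for numeric ones. A minimal sketch of that convention; `mixed_centroid`, the attribute names and the standardized values are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List, Set, Union

Value = Union[str, float]


def mixed_centroid(rows: List[Dict[str, Value]], categorical: Set[str]) -> Dict[str, Value]:
    """Centroid of one cluster: mode for categorical attributes, mean otherwise."""
    centroid: Dict[str, Value] = {}
    for attr in rows[0]:
        values = [r[attr] for r in rows]
        if attr in categorical:
            centroid[attr] = Counter(values).most_common(1)[0][0]  # most frequent value
        else:
            centroid[attr] = mean(values)  # mean of standardized values
    return centroid


cluster = [
    {"priority": "priority3", "status": "status1", "time_to_resolve": -0.5, "comments": 0.2},
    {"priority": "priority3", "status": "status1", "time_to_resolve": 0.1, "comments": -0.4},
    {"priority": "priority2", "status": "status1", "time_to_resolve": 0.1, "comments": 0.5},
]
c = mixed_centroid(cluster, {"priority", "status"})
```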
Final Centroids of the Second Project
| Attribute / Cluster | Cluster0 | Cluster1 | Cluster2 | Cluster3 | Cluster4 |
|---|---|---|---|---|---|
| Number of defects | 578 | 1855 | 2051 | 659 | 645 |
| Priority | Priority3 | Priority2 | Priority2 | Priority3 | Priority3 |
| Status | Status3 | Status1 | Status1 | Status1 | Status1 |
| Resolution | Resolution1 | Resolution2 | Resolution2 | Resolution2 | Resolution2 |
| Time to resolve | -0.4452 | -0.0537 | -0.2282 | 0.8606 | 0.3999 |
| Count of comments | 0.5361 | -0.1157 | -0.1576 | -0.1688 | 0.526 |
| Count of attachments | 0.0263 | 0.1243 | -0.181 | 0.0025 | 0.1921 |
| Area 1 | 0 | 0 | 0 | 0 | 0 |
| Area 2 | 0 | 0 | 0 | 0 | 0 |
| Area 3 | 0 | 0 | 0 | 0 | 0 |
| Area 4 | 0 | 0 | 0 | 0 | 0 |
| Area 5 | 0 | 0 | 0 | 0 | 0 |
| Area 6 | 0 | 0 | 0 | 0 | 1 |
| Area 7 | 0 | 0 | 0 | 0 | 0 |
| Area 8 | 0 | 0 | 0 | 1 | 0 |
| Area 9 | 0 | 1 | 0 | 0 | 0 |
| Area 10 | 0 | 0 | 0 | 0 | 0 |
How Clustering Results Affect the Testing Strategy
Conclusions
1. Using an extended set of attributes for the cluster analysis.
2. Using the k-means algorithm, with the number of clusters set by calculating the Silhouette and the Davies-Bouldin indices.
3. Giving a description and interpretation of the resulting clusters.
Future work
● Building an automated recommendation system for Project Managers and QA Team Leads;
● Improving the existing processes of developing testing strategies and plans;
● Analyzing threats to validity.
Thank you!
Related work
1. Bhattacharya P., Neamtiu I. Bug-fix time prediction models: Can we do better? / In Proc. 8th Working Conf. Mining Software Repositories, P. 207–210. New York, NY, USA: ACM, 2011.
2. Chaddock R.E. Principles and methods of statistics, First Edition. Cambridge: Houghton Mifflin Company, The Riverside Press, 1952.
3. Gromova A.O. Defect Report Classification in Accordance with Areas of Testing / Proceedings of TMPA 2017 Conference. To be published in Springer CCIS series in 2017.
4. Fry Z.P., Weimer W. Clustering static analysis defect reports to reduce maintenance costs / In Proc. Working Conference on Reverse Engineering (WCRE), P. 282–291, 2013.
5. Guo P.J., Zimmermann T., Nagappan N., Murphy B. Characterizing and predicting which bugs get fixed: An empirical study of Microsoft Windows / In Proc. 32nd ACM/IEEE Int. Conf. Software Eng. (ICSE ’10), vol. 1, P. 495–504. New York, NY, USA: ACM, 2010.
6. Hooimeijer P., Weimer W. Modeling bug report quality / In ASE ’07: Proceedings of the Twenty-Second IEEE/ACM International Conference on Automated Software Engineering, P. 34–43, 2007.
7. Lamkanfi A., Demeyer S., Soetens Q., Verdonck T. Comparing mining algorithms for predicting the severity of a reported bug / In Proc. 15th Eur. Conf. Software Maintenance and Reengineering (CSMR), P. 249–258, 2011.
8. Limsettho N., Hata H., Monden A., Matsumoto K. Automatic Unsupervised Bug Report Categorization / In 2014 6th International Workshop on Empirical Software Engineering in Practice, P. 7–12, 2014.
9. Minh P.N. An Approach to Detecting Duplicate Bug Reports using N-gram Features and Cluster Shrinkage Technique / International Journal of Scientific and Research Publications (IJSRP), Volume 4 (5), 2014.
10. Rus V., Nan X., Shiva S., Chen Y. Clustering of Defect Reports Using Graph Partitioning Algorithms / In Proc. of the 21st International Conference on Software Engineering and Knowledge Engineering, P. 442–445, 2009.
11. Nagwani N.K., Bhansali A. A data mining model to predict software bug complexity using bug estimation and clustering / In Proc. 2010 Int. Conf. Recent Trends Inf., Telecommun., Comput. (ITC ’10), P. 13–17. Washington, DC, USA: IEEE Computer Society, 2010.
12. Strate J.D., Laplante P.A. A literature review of research in software defect reporting / IEEE Transactions on Reliability, vol. 62, P. 444–454, 2013.
13. Weiss C., Premraj R., Zimmermann T., Zeller A. How long will it take to fix this bug? / In Proc. 4th Int. Workshop Mining Software Repositories (MSR ’07), P. 1. Washington, DC, USA: IEEE Computer Society, 2007.
14. Zhou Y., Tong Y., Gu R., Gall H.C. Combining Text Mining and Data Mining for Bug Report Classification / In Proc. of 30th International Conference on Software Maintenance and Evolution (ICSM/ICSME), IEEE, P. 311–320, 2014.
The Area of Testing: Example
[Diagram labels: CR; Market Structure Document (T1: Property1 = true; T2: Property1 = true; Ti: Property1 = false); Current situation; Market Structure Gateway (T1: Property1 = true; T1: Property1 = NULL)]