LogOpt: Static Feature Extraction from Source Code for Automated Catch Block Logging Prediction
Sangeeta Lal and Ashish Sureka
JIIT, Noida, India
ABB, Bangalore, India
February 20, 2016
Outline
• Introduction
• Research Motivation
• Related Work
• Research Contribution
• Feature Details
• Proposed Framework
• Experimental Dataset
• Results
• Conclusion
Introduction
• Software Logging is an important software
development practice which is used to trace
important execution points in the source code.
• Example:
try {
    Thread.sleep(20);
    log.debug("Security checking request");
} catch (Exception e) { }
Why is software logging important?
• Logging statements are helpful in debugging
• Yuan et al. [1] reported that bug reports containing
logging statements are fixed 1.8 times faster
• Logs are often the only information available for
debugging
– Privacy concerns related to user input
– Difficulty of recreating the same execution
environment (same hardware, software
version)
Why is software logging important? Cont…
• Better than the commonly used “printf” statements for
debugging
– Logging libraries such as Log4j support
customization (function name, timestamp, and
thread ID can be printed easily)
– Verbosity levels (Info, Fatal, Error, Warn) can be
chosen based on severity and the needs of the
application
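The effect of a verbosity level can be sketched with the JDK's built-in java.util.logging package, used here instead of Log4j only so that the example has no external dependency. The class name, logger name, and messages are assumptions made for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class VerbosityDemo {
    // Returns a logger that only accepts messages at or above the given level.
    static Logger configuredLogger(Level threshold) {
        Logger logger = Logger.getLogger("verbosity.demo"); // hypothetical logger name
        logger.setLevel(threshold);
        return logger;
    }

    public static void main(String[] args) {
        Logger logger = configuredLogger(Level.WARNING);
        // Suppressed: INFO is below the WARNING threshold.
        logger.info("Security checking request");
        // Emitted: WARNING meets the threshold.
        logger.warning("Security check took longer than expected");
    }
}
```

Raising or lowering the threshold changes how much is logged without touching the individual logging statements, which a plain print statement cannot offer.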
Research Motivation
• Cost-benefit tradeoff in inserting logging statements
in the source code
– Sparse logging can lessen the benefits of logging by
leaving out important information
– Excessive logging can cause performance and cost
overhead
• Developers often face difficulty in inserting logging
statements in the source code
– Lack of training in making strategic logging
decisions; logging is mostly done based on the knowledge
and expertise of the developers
– Developers often modify logging statements as an
afterthought
Related Work
• State of the art of automated logging/logging
prediction
S.No  Author & Year       Aim                         Dataset
1     Zhu et al. [2015]   Logging prediction          Proprietary Microsoft software (C#)
2     Fu et al. [2014]    Logging prediction          Proprietary Microsoft software (C#) + open source project
3     Yuan et al. [2012]  Verbosity level prediction  Open source projects (C/C++)
4     Yuan et al. [2011]  Enhancing log statements    Open source project
Research Contribution
• We propose LogOpt, a machine learning tool
based on static features from source code for
catch block logging prediction
• We present results of a comprehensive evaluation
of LogOpt on two large open source projects
with five machine learning algorithms
• We identified 46 distinguishing features for
catch block logging prediction
Feature Extraction
• We extracted 46 distinguishing features
• Each feature has three properties
– Domain: Identifies the part of the source code from
which the feature is extracted
– Type: Identifies whether a feature is textual, boolean, or
numeric
– Class: Identifies whether a feature belongs to the
positive or negative class
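One way to picture the three properties is as a small descriptor type. The sketch below is only an illustration: the class, enum, and field names are assumptions, the domain is kept as free text because the slides do not name all three domains, and the five entries are the features described individually in this deck.

```java
import java.util.Arrays;
import java.util.List;

enum FeatureType { TEXTUAL, BOOLEAN, NUMERIC }

enum FeatureClass { POSITIVE, NEGATIVE }

// Hypothetical descriptor holding the three properties of one feature.
final class FeatureDescriptor {
    final String acronym;
    final String name;
    final String domain; // part of the source code the feature comes from
    final FeatureType type;
    final FeatureClass featureClass;

    FeatureDescriptor(String acronym, String name, String domain,
                      FeatureType type, FeatureClass featureClass) {
        this.acronym = acronym;
        this.name = name;
        this.domain = domain;
        this.type = type;
        this.featureClass = featureClass;
    }

    // The five features that the slides describe in detail.
    static List<FeatureDescriptor> examples() {
        return Arrays.asList(
            new FeatureDescriptor("STB", "LOC of try block", "try/catch",
                FeatureType.NUMERIC, FeatureClass.POSITIVE),
            new FeatureDescriptor("LTB", "Logged try block", "try/catch",
                FeatureType.BOOLEAN, FeatureClass.POSITIVE),
            new FeatureDescriptor("TSTB", "Thread.sleep() in try block", "try/catch",
                FeatureType.BOOLEAN, FeatureClass.NEGATIVE),
            new FeatureDescriptor("ETC", "Catch exception type", "try/catch",
                FeatureType.TEXTUAL, FeatureClass.POSITIVE),
            new FeatureDescriptor("RC", "Return statements in catch block", "try/catch",
                FeatureType.BOOLEAN, FeatureClass.NEGATIVE));
    }
}
```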
Features Extraction Cont…
• Example of catch block from Apache Tomcat project
showing three domains of the extracted features
Features Extraction Cont…
• Feature 1: LOC of try block [STB]
– Domain: try/catch
– Type: Numeric
– Class: Positive
• We hypothesize that the complexity of a try block can be an
indicator for the logging decision of its catch blocks
• Empirical results on Apache Tomcat show that the average
try block LOC is 9.19 for logged catch blocks, compared
with 3.74 for non-logged catch blocks
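A rough way to compute this feature from raw source text is sketched below. It is not the extraction code behind the reported numbers: the class and method names are assumptions, braces inside string literals or comments are not handled, and try-with-resources statements are ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TryBlockSize {
    private static final Pattern TRY_OPEN = Pattern.compile("\\btry\\s*\\{");

    // Returns the body of the first try block in the source, or null if none is found.
    static String firstTryBody(String source) {
        Matcher m = TRY_OPEN.matcher(source);
        if (!m.find()) {
            return null;
        }
        int start = m.end(); // first character after the opening brace
        int depth = 1;       // brace nesting depth inside the try block
        for (int i = start; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return source.substring(start, i);
            }
        }
        return null; // unbalanced braces
    }

    // Counts non-blank lines, used here as a simple LOC measure.
    static int linesOfCode(String body) {
        int count = 0;
        for (String line : body.split("\n")) {
            if (!line.trim().isEmpty()) {
                count++;
            }
        }
        return count;
    }
}
```

A parser-based extractor would be more robust; the text-based version only shows what the numeric feature measures.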
Features Extraction Cont…
• Feature 2: Logged try block [LTB]
– Domain: try/catch
– Type: Boolean
– Class: Positive
• We hypothesize that the presence of logging statements in the
try block can be an indicator for the logging decision of its
catch blocks
• Results show that 21.22% of logged catch blocks have
this feature, compared with 2.83% of non-logged catch
blocks
Features Extraction Cont…
• Feature 3: Thread.sleep() in try block [TSTB]
– Domain: try/catch
– Type: Boolean
– Class: Negative
• We observe that try blocks containing a call to
Thread.sleep() are usually not logged
• Results show that of 84 occurrences of try blocks with
Thread.sleep(), only 13 are logged
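Boolean features such as LTB and TSTB can be approximated with pattern matching on the text of a try block. The sketch below assumes that logger variables are named log or logger and that calls use common level names; both are assumptions for illustration, as are the class and method names.

```java
import java.util.regex.Pattern;

final class TryBlockFlags {
    // Assumed logger call shapes: log.debug(...), logger.warn(...), LOG.error(...).
    private static final Pattern LOG_CALL = Pattern.compile(
        "\\b(log|logger)\\s*\\.\\s*(trace|debug|info|warn|error|fatal)\\s*\\(",
        Pattern.CASE_INSENSITIVE);

    private static final Pattern SLEEP_CALL =
        Pattern.compile("\\bThread\\s*\\.\\s*sleep\\s*\\(");

    // LTB: does the try block contain a logging statement?
    static boolean hasLogStatement(String tryBody) {
        return LOG_CALL.matcher(tryBody).find();
    }

    // TSTB: does the try block call Thread.sleep()?
    static boolean hasThreadSleep(String tryBody) {
        return SLEEP_CALL.matcher(tryBody).find();
    }
}
```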
Features Extraction Cont…
• Feature 4: Catch exception type [ETC]
– Domain: try/catch
– Type: Textual
– Class: Positive
• Empirical results show that the logging ratio is skewed
across exception types
• Example: only 6.1% of catch blocks for the
InterruptedException type are logged
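The logging ratio per exception type is the share of catch blocks for that type that contain a logging statement. A minimal sketch of that computation follows; the class and method names are hypothetical, and the input is two parallel arrays with one entry per catch block.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ExceptionTypeRatio {
    // Returns, per exception type, the percentage of its catch blocks that are logged.
    static Map<String, Double> loggedRatio(String[] types, boolean[] logged) {
        // counts[0] = logged catch blocks, counts[1] = all catch blocks of the type
        Map<String, int[]> counts = new LinkedHashMap<>();
        for (int i = 0; i < types.length; i++) {
            int[] c = counts.computeIfAbsent(types[i], k -> new int[2]);
            if (logged[i]) {
                c[0]++;
            }
            c[1]++;
        }
        Map<String, Double> ratios = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            ratios.put(e.getKey(), 100.0 * e.getValue()[0] / e.getValue()[1]);
        }
        return ratios;
    }
}
```

A type whose ratio is far from the project-wide average, as reported for InterruptedException, carries information about the logging decision.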
Features Extraction Cont…
• Feature 5: Return statements in catch block [RC]
– Domain: try/catch
– Type: Boolean
– Class: Negative
• A return statement transfers control to the calling
method, hence a logging statement inserted after
the return statement adds no benefit
• Results show that of 579 occurrences of return in catch
blocks, only 88 catch blocks are logged
Extracted Features
• Catch Block
– Listing of 46 extracted features
S.No  Feature Name                             S.No  Feature Name                             S.No  Feature Name
1     Size of Try Block                        17    Variable Declaration Count in Method_BT  33    Call Name in Try Block
2     Size of Method_BT                        18    Method Call Count in Try Block           34    Method Call Name in Method_BT
3     Catch Exception Type                     19    Method Call Count in Method_BT           35    Throw/Throws in Try Block
4     Previous Catch Blocks                    20    Method Has Parameter                     36    Throw/Throws in Catch Block
5     Logged Previous Catch Blocks             21    Method Parameter Count                   37    Throw/Throws in Method_BT
6     Logged Try Block                         22    Method Parameters (Type)                 38    Return in Try Block
7     Logged Method_BT                         23    Method Parameters (Name)                 39    Return in Catch Block
8     Log Count in Try Block                   24    IF in Try Block                          40    Return in Method_BT
9     Log Count in Method_BT                   25    IF in Method_BT                          41    Assert in Try Block
10    Log Levels in Try Block                  26    IF Count in Try Block                    42    Assert in Catch Block
11    Log Levels in Method_BT                  27    IF Count in Method_BT                    43    Assert in Method_BT
12    Operators in Try Block                   28    Container Package Name                   44    Thread.sleep in Try Block
13    Operators in Method_BT                   29    Container Class Name                     45    Interrupted Exception Type
14    Count of Operators in Try Block          30    Container Method Name                    46    Exception Object "Ignore" in Catch
15    Count of Operators in Method_BT          31    Variable Declaration Name in Try Block
16    Variable Declaration Count in Try Block  32    Variable Declaration Name in Method_BT
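Empirical Analysis of Boolean Features
• Count and percentage across logged and non-logged
catch blocks (Apache Tomcat)
• Columns, as the values indicate: TCP is the number of catch
blocks with the feature; CLC and CNLC are how many of those
are logged and non-logged (CLC% and CNLC% are their shares
of TCP); PTLC and PTNLC are the percentages of all logged and
all non-logged catch blocks that have the feature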
S.NO Feature Class TCP CLC CLC% CNLC CNLC% PTLC PTNLC
1 [PCC] P 411 165 40.15 246 59.86 18.63 10.09
2 [LPCC] P 140 131 93.58 9 6.43 14.79 0.37
3 [LTB] P 257 188 73.16 69 26.85 21.22 2.83
4 [LM] P 507 336 66.28 171 33.73 37.93 7.02
5 [ITB] P 817 399 48.84 418 51.17 45.04 17.14
6 [IM] P 1667 602 36.12 1065 63.89 67.95 43.67
7 [PM] P 2144 582 27.15 1562 72.86 65.69 64.05
8 [TTB] N 151 39 25.83 112 74.18 4.41 4.6
9 [TTC] N 850 85 10 765 90 9.6 31.37
10 [TTM] N 342 40 11.7 302 88.31 4.52 12.39
11 [RTB] N 783 87 11.12 696 88.89 9.82 28.54
12 [RC] N 579 88 15.2 491 84.81 9.94 20.14
13 [RM] N 338 75 22.19 263 77.82 8.47 10.79
14 [ATB] N 18 0 0 18 100 0 0.74
15 [AC] N 16 0 0 16 100 0 0.66
16 [AM] N 7 0 0 7 100 0 0.29
17 [TSTB] N 84 13 15.48 71 84.53 1.47 2.92
18 [IEC] N 98 6 6.13 92 93.88 0.68 3.78
19 [EOIC] N 58 5 8.63 53 91.38 0.57 2.18
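The percentage columns can be reproduced from the counts. The sketch below recomputes the [LTB] row using the Apache Tomcat totals from the dataset slide (3325 catch blocks, 886 of them logged); the class and method names are illustrative, and the reading of the column meanings is inferred from the numbers rather than stated in the slides. The last digit can differ from the table because of rounding.

```java
final class BooleanFeatureStats {
    // Share of 'part' in 'whole', as a percentage.
    static double percent(int part, int whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        int totalCatch = 3325;  // all catch blocks in Apache Tomcat
        int totalLogged = 886;  // logged catch blocks in Apache Tomcat
        int tcp = 257;          // catch blocks with the LTB feature
        int clc = 188;          // of those, logged
        int cnlc = 69;          // of those, not logged

        System.out.printf("CLC%%  = %.2f%n", percent(clc, tcp));
        System.out.printf("CNLC%% = %.2f%n", percent(cnlc, tcp));
        System.out.printf("PTLC  = %.2f%n", percent(clc, totalLogged));
        System.out.printf("PTNLC = %.2f%n", percent(cnlc, totalCatch - totalLogged));
    }
}
```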
Empirical Analysis of Numerical Features
• Average values of numerical features in logged (AVLC) and
non-logged (AVNLC) catch blocks
                       Apache Tomcat    CloudStack
S.No  Feature Acronym  AVLC    AVNLC    AVLC    AVNLC
1     STB              9.19    3.74     13.00   11.73
2     SM               17.56   10.95    16.71   12.77
3     LCTB             0.36    0.03     0.89    0.06
4     LCM              0.97    0.18     1.16    0.14
5     COTB             26.72   10.90    40.95   48.88
6     COM              42.79   24.66    46.98   33.82
7     VCTB             1.05    0.39     2.65    3.05
8     VCM              2.03    1.39     3.32    2.65
9     MCTB             5.75    2.44     9.45    12.06
10    MCM              8.93    4.76     9.43    7.24
11    ICTB             1.54    0.44     1.68    1.43
12    ICM              2.95    1.64     2.19    1.01
13    PCM              1.96    2.26     2.26    1.50
LogOpt: Proposed Framework for Logging Prediction
• Overview of the proposed framework
LogOpt Model
• We use the LogOpt model with the following five machine
learning algorithms
– AdaBoost (ADA)
– Decision Tree (DT)
– Gaussian Naïve Bayes (GNB)
– K-Nearest Neighbor (KNN)
– Random Forest (RF)
• We created 10 subsamples of the negative class and report
average results
• 70-30 train-test split
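The sampling step can be sketched as follows. The slides do not state the subsample size, so this sketch assumes that each negative subsample has as many instances as the positive class; that assumption, the integer instance ids, and all names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class BalancedSampling {
    // Combines all positives with a random negative subsample of equal size
    // (assumed size; the slides do not specify it) and shuffles the result.
    static List<Integer> balancedSample(List<Integer> positives,
                                        List<Integer> negatives, long seed) {
        List<Integer> shuffledNegatives = new ArrayList<>(negatives);
        Collections.shuffle(shuffledNegatives, new Random(seed));
        int take = Math.min(positives.size(), negatives.size());

        List<Integer> sample = new ArrayList<>(positives);
        sample.addAll(shuffledNegatives.subList(0, take));
        Collections.shuffle(sample, new Random(seed));
        return sample;
    }

    // Number of instances that go to training in a 70-30 split;
    // the remaining instances form the test set.
    static int trainSize(int sampleSize) {
        return (int) Math.round(sampleSize * 0.7);
    }
}
```

Calling balancedSample with ten different seeds would give ten subsamples; each is split, a classifier is trained and tested, and the metrics are averaged.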
Experimental Dataset Details
• Two open source projects: Apache Tomcat and
CloudStack
S.No                        Apache Tomcat  CloudStack
1     Version               8.0.9          4.3.0
2     LOC (Java Code)       276081         1142928
3     Number of Java Files  2037           5351
4     Total Catch Blocks    3325           12584
5     Logged Catch Blocks   886 (27%)      2784 (22.12%)
Results
• LogOpt model results
Project        Classifier  Accuracy  Precision  Recall  F1     ROC
Apache Tomcat  ADA         81.04     80.40      82.41   81.33  81.06
Apache Tomcat  DT          80.54     77.21      86.88   81.83  80.39
Apache Tomcat  GNB         76.41     71.33      88.69   79.05  76.35
Apache Tomcat  KNN         68.72     72.68      64.19   62.00  66.81
Apache Tomcat  RF          85.12     83.98      87.11   85.50  85.10
CloudStack     ADA         91.68     89.81      94.04   91.87  91.68
CloudStack     DT          92.06     89.00      95.99   92.32  92.06
CloudStack     GNB         85.59     89.13      81.06   84.89  85.59
CloudStack     KNN         81.81     87.36      74.37   80.33  81.81
CloudStack     RF          92.92     88.26      99.02   93.34  92.92
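For reference, the standard definitions behind the accuracy, precision, recall, and F1 columns can be written as below. This is a generic sketch with illustrative names, not the evaluation code used for the table. Because the reported values are averages over subsamples, an F1 value computed from the averaged precision and recall need not match the reported F1 exactly.

```java
final class Metrics {
    // tp/fp/tn/fn: true positives, false positives, true negatives, false negatives.
    static double accuracy(int tp, int tn, int fp, int fn) {
        return 100.0 * (tp + tn) / (tp + tn + fp + fn);
    }

    static double precision(int tp, int fp) {
        return 100.0 * tp / (tp + fp);
    }

    static double recall(int tp, int fn) {
        return 100.0 * tp / (tp + fn);
    }

    // Harmonic mean of precision and recall (both given as percentages).
    static double f1(double precision, double recall) {
        return 2 * precision * recall / (precision + recall);
    }
}
```

For the CloudStack random forest row, f1(88.26, 99.02) comes out at roughly 93.33, close to the reported 93.34.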
Conclusion
• The machine learning based approach is effective in
catch block logging prediction, giving a highest
F1-score of 93.34% (CloudStack project)
• The random forest model gives the best results compared
with the other machine learning algorithms
References
• [1] D. Yuan, S. Park, and Y. Zhou, “Characterizing
logging practices in open source software,” ICSE, 2012.
• [2] Q. Fu, J. Zhu, W. Hu, J. Lou, R. Ding, and Q. Lin,
“Where do developers log? An empirical study on
logging practices in industry,” ICSE, 2014.
• [3] J. Zhu, P. He, Q. Fu, H. Zhang, M. Lyu, and
D. Zhang, “Learning to log: Helping developers
make informed logging decisions,” ICSE, 2015.
• [4] D. Yuan, J. Zheng, S. Park, Y. Zhou, and S.
Savage, “Improving software diagnosability via log
enhancement,” ASPLOS, 2011.
Questions?