
LogOpt: Static Feature Extraction from Source Code for Automated Catch Block Logging Prediction

Sangeeta Lal and Ashish Sureka

JIIT, Noida, India

ABB, Bangalore, India


February 20, 2016

Outline

• Introduction

• Research Motivation

• Related Work

• Research Contribution

• Feature Details

• Proposed Framework

• Experimental Dataset

• Results

• Conclusion


Introduction

• Software logging is an important software development practice used to trace important execution points in the source code.

• Example:

try {
    Thread.sleep(20);
    log.debug("Security checking request");  // logging statement inside the try block
} catch (Exception e) { }                    // catch block without logging

Why is software logging important?

• Logging statements are helpful in debugging

• Yuan et al. [1] reported that bug reports containing logging statements are fixed faster (1.8 times)

• Log output is often the only information available for debugging

– Privacy concerns related to user input

– Difficulty of recreating the same execution environment (same hardware, software version)

Why is software logging important? Cont…

• Better than the commonly used “printf” statements for debugging

– Logging libraries such as Log4j support customization (function name, timestamp, and thread ID can be printed easily)

– Verbosity levels (Info, Warn, Error, Fatal) can be chosen based on severity and the needs of the application

Research Motivation

• Cost and benefit tradeoff in inserting logging statements in the source code

– Sparse logging can lessen the benefits of logging by omitting important information

– Excessive logging can cause performance and cost overhead

• Developers often face difficulty in deciding where to insert logging statements in the source code

– Lack of training in making strategic logging decisions; logging is mostly done based on the knowledge and expertise of the developers

– Developers often modify logging statements as an afterthought

Related Work

• State of the art in automated logging and logging prediction

S.No  Author & Year        Aim                          Dataset
1     Zhu et al. [2015]    Logging prediction           Proprietary Microsoft software (C#)
2     Fu et al. [2014]     Logging prediction           Proprietary Microsoft software (C#) + open source project
3     Yuan et al. [2012]   Verbosity level prediction   Open source projects (C/C++)
4     Yuan et al. [2011]   Enhancing log statements     Open source project

Research Contribution

• 1. We propose LogOpt, a machine learning tool based on static features from source code for catch block logging prediction

• 2. We present the results of a comprehensive evaluation of LogOpt on two large open source projects with five machine learning algorithms

• 3. We identify 46 distinguishing features for catch block logging prediction

Feature Extraction

• We extracted 46 distinguishing features

• Each feature has three properties

– Domain: identifies the part of the source code from which the feature is extracted

– Type: identifies whether a feature is textual, boolean, or numeric

– Class: identifies whether a feature belongs to the positive or the negative class
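The three feature properties can be pictured as a small record type. The following is only a sketch: the names `Feature`, `FeatureType`, `FeatureClass`, and `describe` are hypothetical and not part of LogOpt, and reading "positive" as "associated with logged catch blocks" is an assumption. The example instance uses the values given on the Feature 1 slide.

```python
from dataclasses import dataclass
from enum import Enum


class FeatureType(Enum):
    TEXTUAL = "textual"
    BOOLEAN = "boolean"
    NUMERIC = "numeric"


class FeatureClass(Enum):
    # Assumption: "positive" is read as "associated with logged catch blocks".
    POSITIVE = "positive"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class Feature:
    """Hypothetical record holding the three properties of one feature."""
    acronym: str
    name: str
    domain: str            # free-form text, e.g. "try/catch"
    ftype: FeatureType
    fclass: FeatureClass


def describe(feature):
    """Format a feature and its three properties on one line."""
    return (f"{feature.acronym}: {feature.name} "
            f"[{feature.domain}, {feature.ftype.value}, {feature.fclass.value}]")


# Values taken from the Feature 1 slide (LOC of try block).
STB = Feature("STB", "LOC of try block", "try/catch",
              FeatureType.NUMERIC, FeatureClass.POSITIVE)
print(describe(STB))
```

Keeping the domain as free-form text avoids guessing the names of the three domains, which the slides only show in a figure.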

Feature Extraction Cont…

• Example of a catch block from the Apache Tomcat project showing the three domains of the extracted features

[Figure: annotated catch block from Apache Tomcat; image not reproduced]

• Feature 1: LOC of try block [STB]

– Domain: try/catch

– Type: Numeric

– Class: Positive

• We hypothesize that the complexity of a try block can be an indicator for the logging decision in its catch block

• Empirical results show that the average LOC of try blocks associated with logged catch blocks is 9.19, compared with 3.74 for non-logged catch blocks

• Feature 2: Logged try block [LTB]

– Type: Boolean

– Class: Positive

• We hypothesize that the presence of logging statements in the try block can be an indicator for the logging decision in the catch block

• Results show that 21.22% of logged catch blocks have this feature, compared with 2.83% of non-logged catch blocks

• Feature 3: Thread.sleep() in try block [TSTB]

– Type: Boolean

– Class: Negative

• We observe that catch blocks whose try block contains a call to Thread.sleep() are mostly not logged

• Results show that of 84 try blocks containing Thread.sleep(), only 13 have logged catch blocks

• Feature 4: Catch exception type [ETC]

– Type: Textual

– Class: Positive

• Empirical results show that the logging ratio across exception types is skewed

• Example: only 6.1% of catch blocks for InterruptedException are logged
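The logging ratio per exception type mentioned for Feature 4 can be computed with a simple tally. This is a minimal sketch: the function name `logging_ratio_by_exception` is hypothetical, and the sample data is a toy list invented for illustration, not taken from Tomcat or CloudStack.

```python
from collections import Counter


def logging_ratio_by_exception(catch_blocks):
    """Return {exception_type: percentage of its catch blocks that are logged}.

    catch_blocks is an iterable of (exception_type, is_logged) pairs.
    """
    total = Counter()
    logged = Counter()
    for exc_type, is_logged in catch_blocks:
        total[exc_type] += 1
        if is_logged:
            logged[exc_type] += 1
    return {t: round(100.0 * logged[t] / total[t], 2) for t in total}


# Toy data for illustration only (hypothetical, not from the studied projects).
sample = [
    ("InterruptedException", False),
    ("InterruptedException", False),
    ("InterruptedException", False),
    ("InterruptedException", True),
    ("IOException", True),
    ("IOException", True),
    ("IOException", False),
    ("IOException", True),
]
ratios = logging_ratio_by_exception(sample)
print(ratios)
```

A strongly uneven ratio across types is what makes the exception type usable as a textual feature.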

• Feature 5: Return statement in catch block [RC]

– Type: Boolean

– Class: Negative

• A return statement transfers control to the calling method, so a logging statement inserted after the return adds no benefit

• Results show that of 579 catch blocks containing a return statement, only 88 are logged
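To make features such as STB, LTB, TSTB, and RC concrete, the sketch below extracts them from a Java snippet with naive brace matching and regular expressions. It is not the LogOpt extractor: the function names, the `LOG_CALL` pattern, and the assumption that the logger variable is called `log` or `logger` are all illustrative choices, and braces inside strings or comments are not handled.

```python
import re

# Assumption: logging calls look like log.debug(...) or logger.error(...).
LOG_CALL = re.compile(
    r"\blog(?:ger)?\.(?:trace|debug|info|warn|error|fatal)\s*\(", re.I)


def _block(src, open_idx):
    """Return (body, end_idx) of the brace block that opens at src[open_idx]."""
    depth = 0
    for i in range(open_idx, len(src)):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return src[open_idx + 1:i], i + 1
    raise ValueError("unbalanced braces")


def extract_features(src):
    """Extract four illustrative features from the first try/catch in src."""
    try_match = re.search(r"\btry\s*\{", src)
    if try_match is None:
        raise ValueError("no try block found")
    try_body, after_try = _block(src, try_match.end() - 1)

    catch_match = re.search(r"\bcatch\s*\([^)]*\)\s*\{", src[after_try:])
    if catch_match is None:
        raise ValueError("no catch block found")
    catch_body, _ = _block(src, after_try + catch_match.end() - 1)

    return {
        # STB: non-blank lines inside the try block
        "STB": sum(1 for line in try_body.splitlines() if line.strip()),
        # LTB: try block contains a logging call
        "LTB": LOG_CALL.search(try_body) is not None,
        # TSTB: try block calls Thread.sleep()
        "TSTB": "Thread.sleep(" in try_body,
        # RC: catch block contains a return statement
        "RC": re.search(r"\breturn\b", catch_body) is not None,
    }


JAVA_SNIPPET = """
try {
    Thread.sleep(20);
    log.debug("Security checking request");
} catch (Exception e) {
    return;
}
"""
features = extract_features(JAVA_SNIPPET)
print(features)
```

A production extractor would more likely work on a parsed syntax tree than on raw text; the string-based approach is used here only to keep the example self-contained.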

Extracted Features

• Listing of the 46 features extracted for catch blocks

S.No  Feature Name                             S.No  Feature Name                               S.No  Feature Name
1     Size of Try Block                        17    Variable Declaration Count in Method_BT    33    Method Call Name in Try Block
2     Size of Method_BT                        18    Method Call Count in Try Block             34    Method Call Name in Method_BT
3     Catch Exception Type                     19    Method Call Count in Method_BT             35    Throw/Throws in Try Block
4     Previous Catch Blocks                    20    Method Has Parameter                       36    Throw/Throws in Catch Block
5     Logged Previous Catch Blocks             21    Method Parameter Count                     37    Throw/Throws in Method_BT
6     Logged Try Block                         22    Method Parameters (Type)                   38    Return in Try Block
7     Logged Method_BT                         23    Method Parameters (Name)                   39    Return in Catch Block
8     Log Count in Try Block                   24    IF in Try Block                            40    Return in Method_BT
9     Log Count in Method_BT                   25    IF in Method_BT                            41    Assert in Try Block
10    Log Levels in Try Block                  26    IF Count in Try Block                      42    Assert in Catch Block
11    Log Levels in Method_BT                  27    IF Count in Method_BT                      43    Assert in Method_BT
12    Operators in Try Block                   28    Container Package Name                     44    Thread.sleep in Try Block
13    Operators in Method_BT                   29    Container Class Name                       45    Interrupted Exception Type
14    Count of Operators in Try Block          30    Container Method Name                      46    Exception Object "Ignore" in Catch
15    Count of Operators in Method_BT          31    Variable Declaration Name in Try Block
16    Variable Declaration Count in Try Block  32    Variable Declaration Name in Method_BT

Empirical Analysis of Boolean Features

• Count and percentage across logged and non-logged catch blocks (Apache Tomcat)

• Column legend, as implied by the values: TCP = total catch blocks having the feature; CLC / CNLC = count among logged / non-logged catch blocks; CLC% / CNLC% = share of TCP; PTLC / PTNLC = percentage of all logged / non-logged catch blocks having the feature

S.No  Feature  Class  TCP   CLC  CLC%   CNLC  CNLC%  PTLC   PTNLC
1     [PCC]    P      411   165  40.15  246   59.86  18.63  10.09
2     [LPCC]   P      140   131  93.58  9     6.43   14.79  0.37
3     [LTB]    P      257   188  73.16  69    26.85  21.22  2.83
4     [LM]     P      507   336  66.28  171   33.73  37.93  7.02
5     [ITB]    P      817   399  48.84  418   51.17  45.04  17.14
6     [IM]     P      1667  602  36.12  1065  63.89  67.95  43.67
7     [PM]     P      2144  582  27.15  1562  72.86  65.69  64.05
8     [TTB]    N      151   39   25.83  112   74.18  4.41   4.6
9     [TTC]    N      850   85   10     765   90     9.6    31.37
10    [TTM]    N      342   40   11.7   302   88.31  4.52   12.39
11    [RTB]    N      783   87   11.12  696   88.89  9.82   28.54
12    [RC]     N      579   88   15.2   491   84.81  9.94   20.14
13    [RM]     N      338   75   22.19  263   77.82  8.47   10.79
14    [ATB]    N      18    0    0      18    100    0      0.74
15    [AC]     N      16    0    0      16    100    0      0.66
16    [AM]     N      7     0    0      7     100    0      0.29
17    [TSTB]   N      84    13   15.48  71    84.53  1.47   2.92
18    [IEC]    N      98    6    6.13   92    93.88  0.68   3.78
19    [EOIC]   N      58    5    8.63   53    91.38  0.57   2.18
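The percentage columns of the boolean feature table appear to follow from the two raw counts and the project totals. The sketch below reproduces the [LTB] row under that reading; the column interpretation is an assumption inferred from the numbers, the function name `boolean_feature_stats` is hypothetical, and the totals (886 logged out of 3325 catch blocks) come from the Experimental Dataset Details slide for Apache Tomcat.

```python
# Apache Tomcat totals from the Experimental Dataset Details slide.
TOTAL_CATCH = 3325
TOTAL_LOGGED = 886
TOTAL_NON_LOGGED = TOTAL_CATCH - TOTAL_LOGGED  # 2439


def boolean_feature_stats(clc, cnlc):
    """Derive the percentage columns from the logged / non-logged counts.

    Assumed meaning of the columns (inferred, not stated on the slide):
      CLC%, CNLC%  = share of the blocks having the feature
      PTLC, PTNLC  = share of all logged / non-logged catch blocks
    """
    tcp = clc + cnlc
    return {
        "TCP": tcp,
        "CLC%": 100.0 * clc / tcp,
        "CNLC%": 100.0 * cnlc / tcp,
        "PTLC": 100.0 * clc / TOTAL_LOGGED,
        "PTNLC": 100.0 * cnlc / TOTAL_NON_LOGGED,
    }


# Counts for the [LTB] row: 188 logged, 69 non-logged.
ltb = boolean_feature_stats(clc=188, cnlc=69)
for column, value in ltb.items():
    print(column, round(value, 2))
```

The recomputed values agree with the table to within a small rounding difference in the second decimal place.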

Empirical Analysis of Numerical Features

• Average values of numerical features in logged (AVLC) and non-logged (AVNLC) catch blocks

                        Apache Tomcat     CloudStack
S.No  Feature Acronym   AVLC    AVNLC     AVLC    AVNLC
1     STB               9.19    3.74      13.00   11.73
2     SM                17.56   10.95     16.71   12.77
3     LCTB              0.36    0.03      0.89    0.06
4     LCM               0.97    0.18      1.16    0.14
5     COTB              26.72   10.90     40.95   48.88
6     COM               42.79   24.66     46.98   33.82
7     VCTB              1.05    0.39      2.65    3.05
8     VCM               2.03    1.39      3.32    2.65
9     MCTB              5.75    2.44      9.45    12.06
10    MCM               8.93    4.76      9.43    7.24
11    ICTB              1.54    0.44      1.68    1.43
12    ICM               2.95    1.64      2.19    1.01
13    PCM               1.96    2.26      2.26    1.50

LogOpt: Proposed Framework for Logging Prediction

• Overview of the proposed framework

[Figure: framework overview diagram; image not reproduced]

LogOpt Model

• We use the LogOpt model with the following five machine learning algorithms

– AdaBoost (ADA)

– Decision Trees (DT)

– Gaussian Naïve Bayes (GNB)

– K-nearest neighbor (KNN)

– Random Forest (RF)

• We created 10 subsamples of the negative class and report average results

• 70-30 train-test split
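The sampling setup (10 subsamples of the negative class, 70-30 split) might look like the sketch below. Several details are assumptions not stated on the slide: each subsample is drawn without replacement, it has the same size as the positive class, and the split is a plain random shuffle. The function names are hypothetical, and the placeholder rows only mirror the Apache Tomcat class sizes (886 logged, 2439 non-logged).

```python
import random


def balanced_subsamples(positives, negatives, n_subsamples=10, seed=0):
    """Yield datasets that pair all positives with a random negative subsample.

    Assumption: each negative subsample has as many rows as the positive class.
    """
    rng = random.Random(seed)
    for _ in range(n_subsamples):
        yield positives + rng.sample(negatives, len(positives))


def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of rows and cut it into train and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows (block_id, label); only the class sizes follow the slides.
positives = [(i, 1) for i in range(886)]
negatives = [(i, 0) for i in range(886, 3325)]

splits = [
    train_test_split(sample, seed=k)
    for k, sample in enumerate(balanced_subsamples(positives, negatives))
]
for train, test in splits[:2]:
    print(len(train), len(test))
```

Each of the ten splits would then be used to train and evaluate a classifier, with the reported metric being the mean over the ten runs.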

Experimental Dataset Details

• Two open source projects: Apache Tomcat and CloudStack

S.No  Property               Apache Tomcat  CloudStack
1     Version                8.0.9          4.3.0
2     LOC (Java Code)        276081         1142928
3     Number of Java Files   2037           5351
4     Total Catch Blocks     3325           12584
5     Logged Catch Blocks    886 (27%)      2784 (22.12%)

Results

• LogOpt model results (all values in %)

Project        Classifier  Accuracy  Precision  Recall  F1     ROC
Apache Tomcat  ADA         81.04     80.40      82.41   81.33  81.06
               DT          80.54     77.21      86.88   81.83  80.39
               GNB         76.41     71.33      88.69   79.05  76.35
               KNN         68.72     72.68      64.19   62.00  66.81
               RF          85.12     83.98      87.11   85.50  85.10
CloudStack     ADA         91.68     89.81      94.04   91.87  91.68
               DT          92.06     89.00      95.99   92.32  92.06
               GNB         85.59     89.13      81.06   84.89  85.59
               KNN         81.81     87.36      74.37   80.33  81.81
               RF          92.92     88.26      99.02   93.34  92.92
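As a reading aid for the results table, the F1 score is the harmonic mean of precision and recall. The sketch below recomputes it from the reported precision and recall of two rows. Because the slides report averages over 10 subsamples, the recomputed value is expected to be close to the tabulated F1 but not necessarily identical to it. The function name `f1_score` is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in %) copied from the results table.
rf_cloudstack = f1_score(88.26, 99.02)   # table reports F1 = 93.34
ada_tomcat = f1_score(80.40, 82.41)      # table reports F1 = 81.33

print(round(rf_cloudstack, 2), round(ada_tomcat, 2))
```

Small gaps between the recomputed and reported values are consistent with averaging per-run F1 scores rather than computing F1 from averaged precision and recall.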

Conclusion

• A machine learning based approach is effective for catch block logging prediction, giving a highest F1-score of 93.34% (CloudStack project)

• The Random Forest model gives the best results compared with the other machine learning algorithms

References

• [1] D. Yuan, S. Park, and Y. Zhou, “Characterizing Logging Practices in Open Source Software,” ICSE, 2012.

• [2] Q. Fu, J. Zhu, W. Hu, J. Lou, R. Ding, and Q. Lin, “Where do developers log? An empirical study on logging practices in industry,” ICSE, 2014.

• [3] J. Zhu, P. He, Q. Fu, H. Zhang, M. Lyu, and D. Zhang, “Learning to log: Helping developers make informed logging decisions,” ICSE, 2015.

• [4] D. Yuan, J. Zheng, S. Park, Y. Zhou, and S. Savage, “Improving software diagnosability via log enhancement,” ASPLOS, 2011.

Thank You


Questions?
