Promise 2011: Keynote 2 - "Nothing else Matters: What Predictive Model should I use?"

Nothing else Matters: What Predictive Model should I use?

Massimiliano Di Penta

University of Sannio, [email protected]

http://www.rcost.unisannio.it/mdipenta




FAQ when people meet me for the first time at a conference

University of... what?







About me

• Not really a wizard of predictor models

• Software evolution

• Mining software repositories

• Experimental software engineering

• Search-based software engineering

Interests

Design and experiment material

[Figure: experiment design table. Groups 1 to 4 worked in Lab 1 and Lab 2 on the Claros and WfMS systems, using Conallen or UML diagrams.]

Subjects received:

• Short description of the application

• Diagrams

• Source code


Example of CS Pair

CrNoOutgoingTransitions.java (ver. 1.1)

    package org.argouml.uml.cognitive.critics;
    ...
    public class CrNoOutgoingTransitions extends CrUML {
        ...
        public boolean predicate2(Object dm, Designer dsgr) {
            if (!(dm instanceof MStateVertex)) return NO_PROBLEM;
            MStateVertex sv = (MStateVertex) dm;
            if (sv instanceof MState) {
                MStateMachine sm = ((MState) sv).getStateMachine();
                if (sm != null && sm.getTop() == sv) return NO_PROBLEM;
            }
            Collection outgoing = sv.getOutgoings();
            boolean needsOutgoing = outgoing == null || outgoing.size() == 0;
            if (sv instanceof MFinalState) {
                needsOutgoing = false;
            }
            if (needsOutgoing) return PROBLEM_FOUND;
            return NO_PROBLEM;
        }
    } /* end class CrNoOutgoingTransitions */

CrNoIncomingTransitions.java (ver. 1.1)

    public class CrNoIncomingTransitions extends CrUML {
        ...
        public boolean predicate2(Object dm, Designer dsgr) {
            if (!(dm instanceof MStateVertex)) return NO_PROBLEM;
            MStateVertex sv = (MStateVertex) dm;
            if (sv instanceof MState) {
                MStateMachine sm = ((MState) sv).getStateMachine();
                if (sm != null && sm.getTop() == sv) return NO_PROBLEM;
            }
            //Vector outgoing = sv.getOutgoing();
            Collection incoming = sv.getIncomings();
            //boolean needsOutgoing = outgoing == null || outgoing.size() == 0;
            boolean needsIncoming = incoming == null || incoming.size() == 0;
            if (sv instanceof MPseudostate) {
                MPseudostateKind k = ((MPseudostate) sv).getKind();
                if (k.equals(MPseudostateKind.INITIAL)) needsIncoming = false;
                //if (k.equals(MPseudostateKind.FINAL)) needsOutgoing = false;
            }
            // if (needsIncoming && !needsOutgoing) return PROBLEM_FOUND;
            if (needsIncoming) return PROBLEM_FOUND;
            return NO_PROBLEM;
        }
    } /* end class CrNoIncomingTransitions */

Segments marked in the figure: CS1, CS2, CS3, CS4


Evolution of vulnerability density

Samba - Overall

• Splint vulnerabilities tend to have a lower density (thorough analysis)

• Initially, a high number of vulnerabilities detected by RATS (pre-release), then vulnerabilities removed by security patches

• No trend detected (ADF test)

Squid – Buffer Overflows

• Buffer overflows introduced at release 2.3 STABLE3

• Then removed in the subsequent releases 2.4STABLE7 and 2.5STABLE7 with proper security patches, as documented in the system history


Recall the content of a licensing…

    /* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
    /* ***** BEGIN LICENSE BLOCK *****
     * Version: MPL 1.1/GPL 2.0/LGPL 2.1
     *
     * The contents of this file are subject to the Mozilla Public License Version
     * 1.1 (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     * http://www.mozilla.org/MPL/
     ….
     * Portions created by the Initial Developer are Copyright (C) 2002
     * the Initial Developer. All Rights Reserved.
     *
     * Contributor(s):
     *   Brian Ryner <[email protected]>
     ….
     * decision by deleting the provisions above and replace them with the notice
     * and other provisions required by the GPL or the LGPL. If you do not delete
     * the provisions above, a recipient may use your version of this file under
     * the terms of any one of the MPL, the GPL or the LGPL.
     *
     * ***** END LICENSE BLOCK ***** */
    #include "nsXULAppAPI.h"
    #ifdef XP_WIN
    #include <windows.h>

Elements annotated on the slide: license (MPL+GPL+LGPL), copyright statement, copyright year, contributor

D. M. German and M. Di Penta


[Figure: RQ3 – CSBF graph (excerpt). Blue/cyan: FreeBSD; red/orange: OpenBSD; yellow: common.]


Association rules vs. Granger

[Figure: changes to files A-E occurring in snapshots S1-S9.]

Association rules: A→C, B→D, D→E

Granger causality test: A→{B,D}, C→{D,E}

Outline

• Many models ...

• Providing the right suggestions to developers

• Approaching causation

• Bias in datasets

• Model usability


Some popular prediction models

• Bug prediction models suggest artifacts that will likely exhibit faults

• Change impact models suggest artifacts likely impacted by changes occurring to other artifacts

A few examples...

• Code Metrics (e.g., CK suite): [Basili et al., 1996, Gyimothy et al., 2005]

• Process Metrics [Moser et al. 2009, Hassan 2009]

• Bug caching/previous defects [Ostrand et al., 2005, Kim et al., 2007]

• Bug introducing changes [Kim et al., 2008]

• Recent survey and comparison:

• Marco D’Ambros, Michele Lanza, and Romain Robbes: Evaluating defect prediction approaches: a benchmark and an extensive comparison. Empir. Software Eng., 2011 (available online)


The good news

• Most of these models perform very well

• Evaluated on industrial, as well as open source data sets

• They capture different facets of software complexity

• that is likely to be a symptom (and cause?) of fault-proneness


Is that true?

• Indeed, there have been substantial research advances in this field

• However, as a matter of fact, industry seldom uses predictive models

• Or uses very simple ones...

• Of course there are exceptions...


Open problems and barriers to adoption of bug prediction models

• ESEC/FSE 2011 Project Working Group

• http://pwg.sed.hu

• We surveyed conference participants

• Awarded as the best working group

• Thanks to the exceptional team:

• Emitzá Guzmán Ortega, Amir Molzam Sharifloo, Dávid Tengeri, Melinda Tóth, Zuoning Yin, and Marco D’Ambros (group leader)

Let’s start by looking at what kinds of problems we face ...

Nothing else Matters

• Defects are certainly inserted when the code is very complex but...

• ...there are many other characteristics of the software we should be aware of

• Design, lexicon, legal issues, when changes are performed ...

• They can also relate to bugs


Increasing the level of abstraction

• Often we look at the quality of code

• Let’s try to observe the design instead

• Antipatterns encode poor design choices

• As design patterns encode (possibly) good design choices

• Various catalogues exist; the one by Brown (40 antipatterns) is very popular


Examples of antipatterns

• LazyClass: a class does too little

• MessageChain: a functionality requires a long chain of method calls between classes

• Blob: large class centralizing behavior

Antipatterns and fault/change-proneness

• Like metric-based models, but at a higher level of abstraction

• Empirical study carried out on several releases of four systems:

• ArgoUML, Eclipse, Mylyn, and Rhino

Foutse Khomh, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, and Giuliano Antoniol: An Exploratory Study of the Impact of Antipatterns on Class Change- and Fault-Proneness. In Emp. Soft. Engineering, 2011 (available online)

Method

• H0: proportion of faulty antipattern classes = proportion of faulty non-antipattern classes

• Fisher’s exact test and Odds Ratio (OR)

• Logistic regression model to study the significant effect of each kind of antipattern

$$\pi(X_1, X_2, \ldots, X_n) = \frac{e^{C_0 + C_1 \cdot X_1 + \cdots + C_n \cdot X_n}}{1 + e^{C_0 + C_1 \cdot X_1 + \cdots + C_n \cdot X_n}}$$

$$OR = \frac{p/(1-p)}{q/(1-q)}$$
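As an illustration of the method above, here is a minimal Python sketch of the odds ratio and a two-sided Fisher's exact test on a 2x2 table. It assumes p is the proportion of faulty classes among antipattern classes and q the same proportion among the other classes. The counts are hypothetical and not taken from the study, and the function names are made up for this example.

```python
from math import comb

def odds_ratio(a, b, c, d):
    """OR = (p/(1-p)) / (q/(1-q)) for the 2x2 table [[a, b], [c, d]].

    a: faulty antipattern classes,      b: fault-free antipattern classes
    c: faulty non-antipattern classes,  d: fault-free non-antipattern classes
    """
    p = a / (a + b)
    q = c / (c + d)
    return (p / (1 - p)) / (q / (1 - q))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value via the hypergeometric distribution."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Probability of seeing x faulty classes in the first row, margins fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    observed = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= observed * (1 + 1e-9))

# Hypothetical counts, for illustration only.
table = (30, 70, 10, 90)
print("OR =", round(odds_ratio(*table), 2))
print("p  =", fisher_exact_two_sided(*table))
```

An OR above 1 with a small p-value would lead to rejecting H0 for that release; the logistic regression step is not covered by this sketch.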


Antipatterns and Fault-Proneness

[Figure: odds ratios (y-axis) per release (x-axis), one panel per system.

ArgoUML: releases 0.10.1, 0.14, 0.18.1, 0.22, 0.26 (y-axis 0 to 20)

Eclipse: releases 1.0, 2.1.2, 3.0.1, 3.2.1, 3.3.1 (y-axis 0 to 4)

Mylyn: releases 1.0.1, 2.0M1, 2.0M3 (y-axis 0 to 30)

Rhino: releases 1.4R3, 1.5R3, 1.5R5, 1.6R3, 1.6R6 (y-axis 0 to 40)]

Fault-Proneness: What Antipatterns?

[Figure: bar chart showing, for each antipattern (AntiSingleton, Blob, CDSBP, ComplexClass, LargeClass, LazyClass, LongMethod, LPL, MessageChain, RPB) and each system (ArgoUML, Eclipse, Mylyn, Rhino), the % of releases (0% to 100%) where the antipattern significantly correlates with fault-proneness.]

Code Lexicon

• Various recent studies have investigated the relationship between code lexicon and quality attributes

• Maintainability, fault-proneness [Takang et al., 1996, Lawrie et al., 2006, 2007]

• “Conceptual” CK metrics and their use to predict fault-proneness

• Conceptual Cohesion [Marcus et al., 2005, 2008]

• Conceptual Coupling [Poshyvanyk and Marcus, 2006]

• Predictive models [Ujhazi et al., 2010]

• Conceptual metrics capture different components of fault-proneness than structural metrics


Developers take care of renaming

Laleh Mousavi Eshkevari, Venera Arnaoudova, Massimiliano Di Penta, Rocco Oliveto, Yann-Gaël Guéhéneuc, Giuliano Antoniol: An exploratory study of identifier renamings. MSR 2011: 33-42

Renaming Example

add meaning:
    type → authtype (T)
    resource → visitedResource (E)

remove meaning:
    copyJAR → copy (T)
    fTypeBinding → fBinding (E)

same meaning:
    committed → commited (T)
    methodsBuffer → methodsBuffered (E)

gen/spec:
    scanCurrentPosition → scanCurrentLine (E)
    thrownExceptionSize → thrownExceptionLength (E)

opposite meaning:
    findNextLevelChildrenByElementName → findNextLevelParentByElementName (E)
    hasClosingBracket → hasOpeningBracket (E)

unrelated meaning:
    createContents → createControl (E)
    getClusterReceiver → getChannelReceiver (T)


Licensing can be faulty too!

• In 2004, MySQL AB changed the license of its client libraries from LGPL v2.1 to GPL v2 to prevent industrial companies from using the libraries within proprietary products

• Unintended consequences:

• PHP systems were no longer able to connect to MySQL

• PHP license is incompatible with the GPL v2

• MySQL addressed this problem by adding the MySQL FOSS License Exception to the GPL v2

Changing the license of a FOSS system might have unintended/undesirable consequences to its legitimate users


Wrong license changes

• Mozilla changed its license from the NPL (commercial) to a combination of multiple open source licenses (MPL + GPL)

• At some point someone changed some files back to NPL (bug #98089)

License changes observed in Mozilla (from → to, kind, count as shown on the slide):

    NPL → 'NPL v1.1'-style+GPL v2+LGPL v2.1    DUAL    2914
    NPL → 'Dual MPL GPL'-style+MPL             DUAL    1274
    'Dual MPL GPL'-style+MPL → NPL             BUG     1194


Massimiliano Di Penta, Daniel M. Germán, Yann-Gaël Guéhéneuc, Giuliano Antoniol: An exploratory study of the evolution of software licensing. ICSE (1) 2010: 145-154




Licensing Inconsistencies in RPM Packages

Different kinds of problems:

1. declared license inconsistent wrt. source code

2. dependencies create license incompatibility

[Figure: a source package containing Source 1 (Lic: GPLv2), Source 2 (Lic: LGPL), Source 3 (Lic: BSD), and Source 4 (Lic: GPLv3) produces the binary package Binary 1 (License: GPLv2; Requires: Lib1), where Lib 1 has Lic: GPLv3.]


License Dependency Issues

• Two GPLv2 source packages (lvm2, pilot-link) were using the library readline (GPLv3+)

• License evolution problem

• PHP was dynamically linking readline, a violation of the GPLv3+

• Problem was created by a build script

• PHP either uses readline (GPLv3+) or libedit (BSD3) depending on what it finds
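The two kinds of problems listed for RPM packages (declared license inconsistent with the sources, and dependencies creating an incompatibility) can be sketched as a simple check. This is a minimal sketch: the `CAN_USE` table is a simplifying assumption made for illustration, not a complete or authoritative compatibility matrix, and the function name is hypothetical.

```python
# Illustrative compatibility table (an assumption for this sketch, not legal advice):
# maps a declared package license to the licenses it can incorporate or depend on.
CAN_USE = {
    "GPLv2": {"GPLv2", "LGPL", "BSD"},
}

def license_problems(declared, source_licenses, dependency_licenses):
    """Return (kind, item, license) tuples for one binary package."""
    allowed = CAN_USE.get(declared, {declared})
    problems = []
    # Problem 1: a source file carries a license the declared one cannot cover.
    for name, lic in source_licenses.items():
        if lic not in allowed:
            problems.append(("source inconsistent with declared license", name, lic))
    # Problem 2: a required library carries an incompatible license.
    for name, lic in dependency_licenses.items():
        if lic not in allowed:
            problems.append(("dependency incompatible", name, lic))
    return problems

# The package from the RPM figure: declared GPLv2, four sources, one required library.
problems = license_problems(
    declared="GPLv2",
    source_licenses={"Source 1": "GPLv2", "Source 2": "LGPL",
                     "Source 3": "BSD", "Source 4": "GPLv3"},
    dependency_licenses={"Lib 1": "GPLv3"},
)
for kind, item, lic in problems:
    print(f"{kind}: {item} ({lic})")
```

Under these assumptions the check flags Source 4 and Lib 1. A real checker would also need to handle "or later" clauses and build-time choices such as the readline/libedit case above.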


In summary

• Different characteristics of a software system can induce defects

• Some can be used to build predictors, some are good just to raise warnings

• Many studies showed that these models capture different dimensions of fault-proneness

so... we know how to correlate various kinds of symptoms to fault-proneness...

That’s great!

• Poor design!

• Incompatible licensing!

• Poor lexicon!

• Propagate clone changes!

• Code is getting too complex!

• You’ve just changed a pointer ref.!

• You’re touching too many files!

That’s too much!

• We could build models that warn the developer against anything

• It would be better to

• Avoid information overload [Murphy, 2007]

• Avoid false alarms based on common wisdom

• Provide hints at the right time, in the right context

• Also, we should provide qualitative justification for our models

• To at least justify the cause-effect relation


False Alarm: Clones

• Common wisdom suggests that code cloning could be harmful

• Recent (and past) studies suggested clones are not necessarily harmful [Kapser and Godfrey, 2008; Krinke, 2007; Koschke and Gode, 2011]

• Koschke and Gode reported that only 15% of clones undergo unintended inconsistent changes

• Developers use cloning as a development practice

Clone evolution patterns

[Figure: two clone fragments, CFx and CFy, tracked across snapshots S0, S1, S2, for three evolution patterns: consistent change, late propagation, and independent evolution.]
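The three patterns can be told apart with a rough rule on the history of a clone pair. The sketch below is a simplification and not the procedure used in the cited study. It assumes that, for each snapshot, we already know whether CFx and CFy are still consistent with each other, and that at least one fragment changed during the observed period. The function name and input encoding are hypothetical.

```python
def classify_clone_pair(in_sync):
    """Classify the evolution of a clone pair.

    in_sync[i] is True if fragments CFx and CFy are consistent with each
    other at snapshot Si (hypothetical input produced by a clone tracker).
    """
    if all(in_sync):
        # Every change was applied to both fragments in the same snapshot.
        return "consistent change"
    first_divergence = in_sync.index(False)
    if any(in_sync[first_divergence + 1:]):
        # The fragments diverged, then the change reached the other fragment.
        return "late propagation"
    # The fragments diverged and never became consistent again.
    return "independent evolution"

for history in ([True, True, True], [True, False, True], [True, False, False]):
    print(history, "->", classify_clone_pair(history))
```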


Late propagation of clone changes could be risky...

• A tale of late propagation in PostgreSQL

• The modules parse_oper.c and parse_func.c contain two block-sized clones.

• August 26, 1999: the first clone underwent a bug fix

• February 20, 2000: six months later, the same bug was discovered in the other clone

• CVS commit note: “...I had previously fixed the identical bug in oper_select_candidate, but didn't realize that the same error was repeated over here...”


... but it does not happen very often!

[Figure: bar chart of the % of clone classes (y-axis, 0% to 80%) following each evolution pattern (consistent changes, independent evolution, (quick) late propagation, late propagation, N/A) in ArgoUML, JBoss, OpenSSH, and PostgreSQL.]

Suresh Thummalapenta, Luigi Cerulo, Lerina Aversano, Massimiliano Di Penta: An empirical study on the maintenance of source code clones. Empirical Software Engineering 15(1): 1-34 (2010)


Right information at the right time

“Continuous” reverse engineering exploiting developer feedback/interactions

[Figure, from FoSE - ICSE 2007, Gerardo Canfora: a loop labelled "Evolutionary development", "Interactive reverse engineering", and "Feedback to the heuristic". The code shown evolves from

    class foo {
        void m1() {…}
        void m2() {…}
        void m3() {…}
    }

to

    class foo {
        void m1A() {…}
        void m2() {…}
    }
    class bar extends foo {
        void m1B() {…}
        void m3() {…}
    }
]







• Metrics

• Lexicon hints

• Clone info

Hints to improve lexicon quality: COCONUT

1. The Administrator activates the add member function in the terminal of the system and correctly enters his login and password identifying him as an Administrator.

2. The system responds by presenting a form to the Administrator on a terminal screen. The form includes the first and last name, the address, and contact information (phone, email and fax) of the customer, as well as the fidelity index. The fidelity index can be: New Member, Silver Member, and Gold Member. After 50 rentals the member is considered as Silver Member, while after 150 rentals the member becomes a Gold Member. The system also displays the membership fee to be paid.

3. The Administrator fills the form and then confirms all the requested form information is correct.

addmember.txt

Andrea De Lucia, Massimiliano Di Penta, Rocco Oliveto: Improving Source Code Lexicon via Traceability and Information Retrieval. IEEE Trans. Software Eng. 37(2): 205-227 (2011)


Suggesting identifiers from use cases...


Better lexicon...

...and some comments

Explaining your model

...the long road towards causation

Typical habits…

• We often observe two or more variables

• We correlate them

• …or even build prediction models that actually work pretty well :-)

• So… everything looks pretty nice…

• We got a strong paper… but…

The bad part…

• We know for sure that we are missing something

• Do classes change more/exhibit bugs because of certain metrics?

• Or was that because of the introduction of an additional conditional in the code?

• Do antipatterns make systems more change-prone?

• … or rather do they change because they have to?

Ambiguity about direction of causal influence

• A causes B, B causes A, or X causes A and B?

• e.g. correlation between complexity and fault-proneness

• Complexity causes fault-proneness… (A)

• Could it be that fault-prone code (B) tends to be on average more complex (A)?

• Or else problem-specific factors (X) make code more complex (A) and fault-prone (B)
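The third case (X causes both A and B) can be illustrated with a small simulation. This is a sketch on synthetic data, not an analysis of any real system: the variable names, the noise levels, and the use of partial correlation as the check are all assumptions made for the example.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def partial_corr(a, b, x):
    """Correlation between a and b after controlling for x."""
    r_ab, r_ax, r_bx = pearson(a, b), pearson(a, x), pearson(b, x)
    return (r_ab - r_ax * r_bx) / sqrt((1 - r_ax ** 2) * (1 - r_bx ** 2))

rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(2000)]   # problem-specific factor (X)
a = [xi + rng.gauss(0, 0.5) for xi in x]     # "complexity" (A), driven by X only
b = [xi + rng.gauss(0, 0.5) for xi in x]     # "fault-proneness" (B), driven by X only

r_ab = pearson(a, b)
r_ab_given_x = partial_corr(a, b, x)
print(f"corr(A, B) = {r_ab:.2f}, corr(A, B | X) = {r_ab_given_x:.2f}")
```

A and B correlate strongly although neither influences the other; once X is controlled for, the correlation should largely vanish. In practice X is often unobserved, which is why the ambiguity is hard to resolve from repository data alone.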


Meaningless models easy to find

[Figure: relations among the LOC, CK, and McCabe metrics, annotated with R2=0.90 and R2=0.70.]

... as already explained yesterday!

Failure is a Four-Letter Word – A Parody in Empirical Research –

Andreas Zeller*, Saarland University, Saarbrücken, Germany, [email protected]

Thomas Zimmermann, Microsoft Research, Washington, USA, [email protected]

Christian Bird, Microsoft Research, Washington, USA, [email protected]

ABSTRACT

Background: The past years have seen a surge of techniques predicting failure-prone locations based on more or less complex metrics. Few of these metrics are actionable, though.

Aims: This paper explores a simple, easy-to-implement method to predict and avoid failures in software systems. The IROP method links elementary source code features to known software failures in a lightweight, easy-to-implement fashion.

Method: We sampled the Eclipse data set mapping defects to files in three Eclipse releases. We used logistic regression to associate programmer actions with defects, tested the predictive power of the resulting classifier in terms of precision and recall, and isolated the most defect-prone actions. We also collected initial feedback on possible remedies.

Results: In our sample set, IROP correctly predicted up to 74% of the failure-prone modules, which is on par with the most elaborate predictors available. We isolated a set of four easy-to-remember recommendations, telling programmers precisely what to do to avoid errors. Initial feedback from developers suggests that these recommendations are straightforward to follow in practice.

Conclusions: With the abundance of software development data, even the simplest methods can produce “actionable” results.

Categories and Subject Descriptors: D.2.8 [Software Engineering]: Metrics – process metrics, product metrics; K.3.2 [Computers and Education]: Computer and Information Science Education – computer science education; K.7.4 [The Computing Profession]: Professional Ethics – codes of good practice;

General Terms Measurement, Experimentation

Keywords Empirical Research, Parody

1. INTRODUCTION

In empirical software engineering, it is a long-standing observation that failures follow a Pareto distribution: The largest part of software defects occurs in a small fraction of software components. Therefore, research has concentrated on identifying features that correlate with the presence of software defects – features such as the number of changes, code complexity, or the number of developers associated with a file. As elaborate as these approaches may be, they all share the same problem which we call the cost of consequence: If I know that a module is failure-prone because it frequently changes, should I stop changing it? If I know failures are related to complexity, should I rewrite it from scratch? Any of these measures induces a new risk – a risk which may be greater than the one originally addressed.

In this paper, we take a different approach. We predict failures from the most basic actions programmers undertake, focusing on the actions that introduce defects as they are being made – literally at the moment the source code is typed in. Our recommendations are immediately actionable: A simple visual representation associates actions with the likelihood of introducing defects – warning programmers before they might hit the wrong key. Our approach is both effective and efficient: In a case study on the Eclipse failure set, it correctly identified up to 74% of the failure-prone modules, which is on par with the most elaborate predictors available. Specifically, our contributions include:

1) A novel mechanism to associate programmer actions with software defects;

2) A predictor that is purely text-oriented, thus lightweight, real-time, easy to implement, and language-agnostic;

3) A set of easy-to-remember recommendations, validated on the well-known Eclipse dataset.

The remainder of this paper is organized as follows: We start with motivating our approach (Section 2), linking basic program features to failures. Section 3 evaluates our approach on the Eclipse bug data set, reaching new heights in accuracy. Section 4 discusses threats to validity, followed by an outline of future work in this area in Section 5.

2. THE IROP APPROACH

Empirical research has long focused on finding abstractions that would correlate with failures – in the hope that addressing these abstractions would also get rid of the failures. In the end, though, all these abstractions (just like software as a whole) are nothing but the product of elementary programmer actions such as opening files, writing tests, or running programs. To change programmer behavior for the good, we must act at an abstraction level where such change is actually feasible. (Clearly, we cannot prohibit programmers from opening files!) Interestingly enough, it is the lowest abstraction layers where change becomes actionable. In the end, we can express programmer actions as a series of low-level human-computer interactions, such as moving the mouse, or typing on the keyboard. The latter

* Andreas Zeller was a visiting researcher with Microsoft Research, Washington, USA while the research leading to this paper was conducted.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PROMISE '11, September 20-21, 2011, Banff, Canada Copyright 2011 ACM 978-1-4503-0709-3/11/09... $10.00.

We cannot claim causation…

• We know well that we will never really be able to claim causation

• Solid studies that found significant correlations are useful

• Especially if multiple studies show consistent results

• Replication is therefore important!

• To make them more useful, we should try to find some qualitative explanation of our findings

Some key ingredients...

• Data sources: mails, versioning, bug tracking

• Quantitative + qualitative analysis

• Interviewing/surveying developers

• Appropriate statistics

Capturing temporal relations

• Multivariate time series and Granger’s causality test

• H0: f1 does not cause f2 (α1=α2=...=αp=0)

• Used as a complement to association rules [Ying et al., 2004, Zimmermann et al., 2005] for change impact analysis

Gerardo Canfora, Michele Ceccarelli, Luigi Cerulo, Massimiliano Di Penta: “Using Multivariate Time Series and Association Rules to Detect Logical Change Coupling: an Empirical Study” - ICSM 2010

$$f_2(t) = c_1 + \alpha_1 f_1(t-1) + \alpha_2 f_1(t-2) + \cdots + \alpha_p f_1(t-p) + \beta_1 f_2(t-1) + \beta_2 f_2(t-2) + \cdots + \beta_p f_2(t-p) + u(t)$$
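To make the test concrete, here is a minimal pure-Python sketch of a Granger-style F statistic. It assumes the usual formulation: fit the model above with and without the lagged f1 terms and compare the residual sums of squares. The helper names, the synthetic series, and the choice p = 1 are assumptions for this example; the cited study may have used a different implementation.

```python
import random

def ols_rss(rows, y):
    """Residual sum of squares of a least-squares fit of y on the columns of rows."""
    k = len(rows[0])
    # Normal equations (X^T X) beta = X^T y, solved by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    beta = [m[i][k] / m[i][i] for i in range(k)]
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def granger_f(f1, f2, p=1):
    """F statistic for H0: lags of f1 do not help predict f2 (alpha_1..alpha_p = 0)."""
    y = f2[p:]
    # Restricted model: constant plus p lags of f2 only.
    restricted = [[1.0] + [f2[t - i] for i in range(1, p + 1)]
                  for t in range(p, len(f2))]
    # Unrestricted model: additionally p lags of f1.
    full = [row + [f1[t - i] for i in range(1, p + 1)]
            for row, t in zip(restricted, range(p, len(f2)))]
    rss_r, rss_u = ols_rss(restricted, y), ols_rss(full, y)
    dof = len(y) - (2 * p + 1)
    return ((rss_r - rss_u) / p) / (rss_u / dof)

# Synthetic change series: f2 follows f1 with a lag of one snapshot.
rng = random.Random(1)
f1 = [rng.random() for _ in range(200)]
f2 = [0.0] + [0.8 * f1[t - 1] + 0.1 * rng.random() for t in range(1, 200)]

f_forward = granger_f(f1, f2)    # does f1 help predict f2?
f_backward = granger_f(f2, f1)   # does f2 help predict f1?
print(f"F(f1 -> f2) = {f_forward:.1f}, F(f2 -> f1) = {f_backward:.1f}")
```

A large F in one direction only is what suggests that changes to f1 tend to precede changes to f2. Turning F into a p-value needs the F distribution, which is left out here.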


Association rules vs. Granger

[Figure: changes to files A-E occurring in snapshots S1-S9.]

Association rules: A→C, B→D, D→E

Granger causality test: A→{B,D}, C→{D,E}
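For contrast with the Granger test, the association-rule side can be sketched as follows: rules are mined only from files that change together in the same snapshot, so lagged relations stay invisible to them. The change history, thresholds, and function name below are hypothetical and do not reproduce the data behind the figure.

```python
from collections import Counter
from itertools import permutations

def cochange_rules(snapshots, min_support=2, min_confidence=0.6):
    """Mine rules X -> Y from sets of files changed together in the same snapshot."""
    file_count = Counter(f for changed in snapshots for f in changed)
    pair_count = Counter(pair for changed in snapshots
                         for pair in permutations(sorted(changed), 2))
    rules = {}
    for (x, y), support in pair_count.items():
        # Confidence: fraction of the changes to x that also touched y.
        confidence = support / file_count[x]
        if support >= min_support and confidence >= min_confidence:
            rules[(x, y)] = (support, confidence)
    return rules

# Hypothetical change history, one set of changed files per snapshot.
snapshots = [{"A", "C"}, {"A", "C"}, {"B", "D"}, {"B", "D"}, {"D", "E"}, {"E"}, {"A"}]
rules = cochange_rules(snapshots)
for (x, y), (support, confidence) in sorted(rules.items()):
    print(f"{x} -> {y}: support={support}, confidence={confidence:.2f}")
```

With this toy history, D and E co-change only once and yield no rule. A relation where one file changes a few snapshots after another would be missed entirely, which is the gap the Granger test is meant to cover.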

Granger is complementary to association rule discovery

[Figure: Mylyn impact sets. x-axis: top N artifacts; y-axis: true positives.]

Where is Granger helping out?

Example from Samba

• 8 August 2001: errors.c… changed. Commit note: “… added automatic mapping between dos and nt error codes…”

• 27 August 2001: auth_domain.c, auth_server.c, auth_rhost.c, auth_unix.c, and auth_smbpasswd.c changed. Commit note: “smbd/auth server: Doco we want to use cli_nt_error here soon smbd/password.c…”

Thus...

• We should look at statistical models we did not use so far...

• ... plus, mining software repositories offer us great opportunities to provide justifications to our data

• but....


Perils in mining software repositories

Quality of data sets

• Models we build strongly depend on data sets we use

• Great keynote talk by M. Shepperd at WetSOM 2011, May 2011, Honolulu

• ...and other work from the same and other authors

• Gernot Armin Liebchen, Bhekisipho Twala, Martin J. Shepperd, Michelle Cartwright, Mark Stephens: Filtering, Robust Filtering, Polishing: Techniques for Addressing Quality in Software Data. ESEM 2007: 99-106

• Yesterday's talk about missing data:

• Wen Zhang, Ye Yang, and Qing Wang: Handling missing data in software effort prediction with naive Bayes and EM

Focus on data sets from software repositories

Four problems among others...

M. Di Penta

Issue I: Missing Links

Bug-fixing changes are identified by commit notes containing bug ids

Fact: there are many bug fixes for which the bug id is not mentioned in the commit note

nmbd_incomingdgrams.c: Fix bug with Syntax 5.1 servers reported by SGI where they do host announcements to LOCAL_MASTER_BROWSER_NAME<00> rather than WORKGROUP<1d>

Quieten level 0 debug when probing for modules. We shouldn't display so loud an error when a smb_probe_module() fails. Also tidy up debugs a bit. Bug 375.
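The linking heuristic described here can be sketched as a pattern match on the commit note. The regular expression and the function name below are assumptions for illustration; real linking tools typically use project-specific patterns and cross-check against the bug tracker.

```python
import re

# Assumed pattern: the word "bug", optionally followed by "#", then digits.
BUG_ID = re.compile(r"\bbug\s*#?\s*(\d+)", re.IGNORECASE)

def linked_bug_ids(commit_note):
    """Return the bug ids mentioned in a commit note (empty list if none)."""
    return BUG_ID.findall(commit_note)

# The two commit notes quoted on this slide.
unlinked = ("nmbd_incomingdgrams.c: Fix bug with Syntax 5.1 servers reported "
            "by SGI where they do host announcements to "
            "LOCAL_MASTER_BROWSER_NAME<00> rather than WORKGROUP<1d>")
linked = ("Quieten level 0 debug when probing for modules. We shouldn't "
          "display so loud an error when a smb_probe_module() fails. "
          "Also tidy up debugs a bit. Bug 375.")

print(linked_bug_ids(unlinked))
print(linked_bug_ids(linked))
```

The first note clearly describes a bug fix but yields no id, so a data set built from this heuristic alone would miss it.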

Adrian Bachmann, Christian Bird, Foyzur Rahman, Premkumar T. Devanbu, Abraham Bernstein: The missing links: bugs and bug-fix commits. SIGSOFT FSE 2010: 97-106

Christian Bird, Adrian Bachmann, Eirik Aune, John Duffy, Abraham Bernstein, Vladimir Filkov, Premkumar T. Devanbu: Fair and balanced?: bias in bug-fix datasets. ESEC/SIGSOFT FSE 2009: 121-130


Issue II: Incorrect Classification

• Bug tracking systems contain various kinds of changes

• Classified using inadequate fields, or just poorly and subjectively classified


Results of a manual classification

• We manually classified 1,800 randomly selected bugs from Mozilla, Eclipse, JBoss

• Not marked as “Enhancement”

• Classification performed by 3 different people

• Discussion held in case of different classification

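[Figure: bar chart of the manual classification for Mozilla, Eclipse, and JBoss (y-axis 0 to 600), with series Bugs, Non bugs, and Others. Data labels in extraction order, each row listing Mozilla, Eclipse, JBoss: 156, 24, 121; 99, 382, 209; 345, 194, 270]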

Giuliano Antoniol, Kamel Ayari, Massimiliano Di Penta, Foutse Khomh, Yann-Gaël Guéhéneuc: Is it a bug or an enhancement?: a text-based approach to classify change requests. CASCON 2008: 23


Issue III: Irrelevant changes

• We count commits as a proxy for the amount of change

• Many commits are related to formatting, change of copyright year, commenting, refactoring

• Kawrykow et al. (2011) developed an approach to identify non-essential changes (3%-15% of total in their study)

• They pruned them out to build better change impact predictions (-20% erroneous and -4% true recommendations)

• Issue: What is irrelevant for our study?

David Kawrykow, Martin P. Robillard: Non-essential changes in version histories. ICSE 2011: 351-360
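To make the idea of an irrelevant change concrete, the sketch below flags a change as cosmetic when the code is identical once comments and whitespace are stripped. This is a deliberately crude heuristic of our own, not the approach of Kawrykow and Robillard: it assumes C-style comments, ignores string literals, and does not detect refactorings such as renames.

```python
import re

def normalize(source):
    """Strip C-style comments and all whitespace from a source fragment."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", "", source)                    # line comments
    return re.sub(r"\s+", "", source)

def is_cosmetic_change(before, after):
    """True if the two versions differ only in comments or formatting."""
    return normalize(before) == normalize(after)

# Hypothetical before/after fragments of one changed line.
print(is_cosmetic_change("int x = 1; // old note",
                         "int x=1;  /* Copyright 2011 */"))
print(is_cosmetic_change("int x = 1;", "int x = 2;"))
```

Whether such changes should be pruned still depends on the study: a copyright update is noise for change impact prediction but may matter for a licensing analysis.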


Issue IV: Secret Life

• Software repositories do not capture everything about a software project

• Not all discussions, not all decisions, and, after all, not even all changes

• This could be especially true in industrial projects [Aranda and Venolia, 2009]

• Should be less common in FLOSS

Jorge Aranda, Gina Venolia: The secret life of bugs: Going past the errors and omissions in software repositories. ICSE 2009: 298-308


How can I benefit from this model?


Model usability

• A model should provide developers with the right information

• List of files/classes that will likely exhibit a bug?

• Likelihood that a class exhibits a bug?

• Features that lead to bug prediction?

• Something about bug severity?

Developers are not necessarily scientists!

Experimenting with the usage of predictive models

• It is desirable to carry out (quasi-)experiments or case studies to investigate how developers benefit from bug prediction models

• As for other software engineering artifacts

• e.g. design documents, comments, etc.

• Difficulties:

• Hard to think this can be done with students

• Controlled experiments performed in limited time frames not ideal for this kind of study


Conclusions

[Diagram with the elements: Mails, Versioning, Bug tracking; Data; Model; Recommendation to developers. Callouts added step by step:]

• Data quality/bias

• Capturing the right symptoms

• Model explanation/”causation”

• Better models, e.g. capturing temporal relations

• Model usability

• Providing contextual suggestions

• Technology
