Detecting Occurrences of Refactoring with Heuristic Search


Presented at APSEC 2008: http://dx.doi.org/10.1109/APSEC.2008.9


Shinpei Hayashi, Yasuyuki Tsuda, Motoshi Saeki
Department of Computer Science
Tokyo Institute of Technology, Japan

2008. 12. 5
APSEC 2008

Abstract

Detecting Occurrences of Refactoring
− Detecting where and what kinds of refactoring were performed between two versions of a program
− From version archives such as CVS/Subversion

with Heuristic Search
− We use a graph search technique for detecting impure refactorings

Results
− (Semi-)automated tools have been developed
− We have a simple case study

Background

It is important to reduce the cost of understanding the changes to a program that is continually modified

[Figure: Program S uses Program L (a library), which a FOSS project (FOSS: Free/Open Source Software) modifies from ver. 123 to ver. 124. The developer of S: "I have to follow the changes of L... What do the changes mean?"]

Aim

Categorization of modifications is useful
− Detecting refactorings between two versions of a program

[Figure: Program S uses Program L (a library), which a FOSS project (FOSS: Free/Open Source Software) modifies from ver. 123 to ver. 124. The developer of S is told: "The changes are: Extract Method + Move Method + α!"]

Detecting Refactorings

Related work
− with software metrics [Demeyer 2000]
− by checking pre/post-conditions of source code properties [Weißgerber 2006]

Key issue
− detecting impure refactorings [Görg 2005]
  • refactoring + refactoring
  • refactoring + other modifications

Motivating Scenario

Example from Fowler’s book [Fowler 1999]

[Figure: three states of the program.
 n0: Customer with statement(); Rental.
 Extract Method leads to n1: Customer with statement() and getCharge(Rental); Rental.
 Move Method leads to nm: Customer with statement(); Rental with getCharge().]

Scenario: Loss of States

[Figure: the same sequence of states.
 n0: Customer with statement(); Rental.
 Extract Method leads to n1: Customer with statement() and getCharge(Rental); Rental.
 Move Method leads to n2: Customer with statement(); Rental with getCharge().
 Only Rold and Rnew are committed to the version archive; the intermediate state is lost.]

Scenario: Diffs between revs

Hard to detect refactorings via mixed differences
− Considering intermediate states is required

[Figure: Rold (Customer with statement(); Rental) compared with Rnew (Customer with statement(); Rental with getCharge()).]

Mixed differences
1. Some code fragments in Customer#statement are removed
2. A method invocation of Rental#getCharge is added to statement
3. Rental#getCharge is added

Our Approach

Generating intermediate states by actually applying refactorings to the program

Finding an appropriate path from Rold (= n0) to Rnew (= nm) by a graph search technique

[Figure: a graph from the initial state n0 (Rold) to the final state nm (Rnew).]

States (nodes): versions of the program
Transitions (edges): refactoring operations

Procedure

1. Find likely refactorings
2. Evaluate the distance to nm
3. Apply the best one and generate a new state

[Figure: a graph from the initial state n0 (Rold) to the final state nm (Rnew); the candidate states reached from n0 are labeled 4, 6, and 8.]

Procedure (continued)

[Figure: the same graph after one step; the candidate states are now labeled 6, 8, 23, and 3.]

Terminate if the new state is almost equal to Rnew
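The three steps and the termination rule can be sketched on a toy model. This is only an illustration under loud assumptions, not the authors' implementation: a program state is reduced to a set of made-up fact strings, the two refactorings are hand-written functions, the distance is the size of the symmetric difference between fact sets, and the loop is purely greedy (no backtracking). All names (`extract_method`, `move_method`, `distance`, `detect`, the fact strings) are hypothetical.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

State = FrozenSet[str]
# A refactoring returns the new state, or None when its precondition fails.
Refactoring = Tuple[str, Callable[[State], Optional[State]]]


def extract_method(s: State) -> Optional[State]:
    # Hypothetical model: inline charge code becomes Customer.getCharge plus a call.
    if "Customer.statement(inline)" not in s:
        return None
    return (s - {"Customer.statement(inline)"}) | {
        "Customer.statement(call)", "Customer.getCharge"}


def move_method(s: State) -> Optional[State]:
    # Hypothetical model: getCharge moves from Customer to Rental.
    if "Customer.getCharge" not in s:
        return None
    return (s - {"Customer.getCharge"}) | {"Rental.getCharge"}


def distance(s: State, goal: State) -> int:
    # Assumed distance: number of facts present in exactly one of the two states.
    return len(s ^ goal)


def detect(start: State, goal: State, catalog: List[Refactoring],
           tolerance: int = 0, max_steps: int = 10) -> List[str]:
    state, path = start, []
    for _ in range(max_steps):
        if distance(state, goal) <= tolerance:   # "almost equal" to the goal
            break
        # Step 1: find the refactorings that are applicable in this state.
        candidates = []
        for name, op in catalog:
            nxt = op(state)
            if nxt is not None:
                # Step 2: evaluate the distance of each resulting state to the goal.
                candidates.append((distance(nxt, goal), name, nxt))
        if not candidates:
            break
        # Step 3: apply the best candidate and continue from the new state.
        _, name, state = min(candidates, key=lambda c: c[0])
        path.append(name)
    return path


CATALOG: List[Refactoring] = [("Extract Method", extract_method),
                              ("Move Method", move_method)]
START: State = frozenset({"Customer.statement(inline)"})
GOAL: State = frozenset({"Customer.statement(call)", "Rental.getCharge"})

print(detect(START, GOAL, CATALOG))
```

On this toy input the loop recovers the Extract Method then Move Method sequence of the motivating scenario. The `tolerance` parameter stands in for "almost equal": a nonzero value would let the search stop while some non-refactoring differences remain.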

Efficient Search

1. How to find candidates of refactorings?
   − They should be similar to the changes for nm
2. How to evaluate them?
   − The best one should generate a new state closer to nm

We use structural differences between two states

[Figure: the search graph from the initial state n0 (Rold) to the final state nm (Rnew), with candidate states labeled 6, 8, 23, and 3.]

Structural Differences

Calculated by comparing two ASTs
− 4 types: add, remove, change, move

Before:

    public class Customer {
        int customerID;
        String name;
    }

After:

    public class Customer {
        int id;
        String name;
        int[] phoneNum;
    }

Differences:

    change(Customer, the name customerID, the name id)
    add(FieldDeclaration int[] phoneNum, Customer)
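To make the example concrete, here is a minimal sketch that reproduces the two differences listed for Customer. It is an assumption-laden simplification: it compares flat lists of (type, name) field pairs rather than real ASTs, it pairs an unmatched old field with an unmatched new field of the same type and calls that a change, and it does not cover the move type. The function name `diff_fields` is hypothetical.

```python
from typing import List, Tuple

Field = Tuple[str, str]  # (type, name)


def diff_fields(cls: str, old: List[Field], new: List[Field]) -> List[str]:
    # Fields that appear unchanged in both versions produce no difference.
    old_rest = [f for f in old if f not in new]
    new_rest = [f for f in new if f not in old]
    diffs = []
    for o_type, o_name in old_rest:
        # Assumed heuristic: same type, different name means the name was changed.
        partner = next((f for f in new_rest if f[0] == o_type), None)
        if partner is not None:
            new_rest.remove(partner)
            diffs.append(f"change({cls}, the name {o_name}, the name {partner[1]})")
        else:
            diffs.append(f"remove(FieldDeclaration {o_type} {o_name}, {cls})")
    # Whatever is left in the new version was added.
    for n_type, n_name in new_rest:
        diffs.append(f"add(FieldDeclaration {n_type} {n_name}, {cls})")
    return diffs


OLD = [("int", "customerID"), ("String", "name")]
NEW = [("int", "id"), ("String", "name"), ("int[]", "phoneNum")]

for d in diff_fields("Customer", OLD, NEW):
    print(d)
```

A real AST comparison would need a tree-matching step to decide which nodes correspond; the same-type pairing used here is only a stand-in for that step.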

Find New Refactorings

By matching between the differences (D) and the modifications representing a refactoring operation (R)
− if a subset of D matches a subset of R, the refactoring operation is expanded.
− Matching likelihood: (# of matched modifications) / (# of R)

The differences (D), between the current state and nm:

    remove(Block, Customer#statement)
    add(MethodInvocation, Rental#statement)
    change(Rental, the name “n1”, the name “n2”)

Extract Method (R):

    remove(Block, ClassA#method1)
    add(MethodDeclaration, ClassA)

Matching likelihood: 0.5 (1/2)
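The likelihood computation for this example can be sketched as follows. As an assumption for illustration, each modification is reduced to an (operation, node type) pair, so the binding of pattern variables such as ClassA and method1 to concrete classes and methods is ignored; the node type given to the change entry and the names `matching_likelihood`, `D`, and `R_EXTRACT_METHOD` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from typing import List, Tuple

Modification = Tuple[str, str]  # (operation, node type); locations are ignored


def matching_likelihood(diffs: List[Modification],
                        pattern: List[Modification]) -> Fraction:
    # Multiset intersection counts how many pattern entries are found in the diffs.
    matched = sum((Counter(diffs) & Counter(pattern)).values())
    return Fraction(matched, len(pattern))


# The differences (D) from the slide, reduced to (operation, node type).
D = [("remove", "Block"), ("add", "MethodInvocation"), ("change", "Name")]
# The modifications representing Extract Method (R).
R_EXTRACT_METHOD = [("remove", "Block"), ("add", "MethodDeclaration")]

print(matching_likelihood(D, R_EXTRACT_METHOD))  # 1/2
```

Only remove(Block, ...) is found in D, so one of the two modifications in R matches, which gives the 1/2 shown above.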

Evaluation

Using f(n, o) = g(n) + h(n) / α(o)
− g(n): # of applied refactorings for obtaining n
− h(n): the size of differences between n and nm
− α(o): likelihood of o

Example: g(n2) = 2, h(n2) = 3, α(o6) = 1/2, so
f(n2, o6) = 2 + 3 / (1/2) = 8

[Figure: search graph from the initial state n0 (Pold) to the goal state nm (Pnew) through n1 and n2, with candidate operations o1 to o6.]
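The worked value can be checked with a few lines of code. This is a direct transcription of the formula, with the function name `evaluate` chosen only for this sketch; the second call uses a made-up likelihood of 1 to show how a better-matching candidate gets a lower (better) score.

```python
def evaluate(g: int, h: int, alpha: float) -> float:
    """f(n, o) = g(n) + h(n) / alpha(o); lower values are preferred."""
    if alpha <= 0:
        raise ValueError("likelihood must be positive")
    return g + h / alpha


# Values from the slide: g(n2) = 2, h(n2) = 3, alpha(o6) = 1/2.
print(evaluate(2, 3, 0.5))   # 8.0
# Hypothetical candidate with likelihood 1: the same state scores lower.
print(evaluate(2, 3, 1.0))   # 5.0
```

Dividing h(n) by the likelihood penalizes candidates whose modifications match the observed differences poorly, so they are explored later.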

Supporting Tools

Expanding and evaluating candidates of refactorings with REUSAR (implemented)
− Input: Two versions of Java source code
− Output: Candidates of refactorings with priority
− Calculating differences with XMLdiff (an existing tool)

Applying refactorings by Eclipse
− Checking pre/post-conditions
− Modifying source code

Case Study

Applying our technique to an existing version archive, REUSAR (supporting tool in this study)

[Figure: R1796: DistanceCalculator with calculateDistance(List<Diff>). Changes: 1. Rename Method, 2. Remove Parameter. R1799: DistanceCalculator with calcDistance().]

Case Study: Result

[Figure: search graph from n0 (R1796) through n1 to n2 (R1799). Candidate edges are labeled Rename Method, Remove Parameter, and Extract Method, with numeric labels 4, 4, 8, 7, and 4.]

Our technique is effective
− It can correctly detect impure refactorings which were actually performed
− Prioritizing the candidates reduces # of applications of refactoring operations

Conclusion

Summary
− Detecting refactorings by a graph search, based on heuristics with structural differences between two programs
− Our technique can detect impure refactorings

Future Work
− Tool integration
− Evaluation: larger-scale case study
