Department of Computer Science, Graduate School of Information Science & Technology, Osaka University
Program Slicing on Java byte-code for Locating Functional Concerns
Takashi Ishio†, Ryusuke Niitani†, Gail Murphy‡, Katsuro Inoue†
† Osaka University, Japan
‡ University of British Columbia, Canada
Concern Location
A functional concern is code that helps fulfill a functional requirement. A software maintenance task usually focuses on a functional concern.
Concern location comprises two activities, “search” and “explore”:
Search: find “interesting” methods with grep or other feature location tools.
Explore: examine the interactions among those methods using the call graph, the class hierarchy tree and cross references.
Example: Autosave function in jEdit
jEdit periodically saves the contents of the text area. A user can specify the frequency.
We can easily find the Autosave class, the Buffer.autosave() method and the BufferIORequest.autosave() method.
How do these classes and methods interact?
Exploring Interaction among methods
The important information is control flow and data flow: which method triggers the autosave function, which class holds the necessary data (e.g., the file name), and how a method saves the contents to a text file.
We have to read the following classes: Autosave, Buffer, BufferIORequest, PerspectiveManager, VFSManager, FileVFS …
Automated Concern Location
We are trying to extract a concern graph from code fragments specified by a developer. Our approach is based on program slicing. Our tool is built on Soot, a Java bytecode analysis framework.
Figure: code fragments related to a functionality → program slicing with heuristics → a program slice → slice-to-concern-graph translation → a concern graph
Autosave concern graph
Input = Autosave.*(), Buffer.autosave(), BufferIORequest.autosave()
Program Slicing
Slicing extracts statements related to criteria statements specified by a user.
1. A program P is converted to a program dependence graph (PDG). Vertices are the statements in P; edges are control and data dependence relations.
2. A user specifies “slicing criteria” statements in P. The statements are translated into “criteria vertices” in the PDG.
3. A program slice, a set of statements that affect or depend on criteria, is extracted by graph traversal from criteria vertices.
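Example (slicing criterion <3,i>):
1 i = 3;
2 if (a > 0) {
3     print i;
4 }
Statement 3 uses i, whose definition is at statement 1 (data dependence), and statement 3 executes only when the condition at statement 2 holds (control dependence).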
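The three steps above amount to reachability in the PDG. The following is a minimal sketch, not the tool's implementation: it assumes the PDG is already built and stored as a map from each statement number to the statements it depends on (a hypothetical encoding). Traversal from the criteria then yields a backward slice (the statements that affect the criteria), and reversing the edges yields a forward slice (the statements that depend on them). The class name PdgSlicer is hypothetical.

```java
import java.util.*;

/** Hypothetical sketch: program slicing as reachability over a PDG. */
class PdgSlicer {
    /**
     * Returns every vertex reachable from the criteria by following the edges.
     * With "depends-on" edges (use -> definition, statement -> controlling
     * predicate) the result is a backward slice.
     */
    static Set<Integer> slice(Map<Integer, Set<Integer>> edges, Set<Integer> criteria) {
        Set<Integer> visited = new TreeSet<>(criteria);
        Deque<Integer> worklist = new ArrayDeque<>(criteria);
        while (!worklist.isEmpty()) {
            int v = worklist.pop();
            for (int w : edges.getOrDefault(v, Set.of())) {
                if (visited.add(w)) {
                    worklist.push(w); // first visit: explore its dependences too
                }
            }
        }
        return visited;
    }

    /** Reverses all edges, so the same traversal computes a forward slice. */
    static Map<Integer, Set<Integer>> reverse(Map<Integer, Set<Integer>> edges) {
        Map<Integer, Set<Integer>> rev = new HashMap<>();
        edges.forEach((from, tos) -> {
            for (int to : tos) {
                rev.computeIfAbsent(to, k -> new HashSet<>()).add(from);
            }
        });
        return rev;
    }

    public static void main(String[] args) {
        // 1: i = 3;  2: if (a > 0) {  3: print i;  4: }
        // Statement 3 depends on 1 (data, variable i) and on 2 (control).
        Map<Integer, Set<Integer>> dependsOn = Map.of(3, Set.of(1, 2));
        System.out.println(slice(dependsOn, Set.of(3)));          // [1, 2, 3]
        System.out.println(slice(reverse(dependsOn), Set.of(1))); // [1, 3]
    }
}
```

A worklist with a visited set visits each vertex once, so the traversal is linear in the size of the PDG.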
Slice including unrelated concerns
Slicing usually extracts many statements, because a functional unit is connected to other units by control flow and data flow. In C programs, a slice covers 28% of the program on average.†
† Binkley, D., Gold, N. and Harman, M.: An Empirical Study of Static Program Slice Size. ACM TOSEM Vol.16, No.2, Article 8, April 2007.
Figure: slicing from Autosave follows the autosave_dirty flag (Autosave: activate, set/reset) into UndoManager (reset) and CompleteWord (set).
Slicing with Barriers
A barrier is a vertex or an edge that terminates graph traversal.†
† Krinke, J.: Slicing, Chopping, and Path Conditions with Barriers. Software Quality Journal, Vol.12, No.4, pp.339-360, December 2004.
Figure: the same dependences among Autosave, the autosave_dirty flag, UndoManager and CompleteWord, sliced with barriers. A barrier blocks graph traversal during slicing.
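Slicing with barriers changes only the traversal: it never passes through a barrier. The following is a minimal sketch over string-labelled vertices. It assumes that edges point in the direction the traversal follows and that barrier edges are encoded as hypothetical "from->to" strings. In this sketch a barrier vertex is also left out of the slice. The vertex names come from the Autosave figure, but the edges are a simplification of it.

```java
import java.util.*;

/** Hypothetical sketch of slicing with barriers; not the authors' implementation. */
class BarrierSlicer {
    /**
     * Collects the vertices reachable from the criteria without entering a
     * barrier vertex or following a barrier edge (encoded as "from->to").
     */
    static Set<String> slice(Map<String, Set<String>> edges, Set<String> criteria,
                             Set<String> barrierVertices, Set<String> barrierEdges) {
        Set<String> visited = new TreeSet<>(criteria);   // criteria are always in the slice
        Deque<String> worklist = new ArrayDeque<>(criteria);
        while (!worklist.isEmpty()) {
            String v = worklist.pop();
            for (String w : edges.getOrDefault(v, Set.of())) {
                if (barrierEdges.contains(v + "->" + w)) continue; // edge barrier: do not follow
                if (barrierVertices.contains(w)) continue;         // vertex barrier: stop here
                if (visited.add(w)) worklist.push(w);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        // Simplified dependences from the Autosave figure (edges in traversal direction).
        Map<String, Set<String>> edges = Map.of(
            "Autosave", Set.of("autosave_dirty"),
            "autosave_dirty", Set.of("UndoManager", "CompleteWord"));
        System.out.println(slice(edges, Set.of("Autosave"), Set.of(), Set.of()));
        // [Autosave, CompleteWord, UndoManager, autosave_dirty]
        System.out.println(slice(edges, Set.of("Autosave"),
                                 Set.of("UndoManager", "CompleteWord"), Set.of()));
        // [Autosave, autosave_dirty]
    }
}
```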
Similarity-based Barrier
The key idea is as follows: if two methods contribute to the same functionality, they use similar methods, fields and classes.
Name set NS(m): the set of types, classes, methods and fields referred to in m. Long names are tokenized.
e.g., “java.io.File” → “java”, “io”, “File”, “java.io.File”
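The following is a minimal sketch of building NS(m) from the names referred to in a method. The tokenization rules are assumptions inferred from the IntegerArray example that follows. The full name is kept, the name is split at dots, each part is split at camelCase boundaries, and the words are lower-cased. (The java.io.File example keeps “File” capitalized instead.) Collecting the referred names from bytecode is outside this sketch. When given the names referred to in getSize, the sketch yields the 12 names listed for NS(IntegerArray.getSize).

```java
import java.util.*;

/** Hypothetical sketch of building a name set NS(m); tokenization rules are assumptions. */
class NameSets {
    /** Tokenizes one referred name into the full name plus lower-cased words. */
    static Set<String> tokenize(String name) {
        Set<String> tokens = new LinkedHashSet<>();
        tokens.add(name);                                       // e.g. "java.io.File"
        for (String segment : name.split("\\.")) {               // "java", "io", "File"
            // Split at lower-to-upper camelCase boundaries: "getSize" -> "get", "Size"
            for (String word : segment.split("(?<=[a-z0-9])(?=[A-Z])")) {
                tokens.add(word.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    /** NS(m): union of the tokens of every type, class, method and field referred to in m. */
    static Set<String> nameSet(Collection<String> referredNames) {
        Set<String> ns = new LinkedHashSet<>();
        for (String n : referredNames) {
            ns.addAll(tokenize(n));
        }
        return ns;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("java.io.File"));
        // [java.io.File, java, io, file]
        System.out.println(nameSet(List.of("org.gjt.sp.util.IntegerArray", "getSize", "int", "len")));
        // [org.gjt.sp.util.IntegerArray, org, gjt, sp, util, integer, array, getSize, get, size, int, len]
    }
}
```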
Example of Similarity
package org.gjt.sp.util;
class IntegerArray {
    private int[] array;
    private int len;
    public void add(int num) {
        if (len >= array.length) {
            int[] arrayN = new int[len * 2];
            System.arraycopy(array, 0, arrayN, 0, len);
            array = arrayN;
        }
        array[len++] = num;
    }
    public final int getSize() {
        return len;
    }
    public final void setSize(int len) {
        this.len = len;
    }
}
NS(IntegerArray.add) = {org.gjt.sp.util.IntegerArray, org, gjt, sp, util, integer, array, void, add, int, len, int[], java.lang.System, java, lang, system, arraycopy}
NS(IntegerArray.getSize) = {org.gjt.sp.util.IntegerArray, org, gjt, sp, util, integer, array, getSize, get, size, int, len}
NS(IntegerArray.setSize) = {org.gjt.sp.util.IntegerArray, org, gjt, sp, util, integer, array, len, setSize, set, size, void, int}
Similarities shown in the figure: sim = 0.801 and sim = 0.639.
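The slide does not define the similarity measure behind sim = 0.801 and sim = 0.639. The sketch below uses the Jaccard index |A ∩ B| / |A ∪ B| of two name sets as a stand-in. This choice is an assumption, and the measure does not reproduce the slide's values. For NS(IntegerArray.getSize) and NS(IntegerArray.setSize), the two sets share 10 of 15 distinct names, which gives about 0.667.

```java
import java.util.*;

/** Hypothetical stand-in similarity: Jaccard index of two name sets. */
class NameSetSimilarity {
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) return 1.0;      // convention for two empty sets
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);                              // names shared by both sets
        int union = a.size() + b.size() - inter.size();  // |A ∪ B| = |A| + |B| - |A ∩ B|
        return (double) inter.size() / union;
    }

    public static void main(String[] args) {
        Set<String> getSize = Set.of("org.gjt.sp.util.IntegerArray", "org", "gjt", "sp", "util",
            "integer", "array", "getSize", "get", "size", "int", "len");
        Set<String> setSize = Set.of("org.gjt.sp.util.IntegerArray", "org", "gjt", "sp", "util",
            "integer", "array", "len", "setSize", "set", "size", "void", "int");
        System.out.printf(Locale.ROOT, "%.3f%n", jaccard(getSize, setSize)); // 0.667
    }
}
```

A weighted measure (for example, one that down-weights tokens shared by every method in a package) would likely separate methods better than plain Jaccard. The slide's own measure may be of that kind, but this is not confirmed.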
Identifying Barriers
Program slicing is blocked at a method m if m is not related to the slicing criteria:
$\mathrm{Similarity}(m, C) \le \mathit{threshold}$
where C is the set of methods that contain slicing criteria vertices. A method m is related to the slicing criteria if C includes a method n such that m is similar to n.
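The barrier rule above can be sketched as follows. The sketch reads “C includes a method n such that m is similar to n” as a maximum over C, so Similarity(m, C) is the largest pairwise similarity. That reading is an assumption of this sketch and is not stated on the slide. So are two other choices: criteria methods are never barriers, and the similarity function is passed in as a parameter. The scores in main are hypothetical.

```java
import java.util.*;
import java.util.function.BiFunction;

/** Hypothetical sketch: choose barrier methods by name-set similarity. */
class BarrierIdentifier {
    /**
     * Returns the methods whose best similarity to any criteria method is at
     * most the threshold; slicing is blocked at these methods.
     */
    static <M> Set<M> barriers(Collection<M> methods, Set<M> criteriaMethods,
                               BiFunction<M, M, Double> sim, double threshold) {
        Set<M> result = new LinkedHashSet<>();
        for (M m : methods) {
            if (criteriaMethods.contains(m)) continue; // criteria methods are never barriers
            double best = 0.0;                         // Similarity(m, C) = max over n in C
            for (M n : criteriaMethods) {
                best = Math.max(best, sim.apply(m, n));
            }
            if (best <= threshold) {
                result.add(m);                         // m is not related to the criteria
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical similarity scores against the single criteria method.
        Map<String, Double> scores = Map.of(
            "BufferIORequest.autosave", 0.8, "UndoManager", 0.2, "CompleteWord", 0.1);
        Set<String> c = Set.of("Buffer.autosave");
        List<String> methods = List.of(
            "Buffer.autosave", "BufferIORequest.autosave", "UndoManager", "CompleteWord");
        System.out.println(barriers(methods, c, (m, n) -> scores.get(m), 0.3));
        // [UndoManager, CompleteWord]
    }
}
```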
Slicing algorithm
Slicing with summary edges and barriers: the algorithm defined by Horwitz and extended by Krinke.
The PDG is based on Jimple code. Jimple is an intermediate representation for bytecode: 3-address code with simple control flow (only “if” and “goto”), independent of JVM stack operations.
Figure: code fragments related to a functionality → calculate similarity for each method → identify barriers → slicing with barriers → a program slice
Visualizing a slice as a concern graph
Concern graph: a vertex is a class, a method or a field; an edge represents a relation between two vertices, such as call, create, check, read, write, superclass, …
We apply a rule-based translation.†
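Translation rules (slice → concern graph):
If a call or parameter edge connects a vertex v1 in method m1 to a vertex v2 in method m2, then m1 calls m2.
If a vertex v1 in method m1 reads obj.field (READ obj.field), then m1 reads field.
† Kameda, D. and Takimoto, M.: Building Concern Graph Based on Program Slicing. IPSJ Transactions on Programming, Vol.46, No.11 (Pro 26), pp.45-56. In Japanese.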
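The following is a minimal sketch of the two translation rules shown on this slide. Call and parameter edges become call relations, and field reads become read relations. The Vertex and Edge classes, the string edge kinds and the output format are hypothetical. The other relations (create, check, write, superclass, …) are not covered.

```java
import java.util.*;

/** Hypothetical sketch of two slice-to-concern-graph rules; not the cited tool's code. */
class ConcernGraphTranslator {
    /** A slice vertex: the method that contains it and, if it reads a field, that field. */
    static final class Vertex {
        final String method;
        final String readsField; // null if the statement reads no field
        Vertex(String method, String readsField) {
            this.method = method;
            this.readsField = readsField;
        }
    }

    /** A dependence edge of the slice, labeled e.g. "call", "parameter", "data", "control". */
    static final class Edge {
        final Vertex from, to;
        final String kind;
        Edge(Vertex from, Vertex to, String kind) {
            this.from = from;
            this.to = to;
            this.kind = kind;
        }
    }

    static Set<String> translate(Collection<Vertex> vertices, Collection<Edge> edges) {
        Set<String> graph = new TreeSet<>();
        // Rule 1: a call or parameter edge from v1 in m1 to v2 in m2 gives "m1 call m2".
        for (Edge e : edges) {
            if (e.kind.equals("call") || e.kind.equals("parameter")) {
                graph.add(e.from.method + " call " + e.to.method);
            }
        }
        // Rule 2: a vertex v1 in m1 that reads obj.field gives "m1 read obj.field".
        for (Vertex v : vertices) {
            if (v.readsField != null) {
                graph.add(v.method + " read " + v.readsField);
            }
        }
        return graph;
    }

    public static void main(String[] args) {
        Vertex v1 = new Vertex("m1", "obj.field");
        Vertex v2 = new Vertex("m2", null);
        System.out.println(translate(List.of(v1, v2), List.of(new Edge(v1, v2, "call"))));
        // [m1 call m2, m1 read obj.field]
    }
}
```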
A graphical output with Graphviz
We omit intra-class edges in the graphical format; the details are provided in a textual format.
e.g., “Autosave.setInterval(interval) calls new Timer(interval, Autosave).”
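The following is a minimal sketch of the split between graphical and textual output. It writes Graphviz DOT for inter-class edges and keeps intra-class edges as text lines. The sketch assumes that concern-graph edges are given as {from, label, to} triples over "Class.member" names. This encoding is hypothetical, and the tool's actual output format is not shown. "Timer.<init>" follows the JVM naming of constructors, and the A.m1/A.m2 edge is hypothetical. The sketch does not escape names that contain double quotes.

```java
import java.util.*;

/** Hypothetical sketch: emit a concern graph as Graphviz DOT, omitting intra-class edges. */
class DotWriter {
    /** Class part of a "Class.member" name (assumed naming scheme). */
    static String classOf(String member) {
        int dot = member.lastIndexOf('.');
        return dot < 0 ? member : member.substring(0, dot);
    }

    /** Each edge is {from, label, to}; edges inside one class go to the text report. */
    static String toDot(List<String[]> edges, List<String> textReport) {
        StringBuilder dot = new StringBuilder("digraph concern {\n");
        for (String[] e : edges) {
            if (classOf(e[0]).equals(classOf(e[2]))) {
                textReport.add(e[0] + " " + e[1] + " " + e[2]);  // detail kept as text
            } else {
                dot.append(String.format("  \"%s\" -> \"%s\" [label=\"%s\"];\n", e[0], e[2], e[1]));
            }
        }
        return dot.append("}\n").toString();
    }

    public static void main(String[] args) {
        List<String> text = new ArrayList<>();
        String dot = toDot(List.of(
                new String[] {"Autosave.setInterval", "calls", "Timer.<init>"},
                new String[] {"A.m1", "calls", "A.m2"}),   // hypothetical intra-class edge
            text);
        System.out.print(dot);    // a DOT graph with only the Autosave -> Timer edge
        System.out.println(text); // [A.m1 calls A.m2]
    }
}
```

The DOT text can be rendered with Graphviz, for example `dot -Tpdf`.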
The effectiveness of barriers
Barriers reduced the concern graph size from 1000 methods to 20 methods, so that a graph is printable on A3- or A4-sized paper.
Figure: concern graph size on 6 maintenance tasks on jEdit and our slicer.
We are comparing the extracted graphs with hand-made concern graphs (not finished yet).
Our previous experiment is reported in: Ryusuke Niitani, Takashi Ishio, Katsuro Inoue: Proposal and Implementation of a Method for Extracting Functional Concerns Using Program Slicing. PPL 2007. In Japanese.
Information extracted from Java program
To construct a dependence graph: control dependence relations, data dependence relations and a call graph (with dynamic binding information).
To identify barriers: the set of types, methods and fields referred to in each method m.
To slice the dependence graph: a mapping from source code to vertices.
Slicing Tool Overview
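Figure: slicing tool overview. Within the Soot Framework (http://www.sable.mcgill.ca/soot/), the Jimple Translator converts Java class files into Jimple 3-address code, SPARK points-to set analysis computes the call graph and points-to sets, and control-flow and data-flow analysis produces annotated Jimple. Outside Soot, the PDG Constructor builds the PDG, and the PDG Slicer applies the slicing criteria to produce a concern graph.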
Our effort to implement the system
Program size (excluding comments): PDG construction, 2731 LOC; slicing, 9296 LOC, covering the slicing algorithms, heuristic functions and concern graph translation.
We implemented the PDG construction phase in two weeks: one week to understand how Soot works, and another week to write the code.
Soot enabled us to focus on the essential part of the research idea.
Advantage of Soot
A rich analysis toolkit: Soot provides control-flow and data-flow analyses for each method.
Jimple is simpler than both source code and bytecode; complex Java statements are simplified during compilation.
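Figure: Jimple data model and analyses in Soot. A Method has a Body; a Body contains 1..n Units, and a Stmt (Jimple code) is a Unit. A Unit contains 1..n Values; Expr and Local are kinds of Value. ExceptionalUnitGraph provides control flow and SmartLocalDefs provides data flow; both use this representation.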
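A local-definitions analysis such as SmartLocalDefs tells, for each use of a local, which definitions can reach it. The PDG's data dependence edges follow directly from that information. The following stand-alone sketch in plain Java illustrates what such a result looks like. It computes reaching definitions on a tiny control-flow graph. It is not Soot's API or implementation, and the array-based CFG encoding is hypothetical. The example mirrors the 3-address style of Jimple.

```java
import java.util.*;

/** Hypothetical sketch of reaching definitions over a tiny CFG (not Soot's implementation). */
class ReachingDefs {
    /**
     * defs[i]: local defined by statement i (or null); uses.get(i): locals read by i;
     * succs[i]: control-flow successors of i. Returns data-dependence edges "d->s".
     */
    static Set<String> dataDependences(String[] defs, List<Set<String>> uses, int[][] succs) {
        int n = defs.length;
        List<List<Integer>> preds = new ArrayList<>();
        for (int i = 0; i < n; i++) preds.add(new ArrayList<>());
        for (int i = 0; i < n; i++) for (int s : succs[i]) preds.get(s).add(i);

        List<Set<Integer>> in = new ArrayList<>(), out = new ArrayList<>();
        for (int i = 0; i < n; i++) { in.add(new HashSet<>()); out.add(new HashSet<>()); }

        boolean changed = true;
        while (changed) {                                  // iterate to a fixed point
            changed = false;
            for (int i = 0; i < n; i++) {
                Set<Integer> newIn = new HashSet<>();
                for (int p : preds.get(i)) newIn.addAll(out.get(p));  // IN = union of preds' OUT
                Set<Integer> newOut = new HashSet<>();
                for (int d : newIn) {
                    if (defs[i] == null || !defs[i].equals(defs[d])) newOut.add(d); // kill
                }
                if (defs[i] != null) newOut.add(i);                            // gen
                if (!newIn.equals(in.get(i)) || !newOut.equals(out.get(i))) {
                    in.set(i, newIn);
                    out.set(i, newOut);
                    changed = true;
                }
            }
        }
        Set<String> edges = new TreeSet<>();
        for (int s = 0; s < n; s++)
            for (int d : in.get(s))
                if (uses.get(s).contains(defs[d])) edges.add(d + "->" + s); // def reaches a use
        return edges;
    }

    public static void main(String[] args) {
        // 0: i = 3   1: if (a > 0)   2: i = i + 1   3: print i
        String[] defs = {"i", null, "i", null};
        List<Set<String>> uses = List.of(Set.of(), Set.of("a"), Set.of("i"), Set.of("i"));
        int[][] succs = {{1}, {2, 3}, {3}, {}};
        System.out.println(dataDependences(defs, uses, succs)); // [0->2, 0->3, 2->3]
    }
}
```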
Limitation of Soot
Soot is not designed as a program analysis framework: it keeps all data in memory so that it can compile Jimple code back to bytecode after optimization.
Soot requires 2-4 GB of RAM to analyze jEdit and the JDK.
Soot supports only a simple workflow: whole-program analysis (call graph construction) followed by method-local analysis. For example, we cannot implement a statistics tool (a whole-program analysis) that uses the results of method-local analysis.
Summary
Concern location based on program slicing: we introduced heuristics to extract a functional concern of interest to a developer. The input is the same as for traditional program slicing.
Most of the graphs can be printed on A3-sized paper.
The Soot framework reduced the implementation effort. Soot is a good framework, but we hope for a framework specialized for program analysis: easy to learn, extensible and scalable.