Issues of consistency in defining slices for slicing metrics: ensuring comparability in research findings

Tracy Hall, Brunel University; David Bowes, University of Hertfordshire; Andrew Kerr, University of Hertfordshire


Page 1: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Issues of consistency in defining slices for slicing metrics: ensuring comparability in research findings

Tracy Hall, Brunel University
David Bowes, University of Hertfordshire
Andrew Kerr, University of Hertfordshire

Page 2: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Schedule

  • Why are we interested in replicating slices?
  • What are slice-based coupling and cohesion metrics?
  • What did Meyers & Binkley do in their study?
  • What did we do in our replication of M&B’s study?
  • How do our results compare to M&B’s?
  • Do slice results matter?
  • What are the implications of our findings?

Page 3: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Why are we interested in replicating slices?

  • We aimed to investigate whether slice-based metrics can predict fault-prone code.
  • We needed to validate that we were collecting slice-based metrics data correctly.
  • We tried to reproduce Meyers and Binkley’s (2004, 2007) metrics values exactly.
  • Our replication highlights many ways in which the identification of program slices can vary.
  • Our results identify a need for consistency and/or full specification of slicing variables.

Page 4: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

What are slice-based metrics?

  • The original set of cohesion metrics was proposed by Weiser in 1981 and extended by Ott et al. in the 1990s.
  • Harman et al. (1997) introduced slice-based coupling.
  • Green et al. (2009) present a detailed overview showing the evolution of slice-based coupling and cohesion metrics.

Page 5: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Slice-based coupling metrics

Meyers and Binkley (2007, p.8) use Harman et al.’s (1997) definition of coupling to define the coupling of a function f as a weighted average of its coupling to all other functions in the program:
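The formula itself did not survive extraction. A plausible reconstruction of such a weighted average, assuming the weights are the sizes |g| of the other functions (the exact weighting used by M&B may differ), is:

\[
\mathit{Coupling}(f) \;=\; \frac{\sum_{g \in P,\; g \neq f} \mathit{Coupling}(f, g)\,\lvert g \rvert}{\sum_{g \in P,\; g \neq f} \lvert g \rvert}
\]

where P is the set of functions in the program, Coupling(f, g) is the pairwise slice-based coupling between f and g, and |g| is the size (number of vertices) of g.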

Page 6: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Slice-based cohesion metrics (Ott & Thuss, 1993)

  • Coverage: average ratio of the size of a slice to the size of the module, i.e. the average length of each slice compared with the length of the module.
  • Min Coverage: smallest ratio of the size of a slice to the size of the module, i.e. the shortest slice compared with the length of the module.
  • Max Coverage: largest ratio of the size of a slice to the size of the module, i.e. the longest slice compared with the length of the module.
  • Overlap: average ratio of the size of the intersection of all slices to the size of each slice, i.e. the average proportion of each slice that is common to all slices.
  • Tightness: ratio of the size of the intersection to the size of the module, i.e. the proportion of the module that is common to all slices.
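To make the definitions concrete, here is a minimal sketch (ours, not M&B’s CodeSurfer/Scheme scripts), assuming the module and each slice are represented simply as sets of vertex identifiers:

```python
# Illustrative computation of the five Ott & Thuss cohesion metrics for one
# module. This is a sketch for exposition, not the tooling used in the study.

def cohesion_metrics(module_vertices, slices):
    """module_vertices: set of vertex ids making up the module.
    slices: non-empty list of sets, one slice per slicing criterion."""
    m = len(module_vertices)
    sizes = [len(s) for s in slices]
    common = set.intersection(*slices)  # vertices shared by every slice
    return {
        "Coverage":    sum(sizes) / (len(slices) * m),  # average |slice| / |module|
        "MinCoverage": min(sizes) / m,                  # shortest slice vs module
        "MaxCoverage": max(sizes) / m,                  # longest slice vs module
        "Overlap":     sum(len(common) / size for size in sizes) / len(slices),
        "Tightness":   len(common) / m,                 # common part vs module
    }

# Example: a 10-vertex module with slices for two output variables.
module = set(range(10))
slices = [{0, 1, 2, 3, 4, 5}, {0, 1, 2, 6, 7}]
print(cohesion_metrics(module, slices))
```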

Page 7: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

What did Meyers & Binkley do in their study?

  • Meyers and Binkley (2004, 2007) were the first to collect and analyse large-scale slice-based metrics data.
  • They collected slice-based metrics data on 63 open source C projects.
  • They produced a longitudinal study showing the evolution of coupling and cohesion over many releases of the Barcode and Gnugo projects.
  • They used CodeSurfer to slice and wrote scripts to collect the slice-based metrics data.

Page 8: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

The problem in replicating studies

Fermat’s Last Theorem: “I have discovered a truly marvelous proof that it is impossible to separate a cube into two cubes, or a fourth power into two fourth powers, or in general, any power higher than the second into two like powers. This margin is too narrow to contain it.” (1637)

Replicated by Wiles, A. (1995).

Insufficient space in a published paper to describe the methods in enough detail to allow replication…

Page 9: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

What did we do in our replication?

  • We replicated only M&B’s longitudinal results for the evolution of cohesion in Barcode.
  • Barcode has 65 functions and 49 releases.
  • The highest preset build option was used in CodeSurfer.
  • We tried to replicate the method reported by M&B.
  • We discussed with Dave Binkley methodological issues that were unclear.
  • We wrote our own Scheme scripts (and were provided with scripts from CREST (Youssef)).

Page 10: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Longitudinal cohesion

[Chart: cohesion of Barcode across releases 0.89–0.99, M&B’s results alongside our results. Y-axis: metric value, 0.6–1.0. Series: Tightness, Min Coverage, Coverage, Max Coverage, Overlap.]

Page 11: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Longitudinal cohesion

[Chart: cohesion of Barcode across releases 0.89–0.99, M&B’s results alongside our results with full vertex removal. Y-axis: metric value, 0.6–1.0. Series: Tightness, Min Coverage, Coverage, Max Coverage, Overlap.]

Page 12: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Trying to understand where we were going wrong…

  • We looked in detail at one data point (release 0.98).
  • We tried to examine all the variations in the way that this data point could be calculated.
  • We sliced both on files and on projects.
  • We varied the way lines of code are included in slices using:
      1. Formal Ins: input parameters for the function specified in the module declaration.
      2. Formal Outs: return variables.
      3. Globals: variables used by or affected by the module.
      4. Printf: variables which appear as Formal Outs in the list of parameters in an output statement.

(Based on the variations reported in previous studies analysed by Green et al., 2009.)

Page 13: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Combinations of slicing settings tested

[Table: 16 numbered combinations of the individual slicing settings Formal Ins, Formal Outs, Globals and Printf. The tick marks showing which settings make up each combination did not survive extraction; one combination is marked “Not possible”.]

NB: all these settings were sliced on both a file and a project basis.
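Since the tested combinations are simply every selection of the four criteria, 2^4 = 16 in total, they can be enumerated mechanically. A small illustrative sketch (our code, not the original scripts):

```python
# Illustrative enumeration of the 16 slicing-setting combinations (every
# subset of the four criteria). This mirrors the table above; it is not the
# tooling used in the study.
from itertools import product

CRITERIA = ("Formal Ins", "Formal Outs", "Globals", "Printf")

for i, flags in enumerate(product((False, True), repeat=len(CRITERIA)), start=1):
    selected = [name for name, on in zip(CRITERIA, flags) if on]
    print(f"{i:2d}: {', '.join(selected) if selected else '(none)'}")
```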

Page 14: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Average module metrics for different combinations of variables

(Each row is one combination of the variables I = Formal Ins, O = Formal Outs, G = Globals, pF = printf; the tick marks showing which variables each row selects did not survive extraction. Both forward and backward slices were used in all cases.)

              Sliced as a project               |         Files sliced individually
  Overlap  Tightness  Coverage  Min C   Max C   |  Overlap  Tightness  Coverage  Min C   Max C
   0.859     0.814     0.919    0.828   0.984   |   0.649     0.481     0.691    0.523   0.901
   0.861     0.820     0.926    0.833   0.984   |   0.643     0.482     0.705    0.524   0.901
   0.903     0.857     0.917    0.870   0.984   |   0.712     0.551     0.717    0.588   0.898
   0.905     0.852     0.926    0.863   0.977   |   0.759     0.563     0.712    0.587   0.892
   0.898     0.837     0.918    0.842   0.966   |   0.745     0.519     0.671    0.543   0.845
   0.911     0.869     0.929    0.881   0.984   |   0.728     0.560     0.743    0.590   0.898
   0.891     0.840     0.927    0.852   0.981   |   0.772     0.518     0.653    0.538   0.820
   0.947     0.895     0.928    0.905   0.975   |   0.839     0.672     0.764    0.694   0.885
   0.920     0.844     0.915    0.847   0.953   |   0.767     0.521     0.653    0.544   0.761
   0.911     0.869     0.929    0.881   0.984   |   0.728     0.560     0.743    0.590   0.898
   0.949     0.883     0.914    0.886   0.956   |   0.820     0.591     0.688    0.610   0.792
   0.972     0.929     0.951    0.933   0.975   |   0.944     0.823     0.856    0.832   0.885
   1.000     0.897     0.897    0.897   0.897   |   1.000     0.612     0.612    0.612   0.612
   0.907     0.859     0.941    0.866   0.971   |   0.851     0.538     0.639    0.547   0.717
   0.917     0.851     0.896    0.866   0.968   |   0.749     0.464     0.597    0.496   0.778

Meyers & Binkley’s results: Overlap = 0.51, Tightness = 0.26, Coverage = 0.54, Min C = 0.30, Max C = 0.71.

Page 15: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

What issues impact on slice-based data?

  • Only use PDGs which are 'user-defined' and remove PDGs with zero vertices.
  • Keep globals identified n times?
  • String constants considered as output variables (?)
  • Slices are based on both data and control edges.
  • Slices of length zero are removed (this would have a significant impact on Tightness).
  • Intersect all slices with the PDG vertices to remove vertices found outside of the PDG.
  • Remove vertex indices with an identifier < 1.
  • Remove vertices associated with the body braces '{' and '}'.
  • Declaration vertices are removed as they are not consistently included in forward and backward slices.
  • Return has an auto-generated value, so if a variable is output via a global or written as well as returned, the script may catch the same (source code) variable twice.
  • Global outputs from a function f include globals modified transitively by calls from f ("outgoing variables"), resulting in numerous slices.
  • Selection of actual inputs to output functions is naïve; sometimes we may want the format string in printf statements.
  • Dealing with placeholder functions: if they have size zero after vertices are pruned, they are ignored.
  • Should some types of variables, e.g. string types, be excluded from slicing criteria?
  • Should forward slices use may-kill or declaration vertices?

Time for variant performance analysis? (Slide 19)
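Several of these decisions are vertex-filtering steps. A minimal sketch of what that filtering looks like, assuming slices and PDG vertices are plain records with hypothetical 'id', 'kind' and 'text' fields (this is our illustration, not the Scheme scripts run against CodeSurfer):

```python
# Illustrative vertex filtering: shows how several of the decisions above
# change which vertices a slice keeps. Field names are hypothetical.

def filter_slice(slice_vertices, pdg_vertices):
    """slice_vertices / pdg_vertices: iterables of dicts with keys
    'id', 'kind' and 'text' describing each PDG vertex."""
    pdg_ids = {v["id"] for v in pdg_vertices}
    kept = []
    for v in slice_vertices:
        if v["id"] not in pdg_ids:
            continue  # intersect the slice with the PDG's own vertices
        if v["id"] < 1:
            continue  # drop vertices with an identifier < 1
        if v["text"] in ("{", "}"):
            continue  # drop function-body brace vertices
        if v["kind"] == "declaration":
            # declaration vertices are not consistently included in forward
            # and backward slices, so drop them
            continue
        kept.append(v)
    return kept

def usable_slices(slices, pdg_vertices):
    """Drop zero-length slices, which would otherwise distort Tightness."""
    filtered = [filter_slice(s, pdg_vertices) for s in slices]
    return [s for s in filtered if s]
```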

Page 16: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

What are the implications of our findings?

For slice-based metrics:
  ◦ Specifying precisely all the parameters of a slice and a metric is important but difficult.
  ◦ Identifying the ‘best’ variant of a metric may be useful.

For replicating studies:
  ◦ Studies need to publish basic information that allows replication.

For Software Engineering:
  ◦ We need to build bodies of evidence, and this must include replicated studies.

Page 17: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

References

1. Green, P., Lane, P., Rainer, A., Scholz, S.-B. (2009). An Introduction to Slice-Based Cohesion and Coupling Metrics. Technical Report No. 488, University of Hertfordshire, School of Computer Science.

2. Harman, M., Okunlawon, M., Sivagurunathan, B., Danicic, S. (1997). Slice-Based Measurement of Coupling. IEEE/ACM ICSE Workshop on Process Modelling and Empirical Studies of Software Evolution, pp. 28-32, Boston, Massachusetts.

3. Meyers, T. M., Binkley, D. (2004). A Longitudinal and Comparative Study of Slice-Based Metrics. Proceedings of the International Software Metrics Symposium, Chicago, USA, IEEE.

4. Meyers, T. M., Binkley, D. (2007). An Empirical Study of Slice-Based Cohesion and Coupling Metrics. ACM Transactions on Software Engineering and Methodology, 17(1), pp. 1-25.

5. Ott, L. M., Thuss, J. J. (1993). Slice Based Metrics for Estimating Cohesion. Proceedings of the International Software Metrics Symposium, IEEE, pp. 71-81.

Page 18: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Any questions?

Tracy Hall
Reader in Software Engineering
Brunel University
Uxbridge, [email protected]

David Bowes
Senior Lecturer in Computing
University of Hertfordshire
Hatfield, [email protected]

Page 19: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

The impact of slice variants

Some variants have a better relationship with fault-prone code than other variants…

Page 20: Issues  of consistency in defining slices for slicing metrics: ensuring  comparability  in research findings

Normalised Hamming Distance

Another cohesion metric:
  ◦ Proposed by Counsell et al. (2006).
  ◦ Adapted for program slices, where:
      l = number of slices
      k = number of vertices in the module
      c_j = number of vertices in the slice based on <variable, locus>_j
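The formula itself did not survive extraction. A plausible reconstruction, assuming the adaptation follows the standard NHD form over the binary vertex-by-slice membership matrix (the exact formula on the original slide may differ), is:

\[
\mathrm{NHD} \;=\; 1 \;-\; \frac{2}{l\,k\,(k-1)} \sum_{j=1}^{l} c_j\,(k - c_j)
\]

with l, k and c_j as defined above: the sum counts, for each slice j, how many vertex pairs disagree on membership of that slice, normalised by the maximum possible disagreement over all vertex pairs and all slices.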
