A New Perspective on Predicting Maintenance Costs

Florian Uunk

April 29, 2012
Amsterdam, The Netherlands

External advisor: Rick Kazman
University of Hawai‘i
and Software Engineering Institute,
Carnegie-Mellon University

A thesis in partial fulfillment of the requirements for the degrees:

Master of Science in Computer Science
Master of Science in Software Engineering

Thesis supervisor: Patricia Lago (VU University Amsterdam, Faculty of Sciences)
Examiner: Ivica Crncovic (Malardalen University, School of Innovation, Design and Engineering)
This thesis is performed as part of the Double Degree program in Global Software Engineering (GSEEM).
Abstract
In my thesis I focus on providing a foundation of data on which decision makers can base refactoring decisions. For this, I examine the relationship between software complexity and maintenance effort. To make the data a stronger basis for refactoring decisions, I present a new approach of correlating file metrics to maintenance effort, where I look at the relation between changes in file metrics over multiple releases and changes in the maintenance effort spent on these files. I do this using a broadened and more complete notion of maintenance effort. I measure maintenance effort in 4 ways: the amount of lines of code that had to be changed to resolve tasks, the amount of discussion that tasks generated, the amount of atomic changes to a file that were required to resolve a task, and the amount of bugs per month. To test this framework, I extracted data from 3 open source projects, where I measured the variation of both complexity and maintenance effort, using this new notion of effort, over multiple releases, and investigated their correlation. I found that 21 of the tested metrics were significantly correlated to the effort measures, where complexity based metrics and incoming propagation cost show the highest correlation. Of the proposed measures for maintenance effort, the amount of discussion to resolve an issue shows the highest correlation to the chosen metrics.
Contents
1 Introduction
  1.1 Research questions
  1.2 Roadmap
2 Background
  2.1 A modularity decision framework
  2.2 Research on metrics
3 Related work
  3.1 Research on metrics and fault rates
  3.2 Research on metrics and maintenance effort
  3.3 Differences in my approach
4 Methodology
  4.1 The subject projects
    4.1.1 Criteria
    4.1.2 Selected projects
  4.2 The selected metrics
    4.2.1 Criteria
    4.2.2 Selected metrics
  4.3 Measuring maintenance effort
    4.3.1 Discussion
    4.3.2 Change in lines of code (Churn)
    4.3.3 Actions
    4.3.4 Faults
5 The data collection methods
  5.1 The data model
  5.2 Extracting the project data
    5.2.1 Extracting releases
    5.2.2 Extracting issues and comments
    5.2.3 Extracting files and source instances
    5.2.4 Extracting churn
  5.3 Measuring the metrics
    5.3.1 Measuring LOC metrics
    5.3.2 Measuring C&K metrics
    5.3.3 Measuring Propagation cost metrics
  5.4 Connecting metrics to maintenance
6 Statistical analysis
  6.1 Normalizing the data
  6.2 Spearman analysis
  6.3 Data grouping
7 Results
8 Discussion
  8.1 Research questions
  8.2 Comparison to previous work
  8.3 Threats to validity
  8.4 Future work
9 Conclusion
A Apache project overview
1 Introduction
Estimating the duration, effort, cost and complexity of software projects is of vital importance for IT management. Underestimating cost and effort can lead to loss of credibility. Overestimating costs can lead to missed opportunities and unhappy stakeholders [29]. However, estimations have traditionally been problematic in software engineering. There are various studies regarding the accuracy of estimations; Molukken and Jorgensen [37] conducted a review showing that 60-80% of all software projects run over budget, with projects going over budget by 33% on average. Heemstra lists a number of reasons that make estimations difficult for software projects. In addition to the inherent difficulty of measuring, he notes the lack of reference data to base estimates on [27].
More than half of the total development effort in software projects is spent on the maintenance phase [32]. With the trend of iterative development, this percentage may even have increased in recent years. During the maintenance phase, software ages [39] and the code base gets cluttered by an accumulation of changes, often referred to as technical debt [23]. When technical debt is not addressed, further development will be hindered.
Project managers have to decide if they want to spend their development time on building new features, or on refactoring their code base. Various frameworks have been developed to support these decisions [14], [30]. However, these frameworks rely on accurate maintenance cost predictions for the code with and without refactoring. Currently, most researchers seem to agree that there is a lack of this data [10], [30], [41].
But if managers do not have good insight into the benefits of refactoring, it is difficult to know when to refactor. My research aims to provide a foundation of data, relating complexity to effort and hence cost, upon which sound refactoring decisions may be based. This foundation can then be used in combination with work on modularity violation detection like [45] and [48] to decide which parts of the code base to refactor.
Specifically, I have three research goals: 1) to introduce a new and more complete approach to measuring effort in software projects, including non-code-based measures, such as the amount of discussion and actions needed to resolve an issue, 2) to introduce a new approach of correlating source code metric variation to changes in maintenance effort, and 3) to analyze which set of source code metrics is most correlated to maintenance effort using these new approaches.
1.1 Research questions
The primary hypothesis of this thesis is:
There is a statistically significant correlation between software complexity variation over successive releases of software project files, and the variation of effort required to maintain those files.
Finding a way to correlate maintenance effort with changes in one or more source code metrics of a file, based on measurements that are easy to capture and track, will make it possible to predict future maintenance costs of that file. This information can then be used to make economics-driven decisions about software maintenance, including refactoring. While answering this question is of interest to most software projects, few industrial projects are willing to contribute the data needed to answer the question with authority. For this reason I have turned to open-source projects as the primary source of data. The choice of open-source projects is not without its own complexities, but I believe that the challenges of dealing with open-source data are solvable, as I will discuss.
There are a number of questions that I will need to answer in my effort to prove or disprove the primary hypothesis presented above.
1. What metrics should be considered when predicting maintenance costs?
Numerous source code metrics have been proposed and studied, but not all of them have a proven impact on maintenance cost. I first need to identify the set of metrics, with a solid theoretical and empirical basis, that I can test using my approach.
2. Which measures can effectively approximate maintenance effort?
Maintenance effort manifests in multiple ways, and is not always measurable in code. For example, adding a few lines of code may require a great deal of effort, in the form of intensive discussion or multiple revisions, if the change itself is difficult. To link complexity metrics to effort, I must first establish a way of measuring maintenance effort.
3. Which complexity metrics are significantly correlated to which maintenance effort type?
The answer to this question will help in understanding how maintenance effort measures increase or decrease with particular complexity metrics. I will examine the metrics selected in 1) and their correlation to the maintenance effort measures selected in 2) on a per-file basis, using statistical models.
1.2 Roadmap
The rest of the thesis is organized as follows: In section 2 I will give some background on modularity decisions, source code metrics and modularity violation detection. In section 3, I discuss the related work in the field, and how my research framework is different from this previous work. Section 4 describes the data selection. Section 5 describes how I extracted this data. Section 6 describes the analysis process on the data. Section 7 shows the results I have obtained. Section 8 contains the discussion, threats to validity and suggestions for future work. Section 9 contains the conclusion.
2 Background
In this section, I will describe the theoretical context that this thesis builds upon. I will first describe a modularity decision making framework for which the results of this thesis can be used. I will then discuss the theoretical background on source code metrics, the different dimensions of complexity that they measure, and some influential research on metrics.
2.1 A modularity decision framework
To assess the economic benefits of refactoring, one has to weigh the costs of refactoring against the benefits in maintenance costs. Leitch and Stroulia [30] developed a framework for these assessments. The steps in their framework are shown in figure 1.
Figure 1: Model for economics-based refactoring decisions
The framework starts with predicting the maintenance cost based on the current code base. Then, the developers perform a refactoring analysis, where they investigate the source code for refactoring opportunities. They deliver a refactoring plan, which consists of a number of proposed changes to the source code that are aimed at improving maintainability. The costs of executing this plan can be predicted, for example using COCOMO [7]. Based on this plan, the maintenance costs of the system after the refactoring effort are predicted. The predictions of the refactoring cost, maintenance without refactoring, and maintenance after refactoring are then input for a Return-On-Investment (ROI) analysis. This analysis assesses the economic benefits of the refactoring plan. Based on this ROI analysis, a manager can decide to accept or reject the refactoring plan.
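As a concrete illustration, the ROI step can be sketched as below. This is a minimal sketch assuming the usual definition of ROI as net benefit divided by investment; the class and method names are hypothetical and not taken from the framework in [30].

/** A minimal sketch of the ROI step, assuming ROI = net benefit / investment. */
public final class RoiEstimator {

    /**
     * @param maintenanceWithoutRefactoring predicted maintenance cost if the plan is rejected
     * @param maintenanceAfterRefactoring   predicted maintenance cost if the plan is executed
     * @param refactoringCost               predicted cost of executing the refactoring plan
     * @return estimated return on investment; positive values favor refactoring
     */
    public static double estimateRoi(double maintenanceWithoutRefactoring,
                                     double maintenanceAfterRefactoring,
                                     double refactoringCost) {
        double savings = maintenanceWithoutRefactoring - maintenanceAfterRefactoring;
        return (savings - refactoringCost) / refactoringCost;
    }

    public static void main(String[] args) {
        // Example: refactoring costs 40, predicted maintenance drops from 200 to 120.
        System.out.println(estimateRoi(200, 120, 40)); // prints 1.0, i.e. a 100% return
    }
}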
There are a couple of challenges in using this framework. First, refactoring opportunities have to be identified. Numerous publications have addressed this problem, like Trifu and Marinescu [45] and Wong et al. [48]. Second, the maintenance cost with and without the proposed refactoring has to be estimated. In this thesis, I will provide additional data that helps in making these estimations, based on source code metrics of the project.
2.2 Research on metrics
Published source code metrics can be divided into 5 categories, based on what they measure: size, complexity, coupling, cohesion, and inheritance [19]. I will give a brief description of each category, along with some of the most influential publications on source code metrics.
Size Size is the most straightforward metric for source code. The number of lines of code (LOC) is the easiest way of measuring size. But, like any metric, it has its flaws. For example, as any experienced coder knows, it is possible to write the same code with fewer (or more) lines of code, while maintaining the same complexity. To mitigate these problems, several other metrics have been proposed.
File complexity Measures of how complex the source file is are postulated to affect modifiability and maintainability: lower is better. Examples of complexity-based metrics are Halstead Volume [25], a metric based on operator and operand counts, and McCabe Complexity [34], a metric based on the number of possible paths in the program control graph.
Coupling Coupling describes the amount of connections the code has to other files or other classes. The assumption is that lower coupling is better. Briand et al. proposed a set of metrics that measure the different kinds of class-to-class coupling that are possible [8]. Another coupling metric is Propagation Cost, which was first introduced by MacCormack et al. in 2006 [33].
Cohesion Cohesion measures how strongly the responsibilities of a file are related. The rationale behind measuring cohesion is the belief that source files and classes should focus on just one thing, and doing so will improve maintainability.
Inheritance Inheritance-based metrics only apply to object-oriented code. Less complex inheritance hierarchies are expected to be easier to understand and maintain.
Chidamber and Kemerer [16] (henceforth C&K) developed the first, and so far most influential, metrics suite aimed at measuring object-oriented source code. It includes metrics for complexity, coupling, cohesion and inheritance.
As you can see, many different metrics and metric types have been proposed. However, there is a lack of strong statistical evidence to support many of these metrics. I will discuss the existing empirical research in section 3.
3 Related work
While there is a large amount of published research on source code metrics and maintenance effort, the field is far from stabilized.
In the research literature there are 2 broad categories of project outcomes that have been associated with software metrics: linking metrics to fault rates, and linking metrics to maintenance effort. I will first review the research on fault rates, and then review the research on maintenance effort. Finally, I discuss how my approach differs from previous work.
3.1 Research on metrics and fault rates
Much work has been done to study the correlation between source code metrics and fault rates. I now discuss some of the most influential studies.
Comparing in controlled experiments Basili et al. in 1996 [4] gave 8 groups of students the same assignment, at the end of which the number of errors found per class in the testing phase was counted and then related to the various metrics in the C&K suite. They find all metrics in the suite to yield significant results. Briand et al. [11] replicated the experimental setup of Basili et al. with 8 student projects, and correlated a total of 49 different metrics from various suites to fault rates. They find that most high-scoring metrics are coupling and inheritance based.
Comparing in industry data Tang et al. [44] compare the C&K metrics set against fault data in 3 real-time systems. They find significant results for 2 complexity-based metrics. Cartwright and Shepperd [15] measured a number of complexity and inheritance metrics, and concluded that complexity metrics are able to forecast faults. Subramanyam and Krishnan [43] correlate some of the C&K metrics to fault rates. They find significant results for complexity and inheritance based metrics.
Comparing in open source data Gyimothy et al. [24] study the C&K metric suite and a few metrics they developed themselves in the Mozilla software. They find significant values for complexity, size, inheritance and coupling metrics.
Comparing over releases El Emam et al. [19,20] compare a larger set of object-oriented metrics against fault rates, where they control for class size. They construct a prediction model based on the first version of a project, and then test this on subsequent versions. They conclude that, of the metrics they examine, only coupling-based metrics provide significant results. Olague et al. [38] look at data for a project that utilizes iterative development. They compare metric data with faults over various iterations. They find significant values for complexity metrics. Zhou et al. [51] have written a correction of the statistics used in Olague's work. They also performed new research on a project that contains various iterations, and again find significant values for complexity-based metrics.
3.2 Research on metrics and maintenance effort
A number of papers have attempted to correlate source code metrics to maintenance effort. However, there is no generally agreed method to determine the maintenance effort for a file. I will now describe a number of approaches that have been attempted.
Comparing against expert judgement Welker et al. [47] propose a polynomial that uses complexity based metrics to predict maintenance effort. The weights for each of these metrics are automatically fitted, so the polynomial matches data of expert judgement in 8 systems. They present this polynomial as the Maintainability Index.
Comparing against Maintainability Index Misra [36] and Zhou and Xu [50] compare a list of complexity and inheritance metrics against the Maintainability Index at the system level. Both papers find significant correlations in both categories.
Comparing in controlled experiments Harrison et al. [26] compare metrics against both expert judgement and maintenance measurements obtained in a controlled experiment. They find correlations between complexity and cohesion and their maintenance measures, and a correlation between complexity and the expert judgement of a system.
Arisholm [2] looks at 10 changes made to an industrial system, where he logged hours spent on the tasks. He finds no correlation between source code metrics and effort. However, this may be due to the small size of the dataset.
Comparing against change Li and Henry [31] did research linking a set of metrics against total change volume of classes in 2 different projects. They found significant correlations for complexity, coupling, cohesion and inheritance metrics. Binkley and Schach [6] looked at change volume of an industrial system. They positively correlate this to coupling and complexity metrics and to one inheritance metric. Ware et al. [46] look at the number of changes and the number of lines changed for files in a commercial application. They find significant correlations for complexity and coupling measures.
Comparing over releases Demeyer and Ducasse [17] try to identify problem areas in the source code of a project and check if those problem areas get refactored in later releases. They find no correlation; however, they also note that the project is in good shape, so there might not have been a need for refactoring. Alshayeb and Li [1] try to correlate a polynomial, consisting of complexity, coupling, and inheritance metrics, to maintenance effort in iterative projects. They do this by measuring lines of code added, deleted and changed, first between releases of a project, then between changes within a release. They find that their constructed polynomial is reasonably good at predicting effort between changes, but less good at predicting effort between releases.
3.3 Differences in my approach
There are three ways in which my research is different from earlier research in this area. The first difference is that instead of simply comparing file measures to maintenance effort, I compare an increase or decrease in file measures to an increase or decrease in maintenance effort. This analysis gives much more insight into the effect of source code metric variation during the lifetime of a project. If there is a clear correlation between a difference in some metric values and maintainability measures, the project manager will be able to use this knowledge to make informed decisions about maintenance and refactoring opportunities.
The second difference is that instead of measuring maintenance effort as the difference in lines of code or the number of total changes, I look at a more precise estimation of effort. For this, I add 2 new measurements: the amount of actions (changes to a file) required to resolve an issue and the amount of discussion required to resolve an issue. This results in a new, more comprehensive and holistic view of maintenance effort. The amount of code changed is an inadequate measure of effort because it does not take into account how difficult it was to make those changes. Actions and discussions attempt to capture this dimension of effort; the idea is that a complex change will require more discussion among project members and more changes to files (since some of the initial changes will not be correct and need subsequent rounds of changes).
The third difference is that instead of looking at either effort measures or fault rates, I look at both, since faults introduce extra maintenance work that has to be performed on files as well.
To my knowledge, no previous work on maintenance prediction measures discussion or the amount of actions, or also considers fault rates. I am also the first to look at change in file metrics versus change in maintenance effort over releases. Previous work like [1,19,20,38,51] has also looked at maintenance effort or faults over various releases, but instead of comparing variations in metrics over different releases to variations in maintenance effort, they create a formula to predict maintenance effort from source code metrics, and test that over various releases and changes on the project during those releases. To my knowledge, my research also uses the biggest dataset so far, containing data for over 6000 changes.
4 Methodology
In this section I will describe the subject projects that I have studied, the set of metrics I chose to collect from these projects, the rationale behind these choices, and my data collection methods.
4.1 The subject projects
The selection of projects is important to the quality of the data, and therefore to the validity of the research. I now describe the criteria I used to select the projects for this research, and the motivation behind those criteria.
4.1.1 Criteria
To make my research as convincing as possible, I attempted to obtain a set of projects that was as large and heterogeneous as possible. I specifically searched for heterogeneity along the following dimensions:
Variation in domain of software Uses of software can be categorized into various application domains. I tried to find projects from distinct domains to ensure that my research results would apply generically.
Variation in source code sizes Even though Dolado [18] has shown that development productivity in software engineering does not vary significantly across project sizes, maintaining a large-scale software project is in practice still different from maintaining a small-scale software project, and so I wanted to ensure that I sampled from projects with a broad range of sizes.
Variation in team size There has been considerable research on the effects of team size on development speed. Brooks argues that smaller teams tend to have greater productivity per person [12], whereas Raymond [40] argues that larger teams are beneficial in Open Source Software.
Variation in project age The age of a body of software can influence developer productivity in ways that may not be measurable by source code metrics. For example, the technology chosen (language of implementation, operating system, development libraries, etc.) can cease being supported, and key developers can leave the project, resulting in a loss of knowledge.
Sampling the data from a set of projects that is heterogeneous among these dimensions will make my results more generally applicable. To control the scope of the work that I initially pursued, I have restricted my attention to projects written in Java, so that I could reuse the metrics extraction process for the entire set of projects. Furthermore, to be able to extract maintenance effort data, all of the chosen projects were required to use a version control system and a bug tracking system. To make sure a sufficient amount of data is available for each project, the selected projects must all contain source code and maintenance data for at least 3 releases, and all have a minimum of 500 resolved items in their bug tracking systems.
4.1.2 Selected projects
Based on the selection criteria, I surveyed various well-known open source projects. A main obstacle in selecting projects was that many open source projects used different bug tracking systems and version control systems. To make sure that all data was in the same format, and that the projects adhere to the same conventions, I decided to select projects that are stored in the Apache Software Foundation repository. The Apache Software Foundation has a database of, as of 31-12-2011, 102 projects, which all use Jira for their bug tracking and Subversion for their version control. The full set from this analysis is visible in Appendix A. I have analysed these projects, and based on the criteria described above, I selected 3 Java projects for my research.
I have summarized the project characteristics, the first and last release for which I have extracted data, and the number of resolved or closed issues that I was able to extract, in Table 1.
Derby Derby1 is a relational database. It was started by Cloudscape Inc. in 1996. Through a series of acquisitions, the project was transferred to IBM ownership in 2001. IBM then donated it to the Apache Foundation in 2004. Subsequent to this donation IBM was still contributing to the project [22]. In 2005, Sun Microsystems (now Oracle) took an interest in Derby, and the company has made significant contributions since that time [28]. I extracted information from 18 releases of Derby available in the Apache repository. As of 31-12-2011 there were a total of 102 contributors who contributed to their bug-tracking system on over 3000 resolved or closed items.

1 http://db.apache.org/derby/

Table 1: Selected Projects

Project | Releases | Resolved Issues | Contributors | First Release      | Last Release       | Domain
Derby   | 18       | 3058            | 102          | 08-2005 (10.1.1.0) | 10-2011 (10.8.2.2) | Database
Lucene  | 18       | 2444            | 41           | 03-2006 (1.9.1)    | 12-2010 (3.0.3)    | Distributed search
PDFBox  | 8        | 699             | 9            | 02-2010 (1.0.0)    | 07-2011 (1.6.0)    | PDF document manipulation tool
Lucene Lucene2 is a high-performance, full-featured text search engine library. It was started as a SourceForge project in 2000, but it has been an Apache project since 2001. I extracted data from 18 releases. A total of 41 contributors have contributed to Lucene's bug-tracking system, as of 31-12-2011, on over 2400 resolved or closed items. An example consumer of the Lucene search engine is Twitter Search [13].
PDFBox PDFBox3 is a collection of command-line tools for working with PDF documents. The PDFBox project was started in 2008. Just 9 people have made contributions to their bug-tracking system as of 31-12-2011, on almost 700 resolved or closed issues. I extracted data from 8 releases for this project.
4.2 The selected metrics
In this section, I will discuss the metrics that I calculated for each project. Each of these metrics will subsequently be compared against various measures of maintenance effort to determine which of them are good predictors of future maintenance effort (and hence cost).
4.2.1 Criteria
As described in section 3, numerous metrics (summarized in [35, 41]) have been proposed that purport to predict software quality and maintenance effort. Unfortunately, testing all these metrics for their power in predicting maintenance effort was infeasible. To select a smaller target set of metrics to analyze, I applied 3 criteria.
The metric is widely applicable Since I am restricting my research to projects written in Java, the metrics will have to be applicable at least to this language.
The metric is defined at file level The unit of analysis in my research is the source file, so the metric has to be interpretable at the file level. I also employ metrics that are defined at the class level, but for these metrics I only consider files that contain just 1 class. This constraint only eliminates around 7% of the files from the candidate data set.

2 http://lucene.apache.org/core/
3 http://pdfbox.apache.org/
The metric has been consistently proven in previous research To keep the scope of the research manageable, I chose only metrics that have been consistently shown to be correlated with maintenance effort in previous studies.
4.2.2 Selected metrics
I have selected the following metrics to investigate their correlation to maintenance cost:
Source Lines of Code (LOC) The total lines of code in the file. The idea behind this metric is that, all other things being equal, larger files are harder to maintain.
Weighted Method Complexity (WMC) The sum of the complexities of the methods in the class defined in this file.
Response For Class (RFC) The total number of methods that can be invoked through method invocations on the class defined in this file.
Coupling Between Objects (CBO) The number of other classes that the class in this file is connected to.
Lack of Cohesion Of Methods (LCOM) The number of method pairs in the class in this file that do not share the usage of a single attribute of the class.
Depth in Tree (DIT) The number of classes that are a superclass of the class in this file.
Number of Children (NOC) The number of classes that have the class in this file as a superclass.
WMC, RFC, CBO, LCOM, DIT and NOC have all been described by C&K in 1994 [16]. I have altered their definitions slightly to make them meaningful at the file level, as described in section 4.2.1. The C&K suite has been studied heavily, and its metrics have been validated in many studies (e.g. [6, 26, 31, 36, 46]).
In addition to the C&K metrics, I have selected one more metric: Propagation Cost. The creators of this metric claim that it captures architectural complexity.
Propagation cost (PC): The propagation cost metric, a coupling-based metric, is based on the observation that a change in one class can affect other classes. This metric was first introduced by MacCormack et al. in 2006 [33]. I have added this metric because prior research shows that coupling based metrics have a strong correlation to the appearance of ripple effects, that is, a change in a single file affecting other files [9]. There has also been some promising early research on the predictiveness of propagation cost on maintenance effort [14, 33].
Propagation cost is based on a visibility matrix for files in a project [33]. This visibility matrix is a binary matrix where a project's files are the labels on the rows and columns, and dependencies between the files are the values. These dependency values are determined using a path length L, which allows a file A to be dependent on a file B through a dependency chain of length L. For example, with a path length of 1, only direct dependencies are used for calculating the propagation cost; this is the traditional coupling metric. To calculate the propagation cost, take the sum of all dependencies in the visibility matrix, and divide that by the total possible dependencies in the project to obtain the propagation cost value.
However, propagation cost, when calculated this way, is not defined at the file level. For this, instead of taking the sum of all dependencies in the matrix, I take the sum of the incoming dependencies for a file, and divide that by the total possible dependencies per file to calculate incoming propagation cost per file. Concretely, this means I take the sum of the column in the visibility matrix that represents the file, and divide this value by the length of the column. I do the same for outgoing dependencies to calculate the outgoing propagation cost per file, where I take the sum of the row in the visibility matrix that represents the file, and divide that by the length of the row. This approach is a slight variation on the work of Ferneley [21] and Yang and Tempero [49], who have found promising results.
In my research I have investigated both incoming and outgoing propagation costs with path lengths of 1, 3, 5, 10, and 20 to see which variant of the propagation cost metric has the most predictive power.
Finally, I introduce a new variant of the propagation cost metric that employs a decay rate. With this decay rate, the strength of indirect dependencies is reduced by a factor D for each additional step in the dependency chain between the 2 files. In the present study, I applied a decay rate of 0.1. The above combinations give a total of 20 propagation cost metrics (where the decaying version of propagation cost of length 1 is the same as the non-decaying version). The 20 propagation cost metrics, plus the 7 selected C&K metrics, add up to a grand total of 27 metrics that I calculate for each file of each release of each project.
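To make the per-file propagation cost computation concrete, below is a minimal sketch in Java. This is not the tooling used in the thesis (that pipeline uses Moka and Janus, see section 5.3.3); the class and method names and the simple step-by-step derivation of the visibility matrix with a path length and decay factor are my own illustrative assumptions based on the definition above.

import java.util.Arrays;

/**
 * A minimal sketch of per-file propagation cost, assuming a boolean matrix of
 * direct dependencies (dep[i][j] == true means file i depends directly on file j).
 * Indirect dependencies up to maxPathLength steps are weighted by decay^(steps - 1);
 * a decay of 1.0 gives the non-decaying variant.
 */
public final class PropagationCost {

    public static double[][] visibility(boolean[][] dep, int maxPathLength, double decay) {
        int n = dep.length;
        double[][] vis = new double[n][n];
        boolean[][] reach = new boolean[n][n];
        for (int i = 0; i < n; i++) reach[i] = Arrays.copyOf(dep[i], n);

        double weight = 1.0;
        for (int len = 1; len <= maxPathLength; len++) {
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    if (reach[i][j] && vis[i][j] == 0.0) vis[i][j] = weight; // shortest chain wins
            // extend all dependency chains by one direct step
            boolean[][] next = new boolean[n][n];
            for (int i = 0; i < n; i++)
                for (int k = 0; k < n; k++)
                    if (reach[i][k])
                        for (int j = 0; j < n; j++)
                            if (dep[k][j]) next[i][j] = true;
            reach = next;
            weight *= decay;
        }
        return vis;
    }

    /** Incoming propagation cost of a file: its column sum divided by the column length. */
    public static double incoming(double[][] vis, int file) {
        double sum = 0.0;
        for (double[] row : vis) sum += row[file];
        return sum / vis.length;
    }

    /** Outgoing propagation cost of a file: its row sum divided by the row length. */
    public static double outgoing(double[][] vis, int file) {
        double sum = 0.0;
        for (double v : vis[file]) sum += v;
        return sum / vis.length;
    }
}

The project-level propagation cost described earlier would instead sum the whole visibility matrix and divide by the total number of possible dependencies.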
4.3 Measuring maintenance effort
Since I am looking at open source projects, the developers did not log hours for their maintenance work. To approximate maintenance effort, I scrutinized the data set to see which proxy measures for maintenance effort I could find. I settled on and collected 4 file-based proxy measures.
4.3.1 Discussion
This is the amount of discussion that occurred in resolving an issue. The assumption is that a more complex change is likely to generate more discussion. Concretely, I measure the number of comments that have been made in the bug tracking system to resolve an issue that affects a file.
4.3.2 Change in lines of code (Churn)
Churn is the total number of lines of code that were changed in the file to resolve an issue. If a file is changed multiple times for the same issue, I check whether the file changes overlap, to make sure I do not count the same changes multiple times.
4.3.3 Actions
For actions I measure the number of actions that were performed to resolve an issue. Concretely, this counts the total number of patches and commits that were needed to resolve an issue that affects a file. The notion here is that the more complex the file, the more likely it is that it would require a large number of actions if something has to be changed in it.
4.3.4 Faults
This is the amount of bugs that were reported per month during the release. If more bugs occur in a file, then more maintenance effort will have to be put into the file to restore its functionality.
5 The data collection methods
5.1 The data model
The data model used in the data extraction and analysis procedure is illustrated in Figure 2.
Figure 2: The data model
For each project, I study a number of releases, each of which has a set of files associated with it. This connection is realized through the concept of source instances. A source instance is a folder in the version control system that stores an instance of the source code. Such an instance exists for each release. Each project also has a list of issues, which are extracted from the project's bug- or issue-tracking software. Issues consist of both bug reports and change requests. Developers can submit patches to suggest a solution to an issue. Patches consist of a list of actions, which are changes to files that were made to resolve an issue. For each action, I measure the number of lines of code added and removed, where changed lines count as both added and removed. Each action corresponds to a change done in 1 file for 1 issue, but issues are often resolved using multiple actions, and multiple actions on a single file may be performed in multiple patches to resolve an issue. The patches that finally get accepted (and which resolve the issue) are called commits. Developers can also associate comments with issues. These comments are used for communication between developers.
For each file, we generate a number of file metrics. These are also stored in the database.
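As an illustration of this data model, here is a minimal sketch of the core entities as Java records. The type and field names are hypothetical; the actual study stores these entities in a relational database populated by the workflows described in section 5.2.

import java.time.LocalDate;

/** Hypothetical Java records mirroring the entities of the data model in Figure 2. */
public final class DataModel {

    enum InstanceType { TRUNK, BRANCH, TAG }
    enum IssueType { BUG, CHANGE_REQUEST }

    record SourceInstance(String instancePath, InstanceType type) {}

    record Release(String versionNumber, LocalDate releaseDate, SourceInstance sourceInstance) {}

    record SourceFile(String filePath, String relativePath, SourceInstance sourceInstance) {}

    record Issue(String key, IssueType type, String status,
                 LocalDate creationDate, LocalDate resolutionDate) {}

    /** A change to one file; in the real model an action belongs to either a patch or a commit. */
    record Action(Issue issue, SourceFile file, boolean partOfCommit,
                  int linesAdded, int linesRemoved) {}

    record Comment(Issue issue) {}

    record FileMetric(SourceFile file, String metricType, double value) {}
}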
I will now discuss the various entities and the properties of those entities that are extracted into the database for each project.
Releases A release in a project is a snapshot of the source code that was distributed as a new version. Releases are the time unit that I use in the analysis to compare metric and effort values over. For releases, I extract the following properties:
• Version number
The version number tells us which version the release represents.

• Release date
The release date allows us to sort releases in chronological order. It is also used to calculate the time between releases, which is used for calculating the fault rate per month.

• Source instance
The source instance describes which instance of the source code in the version control system contains the snapshot of the source code in this release.
Source instance The version control system contains, by convention, multiple folders containing different instances of the source code. All 3 projects I look at use the same organization for these folders. There is the trunk folder, which contains the instance of the code where the main development is done. There is the branches folder, where developers can create a separate instance for separate development and testing, and the tags folder.
The tags folder is a directory in the version control system that contains a replica of the source code as it existed at the moment of a release. It is a convention that for each release, a tag directory is stored in this way. I use these tag directories to extract metrics of the files during their various releases. For source instances, I extract the following properties:
• Instance path
This is the path from the root of the version control system to this instance of the source code.

• Instance type
The instance type describes whether the instance contains the trunk, a branch, or a tag.
Files Files are the unit of measurement in the analysis. I look at both metric values and effort measurements on a per-file basis. There is an entry in the database for each file in the repository. For files, I extract the following properties:
• File path
The path from the root of the repository to this file.

• Source instance
The instance of the source code in the repository that this file belongs to.

• Relative path
The path from the root of the source instance to this file. This path is used to compare files between source instances.
Issues Issues represent change requests or bug reports in the bug tracking system. They represent a wanted or needed atomic change in the project. For issues, I extract the following properties:
• Issue type
The issue type describes whether the issue is a bug report or a change request.

• Status
The status describes whether the issue has been resolved or closed, or whether it is still open.

• Creation date
The date this issue was created. I use this to calculate the fault rate during a release.

• Resolution date
The date this issue was resolved or closed. I use this to see which issues have been fixed during a release, and thus for which issues to calculate maintenance effort.
Patches Patches are suggested changes to the source code, which would (help) resolve an issue. Patches consist of a list of actions. For patches, I extract the following properties:
• Issue
The issue that this patch aims to (help) resolve.
Commits A commit is a change in the code in the version control system. One or more commits together resolve an issue. Commits also consist of actions, and can be seen as patches that have been accepted as resolving an issue. For commits, I extract the following properties:
• Issue
The issue that this commit resolves.
Actions An action is a (suggested) change to a file. It consists of a list of lines that have to be added or removed. Patches and commits consist of a set of actions to one or more files, where each action belongs to either a patch or a commit, and each action applies to exactly one file. For actions, I extract the following properties:
• Patch or Commit
The patch or commit that this action belongs to.

• File
The file that this action affects.

• Lines added
The number of lines of code that are added to this file by this action.

• Lines removed
The number of lines of code that are removed from this file by this action.
Comments Comments can be placed on change requests or bug reports. They are used by developers to discuss the resolution of the issue. I use the number of comments to estimate the amount of discussion that was required to resolve an issue. For comments, I extract the following properties:
• Issue
The issue that this comment is on.
File Metrics For each file, I extract and store a number of file metrics. These are used to relate to the various maintenance measures in search of a correlation. Each file has a row in the file metrics table associated with it for each metric type. File metrics are stored with the following properties:
• File
The file this metric value belongs to.

• Metric type
The type of metric that this value measures for the file.

• Metric value
The value that this file has for this metric type.
5.2 Extracting the project data
I populated the data model described above by extracting data from the bug tracking system and version control repositories for each project. Since the 3 selected projects were all maintained by the Apache foundation, it was possible to use the same technologies for all projects. The bug tracking system in use is Jira4, which has a WSDL API that is usable for data extraction. Their version control system is Subversion5.
For the extraction of the data, I used the Taverna Workflow Management System6. Taverna allows you to create workflows, which are essentially programs that have a pipes-and-filters architecture. Filters can be in the form of Java programs, UNIX shell scripts, XML operations or service calls.
I have constructed a number of workflows that I will discuss here. These workflows query both systems for the necessary data, format it, and insert it into the database based on the data model. The workflows are all written so that they either update the information if it already exists in the database, or insert it if it does not exist yet. This way, if any error occurs during the sometimes long extraction runs, the workflow can just be restarted. This means that a workflow run can fail partway many times, but as long as one of the runs was successful, the data will be consistent.
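The insert-or-update behaviour can be sketched as follows; this is an illustrative JDBC fragment with hypothetical table and column names, not the actual Taverna filter used in the thesis.

import java.sql.Connection;
import java.sql.PreparedStatement;

/** A hypothetical upsert so that re-running a workflow never duplicates rows. */
final class IssueUpserter {

    void upsert(Connection db, String issueKey, String type, String status) throws Exception {
        // Try to update an existing row first; insert only if nothing was updated.
        try (PreparedStatement update = db.prepareStatement(
                "UPDATE issues SET issue_type = ?, status = ? WHERE issue_key = ?")) {
            update.setString(1, type);
            update.setString(2, status);
            update.setString(3, issueKey);
            if (update.executeUpdate() > 0) {
                return; // row existed and was refreshed
            }
        }
        try (PreparedStatement insert = db.prepareStatement(
                "INSERT INTO issues (issue_key, issue_type, status) VALUES (?, ?, ?)")) {
            insert.setString(1, issueKey);
            insert.setString(2, type);
            insert.setString(3, status);
            insert.executeUpdate();
        }
    }
}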
5.2.1 Extracting releases
The workflow that extracts release information first queries the WSDL interface of the Jira bug tracking system for the existing releases in the project. It then extracts the version number and release date from the returned XML. A Java script then connects to the database to insert this information. The source instance field does not get filled by this workflow yet, since that information is not available in Jira. I manually linked each release to its corresponding source instance after the workflows had succeeded.
5.2.2 Extracting issues and comments
The issues and comments tables are both populated by the same workflow. This workflow first queries the Jira WSDL interface for a list of issues. Then, on a per issue basis, it queries the WSDL interface for comments to that issue. It then inserts the obtained information for both entities into the database. Since this workflow runs for every issue, it can take a while to execute. For this reason, I've written the workflow to run over a specified time range. This time range makes the workflow query for issues that were resolved within this time range.

4 http://www.atlassian.com/software/jira/
5 http://subversion.tigris.org/
6 http://www.taverna.org.uk
5.2.3 Extracting files and source instances
This workflow expects a full checkout of the project repository to be available on the filesystem. It recursively scans all files and folders in this checkout, where it first identifies the different source instances. Since our 3 projects all follow the same convention of dividing the code up in trunk, branches, and tags sections, this can be done using a regular expression. The workflow then subtracts the part that describes the source instance from the full path to obtain the relative path for each file or folder. Finally, the source instances and their files that are obtained this way are inserted into the database.
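A minimal sketch of how such a split could look in Java follows; the exact expression used in the thesis workflow is not given, so the pattern and example paths below are illustrative assumptions based on the trunk/branches/tags convention described above.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper splitting a repository path into source instance and relative path. */
final class InstancePathSplitter {

    // Matches ".../trunk/...", ".../branches/<name>/..." or ".../tags/<name>/...".
    private static final Pattern INSTANCE =
            Pattern.compile("^(?<instance>(?:.*/)?(?:trunk|branches/[^/]+|tags/[^/]+))/(?<relative>.*)$");

    static String[] split(String fullPath) {
        Matcher m = INSTANCE.matcher(fullPath);
        if (!m.matches()) {
            return null; // path does not lie inside a recognized source instance
        }
        return new String[] { m.group("instance"), m.group("relative") };
    }

    public static void main(String[] args) {
        // Example: a file inside the tag directory of a release.
        String[] parts = split("project/code/tags/1.6.0/src/main/Foo.java");
        System.out.println(parts[0]); // project/code/tags/1.6.0
        System.out.println(parts[1]); // src/main/Foo.java
    }
}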
5.2.4 Extracting churn
To extract the churn per issue, we need all patches and commits per issue. The workflow queries the database for a list of (already inserted) issues that are resolved. Then, per issue, it queries the Jira WSDL interface for a list of patches that have been submitted, and then downloads these patch files. Meanwhile, it queries the Jira web interface for a list of commits that have been done to resolve the issue, and queries Subversion for the diff files of these commits.
Now the workflow has a list of patch files and Subversion commit diff files. These 2 file types are both in the same format, and represent the changes that are made to files in applying the patch or commit. The workflow orders this list chronologically. This combination of patches and commit diff files is a log of what changes were made to files to resolve an issue. To get from this list of patches and commits to a list of actions, the workflow applies a regular expression that splits the commits and patches into affected files, with the change in each file.
For each action, the workflow now checks what the difference is with the action on that file in the previous patch or commit (if it exists). This gives us the incremental effort for each action that was applied in order to resolve the issue.
Because this workflow also takes a long time to run, I have made it possible to run it over time ranges, like the workflow described in section 5.2.2.
5.3 Measuring the metrics
To know the metrics for each file during each release, I need to measure each file in the source instances that are snapshots of releases. For this I developed a set of workflows that extract the different metric types. Each workflow measures all files in a specified source instance, so the workflows have to be run for each source instance in the projects. I will describe the workflows below, with the input they expect from the source files.
5.3.1 Measuring LOC metrics
The workflow that measures the LOC metric takes the directory that contains the source instance as input, and then runs the UNIX command:

wc -l

on all files to extract the number of lines of code. These values are then stored in the FileMetrics table.
5.3.2 Measuring C&K metrics
To measure the C&K metrics, I used ckjm, a freely available tool written by Diomidis Spinellis [42]. This tool takes .class files as input, which are compiled Java files. To obtain these .class files, I downloaded the .jar archives of each project for all the releases I needed file metrics for, and extracted these .jar files into folders that contain .class files.
Since Java class names are directly linked to file names, and package hierarchies are directly linked to directory trees, I am able to reconstruct the file path from the Java name by replacing the periods in the class name by forward slashes. As discussed in section 4.2.1, files with more than one class defined in them are ignored in this analysis.
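For illustration, this mapping can be a one-liner; the method name is mine, and it assumes the one-class-per-file constraint mentioned above.

/** Hypothetical mapping from a fully qualified class name to its source file path. */
static String classNameToPath(String className) {
    // e.g. "org.example.Foo" -> "org/example/Foo.java"
    return className.replace('.', '/') + ".java";
}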
The ckjm tool calculates the C&K metrics that we have described in 4.2.2, and outputs the values as plain text. The workflow then extracts these values, and inserts them into the FileMetrics table.
5.3.3 Measuring Propagation cost metrics
Like the C&K metrics, the propagation costs are also measured from the compiled .class versions of the files. This workflow uses two tools that have been developed at Drexel University: Moka and Janus. The Moka tool transforms the .class files into .moka files, which contain an XML-based summary of the class. These .moka files are then used by Janus to measure which class dependencies exist. Janus outputs a Dependency Structure Matrix (DSM), which is a matrix that contains all files in the project and the dependencies among them. This matrix is then used to calculate the propagation cost for the different propagation cost variants and their path lengths, as described in section 4.2.2. The class names are linked to file paths in the same way as done in section 5.3.2. The metrics are then stored in the FileMetrics table.
5.4 Connecting metrics to maintenance
I now want to compare the metric values against maintenance measures for each release. We get the file metrics by looking at the files in the source instance that is connected to the release in the database. The File Metrics table contains the metrics for these files.
In extracting effort data for the maintenance measures, we make a distinction between faults on the one hand, and discussion, churn, and actions on the other. Fault data is calculated by counting the number of reported faults during the release, so the creation date of the bug report determines which release the value counts towards. For discussion, churn, and action data, we look at the resolution date, so the date that the issue was resolved determines which release the value counts towards.
6 Statistical analysis
6.1 Normalizing the data
The metrics described in section 4.2.2 give a data set per file that contains a list of releases, with associated complexity metric and maintenance effort data. Since I want to analyze the relationship between increasing or decreasing code metrics and maintenance effort, I transform these numbers so that they reflect changes compared with the previous version. To this end I take the metric value of each file in a release, and divide that value by the metric value in the previous release. This gives me a factor representing the increase or decrease in the metric value as compared with the previous release. I do the same calculation for effort values, dividing effort measures by their comparable average values in the previous release. These derived data points thus represent the difference between two releases, or the relative change per release of the same file. Since these are now relative values, I can compare them across different files and can compare files across projects without having to correct for file size.
Table 2: Example Data Extracted from Project Files

Row | File         | Release | Issue   | Metric | Effort | Group | Rel. Metric | Rel. Effort
1   | client.java  | 1.0     | PROJ-01 | 100    | 40     | -     | -           | -
2   | client.java  | 1.0     | PROJ-02 | 100    | 60     | -     | -           | -
3   | client.java  | 1.1     | PROJ-11 | 140    | 75     | 1     | 1.4         | 1.5
4   | client.java  | 1.2     | PROJ-20 | 70     | 60     | 2     | 0.5         | 0.8
5   | library.java | 1.1     | PROJ-11 | 60     | 75     | -     | -           | -
6   | library.java | 2.0     | PROJ-31 | 72     | 150    | 1     | 1.2         | 1.5
7   | server.java  | 1.0     | PROJ-11 | 200    | 40     | -     | -           | -
8   | server.java  | 1.2     | PROJ-21 | 240    | 52     | 1     | 1.2         | 1.3
This relative value calculation is exemplified in Table 2. Please ignore the Group column for now; it will be discussed in section 6.3. In row 3 you can see that to get the relative metric value, the metric value is divided by the metric value in the previous release. For the relative effort value, the value is divided by the average effort over the previous release. The relative values of row 4, which are of the release after the release of row 3, are then calculated using the absolute values of row 3. The file "library.java" in rows 5 and 6 has no data associated with it for release 1.0 or 1.2, so the entry for release 1.1 is used as the first data point, and the entry for release 2.0 as the second data point.
I have generated a table like this for each metric type versus effort type combination. This yields a total of 27 x 4 = 108 tables.
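A small sketch of this normalization step in Java; the method names are hypothetical, and it assumes the per-release averaging of effort illustrated in Table 2.

import java.util.List;

/** Hypothetical helper producing the relative (per-release) values of Table 2. */
final class RelativeValues {

    /** Relative metric value: the file's metric divided by its metric in the previous release. */
    static double relativeMetric(double current, double previous) {
        return current / previous;
    }

    /** Relative effort: one issue's effort divided by the file's average effort in the previous release. */
    static double relativeEffort(double currentIssueEffort, List<Double> previousReleaseEfforts) {
        double avg = previousReleaseEfforts.stream()
                .mapToDouble(Double::doubleValue).average().orElseThrow();
        return currentIssueEffort / avg;
    }

    public static void main(String[] args) {
        // Row 3 of Table 2: metric 140 vs 100, effort 75 vs the average of 40 and 60.
        System.out.println(relativeMetric(140, 100));                 // 1.4
        System.out.println(relativeEffort(75, List.of(40.0, 60.0)));  // 1.5
    }
}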
6.2 Spearman analysis
I now have a table of data points that I can analyze to understand the correlations between complexity metric values and effort measures. Unfortunately, the effort values do not follow a normal distribution, which limits the statistical tools that I can employ. For this reason I have opted to use a Spearman rank correlation test.
However, Spearman assumes independent measurements in the data set. This means that I have to make sure that I do not compare data points of various releases of the same file in the same analysis, as these could be dependent.
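For reference, a minimal sketch of how such a test can be run on one metric/effort combination within one group; it uses Apache Commons Math, which is an assumption on my part since the thesis does not name the statistics tooling used.

import org.apache.commons.math3.stat.correlation.SpearmansCorrelation;

/** A minimal sketch: Spearman's rho for one metric/effort combination within one group. */
final class SpearmanSketch {

    static double rho(double[] relativeMetricValues, double[] relativeEffortValues) {
        // Spearman's rank correlation is Pearson's correlation computed on the ranks.
        return new SpearmansCorrelation().correlation(relativeMetricValues, relativeEffortValues);
    }

    public static void main(String[] args) {
        double[] metric = { 1.4, 0.5, 1.2, 1.2 };
        double[] effort = { 1.5, 0.8, 1.5, 1.3 };
        System.out.println(rho(metric, effort));
    }
}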
6.3 Data grouping
I thus categorize the data points into groups as follows: I first skip all data points which come from the first release I have of a file, since I need values that are relative to the previous release for my analysis. After that, all data points that belong to the second release of a file go in group 1, all data points that belong to the third release of a file go in group 2, etc. This means that data from different releases of the same file will not be compared against each other.
Table 2 shows an example of how the assignment to groups is done. Rows 1 and 2 belong to the first release I have information on for file "client.java". Because I use values that are relative to the previous release, I can not use these values in my analysis. Row 3 belongs to the second release of "client.java", so it goes into group 1. Row 4 belongs to the third release of "client.java" that I have information for, so it goes into group 2.
Row 5 contains the first release of "library.java" that I have information for, thus it has no relative values. Please note that release 1.1, which the data point for row 5 belongs to, does not have to be the first release in which library.java existed in the project; rather, release 1.1 is the first release where I have maintenance effort data for "library.java". Row 6 belongs to the second release of "library.java" that I have information for, so it goes into group 1.
Row 7 is the first release of "server.java" that I have information for, so I can not add it to a group. Row 8 is the second release of "server.java" that I have information for, so I add it to group 1. Please note for row 8 that the information in group 1 here is not from the release right after the release in row 7, but rather information from 2 releases later. This is because release 1.2 is the first release after release 1.0 for which there is maintenance effort information.
Table 3 shows the number of data points per group that were obtained in this way, in this case for discussion data. Because the number of data points decreases in the higher group numbers, which decreases the accuracy, I decided to only include groups 1, 2 and 3 in my analysis.

Table 3: Data Points Derived from Grouping

Group | Data points
1     | 960
2     | 434
3     | 182
4     | 84
5     | 40
Finally, in my analysis I exclude issues that did not affect source files. I also need changes over more than one release of a file to make sure the file shows up in a group. Because of this selection in the data, the total of 6201 issues that I extracted from the three projects resulted in a total of 1739 usable data points that are in a group, of which 1576 are in the first 3 groups.
In summary, each group of data shows whether and how one type of maintenance effort varies with source code complexity changes in two successive releases. Investigating all the groups will show how these two aspects change together over multiple releases.
7 Results
Since I have performed a Spearman analysis on a set of 108 (27 x 4) code metric type versus maintenance effort type combinations, over 3 different groups, I have obtained 324 p and rho values. Normally, it is acceptable to accept all outcomes with a p-value under 0.05 as significant. However, with a large number of tests like this, that would lead to many false positives. I correct for this using the False Discovery Rate (FDR), described by Benjamini and Hochberg [5], to control the expected proportion of false positives in the result set. I select the expected proportion of false positives to be 5%; this means that (on average) 5% of the results that are found significant are actually false positives, or type I errors. This results in a total of 63 Spearman analyses with a significant outcome, spread over 40 metric type - effort type combinations. The highest p-value that is still declared significant at an FDR of 5% is 0.010.
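A minimal sketch of the Benjamini-Hochberg step, assuming the standard procedure from [5]; the class and method names are mine. It returns the largest p-value that is still declared significant, analogous to the 0.010 cut-off reported above.

import java.util.Arrays;

/** A minimal sketch of the Benjamini-Hochberg false discovery rate procedure. */
final class FdrCutoff {

    /**
     * @param pValues the p-values of all performed tests
     * @param q       the accepted false discovery rate, e.g. 0.05
     * @return the largest p-value declared significant, or -1 if none is
     */
    static double cutoff(double[] pValues, double q) {
        double[] sorted = pValues.clone();
        Arrays.sort(sorted);
        int m = sorted.length;
        double threshold = -1;
        for (int k = 1; k <= m; k++) {
            // keep the largest rank k with p(k) <= (k / m) * q
            if (sorted[k - 1] <= (double) k / m * q) {
                threshold = sorted[k - 1];
            }
        }
        return threshold;
    }
}

All tests with a p-value at or below the returned cut-off are then declared significant.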
The results are presented in Table 4. Values that are considered significant at an FDR of 5% are printed in bold.
Significant rho values in my results range from 0.29 to 0.03. I have chosen to consider all rho values as long as there is a significant p-value, since this p-value already shows that there is a correlation, even if it is not a very strong one. There are many factors that can influence the data, such as the inherent difficulty of the issue, the experience of the developer, and the fact that I only measure proxies for effort. Each of these factors will reduce the strength of correlation between the collected metric and effort values. I conjecture that measuring these correlations again with a larger dataset, or one that filters out specific sources of variation in the data set (such as issue difficulty), would reveal higher rho values.
The data shows that most of the significant hits, 22 in total, are metric/effort combinations that use the amount of discussion as the measure of effort. Actions appear 8 times among the significant hits and churn appears 9 times. Across the 27 considered metric types, there is only 1 significant correlation to the fault measure, which is surprisingly low.
This data shows that the amount of discussion that is needed to resolve an issue is most closely correlated to source code metrics. All metrics that score significantly on the faults, churn, or actions measures also score significantly on the discussion effort measure.
Of the scoring metric types, I found significant results for complexity based and coupling based metrics in the C&K metric suite, and for various variations of the propagation cost. There is only 1 significant result for the size based metric (LOC), and none of the inheritance or cohesion based metrics from the C&K suite show significant results.
Among the variations of propagation cost, the incoming propagation cost metrics seem to score higher rho values than the outgoing propagation cost metrics. The optimum path length for both incoming and outgoing propagation cost appears to be 5. Non-decaying propagation cost scores higher than decaying propagation cost; however, this may also depend on the decay factor that I have chosen.
Another noticeable fact is that the decaying and non-decaying propagation cost metrics with path length 1 score exactly equal. This is expected, as the decay factor only takes effect for path lengths greater than 1. Similarly, CBO (coupling between objects) and outgoing propagation cost with path length 1, which are similar in intent and implementation, also score similar results.
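To illustrate why the path length 1 variants coincide, here is a simplified sketch (Python; not the exact formula from my metric definitions, and the decay scheme is an assumption) of an outgoing propagation cost for one file: files reachable within max_length dependency steps are counted, each weighted by decay to the power (distance - 1). With max_length = 1 the decay factor is never applied, so the decaying and non-decaying variants give the same value; the incoming variant is the same computation on the reversed dependency graph.

    def outgoing_propagation(deps, source, max_length, decay=1.0):
        """Sum of decayed weights of all files reachable from `source` within
        `max_length` steps; `deps` maps each file to its direct dependencies."""
        reached = {}              # file -> weight at the distance it was first reached
        frontier = {source}
        for distance in range(1, max_length + 1):
            next_frontier = set()
            for node in frontier:
                for neighbor in deps.get(node, ()):
                    if neighbor != source and neighbor not in reached:
                        reached[neighbor] = decay ** (distance - 1)
                        next_frontier.add(neighbor)
            frontier = next_frontier
        return sum(reached.values())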
Table 4: Metric categories - effort measures

Metric Type | Effort Type | P Group 1 | P Group 2 | P Group 3 | Rho Group 1 | Rho Group 2 | Rho Group 3
RFC | discussion | 0.000044 | 0.000005 | 0.008104 | 0.267665 | 0.294932 | 0.258305
PROP-IN-5-N | churn | 0.001877 | 0.000618 | 0.024730 | 0.262433 | 0.276050 | 0.255251
PROP-IN-5-N | discussion | 0.000000 | 0.000002 | 0.029023 | 0.252568 | 0.269939 | 0.290199
PROP-IN-10-N | actions | 0.005395 | 0.014327 | 0.010235 | 0.245397 | 0.255007 | 0.227875
PROP-IN-20-N | discussion | 0.000000 | 0.000002 | 0.031972 | 0.214027 | 0.254779 | 0.213701
WMC | discussion | 0.011136 | 0.000860 | 0.037767 | 0.194296 | 0.234121 | 0.198693
PROP-IN-10-N | discussion | 0.000000 | 0.000017 | 0.033132 | 0.210000 | 0.249413 | 0.258509
PROP-IN-3-N | discussion | 0.000000 | 0.000014 | 0.033328 | 0.204334 | 0.245909 | 0.469582
PROP-IN-20-N | churn | 0.060578 | 0.000866 | 0.002220 | 0.181921 | 0.231393 | 0.201866
PROP-IN-3-N | actions | 0.050249 | 0.158010 | 0.001243 | 0.095129 | 0.138786 | 0.201210
CBO | discussion | 0.000569 | 0.001236 | 0.068078 | 0.169772 | 0.214526 | 0.232638
PROP-OUT-5-D | discussion | 0.000032 | 0.000406 | 0.088805 | 0.169210 | 0.210509 | 0.435038
PROP-OUT-5-N | discussion | 0.000008 | 0.000000 | 0.104063 | 0.167687 | 0.199103 | 0.179817
PROP-IN-10-N | churn | 0.104461 | 0.005751 | 0.004181 | 0.167291 | 0.191257 | 0.174478
PROP-IN-3-D | discussion | 0.000135 | 0.000513 | 0.107429 | 0.167505 | 0.196652 | 0.175641
PROP-IN-20-D | churn | 0.085308 | 0.007814 | 0.034112 | 0.142537 | 0.181523 | 0.164104
PROP-IN-10-D | churn | 0.093091 | 0.009535 | 0.039627 | 0.129498 | 0.177632 | 0.158916
PROP-OUT-20-D | discussion | 0.000030 | 0.000230 | 0.140024 | 0.137781 | 0.210541 | 0.162847
PROP-OUT-3-D | discussion | 0.000038 | 0.001662 | 0.115767 | 0.155631 | 0.186864 | 0.202353
PROP-OUT-3-N | discussion | 0.000000 | 0.000003 | 0.125026 | 0.148419 | 0.182640 | 0.167002
PROP-IN-20-N | actions | 0.005473 | 0.240186 | 0.026023 | 0.162186 | 0.125535 | 0.162042
PROP-OUT-10-D | discussion | 0.000040 | 0.000275 | 0.130689 | 0.137781 | 0.177770 | 0.163026
PROP-IN-5-N | actions | 0.025416 | 0.137811 | 0.004198 | 0.118281 | 0.188294 | 0.154387
PROP-IN-5-D | discussion | 0.000002 | 0.000554 | 0.166887 | 0.118084 | 0.182912 | 0.153885
PROP-OUT-5-N | churn | 0.514329 | 0.271746 | 0.007842 | 0.050278 | 0.085223 | 0.148704
PROP-OUT-10-N | actions | 0.775158 | 0.004388 | 0.183897 | 0.047211 | 0.144762 | 0.117764
PROP-OUT-20-N | churn | 0.522952 | 0.199760 | 0.000499 | 0.045013 | 0.095871 | 0.123543
PROP-OUT-10-N | faults | 0.596500 | 0.249479 | 0.006697 | 0.041914 | 0.071588 | 0.120719
PROP-OUT-10-N | churn | 0.506907 | 0.276148 | 0.000585 | 0.048636 | 0.091226 | 0.120624
PROP-OUT-10-N | discussion | 0.000274 | 0.000000 | 0.219109 | 0.082389 | 0.157583 | 0.139906
PROP-IN-10-D | discussion | 0.000000 | 0.000096 | 0.224926 | 0.083375 | 0.130134 | 0.167204
PROP-OUT-1-D | discussion | 0.000737 | 0.020970 | 0.166423 | 0.104263 | 0.147759 | 0.150384
PROP-OUT-20-N | discussion | 0.000039 | 0.000000 | 0.240412 | 0.074278 | 0.126909 | 0.139861
PROP-IN-3-N | churn | 0.001035 | 0.043624 | 0.111714 | 0.098234 | 0.198409 | 0.159015
PROP-IN-20-D | discussion | 0.000000 | 0.000070 | 0.244104 | 0.068123 | 0.125535 | 0.135649
PROP-OUT-5-N | actions | 0.572362 | 0.001008 | 0.075781 | 0.114235 | 0.089123 | 0.130535
PROP-OUT-1-N | discussion | 0.000737 | 0.020970 | 0.166423 | 0.088898 | 0.142376 | 0.150384
PROP-OUT-20-N | actions | 0.703452 | 0.001969 | 0.161903 | 0.067607 | 0.072738 | 0.141149
PROP-OUT-3-N | actions | 0.633997 | 0.003344 | 0.229549 | 0.036049 | 0.068901 | 0.143187
LOC | discussion | 0.003094 | 0.073916 | 0.805239 | 0.027155 | 0.052623 | 0.107775
8 Discussion
In this section, I discuss the results, threats to validity, and possible future work.
8.1 Research questions
Based on the analysis results presented in the previous sections, I can now answer the research questions proposed in Section 1:
1. What metrics should be considered when predicting maintenance costs?
Previous work shows that the 7 metrics from the C&K metric suite are good predictors of maintenance effort. Propagation cost metrics are also shown to be promising predictors.
2. Which measures can effectively approximate maintenance effort?
Discussions, actions, and churn data all appear to be valid proxy measures for effort, in the sense that they are correlated with complexity measures that have been shown to be strongly correlated with effort in other research.
3. Which complexity metrics are significantly correlated to which maintenance effort type?
Propagation cost, RFC, CBO, WMC, and LOC show significant correlations to maintenance effort in the form of discussions. Various versions of the propagation cost metric also show correlations to actions and churn. One version of the propagation cost metric shows a correlation to faults.
8.2 Comparison to previous work
Despite the different approach to correlating metrics with effort data, and the different representation of maintenance effort, my results are in many ways comparable to results in previous work. Many of the C&K metrics score significantly on maintenance effort, and I have also found significant results for many variations of the propagation cost metric. Coupling and complexity based metrics score the highest correlations, both in my research and in previous research.
In previous work, there is a great variation in whether or not inheritance and cohesion based metrics are correlated to maintenance effort. In my research, no metrics that are based on either of these principles score significantly.
An interesting result is that there is only 1 significant correlation between source metrics and fault rates. This strongly contradicts previous research, which consistently reported correlations between metrics and fault rates. This could be caused by the fact that I correlate the change in source metrics to the change in fault rates. It is possible that developers introduce bugs while restructuring code, which cancel out the positive effects of the refactoring. I give some suggestions for future work in Section 8.4.
8.3 Threats to validity
Here I discuss the threats to the conclusions drawn in this research. I have split this section into three parts: construct validity, internal validity, and external validity.
Construct validity Because developers in open source projects do not log their effort, I had to employ a set of measures that I believe together approximate true maintenance effort. This hypothesis is impossible to test within my existing research framework. This research would therefore be more accurate, and my hypothesized maintenance cost measures could be validated, if future research were conducted using this framework on projects that log actual effort hours.
Internal validity One assumption I made is that all bugs and change requests that were resolved between one release and the next are attributable to the latest release. However, it is possible that a few of the bugs or change requests actually applied to an older release that was still being maintained. These bugs or change requests are described as backports by Bachmann et al. [3].
Another threat to validity is that, because I start from the bug or change request and from there look at the effort, my analysis misses commits that are not linked to issues in the bug tracker.
To make it possible to use class metrics in my file based approach, I also ignored files that had more than one class defined in them. However, I showed in Section 4.2.1 that only 7% of the files are excluded for this reason.
External validity A threat to the external validity is that I only researched open source projects. While I expect that similar results will be obtained in industry projects, I cannot guarantee it.
Furthermore, I have only investigated 3 projects, and I only investigated a subset of the possible metrics that I could have considered. For example,
I could have considered propagation cost metrics with different path lengths and different decay factors. A larger study employing more projects and more metric types would improve the validity of my conclusions.
8.4 Future work
In future work, this research framework could be expanded with data from more projects. This would give a more solid basis for the data analysis. Another possibility is focusing on collecting data from non open source projects, specifically projects with logged hours for maintenance work. These logged hours would also make it possible to verify my current assumptions about the maintenance measures used. It would also be interesting to measure more maintenance effort proxies in real industrial settings where effort in terms of logged hours is not available. It is possible that maintenance effort measures behave differently in industrial settings.
As mentioned above, there are many extraneous project factors, such as the inherent difficulty of bugs and issues and the inherent skill of the developers, that add noise to the data. Further study that controls for some of these noise factors is thus called for. One solution would be to categorize change requests into groups of small, medium, and large requests. This way, it would be possible to see what the influence of source code metrics is on the different sizes of changes, and to eliminate the noise that this variation in change size adds to the data.
Another point for further research is to investigate why fault rates as a maintenance measure yield fewer significant results in my research. It would be interesting to look at the long-term reduction (over multiple releases) in fault rates after refactoring, and at the possible introduction of faults during the refactoring effort (in the release after the refactoring), to see whether that negatively influenced my results.
Finally, if a balanced data set becomes available that contains more maintenance cost measures and both open source and industrial projects, the next step would be to investigate polynomial models that can predict maintenance cost based on predicted values of source code metrics.
9 Conclusion
In this thesis, I introduced a new way of examining the relationship between source code metrics and maintenance effort (and hence cost) that is focused on supporting modularity decisions. I do this by investigating whether and how variations in source code metrics correlate with variations in maintenance effort. I also introduced a new perspective on maintenance effort that considers non-code-based measurements, including the amount of discussion and the number of actions needed to resolve issues. Using this new approach, I conducted a study of the correlation between source code metrics and maintenance effort in 3 open source projects, in which I identified a set of source code metrics that have the most impact on maintenance effort.
The framework proposed in this thesis has the potential to help predict future maintenance cost based on changes in source code metrics. The maintenance effort measures proposed here go a step further towards linking with actual costs in terms of time and money, and fit into a long-term vision of explicitly estimating the value of costly maintenance activities, such as refactoring, that are currently hard to justify.
A paper has been extracted from this work; at the moment of thesis completion it is under submission to the 28th International Conference on Software Maintenance.
Acknowledgment
I would like to thank my external advisor, Rick Kazman, for all the constructive feedback on my research and writing.
I would also like to thank the research group: Yuanfang Cai, Peppo Valetto, Phil Maconi, and Carlos Andrade for their support and suggestions, with a special thanks to Kevin Cheung and Lu Xiao for assisting in the data extraction process.
I would also like to thank Fetsje Bijma of the Department of Mathematics at the VU University Amsterdam for her advice on the data analysis in this research.
References
[1] M. Alshayeb and W. Li. An empirical validation of object-oriented metrics in two different iterative software processes. IEEE Transactions on Software Engineering, 29(11):1043–1049, Nov. 2003.

[2] E. Arisholm. Empirical assessment of the impact of structural properties on the changeability of object-oriented software. Information and Software Technology, 48(11):1046–1055, 2006.

[3] A. Bachmann, C. Bird, F. Rahman, P. Devanbu, and A. Bernstein. The missing links: bugs and bug-fix commits. In Proceedings of the Eighteenth ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE '10), pages 97–106, New York, NY, USA, 2010. ACM.

[4] V. Basili, L. Briand, and W. Melo. A validation of object-oriented design metrics as quality indicators. IEEE Transactions on Software Engineering, 22(10):751–761, Oct. 1996.

[5] Y. Benjamini and Y. Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B (Methodological), 57(1):289–300, 1995.

[6] A. B. Binkley and S. R. Schach. Inheritance-based metrics for predicting maintenance effort: An empirical study. Technical Report TR 9705, Computer Science Department, Vanderbilt University, 1997.

[7] B. W. Boehm, Clark, Horowitz, Brown, Reifer, Chulani, R. Madachy, and B. Steece. Software Cost Estimation with Cocomo II with Cdrom. Prentice Hall PTR, Upper Saddle River, NJ, USA, 1st edition, 2000.

[8] L. Briand, P. Devanbu, and W. Melo. An investigation into coupling measures for C++. In Proceedings of the 19th International Conference on Software Engineering (ICSE 1997), pages 412–421, May 1997.

[9] L. Briand, J. Wüst, and H. Lounis. Using coupling measurement for impact analysis in object-oriented systems. In Proceedings of the IEEE International Conference on Software Maintenance (ICSM '99), pages 475–482, 1999.

[10] L. C. Briand and J. Wüst. Empirical studies of quality models in object-oriented systems. In Advances in Computers, volume 56, pages 97–166. Elsevier, 2002.

[11] L. C. Briand, J. Wüst, J. W. Daly, and D. V. Porter. Exploring the relationships between design measures and software quality in object-oriented systems. Journal of Systems and Software, 51(3):245–273, 2000.

[12] F. P. Brooks, Jr. The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, Reading, MA, 1975.

[13] M. Busch. Twitter's new search architecture. http://engineering.twitter.com/2010/10/twitters-new-search-architecture.html, October 2010.

[14] J. Carriere, R. Kazman, and I. Ozkaya. A cost-benefit framework for making architectural decisions in a business context. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering, volume 2, pages 149–157, May 2010.

[15] M. Cartwright and M. Shepperd. An empirical investigation of an object-oriented software system. IEEE Transactions on Software Engineering, 26(8):786–796, Aug. 2000.

[16] S. Chidamber and C. Kemerer. A metrics suite for object oriented design. IEEE Transactions on Software Engineering, 20(6):476–493, June 1994.

[17] S. Demeyer and S. Ducasse. Metrics, do they really help? In Proceedings of Langages et Modeles a Objets (LMO'99), pages 69–82, Paris, France, 1999.

[18] J. Dolado. On the problem of the software cost function. Information and Software Technology, 43(1):61–72, 2001.

[19] K. El-Emam, S. Benlarbi, N. Goel, and S. Ray. A validation of object-oriented metrics. Technical Report NRC/ERB-1063, National Research Council Canada, Oct. 1999.

[20] K. E. Emam, W. Melo, and J. C. Machado. The prediction of faulty classes using object-oriented design metrics. Journal of Systems and Software, 56(1):63–75, 2001.

[21] E. H. Ferneley. Design metrics as an aid to software maintenance: An empirical study. Journal of Software Maintenance: Research and Practice, 11(1):55–72, 1999.

[22] Apache Software Foundation. Derby history. http://db.apache.org/derby/derby_charter.html#Derby+History, February 2006.

[23] M. Fowler. Technical debt. http://www.martinfowler.com/bliki/TechnicalDebt.html, February 2009.

[24] T. Gyimothy, R. Ferenc, and I. Siket. Empirical validation of object-oriented metrics on open source software for fault prediction. IEEE Transactions on Software Engineering, 31(10):897–910, Oct. 2005.

[25] M. H. Halstead. Elements of Software Science (Operating and Programming Systems Series). Elsevier Science Inc., New York, NY, USA, 1977.

[26] R. Harrison, S. J. Counsell, and R. V. Nithi. An investigation into the applicability and validity of object-oriented design metrics. Empirical Software Engineering, 3(3):255–273, Sept. 1998.

[27] F. Heemstra. Software cost estimation. Information and Software Technology, 34(10):627–639, 1992.

[28] M. LaMonica. Apache Derby graduates with Sun onboard. http://news.cnet.com/Apache-Derby-graduates-with-Sun-onboard/2100-7344_3-5818473.html, August 2005.

[29] A. L. Lederer and J. Prasad. Causes of inaccurate software development cost estimates. Journal of Systems and Software, 31(2):125–134, 1995.

[30] R. Leitch and E. Stroulia. Assessing the maintainability benefits of design restructuring using dependency analysis. In Proceedings of the Ninth International Software Metrics Symposium, pages 309–322, Sept. 2003.

[31] W. Li and S. Henry. Object-oriented metrics that predict maintainability. Journal of Systems and Software, 23(2):111–122, 1993.

[32] B. P. Lientz. Issues in software maintenance. ACM Computing Surveys, 15:271–278, September 1983.

[33] A. MacCormack, J. Rusnak, and C. Y. Baldwin. Exploring the structure of complex software designs: An empirical study of open source and proprietary code. Management Science, 52:1015–1030, July 2006.

[34] T. McCabe. A complexity measure. IEEE Transactions on Software Engineering, SE-2(4):308–320, Dec. 1976.

[35] M. Xenos, D. Stavrinoudis, K. Zikouli, and D. Christodoulakis. Object-oriented metrics - a survey. In Proceedings of FESMA 2000, Federation of European Software Measurement Associations, Madrid, Spain, 2000.

[36] S. C. Misra. Modeling design/coding factors that drive maintainability of software systems. Software Quality Control, 13(3):297–320, Sept. 2005.

[37] K. Molokken and M. Jorgensen. A review of surveys on software effort estimation. In Proceedings of the 2003 International Symposium on Empirical Software Engineering (ISESE 2003), pages 223–230, Sept.-Oct. 2003.

[38] H. M. Olague, L. H. Etzkorn, S. L. Messimer, and H. S. Delugach. An empirical validation of object-oriented class complexity metrics and their ability to predict error-prone classes in highly iterative, or agile, software: a case study. Journal of Software Maintenance and Evolution: Research and Practice, 20(3):171–197, 2008.

[39] D. Parnas. Software aging. In Proceedings of the 16th International Conference on Software Engineering (ICSE-16), pages 279–287, May 1994.

[40] E. Raymond. The cathedral and the bazaar. Knowledge, Technology & Policy, 12:23–49, 1999. doi:10.1007/s12130-999-1026-0.

[41] M. Riaz, E. Mendes, and E. Tempero. A systematic review of software maintainability prediction and metrics. In Proceedings of the 3rd International Symposium on Empirical Software Engineering and Measurement (ESEM 2009), pages 367–377, Oct. 2009.

[42] D. Spinellis. ckjm – Chidamber and Kemerer Java metrics. http://www.spinellis.gr/sw/ckjm/, May 2010.

[43] R. Subramanyam and M. Krishnan. Empirical analysis of CK metrics for object-oriented design complexity: implications for software defects. IEEE Transactions on Software Engineering, 29(4):297–310, April 2003.

[44] M.-H. Tang, M.-H. Kao, and M.-H. Chen. An empirical study on object-oriented metrics. In Proceedings of the Sixth International Software Metrics Symposium, pages 242–249, 1999.

[45] A. Trifu and R. Marinescu. Diagnosing design problems in object oriented systems. In Proceedings of the 12th Working Conference on Reverse Engineering, Nov. 2005.

[46] M. P. Ware, F. G. Wilkie, and M. Shapcott. The application of product measures in directing software maintenance activity. Journal of Software Maintenance and Evolution: Research and Practice, 19(2):133–154, Mar. 2007.

[47] K. D. Welker, P. W. Oman, and G. G. Atkinson. Development and application of an automated source code maintainability index. Journal of Software Maintenance: Research and Practice, 9(3):127–159, 1997.

[48] S. Wong, Y. Cai, M. Kim, and M. Dalton. Detecting software modularity violations. In Proceedings of the 33rd International Conference on Software Engineering (ICSE 2011), pages 411–420, May 2011.

[49] H. Yang and E. Tempero. Measuring the strength of indirect coupling. In Proceedings of the 18th Australian Software Engineering Conference (ASWEC 2007), pages 319–328, April 2007.

[50] Y. Zhou and B. Xu. Predicting the maintainability of open source software using design metrics. Wuhan University Journal of Natural Sciences, 13:14–20, 2008.

[51] Y. Zhou, B. Xu, and H. Leung. On the ability of complexity metrics to predict fault-prone classes in object-oriented systems. Journal of Systems and Software, 83(4):660–674, 2010.
A Apache project overview

Table 5: Apache projects

Name | Total issues | Releases | First release | Last release | Technique | Description
Harmony | 6500 | 15 | 2007/12 | 2010/09 | C++/Java | A Java Runtime Environment
Hadoop Common | 6000 | 59 | 2006/04 | 2010/08 | Java | The Hadoop common library
Geronimo | 6000 | 23 | 2006/02 | 2011/11 | Java | J2EE Application server
Derby | 5300 | 19 | 2004/08 | 2011/04 | Java | Relational Database
Axis2 | 5000 | 16 | 2006/05 | 2011/08 | Java | Web Services framework
HBase | 4500 | 27 | 2008/02 | 2011/08 | Java | Big data storage
Camel | 4500 | 12 | 2009/12 | 2011/09 | Java | Message routing framework
OFBiz | 4400 | 7 | 2010/04 | 2011/05 | Java | Open source automation package
Wicket | 4000 | 5 | 2011/08 | 2011/10 | Java | Java Web apps framework
Tuscany | 3900 | 25 | 2006/05 | 2011/08 | Java/C++ | SOA framework
CXF | 3800 | 14 | 2009/03 | 2011/10 | Java | SOA framework
Struts2 | 3600 | 5 | 2009/11 | 2011/09 | Java | Enterprise web apps framework
Lucene | 3500 | 23 | 2006/02 | 2011/09 | Java | Full-text search library
QPID | 3500 | 13 | 2006/12 | 2011/08 | C++ | Message queueing framework
ActiveMQ | 3400 | 18 | 2006/06 | 2011/04 | Java | Message queueing framework
Cassandra | 3300 | 44 | 2009/07 | 2011/10 | Java | Distributed storage
JackRabbit | 3000 | 72 | 2006/02 | 2011/10 | Java | Content repository
MyFaces | 3000 | 39 | 2009/03 | 2011/09 | Java | Web apps user interface platform
Felix | 3000 | 34 | 2007/07 | 2011/09 | Java | Service platform
Struts1 | 2900 | 21 | 2007/07 | 2009/12 | Java | Enterprise web apps framework
SOLR | 2900 | 9 | 2006/12 | 2011/09 | Java | Search server based on Lucene
Axis | 2700 | 5 | 2007/05 | 2011/10 | Java | Web services framework
Tapestry | 2500 | 19 | 2006/06 | 2008/09 | Java | Java web apps framework
XalanJ2 | 2500 | 11 | 2001/02 | 2007/12 | Java | XSLT and XPath implementation
HIVE | 2500 | 7 | 2009/04 | 2011/07 | Java | Data warehouse built on Hadoop
Sling | 2200 | 13 | 2008/06 | 2011/08 | Java | Web applications framework
PIG | 2200 | 11 | 2007/10 | 2011/07 | Java | Platform for analyzing large data sets
UIMA | 2200 | 5 | 2008/08 | 2011/08 | Java/C++ | Analysis tool for unstructured content
Cocoon | 2200 | 4 | 2006/04 | 2008/04 | Java | Web development framework
MyFaces Trinidad | 2100 | 12 | 2007/02 | 2011/04 | Java | Extension on the MyFaces framework
Hadoop Map/Reduce | 2100 | 6 | 2009/11 | 2010/08 | Java | Distributed pipeline framework
Roller | 2000 | 40 | 2005/10 | 2011/05 | Java | Blog server
Xerces-C++ | 2000 | 24 | 2008/06 | 2011/06 | C++ | XML parser
OpenJPA | 2000 | 19 | 2007/08 | 2011/07 | Java | Java persistence implementation
Tapestry5 | 1700 | 19 | 2008/09 | 2011/03 | Java | Java web apps framework
ServiceMix | 1650 | 29 | 2005/11 | 2011/05 | Java | Enterprise service broker
Directory ApacheDS | 1600 | 18 | 2006/09 | 2011/08 | Java | LDAP server
OpenEJB | 1600 | 10 | 2007/09 | 2011/10 | Java | EJB implementation
Cayenne | 1600 | 6 | 2010/05 | 2011/09 | Java | Object/Relational Mapping framework
Xerces-J | 1500 | 25 | 2000/12 | 2010/11 | Java | XML parser
Axis2 C | 1500 | 17 | 2006/02 | 2009/04 | C | Web services framework
MyFaces Tomahawk | 1500 | 10 | 2005/10 | 2011/07 | Java | Extension of the MyFaces framework
Shindig | 1500 | 7 | 2009/07 | 2010/12 | Java/PHP | OpenSocial container
Thrift | 1400 | 8 | 2010/04 | 2011/08 | Multiple | Framework for services development
James Server | 1300 | 7 | 2004/06 | 2011/08 | Java | Java enterprise mail server
Hadoop HDFS | 1300 | 6 | 2009/11 | 2010/08 | Java | Distributed file system
JetSpeed2 | 1250 | 11 | 2004/10 | 2011/10 | Java | Enterprise portal platform
CouchDB | 1200 | 13 | 2009/07 | 2011/07 | Erlang | Document database
Ivy | 1200 | 12 | 2005/04 | 2010/10 | Java | Dependency manager
ZooKeeper | 1200 | 12 | 2008/10 | 2011/02 | Java | Server for distributed coordination
Forrest | 1200 | 10 | 2004/10 | 2011/02 | Java | Publishing framework
BeeHive | 1150 | 6 | 2004/11 | 2006/12 | Java | Object model on J2EE and Struts
PDFBox | 1100 | 8 | 2010/02 | 2011/07 | Java | PDF tool
Axis-C++ | 1100 | 7 | 2005/06 | 2007/01 | C++ | Web Services framework
HttpComponents HttpClient | 1100 | 3 | 2007/08 | 2011/08 | Java | Client-side HTTP implementation
MyFaces Tobago | 1000 | 18 | 2010/01 | 2011/10 | Java | Extension of the MyFaces framework
ODE | 1000 | 10 | 2007/05 | 2011/02 | Java | Business process execution engine
C++ Standard Library | 1000 | 8 | 2005/07 | 2009/05 | C++ | ISO/IEC 14882 C++ standard implementation
ServiceMix Components | 1000 | 6 | 2007/09 | 2011/02 | Java | Components for the ServiceMix service bus
Traffic server | 950 | 15 | 2010/05 | 2011/08 | C++ | Proxy server software
Karaf | 940 | 13 | 2010/07 | 2011/10 | Java | Service platform
ServiceMix4 | 940 | 11 | 2008/04 | 2011/05 | Java | Enterprise service bus
AVRO | 900 | 14 | 2009/07 | 2011/09 | Multiple | Data serialization system
Mahout | 850 | 6 | 2009/04 | 2011/05 | Java | Machine learning implementations
Geronimo devtools | 800 | 18 | 2006/02 | 2011/11 | Java | Set of development tools for Geronimo
Pivot | 800 | 8 | 2009/04 | 2011/01 | Java | Web deployment platform
Flume | 800 | 6 | 2010/06 | 2011/08 | Java | Log data analysis platform
iBatis for Java | 750 | 19 | 2004/11 | 2010/04 | Java | Data mapper framework
Commons lang | 750 | 11 | 2002/10 | 2009/08 | Java | Helper utilities for the java.lang API
Directory studio | 720 | 7 | 2007/02 | 2010/04 | Java | LDAP tooling platform
JspWiki | 700 | 10 | 2007/12 | 2010/08 | Java | Wiki Wiki clone
JDO | 700 | 7 | 2006/01 | 2010/07 | Java | Java persistence implementation
Synapse | 700 | 5 | 2007/11 | 2011/09 | Java | Enterprise service bus
XalanC | 700 | 3 | 2005/08 | 2009/06 | C++ | XSLT and XPath implementation
Commons math | 690 | 6 | 2004/12 | 2011/03 | Java | Mathematics and statistics components
Tika | 620 | 10 | 2007/12 | 2011/09 | Java | Content analysis toolkit
Pluto | 600 | 15 | 2005/12 | 2011/09 | Java | Java Portlet implementation
Buildr | 600 | 14 | 2007/11 | 2011/06 | Ruby | Java applications build system
OpenWebBeans | 600 | 11 | 2008/12 | 2011/08 | Java | Context and dependency injection framework
Chukwa | 600 | 4 | 2009/04 | 2010/04 | Java | Distributed systems monitoring system
JetSpeed | 600 | 4 | 2004/07 | 2005/05 | Java | User customizable portal system
NPanDay | 570 | 9 | 2008/12 | 2011/05 | .NET/Java | Integration of Apache Maven for .NET
VCL | 550 | 3 | 2009/12 | 2011/04 | Multiple | Virtual machines management framework
Juddi | 530 | 20 | 2004/06 | 2011/10 | Java | UDDI v3 implementation
Tiles | 520 | 15 | 2007/02 | 2011/02 | Java | Web interface template framework
Shale | 500 | 5 | 2006/10 | 2008/05 | Java | Web application framework
XmlBeans | 470 | 10 | 2004/06 | 2009/12 | Java | XML to Java binding
Chemistry | 450 | 7 | 2011/03 | 2011/09 | Java | Content management interoperability service
Commons net | 425 | 12 | 2003/02 | 2011/06 | Java | Client side protocol implementations
Commons Beanutils | 400 | 12 | 2001/06 | 2010/03 | Java | Object accessibility wrapper
Axiom | 400 | 12 | 2006/11 | 2011/07 | Java | XML object model
Whirr | 400 | 6 | 2010/09 | 2011/08 | Java | Libraries for running cloud services
ActiveMQ C++ client | 390 | 28 | 2006/11 | 2011/04 | C++ | ActiveMQ C++ client
Xwork | 380 | 7 | 2004/11 | 2011/04 | Java | Command/interceptor framework
Commons Configuration | 370 | 8 | 2004/10 | 2011/09 | Java | Generic configuration interface
Commons Collections | 370 | 7 | 2001/07 | 2006/05 | Java | Data structures library
ESME | 370 | 4 | 2010/03 | 2011/08 | Scala | Enterprise Messaging Environment
Commons dbcp | 365 | 7 | 2002/08 | 2010/02 | Java | Database connection pools library
ActiveMQ.Net | 340 | 7 | 2009/07 | 2011/06 | .NET | ActiveMQ .NET client
WSS4J | 320 | 13 | 2008/06 | 2010/10 | Java | Web services security
James IMAP | 315 | 4 | 2010/09 | 2011/06 | Java | Imap library
Wookie | 250 | 4 | 2009/06 | 2011/10 | Java | Widgets server