
Software Quality Observatory for Open Source Software

Project Number: IST-2005-33331

D2 - Overview of the state of the art

Deliverable Report

Work Package Number: 1
Work Package Title: Requirements Definition and Analysis
Deliverable Number: 2
Coordinator: AUTH
Contributors: AUTH, AUEB, KDE
Due Date: 22nd January 2007
Delivery Date: 22nd January 2007
Availability: Restricted
Document Id: SQO-OSS_D_2


Executive Summary

Chapter one presents the most important and widely used metrics in software engineering for quality evaluation. The area of software engineering metrics is always under study; researchers continue to validate the metrics. The metrics presented were selected after studying the software engineering literature, yielding only those metrics that are widely accepted. We must stress that we have not presented any models for evaluating quality, only metrics that can be used for quality evaluation. Quality evaluation models will be presented in the appropriate deliverable.

The metrics presented are categorised, according to a taxonomy accepted among researchers, into three sections: process metrics, product metrics and resources metrics. We have also included a section for metrics specific to Open Source software development. The presentation of the metrics is brief, allowing for straightforward application and tool development. We have included both metrics that are considered classic (e.g. program length and McCabe’s cyclomatic complexity) and modern metrics (e.g. the Chidamber and Kemerer metrics suite and object oriented design heuristics). While we present some metrics for Open Source software development, this topic will be presented at length elsewhere.

Chapter two presents tools for acquiring the metrics presented in chapter one. The tools presented are both Open Source and proprietary. There are a lot of metrics tools available, and we have tried to present a representative sample of them. Specifically, we present those tools that are going to be useful for our own system and that could potentially be included in it (especially the Open Source ones). We tried to install and test each tool ourselves. For each tool we present its functionality and also include some screenshots of it. Although we tried to include all possible tools that might be helpful to our project, future work will accommodate such tools as they become available.

Chapter three introduces empirical Open Source Software studies from many viewpoints. The first part details historical perspectives on the evolution of five popular Open Source Software systems (Linux, Apache, Mozilla, GNOME, and FreeBSD). This is followed by horizontal studies, in which researchers examine several projects collectively. A model for the simulation of the evolution of Open Source Software projects and results from early studies are also presented. The evolution of Open Source Software projects is directly linked with the evolution of the code and the communities around the project. Thus, the fourth viewpoint in this chapter considers code quality studies of Open Source Software, applying evolution laws of Open Source software development to study how code evolves and how this evolution affects the quality of the software. The chapter concludes with community studies in mailing lists, in which a research methodology for the extraction and analysis of community activities in mailing lists is proposed.

Chapter four introduces the concept of data mining and its significance in the context of software engineering. A large amount of data is produced in software development, which software organizations collect in the hope of better understanding their processes and products. Specifically, the data in software development can refer to versions of programs, execution traces, error or bug reports and Open Source packages. In addition, mailing lists, discussion forums and newsletters can provide useful information about software. This data is widely believed to hide significant knowledge about software projects’ performance and quality. Data mining provides the techniques (clustering, classification and association rules) to analyze and extract novel, interesting patterns from software engineering databases. In this chapter we review the data mining approaches that have been proposed so far, aiming to assist with some of the main software engineering tasks.

Since software engineering repositories consist of text documents (e.g. mailing lists, bug reports, execution logs), the mining of textual artifacts is requisite for many important activities in software engineering: tracing of requirements, retrieval of components from a repository, identification and prediction of software failures, etc. We present the state of the art of the text mining techniques applied in software engineering, also providing a comparative study of them. We conclude by briefly discussing further work directions of Data/Text Mining in software engineering.


Document Information

Deliverable Number: 2
Due Date: 22nd January 2007
Deliverable Date: 22nd January 2007

Approvals

Role                    Name                 Organisation   Date
Coordinator             Georgios Gousios     AUEB/SENSE     10/09/2006
Technical Coordinator   Ioannis Samoladas    AUTH/PLaSE
WP leader               Ioannis Antoniades   AUTH/PLaSE
Quality Reviewer 1
Quality Reviewer 2
Quality Reviewer 3

Revisions

Revision Date Modification Authors

0.1 05/10/2006 Initial version AUTH


Contents

1 Software Metrics and Measurement
  1.1 Software Metrics Taxonomy
  1.2 Process Metrics
      1.2.1 Structure Metrics
      1.2.2 Design Metrics
      1.2.3 Product Quality Metrics
  1.3 Productivity Metrics
  1.4 Open Source Development Metrics
  1.5 Software Metrics Validation
      1.5.1 Validation of prediction measurement
      1.5.2 Validation of measures

2 Tools
  2.1 Process Analysis Tools
      2.1.1 CVSAnalY
      2.1.2 GlueTheos
      2.1.3 MailingListStats
  2.2 Metrics Collection Tools
      2.2.1 ckjm
      2.2.2 The Byte Code Metric Library
      2.2.3 C and C++ Code Counter
      2.2.4 Software Metrics Plug-In for the Eclipse IDE
  2.3 Static Analysis Tools
      2.3.1 FindBugs
      2.3.2 PMD
      2.3.3 QJ-Pro
      2.3.4 Bugle
  2.4 Hybrid Tools
      2.4.1 The Empirical Project Monitor
      2.4.2 HackyStat
      2.4.3 QSOS
  2.5 Commercial Metrics Tools
  2.6 Process metrics tools
      2.6.1 MetriFlame
      2.6.2 Estimate Professional
      2.6.3 CostXpert
      2.6.4 ProjectConsole
      2.6.5 CA-Estimacs
      2.6.6 Discussion
  2.7 Product metrics tools
      2.7.1 CT C++ - CMT++ - CTB
      2.7.2 Cantata++
      2.7.3 TAU/Logiscope
      2.7.4 McCabe IQ
      2.7.5 Rational Functional Tester (RFT)
      2.7.6 Safire
      2.7.7 Metrics 4C
      2.7.8 Resource Standard Metrics
      2.7.9 Discussion

3 Empirical OSS Studies
  3.1 Evolutionary Studies
      3.1.1 Historical Perspectives
      3.1.2 Linux
      3.1.3 Apache
      3.1.4 Mozilla
      3.1.5 GNOME
      3.1.6 FreeBSD
      3.1.7 Other Studies
      3.1.8 Simulation of the temporal evolution of OSS projects
  3.2 Code Quality Studies
  3.3 F/OSS Community Studies in Mailing Lists
      3.3.1 Introduction
      3.3.2 Mailing Lists
      3.3.3 Studying Community Participation in Mailing Lists: Research methodology

4 Data Mining in Software Engineering
  4.1 Introduction to Data Mining and Knowledge Discovery
      4.1.1 Data Mining Process
  4.2 Data mining application in software engineering: Overview
      4.2.1 Using Data mining in software maintenance
      4.2.2 A Data Mining approach to automated software testing
  4.3 Text Mining and Software Engineering
      4.3.1 Text Mining - The State of the Art
      4.3.2 Text Mining Approaches in Software Engineering
  4.4 Future Directions of Data/Text Mining Applications in Software Engineering

5 Related IST Projects
  5.1 CALIBRE
  5.2 EDOS
  5.3 FLOSSMETRICS
  5.4 FLOSSWORLD
  5.5 PYPY
  5.6 QUALIPSO
  5.7 QUALOSS
  5.8 SELF
  5.9 TOSSAD

1 Software Metrics and Measurement

As stated in the Description of Work, SQO-OSS aims to provide a holistic approach to software assessment, initially targeted at open source software development. Specifically, the main goals of the project are:

1. Evaluate the quality of Open Source software.

2. Evaluate the health of an Open Source software project.

These two main goals will be delivered through a plug-in based quality assessment platform. In order to achieve these goals, the project’s consortium has to answer specific questions derived from those goals. Thus, for the goals presented, the following have to be answered:

1. How can the quality of Open Source software be evaluated and improved?

• How is quality evaluated?

2. How can the health of an Open Source software project be evaluated?

• How is the health of a project evaluated?

These questions can be answered if we examine and measure both the process of creating Open Source software and the product itself, i.e. the code. Both entities can be measured with the help of software metrics. This section presents software metrics and an overview of how useful the metrics are for software evaluation.

1.1 Software Metrics Taxonomy

In this section we describe the various software metrics that exist in the area of software engineering and are going to be useful for our research. Furthermore, we refer to metrics specific to open source software development. These metrics are divided into categories. The chosen classification is widely used in the software metrics literature [FP97].

• Process metrics are metrics that refer to the software development activities and processes. Measuring defects per testing hour, time, number of people, etc. falls under this category.

• Product metrics are metrics that refer to the products of the software development process (e.g. code but also documents etc.).

• Resources metrics are metrics that refer to any input to the development process (e.g. people and methods).


Each one of these categories contains metrics that are further distinguished as either internal or external metrics.

• Internal metrics of a product, process or resource are those that can be measured purely by examining the product, process or resource on its own.

• External metrics of a product, process or resource are those that can be measured only with respect to how the product, process or resource relates to its environment (i.e. its behaviour).

Apart from the formal categories presented, we shall also include some metrics derived directly from the Open Source development process.

In the following sections the most important (from our own perspective) metrics shall be presented in each of the categories above. Note that the metrics presented have been studied and used extensively in traditional closed source software development. At the end we present metrics for Open Source software that have appeared in recent years, when researchers started studying Open Source software. Although these metrics can be classified according to the above taxonomy, we prefer to present them separately.

1.2 Process Metrics

Defect Density: One of the most widely accepted metrics for software quality is Defect Density. This metric is expressed as the number of defects found per certain amount of software. This amount is usually counted as the number of lines of code of the delivered product (specific metrics regarding size are presented in the following sections). Defect Density can simply be expressed thus:

Defect Density = Number Of Known Defects / LOC

Many researchers split the kinds of defects into two categories: known defects, which are the defects that have been discovered during testing (before the release of the product), and latent defects, which are the defects discovered after the release of the product [FP97]. For each one of these two categories, there is a separate defect density metric.

Defect density is considered to be a product metric and thus should have been presented in the next section. However, it is directly derived from the development process [FP97] (defect discovery through testing), so it is presented in this section. In addition, it is also a product metric, as it reflects the quality of the product, particularly through the defects found after product release.
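
As an illustration, the sketch below computes the two defect density variants from a SLOC total; the function name and the figures are our own, not taken from any study.

    # Minimal sketch (our own illustration): defect density per KLOC,
    # split into known (pre-release) and latent (post-release) defects.
    def defect_density(defects: int, sloc: int) -> float:
        """Defects per thousand lines of code (KLOC)."""
        return defects / (sloc / 1000.0)

    known_defects = 42     # found during testing (hypothetical)
    latent_defects = 7     # reported after release (hypothetical)
    sloc = 12_500

    print(defect_density(known_defects, sloc))   # 3.36 defects/KLOC
    print(defect_density(latent_defects, sloc))  # 0.56 defects/KLOC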


Defect Removal Effectiveness: Defect Removal Effectiveness is a process metric and reflects the ability of the development team to remove defects [Kan03]. The metric is defined as:

Defect Removal Effectiveness = (Defects Removed in Development / (Defects Removed + Defects Found Later)) ∗ 100%

This is a very useful metric and can be applied at any phase of the software development process.

One other metric which can be derived from defect density is system spoilage [FP97], a metric rather useful for assessing the effectiveness of the development team. This metric is defined as

System Spoilage = Time To Fix Post-Release Defects / Total System Development Time

As mentioned, this metric reflects the ability of the development team to respond to defects found.
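
A small sketch of both ratios with invented counts; note that defect removal effectiveness works on defect counts while system spoilage works on time.

    # Sketch (our illustration): defect removal effectiveness and system spoilage.
    def defect_removal_effectiveness(removed_in_dev: int, found_later: int) -> float:
        """Percentage of all defects that were caught before release."""
        return removed_in_dev / (removed_in_dev + found_later) * 100.0

    def system_spoilage(fix_time_post_release: float, total_dev_time: float) -> float:
        """Share of total development time spent fixing post-release defects."""
        return fix_time_post_release / total_dev_time

    print(defect_removal_effectiveness(90, 10))  # 90.0 (%)
    print(system_spoilage(120.0, 2000.0))        # 0.06 (ratio of person-hours)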

LOC: Code can be measured in several ways. The first and most common metric in the area of software engineering is the number of lines of code (LOC). Although it may seem easy to measure the lines of code of a computer program, there is controversy about what we mean by LOC. Most researchers refer to LOC as Source Lines Of Code (SLOC), which can either be physical SLOC or logical SLOC. Specific definitions of these two measures vary in the sense that what is actually measured is not made explicit. One needs to consider whether what is measured refers to any one of the following:

• Blank lines.

• Comment lines.

• Data declarations.

• Lines that contain several separate declarations.

Logical SLOC measures attempt to measure the number of "statements". The definition varies depending on the programming language: since programming languages have language-specific syntax, the logical SLOC definition for each language will be different. One simple logical SLOC measure for C-like languages is the number of statement-terminating semicolons. It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less sensitive to them. Unfortunately, SLOC measures are often stated without providing their definition, and logical SLOC can often be significantly different from physical SLOC. For the purpose of our research, a physical source line of code (SLOC) will be defined as:

... a line ending in a newline or end-of-file marker, and which contains at least one non-whitespace non-comment character. Comment delimiters (characters other than newlines starting and ending a comment) are considered comment characters. Data lines only including whitespace (e.g., lines with only tabs and spaces in multiline strings) are not included.

Using the definition above, we have to stress that this size metric does not represent the actual size of the source code of the program, since it excludes the comment lines. Thus the total length of the program is represented as

Total length (LOC) = SLOC + Number of comment lines.

The number of comment lines is also a useful metric when we refer to other aspects of software, e.g. documentation.
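
To make the SLOC definition above concrete, here is a rough physical SLOC counter for C-like sources. It is a sketch of the quoted definition, not a replacement for the tools surveyed in chapter 2; the comment handling (only // and /* */ delimiters, no awareness of string literals) is a simplifying assumption of ours.

    # Rough physical SLOC counter for C-like code (our sketch of the
    # definition above): counts lines with at least one character that is
    # neither whitespace nor part of a comment.
    def physical_sloc(source: str) -> int:
        sloc = 0
        in_block_comment = False
        for line in source.splitlines():
            code = []
            i = 0
            while i < len(line):
                if in_block_comment:
                    end = line.find("*/", i)
                    if end == -1:
                        break                  # comment spans rest of line
                    in_block_comment = False
                    i = end + 2
                elif line.startswith("//", i):
                    break                      # line comment: ignore the rest
                elif line.startswith("/*", i):
                    in_block_comment = True
                    i += 2
                else:
                    code.append(line[i])
                    i += 1
            if any(not ch.isspace() for ch in code):
                sloc += 1
        return sloc

    example = "int main(void) {\n  /* greet */\n  return 0; // done\n}\n"
    print(physical_sloc(example))  # 3: the comment-only line is not counted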

Halstead Software Science: Apart from counting lines of code, there are also other kinds of metrics that try to measure the length of a computer program. One of the earliest of these was introduced by Halstead [Hal77] in the late ’70s. Halstead’s measures are based on four measures that are directly derived from the source code:

• µ1 the number of distinct operators,

• µ2 the number of distinct operands,

• N1 the total number of operators,

• N2 the total number of operands.

Halstead further introduced some metrics based upon the previous measures. These metrics are:

• The length N of a program N = N1 + N2,

• The vocabulary µ of a program µ = µ1 + µ2,

• The volume V of a program V = N ∗ log2(µ),

• The difficulty D of a program D = (µ1/2) ∗ (N2/µ2).


In order for these metrics to be measured, one has to decide how to identify the operators and operands. Halstead also used his metrics to estimate the length and the effort for a given program. For more on Halstead’s estimations see [Hal77].

Halstead Software Science metrics have been criticised a lot over the years and there are controversial opinions regarding them, especially regarding the volume, the difficulty and the rest of the estimation metrics. These opinions range from “no corresponding consensus” [FP97] to “strongest measures of maintainability” [OH94]. However, the value of N as a program length, as well as the volume of a program, as proposed by Halstead, does not contradict any relations we have between a program and its length. Thus, we choose to include Halstead metrics in our research [FP97].
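
The sketch below computes the derived Halstead numbers from the four base counts. As noted above, identifying operators and operands is language specific; here we simply take the counts as given, since the token-counting step is the hard part in practice and is left out.

    import math

    # Sketch (our illustration): Halstead's derived metrics from the four
    # base counts mu1, mu2 (distinct operators/operands) and N1, N2 (totals).
    def halstead(mu1: int, mu2: int, n1: int, n2: int) -> dict:
        length = n1 + n2                           # N = N1 + N2
        vocabulary = mu1 + mu2                     # mu = mu1 + mu2
        volume = length * math.log2(vocabulary)    # V = N * log2(mu)
        difficulty = (mu1 / 2) * (n2 / mu2)        # D = (mu1/2) * (N2/mu2)
        return {"N": length, "mu": vocabulary, "V": volume, "D": difficulty}

    # Hypothetical counts for a small function:
    print(halstead(mu1=10, mu2=8, n1=40, n2=30))
    # {'N': 70, 'mu': 18, 'V': ~291.9, 'D': 18.75}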

Function Points: The previous size measures count physical size: lines, operators and operands. Many researchers argue that this kind of measurement can be misleading, since it does not capture the notion of functionality, i.e. the amount of function inside the source code of a given program. Thus, they propose the use of functionality metrics.

One of the first such metrics was proposed by Albrecht in 1977 and was called Function Point Analysis (FPA) [Alb79]; it was introduced as a means of measuring size and productivity (and later on also complexity). It uses functional, logical entities such as inputs, outputs, and inquiries that tend to relate more closely to the functions performed by the software than other measures, such as lines of code, do. Function point definition and measurement have evolved substantially; the International Function Point User Group (IFPUG, http://www.ifpug.com/), formed in 1986, actively exchanges information on function point analysis (FPA).

In order to compute Function Points (FP), one first needs to compute the Unadjusted Function Point Count (UFC). To calculate this, one further needs to count the following:

• External inputs: Every input provided by the user (data and UI interactions) but not inquiries.

• External outputs: Every output to the user (i.e. reports and messages).

• External inquiries: Interactive inputs requiring a response.

• External files: Interfaces to other systems.

• Internal files: Files that the system uses for its purposes.

Next, each item is assigned a subjective “complexity” rating on a 3-point ordinal scale:

• Simple.



• Average.

• Complex.

Then a weight is assigned to each item according to standard tables (e.g. for a simple external input this is 3 and for a complex external inquiry this is 6; with five item types and three complexity ratings, the total number of weights equals 15). So, the UFC is calculated as

UFC = ∑_{i=1}^{15} (Number Of Items Of Variety i) ∗ weight_i

Then we compute a technical complexity factor (TCF). In order to do this we rate 14 factors F_i, such as reusability and performance, from 0 to 5 (0 means the factor is irrelevant, 3 average and 5 essential to the system built) and then combine them all in the following formula:

TCF = 0.65 + 0.01 ∑_{i=1}^{14} F_i

The final calculation of the total FP of the system is

FP = UFC ∗ TCF

There is a very large user community for function points; IFPUG has more than 1200 member companies, and they offer assistance in establishing an FPA program. The standard practices for counting and using function points can be found in the IFPUG Counting Practices Manual. Without some standardisation of how the function points are enumerated and interpreted, consistent results can be difficult to obtain. Successful application seems to depend on establishing a consistent method of counting function points and keeping records to establish baseline productivity figures for specific systems. Function measures tend to be independent of language, coding style, and software architecture, but environmental factors such as the ratio of function points to source lines of code will vary, although there have been some attempts to map LOCs to FPs [Jon95]. Limitations of function points include the subjectivity of the TCF and of the other subjective measures used, such as the weights. Also, their application is rather time consuming and demands well trained staff. Taking its limitations into account, the method can be rather useful as an estimator of size and of other metrics that take size into account.
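
A compact sketch of the FP calculation described above: item counts and weights feed the UFC sum, and the fourteen 0-5 ratings feed the TCF. The weight values below follow the simple/average/complex pattern but are placeholders; the authoritative values are in the IFPUG Counting Practices Manual.

    # Sketch (our illustration) of the FP calculation described above.
    # The weight values are placeholders, not the official IFPUG tables.
    weights = {                      # (simple, average, complex)
        "external_inputs":    (3, 4, 6),
        "external_outputs":   (4, 5, 7),
        "external_inquiries": (3, 4, 6),
        "external_files":     (5, 7, 10),
        "internal_files":     (7, 10, 15),
    }

    def ufc(counts: dict) -> int:
        """counts maps item type -> (n_simple, n_average, n_complex)."""
        return sum(n * w
                   for kind, ns in counts.items()
                   for n, w in zip(ns, weights[kind]))

    def tcf(factors: list) -> float:
        """factors: fourteen ratings, each 0 (irrelevant) to 5 (essential)."""
        assert len(factors) == 14
        return 0.65 + 0.01 * sum(factors)

    counts = {k: (2, 1, 0) for k in weights}   # hypothetical system
    fp = ufc(counts) * tcf([3] * 14)           # FP = UFC * TCF
    print(fp)                                  # 74 * 1.07 = 79.18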

Object Oriented Size Metrics: In object oriented development, classes and methods are the basic constructs. Thus, apart from the metrics presented above, in object oriented technology we can use the number of classes and methods as an aspect of size. These metrics are straightforward:

• Number of classes.


• Number of methods per class.

• LOC per class.

• LOC per method.

It is obvious that metrics from other sections also apply to object oriented development, but in relation to classes and objects (for example, for the complexity metrics presented later in this document, we have average complexity per class or method).

Reuse: With the term reuse we mean the amount of code which is reused in a future release of the software. Although it may sound simple, reuse cannot be counted in a straightforward manner, because it is difficult to define what we mean by code reuse. There are thus different notions of reuse that take into account the extent of reuse [FP97]: straight reuse (copy and paste of the code) and modified reuse (taking a module and changing the appropriate lines in order to implement new features). In addition, in object oriented programming, reuse extends to the reuse or inheritance of certain classes.

Reuse also affects the size measurement of successive releases: if the present release of a software product contains a large amount of code identical to the previous one, what is its actual size? For example, IBM uses a metric called shipped source instructions (SSI) [Kan03], which is expressed as

SSI (current) = SSI (previous)
              + CSI (new and changed code for current release)
              − deleted code
              − changed code

The final term adjusts for changed code, which would otherwise be counted twice. This metric encapsulates reuse in its definition and is rather useful.

1.2.1 Structure Metrics

Apart from size, there are other internal product attributes that are useful to software engineering measurement practice. Since the early stages of the science of software metrics, researchers have pointed out a link between the structure of the product (i.e. the code) and certain quality aspects. The metrics that capture this link are called structural metrics, and we present them here. We believe these metrics are going to be useful for our research.

McCabe’s Complexity Metrics: One of the first and most widely used complexity metrics is McCabe’s Cyclomatic Complexity [McC76]. McCabe proposed that a program’s cyclomatic complexity can be measured by applying principles of graph theory. He represented the program structure as a graph G. The cyclomatic number of such a graph is

e − n + 1

where e is the number of edges of G and n the number of nodes. McCabe’s cyclomatic complexity of a program with flowgraph G is

v(G) = e − n + 2

and the essential cyclomatic complexity is

ev(G) = v(G) − m

where m is the number of subflowgraphs, i.e. the number of connected components of the graph. In the literature there is also the definition

v(G) = e − n + p

(where e is the number of edges, n the number of nodes and p the number of nodes that are exit points: last instruction, exit, return, etc.). So for the graph in Figure 1, the cyclomatic complexity is v(G) = 3.

Although the cyclomatic complexity metric was developed in the mid ’70s, it has evolved and been calibrated over the years, and it has become a mature, objective and useful metric for measuring a program’s complexity. It is also considered to be a good maintainability metric.
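
In practice, tools rarely build the flowgraph explicitly: for structured code, v(G) equals the number of decision points plus one. The sketch below uses that shortcut on Python source via the standard ast module; the set of node types counted as decisions is our own simplification.

    import ast

    # Sketch (our illustration): cyclomatic complexity as 1 + number of
    # decision points, a standard shortcut equivalent to v(G) = e - n + 2
    # for structured code. The set of decision node types is simplified.
    DECISION_NODES = (ast.If, ast.For, ast.While, ast.BoolOp,
                      ast.ExceptHandler, ast.IfExp)

    def cyclomatic_complexity(func_source: str) -> int:
        tree = ast.parse(func_source)
        return 1 + sum(isinstance(node, DECISION_NODES)
                       for node in ast.walk(tree))

    src = ("def classify(x):\n"
           "    if x < 0:\n"
           "        return 'negative'\n"
           "    elif x == 0:\n"
           "        return 'zero'\n"
           "    return 'positive'\n")
    print(cyclomatic_complexity(src))  # 3: two branch points plus one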

The above metrics (LOC, McCabe’s Cyclomatic Complexity and Halstead’s Software Science) treat each module separately. The metrics below try to take into account the interaction between the modules and to quantify this interaction.

Coupling: The notion of coupling was introduced by three IBM researchers in 1974. Stevens, Myers and Constantine proposed a metric that measures the quality of a program’s design [SMC74]. Coupling between two modules of a piece of software is the degree of interaction between them. By combining the coupling between all the system’s modules, one can compute the whole system’s global coupling. There are no standard measures of coupling. However, there are six basic types of coupling that are expressed as a relation between two modules x and y [FP97] (the relations are listed from the least dependent to the most):

• No coupling relation: x and y have no communication and they are totallyindependent of each other.

• Data coupling relation: x and y communicate by parameters. This type ofcoupling is necessary for the communication of x and y.



Figure 1: A program’s flowchart (courtesy of http://www.dacs.dtic.mil/techs/baselines/complexity.html). The cyclomatic complexity of this program is v(G) = 3.

• Stamp coupling relation: x and y accept the same record type (e.g. in database systems) as a parameter, which may cause interdependency between otherwise unrelated modules.

• Control coupling relation: x passes a parameter to y with the intention of con-trolling its behaviour.

• Common coupling relation: x and y refer to the same global data. This type of coupling is the kind that we don’t want to have.

• Content coupling relation: x refers to the inside of y (i.e. it branches into, changes data in, or alters a statement in y).

Of the above coupling relations, common coupling has been explored in the case of the Linux kernel in order to assess its maintainability [YSCO04].
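
To make two of these categories concrete, the fragment below contrasts data coupling (communication through parameters) with common coupling (communication through shared global data); the example is ours.

    # Our illustration of two coupling types from the list above.

    # Data coupling: modules communicate only through parameters.
    def net_price(price: float, vat_rate: float) -> float:
        return price * (1 + vat_rate)

    # Common coupling: both functions read/write the same global variable,
    # so a change to VAT_RATE silently affects every dependent module.
    VAT_RATE = 0.19

    def net_price_common(price: float) -> float:
        return price * (1 + VAT_RATE)

    def set_vat(rate: float) -> None:
        global VAT_RATE
        VAT_RATE = rate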

Henry and Kafura’s Information Flow Complexity: Another complexity metric that is common in software engineering measurement is Henry and Kafura’s Information Flow Complexity [HK76]. This metric is based on the information passing between the modules (or functions) of a program, and particularly on the fan-in and fan-out of a module. By the fan-in of a module m we mean the number of modules that call m plus the number of data structures that are retrieved by m. By fan-out we mean the number of modules that are called by m plus the number of data structures that are updated by m. The definition of the metric for a module m is:

Information Flow Complexity(m) = length(m) · (Fan In(m) · Fan Out(m))²

Other researchers have proposed omitting the length factor and thus simplifying the metric.

Since its introduction, Henry and Kafura’s metric has been validated and connected with maintainability [FP97], [Kan03]. Modules with high information flow complexity tend to be error prone while, on the other hand, low values of the metric correlate with fewer errors.
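
Given a module’s length and its fan-in and fan-out, the metric is a one-liner. The sketch below assumes the fan-in/fan-out numbers have already been extracted by some static analysis; the module names and counts are hypothetical.

    # Sketch (our illustration): Henry-Kafura information flow complexity,
    # assuming fan-in/fan-out were already extracted by static analysis.
    def information_flow_complexity(length: int, fan_in: int, fan_out: int) -> int:
        return length * (fan_in * fan_out) ** 2

    modules = {                # name: (LOC, fan-in, fan-out) -- hypothetical
        "parser":    (120, 3, 4),
        "logger":    (40, 12, 1),
        "scheduler": (200, 5, 7),
    }
    for name, (loc, fi, fo) in modules.items():
        print(name, information_flow_complexity(loc, fi, fo))
    # scheduler scores highest (200 * 35**2 = 245000): a refactoring candidate.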

Object Oriented Complexity Metrics: With the rise of object oriented programming, software metrics researchers tried to figure out how to measure the complexity of such applications. One of the most widely used complexity metric suites for object oriented systems is the Chidamber and Kemerer metrics suite [CK76]:

• Metric 1: Weighted Methods per Class (WMC). WMC is the sum of the complexities of the methods, where complexity is measured by cyclomatic complexity:

WMC = ∑_{i=1}^{n} c_i

where n is the number of methods and c_i is the complexity of the i-th method. We have to stress here that measuring the complexity this way is difficult in practice because, due to inheritance, not all methods are accessible in the class hierarchy. Therefore, in empirical studies, WMC is often just the number of methods in a class, and the average WMC is the average number of methods per class [Kan03].

• Metric 2: Depth of Inheritance Tree (DIT). This metric represents the length of the maximum path from the node to the root of the inheritance tree.

• Metric 3: Number of Children (NOC). This is the number of immediate successors (subclasses) of a class in the hierarchy of the inheritance tree.

• Metric 4: Coupling between Object Classes (CBO). An object class is coupled to another if it invokes the other one’s member functions or instance variables. CBO is the number of these other classes.

• Metric 5: Response For a Class (RFC). This metric represents the number of methods that can be executed in response to a message received by an object of that class. It equals the number of local methods plus the number of methods called by local methods.


• Metric 6: Lack of Cohesion in Methods (LCOM). The cohesion of a class is indicated by how closely the local methods are related to the local instance variables of the class. LCOM equals the number of disjoint sets of local methods.

Several studies show that the CK metrics suite assists in measuring and predicting the maintainability of object oriented systems [FP97], [Kan03]. In particular, studies show that certain CK metrics are linked to faulty classes and help predict them [Kan03].
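
As a toy illustration, the snippet below derives simplified versions of three of the six CK metrics (WMC as a method count, DIT and NOC) from live Python classes through introspection; real tools such as ckjm (section 2.2.1) work on compiled code instead.

    import inspect

    # Toy sketch (our illustration): simplified WMC, DIT and NOC for Python
    # classes via introspection. WMC is just the method count, as in the
    # empirical simplification mentioned above.
    def wmc(cls) -> int:
        return len([m for m, _ in inspect.getmembers(cls, inspect.isfunction)
                    if m in cls.__dict__])

    def dit(cls) -> int:
        return len(cls.__mro__) - 2     # exclude the class itself and object

    def noc(cls, universe) -> int:
        return sum(1 for c in universe if cls in c.__bases__)

    class Vehicle:
        def move(self): ...
        def stop(self): ...

    class Car(Vehicle):
        def honk(self): ...

    classes = [Vehicle, Car]
    print(wmc(Car), dit(Car), noc(Vehicle, classes))  # 1 1 1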

1.2.2 Design Metrics

Along with object oriented programming, the notion of object oriented design was introduced too. The programmer has to use some kind of modelling with classes and objects in order to design his application first. After the design is completed, the programmer goes on with coding. One of the questions a programmer asks himself is whether or not his design is of good quality. An experienced programmer can answer that question by applying to his/her design a number of rules based on his/her experience. He looks for bad choices that may have been made or for violations of his own intuitive rules. If the design passes his own checks, then it is of good quality and he continues with coding. Of course, with big applications, inspection of a design by a person is rather difficult, so a tool is needed.

These intuitive rules are called “design heuristics” checks. They are based on experience. They are like design patterns but, rather than proposing a certain design for certain problems, heuristics are rules that help the designer check the validity of his design. Design heuristics are validations for the object oriented design and advise the programmer about certain design mistakes. This advice should be taken into account by the programmer, who has to do some research to correct things. Of course, a heuristic violation does not always mean a design mistake, but it is a point for further investigation by the development team. A well known set of such object oriented design heuristics was first introduced by Arthur Riel. Riel in his seminal work [Rie96] defined a set of more than 60 design heuristics, a result of his experience. His work has helped many people to improve their designs and the way they program. Before Riel, other researchers had addressed similar issues, including Coad and Yourdon [YC91]. Additionally, there is ongoing research in the field of design heuristics. Researchers are investigating the impact of the application of object oriented design heuristics and the evaluation and validation of these heuristics [DSRS03, DSA+04]. As an example, one can read the object oriented design heuristics in the list below. These heuristics are taken from Riel [Rie96].

1. The inheritance hierarchy should not be deeper than six.

2. Do not use global data. Class variables or methods should be used instead.

3. All data should be hidden within its class.


4. All data in a class should be private.

5. All methods in a class should have no more than six parameters.

6. A class should not have zero methods.

7. A class should not have one or two methods.

8. A class should not be converted to an object of another class.

9. A class should not contain more than six objects.

10. The number of public methods of a class should be no more than seven.

11. The number of classes with which a class collaborates should not be more than four.

12. Classes that concentrate too much information should be avoided. We consider that a class fits this description when it is associated with more than four classes, has more than seven methods and has more than seven attributes.

13. The fan out of the class should be minimised. The fan out is the product of the number of methods defined by the class and the number of messages they send. This number should be no more than nineteen.

14. All abstract classes must be base classes.

15. All base classes should be abstract classes.

16. Do not use multiple inheritance.

17. A class should not have only methods whose names are set, get or print.

18. If a class contains objects from another class, then the containing class should be sending messages to the contained objects. If this does not happen, then we have a violation of the heuristic.

19. In case a class contains objects from other classes, these objects should not be associated with each other.

20. If a class has only one method apart from set, get and print, then there is a violation.

21. The number of messages between a class and its collaborator should be minimal. If this number is more than fifteen, we have a violation of the heuristic.

One should note here that the above heuristics can be validated with the use of a tool; a minimal sketch of such a check follows.
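
As an illustration of a tool-based check, the following validates two of Riel’s heuristics (no more than six parameters per method, heuristic 5; no zero-method classes, heuristic 6) against Python source using the ast module; the code is our own sketch, not a complete checker.

    import ast

    # Minimal sketch (our illustration) of a design-heuristics checker for
    # two of Riel's rules: methods should have at most six parameters
    # (heuristic 5) and a class should not have zero methods (heuristic 6).
    def check_heuristics(source: str) -> list:
        violations = []
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.ClassDef):
                methods = [n for n in node.body
                           if isinstance(n, ast.FunctionDef)]
                if not methods:
                    violations.append(f"{node.name}: class has no methods")
                for m in methods:
                    params = len(m.args.args) - 1   # ignore 'self'
                    if params > 6:
                        violations.append(
                            f"{node.name}.{m.name}: {params} parameters")
        return violations

    src = "class Config:\n    pass\n"
    print(check_heuristics(src))  # ['Config: class has no methods']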

Before Riel, Lorenz [LK94] proposed similar rules derived from industrial experience (including metrics for the development process):


1. Average method size should be less than 24 LOC for C++.

2. Average number of methods per class should be less than 20.

3. Average number of instance variables per class should be less than 6.

4. Class hierarchy nesting level (or DIT of CK metrics) should be less than 6.

5. Number of subsystem-subsystem relationships should be less than the number in the next metric (the number of class-class relationships within each subsystem).

6. Number of class-class relationships in each subsystem should be relatively high.

7. Instance variable usage: If groups of methods in a class use different sets ofinstance variables, look closely to see if the class should be split into multipleclasses along those “service” lines.

8. Average number of comment lines per method should be greater than 1.

9. Number of problem reports should be low.

10. Number of times a class is reused (a class should be reused in other projects, otherwise it might need redesign).

11. Number of classes and methods thrown away (this should occur at a steady rate).

As mentioned before, all these “rules of thumb” are derived from the experience gained during multiple development processes and reflect practical knowledge. For example, a large average method size may indicate a poor OO design and function oriented coding [Kan03]. A class containing too much responsibility (many methods) indicates that there should be a separate class for some of the methods. The list goes on, reflecting the practical knowledge mentioned.

1.2.3 Product Quality Metrics

The previous sections discussed development and design quality. These are the quality metrics that can be applied to a software product early in the product lifecycle: before the product is released, these metrics may already be calculated. The following metrics are post-release metrics and apply to a finished software product.


Maintainability: When a software product is complete and released, it enters the maintenance phase. During this phase, defects are corrected, re-engineering occurs and new features are added. Here we look at four types of software maintenance:

• Corrective maintenance, which is the main maintenance task and involves correcting defects that are reported by users.

• Adaptive maintenance is the maintenance that has to do with adding new functionality to the system.

• Preventive maintenance is the defect fixing done by the development team, preventing defects from being delivered to the user.

• Perfective maintenance involves mainly re-engineering and redesigning tasks.

For the maintenance process, we mainly have four maintainability metrics.

Average Code Lines per Module: This is a very simple metric: the average number of lines of code per module (e.g. function or class). It shows how easily the code can be maintained, or how easily someone can understand part of the code and correct it. A closely related measure is the average number of comment lines per module, and with comment lines there are some considerations that also apply later to the Maintainability Index metric. For instance, consideration needs to be given to how well the comment lines reflect the code (are there useless comment lines?) and to whether the comment lines contain copyright notices and other legal notices, etc.

Mean Time To Repair: Mean Time To Repair (MTTR) is an external measure; it has to do with the delivered product from the user’s point of view, not with the code. MTTR is the average time to fix a defect from the time it was reported to the moment the development team corrected it. Sometimes MTTR is referred to as “fix response time.”

Backlog Management Index: Backlog Management Index (BMI) is also an external measure of maintainability, and it has to do with both defect fixing and defect arrival [Kan03]. The BMI is expressed as

BMI = (Number Of Problems Closed / Number Of Problem Arrivals) · 100%

The number of problems that arrive or are closed is counted over some fixed time period, usually a month, although the period can range from a week to any fixed number of days. If BMI is greater than 100%, the development team is efficient and closes bugs faster than their arrival rate. If it is less than 100%, the development team has efficiency problems and falls behind with the defect fixing process.
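
A tiny sketch of the BMI over monthly windows; the sample counts are invented.

    # Sketch (our illustration): backlog management index per month.
    def bmi(closed: int, arrived: int) -> float:
        return closed / arrived * 100.0

    # Hypothetical monthly counts: (problems closed, problems arrived).
    months = {"Jan": (48, 40), "Feb": (35, 50), "Mar": (52, 49)}
    for month, (closed, arrived) in months.items():
        print(month, f"{bmi(closed, arrived):.0f}%")
    # Jan 120% (ahead of arrivals), Feb 70% (falling behind), Mar 106%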

Maintainability Index: Several metrics have been proposed as internal measures of maintainability [FP97]. Most of them try to correlate the structural metrics presented before with maintainability, and in certain cases a specific metric has been linked to maintainability. For example, McCabe categorised programs into maintenance risk categories, stating that any program with a McCabe metric larger than 20 has a high risk of causing problems.

One interesting model, derived from regression analysis and based on metrics presented before, is the Maintainability Index (MI) proposed by Welker and Oman [WO95]. The MI combines Halstead Software Science metrics, McCabe’s cyclomatic complexity, LOC, and the number of comments in the code, and it correlates strongly with maintainability. There are two expressions of MI, one using three of the previous metrics and another using all four:

Three-Metric MI equation

MI = 171 − 5.2 ln(aveV) − 0.23 aveV(g) − 16.2 ln(aveLOC)

where aveV is the average Halstead Volume per module, aveV(g) is the average extended cyclomatic complexity per module, and aveLOC is the average lines of code per module.

Four-Metric MI equation

MI = 171 − 5.2 ln(aveV) − 0.23 aveV(g) − 16.2 ln(aveLOC) + 50 sin(√(2.4 · perCM))

where aveV, aveV(g) and aveLOC are as before, and perCM is the average percent of lines of comments per module.

In their article, Welker and Oman proposed three rules on how to choose which equation (3- or 4-metric) is appropriate for use [WO95]. If any one of the following criteria is true, then it is better to use the 3-metric equation; otherwise use the 4-metric one:

• The comments do not accurately match the code. Unless considerable attention is paid to comments, they can become out of synchronisation with the code and thereby make the code less maintainable. The comments could be so far off as to be of dubious value.

• There are large, company-standard comment header blocks, copyrights, and disclaimers. These types of comments provide minimal benefit to software maintainability. As such, the 4-metric MI will be skewed and will provide an overly optimistic maintainability picture.


• There are large sections of code that have been commented out. Code that has been commented out creates maintenance difficulties.

Calculating MI is simple because there are tools (we examine such tools in Section 2) that measure the metrics it uses. As the authors suggest, MI is useful for periodic assessment of the code in order to test its maintainability.
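
The two MI equations translate directly into code; below is a sketch with invented per-module averages. Whether to use the three- or four-metric form follows the criteria above. We treat perCM as a fraction (0.2 meaning 20% comment lines), which is an assumption on our part.

    import math

    # Sketch (our illustration): the two MI equations, with invented inputs.
    def mi3(ave_v: float, ave_vg: float, ave_loc: float) -> float:
        return (171 - 5.2 * math.log(ave_v) - 0.23 * ave_vg
                - 16.2 * math.log(ave_loc))

    def mi4(ave_v: float, ave_vg: float, ave_loc: float, per_cm: float) -> float:
        # per_cm as a fraction of comment lines per module (our assumption)
        return mi3(ave_v, ave_vg, ave_loc) + 50 * math.sin(math.sqrt(2.4 * per_cm))

    # Hypothetical per-module averages for one codebase:
    print(mi3(ave_v=250.0, ave_vg=4.5, ave_loc=35.0))              # ~83.7
    print(mi4(ave_v=250.0, ave_vg=4.5, ave_loc=35.0, per_cm=0.2))  # ~115.6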

1.3 Productivity Metrics

Software productivity is a very complex metric which is mainly used in effort and cost estimation. However, we shall use productivity as a quality metric in order to evaluate the health of a software project. Generally, productivity is expressed as

Productivity = Number Of Things Implemented / Person-Months

The term “things” refers to size measurements, which can be expressed as lines of code, function points or, in the case of object oriented development, number of classes. Similarly, person-months can be any fixed time period. We must note here that the metric proposed is a very simple one. Of course, more complex metrics exist (such as metrics derived from regression techniques) but they are beyond the scope of our own research.

1.4 Open Source Development Metrics

Apart from the metrics presented in the previous sections, there are metrics that can be applied directly to the Open Source development process and have been used in the past to perform open source software evaluation and success measurement [lD05], [CHA06]. Additionally, we present some metrics used by Open Source hosting services like Freshmeat.net.

Number of releases in past 12 months: This measures the activity of a software project, particularly its productivity and also its reliability (and defect density). Because of this bidirectional nature (productivity and maintainability), the metric has no nominal scale. Small values of the metric may indicate low productivity, but they may also merely reflect minor improvements and bug fixes. Thus, this metric has to be measured along with others, like the number of contributors and/or the number of downloads. Furthermore, the metric can be refined into number of major releases and number of minor releases or patches. With this distinction, the previous problem of the metric can be overcome: a high number of minor releases is an indicator of problematic software (but also of fast fix response time). The number of minor releases or patches can be used along with the defect removal effectiveness metric.


Volume of mailing lists: This metric is rather useful for evaluating the health of a project and the support it provides [SSA06]. It is a direct measurement of the number of messages sent to a project’s list in a month (or another fixed time period). A healthy project has an active mailing list, while a soon to be abandoned one shows lower activity. The volume of the users’ mailing list is also an indicator of how well the project is supported and documented.

Volume of available documentation: Along with the previous metric, this one is an indicator of the available support. When we refer to the volume of available documentation, we mean the available documents, like the installation guide or the administrator’s guide.

Number of contributors: A direct repository measurement, which represents how big the community of a project is. A high number of contributors means fast bug fixing and availability of support, and of course it is a prerequisite for a project to evolve. We have to stress here that many projects, like Apache, have a small core group that produces the majority of the code and a larger one that contributes less [MFH02]. Thus, this metric has to be evaluated further and used along with other metrics.

Repository Checkouts: From a project’s repository, one can directly extract some other interesting metrics, particularly productivity metrics. These are the number of commits per committer, the number of commits of a specific committer over a fixed period (for example, a month) and the total number of commits over a fixed period. All these are productivity metrics and can also be an indicator of defect removal effectiveness. Of course, as these metrics measure activity, they represent the health of the project.

Number of downloads: A direct measurement of the number of downloads of an Open Source project. This metric can show a project’s popularity, and thus it is an indicator of its health and end user quality. However, one must bear in mind that someone downloading a piece of software does not mean that he actually used it or, if he did, that he was satisfied with it.

Freshmeat User Rating: The Freshmeat.net hosting service uses a user rating metric which works as follows, according to its website (http://freshmeat.net/faq/view/31/): every registered user of freshmeat may rate a project featured on the website. Based on these ratings, a top 20 list is built, and users may sort their search results by rating as well. Be aware that unless a project has received 20 ratings or more, it will not be



considered for inclusion in the top 20. The formula gives a true Bayesian estimate, the weighted rank (WR):

WR = (v / (v + m)) · R + (m / (v + m)) · C

where:
R = average rating for the project
v = number of votes for the project
m = minimum votes required to be listed in the top 20 (currently 20)
C = the mean vote across the whole report

Freshmeat Vitality: The second metric that Freshmeat uses is the project’s vitality. Again according to Freshmeat (http://freshmeat.net/faq/view/27/), the vitality score for a project is calculated as

vitality = (announcements ∗ age) / (days since last announcement)

i.e. the number of announcements multiplied by the number of days the application has existed, divided by the days passed since the last release. This way, applications with lots of announcements that have been around for a long time and have recently come out with a new release earn a high vitality score, while old applications that have only been announced once get a low vitality score. The vitality score is available through the project page and can be used as a sort key for the search results (definable in the user preferences).

Freshmeat Popularity: From the Freshmeat site (http://freshmeat.net/faq/view/30/): the popularity score superseded the old counters for record hits, URL hits and subscriptions. Popularity is calculated as

popularity = √((record hits + URL hits) ∗ (subscriptions + 1))

Again, we have to stress here that these metrics are used by Freshmeat and of course they need further investigation and validation.
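
The three Freshmeat formulas are easy to reproduce; the sketch below implements them exactly as given above, with invented sample values.

    import math

    # Sketch (our illustration) of the three Freshmeat formulas above,
    # with invented sample values.
    def weighted_rank(R: float, v: int, m: int, C: float) -> float:
        return v / (v + m) * R + m / (v + m) * C

    def vitality(announcements: int, age_days: int,
                 days_since_last: int) -> float:
        return announcements * age_days / days_since_last

    def popularity(record_hits: int, url_hits: int, subscriptions: int) -> float:
        return math.sqrt((record_hits + url_hits) * (subscriptions + 1))

    print(weighted_rank(R=8.5, v=35, m=20, C=6.0))                       # ~7.59
    print(vitality(announcements=12, age_days=900, days_since_last=30))  # 360.0
    print(popularity(record_hits=4000, url_hits=1500, subscriptions=120))  # ~815.8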

1.5 Software Metrics Validation

The metrics presented in this chapter try to measure a wide range of attributes of software. For each attribute of a piece of software (e.g. length) there are various kinds of metrics to measure it. This availability of various metrics for each attribute raises the question of whether a particular metric is suitable for measuring an attribute. The suitability of a specific metric is a very important area in software engineering research, and it is the reason why metrics are questioned by researchers and much discussed.

According to Fenton [FP97], the way we want to validate metrics depends on whether we just want to measure an attribute or we want to measure in order to predict. Prediction of an attribute of a system (e.g. code quality or cost) is a core issue in software engineering. So, in order to perform metrics validation we must distinguish between these two types:

• Measurement that is performed in order to assess an existing entity by numerically characterising one or more of its attributes, for example size.

• Measurement that is performed in order to predict some attribute of a futureentity, like the quality of the code.

The validation procedure can also be distinguished into two types [BEM95]:

• Theoretical validation. This kind of validation relies on a mathematical formalism and a model. It is usually done by setting mathematical relations for each attribute and trying to validate these relations.

• Empirical validation. As Briand et al. state [BEM95], empirical validation is the answer to the question “is the measure useful in a particular development environment, considering a given purpose in defining the measure and a modeler’s viewpoint?”.

Of these two approaches, empirical validation is the one which is widely used. In practice, it tries to correlate a measure with some external attribute of the software, for example complexity with defects.

1.5.1 Validation of prediction measurement

As Fenton states, validating a measurement conducted for prediction is the process of establishing the accuracy of the prediction by empirical means, that is, by comparing model performance with known data. In other words, a prediction measurement is valid if it makes accurate predictions. This kind of validation is widely used in software engineering research for cost estimation purposes and for quality and reliability prediction. With this method, researchers form and test hypotheses in order to predict certain attributes of software, or conduct formal experiments. Then they use mathematical (statistical) techniques to test their results, for example whether a particular metric such as size is an accurate cost estimator. Other kinds of predictions are quality and fault proneness detection. Mathematical techniques used are regression analysis, logistic regression and also more sophisticated methods such as decision trees and neural networks. Examples of such metric validations are [GFS05] and [BBM96].


1.5.2 Validation of measures

Again, according to Fenton [FP97], validating a software measure is the process of ensuring that the measure is a proper numerical characterisation of the claimed attribute, by showing that the representation condition is satisfied. As implied, this kind of validation relies on theoretical validation. For example, for a metric that measures size, we form a model to represent a program and a relation of that model to the notion of size. Let us call the program P and the relation m(P). In order to validate the length measure we can use the following: if a program P1 is of length m(P1) and a program P2 of length m(P2), then the equation

m(P1 + P2) = m(P1) + m(P2)

should hold. If the following also holds,

P1 < P2 ⇒ m(P1) < m(P2)

then our relation, our metric, is valid. A toy sketch of such a check is given below.

Although we are not going to discuss metrics validation in depth here, we are going to perform validation throughout our project, especially when we present new metrics for Open Source software development. A good place to start studying metrics validation is [BEM95] and [Sch92]. Both papers provide a lot of insight into metrics validation and also present mathematical techniques for both theoretical and empirical validation. A more recent study that discusses metrics is that of Kaner and Bond [KB04b]. Another interesting paper, which discusses how empirical research in software engineering should be conducted and contains much about validation, is that of Kitchenham et al. [KPP+02]. Two good examples of the application of metrics validation are those of Briand et al. [BDPW98] and Basili et al. [BBM96]. A rather complete publication list on software metrics validation can be found at http://irb.cs.uni-magdeburg.de/sw-eng/us/bibliography/bib_10.shtml
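The toy sketch promised above checks both properties for a simple length measure that counts non-blank lines; the measure m and the example programs are made up purely for illustration.

    def m(program: str) -> int:
        # Measure "length" as the number of non-blank lines.
        return sum(1 for line in program.splitlines() if line.strip())

    p1 = "int a;\nint b;\n"
    p2 = "int c;\n"

    # Additivity: the length of a concatenation is the sum of the lengths.
    assert m(p1 + p2) == m(p1) + m(p2)

    # Monotonicity: p1 is properly contained in p1 + p2, so its measured
    # length must be strictly smaller.
    assert m(p1) < m(p1 + p2)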


2 Tools

Many publications mention measurement tool support and automation as important success factors for software measurement efforts and quality assurance [KSPR01], providing frameworks and general approaches [KRSZ00], or giving more specific solution architectures [JL99]. There is a great variety of research tools to support software metric creation, handling, and analysis; an overview of different types of software metrics tools is given in [Irb]. Wasserman [A.I89] introduces the concept of tools with vertical and horizontal architecture, with the former supporting activities in a single life cycle phase, such as UML design tools or change request databases, and the latter supporting activities over several life cycle phases, such as project management and version control tools. Fuggetta [Fug93], on the other hand, classifies tools as single tools, workbenches supporting a few development activities, or environments supporting a great part of the development process. The above ideas about different kinds of metrics tools have certainly affected the functionality that commercial tools offer, but the most popular categorisation still classifies metrics tools as either product metrics tools or process metrics tools. Product metrics tools measure the software product at any stage of its development, from requirements to installed system. Product metrics tools may measure the complexity of the software design, the size of the final program (either source or object code), or the number of pages of documentation produced. Process metrics tools, on the other hand, measure the software development process, such as overall development time, type of methodology used, or the average level of experience of the programming staff. In this chapter we present tools, both Open Source and commercial, that support and automate the measurement process.

2.1 Process Analysis Tools

The process of Open Source software development depends heavily on a repository responsible for version control. The majority of projects use one of two version control systems for their repositories, CVS6 and Subversion7. Much of the information needed to extract the various metrics is contained in these repositories. This information includes:

• The code itself, along with historical data (changes, additions, etc).

• Information regarding programmers (committers): number of committers, usernames, etc.

• Historical data about the productivity of committers (number of commits, which part of the code is committed by whom, etc.).

6 http://www.nongnu.org/CVS/
7 http://subversion.tigris.org/


Figure 2: CVSAnalY Web Interface, Main Page

All of this is stored in the repository, and tools are available as Open Source software to extract the useful information from it.

2.1.1 CVSAnalY

CVSAnalY8 (CVS Analysis) is one of the first tools to access a repository in order to find information regarding an open source project. It has been developed by the Libresoft Group at the Universidad Rey Juan Carlos in Spain and has already produced results used in research on open source software [RKGB04]. The tool is licenced under the GNU General Public Licence.

Specifically, CVSAnalY is a tool that extracts statistical information out of CVS and Subversion repository logs and transforms it into database SQL formats. The main tool is a command-line tool. The presentation of the results is done with a web interface - called CVSAnalYweb - where the results can be retrieved and analysed in an easy way (after someone has run the main command-line tool, CVSAnalY). The tool produces various results and statistics regarding the evolution of a project over time. A general view of the tool is shown in Figure 2. The tool stores historical data such as:

• First commit logged in the versioning system.

• Last commit (up to the date we want to examine).

• Number of days examined.

8 http://cvsanaly.tigris.org/


• Total number of modules in the versioning system.

• Commiters.

• Commits.

• Files.

• Aggregated Lines.

• Removed Lines.

• Changed Lines.

• Final Lines.

File type statistics for all modules:

• File type.

• Modules.

• Commits.

• Files.

• Lines Changed.

• Lines Added.

• Lines Removed.

• Removed files.

• External.

• CVS flag.

• First commit.

• Last commit.

The tool also logs the inactivity rate for modules and committers, committers per module, and the Herfindahl-Hirschman Index for modules; as mentioned before, it also produces helpful graphs. Examples of the graphs produced are:

• Evolution of the number of modules.

• Modules by Commiters (log-log).


Figure 3: CVSAnalY Web Interface, Evolution of the number of modules

• Modules by Commits (log-log).

• Modules by Files (log-log).

• Commiter by Changes (log-log).

Examples of the graphs are shown in Figure 3 and Figure 4.

CVSAnalY is a rather useful tool: it helps gather data about the process of Open Source software development, as well as data essential for computing other metrics, especially process metrics. Another very important feature of CVSAnalY is the reconstruction of the state of the repository at specific points in time.
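A minimal sketch of the kind of repository mining CVSAnalY automates is shown below: counting commits per committer from a Subversion log. This is not CVSAnalY's own code, and the repository URL is a placeholder; svn log --xml is Subversion's standard facility for machine-readable logs.

    import subprocess
    import xml.etree.ElementTree as ET
    from collections import Counter

    # Fetch the full history of a (placeholder) repository in XML form.
    xml_log = subprocess.run(
        ["svn", "log", "--xml", "http://svn.example.org/someproject"],
        capture_output=True, text=True, check=True,
    ).stdout

    # Count one commit per <logentry> element for its <author>.
    commits = Counter(
        entry.findtext("author", default="(unknown)")
        for entry in ET.fromstring(xml_log).iter("logentry")
    )
    for committer, n in commits.most_common():
        print(f"{committer}: {n} commits")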

2.1.2 GlueTheos

GlueTheos [RGBG04] has been developed to coordinate other tools used to analyse Open Source repositories. The tool is a set of scripts used to download data (source code) from Open Source repositories, analyse it with external tools (developed by third parties) and store the results in a database for further investigation. The parts that comprise GlueTheos are:

• The core scripts, which act as a user interface, interacting with the user and handling details like repository configuration, periods of analysis (the periodic snapshots from a repository), storage details, and third-party tool details and parameters.


Figure 4: CVSAnalY Web Interface, Commiter by Changes

• The downloading module, which is responsible for downloading source code snapshots at specific dates and storing them locally.

• The analysis module. Here the user describes further details of the external tools used for source code analysis. These details include instructions on how to invoke each tool, what its parameters are, and what its output looks like. The module is also responsible for running these external tools.

• The storage module. This module is responsible for storing the results created by the previous module. It takes the output of an analysis tool and formats it into an appropriate SQL command, suitable for storing the result in a database.

In general, the tool runs like this:

1. The user chooses which project to analyse (e.g. GNOME) and which periods to analyse (e.g. every month from December 2003 until September 2005).

2. The user then chooses an analysis tool (e.g. sloccount, which counts physical source lines of code9). Integrating the tool with the main set of scripts includes describing how to call the tool, how parameters are passed, and what its output looks like.

9 http://www.dwheeler.com/sloccount/


Figure 5: GlueTheos, Table that contains analysis of a project

3. The program retrieves the code of the project analysed for the configured dates, then it analyses the code with the external tool and stores the output in a database.

The database table that contains the analysis results has the output of the external tool as a column. Figure 5 shows a table created by GlueTheos, which contains the output of sloccount (SLOC, i.e. source lines of code, and language type) for the files of the gnome-core project at a specific date. GlueTheos is released under the GNU General Public Licence.
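A GlueTheos-style storage step might look like the sketch below: run an external analysis tool on a checked-out snapshot and insert its per-file output into a database. The snapshot path and database name are placeholders, and the parsing assumes sloccount's --details mode (tab-separated SLOC, language, module and path); adjust it if your version's output differs.

    import sqlite3
    import subprocess

    snapshot_dir = "checkouts/gnome-core-2004-01-01"  # placeholder path

    # Run the external tool on the snapshot and capture its report.
    report = subprocess.run(["sloccount", "--details", snapshot_dir],
                            capture_output=True, text=True, check=True).stdout

    db = sqlite3.connect("gluetheos.db")
    db.execute("CREATE TABLE IF NOT EXISTS sloc "
               "(snapshot TEXT, path TEXT, language TEXT, sloc INTEGER)")
    for line in report.splitlines():
        fields = line.split("\t")
        if len(fields) == 4 and fields[0].isdigit():
            sloc, language, _module, path = fields
            db.execute("INSERT INTO sloc VALUES (?, ?, ?, ?)",
                       (snapshot_dir, path, language, int(sloc)))
    db.commit()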

2.1.3 MailingListStats

MailingListStats10 analyses Mailman archives (and, in the future, those of other mailing list manager software) in order to get statistical data out of them. The statistical data is transformed into XML and SQL to allow further analysis and research. This tool also includes a web interface.

10 http://libresoft.urjc.es/Tools/MLStats
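A rough analogue of what MailingListStats extracts is sketched below: per-sender message counts from a single mbox archive, using Python's standard mailbox module. The archive file name is a placeholder.

    import mailbox
    from collections import Counter

    # Count messages per From: header in a (placeholder) Mailman mbox archive.
    senders = Counter(msg.get("From", "(unknown)")
                      for msg in mailbox.mbox("list-archive.mbox"))

    for sender, n in senders.most_common(10):
        print(f"{n:5d}  {sender}")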


2.2 Metrics Collection Tools

2.2.1 ckjm

ckjm11 calculates Chidamber and Kemerer object-oriented metrics by processing the bytecode of compiled Java files. The program calculates for each class the six metrics proposed by Chidamber and Kemerer, as well as afferent couplings and the number of public methods. This application was developed by Professor Diomidis Spinellis, the coordinator of the SQO-OSS project.

2.2.2 The Byte Code Metric Library

The Byte Code Metric Library12 (BCML) is a collection of tools that calculate metrics of Java bytecode classes or JAR files in directories, output the results into XML files, and report the results in HTML format.

2.2.3 C and C++ Code Counter

CCCC is a tool which analyses C/C++ files and generates a report on various metrics. The tool13 was developed as an MSc thesis by Tim Littlefair and is copyrighted by him. It is a command-line tool that analyses a list of input files and generates HTML and XML reports containing the results. The metrics measured are the most common ones; specifically they are:

• Summary table of high-level metrics summed over all files processed in the current run.

• Table of procedural metrics (i.e. lines of code, lines of comment, McCabe’s cyclomatic complexity) summed over each module.

• Table of four of the six metrics proposed by Chidamber and Kemerer.

• Structural metrics based on the relationships of each module with others. Includes fan-out (i.e. number of other modules the current module uses), fan-in (number of other modules which use the current module), and the Information Flow measure suggested by Henry and Kafura, which combines these to give a measure of coupling for the module.

• Lexical counts for parts of submitted source files which the analyser was unable to assign to a module. Each record in this table relates either to a part of the code which triggered a parse failure, or to the residual lexical counts relating to parts of a file not associated with a specific module.

11 http://www.spinellis.gr/sw/ckjm
12 http://csdl.ics.hawaii.edu/Tools/BCML
13 http://cccc.sourceforge.net/


Figure 6: CCCC, Report for Procedural Metrics

Figure 6 shows the report for procedural metrics for an Open Source project, while Figure 7 shows the report for object-oriented metrics of the same project.

2.2.4 Software Metrics Plug-In for the Eclipse IDE

The Software Metrics14 plug-in for the Eclipse IDE is a powerful add-on for the popular Open Source IDE Eclipse. It is installed, as its name denotes, as a plug-in to Eclipse, and it is distributed under the same licence as the Eclipse IDE itself. The tool measures Java code against a long list of metrics:

• Lines of Code (LOC): Total lines of code in the selected scope. Only counts non-blank and non-comment lines inside method bodies.

• Number of Static Methods (NSM): Total number of static methods in the selected scope.

• Afferent Coupling (CA): The number of classes outside a package that depend on classes inside the package.

• Normalised Distance (RMD): |RMA + RMI − 1|; this number should be small, close to zero, for good packaging design.

• Number of Classes (NOC): Total number of classes in the selected scope.

• Specialisation Index (SIX): Average of the specialisation index, defined as NORM * DIT / NOM. This is a class-level metric.

14http://metrics.sourceforge.net/


Figure 7: CCCC, Report for Object Oriented Metrics

• Instability (RMI): CE / (CA + CE).

• Number of Attributes (NOF): Total number of attributes in the selected scope.

• Number of Packages (NOP): Total number of packages in the selected scope.

• Method Lines of Code (MLOC): Total number of lines of code inside method bodies, excluding blank lines and comments.

• Weighted Methods per Class (WMC): Sum of the McCabe Cyclomatic Complexity for all methods in a class.

• Number of Overridden Methods (NORM): Total number of methods in the selected scope that are overridden from an ancestor class.

• Number of Static Attributes (NSF): Total number of static attributes in the selected scope.

• Nested Block Depth (NBD): The depth of nested blocks of code.

• Number of Methods (NOM): Total number of methods defined in the selected scope.

• Lack of Cohesion of Methods (LCOM): A measure of the cohesiveness of a class. Calculated with the Henderson-Sellers method: if m(A) is the number of methods accessing an attribute A, calculate the average of m(A) over all attributes, subtract the number of methods m and divide the result by (1 − m). A low value indicates a class with a high degree of cohesion. A value close to 1 indicates a lack of cohesion and suggests the class might better be split into a number of (sub)classes.

• McCabe Cyclomatic Complexity (VG): Counts the number of flows through a piece of code. Each time a branch occurs (if, for, while, do, case, catch and the ?: ternary operator, as well as the && and || conditional logic operators in expressions) this metric is incremented by one. Calculated for methods only. For a full treatment of this metric see McCabe [McC76].

• Number of Parameters (PAR): Total number of parameters in the selected scope.

• Abstractness (RMA): The number of abstract classes (and interfaces) divided by the total number of types in a package.

• Number of Interfaces (NOI): Total number of interfaces in the selected scope.

• Efferent Coupling (CE): The number of classes inside a package that depend on classes outside the package.

• Number of Children (NSC): Total number of direct subclasses of a class.

• Depth of Inheritance Tree (DIT): Distance from class Object in the inheritance hierarchy.
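The Henderson-Sellers LCOM definition quoted above is compact enough to sketch directly. The class description below (which methods touch which attributes) is invented for illustration:

    def lcom_star(attribute_access, methods):
        # attribute_access maps each attribute A to the set of methods using
        # it, so len(users) is m(A) in the definition above.
        m = len(methods)
        if m <= 1 or not attribute_access:
            return 0.0
        avg_ma = (sum(len(users) for users in attribute_access.values())
                  / len(attribute_access))
        return (avg_ma - m) / (1 - m)

    methods = {"get_x", "set_x", "get_y", "set_y"}
    access = {"x": {"get_x", "set_x"}, "y": {"get_y", "set_y"}}
    # Each attribute is used by half of the methods: moderate cohesion.
    print(lcom_star(access, methods))  # 0.666...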

The user can also set ranges and thresholds for each metric in order to track code quality. Examples of these ranges can be:

• Lines of Code (Method Level): Max 50 - If a method is over 50 lines of code, it is suggested that the method should be broken up for readability and maintainability.

• Nested Block Depth (Method Level): Max 5 - If a block of code has more than 5 levels of nesting, break up the method.

• Lines of Code (Class Level): Max 750 - If a class has over 750 lines of code, split up the class and delegate its responsibilities.

• McCabe Cyclomatic Complexity (Method Level): Max 10 - If a method has more than 10 branches, break up the method.

• Number of Parameters (Method Level): Max 5 - A method should have no more than 5 parameters. If it does, create an object and pass the object to the method.


Figure 8: Metrics, List of metrics

As this list shows, the tool is rather extensive and the set of metrics measured is exhaustive. A view of the plug-in displaying the results of a measurement is shown in Figure 8.
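As a concrete illustration of the VG counting rule listed above (and of the max-10 threshold), the sketch below counts branch points in Python source using the standard ast module; the plug-in itself applies the same rule to Java.

    import ast

    BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp)

    def cyclomatic_complexity(source):
        vg = 1  # one flow through the code to begin with
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, BRANCH_NODES):
                vg += 1
            elif isinstance(node, ast.BoolOp):  # 'and' / 'or' short-circuits
                vg += len(node.values) - 1
        return vg

    src = """
    def classify(n):
        if n < 0 or n > 100:
            return "out of range"
        for d in range(2, n):
            if n % d == 0:
                return "composite"
        return "other"
    """
    print(cyclomatic_complexity(src))  # 5 = 1 + if + or + for + if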

The tool also displays the dependency connections among the various packages and classes of an analysed project as a connected graph. An example of this graph is shown in Figure 9.

2.3 Static Analysis Tools

These tools analyse a program’s source code and locate bugs and problematic constructions. If a tool simply collects metrics, it is listed under metrics collection tools. We limit this section to tools that are Open Source and candidates for SQO-OSS data generation. Wikipedia maintains an exhaustive list of such tools.

2.3.1 FindBugs

FindBugs15 looks for bugs in Java programs. It is based on the concept of bug patterns.

15http://findbugs.sourceforge.net


Figure 9: Metrics, Dependency Graph

2.3.2 PMD

PMD16 scans source code and looks for potential problems: possible bugs, unused and suboptimal code, over-complicated expressions, and duplicate code.

2.3.3 QJ-Pro

QJ-Pro17 is a tool-set for static analysis of Java source code: a combination of automatic code review and automatic coding standards enforcement.

2.3.4 Bugle

Bugle18 uses Google code search queries to locate security vulnerabilities.

2.4 Hybrid Tools

Hybrid tools analyse both process and project data.

16 http://pmd.sourceforge.net
17 http://qjpro.sourceforge.net
18 http://www.cipher.org.uk/index.html?p=projects/bugle.project


2.4.1 The Empirical Project Monitor

The Empirical Project Monitor19 (EPM) provides a tool for automated collection and analysis of project data. The current version uses CVS, GNATS, and Mailman as data sources.

2.4.2 HackyStat

Hackystat20 is a framework for automated collection and analysis of software engineering product and process data. Hackystat uses sensors to unobtrusively collect data from development environment tools; there is no chronic overhead on developers to collect product and process data. Hackystat does not tie the user to a particular tool, environment, process, or application. It is intended to provide in-process project management support.

2.4.3 QSOS

QSOS21 is a method designed to qualify, select and compare free and Open Source software in an objective, traceable and argued way. It is publicly available under the terms of the GNU Free Documentation License.

2.5 Commercial Metrics Tools

This section aims to document several popular commercial software metrics tools. Where possible, we attempted to assess the properties of commercial metrics tools in a highly heterogeneous and ever-changing software development environment. The chosen tools are therefore able to support the generation and storage of metric data consistently and in a structured way, and provide some degree of customisation with development-specific parameters. Based on the most common categorisation of metrics tools mentioned earlier, product and process metrics tools will be documented.

2.6 Process metrics tools

This section documents several software process metrics tools. Apart from the presentation of the tools, an assessment of their capabilities is performed where possible. The evaluation is based on three basic criteria indicated by other studies [A.I89]: platform independence, input/output functions and automation.

19 http://www.empirical.jp
20 http://www.hackystat.org
21 http://www.qsos.org


Platform The first step in utilising any tool is to install it on an operating system. In the worst case, a tool’s platform requirements cannot be fulfilled by an existing environment, which means a new OS would have to be added, i.e., bought, installed and maintained. Another platform issue is database support: some tools are based on a metric repository and have to rely on some sort of relational database. The range of supported databases affects the tool’s platform interoperability. As some of the tools have both server and client components (for data storage and collection/reporting purposes, respectively), one has to assess these components’ platform interoperability separately.

Input/output Software project quality tracking and estimation tools rely heavily on data from external sources such as UML modelling tools, source code analysers, work effort or change request databases, etc. The ease of connecting to these applications through interfaces or file input substantially influences a metric tool’s efficiency and error-proneness. On the other hand, data often has to be exported for further processing in spreadsheets, project management tools or slide presentations. Reports and graphs have to be created and possibly viewed, posted on the Web, or printed.

Automation A key aspect of metric data processing is automatic data collection. This can range from simple alerts sent to project managers under certain conditions, and periodic extraction of metric information from external tools, to advanced scripting and programming capabilities. Missing automation usually requires tedious and expensive manual data input and makes measurement inconsistencies more likely, as measurements are performed by different persons.

2.6.1 MetriFlame

MetriFlame22, a tool for managing software measurement data, is strongly based on the GQM approach [BCR94]. A goal is defined, then corresponding questions and metrics are determined to assess whether the goal has been reached. Metrics can only be accessed through such a GQM structure; it is not possible to simply collect metrics without formulating goals and questions. The main elements of the MetriFlame tool environment are: the actual MetriFlame tool, data collectors and converters, and components for viewing the results. MetriFlame does not feature a project database; it stores all its data in different files with proprietary formats. The functionality that the tool offers is summarised in Figure 10. MetriFlame supports 32-bit Microsoft Windows environments (Windows 95 and later versions). The database converter requires the Borland Database Engine (BDE) in order to access the different types of databases. BDE is installed during the MetriFlame installation procedure. Data can be imported to MetriFlame by using the so-called data converters,

22 http://www.virtual.vtt.fi


Figure 10: Metriflame functionality

which are not part of the MetriFlame tool but separate programs. These programs convert the data and generate structured text files, which can then be imported into MetriFlame. New data can also be entered manually. The process of data collection cannot be automated. Project data can only be saved in a MetriFlame project file; no other file format is available. Reports (graphs) can be saved as WMF, EMF, BMP, JPEG or structured text. MetriFlame does not feature an estimation model.

2.6.2 Estimate Professional

Estimate Professional23 is a tool to control project data, create project estimates based on different models or historical project data, and visualise these estimates. Different scenarios can be created by changing project factors. Estimate Professional is an extended and improved version of “Estimate”, a freely available program which can perform only basic size-based estimates, does not feature reporting and does not consider risk factors. Estimate Professional does not feature a project database; it stores all project information in a single file. Initially, project data is entered by creating a new project and starting the estimate wizard. After specifying project-related information like the type of project, current phase of the project, maximum schedule and priority of a short schedule, one has to choose between size-based estimation, which focuses on artifact metrics (LOC, number of classes, function points), and effort-based estimation, which focuses on effort metrics (staff-months). Estimates in Estimate Professional are based on three models: the Putnam Methodology, COCOMO II and Monte Carlo simulation. Estimates can be calibrated in three ways: using the outcome of historical projects from the project database;

23 http://www.workflowdownload.com/


Figure 11: Estimate Professional.

altering the project type by choosing subtypes for parts of the project; or tuning the estimation by changing productivity drivers like database size or programmer capability. A screenshot of the tool is presented in Figure 11. Estimate Professional supports MS Windows 95/98, NT 4.0 and 2000. For installation on NT systems, administrator rights are required. Project data can be imported from a Microsoft Project file or from a CSV file. The process of data collection cannot be automated. Project data can be exported to a Microsoft Project file; project metrics can be exported into a CSV file.

2.6.3 CostXpert

The software cost estimation tool CostXpert24 produces estimates of project duration, costs, staff effort, labour costs etc. using software size, labour costs, risk factors and other input variables. The tool features mappings of source-lines-of-code equivalents for more than 600 different programming languages. The main menu of the tool is presented in Figure 12. Import of project data is limited to manual entry. Data connectors to tools processing software artifacts do not exist. Data can be exchanged between different copies of CostXpert via CostXpert project files. The process of data collection cannot be automated. Regarding the estimation process, CostXpert integrates multiple software sizing methods and is compliant with COCOMO and over 32 lifecycles and standards. CostXpert is designed to aid project control, facilitate process improvement and earn a greater return on investment (ROI). Especially for COTS products, the tool is able to estimate the portion of the package that needs no modification but should be configured and parameterised, the portion of the package that needs to be modified,

24 http://www.costxpert.com/


Figure 12: Cost expert main menu

and the amount of functionality that should be added to the system. Project data in a work breakdown structure can be exported to Microsoft Project or Primavera TeamPlay. The expected labour distribution can be exported to a CSV file. Customised project types, standards and lifecycles can be exported to so-called customised data files. Reports can be printed or exported as PDF, RTF or HTML files. Graphs can be exported as BMP, WMF or JPEG files. CostXpert integrates more than 40 different estimation models based on data from over 25,000 software projects. CostXpert supports MS Windows 95 and all later versions. CostXpert does not feature a project database; project data is stored in a project file in a proprietary format.

2.6.4 ProjectConsole

ProjectConsole25 is a Web-based tool for project control that offers project reporting capabilities to software development teams. Project information can be extracted from Rational tools or other third-party tools, is stored in a database, and can be accessed through a Web site. Rational ProjectConsole makes it easy to monitor the status of development projects and utilise objective metrics to improve project predictability. Rational ProjectConsole greatly simplifies the process of gathering metrics and reporting project status by creating a project metrics Web site based on data collected from the development environment. This Web site, which Rational ProjectConsole updates on demand or on schedule, gives all team members a complete, up-to-date view of the project environment. Rational ProjectConsole collects metrics from the Rational Suite development platform and from third-party products, and presents the results graphically in a customisable format

25 http://www-128.ibm.com


Figure 13: Rational ProjectConsole.

to help the assessment of progress and quality. Rational ProjectConsole supports MS Windows XP; Windows NT 4.0 Server or Workstation, SP6a or later; and Windows 2000 Server or Professional, SP1 or later. All the data is stored in a database, the so-called metric data warehouse. Supported databases include SQL Server, Oracle and IBM DB2. ProjectConsole needs a Web server (IIS or Apache Tomcat) to publish its data over a network (local network or the Internet). The project Web site can be viewed with any browser. ProjectConsole can extract metrics directly from Rational ClearQuest, RequisitePro, Rose, and Microsoft Project repositories. In addition, ProjectConsole provides so-called collection agents that can parse Rational Purify, Quantify, Coverage, and ClearCase data files. Automatic collection tasks can be scheduled to run daily, weekly or monthly at a specified date and time. The data is extracted from the source programs and stored in the metric data warehouse. The project Web site is automatically updated. Graphs are stored in PNG files. Data can be published in tables and exported into HTML format. MS Excel 2000 or later can be used to import the HTML table format. ProjectConsole does not feature an estimation model. Figure 13 depicts the multi-chart display of ProjectConsole.

2.6.5 CA-Estimacs

Rubin has developed a proprietary software estimating model26 that utilises gross business specifications for its calculations. The model provides estimates of total development effort, staff requirements, cost, risk involved, and portfolio effects. The ESTIMACS model addresses three important aspects of software management: estimation, planning, and control. The ESTIMACS system includes five modules.

26 http://www.ca.com/products/estimacs.htm


The first module is the system development effort estimator. This module requires responses to 25 questions regarding the system to be developed, the development environment, etc. It uses a database of previous project data to calculate an estimate of the development effort. The staffing and cost estimator is another. Inputs required for this module are the effort estimation from above, data on employee productivity, and the salary for each skill level. Again, a database of project information is used to compute the estimate of project duration, cost, and staffing required. The hardware configuration estimator requires as inputs information on the operating environment for the software product, total expected transaction volume, generic application type, etc. Its output is an estimate of the required hardware configuration. The risk estimator module calculates risk using answers to some 60 questions on project size, structure, and technology. Some of the answers are computed automatically from other information already available. Finally, the portfolio analyser provides information on the effect of this project on the total operations of the development organisation. It provides the user with some understanding of the total resource demands of the projects.

2.6.6 Discussion

The tools evaluated provide a broad variety of analysis capabilities and different degrees of explicit estimation support. However, they all allow storing and comparing project measures in a structured way. Certain conclusions can be drawn on whether the tools can integrate seamlessly into an existing and heterogeneous software development environment. All of the evaluated tools are only available on one operating system (MS Windows). This is particularly problematic for server components, as many times a dedicated server would have to be added to an otherwise Unix-based server farm. Some tools only work with particular database engines, for example ProjectConsole. In addition to manual data entry, the tools generally are restricted to a few input file formats (e.g. Estimate Professional only reads Microsoft Project and CSV files). While communication with spreadsheet applications is usually supported, few tools can access development tools like integrated development environments (IDEs) or requirement databases directly. Tools with advanced metric data collection capability (like MetricCenter) offer only a limited set of connectors to specific development tools, which have to be purchased separately; their communication protocol is undisclosed. Automation support is either not available (MetriFlame, Estimate Professional, CostXpert) or limited to pull operations (MetricCenter). The degree of flexibility with respect to defining new metrics and changing reports differs greatly; however, all tools provide only basic reporting flexibility. This would not be a problem in itself if the tools allowed unrestricted data access for online analytical processing (OLAP) reporting tools, but this is not possible with most of the tools either. Data output for further processing is sometimes limited to CSV files and a proprietary file format (MetriFlame).


Tools often do not support common reporting file formats like PDF. Output automation is supported by few of the evaluated tools (MetricCenter, ProjectConsole). Some tools, instead of supporting integration, seem to duplicate features which are normally already available in medium and large-scale IT environments: some tools introduce a proprietary file format (MetriFlame), or are limited to a particular database system instead of accessing the company’s reliable database infrastructure. Some basic graphical reporting and Web-publishing features are provided, instead of feeding advanced OLAP reporting tools, whose use would also automatically eliminate the need to duplicate features for the handling of user access rights. Finally, the difficulties in getting access to some tools pose an additional cost barrier to integrating them in existing IT environments, and seem to indicate that at least some of these tools do not provide user interfaces with a low learning curve. Altogether, process engineers and portfolio managers operating in highly dynamic environments must still expect substantial costs when evaluating, integrating, customising, operating and continuously adapting planning and monitoring tools. Even tools with advanced architectures like MetricCenter offer a limited set of supported development tools, restricted customisation capabilities due to undisclosed data protocols, and platform restrictions. Proprietary approaches to security and user access concerns further complicate integration. Much work needs to be done to lower the technological barrier for collecting software metrics in a varying and changing environment. Possible approaches to some of the current problems are likely to embrace the support of modern file formats like XML, and light-weight data communication using, for example, the SOAP protocol.

2.7 Product metrics tools

The initial target of product metrics tools was the assessment of objective measures of software source code regarding size and complexity. As experience has been gained with metrics and models, it has become increasingly apparent that metric information available earlier in the development cycle can be of greater value in controlling the process and results. Along with the calculation of several metrics values, the tools attempt to support testing procedures as well, taking into consideration the information coming from the metrics values. In this section a number of product metrics tools will be discussed. These tools were chosen because of their wide use or because they represent a particularly interesting point of view. The tools presented reflect the areas where most work on product metrics has been done. References have been provided for readers who are interested in further examining a tool.


2.7.1 CTC++, CMT++ and CTB

CTC++, CMT++ and CTB27 are all tools developed by the Finnish company Testwell and available from Verifysoft for Microsoft Windows, Solaris, HP-UX and Linux. They focus on test coverage (CTC++), metric analysis (CMT++) and unit testing (CTB) for C/C++ source code. CTC++ is a coverage tool supporting the testing and tuning of programs written in C and C++. This coverage analyser supports function, decision, statement, condition and multi-condition coverage, presenting the results in a text or HTML report. The analyser is available for coverage measuring on the host and on operating systems, as well as for embedded systems. The tool is integrated with Microsoft Visual C++, the Borland compiler and WindRiver Tornado. CMT++ is a tool for assessing code complexity. Code complexity affects how difficult it is to test and maintain an application, and complex code is likely to contain errors. Metrics like McCabe cyclomatic complexity, Halstead’s software metrics and lines-of-code measures are supported by the tool. The tool can be customised by the user for company coding standards. CMT++ identifies complex and error-prone code. As there is usually too little time to inspect all the code carefully, it is an important step to select the most error-prone modules. CMT++ also gives an estimate of the number of test cases needed to test all paths of a function, and gives an idea of how many bugs one should find to have “clean” code. CTB is a module testing tool for the C programming language that allows testing the code at a very early development stage, resulting in the prevention of bugs. As soon as a module compiles, the test bed can be generated on it without any additional programming. The tool supports a specification-based (black-box) testing approach, from “ad-hoc” trials to systematic script-based regression tests. Tests can run in an interactive mode with a C-like command interface, as well as script- or file-based and made automatic. Test execution behaves as if the test driver read the test main program and immediately executed it command by command, showing what happens. CTB works together with coverage analysis tools, such as CTC++.

2.7.2 Cantata++

Cantata++28 is a commercial tool for unit and integration testing, coverage and static analysis. It is built on the Eclipse v3.2 Open Source development platform, including the C Development Tools (CDT). The unit and test integration capabilities of the environment support automated test script generation by parsing source code to derive parameter and data information, with stubs and wrappers automatically generated into the test script. Stubs provide programmable dummy versions of external software, while wrappers are used for establishing programmable interceptions to the real external software.

27 http://www.verifysoft.com
28 http://www.ipl.com/products/tools/pt400.uk.php


Figure 14: Cantata++ V5 - a fully integrated Test Development and Analysis Environment

The building and execution of tests, both black box and white box, is supported by the tool and also via the developer’s build system. Verification of the code is also supported by providing sequential execution of test cases based on wrappers and stubs. The test cases defined in verification can be reused for inherited classes and template instantiation. Figures 14 and 15 present the environment of the tool.

Coverage analysis provides measurement of how effective testing has been in executing the source code. Configurable coverage requirements are defined in rule sets that are integrated into dynamic tests, resulting in pass/fail verdicts for coverage requirements. The coverage metrics used by the tool are the following:

• Entry points

• Call Returns

• Statements

• Basic Blocks

• Decisions (branches)

• Conditions MC/DC (for DO-178B)

Cantata has certain features that support coverage especially for applications developed in Java, such as reuse of JUnit tests with coverage by test case, and builds with ANT. Static analysis generates over 300 source code metrics. The results of these metrics are stored in reports that can be used to help enforce code quality standards. The metrics defined are both procedural and product metrics.


Figure 15: Automated Test Script, Stub and Wrapper generation

Procedural metrics involve code lines, comments, functions and counts of code constructs. Product metrics calculate the Myers, MOOSE, McCabe, MOOD, Halstead, QMOOD, Hansen, Robert Martin, Object Oriented and Bansiya class entropy metrics. Cantata++ can be integrated with many development tools, including debuggers, simulators/emulators, UML modelling, project management and code execution profilers.

2.7.3 TAU/Logiscope

Logiscope29 supports automated error-prone module detection and code reviews for bug detection. This is enabled by the use of quality metrics and coding rules to identify the modules that are most likely to contain bugs. Finally, the tool provides direct navigation to the faulty constructs and improvement recommendations. There is a set of predefined coding and naming rules and quality metrics, which can be customised to comply with specific types of project and organisational guidelines, along with reuse industry standards. The main aspect of the tool is the establishment of best coding practices that are used both to test the existing code and to train developers. Logiscope supports three basic functions: RuleChecker, Audit and TestChecker. RuleChecker checks code against a set of programming rules, preventing language traps and code misunderstandings. There are over 220 coding and naming rules initially in the tool, with the potential for other rules to be added. Logiscope Audit locates error-prone modules and produces quantitative information based on software metrics and graphs, which is used for the analysis of problems and the rendering of corrective decisions. The decision may involve either rewriting the module or testing it more thoroughly.

29 http://www.telelogic.com/products/logiscope/index.cfm


Figure 16: Results presented in Logiscope

The software metrics templates used to evaluate the code are ISO 9126 compliant. Templates, as mentioned, can be customised to fit project-specific requirements. Logiscope TestChecker measures structural code coverage and shows uncovered source code paths, resulting in the discovery of bugs hidden in untested source code. TestChecker is based on a source code instrumentation technique that is adaptable to test environment constraints. Figure 16 shows the way results are depicted by Logiscope. All three functions of the tool are based on internationally recognised standards and models such as SEI/CMM, DO-178B and ISO/IEC 9126 and 9001. Several techniques are supported that methodically track software quality for organisations at SEI/CMM Level 2 (repeatable) that want to reach Level 3 (defined) and above. “Reviews and Analysis of the Source Code” and “Structural Coverage Analysis”, as required by the avionics standard DO-178B for software systems from Levels E to A, are partially supported by Logiscope, as are the “Quality Characteristics” defined by ISO/IEC 9126. The Logiscope product line is available for both UNIX and Windows.

2.7.4 McCabe IQ

McCabe IQ30 manages software quality through advanced static analysis based on McCabe’s research in software quality measurement, and tracks the system’s metric values over time to document the progress made in improving the overall stability and quality of the project. The tool identifies error-prone code by using several metrics:

• McCabe Cyclomatic Complexity

30 http://www.mccabe.com/iq.htm


Figure 17: Battlemap in McCabe IQ

• McCabe Essential Complexity

• Module Design Complexity

• Integration Complexity

• Lines of Code

• Halstead

By using the above metrics, complex code is identified. Figure 17 shows an example of how complex code identification is presented to the user. The Battlemap uses colour coding to show which sections of code are simple (green), somewhat complex (yellow), and very complex (red). Figure 18 presents the metric statistics that the tool calculates. Another function supported is the tracking of redundant code using a module comparison tool. This tool allows the selection of predefined search criteria or the establishment of new criteria for finding similar modules. After the selection of the search criteria, the process is as follows: selection of the modules used for matching, specification of the programs or repositories that will be searched, and finally localisation of the modules that are similar to the ones used for matching, based on the search criteria selected. It is then determined whether there is any redundant code. If redundant code is found, it is evaluated and, if needed, reengineered. The tool provides a series of data metrics. The parser analyses the data declarations and parameters in the code. The result of this analysis is the production of metrics based on data. There are two kinds of data-related metrics: global data and specified data. Global data refers to those data variables that are declared as global in the code.


Figure 18: Presentation of the metrics statistics

Based on the results of the parser’s data analysis, reports are produced that show how global data variables are tied to the cyclomatic complexity of each module in the code. As cyclomatic complexity and global data complexity increase, so does the likelihood that the code contains errors. Specified data refers to the data variables that are specified in what is called a specified data set in the data dictionary. In general, a data set is specified in the data dictionary when one or more variables have to be located in the code in order to analyse their association with the complexity of the modules in which they appear. The tool includes a host of tools and reports for locating, tracking, and testing code containing specified data, as well as for enforcing naming conventions. The tool is platform independent and supports Ada, ASM86, C, C++ .NET, C++, COBOL, FORTRAN, Java, JSP, Perl, PL1, VB, and VB.NET.

2.7.5 Rational Functional Tester (RFT)

Rational Functional Tester31 is an automated functional and regression testing tool for Java, Visual Studio .NET and Web-based applications. It provides automated capabilities for activities such as data-driven testing, and it includes pattern-matching capabilities for test script resiliency in the face of frequent application user interface changes. RFT incorporates support for version control to enable parallel development of test scripts and concurrent usage by geographically distributed teams. The tool includes several components. IBM Rational Functional Tester Extension for Siebel Test Automation provides automated functional and regression testing for Siebel 7.7 applications.

31 http://www-306.ibm.com/software/awdtools/tester/functional/features/index.html


Combining advanced test development techniques with the simplification and automation of basic test needs, the extension accelerates the process of system test creation, execution and analysis to ensure the early capture and repair of application errors. IBM Rational Manual Tester is a manual test authoring and execution tool for testers and business analysts. The tool enables test step reuse to reduce the impact of software change on manual test maintenance activities, and supports data entry and verification during test execution to reduce human error. IBM Rational TestManager is a tool for managing all aspects of manual and automated testing from iteration to iteration. It is the central console for test activity management, execution and reporting, supporting manual test approaches and various automated paradigms including unit testing, functional regression testing, and performance testing. Rational TestManager is meant to be accessed by all members of a project team, ensuring high visibility of test coverage information, defect trends, and application readiness. IBM Rational Functional Tester Extension for Terminal-based Applications allows testers to apply their expertise to the mainframe environment while continuing to use the same testing tool used for Java, VS.NET and Web applications.

2.7.6 Safire

SAFIRE32 Professional is a fully integrated development and run-time environment optimised for the implementation, validation and observation of signalling systems. It is used for a wide range of applications, such as gateways, signalling testers and protocol analysers. The tool is based on international standards such as UML, SDL, MSC, ASN.1 and TTCN (ITU-T, ETSI, ANSI, ISO). SAFIRE supports testing features for signalling systems that can be validated to various levels of confidence, from top-level tests to detailed conformance tests according to international standards. The tests generated are automated, deterministic, reproducible and documented. The tool has a modular architecture that involves the following components:

• SAFIRE Designer - graphical editor, viewer, compiler

• SAFIRE Campaigner - test execution and report generator

• SAFIRE Animator - slow motion replay (actions, events, behaviour)

• SAFIRE Tracer - protocol analyser

• SAFIRE Organiser - version control and project management

• SAFIRE VM Virtual Machine - high performance virtual machine

The component that is most involved in quality assurance is the Campaigner, which supports automated execution of tests. This component creates, edits, manages and executes test campaigns, allowing the configuration of parameters.

32 http://www.safire-team.com/products/index.htm


Campaigner also produces test reports in the form of pass/fail quality verdicts for modules. The tool also allows automated repetition of certain tests. The quality rules that are used during the design and the testing of the code are the following:

• System structure

• Naming conventions-existence

• Naming conventions-properties

• SDL simplicity

• Uniqueness

• Modularity

• Proper-functionality

• Comments

• Communication

• Events

• Behaviour

2.7.7 Metrics 4C

Metrics4C33 calculates software metrics for individual modules or for the entire project. These tools run interactively, or in the background on a daily, weekly, or monthly basis. The software metrics calculated for an individual module include:

• Lines of code

• Number of embedded SQL lines

• Number of blank lines

• Number of comment lines

• Total number of lines

• Number of decision elements

• Cyclomatic complexity

• Fan out

33 http://www.plus-one.com


The above values are then summed to provide the respective project metrics. In addition, other project metrics calculated include:

• Average project cyclomatic complexity

• Project fan out metric (with and without leaf nodes)

• Total number of procedures and functions

• Total number of source code and header files

• Lines of code in source code and header files

• Total number of source code files unit tested

• Number of embedded SQL statements

• Lines of code unit tested

• Percent of files unit tested

• Integration Test Percentage

The Integration Test Percentage (ITP) provides a numeric value indicating how much of the project’s source code has been tested, and can be used to better prepare for Formal Qualification Testing (FQT). Output from Metrics4C can easily be imported into a spreadsheet program to graphically display the data. Metrics4C can also flag warnings if the lines of code or the cyclomatic complexity value exceeds a specified maximum.
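The exact formula for ITP is not spelled out here; assuming the natural reading, unit-tested lines as a percentage of total lines, a toy computation would look like this (the figures are invented):

    def integration_test_percentage(loc_unit_tested, total_loc):
        # Assumed definition: unit-tested LOC as a percentage of total LOC.
        if total_loc == 0:
            return 0.0
        return 100.0 * loc_unit_tested / total_loc

    print(integration_test_percentage(42000, 56000))  # 75.0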

2.7.8 Resource Standard Metrics

Resource Standard Metrics34 (RSM) generates source code metrics for C/C++ and Java on any operating system. Source code quality metrics and complexity are measured by this tool from the written source code, with the aim of evaluating the project’s performance. Source code metric differentials can be determined between baselines using RSM code differential work files. Source code metrics (SLOC, KSLOC, LLOC) from this tool can provide line-of-code-derived function point metrics. RSM is compliant with ISO 9001, CMMI and AS9100. Typical functionality of RSM enables:

• The determination of source code LOC, SLOC, KSLOC for C, C++ and Java

• Measurement of software metrics for each baseline and determination of metric differentials between baselines

34 http://msquaredtechnologies.com/


• Capturing baseline code metrics independently of metric differentials in order to preserve history.

• Reporting of CMMI and ISO metrics for code compliance audits

• Performance of source code static analysis, best used for code peer reviews

• Removal of tabs and conversion from DOS to UNIX format.

• Measurement and analysis of source code for outsourced or subcontracted code.

• Measurement of cyclomatic code complexity and analysis of interface complexity for maintenance.

• Creation of user-defined code quality notices with regular expressions, or utilisation of the 40 predefined code quality notices.

2.7.9 Discussion

Most of the testing and product metrics tools provide the online capability to record defect information including severity, class, origin, phase of detection, and phase introduced. Several tools automate the testing procedure by providing estimation of error-prone code and automatically generating results and reports. Metrics tools provide a variety of metrics reports or transport data into spreadsheets or report generators. Query and search capabilities are also provided. Users have the capability to customise tools to meet their organisation’s unique requirements. For example, users can customise quality rules, workflow, queries, reports, and access controls. Other common features of the tools studied include:

• Graphical user interface.

• Integration with databases, spreadsheets, version control tools, configuration management systems, test tools, and e-mail systems.

• Support for ad hoc queries and reports.

• Support for standards, e.g., CMMI, DoD-STD-2167A and ISO 9000.

• Support for distributed development.

• Ability to link defects and track duplicate defect reports.

Metrics capabilities of tools in most cases involve:

• Data gathering.

• Measurement analysis.

• Data reporting.


3 Empirical OSS Studies

3.1 Evolutionary Studies

3.1.1 Historical Perspectives

Back in 1971, in his book titled “The Psychology of Computer Programming,” Gerald M. Weinberg was probably the first to analyse so-called “egoless programming,” meaning non-selfish, altruistic programming. This term was used to describe the functioning of a software development environment in which volunteers participate actively by discovering and fixing bugs, contributing new code, expressing ideas, etc. These activities take place without any direct material reward. Weinberg further observed that when developers are not territorial about their code and encourage other people to look for bugs and potential improvements, improvement happens much faster [Wei71].

Several years later, Frederick P. Brooks, in his classic “The Mythical Man-Month: Essays on Software Engineering,” predicted that OSS developers would play a significant role in software engineering in the future. In addition, he claimed that maintaining a widely used program is typically 40% or more of the cost of developing it. This cost is strongly affected by the number of users or developers of the specific project: as more people find more bugs and other flaws, the overall cost of the software is reduced. Brooks concluded [Bro75] that this is why OSS can be competitive with, and sometimes even better than, conventionally built software.

In his influential article "The Cathedral and the Bazaar," Eric Steven Raymond gathered and presented the main features of OSS development. Starting with the analysis of his own OSS project, Fetchmail, he distinguished the classical "Cathedral-like" way of developing commercial software from the new, "Bazaar-like" world of Linux and other FOSS projects. Eventually, he came up with a series of lessons to be learned, which can very well serve as principles that make a FOSS project successful [Ray99].

According to the OSS history written by Peter H. Salus [Sal], there are indications that OSS development has its roots in the 1980s or even earlier. But Raymond's article was actually the first attempt at a systematic approach to OSS and its methods. His work, though, has met a lot of opposition, both in the FOSS community [DOS99] and in academic circles [Beza, Bezb], for being too simplistic and shallow. No matter how controversial Raymond's article is, its main contribution is that it raised widespread interest in OSS empirical studies. Since the dawn of the new millennium, a satisfactory number of research essays on this subject have been published. Some findings of these essays are described below, in order to gain a deeper understanding of the evolution of several famous OSS projects.


3.1.2 Linux

The Linux operating system kernel is the best-known FOSS project worldwide, and therefore a case worthy of closer study. The Linux project started in 1991 as a private research project by a 22-year-old Finnish student named Linus Torvalds. Dissatisfied with the existing operating systems, he started programming a kernel himself, based on code and ideas from Minix, a tiny Unix-like operating system. Linux's first official release, version 1.0, occurred in March 1994.

Today, Linux is one of the dominant computer operating systems, enjoying worldwide acceptance. It is a large system: it contains over four million lines of code and new versions are released very often. It has engaged hundreds of developers, who have willingly dedicated a lot of their time to fixing bugs, developing new code and reporting their ideas for its evolution. According to the relevant Wikipedia article, it is estimated that Linus Torvalds himself has contributed only about 2 per cent of Linux's code, but he remains the ultimate authority on what new code is incorporated into the Linux kernel. Such a case is definitely a fine example of how a FOSS community can work successfully by gathering the powers of a large, geographically distributed community of software specialists.

The growth of the Linux operating system began by following two parallel paths: the stable and the development releases. A stable release contains features that have already been tested, showing proven stability, ease of use and lack of bugs. A development release contains more features that are still in an experimental phase, and therefore lacks stability and contains more bugs. As one would expect, there are more development releases than stable ones. The features of development releases that have been adequately tested are incorporated into the next stable release. This development concept has played a big part in the project's success, as it provides conventional users with a reliable operating system (the stable release) while at the same time giving software developers the freedom to experiment and try new features (the development release).

Following Raymond’s analysis on development method of the Linux operating sys-tem, Godfrey and Tu presented a research of Linux’s evolution over the years from1994 till 1999 [GT00]. As they say, most might think that as Linux got bigger andmore complex, its growing pace should slow down. This is also what the well-knownLehman’s laws of software evolution suggest: “as systems grow in size and com-plexity, it becomes increasingly difficult to insert new code” [LRW+97]. In the samecontext, Turski analysed several large software systems - that were all created andmaintained by small, predefined teams of developers using traditional managementtechniques. From his study, Turski posits that system growth is usually sub-linear.That is, a software system slows down as the system grows in volume and complex-ity [Tur96]. Also Parnas referred to this subject, by comparing software aging withhuman aging [Par94].



Figure 19: Growth of the compressed tar file for the full Linux kernel source release ([GT00], p.135).

But the findings of Godfrey and Tu, after studying the evolution of Linux, indicated a different trend. The methodology they employed was to examine Linux both at the overall system level and at each of the major subsystems. In this way, they were able to study not just the whole system's evolution in size, but each major subsystem's volume as well. This approach provides more information, as individual subsystems do not necessarily follow the same evolution patterns as the overall system. A sample of 96 kernel versions was selected, including 34 stable releases and 62 development releases. Two main metrics were used in this research: the size of tar files and the number of lines of code (LOC). A tar file includes all the source artifacts of the kernel, such as documentation, scripts and other files, but no binary files. LOC were counted in two ways: with the Unix command wc -l (which includes blank lines and comments) and with an awk script (which ignores blank lines and comments).
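To make the two counting conventions concrete, the following minimal Python sketch counts a C source file both ways. The comment-stripping rules are a simplification of our own, since [GT00] does not reproduce the original awk script; multi-line /* ... */ comments, for instance, are not handled.

```python
import sys

def count_loc(path):
    """Count lines two ways: raw, as `wc -l` would, and with blank
    lines and whole-line comments skipped."""
    raw = filtered = 0
    with open(path, errors='replace') as f:
        for line in f:
            raw += 1
            stripped = line.strip()
            if not stripped:
                continue                  # blank line
            if stripped.startswith('//'):
                continue                  # C++-style comment line
            if stripped.startswith('/*') and stripped.endswith('*/'):
                continue                  # one-line /* */ comment
            filtered += 1
    return raw, filtered

for path in sys.argv[1:]:
    raw, filtered = count_loc(path)
    print(f"{path}: {raw} raw, {filtered} non-blank/non-comment lines")
```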

Regarding the overall system's growth, the results of this research show that the development releases grew at a super-linear rate over time, while the stable releases grew at a much slower rate (Figures 19 and 20). These tendencies are common to both metrics used. It is therefore clear that Linux's development releases follow an evolution pattern that differs from Lehman's laws of software evolution. We can support the view that this happens due to the way development releases are built: they attract capable developers who are willing to contribute to the system's growth. As the project's popularity rises, more developers are attracted to it and more code is contributed. The stable releases, which follow a more conservative development path and do not accept new contributions too easily, show a slower rate of size growth.

As for the growth of the major subsystems, Godfrey and Tu selected the following 10 subsystems:


Figure 20: Growth in the number of lines of code measured using two methods: the Unix command wc -l, and an awk script that removes comments and blank lines ([GT00], p.135).


• drivers: contains the drivers for various hardware devices

• arch: contains the kernel code that is specific to particular hardware architectures/CPUs

• include: contains most of the system’s include (header) source files

• net: contains the main networking code

• fs: contains support for various kinds of file systems

• init: contains the initialisation code for the kernel

• ipc: contains the code for inter-process communications

• kernel: contains the main kernel code that is architecture independent

• lib: contains the library code

• mm: contains the memory management code

Figure 21 shows the evolution of each of these subsystems in terms of LOC. We notice that the drivers subsystem is both the biggest subsystem and the one with the fastest growth. In Figure 22, a comparative analysis of each subsystem's LOC versus the overall system's LOC is presented. We can see that drivers occupy more than 60


Figure 21: Growth of SLOC of the major subsystems of Linux (development releases) ([GT00], p.138).

per cent of the total system's size, and this percentage is continuously growing. This fact can be explained as a result of Linux's rising popularity: more users wish to run it with many different types of devices, so the respective drivers have to be included in the system.

A more recent observation of Linux's evolution was published by Robles [Rob05]. He employed a methodology similar to that of Godfrey and Tu, but examined all the available releases of Linux (both stable and development) till December 2004, instead of picking a sample. The measurement tool used in this research was SLOCCount, which counts source lines of code in identified source code files. The kernel had grown a lot in comparison to the previous survey: the number of SLOC and the size of the tar file had more than doubled. This trend is visible in Figures 23 and 24: the super-linearity of Linux's evolution is even more remarkable in recent years.

Like Godfrey and Tu, Robles also examined the evolution of Linux's major subsystems, as we can see in Figures 25 and 26. The results were similar: drivers is still the biggest subsystem, though its share of the total Linux kernel has decreased, mainly due to the removal of the sound subsystem in early 2002.

All in all, we conclude that the power of OSS communities can push a project to super-linear growth, in contrast to the typical software evolution rules. Voluntary participation in a software's development ensures that the participants are really interested in it, both as developers and as users. In this case, software is not treated merely as a commercial product, but as a means of improving people's lives. Linux is a very good example of such a case.


Figure 22: Percentage of SLOC for each major subsystem of Linux (development releases) ([GT00], p.138).

Figure 23: Growth of SLOC of Linux for all the stable and development releases ([Rob05], p.89).


Figure 24: Growth of the tar file (right) and the number of files (left) for the full Linux kernel source release ([Rob05], p.90).

Figure 25: Growth of SLOC of the major subsystems of Linux (development releases) ([Rob05], p.91).


Figure 26: Percentage of SLOC for each major subsystem of Linux (development releases) ([Rob05], p.93).

3.1.3 Apache

Another famous OSS project is the Apache web server. It was begun in early 1995 by Rob McCool, a software developer and architect who was 22 years old at that time. Apache was initially an effort to coordinate the improvement of the NCSA (National Center for Supercomputing Applications) HTTPd program, by creating patches and adding new features. Actually, this was the initial explanation of the project's name: it was "a patchy" server. Later, though, the project's official website claimed that the Apache name was given as a sign of respect to the Native American tribe of Apache. Apache quickly attracted the attention of an initial core team of developers, who formed the "Apache Group," and it was first launched in early 1996 as Apache HTTP version 1.0. At that time, it was actually the only workable Open Source alternative to the Netscape web server. Since April 1996, it has reportedly been the most popular HTTP server on the internet, as it hosts over half of all websites globally.

One of the most comprehensive studies of the Apache server was conducted by Audris Mockus, Roy T. Fielding and James Herbsleb in 2002 [MFH02]. In this research, they discuss the way Apache development occurred and present some quantitative results on Apache's development evolution. The following information is based on this article.

As we mentioned earlier, the "Apache Group" was formed at the initial stage of the project and was charged with the project's coordination. It was an informal organisation of people, consisting entirely of volunteers, who all had other full-time jobs. Therefore, they decided to employ a decentralised, distributed development approach that supported asynchronous communication. This was achieved through the


Figure 27: The cumulative distribution of contributions to the code base ([MFH02], p.321)

use of mailing lists, newsgroups and the problem reporting system (BUGDB). Every developer may take part in the project and submit contributions, and the "Apache Group" then decides on the inclusion of any code change. Apache core developers are free to choose the area of the project that most attracts them and to leave it when they are no longer interested in it.

Mockus, Fielding and Herbsleb studied several aspects of Apache's development. Firstly, they examined the participation of the project's development community, which numbers almost 400 individuals, in the two main parts of the software's development: code generation and bug fixes. In Figure 27, we can see the cumulative proportion of code changes (on the vertical axis) versus the top N contributors to the code base (on the horizontal axis), ordered by the number of Modification Requests (MRs) from largest to smallest. Code contribution is measured by 4 factors: MRs, deltas, lines added and lines deleted. The figure shows that the top 15 developers contributed more than 83 per cent of MRs and deltas, 88 per cent of lines added, and 91 per cent of deleted lines. Similarly, Figure 28 shows the cumulative proportion of bug fixes (vertical axis) versus the top N contributors to bug fixing. This time, the core of 15 developers produced only 66 per cent of the fixes.
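The computation behind such cumulative-distribution plots is straightforward: sort contributors by their contribution count and report the running share of the total for the top N. A minimal Python sketch, with invented per-contributor MR counts purely for illustration:

```python
# Invented MR counts, one entry per contributor (not real Apache data).
mrs_per_contributor = [412, 301, 250, 198, 150, 90, 60, 40, 22, 10, 5, 3, 1]

total = sum(mrs_per_contributor)
cumulative = 0
for rank, mrs in enumerate(sorted(mrs_per_contributor, reverse=True), 1):
    cumulative += mrs
    print(f"top {rank:2d} contributors: {100 * cumulative / total:5.1f}% of MRs")
```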

These two figures show that the participation of a wide development community is more important in defect repair than in new code submission. We notice that, despite the broad overall participation in the project, almost all new functionality is created by the core developers. A broad developer community, though, is essential for bug fixing. Mockus, Fielding and Herbsleb made a comparative analysis of these findings against data from several commercial projects. The outcome of this comparison was that in commercial projects, the core developers' contribution to the project's evolution


Figure 28: The cumulative distribution of fixes ([MFH02], p.322)

was significantly lower than in Apache. As an attempt to interpret these findings, we can argue that Apache core developers seem to be very productive compared to commercial software developers. This conclusion is strengthened by the fact that participation in Apache's development is a voluntary, part-time activity.

3.1.4 Mozilla

Mockus, Fielding and Herbsleb [MFH02] also present an analysis of another OSS project, the Mozilla web browser. Mozilla was initially created as a commercial project by Netscape Corporation, which (in January 1998) decided to distribute its Communicator free of charge and to give free access to the source code as well, thereby turning it into an OSS project. Netscape was actually so impressed by Linux's evolution that they were attracted by the idea of developing an Open Source web browser. The project's management was assigned to the "Mozilla Organisation," now named the "Mozilla Foundation." Nowadays, the foundation coordinates and maintains the Mozilla Firefox browser and the Mozilla Thunderbird e-mail application, among others.

Mockus, Fielding and Herbsleb investigated the size of Mozilla's development community. By examining the project's repository, they found 486 code contributors and 412 bug-fix contributors. In Figure 29, we can see the project's external participation over time. The vertical axis represents the fraction of external developers and the horizontal axis represents time. It is clear that participation gradually increases over time, as a result of widespread interest and improved documentation. As an example, it is mentioned that 95 per cent of the people who created problem reports were external, and they committed 53 per cent of the total number of


Figure 29: Trends of external participation in the Mozilla project ([MFH02], p.335)

problem reports. Figure 30 shows the cumulative distribution of code contributions for seven Mozilla modules. In this case, developer contribution does not seem to vary as much as in the Apache project.

Mozilla represents a way in which commercial and Open Source development approaches could be combined. The interdependence among Mozilla modules is high, and the effort dedicated to code inspections is high. Therefore, Mozilla's core teams are bigger than in Apache, employing more formal means of coordinating the project. But the fact is that, despite its commercial development roots, Mozilla managed to leverage the OSS community, achieve high participation and result in a high-quality product.

3.1.5 GNOME

GNOME is also one of the biggest and most famous OSS projects. It is a desktop environment for Unix systems and its name was formed as an acronym of the words "GNU Network Object Model Environment." In 2004, Daniel M. German published a study of GNOME, in order to examine how global software development can lead to success [Ger04b]. The discussion below is based on that article.

The GNOME project was started by Miguel de Icaza, a Mexican software programmer. Its first version was released in 1997 and contained one simple application and a set of libraries. Today, GNOME has turned into a large project, with more than two million LOC and hundreds of developers worldwide. In 2000, the GNOME Foundation (similar to Apache's Software Foundation) was established. It is composed of four entities: the Board of Directors, the Advisory Board, the Executive Director and the members. Many of the participants in the Board of Directors are


Figure 30: The cumulative distribution of contributions to the code base for seven Mozilla modules ([MFH02], p.336).

fully employed in private companies. The Advisory Board is composed of corporate and non-profit organisations. Membership can be granted to any of the current contributors to the project, who may be non-programmers as well. By October 2003, the Foundation counted 320 members. The GNOME Foundation is also responsible for organising sub-committees that run some of the project's administrative tasks, like the Foundation Membership Committee, the Fund-raising Committee, the Sysadmin Committee, the Release Team, etc.

German reaches several interesting conclusions by examining the contributions and the overall project's evolution. First of all, an important factor in GNOME's success is the wide participation in the decision-making process. Developers are treated as equal partners in the project and are inspired by its goals, which explains their motivation to work. Secondly, an essential feature of GNOME is the use of multiple types of communication, like mailing lists, IRC and reports on the project's current state of development. There are scheduled meetings about GNOME's evolution that boost collaboration and team spirit among contributors. Moreover, the creation of task forces makes their members accountable and committed to GNOME's improvement. Finally, there are clear procedures and policies for conflict management, as well as a strong culture of creating documentation, so that contributors know what others are working on.

3.1.6 FreeBSD

FreeBSD is an open-source operating system derived from BSD, the version of Unix developed by the University of California. The project started in


Figure 31: FreeBSD stable release growth by release number ([IB06], p.207)

1993 and its current (end of 2006) stable version is 6.1. It is run by the FreeBSD developers who have commit access to the project's CVS. As it is considered a successful OSS project, it has attracted scientific interest in its evolutionary process. The most recent publication on FreeBSD's evolution is by Clemente Izurieta and James Bieman [IB06]. Building on an earlier study by Trung Dinh-Trong and James Bieman [DTB04] that praised the system's organisational structure, Izurieta and Bieman focused on examining the growth rate of FreeBSD stable releases since the project's inception, employing metrics such as LOC, number of directories, total size in Kbytes, average and median LOC for header (dot-h) and source (dot-c) files, and number of modules for each sub-system and for the system as a whole.

This study indicates that FreeBSD follows a linear (and sometimes sub-linear) rate of growth, as demonstrated in Figures 31 to 35. We observe that dot-c and dot-h files (Figure 34) show a very slight growth in size, which is due to the fact that the system does not evolve in an uncontrolled manner, as Izurieta and Bieman explain. It also has to be clarified that in Figure 35, the contrib subsystem contains software contributed by users, and the sys subsystem is the system's kernel. As one could expect, sys is smaller in size and grows at a slower pace than contrib, because its content goes through a stricter validation process before inclusion in the system.

3.1.7 Other Studies

In recent years, some horizontal studies of OSS projects have been published, in which several projects are examined collectively. One such example is an article by Andrea Capiluppi, Patricia Lago and Maurizio Morisio [CLM04], in which they picked 12 projects from the Freshmeat Open Source portal. These projects were


Figure 32: FreeBSD cumulative growth rate ([IB06], p.207)

Figure 33: FreeBSD release sizes by development branch ([IB06], p.208)


Figure 34: FreeBSD average and median values of dot-c and dot-h files ([IB06], p.209)

Figure 35: FreeBSD contrib and sys sub-systems ([IB06], p.210)


all "alive," meaning that they had shown significant growth over time and there were still developers working on them at the date of the research. Actually, the authors report that during their research on the Freshmeat portal, they discovered that a significant percentage of the hundreds of accessible OSS projects were not evolving anymore, having had no developers and no growth for a considerable amount of time. The authors concluded that the mortality of OSS projects is quite high.

After an initial observation of the sample, they clustered the 12 projects into three categories, as follows:

• Large projects: Mutt, ARLA

• Medium projects: GNUPARTED, Weasel, Disc-cover, XAutolock, Motion, Bubblemon

• Small projects: Dailystrips, Calamaris, Edna, Rblcheck

The authors analysed some basic attributes of these projects, such as size, modules and number of developers. According to their findings, all projects had grown at a linear rate over time, both in terms of size and in terms of the number of developers. Some periodic fluctuations in code size were noticed, mainly caused by internal redesigns of the software, but the long-term trend has been upward in all cases. In large and medium projects, the core teams had grown as well, but in a limited way, which suggests that there is always a ceiling on the expansion of core project teams. The same patterns of linear or sub-linear growth were discovered for the number of modules, too. In a later study, Andrea Capiluppi, Maurizio Morisio and Juan Ramil proceeded to a further examination of the ARLA project, reaching similar conclusions [CMR04].

Finally, another interesting piece of research has been carried out by James W. Paulson, Giancarlo Succi and Armin Eberlein [PSE04]. In order to test the effectiveness of the OSS development process, they investigated the evolutionary patterns of three major OSS projects (Linux, GCC and Apache) in comparison to three closed-source software projects, the names of which were kept confidential. According to their findings, the OSS development structure fosters creativity and constructive communication among developers more effectively than traditional ways of software development, because the new functions and features added to the OSS projects were greater in number and in volume than those added to the closed-source projects. In addition, OSS projects fix bugs and other defects faster, because of the greater number of developers and testers contributing to them. However, the evidence presented in this research does not support the arguments that OSS systems are more modular and grow faster than their closed-source competitors.

3.1.8 Simulation of the temporal evolution of OSS projects

A generic structure for F/OSS simulation modeling

The authors in [ASAB02], and later on in [ASS+05], described a general framework for F/OSS dynamical simulation models and the extra difficulties that have to be confronted relative to analogous models of the closed-source process. It is actually


a framework for discrete-event simulation models, which the authors presented as follows:

1. Much unlike closed-source projects, in F/OSS projects the number of contributors varies greatly in time and is based on the interest that the specific F/OSS project attracts. It cannot be directly controlled and cannot be predetermined by project coordinators. Therefore, an F/OSS model should a) contain an explicit mechanism for determining the flow of new contributors as a function of time and b) relate this mechanism to specific project-dependent factors that affect the overall "interest" in the project.

2. In any F/OSS project, any particular task at any particular moment in time can be performed either by a new contributor or by an old one. In addition, almost all F/OSS projects have a dedicated team of "core" programmers who perform most of the contributions, while their interest in the project stays approximately the same. Therefore, the F/OSS simulation model must contain a mechanism that determines the number of contributions undertaken per category of contributors (e.g. new, old or core contributors) in each time interval.

3. In F/OSS projects, there is also no direct central control over the number of contributions per task type or per project module. Anyone may choose any task (e.g. code writing, defect correction, etc.) and any project module to work on. The allocation of contributions per task type and per project module depends on the following sets of factors:

(a) Programmer profile (e.g. some programmers may prefer code testing to defect correction). These factors can be further categorised as follows:

i. constant in time (e.g. the preference of a programmer for code writing), and

ii. variable with time (e.g. the interest of a programmer in contributing to any task or module may vary based on the frequency of past contributions).

(b) Project-specific factors (e.g. a contributor may wish to write code for a specific module, but there may be nothing interesting left to write for that module).

Therefore, the F/OSS model should (a) identify and parameterise the dependence of programmer interest in contributing to a specific task/module on (i) programmer profile and (ii) project evolution, and (b) contain a quantitative mechanism to allocate contributions per task type and per project module.


4. In F/OSS projects, because there is no strict plan or task assignment mechanism, the total number of Lines of Code (LOC) written by each contributor varies significantly per contributor and per time period, again in an uncontrolled manner. Therefore, project outputs such as LOC added, number of defects or number of reported defects are expected to have much larger statistical variance than in closed-source projects. The F/OSS simulation model should determine the delivered results of particular contributions in a stochastic manner, i.e. by drawing from probability distributions. This is similar to the practice used in closed-source simulation models, with the difference that the probability distributions here are expected to have a much larger variance.

5. In F/OSS projects there is no specific time plan for project deliverables. Therefore, the number of calendar days for the completion of a task varies greatly. Also, delivery times should depend on project-specific factors such as the amount of work needed to complete the task. Therefore, task delivery times should be determined in a stochastic manner on the one hand, while average delivery times should follow certain deterministic rules (a minimal sketch of such a mechanism follows this list).
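Points 4 and 5 can be made concrete with a few lines of Python. The sketch below draws task deliverables and delivery times from wide distributions whose means follow deterministic rules; every parameter value is invented for illustration and does not come from [ASAB02] or [ASS+05].

```python
import random

random.seed(42)  # reproducible illustration

def simulate_task(work_needed):
    """Return (loc_added, delivery_days) for one contribution."""
    # Deliverable size (point 4): a lognormal draw gives the large
    # variance expected of F/OSS contributions.
    loc_added = random.lognormvariate(mu=4.0, sigma=1.2)
    # Delivery time (point 5): stochastic, but its mean grows
    # deterministically with the amount of work the task needs.
    mean_days = 2.0 + 0.05 * work_needed
    delivery_days = random.expovariate(1.0 / mean_days)
    return loc_added, delivery_days

for task in range(3):
    loc, days = simulate_task(work_needed=100)
    print(f"task {task}: {loc:7.1f} LOC delivered in {days:5.1f} days")
```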

The authors concluded that the core of any F/OSS simulation model should be based upon a specific behavioural model that must be properly quantified in order to model the behaviour of project contributors in deciding a) whether to contribute to the project or not, b) which task to perform, c) which module to contribute to and d) how often to contribute. The behavioural model should then define the way that the above four aspects depend on a) programmer profile and b) project-specific factors.

The formulation of a behavioural model must be based on a set of qualitative rules. Fortunately, previous case studies have already pinpointed such rules, either by questioning large samples of F/OSS contributors or by analysing publicly available data in F/OSS project repositories. As previous case studies identified many common features across several F/OSS project types, one can certainly devise a behavioural model general enough to describe at least a large class of F/OSS projects.

Selecting a suitable equation that describes a specific qualitative rule is largely an arbitrary task at the beginning; however, a particular choice may be subsequently justified by the model's demonstrated ability to fit actual results. Once the behavioural model equations and intrinsic parameters are validated, the model may be applied to other F/OSS projects.

Application of an F/OSS simulation model

General procedure

Figure 36 shows the structure of a generic F/OSS dynamic simulation model. As

in any simulation model of a dynamical system, the user must specify as input a) values for project-specific time-constant parameters and b) initial conditions for the project's dynamic variables. These values are not precisely known at project start.


Figure 36: Structure of a generic F/OSS dynamic simulation model. Figure was reproduced from [ASS+05].

One may attempt to provide rough estimates for these values based on the results of other (similar) real-world F/OSS projects. However, these values may be readjusted in the course of the evolution of the simulated project as real data become available. If the simulation does not become more accurate in predicting the future evolution of the project by applying this continuous readjustment of parameters, it means that either a) some of the behavioural model's qualitative rules are based on wrong assumptions for the specific type of project studied, or b) the project-independent values of the behavioural model must be readjusted.

Calibration of the model

The adjustment of the behavioural model's intrinsic parameters is the calibration procedure of the model. According to this procedure, one may introduce arbitrary values for these parameters as reasonable 'initial guesses'. Then, one runs the simulation model, readjusting parameter values until the simulation results satisfactorily fit the results of a real-world F/OSS project in each time window of project evolution. More than one F/OSS project of a similar type may be used in the calibration process (a minimal sketch of such a loop follows).
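In its simplest form, calibration is a fitting loop: simulate, compare to real data, adjust, repeat. The following Python sketch uses a toy one-parameter growth model, an invented real-data series and a plain grid search; it stands in for, and is much cruder than, the authors' actual calibration procedure.

```python
# Invented cumulative-LOC observations for six time windows.
real_loc = [0, 800, 2100, 4200, 7400, 11900]

def simulate(attraction):
    """Toy model: code inflow grows with the project's 'attraction'."""
    loc, series = 0.0, []
    for t in range(len(real_loc)):
        loc += attraction * (1 + 0.5 * t) * 400   # super-linear growth
        series.append(loc)
    return series

def error(param):
    """Squared error between simulated and real series."""
    return sum((s - r) ** 2 for s, r in zip(simulate(param), real_loc))

# Grid search over candidate parameter values (the 'initial guesses').
best_err, best_param = min((error(p / 10), p / 10) for p in range(1, 31))
print(f"calibrated attraction = {best_param:.1f} (squared error {best_err:.0f})")
```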

Validation of the model

Once the project-independent parameters of the behavioural model are properly calibrated, the model may be used to simulate other F/OSS projects.

Practical use of F/OSS simulation models


• Prediction of F/OSS project evolution. Project coordinators may obtain a picture of plausible evolution scenarios for the project they are about to initiate. Software users may also be interested in such predictions, as they would indicate when the software will most likely be available for use. This also applies to organisations, especially if they are interested in pursuing a specific business model that is based on this software.

• F/OSS project risk management. F/OSS projects are risky, in the sense that many not easily anticipated factors may negatively affect their evolution. Simulation models may help in quantifying the impact of such factors, taking into account their probability of occurrence and the effect they may have, should they occur.

• What-if analysis. F/OSS coordinators may try different development processes, coordination schemes (e.g. core programming team), tool usage, etc. to identify the best possible approach for initiating and managing their project.

• F/OSS process evaluation. The nature of F/OSS guarantees that in the future we will observe new types of project organisation and evolution patterns. Researchers may be particularly interested in understanding the dynamics of F/OSS development, and simulation models may provide a suitable tool for that purpose.

Simulation studies and results

Based on the general framework described earlier, the authors of [ASAB02] presented a formal mathematical model based on findings of F/OSS case studies. The simulation model was applied to the Apache project and the simulation outputs were compared to real data. The model was further refined in [ASS+05] and similarly applied to the gtk+ module of the GNOME project. Simulation outputs included the temporal evolution of LOC, active programmers, residual defect density, number of reported defects, etc.

Figures 37 and 38 compare simulation results and real data for LOC vs. time for the Apache project and gtk+, respectively.

Conclusions

In conclusion, the authors in both [ASAB02] and [ASS+05] acknowledged that existing case studies do not contain the complete set of data necessary for a full-scale calibration and validation of their simulation model. Despite this fact, qualitatively, the simulation results demonstrated the super-linear project growth at the initial stages, the saturation of project growth at later stages where a project reached a level of functional completion (Apache), and the effective defect correction, facts that agree with known studies.


Figure 37: Simulation results for the Apache project: cumulative LOC difference vs. time. The bold line is the average of the 100 runs. The gray lines are one standard deviation above and below the average. The dashed vertical line shows the end of the time period for which data was collected in the Apache case study [MFH02]. Figure was reproduced from [ASS+05].

Figure 38: LOC evolution in the gtk+ module of the GNOME project: cumulative LOC difference vs. time. The bold line is the expectation (average) value of the LOC evolution. The gray lines are one standard deviation above and below the average. The dashed vertical line shows approximately the end of the time period for which data was collected in the GNOME case study. Figure was reproduced from [ASS+05].


One of the most evident intrinsic limitations of the F/OSS simulation models, the authors claimed, comes from the very large variances of the probability distributions used. On output, this leads to large variances in the evolution of key project variables, a fact that naturally limits the predictive power of the model.

Finally, the authors concluded that despite the aforementioned intrinsic and extrinsic limitations, their "first attempt" simulation runs fairly demonstrated the model's ability to capture reported qualitative and quantitative features of F/OSS evolution.

3.2 Code Quality Studies

There are few studies regarding the code quality of Open Source software. Most early studies focus on evolutionary aspects of Open Source software and study the evolution laws of Open Source software development. It is not until recently that Open Source code quality studies appeared in highly ranked journals (not white papers by consulting firms or subjective articles, but peer-reviewed research), which explains the small number of available Open Source code quality studies.

One of the first studies that examined code quality in Open Source software was conducted by Stamelos et al. [SAOB02]. In this study, the authors tried to measure the modularity and the structural quality of the code of 100 Open Source applications and tried to correlate the size of the application components with user satisfaction. The measurement of the applications was conducted with a commercial tool (Telelogic Logiscope) and the quality was assessed against a quality standard very similar to that of ISO/IEC 9126. The standard was proposed by the tool itself and, as the authors indicate, is used by more than 70 multinational companies in various areas. The model employed metrics that are a mixture of size, structural and complexity metrics, which can be found in the metrics section of this document. The paper also grounds its findings on statistical foundations.

The tool measured each module of every application and evaluated it against the built-in model. For each criterion, the tool outputs a recommendation level, namely ACCEPT, COMMENT, INSPECT, TEST or REWRITE. The results of the measurement are depicted in Table 1. As the authors note, the table shows that the mean value of acceptable components is about 50%, a value that is neither good nor bad and can be interpreted both ways: either the code quality of the Open Source applications is higher than one could expect, taking into account the nature of Open Source software development and the time of the study, or the quality is lower than the industrial code standard implied by the tool.

Regarding the second part of their study, relating component size and metrics to user satisfaction, the authors did not find any relationship between the majority of the metrics considered and user satisfaction. However, they detected an indication of a relationship between a component's size and user satisfaction (or else the "external quality"


Table 1: Percentage of modules at each recommendation level, as studied by Stamelos et al.

%          Minimum   Maximum   Mean    SD      Median
ACCEPTED   0         100       50.18   18.65   48.37
COMMENT    0         66.66     30.95   14.09   31.83
INSPECT    0         50        8.55    8.5     7.65
TEST       0         16        4.31    4.14    3.55
REWRITE    0         100       5.57    10.73   3.2
UNDEFINED  0         7.69      0.42    1.29    0

of a project). The two size metrics that relate to satisfaction are "Number of statements" and "Program Length". The relation is negative, i.e. the bigger a component is, the worse its "external quality" performs.

The authors conclude that Open Source performs no worse than a standard implied by an industrial tool, and they emphasise the need for more empirical studies in order to clarify Open Source quality performance. The authors suggest (in 2002) that in an Open Source project programmers should follow a programming standard and have a quality assurance plan, leading to high quality code. This suggestion has recently been adopted by large Open Source projects like KDE35.

Another study from the same group, assessing the maintainability of Open Source software, is that of Samoladas et al. [SSAO04]. In this paper, the authors studied the maintainability of five Open Source software projects and one closed-source project, a comparison that is not frequent in the Open Source literature. The measurement was conducted on successive versions, allowing the study of the evolution of maintainability and of how it behaves over time. Maintainability was measured using the Maintainability Index described in section 1.3.3 (sketched below), and the measurement was done with the help of a metrics package found in the Debian r3.0 distribution, which contains tools from Chris Lott's page, and a set of Perl scripts to coordinate the whole process.
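For concreteness, here is one commonly cited three-metric variant of the Maintainability Index in Python; whether its coefficients match the exact formula given in section 1.3.3 should be checked there, and the input averages below are invented.

```python
import math

def maintainability_index(avg_halstead_volume, avg_cyclomatic, avg_loc):
    """Commonly cited three-metric MI variant; lower values indicate
    code that is harder to maintain."""
    return (171
            - 5.2 * math.log(avg_halstead_volume)
            - 0.23 * avg_cyclomatic
            - 16.2 * math.log(avg_loc))

# Illustrative per-module averages (invented values):
mi = maintainability_index(avg_halstead_volume=950.0,
                           avg_cyclomatic=7.2,
                           avg_loc=48.0)
print(f"MI = {mi:.1f}")
```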

The projects under study had certain characteristics: two of them were pure Open Source projects (initiated as Open Source and continuing to evolve as such); the third was an academic project that gave birth to an Open Source project; the fourth was a closed-source project that opened its code and continued as Open Source; the fifth was an Open Source project that was forked into a commercial one, while itself continuing as Open Source; and the last one was the latter commercial fork, whose code is available under a commercial, non-modifiable licence. The result of the study was that in all cases maintainability deteriorates over time. When they compared the evolution of the maintainability of the closed-source fork versus its

35http://www.englishbreakfastnetwork.org/


Figure 39: Maintainability Index evolution for an Open Source project and its closed-source "fork" (Samoladas et al.).

Open Source counterpart, the closed-source version performed worse than the Open Source project. The authors conclude that Open Source code quality, as expressed by maintainability, suffers from the same problems observed in closed-source software studies. They also point out that further empirical studies are needed in order to produce safe conclusions about Open Source code quality.

Another study of the maintainability of Open Source projects, and particularly the maintainability of the Linux kernel, was conducted by Yu et al. [YSCO04]. Here, the authors studied the number of instances of common coupling between the 26 kernel modules and all the other non-kernel modules. By coupling they mean the degree of interaction between modules and thus the dependency between them (coupling is also explained in section 1.3.1 of this document). Additionally, for kernel-based software, they also consider couplings between the kernel and non-kernel modules. The reason they studied coupling as a measure of maintainability is that, as the authors explain, common coupling is connected to fault proneness and thus to maintainability.

This study is a follow-up to previous ones conducted by the same team. In those previous studies, they examined 400 successive versions of the Linux kernel and tried to find relations between size, as expressed by lines of code, and the number of instances of coupling. Their findings showed that the number of lines of code in each kernel module increases linearly with the version number, but that the number of instances of common coupling between kernel modules and all others grows exponentially. In the new study they perform an in-depth analysis of the notion of coupling in the Linux kernel. In order to perform


Figure 40: Maintainability Index evolution for three Open Source projects (Samoladas et al.).

their new study, the authors first refined the definition of coupling and defined different expressions of it (e.g. global variables inside the Linux kernel, global variables outside the kernel, etc.), separating coupling into five categories characterised as "safe" or "unsafe". Then, they constructed an analysis technique and metric for evaluating coupling and applied it to analyse the maintainability of the Linux kernel (a sketch of an instance count follows).
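At its simplest, counting instances of common coupling means counting references to known global variables across source files. The Python sketch below does this with crude regular-expression matching over C sources; the variable names are merely illustrative, and the authors' real analysis parses the code properly rather than pattern-matching it.

```python
import re
import sys

# Illustrative global variable names (stand-ins for the 99 studied).
GLOBALS = ["jiffies", "current", "system_utsname"]

def count_instances(paths):
    """Count textual references to each global across the given files."""
    patterns = {name: re.compile(r'\b%s\b' % re.escape(name))
                for name in GLOBALS}
    counts = {name: 0 for name in GLOBALS}
    for path in paths:
        with open(path, errors='replace') as f:
            text = f.read()
        for name, pattern in patterns.items():
            counts[name] += len(pattern.findall(text))
    return counts

for name, n in count_instances(sys.argv[1:]).items():
    print(f"{name}: {n} instances")
```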

The application of this classification of coupling to the Linux 2.4.20 kernel showed that for a total of 99 global variables (the common expression of coupling) there are 15,110 instances of them, of which 1,908 are characterised as "unsafe". Together with the results from their previous study (the exponential growth of instances), they conclude that the maintainability of the Linux kernel will face serious problems in the long term.

A more recent paper from the same group compares the maintainability, as expressed by coupling, of the Linux kernel with that of the FreeBSD, OpenBSD and NetBSD kernels [YSC+06]. They applied a similar analysis to that in [YSCO04] and compared the performance of Linux against the BSD family (as formal statistical hypotheses). Results showed that Linux contains considerably more instances of common coupling than the BSD family kernels, making it more difficult to maintain and more fault prone to changes. The authors suggest that the big difference between Linux and the BSD family kernels indicates that it is possible to design a kernel without a lot of global variables and, thus, that the Linux kernel development team does not take maintainability much into account.

A more recent study is that of Günes Koru and Jeff Tian [KT05]. Here the two


authors try to correlate change proneness with structural metrics, such as size, coupling, cohesion and inheritance metrics. They suggest, based on previous studies, that change-prone modules are also defect prone and that these modules can be spotted by measuring their structural characteristics. In short, the authors measured two large Open Source projects, namely Mozilla and OpenOffice, using a large set of structural measures which fit into the categories mentioned before. The measurement was done with the Columbus36 tool. In addition, with the help of custom-made Perl scripts, they counted the differences of each application from its immediately preceding revision. The class was considered the smallest software unit. This measurement involved 800 KLOC and 51 measures for Mozilla, and 2,700 KLOC and 46 measures for OpenOffice.

With the results obtained, they questioned whether high-change modules were the same as the modules with the highest measurement values, considering each metric individually. They also compared the results with an older similar study of their own, conducted on six large-scale industrial projects (IBM and Nortel). In order to answer these questions they created appropriate formal statistical hypotheses and tests. The results showed strong evidence that the modules with the most changes did not have the highest measurement values, a finding that also held in the previous industrial study. The authors also performed a similar analysis with clustering techniques. The second analysis led to the same conclusion, but also pointed out that the high-change modules were not the modules with the highest measurement values but those with fairly high measurement values.

The latter was the main outcome of the paper and, as the authors indicate, the same is true for the six industrial applications. Trying to explain this, the authors suggest that it holds because expert programmers in Open Source take on the difficult tasks and novices the easier ones. This might result in the modules with the highest structural measures, which solve complex tasks, not being the most problematic ones. Of course, as they note, this needs further investigation and is a central issue in their future studies.

A very interesting paper, although not directly an Open Source code quality study, is that of Gyimóthy, Ferenc and Siket [GFS05]. The study has as its main goal the validation of the Object Oriented Metrics Suite of Chidamber and Kemerer (the CK suite, as described in section 1.3.1) with the help of Open Source software, not the assessment of the quality of Open Source software per se. In particular, they validated the CK metrics suite on an Open Source project, Mozilla, with the help of the framework and metrics collection tool named Columbus, which was mentioned previously. In order to perform their analysis, besides using Columbus to extract the metrics, they also collected information about bugs in Mozilla from the Bugzilla database, the system that Mozilla uses for bug reporting and tracking. The validation of the

36http://www.frontendart.com


Figure 41: Changes in the mean value of CK metrics for 7 versions of Mozilla (Gyimóthy et al.)

metrics was done with statistical methods such as logistic and linear regression, but also with machine learning techniques, like decision trees and neural networks. The latter techniques were used to predict the fault proneness of the code.

The methodology followed can be summarised as:

1. Analysis and calculation of metrics from the Mozilla code

2. Application of the four techniques (logistic and linear regression, decision trees and neural networks) to predict the fault proneness of the code

3. Analysis of the changes in the fault proneness of Mozilla through seven versions using the results.

The methodology is well described in the paper. As the authors admit, the challenge of the whole process was to associate the bugs from the Bugzilla database with the classes found in the source code. This association was complicated and demanded a lot of iterative work, which is also described in the paper.
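As an illustration of step 2 for one of the four techniques, the following Python sketch fits a logistic regression on class-level CK metrics to predict fault proneness. The tiny data set is invented; the actual study used Columbus output for thousands of Mozilla classes, and scikit-learn is our choice of library, not necessarily theirs.

```python
from sklearn.linear_model import LogisticRegression

# One row per class; columns are CBO, WMC and LOC (a subset of the
# metrics used). Values are invented for illustration.
X = [[2, 5, 120], [14, 30, 900], [4, 8, 200],
     [20, 45, 1500], [1, 3, 60], [11, 25, 700]]
y = [0, 1, 0, 1, 0, 1]   # 1 = at least one Bugzilla bug linked to the class

model = LogisticRegression().fit(X, y)

# Probability that a new class with CBO=12, WMC=28, LOC=800 is faulty.
print(model.predict_proba([[12, 28, 800]]))
```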

From the "pure" software engineering part of the study, the validation of the metrics and the models' predictiveness, the most interesting result is that the CBO metric (Coupling Between Object classes) seems to be the best at predicting the fault proneness of classes. It is easy to notice that, again, the notion of coupling is strongly related to bugs and thus to maintainability. This fact demands further investigation and has to be on our project's research agenda. Regarding the "Open Source" part of the study, the authors observed a significant growth in the values of 5 out of the 7 metrics studied (the seventh being LCOMN - Lack of Cohesion on Methods allowing Negative values - a metric not included in the CK metrics suite). The authors assume that this growth happened because of a big reorganisation of the Mozilla source code with version 1.2. Of course, this justification needs further investigation. Figure 41 shows the changes of the metrics for the seven versions of the Mozilla suite. To conclude, we could say that, although this study does not directly assess


Open Source, it is a very good example of applying empirical software engineering research.

3.3 F/OSS Community Studies in Mailing Lists

3.3.1 Introduction

Free and Open Source Software (F/OSS) development not only exemplifies a viable software development approach, but is also a model for the creation of self-learning and self-organising communities in which geographically distributed individuals contribute to building a particular piece of software. The Bazaar model [Ray99], as opposed to the Cathedral model of developing F/OSS, has produced a number of successful applications (e.g. Linux, Apache, Mozilla, MySQL, etc.). However, the initial phase of most F/OSS projects does not operate at the Bazaar level, and only successful projects make the transition from the Cathedral to the Bazaar style of software development [Mar04].

Participants, who are motivated by a combination of intrinsic and extrinsic motives, congregate in projects to develop software on-line, relying on extensive peer collaboration. Some project participants augment their knowledge of coding techniques by having access to a large code base. In many projects, epistemic communities of volunteers provide support services [BR03], act as distributing agents and help newcomers or users. The F/OSS windfall is such that there is increased motivation to understand the nature of community participation in F/OSS projects.

Substantial research on Open Source software projects has focused on software repositories such as mailing lists to study developer communities, with the ultimate aim of informing our understanding of core software development activities. Mundane project activities, which are not explicit in most developer lists, have also received attention [SSA06], [LK03a]. Many researchers focus on mailing lists in conjunction with other software repositories [KSL03], [Gho04], [LK03a], [HM05]. These studies have provided great insight into the collaborative software development process that characterises F/OSS projects. F/OSS community studies in mailing lists are important because, on the one hand, mailing lists are one major piece of technical infrastructure that F/OSS projects require. On the other hand, F/OSS projects are symbiotic cognitive systems where ongoing interactions among project participants generate valuable software knowledge - a collection of shared and publicly reusable knowledge - that is worth archiving [SSA06]. One form of knowledge repository where the archiving of public knowledge takes place is the project's mailing list.

3.3.2 Mailing Lists

Lists are active and complex living repositories of public discussions among F/OSS participants on issues relating to project development and software use. They


contain 'software trails' - pieces left behind by the contributors of a software project - and are very important in educating future developers [GM03b] and non-developers [SSA06] about the characteristics and evolution of the project and software. Generally, a project will host many lists, each addressing a specific area of need. For example, software developers will consult developer lists, participants needing help with documentation will seek links from lists associated with project documentation, beginners or newbies will confer with mentors' lists, etc. Fundamentally, two forms of activities are addressed in lists:

• core activities, typified by developing, debugging, and improving software. Developer mailing lists are usually the avenues for such activities

• mundane activities [KSL03], [MFH02], [LK03a]. Documentation, testing, localisation, and field support exemplify these activities, and they take place predominantly in non-developer lists [SSA06]

However, expert software developers and project and package maintainers do take part in mundane activities in non-developer mailing lists. They interact with participants and help answer questions others have posted. Sometimes they encounter useful issues which help them to further plan and improve code or overall software quality and functionality. In addition, although mundane activities display a low level of innovativeness, they are fundamental for the adoption of F/OSS [BR03].

3.3.3 Studying Community Participation in Mailing Lists: Research Methodology

Compared to the traditional way of developing proprietary software, F/OSS development has provided researchers with an unprecedented abundance of easily accessible data for research and analysis. It is now possible for researchers to obtain large sets of data for analysis, or to carry out what [Gho04] referred to as 'Internet archaeology' in F/OSS development. However, [Con06] remarked that collecting and analysing F/OSS data has become a problem of abundance and reliability in terms of storage, sharing, aggregation, and filtering of the data. F/OSS projects employ different kinds of repositories for software development and collaboration, from which community activities can be analysed and studied. The figure below shows a methodology by which community participation in mailing lists may be studied. The methodology covers F/OSS project selection, choice of software repository and lists to analyse, the data extraction schema, and the data cleaning procedure used to extract results for analysing community participation in developer and non-developer mailing lists.

Mailing list participants interact by exchanging email messages. A participant posts a message to a list and may get a reply from another participant. This kind of interaction represents a cycle where posters are continuously internalising and

Revision: final 85

Page 87: SQO-OSS D 2 final

D2 / IST-2005-33331 SQO-OSS 22nd January 2007

Figure 42: Methodological Outline to Extract Data fromMailing Lists Archives. Mod-ified from [SSA06] (p.1027).

Revision: final 86

Page 88: SQO-OSS D 2 final

D2 / IST-2005-33331 SQO-OSS 22nd January 2007

externalising knowledge into the mailing lists. In any project’s mailing list, theseposters could assume the role of knowledge seekers and/or knowledge providers[SSA06]. The posting and replying activities of the participants are two variablesthat can be compared, measured and quantified. The affiliation an individual partic-ipation has with others as a result of the email messages they exchange within thesame list or across lists in different projects could be mapped and visualised usingSocial Network Analyses (SNA). For the construction of such an affiliation networkor ’mailing list network’ see ([SSA06], pp. 130-131).
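To make the construction concrete, the following minimal sketch builds such a poster-replier affiliation network from an mbox archive. It assumes Python with the standard mailbox module and the networkx package; the archive filename and the use of the In-Reply-To header to link replies to earlier posts are illustrative assumptions, not prescriptions from [SSA06].

    import mailbox
    import networkx as nx

    def build_affiliation_network(mbox_path):
        """Build a directed graph with an edge replier -> original poster
        for every message that answers an earlier message in the list."""
        archive = mailbox.mbox(mbox_path)
        author_of = {}   # Message-ID -> sender address
        graph = nx.DiGraph()
        for msg in archive:
            sender = msg.get('From', 'unknown')
            msg_id = msg.get('Message-ID')
            if msg_id:
                author_of[msg_id] = sender
            graph.add_node(sender)
            parent_id = msg.get('In-Reply-To')
            if parent_id and parent_id in author_of:
                # A reply links the replier to the earlier poster.
                graph.add_edge(sender, author_of[parent_id])
        return graph

    # Hypothetical usage: rank participants by how often others reply to them.
    g = build_affiliation_network('kde-devel.mbox')
    print(sorted(g.in_degree, key=lambda p: p[1], reverse=True)[:10])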


4 Data Mining in Software Engineering

4.1 Introduction to Data Mining and Knowledge Discovery

The recent explosive growth of our ability to generate and store data has created a need for new, scalable and efficient tools for data analysis. The main focus of the discipline of knowledge discovery in databases is to address this need. Knowledge discovery in databases is the fusion of many areas that are concerned with different aspects of data handling and data analysis, including databases, machine learning, statistics, and algorithms. The term Data Mining is used as a synonym for Knowledge Discovery in Databases, as well as to refer to the techniques used for the analysis and the extraction of knowledge from large data repositories. Formally, data mining has been defined as the process of extracting previously unknown and potentially useful information from databases.

4.1.1 Data Mining Process

The two main goals of data mining are prediction and description. Prediction aims at estimating the future value, or predicting the behaviour, of some interesting variables based on the behaviour of other variables. Description concentrates on discovering patterns that represent the data of a complicated database in a comprehensible and exploitable way; a good description may suggest a good explanation of the behaviour of the data. The relative importance of prediction and description varies between data mining applications: in knowledge discovery, description tends to be more important than prediction, whereas in pattern recognition and machine learning applications prediction is usually the primary concern. A number of data mining methods have been proposed to satisfy the requirements of different applications. All of them, however, accomplish a set of data mining tasks in order to identify and describe interesting patterns of knowledge extracted from a data set. The main data mining tasks are as follows:

• Unsupervised learning (Clustering). Clustering is one of the most useful tasks in the data mining process for discovering groups and identifying interesting distributions and patterns in the underlying data. The clustering problem is about partitioning a given data set into groups (clusters) such that the data points in a cluster are more similar to each other than to points in different clusters [JD88, KR90]. In the clustering process there are no predefined classes and no examples that would show what kind of relations should hold among the data, which is why it is perceived as an unsupervised process [BL96]. (A brief sketch contrasting clustering with the supervised setting follows this list.)

• Supervised learning (Classification). The classification problem has been studied extensively in the statistics, pattern recognition and machine learning communities as a possible solution to the knowledge acquisition or knowledge extraction problem [DH73], [WK91]. It is one of the main tasks in the data mining procedure, assigning a data item to one of a predefined set of classes. According to [FPSSR96], classification can be described as a function that maps (classifies) a data item into one of several predefined classes. Classification is characterised by a well-defined set of classes and a training set of pre-classified examples; in contrast, the clustering process does not rely on predefined classes or examples [BL96]. The goal of the classification process is to induce a model that can be used to classify future data items whose class is unknown.

• Association rules extraction. Mining association rules is one of the main tasks in the data mining process. It has attracted considerable interest because the rules provide a concise way to state potentially useful information that is easily understood by end-users. Association rules reveal underlying “correlations” between the attributes in the data set. These correlations are presented in the form A → B, where A and B refer to sets of attributes in the underlying data.

• Visualisation of Data. This is the task of describing complex information through visual data displays. Generally, visualisation is based on the premise that a good description of an entity (data resource, process, pattern) will improve a domain expert's understanding of this entity and its behaviour.
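As an illustration of the first two tasks, the sketch below first clusters a few unlabelled data points and then trains a classifier on labelled examples. It assumes Python with scikit-learn; the feature values and class names are invented toy data used only to contrast the unsupervised and the supervised setting.

    from sklearn.cluster import KMeans
    from sklearn.neighbors import KNeighborsClassifier

    # Toy feature vectors, e.g. (number of modifications, cyclomatic number).
    points = [[2, 3], [3, 4], [2, 2], [20, 35], [22, 30], [25, 33]]

    # Unsupervised: no classes are given; KMeans discovers two groups itself.
    print(KMeans(n_clusters=2, n_init=10).fit_predict(points))

    # Supervised: pre-classified examples define the classes in advance.
    labels = ['low', 'low', 'low', 'high', 'high', 'high']
    model = KNeighborsClassifier(n_neighbors=3).fit(points, labels)
    print(model.predict([[21, 31]]))   # classify an unseen item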

4.2 Data mining application in software engineering: Overview

A large amount of data is produced during software development, which software organisations collect in the hope of extracting useful information from it and thus better understanding their processes and products. However, it is widely believed that a large amount of useful information remains hidden in software engineering databases. Specifically, the data in software development can refer to versions of programs, execution traces, error/bug reports, and Open Source packages. Mailing lists, discussion forums and newsletters can also provide useful information about software. Data mining provides the techniques to analyse such data and extract novel, interesting patterns from it. It assists with software engineering tasks by supporting a better understanding of software artifacts and processes. Based on data mining techniques we can extract relations among software projects and the extracted knowledge, and then exploit this information to evaluate software projects and/or predict software behaviour. Below we briefly describe the main tasks of data mining and how they can be used in the context of software engineering [MN99].

• Clustering in software engineering. Clustering produces a view of the data distribution and can also be used to automatically identify data outliers. An example of its use in software engineering is to define groups of similar modules based on the number of modifications and the cyclomatic number metric (the number of linearly independent paths through a program's source code).

• Classification. Classification is a function that maps (classifies) a data item into one of several predefined classes. One of the most widely used classification techniques is the decision tree, which can be used to discover classification rules for a chosen attribute of a dataset by systematically subdividing the information contained in the data set. Decision trees have been one of the tools of choice for building classification models in the software engineering field. Figure 43 shows a classification tree that has been built to provide a mechanism for identifying risky software modules based on attributes of the module and its system. From the given decision tree we can, for example, extract the following rule that assists with making decisions on errors in a module (a code sketch of this technique follows the list):

IF (# of data bindings > 10) AND (it is part of a non real-time system) THEN the module is unlikely to have errors

• Association rules in software engineering. Association discovery techniques find correlations or co-occurrences of events in a given environment, and can thus be used to extract information from co-occurrences in a dataset. Analysing, for instance, the logs of errors discovered in the software modules of a system, we can extract relations between inducing events based on the software module features and error categories. Such a rule could be the following: (large/small size, large/small complexity, number of revisions) → (interface error, missing or wrong functionality, algorithm or data structure error, etc.)
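A minimal version of the decision-tree technique above can be reproduced with scikit-learn (an assumption on our part; [MN99] predates the library). The module attributes, values and class names below are invented for illustration, mirroring the IF/THEN rule extracted from Figure 43.

    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical training data: [# data bindings, is real-time (0/1)]
    modules = [[12, 0], [15, 0], [11, 0], [4, 1], [6, 1], [13, 1], [3, 0]]
    errors  = ['unlikely', 'unlikely', 'unlikely',
               'likely', 'likely', 'likely', 'likely']

    tree = DecisionTreeClassifier(max_depth=2).fit(modules, errors)

    # Print the induced IF/THEN structure, analogous to the rule above.
    print(export_text(tree, feature_names=['data_bindings', 'real_time']))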

A number of approaches have been proposed in the literature which, based on the above data mining techniques, aim to assist with some of the main software engineering tasks, namely software maintenance and testing. We provide an overview of these approaches in the following section; Table 2 summarises their main features.

4.2.1 Using Data mining in software maintenance

Data mining, due to its capability to deal with large volumes of data and its efficiency in identifying hidden patterns of knowledge, has been proposed in a number of research works as a means to support industrial-scale software maintenance.


Figure 43: Classification tree for identifying risky software modules [MN99]

Analysing source code repositories
Data mining approaches have been used extensively to analyse source code version repositories and thus assist with software maintenance and enhancement. Many of these repositories are examined and managed by tools such as CVS (Concurrent Versions System). These tools store difference information across document versions and identify and express changes in terms of physical attributes, i.e., file and line numbers. However, CVS does not identify, maintain or provide any change-control information, such as the grouping of several changes in multiple files into a single logical change. Moreover, it does not provide high-level semantics describing the nature of corrective maintenance (e.g. bug fixes). Recently, the interest of researchers has focused on techniques that aim to identify relationships and trends at a syntactic level of granularity and further associate high-level semantics with the information available in repositories. Thus a wide array of approaches that perform mining of software repositories (MSR) has emerged. They are based on data mining techniques and aim to extract relevant information from the repositories, analyse it, and derive conclusions within the context of a particular interest. Following [KCM05], these approaches can be classified according to:

• The entity type and granularity used (e.g. file, function, statement, etc.).

• The expression and definition of software changes (e.g. modification, addition, deletion, etc.).

• The type of question addressed (e.g. market-basket questions, the frequency of a type of change, etc.).


Technique: Data mining
  - Classification [FLMP04]. Input: execution profiles and results (success/failure). Output: decision tree of failed executions.
  - Clustering [KDTM06]. Input: source code behavioural or structural entities, attributes, metrics. Output: significant patterns extracted from the system source code; groups of similar classes, methods, data.
  - Association rules [ZWDZ04]. Input: software entities, e.g. functions. Output: prediction of failures; correlations between entities; identification of additions, modifications, deletions of syntactic entities.
  - Neural networks [LFK05]. Input: input and output variables of the software system. Output: a network producing sets for function testing.

Technique: Differencing
  - Pattern extraction [WH05]. Input: source code, change history. Output: track of bugs.
  - Analysis of semantic graphs [RRP04]. Input: source code repositories. Output: syntactic and semantic changes.

Technique: CVS annotations
  - Semantic analysis [GHJ98]. Input: version history of source code, classes. Output: syntax and semantics - hidden dependencies.
  - Semantic analysis [GM03a]. Input: files and comments. Output: syntax and semantics - file coupling.
  - Heuristics [HH04]. Input: CVS annotations, heuristics. Output: candidate entities for change.

Table 2: Mining approaches in software engineering

In the sequel, we introduce the main concepts used in MSR and then briefly present some of the best-known MSR approaches proposed in the literature.

Fundamental Concepts in MSR. The basic concepts with respect to MSR involve the level of granularity of the software entity investigated, the changes, and the underlying nature of a change. The most widely used concepts can be summarised as follows:

• An entity, e, is a physical, textual or syntactic element in software. For example, a file, line, function, class, comment, if-statement, etc.

• A change is a modification, addition or deletion to or of an entity. A change describes which entities are changed and where the change occurs.

• The syntax of a change is a concise and specific description of the syntactic changes to the entity. This description is based on the grammar of the entity's language. For instance: a condition was added to an if-statement; a parameter was renamed; an assignment statement was added inside a loop; etc.

• The semantics of a change is a high-level, yet concise description of the change in the entity's semantics or feature space. For instance: a class interface change, a bug fix, or the addition of a new feature to the GUI.

MSR via CVS annotations. One approach is to utilise CVS annotation information. Gall et al. [GHJ98] propose an approach for detecting common semantic (logical and hidden) dependencies between classes on account of the addition or modification of a particular class. This approach is based on the version history of the source code, where a sequence of release numbers is recorded for each class in which it changed. Classes that have been changed in the same release are compared in order to identify common change patterns, based on the author name and time stamp from the CVS annotations; classes that are changed with the same time stamp are inferred to have dependencies.

Specifically, this approach can assist with answering questions such as which classes change together, how many times a particular class was changed, and how many class changes occurred in a subsystem (the files in a particular directory). An approach that studies file-level changes in software is presented in [Ger04a]. The CVS annotations are utilised to group subsequent changes into what is termed a modification request (MR). Specifically, this approach focuses on studying bug-MRs and comment-MRs to address issues regarding the new functionality that may be added or the bugs that may be fixed by MRs, the different stages of evolution to which MRs correspond, and the relation between developers and the modification of files.
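A common way to recover such MRs from raw per-file CVS logs is a sliding-window heuristic: consecutive file changes by the same author with the same log message, separated by less than a small time gap, are grouped into one logical change. The sketch below is a minimal Python illustration of this idea; the record format and the 300-second window are assumptions made for the example, not values prescribed in [Ger04a].

    # Each raw CVS log record: (timestamp_seconds, author, log_message, file)
    def group_modification_requests(records, window=300):
        """Group per-file CVS changes into logical modification requests."""
        mrs, current = [], []
        for rec in sorted(records):
            if (current
                    and rec[1] == current[-1][1]             # same author
                    and rec[2] == current[-1][2]             # same log message
                    and rec[0] - current[-1][0] <= window):  # close in time
                current.append(rec)
            else:
                if current:
                    mrs.append(current)
                current = [rec]
        if current:
            mrs.append(current)
        return mrs

    logs = [(100, 'anna', 'fix crash', 'a.c'), (160, 'anna', 'fix crash', 'b.c'),
            (9000, 'bob', 'add docs', 'README')]
    print(len(group_modification_requests(logs)))   # 2 logical changes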

MSR via Data Mining. Data mining provides a variety of techniques with potential application to MSR. One of these techniques is association rules. The work proposed by Zimmermann et al. [ZWDZ04] exploits association rule extraction to identify co-occurring changes in a software system. For instance, we may want to discover relations between the modifications of software entities; we then aim to answer the question: when a particular source-code entity (e.g. a function A) is modified, what other entities are also modified (e.g. the functions named B and C)? Specifically, a tool is proposed that parses the source code and maps the line numbers to syntactic or physical-level entities. These entities are represented as a triple (filename, type, id). The subsequent entity changes in the repository are grouped as a transaction, and an association rule mining technique is then applied to determine rules of the form B, C → A. This technique has been applied to open-source projects with the goal of using earlier versions to predict changes in later versions. In general terms, it enables the identification of additions, modifications and deletions of syntactic entities without utilising any other external information. It can handle various programming languages and assists with detecting hidden dependencies that cannot be identified by source code analysis.
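The core of this technique can be illustrated in a few lines of Python. The transactions below are hypothetical sets of entities changed together (e.g. the MRs grouped earlier); the sketch counts pairwise co-change support and confidence, a simplified stand-in for the full association rule miner of [ZWDZ04].

    from itertools import permutations
    from collections import Counter

    # Hypothetical transactions: entities changed together in one commit/MR.
    transactions = [{'A', 'B', 'C'}, {'A', 'B'}, {'B', 'C'}, {'A', 'B', 'C'}]

    pair_support = Counter()   # how often X and Y change together
    item_support = Counter()   # how often X changes at all
    for t in transactions:
        item_support.update(t)
        pair_support.update(permutations(t, 2))

    # Rule X -> Y with confidence = support(X, Y) / support(X).
    for (x, y), n in pair_support.items():
        conf = n / item_support[x]
        if conf >= 0.75:
            print(f'{x} -> {y}  (support {n}, confidence {conf:.2f})')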

MSR via Heuristics. CVS annotation analysis can be extended by applying heuristics that include information from the source code or source code models. Hassan et al. [HH04] proposed a variety of heuristics (developer-based, history-based, and code-layout-based (file-based)) which are used to predict the entities that are candidates for a change on account of a given entity being changed. CVS annotations are lexically analysed to derive the set of changed entities from the source-code repositories. The research in both [ZWDZ04] and [HH04] uses source-code version history to identify and predict software changes, and the questions they answer are quite interesting with respect to testing and impact analysis.

MSR via Differencing. Source-code repositories contain the differences between versions of source code, so MSR can also be performed by analysing the actual source-code differences. Such an approach, aiming to detect syntactic and semantic changes from a version history of C code, is presented by Raghavan et al. [RRP04]. According to this approach, each version is converted to an abstract semantic graph (ASG) representation. This graph is a data structure used for representing or deriving the semantics of an expression in a programming language. A top-down or bottom-up heuristics-based differencing algorithm is applied to each pair of in-memory ASGs. The differencing algorithm produces an edit script describing the nodes that are added, deleted, modified or moved in order to obtain one ASG from the other. The edit scripts produced for each pair of ASGs are analysed to answer questions ranging from entity-level changes, such as how many functions and function calls are inserted, added or modified, down to specific changes, such as how many if-statement conditions are changed. In [CH04] a syntactic-differencing approach called meta-differencing is introduced, which allows us to ask syntax-specific questions about differences. According to this approach, the abstract syntax tree (AST) information is directly encoded into the source code via an XML format. The added, deleted or modified syntactic elements are then computed based on the encoded AST, and the types and prevalence of syntactic changes can be easily computed. Specifically, the approach supports the following questions:

i Are new methods added to an existing class?

ii Are there changes to pre-processor directives?

iii Was the condition in an if-statement modified?

From the above discussion we can conclude that the types of questions MSR can answer fall into two categories:

• Market-basket questions. These are formulated as: IF A happens, then what ELSE happens on a regular basis? The answer to such a question is a set of rules or guidelines describing trends or relationships, expressed as follows: if A happens then B and C happen X percent of the time.

• Questions dealing with the prevalence or lack of a particular type of change.

Such questions often address finding hidden dependencies or relationships, which can be very important for impact analysis. MSR aims to identify the actual impact set after an actual change. However, MSR techniques often give only a “best guess” for the change, since the change may not be explicitly documented and thus sometimes must be inferred.

A clustering approach for semi-automated software maintenance
The work in [KDTM06] presents a framework for knowledge acquisition from source code in order to comprehend an object-oriented system and evaluate its maintainability. Specifically, clustering techniques are used to assist engineers in understanding the structure of source code and assessing its maintainability. The proposed approach is applied to a set of elements collected from source code, including:

• Entities that belong either to the behavioural domain (classes, member methods) or to the structural domain (member data).

• Attributes that describe the entities (such as class name, superclass, method name, etc.).

• Metrics, used as additional attributes, that help the software maintainer comprehend the system under maintenance more thoroughly.

The above elements specify the data input model of the framework. Another part of the framework is an extraction process which aims to extract elements and metrics from source code. The extracted information is stored in a relational database so that data mining techniques can be applied. In this approach, clustering techniques are used to analyse the input data and give the maintenance engineer a rough grasp of the software system. Clustering produces overviews of systems by creating mutually exclusive groups of classes, member data, and methods based on their similarities. Moreover, it can assist with discovering programming patterns and outlier (unusual) cases which may require attention.

Another problem to tackle in software engineering is the corrective maintenance of software: it would be desirable to identify software defects before they cause failures. It is likely that many failures fall into small groups, each consisting of failures caused by the same software defect. Recent research has focused on data mining techniques which can simplify the problem of classifying failures according to their causes. These approaches require that three types of information about executions are recorded and analysed: i) execution profiles reflecting the causes of the failures, ii) auditing information that can be used to confirm reported failures, and iii) diagnostic information that can be used to determine their causes.

Classification of software failures
A semi-automated strategy for classifying software failures is presented in [PMM+03]. This approach is based on the idea that if m failures are observed over some period during which the software is executed, it is likely that these failures are due to a substantially smaller number of distinct defects. Assume that F = {f1, f2, . . . , fm} is the set of reported failures and that each failure is caused by just one defect. Then F can be partitioned into k < m subsets F1, F2, . . . , Fk such that all of the failures in Fi are caused by the same defect di, 1 ≤ i ≤ k. This partitioning is called the true failure classification. Below, we describe the main phases of the strategy for approximating the true failure classification:

1. The software is instrumented to collect and transmit to the developers either execution profiles or captured executions, and it is then deployed.

2. Execution profiles corresponding to reported failures are combined with a random sample of profiles of operational executions for which no failures were reported. This set of profiles is analysed to select a subset of all profile features to use in grouping related failures. A feature of an execution profile corresponds to an attribute or element of it. For instance, a function call profile contains an execution count for each function in a program, and each count is a feature of the profile. The feature selection strategy is then as follows:

• Generate candidate feature-sets and use each one to create and train a pattern classifier to distinguish failures from successful executions.

• Select the features of the classifier that give the best results.


Figure 44: A cluster hierarchy

3. The profiles of reported failures are analysed using cluster analysis, in order to group together failures whose profiles are similar with respect to the features selected in phase 2.

4. The resulting classification of failures into groups is explored in order to confirm or refine it.

The strategy described above provides an initial classification of software failures. Depending on the application and the user requirements, these initial classes can be merged or split so that the software failures are identified in an appropriate fashion. In [FLMP04], two tree-based techniques for refining an initial classification of failures are proposed; they are discussed below.

Refinement using dendrograms. A dendrogram is a tree-like diagram used to represent the results of a hierarchical clustering algorithm. One of the strategies proposed in the literature for refining an initial failure clustering relies on dendrograms. Specifically, it uses them to decide how non-homogeneous clusters should be split into two or more sub-clusters, and to decide which clusters should be considered for merging. A cluster in a dendrogram corresponds to a subtree that represents the relationships among the cluster's sub-clusters. The more similar two clusters are to each other, the farther away from the dendrogram root their nearest common ancestor is. For instance, in the dendrogram presented in Figure 44 we can observe that clusters A and B are more similar than clusters C and D. A cluster's largest homogeneous subtree is the largest subtree consisting of failures with the same cause. If a clustering is too coarse, some clusters may have two or more large homogeneous subtrees containing failures with different causes. Such a cluster should be split at the level where its large homogeneous subtrees are connected, so that these subtrees become siblings, as Figure 46 shows. If the clustering is too fine, siblings may be clusters containing failures with the same causes. Such sibling clusters should be merged at the level of their parent, as Figure 45 depicts.

Figure 45: Merging two clusters. The new cluster A contains the clusters represented by the two homogeneous sub-trees A1 and A2.

Based on these definitions, the strategy that has been proposed for refining an initial classification of failures using dendrograms has three phases:

1. Select the number of clusters into which the dendrogram will be divided.

2. Examine the individual clusters for homogeneity by choosing the two executions in the cluster with maximally dissimilar profiles. If the selected executions have the same or related causes, it is likely that all of the other failures in the cluster do as well. If the selected executions do not have the same or related causes, the cluster is not homogeneous and should be split.

3. If neither the cluster nor its sibling is split by step 2, and the failures examined have the same cause, then the two are merged.

Clusters generated by merging or splitting should be analysed in the same way, which allows for recursive splitting or merging.
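The following sketch shows how such a dendrogram-based grouping of failure profiles might look in Python with SciPy (an assumption; [FLMP04] does not prescribe a toolkit). The profiles are invented execution-count vectors; linkage builds the hierarchy and fcluster cuts it into an initial set of clusters that the analyst would then inspect for homogeneity, starting from the most dissimilar pair in each cluster.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist, squareform

    # Hypothetical failure profiles: execution counts per function.
    profiles = np.array([[10, 0, 3], [11, 0, 2], [0, 9, 7], [1, 8, 8]])

    tree = linkage(profiles, method='average')   # dendrogram as linkage matrix
    clusters = fcluster(tree, t=2, criterion='maxclust')
    print(clusters)                              # e.g. [1 1 2 2]

    # Homogeneity check (phase 2): the two most dissimilar members of a cluster.
    dist = squareform(pdist(profiles))
    members = np.where(clusters == 1)[0]
    sub = dist[np.ix_(members, members)]
    i, j = np.unravel_index(sub.argmax(), sub.shape)
    print('inspect executions', members[i], 'and', members[j])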

Refinement using classification trees. The second technique proposed by Francis et al. relies on building a classification tree to recognise failed executions. A classification tree is a type of pattern classifier that takes the form of a binary decision tree. Each internal node in the tree is labelled with a relational expression that compares a numeric feature of the object being classified to a constant splitting value, while each leaf of the tree is labelled with a predicted value indicating which class of interest the leaf represents.

Given the classification tree, an object is classified by traversing the tree from the root to a leaf. At each step of the traversal prior to reaching a leaf, we evaluate the expression at the current node. When the object reaches a leaf, the predicted value of that leaf is taken as the predicted class for that object.


Figure 46: Splitting a cluster: The two new clusters (subtrees with roots A11 and A12) correspond to the large homogeneous subtrees in the old cluster.

In the case of the software failure classification problem, we consider two classes, success and failure. The Classification And Regression Trees (CART) algorithm was used to build the classification tree corresponding to software failures. Assume a training set of execution profiles

    L = {(x1, j1), . . . , (xN, jN)}

where each xi represents an execution profile and ji is the result (success/failure) associated with it. The steps of building the classification tree based on L are as follows:

• The deviance of a node t ⊆ L is defined as

    d(t) = (1/Nt) Σi (ji − j(t))²

where Nt is the size of t and j(t) is the average value of j in t.

• Each node t is split into two children tR and tL. The split is chosen to maximise the reduction in deviance; that is, from the set of possible splits S, the optimal split is found by

    s* = argmax over s ∈ S of ( d(t) − (NtR/Nt) d(tR) − (NtL/Nt) d(tL) )

• A node t is declared a leaf node if d(t) ≤ β, for some threshold β.

• The predicted value for a leaf is the average value of j among the executions in that leaf.
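To make the split criterion concrete, here is a small, self-contained Python sketch (not taken from [FLMP04]) that computes the deviance of a node and the deviance reduction of a candidate split on a toy set of (profile-feature, outcome) pairs, with outcomes coded 0 for success and 1 for failure.

    def deviance(outcomes):
        """d(t) = (1/Nt) * sum((ji - mean(j))^2) over the node's executions."""
        if not outcomes:
            return 0.0
        mean = sum(outcomes) / len(outcomes)
        return sum((j - mean) ** 2 for j in outcomes) / len(outcomes)

    def reduction(node, feature_index, threshold):
        """Deviance reduction achieved by splitting on feature <= threshold."""
        left  = [j for x, j in node if x[feature_index] <= threshold]
        right = [j for x, j in node if x[feature_index] >  threshold]
        n = len(node)
        return (deviance([j for _, j in node])
                - len(left) / n * deviance(left)
                - len(right) / n * deviance(right))

    # Toy node: ([call count], outcome) with 1 = failure, 0 = success.
    node = [([3], 0), ([4], 0), ([17], 1), ([19], 1)]
    print(reduction(node, 0, 10))   # large reduction: a good split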


Analysing Bug Repositories
Source code repositories store a wealth of information that is not only useful for managing and building source code, but is also a detailed log of how the source code has evolved during development. Evidence of source code refactoring is stored in the repository; as bugs are fixed, the changes made to correct each problem are recorded; and as new APIs are added to the source code, the proper way to use them is implicitly explained in the source code. One of the challenges, then, is to develop tools and techniques to automatically extract and use this information.

In [WH05], a method is proposed which uses data describing bug fixes, mined from the source code repository, to improve the static analysis techniques used to find bugs. It is a two-step approach in which the source code change history of a software project helps to refine the search for bugs.

The first step in the process is to identify the types of bugs that are being fixed in the software. The goal is to review the historical data stored for the software project, in order to gain an understanding of what data exists and how useful it may be in the task of bug finding. Many of the bugs found in the CVS history are good candidates for detection by static analysis, such as NULL pointer checks and function return value checks.

The second step is to build a bug detector driven by these findings. The idea is to develop a function return value checker based on the knowledge that a specific type of bug has been fixed many times in the past. Briefly, this checker looks for instances where the return value from a function is used in the source code before being tested. Using a return value can mean passing it as an argument to a function, using it as part of a calculation, dereferencing the value if it is a pointer, or overwriting the value before it is tested. Cases where return values are never stored by the calling function are also checked. Testing a return value means that some control flow decision relies on the value.

The checker performs a data flow analysis on the variable holding the returned value, only to the point of determining whether the value is used before being tested. It simply identifies the original variable the returned value is stored into and determines the next use of that variable. If the variable, at its next use, is an operand of a comparison in a control flow decision, the return value is deemed to be tested before being used. If the variable is used in any other way before appearing in a control flow decision, the value is deemed to be used before being tested. A small amount of inter-procedural analysis is also performed in order to improve the results: it is often the case that a return value is immediately used as an argument in a call to a function, and in these cases the checker determines whether that argument is tested before being used in the called function. (A simplified sketch of such a checker follows the list of warning categories below.)

Moreover, the checker categorises the warnings it finds into one of the following categories:


• Warnings are flagged for return values that are completely ignored, or for return values that are stored but never used.

• Warnings are also flagged for return values that are used in a calculation before being tested in a control flow statement.

Any return value passed as an argument to a function before being tested is flagged, as well as any pointer return value that is dereferenced without being tested.
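As a toy analogue of such a checker (the original works on C source; this sketch uses Python and its standard ast module purely for illustration), the following code flags call results that are discarded outright - the first warning category above. A real checker would add the data flow analysis described in [WH05].

    import ast

    def ignored_return_values(source):
        """Flag calls whose return value is discarded (expression statements)."""
        warnings = []
        for node in ast.walk(ast.parse(source)):
            # An ast.Expr wrapping a call means the result is thrown away.
            if isinstance(node, ast.Expr) and isinstance(node.value, ast.Call):
                func = node.value.func
                name = getattr(func, 'id', getattr(func, 'attr', '?'))
                warnings.append((node.lineno, name))
        return warnings

    code = """
    open('data.txt')          # return value ignored: flagged
    fh = open('other.txt')    # stored: not flagged by this simple check
    """
    print(ignored_return_values(code))   # [(2, 'open')]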

However, there are types of functions that lead the static analysis procedure to produce false positive warnings. Without prior knowledge, it is difficult to tell which functions do not need their return values checked. Mining techniques for the source code repository can help improve the static analysis results: the data mined from the source code repository and from the current version of the software is used to determine the actual usage pattern for each function.

In general terms, it has been observed that the bugs catalogued in bug databases and those found by inspecting source code change histories differ in type and level of abstraction. Software repositories record all the bugs fixed, from every step of the development process, and thus provide much useful information. A bug finding system therefore proves more effective when it automatically mines data from source code repositories.

Mining the Source Code Repository
Williams et al. [WH05] propose the use of an analysis tool to automatically mine data from the source code repository by inspecting every source code change in the repository. Specifically, they try to determine when a bug of the type they are concerned with is fixed. A source code checker is developed (as described above) which is used to determine when a potential bug has been fixed by a source code change. The checker is run over both versions of the source code. If, for a particular function called in the changed file, the number of calls remains the same and the number of warnings produced by the tool decreases, the change is said to fix a likely bug. If it is determined that a check has been added to the code, the function that produces the return value is flagged as being involved in a potential bug fix in a CVS commit. The result of the mining is a list of functions that are involved in a potential bug fix in a CVS commit.

The output of the function return value checker is a list of warnings denoting instances in the code where a return value from a function is used before being tested, each with a full description including the source file, line number and category of the warning. Since there are many reasons that can lead a static analysis to produce a large number of false positive warnings, the proposed tool provides a ranking of the warnings, from least likely to most likely to be a false positive. The ranking is done in two parts. First, the functions are divided into those that are involved in a potential bug fix in a CVS commit and those that are not. Next, within each group, the functions are ranked by how often their return values are tested before being used in the current version of the software.

4.2.2 A Data Mining approach to automated software testing

The evaluation of software is based on tests designed by software testers. The evaluation of test outputs therefore involves considerable effort by human testers, who often have imperfect knowledge of the requirements specification. This manual approach to testing software results in heavy losses to the world's economy, so the interest of researchers has focused on the development of automated techniques that induce functional requirements from execution data. Data mining approaches can be used to extract useful information from the tested software which can assist with software testing. Specifically, the induced data mining models of tested software can be used for recovering missing and incomplete specifications, designing a set of regression tests, and evaluating the correctness of software outputs when testing new releases of the system.

In developing a large system, the test of the entire application (system testing) follows the stages of unit testing and integration testing. The activities of system testing include function testing, performance testing, acceptance testing and installation testing. Function testing aims to verify that the system performs its functions as specified in the requirements and that no undiscovered errors are left. A test set is considered adequate if it causes all incorrect versions of the program to fail; the selection of tests and the evaluation of their outputs are therefore crucial for improving the quality of the tested software at lower cost. Assuming that requirements can be re-stated as logical relationships between inputs and outputs, test cases can be generated automatically by techniques such as cause-effect graphs [Pfl01] and decision tables [LK03b]. A software system, in order to stay useful, has to undergo continual change. The most common maintenance activities in the software life-cycle include bug fixes, minor modifications, improvements of basic functionality and the addition of brand new features.

The purpose of regression testing is to identify new faults that may have been introduced into the basic features as a result of enhancing software functionality or correcting existing faults. A regression test library is a set of test cases that run automatically whenever a new version of the software is submitted for testing. Such a library should include a minimal number of tests that cover all possible aspects of system functionality. A standard way to design a regression test library is to identify equivalence classes of every input and then use only one value from each edge (boundary) of every class. One of the main problems is the generation of a minimal test suite which covers as many cases as possible. Ideally such a test suite can be generated from a complete and up-to-date specification of functional requirements. However, frequent changes make the original requirements specifications hardly relevant to the new versions of the software. To ensure the effective design of new regression test cases, one has to recover the actual requirements of an existing system. Thus, a tester can analyse system specifications, perform structural analysis of the system's source code, and observe the results of system execution in order to define the input-output relationships in the tested software.

Figure 47: An example of Info-Fuzzy Network structure [LFK05]

An approach that aims to automate the input-output analysis of execution data based on a data mining methodology is proposed in [LFK05]. This methodology relies on the info-fuzzy network (IFN), which has an 'oblivious' tree-like structure. The network components include the root node, a changeable number of hidden layers (one layer for each selected input) and the target (output) layer representing the possible output values. The same input attribute is used across all nodes of a given layer (level), while each target node is associated with a value (class) in the domain of a target attribute. If the IFN model is aimed at predicting the values of a continuous target attribute, the target nodes represent disjoint intervals in the attribute range.

A hidden layer l consists of nodes representing conjunctions of values of the first l input attributes, similar to the definition of an internal node in a standard decision tree. The final (terminal) nodes of the network represent non-redundant conjunctions of input values that produce distinct outputs. Considering that the network is induced from execution data of a software system, each interconnection between a terminal node and a target node represents a possible output of a test case. Figure 47 presents an IFN structure where the internal nodes include the nodes (1,1), (1,2), 2, (3,1), (3,2), and the connection (1,1) → 1 implies that the expected output value for a test case where both input variables are equal to 1 is also 1. The connectionist nature of the IFN resembles the structure of a multi-layer neural network; therefore, the IFN model is characterised as a network and not as a tree.

A separate info-fuzzy network is constructed to represent each output variable. We therefore present below the algorithm for building an info-fuzzy network for a single output variable.

Network Induction Algorithm. The induction procedure starts by defining the target layer (one node for each target interval or class) and the “root” node. The root node represents an empty set of input attributes; attributes are then selected incrementally to maximise the global decrease in the conditional entropy of the target attribute. Unlike algorithms for building decision trees such as CART and C4.5, the IFN algorithm is based on a pre-pruning approach: when no attribute causes a statistically significant decrease in the entropy, the network construction is stopped. The algorithm performs discretisation of continuous input attributes “on-the-fly” by recursively finding a binary partition of an input attribute that minimises the conditional entropy of the target attribute [FI93]. The search for the best partition of an attribute is dynamic, and it is performed each time a candidate input attribute is evaluated. Each hidden node in the network is associated with an interval of a discretised input attribute. The estimated conditional mutual information between the partition of the interval S at the threshold Th and the target attribute T, given the node z, is defined as follows:

    MI(Th; T / S, z) = Σ_{t=0,...,MT−1} Σ_{y=1,2} P(Sy; Ct; z) · log [ P(Sy; Ct / S, z) / ( P(Sy / S, z) · P(Ct / S, z) ) ]

where

• P(Sy / S, z) is the estimated conditional probability of a sub-interval Sy, given the interval S and the node z.

• P(Ct / S, z) is the estimated conditional probability of a value Ct of the target attribute T, given the interval S and the node z.

• P(Sy; Ct; z) is the estimated joint probability of a value Ct of the target attribute T, a sub-interval Sy, and the node z.

The statistical significance of splitting the interval S at the threshold Th at the node z is then evaluated using the likelihood-ratio statistic. A new input attribute is selected to maximise the total significant decrease in the conditional entropy resulting from splitting the nodes of the last layer. The nodes of the new hidden layer are defined as the Cartesian product of the split nodes of the previous layer and the discretised intervals of the new input variable. If no input variable decreases the conditional entropy of the output variable, the network construction stops. The IFN induction procedure is a greedy algorithm which is not guaranteed to find the optimal ordering of input attributes. Though some functions are highly sensitive to this ordering, alternative orderings will still produce acceptable results in most cases.
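The threshold search at the heart of this discretisation can be illustrated with a short Python sketch (a simplification, not the IFN implementation of [LFK05]): for each candidate threshold it computes the conditional entropy of the target after a binary split and keeps the threshold that minimises it.

    from math import log2
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def best_threshold(values, labels):
        """Binary partition of a continuous attribute minimising the
        conditional entropy of the target, as in the IFN discretisation."""
        best = (float('inf'), None)
        for th in sorted(set(values))[:-1]:
            left  = [l for v, l in zip(values, labels) if v <= th]
            right = [l for v, l in zip(values, labels) if v >  th]
            h = (len(left) * entropy(left)
                 + len(right) * entropy(right)) / len(labels)
            best = min(best, (h, th))
        return best   # (conditional entropy, threshold)

    # Toy data: one input variable and a binary target output.
    print(best_threshold([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1]))
    # minimal conditional entropy is reached at threshold 3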


An IFN-based environment for automated input-output analysis is presented in [LFK05]. The main modules of this environment are:

• Legacy system (LS). This module represents a program, a component or a system to be tested in subsequent versions of the software.

• Specification of Application Inputs and Outputs (SAIO). Basic data on each input and output variable in the Legacy System.

• Random test generator (RTG). This module generates random combinations of values in the range of each input variable.

• Test bed (TB). This module feeds training cases generated by the RTG module to the LS.

The IFN algorithm is trained on inputs provided by the RTG and outputs obtained from the legacy system by means of the Test Bed module. A separate IFN model is built for each output variable. The information derived from each IFN model can be summarised as follows:

• A set of input attributes relevant to the corresponding output.

• Logical (if ... then ...) rules expressing the relationships between the selected input attributes and the corresponding output. The set of rules appearing at each terminal node represents the distribution of output values at that node.

• Discretisation intervals of each continuous input attribute included in the network. Each interval represents an “equivalence” class, since for all values of a given interval the output values conform to the same distribution.

• A set of test cases. The terminal nodes in the network are converted into test cases, each representing a non-redundant conjunction of input values / equivalence classes and the corresponding distribution of output values.

The IFN algorithm takes as input the training cases randomly generated by the RTG module and the outputs produced by the LS for each case. The algorithm runs repeatedly to find a subset of input variables relevant to each output and the corresponding set of non-redundant test cases. Actual test cases are generated from the automatically detected equivalence classes using an existing testing policy.

4.3 Text Mining and Software Engineering

Software engineering repositories consist of text documents containing source code, mailing lists, bug reports, and execution logs. The mining of textual artifacts is thus requisite for many important activities in software engineering: tracing of requirements; retrieval of components from a repository; identifying and predicting software failures; software maintenance; testing; etc.

This section describes the state of the art in text mining and the application of text mining techniques in software engineering. Furthermore, a comparative analysis of the text mining techniques applied in software engineering is provided, and future directions are discussed.

4.3.1 Text Mining - The State of the Art

Text mining is the process of extracting knowledge and patterns from unstructured document text. It is a young interdisciplinary research field under the wider area of data mining, drawing on information retrieval, machine learning and computational linguistics. Depending on the application, the methods deployed in text mining usually require the transformation of the texts into an intermediate structured representation, for example storage of the texts in a database management system according to a specific schema. In many approaches, though, there is also a gain in keeping a semi-structured intermediate form of the texts, such as a representation of documents in a graph, to which social analysis and graph techniques can be applied.

Independently of the task objective, text mining requires preprocessing techniques, usually involving qualitative and quantitative analysis of the documents' features. The diagram in Figure 48 depicts the most important phases of the preprocessing analysis, as well as the most important text mining techniques.

Figure 48: Preprocessing, Storage and Processing of Texts in Text Mining

Preprocessing assumes a preselected document representation model, usually the vector space model, though the boolean and the probabilistic models are other options. According to the representation model, documents are parsed and text terms are weighted according to weighting schemes like TF-IDF (Term Frequency - Inverse Document Frequency), which is based on the frequency of occurrence of terms in the text; several other options are described in [Cha02, BYRN99]. Natural language processing techniques are also applied, the state of the art of which is well described in [Mit05, MS99]; often, stop-word removal and stemming are applied. In favour of the use of natural language processing techniques in text mining, it has been shown in the past that the use of semantic linguistic features, mainly derived from a language knowledge base like the WordNet word thesaurus [Fel98], can help text retrieval [Voo93] and text classification [MTV+05]. Furthermore, the use of word sense disambiguation (WSD) techniques [IV98] is important in several natural language processing and text mining tasks, like machine translation, speech processing and information retrieval. Lately, state-of-the-art approaches in unsupervised WSD [TVA07, MTF04] have pointed the way towards the use of semantic networks generated from texts, enhanced with semantic information derived from word thesauri. These approaches are to be applied to the text retrieval task, where it is expected that under certain circumstances the representation of texts as semantic networks can improve retrieval performance.
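As a concrete illustration of the term-weighting step, the following minimal sketch computes TF-IDF weights for a toy corpus. It assumes Python with scikit-learn; the documents are invented bug-report snippets.

    from sklearn.feature_extraction.text import TfidfVectorizer

    # Toy corpus: three hypothetical bug-report snippets.
    docs = ['null pointer crash in parser',
            'parser crash on empty file',
            'documentation typo in install guide']

    vec = TfidfVectorizer(stop_words='english')
    matrix = vec.fit_transform(docs)          # rows: documents, cols: terms

    # Terms that best discriminate the first document.
    terms = vec.get_feature_names_out()
    weights = matrix.toarray()[0]
    print(sorted(zip(weights, terms), reverse=True)[:3])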

Another important factor when dealing with unstructured text is the curse of dimensionality. With millions or even billions of documents, the respective term space is huge and often prohibits any type of analysis or feature extraction. In this direction, techniques based on singular value decomposition, like latent semantic indexing, or the removal of features with low scores based on statistical weighting measures, are employed. Several examples of such techniques can be found in [DZ07].

Once feature extraction and natural language processing techniques have been applied to the document collection, storage takes place with the use of techniques like inverted indexing. Depending on the application of the text mining methods, a semi-structured representation of documents, as in [TVA07, MTF04], might be needed. In such cases, indexing of the respective information (i.e. node types, edge types, edge weights) is useful.

The text mining techniques mentioned in Figure 49 are representative and frequently used in many applications. For example, clustering has long been used in information retrieval and is already applied in popular web search engines, such as Vivisimo 37. Text classification is widely used in spam filtering. Text retrieval is a core task with an unrestricted range of applications, varying from search engines to desktop search. Social analysis can be applied whenever some type of link between documents is available, for example publications and their references, or posts in forums and their replies, and is widely used for authority and hub detection (i.e. finding the most important people in the graph). Finally, domain ontology evolution is a task where, through the use of other text mining techniques like clustering or classification, an ontology describing a specific domain can be evolved and enhanced with term features of new documents pertaining to the domain. This is really important in cases where the respective domain evolves fast, prohibiting the manual update of the ontology with new concepts and instances.

4.3.2 Text Mining Approaches in Software Engineering

Applying text mining techniques in software engineering is a real challenge, mostly because of the perplexing nature of the unstructured text. Text mining in software engineering has employed a wide range of text repositories, such as document hierarchies, code repositories, bug report databases, concurrent versioning system log repositories, newsgroups, mailing lists and several others. Since the aim is to define metrics which can lead to software assessment and evaluation, while the input data is unstructured and unrestricted text, the text mining processes in software engineering are hard to design and, moreover, to apply. The most challenging part is the selection and preprocessing of the input text sources, along with the design of a metric that uses one or more text mining techniques applied to these sources while also complying with the existing standards for software engineering metrics. A discussion of some of the most recent approaches within this scope follows, while Figure 49 summarises the methods and their use.

In [BGD+06], the Apache developer mailing list was used as text input. Entity resolution was essential, since many individuals used more than one alias. After constructing the social graph arising from the interconnections between posters and repliers, the authors performed a social network analysis and came to important findings, like the strong relationship between email activity and source-code-level activity. Furthermore, social network analysis at that level revealed the important nodes (individuals) in the discussions. Though graph and link analysis were employed in the method, node ranking techniques like PageRank, or other graph processing techniques like Spreading Activation, were not used.

In [CC05] another text source was used, with the aim of predicting the parts of the source code that will be affected by fixing future bugs. More precisely, for each source file the authors used the set of fixed-bug data and the respective CVS commit notes as descriptors. With the use of a probabilistic text retrieval model they measure the similarity between the descriptors of each source file and a new bug description, and in this way they predict the parts of the code likely to be affected by the bug fix. Still, the same method could have been viewed from a supervised learning perspective, and classification along with predictive modelling techniques would have been a good baseline for their predictions.

37 Publicly available at http://vivisimo.com/


[BGD+06]. Source: e-mail archives of OSS software. Technique: entity resolution, social network analysis. Output: weighting of OSS participants; relationship of e-mail activity and commit activity.

[CC05]. Source: CVS commit notes, set of fixed bugs. Technique: text retrieval. Output: similarity between new bug reports and source code files - prediction.

[GM03a]. Source: mailing lists, CVS logs, ChangeLog files. Technique: text summarisation and validation. Output: statistical measures for code changes and developers.

[JS04]. Source: OSSD Web repositories (Web pages, mailing lists, process entity taxonomy). Technique: text extraction, entity resolution, social network analysis. Output: transformation of data into process events; ordering of process events.

[VT06]. Source: CVS repositories. Technique: text clustering. Output: patterns in the development of large software projects (history analysis, major contributions).

[WH05]. Source: CVS repositories, source code. Technique: text analysis, retrieval, classification. Output: predictions of source bugs.

Figure 49: Summary of Recent Text Mining Approaches in Software Engineering


Pursuing the same goal, in [WH05] the CVS repositories were mined to obtain categories of bug fixes. Using a static analysis tool, the authors inspected every source code change in the software repository and predicted whether a potential bug in the code had been fixed. These predictions are then ranked by analysing the contemporary context information in the source code (i.e. checking the percentage of the invocations of a particular function where the return value is tested before being used). The whole mining procedure is based on text analysis of the CVS commit changes. They conducted experiments on the Apache Web server source code and the Wine source code, in which they showed that the data mined from the software repositories produced very good precision, certainly better than a naive baseline technique.

From another perspective, text mining has been used in software engineering to validate the data from mailing lists, CVS logs, and change log files of Open Source software. In [GM03a] a set of tools, namely SoftChange38, was created that implements data validation over the aforementioned text sources of Open Source software. The tools retrieve, summarise and validate these types of data for Open Source projects. Part of the analysis can single out the most active developers of an Open Source project. The statistics and knowledge gathered by the SoftChange analysis have not been fully exploited though, since further predictive methods could be applied with regard to fragments of code that may change in the future, or associative analysis between the importance of changes and the individuals involved (i.e. were all the changes committed by the most active developer as important as the rest, in scale and in practice?).

Text mining has also been applied in software engineering for discovering development processes. Software processes are composed of events, such as relations of agents, tools, resources, and activities, organised by control flow structures dictating that sets of events execute serially, in parallel, iteratively, or that one of the set is selectively performed. Software process discovery takes as input artifacts of development (e.g. source code, communication transcripts, etc.) and aims to elicit the sequence of events characterising the tasks that led to their development. In [JS04] an innovative method for discovering software processes from Open Source software Web repositories is presented. The method combines text extraction techniques, entity resolution and social network analysis, and relies on a process entity taxonomy for entity resolution. Automatic means of evolving the taxonomy using text mining tasks could have been employed, so that the method would not depend strictly on the taxonomy's predefined actions, tools, resources and agents. An example would be text clustering over the open software text resources and the extraction of new candidate items for the taxonomy from the clusters' labels.

Text clustering has also been used in software engineering to discover patterns in the history and the development process of large software projects. In [VT06] CVSgrab was used to analyse the ArgoUML and PostgreSQL repositories. By clustering the related resources, the authors generated the evolution of the projects based on the clustered file types. Useful conclusions can be drawn by careful manual analysis of the generated visualised project development histories. For example, they discovered that in both projects there was only one author for each major initial contribution. Furthermore, they came to the conclusion that PostgreSQL did not start from scratch, but was built atop an earlier project. An interesting evolution of this work would be a more automated way of drawing conclusions from the development history, for example by extracting cluster labels, mapping them to a taxonomy of development processes and automatically extracting the development phases, annotated with taxonomy concepts.
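A minimal sketch of such history clustering follows. It represents each file by the concatenation of the commit messages that touched it, clusters the files with k-means over TF-IDF vectors, and prints the top terms of each cluster as a crude label. The repository data is an invented placeholder, and the approach is only loosely inspired by [VT06], which clustered resources inside CVSgrab rather than with scikit-learn.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical per-file commit-message histories mined from a repository.
histories = {
    "parser.c":   "grammar fix parse error recovery tokens",
    "lexer.c":    "tokens scanning grammar speedup",
    "gui_main.c": "widget layout resize redraw dialog",
    "gui_menu.c": "menu dialog shortcut redraw",
}
files = list(histories)

vec = TfidfVectorizer()
X = vec.fit_transform(histories[f] for f in files)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Print each cluster with its members and its highest-weight terms,
# which serve as an automatic label for the development theme.
terms = vec.get_feature_names_out()
for c in range(km.n_clusters):
    members = [f for f, l in zip(files, km.labels_) if l == c]
    top = km.cluster_centers_[c].argsort()[::-1][:3]
    print(c, members, [terms[i] for i in top])
```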

38 Publicly available at http://sourcechange.sourceforge.net/


4.4 Future Directions of Data/Text Mining Applications in Software Engineering

Defining software engineering metrics with the use of text mining is no different a process from following the existing standards for defining direct or indirect metrics for evaluating software using any background knowledge. The IEEE Standard 1061 [IEE98] defines a methodology for developing metrics for software quality attributes. A framework for evaluating metrics proposed in software engineering, according to the IEEE 1061 Standard, is discussed in [KB04a]. The latter lists ten questions that need to be answered when defining software evaluation measures.

Though any design and implementation of a method using text mining for software evaluation must follow the aforementioned and/or related standards, there is common ground in how text mining can be used in future directions, alongside or on top of the techniques described. A short description of issues that would be interesting to address in the context of this project follows.

• Social network analysis, for the purpose of discovering the important clusters of individuals in a software project, using more sophisticated graph processing techniques, such as PageRank or Spreading Activation.

Social network analysis is in fact a set of long-established algorithms that have been applied in other contexts. The 'future direction' is to extend and apply them in the context of SQO-OSS, aiming at ranking the relevant entities appearing in software development.

• Supervised learning approaches, such as text classification based on predictive modelling techniques, for the purpose of predicting future bugs and/or the parts of code possibly affected. A measure of the future influence of bugs on the source code, associated with a weight and a prediction ranking, can reveal a lot about software quality (a classification sketch follows this list).

• Text clustering of bug reports, together with cluster labelling, can be used to automatically create a taxonomy of bugs in the software. Metrics over that taxonomy can be defined to show the influence that bugs belonging to one category exert on other categories. This can also be read as a metric of bug influence across the software project.

• Graph mining techniques to detect hidden structures in an OSS (Open Source Software) project. A complex graph can be created based on the relations between functions, as defined by the function calls in a project; a program execution is then a path in this graph. Using graph mining techniques (link analysis algorithms, min-cut algorithms), we could derive correlations between paths and errors, predict software behaviour given the first k steps of an execution, and statistically analyse large numbers of paths to support decisions (a path-analysis sketch follows this list).


Graphs can also be derived from the existing OSS software together with the communication data. This implies a graph G(V, E), where each node in V represents a user and each edge in E a communication event, e.g. an e-mail exchange. Applying mining techniques, we can extract useful information from the graph, predict individual actions (i.e. what and when the next action of a user will be) and calculate aggregate measures regarding software quality.
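For the supervised learning direction above, the following is a minimal sketch, under the assumption that past bug reports can be labelled with the component their fix eventually touched. The training data shown is invented, and the model (TF-IDF plus logistic regression in scikit-learn) is just one reasonable choice, not a prescription.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical history: bug report texts and the component whose code
# was changed to fix them.
reports = ["crash when parsing empty config file",
           "socket timeout never triggers reconnect",
           "parse error on nested sections",
           "connection leak after network failure"]
components = ["parser", "network", "parser", "network"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(reports, components)

# Rank the components a new bug is predicted to affect; the ranking can
# feed a weighted metric of expected bug influence on the code.
probs = model.predict_proba(["segfault while reading configuration"])[0]
for comp, p in sorted(zip(model.classes_, probs), key=lambda kv: -kv[1]):
    print(f"{p:.2f}  {comp}")
```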
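For the graph mining direction, the sketch below statistically analyses a set of execution paths labelled as passing or failing, scoring each function by how strongly its presence on a path correlates with failure. The scoring is akin to the well-known Tarantula fault-localisation formula, and the traces are invented.

```python
from collections import Counter

def suspiciousness(traces):
    """traces: iterable of (path, failed), where path is a sequence of
    function names and failed is a bool. Returns functions ranked by how
    strongly their presence on a path correlates with failure."""
    traces = list(traces)
    in_fail, in_pass = Counter(), Counter()
    n_fail = sum(1 for _, failed in traces if failed)
    n_pass = len(traces) - n_fail
    for path, failed in traces:
        for f in set(path):
            (in_fail if failed else in_pass)[f] += 1
    scores = {}
    for f in set(in_fail) | set(in_pass):
        fail_rate = in_fail[f] / n_fail if n_fail else 0.0
        pass_rate = in_pass[f] / n_pass if n_pass else 0.0
        scores[f] = fail_rate / ((fail_rate + pass_rate) or 1.0)
    return sorted(scores.items(), key=lambda kv: -kv[1])

traces = [(["main", "parse", "emit"], True),
          (["main", "emit"], False),
          (["main", "parse", "check"], True),
          (["main", "check"], False)]
print(suspiciousness(traces))  # 'parse' scores highest
```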


5 Related IST Projects

This section contains information about related IST projects. The following list was taken from the draft agenda of the Software Technologies Concertation Meeting, 25 September 2006, Brussels. The projects are presented in alphabetical order.

5.1 CALIBRE

CALIBRE was an EU FP6 Co-ordination Action project that involved the leading authorities on libre/Open Source software. CALIBRE brought together an interdisciplinary consortium of 12 academic and industrial research teams from France, Ireland, Italy, the Netherlands, Poland, Spain, Sweden, the UK and China.

The two-year project managed to:

• Establish a European industry Open Source software research policy forum

• Foster the effective transfer of Open Source best practice to European industry

• Integrate and coordinate European Open Source software research and practice

CALIBRE aimed to coordinate the study of the characteristics of open source software projects, products and processes; distributed development; and agile methods. This project integrated and coordinated these research activities to address key objectives for open platforms, such as transferring lessons derived from open source software development to conventional development and agile methods, and vice versa.

CALIBRE also examined hybrid models and best practices to enable innovative reorganisation of both SMEs and large institutions, and aimed to construct a comprehensive research road-map to guide future Open Source software research. To secure long-term impact, an important goal of CALIBRE was to establish a European Open Source Industry Forum, CALIBRATION, to coordinate policy making into the future. The CALIBRATION Forum and the results of the CALIBRE project were disseminated through a series of workshops and international conferences in the various partner countries.

The first public deliverable of CALIBRE presented an initial gap-analysis of the academic body of knowledge of Libre Software, as represented by 155 peer-reviewed research artifacts. The purpose of this work was to support the wider CALIBRE project goal of articulating a road-map for Libre Software research and exploitation in a European context. For the gap-analysis, a representative collection of 155 peer-reviewed Libre Software research artifacts was examined, attempting to answer three broad questions about each:

• Who are we (the academic research community) looking at?


• What questions are we asking?

• How are we trying to find the answers?

The artifacts were predominantly research papers published in international journals or peer-reviewed anthologies, and/or presented at international conferences. The papers were discovered through citation indices (e.g. EBSCO, Science-Direct, ACM Portal) and through recursion over the references cited within papers. Peer review was the key criterion for inclusion, as this represented the official body of knowledge; however, two particularly influential non-reviewed books [DOS99, Ray01] were also included.

In the second publicly available report of CALIBRE, the development model of Libre software was addressed. This report described what the research community has learnt about those models, and the implications of those lessons for future research lines. Among the different research approaches applied to understanding Libre software, there was a focus on the empirical study of Libre software development, based on quantitative data, usually available from the public repositories of the studied projects. In the report, the peculiarities of Libre software development from a research perspective were also studied, concluding that it is quite an interesting field in which to apply the traditional scientific methodology, thanks to this wealth of public data covering large parts of the development activities and results. From this standpoint, the early and current research was reviewed, offering a sample of the most interesting and promising results, the tools, approaches and methodologies used to reach them, and the current trends in the research community. The report ended with two chapters summarising the most important implications of the current research for the main actors of Libre software development (Libre software developers themselves, companies interested in Libre software development, and the software industry in general), and a road-map for the future of this field. This report was not intended as a set of proven recommendations and forecasts; on the contrary, it was meant as a starting point for discussion, trying to highlight the aspects most relevant to its authors, while certainly missing many others of equal (or greater) interest.

The third deliverable of CALIBRE focused on complexity as a major driver of software quality and costs, both in the traditional sense of software complexity and in the sense of complexity theory. The analysis of a benchmark database of 10 large Libre and open-source projects suggested that:

• Risk evaluations could adequately supplement cost estimations of Libre software products

• Maintenance teamwork seems to be generally correlated with complexity metrics in large Libre software projects


• Libre software projects can be categorised, first, into small (I-Mode) and large (C-Mode) projects in the context of an entrepreneurial analysis of Libre software, and, second, through a dynamic and open meta-maintenance forum which would provide a standard quality assessment model to all software-enabled industries, and especially to the secondary software sector

Another deliverable of CALIBRE presented an overview of the field of distributed development of software systems and applications (DD). Based on an analysis of the published literature, including its use in different industrial contexts, the document provided a preliminary analysis which established the basic characteristics of DD in practice. The analysis resulted in a framework that structured existing DD knowledge by focusing on threats to communication, coordination and control caused by temporal distance, geographical distance, and socio-cultural distance. The purpose of this work was to support the wider CALIBRE project goal of articulating a road-map for DD in relation to Libre Software research and exploitation in a European context. Ultimately, this road-map would form a partial basis for the development of the next-generation software development paradigm, which would integrate DD, Libre software and agile methods.

The next deliverable of this project provided an analysis of the process dimension of distributed software development. This included an investigation of a number of company case studies in various contexts, and presented a reference model for successful distributed development. This model was tailored for distributed scenarios in which time differences are low, as is the case in intra-EU collaborations. The study was broadened to consider strategies for successful Libre (Free/Open Source) software development, and then considered the technology dimension of distributed development. This deliverable was positioned with respect to a road-map for research in the domain of Libre software development.

The establishment of this research road-map was the objective of the next deliverable. It started with a discussion of some of the tensions and paradoxes inherent in FOSS generally, which serve as the engine driving the phenomenon. Then the emergent OSS 2.0 was characterised in terms of the tensions and paradoxes associated with it. Furthermore, a number of business strategies that underpin OSS 2.0 were identified. To exemplify the industrial impact of the phenomenon, six interviews with leading industrial partners using Libre/OSS in different vertical domains were presented, forming a series of industrial viewpoints. Following this, the impact of OSS 2.0 on the IS development process was discussed, along with its wider implications for organisational and societal processes more generally. Finally, the document concluded with a road-map for European research on Libre/OSS, summarising and highlighting the history of Free/Libre/OSS, the current status and the areas where more research is needed.

Agile Methods (AMs) were the focus of another public deliverable of CALIBRE. AMs have grown very popular in the last few years, and so has Libre Software. Both AMs and Libre Software push for a less formal and hierarchical, and more human-centric, development, with a major emphasis on the ultimate goal of development: producing the running system with the correct amount of functionality. This deliverable presented an attempt to deepen the understanding of the analogies between the two approaches and to identify how such analogies may help in gaining a deeper understanding of both. The relationships were analysed theoretically and experimentally, with a final, concrete case study of a company adopting both the XP development process and Libre Software tools.

Other deliverables of CALIBRE reported on the groundwork for future research within the CALIBRE project, leading towards the overall project goal of articulating a road-map for Libre Software in the European context. The research was shaped by the concerns expressed by the CALIBRE industry partners in the various CALIBRE events to date. Specifically, industry partners, notably Paul Everett of the Zope Europe Association (ZEA), identified that the primary challenge for Libre software businesses was effectively delivering the whole product in a manner that takes account of, and in fact leverages, the unique business model dynamics associated with Libre software licensing and processes. The document described a framework for analysing Libre software business models, an initial taxonomy of model categories, and a discussion of organisational and network agility based on ongoing research within the ZEA membership.

Another deliverable of the CALIBRE project presented a selection of product and process metrics defined in various suites, frameworks and categorisations to date. Each metric was analysed for citations and applications to both agile and Libre development approaches. Opportunities for migration and knowledge transfer between these areas were stressed and outlined. The document also summarised the product maturity models available for Open Source software and emphasised the need for alternative approaches to shaping Open Source process maturity models.

The CALIBRE project also produced the CALIBRE Working Environment (CWE), and a deliverable described its first version. The requirements for the system were described, and the way in which the CWE addresses these requirements was identified. The CWE requirements were identified collaboratively, in consultation with its users, and the system as it stands largely meets the needs of the users. The software and hardware used to implement the CWE were described, and areas for further work were identified. The current CWE is located at http://hemswell.lincoln.ac.uk/calibre/ and allows registered members to prepare content with varying levels of dissemination (public, restricted to registered members, and private), upload documents and files, add events to a shared calendar and archive mailing list information.

The last publicly available deliverable of CALIBRE focused on education and training on Libre (Free, Open Source) software. In this report, a scenario which could be considered the second generation in Libre software training was presented: the compendium of knowledge and experiences needed to deal with the many facets of the Libre software phenomenon. For this goal, higher education was considered the best possible framework, and the main guidelines of such a programme on Libre software were proposed. In summary, the studies designed in this report were aimed at providing students with the knowledge and expertise that would make them experts in Libre software. The programme provided capabilities and enhanced skills to the point that students can deal with problems ranging from the legal or economic areas to the more technically oriented ones. It did not (intentionally) focus on a set of technologies, but approached the Libre software phenomenon from a holistic point of view. However, it was also designed to provide practical and real-world knowledge. It could be offered jointly by several universities across Europe, within the framework of the ESHE, or adapted to the specific needs of a single one. In addition, it could also be adapted for non-formal training.

5.2 EDOS

EDOS stands for Environment for the Development and Distribution of Open Source software. It is a research project funded by the European Commission as a STREP project under the IST activities of the 6th Framework Programme. The project involves universities (Paris 7, Tel Aviv, Zurich and Geneva), research institutes (INRIA) and private companies (Caixa Magica, Nexedi, Nuxeo, Edge-IT and CSP Torino).

The project aims to study and solve problems associated with the production, management and distribution of Open Source software packages. Software packages are files in the RPM or Debian packaging format that contain executable programs or libraries and their files, along with metadata describing what is in the package and what conditions are needed to use it.

There are several problems associated with software packages.

• Dependencies: Software packages may need other software packages to run, and often they do not state exactly which other packages they need, leaving a large room for choice. Also, some software packages cannot be installed at the same time. This makes the job of tools that automatically download required software packages difficult. Distribution maintainers want to make sure that there is always a way of selecting available packages to correctly install every piece of software they include, and that users can upgrade their systems without losing functionality. Work package 2 handled these issues.

The stated goal of EDOS Work package 2 was:

To build new-generation tools for managing large sets of software packages, like those found in Free software distributions, using formal methods


The focus was mainly on the issues related to dependency management for large sets of software packages, with particular attention to what must be done to maintain the consistency of a software distribution on the repository side, as opposed to maintaining a set of packages on a client machine. This choice is justified by the fact that maintaining the consistency of a distribution of software packages is essential to make sure the current distributions will scale up, yet it is also an invisible task, as the smooth working it ensures on the end-user side will tend to be considered as normal and obvious as the smooth working of routing on the Internet. In other words, the project was tackling an essential infrastructure problem, which was perfectly suited for a European Community funded action. Over the first year and a half of its existence, the Work Package 2 team of the EDOS project carried out an extensive analysis of the whole set of problems in its focus, ranging from upstream tracking to thinning, rebuilding, and dependency management for F/OSS distributions.

• Downloading: Users need to download software packages from somewhere. This requires a lot of bandwidth and puts strain on the mirrors that host those packages. This problem would be better solved with peer-to-peer methods. Work package 4 handles these issues.

The goal of this work package is to investigate scalable and secure solutions for improving the process of distributing data (source code, binaries, documentation and meta-data) to end users. The key issue in the code distribution process is the ability to transfer a large code base to a large number of people. In the case of Mandrake Linux, for instance, this entails copying a code base of 20 Gigabytes to a community of up to 4 million users (i.e. the number of installed versions of Mandrake Linux). This community is growing, so the problems have to be addressed. Currently the process is quite slow, as it takes 48 hours to copy from the master server to all mirror servers. This creates a latency problem that leads to inconsistencies on the user and developer side, which in turn can create awkward dependencies at the module level in future releases. This work package will test and evaluate two alternative architectures for data distribution that address the issues of latency and consistency.

• Quality assurance: The complexity of the quality assurance process increases exponentially with the number of packages and the number of platforms. To keep the workload manageable, Linux distribution developers are forced to reduce system quality, reduce the number of packages, or accept long delays before the final release of a high-quality system. Work package 3 handles these issues.

The goal of this work package is to research and experiment with solutions which will ultimately allow a dramatic reduction in the costs and delays of quality assurance in the process of building an industry-grade custom GNU/Linux distribution or a custom application comprising several packages. It will design, implement and experiment with an integrated quality assurance framework based on code analysis and runtime tests, which operates at the system level.

• Metrics: Following the “release early, release often” philosophy, Free and Open Source software is always in constant development, and any serious project has many versions floating around: older but stable versions, and newer versions with new features but more bugs. Free software can be of wildly varying quality. Quality metrics are defined, their relevance is assessed and they are implemented. Work package 5 handles these issues.

The goal of work package 5 is to develop technology and products that will improve the efficiency of two key processes and one system. The two processes are the generation of a new version of a distribution from the previous version and the production of a customised distribution from an existing one. The system is the current inefficient mechanism for mirroring the Cooker data, which needs to be replaced by a more efficient one. In the end, a demonstration that the processes have indeed been improved and that the system has been replaced will take place. Thus, the goal is to define a set of metrics to measure the efficiency of the processes in question. These metrics will include manpower, as measured in man-months, and elapsed time.

The EDOS project attempts to solve those problems by using formal methods coming from the academic research groups in the project, to address in a novel way three outstanding problems:

• Dependency management among large, heterogeneous collections of software packages.

• Testing and QA for large, complex software systems.

• The efficient distribution of large software systems, using peer-to-peer and distributed database technology.

These problems were studied and various technical reports were produced, explaining their importance and giving ways of expressing them mathematically, algorithms for solving the associated problems, and real-world statistics. A certain amount of software was also produced, which is, of course, Free and Open Source:

• Java software for the peer-to-peer distribution of software packages.

• debcheck/rpmcheck is a very efficient piece of OCaml software for verifying that a Debian or RPM collection of packages does not contain non-installable packages (a naive sketch of such a check follows this list).


• The day-to-day evolution of the Debian packages, that is, their detailed history, can be browsed using anla. This also gives, for every day, reports on installable software packages and a global installability index (Debian weather).

• That history can be queried in the EDOS-designed Debian Query Languageusing the command-line tool history or the AJAX-based EDOS Console.

• Ara is a search engine for Debian packages that allows arbitrary boolean combinations of field-limited regular expressions, and that ranks results by popularity (again in OCaml)
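To make the installability problem above concrete, here is a minimal sketch of a naive backtracking check; the metadata format is invented, and real tools like debcheck/rpmcheck encode the problem for SAT-style solvers instead of searching like this.

```python
def installable(pkg, repo):
    """True if `pkg` has a conflict-free, dependency-closed install set.
    repo maps a package name to {"depends": [group, ...], "conflicts": set},
    where each group is a list of alternative packages, one of which must
    be installed (a simplified Debian/RPM dependency model)."""
    def conflict_free(cand, chosen):
        return not (repo[cand]["conflicts"] & chosen) and \
               all(cand not in repo[c]["conflicts"] for c in chosen)

    def satisfy(chosen, groups, frontier):
        if not groups:                      # all groups of this package done
            return extend(chosen, frontier)
        for alt in groups[0]:
            if alt in chosen and satisfy(chosen, groups[1:], frontier):
                return True
            if alt in repo and alt not in chosen and conflict_free(alt, chosen):
                if satisfy(chosen | {alt}, groups[1:], frontier + [alt]):
                    return True
        return False

    def extend(chosen, frontier):
        if not frontier:                    # every chosen package processed
            return True
        return satisfy(chosen, repo[frontier[0]]["depends"], frontier[1:])

    return extend({pkg}, [pkg])

repo = {"a": {"depends": [["b", "c"]], "conflicts": set()},
        "b": {"depends": [], "conflicts": {"a"}},
        "c": {"depends": [], "conflicts": set()}}
print(installable("a", repo))  # True: the alternative c avoids the conflict
```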

5.3 FLOSSMETRICS

FLOSSMetrics stands for Free/Libre Open Source Software Metrics.

Industry, SMEs, public administrations and individuals are increasingly relying on Libre (Free, Open Source) software as a competitive advantage in the globalising, service-oriented software economy. But they need detailed, reliable and complete information about Libre software, specifically about its development process, its productivity and the quality of its results. They need to know how to benchmark individual projects against the general level. And they need to know how to learn from, and adapt, the methods of collaborative, distributed, agile development found in Libre software to their own development processes, especially within industry.

FLOSSMETRICS addresses those needs by analysing a large quantity (thousands) of Libre software projects, using already proven techniques and tools. This analysis will provide detailed quantitative data about the development process, development actors, and developed artifacts of those projects, their evolution over time, and benchmarking parameters to compare projects. Several aspects of Libre software development (software evolution, human resources coordination, effort estimation, productivity, quality, etc.) will be studied in detail. The main objective of FLOSSMETRICS is to construct, publish and analyse a large-scale database with information and metrics about Libre software development coming from several thousands of software projects, using existing methodologies and tools already developed. The project will also provide a public platform for validation and industrial exploitation of results.

The FLOSSMetrics targets are to:

• Identify and evaluate sources of data and develop a comprehensive database structure, built upon the results of CALIBRE (WP1, WP2).

• Integrate already available tools to extract and process such data into a complete platform (WP2).


• Build and maintain an updated empirical database by applying extraction tools to thousands of open source projects (WP3).

• Develop visualisation methods and analytical studies, especially relating to benchmarking, identification of best practices, measuring and predicting success and failure of projects, productivity measurement, simulation and cost/effort estimation (WP4, WP5, WP6, WP11).

• Disseminate the results, including data, methods and software (WP7).

• Provide for exploitation of the results by producing an exploitation plan, validated with the project participants from industry, especially from an SME perspective (WP8, WP9, WP10).

The main results of FLOSSMETRICS will be: a huge database with factual details about all the studied projects; higher-level analyses and studies which will help to understand how Libre software is actually developed; and a sustainable platform for continued, publicly available benchmarking and analysis beyond the lifetime of the project. With these results, European industry, SMEs, public administrations and individuals will be able to take informed decisions about how to benefit from the competitive advantage of Libre software, either as a development process or in the evaluation and choice of individual software applications. The project methodologies and findings go well beyond Libre software, with implications for evolution, productivity and development processes in software and services in general.

FLOSSMETRICS is scheduled in three main phases (running partially in parallel). The first one will set up the infrastructure for the project and the first version of the database with factual data. During the second phase most of the studies and analyses will be performed, and the contents of the database will be enlarged and improved. During the third phase the results of the project will be validated and adapted to the needs of the target communities.

The usability of the results of the project (datasets and studies) will be targeted at several different users: SMEs developing or using Libre software (or even interested in it), industrial players developing Libre software, and the Libre software community at large. Based on the feedback obtained in these contexts, a complete exploitation strategy will also be designed.

Dissemination to these communities will be performed using the project website, specific presentations at conferences, and by organising a series of workshops. Wide impact of the results will be supported by using open-access licenses for all output documents.

The data is also expected to be useful for the scientific community, which could use it in its own research, thus helping to improve the general understanding of Libre software development.


The impact of the project is expected to be large in the Libre software development realm (and in the whole software development landscape). FLOSSMETRICS will produce the most complete and detailed view of the current landscape of Libre software, providing not only a static snapshot of how projects are performing now, but also historical information about the last ten years of Libre software development.

5.4 FLOSSWORLD

FLOSSWorld stands for Free, Libre and Open Source Software - Worldwide Impact Study. The FLOSSWorld project aims to strengthen Europe's leadership in research into FLOSS and open standards, building a global constituency with partners from Argentina, Brazil, Bulgaria, China, Croatia, India, Malaysia and South Africa. FLOSSWorld is a European Union funded project involving 17 institutions from 12 countries spanning Europe, Africa, Latin America and Asia, undertaking a worldwide study on the impact of select issues in the context of Free/Libre Open Source Software (FLOSS).

Context Free/Libre/Open Source Software (FLOSS) is arguably one of the best examples of open, collaborative, internationally distributed production and development that exists today, resulting in tremendous interest from around the world, from government, policy, business, academic research and developer communities.

The problem Empirical data on the impact of FLOSS, its use and development is still quite limited. The FP5 FLOSS project and FP6 FLOSSPOLS project have helped fill in the gaps in knowledge about why and how FLOSS is developed and used, but have necessarily been focused on Europe. FLOSS is a global phenomenon, particularly relevant in developing countries, and thus more knowledge on FLOSS outside Europe is needed.

Project objectives FLOSSWorld primarily aims to strengthen Europe's leadership in international research in FLOSS and open standards, and to exploit research and policy complementarities to improve international cooperation, by building a global constituency of policy makers and researchers. It is expected that FLOSSWorld will enhance Europe's leading role in research in the area of FLOSS and strongly embed Europe in a global network of researchers and policy makers, and the business, higher education and developer communities. FLOSSWorld will enhance the level of global awareness related to FLOSS development and industry, human capacity building, standards and interoperability, and e-government issues in the geographical regions covered by the consortium. The project will result in a stronger, sustainable research community in these regions. Broad constituency-building exercises risk losing momentum after initial workshops and meetings without specific actions to sustain a focus. FLOSSWorld will perform three global empirical studies of proven relevance to Europe and third countries, which will provide a foundation for FLOSSWorld's regional and international workshops. The studies will cover topics such as the impact of being in a FLOSS community on career growth and prospects, motivational factors in the choice of FLOSS, perspectives of the user community towards FLOSS, inter-regional differences in FLOSS development methodology, etc.

A four-track approach FLOSSWorld is designed around three research tracks, each providing insights and gathering empirical evidence on important aspects of FLOSS usage and development:

1. Human capacity building: investigating FLOSS communities as informal skills development environments, with economic value for employment

2. Software development: spotting the regional and international differences (technical, organisational, business) between FLOSS projects across countries

3. e-Government policy: reporting adopted policies and behaviour of governments around the world towards FLOSS, open standards and interoperability

4. Workshops and working group activities to build an international research and policy development constituency: following and in parallel with the research tracks will be a fourth track of regional and international workshops and focused working groups from the represented target regions, for building further collaboration.

The first phase focuses on actual collaboration by implementing tracks 1 to 3, while the second phase focuses on analysis and on building concrete future collaborations. Global dissemination is part of the second phase, as is the engagement of organisations outside the FLOSSWorld consortium.

Schedule FLOSSWorld is funded by the 6th Framework Programme and is a 2-year project.

The following table shows the schedule of the project.

Goals of Workshops During the workshops all consortium partners (17 in all) are brought together with additional participants from their countries, and observers from the organisations listed as having provided letters of support to the FLOSSWorld project. Workshop participants are experts representing the interests of the Open Source community, government, businesses, researchers and higher education institutes, as appropriate for the workshop questions. Some participants will take a more active role as specific questions are addressed, but in principle all three research tracks will be treated in every workshop.


Date                    | Action                                    | Subject                                | Place
1/05/2005               | Start                                     |                                        |
Nov 05 - Mar 2006       | 1st regional workshops                    | Discuss research questions, interact   | Buenos Aires, Beijing, Mumbai, Sofia (Bulgaria), Nairobi (Kenya)
26/04/2006 - 28/04/2006 | 1st International Workshop                |                                        | Brussels, Belgium
Nov 2005 - Jul 2006     | On-going survey and study                 |                                        |
Aug 2006 - Sep 2006     | Analysis                                  |                                        |
Oct 2006 - Feb 2007     | 2nd Regional and International Workshops  | Discuss survey results, policy issues  |
Feb - Apr 2007          | Finalise recommendations                  |                                        |
30/04/2007              | End                                       |                                        |

On-going survey FLOSSWorld is conducting worldwide surveys among the following target groups:

1. Private sector

2. Government sector

3. Open Source community participants

4. Higher Education Institutes - Administrators

5. Higher Education Institutes - IT Managers

Furthermore, the questions differ from country to country to ensure international comparability, e.g. using local currencies in the questionnaire and localised scales when asking about income or expenditure levels, and introducing additional questions that are unique to each country's context. The FLOSSWorld survey is, at the least, set to become an indicator of local OSS perception, usage and adoption compared to other countries in the world.


5.5 PYPY

The PyPy project has been an ongoing Open Source Python language implementation since 2003. In December 2004 PyPy received EU funding within Framework Programme 6, second call for proposals ("Open development platforms and services", IST).

PyPy is an implementation of the Python programming language written in Python itself, flexible and easy to experiment with. The long-term goals of this project are to target a large variety of platforms, small and large, by providing a compiler tool suite that can produce custom Python versions. Platform, memory and threading models are to become aspects of the translation process, as opposed to encoding low-level details into the language implementation itself. Eventually, dynamic optimisation techniques, implemented as another translation aspect, should become robust against language changes.

A consortium of 8 (12) partners in Germany, France and Sweden is working to achieve the goal of an open run-time environment for the Open Source programming language Python. The scientific aspect of the project is to investigate novel techniques (based on aspect-oriented programming, code generation and abstract interpretation) for the implementation of practical dynamic languages.

A methodological goal of the project is also to showcase a novel software engineering process, Sprint-Driven Development. This is an agile methodology providing a dynamic and adaptive environment, suitable for co-operative and distributed development.

The project is divided into three major phases: phase 1 focuses on developing the actual research tool (the self-contained compiler), phase 2 on optimisations (core, translation and dynamic), and phase 3 on the actual integration of efforts and the dissemination of the results. The project has an expected deadline in November 2006.

PyPy is still, though EU-funded, heavily integrated in the Open Source community of Python. The methodology of choice is the key strategy to make sure that the community of skilled and enthusiastic developers can contribute in ways that would not have been possible without EU funding.

5.6 QUALIPSO

Goals The Integrated Project QualiPSo aims at making a major contribution to the state of the art and practice of Open Source Software. The goal of the QualiPSo integrated project is:

to define and implement technologies, procedures and policies to leverage current Open Source Software development practices into sound, well-recognised and established industrial operations.


The project brings together software companies, application solution developers and research institutions, and is driven by the need to establish for OSS the appropriate level of trust that would make OSS development an industrially and widely accepted practice. To reach this goal the QualiPSo project will define, deploy and launch the QualiPSo Competence Centres in Europe (4), Brazil (1) and China (1), all of them making use of the QualiPSo Factory.

Exploitation of results will be achieved through different routes, but with the common theme of partners incorporating these results into current or planned products. Through its founding partners, the QualiPSo project will be closely related to important OSS communities such as ObjectWeb and Morfeo.

With the economy moving towards new open models, the potential impact of QualiPSo will be across the entire chain of software system development, proposing an integrated approach along many dimensions:

• technically, through a focus on complementary problem areas addressed by strong research teams,

• industrially, through application partners from different sectors who share a common vision for the potential of services,

• managerially, through the creation of a strong management structure based on an entrepreneurial company,

• internationally, with partners from different countries coming from different continents,

• individually, through strong existing working relationships between partners.

The need to sustain and advance the QualiPSo solutions in the future requires an open sustainability approach. QualiPSo is open in the following ways:

• its use of open standards and the Open Source software development approach

• it is based on an open community to enlarge and reinforce its resources and input from researchers, scientists, IT professionals and users

• it is open to expansion, by inserting new application scenarios and other project results in a "plug and play" manner.

The project will be structured into the following classes of activities:

• Problem activities: These activities provide the foundation and technological content upon which the project is built.

• Legal Issues: This activity addresses the need for a clear legal context in which OSS will be able to evolve within the European Union.


• Business Models: This activity addresses the need to incorporate new software development models that can cope with the OSS peculiarities.

• Interoperability: This activity addresses the needs of the software industry for standards-based interoperable software.

• Trustworthy Results: This activity addresses the need for the definition of clearly identified and tested quality factors in OSS products.

• Trustworthy Processes: This activity addresses the need for the definition of an OSS-aware standard software development methodology.

• Project activities: The project activities are cross-cutting activities that take the results generated by the problem activities, integrate them into a coherent framework, and assess and improve their applicability using the selected application scenarios. Project activities also include all issues related to industrialisation, dissemination, standardisation, and exploitation of the resulting framework. These activities are the following:

• QualiPSo Factory: This activity integrates the results achieved in the prototyping phase of the problem activities to create the QualiPSo environment.

• QualiPSo Competence Centre: This activity aims to develop the means for continuous and sustainable (beyond the scope of the project) centralisation of reference information concerning quality OSS development.

• Promotion and support: This activity aims to develop awareness of the QualiPSo results within the global OSS community.

• Demonstration

• Training: This activity will focus on providing training services both in the classroom and through the Internet, in order to evangelise the results of QualiPSo.

Coordination To achieve its ambitious goal QualiPSo will pursue the following objectives:

• Define methods, development processes, and business models for the implementation and deployment of Open Source Software systems, to assure intensive software consumers that Open Source projects conform to the standards required to provide industry-level software.

• Design and implement a specific environment where different tools are integrated to facilitate and support the development of viable industrial OSS systems. This environment will include a secure collaborative platform able to guarantee that there is no malicious intrusion in the development of code. This also implies support for the audits of software liability that IT players need in order to be able to indemnify their users in case of problems caused by the software.

• Implement specific tools for benchmarking to check the expected quality of OSS, tools that will prove non-functional properties, such as robustness and scalability, for supporting major critical applications. The evaluation of these qualities will be carried out in a rigorous yet practical way that will encompass both static (i.e. related to the structure of OSS) and dynamic (i.e. related to the execution and use of OSS) aspects.

• Implement and support better practices with respect to the management of information (including source code, documentation and information exchanged between the actors involved in a project) in order to improve the productivity of the development and evolution of OSS systems.

• Demonstrate the interoperability which is at the centre of the Open Standards commonly implemented in OSS, by providing test suites and qualified integration stacks.

• Understand the legal conditions by which OSS products are protected and recognised, without violating the OSS spirit.

• Develop a long-lasting network of professionals concerned with the quality of Open Source Software for enterprise computing.

5.7 QUALOSS

The strategic objective of this project is to enhance the competitive position of the European software industry by providing methodologies and tools for improving its productivity and the quality of its software products.

To achieve this goal, the project aims to build a high-level methodology to benchmark the quality of Open Source software, in order to ease the strategic decision of integrating adequate F/OSS components into software systems. The results of the QUALOSS project directly address the strategic objective 2.5.5 of providing methodologies for using Open Source software in industrial development, enabling its benchmarking, and supporting its development and evolution.

Two main outcomes of the QUALOSS project achieve the strategic objectives: an assessment methodology for gauging the evolvability and robustness of Open Source software, and a tool that largely automates the application of the methodology. Unlike current assessment techniques, the QUALOSS approach combines data from software products (source code, documentation, etc.) with data about the developer community supporting the software products, in order to estimate the evolvability and robustness of the evaluated software products.

In fact, QUALOSS takes advantage of information widely available in F/OSS repositories, which often contain both kinds of information, that is, software product data and data produced by the developer community while developing and maintaining the software product. Although the tools aim to automate most of the procedure of applying the quality models, it is unlikely that every aspect can be computed automatically, hence input from the user will be needed. This is why the tools will be accompanied by a user manual specifying, first, the manual activities to perform when applying the quality models and, second, how to use the outcomes of the manual activities in combination with the tools to finally estimate the evolvability and robustness of the selected F/OSS component. In the end, the tools and the user manual provide the user with an integrated assessment methodology to gauge the quality of F/OSS components.

Ultimately, the tooled methodology serves the strategic objectives stated above. By integrating more evolvable and robust F/OSS components into their solutions, organisations will spend less time fighting with the F/OSS components and hence will be more productive. This proposition will be studied through case studies.

This instrumented method will allow productivity to increase and software quality to improve by integrating evolvable and robust Open Source software. In more quantifiable terms, the targets of the QUALOSS project are:

• to increase the productivity of software companies by 30%

• to decrease the average number of defects by 10%

• to decrease the effort needed to modify software by 20%

The QUALOSS consortium is composed of leading research organisations in the fields of measurement, software quality and Open Source, as well as a panel of industry representatives (including SMEs) involved in Open Source projects.

5.8 SELF

SELF will be a web-based, multi-language, free-content knowledge base written collaboratively by experts and interested users. The SELF Platform aims to be the central platform with high-quality educational and training materials about Free Software and Open Standards. It is based on world-class Free Software technologies that permit both reading and publishing free materials, and is driven by a worldwide community.

The SELF Platform is a repository of free educational and training materials on Free Software and Open Standards and an environment for the collaborative creation of new materials. Inspired by Wikipedia, the SELF Platform provides the materials in different languages and forms. The SELF Platform is also an instrument for the evaluation, adaptation, creation and translation of these materials. Most importantly, the SELF Platform is a tool to unite community and professional efforts for public benefit.

The general strategic objectives of the SELF project are:

• Bring together universities, training centres, Free Software communities, software companies, publishing houses and government bodies to facilitate mutual support and the exchange of educational and training materials on Free Software and Open Standards.

• Centralise, transmit and enlarge the available knowledge on Free Software and Open Standards by creating a platform for the development, distribution and use of information, educational and training programmes about Free Software and its main applications.

• Raise awareness and contribute to the building of critical mass for the use of Free Software and Open Standards.

The concrete project objectives of the SELF project are:

• Research the state of the art of currently available Free Software educational and training programmes and detect the potential gaps.

• Create an open platform for the development, distribution and use of information, educational and training programmes on Free Software and Open Standards.

• Develop educational and training materials concerning Free Software and Open Standards. The project aims to include information on at least 50 software applications in the initial period.

• Make the SELF Platform self-sustainable by creating an active community of individuals and institutions (universities, training centres, Free Software communities, software companies, publishing houses and government bodies) around it. The SELF project aims to involve at least 150 members in the SELF community by the end of the project.

While the SELF Platform will be started by the members of the consortium, its final goal is to become a community of different interested parties (from governments and educational institutes to companies) that can not only exploit the SELF materials but also participate in their production. The commercial and educational interest in exploiting the SELF materials will assure the self-sustainable character of the SELF Platform beyond the EC funding period.

This project starts from three main assumptions:


1. Free Software and Open Standards are crucial to support the competitive position of the European software industry.

2. The real and long-term technological change from proprietary to Free Software can only come by investing in education and training.

3. The production of educational and training materials on Free Software and Open Standards should be done collaboratively by all the parties involved.

That is why the SELF Platform will have two main functions: it will be simultaneously a knowledge base and a collaborative production facility. On the one hand, it will provide information, educational and training materials that can be presented in different languages and forms: from course texts, presentations, e-learning programmes and platforms to tutor software, e-books, instructional and educational videos and manuals. On the other hand, it will offer a platform for the evaluation, adaptation, creation and translation of these materials. The production process for such materials will be based on the organisational model of Wikipedia. In short, SELF will be a web-based, multi-language, free-content knowledge base written collaboratively by experts and interested users.

5.9 TOSSAD

Europe, as a whole, has a stake in improving the usage of F/OSS in all branches of IT and in public life in general. F/OSS communities throughout Europe can achieve better results through the coordination of their research activities/programmes that reflect the current state of the art.

The main objective of the tOSSad project is to start integrating and exploiting already formed methodologies, strategies, skills and technologies in the F/OSS domain, in order to help governmental bodies, educational institutions and SMEs share research results, establish synergies, build partnerships and innovate in an enlarged Europe.

More precisely, the tOSSad project aims at improving the outcomes of the F/OSS communities throughout Europe by supporting the coordination and networking of these communities by means of state-of-the-art studies, national programme initiations, usability cases, curriculum development and the implementation of a collaborative information portal and web-based groupware.

Main tOSSad coordination activities are:

• F/OSS study (Work package 1)

• F/OSS national programs (Work package 2)

• F/OSS usability study (Work package 3)


• F/OSS curriculum development (Work package 4)

• Dissemination and exploitation (Work package 5)

Work package 1 has the intention of producing a report detailing both the current status of F/OSS adoption in European countries and the barriers that further adoption might face. The main goal is to give a clear picture of the current status (usage, implementation, adoption, penetration, government policies, etc.) of F/OSS with respect to the following topics:

• The technical barriers that hinder F/OSS usage on a larger scale

• Infrastructural weaknesses in some European countries

• Usability and accessibility

• Operating system specific technical problems

• Social barriers that hinder F/OSS usage on a larger scale

• Educational weaknesses

• Cultural readiness

• Political and financial problems

• Market problems (existing monopolies of any sort)

• Current and future trends and opportunities

The main deliverable of WP1 is a report entitled "F/OSS Study".

Work package 2 aims to start up national programmes for improved usage of F/OSS in some (at least one) of the target countries and to develop guidelines for F/OSS adoption in the public sector. As part of this work package, an expert group (containing individuals from the partners, as well as policy makers from governmental bodies) will be established at the kick-off meeting, which all participants will attend. This expert group will also help national and regional government institutes understand the benefits of F/OSS and Open Source components where possible. A main goal of WP2 is to produce a road-map for F/OSS adoption; the deliverables of the work package are designed according to this goal.

Work package 2 tasks:


• Organising one workshop aiming to determine the requirements for national programmes, with special focus on best practices and success stories, F/OSS in the public sector, and migration strategies.

• Preparing research documents that can be proposed for inclusion in the National ICT Programmes. These documents should focus on the following items:

• Usability centres, F/OSS R&D and solution centres

• Making use of F/OSS for e-learning

• F/OSS training and certification solutions for IT people, developers and users, making use of existing or new training institutions

• Catalysing the formation of Open Source communities and participation in the development of Open Source software as part of global projects

• Collaborative models of joint development between F/OSS target countries and Member States with superior F/OSS adoption.

• Building partnerships within the public and private sectors and civil society, as well as regionally within Europe.

• Preparing not only high-level case histories, but also all the details needed to copy and implement F/OSS solutions locally.

• Lobbying national-strategy decision makers in the public sector by putting forward reports on the economic and social benefits of F/OSS usage. These reports can include success stories from Europe and worldwide.

• Developing guidelines towards F/OSS adoption and dissemination in public bodies.

The major objectives of Work package 3 are to tackle the obstacles to usability in F/OSS and to lead to a breakthrough, by assuring that usability will be paid more attention in F/OSS in the future. To reach these objectives, besides the intensive spreading of awareness, the following major areas will be addressed within Work package 3:

• A state-of-the-art usability study based on both in-depth desk research and an empirical survey in F/OSS. If appropriate, the survey will be integrated into the empirical investigations conducted in Work package 1.


• Usability tests of selected F/OSS components, with a specific focus on desktop applications, personal information management (PIM) and office applications.

• Based on the test results and on research into tomorrow's usability requirements (mobile end devices, voice interaction, wearables), F/OSS usability gaps will be detected. From these, recommendations for future research directions will be generated.

• A guideline that takes into account both attention to usability aspects during F/OSS development and the conduct of usability testing. Here the focus will be on recurrent user involvement for usability assurance during shared development, via mock-ups for inclusion in F/OSS development environments.

Work package 4 gathers partners with deep and complementary knowledge in software engineering, university curriculum development, e-learning and collaborative learning, and the application of Open Source methodology and business models to real-world problems. The WP4 partners shall work together to define one or more broadly accepted, detailed curricula for F/OSS. The focus will be in particular on items 2 and 3 below (courses and curricula about the F/OSS Linux operating system and related system applications, and courses about F/OSS software development tools), not excluding studying and giving suggestions on items 1, 4 and 5.

Work package 4 curriculum development items are as follows:

1. Courses and curricula about using the most popular F/OSS desktop applications (F/OSS office automation software, mail applications, Web browsers, wikis, etc.), even on proprietary operating systems.

2. Courses and curricula about F/OSS server applications and management: the Linux operating system, application servers (Tomcat), web servers (Apache), databases, middleware, and related system applications.

3. Courses and curricula about F/OSS software development tools: IDEs (Eclipse), versioning systems and related tools.

4. Courses and curricula about how to develop and take advantage of F/OSS software and about the software engineering of F/OSS. These are related to ongoing research on methodologies and tools for F/OSS development, and aim to train software developers able to build, customise and consult on F/OSS applications as active members of the F/OSS development community.

5. Use of F/OSS software in computer science courses and curricula, as a cheap and powerful means to help in understanding computer science concepts.



Abbreviations

XML Extensible Markup Language

SQL Structured Query Language

HTML Hypertext Markup Language

IDE Integrated Development Environment

UML Unified Modeling Language

CSV Comma Separated Values

COTS Commercial off-the-shelf
