DESCRIPTION
Context: The number of defects fixed in a given month is used as an input for several project management decisions, such as release timing, maintenance effort estimation, and software quality assessment. Past activity of developers and testers may help us understand the future number of reported defects. Goal: To find a simple, easy-to-implement solution predicting defect exposure. Method: We propose a temporal collaboration network model that uses the history of collaboration among developers, testers, and other issue originators to estimate the defect exposure for the next month. Results: Our empirical results show that the temporal collaboration model can be used to predict the number of exposed defects in the next month with an R2 value of 0.73. We also show that temporality gives a more realistic picture of the collaboration network than a static one. Conclusions: We believe that our novel approach may be used to better plan for upcoming releases, helping managers make evidence-based decisions.
Effect of Temporal Collaboration Network, Maintenance Activity, and Experience on Defect Exposure
Andriy Miranskyy1, Bora Caglayan1, Ayse Bener1, and Enzo Cialini2
1 DSL, Ryerson University; 2 IBM Toronto Labs
ESEM'14 1
Motivation
• Number of defects fixed in a given month is used as an input for several project management decisions, e.g.:
– release time,
– maintenance effort estimation,
– software quality assessment.
• Past activity of developers and testers may help us understand the future number of reported defects.
Proposed Solution / Method
• We propose a temporal collaboration network model that uses the history of collaboration among developers, testers, and other issue originators to estimate the defect exposure for the next month.
Methodology: Dataset
• An enterprise software developed in 5 countries;
• 30 MLOC;
• 4 releases;
• 10 years (2003 – 2013);
• Developed using C/C++;
• Development lifecycle:
– development starts 3–4 years before the shipping date;
– the product is maintained for 5–7 years.
Actors
• Record owner
– developer, merger / integrator.
• Record originator (defect fix)
– developer, tester, or support analyst;
– originator and owner can be the same person.
• If the change is associated with new functionality, then the originator can be a requirements engineer.
Attributes
• In total, 20 attributes are used:
– Project, Maintenance activity, Collaboration activity, Experience.
• Collaboration types
– edge_type: owner-owner, owner-originator, and all.
• The collaboration network may contain data for different time horizons, denoted by graph_tm.
• For a given month we look at three time frames: cumulative, quarterly, and monthly:
– Cumulative: for a given month ti, includes all collaboration data from the beginning of the release (t0) up to and including the data from ti: [t0, ti].
– Quarterly: includes all collaboration data for the last three months: [ti-2, ti].
– Monthly: includes all collaboration data for the month ti.
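The three time frames above can be sketched as simple filters over change records. This is an illustrative sketch only, with hypothetical records stored as (month_index, owner, originator) tuples, where month 0 corresponds to t0:

```python
# Hypothetical change records: (month_index, owner, originator), month 0 = t0.
records = [(0, "dev_a", "dev_b"), (3, "dev_a", "tester_c"),
           (4, "dev_b", "tester_c"), (5, "dev_a", "dev_b")]

def window(records, t_i, frame):
    """Select the records for month t_i under one of the three time frames."""
    if frame == "cumulative":          # [t0, t_i]
        return [r for r in records if r[0] <= t_i]
    if frame == "quarterly":           # [t_i - 2, t_i]
        return [r for r in records if t_i - 2 <= r[0] <= t_i]
    if frame == "monthly":             # month t_i only
        return [r for r in records if r[0] == t_i]
    raise ValueError(frame)

print(len(window(records, 4, "cumulative")))   # 3 records in [0, 4]
print(len(window(records, 4, "quarterly")))    # 2 records in [2, 4]
print(len(window(records, 4, "monthly")))      # 1 record in month 4
```

A collaboration graph for the chosen frame would then be built from the (owner, originator) pairs of the selected records.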
Data Cleaning
• Eliminated invalid change records by filtering out all records that did not result in a code change (duplicate, rejected, returned, etc.).
• If at least one attribute associated with a given month_before_ga was missing, the whole month was removed (low-activity months; early/late SDLC).
• Multicollinearity:
– iteratively removing the explanatory variable (attribute) with the maximum variance inflation factor (VIF), until the VIF of every attribute in the model was less than five.
• After cleaning, each data subset contains between 96 and 341 observations.
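The iterative VIF-based pruning can be sketched in a few lines. This is a minimal sketch, not the paper's implementation; the data are synthetic, and VIF is computed as 1 / (1 - R2) from regressing each column on the remaining ones:

```python
import numpy as np

def vif(X, j):
    """VIF of column j: 1 / (1 - R^2) when column j is regressed
    on the remaining columns (with an intercept)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r2 = 1.0 - (y - A @ beta).var() / y.var()
    return 1.0 / max(1.0 - r2, 1e-12)

def drop_collinear(X, names, threshold=5.0):
    """Iteratively drop the column with the largest VIF until all VIFs < threshold."""
    names = list(names)
    while X.shape[1] > 1:
        vifs = [vif(X, j) for j in range(X.shape[1])]
        worst = int(np.argmax(vifs))
        if vifs[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return X, names

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
# Third column nearly duplicates the first, so one of the pair gets dropped.
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])
X_clean, kept = drop_collinear(X, ["a", "b", "a_copy"])
print(kept)
```

With the near-duplicate column removed, the remaining attributes all have VIF well below the threshold of five used in the paper.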
Model Selection and Validation
• There may be multiple models yielding similar performance.
• Our criterion for selecting the best model is the stepwise algorithm driven by the Akaike Information Criterion (AIC).
• The coefficient of determination, R2, is used as the measure of prediction accuracy.
• The significance of the model is assessed using an F-test for lack of fit: we conservatively set the test's p-value threshold at 0.01 (i.e., the probability that a model is statistically insignificant is less than 1%).
• To avoid overfitting, we bootstrap 10,000 times, re-fitting the best model to each resampled dataset and averaging the R2 values obtained from the re-fits.
– We chose bootstrapping over k-fold validation, since the small sample size (in terms of monthly, quarterly, or cumulative data points) prevented us from obtaining robust results with k-fold validation.
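The bootstrap validation step can be sketched as follows. This is an illustrative sketch on synthetic data (the paper resamples 10,000 times; 1,000 is used here to keep the example quick), fitting an ordinary least-squares model to each resample and averaging the resulting R2 values:

```python
import numpy as np

def r_squared(y, yhat):
    ss_res = np.sum((y - yhat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(42)
n = 120                               # within the 96-341 range of the data subsets
x = rng.normal(size=(n, 2))           # two hypothetical explanatory attributes
y = x @ np.array([1.5, -0.8]) + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), x])  # design matrix with intercept

r2s = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)                 # resample rows with replacement
    beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    r2s.append(r_squared(y[idx], X[idx] @ beta))

print(round(float(np.mean(r2s)), 2))  # bootstrap-averaged R^2
```

Averaging over resamples gives a more honest accuracy estimate than a single in-sample fit when only ~100 monthly data points are available.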
Results: Prediction Accuracy
• We split the prediction models into four groups containing:
– all the attributes (excluding project ones);
– maintenance activity attributes;
– collaboration activity attributes;
– experience attributes.
Models’ prediction accuracy (R2) and count of explanatory variables/attributes (Attr. Count)
Values prefixed with * denote models containing explanatory variables with a t-test p-value > 0.1.
Top Models
• Collaboration attributes only: A: [-24, -1], B: [0, +∞), C: (-∞, +∞)
• All attributes (excluding project attributes): D: [-24, -1], E: [0, +∞), F: (-∞, +∞)
Collaboration Attributes
• avg_betweenness and avg_degree
– appear in all three top-performing models (A, B, C);
– explain 21% to 52% of the models' variability (depending on the model);
– defect count increases as these attributes increase:
• This suggests that the number of defects grows as the load on and importance of the participants increase and collaboration intensifies.
Collaboration Attributes
• avg_page_rank
– shows up only in model B;
– it is the least important term, explaining 6% of the variability;
– an increase in the attribute's value leads to a decrease in defect count.
• This may suggest that the emergence of important (in the PageRank sense) participants leads to defect reduction.
• However, given that the term appears in a single top model and has relatively low impact, this statement is inconclusive.
Collaboration Attributes
• edge count and node count
– not included in the top prediction models due to their high correlation with the remaining collaboration attributes;
– this also suggests that more sophisticated metrics, capturing the load and importance of nodes, carry more information than metrics capturing just the number of nodes and edges.
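Per-graph attributes such as avg_degree and avg_page_rank can be computed from the month's edge set; a minimal stdlib sketch, with hypothetical participant names, is below (avg_betweenness would follow the same per-node averaging, e.g. via Brandes' algorithm):

```python
# Hypothetical owner-originator edges for one month of collaboration.
edges = [("dev_a", "dev_b"), ("dev_a", "tester_c"),
         ("dev_b", "tester_c"), ("dev_a", "support_d")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

avg_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)

def page_rank(adj, d=0.85, iters=50):
    """Plain power-iteration PageRank on an undirected graph."""
    n = len(adj)
    pr = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        nxt = {v: (1 - d) / n for v in adj}
        for v, nbrs in adj.items():
            share = d * pr[v] / len(nbrs)   # distribute rank to neighbors
            for u in nbrs:
                nxt[u] += share
        pr = nxt
    return pr

pr = page_rank(adj)
# PageRank scores sum to 1, so the average is 1/|V|; it varies across
# months only because the set of active participants changes.
avg_page_rank = sum(pr.values()) / len(pr)
print(round(avg_degree, 2))  # 2.0 for this toy graph
```

In the model, these per-month averages become the explanatory attributes, one row per month.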
Temporal vs. Static Collaboration Network
• Advantage of temporal collaboration networks over the static collaboration networks: the accuracy of the collaboration assumption.
• In the static collaboration model, two developers who perform activities on the same software modules are assumed to have a collaboration link.
• However, there may be a substantial time difference between those two activities. Therefore, the definition of collaboration in the static model may be inaccurate.
– This thesis is also supported by the fact that only 1 of the 6 best-performing models uses graph_tm cumulative data; the rest rely on monthly or quarterly data.
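The contrast between the two edge definitions can be sketched as follows, using hypothetical (month, developer, module) activity triples; the temporal variant only links developers whose activities fall within a bounded time gap:

```python
# Hypothetical activity log: (month, developer, module) triples.
activity = [(1, "dev_a", "parser"), (9, "dev_b", "parser"),
            (4, "dev_c", "io"), (4, "dev_d", "io")]

def edges(activity, temporal, max_gap=0):
    """Pairs of developers who touched the same module; the temporal
    variant additionally requires the activities to be at most
    max_gap months apart."""
    out = set()
    for i, (t1, d1, m1) in enumerate(activity):
        for t2, d2, m2 in activity[i + 1:]:
            if d1 == d2 or m1 != m2:
                continue
            if temporal and abs(t1 - t2) > max_gap:
                continue
            out.add(tuple(sorted((d1, d2))))
    return out

# Static: dev_a-dev_b are linked despite working on "parser" 8 months apart.
print(sorted(edges(activity, temporal=False)))
# Temporal (same month): only dev_c-dev_d, who touched "io" simultaneously.
print(sorted(edges(activity, temporal=True, max_gap=0)))
```

The static model would report two collaboration links here, while the temporal one keeps only the pair whose activities actually coincide in time.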
Types of Collaboration for Top-performing Models
• [-24, -1]: models A, D
– use edge_type = owner-owner.
– Developer interactions are the most important during the active development phase of the product lifecycle.
• [0, +∞): models B, E
– use edge_type = owner-originator or edge_type = all.
– Information about originators becomes important in the maintenance part of the product lifecycle, resonating with our discussion of the importance of customer-related attributes.
• (-∞, +∞): models C, F
– use edge_type = all.
– This suggests that information about both owner-owner and owner-originator collaboration is important at various points of the lifecycle. This is not surprising, given that the (-∞, +∞) interval encloses the [-24, -1] and [0, +∞) intervals.
Threats to Validity
• Construct validity:
– the collaboration network graph is built from change-record data; these data cannot capture all collaboration activities in an organization.
• Internal validity:
– to prevent data-gathering issues, all data collection and cleaning was automated, and a complete set of defects was analyzed.
• Statistical validity:
– to prevent model over-fitting, the models were re-fit using the bootstrapping technique; the attributes were analyzed for multicollinearity, and highly correlated attributes (with VIF > 5) were removed from the model.
• External validity:
– the design is based on the rationale of critical-case releases of a large enterprise software product;
– our results should be transferable to other researchers through well-designed and controlled experiments.
Summary
• The temporal collaboration model can be used to predict the number of exposed defects in the next month with an R2 value of 0.73.
– Comparable to the model based on maintenance attributes (R2 = 0.76) and a mix of collaboration & maintenance attributes (R2 = 0.79).
– The model is simple, based on just two attributes.
– An analyst can choose between collaboration and maintenance attributes (or both), depending on which set of attributes is easier to extract from a given change-tracking system.
• Temporality gives a more realistic picture of the collaboration network than a static one.
• Our novel approach may be used to better plan upcoming releases, helping managers make evidence-based decisions.
• Future work:
– extend to new datasets;
– integrate into Dione (our software analytics tool).