MSc Software Maintenance / MS Viðhald Hugbúnaðar, Lectures 43 and 44: Estimating Effort for Corrective Software Maintenance. Dr Andy Brooks.


Page 1:

MSc Software Maintenance / MS Viðhald Hugbúnaðar

Lectures 43 and 44 (Fyrirlestrar 43 og 44)

Estimating Effort for Corrective Software Maintenance

Page 2:

Case Study / Dæmisaga

Reference: Andrea De Lucia, Eugenio Pompella, and Silvio Stefanucci, “Effort Estimation for Corrective Software Maintenance”, Proceedings of the Fourteenth International Conference on Software Engineering and Knowledge Engineering (SEKE’02), pp. 409-416, 2002. ©ACM

Page 3:

1. Introduction

• Effort estimation helps managers:
– plan resource and staff allocation
– prepare less risky bids for external contracts
– make maintain versus buy decisions

• Effort estimation is complicated by:
– the different types of software maintenance
• corrective, adaptive, perfective, preventive
– the scope of software maintenance work
• simple method fixes through to full reengineering

Page 4:

1. Introduction

• Effort estimation requires the use of quantitative metrics.

• Software maintenance costs are mainly human resource costs.
– the person-days needed

• A linear or non-linear relationship between complexity/size and effort is “commonly assumed”.

Page 5:

Estimation by analogy: a simple fictitious example by Andy

• The following historical data is available:
– Project A involved 100 maintenance requests for 110,000 LOC and took 25 person-days.
– Project B involved 105 maintenance requests for 111,000 LOC and took 28 person-days.
– Project C involved 20 maintenance requests for 2,000 LOC and took 2 person-days.

• Project D will involve 85 maintenance requests on 91,000 LOC, so how much effort is required?

• Project A is the closest match, so the effort expended for Project A can be used as an estimate for Project D: 25 person-days.
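A minimal Python sketch of this closest-match lookup, using only the fictitious numbers above; no standardisation of the variables is attempted in something this small:

```python
import math

# Historical projects from the fictitious example: (maintenance requests, LOC, person-days).
history = {
    "A": (100, 110_000, 25),
    "B": (105, 111_000, 28),
    "C": (20, 2_000, 2),
}
project_d = (85, 91_000)  # requests and LOC for Project D; effort unknown

def distance(p, q):
    """Euclidean distance over (requests, LOC)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

closest = min(history, key=lambda name: distance(history[name][:2], project_d))
estimate = history[closest][2]
print(f"Closest analogy: Project {closest}, estimated effort {estimate} person-days")
# -> Closest analogy: Project A, estimated effort 25 person-days
```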

2. RELATED WORK

Page 6:

Shepperd, M., Schofield, C., and Kitchenham, B. Effort Estimation Using Analogy. Proceedings of the International Conference on Software Engineering (ICSE´96), ©IEEE, 1996, 170-178.

• The first step is deciding on the variables used to describe projects.
– “all datasets had at least one variable that was in some sense size related”

• The second step is deciding on how to determine similarity.
– “Analogies are found by measuring Euclidean distance in n-dimensional space where each dimension corresponds to a variable. Values are standardised so that each dimension contributes equal weight to the process of finding analogies.”

2. RELATED WORK

ArchANGEL tool here: http://dec.bournemouth.ac.uk/ESERG/ANGEL/

Page 7:

Shepperd, M., Schofield, C., and Kitchenham, B. Effort Estimation Using Analogy. Proceedings of the International Conference on Software Engineering (ICSE´96), ©IEEE, 1996, 170-178.

• “In N dimensions, the Euclidean distance between two points p and q is √(∑_{i=1}^{N} (p_i − q_i)²), where p_i (or q_i) is the coordinate of p (or q) in dimension i.”
– http://www.nist.gov/dads/


2. RELATED WORK

Figure: two points (x1, y1) and (x2, y2) in the plane.
Euclidean distance: √((x1 − x2)² + (y1 − y2)²)
Manhattan distance: |x1 − x2| + |y1 − y2|
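For completeness, the two distance measures as plain Python functions; the sample points are arbitrary:

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points in n-dimensional space."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of the absolute differences per dimension."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```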

Page 8: 12/07/2015Dr Andy Brooks1 MSc Software Maintenance MS Viðhald Hugbúnaðar Fyrirlestrar 43 og 44 Estimating Effort for Corrective Software Maintenance

Shepperd, M., Schofield, C., and Kitchenham, B. Effort Estimation Using Analogy. Proceedings of the International Conference on Software Engineering (ICSE´96), ©IEEE, 1996, 170-178.

• The third step is deciding how to use known effort data to derive an effort estimate for the new project.
– just use the effort for the closest project?
– average the effort for the X closest projects?
– average the effort for the X closest projects, weighting by closeness of matching?

• Shepperd et al. used X = 2 and an unweighted average.
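A hedged sketch of this third step, reusing the fictitious Project A, B, C numbers from the earlier example. The unweighted average of the two closest analogies mirrors Shepperd et al.'s choice; the inverse-distance weighting is just one common alternative, not something taken from their paper:

```python
import math

# Historical projects: (requests, LOC) paired with effort in person-days (fictitious data).
history = [((100, 110_000), 25), ((105, 111_000), 28), ((20, 2_000), 2)]
target = (85, 91_000)  # the new project

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

# Rank the historical projects by closeness to the new project and keep the X = 2 nearest.
ranked = sorted(history, key=lambda item: distance(item[0], target))
nearest = ranked[:2]

# Unweighted average of the two closest analogies.
unweighted = sum(effort for _, effort in nearest) / len(nearest)

# Alternative: weight each analogy by the inverse of its distance to the new project.
weights = [1.0 / distance(features, target) for features, _ in nearest]
weighted = sum(w * effort for w, (_, effort) in zip(weights, nearest)) / sum(weights)

print(f"Unweighted k=2 estimate: {unweighted:.1f} person-days")
print(f"Distance-weighted k=2 estimate: {weighted:.1f} person-days")
```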

2. RELATED WORK

Page 9:

Shepperd, M., Schofield, C., and Kitchenham, B. Effort Estimation Using Analogy. Proceedings of the International Conference on Software Engineering (ICSE´96), ©IEEE, 1996, 170-178.

• Effort estimation using analogy was found to outperform traditional algorithmic methods for six different datasets.
– later studies, however, did not support this finding

• Shepperd et al. suggest it is better to use more than one estimation technique, to assess the degree of risk associated with a prediction.
– if effort estimates from regression analysis and analogy strongly disagree, then perhaps any estimate is unsafe
– Andy says: in industrial projects, it is unlikely that resources are available to apply more than one technique

2. RELATED WORK

Page 10:

3. Experimental Setting

• Multiple linear regression analysis was applied to real data from five corrective maintenance projects from different companies.
– All five corrective maintenance projects were outsourced to one supplier company whose maintenance process closely followed the IEEE Standard for Software Maintenance.

• The data set comprised 144 observations corresponding to monthly maintenance periods.
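To make the mechanics concrete, here is a minimal ordinary-least-squares sketch in NumPy. The variable names (NA, NB, NC, size) follow the paper's metrics, but the numbers are invented toy data, not the 144 observations, and the fitted coefficients are not the paper's Models A, B or C:

```python
import numpy as np

# Hypothetical monthly observations: columns are [NA, NB, NC, size]; targets are effort figures.
X = np.array([
    [5, 40, 30, 500.0],
    [3, 55, 25, 500.0],
    [8, 20, 60, 750.0],
    [2, 10, 15, 300.0],
    [6, 35, 45, 750.0],
    [4, 25, 20, 300.0],
    [7, 30, 50, 750.0],
    [1, 12, 10, 300.0],
], dtype=float)
y = np.array([400.0, 450.0, 520.0, 150.0, 480.0, 260.0, 500.0, 120.0])

# Add an intercept column and fit ordinary least squares (minimises the sum of squared errors).
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# R^2 on the fitting data only; it says nothing about performance on unseen months.
y_hat = A @ coef
r_squared = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print("intercept and coefficients:", np.round(coef, 3))
print("R^2 on the fitting data:", round(r_squared, 3))
```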

Page 11:

Treatment of missing values

• If a value is missing, one approach is simply to exclude the entire observation [effort, size, NA, NB, NC] from the model-building process.
– the safest approach

• Another approach is to substitute the mean or the median value calculated from the other observations.

• Yet another approach is to find the most similar observation and use the value found there.
– best analogy found by calculating Euclidean distances

• Fortunately, the data set did not contain missing values.
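A small sketch of all three treatments on invented data; the real data set needed none of them:

```python
import numpy as np

# Toy observations [effort, size, NA, NB, NC]; np.nan marks a missing value (illustrative only).
data = np.array([
    [400.0, 500.0, 5.0, 40.0, 30.0],
    [450.0, 500.0, 3.0, np.nan, 25.0],
    [150.0, 300.0, 2.0, 10.0, 15.0],
    [480.0, 750.0, 6.0, 35.0, 45.0],
])

# 1. Listwise deletion: drop any observation containing a missing value (the safest approach).
complete = data[~np.isnan(data).any(axis=1)]

# 2. Imputation: replace each missing value with its column mean (np.nanmedian for the median).
imputed = data.copy()
col_means = np.nanmean(data, axis=0)
rows, cols = np.where(np.isnan(imputed))
imputed[rows, cols] = col_means[cols]

# 3. Best analogy: copy the value from the most similar complete observation,
#    similarity measured by Euclidean distance over the columns that are present.
def nearest_donor(row, donors):
    present = ~np.isnan(row)
    dists = np.sqrt(((donors[:, present] - row[present]) ** 2).sum(axis=1))
    return donors[np.argmin(dists)]

analogised = data.copy()
for row in analogised:
    missing = np.isnan(row)
    if missing.any():
        row[missing] = nearest_donor(row, complete)[missing]

print(complete.shape, imputed[1], analogised[1])
```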

Missing Data Techniques

Page 12:

Data available

• Size of the system.
• Effort spent in the maintenance period.
• Number of maintenance tasks by type:
– type A: source code modification
– type B: fixing data misalignments through database queries
• data cleansing
– type C (not A or B): user disoperation, problems out of contract, etc.

• Other metrics, such as software complexity, were not available in full across all the projects.

3. Experimental Setting

Page 13:

Table 1: Collected Metrics ©ACM

3. Experimental Setting

Page 14:

Table 2: Descriptive statistics ©ACM


3. Experimental Setting

144 observations, monthly maintenance periods; 1960/(35 hrs × 4 wks) = 14 person-months

Page 15:

4. Building Effort Estimation Models

• Multiple linear regression analysis minimizes the sum of the squared errors.

• Regression analysis is said to be “as good as or better than many competing modeling techniques”.
– see references [7] and [18] of the case study article, which showed estimation by analogy was not better

• Incorporating the size of a maintenance task would be useful, but this metric was not available.

• Analysis of residuals from the regression analyses revealed no non-linearity or other trends.

Page 16:

Dealing with outliers

• If a value is deemed to be an outlier, one approach is to exclude the entire observation.
– outliers can be caused by transcription errors

• In a box plot, the box contains the middle 50% of the data set.
– the box spans the interquartile range (IQR)

• More than 1.5 × IQR away from the box, a value is a suspected outlier.

• More than 3.0 × IQR away from the box, a value is deemed an outlier.


http://www.physics.csbsju.edu/stats/box2.html

There were no obvious outliers in the data set.

outlier/enfari
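A sketch of the box-plot rule in Python; the effort figures are made up, with one deliberately implausible value standing in for a transcription error:

```python
import numpy as np

def iqr_outliers(values):
    """Flag suspected (beyond 1.5*IQR) and deemed (beyond 3.0*IQR) outliers, box-plot style."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    suspected = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)
    deemed = (values < q1 - 3.0 * iqr) | (values > q3 + 3.0 * iqr)
    return suspected, deemed

# Toy monthly effort figures; the 9999 entry mimics a transcription error.
effort = [320, 340, 310, 360, 300, 330, 9999]
suspected, deemed = iqr_outliers(effort)
print(suspected)
print(deemed)
```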

Page 17:

Table 3: Metrics correlation matrix ©ACM

• There are no strong correlations between the independent variables used to build the regression models.

• N (total number) correlates less well with NA possibly because NA is much smaller than NB and NC.

• No explanation is given for the correlation r = 0.6458.

“strong” usually means r > 0.7

correlation matrix/fylgnifylki

Page 18:

Critical commentary from Andy

• Regression models are built assuming that model variables are independent.
– so it is important to carry out checks, e.g. examine correlations

• We do not know the nature of the correlation coefficient used. Pearson is applied to normally distributed data and Spearman to non-normally distributed data.
– sometimes researchers compute both to be sure

• There are some large differences between means and medians in Table 2, which suggests non-normality.
– Spearman correlation coefficients should have been calculated

• The correlation of 0.6458 suggests a real linkage between NA and NC, i.e. they may not be independent.
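Computing both coefficients is a one-liner each with SciPy; the task counts below are invented and only show how a single skewed value can pull the two measures apart:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Invented task counts for two metrics (say NA and NC); the last NC value is deliberately skewed.
na = np.array([2, 3, 5, 4, 8, 6, 9, 12])
nc = np.array([15, 20, 24, 22, 35, 30, 40, 90])

r_pearson, p_pearson = pearsonr(na, nc)        # assumes roughly normal data
rho_spearman, p_spearman = spearmanr(na, nc)   # rank-based, robust to non-normality

print(f"Pearson r = {r_pearson:.3f} (p = {p_pearson:.3f})")
print(f"Spearman rho = {rho_spearman:.3f} (p = {p_spearman:.3f})")
```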

Page 19:

Some plots illustrating correlations of various sizes

http://www.jerrydallal.com/

Page 20:

Effort estimation models A, B, C

• NBC is the sum of NB and NC


4. Building Effort Estimation Models

recall

Page 21:

4.1 Evaluating Model Performances

• The coefficient of determination R² represents the percentage of variation in the dependent variable explained by the independent variables of the model.

• Having a high R² does not guarantee the quality of future predictions.
– R² does not represent the performance of the model on a different data set, only the data set upon which the model was built.

Page 22:

Table 4: Model parameters ©ACM

• All model variables are statistically significant (p < 0.05).
• Model C explains 90% of the variation in effort.

Page 23:

Assessing the quality of future predictions:
PRESS (PREdiction Sum of Squares)

• ŷ (y-hat) means predicted value.

• The residual represents the difference between the ith value in the data set and the value predicted from a regression analysis using all data points except the ith.

• In a data set of size n, n separate regression equations are calculated.

• Smaller PRESS scores are better.

• PRESS is also known as “leave-one-out cross validation”.
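In symbols, PRESS = ∑ (y_i − ŷ_(i))², where ŷ_(i) is the prediction for the ith observation from a model fitted without that observation. A compact sketch of that procedure on toy data follows; it also returns the SPR of the next slide (the sum of absolute rather than squared PRESS residuals):

```python
import numpy as np

def press_and_spr(X, y):
    """Leave-one-out cross validation: refit the regression n times, each time predicting
    the held-out observation, and accumulate the PRESS and SPR scores."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])  # add an intercept
    y = np.asarray(y, dtype=float)
    residuals = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        residuals.append(y[i] - X[i] @ coef)  # PRESS residual for the held-out observation
    residuals = np.array(residuals)
    return np.sum(residuals ** 2), np.sum(np.abs(residuals))  # (PRESS, SPR)

# Toy data: [NA, NB, NC] task counts per month paired with effort; illustrative only.
X = [[5, 40, 30], [3, 55, 25], [8, 20, 60], [2, 10, 15], [6, 35, 45], [4, 25, 20], [7, 30, 50]]
y = [400, 450, 520, 150, 480, 260, 500]
press, spr = press_and_spr(X, y)
print(f"PRESS = {press:.1f}, SPR = {spr:.1f}")
```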

4.1 Evaluating Model Performances

Page 24:

Assessing the quality of future predictions:
SPR

• ŷ (y-hat) means predicted value.

• The residual represents the difference between the ith value in the data set and the value predicted from a regression analysis using all data points except the ith.

• SPR is the sum of the absolute values rather than the squares of the PRESS residuals.

• SPR is used when a few large PRESS residuals can inflate the PRESS score unreasonably.

4.1 Evaluating Model Performances

Page 25:

Assessing the quality of future predictions:
MMRE (Mean Magnitude of Relative Error)

• MREi is the magnitude of the relative error.

• ŷ (y-hat) means predicted value.

• The residual represents the difference between the ith value in the data set and the value predicted from a regression analysis using all data points except the ith.

• MMRE is the mean magnitude.

• MdMRE is the median magnitude. MMRE might be dominated by a few MREs with very high values.

4.1 Evaluating Model Performances

Page 26:

Assessing the quality of future predictions:
PRED

• RE is the relative error.

• “We believe that maintenance managers may, in most cases and specially for small maintenance tasks, accept a relative error between the actual and predicted effort of about 50%.”

• According to reference [36] (1991) of the case study article, an average error of 100% can be considered “good” and an average error of 32% “outstanding”.
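Taken together, the accuracy measures are easy to compute from paired actual and predicted efforts: MRE_i = |y_i − ŷ_i| / y_i, MMRE is its mean, MdMRE its median, and PRED(l) is the proportion of cases with MRE_i ≤ l. The values below are invented leave-one-out predictions, not the paper's:

```python
import numpy as np

def mre_summary(actual, predicted, pred_levels=(0.25, 0.50)):
    """MMRE, MdMRE and PRED(l) from paired actual/predicted effort values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mre = np.abs(actual - predicted) / actual          # magnitude of relative error per case
    summary = {"MMRE": float(mre.mean()), "MdMRE": float(np.median(mre))}
    for level in pred_levels:
        # PRED(l): proportion of cases whose relative error is at most l.
        summary[f"PRED{int(level * 100)}"] = float(np.mean(mre <= level))
    return summary

# Invented leave-one-out predictions (person-hours), purely to exercise the measures.
actual    = [400, 450, 520, 150, 480, 260, 500, 310]
predicted = [420, 400, 700, 160, 430, 390, 510, 300]
print(mre_summary(actual, predicted))
```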

4.1 Evaluating Model Performances

Page 27:

Table 5: Leave-one-out cross validation ©ACM

• Model C is clearly better.
– Almost 50% of cases have a relative error of less than 25%.
– Almost 83% of cases have a relative error of less than 50%.

4.1 Evaluating Model Performances

Page 28:

Leave More Out Cross Validation for Model C

• The data set is randomly partitioned into a training data set and a test set.

• The training data set is used to build the model.

• The test data set is used to assess the quality of the model’s prediction.

• Lx means the training (learning) data set is composed of x% of the observations.

• T100-x means the test data set is composed of 100-x% of the observations.


extending the evaluation of Model C
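One way to realise the Lx/T(100-x) procedure is sketched below. The repeat count is a guess, since the paper does not state how many random partitions were averaged, and the data are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(X, coef):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def leave_more_out(X, y, learn_fraction, repeats=10):
    """Randomly split into an Lx learning set and a T(100-x) test set, fit on the learning
    set, and average the test-set MMRE over `repeats` random partitions."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n = len(y)
    mmres = []
    for _ in range(repeats):
        idx = rng.permutation(n)
        n_learn = int(round(learn_fraction * n))
        learn, test = idx[:n_learn], idx[n_learn:]
        coef = fit(X[learn], y[learn])
        mre = np.abs(y[test] - predict(X[test], coef)) / y[test]
        mmres.append(mre.mean())
    return float(np.mean(mmres))

# Toy monthly observations ([NA, NB, NC] and effort); illustrative only.
X = [[5, 40, 30], [3, 55, 25], [8, 20, 60], [2, 10, 15], [6, 35, 45],
     [4, 25, 20], [7, 30, 50], [9, 45, 55], [1, 12, 10], [6, 28, 35]]
y = [400, 450, 520, 150, 480, 260, 500, 610, 120, 430]
for x_pct in (90, 75, 50):
    print(f"L{x_pct}-T{100 - x_pct}: average MMRE = {leave_more_out(X, y, x_pct / 100):.2f}")
```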

Page 29:

Table 6: Leave more out cross validation with random partitions

• As the size of the learning set decreases, so does the quality of prediction, as expected.

Model C

Page 30:

Critical commentary from Andy

• It is not stated how many partitions were used to establish each of the average values in Table 6.
– a minimum sample size of 10 is usually required to compute an average with reasonable accuracy

• The trend in Table 6 makes sense, but it is difficult to believe the PRED values in Table 6 for L90-T10.

• PRED50 = 100%, yet PRED50 is only 82.64% when all the data except one observation is used for training.
– The authors should have addressed what appears to be an anomalous result.

Model C

Page 31:

Table 7: Cross validation at a project level

• Column P1 represents training with P2, P3, P4, and P5 and the results of testing on project P1.
– the regression analysis has “no knowledge” of project P1


Model C, 5 projects
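A sketch of this leave-one-project-out protocol; project labels, predictor columns, and effort figures are all invented:

```python
import numpy as np

def fit(X, y):
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def project_level_cv(X, y, project_ids):
    """For each project, train on the other projects' observations and report the MMRE
    obtained when predicting the held-out project's monthly efforts."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    project_ids = np.asarray(project_ids)
    results = {}
    for p in np.unique(project_ids):
        train, test = project_ids != p, project_ids == p
        coef = fit(X[train], y[train])
        y_hat = np.column_stack([np.ones(test.sum()), X[test]]) @ coef
        results[str(p)] = float(np.mean(np.abs(y[test] - y_hat) / y[test]))
    return results

# Toy monthly observations tagged with their project; illustrative only.
X = [[5, 40], [3, 55], [8, 20], [2, 10], [6, 35], [4, 25], [7, 30], [9, 45]]
y = [400, 450, 520, 150, 480, 260, 500, 610]
projects = ["P1", "P1", "P2", "P2", "P3", "P3", "P4", "P4"]
print(project_level_cv(X, y, projects))
```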

Page 32:

Cross validation at the project level

• Predictive performance is poor when using projects P1 and P3 as test sets.

• Project P1 had no maintenance tasks of type B and this might explain the poor predictive performance of a model which actually has NB as a predictor variable.

• No explanation is provided for the poor predictive performance using P3 as a test set.

Model C

Page 33:

5. Conclusion

• Previously, the supplier company (i.e. the company doing the maintenance) had used a prediction model which did not distinguish between different types of maintenance task.

• PRED values for this earlier prediction model were not very satisfactory.
– PRED25 = 33.33%
– PRED50 = 53.47%

• The authors believed that modelling different types of maintenance task (A, B, and C) would improve prediction, which it did, especially for Model C.
– PRED25 = 49.31%
– PRED50 = 82.64%

leave-one-out PREDs

Page 34:

5. Conclusion

• More complicated prediction models could be built, but the authors chose not to, so that the models could be easily calculated by working engineers and managers.

• Effort estimation can also involve estimating values for the independent variables.
– estimating the number and type of maintenance tasks “ex ante” in a forthcoming maintenance period can be done reasonably accurately from historical data
• more complicated models involve more variables to estimate, making it more difficult to predict forthcoming effort

Page 35:

5. Conclusion

• The greatest weakness of using regression models for effort estimation is that they only apply to the “analyzed domain and technological environment”.
– i.e. the prediction models are company-specific and you cannot apply the values determined for the model coefficients in another company setting.

• Andy says: this is a likely explanation for the “cross validation at the project level” results. Projects were from different companies, so it is perhaps not surprising that trying to predict for one company using data from other companies sometimes did not work well.
– P1 and P3 in Table 7

Page 36:

5. Conclusion

• The models presented were adopted by the supplier company providing maintenance services.
