PROMISE 2011: "A Principled Evaluation of Ensembles of Learning Machines for Software Effort Estimation", Leandro Minku and Xin Yao.
A Principled Evaluation of Ensembles of Learning
Machines for Software Effort Estimation
Leandro Minku, Xin Yao ({L.L.Minku,X.Yao}@cs.bham.ac.uk)
CERCIA, School of Computer Science, The University of Birmingham
Leandro Minku, Xin Yao {L.L.Minku,X.Yao}@cs.bham.ac.uk Ensembles for Software Effort Estimation 1 / 22
Outline
Introduction (Background and Motivation)
Research Questions (Aims)
Experiments (Method and Results)
Answers to Research Questions (Conclusions)
Future Work
Introduction
Software cost estimation:
Set of techniques and procedures that an organisation uses to arrive at an estimate.
Major contributing factor is effort (in person-hours, person-months, etc.).
Overestimation vs. underestimation.
Several software cost/effort estimation models have been proposed.
ML models have been receiving increased attention:
They make no or minimal assumptions about the data and the function being modelled.
Introduction
Ensembles of Learning Machines are groups of learning machines trained to perform the same task and combined with the aim of improving predictive performance.
Studies comparing ensembles against single learners in software effort estimation are contradictory:
Braga et al. (IJCNN'07) claim that Bagging slightly improves the effort estimates produced by single learners.
Kultur et al. (KBS'09) claim that an adapted Bagging provides large improvements.
Kocaguneli et al. (ISSRE'09) claim that combining different learners does not improve effort estimations.
These studies either lack statistical tests or do not present the parameter choices. None of them analyses the reasons for the achieved results.
Research Questions

Question 1
Do readily available ensemble methods generally improve the effort estimates given by single learners? Which of them would be more useful?

The current studies are contradictory.
They either do not perform statistical comparisons or do not explain the parameter choices.
It is worth investigating the use of different ensemble approaches.
We build upon current work by considering these points.

Question 2
If a particular method is singled out, what insight on how to improve effort estimates can we gain by analysing its behaviour and the reasons for its better performance?

Principled experiments, not just intuition or speculation.

Question 3
How can someone determine which model to use for a particular data set?

Our study complements previous work; parameter choice is important.
Data Sets and Preprocessing
Data sets: cocomo81, nasa93, nasa, cocomo2, desharnais, and 7 ISBSG organisation-type subsets.
They cover a wide range of features. In particular, the ISBSG subsets' productivity rates are statistically different.
Attributes: COCOMO attributes for PROMISE data; functional size, development type and language type for ISBSG.
Missing values: deletion for PROMISE, k-NN imputation for ISBSG.
Outliers: K-means detection/elimination.
Experimental Framework – Step 1: choice of learning machines
Single learners:
MultiLayer Perceptrons (MLPs) – universal approximators; Radial Basis Function networks (RBFs) – local learning; and Regression Trees (RTs) – simple and comprehensible.
Ensemble learners:
Bagging with MLPs, with RBFs and with RTs – widely and successfully used; Random with MLPs – uses the full training set for each learner; and Negative Correlation Learning (NCL) with MLPs – suited to regression.
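As a rough sketch of the bagging setup (not the exact configuration tuned in the study), bagging trains each base learner on a bootstrap resample of the training set and combines the members by a simple average. The sketch below uses a regression tree as the base learner; the function name is illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor  # a regression tree (RT)

def bagging_predict(X_train, y_train, X_test, n_learners=10, seed=0):
    """Train n_learners regression trees on bootstrap resamples of the
    training set and average their predictions (standard bagging).
    The base learner could equally be an MLP or an RBF network."""
    rng = np.random.RandomState(seed)
    n = len(X_train)
    member_preds = []
    for _ in range(n_learners):
        idx = rng.randint(0, n, size=n)          # sample with replacement
        tree = DecisionTreeRegressor(random_state=0)
        tree.fit(X_train[idx], y_train[idx])
        member_preds.append(tree.predict(X_test))
    return np.mean(member_preds, axis=0)         # simple-average combination
```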
Experimental Framework – Step 2: choice of evaluation method

Executions were done in 30 rounds, with 10 projects for testing and the remaining for training, as suggested by Menzies et al. TSE'06.
Evaluation was done in two steps:
1. Menzies et al. TSE'06's survival rejection rules:
   If MMREs are significantly different according to a paired t-test with 95% confidence, the best model is the one with the lowest average MMRE. If not, the best method is the one with the best:
   1. Correlation
   2. Standard deviation
   3. PRED(N)
   4. Number of attributes
2. Wilcoxon tests with 95% confidence to compare the two methods most often among the best in terms of MMRE and PRED(25).
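The 30-round evaluation setup can be sketched as follows (function and variable names are illustrative):

```python
import numpy as np

def holdout_rounds(X, y, n_rounds=30, n_test=10, seed=0):
    """Yield n_rounds random splits, each holding out n_test projects
    for testing and using the rest for training (the setup suggested
    by Menzies et al. TSE'06)."""
    rng = np.random.RandomState(seed)
    n = len(X)
    for _ in range(n_rounds):
        idx = rng.permutation(n)                 # shuffle project indices
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        yield X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```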
Experimental Framework – Step 2: choice of evaluation method

Mean Magnitude of the Relative Error:
$\mathrm{MMRE} = \frac{1}{T}\sum_{i=1}^{T}\mathrm{MRE}_i$, where $\mathrm{MRE}_i = \frac{|\mathit{predicted}_i - \mathit{actual}_i|}{\mathit{actual}_i}$

Percentage of estimations within N% of the actual values:
$\mathrm{PRED}(N) = \frac{1}{T}\sum_{i=1}^{T} \begin{cases} 1, & \text{if } \mathrm{MRE}_i \le \frac{N}{100} \\ 0, & \text{otherwise} \end{cases}$

Correlation between estimated and actual effort:
$\mathrm{CORR} = \frac{S_{pa}}{\sqrt{S_p S_a}}$, where
$S_{pa} = \frac{\sum_{i=1}^{T}(\mathit{predicted}_i - \bar{p})(\mathit{actual}_i - \bar{a})}{T-1}$,
$S_p = \frac{\sum_{i=1}^{T}(\mathit{predicted}_i - \bar{p})^2}{T-1}$, $S_a = \frac{\sum_{i=1}^{T}(\mathit{actual}_i - \bar{a})^2}{T-1}$,
$\bar{p} = \frac{\sum_{i=1}^{T}\mathit{predicted}_i}{T}$, $\bar{a} = \frac{\sum_{i=1}^{T}\mathit{actual}_i}{T}$.
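These error measures are straightforward to compute directly from the definitions; a minimal sketch (the effort values are made up for illustration):

```python
import numpy as np

def mmre(actual, predicted):
    """Mean Magnitude of the Relative Error."""
    mre = np.abs(predicted - actual) / actual
    return float(mre.mean())

def pred_n(actual, predicted, n=25):
    """PRED(N): fraction of estimates whose MRE is at most N%."""
    mre = np.abs(predicted - actual) / actual
    return float(np.mean(mre <= n / 100))

def corr(actual, predicted):
    """Pearson correlation between estimated and actual effort."""
    return float(np.corrcoef(predicted, actual)[0, 1])

actual = np.array([100.0, 200.0, 400.0])     # e.g. person-hours
predicted = np.array([110.0, 140.0, 380.0])
print(mmre(actual, predicted))    # mean of MREs 0.1, 0.3, 0.05 -> 0.15
print(pred_n(actual, predicted))  # 2 of the 3 estimates are within 25%
```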
Experimental Framework – Step 3: choice of parameters
Preliminary experiments using 5 runs.
Each approach was run with all the combinations of 3 or 5 parameter values.
The parameters with the lowest MMRE were chosen for a further 30 runs.
Base learners will not necessarily have the same parameters as single learners.
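The selection step amounts to an exhaustive search over the candidate parameter combinations; a minimal sketch, with a hypothetical `evaluate_mmre` callback standing in for a preliminary run:

```python
import itertools

def choose_parameters(param_grid, evaluate_mmre, n_runs=5):
    """Try every combination of candidate values (param_grid maps a
    parameter name to its candidate values) and return the combination
    with the lowest average MMRE over n_runs preliminary runs."""
    best_params, best_mmre = None, float("inf")
    for values in itertools.product(*param_grid.values()):
        params = dict(zip(param_grid, values))
        avg = sum(evaluate_mmre(params, run) for run in range(n_runs)) / n_runs
        if avg < best_mmre:
            best_params, best_mmre = params, avg
    return best_params

# Illustrative only: pretend smaller learning rates always give lower MMRE.
grid = {"hidden_nodes": [3, 5, 9], "learning_rate": [0.1, 0.2, 0.3]}
best = choose_parameters(grid, lambda p, run: p["learning_rate"])
print(best)  # {'hidden_nodes': 3, 'learning_rate': 0.1}
```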
Comparison of Learning Machines – Menzies et al. TSE'06's survival rejection rules

Table: Number of Data Sets in which Each Method Survived. Methods that never survived are omitted.

| PROMISE Data | ISBSG Data | All Data |
| RT: 2 | MLP: 2 | RT: 3 |
| Bag + MLP: 1 | Bag + RT: 2 | Bag + MLP: 2 |
| NCL + MLP: 1 | Bag + MLP: 1 | NCL + MLP: 2 |
| Rand + MLP: 1 | RT: 1 | Bag + RT: 2 |
| | Bag + RBF: 1 | MLP: 2 |
| | NCL + MLP: 1 | Rand + MLP: 1 |
| | | Bag + RBF: 1 |

No approach is consistently the best, even considering ensembles!
Comparison of Learning Machines

What methods are usually among the best?

Table: Number of Data Sets in which Each Method Was Ranked First or Second According to MMRE and PRED(25). Methods never among the first and second are omitted.

(a) According to MMRE

| PROMISE Data | ISBSG Data | All Data |
| RT: 4 | RT: 5 | RT: 9 |
| Bag + MLP: 3 | Bag + MLP: 5 | Bag + MLP: 8 |
| Bag + RT: 2 | Bag + RBF: 3 | Bag + RBF: 3 |
| MLP: 1 | MLP: 1 | MLP: 2 |
| | Rand + MLP: 1 | Bag + RT: 2 |
| | NCL + MLP: 1 | Rand + MLP: 1 |
| | | NCL + MLP: 1 |

(b) According to PRED(25)

| PROMISE Data | ISBSG Data | All Data |
| Bag + MLP: 3 | RT: 5 | RT: 6 |
| Rand + MLP: 3 | Rand + MLP: 3 | Rand + MLP: 6 |
| Bag + RT: 2 | Bag + MLP: 2 | Bag + MLP: 5 |
| RT: 1 | MLP: 2 | Bag + RT: 3 |
| MLP: 1 | RBF: 2 | MLP: 3 |
| | Bag + RBF: 1 | RBF: 2 |
| | Bag + RT: 1 | Bag + RBF: 1 |

RTs and bag+MLPs are more frequently among the best considering MMRE than considering PRED(25).
The first ranked method's MMRE is statistically different from the others in 35.16% of the cases.
The second ranked method's MMRE is statistically different from the lower ranked methods in 16.67% of the cases.
RTs and bag+MLPs are usually statistically equal in terms of MMRE and PRED(25).
Research Questions – Revisited
Question 1
Do readily available ensemble methods generally improve the effort estimates given by single learners? Which of them would be more useful?

Even though bag+MLPs are frequently among the best methods, they are statistically similar to RTs.
RTs are more comprehensible and have faster training.
Bag+MLPs seem to have more potential for improvements.
Why Were RTs Singled Out?
Hypothesis: as RTs have splits based on information gain, they may work in such a way as to give more importance to more relevant attributes.
A further study using correlation-based feature selection revealed that RTs usually put the features ranked higher by the feature selection method in higher-level splits of the tree.
Feature selection by itself was not always able to improve accuracy.
It may be important to give weights to features when using ML approaches.
Why Were RTs Singled Out?
Table: Correlation-Based Feature Selection and RT Attributes' Relative Importance for Cocomo81.

| Attribute(s) in ranking order | First tree level in which the attribute appears in more than 50% of the trees | Percentage of trees |
| LOC | Level 0 | 100.00% |
| Development mode; Required software reliability | Level 1 | 90.00% |
| Modern programming practices; Time constraint for cpu | Level 2 | 73.33% |
| Data base size | Level 2 | 83.34% |
| Main memory constraint; Turnaround time; Programmers capability; Analysts capability; Language experience; Virtual machine experience; Schedule constraint; Application experience | Level 2 | 66.67% |
| Use of software tools; Machine volatility | | |
Why Were Bag+MLPs Singled Out?

Hypothesis: bag+MLPs may have led to a more adequate level of diversity.
If we use correlation as the diversity measure, we can see that bag+MLPs usually had more moderate values when they were the 1st or 2nd ranked method in terms of MMRE.
However, the correlation between diversity and MMRE was usually quite low.

Table: Correlation Considering Data Sets in which Bag+MLPs Were Ranked 1st or 2nd.

| Approach | Correlation interval across different data sets |
| Bag+MLP | 0.74-0.92 |
| Bag+RBF | 0.40-0.83 |
| Bag+RT | 0.51-0.81 |
| NCL+MLP | 0.59-1.00 |
| Rand+MLP | 0.93-1.00 |

Table: Correlation Considering All Data Sets.

| Approach | Correlation interval across different data sets |
| Bag+MLP | 0.47-0.98 |
| Bag+RBF | 0.40-0.83 |
| Bag+RT | 0.37-0.88 |
| NCL+MLP | 0.59-1.00 |
| Rand+MLP | 0.93-1.00 |
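Correlation as a diversity measure can be sketched as the average pairwise Pearson correlation between the base learners' predictions on the same test set (lower correlation means higher diversity). The member predictions below are made up for illustration:

```python
import numpy as np

def avg_pairwise_correlation(member_predictions):
    """Average Pearson correlation over all pairs of base learners'
    prediction vectors; lower values indicate a more diverse ensemble."""
    m = len(member_predictions)
    pairs = [np.corrcoef(member_predictions[i], member_predictions[j])[0, 1]
             for i in range(m) for j in range(i + 1, m)]
    return float(np.mean(pairs))

# Three hypothetical base learners' estimates for the same 4 projects:
members = [np.array([10.0, 20.0, 30.0, 40.0]),
           np.array([12.0, 18.0, 33.0, 39.0]),
           np.array([40.0, 10.0, 35.0, 15.0])]  # a less correlated member
print(avg_pairwise_correlation(members))
```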
Taking a Closer Look...

Table: Correlations between ensemble covariance (diversity) and train/test MMRE for the data sets in which bag+MLP obtained the best MMREs and was ranked 1st or 2nd, against the data sets in which it obtained the worst MMREs.

| | Cov. vs Test MMRE | Cov. vs Train MMRE |
| Best MMRE (desharnais) | 0.24 | 0.14 |
| 2nd best MMRE (org2) | 0.70 | 0.38 |
| 2nd worst MMRE (org7) | -0.42 | -0.37 |
| Worst MMRE (cocomo2) | -0.99 | -0.99 |

Diversity is not only affected by the ensemble method, but also by the data set:
Software effort estimation data sets are very different from each other.

The correlation between diversity and performance on the test set follows the tendency on the training set.
Why do we have a negative correlation in the worst cases?
Could a method that self-adapts diversity help to improve estimations? How?
Research Questions – Revisited
Question 2
If a particular method is singled out, what insight on how to improve effort estimates can we gain by analysing its behaviour and the reasons for its better performance?

RTs give more importance to more relevant features. Weighting attributes may be helpful when using ML for software effort estimation.
Ensembles seem to have more room for improvement for software effort estimation.
A method to self-adapt diversity might help to improve estimations.
Research Questions – Revisited
Question 3
How can someone determine which model to use for a particular data set?

Effort estimation data sets dramatically affect the behaviour and performance of different learning machines, even considering ensembles.
So it is necessary to run experiments (parameter choice is important) using existing data from a particular company to determine which method is likely to be the best.
If the software manager does not have enough knowledge of the models, RTs are a good choice.
Risk Analysis
The learning machines singled out (RTs and bagging+MLPs) were further tested using the outlier projects.
MMRE similar or lower (better), usually better than for outlier-free data sets.
PRED(25) similar or lower (worse), usually lower.
Even though outliers are the projects that the learning machines have more difficulty in predicting within 25% of the actual effort, they are not the projects for which they give the worst estimates.
Conclusions and Future Work
RQ1 – readily available ensembles do not generally provide better effort estimates.
Principled experiments (parameters, statistical analysis, several data sets, more ensemble approaches) to deal with validity issues.
RQ2 – RTs + weighting features; bagging with MLPs + self-adapting diversity.
Insight based on experiments, not just intuition or speculation.
RQ3 – principled experiments to choose the model; RTs if no resources.
No universally good model, even when using ensembles; parameter choice matters in the framework.
Future work:
Learning feature weights in ML for effort estimation.
Can we use self-tuning diversity in ensembles of learning machines to improve estimations?
Acknowledgements
Search Based Software Engineering (SEBASE) research group.
Dr. Rami Bahsoon.
This work was funded by EPSRC grant No. EP/D052785/1.