Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selection Algorithm - PROMISE 2008
IMPROVING ANALOGY SOFTWARE EFFORT ESTIMATION USING FUZZY
FEATURE SUBSET SELECTION ALGORITHM
Mohammad Azzeh, Daniel Neagu, Peter Cowling
Department, School of Informatics, University of Bradford
Promise’08
Agenda
- Motivation
- The Problem
- The Proposed Solution
- Results
- Conclusions
Motivation
Estimation by Analogy (EA) is a common technique for software cost estimation.
Users may be willing to accept this kind of estimate because it mimics human problem solving.
It is based on actual project data and experience.
It can model complex relationships.
The Problem
Data quality is an important issue in analogy-based software estimation, because it is a precondition for obtaining quality knowledge and accurate estimates.
Typically data sets are not collected with a particular prediction task in mind [Kirsopp & Shepperd, 2002].
This estimation approach is sensitive to:
- incomplete and noisy data
- irrelevant and misleading features
Feature Subset Selection (FSS)
Benefits of FSS:
- Reducing training and utilization time
- Improving data understanding and visualization
- Reducing data dimensionality
Dataset D (K x M) -> FSS -> Dataset D (K x N), where N < M
FSS in Software estimation
Existing search techniques:
- Wrappers use machine learning algorithms [Kirsopp & Shepperd, 2002]: exhaustive search, random search, hill climbing, forward selection, backward selection.
- Filters use statistical approaches [Briand et al., 2000].
The fitness criterion of wrappers is often MMRE.
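To illustrate the wrapper idea, here is a minimal sketch of greedy forward selection driven by a fitness score such as MMRE. The `evaluate` callable is a hypothetical stand-in for training and validating an estimation model on a feature subset, and the toy scores are invented purely for illustration.

```python
from typing import Callable, FrozenSet, Iterable


def forward_selection(features: Iterable[str],
                      evaluate: Callable[[FrozenSet[str]], float]) -> FrozenSet[str]:
    """Greedy forward selection: keep adding the feature that lowers the score most."""
    remaining = set(features)
    selected: FrozenSet[str] = frozenset()
    best_score = float("inf")
    while remaining:
        # Score every one-feature extension of the current subset.
        candidates = [(evaluate(selected | {f}), f) for f in sorted(remaining)]
        score, feature = min(candidates)
        if score >= best_score:
            break  # no extension improves the fitness, so stop
        selected, best_score = selected | {feature}, score
        remaining.remove(feature)
    return selected


def toy_mmre(subset: FrozenSet[str]) -> float:
    """Hypothetical fitness: 'size' and 'team' help, 'noise' hurts."""
    score = 1.0
    if "size" in subset:
        score -= 0.5
    if "team" in subset:
        score -= 0.2
    if "noise" in subset:
        score += 0.1
    return score


chosen = forward_selection(["size", "team", "noise"], toy_mmre)
print(sorted(chosen))
```

In a real wrapper, `evaluate` would run the estimation model under cross-validation, which is why wrappers tend to cost more time than filters.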
The proposed Solution
The algorithm is based on fuzzy c-means and fuzzy logic.
It selects the optimal feature subset based on the similarity between fuzzy clusters.
The feature subset that presents a smaller similarity degree has the potential to deliver accurate estimation.
It reflects the data structure.
It combines both numerical and categorical data.
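As background for the fuzzy c-means step, the following is a minimal sketch of the standard algorithm on a single numeric feature. The fuzzifier `m = 2`, the initial centres, and the fixed iteration count are assumptions chosen for illustration; the slides do not state which settings were used.

```python
def memberships_for(values, centres, m=2.0):
    """Return one row per value; row[i] is the membership in cluster i."""
    rows = []
    for x in values:
        dists = [abs(x - c) for c in centres]
        if 0.0 in dists:
            # A value lying exactly on a centre is shared among those centres.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            rows.append([h / sum(hits) for h in hits])
        else:
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            rows.append([v / sum(inv) for v in inv])
    return rows


def fuzzy_c_means_1d(values, initial_centres, m=2.0, iterations=50):
    """Alternate membership and centre updates for a fixed number of iterations."""
    centres = list(initial_centres)
    for _ in range(iterations):
        u = memberships_for(values, centres, m)
        # Each centre is the mean of the values weighted by membership ** m.
        centres = [
            sum(row[i] ** m * x for row, x in zip(u, values))
            / sum(row[i] ** m for row in u)
            for i in range(len(centres))
        ]
    return centres, memberships_for(values, centres, m)


values = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centres, u = fuzzy_c_means_1d(values, [0.0, 10.0])
print(centres)
```

The returned membership rows play the role of the partition matrix: each project belongs to every cluster to some degree, and each row sums to 1.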
The Proposed Solution
[Diagram: data set with M features; select a feature subset with N dimensions; using fuzzy c-means; using the partition matrix and cluster centres.]
The Proposed Solution
Definition 1. Similarity between two clusters for given M-dimensional features:
$$SM(C_x, C_y) = \max_{l=1}^{M} \left( W_l \cdot SM_{F_l}(C_{xl}, C_{yl}) \right)$$
where $W_l$ is a normalized weighting factor representing the importance of some features among others:
$$\sum_{l=1}^{M} W_l = 1$$
Definition 2. Overall similarity between all clusters in a feature subset $S_i$ is given as:
$$E_{S_i} = \min_{j=1}^{N} \left( SM_j(C_x, C_y) \right)$$
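A small numeric sketch may help in reading the two definitions. It assumes the per-feature similarities between two clusters are already available as plain numbers (the slides do not define how they are computed), and it reads Definition 2 as a minimum over a list of N similarity values. Both function names are hypothetical.

```python
def cluster_similarity(per_feature_similarity, weights):
    """Definition 1: maximum over features of weight * per-feature similarity."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return max(w * s for w, s in zip(weights, per_feature_similarity))


def overall_similarity(similarities):
    """Definition 2, read as the minimum over the N similarity values."""
    return min(similarities)


# Hypothetical per-feature similarities for one pair of clusters.
sm = cluster_similarity([0.4, 0.9, 0.1], [0.5, 0.3, 0.2])
print(sm)
print(overall_similarity([sm, 0.5, 0.1]))
```

Here the weighted products are 0.20, 0.27 and 0.02, so the second feature determines the similarity of the pair.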
FFSS algorithm
Input:
  D(F1, F2, ..., FM)  // input dataset (N x M)
  Out                 // output dataset (N x 1)
Output:
  Dbest               // feature subset of highly predictive features
begin
  Do:
    Step 1: Select the feature subset Si to be searched.
    Step 2: Fuzzify the feature subset Si.
    Step 3: For feature subset Si, assess the similarity degree between all pairs of clusters (i.e. fuzzy sets) in all features in Si.
  Until all feature subsets are searched.
  Step 4: Evaluate each feature subset Si using Esi.
  Step 5: The best feature subset is the one with minimum Esi.
end
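The outer loop of the algorithm can be sketched as follows. Only the search and the final minimum are shown: `score_subset` is a hypothetical callable standing in for Steps 2 and 3 (fuzzifying the subset and computing its Esi), and the toy scores are invented for illustration.

```python
from itertools import combinations


def ffss_search(features, score_subset):
    """Score every non-empty feature subset and return the one with minimum score."""
    best_subset, best_score = None, float("inf")
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):  # Step 1: next subset Si
            score = score_subset(subset)             # Steps 2-3: hypothetical Esi
            if score < best_score:                   # Steps 4-5: keep the minimum
                best_subset, best_score = subset, score
    return best_subset, best_score


def toy_score(subset):
    """Hypothetical Esi values: one subset is clearly best."""
    if set(subset) == {"size", "team"}:
        return 0.2
    return 0.5 + 0.1 * len(subset)


print(ffss_search(["size", "team", "lang"], toy_score))
```

Note that enumerating all subsets grows as 2^M in the number of features, so this sketch is only practical for small feature sets.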
Empirical validation
We built an analogy estimation model for each FSS algorithm, with Euclidean distance as the similarity measure.
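A minimal sketch of such an analogy model is shown below, assuming every feature is numeric and already scaled to a comparable range. The function name and the toy project history are hypothetical; `k` is the number of analogies whose efforts are averaged.

```python
import math


def estimate_by_analogy(target, history, k=1):
    """Return the mean effort of the k historical projects closest to target.

    history is a list of (feature_vector, effort) pairs.
    """
    ranked = sorted(history, key=lambda project: math.dist(project[0], target))
    nearest = ranked[:k]
    return sum(effort for _, effort in nearest) / len(nearest)


# Hypothetical history: (features, effort).
history = [([1.0, 1.0], 100.0), ([2.0, 2.0], 200.0), ([9.0, 9.0], 900.0)]
for k in (1, 2, 3):
    print(k, estimate_by_analogy([1.2, 1.1], history, k))
```

With `k=1` the estimate is the effort of the single closest project; larger `k` smooths the estimate by averaging over more analogies.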
Validation strategy: 10-fold cross-validation.
Evaluation criteria: Mean Magnitude of Relative Error (MMRE), Median MRE (MdMRE), and performance indicator Pred(25%).

Dataset              Number of features   Number of projects
ISBSG (release 10)   14                   400
Desharnais           10                   77
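The three evaluation criteria follow their usual definitions, sketched here on made-up numbers. The function names are illustrative, and actual effort values are assumed to be positive.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for one project."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean MRE over all projects."""
    return mean(mre(a, p) for a, p in zip(actuals, predictions))


def mdmre(actuals, predictions):
    """Median MRE over all projects."""
    return median(mre(a, p) for a, p in zip(actuals, predictions))


def pred(actuals, predictions, level=0.25):
    """Fraction of projects whose MRE is at most the given level."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(e <= level for e in errors) / len(errors)


# Made-up efforts for illustration only.
actuals = [100, 200, 400, 50]
predictions = [110, 150, 400, 100]
print(mmre(actuals, predictions), mdmre(actuals, predictions), pred(actuals, predictions))
```

Lower MMRE and MdMRE are better, while a higher Pred(25%) is better; the median is less affected by a single badly estimated project than the mean.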
Kirsopp & Shepperd said:
“It is difficult to compare algorithms used classification problems with algorithms deals with prediction problems so the measures of accuracy used are different”
Results...ISBSG
Algorithm used in analogy model                   One analogy   Mean of 2 analogies   Mean of 3 analogies
All features                                      37.74%        41.0%                 29.4%
Exhaustive search, hill climbing, random search   28.25%        30.3%                 30.2%
Forward subset selection                          33.3%         30.4%                 31.2%
Backward subset selection                         34.7%         38%                   34.4%
FFSS                                              28.7%         30.6%                 32.2%
Results...ISBSG
Algorithm used in analogy model                   One analogy   Mean of 2 analogies   Mean of 3 analogies
All features                                      31.6%         33%                   20.62%
Exhaustive search, hill climbing, random search   21.9%         24%                   20.8%
Forward subset selection                          21.0%         21.4%                 20.7%
Backward subset selection                         22.6%         28.7%                 25.2%
FFSS                                              21.8%         22.3%                 22.7%
Results...Desharnais
Algorithm used in analogy model                                             One analogy   Mean of 2 analogies   Mean of 3 analogies
All features                                                                60.1%         51.5%                 50.0%
Exhaustive search, forward subset selection, hill climbing, random search   38.2%         39.4%                 36.4%
Backward subset selection                                                   42.4%         43.9%                 46.6%
FFSS                                                                        40.2%         40.3%                 38.5%
Results...Desharnais
Algorithm used in analogy model                                             One analogy   Mean of 2 analogies   Mean of 3 analogies
All features                                                                41.7%         41.0%                 36.1%
Exhaustive search, forward subset selection, hill climbing, random search   30.8%         38.0%                 30.9%
Backward subset selection                                                   38.4%         37.4%                 34.6%
FFSS                                                                        32.4%         33.3%                 31.7%
Conclusions
Fuzzy feature subset selection has a significant impact on the accuracy of EA.
Our FFSS algorithm produces results comparable to those of exhaustive search, hill climbing and forward selection.
It reduces uncertainty when categorical data is involved.
Conclusions…cont
Which FSS is suitable?
[Decision diagram. Criteria shown: accuracy only; less time with quite reasonable accuracy when the data set is large; dataset size (small or large). Options shown: fuzzy feature subset selection, forward feature selection and backward feature selection; exhaustive search and hill climbing; Random Hill Climbing.]
Threats to experiment validity:
- Selecting representative ISBSG data.
- MMRE has been used as the fitness criterion for all feature selection algorithms except our FFSS.
- Number of projects.
- Outliers and extreme values.
References
Briand, L., Langley, T., Wieczorek, I. 2000. Using the European Space Agency data set: a replicated assessment and comparison of common software cost modelling techniques. Presented at the 22nd IEEE Int’l Conf. on Software Engineering, Limerick, Ireland.
Kirsopp, C., Shepperd, M. 2002. Case and Feature Subset Selection in Case-Based Software Project Effort Prediction. Proc. 22nd SGAI Int’l Conf. on Knowledge-Based Systems and Applied Artificial Intelligence.
Questions