IMPROVING ANALOGY SOFTWARE EFFORT ESTIMATION USING FUZZY FEATURE SUBSET SELECTION ALGORITHM

Mohammad Azzeh, Dr. Daniel Neagu, Prof. Peter Cowling

Computing Department, School of Informatics, University of Bradford

PROMISE'08


Improving Analogy Software Effort Estimation using Fuzzy Feature Subset Selection Algorithm - PROMISE 2008



Agenda

- Motivation
- The Problem
- The Proposed Solution
- Results
- Conclusions

Motivation

Estimation by Analogy (EA) is a common technique for software cost estimation.

Users may be willing to accept this kind of estimation because it mimics human problem solving.

It is based on actual project data and experience.

It can model complex relationships.


The Problem

Data quality is an important issue in analogy-based software estimation because it is a precondition for obtaining quality knowledge and accurate estimates.

Typically data sets are not collected with a particular prediction task in mind [Kirsopp & Shepperd, 2002].

This estimation approach is sensitive to:

- Incomplete and noisy data
- Irrelevant and misleading features


Feature Subset Selection (FSS)

Benefits of FSS:

- Reducing the time of training and utilization
- Improving data understanding and visualization
- Reducing data dimensionality

Dataset D (K x M) -> FSS -> Dataset D (K x N), where N < M


FSS in Software estimation

Existing search techniques:

- Wrappers use machine learning algorithms [Kirsopp & Shepperd, 2002]: exhaustive search, random search, hill climbing, forward selection, and backward selection.
- Filters use statistical approaches [Briand et al., 2000].

The fitness criterion of wrappers is often MMRE.
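To make the wrapper idea concrete, here is a minimal sketch of forward selection that uses MMRE as its fitness criterion. It is an illustration only, not the procedure from the cited studies: the leave-one-out evaluation, the single-nearest-project estimator, and the names `mmre`, `loo_nearest_analogy` and `forward_selection` are all assumptions made for this example.

```python
import math


def mmre(actuals, predictions):
    """Mean Magnitude of Relative Error over paired actual/predicted efforts."""
    return sum(abs(a - p) / a for a, p in zip(actuals, predictions)) / len(actuals)


def loo_nearest_analogy(rows, efforts, features):
    """Leave-one-out predictions: each project takes the effort of its closest
    other project, measured by Euclidean distance on the chosen features."""
    predictions = []
    for i, target in enumerate(rows):
        nearest = min(
            (j for j in range(len(rows)) if j != i),
            key=lambda j: math.dist(
                [target[f] for f in features], [rows[j][f] for f in features]
            ),
        )
        predictions.append(efforts[nearest])
    return predictions


def forward_selection(rows, efforts, n_features):
    """Greedy wrapper: add the feature that lowers MMRE most, stop when none does."""
    selected, best_score = [], math.inf
    remaining = list(range(n_features))
    while remaining:
        score, feature = min(
            (mmre(efforts, loo_nearest_analogy(rows, efforts, selected + [f])), f)
            for f in remaining
        )
        if score >= best_score:
            break  # no remaining feature improves the fitness
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score
```

In practice the features would normally be scaled to a common range before distances are computed; that step is left out here to keep the sketch short.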


The proposed Solution

The algorithm is designed based on Fuzzy c-Means and Fuzzy Logic.

Selecting the optimal feature subset based on the similarity between fuzzy clusters.

The feature set that presents a smaller similarity degree has the potential to deliver accurate estimation.

It reflects the data structure.

It combines both numerical and categorical data.


The Proposed Solution (process diagram)

- Data set with M features
- Select a feature subset with N dimensions
- Using fuzzy c-means
- Using the partition matrix and cluster centres
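The diagram refers to the partition matrix and cluster centres produced by fuzzy c-means. The following is a minimal sketch of the standard fuzzy c-means iteration in plain Python, included only to show what those two outputs are. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the random initialisation and the stopping tolerance are assumptions of this example, not settings taken from the slides.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Return (partition matrix U, cluster centres) for a list of numeric points.

    U[i][k] is the membership of point i in cluster k; each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial partition matrix with rows normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centres = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by membership^m.
        centres = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centres.append(
                [sum(w[i] * points[i][d] for i in range(n)) / total for d in range(dim)]
            )

        # Membership update from distances to the new centres.
        new_u = []
        for i in range(n):
            dists = [math.dist(points[i], centres[k]) for k in range(c)]
            zero = [k for k, d in enumerate(dists) if d == 0.0]
            if zero:
                # Point sits exactly on one or more centres.
                row = [1.0 / len(zero) if k in zero else 0.0 for k in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                row = [v / total for v in inv]
            new_u.append(row)

        change = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if change < tol:
            break
    return u, centres
```

Each column of the returned partition matrix can be read as one fuzzy set over the projects, which is the form the similarity definitions operate on.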


The Proposed Solution

Definition 1. Similarity between two clusters $C_x$ and $C_y$ for given M-dimensional features:

\[ SM(C_x, C_y) = \max_{l=1}^{M} \left( W_l \cdot SM_{F_l}(C_{xl}, C_{yl}) \right) \]

where $W_l$ is a normalized weighting factor representing the importance of some features among others:

\[ \sum_{l=1}^{M} W_l = 1 \]

Definition 2. Overall similarity between all clusters in a feature subset $S_i$ is given as:

\[ E_{S_i} = \min_{j=1}^{N} \left( SM_j(C_x, C_y) \right) \]


FFSS algorithm

Input:
    D(F1, F2, ..., FM)   // input dataset (N x M)
    Out                  // output dataset (N x 1)

Output:
    Dbest                // feature subset of highly predictive features

Begin
    Do:
        Step 1: Select the feature subset Si to be searched.
        Step 2: Fuzzify the feature subset Si.
        Step 3: For feature subset Si, assess the similarity degree between all pairs of clusters (i.e. fuzzy sets) in all features in Si.
    Until all feature subsets are searched.
    Step 4: Evaluate each feature subset Si using E_Si.
    Step 5: The best feature subset is the one with minimum E_Si.
End
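The steps above can be sketched in Python under several assumptions that the slides do not settle. The sketch reads Definition 1 as a weighted maximum over the features of a subset and Definition 2 as a minimum over cluster pairs. It also assumes that fuzzification has already been done (for example with fuzzy c-means), that every feature has the same number of clusters aligned by index, that weights are equal within a subset, that fuzzy sets are compared with a min/max ratio, and that candidate subsets have a fixed size. The names `fuzzy_set_similarity`, `subset_score` and `ffss_search` are hypothetical.

```python
from itertools import combinations


def fuzzy_set_similarity(a, b):
    """Assumed similarity of two fuzzy sets given as membership lists over the
    same projects: sum of pointwise minima divided by sum of pointwise maxima."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 0.0


def subset_score(memberships, subset):
    """E for one subset: minimum over cluster pairs of the weighted maximum
    per-feature similarity (one reading of Definitions 1 and 2)."""
    weight = 1.0 / len(subset)  # assumed equal weights that sum to 1
    n_clusters = len(memberships[subset[0]])
    pair_scores = []
    for x, y in combinations(range(n_clusters), 2):
        pair_scores.append(
            max(
                weight * fuzzy_set_similarity(memberships[f][x], memberships[f][y])
                for f in subset
            )
        )
    return min(pair_scores)


def ffss_search(memberships, subset_size):
    """Score every feature subset of the given size and return the one with
    the smallest E, following Steps 1 to 5."""
    candidates = combinations(sorted(memberships), subset_size)
    return min(candidates, key=lambda s: subset_score(memberships, s))


# Toy input: feature name -> list of clusters -> membership degree per project.
example = {
    "size": [[1.0, 0.9, 0.1, 0.0], [0.0, 0.1, 0.9, 1.0]],
    "team": [[0.9, 0.2, 0.8, 0.1], [0.1, 0.8, 0.2, 0.9]],
    "lang": [[0.5, 0.6, 0.5, 0.4], [0.5, 0.4, 0.5, 0.6]],
}
best_pair = ffss_search(example, subset_size=2)
```

In the toy input the clusters of `lang` overlap heavily, so any subset containing it scores a higher similarity and is not selected.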


Empirical validation

We built an analogy estimation model for each FSS algorithm, using Euclidean distance as the similarity measure.
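A minimal sketch of such an analogy model: rank historical projects by Euclidean distance to the target and average the efforts of the k closest ones, which corresponds to the "one analogy" and "mean of 2 or 3 analogies" settings reported in the results. The function name `estimate_by_analogy` and the unscaled features are assumptions of this example.

```python
import math


def estimate_by_analogy(target, history, efforts, k=1):
    """Predict effort as the mean effort of the k historical projects that are
    closest to the target in Euclidean distance."""
    ranked = sorted(range(len(history)), key=lambda i: math.dist(target, history[i]))
    nearest = ranked[:k]
    return sum(efforts[i] for i in nearest) / len(nearest)
```

For example, with `k=2` the estimate is the mean effort of the two most similar past projects.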

Validation strategy: 10-fold cross-validation (CV)

Evaluation criteria: Mean Magnitude of Relative Errors (MMRE), Median MRE (MdMRE), and the performance indicator Pred(25%)

| Dataset | Number of features | Number of projects |
|---|---|---|
| ISBSG (release 10) | 14 | 400 |
| Desharnais | 10 | 77 |
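The three evaluation criteria can be computed as follows. This is a sketch using the usual definitions: MRE is the absolute error divided by the actual effort, and Pred(25%) is the fraction of projects whose MRE is at most 0.25. The function names are chosen for this example.

```python
from statistics import median


def mre(actual, predicted):
    """Magnitude of Relative Error for one project."""
    return abs(actual - predicted) / actual


def mmre(actuals, predictions):
    """Mean MRE across all projects."""
    return sum(mre(a, p) for a, p in zip(actuals, predictions)) / len(actuals)


def mdmre(actuals, predictions):
    """Median MRE across all projects; less sensitive to extreme errors."""
    return median(mre(a, p) for a, p in zip(actuals, predictions))


def pred(actuals, predictions, level=0.25):
    """Fraction of projects whose MRE does not exceed the given level."""
    hits = sum(1 for a, p in zip(actuals, predictions) if mre(a, p) <= level)
    return hits / len(actuals)
```

Lower MMRE and MdMRE are better, while a higher Pred(25%) is better.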


Kirsopp & Shepperd said:

“It is difficult to compare algorithms used classification problems with algorithms deals with prediction problems so the measures of accuracy used are different”


Results...ISBSG

| Algorithm used in analogy model | One analogy | Mean of 2 analogies | Mean of 3 analogies |
|---|---|---|---|
| All features | 37.74% | 41.0% | 29.4% |
| Exhaustive search, Hill climbing, Random search | 28.25% | 30.3% | 30.2% |
| Forward subset selection | 33.3% | 30.4% | 31.2% |
| Backward subset selection | 34.7% | 38% | 34.4% |
| FFSS | 28.7% | 30.6% | 32.2% |


Results...ISBSG

| Algorithm used in analogy model | One analogy | Mean of 2 analogies | Mean of 3 analogies |
|---|---|---|---|
| All features | 31.6% | 33% | 20.62% |
| Exhaustive search, Hill climbing, Random search | 21.9% | 24% | 20.8% |
| Forward subset selection | 21.0% | 21.4% | 20.7% |
| Backward subset selection | 22.6% | 28.7% | 25.2% |
| FFSS | 21.8% | 22.3% | 22.7% |


Results...Desharnais

| Algorithm used in analogy model | One analogy | Mean of 2 analogies | Mean of 3 analogies |
|---|---|---|---|
| All features | 60.1% | 51.5% | 50.0% |
| Exhaustive search, Forward subset selection, Hill climbing, Random search | 38.2% | 39.4% | 36.4% |
| Backward subset selection | 42.4% | 43.9% | 46.6% |
| FFSS | 40.2% | 40.3% | 38.5% |


Results...Desharnais

| Algorithm used in analogy model | One analogy | Mean of 2 analogies | Mean of 3 analogies |
|---|---|---|---|
| All features | 41.7% | 41.0% | 36.1% |
| Exhaustive search, Forward subset selection, Hill climbing, Random search | 30.8% | 38.0% | 30.9% |
| Backward subset selection | 38.4% | 37.4% | 34.6% |
| FFSS | 32.4% | 33.3% | 31.7% |


Conclusions

Fuzzy feature subset selection has a significant impact on the accuracy of EA.

Our FFSS algorithm produces results comparable with exhaustive search, hill climbing, and forward selection.

It reduces uncertainty when categorical data is involved.


Conclusions (cont.)

Which FSS is suitable? (decision diagram)

- Decision criteria: accuracy only; less time and quite reasonable accuracy when the data set is large; dataset size (small or large).
- Options: fuzzy feature subset selection, forward feature selection, and backward feature selection; exhaustive search and hill climbing; random hill climbing.


Threats to experiment validity

- Selecting representative ISBSG data.
- MMRE has been used as the fitness criterion for all feature selection algorithms except our FFSS.
- Number of projects.
- Outliers and extreme values.


References

Briand, L., Langley, T., Wieczorek, I. 2000. Using the European Space Agency data set: a replicated assessment and comparison of common software cost modelling techniques, Proc. 22nd IEEE Int’l Conf. on Software Engineering, Limerick, Ireland.

Kirsopp, C., Shepperd, M. 2002. Case and Feature Subset Selection in Case-Based Software Project Effort Prediction, Proc. 22nd SGAI Int’l Conf. Knowledge-Based Systems and Applied Artificial Intelligence.


Questions
