Assessing the Reliability of a Human Estimator
Gary D. Boetticher, Nazim Lokhandwala
Univ. of Houston - Clear Lake, Houston, TX, USA
The 3rd International Predictor Models in Software Engineering (PROMISE) Workshop
http://nas.cl.uh.edu/boetticher/publications.html

Page 1: Assessing the Reliability of a Human Estimator

Page 2: Assessing the Reliability of a Human Estimator

Current Configuration of PROMISE Repository

Defect Prediction – 18

Others - 9

Effort Estimation - 9

Page 3: Assessing the Reliability of a Human Estimator

Research vs. Reality according to Jørgensen

TSE ’07: 300+ software est. papers, 76 journals, 15+ years

Approach     -89    89-99    00-04    Total
Algorithm     48      137       70      255
ML             1       32       41       74
Human          3       22       21       46
Misc.          7       19       26       52

68% Algorithm, 20% ML, 12% Human

JSS ’04: Compendium of expert estimation studies

Paper            Human
Hihn 91          89%
Heemstra 91      62%
Paynter 96       86%
Jørgensen 97     84%
Hill 00          100%
Kitchenham 02    72%

82% Human, 18% Formal
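
The percentages on this slide can be checked from the counts. The sketch below is illustrative only: it assumes the research-side shares are taken over Algorithm, ML and Human papers (leaving out "Misc."), and that the 82% figure is the plain average of the six study percentages. The names `paper_counts`, `human_usage` and `shares` are made up for this example.

```python
# Paper counts per approach (columns: -89, 89-99, 00-04), copied from the slide.
# Assumption: "Misc." is excluded when computing the shares.
paper_counts = {
    "Algorithm": [48, 137, 70],
    "ML": [1, 32, 41],
    "Human": [3, 22, 21],
}

# Percentage of projects using human-based estimation, per study on the slide.
human_usage = {
    "Hihn 91": 89,
    "Heemstra 91": 62,
    "Paynter 96": 86,
    "Jørgensen 97": 84,
    "Hill 00": 100,
    "Kitchenham 02": 72,
}


def shares(table):
    """Return each approach's share of all papers, as a rounded percentage."""
    totals = {name: sum(row) for name, row in table.items()}
    grand_total = sum(totals.values())
    return {name: round(100 * t / grand_total) for name, t in totals.items()}


research = shares(paper_counts)
practice_human = round(sum(human_usage.values()) / len(human_usage))

print(research)        # {'Algorithm': 68, 'ML': 20, 'Human': 12}
print(practice_human)  # 82
```

Under these assumptions the research literature is 12% human-based while practice is about 82% human-based, which is the gap the slide points at.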

Page 4: Assessing the Reliability of a Human Estimator

Research vs. Reality

How to resolve?

• Researchers coerce/entice/exhort/nudge practitioners

• Practitioners ignore researchers

• Researchers meet practitioners where they are

COCOMO

Page 5: Assessing the Reliability of a Human Estimator

Statement of Problem

How do human demographics affect human-based estimation?

Can predictive models be constructed using human demographics?

Page 6: Assessing the Reliability of a Human Estimator

PROMISE 2006

• Addressed the problem using Genetic Programs and non-linear regression (up to 5th order) models

• Produced some accurate (77-93%) models, but the GP solutions got lengthy:

((MgmtGCourses ^ (((Log (((TotLangExp / (TotLangExp / (TechGCourses * HWPMExp))) - (TechGCourses * HWPMExp)) - ((Sin (MgmtGCourses ^ (Sin ((TechGCourses * HWPMExp) - (MgmtGCourses ^ (((Log (HWPMExp ^ (TotLangExp / (TechGCourses * HWPMExp)))) - (Abs (Log ((TotLangExp / (TechGCourses * HWPMExp)) - ((Sin ((Sin (Abs (TechUGCourses / MgmtGCourses))) - (TotLangExp / (MgmtGCourses ^ (((Log (((TotLangExp / (HWPMExp / SWProjEstExp)) - (Sin (TotLangExp / (TotLangExp / ((MgmtGCourses ^ ((Log (TechGCourses * HWPMExp)) - (Sin (Abs (Log ((HWPMExp / SWProjEstExp) - (TechGCourses * HWPMExp))))))) + ((Sin (TechGCourses * HWPMExp)) - (Sin (TechUGCourses / MgmtGCourses)))))))) - (Sin (TechUGCourses / MgmtGCourses)))) - (TechGCourses * HWPMExp)) - (Sin (TechUGCourses / MgmtGCourses))))))) - (HWPMExp / SWProjEstExp)))))) - (Sin (TechUGCourses / MgmtGCourses)))))))) - ((Sin (Abs (Log ((TotLangExp / (TechGCourses * HWPMExp)) - ((Sin ((Sin (Abs (Log (HWPMExp ^ (TotLangExp / (TechGCourses * HWPMExp)))))) - (TechGCourses * HWPMExp))) - (HWPMExp / SWProjEstExp)))))) - (Sin (TechUGCourses / MgmtGCourses)))))) - (TotLangExp / (TechGCourses * HWPMExp))) - (Sin (TechUGCourses / MgmtGCourses)))) + (TotLangExp / (TechGCourses * HWPMExp)))

So for 2007…

Page 7: Assessing the Reliability of a Human Estimator

PROMISE 2007

• Larger sample set.
  • 2006 PROMISE: 122 samples
  • 2007 PROMISE: 178 samples

• Many learners.
  • 51 classifiers, 4142 experimental trials

• Attribute analysis.

• Simpler models.
  • Focus is on classifiers
  • Human-readable models

Page 8: Assessing the Reliability of a Human Estimator

Strategy

2. Create a Web-based survey
   • Users' demographics
   • Users estimate software components
   • Feedback to users

3. Build models: demographics → estimates

Page 9: Assessing the Reliability of a Human Estimator

The Survey (2001-2005)

http://nas.cl.uh.edu/boetticher/EffortEstimationSurvey.html

Demographics: Personal, Academic Background, Work Experience, Domain Experience

Page 10: Assessing the Reliability of a Human Estimator

Ecommerce: Competitive Procurement

[Diagram: Buyer Admin, Buyer 1 ... Buyer n, Buyer Software, Distribution Server, Supplier Software, Supplier 1, Supplier 2, ..., Supplier n]

Page 11: Assessing the Reliability of a Human Estimator

Sample Estimation Screenshots

Page 12: Assessing the Reliability of a Human Estimator

Feedback to Users

Page 13: Assessing the Reliability of a Human Estimator

User Demographics - 1

• Average age: 31.43

• 148 males, 30 females

• 1% Ph.D., 24% Master, 72% Bach., 5% High School

• 25 countries:
  • 42% India, 32% U.S., 6% Romania, 4% Vietnam

Page 14: Assessing the Reliability of a Human Estimator

User Demographics - 2

Years                              Ave.      Max.    Std. Dev.

Years of Experience as a
  Hardware Project Manager         1.0169    25      3.0633
  Software Project Manager         1.6967    15      2.4757

No. of Projects estimated
  Hardware Projects                1.4382    25      4.4390
  Software Projects                3.6692    28      5.3856

Domain Experience
  Procurement & Billing            1.4382    25      4.4391
  Process Industry                 3.6629    28      5.3856

Page 15: Assessing the Reliability of a Human Estimator

Data preprocessing & Experiments

Remove outliers: Estimate > 10 * Actual or Estimate < 0.1 * Actual

178 samples → 163 samples

Extract:

• 25 worst under-estimators
• 25 best estimators
• 25 worst over-estimators

WEKA: 51 classifiers, 4 seeds, 10-fold cross-validation

Attribute reduction: 2 configs.
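
The preprocessing step can be sketched in a few lines of Python. The outlier rule is the one stated on the slide; everything else is an assumption made for illustration: the `Sample` record, the function names, and in particular how the three groups are ranked (under/over by the estimate-to-actual ratio, best by relative error). The slides do not say how the 25-person groups were actually chosen.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    user: str        # hypothetical respondent id
    estimate: float  # effort estimated by the respondent
    actual: float    # actual effort, assumed to be > 0


def is_outlier(s: Sample) -> bool:
    # Rule from the slide: Estimate > 10 * Actual or Estimate < 0.1 * Actual
    return s.estimate > 10 * s.actual or s.estimate < 0.1 * s.actual


def remove_outliers(samples: List[Sample]) -> List[Sample]:
    return [s for s in samples if not is_outlier(s)]


def extract_groups(
    samples: List[Sample], k: int = 25
) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    # Assumed ranking: lowest estimate/actual ratios are the worst
    # under-estimators, highest ratios the worst over-estimators, and the
    # smallest relative errors the best estimators.
    by_ratio = sorted(samples, key=lambda s: s.estimate / s.actual)
    under = by_ratio[:k]
    over = by_ratio[-k:]
    best = sorted(samples, key=lambda s: abs(s.estimate - s.actual) / s.actual)[:k]
    return under, best, over


# Tiny made-up data set; "a" and "e" fall outside the 0.1x to 10x band.
demo = [
    Sample("a", 5, 100),
    Sample("b", 50, 100),
    Sample("c", 100, 100),
    Sample("d", 300, 100),
    Sample("e", 2000, 100),
]
kept = remove_outliers(demo)
under, best, over = extract_groups(kept, k=1)
print([s.user for s in kept], under[0].user, best[0].user, over[0].user)
```

With real data the three groups are not guaranteed to be disjoint under this ranking, so an actual pipeline would need to check for overlap.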

Page 16: Assessing the Reliability of a Human Estimator

Results: Under vs. Best

Classifier      Accuracy
PART            76%
J48             68%
Logistic        64%
ThresholdSel    64%
VFI             64%

Evaluator Classifier columns: J48, Logistic, PART, Thresh., VFI

Demographic                         Selected (Y)
Domain Exp.                         YYY
Hardware Project Management Exp.    YYYY
Mgmt Grad. Courses                  Y
Mgmt Undergrad Crses                YYY
# of Hardware Proj. Est.            YYYY
Level of College                    YY
Software Proj. Mgmt Exp.            YY
Tech Undergrad Courses              Y
Total Conferences                   Y
Total Workshops                     YYY
Total Lang Exp.                     YYY

Ave. Accuracy: 48.22%

Class./Eval.                  Accuracy
ADTree/PART                   78%
ThresholdSel/ThresholdSel     76%
Bagging/J48                   74%
LogitBoost/J48                74%
J48/J48                       74%
PART/J48                      74%
VFI/VFI                       70%
Logistic/Logistic             68%

Page 17: Assessing the Reliability of a Human Estimator

Under vs. Best: Attribute Reduction

Evaluator Classifier columns: J48, Logistic, PART, Thresh, VFI

Demographic                  Selected (Y)
Domain Experience            YYYY
Hardware Proj. Mgmt Exp.     YYYY
Mgmt Grad. Courses           YYY
Mgmt Undergrad Crses         YYY
# of Hardware Proj. Est.     YYYY
# of Software Proj. Est.     Y
Level of College             YYY
Soft. Proj. Mgmt Exp.        YY
Tech Undergrad Crses         YY
Total Conferences            YY
Total Workshops              YY
Total Lang Experience        YY

Class / Eval             Accuracy
ADTree/ThreshSel         76%
J48/PART                 74%
PART/PART                74%
ADTree/J48               74%
PART/J48                 74%
VFI/VFI                  70%
Logistic/Logistic        68%

Page 18: Assessing the Reliability of a Human Estimator

Under vs. Best: Attribute Reduction

Domain Exp <= 3
|   No Of Hardware Proj Estimated <= 4
|   |   Hardware Proj Mgmt Exp <= 1
|   |   |   MgmtUGCourses <= 0: BEST (23.0/8.0)
|   |   |   MgmtUGCourses > 0: UNDER (13.0/1.0)
|   |   Hard. Proj Mgmt Exp > 1: BEST (5.0)
|   No Of Hard. Proj Est. > 4: UNDER (5.0)
Domain Exp > 3: BEST (4.0)

J48 Rule: 74% Accuracy

BEST  UNDER   <-- classified as
  21      4   | BEST
   9     16   | UNDER
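
Because the J48 tree is small, it translates directly into a few nested conditions. The function below is a hand transcription of the tree on this slide, not WEKA output; the function and parameter names are invented for the example, and the units of each attribute are whatever the survey recorded.

```python
def classify_under_vs_best(
    domain_exp: float,
    hw_proj_estimated: float,
    hw_proj_mgmt_exp: float,
    mgmt_ug_courses: float,
) -> str:
    """Hand-written transcription of the J48 tree shown on the slide."""
    if domain_exp > 3:
        return "BEST"
    # From here on: Domain Exp <= 3
    if hw_proj_estimated > 4:
        return "UNDER"
    # No Of Hardware Proj Estimated <= 4
    if hw_proj_mgmt_exp > 1:
        return "BEST"
    # Hardware Proj Mgmt Exp <= 1: decided by management undergrad courses
    return "BEST" if mgmt_ug_courses <= 0 else "UNDER"


# A respondent with little domain experience, few hardware estimates,
# no hardware project management experience and two management courses.
print(classify_under_vs_best(1, 2, 0, 2))  # UNDER
```

Writing the rule this way makes the "human readable models" goal concrete: the whole classifier fits in four comparisons.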

Page 19: Assessing the Reliability of a Human Estimator

Results: Best vs. Over

Classifier      Accuracy
RndTree         66%
Decorate        62%
RandComm        60%
ThresholdSel    60%
Ridor           60%

Columns: Rnd Comm, Ridor, ThresholdSelector

Demographic                  Selected (Y)
Hard. Proj Mgmt Exp.         YYY
Mgmt Grad. Courses           YYY
Mgmt Undergrad Crses         Y
# of Software Proj. Est.     Y
Soft. Proj. Mgmt Exp.        YY
Tech Undergrad Courses       Y
Total Conferences            YY
Total Workshops              Y
Total Lang Experience        YY

Ave. Accuracy: 42.86%

Class / Eval                 Accuracy
IB1 / Ridor                  80%
Rand. Comm. / RandComm       72%
ThresholdSel / ThreshSel     66%
ADTree / ThresholdSel        62%

Page 20: Assessing the Reliability of a Human Estimator

Experiment: Best vs. Over

Classifier      Evaluator       Accuracy
RndComm         RndComm         80%
RandomTree      RndComm         80%
IB1             Decorate        74%
IBk             Decorate        74%
RandomForest    Decorate        74%
RndComm         Decorate        72%
NNge            Decorate        72%
PART            Decorate        72%
NNge            ThresholdSel    66%
ThresholdSel    Ridor           64%
Ridor           ThresholdSel    62%
Ridor           Ridor           62%

Columns: Decorate, Rand Comm., Ridor, Thresh

Demographic                  Selected (Y)
Domain Experience            Y
Hard. Proj Mgmt Exp          YYY
Mgmt Grad. Courses           Y
Mgmt Undergrad Courses       YY
# of Hardware Proj. Est.     Y
Level of College             Y
Procurement Industry Exp     YY
Software Proj. Mgmt Exp.     Y
Tech Grad Courses            YY
Tech Undergrad Courses       Y
Total Workshops              Y
Total Lang Experience        YYY

Page 21: Assessing the Reliability of a Human Estimator

Experiment:Best vs. Over

TechUGCourses < 45.5
| Hardware Proj Mgmt Exp < 6
| | No Of Hardware Proj Estimated < 4.5
| | | No Of Hardware Proj Estimated < 3
| | | | TechUGCourses < 23
| | | | | Hardware Proj Mgmt Exp < 0.75
| | | | | | TechUGCourses < 18
| | | | | | | Hardware Proj Mgmt Exp < 0.13
| | | | | | | | TechUGCourses < 0.5
| | | | | | | | | TechUGCourses < -1 : F (1/0)
| | | | | | | | | TechUGCourses >= -1
| | | | | | | | | | Degree < 3.5 : A (4/0)
| | | | | | | | | | Degree >= 3.5 : A (5/2)
| | | | | | | | TechUGCourses >= 0.5
| | | | | | | | | TechUGCourses < 5.5
| | | | | | | | | | Degree < 3.5 : F (5/0)
| | | | | | | | | | Degree >= 3.5
| | | | | | | | | | | TechUGCrses < 2 : A (1/0)
| | | | | | | | | | | TechUGCrses >= 2 : F (1/0)
| | | | | | | | | TechUGCrses >= 5.5
| | | | | | | | | | Degree < 3.5
| | | | | | | | | | | TechUGCrs < 10.5 : A (3/0)
| | | | | | | | | | | TechUGCrses >= 10.5
| | | | | | | | | | | | TechUGCrs<12.5 : F (3/0)
| | | | | | | | | | | | TechUGCrses >= 12.5
| | | | | | | | | | | | | TechUGCrs<16: A (2/0)
| | | | | | | | | | | | | TechUGCrs>15 : A (2/1)
| | | | | | | | | | Degree >= 3.5 : F (1/0)
| | | | | | | HardProjMgmt Exp >= 0.13 : A (2/0)
| | | | | | TechUGCourses >= 18 : A (2/0)
| | | | | Hard Proj Mgmt Exp >= 0.75 : F (1/0)
| | | | TechUGCourses >= 23 : F (5/0)
| | | No Of Hardware Proj Est >= 3 : F (1/0)
| | No Of Hardware Proj Est >= 4.5 : A (5/0)
| Hardware Proj Mgmt Exp >= 6 : F (4/0)
TechUGCrses >= 45.5 : A (2/0)

BEST  OVER   <-- classified as
  23     2   | BEST
   8    17   | OVER
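
The confusion matrix can be turned into an accuracy figure and per-class recall with a short helper. This is a minimal sketch: `summarize` is a made-up name, and it relies on the WEKA convention that rows are the actual class and columns the predicted class.

```python
from typing import Dict, List, Tuple


def summarize(matrix: List[List[int]], labels: List[str]) -> Tuple[float, Dict[str, float]]:
    """Return (overall accuracy, recall per class).

    Rows are actual classes and columns are predicted classes, as in
    WEKA's "<-- classified as" layout.
    """
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(labels)))
    recall = {labels[i]: matrix[i][i] / sum(matrix[i]) for i in range(len(labels))}
    return correct / total, recall


# Best vs. Over matrix from this slide.
accuracy, recall = summarize([[23, 2], [8, 17]], ["BEST", "OVER"])
print(f"accuracy = {accuracy:.0%}")  # accuracy = 80%
print(recall)
```

The matrix shows that the errors are lopsided: 23 of 25 best estimators are recognized, but 8 of 25 over-estimators are mistaken for best estimators.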

Page 22: Assessing the Reliability of a Human Estimator

Conclusions

Very good accuracy rates, especially after attribute reduction

Bridges expert and model groups

Page 23: Assessing the Reliability of a Human Estimator

Questions?

Page 24: Assessing the Reliability of a Human Estimator

Thank You!

Page 25: Assessing the Reliability of a Human Estimator

References

1) Jørgensen, M., “A Review of Studies on Expert Estimation of Software Development Effort,” Journal of Systems and Software, 2004.

2) Jørgensen, M., Shepperd, M., “A Systematic Review of Software Development Cost Estimation Studies,” IEEE Transactions on Software Engineering, 33(1), January 2007, pp. 33-53.
