DESCRIPTION
Presentation prepared by colleagues from the McDonough School of Business at Georgetown University (Washington D.C., USA) as the result of two months of joint research with the GMD architecture team (Lima, Peru), which I led.
1
Benchmarking Software Estimation Methods
Adam Boyd, Sameer Huque, Kristine Pachuta, Travis Pincoski, John Trumble
July 11, 2013
2
OUR TEAM: Adam, Sameer, Kristine, Travis, John
• Product Director at Bloomberg LP
• Former IT Sourcing Director at Bloomberg and Pricing Analyst at General Dynamics
• Mechanical Engineer at the US Bureau of Engraving & Printing
• Sr. Consultant at Deloitte Consulting LLP
• Sr. Education Associate at the American Chemical Society
• Senior Financial Planning Analyst at BAE Systems
3
AGENDA
1 Review of Project Scope
2 Best Practices Research
3 Interview Results
4 Data Analysis and Benchmarking
5 Data Recommendations
6 Go-Forward Plan
4
OUR PROJECT SCOPE COVERED BEST PRACTICE RESEARCH AND DATA BENCHMARKING
PROJECT SCOPE
• Compile a list of software estimation methods, best practices, and benchmarks used at competing firms in the US
• Analyze GMD data and summarize findings
• Analyze third party data
• Present data and suggestions to GMD project sponsors
5
OUR RECOMMENDATIONS ARE BASED ON THE RESULTS OF OUR RESEARCH AND ANALYSIS
1. Researched Industry Best Practices
• Searched industry journals
• Conducted comprehensive news searches
• Established summaries
2. Interviewed Subject Matter Experts
• Interviewed experts across the industry
• Used standardized questions
• Established summaries
3. Analyzed Available Data
• Analyzed GMD-provided data
• Analyzed third-party data
• Established equation and recommendations
4. Formulated Recommendations
• Compiled themes across all research areas
• Formulated the data model
• Delivered multiple options for recommendations
A ROADMAP OF INCREMENTAL IMPROVEMENTS WILL HELP YOU MEET YOUR GOALS
(Roadmap chart; axes: Estimation Consistency and Productivity)
• Support consistent and reliable project planning
• Support timely and effective estimates, bug fixes, and re-work
• Build and deploy an in-house estimating and productivity planning capability
Continually improve the consistency, reliability, and productivity of your developers, and the data and resources used to track and estimate.
SEVERAL THEMES EMERGED IN OUR RESEARCH OF BEST PRACTICES
Some data is easy to game:
• Metrics like lines of code are easy to trick
• Mentoring or leading projects may result in low metrics
• Refactoring or documenting are often not accounted for
Metrics can be de-motivating:
• Metrics should not be used punitively
• Comparing across projects is difficult and possibly inaccurate
• Project managers, not management, should control the metrics
The best measurements are quantitative and qualitative:
• Data should be refined and measured
• Data analysis should be coupled with soft-skills analysis
• The best developers may not make the best leaders
• Measures are best used in large, distributed projects
DATA MODELS SHOULD HAVE FIVE KEY CHARACTERISTICS
• Breadth: handles multiple technologies
• Depth: looks at all levels of the application
• Explicit: enough to test against industry standards
• Actionable: shows where improvements can be made
• Automated: does not detract from the workload
TECHNICAL GOALS SHOULD ALIGN WITH YOUR CORPORATE AND STRATEGIC GOALS
Technical Goals: Customer Satisfaction, Developer Efficiency, Software Quality
Strategic Goals: Customer Satisfaction, On-Time Delivery, Product Quality
Corporate Goals: Customer Satisfaction, Efficiency, Quality
10
12
WE INTERVIEWED MANY INDUSTRY EXPERTS
(Chart; axes: size and business sector: Consulting, Academia, Government, Corporate)
• Co-Director, Security and Software Engineering Research Center, Georgetown University
• Director, Thoughtworks Consulting
• Chief Technologist, SWComplete
• Chief Information Officer, Philips IT
• Head of Digital Content Development, AOL
13
THE EXPERTS HAVE STRONG OPINIONS ABOUT THE NECESSITY OF MEASURING DEVELOPER OUTPUT
• "Most formal measurement systems may de-motivate developers"
• "Peer code review is key"
• "Identify the developer that other developers go to for advice"
• "We don't base pay or increases on development metrics at all"
• "A great developer will have vocal ideas about changing things AND make it happen"
14
MOST EXPERTS STRESSED THE DIFFICULTY OF MEASUREMENT (1)

Interviewees: Co-Director, Security and Software Engineering Research Center, Georgetown University; Chief Technologist, SWComplete
Key Takeaways:
• Lines of code per day delivered (debugged) is the best metric, but it is backward-looking
• Function point analysis allows interoperability, but there is no consensus on translation
• Evaluations must account for code complexity
• Soft skills (communication, attention to detail) are as important as quality
• Peer evaluations are a critical component of developer evaluation
• It is important not to penalize developers for taking time to exceed expectations
15
MOST EXPERTS STRESSED THE DIFFICULTY OF MEASUREMENT (2)

Interviewees: Former Head, AOL Digital Content Development; CIO, Philips IT; Director, Thoughtworks Consulting
Key Takeaways:
• Difficult to disaggregate the contributions of individual team members
• Leadership is important but hard to quantify
• How do you quantify the extensibility of code?
• Poor developers will always be exposed regardless of measurement
• Estimations should be done with input from multiple developers
• Look at teams, not individuals
• Track the "what" and the "how"; both are equally important
16
ONLY ONE OF THE COMPANIES WE INTERVIEWED MEASURES INDIVIDUAL DEVELOPER PRODUCTIVITY
Measures Productivity Do Not Measure Productivity
18
WE CONDUCTED AN ANALYSIS OF THE GMD PROVIDED DATA SET
Approach 1:
• Comparison of GMD data to data from the ISBSG (International Software Benchmarking Standards Group)
• Research into what other metrics may be tracked
Approach 2:
• Analysis of the GMD regression model and raw data
• Re-definition of existing variables
• Experimentation with new approaches
Outcome: combined the best practices found in research with experimental work on the GMD data to determine the best recommendation for productivity measurements moving forward.
19
WE ALSO CONDUCTED AN ANALYSIS OF A THIRD PARTY DATASET
GOAL: Benchmark GMD data against an independent data set
• Sample data set obtained from ISBSG
• ISBSG tracked roughly thirty different metrics in estimation
• They tracked many similar variables: task type (creation/modification), language and standards used
• They also tracked a few items that may be helpful to GMD in the future: job size, quantity of defects
20
SEVERAL EXPERIMENTS WERE RUN WITH THE GMD DATA TO DETERMINE THE CRITICAL PATH
GOAL: Develop a predictive model with a high R²
• Re-defined variables as categorical
• Tested Java programming jobs only
• Eliminated jobs in which programmers had <6 months of experience
• Grouped programming languages by type
• Assigned different weightings to job difficulty
• Major improvement when baselining to the average programmer
21
SUGGESTION: BASELINING TO THE AVERAGE
Defining the average allows for better productivity evaluation: better performers will be faster than the average time, assuming acceptable quality.
(Chart: slower than average | AVERAGE | faster than average)
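The baselining idea above can be sketched in a few lines of Python. The programmer names and job times below are invented for illustration; only the technique (express each developer's time relative to the group average) comes from the slide.

```python
# Hypothetical job times (hours) for three developers on comparable jobs.
job_hours = {"dev_a": [10.0, 12.0], "dev_b": [8.0, 9.0], "dev_c": [14.0, 13.0]}

# The average hours per job across all programmers defines the baseline.
all_jobs = [h for hours in job_hours.values() for h in hours]
baseline = sum(all_jobs) / len(all_jobs)

# A ratio below 1.0 means faster than average (assuming acceptable quality).
relative = {dev: (sum(h) / len(h)) / baseline for dev, h in job_hours.items()}
```

With these numbers, `dev_b` comes out below 1.0 (faster than average) and `dev_c` above it, which is exactly the relative-performance signal the suggestion is after.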
22
THIS RESULTED IN THE NEW MODEL
• Seven programmers used to define the average:
  • Java: #4 & #9
  • .NET: #19
  • LN4: #27
  • COBOL: #40
  • RPG: #45
  • JavaScript: #79
• Additional programmers added to the GMD list to create a broad data set
• Programming languages categorized as 0 or 1, with each language given its own variable
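Categorizing each language as its own 0/1 variable, as the slide describes, is ordinary one-hot encoding. A minimal sketch (the language list follows the model on the next slide; the helper name is ours):

```python
# One column per language, matching the variables in the regression model.
LANGUAGES = ["Java", ".NET", "LN4", "COBOL", "RPG", "JavaScript", "SQL"]

def encode_language(language):
    """Return a 0/1 indicator vector with exactly one column set."""
    return [1 if language == lang else 0 for lang in LANGUAGES]

row = encode_language("COBOL")  # contributes only the COBOL coefficient
```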
23
THE NEW MODEL

Input Variables                    Coefficient   P-Value
Constant term                        -4.349       0.000
Complexity                            1.492       0.000
RPG                                   3.177       0.000
COBOL                                 1.007       0.287
LN4                                   2.476       0.000
Java                                  2.867       0.000
.NET                                  0.490       0.674
JavaScript                            2.860       0.001
SQL                                   0.000       N/A
Development?                         -0.123       0.896
Creation?                             0.607       0.007
Developing Experience (Months)       -0.004       0.917
Knowledge of the Business             0.043       0.806
Programming Language Domain          -0.068       0.046
Technology/Framework Domain           0.029       0.018
Knowledge of Tool (Software?)         0.083       0.011

Residual df:          264
Multiple R-squared:   0.65
Std. Dev. estimate:   0.89
Residual SS:          208.50
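We do not have the GMD data behind this table, but the general shape of such a model (a response regressed on a complexity score plus 0/1 language indicators) can be sketched with synthetic data. Everything below is invented: the sample, the "true" coefficients, and the choice of Java as the reference language.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-ins for two of the model's inputs.
complexity = rng.integers(1, 5, size=n).astype(float)   # scores 1..4
is_rpg = rng.integers(0, 2, size=n).astype(float)       # 0/1 language indicator

# Invented relationship used to generate the response variable.
y = -4.3 + 1.5 * complexity + 3.2 * is_rpg + rng.normal(0.0, 0.9, size=n)

# Design matrix: constant term, complexity, language indicator.
X = np.column_stack([np.ones(n), complexity, is_rpg])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# R-squared: fraction of variance in y explained by the fit.
residuals = y - X @ coef
r_squared = 1.0 - residuals.var() / y.var()
```

Note that with a constant term in the model, one language must serve as the reference category; giving every language its own 0/1 column alongside an intercept makes the design matrix collinear.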
24
RMS ERRORS INCREASE WITH COMPLEXITY
25
COBOL HAS THE HIGHEST RMS ERROR, LN4 AND JAVA HAVE THE LOWEST
27
THREE KEY DATA RECOMMENDATIONS
1. Baseline the model to the average programmer to establish relative performance
2. Re-define the programming language variable to establish proper context
3. Expand upon the current complexity variable to reduce error measurements
28
1ST RECOMMENDATION: BASELINE THE MODEL TO THE AVERAGE
• Account for inherent talent
• Distinguish great performance from poor performance
• Adjust as necessary to maintain the baseline average
29
2ND RECOMMENDATION: RE-DEFINE THE PROGRAMMING LANGUAGE VARIABLE
• The current scale for programming language has no predictive relationship
• Group by language type or on a difficulty scale
• If neither is used, treat the variable as binary
30
3RD RECOMMENDATION: EXPAND THE COMPLEXITY VARIABLE
• The current error measurement grows with project difficulty
• Change the variable to represent more variation in difficulty
• Add additional break points as needed for accuracy
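The "break points" in the third recommendation amount to banding a raw size or difficulty measure into finer categories. A minimal sketch, where the thresholds and band names are hypothetical rather than GMD values:

```python
import bisect

# Hypothetical break points on some raw size measure (e.g. function points).
BREAK_POINTS = [10, 25, 50, 100]
BANDS = ["trivial", "low", "medium", "high", "very high"]

def complexity_band(size):
    """Map a raw size measure to a complexity band via the break points."""
    return BANDS[bisect.bisect_right(BREAK_POINTS, size)]
```

Adding a break point is just inserting a threshold and a band name, which makes it cheap to refine the variable as the error analysis demands.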
31
MOVING BEYOND THE DATA CREATES A HIGH PERFORMANCE CULTURE
(Diagram: soft skills and aligned data combine to build a high-performance culture)
OUTLINE OF HOW TO ACHIEVE THESE GOALS
(Roadmap chart; axes: Estimation Consistency and Productivity)
• Resource estimating models and tools
• Continual collection of actuals to assist with refinement of estimating models and tools
• Reusable resource catalogs
• Standard cost estimating quality assessment
• Dedicated Estimation Program Office (EPO)
• Maintenance of existing, and development of new, resource estimating models and tools by the EPO
• Standard product-based CES tailored for different offices in the Department
Next Steps:
• Adopt commercially available tools
• Consider Agile
OTHER OPTIONS: STRATEGIC APPLICATION OF AGILE
1. Determine Business Issue: determine whether Agile is a good fit for the needs of the business
2. Examine Organizational Culture: examine the beliefs and values articulated by members of the organization
3. Assess Deployment Strategy: assess the organization to ensure the business issue is addressed and the organization is minimally disrupted
4. Tailor Project Approach: tailor relevant pieces of the methodology to confirm the business issue is fully supported by the project team
OTHER OPTIONS: EFFECTIVE ESTIMATION CAPABILITY
In addition to organizational commitment, executive sponsor support, and adequate resources, the following is a partial list of critical success factors in building a sustainable resource analysis capability:
• Dedicated Program, Team, or Support: expert stewards of the process and coaches who maintain momentum and quality across the organization.
• Product-Oriented Cost Element Structure: a well-understood CES that communicates to the organization what the investment will include; an itemized "invoice" for the investment.
• Resource Catalogs: predefined, reusable increments of scope that have been previously socialized and endorsed/approved. Enables quick, consistent, and defensible definition of a cost/resource estimate.
• Resource Estimating Models and Tools: standard models and tools that support and enforce the estimating process.
• Software Sizing Methodology: software costs are a function of the volume of software to be developed; measuring that volume is critical for all software resource estimation.
• Quality Assurance / Validation of Investment Proposal: QA checklists help ensure thoroughness; assessments provide a maturity measure of process and program capability.
36
OTHER OPTIONS: DEVELOP RESOURCE CATALOGS
• A resource catalog is a small portion of scope for an IT project
• A resource catalog includes reusable size increments that reflect cost, effort, staffing, schedule, labor rates, labor types, and risk
• Size increments include both non-recurring and recurring costs, and are defined logically or in "T-shirt sizes"
• Resource catalogs enable:
  • Quick, consistent, repeatable, and defensible development of life-cycle estimates
  • Tracking and trend analysis of size increments, enabling better understanding of efficiency gains
  • Structure for continual estimation process improvement
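A catalog of reusable, T-shirt-sized increments of scope, as described above, can be modeled as a simple lookup. The item name and all numbers below are illustrative placeholders, not GMD or catalog data:

```python
# Hypothetical resource catalog: each item is a reusable increment of scope,
# sized in "T-shirt sizes" with pre-agreed effort and staffing figures.
CATALOG = {
    "web_form": {
        "S": {"effort_hours": 40, "staff": 1},
        "M": {"effort_hours": 120, "staff": 2},
        "L": {"effort_hours": 320, "staff": 3},
    },
}

def estimate(item, size, count=1):
    """Build a quick, repeatable estimate from catalog increments."""
    entry = CATALOG[item][size]
    return {"effort_hours": entry["effort_hours"] * count, "staff": entry["staff"]}
```

Because every estimate is assembled from the same pre-approved increments, the result is consistent and defensible, and actuals can later be compared against the increments to track efficiency gains.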
37
OTHER OPTIONS: OVERVIEW OF 3RD PARTY SOFTWARE OPTIONS
Options for software tools to track productivity:
• According to a 2009 University of California study, developers spend an average of 11 minutes on a task before they are distracted by a separate task, and it takes them 25 minutes to get back to the original task
• Mylyn: open-source option
• Tasktop and Tasktop Pro: integrate with IBM Rational
• Cubeon: runs on Google Code
TAKING THESE STEPS WILL RESULT IN SOUND MEASUREMENTS AND ESTIMATIONS
1. Collect more data
• Start collecting project actuals
• Provide standard data collection templates to enable data analysis
• Manage projects against resource estimates
2. Create a resource library
• Quick, consistent, repeatable, and defensible development of resource estimates
• Tracking and trend analysis of size increments
• Structure for continual estimation process improvement
3. Consider alternatives
• Third-party software
• Cost-oriented product structure
• Adoption of Agile development methods
39
Thank you!
40
Appendix
41
WE REVIEWED RESEARCH FROM MULTIPLE SOURCES
Sample Works Reviewed:
• Umarji, Medha, and Forrest Shull. "Measuring Developers: Aligning Perspectives and Other Best Practices." IEEE Software, 2009.
• McAllister, Neil. "The Futility of Developer Productivity Metrics." InfoWorld, November 17, 2011. http://www.infoworld.com/d/application-development/the-futility-developer-productivity-metrics-179244
• Subramanyam, Jitendra (Director of Research, CAST Inc.). "Five Requirements for Measuring Application Quality." June 17, 2011. http://www.networkworld.com/news/tech/2011/061611-application-quality.html
• Construx Staff. "Establishing a Measurement Program." Whitepaper.
OTHER OPTIONS: STRATEGIC APPLICATION OF AGILE
1. Determine Business Issue
   • Determine whether Agile is a good fit for the needs of the business
   • Key questions: Is there high market uncertainty or need for customer involvement? Is the environment rapidly changing?
   • Deliverables: project charter; list of key dependencies
2. Examine Organizational Culture
   • Examine the beliefs and values articulated by members of the organization
   • Key questions: What are the values of the organization? Is the organizational culture conducive to Agile adoption? Who are the key stakeholders?
   • Deliverables: stakeholder interviews; stakeholder survey & questionnaire results
3. Assess Deployment Strategy
   • Assess the organization to ensure the business issue is addressed and the organization is minimally disrupted
   • Key questions: What strategy fits the business need? What strategy is supportable by the organization?
   • Deliverables: deployment strategy plan
4. Tailor Project Approach
   • Tailor relevant pieces of the methodology to confirm the business issue is fully supported by the project team
   • Key questions: What is required to deliver the project? What is the commitment of team members to each other?
   • Deliverables: project processes; project team charter; project team values
43
SEVERAL EXPERIMENTS WERE RUN WITH THE GMD DATA
• Different methods were attempted to achieve a higher R² for the predictive model:
  • Re-defined variables as categorical
  • Assigned different weightings to job difficulty
  • Tested Java programming jobs only
  • Grouped programming languages by type
  • Eliminated jobs in which programmers had <6 months of experience
• High variability remained
• Also experimented with data mining approaches other than multiple linear regression:
  • Regression trees: good results (lower RMS error), but not precise enough for GMD
  • Principal Components Analysis
• Major improvement when baselining to the average programmer
OTHER OPTIONS: DEVELOP PRODUCT-ORIENTED COST ELEMENT STRUCTURE
• Develop product-oriented cost element structure (CES) that will facilitate decision making
• Utilize resource catalogs to populate individual cost elements
• Standardize as many of the cost elements as possible to facilitate consistent cost estimation while allowing easy integration of project-specific custom cost elements
• Facilitate repeatable and consistent cost reporting with CES standardization
• Facilitate consistent and reliable reporting of cost information