

Page 1: Software Analytics

MAC6912 - Ambientes de Desenvolvimento de Software (Software Development Environments)

Professor Marco Aurélio Gerosa

Ana Paula Oliveira Bertholdo

Page 2: Software Analytics

• 4 papers:
  – Analytics for SW Development (Buse & Zimmermann, 2010)
  – SW Analytics as a Learning Case in Practice: Approaches and Experiences (Zhang et al., 2011)
  – Analyze This! 145 Questions for Data Scientists in SW Engineering (Begel & Zimmermann, 2014)
  – What’s Next in SW Analytics (Hassan et al., 2013)

Page 3: Software Analytics

• Software engineering is a data-rich activity.
• Artifacts of a project’s development can be measured with increasing automation, efficiency, and granularity.
• Projects can be measured throughout their life-cycle (see the sketch below).
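
A minimal sketch of one such measurement, not taken from the papers: assuming a local Git checkout and the `git` command on the PATH, it computes per-file churn (lines added plus deleted) from the version-control history.

```python
# Minimal sketch: measure per-file churn (lines added + deleted) from Git history.
# Assumes it runs inside a local Git checkout with `git` available on the PATH.
import subprocess
from collections import Counter

def churn_per_file(repo_path="."):
    """Return a Counter mapping file path -> total lines added + deleted."""
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--numstat", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    churn = Counter()
    for line in log.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # skip blank separator lines
        added, deleted, path = parts
        if added.isdigit() and deleted.isdigit():  # binary files show "-"
            churn[path] += int(added) + int(deleted)
    return churn

if __name__ == "__main__":
    for path, total in churn_per_file().most_common(10):
        print(f"{total:8d}  {path}")
```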

Page 4: Software Analytics

• SW development continues to be risky and unpredictable.
• It is not unusual for major development efforts to experience large delays or failures.

Page 5: Software Analytics

• Substantial disconnect between
  – (A) the information needed by project managers to make good decisions and
  – (B) the information currently delivered by existing tools.
• At its root:
  – Problem: the real-world information needs of project managers are not well understood by the research community.
  – Research has ignored the needs of managers and has instead focused on the information needs of developers.

Page 6: Software Analytics

• Data needs go unmet because tools are
  – unavailable,
  – too difficult to use,
  – too difficult to interpret, or
  – they simply do not present useful or actionable information.
• As a result, managers must rely primarily on past experience and intuition for critical decision making.

Page 7: Software Analytics

• The data-centric style of decision making is known as analytics.
• The idea is to leverage large amounts of data into real and actionable insights.

Page 8: Software Analytics

Figure 1: Analytical Questions. The researchers distinguish questions of information, which can be directly measured, from questions of insight, which arise from careful analysis and provide managers with a basis for action.

Page 9: Software Analytics

• The transition isn’t easy!
• Insight necessarily requires
  – knowledge of the domain, coupled with
  – the ability to identify patterns involving multiple indicators.

Page 10: Software Analytics

• Managers may be too busy or may simply lack the quantitative skills or analytic expertise to fully leverage advanced analytical applications.
• One possibility is that tools should be created with this in mind.
• Another possibility is the addition of an analytic professional to the software development team.

Page 11: Software Analytics
Page 12: Software Analytics

• Conclusion
  – All resources, especially talent, are always constrained.
  – This underscores the importance of careful and deliberate decision making by the managers of software projects.
  – The observation that software projects continue to be risky and unpredictable despite being highly measurable implies that more analytic information should be leveraged toward decision making.
  – In this paper, the researchers
    • described how software analytics can help managers move from low-level measurements to high-level insights about complex projects.
    • advocated more research into the information needs and decision processes of managers.
    • discussed how the complexity of software development suggests that dedicated analytic professionals with both quantitative skills and domain knowledge might provide great benefit to future projects.

Page 13: Software Analytics

• Researchers (Microsoft Research Asia) advocate that, when applying analytic technologies in practice, one should:
  – (1) incorporate a broad spectrum of domain knowledge and expertise
    • e.g., management, machine learning, large-scale data processing and computing, and information visualization; and
  – (2) investigate how practitioners take action on the produced information, and provide effective support for such information-based action taking.

Page 14: Software Analytics

• Software analytics uses various analytic technologies
  – (data mining, machine learning, and information visualization)
• to enable software practitioners to perform data exploration and analysis and obtain insightful and actionable information.
• Insightful information
  – provides meaningful and useful understanding or knowledge towards performing the target task.
• Actionable information
  – is information upon which software practitioners can come up with concrete solutions towards completing the target task.

Page 15: Software Analytics

• Developing a software analytics project typically goes through iterations of a life cycle with four phases:
  1) task definition,
  2) data preparation,
  3) analytic-technology development, and
  4) deployment and feedback gathering.

Page 16: Software Analytics

• Task definition is to define the target task to be assisted by software analytics.
  – pull model: StackMine -> performance analysis
  – push model: XIAO -> refactoring and defect detection

Page 17: Software Analytics

• Data preparation is to collect the data to be analyzed.
  – 2 types of infrastructure support: existing infrastructure in industry and in-house infrastructure.
  – StackMine -> existing Microsoft infrastructure support.
  – XIAO -> in-house code-analysis infrastructure.

Page 18: Software Analytics

• Analytic-technology development is to develop problem formulations, algorithms, and systems to explore, understand, and get insights from the data.
  – The SA team needs to acquire deep knowledge about the data (including its format and semantics) and the target tasks.
  – This acquisition process may take non-trivial time.

Page 19: Software Analytics

• Deployment and feedback gathering involves two typical scenarios:
  – 1: the researchers have obtained some insightful information from the data and ask domain experts to review and verify it.
  – 2: the researchers ask domain experts to use the analytic tools to obtain insights by themselves.
• “The more the customers use the tools, the ‘smarter’ the tools become.”

Page 20: Software Analytics

• Domain knowledge and expertise are strongly needed to successfully develop a software analytics project for technology transfer.

• Types of domain knowledge:

– Specific application domain knowledge (customers).

– Common application domain knowledge (family of SW applications).

– Data domain knowledge (data preparation).

Page 21: Software Analytics

• Types of expertise:
  – Task expertise
    • work with the customers to learn the workflow.
  – Management expertise
    • good management and communication skills to interact with the customers and manage the team.
  – Machine learning expertise
    • to develop machine learning algorithms and tools (not just in a black-box way).
  – Large-scale data processing/computing expertise
    • to design and implement scalable data processing and learning tools.
  – Information visualization expertise
    • to design and implement good user interfaces and visualizations for presenting analysis results.

Page 22: Software Analytics

• Conclusion:
  – “What do developers think about your result?
  – Is it applicable in their context?
  – How much would it help them in their daily work?”

Page 23: Software Analytics

• Results from 2 surveys related to data science applied to SW engineering.
• 1st survey:
  – questions that SW engineers would like data scientists to investigate about software, software processes and practices, and software engineers.
• 2nd survey:
  – SW engineers rate 145 questions and identify the most important ones to work on first.

Page 24: Software Analytics

• Businesses of all types commonly use analytics to better reach and understand their customers.

• Many software engineering researchers have argued for more use of data for decision-making.

• The demand for data scientists in software projects will grow rapidly.

• Harvard Business Review named Data Scientist the “sexiest job of the 21st century.”

• By 2018, the U.S. may face a shortage of as many as 190,000 people with analytical expertise and of 1.5 million managers and analysts with the skills to make data-driven decisions, according to a report by the McKinsey Global Institute.

Page 25: Software Analytics

• Research goal:
  – Present a ranked list of questions that SW engineers want data scientists to answer.
  – The survey was deployed among professional SW engineers at Microsoft.

Page 26: Software Analytics
Page 27: Software Analytics

• The research:
  – provides a catalog of 145 questions that software engineers would like to ask data scientists about software.
  – ranks the questions by importance (and opposition) to help researchers, practitioners, and educators focus their efforts on topics of importance to industry.
  – calls on other industry companies and the academic community to replicate its methods and grow the body of knowledge from this start (technical report).

Page 28: Software Analytics

• Initial survey:
  – 2 pilot surveys sent to 25 and 75 Microsoft engineers.
  – The pilots demonstrated the need to seed the survey with example data-analytics questions, e.g.:
    • What impact does code quality have on our ability to monetize a software service?
  – Sent to 1,500 SW engineers in September 2012.
  – Respondents: 36.5% developers, 38.9% testers, 22.7% program managers.

Page 29: Software Analytics
Page 30: Software Analytics
Page 31: Software Analytics
Page 32: Software Analytics

• Rating survey:
  – Split-questionnaire survey design (sketched below)
    • component blocks
  – 607 responses (survey sent to 2,500 engineers)
  – 16,705 ratings
  – Multiple-choice format
  – Respondents: 29.3% developers, 30.1% testers, and 40.5% program managers.
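
A rough illustration of how a split-questionnaire design works (a toy sketch, not the authors’ instrument: block size and the number of blocks per respondent are assumptions): the full question set is partitioned into component blocks, and each respondent rates only a random subset of blocks, keeping individual surveys short while covering all 145 questions overall.

```python
# Toy sketch of a split-questionnaire design: partition the 145 questions into
# component blocks and give each respondent a random subset of blocks to rate.
# Block size and blocks-per-respondent are illustrative assumptions.
import random

QUESTIONS = [f"Q{i:03d}" for i in range(1, 146)]  # stand-ins for the 145 questions

def make_blocks(questions, block_size=15):
    """Partition the question list into fixed-size component blocks."""
    return [questions[i:i + block_size] for i in range(0, len(questions), block_size)]

def assign_survey(blocks, blocks_per_respondent=2, rng=random):
    """Select a random subset of blocks for one respondent to rate."""
    chosen = rng.sample(blocks, blocks_per_respondent)
    return [q for block in chosen for q in block]

blocks = make_blocks(QUESTIONS)
print(len(blocks), "blocks;", len(assign_survey(blocks)), "questions for one respondent")
```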

Page 33: Software Analytics
Page 34: Software Analytics
Page 35: Software Analytics
Page 36: Software Analytics

• Of the questions with the most opposition, the top five are about the fear that respondents had of being ranked and rated.

Page 37: Software Analytics
Page 38: Software Analytics

The catalog of 145 questions is relevant for:
• Research:
  – the descriptive questions outline opportunities to collaborate with industry and
  – to influence its software development processes, practices, and tools.
• Practice:
  – the list of questions identifies particular data to collect and analyze to find answers,
  – as well as the need to build collection and analysis tools at industrial scale.
• Education:
  – the questions provide guidance on what analytical techniques to teach in courses for future data scientists,
  – as well as on topics of importance to industry (which students always appreciate).

Page 39: Software Analytics

• Conclusion
  – The researchers hope that this paper will inspire similar research projects.
  – To facilitate replication of this work for additional engineering disciplines and companies, they provide the full text of both surveys as well as the 145 questions in a technical report.
  – With the growing demand for data scientists, more research is needed to better understand how people make decisions in software projects and what data and tools they need.
  – There is also a need to increase the data literacy of future software engineers.
  – Lastly, we need to think more about the consumers of analyses, not just the producers (data scientists, empirical researchers).

Page 40: Software Analytics

• 6 established experts in SW analytics
• What is the most important aspect of this field?

Page 41: Software Analytics

• 1) SW analytics should go beyond developers.
• 2) Analytics should prove its relevance to practitioners.
• 3) Mere numbers aren’t enough.
• 4) 3 Questions for analytics.
• 5) Opportunities for natural SW analytics.
• 6) Assistance from Information Analysts.

Page 42: Software Analytics

• SW analytics should go beyond developers
  – SA currently focuses on helping individual developers with coding and bug-fixing decisions
    • by mining developer-oriented repositories such as version control systems and bug trackers.
  – SA needs to serve a project’s various stakeholders
    • marketing, sales, and support teams – not just developers.

Page 43: Software Analytics

• SW analytics should go beyond developers
  – Artifacts and knowledge exist across a project’s various facets.
  – The importance of a piece of code lies in its impact on user satisfaction and revenue.
    • Marketers -> field usage data.
    • Sales staff -> the inherent value that customers associate with each feature.

Page 44: Software Analytics

• Proving relevance to practitioners
  – Future -> layers of context are taken into consideration:
    • the domain of SW development
      – nonfunctional requirements, environments, tools, idioms, and so on.
    • the domain of the software itself
      – databases, applications, and so on.
    • the context of the overall software project
      – requirements, glossary, architecture, community, and so on.

Page 45: Software Analytics

• Proving relevance to practitioners
  – Software analytics has to prove its relevance by showing its cost-effectiveness versus the alternative, which is doing nothing.
    • Doing nothing can be amazingly efficient.
    • We need to evaluate these techniques with practitioners in mind.
    • More meaningful and less superficial software analytics.

Page 46: Software Analytics

• Mere numbers aren’t enough
  – Numbers and equations are important to capture relations in the data,
  – but for practical use they must be accompanied by interpretation and visualization.
  – It’s a transfer from the quantitative domain to the qualitative domain.
  – More research is needed on how to bring the message out of software analytics to those who make decisions based on it.

Page 47: Software Analytics

• 3 Questions for Analytics (see the sketch after this list):
  – 1) How much better is my model performing than a simple strategy, such as guessing?
  – 2) How practically significant are the results? (effect sizes)
  – 3) How sensitive are the results to small changes in one or more of the inputs? (uncertain data)
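
A minimal sketch of how these three checks might look in practice. It is illustrative only: the synthetic dataset, the choice of decision tree and majority-class baseline, and the 10% noise level are assumptions, not anything prescribed by the papers.

```python
# Illustrative sketch of the three analytics sanity checks on synthetic data:
# (1) compare the model against a trivial baseline, (2) report an effect size,
# (3) probe sensitivity by perturbing the inputs. Models and data are assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=10, weights=[0.8, 0.2],
                           random_state=0)

# (1) How much better than a simple strategy (always predict the majority class)?
baseline = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=10)
model = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print(f"baseline acc: {baseline.mean():.3f}  model acc: {model.mean():.3f}")

# (2) How practically significant is the difference? (Cohen's d effect size)
pooled_sd = np.sqrt((baseline.var(ddof=1) + model.var(ddof=1)) / 2)
print(f"Cohen's d: {(model.mean() - baseline.mean()) / pooled_sd:.2f}")

# (3) How sensitive is the result to small changes in the inputs?
rng = np.random.default_rng(0)
noisy = cross_val_score(DecisionTreeClassifier(random_state=0),
                        X + rng.normal(0, 0.1 * X.std(), X.shape), y, cv=10)
print(f"model acc with 10% input noise: {noisy.mean():.3f}")
```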

Page 48: Software Analytics

• Opportunities for natural SW analytics (see the sketch after this list)
  – using models from statistical natural language processing for a new kind of analytics.
  – What most people write and say, most of the time, is highly repetitive and predictable.
  – This regularity powers tools like Google Translate and Siri.
  – Code is no different:
    • most everyday code is simple and highly predictable.
  – Researchers have been able to adapt standard n-gram models from statistical NLP to code and train them on hundreds of millions of LOC.
  – Code is actually between 8 and 16 times more predictable than English.
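
A toy illustration of the idea, not the instrument used in the cited studies: a token-level bigram model with add-one smoothing, whose per-token cross-entropy is low when the code corpus is repetitive. Real studies train n-gram models on very large corpora and held-out code.

```python
# Toy token-level bigram model over source code with add-one (Laplace) smoothing.
# Lower cross-entropy (bits per token) means the corpus is more predictable.
# This only illustrates the "naturalness of code" idea, not the cited setup.
import math
import re
from collections import Counter, defaultdict

def tokenize(code):
    """Crude lexer: identifiers, numbers, and single punctuation characters."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", code)

def train_bigram(tokens):
    unigrams, bigrams = Counter(tokens), defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        bigrams[prev][cur] += 1
    return unigrams, bigrams

def cross_entropy(tokens, unigrams, bigrams):
    """Average negative log2 probability per token under the bigram model."""
    vocab = len(unigrams)
    bits = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob = (bigrams[prev][cur] + 1) / (sum(bigrams[prev].values()) + vocab)
        bits -= math.log2(prob)
    return bits / (len(tokens) - 1)

corpus = "for i in range(10):\n    total = total + i\n" * 50  # highly repetitive code
tokens = tokenize(corpus)
print(f"{cross_entropy(tokens, *train_bigram(tokens)):.2f} bits/token")
```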

Page 49: Software Analytics

• Wanted: Assistance from Information Analysts
  – Analogy with Mission: Impossible and the TV series 24:
    • field agents -> heroes -> developers
    • We shouldn’t neglect the information analysts (Chloe on 24).
    • Information analysts -> provide critical information
      – such as the backgrounds, strengths, and weaknesses of the people, places, and eventualities faced by the field agents.
    • Without the information analysts, it’s hard to imagine a successful mission.
    • Information analysts = the real heroes.

Page 50: Software Analytics

• Wanted: Assistance from Information Analysts
  – Developers have to figure out all the necessary information about what, where, and how to change the software by themselves.
  – We need to provide the services of information analysts to developers
    • and assist them in making the right decisions.
  – SW analytics can continually provide contextual information based on developers’ current tasks.
  – Decent information visualization and computer-human interaction technologies
    • can help present this information efficiently.

Page 51: Software Analytics

• The papers discuss:
  – Context!
  – Relevance for practitioners.
  – New ways of conducting SW analytics.
  – The importance of new studies.
  – The addition of an analytic professional to the software development team.

Page 52: Software Analytics

• Video:
  – https://www.youtube.com/watch?v=nO6X0azR0nw
  – IEEE Software editor in chief Forrest Shull speaks with Tim Menzies about the growing importance of software analytics. From IEEE Software’s July/August 2013 issue.

Page 53: Software Analytics

[1] Buse & Zimmermann: Analytics for Software Development (FoSER 2010).
[2] Zhang et al.: Software Analytics as a Learning Case in Practice: Approaches and Experiences (MALETS 2011).
[3] Begel & Zimmermann: Analyze This! 145 Questions for Data Scientists in Software Engineering (ICSE 2014).
[4] Hassan, Hindle, Runeson, Shepperd, Devanbu, & Kim: What’s Next in Software Analytics (IEEE Software 2013).