Measuring for Team Effectiveness
Mark Barber, Agile Coach @ MYOB (@mark_barbs)
Why measure anything?
Key takeaways
• Why is it important to measure?
• What can we learn from these metrics?
• How do we use the data responsibly?
“When a measure becomes a target, it ceases to be a good measure” – Goodhart’s Law
“Measures tend to be corrupted/gamed when used for target-setting” - Campbell’s Law
“Monitoring a metric may subtly influence people to maximize that measure” – The Observer Effect
(via https://www.industriallogic.com/blog/what-should-we-measure/)
Measure for OUTCOMES not output.
Focus on being EFFECTIVE not efficient.
Metrics help us to answer…
• Are we building the RIGHT THING?
• Are we building the THING RIGHT?
• Are we building it in a SUSTAINABLE WAY?
Building the RIGHT THING (a.k.a. PIRATE metrics)
Acquisition
• WHAT: # new users in a given period
• WHY: Upward trends are indicators of positive user engagement
• HOW: Visitor data
Activation
• WHAT: A measure of how users are engaging with your product for the first time
• WHY: Focus on making our product more engaging to new users and converting to loyal users
• HOW: Track multiple page visits, new account sign ups
Retention
• WHAT: A measure of how often and for how long our customers are coming back
• WHY: Another measure of even higher engagement
• HOW: Repeat visits, length of session
Referral
• WHAT: Measuring how many customers come to us from existing customer referral
• WHY: People will generally only refer a product that they find valuable and/or love, so referrals are a good indicator that we are building a high value product
• HOW: Dependent on referral mechanism (for example, surveys, email clickthroughs)
Revenue
• WHAT: The amount of revenue that can directly be attributed to users of your product
• WHY: If our products are successful, they should generate revenue.
• HOW: Track all revenue generating activities, such as products purchases, licences purchased
(Note, revenue may be replaced by “some benefit” if you are a team that builds internal products)
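As a sketch of how these funnel stages might be derived in practice, the snippet below counts distinct users reaching each stage of a simple acquisition → activation → revenue funnel. The event names and log format are invented for illustration; real products would pull this from their analytics pipeline.

```python
from collections import Counter

# Hypothetical event log as (user_id, event) pairs — illustrative data only
events = [
    ("u1", "visit"), ("u1", "signup"), ("u1", "purchase"),
    ("u2", "visit"), ("u2", "signup"),
    ("u3", "visit"),
    ("u1", "visit"),  # a repeat visit: a retention signal, not a new user
]

def funnel_counts(events):
    """Count distinct users reaching each funnel stage."""
    stage_users = {"visit": set(), "signup": set(), "purchase": set()}
    for user, event in events:
        if event in stage_users:
            stage_users[event].add(user)  # sets dedupe repeat events per user
    return {stage: len(users) for stage, users in stage_users.items()}

counts = funnel_counts(events)
# Conversion between adjacent stages, e.g. activation rate (signups per visitor)
activation_rate = counts["signup"] / counts["visit"]
```

Trending these counts and conversion rates over periods, rather than looking at one snapshot, is what makes them useful as engagement indicators.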
Building the THING RIGHT
Themes
• Flow
• Eliminating waste
• Progress
Building the thing right – FLOW
Why is flow important?
• Predictability
• Fewer bottlenecks
• Frequent delivery of value to the customer
• Easier to manage capacity
How can we measure flow?
• Cycle and lead time
• Queue sizes
• Takt time
• Throughput
• Cumulative flow
• Process control charts
Process Control Charts
• WHAT: Measuring and visualising the uniformity of your cycle time.
• WHY: Predictability, drive conversations around outliers (mura).
• HOW: Use process control charts from your cycle time data.
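A minimal sketch of the calculation behind such a chart, assuming a simple 3-sigma control band over individual cycle times (the data below is made up, and real control charts often use moving-range-based limits instead):

```python
from statistics import mean, stdev

# Cycle times in days for recently completed stories — illustrative data
cycle_times = [3, 4, 2, 5, 3, 4, 12, 3, 4, 2]

centre = mean(cycle_times)
sigma = stdev(cycle_times)
upper_limit = centre + 3 * sigma          # common "3-sigma" control limit
lower_limit = max(0.0, centre - 3 * sigma)  # cycle time can't go below zero

# Points beyond the limits are the outliers (mura) worth a conversation —
# talking points about variation in the process, not blame
outliers = [t for t in cycle_times if t > upper_limit or t < lower_limit]
```

Note that a single large outlier also inflates sigma, which is one reason practitioners often prefer moving-range (XmR) limits for individual values.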
Cumulative Flow
• WHAT: Number of stories in each lane, over time.
• WHY: It highlights when there are bottlenecks in the process.
• HOW: Count cards in each step daily.
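The daily counting step can be sketched as below. The board snapshots and lane names are invented for illustration; the per-day, per-lane counts are the raw data a cumulative flow diagram is drawn from.

```python
from collections import Counter

# Hypothetical daily board snapshots: card -> lane (illustrative data)
snapshots = {
    "Mon": {"A": "todo", "B": "todo", "C": "doing"},
    "Tue": {"A": "doing", "B": "todo", "C": "done"},
    "Wed": {"A": "done", "B": "doing", "C": "done"},
}

def daily_lane_counts(snapshots):
    """Count cards per lane for each day — the raw data behind a CFD."""
    return {day: Counter(board.values()) for day, board in snapshots.items()}

counts = daily_lane_counts(snapshots)
# A lane whose band keeps widening day over day is a likely bottleneck
```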
From paulklipp.com (Getting started with kanban)
Building the thing right – ELIMINATING WASTE
What is waste?
• Any activity leading to a suboptimal system
• An activity that does not add to the value stream
• 3 Lean wastes - muda, mura and muri
How can we measure waste?
• Value stream mapping
• Cycle time throughout the value stream, especially wait times
• Failure demand
Failure Demand
• WHAT: Identify and visualise the time spent on failure demand work (bug fixing, production support issue, unnecessary rework).
• WHY: Identify areas of improvement by highlighting areas of concern.
• HOW: Simple timers when working on certain activities.
"You could have a timer go off a dozen times per day and record whether you were a) wasting time, b) working on new stuff, or c) fixing stuff that should have been done correctly the first time. If the team does that for a week, a picture will start to emerge. After a month, you'll know how much of your capacity is spent on failure demand”
Kent Beck
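A week of timer samples like the ones Beck describes can be tallied in a few lines. The category names and sample data here are illustrative, matching the three categories in the quote:

```python
from collections import Counter

# What a team member recorded each time an (imaginary) timer went off:
# "waste", "new" (new stuff), or "failure" (fixing stuff that should have
# been done correctly the first time)
samples = ["new", "new", "failure", "waste", "new", "failure",
           "new", "failure", "new", "new"]

def failure_demand_share(samples):
    """Fraction of sampled capacity spent on failure demand."""
    tally = Counter(samples)
    return tally["failure"] / len(samples)

share = failure_demand_share(samples)
```

With this toy data, 3 of 10 samples are failure demand, i.e. 30% of capacity; over a month of real samples the picture becomes much more reliable.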
Building the thing right – PROGRESS
A better way to measure progress…
• Visualising in a post burn up/down world
• Providing the “full picture” to all stakeholders
• Release confidence, risk management, deployment cadence
Release Confidence
• WHAT: A quantifiable measure of how confident the team is of releasing the next version of a product within defined time frames.
• WHY: It replaces the concept of a deadline with a conversation about how confident we are as a team that we will be ready to release. It drives conversations when dissonance exists: what does person A know that makes them far less confident than person B?
• HOW: Vote on confidence based on bands (e.g. 0 - 4 weeks, 4 - 8 weeks etc). Visualise reasons for change, such as increased understanding leading to broader scope.
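One way to tally such a vote, with band labels mirroring the "0 - 4 weeks, 4 - 8 weeks" example (the votes themselves are made up):

```python
from collections import Counter

# Each team member votes for the band they believe the release lands in
votes = ["0-4w", "0-4w", "4-8w", "0-4w", "8-12w"]

def confidence_summary(votes):
    """Return the most popular band, its vote count, and the spread."""
    tally = Counter(votes)
    most_common_band, count = tally.most_common(1)[0]
    spread = len(tally)  # more distinct bands voted for = more dissonance
    return most_common_band, count, spread

band, count, spread = confidence_summary(votes)
# spread > 1 signals dissonance: ask what the 8-12w voter knows that the
# 0-4w voters don't, rather than averaging the disagreement away
```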
Visualising Release Confidence
Risk Radiators
• WHAT: A collaborative approach to triaging, assessing and mitigating risks combined with metrics to watch for trends.
• WHY: Managing risks early is extremely important. Upward trends in high priority risks may be a sign to reset.
• HOW: Visualise risks on the wall, identify risks early (e.g. during inception) and review regularly. Identify severity and likelihood. Track mitigation efforts. Visualise trends.
Building in a SUSTAINABLE WAY
Themes
• Team Health
• Quality
Building in a sustainable way – TEAM HEALTH
What do we mean by TEAM HEALTH?
• How engaged is the team with the work and with each other?
• Are we creating a happy and safe environment?
• Are there underlying issues we are not addressing?
• Do we have time for learning and continuous improvement? Are we managing “muri”?
How can we measure team health?
• Health checks
• Safety and happiness indicators
• Feedback health
• Slack time
• Continuous improvement actions
Team Health Check
• WHAT: A more detailed analysis of factors that contribute to the overall health of the team.
• WHY: We can target specific areas for kaizen and track how we are improving.
• HOW: Spotify health check, for example. Adapt based on the team’s needs. Remember it’s about the data AND the conversations.
Slack Time
• WHAT: Identify and visualise the time spent on learning and improving.
• WHY: Slack time is essential for continuous improvement and innovation.
• HOW: Track activities. Team surveys.
Building in a sustainable way – QUALITY
Driving a focus on quality
• Lack of quality increases failure demand
• How confident are we in our code and systems?
• How robust are our incident response processes?
Measuring our confidence
• System health checks
• System confidence
• Test coverage
System Health
• WHAT: Measurements that indicate the health of our systems in production, such as load times.
• WHY: Monitoring system health allows us to be proactive in ensuring a good user experience. For example, if we see page load times trending up, we can assess and fix before users contact us.
• HOW: Use automated tools such as New Relic, but most importantly, visualise these metrics in your team area.
System Confidence
• WHAT: A shared understanding, often subjective but always collaborative, of how confident we are in our ability to maintain and build new features upon our systems.
• WHY: It is a tool to help us prioritise work and identify areas of concern that we may need to address (such as an increase in technical debt).
• HOW: Any form of health check on system confidence would be sufficient.
Measuring our incident responses
• Number of production incidents
• How long it takes us to detect an incident
• How long it takes us to resolve an incident
Mean Time to Detect (MTTD)
• WHAT: The mean time taken by the team to detect a production incident.
• WHY: The less time it takes to detect a production incident, the less likely it is to severely impact users, revenue and/or business reputation, because resolution can begin sooner. Always run Post-Incident Reviews with the aim of generating actions to lower MTTD.
• HOW: The best time to gather this information is during a post-incident review by doing a timeline exercise. Keep track of the time taken to detect and note that it may vary depending on incident type.
Mean Time to Resolve (MTTR)
• WHAT: The mean time taken by the team to resolve a production incident.
• WHY: Quicker resolution times can usually reduce the impact an incident may have, so track this with the aim of trending downward. Look for outliers and call these out in retrospectives.
• HOW: Gather this information during post-incident reviews by doing a timeline exercise.
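Both means fall out of the same timeline data gathered in post-incident reviews. A sketch, with invented timestamps; note that MTTR is measured here from detection to resolution, while some teams measure it from incident start instead:

```python
from datetime import datetime

# (started, detected, resolved) timestamps per incident — illustrative data
incidents = [
    (datetime(2023, 1, 1, 10, 0), datetime(2023, 1, 1, 10, 30), datetime(2023, 1, 1, 12, 0)),
    (datetime(2023, 1, 5, 9, 0),  datetime(2023, 1, 5, 9, 10),  datetime(2023, 1, 5, 9, 40)),
]

def mean_minutes(pairs):
    """Mean elapsed minutes across a list of (earlier, later) timestamp pairs."""
    deltas = [(later - earlier).total_seconds() / 60 for earlier, later in pairs]
    return sum(deltas) / len(deltas)

mttd = mean_minutes([(start, detect) for start, detect, _ in incidents])
mttr = mean_minutes([(detect, resolve) for _, detect, resolve in incidents])
```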
(APPENDIX) Measuring team effectiveness – more examples…
Cycle Time
• WHAT: How long a “unit of work” takes to move through a process step. In most cases, we are referring to the number of days a user story takes to go from “dev start” to done.
• WHY: Reduce cycle time to shorten the feedback loop.
• HOW: Dot the card (this has multiple benefits) and record the number of days when done.
Cycle Time (rolling average)
• WHAT: Average cycle time over a shorter (rolling) period.
• WHY: Long term, overall average can “smooth” out trends and hide areas of concern.
• HOW: Calculate average CT only for the last x-weeks.
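The smoothing effect is easy to see with a small example (the cycle-time figures are made up): the overall average hides a recent jump that the rolling average exposes.

```python
def rolling_average(values, window):
    """Average over the last `window` entries at each point in time."""
    return [sum(values[max(0, i - window + 1): i + 1]) / min(i + 1, window)
            for i in range(len(values))]

# Weekly average cycle times in days — cycle time doubled halfway through
weekly_ct = [4.0, 4.0, 4.0, 8.0, 8.0, 8.0]

overall = sum(weekly_ct) / len(weekly_ct)   # smooths the jump away
last_3 = rolling_average(weekly_ct, 3)[-1]  # recent trend stays visible
```

Here the overall average is 6.0 days, while the 3-week rolling average ends at 8.0 days, flagging the recent slowdown.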
Takt Time
• WHAT: The time between starting new work.
• WHY: Another way of measuring predictability and flow and highlighting issues using trends.
• HOW: Record dates on which new work is started. This can be done at various levels of granularity (for example, user stories, epics, features, initiatives)
Throughput
• WHAT: Number of stories completed in a given time period.
• WHY: Helps us plan by measuring consistent flow / rate of work.
• HOW: Date a card when completed and derive the count for a period, e.g. stories per month.
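Deriving the monthly count from dated cards is a one-liner over the completion dates (the dates below are illustrative):

```python
from collections import Counter
from datetime import date

# Completion dates written on cards when they reach "done" — illustrative data
completed = [date(2023, 1, 5), date(2023, 1, 20), date(2023, 2, 3),
             date(2023, 2, 10), date(2023, 2, 24)]

def throughput_per_month(completed):
    """Count completed stories per (year, month) bucket."""
    return Counter((d.year, d.month) for d in completed)

monthly = throughput_per_month(completed)
```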
Work-In-Progress Limits
• WHAT: Limit the number of stories in each step (lane) and track when and why limits are exceeded.
• WHY: Improve flow and reduce context switching.
• HOW: Use cumulative flow diagrams to inform WIP limits.
Lead Time
• WHAT: Amount of time it takes a unit of work to go through the entire value stream.
• WHY: Gives us an insight into our value stream and can highlight areas of waste. We can watch for issues such as long queues or slow deployment processes.
• HOW: Track time from idea to market, for example, date the card when you write it and add it to the backlog and date it when it is in production.
Queue Size
• WHAT: The amount of work sitting in queues, such as backlogs.
• WHY: Drives prioritisation conversations. Aim to minimise the size of queues (both quantity of work and the length of time it takes to pass through).
• HOW: Count cards. Date when they are added to backlog.
Time Between Production Deployments
• WHAT: Time between deployments to production.
• WHY: Working software is the primary measure of progress, so we want to see it put in front of users as often as possible. Is it frequent enough for us? If time is trending upward, is it a symptom of batch sizes being too large?
• HOW: Track the date (and time) between production deployments and visualise means and trends.
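Given a list of deployment timestamps, the gaps and their mean fall out directly (the dates are made up for illustration):

```python
from datetime import datetime

# Production deployment timestamps — illustrative data
deploys = [datetime(2023, 3, 1), datetime(2023, 3, 3), datetime(2023, 3, 10)]

# Gaps in days between consecutive deployments
gaps_days = [(b - a).days for a, b in zip(deploys, deploys[1:])]
mean_gap = sum(gaps_days) / len(gaps_days)
# An upward trend in the gaps may be a symptom of batch sizes growing too large
```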
Team Happiness Indicator
• WHAT: A simple indicator of how happy people are in the team.
• WHY: We want the team to be happy as it helps us do our best work. Are people in the team enjoying what they do? Are they being fulfilled? Do they have any unmet needs?
• HOW: Simple traffic light, ad-hoc or at regular intervals, anonymous or otherwise (depending on safety).
Team Safety Indicator
• WHAT: A simple indicator of how safe people feel to speak openly within the team, to speak their minds and voice concerns.
• WHY: If people do not feel safe to speak their minds we will struggle to improve as a team, and people will feel unhappy and stressed.
• HOW: Team safety check (1 - 5). This could be done as part of retrospectives.
Feedback Health
• WHAT: Measuring how often people in the team are giving and receiving constructive feedback.
• WHY: A healthy feedback culture is integral to an environment of continuous improvement.
• HOW: Feedback matrix.
Retrospective Actions
• WHAT: Tracking the number of retro actions completed as a percentage of retro actions raised.
• WHY: Measure the effectiveness of retrospectives (large, unactionable results from retros will lead to disengagement and loss of continuous improvement opportunities).
• HOW: Tracking retro actions raised and completed (e.g. retro action kanban).
Note, this is not highly useful on its own, but could be a leading indicator for team health problems.
Test Coverage
• WHAT: A measure of the degree to which your code base is covered by tests, usually as a percentage of the total code.
• WHY: Testing allows your team to make changes to code with a higher level of confidence. Test coverage will tell you how much of your code is tested (or untested) but won’t tell you about the quality of these tests.
• HOW: Many automated tools will calculate code coverage. Often teams will agree on a minimum level.
Number of Production Incidents
• WHAT: A count of production incidents (e.g. bugs, security incidents, outages).
• WHY: An upward trend in the number of production incidents is often a trailing indicator of potential quality issues in our systems. We should be aiming to always trend down.
• HOW: Keep track of production incidents (date occurred) and visualise trends/counts/days since last incident.
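The tracked dates support both visualisations mentioned above: the trend in counts and the "days since last incident" figure. A sketch with invented dates, and `today` pinned so the example is deterministic:

```python
from collections import Counter
from datetime import date

# Dates on which production incidents occurred — illustrative data
incident_dates = [date(2023, 1, 3), date(2023, 2, 14), date(2023, 2, 20)]
today = date(2023, 3, 1)  # fixed here; a real board would use date.today()

days_since_last = (today - max(incident_dates)).days
monthly_counts = Counter((d.year, d.month) for d in incident_dates)
# A rising monthly count is the trailing indicator of quality issues to watch
```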