
Scrum Metrics and Myths

Page history last edited by Pete Behrens 11 years, 6 months ago


  • Facilitated by Pete Behrens
  • About 30 people from the Gathering 


Topic Overview

This session evaluated how metrics are used and misused in measuring Scrum teams in terms of productivity, quality, predictability and value. It was run as a facilitated brainstorming discussion in which each person in the group was free to write down metrics they had found effective or ineffective at driving team performance in some dimension. A short discussion followed each item presented, and sometimes moved the item from a good metric to a poor one.



Overall, we drew on Robert Austin's Measuring and Managing Performance in Organizations. His thesis work indicated that any measurement can lead to dysfunctional behavior depending on how it is used. Thus, there are not necessarily good or bad metrics, but rather good and bad uses of those metrics. For the most part, items listed below as metrics or myths may be reversed (i.e. metrics become myths, or myths become metrics, based on how they are used).

Following the brainstorming and discussion session, we provided an opportunity for each person to vote on the metrics and myths that most resonated with them - essentially a popularity contest to determine the best metrics.

Below is a list of the four dimensions into which we categorized our metrics: Productivity, Quality, Predictability and Value. Overall, most metric challenges (i.e. myths) surfaced in the productivity measurements. Quality was second, followed by only a couple of myths discovered in Predictability and Value. In each dimension, we list the metrics that were brainstormed and discussed, followed by the votes in parentheses. While they are put in the metrics and myths buckets based on their general use in Scrum, they could be moved based on poor or appropriate use.

A couple of additional general thoughts for managers: First, thinking that developers can just work faster without impacting quality is a misconception. Teams need to determine ways of increasing efficiency and quality - just working faster or harder is not a meaningful goal. Second, thinking that with metrics the leader or team doesn't have to think anymore is also a misconception. Metrics should not replace thinking; they should be inputs into thinking about how to improve as a team.


Metrics Identified and Discussed

Productivity Metrics

  1. In Sprint Cycle Time (4) - measuring the turnover of stories throughout the sprint as measured by when a story was started vs. when it was completed. A shorter cycle time is a good indicator. A longer cycle time is a poor indicator.
  2. In Sprint Work in Process (3) - measuring the amount of work in process throughout a sprint, which reflects the flow of the team. More work in process is a poor indicator. Less work in process is a good indicator. This assumes the lean principle that less work in process leads to higher throughput.
  3. Sprint Completion Bar - measuring sprint points using a stacked bar with four elements, each presented in a separate color: Done (green), Not Tested (yellow), Not Coded (orange), and Waste (red). The stacked bars show work sprint over sprint. A longer Done segment is a good indicator. Longer Not Tested or Not Coded segments are poor indicators. Any waste (defined as work that was subsequently removed because it was not needed) is a very poor indicator.
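
As a rough illustration of the first two metrics above, here is a minimal Python sketch. The story records, dates, and function names are hypothetical and were not part of the session; it assumes cycle time is counted in calendar days from start to completion, and that work in process on a given day is the number of stories started but not yet completed.

```python
from datetime import date

# Hypothetical story records: (story id, day started, day completed or None).
stories = [
    ("S1", date(2024, 1, 1), date(2024, 1, 3)),
    ("S2", date(2024, 1, 2), date(2024, 1, 8)),
    ("S3", date(2024, 1, 4), None),  # still in process
]

def average_cycle_time_days(stories):
    """Mean days from start to completion, over completed stories only."""
    durations = [
        (done - started).days
        for _, started, done in stories
        if done is not None
    ]
    return sum(durations) / len(durations) if durations else None

def work_in_process(stories, on_day):
    """Count stories started on or before on_day and not completed by on_day."""
    return sum(
        1
        for _, started, done in stories
        if started <= on_day and (done is None or done > on_day)
    )

print(average_cycle_time_days(stories))              # 4.0
print(work_in_process(stories, date(2024, 1, 5)))    # 2
```

Tracking both numbers sprint over sprint, rather than judging a single value, is closer to how the group discussed using them.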

Productivity Myths

  1. Completed vs. Committed % (a.k.a. Earned Value) - measuring the completed work vs. the original sprint plan commitment, expressed as a percentage. It is often used to say that if the result is above 80% the team succeeded in the sprint, and below that the team failed. This is often misused because, if pushed, teams will tend either to be safe in their commitment (e.g. lower productivity) or to hide unfinished work. This metric can be used effectively if the term "failure" is removed and the team is not pushed to meet a certain percentage.
  2. Velocity - measured by number of points completed per sprint. If used as a productivity measure, it may initially drive the team to increase productivity, but if pushed, will eventually lead to story point inflation. Avoid using velocity to compare different teams. Rather, velocity should be considered as a predictability measurement.
  3. Individual Performance Reviews - Most companies use performance reviews to measure individuals for compensation. These tend to drive individual (anti-team) behaviors that work against team productivity.
  4. Resource Utilization - measuring an individual's percentage busy-ness factor. Teams that are pushed to utilize each resource fully tend to locally optimize their teams - which means their whole system is sub-optimized. This is a lean principle: trying to increase team throughput requires sub-optimizing individuals.
  5. Business Value measuring performance - Attempting to measure productivity through delivery of business value counters a typical prioritized backlog approach. If properly prioritized, higher valued stories will be at the top of the product backlog and thus early sprints will deliver more value. If value is used to measure productivity - it will lead to dysfunction when lower valued, but higher complexity stories enter the backlog. Rather, this metric should be used to validate the prioritization of the backlog - not performance of the team.
  6. Source Lines of Code (SLOC) - measured by the number of lines of code to determine productivity. A higher number does not necessarily translate to higher productivity - only more code. Using this as a measurement goes against the principles of "simplest thing possible" and refactoring to keep code simple. A productive story may actually reduce code, not increase it.
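
To make the Completed vs. Committed % item concrete, here is a minimal Python sketch. The function name and the sprint numbers are hypothetical. In line with the discussion above, it reports the percentage only and deliberately applies no pass/fail threshold.

```python
def completion_percentage(completed_points, committed_points):
    """Completed points as a percentage of the original sprint commitment."""
    if committed_points <= 0:
        raise ValueError("committed_points must be positive")
    return 100.0 * completed_points / committed_points

# Hypothetical history: (completed points, committed points) per sprint.
sprint_history = [(21, 30), (27, 30), (30, 30)]

for number, (completed, committed) in enumerate(sprint_history, start=1):
    pct = completion_percentage(completed, committed)
    # Report the trend only; no "success" or "failure" label is attached.
    print(f"Sprint {number}: {pct:.0f}% of commitment completed")
```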


Quality Metrics

  1. Technical Debt Points (12) - measuring the volume and throughput of technical debt to determine the quality evolution of a product. A higher number of points is a poor indicator. A lower number of points is a good indicator. This can be used with a technical debt limit - if the debt exceeds a certain value, the team will be put into a forced debt reduction format. This can also be used in a stacked bar sprint over sprint showing technical debt vs. new stories.
  2. Running Automated Tests (4) - measuring unit and functional automated tests that are passing each sprint. A higher (and growing) number is a good indicator. A lower (or flattening) number is a poor indicator. This metric was detailed by Ron Jeffries. Paired with code coverage metrics, it can drive both productivity and quality behaviors.
  3. Post Sprint Defect Arrival (3) - measuring the number of defects that are found after the sprint in which they were initially developed. A higher number is a poor indicator, a lower number is a good indicator. This is a leading indicator of quality in that it attempts to predict released quality based on incremental sprint quality. Reminder - this is a team metric - not a QA only metric.
  4. Post Release Defect Arrival (1) - measuring the number of defects found after release to customers. A higher number is a poor indicator, a lower number is a good indicator. This is a lagging indicator of quality meaning that quality is not measured until after release. Reminder - this is a team metric - not a QA only metric.
  5. Root Causes Fixed - measuring the number of defects for which a root cause was identified and fixed. A higher number is a good indicator, a lower number is a poor indicator. This works well in a sustaining team environment to drive fixing core issues, not just the symptoms.
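
The technical debt limit mentioned under Technical Debt Points can be sketched as a simple check. This is only an illustration: the limit of 30 points and the per-sprint values are made-up numbers, and the session did not prescribe how a team should set its limit.

```python
def needs_debt_reduction(debt_points, debt_limit):
    """True when outstanding technical debt exceeds the agreed limit."""
    return debt_points > debt_limit

# Hypothetical outstanding debt points at the end of each sprint.
debt_by_sprint = [18, 24, 31, 22]
DEBT_LIMIT = 30  # assumed team-chosen limit, not a recommended value

flags = [needs_debt_reduction(points, DEBT_LIMIT) for points in debt_by_sprint]
print(flags)  # [False, False, True, False]
```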


Quality Myths

  1. Number of Test Cycles (1) - measured by how many times a story or sprint iterates between development and test. As presented, a lower number was considered a good indicator. This metric, however, could lead to fewer feedback loops.
  2. QA Story Points - measuring stories in separate engineering and quality points. This allows QA teams to better predict their productivity and throughput. The concern is that this separates the engineering and quality teams, going against the Scrum principle of cross-functional teams. It also limits the engineering/test conversation, which is critical to effective Scrum teams.
  3. Quality Number of Defects - measuring Quality and Quality productivity by the number of defects they find. A higher number is thought to increase quality and measure productivity, however it goes against the Scrum team principle of whole team. It is better to encourage both QA and developers to work towards story completion - not
  4. Quality Checked After Development - not really a metric, but not a good thing to do.


Predictability Metrics

  1. Enhanced Burndown Chart (12) - measuring both velocity and scope change throughout a release as a measure of meeting a release goal - predictably. This metric can be combined with cost metrics (below) to measure predictability in project costs.
  2. Velocity (4) - essentially a subset of the metric above.
  3. Cost per Story Point (2) - measuring the $ per story point by knowing the number of people on the team, the sprint length and the story point velocity to determine the average cost per story point. This can also be used to measure capitalized vs. non-capitalized expenses in IT firms by separating the story points which are capitalized and non-capitalized on projects. It is recommended to use an average for both costs and velocity, taken over a number of sprints (e.g. 3).
  4. Hours per Story Point - measuring the average number of hours estimated per story point. Used to determine if the estimated labor to implement story points is changing over time. For example, a team that starts out at 10 hours per point on average and migrates to 20 hours per point shows that the point value is decreasing over time. This metric can easily be misused, either by the team estimating points using the reverse of this metric, or by leaders pushing teams to decrease their estimated hours per point (assuming they are making them more efficient). This should only be used to evaluate predictability. Other methods are to use a "reference" story, or to create an estimated labor hour range per story point rather than a specific number.
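
The arithmetic behind Cost per Story Point and Hours per Story Point can be sketched as follows. The weekly cost per person is an assumed input (the notes above mention only team size, sprint length and velocity), and all numbers and function names are hypothetical.

```python
def cost_per_story_point(team_size, sprint_weeks, weekly_cost_per_person, velocities):
    """Average cost per point, using velocity averaged over several sprints."""
    if not velocities:
        raise ValueError("at least one sprint velocity is required")
    sprint_cost = team_size * sprint_weeks * weekly_cost_per_person
    average_velocity = sum(velocities) / len(velocities)
    return sprint_cost / average_velocity

def hours_per_story_point(estimated_hours, story_points):
    """Average estimated labor hours per story point for one sprint."""
    return estimated_hours / story_points

# 7 people, 2-week sprints, an assumed $2,000 per person-week,
# velocity averaged over the last 3 sprints.
print(cost_per_story_point(7, 2, 2000, [26, 30, 28]))  # 1000.0

# Watch the trend over time, not any single value.
print(hours_per_story_point(280, 28))  # 10.0
print(hours_per_story_point(560, 28))  # 20.0
```

As noted above, these numbers are meant for evaluating predictability and cost, not for pushing a team toward a target.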


Value Metrics

  1. Business Value Delivered (7) - measuring the value delivered each sprint based on assigning a business value to each item in the backlog. A higher number is a good indicator. This can also be an employee motivator, because teams want to deliver value and feel good about being able to. As mentioned above, this can be dysfunctional if teams attempt to sustain business value over the course of multiple sprints, because if the backlog is prioritized correctly, business value per sprint should decline over the course of the release.
  2. Customer Satisfaction Survey (6) - measuring feedback quantitatively and/or qualitatively from customers and other stakeholders through a regular (quarterly) survey. Various feedback can be gathered on quality, predictability, productivity of delivery, support, appropriateness of new features, etc.
  3. Employee Satisfaction Survey (3) - measuring the qualitative and quantitative feedback internally from employees with a regular survey (quarterly) and tracking the results over time. This can measure effectiveness at roles, quality of work/life balance, teamwork, product definition, process adherence, feedback, happiness value, etc. A happy workforce tends to increase productivity due to people wanting to be at work and contribute to team results.
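
Business Value Delivered can be tallied per sprint with a short Python sketch. The backlog items and their values are invented for illustration. With a well-prioritized backlog, the totals would be expected to fall over the release, as the example data shows.

```python
# Hypothetical completed backlog items: (sprint number, assigned business value).
completed_items = [(1, 40), (1, 30), (2, 25), (2, 20), (3, 15)]

def value_delivered_per_sprint(items):
    """Sum the assigned business value of completed items for each sprint."""
    totals = {}
    for sprint, value in items:
        totals[sprint] = totals.get(sprint, 0) + value
    return totals

print(value_delivered_per_sprint(completed_items))  # {1: 70, 2: 45, 3: 15}
```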

