Do You Assume Correlation?

One of the core underpinnings of the practice of estimating effort is the assumption that effort strongly correlates with time. But does it? That’s a question that I always ask teams who are grappling with the concept of NoEstimates.

Mattias Skarin, in his helpful book Real-World Kanban, shared his own research about the correlation of upfront estimates and delivery times. Here’s what he writes:

The interesting question here is how the actual delivery times correlate to developers’ upfront estimates. To help answer this, we had developers estimate sizes using the following buckets: small (two to three days), medium (one to three weeks), and large (longer than one month). We then correlated the initial sizing estimates with lead-time output. Take a look. Is the initial sizing a good predictor of when you can expect to get your stuff? In our case, the surprising truth was a resounding “no!”

He includes this supporting chart:

[Chart: estimation vs. lead time]

I’ve since replicated his research with my own teams and found the same results: Weak correlation between upfront estimates and delivery times. One “high-performing” team even generated a negative correlation between their feature estimates and actual delivery times. Think about that for a second: The smaller they estimated their features to be, the longer it took them to deliver, and vice versa! And again, this wasn’t a bad team; quite the contrary.

A note about definitions: I use Delivery Time to refer to the time it takes from commitment date to delivery date (however that is defined). A couple of important components are:

  • The time you estimate must cover the same span as the delivery time you measure. For instance, if you base your upfront estimates on the time from when a developer starts on a story to when it is available in a QA environment, that needs to be understood as the time you’re trying to estimate.
  • However, since no one requesting software really cares how long it takes to go from Dev to QA, the best understanding of delivery time is from commitment to production. Most business people want to know (or at least should) when something will be “done,” judged by whether it’s in production.

With those considerations in mind, I invite you to do your own correlation analysis. It doesn’t require you to change anything about your process; you simply observe and record what you’re already doing: the commitment and delivery dates, along with the upfront estimates. Feel free to add to my public data set. Then you can see for yourself whether the assumption of correlation holds true for your team. If not, you may want to consider a different approach!
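If you'd like to run the numbers yourself, here is a minimal sketch of one way to do it. It assumes you have recorded an estimate bucket, a commitment date, and a delivery date for each item. The sample rows, the bucket-to-number mapping, and every function name are hypothetical illustrations, not data or tooling from Skarin's book or from my data set. Because the buckets are ordered categories rather than measurements, the sketch uses a rank (Spearman) correlation, which is my choice for the example and not the only reasonable one.

```python
from datetime import date
from statistics import mean

# Hypothetical records: (estimate bucket, commitment date, delivery date).
# These rows are invented for illustration only.
records = [
    ("small", date(2018, 3, 1), date(2018, 3, 20)),
    ("small", date(2018, 3, 5), date(2018, 3, 8)),
    ("medium", date(2018, 3, 6), date(2018, 3, 15)),
    ("medium", date(2018, 3, 12), date(2018, 4, 20)),
    ("large", date(2018, 3, 14), date(2018, 4, 2)),
    ("large", date(2018, 3, 19), date(2018, 5, 1)),
]

# Assumed ordinal encoding of the size buckets.
BUCKET_ORDER = {"small": 1, "medium": 2, "large": 3}


def delivery_days(committed, delivered):
    """Delivery time in calendar days, from commitment to delivery."""
    return (delivered - committed).days


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def estimate_vs_delivery_correlation(rows):
    """Correlate estimate buckets with observed delivery times."""
    sizes = [BUCKET_ORDER[bucket] for bucket, _, _ in rows]
    days = [delivery_days(committed, delivered) for _, committed, delivered in rows]
    return spearman(sizes, days)


print(round(estimate_vs_delivery_correlation(records), 2))
```

A value near 1 would mean bigger estimates reliably go with longer delivery times, a value near 0 would mean the estimate tells you little, and a negative value would mean the pattern my "high-performing" team showed. With only a handful of items the number is noisy, so collect a reasonable sample before drawing conclusions.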

What We’re Learning from the NoEstimates Game

NoEstimates workshop at the 2018 LeanAgileUS conference

Having facilitated the NoEstimates game for more than a year, in many places around the world with differing groups (most recently at the outstanding LeanAgileUS conference), I’ve observed some patterns for success. Though these “winning strategies” may at first appear useful only if you want to play the board game, I believe that they likely translate into real-world success in intangible-goods (e.g., software) delivery processes.

(Spoiler alert: If you haven’t played the game yet but plan to, you may not want to read the following — unless, of course, you want to cheat your way to victory!)

To remind you of some context: The game is a simulation of a group of interdependent work teams, each with an identical backlog of 25 work items. The teams play in simulated days, and, depending on how long the session is, usually play between 15 and 30 days. Teams earn “money” based on how much value they deliver, according to the following table:

Delivery Time    Value
1-2 days         $700
3-5 days         $400
6+ days          $300
Urgent           -$100 per day
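
For anyone who wants to tally scores in a spreadsheet or script, the payout table can be read as a small lookup. The sketch below is only an illustration: the function names are hypothetical, and it assumes the urgent charge is a separate deduction of $100 for each day an urgent item is in progress. The exact way the game combines the two is an assumption here.

```python
URGENT_COST_PER_DAY = 100  # from the table: urgent items cost $100 per day


def delivery_value(delivery_days):
    """Dollar value of one delivered item, based on its delivery time in days."""
    if delivery_days < 1:
        raise ValueError("delivery time must be at least one day")
    if delivery_days <= 2:
        return 700
    if delivery_days <= 5:
        return 400
    return 300


def urgent_cost(days_in_progress):
    """Total charge for one urgent item (assumed: $100 for each day in progress)."""
    return URGENT_COST_PER_DAY * days_in_progress


def value_per_day(delivery_times, urgent_days_in_progress, days_played):
    """Net money earned per simulated day.

    delivery_times: delivery time in days for each delivered item.
    urgent_days_in_progress: days in progress for each urgent item (assumed
    to be charged separately from the delivery payout).
    """
    earned = sum(delivery_value(d) for d in delivery_times)
    charged = sum(urgent_cost(d) for d in urgent_days_in_progress)
    return (earned - charged) / days_played


# Hypothetical team: three deliveries, one urgent item in progress for 3 days,
# over a 20-day session.
print(value_per_day([1, 4, 7], [3], 20))
```

Note how steeply the payout drops after day two: an item delivered in two days is worth more than two items delivered in six.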

Using data that I’ve collected from the teams over several sessions, I’m seeing that the teams who earn the most money per day are also the ones that are most predictable. That is, while they can’t do anything about some of the variation (e.g., the essential effort required to do the work), they either consciously or unconsciously follow common policies that reduce other kinds of variation. This appears to support Dan Vacanti’s idea that “doing predictability” is a rewarding business strategy.

Teams typically earn the most value per day and deliver most predictably by following these policies:

  • Limit work in progress: We generally know that this is a helpful policy. The learning for me with the game is that the optimal work-in-progress levels are even lower than one might expect, typically half the number of people on the team, or fewer. Even four- or five-person teams who follow a single-piece flow policy don’t trade off much, if any, throughput. For small teams, the difference between a WIP of three to four and a WIP of one to two can yield twice as much revenue per day in the game!
  • First-in, first-out: It’s easier to do this when you’ve got low WIP levels, of course. And single-piece flow is the natural extension of this policy. The game includes a few random “urgent” work items, which cost the team $100 each day they’re in progress, so teams are highly incentivized to “jump the queue” with these cards. Even so, the teams that have low WIP (a CONWIP of one or two) are able to continue to honor their FIFO policy, which creates better predictability, throughput and value delivered. (Dan Vacanti has written about this.)
  • Cross-functional collaboration: Probably because the game makes both the scarcity of effort available and the work highly visible, players almost naturally “focus on the work, not the worker.” Rather than optimize in their specialty areas, players on successful teams instead work outside their specialties, where they get only half credit for their effort. (This appears to support the research that Dimitar Bakardzhiev has done.)
  • Flow over utilization: Winning teams generally don’t mind not fully utilizing all of their capacity, preferring to leave some effort on the table (literally, in the form of effort cubes) rather than pulling in enough cards for everyone to “stay busy.” One of the event cards attempts to entice teams to improve utilization, but nearly every team chooses not to.
This team executes a strategy of limiting WIP to fewer than half the number of team members at the 2018 LeanAgileUS conference.

Although these lessons are from simulations, I think that, to the extent that the game emulates real work, the lessons can be extended into our actual work environments. In general, these gameplay experiences, because they are rooted in the incentive to optimize value, tend to manifest the mantra “Value trumps flow, flow trumps waste reduction.” So why do teams playing the game seem to know these lessons almost intuitively? The reasons aren’t necessarily unique to the game; they can also be applied in real life: Connect more directly to the value feedback loop (John Yorke’s recent post on verifying value of user stories helps with this) and use flow metrics (e.g., delivery time depicted on a scatter plot) to make your process more predictable. “Keeping score” (of things that matter, anyway) doesn’t need to be limited to games, after all.
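
As one way to "keep score" with delivery times, here is a minimal sketch that summarizes a list of delivery times with percentiles, the kind of reference lines people often draw across a delivery-time scatter plot. The sample numbers, the function names, and the choice of the nearest-rank percentile method are all assumptions made for illustration, not something the game or the sources above prescribe.

```python
import math


def percentile_nearest_rank(values, pct):
    """Smallest observed value that at least pct percent of values fall at or below."""
    if not values:
        raise ValueError("need at least one delivery time")
    if not 0 < pct <= 100:
        raise ValueError("pct must be greater than 0 and at most 100")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def share_within(values, limit_days):
    """Fraction of items delivered in limit_days or fewer."""
    return sum(1 for v in values if v <= limit_days) / len(values)


# Hypothetical delivery times in days, one per finished item.
delivery_times = [2, 3, 3, 4, 5, 5, 6, 8, 9, 14]

p50 = percentile_nearest_rank(delivery_times, 50)
p85 = percentile_nearest_rank(delivery_times, 85)
print(f"Half of items finished within {p50} days; 85% within {p85} days.")
print(f"Share finished within 5 days: {share_within(delivery_times, 5):.0%}")
```

The narrower the gap between the median and the higher percentile, the more predictable the process, which gives a team a simple number to watch as it experiments with policies like lower WIP.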