Probabilistic Forecasting

Probabilistic Thinking is, according to

…essentially trying to estimate, using some tools of math and logic, the likelihood of any specific outcome coming to pass. It is one of the best tools we have to improve the accuracy of our decisions. In a world where each moment is determined by an infinitely complex set of factors, probabilistic thinking helps us identify the most likely outcomes. When we know these our decisions can be more precise and effective.

Because Estimation is very difficult, perhaps impossible, and often misused (Ron Jeffries), an alternative approach to answering the question “When will it be done?” is Probabilistic Forecasting, which is part of the broader NoEstimates approach to management and risk mitigation.

Probabilistic Forecasting uses data to inform decision making, typically by expressing forecasts as confidence intervals and generating them with Monte Carlo simulations. Why do we prefer data? As Douglas Hubbard writes in How to Measure Anything:

These decisions should be modeled quantitatively because (as we will see) quantitative models have a favorable track record compared to unaided expert judgment.
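To make the Monte Carlo idea concrete, here is a minimal sketch in Python. It assumes a team has a record of how many items it finished each week, and it repeatedly replays randomly chosen past weeks until a backlog of a given size is done. The throughput numbers, the backlog size and the function names are all hypothetical illustrations, not data or tooling from any source cited here.

```python
import math
import random


def simulate_weeks(throughput_history, backlog_size, rng):
    """One trial: replay randomly chosen past weeks until the backlog is empty."""
    remaining = backlog_size
    weeks = 0
    while remaining > 0:
        remaining -= rng.choice(throughput_history)
        weeks += 1
    return weeks


def forecast_weeks(throughput_history, backlog_size, trials=10_000, seed=1):
    """Return the 50th, 85th and 95th percentile of simulated weeks to finish."""
    if min(throughput_history) < 0 or max(throughput_history) <= 0:
        raise ValueError("throughput must be non-negative with at least one positive week")
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    outcomes = sorted(
        simulate_weeks(throughput_history, backlog_size, rng) for _ in range(trials)
    )

    def nearest_rank(pct):
        # Nearest-rank percentile: smallest outcome covering pct% of the trials.
        rank = max(1, math.ceil(pct * len(outcomes) / 100))
        return outcomes[rank - 1]

    return {pct: nearest_rank(pct) for pct in (50, 85, 95)}


if __name__ == "__main__":
    weekly_throughput = [3, 5, 2, 4, 6, 1, 4]  # hypothetical items finished per week
    print(forecast_weeks(weekly_throughput, backlog_size=40))
```

The result is a set of percentiles rather than a single date: "85% of the simulated futures finished within N weeks" is the kind of statement a probabilistic forecast makes.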

In Adaptive Leadership, Jim Highsmith writes:

To a manager’s question “How much will project Zebra cost?”, the comment “I don’t know” is usually an unacceptable (but often the best) response. A more specific answer, such as “Well, somewhere between $2 million and $6 million,” would be rejected in many, if not most, organizations. The project manager who says, “Project Zebra will cost $3.2468 million,” will be hailed as “with it.” Now, everyone knows down deep that this figure is fantasy, but they are comfortable with it because they know (1) there is an off chance it will be right or (2) it will be wrong but we can deal with that when the time comes. People seem to be more willing to accept a highly questionable figure over a range of numbers that delineate the degree of uncertainty.

One of the principles of the Declaration of Interdependence (DOI; http://www.agileleadershipnetwork.org) addresses the topic of uncertainty directly: “We expect uncertainty and manage for it through iterations, anticipation, and adaptation.” An approach to uncertainty must be multifaceted: no single strategy or practice will be enough.

Having an agile mindset means accepting uncertainty about the future as a way of dealing with the future. Any development effort should be a balance between anticipation (planning based on what we know) and adaptation (responding to what we learn over time). Managers, and teams, need to find the balance point—how much time we spend investigating requirements, technology, and other issues in the beginning of a project versus actually developing product features (working software) and adjusting/adapting to new information as the project unfolds.

Traditional teams attempt to drive out uncertainty by planning and analysis. Agile teams tend to drive out uncertainty by developing working software in small increments and then adjusting. Traditional teams often think they have mitigated much of the uncertainty, when in fact a high degree of uncertainty still remains in the project.

Sources of Variation

Many factors impact how long work takes to be completed, including but certainly not limited to essential complication (“effort”). We call these factors sources of variation. Among them are:

  • Multitasking/Context switching (Duarte calls this “focus factor”)
  • Work in Progress (strong lead indicator)
  • Dependencies
    • Team
    • System
  • Team composition
  • Availability of specialists
  • Blockers
  • Selection policy (urgent cards “jump” over other WIP, incurring flow debt)
  • Collaboration policy
  • Utilization policy
  • Time spent estimating! 
  • Waiting for availability
  • Rework
  • Stages in team development (Tuckman)
  • Steps/handoffs
  • Essential complication
  • Accidental complication
  • Technology/domain/product
  • Specialization

Reference-Class Forecasting

Reference-class forecasting is a way of predicting future performance by looking at actual performance of implemented comparable projects.

From Troy Magennis:

Reference-Class Forecasting, described in works by Amos Tversky and Daniel Kahneman (Kahneman & Tversky, Prospect Theory: An Analysis of Decision under Risk, 1979) (Kahneman & Tversky, Intuitive prediction: Biases and corrective procedures, 1977)… is to find similar past situations, use those as the base level and adjust your case up and down based on specific context and circumstances. Similar work and practical conclusions come from Bent Flyvbjerg, who looks at how major infrastructure projects have performed poorly versus budget (Flyvbjerg, 2006).

Beyond its efficiency, there is real science behind why this way of estimating will perform better than others. It forces the group to take an “outside-view” of the feature and avoid overly considering specifics of this feature before understanding how much work things like this feature have historically endured. It avoids the group falling into a bias known as the Planning Fallacy.

Magennis again:

One alternative is to use historical measure lead-time for similar items. Lead time is a measure from when someone committed (or first heard, definitions vary) to when (that same?) someone said it is done. The historical times to achieve these boundaries does incorporate system delays and factors, because they impacted the prior work. In essence, the item itself isn’t estimated, it is allocated an estimate based on similar historical items. This type of forecasting is used in other industries… Independent of any other mathematic process for planning, consider ALSO using reference class forecasting as a pre-cursor and a double-sanity-check to encourage everyone to understand system delivery time versus individual “do my bit” time.

To do this, keep a prior set of stories or features ordered along a wall or table. Get the team to consider the item in question and decide where it fits along that continuum. The size or time estimate should be similar to those items closest to the item in question. I find this works well for item size and time estimates.

The easy way to think about reference class is in terms of house sales. If we want to forecast how much a house down the street that just went on the market will sell for, we would use reference class data (aka comparables), like:

  • Number of bedrooms/baths
  • School district
  • Recency (the last few houses sold)
  • Style (e.g., brick)
  • Etc.

The art is to get enough data points (if we had only one comp, it wouldn’t help much) and the right variables (does roof-shingle color matter? Probably not). That is the tension: a narrower reference class is more relevant but leaves fewer data points. Since we don’t know in advance which variables matter, and since we have a good amount of data, we can run multiple reference-class forecasts and get a range.
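Running several reference-class forecasts and comparing them can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the PastItem fields, the sample history, the 85th-percentile choice and the minimum of five comparables are all hypothetical, chosen to show the mechanics rather than to recommend particular variables.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PastItem:
    team_size: int
    tech_stack: str
    duration_days: int


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% of values are at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def reference_class_forecast(history, matches, pct=85, min_points=5):
    """Forecast from comparable past items; None if there are too few comps."""
    durations = [item.duration_days for item in history if matches(item)]
    if len(durations) < min_points:
        return None  # one or two comps would not tell us much
    return nearest_rank_percentile(durations, pct)


# Hypothetical history of finished work items.
HISTORY = [
    PastItem(3, "java", 30), PastItem(3, "java", 45), PastItem(4, "java", 60),
    PastItem(3, "python", 20), PastItem(5, "java", 90), PastItem(3, "java", 50),
    PastItem(4, "python", 35), PastItem(3, "java", 40), PastItem(5, "python", 70),
    PastItem(4, "java", 55),
]

if __name__ == "__main__":
    classes = {
        "everything": lambda item: True,
        "java only": lambda item: item.tech_stack == "java",
        "small teams": lambda item: item.team_size <= 3,
    }
    for name, matches in classes.items():
        print(name, reference_class_forecast(HISTORY, matches))
```

Each filter defines one reference class. Classes that leave too few comparables return no forecast at all, and the spread across the remaining forecasts gives the range.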

Case Study

A client’s business group wanted to know how long it would take them to deliver a new initiative/project, one that had been approved but not yet started. We used a reference-class forecast to help them answer the question probabilistically. All we needed was an appropriate reference class to use.

We started with the start and end data for 260 projects dating back to 2012 and generated a scatterplot chart showing delivery duration (aka cycle time) over time. This allowed the group to have a probabilistic sense of when they might expect the new project – without estimating stories, features, etc. – because the historical project data took into account all of the factors that contribute to delivery time (not simply how long it takes a dev team to push code to the infrastructure/deployment team).

Some notes:

  • This is based on data from 260 projects going back to 2012, the latest having finished this month. The start and end dates are based on people billing hours to those projects, so the actual project completion dates may be a bit different.
  • The 85th percentile is 688 days. That means that we finish 85% of our projects in 688 days (1.9 years) or less.
  • Depending on what confidence level our stakeholders prefer, we may choose to plan at a different percentile (the 95th is 878 days, the 50th is 383 days).
  • The first green trendline is at a one-year moving 85th percentile. You can see that our projects are progressively taking longer to complete (they’re back to 2016 durations). You can see how the spread of dots is increasing top to bottom, an indicator that we’re becoming slightly less predictable.
  • The second scatterplot focuses only on 2020, and the green trendline is at a 90-day moving 85th percentile. For this calendar year, our 85th percentile is 768 days. The two recently completed projects account for the slight decrease in cycle time.
  • We can now further refine/filter this list of 260 projects to get a more appropriate reference class to the Business Card project. Some variables we discussed were team size, number of features, tech stack and dependencies.
  • As we increasingly move from a project-based approach to a product-based one, we will base these forecasts on feature duration rather than project duration.
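The percentile figures and the moving-percentile trendline described in the notes above can be reproduced from nothing more than start and end dates. The following Python is a minimal sketch, assuming a nearest-rank percentile and a trailing window measured in days; the function names and the sample dates are hypothetical and are not the client's data.

```python
import math
from datetime import date, timedelta


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of durations."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def cycle_times(projects):
    """Turn (start, end) date pairs into (end, duration_in_days), ordered by end date."""
    return sorted((end, (end - start).days) for start, end in projects)


def moving_percentile(projects, pct=85, window_days=365):
    """For each finish date, the percentile over projects finished in the trailing window."""
    finished = cycle_times(projects)
    trend = []
    for end, _ in finished:
        window_start = end - timedelta(days=window_days)
        window = [days for other_end, days in finished if window_start < other_end <= end]
        trend.append((end, percentile(window, pct)))
    return trend


if __name__ == "__main__":
    # Hypothetical projects as (start, end) pairs.
    sample = [
        (date(2018, 1, 10), date(2018, 9, 1)),
        (date(2018, 3, 5), date(2019, 6, 20)),
        (date(2019, 2, 1), date(2019, 11, 15)),
        (date(2019, 5, 1), date(2020, 8, 30)),
    ]
    durations = [days for _, days in cycle_times(sample)]
    print("85th percentile overall:", percentile(durations, 85))
    for end, value in moving_percentile(sample):
        print(end, value)
```

A rising trend in the moving percentile corresponds to the observation above that projects are taking progressively longer.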

Relevant Laws, Formulas and Analogies

  • Little’s Law (or Formula): describes the relationship between wait time (or Delivery Time), queue size (or Work in Progress) and processing rate (or Throughput).
  • Parkinson’s Law: the adage that “work expands so as to fill the time available for its completion.” It is sometimes applied to the growth of bureaucracy in an organization.
  • Hofstadter’s Law: “It always takes longer than you expect, even when you take into account Hofstadter’s Law.”
  • Amdahl’s Law: “a formula which gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved.”
  • German Tank Problem
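Three of the items above reduce to short formulas, sketched here in Python. The function and parameter names are illustrative choices. The German Tank estimator shown is the standard frequentist one, which assumes serial numbers run from 1 to N and are sampled uniformly without replacement.

```python
def littles_law_delivery_time(avg_wip, avg_throughput):
    """Little's Law: average delivery time = average WIP / average throughput."""
    if avg_throughput <= 0:
        raise ValueError("throughput must be positive")
    return avg_wip / avg_throughput


def amdahl_speedup(improved_fraction, improvement_factor):
    """Amdahl's Law: overall speedup when a fraction of the work gets faster."""
    if not 0 <= improved_fraction <= 1 or improvement_factor <= 0:
        raise ValueError("fraction must be in [0, 1] and factor must be positive")
    return 1.0 / ((1.0 - improved_fraction) + improved_fraction / improvement_factor)


def german_tank_estimate(observed_serials):
    """Estimate the population size N from k observed serials with maximum m: m + m/k - 1."""
    k = len(observed_serials)
    if k == 0:
        raise ValueError("need at least one observation")
    m = max(observed_serials)
    return m + m / k - 1


if __name__ == "__main__":
    # 12 items in progress at 3 items per week gives an average of 4 weeks per item.
    print(littles_law_delivery_time(avg_wip=12, avg_throughput=3))
    # Doubling the speed of half the work yields well under a 2x overall speedup.
    print(amdahl_speedup(improved_fraction=0.5, improvement_factor=2))
    print(german_tank_estimate([19, 40, 42, 60]))
```

Little's Law holds for long-run averages measured in consistent units, so the WIP and throughput figures should come from the same period and the same definition of "done".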
