Why Engineering Estimates Are Always Wrong — And How to Stop Planning Around Them

Your PM asks the question every team has heard a hundred times: how long will this take? Your lead engineer thinks for a moment and says: two weeks, maybe three. You write it down as two weeks — because the roadmap needs a date — and schedule the release demo for week three.

Six weeks later, you're still waiting.

What's surprising is how surprised everyone seems. This has happened before. It will happen again. And yet the post-mortem will almost certainly conclude that the team needs to estimate more carefully next time.

It won't help. Not because the team is incompetent, but because the assumption underneath the whole exercise — that complex software work can be predicted with point estimates — is structurally flawed. And the research has been saying so for decades.

The numbers: software estimation is structurally broken

In a 2024 analysis of estimation data spanning multiple methodologies — story points, function points, and time-based estimates — researcher Derek Jones found that only 30% of software estimates are accurate. The remaining 70% miss their target, most commonly by underestimating.

What makes that figure land harder is the follow-up finding: developer estimation accuracy does not improve with practice. More experience, more estimation rounds, more refinement sessions — none of it meaningfully moves the needle.

The picture at the organizational level is just as stark. A McKinsey study of more than 5,400 IT projects, conducted with the University of Oxford, found that the average large IT project runs 45% over budget and delivers 56% less value than originally projected. Seventeen percent of projects go so badly off-course that they threaten the existence of the company.

These aren't outliers from companies with dysfunctional engineering orgs. These are the averages.

Why this is not an engineering problem

The instinct when estimates consistently miss is to fix the estimation process. Add more detail to tickets. Run planning poker. Break stories down further. Require sign-off on estimates before a sprint starts.

These interventions feel productive. They rarely are.

The core issue isn't that engineers estimate carelessly. It's that complex, novel work — the kind most software development actually involves — resists prediction. Every feature touches code that nobody fully understands. Every integration has failure modes that only surface under real conditions. Every dependency has its own untracked backlog.

The Cone of Uncertainty, a concept from software engineering management, illustrates this: at the start of a project, actual effort can plausibly land anywhere from one quarter of the initial estimate to four times it. That range narrows only as unknowns get resolved, not as estimates get more granular.

More granular estimates don't reduce uncertainty. They just give the uncertainty a more specific number to hide behind.

Asking engineers to estimate more precisely doesn't change how complex the work is. It just changes how confident the spreadsheet looks.
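
To make the factor-of-four range concrete, here is a minimal Python sketch. The function name `initial_cone_range` and the two-week input are illustrative assumptions; only the factor of four comes from the Cone of Uncertainty figure cited above.

```python
def initial_cone_range(point_estimate_weeks: float, factor: float = 4.0) -> tuple[float, float]:
    """Return the (low, high) range implied by a point estimate at project start.

    `factor` is the Cone of Uncertainty multiplier; 4.0 is the early-project
    value cited in the text. The helper itself is hypothetical.
    """
    if point_estimate_weeks <= 0:
        raise ValueError("point estimate must be positive")
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return point_estimate_weeks / factor, point_estimate_weeks * factor


low, high = initial_cone_range(2.0)
print(f"A 2-week estimate at kickoff spans roughly {low:g} to {high:g} weeks")
```

Splitting the same work into smaller tickets does not change `factor`; only resolving unknowns does.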

The planning fallacy: the bias baked into every sprint

Even if uncertainty were somehow containable, there's a second problem: human psychology.

Psychologists Daniel Kahneman and Amos Tversky identified the planning fallacy in 1979. When people estimate how long a task will take, they simulate a best-case scenario — the version where nothing goes wrong, no one gets sick, no other priorities land on the desk mid-sprint. Historical evidence that similar tasks took longer gets ignored in favor of the imagined smooth path.

This isn't a personality flaw. It's a consistent cognitive pattern that affects individuals, teams, and organizations alike. And it's amplified in workplace settings by a specific dynamic: estimates are usually given in approval contexts. Someone is asking because they need a timeline for a roadmap, a client promise, or a board slide. The social pressure to give an answer that fits the plan is real, even when the honest answer would be "I don't know."

The result is a systematic drift toward optimism. Researchers call the unintentional version optimism bias and the deliberate version strategic misrepresentation, and in practice the two are hard to separate from each other or from genuine uncertainty.

What "estimate better" actually buys you

Most teams respond to chronic estimation misses with one of three moves:

Response | What it looks like | What it actually does
More granular estimates | Breaking every story into sub-tasks, estimating hours not days | Increases estimation overhead; doesn't improve accuracy
Buffer padding | Quietly multiplying estimates by 1.5× or 2× before sharing | Creates scope inflation and erodes trust in the numbers
Post-mortem blame | "We should have caught this in planning" | Produces guilt, not better prediction next time

None of these address the root cause. And the organizational cost compounds: engineers spend more time in refinement sessions that don't move product forward, PMs lose credibility when dates slip anyway, and the team develops a learned helplessness around planning.

This dynamic is closely related to the decision debt problem — teams relitigating the same scope questions every sprint, because the original reasoning was never captured. When estimates break down and scope shifts, the absence of documented decisions means the team has to reconstruct context from scratch, adding delays on top of delays.

Three approaches that actually improve delivery predictability

If more precise estimates don't work, what does? Research and practitioner experience point in the same direction: stop trying to predict the future from first principles. Use historical data instead.

1. Reference class forecasting

Instead of estimating a task by thinking about what it will require, estimate it by looking at what similar tasks have historically taken. How long did your last five API integrations take to ship? Your last three database migrations? Your last sprint with a similar mix of work?

This approach — popularized by researcher Bent Flyvbjerg in infrastructure project forecasting — consistently outperforms bottom-up expert estimation. It works because it bypasses the planning fallacy: you're not simulating the future, you're reading the past.
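
As a rough illustration of reading the past instead of simulating the future, the sketch below summarizes a reference class with a median and an 80th percentile. The function name, the sample durations, and the choice of those two statistics are all assumptions made for the example, not a prescribed method.

```python
from statistics import median, quantiles


def reference_class_forecast(past_durations_days: list[float]) -> dict[str, float]:
    """Summarize how long similar past work actually took.

    Returns the median (a typical outcome) and the 80th percentile
    (a more cautious planning figure). Both choices are illustrative.
    """
    if len(past_durations_days) < 3:
        raise ValueError("need at least three comparable past tasks")
    # n=5 yields cut points at 20/40/60/80 percent; the last is the 80th percentile.
    p80 = quantiles(past_durations_days, n=5, method="inclusive")[-1]
    return {"median_days": median(past_durations_days), "p80_days": p80}


# Hypothetical durations (in days) of a team's last five API integrations.
last_five_api_integrations = [9, 12, 14, 21, 30]
forecast = reference_class_forecast(last_five_api_integrations)
print(forecast)
```

Quoting the higher percentile alongside the median keeps the slow outcomes visible, which is exactly the history the planning fallacy tends to discard.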

2. Throughput-based planning

Rather than estimating individual items and summing them, track how many items your team actually completes per sprint — regardless of stated size — and use that rate to forecast delivery. If your team ships an average of eight items per two-week sprint, and your backlog has 32 items, you have four sprints of work. No story points required.

Agile coaches frequently warn that sprint velocity becomes unhelpful once it is used as a commitment mechanism. Throughput (items completed per period, measured over time) tells a cleaner story without the psychological baggage of point estimation.
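
The backlog arithmetic above can be sketched in a few lines of Python. The function name and the six-sprint history are hypothetical; the slowest-pace and fastest-pace figures are an added assumption, included only to show how the same data can produce a range rather than a single date.

```python
import math


def forecast_sprints(backlog_items: int, completed_per_sprint: list[int]) -> dict[str, int]:
    """Forecast sprints remaining from observed throughput, not from estimates."""
    if backlog_items < 0:
        raise ValueError("backlog size cannot be negative")
    if not completed_per_sprint or min(completed_per_sprint) <= 0:
        raise ValueError("need at least one sprint, each with items completed")
    average = sum(completed_per_sprint) / len(completed_per_sprint)
    return {
        "at_average_pace": math.ceil(backlog_items / average),
        "at_slowest_pace": math.ceil(backlog_items / min(completed_per_sprint)),
        "at_fastest_pace": math.ceil(backlog_items / max(completed_per_sprint)),
    }


# Hypothetical history: items completed in each of the last six two-week sprints.
history = [6, 9, 8, 7, 10, 8]
print(forecast_sprints(32, history))
```

With this history the average is eight items per sprint, so the 32-item backlog comes out at four sprints, matching the example in the text, while the slowest observed sprint implies six.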

3. Range-based estimates with explicit uncertainty

When a single number is genuinely needed, replace point estimates with ranges. Not "two weeks" but "two to five weeks, most likely three." This forces an explicit conversation about uncertainty rather than burying it inside a number that sounds more certain than it is.

Teams that adopt range estimates typically see two things happen: stakeholders initially resist (because ranges feel less committal) and then start making better decisions (because they're working with real information rather than optimistic fiction).
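
One way to keep the range attached to the number is to store all three values together. The sketch below is an assumption-laden example: the `RangeEstimate` class is hypothetical, and the weighted mean uses the PERT three-point formula, which the article does not mention and which is brought in here only for the case where someone insists on a single figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeEstimate:
    """A three-point estimate in weeks (hypothetical helper type)."""

    low_weeks: float
    likely_weeks: float
    high_weeks: float

    def __post_init__(self) -> None:
        if not (0 < self.low_weeks <= self.likely_weeks <= self.high_weeks):
            raise ValueError("expected 0 < low <= likely <= high")

    def weighted_mean(self) -> float:
        """PERT-style weighted mean: (low + 4 * likely + high) / 6."""
        return (self.low_weeks + 4 * self.likely_weeks + self.high_weeks) / 6

    def describe(self) -> str:
        """Render the estimate the way it should be communicated."""
        return (
            f"{self.low_weeks:g} to {self.high_weeks:g} weeks, "
            f"most likely {self.likely_weeks:g}"
        )


estimate = RangeEstimate(low_weeks=2, likely_weeks=3, high_weeks=5)
print(estimate.describe())
print(f"weighted mean: {estimate.weighted_mean():.1f} weeks")
```

Because the high end sits further from the likely value than the low end does, the weighted mean lands slightly above three weeks, a small reminder that the risk is skewed toward running late.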

Diagnostic: is your team planning around estimates?

Use these questions to assess how estimation is actually functioning on your team:

  1. Can you pull up your team's last 10 sprint completion rates without digging through notes? If not, you're flying without instruments — throughput data is the foundation of everything else.
  2. Are estimates used to make commitments, or to plan capacity? Estimates made as commitments face different social pressure and tend to drift toward optimism faster.
  3. What happens when an estimate is missed? If the answer involves blame or urgent replanning, estimation has become accountability theater rather than a planning tool.
  4. How often does scope change after an estimate is given? Scope changes invalidate estimates — but teams often hold engineers to the original number anyway.
  5. Does your team track how long similar past work took? Without a reference class, every estimate starts from scratch, every time.

If most of your answers reveal gaps, you're not alone. But the path forward isn't a better estimation template — it's a different relationship with uncertainty at the team level.
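
For the first diagnostic question, the completion rates can come from whatever tracker export is already available. This is a minimal sketch assuming each sprint is recorded as a (planned, completed) pair of item counts; the function name and the sample numbers are hypothetical.

```python
def completion_rates(sprints: list[tuple[int, int]]) -> list[float]:
    """Return completed/planned for each (planned, completed) sprint record."""
    rates = []
    for planned, completed in sprints:
        if planned <= 0:
            raise ValueError("planned items must be positive")
        rates.append(completed / planned)
    return rates


# Hypothetical (planned, completed) item counts for four recent sprints.
recent_sprints = [(10, 8), (12, 9), (10, 10), (8, 6)]
rates = completion_rates(recent_sprints)
print([f"{rate:.0%}" for rate in rates])
```

A run of rates well below 100% suggests the plan, not the team, is the part that needs adjusting.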

Understanding how context loss compounds delivery problems is also worth examining: estimates made without full context of past decisions and current constraints miss regardless of methodology. And because adding more people to a late project typically makes it later (Brooks's law), the reactive move most teams reach for, scaling up, doesn't solve the underlying predictability problem either.

The shift worth making

The goal isn't to make estimates more accurate. The goal is to make decisions with less reliance on estimates being accurate.

That means tracking what your team actually completes, not just what it planned. It means building buffers structurally into your roadmap rather than hiding them inside individual estimates. It means treating a missed estimate as information about the system — a signal about scope complexity or hidden dependencies — not a failure of individual engineering judgment.

Teams that become genuinely predictable don't get there by estimating better. They get there by understanding their own throughput, reducing the size of work items, and making scope decisions earlier. Tooling that surfaces actual delivery patterns — rather than just planned work — is where this kind of discipline starts to feel sustainable rather than heroic.

Frequently asked questions

Why are software estimates always wrong?

Software estimation is structurally difficult because complex, novel work resists prediction. Unknown unknowns — edge cases, dependency behavior, undocumented code — only surface during implementation. Research shows that only about 30% of software estimates hit their target, and this accuracy doesn't improve meaningfully with more experience or more detailed refinement sessions.

What is the planning fallacy in software development?

The planning fallacy, identified by Kahneman and Tversky, describes how people systematically underestimate how long tasks will take by imagining a best-case scenario rather than drawing on historical evidence. In software teams, this is amplified by social pressure to give committed dates that fit roadmaps and stakeholder expectations.

What is reference class forecasting and how does it help engineering teams?

Reference class forecasting means estimating a new task based on how long similar past tasks actually took, rather than building an estimate from scratch. Instead of asking "how long will this API integration take?" you ask "how long did our last five API integrations actually take?" This approach bypasses optimism bias and tends to produce more accurate forecasts than bottom-up expert estimation.

Should engineering teams stop estimating entirely?

Not necessarily — but the purpose of estimation matters. Estimates used for capacity planning and rough sequencing decisions are useful. Estimates used as commitments, held to strict accountability when missed, tend to drive the dysfunctions described above. The practical path for most teams is to move toward throughput-based planning for sprints and use estimates only for coarse-grained roadmap decisions where ranges are acceptable.

Sources / Further reading