Debugging Predictability at the Team Level

What the numbers are trying to tell you, and how to get them to say it out loud.

In the last post I wrote about what predictability is and what it actually measures. This one is about what to do when the number looks wrong.

When I first started using this metric, diagnosing an anomaly took real time. I'd read through everything, cross-reference the supporting metrics, form a hypothesis, test it against the data. It was slow. But practice creates patterns, and over time I've learned that there are only a handful of situations that tend to produce the readings you're going to see in most organizations. The root causes fall into recognizable categories. And once you know what to look for, the chain of evidence is usually pretty short.

The patterns I see most often:

The undercommitting team. The overcommitting team. The team that never fully understood the scope. The product manager who cannot say no. The organization whose leadership cannot stay focused long enough to let a team finish anything. And the team that's being eaten alive by quality issues because nobody ever decided that quality was a cultural priority.

There are probably others. But those cover most of what I've seen across a dozen or so organizations over the years.

I'm going to walk through two of them in detail, because the point isn't to give you a taxonomy. The point is to show you how to read the chain of metrics so you can do the diagnostic yourself, whatever pattern you're looking at.

The product manager who cannot say no

I walked into an organization and ran my standard first-week metrics pull: the last six months of sprint data across all the teams. Before I even opened the spreadsheet I already knew one thing: this particular team had a hard time meeting project delivery dates. That was the organizational understanding. Nobody had been able to explain why.

I looked at predictability first. Sprint over sprint, it was declining, and had been for months. It was sitting in the 30% to 40% range. That's not a rough patch. That's a pattern.

The next thing I looked at was velocity. Velocity was flat, maybe slightly increasing. The amount of work this team was completing every sprint was actually quite stable. So the team wasn't falling apart. They were getting work done at a consistent pace. Which meant the predictability problem wasn't about execution capacity.

So I looked at sprint commitments. And that's where it fell apart.

Every sprint, the team was completing less than the original commitment. At first the shortfall was small but consistent; over time it became dramatic. And yet the commitment for each sprint was always larger than the commitment for the one before it. Sprint after sprint after sprint. The team was behind, and the plan kept getting bigger.

The mechanism was two things happening simultaneously. Anything not completed in a sprint rolled over into the next one. And the product manager responsible for prioritization kept adding new work on top of the rollover. Nobody was removing anything. The pile just kept growing.
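
To see how those two forces combine, here is a minimal sketch with made-up numbers. It assumes predictability is completed points divided by committed points, which matches how the metric is used in this post. The function name, the 30-point velocity, and the 40 points of new work per sprint are all hypothetical, not figures from the team described here.

```python
def simulate_rollover(velocity, new_work_per_sprint, first_commitment, sprints):
    """Simulate sprints where unfinished work rolls over and nothing is removed.

    All values are hypothetical story points (first_commitment must be > 0).
    Returns one dict per sprint.
    """
    history = []
    committed = first_commitment
    for sprint in range(1, sprints + 1):
        # The team completes work at a steady pace, capped by what was committed.
        completed = min(velocity, committed)
        history.append({
            "sprint": sprint,
            "committed": committed,
            "completed": completed,
            # Assumed definition: completed points / committed points.
            "predictability": completed / committed,
        })
        # Unfinished work rolls over, and new work lands on top of it.
        committed = (committed - completed) + new_work_per_sprint
    return history


history = simulate_rollover(velocity=30, new_work_per_sprint=40,
                            first_commitment=40, sprints=6)
for row in history:
    print(f"sprint {row['sprint']}: committed {row['committed']}, "
          f"completed {row['completed']}, "
          f"predictability {row['predictability']:.0%}")
```

With these inputs, completed work stays at 30 points every sprint while the commitment climbs from 40 to 90 and predictability slides from 75% to roughly 33%. Flat velocity, growing commitment, falling predictability: the same shape as the readings above.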

This wasn't an engineering failure. The engineers were doing what they said they'd do, roughly speaking, and doing it at a consistent pace. The problem was that nobody was willing to make hard choices about what should be done in a given period of time. The product manager wasn't able to look at the rollover, look at the new requests coming in, and say: these can wait. As a result, the commitment was always unrealistic, the team was always behind, and the project was always going to miss its date because the work that mattered for the project was competing with everything else on the list.

The data made that visible. It took about twenty minutes to read.

The team that never fully understood the scope

Different organization. One of my teams was consistently hitting around 150% predictability. That means every sprint they were completing about one and a half times the work they had originally committed to. You might read that as a good thing. I read it as a question.

Are they just underestimating what they can do? Or is something else happening?

Velocity looked fine: flat to increasing. That ruled out the team falling apart. Then I looked at the percentage of the original plan completed. And here's where it got strange. They were doing more total volume than planned, but they were completing less of the actual planned work than they said they would. More work done, less of the right work done. That's a specific shape.

The next metric: work added to sprints. Every sprint, twenty to thirty points of new tickets were being added. And nothing was being removed. Good sprint hygiene says that when you add work mid-sprint, you should pull something else out to protect the commitment. That wasn't happening here.
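
To make that chain concrete, here is a small sketch of the three readings for a single sprint. The field names and numbers are hypothetical. It assumes predictability is total completed points over originally committed points, which is consistent with the 150% reading above, and that plan completion counts only points from the original plan.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    committed: int          # points in the original sprint plan
    planned_completed: int  # points completed from the original plan
    added: int              # points added after the sprint started
    added_completed: int    # points completed from the added work
    removed: int = 0        # points pulled out mid-sprint to protect the plan


def read_sprint(s: Sprint) -> dict:
    """Return the three readings used in the diagnostic chain for one sprint."""
    total_completed = s.planned_completed + s.added_completed
    return {
        # Assumed definition: everything completed / original commitment.
        "predictability": total_completed / s.committed,
        # Share of the original plan that actually got finished.
        "plan_completion": s.planned_completed / s.committed,
        # Work added mid-sprint, net of anything removed to make room.
        "net_added": s.added - s.removed,
    }


# Hypothetical sprint: 40 points planned, 30 added mid-sprint, nothing removed.
sprint = Sprint(committed=40, planned_completed=32, added=30, added_completed=28)
reading = read_sprint(sprint)
print(f"predictability:  {reading['predictability']:.0%}")
print(f"plan completion: {reading['plan_completion']:.0%}")
print(f"net points added: {reading['net_added']}")
```

With these made-up numbers, predictability comes out at 150% while only 80% of the original plan was finished, and 30 points were added with nothing taken out. A high first number alongside a second number under 100% is the signal to go read the added tickets.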

At that point I started looking at the actual tickets being added. What I found was that as engineers worked on planned tickets, they kept discovering work that hadn't been captured anywhere. Not tangential things, not scope creep in the traditional sense. Work that was required to complete the project, that nobody had identified before the sprint started. They were adding it because it had to be done. And this was happening sprint over sprint over sprint.

The accountability here isn't really about individuals dropping the ball. This was an early-stage company that had been running Kanban-style: just-in-time, figure-it-out-as-you-go. The shift to sprints happened for real reasons: the company had grown, other parts of the organization needed to plan around Engineering, and the API dev tools space runs on external release cadences that don't wait for you. Sprints were the right call. But adopting sprint ceremonies without installing sprint discipline just moves the chaos inside the timebox. The engineers had never been in an environment where deep pre-sprint scoping was expected, and nobody was giving them the time to do it even if they'd known to ask.

The team had developed an intuition about this. They knew from experience that there was always more work hiding in the project than had been identified, so they undercommitted on their initial estimates to give themselves room. The undercommit wasn't pessimism. It was the scar tissue of a team that had never broken the cycle.

Which means the lesson here wasn't primarily theirs to learn. If the team had never been taught to plan at that level of depth, that's not a performance problem. That's a coaching gap. And the coaching gap belongs to me.

What happens after you have the diagnosis

The metrics get you to the root cause. What happens next is a different problem entirely, and I want to be honest about the distinction.

How I bring the diagnosis into a conversation depends on what hat I'm wearing. When I'm acting as a Scrum Master and running retros directly, I can bring this into the room myself. It's the seed for the entire retro. I'll put the historical charts up, not just the most recent sprint, and I'll ask the team: How did we do? What do you think is happening here? This is a little outside what we'd expect. What's behind it?

The goal is to point them toward the topic and let them find the answer. Usually they will. Their observations might not go straight to the root cause, but they're not wrong. They're seeing real things. You listen, you validate what's real, and then you keep asking why until you get to the thing underneath the thing. It's the classic five whys, except you're doing it in a room full of people who know things you don't.

If they can't get there on their own, I'll shift tactics: here's the chain I followed. I started with predictability, looked at velocity, then looked at what was being added to sprints. And when I looked at the tickets that got added, here's what I noticed. What's the story behind that?

When I'm not in the Scrum Master role, when I'm functioning as Head of Engineering with team managers between me and the teams, the channel is different but the approach is the same. Somewhere in my regular one-on-ones with my managers, I'll shift to: how's the team doing? That's the door. The metrics are fair game from there. I might be steering the conversation toward something I've already seen in the data, but the manager doesn't need to know that. What I'm trying to find out is whether they're seeing what I'm seeing.

If they are, we work through it together. If they aren't, that tells me something about the manager. It also tells me something about myself: either I haven't made the expectation clear enough, or I haven't done enough to show them how.

Here's the part I want to be clear about: I've never hit a situation where following this chain didn't get me to a root cause. The methodology is reliable for the diagnostic. What happens after is not. You can identify exactly what's broken and still watch the team keep doing the thing. People are cats. They're going to do what they're going to do. The best you can do is create the conditions where doing the right thing is easier than doing the wrong thing, and have a different kind of conversation when that doesn't work.

The metrics don't solve the problem. They just make it impossible to pretend you don't know what it is.

I write about engineering leadership and team health in the Engineering Health newsletter on LinkedIn. Search "Engineering Health" to find it.