AI review debt is the bottleneck nobody's measuring

AI can write 10x more PRs now, but the team is still reviewing them at the same pace as last year.

That gap? It's growing every sprint. And almost nobody is tracking it.

The metric everyone's celebrating is wrong

Teams are excited about PR counts. "We shipped 40% more PRs this quarter." Nice! But did anyone actually review them?

AI coding assistants write remarkable code. They don't write extra reviewers. The bottleneck didn't vanish. It just shifted downstream.

DORA doesn't capture this (yet)

DORA metrics measure deployment frequency, lead time for changes, change failure rate, and time to restore service to quantify software delivery performance.

However, there is a blind spot here. Lead time is the clock from first commit to production. If a PR sits in a review queue for three days because every senior engineer is buried under a pile of AI-generated diffs, that shows up as slow lead time, but nobody's attributing it to the right cause.
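One way to make that attribution visible is to split lead time into stages instead of reporting it as a single number. Here is a minimal Python sketch, assuming you can pull five timestamps per PR from your Git host. The `PullRequest` class, its field names, and the sample dates are all hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PullRequest:
    # Hypothetical field names; map them to whatever your Git host exposes.
    first_commit_at: datetime
    review_requested_at: datetime
    first_review_at: datetime
    merged_at: datetime
    deployed_at: datetime


def lead_time_breakdown(pr: PullRequest) -> dict[str, timedelta]:
    """Split commit-to-production lead time into consecutive stages."""
    return {
        "coding": pr.review_requested_at - pr.first_commit_at,
        "waiting_for_review": pr.first_review_at - pr.review_requested_at,
        "in_review": pr.merged_at - pr.first_review_at,
        "deploy": pr.deployed_at - pr.merged_at,
    }


# Made-up example: a PR that waits three days for its first review.
pr = PullRequest(
    first_commit_at=datetime(2025, 3, 3, 9, 0),
    review_requested_at=datetime(2025, 3, 3, 11, 0),
    first_review_at=datetime(2025, 3, 6, 11, 0),
    merged_at=datetime(2025, 3, 6, 15, 0),
    deployed_at=datetime(2025, 3, 6, 16, 0),
)

stages = lead_time_breakdown(pr)
total = sum(stages.values(), timedelta())
for name, duration in stages.items():
    print(f"{name:>18}: {duration} ({duration / total:.0%} of lead time)")
```

In this made-up example the total lead time is 79 hours, and 72 of them (roughly 91%) are spent waiting for a first review. A single lead-time number would hide that completely.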

The way the system fails is not obvious:

→ AI generates more PRs, faster
→ Review queue balloons
→ Reviewers skim instead of reading carefully
→ Change failure rate creeps up
→ Team blames "quality issues" instead of recognizing a capacity problem

You're not shipping faster. You're just creating work faster. That's not the same thing.

AI review debt is real

The term "AI Review Debt" was recently coined by Sumant Thakur on his Substack, and it couldn't be more accurate. Any AI-generated PR that enters a review queue without a matching increase in review capacity is debt. It grows unnoticed.

I think most teams are already feeling this. Senior engineers are overwhelmed. The juniors are generating PRs with Copilot but don't yet have the context to review each other's work meaningfully. So the same three people end up being the bottleneck for everything.

And unlike regular tech debt, nobody's putting this on a roadmap. There's no Jira ticket for "our review pipeline can't keep up with our generation pipeline." 😅

What actually helps

I don't think there's a clean solution yet. However, some things appear to be heading in the right direction:

→ Measure review queue depth and review cycle time separately from lead time. If you can't see the bottleneck, you can't fix it.
→ Stop celebrating PR count as a productivity metric. Merged PRs matter. Open PRs are inventory, not output.
→ Invest in review tooling with the same energy you invested in generation tooling. AI-assisted review is coming, but most teams haven't even explored what's available today.
→ Set explicit review capacity limits. If a human can thoughtfully review maybe 4-5 substantial PRs per day, that's your throughput ceiling. Plan around it.
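To make the first and last points concrete, here is a rough Python sketch of the three numbers involved: queue depth at a point in time, median wait for a first review, and a daily review ceiling. The record format, function names, and sample data are assumptions for illustration, and the default of 4 PRs per reviewer per day is just the low end of the estimate above, not a measured value:

```python
from datetime import datetime
from statistics import median
from typing import Optional

# Hypothetical record: (review_requested_at, first_review_at or None if still waiting).
ReviewRecord = tuple[datetime, Optional[datetime]]


def queue_depth(records: list[ReviewRecord], at: datetime) -> int:
    """Count PRs that had requested review but not yet received one at time `at`."""
    return sum(
        1
        for requested, reviewed in records
        if requested <= at and (reviewed is None or reviewed > at)
    )


def median_wait_hours(records: list[ReviewRecord]) -> float:
    """Median hours from review request to first review, over reviewed PRs only."""
    waits = [
        (reviewed - requested).total_seconds() / 3600
        for requested, reviewed in records
        if reviewed is not None
    ]
    return median(waits)


def daily_review_ceiling(reviewers: int, prs_per_reviewer_per_day: float = 4.0) -> float:
    """Upper bound on thoughtfully reviewed PRs per day (assumed rate, not measured)."""
    return reviewers * prs_per_reviewer_per_day


# Made-up sample data.
records: list[ReviewRecord] = [
    (datetime(2025, 3, 3, 9), datetime(2025, 3, 3, 15)),
    (datetime(2025, 3, 3, 10), datetime(2025, 3, 5, 10)),
    (datetime(2025, 3, 4, 9), datetime(2025, 3, 4, 21)),
    (datetime(2025, 3, 4, 14), None),
    (datetime(2025, 3, 5, 8), None),
]
snapshot = datetime(2025, 3, 5, 9)

print("queue depth:", queue_depth(records, snapshot))
print("median wait (h):", median_wait_hours(records))
print("daily ceiling:", daily_review_ceiling(reviewers=3))
```

Tracking these on a dashboard next to lead time is probably enough to show whether the review queue is the constraint. Note that the median here ignores PRs still waiting, so it will understate the problem when the queue is growing.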

The uncomfortable truth here is that adding AI coding tools without rethinking review workflows simply shifts the pressure from the writers to the reviewers. That's a problem with the team design, not the tooling.

The real risk

What worries me is that overloaded reviewers don't reject more PRs. They approve them faster.

This is the least desirable result. You get the illusion of velocity with the reality of degraded quality. The number of failed changes increases. People lose confidence in the system. And six months later everyone's wondering why the product feels fragile.

The bottleneck moved. The org chart didn't.

If you doubled your PR output this year thanks to AI tools, here's one question: did your review capacity also double? If the answer is no, you're accumulating debt you haven't named yet.
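The arithmetic behind that question is simple enough to sketch. The Python below is a deliberately crude model, assuming a constant arrival rate and a fixed daily review capacity. The function name and every number in it are illustrative, not data from any real team:

```python
def project_review_backlog(
    incoming_per_day: float,
    capacity_per_day: float,
    working_days: int,
    starting_backlog: float = 0.0,
) -> list[float]:
    """Return the unreviewed-PR backlog at the end of each working day."""
    backlog = starting_backlog
    history = []
    for _ in range(working_days):
        # The queue cannot go below zero: spare capacity is simply unused.
        backlog = max(0.0, backlog + incoming_per_day - capacity_per_day)
        history.append(backlog)
    return history


# Illustrative numbers only: output doubles from 10 to 20 PRs/day,
# while 3 reviewers at an assumed 4 PRs/day each still clear 12.
before = project_review_backlog(incoming_per_day=10, capacity_per_day=12, working_days=10)
after = project_review_backlog(incoming_per_day=20, capacity_per_day=12, working_days=10)

print("backlog after one sprint, before doubling:", before[-1])
print("backlog after one sprint, after doubling:", after[-1])
```

With these made-up inputs, the team that used to clear its queue every day ends a ten-day sprint with 80 unreviewed PRs. Real queues are messier (PR sizes vary, reviewers skim under pressure), but the direction of the result holds whenever arrivals exceed capacity.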

I'm curious: what strategies is your team using to manage the review load as AI-generated PRs keep increasing? Are you doing anything differently, or just white-knuckling it?