For two years we argued about which model writes the best code. That argument is mostly settled, and it created a new problem. The agent now produces working code faster than any human can read it.
So the bottleneck moved. It used to be writing. Now it is trusting.
Most teams react in one of two ways, and both are wrong.
- Review everything the same way. Every change waits in the same queue for the same full human review. A one-line copy fix sits behind a payments refactor. The speed you gained from generation is lost to the backlog.
- Review almost nothing. The agent seems confident, so changes get waved through. Then authentication, money, or regulated data ships without the judgment it needed. Confidence is not correctness.
The idea: graduate the review
Graduated Review Authority (GRA) drops the single fixed path. Instead, the amount of review a change gets scales with two things: the risk it carries and the strength of the evidence that it is correct.
A trivial change that passes every automated check does not need the same scrutiny as a change to how payments are calculated. GRA makes that difference explicit and consistent.
The core rule is one line:
Generation is not authority. The pipeline is the authority.
Anyone can write code, a person or an agent. Producing it does not grant permission to ship it. Approval comes only from passing validation that was fixed before the change was written. A model can propose. It cannot grant itself passage.
How a change earns a lighter review
GRA weighs three kinds of evidence, strongest first:
- Deterministic gates. Types, tests, security scans, policy checks. Same answer every time, so a failure blocks outright.
- Agent review. Useful but probabilistic, and only counts when the reviewer is independent of the author.
- Human review. Reserved for what machines cannot settle: risk, ambiguity, architecture, intent.
Some categories always route to a person no matter how clean the gates are: auth, money movement, regulated data, destructive migrations, and broad architectural change.
And the authority is graduated, not granted once. A class of change that builds a clean track record earns a lighter touch. One that produces regressions or escaped defects earns more oversight. Crucially, the system can only ever tighten review below the floor that risk demands. It never loosens past it.
Why I think this matters
The honest version of "AI-native delivery" is not "let the agent ship." It is "spend human attention where it actually changes the outcome." GRA is one way to decide where that is, on purpose, instead of by reflex.
The full write-up is at gra.dev. It is a single page, no signup. I would genuinely like to hear where you think it breaks.













