An internal release agent finished a deploy a little after 2 a.m. and then had nothing it could read.
The dashboards were green. But green is something you see, not something you fetch. The agent's next step (continue the rollout, or pause and wait for a human) depended on a judgment that lived only on a screen nobody was looking at. So it did the one safe thing available to it: it stopped and waited for someone to wake up and confirm what the graphs were already showing.
That gap stayed with me. The pipeline was automated end to end, except for the fifteen minutes right after the deploy, where it quietly fell back to a person. Someone still had to look at runtime signals and decide whether the thing that just shipped was behaving.
I had built a verdict layer for that exact window. It read raw deploy signals (error rates, latency, exception types, deploy metadata) and emitted one of three states: STABLE, WATCH, or RISK. The problem wasn't the verdict. The problem was that I had built it for a human to read, and the agent that actually needed the answer at 2 a.m. couldn't get to it.
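The post doesn't describe how the layer turns signals into a state, so the following is only a toy illustration of the shape of that step: a few post-deploy signals compared against a pre-deploy baseline and mapped to one of the three states. The `DeploySignals` fields, the ratios, and every threshold are invented for the example and are not Relivio's logic.

```python
from dataclasses import dataclass
from enum import Enum


class State(str, Enum):
    STABLE = "STABLE"
    WATCH = "WATCH"
    RISK = "RISK"


@dataclass
class DeploySignals:
    # Hypothetical inputs: post-deploy values next to a pre-deploy baseline.
    error_rate: float               # errors / requests after the deploy
    baseline_error_rate: float      # errors / requests before the deploy
    p95_latency_ms: float
    baseline_p95_latency_ms: float
    new_exception_types: int        # exception types not seen before the deploy


def classify(s: DeploySignals) -> State:
    """Toy rule set with invented thresholds; not Relivio's actual logic."""
    # Guard against a zero baseline; any errors on a clean baseline read as a large ratio.
    error_ratio = s.error_rate / max(s.baseline_error_rate, 1e-9)
    latency_ratio = s.p95_latency_ms / max(s.baseline_p95_latency_ms, 1e-9)
    if error_ratio >= 3.0 or latency_ratio >= 2.0:
        return State.RISK
    if error_ratio >= 1.5 or latency_ratio >= 1.3 or s.new_exception_types > 0:
        return State.WATCH
    return State.STABLE
```

The point of the sketch is the output type, not the rules: whatever the real logic is, it ends in one of three values a program can compare against.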
The second reader I hadn't designed for
I had assumed the post-deploy decision had one consumer: the operator. On a small team that is often true. One person knows the deploy, sees the dashboards, and makes the call in a few seconds.
But the moment anything downstream of the deploy is automated, whether a release agent, a progressive rollout controller, or a workflow that promotes a build from canary to full, that automation becomes a second reader of the same decision. And it does not read the way a human does.
A human glances at a latency panel and infers that the bump at 02:04 lines up with the rollout. An agent can't glance. It can't infer from a rendering. It needs the conclusion as a value it can branch on, not a picture it has to interpret.
That is the part I had collapsed. The verdict existed, but it existed in a shape only one of its two readers could use.
Why a dashboard doesn't serve it
A dashboard is a rendering for a reader with eyes, attention, and context. It assumes someone who already knows which deploy went out, which panel matters, and what normal looks like for this service.
An agent arrives with none of that. Handing it a dashboard is handing it a photograph of an answer and asking it to read the answer back. Even when the data behind the panel is available through an API, what comes back is usually more raw signal: time series, event streams, the same inputs the human was interpreting. The interpretation, the part that closes the decision, still isn't in the response. It was in the operator's head.
So the agent is stuck in the same place the human was, except worse, because it can't even do the intuitive pattern-match a tired engineer does at a glance.
What the verdict had to become
The fix wasn't a new dashboard. It was making the verdict a structured object instead of a message.
A human reading the verdict wants a sentence: the checkout API looks fine, move on. An agent reading the same verdict wants fields it can branch on: verdict: WATCH, a decision_tier, the affected_apis, a recommended_action, and the operator_steps to follow.
Same verdict. Two encodings of it. The state tells a human what happened and tells an agent which branch to take. The metadata tells a human what to look at and tells an agent what to do next or when to re-check. Once the verdict carried both, neither reader had to translate for the other.
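A minimal sketch of "one verdict, two encodings", assuming a Python service. The field names (`verdict`, `decision_tier`, `affected_apis`, `recommended_action`, `operator_steps`) come from the post; their types, the `Verdict` class, and both method names are assumptions for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass(frozen=True)
class Verdict:
    # Field names follow the post; the types here are assumptions.
    verdict: str                    # "STABLE" | "WATCH" | "RISK"
    decision_tier: str
    affected_apis: List[str]
    recommended_action: str
    operator_steps: List[str] = field(default_factory=list)

    def to_message(self) -> str:
        """Encoding for a human: one short sentence."""
        apis = ", ".join(self.affected_apis) or "no APIs flagged"
        return f"{self.verdict} ({apis}): {self.recommended_action}"

    def to_json(self) -> str:
        """Encoding for an agent: the same fields as values it can branch on."""
        return json.dumps(asdict(self), sort_keys=True)
```

Both methods read from the same object, so the human message and the agent payload can't drift apart. For example, `Verdict("WATCH", "tier-2", ["checkout-api"], "re-check in 5 minutes").to_message()` returns `"WATCH (checkout-api): re-check in 5 minutes"`; the `tier-2` value is made up.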
This is the shift that mattered. It wasn't adding a feature; it was changing what a verdict is. It stopped being a thing rendered for a person and became a thing two kinds of reader could consume from the same source of truth.
Why a read surface, and why MCP
A structured verdict still needs a doorway the second reader can walk through.
A plain REST endpoint is the obvious one, and it works: an agent can send a GET for the latest verdict and read the fields. But agent runtimes increasingly speak a more specific protocol for fetching context and calling tools, and for those runtimes MCP (the Model Context Protocol) is the native door. So alongside the REST surface there's an MCP server the agent can query the same way it queries any other tool: ask for the latest verdict, get back the structured object, branch on it.
In practice that looks like one call in the pipeline. The agent finishes the deploy, asks for the latest verdict, reads the state, and decides. A get_verdict call instead of a screenshot nobody opened.
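On the agent side, the REST version of that one call can be as small as the sketch below, using only the Python standard library. The URL, the path, and the validation rules are hypothetical; the post doesn't document Relivio's actual endpoint. Validating the payload before branching means a malformed or unexpected response fails loudly instead of being mistaken for a state.

```python
import json
import urllib.request

# Hypothetical endpoint; not Relivio's documented URL or path.
VERDICT_URL = "https://relivio.example/api/verdicts/latest"

REQUIRED_FIELDS = ("verdict", "decision_tier", "affected_apis", "recommended_action")
STATES = ("STABLE", "WATCH", "RISK")


def parse_verdict(body: bytes) -> dict:
    """Decode a verdict payload and check the fields an agent branches on."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("verdict payload is not a JSON object")
    missing = [f for f in REQUIRED_FIELDS if f not in data]
    if missing:
        raise ValueError(f"verdict payload missing fields: {missing}")
    if data["verdict"] not in STATES:
        raise ValueError(f"unknown verdict state: {data['verdict']!r}")
    return data


def get_verdict(url: str = VERDICT_URL, timeout: float = 10.0) -> dict:
    """One GET after the deploy finishes; returns the structured object."""
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_verdict(resp.read())
```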
I want to be careful about what this is and isn't. The MCP server is a doorway, not the product. The verdict layer's job is to read deploy signals and decide STABLE, WATCH, or RISK. MCP is just one of the ways the answer gets handed to a reader that doesn't have eyes. The moment you let the doorway become the identity of the thing, you start building an "AI tool" and stop building the verification layer that was the actual point.
A read surface is not a control surface
Here is the boundary that keeps this from going wrong.
The agent reading the verdict is a consumer of the decision, not a thing the verdict controls. Relivio produces the verdict. It does not decide what happens next. The agent, or the policy the team wrote for it, reads STABLE and continues, reads RISK and pauses, reads WATCH and re-checks in a few minutes. That policy belongs to the team, not to the verdict layer.
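The policy this paragraph describes lives in the team's code, not in the verdict layer. Here is a minimal sketch of it, assuming the state has already been fetched as a string. The post doesn't say what should happen if WATCH persists, so the re-check budget and the fallback to `escalate` (hand the call back to a human) are assumptions, as are all the names.

```python
import time
from typing import Callable

# Team-owned settings; values and names are hypothetical.
RECHECK_SECONDS = 300
MAX_RECHECKS = 3


def decide(state: str) -> str:
    """Map a verdict state to the action the team chose for it."""
    if state == "STABLE":
        return "continue"
    if state == "RISK":
        return "pause"
    if state == "WATCH":
        return "recheck"
    raise ValueError(f"unknown verdict state: {state!r}")


def run_policy(get_state: Callable[[], str],
               sleep: Callable[[float], None] = time.sleep,
               max_rechecks: int = MAX_RECHECKS) -> str:
    """Read the verdict, re-checking on WATCH, until a terminal action is reached."""
    for attempt in range(max_rechecks + 1):
        action = decide(get_state())
        if action != "recheck":
            return action
        if attempt < max_rechecks:
            sleep(RECHECK_SECONDS)  # wait before reading the verdict again
    # Still WATCH after the budget: this sketch hands the decision back to a human.
    return "escalate"
```

Passing `get_state` and `sleep` in as functions keeps the policy testable and makes the boundary visible: the verdict layer only supplies the state, and every action string is the team's choice.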
I learned this the hard way in an earlier version, where the verdict layer reached across that line and tried to hold deploys itself. It quietly eroded trust, because once a layer both judges and acts, you can't tell which hat it's wearing when something goes wrong. Exposing the verdict as a read surface keeps the hats separate on purpose. The verdict is readable. What to do about it stays owned by whoever owns the deploy.
So the MCP surface is deliberately read-only in spirit. An agent pulls a verdict. It does not get told by Relivio what to do, and Relivio does not get to drive the rollout through the back door of "the agent read RISK so I paused it." The agent paused it. The team's policy said to. The verdict was just the input.
What can go wrong
A few failure modes show up once a verdict has a machine reader.
The verdict gets read as a command. An agent reading RISK and hard-stopping every time, with no policy layer in between, turns a judgment into an automatic gate. That's the coupling failure again, just relocated into the agent. The verdict should be an input to a policy, not a trigger wired straight to an action.
The read surface grows write verbs. It's tempting to add "and also let the agent acknowledge, or snooze, or override the verdict through the same surface." Each of those is a control verb sneaking into a read surface. Once they're there, the layer is back to owning decisions it shouldn't.
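One way to keep write verbs out is to make the read surface refuse them outright. A minimal sketch with Python's built-in `http.server`, assuming a single in-memory verdict and a hypothetical `/verdicts/latest` path: GET returns the structured object, and POST, PUT, PATCH, and DELETE get `405 Method Not Allowed` with `Allow: GET`. The same idea would carry over to an MCP server by exposing only tools that read.

```python
import json
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical latest verdict; a real service would read from its own store.
LATEST = {"verdict": "WATCH", "decision_tier": "tier-2",
          "affected_apis": ["checkout-api"],
          "recommended_action": "re-check in 5 minutes",
          "operator_steps": []}


class ReadOnlyVerdictHandler(BaseHTTPRequestHandler):
    """Serves the latest verdict and refuses every write verb."""

    def do_GET(self):
        if self.path != "/verdicts/latest":
            self.send_error(HTTPStatus.NOT_FOUND)
            return
        body = json.dumps(LATEST).encode()
        self.send_response(HTTPStatus.OK)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _refuse(self):
        # Drain any request body, then refuse: no acknowledge, snooze, or override.
        length = int(self.headers.get("Content-Length", 0))
        if length:
            self.rfile.read(length)
        self.send_response(HTTPStatus.METHOD_NOT_ALLOWED)
        self.send_header("Allow", "GET")
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_POST = do_PUT = do_PATCH = do_DELETE = _refuse

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


if __name__ == "__main__":
    ThreadingHTTPServer(("127.0.0.1", 8080), ReadOnlyVerdictHandler).serve_forever()
```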
The protocol becomes the positioning. Because MCP is having a moment, it's easy to start describing the whole thing as an MCP server. Then people file it under "agent tooling" and miss that the value is the verdict, which is just as useful read by a human through a REST call or a Slack message. The protocol is plumbing. The judgment is the product.
None of these are loud failures. They each quietly move the layer back toward being something it was trying not to be.
Where this leaves it
The verdict layer now has two readers it serves on purpose. A human reads a short message and moves on. An agent or workflow pulls the structured object, through REST or through MCP when the runtime speaks it, and branches on a value instead of squinting at a chart.
What it doesn't have is a verdict that only exists as a rendering. The 2 a.m. release agent that used to stop and wait for a human can now read the same answer the human would have read, in a shape it can act on, under a policy the team still owns.
I'm not sure MCP is where most of this traffic ends up; plenty of teams will just hit the REST endpoint and be done. But the underlying shift feels right and a little overdue: a post-deploy verdict isn't only a thing people look at. It's a decision other software has to be able to read.
Relivio is a small verdict layer for the first 15 minutes after a production deploy. It returns STABLE / WATCH / RISK with the affected APIs, a decision tier, and a recommended action, readable by humans and by the agents and workflows downstream of a deploy. relivio.dev