I've been building agents that call paid APIs and settle real payments. The moment you let an agent spend money on its own, two questions show up and never leave:
- How do I stop it before it overspends, instead of finding out on next month's invoice?
- Who holds the funds while it does that?
Most of what I found answered the first by quietly taking over the second. To cap an agent's spend, you hand a service your wallet (keys, or a prefunded balance), and it meters from there. That works. But now a third party sits between your agent and its money, and for anything touching real settlement, that trade-off felt backwards: you get control over spend, but only by surrendering custody.
So I built it the other way around.
Enforcement without taking the wallet
The agent keeps its own API key and its own wallet. Nothing is handed over. A drop-in proxy sits in front of the agent's outbound calls and enforces the budget in the request path, so the call is checked before it leaves, not logged after it returns.
import { createPaymentClient } from "@gatewards/agent-sdk";
// the agent's own key; nothing is handed to us to hold
const client = createPaymentClient({ apiKey: process.env.AGENT_KEY, proxy: true });
const res = await client.get("https://api.example.com/data");
// over the cap → throws before the upstream is ever called
When an agent blows its budget, concretely:
- per-call or daily limit exceeded → 429, the request never reaches the upstream
- a runaway loop hammering the same resource → the pipeline auto-pauses with 423
No funds moved through us to make that happen. The gateway never generates, stores, or sees a private key. Onboarding an agent returns walletMode: "external", and that's the whole custody story.
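To make those two outcomes concrete, here is a minimal sketch of what an in-path budget check could look like. It is not the gateway's actual code: `BudgetGuard`, `BudgetPolicy`, the loop threshold and window, and the choice of integer micro-USDC amounts are all assumptions made up for illustration. The only things taken from above are the status codes and the rule that the decision happens before the upstream is called.

```typescript
// Hypothetical sketch. None of these names come from @gatewards/agent-sdk.
interface BudgetPolicy {
  perCallLimit: number;  // max price of one call, in micro-USDC (assumed unit)
  dailyLimit: number;    // max total spend per UTC day, in micro-USDC
  loopThreshold: number; // calls to one resource tolerated inside loopWindowMs
  loopWindowMs: number;
}

interface Decision {
  allowed: boolean;
  status?: 429 | 423; // set only when the call is blocked
  reason?: string;
}

class BudgetGuard {
  private policy: BudgetPolicy;
  private now: () => number;
  private spentToday = 0;
  private day = "";
  private paused = false;
  private recent = new Map<string, number[]>(); // resource -> recent call times

  constructor(policy: BudgetPolicy, now: () => number = () => Date.now()) {
    this.policy = policy;
    this.now = now;
  }

  // Runs before the upstream call. Spend is recorded only when allowed.
  check(resource: string, price: number): Decision {
    const t = this.now();
    const today = new Date(t).toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;      // new UTC day: reset the daily counter
      this.spentToday = 0;
    }

    if (this.paused) {
      return { allowed: false, status: 423, reason: "pipeline paused" };
    }

    // Runaway-loop check: count calls to the same resource inside the window.
    const stamps = (this.recent.get(resource) ?? []).filter(
      (s) => t - s < this.policy.loopWindowMs
    );
    stamps.push(t);
    this.recent.set(resource, stamps);
    if (stamps.length > this.policy.loopThreshold) {
      this.paused = true;
      return { allowed: false, status: 423, reason: "runaway loop detected" };
    }

    if (price > this.policy.perCallLimit) {
      return { allowed: false, status: 429, reason: "per-call limit exceeded" };
    }
    if (this.spentToday + price > this.policy.dailyLimit) {
      return { allowed: false, status: 429, reason: "daily limit exceeded" };
    }

    this.spentToday += price;
    return { allowed: true };
  }

  // An operator would call this to lift a 423 pause.
  resume(): void {
    this.paused = false;
  }
}
```

Note that nothing in this check needs a key or a balance: it only needs to know the price of the call and what has been spent so far, which is why enforcement and custody can be separated.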
The part I didn't expect to matter: dedup
Once every outbound call goes through one place, you notice how often agents in the same fleet make the identical call. When two agents ask the same API the same question inside the cache window, the second one shouldn't pay again.
So calls are deduplicated across the fleet. The first call is a miss and pays; an identical call within the window is a hit and returns free. In practice that's the gap between a 1284 ms paid round-trip and a 49 ms cached one. It isn't the headline, but it's the piece that quietly pays for itself as the fleet grows, and it only works because everything already flows through one gate.
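The miss-then-hit behavior can be sketched as a small shared cache keyed on the request itself rather than on the agent making it. This is an illustration under my own assumptions, not the gateway's implementation: `FleetDedupCache`, `keyFor`, the key format, and the eviction of failed calls are all hypothetical. One extra design choice here that goes beyond what's described above: the cache stores the promise, so two identical calls that overlap in time also share one paid request.

```typescript
// Hypothetical sketch. Names and key format are illustrative only.
type CacheStatus = "hit" | "miss";

interface Entry<T> {
  result: Promise<T>;
  expiresAt: number;
}

class FleetDedupCache<T> {
  private entries = new Map<string, Entry<T>>();
  private windowMs: number;
  private now: () => number;

  constructor(windowMs: number, now: () => number = () => Date.now()) {
    this.windowMs = windowMs;
    this.now = now;
  }

  // Identical requests map to the same key no matter which agent sends them.
  static keyFor(method: string, url: string, body: string = ""): string {
    return [method.toUpperCase(), url, body].join("\n");
  }

  // paidCall is invoked only on a miss; a hit reuses the stored result.
  call(
    key: string,
    paidCall: () => Promise<T>
  ): { result: Promise<T>; cache: CacheStatus } {
    const t = this.now();
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > t) {
      return { result: existing.result, cache: "hit" };
    }

    const result = paidCall();
    const entry: Entry<T> = { result, expiresAt: t + this.windowMs };
    this.entries.set(key, entry);

    // A failed call should not be served to the rest of the fleet: evict it.
    result.catch(() => {
      if (this.entries.get(key) === entry) this.entries.delete(key);
    });

    return { result, cache: "miss" };
  }
}
```

A caller would wrap its paid request in `paidCall` and read `cache` to know whether this particular call cost anything. The window length is a policy decision: longer windows save more but serve staler data.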
When it does pay
The agent pays directly: x402 on Base, USDC, from its own wallet. We're the rail and the governance layer, not the bank. This is on testnet today; mainnet is the next step, and I'd rather say that plainly than imply otherwise.
Where this honestly is
It runs. The governance and dedup above are live and tested. But it's early: pre-traction, still finding the operators it's actually for. I'm writing this partly to find them.
If you're running a fleet that spends real money on outside APIs and you've run into that same wall, I'd like to compare notes: what does your spend enforcement look like right now, and where does it break?