TL;DR: An agent gateway is a control layer specifically built for AI agents - handling authentication, routing, policy enforcement, observability, and orchestration for agent-to-model and agent-to-tool communication. An AI gateway routes LLM requests. An agent gateway governs autonomous agent behaviour. Once your agents start calling tools, spawning sub-agents, and running multi-step workflows, you need both.
For the first six months, our AI gateway solved every problem we threw at it. One endpoint for all LLM providers, rate limiting per team, cost attribution, guardrails on prompts and responses. It worked cleanly and we were happy with it.
Then we started deploying agents.
The first one was simple - a support triage agent that classified incoming tickets and routed them to the right team. It called one model, occasionally fetched a Confluence doc via an MCP tool, and wrote a label back to Jira. Fine. No problems.
The second one was a data pipeline agent that pulled from BigQuery, ran transformations, called two different models depending on the task type, and spawned sub-agents for specific segments of the work. That's when things got interesting in the wrong way.
The AI gateway saw all of this as a stream of LLM requests. It had no idea that request #47 was a sub-agent spawned by the parent agent that started request #12. It couldn't tell us what the agent had decided to do between step 3 and step 7. When the sub-agent went into an unexpected loop and burned through our token budget in 40 minutes, the gateway flagged the cost spike after the fact - there was no per-agent circuit breaker that would have caught it at step 4.
That's the problem an agent gateway solves. And it took us longer than I'd like to admit to realise these were genuinely different infrastructure requirements.
What is an Agent Gateway?
An agent gateway is a centralised control layer that sits between AI agents and everything they interact with: LLM models, external tools, other agents, and internal APIs.
The definition that clicked for me: think of what an API gateway does for microservices - centralised auth, routing, rate limiting, observability. Now imagine that same concept, but designed specifically for autonomous agents that maintain state across multiple steps, can spawn other agents, and interact with the world through tools rather than just returning text.
An API gateway handles a request and returns a response. An agent gateway handles a workflow - a sequence of decisions, tool invocations, model calls, and sub-agent delegations that might run for seconds or hours, where each step's output feeds the next.
The fundamental difference: LLM requests are stateless. Agent workflows are stateful. That distinction drives almost every architectural decision in an agent gateway.
How an agent gateway works
When an agent makes a request through a gateway, here's what happens:
1. Identity and auth
The gateway authenticates the agent - not just "is this a valid API key" but "which agent is this, which user or service account owns it, and what is it permitted to do?" This matters because in a multi-agent system, a sub-agent spawned by a parent agent should only inherit the permissions the parent was authorised to delegate, not the parent's full access. OAuth 2.0 identity injection ties every action to a verified identity before anything reaches a model or tool.
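To make the delegation rule concrete, here is a minimal sketch of how a gateway might compute a sub-agent's permissions. Everything in it is an assumption for illustration: the `AgentIdentity` shape, the scope strings, and the rule that sub-agents cannot re-delegate are mine, not any particular product's model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str
    owner: str                      # user or service account behind the session
    scopes: frozenset               # what this agent may do
    delegable: frozenset            # the subset it is allowed to pass on
    parent_id: Optional[str] = None


def spawn_sub_agent(parent: AgentIdentity, agent_id: str, requested: set) -> AgentIdentity:
    """Grant a sub-agent only scopes the parent holds AND may delegate."""
    granted = frozenset(requested) & parent.delegable & parent.scopes
    return AgentIdentity(
        agent_id=agent_id,
        owner=parent.owner,         # actions stay tied to the original owner
        scopes=granted,
        delegable=frozenset(),      # assumption: no re-delegation by default
        parent_id=parent.agent_id,
    )


# Hypothetical parent: can read and write BigQuery, but may only delegate reads.
parent = AgentIdentity(
    agent_id="pipeline-agent",
    owner="svc-data-eng",
    scopes=frozenset({"bigquery:read", "bigquery:write", "jira:write"}),
    delegable=frozenset({"bigquery:read"}),
)
child = spawn_sub_agent(parent, "segment-worker-1", {"bigquery:read", "bigquery:write"})
print(sorted(child.scopes))  # ['bigquery:read']
```

The sub-agent asked for write access, but the intersection drops it because the parent was never authorised to hand that scope on.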
2. Policy enforcement
Before the request reaches its destination, the gateway evaluates policies: does this agent have access to this tool? Has this workflow exceeded its token budget? Is this request within the agent's defined scope? These checks happen on the hot path: not as a post-hoc audit, but as a gate before execution.
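A rough sketch of what such a gate could look like, assuming a simple in-memory policy. The `AgentPolicy` and `WorkflowState` types, the tool names, and the token estimate parameter are all hypothetical; a real gateway would load policies from its control plane.

```python
from dataclasses import dataclass


@dataclass
class AgentPolicy:
    allowed_tools: set
    token_budget: int               # per-workflow cap


@dataclass
class WorkflowState:
    tokens_used: int = 0


def check_request(policy: AgentPolicy, state: WorkflowState, tool: str, estimated_tokens: int):
    """Return (allowed, reason). Runs before the call is forwarded."""
    if tool not in policy.allowed_tools:
        return False, f"tool '{tool}' is outside this agent's scope"
    if state.tokens_used + estimated_tokens > policy.token_budget:
        return False, "workflow token budget would be exceeded"
    return True, "ok"


policy = AgentPolicy(allowed_tools={"confluence_search", "jira_add_label"}, token_budget=20_000)
state = WorkflowState(tokens_used=19_500)

print(check_request(policy, state, "jira_add_label", 300))    # allowed
print(check_request(policy, state, "jira_add_label", 1_000))  # denied: budget
print(check_request(policy, state, "delete_record", 10))      # denied: scope
```

The point of the shape is that a denial is returned before anything is forwarded, so the model or tool never sees a request that fails policy.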
3. Protocol-aware routing
A standard API gateway understands plain HTTP request/response traffic natively. Agent traffic typically uses protocols layered on top of that, such as MCP (Model Context Protocol) or A2A (Agent2Agent), and their semantics are invisible to it. An agent gateway understands these protocols - it can route MCP tool calls to the right server, multiplex multi-agent conversations, and handle server-initiated messages (SSE, streaming updates) that agents send back to clients. A generic reverse proxy can't do this; it doesn't understand what an MCP session is.
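As an illustration of what "protocol-aware" means, here is a small sketch that inspects an MCP tool call and picks an upstream server. MCP messages are JSON-RPC 2.0, and a tool invocation uses the `tools/call` method with the tool name under `params.name`. The registry, the tool names, and the URLs are made up for the example.

```python
import json

# Hypothetical registry: tool name -> upstream MCP server URL.
TOOL_REGISTRY = {
    "confluence_search": "https://mcp.internal.example/confluence",
    "jira_add_label": "https://mcp.internal.example/jira",
}


def route_mcp_message(raw: str) -> str:
    """Return the upstream server for a JSON-RPC 2.0 tools/call request."""
    msg = json.loads(raw)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if msg.get("method") != "tools/call":
        raise ValueError(f"unhandled method: {msg.get('method')}")
    tool = msg["params"]["name"]
    if tool not in TOOL_REGISTRY:
        raise LookupError(f"tool not registered: {tool}")
    return TOOL_REGISTRY[tool]


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "jira_add_label", "arguments": {"issue": "SUP-101", "label": "billing"}},
})
print(route_mcp_message(request))  # https://mcp.internal.example/jira
```

A plain reverse proxy would route on the URL path alone; the decision here depends on the message body, which is the part a generic proxy never parses.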
4. State and session tracking
The gateway maintains context across the agent's workflow. It knows that request #47 belongs to the same session as request #12. This is what enables step-level observability — rather than a flat list of LLM calls, you get a trace that shows the agent's full execution path: which model was called at each step, which tool was invoked, what the intermediate results were, and where the workflow branched.
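Here is a minimal sketch of the difference between a flat log and a trace, assuming each recorded call carries a session ID and a parent ID. The `Span` fields and the example values are hypothetical; real tracing systems carry far more metadata.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]        # None for the root of a workflow
    session_id: str
    kind: str                       # "model_call", "tool_call", or "spawn"
    name: str


def render_trace(spans, session_id):
    """Return an indented execution tree for one session, in arrival order."""
    children = defaultdict(list)
    for s in spans:
        if s.session_id == session_id:
            children[s.parent_id].append(s)

    lines = []

    def walk(parent_id, depth):
        for s in children[parent_id]:
            lines.append("  " * depth + f"{s.kind}: {s.name}")
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


spans = [
    Span("12", None, "sess-a", "model_call", "plan"),
    Span("13", "12", "sess-a", "tool_call", "bigquery.query"),
    Span("30", "12", "sess-a", "spawn", "segment-worker"),
    Span("47", "30", "sess-a", "model_call", "summarise"),
    Span("48", None, "sess-b", "model_call", "unrelated"),
]
print("\n".join(render_trace(spans, "sess-a")))
```

With the parent links in place, request 47 shows up nested under the sub-agent that request 12 spawned, instead of as one more line in a list of model calls.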
5. Circuit breakers and execution controls
If an agent goes into a loop - calling the same tool repeatedly, or spawning sub-agents that spawn more sub-agents - the gateway detects this and intervenes before it becomes a $400 Tuesday incident. Hard budget caps apply per agent, per workflow, per team. Retry limits and timeout controls apply at the workflow level, not just per individual request.
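A workflow-level breaker can be surprisingly small. The sketch below is one possible shape, not how any specific gateway implements it: it combines a max-steps limit, a token cap, and a naive loop detector that counts identical tool calls. The class name, thresholds, and the idea of keying on `(tool, args_key)` are assumptions.

```python
from collections import Counter


class CircuitOpen(Exception):
    """Raised when a workflow must be stopped."""


class WorkflowBreaker:
    def __init__(self, max_steps: int, max_tokens: int, max_repeats: int):
        self.max_steps = max_steps
        self.max_tokens = max_tokens
        self.max_repeats = max_repeats
        self.steps = 0
        self.tokens = 0
        self.seen = Counter()       # (tool, args_key) -> times seen

    def record(self, tool: str, args_key: str, tokens: int) -> None:
        """Call once per step before forwarding it; raises CircuitOpen to halt."""
        self.steps += 1
        self.tokens += tokens
        self.seen[(tool, args_key)] += 1
        if self.steps > self.max_steps:
            raise CircuitOpen("max steps exceeded")
        if self.tokens > self.max_tokens:
            raise CircuitOpen("token budget exceeded")
        if self.seen[(tool, args_key)] > self.max_repeats:
            raise CircuitOpen(f"possible loop: {tool} repeated with identical arguments")


breaker = WorkflowBreaker(max_steps=50, max_tokens=100_000, max_repeats=3)
try:
    for _ in range(10):
        breaker.record("bigquery.query", "SELECT 1", tokens=500)
except CircuitOpen as exc:
    print(f"halted at step {breaker.steps}: {exc}")
```

In this run the fourth identical call trips the loop check, long before the step or token limits would. Counting exact repeats is a crude heuristic; it will miss loops where the arguments drift slightly, which is why the step and budget caps are there as a backstop.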
6. Audit trail
Every agent action, tool invocation, model call, and sub-agent delegation is logged with structured metadata — which agent, which step, which user identity, which tool, what the parameters were, what the result was. This is what a compliance audit actually needs: not "model X was called 4,000 times" but "agent Y, acting under user Z's identity, invoked the delete_record tool on table T at 14:23:07 with these parameters."
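For a sense of what one such log entry could contain, here is a sketch of a structured audit record serialised as a JSON line. The field names, the date, and the identifiers are placeholders I chose for the example, not a standard schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    timestamp: str                  # ISO 8601, UTC
    agent_id: str
    acting_for: str                 # user or service account identity
    session_id: str
    step: int
    action: str                     # "tool_call", "model_call", or "delegation"
    target: str                     # tool or model name
    parameters: dict
    result: str


def to_log_line(record: AuditRecord) -> str:
    """Serialise one record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


record = AuditRecord(
    # Placeholder date; only the time of day comes from the example in the text.
    timestamp=datetime(2025, 1, 14, 14, 23, 7, tzinfo=timezone.utc).isoformat(),
    agent_id="agent-y",
    acting_for="user-z",
    session_id="sess-a",
    step=7,
    action="tool_call",
    target="delete_record",
    parameters={"table": "T", "record_id": 123},
    result="ok",
)
print(to_log_line(record))
```

Because every entry carries the agent, the human or service identity, and the step, an auditor can answer "who did what, on whose behalf" with a filter rather than a reconstruction.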
Agent gateway vs AI gateway vs API gateway
This is the question I get most often when I explain what we built, so let me just address it directly.
| | API Gateway | AI Gateway | Agent Gateway |
|---|---|---|---|
| Designed for | REST/HTTP microservices | LLM requests (prompts → responses) | Autonomous agent workflows |
| Traffic model | Stateless request/response | Stateless request/response | Stateful, multi-step, long-running |
| Protocol understanding | HTTP, gRPC, REST | OpenAI-compatible API | MCP, A2A, JSON-RPC, SSE |
| Auth model | Service-level API keys | Per-user/team API keys | Per-agent identity with delegated permissions |
| Rate limiting unit | Requests/second | Tokens/minute, spend/team | Budget/workflow, steps/session |
| Observability | Request logs, latency | Token usage, cost per model | Step-level traces across agent lifecycle |
| Intervention capability | Block/allow per request | Budget caps, content guardrails | Circuit breakers, loop detection, workflow pause |
| Knows about tools | No | Partial (guardrails on tool calls) | Native — MCP tool registry, per-tool policies |
The short version: you need all three, and they live at different layers of the stack. The API gateway handles your microservices. The AI gateway handles your LLM traffic. The agent gateway handles your autonomous agents. In practice, the AI gateway and agent gateway are often the same system — but they need to be the same system that was designed for both, not an AI gateway with agent features bolted on.
The specific things that break without one
If you're running agents now and thinking "we don't have an agent gateway and things seem fine," here are the four things that will eventually catch you:
Credential sprawl in multi-agent systems. Each agent needs access to tools, models, and APIs. Without a gateway, each agent manages its own credentials. In a system with 12 agents that each talk to, say, 6 tools and APIs, that's potentially 12 × 6 = 72 credential relationships to track, rotate, and revoke. When an agent is decommissioned, you're hunting down API keys across every system it ever touched.
The M×N integration explosion. Each agent that needs to call a tool needs a direct integration with that tool. With 10 agents and 8 tools, that's 80 potential integration points to build, maintain, and monitor. An agent gateway — specifically, an MCP gateway layer within it — collapses this to each agent knowing one endpoint and each tool being registered once.
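The arithmetic behind that claim is simple enough to write down. This is just a back-of-the-envelope sketch with function names of my own choosing: direct wiring grows with the product of agents and tools, while a gateway grows with their sum.

```python
def direct_integrations(agents: int, tools: int) -> int:
    """Every agent integrates with every tool it might call."""
    return agents * tools


def gateway_integrations(agents: int, tools: int) -> int:
    """Each agent knows one endpoint; each tool is registered once."""
    return agents + tools


print(direct_integrations(10, 8))   # 80
print(gateway_integrations(10, 8))  # 18
```

Adding an eleventh agent costs 8 new integrations in the first model and 1 in the second, which is where the maintenance savings come from.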
No visibility into what agents actually did. Your AI gateway can tell you "4,000 model calls happened today." It cannot tell you "the data pipeline agent made 12 calls to BigQuery, then called GPT-4o three times with the results, then spawned a sub-agent that made 6 more calls before producing its output." That second kind of trace is what you actually need to debug a production agent incident.
Runaway workflows. Agents can loop. They can spawn sub-agents that spawn more sub-agents. They can get stuck in a retry pattern that burns tokens indefinitely. Without a circuit breaker at the workflow level, the first you hear about it is the cost spike notification — after the damage is done.
What we ended up using
After hitting the third and fourth of those problems simultaneously (a runaway sub-agent loop concurrent with an audit request we couldn't answer), we evaluated our options and ended up on TrueFoundry's Agent Gateway.
A few specific things that mattered:
Framework-agnostic registration. We have agents on LangGraph, one on CrewAI, and two custom HTTP agents. The agent gateway supports all of them through a single registration interface. Once an agent is registered, it gets a governed identity, shows up in the central registry, and its traffic flows through the same policy engine regardless of what framework built it.
Per-agent identity with OAuth 2.0 injection. Every agent action is tied to the authenticated user or service account that owns that agent's session. When a sub-agent is spawned, it inherits only the delegated permissions its parent was authorised to pass. The "over-privileged sub-agent" pattern, where a sub-agent ends up with broader access than the human who started the workflow, is closed by design.
Step-level traces across the full workflow. The observability goes beyond model call logs. We can see the full execution trace: every decision point, every tool invocation, every sub-agent delegation, every intermediate result. Debugging production incidents got significantly faster: instead of inferring what happened from model call logs, we read the actual execution trace.
Workflow-level circuit breakers. Budget caps apply per workflow, not just per team. Loop detection fires at the gateway level before a runaway agent creates an incident. We set a max-steps limit on long-running workflows as a safety backstop. These controls didn't exist in our AI gateway and would have been genuinely painful to implement at the application layer across 14 different agent implementations.
MCP tool governance in the same system. Because TrueFoundry's agent gateway sits alongside the MCP gateway, tool access policies for agents live in the same control plane as model access policies. When a new agent is onboarded, we define both in one place: which models it can call, which tools it can use, what its budget is, who can modify it. One system, one audit trail.
The honest tradeoff: there's more to configure upfront than a simple AI gateway setup. If you have two agents and they're not doing anything complex, it's probably overhead you don't need yet. The inflection point for us was around four agents with overlapping tool access and a compliance requirement that needed a proper audit trail. That's when the additional configuration started paying back.
When you actually need an agent gateway
The question isn't whether agents need governance - they do. The question is when the governance complexity justifies dedicated infrastructure.
You probably don't need an agent gateway yet if:
- You have one or two simple agents with no tool access or sub-agent delegation
- Your agents are purely internal, no compliance requirements, low stakes
- You're still in the prototyping phase and governance can wait
You need an agent gateway when:
- Multiple agents share access to the same tools and you need to control who can do what
- A runaway agent loop could cause a real operational or financial incident
- Someone (security, compliance, a customer) has asked you for an audit trail of agent actions
- Sub-agents are involved and you need to reason about delegated permissions
- You're deploying agents across multiple teams and need consistent governance without each team re-implementing it
The M×N integration explosion is usually the first concrete forcing function. The audit trail request is usually the second.
What's your current agent setup - are you handling governance at the application layer or have you moved to a dedicated agent gateway layer? Would love to hear how teams are solving the runaway loop problem without infrastructure-level circuit breakers. Drop it in the comments.













