Your finance team built an agent that cut month-end close time by 40%. Procurement saw the results and built their own for intake-to-PO. Customer service is prototyping complaint resolution. IT operations wants incident triage.
Every team started with the same good intentions. Each chose their own stack. Finance logs to a spreadsheet. Procurement picked a different model gateway. Customer service stores context in local files. IT operations designed a custom approval mechanism.
Individually, every decision made sense. Collectively, you now have four agents that can't share tools, don't follow consistent access controls, produce incomparable audit logs, and can't be evaluated against each other. What felt like progress is actually fragmentation.
This is the moment most organizations discover the hard truth: you're not building multiple agent applications. You're failing to build an enterprise agent platform.

The reference architecture separates what agents do (runtime), what they know (context), and how they're controlled (governance) — three layers that scale independently.
The One Distinction That Changes Everything
The most common mistake in enterprise AI is confusing an agent application with an agent platform.
An agent application solves a specific business problem: AP exception handling, procurement intake, complaint resolution. It contains workflows, prompts, tools, and context unique to that domain. Users see it. They love it. They want more.
An agent platform is invisible to business users. It provides the shared capabilities every agent needs: identity and access control, model routing, tool registry, context retrieval, observability, evaluation, deployment, and policy enforcement.
Without this distinction, companies go in one of two wrong directions. Some build their first agent with so many custom components that it can't be reused. Others spend months building a generic platform that no use case ever adopts.
The right path is a minimum viable platform — born from real use cases, built with consistent architecture, and grown as needs emerge.
What the Runtime Layer Actually Needs
The runtime layer is where agents execute. It's not just "call a model and return an answer." Enterprise execution requires five components that most teams skip.
The model gateway is the most underrated component. It doesn't just connect to models — it selects the right model for each task, handles fallbacks, logs every call, and controls cost. Without it, every agent calls models differently, and you lose all visibility into spending and quality. Simple classification tasks can use a lightweight model. Complex reasoning across documents needs a stronger one. The gateway makes that decision consistently.
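To make the gateway's job concrete, here is a minimal sketch of routing, fallback, logging, and cost tracking in one place. The model names, the routing table, and the per-call costs are all hypothetical, and the backends are plain callables standing in for real model clients.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical routing table and per-call costs, for illustration only.
ROUTES = {
    "classification": ["small-model", "large-model"],
    "document_reasoning": ["large-model", "backup-large-model"],
}
COST_PER_CALL = {"small-model": 0.001, "large-model": 0.02, "backup-large-model": 0.02}


@dataclass
class ModelGateway:
    # Maps a model name to a callable that takes a prompt and returns text.
    backends: dict[str, Callable[[str], str]]
    call_log: list[dict] = field(default_factory=list)

    def complete(self, agent: str, task_type: str, prompt: str) -> str:
        """Try each model configured for the task type, in order, logging every attempt."""
        for model in ROUTES[task_type]:
            try:
                answer = self.backends[model](prompt)
            except Exception as exc:
                # A failed call is logged at zero cost, then the next model is tried.
                self.call_log.append(
                    {"agent": agent, "model": model, "ok": False, "error": str(exc), "cost": 0.0}
                )
                continue
            self.call_log.append(
                {"agent": agent, "model": model, "ok": True, "cost": COST_PER_CALL[model]}
            )
            return answer
        raise RuntimeError(f"all models failed for task type {task_type!r}")

    def spend_by_agent(self) -> dict[str, float]:
        """Sum logged cost per agent."""
        totals: dict[str, float] = {}
        for entry in self.call_log:
            totals[entry["agent"]] = totals.get(entry["agent"], 0.0) + entry["cost"]
        return totals


def unavailable(prompt: str) -> str:
    raise TimeoutError("upstream timeout")


gateway = ModelGateway(backends={
    "small-model": lambda p: "label: invoice",
    "large-model": unavailable,  # simulates an outage to show the fallback path
    "backup-large-model": lambda p: "reconciled",
})
label = gateway.complete("ap-exceptions", "classification", "Invoice or credit note?")
summary = gateway.complete("ap-exceptions", "document_reasoning", "Reconcile PO and invoice")
```

Because every agent goes through the same `complete` call, the log is comparable across teams, and the failed attempt on the primary model shows up next to the successful fallback.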
Tool registry and tool execution must be separate. The registry is a catalog: metadata, owner, permissions, risk tier. The execution service actually runs the tool after validation — checking parameters, permissions, policy, and sometimes requiring approval. An agent can request a purchase order draft, but the execution service rejects it if the vendor isn't approved. An agent can prepare a refund, but execution pauses if the amount exceeds threshold.
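The split between catalog and execution can be sketched roughly as follows. The tool names, agent names, vendor list, and refund threshold are assumptions made up for this example; a real execution service would load them from the registry and policy stores.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolSpec:
    """Registry entry: catalog metadata only, no execution logic."""
    name: str
    owner: str
    allowed_agents: frozenset
    risk_tier: str


# Hypothetical catalog, vendor list, and threshold for this example.
REGISTRY = {
    "draft_purchase_order": ToolSpec(
        "draft_purchase_order", "procurement-it", frozenset({"procurement-intake"}), "medium"
    ),
    "prepare_refund": ToolSpec(
        "prepare_refund", "cs-platform", frozenset({"complaint-resolution"}), "high"
    ),
}
APPROVED_VENDORS = {"V-100", "V-200"}
REFUND_APPROVAL_THRESHOLD = 500.0

# Handlers live with the execution service, not in the registry.
HANDLERS: dict[str, Callable[[dict], dict]] = {
    "draft_purchase_order": lambda p: {"po_draft_for": p["vendor_id"]},
    "prepare_refund": lambda p: {"refund_prepared": p["amount"]},
}


def execute(agent: str, tool_name: str, params: dict) -> dict:
    """Validate a tool request against the registry and policy, then run it."""
    spec = REGISTRY.get(tool_name)
    if spec is None:
        return {"status": "rejected", "reason": "unknown tool"}
    if agent not in spec.allowed_agents:
        return {"status": "rejected", "reason": "agent not permitted"}
    if tool_name == "draft_purchase_order" and params.get("vendor_id") not in APPROVED_VENDORS:
        return {"status": "rejected", "reason": "vendor not approved"}
    if tool_name == "prepare_refund" and params.get("amount", 0) > REFUND_APPROVAL_THRESHOLD:
        # The action is not run; it waits for a human decision.
        return {"status": "pending_approval", "reason": "amount exceeds threshold"}
    return {"status": "executed", "result": HANDLERS[tool_name](params)}
```

Calling `execute("procurement-intake", "draft_purchase_order", {"vendor_id": "V-999"})` is rejected because the vendor is not on the approved list, while a refund of 750.0 from the complaint-resolution agent comes back as `pending_approval` instead of running.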
State and memory serve different purposes. State stores deterministic workflow status — what step is the agent on, what decisions were made. Memory stores contextual information across sessions. Many implementations mix them, but state needs stricter governance and auditability. Memory can be more flexible but must respect permission and retention policies.
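One way to keep the two apart is to give them different shapes from the start. The sketch below is an assumption about how that might look: state is a single current step plus an append-only transition history, while memory items carry their own role restrictions and expiry. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class WorkflowState:
    """Deterministic status of one workflow run, with an append-only history."""
    run_id: str
    step: str = "received"
    history: list = field(default_factory=list)

    def advance(self, new_step: str, decided_by: str) -> None:
        # Every transition is recorded so the run can be audited later.
        self.history.append((self.step, new_step, decided_by))
        self.step = new_step


@dataclass(frozen=True)
class MemoryItem:
    text: str
    allowed_roles: frozenset
    expires_at: datetime


class MemoryStore:
    """Cross-session context that respects permissions and retention."""

    def __init__(self) -> None:
        self._items: list = []

    def remember(self, text: str, allowed_roles: frozenset, ttl: timedelta, now: datetime) -> None:
        self._items.append(MemoryItem(text, allowed_roles, now + ttl))

    def recall(self, role: str, now: datetime) -> list:
        # Return only items the role may see and that have not expired.
        return [
            item.text
            for item in self._items
            if role in item.allowed_roles and item.expires_at > now
        ]


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
state = WorkflowState("run-1")
state.advance("matched", "agent")
state.advance("approved", "j.doe")

memory = MemoryStore()
memory.remember("Vendor prefers email", frozenset({"ap-clerk"}), timedelta(days=30), start)
```

The state object answers "where is this run and who decided what", which is what an auditor asks. The memory store answers "what do we already know that this role may still see", which is a retention and permission question.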
Policy enforcement must be an explicit checkpoint near every tool call, data access, and action execution. If policy is just a document or scattered logic, it's too fragile for production.
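An explicit checkpoint can be as small as a decorator that every action must pass through. This is a sketch under stated assumptions: the rule, the action name, and the `change_ticket` parameter are invented for illustration, and a real platform would load rules from a policy registry rather than a module-level list.

```python
from functools import wraps


class PolicyViolation(Exception):
    """Raised when a policy rule blocks an action."""


def production_change_rule(action: str, params: dict):
    # Hypothetical rule: production restarts need a change ticket.
    if (
        action == "restart_service"
        and params.get("environment") == "production"
        and not params.get("change_ticket")
    ):
        return "production restart requires a change ticket"
    return None


POLICIES = [production_change_rule]
DECISIONS: list = []  # audit trail of (action, decision, reason)


def enforce(action: str):
    """Wrap a function so every call is checked against POLICIES first."""
    def decorator(func):
        @wraps(func)
        def wrapper(**params):
            for rule in POLICIES:
                violation = rule(action, params)
                if violation is not None:
                    DECISIONS.append((action, "deny", violation))
                    raise PolicyViolation(violation)
            DECISIONS.append((action, "allow", ""))
            return func(**params)
        return wrapper
    return decorator


@enforce("restart_service")
def restart_service(service: str, environment: str, change_ticket: str = "") -> str:
    return f"restarted {service} in {environment}"
```

With this in place, `restart_service(service="billing", environment="production")` raises `PolicyViolation`, and the denial is recorded in `DECISIONS` alongside the allowed calls. The point of the design is that the check cannot be skipped by an individual agent's prompt or workflow code.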
Context Is Where Agents Actually Fail
Most agent failures aren't the model's fault. The context was wrong, incomplete, outdated, or included data the agent had no permission to see.
Permission-aware retrieval is the single most important capability in this layer. An agent should never retrieve a document just because it's semantically similar. It must know who the user is, which agent is asking, what domain is being processed, and what data is permitted. HR agents shouldn't see compensation documents for cases they aren't handling. Customer service agents shouldn't access other customers' histories.
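As a rough illustration, the filter below applies permissions before ranking, so a highly similar but forbidden chunk never reaches the agent. The `Chunk` and `RequestContext` types are hypothetical, and the similarity scores are assumed to come from an upstream vector search.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    text: str
    domain: str
    allowed_roles: frozenset
    customer_id: Optional[str]  # None means not tied to one customer
    score: float  # similarity score, assumed precomputed by a vector search


@dataclass(frozen=True)
class RequestContext:
    user_roles: frozenset
    agent_domain: str
    customer_id: Optional[str] = None


def retrieve(candidates: list, ctx: RequestContext, top_k: int = 3) -> list:
    """Drop anything the caller may not see, then rank what is left."""
    permitted = [
        c for c in candidates
        if c.domain == ctx.agent_domain
        and c.allowed_roles & ctx.user_roles
        and (c.customer_id is None or c.customer_id == ctx.customer_id)
    ]
    return sorted(permitted, key=lambda c: c.score, reverse=True)[:top_k]


cs_roles = frozenset({"cs-agent"})
candidates = [
    Chunk("Refund policy", "customer_service", cs_roles, None, 0.70),
    Chunk("History for customer B", "customer_service", cs_roles, "cust-B", 0.95),
    Chunk("History for customer A", "customer_service", cs_roles, "cust-A", 0.80),
    Chunk("Compensation bands", "hr", frozenset({"hr-partner"}), None, 0.90),
]
ctx = RequestContext(user_roles=cs_roles, agent_domain="customer_service", customer_id="cust-A")
results = retrieve(candidates, ctx)
```

Here the two highest-scoring chunks, customer B's history and the HR compensation bands, are excluded even though they rank first on similarity. Only customer A's history and the general refund policy are returned.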
The ingestion pipeline handles extraction, chunking, metadata enrichment, sensitivity classification, versioning, and sync. Without disciplined ingestion, retrieval pulls stale or irrelevant content.
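A stripped-down version of those steps might look like the sketch below. It assumes word-count chunking, a keyword rule for sensitivity, and a content hash as the version marker; all three are simplifications chosen for the example, not recommendations, and the keyword list is hypothetical.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical keyword rule; real classifiers are usually more involved.
SENSITIVE_TERMS = ("salary", "iban", "passport")


@dataclass(frozen=True)
class IngestedChunk:
    doc_id: str
    version: str      # content hash, changes whenever the source text changes
    position: int     # order of the chunk within the document
    text: str
    sensitivity: str  # "restricted" or "internal"
    owner: str


def chunk_text(text: str, max_words: int = 50) -> list:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def classify(text: str) -> str:
    lowered = text.lower()
    return "restricted" if any(term in lowered for term in SENSITIVE_TERMS) else "internal"


def ingest(doc_id: str, text: str, owner: str, max_words: int = 50) -> list:
    """Chunk a document and attach version, sensitivity, and owner metadata."""
    version = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return [
        IngestedChunk(doc_id, version, position, chunk, classify(chunk), owner)
        for position, chunk in enumerate(chunk_text(text, max_words))
    ]
```

Comparing the stored `version` with a fresh hash of the source is one simple way to detect that a document has changed and needs re-syncing, which is what keeps retrieval from serving stale content.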
Three storage types serve different needs. Vector stores handle semantic search on unstructured content. Metadata catalogs provide structure: source, owner, validity date, classification, access rights. Knowledge graphs capture entity relationships — vendor to contract, product to customer, incident to policy. Not every use case needs a graph. Simple knowledge assistants work fine with vector plus metadata. But supply chain disruption, customer entitlement, or cross-entity finance exceptions benefit enormously from graph-based reasoning.
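To show the kind of question a graph answers that similarity search does not, here is a toy traversal over vendor, contract, product, and customer relationships. The entities and edges are invented, and a plain adjacency dictionary stands in for a real graph store.

```python
from collections import deque

# Hypothetical entity relationships: vendor -> contract -> product -> customers.
EDGES = {
    "vendor:acme": ["contract:c-17"],
    "contract:c-17": ["product:widget"],
    "product:widget": ["customer:globex", "customer:initech"],
}


def downstream(start: str) -> list:
    """Breadth-first walk returning every entity reachable from start."""
    seen = {start}
    queue = deque([start])
    reached = []
    while queue:
        node = queue.popleft()
        for neighbor in EDGES.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                queue.append(neighbor)
    return reached


affected = downstream("vendor:acme")
```

If the vendor is disrupted, `affected` lists the contract, the product, and both customers that depend on it. A vector search over documents would have no reliable way to follow that chain.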
The Governance Layer Nobody Wants to Build
Governance isn't bureaucracy. It's the difference between agents you trust and agents you can't deploy.
Agent registry is the official catalog: name, purpose, business and technical owners, risk tier, tools, data sources, autonomy level, lifecycle status, dependencies. Policy registry stores cross-agent rules: transaction thresholds, approval requirements, tool restrictions, risk classifications. Without registries, you have no inventory to govern.
Risk tiering prevents one-size-fits-all controls. An internal knowledge assistant in assist mode is different from an agent that executes ERP transactions. Drafting commentary is different from triggering refunds or production remediation. Tiering connects to approval workflows, observability depth, testing rigor, and release controls.
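The link between a registry entry, its risk tier, and the controls that follow can be sketched like this. The tiering rule, the tool names, and the control values are assumptions for illustration; each organization will define its own.

```python
from dataclasses import dataclass
from enum import Enum


class RiskTier(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class AgentRecord:
    """One entry in the agent registry (a subset of the fields named above)."""
    name: str
    purpose: str
    business_owner: str
    technical_owner: str
    tools: tuple
    autonomy: str   # "assist" or "act"
    lifecycle: str  # e.g. "pilot", "production"


# Hypothetical set of tools that change records in enterprise systems.
TRANSACTIONAL_TOOLS = {"post_erp_transaction", "trigger_refund", "remediate_production"}

# Hypothetical controls attached to each tier.
CONTROLS = {
    RiskTier.LOW: {"human_approval": False, "trace_sampling": 0.1, "test_suite": "smoke"},
    RiskTier.MEDIUM: {"human_approval": False, "trace_sampling": 0.5, "test_suite": "regression"},
    RiskTier.HIGH: {"human_approval": True, "trace_sampling": 1.0, "test_suite": "regression"},
}


def risk_tier(agent: AgentRecord) -> RiskTier:
    """Derive a tier from autonomy level and the kind of tools the agent holds."""
    touches_transactions = any(tool in TRANSACTIONAL_TOOLS for tool in agent.tools)
    if agent.autonomy == "act" and touches_transactions:
        return RiskTier.HIGH
    if agent.autonomy == "act" or touches_transactions:
        return RiskTier.MEDIUM
    return RiskTier.LOW


assistant = AgentRecord(
    "policy-helper", "Answer internal policy questions", "HR Ops", "platform-team",
    ("search_knowledge_base",), "assist", "production",
)
erp_agent = AgentRecord(
    "ap-exceptions", "Resolve AP exceptions", "Finance", "platform-team",
    ("search_knowledge_base", "post_erp_transaction"), "act", "pilot",
)
```

Under these assumed rules the knowledge assistant lands in the low tier with sampled tracing, while the ERP agent lands in the high tier and inherits human approval and full tracing without anyone configuring them by hand.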
The evaluation harness is the testing environment for agents before and after release. It covers golden datasets, scenario tests, policy compliance checks, regression tests when models or prompts change, and post-production sampling. Without it, you only know that agents are running, not whether quality is improving or degrading.
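At its smallest, a harness is a golden dataset plus a comparison between a baseline and a candidate. The sketch below assumes the agent can be called as a function from input text to a label; the dataset, the labels, and the function names are hypothetical.

```python
from typing import Callable

# Hypothetical golden dataset for an AP exception classifier.
GOLDEN = [
    {"input": "Invoice total does not match PO", "expected": "price_mismatch"},
    {"input": "No purchase order on file", "expected": "missing_po"},
    {"input": "Invoice total exceeds contract rate", "expected": "price_mismatch"},
]


def evaluate(agent: Callable[[str], str], cases: list) -> dict:
    """Run the agent on every case and summarize passes and failures."""
    failures = [case for case in cases if agent(case["input"]) != case["expected"]]
    passed = len(cases) - len(failures)
    return {
        "total": len(cases),
        "passed": passed,
        "pass_rate": passed / len(cases) if cases else 0.0,
        "failures": failures,
    }


def regressed(baseline: dict, candidate: dict, tolerance: float = 0.0) -> bool:
    """True if the candidate's pass rate dropped by more than the tolerance."""
    return candidate["pass_rate"] < baseline["pass_rate"] - tolerance
```

Running `evaluate` on the current agent and again after a model or prompt change, then passing both reports to `regressed`, gives a simple release gate. The same function can be pointed at sampled production traffic once labels are available.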
The Only Build Order That Works
The classic platform mistake is trying to build everything at once. It ends up slow, expensive, and disconnected from business needs.
- Start with the model gateway. Give every early agent a standard path for model access, logging, fallback, and cost control.
- Add tool registry and execution as soon as agents touch enterprise systems. Without this, integrations become ad hoc and unauditable.
- Next come logging, tracing, and observability. Before scaling, you must see what agents are doing, what they cost, and how fast they respond.
- Permission enforcement and policy checks follow when agents read sensitive data or execute actions.
- Evaluation harness becomes critical once model, prompt, or tool changes happen frequently.
- Memory service and agent registry can wait unless your use cases specifically need them.
The principle is simple: capabilities must be born from real use cases. Building a knowledge graph without a use case that needs complex relationships creates an expensive asset nobody uses. Building sophisticated memory for task-based, stateless agents is premature.
What This Means in Practice
Imagine starting with two use cases: AP exception handling and IT incident triage. From these, you'll likely discover the most urgent shared needs are model gateway, tool registry, observability, permission-aware retrieval, and approval workflows. Full knowledge graphs and cross-agent memory can wait.
A good reference architecture isn't the most complete on paper. It's the one that lets you answer one question with confidence: if we add ten new agents across finance, procurement, customer operations, and IT tomorrow, do we have the shared foundation to run them safely, at scale, without creating agent sprawl?
If the answer is no, your next priority isn't building more agents. It's strengthening the platform.
This article originally appeared on the author's blog.













