The Agent-of-Agents Problem: Orchestrating a Fleet

The single-agent era lasted about nine months. For most of 2025, the engineering challenge was getting one agent to reliably complete one task — loop design, memory management, harness selection. That problem isn't solved, but it's increasingly irrelevant to what's happening on the ground. Stripe now merges over 1,000 agent-authored pull requests per week. Ramp reports 30% of merged PRs in their frontend and backend repos come from agents. Clay runs billions of agent invocations monthly for GTM workflows. The agent is no longer the unit of concern — the fleet is.


And fleets have a management problem that single-agent design never anticipated. When you have five background agents running maintenance across repositories, responding to PR events, remediating vulnerabilities on a schedule, and triaging issues overnight, the question shifts from "how do I make this agent work" to "who watches the agents, and what happens when three of them conflict?"

This is the agent-of-agents problem: the orchestration layer that spawns, routes, observes, and kills a fleet of parallel agents. The tools are arriving fast — stablyai/orca shipped an open-source ADE for fleet management, GitHub launched Copilot /fleet for parallel agent execution, and Harrison Chase announced LangSmith Fleet as an enterprise workspace for managing agent fleets. The pattern is converging, and if you're building with agents in 2026, you need to understand it.

What fleet management actually means

Single-agent reliability is about loop design — how many iterations, when to bail, how to recover from tool failures. Fleet management is a different discipline entirely. It has three operational primitives that don't exist in single-agent architectures:

Observe. When five agents are running concurrently, you need a command center — not a terminal window. Which agent is stuck? Which one finished? Which one is about to merge a PR that conflicts with what another agent just pushed? Orca's mobile interface lets you monitor and steer agents from your phone. Airtable's Hyperagent provides a command center to oversee an entire fleet at scale.

Route. Not every task should go to every agent. The orchestrator pattern — one agent that receives the task, breaks it into subtasks, delegates each to a specialist worker, and assembles the results — is how you avoid paying GPT-4o prices for work that GPT-4o-mini handles fine. Microsoft's MDASH orchestrates 100+ specialized agents across an ensemble of models, cutting costs 40–60%.

Kill. When an agent goes off the rails — hallucinating file paths, writing to the wrong branch, spinning in a retry loop — you need to kill it without taking down the other four. Orca gives each agent its own Git worktree, so one agent's bad merge doesn't contaminate another's working directory.
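To make the three primitives concrete, here is a minimal sketch in Python. It is not how Orca, Hyperagent, or MDASH work internally. The `Fleet` class, the `route` function, and the `complexity` field are hypothetical names invented for illustration, and the only assumption is that each agent runs as its own OS process.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class Fleet:
    """Hypothetical in-memory registry of agent processes, keyed by agent name."""
    agents: dict = field(default_factory=dict)

    def spawn(self, name: str, command: list[str]) -> None:
        # Each agent is a separate OS process, so it can be stopped on its own.
        self.agents[name] = subprocess.Popen(command)

    def observe(self) -> dict[str, str]:
        # poll() returns None while a process is alive, otherwise its exit code.
        status = {}
        for name, proc in self.agents.items():
            code = proc.poll()
            status[name] = "running" if code is None else f"exited({code})"
        return status

    def kill(self, name: str) -> None:
        # Stop one agent; every other process in the registry is left alone.
        proc = self.agents[name]
        proc.kill()
        proc.wait()


def route(task: dict) -> str:
    """Pick a model tier from a hypothetical 'complexity' field on the task."""
    return "gpt-4o" if task.get("complexity", "low") == "high" else "gpt-4o-mini"


if __name__ == "__main__":
    # Stand-in "agents": Python processes that just sleep.
    sleeper = [sys.executable, "-c", "import time; time.sleep(30)"]
    fleet = Fleet()
    fleet.spawn("refactor", sleeper)
    fleet.spawn("triage", sleeper)
    fleet.kill("refactor")              # one agent goes off the rails
    print(fleet.observe())              # "triage" is still running
    print(route({"complexity": "high"}))
    fleet.kill("triage")                # clean up
```

A real command center would add logs, heartbeats, and per-agent workspaces, but the shape is the same: a registry you can query, a routing decision made before any tokens are spent, and a kill switch scoped to one agent.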

💡 Fleet management is NOT the same as multi-agent frameworks. Frameworks like LangGraph and CrewAI define agent topologies at the code level. Fleet management operates at the infrastructure level — observing, routing, and killing agents that may be running different frameworks, different models, and different codebases simultaneously.

The tools: what's shipping now

Orca: the open-source fleet ADE

Orca from stablyai is the most complete open-source fleet management tool available today. It's an Agent Development Environment (ADE) — not a framework, not a harness, but a desktop and mobile application for managing parallel agents.

The core capability: spin up Claude Code, Codex, OpenCode, or any of 27 supported CLI agents side-by-side, each in its own isolated Git worktree, and manage them from one interface. Fan one prompt across five agents, compare the results, and merge the winner.
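The fan-out-and-pick workflow can be sketched in a few lines of Python. This is an illustration of the idea, not Orca's implementation: `worktree_command`, `fan_out`, the `agent/<name>` branch naming, and the `../worktrees` directory are all hypothetical. The one real piece is `git worktree add -b <branch> <path>`, which creates a new branch checked out in its own directory.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def worktree_command(agent: str, base_dir: str = "../worktrees") -> list[str]:
    # Build (but do not run) the git command that gives one agent an isolated checkout.
    return ["git", "worktree", "add", "-b", f"agent/{agent}", f"{base_dir}/{agent}"]


def fan_out(
    prompt: str,
    agents: dict[str, Callable[[str], str]],
    score: Callable[[str], float],
) -> tuple[str, dict[str, str]]:
    """Send the same prompt to every agent in parallel and return the best-scoring one."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = {name: pool.submit(run, prompt) for name, run in agents.items()}
        results = {name: future.result() for name, future in futures.items()}
    winner = max(results, key=lambda name: score(results[name]))
    return winner, results


if __name__ == "__main__":
    # Stand-in agents: plain functions instead of real CLI agents.
    agents = {
        "terse": lambda prompt: "fix",
        "thorough": lambda prompt: f"fix for: {prompt}",
    }
    winner, results = fan_out("flaky test", agents, score=len)
    print(winner)  # → thorough
    print(worktree_command(winner))
```

Scoring by length is only a placeholder. In practice the score would more plausibly come from a test run or a human comparing the diffs side by side.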

GitHub Copilot /fleet

GitHub's answer is the /fleet command in Copilot CLI. Instead of working through tasks sequentially, Copilot analyzes your prompt, breaks it into subtasks, and dispatches multiple subagents to execute them simultaneously.

/fleet shines for intra-codebase parallelism — refactoring across multiple files, generating documentation for several components, implementing features that span API, UI, and tests. It's less suited for cross-repo fleet management or heterogeneous agent stacks.

LangSmith Fleet

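Harrison Chase's LangSmith Fleet takes the enterprise angle. It separates agents into two types: "Claws", which run with fixed credentials, and "Assistants", which act on behalf of an end user. The distinction matters for compliance, because it determines whose identity and permissions each action runs under.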

Ona: the infrastructure layer

Ona provides the plumbing — scheduling, sandboxing, monitoring — that lets you run agent fleets in production without building the orchestration layer yourself.

The patterns: how to wire a fleet

Three patterns matter for fleet management:

Supervisor/Worker: One orchestrator delegates subtasks to specialist workers. The default pattern — MDASH, Copilot /fleet, and most custom implementations start here.

Parallel Fan-Out: Multiple agents execute the same task independently, and you pick the best result. Running five agents costs roughly 5x the tokens in exchange for a dramatically higher quality ceiling.

Hierarchical Delegation: A tiered structure where higher-level agents supervise teams. Emerges naturally past 10 agents.
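Supervisor/worker is the simplest of the three to sketch. The version below is a minimal illustration under stated assumptions, not the design of MDASH or Copilot /fleet: `supervise`, `plan`, and the role names are hypothetical, and workers are plain functions standing in for real agents.

```python
from typing import Callable

Worker = Callable[[str], str]
Planner = Callable[[str], list[tuple[str, str]]]


def supervise(task: str, plan: Planner, workers: dict[str, Worker]) -> list[str]:
    """Split a task into (role, subtask) pairs and delegate each to the matching worker."""
    results = []
    for role, subtask in plan(task):
        worker = workers.get(role)
        if worker is None:
            # Fail loudly rather than silently dropping a subtask.
            raise KeyError(f"no worker registered for role {role!r}")
        results.append(worker(subtask))
    return results


if __name__ == "__main__":
    def plan(task: str) -> list[tuple[str, str]]:
        # A real planner would be a model call; this one is hard-coded.
        return [("api", f"{task}: endpoint"), ("tests", f"{task}: tests")]

    workers = {
        "api": lambda subtask: f"[api] {subtask}",
        "tests": lambda subtask: f"[tests] {subtask}",
    }
    print(supervise("add login", plan, workers))
```

Because workers are just callables, a worker can itself wrap another `supervise` call with its own planner and team. That is one way hierarchical delegation can grow out of the supervisor/worker pattern without a separate abstraction.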

⚠️ 40% of multi-agent pilots fail within six months. The failure mode is almost always orchestration complexity, not individual agent capability. Start with supervisor/worker.

What Stripe and Ramp learned in production

Two teams shared production lessons at the Background Agents Summit in May 2026:

Stripe built its harness on Goose, running one-shot agents in air-gapped sandboxes with 400+ internal tools via MCP. Key insight: air-gapped sandboxes are non-negotiable.

Ramp built around the principle that agents should verify their own work. 30% of merged PRs come from agents, with a lower revert rate than human-authored code.

Both teams converged on the same conclusion: the fleet needs an immune system, not just a nervous system.
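Neither team's implementation is described here, so the following is only a guess at what "agents should verify their own work" could look like as a control loop. `verified_attempt`, `generate`, and `verify` are hypothetical names. The assumption is that a verifier returns `None` when every check passes and a failure message otherwise.

```python
from typing import Callable, Optional


def verified_attempt(
    generate: Callable[[Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Optional[str]:
    """Retry until the agent's own checks pass; return None to signal escalation."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = generate(feedback)   # the agent sees the previous failure, if any
        feedback = verify(candidate)     # None means all checks passed
        if feedback is None:
            return candidate
    return None  # checks never passed: hand off to a human instead of merging


if __name__ == "__main__":
    # Stand-in agent: gets it wrong first, then fixes it once it sees feedback.
    def generate(feedback: Optional[str]) -> str:
        return "patch-v1" if feedback is None else "patch-v2"

    def verify(candidate: str) -> Optional[str]:
        return None if candidate == "patch-v2" else "tests failed"

    print(verified_attempt(generate, verify))  # → patch-v2
```

The design choice worth noting is the bounded retry count. An unbounded loop is exactly the "spinning in a retry loop" failure that the kill primitive exists to stop.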

What this means for builders

Decision tree:

All Copilot? → Use /fleet
Mixed agent stack? → Use Orca
Enterprise compliance? → Watch LangSmith Fleet
Building your own? → Start with supervisor/worker, use Ona's guides as a blueprint

The agent-of-agents pattern is the inevitable consequence of agents that actually work. The tools are here — the question is how fast your fleet grows past what manual oversight can handle.

Originally published at AgentConn