Multi-Agent Orchestration: When One Agent Isn't Enough

The mega-agent that did everything badly

I once shipped a single agent with a 4,000-token system prompt, 27 tools, and instructions to "research the topic, draft the post, fact-check it, format the Markdown, and publish." On paper it was one tidy assistant. In practice it was a committee of one trapped in a single brain. It would research, forget halfway through what it was researching, call the publish tool before the draft existed, re-read the same docs four times, and burn 60,000 tokens producing a post that contradicted itself.

The fix wasn't a smarter model or a longer prompt. It was structure. I split that one overloaded agent into a researcher, a writer, and an editor, with a small supervisor deciding who runs next. Each piece had a tight prompt, a handful of tools, and one job. The output quality jumped and, counterintuitively, the cost dropped, because nobody was re-deriving context they'd already lost.

That's the whole story of multi-agent orchestration: not "more agents are better," but "the right decomposition beats one mind doing everything." This article is the map: the patterns, the wiring, and the failure modes that make people swear off multi-agent forever.

Who this is for

You've built a [single agent](/blog/building-ai-agents), a loop that calls tools until it's done, and it's straining under too many responsibilities. You're wondering whether to split it up, and you don't want to trade one problem (a confused agent) for two (confused agents that also loop forever and cost a fortune). Comfort with an agent loop and tool calls is assumed; no orchestration-framework experience required.

The principle: decompose, then coordinate

Don't make one agent smarter. Decompose the problem into roles, then coordinate them. The hard part of multi-agent isn't the agents; it's the coordination.
The orchestration mindset

A single agent fails at big tasks for the same reason a single person fails at running a company alone: context limits and attention. Cram research, writing, and review into one prompt and the model context-switches badly, loses track of intermediate state, and dilutes its own instructions. Split the work into focused roles and each one stays sharp, but now you have a new problem: getting them to work together without stepping on each other.

On a human team	In a multi-agent system
A project manager who assigns work and reviews results	Supervisor / router agent
Specialists: a researcher, a writer, an editor	Worker / specialist agents with narrow tools
A shared project doc everyone reads and updates	Shared state / scratchpad memory
"Hand this off to legal"	Handoff: one agent transfers control to another
A budget and a deadline nobody can exceed	Token budget + step cap per run

A multi-agent system is a team, not a genius. The manager doesn't do the work; they decide who does.

Keep that team in your head as we go. Most orchestration questions reduce to "how would a well-run team handle this?": who decides, who specializes, what's written down, and when do you stop.

The picture: a supervisor routing to specialists

The most common, and most forgiving, pattern is the supervisor (also called router or orchestrator). One agent owns the goal and the control flow. It doesn't do the domain work itself; it inspects the task, routes to the right specialist, collects each result into shared state, and decides whether to route again or finish.

Supervisor pattern: the orchestrator routes each turn to a specialist, reads back results from shared state, and aggregates a final answer.

1. Supervisor receives the goal. It gets the user's task plus the current shared state (empty on the first turn) and a list of available specialists.
2. Supervisor decides who runs next. Based on what's already in state, it picks a specialist, or decides the work is done. This is a routing decision, not domain work.
3. The specialist executes. The chosen worker runs its own tool loop with a narrow prompt and small toolset, then writes its result into shared state.
4. Control returns to the supervisor. It reads the updated state and loops back to step 2: route again or finish.
5. Aggregate and return. When the goal is satisfied (or the step cap is hit), the supervisor composes the final answer from shared state.

Designing a multi-agent system

Before you wire anything, design the team on paper. Skipping this step is how you end up with five agents that overlap, argue, and loop. The order matters.

1. Write the goal as one sentence. If you can't state the end state in a sentence, you can't tell a supervisor when to stop. "Produce a fact-checked, formatted blog post" is a goal; "help with content" is not.
2. Decompose into roles, not steps. Roles are durable (researcher, writer, editor); steps are transient. Give each role a single responsibility and the minimum tools to fulfill it.
3. Pick a coordination pattern. Supervisor, pipeline, parallel, or handoff, chosen by how the roles depend on each other (see the table below). Default to supervisor when unsure.
4. Define the shared state schema. Decide exactly what each agent reads and writes. A typed scratchpad beats "they share the whole conversation", which is how context explodes.
5. Set hard limits. A per-run step cap and a token budget, decided up front. These are not optional; they are the only thing standing between you and an infinite loop billed by the token.
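One way to keep the paper design honest is to write it down as data before any agent code exists. The sketch below is only an illustration under assumptions: `Role`, `TeamDesign`, `problems`, and the tool names (`web_search`, `markdown_lint`) are hypothetical and do not come from any framework. It records the five decisions above and flags two common design smells: two roles writing the same state key, and a role reading a key nobody produces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    """One specialist: a single responsibility and the minimum tools for it."""
    name: str
    responsibility: str
    tools: tuple[str, ...]
    reads: tuple[str, ...]   # state keys this role may read
    writes: str              # the one state key this role writes


@dataclass(frozen=True)
class TeamDesign:
    goal: str                # step 1: the end state in one sentence
    pattern: str             # step 3: supervisor | pipeline | parallel | handoff
    roles: tuple[Role, ...]  # step 2: roles, not steps
    max_steps: int = 6       # step 5: hard limits, decided up front
    token_budget: int = 80_000

    def problems(self) -> list[str]:
        """Return design smells: duplicate writers or reads nobody produces."""
        issues = []
        seen_writes = set()
        for role in self.roles:
            if role.writes in seen_writes:
                issues.append(f"two roles write '{role.writes}'")
            seen_writes.add(role.writes)
        produced = {"goal"} | seen_writes  # the goal is supplied by the caller
        for role in self.roles:
            for key in role.reads:
                if key not in produced:
                    issues.append(f"{role.name} reads '{key}' but no role writes it")
        return issues


design = TeamDesign(
    goal="Produce a fact-checked, formatted blog post",
    pattern="supervisor",
    roles=(
        Role("researcher", "gather sourced facts", ("web_search",), ("goal",), "research"),
        Role("writer", "draft from research", (), ("goal", "research"), "draft"),
        Role("editor", "fact-check and format", ("markdown_lint",), ("draft", "research"), "review"),
    ),
)
print(design.problems())  # an empty list means no overlaps or dangling reads
```

If `problems()` returns anything, the team probably needs another pass on paper before it is worth wiring up.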

Four patterns and when to use each

There are four orchestration shapes worth knowing. They're not exclusive; real systems nest them (a supervisor whose "researcher" is itself a parallel fan-out). Start with the simplest that fits the dependency structure of your roles.

Pattern	Shape	Use it when
Supervisor / router	One coordinator routes to specialists each turn	Roles are dynamic: you don't know up front which specialists are needed or in what order. The safe default.
Sequential pipeline	A → B → C, each output feeds the next	Stages have a fixed, linear dependency (extract → transform → summarize). Simple, cheap, deterministic.
Parallel workers	Fan out to N agents at once, then aggregate	Subtasks are independent (summarize 10 docs, review a diff from 3 angles). Cuts latency; watch combined cost.
Handoff / swarm	Agents transfer control peer-to-peer, no central boss	Conversation-style flows where the active specialist changes (triage → billing → tech support). Powerful, but easiest to let run unbounded.

Pick the pattern by how your roles depend on each other and who decides control flow.

Default to the simplest

If a sequential pipeline fits, use it: it's deterministic, debuggable, and cheap. Reach for a supervisor when routing must be dynamic, and for handoff/swarm only when control genuinely needs to move peer-to-peer. Every step up in flexibility costs you predictability.
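The two simplest shapes, the sequential pipeline and parallel workers, fit in a few lines each. This is a minimal sketch, not a reference implementation: the agents are stub functions standing in for model calls so it runs offline, and `run_pipeline`, `run_parallel`, and the stub names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Agent = Callable[[str], str]  # stand-in for "a single agent": text in, text out


def run_pipeline(stages: list[Agent], task: str) -> str:
    """Sequential pipeline: A -> B -> C, each output feeds the next stage."""
    result = task
    for stage in stages:
        result = stage(result)
    return result


def run_parallel(workers: list[Agent], task: str,
                 aggregate: Callable[[list[str]], str]) -> str:
    """Parallel workers: fan out the same task, then aggregate the results."""
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        # pool.map returns results in the same order as `workers`
        results = list(pool.map(lambda worker: worker(task), workers))
    return aggregate(results)


# Stub agents so the sketch runs offline; real ones would call a model.
def extract(text: str) -> str:
    return text.strip()


def transform(text: str) -> str:
    return text.lower()


def summarize(text: str) -> str:
    return text[:20]


def security_review(diff: str) -> str:
    return f"security: ok ({len(diff)} chars)"


def style_review(diff: str) -> str:
    return f"style: ok ({len(diff)} chars)"


print(run_pipeline([extract, transform, summarize], "  DNS Breaks Deploys  "))
print(run_parallel([security_review, style_review], "diff --git a b",
                   aggregate="\n".join))
```

The pipeline has no routing decision at all, which is why it is the cheapest and most predictable option. The fan-out lowers wall-clock time, but every worker still bills its own tokens.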

A supervisor in code

Here's a stripped-down supervisor routing to two specialists, with a hard step cap. No framework, just the control loop, so you can see the moving parts. It uses the Claude API; the specialists are ordinary single agents, and the supervisor's only job is to choose one and stop on time.

supervisor.py

python

from anthropic import Anthropic

client = Anthropic()
MODEL = "claude-sonnet-4-5"
MAX_STEPS = 6          # hard loop cap, non-negotiable
TOKEN_BUDGET = 80_000  # stop if we blow the budget

def tokens_used(msg) -> int:
    """Input plus output tokens reported for one API response."""
    return msg.usage.input_tokens + msg.usage.output_tokens

def researcher(state: dict) -> tuple[str, int]:
    """Specialist: gathers facts. Narrow prompt; a real one gets search tools only."""
    msg = client.messages.create(
        model=MODEL, max_tokens=1024,
        system="You are a researcher. Return concise, sourced facts.",
        messages=[{"role": "user", "content": state["goal"]}],
    )
    return msg.content[0].text, tokens_used(msg)

def writer(state: dict) -> tuple[str, int]:
    """Specialist: drafts prose from the research already in state."""
    msg = client.messages.create(
        model=MODEL, max_tokens=2048,
        system="You are a writer. Draft from the provided research only.",
        messages=[{"role": "user",
                   "content": f"Goal: {state['goal']}\n\nResearch:\n{state.get('research', '')}"}],
    )
    return msg.content[0].text, tokens_used(msg)

SPECIALISTS = {"researcher": researcher, "writer": writer}

def supervisor(goal: str) -> str:
    state = {"goal": goal}       # shared state, the scratchpad
    spent = 0
    for step in range(MAX_STEPS):
        if spent > TOKEN_BUDGET:
            break               # budget guard, not just step guard
        decision = client.messages.create(
            model=MODEL, max_tokens=256,
            system=(
                "You are a supervisor. Reply with ONE word: the next "
                "specialist to run (researcher | writer), or DONE if the "
                "draft in state is complete. Do no work yourself."
            ),
            messages=[{"role": "user", "content": f"State keys: {list(state)}"}],
        )
        spent += tokens_used(decision)
        choice = decision.content[0].text.strip().lower()
        if choice == "done":
            break
        if choice in SPECIALISTS:
            # write the result back into shared state, keyed by role
            key = "research" if choice == "researcher" else "draft"
            state[key], used = SPECIALISTS[choice](state)
            spent += used       # specialist calls count against the budget too
    return state.get("draft", "(no draft produced)")

if __name__ == "__main__":
    print(supervisor("Write a short post on why DNS breaks deploys."))

Notice what the supervisor never does: domain work. It routes, and it stops. The MAX_STEPS loop and the TOKEN_BUDGET check are the load-bearing safety here; remove them and a confused supervisor can route forever. In production you'd persist state to a store and structure the routing decision as a tool call rather than parsing a word, but the skeleton is exactly this.
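For the "tool call rather than parsing a word" variant, one possible shape is sketched below. It assumes the Messages API's `tools` and `tool_choice` parameters and the `tool_use` content block as I understand them, so check the current API reference before relying on it. `ROUTE_TOOL`, `route`, and `FakeMessages` are hypothetical names. The client is passed in as an argument so the function can be dry-run against a fake without a network call.

```python
from types import SimpleNamespace

# Hypothetical routing tool: the model must pick one value from a closed enum.
ROUTE_TOOL = {
    "name": "route",
    "description": "Choose the next specialist to run, or finish.",
    "input_schema": {
        "type": "object",
        "properties": {
            "next": {"type": "string", "enum": ["researcher", "writer", "done"]},
        },
        "required": ["next"],
    },
}


def route(client, model: str, state_keys: list[str]) -> tuple[str, int]:
    """Ask for a routing decision as a forced tool call.

    `client` is expected to behave like anthropic.Anthropic().
    Returns (choice, tokens_used).
    """
    msg = client.messages.create(
        model=model, max_tokens=256,
        system="You are a supervisor. Route only; do no work yourself.",
        tools=[ROUTE_TOOL],
        tool_choice={"type": "tool", "name": "route"},  # force this tool
        messages=[{"role": "user", "content": f"State keys: {state_keys}"}],
    )
    used = msg.usage.input_tokens + msg.usage.output_tokens
    for block in msg.content:
        if block.type == "tool_use" and block.name == "route":
            choice = block.input.get("next")
            if choice in ("researcher", "writer", "done"):
                return choice, used
    return "done", used  # fail closed: an unreadable decision ends the run


# Offline stand-in that mimics the response shape, for a dry run only.
class FakeMessages:
    def create(self, **kwargs):
        block = SimpleNamespace(type="tool_use", name="route",
                                input={"next": "writer"})
        usage = SimpleNamespace(input_tokens=40, output_tokens=8)
        return SimpleNamespace(content=[block], usage=usage)


fake_client = SimpleNamespace(messages=FakeMessages())
print(route(fake_client, "claude-sonnet-4-5", ["goal", "research"]))
```

The closed enum removes the string-parsing edge cases (extra punctuation, a full sentence instead of one word), and falling back to "done" means a malformed decision stops the run instead of spinning.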

Shared state, cost, and loop control

Coordination lives and dies on three things: how agents share state, how you cap cost, and how you stop loops. Get these wrong and multi-agent becomes the chaos people warn you about.

Shared state and memory

Agents need a way to pass results to each other. The naive approach (sharing the entire conversation history between every agent) explodes context and cost, because each agent re-reads everyone else's reasoning. Prefer a typed scratchpad: a small structured object ({goal, research, draft, review}) where each agent reads only the keys it needs and writes one key back. This is the difference between a team sharing a one-page brief and cc'ing everyone on every email. For long-running systems, back the scratchpad with a store (Redis, a DB row, MCP resources) so state survives restarts and you can inspect it.
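A typed scratchpad can be as small as a `TypedDict` plus a per-role read list. The sketch below is illustrative only: `READS`, `view_for`, `save`, and `load` are hypothetical helpers, and a JSON file in the temp directory stands in for a real store such as Redis or a database row.

```python
import json
import tempfile
from pathlib import Path
from typing import TypedDict


class Scratchpad(TypedDict, total=False):
    """Shared state: every key is optional until some role writes it."""
    goal: str
    research: str
    draft: str
    review: str


# Hypothetical access rules: the only keys each role gets to see.
READS = {
    "researcher": ("goal",),
    "writer": ("goal", "research"),
    "editor": ("draft", "research"),
}


def view_for(role: str, state: Scratchpad) -> dict[str, str]:
    """Return only the keys this role is allowed to read."""
    return {key: state[key] for key in READS[role] if key in state}


def save(state: Scratchpad, path: Path) -> None:
    """Persist the scratchpad so it survives restarts and can be inspected."""
    path.write_text(json.dumps(state, indent=2))


def load(path: Path) -> Scratchpad:
    return json.loads(path.read_text())


state: Scratchpad = {
    "goal": "Explain why DNS breaks deploys",
    "research": "Resolvers cache records until the TTL expires.",
    "draft": "First draft...",
}
path = Path(tempfile.gettempdir()) / "scratchpad-demo.json"
save(state, path)
restored = load(path)
print(view_for("writer", restored))  # goal and research only, never the draft
```

Each specialist's prompt is then built from `view_for(role, state)` rather than from the full history, which keeps every call's input small and predictable.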

Cost and loop control

Multi-agent cost is multiplicative, not additive: a supervisor that calls three specialists, each of which loops a few times, can 10x your token spend before you notice. Three guards, all mandatory: a step cap (max routing turns per run), a token budget (abort if exceeded, like the spent > TOKEN_BUDGET check above), and bounded handoffs (a max number of control transfers, so a swarm can't ping-pong forever). Use the cheapest model that works for routing decisions; the supervisor's job is classification, not generation, so it rarely needs your most expensive model.
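The step cap and token budget can live in one small object that every model call reports to. This is a sketch under assumptions, not a library API: `RunGuard` and `LimitExceeded` are hypothetical names, and the token counts in the demo loop are invented to show the abort path. A handoff cap would follow the same pattern.

```python
from dataclasses import dataclass


class LimitExceeded(RuntimeError):
    """Raised when a run crosses one of its hard limits."""


@dataclass
class RunGuard:
    max_steps: int = 6
    token_budget: int = 80_000
    steps: int = 0
    tokens: int = 0

    def step(self) -> None:
        """Call once per routing turn."""
        self.steps += 1
        if self.steps > self.max_steps:
            raise LimitExceeded(f"step cap {self.max_steps} exceeded")

    def spend(self, tokens: int) -> None:
        """Call after every model response, supervisor and specialists alike."""
        self.tokens += tokens
        if self.tokens > self.token_budget:
            raise LimitExceeded(f"token budget {self.token_budget} exceeded")


# Demo: a confused supervisor that never says DONE (numbers are made up).
guard = RunGuard(max_steps=6, token_budget=20_000)
try:
    while True:
        guard.step()
        guard.spend(500)     # routing call
        guard.spend(6_000)   # specialist call
except LimitExceeded as stop:
    print(f"aborted after {guard.steps} steps, {guard.tokens} tokens: {stop}")
```

Raising an exception instead of returning a flag makes the limit hard to ignore: a caller has to handle the abort explicitly.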

The unbounded handoff trap

Swarm/handoff systems feel elegant until two agents hand off to each other in a loop: triage sends to billing, billing sends back to triage, and you discover it at the end of the month on the invoice. Always cap total handoffs per session and log every transfer.
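A capped, logged handoff loop might look like the sketch below. The agents are stubs that deliberately ping-pong, and `run_handoffs` and the `HandoffAgent` alias are hypothetical names, not part of any swarm framework.

```python
import logging
from typing import Callable, Optional

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("handoffs")

# A handoff agent returns the name of the next agent, or None when it is done.
HandoffAgent = Callable[[str], Optional[str]]


def run_handoffs(agents: dict[str, HandoffAgent], start: str, message: str,
                 max_handoffs: int = 5) -> list[str]:
    """Follow peer-to-peer handoffs, logging each one, until done or capped."""
    path = [start]
    current = start
    for _ in range(max_handoffs):
        nxt = agents[current](message)
        if nxt is None:
            return path  # the active agent finished the conversation
        log.info("handoff %d: %s -> %s", len(path), current, nxt)
        path.append(nxt)
        current = nxt
    log.warning("handoff cap %d reached; stopping at %s", max_handoffs, current)
    return path


# Stub agents that bounce the task between them forever.
def triage(message: str) -> Optional[str]:
    return "billing"


def billing(message: str) -> Optional[str]:
    return "triage"


path = run_handoffs({"triage": triage, "billing": billing},
                    start="triage", message="Where is my refund?",
                    max_handoffs=4)
print(path)
```

The returned path doubles as an audit trail: a session that ends at the cap with an alternating path is a loop worth investigating.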

Common mistakes that cost hours (and dollars)

Going multi-agent when one would do. Multi-agent adds coordination overhead, latency, and cost. If a single well-scoped agent with good tools handles the task, ship that. Reach for multiple agents only when one is genuinely overloaded.
No token budget. Without a hard spend ceiling, a confused supervisor or a looping swarm will keep calling models until something, usually your bill, forces the issue. Set a budget per run and abort on breach.
Unbounded handoffs. Peer-to-peer control transfer with no cap is how you get two agents bouncing a task between them forever. Cap total handoffs; log each one.
Sharing the whole conversation as state. Passing every agent's full history to every other agent explodes context and makes each agent re-read everyone else's reasoning. Use a typed scratchpad; share only what each role needs.
A supervisor that does the work. If your coordinator starts researching or writing instead of routing, you've recreated the mega-agent, now with extra latency. Keep it to routing and aggregation only.
No step cap. The single most common cause of a runaway agent system. A MAX_STEPS loop bound is the cheapest insurance you'll ever write.

Takeaways

The whole article in seven lines

One overloaded agent does everything badly; decompose into focused roles, then coordinate them.
The hard part is coordination, not the agents. Think "well-run team," not "genius."
Four patterns: supervisor (dynamic routing, the default), pipeline (fixed linear), parallel (independent fan-out), handoff/swarm (peer-to-peer).
A supervisor routes and stops; it never does domain work itself.
Share state via a typed scratchpad, not the whole conversation history.
Cost is multiplicative: always set a step cap, a token budget, and a handoff limit.
Don't go multi-agent when one agent would do; the simplest design that fits wins.

Where to go next

Multi-agent orchestration is a layer on top of single agents. Make sure that foundation is solid first, then give your specialists clean tool access.

Start with Building AI Agents: every specialist here is just a single agent loop, so nail that before you orchestrate.
Read Model Context Protocol (MCP): a clean, shared way to give specialists tools and back your shared state with inspectable resources.
Follow the full AI Engineer career path to go from one agent to production multi-agent systems with evals, tracing, and cost controls.
