Lesson 7: Multi-Agent Patterns — Orchestrators, Specialists, and When Not To

By Carey Whitten, May 5, 2026

What's in the Field and Why It Matters in Accounts

Three multi-agent topologies are showing up in RFIs and architecture decks with enough frequency that you need a working vocabulary before the meeting: supervisor-worker, where an orchestrator decomposes tasks and delegates to specialized agents; debate, where peer agents independently generate and critique each other's outputs; and handoff, where agents pass work sequentially through a pipeline. You'll encounter them described as "agentic frameworks," "collaborative AI," or — the one that should make you reach for a follow-up question — "swarms." The precise language matters because each topology carries a different coordination cost and a different failure mode, and buyers who've read the same Gartner deck you have will be asking about them without necessarily knowing what they're actually asking about. Knowing the difference between these patterns, and knowing what each one costs, is what lets you be the person in the room who asks the question that exposes whether the architecture has a real coordination story or just a diagram.

The Profiles

Supervisor-Worker

What it is: An orchestrator agent decomposes a goal into subtasks and delegates each to a specialized worker agent, then synthesizes the workers' outputs into a final result.

What it does: The orchestrator holds the plan. Workers hold the execution. A coding assistant built on this pattern might have the orchestrator break a feature request into "write the function," "write the tests," and "write the documentation," then dispatch each to a worker optimized for that subtask, and finally assemble the outputs. Subtasks can run in parallel where dependencies allow, which is the main case for this topology over a single agent.

Who's behind it / where it comes from: Anthropic's published agent documentation describes this as the "orchestrator-subagent" pattern and treats it as the canonical form of multi-agent coordination. LangGraph, the graph-based agent orchestration library from LangChain, implements it as the "supervisor" node pattern in its published architectural documentation. The distributed systems lineage is visible — MapReduce is a distant structural cousin — though the LLM version introduces coordination complexity that batch processing never had to handle.

What makes it distinct: The coherence of the final output depends entirely on the orchestrator's ability to synthesize worker outputs that were produced without shared context. Workers don't know what the other workers are doing. The orchestrator is the only entity that holds the full picture, which means the orchestrator's context window is the architectural bottleneck, and the synthesis step is where most of the coherence tax gets paid.
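The decompose, dispatch, and synthesize loop described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the three worker functions are hypothetical stand-ins for model calls, and the "decomposition" simply hands each worker the same goal text.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# Hypothetical stand-ins for model-backed workers; a real system would call an LLM here.
def write_function(task: str) -> str:
    return f"def feature(): ...  # implements: {task}"

def write_tests(task: str) -> str:
    return f"def test_feature(): ...  # covers: {task}"

def write_docs(task: str) -> str:
    return f"Docs for: {task}"

WORKERS: Dict[str, Callable[[str], str]] = {
    "code": write_function,
    "tests": write_tests,
    "docs": write_docs,
}

def orchestrate(goal: str) -> str:
    # Decompose: each worker receives only its own instruction, nothing else.
    subtasks = {name: goal for name in WORKERS}
    # Dispatch in parallel; workers never see each other's output.
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(WORKERS[name], task) for name, task in subtasks.items()}
        results = {name: future.result() for name, future in futures.items()}
    # Synthesize: the only step that sees every result at once.
    return "\n".join(f"[{name}] {results[name]}" for name in WORKERS)

print(orchestrate("add CSV export"))
```

Note where the full picture lives: only `orchestrate` ever holds all three results, which is the bottleneck the profile describes.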

Debate

What it is: Multiple peer agents independently generate responses to the same prompt, then exchange and critique each other's outputs across one or more rounds, converging toward a consensus answer or a synthesized final response.

What it does: The mechanism is argument, not instruction. No agent has authority over the others. A legal research application might run three agents independently analyzing a contract clause, then have each agent critique the others' analyses before a final synthesis pass. Calibration is the payoff: errors that a single agent would confidently propagate get surfaced when a second agent, working from the same inputs but without access to the first agent's reasoning, reaches a different conclusion.

Who's behind it / where it comes from: The modern LLM version of this pattern was formalized in a 2023 paper by Yilun Du, Shuang Li, Antonio Torralba, Joshua Tenenbaum, and Igor Mordatch — "Improving Factuality and Reasoning in Language Models through Multiagent Debate" — from MIT and Google Brain. The paper demonstrated measurable improvements in factual accuracy and mathematical reasoning on benchmark tasks when agents were allowed to critique and revise each other's outputs across multiple rounds. The "Society of Mind" framing that sometimes accompanies this pattern traces to Marvin Minsky's 1986 work, though the connection is more inspirational than architectural.

What makes it distinct: No hierarchy, no delegation, no pipeline. The coordination mechanism is structured disagreement. This makes it the most expensive topology per token — every agent processes every other agent's output in every round — and the most resistant to the specific failure mode where a single confident-but-wrong agent propagates an error through a system.
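A toy version of the round structure makes the cost visible. In the sketch below the agents are hypothetical stubs: each answers independently in round zero, then in later rounds sees only its peers' previous answers. The revision rule (adopt the peers' answer when a strict majority of them agree) is an assumption chosen for brevity, and it is pure deference rather than genuine critique.

```python
from collections import Counter
from typing import Callable, List, Tuple

# An agent maps (question, peers' previous answers) to an answer.
Agent = Callable[[str, List[str]], str]

def make_agent(initial: str) -> Agent:
    """Build a stub agent with a fixed first answer (stand-in for a model call)."""
    def agent(question: str, peer_answers: List[str]) -> str:
        if not peer_answers:
            return initial  # round 0: answer independently
        top, count = Counter(peer_answers).most_common(1)[0]
        # Revise only if a strict majority of peers agree with each other.
        return top if count > len(peer_answers) / 2 else initial
    return agent

def debate(question: str, agents: List[Agent], rounds: int = 2) -> Tuple[str, List[str]]:
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        # Every agent reads every other agent's previous answer, every round.
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
    final = Counter(answers).most_common(1)[0][0]
    return final, answers

final, answers = debate("What is 7 + 5?", [make_agent("12"), make_agent("12"), make_agent("15")])
print(final, answers)
```

Because the stub revises by deferring to its peers, it also shows how convergence can happen without any reasoning taking place.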

Handoff

What it is: A linear chain where each agent completes its stage of a task and passes the result to the next agent, which operates without access to prior agents' reasoning or intermediate steps.

What it does: Each agent is a specialist in one stage of a pipeline. A customer service system might route an incoming ticket through a classification agent, then a policy-lookup agent, then a response-drafting agent, with each agent receiving only the output of its predecessor. The orchestration is structural — baked into the pipeline topology — rather than dynamic. No agent decides what happens next; the sequence is fixed.

Who's behind it / where it comes from: Anthropic's published documentation describes the workflow-level version as "prompt chaining" and notes that it trades flexibility for predictability. The agent-level version appears in production systems for document processing, compliance review, and customer service escalation, where the stages are well-defined and the handoff boundaries are stable. Unlike the other two topologies, handoff has no single academic origin — it's an architectural pattern that emerged from production constraints.

What makes it distinct: Every handoff is a lossy compression. The receiving agent gets the output of the prior stage, not the reasoning that produced it. Errors introduced early in the chain propagate silently — there's no orchestrator to catch them and no peer to critique them. The topology is the simplest to reason about and the hardest to debug when something goes wrong in the middle.
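The customer service example can be sketched as a fixed chain of functions. The stage implementations here are hypothetical keyword stubs rather than model calls; the point is that only the previous stage's output crosses each boundary.

```python
from typing import Callable, List

Stage = Callable[[str], str]

def classify(ticket: str) -> str:
    # Hypothetical keyword classifier standing in for a classification agent.
    return "billing" if "refund" in ticket.lower() else "general"

def lookup_policy(category: str) -> str:
    policies = {
        "billing": "Refunds are available within 30 days.",
        "general": "Please see the help center.",
    }
    return policies[category]

def draft_response(policy: str) -> str:
    return f"Thanks for reaching out. {policy}"

def run_pipeline(ticket: str, stages: List[Stage]) -> str:
    payload = ticket
    for stage in stages:
        # Only the previous stage's output is passed on; the original
        # ticket and any intermediate reasoning are dropped.
        payload = stage(payload)
    return payload

STAGES = [classify, lookup_policy, draft_response]
print(run_pipeline("I want a refund", STAGES))
print(run_pipeline("I was charged twice", STAGES))
```

The second ticket is a billing issue that the stub classifier labels "general". Nothing downstream can tell, so the pipeline returns a fluent but wrong response.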

Comparison Strategy

Method: trait-led analysis. Three subjects are too few for clustering to add value, and scenario mapping risks producing a feature matrix that obscures the central analytical question. The cost-and-coherence tax breaks down into four traits: coordination overhead, context coherence, failure propagation, and the conditions under which the tax is worth paying. Running all three topologies through these traits gives the reader the vocabulary to interrogate any multi-agent architecture claim, regardless of which topology is being proposed.

Coordination Overhead

Every agent boundary is a token boundary. Information that crosses an agent boundary must be serialized into language, transmitted, and re-parsed by the receiving agent. This is not free. In supervisor-worker, the orchestrator pays this cost twice: once when dispatching to workers (translating the plan into instructions), and once when synthesizing (translating worker outputs back into a coherent result). In debate, every agent pays the cost of processing every other agent's output in every round, so with N agents the overhead grows roughly as N × (N - 1) per round, multiplied by the number of rounds. In handoff, each agent pays the cost once, at the moment of receiving the prior stage's output.

Handoff has the lowest per-agent coordination overhead. Debate has the highest. Supervisor-worker sits in the middle, with overhead concentrated at the orchestrator.
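A back-of-envelope count of boundary crossings makes the ranking concrete. The sketch below assumes every crossing costs about the same number of tokens, which real systems will not honor exactly; the function names are illustrative only.

```python
def supervisor_worker_crossings(workers: int) -> int:
    # One dispatch and one returned result per worker, all through the orchestrator.
    return 2 * workers

def debate_crossings(agents: int, rounds: int) -> int:
    # Each agent reads every other agent's output in every round.
    return agents * (agents - 1) * rounds

def handoff_crossings(stages: int) -> int:
    # One crossing between each pair of adjacent stages.
    return max(stages - 1, 0)

for n in (3, 5, 8):
    print(
        f"{n} agents:",
        f"supervisor-worker={supervisor_worker_crossings(n)}",
        f"debate(2 rounds)={debate_crossings(n, 2)}",
        f"handoff={handoff_crossings(n)}",
    )
```

Under these assumptions, debate's count grows quadratically with the number of agents while the other two grow linearly. That is worth remembering when an architecture deck proposes adding agents to a debate.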

A 2024 analysis of multi-agent coding systems on SWE-bench-style benchmarks found that naive task decomposition in supervisor-worker architectures frequently produced worse results than a single capable agent with a well-structured prompt, because the synthesis step introduced more error than the parallelization removed. Token cost and latency are visible in the bill; accuracy degradation shows up in the output.

Context Coherence

A single agent holding a complete problem in its context window has access to everything simultaneously. Multi-agent architectures fragment that context by design. The question is where the fragmentation happens and who's responsible for reassembly.

In supervisor-worker, coherence lives in the orchestrator. Workers operate on fragments; the orchestrator is supposed to hold the whole. The risk is that the synthesized result is less coherent than what a single agent holding the full problem would have produced, because synthesis is a compression operation and compression is lossy.

In debate, coherence is supposed to emerge from the critique process. Agents that disagree surface the incoherence explicitly, which is the topology's primary value. The failure mode to watch is consensus through capitulation — later-round agents deferring to confident earlier-round outputs rather than maintaining genuine disagreement. The Du et al. paper noted this as a live concern and recommended structured critique formats to mitigate it.

In handoff, coherence degrades monotonically down the chain. Each agent has less context than the one before it. The final agent in the chain is operating on the most compressed representation of the original problem. For well-defined pipelines where each stage's output is a sufficient input for the next stage, this is fine. For tasks where early-stage nuance matters to the final output, it's a structural problem with no architectural fix.

Failure Propagation

In supervisor-worker, a worker failure can be caught by the orchestrator — if the orchestrator is designed to detect it. If it isn't, the orchestrator synthesizes a result that includes the failed worker's output, and the error surfaces in the final answer. The orchestrator is both the recovery mechanism and the single point of failure.

In debate, a single agent producing a wrong answer is the scenario the topology is designed to handle. Peer critique is supposed to surface the error. The failure mode is correlated errors — all agents reaching the same wrong conclusion because they share the same training distribution or because the prompt structure leads them all to the same mistake. Debate doesn't help when all the debaters are wrong in the same direction.

In handoff, a failure at stage N is invisible to stage N+1. The downstream agent has no way to know that its input is the product of a failed upstream stage. Errors propagate silently and compound. This is the topology's most serious structural weakness, and it's why handoff architectures require explicit validation at each stage boundary, which reintroduces coordination overhead through the back door.
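Explicit validation at a stage boundary can be as simple as pairing each stage with a check on its output. The following is a minimal sketch under assumed names (`StageError`, `run_validated`, and the stub stages are all hypothetical). Real validators would be richer than a set-membership test.

```python
from typing import Callable, List, Tuple

class StageError(Exception):
    """Raised when a stage's output fails its boundary check."""

Stage = Callable[[str], str]
Validator = Callable[[str], bool]

CATEGORIES = {"billing", "general"}

def classify(ticket: str) -> str:
    text = ticket.lower()
    if "refund" in text:
        return "billing"
    if "password" in text:
        return "general"
    return "unknown"  # the stub classifier could not decide

def draft_response(category: str) -> str:
    return f"Routing your request to the {category} team."

def run_validated(ticket: str, stages: List[Tuple[Stage, Validator]]) -> str:
    payload = ticket
    for index, (stage, is_valid) in enumerate(stages):
        payload = stage(payload)
        # Check the output before it crosses to the next stage.
        if not is_valid(payload):
            raise StageError(f"stage {index} ({stage.__name__}) returned {payload!r}")
    return payload

PIPELINE = [
    (classify, lambda out: out in CATEGORIES),
    (draft_response, lambda out: len(out) > 0),
]

print(run_validated("I need a refund", PIPELINE))
try:
    run_validated("My order arrived damaged", PIPELINE)
except StageError as err:
    print(f"halted: {err}")
```

Each validator is one more piece of logic to write, run, and maintain per boundary. That is the coordination overhead coming back in.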

When the Tax Is Worth Paying

Supervisor-worker earns its overhead when the task genuinely decomposes into independent subtasks that benefit from specialization, and when the orchestrator's synthesis is simpler than the original task. Code generation with separate agents for implementation, testing, and documentation is the canonical example. The tax is not worth paying when the subtasks are interdependent, when the synthesis is as hard as the original problem, or when a single agent with a well-structured prompt would have held the full context anyway.

Debate earns its overhead when calibration matters more than speed, when the task involves factual claims that can be independently verified, and when the cost of a confident wrong answer is high. Legal analysis, medical triage support, and financial risk assessment are the use cases where the Du et al. results are most relevant. The overhead doesn't justify itself when the agents share the same failure modes, when the task has a clear right answer that doesn't benefit from critique, or when the latency of multiple rounds is operationally unacceptable.

Handoff earns its overhead when the pipeline stages are stable, well-defined, and genuinely sequential — when each stage's output is a complete and sufficient input for the next stage. Document processing pipelines, compliance review chains, and customer service escalation flows are the production environments where handoff appears most reliably. The architecture doesn't earn its complexity when stages are interdependent, when early-stage nuance matters to the final output, or when silent error propagation is unacceptable in the operational context.

Field Language Guide

Don't say	Do say	Why it matters
"The agents collaborate"	"The agents exchange structured outputs — here's the coordination mechanism"	Collaboration is a human concept; buyers need to know the actual protocol
"It's a swarm of agents"	"It's a supervisor-worker topology — one orchestrator, N workers, synthesis at the end"	Swarm implies emergent coordination; buyers need to know who's in charge
"More agents means more capability"	"More agents means more coordination overhead — the question is whether the task justifies it"	The cost-and-coherence tax is real; setting this expectation early protects the relationship
"The agents check each other's work"	"It's a debate topology — peer agents critique each other's outputs across multiple rounds"	"Check each other's work" hides the latency and token cost of multi-round critique
"It's like a team of experts"	"It's like a team of experts who can't talk to each other except through written memos"	The analogy holds on specialization; it breaks on shared context
"The pipeline handles it automatically"	"The handoff topology passes outputs between stages — errors at stage N are invisible to stage N+1"	Silent failure propagation is the specific risk buyers need to understand
"We can always add more agents later"	"Adding agents adds coordination overhead — the architecture should justify the complexity upfront"	Architectural complexity is easier to add than to remove
"The orchestrator manages everything"	"The orchestrator is the coherence bottleneck — its context window is the architectural constraint"	Buyers who understand context windows will ask the right follow-up question
"It's more robust because it's distributed"	"Distribution changes the failure mode — it doesn't eliminate it"	Distributed failure modes are often harder to debug than single-agent failures
"The agents reach consensus"	"The agents converge — the risk is capitulation rather than genuine reasoning"	Consensus through deference is a known failure mode in debate topologies
"It scales horizontally"	"Parallelism is available in supervisor-worker — the synthesis step doesn't parallelize"	The bottleneck is always the orchestrator; horizontal scaling claims need scrutiny
"It's an agentic framework"	"Which topology? Supervisor-worker, debate, or handoff — and what's the coordination story?"	Framework is a category; topology is an architecture; buyers need the latter

Okta Concept Mapping: Delegation Chains

The supervisor-worker topology looks, at first pass, like a delegation chain — a principal authorizing a sub-principal to act on its behalf, the way OAuth delegation or a service account hierarchy works in IDAM. The orchestrator is the delegating principal; workers are the delegated agents. The analogy holds on structure. It breaks on auditability. In an IDAM delegation chain, the scope of delegation is explicit, the identity of the delegatee is verified, and the chain is auditable after the fact. In a supervisor-worker multi-agent system, the "delegation" is a prompt — unscoped, unverified, and often unlogged at the level of individual agent interactions. When a CAIO asks "how do we audit what the agents did," the honest answer in most current implementations is "we log the orchestrator's inputs and outputs, but the worker interactions are often opaque." That's not a delegation chain in any sense that an identity governance framework would recognize. It's the question to surface before the architecture gets built, not after.
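To make the audit gap concrete, here is one way a per-worker delegation record could look. This is a hypothetical sketch, not a feature of any identity platform or agent framework: the `DelegationRecord` fields and the `delegate` helper are assumptions, and the code records scope without enforcing it or verifying the worker's identity.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class DelegationRecord:
    orchestrator_id: str
    worker_id: str
    scope: str           # what the worker was asked to do, stated explicitly
    prompt_sha256: str   # digest of the instruction that crossed the boundary
    output_sha256: str   # digest of what came back

def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def delegate(
    log: List[DelegationRecord],
    orchestrator_id: str,
    worker_id: str,
    scope: str,
    prompt: str,
    worker: Callable[[str], str],
) -> str:
    """Call a worker and append an audit record for that single interaction."""
    output = worker(prompt)
    log.append(
        DelegationRecord(orchestrator_id, worker_id, scope, _digest(prompt), _digest(output))
    )
    return output

audit_log: List[DelegationRecord] = []
result = delegate(
    audit_log, "orchestrator-1", "worker-docs", "read-only: summarize",
    "Summarize the release notes.", lambda p: f"Summary of: {p}",
)
print(result)
print(audit_log[0].worker_id, audit_log[0].scope)
```

A log like this answers "which worker was asked what." It does not make the delegation scoped or verified in the IDAM sense, which is the distinction to raise with the buyer.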