Workflows vs. Agents — Six Patterns, One Decision Framework

Six architecture patterns buyers call "agents," a decision ladder for picking the right one, and field language to prove you know the difference.

By Leigh Garrity, May 8, 2026

Buyers are building AI systems that coordinate multiple LLM calls to accomplish tasks. Those systems fall into six architectural patterns: five workflows (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer) and one agent loop. Anthropic's framework drew the canonical line between them: workflows orchestrate LLMs through predefined code paths; agents let the LLM dynamically direct its own tool use and sequencing. You will hear these patterns in every conversation where a buyer mentions automating multi-step processes or building "AI agents." The word "agent" is doing enormous work in the market right now, and most of the time it's describing a workflow. Being able to name the actual pattern a buyer needs is what separates you from the last three vendors who walked in promising autonomy.

Prompt Chaining

What it is: A fixed sequence of LLM calls where each step's output feeds the next step's input, with programmatic gates between steps.

What it does: Breaks a complex task into a pipeline you can draw on a whiteboard before the system runs. Step one extracts data from a document. A gate checks the extraction against a schema. Step two drafts a summary from validated data. Each gate catches errors before they propagate downstream. Typical fits are document processing, contract review, and any editorial workflow with a predictable sequence.

Where it comes from: Anthropic's framework names this as the simplest multi-step workflow pattern. The concept predates LLMs by decades. ETL pipelines have worked this way forever: extract, transform, load, with validation between stages. Anthropic codified the pattern for LLM orchestration and gave it a name.

What makes it distinct: The path is fixed before the first call runs. Every step is predetermined. If you can draw the flowchart in advance, you want this pattern and nothing more.
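The extract, gate, summarize sequence described above can be sketched in a few lines. This is a minimal illustration, not Anthropic's implementation: the schema fields (`party`, `amount`) and the `fake_extract` and `fake_summarize` stubs are assumptions standing in for real model calls.

```python
from typing import Callable, Dict

# Illustrative schema for the gate; these field names are assumptions.
REQUIRED_FIELDS = ("party", "amount")


def gate(extracted: Dict[str, str]) -> Dict[str, str]:
    """Programmatic gate: stop the chain if a required field is missing."""
    missing = [name for name in REQUIRED_FIELDS if not extracted.get(name)]
    if missing:
        raise ValueError(f"extraction failed schema check, missing: {missing}")
    return extracted


def run_chain(
    document: str,
    extract: Callable[[str], Dict[str, str]],
    summarize: Callable[[str], str],
) -> str:
    """Fixed path: extract -> gate -> summarize. No step picks the next step."""
    data = gate(extract(document))
    return summarize(f"Contract with {data['party']} for {data['amount']}.")


# Stubs standing in for the two LLM calls so the sketch runs offline.
def fake_extract(document: str) -> Dict[str, str]:
    pairs = (part.split("=", 1) for part in document.split(";") if "=" in part)
    return {key.strip(): value.strip() for key, value in pairs}


def fake_summarize(prompt: str) -> str:
    return "SUMMARY: " + prompt
```

Calling `run_chain("party=Acme; amount=100", fake_extract, fake_summarize)` returns a summary string, while a document missing `amount` raises at the gate before the second call is ever made. The order of calls lives in `run_chain`, not in any model output, which is the property that makes the pattern easy to audit.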

Routing

What it is: A classification step that reads each input and sends it to a specialized handler based on input type.

What it does: A single LLM or classifier examines the incoming request and decides which downstream path handles it. Anthropic's example: customer service queries get sorted into general questions, refund requests, or technical support, each dispatched to a different prompt and toolset. A second application routes simple queries to cheaper models and complex ones to more capable models, a straight cost optimization play. In the federal space, the Transportation Department describes complaints management and grant compliance review use cases that are routing workflows, even when the department calls them "agentic AI."

Where it comes from: Anthropic's framework. Routing is as old as IT service management. Every ticket triage system, every benefits eligibility check, every call center IVR tree is a routing pattern. The new part is using an LLM as the classifier instead of a rules engine.

What makes it distinct: The router's job is dispatch. It decides who solves the problem, and that's all it does.
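A rough sketch of the dispatch step, using the three customer service categories from the example above. The `keyword_classifier` is a hypothetical stand-in for an LLM classifier, and the fallback-to-general policy is an assumption, not something the source prescribes.

```python
from typing import Callable, Dict


def keyword_classifier(query: str) -> str:
    """Stand-in for the LLM classifier: returns a label for the query."""
    text = query.lower()
    if "refund" in text:
        return "refund"
    if "error" in text or "crash" in text:
        return "technical"
    return "general"


# One specialized handler per category; in practice each would have
# its own prompt and toolset.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "general": lambda q: f"[general] {q}",
    "refund": lambda q: f"[refund] {q}",
    "technical": lambda q: f"[technical] {q}",
}


def route(
    query: str,
    classifier: Callable[[str], str] = keyword_classifier,
    handlers: Dict[str, Callable[[str], str]] = HANDLERS,
    fallback: str = "general",
) -> str:
    """Dispatch only: the router picks a handler and does not solve the task."""
    label = classifier(query)
    # A model-backed classifier can return a label outside the known set,
    # so unknown labels fall back to a default handler (an assumed policy).
    handler = handlers.get(label, handlers[fallback])
    return handler(query)
```

The set of handlers is enumerable in code even though the classification itself may be probabilistic, so the thing to monitor is how often the classifier picks the wrong label, not what the handlers are allowed to do.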

Parallelization

What it is: Multiple LLM calls running simultaneously on the same input, either dividing the work or providing independent assessments.

What it does: Two variants. Sectioning splits a task into independent pieces that run concurrently: one model processes user queries while another screens for inappropriate content. Voting runs the same task through multiple models and aggregates their answers, which is how content moderation pipelines balance false positive and false negative rates. You're buying either speed (sectioning) or confidence (voting) with extra compute.

Where it comes from: Anthropic's framework. The voting variant descends from consensus mechanisms in distributed systems and ensemble methods in machine learning. Sectioning is parallel processing applied to LLM calls. Neither concept is novel; the contribution is naming them as distinct workflow patterns for LLM orchestration.

What makes it distinct: It's the only pattern that trades compute cost for either speed or confidence. If the subtasks aren't genuinely independent, you don't want this. You're paying for multiple simultaneous calls because the task demands it.
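Both variants can be sketched with the standard library. This is an illustrative outline under stated assumptions: the callables passed in stand for independent LLM calls, and the simple majority rule in `vote` is one possible aggregation, not the only one.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def section(task_input: str, workers: Sequence[Callable[[str], str]]) -> List[str]:
    """Sectioning: run independent subtasks concurrently on the same input.

    Results come back in the same order as `workers`.
    """
    with ThreadPoolExecutor(max_workers=len(workers)) as pool:
        futures = [pool.submit(worker, task_input) for worker in workers]
        return [future.result() for future in futures]


def vote(task_input: str, voters: Sequence[Callable[[str], str]]) -> str:
    """Voting: run the same task several times and keep the most common answer."""
    ballots = section(task_input, voters)
    return Counter(ballots).most_common(1)[0][0]


# Stubs standing in for three independent moderation calls.
MODERATORS = [lambda text: "flag", lambda text: "allow", lambda text: "flag"]
```

With the stub moderators, `vote("some post", MODERATORS)` returns `"flag"` because two of three ballots agree. Raising the agreement threshold is how a pipeline would shift the balance between false positives and false negatives, at the cost of more calls.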

Orchestrator-Workers

What it is: A central LLM that analyzes a task, decides what subtasks are needed, delegates them to worker LLMs, and synthesizes the results.

What it does: The orchestrator reads the input and determines the work plan at runtime. Anthropic's example: a coding task where the number of files that need editing depends on the specific issue. The orchestrator can't know in advance whether it needs two files or twelve, so it assesses first, then delegates. Uber's use of LangGraph for large-scale code migrations is a documented production instance (sourced from a vendor comparison guide, not Uber directly). The boundary between workflows and agents starts to blur here. The orchestrator makes a dynamic decision about task decomposition, but the overall system still runs on code-defined paths. The orchestrator decides what to delegate. The code still governs how delegation works.

Where it comes from: Anthropic's framework, and the pattern Anthropic uses in their own SWE-bench coding agent. The concept mirrors task decomposition in distributed computing and project management. What's specific to LLM orchestration is that the decomposition itself is performed by a model, not a human or a rules engine.

What makes it distinct: The orchestrator figures out the work breakdown at runtime, after the input arrives, based on what it finds. It's making a real decision, but making it within rails a developer set. A project manager who can assign tasks but can't rewrite the org chart.
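The split between what the orchestrator decides and what the code governs might look like the sketch below. The `plan` and `worker` functions are hypothetical stubs for LLM calls, and the `max_subtasks` cap is an assumed example of a developer-set rail.

```python
from typing import Callable, List


def plan(issue: str) -> List[str]:
    """Stand-in for the orchestrator LLM: decide at runtime which files need edits."""
    return [word for word in issue.split() if word.endswith(".py")]


def worker(filename: str, issue: str) -> str:
    """Stand-in for a worker LLM that handles one delegated subtask."""
    return f"patched {filename}"


def orchestrate(
    issue: str,
    planner: Callable[[str], List[str]] = plan,
    work: Callable[[str, str], str] = worker,
    max_subtasks: int = 20,
) -> str:
    """The planner chooses WHAT to delegate; this function fixes HOW delegation runs."""
    subtasks = planner(issue)
    # Developer-set rail (an assumption): refuse plans larger than the budget.
    if len(subtasks) > max_subtasks:
        raise RuntimeError(f"plan has {len(subtasks)} subtasks, limit is {max_subtasks}")
    results = [work(name, issue) for name in subtasks]
    # Synthesis step: combine worker outputs into one result.
    return "; ".join(results)
```

The number of worker calls is unknown until `planner` runs, but the planner cannot add new kinds of steps or skip the cap. That is the sense in which the decision is real and still bounded.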

Evaluator-Optimizer

What it is: A two-LLM loop where one generates output and another evaluates it, cycling until the output meets defined criteria or hits a maximum iteration count.

What it does: One model drafts. Another model critiques. The draft gets revised based on the critique. Repeat. Anthropic's examples: literary translation where nuance improves through iterative feedback, and complex search tasks where an evaluator decides whether more searching is warranted. Descript's video editing agent used three evaluation dimensions (don't break things, do what I asked, do it well), a clean instantiation. Cost scales with iteration count. One practitioner estimate (Stevens Institute blog, not a benchmarked study) puts an unconstrained 10-cycle loop at roughly 50x the tokens of a single pass. You need a hard exit condition.

Where it comes from: Anthropic's framework. The pattern has deep roots in code generation, where generate-then-review loops are standard practice. Peer review, red-teaming, editorial revision: the structure is ancient. The LLM version automates the reviewer.

What makes it distinct: It's the only pattern with a built-in quality feedback loop. The system improves its own output without human intervention, within bounds you set. The exit condition is what keeps this pattern from eating your budget.
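A minimal sketch of the loop with a hard exit condition. The `generate` and `evaluate` callables are assumptions standing in for the two models, and the default of three rounds is illustrative, not a recommendation from the source.

```python
from typing import Callable, Optional, Tuple


def refine(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str], Tuple[bool, str]],
    max_iterations: int = 3,
) -> Tuple[str, int, bool]:
    """Generate, evaluate, and revise until accepted or the round budget runs out.

    Returns (final draft, rounds used, whether the evaluator accepted it).
    Assumes max_iterations >= 1.
    """
    feedback: Optional[str] = None
    draft = ""
    for round_number in range(1, max_iterations + 1):
        draft = generate(task, feedback)       # generator call
        accepted, feedback = evaluate(draft)   # evaluator call
        if accepted:
            return draft, round_number, True
    # Hard exit: stop spending tokens even though the evaluator is not satisfied.
    return draft, max_iterations, False


# Stubs: the evaluator wants a trailing period, the generator applies the feedback.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return task + (feedback or "")


def fake_evaluate(draft: str) -> Tuple[bool, str]:
    return (True, "") if draft.endswith(".") else (False, ".")
```

Each round costs one generator call plus one evaluator call, so the ceiling on spend is set by `max_iterations` and nothing else. Returning the `accepted` flag lets the caller route unaccepted drafts to a human instead of shipping them silently.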

Agent Loop

What it is: An LLM that plans its own actions, executes them using tools, observes the results, and decides its next step in a loop until the task is complete.

What it does: The model receives a goal. It decides which tools to call, in what order, based on what it observes after each step. Simon Willison's definition is the cleanest: "something that runs tools in a loop to achieve a goal." The underlying mechanism is the ReAct pattern (Reason + Act) from Princeton/Google research: plan, execute, observe, repeat. Estonia's Bürokratt system, where a citizen asks to renew a passport and the system coordinates across agencies, is a genuine agent loop. The coordination path can't be predetermined because it depends on what the system discovers about the citizen's situation.

Where it comes from: Anthropic's framework, building on the ReAct paper (Yao et al., 2022). Anthropic identifies customer support with tool integration and coding agents as the two most promising applications. Worth flagging the vocabulary collision: in IDAM, "agent" means a software component acting on behalf of a user with delegated credentials. In AI, "agent" means an LLM directing its own tool use. Your buyer may be using the word either way.

What makes it distinct: The system chooses its own path. No human drew the flowchart in advance. The only pattern where the LLM controls both what to do and when it's done.
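The plan, execute, observe, repeat cycle can be sketched as follows. This is a toy outline under stated assumptions: the action dictionary shape, the `finish` convention, the `scripted_decide` stub (standing in for the LLM), and the `lookup` tool are all hypothetical and do not describe any production system named above.

```python
from typing import Callable, Dict, List

Action = Dict[str, str]  # assumed shape: {"tool": name, "input": text}


def run_agent(
    goal: str,
    decide: Callable[[str, List[str]], Action],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Loop until the decider says it is finished or the step budget runs out."""
    observations: List[str] = []
    for _ in range(max_steps):
        action = decide(goal, observations)  # the model picks the next step
        if action["tool"] == "finish":       # the model also decides when it is done
            return action["input"]
        tool = tools.get(action["tool"])
        if tool is None:
            observations.append(f"unknown tool: {action['tool']}")
            continue
        observations.append(tool(action["input"]))  # act, then observe
    raise RuntimeError(f"no result after {max_steps} steps")


# Scripted stand-in for the LLM: look something up once, then finish.
def scripted_decide(goal: str, observations: List[str]) -> Action:
    if not observations:
        return {"tool": "lookup", "input": goal}
    return {"tool": "finish", "input": f"done: {observations[-1]}"}


TOOLS: Dict[str, Callable[[str], str]] = {"lookup": lambda query: f"record for {query}"}
```

Unlike the workflow sketches, nothing in `run_agent` fixes the order or number of tool calls. The only bounds the code imposes are the tool list and `max_steps`, which is why bounded authority and monitoring carry most of the governance weight for this pattern.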

Okta Concept Mapping: Routing and Adaptive MFA

Adaptive MFA evaluates risk signals and routes the user to the appropriate authentication challenge. That's the routing pattern. The analogy breaks on determinism: MFA routing is policy-driven; LLM routing is probabilistic, which means governance shifts from policy compliance to classification accuracy monitoring.

The Escalation Ladder

The ladder maps buyer scenarios to patterns around a single principle: you move up only when the pattern below can't handle the specific task shape you're facing. This is Anthropic's own design philosophy: "start with simple prompts and add multi-step agentic systems only when simpler solutions fall short." The ladder matters because each step up adds cost, latency, and failure surface. Complexity is a tax you pay when the task demands it.

Before the ladder: real systems compose patterns. A production deployment might use routing at the front to dispatch to different prompt chains at the back. The ladder describes which pattern handles the core task shape, not a rule that you pick exactly one.

The diagnostic question that determines where you land: Can a human draw the process flow before the system runs?

If yes, and the steps are sequential: prompt chaining. A document pipeline where extract → validate → summarize is the same sequence every time. No reason to add anything.

If yes, but the input determines which flow: routing. The buyer says "we get different types of requests." The system classifies and dispatches. If they're also routing easy queries to cheaper models, they're optimizing cost without adding architectural complexity.

If yes, but the subtasks are independent and you need speed or confidence: parallelization. Content moderation that needs multiple perspectives. Multi-language processing that can run concurrently. You're buying time or accuracy with extra compute.

If mostly, but the number of subtasks depends on the input: orchestrator-workers. The buyer says "it depends on the case" when you ask how many steps are involved. The orchestrator assesses each input and delegates accordingly. Cost starts to climb here because the LLM is making a structural decision at runtime.

If yes, but the output needs iterative refinement: evaluator-optimizer. The buyer cares about precision. Translation quality, compliance accuracy, code correctness. The generate-evaluate loop improves output but multiplies token cost with each iteration. One practitioner estimate puts an unconstrained 10-cycle loop at roughly 50x the tokens of a single pass. Set hard exit conditions or the budget conversation gets uncomfortable.

If no, the process flow genuinely cannot be predetermined: agent loop. The buyer says "it needs to figure out the steps on its own." Most powerful, most expensive, hardest to audit. One vendor estimate (an Oracle developer blog) puts agents at roughly 4x more tokens than standard chat interactions, and up to 15x in multi-agent configurations. The GAO found that even the best-performing agents complete only about 30% of complex tasks autonomously without error. The remaining 70% lands on a human who may not have the context to catch the mistake before it propagates. Use the agent loop only when no workflow pattern handles the task shape.

Workflows are cheaper, faster, more auditable, and easier to govern. Most of what buyers describe as "agents" are workflows. The pitch deck calls this "delegation," which is a generous word for what's actually happening in most production deployments. Telling a buyer their "agent" is actually a well-designed workflow is the most useful thing you can say in the room.
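The ladder can be written down as a small decision function, which is a useful way to check that the questions are unambiguous. This is a sketch under assumptions: the `TaskShape` field names are invented for illustration, and when several flags are true the function returns the highest rung that applies, which is one reasonable reading of the ladder and not a rule from the source. Real systems compose patterns, so the result names the core pattern only.

```python
from dataclasses import dataclass


@dataclass
class TaskShape:
    """Answers to the discovery questions; field names are illustrative."""
    flow_known_in_advance: bool            # can a human draw the flow first?
    input_determines_flow: bool = False    # "we get different types of requests"
    independent_subtasks: bool = False     # need speed or confidence
    subtask_count_varies: bool = False     # "it depends on the case"
    needs_iterative_refinement: bool = False  # precision matters


def pick_pattern(shape: TaskShape) -> str:
    """Return the core pattern for a task shape, preferring the simplest that fits."""
    if not shape.flow_known_in_advance:
        return "agent loop"
    # Assumed tie-break: if several flags are set, the most demanding one wins.
    if shape.needs_iterative_refinement:
        return "evaluator-optimizer"
    if shape.subtask_count_varies:
        return "orchestrator-workers"
    if shape.independent_subtasks:
        return "parallelization"
    if shape.input_determines_flow:
        return "routing"
    return "prompt chaining"
```

For example, `pick_pattern(TaskShape(flow_known_in_advance=True))` returns `"prompt chaining"`, and only `flow_known_in_advance=False` ever produces `"agent loop"`. That mirrors the point above: the agent loop is the answer to exactly one question.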

Okta Concept Mapping: Orchestrator-Workers and SCIM Provisioning

SCIM provisioning orchestrates across target systems, and when the targets depend on the user's role discovered at runtime, that's the orchestrator-workers pattern. The analogy breaks on error handling: SCIM has standardized error responses and retry logic; LLM orchestrators make probabilistic recovery decisions when a worker fails.

Okta Concept Mapping: Agent Loop and Dynamic Authorization

The agent loop maps to moving from static RBAC to real-time, context-aware authorization, where the access path emerges at runtime instead of being fixed by policy. The governance implication is identical: dynamic systems require continuous monitoring and bounded authority on top of upfront policy definition.

How to Say This in the Field

Don't say	Do say	Why it matters
"You probably need an AI agent for that"	"Let me ask — are the steps predictable, or does the system need to figure them out as it goes?"	Separates workflow from agent in one question
"Agentic AI can automate your workflows"	"Most of what gets called 'agentic' is actually a structured workflow, and that's the better architecture for predictable tasks"	Positions you as someone who won't oversell complexity
"Agents are the future of AI"	"Agents solve a specific problem: tasks where the path can't be defined in advance. For everything else, workflows are cheaper, faster, and easier to audit"	Matches what federal buyers are saying about operationalizing AI
"We can route different request types"	"That's a routing pattern — classify the input, send it to the right handler. What are the categories you're routing between?"	Names the pattern, then pivots to discovery
"AI can handle your document processing"	"If the pipeline is extract, validate, summarize — same steps every time — that's prompt chaining. Simplest pattern, easiest to govern"	Connects architecture to auditability, which is what CAIOs care about
"The AI will figure out what needs to happen"	"That's an agent loop — most powerful pattern, but roughly 4x the token cost and hardest to audit. What happens when it gets a step wrong?"	Introduces cost and risk without dismissing the pattern
"We should build this with agents"	"Let's start with the task shape. Can you draw the process flow today, or does it change based on what the system finds?"	The diagnostic question that determines workflow vs. agent
"Our AI handles complex, multi-step tasks"	"Complex and multi-step doesn't automatically mean agent. If the steps are known, an orchestrator-workers pattern handles it with less cost and more predictability"	Distinguishes orchestrator-workers from agent loop — the most common conflation
"We need a governance framework for AI agents"	"Does your framework distinguish between workflows and agents? Workflows have a bounded action surface you can enumerate. Agents don't. The governance model is different"	Directly useful in CAIO conversations about M-25-21 compliance
"AI agents can work across your disconnected systems"	"Before adding autonomy, do the systems have the APIs to support integration? That's an infrastructure question that comes before the architecture question"	Prevents the buyer from skipping the data modernization step
"We're looking at agentic AI to reduce administrative burden"	"That's the right goal. Let's figure out whether the task shape calls for a workflow or an agent — the answer determines your cost, your audit surface, and your governance model"	Mirrors State Department language while steering toward the right architecture

Start with the simplest pattern that handles the task shape. Move up the ladder only when you have a specific reason. The buyer who says "we want AI agents" usually wants a workflow that works. Name the right pattern, and you've already added more value than anyone else in the room.

Things to follow up on...

GAO's 30% autonomy ceiling: The GAO's science and technology arm found that even top-performing AI agents complete only about 30% of complex tasks without error, which is the strongest public sector data point for justifying workflow-first design in buyer conversations.
Federal agentic AI pilots accelerating: A May 2026 Market Connections survey found more than half of federal agencies are planning or piloting agentic AI, yet fewer than a third have implemented the oversight frameworks they say are essential.
Agent cost compounding in production: An Oracle developer blog estimates that agents consume roughly 4x more tokens than standard chat interactions, scaling to 15x in multi-agent configurations, which makes the workflow-vs-agent decision a direct budget question.
"Fix the work first" principle: A Nextgov analysis of federal AI adoption argues that automating a broken process just produces a faster broken process, and that rules-based workflow automation should precede any move toward agent autonomy.