Workflows vs. Agents: Six Patterns, One Distinction That Changes Every Question You Ask

By Carey Whitten, May 5, 2026


When a buyer says "we're building an agent," that phrase is carrying more architectural weight than it appears to. It might mean a two-step prompt chain that classifies support tickets. It might mean a system where the model autonomously decides which tools to call, in what order, based on what it observes at runtime. Those are different things. They have different failure modes, different governance requirements, and different answers to the question underneath the question, which is usually: how much control do we actually have over what this system does?

Anthropic's published framework (the December 2024 post "Building effective agents") is the stable reference point here. The field was genuinely contested as recently as 2024, with "agent" applied to everything from a single-turn API call to a multi-model autonomous system. Anthropic's framework draws one clean line: in a workflow, the LLM executes within predefined code paths that engineers control; in an agent, the LLM dynamically decides its own next action, tool selection, and sequence. The framework names five workflow patterns (prompt chaining, routing, parallelization, orchestrator-workers, and evaluator-optimizer) and one agent architecture: the agent loop. Knowing which one your buyer is actually building changes every question you ask and every concern you surface.

The Six Patterns

Prompt Chaining

What it is: A sequence of LLM calls where each call's output becomes the next call's input.

What it does: Breaks a complex task into discrete steps, each handled by a focused prompt. A contract review pipeline might extract key terms in step one, check each term against a policy database in step two, draft remediation language in step three, and produce a summary in step four. Each step is narrow; the chain is the capability.

Architectural tradition: Anthropic's framework; also reflected in LangChain's early chain abstractions and the broader "chain-of-thought" literature, though the framework here refers to multi-call pipelines, not single-prompt reasoning.

What makes it distinct: Every transition between steps is defined at design time. The LLM has no discretion about what happens next. If step two fails, the failure is localized and the fix is in step two's prompt or the code that calls it.
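A minimal Python sketch of the pattern, assuming a hypothetical `fake_llm` stub in place of a real model API (the prompt prefixes and canned outputs are invented so the code runs offline). The point is structural: the order of the four calls is hard-coded, and a programmatic check after step one stops the chain instead of letting a bad extraction propagate.

```python
from typing import Callable

# Hypothetical signature for a model call: prompt in, text out.
LLM = Callable[[str], str]

def fake_llm(prompt: str) -> str:
    """Deterministic stub so the sketch runs without a model."""
    if prompt.startswith("EXTRACT:"):
        return "term: net-90 payment; term: auto-renewal"
    if prompt.startswith("CHECK:"):
        return "net-90 payment violates policy (max net-60)"
    if prompt.startswith("DRAFT:"):
        return "Replace net-90 with net-60."
    if prompt.startswith("SUMMARIZE:"):
        return "1 issue found; remediation drafted."
    return ""

def gate_has_terms(output: str) -> bool:
    # Programmatic check between steps: fail fast instead of passing junk forward.
    return "term:" in output

def run_chain(contract: str, llm: LLM = fake_llm) -> str:
    # Every transition below is fixed in code; the model never chooses the next step.
    terms = llm(f"EXTRACT: {contract}")
    if not gate_has_terms(terms):
        raise ValueError("step 1 (extract) produced no terms")
    findings = llm(f"CHECK: {terms}")
    remediation = llm(f"DRAFT: {findings}")
    return llm(f"SUMMARIZE: {findings}\n{remediation}")
```

Swapping `fake_llm` for a real client changes the outputs, not the control flow. If the gate fails, the error names the step, which is the localized failure described above.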

Routing

What it is: An LLM classifier that directs inputs to one of several specialized downstream processes.

What it does: Reads an incoming request, assigns it to a category, and hands it off to the appropriate handler. A customer support system might route billing questions to a billing-specialized prompt, security incidents to a different prompt with different tool access, and general product questions to a third. The routing decision is made by the LLM; what happens after routing is predefined per branch.

Architectural tradition: Anthropic's framework; structurally similar to intent classification in older NLU systems, which routed to scripted dialog trees rather than LLM-powered handlers.

What makes it distinct: The LLM makes exactly one decision (which branch), and then that branch's predefined path executes. The discretion is bounded and early. Misclassification is the primary failure mode, not runaway behavior.
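A sketch of routing, assuming a hypothetical keyword-matching `fake_classifier` standing in for the LLM classification call and placeholder handlers for each branch. The model's only contribution is the label; the branch table is ordinary code fixed at design time.

```python
from typing import Callable, Dict

def fake_classifier(request: str) -> str:
    """Stub for the LLM routing call; returns a category label."""
    text = request.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "phishing" in text or "breach" in text:
        return "security"
    return "general"

def billing_handler(request: str) -> str:
    return f"[billing prompt] {request}"

def security_handler(request: str) -> str:
    return f"[security prompt, incident tools enabled] {request}"

def general_handler(request: str) -> str:
    return f"[product prompt] {request}"

# The branch table is fixed at design time; the model only picks a key.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "billing": billing_handler,
    "security": security_handler,
    "general": general_handler,
}

def route(request: str, classify: Callable[[str], str] = fake_classifier) -> str:
    label = classify(request)
    # Labels outside the table fall back to a default branch instead of failing.
    handler = HANDLERS.get(label, general_handler)
    return handler(request)
```

The fallback covers labels outside the table, but it cannot catch a confident wrong label, which is why misclassification remains the primary failure mode.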

Parallelization

What it is: Multiple LLM calls running simultaneously against the same input, with results aggregated afterward.

What it does: Handles tasks where independent analysis is faster or more reliable than sequential processing. A due diligence workflow might simultaneously send a target company's financials to one LLM call, its legal filings to another, and recent news coverage to a third, then pass all three outputs to a synthesis call. Parallelization also covers voting patterns, where multiple independent calls on the same input reduce the variance of any single response.

Architectural tradition: Anthropic's framework; the voting variant has roots in ensemble methods from classical machine learning.

What makes it distinct: The parallel calls are independent: they don't observe each other's outputs. The aggregation logic (a vote, a merge, or a synthesis prompt) is defined at design time. The LLM has no discretion about which calls run or which aggregation step receives their results.
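A sketch of both parallelization variants, assuming a hypothetical `fake_analyst` stub for each independent call. `sectioning` runs one call per input slice on a thread pool, `vote` applies a majority rule to repeated answers, and `synthesize` stands in for the final synthesis call. Each aggregation rule is chosen in code, not by the model.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def fake_analyst(section: str, text: str) -> str:
    """Stub for one independent LLM call over one slice of the input."""
    return f"{section}: reviewed {len(text)} chars"

def sectioning(inputs: Dict[str, str],
               llm: Callable[[str, str], str] = fake_analyst) -> List[str]:
    # Each call sees only its own slice; none observes another's output.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(llm, name, text) for name, text in inputs.items()]
        return [f.result() for f in futures]  # results kept in input order

def vote(answers: List[str]) -> str:
    # Aggregation rule fixed at design time: simple majority.
    return Counter(answers).most_common(1)[0][0]

def synthesize(findings: List[str]) -> str:
    # Stub for the synthesis call; a real one would be a further LLM prompt.
    return " | ".join(findings)
```

`Counter.most_common` orders equal counts by first occurrence, so a tie silently favors whichever answer arrived first; a production voting scheme would probably want an explicit tie rule.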

Orchestrator-Workers

What it is: An LLM orchestrator that decomposes a task and delegates subtasks to worker LLMs (or other tools), then synthesizes the results.

What it does: Handles tasks too complex or variable to decompose at design time. A research report pipeline might give the orchestrator a topic and let it decide how to break the research into subtopics, which subtopics to assign to which workers, and how to weight the results in synthesis. The orchestrator has genuine discretion about decomposition, but only decomposition. The workers execute their assigned tasks without further discretion.

Architectural tradition: Anthropic's framework; related to the "planner-executor" pattern in multi-agent research, though Anthropic's framing keeps the orchestrator within the workflow category: the overall shape of the task (decompose, delegate, synthesize) is still defined by the system and its prompt, even though the specific subtasks are chosen at runtime.

What makes it distinct: The orchestrator exercises limited runtime discretion, within a bounded domain. This is the workflow pattern most likely to be called an "agent" by buyers who haven't read the framework, and the most important one to probe when you hear that word.
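A sketch of orchestrator-workers, assuming a hypothetical `fake_orchestrator` that returns a JSON list of subtasks and a `fake_worker` for each assignment; `MAX_SUBTASKS` is an invented guardrail. The only runtime decision is the list of subtasks, and the function returns that list alongside the report so the decomposition can be logged.

```python
import json
from typing import Callable, List, Tuple

MAX_SUBTASKS = 5  # hypothetical guardrail on how far decomposition may go

def fake_orchestrator(topic: str) -> str:
    """Stub for the orchestrator call; a real model would choose these subtopics."""
    return json.dumps([f"{topic}: market size",
                       f"{topic}: competitors",
                       f"{topic}: regulation"])

def fake_worker(subtask: str) -> str:
    """Stub for a worker call; it executes its assignment and nothing else."""
    return f"notes on {subtask}"

def run_report(topic: str,
               orchestrate: Callable[[str], str] = fake_orchestrator,
               work: Callable[[str], str] = fake_worker) -> Tuple[List[str], str]:
    # The one runtime decision: how the orchestrator decomposes the topic.
    subtasks: List[str] = json.loads(orchestrate(topic))
    if not subtasks or len(subtasks) > MAX_SUBTASKS:
        raise ValueError(f"orchestrator returned {len(subtasks)} subtasks")
    results = [work(s) for s in subtasks]
    # Synthesis is plain joining here; in practice it is often another LLM call.
    report = "\n".join(results)
    # Return the decomposition too, so it can be logged for later debugging.
    return subtasks, report
```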

Evaluator-Optimizer

What it is: A loop where one LLM generates output and a second LLM evaluates it against criteria, repeating until the output passes or a maximum iteration count is reached.

What it does: Handles tasks with clear quality criteria that are easier to evaluate than to specify upfront. Code generation is the canonical example: a generator produces code, an evaluator runs it against test cases and returns a structured critique, the generator revises, and the loop continues until tests pass or the iteration limit is hit. The same pattern applies to document drafting against a rubric, or translation quality against a back-translation check.

Architectural tradition: Anthropic's framework; structurally similar to RLHF (Reinforcement Learning from Human Feedback) reward modeling, though here the evaluator is an LLM rather than a trained reward model, and the loop runs at inference time rather than training time.

What makes it distinct: The number of iterations is determined at runtime, up to the fixed maximum, rather than at design time. This is the one workflow pattern built around a loop that repeats until a runtime condition is met, so its duration varies with the input by design. The failure mode (converging on a local optimum that satisfies the evaluator but not the actual requirement) is subtle and worth naming explicitly.
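A sketch of the evaluator-optimizer loop for the code-generation case, assuming hypothetical stubs: `fake_generator` returns a buggy draft and then a fix, and `fake_evaluator` is a small test harness standing in for the evaluator (in this article's framing, an LLM that returns a structured critique). `max_iters` is the fixed ceiling; the returned iteration count is the figure an audit trail would need.

```python
from typing import Callable, Optional, Tuple

def fake_generator(task: str, critique: Optional[str]) -> str:
    """Hypothetical generator stub: a buggy first draft, then a fix once critiqued."""
    if critique is None:
        return "def add(a, b):\n    return a - b"
    return "def add(a, b):\n    return a + b"

def fake_evaluator(code: str) -> Tuple[bool, str]:
    """Hypothetical evaluator stub: runs the candidate against fixed test cases."""
    namespace: dict = {}
    exec(code, namespace)  # acceptable only because the candidate is a known string
    add = namespace["add"]
    cases = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
    failures = [f"add{args} != {want}" for args, want in cases if add(*args) != want]
    return (not failures, "; ".join(failures))

def optimize(task: str,
             generate: Callable[[str, Optional[str]], str] = fake_generator,
             evaluate: Callable[[str], Tuple[bool, str]] = fake_evaluator,
             max_iters: int = 3) -> Tuple[str, int, bool]:
    """Generate, evaluate, revise; stop on a pass or at the fixed ceiling."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    critique: Optional[str] = None
    candidate = ""
    for i in range(1, max_iters + 1):
        candidate = generate(task, critique)
        passed, critique = evaluate(candidate)
        if passed:
            return candidate, i, True  # i is the runtime-determined iteration count
    return candidate, max_iters, False
```

Evaluating real model output with `exec` like this belongs in a sandbox; the stub gets away with it only because its candidates are hard-coded strings.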

The Agent Loop

What it is: An architecture where the LLM perceives its environment, plans a next action, executes that action via tools, observes the result, and repeats, with the LLM deciding at each step what to do next.

What it does: Handles tasks that are genuinely open-ended at design time: the sequence of actions required isn't known until the agent starts working. An IT support agent might receive a report of a VPN authentication failure, decide to query the user's directory record, observe that the account is active, decide to check VPN gateway logs, observe an unusual certificate error, decide to check certificate expiration dates, and so on. Each step is determined by what the previous step revealed. The engineer who built the system didn't specify that sequence. The agent constructed it at runtime.

Architectural tradition: Anthropic's framework; also reflected in the ReAct (Reasoning + Acting) paper from Yao et al. (2022), which interleaves reasoning traces with tool actions and feeds each action's observation back into the next reasoning step, the same plan-act-observe cycle described above. Anthropic's framework establishes a categorical boundary here: the agent loop is architecturally distinct from workflows, not a more complex version of them.

What makes it distinct: Tool use is what makes this possible. The agent loop requires the model to call external tools — APIs, databases, code execution environments — and incorporate the results into its next planning step. The mechanism for how tool calling works is covered in Lesson 5; the architectural point here is that without tool use, the agent loop is just a single LLM call. Tool use is what gives the loop its legs.
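A sketch of the agent loop for the VPN example, assuming hypothetical tools and a `fake_policy` stub in place of the model's planning step. Unlike the workflow sketches, nothing in `agent_loop` fixes which tool runs next: the sequence comes entirely from the policy's reading of the history. The step cap and the unknown-tool check are design-time guardrails, not a path.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (action taken, observation returned)

# Hypothetical tools; a real agent would call directory, log, and certificate APIs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "directory_lookup": lambda user: f"{user}: account active",
    "vpn_logs": lambda user: f"{user}: certificate error at handshake",
    "cert_expiry": lambda user: f"{user}: client certificate expired",
}

def fake_policy(history: List[Step]) -> Tuple[str, str]:
    """Stub for the model's planning step: pick the next action from what it has seen."""
    if not history:
        return ("directory_lookup", "alice")
    last = history[-1][1]
    if "account active" in last:
        return ("vpn_logs", "alice")
    if "certificate error" in last:
        return ("cert_expiry", "alice")
    return ("finish", last)

def agent_loop(policy: Callable[[List[Step]], Tuple[str, str]] = fake_policy,
               max_steps: int = 10) -> Tuple[str, List[Step]]:
    history: List[Step] = []
    for _ in range(max_steps):
        action, arg = policy(history)       # plan: the model chooses the next action
        if action == "finish":
            return arg, history
        if action not in TOOLS:             # guardrail: only registered tools may run
            history.append((action, "error: unknown tool"))
            continue
        history.append((action, TOOLS[action](arg)))  # act, then observe
    return "stopped: step limit reached", history
```

With this stub, the run calls `directory_lookup`, then `vpn_logs`, then `cert_expiry`, and finishes with the expiry finding. A policy that never returns "finish" hits `max_steps` instead, which is the non-terminating loop risk in concrete form.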

Comparing the Six: A Trait-Led Analysis

Editorial call: With five workflow variants plus one agent loop, a flat six-column table obscures more than it reveals. The workflow patterns cluster meaningfully by their controllability profile, and the agent loop sits outside that cluster entirely. The structure here is trait-led: for each of the three dimensions, I'll show where the workflow patterns fall relative to each other, then contrast the cluster against the agent loop. Every subject appears on every dimension.

Controllability — How Much Is Determined at Design Time?

The workflow patterns form a spectrum. Prompt chaining, routing, and parallelization are fully determined at design time: the engineer defines every transition, every branch, every parallel call. Routing's classifier does pick a branch at runtime, but only from branches the engineer already built; beyond that one choice, the LLM has no discretion about what happens next. Orchestrator-workers introduces limited runtime discretion (the orchestrator decides how to decompose the task), but within a system prompt that defines the decomposition domain. Evaluator-optimizer introduces variable iteration count, but the criteria and the loop structure are fixed.

The agent loop sits off this spectrum entirely. At design time, the engineer defines the tools available and the system prompt that shapes the agent's behavior. The sequence of actions (which tools, in what order, how many times) is determined entirely at runtime by the model's own planning. A workflow's path is a road. The agent loop's path is whatever the agent decides to walk.

Debuggability — How Easy Is It to Trace What Happened?

Prompt chaining and routing are the most debuggable architectures in this set. Each step is discrete, logged separately, and causally connected to the next. When something goes wrong, you find the step where the output degraded and fix the prompt or the code at that step. Parallelization adds one layer of complexity: you need to trace which parallel output corrupted the aggregation. Still tractable.

Orchestrator-workers requires tracing the orchestrator's decomposition decision, which is an LLM output and therefore not fully deterministic. You can log it, but explaining why the orchestrator decomposed the task the way it did requires inspecting the model's reasoning, which may or may not be exposed depending on whether chain-of-thought logging is enabled. Evaluator-optimizer adds iteration count variability: you need to trace not just what happened but how many times the loop ran before it stopped, and why the evaluator accepted the final output.

The agent loop is the hardest to debug by a significant margin. The path is non-deterministic. Two runs of the same agent on the same input may take different tool-calling sequences. Intermediate reasoning steps are often not logged by default. Tracing a failure means reconstructing a decision tree that the model built at runtime and may not have made visible. Better prompting won't fix this. It's a structural property of the architecture.
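One way to make the agent's path reconstructable is to record every step as it happens. A minimal sketch, assuming a hypothetical `Trace` class (not any particular framework's logging API): each tool call becomes one JSON line, model reasoning is stored only if the model exposes it, and `first_divergence` shows where two runs on the same input parted ways.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from typing import List, Optional

@dataclass
class TraceStep:
    run_id: str
    step: int
    action: str
    tool_input: str
    observation: str
    reasoning: Optional[str] = None  # only populated if the model exposes it
    ts: float = field(default_factory=time.time)

class Trace:
    """Append-only record of one agent run, serialized as one JSON line per step."""
    def __init__(self, run_id: str) -> None:
        self.run_id = run_id
        self.steps: List[TraceStep] = []

    def record(self, action: str, tool_input: str, observation: str,
               reasoning: Optional[str] = None) -> None:
        self.steps.append(TraceStep(self.run_id, len(self.steps), action,
                                    tool_input, observation, reasoning))

    def to_jsonl(self) -> str:
        return "\n".join(json.dumps(asdict(s)) for s in self.steps)

    def path(self) -> List[str]:
        return [s.action for s in self.steps]

def first_divergence(a: List[str], b: List[str]) -> Optional[int]:
    """Index of the first step where two runs' tool sequences differ, else None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None if len(a) == len(b) else min(len(a), len(b))
```

This does not make the loop deterministic; it makes the nondeterminism inspectable after the fact.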

Failure Mode — What Goes Wrong and How?

Prompt chaining: A bad intermediate output propagates forward and corrupts downstream steps. The failure is usually visible at the final output but traceable to the step that degraded. Fix is local.

Routing: Misclassification sends the input to the wrong branch. The wrong handler produces a coherent but incorrect response — which is worse than an obvious error, because it may not be caught. The failure is in the classifier prompt or the category definitions.

Parallelization: One bad parallel output corrupts the aggregation. In voting patterns, an isolated error is usually outvoted, but an error shared by most of the calls wins the vote; in synthesis patterns, a single bad input can dominate the final output depending on how synthesis is weighted. The failure is in the aggregation logic as much as the parallel calls.

Orchestrator-workers: The orchestrator makes a bad decomposition, assigns the wrong subtask to the wrong worker, or misses a subtask entirely. The workers execute correctly within their (incorrectly scoped) assignments. The failure is in the orchestrator's planning, which is an LLM output and therefore requires prompt-level debugging.

Evaluator-optimizer: The loop converges on an output that satisfies the evaluator but not the actual requirement. This happens when the evaluator's criteria are underspecified — the evaluator is checking for what it was told to check, not for what the user actually needs. The loop terminates successfully; the output is wrong. This is the failure mode most likely to reach production undetected.

Agent loop: Compounding errors. Each action is based on observations from the previous action; a wrong early observation propagates through every subsequent planning step. The agent may also take actions that weren't anticipated at design time — calling a tool in an unexpected way, making a request that exceeds intended scope, or entering a loop that doesn't terminate. A single bad output isn't the failure mode here. A sequence of plausible-looking steps that collectively produce an outcome nobody intended is.
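The evaluator-optimizer failure is the easiest of these to underestimate, so here is a deliberately tiny, hypothetical illustration. The evaluator checks only two cases, a candidate that hard-codes one of them passes, and the loop would report success. `meets_requirement` stands in for the actual requirement, which in a real system usually exists only in someone's head.

```python
from typing import Callable, List, Tuple

# The real requirement: is_even must be correct for every integer.
# The evaluator was given only two cases, so it checks less than the requirement.
EVAL_CASES: List[Tuple[int, bool]] = [(2, True), (3, False)]

def underspecified_evaluator(fn: Callable[[int], bool]) -> bool:
    return all(fn(x) == want for x, want in EVAL_CASES)

def candidate(n: int) -> bool:
    # Plausible-looking output that happens to satisfy the evaluator.
    return n == 2

def meets_requirement(fn: Callable[[int], bool]) -> bool:
    # Hypothetical stand-in for what the user actually needed.
    return all(fn(x) == (x % 2 == 0) for x in range(-10, 11))
```

The evaluator accepts `candidate`, and `meets_requirement` rejects it: the loop terminates successfully and the output is wrong.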

Field Language Guide

The buyer says: "We're building an agent." Here's what to do with that.

Don't say	Do say	Why it matters
"Great, what identity provider are you using?"	"When you say agent — is the system deciding its own next steps at runtime, or is the sequence predefined in code?"	The answer determines whether you're talking about a workflow governance problem or an agent governance problem. Different scope.
"Agents are just workflows with more steps."	"Workflows and agents are architecturally distinct. In a workflow, engineers define every transition. In an agent, the model decides what to do next."	Conflating them will cost you credibility with any buyer who has read the Anthropic framework — and the ones who haven't will be better served by the distinction.
"We support agentic workflows."	"Which pattern are you using: fixed sequence, dynamic routing, or the model choosing its own tool calls at runtime?"	Forces the buyer to locate their architecture. Their answer tells you what they actually need.
"That sounds like an orchestrator pattern."	"Is there a model coordinating other models, or is one model deciding its own sequence of tool calls?"	Orchestrator-workers is a workflow. The agent loop is not. The difference matters for how you scope governance.
"How many agents are in the system?"	"How many decision points exist where the model chooses what to do next, rather than following a defined path?"	"Number of agents" is not a useful unit. Decision autonomy is.
"We can help with the AI part."	"The governance questions for a workflow are different from the governance questions for an agent. Which one are we solving for?"	Positions you as someone who understands the architecture, not just the product category.
"Is this using LangChain or something similar?"	"What does the system do when a tool call returns an unexpected result — does it follow a predefined error path, or does the model decide how to recover?"	Recovery behavior is the fastest diagnostic for whether you're looking at a workflow or an agent loop.
"Agents are still pretty new, so there aren't many standards yet."	"The Anthropic framework distinguishes five workflow patterns from the agent loop. Which one matches what your team is building?"	Demonstrates fluency. Gives the buyer a framework they can use internally.
"That's a complex architecture."	"The evaluator-optimizer pattern is worth asking about specifically — it's the one where the loop count varies at runtime, which creates audit trail questions."	Names a specific pattern. Shows you know the taxonomy. Surfaces a real governance concern.
"We'll need to loop in our SE."	"Before we bring in the SE, can you tell me whether the model is choosing which tools to call, or whether that's hardcoded?"	Gets the diagnostic information you need to brief the SE correctly.


Okta Concept Mapping

The closest IDAM analog to the workflow/agent distinction is the difference between a SCIM provisioning workflow and a PAM (Privileged Access Management) session. A SCIM workflow is fully predefined: create user, assign group, provision app. Every step is defined at design time; the system executes it. A PAM session involves a human operator making real-time decisions about what commands to run, with the session recorded and audited. The workflow patterns in this piece are SCIM: predefined, auditable, bounded. The agent loop is the PAM session, except the operator is the model rather than a human, and the audit trail depends entirely on whether the framework logs intermediate reasoning steps. Most don't, by default. That's where the analogy breaks, and it's the break point worth naming when a buyer asks how they'll audit what the agent did.

The canonical distinction is not subtle: workflows run on paths engineers define; agents run on paths models construct. The five workflow patterns differ in how much runtime variability they permit (none for prompt chaining and parallelization, a single branch choice for routing, limited decomposition discretion for orchestrator-workers, a variable iteration count for evaluator-optimizer), but they all share the property that the overall structure was decided before the system ran. The agent loop doesn't share that property. That's the source of its capability, and the source of every governance question your buyer is going to ask, whether or not they know to ask it yet.

Lesson 5 covers how tool calling actually works, the mechanism that makes the agent loop possible. Take from this piece the map: six patterns, one clean line, and the question that locates every buyer conversation on the right side of it.