Workflows vs. Agents: The Architectural Distinction That Actually Matters

By Leigh Garrity, May 6, 2026

The short version: workflows orchestrate LLMs through predefined code paths. Agents let the LLM dynamically direct its own tool use and sequence.

Workflows

What it is: A system where the sequence of LLM calls and tool invocations is determined by code written before runtime.

What it does: A workflow takes an input, moves it through a defined series of steps (each potentially involving an LLM call, a tool call, a conditional branch, or all three), and produces an output. The developer specifies the path. The LLM executes within each step. An LLM output can select among branches the developer wrote, but it cannot add, remove, or reorder steps.

Where it comes from: The workflow framing predates the current generation of LLMs — it's borrowed from software engineering, where workflow engines have orchestrated business processes for decades. What's new is that individual steps in the workflow can now involve language model inference rather than deterministic computation. Anthropic's framework gave the AI-specific version a clean taxonomy, but the underlying pattern is familiar to anyone who's built a data pipeline or a BPM system.

What makes it distinct: The sequence is owned by the developer, not the model. The LLM is a component inside the workflow, not the director of it. That's the trait that matters most when evaluating an architecture proposal.

The Five Patterns

Workflows aren't monolithic. Anthropic's framework identifies five patterns, each suited to a different class of problem.

Prompt chaining is the simplest. The output of one LLM call becomes the input to the next, in a fixed sequence. You might use it to draft a document, then extract key claims from the draft, then verify each claim against a knowledge base. Each step is a separate inference call; the sequence is hardcoded. The advantage is that each step can be optimized and tested independently. The failure mode is that errors compound — a bad output from step two poisons step three, and the chain doesn't self-correct.
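The draft, extract, verify chain described above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: fake_llm is a hypothetical stub standing in for a real model call, and the prompt strings are invented for the example.

```python
from typing import Callable

# Hypothetical type for a model call; swap in your provider's client.
LLM = Callable[[str], str]


def fake_llm(prompt: str) -> str:
    """Deterministic stub so the sketch runs offline: echoes the prompt's step label."""
    return f"<{prompt.split(':', 1)[0]}>"


def run_chain(topic: str, llm: LLM = fake_llm) -> dict[str, str]:
    # The order of these three calls is fixed in code; the model never reorders them.
    draft = llm(f"draft: write a short brief on {topic}")
    claims = llm(f"extract: list the key claims in {draft}")
    verdict = llm(f"verify: check each claim in {claims}")
    return {"draft": draft, "claims": claims, "verdict": verdict}


result = run_chain("token rotation")
```

Because each step is an ordinary function call, each one can be tested in isolation. Nothing in the chain checks a step's output before passing it on, which is where the compounding-error risk comes from.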

Routing classifies an incoming input and directs it to a specialized downstream handler. A customer support system might route billing questions to one prompt, technical questions to another, and escalation requests to a human queue. The routing decision can itself be made by an LLM (more flexible) or by a classifier (faster and cheaper). The possible routes are defined in advance — the system can only go where the developer drew a path.
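A rough sketch of the support-desk example, assuming a keyword stub in place of the LLM or trained classifier. The handler names, labels, and keywords are all hypothetical; the point is that the route table is a fixed dictionary written by the developer.

```python
from typing import Callable


def classify(message: str) -> str:
    """Keyword stub standing in for an LLM or trained classifier (assumption)."""
    text = message.lower()
    if "invoice" in text or "charge" in text:
        return "billing"
    if "error" in text or "crash" in text:
        return "technical"
    return "escalation"


def handle_billing(message: str) -> str:
    return "billing-prompt"


def handle_technical(message: str) -> str:
    return "technical-prompt"


def handle_escalation(message: str) -> str:
    return "human-queue"


# Every possible destination is listed here before the system runs.
ROUTES: dict[str, Callable[[str], str]] = {
    "billing": handle_billing,
    "technical": handle_technical,
    "escalation": handle_escalation,
}


def route(message: str) -> str:
    label = classify(message)
    # A label with no entry in ROUTES falls back to the human queue.
    handler = ROUTES.get(label, handle_escalation)
    return handler(message)
```

Swapping classify for an LLM call changes how the label is chosen, not the set of places the input can go.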

Parallelization runs multiple LLM calls simultaneously rather than sequentially, in two distinct variants. Sectioning breaks a large task into independent subtasks and runs them in parallel — useful when a document is too long to process in a single context window, or when independent analyses need to be combined. Voting runs the same task multiple times and aggregates the results — useful when you want to reduce variance or catch errors that a single call might miss. Both variants involve the developer deciding upfront that parallelization is appropriate and specifying how the outputs get recombined.
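Both variants can be sketched with Python's standard thread pool. This is an illustration under assumptions: summarize and the judge callables are hypothetical stubs for model calls, and majority vote is only one possible aggregation rule.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def summarize(section: str) -> str:
    """Stub for an LLM call on one section (assumption)."""
    return section.upper()


def section_and_merge(sections: list[str]) -> str:
    # Sectioning: independent subtasks run concurrently.
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map returns results in input order, so the merge is stable.
        parts = list(pool.map(summarize, sections))
    # The recombination rule is chosen by the developer, not the model.
    return " | ".join(parts)


def vote(task: str, judges: list[Callable[[str], str]]) -> str:
    # Voting: the same task goes to several calls; the majority answer wins.
    with ThreadPoolExecutor(max_workers=len(judges)) as pool:
        answers = list(pool.map(lambda judge: judge(task), judges))
    return Counter(answers).most_common(1)[0][0]
```

Sectioning makes one call per section, while voting multiplies the cost of a single task by the number of judges, so the two variants scale differently.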

Orchestrator-workers introduces a second LLM into the picture, but in a specific role: one model (the orchestrator) breaks a complex task into subtasks and assigns them to specialized worker models. The orchestrator's job is decomposition and delegation; the workers execute. This looks superficially like an agent — there's a model making decisions about what other models should do — but the orchestrator's decision space is bounded by what the developer has defined as possible worker tasks. It can't invent a new worker or call a tool that wasn't provisioned.
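The bounded decision space can be made concrete with a small sketch. Here orchestrator is a hypothetical stub for the decomposing model, and it deliberately proposes a worker named "deploy" that was never provisioned, to show what the surrounding code does with it.

```python
from typing import Callable

# Workers provisioned by the developer; the orchestrator cannot add to this table.
WORKERS: dict[str, Callable[[str], str]] = {
    "research": lambda task: f"notes on {task}",
    "write": lambda task: f"draft of {task}",
}


def orchestrator(goal: str) -> list[tuple[str, str]]:
    """Stub for the decomposing model (assumption): returns (worker, subtask) pairs."""
    return [("research", goal), ("write", goal), ("deploy", goal)]


def run(goal: str) -> tuple[list[str], list[str]]:
    results: list[str] = []
    rejected: list[str] = []
    for worker_name, subtask in orchestrator(goal):
        worker = WORKERS.get(worker_name)
        if worker is None:
            # The model asked for something outside the provisioned set.
            rejected.append(worker_name)
            continue
        results.append(worker(subtask))
    return results, rejected
```

Whether to reject, log, or fail on an unknown worker is a design choice; this sketch simply records it and moves on.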

Evaluator-optimizer uses one LLM to generate a response and a second to evaluate it against a defined quality criterion, looping until the criterion is met or a maximum iteration count is reached. The evaluator's criteria are specified by the developer. The loop structure is specified by the developer. How many iterations it takes varies at runtime. This pattern suits tasks with clear quality signals — translation accuracy, code correctness, factual consistency — and struggles with tasks where "good enough" is inherently subjective.
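A minimal sketch of the loop, assuming stub functions in place of the two models. The criterion here (two revision markers) is invented purely so the example is deterministic; a real evaluator would be a model call or a test suite.

```python
from typing import Optional


def generate(task: str, previous: Optional[str]) -> str:
    """Stub generator (assumption): first call returns the task, later calls revise."""
    return task if previous is None else previous + "+rev"


def evaluate(candidate: str) -> bool:
    """Stub evaluator: the developer-specified criterion is two or more revisions."""
    return candidate.count("+rev") >= 2


def refine(task: str, max_iterations: int = 5) -> tuple[Optional[str], int, bool]:
    candidate: Optional[str] = None
    # The loop and its ceiling live in code; only the iteration count varies at runtime.
    for iteration in range(1, max_iterations + 1):
        candidate = generate(task, candidate)
        if evaluate(candidate):
            return candidate, iteration, True
    # Ceiling reached without meeting the criterion: return the last attempt, flagged.
    return candidate, max_iterations, False
```

Returning an explicit pass/fail flag alongside the last candidate lets the caller decide what to do when the ceiling is hit, rather than silently accepting a result that never met the criterion.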

The Agent Loop

What it is: A system where the LLM dynamically determines the sequence of its own tool use based on what it observes at runtime.

What it does: An agent receives a goal or task, then enters a loop: it reasons about what to do next, selects and invokes a tool, observes the result, updates its understanding of the situation, and decides whether to continue or stop. The sequence of tool calls is not predetermined. The model decides, at each step, what action to take based on what it just learned. This loop continues until the model determines the task is complete or a hard stop condition is triggered.

Where it comes from: The agent loop formalizes a pattern that emerged from research into LLM reasoning — specifically, the observation that models perform better on complex tasks when they're allowed to interleave reasoning and action rather than producing a single output. The ReAct paper (Yao et al., 2022) is the academic anchor; the practical implementations followed quickly as tool-calling APIs became available from major model providers. Anthropic's framework treats the agent loop as the defining characteristic of agentic systems, distinct from any specific implementation.

What makes it distinct: The LLM owns the sequence. The developer provisions the tools and sets the goal; the model decides which tools to call, in what order, with what inputs. The execution path is not knowable before the task runs, and it cannot be fully specified in code.
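The reason, act, observe cycle can be sketched as follows. This is a toy under stated assumptions: policy is a hypothetical rule-based stub standing in for the model's reasoning, and the two tools are invented for illustration and do not correspond to any real API.

```python
from typing import Any, Callable

# Hypothetical tools provisioned by the developer.
TOOLS: dict[str, Callable[[str], dict[str, Any]]] = {
    "lookup_user": lambda user: {"user": user, "status": "locked"},
    "unlock_user": lambda user: {"user": user, "status": "active"},
}


def policy(goal: str, observations: list[dict[str, Any]]) -> tuple[str, str]:
    """Stub for the model's decision step (assumption). Returns (action, argument)."""
    if not observations:
        return "lookup_user", goal
    last = observations[-1]
    if last["status"] == "locked":
        return "unlock_user", last["user"]
    return "stop", ""


def run_agent(goal: str, max_steps: int = 5) -> list[str]:
    observations: list[dict[str, Any]] = []
    trace: list[str] = []
    for _ in range(max_steps):  # hard stop condition, independent of the model
        action, argument = policy(goal, observations)
        if action == "stop":  # the model judged the task complete
            break
        observations.append(TOOLS[action](argument))
        trace.append(action)
    return trace
```

The order of tool calls appears nowhere in run_agent. It emerges from what policy returns after each observation, and with a real model in that position the trace cannot be read off the code in advance.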

Comparison: Trait-Led Analysis

A flat A/B table doesn't work here because the comparison isn't symmetric — five workflow patterns versus one loop structure. The more useful structure is trait-led: anchor on the dimensions that actually drive enterprise decisions, then show where each pattern and the agent loop land.

Four traits carry the weight in real architecture reviews, especially in environments where compliance, cost control, or security posture are active concerns: sequence ownership, failure surface, auditability, and resource commitment.

Sequence ownership is the definitional split. Every workflow pattern — prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer — has a sequence that was written by a developer before the system ran. The LLM executes within steps; it doesn't choose the steps. The agent loop inverts this. The model chooses the steps. If you need to answer the question "what will this system do when it receives input X," a workflow lets you answer it by reading the code. An agent does not.

Failure surface differs in kind, not just degree. Workflow failures are bounded by the code path — a step fails, the workflow fails (or catches the error, if the developer wrote error handling). The failure modes are enumerable in advance. Agent failures are harder to anticipate because the failure can emerge from the model's reasoning, not just from individual tool calls. A model can enter a reasoning loop, call the wrong tool for a plausible-sounding reason, or reach a confident wrong conclusion and stop. The evaluator-optimizer pattern partially addresses this within workflows by building in a quality check, but the check criteria are still developer-specified. In the agent loop, the model's own judgment is the primary quality gate.

Auditability follows directly from sequence ownership. Workflow execution produces a trace that matches the code path — you can reconstruct exactly what happened and why each step was triggered. Agent execution produces a trace too, but interpreting it requires reconstructing the model's reasoning at each decision point, which is harder and sometimes impossible without chain-of-thought logging. In regulated environments, "why did the system do that" is a compliance question, not just a debugging question.
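One way to picture the difference is a trace record that carries an optional rationale. The field names and the expected path below are assumptions made for illustration: a workflow trace can be checked against the code path directly, while an agent trace is only interpretable where the model's stated reason was logged.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TraceEvent:
    step: int
    action: str
    rationale: Optional[str] = None  # model's stated reason; None for code-driven steps


# Hypothetical fixed path for a three-step workflow.
EXPECTED_PATH = ["draft", "extract", "verify"]


def matches_code_path(events: list[TraceEvent]) -> bool:
    """Workflow audit: the recorded actions should equal the path written in code."""
    return [event.action for event in events] == EXPECTED_PATH


def unexplained_steps(events: list[TraceEvent]) -> list[int]:
    """Agent audit: flag steps where no rationale was captured."""
    return [event.step for event in events if not event.rationale]


def to_json_lines(events: list[TraceEvent]) -> str:
    """Serialize one event per line for an append-only audit log."""
    return "\n".join(json.dumps(asdict(event)) for event in events)
```

For example, an agent trace with a rationale on step 1 but none on step 2 would return [2] from unexplained_steps, which is the gap a compliance reviewer would ask about.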

Resource commitment is where the practical tradeoffs get concrete. Workflows have predictable compute and latency profiles because the number of LLM calls is bounded by the code path. Parallelization adds cost but reduces latency; evaluator-optimizer adds both cost and latency but improves quality. The agent loop has no built-in bound on either: a task that requires twelve tool calls costs more and takes longer than one that requires three, and you don't know which it will be until it's done. Scope the agent's tools and goal narrowly and set hard stop conditions, such as a maximum step count or a spend limit; the unpredictability is a design constraint, not a disqualifier.
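The arithmetic behind "bounded by the code path" is simple enough to write down. The sketch below assumes one LLM call per chain step or voting sample and two calls (generate plus evaluate) per evaluator-optimizer iteration; CallBudget is a hypothetical guard of the kind an agent loop needs because it has no such formula.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a run tries to exceed its call allowance."""


class CallBudget:
    """Hard stop for loops whose length is not fixed in code."""

    def __init__(self, max_calls: int) -> None:
        self.max_calls = max_calls
        self.used = 0

    def spend(self) -> None:
        # Call once before every model or tool invocation.
        if self.used >= self.max_calls:
            raise BudgetExceeded(f"limit of {self.max_calls} calls reached")
        self.used += 1


def worst_case_calls(pattern: str, n: int) -> int:
    """Upper bound on LLM calls; n is chain length, vote samples, or max iterations."""
    if pattern in ("chain", "voting"):
        return n
    if pattern == "evaluator_optimizer":
        return 2 * n  # one generate call and one evaluate call per iteration
    raise ValueError(f"no static bound known for pattern: {pattern}")
```

A workflow's ceiling can be computed before deployment. An agent's ceiling is whatever number is passed to CallBudget, which makes that number a design decision worth reviewing explicitly.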

Workflows are appropriate when the task structure is known in advance, when auditability is a hard requirement, or when cost and latency need to be predictable. The agent loop is appropriate when the task structure genuinely cannot be specified in advance — when the sequence of actions depends on what the model discovers during execution. The evaluator-optimizer pattern sits closest to the agent loop on the control spectrum, but it's still a workflow: the loop structure is in the code, not in the model's reasoning.

Field Language Guide

Don't say	Do say	Why it matters
"We're building an AI agent"	"We're using a workflow with an orchestrator-worker pattern" or "we're using an agent loop where the model directs its own tool use"	Precision signals you've evaluated the architecture, not just the marketing
"The AI will figure out the steps"	"The LLM determines the sequence of tool calls at runtime based on what it observes"	Surfaces the control question before the buyer asks it
"It's automated"	"It follows a predefined code path" or "the model decides the path dynamically"	"Automated" describes both; the distinction is who owns the sequence
"It's deterministic"	"The code path is predefined, but LLM outputs within each step are probabilistic"	Prevents the buyer from assuming workflow means predictable outputs
"It routes requests to the right model"	"It classifies the input and directs it to a specialized downstream prompt or handler"	Makes the routing logic concrete and auditable
"It runs things in parallel"	"It sections the task across parallel calls" or "it runs independent checks simultaneously and aggregates the results"	Distinguishes the two parallelization variants, which have different cost profiles
"It has a manager LLM"	"It uses an orchestrator-worker pattern: one model decomposes the task and directs specialized workers"	Grounds the description in a recognizable pattern with known tradeoffs
"It keeps trying until it gets it right"	"It uses an evaluator-optimizer loop with a developer-specified quality criterion and a maximum iteration count"	Prevents the buyer from assuming infinite retry behavior
"Agents are more powerful"	"Agents handle tasks where the sequence of actions can't be specified in advance"	Grounds the capability claim in a specific circumstance instead of a general assertion
"It can take actions"	"The agent loop includes an observe step where the model evaluates tool results before deciding the next action"	Makes the loop structure visible, which is the first step toward auditing it
"We need auditability"	"Workflows produce a fixed execution trace; agents produce a dynamic one that requires chain-of-thought logging to interpret"	Gives the buyer a concrete infrastructure requirement to evaluate
"It's agentic"	"The LLM owns the sequence of operations"	"Agentic" is a marketing word; this is the mechanism

Okta Concept Mapping

The closest Okta analog to this distinction is Okta Workflows (the product) versus a hypothetical AI agent with admin-level access to your tenant. Okta Workflows is a no-code orchestration tool: you define triggers, conditions, and actions in a flow that runs exactly as configured. That's the workflow pattern — predefined code path, bounded execution, auditable trace. An AI agent operating on your Okta tenant would be something different: a model that decides at runtime which API calls to make, in what order, based on what it observes. The analogy holds cleanly for the first half. It breaks on the second, because Okta Workflows gives you a visual representation of every possible execution path before you deploy it. An agent does not. "We're using AI to automate identity workflows" and "we're using an agent to manage identity decisions" are not the same sentence, and the second one should prompt a different set of questions about scope, credential management, and stop conditions.