Workflows or Agents: What's Actually Running, and What It Means for the Identity Surface

By Leigh Garrity— May 8, 2026

Workflows or Agents: What's Actually Running, and What It Means for the Identity Surface

The Two Systems

Agentic Workflows

What it is: A system where a model executes a predefined sequence of tool calls, with the path determined by code, not by the model.

What it does: The model handles the language work — reading inputs, generating outputs, interpreting tool results — but the orchestration logic lives outside it. A workflow might call a search API, summarize the results, then write a draft. The model does the summarizing and drafting; the code decides that search comes before summarize comes before draft. If a step fails, the code handles it. The model never decides to skip search and go straight to drafting.

Who's behind it: Anthropic's published framework defines workflows as "systems where LLMs and tools are orchestrated through predefined code paths." This is the field's settled reference point — the distinction between workflows and agents was genuinely contested as recently as 2024, and Anthropic's framing is what serious practitioners are using now. In practice, workflows are built on orchestration frameworks like LangChain or LlamaIndex, or on vendor-provided pipeline tools. Most enterprise AI deployments in production today are workflows, not agents, even when they're marketed as agents.

What makes it distinct: The model is the executor. The path is fixed before the model runs.

“

Callout: The Provisioning Workflow Analog

An agentic workflow maps cleanly to a Joiner/Mover/Leaver provisioning flow: predefined steps, deterministic sequence, model handles the language work the same way a connector handles attribute mapping. The analogy holds for understanding execution control. It breaks when you ask about failure recovery — a provisioning workflow fails loudly and stops; an agentic workflow's model can generate plausible-sounding output even when a tool call returns garbage. In a buyer conversation, this matters: "it completed" and "it completed correctly" are not the same claim.

True Agents

What it is: A system where the model decides its own next action at each step, running in a loop until it determines the task is complete.

What it does: The model receives a goal, a set of available tools, and the current state of the conversation. It decides which tool to call, calls it, receives the result, and decides what to do next — repeat until done. There's no predefined path. The model might search, find the results unhelpful, try a different query, find a document, decide to read a specific section, find a contradiction, and decide to flag it rather than resolve it. Every one of those decisions is the model's. The code provides the loop structure and the tool interfaces; the model provides the plan.

Who's behind it: Same Anthropic framework: agents are "systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks." In practice, true agents are rarer in production than the market conversation suggests. They're more common in research, in internal tooling at AI-native companies, and in controlled pilots. The gap between "we're building an agent" and "we have an agent running in production" is wider than most buyers realize, and the reason for that gap is the failure mode covered below.

What makes it distinct: The model is the planner. The path emerges from execution. Keep that distinction close.

“

Callout: The Privileged Session Analog

A true agent running with broad tool access looks a lot like a privileged session in PAM terms: the model has standing access to a set of capabilities and decides in real time how to use them. The analogy holds for understanding why least-privilege is hard to apply — you can't pre-scope permissions to only what's needed if you don't know what the model will decide to do next. It breaks on auditability: a privileged session generates a session recording; a true agent's decision process lives inside the model's reasoning, which is not the same as a log. In a buyer conversation, this is the moment to ask who's reviewing what the agent decided to do, not just what it did.

Comparing Them: Trait-Led Analysis

The comparison structure here is trait-led analysis, anchored on the dimensions that actually matter for buyer decisions: execution control, failure mode, identity surface, and production readiness. These are the four questions a CAIO or security architect will eventually ask, even if they don't ask them in those terms.

Execution control

In a workflow, a human or a team decided the sequence before the model ran. The model's autonomy is bounded to each individual step. In a true agent, the model decides the sequence at runtime. That's a design choice with downstream consequences for every other dimension, not a quality judgment.

Ask the buyer to describe what happens when step three fails. In a workflow, they can tell you — the code handles it, or it escalates, or it retries. In a true agent, the answer is "the model decides," which is either reassuring or alarming depending on what the model has access to.

Failure mode

Most buyer conversations go sideways here, and the field itself was confused until recently.

The intuitive assumption is that agents fail by making a single catastrophically wrong tool call. The actual production failure mode is context bloat. Every step in an agent loop adds to the model's context window — the tool call, the result, any intermediate reasoning the model generated. Over a long loop, the context window fills. Performance degrades not because the model makes one bad decision but because the relevant signal gets crowded out by the accumulated history of previous steps. The model starts losing track of the original goal. It treats early decisions as constraints rather than revisiting them. It starts hallucinating connections between things that appeared near each other in the context rather than things that are actually related.

Practitioners who've run agents in production consistently identify this as the dominant failure pattern. Engineers who've pushed agent loops past ten or fifteen steps describe the same experience in community forums: the model doesn't fall off a cliff, it degrades gradually, and the degradation is hard to detect because the outputs still look coherent. That last part is the dangerous part.

Production agent design is therefore mostly about aggressive context management: summarizing intermediate results before they accumulate, pruning tool outputs that are no longer relevant, using external memory stores to offload history the model doesn't need in the active context, and bounding loop length so the model never gets far enough to degrade. This is not a solved problem. The field is actively working through it, and anyone who tells you they have a clean answer is probably selling something.

Workflows don't have this problem in the same way. Each step is bounded; the model isn't accumulating context across steps. This is one of the underappreciated reasons why most production AI deployments are still workflows: their failure modes are more legible, and "it degraded gradually over thirty steps" is not a failure mode you want to explain to a federal CISO.

Identity surface

A workflow has a bounded, predictable identity surface. You know which tools it will call, in what order, with what credentials. You can apply least-privilege because you know what the workflow needs before it runs. The permissions are as narrow as the path.

A true agent has an unbounded identity surface. The model decides which tools to call at runtime, which means you can't pre-scope permissions to only what's needed. You're either granting the agent broad access upfront and hoping, or you're building a real-time authorization layer that evaluates each tool call as the model requests it — which is harder, and which most current agent frameworks don't support natively.

This is the dimension that matters most for the identity conversation. It's also the dimension most buyers haven't thought through when they say "we're building an agent."

Production readiness

Workflows are in production at scale. True agents are in production in controlled contexts — specific tasks, bounded tool sets, human-in-the-loop checkpoints at high-stakes decisions. The gap is real, and it's not primarily about model capability. It comes down to the failure modes described above and the tooling to manage them. A buyer who says their agent is "in production" is worth asking: how long does a typical task run, and what's the longest loop you've seen it complete correctly?

“

Callout: Context Window and Session Lifetime

"Context window" in AI and "session" in IDAM are not the same thing, but they rhyme usefully: both are bounded containers that determine what the system knows about what's happened so far. The difference is that an IDAM session expiring is a clean event with a defined recovery path; a context window filling up degrades the model's performance gradually and silently. When a buyer asks about agent reliability, the IDAM instinct is to ask about session management — that instinct is directionally right, but the failure mode is softer and harder to detect. Frame it as: "What's your strategy for what the agent remembers across a long task?" rather than "How do you handle session expiry?"

How to Say This in the Field

The most common buyer situation: someone says "we're building an agent" and means a workflow. The distinction matters for scoping the identity surface — a workflow's permissions can be pre-scoped; an agent's can't. The table below is for that conversation.

Don't say	Do say	Why it matters
"Is that a workflow or an agent?"	"Does the model decide which tools to call, or is that predefined?"	The first question sounds like a quiz; the second surfaces the architecture.
"Agents are risky"	"What decisions do you want the model making autonomously, versus what's hardcoded in the sequence?"	Moves the conversation from risk posture to design intent.
"You need to scope the agent's permissions"	"Do you know which systems it will touch before it runs, or does that depend on what it decides to do?"	This is the identity surface question in plain English.
"The agent might hallucinate"	"What happens when the agent runs for twenty steps and starts losing track of the original goal?"	Names the actual production failure mode instead of the popular one.
"Context window"	"The agent's working memory for a task"	Keeps the conversation moving without stopping to define terms.
"The model is the orchestrator"	"The model decides what to do next at each step"	More concrete; doesn't require the buyer to know what orchestration means.
"We need to audit the agent"	"How are you capturing what the agent decided to do, not just what it did?"	Distinguishes action logs from decision logs — the gap that matters for accountability.
"Least privilege is hard with agents"	"With a workflow, you can pre-scope exactly what it needs. With an agent, you're either granting broad access upfront or building real-time authorization."	Gives the buyer a concrete design choice instead of a problem statement.
"Is this a true agent?"	"Walk me through what happens when one step doesn't go as expected — who decides what to do next?"	The answer tells you whether the model is the planner or the executor.
"Agent identity is complicated"	"Every tool call the agent makes is an authorization decision. How many of those happen per task, and who's reviewing them?"	Quantifies the identity surface in terms the buyer can act on.
"You should use a workflow instead"	"For this use case, if the sequence of steps is predictable, you get a much cleaner identity surface and more predictable failure modes."	Frames the recommendation as a design tradeoff, not a downgrade.
"The agent has credentials"	"What credential does the agent present when it calls your systems, and does it present the same credential for every call regardless of what it's doing?"	Opens the conversation about credential scoping without assuming the buyer has thought about it.

A buyer who says "agent" and means "workflow" is using the term the way the market uses it, which is loosely. Correcting them isn't the job. Asking the question that surfaces which architecture they actually have is — because the architecture determines the identity surface, and the identity surface determines what the conversation about access control looks like.

One sentence does that work: Does the model decide what to do next, or does the code?

Everything else follows from the answer.