When a buyer says "we're building agentic AI," they are almost always describing a workflow. The federal vocabulary hasn't caught up to the technical vocabulary, and nobody in the room is going to pause the meeting to sort it out. Anthropic drew the sharpest line in the field: workflows are systems where LLMs and tools follow predefined code paths; agents are systems where the LLM dynamically directs its own processes and tool usage. That distinction governs authorization, cost, failure modes, and auditability. The AE who can name which one the buyer actually needs, and explain why it matters for each of those dimensions, changes the shape of the conversation.
Workflows
What it is: A system where your code determines the sequence and an LLM handles the reasoning inside each step.
What it does: Breaks a complex task into predictable stages. Each stage calls an LLM, gets a result, and hands it to the next stage through logic a developer wrote. The path is fixed before the system runs. Validation gates between steps catch errors before they compound.
Where it comes from: Anthropic's "Building Effective Agents" paper (December 2024), authored by engineers Erik Schluntz and Barry Zhang, identifies five workflow patterns that cover most of what's actually running in production:
| Pattern | What happens | When you'd use it |
|---|---|---|
| Prompt chaining (step by step) | Step A feeds step B feeds step C, with validation between each | Tasks that break into clear sequential stages |
| Routing (sort and send) | A classifier LLM reads the input and sends it to the right handler | Distinct categories needing different treatment, or cheaper models for easy cases |
| Parallelization (run side by side) | Multiple LLM calls run at the same time; results get aggregated | Independent subtasks, or voting on the same task for reliability |
| Orchestrator-workers (divide and conquer) | A central LLM breaks the task apart, delegates subtasks, synthesizes results | Work where the full scope of subtasks isn't known until you start |
| Evaluator-optimizer (generate, critique, repeat) | One LLM generates; another evaluates; loop until quality criteria are met | Tasks where iteration measurably improves output |
What makes it distinct: You can read the code and know every path the system might take before it runs. The LLM is doing real work at each node, but it's on rails. A workflow is a flowchart that happens to have a language model at some of the decision points.
This maps cleanly to how you already think about SCIM provisioning or lifecycle management. Joiner-mover-leaver is a workflow: predefined triggers, predefined steps, predictable authorization at each stage. The identity question for AI workflows is the same one you already know — what can each step access, and who authorized it? The analogy holds well here. It starts to break when the system can choose its own next step.
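For readers who want to see what "on rails" looks like, here is a minimal sketch of prompt chaining with validation gates. Everything in it is an assumption for illustration: `fake_llm` is a hypothetical stand-in for a real model call, and the three step names are invented. The point is only that the sequence lives in the code, not in the model.

```python
from typing import Callable, Dict, List

def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns canned text so the sketch runs offline."""
    return f"[model output for: {prompt[:40]}]"

def run_claims_workflow(document: str, llm: Callable[[str], str] = fake_llm) -> Dict[str, object]:
    """Fixed three-step chain: extract -> classify -> summarize.

    The developer wrote this order. The model never chooses the next step.
    """
    trace: List[str] = []

    extracted = llm(f"Extract the key fields from: {document}")
    trace.append("extract")
    if not extracted.strip():  # validation gate: stop before a bad result compounds
        raise ValueError("extract step returned nothing")

    category = llm(f"Classify this record: {extracted}")
    trace.append("classify")
    if not category.strip():  # validation gate
        raise ValueError("classify step returned nothing")

    summary = llm(f"Summarize for a reviewer: {extracted} / {category}")
    trace.append("summarize")

    return {"summary": summary, "trace": trace}

result = run_claims_workflow("sample claim text")
print(result["trace"])
```

Reading the function top to bottom tells you every path the system can take, which is the property an auditor cares about.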
Agents
What it is: An LLM autonomously using tools in a loop, deciding its own next action based on what it observes.
What it does: Receives a goal. Plans a step. Picks a tool. Calls the tool. Reads the result. Decides the next step. Repeats until it judges the task complete or hits a wall. "Decides" is doing a lot of work in that description. Nobody wrote the sequence in advance. The model is choosing which tool to call, in what order, based on what it's learned so far in the loop.
Where it comes from: Same Anthropic paper, refined in their October 2025 follow-up on context engineering for agents. The simplified test: tool use, plus a loop, plus the model choosing what happens next. If all three are present, it's an agent.
What makes it distinct: Nobody, including the people who built it, can tell you exactly what the agent will do before it runs. The agent figures it out. That autonomy is what makes agents powerful for open-ended tasks, and it's what makes them fail in ways workflows can't.
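The three-part test (tool use, a loop, the model choosing what happens next) can be sketched in a few lines. This is an illustration under stated assumptions, not any vendor's implementation: `scripted_policy` is a hypothetical stand-in for the model's decision, and the two tools are invented. In a real agent, an LLM call would replace `scripted_policy`, and that is exactly the part nobody can read in advance.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]
Step = Tuple[str, str]  # (tool name, observation)

def scripted_policy(goal: str, history: List[Step]) -> Tuple[str, str]:
    """Hypothetical stand-in for the model's choice of next action."""
    if not history:
        return ("search", goal)
    if history[-1][0] == "search":
        return ("read", history[-1][1])
    return ("finish", "done")

def run_agent(
    goal: str,
    tools: Dict[str, Tool],
    choose_next: Callable[[str, List[Step]], Tuple[str, str]] = scripted_policy,
    max_steps: int = 10,
) -> List[Step]:
    """Loop: ask the policy what to do, call that tool, record the result, repeat."""
    history: List[Step] = []
    for _ in range(max_steps):  # bounded autonomy: a hard cap on loop length
        action, arg = choose_next(goal, history)
        if action == "finish":
            break
        if action not in tools:  # the allowlist limits which tools, not their order
            raise PermissionError(f"tool not allowed: {action}")
        observation = tools[action](arg)
        history.append((action, observation))  # every result accumulates in history
    return history

demo_tools: Dict[str, Tool] = {
    "search": lambda query: f"results for {query}",
    "read": lambda doc: f"contents of {doc}",
}

for name, observation in run_agent("policy memo", demo_tools):
    print(name, "->", observation)
```

Notice that nothing in `run_agent` names the order of tool calls. The order comes from whatever `choose_next` returns at runtime.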
An agent's authorization problem looks more like privileged access management than standard provisioning. The agent needs credentials that may touch multiple systems, and the scope of access can't be fully predicted at design time because the agent chooses its own path. Think of it as a service account that decides at runtime which systems to call. If that sentence made you uncomfortable, good — that discomfort is the correct security instinct, and it's the conversation your buyer needs to have before deploying agents.
Four Dimensions That Matter in a Buyer Conversation
The comparison below is organized by trait: the four dimensions that most directly affect a buyer's architecture and procurement decisions, with workflows and agents set against each one.
Control and Predictability
Workflows give you a complete audit trail before the system runs. You wrote the paths. You know the paths. An agent's path is emergent. You can constrain it by restricting which tools are available, but within that set, the agent decides the sequence at runtime. For public sector buyers operating under OMB guidance that requires documented human oversight for AI systems, this is not an abstract concern.
How It Fails
Workflows fail at steps. A step produces bad output, the validation gate catches it or doesn't, and you debug a specific point in a known sequence. The failure is localized and legible.
Agents fail by accumulating. Most "agentic AI" conversations skip this entirely, and it's the single most important reliability concept for anything running in production.
Every time an agent calls a tool, the tool's output gets added to the agent's context window. Call a function, get 2,000 tokens of JSON back. Call another, get 3,000 more. The agent's working memory fills with tool results, previous reasoning steps, error messages from failed attempts, and the full history of everything it's tried. This is context bloat.
Performance degrades well before the agent hits the context limit. Anthropic's context engineering guidance describes the need to keep context "informative, yet tight" precisely because model performance degrades as context accumulates. Independent research confirms this is a function of input length itself: even replacing irrelevant tokens with blank spaces didn't eliminate the degradation. Practitioner benchmarks suggest most models are reliable to roughly 60–70% of their advertised context window (a finding published by, among others, Redis, which sells infrastructure designed to reduce context dependence, so calibrate accordingly). As context fills, models shift attention toward recent tokens and away from earlier ones. The original task instructions sit at the beginning of context. Tool results accumulate at the end. The agent progressively loses sight of what it was supposed to be doing.
Anthropic's own engineering team documented this directly: even a frontier model running in a loop across multiple context windows will fall short without additional structure. Their agent tried to do too much at once, ran out of context mid-implementation, and the next session "would then have to guess at what had happened."
The math is unforgiving. At 95% reliability per step over a 20-step chain, your combined success rate is 36%. Production agent design centers on aggressive context management and bounded autonomy. Smarter prompting won't fix this.
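The 36% figure is simple compounding, and it is worth being able to reproduce on a whiteboard. The sketch below assumes each step succeeds or fails independently of the others, which is a simplification; the function name is made up for illustration.

```python
def chain_success_rate(per_step_reliability: float, steps: int) -> float:
    """Probability that every step in a chain succeeds, assuming independent steps."""
    return per_step_reliability ** steps

for steps in (5, 10, 20):
    rate = chain_success_rate(0.95, steps)
    print(f"{steps} steps at 95% each: {rate:.0%} end to end")
```

At 20 steps the result rounds to 36%, matching the number above. The same arithmetic applies to a long workflow, but a workflow's validation gates give you a place to catch and retry a bad step before it propagates.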
Cost Profile
Workflows are predictable. You know how many LLM calls a task requires because you designed the path. A 10-step workflow makes 10 LLM calls with known token counts at each step. You can estimate the bill before you run it.
Agents are open-ended, and the cost curve compounds. A simple task might take three tool calls; a complex one might take thirty. Each call processes the full accumulated context, so the tenth call is substantially more expensive than the first. By the time an agent is deep in a loop, it may be working with 60,000–80,000 tokens of effective context per call, even on a model with a 200,000-token window, because system prompts, tool definitions, and rolling history consume the rest. Anthropic's original paper states it plainly: "The autonomous nature of agents means higher costs, and the potential for compounding errors." The AE who can say "agents cost more because every step gets more expensive than the last" has a line that lands.
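A rough way to see why "every step gets more expensive than the last" is to count input tokens. The sketch below is a back-of-the-envelope model with invented numbers: it assumes each agent call re-reads the full accumulated context, and it ignores output tokens and any prompt caching discounts a provider may offer.

```python
def workflow_input_tokens(steps: int, tokens_per_step: int) -> int:
    """Each workflow step sees only its own fixed-size input."""
    return steps * tokens_per_step

def agent_input_tokens(calls: int, base_context: int, added_per_call: int) -> int:
    """Each agent call re-reads everything accumulated so far."""
    total = 0
    context = base_context
    for _ in range(calls):
        total += context           # pay for the whole context on this call
        context += added_per_call  # tool output and reasoning get appended
    return total

# Illustrative numbers only: 10 calls, 3,000 tokens to start, 2,500 added per call.
print("workflow:", workflow_input_tokens(10, 3_000))
print("agent:   ", agent_input_tokens(10, 3_000, 2_500))
```

With these made-up inputs the workflow processes 30,000 input tokens and the agent processes 142,500 for the same number of calls. The gap widens as the loop gets longer because the agent's total grows roughly with the square of the call count.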
Authorization Surface
A workflow's authorization surface is static and enumerable. Step 1 needs access to system A. Step 3 needs access to system B. You can map this at design time and provision accordingly.
An agent's authorization surface is dynamic. Because the agent chooses which tools to call at runtime, you can't fully enumerate what it will access in advance. You can restrict the available tools, but within that set, the agent decides. This is the identity governance problem that makes security teams nervous. They're right to be nervous.
This maps directly to static vs. dynamic authorization. Workflows behave like service accounts with fixed permissions scoped to known operations (the way you'd provision a system-to-system integration today). Agents behave more like users with role-based access who choose which resources to touch at runtime. The CISO who hears "we're deploying agents" and immediately asks about blast radius is asking the right question — helping them frame it precisely is where you add value.
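The static versus dynamic distinction can be made concrete with a toy model. All names here (the steps, tools, and `system_a` through `system_c`) are hypothetical. The sketch shows that a workflow's access list can be read off the design, while an agent has to be provisioned for its worst case even though any single run may touch only part of it.

```python
from typing import Dict, List, Set, Tuple

# Workflow: (step name, system it touches), fixed at design time.
WORKFLOW_STEPS: List[Tuple[str, str]] = [
    ("intake", "system_a"),
    ("enrich", "system_a"),
    ("file", "system_b"),
]

# Agent: tool name -> system that tool reaches. The agent picks among these at runtime.
AGENT_TOOLS: Dict[str, str] = {
    "lookup": "system_a",
    "file": "system_b",
    "email": "system_c",
}

def workflow_surface(steps: List[Tuple[str, str]]) -> Set[str]:
    """Enumerable before the system runs: just read the step list."""
    return {system for _, system in steps}

def agent_worst_case_surface(tools: Dict[str, str]) -> Set[str]:
    """What you must provision for: every system any allowed tool can reach."""
    return set(tools.values())

def agent_observed_surface(tools: Dict[str, str], calls: List[str]) -> Set[str]:
    """What one run actually touched, known only after the fact."""
    return {tools[name] for name in calls}

print(sorted(workflow_surface(WORKFLOW_STEPS)))
print(sorted(agent_worst_case_surface(AGENT_TOOLS)))
print(sorted(agent_observed_surface(AGENT_TOOLS, ["lookup"])))
```

The difference between the worst-case set and the observed set is one way to frame the blast radius question for a CISO: the agent held credentials for systems it never needed on that run.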
The Vocabulary Problem
Anthropic drew the line. Not everyone uses it.
Federal buyers don't distinguish between workflows and agents at all. When a CAIO says "agentic AI," they typically mean any system that takes actions without a human initiating each step. The State Department CIO publicly described plans to "slap AI agents on top of older systems." What she described, technically, is a workflow wrapping legacy systems. The word "agent" is doing a lot of work in buyer conversations, and most of that work is imprecise.
Even the model providers disagree on terminology. OpenAI's Agents SDK treats the agent as the building block of workflows rather than a separate category. LangChain/LangGraph uses "agentic workflow" as a compound term that blurs the line Anthropic draws.
Anthropic's framing remains the most useful reference point because it draws the sharpest line. Use it as your anchor, but don't expect the buyer to share your terminology.
How to Say This in the Field
| Don't say | Do say | Why it matters |
|---|---|---|
| "You need an AI agent for that" | "What you've described is a workflow: predefined steps with AI at each node. That's actually better for your use case because you can audit every path." | Positions you as someone who right-sizes, not upsells |
| "Workflows and agents are basically the same thing" | "A workflow follows a path you designed. An agent chooses its own path. The difference matters for authorization and cost." | Clean distinction the buyer can repeat to their CISO |
| "Agents are the future of AI" | "Agents are powerful for open-ended tasks, but most of what agencies are deploying right now are workflows. That's the right call for auditable processes." | Earns trust by not overselling |
| "Context windows are big enough now" | "Agents degrade before they hit the context limit. The longer the loop runs, the less reliably the agent remembers its original instructions." | Names the failure mode nobody else in the room will |
| "We can handle the identity piece" | "The identity question is different for workflows and agents. Workflows need static permissions scoped to known steps. Agents need dynamic authorization because they choose which systems to call at runtime." | Connects AI architecture to the buyer's existing security framework |
| "Agentic AI is autonomous" | "There's a spectrum. A workflow automates a known process. An agent makes its own decisions about what to do next. Most of what agencies call 'agentic' today is closer to the workflow end." | Corrects the vocabulary without condescending |
| "Just add guardrails" | "Guardrails constrain which tools the agent can use. They don't control the sequence. For auditable processes, a workflow gives you the control you need." | Specificity about what guardrails do and don't solve |
| "The AI handles it" | "In a workflow, your code handles orchestration and the AI handles reasoning at each step. In an agent, the AI handles both." | Makes the control boundary concrete |
| "Let's start with an agent pilot" | "Start with a workflow. If the task turns out to need dynamic decision-making, promote it to an agent. Anthropic's own guidance says start simple." | Cites the authority the buyer's technical team already respects |
| "Our platform supports agents" | "Does your use case actually need an agent? Most don't. The ones that do need identity governance that most platforms haven't solved yet." | Reframes from capability to fit |
The buyer who says "we want agentic AI to automate our claims workflow" is telling you, in one sentence, that they need a workflow. The word "workflow" is right there. The word "agent" is aspirational vocabulary wrapped around an operational need. Name what they actually need. Explain why it's the better fit. You've earned the room.
Things to follow up on...
- Anthropic's long-running agent failures: Anthropic Engineering published a candid account of how even frontier models fall short in extended agent loops without structured artifacts and session handoff mechanisms, documenting the production reality behind context bloat.
- Federal agentic AI adoption numbers: A May 2026 Market Connections survey of 200+ federal IT executives found more than half are planning agentic AI pilots while only 29% have documented kill-switch procedures, a gap worth tracking.
- OpenAI's competing vocabulary: OpenAI's Agents SDK treats the agent as the core building block of workflows rather than a separate category, which means your buyer's technical team may be using the same words to mean structurally different things depending on which vendor's docs they read.
- Context rot before context limits: Independent research confirms that model accuracy degrades as a function of input length itself, with most models reliable to roughly 60–70% of their advertised context window, making context management the primary production discipline for any agent deployment.

