From One Prompt to a System: The Spectrum of AI Applications

Maps the four-level AI complexity spectrum so sellers can spot when buyers are building at the wrong level and ask the right question.

By Leigh Garrity— May 8, 2026

From One Prompt to a System: The Spectrum of AI Applications

Maps the four-level AI complexity spectrum so sellers can spot when buyers are building at the wrong level and ask the right question.

AI applications are built in layers of increasing complexity, from a single prompt-and-response up through systems that choose their own tools and sequence their own steps. That range forms a spectrum. Most of the expensive mistakes in enterprise AI happen because someone skipped levels on it, or couldn't tell which level they were looking at.

This section covers that spectrum. The goal is structural literacy: enough understanding of how AI systems are actually assembled that when a buyer says "we're building an agent," you can ask the question that reveals whether they mean it.

Four levels of AI application

Anthropic's "Building Effective Agents" framework is the cleanest published reference for this architecture ladder. It's a practitioner guide from a model provider's engineering team, not a formal standard, but it has become the reference point most enterprise AI teams work from. Worth anchoring to because it's specific where other frameworks are vague.

Four levels:

Single LLM call. One prompt in, one response out. No tools, no retrieval, no looping. You send text, the model returns text. Classification, summarization, extraction, Q&A. This is the atomic unit of every AI application.

Augmented LLM. Still a single call, but the system retrieves relevant data and injects it into the prompt before the model sees it. This is where retrieval-augmented generation (RAG) lives: pulling your documents at query time so the response is grounded in your actual data instead of relying solely on the model's training. The model still doesn't loop or decide what to do next. It just gets better input. Anthropic's advice: for many applications, optimizing single LLM calls with retrieval and in-context examples "is usually enough." Most teams don't need to climb higher than this.

Workflow. Multiple LLM calls orchestrated through predefined code paths. Your code decides the sequence. The model fills in content at each step, but the path is fixed. Anthropic names several patterns here: prompt chaining, routing, parallelization, orchestrator-workers. The key word is predefined. A workflow follows a path somebody already drew.

Agent. Anthropic's definition: "systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks." The model selects tools, sequences steps, and adjusts based on what it learns from each action's result. The model helps decide what path to take.

A note on the word "agent," because it is doing heroic amounts of work across the industry right now. Some teams use it for anything with a tool call. Others reserve it for fully autonomous systems. Anthropic acknowledges this ambiguity directly. This section uses their distinction as the anchor: when the model chooses its own next step, that's an agent. When the code chooses, that's a workflow. The distinction was genuinely contested as recently as 2024, and Anthropic's December 2024 framework is the version that gained traction. Their guide calls this line "agentic," a generous word for a bright line between predictable and unpredictable.

Building at the wrong level

The spectrum matters because decision-makers routinely build at the wrong level. The error runs both directions.

Over-engineering is more common and more visible. A team that needs document classification builds an agent loop with tool selection and multi-step reasoning. The result is slower, more expensive, harder to debug, and no more accurate than a single well-crafted prompt. Anthropic states the tradeoff plainly: agentic systems "trade latency and cost for better task performance." If the task doesn't need that performance, you're paying the cost for nothing.

Gartner has projected that over 40% of agentic AI projects will be canceled by end of 2027. That stat is widely cited in practitioner discussions of Gartner's paywalled research; the link goes to one such discussion, not to Gartner directly. And Gartner projections deserve the skepticism you'd give any forecast built on a methodology they don't fully publish. But it reflects a pattern that practitioners confirm independently: organizations reaching for the top of the spectrum before they've exhausted what the bottom can do.

Under-engineering is subtler and harder to catch. A team locks a genuinely dynamic task into a rigid workflow because the workflow felt safer. The system handles the three scenarios the designers anticipated. The fourth scenario either fails silently or routes to a human queue that defeats the purpose of building the system at all. You won't find this failure mode in statistics because it shows up as missed capability, not visible breakdown. But some tasks genuinely require the model to select tools and sequence steps at runtime, and a predefined path will never accommodate what the designers didn't foresee.

Anthropic's practical advice, echoed broadly across the practitioner community: start at the simplest level that solves the problem. Escalate only when a measured failure mode justifies the added complexity. In their words:

“

"This might mean not building agentic systems at all."

Bridge: API Calls → LLM Calls

In identity, an API call is deterministic. You send a request, the endpoint executes a defined function, you get a predictable response. A single LLM call looks structurally similar: prompt in, response out. It diverges here: the LLM's response is probabilistic. The same input can produce different outputs. There is no defined function being "executed." The model predicts likely text given context. Your API intuition helps you understand the shape of the interaction. It starts to mislead you at the output layer, where repeatability is not guaranteed.

Bridge: Service Accounts → AI Agents

In identity, a service account is a non-human identity with an owner, credentials, a lifecycle, and audit requirements. An AI agent maps naturally onto this model. It needs the same inventory, ownership, and revocation paths. It diverges here: a service account's permissions are static. An agent can interpret untrusted input, choose tools at runtime, and chain permissions in ways a static integration never would. The service-account model covers governance and lifecycle. It misses the fact that the agent's next action may not be predictable from its current permissions.

How every piece in this section works

Those two callouts show the pattern that runs through every piece in this section. We start from an identity concept you already hold, use it to build toward the AI mechanism, then name the exact point where your identity intuition stops bearing weight.

The break point is where the real learning lives. The analogy gets you close enough to engage with the concept on familiar ground. The break tells you what's genuinely new, where leaning on the familiar will actively mislead you.

You'll see pieces that bridge from OAuth scopes to AI task authorization, from federation trust to model-to-model trust, from session state to context windows. In each case, the bridge is real and useful up to a marked line. Past that line, we'll tell you.

The goal across the section: follow a buyer's AI conversation without bluffing. Recognize whether your identity expertise applies or misleads in a given moment. Know which question to ask. In a field where definitions were still moving eighteen months ago, that skill outlasts any specific answer.

What comes next: we start at the bottom of the spectrum and work up. First stop is the single LLM call, where the concept of a "token" means something completely different than it does in your world. That collision turns out to be more useful than it first appears.

Things to follow up on...

Multi-agent failure rates: A study analyzing 1,642 execution traces across seven open-source agent frameworks found failure rates ranging from 41% to 86.7%, with coordination breakdowns as the largest category.
Anthropic's "start simple" framework: The five workflow patterns Anthropic names (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer) are defined with use-case guidance in their Building Effective Agents guide, which this section's subsequent lessons will unpack individually.
Context engineering as a discipline: Anthropic Engineering published a detailed guide on assembling optimal context for AI agents, framing it as the natural progression beyond prompt engineering and the skill that determines whether augmented LLM calls actually work.
Agent architecture pattern taxonomy: A practitioner taxonomy covering supervisor-worker, debate, and handoff topologies maps where each pattern survives or breaks down in production, useful background for when buyers describe their multi-agent plans.