Recap — Deployment Patterns Cheat Sheet

A decision framework mapping AI deployment patterns to their identity implications, built for fast pattern recognition in discovery calls.

By Leigh Garrity, May 9, 2026

The Diagnostic Ladder

Four questions, synthesized from the patterns you just worked through. Each one gates whether you need the next rung. The governing principle from the section opener: start at the simplest pattern that works. Escalate only when you can name the problem the current rung can't solve.

#	Diagnostic Question	YES	NO
1	Is the knowledge already in the model?	Single LLM call may suffice	Add retrieval (RAG)
2	Is the path predictable?	Workflow — code controls the sequence	Agent — model directs its own path
3	Does the model choose which tools to call?	Agent-level authorization problem	Workflow-level; code governs tool calls
4	Does it need to coordinate with other agents?	Multi-agent — delegation chains compound	Single agent loop

Q3 deserves a flag. Recall from the tool use piece: tools appear at multiple rungs. Workflows use tools. Agents use tools. The gate is who picks the tool, the code or the model. That single distinction reshapes the identity problem completely.
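
If it helps to see the ladder as data, here is a minimal sketch that reads one yes/no answer per question and returns the table cell each answer lands on. The names `LADDER` and `read_ladder` are illustrative, not from the course material, and the sketch makes no attempt to encode the gating between rungs: Q4 only matters once Q2 and Q3 point to an agent, and that judgment stays with you.

```python
# Rows copied from the Diagnostic Ladder table: (question, YES cell, NO cell).
LADDER = [
    ("Is the knowledge already in the model?",
     "Single LLM call may suffice", "Add retrieval (RAG)"),
    ("Is the path predictable?",
     "Workflow: code controls the sequence", "Agent: model directs its own path"),
    ("Does the model choose which tools to call?",
     "Agent-level authorization problem", "Workflow-level; code governs tool calls"),
    ("Does it need to coordinate with other agents?",
     "Multi-agent: delegation chains compound", "Single agent loop"),
]


def read_ladder(answers: list[bool]) -> list[str]:
    """Return the table cell each yes/no answer lands on, in question order."""
    if len(answers) != len(LADDER):
        raise ValueError("expected one answer per diagnostic question")
    return [yes if answer else no
            for (_question, yes, no), answer in zip(LADDER, answers)]


if __name__ == "__main__":
    # Hypothetical buyer: needs their own documents, fixed pipeline, code picks tools.
    for line in read_ladder([False, True, False, False]):
        print(line)
```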

If you remember nothing else

Start at the simplest rung. Every rung you climb adds authorization surface you have to govern. If you can't name the problem the current rung can't solve, you haven't earned the next one.

The Patterns

Single LLM Call

One prompt in, one completion out. No retrieval, no tools, no loops.

When it comes up: Buyer describes a chatbot, a summarizer, a classification endpoint. "We're using GPT for internal Q&A." Identity is simple here: who can call the model?
Don't confuse with: RAG. If the buyer mentions "pulling in documents" or "searching our knowledge base," they've left this rung.

RAG (Retrieval-Augmented Generation)

External knowledge retrieved and injected into the model's context before generation. As you saw in the RAG lesson, the mechanism is embed-index-retrieve-generate.

When it comes up: Buyer says "we're grounding the model on our data" or "it searches our docs first." Now the identity question shifts: what data can be retrieved, and do the original access controls survive the vector index?
Don't confuse with: Fine-tuning. RAG adds knowledge at query time. Fine-tuning bakes behavior into model weights. A buyer who says "we trained it on our data" might mean either. Ask which.
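
One way to picture "do the original access controls survive the vector index?" is a retrieval step that re-applies the ACL before ranking. The sketch below is a toy under stated assumptions: the `Chunk` type, the `allowed_groups` field, and the group names are hypothetical, and word overlap stands in for embedding similarity so the example runs without any libraries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset[str]  # ACL copied from the source system at index time


def score(query: str, text: str) -> int:
    # Toy relevance: number of shared lowercase words (stand-in for vector similarity).
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve(query: str, user_groups: set[str], index: list[Chunk], k: int = 2) -> list[str]:
    # Filter by ACL first, so chunks the caller cannot see never reach the model's context.
    visible = [c for c in index if c.allowed_groups & user_groups]
    ranked = sorted(visible, key=lambda c: score(query, c.text), reverse=True)
    return [c.text for c in ranked[:k] if score(query, c.text) > 0]


INDEX = [
    Chunk("parental leave policy is sixteen weeks", frozenset({"hr", "all-staff"})),
    Chunk("executive compensation policy for 2026", frozenset({"hr"})),
    Chunk("deploy policy for the build cluster", frozenset({"eng"})),
]

if __name__ == "__main__":
    print(retrieve("compensation policy", {"eng"}, INDEX))
    print(retrieve("compensation policy", {"hr"}, INDEX))
```

The design point is the order of operations: trimming after generation is too late, because the restricted text has already been in the prompt.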

Fine-Tuning (not a rung — a model customization method)

Modifying model weights to change behavior, style, or format. The fine-tuning lesson made the point clearly: this shapes how the model responds and has almost no reliable effect on what the model knows. For grounding in current data, RAG is the tool.

When it comes up: Buyer says "we trained a custom model" or "we fine-tuned it on our corpus." Identity concern: who controls the training data pipeline, and does the fine-tuned model inherit or lose the base model's safety constraints?
Don't confuse with: RAG. If the buyer needs current, document-level knowledge with access controls, fine-tuning is the wrong tool. This is the single most common confusion in discovery.

Workflow

Multiple steps, fixed orchestration. Code controls the sequence. Anthropic names five patterns: chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer. Models may fill individual steps, but the path is predetermined.

When it comes up: Buyer describes a pipeline. "First it classifies, then routes, then generates." What matters for identity: who can trigger each path, and what credentials execute at each step?
Don't confuse with: Agent. If the buyer says "it decides what to do next based on what it finds," that's past workflow territory. Recall the core distinction from the workflows piece: workflows don't improvise.

Agent

Model dynamically directs its own process and tool usage. Plans, acts, observes, adjusts. The loop runs until the task completes or a stop condition fires. Anthropic's definition is the anchor worth holding onto when the buyer uses "agent" to mean six different things.

When it comes up: Buyer says "it figures out the steps on its own" or "it can browse, search, and file tickets." The hard identity question surfaces here: who constrains dynamic tool choice, and can you revoke the agent without breaking the human user's access?
Don't confuse with: Workflow with tools. The test is autonomy. A workflow calls five APIs in a fixed sequence. An agent calls the same five APIs in an order it determined at runtime. The difference is who chose the path.
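
The "who chose the path" test can be sketched in a few lines. Everything here is hypothetical: the three tools are stubs, and `stub_model` is a hand-written function standing in for an LLM's tool choice. The only thing the sketch is meant to show is where the sequence comes from.

```python
from typing import Callable, Optional

# Stub tools; real ones would call APIs with real credentials.
TOOLS: dict[str, Callable[[str], str]] = {
    "search": lambda task: f"results for {task}",
    "summarize": lambda task: f"summary of {task}",
    "file_ticket": lambda task: f"ticket filed for {task}",
}


def run_workflow(task: str) -> list[str]:
    """Workflow: the code fixes the order of tool calls ahead of time."""
    trace = []
    for name in ("search", "summarize", "file_ticket"):
        TOOLS[name](task)
        trace.append(name)
    return trace


def run_agent(task: str,
              choose_next: Callable[[str, list[str]], Optional[str]],
              max_steps: int = 5) -> list[str]:
    """Agent: something other than the code picks the next tool at runtime."""
    trace: list[str] = []
    for _ in range(max_steps):          # stop condition: step budget
        name = choose_next(task, trace)
        if name is None:                # stop condition: chooser says done
            break
        if name not in TOOLS:
            raise PermissionError(f"tool not available: {name}")
        TOOLS[name](task)
        trace.append(name)
    return trace


def stub_model(task: str, trace: list[str]) -> Optional[str]:
    # Hypothetical policy standing in for a model's decision.
    if "urgent" in task and "file_ticket" not in trace:
        return "file_ticket"
    if "search" not in trace:
        return "search"
    return None
```

Same tools, same credentials available, but only `run_workflow` lets you list the call sequence before the task arrives.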

Multi-Agent System

Multiple agents coordinate via supervisor-worker, debate, or handoff patterns. Each agent may carry its own tools, context, and credentials. The multi-agent lesson made one thing abundantly clear: coordination overhead compounds fast.

When it comes up: Buyer describes specialized agents handing off tasks. "One agent researches, another drafts, a third reviews." Identity at this rung is about delegation: how does delegated authority propagate across agents, and where does it terminate?
Don't confuse with: A workflow with parallel steps. Parallel workflows have independent steps with independent credentials. Multi-agent systems have delegation chains where authorization decisions compound.
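
To make "how does delegated authority propagate, and where does it terminate?" concrete, here is one possible policy, sketched under assumptions of my own: each hop receives at most the intersection of what it asks for and what its delegator holds, and the chain stops after a fixed number of hops. The `Grant` type, the scope strings, and `MAX_HOPS` are all illustrative; real systems may choose different rules.

```python
from dataclasses import dataclass

MAX_HOPS = 2  # assumed policy: authority may be passed on at most twice


@dataclass(frozen=True)
class Grant:
    holder: str
    scopes: frozenset[str]
    chain: tuple[str, ...]  # every principal the authority has passed through


def delegate(parent: Grant, to: str, requested: set[str]) -> Grant:
    """Pass authority one hop down, never widening it."""
    hops_so_far = len(parent.chain) - 1
    if hops_so_far >= MAX_HOPS:
        raise PermissionError(f"delegation chain terminates at {parent.holder}")
    # Attenuation: the child gets only what it asked for AND the parent holds.
    return Grant(to, parent.scopes & frozenset(requested), parent.chain + (to,))


if __name__ == "__main__":
    supervisor = Grant("supervisor",
                       frozenset({"docs:read", "docs:write", "tickets:create"}),
                       ("supervisor",))
    researcher = delegate(supervisor, "researcher", {"docs:read", "web:search"})
    drafter = delegate(researcher, "drafter", {"docs:read", "docs:write"})
    print(drafter.chain, sorted(drafter.scopes))
```

Keeping the full chain on each grant is what lets an audit answer which principal acted at each hop.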

If you remember nothing else

Ask the buyer: "Does the model decide what to do next, or does your code?" The answer places them on the ladder.

Vocabulary Collision Tables

Terms Where Your IDAM Intuition Misleads

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Token	Unit of text (~4 characters). Billing and context-window unit.	Bearer token, ID token, refresh token	LLM tokens carry zero authorization semantics. "500K tokens/day" is a cost statement, never a credential count.
Agent	System where an LLM dynamically directs its own tool use	Endpoint software, daemon, background process	Buyers call everything an agent. Only systems where the model directs tool choice at runtime raise the authorization-at-runtime problem.
Scope	Task constraint, tool availability, often expressed in natural language	OAuth scope string — machine-enforceable, checked by authorization server	"Only summarize HR policy" in a system prompt is an instruction, nothing more. Natural-language scope carries zero enforcement weight.
Session	Conversation thread, retained chat state, memory store	Stateful authenticated interaction with expiration and revocation	Ending a chat session does not revoke downstream credentials, clear retained logs, or remove agent memory.
Context	Everything assembled for a model call: prompt, retrieved docs, chat history, tool outputs	Risk signals — device posture, network, location, behavior	AI context is simultaneously evidence and instruction. A retrieved document can carry facts and prompt injection in the same payload.
Identity	Human user, AI agent, app, model provider, MCP server, downstream API principal	Managed subject — human, service account, device, workload	"The AI did it" tells you nothing auditable. You need to know which principal acted at each hop.
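
The Scope row is the easiest collision to show in code. In this minimal sketch the system prompt is just a string the model may or may not follow, while the check that actually gates a tool call lives in code. The tool names, scope strings, and `authorize_tool_call` are hypothetical.

```python
# Natural-language "scope": an instruction to the model, with no enforcement behind it.
SYSTEM_PROMPT = "Only summarize HR policy."

# Machine-enforceable scope: each tool maps to the scope string it requires.
REQUIRED_SCOPE = {
    "read_hr_policy": "hr:policy:read",
    "read_payroll": "hr:payroll:read",
}


def authorize_tool_call(tool: str, granted_scopes: set[str]) -> bool:
    """Allow a tool call only if the caller holds the scope that tool requires."""
    required = REQUIRED_SCOPE.get(tool)
    # Unknown tools are denied by default.
    return required is not None and required in granted_scopes


if __name__ == "__main__":
    granted = {"hr:policy:read"}
    # Whatever the prompt says, this check decides what actually executes.
    print(authorize_tool_call("read_hr_policy", granted))
    print(authorize_tool_call("read_payroll", granted))
```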

Identity Model Per Pattern

Your IDAM knowledge genuinely helps here, up to a point. The "Where It Breaks" column marks that point for each pattern.

AI Pattern	IDAM Equivalent	Where It Holds	Where It Breaks
Single LLM call	API call to a service	Authentication, rate limiting, access logging all apply	The same input can produce different outputs, so you can't acceptance-test it like a deterministic endpoint. Standard monitoring answers "is it running?" and stops there. Whether it's right requires evals.
RAG	Search with ACL-trimmed results	Per-document access control is the right instinct	Vector indexes often flatten ACLs at embed time. Access controls must be re-applied at retrieval, not assumed from the source system.
Workflow	Orchestrated application process	Step-level authorization, credential-per-step, audit trail	A workflow step can return 200 OK with a wrong answer. The model fills content probabilistically inside a deterministic frame. Evals exist for exactly this reason.
Agent (including MCP tool discovery)	Service account / non-human identity	Credential issuance, scoping, revocation all apply	Static service-account permissions miss dynamic tool choice. MCP standardizes tool discovery but does not enforce resource-level authorization. Credentials must be short-lived and scoped per task.
Multi-agent	Multiple service accounts with delegation	Each agent needs its own identity, credentials, audit trail	Delegation chains form at runtime and can't be fully pre-configured. Agent A can hand Agent B access to a combination of tools that neither was explicitly authorized to use together. Blast radius is non-deterministic.
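
"Short-lived and scoped per task" from the Agent row can be sketched as a credential that names its agent, its task, its scopes, and its expiry, and that can be revoked on its own. This is an illustration under assumptions, not a reference design: `TaskCredential`, `mint`, `allows`, the in-memory `REVOKED` set, and the five-minute default lifetime are all made up for the example, and the clock is passed in explicitly so the behavior is easy to follow.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskCredential:
    token: str
    agent_id: str
    task_id: str
    scopes: frozenset[str]
    expires_at: float  # seconds on whatever clock the caller uses


REVOKED: set[str] = set()  # stand-in for a real revocation list


def mint(agent_id: str, task_id: str, scopes: set[str],
         now: float, ttl_seconds: float = 300.0) -> TaskCredential:
    """Issue a credential bound to one agent, one task, and a short lifetime."""
    return TaskCredential(secrets.token_urlsafe(16), agent_id, task_id,
                          frozenset(scopes), now + ttl_seconds)


def allows(cred: TaskCredential, scope: str, now: float) -> bool:
    """Check revocation, expiry, and scope on every use, not just at issuance."""
    return (cred.token not in REVOKED
            and now < cred.expires_at
            and scope in cred.scopes)


if __name__ == "__main__":
    cred = mint("agent-7", "task-42", {"tickets:create"}, now=1000.0)
    print(allows(cred, "tickets:create", now=1100.0))
    REVOKED.add(cred.token)  # revokes this agent task only; the human's own login is untouched
    print(allows(cred, "tickets:create", now=1100.0))
```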

If you remember nothing else

When a buyer uses any of these terms, pause before assuming you know what they mean. Same word. Different mechanism entirely.

Source Index

Every entry above traces to the Patterns & Practice section. Use this to navigate back when a concept needs more depth than the recap provides.

Concept	Source Article
Complexity ladder, diagnostic questions, "start simple" principle	The Spectrum of AI Applications (Section Opener)
Context engineering, system prompt, context failure modes	Prompting and Context Engineering
RAG mechanics, access controls per-document, naive RAG failures	Retrieval-Augmented Generation
Fine-tuning vs. prompting, knowledge injection misconception	Fine-Tuning vs. Prompting
Workflows vs. agents, five workflow patterns, agent loop	Workflows vs. Agents
Tool use, function calling, MCP integration and authorization	Tool Use, Function Calling, and MCP
Evals, observability, tracing, tool-call failure modes	Evals and Observability
Multi-agent topologies, delegation chains, coordination overhead	Multi-Agent Patterns
Vocabulary collisions (token, agent, scope, session, context, identity)	Introduced across all lessons; consolidated in section glossary

Things to follow up on...

MCP's accumulating security surface: Trend Micro's scan found 492 MCP servers running without basic security controls, and the OWASP MCP Top 10 is now in beta — worth tracking as the auth story matures.
Multi-agent failure rates in practice: The MAST study analyzed 1,642 execution traces across seven open-source frameworks and found failure rates ranging from 41% to 86.7%, with coordination breakdowns as the largest category — useful ammunition when a buyer's pitch outpaces their architecture.
Anthropic's agent-building guidance: The canonical Building Effective Agents piece defines the workflow-vs-agent distinction this entire ladder rests on, and it's worth reading in full for the five workflow pattern descriptions and the complexity warnings.
Context engineering as a discipline: Anthropic Engineering published a detailed guide on effective context engineering for AI agents that explains why context assembly is now the core production skill — and why prompt engineering alone stopped being sufficient.
