Patterns & Practice
The Deployment Pattern Matters More Than the Model

Most AI initiatives your buyers describe are workflows wearing agent vocabulary, carrying governance overhead they never needed. Anthropic's complexity ladder — from single LLM calls to genuine autonomous agents — has become the most stable framework for sorting what's real from what's mislabeled. Each rung raises different identity questions. Your IDAM instincts are reliable through about the third one. This section maps the territory after that, pattern by pattern, starting with where the familiar mental models stop working.

When your buyer says they're "using Claude," they almost certainly aren't calling Anthropic. They're calling Claude through Bedrock, Azure AI Foundry, or Vertex AI — same model, same per-token rate, completely different compliance and procurement story. The hyperscaler won this decision before AI entered the conversation. This piece covers the consumption layer: what tokens actually are, how pricing works beneath the rate card (nobody pays sticker), and the structural reasons hyperscaler-hosted endpoints became the regulated-enterprise default. One follow-up question — "where are you accessing it?" — tells you more about your buyer's AI architecture than ten minutes of discovery.

When a buyer says they're evaluating Llama or DeepSeek for internal deployment, they're describing an infrastructure decision with specific consequences for identity and access. "Releasing the weights" gets treated as synonymous with open source, which creates real problems once procurement and legal get involved. The licenses aren't interchangeable, the GPU requirements are steep, and the dominant serving framework ships with no authentication by default. This piece covers what a weights release actually contains, what self-hosting costs in hardware and people, and the reliable pattern where proof-of-concept GPU clusters quietly become managed cloud deployments within a year. It also marks the exact point where your IdP migration intuition helps and where it starts to mislead you.

When a buyer says "we built a RAG pipeline," they could mean a weekend prototype that answers questions about embedded PDFs or a multi-stage retrieval system with hybrid search and document-level authorization. You'll hear both claims on calls, sometimes from the same account. The difference between them is where conversations get interesting. Naive RAG has no mechanism to detect it retrieved the wrong chunks. It generates confidently from whatever it found. Production-grade RAG adds hybrid search, reranking, and per-user authorization at the retrieval layer. That authorization surface is exactly where your IDAM expertise connects to AI architecture. This piece maps where that connection holds and where it misleads you.
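To make "per-user authorization at the retrieval layer" concrete, here is a minimal sketch. It assumes each chunk carries the group list of its source document; the `Chunk` class, the group names, and the scores are all hypothetical, and the score stands in for whatever hybrid search and reranking would produce. The point is only the ordering: filter by entitlement first, rank second.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    allowed_groups: frozenset  # hypothetical ACL copied from the source document
    score: float               # stand-in for a search/rerank relevance score


def retrieve(chunks, user_groups, top_k=2):
    """Return the top_k highest-scoring chunks the user is entitled to read."""
    user_groups = frozenset(user_groups)
    # Authorization runs before ranking, so unauthorized text never reaches the prompt.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    return sorted(visible, key=lambda c: c.score, reverse=True)[:top_k]


# Illustrative data only.
INDEX = [
    Chunk("Q3 board deck summary", frozenset({"executives"}), 0.92),
    Chunk("Expense policy, section 4", frozenset({"all-staff"}), 0.81),
    Chunk("Travel booking how-to", frozenset({"all-staff"}), 0.40),
]

results = retrieve(INDEX, user_groups={"all-staff"})
print([c.text for c in results])
```

In this sketch the best-scoring chunk is the board deck, and a pipeline without the filter would hand it to any user who asked the right question. With the filter, an all-staff user gets only the two chunks they could already open.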

Anthropic drew the sharpest technical line between workflows and agents: workflows follow paths you designed; agents choose their own. That distinction governs authorization, cost, and how the system fails. Most buyers who say "agent" are describing a workflow. The AE who can name which one they actually need, and explain why it matters for auditability and blast radius, earns a different kind of conversation. This piece covers the five workflow patterns, the agent failure mode nobody mentions in demos, and field-ready language for each.
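The workflow-versus-agent line is easier to see in code than in a slide. The sketch below is illustrative only: the three tool functions, the ticket example, and `stub_policy` are hypothetical, and the stub stands in for the model call a real agent would make to pick its next step.

```python
def fetch_ticket(state):
    state["ticket"] = "Password reset request"
    return state


def classify(state):
    state["category"] = "access"
    return state


def draft_reply(state):
    state["reply"] = f"Routing {state['category']} ticket"
    return state


TOOLS = {"fetch_ticket": fetch_ticket, "classify": classify, "draft_reply": draft_reply}


def run_workflow(state):
    """Workflow: the path is fixed in code, so the audit trail is known in advance."""
    trace = []
    for name in ("fetch_ticket", "classify", "draft_reply"):
        state = TOOLS[name](state)
        trace.append(name)
    return state, trace


def run_agent(state, choose_next, max_steps=5):
    """Agent: a policy (a model, in practice) picks each step at run time."""
    trace = []
    for _ in range(max_steps):  # the step cap bounds cost and blast radius
        name = choose_next(state, trace)
        if name is None:
            break
        state = TOOLS[name](state)
        trace.append(name)
    return state, trace


def stub_policy(state, trace):
    """Hypothetical stand-in for a model deciding what to do next."""
    for key, tool in (("ticket", "fetch_ticket"), ("category", "classify"), ("reply", "draft_reply")):
        if key not in state:
            return tool
    return None


print(run_workflow({})[1])
print(run_agent({}, stub_policy)[1])
```

With a well-behaved policy the two traces match. Swap in a policy that keeps choosing the same tool and the agent loop runs until the step cap stops it, which is the kind of failure a fixed workflow cannot produce.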

Microsoft 365 Copilot doesn't get its own permissions. It gets the user's. Every SharePoint site, every Teams channel, every OneDrive folder they can access — Copilot searches all of it on every prompt. The authorization boundary hasn't moved. But the AI exercises the full granted permission set simultaneously, which means every overshared folder and every stale group membership from a 2019 migration is now active exposure. Your IDAM instinct says: if the permissions haven't changed, the risk profile hasn't changed. That instinct is wrong here, and understanding exactly where it breaks is the difference between leading the copilot conversation and losing it to a framework slide.
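A small sketch of why unchanged permissions can still mean changed exposure. It does not model how Microsoft evaluates access; the users, groups, and resource paths are hypothetical, and the only idea shown is that a copilot acting as the user can reach the full union of what that user's group memberships grant.

```python
# Hypothetical directory data, for illustration only.
GROUP_MEMBERS = {
    "finance": {"dana"},
    "migration-2019": {"dana", "lee"},  # stale group nobody cleaned up
}

RESOURCE_ACL = {
    "sharepoint:/finance/budget": {"finance"},
    "sharepoint:/hr/salaries": {"migration-2019"},  # overshared during a migration
    "onedrive:/lee/notes": set(),                   # shared with no group
}


def effective_access(user):
    """Every resource the user can reach through any group membership."""
    groups = {g for g, members in GROUP_MEMBERS.items() if user in members}
    return sorted(r for r, allowed in RESOURCE_ACL.items() if allowed & groups)


print(effective_access("dana"))
print(effective_access("lee"))
```

A person rarely opens everything on that list. An assistant that searches with the user's identity can draw on all of it for any prompt, so the stale `migration-2019` grant becomes something the buyer has to clean up rather than something they can ignore.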

Function calling, MCP, and Skills keep showing up in the same buyer sentences, usually without clear boundaries between them. They're layers in a single stack, each operating at a different altitude: the model formats the request, MCP transports it, Skills decide whether it should happen at all. That distinction buys real credibility in a CAIO conversation. The security picture matters here too. Thirty CVEs filed against MCP in the first two months of 2026. Thirty-eight percent of surveyed servers running without authentication. Tool poisoning demonstrated as a real attack class, not a theoretical one. MCP solves connectivity. Governance is someone else's problem. Knowing where that line falls changes the conversation.

Every AI model runs somewhere. Your buyer's constraint decides where. Cloud API inference sends prompts to a provider's GPUs over the network. On-device inference keeps everything on hardware the buyer controls. The moment a public sector buyer says "that data can't leave our network," you're in this conversation whether you planned to be or not. This piece maps both inference patterns against the constraints that actually drive the decision: data classification, disconnected environments, latency, privacy, and per-token cost at volume. It covers why quantization made capable models viable on a laptop, where Apple Intelligence previews the hybrid pattern enterprise will need, and the specific field language that shows your buyer you understand the tradeoff space.

Recap — Deployment Patterns Cheat Sheet
You just read eight articles. Your head is full. This is the structure that makes it stick. Four diagnostic questions that place any buyer's AI initiative on the complexity ladder. Six deployment patterns with the identity implications spelled out. Two vocabulary collision tables that flag exactly where your IDAM intuition helps and where it starts lying to you — token, scope, session, agent, context, identity. Each term means something different on the other side of the conversation. The cheat sheet maps every entry back to its source article, so you can go deeper when a deal demands it.
