
Patterns & Practice

The Context Window

Know what you're walking into.

The Context Window | Patterns & Practice — Issue 1

Patterns & Practice

VERSION 1 | Sunday, May 10

Every AI deployment pattern exists because the simpler one broke. Knowing the ladder, and spotting which rung your buyer actually needs, is the skill this section builds.

Section Opener

The Deployment Pattern Matters More Than the Model

By Leigh Garrity, May 9, 2026

Most AI initiatives your buyers describe are workflows wearing agent vocabulary, carrying governance overhead they never needed. Anthropic's complexity ladder — from single LLM calls to genuine autonomous agents — has become the most stable framework for sorting what's real from what's mislabeled. Each rung raises different identity questions. Your IDAM instincts are reliable through about the third one. This section maps the territory after that, pattern by pattern, starting with where the familiar mental models stop working.

PART ONE

Your Buyer Pays by the Token, and They Already Chose Where

When your buyer says they're "using Claude," they almost certainly aren't calling Anthropic. They're calling Claude through Bedrock, Azure AI Foundry, or Vertex AI — same model, same per-token rate, completely different compliance and procurement story. The hyperscaler won this decision before AI entered the conversation. This piece covers the consumption layer: what tokens actually are, how pricing works beneath the rate card (nobody pays sticker), and the structural reasons hyperscaler-hosted endpoints became the regulated-enterprise default. One follow-up question — "where are you accessing it?" — tells you more about your buyer's AI architecture than ten minutes of discovery.
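The arithmetic under any rate card is simple enough to run on a call. A minimal sketch, assuming separate per-million-token rates for input and output (the common structure, with output usually priced higher) and a single negotiated discount; the workload, both rates, and the 20% discount are hypothetical placeholders, not any provider's pricing.

```python
def monthly_cost_usd(input_tokens: int, output_tokens: int,
                     input_rate_per_m: float, output_rate_per_m: float,
                     discount: float = 0.0) -> float:
    """Monthly spend from per-million-token rates, minus a negotiated discount."""
    list_cost = (input_tokens / 1_000_000) * input_rate_per_m \
        + (output_tokens / 1_000_000) * output_rate_per_m
    return list_cost * (1 - discount)


# Hypothetical workload: 400M input tokens and 50M output tokens a month.
# Hypothetical rates: $2 per million input tokens, $10 per million output tokens.
sticker = monthly_cost_usd(400_000_000, 50_000_000, 2.00, 10.00)
negotiated = monthly_cost_usd(400_000_000, 50_000_000, 2.00, 10.00, discount=0.20)
print(f"sticker ${sticker:,.2f}, negotiated ${negotiated:,.2f}")
```

Swap in the buyer's real volumes and contract terms; the structure, not the numbers, is the point.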

PART TWO

What You Get When a Lab Releases the Weights and What It Costs to Run Them Yourself

When a buyer says they're evaluating Llama or DeepSeek for internal deployment, they're describing an infrastructure decision with specific consequences for identity and access. "Releasing the weights" gets treated as synonymous with open source, which creates real problems once procurement and legal get involved. The licenses aren't interchangeable, the GPU requirements are steep, and the dominant serving framework ships with no authentication by default. This piece covers what a weights release actually contains, what self-hosting costs in hardware and people, and the reliable pattern where proof-of-concept GPU clusters quietly become managed cloud deployments within a year. It also marks the exact point where your IdP migration intuition helps and where it starts to mislead you.

PART THREE

Naive RAG vs. Production-Grade RAG

When a buyer says "we built a RAG pipeline," they could mean a weekend prototype that answers questions over a folder of embedded PDFs, or a multi-stage retrieval system with hybrid search and document-level authorization. You'll hear both claims on calls, sometimes from the same account. The gap between them is where the conversation gets interesting. Naive RAG has no mechanism to detect that it retrieved the wrong chunks; it generates confidently from whatever it found. Production-grade RAG adds hybrid search, reranking, and per-user authorization at the retrieval layer. That authorization surface is exactly where your IDAM expertise connects to AI architecture. This piece maps where that connection holds and where it misleads you.
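To make the naive-versus-production gap concrete, here is a toy sketch. Word overlap stands in for embeddings and reranking, and the `Chunk` type, its `allowed_groups` ACL field, and the `min_score` cutoff are hypothetical illustrations, not any framework's API. The naive path ranks everything for everyone; the authorized path filters by the caller's groups before ranking and returns nothing rather than weak matches.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]  # hypothetical ACL copied from the source doc at ingest


def score(query: str, chunk: Chunk) -> float:
    # Toy relevance: fraction of query words that appear in the chunk.
    # Stands in for vector similarity plus reranking.
    q = set(query.lower().split())
    c = set(chunk.text.lower().split())
    return len(q & c) / (len(q) or 1)


def naive_retrieve(query: str, chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    # Naive RAG: rank every chunk for every caller and return the top k, however weak.
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]


def authorized_retrieve(query: str, chunks: list[Chunk], user_groups: set[str],
                        k: int = 2, min_score: float = 0.2) -> list[Chunk]:
    # Production-style: enforce the caller's ACL before ranking, then drop
    # weak matches instead of passing them to the model.
    scored = [(score(query, ch), ch) for ch in chunks if ch.allowed_groups & user_groups]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ch for s, ch in scored if s >= min_score][:k]


chunks = [
    Chunk("hr-001", "salary bands for 2026 by grade", frozenset({"hr"})),
    Chunk("it-042", "vpn setup guide for remote staff", frozenset({"all-staff"})),
    Chunk("fin-007", "2026 budget salary forecast", frozenset({"finance"})),
]
query = "2026 salary bands"
print([c.doc_id for c in naive_retrieve(query, chunks)])
print([c.doc_id for c in authorized_retrieve(query, chunks, {"all-staff"})])
```

The naive retriever hands an all-staff user the HR and finance chunks; the authorized one returns an empty list, which the application can turn into "I don't know" instead of a confident wrong answer.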

PART FOUR

Workflows vs. Agents — The Distinction That Earns You the Room

Anthropic drew the sharpest technical line between workflows and agents: workflows follow paths you designed; agents choose their own. That distinction governs authorization, cost, and how the system fails. Most buyers who say "agent" are describing a workflow. The AE who can name which one they actually need, and explain why it matters for auditability and blast radius, earns a different kind of conversation. This piece covers the five workflow patterns, the agent failure mode nobody mentions in demos, and field-ready language for each.
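A minimal sketch of the structural difference, with plain string functions standing in for model calls and a hypothetical `choose` policy standing in for the model's planning step. In the workflow, the path is written in code; in the agent, it is decided at runtime, which is why the step cap and the trace become the authorization and audit surface.

```python
from __future__ import annotations

from typing import Callable, Optional

Step = Callable[[str], str]
Policy = Callable[[str, list[str]], Optional[str]]  # returns a tool name, or None when done


def run_workflow(text: str, steps: list[Step]) -> str:
    # Workflow: the developer fixed the path; every run executes the same steps in order.
    for step in steps:
        text = step(text)
    return text


def run_agent(goal: str, tools: dict[str, Step], choose: Policy,
              max_steps: int = 5) -> tuple[str, list[str]]:
    # Agent: the policy picks the next tool each turn, so the path is only known at runtime.
    # The step cap bounds cost; the trace is what an auditor would review.
    state, trace = goal, []
    for _ in range(max_steps):
        name = choose(state, list(tools))
        if name is None:  # the policy decided it is finished
            break
        state = tools[name](state)
        trace.append(name)
    return state, trace


tools = {"strip": str.strip, "upper": str.upper}
print(run_workflow("  draft memo ", [str.strip, str.upper]))  # same path on every run


def looping_policy(state: str, names: list[str]) -> Optional[str]:
    # Hypothetical stand-in for a model that never decides it is done.
    return "strip"


result, trace = run_agent("  draft memo ", tools, looping_policy, max_steps=3)
print(trace)
```

The looping policy illustrates one common agent failure, a planner that never declares itself finished; without a cap like `max_steps`, nothing in the loop would stop it.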

PART FIVE

Copilots Have Your Permissions. All of Them.

Microsoft 365 Copilot doesn't get its own permissions. It gets the user's. Every SharePoint site, every Teams channel, every OneDrive folder the user can access is in scope, and Copilot can draw on any of it for any prompt. The authorization boundary hasn't moved. But the AI exercises the full granted permission set simultaneously, which means every overshared folder and every stale group membership from a 2019 migration is now active exposure. Your IDAM instinct says: if the permissions haven't changed, the risk profile hasn't changed. That instinct is wrong here, and understanding exactly where it breaks is the difference between leading the copilot conversation and losing it to a framework slide.
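A toy model of the inheritance point, using hypothetical site and group names: the assistant's reach is simply the union of every grant the user already holds. Nothing in this set changes when Copilot is switched on; the difference is that one prompt can now exercise all of it. This does not model how Copilot actually indexes or retrieves content.

```python
# Hypothetical tenant: which groups can read which sites.
site_grants = {
    "finance-site": {"finance"},
    "hr-site": {"hr", "legacy-2019-migration"},  # stale group left over from an old migration
    "project-x": {"all-company"},                # overshared to everyone
}
user_groups = {"sales", "legacy-2019-migration", "all-company"}


def reachable_sites(groups: set[str], grants: dict[str, set[str]]) -> list[str]:
    # The assistant inherits the user's rights, so its reach is every site
    # where at least one of the user's groups appears on the grant list.
    return sorted(site for site, allowed in grants.items() if allowed & groups)


print(reachable_sites(user_groups, site_grants))
```

Removing the stale membership drops `hr-site` from reach; the overshared `project-x` stays until someone fixes its grant.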

PART SIX

Function Calling, MCP, and Skills Are Layers, Not Choices

Function calling, MCP, and Skills keep showing up in the same buyer sentences, usually without clear boundaries between them. They're layers in a single stack, each operating at a different altitude: the model formats the request, MCP transports it, Skills decide whether it should happen at all. That distinction buys real credibility in a CAIO conversation. The security picture matters here too. Thirty CVEs filed against MCP in the first two months of 2026. Thirty-eight percent of surveyed servers running without authentication. Tool poisoning demonstrated as a real attack class, not a theoretical one. MCP solves connectivity. Governance is someone else's problem. Knowing where that line falls changes the conversation.
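A sketch of the three layers in order of execution. It assumes a generic tool-call shape (each provider's function-calling format differs) and a hypothetical allow-list gate standing in for the "should this happen" decision the paragraph assigns to Skills; the tool name and case ID are made up. The MCP part uses the protocol's JSON-RPC 2.0 `tools/call` request, and nothing in that message carries a policy decision.

```python
import json

# Layer 1, function calling: the model emits a structured request.
# Generic shape for illustration; real provider formats differ.
model_tool_call = {"name": "lookup_case", "arguments": {"case_id": "A-1138"}}

# Governance layer: a hypothetical allow-list, checked before anything is sent.
ALLOWED_TOOLS = {"lookup_case"}


def policy_allows(call: dict) -> bool:
    return call["name"] in ALLOWED_TOOLS


# Layer 2, MCP transport: wrap the call as a JSON-RPC 2.0 "tools/call" request.
def to_mcp_request(call: dict, request_id: int) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": call["name"], "arguments": call["arguments"]},
    })


if policy_allows(model_tool_call):
    print(to_mcp_request(model_tool_call, request_id=1))
```

Delete the `if` and the request still goes out in a perfectly valid MCP message, which is the line between connectivity and governance.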

PART SEVEN

Cloud API vs. On-Device Inference — Let the Buyer's Constraint Decide

Every AI model runs somewhere. Your buyer's constraint decides where. Cloud API inference sends prompts to a provider's GPUs over the network. On-device inference keeps everything on hardware the buyer controls. The moment a public sector buyer says "that data can't leave our network," you're in this conversation whether you planned to be or not. This piece maps both inference patterns against the constraints that actually drive the decision: data classification, disconnected environments, latency, privacy, and per-token cost at volume. It covers why quantization made capable models viable on a laptop, where Apple Intelligence previews the hybrid pattern enterprises will need, and the specific field language that shows your buyer you understand the tradeoff space.
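Quantization stores each weight in fewer bits, and weight memory scales directly with that. A back-of-the-envelope sketch for a hypothetical 8-billion-parameter model, counting weights only (KV cache, activations, and runtime overhead come on top):

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    # Weights only: parameter count x bytes per parameter, in decimal gigabytes.
    # Ignores KV cache, activations, and runtime overhead.
    bytes_total = params_billions * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9


for bits in (16, 8, 4):
    print(f"8B model @ {bits}-bit: {weight_memory_gb(8, bits):.1f} GB")
```

Dropping from 16-bit to 4-bit cuts weight memory by a factor of four, which is the difference between a server GPU and a well-equipped laptop.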

Recap — Deployment Patterns Cheat Sheet

You just read eight articles. Your head is full. This is the structure that makes it stick. Four diagnostic questions that place any buyer's AI initiative on the complexity ladder. Six deployment patterns with the identity implications spelled out. Two vocabulary collision tables that flag exactly where your IDAM intuition helps and where it starts lying to you — token, scope, session, agent, context, identity. Each term means something different on the other side of the conversation. The cheat sheet maps every entry back to its source article, so you can go deeper when a deal demands it.

Quick-Scan Dashboard

AI systems get assembled in layers. Each layer solves a problem the one below it couldn't handle, which is the reasonable part. The less reasonable part is that each layer also introduces its own failure mode, its own governance surface, and its own vocabulary that your buyer already expects you to speak fluently.

These six cards cover the architecture patterns landing in public sector conversations right now. The complexity ladder and why Anthropic says to stay as low on it as possible. Where regulated buyers actually deploy models and the shortcut that explains why. Open weights, RAG failure modes, the workflow-versus-agent distinction that most buyers get backwards, and what MCP leaves dangerously unfinished.

Scan before the call. Skip what doesn't apply.

Architecture Ladder

Six Rungs From Prompt to Agent

Anthropic's operating principle: find the simplest solution possible, and climb only when it breaks. Single prompt → RAG (adds live data) → workflow (adds orchestrated steps) → tool use (adds real-world action) → agent (adds autonomous planning) → multi-agent (adds delegation). Every rung up costs you latency, money, and governance headaches. Climb reluctantly.
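One way to encode "climb only when it breaks" is to ask which capability the requirement actually demands and stop at the lowest rung that supplies it. A sketch with hypothetical yes/no questions derived from the rung descriptions on this card; they are illustrative, not the cheat sheet's four diagnostic questions.

```python
from enum import IntEnum


class Rung(IntEnum):
    SINGLE_PROMPT = 1
    RAG = 2          # adds live data
    WORKFLOW = 3     # adds orchestrated steps
    TOOL_USE = 4     # adds real-world action
    AGENT = 5        # adds autonomous planning
    MULTI_AGENT = 6  # adds delegation


def minimum_rung(needs_live_data: bool = False, needs_fixed_steps: bool = False,
                 takes_actions: bool = False, plans_own_path: bool = False,
                 delegates: bool = False) -> Rung:
    # Check from the top down: the most demanding capability the requirement
    # actually needs sets the floor, and nothing above it is justified.
    if delegates:
        return Rung.MULTI_AGENT
    if plans_own_path:
        return Rung.AGENT
    if takes_actions:
        return Rung.TOOL_USE
    if needs_fixed_steps:
        return Rung.WORKFLOW
    if needs_live_data:
        return Rung.RAG
    return Rung.SINGLE_PROMPT


print(minimum_rung(needs_live_data=True, needs_fixed_steps=True).name)
```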

Deployment Posture

Regulated Buyers Choose Hyperscalers for the ATO

Bedrock, Azure AI Foundry, and Vertex AI inherit the hyperscaler's FedRAMP/HIPAA/SOC posture. Direct model APIs authenticate with static keys, route over the public internet, and live outside CloudTrail. Yes, the per-token cost can run higher and model versions lag. Buyers accept that trade because the ATO already exists. One less fight with the authorizing official.

Open Weights

Open Weights Ship Parameters, Not Infrastructure or Ops

"Releasing the weights" means distributing trained model parameters. Not training data. Not alignment procedures. Not the vendor's safety guardrails. The capability gap with proprietary models has narrowed to 5–10 benchmark points. The ops gap has not. Self-hosting breaks even above roughly 1.2B tokens per month, and that math conveniently ignores the $250K+ inference engineer you'll need to hire. In practice, most self-hosting plans quietly turn into Bedrock deployments.