The Plumbing: Section Recap

By Leigh Garrity, May 8, 2026

This document organizes what you read, not what you missed. New information is flagged explicitly. Everything else is retrieval, not instruction.

The Loop

The mechanical picture underneath every "AI-powered" feature your customers will describe:

Model receives context → emits response or tool call → harness executes the call with credentials → result loads back into context → model runs again.

That's it. Copilots, agents, assistants — all of them are this loop running at different speeds with different tools attached. The three problems below are the three ways the loop breaks in practice.
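A minimal sketch of the loop in Python, assuming a stand-in model and a single made-up tool. The names `fake_model`, `TOOLS`, and `lookup_contract` are illustrative, not any vendor's API. A real harness would call a model endpoint and attach credentials at the point where the tool runs.

```python
from typing import Callable

# Hypothetical tool registry: tool name -> function.
# In a real harness, this is where credentials get attached.
TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_contract": lambda contract_id: f"Contract {contract_id}: status=active",
}


def fake_model(context: list[str]) -> dict:
    """Stand-in for a model call: requests a tool once, then answers."""
    if not any(line.startswith("TOOL RESULT") for line in context):
        return {"type": "tool_call", "tool": "lookup_contract", "argument": "C-1001"}
    return {"type": "response", "text": "The contract is active."}


def run_loop(user_request: str, max_steps: int = 5) -> tuple[str, list[str]]:
    context = [f"USER: {user_request}"]
    for _ in range(max_steps):
        output = fake_model(context)                 # model receives context
        if output["type"] == "response":             # final response: loop ends
            return output["text"], context
        tool = TOOLS[output["tool"]]
        result = tool(output["argument"])            # harness executes the call
        context.append(f"TOOL RESULT ({output['tool']}): {result}")  # result loads back
    raise RuntimeError("step limit reached without a final response")


answer, final_context = run_loop("Is contract C-1001 active?")
print(answer)
```

The step limit is a design choice in this sketch: without one, a model that keeps requesting tools would never return.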

Problem 1: Context Budget

Context window — The fixed-size memory space the model reads on every inference call. Everything the model "knows" during a task must fit here. When it comes up: Customer says their agent "forgets" earlier instructions mid-task. That's context budget exhaustion. The model is fine; the window is full. Don't confuse with: Long-term memory or persistent storage. The context window clears between sessions unless something explicitly reloads it.

Token (AI sense) — The unit the model uses to measure context size. Roughly ¾ of a word in English, so a 128K-token window holds roughly 96,000 words. When it comes up: Vendor quotes context window size in tokens. That translates to "how much can the agent hold in its head at once." Don't confuse with: An auth token. Same word, completely different object. See the collision table below; this one will bite you in a live meeting.

Context accumulation — The pattern where each tool call result loads back into context, consuming budget. Long agentic tasks can exhaust the window before completing. When it comes up: Customer asks why their agent degrades on complex, multi-step workflows. The loop is eating its own context. Don't confuse with: Memory leaks in traditional software. Different mechanism, similar symptom.
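The budget arithmetic can be sketched in a few lines. This assumes the ¾-word-per-token rule of thumb and made-up sizes for the starting prompt and each tool result; real tokenizers and real tool outputs vary widely.

```python
WORDS_PER_TOKEN = 0.75  # rough English ratio; an approximation, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Crude estimate: word count divided by ~0.75 words per token."""
    return round(len(text.split()) / WORDS_PER_TOKEN)


def steps_until_full(window_tokens: int, prompt_tokens: int, tokens_per_result: int) -> int:
    """How many tool results fit before the window is exhausted."""
    return (window_tokens - prompt_tokens) // tokens_per_result


words_in_window = int(128_000 * WORDS_PER_TOKEN)       # 96000
# Assumed sizes: an 8,000-token starting prompt, 6,000 tokens per tool result.
max_tool_calls = steps_until_full(128_000, 8_000, 6_000)  # 20
print(words_in_window, max_tool_calls)
```

Under those assumed sizes, the window fills after 20 tool calls no matter how capable the model is.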

If you remember nothing else: The context window is the model's working memory. Everything costs tokens. Nothing persists automatically.

Problem 2: Retrieval Quality

RAG (Retrieval-Augmented Generation) — The pattern where the harness retrieves relevant documents and loads them into context before the model generates a response. The model doesn't "know" the documents; it reads them in real time. When it comes up: Customer asks how the AI "knows" about their internal policies or procurement catalog. RAG is the answer. The model was trained once. The documents were retrieved at query time. Don't confuse with: Fine-tuning. Fine-tuning changes the model weights. RAG changes what's in the context window. Not interchangeable, not equivalent.
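A toy illustration of the pattern, assuming a two-document store and a word-overlap retriever. `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical names; production systems use a search index and a real model call.

```python
# Hypothetical document store; in practice this is a search index.
DOCUMENTS = {
    "policy-7": "Purchases over $10,000 require two approvals.",
    "policy-9": "Remote staff may expense one monitor per year.",
}


def retrieve(query: str, k: int = 1) -> list[tuple[str, str]]:
    """Toy retriever: rank documents by lowercase words shared with the query."""
    query_words = set(query.lower().split())

    def overlap(item: tuple[str, str]) -> int:
        return len(query_words & set(item[1].lower().split()))

    return sorted(DOCUMENTS.items(), key=overlap, reverse=True)[:k]


def build_prompt(query: str) -> str:
    """Load the retrieved text into the context ahead of the question."""
    passages = "\n".join(f"[{doc_id}] {text}" for doc_id, text in retrieve(query))
    return f"Answer using only these documents:\n{passages}\n\nQuestion: {query}"


prompt = build_prompt("How many approvals do purchases over $10,000 require?")
print(prompt)
```

Nothing about the model changes here. The only thing that changes is the text placed in front of it at query time.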

Vector search — Retrieval method that converts text to numerical embeddings and finds documents by semantic similarity, not keyword match. When it comes up: Customer asks why the AI surfaced a relevant document even though the query didn't contain the exact words. Vector search. Don't confuse with: Full-text search. Vector search finds meaning-neighbors; full-text search finds string-matches. Neither is strictly better; they fail on different inputs.
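A sketch of similarity ranking, assuming hand-written three-dimensional vectors in place of real embeddings. The document names and numbers are invented for illustration; a real system would get its vectors from an embedding model.

```python
import math

# Hand-made "embeddings" (hypothetical values).
DOC_VECTORS = {
    "travel-reimbursement-policy": [0.9, 0.1, 0.0],
    "laptop-procurement-catalog": [0.1, 0.9, 0.1],
    "badge-access-procedure": [0.0, 0.2, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def vector_search(query_vector: list[float], k: int = 2) -> list[str]:
    """Return the k document IDs whose vectors point closest to the query."""
    ranked = sorted(
        DOC_VECTORS,
        key=lambda doc_id: cosine(query_vector, DOC_VECTORS[doc_id]),
        reverse=True,
    )
    return ranked[:k]


# Assumed embedding for "how do I get paid back for a flight": no shared
# keywords with the travel policy's title, but a nearby vector.
top = vector_search([0.8, 0.2, 0.1])
print(top)
```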

Hybrid retrieval — Combining vector search and keyword search, then merging the ranked results. Catches what each method misses alone. When it comes up: Customer reports the AI misses documents containing specific contract numbers or agency acronyms. Pure vector search fails on exact-match terms. Hybrid retrieval is the fix. Don't confuse with: Ensemble models. Hybrid retrieval is about the search step, not the model architecture.
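One common way to merge the two ranked lists is reciprocal rank fusion. The source articles don't say which merge method any given product uses, so treat this as an assumed choice. The document IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each document scores sum(1 / (k + rank)) across lists."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Vector search ranks the exact-match contract document last;
# keyword search ranks it first.
vector_hits = ["policy-overview", "vendor-faq", "contract-ABC-123"]
keyword_hits = ["contract-ABC-123", "policy-overview"]

merged = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(merged)
```

The contract document moves up from last place in the vector ranking because the keyword ranking vouches for it. That is the behavior hybrid retrieval is meant to deliver.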

Grounding — Constraining model responses to retrieved content rather than trained knowledge. Reduces hallucination; increases auditability. When it comes up: Compliance-sensitive customers ask how they know the AI isn't fabricating citations. Grounding is the mechanism — responses trace back to retrieved documents. Don't confuse with: Prompt injection defense. Grounding limits what the model draws on; it doesn't prevent malicious inputs from manipulating the model's behavior.
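A sketch of one auditable piece of grounding: checking that every citation in a response points at a document that was actually retrieved. The bracketed `[doc-id]` citation format is an assumption for this example, and the check does not verify that the cited document supports the claim.

```python
import re


def ungrounded_citations(response: str, retrieved_ids: set[str]) -> set[str]:
    """Return cited document IDs (written as [doc-id]) that were not retrieved."""
    cited = set(re.findall(r"\[([^\[\]]+)\]", response))
    return cited - retrieved_ids


retrieved = {"policy-7", "policy-9"}

clean = ungrounded_citations("Two approvals are required [policy-7].", retrieved)
flagged = ungrounded_citations("See the 2019 memo [memo-44] and [policy-9].", retrieved)
print(clean, flagged)
```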

If you remember nothing else: RAG is how the model gets information it wasn't trained on. The retrieval step is where accuracy lives or dies — not the model.

Problem 3: Tool Discovery

MCP (Model Context Protocol) — A wire protocol that standardizes how models communicate with external tools and data sources. Defines the message format; does not define what the tools do or who can call them. When it comes up: Vendor says their product is "MCP-compatible." That means it can receive and respond to MCP-formatted tool calls. It says nothing about auth, scope enforcement, or audit logging. Don't confuse with: An agent framework or orchestration platform. MCP is the pipe. The harness is what runs on top of it.
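For a sense of what travels over the pipe, here is a sketch of a tool call and its result in the JSON-RPC 2.0 shape MCP uses. The tool name, arguments, and result text are hypothetical.

```python
import json

# A tools/call request. Only the tool name and its arguments are in the body.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lookup_contract",            # hypothetical tool
        "arguments": {"contract_id": "C-1001"},
    },
}

# A matching result, keyed to the request by id.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Contract C-1001: status=active"}],
        "isError": False,
    },
}

wire_request = json.dumps(request)
print(wire_request)
```

In this sketch the message body names a tool and its arguments and nothing else. Who is calling, and under what permissions, has to be established outside the message.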

Tool manifest — The list of available tools the harness exposes to the model at the start of a session. The model selects from this list when it decides to call a tool. When it comes up: Customer asks how the AI "knows" what it can do. The tool manifest is the answer. The model reads a list; it doesn't discover tools dynamically at runtime. Don't confuse with: API documentation. The tool manifest is machine-readable and loaded into context. API docs are for humans and don't affect model behavior.

Skills (load management sense) — Discrete, pre-packaged tool bundles that the harness loads selectively based on task context. Prevents the tool manifest from consuming the entire context budget. When it comes up: Customer asks why the agent seems to have different capabilities in different workflows. Different Skills are loaded for different task types. This is intentional context management, not inconsistency. Don't confuse with: Plugins or extensions in the consumer sense. Skills are a context budget mechanism. The app-store analogy breaks quickly.
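A sketch of how selective loading keeps the manifest small. The skill names and tools are invented. The `name`, `description`, and `inputSchema` fields follow the shape MCP uses for tool definitions, and serialized length stands in as a rough proxy for token cost.

```python
import json

# Hypothetical skills: each bundles a few tool definitions.
SKILLS = {
    "procurement": [
        {"name": "search_catalog", "description": "Search the procurement catalog.",
         "inputSchema": {"type": "object", "properties": {"query": {"type": "string"}}}},
        {"name": "create_requisition", "description": "Draft a purchase requisition.",
         "inputSchema": {"type": "object", "properties": {"item_id": {"type": "string"}}}},
    ],
    "hr": [
        {"name": "lookup_leave_balance", "description": "Return an employee's leave balance.",
         "inputSchema": {"type": "object", "properties": {"employee_id": {"type": "string"}}}},
    ],
}


def build_manifest(active_skills: list[str]) -> list[dict]:
    """Expose only the tools from the skills loaded for this task."""
    return [tool for skill in active_skills for tool in SKILLS[skill]]


def manifest_size(manifest: list[dict]) -> int:
    """Characters of serialized manifest: a rough proxy for context cost."""
    return len(json.dumps(manifest))


full = build_manifest(["procurement", "hr"])
scoped = build_manifest(["hr"])
print(manifest_size(full), manifest_size(scoped))
```

The model in an HR workflow never sees `create_requisition`, which is why the same agent appears to have different capabilities in different workflows.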

If you remember nothing else: MCP is the wire. Skills are what you load onto it. The harness decides what runs — and right now, the harness is where auth lives by default, which means it's also where auth gaps live.

Vocabulary Collision Tables

Token

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Token	Unit of text measurement; ~¾ of a word; used to size context windows and meter API costs	Bearer token / access token / JWT	In AI, a token is consumed to measure input/output size. In IDAM, a token is presented to prove identity or carry claims. A 128K-token context window and a JWT are both called "tokens" in the same customer conversation. Clarify which one immediately or the conversation derails.

Session and Context

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Session	A single conversation or task run; context is scoped to it; ends when the task ends	Authenticated session with TTL, bound to a user identity	AI sessions carry no inherent auth state. An AI session can outlive the user's auth session, or run with no user session at all. The session boundary and the auth boundary are different objects.
Context	Everything loaded into the model's working memory for a given inference call: documents, tool results, conversation history	Claims and attributes inside an identity token	In IDAM, context is small and structured (a few KB of claims). In AI, context is the entire working memory, potentially hundreds of thousands of tokens. Same word, but the two differ by orders of magnitude in size and serve different purposes.

Scope and Agent

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
Scope	Informal: the task domain or operational boundary the agent is working within	OAuth scope: a declared, enforced permission boundary on an access token	In OAuth, scope is enforced by the authorization server and the resource server. In AI, "scope" describes what the agent is doing, not what it's permitted to do. The enforcement mechanism is absent unless explicitly built into the harness.
Agent	An AI system that executes multi-step tasks autonomously, calling tools and making decisions	Service account / non-human identity (NHI)	An agent is a behavioral description. A service account is an identity with credentials. The agent needs a service account to act, and most current implementations blur the line between them — which is a problem the next section addresses directly.
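To make the scope divergence concrete, here is a sketch of the enforcement step a harness would have to add for itself. The tool-to-scope mapping and the scope names are hypothetical. The only standard detail relied on is that OAuth scope values are a space-delimited string.

```python
# Hypothetical mapping from tool name to the OAuth scope it requires.
REQUIRED_SCOPE = {
    "search_catalog": "catalog.read",
    "create_requisition": "requisition.write",
}


class ScopeError(PermissionError):
    """Raised when the presented token does not cover the requested tool."""


def authorize_tool_call(tool_name: str, granted_scope: str) -> None:
    """Reject the call unless the token's scope string includes the required scope."""
    granted = set(granted_scope.split())   # space-delimited scope values
    required = REQUIRED_SCOPE[tool_name]
    if required not in granted:
        raise ScopeError(f"{tool_name} requires {required}; token grants {sorted(granted)}")


authorize_tool_call("search_catalog", "catalog.read")   # allowed: returns quietly

try:
    authorize_tool_call("create_requisition", "catalog.read")
    denied = False
except ScopeError:
    denied = True
print(denied)
```

Without a check like this, the agent's "scope" is only a description of what it happens to be doing.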

The Identity Bridge

Once the loop is visible, three questions become unavoidable. These are the same questions you ask about any non-human identity operating in a federal environment. The loop just makes clear why they can't wait.

Whose credentials did the harness use when it called that tool? The model doesn't authenticate. The harness does. That means the harness holds credentials, and "the harness holds credentials" currently means, in most deployments, a service account with broader access than any individual task requires.

What scope governed that call? The OAuth sense of scope, not the AI sense. What permission boundary was on the token the harness presented to the resource server? Was it scoped to the task, or to the harness's full capability set?

What's in the audit trail? The model's decision to call a tool, the tool call itself, the result that loaded back into context — which of those events are logged, and under whose identity are they attributed?

The next section addresses each of these directly.
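As a reference point for those three questions, here is a sketch of an audit record that could answer them. Every field name and identity value is illustrative, not a standard or a product schema.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def audit_record(event: str, tool: str, acting_identity: str,
                 on_behalf_of: Optional[str], scope: str) -> str:
    """Build one JSON line per loop event."""
    return json.dumps({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "event": event,                      # e.g. "model_decision", "tool_call", "result_loaded"
        "tool": tool,
        "acting_identity": acting_identity,  # whose credentials the harness presented
        "on_behalf_of": on_behalf_of,        # user the task ran for, if any
        "scope": scope,                      # OAuth scope on the presented token
    })


line = audit_record(
    event="tool_call",
    tool="search_catalog",
    acting_identity="svc-agent-harness",     # hypothetical service account
    on_behalf_of="user-123",                 # hypothetical user
    scope="catalog.read",
)
print(line)
```

Keeping `acting_identity` and `on_behalf_of` as separate fields is the design choice that matters: it keeps the service account and the human it acted for from collapsing into one attribution.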

For More Information

Recap Entry	Source Article	Section
The inference loop	"How AI Agents Actually Work: The Loop Beneath the Feature"	The Inference Loop
Context window, token (AI sense), context accumulation	"Context Windows and Why They Fill Up"	Context Budget
RAG, vector search, hybrid retrieval, grounding	"Retrieval-Augmented Generation: What Your Customer Means When They Say the AI Knows Things"	Retrieval Quality
MCP	"MCP: The Wire Protocol Everyone Is Shipping Against"	Tool Discovery
Skills, tool manifest	"Skills and Tool Loading: How Agents Manage What They Can Do"	Tool Discovery
Token, session, context, scope, agent (collision entries)	All section articles	Vocabulary Collisions

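Not covered above, but worth knowing: Authorization in MCP is optional. The specification describes an OAuth-based authorization flow for HTTP transports, but a server can be MCP-compatible without implementing it, and local (stdio) servers typically take credentials from their environment. How the harness authenticates tool calls over MCP is therefore implementation-specific, and in most current deployments it's a static credential attached to the harness process. This is the gap the next section addresses directly.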