Recap — From Black Box to Plumbing

Reference map of the four mechanical layers behind every AI agent, organized so identity questions land on the right component.

By Leigh Garrity, May 9, 2026

You just finished nine articles about the mechanical reality behind every "AI-powered" feature a buyer will describe to you. This is the reference version: four layers and, for each term in a layer, the sales moment where it surfaces and the adjacent concept it gets confused with.

The organizing insight: an AI agent is a loop. A model emits requests. A harness executes them. A retrieval layer feeds the model what it needs. A protocol layer connects the tools. Every identity question you already know how to ask attaches to the harness and protocol layers, not the model. Your IDAM instincts are correct. They just attach to a different component than you might assume.

The Model Layer

What the model does: emit structured requests and reason over its context window.

Tool call — A structured JSON output where the model requests that a function be executed. The model does not execute anything. It asks. As you saw in the function calling article, the model produces the request; the harness executes it.
- When it comes up: Buyer says "our agent can use tools." They mean the model emits these requests. Execution happens elsewhere.
- Don't confuse with: API call. Your OAuth intuition about bearer tokens and scopes applies to the harness's API call, not to the model's output.
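
To make the request/execution split concrete, here is a minimal Python sketch. The tool name `lookup_user`, the JSON shape, and the `HARNESS_TOKEN` value are all hypothetical; no provider's actual format is implied. The point is only that the model's output is text, and the credential appears at execution time.

```python
import json

# Hypothetical model output: plain JSON text, with no credential anywhere in it.
model_output = '{"tool": "lookup_user", "arguments": {"email": "pat@example.com"}}'

# Hypothetical secret held by the harness; the model never sees this value.
HARNESS_TOKEN = "example-token-not-real"

def lookup_user(email: str, token: str) -> dict:
    """Stand-in for a real API call; a real harness would send `token` as a bearer token."""
    return {"email": email, "status": "active", "authorized": bool(token)}

# Registry of functions the harness is willing to execute.
TOOLS = {"lookup_user": lookup_user}

def execute_tool_call(raw: str) -> dict:
    """Parse the model's request and execute it with the harness's credential."""
    request = json.loads(raw)
    name = request["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    # The credential is attached here, by the harness, at execution time.
    return TOOLS[name](**request["arguments"], token=HARNESS_TOKEN)

result = execute_tool_call(model_output)
```

In this sketch the scope question ("what can this token do?") has nothing to inspect on the model side; it attaches entirely to `execute_tool_call`.
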
Context window — The fixed-size input the model reasons over for a single turn: system prompt, conversation history, retrieved documents, prior tool results. Everything the model knows about the current task lives here and only here.
- When it comes up: Buyer mentions token costs, context limits, or "the agent losing track." That's context window pressure.
- Don't confuse with: Memory. The model has no persistent state between sessions unless the harness builds one.
Grounding — Connecting model outputs to verifiable source material so the model reasons from evidence, not training data alone.
- When it comes up: Buyer says "we need the AI to use our data, not make things up."
- Don't confuse with: RAG. RAG is one technique for achieving grounding. Other approaches exist. The distinction matters when a buyer treats them as synonyms.

If you remember nothing else

The model is stateless, credentialless, and executes nothing. Every security-relevant action happens outside it.

The Harness Layer

What the harness does: execute tool calls, carry credentials, manage the growing context, enforce policy.

Harness (orchestrator) — The software that wraps the model. Recall from the agentic loop article: the harness receives tool-call outputs, executes them against real systems, feeds results back, and decides whether to loop again or stop. This is where the agent actually lives.
- When it comes up: Buyer says "we're building agents" or "we're using agentic AI." They're describing a harness.
- Don't confuse with: The model. The model is a component inside the harness. The harness is the agent; the model is the reasoning engine.
Credential carrier (a functional label, not a named standard) — The harness holds and presents credentials (OAuth tokens, API keys, service account secrets) when executing tool calls. The model never sees raw secrets.
- When it comes up: "What identity does the agent use?" This is your question. The answer lives in harness configuration.
- Don't confuse with: User identity. The harness may act on behalf of a user but present a service account credential. The delegation chain matters, and it's not always visible.
Context accumulation — Each loop iteration adds content to the context window: the tool call, its result, the model's next reasoning step. As you saw in the context management article, context grows until the window fills or the harness truncates.
- When it comes up: Buyer describes agents that "slow down" or "forget earlier steps." That's context bloat, and it gets misdiagnosed as a performance issue.
- Don't confuse with: Rate limiting. Context accumulation is a content problem (too much text in the window), not a throughput problem.
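
A rough Python sketch of context accumulation under a fixed budget. Everything here is an assumption made for illustration: the word-count "tokenizer," the 40-token budget, and the drop-oldest truncation policy. Real harnesses use real tokenizers and often summarize instead of dropping.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())

def truncate(context: list[str], budget: int) -> list[str]:
    """Keep the system prompt (index 0); drop the oldest other entries until under budget."""
    kept = list(context)
    while sum(count_tokens(c) for c in kept) > budget and len(kept) > 2:
        kept.pop(1)  # the oldest non-system entry is silently discarded
    return kept

def run_loop(steps: int, budget: int) -> list[str]:
    """Simulate loop iterations that each append a tool call and its result."""
    context = ["system: you are a helpful agent"]
    for i in range(steps):
        context.append(f"tool_call {i}: search query number {i}")
        context.append(f"tool_result {i}: " + "row " * 10)
        context = truncate(context, budget)
    return context

final = run_loop(steps=5, budget=40)
```

After five iterations under this policy, the earliest tool calls are gone from `final`. Nothing in the remaining context tells the model they ever existed, which is what "forgetting earlier steps" looks like from the inside.
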
Progressive disclosure (Skills) — Anthropic's Skills architecture loads only a skill's name and description into context by default; full instructions load when triggered. This is harness-layer logic: the orchestrator decides what the model needs to see and when.
- When it comes up: Buyer mentions Claude Skills or "the agent knows how to do X." Skills are the playbook layer. MCP servers are the connection layer. Complementary, not interchangeable.
- Don't confuse with: MCP. Skills tell the model how to do something. MCP connects it to the system where it does it.
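
The progressive disclosure pattern can be sketched in a few lines of Python. This is not Anthropic's implementation: the `Skill` class, the two example skills, and the keyword trigger are hypothetical simplifications chosen only to show the "summary by default, full body on demand" shape.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str
    instructions: str  # full body, loaded only when the skill is triggered

# Hypothetical skills for illustration.
SKILLS = [
    Skill("quarterly-report", "Build the quarterly access review report.",
          "Step 1: export entitlements. Step 2: group by manager. Step 3: flag orphans."),
    Skill("offboarding", "Revoke access for a departing employee.",
          "Step 1: disable SSO. Step 2: rotate shared secrets. Step 3: archive mailbox."),
]

def default_context(skills: list[Skill]) -> list[str]:
    """Only name and description enter the context by default."""
    return [f"{s.name}: {s.description}" for s in skills]

def context_for(request: str, skills: list[Skill]) -> list[str]:
    """Append full instructions only for skills the request triggers."""
    context = default_context(skills)
    for s in skills:
        if s.name in request:  # naive trigger, an assumption of this sketch
            context.append(s.instructions)
    return context

idle = default_context(SKILLS)
active = context_for("run offboarding for pat", SKILLS)
```

The design choice worth noticing: the cost of having a skill installed is one short line of context, not the whole playbook.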

If you remember nothing else

The harness is where identity policy gets enforced or doesn't. If you can't see the harness, you can't audit the agent.

The Retrieval Layer

What the retrieval layer does: select and deliver the right external content into the context window before the model reasons.

Embeddings — Vector representations of text that capture semantic similarity. Two passages about the same concept land near each other in vector space, even with zero keyword overlap.
- When it comes up: Buyer says "we're vectorizing our documents" or "we have a vector database."
- Don't confuse with: Search. Embeddings enable semantic search but aren't search by themselves. They need an index and a query pipeline.
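
A toy Python illustration of "near each other in vector space." The three-dimensional vectors below are invented by hand; real embeddings come from an embedding model and have hundreds or thousands of dimensions. Cosine similarity is one common way to measure closeness, used here as an assumption.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 means similar direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made vectors standing in for model-generated embeddings.
vectors = {
    "reset your password": [0.9, 0.1, 0.0],
    "recover account credentials": [0.8, 0.2, 0.1],
    "quarterly revenue forecast": [0.0, 0.2, 0.9],
}

query = vectors["reset your password"]
ranked = sorted(vectors, key=lambda text: cosine_similarity(query, vectors[text]), reverse=True)
```

The first two passages share no keywords, yet they rank together because their vectors point the same way. Note that the sorting step is already a tiny query pipeline; the vectors alone do nothing.
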
RAG (Retrieval-Augmented Generation) — Retrieve relevant documents, inject them into the context window, let the model generate grounded in that material. As you saw in the embeddings article, retrieval happens before the model sees anything.
- When it comes up: Buyer says "we're using RAG." Ask what they're retrieving from and how. The identity question: who authorized access to the source corpus?
- Don't confuse with: Fine-tuning. RAG adds knowledge at query time. Fine-tuning changes model weights permanently. Completely different governance implications.
Hybrid search — Combining keyword search (BM25) with vector similarity, merging results via reciprocal rank fusion. Microsoft's testing confirms hybrid retrieval with semantic reranking outperforms either method alone. Their recommended default for production RAG is hybrid plus semantic ranking.
- When it comes up: Buyer says retrieval quality is poor. Hybrid search is the standard fix. The identity question still applies: does the retrieval pipeline respect user access permissions on source documents?
- Don't confuse with: Semantic search alone. Pure vector search misses exact-match terms like product names and policy numbers.
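
Reciprocal rank fusion is simple enough to sketch directly. Each document scores 1 / (k + rank) in every result list it appears in, and the scores are summed. The constant k = 60 is a commonly used default, and the document IDs and rankings below are made up for illustration.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists: each appearance contributes 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)

# Hypothetical results from a keyword (BM25) search and a vector search.
keyword_results = ["policy-4471", "faq-12", "guide-3"]
vector_results = ["guide-3", "policy-4471", "blog-9"]

fused = reciprocal_rank_fusion([keyword_results, vector_results])
```

Documents that both methods agree on rise to the top, while a document only one method found still survives lower in the list. The fusion uses ranks, not raw scores, so the two searches never need comparable scoring scales.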

If you remember nothing else

Whatever enters the context window is what the model reasons over. If access controls aren't enforced before retrieval, the model will reason over documents the user shouldn't see. OWASP LLM08:2025 (Vector and Embedding Weaknesses) exists because this keeps happening.
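
One way to picture "enforce before retrieval" is a permission filter that runs ahead of the relevance match. The sketch below is an assumption-heavy illustration: the `Document` class, the group names, and the naive keyword match all stand in for a real corpus, a real identity provider, and a real retriever.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    allowed_groups: set = field(default_factory=set)

# Hypothetical corpus with per-document access groups.
CORPUS = [
    Document("hr-1", "Salary bands for 2026", {"hr"}),
    Document("kb-7", "How to reset a password", {"hr", "support", "sales"}),
    Document("fin-3", "Unreleased earnings draft", {"finance"}),
]

def retrieve(query_terms: list[str], user_groups: set, corpus: list[Document]) -> list[Document]:
    """Filter by the user's groups first, then match; unauthorized docs are never candidates."""
    visible = [d for d in corpus if d.allowed_groups & user_groups]
    return [d for d in visible if any(t.lower() in d.text.lower() for t in query_terms)]

def build_context(question: str, user_groups: set) -> list[str]:
    """Assemble what the model will see: the question plus permitted, matching documents."""
    docs = retrieve(question.split(), user_groups, CORPUS)
    return [question] + [d.text for d in docs]

support_view = build_context("salary password", {"support"})
hr_view = build_context("salary password", {"hr"})
```

The same question produces different context for different users. If the filter ran after the match, or not at all, the salary document would reach the model for the support user, and no prompt instruction reliably takes it back out.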

The Protocol Layer

What the protocol layer does: standardize how tools are described, discovered, and connected.

Function calling — The model provider's native format for structured tool use. OpenAI, Anthropic, and Google each define their own JSON schema. Not standardized across providers.
- When it comes up: Buyer says "we're using function calling." Provider-specific plumbing.
- Don't confuse with: MCP. Function calling is how one model talks to its harness. MCP is a cross-provider protocol for tool discovery and connection.
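
To see what "provider-specific" means in practice, the Python sketch below wraps one hypothetical tool, `get_user_groups`, in two different envelopes. The shapes follow the tool-definition formats as commonly documented for OpenAI's Chat Completions API and Anthropic's Messages API; treat them as an assumption and check current provider documentation before relying on the field names.

```python
# A JSON Schema describing the tool's arguments; the tool itself is hypothetical.
parameters = {
    "type": "object",
    "properties": {"email": {"type": "string"}},
    "required": ["email"],
}

def to_openai_tool(name: str, description: str, schema: dict) -> dict:
    """Wrap a tool in the nested 'function' envelope (assumed OpenAI-style shape)."""
    return {
        "type": "function",
        "function": {"name": name, "description": description, "parameters": schema},
    }

def to_anthropic_tool(name: str, description: str, schema: dict) -> dict:
    """Wrap a tool in a flat envelope with 'input_schema' (assumed Anthropic-style shape)."""
    return {"name": name, "description": description, "input_schema": schema}

openai_tool = to_openai_tool("get_user_groups", "List a user's groups.", parameters)
anthropic_tool = to_anthropic_tool("get_user_groups", "List a user's groups.", parameters)
```

The argument schema is identical in both; only the wrapper differs. That wrapper difference is the plumbing a harness has to translate when a buyer switches or mixes providers.
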
MCP (Model Context Protocol) — An open protocol for connecting AI clients to tool servers. Defines how tools are described, discovered, and invoked. Two transports: STDIO (local subprocess) and Streamable HTTP (remote).
- When it comes up: Buyer says "we're using MCP." Ask: local STDIO servers or remote HTTP? The identity architecture differs completely.
- Don't confuse with: An API gateway. MCP standardizes tool description and discovery, not traffic management.
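
MCP messages are JSON-RPC 2.0, so discovery and invocation can be sketched as two small Python dictionaries. The method names `tools/list` and `tools/call` come from the MCP specification; the tool name `get_ticket` and its argument are hypothetical, and the sketch omits the initialization handshake entirely.

```python
import json
from typing import Optional

def jsonrpc_request(request_id: int, method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request envelope."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message

# Discovery: ask the server which tools it offers.
list_tools = jsonrpc_request(1, "tools/list")

# Invocation: call one tool by name with structured arguments (hypothetical tool).
call_tool = jsonrpc_request(
    2, "tools/call", {"name": "get_ticket", "arguments": {"ticket_id": "T-100"}}
)

# What would travel over the transport, STDIO or HTTP.
wire = json.dumps(call_tool)
```

Notice what the message does not contain: any statement of who is asking. Identity rides on the transport, which is why the STDIO-versus-HTTP question matters.
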
MCP authorization — For remote (Streamable HTTP) servers, the spec requires OAuth 2.1. MCP servers are OAuth resource servers, not authorization servers. For STDIO servers, recall from the MCP authorization article: the spec punts. Credentials come from the environment. This is the gap the spec acknowledges but doesn't close.
- When it comes up: "How does the agent authenticate to that MCP server?" For remote: OAuth 2.1. For local STDIO: whatever the host process has in its environment variables. The spec calls this "delegation," which is a generous word for what's actually happening.
- Don't confuse with: End-user authentication. MCP authorization governs the client-to-server connection, not the human-to-application session.
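
The STDIO case is easy to demonstrate with nothing but the Python standard library. In the sketch below, a child process stands in for a local tool server; the variable name `EXAMPLE_API_KEY` and its value are hypothetical. By default the child sees whatever the parent's environment holds.

```python
import os
import subprocess
import sys

# Hypothetical secret sitting in the host process's environment.
os.environ["EXAMPLE_API_KEY"] = "example-key-not-real"

# Stand-in for a local STDIO tool server: a child that reports what it can see.
child_code = "import os; print(os.environ.get('EXAMPLE_API_KEY', 'missing'))"

def run_child(env=None) -> str:
    """Launch the child; with env=None it inherits the parent's full environment."""
    completed = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env, capture_output=True, text=True, check=True,
    )
    return completed.stdout.strip()

# Default launch: the secret is inherited with no token exchange at all.
inherited = run_child()

# Explicit launch: the host decides what the child gets by passing a reduced environment.
reduced_env = {k: v for k, v in os.environ.items() if k != "EXAMPLE_API_KEY"}
scrubbed = run_child(env=reduced_env)
```

No grant, no scope, no expiry: the child simply has the key. Whether a given host passes the full environment or a reduced one is a configuration detail to ask about, since the protocol itself does not decide it.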

If you remember nothing else

MCP standardizes how tools get connected. Trust establishment sits outside the protocol entirely. Three of the top four risks in the OWASP Agentic Top 10 involve identity, tools, and delegated trust boundaries. That's your conversation.

Vocabulary Mapping: Buyer Language to Identity Questions

AI Term	What It Means in AI	IDAM Equivalent	Key Divergence
"Agentic AI"	A harness looping a model through tool calls	Service account executing a workflow	The agent's credential may outlive the user session that spawned it; session binding isn't automatic
"RAG pipeline"	Retrieve docs → inject into context → generate	Federated query across data sources	No built-in access control at retrieval; the pipeline must enforce permissions the model won't
"MCP server"	A tool endpoint the agent discovers and calls	A SCIM endpoint or SAML SP	MCP defines discovery and invocation; trust establishment sits outside the protocol
"Tool use"	Model emits a structured JSON request for execution	API call with a bearer token	The model doesn't hold the token. The harness does. Scope enforcement is the harness's job
"Context window"	Everything the model can see for this turn	Session state	Fixed size, no pagination. When it fills, content gets dropped. The model doesn't know what it lost

Vocabulary Mapping: Confusion Pairs

Term A	Term B	They Sound Similar Because…	The Actual Difference
Function calling	MCP	Both involve tools	Function calling is one provider's format. MCP is a cross-provider protocol for tool discovery and connection
RAG	Fine-tuning	Both "teach" the model new information	RAG injects knowledge at query time into context. Fine-tuning changes model weights permanently
Skills	MCP servers	Both extend what an agent can do	Skills are instructions (how). MCP servers are connections (where). Anthropic's docs call them complementary
STDIO transport	HTTP transport	Both are MCP transports	STDIO runs as a local subprocess inheriting host credentials. HTTP runs remotely with OAuth 2.1. Completely different identity postures

Source Article Index

Recap Entry	Source Article
Tool call, function calling	The Agentic Loop; Function Calling
Harness, credential carrier, context accumulation	The Agentic Loop; Context Management
Progressive disclosure, Skills	Skills and Agent Design; Context Management
Context window, grounding	Context Management; Embeddings, RAG, and Grounding
Embeddings, RAG, hybrid search	Embeddings, RAG, and Grounding; Retrieval Quality
MCP, STDIO vs. HTTP transports	MCP Connection Layer
MCP authorization, STDIO credentials	MCP Authorization and Trust
OWASP LLM08:2025	Embeddings, RAG, and Grounding
OWASP Agentic Top 10	MCP Authorization and Trust

Spec note: The current MCP specification is 2025-11-25. A revision is tentatively scheduled for June 2026 but has not shipped as of this writing. If you're reading this after June 2026, verify which spec version is current before citing authorization requirements in a buyer conversation.

Things to follow up on...

Context rot in production: Anthropic found that Claude Sonnet 4.5 would prematurely wrap up tasks as its context limit approached, a behavior they addressed with harness-level context resets that didn't generalize to other models.
MCP's token cost problem: A single GitHub MCP server can expose 90+ tools and consume over 50,000 tokens of schema definitions before the agent starts working, a scaling challenge Anthropic addressed with progressive tool discovery in January 2026.
OWASP's agentic security framework: The OWASP Top 10 for Agentic Applications landed in December 2025 with three of its top four risks centered on identity, delegated trust, and tool-level privilege abuse.
Grep vs. RAG tradeoffs: Claude Code's decision to drop vector-based retrieval in favor of filesystem tools like grep and glob is documented by its engineering team and has reshaped how practitioners think about retrieval for fast-moving codebases.
