Context Budget as Architecture

By Leigh Garrity, May 8, 2026

MCP-as-deployed burns tokens before any work happens. Anthropic's Skills pattern bets on loading less. The difference shows up in your next vendor conversation.

MCP support is now table stakes in enterprise AI pitches. Anthropic's Skills pattern is showing up in the same conversations, usually without a clear explanation of why it exists alongside MCP rather than instead of it. Both patterns connect models to external capabilities. The architectural difference is where and when the model learns what those capabilities are — and that difference has a direct cost, measured in tokens.

Quick disambiguation before we go further: in this piece, "token" means a unit of text that a language model processes, roughly four characters or three-quarters of a word. Not an OAuth bearer token. Not a SAML assertion. When I mean those, I'll say "access token" or "bearer token." The vocabulary overlap is genuinely annoying and has caused real confusion in buyer conversations where the IDAM team and the AI team are in the same room.

A language model's context window — the total amount of text it can hold in working memory at once — is finite. Claude 3.5 Sonnet's context window runs to 200,000 of these tokens. That sounds large until you start filling it with tool definitions before the user has typed a single word. Architectural choices determine how much of that budget gets spent on infrastructure versus work.
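
To make the budget concrete, here is a minimal sketch that estimates how many context tokens a single tool definition occupies, using the four-characters-per-token rule of thumb from the disambiguation above. The `search_tickets` schema and both function names are hypothetical, and a real tokenizer will return different counts, so treat the result as an order-of-magnitude estimate only.

```python
import json

CONTEXT_WINDOW_TOKENS = 200_000  # Claude 3.5 Sonnet figure cited in the text
CHARS_PER_TOKEN = 4              # rough rule of thumb, not a real tokenizer


def estimate_tokens(text: str) -> int:
    """Approximate token count using the ~4 characters per token heuristic."""
    return max(1, round(len(text) / CHARS_PER_TOKEN))


def window_share(tokens: int, window: int = CONTEXT_WINDOW_TOKENS) -> float:
    """Fraction of the context window that `tokens` would occupy."""
    return tokens / window


# Hypothetical tool definition, for illustration only.
example_tool = {
    "name": "search_tickets",
    "description": "Search the help desk for tickets matching a query.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Free-text search terms."},
            "limit": {"type": "integer", "description": "Maximum results."},
        },
        "required": ["query"],
    },
}

schema_text = json.dumps(example_tool)
tokens = estimate_tokens(schema_text)
print(f"~{tokens} tokens, {window_share(tokens):.4%} of the window")
```

One small tool barely registers. The budget question only becomes interesting once dozens of these definitions, with longer descriptions and usage examples, are loaded together.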

MCP-as-Naively-Deployed

What it is: MCP (Model Context Protocol) is an open standard for connecting language models to external tools and data sources. Naive deployment means every connected MCP server loads its complete tool schema into the model's context at the start of every conversation turn.

What it does: When an agent session initializes, each connected server announces itself by sending a full description of every tool it exposes — names, descriptions, parameter schemas, usage examples. The model receives this as part of its input context. Five connected servers, each with a moderately complex tool schema, can consume 8,000–17,000 context tokens before the user's first message is processed. The model then reasons over the full set of available tools on every turn, whether it needs them or not.

Who's behind it / where it comes from: MCP was published by Anthropic in late 2024 as an open protocol and has since been adopted across the vendor ecosystem — Microsoft, Google DeepMind, and a growing list of enterprise software vendors have announced MCP support. The "naive deployment" pattern isn't a spec requirement. It's what happens when developers answer the integration question before they ask the budget question.

What makes it distinct: The honest version is that MCP-as-naively-deployed is what you get when connecting things is treated as the goal rather than the starting point. The protocol is sound. The deployment pattern is expensive. A typical MCP server tool schema runs 1,500–3,500 context tokens. That's per server, per turn. The math compounds fast, and it compounds before the model has done anything useful.
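
The compounding is easy to check by hand. The sketch below multiplies the per-server range quoted above by the number of servers, then by the number of turns, assuming (as the naive pattern does) that every schema is resent on every turn. The function names are illustrative and the ten-turn session is an arbitrary example.

```python
SCHEMA_TOKENS_LOW = 1_500   # per-server range quoted in the text
SCHEMA_TOKENS_HIGH = 3_500


def naive_cost_per_turn(servers: int) -> tuple[int, int]:
    """(low, high) context tokens spent on tool schemas in a single turn."""
    return servers * SCHEMA_TOKENS_LOW, servers * SCHEMA_TOKENS_HIGH


def naive_cost_per_session(servers: int, turns: int) -> tuple[int, int]:
    """Cumulative schema tokens processed when every turn reloads every schema."""
    low, high = naive_cost_per_turn(servers)
    return low * turns, high * turns


print(naive_cost_per_turn(5))         # (7500, 17500)
print(naive_cost_per_session(5, 10))  # (75000, 175000)
```

The per-turn figure is what occupies the window at any moment. The per-session figure is what gets processed over the whole conversation, before any caching a given vendor may apply.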

Okta analog: The all-attributes SAML assertion

Some IdP configurations include every available user attribute in every SAML assertion, regardless of what the service provider actually requested. The SP asked for authentication and got authentication plus department, manager, cost center, badge number, and parking permit status. Technically correct. Wasteful. Naive MCP has the same shape: every tool schema loads every turn, whether the model needs it or not.

Where the analogy breaks: SAML bloat is a latency and privacy problem. MCP context bloat is a capability problem — you're not just wasting bandwidth, you're consuming the resource the model uses to reason. In a buyer conversation, this reframes "how many tools can it connect to?" as "how does it decide what to load?"

Anthropic's Skills Pattern

What it is: Skills is Anthropic's pattern for lazy-loading tool definitions. A minimal trigger description — a brief summary of when a skill is relevant — loads into context at session start. The full tool specification loads only when the model determines the skill is needed for the current task.

What it does: Instead of loading complete tool schemas upfront, the Skills pattern loads a compact registry: short descriptions of available capabilities, typically 50–150 context tokens per skill. When the model's reasoning determines that a skill is relevant to the current request, the full specification for that skill loads at that point. Skills that aren't relevant to the current task never load at all. On a five-skill setup, the upfront context cost might be 250–750 tokens rather than 8,000–17,000. The budget gets spent on work, not on announcing what work is theoretically possible.

Who's behind it / where it comes from: Anthropic introduced the Skills pattern in documentation accompanying Claude's tool use capabilities, as a recommended approach for managing context in multi-tool deployments. It's not a competing standard to MCP. It's an architectural pattern that can be implemented on top of MCP or independently of it.

What makes it distinct: The bet is that the model's judgment about relevance is more efficient than loading everything and letting the model sort it out. The trigger description does two things: it tells the model a skill exists, and it gives the model enough information to decide whether to load the full specification. The full specification only materializes when the model decides it's needed. Infrastructure engineers will recognize this as lazy loading, a long-established software pattern, applied here to tool definitions.
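
For readers who think in code, the lazy-loading idea can be sketched in a few lines. This is an illustration of the general pattern, not Anthropic's implementation: the `Skill` and `SkillRegistry` classes, the skill names, and the placeholder specification strings are all hypothetical, and in a real system the call to `load` would be triggered by the model's own output rather than by hand.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Skill:
    name: str
    trigger: str                  # short description, always in context
    load_spec: Callable[[], str]  # fetches the full specification on demand


@dataclass
class SkillRegistry:
    skills: dict[str, Skill] = field(default_factory=dict)
    loaded: dict[str, str] = field(default_factory=dict)

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def registry_prompt(self) -> str:
        """What the model sees at session start: trigger descriptions only."""
        return "\n".join(f"- {s.name}: {s.trigger}" for s in self.skills.values())

    def load(self, name: str) -> str:
        """Called when the model decides a skill is relevant; loads the spec once."""
        if name not in self.loaded:
            self.loaded[name] = self.skills[name].load_spec()
        return self.loaded[name]


registry = SkillRegistry()
registry.register(Skill(
    "expense_report",
    "Use when the user asks to file or check expenses.",
    lambda: "FULL SPEC: expense_report parameters and examples",
))
registry.register(Skill(
    "calendar",
    "Use when the user asks about meetings or availability.",
    lambda: "FULL SPEC: calendar parameters and examples",
))

print(registry.registry_prompt())  # two short lines, no full specs
spec = registry.load("calendar")   # the full spec materializes only now
print(sorted(registry.loaded))     # ['calendar']
```

The `expense_report` specification never enters context in this session, which is the whole point: an unused skill costs only its trigger line.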

Okta analog: Just-in-time attribute release

Attribute-based access control systems can be configured to assert claims only when policy says they're needed for the current transaction, rather than including all available claims in every access token. The Skills pattern applies the same logic to tool definitions: the model gets a minimal description of what's available, and the full specification only materializes when it's relevant.

Where the analogy breaks: ABAC policy enforcement is external to the principal — the IdP decides what to release. In Skills, the model decides what to load. That's a different trust model. In a buyer conversation, "who decides when to load the full skill?" has a specific answer: the model. That has implications for auditability that are worth surfacing before a pilot starts.

Comparing the Two

Three traits matter for this audience: context budget consumption, appropriate deployment context, and what to ask any vendor claiming support for either pattern.

Context budget consumption

Naive MCP: front-loaded, fixed, and independent of task. The cost is the same whether the agent uses two tools or twelve on a given turn. In a 200,000-token context window, five connected servers at 8,000–17,000 tokens consume roughly 4–8.5% of total capacity before the user types anything. In a longer agentic session where the model is also accumulating conversation history, tool outputs, and intermediate reasoning steps, that upfront cost starts competing directly with the work itself.

Skills: proportional to relevance. The upfront cost is the trigger registry — small and bounded. The marginal cost of each skill is paid only when the model decides to use it. In practice, a session that uses two of five available skills pays for two full schemas plus five trigger descriptions, rather than five full schemas regardless of use. The practical implication isn't just efficiency. It's that Skills-based deployments can support more total capabilities within the same context budget, because unused capabilities cost almost nothing.
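
The two-of-five case can be worked through directly. The sketch below assumes midpoints of the ranges quoted earlier (2,500 tokens per full schema, 100 tokens per trigger description); both figures and both function names are illustrative assumptions, not measurements.

```python
def naive_cost(total_tools: int, schema_tokens: int) -> int:
    """Every full schema loads, regardless of use."""
    return total_tools * schema_tokens


def skills_cost(total_tools: int, used_tools: int,
                schema_tokens: int, trigger_tokens: int) -> int:
    """All trigger descriptions load; full schemas load only for skills used."""
    return total_tools * trigger_tokens + used_tools * schema_tokens


# Assumed midpoints: 2,500-token schemas, 100-token triggers.
naive = naive_cost(5, 2_500)
lazy = skills_cost(5, 2, 2_500, 100)
print(naive, lazy)  # 12500 5500
```

Under the same assumptions, 40 registered skills with two in use cost 40 × 100 + 2 × 2,500 = 9,000 tokens, which is still less than five naively loaded schemas.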

Appropriate deployment context

Naive MCP is appropriate when the tool set is small, stable, and likely to be used in most sessions. If an agent connects to three tools and uses all three on nearly every turn, the upfront loading cost is a rounding error. The pattern also has a predictability advantage: the model always knows exactly what's available, with no latency for on-demand loading and no risk that a trigger description is too vague to surface a relevant skill.

Skills is appropriate when the tool set is large or varied, or when most sessions use a small subset of available capabilities. Enterprise deployments, where an agent might theoretically connect to dozens of systems but typically uses three or four per task, are the natural fit. The pattern also handles capability expansion more gracefully: adding a new skill means adding a trigger description to the registry, not increasing the baseline context cost for every session.

Neither pattern is universally better. The question is whether your deployment looks more like "small stable toolset, high utilization" or "large varied toolset, selective utilization." Most enterprise deployments look like the second one.
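
One way to put a number on "high utilization" is a break-even point. Under a deliberately simple cost model (every trigger description always loads, and each used skill adds one full schema), lazy loading saves tokens as long as the fraction of tools used stays below 1 minus the trigger-to-schema size ratio. The function names and the 2,500/100 token figures are assumptions for illustration.

```python
def breakeven_utilization(schema_tokens: int, trigger_tokens: int) -> float:
    """Fraction of tools used above which lazy loading stops saving tokens.

    Derived from: total * trigger + used * schema < total * schema
    """
    return 1 - trigger_tokens / schema_tokens


def cheaper_pattern(total_tools: int, used_tools: int,
                    schema_tokens: int, trigger_tokens: int) -> str:
    """Name the pattern with the lower token cost under this simple model."""
    utilization = used_tools / total_tools
    if utilization < breakeven_utilization(schema_tokens, trigger_tokens):
        return "skills"
    return "naive"


print(round(breakeven_utilization(2_500, 100), 2))  # 0.96
print(cheaper_pattern(30, 4, 2_500, 100))           # skills
print(cheaper_pattern(3, 3, 2_500, 100))            # naive
```

This counts tokens only. It ignores the two costs the naive pattern avoids: on-demand loading latency and the risk that a vague trigger description fails to surface a skill the task needs.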

What to ask any vendor claiming MCP support

"We support MCP" is now a commodity claim. The follow-up that separates architectural thinking from marketing copy: How does your system manage context budget when multiple MCP servers are connected?

A vendor who has thought about this will describe a loading strategy. A vendor who hasn't will describe connection count. Those are different conversations.

Specific follow-ups worth asking:

What's the baseline context cost at session initialization with five connected servers?
Does tool schema loading happen per-turn or per-session?
How does the system handle context budget exhaustion mid-task?

The last question is particularly revealing. Naive MCP deployments that run out of context mid-session have a specific failure mode: the model loses access to earlier conversation history as the context window fills, because something has to give. Skills-based deployments have a different failure mode: the model might fail to surface a skill it needs if the trigger description is insufficiently specific. Neither failure mode is acceptable in production. Both are predictable. Neither gets mentioned in demos.
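
A vendor with a real answer to the exhaustion question is tracking something like the following. This is a minimal sketch of a context budget ledger, not any vendor's implementation: the `ContextBudget` class, the category names, the 90% warning threshold, and the session numbers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBudget:
    window: int = 200_000
    usage: dict[str, int] = field(default_factory=dict)

    def add(self, category: str, tokens: int) -> None:
        """Record tokens spent on a category (schemas, history, tool output)."""
        self.usage[category] = self.usage.get(category, 0) + tokens

    @property
    def used(self) -> int:
        return sum(self.usage.values())

    @property
    def remaining(self) -> int:
        return self.window - self.used

    def near_limit(self, threshold: float = 0.9) -> bool:
        """True once usage reaches the given fraction of the window."""
        return self.used >= threshold * self.window

    def breakdown(self) -> dict[str, float]:
        """Share of the window consumed by each category, as percentages."""
        return {k: round(100 * v / self.window, 1) for k, v in self.usage.items()}


budget = ContextBudget()
budget.add("tool_schemas", 17_000)  # upper end of the five-server estimate
budget.add("history", 120_000)      # illustrative numbers for a long session
budget.add("tool_outputs", 45_000)
print(budget.breakdown())   # {'tool_schemas': 8.5, 'history': 60.0, 'tool_outputs': 22.5}
print(budget.remaining)     # 18000
print(budget.near_limit())  # True
```

What the system does once `near_limit` fires (truncate history, summarize, unload schemas, or stop) is the design decision worth asking about.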

Okta analog: Least privilege as an architectural principle

Least privilege in IDAM means a system operates with only the permissions it needs for the current task. Skills applies the same principle to context: load only what's needed for the current turn. The framing works well in conversations with security-minded buyers who already think in terms of minimal footprint.

Where it breaks: least privilege in IDAM is enforced by an authorization system external to the principal. In Skills, the "enforcement" is the model's own judgment about what to load — the model is simultaneously the principal and the policy engine. That's a design choice, not a flaw, but it means your audit trail looks different from what IDAM teams expect. Ask vendors how they log which skills were loaded on which turn, and whether that log is queryable independently of the model's output.
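
As a reference point for that vendor question, an independently queryable skill-load log might look like the sketch below. The `SkillLoadEvent` fields, the `SkillAuditLog` class, and the session identifiers are hypothetical; the point is only that each load is recorded per turn in a store that does not depend on the model's own account of what it did.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class SkillLoadEvent:
    session_id: str
    turn: int
    skill: str
    reason: str     # the model's stated rationale, if the system captures one
    timestamp: str  # UTC, ISO 8601


class SkillAuditLog:
    """Append-only record kept outside the model's output."""

    def __init__(self) -> None:
        self._events: list[SkillLoadEvent] = []

    def record(self, session_id: str, turn: int, skill: str, reason: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self._events.append(SkillLoadEvent(session_id, turn, skill, reason, stamp))

    def loads_for_turn(self, session_id: str, turn: int) -> list[str]:
        """Skills loaded in a given session and turn, in load order."""
        return [e.skill for e in self._events
                if e.session_id == session_id and e.turn == turn]

    def to_jsonl(self) -> str:
        """One JSON object per line, suitable for shipping to a log store."""
        return "\n".join(json.dumps(asdict(e)) for e in self._events)


log = SkillAuditLog()
log.record("sess-1", 1, "calendar", "user asked about availability")
log.record("sess-1", 3, "expense_report", "user asked to file an expense")
print(log.loads_for_turn("sess-1", 3))  # ['expense_report']
```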

How to Say This in the Field

Don't say	Do say	Why it matters
"We support MCP"	"We support MCP with on-demand schema loading — tool definitions load when the model needs them, not all at once"	Distinguishes architectural approach from commodity connectivity claim
"It connects to everything"	"It can connect to a wide range of tools, and the context cost scales with what the agent actually uses"	Reframes connection count as budget efficiency, not just capability breadth
"The agent knows what tools it has"	"The agent has a registry of available tools and loads the full specification for whichever ones are relevant to the current task"	Distinguishes knowing-about from loading — the mechanism matters for cost and auditability
"More integrations means more capable"	"More integrations means more potential capability — the architecture determines whether that potential costs you upfront or on demand"	Prevents the buyer from equating connection count with performance
"Token limits won't be a problem"	"We manage context budget actively — here's how tool definitions are loaded and what happens if the session approaches the limit"	Forces a specific answer instead of a reassurance
"The context window is huge"	"The context window is 200,000 tokens, and the loading strategy keeps tool infrastructure to a small fraction of that"	Anchors the claim in a specific number and a specific design choice
"It uses Anthropic's Skills"	"It uses a lazy-loading pattern for tool definitions — the full schema only loads when the model decides it's relevant"	Explains the mechanism rather than citing a brand name the buyer may not recognize
"Token" (without disambiguation)	Specify: "context token" (unit of text the model processes) vs. "access token" (bearer credential for API authentication)	Prevents the vocabulary collision that derails technical conversations with IDAM-fluent buyers
"The model decides what to do"	"The model decides which tools to load based on trigger descriptions, then executes against the full schema — here's how that decision gets logged"	Surfaces the auditability question before the buyer asks it
"We haven't hit context limits in testing"	"Our test scenarios used [X] tools with [Y] average schema size — here's the context budget breakdown for a representative session"	Replaces anecdote with a reproducible claim the buyer can verify

One honest note before you take this into a meeting: the practitioner community is still working out what "Skills support" means across vendors. Anthropic's documentation describes the pattern; how faithfully any given vendor implements it varies, and the term is not yet standardized enough to use as a differentiator on its own. The question "can you show me the context budget breakdown for a five-tool session?" will tell you more than any feature checklist. Ask it early, before the pilot scope gets set.