In most cases, a vendor has built a carefully assembled context layer that shapes what the model sees before your words arrive. The model itself is often identical to what you'd get through a public API. The behavioral differences trace back to that assembly layer, not the model.
What It Is, Precisely
Context engineering is the practice of deliberately assembling the full input to a language model at inference time, treating everything the model processes — not just the user's message — as a design surface. A language model doesn't receive a message; it receives a context window, a finite buffer of text that it processes as a single unit to generate a response. The user's message is one component of that buffer. The rest is assembled by the system.
Context engineering is the discipline of designing that rest.
The field evolved from "prompt engineering" as practitioners discovered that optimizing the user-facing prompt was the wrong lever. The model's behavior is shaped by the entire context window. Focusing only on the user's words while leaving the surrounding structure unexamined is like tuning a radio station while ignoring that the transmitter is pointed the wrong direction.
How It Works
Three components do most of the work.
System prompts are instructions prepended to the conversation before any user input. They're invisible to the user and define the model's operating parameters: its persona, its constraints, its response format, the scope of what it will and won't address. A system prompt might be 500 words or 5,000. It might specify that the model should always respond in plain language, never speculate about legal liability, and treat every user as a federal civilian employee with no clearance. The model doesn't "remember" these instructions across sessions — they're reassembled fresh at every inference call.
Few-shot examples are sample exchanges embedded in the context that demonstrate the desired response pattern. Rather than describing what good output looks like, you show it. The model infers the pattern from the examples and applies it to the new input. Three well-chosen examples often do more to shape response style and format than a paragraph of explicit instruction. Counterintuitive, but consistent: models are better at pattern-matching than at following abstract rules.
Structured context assembly is the practice of programmatically composing the context window before the user's message lands. This is where the engineering complexity lives. The system pulls in relevant signals — user role, session history, task type, retrieved documents, current date, organizational metadata — and assembles them into a structured input. A modern large language model might have a context window of 128,000 tokens, roughly 100,000 words. A user's message is typically 50 to 200 tokens. The assembled context surrounding it might be 5,000 to 15,000 tokens. The user's words are a small fraction of what the model is actually processing.
The model's response is a function of the entire context, not the user's message alone. Change the system prompt, and you change the model's behavior across every conversation, without touching the model itself.
Okta Concept Mapping
Context engineering most resembles access token assembly at an authorization server. When an authorization server issues a token, it gathers claims from multiple sources — identity, group membership, resource policy, session state — and packages them into a structure that shapes what the bearer can do downstream. The resource server doesn't re-authenticate; it reads the claims and acts. Context engineering works analogously: the assembly layer gathers signals and packages them into the context window, where they shape model behavior downstream. The analogy holds well enough to be useful. Token claims are discrete, enumerable, and their effect on authorization decisions is deterministic. You can reason formally about what a token with claim X will authorize. A context window's effect on model output is probabilistic: the same context can produce different responses across runs, and the relationship between a specific instruction and model behavior cannot be formally audited the way a policy can be. That gap is not cosmetic. The governance model for context engineering is fundamentally unlike the governance model for token-based authorization, and treating them as equivalent will produce gaps.
What This Means in a Vendor Conversation
When a vendor tells a CIO they've built a "custom AI," that claim covers at least three distinct things, and they're not equivalent.
The first is fine-tuning: modifying the model's weights by training on domain-specific data. Expensive, requires ongoing maintenance as base models improve, and increasingly rare because context engineering has gotten good enough to substitute for it in most enterprise use cases.
The second is context engineering: building a sophisticated assembly layer on top of a commodity model. Legitimate differentiation. A well-engineered context layer can make a general-purpose model behave like a domain expert, enforce compliance constraints, maintain consistent persona, and adapt to user role, all without touching the model itself.
The third is a system prompt on a commodity model with a marketing wrapper. Also common.
Ask the vendor directly what's customized: the model weights, the context assembly layer, or both? A vendor with real context engineering work will have a concrete answer about what signals they're assembling, how they're structured, and how they're maintained. A vendor selling the third option will change the subject.
For public sector accounts specifically, the context assembly layer is also where compliance constraints live. Data handling restrictions, response scope limitations, role-based behavioral differences — these are implemented as context engineering decisions, not model properties. That means they're configurable, auditable at the assembly layer, and dependent on the vendor's discipline in maintaining them. Worth asking about before the procurement closes.
The model is a commodity. The context layer is the product.

