Context Engineering: What the Model Sees Before You Type

By Carey Whitten— May 5, 2026

Context Engineering: What the Model Sees Before You Type

Every model response starts the same way: the model receives a block of text and predicts what should come next. What varies — what determines whether the model sounds like a careful federal compliance assistant or a casual consumer chatbot — is what's in that block before the user's message arrives.

That pre-assembly is context engineering. You don't toggle it on. It's the discipline of constructing the input payload that shapes model behavior at inference time, before any user interaction occurs.

How the Assembly Works

A production AI system typically assembles three categories of content into the context window before the user's message is appended.

The system prompt is a natural-language instruction block placed at the start of the context. It establishes role, constraints, tone, and behavioral rules. "You are a procurement assistant for a federal civilian agency. You do not provide legal advice. You cite FAR clauses when relevant. You respond in plain language at an eighth-grade reading level." The model doesn't execute these as code — it reads them as text and weights its outputs accordingly. A well-constructed system prompt can meaningfully shift the distribution of model outputs toward the intended behavior. A poorly constructed one produces a model that ignores it under pressure.

Few-shot examples are input-output pairs embedded in the context before the user's turn. They demonstrate the expected format and reasoning pattern rather than describing it. If you want the model to respond to policy questions with a specific structure — question restatement, applicable regulation, plain-language answer, caveat — you show it three examples of that pattern. The model infers the template from demonstration. This is more reliable than instruction alone for complex output formats, because the model is pattern-matching against concrete evidence rather than interpreting abstract description.

Structured context is the assembled situational data relevant to the session: user role, agency, current workflow state, prior conversation turns, any retrieved content. (Retrieval mechanics are Lesson 2's problem.) A contracting officer and a CISO can interact with the same underlying model and get meaningfully different responses — same weights, different context payload.

The full assembled input — system prompt, examples, structured context, user message — is what the model actually processes. Context engineering is the practice of designing and maintaining that assembly. Practitioners moved from "prompt engineering" to "context engineering" as a label because the clever query was a distraction. The architecture of the entire input is where the behavior lives.

Okta Concept Mapping

Closest IDAM analogy: claims transformation at the authorization server. When an OAuth AS issues an access token, it doesn't pass through raw user intent — it applies policy, filters scopes, enriches claims, and sets audience. The downstream resource server sees a shaped artifact. Context engineering does the same thing: the model sees an assembled, policy-shaped payload rather than raw user input. The system prompt is the policy. Few-shot examples are transformation rules that show rather than tell.

Where it breaks: The AS enforces policy deterministically and with cryptographic guarantees. A token's scope boundary is a hard constraint — no amount of clever user behavior crosses it. A system prompt is natural language in a text buffer. The model follows it probabilistically, not deterministically. A sufficiently adversarial user input can cause a model to ignore, contradict, or reveal its system prompt. There is no jailbreak for an OAuth scope. When a buyer asks "how do you enforce the AI's behavior," the honest answer is: you influence it, you don't enforce it. That distinction matters in a federal risk conversation.

What "Customized AI" Actually Means in Practice

When a CAIO tells you they need a "customized AI," they're usually describing one of three things: a different model (fine-tuning, Lesson 3), a model with access to their documents (retrieval, Lesson 2), or a model with different behavioral defaults for their context. The third one is context engineering, and it's the fastest and most common path to what most agencies actually want.

Most "custom AI" deployments in production today are the same foundation model with a different context assembly. A vendor claiming to offer a "customized AI for federal procurement" is, in most cases, offering a system prompt and a few-shot library wrapped in a product interface. Worth understanding clearly, not as a criticism — that's often exactly what's needed. But the buyer who understands the mechanism can evaluate the claim. The seller who can explain the mechanism earns the conversation.

The vocabulary question your buyer is actually asking, underneath "can we customize it," is: how durable is the customization, who controls it, and what happens when a user tries to work around it. Context engineering gives you the first two. The third one is where the honest answer gets uncomfortable — and where the conversation about governance, monitoring, and red-teaming begins.

That's Lesson 4's territory. For now: the model is shaped by what it reads before you arrive. Understanding what goes into that assembly is what separates a credible AI conversation from a features demo.

How the Assembly Works

A production AI system typically assembles three categories of content into the context window before the user's message is appended.

Okta Concept Mapping