"Custom AI" is showing up in every federal AI procurement right now. The phrase is doing significant work, and it's almost never defined. When a vendor says their model is customized for your agency's mission, they are almost certainly describing context engineering — not a new model, not fine-tuning on agency data, not a proprietary architecture. They engineered the context. Understanding what that means mechanically is what separates a seller who can follow the technical conversation from one who goes quiet when the CAIO starts asking pointed questions.
What Context Engineering Actually Is
Context engineering is the discipline of constructing the complete informational input to a language model — its instructions, examples, and relevant data — such that the model reliably produces behavior appropriate to a specific deployment. The term evolved from "prompt engineering" when practitioners recognized that phrasing questions well was the least of the work; the real discipline was architecting the entire information environment the model operates in before it generates a single token of output.
The model itself doesn't change. Its weights, the billions of numerical parameters that encode its knowledge and reasoning patterns, are fixed at training time and remain fixed during deployment. Everything the model sees when it's asked to do something is assembled fresh, deliberately and precisely. That assembled input is the context. Engineering it is the work.
Three Instruments
Production context engineering uses three primary instruments, usually in combination.
System prompts occupy a privileged position at the start of the context window, before any user input. They carry the foundational instructions: the model's persona, its domain constraints, its output format requirements, what it should refuse to discuss, and how it should handle edge cases. A well-constructed system prompt for a federal benefits application might run 800 to 1,500 tokens — roughly a page and a half of dense instruction. It tells the model it is an assistant for a specific agency, that it answers only questions within a defined scope, that it cites specific policy documents when making eligibility determinations, and that it escalates anything involving personal health information. The model doesn't "know" these things from training. It's told them, freshly, every time a conversation begins.
Few-shot examples are input-output pairs embedded in the context that demonstrate desired behavior without any modification to the model's underlying weights. If you want the model to format benefit determinations in a specific structure — claim number, eligibility finding, statutory basis, next steps — you show it three or four examples of that format before the live query. The model infers the pattern and applies it. This is cheaper and faster than fine-tuning, and it's reversible: change the examples, change the behavior. In production deployments, three to eight well-chosen examples typically outperform dozens of poorly chosen ones. Quality matters more than quantity.
Structured context assembly is the discipline that ties the other two together: deciding what information goes into the context window, in what order, with what formatting, and at what priority. Models are not indifferent to structure. Information placed early in a long context tends to receive more weight than information buried in the middle. Formatting signals — headers, delimiters, explicit labels — help the model parse what's instruction versus data versus example. In a production system handling thousands of queries, the assembly logic is often more complex than the prompts themselves. It's the difference between handing someone a folder of documents and handing them a briefing book.
Okta Concept Mapping
Context engineering most resembles how SAML assertions work: a trusted source makes claims about an entity — its identity, its permissions, its context — and the relying party acts on those claims without independently verifying the underlying facts. The system prompt is the assertion; the model is the service provider; the claims are trusted because of their position in the exchange, not because they've been cryptographically verified. Where the analogy breaks: in SAML, the SP can verify the assertion's signature against a certificate chain. The model cannot verify anything in its context. It trusts the system prompt because of convention, not cryptographic proof — which means a sufficiently crafted user message can sometimes override system prompt instructions entirely. Prompt injection has no clean SAML equivalent because you can't forge a signed assertion. The model's trust architecture is positional, not cryptographic, and that distinction has real security implications.
When a Vendor Says "Custom"
When a CAIO asks whether the AI is customized for their agency, they're usually asking several distinct questions simultaneously — and the answers to each are different.
Has the model been fine-tuned on agency data? In most commercial deployments, no. Fine-tuning is expensive, requires careful data governance, and is genuinely uncommon outside of specialized use cases. Has the context been engineered for agency-specific behavior? Almost certainly yes — that's what the vendor means by "custom." Who controls the system prompt? Usually the vendor, and the agency often has limited visibility into its contents. Can the agency audit what instructions the model is operating under? That answer varies significantly by vendor and contract structure, and it's where procurement conversations get interesting.
A seller who understands context engineering can help the buyer ask the right version of each question. "Custom AI" is not a binary claim: it's a description of a configuration layer that sits between a general-purpose model and an agency-specific deployment. The model is a platform. The context is the configuration. And like any configuration, it can be audited, versioned, tested, and governed — if the contract requires it.
That's the conversation worth having. The buyer who asks "is this AI customized?" is really asking "do we control what this model is being told to do, and can we verify it?" Those are identity and governance questions dressed in AI vocabulary. They're questions you already know how to think about.

