The Three-Part Structure of a Model Call — and Why Wording Is Engineering

By Leigh Garrity— May 6, 2026

The Three-Part Structure of a Model Call — and Why Wording Is Engineering

When your application sends a request to a language model API, it sends a messages array: an ordered list of objects, each tagged with a role. Three roles matter here. Systemdefines the model's operating context — its persona, constraints, and task framing. Usercarries the end-user's input for the current exchange. Assistantcontains the model's prior responses, included to give the model a coherent view of what's already been said. The model receives all three as a single concatenated context and generates the next token from there.

That's the mechanism.

What Each Component Actually Does

The system prompt is the operator's layer. It's set by whoever built the application, not by the end user. In a federal procurement assistant, the system prompt might specify that the model should only reference FAR clauses, should not speculate about vendor pricing, and should respond in plain language suitable for a contracting officer's review. The system prompt runs before the user says anything. It shapes the entire session.

The user turn is the live input — what the end user typed or what the application injected on their behalf. It can be a question, a document to analyze, a command. In agentic workflows, the "user" turn is often generated programmatically, not typed by a human at all.

The assistant turn is history. When a multi-turn conversation is in progress, prior model responses get appended to the messages array with the assistant role. The model doesn't have memory in any persistent sense — it has the conversation history you give it, formatted as assistant turns. If you don't include them, it doesn't know what it said two messages ago.

Why Three Words Can Change the Output

The model isn't parsing your intent. It's doing next-token prediction over the entire context, drawing on statistical patterns from training. When you change the phrasing of an instruction, you're activating different patterns — different associations between that phrase and the kinds of outputs that followed it in training data.

"Be concise" and "respond in three sentences or fewer" are not equivalent instructions. The first is a style cue; the model has seen it paired with outputs ranging from two sentences to ten. The second is a structural constraint with a much tighter distribution of associated outputs. The model has seen "three sentences" followed by, reliably, three sentences. Specificity narrows the probability distribution over possible outputs. Vagueness widens it.

This scales. A system prompt that says "you are a helpful assistant for government procurement" produces a different behavioral envelope than one that says "you are a procurement analyst supporting GS-13 contracting officers working under FAR Part 15. Do not provide legal advice. Do not speculate about vendor pricing. If a question falls outside your scope, say so explicitly." The second prompt has done real engineering work. The first has done almost none.

For production systems: the system prompt is the primary control surface. It's where you encode the behavioral requirements that would otherwise require a human in the loop. Getting it wrong doesn't produce an error — it produces subtly wrong outputs at scale, which is harder to catch and harder to explain to a CIO.

“

Okta Concept Mapping: System Prompt as Authorization Policy

The system prompt resembles an authorization policy — operator-defined rules that constrain what the session can do, evaluated before the user's request is processed. The analogy holds in the sense that both are written by the application layer, not the end user, and both shape what outputs are permissible.

The analogy breaks on enforcement. An authorization policy is evaluated by a policy decision point that sits outside the resource being protected. The model's enforcement of its own system prompt is the model itself. There is no external enforcement layer. A sufficiently crafted user input can cause the model to ignore, contradict, or reveal its system prompt — an attack class called prompt injection. In IDAM terms, this would mean your PDP could be convinced to rewrite its own policies by a clever enough access request. That doesn't happen in OAuth. It happens regularly with LLMs. The system prompt is a behavioral instruction, not a security boundary.

What This Means Before a Meeting

When a federal agency says they're building a prompt-driven tool, the core artifact is a system prompt — and everything downstream of it. The quality of that tool is largely a function of how precisely the system prompt encodes the agency's requirements. Vague prompts produce vague tools. Brittle prompts produce brittle tools. And because the model doesn't throw exceptions, the failure mode is quiet: outputs that are plausible but wrong, at whatever volume the tool is running.

Prompt engineering is the discipline of writing system prompts that are precise enough to constrain the model's behavior to the intended envelope, robust enough to hold under adversarial or unexpected user inputs, and maintainable enough to update when requirements change. It sits closer to writing access control policy than to copywriting — with the added complication that the thing enforcing the policy is also the thing being controlled.

That's a different kind of trust problem than the ones you're used to. The category, though, is the same.