Every time you use ChatGPT or Claude, three roles participate in the conversation. You control one of them. The application provider controls another before you type a single character. And the model generates the third. Most of the confusion about how language models work in enterprise settings traces back to misunderstanding the relationship between those three roles, specifically how much authority each one actually carries.
The concept worth anchoring before anything else: the model processes all three roles as text in a single stream. Weighted input, with trained preferences about which parts matter more. Prompt sensitivity, prompt injection, the whole enforcement gap — all consequences of that single fact.
The three roles, mechanically
The developer message (until recently called the "system message," and we'll get to why that changed) is written by whoever built the application. It typically contains the model's identity ("You are a helpful assistant that specializes in federal procurement"), behavioral constraints ("Do not discuss competitor products"), and contextual anchors ("Today's date is May 8, 2026"). When you build on the API, you write this yourself. When you use ChatGPT or Claude, the provider has written one for you.
The user message is whatever the human types. "Summarize this RFP." "Explain the difference between FedRAMP High and Moderate." The input.
The assistant turn is the model's generated response. In multi-turn conversations, previous assistant turns become part of the context the model processes. But "context" here does not work like a session. There is no persistent state between turns. The model re-reads the entire conversation, including the developer message, every time it generates a new response. If you're used to thinking about authenticated sessions where the server maintains state, set that aside. Every turn is a fresh read of the full transcript.
All three roles arrive at the model as tokens in a single context window (the bounded space the model can process at once). The developer message shares the same processing space as everything else. One stream of text. Training teaches the model to weight the developer message more heavily than user input, but that weighting is learned behavior, a statistical tendency, with no architectural wall behind it.
OpenAI's researchers named this directly. Their Instruction Hierarchy paper (a research paper from OpenAI's alignment team, not a product announcement; it proposes a training-based mitigation that the authors themselves acknowledge remains incomplete against powerful adversarial attacks) identified the root vulnerability: "LLMs often consider system prompts to be the same priority as text from untrusted users and third parties." Their proposed fix was training-based. Teach the model to prioritize privileged instructions better. The architecture stayed flat. That distinction will matter a great deal in a few paragraphs.
- Three roles: Developer message (application-controlled instructions), user message (human input), and assistant turn (model output) form every model conversation. The model receives all three as text in one context window, with trained — not enforced — priority given to the developer message. No persistent session state exists between turns.
Why "system" became "developer"
Starting in December 2024, OpenAI renamed the "system" role to "developer" in their API for newer reasoning models. The old name still works for backward compatibility. The reason for the change matters more than the change itself.
"System" implies something architectural. If you've spent your career configuring authorization servers, "system" sounds like the layer that enforces. Root-level. Immutable. Above the user. Exactly the wrong intuition here. "Developer" is more honest. It names who actually holds this role: the application builder.
The OpenAI Model Spec (OpenAI's published specification for intended model behavior, updated December 2025) makes the hierarchy explicit, and this is the part worth internalizing. Four levels of instruction authority:
- Root: Fixed rules that apply to all model instances. Set by OpenAI. You cannot see or change these.
- System: Surface-specific rules, also controlled only by OpenAI, applied through the Model Spec or internal system messages.
- Developer: Instructions provided by application builders through the API.
- User: The end user's input.
When you build an enterprise application on the API, you are writing at level three. Two layers of provider-controlled instructions sit above yours. For enterprise builders who assume they have top-level control over model behavior: they don't. The provider's instructions take precedence, and those instructions can change without notice.
The rename from "system" to "developer" is a small correction that encodes a large one. Even the provider recognized the old term was creating false expectations about who holds authority.
- The rename matters: OpenAI moved from "system" to "developer" because "system" implied enforcement that doesn't exist. The Model Spec reveals a four-tier hierarchy where API developers operate at level three, below two layers of provider-controlled instructions.
Small wording changes, measurable consequences
If the developer message is just text the model tries to follow, you might reasonably wonder how much the specific wording matters. More than most people expect, and the providers have documented it themselves.
OpenAI's GPT-4.1 prompting guide (primary provider documentation from OpenAI's developer cookbook, not a third-party benchmark) reports that adding three sentences to a developer message improved their internal SWE-bench benchmark score by close to 20%. Three sentences transformed the model from "chatbot-like" behavior to an agentic mode. These numbers reflect a specific model version and evaluation condition; the percentage will shift as models evolve, but the pattern of sensitivity to developer message wording is consistent across providers.
On the other side of the ledger, Anthropic published a post-mortem in April 2026 (a first-party engineering disclosure, which makes it more credible than a third-party report of the same incident) documenting what happened when they added a 25-word instruction to Claude Code's system prompt: "Keep text between tool calls to ≤25 words. Keep final responses to ≤100 words unless the task requires more detail." Weeks of internal testing showed no regressions. They shipped it. It caused measurable quality degradation across two model versions. They reverted it four days later.
Twenty-five words. Measurable regression. Four days before anyone traced the root cause.
Prompt sensitivity. Precise wording improves output. Imprecise wording degrades it. The effect shows up in benchmarks, in production quality, and in whether the model hallucinates or stays grounded. A peer-reviewed study in Communications Medicine found that when clinical vignettes contained a single planted false detail, models repeated or elaborated on the planted error in up to 83% of cases. A mitigation prompt cut that rate roughly in half. Meaningful improvement. Still far from elimination.
OpenAI's documentation notes that different model snapshots within the same family can produce different results from identical prompts, which is why they recommend pinning production applications to specific model versions.
So the developer message is code. Mutable, version-sensitive, and consequential. It requires testing, version control, and evaluation, because small changes in wording produce large changes in behavior.
- Prompt sensitivity is real and quantified: Provider documentation shows ~20% benchmark improvement from three added sentences (OpenAI) and measurable quality regression from a 25-word addition (Anthropic). Specific numbers are version-dependent; the pattern of sensitivity is durable. Wording is functional, not cosmetic.
Where your policy intuition misleads you
Now the lesson gets consequential for anyone who sells or builds identity infrastructure.
When you configure an OAuth authorization server, the scopes you define are enforced architecturally — a token with read:reports cannot write reports regardless of how the client crafts its request. A developer message carries a similar shape — structured instructions, behavioral constraints, named permissions — but the boundary it creates is probabilistic. The model follows it most of the time, under most conditions. In IDAM, policy enforcement is architectural. In LLMs, instruction following is probabilistic. Your OAuth intuition about policy configuration will make you trust the developer message more than you should.
The mechanism that exploits this gap has a name: prompt injection. OWASP ranks it #1 in their Top 10 for LLM Applications (the same OWASP organization your security buyers already trust for web application risk frameworks, now maintaining a parallel list for LLM-specific vulnerabilities). Their assessment is blunt:
"Given the stochastic influence at the heart of the way models work, it is unclear if there are fool-proof methods of prevention."
Prompt injection comes in two forms. Direct injection is when a user crafts input designed to override the developer message. Indirect injection is when the model processes external content — a retrieved document, a webpage, an uploaded file — that contains text designed to alter its behavior. Both work because the model has no architectural separation between trusted instructions and untrusted input. All tokens in the same window.
Microsoft's Azure documentation says it without hedging: "A system message influences the model, but it doesn't guarantee compliance." Anthropic's security research (a first-party research publication from Anthropic's safety team documenting their own mitigation efforts and residual risk) acknowledges that even after training-based mitigations, they measure approximately 1% attack success rates against prompt injection in their browser agent. That figure reflects specific model versions and testing conditions, but Anthropic frames it as meaningful residual risk, not a solved problem.
One percent sounds small. Run a thousand agent sessions a day and it stops sounding small.
The developer message matters. It materially shapes model behavior. But it carries no enforcement mechanism, and any enterprise feature that treats it as a security boundary has a vulnerability that better prompting alone will never close.
- The analogy-break: A developer message resembles an authorization policy but lacks architectural enforcement. Prompt injection exists because the model processes trusted and untrusted input as text in the same context window. OWASP ranks it #1 for LLM applications. Enforcement must live outside the model.
From ChatGPT to enterprise feature
When you type into ChatGPT, OpenAI controls the developer message. You don't see it. You don't set it. It changes without notice. Anthropic publishes release notes for Claude's system prompts and states explicitly that these updates "do not apply to the Claude API." The consumer product and the API are different surfaces with different developer messages and different risk profiles.
When an enterprise builds a feature on the API, the developer message becomes their responsibility. They write it. They version it. They test it. And they learn, sometimes through the kind of incident Anthropic documented in April, that it is mutable infrastructure, code that the application injects at call time, that can differ per tenant, per use case, per product version.
In the consumer case, the provider manages the developer message and absorbs the risk. In the enterprise case, your customer's engineering team manages it, and the risk transfers completely.
Picture a CISO briefing where someone says they've "configured the system prompt" for their new AI-powered case management tool. Or a discovery call where a civilian agency describes the internal chatbot they're building for benefits adjudication. Three questions will tell you more about their architecture's maturity than any slide deck:
Who writes the developer message? (Is it one engineer, a product team, or an outsourced integrator?)
Who can change it, and what controls govern changes? (Version control? Evaluation pipeline? Or someone editing a text field in a dashboard?)
What happens when the model doesn't follow it? (Is there output filtering? Authorization at the resource layer? Or is the developer message the only guardrail?)
That third question is the one that separates teams who understand the enforcement gap from teams who are about to discover it in production.
- Consumer vs. enterprise: In ChatGPT or Claude, the provider controls the developer message and absorbs the risk. In API-built enterprise features, the developer message is application-controlled infrastructure that requires versioning, testing, and layered enforcement around it.
Where enforcement actually lives
If the developer message carries no enforcement mechanism, what does?
The same things that enforce policy everywhere else in your stack. Application controls. Authorization layers. Tool gateways. The model should not have access to data the user isn't authorized to see. That's an access control problem, and it has the same answer it has always had: enforce at the resource layer.
The developer message still matters. It shapes behavior, reduces the likelihood of unwanted outputs, and provides the conversational frame that makes the model useful. Think of it as the job description you give a new contractor. It tells them what you want, what tone to use, what topics to focus on. But the analogy has a limit worth naming: a contractor who ignores the job description faces consequences. Termination, legal liability, reputational damage. A model has no consequence structure. It has probability distributions. When it "ignores" the developer message, there's no feedback mechanism that corrects the next response. You just get a bad output and have to catch it downstream.
OWASP's recommended mitigations for prompt injection read like an IDAM architecture review: privilege separation, input and output filtering, monitoring, defense in depth. The field hasn't invented new enforcement primitives. It needs the ones that already exist, applied to a new kind of compute.
- Enforcement lives in the application layer: Authorization, tool gateways, output filtering, and monitoring enforce boundaries. The developer message shapes behavior but carries no consequence mechanism when violated. The two are complementary, and treating either as sufficient on its own creates the gap.
Things to follow up on...
-
OpenAI's Instruction Hierarchy proposal: The Instruction Hierarchy paper from OpenAI's alignment team lays out the training-based approach to making models respect priority levels between developer, user, and third-party text, and names exactly where the authors think it still falls short.
-
OWASP's LLM07 on prompt leakage: The 2025 update to the OWASP Top 10 for LLMs added a dedicated entry for system prompt leakage, recognizing that if the developer message isn't an enforcement boundary, it isn't a confidentiality boundary either.
-
Anthropic's prompt injection research: Anthropic's browser-agent security publication documents their mitigation efforts and the residual ~1% attack success rate, framing the problem as meaningfully reduced but explicitly unsolved.
-
The Claude Code post-mortem: Anthropic's April 2026 engineering disclosure is a rare first-party account of how a system prompt change caused production degradation across multiple model versions, and what institutional controls they're adding to prevent recurrence.

