When the Instructions Are in the Document

By Carey Whitten— May 5, 2026

When the Instructions Are in the Document

Most buyers ask about jailbreaking. That's the direct injection problem — attacker controls the prompt, model does something it shouldn't. It's real, it's documented, and it's the easier half of the problem.

The harder half is indirect injection. The attacker isn't in the room. Their instructions are in the PDF the model just retrieved.

What Prompt Injection Is

Prompt injection is an attack in which an adversary embeds instructions in content that an LLM processes, causing the model to treat those instructions as authoritative directives rather than as data. The model has no native mechanism to distinguish between instructions from its operator and instructions embedded in content it was asked to process — both arrive as tokens in the same context window. The attack doesn't require access to the model, the API, or the user session. It requires access to anything the model reads.

Direct injection: the attacker controls the user-facing input. They type "ignore previous instructions" or a variant. The attack surface is the user prompt itself.

Indirect injection: the attacker has no direct access to the model. They place malicious instructions in a document, email, web page, or database record that the model will retrieve and process as part of a legitimate workflow. The model reads the content, encounters the embedded instruction, and acts on it — because from the model's perspective, an instruction is an instruction regardless of where it came from.

Indirect injection is the harder operational problem for two reasons. First, the attack surface is every external data source the model touches, which in a RAG-based deployment can be substantial. Second, the malicious content arrives through a channel the system was designed to trust: the retrieval pipeline itself.

What This Looks Like in a Procurement Conversation

A CISO evaluating an LLM deployment for a federal agency will often frame the risk as model behavior — can the model be made to say something it shouldn't? That's the direct injection frame. The question worth asking is architectural: what external content does this model retrieve, from what sources, with what validation, and what happens if that content contains instructions? If the answer to "what validation" is "none," the indirect injection surface is open. Ask it before the RFP drops.

“

Okta Concept Mapping

Prompt injection resembles a confused deputy attack. A legitimate agent — the LLM — is manipulated into using its authority on behalf of an attacker who has no direct access to that authority. The analogy holds for the basic structure: a trusted intermediary, an attacker who can't act directly, instructions laundered through the intermediary's context. It breaks at enforcement. In a confused deputy attack on a traditional system, you can add explicit capability checks at the API or OS layer — the deputy can be required to present credentials for each action. In an LLM, the "capability check" is the model's own judgment about what constitutes an instruction versus data. That judgment is not reliably enforceable through external controls.

The 2025 OWASP LLM Top 10: Architectural Conditions

Sensitive Information Disclosure moved to #2 in the 2025 revision. System Prompt Leakage and Vector and Embedding Weaknesses are new categories.

LLM01:2025 – Prompt Injection Exposed when the model processes external content without instruction/data separation; when the system prompt provides no constraint on acting on embedded instructions; when the retrieval pipeline is trusted without content validation.

LLM02:2025 – Sensitive Information Disclosure Exposed when training data included PII, credentials, or internal documents without sanitization; when the model has access to sensitive context it doesn't need for the task; when output filtering is absent or misconfigured. (Data leakage controls are covered in Lesson 2.)

LLM03:2025 – Supply Chain Vulnerabilities Exposed when model weights, fine-tuning datasets, or inference infrastructure are sourced from unverified third parties; when model provenance is not tracked through the deployment pipeline; when plugin or tool dependencies are not inventoried.

LLM04:2025 – Data and Model Poisoning Exposed when training or fine-tuning data is sourced from external repositories without integrity verification; when retrieval corpora can be written to by untrusted parties; when no baseline behavioral testing exists to detect drift.

LLM05:2025 – Improper Output Handling Exposed when LLM output is passed directly to downstream systems — code interpreters, databases, APIs — without sanitization; when the consuming system treats model output as trusted input.

LLM06:2025 – Excessive Agency Exposed when the model is granted tool access, API permissions, or system actions beyond what the task requires; when there is no human-in-the-loop for consequential actions; when agent permissions are not scoped to minimum necessary.

LLM07:2025 – System Prompt Leakage (new) Exposed when the system prompt contains confidential instructions, credentials, or architectural details; when the model can be induced to reproduce its system prompt through prompt manipulation; when the system prompt is treated as a security boundary rather than as potentially recoverable content.

LLM08:2025 – Vector and Embedding Weaknesses (new) Exposed when the vector database is writable by untrusted parties; when embedding similarity is used as the sole trust signal for retrieved content; when retrieval results are not validated before being passed into the model context.

LLM09:2025 – Misinformation Exposed when the model is used for authoritative outputs — legal, medical, policy — without human review; when the deployment context creates user reliance on model accuracy without disclosure of model limitations.

LLM10:2025 – Unbounded Consumption Exposed when there are no rate limits, token budgets, or cost controls on model inference; when the model can be induced to generate recursive or excessively long outputs; when resource consumption is not monitored per user or session.

The same question runs through all ten categories — the one you ask about any trust architecture: who decided this principal could do this thing, and what happens when that decision was wrong? For LLM deployments, the answer is often "the model decided." That's the architectural condition that exposes everything else on this list.

Source: OWASP Top 10 for LLM Applications 2025, published by the OWASP Foundation.