Risk & Compliance, Lesson 1
Prompt injection is the condition in which an attacker supplies text that the model interprets as an instruction rather than as data to be processed. The attack works because large language models do not maintain a structural separation between the instruction channel and the content channel — both arrive as tokens in a context window, and the model has no cryptographic or architectural mechanism to distinguish "this is what I was told to do" from "this is what I am being asked to analyze." Any text the model processes can, in principle, override, extend, or replace its original instructions. No specific implementation is at fault. The behavior is a property of how current transformer-based systems process input.
Direct and Indirect Injection: Why the Distinction Matters Operationally
Direct injection is the simpler case: the attacker controls the user-facing input field and submits a prompt designed to hijack the model's behavior. The architectural condition that creates exposure is any deployment where user-supplied text reaches the model without a trust boundary separating it from system instructions. Mitigations here are imperfect but at least addressable — input filtering, instruction hierarchy enforcement, output monitoring.
Indirect injection is the harder operational problem, and the one most enterprise deployments are underestimating. Here, the attacker does not touch the user interface at all. Malicious instructions are embedded in content the model retrieves and processes: a PDF the document summarizer ingests, an email the assistant reads, a web page the research agent visits. The model processes that content with the same trust level it applies to its own system prompt, because it has no mechanism to do otherwise. Every data source a retrieval-augmented system touches is a potential attack vector. The input field is the least of it.
This distinction structures the 2025 OWASP LLM Top 10. Some categories are direct injection exposures. Others are indirect injection exposures in disguise. A few represent downstream consequences of both.
Okta Concept Mapping
The closest IDAM analogy is the confused deputy problem. A privileged agent — the LLM, operating with tool access and system permissions — is manipulated into using its authority on behalf of an attacker rather than the legitimate principal. The analogy holds well enough to be useful: the deputy has authority it didn't grant itself, and an attacker is exploiting that authority indirectly. Where it breaks: in traditional confused deputy attacks, the deputy has a defined API surface. You can enumerate what it can be tricked into doing and write controls accordingly. An LLM's "API surface" is natural language — unbounded, context-dependent, and not enumerable. You cannot write a complete access control list for "things a language model might be instructed to do." That gap is where IDAM intuition runs out.
The 2025 OWASP LLM Top 10, Mapped to Architectural Conditions
LLM01: Prompt Injection. The foundational category. Exposure condition: any architecture where user input, retrieved content, or tool output reaches the model's context without a structural trust hierarchy separating instructions from data. Both direct and indirect variants apply.
LLM02: Sensitive Information Disclosure. Moved to second position in the 2025 revision, reflecting how frequently retrieval systems surface content the requester was never authorized to see. Exposure condition: RAG pipelines that retrieve documents without enforcing the access controls that govern those documents in their source systems. The retrieval layer inherits none of the source system's authorization logic unless explicitly built to do so.
LLM03: Supply Chain. Exposure condition: use of third-party model weights, plugins, fine-tuning datasets, or inference APIs from sources whose security posture is unverified. The model itself becomes a supply chain artifact.
LLM04: Data and Model Poisoning. Exposure condition: training or fine-tuning pipelines that ingest web-scraped, community-contributed, or third-party data without integrity verification. Poisoned training data can embed persistent behavioral backdoors that survive deployment.
LLM05: Improper Output Handling. Exposure condition: downstream systems that consume LLM output without sanitization — particularly code execution environments, browser rendering engines, or API orchestration layers that treat model output as trusted input. The model's output is user-controlled data. Treating it otherwise is the error.
LLM06: Excessive Agency. Exposure condition: agentic deployments where the model is granted tool access, file system permissions, or API credentials beyond what any single task requires, with no scope limitation enforced at the authorization layer. Least privilege applies here exactly as it does to service accounts — and is violated just as routinely.
LLM07: System Prompt Leakage. Added in the 2025 revision. Exposure condition: system prompts that contain API keys, behavioral guardrails, or confidential operational instructions that the model can be induced to repeat verbatim. If the system prompt is a secret, it should not be the only control enforcing that secret.
LLM08: Vector and Embedding Weaknesses. Added in the 2025 revision. Exposure condition: RAG architectures where retrieved chunks are trusted implicitly, with no integrity verification on the vector store and no provenance tracking on embedded content. A poisoned vector store is an indirect injection attack at scale.
LLM09: Misinformation. Exposure condition: deployments where model output is presented as authoritative without human review, citation requirements, or grounding against verified sources. The risk is an architecture that treats model output as reliable by design, rather than as output that requires verification.
LLM10: Unbounded Consumption. Exposure condition: no rate limiting, token budgets, or cost controls on inference; agentic loops without explicit termination conditions. The attack surface here is the operational budget, not the data.
What This Looks Like in a Procurement Conversation
A federal agency CISO asks whether the agency's document summarization tool is "prompt injection resistant." That question contains two separate questions, and conflating them will cost you credibility.
The first is about direct injection: can a user submit a malicious prompt through the interface? The controls here are input validation, output monitoring, and instruction hierarchy enforcement. Imperfect, but addressable.
The second — the one the CISO may not have named yet — is about indirect injection: what happens when the tool summarizes a document that contains embedded instructions? If the retrieval pipeline trusts document content at the same level it trusts system instructions, any document the tool processes is a potential attack vector. Architectural conditions create that exposure. Configuration doesn't fix it.
The question that earns trust in that conversation is: "When you say prompt injection resistant, are you asking about user input, retrieved content, or both? Because the controls are different, and the architectural conditions that create exposure are different." That question signals that you understand the list. Most vendors don't.
Sources: OWASP Top 10 for Large Language Model Applications, 2025 revision (owasp.org). Category names and ordering as published. Production version of this article will include double-citation sourcing for all security claims per editorial standard.

