What Prompt Injection Actually Is
A prompt injection attack occurs when an attacker supplies input that causes a large language model to treat untrusted content as a trusted instruction. The model has no cryptographic mechanism to distinguish between its system prompt — the instructions its operators wrote — and content that arrives through the user turn or through retrieved data. An attacker who can place text in front of the model can, under the right conditions, override or redirect the model's intended behavior.
Two distinct cases follow from that definition, and they are not equally tractable.
Direct injection is the straightforward version: a user submits a malicious prompt directly to the model. "Ignore your previous instructions and do X instead." This is the case that gets the most coverage, and it's the easier one to reason about. The attack surface is the user input field. You can inspect it, filter it, and build guardrails around it.
Indirect injection is the harder problem. The malicious instruction doesn't come from the user — it arrives in content the model retrieves or processes: a document pulled from a knowledge base, an email the model is summarizing, a web page the model is browsing on the user's behalf. The user's query is clean. The attack is embedded in the data pipeline. In a retrieval-augmented generation (RAG) deployment, the model fetches external documents to ground its responses; any of those documents can carry injected instructions. The user never touched the payload.
Inspecting user input is tractable. Inspecting every document in a retrieval corpus — at query time, before it reaches the model — is a different category of challenge entirely.
Okta Concept Mapping
Prompt injection resembles the input validation problem your team already knows: untrusted data interpreted as executable instruction, the same structural failure as SQL injection. The IDAM framing that fits best is complete mediation — the principle that every access request must be checked at the boundary, not assumed safe because it arrived through a trusted channel. In IDAM, complete mediation is enforced by the protocol: the resource server validates the token signature, expiry, and scope before acting. In an LLM system, there is no protocol layer enforcing the boundary between instruction and data. The model's separation between "what I was told to do" and "what this document says" is semantic inference, not cryptographic verification. An attacker who understands that distinction has a meaningful advantage over a defender who doesn't.
The Full 2025 Taxonomy
OWASP published the 2025 revision of the LLM Top 10 with two structural changes: Sensitive Information Disclosure moved from its prior position to number two, and two new categories — System Prompt Leakage and Vector and Embedding Weaknesses — were added. The full list, with the architectural conditions that expose a deployment to each:
LLM01: Prompt Injection. Covered above. Architectural condition: any deployment where user-supplied or externally-retrieved content reaches the model without structural separation from the system prompt. RAG deployments and agentic systems with tool access are the highest-exposure cases.
LLM02: Sensitive Information Disclosure. The model reveals training data, system prompt contents, or user data in its output. Architectural condition: models trained on or given access to sensitive data, deployed without output filtering or data classification controls. Operational controls for this category are covered in Lesson 2.
LLM03: Supply Chain. Compromised model weights, poisoned fine-tuning datasets, or malicious third-party plugins introduce vulnerabilities before deployment. Architectural condition: any deployment using pre-trained models, external APIs, or third-party integrations without provenance verification or integrity checks on model artifacts.
LLM04: Data and Model Poisoning. Malicious content injected into training or fine-tuning data causes the model to behave incorrectly in targeted ways. Architectural condition: fine-tuning pipelines with insufficient data provenance controls or no anomaly detection on training datasets.
LLM05: Improper Output Handling. LLM output is passed to downstream systems — code execution environments, SQL query engines, external APIs — without sanitization. The model's output becomes the attack vector for the next system in the chain. Architectural condition: agentic deployments where model output triggers downstream actions without an intermediate validation layer.
LLM06: Excessive Agency. The model is granted permissions beyond what the task requires, and acts on them. Architectural condition: agentic systems with overly broad tool access, no scope constraints on what actions the model can take, and no confirmation requirements before consequential operations.
LLM07: System Prompt Leakage. (New in 2025.) The model reveals the contents of its system prompt through its output, exposing operator instructions, configuration details, or sensitive policy information. Architectural condition: system prompts that contain credentials, sensitive business logic, or security-relevant configuration — combined with insufficient output controls.
LLM08: Vector and Embedding Weaknesses. (New in 2025.) Attacks targeting the retrieval layer in RAG systems: poisoned embeddings that cause the model to retrieve malicious content, or cross-tenant data exposure in shared vector stores where access controls on the retrieval index don't match access controls on the underlying data. Architectural condition: RAG deployments using shared vector databases without per-query access enforcement at the retrieval layer.
LLM09: Misinformation. The model generates plausible but false information that users or downstream systems act on. Architectural condition: deployments without output verification, grounding mechanisms, or human review for high-stakes decisions. The risk scales with how much autonomy the system has to act on its own outputs.
LLM10: Unbounded Consumption. An agentic system consumes excessive compute, API calls, or financial resources — through runaway loops, adversarially triggered recursion, or simply poor scope definition. Architectural condition: agentic systems without rate limiting, cost controls, loop detection, or hard limits on tool invocation depth.
In the Procurement Conversation
A public sector CISO reviewing an AI procurement will ask about the OWASP LLM Top 10. OMB guidance on federal AI use and the NIST AI Risk Management Framework both reference it as a baseline. The question will often be framed as: "How does your solution address prompt injection?"
The answer depends entirely on whether the deployment uses RAG. A system that only processes direct user input has a bounded injection surface. A system that retrieves external documents, browses the web, or processes emails has an indirect injection surface that scales with the breadth of its data access. The CISO who asks this question probably knows the difference. The AE who answers it should too.
The taxonomy also surfaces a question worth raising before the CISO does: which of these ten categories has the vendor actually addressed, and which has it deferred to the customer's operational controls? LLM06 (Excessive Agency) and LLM08 (Vector and Embedding Weaknesses) are the categories most likely to be underdiscussed in a vendor briefing and most likely to surface in a post-deployment incident review.
The list is table stakes. Which items on it are architectural problems versus configuration problems — and which ones nobody has fully solved yet — is where the conversation earns its keep.

