Prompt Injection and the 2025 OWASP LLM Top 10

Your IDAM injection intuitions carry you halfway into the OWASP LLM Top 10's #1 risk. Here's exactly where they stop.

By Leigh Garrity— May 8, 2026

Prompt Injection and the 2025 OWASP LLM Top 10

Your IDAM injection intuitions carry you halfway into the OWASP LLM Top 10's #1 risk. Here's exactly where they stop.

The OWASP Top 10 for Large Language Model Applications is the risk taxonomy your buyer's security team is using to evaluate LLM deployments. It first published in 2023, got a meaningful revision for the 2025 edition, and prompt injection has held the #1 slot through both versions. That persistence matters. Prompt injection is ranked first because it's structurally embedded in how LLMs work. The architectural conditions that create it are present in virtually every deployment your accounts are building.

If you've worked in identity and access management, you already have strong intuitions about injection attacks. Those intuitions will carry you about halfway through this conversation, and then they'll actively mislead you. That line is worth knowing precisely, because it's the difference between following a CISO's risk discussion and bluffing through one.

What the 2025 Revision Changed

The 2025 list reflects what OWASP describes as real-world incidents, shifting deployment practices, and emerging attack techniques. Two categories are new. Several changed rank. The reorganization tells you where operational pain has concentrated since agencies started putting real data into these systems.

All ten, with the architectural condition that exposes a deployment to each:

LLM01: Prompt Injection. The model processes untrusted input alongside its instructions and cannot reliably distinguish between them. Present whenever an LLM accepts external content. Persistent at #1.

LLM02: Sensitive Information Disclosure. Jumped from #6. The model reveals PII, proprietary data, or training artifacts in its responses. Present when models are trained or fine-tuned on sensitive data without adequate output filtering.

LLM03: Supply Chain Vulnerabilities. Up from #5. Foundation weights, fine-tuning datasets, adapters, or third-party plugins are compromised before you ever touch them. Present whenever you consume components you didn't build.

LLM04: Data and Model Poisoning. Broadened from "Training Data Poisoning." Malicious data in training, fine-tuning, or embedding pipelines creates backdoors or biased outputs. Present in any system that learns from data it doesn't fully control.

LLM05: Improper Output Handling. Dropped from #2. The model's output is passed to downstream systems without validation, enabling XSS, SQL injection, or command execution. This one should feel familiar. It's the output-side version of injection problems you already know.

LLM06: Excessive Agency. The model has more functionality, permissions, or autonomy than its task requires. An LLM with read-write access when it only needs read. An agent that sends emails without human approval. Present whenever tool permissions aren't scoped to least privilege.

LLM07: System Prompt Leakage. New in 2025. The system prompt that developers assumed was hidden gets extracted through user interaction. OWASP is blunt about this: system prompt constraints "can be bypassed via prompt injection." This category exists because LLM01 exists.

LLM08: Vector and Embedding Weaknesses. New in 2025. The vector database in a RAG pipeline becomes an attack surface: unauthorized access to embeddings, cross-tenant data leakage, or poisoning of the knowledge base. This compounds LLM01 directly, because a poisoned RAG document is an indirect prompt injection delivery mechanism.

LLM09: Misinformation. Renamed from "Overreliance." The model generates false or misleading content that users or downstream systems treat as authoritative.

LLM10: Unbounded Consumption. Resource-heavy operations overload the model, causing service disruption or runaway costs. The denial-of-service category.

Three patterns worth your attention. First, the new entries (System Prompt Leakage, Vector and Embedding Weaknesses) both reflect what happens when LLMs get connected to enterprise data stores and tools. They're deployment-architecture risks — they emerge from how you wire the model into your environment. Second, Sensitive Information Disclosure jumping to #2 tracks with the reality that agencies are putting real data into these systems now, not just running pilots. Third, two of the ten categories are direct downstream consequences of prompt injection. LLM07 fails because LLM01 succeeds. LLM08 compounds LLM01's attack surface. That relationship is the argument for why prompt injection is foundational to the entire taxonomy.

OWASP LLM Top 10: The consensus risk taxonomy for LLM deployments, revised for 2025 with two new categories and significant rank changes reflecting production deployment experience rather than theoretical concern.

How Prompt Injection Actually Works

OWASP's definition is precise: a prompt injection vulnerability occurs when user prompts alter the LLM's behavior or output in unintended ways, including inputs that are imperceptible to humans, "as long as the content is parsed by the model."

Two types. Direct injection: an attacker controls the user prompt and manipulates the model through it. Indirect injection: malicious instructions are embedded in external content the model retrieves — documents, emails, web pages. The model encounters those instructions during processing and follows them.

Direct injection is conceptually straightforward. Someone types something adversarial into the chat box. Your IDAM intuition maps cleanly: it's an input validation problem. The attacker controls the input channel, you know where the channel is, you can apply controls at that boundary. Not easy, but the shape of the problem is one you've seen before.

Indirect injection is where the architecture of the problem changes.

Picture a RAG-enabled assistant that retrieves documents from a SharePoint library to answer employee questions. An attacker uploads a document containing instructions hidden in white text on a white background, or embedded in metadata, or simply written in natural language the model will process. When a user asks a question and the system retrieves that document, the model reads the attacker's instructions as part of its context. It doesn't distinguish "content I was asked to analyze" from "instructions I should follow." It can't. The instructions and the data occupy the same token stream with no privilege boundary between them.

The model processes everything in its context window as language, and following language is what it was designed to do.

The 2025 revision adds a multimodal dimension. OWASP now explicitly includes malicious instructions hidden in images that accompany benign text. An image of a routine invoice, for instance, with adversarial instructions encoded in a region the model's vision component processes but a human reviewer wouldn't notice. The attack surface expands with every modality the model can process, and the core problem remains identical: the model has no mechanism to distinguish content from commands.

IDAM Bridge & Break: Input Validation

Your SQL injection intuition applies cleanly to direct prompt injection: attacker controls input, you validate input, you can parameterize queries to separate instructions from data. The analogy stops at indirect injection. There is no equivalent of parameterized queries. In SQL, you solved injection by ensuring user input was never interpreted as code. In an LLM, the "input" is natural language, and interpreting natural language as instructions is the model's core function. You cannot tell the model to stop treating language as instructions without telling it to stop working. OWASP acknowledges this directly: "It is unclear if there are fool-proof methods of prevention for prompt injection."

Direct prompt injection: An attacker manipulates the model through the user input channel. Resembles traditional injection attacks; familiar controls partially apply.
Indirect prompt injection: Malicious instructions arrive via retrieved content and execute because the model cannot distinguish data from instructions. The harder operational problem, and the one without a clean architectural solution.

Why Defenses Haven't Held

In December 2025, the UK's National Cyber Security Centre published a formal assessment. Their technical director for platforms research stated it plainly:

“

"Current large language models simply do not enforce a security boundary between instructions and data inside a prompt."

The NCSC's position is that prompt injection is "inextricably intertwined in LLMs' architecture." (Source note: the NCSC is the UK's national technical authority for cybersecurity, part of GCHQ. This is a Five Eyes signals intelligence agency making a formal technical assessment, not a blog post.)

OpenAI's own researchers proposed the most promising structural mitigation to date: an instruction hierarchy that assigns different privilege levels to system prompts, user inputs, and retrieved content. The analogy they used is telling. They compared the current state of LLMs to an operating system where "every instruction is executed as if it was in kernel mode." Their hierarchy improved robustness significantly on GPT-3.5. But a 2025 IEEE S&P paper found that plugin developers routinely violated the hierarchy by inserting retrieved content as system-role messages, giving untrusted data the highest privilege tier in the hierarchy designed to restrict it. The defense worked exactly as long as it took for someone to build on top of it.

Then came the stress test. In October 2025, researchers from OpenAI, Anthropic, and Google DeepMind jointly published "The Attacker Moves Second", examining 12 recent defenses against prompt injection and jailbreaks. The defenses had originally reported near-zero attack success rates. Under adaptive attack conditions, where the attacker modifies strategy to counter the specific defense, every defense was bypassed. Prompting-based defenses collapsed to 95–99% attacker success. Training-based methods hit 96–100%. (Source note: this is a preprint, not yet peer-reviewed at a named venue. But three competing frontier labs collaborating on a finding that none of them benefit from commercially carries significant weight. Treat it as the strongest available signal, not settled science.)

In production, the consequences are concrete. In June 2025, researchers at Aim Security disclosed EchoLeak (CVE-2025-32711, CVSS 9.3), a zero-click prompt injection vulnerability in Microsoft 365 Copilot. (The disclosure and patch were June 2025; the detailed arXiv paper followed in September.) A single crafted email, requiring no user interaction, coerced Copilot into accessing internal files — chat logs, OneDrive documents, SharePoint content, Teams messages — and transmitting their contents to an attacker-controlled server. Microsoft's cross-prompt injection classifier, built specifically to catch this class of attack, failed because the malicious email was written to sound like it was addressed to a human, not an AI. The language was natural. The classifier couldn't tell the difference. Microsoft patched it server-side, but the architectural lesson stands: when the attack vector is well-crafted natural language, your classifier is playing a game it structurally cannot win at 100%.

IDAM Bridge & Break: Least Privilege

Excessive Agency (LLM06) maps directly to least-privilege principles you already apply. An LLM agent with read-write-delete permissions when it only needs read is the same misconfiguration as an over-provisioned service account. Your intuition holds here. Where it breaks: least privilege reduces blast radius but doesn't prevent the injection itself. An agent scoped to read-only can still be tricked into disclosing everything it can read. The permission boundary limits damage. The compromise still happens.

No architectural fix exists: The NCSC, OpenAI's instruction hierarchy research, and a joint study from three frontier labs all converge on the same finding — current LLM architectures cannot enforce a reliable boundary between instructions and data.
Production impact is real: EchoLeak demonstrated zero-click data exfiltration from Microsoft 365 Copilot via a single email, bypassing purpose-built defenses.

What This Means in the Room

NIST AI 600-1, the Generative AI Profile published in July 2024, treats prompt injection under its Information Security risk area and maps it to the Measure and Manage functions of the AI Risk Management Framework. Your public sector buyers are working within this framework. OMB M-25-21 directs agencies to implement minimum risk management practices for high-impact AI and references NIST AI RMF as the governing framework. The OWASP LLM Top 10 is not referenced by name in OMB guidance. Your buyer's compliance team speaks NIST vocabulary. The OWASP taxonomy is the technical community's operational language. Both describe the same risks. Being able to translate between them is a practical skill worth having before Tuesday.

When a CISO raises prompt injection in a conversation about their agency's LLM deployment, they're asking a specific architectural question: what happens when this system processes content it didn't generate? Every RAG pipeline, every email-connected assistant, every agent that reads documents from a shared drive is processing content it didn't generate. The deployment is exposed to indirect injection. The conversation worth having is about limiting consequences.

Simon Willison, the security researcher who coined the term "prompt injection," describes the critical pattern as a lethal trifecta: access to private data, exposure to untrusted content, and the ability to communicate externally. When all three are present, a single poisoned document can lead an agent to exfiltrate sensitive data without a vulnerability in traditional code. That's a diagnostic you can use in a live conversation. When the buyer describes their deployment, listen for those three elements. If all three are present, the prompt injection conversation isn't theoretical. It's operational.

The controls conversation starts at familiar ground: scoping permissions so the model can only access what its task requires. Validating the model's output before it reaches downstream systems (LLM05). Monitoring for data exfiltration paths like the EchoLeak pattern. Treating the model's responses as untrusted input to every system they touch. These are zero-trust principles applied to a new kind of principal. None of this prevents prompt injection. All of it contains the damage when injection succeeds. And that's the honest framing. The buyer who hears you say it will trust the rest of what you say more, because you just told them something a vendor wouldn't.

Okta Concept Mapping: Zero Trust for a Non-Human Principal

An LLM agent operating within an enterprise environment is, from an identity perspective, a non-human principal with access to resources. Okta's publicly shipped capabilities for managing non-human identities, service accounts, and API tokens apply to the infrastructure around the model: controlling what systems it can reach, what credentials it holds, what actions those credentials authorize. Where the mapping breaks: traditional identity controls assume the principal's intent is stable. A prompt-injected model's intent has been altered by an attacker, but its credentials haven't changed. The identity layer sees an authorized principal behaving normally. The compromise is invisible to the access management plane.

NIST and OWASP describe the same risks in different vocabularies: Your buyer's compliance team uses NIST AI RMF language. The technical community uses OWASP. Translating between them is a concrete advantage in public sector conversations.
The operational question: What limits the consequences when prompt injection succeeds? The answer is familiar controls applied to a principal whose intent can be silently compromised. Containment is the achievable goal. Prevention remains unsolved.

Your IDAM intuitions about injection are real, and they'll carry you into this conversation. Input validation, least privilege, zero trust, treating outputs as untrusted. All of it applies. The ceiling is different. In SQL injection, the fix was architectural: parameterized queries eliminated the class of vulnerability. In prompt injection, no equivalent fix exists yet, and the people building these models are the ones saying so. The controls you know reduce the blast radius. The hole stays open. That's the line between the intuition that helps you and the intuition that misleads you, and now you know where it falls.

Things to follow up on...

Google's internal defense data: Google's research team published "Lessons from Defending Gemini" in May 2025, documenting why training models on known attacks fails to generalize to novel indirect injection techniques.
Gemini memory corruption attack: Security researcher Johann Rehberger demonstrated in February 2025 that indirect injection could permanently plant false data in Gemini's long-term memory, corrupting future sessions across conversations without the user's knowledge.
NIST AI 600-1's agentic gap: The Cloud Security Alliance published an agentic profile extension in April 2026 arguing that NIST AI 600-1 treats prompt injection as an output risk rather than an action risk, leaving agent-specific attack surfaces like tool impersonation and malicious tool registration unaddressed.
PoisonedRAG at scale: A 2024 research paper demonstrated that adding just five malicious documents to a corpus of millions caused a RAG-enabled system to return attacker-controlled answers 90% of the time for targeted queries, illustrating how little effort indirect injection requires when the retrieval pipeline is the attack surface.