The OWASP LLM Top 10: A Taxonomy of Architectural Exposure

By Leigh Garrity— May 6, 2026

The OWASP LLM Top 10: A Taxonomy of Architectural Exposure

The OWASP LLM Top 10 is a risk classification framework for applications built on large language models, not for the models themselves. The framework doesn't evaluate whether GPT-4 is secure. It catalogs the conditions under which an applicationbuilt on top of an LLM becomes exploitable, and it does so by mapping risk categories to architectural decisions: what the model can access, what it can act on, and what it trusts.

The 2025 revision, published by OWASP's LLM Security project, made three structural changes. Sensitive Information Disclosure moved to number two, reflecting how frequently retrieval-augmented deployments expose data through the model's outputs rather than through direct breach. Two categories are new: System Prompt Leakage (LLM07) and Vector and Embedding Weaknesses (LLM08), both responses to the rapid adoption of RAG architectures in enterprise deployments. The rest of the list was reordered and tightened, but the additions tell you where the field moved.

The Conceptual Spine: Direct vs. Indirect Injection

Prompt injection sits at LLM01 for a reason. It's the category that makes every other category harder to defend, because it's the mechanism by which an attacker can override the application's intended behavior from inside the model's context window.

Direct injection is the simpler case. The attacker is the user. They craft input designed to override the system prompt, extract configuration, or redirect the model's behavior. The attack surface is the input field. Defenses — input validation, content filtering, output monitoring — are imperfect but at least they're legible. You know where the untrusted input enters.

Indirect injection is the harder problem, and the specific way it's harder is worth stating plainly. In an indirect injection attack, the attacker is not the user. The attacker has pre-positioned malicious content somewhere the model will retrieve — a document in a connected SharePoint library, an email in an inbox the agent has access to, a web page the model is asked to summarize. That content contains embedded instructions. When the model retrieves and processes it, those instructions land in the context window alongside legitimate data, and the model has no architectural mechanism to distinguish between them.

This is the core exposure condition: LLMs have no separation between data plane and control plane. Everything in the context window is tokens. An instruction embedded in a retrieved PDF is, from the model's perspective, structurally identical to an instruction in the system prompt. The model doesn't verify provenance. It processes. An attacker who can write to any data source the model reads has a vector into the model's behavior, without ever touching the application directly.

The architectural conditions that expose a deployment to indirect injection are increasingly common: any RAG pipeline that retrieves from user-accessible storage, any agent that reads email or calendar data, any workflow that summarizes external web content. Which is to say: most of the enterprise AI deployments being evaluated in federal procurement right now.

“

Okta Concept Mapping

The closest IDAM analogue to prompt injection is the confused deputy problem — a trusted intermediary acting on behalf of a principal, manipulated into exercising permissions the attacker doesn't hold directly. Your OAuth intuition is useful here: an agent holding a delegated token can be redirected by injected instructions just as a confused deputy can be redirected by a malicious caller. The analogy holds at the "trusted channel" level. It breaks at verification. In federation, we have cryptographic separation: a SAML assertion carries a signature; a retrieved document doesn't. The model cannot apply that distinction. There is no signature on a chunk of text in a vector store, and the model will not ask for one.

The Full Taxonomy

LLM02: Sensitive Information Disclosure. The model surfaces data it shouldn't — from training, from the system prompt, or from retrieved context. Data leakage controls and the mechanics of what gets memorized during training are covered in Lesson 2. The architectural condition that belongs here: any deployment where the model has access to data it isn't authorized to surface to the requesting user.

LLM03: Supply Chain Vulnerabilities. Third-party model weights, plugins, fine-tuning datasets, and inference APIs all introduce dependencies outside the deploying organization's control. Any component in the LLM application stack sourced from an unverified or insufficiently audited third party is the exposure.

LLM04: Data and Model Poisoning. Adversarially crafted training or fine-tuning data that embeds backdoors or shifts model behavior. The training data mechanics are Lesson 2 territory; the architectural exposure here is any pipeline where fine-tuning data ingestion lacks integrity controls.

LLM05: Improper Output Handling. LLM output passed downstream to an interpreter — SQL engine, shell, HTML renderer, API call — without validation. The model generates; something else executes. That gap is the attack surface.

LLM06: Excessive Agency. An agent granted permissions beyond what its task requires, or permissions that don't expire with the session. Over-provisioning is the condition: an agent that can write when it only needed to read, or that retains access after the task completes.

LLM07: System Prompt Leakage. System prompts frequently contain configuration logic, business rules, and occasionally credentials. Adversarial prompting can extract this content. Treating the system prompt as a security boundary rather than an operational one is the mistake.

LLM08: Vector and Embedding Weaknesses. RAG deployments where the vector store lacks access controls, where retrieval can be manipulated through embedding poisoning, or where document-level permissions aren't enforced at query time. This category and indirect injection overlap significantly — LLM08 describes the infrastructure condition that makes indirect injection viable at scale.

LLM09: Misinformation. Model output trusted as authoritative without human review, particularly in high-stakes decision workflows. The architectural failure is any pipeline where the model's output is the terminal step rather than an input to human judgment.

LLM10: Unbounded Consumption. No rate limiting, no token budgets, no cost controls. Exposed to resource exhaustion, denial-of-service, and runaway inference costs. Less glamorous than the other nine, and genuinely dangerous in multi-tenant government deployments where one misconfigured agent can affect shared infrastructure.

What This Means in a Procurement Conversation

When a federal CISO asks whether an AI deployment is "secure," the OWASP LLM Top 10 gives you a structured way to ask back: secure against which conditions? A deployment with strong input validation and no RAG pipeline has a very different risk profile than a deployment with an agent that reads agency email and has write access to a document management system.

Each category in the framework is a question about what the deployment can access, what it can act on, and what it trusts. Those questions have answers. The answers determine exposure. That's the conversation worth having.