LLM02: The Exfiltration Path That Doesn't Need an Attacker

By Carey Whitten, May 5, 2026

What the Specification Actually Says

The OWASP Top 10 for LLM Applications (2025 edition) defines LLM02: Sensitive Information Disclosure as the risk that an LLM exposes sensitive information, including personally identifiable information (PII), proprietary business data, and confidential system details, through its outputs, because that information appeared in training data, was provided in a prompt, or was retrieved from an external source the model had access to. The category covers both inadvertent disclosure (the model reproduces something it shouldn't) and structural disclosure (the architecture itself creates pathways for data to move where it wasn't intended). Two properties set this category apart from conventional exposures. A misconfigured storage bucket leaves traces in access logs, and a stolen credential can be revoked. Here, neither applies: the exposure may not surface in logs, and once data is encoded into model weights, no revocation event closes the pathway.

Three Pathways, One Direction

Chat logs. Every prompt sent to a hosted LLM is, by default, a log entry somewhere. Enterprise users routinely include context that belongs nowhere near a vendor's infrastructure: draft contract language, internal project names, HR case details, customer account specifics. A deployment generating 40,000 queries per month produces a substantial corpus of sensitive business context, because providing context is how you get useful completions, and users know it. Most enterprise agreements specify a retention window (the period during which the provider holds prompt and completion data). Where no window has been negotiated, the default on many commercial APIs is measured in days to weeks, and during that time the data is often used for abuse monitoring, quality review, and, absent explicit contractual restriction, model improvement.
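
To put a rough number on that exposure, here is a minimal sketch of the steady-state arithmetic. The 40,000 queries per month comes from the paragraph above; the 7, 30, and 90 day windows and the 30-day month are illustrative assumptions, not any vendor's actual defaults.

```python
def prompts_in_retention(queries_per_month: int, retention_days: int,
                         days_per_month: int = 30) -> int:
    """Approximate number of prompts a provider holds at any moment,
    assuming queries arrive at a steady rate (a simplifying assumption)."""
    return queries_per_month * retention_days // days_per_month


# Hypothetical retention windows, chosen only to show the scaling.
for window in (7, 30, 90):
    held = prompts_in_retention(40_000, window)
    print(f"{window:>2}-day window: about {held:,} prompts held by the provider")
```

The point of the calculation is that the retained corpus scales linearly with the window, so the retention term in the contract directly sets how much sensitive context sits outside your boundary at any given time.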

Embeddings. An embedding is a numerical vector representing the semantic content of text, used in retrieval-augmented generation (RAG) systems to find relevant documents without exact keyword matching. When an enterprise builds a RAG layer over its internal document corpus, it converts those documents into embeddings stored in a vector database. Those vectors are not encrypted representations of their source text, but they are not fully opaque either. Research published in 2024 by teams at Stanford and the University of Washington demonstrated that embedding inversion attacks, which reconstruct approximate source text from embedding vectors, can recover meaningful content from common commercial models, particularly for short, structured text like names, addresses, and account numbers. Embedding is not a one-way function, so a vector database holds recoverable content, not an anonymized index.
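
A toy sketch can make the "not opaque" point concrete. The code below is not the inversion attack from the research and does not use a real embedding model: `toy_embed` is a hypothetical stand-in that hashes character trigrams, and the attack is a much simpler candidate-matching approach, where someone who obtains a stored vector and can call the same embedding function tests guesses against it. All names and account numbers are invented.

```python
import hashlib
import math

DIM = 256  # toy dimensionality; real models produce learned vectors


def toy_embed(text: str) -> list[float]:
    """Stand-in embedding: hash character trigrams into an L2-normalized vector."""
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        bucket = int(hashlib.sha256(gram.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both inputs are already unit length, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def best_guess(stored_vector: list[float], candidates: list[str]) -> str:
    """Return the candidate whose embedding is closest to the stored vector."""
    return max(candidates, key=lambda c: cosine(stored_vector, toy_embed(c)))


# The "leaked" vector: only the numbers are stored, not the text.
stored = toy_embed("Jordan Alvarez, account 4417-2290")

candidates = [
    "Morgan Lee, account 8831-0042",
    "Jordan Alvarez, account 4417-2290",
    "Priya Natarajan, account 1029-7756",
]
print(best_guess(stored, candidates))
```

Short, structured records are the easy case for this kind of matching because the space of plausible candidates is small, which is consistent with the finding described above.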

Fine-tuning datasets. Fine-tuning adapts a base model to a specific domain by training it on a curated dataset. Enterprises that fine-tune on proprietary data — customer service transcripts, internal policy documents, technical manuals — are encoding that data into model weights. Model memorization, a well-documented phenomenon in which a model reproduces training data verbatim or near-verbatim under specific prompting conditions, means fine-tuned models can surface confidential information through ordinary completions. Research from Google DeepMind and collaborating institutions found that memorization rates increase with data repetition and model size — meaning the enterprise documents most likely to appear repeatedly in a fine-tuning corpus are precisely the ones most likely to be reproduced.
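
One way to look for memorization in practice is to compare completions against the fine-tuning corpus for long verbatim runs. The sketch below is a simplified check under stated assumptions: it treats any shared run of six words as a hit, the threshold `n=6` is arbitrary, and the corpus and completion strings are invented. It is not a method taken from the research mentioned above.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """All runs of n consecutive words, lowercased, punctuation ignored."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(completion: str, corpus: list[str], n: int = 6) -> list[str]:
    """Return n-word runs in the completion that also appear in the corpus."""
    corpus_grams: set[tuple[str, ...]] = set()
    for doc in corpus:
        corpus_grams |= word_ngrams(doc, n)
    hits = word_ngrams(completion, n) & corpus_grams
    return sorted(" ".join(gram) for gram in hits)


# Hypothetical fine-tuning document and model completion.
corpus = [
    "Refunds over 500 dollars require approval from the regional finance lead before release."
]
completion = "Per policy, refunds over 500 dollars require approval from the regional finance lead."

for run in verbatim_overlap(completion, corpus):
    print(run)
```

A check like this only detects leakage after the fact. It does not remove anything from the weights, which is why deduplicating and scoping the corpus before training matters more.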

The Control Set

Prompt-boundary DLP. Data loss prevention (DLP) applied at the prompt boundary inspects outbound content before it reaches the model, flagging or redacting PII, credential patterns, and sensitive entity types before transmission. The logic is identical to email DLP, which makes it operationally familiar. The limitation is coverage: DLP rules catch what they're configured to catch. Context that's sensitive but doesn't match a pattern — a project codename, an unreleased specification, a personnel decision framed in plain language — passes through.
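
A minimal sketch of a prompt-boundary filter shows both the mechanism and the coverage gap. The three patterns are illustrative assumptions rather than a complete rule set, and "Project Bluefin" is an invented codename.

```python
import re

# Illustrative patterns only; a production DLP rule set would be far broader.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS_ACCESS_KEY_ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def redact(prompt: str) -> tuple[str, list[str]]:
    """Replace each pattern match with a label; return the text and the labels hit."""
    findings = []
    for label, pattern in PATTERNS.items():
        prompt, count = pattern.subn(f"[{label}]", prompt)
        if count:
            findings.append(label)
    return prompt, findings


prompt = "Summarize the Project Bluefin delay for dana@example.com, SSN 123-45-6789."
clean, found = redact(prompt)
print(clean)
print(found)
```

The email address and the SSN are replaced before transmission, but the codename passes through untouched, because nothing in the rule set knows it is sensitive.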

Contractual no-training clauses. Most enterprise LLM agreements now include provisions under which the vendor commits not to use customer inputs for model training or improvement. These are meaningful legal commitments, not technical controls. A no-training clause tells you what the vendor has agreed not to do; it does not give you visibility into whether that commitment is honored, and it does not prevent your data from residing in inference logs or abuse-monitoring systems for the duration of the retention window.

Tenant isolation. In a multi-tenant LLM deployment, tenant isolation refers to the logical or physical separation of one customer's data (prompts, completions, fine-tuning artifacts, embeddings) from another's. Weak isolation creates cross-tenant leakage risk. Strong isolation for fine-tuned models typically requires dedicated model instances rather than shared infrastructure, which carries significant cost implications.
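
For the retrieval layer, logical isolation can be sketched as a store that refuses to search without a tenant identifier and filters before it ranks. `TenantScopedStore`, `Chunk`, and the sample data are hypothetical names for illustration; this is an in-memory toy, not the API of any vector database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    vector: tuple[float, ...]


def dot(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


class TenantScopedStore:
    """Toy in-memory store in which every search must name a tenant."""

    def __init__(self) -> None:
        self._chunks: list[Chunk] = []

    def add(self, chunk: Chunk) -> None:
        self._chunks.append(chunk)

    def search(self, tenant_id: str, query_vector, k: int = 3) -> list[str]:
        if not tenant_id:
            raise ValueError("tenant_id is required for every query")
        # Filter by tenant BEFORE ranking, so another tenant's closer
        # match can never appear in the results.
        visible = [c for c in self._chunks if c.tenant_id == tenant_id]
        ranked = sorted(visible, key=lambda c: dot(c.vector, query_vector), reverse=True)
        return [c.text for c in ranked[:k]]


store = TenantScopedStore()
store.add(Chunk("tenant-a", "A: onboarding checklist", (0.1, 0.9)))
store.add(Chunk("tenant-b", "B: acquisition memo", (1.0, 0.0)))

# The query vector is closest to tenant B's document, but tenant A asked.
print(store.search("tenant-a", (1.0, 0.0)))
```

Filtering after ranking, or making the tenant filter optional, is the kind of weak isolation the paragraph warns about. Fine-tuned weights cannot be filtered this way, which is why they push toward dedicated instances.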

Retention windows and zero-data-retention. A retention window is the period during which a provider stores prompt and completion data after a session ends. Zero-data-retention (ZDR) claims go further, asserting that no prompt or completion data is persisted after the response is returned. In regulated environments, the distinction between ZDR claim types matters more than the label.

ZDR claims come in two forms, and conflating them is a procurement mistake.

A contractual ZDR commitment means the vendor agrees not to retain your data. This is a legal obligation, enforceable through your agreement but not independently verifiable from the outside. You cannot confirm that no data was written to disk, cached in an intermediate layer, or captured in a monitoring pipeline. You are trusting the vendor's architecture and their compliance with their own commitment.

Technically verifiable ZDR means an architecture in which you can confirm, through controls you operate, that data does not leave your environment. This typically requires on-premises or private-cloud deployment with network egress controls on your side of the boundary. It is substantially more expensive and operationally complex. Very few enterprises actually have it.
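
The egress rule at the heart of that architecture can be sketched as a simple allowlist check. The subnet `10.20.0.0/16` is a hypothetical private inference network, and real enforcement belongs in a firewall or network policy rather than application code; the sketch only illustrates the decision that control makes.

```python
import ipaddress

# Hypothetical: the only network that inference traffic may reach.
ALLOWED_EGRESS = [ipaddress.ip_network("10.20.0.0/16")]


def egress_allowed(dest_ip: str) -> bool:
    """True only if the destination is inside an allowlisted network."""
    addr = ipaddress.ip_address(dest_ip)
    return any(addr in net for net in ALLOWED_EGRESS)


# A self-hosted model endpoint inside the boundary versus an external address.
for dest in ("10.20.4.17", "203.0.113.10"):
    verdict = "allow" if egress_allowed(dest) else "deny"
    print(f"{dest}: {verdict}")
```

Because you operate the control, you can produce the deny logs yourself. That is the difference between demonstrating that data stays inside the boundary and citing a contract clause.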

When a federal agency's CISO asks whether your solution offers zero-data-retention, they are asking which kind. "Yes, we have a ZDR agreement" and "yes, we can demonstrate technically that data doesn't leave your boundary" answer different questions. Sophisticated buyers in regulated environments know the difference. The seller who can name that distinction before the buyer has to ask for it is the one who earns the next meeting.

Okta Concept Mapping: Attribute Release in SCIM Provisioning

The closest IDAM analogy is attribute release in SCIM provisioning: you decide which user attributes to sync to a downstream application, because anything you provision becomes part of that system's data model and can surface in unexpected ways. The discipline of scoping what you expose, and auditing what was actually sent, maps directly onto the discipline of scoping what enters an LLM prompt or fine-tuning corpus. The analogy breaks at revocation. In SCIM, you can deprovision an attribute: stop syncing it, remove it from the downstream schema, audit the change. In an LLM fine-tuned on your data, there is no equivalent operation. The data isn't in a table you can query or a field you can null out; it's distributed across model weights. Removing it requires retraining. The IDAM instinct that revocation is always available does not transfer here, and that gap is the lesson.
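
The scoping half of the analogy can be sketched as an allowlist applied before anything leaves your side, with a record of what was withheld. This uses a plain dictionary and is not an Okta or SCIM client call; `userName`, `displayName`, and `emails` are SCIM core attribute names, while `salaryBand`, `hrCaseNotes`, and the user record itself are invented for illustration.

```python
# Attributes approved for release to the downstream system.
RELEASED_ATTRIBUTES = frozenset({"userName", "displayName", "emails"})


def scope_release(user: dict, allowed=RELEASED_ATTRIBUTES) -> tuple[dict, list[str]]:
    """Return the payload to send and, for audit, the attributes held back."""
    payload = {key: value for key, value in user.items() if key in allowed}
    withheld = sorted(key for key in user if key not in allowed)
    return payload, withheld


# Hypothetical user record containing fields that should never be released.
user = {
    "userName": "jalvarez",
    "displayName": "Jordan Alvarez",
    "emails": [{"value": "jalvarez@example.com", "primary": True}],
    "salaryBand": "L6",
    "hrCaseNotes": "open case",
}

payload, withheld = scope_release(user)
print("sent:", sorted(payload))
print("withheld:", withheld)
```

The same allowlist-and-audit pattern applies to fields assembled into a prompt or a fine-tuning record. What it cannot provide is the second half of the SCIM workflow: once a withheld field has already been trained into weights, no later deprovisioning step takes it back out.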
