Sensitive Information Disclosure is the risk that an LLM reveals data it shouldn't, through three distinct mechanisms: the model reproduces data baked into its weights during training, an attacker manipulates a live session to exfiltrate data the model can reach at runtime, or someone extracts the system prompt and whatever secrets a developer mistakenly embedded in it. The risk climbed from #6 to #2 on the 2025 OWASP LLM Top 10. That's the single largest jump on the list, and the reason has nothing to do with novelty. Organizations started deploying LLMs against real data and discovered that the controls they'd spent two decades building cover only part of the exposure.
If you sell identity and access management, two of those vectors will feel familiar. The third is where the conversation earns its keep.
Three vectors. Let's take them in order.
Training data memorization
During training, an LLM processes massive volumes of text and encodes statistical patterns into its model weights. Sometimes those patterns are verbatim. The model learns the structure of a phone number. It also learns that a specific person has a specific phone number, because that data appeared in the training corpus frequently enough, or in a distinctive enough context, to get baked directly into the weights.
The research base here is concrete. The group that established the empirical foundations of LLM memorization, led by Nicholas Carlini across Google, Stanford, OpenAI, and Berkeley, extracted hundreds of verbatim sequences from GPT-2 in a peer-reviewed study published at USENIX Security 2021. The extracted data included names, phone numbers, email addresses, physical addresses, and data that had since been removed from the public internet. A follow-up study at ICLR 2023 quantified the relationships that predict when memorization happens: it scales log-linearly with model size, with how many times a data point was duplicated in training, and with prompt length. Bigger models memorize more. Repeated data memorizes more easily. Longer prompts give the model more runway to reproduce what it absorbed.
The finding worth anchoring on: the same research group attacked production ChatGPT using a divergence technique that caused the model to drop its chat persona and emit raw training data at 150 times the normal rate. Total cost of the attack was under $300. All the alignment work that makes ChatGPT sound like a helpful assistant did not prevent the memorized data from being extractable underneath. The paper was presented at NeurIPS 2023, and its core finding is blunt: current alignment techniques do not eliminate memorization.
Fine-tuning makes this worse in exactly the way you'd expect. When an organization fine-tunes a model on its own data, the memorization risk concentrates on precisely the data the organization cares most about protecting. A CMU technical report showed fine-tuned models memorizing specific sensitive values like credit card numbers from training data, creating a direct extraction path for anyone with access to the model. For agencies considering fine-tuning models on government data, this is the risk that should shape the conversation before the project starts.
This vector matters most for your buyer conversations. It's the one where everything you know about identity stops working.
Your instinct here is access lifecycle. Data shouldn't be accessible? Revoke the credential, terminate the session, update the policy. Training data memorization breaks that model completely. The data has been absorbed into the model's weights, and the weights are the model itself. There is no token to expire, no session to kill. You'd have to retrain the model or discard it entirely, and even retraining doesn't guarantee removal. The memorized data is part of the model's structure. It's in the concrete the building is made of.
Section recap:
- Memorization: LLMs can encode verbatim training data into model weights. Larger models, duplicated data, and fine-tuning increase the risk. Once memorized, data cannot be revoked through access controls because it exists as part of the model itself, not as a retrievable record behind an authorization boundary.
Inference-time exfiltration
This vector will feel more like home. Inference-time exfiltration happens when an attacker manipulates a live LLM interaction to extract data the model can reach at runtime. No memorized training data involved. The model is being tricked into surfacing data from its current context: documents it was given to summarize, emails it was asked to search, files in a connected repository.
EchoLeak (CVE-2025-32711) is the production example that clarifies the pattern. It was a zero-click vulnerability in Microsoft 365 Copilot, disclosed by researchers at Aim Security in June 2025. An attacker sent a single crafted email containing hidden prompt injection instructions. When Copilot processed the email, it followed the injected instructions, accessed data within its scope (chat logs, OneDrive files, SharePoint content, Teams messages), and exfiltrated it through an image auto-fetch technique that bypassed Microsoft's content security policy. No user interaction required. The vulnerability received a CVSS score of 9.3 from NIST NVD, and Microsoft patched it server-side as part of June 2025 Patch Tuesday. The researchers published their full methodology, documenting the chain of bypasses that made the attack possible.
EchoLeak's severity is obvious. The detection problem is harder. The exploit operates entirely in natural language. No malware hash, no suspicious file download, no anomalous network signature. Standard EDR and SIEM tools would struggle to confirm an attack occurred, because the exfiltration looks like normal Copilot behavior.
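One way to narrow an image auto-fetch channel like the one described above is to filter model output before the client renders it. The following is a minimal Python sketch, assuming responses are rendered as Markdown. The allowlisted host and the function name are hypothetical, and this is not a description of Microsoft's fix.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist: hosts the rendering client may auto-fetch images from.
ALLOWED_IMAGE_HOSTS = {"assets.example.internal"}

# Inline Markdown image: ![alt](url "optional title")
INLINE_IMAGE = re.compile(r"!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)")


def strip_untrusted_images(markdown: str) -> tuple[str, list[str]]:
    """Remove inline images whose host is not allowlisted.

    Returns the cleaned text and the list of blocked URLs, which can be
    logged as a signal that something tried to open an outbound channel.
    """
    blocked: list[str] = []

    def replace(match: re.Match) -> str:
        url = match.group(2)
        host = (urlparse(url).hostname or "").lower()
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # trusted host: leave the image as written
        blocked.append(url)
        return f"[image removed: {match.group(1)}]"

    return INLINE_IMAGE.sub(replace, markdown), blocked
```

The sketch handles only inline image syntax. Reference-style images and raw HTML would need their own handling, so treat it as one layer, not a complete control.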
The pattern repeated. In January 2026, researchers demonstrated a similar technique against Google Gemini using crafted calendar invitations that instructed the assistant to leak summaries of private meetings. Bleeping Computer independently confirmed the attack vector. Days later, Varonis Threat Labs disclosed Reprompt (CVE-2026-24307), a single-click attack against Microsoft Copilot Personal that maintained data extraction persistence even after the session closed. SC World corroborated the disclosure, noting that enterprise customers using Microsoft 365 Copilot were not affected due to additional tenant-level controls.
All three attacks share the same geometry: the attacker injects instructions into a context the LLM will process. The LLM already has the access. The attacker redirects it.
This maps to a problem you already know how to talk about: an integration account with broader access than it needs, running without adequate monitoring. The LLM is the service account. The fix follows the same principle: least privilege on what the model can reach, segmentation of data sources, runtime monitoring of what it actually touches. Your IDAM language works here. The gap is that most organizations haven't applied it yet, because they're still treating the LLM as a user-facing tool rather than a privileged integration.
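To make the service-account framing concrete, here is a minimal Python sketch of least privilege applied at retrieval time: documents are filtered against the requesting user's group memberships before anything enters the model's context, and every document that does get through is recorded. The `Document` and `ScopedRetriever` names, the group labels, and the keyword match are all illustrative assumptions, not any product's API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset[str]


@dataclass
class ScopedRetriever:
    """Returns only documents the requesting user is entitled to, and logs each access."""

    documents: list[Document]
    audit_log: list[tuple[str, str]] = field(default_factory=list)

    def retrieve(self, user_id: str, user_groups: set[str], query: str) -> list[Document]:
        hits = []
        for doc in self.documents:
            if query.lower() not in doc.text.lower():
                continue  # naive keyword match stands in for real search
            if doc.allowed_groups.isdisjoint(user_groups):
                continue  # the model never sees what the user could not open
            self.audit_log.append((user_id, doc.doc_id))  # runtime record of what was touched
            hits.append(doc)
        return hits
```

The design choice worth pointing out to a buyer: the entitlement check runs in the application, before the prompt is assembled. An injected instruction can redirect the model, but it cannot widen what the retriever hands over.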
Section recap:
- Inference-time exfiltration: Attackers manipulate live LLM interactions to redirect the model's existing data access. The model acts as a proxy, accessing and exfiltrating data it was legitimately connected to. Least-privilege principles and runtime monitoring apply directly.
System prompt exposure
System prompts are the instructions developers write to configure an LLM's behavior before the user sees it. They define the model's persona, its constraints, its operational boundaries. The problem is that developers sometimes embed information in system prompts that was never meant to be user-facing: API endpoints, escalation procedures, internal business logic, and occasionally actual credentials. In one documented case, a researcher tricked Google's Gemini CLI into publishing its own API key as a GitHub issue comment by injecting a fake "trusted content section" into an issue body. Google paid a bug bounty. No CVE was issued.
Extracting system prompts ranges from trivial to moderately difficult depending on the model and its guardrails. Early ChatGPT versions would reveal their system prompt if you asked nicely. Current models resist direct requests, but techniques like asking the model to encode its instructions in Base64, or to "translate" its instructions into another format, continue to work against many deployments.
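Encoding tricks like the Base64 request are one reason a plain string match on model output misses leaks. A minimal Python sketch of a detector follows, assuming a canary string has been planted in the system prompt purely so leaks can be spotted. The canary value and function name are hypothetical.

```python
import base64
import binascii
import re

# Hypothetical canary: a random marker placed in the system prompt
# solely so that leaks can be detected in model output.
CANARY = "canary-7f3a9c"

# Runs of Base64 alphabet characters long enough to be worth decoding.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def leaks_canary(output: str, canary: str = CANARY) -> bool:
    """Return True if the canary appears in plain text or inside a Base64 run."""
    if canary in output:
        return True
    for candidate in BASE64_RUN.findall(output):
        try:
            decoded = base64.b64decode(candidate, validate=True)
        except (binascii.Error, ValueError):
            continue  # not valid Base64, ignore
        if canary in decoded.decode("utf-8", errors="ignore"):
            return True
    return False
```

This catches only the two encodings it checks for. Other transformations, such as translation or character substitution, would pass straight through, which is why detection supplements the architectural control rather than replacing it.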
The OWASP LLM02 entry recommends adding restrictions within the system prompt about what data types the model should return, but notes that such restrictions "may not always be honored and could be bypassed via prompt injection." The spec calls this "mitigation," which is a generous word for what's actually happening.
The control is architectural. Treat system prompts as eventually public. Never embed secrets in them. If a credential needs to be available at inference time, retrieve it through a secure runtime integration at the moment it's needed. Don't bake it into the prompt where any sufficiently creative user can extract it.
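A minimal Python sketch of that separation, under stated assumptions: `TICKETING_API_KEY`, the ticketing URL, and both function names are hypothetical, and an environment variable stands in for whatever secrets manager a real deployment would use.

```python
import os

# The prompt describes the capability only. Nothing in it matters if it leaks.
SYSTEM_PROMPT = (
    "You are a support assistant. To check a ticket, call the "
    "lookup_ticket tool with a ticket ID."
)


def build_ticket_request(ticket_id: str) -> dict:
    """Runs in the application when the model requests the tool, outside the model's context."""
    api_key = os.environ["TICKETING_API_KEY"]  # read at call time, never placed in a prompt
    return {
        "url": f"https://ticketing.example.internal/tickets/{ticket_id}",
        "headers": {"Authorization": f"Bearer {api_key}"},
    }


def to_model_visible(response_body: dict) -> dict:
    """Pass the model only the fields it needs; request headers are never echoed back."""
    return {k: response_body[k] for k in ("ticket_id", "status") if k in response_body}
```

The model sees the tool's name and its filtered result. The credential lives only in the process that makes the call, so extracting the prompt yields nothing an attacker can reuse.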
Section recap:
- System prompt exposure: System prompts are extractable. Guardrails reduce casual exposure but do not prevent determined extraction. The control is design-time hygiene: never put anything in a system prompt that you wouldn't put in a public README.
Zero-data-retention and the verification gap
When your buyer asks "how do we keep our data out of the model?" they're usually asking about two things at once without distinguishing them: training (will our data improve the model?) and retention (how long does the provider keep our inputs and outputs?). Different questions. Different answers.
On training, the major providers have drawn clear commercial lines. OpenAI's API terms state that API data is not used for training. Anthropic's commercial terms explicitly exclude enterprise customers' data from training use. Azure OpenAI inherits Microsoft's commercial data protection commitments.
Retention is where it gets complicated.
| Provider | Default Retention | ZDR Availability | ZDR Conditions |
|---|---|---|---|
| OpenAI API | Up to 30 days (abuse monitoring) | Prior approval required; eligible endpoints only | OpenAI reserves the right to make future models ineligible for ZDR if "reasonably necessary to investigate severe risk activity" |
| Azure OpenAI | Varies by configuration | Enterprise Agreement or Microsoft Customer Agreement + support ticket | Pay-as-you-go customers not eligible |
| Anthropic API | 7 days (reduced from 30 days, Sept 2025 per policy analysis) | Reduced retention, not full ZDR | Consumer product simultaneously shifted to training-use-by-default; users who missed the Sept 2025 opt-out window had data included |
That last row. The same model, accessed through the same browser, on the same laptop, follows completely different data handling rules depending on whether the user authenticated with an enterprise SSO credential or a personal account. Same tool. The login determines which rules apply. For government accounts handling CUI or operating under data classification requirements, this is an operational boundary with compliance consequences, and it's enforced entirely at the authentication layer.
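Because the boundary sits at the authentication layer, it can be enforced there. Below is a minimal Python sketch of a gateway check that decides whether a request may be forwarded to the provider based on how the session was authenticated. The account types, classification labels, and policy table are illustrative assumptions, not any vendor's scheme.

```python
from dataclasses import dataclass
from enum import Enum


class AccountType(Enum):
    ENTERPRISE_SSO = "enterprise_sso"
    PERSONAL = "personal"


@dataclass(frozen=True)
class Session:
    user_id: str
    account_type: AccountType


# Hypothetical classification labels and the account types allowed to send them.
ALLOWED = {
    "public": {AccountType.ENTERPRISE_SSO, AccountType.PERSONAL},
    "cui": {AccountType.ENTERPRISE_SSO},
}


def may_forward(session: Session, data_label: str) -> bool:
    """Fail closed: a label missing from the policy table is never forwarded."""
    return session.account_type in ALLOWED.get(data_label, set())
```

For example, `may_forward(Session("alice", AccountType.PERSONAL), "cui")` is refused, while the same user on an enterprise SSO session is allowed.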
Now: can any of this be independently verified? Honest uncertainty is warranted here. Vendor ZDR commitments are backed by contractual representations and operational controls audited through SOC 2 Type II reports. But no primary source in the public record shows an independent third-party technical audit of the specific ZDR enforcement mechanism, as distinct from the broader operational controls that SOC 2 covers. A SOC 2 report confirms that the vendor has controls and follows them. It does not confirm that a specific API call's content was provably not persisted anywhere in the provider's infrastructure.
The industry doesn't have a standard for that yet. If your buyer needs that distinction drawn, draw it clearly. They'll respect you for it.
This resembles the trust decision in SAML federation: you're relying on the IdP's assertion that the user is who they claim to be, backed by contractual and operational commitments, not by your ability to independently verify the IdP's authentication mechanism in real time. The analogy holds well. The difference is maturity: in federation, the trust framework is decades old and the audit mechanisms are well-understood. In AI data retention, the trust framework is still forming.
Section recap:
- Zero-data-retention: Major providers offer contractual no-training and reduced-retention commitments for commercial customers. These are real but not independently verifiable at the technical level. The authentication context (enterprise vs. personal login) determines which data handling rules apply. The gap between contractual and technically verifiable ZDR is an open problem the industry hasn't solved.
When you'll need this
The buyer question that triggers this conversation is some version of: "If we deploy this, what happens to our data?" The answer depends on which vector they're worried about, and most buyers are conflating all three.
If they're worried about their data ending up in a future model version, the answer is contractual. Commercial API terms from major providers prohibit training use, and ZDR options exist for retention. Push them to verify which tier of service they're actually on and whether ZDR is enabled, not just available.
If they're worried about runtime data exfiltration, the answer is architectural. Least-privilege access for the model, prompt-boundary DLP that inspects both inputs and outputs, runtime monitoring that can detect anomalous data flows. Your identity and access management expertise translates most directly here.
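As a rough illustration of what inspecting both inputs and outputs means in practice, here is a minimal Python sketch of a wrapper that redacts the prompt on the way in and the completion on the way out. The two regex patterns and the `guarded_call` name are assumptions for illustration. A real deployment would need far broader, tuned detectors, including the semantic classification that pattern matching cannot provide.

```python
import re
from typing import Callable

# Illustrative patterns only.
PATTERNS = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a placeholder and count hits per pattern."""
    counts: dict[str, int] = {}
    for name, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{name} redacted]", text)
        if n:
            counts[name] = n
    return text, counts


def guarded_call(prompt: str, model: Callable[[str], str]) -> tuple[str, dict[str, dict[str, int]]]:
    """Inspect the prompt on the way in and the completion on the way out."""
    safe_prompt, found_in = redact(prompt)
    completion = model(safe_prompt)
    safe_completion, found_out = redact(completion)
    return safe_completion, {"input": found_in, "output": found_out}
```

The per-direction counts are the part that feeds runtime monitoring: a spike in output-side hits is the kind of anomalous data flow the paragraph above describes.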
If they're worried about training data memorization in a model they're fine-tuning, or a model trained on data they can't fully account for, the honest answer is that access controls don't solve this. Data sanitization before training reduces the risk. Differential privacy techniques during training reduce it further. But once data is in the weights, it's in the weights. The control lives upstream of deployment.
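Data sanitization before training can start with something as simple as scrubbing recognizable values from each record before it reaches the fine-tuning corpus. The sketch below, in Python, replaces candidate payment card numbers that pass the Luhn checksum. The function names and the `[CARD]` placeholder are hypothetical, and a real pipeline would cover many more data types than this one.

```python
import re

# Candidate card numbers: 13 to 19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scrub_cards(record: str) -> str:
    """Replace Luhn-valid card numbers with a placeholder; leave other digit runs alone."""

    def replace(match: re.Match) -> str:
        digits = re.sub(r"[ -]", "", match.group(0))
        return "[CARD]" if luhn_valid(digits) else match.group(0)

    return CARD_CANDIDATE.sub(replace, record)
```

The Luhn check keeps the scrubber from mangling order numbers and other long digit strings that merely look like cards. It reduces what the model can memorize; it does nothing for data already in the weights.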
Sensitive Information Disclosure jumped to #2 because organizations are encountering all three vectors simultaneously. The tools they built for the first twenty years of data protection address the middle one. Your job in the conversation is to help the buyer see all three clearly enough to ask for the right controls.
Things to follow up on...
- Traditional DLP's blind spot: AI-aware DLP requires semantic classification and prompt inspection that legacy file-scanning tools were never built for, and one illustrative walkthrough shows how 12 patient records can leave an organization with every DLP check passing green.
- Agentic data propagation risk: As LLM agents chain tool calls across planning, research, and reporting steps, sensitive data accumulates at each hop at machine speed, creating uninspected data flows that prompt-boundary DLP hasn't caught up to yet.
- Bug bounties without CVEs: Anthropic, Google, and Microsoft have all paid bug bounties for prompt injection exploits without issuing formal CVEs, which means traditional vulnerability tracking understates the real-world frequency of these attacks.
- FedRAMP AI prioritization criteria: GSA announced in August 2025 that FedRAMP will prioritize authorization of AI-based cloud services meeting enterprise-grade SSO, SCIM provisioning, and guaranteed data separation requirements, directly linking identity controls to AI procurement eligibility.

