The AI Audit Trail Your SOC 2 Report Can't Cover

AI disclosure artifacts, the three frameworks converging to require them, and where your SOC 2 audit instincts stop applying.

By Leigh Garrity, May 8, 2026

A new genre of compliance artifact is forming around AI systems: model cards, system cards, datasheets for datasets, and training content summaries. These are structured disclosure documents that describe what an AI model was trained on, how it was tested, what it can and can't do, and how its behavior has changed over time. Three frameworks (one binding regulation, one voluntary risk framework, and one certifiable standard) are converging to make them mandatory or strongly expected. Your SOC 2 instincts about documentation discipline will carry you into the conversation, right up to a specific, identifiable point where they stop working.

This piece covers the artifacts, the frameworks driving them, and exactly where the bridge to your existing audit vocabulary stops bearing weight.

The four disclosure artifacts

Four document types make up the emerging AI disclosure genre. None of them existed in any formal sense before 2018. All of them are now referenced, required, or presumed by at least one major regulatory framework.

Model cards are the foundational format. Introduced by Mitchell et al. at Google in 2019 (published in the proceedings of the ACM Conference on Fairness, Accountability, and Transparency), a model card is a structured document that ships alongside a trained model. Nine sections: Model Details, Intended Use, Factors, Metrics, Evaluation Data, Training Data, Quantitative Analyses, Ethical Considerations, and Caveats and Recommendations. Think of it as a nutrition label for a model. It tells you what went in, what the model is designed to do, how it performed under evaluation, and where the developers know it falls short. Critically, the original paper specified that model cards should track changes across versions, highlighting "the drastic ways that models can change over time." That version-tracking function turns out to be the most consequential thing about them for audit purposes.
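To make the version-tracking point concrete, here is a minimal sketch of a model card as a record with one field per section, plus a helper that reports which sections changed between two versions. The `ModelCard` class, its field names, and `changed_sections` are hypothetical illustrations, not a format defined by Mitchell et al. or by any library, and the sample text is invented.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical record: a version label plus the nine sections named above."""
    version: str
    model_details: str
    intended_use: str
    factors: str
    metrics: str
    evaluation_data: str
    training_data: str
    quantitative_analyses: str
    ethical_considerations: str
    caveats_and_recommendations: str


def changed_sections(old: ModelCard, new: ModelCard) -> list[str]:
    """Return the names of sections whose text differs between two versions."""
    return [
        f.name
        for f in fields(ModelCard)
        if f.name != "version" and getattr(old, f.name) != getattr(new, f.name)
    ]


# Illustrative content only; none of this describes a real model.
v1 = ModelCard(
    version="1.0",
    model_details="Example text classifier",
    intended_use="Routing support tickets",
    factors="Language, ticket length",
    metrics="Accuracy 0.91",
    evaluation_data="Held-out 2024 tickets",
    training_data="2023 ticket archive",
    quantitative_analyses="Accuracy by language",
    ethical_considerations="May underperform on short tickets",
    caveats_and_recommendations="Not for legal triage",
)
v2 = replace(v1, version="1.1", metrics="Accuracy 0.93",
             training_data="2023-2024 ticket archive")

print(changed_sections(v1, v2))  # ['metrics', 'training_data']
```

An auditor reading the two cards side by side is doing the same comparison by hand: which sections moved between releases, and whether anyone recorded why.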

System cards document the deployed system, not just the underlying model. OpenAI publishes these alongside major releases. Their GPT-5 system card and o3/o4-mini system card are representative examples, containing benchmark evaluations, adversarial testing results, safety scores by category, and model-to-model comparisons across versions. A system card covers the model plus its safety features, access policies, and operational constraints. A model card describes the engine. A system card describes the car. Worth noting: there is no formal standard for system card structure yet. OpenAI and Meta use different formats. Evolving practice, not settled specification.

Datasheets for datasets, introduced by Gebru et al. (published in Communications of the ACM, 2021) and inspired by electronics component datasheets, document the training data itself. Where it came from, how it was collected, how it was labeled, what was excluded, who maintains it. If the model card is the nutrition label, the datasheet is the supply chain audit for the ingredients.

Training content summaries are the newest and most consequential artifact. Under the EU AI Act, Article 53(1)(d), providers of general-purpose AI (GPAI) models must publish a summary of the content used for training, using a mandatory template issued by the AI Office in July 2025. The European Commission's Digital Strategy page confirms the template requires disclosure of data sources, top domain names, and data processing methods. Mayer Brown (an international law firm tracking AI Act implementation) notes the Commission calls it a "minimal baseline," meaning providers are bound to provide at least this much. Not voluntary.
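For a rough sense of what a "top domain names" disclosure involves mechanically, the sketch below counts hostnames across a list of crawled URLs. It is an assumption-laden illustration: the `top_domains` function and the sample URLs are hypothetical, and the Commission's template defines its own methodology, which this does not attempt to reproduce. It counts full hostnames, not registrable domains.

```python
from collections import Counter
from urllib.parse import urlsplit


def top_domains(urls: list[str], n: int = 3) -> list[tuple[str, int]]:
    """Count hostnames across source URLs and return the n most common."""
    hosts = (urlsplit(u).hostname for u in urls)
    # Entries without a hostname (for example, local file paths) are skipped.
    counts = Counter(h for h in hosts if h)
    return counts.most_common(n)


# Hypothetical crawl manifest.
sample_urls = [
    "https://example.org/a",
    "https://example.com/x",
    "https://example.org/b",
    "data/local-file.txt",
]

print(top_domains(sample_urls, n=2))  # [('example.org', 2), ('example.com', 1)]
```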

Model card: a structured disclosure document covering a model's training, evaluation, intended use, and known limitations. Nine standard sections from Mitchell et al.; now referenced by multiple regulatory frameworks as a baseline for model transparency.
System card: a deployment-level document covering the model plus its safety features, adversarial test results, and operational constraints. Published by major providers alongside model releases but not yet formally standardized.
Datasheet: a supply-chain document for training data, covering provenance, collection methods, labeling, and maintenance. Introduced by Gebru et al. to make dataset characteristics auditable before they become embedded in model behavior.
Training content summary: a mandatory EU AI Act artifact disclosing training data sources via a Commission-issued template. The first legally required public disclosure of what went into a model's training.

Three frameworks converging on the same documentation stack

The artifacts above aren't floating free. Three frameworks are converging to require or strongly incentivize them. A public sector buyer in 2026 is likely operating under the influence of at least two.

The EU AI Act is the most prescriptive. Annex XI specifies the technical documentation GPAI providers must maintain for the AI Office: model architecture, training methodology, evaluation strategies and results, adversarial testing records, system architecture descriptions. This is the internal package, not public, but available on regulatory request. Annex XII governs what downstream providers receive: enough documentation about capabilities and limitations to comply with their own AI Act obligations.

GPAI providers whose models exceed 10²⁵ FLOPs in training compute (a threshold widely understood to capture GPT-4-class models and above) are classified as posing systemic risk and face additional requirements: mandatory model evaluations, adversarial testing, incident reporting to the AI Office, and a Safety and Security Framework submitted before release. Providers must notify the Commission within two weeks of meeting this threshold. The timeline is phased. GPAI model obligations started applying on August 2, 2025, and the AI Office gains full enforcement powers over GPAI providers on August 2, 2026. High-risk AI system obligations phase in from August 2026 through August 2027. Noncompliance fines for GPAI providers run up to €15 million or 3% of global annual turnover, whichever is higher, per Article 101 of the AI Act (corroborated by WilmerHale's analysis of the implementing guidelines).
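The two numeric rules in this paragraph, the compute threshold and the "whichever is higher" fine ceiling, reduce to simple arithmetic. The sketch below uses only the figures quoted above; the function and constant names are hypothetical, and it ignores every legal nuance (designation procedures, mitigating factors, how turnover is measured).

```python
SYSTEMIC_RISK_FLOPS = 1e25    # training-compute threshold quoted in the text
FINE_FLOOR_EUR = 15_000_000   # fixed amount in the fine ceiling
FINE_TURNOVER_PERCENT = 3     # percentage of global annual turnover


def is_systemic_risk(training_flops: float) -> bool:
    """True when training compute exceeds the quoted threshold."""
    return training_flops > SYSTEMIC_RISK_FLOPS


def max_gpai_fine_eur(global_annual_turnover_eur: float) -> float:
    """Upper bound on the fine: the higher of the fixed amount or 3% of turnover."""
    turnover_based = FINE_TURNOVER_PERCENT * global_annual_turnover_eur / 100
    return max(FINE_FLOOR_EUR, turnover_based)


print(is_systemic_risk(2e25))            # True
print(max_gpai_fine_eur(100_000_000))    # 15000000
print(max_gpai_fine_eur(2_000_000_000))  # 60000000.0
```

The turnover-based figure overtakes the fixed €15 million once global annual turnover passes €500 million, so the percentage is the operative number for any large provider.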

NIST AI RMF is voluntary but operationally influential, especially in U.S. federal procurement. Its four functions, Govern, Map, Measure, and Manage, generate specific documentation artifacts. The Measure function calls for risk measurement plans, bias testing results, red-teaming records, and explainability assessments, all documented as versioned, evidence-grade artifacts. The Manage function calls for incident disclosure records, drift detection logs, content provenance documentation, and structured decommissioning plans. NIST AI 600-1, the Generative AI Profile published July 2024, extends these with over 200 suggested actions across twelve risk areas specific to generative AI, including confabulation, information integrity, and human-AI configuration.
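One way to picture the Measure and Manage outputs is as a checklist to compare against what a vendor or agency actually has on file. The sketch below is a hypothetical gap check: the artifact labels are lifted from the paragraph above and are not an official NIST enumeration, and `EXPECTED_ARTIFACTS` and `missing_artifacts` are invented names.

```python
# Artifact labels paraphrased from the text; not an official NIST list.
EXPECTED_ARTIFACTS = {
    "Measure": {
        "risk measurement plan",
        "bias testing results",
        "red-teaming records",
        "explainability assessment",
    },
    "Manage": {
        "incident disclosure records",
        "drift detection logs",
        "content provenance documentation",
        "decommissioning plan",
    },
}


def missing_artifacts(on_hand: set[str]) -> dict[str, list[str]]:
    """Return, per function, the expected artifacts that are not on hand."""
    gaps = {}
    for function, expected in EXPECTED_ARTIFACTS.items():
        missing = sorted(expected - on_hand)
        if missing:
            gaps[function] = missing
    return gaps


# Hypothetical inventory: all Measure artifacts, half of the Manage ones.
inventory = EXPECTED_ARTIFACTS["Measure"] | {
    "incident disclosure records",
    "drift detection logs",
}
print(missing_artifacts(inventory))
# {'Manage': ['content provenance documentation', 'decommissioning plan']}
```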

A caveat worth flagging: AI RMF 1.0 is under revision. NIST's own AI Resource Center confirms this, and a preliminary draft cybersecurity profile footnotes that "the AI RMF is currently in revision." The AI Action Plan directed NIST to remove references to misinformation, DEI, and climate change. Version 1.1 has not been published as of this writing, and 1.0 remains the operative document. If your buyer cites NIST AI RMF, they're working from 1.0. Know that the ground is shifting.

ISO/IEC 42001:2023 is the first certifiable AI management system standard, published December 2023. It follows the same Annex SL high-level structure as ISO 27001 (Clauses 4–10), which means organizations with an existing ISMS have a significant head start. Schellman (the first ANAB-accredited certification body for ISO 42001 in the U.S.) and practitioner analyses estimate roughly 60–70% of the management system scaffolding already exists if you have a working ISMS. The AI-specific additions are what matter: mandatory AI impact assessments, bias mitigation documentation, transparency obligations, and lifecycle monitoring. These are artifacts that have no analog in ISO 27001.

Certification requires a third-party audit by an accredited body, is valid for three years with annual surveillance audits, and typically takes nine to fourteen months from initiation to certificate (per Schellman's certification guidance). Most of that time goes toward generating six to twelve months of evidence that the controls work in practice. The audit itself is the shorter part.

The certified pool is still small. Microsoft, IBM (for its Granite models), Anthropic, AWS, KPMG, Darktrace, and Synthesia are among the organizations that have achieved certification. IBM completed its Granite certification in under three months with zero non-conformities. But the global total remained below fifty organizations through most of 2025, per practitioner estimates. That number is consistent with the auditor scarcity problem. ISO/IEC 42006 sets requirements for the certification bodies themselves, and the firm that did your last SOC 2 probably does not have the qualified personnel to audit an AI management system. Even if they say they can.

These three frameworks don't compete. They layer. Most programs use NIST AI RMF as the operational risk model inside an ISO 42001 management system, while EU AI Act compliance drives the specific documentation outputs. A buyer who mentions any one of them is likely aware of the others.

EU AI Act: the most prescriptive framework, requiring mandatory technical documentation (Annex XI), downstream transparency (Annex XII), and public training content summaries. GPAI enforcement powers activate August 2026; high-risk system obligations extend to August 2027.
NIST AI RMF: a voluntary but procurement-influential framework whose Measure and Manage functions generate versioned evaluation artifacts, drift detection logs, and incident records. Version 1.0 is operative; 1.1 revision is underway with no published timeline.
ISO/IEC 42001: the first certifiable AI management system standard, adding AI impact assessments, bias documentation, and lifecycle monitoring to the familiar ISO management system structure. Certified pool is growing but remains small relative to market demand.

Okta Concept Mapping: SOC 2 Audit Trail → AI Audit Trail

Your SOC 2 discipline — collecting control evidence over an observation window, maintaining auditor-ready documentation, demonstrating operational effectiveness — transfers directly to AI audit. ISO 42001's Clause 9 (Performance Evaluation) and Clause 10 (Improvement) will feel structurally familiar. The break: SOC 2 assumes the system under audit is fundamentally the same system at month twelve as at month one. AI audit documents a system that changes, and the change itself is what must be governed.

SOC 2 controls are static: access rules, encryption settings, availability SLAs. An AI system that has been retrained or fine-tuned, or that is simply operating on shifting input distributions, is a different system. It has learned to behave differently. AI audit requires documenting training data lineage, model behavior drift across versions, adversarial test results, human oversight decisions, and post-deployment incidents. That last category sounds familiar. You know incident response. But an AI incident report looks nothing like a SIEM alert and a remediation ticket. It documents what the model produced, what training or deployment conditions contributed, and what corrective action was taken on the model itself, not the infrastructure around it. SOC 2 has no mechanism for any of these artifacts. In AI audit, the observation window means watching a system change and documenting what changed, why, and whether the change was governed.
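As a small illustration of "model behavior drift across versions," the sketch below compares evaluation metrics from two model versions and flags any that moved more than a tolerance. Everything here is an assumption made for illustration: the `metric_drift` function, the metric names, the values, and the 0.02 tolerance. Real drift monitoring typically looks at input distributions and many more metrics.

```python
def metric_drift(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.02,
) -> dict[str, float]:
    """Return metrics whose absolute change exceeds tolerance, with the signed delta."""
    drifted = {}
    # Only metrics reported for both versions can be compared.
    for name in baseline.keys() & candidate.keys():
        delta = candidate[name] - baseline[name]
        if abs(delta) > tolerance:
            drifted[name] = round(delta, 4)
    return drifted


# Hypothetical evaluation results for two versions of the same model.
v1_metrics = {"accuracy": 0.91, "false_positive_rate": 0.04}
v2_metrics = {"accuracy": 0.905, "false_positive_rate": 0.09}

print(metric_drift(v1_metrics, v2_metrics))  # {'false_positive_rate': 0.05}
```

The governed part is what happens next: who reviewed the flagged change, what they decided, and where that decision is recorded.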

Where this shows up in your conversations

A federal CAIO or CISO brings up AI documentation requirements in one of two contexts. The first is procurement: they're evaluating an AI-enabled product and want to know what disclosure artifacts the vendor provides. Model cards? System cards? Training data documentation? They're asking because the EU AI Act's downstream provider obligations mean they need sufficient documentation to comply with their own requirements, and because NIST AI RMF and OMB guidance are pushing similar expectations domestically.

The second context is internal governance: the agency is building or deploying AI systems and needs to establish an audit trail that satisfies NIST AI RMF and potentially ISO 42001.

In either case, the buyer's question is whether your platform supports the governance infrastructure around these artifacts. Identity is relevant where it touches human oversight records (who approved what decision), access controls on model training pipelines, and audit logging for AI system interactions. If the conversation stays in that lane, you're on solid ground. If it moves into training data provenance or model evaluation methodology, that's your SE's conversation.
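To show what a human oversight record might look like at the data level, here is a minimal sketch of an append-only log in which each entry carries a hash of the previous one, so later edits are detectable. This is a generic pattern, not a description of any particular identity platform; `append_record`, `verify_chain`, the field names, and the sample values are all hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first record


def _digest(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the record body."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(log: list[dict], approver: str, decision: str,
                  model_version: str, timestamp: str) -> dict:
    """Append an oversight decision linked to the previous record's hash."""
    body = {
        "approver": approver,
        "decision": decision,
        "model_version": model_version,
        "timestamp": timestamp,
        "prev_hash": log[-1]["hash"] if log else GENESIS_HASH,
    }
    record = {**body, "hash": _digest(body)}
    log.append(record)
    return record


def verify_chain(log: list[dict]) -> bool:
    """Recompute every hash and link; False means a record was altered or reordered."""
    prev_hash = GENESIS_HASH
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if record["prev_hash"] != prev_hash or record["hash"] != _digest(body):
            return False
        prev_hash = record["hash"]
    return True


oversight_log: list[dict] = []
append_record(oversight_log, "reviewer-a", "approved", "1.1", "2026-05-01T10:00:00Z")
append_record(oversight_log, "reviewer-b", "approved", "1.2", "2026-05-08T09:30:00Z")
print(verify_chain(oversight_log))  # True
```

The identity layer's contribution is the `approver` field: a verified answer to who made the call, tied to a specific model version.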

The vocabulary to hold onto: model cards document the model, system cards document the deployment, datasheets document the data, and the three frameworks are converging on requiring all of them. Your SOC 2 instincts about documentation rigor are real and transferable. The artifacts themselves are entirely new. The EU AI Act's Annex XI calls this "technical documentation," which is a generous word for a fundamentally different kind of evidence about a fundamentally different kind of system.

Practical trigger: when a buyer raises AI documentation, they're usually asking about procurement compliance (what does the vendor disclose?) or internal governance (what audit trail do we need?). Identity is relevant to the governance infrastructure — oversight logging, access control, audit trails — and stops at the boundary of the model artifacts.
Escalation signal: conversations about training data lineage, model evaluation methodology, or adversarial testing results are SE conversations. Conversations about access controls, oversight logging, and audit infrastructure are yours.

Things to follow up on...

NIST AI RMF 1.1: The NIST AI Resource Center confirms the framework is under revision with directed changes to remove misinformation, DEI, and climate references, but no publication date has been set for the updated version.
EU high-risk deadline uncertainty: The European Commission's Digital Omnibus proposal may push back the August 2026 high-risk AI system application date, linking it to the availability of harmonized standards that CEN/CENELEC failed to deliver on time.
ISO 42001 auditor scarcity: Fewer than fifty organizations held certification through most of 2025, and Schellman's lessons-learned analysis notes that most SOC 2 audit firms lack the qualified personnel to assess AI management systems under the new standard.
GPAI Code of Practice enforcement: The AI Office's Safety and Security Framework requirements for systemic-risk GPAI providers become enforceable in August 2026, making the months before that date the window where voluntary compliance hardens into regulatory expectation.