What AI Disclosure Actually Requires — and Where Your Audit Trail Falls Short

By Carey Whitten, May 5, 2026

The Disclosure Artifacts

A model card is a structured document that accompanies a trained AI model, specifying its intended use cases, performance characteristics across demographic groups and operating conditions, known limitations, and evaluation methodology. The concept was formalized by researchers at Google in 2019 as a mechanism for communicating what a model can and cannot reliably do.
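To make the "structured document" idea concrete, here is a minimal sketch of a model card as a typed record. The field names, the example values, and the worst_group helper are all assumptions made for illustration; they do not come from any published model card schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ModelCard:
    """Illustrative model card. Field names are assumptions, not a standard schema."""

    model_name: str
    intended_use: list[str]
    # Metric values keyed first by demographic group or operating condition.
    performance_by_group: dict[str, dict[str, float]]
    known_limitations: list[str]
    evaluation_methodology: str

    def to_json(self) -> str:
        """Serialize the card so it can be versioned next to the model."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


def worst_group(card: ModelCard, metric: str) -> tuple[str, float]:
    """Return the group or condition with the lowest reported value for a metric."""
    scores = {
        group: metrics[metric]
        for group, metrics in card.performance_by_group.items()
        if metric in metrics
    }
    if not scores:
        raise ValueError(f"no group reports metric {metric!r}")
    group = min(scores, key=scores.get)
    return group, scores[group]


# Hypothetical example values, invented for illustration only.
example_card = ModelCard(
    model_name="claims-triage-demo",
    intended_use=["Prioritize benefit claims for human review"],
    performance_by_group={
        "age_under_65": {"accuracy": 0.91},
        "age_65_plus": {"accuracy": 0.84},
    },
    known_limitations=["Not evaluated on claims filed in languages other than English"],
    evaluation_methodology="Held-out test set, stratified by age band",
)

if __name__ == "__main__":
    print(example_card.to_json())
    print(worst_group(example_card, "accuracy"))
```

Keeping per-group metrics as structured data rather than free text lets a reviewer ask directly where the model performs worst, which is the kind of question a marketing summary tends to obscure.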

A system card extends that documentation to the deployed system: the model plus its integration context, safety mitigations, and operational constraints. Meta introduced system cards in 2022, initially to explain a content-ranking system, and OpenAI has since published them alongside its large language model releases; the format has been adopted more broadly for complex AI systems where the model alone doesn't tell the full story.

A data sheet (sometimes "datasheet for datasets") documents the provenance of training data: where it came from, how it was collected, what filtering or preprocessing was applied, and what populations or domains it may over- or under-represent. Dataset provenance — the documented lineage of training data — is increasingly treated as a first-order compliance artifact, not a technical footnote.

These three document types constitute the emerging disclosure genre. They are evidence artifacts, not marketing materials.

Where Three Frameworks Land

The EU AI Act's Article 13 imposes transparency obligations on providers of high-risk AI systems. Deployers must receive documentation sufficient to understand the system's purpose, the logic underlying its outputs, its performance metrics across relevant conditions, and its known risks. This documentation obligation runs to the deployer (the agency or enterprise putting the system into production), not just to a regulator. The Act applies to systems placed on the market or put into service in the EU regardless of where the provider is headquartered, an extraterritorial reach that means U.S. federal vendors selling to EU-connected programs can fall inside scope.

The NIST AI Risk Management Framework structures AI governance across four functions: Govern, Map, Measure, and Manage. The Measure function is where disclosure artifacts live — it requires that AI risks be analyzed and assessed through testing, evaluation, and ongoing monitoring, and that the methodology for that analysis be documented. The Govern function requires accountability structures and policies that reference those measurements. NIST's framing is voluntary for most U.S. contexts, but federal agencies are increasingly incorporating AI RMF alignment into acquisition requirements.

ISO/IEC 42001, published in December 2023, establishes an AI management system standard analogous to ISO/IEC 27001 for information security. It requires documented AI impact assessments, defined processes for model evaluation, and structured post-deployment monitoring: systematic observation of AI system behavior after it enters production, including mechanisms for detecting performance drift or unexpected outputs. Certification against 42001 is beginning to appear in enterprise procurement requirements as a proxy for AI governance maturity.
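One simple form a drift-detection mechanism can take is a comparison between accuracy measured at evaluation time and accuracy on a recent window of labeled production outcomes. The sketch below assumes such labels exist; the function names and the 0.05 threshold are illustrative choices, not values taken from ISO/IEC 42001 or any other framework.

```python
def accuracy(outcomes: list[bool]) -> float:
    """Fraction of outcomes marked correct."""
    if not outcomes:
        raise ValueError("need at least one labeled outcome")
    return sum(outcomes) / len(outcomes)


def check_drift(
    baseline: list[bool],
    recent: list[bool],
    max_drop: float = 0.05,  # assumed tolerance, set by the organization's own policy
) -> dict:
    """Compare recent accuracy with the baseline and flag drops beyond max_drop."""
    baseline_acc = accuracy(baseline)
    recent_acc = accuracy(recent)
    drop = baseline_acc - recent_acc
    return {
        "baseline_accuracy": baseline_acc,
        "recent_accuracy": recent_acc,
        "drop": drop,
        "drift_flagged": drop > max_drop,
    }


# Hypothetical outcomes: True means the model's output was later judged correct.
baseline_outcomes = [True] * 18 + [False] * 2  # 18 of 20 correct at evaluation time
recent_outcomes = [True] * 15 + [False] * 5    # 15 of 20 correct in production

if __name__ == "__main__":
    print(check_drift(baseline_outcomes, recent_outcomes))
```

The returned record, stored with a timestamp, is itself monitoring evidence: it shows what was measured, against which baseline, and under what threshold.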

All three frameworks arrive at the same place: evaluation methodology, dataset provenance, and post-deployment monitoring must be documented, not just practiced. An organization that tests its models rigorously but keeps no structured record of how that testing was conducted, against what benchmarks, using what data, cannot satisfy any of these frameworks. The practice without the paper doesn't count.
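The shared requirement can be sketched as a gap check over a vendor's evidence package. The three evidence areas come from the paragraph above; the specific field names under each area and the example package are hypothetical, chosen for illustration rather than drawn from the text of any framework.

```python
# Evidence areas named in the article; the fields under each are assumptions.
REQUIRED_EVIDENCE = {
    "evaluation_methodology": ("test_procedure", "benchmarks", "evaluation_data"),
    "dataset_provenance": ("sources", "collection_method", "preprocessing"),
    "post_deployment_monitoring": ("monitored_metrics", "review_cadence"),
}


def find_gaps(package: dict[str, dict[str, str]]) -> dict[str, list[str]]:
    """Return, per evidence area, the fields that are absent or left blank."""
    gaps: dict[str, list[str]] = {}
    for area, fields in REQUIRED_EVIDENCE.items():
        documented = package.get(area, {})
        missing = [f for f in fields if not str(documented.get(f, "")).strip()]
        if missing:
            gaps[area] = missing
    return gaps


# Hypothetical vendor package: testing happened, but little of it was written down.
vendor_package = {
    "evaluation_methodology": {"benchmarks": "internal holdout set"},
    "dataset_provenance": {
        "sources": "agency claims records",
        "collection_method": "bulk export",
        "preprocessing": "direct identifiers removed",
    },
}

if __name__ == "__main__":
    for area, missing in find_gaps(vendor_package).items():
        print(f"{area}: missing {', '.join(missing)}")
```

An empty result means every listed field has some content. It says nothing about whether that content is adequate, which remains a judgment for the reviewer.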

When You'll Encounter This

In a public sector account, the scenario looks like this: an agency issues an RFI or acquisition vehicle that includes AI governance requirements, and a vendor responds with a SOC 2 Type II report as its primary compliance evidence. The contracting officer or CISO asks for the model card. The vendor doesn't have one, or has a marketing-facing document that doesn't address evaluation methodology or dataset provenance. That gap is becoming a disqualifying condition in more sophisticated procurements, particularly those touching high-risk use cases like benefits adjudication, fraud detection, or law enforcement analytics.

Listen for: "What documentation do you have about how the model was trained and tested?" A SOC 2 report doesn't answer that question.

Okta Concept Mapping

The natural IDAM anchor here is the SOC 2 narrative — the structured attestation that control objectives are met, supported by evidence of control operation. SOC 2 works because the systems it audits are deterministic: MFA is either enforced or it isn't, encryption is either enabled or it isn't, access reviews either happened or they didn't. Control states are binary and auditable.

AI audit trails break this model at the foundation. A probabilistic model doesn't have a control state for "produces accurate outputs" — it has a distribution of behaviors across an input space that no audit can fully enumerate. The decision chain that produced a specific output isn't a log entry you can retrieve; it's an emergent property of billions of weighted parameters responding to a specific context. SOC 2 control frameworks have no mechanism to represent this. Model cards, system cards, and data sheets exist precisely because the compliance community needed a new artifact type for a new kind of system — one where the evidence isn't "the control was operating" but "here is what we tested, how we tested it, and what we observed." The analogy to SOC 2 holds for the governance intent; it fails at the level of what evidence is technically possible to collect.

Disclosure standards in this space are still forming. The EU AI Act's implementing acts are pending as of this writing, and NIST continues to publish AI RMF profiles for specific sectors. Treat specific implementation thresholds as subject to change; treat the direction of travel as settled.