Legal status as of May 5, 2026. All case citations are subject to accuracy review whenever a material ruling issues; outcomes in several proceedings remain pending. This piece provides vocabulary for procurement conversations, not legal advice.
When a CAIO's legal team flags "copyright concerns" about an AI deployment, they are almost certainly asking two distinct questions without knowing it. The first is about training. The second is about how the training data was obtained. These questions sit on different legal surfaces, have different evidentiary records, and — critically — are addressed very differently in the indemnification clauses your buyer is about to ask you to explain.
Getting them confused in that conversation is the kind of credibility problem that follows you into the next quarter.
The Exposure, Precisely
Setting aside claims about what a model outputs, which the indemnification section below takes up, copyright exposure in the AI context arises at two earlier points: when a model is trained on copyrighted material, and when that material was acquired for training without authorization. Fair use, the doctrine codified at 17 U.S.C. § 107 that permits some unlicensed use of copyrighted works after weighing four factors (the purpose and character of the use, including whether it is transformative; the nature of the work; the amount used; and the effect on the market for the original), is the primary defense on the training side. A fair use finding for training does not automatically cover how the copies were obtained. These are separate legal theories with separate evidentiary requirements, and a court can resolve them differently in the same case. Conflating them is the most common error in vendor conversations about AI copyright risk.
The Training Question
Courts reviewing AI training have centered much of their analysis on transformative use, asking whether the use adds new meaning, expression, or purpose rather than substituting for the original. In Authors Guild v. Google (2d Cir. 2015), the court held that scanning entire books to build a searchable index and displaying limited snippets was a transformative fair use, a precedent that AI defendants have leaned on heavily. More recently, in Andersen v. Stability AI (N.D. Cal., filed 2023), artists alleged that training image-generation models on scraped artwork infringed their copyrights; the court narrowed the complaint but allowed the core copyright claims to proceed in 2024, and as of this writing no ruling on the fair use question has issued. Similarly, The New York Times Co. v. OpenAI (S.D.N.Y., filed 2023) remains in active litigation after the court allowed the core claims to proceed in 2025, with the training question unresolved on the merits.
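Courts have not definitively ruled that AI training is fair use, and no appellate court has decided the question for generative AI. The closest things to merits rulings are district court decisions from 2025. In Bartz v. Anthropic (N.D. Cal.), the court held that training on lawfully acquired books was fair use but refused to extend that holding to copies downloaded from pirate libraries, drawing the same line between training and acquisition that this piece draws. In Kadrey v. Meta (N.D. Cal.), the court ruled for Meta on fair use while stressing that the result turned on the plaintiffs' failure to build a market-harm record, not on a finding that such training is lawful. And in Thomson Reuters v. Ross Intelligence (D. Del.), the court rejected a fair use defense for training a non-generative legal research tool. District court rulings do not bind other courts. That is meaningfully different from settled law. Anyone telling your buyer that training is "clearly fair use" is overstating the record.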
The Source-Acquisition Question
This is where the legal landscape shifts. Several platforms and vendors have reached licensing arrangements or settlements with rights holders over how training data was gathered: not over whether training itself infringed, but over whether the underlying copies were lawfully obtained in the first place. The clearest example is the 2025 class settlement in Bartz v. Anthropic, reported at roughly $1.5 billion, which resolved claims that Anthropic had built part of its training library from pirated copies of books. Getty Images' litigation against Stability AI, filed in both U.S. and U.K. courts, centers substantially on alleged unauthorized scraping of images from Getty's commercially licensed library. The U.K. proceeding has advanced further procedurally; the U.S. case remains pending as of this writing. Separately, a number of news organizations and publishers have reached licensing agreements with major AI developers. These arrangements are framed as forward-looking partnerships, but they can reasonably be read as a sign that prior scraping was legally contested.
The source-acquisition question does not turn on whether training is transformative. It can involve breach of website terms of service; claims under the Computer Fraud and Abuse Act, a theory that Van Buren v. United States (2021) and hiQ Labs v. LinkedIn (9th Cir. 2022) have narrowed considerably for publicly accessible data; and direct copyright claims based on the act of copying itself, such as downloading works from unauthorized sources. Vendors whose training pipelines relied on bulk web scraping or pirated corpora without rights clearance carry a different risk profile than vendors who licensed their corpora. Your buyer's legal team may not be distinguishing between these tracks. You should be.
What Indemnification Actually Covers
Major AI vendors, including Microsoft, Google, OpenAI, and Adobe, have published indemnification commitments for enterprise customers, generally covering third-party copyright claims arising from AI-generated output. Microsoft's Copilot Copyright Commitment, for example, covers claims that outputs infringe third-party intellectual property, subject to conditions including that the customer kept the product's built-in guardrails and content filters in place and did not intentionally try to generate infringing content.
What these clauses typically do not cover: claims arising from the vendor's own training-data acquisition practices. That exposure sits upstream of the customer relationship. An indemnity is a promise from vendor to customer that shifts specified claims against the customer onto the vendor; it does nothing about the vendor's own liability to rights holders. If a rights holder such as Getty Images prevails against a vendor on scraping claims, that judgment lands on the vendor, and an output-scoped clause may not reach a follow-on claim against a customer that targets how the model was built rather than a specific output. The indemnification runs downstream, not upstream.
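When your buyer's legal team asks whether the indemnification clause covers "copyright risk," the right response is to ask which surface they mean. Output indemnification is real and increasingly standardized. Indemnification for training data is much rarer and is not implied by an output commitment. Google Cloud's generative AI indemnity, announced in 2023, is a notable exception: it separately addresses third-party claims over Google's use of training data. Check the scope of the specific clause in the contract rather than the vendor's announcement.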
IDAM Concept Mapping
The closest structural analogy in identity infrastructure is federated trust chain liability. When your organization accepts SAML assertions from an upstream identity provider, you inherit the quality of that provider's authentication decisions, so if its validation was flawed, your relying party inherits the problem. AI training data provenance works similarly: you're trusting that the vendor's upstream data acquisition was clean. Where the analogy breaks is auditability. In federated identity, the trust chain is explicit, documented, and governable through metadata and federation agreements. In AI, the training corpus is rarely auditable by the deploying organization. Bill-of-materials formats for models and datasets do exist (CycloneDX's ML-BOM and SPDX 3.0's AI and Dataset profiles), but major commercial vendors generally do not publish corpus-level manifests that a customer could check. You're accepting a trust assertion with no way to inspect the chain behind it.
In your next procurement conversation, "copyright risk" will come up. The buyer's legal team will be worried about a specific surface; the vendor's indemnification clause addresses a specific surface. Most clauses cover output liability. Most legal teams, when they say "copyright," are thinking about training. Naming that gap before they do is the work.
Verification date: May 5, 2026. Readers should verify current case status in Andersen v. Stability AI, The New York Times Co. v. OpenAI, and Getty Images v. Stability AI before using this material in compliance or procurement contexts. This piece will be updated upon material court decisions in any of these proceedings.

