AI Copyright Goes to Court: The Rulings That Will Shape the Next Decade of Tech
Three landmark cases are working through US courts right now. The outcomes will determine whether AI companies owe billions in licensing fees — or whether training on public data remains fair use.
The $100 Billion Question
Is training an AI model on copyrighted data “fair use” or infringement? The answer is worth somewhere between $0 and $100 billion, depending on which side wins.
Three cases currently making their way through US courts will set the precedent:
NYT v. OpenAI: The New York Times is arguing that OpenAI’s models can reproduce Times articles nearly verbatim, proving they memorized copyrighted content. OpenAI argues that training is transformative fair use — the model creates something new, not copies.
Getty v. Stability AI: Getty Images claims Stable Diffusion was trained on millions of Getty’s copyrighted images without permission. Stability AI argues that learning from images is the same as a human artist studying art — observation, not copying.
Kadrey v. Meta: A class action by authors claiming Llama was trained on pirated books from “shadow libraries.” Meta’s defense: training is transformative fair use, and whatever the model learned from a book is a separate question from whether its outputs copy one.
Why This Isn’t Straightforward
The honest answer is that copyright law wasn’t designed for this situation. The existing framework has to answer questions it was never meant to address:
Is a model a “copy”? When a model trains on a book, it doesn’t store the book. It adjusts billions of numerical weights based on statistical patterns in the text. Is that a copy? Legally, nobody knows.
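That “weights, not text” distinction can be made concrete with a toy example. The sketch below is an illustration only — real models use billions of learned parameters, not bigram counts — but it shows the core point: after training, the source text is not stored anywhere; only aggregate numbers derived from it remain.

```python
# Toy "model": statistics extracted from text, with the text discarded.
# Illustrative only -- not how a transformer works, but the same idea:
# the artifact that survives training is numbers, not a copy of the book.
from collections import Counter

def train_bigram_weights(text):
    # Count adjacent character pairs, then normalize to probabilities.
    counts = Counter(zip(text, text[1:]))
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

weights = train_bigram_weights("the cat sat on the mat")
# `weights` maps character pairs to probabilities. The sentence itself
# is gone; what remains is statistics that could generate similar text.
```

Whether those derived numbers legally constitute a “copy” of the training text is exactly the open question the courts are facing.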
Is training “transformative”? Fair use protects transformative works — things that create new meaning or purpose. A model that generates completely original text from patterns learned from copyrighted data is arguably the most transformative thing in copyright history. Or it’s the most sophisticated plagiarism engine ever built.
Who’s the infringer? If a user prompts ChatGPT to “write something in the style of Stephen King” and it produces something suspiciously similar, is OpenAI liable? Or the user? Or neither?
The Possible Outcomes
Scenario A: Training is fair use — AI companies win. No licensing fees. The current approach (train on everything, ask forgiveness later) continues. Content creators get nothing unless they can prove direct, verbatim copying in outputs.
Scenario B: Training requires licensing — Content owners win. Every AI company needs to negotiate licensing deals with publishers, image agencies, and record labels. Costs increase dramatically. Small AI companies can’t afford to compete. The big players lock up exclusive data deals.
Scenario C: Compulsory licensing (the most likely outcome) — A court or Congress creates a standardized licensing framework. AI companies pay a per-token fee into a collective fund, similar to how radio stations pay ASCAP/BMI. Nobody’s fully happy, but the system works.
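To see why a per-token fee could be workable at all, here is a hypothetical back-of-envelope calculation. The rate and corpus size below are illustrative assumptions, not figures from any actual proposal or court filing.

```python
# Hypothetical: cost of a compulsory per-token license for one
# training run. Both numbers are assumptions for illustration.
corpus_tokens = 10_000_000_000_000   # assume a 10-trillion-token corpus
fee_per_million_tokens = 0.50        # assume $0.50 per million tokens

total_fee = corpus_tokens / 1_000_000 * fee_per_million_tokens
print(f"${total_fee:,.0f}")  # $5,000,000
```

At rates anywhere in this neighborhood, the fee is material but survivable for large labs — which is part of why a collective-licensing scheme gets floated as the compromise outcome.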
What Nobody’s Talking About
The irony of the copyright debate is that it might not matter in 18 months. Synthetic data is improving so quickly that future models might not need human-created training data at all. If AI models can generate their own training data, the copyright question becomes moot.
But the current models — the ones making billions in revenue right now — were all trained on copyrighted data. The retroactive liability is what keeps AI company lawyers up at night.
What to Watch
- The NYT v. OpenAI ruling (expected late 2026) — this will set the strongest precedent
- Congressional action on AI copyright — several bills are in committee
- Whether AI companies proactively offer licensing deals to preempt court rulings
- The EU’s approach (stricter by default — the AI Act requires training data transparency)
Copyright law is about to have its Napster moment. The question isn’t whether the music is free — it’s whether anyone can build a business on it without paying the musicians.