The only exception is the UMG v. Anthropic case, because at least initially, earlier versions of Anthropic's model would generate song lyrics in its output. That's a problem. The current status of that case is that safeguards have been put in place to prevent this from happening, and the parties have essentially agreed that, until the case is resolved, those safeguards are sufficient, so they are no longer seeking a preliminary injunction.
Ultimately, the harder question for AI companies isn't "Is training legal?" It's "What do you do when the AI produces output that is too similar to a particular work?"
Do you expect most of these cases to go to trial, or do you see a resolution on the horizon?
We may see some settlements. Where I expect settlements is with big players who either have large amounts of content or content that is particularly valuable. The New York Times might end up with a settlement and a licensing agreement, perhaps one in which OpenAI pays money to use New York Times content.
There's enough money at stake that we're likely to get at least a few rulings that set the parameters. The class action plaintiffs seem to have stars in their eyes. There are many class action lawsuits, and my suspicion is that the defendants will resist them and hope to win on summary judgment. It's not obvious that these cases will go to trial. The Supreme Court's decision in Google v. Oracle pushed fair use law very strongly toward resolution on summary judgment rather than by a jury. I think the AI companies will make every effort to get these cases decided on summary judgment.
Why would it be better for them to win on summary judgment rather than a jury verdict?
It's faster and cheaper than going to trial. And AI companies worry that they won't be viewed sympathetically, that many people will think, "Oh, you made a copy of the work; that should be illegal," and never delve into the details of the fair use doctrine.
Many deals have been struck between AI companies and media outlets, content providers, and other rights holders. In most cases, these deals seem to be more about search than about foundation models, or at least that's how they've been described to me. Do you think licensing content for use in AI search engines, where answers are sourced through retrieval-augmented generation, or RAG, is legally required? Why are they doing it this way?
If you're using retrieval-augmented generation on targeted, specific content, the fair use argument becomes harder. An AI-generated search result is much more likely to reproduce text in its output taken directly from one specific source, and that is much less likely to be fair use. It could still be, but the riskier thing is that it's much more likely to compete with the original source material. If, instead of directing people to a New York Times article, I give them an AI answer that uses RAG to pull text straight from that New York Times article, it looks like a substitute that could harm the New York Times. The legal risk is greater for the AI company.
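To make the mechanism concrete, here is a minimal, hypothetical sketch of how a RAG-style search answer can end up carrying text from a single source almost verbatim. The corpus, function names, and keyword-overlap scoring are illustrative assumptions for this sketch, not a description of any company's actual pipeline.

```python
# Minimal RAG sketch: retrieve the most relevant source, then inject its text
# into the answer. Because the passage is quoted into the response, long spans
# of the original source can appear verbatim, which is the substitution risk
# described above. All data and names here are hypothetical.

def tokenize(text: str) -> set[str]:
    return {w.strip(".,").lower() for w in text.split()}

# Hypothetical indexed sources, stand-ins for articles a search engine might crawl.
CORPUS = {
    "times-article": "The city council approved the new transit budget on Tuesday, "
                     "allocating $2 billion to subway repairs over five years.",
    "blog-post": "Transit advocates have long argued that subway repairs are underfunded.",
}

def retrieve(query: str) -> tuple[str, str]:
    """Pick the source whose tokens overlap most with the query (toy scoring)."""
    q = tokenize(query)
    doc_id, text = max(CORPUS.items(), key=lambda kv: len(q & tokenize(kv[1])))
    return doc_id, text

def answer(query: str) -> str:
    """Compose an answer grounded in the retrieved passage, quoting it directly."""
    doc_id, passage = retrieve(query)
    return f"According to {doc_id}: {passage}"

if __name__ == "__main__":
    print(answer("How much was allocated to subway repairs in the transit budget?"))
```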
What do you want people to know about the copyright fights over generative AI that they may not already know, or may have been misinformed about?
What I hear most often, which is technically wrong, is the idea that these are just plagiarism machines, that all they do is take my stuff, chop it up, and spit it back out. That's what I hear a lot of artists and a lot of laypeople say, and it's just technically wrong. You can decide whether you think generative AI is good or bad. You can decide whether you think it's legal or illegal. But it really is a completely new thing we haven't experienced before. The fact that it has to train on a lot of content to understand how sentences work, how arguments work, and various facts about the world doesn't mean it's just copying and pasting things or making a collage. It really is generating things that nobody could have expected or predicted, and it's giving us a lot of new content. I think that's important and valuable.