Apple, Nvidia, and Anthropic Used Thousands of Scraped YouTube Videos to Train AI


In response to the lawsuits, defendants such as Meta, OpenAI, and Bloomberg have argued that their actions constitute fair use. The case against EleutherAI, which originally scraped the books and made them publicly available, was voluntarily dismissed by the plaintiffs.

The remaining cases are still in the early stages of litigation, leaving questions of permission and payment unresolved. The Pile has since been removed from its official download site, but it is still available on file-sharing services.

“The tech companies have been brutal,” said Amy Keller, a consumer lawyer and partner at DiCello Levitt, who has filed lawsuits on behalf of creators whose work was allegedly taken by artificial intelligence companies without their consent.

“People are concerned that they didn’t have a choice in the matter,” Keller said. “I think that’s really problematic.”

Parrot parroting

Many creators feel uncertain about the path forward.

Full-time YouTubers regularly patrol the internet for unauthorized uses of their work and file takedown requests, and some fear it's only a matter of time before AI can generate content similar to theirs, or even outright copies of it.

Pakman, the creator of The David Pakman Show, recently saw the power of AI firsthand while scrolling through TikTok. He came across a video labeled as a Tucker Carlson clip, and when he watched it, he was taken aback. It sounded like Carlson, but it was, word for word, what Pakman had said on his YouTube show, right down to the cadence. He was equally disturbed that only one of the video's commenters seemed to recognize it as fake: a clone of Carlson's voice reading Pakman's script.

"This is going to be a problem," Pakman said in a YouTube video he made about the fake. "You can do it with basically anyone."

EleutherAI co-founder Sid Black wrote on GitHub that he created YouTube Subtitles using a script. The script retrieves subtitles from YouTube's API in the same way a viewer's browser retrieves them when watching a video. According to the documentation on GitHub, Black used 495 search terms to find videos, including "funny vloggers," "Einstein," "black protester," "Protective Social Services," "infowars," "quantum chromodynamics," "Ben Shapiro," "Uyghurs," "fruitarian," "cake recipe," "Nazca lines," and "flat earth."

Although YouTube's terms of service prohibit accessing its videos by "automated means," more than 2,000 GitHub users have bookmarked or endorsed the code.

"There are many ways in which YouTube could prevent this module from working, if that's what they wanted," machine learning engineer Jonas Depoix wrote in a discussion on GitHub, where he posted the code Black used to access YouTube captions. "That hasn't happened so far."

In an email to Proof News, Depoix said he hadn’t used the code since he wrote it as a university student for a project a few years ago and was surprised people found it useful. He declined to answer questions about YouTube’s policies.

Google spokesperson Jack Malon said in an emailed response to a request for comment that the company "has taken action over the years to prevent abusive, unauthorized scraping." He did not respond to questions about other companies using the material as training data.

Among the videos used by AI companies are 146 from Einstein Parrot, a channel with nearly 150,000 subscribers. The parrot's caretaker, Marcia, who declined to give her last name for fear of endangering the famous bird's safety, said she initially found it amusing to learn that AI models had ingested the words of a mimicking parrot.

“Who would want to use a parrot’s voice?” Marcia said. “But I know he speaks very well. He speaks my voice. So he parrots me, and the AI parrots the parrot.”

Once an AI model has absorbed the data, it can't be made to unlearn it. Marcia was troubled by all the unknown ways the information about her bird could be used, including creating a digital copy of the parrot and, she worried, making it curse.

“We are entering uncharted territory,” Marcia said.
