Reddit is suing Perplexity and three “data collection service providers” to “stop the unlawful, industrial-scale circumvention of data protection measures by a group of bad actors who will stop at nothing to obtain Reddit’s valuable copyrighted content,” according to the complaint.
The company likens the data harvesting companies SerpApi, Oxylabs and AWM Proxy to “would-be bank robbers” who, “knowing they can’t get into a bank’s vault, break into an armored truck carrying cash instead.” Reddit says Perplexity is a client of “at least one” of these data harvesting companies and will “apparently do anything to get the data from Reddit it desperately needs to power its ‘response engine’ – that is, everything other than make a deal directly with Reddit, as some of its competitors have done.”
According to the lawsuit, in May 2024, Reddit sent Perplexity a cease-and-desist letter “demanding that it stop collecting data from Reddit.” Although Perplexity told Reddit at the time that it did not use Reddit content to train AI models and that it would respect Reddit’s robots.txt file, the number of Perplexity citations to Reddit actually increased after the letter. Reddit also created a post that only Google could index, and “within hours,” Perplexity “produced the content” of that post, the company claims.
“The only way Perplexity could have obtained this content from Reddit and then used it in its ‘response engine’ is if it and/or its co-defendants pulled Google’s SERPs for this content from Reddit, and Perplexity then quickly incorporated that data into its response engine,” Reddit writes.
“AI companies are engaged in an arms race for high-quality human content, and this pressure is fueling an industrial-scale data laundering economy,” Ben Lee, Reddit’s chief legal officer, said in a statement. “Scrapers bypass technological security measures to steal data, then sell it to customers hungry for training materials. Reddit is a prime target because it is one of the largest and most animated collections of human conversations ever created.
“Defendants Oxylabs UAB, AWM Proxy, and SerpApi – a Lithuanian data harvesting company, a former Russian botnet, and a company that openly advertises its dubious security circumvention tactics – are textbook examples of this illegal behavior,” Lee says. “Unable to scrape Reddit directly, they mask their identities, hide their locations, and hide their web scrapers in order to steal Reddit content from Google Search. Perplexity is a willing customer of one or more of these scrapers, choosing to purchase the stolen data rather than entering into a legal contract with Reddit itself.”
“Perplexity has not yet been served with a lawsuit, but we will always vigorously fight for users’ rights to free and fair access to public knowledge,” Jesse Dwyer, Perplexity’s chief communications officer, told The Verge. “Our approach remains principled and responsible as we provide fact-based responses using accurate AI, and we will not tolerate threats that undermine openness and the public interest.”
