In the coming weeks, Reddit will begin blocking most automated bots from accessing its public data. You’ll need to enter into a licensing agreement, as Google and OpenAI have done, to use Reddit content for model training and other commercial purposes.
Although this has technically been Reddit’s policy already, the company is now enforcing it by updating its robots.txt file, a fundamental part of the web that defines how crawlers can access a site. “It’s a signal to those who don’t have a contract with us that they shouldn’t have access to Reddit data,” the company’s chief legal officer, Ben Lee, tells me. “It also sends a signal to bad actors that the word ‘allow’ in a robots.txt file does not, and never has, meant they can use the data however they want.”
