The first wave of mainstream generative AI tools was largely trained on “publicly available” data, basically anything you could scrape off the internet. Now, sources of training data are increasingly restricting access and pushing for licensing agreements. As the search for additional data sources intensifies, new licensing startups have emerged to keep the source material flowing.
The Dataset Provider Alliance, a trade group formed this summer, wants to make the AI industry more standardized and fair. To that end, it just released a position paper outlining its stances on key AI issues. The alliance is made up of seven AI licensing companies, including the music rights management company Authorize, the Japanese stock photo marketplace Pixta, and Calliope Networks, a startup that licenses copyrights for generative AI. (At least five new members will be announced in the fall.)
The DPA advocates for an opt-in system, meaning that data can only be used with the explicit consent of creators and rights holders. This is a significant departure from the way most major AI companies operate. Some have developed their own opt-out systems, which require data owners to withdraw their work on a case-by-case basis. Others offer no opt-outs at all.
The DPA, which expects members to follow an opt-in policy, sees this as a much more ethical path. “Artists and creators should be on board,” says Alex Bestall, CEO of Rightsify and of the music data licensing company Global Copyright Exchange, who led the effort. Bestall sees opt-in as a pragmatic approach as well as a moral one: “Selling publicly available data sets is one way to get sued and lose credibility.”
Ed Newton-Rex, a former AI executive who now runs the ethical AI nonprofit Fairly Trained, calls opt-outs “fundamentally unfair to creators,” adding that some may not even know when opt-outs are being offered. “It’s especially good to see the DPA calling for opt-ins,” he says.
Shayne Longpre, who leads the Data Provenance Initiative, a volunteer collective that audits AI datasets, finds the DPA’s efforts to ethically source data admirable, though he suspects the opt-in standard may be a tough sell given the huge amounts of data that most modern AI models require. “Within this system, you’re either going to be data hungry or you’re going to have to pay a lot,” he says. “It may end up being that only a few players, the big tech companies, can afford to license all that data.”
In the paper, the DPA opposes government-mandated licensing, favoring a “free market” approach in which data creators and AI companies negotiate directly. Other guidelines are more specific. For example, the alliance suggests five potential compensation structures to ensure that creators and rights holders are adequately paid for their data. They include a subscription-based model, “usage-based licensing” (in which fees are paid for each use), and “outcome-based” licensing, in which royalties are tied to profit. “These could work for anything from music to paintings to movies to TV to books,” Bestall says.
