This paper explores the interplay between litigation, legislation, and infrastructures as forms of regulation of the scraping of news texts by artificial intelligence (AI) companies. It examines the pivotal role that scraping publicly available texts from news websites plays in training AI, a practice that creates conflicts between news and AI companies over data generation and revenue distribution. The current bargaining imbalance between the parties prevents news companies from receiving adequate compensation, potentially undermining incentives to create news for the public. The paper analyzes the limitations of litigation in resolving text-scraping disputes, given its lengthy and costly nature and the fragmented state of US case law. It proposes targeted legislative interventions inspired by the Australian and Canadian models governing the presentation of news on digital advertising platforms. The three proposed regulatory measures are collective negotiation by news companies, the integration of AI technologies into future collaborations, and the regulation of the robots.txt and AI.txt infrastructures while embracing the fair use doctrine. These tools can improve bargaining in the shadow of the law between news and AI companies by tipping the scale in favor of news companies. Despite their challenges, these regulatory measures suggest new avenues for distributing value between news and AI companies in an ever-evolving technological landscape.

This paper was initially written for the Global Data Law course. It won second prize in the 2024 Berkeley Technology Law Journal writing competition and is forthcoming in that journal.


From Headlines to AI: Narrowing the Bargaining Gaps between News and AI Companies

Thomas Streinz