Cloudflare, which handles approximately 16% of global internet traffic, announced Tuesday that it will begin blocking known artificial intelligence web crawlers by default, a significant move in the growing battle between publishers and AI developers over data scraping and content usage.

Effective immediately, new domains registering with Cloudflare will be asked whether they wish to permit AI crawlers. By default, the company will block scrapers it identifies as AI crawlers operated by firms such as OpenAI and Google, even when those bots otherwise honor a site's robots.txt file.

"AI crawlers have been scraping content without limits. Our goal is to put the power back in the hands of creators, while still helping AI companies innovate," said Cloudflare CEO Matthew Prince in a statement Tuesday.

The company also introduced a new commercial offering called "Pay Per Crawl," which allows participating publishers to set a price that AI firms must pay to access their websites. The initiative is currently limited to a small group of major publishers and platforms, including The Atlantic, Fortune, Stack Overflow, Quora, and the Associated Press.

AI crawlers are automated bots used by AI companies to harvest vast amounts of online content for training large language models. As the AI products built on that content increasingly divert traffic away from original publishers by answering queries with generated summaries rather than links, many sites have begun seeking ways to restrict unauthorized scraping and recover lost ad revenue.

Cloudflare had previously rolled out opt-in crawler-blocking features in 2023 and later introduced technology to trap bots in an "AI Labyrinth," a defense designed to frustrate unauthorized scrapers by sending them into a maze of looping, dead-end pages.

Tuesday's update goes further by enabling these protections as the default for all new Cloudflare users. The company says it will also begin verifying AI companies' crawler identities and requesting that they disclose whether scraped content will be used for training, inference, or search purposes.

"This is about safeguarding the future of a free and vibrant Internet with a new model that works for everyone," Prince said during an Axios Live event. "People trust the AI more over the last six months, which means they're not reading original content."

Not all companies have embraced Cloudflare's new approach. OpenAI declined to participate in the Pay Per Crawl program and criticized the decision to insert a middleman between publishers and AI firms. The Microsoft-backed company said it continues to respect robots.txt files, the industry-standard protocol for signaling scraping permissions, though compliance with that protocol is voluntary.
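For readers unfamiliar with the protocol, robots.txt is a plain-text file published at a site's root that tells crawlers which parts of the site they may visit. As a rough illustration only, a publisher opting out of AI training might publish rules like the following; the user-agent tokens shown, OpenAI's GPTBot and Google's Google-Extended, are the publicly documented tokens those companies say they honor for controlling AI-training access:

    # Ask OpenAI's training crawler to stay away from the entire site
    User-agent: GPTBot
    Disallow: /

    # Ask Google not to use the site's content for AI training
    User-agent: Google-Extended
    Disallow: /

Because nothing compels a crawler to obey these directives, they function as requests rather than barriers, which is the gap Cloudflare says its network-level blocking is meant to close.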

Legal experts say Cloudflare's move could complicate AI firms' ability to acquire high-quality training data. "If effective, the development would hinder AI chatbots' ability to harvest data for training and search purposes," said Matthew Holman, a partner at U.K. law firm Cripps. "This is likely to lead to a short term impact on AI model training and could, over the long term, affect the viability of models."