In recent years, training artificial intelligence models has relied increasingly on data collected from the web. News websites are an important source of that data, so their stance towards AI crawlers can directly affect how well models are trained. This article analyzes the results of a study on news websites blocking OpenAI's crawlers and explores the reasons behind the blocking and its potential impact.
One study found that nearly half of popular news websites block OpenAI's crawlers. AI crawlers collect web data that is used to train language models. According to the study, the websites of traditional print media are more likely to block OpenAI's crawlers, and news organizations in northern hemisphere countries are more inclined to block AI crawlers. In addition, new AI models may degrade when they are trained on the output of earlier models.
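The article does not say how the study detected blocking. A common mechanism, assumed here, is a `robots.txt` rule that disallows a crawler by its user-agent token; `GPTBot` is the token OpenAI publishes for its crawler. The minimal sketch below uses Python's standard `urllib.robotparser` to check whether a given `robots.txt` body blocks that token and to compute the share of blocking sites in a sample. The function names, site names, and sample rules are hypothetical and for illustration only.

```python
from urllib.robotparser import RobotFileParser


def blocks_crawler(robots_txt, user_agent="GPTBot", path="/"):
    """Return True if the robots.txt text disallows `user_agent` from `path`."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, path)


def blocked_share(robots_by_site, user_agent="GPTBot"):
    """Fraction of sites whose robots.txt blocks `user_agent` at the root path."""
    if not robots_by_site:
        return 0.0
    blocked = sum(
        blocks_crawler(text, user_agent) for text in robots_by_site.values()
    )
    return blocked / len(robots_by_site)


# Hypothetical robots.txt bodies, not taken from any real site.
SAMPLE_SITES = {
    "print-daily.example": "User-agent: GPTBot\nDisallow: /\n\nUser-agent: *\nAllow: /\n",
    "digital-news.example": "User-agent: *\nDisallow:\n",
}

if __name__ == "__main__":
    for site, text in SAMPLE_SITES.items():
        print(site, "blocks GPTBot:", blocks_crawler(text))
    print("share blocking GPTBot:", blocked_share(SAMPLE_SITES))
```

To survey live sites, the same parser can load a file over the network with `set_url()` followed by `read()`. Note that `robots.txt` is advisory: it records a site's stated policy, not whether a crawler actually complies with it.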
The findings shed light on the growing tension between news websites and the training of artificial intelligence models. When news websites block crawlers, the quality of the data available for training may decline, which in turn can affect the performance and reliability of the resulting models. Going forward, balancing the training needs of AI models against the rights and interests of news websites will be an important issue. More effective models of cooperation need to be explored so that AI technology can continue to develop while the intellectual property rights and data security of news organizations are respected.