Google Gemini's Chinese corpus is suspected to come from Wenxin Yiyan: Are big companies freeloading off each other?

Author: Eve Cole  Update Time: 2025-01-14 17:32:02

Recently, the Chinese-language training corpus of Google's Gemini-Pro has sparked heated discussion. Gemini-Pro admitted that it used Baidu's Wenxin Yiyan data for training, which caused an uproar on social media. Many netizens questioned whether this amounted to unfair competition among large companies, and the debate widened to the sources and ethics of training data for artificial intelligence models. The core of the incident is that Gemini-Pro relied on Baidu's Wenxin Yiyan data for its Chinese corpus. This directly challenges the industry's boundaries around data ownership and intellectual property rights, and it also exposes risks and hidden dangers in the training of large language models.

The article focuses on:

The Chinese corpus of Google's Gemini-Pro has caused controversy after the model revealed that it was trained on Baidu's Wenxin Yiyan, leaving netizens wondering whether big companies are freeloading off each other. Gemini-Pro first drew attention on social media when it appeared confused about its own identity during testing. Gemini officials eventually admitted to using Baidu's Wenxin Yiyan for Chinese training data, which further fueled the discussion.

This incident not only highlights the tension between data sharing and intellectual property protection in the field of artificial intelligence, but also raises concerns about the trustworthiness and transparency of large language models. In the future, the training and application of artificial intelligence models will require more standardized management and stricter ethical standards to ensure the healthy development of the industry.