As AI develops rapidly, data has become the key to training. Yet an online community that seems incompatible with AI training, Baidu Tieba's Mentally Retarded Bar (Ruozhiba), has unexpectedly become an important source of training data, drawing wide attention in technology circles and online. The community is famous for its absurd humor, and the value of its posts in AI training has upended conventional ideas of what counts as high-quality data. It also raises a question about how AI learns: what makes these "retarded" remarks a cradle of intelligence?
In April this year, a study jointly released by the Chinese Academy of Sciences, the University of Waterloo, and other institutions produced a surprising result. Across eight tests, including question answering, brainstorming, classification, generation, and summarization, data from the Mentally Retarded Bar outperformed data from well-known platforms such as Encyclopedia, Zhihu, Douban, and Xiaohongshu, making it one of the most popular Chinese AI training datasets. The finding challenged conventional ideas of what high-quality data looks like.
At the recent Bund Conference, core members of the Mentally Retarded Bar made their first public appearance. They challenged AI and also showed what this unusual community is really like. Zeng Xiaodong, CEO of Unbounded Ark, explained the choice of the Mentally Retarded Bar as a training corpus: making AI sound more human requires colloquial language and multi-turn question-and-answer data, and the Mentally Retarded Bar provides exactly that.
Hu Luobei, a core member of the Mentally Retarded Bar, shared his experience with AI. As early as 2022, he tried having AI interpret some jokes, and found that although it could retrieve relevant information, it could not grasp what the jokes actually meant. The episode highlights AI's limits in understanding human humor.
Yet there is real logic behind these seemingly absurd jokes. Take the line "Knowing that there are tigers in the mountains, don't go to the mountains knowingly": it takes the word "knowingly" apart and reassembles it to create a new meaning. Language traps of this kind exercise an AI's ability to understand and reason about Chinese, helping machines communicate more like humans.
The charm of the Mentally Retarded Bar is that it is, in effect, the basic science of jokes. Most members of the community have science backgrounds. The jokes they write are logically rigorous and also draw on rich rhetoric and observations of everyday life. This distinctive way of writing gives AI valuable learning material.
Interestingly, the Mentally Retarded Bar seems to have become a line of defense between humans and AI. As Hu Luobei put it, no AI can come out of the Mentally Retarded Bar laughing, because AI does not understand humor at all. Deep language understanding and a sense of humor thus become key markers that distinguish human intelligence from artificial intelligence.
Even in an age surrounded by AI, the Mentally Retarded Bar reminds us that human creativity and humor remain unique. This seemingly absurd community offers a fresh perspective on AI training and also serves as a microcosm of human wisdom and creativity.
The case of the Mentally Retarded Bar prompts us to rethink where AI training data comes from, and shows that human wisdom still shines in the era of artificial intelligence. It demonstrates that seemingly useless data can turn out to be unexpectedly valuable under the right circumstances. In the future, more unexpected data sources may help advance AI technology.