Recently, a seemingly simple question, "Which is bigger, 13.8 or 13.11?", has stumped many people, including some advanced AI models. In this article, the editor of Downcodes looks into the incident, examines where AI falls short on common-sense problems, and discusses directions for improvement. The episode exposes the limitations of current AI technology and raises broader questions about how it should develop.
The question did more than trip up some humans: it also caused trouble for many large language models (LLMs), and it has sparked widespread discussion about AI's ability to handle common-sense problems.
On a well-known variety show, the question set off heated debate among netizens. Many assumed that 13.11% was greater than 13.8%, when in fact 13.8% is the larger value.
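The mistake becomes obvious once the place-value rule is written out: compare the integer parts, pad the fractional parts to the same length (13.8 becomes 13.80), then compare digit by digit. Below is a minimal Python sketch of that school method. The function name `bigger` is our own hypothetical choice, and the sketch assumes non-negative numbers written with at most one decimal point.

```python
def bigger(a: str, b: str) -> str:
    """Return the larger of two non-negative decimal strings, using place value."""
    int_a, _, frac_a = a.partition(".")
    int_b, _, frac_b = b.partition(".")
    # Step 1: the integer parts decide first.
    if int(int_a) != int(int_b):
        return a if int(int_a) > int(int_b) else b
    # Step 2: pad the fractional parts with zeros to equal length (13.8 -> 13.80).
    width = max(len(frac_a), len(frac_b))
    frac_a, frac_b = frac_a.ljust(width, "0"), frac_b.ljust(width, "0")
    # Step 3: compare digit by digit, starting from the tenths place.
    for da, db in zip(frac_a, frac_b):
        if da != db:
            return a if da > db else b
    return a  # the numbers are equal, so either one is correct


print(bigger("13.8", "13.11"))  # 13.8
```

Step 3 settles the question at the tenths place: 8 tenths beats 1 tenth, so the extra digit in 13.11 never comes into play.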
Lin Yuchen, a researcher at AI2 (the Allen Institute for AI), found that even large language models such as GPT-4o get this comparison wrong: GPT-4o claimed that 13.11 is larger than 13.8 and backed its answer with an incorrect explanation.
Lin's finding quickly drew attention in the AI community, and other large language models, such as Gemini and Claude 3.5 Sonnet, turned out to make the same mistake.
The episode shows that AI can struggle with tasks that look simple but actually require a precise numerical comparison.
Although artificial intelligence has made significant progress in fields such as natural language understanding, image recognition, and complex decision-making, current systems can still make mistakes in basic arithmetic and logical reasoning, which shows the limitations of today's technology.
Why does AI make such mistakes?
Bias in the training data: The model's training data may not contain enough examples of this specific kind of numerical comparison. If much of what the model sees during training pairs more digits after the decimal point with larger values, it may learn to treat extra decimal places as a sign of a larger number.
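One plausible source of such a bias (offered here as an illustration, not as a cause the researchers established) is text in which dotted numbers are not decimals at all. In software version numbers, chapter numbering, or dates, 13.11 comes after 13.8. The Python sketch below, with hypothetical helper names, shows how the same string compares differently under the two readings.

```python
from decimal import Decimal


def as_decimal(s: str) -> Decimal:
    # Read "13.11" as one number: thirteen and eleven hundredths.
    return Decimal(s)


def as_version(s: str) -> tuple[int, ...]:
    # Read "13.11" as a version: major 13, minor 11, each part a separate integer.
    return tuple(int(part) for part in s.split("."))


a, b = "13.11", "13.8"
print(as_decimal(a) > as_decimal(b))  # False: 13.11 is less than 13.80
print(as_version(a) > as_version(b))  # True: minor version 11 comes after 8
```

A model that has absorbed many examples of the second reading could plausibly carry that habit over to questions that call for the first.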
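Floating-point precision issues: In computer science, floating-point numbers are stored in binary, so many decimal values (13.8 among them) can only be approximated, and calculations on them can introduce small rounding errors. Such errors can produce wrong results when comparing values that are extremely close, especially if the required precision is not specified. They are far too small, however, to reverse a comparison between numbers that differ by 0.69, and a language model produces its answer as generated text rather than by executing floating-point arithmetic, so this is at most a secondary factor here.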
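A quick check in Python makes the scale of floating-point error concrete. The sketch below uses only the standard `math` and `decimal` modules (`math.ulp` requires Python 3.9 or later). It prints the exact binary value stored for 13.8 and compares that representation error with the gap between the two numbers.

```python
import math
from decimal import Decimal

a, b = 13.8, 13.11

# Decimal(float) reveals the exact binary value stored for the literal 13.8,
# which is very close to, but not exactly, 13.8.
print(Decimal(a))

# The representation error is smaller than one unit in the last place (ulp),
# roughly 1.8e-15 for numbers of this size...
print(abs(Decimal(a) - Decimal("13.8")) < Decimal(math.ulp(a)))  # True

# ...while the two numbers differ by roughly 0.69, so the ordering is unaffected.
print(a - b)
print(a > b)  # True
```

Ordinary double-precision arithmetic therefore already gets this comparison right.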
Insufficient contextual understanding: Context is probably not the main issue in this case, but AI models generally rely on context to interpret information correctly. If a question is phrased ambiguously, or in a way that does not match patterns common in the training data, the model may misread it.
Impact of prompt design: How a question is put to an AI is critical to getting the right answer. Different phrasings can change how the model interprets the question and how accurate its answer is.
How can this be improved?
Improved training data: More diverse and accurate training data can help AI models better understand numerical comparisons and other basic mathematical concepts.
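It is easy to sketch what "more diverse" data could mean for this particular failure. The Python example below generates labeled "which is bigger" questions and deliberately makes every other pair a tricky one, where the number with fewer decimal digits is the larger, as with 13.8 vs 13.11. The question format and function names are hypothetical. This is not how any particular model was actually trained.

```python
import random
from decimal import Decimal


def random_decimal(rng: random.Random, whole: int) -> str:
    # A number with the given integer part and 1 to 3 fractional digits.
    digits = rng.randint(1, 3)
    frac = "".join(str(rng.randint(0, 9)) for _ in range(digits))
    return f"{whole}.{frac}"


def make_examples(n: int, seed: int = 0) -> list[dict]:
    rng = random.Random(seed)
    examples = []
    while len(examples) < n:
        whole = rng.randint(0, 99)
        if len(examples) % 2 == 0:
            # Tricky pair: one fractional digit d versus two digits below 10*d,
            # so the shorter number is larger (like 13.8 vs 13.11).
            d = rng.randint(2, 9)
            a = f"{whole}.{d}"
            b = f"{whole}.{rng.randint(10, d * 10 - 1)}"
        else:
            a, b = random_decimal(rng, whole), random_decimal(rng, whole)
        if Decimal(a) == Decimal(b):
            continue  # skip ties such as 5.1 vs 5.10
        if rng.random() < 0.5:
            a, b = b, a  # randomize which number appears first
        # Labels come from exact Decimal comparison, never from string length.
        answer = a if Decimal(a) > Decimal(b) else b
        examples.append({"question": f"Which is bigger, {a} or {b}?", "answer": answer})
    return examples


for ex in make_examples(4):
    print(ex)
```

Because the labels are computed exactly, a generator like this can produce as many correct counterexamples to the "more digits means bigger" shortcut as needed.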
Optimized prompt design: A carefully worded prompt can increase the chance that the AI answers correctly. For example, stating explicitly what kind of numbers are being compared and writing them in a more explicit form can reduce ambiguity.
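As a sketch of "more explicit numerical representations", the hypothetical Python template below pads both numbers to the same number of decimal places and states that they are ordinary decimals before posing the question. It only builds the prompt text and calls no model API. Whether such wording actually improves a given model's accuracy is an assumption to be tested, not a guarantee.

```python
from decimal import Decimal


def fraction_digits(s: str) -> int:
    # Number of digits after the decimal point, e.g. "13.11" -> 2, "13" -> 0.
    # Assumes a finite number (no NaN or Infinity).
    return max(0, -Decimal(s).as_tuple().exponent)


def build_comparison_prompt(a: str, b: str) -> str:
    """Hypothetical prompt template that removes ambiguity before asking a model."""
    places = max(fraction_digits(a), fraction_digits(b))
    # Pad both numbers to the same number of decimal places: 13.8 -> 13.80.
    a_fmt = format(Decimal(a), f".{places}f")
    b_fmt = format(Decimal(b), f".{places}f")
    return (
        "Treat the following as ordinary decimal numbers, not version numbers or dates. "
        f"Which is larger: {a_fmt} or {b_fmt}? Compare the integer parts first, "
        "then the decimal digits place by place."
    )


print(build_comparison_prompt("13.8", "13.11"))
```

Padding turns the question into 13.80 versus 13.11, where the digits line up and the shortcut of counting decimal places no longer points the wrong way.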
More accurate numerical processing: Develop and adopt algorithms and techniques that handle floating-point arithmetic more accurately, reducing computational errors.
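One established way to avoid rounding surprises in ordinary software is to use exact number types instead of binary floats. Python's standard `decimal` and `fractions` modules serve as an example below. This illustrates exact arithmetic in general and does not describe how any AI system handles numbers internally.

```python
from decimal import Decimal
from fractions import Fraction

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly.
print(0.1 + 0.2 == 0.3)  # False

# Decimal stores base-10 digits, so decimal inputs given as strings stay exact.
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# Fraction stores exact rational numbers as numerator/denominator pairs.
print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True

# Both types give the textbook answer for the comparison discussed here.
print(Decimal("13.8") > Decimal("13.11"))  # True
print(Fraction("13.8") > Fraction("13.11"))  # True
```

Constructing `Decimal` from strings rather than floats matters: `Decimal(0.1)` would faithfully copy the float's binary approximation.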
Stronger logical and common-sense reasoning: Training that specifically targets logical and common-sense reasoning can strengthen these capabilities, helping AI better understand and handle tasks that depend on common sense.
All in all, the flaws AI has shown on simple numerical comparisons remind us that the technology is still developing and needs continued refinement. Better training data, improved algorithms, and stronger logical reasoning should help AI make further progress on common-sense problems.