OpenAI's latest AI model "o1-preview" (previously codenamed "Strawberry") has sparked heated discussion. OpenAI claimed its capabilities rival those of a PhD student, yet early tests turned up disappointing errors. The editor of Downcodes takes you through this highly anticipated but problematic model to see what level it has actually reached and what users really think of it.
Recently, OpenAI launched the highly anticipated AI model, previously codenamed "Strawberry" and officially named "o1-preview".
OpenAI claims the new model performs as well as a PhD student on difficult benchmark tasks in physics, chemistry, and biology. Preliminary test results, however, show that the AI is still far from its goal of replacing human scientists or programmers.
On social media, many users shared their experiences with "OpenAI o1", and the results suggest the model still performs poorly on basic tasks.
For example, Mathieu Acher, a researcher at INSA Rennes, found that OpenAI o1 frequently proposed illegal moves when solving certain chess puzzles.
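Whether a suggested move is legal is easy to check mechanically, which is how such failures are typically caught. Below is a minimal sketch of that kind of check, assuming the open-source python-chess library; the position and the suggested move are illustrative, not taken from Acher's tests:

```python
# A minimal sketch of validating a model-suggested chess move,
# assuming the open-source python-chess library (pip install chess).
import chess

board = chess.Board()  # standard starting position
suggested = "e2e5"     # hypothetical move a model might propose

move = chess.Move.from_uci(suggested)
if move in board.legal_moves:
    board.push(move)
    print(f"{suggested} is legal")
else:
    # e2e5 is illegal here: a pawn cannot advance three squares.
    print(f"{suggested} is illegal in this position")
```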
Meta AI scientist Colin Fraser pointed out that on a simple river-crossing puzzle about a farmer transporting sheep, the AI abandoned the correct answer and instead produced illogical nonsense.
Even in a logic puzzle that OpenAI itself used as a demonstration, questions involving strawberries produced inconsistent answers, with one user reporting an error rate as high as 75%.
Not only that, but some users have reported that the new model even makes mistakes when counting the number of times the letter "R" appears in the word "strawberry."
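For reference, this is exactly the kind of task that is trivial to verify in code; a one-line Python check confirms the correct answer is three:

```python
# Count how many times the letter "r" appears in "strawberry".
word = "strawberry"
print(word.count("r"))  # -> 3
```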
Although OpenAI stated at the time of release that this was an early model and did not yet have features such as web browsing and file uploading, such basic errors are still surprising.
To address this, OpenAI introduced a "chain of thought" process in the new model, which sets OpenAI o1 apart from the earlier GPT-4o. This approach lets the AI reason through intermediate steps before arriving at an answer, although it also results in longer response times.
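To make the latency trade-off concrete, here is a minimal sketch of timing a single request, assuming OpenAI's official Python SDK and the "o1-preview" model name; the chain of thought runs server-side, so to the caller it simply shows up as a longer wait before the final answer:

```python
# A minimal sketch of timing an o1-preview request, assuming OpenAI's
# official Python SDK (pip install openai); the prompt is illustrative.
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.monotonic()
response = client.chat.completions.create(
    model="o1-preview",
    # At launch, o1-preview accepted only plain user messages
    # (no system prompt), so everything goes in a single user turn.
    messages=[{
        "role": "user",
        "content": "How many times does the letter r appear in strawberry?",
    }],
)
elapsed = time.monotonic() - start

print(f"Answered in {elapsed:.1f}s:")
print(response.choices[0].message.content)
```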
Some users found that the model took a full 92 seconds to answer a word puzzle, and the result was still wrong.
Noam Brown, a research scientist at OpenAI, said that although responses are currently slow, they expect future versions to think for even longer and eventually yield new insights on breakthrough problems.
However, the well-known AI critic Gary Marcus is skeptical, arguing that longer processing time does not necessarily translate into better reasoning. He emphasized that despite the continued progress of AI technology, real-world research and experimentation remain indispensable.
In actual use, then, the performance of OpenAI's new model remains disappointing in many respects, which has in turn triggered discussion about the future direction of AI technology.
Highlights:
OpenAI recently launched its new AI model "o1-preview" (codenamed "Strawberry"), claiming it is comparable to a PhD student on complex tasks.
Many users found that the AI frequently made mistakes on basic tasks, such as coming up with illegal moves and answering simple puzzles incorrectly.
OpenAI acknowledges that the model is still an early version; critics argue that longer thinking time does not necessarily improve reasoning, and many basic issues remain unresolved.
All in all, although OpenAI's "o1-preview" model shows the potential of AI technology, it also exposes many shortcomings in practical application. Future AI models will still need to balance technical improvement against real-world use in order to truly meet expectations. The editor of Downcodes will continue to follow developments in the AI field and bring you more reports.