OpenAI's GPT-5 project (codenamed Orion) has made slow progress, raising widespread concern in the industry about the future direction of large language models. According to reports, although GPT-5 performs better than existing models, the improvement is not large enough to justify its enormous research and development costs. More importantly, a global shortage of training data has become the main bottleneck preventing GPT-5 from reaching a higher level of intelligence. This article examines the technical challenges and internal difficulties facing the GPT-5 project, and what they suggest about the bottlenecks in AI development.
The high-profile GPT-5 project (codenamed Orion) has been under development for more than 18 months but has yet to be released. According to a recent Wall Street Journal report, people familiar with the matter say that although Orion performs better than OpenAI's existing models, the improvement is not enough to justify the enormous cost of continued investment. More worryingly, the global shortage of data may be becoming the biggest obstacle to GPT-5 reaching a higher level of intelligence.
GPT-5 has reportedly undergone at least two large-scale training runs, each of which exposed new problems and fell short of researchers' expectations. Each run takes several months, with compute costs alone reaching as much as $500 million. It remains unclear if or when the project will succeed.
A rocky road to training: the data bottleneck emerges
OpenAI began developing GPT-5 shortly after releasing GPT-4 in March 2023. Typically, an AI model's capabilities improve as the amount of data it absorbs increases. Training requires massive amounts of data, takes months, and relies on large numbers of expensive computing chips. OpenAI CEO Sam Altman has said that training GPT-4 alone cost more than US$100 million, and that training future AI models is expected to cost more than US$1 billion.
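This relationship between data and capability is often summarized by empirical scaling laws. As an illustration, the Chinchilla form from Hoffmann et al. (2022) is a standard reference, though the report itself does not cite it: pretraining loss L falls as a power law in both parameter count N and training-token count D.

```latex
% Chinchilla-style scaling law (Hoffmann et al., 2022) -- a standard
% illustration, not a formula referenced in the WSJ report itself.
\[
L(N, D) \;=\; E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
\]
% N = parameters, D = training tokens, L = pretraining loss.
% Published fits: E ~ 1.69, A ~ 406.4, B ~ 410.7, alpha ~ 0.34, beta ~ 0.28.
% Loss falls only as a power law in D, so each further capability gain
% demands multiplicatively more tokens -- one lens on why data supply
% becomes the binding constraint.
```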
To reduce risk, OpenAI usually conducts a small-scale trial run first to verify a model's feasibility. The development of GPT-5, however, ran into trouble from the start. In mid-2023, OpenAI launched an experimental training run called "Arrakis" to test new designs intended for GPT-5. Progress was slow and costly, and the results indicated that developing GPT-5 would be more complex and difficult than originally expected.
OpenAI's research team therefore made a series of technical adjustments to Orion, and came to realize that publicly available Internet data could no longer meet the model's needs. To improve GPT-5's performance, they urgently needed more varied and higher-quality data.
“Creating data from scratch”: Coping with data shortages
To address the data shortage, OpenAI decided to "create data from scratch." It hired software engineers and mathematicians to write new software code or solve mathematical problems, and had Orion learn from these tasks. OpenAI also asked these experts to explain their working process, transforming human reasoning into machine-learnable knowledge.
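OpenAI has not disclosed what such expert-authored records look like. As a purely hypothetical sketch (the field names and format below are assumptions, not OpenAI's actual schema), each record could pair a problem with the expert's worked reasoning, so the model learns the process rather than just the answer:

```python
# Hypothetical format for expert-authored training data -- OpenAI has not
# published its schema; this is only an illustrative sketch.
import json

sample = {
    "domain": "mathematics",
    "problem": "Show that the sum of the first n odd numbers is n^2.",
    # The expert's step-by-step reasoning is the valuable part:
    # it captures the solution process, not just the final answer.
    "reasoning": [
        "The first n odd numbers are 1, 3, 5, ..., 2n - 1.",
        "Their sum is sum_{k=1}^{n} (2k - 1) = 2 * n(n+1)/2 - n.",
        "That simplifies to n^2 + n - n = n^2.",
    ],
    "answer": "n^2",
}

# Serialized as one JSON line, ready to append to a training corpus.
print(json.dumps(sample))
```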
Many researchers believe that code, as the language of software, can help large models solve problems they have not seen before. Turing CEO Jonathan Siddharth said: "We are transferring human intelligence from the human brain to the machine brain."
OpenAI has even worked with experts in fields such as theoretical physics, asking them to explain how they solve hard problems in their fields. However, this "create data from scratch" approach is not very efficient: GPT-4's training data amounted to roughly 13 trillion tokens, yet even if 1,000 people wrote 5,000 words a day, it would take several months to produce just 1 billion tokens.
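The arithmetic behind that estimate checks out. Assuming roughly 0.75 English words per token (a common rule of thumb; the report gives no conversion rate), 1,000 writers producing 5,000 words a day yield about 6.7 million tokens per day:

```latex
% Back-of-the-envelope check. Assumed conversion: ~0.75 words per token
% (a common rule of thumb; the report gives no conversion rate).
\[
\frac{1000 \times 5000 \ \text{words/day}}{0.75 \ \text{words/token}}
  \approx 6.7\times 10^{6} \ \text{tokens/day},
\qquad
\frac{10^{9}\ \text{tokens}}{6.7\times 10^{6}\ \text{tokens/day}}
  \approx 150\ \text{days} \approx 5\ \text{months}.
\]
% For scale: 1 billion tokens is under 0.01% of GPT-4's ~13 trillion.
```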
To speed up training, OpenAI is also experimenting with "synthetic data" generated by AI. However, studies have shown that feedback loops in which AI-generated data is reused for AI training can cause models to make errors or produce meaningless answers. OpenAI scientists believe these problems can be avoided by using data generated by its o1 reasoning model.
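The failure mode those studies describe is often called "model collapse." A minimal toy sketch of the feedback loop (illustrative only, and nothing like OpenAI's actual pipeline) fits a simple distribution to data, samples synthetic data from the fit, and refits on those samples repeatedly:

```python
# Toy illustration of the synthetic-data feedback loop ("model collapse").
# This is NOT OpenAI's setup -- just a minimal sketch of how repeatedly
# training on a model's own samples can narrow what the model knows.
import random
import statistics

random.seed(0)

# "Real" data: draws from a normal distribution with mean 0, stddev 1.
data = [random.gauss(0.0, 1.0) for _ in range(100)]

for generation in range(20):
    # "Train": fit a Gaussian to the current dataset.
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    print(f"gen {generation:2d}: mu={mu:+.3f} sigma={sigma:.3f}")
    # "Generate synthetic data": sample only from the fitted model, then
    # train the next generation exclusively on those samples.
    data = [random.gauss(mu, sigma) for _ in range(100)]
```

Because each generation re-estimates the distribution from a finite sample and then discards the real data, the estimated spread follows a multiplicative random walk that tends toward zero over many generations: the tails of the original distribution are progressively lost, a simple analogue of the errors and degeneracy reported when AI output is recycled as training data.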
Internal and external troubles: OpenAI faces multiple challenges
OpenAI faces not only technical challenges but also internal turmoil and poaching by competitors, and pressure from both technology and finance keeps mounting. Each training run costs as much as $500 million, and the final training bill is likely to exceed $1 billion. Meanwhile, competitors such as Anthropic and Google are launching new-generation models in an attempt to catch up with OpenAI.
Brain drain and internal disagreements have further slowed development. Last year, OpenAI's board of directors abruptly fired Altman, leading some researchers to question the company's future. Although Altman was quickly reinstated as CEO and began reforming the company's governance structure, more than 20 key executives, researchers and long-time employees have departed since the beginning of this year, including co-founder and chief scientist Ilya Sutskever and chief technology officer Mira Murati.
As Orion's progress stalled, OpenAI turned to other projects and applications, including a streamlined version of GPT-4 and the AI video generation product Sora. But this has led to fierce competition for limited computing resources, especially between the new product teams and the Orion research team.
Has AI hit a development bottleneck? The industry reflects
The GPT-5 dilemma may point to a larger question for the industry: is AI approaching a development "bottleneck"? Industry insiders note that the strategy of relying on ever more data and ever larger models is losing its effectiveness. As former OpenAI chief scientist Ilya Sutskever put it, "We only have one Internet": data growth is slowing, and the "fossil fuel" that has driven AI's leaps is gradually running dry.
Altman has never given a clear timetable for GPT-5, and it is still uncertain when, or whether, OpenAI will release a model worthy of the name. The GPT-5 dilemma has also prompted deeper reflection on the future direction of AI.
The stagnation of the GPT-5 project not only affects OpenAI itself but also sounds an alarm for the entire AI industry: the path of relying solely on data scale and model size may be reaching its end, and future AI development will need to explore new directions and technological breakthroughs.