OpenAI is about to launch the highly anticipated Alpha version of Voice Mode for ChatGPT Plus subscribers. The feature is built on its flagship model GPT-4o and significantly improves the voice interaction experience. GPT-4o can respond to audio input at close to human reaction speed and was trained end-to-end across three modalities: text, vision, and audio, representing OpenAI's latest progress in multimodal AI. The rollout had previously been delayed so OpenAI could improve the model's content moderation and scale up its infrastructure. This update will not only address the long delays of the existing ChatGPT voice mode, but also give users a smoother, more natural voice conversation experience.
When OpenAI's flagship model GPT-4o (o stands for omni) was released in May, its audio understanding capabilities attracted much attention. The GPT-4o model was able to respond to audio input in an average of 320 milliseconds, which is similar to the reaction time of humans in a typical conversation.
OpenAI also announced that ChatGPT’s voice mode feature will leverage the audio capabilities of the GPT-4o model to provide users with a seamless voice conversation experience. Regarding GPT-4o’s speech capabilities, the OpenAI team wrote:
With GPT-4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network. Because GPT-4o is our first model combining all of these modalities, we are still just scratching the surface of exploring what the model can do and its limitations.
In June, OpenAI announced plans to roll out advanced Voice Mode in alpha to a small group of ChatGPT Plus users at a later date, but the plans were delayed by a month because the company needed to improve the model's ability to detect and refuse certain content. Additionally, OpenAI is preparing its infrastructure to scale to millions of users while maintaining real-time responsiveness.
Now, OpenAI CEO Sam Altman confirmed via X that the Alpha version of the voice mode will be rolled out to ChatGPT Plus subscribers starting next week.
The current ChatGPT voice mode feels far from seamless because of its average delays of 2.8 seconds (GPT-3.5) and 5.4 seconds (GPT-4), a consequence of running a pipeline of separate transcription, text, and text-to-speech models. The upcoming advanced Voice Mode based on GPT-4o will allow ChatGPT Plus subscribers to have smooth conversations without noticeable lag.
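To illustrate where that latency accumulates, here is a minimal sketch of the kind of three-stage pipeline the current Voice Mode relies on (speech-to-text, a text model, then text-to-speech), written with the OpenAI Python SDK. The file names, model choices, and voice are placeholders for illustration, and the exact SDK surface may differ from what is shown; this is a sketch of the approach, not OpenAI's production implementation.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Stage 1: transcribe the user's recorded speech to text (Whisper).
with open("user_question.wav", "rb") as audio_file:  # placeholder file name
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

# Stage 2: generate a text reply with a chat model (GPT-3.5 or GPT-4).
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": transcript.text}],
)

# Stage 3: convert the text reply back into speech with a TTS model.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
speech.write_to_file("assistant_reply.mp3")  # placeholder output path
```

Each stage is a separate model and a separate round trip, so the delays add up to several seconds; GPT-4o's advanced Voice Mode instead handles audio natively in a single model, which is what removes most of that overhead.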
In addition, OpenAI today also unveiled the highly anticipated SearchGPT, its new take on the web search experience. Currently a prototype, SearchGPT offers AI-powered search that quickly delivers accurate answers drawn from clear and relevant sources. You can learn more here.
All in all, OpenAI’s series of updates shows its continued pace of innovation in artificial intelligence. In particular, the application of the GPT-4o model will significantly improve the user experience, and the release of SearchGPT points to a new direction for the future of search engines. We look forward to more surprising technological innovations from OpenAI.