OpenAI recently announced an important update to its real-time API, launching five new voice options and cutting caching costs, with the aim of giving developers a more affordable way to build voice-to-voice applications.
Today, OpenAI announced an update to its real-time API, which is still in beta. The highlight is five new voices designed for voice-to-voice applications, alongside lower prices for cached input, which makes the API cheaper for developers to use.
Of the five new voices, OpenAI demonstrated three in a post on X: Ash, Verse, and the British-sounding Ballad. According to OpenAI, the new voices are more expressive and steerable and offer a more natural conversational experience. OpenAI notes in its API documentation that this native voice-to-voice capability skips the intermediate text step, enabling lower latency and more nuanced output.
However, OpenAI also cautions that because the real-time API is still in beta, it does not yet provide client authentication. In addition, real-time audio processing can be affected by network conditions, which makes transmitting audio at scale challenging. OpenAI acknowledges that delivering audio reliably is genuinely difficult when network conditions are unstable.
OpenAI's track record in voice technology has also drawn controversy. In March, it unveiled Voice Engine, a voice cloning platform intended to compete with ElevenLabs, but made it available only to a small number of researchers. In May, after demonstrating GPT-4o and its voice mode, OpenAI paused the use of a voice called "Sky" when actress Scarlett Johansson objected that it sounded too similar to her own.
In September, OpenAI rolled out Advanced Voice Mode in ChatGPT to paid subscribers, including ChatGPT Plus, Enterprise, Team, and Edu users. With this kind of voice-to-voice technology, enterprises can generate real-time spoken responses more quickly, which could make customer service considerably more efficient.
Cutting costs by 50% or more
At launch, OpenAI priced the real-time API at about $0.06 per minute of audio input and $0.24 per minute of audio output, which is relatively expensive for developers. After this update, cached text input costs 50% less and cached audio input costs 80% less.
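To see what these discounts mean in practice, the per-minute figures can be turned into a rough session estimate. The sketch below is illustrative only: it reuses the launch prices quoted above, applies the 80% cached-audio discount directly to the per-minute input rate, and treats `cached_fraction` (the share of input audio served from cache) as a hypothetical parameter. OpenAI actually bills per token, so real invoices will differ.

```python
# Rough cost model for a real-time API session (illustrative assumptions only).
AUDIO_INPUT_PER_MIN = 0.06    # USD per minute of audio input (launch figure)
AUDIO_OUTPUT_PER_MIN = 0.24   # USD per minute of audio output (launch figure)
CACHED_AUDIO_DISCOUNT = 0.80  # cached audio input costs 80% less


def session_cost(input_min: float, output_min: float, cached_fraction: float) -> float:
    """Estimate a session's cost in USD.

    cached_fraction is a hypothetical share (0..1) of input audio served from cache;
    applying the discount per minute is an approximation, since billing is per token.
    """
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    fresh_input = input_min * (1 - cached_fraction) * AUDIO_INPUT_PER_MIN
    cached_input = input_min * cached_fraction * AUDIO_INPUT_PER_MIN * (1 - CACHED_AUDIO_DISCOUNT)
    output = output_min * AUDIO_OUTPUT_PER_MIN
    return fresh_input + cached_input + output


if __name__ == "__main__":
    for frac in (0.0, 0.5):
        print(f"cached_fraction={frac}: ${session_cost(10, 5, frac):.2f}")
```

For a session with 10 minutes of input and 5 minutes of output, this gives about $1.80 with no caching and about $1.56 when half of the input is cached. Because output audio dominates the bill in this example, the cache discount matters most for sessions that resend a lot of input context.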
OpenAI introduced Prompt Caching at its DevDay event. The feature reuses input that the model has recently processed, so context repeated across frequent requests (such as the same instructions at the start of every prompt) is billed at a discount and processed faster. By lowering input prices, OpenAI hopes to attract more developers to its API.
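The idea behind prompt caching can be illustrated with a toy model of prefix reuse. This is a deliberately simplified stand-in, not OpenAI's implementation: whitespace-split words play the role of tokens, and `ToyPrefixCache`, its `block` granularity, and `billed_input` are hypothetical names chosen for illustration. A request pays the full rate only for tokens after the longest previously seen prefix, while the cached portion is billed at a discount (50% here, matching the cached text-input figure).

```python
from typing import List, Set, Tuple


class ToyPrefixCache:
    """Hypothetical prefix cache: remembers block-aligned prefixes of earlier prompts."""

    def __init__(self, block: int = 4) -> None:
        self.block = block  # prefixes are cached in steps of this many tokens
        self.seen: Set[Tuple[str, ...]] = set()

    def lookup_and_store(self, tokens: List[str]) -> int:
        """Return the length of the longest cached prefix, then cache this prompt's prefixes."""
        ends = range(self.block, len(tokens) + 1, self.block)
        hit = 0
        for end in ends:
            if tuple(tokens[:end]) in self.seen:
                hit = end
            else:
                break  # stored prefixes are nested, so a miss rules out longer hits
        for end in ends:
            self.seen.add(tuple(tokens[:end]))
        return hit


def billed_input(tokens: List[str], cache: ToyPrefixCache,
                 price_per_token: float, cached_discount: float) -> float:
    """Full price for uncached tokens, discounted price for the cached prefix."""
    hit = cache.lookup_and_store(tokens)
    fresh = len(tokens) - hit
    return fresh * price_per_token + hit * price_per_token * (1 - cached_discount)


if __name__ == "__main__":
    cache = ToyPrefixCache()
    system = "you are a helpful support agent for acme".split()  # shared 8-word prefix
    first = system + "where is my order".split()
    second = system + "cancel my plan".split()
    print(billed_input(first, cache, 1.0, 0.5))   # nothing cached yet
    print(billed_input(second, cache, 1.0, 0.5))  # shared prefix is discounted
```

Here the first request is billed 12.0 units, while the second, which shares the same 8-word instructions, is billed 7.0: 3 fresh words at full price plus 8 cached words at half price.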
Other companies, such as Anthropic, have launched similar caching features to make their own APIs more attractive.
Key points:
Five new, more natural voices improve the experience of voice applications
The real-time API cuts input costs through caching, making it more affordable for developers
Real-time audio processing is sensitive to network conditions, so reliability remains a concern
With this update, OpenAI not only improves the experience of its voice technology but also lowers costs to attract more developers, which could further drive the adoption of voice technology.