OpenAI's new "Predicted Outputs" feature for the GPT-4o model greatly improves response speed, by up to five times in some scenarios. Developed in collaboration with FactoryAI, the feature avoids redundant generation by identifying and reusing foreseeable content, which is especially useful in scenarios such as code refactoring and blog updates. It is currently available only through the API and supports the GPT-4o and GPT-4o mini models. Many developers have already tested it and report positive results.
OpenAI recently rolled out an important update that brings the "Predicted Outputs" feature to the GPT-4o model. The technique significantly improves response speed, reaching up to five times the original speed in certain scenarios and giving developers a substantial efficiency boost.
The core advantage of the feature, developed jointly by OpenAI and FactoryAI, is that it skips regenerating content that is already known. In practice it performs well on tasks such as updating blog posts, iterating on existing replies, or rewriting code. According to data provided by FactoryAI, response times on programming tasks drop by a factor of two to four, with a task that originally took 70 seconds completing in under 20 seconds.
At present, the feature is available to developers only through the API, and it supports the GPT-4o and GPT-4o mini models. Early feedback has been positive, with many developers running tests and sharing their experience. Firecrawl founder Eric Ciarla, who used it to convert SEO content, said: "The speed boost is significant, and it is simple and straightforward to use."
Technically, Predicted Outputs works by identifying and reusing the predictable parts of the content. OpenAI's official documentation gives an example: in a code refactoring scenario where the "Username" property of a C# class is renamed to "Email", the entire class file can be supplied as the prediction text, greatly speeding up generation.
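A minimal sketch of what that looks like with the official openai Python SDK (the C# class body here is an illustrative placeholder, not from OpenAI's docs): the existing file is passed via the prediction parameter, so matching spans can be reused instead of regenerated.

```python
from openai import OpenAI

client = OpenAI()

# The existing C# file; most of it will be unchanged in the output.
code = """
public class User
{
    public int Id { get; set; }
    public string Username { get; set; }
}
"""

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Rename the Username property to Email. "
                       "Respond only with code, no explanations.\n" + code,
        }
    ],
    # Pass the current file as the prediction: spans that match the
    # model's output are reused, which is where the speedup comes from.
    prediction={"type": "content", "content": code},
)

print(completion.choices[0].message.content)
```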
However, the feature comes with some usage limitations and caveats. Beyond the restriction to supported models, certain API parameters are unavailable when using predicted outputs, including n values greater than 1, logprobs, and presence_penalty or frequency_penalty values greater than 0; a request shape that respects these constraints is sketched below.
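Concretely, a prediction-enabled request has to leave those parameters at their defaults. A hedged sketch (the draft text and prompt are invented for illustration):

```python
from openai import OpenAI

client = OpenAI()

draft = "Predicted Outputs speed up GPT-4o by reusing known content."

# With a prediction attached, the parameters below must stay at their
# defaults; combining a prediction with n > 1, logprobs, or non-zero
# penalties is not supported.
completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Lightly rephrase this sentence:\n" + draft,
    }],
    prediction={"type": "content", "content": draft},
    n=1,                  # n > 1 is unsupported with predictions
    presence_penalty=0,   # must remain 0
    frequency_penalty=0,  # must remain 0
    # logprobs must not be enabled alongside a prediction
)
```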
It is worth noting that the faster responses come with a slight cost increase. According to one user's test data, the same task took 3.3 seconds with predicted outputs instead of 5.2 seconds, but the cost rose from 0.1555 cents to 0.2675 cents. This is because OpenAI also charges completion-token rates for the parts of the prediction that do not end up in the final output.
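That billing split is visible in the response's usage object. Continuing from the refactoring sketch above (field names as exposed by recent versions of the openai Python SDK), accepted prediction tokens are the reused spans, while rejected ones are billed despite not appearing in the final answer:

```python
# Inspect how much of the prediction was reused versus billed as rejected.
details = completion.usage.completion_tokens_details
print("accepted prediction tokens:", details.accepted_prediction_tokens)
print("rejected prediction tokens:", details.rejected_prediction_tokens)
```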
Despite the slight increase in cost, the feature offers considerable practical value given the efficiency gains. Developers can find more detailed technical instructions and usage guides in the official OpenAI documentation.
OpenAI official documentation:
https://platform.openai.com/docs/guides/latency-optimization#use-predicted-outputs
In short, OpenAI's Predicted Outputs feature brings developers a significant efficiency improvement. Despite some limitations and a modest cost increase, the speed gains make it worth attention. Developers can weigh the trade-offs against their actual needs and decide whether to adopt the new feature.