In large models, speed is key, but sometimes slowness can be another kind of speed.
On August 31, Chinese AI unicorn MiniMax (Shanghai Xiyu Technology Co., Ltd., hereinafter "MiniMax") quietly released its video model abab-video-1 at its first developer conference, the "MiniMaxLink Partner Day." Given a text prompt, it can generate videos up to 6 seconds long, with an emphasis on high resolution and high frame rate.
In other words, MiniMax's offering is a text-to-video model in the vein of OpenAI's Sora. Yan Junjie, MiniMax's founder and CEO, regards "fast" as the core R&D goal for the company's underlying large models, yet the video model arrived several months after Sora.
"Why is our launch one or two months late? The core is that we are solving a more difficult technical problem, that is, how to natively train things with relatively high computing power." Yan Junjie told a reporter from China Business News that during training When developing video generation capabilities, you need to first convert videos into tokens, and these tokens will be very long, and the complexity will be higher. "In fact, what we mainly did in the first half of the year was to reduce the complexity. Make the compression ratio higher, so it’s a month or two late.”
MiniMax said that, based on internal evaluations and benchmark scores, its video model outperforms Runway's. Kuaishou's Keling, meanwhile, has already adopted a commercial model of membership subscriptions. So what will the business model of MiniMax's video model be? Yan Junjie said: "Our strategy is to wait another week or two. Once the new version is out and we are in a more satisfactory state, we may consider some commercialization measures."
He also mentioned that, given how quickly the models are progressing, AI-generated video cannot yet replace traditional rendering engines, but it "at least provides a possibility" for creating AAA games like "Black Myth: Wukong."
Commercialization will be considered only when the model is more satisfactory
Although he did not spell out the video model's commercialization path, Yan Junjie said: "The company's commercialization basically takes two forms. One is our open platform, which now has more than 2,000 customers, including many well-known Internet companies as well as traditional enterprises, giving their users access to voice and visual capabilities. Not every company can build these itself the way Kuaishou can; we are a good partner for them. That is the 2B part."
"The second is that our own products also have advertising mechanisms, and advertising can be monetized commercially." Yan Junjie believes that at the current stage, "the most important thing is not commercialization, but truly making the technology widely available." degree of availability.”
Video models, which involve relatively complex technology, have become a common way for large-model makers to demonstrate their strength, or "flex their muscles," this year, and OpenAI started the trend. In February, OpenAI unveiled its video model Sora, which has yet to open for public testing. In April, Shengshu Technology released the video model Vidu; in June, Kuaishou released Keling; in July, Zhipu AI's video generation model Qingying officially launched...
Why does MiniMax want to build a video model? Yan Junjie said the essence is that most of the content humans consume every day is images, text, and video, and text accounts for only a small share. "To achieve very high user coverage and deeper usage, the only way for a large-model maker is to output multimodal content rather than purely text-based content. This is a very core judgment."
He added that MiniMax built text models first, then voice models, and had built image models long before. "Now that the technology has become stronger, we can also do video. The route is consistent: it has to be multimodal."
In the view of AI algorithm engineer Zhang Yuxuan, although MiniMax has not disclosed the video model's specific parameters or technical details, the demo videos show that the company's algorithms are strong, while Kuaishou's Keling is comparatively better on the engineering side.
Yan Junjie told reporters: "Whether it is video, text, or voice, the MiniMax team's core R&D philosophy is not to improve an algorithm by 5% or 10%; what matters is whether it can be improved severalfold. If it can be improved severalfold, it must be done; if it only improves by 5%, it is not worth doing."
MiniMax's video model is currently only a first version and will be provided to users free of charge for a period of time, with a new version to follow soon. "The follow-up work will focus on the data and the algorithm itself, as well as details that make it easier to use. For example, only text-to-video is available now; image-to-video, text-plus-image generation, and editability and controllability will be released in succession," Yan Junjie said.
"Black Myth: Wukong" is still popular, and AI has created new gameplay in the game. Recently, Google pointed out in a paper that they have created the first fully AI-driven real-time game engine - GameNGen, which can generate the game graphics of the classic shooting game "Doom" in real time at 20 frames per second. All game graphics are It is generated in real time based on player operations and interaction with complex environments, and each frame is predicted by the diffusion model.
So how far off is real-time AI generation of AAA games? Yan Junjie noted that "Black Myth: Wukong" still uses traditional modeling and rendering, an approach that has progressed slowly. Video generation, he said, is like text generation: two years ago text generation was barely usable at all, but now it is usable and developing rapidly.
"(Video generation) is actually just the beginning, because this is only the first year, and the progress will definitely be very fast. I don't know if it can replace the traditional rendering engine, but at least it can provide a possibility Because the progress is fast, in the long run, the faster the progress, the better." Yan Junjie said.
Usage is growing significantly and the models are becoming more competitive
"Fast" is a keyword Yan Junjie returned to repeatedly. "Whether we are doing MoE, linear attention, or other explorations, the essence is to make a model of the same quality run faster," he said. Being fast is good because it means the same computing power can yield a better model; this is MiniMax's approach to underlying R&D.
At the same time, he pointed out that continuously reducing model error rates, supporting arbitrarily long input and output, and multimodality are three challenges the industry still needs to solve.
According to the company, MiniMax has gone through two key changes in its underlying technology: MoE (Mixture of Experts) and linear attention. In April this year, the company built a new-generation model based on MoE plus linear attention that it says is comparable to GPT-4o.
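For readers unfamiliar with why linear attention means "faster at the same compute," here is a minimal sketch of the general technique in its simplest kernelized form. The feature map and dimensions are assumptions for illustration; MiniMax has not disclosed its exact formulation.

```python
# Minimal sketch of standard vs. linear (kernelized) attention; illustrative only.
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the n x n score matrix costs O(n^2 * d)."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, feature=lambda x: np.maximum(x, 0) + 1e-6):
    """Kernelized attention: compute the d x d summary K^T V first, so the
    cost is O(n * d^2) and grows linearly with sequence length n."""
    Qf, Kf = feature(Q), feature(K)
    kv = Kf.T @ V                         # d x d summary, independent of n
    normalizer = Qf @ Kf.sum(axis=0)      # per-query normalization term
    return (Qf @ kv) / normalizer[:, None]

n, d = 1024, 64                           # sequence length and head dimension (assumed)
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out_std = softmax_attention(Q, K, V)      # quadratic in n
out_lin = linear_attention(Q, K, V)       # linear in n (an approximation)
print(out_std.shape, out_lin.shape)
```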
Public information shows that MiniMax is an artificial intelligence startup founded in December 2021 by Yan Junjie, a former vice president of SenseTime and former deputy director of its research institute; its team comes largely from well-known AI companies such as SenseTime.
According to Tianyancha, MiniMax completed a US$600 million Series B round in March this year, with Alibaba as an investor, at a valuation of US$2.5 billion. Earlier, in June 2023, it closed a Series A round of more than US$250 million from Tencent Investment.
Within a year of its founding, MiniMax had independently developed foundation model architectures for three modalities: text-to-visual, text-to-speech, and text-to-text, and built an inference computing platform on top of these foundation models.
In terms of products, MiniMax serves both the B-side and the C-side. Its C-side applications include the role-playing AI chat app Glow, the AI social app Hoshino, and the AI voice conversation assistant Conch AI, while on the B-side it provides enterprises with customized solutions and API access to the various capabilities of the abab models. Companies such as Volcano Engine, Kingsoft Office, DingTalk, Zhaopin, and China Literature use its services. Official figures show that MiniMax's models now handle more than 3 billion interactions with global users per day, processing more than 3 trillion text tokens, 20 million images, and 70,000 hours of audio. A year ago, MiniMax's interaction time was only 3% of ChatGPT's; that proportion has now risen to 53%.
Since May, a price war has broken out among large-model providers, with API prices falling to "cabbage prices." On the price war, Yan Junjie noted that it has made many traditional companies willing to start using large models: "objectively speaking, it has greatly increased the number of model calls."
At the same time, the price war has indirectly pushed model performance forward, and Chinese large models have become competitive in Southeast Asia and other overseas markets. "It is precisely this fierce competition among domestic models that forces us to keep moving forward. At least in non-English-speaking countries, we can reach a level comparable to GPT," Yan Junjie said, adding that competition is inevitable and the company must strive to do its best. On the optimistic side, he sees two positive changes: usage of domestic large models is growing significantly, and Chinese models are indeed becoming increasingly competitive overseas.
Yan Junjie said that most companies initially thought large models were expensive; later, many decided they were cheap enough to use with confidence. In the end, he was surprised to find that many traditional companies are very willing to use large models: they figure the cost is low anyway, mistakes don't matter, and they can simply call the model one more time. Objectively, this has greatly increased the number of model calls, which in turn pushes the models to improve.
On the possibility of head-on competition with major Internet companies, Yan Junjie said that what he can do is amplify, without limit, the things that can become stronger: one is how to improve the technology, and the other is how to co-create better with users.