When will super apps appear? This may be the most common anxiety in the AI industry over the past year.
It is tempting to compare artificial intelligence with the PC Internet or the mobile Internet, each of which produced popular super applications within a few years of its emergence. However, artificial intelligence is a technology wave on the scale of an industrial revolution. The time it takes for AI super applications to emerge is better compared with the time it took for super applications to follow the steam engine and electricity.
In 1776, the first steam engine with practical value was built. It became a universal prime mover and led human society into the "steam age". Yet it was not until the 1800s that steam engines were widely used in railways, shipping, and various industrial sectors, and the second law of thermodynamics appeared nearly 100 years later. The electric revolution was likewise an evolutionary process. The invention of electric power set off the climax of the second industrial revolution, but power plants, electric lights, assembly lines, and the like all came after electricity, as new business forms that evolved gradually over decades.
Therefore, super applications of the artificial intelligence era will surely appear, but their time has not yet come. The AI industry's pursuit of so-called "super applications" over the past year seems somewhat impatient for quick success.
As a foundational technology, large models do not directly produce practical value. The applications built on top of foundation models are the reason the models exist. For AI application developers and entrepreneurs, the best strategy is clearly not to fixate on AGI or "super applications", but to take small steps and keep iterating toward super useful applications.
Recently, at the 2024 Baidu World Conference, Baidu announced the latest data for its Wenxin large model: half a year ago, the model's daily API calls stood at 200 million, and they now exceed 1.5 billion, roughly 7.5 times the earlier figure in just six months. This is not only a microcosm of the explosion of AI applications in China, but also evidence that large models are producing real practical value for applications.
For a long time, it was difficult to sell domestic large models to other industries. An industry insider once told 36Kr, “Whether it is smart hardware or AI agents, demand in the industry is very strong, but few people are really willing to pay, because what large models generate is so poor and hallucinations are everywhere.” Limited by the state of multi-modal capabilities, the early user experience of generative artificial intelligence was close to that of a simple conversational bot. Users initially tried it out of early-adopter curiosity, but the mediocre experience led to poor retention.
Over the past year, the biggest change in large models is that “hallucination” has been largely eliminated and the models have become usable. A large model is essentially a probabilistic model: in text generation, it produces whatever text is most likely to come next. That is why AI often "hallucinates", or, as the saying goes, "talks nonsense with a straight face".
Developing applications on top of large models requires eliminating hallucinations. The AI industry generally uses retrieval-augmented generation (RAG), which largely eliminates hallucinations in the text that large models generate and gives the models practical value. To be practical, multi-modal technology likewise needs accuracy and controllability before it can expand the space for AI applications.
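To make the idea behind RAG concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how Baidu or any particular vendor implements it: the function names, the word-overlap scoring, and the three sample passages are all hypothetical, and a production system would typically use vector search and pass the resulting prompt to an actual model. The point is only that the model is asked to answer from retrieved text rather than from whatever continuation it finds most probable.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(query, passage):
    """Count the word tokens shared by query and passage (toy relevance score)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(count, p[token]) for token, count in q.items())

def retrieve(query, passages, k=1):
    """Return the k passages with the highest overlap score."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]

def build_prompt(query, passages, k=1):
    """Ground the question in retrieved text before it is sent to a model."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )

# Hypothetical knowledge base; the statements are taken from this article.
PASSAGES = [
    "The steam engine became a universal prime mover in 1776.",
    "Daily API calls of the Wenxin large model now exceed 1.5 billion.",
    "iRAG applies retrieval to text-to-image generation.",
]

QUESTION = "How many daily API calls does the Wenxin model handle?"
prompt = build_prompt(QUESTION, PASSAGES)
print(prompt)
```

In this sketch the retrieval step selects the passage about API calls, so the prompt hands the model a relevant fact to answer from instead of leaving it to guess.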
At this World Conference, Baidu released iRAG (image-based RAG), a retrieval-augmented text-to-image technology. At the beginning of this year, Baidu decided to tackle the problem of "hallucinations" in multi-modal generation, so that text-to-image generation could also be free of hallucinations and be put to use in film and television, comics and comic books, poster production, and other fields.
Take the automobile industry, which relies heavily on marketing and often needs a large volume of high-quality photography. Producing a perfect picture takes a great deal of human, financial, and material resources. With iRAG, car companies can obtain an image with striking visual impact at very low cost and in less time, and the result may be even more visually stunning.
At present, the technical route of generative artificial intelligence is broadly divided into two schools. One is the AGI school, which dreams of achieving general artificial intelligence within a few years through foundation models; the other is the application-driven school, which starts from application needs and uses feedback from applications to drive model innovation.
While continuing to research and develop its large-scale foundation models, Baidu places more emphasis on application-driven development. It is understood that iRAG was developed because applications need to generate accurate images. For example, a company's logo cannot be deformed or have its colors distorted, which requires precise multi-modal capabilities. After nearly a year of hard work, this technology has become practical. Progress in applications can in turn feed back into the research and development of the model itself.
After two years, generative AI is at a critical point of shifting gears. 36Kr previously reported that two domestic AI startups have suspended pre-training of large models. In the past two days, the industry's debate over whether the Scaling Law has hit "diminishing returns" has intensified.
In fact, on a global scale, the shift has already begun. Technology giants such as OpenAI, Microsoft, and Google have entered the field one after another and are deploying intelligent agents. In mid-September, OpenAI researcher Noam Brown announced on social media that he was recruiting machine learning engineers for a new multi-agent research team. Microsoft CEO and Chairman Nadella personally announced new progress in the company's AI, releasing 10 new business agents at once. At almost the same time, news emerged that Google was also going to release an intelligent agent. Soon afterward, Google "accidentally" leaked an "internal preview version" of its latest AI development, Jarvis, an agent-type artificial intelligence that can browse the Internet and search for information on its own.
Baidu is leading the intelligent agent trend in China. At this Baidu World Conference, the intelligent agent took center stage. Baidu focuses on four types of agents: company, role, tool, and industry.
Take the tool-type agent "Free Canvas": it builds on Baidu's long accumulation in its document-library business and layers generative artificial intelligence on top, achieving a leap forward in content creation.
In the early years, people used the library to find ready-made documents. When generative artificial intelligence emerged, however, Baidu discovered that people's most fundamental need is not to find a ready-made document, but to create content that suits them better.
To meet this need, Baidu began to think about how to help people create better, whether starting from ready-made documents or from no material at all. Following this path, the original Baidu library was rebuilt first. Later, Baidu released a standalone product, Orange Pian, which can generate long articles with one click. Free Canvas was born of the same logic: letting people "communicate your ideas" more conveniently. In plain terms, it is about expressing your inner thoughts more conveniently and accurately.
Robin Li, founder of Baidu, believes that “agents are the most mainstream form of AI applications and are about to reach their explosion point.” Building an agent is analogous to building a website in the PC era or running a self-media account in the mobile era. The difference is that an agent is more human-like and more intelligent, more like your own salesperson, customer service representative, or assistant. Agents may become the new carrier of content, information, and services in the AI-native era.
OpenAI CEO Sam Altman also signaled a shift toward AI agents when answering a question on Reddit last month: "We're going to have better and better models, but I think the next big breakthrough will be AI agents." NVIDIA’s Jensen Huang has likewise said that NVIDIA will have 100 million intelligent agents in the future.
What distinguishes an intelligent agent is that the threshold is low enough and the ceiling is high enough: something built on one can grow into a very powerful company, just as Google and Meta, founded by college students many years ago, have grown into some of the most powerful technology giants in the world. To a certain extent, not building an intelligent agent now is like not building a website twenty years ago or an app ten years ago.
Few Chinese companies have played as important a role as Baidu at the talent, resource, and technology junctures of global artificial intelligence development. Behind this lies the founder's belief and persistence in AI. Within the industry, Robin Li is known for a classic saying: "When I have 1 yuan, I will invest in technology; when I have 100 million, I will invest in technology; when I have 10 billion, I will still invest in technology."
Baidu's AI work can be traced back to a famous auction more than ten years ago. One day in December 2012, a secret auction was held at the foot of a ski mountain south of Lake Tahoe in Nevada, USA. The assets being auctioned were in fact "three people": Professor Geoffrey E. Hinton, the "godfather of AI", and two of his students.
Representatives from Baidu, Google, Microsoft, and DeepMind frequently raised their bids, and the offer soared to US$44 million. At this point, only Baidu and Google were left among the participants. Although Baidu participated in the auction without an upper limit, it was ultimately unsuccessful.
This made Robin Li realize that Baidu must develop deep learning, autonomous driving, and other technologies on its own. He subsequently established the Baidu America Research Institute and began to recruit global talent vigorously. It went on to attract top talent from around the world, including Andrew Ng and Dario Amodei.
Over the following ten years, Baidu entered a period of full-stack in-house development of artificial intelligence technology, tackling chips, frameworks, models, and the application layer one by one. Baidu successively released the autonomous driving open platform Apollo, open-sourced the deep learning framework PaddlePaddle, and released version 1.0 of the Wenxin large model as early as 2019.
However, until the birth of ChatGPT, the application of AI technology had not found a tipping point. The industry regarded it as a bottomless money pit whose practical application was still far away.
Persistence always pays off. The reversal came in March 2023. Based on version 3.0 of the Wenxin large model, Baidu was the first in the world to release a product benchmarked against ChatGPT, Wenxin Yiyan. At this point, ten years of quiet investment finally paid off.
Beginning in the second half of 2023, while making sure its foundation model stayed in the lead, Baidu recognized that homogeneous competition among large models was causing a huge waste of resources. Robin Li has publicly called many times for the industry to "compete on applications, not models", and asked within the company that Baidu become the first company to rebuild all of its products with large models. At the 2023 World Conference, Baidu showed the outside world the results of rebuilding important applications such as search, maps, and cloud drive. At this year's World Conference, Baidu set the theme directly as "Applications Are Coming", showing the outside world the huge value that large models have created in agents, industrial applications, and other fields.
Looking back, it is not hard to see that Baidu has made the right choice at every important juncture in the development of global artificial intelligence over the past decade. Looking further ahead, Robin Li hopes that AI can truly be used by every ordinary person, so that everyone can have the abilities of a programmer.
At the Baidu World Conference, Robin Li also unveiled a "One More Thing": Miaida, a tool that offers no-code programming, multi-agent collaboration, and multi-tool invocation.
Miaida differs sharply from earlier code-assistance tools in that it does not require users to understand code. Previous AI tools, as productivity tools, were more about strengthening the capabilities of the elite at the top of the pyramid. In Silicon Valley, for example, code assistance matters a great deal because the United States has a shortage of engineers and their hourly wages are very high. Such tools improve efficiency and make those at the top of the pyramid even more powerful.
But AI should be something that everyone can benefit from, rather than the preserve of a few.
As the capabilities of foundation models and agents gradually improve, Baidu is integrating them so that truly ordinary people, who cannot read a single line of code, can have the capabilities of programmers.
Just imagine: when hundreds of millions of people, or more than a billion, have this ability, it will open up a huge market and, above all, an explosion of creativity that code-assistance tools cannot match. Baidu hopes that every ordinary person can have the abilities of those at the top of the pyramid, which makes its significance all the more profound.
Robin Li said during the conference: "Baidu is not going to launch a 'super application', but will continue to help more people and more companies create millions of 'super useful' applications."
Just imagine that in the AI era, more and more people can learn to create new products and services, using natural language programming, a creative and low-threshold activity, to realize their wildest ideas and build countless valuable applications. This is the true inclusiveness of technology.