Is the AI application era really here?
The theme of this year's Baidu World Conference is "Applications are coming". As of early November, the Baidu Wenxin large model was handling more than 1.5 billion calls a day on average, about 30 times the 50 million first disclosed a year ago. Robin Li said, “This steep growth curve represents the explosion of large model applications in China in the past two years.”
As the most mainstream form of AI application, intelligent agents are about to reach a tipping point. Robin Li emphasized, "They will become a new carrier of content, information and services in the AI-native era."
This is because, "On the one hand, the threshold for making an agent is low enough; on the other hand, the ceiling of an agent is high enough to make very powerful applications. The collaboration of multiple agents can also solve more complex problems."
Robin Li demonstrated four types of agents at the meeting: company agents, role agents, tool agents and industry agents. Among them, company agents are likely to replace official websites and become the most direct interface to consumers.
"Company agents are the equivalent of official company websites in the AI era. They have all the capabilities of a traditional official website, such as company introductions, product pictures and specifications, and offline store locations. They also have what traditional websites lack: proactive recommendation, timely response and one-to-one service."
Robin Li also released the code-free tool "Miaoda", software built from large models and agents that can realize any idea without writing code. Its capabilities include code-free programming, multi-agent collaboration and large-scale invocation of various tools, and it was described as "the most complex multi-agent collaboration tool in human history so far."
With "Miaoda", users can build an entire system through natural language interaction and create applications for any scenario. As base model capabilities improve and Miaoda's own technology evolves, it will be able to do more, and more complex, things in the future. "This means that you do not need to recruit project managers, designers, developers, testers, etc.; you can direct multiple agents to complete tasks collaboratively." One person can be an entire team.
In Robin Li’s words, with Miaoda, we will usher in an “era where you can make money just by relying on ideas.”
In September, the consumer-facing (C-side) business of Baidu Netdisk was moved to the Mobile Ecosystem Group (MEG) and taken over by Wang Ying, Baidu Vice President and head of Baidu Wenku and Baidu Netdisk. At this conference, Baidu Netdisk and Baidu Wenku were integrated further.
Wang Ying shared some of Baidu's new changes in content production and consumption at the conference. She said that content creation and consumption currently face many challenges, such as complicated tools, long production cycles and low consumption efficiency. To address them, Baidu Wenku and Netdisk have introduced AI technology to break through the limits of tools and modalities and deliver a freer and more efficient content experience.
Baidu Wenku's intelligent PPT generation, AI paper creation, AI picture book production, AI search, and AI novel and comic generation have significantly improved the efficiency and quality of content production. On the consumption side, Baidu Netdisk has launched tools such as simple scanning, simple dictation and AI video summarization, which make it much easier for users to process and understand information.
Specifically, in office scenarios, Baidu Wenku provides smart PPT and smart research report functions, while Baidu Netdisk offers functions such as simple listening notes. In learning scenarios, users can use Baidu Wenku's smart picture books and photo-based question search, while Baidu Netdisk provides auxiliary learning tools such as video interpretation and Panpan vocabulary. For entertainment, Baidu Wenku provides smart novels and smart comics, and Baidu Netdisk offers functions such as AI photo editing. The integration of Baidu Wenku and Netdisk broadens the scenarios available to users, makes content more intelligent and diverse, and further improves the user experience.
Free canvas function|Image source: Baidu
In addition, Baidu Wenku and Netdisk jointly launched a new content operating system, Free Canvas. The system helps users complete every step from finding information to editing, generating and sharing. It supports input and output in all formats across multiple modalities, and it lets users work with content at the level of individual elements, giving them more freedom in creation and sharing.
In Robin Li's words, "The free canvas is a universal whiteboard powered by Wenxin's multi-modal large model." These innovations not only demonstrate the huge potential of AI technology in the content field, but also herald more possibilities for content production and consumption in the future.
At the conference, Shen Dou, executive vice president of Baidu Group and president of Baidu Intelligent Cloud Business Group, shared the latest progress of Baidu Intelligent Cloud in large models and AI native applications. Shen Dou said that large model technology is moving from technological change to industrial change, redefining the way people interact with the digital world and the physical world, and becoming a key element for enterprises to enhance their competitiveness.
Shen Dou focused on the Qianfan platform, a platform for large model fine-tuning and application development. It provides a rich tool chain and significantly lowers the threshold for AI native application development. The Qianfan platform meets the customization, scale, availability and security requirements of enterprise-level applications, and it has now also released workflow agents. These use the intent understanding and generalization capabilities of large models to turn complex workflows into flexible agents, significantly improving enterprise efficiency. For example, China Pacific Insurance used the Qianfan platform to build a "gold medal sales" agent, which significantly improved the service efficiency and user experience of auto insurance renewal.
Baidu Intelligent Cloud Qianfan|Image source: Baidu
In addition, the Baige platform provides efficient large model-related computing services, from cluster creation to model training and inference, ensuring stable and extremely fast performance. The platform solves key problems in large-scale cluster deployment, supports efficient operation, and meets the computing power needs of different customers.
Shen Dou also demonstrated application cases of the Qianfan platform in multiple industries, including general diagnosis medicine, which improved the accuracy of medical record generation through fine-tuned models and saved doctors time, and State Grid, which is exploring AI applications in the power industry based on the Wenxin large model and has achieved remarkable results. In addition, the upgraded XiLing 4.0 platform can generate 3D digital human images and produce professional videos, significantly reducing the cost of short video production.
Baidu Intelligent Cloud has built a new AI infrastructure through Qianfan and Baige platforms, promoting the application of large model technology in various industries to improve the intelligence level and efficiency of enterprises.
In addition to applications, Baidu has also released hardware this time. At the meeting, Li Ying, vice president of Baidu Group and CEO of Xiaodu Technology, released "the first native AI glasses equipped with Chinese large models" - Xiaodu AI glasses.
Xiaodu AI glasses|Image source: Baidu
Li Ying said that as a first-person perspective device for humans, AI glasses’ ability to capture vision, sound, location and other information will bring unprecedented extension of people’s senses, and will also become a more efficient and convenient entrance to human-computer interaction.
Based on the Wenxin large model and the DuerOS AI native operating system, Xiaodu AI glasses offer functions such as first-person perspective shooting, asking questions while walking, calorie recognition, encyclopedia-style object recognition, audio-visual translation and smart memos.
By combining the device and cloud with large models, Xiaodu AI glasses can be used independently or paired with an app. The glasses have a built-in Chinese large model and can respond to users' questions in real time.
In terms of hardware, Xiaodu AI glasses are equipped with a four-microphone array to recognize sounds, an open speaker design that limits sound leakage, a 16-megapixel ultra-wide-angle lens and an AI anti-shake algorithm. They can be fully charged in 30 minutes and offer 56 hours of standby and more than 5 hours of continuous listening. The whole device weighs only 45 grams, below the industry average of 49 grams.
Xiaodu AI glasses are expected to go on sale in the first half of next year. The price has not yet been announced, but the booth staff said the price may be around 2,000 yuan.
Robin Li said at the meeting that the AI industry has changed significantly in the past 24 months, and that the most prominent change is that large models have basically eliminated hallucinations. This has taken AI from "talking nonsense with a straight face" to being usable and trustworthy. A large model is essentially a probabilistic model, so the content it generates carries a degree of uncertainty. With retrieval-augmented generation (RAG), however, a large model uses retrieved information to guide the text or answers it generates, significantly improving the quality and accuracy of the content.
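The retrieve-then-generate idea behind RAG can be shown with a minimal Python sketch. It is not Baidu's implementation: the word-overlap scoring, the function names (`retrieve`, `build_prompt`) and the three-sentence corpus are all assumptions chosen for illustration, and the final call to a large model is left out. The sketch only shows how retrieved passages end up in the prompt that guides generation.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query: str, passage: str) -> int:
    """Count the tokens shared by query and passage (a stand-in for a real retriever)."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(q[token], p[token]) for token in q)


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages with the highest overlap score; ties keep corpus order."""
    ranked = sorted(corpus, key=lambda passage: overlap_score(query, passage), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Place the retrieved passages ahead of the question so they guide generation."""
    context = "\n".join(f"- {passage}" for passage in passages)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
    )


# Hypothetical toy corpus; a real system would query a search index instead.
CORPUS = [
    "Xiaodu AI glasses weigh 45 grams.",
    "The Qianfan platform supports large model fine-tuning and application development.",
    "Xiaodu AI glasses offer 56 hours of standby time.",
]

question = "How much do the Xiaodu AI glasses weigh?"
top_passages = retrieve(question, CORPUS, k=2)
prompt = build_prompt(question, top_passages)
print(prompt)  # this prompt would then be sent to a large model
```

Because the model is asked to answer from retrieved text instead of from memory alone, its output is anchored to checkable sources, which is the mechanism credited above with reducing hallucinations.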
To solve the problem of hallucinations in image generation, Baidu developed iRAG (Image based RAG), a retrieval-augmented text-to-image technology, at the beginning of this year. Before this, images generated by text-to-image systems based entirely on large language models were often of poor quality and even illogical. Baidu's iRAG technology combines Baidu Search's billion-level image resources with powerful base model capabilities to generate a variety of ultra-realistic images. The overall effect far exceeds that of native text-to-image systems and removes the telltale traces of machine generation.
As AI-generated images have become far more usable, their range of applications has expanded greatly. For example, in a brand promotion scenario, producing a set of posters might have cost hundreds of thousands of yuan in the past, but now the creation cost is almost zero. In short, the commercial value of iRAG lies in the absence of hallucination, ultra-realism, low cost and instant availability.
Robin Li at the conference | Image source: Baidu
As base model capabilities mature, a boom in AI applications is on its way. So where do AI applications come from, and where will they go? There are two main directions: one is intelligent agents, and the other is industrial applications.
Perhaps when that boom truly arrives, AI will deliver on its promise of an "industrial revolution-level opportunity" and bring a vast expansion of productivity to society and the economy.