Cui Lingling, general manager of the Patent Affairs Department of Baidu Group, released the "Baidu Top Ten Technological Frontier Inventions in 2024," a list of Baidu's cutting-edge patented inventions in artificial intelligence that spans breakthroughs from basic algorithms to application scenarios. According to the "New Generation Artificial Intelligence Patent Technology Analysis Report," released in April this year by the National Industrial Information Security Development Research Center and the Electronic Intellectual Property Center of the Ministry of Industry and Information Technology, Baidu had filed 19,308 patent applications across the entire field of artificial intelligence as of the end of 2023 and had been granted 9,260 patents. Baidu has ranked first in China for six consecutive years. In the new-generation AI field centered on large models, Baidu had filed 1,432 applications and been granted 651, making it a leader in technological innovation and patent portfolio building. According to the generative artificial intelligence patent landscape insights released by the patent database organization IFI CLAIMS, Baidu ranks among the top 10 in the world for generative AI patent applications and is the only Chinese innovator on the list. Its patented technology covers the four basic fields of text, image, voice, and video, making it one of four companies in the world with a comprehensive portfolio across all four.
Baidu World 2024, themed "Applications Are Here," will be held in Shanghai on November 12. At this high-profile annual technology conference, Baidu will also present its latest results, including new technological breakthroughs and product launches.
Baidu's top ten cutting-edge technological inventions in 2024 are as follows:
1. Agent technology based on generative large models
This invention introduces a thinking model that gives agents capabilities such as task planning, tool invocation, knowledge enhancement, and reflective evolution. Through systematic design and targeted optimization of these core capabilities, it supports low-cost, large-scale construction and deployment of agents across application scenarios, and its large-scale simulation capabilities further accelerate agent construction and distribution. The technology is already used in key scenarios such as Wenxin Intelligent Platform, Merchant Intelligent Agent, and Wenxin Quick Code, where it has significantly improved agent development efficiency and lowered the barrier to development. Merchant agents use planning plus expert multi-model collaboration and large-scale simulation to improve their ability to reflect, evolve, and use tools, and to build AI marketing capabilities. Wenxin Quick Code combines code recommendation and an agent system with traditional DevOps toolchains, advancing the exploration and adoption of human-machine collaborative pair programming.
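The plan, invoke-tool, and reflect cycle described above can be illustrated with a minimal sketch. It is not Baidu's implementation: the `Agent` class, the `TOOLS` registry, and the rule-based `plan` method (a stand-in for a large model acting as the "thinking model") are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical tool registry; names and behaviours are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "lookup": lambda key: {"capital of France": "Paris"}.get(key, "unknown"),
}


@dataclass
class Step:
    tool: str
    argument: str


@dataclass
class Agent:
    # Reflections accumulated across tasks (a very small form of "evolution").
    notes: List[str] = field(default_factory=list)

    def plan(self, task: str) -> List[Step]:
        # Assumption: a real agent would ask a large model to produce the plan.
        if "+" in task:
            return [Step("calculator", task)]
        return [Step("lookup", task)]

    def act(self, step: Step) -> str:
        # Tool invocation: dispatch to the tool named in the plan.
        return TOOLS[step.tool](step.argument)

    def reflect(self, step: Step, result: str) -> None:
        # Reflection: record failures so later planning could avoid them.
        if result == "unknown":
            self.notes.append(f"{step.tool} had no answer for {step.argument!r}")

    def run(self, task: str) -> List[str]:
        results = []
        for step in self.plan(task):
            result = self.act(step)
            self.reflect(step, result)
            results.append(result)
        return results
```

Calling `Agent().run("2+3")` routes the task to the calculator tool, while an unanswerable lookup leaves a note in `agent.notes` that a fuller system could feed back into planning.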
2. Multi-model co-evolution technology based on an efficient large-model training framework
This invention overcomes a series of difficult problems in both engineering and algorithms. On the engineering side, it delivers innovations in hybrid parallel strategies, communication efficiency, and compute and storage optimization, which significantly improve the training performance of large language models and support efficient, stable end-to-end training of the entire Wenxin model series. On the algorithm side, Baidu has developed pre-training technology in which large and small models collaborate, overcoming the difficulty of passing knowledge from one model to another, changing the traditional model training paradigm, and reducing the cost of training new models. Based on this invention, Baidu has built a technical edge across models of various sizes and increased the training throughput of the Wenxin large model by 4.1 times over the past year, enabling Wenxin Yiyan to efficiently serve a wide range of businesses with different needs across many industries.
3. Intelligent system for integrated multi-modal content creation and editing, based on large models and knowledge retrieval enhancement technology
This invention combines knowledge enhancement, multi-source content analysis, integrated creation and editing, and retrieval-augmented text-to-image generation to solve problems such as the weak quality of professional long-form articles and multi-modal content, the inability to share a container between creation and editing, and poor subject accuracy in text-to-image generation. Retrieval-augmented text-to-image generation adaptively processes reference images by judging what the user needs. The mixed-mode image generation system significantly improves the consistency of the image subject and compensates for inaccurate depiction of long-tail content, and the overall effect far exceeds that of a native text-to-image system. With this technology, Baidu Wenku generates industry research reports, presentations, mind maps, and comic books in real time from user instructions and uploaded content, and shows significant performance improvements on complex tasks such as one-stop editing, cross-modal conversion, and general and personalized drawing. A report released by Aurora's Yuehu Data in August 2024 shows that Baidu Wenku's share of the smart PPT market has reached 80%. In the past three months, its user base has grown at a compound rate of 23%, far above the industry level.
4. Positioning and lane-level map generation technology supporting large-scale autonomous driving
This invention overcomes the efficiency and cost problems of the traditional map production model, reducing the cost of map production by 95%. Its lane-level maps cover more than 3.6 million kilometers of road, achieving full coverage of more than 41,000 urban and rural towns across the country. Built on this map data, the high-precision positioning technology for autonomous driving fuses multi-modal sensor data to reach centimeter-level accuracy. It is also far better suited to mass production: the map package that vehicle-side positioning relies on is 97.5% smaller, and reliability reaches 99.9999%. The technology fully supports the current large-scale, fully autonomous operation of Luobo Kuaipao, including in complex and difficult scenarios such as under viaducts, on multi-layer roads, and in tunnels.
5. Personalized memory mechanism for large model intelligence
This invention proposes a comprehensive set of memory mechanisms covering five modules (memory processing, storage, management, triggering, and utilization) that give large models personalized memory capabilities. Memory processing draws on the mechanism of the human hippocampus to understand and accurately process user information in all scenarios. Memory management supports additions, deletions, and modifications made actively by the user as well as automatically by the system, keeping the memory bank accurate and up to date. Memory triggering and utilization help large models produce more human-like and personalized responses through the speculative generation of relevant memories. The technology has been widely used in scenarios such as AI assistants and digital humans.
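As a rough illustration of how the five modules named above could fit together, the following sketch uses a plain dictionary as the memory bank. Everything here is an assumption made for illustration: the `MemoryBank` class, its method names, the "my X is Y" pattern used for processing, and keyword matching as the trigger are simple stand-ins, not the mechanism described in the patent.

```python
import re
from typing import Dict, List


class MemoryBank:
    """Toy personalized memory: process, store, manage, trigger, utilize."""

    def __init__(self) -> None:
        # Storage: a key/value store of user facts.
        self._facts: Dict[str, str] = {}

    def process(self, utterance: str) -> None:
        # Processing (assumption): extract facts phrased as "my <key> is <value>".
        match = re.search(r"my (\w+) is (\w+)", utterance.lower())
        if match:
            self.upsert(match.group(1), match.group(2))

    def upsert(self, key: str, value: str) -> None:
        # Management: add or modify a memory (by user or by system).
        self._facts[key] = value

    def delete(self, key: str) -> None:
        # Management: remove a memory if it exists.
        self._facts.pop(key, None)

    def trigger(self, query: str) -> List[str]:
        # Triggering (assumption): recall memories whose key appears in the query.
        words = set(re.findall(r"\w+", query.lower()))
        return [f"{k}: {v}" for k, v in self._facts.items() if k in words]

    def build_prompt(self, query: str) -> str:
        # Utilization: prepend recalled memories to the prompt sent to the model.
        memories = self.trigger(query)
        known = "; ".join(memories) if memories else "nothing relevant"
        return f"Known about the user: {known}\nUser: {query}"
```

For example, after `bank.process("My dog is Rex")`, the prompt built for "What should I feed my dog?" carries the recalled fact, so a model downstream can answer with the user's own context.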
6. Hyper-realistic digital human modeling, driving and generation system based on large models
This invention proposes a complete solution for modeling, driving, and generating hyper-realistic digital humans. For digital humans based on real people, Baidu has developed data-driven portrait modeling, cross-modal driving, and large portrait video generation models to produce natural, realistic digital human content. It exclusively supports live portrait cloning in scenes with large movements and occlusion, and has launched the first live broadcast room driven intelligently at full-body scale. For hyper-realistic 3D digital humans, Baidu has developed modal migration and multi-agent collaboration technologies based on the Wenxin large model, producing in minutes hyper-realistic digital human images and operational content comparable to blockbuster films and AAA games. The technology has been widely used in many real-person and 3D digital human products, including digital human live broadcasts, video production, and agents.
7. Generative commercial retrieval system based on large models
This invention changes the traditional "index-recall-sort" process, flattening the system funnel and reducing information loss. By building index learning tasks, it encodes business information into model parameters to achieve "model as index," and it uses the strong understanding and reasoning capabilities of large models to realize "generation and retrieval." The new paradigm improves the system's targeting efficiency by 120%. The project behind this invention was the first of its kind to be deployed in the industry at large industrial scale. By combining generative large models with commercial search scenarios, it achieved multiple technological innovations: creative richness increased by 37 times and creative quality by 92%, delivering significant business benefits and broad technical influence.
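One common way to realize the "model as index" idea is to have a model generate a document identifier token by token, constrained so that only valid identifiers can be produced. The sketch below shows that constrained decoding step under heavy simplification. It is not the system described above: `toy_score` is a hypothetical word-overlap stand-in for a large model's next-token score, and the identifiers are made up.

```python
from typing import Dict, List, Tuple

END = "<eos>"  # marks the end of a valid identifier


def build_trie(doc_ids: List[Tuple[str, ...]]) -> Dict:
    """Store every valid identifier as a path in a nested-dict trie."""
    root: Dict = {}
    for doc_id in doc_ids:
        node = root
        for token in doc_id + (END,):
            node = node.setdefault(token, {})
    return root


def toy_score(query: str, token: str) -> float:
    # Assumption: a real system would use a large model's next-token score.
    return 1.0 if token in query.lower().split() else 0.0


def generate_doc_id(query: str, trie: Dict) -> Tuple[str, ...]:
    """Greedy decoding restricted to tokens that stay on a valid identifier."""
    node, output = trie, []
    while True:
        # Only children of the current trie node are allowed as next tokens.
        token = max(sorted(node), key=lambda t: toy_score(query, t))
        if token == END:
            return tuple(output)
        output.append(token)
        node = node[token]


DOC_IDS = [("shoes", "running"), ("shoes", "hiking"), ("laptop", "gaming")]
TRIE = build_trie(DOC_IDS)
```

Because decoding never leaves the trie, `generate_doc_id` always returns an identifier that exists in `DOC_IDS`, with no separate recall or sort stage; a query such as "cheap running shoes" walks the path `shoes`, then `running`.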
8. Large model data flywheel technology
This invention automatically identifies model defects and efficiently synthesizes high-quality, diverse training data by integrating feedback from multiple sources and in multiple forms, such as user feedback, execution feedback, and self-supervised feedback. A reinforcement learning method that combines this multi-source feedback also significantly improves training results. Together they form a data flywheel that continuously improves itself, breaking through the data bottleneck of large models, reducing data acquisition costs, improving the adaptability and robustness of large models, strengthening generalization across different task scenarios, and accelerating the continuous evolution of large models.
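The loop described above (collect feedback, find defects, synthesize data, retrain, repeat) can be sketched in a few lines. This is only a schematic: the "model" is a lookup table, `synthesize` is a trivial augmentation, and `train` simply merges data, whereas the source describes large models trained with reinforcement learning. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Example = Tuple[str, str]  # (prompt, expected answer)


def find_defects(model: Dict[str, str], feedback: List[Example]) -> List[Example]:
    """Defect identification: feedback items the model currently gets wrong."""
    return [(p, a) for p, a in feedback if model.get(p) != a]


def synthesize(defects: List[Example]) -> List[Example]:
    """Data synthesis (toy): keep each failure and add an upper-case variant."""
    data: List[Example] = []
    for prompt, answer in defects:
        data.append((prompt, answer))
        data.append((prompt.upper(), answer))
    return data


def train(model: Dict[str, str], data: List[Example]) -> Dict[str, str]:
    """Training stand-in: return a new model that has absorbed the data."""
    updated = dict(model)
    updated.update(data)
    return updated


def flywheel(model: Dict[str, str], rounds: List[List[Example]]) -> Dict[str, str]:
    """Each round of feedback improves the model used in the next round."""
    for feedback in rounds:
        model = train(model, synthesize(find_defects(model, feedback)))
    return model
```

The point of the sketch is the shape of the loop: each round only generates training data for what the current model fails on, so the amount of new data needed shrinks as the model improves.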
9. Efficient inference technology for large models
The efficient inference technology proposed in this invention builds its underlying model layer on the PaddlePaddle (Flying Paddle) framework. In inference architecture, it continues to innovate on mainstream techniques such as PrefixCaching, Lookahead, PagedAttention, and PD separation (separating the prefill and decode stages), and combines them efficiently to greatly improve model throughput and performance. For model compression, it adopts lossless quantization of large models, using methods such as adaptive segmented smoothing of activations and linked weight rearrangement, and is the first in the industry to achieve efficient lossless compression of large models with tens of billions and hundreds of billions of parameters. The invention supports a variety of large-model compression and inference acceleration methods and has been used in core businesses such as the Baidu Intelligent Cloud Qianfan large model platform, where it reduces the resource consumption of model inference, cuts large-model deployment costs by more than 50%, and improves model performance and throughput by 3-5 times.
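Of the techniques listed above, prefix caching is the easiest to show in miniature: requests that share a prompt prefix reuse the work already done for it. The sketch below only counts reused versus newly computed tokens and stores a placeholder string where a real engine would keep key/value tensors. The `PrefixCache` class and its block size are illustrative assumptions, not the API of PaddlePaddle or any other framework.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy block-level prefix cache keyed by the full token prefix."""

    def __init__(self, block_size: int = 4) -> None:
        self.block_size = block_size
        # Maps a token prefix (ending on a block boundary) to a cached block.
        self._blocks: Dict[Tuple[int, ...], str] = {}

    def prefill(self, tokens: List[int]) -> Tuple[int, int]:
        """Return (tokens reused from cache, tokens that had to be computed)."""
        reused = 0
        full = len(tokens) - len(tokens) % self.block_size
        hit = True
        for end in range(self.block_size, full + 1, self.block_size):
            key = tuple(tokens[:end])
            if hit and key in self._blocks:
                reused = end  # this whole prefix is already cached
            else:
                hit = False
                # Placeholder for the key/value tensors of this block.
                self._blocks[key] = f"kv[{end - self.block_size}:{end}]"
        return reused, len(tokens) - reused
```

With a block size of 4 and an 8-token shared system prompt, the first request computes everything, and later requests that start with the same prompt only compute their own suffix.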
10. Retrieval generation system driven by user data feedback
The retrieval generation system proposed in this invention uses user behavior feedback signals to reinforce itself rapidly. It aligns directly with user preferences through satisfaction modeling and reinforcement learning, and uses user feedback to trigger rapid system reflection, solving the problems of inefficient expert feedback and hard-to-model user preferences in traditional data applications. The retrieval generation system built on this framework already covers 18% of search traffic and is widely used in text, video, image, and other search scenarios. Because feedback from many users is large in scale and can be reused, the system adapts quickly to changes in data, products, and environment, optimizes itself automatically, and evolves faster toward an ideal state, giving it high practical value and market competitiveness.
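A very small version of feedback-driven ranking is sketched below: each (query, answer) pair keeps a running satisfaction estimate that is updated from user signals and used to reorder candidates. This is a simplified stand-in, not the satisfaction modeling or reinforcement learning described above. The `FeedbackRanker` class and the smoothed-ratio estimate (equivalent to the mean of a Beta(1, 1) prior updated with the observed counts) are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Key = Tuple[str, str]  # (query, answer)


class FeedbackRanker:
    """Rank candidate answers by estimated user satisfaction."""

    def __init__(self) -> None:
        # Start every pair at one positive and one negative pseudo-count,
        # so an unseen answer has an estimated satisfaction of 0.5.
        self._pos: DefaultDict[Key, float] = defaultdict(lambda: 1.0)
        self._neg: DefaultDict[Key, float] = defaultdict(lambda: 1.0)

    def record(self, query: str, answer: str, satisfied: bool) -> None:
        # Feedback signal (assumption): e.g. a like, a long dwell, or a re-query.
        if satisfied:
            self._pos[(query, answer)] += 1.0
        else:
            self._neg[(query, answer)] += 1.0

    def score(self, query: str, answer: str) -> float:
        key = (query, answer)
        return self._pos[key] / (self._pos[key] + self._neg[key])

    def rank(self, query: str, candidates: List[str]) -> List[str]:
        # Higher estimated satisfaction first; ties keep their original order.
        return sorted(candidates, key=lambda a: self.score(query, a), reverse=True)
```

Every recorded signal immediately changes the next ranking, which is the self-reinforcing property the paragraph above describes; a production system would replace the counts with a learned satisfaction model.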