Zhipu AI launches AutoGLM agent: enter an instruction and it operates a mobile phone like a human

Author: Eve Cole  Update Time: 2025-02-08 21:16:01

Zhipu AI recently released AutoGLM, a new product based on its GLM technology. AutoGLM is an agent that can simulate the way a human operates a mobile phone and carry out a range of everyday tasks. Its arrival marks a major breakthrough for AI in mobile phone applications. It can complete operations in commonly used apps such as WeChat, Taobao, Ctrip, 12306, and Meituan, greatly improving user efficiency and bringing AI into daily life. It operates much as a person would, without complex processes, and the barrier to use is extremely low.


AutoGLM can perform a variety of tasks, such as liking and commenting on WeChat Moments posts, reordering items from past orders on Taobao, booking hotels on Ctrip, buying train tickets on 12306, and ordering takeaway on Meituan. Its application scenarios are not limited to these. In theory, AutoGLM can do anything a human can do on an electronic device with a visual interface. It operates much as a person would and does not require the user to build complex workflows.

Currently, users can try AutoGLM-Web by installing the "Zhipu Qingyan" plug-in, a browser assistant that simulates a user visiting and clicking through web pages and can automatically perform advanced search, summarization, and content generation on websites. In addition, AutoGLM has opened applications for closed beta testing on Android and has entered into in-depth cooperation with mobile phone manufacturers such as Honor.


AutoGLM builds on two techniques developed in-house by Zhipu: a "foundation agent decoupling intermediate interface" and a "self-evolving online curriculum reinforcement learning framework". Together they address problems that large-model agents face in task planning and action execution: conflicting capabilities, scarce training tasks and data, sparse feedback signals, and policy distribution drift. As a result, AutoGLM can steadily and continuously improve its own performance, much as people keep acquiring new skills as they grow.

In terms of technical challenges, AutoGLM addresses weaknesses in both "action execution" and "task planning". The "foundation agent decoupling intermediate interface" separates these two stages and connects them through a natural-language intermediate interface, which greatly improves the agent's capabilities. In addition, the "self-evolving online curriculum reinforcement learning framework" lets large-model agents learn and improve in real online Web and Phone environments.
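To make the idea of decoupling task planning from action execution more concrete, here is a minimal Python sketch. It is not Zhipu's implementation. All of the names (`UIElement`, `plan`, `ground`, `run`) are hypothetical, the planner is a canned lookup table standing in for a large model, and the executor is a simple label match standing in for a grounding model. The only thing it tries to show is that the two stages exchange nothing but natural-language steps.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class UIElement:
    """A tappable element on a hypothetical screen."""
    label: str
    x: int
    y: int


def plan(task: str) -> List[str]:
    """Planner stage: turn a task into natural-language steps.

    Assumption: a real planner would be a large model. This lookup
    table is only a stand-in so the example runs on its own.
    """
    canned_plans: Dict[str, List[str]] = {
        "order takeaway": [
            "tap the 'Search' button",
            "tap the 'Order' button",
        ],
    }
    return canned_plans.get(task, [])


def ground(step: str, screen: List[UIElement]) -> Tuple[str, int, int]:
    """Executor stage: map one natural-language step to a concrete action.

    The executor never sees the task, only the step text and the screen.
    """
    for element in screen:
        if f"'{element.label}'" in step:
            return ("tap", element.x, element.y)
    raise ValueError(f"no element on screen matches step: {step}")


def run(task: str, screen: List[UIElement]) -> List[Tuple[str, int, int]]:
    """Chain the two stages through the natural-language interface."""
    return [ground(step, screen) for step in plan(task)]


if __name__ == "__main__":
    screen = [UIElement("Search", 10, 20), UIElement("Order", 30, 40)]
    for action in run("order takeaway", screen):
        print(action)
```

Because the planner emits plain text and the executor consumes plain text, either side could in principle be retrained or replaced without touching the other, which is the benefit the decoupling is meant to provide.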

AutoGLM has achieved significant performance improvements on both Phone Use and Web Browser Use, surpassing GPT-4o and Claude-3.5-Sonnet on the AndroidLab benchmark. On the WebArena-Lite benchmark, AutoGLM achieved a performance improvement of about 200% over GPT-4o, narrowing the gap in GUI-operation success rate between humans and large-model agents.

Project address: https://xiao9905.github.io/AutoGLM

With its powerful features and technical innovation, AutoGLM demonstrates the great potential of artificial intelligence in operating mobile phones and brings more convenience to people's daily lives. Its strong benchmark results also attest to its technical strength. As the technology continues to advance, AutoGLM is expected to find wider application and create more value for users.