Tsinghua University's Institute for AI Industry Research (AIR) released its latest AI model, AutoDroid-V2, on December 24, 2024, aiming to make automated control of mobile devices significantly more efficient. Instead of relying on a large language model in the cloud, the model takes a script-based approach driven by a small language model. This addresses the high traffic consumption and the privacy and security risks of traditional methods, improving the user experience while reducing server-side running costs.
AutoDroid-V2, released by Tsinghua University's Institute for AI Industry Research (AIR) on December 24, 2024, is designed to improve the automated control of mobile devices. By applying a small language model, it makes operating a device through natural language significantly more efficient.
AutoDroid-V2 adopts a script-based approach, in contrast to the traditional approach that relies on large language models (LLMs) in the cloud. This lets the device execute user instructions efficiently with less dependence on cloud services, which significantly improves privacy and security. It also reduces user-side traffic consumption and server-side operating costs, which should make mobile device automation easier to deploy widely.
As background, the rise of large language models and vision-language models in recent years has made it possible to control mobile devices through natural language commands, offering new ways to handle complex user tasks. However, the traditional "step-by-step GUI agent" approach suffers from high traffic consumption and from privacy and security risks, which are obstacles to large-scale deployment.
The innovation of AutoDroid-V2 is that it generates a multi-step script from a user instruction and performs multiple GUI operations in one go. This greatly reduces how often the model is queried, lowers resource consumption, and allows task scripts to be generated and executed directly on the user's device. The system first builds application documentation offline, which lays the foundation for subsequent script generation.
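The difference in query frequency can be illustrated with a minimal Python sketch. It is not AutoDroid-V2's actual implementation: `CountingModel`, `FakeDevice`, the two runner functions, and the example task are all hypothetical stand-ins, and the "model" simply replays a fixed plan. The only point is that a step-by-step agent queries the model once per GUI action, while a script-based agent queries it once per task.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeDevice:
    """Hypothetical stand-in for a phone UI; records the actions it receives."""
    log: List[str] = field(default_factory=list)

    def perform(self, action: str) -> None:
        self.log.append(action)


class CountingModel:
    """Hypothetical language model that replays a fixed plan and counts queries."""

    def __init__(self, plan: List[str]) -> None:
        self.plan = plan
        self.calls = 0

    def next_action(self, step: int) -> str:
        # Step-by-step style: one query returns one GUI action.
        self.calls += 1
        return self.plan[step]

    def full_script(self) -> List[str]:
        # Script-based style: one query returns the whole multi-step script.
        self.calls += 1
        return list(self.plan)


def run_step_by_step(model: CountingModel, device: FakeDevice, n_steps: int) -> int:
    """Query the model before every action; return the number of queries made."""
    for step in range(n_steps):
        device.perform(model.next_action(step))
    return model.calls


def run_script_based(model: CountingModel, device: FakeDevice) -> int:
    """Query the model once, then execute the script locally; return query count."""
    for action in model.full_script():
        device.perform(action)
    return model.calls


# Hypothetical four-step task: add a contact.
PLAN = ["open('Contacts')", "tap('Add')", "type('Name', 'Alice')", "tap('Save')"]

step_device, script_device = FakeDevice(), FakeDevice()
step_calls = run_step_by_step(CountingModel(PLAN), step_device, len(PLAN))
script_calls = run_script_based(CountingModel(PLAN), script_device)

print(f"step-by-step queries: {step_calls}, script-based queries: {script_calls}")
```

Both runs leave the device with the same sequence of actions; only the number of model queries differs, and that number is what drives traffic and token consumption in a cloud-based setup.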
In performance testing, AutoDroid-V2 was evaluated on a benchmark of 226 tasks across 23 mobile applications. Compared with previous approaches such as AutoDroid and SeeClick, it improved the task completion rate by 10.5% to 51.7%. In addition, its input and output token consumption fell to 1/43.5 and 1/5.8 of the baseline levels, respectively, and model inference latency was reduced by a factor of 5.7 to 13.4. These results point to the efficiency and reliability of AutoDroid-V2 in practical applications.
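To put the token figures in perspective, the short Python sketch below converts the reported reduction factors into savings. Only the factors 43.5 and 5.8 come from the reported results; the baseline token counts and the helper names `tokens_after` and `fraction_saved` are hypothetical and chosen purely to make the arithmetic easy to follow.

```python
# Reduction factors reported for AutoDroid-V2 (input and output tokens).
INPUT_FACTOR = 43.5
OUTPUT_FACTOR = 5.8


def tokens_after(baseline_tokens: float, factor: float) -> float:
    """Token count left after dividing the baseline by the reduction factor."""
    return baseline_tokens / factor


def fraction_saved(factor: float) -> float:
    """Share of tokens saved when consumption falls to 1/factor of the baseline."""
    return 1.0 - 1.0 / factor


# Hypothetical baseline token counts for one task (not taken from the benchmark).
baseline_input = 87_000
baseline_output = 5_800

new_input = tokens_after(baseline_input, INPUT_FACTOR)
new_output = tokens_after(baseline_output, OUTPUT_FACTOR)

print(f"input:  {baseline_input} -> {new_input:.0f} tokens "
      f"({fraction_saved(INPUT_FACTOR):.1%} saved)")
print(f"output: {baseline_output} -> {new_output:.0f} tokens "
      f"({fraction_saved(OUTPUT_FACTOR):.1%} saved)")
```

In other words, a reduction to 1/43.5 corresponds to saving roughly 97.7% of input tokens, and a reduction to 1/5.8 to saving roughly 82.8% of output tokens, whatever the baseline happens to be.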
Highlights:
AutoDroid-V2 is a new AI model from Tsinghua University that makes natural language control of mobile devices more efficient.
By using a small language model, it reduces dependence on cloud services and strengthens user privacy and security.
Benchmark tests show that AutoDroid-V2 significantly improves the task completion rate and cuts resource consumption, demonstrating strong application potential.
All in all, AutoDroid-V2 offers a new solution for the automated control of mobile devices that combines high efficiency, security, and low cost, and it shows considerable application prospects. Its future development and wider adoption are worth watching.