In the field of artificial intelligence, a new framework is changing the way agents interact with computers. Kunlun Wanwei has joined hands with institutions including the Beijing Zhiyuan Artificial Intelligence Research Institute (BAAI), Nanyang Technological University in Singapore, and Peking University to launch a general-purpose computer control framework called Cradle. The framework moves past a key limitation of traditional agents: it lets them operate the keyboard and mouse directly, as a human would, and interact with open-source or closed-source software without relying on any internal APIs. Cradle is presented as the first AI framework that can both control multiple commercial games and operate a range of software applications. Its research results, project data, and source code have been released to the public, injecting new vitality into the development of the AI field.
Cradle performs well in practice across several very different types of games. It completes a 40-minute main mission in Red Dead Redemption 2, tends a farm and shops in Stardew Valley, builds a town with thousands of residents in Cities: Skylines, and handles complex bargaining with customers in Dealer's Life 2. Beyond games, it can also operate everyday office software such as Chrome, Outlook, and Feishu, and can even carry out photo editing and video editing tasks, making it a broadly capable AI assistant.
Cradle's results rest on its system architecture, which consists of six core modules: information gathering, self-reflection, task inference, skill management, action planning, and memory. By encapsulating and abstracting raw input and output, Cradle interacts with the computer in a natural way. It takes the video frames displayed on the screen as its main input, extracts text and visual information from them for decision making, and outputs signals that control the keyboard and mouse. Notably, Cradle's decision-making and reasoning modules can interact with software and complete tasks on their own. They work by reflecting on the past, summarizing the present, and planning the future, which resembles a human way of thinking.
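The loop described above can be illustrated with a small sketch. This is not Cradle's actual code: the class names, method names, the text-only Observation, and the toy "close dialog" rule are all hypothetical, chosen only to show how the six modules might hand off to one another, with screen content as input and keyboard or mouse events as output.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Observation:
    """Stand-in for a screenshot: here, only the text visible on screen."""
    screen_text: str


@dataclass
class Action:
    """A low-level input event, e.g. device='keyboard', value='escape'."""
    device: str
    value: str


@dataclass
class Agent:
    # Hypothetical skill library: task name -> ordered alternative action lists.
    skills: Dict[str, List[List[Action]]]
    memory: List[str] = field(default_factory=list)

    def gather_information(self, obs: Observation) -> str:
        # A real system would extract text and visual cues from video frames.
        return obs.screen_text.strip().lower()

    def self_reflect(self, info: str) -> bool:
        # Past: did the previous attempt to close the dialog fail?
        return bool(self.memory) and self.memory[-1] == "close_dialog" and "dialog" in info

    def infer_task(self, info: str) -> str:
        # Present: toy rule, a visible dialog has to be dismissed first.
        return "close_dialog" if "dialog" in info else "idle"

    def manage_skills(self, task: str) -> List[List[Action]]:
        # Retrieve the skills that are relevant to the current task.
        return self.skills.get(task, [])

    def plan_actions(self, candidates: List[List[Action]], last_failed: bool) -> List[Action]:
        # Future: fall back to the next alternative if the last try failed.
        if not candidates:
            return []
        index = 1 if last_failed and len(candidates) > 1 else 0
        return candidates[index]

    def step(self, obs: Observation) -> List[Action]:
        info = self.gather_information(obs)
        last_failed = self.self_reflect(info)
        task = self.infer_task(info)
        actions = self.plan_actions(self.manage_skills(task), last_failed)
        self.memory.append(task)  # the memory update closes the loop
        return actions
```

In this sketch, an agent whose only skill offers two ways to close a dialog would first press Escape, and if the dialog were still on screen at the next step, reflection would flag the failure and planning would switch to the mouse click instead.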
In testing, Cradle's performance supports its claim to generality. It can complete complex tasks in games with very different styles and control schemes, and it can also handle tasks in common software, such as downloading academic papers, sending emails, processing images, and editing video. Notably, on the challenging OSWorld benchmark, Cradle even outperforms baseline methods that use ground-truth labels, demonstrating its strong learning ability and adaptability.
The launch of Cradle marks an important step in the development of General Computer Control (GCC) agents. It advances the idea of a unified input and output interface, lays a foundation for agents to interact and improve themselves across different environments, and moves a step closer to the goal of artificial general intelligence (AGI). This framework could change the way we interact with computers and open a new era of human-computer collaboration.
Project homepage: https://baai-agents.github.io/Cradle
Code link: https://github.com/BAAI-Agents/Cradle