Anthropic's "Computer Use" feature for Claude, launched in October, has drawn widespread attention to the capabilities of AI agents. It makes Claude the first frontier model that can interact with a computer through the same graphical user interface (GUI) that humans use. This article looks at what the feature can do, the challenges it faces, and where it may go next.
Claude works from screenshots of the desktop and completes tasks through keyboard and mouse actions. This gives users a convenient way to automate operations, even in applications that offer no API.
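The screenshot-then-act cycle described above can be sketched as a simple loop. This is only an illustration under stated assumptions: `FakeDesktop`, `Action`, `run_agent`, and `scripted_policy` are hypothetical names invented for this example, the desktop is simulated, and a scripted function stands in for the model. It is not Anthropic's API or implementation.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One GUI action (hypothetical schema for this sketch)."""
    kind: str          # "click", "type", or "done"
    x: int = 0
    y: int = 0
    text: str = ""


class FakeDesktop:
    """Simulated desktop: records actions instead of performing them."""

    def __init__(self) -> None:
        self.log: List[Action] = []

    def screenshot(self) -> bytes:
        # A real agent would capture pixels; here the frame just encodes
        # how many actions have been performed so far.
        return f"frame-{len(self.log)}".encode()

    def perform(self, action: Action) -> None:
        self.log.append(action)


def run_agent(goal: str,
              desktop: FakeDesktop,
              choose_action: Callable[[str, bytes], Action],
              max_steps: int = 10) -> bool:
    """Observe the screen, pick an action, perform it, and repeat."""
    for _ in range(max_steps):
        frame = desktop.screenshot()
        action = choose_action(goal, frame)
        if action.kind == "done":
            return True
        desktop.perform(action)
    return False  # step budget exhausted before the policy reported "done"


def scripted_policy(goal: str, frame: bytes) -> Action:
    """Stand-in for the model: click a text field, type the goal, then stop."""
    script = {
        b"frame-0": Action("click", x=120, y=40),
        b"frame-1": Action("type", text=goal),
    }
    return script.get(frame, Action("done"))


desktop = FakeDesktop()
finished = run_agent("hello", desktop, scripted_policy)
```

In this sketch the only interface between the agent and the application is the screen image going in and input events coming out, which is why no application-specific API is needed.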
In a study conducted by the National University of Singapore's Show Lab, researchers tested Claude on a variety of tasks, including web searches, workflow completion, office productivity and video games. These tasks tested Claude's ability in different scenarios, such as searching for and purchasing items on the web, or extracting information from a website and inserting it into a spreadsheet. Through these tests, the researchers assessed Claude's performance along three dimensions: planning, action, and evaluation.
Claude's performance on complex tasks is impressive. It can formulate a clear plan, follow it step by step, and evaluate its progress at each step. It can also coordinate across multiple applications, for example by copying information from web pages into a spreadsheet. In some cases, Claude even reviews the results at the end of a task to make sure they match the goal.
However, Claude also makes simple mistakes that an average user would easily avoid. In one task, for example, it failed to complete a subscription because it did not scroll down the page to find the corresponding button.
In other cases it was clumsy at straightforward tasks, such as selecting and replacing text or changing bullets to numbers. Additionally, Claude sometimes does not recognize its own mistakes, or makes incorrect assumptions about why it failed to achieve its goal.
The researchers pointed out that Claude's deficiencies in self-assessment mechanisms may be the cause of these errors, and that the GUI agent framework may need to be improved in the future to add more rigorous self-assessment modules. The results also show that existing GUI agents do not fully replicate the fundamental nuances of how humans use computers.
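To make the idea of a stricter self-assessment step concrete, here is a minimal sketch built around the subscription example mentioned earlier. Everything in it is an assumption made for illustration: `FakePage` and `subscribe_with_check` are hypothetical, the page is simulated, and this is not the researchers' framework. The point is only that the agent checks the observable outcome instead of assuming that its action succeeded.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakePage:
    """Simulated page whose Subscribe button sits below the fold."""
    scroll_offset: int = 0
    button_offset: int = 2      # button becomes visible after two scrolls
    subscribed: bool = False

    def visible_elements(self) -> List[str]:
        return ["Subscribe"] if self.scroll_offset >= self.button_offset else []

    def scroll_down(self) -> None:
        self.scroll_offset += 1

    def click(self, label: str) -> None:
        # Clicking only has an effect if the element is actually on screen.
        if label in self.visible_elements():
            self.subscribed = True


def subscribe_with_check(page: FakePage, max_scrolls: int = 5) -> List[str]:
    """Look for the button, scrolling if needed, then verify the result."""
    trace: List[str] = []
    for _ in range(max_scrolls + 1):
        if "Subscribe" in page.visible_elements():
            page.click("Subscribe")
            trace.append("click")
            break
        page.scroll_down()
        trace.append("scroll")
    # Self-assessment: report what the page state shows, not what was intended.
    trace.append("verified" if page.subscribed else "failed")
    return trace


trace = subscribe_with_check(FakePage())
```

A run that ends in "failed" gives the agent an explicit signal to replan, rather than letting it report success on a task it never completed.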
For businesses, the potential to use simple text to describe automated tasks is enticing, but the technology is not yet ready for large-scale adoption. The model's behavior is erratic, which can lead to unpredictable consequences in sensitive applications. At the same time, performing operations through a human-designed interface is not the fastest way to complete a task.
Before widespread deployment, enterprises also need to consider the security risks of giving large language models (LLMs) control of the mouse and keyboard. For example, research has shown that web agents are vulnerable to adversarial attacks that a human would easily ignore. Still, tools like Claude can help product teams explore ideas and iterate on solutions, saving time and money before they commit to developing new features or services.
Claude's "Computer Use" feature demonstrates the huge potential of advances in AI technology, but it also reveals room for improvement in reliability and security. As the technology continues to develop, I believe that AI tools like Claude will serve people better, improve efficiency, and open up more possibilities.