Anthropic upgrades its Claude 3.5 models, letting Claude operate a computer like a human!

Author: Eve Cole | Updated: 2024-12-09 17:48:01

The editor of Downcodes has learned that AI company Anthropic has upgraded its Claude 3.5 model family, launching a new Claude 3.5 Sonnet and Claude 3.5 Haiku. The core highlights of this upgrade are substantially better coding performance and a new ability for Claude to operate a computer the way a person does. This marks an important step in Anthropic's effort to expand its commercial AI models into full-fledged "AI agents." Sonnet's score on SWE-bench Verified rose to 49.0%, higher than any publicly available model, including OpenAI's o1-preview. Haiku performs well on multiple intelligence benchmarks while offering fast responses and more accurate instruction following. Together, these upgrades give developers and users more capable AI tools and a more convenient experience.

AI company Anthropic has announced major upgrades to its Claude 3.5 model family, including a new Claude 3.5 Sonnet and Claude 3.5 Haiku. According to Anthropic, the upgraded model can take over your PC, performing basic actions such as simulated keyboard input and mouse clicks to use any application installed on the computer.

Significantly improved coding, surpassing OpenAI's o1-preview

The new Claude 3.5 Sonnet improves across the board, with the largest gains in coding. Its score on SWE-bench Verified rose from 33.4% to 49.0%, higher than any publicly available model, including OpenAI's o1-preview.

Its performance on TAU-bench, an agentic tool-use benchmark, has also improved, particularly in the retail and airline domains. All of this comes at the same price and speed as its predecessor.

Customer feedback suggests that the upgraded Claude 3.5 Sonnet is a major step forward for AI-assisted coding. For example, GitLab tested the model on DevSecOps tasks and found significantly stronger reasoning with no added latency.

Claude 3.5 Haiku is the next generation of Anthropic's fastest model. At the same cost and similar speed as Claude 3 Haiku, it outperforms even Claude 3 Opus, the largest model of the previous generation, on many intelligence benchmarks, especially coding tasks. Its low latency and more accurate instruction following make Claude 3.5 Haiku well suited to user-facing products and personalized experiences.

Operate computers like a human

The new computer use capability is an entirely new approach. According to Anthropic, rather than building task-specific tools for Claude, the company is teaching it general computer skills so that it can use a wide range of standard tools and software. Developers can use this capability to automate repetitive processes, build and test software, carry out open-ended research, and more.

Claude's computer use is still far from perfect. Actions that people perform effortlessly, such as scrolling and dragging, remain challenging for it. To address safety, Anthropic has also developed new classifiers that can identify when computer use is occurring and whether it is causing harm.

"We are about to enter a new era where artificial intelligence can leverage all the tools you use as an individual to complete tasks," Jared Kaplan, Anthropic's chief science officer, said in an interview. The capability marks an important step in expanding commercial AI models from traditional chat interfaces into full-fledged "AI agents."

In one demo, Claude was asked to plan a trip for a friend to watch the sunrise over the Golden Gate Bridge. Claude opened a web browser, searched Google for a suitable viewing spot, and added the plan to a calendar app. Although the performance was impressive, Wired noted that Claude left out some useful details, such as how to get to the destination.

In another demonstration, Claude was asked to build a simple website. It created the site in Microsoft's Visual Studio Code and started a local server to test it. Along the way it ran into some minor bugs, but it fixed the code when prompted.

Claude 3.5 Sonnet also demonstrated its ability to carry out multi-step tasks across different software platforms, retrieving the required information from a customer relationship management (CRM) system to fill out a vendor request form on its own.

The upgraded Claude 3.5 Sonnet is now available to all users. Starting today, developers can build with the computer use beta on the Anthropic API, Amazon Bedrock, and Google Cloud's Vertex AI. The new Claude 3.5 Haiku will be released later this month.

Official blog: https://www.anthropic.com/news/3-5-models-and-computer-use

Highlights:

Claude 3.5 Sonnet and Claude 3.5 Haiku have been upgraded, with significantly improved coding capabilities.

The new computer use capability lets Claude operate a computer like a human, opening up new possibilities.

AI agents that operate computers carry safety risks, and Anthropic emphasizes a gradual approach of observation and improvement to keep them safe.

All in all, Anthropic's upgrade of the Claude 3.5 series shows how quickly AI technology is advancing and suggests that AI will play a larger role in more fields in the future. Although challenges remain, its prospects are promising. The editor of Downcodes will continue to follow the latest developments in this field and bring readers more in-depth coverage.