Anthropic’s latest AI update can operate a computer on its own


Anthropic’s latest Claude 3.5 Sonnet AI model has a new feature in public beta that lets it control your computer by looking at the screen, moving the cursor, clicking buttons, and typing text. The feature, called “computer use,” is now available in the API, allowing developers to direct Claude to work on a computer the way a human would, as shown on a Mac in the video below.

Microsoft’s Copilot Vision and OpenAI’s desktop app for ChatGPT have shown what their AI tools can do with a view of your computer screen, and Google has similar capabilities in its Gemini app on Android phones. But none of them has yet taken the next step: making widely available tools that can click around and complete these tasks for you. Rabbit promised similar capabilities for its R1 but has yet to deliver them.

Anthropic warns that computer use is still experimental and may be “cumbersome and error-prone.” The company says: “We’re releasing desktop capabilities early to get developer feedback, and we expect these capabilities to improve rapidly over time.”

Many actions that people routinely perform on computers (dragging, zooming, etc.) are still beyond Claude. And because Claude sees the screen as a “flipbook” of screenshots stitched together rather than as a more granular video stream, it may miss short-lived actions or notifications.

Additionally, this version of Claude has apparently been told to stay off social media. Anthropic says it has put in place “measures to monitor when Claude is asked to engage in election-related activities, as well as systems designed to steer Claude away from activities such as generating and posting content on social media, registering internet domains or interacting with government websites.”

Meanwhile, Anthropic says the new Claude 3.5 Sonnet model shows improvements across multiple benchmarks and is being offered to customers at the same price and speed as its predecessor:

The updated Claude 3.5 Sonnet shows wide-ranging improvements on industry benchmarks, with particularly strong gains in agentic coding and tool use tasks. On coding, it improves performance on SWE-bench Verified from 33.4% to 49.0%, scoring higher than all publicly available models, including reasoning models such as OpenAI o1-preview and specialized systems designed for agentic coding. It also improves performance on TAU-bench, an agentic tool use task, from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the more complex airline domain.
