Thursday, March 12, 2026

Google’s AI can now surf the web, click buttons, and fill out forms with Gemini 2.5 Computer Use


Some of the largest providers of large language models (LLMs) have tried to move beyond multimodal chatbots and extend their models into “agents” that can actually take actions on the user’s behalf across websites. Recall OpenAI’s ChatGPT Agent (previously known as “Operator”) and Anthropic’s Computer Use, both released in the last two years.

Now Google is joining the same game. Today, the search giant’s DeepMind AI lab subsidiary unveiled a new, fine-tuned and custom-trained version of its powerful Gemini 2.5 Pro LLM, known as “Gemini 2.5 Pro Computer Use,” which can use a virtual browser to surf the web on your behalf, retrieve information, fill out forms, and even take actions on websites, all from a single text prompt from the user.

“This is just the beginning, but the model’s ability to interact with the web – for example, scrolling, filling out forms and navigating menus – is an important next step in building general-purpose agents,” said Google CEO Sundar Pichai, as part of a longer statement on the social network X.

However, this model is not available to consumers directly from Google.

Instead, Google has partnered with another company, Browserbase, founded by former Twilio engineer Paul Klein in early 2024, which offers a virtual “headless” web browser specifically for use by AI agents and applications. (A “headless” browser is one that does not require a graphical user interface, or GUI, to navigate the web, although in this and other cases Browserbase does show a graphical representation to the user.)

Users can demo the new Gemini 2.5 Computer Use model directly on Browserbase and even compare it side by side with the older, competing offerings from OpenAI and Anthropic in a new “Browser Arena” launched by the startup (although only one additional model can be selected alongside Gemini at a time).

For AI builders and developers, it is being made available as a raw, albeit proprietary, LLM through the Gemini API in Google AI Studio for rapid prototyping, and through Google Cloud’s Vertex AI model selector and application-building platform.

The new offering builds on the capabilities of Gemini 2.5 Pro, released in March 2025 but significantly updated since then, with a particular focus on enabling AI agents to interact directly with user interfaces, including browsers and mobile applications.

Overall, it appears Gemini 2.5 Computer Use has been designed to let developers create agents that can autonomously complete interface-driven tasks, such as clicking, typing, scrolling, filling out forms, and navigating behind login screens.

Rather than relying solely on APIs or structured inputs, the model allows AI systems to interact with software visually and functionally, much as a human would.

Brief hands-on user tests

In my brief, unscientific initial hands-on tests on the Browserbase website, Gemini 2.5 Computer Use successfully navigated to Taylor Swift’s official website as instructed and gave me a summary of what was being sold or promoted at the top: a special edition of her latest album, “The Life of A Showgirl.”

In another test, I asked Gemini 2.5 Computer Use to search Amazon for highly rated and well-reviewed solar lights that I could put in my yard, and I was delighted to watch it successfully complete a Google Search CAPTCHA designed to weed out non-human users (“Select all boxes with a motorcycle”). It did so in a few seconds.

However, once it got through, it stalled and was unable to complete the task, despite displaying a “task completed” message.

I should also note that while OpenAI’s ChatGPT Agent and Anthropic’s Claude can create and edit local files – such as PowerPoint presentations, spreadsheets or text documents – on the user’s behalf, Gemini 2.5 Computer Use does not currently offer direct file system access or native file creation capabilities.

Instead, it is designed to control and navigate web and mobile user interfaces through actions such as clicking, typing and scrolling. Its output is limited to suggested UI actions or chatbot-style text responses; any structured output, such as a document or file, must be handled separately by the developer, often through custom code or third-party integrations.

Performance benchmarks

Google says Gemini 2.5 Computer Use has demonstrated leading results on multiple interface-control benchmarks, particularly compared with other major AI systems, including Claude Sonnet and OpenAI’s agent models.

The evaluations were conducted via Browserbase and through Google’s own testing.

Here are some of the highlights:

  • Online-Mind2Web (Browserbase): 65.7% for Gemini 2.5 compared to 61.0% (Claude Sonnet 4) and 44.3% (OpenAI agent)

  • WebVoyager (Browserbase): 79.9% for Gemini 2.5 compared to 69.4% (Claude Sonnet 4) and 61.0% (OpenAI agent)

  • AndroidWorld (DeepMind): 69.7% for Gemini 2.5 compared to 62.1% (Claude Sonnet 4); the OpenAI model could not be measured due to lack of access

  • OSWorld: currently not supported by Gemini 2.5; the best competitor’s result was 61.4%

In addition to high accuracy, Google reports that the model runs with lower latency than other browser-control solutions, a key factor in production use cases such as UI automation and testing.

How it works

Agents powered by the Computer Use model operate in an interaction loop. They receive:

  • A user task prompt

  • A screenshot of the interface

  • A history of past actions

The model analyzes this input and generates a recommended UI action, such as clicking a button or typing text into a field.

If necessary, it can ask the end user to confirm riskier tasks, such as making a purchase.

Once the action is executed, the interface is updated and a new screenshot is sent back to the model. The loop continues until the task is completed or halted due to an error or a safety decision.
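The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Google’s client code: Action, run_agent, and the four callbacks (take_screenshot, propose_action, execute, confirm) are hypothetical names, and the model call is replaced by a scripted stub so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Action:
    """One proposed UI step (hypothetical shape, for illustration only)."""
    name: str                          # e.g. "click_at", "type_text_at", "done"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False   # set for riskier steps such as a purchase


def run_agent(
    task: str,
    take_screenshot: Callable[[], bytes],
    propose_action: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    confirm: Callable[[Action], bool],
    max_steps: int = 20,
) -> Tuple[str, List[Action]]:
    """Run the prompt/screenshot/history loop until done, declined, or out of steps."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = take_screenshot()                      # current interface state
        action = propose_action(task, screenshot, history)  # stand-in for the model
        if action.name == "done":
            return "completed", history
        if action.needs_confirmation and not confirm(action):
            return "declined", history                      # user refused a risky step
        execute(action)                                     # apply the action to the UI
        history.append(action)
    return "step_limit", history


# Scripted stand-in for the model so the sketch is runnable without any API.
script = iter([
    Action("click_at", {"x": 500, "y": 300}),
    Action("type_text_at", {"x": 500, "y": 300, "text": "solar lights"}),
    Action("done"),
])

status, history = run_agent(
    task="Search for solar lights",
    take_screenshot=lambda: b"",               # placeholder screenshot bytes
    propose_action=lambda t, s, h: next(script),
    execute=lambda a: None,                    # a real agent would drive a browser here
    confirm=lambda a: True,
)
print(status, [a.name for a in history])
```

The step limit is an added safeguard of this sketch, so a stalled agent cannot loop forever; the article does not say whether the real system enforces one.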

The model uses a specialized tool called computer_use, and it can be integrated into custom environments using tools such as Playwright or through the Browserbase demo sandbox.

Use cases and adoption

According to Google, internal and external teams have already started using the model in several domains:

  • Google’s payments platform team reports that Gemini 2.5 Computer Use successfully recovers over 60% of failed test runs, reducing a major source of engineering inefficiency.

  • Cares, a third-party AI agent platform, said the model outperforms other models on complex data analysis tasks, improving performance by up to 18% in its hardest evaluations.

  • Poke.com, a proactive AI assistant provider, noted that the Gemini model is often 50% faster than competing solutions during interface interactions.

The model is also used in Google’s own product development efforts, including Project Mariner, the Firebase Testing Agent, and AI Mode in Search.

Safety measures

Because the model directly controls software interfaces, Google emphasizes a multi-layered approach to safety:

  • A per-step safety service inspects every proposed action before it is executed.

  • Developers can define system-level instructions to block specific actions or require confirmation for them.

  • The model includes built-in safeguards to avoid actions that could compromise security or violate Google’s prohibited-use policies.

For example, if the model encounters a CAPTCHA, it generates an action to click the checkbox but flags it as requiring user confirmation, ensuring the system does not proceed without human oversight.
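A developer-side gate in the same spirit might look like the sketch below. It is an illustration under stated assumptions, not Google’s safety service or its system-instruction mechanism: the Decision enum, the review_action function, and both keyword lists are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    CONFIRM = "require_confirmation"
    BLOCK = "block"


# Hypothetical developer-defined rules; a real deployment would choose its own.
BLOCKED_KEYWORDS = ("delete account", "transfer funds")
CONFIRM_KEYWORDS = ("captcha", "purchase", "checkout")


def review_action(description: str) -> Decision:
    """Classify a proposed action by simple keyword matching on its description."""
    text = description.lower()
    if any(keyword in text for keyword in BLOCKED_KEYWORDS):
        return Decision.BLOCK      # never executed
    if any(keyword in text for keyword in CONFIRM_KEYWORDS):
        return Decision.CONFIRM    # executed only after the user agrees
    return Decision.ALLOW          # low-risk, executed directly


for step in ("Click the CAPTCHA checkbox", "Scroll the results list"):
    print(step, "->", review_action(step).value)
```

Keyword matching is deliberately crude; it only shows where a block-or-confirm decision would sit relative to executing an action.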

Technical capabilities

The model supports a wide range of built-in UI actions, such as:

  • click_at, type_text_at, scroll_document, drag_and_drop, and more

  • User-defined functions can be added to extend its reach to mobile or custom environments

  • Screen coordinates are normalized (0-1000 scale) and translated back to pixel dimensions during execution

It accepts image and text input and outputs text responses or function calls to perform tasks. The recommended screen resolution for optimal results is 1440×900, although it can work with other sizes.
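The coordinate translation mentioned above is simple arithmetic. The following sketch assumes integer normalized values and a floor-and-clamp rounding rule; the function name denormalize and the rounding choice are assumptions of this example, not details confirmed by Google.

```python
from typing import Tuple

NORMALIZED_MAX = 1000  # the 0-1000 scale mentioned above


def denormalize(x_norm: int, y_norm: int, width: int, height: int) -> Tuple[int, int]:
    """Map normalized (0-1000) coordinates to pixel coordinates on a width x height screen."""
    if not (0 <= x_norm <= NORMALIZED_MAX and 0 <= y_norm <= NORMALIZED_MAX):
        raise ValueError("normalized coordinates must be within 0-1000")
    # Integer math avoids float rounding; clamp so 1000 maps to the last valid pixel.
    x = min(x_norm * width // NORMALIZED_MAX, width - 1)
    y = min(y_norm * height // NORMALIZED_MAX, height - 1)
    return x, y


# Centre of the recommended 1440x900 screen.
print(denormalize(500, 500, 1440, 900))  # (720, 450)
```

Because the model only ever sees the normalized scale, the same proposed action can be replayed on a differently sized screen by changing the width and height arguments.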

API pricing remains almost identical to Gemini 2.5 Pro

Pricing for Gemini 2.5 Computer Use closely matches that of the standard Gemini 2.5 Pro model. Both use the same per-token billing structure: input tokens cost $1.25 per million tokens for prompts under 200,000 tokens and $2.50 per million tokens for longer prompts.

Output tokens follow a similar split, priced at $10.00 per million for smaller responses and $15.00 for larger ones.
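To make the tiers concrete, here is a rough cost estimator using the rates quoted above. It rests on two assumptions that the article does not settle: that the output rate is chosen by prompt length, like the input rate, and that a prompt of exactly 200,000 tokens falls in the cheaper tier. The function name estimate_cost_usd is hypothetical.

```python
LONG_PROMPT_THRESHOLD = 200_000  # tokens; boundary handling is an assumption

# USD per million tokens, as quoted in the article.
INPUT_RATE_SHORT, INPUT_RATE_LONG = 1.25, 2.50
OUTPUT_RATE_SHORT, OUTPUT_RATE_LONG = 10.00, 15.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request under the two-tier per-token pricing."""
    # Assumption: both rates switch together once the prompt exceeds the threshold.
    long_prompt = input_tokens > LONG_PROMPT_THRESHOLD
    input_rate = INPUT_RATE_LONG if long_prompt else INPUT_RATE_SHORT
    output_rate = OUTPUT_RATE_LONG if long_prompt else OUTPUT_RATE_SHORT
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


print(f"${estimate_cost_usd(100_000, 2_000):.3f}")  # short-prompt tier
print(f"${estimate_cost_usd(300_000, 2_000):.3f}")  # long-prompt tier
```

Under these assumptions, a 100,000-token prompt with a 2,000-token response costs about $0.145, while a 300,000-token prompt with the same response costs about $0.78.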

The models differ in availability and additional features.

Gemini 2.5 Pro includes a free tier that lets developers use the model at no cost, with no published token cap, although usage may be subject to rate limits or other restrictions depending on the platform (e.g., Google AI Studio).

This free access covers both input and output tokens. Once developers exceed the allotted limit or move to the paid tier, standard per-token pricing applies.

By contrast, Gemini 2.5 Computer Use is available only on the paid tier. No free access is currently offered for this model, and all usage incurs token-based fees from the start.

In terms of features, Gemini 2.5 Pro supports optional capabilities such as context caching (from $0.31 per million tokens) and grounding with Google Search (free for up to 1,500 requests per day, then $35 per 1,000 additional requests). These are not currently available for Computer Use.

Another difference is data handling: output from the Computer Use model is not used to improve Google products on the paid tier, whereas usage of Gemini 2.5 Pro on the free tier contributes to model improvement unless the user explicitly opts out.

Overall, developers can expect similar token-based costs for both models, but when deciding which model fits their needs, they should consider tier access, capabilities, and data-use policies.
