Google says Gemini AI makes its robots smarter

Google is training its robots with Gemini AI so they can better navigate and complete tasks. In a new research paper, DeepMind’s robotics team explains how Gemini 1.5 Pro’s long context window (which determines how much information the AI model can process) lets users interact more easily with its RT-2 robots using natural language instructions.

It works by filming a video tour of a designated area, such as a home or office; researchers then use Gemini 1.5 Pro to have the robot “watch” the video and learn about its surroundings. The robot can then carry out commands based on what it has observed, using verbal and/or visual outputs, such as guiding a user to a power outlet after being shown a phone and asked, “Where can I charge it?” DeepMind says its Gemini-powered robot achieved a 90 percent success rate across more than 50 user instructions given in an operating area of more than 9,000 square feet.

The researchers also found “preliminary evidence” that Gemini 1.5 Pro enabled its droids to plan how to complete instructions beyond just navigation. For example, when a user with a lot of Coca-Cola cans on their desk asks the droid if their favorite drink is available, the team said Gemini “knows that the robot should go to the refrigerator, check for Cokes, and then return to the user to provide the result.” DeepMind says it plans to investigate these results further.

The video demonstrations Google shared are impressive, though the obvious cuts after the droid acknowledges each request hide the fact that processing each instruction takes between 10 and 30 seconds, according to the research paper. It may be a while before we start sharing our homes with more advanced, environment-mapping robots, but at least these ones might be able to find our lost keys or wallets.
