RoboCat: A self-improving AI agent for robotics
Our new foundation agent learns how to operate different robotic arms, solves tasks from as few as 100 demonstrations, and improves from self-generated data.
Robots are quickly becoming part of our everyday lives, but they are often programmed only to perform specific tasks well. While harnessing recent advances in artificial intelligence could lead to robots that help in many more ways, progress in building general-purpose robots has been slower, in part because of the time it takes to collect real-world training data.
Our latest paper introduces RoboCat, a self-improving AI agent for robotics that learns to perform a variety of tasks across different arms and then autonomously generates new training data to improve its technique.
Previous research has explored how to develop robots that can learn to multi-task at scale and combine the understanding of language models with the real-world capabilities of a helper robot. RoboCat is the first agent to solve and adapt to multiple tasks across a variety of real robots.
RoboCat learns much faster than other state-of-the-art models. It can pick up a new task with as few as 100 demonstrations because it draws on a large and diverse dataset. This capability will help accelerate robotics research, as it reduces the need for human-supervised training, and is an important step towards creating a general-purpose robot.
How RoboCat improves itself
RoboCat is based on our multimodal model Gato (Spanish for “cat”), which can process language, images, and actions in both simulated and physical environments. We combined the Gato architecture with a large training dataset consisting of sequences of images and actions of various robotic arms solving hundreds of different tasks.
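As a rough illustration only (the schema and field names below are hypothetical, not RoboCat’s actual data format), such image-and-action sequences could be represented like this:

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class Step:
    """One timestep of a robot trajectory (hypothetical schema)."""
    image: np.ndarray   # camera observation, e.g. an HxWx3 uint8 frame
    action: np.ndarray  # commanded arm action, e.g. joint targets plus gripper

@dataclass
class Episode:
    """A sequence of image-action pairs for one task attempt."""
    robot_id: str       # which arm/embodiment collected the data
    task_id: str        # which task was being solved
    source: str         # e.g. "human_demo", "self_generated", "simulation"
    steps: List[Step]

def to_training_sequence(episode: Episode) -> List[np.ndarray]:
    """Flatten an episode into an interleaved image/action sequence,
    the kind of stream a Gato-style sequence model is trained on."""
    sequence = []
    for step in episode.steps:
        sequence.append(step.image)
        sequence.append(step.action)
    return sequence
```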
After this first round of training, we launched RoboCat into a “self-improvement” training cycle with a set of previously unseen tasks. Learning each new task followed five steps (a sketch of this loop appears after the list):
- Collect 100-1,000 demonstrations of a new task or robot, using a robotic arm controlled by a human.
- Fine-tune RoboCat on this new task/arm, creating a specialized spin-off agent.
- The spin-off agent practices on this new task/arm an average of 10,000 times, generating more training data.
- Incorporate the demonstration data and self-generated data into RoboCat’s existing training dataset.
- Train a new version of RoboCat on the new training dataset.
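The sketch below shows the shape of this cycle in Python. The class and function names are hypothetical stand-ins, not RoboCat’s actual implementation:

```python
# A minimal sketch of the five-step self-improvement cycle described above.
# Agent, collect_demonstrations, etc. are illustrative stubs, not a real API.

class Agent:
    def __init__(self, dataset):
        self.dataset = list(dataset)

    def fine_tune(self, demos):
        """Return a specialized spin-off agent adapted to one task/arm."""
        return Agent(self.dataset + demos)

    def practice(self, task, episodes=10_000):
        """Practice the task, returning self-generated trajectories."""
        return [f"self-generated episode {i} on {task}" for i in range(episodes)]

    def retrain(self, new_data):
        """Train a new version of the generalist on the enlarged dataset."""
        return Agent(self.dataset + new_data)


def collect_demonstrations(task, n=1_000):
    """Stand-in for 100-1,000 human-teleoperated demonstrations."""
    return [f"human demo {i} on {task}" for i in range(n)]


robocat = Agent(dataset=["initial multi-task, multi-arm training data"])

for task in ["lift_gear", "fruit_from_bowl", "shape_matching"]:
    demos = collect_demonstrations(task)        # step 1: human demonstrations
    spin_off = robocat.fine_tune(demos)         # step 2: specialized spin-off agent
    self_generated = spin_off.practice(task)    # step 3: ~10,000 practice episodes
    new_data = demos + self_generated           # step 4: grow the training dataset
    robocat = robocat.retrain(new_data)         # step 5: train a new RoboCat version
```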
RoboCat’s training cycle, enhanced by its ability to autonomously generate additional training data.
The combination of all this training means the latest RoboCat is based on a dataset of millions of trajectories from both real and simulated robotic arms, including self-generated data. We used four different types of robots and many robotic arms to collect vision-based data representing the tasks RoboCat would be trained to perform.
RoboCat learns from a variety of types of training data and tasks: videos of a real robot arm picking up gears, a simulated arm stacking blocks, and RoboCat using a robot arm to pick up a cucumber.
Learning to operate new robotic arms and solve more complex tasks
With its diverse training, RoboCat learned to operate different robotic arms within a few hours. Although it had been trained on arms with two-fingered grippers, it was able to adapt to a more complex arm with a three-fingered gripper and twice as many controllable inputs.
Left: A new robotic arm that RoboCat learned to control
Right: Video of RoboCat using the arm to pick up gears
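One common way to let a single agent drive arms with different numbers of controllable inputs is to map every embodiment’s actions into a shared, fixed-size representation. The sketch below shows that padding idea as an illustrative assumption; it is not a description of RoboCat’s actual action interface, and the dimensions are made up:

```python
import numpy as np

# Assumption for illustration: pad every embodiment-specific action vector to a
# shared maximum dimension so one policy can output actions for any arm.
MAX_ACTION_DIM = 14  # hypothetical size covering the arm with the most inputs

def pad_action(action: np.ndarray, max_dim: int = MAX_ACTION_DIM) -> np.ndarray:
    """Zero-pad an embodiment-specific action to the shared action size."""
    padded = np.zeros(max_dim, dtype=float)
    padded[: action.shape[0]] = action
    return padded

two_finger_action = np.array([0.1, -0.2, 0.05, 0.0, 0.3, -0.1, 1.0])  # 7 inputs
three_finger_action = np.random.uniform(-1, 1, size=14)               # 14 inputs

batch = np.stack([pad_action(two_finger_action), pad_action(three_finger_action)])
print(batch.shape)  # (2, 14): both arms share one action representation
```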
After observing 1,000 human-controlled demonstrations, collected in just a few hours, RoboCat could direct this new arm dexterously enough to pick up gears successfully 86% of the time. With the same number of demonstrations, it could adapt to solving tasks that combine precision and understanding, such as removing the correct fruit from a bowl and solving a shape-matching puzzle, which are necessary for more complex control.
Sample tasks that RoboCat is able to adapt to solve after 500-1,000 demonstrations.
The self-improving generalist
RoboCat has a virtuous cycle of training: the more new tasks it learns, the better it gets at learning additional new tasks. The initial version of RoboCat succeeded only 36% of the time on previously unseen tasks after learning from 500 demonstrations per task. But the latest RoboCat, trained on a greater variety of tasks, more than doubled this success rate on the same tasks.
The large difference in performance between the initial RoboCat (one round of training) and the final version (extensive and varied training, including self-improvement), after both versions were fine-tuned on 500 demonstrations of previously unseen tasks.
These improvements were due to RoboCat’s growing breadth of experience, just as people develop a more diverse range of skills as they learn more in a given field. RoboCat’s ability to independently learn skills and rapidly self-improve, especially when applied to a variety of robotic devices, will help pave the way toward a new generation of more helpful, general-purpose robots.