Let’s say you want to train a robot to understand how to use tools, so that it can then quickly learn to make repairs around your house with a hammer, a wrench, and a screwdriver. Doing so would require an enormous amount of data demonstrating tool use.
Existing robotic datasets vary widely in modality: some contain color images while others are composed of, for example, tactile imprints. Data can also be collected in different domains, such as simulation or human demonstrations. And each dataset may capture a unique task and environment.
It is difficult to efficiently incorporate data from so many sources into a single machine-learning model, so many methods use just one type of data to train a robot. However, robots trained this way, with relatively little task-specific data, are often unable to perform new tasks in unfamiliar environments.
In an attempt to train better multi-functional robots, MIT researchers have developed a technique for combining multiple data sources across domains, modalities and tasks using a type of generative artificial intelligence called diffusion models.
They train a separate diffusion model to learn the strategy or policy for performing one task using one specific set of data. They then combine the policies learned by the diffusion models into an overall policy that enables the robot to perform multiple tasks in different settings.
In simulations and real-world experiments, this training approach allowed a robot to perform multiple tool-use tasks and adapt to new tasks it had not seen during training. The method, known as Policy Composition (PoCo), led to a 20 percent improvement in task performance compared to baseline techniques.
“Addressing the heterogeneity of robotic datasets is like a chicken-and-egg problem. If we want to use a lot of data to train general robot policies, we first need deployable robots to get all that data. I think that leveraging all the available heterogeneous data, similar to what researchers have done with ChatGPT, is an important step for the field of robotics,” says Lirui Wang, an electrical engineering and computer science (EECS) graduate student and lead author of a paper on PoCo.
Wang’s co-authors include Jialiang Zhao, a mechanical engineering graduate student; Yilun Du, an EECS graduate student; Edward Adelson, the John and Dorothy Wilson Professor of Vision Science in the Department of Brain and Cognitive Sciences and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); and senior author Russ Tedrake, the Toyota Professor of EECS, Aeronautics and Astronautics, and Mechanical Engineering, and a member of CSAIL. The research will be presented at the Robotics: Science and Systems Conference.
Combining different data sets
The datasets used to learn robot policies are typically small and focused on one specific task and environment, such as packing items into boxes in a warehouse.
“Each robotic warehouse generates terabytes of data, but it only belongs to the specific robot installation working on those packages. This is not ideal if you want to use all this data to train a general machine,” Wang says.
The researchers represent each policy using a type of generative AI model called a diffusion model. Diffusion models, often used for image generation, learn to create new data samples that resemble those in the training dataset by iteratively refining their output.
However, instead of teaching the diffusion model to generate images, the researchers teach it to generate robot trajectories, the sequences of poses a robot moves through to complete a task. They do this by adding noise to the trajectories in the training dataset. The diffusion model gradually removes the noise and refines its output into a trajectory.
This technique, known as Diffusion Policy, was previously introduced by researchers at MIT, Columbia University, and the Toyota Research Institute. PoCo builds on this Diffusion Policy work.
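To make the noising-and-denoising idea concrete, here is a minimal sketch of it for trajectories, written in PyTorch. It is not the authors' implementation: the network, the noise schedule, and the sizes (HORIZON, ACTION_DIM, NUM_STEPS) are toy assumptions, and a real diffusion policy would also condition on the robot's observations.

```python
# A toy sketch of diffusion over robot trajectories, not the PoCo code.
import torch
import torch.nn as nn

HORIZON, ACTION_DIM, NUM_STEPS = 16, 7, 50  # assumed trajectory length, action size, denoising steps

class TrajectoryDenoiser(nn.Module):
    """Predicts the noise that was added to a trajectory at a given step."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HORIZON * ACTION_DIM + 1, 256), nn.ReLU(),
            nn.Linear(256, HORIZON * ACTION_DIM),
        )

    def forward(self, noisy_traj, step):
        # Condition on the denoising step; a real model would also take observations.
        x = torch.cat([noisy_traj.flatten(1), step.float().unsqueeze(1)], dim=1)
        return self.net(x).view(-1, HORIZON, ACTION_DIM)

def train_step(model, optimizer, clean_traj):
    """Corrupt a dataset trajectory with noise and learn to predict that noise."""
    step = torch.randint(0, NUM_STEPS, (clean_traj.shape[0],))
    noise = torch.randn_like(clean_traj)
    keep = 1.0 - step.float().view(-1, 1, 1) / NUM_STEPS  # toy noise schedule
    noisy = keep.sqrt() * clean_traj + (1 - keep).sqrt() * noise
    loss = ((model(noisy, step) - noise) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def sample_trajectory(model):
    """Start from pure noise and iteratively refine it into a trajectory."""
    traj = torch.randn(1, HORIZON, ACTION_DIM)
    for t in reversed(range(NUM_STEPS)):
        pred_noise = model(traj, torch.full((1,), t))
        traj = traj - pred_noise / NUM_STEPS  # crude denoising update
    return traj
```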
The team trains each diffusion model using a different type of dataset, such as one with human video demonstrations and another gathered from teleoperation of a robotic arm.
The researchers then perform a weighted combination of the individual policies learned by all the diffusion models, iteratively refining the output so that the combined policy satisfies the objectives of each individual policy.
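Continuing the toy sketch above, one way to express such a composition is to blend the noise predictions of the individual models inside a single shared denoising loop. The fixed weights here are a simplifying assumption, not PoCo's actual weighting scheme.

```python
# A toy sketch of composing trained denoisers, reusing the pieces above.
@torch.no_grad()
def sample_composed(models, weights):
    """Denoise one shared trajectory using a weighted mix of the policies' predictions."""
    traj = torch.randn(1, HORIZON, ACTION_DIM)
    for t in reversed(range(NUM_STEPS)):
        step = torch.full((1,), t)
        # Each policy proposes a noise estimate; blend them before each refinement step.
        blended = sum(w * m(traj, step) for w, m in zip(weights, models))
        traj = traj - blended / NUM_STEPS
    return traj
```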
Greater than the sum of its parts
“One of the advantages of this approach is that we can combine policies to get the best of both worlds. For example, a policy trained on real-world data may allow for greater dexterity, while a policy trained on simulation may allow for greater generalization,” Wang says.
Since the policies are trained separately, one can mix and match diffusion policies to achieve better results for a specific task. A user can also add data in a new modality or domain by training an additional diffusion policy on that dataset, rather than starting the entire process from scratch.
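Reusing the sketches above, extending a composition with a new modality might look like the following; the models and weights are hypothetical, and only the newly added denoiser would need to be trained.

```python
# Hypothetical usage: train one more denoiser on the new dataset alone,
# then append it to the composition without retraining the other policies.
sim_model, real_model = TrajectoryDenoiser(), TrajectoryDenoiser()  # assume already trained
tactile_model = TrajectoryDenoiser()  # trained only on the new tactile dataset
trajectory = sample_composed(
    [sim_model, real_model, tactile_model],
    weights=[0.4, 0.4, 0.2],  # hypothetical weights summing to 1
)
```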
The researchers tested PoCo in simulation and on real robotic arms that performed a variety of tool-use tasks, such as pounding a nail with a hammer and flipping an object with a spatula. PoCo led to a 20 percent improvement in task performance compared to baseline methods.
“What was striking was that when we finished the tuning and visualized it, we could clearly see that the composed trajectory looked much better than either of them alone,” Wang says.
In the future, the researchers want to apply this technique to long-horizon tasks in which the robot picks up one tool, uses it, and then switches to another. They also want to incorporate larger robotics datasets to improve performance.
“We will need all three kinds of data for robotics to succeed: internet data, simulation data, and real robot data. How to combine them effectively will be the million-dollar question. PoCo is a solid step on the right track,” says Jim Fan, a senior research scientist at NVIDIA and leader of the AI Agents Initiative, who was not involved in the work.
This research is funded in part by Amazon, the Singapore Defense Science and Technology Agency, the US National Science Foundation and the Toyota Research Institute.