Anyone who has ever tried to pack a family’s worth of luggage into a sedan-sized trunk knows this is a hard problem. Robots struggle with dense packing tasks, too.
For a robot, solving the packing problem involves satisfying many constraints, such as stacking luggage so suitcases don’t topple out of the trunk, keeping heavy objects from being placed on top of lighter ones, and preventing the robot arm from colliding with the car’s bumper.
Some traditional methods tackle this problem sequentially, guessing a partial solution that meets one constraint at a time and then checking whether any other constraints have been violated. With a long sequence of actions to take and a pile of luggage to pack, this process can be impractically time-consuming.
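Here is a minimal sketch of sequential guess-and-check packing for 2D axis-aligned rectangles. It shows why this approach gets expensive. The function names (`overlaps`, `sequential_pack`) and the restart policy are illustrative assumptions, not any specific published method. Each item's position is guessed at random and checked against the items already placed. If an item cannot be fit, the whole partial solution is discarded.

```python
import random


def overlaps(a, b):
    """True if two axis-aligned rectangles (x, y, w, h) share positive area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def sequential_pack(sizes, box_w, box_h, tries_per_item=200, restarts=50, seed=0):
    """Place rectangles one at a time by random guessing (illustrative baseline).

    Each guess is drawn inside the box, then checked against the items already
    placed. If an item cannot be placed within `tries_per_item` guesses, the
    whole partial solution is thrown away and packing restarts from scratch.
    Returns (layout or None, number of guesses made).
    """
    rng = random.Random(seed)
    guesses = 0
    for _ in range(restarts):
        placed = []
        for w, h in sizes:
            for _ in range(tries_per_item):
                # The "inside the box" constraint is satisfied by construction.
                cand = (rng.uniform(0, box_w - w), rng.uniform(0, box_h - h), w, h)
                guesses += 1
                # Check the remaining constraint against every earlier placement.
                if not any(overlaps(cand, p) for p in placed):
                    placed.append(cand)
                    break
            else:
                break  # this item never fit: abandon the partial solution
        if len(placed) == len(sizes):
            return placed, guesses
    return None, guesses


layout, guesses = sequential_pack([(2, 2), (3, 1), (1, 1)], box_w=10, box_h=10)
print(layout is not None, guesses)
```

Every abandoned partial solution wastes all the guesses that went into it. For an impossible instance, such as two 3-by-3 squares in a 4-by-4 box, the sketch burns through every restart before giving up.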
MIT researchers used a form of generative AI called diffusion modeling to solve this problem more efficiently. Their method uses a set of machine learning models, each trained to represent one specific type of constraint. These models are combined to generate global solutions to the packing problem, taking into account all the constraints at once.
Their method generated effective solutions faster than other techniques, and it produced a greater number of successful solutions in the same amount of time. Importantly, their technique was also able to solve problems with novel combinations of constraints and larger numbers of objects that the models did not see during training.
Due to this generalizability, their technique could be used to teach robots how to understand and meet the overall constraints of packing problems, such as the importance of avoiding collisions or a desire for one object to be next to another. Robots trained in this way could be applied to a wide array of complex tasks in diverse environments, from order fulfillment in a warehouse to organizing a bookshelf in someone’s home.
“My vision is to make robots perform more complex tasks that have many geometric constraints and more continuous decisions that need to be made. These are the kinds of problems that service robots face in our unstructured and diverse human environment. With the powerful tool of compositional diffusion models, we can now solve these more complex problems and get great generalization results,” says Zhutian Yang, a graduate student in electrical engineering and computer science and lead author of a paper on this new machine learning technique.
Co-authors include MIT graduate students Jiayuan Mao and Yilun Du; Jiajun Wu, an assistant professor of computer science at Stanford University; Joshua B. Tenenbaum, a professor in the Department of Brain and Cognitive Sciences at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Tomás Lozano-Pérez, a professor of computer science and engineering at MIT and a member of CSAIL; and senior author Leslie Kaelbling, the Panasonic Professor of Computer Science and Engineering at MIT and a member of CSAIL. The research will be presented at the Conference on Robot Learning.
Constraint complications
The problem of continuous constraint satisfaction is particularly challenging for robots. These problems appear in multistep robotic manipulation tasks, such as packing items into a box or setting a table. They often involve satisfying a number of constraints. These include geometric constraints, such as avoiding collisions between the robot arm and the environment; physical constraints, such as stacking objects so they are stable; and qualitative constraints, such as placing a spoon to the right of a knife.
There may be many constraints, and they vary across problems and environments depending on the geometry of the objects and on human-specified requirements.
To effectively solve these problems, the MIT researchers developed a machine learning technique called Diffusion-CCSP. Diffusion models learn to generate new data samples that resemble samples in a training dataset by iteratively refining their output.
To do this, diffusion models learn a procedure for making small improvements to a potential solution. Then, to solve a problem, they start with a random, very bad solution and gradually improve it.
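The refinement idea can be illustrated with annealed Langevin dynamics, one common way of sampling with a diffusion-style model. In this sketch, the closed-form score of a Gaussian target replaces the learned denoiser. The `target` and `spread` values are arbitrary assumptions. The sketch therefore shows only the shape of the loop: start from a random guess, then take many small, slightly noisy improvement steps while the noise level shrinks.

```python
import math
import random


def score(x, sigma, target=(3.0, -1.0), spread=0.1):
    """Gradient of the log-density of a Gaussian target blurred by noise of scale sigma.

    In a real diffusion model this would be a trained network. It is written in
    closed form here (an assumption for illustration) so the loop runs on its own.
    """
    var = spread ** 2 + sigma ** 2
    return [(t - xi) / var for xi, t in zip(x, target)]


def refine(sigmas=(3.0, 1.0, 0.3, 0.1), steps_per_level=50, seed=0):
    """Annealed Langevin dynamics: many small, noisy improvement steps."""
    rng = random.Random(seed)
    # Start from a random, very bad guess.
    x = [rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)]
    for sigma in sigmas:                  # coarse-to-fine noise levels
        eps = 0.1 * sigma ** 2            # step size shrinks with the noise level
        for _ in range(steps_per_level):
            g = score(x, sigma)
            # Move a little in the improving direction, plus a little noise.
            x = [xi + eps * gi + math.sqrt(2 * eps) * rng.gauss(0.0, 1.0)
                 for xi, gi in zip(x, g)]
    return x


print(refine())
```

The injected noise is what lets different random starts end at different nearby samples rather than collapsing to a single answer.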
For example, imagine randomly placing plates and cutlery on a simulated table, allowing them to physically overlap. Collision-free constraints between objects will cause them to repel each other, while qualitative constraints will move the plate toward the center, align the salad fork with the dinner fork, and so on.
Diffusion models are well suited to this type of continuous constraint satisfaction problem because the influences from multiple models on the position of a single object can be composed to encourage all constraints to be satisfied, Yang explains. By starting with a random initial guess each time, the models can yield a diverse set of good solutions.
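The table-setting example can be sketched with hand-written "push" functions standing in for learned constraint models. This is a simplified illustration under assumed values, not Diffusion-CCSP itself. The object names, `MIN_DIST`, and the step size are all assumptions. Each constraint contributes a small move for the objects it involves, and the moves are summed. Different random starting layouts then settle into different valid arrangements.

```python
import math
import random

MIN_DIST = 1.0  # assumed: objects whose centers are closer than this overlap
NAMES = ["plate", "dinner_fork", "salad_fork"]


def collision_push(p, q):
    """Collision-free constraint: push two objects apart if they are too close."""
    dx, dy = p[0] - q[0], p[1] - q[1]
    d = math.hypot(dx, dy)
    if d >= MIN_DIST or d == 0.0:
        return (0.0, 0.0), (0.0, 0.0)
    k = (MIN_DIST - d) / (2 * d)  # each object moves by half the overlap
    return (k * dx, k * dy), (-k * dx, -k * dy)


def center_pull(p):
    """Qualitative constraint: move the plate toward the table center (0, 0)."""
    return (-p[0], -p[1])


def align_pull(a, b):
    """Qualitative constraint: move the two forks toward the same y-coordinate."""
    dy = b[1] - a[1]
    return (0.0, dy / 2), (0.0, -dy / 2)


def _add(moves, name, v):
    moves[name][0] += v[0]
    moves[name][1] += v[1]


def set_table(seed, steps=500, rate=0.1):
    rng = random.Random(seed)
    # Random initial guess: objects are allowed to overlap.
    pos = {n: [rng.uniform(-3, 3), rng.uniform(-3, 3)] for n in NAMES}
    for _ in range(steps):
        moves = {n: [0.0, 0.0] for n in NAMES}
        for i, a in enumerate(NAMES):
            for b in NAMES[i + 1:]:
                u, v = collision_push(pos[a], pos[b])
                _add(moves, a, u)
                _add(moves, b, v)
        _add(moves, "plate", center_pull(pos["plate"]))
        u, v = align_pull(pos["dinner_fork"], pos["salad_fork"])
        _add(moves, "dinner_fork", u)
        _add(moves, "salad_fork", v)
        # Compose: apply the summed moves from all constraints at once.
        for n in NAMES:
            pos[n][0] += rate * moves[n][0]
            pos[n][1] += rate * moves[n][1]
    return pos


print(set_table(seed=0))
print(set_table(seed=1))
```

Each constraint only adds its own term. Adding another rule therefore means one more contribution inside the loop rather than a redesigned solver.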
Working together
In the case of Diffusion-CCSP, the researchers wanted to capture the interconnectedness of the constraints. In packing, for instance, one constraint might require a certain object to be next to another object, while a second constraint might specify where one of those objects must be located.
Diffusion-CCSP learns a family of diffusion models, with one for each type of constraint. The models are trained together, so they share some knowledge, such as the geometry of the objects to be packed.
The models then work together to find solutions (in this case, locations to place objects) that collectively satisfy the constraints.
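One way to picture a family of constraint models is as a registry keyed by constraint type. In this view, a problem is just a list of constraint instances, and the overall score is the sum of the per-instance terms. In the sketch below, hand-written violation energies for 2D rectangles in a box stand in for the trained diffusion models. Several details are assumptions for illustration: the type names (`in_box`, `no_collision`, `next_to`), the `(x, y)` lower-left pose convention, and the tolerance. A solver would then move the poses downhill on this summed energy.

```python
import math


def in_box(poses, sizes, ids, box):
    """Unary constraint: the rectangle lies inside the box (squared hinge)."""
    (i,) = ids
    (x, y), (w, h) = poses[i], sizes[i]
    bw, bh = box
    v = [max(0.0, -x), max(0.0, -y), max(0.0, x + w - bw), max(0.0, y + h - bh)]
    return sum(t * t for t in v)


def _overlap(a, sa, b, sb):
    """Per-axis overlap (positive) or gap (negative) between two rectangles."""
    ox = min(a[0] + sa[0], b[0] + sb[0]) - max(a[0], b[0])
    oy = min(a[1] + sa[1], b[1] + sb[1]) - max(a[1], b[1])
    return ox, oy


def no_collision(poses, sizes, ids, box):
    """Pairwise constraint: penalize the overlap area of two rectangles."""
    i, j = ids
    ox, oy = _overlap(poses[i], sizes[i], poses[j], sizes[j])
    return max(0.0, ox) * max(0.0, oy)


def next_to(poses, sizes, ids, box, tol=0.1):
    """Pairwise constraint: the two rectangles are within `tol` of each other."""
    i, j = ids
    ox, oy = _overlap(poses[i], sizes[i], poses[j], sizes[j])
    dist = math.hypot(max(0.0, -ox), max(0.0, -oy))
    return max(0.0, dist - tol) ** 2


# One "model" per constraint type, reused for every instance of that type.
CONSTRAINT_MODELS = {"in_box": in_box, "no_collision": no_collision, "next_to": next_to}


def total_violation(problem, poses, sizes, box):
    """Compose: sum the terms of every constraint instance in the problem."""
    return sum(CONSTRAINT_MODELS[ctype](poses, sizes, ids, box) for ctype, ids in problem)


sizes = [(2, 2), (1, 1), (1, 2)]
box = (4, 3)
poses = [(0, 0), (2, 0), (3, 0)]
pairs = [(0, 1), (0, 2), (1, 2)]
problem_a = ([("in_box", (i,)) for i in range(3)]
             + [("no_collision", p) for p in pairs]
             + [("next_to", (0, 1))])
# A different combination built from the same three models.
problem_b = problem_a[:-1] + [("next_to", (0, 2))]
print(total_violation(problem_a, poses, sizes, box))  # 0.0
print(total_violation(problem_b, poses, sizes, box))  # positive: objects 0 and 2 are apart
```

The same three functions score any number or combination of constraint instances. In a simplified way, this mirrors why a compositional approach can handle constraint combinations it was not trained on.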
“We don’t always find the solution on the first try. But if you keep refining the solution and there’s some violation, that should lead you to a better solution. You get guidance when you make a mistake,” she says.
Training individual models for each type of constraint and then combining them to make predictions significantly reduces the amount of data required for training compared to other approaches.
But training these models still requires a large amount of data showing solved problems. Humans would need to solve each problem with traditional, slow methods, making the cost of generating such data prohibitive, Yang says.
Instead, the researchers reversed the process by coming up with solutions first. They used fast algorithms to generate segmented boxes and fit a diverse set of 3D objects into each segment, ensuring tight packing, stable poses, and collision-free solutions.
“This process makes data generation almost instantaneous in simulation. We can generate tens of thousands of environments where we know the problems are solvable,” she says.
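The reversed data-generation step can be sketched in 2D. First, recursively cut a box into segments with random guillotine cuts. Then create one object per segment, slightly smaller than the segment itself. The function names, the cutting rule, and the `margin` and `min_size` values are hypothetical. The sketch also ignores the 3D shapes and stability checks described above. It only shows why every generated scene is solvable by construction: the recorded poses already form a tight, collision-free packing.

```python
import random


def split_box(x, y, w, h, depth, rng, min_size=0.5):
    """Recursively cut a rectangle into segments with random guillotine cuts."""
    if depth == 0 or (w < 2 * min_size and h < 2 * min_size):
        return [(x, y, w, h)]
    # Cut across the longer side so segments stay reasonably shaped.
    if w >= h:
        cut = rng.uniform(min_size, w - min_size)
        return (split_box(x, y, cut, h, depth - 1, rng, min_size)
                + split_box(x + cut, y, w - cut, h, depth - 1, rng, min_size))
    cut = rng.uniform(min_size, h - min_size)
    return (split_box(x, y, w, cut, depth - 1, rng, min_size)
            + split_box(x, y + cut, w, h - cut, depth - 1, rng, min_size))


def make_packing_example(box_w=4.0, box_h=3.0, depth=3, margin=0.02, seed=0):
    """Generate object sizes plus a known solution (their poses) for one box."""
    rng = random.Random(seed)
    segments = split_box(0.0, 0.0, box_w, box_h, depth, rng)
    # Each segment becomes one object, shrunk by a small margin, so the recorded
    # poses are tight, inside the box, and collision-free by construction.
    objects = [(w - 2 * margin, h - 2 * margin) for _, _, w, h in segments]
    poses = [(x + margin, y + margin) for x, y, _, _ in segments]
    return objects, poses, segments


objects, poses, segments = make_packing_example()
print(len(objects), "objects with a known packing")
```

Generating each training example this way costs only a handful of random draws, which is why the approach can produce many solvable environments quickly.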
Diffusion models trained on this data work together to determine where the robotic gripper performing the packing task should place each object so that all constraints are satisfied.
They conducted feasibility studies and then demonstrated Diffusion-CCSP with a real robot solving a range of challenging problems, including fitting 2D triangles into a box, packing 2D shapes while respecting spatial relation constraints, stacking 3D objects while respecting stability constraints, and packing 3D objects using a robot arm.
Their method outperformed other techniques in many experiments, generating a greater number of effective solutions that were both stable and collision-free.
In the future, Yang and her colleagues want to test Diffusion-CCSP in more complicated situations, such as with robots that can navigate a room. They also want to enable Diffusion-CCSP to tackle problems in different domains without the need to be retrained on new data.
“Diffusion-CCSP is a machine learning solution that builds on existing, powerful generative models,” says Danfei Xu, an assistant professor in the School of Interactive Computing at the Georgia Institute of Technology and a research scientist at NVIDIA AI, who was not involved in this work. “It can quickly generate solutions that simultaneously satisfy multiple constraints by composing known individual constraint models. Although still in its early stages, ongoing advances in this approach show promise for enabling more efficient, safe, and reliable autonomous systems in a variety of applications.”
This research was funded in part by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, the MIT-IBM Watson AI Lab, the MIT Quest for Intelligence, the Center for Brains, Minds, and Machines, the Boston Dynamics Artificial Intelligence Institute, the Stanford Institute for Human-Centered Artificial Intelligence, Analog Devices, JPMorgan Chase and Co., and Salesforce.