If someone advises you to “know your limits,” they’re probably suggesting that you do things like exercise in moderation. For a robot, though, the motto means learning constraints, that is, the limitations of a specific task within the machine’s environment, so the work can be done safely and correctly.
For example, imagine asking a robot to tidy up a kitchen when it doesn’t understand the physics of its surroundings. How can the machine generate a practical, multi-step plan to keep the room spotless? Large language models (LLMs) can come close, but if a model is trained solely on text, it will likely miss key details about the robot’s physical limitations, such as how far it can reach or whether there are nearby obstacles to avoid. Stick to just LLMs and you’ll probably end up cleaning pasta stains out of your floorboards.
To help robots perform these open-ended tasks, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) used computer vision models to see what is around the machine and to model its limitations. In the team’s strategy, the LLM sketches out a plan, which is then checked in a simulator to ensure it is safe and realistic. If the sequence of actions is infeasible, the language model generates a new plan, repeating until it reaches one the robot can execute.
This trial-and-error method, which the researchers call “Planning for Robots via Code for Continuous Constraint Satisfaction” (PRoC3S), tests long-horizon plans to make sure they satisfy all constraints, and it enables the robot to perform tasks as diverse as writing individual letters, drawing stars, and placing blocks in different positions. In the future, PRoC3S could help robots complete more intricate tasks in dynamic environments such as homes, where they may be asked to perform a general request made up of many steps (e.g., “make me breakfast”).
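To make the generate-then-check loop concrete, here is a minimal, hypothetical Python sketch of that idea. The object and method names (llm.propose_plan, simulator.check) are placeholders invented for illustration, not the actual PRoC3S code or API.

```python
def plan_with_feedback(task, environment, llm, simulator, max_attempts=10):
    """Ask an LLM for a plan, verify it in simulation, and retry on failure.

    A sketch of the trial-and-error loop described above; `llm` and
    `simulator` are assumed interfaces, not real libraries.
    """
    feedback = None
    for _ in range(max_attempts):
        # The LLM sketches a candidate long-horizon plan, optionally
        # conditioned on feedback from the previous failed attempt.
        plan = llm.propose_plan(task, environment, feedback)

        # The simulator checks whether every step respects the robot's
        # constraints (reach, collisions, kinematic limits, and so on).
        ok, failure_report = simulator.check(plan, environment)
        if ok:
            return plan  # a feasible plan the robot can execute

        # Otherwise, pass the failure information back to the LLM and retry.
        feedback = failure_report
    return None  # no feasible plan found within the attempt budget
```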
“LLMs and classical robotics systems, such as task and motion planners, can’t perform these types of tasks on their own, but their synergy enables open-ended problem-solving,” says graduate student Nishanth Kumar SM ’24, co-lead author of a recent paper on PRoC3S. “We create an ongoing simulation of what is happening around the robot and try out many possible action plans. Vision models help us build a highly realistic digital world that lets the robot reason about feasible actions at every step of a long-horizon plan.”
The team presented its work last month in a paper at the Conference on Robot Learning (CoRL) in Munich, Germany.
Video: “Teaching the robot its limits for open tasks” (MIT CSAIL)
The researchers’ method uses an LLM pre-trained on text from the internet. Before asking PRoC3S to perform a task, the team provided its language model with a sample task (e.g., drawing a square) related to the target task (drawing a star). The sample task includes a description of the activity, a long-horizon plan, and relevant details about the robot’s environment.
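As a rough illustration of how such a worked example might be packaged for the LLM, the snippet below builds a simple few-shot prompt. The format, the EXAMPLE_TASK contents, and the build_prompt helper are assumptions made for this sketch, not the paper’s actual prompt.

```python
# One worked example (drawing a square) shown to the LLM before the
# target task (drawing a star). Numbers and action names are invented.
EXAMPLE_TASK = """\
Task: Draw a square with the arm's marker.
Environment: table surface 0.6 m x 0.8 m, marker already grasped,
reachable workspace limited to x in [0.2, 0.7] m, y in [-0.3, 0.3] m.
Plan:
  1. move_to(0.3, -0.1)
  2. draw_line_to(0.5, -0.1)
  3. draw_line_to(0.5, 0.1)
  4. draw_line_to(0.3, 0.1)
  5. draw_line_to(0.3, -0.1)
"""

def build_prompt(target_task: str, environment: str) -> str:
    """Combine the worked example with the new task description."""
    return (
        "You control a robot arm. Follow the format of the example.\n\n"
        f"{EXAMPLE_TASK}\n"
        f"Task: {target_task}\n"
        f"Environment: {environment}\n"
        "Plan:"
    )

print(build_prompt("Draw a five-pointed star.",
                   "same table and workspace limits as the example"))
```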
But how did these plans work out in practice? In simulations, PRoC3S successfully drew stars and letters eight out of 10 times. It could also arrange digital blocks into pyramids and lines, and precisely place items such as fruit on a plate. In each of these digital demonstrations, the CSAIL method completed the requested task more consistently than comparable approaches such as “LLM3” and “Code as Policies.”
The CSAIL engineers then took their approach to the real world. Their method developed and executed plans on a robotic arm, teaching it to arrange blocks in straight lines. PRoC3S also enabled the machine to place blue and red blocks into matching bowls and to move all objects near the center of the table.
Kumar and co-lead author Aidan Curtis SM ’23, who is also a graduate student at CSAIL, say these findings show how LLMs can develop safer plans that people can trust to work in practice. The researchers envision a home robot that can be given a more general request (e.g., “bring me some chips”) and reliably determine the specific steps needed to complete it. PRoC3S could help such a robot test its plans in an identical digital environment to find a working course of action and, more importantly, bring you a tasty snack.
In future work, the researchers aim to improve these results using a more advanced physics simulator and to extend the approach to more intricate tasks with longer horizons via more scalable data-mining techniques. They also plan to apply PRoC3S to mobile robots, such as a quadruped, for tasks that involve walking and scanning the environment.
“Using foundation models like ChatGPT to control robot actions can lead to unsafe or incorrect behavior because of hallucinations,” says Eric Rosen, a researcher at the AI Institute who was not involved in the work. “PRoC3S tackles this problem by using foundation models for high-level task guidance, while applying AI techniques that explicitly reason about the world to ensure verifiably safe and correct actions. This combination of planning-based and data-driven approaches may be key to developing robots capable of understanding and reliably performing a broader range of tasks than is currently possible.”
Kumar and Curtis’s co-authors are also CSAIL affiliates: MIT undergraduate Jing Cao and MIT Department of Electrical Engineering and Computer Science professors Leslie Pack Kaelbling and Tomás Lozano-Pérez. Their work was supported, in part, by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, the Army Research Office, the MIT Quest for Intelligence, and the AI Institute.