Friday, June 6, 2025

New system enables robots to solve manipulation problems in seconds

Ready for that long-awaited summer vacation? First, you need to pack everything required for your trip into a suitcase, making sure it all fits securely without crushing anything fragile.

Because humans have strong visual and geometric reasoning skills, this is usually a straightforward problem, even if it takes a bit of finessing to squeeze everything in.

For a robot, however, it is an extremely complex planning challenge that requires reasoning simultaneously about many actions, constraints, and mechanical capabilities. Finding an effective solution can take the robot a long time, if it can find one at all.

Researchers from MIT and NVIDIA Research have developed a new algorithm that dramatically speeds up the robot's planning process. Their approach allows the robot to “think ahead” by evaluating thousands of possible solutions in parallel and then refining the best ones to meet the constraints of the robot and its environment.

Instead of testing each potential action one at a time, as many existing approaches do, this new method considers thousands of actions simultaneously, solving multistep manipulation problems in a few seconds.

The researchers harness the massive computing power of specialized processors called graphics processing units (GPUs) to achieve this speed.

In a factory or warehouse, their technique could enable robots to quickly determine how to manipulate and tightly pack objects of different shapes and sizes without damaging them, knocking anything over, or colliding with obstacles, even in a narrow space.

“This would be very helpful in industrial settings, where time really matters and you need to find an effective solution as quickly as possible. If your algorithm takes a few minutes to find a plan, as opposed to seconds, that costs the business money,” says MIT graduate student William Shen SM ’23, lead author of the paper on this technique.

He is joined on the paper by Caelan Garrett ’15, MEng ’15, PhD ’21, a senior research scientist at NVIDIA Research; Nishanth Kumar, an MIT graduate student; Ankit Goyal, a research scientist at NVIDIA; Tucker Hermans, a research scientist at NVIDIA Research and associate professor at the University of Utah; Leslie Pack Kaelbling, a professor of computer science and engineering at MIT and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL); Tomás Lozano-Pérez, a professor of computer science and engineering at MIT and a member of CSAIL; and Fabio Ramos, a principal research scientist at NVIDIA and professor at the University of Sydney. The research will be presented at the Robotics: Science and Systems conference.

Planning in parallel

The researchers' algorithm is designed for what is called task and motion planning (TAMP). The goal of a TAMP algorithm is to produce a task plan for a robot, which is a high-level sequence of actions, along with a motion plan, which contains low-level action parameters, such as joint positions and gripper orientation, that complete the high-level plan.
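To make the two levels concrete, here is a minimal sketch of how a TAMP plan might be represented. It is an illustration only, not cuTAMP's data model: the `Action` and `Plan` classes, the `REQUIRED` table, and the parameter names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical table: which low-level parameters each action type needs.
REQUIRED = {
    "pick": ("grasp_pose", "joint_positions"),
    "place": ("placement_pose", "joint_positions"),
}

@dataclass
class Action:
    """One high-level step of the task plan."""
    name: str   # action type, e.g. "pick" or "place"
    obj: str    # object the action applies to
    # Low-level values that the motion-planning side must fill in.
    params: Dict[str, Tuple[float, ...]] = field(default_factory=dict)

@dataclass
class Plan:
    """A task plan (ordered actions) plus whatever parameters are bound so far."""
    actions: List[Action]

    def unbound_parameters(self) -> List[str]:
        """List every required low-level parameter that has no value yet."""
        missing = []
        for i, action in enumerate(self.actions):
            for key in REQUIRED[action.name]:
                if key not in action.params:
                    missing.append(f"{i}:{action.name}({action.obj}).{key}")
        return missing

# High-level sequence first; low-level parameters are bound afterwards.
plan = Plan([Action("pick", "mug"), Action("place", "mug")])
plan.actions[0].params["grasp_pose"] = (0.0, 0.0, 0.1)
remaining = plan.unbound_parameters()
```

In this picture, the task plan is the list of actions, and planning is finished only when `unbound_parameters()` is empty and every bound value satisfies the constraints.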

To create a plan for packing items in a box, the robot must reason about many variables, such as the final orientation of the packed objects so they fit together, as well as how it will pick them up and manipulate them using its arm and gripper.

It must do this while determining how to avoid collisions and satisfy any constraints set by the user, such as a specific order in which to pack the items.

With so many potential sequences of actions, sampling possible solutions at random and trying them one at a time can take an extremely long time.

“This is a very large search space, and many of the actions the robot takes in that space don't achieve anything productive,” adds Garrett.

Instead, the researchers' algorithm, called cuTAMP, which is accelerated using a parallel computing platform called CUDA, simulates and refines thousands of solutions in parallel. It does this by combining two techniques: sampling and optimization.

Sampling involves choosing a solution to try. But rather than sampling solutions at random, cuTAMP limits the range of potential solutions to those most likely to satisfy the problem's constraints. This modified sampling procedure allows cuTAMP to explore potential solutions broadly while narrowing the sampling space.

“After combining the results of these samples, we get a much better starting point than with random sampling. This helps ensure we find solutions during optimization,” says Shen.
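The effect of narrowing the sampling space can be seen in a toy placement problem. The sketch below is not cuTAMP's sampler; the scene (`BOX`, `WORKSPACE`, `ITEM`) and the function names are assumptions made for illustration. It compares drawing item positions uniformly over the robot's whole workspace with drawing them only from the region where the item lies fully inside the container.

```python
import random

# Made-up 2D scene: all values are illustrative assumptions.
BOX = (0.0, 0.0, 1.0, 1.0)           # container: x_min, y_min, x_max, y_max
WORKSPACE = (-1.0, -1.0, 2.0, 2.0)   # region the arm could reach
ITEM = (0.3, 0.2)                    # item width and height

def inside_box(x, y):
    """True if an item centered at (x, y) lies fully inside the container."""
    w, h = ITEM
    return (BOX[0] + w / 2 <= x <= BOX[2] - w / 2
            and BOX[1] + h / 2 <= y <= BOX[3] - h / 2)

def sample_uniform(rng, n):
    """Baseline: item centers drawn anywhere in the workspace."""
    x0, y0, x1, y1 = WORKSPACE
    return [(rng.uniform(x0, x1), rng.uniform(y0, y1)) for _ in range(n)]

def sample_constrained(rng, n):
    """Draw centers only where the containment constraint already holds."""
    w, h = ITEM
    return [(rng.uniform(BOX[0] + w / 2, BOX[2] - w / 2),
             rng.uniform(BOX[1] + h / 2, BOX[3] - h / 2)) for _ in range(n)]

rng = random.Random(0)
n = 1000
frac_uniform = sum(inside_box(x, y) for x, y in sample_uniform(rng, n)) / n
frac_constrained = sum(inside_box(x, y) for x, y in sample_constrained(rng, n)) / n
print(f"feasible fraction, uniform:     {frac_uniform:.3f}")
print(f"feasible fraction, constrained: {frac_constrained:.3f}")
```

In this toy setup only a small share of the uniform samples satisfies the containment constraint, while the constrained sampler satisfies it by construction, so the later optimization step starts from candidates that have fewer violations left to repair.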

Once it has generated this set of samples, cuTAMP performs a parallelized optimization procedure that computes a cost, which corresponds to how well each sample avoids collisions and satisfies the robot's motion constraints, as well as any user-defined objectives.

It updates the samples in parallel, chooses the best candidates, and repeats the process until it narrows them down to a successful solution.
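The update, select, and repeat loop described above can be sketched on a toy 2D problem. This is a simplified stand-in, not the cuTAMP implementation: the scene, the cost weights, the finite-difference gradient, and the pruning schedule are all assumptions, and the plain Python loop processes candidates one after another where a GPU would update the whole batch at once.

```python
import random

# Made-up scene: a disc obstacle inside a unit box, and a user-preferred spot.
OBSTACLE = (0.5, 0.5, 0.25)   # center x, center y, radius
GOAL = (0.9, 0.9)

def penetration(p):
    """How far point p sits inside the obstacle (0.0 if it is outside)."""
    cx, cy, r = OBSTACLE
    d = ((p[0] - cx) ** 2 + (p[1] - cy) ** 2) ** 0.5
    return max(0.0, r - d)

def cost(p):
    """Collision penalty + box-limit penalty + distance to the user goal."""
    collision = penetration(p) ** 2
    limits = sum(max(0.0, -v) ** 2 + max(0.0, v - 1.0) ** 2 for v in p)
    goal = (p[0] - GOAL[0]) ** 2 + (p[1] - GOAL[1]) ** 2
    return 10.0 * collision + 10.0 * limits + goal

def is_feasible(p):
    return penetration(p) == 0.0 and all(0.0 <= v <= 1.0 for v in p)

def step(p, lr=0.05, eps=1e-5):
    """One gradient-descent update using central finite differences."""
    x, y = p
    gx = (cost((x + eps, y)) - cost((x - eps, y))) / (2 * eps)
    gy = (cost((x, y + eps)) - cost((x, y - eps))) / (2 * eps)
    return (x - lr * gx, y - lr * gy)

def optimize(samples, steps=100, prune_every=25):
    batch = list(samples)
    for i in range(1, steps + 1):
        # Sequential stand-in for a batched GPU update of every candidate.
        batch = [step(p) for p in batch]
        if i % prune_every == 0:
            # Keep the better half of the candidates and continue.
            batch.sort(key=cost)
            batch = batch[:max(1, len(batch) // 2)]
    return batch

rng = random.Random(0)
samples = [(rng.random(), rng.random()) for _ in range(256)]
survivors = optimize(samples)
best = min(survivors, key=cost)
print("best candidate:", best, "feasible:", is_feasible(best))
```

Candidates that start on the far side of the obstacle tend to stall against it, which is one reason to carry many candidates at once: pruning discards the stalled ones while better-placed candidates converge on the goal.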

Harnessing accelerated computing

The researchers use GPUs, specialized processors that are far more powerful for parallel computation and workloads than general-purpose CPUs, to scale up the number of solutions they can sample and optimize simultaneously. This maximized the performance of their algorithm.

“Using GPUs, the computational cost of optimizing one solution is the same as optimizing hundreds or thousands of solutions,” explains Shen.

When they tested their approach on Tetris-like packing challenges in simulation, cuTAMP took only a few seconds to find successful, collision-free plans that might take sequential planning approaches much longer to solve.

And when deployed on a real robotic arm, the algorithm always found a solution in under 30 seconds.

The system works across robots and has been tested on a robotic arm at MIT and a humanoid robot at NVIDIA. Because cuTAMP is not a machine-learning algorithm, it requires no training data, which could enable it to be readily deployed in many situations.

“You can give it a new problem and it will solve it,” says Garrett.

The algorithm generalizes to situations beyond packing, such as a robot using tools. A user could add different types of skills to the system to automatically expand the robot's capabilities.

In the future, the researchers want to use large language models and vision language models within cuTAMP, enabling a robot to formulate and execute a plan that achieves specific goals based on voice commands from the user.

This work is supported, in part, by the National Science Foundation (NSF), the Air Force Office of Scientific Research, the Office of Naval Research, the MIT Quest for Intelligence, NVIDIA, and the Robotics and Artificial Intelligence Institute.