Hundreds of robots zip back and forth across the floor of a colossal robotic warehouse, grabbing items and delivering them to human workers for packing and shipping. Such warehouses are increasingly becoming part of the supply chain in many industries, from e-commerce to automotive production.
However, getting 800 robots to and from their destinations while keeping them from colliding is no easy task. The problem is so complex that even the best pathfinding algorithms struggle to keep up with the breakneck pace of e-commerce and manufacturing.
In some ways, these robots are like cars trying to navigate a crowded city center. So a group of MIT researchers who use artificial intelligence to mitigate traffic congestion applied ideas from that domain to tackle the problem.
They built a deep-learning model that encodes vital information about the warehouse, including the robots, their planned paths, their tasks, and obstacles, and uses it to predict which areas of the warehouse are best to decongest to improve overall efficiency.
Their technique divides the warehouse robots into groups, so that these smaller groups can be decongested faster with traditional algorithms used to coordinate robots. In the end, their method decongests the robots nearly four times faster than a strong random search method.
In addition to streamlining warehouse operations, this deep-learning approach could be applied to other complex planning tasks, such as designing computer chips or routing pipes in large buildings.
“We devised a new neural network architecture that is actually suitable for real-time operations at the scale and complexity of these warehouses. It can encode hundreds of robots in terms of their trajectories, origins, destinations, and relationships with other robots, and it can do this in an efficient manner that reuses computation across groups of robots,” says Cathy Wu, the Gilbert W. Winslow Career Development Assistant Professor in Civil and Environmental Engineering (CEE) and a member of the Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society (IDSS).
Wu, senior author of a paper on this technique, is joined by lead author Zhongxia Yan, a graduate student in electrical engineering and computer science. The work will be presented at the International Conference on Learning Representations.
Robotic Tetris
From a bird’s eye view, the floor of an automated e-commerce warehouse looks a bit like a fast-paced game of “Tetris.”
When a customer order comes in, a robot travels to the appropriate area of the warehouse, grabs the shelf holding the requested item, and delivers it to a human operator, who picks and packs the item. Hundreds of robots do this simultaneously, and if two robots’ paths cross as they move through the massive warehouse, they could collide.
Traditional search-based algorithms avoid potential collisions by keeping one robot on its course and replanning the trajectory of the other. But with so many robots and potential collisions, the problem quickly grows exponentially.
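To make the idea concrete, here is a minimal sketch, not the authors’ code, of how a search-based planner might resolve a conflict by holding one robot’s path fixed and replanning the other. The helper `plan_path` is a hypothetical placeholder for a single-robot planner such as space-time A*.

```python
# Illustrative sketch only: resolve a conflict by keeping robot A on course
# and replanning robot B around it. `plan_path` is a hypothetical single-robot
# planner (e.g., space-time A*) supplied by the caller.

def first_conflict(path_a, path_b):
    """Return the first timestep at which two timed paths occupy the same cell."""
    for t, (cell_a, cell_b) in enumerate(zip(path_a, path_b)):
        if cell_a == cell_b:
            return t
    return None

def resolve_conflict(path_a, path_b, grid, goal_b, plan_path):
    """Keep robot A's plan fixed; replan robot B if the two paths collide."""
    if first_conflict(path_a, path_b) is None:
        return path_a, path_b                      # no conflict, keep both plans
    # Treat A's occupied (cell, timestep) pairs as moving obstacles for B.
    blocked = {(cell, t) for t, cell in enumerate(path_a)}
    new_path_b = plan_path(grid, start=path_b[0], goal=goal_b, blocked=blocked)
    return path_a, new_path_b
```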
“Because the warehouse is operating online, the robots are replanned about every 100 milliseconds. That means that every second, a robot is replanned 10 times. So these operations need to be very fast,” Wu says.
Because time is so critical during replanning, the MIT researchers use machine learning to focus the replanning on the most actionable areas of congestion, those where there is the greatest potential to reduce the robots’ overall travel time.
Wu and Yan built a neural network architecture that considers smaller groups of robots at the same time. For instance, in a warehouse with 800 robots, the network might divide the warehouse floor into smaller groups that contain 40 robots each.
It then predicts which group has the most potential to improve the overall solution if a search-based solver were used to coordinate the trajectories of the robots in that group.
It is an iterative process in which the overall algorithm selects the most promising robot group with the neural network, decongests that group with the search-based solver, then selects the next most promising group with the neural network, and so on.
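A minimal sketch of that loop, under assumed interfaces, might look like the following. The names `score_groups` and `replan_group` are hypothetical stand-ins for the learned model and the search-based solver; neither is the authors’ actual implementation.

```python
# Sketch of the iterative, neural-guided decongestion loop described above.
# `score_groups` stands in for the learned model that predicts each candidate
# group's potential improvement; `replan_group` stands in for a traditional
# search-based solver that re-coordinates only the chosen group.

def guided_decongestion(paths, candidate_groups, score_groups, replan_group,
                        num_iterations=10):
    """Repeatedly pick the most promising robot group and replan only that group."""
    for _ in range(num_iterations):
        scores = score_groups(paths, candidate_groups)   # one score per group
        best = max(range(len(candidate_groups)), key=lambda i: scores[i])
        # Robots outside the chosen group keep their current paths, which
        # act as constraints on the solver.
        paths = replan_group(paths, candidate_groups[best])
    return paths
```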
Considering relationships
The neural network can reason about groups of robots efficiently because it captures the complicated relationships that exist between individual robots. For example, even though one robot may start out far away from another, their paths could still cross during their trips.
The technique also streamlines computation by encoding constraints only once, rather than repeating the process for each subproblem. For instance, in a warehouse with 800 robots, decongesting one group of 40 robots requires holding the other 760 robots as constraints. Other approaches require reasoning about all 800 robots once per group in each iteration.
Instead, the researchers’ approach only requires making inferences about the 800 robots once across all groups in each iteration.
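As a rough illustration of that reuse, and only a sketch under assumed components rather than the authors’ architecture, a shared encoder can run once per iteration over all robots, with a lightweight scoring head applied per candidate group on top of the precomputed embeddings:

```python
# Illustrative sketch of computation reuse: the shared encoder runs once over
# all robots each iteration; a small head then scores each candidate group by
# pooling the precomputed embeddings instead of re-encoding the robots.
import torch
import torch.nn as nn

class GroupScorer(nn.Module):
    def __init__(self, feature_dim=32, hidden_dim=64):
        super().__init__()
        # Shared encoder: applied once per iteration to every robot's features.
        self.encoder = nn.Sequential(
            nn.Linear(feature_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Small head: applied per candidate group, reusing shared embeddings.
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, robot_features, groups):
        # robot_features: (num_robots, feature_dim), e.g., 800 robots.
        embeddings = self.encoder(robot_features)          # computed once
        scores = [self.head(embeddings[list(g)].mean(dim=0)) for g in groups]
        return torch.stack(scores).squeeze(-1)             # one score per group
```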
“The warehouse is one big setting, so a lot of these robot groups will share some aspects of the larger problem. We designed our architecture to make use of this common information,” he adds.
They tested their technique in several simulated environments, including some that resembled warehouses, others with random obstacles, and even maze-like environments that mimic building interiors.
By identifying more effective groups to decongest, their learning-based approach decongests the warehouse up to four times faster than strong non-learning-based approaches. Even after accounting for the additional computational overhead of running the neural network, their approach still solved the problem 3.5 times faster.
In the future, the researchers want to derive simple, rule-based insights from their neural model, since the decisions made by a neural network can be opaque and difficult to interpret. Simpler, rule-based methods could also be easier to implement and maintain in real-world robotic warehouse settings.
“This approach is based on a novel architecture where convolution and attention mechanisms interact effectively and efficiently. Impressively, this leads to being able to take into account the spatiotemporal component of the constructed paths without the need to design problem-specific features. The results are outstanding: not only is it possible to improve on state-of-the-art large neighborhood search methods in terms of solution quality and speed, but the model also generalizes to unseen cases very well,” says Andrea Lodi, the Andrew H. and Ann R. Tisch Professor at Cornell Tech, who was not involved in this research.
This work was supported by Amazon and the Amazon Science Hub at MIT.