MIT researchers have spent more than a decade investigating techniques that enable robots to find and manipulate hidden objects by “seeing” through obstacles. Their methods use wireless signals that penetrate surfaces and bounce off hidden objects.
The researchers are now using generative AI models to overcome a long-standing bottleneck that has limited the accuracy of previous approaches. The result is a new method that enables more accurate shape reconstructions, which could improve a robot’s ability to reliably grasp and manipulate occluded objects.
This new technique partially reconstructs a hidden object from reflected wireless signals, then fills in the missing parts of its shape using a specially trained generative artificial intelligence model.
The researchers also introduced an expanded system that uses generative artificial intelligence to accurately reconstruct an entire room, including all its furniture. The system uses wireless signals sent from a single stationary radar and reflected by people moving through the space.
This overcomes a key challenge facing many existing methods, which require mounting a wireless sensor on a mobile robot to scan the environment. And unlike some popular camera-based techniques, their method protects the privacy of people in the environment.
These innovations could enable warehouse robots to verify packaged products before shipment, reducing the waste that results from product returns. They could also enable smart home robots to understand a person’s location in a room, improving the safety and efficiency of human-robot interaction.
“We have now developed generative artificial intelligence models that help us understand wireless reflections. This opens up many interesting new applications, but from a technical point of view it is also a qualitative leap in capability: from filling in gaps we could not see before, to interpreting reflections and reconstructing entire scenes,” says Fadel Adib, an associate professor in the Department of Electrical Engineering and Computer Science, director of the Signal Kinetics group at the MIT Media Lab, and senior author of two papers on the technique. “We are using artificial intelligence to finally unlock wireless vision.”
Adib is joined on the first paper by lead author and research assistant Laura Dodds, along with research assistants Maisy Lam, Waleed Akbar, and Yibo Cheng; and on the second paper by lead author and former postdoc Kaichen Zhou, Dodds, and research assistant Sayed Saad Afzal. Both papers will be presented at the IEEE Conference on Computer Vision and Pattern Recognition.
Overcoming specularity
Adib’s group has previously demonstrated the use of millimeter wave (mmWave) signals to create accurate reconstructions of 3D objects that are hidden from view, such as a lost wallet buried under a pile.
These waves, which are the same type of signals found in a Wi-Fi network, can pass through common obstacles such as drywall, plastic and cardboard, and bounce off hidden objects.
But mmWave signals are usually specular, meaning a wave reflects in only one direction when it hits a surface. As a result, large portions of the surface reflect signals away from the mmWave sensor, making those areas virtually invisible.
“When we want to reconstruct an object, we can only see the top surface and we can’t see any of the bottom or sides,” explains Dodds.
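The geometry behind this limitation is simple to sketch. For a monostatic sensor (transmitter and receiver co-located), a mirror-like surface patch returns energy only when its normal points almost directly at the sensor. The following illustrative NumPy snippet (function and parameter names are my own, not from the papers) checks that condition for the six faces of a cube viewed from above, and only the top face passes:

```python
import numpy as np

def visible_patches(points, normals, sensor_pos, beamwidth_deg=15.0):
    """Mask of surface patches whose specular reflection returns toward
    a monostatic (co-located TX/RX) sensor: the patch normal must lie
    within `beamwidth_deg` of the direction to the sensor."""
    to_sensor = sensor_pos - points                   # (N, 3) patch -> sensor
    to_sensor /= np.linalg.norm(to_sensor, axis=1, keepdims=True)
    cos_angle = np.sum(normals * to_sensor, axis=1)   # cosine of normal/sensor angle
    return cos_angle > np.cos(np.radians(beamwidth_deg))

# Face centers and outward normals of a unit cube, sensor directly above.
faces = np.array([[0.5, 0.5, 1.0], [0.5, 0.5, 0.0],   # top, bottom
                  [0.0, 0.5, 0.5], [1.0, 0.5, 0.5],   # left, right
                  [0.5, 0.0, 0.5], [0.5, 1.0, 0.5]])  # front, back
normals = np.array([[0, 0, 1], [0, 0, -1],
                    [-1, 0, 0], [1, 0, 0],
                    [0, -1, 0], [0, 1, 0]], dtype=float)
sensor = np.array([0.5, 0.5, 5.0])
mask = visible_patches(faces, normals, sensor)  # only the top face is True
```

This matches Dodds’s description: from above, the top surface is visible while the sides and bottom reflect the signal away from the sensor.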
Scientists have previously used principles of physics to interpret reflected signals, but this limits the accuracy of the reconstructed 3D shape.
In the new papers, the researchers overcome this limitation by using a generative AI model to fill in the parts that are missing from the partial reconstruction.
“But then the challenge becomes: How do we train these models to fill in these gaps?” says Adib.
Typically, researchers use extremely large datasets to train a generative AI model, which is one reason models like Claude and Llama perform so impressively. However, no mmWave dataset is large enough for training.
Instead, the researchers adapted images from large computer vision datasets to mimic the properties of mmWave reflections.
“We simulated the specularity property and the noise produced by these reflections so that we could use existing datasets in our domain. It would take years to collect enough new data to do this,” Lam says.
The researchers embed the physics of mmWave reflections directly into this customized data, creating a synthetic dataset that they use to train a generative AI model to perform accurate shape reconstructions.
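The two effects Lam names, specularity and noise, can be sketched as a simple degradation applied to an existing 3D shape: keep only the points whose normals face the sensor, then jitter the survivors. The (partial, full) pair then serves as one training example for a completion model. This is a minimal illustration under those assumptions, not the papers’ actual simulation pipeline:

```python
import numpy as np

def simulate_mmwave_view(points, normals, sensor_pos,
                         beamwidth_deg=20.0, noise_std=0.005, seed=0):
    """Degrade a full 3D point cloud (e.g. from a vision dataset) into a
    partial, noisy one that mimics a monostatic mmWave measurement:
    (1) specularity keeps only points whose normals face the sensor;
    (2) Gaussian jitter models measurement noise on the survivors."""
    rng = np.random.default_rng(seed)
    d = sensor_pos - points
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    facing = np.sum(normals * d, axis=1) > np.cos(np.radians(beamwidth_deg))
    kept = points[facing]
    return kept + noise_std * rng.standard_normal(kept.shape)

# Unit sphere: surface normals equal the (unit) surface points. With the
# sensor far overhead, only the cap facing it survives; (partial, pts)
# would be one input/target pair for training the completion model.
rng = np.random.default_rng(1)
pts = rng.standard_normal((2000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
partial = simulate_mmwave_view(pts, pts, np.array([0.0, 0.0, 50.0]))
```

With a 20-degree beamwidth, only a few percent of the sphere survives, mirroring how little of a real object a single mmWave view captures.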
The complete system, called Wave-Former, proposes a set of potential object surfaces based on mmWave reflections, feeds them into a generative AI model to complete the shape, and then refines the surfaces until a complete reconstruction is achieved.
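The three stages above can be sketched as a data-flow skeleton. Everything below is a toy stand-in with hypothetical names (the real stages are learned components and a physical sensor); only the propose / complete / refine structure follows the article’s description:

```python
import numpy as np

def propose_surfaces(reflections):
    # Stage 1: candidate surface points; here, just the raw reflection points.
    return np.asarray(reflections, dtype=float)

class ToyCompletionModel:
    # Stage 2 stand-in for the generative model: mirror the visible
    # points through their centroid to guess the object's unseen side.
    def complete(self, partial):
        centroid = partial.mean(axis=0)
        return np.vstack([partial, 2.0 * centroid - partial])

def refine(completed, reflections):
    # Stage 3 placeholder: a real system would enforce consistency with
    # the measured reflections; here the completion is passed through.
    return completed

def wave_former(reflections, model):
    surfaces = propose_surfaces(reflections)
    return refine(model.complete(surfaces), reflections)

# Three visible points on an object's top yield six points once the
# mirrored completion fills in the hidden side.
shape = wave_former([[0, 0, 1], [1, 0, 1], [0, 1, 1]], ToyCompletionModel())
```

The design point the skeleton captures is that the generative model never sees raw radar data: it operates on proposed surfaces, and its output is checked back against the measurements.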
Wave-Former generated faithful reconstructions of approximately 70 everyday objects, such as cans, boxes, kitchen utensils, and fruit, improving accuracy by almost 20 percent over state-of-the-art baselines. The items were hidden behind or under cardboard, wood, drywall, plastic, and fabric.
Seeing “ghosts”
The team used the same approach to build an extended system that fully reconstructs entire indoor scenes by using mmWave reflections from people moving around in the room.
Human movement generates multipath reflections: some millimeter waves bounce off a person, then off a wall or another object, before making their way back to the sensor, Dodds explains.
These secondary reflections create so-called “ghost signals”: reflected copies of the primary signal that change position as the person moves. Ghost signals are usually discarded as noise, but they also carry information about the room’s layout.
“By analyzing how these reflections change over time, we can begin to roughly understand the environment around us. But trying to interpret these signals directly is limited in accuracy and resolution,” says Dodds.
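The geometry of a ghost is that of a mirror image: a single bounce off a planar wall makes the person appear at their reflection across that plane, so as the person walks, the ghost traces a mirrored trajectory that encodes the wall’s position. A minimal sketch of that mirror relation (names are illustrative, not from the paper):

```python
import numpy as np

def ghost_position(person, wall_point, wall_normal):
    """Mirror a position across the plane through `wall_point` with
    normal `wall_normal` -- where a single-bounce multipath 'ghost'
    of the person appears to the radar."""
    n = np.asarray(wall_normal, dtype=float)
    n /= np.linalg.norm(n)
    dist = np.dot(np.asarray(person, dtype=float) - wall_point, n)  # signed distance to wall
    return person - 2.0 * dist * n

# Wall: the plane x = 4 (normal along +x). A person at x = 1 casts a
# ghost at x = 7, i.e. 3 m behind the wall.
g = ghost_position(np.array([1.0, 2.0, 0.0]),
                   wall_point=np.array([4.0, 0.0, 0.0]),
                   wall_normal=np.array([1.0, 0.0, 0.0]))  # -> [7. 2. 0.]
```

The midpoint between the true position and its ghost (here x = 4) lies on the wall, which is why tracking primary and ghost trajectories together reveals room geometry, within the accuracy limits Dodds describes.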
They used a similar training method to teach a generative AI model to interpret rough scene reconstructions and understand the behavior of multipath mmWave reflections. The model fills in the gaps by refining the initial reconstruction until the scene is complete.
They tested their scene reconstruction system, called RISE, using more than 100 human movement trajectories captured by a single mmWave radar. On average, RISE’s reconstructions were approximately twice as accurate as those of existing techniques.
In the future, the researchers want to improve the resolution and detail of their reconstructions. They also want to build large foundation models for wireless signals, analogous to the GPT, Claude, and Gemini foundation models for language and vision, which could open up new applications.
This work is supported in part by the National Science Foundation (NSF), MIT Media Lab, and Amazon.
