Imagine driving through a tunnel in an autonomous vehicle when, unbeknownst to you, a crash has stopped traffic up ahead. Normally, you would need to rely on the car in front of you to know you should start braking. But what if your vehicle could see around the car ahead and apply the brakes even sooner?
Researchers at MIT and Meta have developed a computer vision technique that could one day enable an autonomous vehicle to do just that.
They introduced a method for creating physically accurate 3D models of an entire scene, including areas blocked from view, using images from a single camera position. Their technique uses shadows to determine what lies in obstructed portions of the scene.
They call their approach PlatoNeRF, after Plato’s allegory of the cave, a passage from the Greek philosopher’s “Republic” in which prisoners chained in a cave discern the reality of the outside world from the shadows cast on the cave wall.
By combining lidar (light detection and ranging) technology with machine learning, PlatoNeRF can generate more accurate reconstructions of 3D geometry than some existing artificial intelligence techniques. PlatoNeRF is also better at smoothly reconstructing scenes where shadows are hard to see, such as those with high ambient light or dark backgrounds.
In addition to improving the safety of autonomous vehicles, PlatoNeRF could make AR/VR headsets more efficient by enabling a user to model the geometry of a room without walking around and taking measurements. It could also help warehouse robots find items in cluttered environments faster.
“Our key idea was taking these two things that had been done before in different disciplines and pulling them together: multibounce lidar and machine learning. It turns out that when you bring these two together, that is when you find a lot of new opportunities to explore and get the best of both worlds,” says Tzofi Klinghoffer, an MIT graduate student in media arts and sciences, an affiliate of the MIT Media Lab, and lead author of a paper on PlatoNeRF.
Klinghoffer wrote the paper with his advisor, Ramesh Raskar, associate professor of media arts and sciences and leader of the Camera Culture Group at MIT; senior author Rakesh Ranjan, a director of AI research at Meta Reality Labs; as well as Siddharth Somasundaram of MIT and Xiaoyu Xiang, Yuchen Fan, and Christian Richardt of Meta. The research will be presented at the Conference on Computer Vision and Pattern Recognition.
Shedding light on the problem
Reconstructing a full 3D scene from a single camera viewpoint is a complex problem.
Some machine learning approaches employ generative artificial intelligence models that try to guess what lies in occluded regions, but these models can hallucinate objects that are not actually there. Other approaches attempt to infer the shapes of hidden objects from shadows in a color image, but these methods struggle when shadows are hard to see.
For PlatoNeRF, the MIT researchers built on these approaches using a new sensing modality called single-photon lidar. Lidars map a 3D scene by emitting pulses of light and measuring the time it takes that light to bounce back to the sensor. Because single-photon lidars can detect individual photons, they provide higher-resolution data.
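To make the ranging principle concrete, here is a minimal sketch in Python of how a time-of-flight measurement converts to distance; the function name and example values are illustrative, not taken from the paper.

```python
# Minimal sketch of the time-of-flight principle behind lidar ranging
# (illustrative only; not the authors' pipeline).
C = 299_792_458.0  # speed of light in m/s

def distance_from_time_of_flight(round_trip_time_s: float) -> float:
    """Convert a lidar pulse's round-trip time into a distance.

    The pulse travels to the surface and back, so the one-way
    distance is half the total path length.
    """
    return C * round_trip_time_s / 2.0

# Example: a photon returning after roughly 33.4 nanoseconds
# corresponds to a surface about 5 meters away.
print(distance_from_time_of_flight(33.4e-9))  # ~5.0 m
```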
The researchers use a single-photon lidar to illuminate a target point in the scene. Some light bounces off that point and returns directly to the sensor. However, most of the light scatters and bounces off other objects before returning. PlatoNeRF relies on these second bounces of light.
By calculating how long it takes light to bounce twice and then return to the lidar sensor, PlatoNeRF captures additional information about the scene, including depth. The second bounce of light also carries information about shadows.
The system traces secondary rays of light, those that bounce off the target point and strike other points in the scene, to determine which points lie in shadow (due to an absence of light). Based on the location of these shadows, PlatoNeRF can infer the geometry of hidden objects.
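To illustrate the geometry at play, the sketch below, which assumes a laser and sensor at the same location, shows how a two-bounce arrival time is computed and how a toy shadow test might decide whether a second bounce can reach a given point. The helper names and the `occupancy` callable are hypothetical stand-ins, not the authors’ implementation.

```python
# Hedged sketch of two-bounce lidar geometry, assuming a co-located
# laser and sensor (hypothetical helpers, not the authors' code).
import numpy as np

C = 299_792_458.0  # speed of light in m/s

def two_bounce_time(sensor, lit_point, scene_point):
    """Arrival time for light traveling
    sensor -> illuminated point -> scene point -> sensor."""
    path = (np.linalg.norm(lit_point - sensor)
            + np.linalg.norm(scene_point - lit_point)
            + np.linalg.norm(sensor - scene_point))
    return path / C

def in_shadow(lit_point, scene_point, occupancy, samples=64):
    """Toy shadow test: march along the segment from the illuminated
    point toward a scene point; if any sample hits occupied geometry,
    no second bounce arrives, so the scene point is in shadow.
    `occupancy` is any callable mapping a 3D point to True if solid."""
    for t in np.linspace(0.01, 0.99, samples):
        if occupancy((1 - t) * lit_point + t * scene_point):
            return True
    return False
```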
The lidar illuminates 16 points sequentially, capturing multiple images that are used to reconstruct the entire 3D scene.
“Every time we illuminate a point in the scene, we are creating new shadows. Because we have all these different illumination sources, we have lots of light rays shooting around, so we are carving out the region that is occluded and lies beyond the visible eye,” says Klinghoffer.
A winning combination
The key to PlatoNeRF is the combination of multibounce lidar with a special type of machine learning model known as a neural radiance field (NeRF). A NeRF encodes the geometry of a scene into the weights of a neural network, which gives the model a strong ability to interpolate, or estimate, novel views of the scene.
This interpolation ability also leads to highly accurate scene reconstructions when combined with multibounce lidar, Klinghoffer says.
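As a rough illustration of what it means to encode geometry in network weights, here is a minimal NeRF-style model in PyTorch. It is a generic sketch of the NeRF idea under simplified assumptions (no positional encoding or view direction), not PlatoNeRF’s actual architecture.

```python
# Minimal NeRF-style network: scene geometry lives in the MLP's
# weights, which map a 3D position to color and density.
# Generic sketch only; PlatoNeRF's actual model differs.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # outputs (r, g, b, density)
        )

    def forward(self, xyz: torch.Tensor):
        out = self.mlp(xyz)
        rgb = torch.sigmoid(out[..., :3])  # colors in [0, 1]
        sigma = torch.relu(out[..., 3:])   # non-negative density
        return rgb, sigma

# Because the network can be queried at any continuous 3D point,
# it can interpolate, i.e. render views the camera never captured.
rgb, sigma = TinyNeRF()(torch.rand(1024, 3))
```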
“The biggest challenge was figuring out how to combine these two things. We really had to think about the physics of light transport with multibounce lidar, and how to model that with machine learning,” he says.
They compared PlatoNeRF with two common alternative methods: one that uses only lidar and one that uses only a NeRF with a color image.
They found that their method outperformed both techniques, especially when the lidar sensor had lower resolution. This would make their approach more practical to deploy in the real world, where lower-resolution sensors are common in commercial devices.
“About 15 years ago, our group invented the first camera to ‘see’ around corners, which works by exploiting multiple bounces of light, or ‘echoes of light.’ Those techniques used special lasers and sensors and three bounces of light. Since then, lidar technology has become more mainstream, which led to our research on cameras that can see through fog. This new work uses only two bounces of light, which means the signal-to-noise ratio is very high and the quality of the 3D reconstruction is impressive,” says Raskar.
In the future, the researchers want to try tracking more than two bounces of light to see how that could improve scene reconstructions. In addition, they are interested in applying more deep learning techniques and combining PlatoNeRF with color image measurements to capture texture information.
“While camera-based shadow images have long been studied as a means of 3D reconstruction, this work revisits the problem in the context of lidar, demonstrating significant improvements in the accuracy of hidden geometry reconstruction. The work shows how clever algorithms can enable extraordinary capabilities when combined with ordinary sensors, including the lidar systems that many of us now carry in our pockets,” says David Lindell, an assistant professor in the Department of Computer Science at the University of Toronto, who was not involved in this work.