Monday, December 23, 2024

A novel way to create realistic 3D shapes using generative artificial intelligence

Creating realistic 3D models for applications such as virtual reality, filmmaking, and engineering design can be a tedious process requiring a lot of manual trial and error.

While generative AI models for images can streamline artistic workflows by producing realistic 2D images from text prompts, these models are not designed to generate 3D shapes. To fill this gap, a recently developed technique called Score Distillation Sampling uses 2D image generation models to create 3D shapes, but the results are often blurry or cartoonish.

MIT researchers examined the connections and differences between the algorithms used to generate 2D images and 3D shapes, identifying the main cause of lower-quality 3D models. They then developed a straightforward fix for Score Distillation that can generate sharp, high-quality 3D shapes closer in quality to the best 2D images the model can produce.

Some other methods try to solve this problem by retraining or fine-tuning the generative AI model, which can be expensive and time-consuming.

In contrast, the MIT researchers' technique achieves 3D shape quality comparable to or better than these approaches without additional training or complicated post-processing.

Moreover, by identifying the cause of the problem, the researchers improved the mathematical understanding of score distillation and related techniques, which will enable future work to further improve performance.

“Now we know where we should be heading, which allows us to find more efficient, faster, and higher-quality solutions,” says Artem Lukoianov, an electrical engineering and computer science (EECS) graduate student who is the lead author of a paper on this technique. “In the long run, our work can act as a co-pilot for designers, making it easier to create more realistic 3D shapes.”

Lukoianov’s co-authors include Haitz Sáez de Ocáriz Borde, a graduate student at the University of Oxford; Kristjan Greenewald, a research scientist at the MIT-IBM Watson AI Lab; Vitor Campagnolo Guizilini, a scientist at the Toyota Research Institute; Timur Bagautdinov, a researcher at Meta; and senior authors Vincent Sitzmann, an assistant professor of EECS at MIT who leads the Scene Representation Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL), and Justin Solomon, an associate professor of EECS and leader of CSAIL’s Geometric Data Processing Group. The research results will be presented at the Conference on Neural Information Processing Systems.

From 2D images to 3D shapes

Diffusion models such as DALL-E are a type of generative artificial intelligence model that can generate realistic images from random noise. To train these models, researchers add noise to images and then train the model to reverse the process and remove the noise. The models employ this learned “denoising” process to create images based on the user’s text prompts.
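The noising-then-denoising training setup described above can be sketched in a few lines. This is a toy illustration with made-up blending coefficients, not the code of any real diffusion model; the key point is that if the model's noise prediction is accurate, the reverse step recovers the clean image.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(image, t, noise):
    """Forward process: blend the clean image with Gaussian noise.

    t in (0, 1) controls the noise level (toy schedule)."""
    return np.sqrt(1.0 - t) * image + np.sqrt(t) * noise

def denoise(noisy, t, predicted_noise):
    """Reverse step: subtract the predicted noise to recover the image."""
    return (noisy - np.sqrt(t) * predicted_noise) / np.sqrt(1.0 - t)

image = rng.random((8, 8))            # stand-in for a training image
noise = rng.standard_normal((8, 8))   # the noise the model learns to predict
noisy = add_noise(image, t=0.5, noise=noise)

# With a perfect noise prediction, denoising recovers the image exactly.
recovered = denoise(noisy, t=0.5, predicted_noise=noise)
assert np.allclose(recovered, image)
```

During training, the model only sees `noisy` and learns to output `predicted_noise`; at generation time it starts from pure noise and applies the reverse step repeatedly.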

However, diffusion models cannot directly generate realistic 3D shapes because there is not enough 3D data to train them. To get around this problem, scientists in 2022 developed a technique called Score Distillation Sampling (SDS), which uses a pretrained diffusion model to combine 2D images into a 3D representation.

The technique involves starting with a random 3D representation, rendering a 2D view of the desired object from a random camera angle, adding noise to the image, denoising it using a diffusion model, and then optimizing the random 3D representation to match the denoised image. These steps are repeated until the desired 3D object is generated.
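The loop above can be sketched with toy stand-ins. Everything here is hypothetical: the "3D representation" is just a parameter array, "rendering" adds a little view jitter, and the "denoiser" is a stub that nudges images toward a fixed target, playing the role of the text-conditioned diffusion model.

```python
import numpy as np

rng = np.random.default_rng(1)

target = np.ones((4, 4))          # what the text prompt "asks for" (toy)

def render(params, rng):
    """Render a 2D view from a random camera (toy: small view jitter)."""
    return params + 0.01 * rng.standard_normal(params.shape)

def denoise(noisy_image):
    """Stub denoiser: step the image toward the target, as a trained model
    would step a noisy image toward the prompt's image distribution."""
    return noisy_image + 0.5 * (target - noisy_image)

params = rng.standard_normal((4, 4))   # 1. random initial 3D representation
lr, noise_scale = 0.5, 0.1

for step in range(200):
    view = render(params, rng)                                    # 2. render a view
    noisy = view + noise_scale * rng.standard_normal(view.shape)  # 3. add noise
    cleaned = denoise(noisy)                                      # 4. denoise
    params += lr * (cleaned - view)                # 5. pull the 3D rep toward it

assert np.abs(params - target).mean() < 0.1   # representation matches the prompt
```

In the real algorithm, step 5 backpropagates through a differentiable renderer into a 3D representation such as a neural field, but the optimize-render-denoise cycle is the same.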

However, the 3D shapes created this way appear blurry or oversaturated.

“It has been a bottleneck for some time. We know the underlying model can do better, but people didn’t know why this was happening with 3D shapes,” Lukoianov says.

MIT researchers examined the steps of SDS and identified a discrepancy between the formula that is a key part of the process and its counterpart in 2D diffusion models. The formula tells the model how to update the random representation by adding and removing noise, step by step, to make it look more like the desired image.

Because part of this formula involves an equation that is too complex to solve efficiently, SDS replaces it with randomly sampled noise at each step. MIT researchers found that this noise leads to blurry or cartoonish 3D shapes.

Approximate answer

Instead of trying to precisely solve this troublesome formula, researchers tested approximation techniques until they found the best one. Instead of randomly sampling the noise component, their approximation technique infers the missing component from the current 3D shape rendering.
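A toy contrast may help illustrate why this matters; the stand-ins below are hypothetical and do not reproduce the paper's actual method. Injecting fresh random noise at every step (as plain SDS does) makes successive denoising directions inconsistent, which averages out fine detail, while inferring the noise from the current rendering keeps the steps consistent.

```python
import numpy as np

rng = np.random.default_rng(2)

target = np.ones(16)               # what the prompt "asks for" (toy)

def denoise(noisy):
    """Stub denoiser: step toward the target, as a trained model would."""
    return noisy + 0.5 * (target - noisy)

def run(infer_noise, steps=300):
    """Run a toy distillation loop; return final error vs. the target."""
    params = rng.standard_normal(16)        # random initial representation
    for _ in range(steps):
        if infer_noise:
            # Inferred noise: reconstructed from the current rendering
            # (toy "inversion"), so it is consistent from step to step.
            noise = params - target
        else:
            # Plain SDS: fresh random noise at every step.
            noise = rng.standard_normal(16)
        noisy = params + 0.3 * noise
        params += 0.5 * (denoise(noisy) - params)
    return np.abs(params - target).mean()

# Inferred noise converges much closer to the target than random noise.
assert run(infer_noise=True) < run(infer_noise=False)
```

With random noise, the iterates keep jittering around the target (the blur in this analogy); with inferred noise, the updates all point the same way and the error shrinks steadily.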

“This, as the paper’s analysis predicts, generates 3D shapes that look sharp and realistic,” Lukoianov says.

Additionally, researchers increased the image rendering resolution and adjusted some model parameters to further improve the quality of 3D shapes.

Ultimately, they were able to use an off-the-shelf, pretrained image diffusion model to create sharp, realistic-looking 3D shapes without the need for expensive retraining. The resulting 3D objects are similar in sharpness to those produced by other methods that rely on ad hoc solutions.

“People blindly experiment with different parameters, and sometimes it works and sometimes it doesn’t, but you don’t know why. We know this is the equation we need to solve, which lets us find more effective ways to solve it,” Lukoianov says.

Because their method relies on a pre-trained diffusion model, it inherits the errors and flaws of that model, making it susceptible to hallucinations and other failures. Improving the basic diffusion model would improve this process.

In addition to studying the equation to see how it can be solved more efficiently, the researchers are interested in exploring how these insights could improve image-editing techniques.

This work is funded in part by the Toyota Research Institute, the U.S. National Science Foundation, the Singapore Defense Science and Technology Agency, the U.S. Intelligence Advanced Research Projects Activity, the Amazon Science Hub, IBM, the U.S. Army Research Office, the CSAIL Future of Data program, Wistron Corporation, and the MIT-IBM Watson AI Lab.
