Friday, May 9, 2025

Controlled diffusion model can change material properties in images


Scientists at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) and Google Research may have just performed some digital wizardry — in the form of a diffusion model that can change the material properties of objects in images.

Called Alchemist, the system allows users to alter four attributes of real and AI-generated images: roughness, metallicity, albedo (an object’s initial base color), and transparency. As an image-to-image diffusion model, it lets users input any photo and then adjust each property on a continuous scale from -1 to 1 to create a new image. These photo-editing capabilities could potentially extend to improving models in video games, expanding AI’s capabilities in visual effects, and enriching robotic training data.
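To make the slider interface concrete, here is a minimal sketch in Python of what such an editing call might look like. The `alchemist_edit` function and its signature are hypothetical, invented purely for illustration; the only detail taken from the article is that each of the four properties is adjusted on a continuous scale from -1 to 1.

```python
from PIL import Image

def alchemist_edit(image: Image.Image, *, roughness: float = 0.0,
                   metallic: float = 0.0, albedo: float = 0.0,
                   transparency: float = 0.0) -> Image.Image:
    """Apply relative material edits; each slider value lies in [-1, 1]."""
    sliders = {"roughness": roughness, "metallic": metallic,
               "albedo": albedo, "transparency": transparency}
    for name, value in sliders.items():
        if not -1.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [-1, 1], got {value}")
    # Hypothetical placeholder: the real system would run a conditioned
    # diffusion model here; returning the input keeps the sketch runnable.
    return image

photo = Image.new("RGB", (512, 512))  # stand-in for any input photo
shinier = alchemist_edit(photo, metallic=1.0, roughness=-0.5)
```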

The magic of Alchemist starts with a denoising diffusion model: in practice, the researchers used Stable Diffusion 1.5, a text-to-image model praised for its photorealistic results and editing capabilities. Previous work built on the popular model to allow users to make higher-level changes, such as swapping objects or changing the depth of images. In contrast, the CSAIL and Google Research method applies the model to low-level attributes, revising the finer details of an object’s material properties with a unique, slider-based interface that outperforms its counterparts.
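For a sense of the starting point, the Stable Diffusion 1.5 backbone can be loaded with Hugging Face’s diffusers library, a standard way to run the model (the checkpoint ID below is the commonly used community mirror, not something named in the article). This snippet only produces a base image; Alchemist’s material-aware conditioning is an additional fine-tuning stage that is not shown here.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the Stable Diffusion 1.5 backbone that the researchers built on.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate a photorealistic base image that a material-editing
# model could then modify.
image = pipe("a rubber duck on a wooden table").images[0]
image.save("duck.png")
```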

While earlier diffusion systems could pull the proverbial rabbit out of a hat to produce an image, Alchemist could transform that same animal to look translucent. The system can also make a rubber duck appear metallic, remove the golden hue from a goldfish, and shine an old shoe. Programs like Photoshop have similar capabilities, but this model can change material properties in a more streamlined way. Modifying the metallic look of a photo, for example, requires several steps in the widely used application.

“When you look at an image you’ve created, often the result isn’t exactly what you had in mind,” says Prafull Sharma, an MIT doctoral student in electrical engineering and computer science, CSAIL affiliate, and lead author of a recent paper describing the work. “You want to control the image as you edit it, but the existing controls in image editors cannot change materials. With Alchemist, we capitalize on the photorealism of the outputs from text-to-image models and provide a slider that lets us modify a specific property after the initial image is provided.”

Precise control

“Text-to-image generative models have made it possible for everyday users to generate images as easily as writing a sentence. However, controlling these models can be challenging,” says Jun-Yan Zhu, an assistant professor at Carnegie Mellon University who was not involved in the paper. “While generating a vase is simple, synthesizing a vase with specific material properties such as transparency and roughness requires users to spend hours trying different text prompts and random seeds. This can be frustrating, especially for professional users who require precision in their work. Alchemist presents a practical solution to this challenge, enabling precise control over the materials of an input image while harnessing the data-driven priors of large-scale diffusion models, inspiring future work to seamlessly incorporate generative models into the existing interfaces of commonly used content-creation software.”

Alchemist’s design capabilities could help tweak the appearance of various models in video games. Applying such a diffusion model in this domain could help developers speed up their design process, refining textures to fit the gameplay of a level. Additionally, Sharma and his team’s project could assist with altering graphic design elements, videos, and movie effects to enhance photorealism and achieve the desired material appearance with precision.

The method could also refine robotic training data for tasks such as manipulation. By introducing machines to more textures, they can better understand the diverse items they will grasp in the real world. Alchemist can even potentially help improve image classification, analyzing where a neural network fails to recognize the material changes of an image.

The work of Sharma and his team exceeded similar models at faithfully editing only the requested object of interest. For example, when a user prompted different models to tweak a dolphin to maximum transparency, only Alchemist achieved this feat while leaving the ocean backdrop unedited. When the researchers trained the comparable diffusion model InstructPix2Pix on the same data as their method for comparison purposes, they found that Alchemist achieved superior accuracy scores. Likewise, a user study revealed that the MIT model was preferred and seen as more photorealistic than its counterpart.

Bringing reality to life with synthetic data

According to the researchers, collecting real training data was impractical, since it would require photographing the same objects with systematically varied material properties. Instead, they trained their model on a synthetic dataset, randomly editing the material attributes of 1,200 materials applied to 100 publicly available, unique 3D objects in Blender, a popular computer graphics design tool.
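As a rough illustration, this is what that kind of attribute randomization might look like in Blender’s Python API (bpy), assuming each object carries a node-based Principled BSDF material; the input names follow Blender 3.x, and the researchers’ actual rendering pipeline is not detailed in the article.

```python
import random
import bpy

# Randomize the four material attributes on every mesh in the scene.
for obj in bpy.data.objects:
    if obj.type != 'MESH' or obj.active_material is None:
        continue
    material = obj.active_material
    if not material.use_nodes:
        continue
    bsdf = material.node_tree.nodes.get("Principled BSDF")
    if bsdf is None:
        continue
    bsdf.inputs["Roughness"].default_value = random.random()
    bsdf.inputs["Metallic"].default_value = random.random()
    bsdf.inputs["Base Color"].default_value = (
        random.random(), random.random(), random.random(), 1.0)
    bsdf.inputs["Transmission"].default_value = random.random()

# Render the edited scene to disk as one synthetic training image.
bpy.context.scene.render.filepath = "//render_variant.png"
bpy.ops.render.render(write_still=True)
```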

“Control of generative AI image synthesis has so far been limited by what text can describe,” says Frédo Durand, the Amar Bose Professor of Computer Science in MIT’s Department of Electrical Engineering and Computer Science (EECS) and CSAIL member, who is a senior author of the paper. “This work opens up new, more precise control of visual attributes inherited from decades of computer graphics research.”

“Alchemist is the type of technique needed to make machine learning and diffusion models practical and useful for the CGI and graphic design communities,” adds Mark Matthews, a Google Research senior software engineer and co-author. “Without it, you’re stuck with uncontrollable stochasticity. It can be fun for a while, but at some point you have to do the real work and commit to a creative vision.”

Sharma’s latest project comes a year after he led research on Materialistic, a machine-learning method that can identify similar materials in an image. This previous work demonstrated how AI models can refine their understanding of materials and, like Alchemist, was fine-tuned on a synthetic dataset of 3D models from Blender.

Still, Alchemist currently has a few limitations. The model struggles to infer lighting correctly, so it occasionally fails to follow a user’s input. Sharma notes that the method sometimes generates physically implausible transparencies, too. Imagine, for example, a hand partially inside a cereal box: with this attribute set to maximum, Alchemist renders a see-through container without the fingers showing inside it.

The researchers would like to expand on how such a model could improve 3D graphics assets at the scene level. Alchemist could also help infer material properties from images. According to Sharma, this type of work could unlock links between objects’ visual and mechanical traits in the future.

MIT EECS professor and CSAIL member William T. Freeman is also a senior author, joining Varun Jampani and Google Research scientists Yuanzhen Li PhD ’09, Xuhui Jia, and Dmitry Lagun. The work was supported in part by a grant from the National Science Foundation and donations from Google and Amazon. The group’s work will be highlighted at CVPR in June.
