Generative AI and robotics are bringing us ever closer to the day when we can request an object and have it made in minutes. MIT researchers have developed a speech-to-reality system, an artificial intelligence workflow that lets a robotic arm "bring objects into existence," creating furniture, for example, in about five minutes.
Thanks to the speech-to-reality system, a table-mounted robotic arm can take a spoken request from a person, for example, "I want a simple stool," and then build the object from modular components. So far, the researchers have used the system to create stools, shelves, chairs, a small table, and even decorative objects such as a dog statue.
"We combine natural language processing, 3D generative artificial intelligence, and robotic assembly," says Alexander Htet Kyaw, MIT graduate student and Morningside Academy for Design (MAD) fellow. "These are rapidly developing areas of research that have not previously been connected in a way that allows the creation of physical objects from a simple voice command."
Speech to Reality: on-demand production using natural language, 3D generative AI, and discrete robotic assembly
The idea was born when Kyaw, a graduate student in architecture and in electrical engineering and computer science, took Professor Neil Gershenfeld's course "How to Make (Almost) Anything." It was in that class that he built the first version of the speech-to-reality system. He continued the project at the MIT Center for Bits and Atoms (CBA), led by Gershenfeld, collaborating with graduate students Se Hwan Jeon of the Department of Mechanical Engineering and Miana Smith of CBA.
The speech-to-reality system starts with speech recognition, which processes the user's request using a large language model. Then 3D generative artificial intelligence creates a digital representation of the object as a 3D mesh, and a voxelization algorithm breaks the mesh into assembly components.
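The voxelization step can be pictured with a small sketch. The code below is illustrative only, not the MIT team's implementation: it assumes the object is described by an occupancy function on a unit bounding box and samples it on a regular grid, where each occupied cell becomes one cube-shaped assembly component. The `stool` shape and the grid resolution are invented for the example.

```python
def voxelize(inside, resolution=8):
    """Sample an occupancy function inside(x, y, z) on a regular grid
    and return the set of occupied voxel indices."""
    step = 1.0 / resolution
    voxels = set()
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                # Test the voxel's center point against the shape.
                x, y, z = (i + 0.5) * step, (j + 0.5) * step, (k + 0.5) * step
                if inside(x, y, z):
                    voxels.add((i, j, k))
    return voxels

# Toy stool-like shape: a slab on top, four leg columns at the corners.
def stool(x, y, z):
    seat = z > 0.75
    leg = (x < 0.25 or x > 0.75) and (y < 0.25 or y > 0.75)
    return seat or leg

components = voxelize(stool, resolution=8)
print(len(components), "cube components to assemble")
```

A real pipeline would voxelize the AI-generated mesh directly (for example, by testing grid cells against the mesh surface), but the occupancy-function version keeps the idea self-contained.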
Geometric processing then modifies the AI-generated design to account for real-world manufacturing and physical constraints, such as component counts, overhangs, and geometric connectivity. This is followed by the generation of a feasible assembly sequence and automated path planning for the robot arm to build the physical object the user requested.
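One simple way to respect an overhang constraint during sequencing is to place cubes layer by layer, checking that each cube rests on the build surface or on a cube already placed. The sketch below is an assumption for illustration, not the published method, and the tiny three-cube example is invented.

```python
def assembly_sequence(voxels):
    """Return a bottom-up placement order for a set of (i, j, k) voxel
    indices, or raise if some cube would be unsupported when placed."""
    order = []
    placed = set()
    # Sort by height first, then position, so lower layers go down first.
    for i, j, k in sorted(voxels, key=lambda v: (v[2], v[0], v[1])):
        supported = k == 0 or (i, j, k - 1) in placed
        if not supported:
            raise ValueError(f"unsupported overhang at {(i, j, k)}")
        placed.add((i, j, k))
        order.append((i, j, k))
    return order

# Two base cubes and one stacked cube: the base layer is placed first.
seq = assembly_sequence({(0, 0, 0), (0, 0, 1), (1, 0, 0)})
print(seq)
```

A shape that fails this check, such as a floating slab, is exactly the kind of design the geometric-processing stage would need to modify before assembly.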
Using natural language, the system makes design and production more accessible to people without specialized knowledge in 3D modeling or robot programming. And unlike 3D printing, which can take hours or days, this system builds in minutes.
"This project provides an interface between humans, artificial intelligence, and robots to co-create the world around us," says Kyaw. "Imagine a scenario where you say, 'I want a chair,' and within five minutes a physical chair materializes in front of you."
The team's immediate plan is to improve the load-bearing capacity of the furniture by replacing the magnetic cube connections with stronger joints.
“We have also developed pipelines for transforming voxel structures into feasible assembly sequences for small, distributed mobile robots, which could help translate this work to structures of any scale,” Smith says.
The purpose of using modular components is to reduce the waste involved in making physical objects: an object can be disassembled and its components reassembled into something else, such as turning a sofa into a bed when the sofa is no longer needed.
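The reuse idea can be sketched as a set difference over two voxel layouts: cubes already in the right place stay put, cubes unique to the old object are freed for reuse, and any shortfall must come from new stock. Everything here, including the toy "sofa" and "bed" shapes, is an invented illustration.

```python
def reuse_plan(old, new):
    """Compare two voxel layouts and report which cubes stay, which can
    be reused elsewhere, and how many new cubes must be sourced."""
    keep = old & new                 # cubes already in the right place
    free = old - new                 # cubes to detach and reuse
    need = new - old                 # positions still to fill
    extra = max(len(need) - len(free), 0)  # shortfall in cube stock
    return keep, free, need, extra

sofa = {(i, j, 0) for i in range(3) for j in range(2)}  # 6-cube "sofa"
bed = {(i, j, 0) for i in range(2) for j in range(3)}   # 6-cube "bed"
keep, free, need, extra = reuse_plan(sofa, bed)
print(len(keep), "stay,", len(free), "move,", extra, "new cubes needed")
```

Because both toy objects use six cubes, every freed cube finds a new position and no new material is needed, which is the waste-reduction argument in miniature.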
Because Kyaw also has experience using gesture recognition and augmented reality for interaction with robots during fabrication, he is currently working on adding gesture control alongside speech in the speech-to-reality system.
Drawing on memories of the replicator from the “Star Trek” series and the robots from the animated film “Big Hero 6,” Kyaw explains his vision.
"I want to increase people's access to creating physical objects in a fast, accessible, and sustainable way," he says. "I am working toward a future where matter itself is truly under your control. One where reality can be generated on demand."
The team presented their paper "Speech to Reality: On-Demand Production using Natural Language, 3D Generative AI, and Discrete Robotic Assembly" at the Association for Computing Machinery (ACM) Symposium on Computational Fabrication (SCF '25), held November 21 at MIT.
