Tuesday, January 7, 2025

To interact with the real world, artificial intelligence will gain physical intelligence


The latest AI models are surprisingly human-like in their ability to generate text, audio, and video when prompted. Until now, however, these algorithms have largely remained confined to the digital world rather than the physical, three-dimensional world we live in. Every time we try to apply these models in the real world, even the most sophisticated ones prove difficult to get working properly – think, for example, of how challenging it has been to develop safe and reliable autonomous cars. For all their artificial intelligence, these models not only have no understanding of physics but also often hallucinate, which leads them to make inexplicable mistakes.

This, however, is the year when artificial intelligence will finally jump from the digital world to the real world we live in. Extending AI beyond its digital boundaries requires changing the way machines think, combining the digital intelligence of AI with the mechanical efficiency of robotics. This is what I call “physical intelligence” – a new form of intelligent machine that can understand dynamic environments, cope with unpredictability, and make decisions in real time. Unlike standard AI models, physical intelligence is rooted in physics – in an understanding of basic real-world principles such as cause and effect.

Such capabilities allow physical intelligence models to interact with and adapt to different environments. In my research group at MIT, we develop models of physical intelligence that we call liquid networks. In one experiment, for example, we trained two drones – one powered by a standard AI model and the other by a liquid network – to locate objects in a forest in the summer, using data collected by human pilots. While both drones performed equally well when asked to do exactly what they had been trained to do, only the liquid-network drone succeeded when asked to locate objects in different circumstances – in winter or in urban environments. This experiment showed us that, unlike conventional AI systems that stop evolving after their initial training phase, liquid networks continue to learn and adapt from experience, just as humans do.
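
To make the contrast with fixed-weight models concrete, here is a minimal, hypothetical sketch of a liquid time-constant style update in Python. The single-neuron setup, the parameter names, and the simplified equation are illustrative assumptions, not the models used in our experiments; the point is only that the cell's effective dynamics shift with its input, so its behavior keeps adjusting after training.

    # Minimal, illustrative sketch of a liquid time-constant style neuron update.
    # All names and the simplified single-neuron equation are assumptions made
    # for illustration, not the actual MIT liquid-network code.
    import math

    def ltc_step(state, inp, w_in, w_rec, tau, dt=0.05):
        # The nonlinear drive depends on the current input and the current state.
        drive = math.tanh(w_in * inp + w_rec * state)
        # The state decays toward zero but is pushed toward 1.0 at a rate gated
        # by the drive, so the effective time constant changes with the input.
        dstate = -state / tau + drive * (1.0 - state)
        return state + dt * dstate

    # Usage: feed a changing input stream and the cell's dynamics keep adapting.
    state = 0.0
    for t, inp in enumerate([0.2, 0.8, -0.5, 1.0]):
        state = ltc_step(state, inp, w_in=1.5, w_rec=0.7, tau=1.0)
        print(t, round(state, 4))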

Physical intelligence can also interpret and physically execute complex commands from text or images, bridging the gap between digital instructions and real-world execution. In my lab, for example, we have developed a physically intelligent system that can iteratively design and then 3D print small robots in under a minute from prompts such as “a robot that can walk forward” or “a robot that can grab objects.”

Other labs are also making significant breakthroughs. The robotics startup Covariant, founded by Pieter Abbeel, a researcher at the University of California, Berkeley, is developing chatbots – similar to ChatGPT – that can control robotic arms on request. The company has already raised over $222 million to develop and deploy sorting robots in warehouses around the world. A team at Carnegie Mellon University recently demonstrated that a robot with just one camera and imprecise actuation can perform dynamic, complex parkour moves – including jumping onto obstacles twice its height and across gaps twice its length – using a single neural network trained through reinforcement learning.

If 2023 was the year of text-to-image and 2024 was the year of text-to-video, then 2025 will be the year of physical intelligence, with a new generation of devices – not just robots, but everything from power grids to smart homes – that can interpret what we tell them and carry out tasks in the real world.
