Saturday, April 19, 2025

Intuitive learning of physics in a deep learning model inspired by developmental psychology

Understanding the physical world is a key skill that most people apply effortlessly. It still poses a challenge for AI, however: if we are to deploy safe and helpful systems in the real world, we want these models to share our intuitive sense of physics. Before we can build such models, there is another challenge: how do we measure their ability to understand the physical world? That is, what does it mean to understand the physical world, and how can we quantify that?

Fortunately for us, developmental psychologists have spent decades studying what infants know about the physical world. Along the way, they have carved the vague notion of physical knowledge into a concrete set of physical concepts. They have also developed the violation-of-expectation (VoE) paradigm for testing these concepts in infants, in which surprise at a physically impossible event is taken as evidence that the infant expected the world to behave otherwise.

In our paper published today in the journal Nature Human Behaviour, we extend their work and open-source the Physical Concepts dataset. This synthetic video dataset ports the VoE paradigm to evaluate five physical concepts: solidity, object permanence, continuity, “invariance,” and directional inertia.
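To make the VoE logic concrete, here is a minimal Python sketch of how a single probe could be scored for a model. It assumes that surprise is measured as summed squared prediction error over a video, and that a probe counts as passed when the physically impossible video is more surprising than a matched possible one. The function names and the toy one-dimensional frames are hypothetical and chosen only for illustration; this is not the dataset's evaluation code.

```python
from typing import Sequence

Frame = Sequence[float]  # toy stand-in for one video frame


def surprise(predicted: Sequence[Frame], observed: Sequence[Frame]) -> float:
    """Summed squared prediction error over every frame of one video."""
    total = 0.0
    for pred_frame, obs_frame in zip(predicted, observed):
        total += sum((p - o) ** 2 for p, o in zip(pred_frame, obs_frame))
    return total


def passes_probe(surprise_possible: float, surprise_impossible: float) -> bool:
    """Assumed pass criterion: the impossible video is the more surprising one."""
    return surprise_impossible > surprise_possible


# Hypothetical model prediction: a ball keeps moving one unit per frame.
predicted = [[0.0], [1.0], [2.0], [3.0]]
# Possible video: the ball does exactly that.
possible = [[0.0], [1.0], [2.0], [3.0]]
# Impossible video: the ball jumps back to the start (a continuity violation).
impossible = [[0.0], [1.0], [2.0], [0.0]]

s_possible = surprise(predicted, possible)
s_impossible = surprise(predicted, impossible)
print(s_possible, s_impossible)  # prints: 0.0 9.0
print(passes_probe(s_possible, s_impossible))  # prints: True
```

Comparing the two videos against each other, rather than against a fixed threshold, keeps the score from depending on how hard a scene is to predict in general.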

With a benchmark for physical knowledge in hand, we turned to the task of building a model capable of learning about the physical world. Again, we looked to developmental psychologists for inspiration. Scientists have not only catalogued what infants know about the physical world, but have also posited mechanisms that might enable such behaviour. Despite their variability, these accounts share a central role for the notion of dividing the physical world into a set of objects that evolve over time.

Inspired by this work, we built a system we call PLATO (Physics Learning through Auto-encoding and Tracking Objects). PLATO represents and reasons about the world as a collection of objects. It predicts where objects will be in the future based on where they have been in the past and what other objects they interact with.
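The idea of predicting each object's future from its own past and from the objects around it can be illustrated with a deliberately simple Python sketch. Everything here is an assumption made for illustration: the `ObjectState` type, the constant-velocity rule, and the `min_gap` collision check are hand-written stand-ins, whereas PLATO learns its behaviour from video. This is not PLATO's architecture.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ObjectState:
    x: float   # position along a single axis
    vx: float  # velocity along that axis, in units per frame


def state_from_history(positions: Sequence[float]) -> ObjectState:
    """Estimate an object's current state from where it has been."""
    if len(positions) < 2:
        raise ValueError("need at least two past positions")
    return ObjectState(x=positions[-1], vx=positions[-1] - positions[-2])


def predict_next(
    obj: ObjectState, others: Sequence[ObjectState], min_gap: float = 1.0
) -> ObjectState:
    """Predict one step ahead, taking the other objects into account.

    The object keeps its velocity unless the step would bring it within
    min_gap of another object, in which case it stops (a crude form of
    solidity).
    """
    candidate_x = obj.x + obj.vx
    for other in others:
        if abs(candidate_x - other.x) < min_gap:
            return ObjectState(x=obj.x, vx=0.0)
    return ObjectState(x=candidate_x, vx=obj.vx)


ball = state_from_history([0.0, 1.0])  # ball moving right, one unit per frame
wall = ObjectState(x=2.5, vx=0.0)      # stationary obstacle

print(predict_next(ball, []))      # nothing in the way: the ball keeps moving
print(predict_next(ball, [wall]))  # the wall blocks the step: the ball stops
```

The point of the sketch is the shape of the computation: the prediction for one object is a function of that object's history plus the states of the others, not of the raw scene as a whole.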

After training PLATO on videos of basic physical interactions, we found that it passed the tests in our Physical Concepts dataset. In addition, we trained “flat” models that were as large as (or even larger than) PLATO but did not use object-based representations. When we tested these models, we found that they failed all of our tests. This suggests that objects are helpful for learning intuitive physics, supporting hypotheses from the developmental literature.

We also wanted to determine how much experience is needed to develop this ability. Evidence of physical knowledge has been shown in infants as young as two and a half months. How does PLATO compare? By varying the amount of training data PLATO uses, we found that it can learn our physical concepts after just 28 hours of visual experience. The limited and synthetic nature of our dataset means that we cannot make a like-for-like comparison between the amount of visual experience infants receive and the amount PLATO receives. However, this result suggests that intuitive physics can be learned with relatively little experience if it is supported by an inductive bias towards representing the world as objects.

Finally, we wanted to test PLATO’s ability to generalize. In the Physical Concepts dataset, all objects in our test set are also present in the training set. What if we tested PLATO with objects it had never seen before? To do this, we used a subset of another synthetic dataset developed by MIT researchers. This dataset also probes physical knowledge, albeit with different visual appearances and a set of objects that PLATO had never seen before. PLATO passed, without any retraining, despite being tested on entirely new stimuli.

We hope that this dataset can give researchers a more specific picture of their models’ ability to understand the physical world. In the future, it can be extended to test more aspects of intuitive physics by expanding the list of physical concepts tested and by using richer visual stimuli, including new object shapes and even real-world videos.
