The original version of this story appeared in Quanta Magazine.
Here’s a test for babies: show them a glass of water on a desk, then hide it behind a wooden board. Now move the board toward the glass. If the board keeps moving as if the glass weren’t there, are they surprised? Many 6-month-olds are, and by about a year almost all babies have an intuitive sense of object permanence, learned through observation. Now some AI models do too.
Scientists have developed an artificial intelligence system that learns about the world through videos and exhibits the concept of “surprise” when presented with information that contradicts the knowledge it has accumulated.
The model, created by Meta and called the Video Joint Embedding Predictive Architecture (V-JEPA), makes no built-in assumptions about the physics of the world shown in the videos. Nevertheless, it can begin to understand how the world works.
“Their claims are a priori very plausible and the results are extremely interesting,” says Michael Heilbron, a cognitive scientist at the University of Amsterdam who studies how brains and artificial systems make sense of the world.
Higher abstractions
As engineers building autonomous cars know, ensuring that an AI system reliably understands what it sees can be arduous. Most systems designed to “understand” videos in order to classify their content (for example, “a person playing tennis”) or to identify the outlines of an object, such as a car ahead, operate in what is called “pixel space”: the model essentially treats every pixel in a video as equally important.
However, these pixel space models have limitations. Imagine trying to understand a suburban street. If the scene contains cars, traffic lights, and trees, the model may focus too much on unimportant details such as the movement of leaves, while overlooking the color of the traffic lights or the position of nearby cars. “When you look at photos or videos, you don’t want to work in [pixel] space because there are too many details you don’t want to model,” said Randall Balestriero, a computer scientist at Brown University.
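To make the distinction concrete, here is a minimal toy sketch (not Meta’s actual implementation) contrasting a pixel-space loss, which weights every pixel equally, with a JEPA-style comparison in a compact embedding space. The “encoder” here is just average pooling, an assumption chosen for simplicity; real models learn the encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy video "frames" as 8x8 grayscale pixel grids.
frame_t = rng.random((8, 8))
frame_next = frame_t.copy()
frame_next[0, 0] += 0.5  # one tiny, unimportant change (a "leaf" pixel)

# A pixel-space model scores every pixel equally, so even irrelevant
# detail contributes to the loss.
pixel_loss = np.mean((frame_next - frame_t) ** 2)

# A JEPA-style model instead compares compact embeddings of the frames.
# Stand-in encoder: average-pool each 4x4 block into a 2x2 summary,
# which discards fine pixel detail.
def encode(frame):
    return frame.reshape(2, 4, 2, 4).mean(axis=(1, 3))

latent_loss = np.mean((encode(frame_next) - encode(frame_t)) ** 2)

# Pooling shrinks the influence of the single-pixel difference.
print(pixel_loss > latent_loss)  # → True
```

The point of the sketch is only that an embedding space can suppress fine-grained detail (the single changed pixel) while preserving the coarse structure of the scene, which is the intuition behind moving away from pixel space.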
