Sunday, April 20, 2025

From Motor Control to Embodied Intelligence

Authors

Siqi Liu, Leonard Hasenclever, Steven Bohez, Guy Lever, Zhe Wang, SM Ali Eslami, Nicolas Heess

Using human and animal movements to teach robots to dribble a ball, and simulated humanoid characters to carry boxes and play soccer

A humanoid character learns to navigate an obstacle course through trial and error, which can lead to unusual solutions. Heess et al. “The Emergence of Locomotion Behavior in Rich Environments” (2017).

Five years ago, we took on the challenge of teaching a fully articulated humanoid character to traverse obstacle courses. This demonstrated what reinforcement learning (RL) can achieve through trial and error, but also highlighted two challenges in solving embodied intelligence:

  1. Reusing previously learned behaviors: A significant amount of data was needed for the agent to “get off the ground.” Without prior knowledge of how much force to apply to each of its joints, the agent began by randomly twitching its body and quickly falling to the ground. This problem could be mitigated by reusing previously learned behaviors.
  2. Idiosyncratic behavior: When the agent finally learned to overcome the obstacle courses, it did so with unnatural (if amusing) movement patterns that would be impractical for applications such as robotics.

In this post, we describe a solution to both challenges, called neural probabilistic motor primitives (NPMP), which involves guided learning with movement patterns derived from humans and animals, and discuss how this approach is used in our humanoid football paper, published today in the journal Science Robotics.

We also discuss how the same approach enables whole-body humanoid manipulation from vision, such as a humanoid carrying an object, and control of robots in the real world, such as a robot dribbling a ball.

Distilling Data into Controllable Motor Primitives Using NPMP

An NPMP is a general-purpose motor control module that translates short-horizon motor intentions into low-level control signals. It is trained offline or via RL by imitating motion capture (MoCap) data, recorded with trackers on humans or animals performing movements of interest.

An agent learning to imitate MoCap trajectories (marked in gray).

The model consists of two parts:

  1. An encoder that takes a future trajectory and compresses it into a movement intent.
  2. A low-level controller that, based on the agent’s current state and movement intent, performs the next action.

Our NPMP model first distills the reference data into a low-level controller (left). This low-level controller can then be used as a plug-and-play motor control module in a novel task (right).
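To make the two-part structure concrete, here is a minimal sketch of such an encoder and low-level controller in PyTorch. The network sizes, latent dimension, and KL weight are hypothetical placeholders, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Compresses a short window of future reference states into a motor intent z."""
    def __init__(self, state_dim, horizon, latent_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim * horizon, 256), nn.ReLU(),
            nn.Linear(256, 2 * latent_dim),  # mean and log-variance of z
        )

    def forward(self, future_states):
        # future_states: (batch, horizon, state_dim)
        mean, log_var = self.net(future_states.flatten(1)).chunk(2, dim=-1)
        return mean, log_var

class LowLevelController(nn.Module):
    """Maps the current state and a motor intent z to a low-level action."""
    def __init__(self, state_dim, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),  # e.g. normalized joint targets
        )

    def forward(self, state, z):
        return self.net(torch.cat([state, z], dim=-1))

def distillation_loss(encoder, controller, state, future_states, expert_action, beta=1e-3):
    """One offline distillation step: imitate an expert tracking policy's action
    while keeping z close to a unit-Gaussian prior (variational-style objective)."""
    mean, log_var = encoder(future_states)
    z = mean + torch.randn_like(mean) * (0.5 * log_var).exp()  # reparameterized sample
    recon = ((controller(state, z) - expert_action) ** 2).mean()
    kl = 0.5 * (mean.pow(2) + log_var.exp() - log_var - 1).sum(-1).mean()
    return recon + beta * kl
```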

Once trained, the low-level controller can be reused to learn novel tasks, with a high-level controller optimized to output motor intentions directly. This enables efficient exploration (coherent behaviors are produced even for randomly sampled motor intentions) and constrains the final solution.
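Continuing the sketch above, reuse amounts to freezing the distilled low-level controller and training a fresh high-level controller with RL to emit motor intents. Dimensions below are again placeholders:

```python
# Placeholder dimensions: proprioceptive state, motor intent, action, task observation.
state_dim, latent_dim, action_dim, obs_dim = 60, 20, 21, 80

low_level = LowLevelController(state_dim, latent_dim, action_dim)
low_level.requires_grad_(False)      # plug-and-play: distilled weights stay fixed

high_level = nn.Sequential(          # trained from scratch with RL on the new task
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, latent_dim),
)

obs = torch.randn(1, obs_dim)        # task observation (e.g. ball and goal positions)
state = torch.randn(1, state_dim)    # current proprioceptive state of the body
z = high_level(obs)                  # motor intent replaces the encoder at reuse time
action = low_level(state, z)         # even a random z yields coherent movement
```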

Emergent Team Coordination in Humanoid Football

Football has been a long-standing challenge for research on embodied intelligence, requiring both individual skills and coordinated team play. In our most recent work, we used an NPMP as a prior to guide the learning of movement skills.

The result was a team of players that progressed from learning to chase the ball to learning to coordinate. Previously, in a study with simple embodiments, we showed that coordinated behavior can emerge in teams competing with each other. The NPMP allowed us to observe a similar effect, but in a scenario that required far more advanced motor control.

Agents first imitate soccer player movements to learn the NPMP module (top). Then, using NPMP, agents learn soccer-specific skills (bottom).

Our agents acquired skills including agile locomotion, passing, and division of labor, as demonstrated by a range of statistics, including metrics used in real-world sports analytics. The players exhibit both agile, high-frequency motor control and long-term decision-making that involves anticipating teammates' behavior, leading to coordinated team play.

An agent learns to play soccer competitively using multi-agent RL.
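One common way to obtain this kind of competitive pressure is self-play against earlier snapshots of the same policy. The scaffold below is a hypothetical sketch of that idea, reusing `high_level` from the previous sketch; `play_match` is a stand-in for a full 2v2 physics simulation, which is not shown.

```python
import copy
import random

def play_match(team, opponent):
    # Placeholder for running one 2v2 match in the simulator and returning
    # the score difference from `team`'s perspective.
    return 0.0

snapshot_pool = []  # frozen copies of earlier policy versions

for iteration in range(1000):
    # Competing against past versions of ourselves provides the pressure
    # under which coordinated team play can emerge.
    opponent = random.choice(snapshot_pool) if snapshot_pool else copy.deepcopy(high_level)
    result = play_match(high_level, opponent)
    # ...update `high_level` with multi-agent RL using `result` as reward...
    if iteration % 100 == 0:
        snapshot_pool.append(copy.deepcopy(high_level))
```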

Whole-body manipulation and cognitive tasks using vision

Learning to interact with objects using the arms is another difficult control challenge. The NPMP can also enable this type of whole-body manipulation. With a small amount of MoCap data of interactions with boxes, we were able to train an agent to carry a box from one location to another using egocentric vision and only a sparse reward signal:

Given a small amount of MoCap data (top), our NPMP approach can solve the box-carrying task (bottom).
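As an illustrative sketch of this setup (not the published model): the high-level controller consumes features from an egocentric camera rather than a full state vector, and the task reward is sparse, paying out only once the box reaches its target. The shapes and distance threshold below are assumptions.

```python
# Egocentric vision encoder feeding the high-level controller (shapes assumed).
vision_encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
    nn.Flatten(),
)

frame = torch.randn(1, 3, 64, 64)     # one egocentric RGB observation
features = vision_encoder(frame)      # -> (1, 1152) for these shapes

def box_to_target_reward(box_pos, target_pos, threshold=0.1):
    """Sparse reward: 1 only once the box is within `threshold` of the target."""
    return 1.0 if torch.linalg.norm(box_pos - target_pos) < threshold else 0.0
```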

Similarly, we can teach an agent to catch and throw balls:

A simulated humanoid character catching and throwing a ball.

A simulated humanoid character collecting blue orbs in a maze.

Safe and efficient control of real-world robots

The NPMP can also help control real robots. Well-regularized behavior is critical for activities like walking over uneven terrain or handling fragile objects. Erratic movements can damage the robot itself or its surroundings, or at least drain its battery. Therefore, significant effort is often invested in designing learning objectives that make a robot do what we want while behaving in a safe and efficient manner.

As an alternative, we investigated whether using priors derived from biological motion can give us well-regularized, natural-looking, and reusable movement skills for legged robots, such as walking, running, and turning, that are suitable for deployment on real-world robots.

Starting with MoCap data from humans and dogs, we adapted the NPMP approach to train skills and controllers in simulation that could then be deployed on real humanoid (OP3) and quadruped (ANYmal B) robots, respectively. This allowed the robots to be steered by a user with a joystick, or to dribble a ball to a target location, in a way that looked natural and robust.
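A sketch of what such a deployment loop might look like, folding the user's joystick command into the high-level controller's observation. It reuses `state_dim`, `latent_dim`, and `low_level` from the earlier sketches, and all module sizes remain hypothetical.

```python
cmd_dim = 2  # e.g. desired forward speed and turn rate from the joystick

pilot = nn.Sequential(               # high-level controller conditioned on the command
    nn.Linear(cmd_dim + state_dim, 256), nn.ReLU(),
    nn.Linear(256, latent_dim),
)

def control_step(joystick_cmd, proprio_state):
    obs = torch.cat([joystick_cmd, proprio_state], dim=-1)
    z = pilot(obs)                       # command -> motor intent
    return low_level(proprio_state, z)   # intent -> joint targets for the robot

# Example tick of the control loop with dummy readings:
action = control_step(torch.randn(1, cmd_dim), torch.randn(1, state_dim))
```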

The locomotion skills of the ANYmal robot are acquired by imitation of a MoCap animation of a dog.

Locomotion skills can then be used for controlled walking and dribbling the ball.

Benefits of using neural probabilistic motor primitives

In summary, we have used the NPMP skill model to learn complex whole-body tasks with humanoid characters in simulation and with real-world robots. The NPMP packages low-level movement skills in a reusable form, making it easier to learn useful behaviors that would be difficult to discover through unstructured trial and error. Using motion capture as a source of prior information, it biases motor control learning toward naturalistic movements.

With the NPMP, embodied agents can learn more quickly using RL; learn more naturalistic behaviors; learn safer, more efficient, and more stable behaviors suitable for real-world robotics; and combine whole-body motor control with longer-horizon cognitive skills such as teamwork and coordination.
