Saturday, April 19, 2025

BYOL-Explore: Exploring with Bootstrap Prediction



Published
Authors

Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Avila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot

First-person and top-down views of a BYOL-Explore agent solving the Throw-Across level in DM-HARD-8, while pure RL and other standard exploration methods make no progress on Throw-Across.

Curiosity-driven exploration is the active process of seeking out new information to improve the agent’s understanding of its environment. Suppose the agent has learned a world model that predicts future events from a history of past events. The curiosity-driven agent can then use the mismatch between the world model’s predictions and what actually happens as an intrinsic reward, directing its exploration policy toward novel information. In turn, the agent can use this new information to improve the world model itself so that it makes better predictions. This iterative process can eventually allow the agent to discover all the novelties in the world, and to use this information to build an accurate model of the world.
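The loop described above hinges on turning the world model’s prediction error into an intrinsic reward. The sketch below illustrates this idea in a minimal form; the mean-squared-error reward and the toy observations are illustrative assumptions, not the paper’s exact formulation.

```python
import numpy as np

def intrinsic_reward(predicted_next, actual_next):
    """Mean squared prediction error of the world model, used as a curiosity bonus."""
    return float(np.mean((predicted_next - actual_next) ** 2))

# A poorly predicted (novel) observation yields a larger bonus than a
# well-predicted (familiar) one, steering the policy toward novelty.
pred = np.zeros(4)                              # world model's prediction
novel_obs = np.array([1.0, 1.0, 1.0, 1.0])      # surprising outcome
familiar_obs = np.array([0.1, 0.0, 0.0, 0.0])   # nearly as predicted

assert intrinsic_reward(pred, novel_obs) > intrinsic_reward(pred, familiar_obs)
```

As the model improves at predicting a region of the environment, the bonus there shrinks, so the agent moves on to parts of the world it cannot yet predict.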

Inspired by the successes of bootstrap your own latent (BYOL) – which has been applied in computer vision, graph representation learning, and representation learning in RL – we propose BYOL-Explore: a conceptually simple yet general curiosity-driven agent for solving hard-exploration tasks. BYOL-Explore learns a representation of the world by predicting its own future representation, and uses the representation-level prediction error as an intrinsic reward to train a curiosity-driven policy. BYOL-Explore therefore learns a representation of the world, the dynamics of the world, and the curiosity-driven exploration policy, all by optimizing a single objective: the representation-level prediction error.

Comparison between BYOL-Explore, Random Network Distillation (RND), Intrinsic Curiosity Module (ICM), and pure RL (without intrinsic reward) in terms of mean capped human-normalized score (CHNS).

Despite the simplicity of its design, when applied to DM-HARD-8, a suite of challenging, visually complex 3D hard-exploration tasks, BYOL-Explore outperforms standard curiosity-driven exploration methods such as Random Network Distillation (RND) and the Intrinsic Curiosity Module (ICM) in terms of mean capped human-normalized score (CHNS), averaged across all tasks. Remarkably, BYOL-Explore achieves this performance using a single network trained simultaneously on all tasks, whereas prior work was restricted to the single-task setting and could only make meaningful progress on these tasks when given human expert demonstrations.
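For readers unfamiliar with the metric, the capped human-normalized score rescales an agent’s raw score against random-policy and human reference scores and clips the result to [0, 1]. The helper below follows one common definition from the Atari literature; the reference scores in the example are made up for illustration.

```python
def chns(agent_score, random_score, human_score):
    """Capped human-normalized score: HNS clipped to [0, 1].

    HNS = (agent - random) / (human - random); capping at 1 prevents a few
    super-human tasks from dominating the mean across tasks.
    """
    hns = (agent_score - random_score) / (human_score - random_score)
    return min(max(hns, 0.0), 1.0)

# Example with hypothetical reference scores for a single task:
print(chns(agent_score=5000, random_score=200, human_score=7000))  # ≈ 0.706
```

The per-task CHNS values are then averaged across the benchmark to produce the single number reported in the figures above.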

As further evidence of its generality, BYOL-Explore achieves superhuman performance on the ten hardest-exploration Atari games, while having a simpler design than other competitive agents such as Agent57 and Go-Explore.


Going further, we can generalize BYOL-Explore to highly stochastic environments by learning a probabilistic world model that can be used to generate trajectories of future events. This would allow the agent to model the possible stochasticity of the environment, avoid stochastic traps, and plan its exploration.
