Inspired by advances in large-scale language modeling, we take a similar approach to building a single generalist agent that goes beyond the realm of textual output. The agent, which we call Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm, and much more, deciding based on context whether to output text, joint torques, button presses, or other tokens.
During Gato’s training phase, data from different tasks and modalities are serialized into a flat sequence of tokens, batched, and processed by a transformer neural network similar to a large language model. The loss is masked so that Gato only predicts action and text targets.
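The serialization and loss masking can be sketched as follows. This is an illustrative simplification, not Gato's actual tokenizer: the token values, and the convention of interleaving observation tokens with action tokens per timestep, are assumptions for the example.

```python
# Minimal sketch of Gato-style serialization: flatten (observation, action)
# timesteps into one token sequence, with a parallel loss mask that is 1 only
# on action targets. (Illustrative; not the paper's exact vocabulary layout.)

def serialize_episode(steps):
    """steps: list of (obs_tokens, action_tokens) pairs, one per timestep.

    Returns a flat token list and a mask of the same length, where
    mask[i] == 1 means position i is a prediction target (action token)
    and mask[i] == 0 means it is context only (observation token)."""
    tokens, mask = [], []
    for obs_tokens, action_tokens in steps:
        tokens.extend(obs_tokens)
        mask.extend([0] * len(obs_tokens))     # observations: context only
        tokens.extend(action_tokens)
        mask.extend([1] * len(action_tokens))  # actions: training targets
    return tokens, mask

# Example: two timesteps, 3 observation tokens and 2 action tokens each.
steps = [([101, 102, 103], [7, 8]),
         ([104, 105, 106], [9, 10])]
tokens, mask = serialize_episode(steps)
```

In training, the mask would multiply the per-token cross-entropy so that gradients flow only through action (and text) positions.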
When Gato is deployed, a prompt, such as a demonstration, is tokenized to form the initial sequence. The environment then yields the first observation, which is also tokenized and appended to the sequence. Gato samples the action vector autoregressively, one token at a time.
Once all tokens making up the action vector have been sampled (their number is determined by the environment’s action specification), the action is decoded and sent to the environment, which steps forward and yields a new observation. The procedure then repeats. The model always sees all previous observations and actions within its context window of 1024 tokens.
Gato is trained on a large number of datasets comprising agent experience in both simulated and real-world environments, in addition to a variety of natural language and image datasets. The number of tasks on which the pre-trained Gato model’s performance exceeds a given percentage of the expert score, grouped by domain, is shown here.
The images below also show how the pre-trained Gato model, using the same weights, can caption images, engage in interactive dialogue, and control a robot arm, among other tasks.