Sunday, December 22, 2024

Google DeepMind at NeurIPS 2024


Advancing adaptive AI agents, improving 3D scene creation and pioneering new approaches to LLM training for a smarter, safer future

Next week, artificial intelligence researchers from around the world will gather for the 38th Annual Conference on Neural Information Processing Systems (NeurIPS), taking place December 10–15 in Vancouver.

Two papers led by Google DeepMind researchers will be honored with Test of Time awards for their “undeniable impact” on the field. Ilya Sutskever will present Sequence to Sequence Learning with Neural Networks, co-authored with Google DeepMind vice president Oriol Vinyals and distinguished scientist Quoc V. Le. Google scientist David Warde-Farley and Google DeepMind scientist Ian Goodfellow will present Generative Adversarial Networks.

We’ll also show how we translate our foundational research into real-world applications with live demonstrations, including Gemma Scope, AI for music generation, weather forecasting and more.

Teams at Google DeepMind will present over 100 novel papers on topics ranging from AI agents and generative media to pioneering approaches to learning.

Building adaptive, smart and safe AI agents

LLM-based AI agents show promise in carrying out digital tasks from natural-language commands. However, their success depends on precise interaction with complex user interfaces, which requires extensive training data. With AndroidControl, we provide the most diverse control dataset to date, with over 15,000 human-collected demonstrations across more than 800 apps. AI agents trained on this dataset showed significant performance gains, which we hope will help advance research on AI agents more broadly.
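
To make the shape of such data concrete, here is a minimal sketch of how one demonstration might be represented. The field names (`app`, `goal`, `action_type`, and so on) are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """One low-level UI action within a demonstration (hypothetical schema)."""
    instruction: str   # natural-language description of this step
    action_type: str   # e.g. "tap", "scroll", "type"
    target: tuple      # screen coordinates of the UI element acted on

@dataclass
class Demonstration:
    """A human-collected episode for one high-level task."""
    app: str
    goal: str                          # the overall natural-language command
    steps: list = field(default_factory=list)

demo = Demonstration(app="Clock", goal="Set an alarm for 7 am")
demo.steps.append(Step("Open the alarm tab", "tap", (120, 980)))
demo.steps.append(Step("Tap the add-alarm button", "tap", (540, 1700)))
print(len(demo.steps))  # 2
```

An agent trained on thousands of such episodes learns to map a high-level goal plus the current screen to the next low-level action.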

For AI agents to generalize across tasks, they need to learn from every experience they encounter. We present a method for in-context abstraction learning (ICAL) that helps agents grasp key task patterns and relationships from imperfect demonstrations and natural language feedback, enhancing their performance and adaptability.

A still from a video demonstration of a person preparing a sauce, with each item identified and numbered. ICAL is able to isolate the key aspects of the process.

Developing agentic AI that works to advance user goals can help make technology more useful, but alignment is key when developing AI that acts on our behalf. To this end, we propose a theoretical method for measuring an AI system’s goal-directedness, and also show how a model’s perception of its user can influence its safety filters. Together, these insights underscore the importance of robust safeguards to prevent unintended or unsafe behavior, ensuring that an AI agent’s actions remain consistent with safe, intended uses.

Advances in 3D scene creation and simulation

As demand for high-quality 3D content grows across industries such as gaming and visual effects, creating realistic 3D scenes remains costly and time-consuming. Our recent work introduces new approaches to 3D generation, simulation and control, streamlining content creation for faster, more flexible workflows.

Creating high-quality, realistic 3D assets and scenes often requires capturing and modeling thousands of 2D photos. We present CAT3D, a system that can create 3D content in as little as a minute from any number of images – even a single image or a text prompt. CAT3D achieves this with a multi-view diffusion model that generates additional consistent 2D images from many different viewpoints, then uses those generated images as input to established 3D modeling techniques. The results outperform previous methods in both speed and quality.

CAT3D allows you to create 3D scenes from any number of generated or real images.

From left to right: text to image in 3D, real photo in 3D, multiple photos in 3D.

Simulating scenes with many rigid objects, such as a cluttered tabletop or tumbling Lego bricks, also demands significant processing power. To overcome this obstacle, we present a new technique called SDF-Sim that represents object shapes with signed distance functions in a scalable way, speeding up collision detection and enabling efficient simulation of large, complex scenes.

A complex simulation of hundreds of falling and colliding objects, accurately modeled with SDF-Sim.

AI image generators based on diffusion models struggle to control the 3D position and orientation of multiple objects. Our solution, Neural Assets, introduces object-specific representations that capture both 3D appearance and pose, learned through training on dynamic video data. Neural Assets allows users to move, rotate and swap objects across different scenes – a useful tool for animation, gaming and virtual reality.

Given a source image and an object’s 3D bounding box, we can translate, rotate and scale the object, or move objects and backgrounds between images.

Improving how LLMs learn and respond

We are also improving how LLMs train, learn and respond to users, boosting efficiency and effectiveness on several fronts.

Thanks to longer context windows, LLMs can now learn from potentially thousands of examples at once – known as many-shot in-context learning (ICL). This improves model performance on tasks such as math, translation and reasoning, but often requires high-quality, human-generated data. To make training more cost-effective, we explore methods for adapting many-shot ICL that reduce reliance on manually curated data. With so much data available for training language models, the main constraint for teams building them becomes available compute. We address an important question: given a fixed compute budget, how do you choose the right model size to achieve the best results?
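
Mechanically, many-shot ICL is just a very long prompt: many solved examples concatenated ahead of the new query, with no weight updates. A minimal sketch (the `Q:`/`A:` template is an illustrative choice, not a prescribed format):

```python
def build_many_shot_prompt(examples, query, k=None):
    """Assemble a many-shot in-context learning prompt.

    With long context windows, k can be in the hundreds or thousands;
    here we simply concatenate (input, output) pairs before the query.
    """
    shots = examples if k is None else examples[:k]
    lines = [f"Q: {x}\nA: {y}" for x, y in shots]
    lines.append(f"Q: {query}\nA:")   # the model completes this final answer
    return "\n\n".join(lines)

examples = [("2 + 2", "4"), ("10 - 3", "7"), ("5 * 6", "30")]
prompt = build_many_shot_prompt(examples, "8 + 9")
print(prompt.count("Q:"))  # 4 (three shots plus the query)
```

The research question is then which examples to include and where they come from, since hand-writing thousands of high-quality shots per task does not scale.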

Another pioneering approach, which we call Time-Reversed Language Models (TRLM), explores pretraining and fine-tuning an LLM to operate in reverse. Given a conventional LLM’s response as input, a TRLM generates the query that might have produced it. Paired with a conventional LLM, this method not only helps ensure that responses better follow user instructions, but also improves citation generation for summarized text and strengthens filters against harmful content.
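
One way such a reverse model can be used is as a scorer: a response aligns well with an instruction if the reverse model assigns the instruction high probability given the response. The sketch below assumes a `reverse_logprob(query, response)` interface and substitutes a toy word-overlap scorer for the trained TRLM, purely so the example runs.

```python
import math

def reverse_score(query, response, reverse_logprob):
    """Score instruction-response alignment by asking the time-reversed
    direction: how likely is the query given the response?"""
    return reverse_logprob(query, response)

def toy_reverse_logprob(query, response):
    """Toy stand-in for a trained reverse model: log word overlap."""
    q, r = set(query.lower().split()), set(response.lower().split())
    return math.log(1 + len(q & r))

responses = [
    "Paris is the capital of France.",
    "Bananas are rich in potassium.",
]
query = "What is the capital of France?"
best = max(responses, key=lambda r: reverse_score(query, r, toy_reverse_logprob))
print(best)  # Paris is the capital of France.
```

The same query-given-response direction is what makes the method useful for attributing summary sentences back to source text and for catching responses that satisfy a harmful query even when the forward filter missed it.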

High-quality data curation is essential for training large AI models, but manual curation is difficult at scale. To address this, our joint example selection (JEST) method optimizes training by identifying the most learnable data within larger batches, enabling up to 13x fewer training iterations and 10x less computation, outperforming state-of-the-art multimodal pretraining baselines.
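
A common learnability heuristic in this line of work, shown here as a simplified per-example sketch, prefers data the current learner still gets wrong but a pretrained reference model finds easy. (JEST itself scores and selects examples jointly at the sub-batch level rather than independently, so this top-k version is an illustrative simplification.)

```python
def jest_style_select(learner_loss, reference_loss, k):
    """Pick the k most 'learnable' examples from a super-batch.

    Learnability score = learner_loss - reference_loss: high when the
    learner struggles on data a reference model handles easily.
    """
    scores = [l - r for l, r in zip(learner_loss, reference_loss)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])  # indices of examples to keep for this step

learner_loss = [2.0, 0.1, 3.0, 1.5]
reference_loss = [0.5, 0.1, 2.9, 0.2]
print(jest_style_select(learner_loss, reference_loss, k=2))  # [0, 3]
```

Training only on the selected sub-batch is what yields the reported savings: most of each super-batch is scored cheaply and then discarded.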

Planning tasks present another challenge for AI, especially in stochastic environments where outcomes are influenced by randomness or uncertainty. Researchers use various types of inference for planning, but there is no consistent approach. We show that planning itself can be viewed as a distinct type of probabilistic inference, and propose a framework for ranking different inference techniques based on their planning performance.
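
The planning-as-inference view can be made concrete with a toy example: choosing a plan amounts to estimating P(success | plan) under the environment's randomness, here by plain Monte Carlo rollouts in a made-up simulator (both the environment and the per-action success probabilities are illustrative assumptions, not from the paper).

```python
import random

def evaluate_plan(plan, simulate_fn, trials=2000, seed=0):
    """Estimate P(success | plan) by Monte Carlo rollouts."""
    rng = random.Random(seed)
    return sum(simulate_fn(plan, rng) for _ in range(trials)) / trials

def simulate(plan, rng):
    """Toy stochastic environment: each action succeeds with the given
    probability, and the plan succeeds only if every action does."""
    return all(rng.random() < p for p in plan)

plan_a = [0.9, 0.9]          # two actions, expected success ~0.81
plan_b = [0.95, 0.95, 0.95]  # three actions, expected success ~0.86
best = max([plan_a, plan_b], key=lambda p: evaluate_plan(p, simulate))
print(best == plan_b)  # True
```

Different inference techniques (exact enumeration, sampling, variational approximations) give different estimates of this same quantity, which is what makes ranking them by downstream planning performance meaningful.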

Connecting the global AI community

We are proud to be a Diamond Sponsor of the conference and to support Women in Machine Learning, LatinX in AI and Black in AI in building artificial intelligence, machine learning and data science communities around the world.

If you’re attending NeurIPS this year, visit the Google DeepMind and Google Research booths to see cutting-edge research in demonstrations, workshops and more throughout the conference.
