Exploring AGI, the Challenges of Scaling, and the Future of Multimodal Generative AI
Next week, the artificial intelligence (AI) community will gather for the 2024 International Conference on Machine Learning (ICML). Held July 21–27 in Vienna, Austria, the conference is an international platform for presenting the latest advances, exchanging ideas, and shaping the future of AI research.
This year, teams from across Google DeepMind will present over 80 research papers. At our booth, we will also showcase Gemini Nano, our multimodal on-device model; LearnLM, our recent family of AI models for education; and TacticAI, an AI assistant that can help develop football tactics.
Below are some of our oral presentations, keynote addresses, and posters:
Defining the Path to AGI
What is artificial general intelligence (AGI)? The phrase describes an AI system that is at least as capable as a human at most tasks. As AI models evolve, defining what AGI might look like in practice will become increasingly critical.
We will present a framework for classifying the capabilities and behaviors of AGI models. Based on performance, generality, and autonomy, our paper categorizes systems ranging from non-AI calculators to emerging AI models and other recent technologies.
We will also show that open-endedness is key to building generalized AI that goes beyond human capabilities. While many recent advances in AI have been driven by existing Internet-scale data, open-ended systems can generate new discoveries that extend human knowledge.
At ICML we will present Genie, a model that can generate a variety of playable environments based on text prompts, images, photos, and sketches.
Scaling AI Systems Efficiently and Responsibly
Developing larger and more capable AI models requires more efficient training methods, closer alignment with human preferences, and better privacy safeguards.
We will show how using classification instead of regression makes it easier to scale deep reinforcement learning systems and achieve state-of-the-art performance across domains. We also propose a novel approach that predicts the distribution of consequences of a reinforcement learning agent's actions, helping to quickly evaluate new scenarios.
Our researchers present an approach to maintaining alignment that reduces the need for human supervision, and a new approach to fine-tuning large language models (LLMs) based on game theory that better aligns LLM outputs with human preferences.
We critique the approach of training models on public data and fine-tuning them solely with “differentially private” training, and argue that this approach may not provide the privacy or utility that is often claimed.
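To make the idea of "classification instead of regression" concrete, here is a minimal sketch of one common way to turn a scalar value target into a classification target: a "two-hot" encoding over a fixed set of bins, trained with cross-entropy. This is an illustration under our own assumptions, not necessarily the method used in the paper; the function names, the bin range, and the number of bins are all hypothetical.

```python
import math

def make_support(v_min, v_max, num_bins):
    """Evenly spaced bin centers covering [v_min, v_max] (illustrative choice)."""
    step = (v_max - v_min) / (num_bins - 1)
    return [v_min + i * step for i in range(num_bins)]

def two_hot(target, support):
    """Spread a scalar target over its two nearest bin centers.

    The weights are chosen so the expected value of the resulting
    distribution equals the (clipped) target.
    """
    v_min, v_max = support[0], support[-1]
    target = min(max(target, v_min), v_max)  # clip into the supported range
    step = support[1] - support[0]
    pos = (target - v_min) / step            # fractional bin index
    lower = min(int(math.floor(pos)), len(support) - 1)
    upper = min(lower + 1, len(support) - 1)
    upper_weight = pos - lower
    probs = [0.0] * len(support)
    probs[lower] += 1.0 - upper_weight
    probs[upper] += upper_weight
    return probs

def cross_entropy(target_probs, logits):
    """Cross-entropy between a target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(p * (x - log_z) for p, x in zip(target_probs, logits))

def expected_value(probs, support):
    """Recover a scalar value estimate from a categorical distribution."""
    return sum(p * v for p, v in zip(probs, support))

# Usage: encode a return of 0.25 over five bins in [-1, 1].
support = make_support(-1.0, 1.0, 5)
target_probs = two_hot(0.25, support)
loss = cross_entropy(target_probs, [0.0] * len(support))  # untrained, uniform logits
```

In this sketch the value network would output one logit per bin rather than a single number, and the scalar estimate is read back as the expected value of the predicted distribution.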
VideoPoet is a large language model for zero-shot video generation.
Modern Approaches in Generative AI and Multimodality
Generative AI technologies and multimodal capabilities expand the artistic possibilities of digital media.
We will present VideoPoet, which uses an LLM to generate state-of-the-art video and audio from multimodal inputs including images, text, audio, and other video.
We will also share Genie (Generative Interactive Environments), which can generate a range of playable environments for training AI agents based on text prompts, images, photos, or sketches.
Finally, we present MagicLens, a novel image retrieval system that uses text-based instructions to search for images with richer relationships beyond visual similarity.
Supporting the AI Community
We are proud to sponsor ICML and to foster a diverse AI and machine learning community by supporting initiatives led by Disability in AI, Queer in AI, LatinX in AI, and Women in Machine Learning.
If you’re at the conference, stop by the Google DeepMind and Google Research booths to meet our teams, watch live presentations, and learn more about our research.