Waymo explores using Google’s Gemini to train its robotaxis


Waymo has long touted its ties to Google DeepMind and its decades of AI research as a strategic advantage over its rivals in the autonomous driving space. Now the Alphabet-owned company is taking it a step further by developing a new training model for its robotaxis built on Google’s Gemini multimodal large language model (MLLM).

Waymo today published a new research paper introducing its “End-to-End Multimodal Model for Autonomous Driving,” also known as EMMA. This new end-to-end training model processes sensor data to generate “future autonomous vehicle trajectories,” helping Waymo’s driverless vehicles decide where to go and how to avoid obstacles.

But more importantly, this is one of the first indications that a leader in autonomous driving plans to use MLLMs in its operations. It is also a sign that these models could break free of their current use as chatbots, email organizers, and image generators and find application in an entirely new environment: the road. In its research paper, Waymo proposes “developing an autonomous driving system in which MLLM will be a first-class citizen.”

End-to-End Multimodal Model for Autonomous Driving, also known as EMMA

The paper describes how, historically, autonomous driving systems have developed specific “modules” for various functions, including perception, mapping, prediction, and planning. This approach has proven useful for many years, but it has trouble scaling “due to accumulated errors between modules and limited communication between modules.” Moreover, these modules can struggle to respond to “new environments” because they are, by nature, “predefined,” which can make adaptation difficult.
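To make the “accumulated errors” point concrete, here is a toy Python sketch of a modular stack. It is purely illustrative and is not Waymo’s code: the module names (`perceive`, `predict`, `plan`), the constant-velocity prediction, and the 10-meter safety threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Obstacle:
    position_m: float  # distance ahead of the vehicle, in meters
    speed_mps: float   # closing speed toward the vehicle, in meters per second


def perceive(truth: Obstacle, position_error_m: float, speed_error_mps: float) -> Obstacle:
    """Hypothetical perception module: emits one estimate with no uncertainty attached."""
    return Obstacle(truth.position_m + position_error_m,
                    truth.speed_mps + speed_error_mps)


def predict(obstacle: Obstacle, horizon_s: float) -> float:
    """Hypothetical prediction module: constant-velocity estimate of the future gap."""
    return obstacle.position_m - obstacle.speed_mps * horizon_s


def plan(predicted_gap_m: float, safe_gap_m: float = 10.0) -> str:
    """Hypothetical planner: it only ever sees the single number handed to it."""
    return "brake" if predicted_gap_m < safe_gap_m else "cruise"


truth = Obstacle(position_m=30.0, speed_mps=7.0)
horizon_s = 3.0

# With perfect perception, the predicted gap is 9 m, so the planner brakes.
ideal_gap = predict(truth, horizon_s)
ideal_action = plan(ideal_gap)

# A 1 m position error and a 1 m/s speed error grow into a 4 m gap error
# after prediction, and the planner's decision flips to "cruise".
estimate = perceive(truth, position_error_m=1.0, speed_error_mps=-1.0)
noisy_gap = predict(estimate, horizon_s)
noisy_action = plan(noisy_gap)

print(ideal_gap, ideal_action)  # 9.0 brake
print(noisy_gap, noisy_action)  # 13.0 cruise
```

The planner has no way of knowing how reliable the upstream estimate was, which is the kind of limited communication between modules the paper describes. An end-to-end model sidesteps these hand-built interfaces by mapping sensor data directly to a trajectory.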

Screenshot: Waymo

Waymo developed EMMA as a tool to help its robotaxis navigate complex environments. The company identified several situations in which the model helped its driverless cars find the right route, including encounters with various animals or construction in the road.

Other companies, such as Tesla, have spoken extensively about developing end-to-end models for their autonomous cars. Elon Musk says that the latest version of the Full Self-Driving system (12.5.5) uses an “end-to-end neural network” AI system that translates camera images into driving decisions.

This is a clear indication that Waymo, which leads Tesla in deploying driverless vehicles on the road, is also interested in pursuing an end-to-end system. The company said its EMMA model excels at trajectory prediction, object detection, and road graph understanding.

“This suggests a promising direction for future research that could combine even more basic autonomous driving tasks in a similar scaled-up configuration,” the company said in a blog post today.

However, EMMA also has its limitations, and Waymo acknowledges that further research will be needed before the model is put into practice. For example, EMMA couldn’t incorporate 3D sensor inputs from lidar or radar, which Waymo said was “computationally expensive.” It could also only process a small number of image frames at a time.

There are also risks to using an MLLM to train robotaxis that the research paper does not mention. Chatbots like Gemini often hallucinate or fail at simple tasks such as reading clocks or counting objects. Waymo has very little margin for error when its autonomous vehicles are traveling 40 miles per hour down a busy road. More research will be needed before these models can be deployed at scale, and Waymo is clear about that.

“We hope that our results will inspire further research to alleviate these issues,” writes the company’s research team, “and further develop state-of-the-art architecture for autonomous driving models.”
