For robots to be truly helpful in our everyday lives and industry, they must do more than just follow instructions; they must reason about the physical world. From navigating a cluttered environment to interpreting a pressure gauge, a robot’s “embodied reasoning” allows it to bridge the gap between digital intelligence and physical action.
Today we are introducing Gemini Robotics-ER 1.6, a significant improvement to our embodied reasoning model that enables robots to understand their environments with unprecedented precision. By enhancing spatial reasoning and understanding of multiple viewpoints, we enable a new level of autonomy for the next generation of physical agents.
This model specializes in reasoning abilities key to robotics, including visual and spatial understanding, task planning, and success detection. It acts as a high-level reasoning model for a robot, performing tasks by natively invoking tools such as Google Search for information retrieval, vision-language-action (VLA) models, or other third-party user-defined functions.
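To make the tool-invocation flow concrete, here is a minimal sketch using the function-calling interface of the Python google-genai SDK. The model ID and the move_arm_to tool are illustrative placeholders, not part of this release:

```python
from google import genai
from google.genai import types

# A hypothetical user-defined robot function, exposed to the model as a tool.
move_arm = types.FunctionDeclaration(
    name="move_arm_to",
    description="Move the robot arm to a target point in the camera frame.",
    parameters=types.Schema(
        type=types.Type.OBJECT,
        properties={
            "x": types.Schema(type=types.Type.NUMBER),
            "y": types.Schema(type=types.Type.NUMBER),
        },
        required=["x", "y"],
    ),
)

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # illustrative model ID
    contents="Pick up the mug on the table.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(function_declarations=[move_arm])],
    ),
)

# If the model decided to invoke the tool, the call (name and arguments)
# comes back as a function_call part for your own code to execute.
for part in response.candidates[0].content.parts:
    if part.function_call:
        print(part.function_call.name, dict(part.function_call.args))
```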
Gemini Robotics-ER 1.6 shows significant improvement over both Gemini Robotics-ER 1.5 and Gemini Flash 3.0, in particular improving spatial and physical reasoning skills such as pointing, counting and detecting successes. We are also unlocking a new capability: instrument reading, enabling robots to read complex gauges and sight glasses – a use case we discovered through close collaboration with our partner Boston Dynamics.
Starting today, Gemini Robotics-ER 1.6 is available to developers via the Gemini API and Google AI Studio. To help you get started, we provide a developer cookbook containing examples of setting up the model and prompting it to perform embodied reasoning tasks.
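As a minimal sketch of the kind of embodied reasoning query the cookbook covers (here, pointing at an object in an image), assuming the Python google-genai SDK and an illustrative model ID:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Any scene image will do; the file path here is a placeholder.
with open("scene.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.6",  # illustrative model ID
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Point to the pressure gauge. Answer as JSON: "
        '[{"point": [y, x], "label": "<name>"}], '
        "with coordinates normalized to a 0-1000 scale.",
    ],
)
print(response.text)
```

The normalized [y, x] point format above follows the convention used in earlier Gemini spatial understanding examples; the model's exact output schema is described in the cookbook.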
