Today we are releasing two updated, production-ready Gemini models: Gemini-1.5-Pro-002 and Gemini-1.5-Flash-002, along with:
- >50% reduced price on 1.5 Pro (both input and output for prompts <128K)
- 2x higher rate limits on 1.5 Flash and ~3x higher on 1.5 Pro
- 2x faster output and 3x lower latency
- Updated default filter settings
These new models build on our latest experimental model releases and include meaningful improvements over the Gemini 1.5 models released at Google I/O in May. Developers can access our latest models for free via Google AI Studio and the Gemini API. For larger organizations and Google Cloud customers, the models are also available on Vertex AI.
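As a rough illustration of how the <128K pricing tier above works, here is a minimal sketch. The per-token prices below are hypothetical placeholders, not Gemini's actual list prices; only the 128K threshold comes from the announcement.

```python
# Illustrative sketch of tiered prompt pricing. The prices here are
# hypothetical placeholders, NOT actual Gemini list prices.
PRICE_PER_1M_INPUT = {"short": 1.00, "long": 2.00}  # hypothetical USD per 1M tokens
TIER_THRESHOLD = 128_000  # prompts under 128K tokens fall in the reduced tier

def input_cost(prompt_tokens: int) -> float:
    """Return the input cost in USD for a prompt of the given token count."""
    tier = "short" if prompt_tokens < TIER_THRESHOLD else "long"
    return prompt_tokens / 1_000_000 * PRICE_PER_1M_INPUT[tier]

print(input_cost(100_000))  # priced at the reduced <128K tier
```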
Improved overall quality, with greater gains in math, long context, and vision
With the latest updates, 1.5 Pro and Flash are now better, faster, and more cost-efficient to build with in production. We see a ~7% increase on MMLU-Pro, a more challenging version of the popular MMLU benchmark. On the MATH and HiddenMath (an internal holdout set of competition math problems) benchmarks, both models made a considerable ~20% improvement. For vision and code use cases, both models also perform better (ranging from ~2-7%) across evals measuring visual understanding and Python code generation.
We’ve also improved the overall helpfulness of model responses, while continuing to uphold our content safety policies and standards. This means fewer punts/refusals and more helpful responses across many topics.
In response to developer feedback, both models now have a more concise style, intended to make them easier to use and reduce costs. For use cases like summarization, question answering, and extraction, the default output length of the updated models is ~5-20% shorter than previous models. For chat-based products where users may prefer longer responses by default, you can read our prompting strategies guide to learn more about how to make the models more verbose and conversational.
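One simple way to steer the more concise -002 models back toward longer answers is to add an explicit verbosity instruction to each prompt. A minimal sketch; the instruction wording here is our own illustration, not taken from Google's prompting strategies guide:

```python
# Sketch: prepend an explicit verbosity instruction so the more concise
# -002 models produce longer, more conversational answers. The instruction
# text is illustrative, not taken from Google's prompting guide.
VERBOSE_PREAMBLE = (
    "Answer in a detailed, conversational style, with examples where helpful.\n\n"
)

def with_verbose_style(prompt: str) -> str:
    """Wrap a user prompt with an instruction requesting detailed output."""
    return VERBOSE_PREAMBLE + prompt

print(with_verbose_style("Summarize the attached report."))
```

The same instruction could equally be supplied once as a system instruction rather than per prompt.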
For more details on migrating to the latest versions of Gemini 1.5 Pro and 1.5 Flash, please refer to the Gemini API models page.
Gemini 1.5 Pro
Increased rate limits
To make it even easier for developers to build with Gemini, we’re increasing the rate limits for 1.5 Flash to 2,000 RPM and 1.5 Pro to 1,000 RPM, up from 1,000 and 360, respectively. We expect further increases in the coming weeks. See the Gemini API rate limits documentation to learn more, so you can build even more with Gemini.
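With per-minute quotas like these, it is common to throttle on the client side rather than rely on server-side errors. A minimal sliding-window limiter sketch; the limiter is our own illustration, not part of any Gemini SDK:

```python
import time
from collections import deque

class RpmLimiter:
    """Simple sliding-window limiter for a requests-per-minute quota."""

    def __init__(self, rpm: int, clock=time.monotonic):
        self.rpm = rpm      # e.g. 1000 for 1.5 Pro, 2000 for 1.5 Flash
        self.clock = clock  # injectable clock, handy for testing
        self.sent = deque() # timestamps of requests in the last 60 seconds

    def acquire(self) -> None:
        """Block until one more request fits under the per-minute quota."""
        now = self.clock()
        # Drop timestamps older than the 60-second window.
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()
        if len(self.sent) >= self.rpm:
            # Wait for the oldest request to age out of the window.
            time.sleep(60 - (now - self.sent[0]))
            return self.acquire()
        self.sent.append(now)
```

Call `limiter.acquire()` immediately before each API request; requests beyond the quota simply wait until the window frees up.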
2x faster output and 3x lower latency
In addition to the core improvements to our latest models, over the past few weeks we have driven down latency with 1.5 Flash and significantly increased output tokens per second, enabling new use cases with our most powerful models.
Updated filter settings
Since the first Gemini launch in December 2023, building a safe and reliable model has been a key focus. With the latest versions of Gemini (-002 models), we’ve improved the model’s ability to follow user instructions while balancing safety. We will continue to offer a suite of safety filters that developers may apply to Google’s models. For the models released today, the filters will not be applied by default so that developers can determine the configuration best suited for their use case.
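Because the filters are now off by default, developers who want them must opt in explicitly when configuring the model. A sketch of building a `safety_settings` payload as plain dicts; the category and threshold names follow the Gemini API's published enums, and the SDK call shown in the trailing comment is how such a payload would typically be supplied:

```python
# Opt in to Gemini's configurable safety filters explicitly, since the
# -002 models do not apply them by default. Category and threshold names
# follow the Gemini API's documented enums.
HARM_CATEGORIES = [
    "HARM_CATEGORY_HARASSMENT",
    "HARM_CATEGORY_HATE_SPEECH",
    "HARM_CATEGORY_SEXUALLY_EXPLICIT",
    "HARM_CATEGORY_DANGEROUS_CONTENT",
]

def make_safety_settings(threshold: str = "BLOCK_MEDIUM_AND_ABOVE"):
    """Build a safety_settings list applying one threshold to every category."""
    return [{"category": c, "threshold": threshold} for c in HARM_CATEGORIES]

# With the google-generativeai SDK this would be passed as, e.g.:
# model = genai.GenerativeModel("gemini-1.5-flash-002",
#                               safety_settings=make_safety_settings())
```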
Gemini 1.5 Flash-8B Experimental Updates
We are releasing a further-improved version of the Gemini 1.5 Flash-8B model we announced in August, called “Gemini-1.5-Flash-8B-Exp-0924”. This improved version includes significant performance increases across both text and multimodal use cases. It is available now via Google AI Studio and the Gemini API.
The overwhelmingly positive feedback developers have shared about 1.5 Flash-8B has been incredible to see, and we will continue to shape our experimental-to-production release pipeline based on developer feedback.
We’re excited about these updates and can’t wait to see what you build with the new Gemini models! And for Gemini Advanced users, you will soon be able to access a chat-optimized version of Gemini 1.5 Pro-002.