Gemini breaks fresh ground with a faster model, longer context, AI agents and more


1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more. This is because it was trained by 1.5 Pro through a process called “distillation”, in which the most essential knowledge and skills from the larger model are transferred to a smaller, more efficient model.
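To make the idea of distillation more concrete, here is a minimal sketch of the classic soft-target distillation loss, in which a smaller "student" model is trained to match the output distribution of a larger "teacher". This is a generic textbook formulation, not a description of how 1.5 Flash was actually trained; the function names, the temperature value, and the example logits are all assumptions made for illustration.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_sum for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's.

    Hypothetical helper: a higher temperature softens both distributions so
    the student also learns from the teacher's less likely outputs. The T^2
    factor is the usual rescaling that keeps gradient magnitudes comparable
    across temperatures.
    """
    teacher_log_p = log_softmax(teacher_logits, temperature)
    student_log_q = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lp) * (lp - lq)
        for lp, lq in zip(teacher_log_p, student_log_q)
    )
    return kl * temperature ** 2


# Illustrative logits for a three-way prediction (made-up numbers).
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge, so minimizing it pushes the smaller model toward the larger model's behavior.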

Read more about 1.5 Flash in our updated Gemini 1.5 technical report and on the Gemini technology website, and learn more about 1.5 Flash availability and pricing.

Significantly improved 1.5 Pro

Over the past few months, we’ve significantly improved 1.5 Pro, our best model for general performance across a wide range of tasks.

In addition to expanding the context window to 2 million tokens, we have improved code generation, logical reasoning and planning, multi-turn conversation, and audio and video understanding thanks to advances in data and algorithms. For each of these tasks, we see significant improvements on both public and internal benchmarks.

1.5 Pro can now follow increasingly complex and nuanced instructions, including those that specify product-level behavior such as role, format, and style. We’ve improved control over the model’s responses for specific use cases, such as crafting a chat agent’s persona and response style or automating workflows through multiple function calls. We have also enabled users to steer the model’s behavior by setting system instructions.

We have added audio understanding to the Gemini API and Google AI Studio, so 1.5 Pro can now analyze both the video and the audio of videos uploaded to Google AI Studio. We are now integrating 1.5 Pro into Google products, including Gemini Advanced and Workspace apps.

Read more about 1.5 Pro in our updated Gemini 1.5 technical report and on the Gemini technology website.

Gemini Nano supports multimodal inputs

Gemini Nano is expanding beyond text-only input to include images as well. Starting with Pixel, apps using Gemini Nano with Multimodality will be able to understand the world the way people do: not just through text, but also through sight, sound and spoken language.

Read more about Gemini 1.0 Nano for Android.
