New capabilities in Gemini 2.5
Native audio output and Live API improvements
Today, the Live API is introducing a preview of audio-visual input and native audio output dialogue, so you can directly build conversational experiences with a more natural and expressive Gemini.
It also lets you steer the model's tone, accent and speaking style. For example, you can tell the model to use a dramatic voice when telling a story. It also supports tool use, so the model can search on your behalf.
You can experiment with a set of early features including:
- Affective dialogue, in which the model detects emotions in the user’s voice and responds accordingly.
- Proactive Audio, where the model will ignore background conversations and know when to respond.
- Live API thinking, where the model leverages Gemini’s thinking capabilities to handle more complicated tasks.
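As a rough illustration of how these early features might be switched on for a session, the sketch below assembles a session configuration as a plain Python dictionary. It does not call any SDK. The helper `build_live_config` is hypothetical, and the field names (`response_modalities`, `enable_affective_dialog`, `proactivity`, `system_instruction`) are assumptions modeled on the public Gemini API; check the current Live API reference before relying on them.

```python
from typing import Any, Dict, Optional


def build_live_config(
    affective_dialog: bool = False,
    proactive_audio: bool = False,
    style_instruction: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble a hypothetical Live API session config as a plain dict.

    All field names are assumptions, not confirmed by this post.
    """
    # Ask for spoken (native audio) responses.
    config: Dict[str, Any] = {"response_modalities": ["AUDIO"]}
    if affective_dialog:
        # Assumed flag: respond to the emotion detected in the user's voice.
        config["enable_affective_dialog"] = True
    if proactive_audio:
        # Assumed flag: let the model decide when a reply is warranted.
        config["proactivity"] = {"proactive_audio": True}
    if style_instruction:
        # Tone, accent and speaking style are steered in natural language.
        config["system_instruction"] = style_instruction
    return config


live_config = build_live_config(
    affective_dialog=True,
    proactive_audio=True,
    style_instruction="Tell stories in a dramatic voice.",
)
```

Keeping each feature behind its own optional flag reflects their preview status: a session can opt in to one without depending on the others.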
We’re also rolling out new text-to-speech previews for 2.5 Pro and 2.5 Flash. They offer first-of-its-kind multi-speaker support, enabling text-to-speech with two voices via native audio output.
Like native audio dialogue, text-to-speech is crisp and can capture very subtle nuances, such as whispers. It works in more than 24 languages and switches between them seamlessly.
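To make the two-voice setup concrete, here is a minimal sketch of what a multi-speaker text-to-speech request body could look like. It only builds a dictionary and sends nothing. The helper `build_multi_speaker_request` is hypothetical, and the payload shape (`speechConfig`, `multiSpeakerVoiceConfig`, `speakerVoiceConfigs`, `prebuiltVoiceConfig`) and the voice names are assumptions based on the public Gemini API, not details confirmed in this post.

```python
from typing import Any, Dict


def build_multi_speaker_request(script: str, voices: Dict[str, str]) -> Dict[str, Any]:
    """Build a hypothetical multi-speaker TTS request body.

    `voices` maps a speaker name used in the script to a voice name.
    Field names and voice names are assumptions for illustration.
    """
    # The preview described above supports two voices, so enforce that here.
    if len(voices) != 2:
        raise ValueError("multi-speaker text-to-speech expects exactly two speakers")

    speaker_configs = [
        {
            "speaker": speaker,
            "voiceConfig": {"prebuiltVoiceConfig": {"voiceName": voice}},
        }
        for speaker, voice in voices.items()
    ]
    return {
        "contents": [{"parts": [{"text": script}]}],
        "generationConfig": {
            "responseModalities": ["AUDIO"],
            "speechConfig": {
                "multiSpeakerVoiceConfig": {"speakerVoiceConfigs": speaker_configs}
            },
        },
    }


script = "Ana: (whispering) Did you hear that?\nBen: Hear what?"
request = build_multi_speaker_request(script, {"Ana": "Kore", "Ben": "Puck"})
```

The speaker names in the script are matched to voices by label, so style cues such as a whisper can be written inline in natural language.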
