Wednesday, December 25, 2024

Gemini Live First Look: Better Than Talking to Siri, But Worse Than I’d Like


Google launched Gemini Live at its Made by Google event on Tuesday. The feature lets you have a semi-natural spoken conversation, rather than a typed one, with an AI chatbot powered by Google’s latest large language model. TechCrunch was on hand to test it out firsthand.

Gemini Live is Google’s answer to OpenAI’s Advanced Voice Mode, a nearly identical ChatGPT feature that’s currently in limited alpha testing. While OpenAI beat Google to the punch by demonstrating the feature first, Google is the first to roll out a finalized version.

In my experience, these low-latency voice features feel much more natural than texting with ChatGPT, or even talking to Siri or Alexa. I found that Gemini Live answered questions in under two seconds and pivoted fairly quickly when interrupted. Gemini Live isn’t perfect, but it’s the best hands-free way to use a phone that I’ve seen.

How Gemini Live works

Before you start talking to Gemini Live, the feature lets you choose from 10 voices, compared to just three from OpenAI. Google worked with voice actors to create each one. I appreciated the variety and found each voice to sound very human.

In one example, a Google product manager verbally asked Gemini Live to find family-friendly wineries near Mountain View with outdoor areas and playgrounds nearby, so kids could come along. That’s a far more involved task than I would ever ask of Siri — or Google Search, frankly — but Gemini successfully recommended a place that met the criteria: Cooper-Garrod Vineyards in Saratoga.

That said, Gemini Live leaves something to be desired. It seemed to hallucinate a nearby playground called Henry Elementary School Playground, which is supposedly “10 minutes away” from the winery. There are playgrounds in the area, but the nearest school actually named Henry Elementary School is more than a two-hour drive away. There is a Henry Ford Elementary School in Redwood City, but that’s 30 minutes away.

Google likes to show off how users can interrupt Gemini Live mid-sentence, and the AI will quickly pivot. The company says this gives users control of the conversation. In practice, the feature doesn’t work perfectly. Sometimes, Google’s product managers and Gemini Live would talk over each other, and the AI wouldn’t pick up on what was being said.

Interestingly, Google doesn’t allow Gemini Live to sing or imitate any voices beyond the 10 it provides, according to product manager Leland Rechis. The company likely does this to avoid copyright conflicts. Additionally, Rechis said Google isn’t focused on making Gemini Live understand the emotional intonation of a user’s voice — something OpenAI touted during its demo.

Overall, the feature seems like a great way to dig deeper into a topic, in a more natural way than you would with a simple Google search. Google notes that Gemini Live is a step toward Project Astra, the fully multimodal AI model the company showed off at Google I/O. For now, Gemini Live only handles voice conversations; in the future, however, Google wants to add real-time video understanding.
