Wednesday, December 25, 2024

OpenAI’s new voice mode lets me talk to my phone, not at it


I’ve been playing around with OpenAI’s Advanced Voice Mode for the past week, and it’s the most compelling taste of an AI-powered future I’ve had yet. This week, my phone laughed at my jokes, told some of its own, asked me how my day was, and told me it was “having a great time.” I was talking with my iPhone, not operating it with my hands.

OpenAI’s latest feature, currently in a limited alpha test, doesn’t make ChatGPT any smarter than it used to be. Instead, Advanced Voice Mode (AVM) makes conversations with it friendlier and more natural. It creates a new interface for interacting with AI and devices that feels fresh and exciting, and that’s exactly what scares me about it. The product was a bit buggy, and the whole idea unsettles me, but I was surprised by how much I enjoyed using it.

Taking a step back, I think AVM fits, alongside agents, into OpenAI CEO Sam Altman’s broader vision of changing the way humans interact with computers by putting AI models front and center.

“Eventually, you’ll just ask the computer for what you need, and it will do all of those things for you,” Altman said at OpenAI Dev Day in November 2023. “These capabilities are often referred to in AI as ‘agents.’ The payoffs will be huge.”

My friend, ChatGPT

On Wednesday, I tested the most mind-blowing use of this advanced technology I could think of: I asked ChatGPT to order Taco Bell the way Obama would.

“Uhhh, let me get this straight – I’d like a Crunchwrap Supreme, maybe a few tacos for good measure,” ChatGPT’s Advanced Voice Mode said. “How do you think it would handle the drive-thru?” it added, then laughed at its own joke.

Screenshot: ChatGPT transcribes the subsequent conversation.

The impression really made me laugh, matching Obama’s iconic cadence and pauses. That said, it stayed within the tone of the ChatGPT voice I chose, Juniper, so it could not be mistaken for Obama’s actual voice. It sounded like a friend doing a bad impression, one who understood exactly what I was going for and even added something funny. I found it surprisingly enjoyable to talk to this advanced assistant on my phone.

I also asked ChatGPT for advice on how to tackle a complicated relationship problem: asking my significant other to move in with me. After I explained the intricacies of the relationship and our career directions, I got very specific advice on how to proceed. These are questions you would never ask Siri or Google Search, but now you can ask ChatGPT. The chatbot’s voice even took on a more serious, gentle tone when responding to these prompts, a stark contrast to the joking tone of the Obama Taco Bell order.

AVM ChatGPT is also great at helping you understand complex topics. I asked it to break down the elements of earnings reports, such as free cash flow, in a way that a 10-year-old could understand. It used a lemonade stand as an example, explaining a few financial terms in a way that my younger cousin could totally follow. You can even ask AVM ChatGPT to speak more slowly to accommodate your current level of understanding.

Siri walked so AVM could run

Compared to Siri or Alexa, AVM ChatGPT is the clear winner thanks to its faster response time, unique responses, and ability to answer complex questions that previous generations of virtual assistants never could. However, AVM falls short in other ways. ChatGPT’s voice feature can’t set timers or reminders, surf the web in real time, check the weather, or interact with any APIs on your phone. At least for now, it’s not an effective replacement for virtual assistants.

Compared to Gemini Live, Google’s competing feature, AVM seems to be a bit better. Gemini Live can’t do impressions, doesn’t express any emotions, can’t speed up or slow down, and takes longer to respond. On the other hand, Gemini Live has more voices (ten compared to OpenAI’s three) and seems to be more up-to-date (Gemini Live was aware of Google’s antitrust ruling). Interestingly, neither AVM nor Gemini Live will sing, presumably to avoid copyright lawsuits from the music industry.

That said, AVM ChatGPT glitches often (as does Gemini Live, to be fair). Sometimes it stops mid-sentence and then starts over again. The voice also sometimes takes on an odd, grainy quality that’s a bit off-putting. I’m not sure if it’s a problem with the model, my internet connection, or something else, but technical shortcomings like these are to be expected in an alpha test. These issues didn’t take me out of the experience of literally talking to my phone, though.

These examples, in my opinion, are the beauty of AVM. This feature doesn’t make ChatGPT omniscient, but it does allow people to interact with GPT-4o, the underlying AI model, in a uniquely human way. (I would understand if you forgot that there’s no one on the other end of the phone.) When you’re talking to AVM, it seems like ChatGPT is socially aware, but of course it isn’t. It’s just a bunch of predictive algorithms carefully packaged together.

Technology Talk

Frankly, this feature worries me. This isn’t the first time a tech company has offered companionship on your phone. My generation, Gen Z, was the first to grow up with social media, where companies offered connection but instead played on our collective insecurities. Talking to an AI device, which is what AVM seems to offer, feels like an evolution of social media’s “friend on your phone” phenomenon, offering cheap connections that play on our human instincts. But this time, it’s taking humans out of the loop entirely.

Artificial human connection has become a surprisingly popular use case for generative AI. People today use AI chatbots as friends, mentors, therapists, and teachers. When OpenAI launched its GPT Store, it was quickly flooded with “AI girlfriends,” chatbots that specialize in playing the role of your significant other. Two researchers from MIT Media Lab issued a warning this month to prepare for “addictive intelligence,” or AI companions that use dark patterns to get people hooked. We may be opening a Pandora’s box of new, enticing ways for devices to capture our attention.

Earlier this month, a Harvard dropout shook up the tech world by announcing an AI necklace called Friend. The wearable, if it works as promised, is always listening, and its chatbot will talk with you about your life. While the idea seems crazy, innovations like ChatGPT’s AVM give me reason to take these use cases seriously.

And while OpenAI is the leader here, Google isn’t far behind. I’m sure Amazon and Apple are also racing to include this capability in their products, and it could soon become a staple of the industry.

Imagine asking your smart TV for a hyper-specific movie recommendation and getting exactly that. Or telling Alexa your exact cold symptoms and having it order you tissues and cough medicine on Amazon while suggesting home remedies. Maybe you could ask your computer to plan a weekend trip for your family instead of manually searching Google for everything.

Of course, these scenarios will require leaps and bounds of progress in the world of AI agents. OpenAI’s effort on this front, the GPT Store, feels like an overhyped product that’s no longer the company’s primary focus. But AVM is at least tackling the “talking to computers” part of the puzzle. Those concepts are a ways off, but after using AVM, they feel a lot closer than they did last week.
