OpenAI began rolling out ChatGPT’s Advanced Voice Mode on Tuesday, giving users their first access to hyper-realistic GPT-4o audio responses. The alpha version will be available to a small group of ChatGPT Plus users today, and OpenAI says the feature will gradually roll out to all Plus users in fall 2024.
When OpenAI first unveiled the GPT-4o voice in May, the feature shocked viewers with its quick responses and uncanny resemblance to a real human voice, one in particular. The voice, Sky, resembled that of Scarlett Johansson, the actress behind the artificial assistant in the movie “Her.” Shortly after OpenAI’s demo, Johansson said she had declined numerous requests from CEO Sam Altman to use her voice, and that after seeing the GPT-4o demo, she hired a lawyer to defend her likeness. OpenAI denied using Johansson’s voice but later removed the voice shown in its demo. In June, OpenAI said it would delay the release of Advanced Voice Mode to improve its safety measures.
A month later, the wait is over (sort of). OpenAI says the video and screen sharing capabilities showcased during the Spring Update won’t be part of this alpha, but will launch “at a later date.” For now, the GPT-4o demo that everyone raved about is still just a demo, but some premium users will now have access to the ChatGPT voice feature showcased there.
ChatGPT can now talk and listen
You may have already tried the voice mode currently available in ChatGPT, but OpenAI says Advanced Voice Mode is different. ChatGPT’s old audio solution used three separate models: one to convert voice to text, GPT-4 to process the message, and a third to convert ChatGPT’s text back to voice. GPT-4o, however, is multimodal and can handle these tasks without the help of auxiliary models, allowing for conversations with much lower latency. OpenAI also says GPT-4o can detect emotional intonations in a voice, including sadness, excitement, or singing.
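For a rough sense of that architectural difference, here is a minimal sketch. The function names below are hypothetical stand-ins, not OpenAI’s actual API; the point is only to contrast a chained three-model pipeline with a single end-to-end audio model.

```python
# Purely illustrative sketch; every function here is a hypothetical placeholder,
# not a real OpenAI API call.

def speech_to_text(audio: bytes) -> str:
    return "transcribed user speech"      # stand-in for a dedicated speech-to-text model

def chat_model_respond(prompt: str) -> str:
    return "assistant reply text"         # stand-in for GPT-4 text generation

def text_to_speech(text: str) -> bytes:
    return b"synthesized reply audio"     # stand-in for a dedicated text-to-speech model

def multimodal_model_respond(audio: bytes) -> bytes:
    return b"reply audio"                 # stand-in for an end-to-end audio model like GPT-4o


def legacy_voice_mode(audio_in: bytes) -> bytes:
    """Old pipeline: three separate models chained together.
    Each hop adds latency, and tone/emotion is lost at the transcription step."""
    text_prompt = speech_to_text(audio_in)
    text_reply = chat_model_respond(text_prompt)
    return text_to_speech(text_reply)


def advanced_voice_mode(audio_in: bytes) -> bytes:
    """GPT-4o-style: one multimodal model takes audio in and produces audio out,
    which is why it can respond faster and pick up on intonation."""
    return multimodal_model_respond(audio_in)
```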
In this pilot, ChatGPT Plus users will see first-hand just how hyper-realistic OpenAI’s Advanced Voice Mode really is. TechCrunch wasn’t able to test the feature before this article went live, but we’ll review it as soon as we get access to it.
OpenAI says it is gradually rolling out the new ChatGPT voice to closely monitor its usage. Those in the alpha group will receive an alert in the ChatGPT app, followed by an email with instructions on how to use it.
In the months since OpenAI’s demonstration, the company says it has been testing GPT-4o’s voice capabilities with more than 100 external red teamers who speak 45 different languages. OpenAI says a report on those safety efforts is expected in early August.
The company says Advanced Voice Mode will be limited to ChatGPT’s four pre-defined voices (Juniper, Breeze, Cove, and Ember), created in partnership with paid voice actors. The Sky voice shown in OpenAI’s May demo is no longer available in ChatGPT. OpenAI spokesperson Lindsay McCallum says, “ChatGPT cannot imitate the voices of other people, whether private or public, and will block outputs that differ from one of these pre-defined voices.”
OpenAI is trying to avoid deepfake controversies. In January, AI startup ElevenLabs’ voice-cloning technology was used to impersonate President Biden, defrauding voters in the New Hampshire primary election.
OpenAI also says it has introduced new filters to block certain requests to generate music or other copyrighted audio. AI companies have run into legal trouble over copyright infringement in the past year, and audio models like GPT-4o open up a whole new category of companies that could file complaints. In particular, record labels, which have a history of litigation, have already sued the AI song generators Suno and Udio.