In May, when OpenAI first demonstrated an incredibly realistic, near-real-time “advanced voice mode” for its AI-powered ChatGPT chatbot platform, the company said the feature would be available to paying ChatGPT users within a few weeks.
A month later, OpenAI says it needs more time.
In a post on the official OpenAI Discord server, OpenAI says that it had planned to start rolling out Advanced Voice Mode in alpha to a small group of ChatGPT Plus users in late June, but lingering issues forced it to postpone the launch until sometime in July.
“For example, we are improving the model’s ability to detect and reject specific content,” writes OpenAI. “We are also working to improve user experience and prepare our infrastructure to scale to millions while maintaining real-time response. As part of our iterative rollout strategy, we will launch an alpha release with a small group of users to gather feedback and build on what we learn.”
Advanced Voice Mode may not launch for all ChatGPT Plus customers until the fall, OpenAI says, depending on whether it meets certain internal safety and reliability checks. However, the delay will not affect the rollout of the new video and screen sharing features, which were demonstrated separately at OpenAI’s spring press event.
These capabilities include solving math problems based on a picture of the problem and explaining the various settings menus on a device. They are designed to work with ChatGPT on smartphones as well as desktop clients, such as the macOS app that was made available to all ChatGPT users today.
“ChatGPT’s advanced voice mode can understand and respond to emotions and non-verbal signals, bringing us closer to natural, real-time conversations with artificial intelligence,” writes OpenAI. “Our mission is to thoughtfully bring you new experiences.”
On stage at the launch event, OpenAI staff demonstrated that ChatGPT responds almost instantly to requests, such as solving a math problem on a piece of paper placed in front of a researcher’s smartphone camera.
OpenAI’s advanced voice mode stirred controversy because the default “Sky” voice sounded similar to that of actress Scarlett Johansson. Johansson later released a statement saying that she had retained legal counsel to inquire about the voice and obtain detailed information about its creation, and that she had refused OpenAI’s repeated requests to license her voice for ChatGPT.
OpenAI, while denying that it used Johansson’s voice without permission or used a soundalike, later removed the offending voice.