It’s been a tumultuous week for OpenAI, full of executive departures and major fundraising developments, but the startup is back at it, trying to convince developers to build tools with its AI models at its 2024 DevDay. The company announced several new tools on Tuesday, including a public beta of its “Realtime API” for building apps with low-latency, AI-generated voice responses. It’s not quite ChatGPT’s Advanced Voice Mode, but it’s close.
In a briefing with reporters before the event, OpenAI’s chief product officer Kevin Weil said the recent departures of chief technology officer Mira Murati and chief research officer Bob McGrew would have no impact on the company’s progress.
“Let me start by saying that Bob and Mira were incredible leaders. I learned a lot from them and they played a huge role in getting us to where we are today,” Weil said. “Plus, we have no intention of slowing down.”
As OpenAI undergoes yet another C-suite overhaul – echoing the turmoil that followed last year’s DevDay – the company is trying to convince developers that it still offers the best platform for building AI applications. Company leaders say the startup has more than 3 million developers building with its AI models, but OpenAI operates in an increasingly competitive space.
OpenAI noted that it has reduced the cost of developer API access by 99% over the past two years, though this was likely forced by competitors like Meta and Google constantly undercutting prices.
One of OpenAI’s new features, the Realtime API, will let developers build near real-time speech-to-speech experiences into their applications, with a choice of six voices provided by OpenAI. These voices are distinct from those offered in ChatGPT, and developers cannot use third-party voices, a restriction meant to prevent copyright issues. (The voice ambiguously based on Scarlett Johansson’s is not available anywhere.)
During the briefing, OpenAI’s director of developer experience, Romain Huet, shared a demo of a trip-planning app built with the Realtime API. The app let users speak with an AI assistant about an upcoming trip to London and get low-latency responses. The Realtime API also has access to a number of tools, so the app could annotate a map with restaurant locations as it answered.
At another point, Huet showed how the Realtime API could talk to a human on the phone to ask about ordering food for an event. Unlike Google’s infamous Duplex, OpenAI’s API cannot call restaurants or stores directly; however, it can integrate with calling APIs such as Twilio to do so. Notably, OpenAI has not added disclosures so that its AI models automatically identify themselves on calls like this, even though the AI-generated voices sound quite realistic. For now, it appears that adding this disclosure is up to developers, something that may be required by a new California law.
As part of its DevDay announcements, OpenAI also introduced vision fine-tuning in its API, which will let developers use images, as well as text, to fine-tune their applications of GPT-4o. In theory, this should help developers improve GPT-4o’s performance on tasks that require visual understanding. Olivier Godement, OpenAI’s head of product for the API, tells TechCrunch that developers will not be able to upload copyrighted imagery (such as a picture of Donald Duck), images that depict violence, or other imagery that violates OpenAI’s safety policies.
OpenAI is racing to match what its competitors in the AI model licensing space already offer. Its prompt caching feature is similar to one Anthropic launched a few months ago, which lets developers cache frequently used context between API calls, reducing costs and improving latency. OpenAI says developers can save 50% with this feature, whereas Anthropic promises a 90% discount for its version.
Finally, OpenAI is offering a model distillation feature that lets developers use larger AI models, such as o1-preview and GPT-4o, to fine-tune smaller models, such as GPT-4o mini. Running smaller models is generally cheaper than running larger ones, and this feature should let developers improve the performance of those small AI models. As part of model distillation, OpenAI is launching a beta evaluation tool so developers can measure how their fine-tuned models perform within OpenAI’s API.
DevDay may be most notable for what it didn’t announce – for example, there was no news about the GPT Store, which was announced at last year’s DevDay. Last we heard, OpenAI was piloting a revenue-sharing program with some of the most popular GPT creators, but the company hasn’t announced much since then.
Additionally, OpenAI says it is not releasing any new AI models at DevDay this year. Developers waiting for OpenAI o1 (not the preview or mini version) or for Sora, the startup’s video generation model, will have to wait a little longer.