Siri was a huge deal for Apple. At the iPhone 4S launch event, Apple’s Phil Schiller called Siri the best feature of the new device. “For decades, technologists have teased us, dreaming that we could talk to technology and it would do everything for us,” he said. “But it never works!” All we really want, he argued, is to talk to our device however we like and get information and help. In a moment of classic Apple bravado, Schiller announced that Apple had finally solved it.
Apple hasn’t solved this. In the 13 years since it was first launched, Siri has become either a way for most people to set timers or a useless feature to be avoided at all costs. Siri has been bad for a long time, so long that for years it seemed that Apple either forgot about it or simply preferred to pretend it didn’t exist.
But next week at WWDC, if the rumors and reports turn out to be true, we might get our first chance to meet the real Siri, or at least something much closer to it. According to Bloomberg, The New York Times, and others, Apple will unveil a massive overhaul of the assistant that uses large language models to make Siri more reliable, though without many new features. Even that would be a victory. But Apple also appears to be working on, and may be close to launching, a version of Siri that integrates with apps, which would let the assistant take actions on your behalf on your device. At least in theory, everything you can do on your phone, Siri could soon do for you.
This was, of course, the vision for Siri all along. You can even see it in the iPhone 4S ads: celebrities ask Siri for help, and Siri almost never finishes the job. It gives Deschanel a list of restaurants that mention delivery but doesn’t offer to order anything or even show a menu. It tells Scorsese there’s traffic but doesn’t reroute him (and shouldn’t it already know he’ll be late for his meeting?). Siri tells Malkovich to be nice to people and read a good book but offers no practical help. Until now, using Siri has been like having a virtual assistant whose only job is to look things up on Google for you. That’s something! But it’s not much.
Siri’s shortcomings are all the more frustrating because everything it needs to be useful is right there on your phone. When I want pizza, why can’t Siri find my last order confirmation in my email, open DoorDash, enter the same order, pay with one of the cards in my Apple Wallet, and be done with it? If I’m having a Scorsese-level busy day, Siri sits right alongside my contacts, my Slack, my email, and everything else it would need to quickly move things around on my behalf. If Siri could take over my phone like one of those remote-access tools that let someone else move the cursor on your computer, it would be unstoppable.
There are really two reasons Siri has never lived up to its potential in this way. The first is simple: the underlying technology wasn’t good enough. If you’ve used Siri, you know how often it mishears names, misunderstands commands, and falls back on “here’s some stuff I found on the internet” when you just wanted to play a podcast. This is where large language models are genuinely exciting: we’ve seen how much better speech-to-text tools like OpenAI’s Whisper are, and how much more broadly these models can understand language. They’re not perfect, but they’re a huge improvement over what came before, which is why Amazon is also moving Alexa to an LLM and Google is replacing Google Assistant with Gemini.
The second reason Siri has never quite worked is that neither Apple nor third-party developers have ever figured out how it should work. How are you supposed to know what Siri can do, and how to ask for it? How should developers integrate with Siri? Even now, if you want to add a task to a to-do list app, Siri can’t simply figure out which app you use. You have to say, “Hey Siri, remind me to water the grass in Todoist,” which is a weird sentence that no one would naturally say, and in my experience it fails half the time anyway. If you want to perform a multistep action, your only option is to tinker with Shortcuts, a very powerful tool that is essentially programming even though it doesn’t require writing any code. For most people, that’s too much.
AI may also give Apple a chance to solve the whole problem. Its researchers published a paper earlier this year describing a system called Ferret-UI, which uses an AI model to understand the fine details of what appears on screen. The researchers even describe how a Siri that can use apps in general might work: OpenAI’s GPT-4 is good at understanding broadly what an image shows, while Ferret can understand small regions and details. In practice, that could mean one system says, “This is the Ticketmaster app!” and the other says, “Here’s the buy button.”
We should be skeptical of any claim Apple makes about Siri. More than a decade ago, Schiller stood on stage and announced that Apple had built a better voice assistant, and it hadn’t. The same may prove true now, as the hype around AI continues to move much faster than the technology itself. Humane, Rabbit, Google, and others are working on similar ideas (“agent” is the AI buzzword of the summer), and no one has yet shown that the technology is ready.
But if Apple has cracked something here, this could be the first time we see the real Siri, the one we were promised all those years ago. Perhaps in the next commercial, Deschanel’s tomato soup will magically arrive at her door, and the Headspace app will launch to give Malkovich some inner peace. Maybe we’ll finally get the Siri Apple has always dreamed of.
