When Google Lens was introduced in 2017, the search feature did something that would have seemed like science fiction not long before: point your phone’s camera at an object, and Google Lens could identify it, show some context, and maybe even let you buy it. It was a new way of searching, one that didn’t require awkwardly typing out descriptions of the things you saw in front of you.
Lens also showed how Google planned to apply its machine learning and artificial intelligence tools to make sure its search engine shows up on every possible surface. As Google increasingly leans on its core generative AI models to produce summaries of information in response to text searches, visual search in Google Lens is evolving too. The company now says Lens, which powers about 20 billion searches per month, will support even more ways to search, including video and multimodal searches.
Another change to Lens means even more shopping context will appear in the results. Shopping is, unsurprisingly, one of Lens’ key use cases; Amazon and Pinterest also offer visual search tools designed to drive more purchases. Search for a friend’s sneakers in the older version of Google Lens and you might be shown a carousel of similar items. In the updated version, Google says, Lens will show more direct links to purchases, customer reviews, publisher reviews, and comparison-shopping tools.
Lens search is now multimodal, a current buzzword in AI, meaning users can search with a combination of video, images, and voice input. Instead of pointing a smartphone camera at an object, tapping the focus point on the screen, and waiting for the Lens app to display results, users can point the camera and issue a voice command at the same time, such as, “What kind of clouds are these?” or “What brand are these sneakers and where can I buy them?”
Lens will also start working on real-time video capture, taking the tool a step beyond identifying objects in still images. If you have a broken record player or see a blinking light on a malfunctioning appliance at home, you can capture a quick video through Lens and, thanks to a generative AI overview, see tips on how to fix the item.
The feature, first announced at I/O, is considered experimental and is only available to people who have opted in to Google’s Search Labs, says Rajan Patel, an 18-year Google employee and a cofounder of Lens. The other Google Lens features, voice mode and enhanced shopping, are rolling out more broadly.
The “video understanding” feature, as Google calls it, is intriguing for several reasons. While it currently works only with video captured in real time, if or when Google expands it to previously captured videos, entire repositories of video, whether in a person’s camera roll or in a massive database like Google’s, could potentially become taggable and, to a large extent, shoppable.
Second, the Lens feature shares some characteristics with Google’s Project Astra, which is expected to be available later this year. Astra, like Lens, uses multimodal inputs to interpret the world around you through your phone. As part of an Astra demonstration this spring, the company showed off a pair of prototype smart glasses.
Plus, Meta just made a splash with its long-term vision for augmented reality, which involves ordinary people wearing glasses that can intelligently interpret the world around them and show them holographic interfaces. Google, of course, has already tried to realize this future with Google Glass (which uses fundamentally different technology than Meta’s latest offering). Will the new Lens features, combined with Astra, be a natural bridge to a new breed of smart glasses?