A week after its algorithms advised people to eat rocks and put glue on pizza, Google admitted Thursday that it needs to make changes to its bold new generative AI search feature. The episode highlights the risks of Google's aggressive push to commercialize generative AI, as well as the insidious and fundamental limitations of the technology.
Google’s AI Overviews feature draws on Gemini, a large language model (LLM) like the one behind OpenAI’s ChatGPT, to generate written answers to some search queries by summarizing information found online. The current AI boom is built on LLMs’ impressive facility with text, but the same ability lets the software put a convincing gloss on untruths and errors. Using the technology to summarize information from across the web can make search results easier to digest, but it is risky when online sources are scarce or when people may use the information to make important decisions.
“With the LLM you can now get a quick, neat prototype pretty quickly, but actually making it so that it doesn’t make you eat rocks takes a lot of work,” says Richard Socher, who made key contributions to AI for language as a researcher and, in late 2021, launched an AI-powered search engine called You.com.
Socher says taming LLMs takes considerable effort because the underlying technology has no real understanding of the world, and because the web is riddled with unreliable information. “In some cases, it’s better not to just give an answer, or to show multiple different points of view,” he says.
Liz Reid, head of Google Search, said in a company blog post late Thursday that Google conducted extensive testing before launching AI Overviews. But she added that errors such as the rock-eating and pizza-glue examples, in which Google’s algorithms pulled information from a satirical article and a tongue-in-cheek Reddit comment, respectively, have prompted additional changes. These include better detection of “nonsensical queries,” Google says, and less reliance on user-generated content.
Socher says You.com routinely avoids the kinds of errors seen in Google’s AI Overviews because his company has developed more than a dozen tricks to keep LLMs from misbehaving when used for search.
“We are more accurate because we have put a lot of resources into improving accuracy,” Socher says. You.com uses, among other things, a custom-built web index to help its LLMs steer clear of incorrect information. It also chooses among several different LLMs to answer specific queries, and uses a citation mechanism that can flag when sources conflict. Still, getting AI search right is hard. WIRED found on Friday that You.com failed to correctly answer a query known to trip up other AI systems, stating that “based on available information, there are no African countries whose names begin with the letter ‘K.’” (Kenya is one.) In earlier tests, it had answered the query correctly.
Google’s generative AI upgrade to its most popular and lucrative product is part of an industry-wide reboot inspired by OpenAI’s release of ChatGPT in November 2022. A few months after ChatGPT’s debut, Microsoft, a key OpenAI partner, used the startup’s technology to revamp its also-ran search engine, Bing. The upgraded Bing was beset by AI-generated errors and strange behavior, but the company’s CEO, Satya Nadella, said the move was intended to challenge Google: “I want people to know that we made them dance.”
Some experts believe that Google rushed out its AI upgrade. “I’m surprised they introduced this for so many queries, including medical and financial ones. I thought they would be more careful,” says Barry Schwartz, news editor at Search Engine Land, a publication that tracks the search industry. He adds that the company should have better anticipated that some people would deliberately try to trip up AI Overviews. “Google has to be smart about this,” Schwartz says, especially when it shows these answers by default on its most valuable product.
Lily Ray, a search engine optimization (SEO) consultant, spent a year as a beta tester of the prototype that preceded AI Overviews, which Google called Search Generative Experience. She says she wasn’t surprised by the errors that surfaced last week, given how often the earlier version went wrong. “I think it’s virtually impossible to get everything right all the time,” Ray says. “That’s the nature of artificial intelligence.”
