When bizarre and misleading responses to search queries generated by Google’s new AI Overviews feature went viral on social media last week, the company issued statements that generally downplayed the notion that the technology had problems. Late Thursday, the company’s head of search, Liz Reid, acknowledged that the errors had highlighted areas that needed improvement, writing that “we wanted to explain what happened and what steps we took.”
Reid’s post directly referenced two of the most viral and wildly incorrect AI Overviews results. In one, Google’s algorithms endorsed eating stones because doing so “might be good for you,” and in the other they suggested using non-toxic glue to thicken pizza sauce.
Eating stones isn’t a topic that many people have written about or asked questions about online, so there aren’t many sources for a search engine to draw on. According to Reid, the AI tool found an article from The Onion, a satirical website, that had been republished by a software company, and it misinterpreted the information as factual.
As for Google’s search results advising users to put glue on their pizza, Reid effectively attributed the mistake to a failure of the tool’s sense of humor. “We have seen AI Overviews containing sarcastic or trolling content from message boards,” she wrote. “Forums are often a great source of authentic, first-hand information, but in some cases they can lead to not-so-helpful advice, such as how to use glue to make cheese stick to pizza.”
It’s probably best not to prepare any AI-generated dinner menu without reading it carefully first.
Reid also suggested that it would be unfair to judge the quality of Google’s new approach to search based on viral screenshots. She claimed that the company conducted extensive testing before the launch and that its data shows people value AI Overviews, including by indicating that people are more likely to stay on a page discovered this way.
So where did these embarrassing failures come from? Reid characterized the errors that drew attention as the result of an internet-wide audit that was not always well intended. “There’s nothing better than millions of people using this feature for lots of innovative searches. We also observed new, meaningless searches seemingly intended to produce incorrect results.”
Google claims that some widely shared screenshots of incorrect AI Overviews were fake, which appears to be true based on WIRED’s own tests. For example, a user on X posted a screenshot that appeared to show an AI Overview answering the question “Can a cockroach live in your penis?” with enthusiastic confirmation from the search engine that this is normal. The post has been viewed over five million times. However, upon further inspection, the format of the screenshot is not consistent with how AI Overviews are actually presented to users. WIRED was unable to reproduce anything close to this result.
And it’s not just social media users who have been fooled by misleading screenshots of fake AI Overviews. The New York Times issued a correction to its reporting on the feature and clarified that AI Overviews never suggested that users should jump off the Golden Gate Bridge if they were experiencing depression; that was simply a dark meme on social media. “Others have suggested that we had dangerous results on topics such as leaving dogs in cars, smoking during pregnancy and depression,” Reid wrote Thursday. “These AI Overviews never came out.”
However, Reid’s post also makes clear that not all was well with the initial form of Google’s big new search upgrade. She wrote that the company had made “a dozen technical improvements” to AI Overviews.
Only four were described: better detection of “meaningless queries” that do not merit an AI Overview; reducing the feature’s reliance on user-generated content from sites like Reddit; offering AI Overviews less frequently in situations where users haven’t found them useful; and strengthening the guardrails that limit AI summaries on important topics such as health.
Reid’s blog post made no mention of significantly scaling back the AI summaries. Google says it will continue to monitor user feedback and adjust the feature as necessary.
