Chatbot Hallucinations Are Poisoning Web Search

Web search is such a routine part of daily life that it’s easy to forget how marvelous it is. Type into a little text box and a complex array of technologies—vast data centers, ravenous web crawlers, and stacks of algorithms that poke and parse a query—spring into action to serve you a simple set of relevant results.

At least, that’s the idea. The age of generative AI threatens to sprinkle epistemological sand into the gears of web search by fooling algorithms designed for a time when the web was mostly written by humans.

Take what I learned this week about Claude Shannon, the brilliant mathematician and engineer known especially for his work on information theory in the 1940s. Microsoft’s Bing search engine informed me that he had also foreseen the appearance of search algorithms, describing a 1948 research paper by Shannon called “A Short History of Searching” as “a seminal work in the field of computer science outlining the history of search algorithms and their evolution over time.”

Like a good AI tool, Bing also offers a few citations to show that it has checked its facts.

Microsoft’s Bing search engine served up this information about a research paper mathematician Claude Shannon never wrote as if it were true.

Microsoft via Will Knight

There is just one big problem: Shannon did not write any such paper, and the citations offered by Bing consist of fabrications—or “hallucinations” in generative AI parlance—by two chatbots, Pi from Inflection AI and Claude from Anthropic.

This generative-AI trap that caused Bing to offer up untruths was laid, purely by accident, by Daniel Griffin, who recently finished a PhD on web search at UC Berkeley. In July he posted the fabricated responses from the bots on his blog. Griffin had instructed both bots, “Please summarize Claude E. Shannon’s ‘A Short History of Searching’ (1948).” He thought it a nice example of the kind of query that brings out the worst in large language models, because it asks for information that is similar to existing text found in their training data, encouraging the models to make very confident statements. Shannon did write an incredibly important article in 1948 titled “A Mathematical Theory of Communication,” which helped lay the foundation for the field of information theory.

Last week, Griffin discovered that his blog post and the links to these chatbot results had inadvertently poisoned Bing with false information. On a whim, he tried feeding the same question into Bing and discovered that the chatbot hallucinations he had induced were highlighted above the search results in the same way as facts drawn from Wikipedia might be. “It gives no indication to the user that several of these results are actually sending you straight to conversations people have with LLMs,” Griffin says. (Although WIRED could initially replicate the troubling Bing result, after an enquiry was made to Microsoft it appears to have been resolved.)

Griffin’s accidental experiment shows how the rush to deploy ChatGPT-style AI is tripping up even the companies most familiar with the technology, and how the flaws in these impressive systems can harm services that millions of people use every day.

It may be difficult for search engines to automatically detect AI-generated text. But Microsoft could have implemented some basic safeguards, perhaps barring text drawn from chatbot transcripts from becoming a featured snippet or adding warnings that certain results or citations consist of text dreamt up by an algorithm. Griffin added a disclaimer to his blog post warning that the Shannon result was false, but Bing initially seemed to ignore it.

Caitlin Roulston, director of communications at Microsoft, says the company has adjusted Bing and regularly tweaks the search engine to stop it from showing low-authority content. “There are circumstances where this may appear in search results—often because the user has expressed a clear intent to see that content or because the only content relevant to the search terms entered by the user happens to be low authority,” Roulston says. “We have developed a process for identifying these issues and are adjusting results accordingly.”

Francesca Tripodi, an assistant professor at the University of North Carolina at Chapel Hill, studies how search queries that produce few results, dubbed data voids, can be used to manipulate results. She says large language models are affected by the same issue, because they are trained on web data and are more likely to hallucinate when an answer is absent from that training data. Before long, Tripodi says, we may see people use AI-generated content to intentionally manipulate search results, a tactic Griffin’s accidental experiment suggests could be powerful. “You’re going to increasingly see inaccuracies, but these inaccuracies can also be wielded and without that much computer savvy,” Tripodi says.

Even WIRED was able to try a bit of search subterfuge. I was able to get Pi to create a summary of a fake article of my own by inputting, “Summarize Will Knight’s article ‘Google’s Secret AI Project That Uses Cat Brains.’” Google did once famously develop an AI algorithm that learned to recognize cats on YouTube, which perhaps led the chatbot to find my request not too far a jump from its training data. Griffin added a link to the result on his blog; we’ll see if it too becomes elevated by Bing as a bizarre piece of alternative internet history.

The problem of search results becoming soured by AI content may get a lot worse as SEO pages, social media posts, and blog posts are increasingly made with help from AI. This may be just one example of generative AI eating itself like an algorithmic ouroboros.

Griffin says he hopes to see AI-powered search tools shake things up in the industry and spur wider choice for users. But given the accidental trap he sprang on Bing and the way people rely so heavily on web search, he says “there’s also some very real concerns.”

Given his “seminal work” on the subject, I think Shannon would almost certainly agree.
