To begin with, not all RAG systems are created equal. The accuracy of the content in your custom database is critical for solid outputs, but it isn’t the only variable. “It’s not just the quality of the content itself,” says Joel Hron, a global head of AI at Thomson Reuters. “It’s the quality of the search, and retrieval of the right content based on the question.” Mastering each step of the process is critical, since one misstep can throw the model completely off.
“Any lawyer who’s ever tried to use a natural language search within one of the research engines will see that there are often instances where semantic similarity leads you to completely irrelevant materials,” says Daniel Ho, a Stanford professor and senior fellow at the Institute for Human-Centered AI. Ho’s research into AI legal tools that rely on RAG found a higher rate of mistakes in their outputs than the companies building the models had reported.
Which brings us to the thorniest question in the discussion: How do you define hallucinations within a RAG implementation? Is it only when the chatbot generates an output that isn’t grounded in citations and makes up information? Or does it also count when the tool overlooks relevant data or misinterprets aspects of a citation?
According to Lewis, hallucinations in a RAG system come down to whether the output is consistent with what the model found during data retrieval. Stanford’s study of AI tools for lawyers broadens this definition somewhat, examining whether the output is grounded in the provided data as well as whether it is factually correct. That is a high bar for legal professionals, who are often parsing complicated cases and navigating complex hierarchies of precedent.
Although a RAG system tailored to legal questions is clearly better at answering case-law queries than OpenAI’s ChatGPT or Google’s Gemini, it can still overlook finer details and make random mistakes. All of the AI experts I spoke with emphasized the continued need for thoughtful human oversight throughout the process, both to double-check citations and to verify the overall accuracy of the results.
Law is an area with a lot of activity around RAG-based AI tools, but the potential of the process isn’t limited to a single white-collar field. “Take any profession or any business. You need to get answers that are anchored on real documents,” Arredondo says. “So I think RAG will become the staple that is used across basically every professional application, at least in the near to mid term.” Risk-averse executives seem excited about the prospect of using AI tools to better understand their proprietary data without having to upload sensitive information to a standard, public chatbot.
However, it’s critical for users to understand the limitations of these tools, and for AI-focused companies to refrain from overpromising the accuracy of their answers. Anyone using an AI tool should still avoid trusting the output entirely and should approach its answers with a healthy sense of skepticism, even when those answers are improved through RAG.
“Hallucinations are here to stay,” Ho says. “We do not yet have ready ways to really eliminate hallucinations.” Even when RAG reduces the prevalence of errors, human judgment remains paramount. And that’s no lie.
