Have you ever caught an LLM-powered chatbot – like ChatGPT or Claude – giving you outdated or incorrect information?
When generating a response, large language models (LLMs) rely on the datasets on which they were trained. However, because they are designed to predict text rather than retrieve exact facts, you can’t always rely on them for precise information. The training datasets are also usually limited to publicly available data and, in some domains, may quickly become obsolete.
When you create LLM apps for business use, relying on default model behavior isn’t enough. How can we make LLM systems — from support chatbots to AI assistants — more accurate and reliable?
One way is to use Retrieval-Augmented Generation (RAG). This approach helps ground LLM outputs in trusted data sources, such as company policies or documents. For example, when a company uses RAG for customer support, the AI would search through support documentation before responding to a customer query, ensuring the answer aligns with current company guidelines.
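To make the pattern concrete, here is a minimal sketch of the retrieve-then-generate loop in Python. The in-memory document store and keyword retriever are toy stand-ins, and `call_llm` is a placeholder for whichever model provider you use; a production setup would swap in a vector database and a real LLM client.

```python
# Minimal retrieve-then-generate sketch. `call_llm` is a stand-in for any
# chat-completion API; the document store is just an in-memory list.
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    text: str

POLICY_DOCS = [
    Doc("Refunds", "Refunds are issued within 5 business days of approval."),
    Doc("Shipping", "Standard shipping takes 3-7 business days."),
]

def search_docs(query: str, docs: list[Doc], k: int = 2) -> list[Doc]:
    """Toy keyword retriever; real systems use embeddings and vector search."""
    scored = [(sum(w in d.text.lower() for w in query.lower().split()), d) for d in docs]
    return [d for score, d in sorted(scored, key=lambda x: -x[0])[:k] if score > 0]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def answer(query: str) -> str:
    # Ground the model in retrieved company documents instead of its own memory.
    context = "\n".join(f"- {d.title}: {d.text}" for d in search_docs(query, POLICY_DOCS))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```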
This approach has several benefits: answers stay grounded in trusted, up-to-date data, and the risk of hallucinations goes down.
In this blog, we compiled 10 real-world examples of how companies apply RAG to improve customer experience, automate routine tasks, and improve productivity.
DoorDash, a food delivery company, enhances delivery support with a RAG-based chatbot. The company developed an in-house solution that combines three key components: the RAG system, the LLM guardrail, and the LLM judge.
When a “Dasher,” an independent contractor who does deliveries through DoorDash, reports a problem, the system first condenses the conversation to grasp the core issue accurately. Using this summary, it then searches the knowledge base for the most relevant articles and past resolved cases. The retrieved information is fed into an LLM, which crafts a coherent and contextually appropriate response tailored to the Dasher’s query.
To maintain the high quality of the system’s responses, DoorDash implemented the LLM Guardrail system, an online monitoring tool that evaluates each LLM-generated response for accuracy and compliance. It helps prevent hallucinations and filter out responses that violate company policies.
To monitor the system quality over time, DoorDash uses an LLM Judge that assesses the chatbot's performance across five LLM evaluation metrics: retrieval correctness, response accuracy, grammar and language accuracy, coherence to context, and relevance to the Dasher's request.
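Here is a rough sketch of how such a summarize-retrieve-generate-check pipeline could be wired together. All helper names and prompts are illustrative stand-ins, not DoorDash’s internal code; the offline LLM judge would score saved conversations in a similar way.

```python
# Illustrative wiring of a summarize -> retrieve -> generate -> check pipeline.
# `call_llm` and `search_knowledge_base` are placeholders for real services.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def search_knowledge_base(query: str) -> list[str]:
    raise NotImplementedError("plug in your retriever here")

def summarize_conversation(messages: list[str]) -> str:
    return call_llm("Summarize the Dasher's core issue:\n" + "\n".join(messages))

def draft_reply(issue_summary: str) -> str:
    context = "\n".join(search_knowledge_base(issue_summary))  # KB articles + past cases
    return call_llm(f"Context:\n{context}\n\nIssue: {issue_summary}\nDraft a support reply.")

def guardrail_approves(issue_summary: str, reply: str) -> bool:
    verdict = call_llm(
        "Is this reply accurate, grounded in the context, and policy-compliant? "
        f"Answer YES or NO.\nIssue: {issue_summary}\nReply: {reply}"
    )
    return verdict.strip().upper().startswith("YES")

def handle_dasher_issue(messages: list[str]) -> str:
    summary = summarize_conversation(messages)
    reply = draft_reply(summary)
    # If the online guardrail rejects the draft, fall back to a human agent.
    return reply if guardrail_approves(summary, reply) else "Escalating to a live agent."
```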
Online professional platform LinkedIn introduced a novel customer service question-answering method that combines RAG with a knowledge graph.
Instead of treating the corpus of past issue tracking tickets as plain text, the solution constructs a knowledge graph from historical issues, taking into account intra-issue structure and inter-issue relations. When a user asks a question, the system parses consumer queries and retrieves related sub-graphs from the knowledge graph to generate answers. This approach mitigates the effects of text segmentation and improves retrieval accuracy.
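For intuition, here is a toy sub-graph retrieval step built with networkx. The node types, edge relations, and keyword matching are simplified illustrations, not LinkedIn’s actual schema or retrieval logic.

```python
# Sketch of sub-graph retrieval over a ticket knowledge graph, using networkx.
# Node and edge naming is illustrative; the production graph is far richer.
import networkx as nx

kg = nx.DiGraph()
kg.add_node("TICKET-101", type="ticket", text="Login fails after password reset")
kg.add_node("STEP-1", type="resolution_step", text="Clear SSO session cache")
kg.add_node("TICKET-207", type="ticket", text="Password reset email not received")
kg.add_edge("TICKET-101", "STEP-1", relation="resolved_by")
kg.add_edge("TICKET-101", "TICKET-207", relation="related_to")

def retrieve_subgraph(query: str, graph: nx.DiGraph, hops: int = 1) -> nx.DiGraph:
    """Find ticket nodes matching the query, then expand to their neighborhood."""
    words = set(query.lower().split())
    seeds = [n for n, d in graph.nodes(data=True)
             if d.get("type") == "ticket" and words & set(d["text"].lower().split())]
    keep = set(seeds)
    for _ in range(hops):
        keep |= {m for n in list(keep) for m in graph.successors(n)}
    return graph.subgraph(keep)

sub = retrieve_subgraph("password reset login fails", kg)
# The retrieved sub-graph (tickets, steps, relations) is serialized into the LLM prompt.
print([(u, v, d["relation"]) for u, v, d in sub.edges(data=True)])
```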
It has been deployed within LinkedIn’s customer service team, reducing the median per-issue resolution time by 28.6%.
Bell, a telecommunication services company, utilized RAG to enhance its knowledge management processes and ensure its employees have access to up-to-date company policies.
The company shared how it built a knowledge management component within the RAG system. Bell adopted modular document embedding pipelines that allow it to efficiently process and index raw documents from various sources. The solution supports both batch and incremental updates to the knowledge base and automatically updates the indexes when documents are added to or removed from their source location.
Bell treats each component as a service and applies DevOps principles while building and maintaining the system.
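A stripped-down version of the incremental update logic might look like the sketch below: new or changed files are (re)embedded, and deleted files are dropped from the index. The plain-dict “index,” the hashing scheme, and the `embed` stub are assumptions for illustration; Bell’s pipelines target a real vector store.

```python
# Sketch of keeping a document index in sync with a source folder.
import hashlib
from pathlib import Path

index: dict[str, dict] = {}  # doc path -> {"hash": ..., "embedding": ...}

def embed(text: str) -> list[float]:
    raise NotImplementedError("plug in your embedding model here")

def sync_index(source_dir: str) -> None:
    current = {str(p): p.read_text() for p in Path(source_dir).glob("*.txt")}
    # Remove index entries for documents deleted from the source location.
    for path in set(index) - set(current):
        del index[path]
    # Add or refresh entries for new and modified documents (incremental update).
    for path, text in current.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if index.get(path, {}).get("hash") != digest:
            index[path] = {"hash": digest, "embedding": embed(text)}
```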
Harvard Business School's Senior Lecturer, Jeffrey Bussgang, created a RAG-based AI faculty chatbot to help him teach his entrepreneurship course. The chatbot, ChatLTV, helps students with course preparation — like clarifying complex concepts or finding additional information on case studies — as well as administrative matters.
ChatLTV was trained on the course corpus, including case studies, teaching notes, books, blog posts, and historical Q&A from the course's Slack channel. The chatbot is integrated into the course's Slack channel, allowing students to interact with it in private and public modes.
To respond to a student's question, the LLM is provided with the query and relevant context stored in a vector database. The most relevant content chunks are served to the LLM using OpenAI's API. To ensure ChatLTV’s responses are accurate, the course team used a mix of manual and automated testing. They used an LLM judge to compare the outputs to the ground-truth data and generate a quality score.
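A simplified version of such an LLM-judge check could look like this, using the OpenAI Python SDK the course already relies on. The model name, prompt, and 1–5 scale are assumptions for illustration, not the course team’s actual rubric.

```python
# Sketch of an LLM-judge check against historical Q&A pairs via the OpenAI SDK.
from openai import OpenAI

client = OpenAI()          # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"      # illustrative model choice

def judge(question: str, reference_answer: str, chatbot_answer: str) -> int:
    """Ask an LLM to grade the chatbot's answer against the ground truth, 1-5."""
    prompt = (
        "Grade the candidate answer against the reference on a 1-5 scale. "
        "Reply with the number only.\n"
        f"Question: {question}\nReference: {reference_answer}\nCandidate: {chatbot_answer}"
    )
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return int(resp.choices[0].message.content.strip())

# Run the judge over historical Q&A pairs, e.g. from the course Slack channel:
# scores = [judge(q, reference, chatbot_answer(q)) for q, reference in historical_qa]
```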
Vimeo, a video hosting platform, enables users to converse with videos. The company developed a RAG-based chatbot that can summarize video content, link to key moments, and suggest additional questions.
The first step of the implementation is to transform video content into text. The video transcript is then processed and saved in a vector database. Vimeo applies a bottom-up approach to indexing the transcript: they use several sizes of context windows, summarize long context, and create a description for the entire video.
When a user asks a question – e.g., “What is the video about?” – the system retrieves the relevant context from the database and passes it to the LLM to generate the answer. Along with the response, the chatbot outputs playable video moments supporting the answer.
To make the experience more engaging, the chatbot also suggests pregenerated question/answer pairs that cover the most important moments in the video and questions related to the user's query.
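Here is a simplified illustration of the multi-granularity idea: chunk a timed transcript at several window sizes and keep start times, so retrieved chunks can later be turned into playable moments. The data structures and the deep-link format are assumptions, not Vimeo’s implementation.

```python
# Sketch: index a timed transcript at several chunk sizes, keeping start times
# so retrieved chunks can point back to playable moments. Purely illustrative.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds into the video
    text: str

@dataclass
class Chunk:
    start: float
    text: str
    granularity: int  # how many transcript segments were merged

def build_chunks(segments: list[Segment], window_sizes=(1, 4, 16)) -> list[Chunk]:
    chunks = []
    for size in window_sizes:
        for i in range(0, len(segments), size):
            window = segments[i:i + size]
            chunks.append(Chunk(start=window[0].start,
                                text=" ".join(s.text for s in window),
                                granularity=size))
    return chunks

# Each chunk is embedded and stored; at answer time the chunk's `start` can be
# turned into a deep link such as https://vimeo.com/<video_id>#t=<start>s (illustrative).
```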
Asian super-app Grab uses RAG-powered LLMs to automate routine analytical tasks like generating reports and performing fraud investigations.
Grab's Report Summarizer automatically generates and summarizes regular reports, significantly reducing manual effort. It integrates several in-house platforms to achieve this:
When a report is due, the process is triggered by calling the appropriate Data-Arks API to retrieve the necessary data. This data is then processed by Spellvault, which utilizes the LLM to generate a concise summary. The final report, comprising the data and its summary, is delivered to users via Slack. The company states that the automated report summarization saves 3-4 hours per report.
Grab also introduced A* bot, an AI-powered assistant for streamlining fraud investigations. It utilizes a collection of frequently used queries packaged as Data-Arks APIs. When a user submits a prompt, the system selects the most relevant queries using RAG, executes them, and concisely summarizes the results through Slack.
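A toy version of that query-selection step might look like the sketch below: compare the user prompt against the descriptions of pre-packaged queries by embedding similarity, run the best match, and summarize the result. The catalog, embedding function, and the Data-Arks/Spellvault calls are all illustrative stand-ins for Grab’s internal services.

```python
# Toy sketch: pick the most relevant pre-packaged query for a user prompt via
# embedding similarity, run it, and summarize the result.
import numpy as np

QUERY_CATALOG = {
    "daily_fraud_alerts": "Fraud alerts raised in the last 24 hours, by market",
    "chargeback_rate": "Chargeback rate per merchant over the last 30 days",
}

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def run_packaged_query(name: str) -> list[dict]:
    raise NotImplementedError("stand-in for an internal data API call")

def summarize_with_llm(prompt: str, rows: list[dict]) -> str:
    raise NotImplementedError("stand-in for an LLM summarization call")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(user_prompt: str) -> str:
    prompt_vec = embed(user_prompt)
    # Retrieve the most relevant packaged query by comparing description embeddings.
    best = max(QUERY_CATALOG, key=lambda q: cosine(prompt_vec, embed(QUERY_CATALOG[q])))
    rows = run_packaged_query(best)
    return summarize_with_llm(user_prompt, rows)  # posted back to the user in Slack
```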
RAG capabilities allow users from different teams and functions to self-serve and get interpretable outputs for any data use case.
Business information and content technology provider Thomson Reuters uses RAG to improve customer service. The company built a solution that helps customer support executives quickly access the most relevant information from a curated database in a chat interface.
The system uses embeddings to find the most relevant documents from internal knowledge bases. To achieve this, the text is split into small chunks, and then each chunk is embedded and stored in a vector database. User questions are also converted into a vector embedding and then queried against the vector database to get the best matches.
Once relevant documents are retrieved, a sequence-to-sequence model refines them into a well-structured response. The model tailors answers using the retrieved knowledge, improving accuracy and reducing hallucinations.
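In outline, the retrieval side reduces to a few lines: split documents into chunks, embed them, and match an embedded question against the chunk matrix by cosine similarity. The chunk size and the `embed` stub below are illustrative, and the generation step that follows is omitted.

```python
# Sketch of the chunk -> embed -> search step behind embedding-based retrieval.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(documents: list[str]) -> tuple[list[str], np.ndarray]:
    chunks = [c for doc in documents for c in chunk(doc)]
    matrix = np.vstack([embed(c) for c in chunks])
    return chunks, matrix

def top_matches(question: str, chunks: list[str], matrix: np.ndarray, k: int = 3) -> list[str]:
    q = embed(question)
    # Cosine similarity between the question and every stored chunk.
    scores = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-scores)[:k]]
```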
The solution allows Thomson Reuters to provide support executives with accurate, up-to-date responses while maintaining conversational interaction.
Discovery platform Pinterest helps internal company data users write SQL queries to solve analytical problems.Â
The initial approach was rather straightforward. A user asked a question and chose the relevant data sources – tables – to be used to answer the question. That input was compiled into a text-to-SQL prompt and fed into an LLM that generated the response. The solution demonstrated decent results, but it turned out that identifying the correct tables to refer to was a significant challenge for users.
To solve this, Pinterest integrated RAG to guide users in selecting the right tables for their tasks. The system generates a vector index of tables’ summaries and transforms user questions into embeddings. Then, a similarity search is conducted to infer the top N suitable tables. The results are passed to the LLM, which selects the top K most suitable tables. Finally, the text-to-SQL prompt is created and passed to the LLM to generate the response.
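Here is a toy version of that two-stage table selection: a vector search over table summaries yields the top-N candidates, and an LLM then picks the top-K. The table names, summaries, and helper functions are illustrative, not Pinterest’s code.

```python
# Sketch of two-stage table selection for text-to-SQL: vector search, then LLM pick.
import numpy as np

TABLE_SUMMARIES = {
    "pins_daily": "Daily pin creation counts by country and device type.",
    "ad_spend": "Advertiser spend per campaign, aggregated daily.",
    "user_signups": "New user signups with acquisition channel.",
}

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_n_tables(question: str, n: int = 2) -> list[str]:
    q = embed(question)
    return sorted(TABLE_SUMMARIES,
                  key=lambda t: cosine(q, embed(TABLE_SUMMARIES[t])),
                  reverse=True)[:n]

def select_tables(question: str, k: int = 1) -> str:
    candidates = "\n".join(f"- {t}: {TABLE_SUMMARIES[t]}" for t in top_n_tables(question))
    return call_llm(f"Question: {question}\nCandidate tables:\n{candidates}\n"
                    f"Return the {k} best table name(s), comma-separated.")

# The selected tables (and their schemas) then go into the text-to-SQL prompt.
```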
Fintech company Ramp used RAG to improve how it classifies its customers and to migrate to a standardized classification system.
Initially, the company relied on a homegrown classification that combined third-party data, sales inputs, and customer self-reporting. This often led to inconsistent or overly broad categorizations that made auditing and interpreting customer data difficult.
To solve this problem, Ramp built an in-house RAG-based industry classification system that enabled migration to a standardized classification framework (North American Industry Classification System, NAICS). Here’s how it works:
Relevant information about a customer business is transformed into vector representations to capture semantic meaning. These vectors are compared against a database of NAICS codes to identify the closest matches. The recommended codes are passed to an LLM to generate a final prediction.
Ramp uses internal services for embeddings and LLM prompt evaluations, ClickHouse to calculate similarity scores, and Kafka to log intermediate results. The company also implemented guardrails to ensure the outputs are valid NAICS codes.
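Schematically, the flow might look like the sketch below, with a tiny in-memory NAICS lookup and plain cosine similarity standing in for ClickHouse, and a simple check that the model returns a valid code. The helpers are placeholders and the candidate set is truncated for illustration.

```python
# Sketch of the embed -> nearest-NAICS-codes -> LLM-pick flow with a validity guardrail.
import numpy as np

NAICS = {
    "722511": "Full-Service Restaurants",
    "541511": "Custom Computer Programming Services",
    "311811": "Retail Bakeries",
}

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def candidate_codes(business_description: str, k: int = 2) -> list[str]:
    q = embed(business_description)
    return sorted(NAICS, key=lambda c: cosine(q, embed(NAICS[c])), reverse=True)[:k]

def classify(business_description: str) -> str:
    options = "\n".join(f"{c}: {NAICS[c]}" for c in candidate_codes(business_description))
    prediction = call_llm(
        f"Business: {business_description}\nPick the best NAICS code from:\n{options}\n"
        "Reply with the code only."
    )
    code = prediction.strip()
    # Guardrail: only accept outputs that are valid codes from the candidate set.
    if code not in NAICS:
        raise ValueError(f"Model returned an invalid NAICS code: {code!r}")
    return code
```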
The Royal Bank of Canada (RBC) built Arcane, a RAG system that points the bank’s specialists to the most relevant policies scattered across its internal web platform. Financial operations are complex, and it takes years even for trained banking professionals to master proprietary guidelines. Enabling specialists to locate relevant policies quickly can boost their productivity and streamline customer support.
Here’s how the Arcane experience looks: a bank specialist asks a question in a chatbot interface, and the system navigates through internal databases and extracts relevant information from documents, presenting it in a concise format and providing information sources.
One of the primary challenges the RBC addressed while developing Arcane was data parsing and chunking. The data were dispersed across various web platforms, proprietary sources of information, PDF documents, and Excel tables, making it difficult to access and retrieve relevant information efficiently.
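A common way to handle such heterogeneous sources is to route each file type to its own parser before chunking. The sketch below uses pypdf, openpyxl, and BeautifulSoup as illustrative choices; it is not a description of RBC’s internal tooling.

```python
# Sketch: dispatch heterogeneous sources (PDFs, spreadsheets, HTML pages) to
# format-specific parsers, then split the extracted text into chunks.
from pathlib import Path

def parse_pdf(path: Path) -> str:
    from pypdf import PdfReader
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

def parse_xlsx(path: Path) -> str:
    from openpyxl import load_workbook
    rows = []
    for sheet in load_workbook(path, read_only=True):
        for row in sheet.iter_rows(values_only=True):
            rows.append(" | ".join("" if c is None else str(c) for c in row))
    return "\n".join(rows)

def parse_html(path: Path) -> str:
    from bs4 import BeautifulSoup
    return BeautifulSoup(path.read_text(), "html.parser").get_text(" ", strip=True)

PARSERS = {".pdf": parse_pdf, ".xlsx": parse_xlsx, ".html": parse_html}

def to_chunks(path: Path, size: int = 800) -> list[str]:
    text = PARSERS[path.suffix](path)
    return [text[i:i + size] for i in range(0, len(text), size)]
```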
RAG helps ground LLM outputs in up-to-date, relevant information. Whether it's improving customer support, analyzing data, or making a company’s knowledge base more accessible, RAG helps to ensure AI systems provide trustworthy, contextual answers.
If you’re building a complex system like RAG, you need evaluations to test it during development and production monitoring. That’s why we built Evidently. Our open-source library, with over 25 million downloads, makes it easy to test and evaluate LLM-powered applications, from chatbots to RAG. It simplifies evaluation workflows, offering 100+ built-in checks and easy configuration of custom LLM judges for every use case.
We also provide Evidently Cloud, a no-code workspace for teams to collaborate on AI quality, testing, and monitoring and run complex evaluation workflows.
Ready to test your RAG? Sign up for free or schedule a demo to see Evidently Cloud in action. We're here to help you build with confidence!