Have you ever caught an LLM-powered chatbot – like ChatGPT or Claude – giving you outdated or incorrect information?
When generating a response, large language models (LLMs) rely on the datasets on which they were trained. However, because they are designed to predict text rather than retrieve exact facts, you can’t always rely on them for precise information. The training datasets are also usually limited to publicly available data and, in some domains, may quickly become obsolete.
When you create LLM apps for business use, relying on default model behavior isn’t enough. How can we make LLM systems — from support chatbots to AI assistants — more accurate and reliable?
One way is to use Retrieval-Augmented Generation (RAG). This approach helps ground LLM outputs in trusted data sources, such as company policies or documents. For example, when a company uses RAG for customer support, the AI would search through support documentation before responding to a customer query, ensuring the answer aligns with current company guidelines.
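To make the pattern concrete, here is a minimal sketch of the retrieve-then-generate loop in Python. The in-memory document store and keyword retriever are toy stand-ins, and `call_llm` is a placeholder for whichever model provider you use; a production setup would swap in a vector database and a real LLM client.

```python
# Minimal retrieve-then-generate sketch. `call_llm` is a stand-in for any
# chat-completion API; the document store is just an in-memory list.
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    text: str

POLICY_DOCS = [
    Doc("Refunds", "Refunds are issued within 5 business days of approval."),
    Doc("Shipping", "Standard shipping takes 3-7 business days."),
]

def search_docs(query: str, docs: list[Doc], k: int = 2) -> list[Doc]:
    """Toy keyword retriever; real systems use embeddings and vector search."""
    scored = [(sum(w in d.text.lower() for w in query.lower().split()), d) for d in docs]
    return [d for score, d in sorted(scored, key=lambda x: -x[0])[:k] if score > 0]

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def answer(query: str) -> str:
    # Ground the model in retrieved company documents instead of its own memory.
    context = "\n".join(f"- {d.title}: {d.text}" for d in search_docs(query, POLICY_DOCS))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)
```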
This approach has several benefits: answers stay grounded in trusted, up-to-date data, and the risk of hallucinations goes down.
In this blog, we compiled 10 real-world examples of how companies apply RAG to improve customer experience, automate routine tasks, and improve productivity.
DoorDash, a food delivery company, enhances delivery support with a RAG-based chatbot. The company developed an in-house solution that combines three key components: the RAG system, the LLM guardrail, and the LLM judge.
When a “Dasher,” an independent contractor who does deliveries through DoorDash, reports a problem, the system first condenses the conversation to grasp the core issue accurately. Using this summary, it then searches the knowledge base for the most relevant articles and past resolved cases. The retrieved information is fed into an LLM, which crafts a coherent and contextually appropriate response tailored to the Dasher’s query.
To maintain the high quality of the system’s responses, DoorDash implemented the LLM Guardrail system, an online monitoring tool that evaluates each LLM-generated response for accuracy and compliance. It helps prevent hallucinations and filter out responses that violate company policies.
To monitor the system quality over time, DoorDash uses an LLM Judge that assesses the chatbot's performance across five LLM evaluation metrics: retrieval correctness, response accuracy, grammar and language accuracy, coherence to context, and relevance to the Dasher's request.
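Here is a rough sketch of how such a summarize-retrieve-generate-check pipeline could be wired together. All helper names and prompts are illustrative stand-ins, not DoorDash’s internal code; the offline LLM judge would score saved conversations in a similar way.

```python
# Illustrative wiring of a summarize -> retrieve -> generate -> check pipeline.
# `call_llm` and `search_knowledge_base` are placeholders for real services.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def search_knowledge_base(query: str) -> list[str]:
    raise NotImplementedError("plug in your retriever here")

def summarize_conversation(messages: list[str]) -> str:
    return call_llm("Summarize the Dasher's core issue:\n" + "\n".join(messages))

def draft_reply(issue_summary: str) -> str:
    context = "\n".join(search_knowledge_base(issue_summary))  # KB articles + past cases
    return call_llm(f"Context:\n{context}\n\nIssue: {issue_summary}\nDraft a support reply.")

def guardrail_approves(issue_summary: str, reply: str) -> bool:
    verdict = call_llm(
        "Is this reply accurate, grounded in the context, and policy-compliant? "
        f"Answer YES or NO.\nIssue: {issue_summary}\nReply: {reply}"
    )
    return verdict.strip().upper().startswith("YES")

def handle_dasher_issue(messages: list[str]) -> str:
    summary = summarize_conversation(messages)
    reply = draft_reply(summary)
    # If the online guardrail rejects the draft, fall back to a human agent.
    return reply if guardrail_approves(summary, reply) else "Escalating to a live agent."
```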
Online professional platform LinkedIn introduced a novel customer service question-answering method that combines RAG with a knowledge graph.
Instead of treating the corpus of past issue tracking tickets as plain text, the solution constructs a knowledge graph from historical issues, taking into account intra-issue structure and inter-issue relations. When a user asks a question, the system parses consumer queries and retrieves related sub-graphs from the knowledge graph to generate answers. This approach mitigates the effects of text segmentation and improves retrieval accuracy.
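For intuition, here is a toy sub-graph retrieval step built with networkx. The node types, edge relations, and keyword matching are simplified illustrations, not LinkedIn’s actual schema or retrieval logic.

```python
# Sketch of sub-graph retrieval over a ticket knowledge graph, using networkx.
# Node and edge naming is illustrative; the production graph is far richer.
import networkx as nx

kg = nx.DiGraph()
kg.add_node("TICKET-101", type="ticket", text="Login fails after password reset")
kg.add_node("STEP-1", type="resolution_step", text="Clear SSO session cache")
kg.add_node("TICKET-207", type="ticket", text="Password reset email not received")
kg.add_edge("TICKET-101", "STEP-1", relation="resolved_by")
kg.add_edge("TICKET-101", "TICKET-207", relation="related_to")

def retrieve_subgraph(query: str, graph: nx.DiGraph, hops: int = 1) -> nx.DiGraph:
    """Find ticket nodes matching the query, then expand to their neighborhood."""
    words = set(query.lower().split())
    seeds = [n for n, d in graph.nodes(data=True)
             if d.get("type") == "ticket" and words & set(d["text"].lower().split())]
    keep = set(seeds)
    for _ in range(hops):
        keep |= {m for n in list(keep) for m in graph.successors(n)}
    return graph.subgraph(keep)

sub = retrieve_subgraph("password reset login fails", kg)
# The retrieved sub-graph (tickets, steps, relations) is serialized into the LLM prompt.
print([(u, v, d["relation"]) for u, v, d in sub.edges(data=True)])
```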
It has been deployed within LinkedIn’s customer service team, reducing the median per-issue resolution time by 28.6%.
Bell, a telecommunication services company, utilized RAG to enhance its knowledge management processes and ensure its employees have access to up-to-date company policies.
The company shared how it built a knowledge management component within the RAG system. Bell adopted modular document embedding pipelines that allow it to efficiently process and index raw documents from various sources. The solution supports both batch and incremental updates to the knowledge base and automatically updates the indexes when documents are added to or removed from their source location.
Bell treats each component as a service and applies DevOps principles while building and maintaining the system.
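A stripped-down version of the incremental update logic might look like the sketch below: new or changed files are (re)embedded, and deleted files are dropped from the index. The plain-dict “index,” the hashing scheme, and the `embed` stub are assumptions for illustration; Bell’s pipelines target a real vector store.

```python
# Sketch of keeping a document index in sync with a source folder.
import hashlib
from pathlib import Path

index: dict[str, dict] = {}  # doc path -> {"hash": ..., "embedding": ...}

def embed(text: str) -> list[float]:
    raise NotImplementedError("plug in your embedding model here")

def sync_index(source_dir: str) -> None:
    current = {str(p): p.read_text() for p in Path(source_dir).glob("*.txt")}
    # Remove index entries for documents deleted from the source location.
    for path in set(index) - set(current):
        del index[path]
    # Add or refresh entries for new and modified documents (incremental update).
    for path, text in current.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if index.get(path, {}).get("hash") != digest:
            index[path] = {"hash": digest, "embedding": embed(text)}
```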
Harvard Business School's Senior Lecturer, Jeffrey Bussgang, created a RAG-based AI faculty chatbot to help him teach his entrepreneurship course. The chatbot, ChatLTV, helps students with course preparation — like clarifying complex concepts or finding additional information on case studies — as well as administrative matters.
ChatLTV was trained on the course corpus, including case studies, teaching notes, books, blog posts, and historical Q&A from the course's Slack channel. The chatbot is integrated into the course's Slack channel, allowing students to interact with it in private and public modes.
To respond to a student's question, the LLM is provided with the query and relevant context stored in a vector database. The most relevant content chunks are served to the LLM using OpenAI's API. To ensure ChatLTV’s responses are accurate, the course team used a mix of manual and automated testing. They used an LLM judge to compare the outputs to the ground-truth data and generate a quality score.
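A simplified version of such an LLM-judge check could look like this, using the OpenAI Python SDK the course already relies on. The model name, prompt, and 1–5 scale are assumptions for illustration, not the course team’s actual rubric.

```python
# Sketch of an LLM-judge check against historical Q&A pairs via the OpenAI SDK.
from openai import OpenAI

client = OpenAI()          # assumes OPENAI_API_KEY is set in the environment
MODEL = "gpt-4o-mini"      # illustrative model choice

def judge(question: str, reference_answer: str, chatbot_answer: str) -> int:
    """Ask an LLM to grade the chatbot's answer against the ground truth, 1-5."""
    prompt = (
        "Grade the candidate answer against the reference on a 1-5 scale. "
        "Reply with the number only.\n"
        f"Question: {question}\nReference: {reference_answer}\nCandidate: {chatbot_answer}"
    )
    resp = client.chat.completions.create(
        model=MODEL, messages=[{"role": "user", "content": prompt}]
    )
    return int(resp.choices[0].message.content.strip())

# Run the judge over historical Q&A pairs, e.g. from the course Slack channel:
# scores = [judge(q, reference, chatbot_answer(q)) for q, reference in historical_qa]
```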
Vimeo, a video hosting platform, enables users to converse with videos. The company developed a RAG-based chatbot that can summarize video content, link to key moments, and suggest additional questions.
The first step of the implementation is to transform video content into text. The video transcript is then processed and saved in a vector database. Vimeo applies a bottom-up approach to indexing the transcript: they use several sizes of context windows, summarize long context, and create a description for the entire video.
When a user asks a question – e.g., “What is the video about?” – the system retrieves the relevant context from the database and passes it to the LLM to generate the answer. Along with the response, the chatbot outputs playable video moments supporting the answer.
To make the experience more engaging, the chatbot also suggests pregenerated question/answer pairs that cover the most important moments in the video and questions related to the user's query.
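Here is a simplified illustration of the multi-granularity idea: chunk a timed transcript at several window sizes and keep start times, so retrieved chunks can later be turned into playable moments. The data structures and the deep-link format are assumptions, not Vimeo’s implementation.

```python
# Sketch: index a timed transcript at several chunk sizes, keeping start times
# so retrieved chunks can point back to playable moments. Purely illustrative.
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds into the video
    text: str

@dataclass
class Chunk:
    start: float
    text: str
    granularity: int  # how many transcript segments were merged

def build_chunks(segments: list[Segment], window_sizes=(1, 4, 16)) -> list[Chunk]:
    chunks = []
    for size in window_sizes:
        for i in range(0, len(segments), size):
            window = segments[i:i + size]
            chunks.append(Chunk(start=window[0].start,
                                text=" ".join(s.text for s in window),
                                granularity=size))
    return chunks

# Each chunk is embedded and stored; at answer time the chunk's `start` can be
# turned into a deep link such as https://vimeo.com/<video_id>#t=<start>s (illustrative).
```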
Asian super-app Grab uses RAG-powered LLMs to automate routine analytical tasks like generating reports and performing fraud investigations.
Grab's Report Summarizer automatically generates and summarizes regular reports, significantly reducing manual effort. It integrates several in-house platforms to achieve this:
When a report is due, the process is triggered by calling the appropriate Data-Arks API to retrieve the necessary data. This data is then processed by Spellvault, which utilizes the LLM to generate a concise summary. The final report, comprising the data and its summary, is delivered to users via Slack. The company states that the automated report summarization saves 3-4 hours per report.
Grab also introduced A* bot, an AI-powered assistant for streamlining fraud investigations. It utilizes a collection of frequently used queries packaged as Data-Arks APIs. When a user submits a prompt, the system selects the most relevant queries using RAG, executes them, and concisely summarizes the results through Slack.
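A toy version of that query-selection step might look like the sketch below: compare the user prompt against the descriptions of pre-packaged queries by embedding similarity, run the best match, and summarize the result. The catalog, embedding function, and the Data-Arks/Spellvault calls are all illustrative stand-ins for Grab’s internal services.

```python
# Toy sketch: pick the most relevant pre-packaged query for a user prompt via
# embedding similarity, run it, and summarize the result.
import numpy as np

QUERY_CATALOG = {
    "daily_fraud_alerts": "Fraud alerts raised in the last 24 hours, by market",
    "chargeback_rate": "Chargeback rate per merchant over the last 30 days",
}

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def run_packaged_query(name: str) -> list[dict]:
    raise NotImplementedError("stand-in for an internal data API call")

def summarize_with_llm(prompt: str, rows: list[dict]) -> str:
    raise NotImplementedError("stand-in for an LLM summarization call")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(user_prompt: str) -> str:
    prompt_vec = embed(user_prompt)
    # Retrieve the most relevant packaged query by comparing description embeddings.
    best = max(QUERY_CATALOG, key=lambda q: cosine(prompt_vec, embed(QUERY_CATALOG[q])))
    rows = run_packaged_query(best)
    return summarize_with_llm(user_prompt, rows)  # posted back to the user in Slack
```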
RAG capabilities allow users from different teams and functions to self-serve and get interpretable outputs for any data use case.
Business information and content technology provider Thomson Reuters uses RAG to improve customer service. The company built a solution that helps customer support executives quickly access the most relevant information from a curated database in a chat interface.
The system uses embeddings to find the most relevant documents from internal knowledge bases. To achieve this, the text is split into small chunks, and then each chunk is embedded and stored in a vector database. User questions are also converted into a vector embedding and then queried against the vector database to get the best matches.
Once relevant documents are retrieved, a sequence-to-sequence model refines them into a well-structured response. The model tailors answers using the retrieved knowledge, improving accuracy and reducing hallucinations.
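In outline, the retrieval side reduces to a few lines: split documents into chunks, embed them, and match an embedded question against the chunk matrix by cosine similarity. The chunk size and the `embed` stub below are illustrative, and the generation step that follows is omitted.

```python
# Sketch of the chunk -> embed -> search step behind embedding-based retrieval.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def chunk(text: str, size: int = 500) -> list[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(documents: list[str]) -> tuple[list[str], np.ndarray]:
    chunks = [c for doc in documents for c in chunk(doc)]
    matrix = np.vstack([embed(c) for c in chunks])
    return chunks, matrix

def top_matches(question: str, chunks: list[str], matrix: np.ndarray, k: int = 3) -> list[str]:
    q = embed(question)
    # Cosine similarity between the question and every stored chunk.
    scores = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(-scores)[:k]]
```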
The solution allows Thomson Reuters to provide support executives with accurate, up-to-date responses while maintaining conversational interaction.
Discovery platform Pinterest helps internal company data users write SQL queries to solve analytical problems.Â
The initial approach was rather straightforward. A user asked a question and chose the relevant data sources – tables – to be used to answer the question. That input was compiled into a text-to-SQL prompt and fed into an LLM that generated the response. The solution demonstrated decent results, but it turned out that identifying the correct tables to refer to was a significant challenge for users.
To solve this, Pinterest integrated RAG to guide users in selecting the right tables for their tasks. The system generates a vector index of tables’ summaries and transforms user questions into embeddings. Then, a similarity search is conducted to infer the top N suitable tables. The results are passed to the LLM, which selects the top K most suitable tables. Finally, the text-to-SQL prompt is created and passed to the LLM to generate the response.
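Here is a toy version of that two-stage table selection: a vector search over table summaries yields the top-N candidates, and an LLM then picks the top-K. The table names, summaries, and helper functions are illustrative, not Pinterest’s code.

```python
# Sketch of two-stage table selection for text-to-SQL: vector search, then LLM pick.
import numpy as np

TABLE_SUMMARIES = {
    "pins_daily": "Daily pin creation counts by country and device type.",
    "ad_spend": "Advertiser spend per campaign, aggregated daily.",
    "user_signups": "New user signups with acquisition channel.",
}

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_n_tables(question: str, n: int = 2) -> list[str]:
    q = embed(question)
    return sorted(TABLE_SUMMARIES,
                  key=lambda t: cosine(q, embed(TABLE_SUMMARIES[t])),
                  reverse=True)[:n]

def select_tables(question: str, k: int = 1) -> str:
    candidates = "\n".join(f"- {t}: {TABLE_SUMMARIES[t]}" for t in top_n_tables(question))
    return call_llm(f"Question: {question}\nCandidate tables:\n{candidates}\n"
                    f"Return the {k} best table name(s), comma-separated.")

# The selected tables (and their schemas) then go into the text-to-SQL prompt.
```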
Fintech company Ramp used RAG to improve how it classifies its customers and to migrate to a standardized classification system.
Initially, the company relied on a homegrown classification that combined third-party data, sales inputs, and customer self-reporting. This often led to inconsistent or overly broad categorizations that made auditing and interpreting customer data difficult.
To solve this problem, Ramp built an in-house RAG-based industry classification system that enabled migration to a standardized classification framework (North American Industry Classification System, NAICS). Here’s how it works:
Relevant information about a customer business is transformed into vector representations to capture semantic meaning. These vectors are compared against a database of NAICS codes to identify the closest matches. The recommended codes are passed to an LLM to generate a final prediction.
Ramp uses internal services for embeddings and LLM prompt evaluations, ClickHouse to calculate similarity scores, and Kafka to log intermediate results. The company also implemented guardrails to ensure the outputs are valid NAICS codes.
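Schematically, the flow might look like the sketch below, with a tiny in-memory NAICS lookup and plain cosine similarity standing in for ClickHouse, and a simple check that the model returns a valid code. The helpers are placeholders and the candidate set is truncated for illustration.

```python
# Sketch of the embed -> nearest-NAICS-codes -> LLM-pick flow with a validity guardrail.
import numpy as np

NAICS = {
    "722511": "Full-Service Restaurants",
    "541511": "Custom Computer Programming Services",
    "311811": "Retail Bakeries",
}

def embed(text: str) -> np.ndarray:
    raise NotImplementedError("plug in your embedding model here")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def candidate_codes(business_description: str, k: int = 2) -> list[str]:
    q = embed(business_description)
    return sorted(NAICS, key=lambda c: cosine(q, embed(NAICS[c])), reverse=True)[:k]

def classify(business_description: str) -> str:
    options = "\n".join(f"{c}: {NAICS[c]}" for c in candidate_codes(business_description))
    prediction = call_llm(
        f"Business: {business_description}\nPick the best NAICS code from:\n{options}\n"
        "Reply with the code only."
    )
    code = prediction.strip()
    # Guardrail: only accept outputs that are valid codes from the candidate set.
    if code not in NAICS:
        raise ValueError(f"Model returned an invalid NAICS code: {code!r}")
    return code
```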
The Royal Bank of Canada (RBC) built Arcane, a RAG system that points the bank’s specialists to the most relevant policies scattered across its internal web platform. Financial operations are complex, and it takes years even for trained banking professionals to master proprietary guidelines. Enabling specialists to locate relevant policies quickly can boost their productivity and streamline customer support.
Here’s how the Arcane experience looks: a bank specialist asks a question in a chatbot interface, and the system navigates through internal databases and extracts relevant information from documents, presenting it in a concise format and providing information sources.
One of the primary challenges the RBC addressed while developing Arcane was data parsing and chunking. The data were dispersed across various web platforms, proprietary sources of information, PDF documents, and Excel tables, making it difficult to access and retrieve relevant information efficiently.
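A common way to handle such heterogeneous sources is to route each file type to its own parser before chunking. The sketch below uses pypdf, openpyxl, and BeautifulSoup as illustrative choices; it is not a description of RBC’s internal tooling.

```python
# Sketch: dispatch heterogeneous sources (PDFs, spreadsheets, HTML pages) to
# format-specific parsers, then split the extracted text into chunks.
from pathlib import Path

def parse_pdf(path: Path) -> str:
    from pypdf import PdfReader
    return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)

def parse_xlsx(path: Path) -> str:
    from openpyxl import load_workbook
    rows = []
    for sheet in load_workbook(path, read_only=True):
        for row in sheet.iter_rows(values_only=True):
            rows.append(" | ".join("" if c is None else str(c) for c in row))
    return "\n".join(rows)

def parse_html(path: Path) -> str:
    from bs4 import BeautifulSoup
    return BeautifulSoup(path.read_text(), "html.parser").get_text(" ", strip=True)

PARSERS = {".pdf": parse_pdf, ".xlsx": parse_xlsx, ".html": parse_html}

def to_chunks(path: Path, size: int = 800) -> list[str]:
    text = PARSERS[path.suffix](path)
    return [text[i:i + size] for i in range(0, len(text), size)]
```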
RAG helps ground LLM outputs in up-to-date, relevant information. Whether it's improving customer support, analyzing data, or making a company’s knowledge base more accessible, RAG helps to ensure AI systems provide trustworthy, contextual answers.
If you’re building a complex system like RAG, you need evaluations to test it during development and production monitoring. That’s why we built Evidently. Our open-source library, with over 25 million downloads, makes it easy to test and evaluate LLM-powered applications, from chatbots to RAG. It simplifies evaluation workflows, offering 100+ built-in checks and easy configuration of custom LLM judges for every use case.
We also provide Evidently Cloud, a no-code workspace for teams to collaborate on AI quality, testing, and monitoring and run complex evaluation workflows.
Ready to test your RAG? Sign up for free or schedule a demo to see Evidently Cloud in action. We're here to help you build with confidence!