In this blog post, I’ll guide you through the process of creating a Retrieval-Augmented Generation (RAG) app for querying local documents using a combination of the AWS Bedrock API, ChromaDB, and Flask. You’ll learn how to set up document indexing, implement search and query functionality, and generate AI-powered responses based on local PDFs. This is an essential tutorial for anyone looking to integrate advanced AI query capabilities into their local document management systems. The database indexing is based on an excellent tutorial by Pixegami.
Ask a large language model questions about local files securely. Here is a screenshot of what we will be creating:
A Retrieval-Augmented Generation (RAG) app combines two powerful techniques to generate high-quality, contextually aware responses from an AI model. In a RAG system, when a user submits a query, the system first retrieves relevant documents or data from a database or an index. Then, the AI model generates an answer based on the retrieved information. This method allows the AI to pull from a wide range of sources and offer more accurate and personalized responses compared to standard text generation models, which rely solely on the information they were trained on.
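To make the flow concrete, here is a minimal conceptual sketch of the retrieve-then-generate loop; retrieve() and generate() are hypothetical placeholders for the ChromaDB similarity search and the Bedrock model call we will build later in this post.
# Conceptual sketch only: retrieve() and generate() stand in for the
# ChromaDB search and the Bedrock call implemented further down.
def answer(question: str) -> str:
    context_chunks = retrieve(question, k=5)   # 1. find the most relevant passages
    context = "\n\n".join(context_chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)                    # 2. let the LLM answer from that context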
In this tutorial, we’ll create a RAG app that uses local documents (in this case, PDFs) as the knowledge base. The system will retrieve relevant text from these documents, pass it to an AI model for processing, and generate contextually accurate responses. This app will showcase the integration of AWS Bedrock for text embeddings, ChromaDB for document indexing, and Flask for building the web application.
Start by creating a new conda environment. This will ensure that your project dependencies are isolated from other projects. Open your terminal and run the following commands:
bitnami@ip-127-26-2-172:~$
conda create --name rag-app python=3.9
conda activate rag-app
Next, you’ll need to install the necessary libraries for this project. These include libraries for document loading, text splitting, embeddings, and web app functionality. Run the following command to install all dependencies:
bitnami@ip-127-26-2-172:~$
pip install langchain langchain-aws langchain-community langchain-chroma langchain-text-splitters boto3 Flask chromadb python-dotenv pypdf markdown
This installs the LangChain packages for document loading, splitting, and the Bedrock and Chroma integrations; boto3 for talking to AWS; Flask for the web app; chromadb as the vector store; python-dotenv for reading configuration from .env files; pypdf for parsing the PDFs; and markdown for formatting responses.
To use AWS Bedrock, you need an AWS account. If you don’t already have one, create one at aws.amazon.com, then set up an IAM user for this project and attach the AmazonBedrockFullAccess policy to it.
To authenticate your application with AWS, you’ll need to set up your credentials. The easiest way is to use the AWS CLI, which will automatically configure the necessary credentials.
Install the AWS CLI (if you don’t have it already):
bitnami@ip-127-26-2-172:~$
pip install awscli
Configure your AWS CLI with your credentials:
bitnami@ip-127-26-2-172:~$
aws configure
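Running aws configure prompts for four values and stores them under ~/.aws/; the region shown here is only an example, so pick the region where you have Bedrock model access:
AWS Access Key ID [None]: <your access key id>
AWS Secret Access Key [None]: <your secret access key>
Default region name [None]: eu-central-1
Default output format [None]: json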
Now that the environment is set up and the AWS credentials are configured, you’re ready to start building the RAG app!
Before we can start using the RAG app, we need to prepare the data. This involves loading, splitting, and indexing the documents that will be used for answering queries. In this section, we will set up the pipeline for processing the documents, generating embeddings, and storing them in a Chroma database.
Start by creating a .env
file in your project directory to store environment variables. These will include the paths for your PDF data and Chroma database. The file should look like this:
PDF_DATA_PATH=/path/to/your/pdf/documents
CHROMA_PATH=/path/to/your/chroma/database
Make sure to replace /path/to/your/pdf/documents
with the actual path to your PDF files and /path/to/your/chroma/database
with the path where you’d like to store your Chroma database.
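If you want to confirm the variables are picked up, a quick check with python-dotenv (run from the project directory so the .env file is found) looks like this:
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory into the process environment
print(os.environ.get("PDF_DATA_PATH"))  # should print your PDF folder
print(os.environ.get("CHROMA_PATH"))    # should print your Chroma directory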
The next step is to create a function to fetch embeddings using AWS Bedrock. This function will use the amazon.titan-embed-text-v2:0
model to generate text embeddings.
Create a file called get_embedding_function.py
with the following content:
from langchain_aws import BedrockEmbeddings
def get_embedding_function():
    embeddings = BedrockEmbeddings(model_id='amazon.titan-embed-text-v2:0')
    return embeddings
This function will return the embedding function that will later be used to transform text into vectors.
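As a quick sanity check of your Bedrock access (model access for Titan embeddings must already be enabled in your AWS account and region), you can embed a test string and look at the vector size:
from get_embedding_function import get_embedding_function

embeddings = get_embedding_function()
# embed_query() turns a single string into a list of floats
vector = embeddings.embed_query("What is retrieval-augmented generation?")
print(len(vector))  # Titan Text Embeddings v2 returns 1024-dimensional vectors by default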
Now that we have the embedding function, it’s time to create a script to load the documents, split them into smaller chunks, and index them into Chroma. Create a file called populate_database.py
and add the following code:
import argparse
import shutil
import os
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from get_embedding_function import get_embedding_function
from dotenv import load_dotenv
load_dotenv()
PDF_DATA_PATH = os.environ.get('PDF_DATA_PATH')
CHROMA_PATH = os.environ.get('CHROMA_PATH')
def main():
    # Check if the database should be cleared (using the --reset flag).
    parser = argparse.ArgumentParser()
    parser.add_argument("--reset", action="store_true", help="Reset the database.")
    args = parser.parse_args()
    if args.reset:
        print("✨ Clearing Database")
        clear_database()

    # Create (or update) the data store.
    documents = load_documents()
    chunks = split_documents(documents)
    add_to_chroma(chunks)


def load_documents():
    document_loader = PyPDFDirectoryLoader(PDF_DATA_PATH)
    return document_loader.load()


def split_documents(documents: list[Document]):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=80,
        length_function=len,
        is_separator_regex=False,
    )
    return text_splitter.split_documents(documents)


def add_to_chroma(chunks: list[Document]):
    # Load the existing database.
    db = Chroma(
        persist_directory=CHROMA_PATH, embedding_function=get_embedding_function()
    )

    # Calculate Page IDs.
    chunks_with_ids = calculate_chunk_ids(chunks)

    # Add or Update the documents.
    existing_items = db.get(include=[])  # IDs are always included by default
    existing_ids = set(existing_items["ids"])
    print(f"Number of existing documents in DB: {len(existing_ids)}")

    # Only add documents that don't exist in the DB.
    new_chunks = []
    for chunk in chunks_with_ids:
        if chunk.metadata["id"] not in existing_ids:
            new_chunks.append(chunk)

    if len(new_chunks):
        print(f"👉 Adding new documents: {len(new_chunks)}")
        new_chunk_ids = [chunk.metadata["id"] for chunk in new_chunks]
        db.add_documents(new_chunks, ids=new_chunk_ids)
        # db.persist()  # not needed: langchain_chroma persists automatically
    else:
        print("✅ No new documents to add")


def calculate_chunk_ids(chunks):
    # This will create IDs like "data/monopoly.pdf:6:2"
    # Page Source : Page Number : Chunk Index
    last_page_id = None
    current_chunk_index = 0

    for chunk in chunks:
        source = chunk.metadata.get("source")
        page = chunk.metadata.get("page")
        current_page_id = f"{source}:{page}"

        # If the page ID is the same as the last one, increment the index.
        if current_page_id == last_page_id:
            current_chunk_index += 1
        else:
            current_chunk_index = 0

        # Calculate the chunk ID.
        chunk_id = f"{current_page_id}:{current_chunk_index}"
        last_page_id = current_page_id

        # Add it to the page meta-data.
        chunk.metadata["id"] = chunk_id

    return chunks


def clear_database():
    if os.path.exists(CHROMA_PATH):
        shutil.rmtree(CHROMA_PATH)


if __name__ == "__main__":
    main()
This script does the following: it loads all PDFs from the directory given by the PDF_DATA_PATH environment variable, splits them into overlapping chunks with RecursiveCharacterTextSplitter to make them manageable for the embedding model, assigns each chunk a stable ID of the form source:page:chunk, and adds any chunks not already present to the Chroma database.
Now that everything is set up, you can run the script to populate the Chroma database. Open your terminal and execute the following command:
bitnami@ip-127-26-2-172:~$
python populate_database.py --reset
This command will clear any existing index first (because of the --reset flag), then load, split, embed, and store your PDFs.
Once the script completes, you should see the message “✨ Clearing Database” if the database was reset, followed by details about how many documents were added.
With the Chroma database populated, you now have a locally indexed set of documents ready to be used in your RAG app!
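If you want to double-check what ended up in the index, a small snippet like this (run from the project directory so the .env paths resolve) lists how many chunks are stored and shows a few of their IDs:
import os
from dotenv import load_dotenv
from langchain_chroma import Chroma
from get_embedding_function import get_embedding_function

load_dotenv()
db = Chroma(
    persist_directory=os.environ.get("CHROMA_PATH"),
    embedding_function=get_embedding_function(),
)
items = db.get(include=[])        # IDs are returned even with an empty include list
print(f"{len(items['ids'])} chunks indexed")
print(items["ids"][:3])           # e.g. 'data/paper.pdf:0:0'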
In this section, we’ll focus on setting up the Retrieval-Augmented Generation (RAG) system, which combines document retrieval with AI-generated responses. This involves setting up ChromaDB for document indexing, processing user queries, and generating responses using AWS Bedrock (via the Claude model).
ChromaDB is used for storing and retrieving documents that are relevant to the user’s query. We’ll use ChromaDB to index the documents we previously processed and then search the database to find the most relevant documents.
To interact with ChromaDB, we’ll be using the langchain_chroma
library. Let’s first set up a connection to the Chroma database.
Create the query_rag
function that searches ChromaDB and retrieves relevant documents:
import os
from dotenv import load_dotenv
from langchain_chroma import Chroma
from get_embedding_function import get_embedding_function

load_dotenv()
CHROMA_PATH = os.environ.get("CHROMA_PATH")


def query_rag(query_text: str):
    # Prepare the DB.
    embedding_function = get_embedding_function()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Search the DB.
    results = db.similarity_search_with_score(query_text, k=5)

    # Prepare the context for the response.
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    return context_text
This function initializes the Chroma database with the embeddings we created earlier and searches for the most relevant documents based on the query.
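As a quick check, you could drop the snippet into a scratch file and call the function directly; the question below is just a placeholder:
if __name__ == "__main__":
    context = query_rag("What are the main topics covered in these documents?")
    print(context[:500])  # first 500 characters of the stitched-together chunks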
We already have a function add_to_chroma()
from the “Preparing the Data” section. This function is responsible for adding the documents to ChromaDB during the data preparation process.
If you haven’t done so already, make sure to run the populate_database.py
script to add your documents to the ChromaDB before proceeding.
Once the documents are indexed, we can process user queries by searching ChromaDB for relevant documents and generating responses using an AI model.
The core of the query processing system is the query_rag
function, which retrieves relevant documents from ChromaDB. After obtaining the relevant context from the search results, we need to integrate this context into a prompt for the AI model.
The query_rag
function does exactly that—retrieves documents and formats them into a prompt that will be sent to the AI model.
AWS Bedrock is a powerful tool that allows us to query AI models for generating text. We’ll use it to generate responses based on the query and the relevant context retrieved from ChromaDB.
We will create a class called Claude
to handle interactions with the Claude AI model via the AWS Bedrock API. The Claude
class will send a prompt to the model and retrieve the response.
Create a file called chat.py
with the following content:
import boto3
client = boto3.client('bedrock-runtime')
# MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
# MODEL_ID = "amazon.titan-text-express-v1"
MODEL_ID = "eu.anthropic.claude-3-sonnet-20240229-v1:0"
class Claude:
    def __init__(self):
        self.model_id = MODEL_ID

    def invoke(self, prompt: str) -> str:
        messages = [
            {
                "role": "user",
                "content": [
                    {"text": prompt},
                ],
            }
        ]
        response = client.converse(
            modelId=self.model_id,
            messages=messages,
        )
        return response['output']['message']['content'][0]['text']
The Claude
class allows us to send a prompt to the Claude model and get back a response. The invoke()
method takes in a prompt
and returns the generated response.
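A quick way to confirm your Bedrock access works (the chosen Claude model must be enabled for your account and region) is to invoke the class directly:
from chat import Claude

model = Claude()
print(model.invoke("Reply with a single word: hello"))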
Now, we need to combine the context from the ChromaDB search results with a user’s question to create a prompt. The prompt will be formatted and sent to the Claude model for generating the answer.
In the app.py
file, we’ll define the main logic for the web application. The app will use the query_rag
function to process user queries and format the answers.
Here’s the code for app.py
:
from flask import Flask, render_template, request
from langchain_chroma import Chroma
from langchain.prompts import ChatPromptTemplate
from get_embedding_function import get_embedding_function
from dotenv import load_dotenv
from chat import Claude
from markdown import markdown
from pathlib import Path
import os
# Load environment variables
load_dotenv()
CHROMA_PATH = os.environ.get("CHROMA_PATH")
# Initialize Flask app
app = Flask(__name__)
PROMPT_TEMPLATE = """Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""
url = 'https://domain.sharepoint.com/path%2Fto%2FPapers%2F{}%2Epdf'
def query_rag(query_text: str):
    # Prepare the DB.
    embedding_function = get_embedding_function()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Search the DB.
    results = db.similarity_search_with_score(query_text, k=5)
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=query_text)

    # Use Claude model to get the response
    model = Claude()
    response_text = model.invoke(prompt)

    # Format the response with sources
    formatted_source_list = []
    for doc, _score in results:
        source = doc.metadata.get("id", None)
        if source is None:
            continue
        # IDs look like "path/to/file.pdf:page:chunk"; rsplit guards against
        # colons appearing in the file path itself.
        document, page_no, chunk_no = source.rsplit(":", 2)
        file_name = Path(document).stem
        document_url = url.format(file_name)

        # Extract the chunk text (source content)
        chunk_text = doc.page_content.replace("\n", "")
        formatted_source_list.append(
            f'''
            <div class="source-block">
              <p><strong>Source:</strong> <a href="{document_url}" target="_blank" rel="noreferrer">
              {file_name}</a>, page {int(page_no) + 1}, chunk {int(chunk_no) + 1}</p>
              <div class="source-text">{chunk_text}</div>
            </div>
            '''
        )
    formatted_sources = "".join(formatted_source_list)
    formatted_response = (
        f"<h2>Question</h2><p>{query_text}</p>"
        f"<h2>Response</h2><p>{response_text}</p>"
        f"<h2>Sources</h2>{formatted_sources}"
    )
    return formatted_response


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Get the input question from the form
        query_text = request.form["query_text"]
        # Get the formatted HTML response
        formatted_response = query_rag(query_text)
        return render_template("index.html", response=formatted_response)
    return render_template("index.html", response="")


if __name__ == "__main__":
    app.run(debug=True, port=5001)
This app will accept a question from the web form, use the query_rag function to search ChromaDB for relevant documents, build a prompt from the retrieved context, and let the Claude model generate the answer, which is displayed together with its sources.
Once you’ve set everything up, run the Flask app using the following command:
bitnami@ip-127-26-2-172:~$
python app.py
You can now access the app at http://127.0.0.1:5001/
. Enter a question, and the app will retrieve relevant documents, generate a response using AWS Bedrock, and display it along with the source documents.
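If you prefer the command line, you can also exercise the endpoint with curl; the form field name query_text matches what the index() view reads from request.form, the question is just a placeholder, and the response comes back as rendered HTML:
bitnami@ip-127-26-2-172:~$
curl -X POST -d "query_text=What does the first document say about treatment?" http://127.0.0.1:5001/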
Congratulations! You have successfully built a RAG system that integrates document retrieval with AI-generated responses.
In this section, we’ll enhance the user interface of the “Question the Lung Sage” web application by adding custom styling and formatting. This will improve the user experience, ensuring that your app looks clean, modern, and responsive across devices.
First, let’s start with the main HTML template. This file will handle the structure of the page, including the form where users will submit their questions, and the area where the response will be displayed.
Create the file index.html
inside the /templates
directory with the following content:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Question the Lung Sage</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='styles.css') }}">
</head>
<body>
    <h1>Question the Lung Sage 👴🏻</h1>

    <form method="POST" id="queryForm">
        <label for="query_text">Enter your question:</label><br>
        <input type="text" id="query_text" name="query_text" required><br><br>
        <button type="submit" id="submitBtn">Submit</button>

        <!-- Loading Spinner (Initially Hidden) -->
        <div id="loading" class="spinner hidden"></div>
    </form>

    {% if response %}
    <div>{{ response | safe }}</div>
    {% endif %}

    <script>
        document.getElementById("queryForm").addEventListener("submit", function() {
            // Change button text and disable it
            let submitBtn = document.getElementById("submitBtn");
            submitBtn.innerText = "Thinking...";
            submitBtn.disabled = true;

            // Show the loading spinner
            document.getElementById("loading").classList.remove("hidden");
        });
    </script>
</body>
</html>
Next, let’s create a custom CSS file to style the form, the response, and the loading spinner.
Create the file styles.css
inside the /static
directory with the following content:
/* General Styles */
body {
font-family: Arial, sans-serif;
text-align: center;
max-width: 800px;
margin: auto auto 40px auto;
}
/* Loading Spinner */
.spinner {
margin-top: 20px;
width: 40px;
height: 40px;
border: 5px solid #f3f3f3;
border-top: 5px solid #007bff;
border-radius: 50%;
animation: spin 1s linear infinite;
}
.hidden {
display: none;
}
/* Spinner Animation */
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
/* Reset some default styling */
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'Arial', sans-serif;
background-color: #f4f7fa;
color: #333;
line-height: 1.6;
padding: 20px;
}
/* Main Header */
h1 {
font-size: 2.5rem;
color: #2d3e50;
max-width: 600px;
margin: auto auto 20px;
}
/* Section Header */
h2 {
font-size: 1.8rem;
color: #34495e;
margin-bottom: 15px;
}
/* Form Styles */
form {
background-color: #fff;
border-radius: 8px;
box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
padding: 30px;
width: 100%;
max-width: 600px;
margin: 20px auto;
}
/* Label Styles */
label {
font-size: 1.1rem;
color: #34495e;
margin-bottom: 10px;
display: block;
}
/* Input Fields */
input[type="text"] {
width: 100%;
padding: 10px;
font-size: 1rem;
border: 2px solid #ddd;
border-radius: 4px;
margin-bottom: 20px;
}
/* Submit Button Styling */
button[type="submit"], input[type="submit"] {
background-color: #2ecc71;
color: #fff;
padding: 12px 20px;
border: none;
border-radius: 4px;
font-size: 1rem;
cursor: pointer;
transition: background-color 0.3s ease;
margin-top: 10px;
}
button[type="submit"]:hover, input[type="submit"]:hover {
background-color: #27ae60;
}
button[type="submit"]:disabled, input[type="submit"]:disabled {
background-color: #7f8c8d;
cursor: not-allowed;
}
/* Styled Markdown Response */
div {
background-color: #fff;
border-radius: 8px;
box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
padding: 30px;
margin-top: 30px;
max-width: 800px;
margin: 30px auto;
}
/* Code Block */
pre {
white-space: pre-wrap;
word-wrap: break-word;
background-color: #f9f9f9;
padding: 15px;
border-radius: 5px;
overflow-x: auto;
}
/* Source Block */
.source-block {
background-color: #f9f9f9;
border-left: 4px solid #2ecc71;
padding: 12px;
margin: 10px 0;
font-family: 'Arial', sans-serif;
font-size: 14px;
text-align: left;
border-radius: 8px;
box-shadow: 0px 2px 8px rgba(0, 0, 0, 0.1);
}
.source-text {
font-style: italic;
color: #616161;
white-space: pre-wrap;
}
/* Responsive Design */
@media (max-width: 600px) {
body {
padding: 10px;
}
h1 {
font-size: 2rem;
}
form {
padding: 20px;
}
input[type="text"],
input[type="submit"],
button[type="submit"] {
font-size: 0.9rem;
}
}
Congratulations! You’ve successfully built a fully functional RAG (Retrieval-Augmented Generation) system using Flask, AWS Bedrock, ChromaDB, and custom styling. In this tutorial, we’ve covered everything from setting up the backend to integrating the AI-driven document retrieval system, and finally enhancing the user interface with clean, responsive design elements.
With your app now able to process user queries and return AI-generated responses based on relevant documents, you have a powerful tool for information retrieval. The integration of Flask with AI models and a well-styled frontend makes it both user-friendly and efficient.
Now that you have a solid foundation, there are plenty of directions in which you could extend the app.
You’ve learned how to integrate powerful AI models with a simple web interface, and this knowledge can be applied to many other applications that require document-based question answering.
Thank you for following along with this tutorial, and best of luck with your future AI projects!