In this blog post, I’ll guide you through the process of creating a Retrieval-Augmented Generation (RAG) app for querying local documents using a combination of the AWS Bedrock API, ChromaDB, and Flask. You’ll learn how to set up document indexing, implement search and query functionality, and generate AI-powered responses based on local PDFs. This is an essential tutorial for anyone looking to integrate advanced AI query capabilities into their local document management systems. The database indexing is based on an excellent tutorial by Pixegami.
Ask a large language model questions about local files securely. Here is a screenshot of what we will be creating:
A Retrieval-Augmented Generation (RAG) app combines two powerful techniques to generate high-quality, contextually aware responses from an AI model. In a RAG system, when a user submits a query, the system first retrieves relevant documents or data from a database or an index. Then, the AI model generates an answer based on the retrieved information. This method allows the AI to pull from a wide range of sources and offer more accurate and personalized responses compared to standard text generation models, which rely solely on the information they were trained on.
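To make the flow concrete, here is a minimal conceptual sketch of the retrieve-then-generate loop; retrieve() and generate() are hypothetical placeholders for the ChromaDB similarity search and the Bedrock model call we will build later in this post.
# Conceptual sketch only: retrieve() and generate() stand in for the
# ChromaDB search and the Bedrock call implemented further down.
def answer(question: str) -> str:
    context_chunks = retrieve(question, k=5)   # 1. find the most relevant passages
    context = "\n\n".join(context_chunks)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)                    # 2. let the LLM answer from that context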
In this tutorial, we’ll create a RAG app that uses local documents (in this case, PDFs) as the knowledge base. The system will retrieve relevant text from these documents, pass it to an AI model for processing, and generate contextually accurate responses. This app will showcase the integration of AWS Bedrock for text embeddings, ChromaDB for document indexing, and Flask for building the web application.
Start by creating a new conda environment. This will ensure that your project dependencies are isolated from other projects. Open your terminal and run the following commands:
bitnami@ip-127-26-2-172:~$
conda create --name rag-app python=3.9
conda activate rag-app
Next, you’ll need to install the necessary libraries for this project. These include libraries for document loading, text splitting, embeddings, and web app functionality. Run the following command to install all dependencies:
bitnami@ip-127-26-2-172:~$
pip install langchain langchain-aws langchain-community langchain-chroma langchain-text-splitters boto3 Flask chromadb python-dotenv pypdf markdown
This installs the LangChain packages for document loading, splitting, and the Bedrock and Chroma integrations; boto3 for talking to AWS; Flask for the web app; chromadb as the vector store; python-dotenv for reading configuration from .env files; pypdf for parsing the PDFs; and markdown for formatting responses.
To use AWS Bedrock, you need an AWS account. If you don’t already have one, create one at aws.amazon.com, then set up an IAM user for this project and attach the AmazonBedrockFullAccess policy to it.
To authenticate your application with AWS, you’ll need to set up your credentials. The easiest way is to use the AWS CLI, which will automatically configure the necessary credentials.
Install the AWS CLI (if you don’t have it already):
bitnami@ip-127-26-2-172:~$
pip install awscli
Configure your AWS CLI with your credentials:
bitnami@ip-127-26-2-172:~$
aws configure
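Running aws configure prompts for four values and stores them under ~/.aws/; the region shown here is only an example, so pick the region where you have Bedrock model access:
AWS Access Key ID [None]: <your access key id>
AWS Secret Access Key [None]: <your secret access key>
Default region name [None]: eu-central-1
Default output format [None]: json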
Now that the environment is set up and the AWS credentials are configured, you’re ready to start building the RAG app!
Before we can start using the RAG app, we need to prepare the data. This involves loading, splitting, and indexing the documents that will be used for answering queries. In this section, we will set up the pipeline for processing the documents, generating embeddings, and storing them in a Chroma database.
Start by creating a .env
file in your project directory to store environment variables. These will include the paths for your PDF data and Chroma database. The file should look like this:
PDF_DATA_PATH=/path/to/your/pdf/documents
CHROMA_PATH=/path/to/your/chroma/database
Make sure to replace /path/to/your/pdf/documents
with the actual path to your PDF files and /path/to/your/chroma/database
with the path where you’d like to store your Chroma database.
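If you want to confirm the variables are picked up, a quick check with python-dotenv (run from the project directory so the .env file is found) looks like this:
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current directory into the process environment
print(os.environ.get("PDF_DATA_PATH"))  # should print your PDF folder
print(os.environ.get("CHROMA_PATH"))    # should print your Chroma directory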
The next step is to create a function to fetch embeddings using AWS Bedrock. This function will use the amazon.titan-embed-text-v2:0
model to generate text embeddings.
Create a file called get_embedding_function.py
with the following content:
from langchain_aws import BedrockEmbeddings
def get_embedding_function():
    embeddings = BedrockEmbeddings(model_id='amazon.titan-embed-text-v2:0')
    return embeddings
This function will return the embedding function that will later be used to transform text into vectors.
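As a quick sanity check of your Bedrock access (model access for Titan embeddings must already be enabled in your AWS account and region), you can embed a test string and look at the vector size:
from get_embedding_function import get_embedding_function

embeddings = get_embedding_function()
# embed_query() turns a single string into a list of floats
vector = embeddings.embed_query("What is retrieval-augmented generation?")
print(len(vector))  # Titan Text Embeddings v2 returns 1024-dimensional vectors by default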
Now that we have the embedding function, it’s time to create a script to load the documents, split them into smaller chunks, and index them into Chroma. Create a file called populate_database.py
and add the following code:
import argparse
import shutil
import os
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from get_embedding_function import get_embedding_function
from dotenv import load_dotenv
load_dotenv()
PDF_DATA_PATH = os.environ.get('PDF_DATA_PATH')
CHROMA_PATH = os.environ.get('CHROMA_PATH')
def main():
    # Check if the database should be cleared (using the --reset flag).
    parser = argparse.ArgumentParser()
    parser.add_argument("--reset", action="store_true", help="Reset the database.")
    args = parser.parse_args()
    if args.reset:
        print("✨ Clearing Database")
        clear_database()

    # Create (or update) the data store.
    documents = load_documents()
    chunks = split_documents(documents)
    add_to_chroma(chunks)


def load_documents():
    document_loader = PyPDFDirectoryLoader(PDF_DATA_PATH)
    return document_loader.load()


def split_documents(documents: list[Document]):
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=800,
        chunk_overlap=80,
        length_function=len,
        is_separator_regex=False,
    )
    return text_splitter.split_documents(documents)


def add_to_chroma(chunks: list[Document]):
    # Load the existing database.
    db = Chroma(
        persist_directory=CHROMA_PATH, embedding_function=get_embedding_function()
    )

    # Calculate Page IDs.
    chunks_with_ids = calculate_chunk_ids(chunks)

    # Add or Update the documents.
    existing_items = db.get(include=[])  # IDs are always included by default
    existing_ids = set(existing_items["ids"])
    print(f"Number of existing documents in DB: {len(existing_ids)}")

    # Only add documents that don't exist in the DB.
    new_chunks = []
    for chunk in chunks_with_ids:
        if chunk.metadata["id"] not in existing_ids:
            new_chunks.append(chunk)

    if len(new_chunks):
        print(f"👉 Adding new documents: {len(new_chunks)}")
        new_chunk_ids = [chunk.metadata["id"] for chunk in new_chunks]
        db.add_documents(new_chunks, ids=new_chunk_ids)
        # db.persist()  # not needed: langchain_chroma persists automatically
    else:
        print("✅ No new documents to add")


def calculate_chunk_ids(chunks):
    # This will create IDs like "data/monopoly.pdf:6:2"
    # Page Source : Page Number : Chunk Index
    last_page_id = None
    current_chunk_index = 0

    for chunk in chunks:
        source = chunk.metadata.get("source")
        page = chunk.metadata.get("page")
        current_page_id = f"{source}:{page}"

        # If the page ID is the same as the last one, increment the index.
        if current_page_id == last_page_id:
            current_chunk_index += 1
        else:
            current_chunk_index = 0

        # Calculate the chunk ID.
        chunk_id = f"{current_page_id}:{current_chunk_index}"
        last_page_id = current_page_id

        # Add it to the page meta-data.
        chunk.metadata["id"] = chunk_id

    return chunks


def clear_database():
    if os.path.exists(CHROMA_PATH):
        shutil.rmtree(CHROMA_PATH)


if __name__ == "__main__":
    main()
This script does the following: it loads all PDFs from the directory given by the PDF_DATA_PATH environment variable, splits them into overlapping chunks with RecursiveCharacterTextSplitter to make them manageable for the embedding model, assigns each chunk a stable ID of the form source:page:chunk, and adds any chunks not already present to the Chroma database.
Now that everything is set up, you can run the script to populate the Chroma database. Open your terminal and execute the following command:
bitnami@ip-127-26-2-172:~$
python populate_database.py --reset
This command will clear any existing index first (because of the --reset flag), then load, split, embed, and store your PDFs.
Once the script completes, you should see the message “✨ Clearing Database” if the database was reset, followed by details about how many documents were added.
With the Chroma database populated, you now have a locally indexed set of documents ready to be used in your RAG app!
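If you want to double-check what ended up in the index, a small snippet like this (run from the project directory so the .env paths resolve) lists how many chunks are stored and shows a few of their IDs:
import os
from dotenv import load_dotenv
from langchain_chroma import Chroma
from get_embedding_function import get_embedding_function

load_dotenv()
db = Chroma(
    persist_directory=os.environ.get("CHROMA_PATH"),
    embedding_function=get_embedding_function(),
)
items = db.get(include=[])        # IDs are returned even with an empty include list
print(f"{len(items['ids'])} chunks indexed")
print(items["ids"][:3])           # e.g. 'data/paper.pdf:0:0'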
In this section, we’ll focus on setting up the Retrieval-Augmented Generation (RAG) system, which combines document retrieval with AI-generated responses. This involves setting up ChromaDB for document indexing, processing user queries, and generating responses using AWS Bedrock (via the Claude model).
ChromaDB is used for storing and retrieving documents that are relevant to the user’s query. We’ll use ChromaDB to index the documents we previously processed and then search the database to find the most relevant documents.
To interact with ChromaDB, we’ll be using the langchain_chroma
library. Let’s first set up a connection to the Chroma database.
Create the query_rag
function that searches ChromaDB and retrieves relevant documents:
import os
from dotenv import load_dotenv
from langchain_chroma import Chroma
from get_embedding_function import get_embedding_function

load_dotenv()
CHROMA_PATH = os.environ.get("CHROMA_PATH")


def query_rag(query_text: str):
    # Prepare the DB.
    embedding_function = get_embedding_function()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Search the DB.
    results = db.similarity_search_with_score(query_text, k=5)

    # Prepare the context for the response.
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    return context_text
This function initializes the Chroma database with the embeddings we created earlier and searches for the most relevant documents based on the query.
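As a quick check, you could drop the snippet into a scratch file and call the function directly; the question below is just a placeholder:
if __name__ == "__main__":
    context = query_rag("What are the main topics covered in these documents?")
    print(context[:500])  # first 500 characters of the stitched-together chunks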
We already have a function add_to_chroma()
from the “Preparing the Data” section. This function is responsible for adding the documents to ChromaDB during the data preparation process.
If you haven’t done so already, make sure to run the populate_database.py
script to add your documents to the ChromaDB before proceeding.
Once the documents are indexed, we can process user queries by searching ChromaDB for relevant documents and generating responses using an AI model.
The core of the query processing system is the query_rag
function, which retrieves relevant documents from ChromaDB. After obtaining the relevant context from the search results, we need to integrate this context into a prompt for the AI model.
The query_rag
function does exactly that—retrieves documents and formats them into a prompt that will be sent to the AI model.
AWS Bedrock is a powerful tool that allows us to query AI models for generating text. We’ll use it to generate responses based on the query and the relevant context retrieved from ChromaDB.
We will create a class called Claude
to handle interactions with the Claude AI model via the AWS Bedrock API. The Claude
class will send a prompt to the model and retrieve the response.
Create a file called chat.py
with the following content:
import boto3
client = boto3.client('bedrock-runtime')
# MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"
# MODEL_ID = "amazon.titan-text-express-v1"
MODEL_ID = "eu.anthropic.claude-3-sonnet-20240229-v1:0"
class Claude:
    def __init__(self):
        self.model_id = MODEL_ID

    def invoke(self, prompt: str) -> str:
        messages = [
            {
                "role": "user",
                "content": [
                    {"text": prompt},
                ],
            }
        ]
        response = client.converse(
            modelId=self.model_id,
            messages=messages,
        )
        return response['output']['message']['content'][0]['text']
The Claude
class allows us to send a prompt to the Claude model and get back a response. The invoke()
method takes in a prompt
and returns the generated response.
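A quick way to confirm your Bedrock access works (the chosen Claude model must be enabled for your account and region) is to invoke the class directly:
from chat import Claude

model = Claude()
print(model.invoke("Reply with a single word: hello"))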
Now, we need to combine the context from the ChromaDB search results with a user’s question to create a prompt. The prompt will be formatted and sent to the Claude model for generating the answer.
In the app.py
file, we’ll define the main logic for the web application. The app will use the query_rag
function to process user queries and format the answers.
Here’s the code for app.py
:
from flask import Flask, render_template, request
from langchain_chroma import Chroma
from langchain.prompts import ChatPromptTemplate
from get_embedding_function import get_embedding_function
from dotenv import load_dotenv
from chat import Claude
from markdown import markdown
from pathlib import Path
import os
# Load environment variables
load_dotenv()
CHROMA_PATH = os.environ.get("CHROMA_PATH")
# Initialize Flask app
app = Flask(__name__)
PROMPT_TEMPLATE = """Answer the question based only on the following context:

{context}

---

Answer the question based on the above context: {question}
"""
url = 'https://domain.sharepoint.com/path%2Fto%2FPapers%2F{}%2Epdf'
def query_rag(query_text: str):
    # Prepare the DB.
    embedding_function = get_embedding_function()
    db = Chroma(persist_directory=CHROMA_PATH, embedding_function=embedding_function)

    # Search the DB.
    results = db.similarity_search_with_score(query_text, k=5)
    context_text = "\n\n---\n\n".join([doc.page_content for doc, _score in results])
    prompt_template = ChatPromptTemplate.from_template(PROMPT_TEMPLATE)
    prompt = prompt_template.format(context=context_text, question=query_text)

    # Use Claude model to get the response
    model = Claude()
    response_text = model.invoke(prompt)

    # Format the response with sources
    formatted_source_list = []
    for doc, _score in results:
        source = doc.metadata.get("id", None)
        if source is None:
            continue
        # IDs look like "path/to/file.pdf:page:chunk"; rsplit guards against
        # colons appearing in the file path itself.
        document, page_no, chunk_no = source.rsplit(":", 2)
        file_name = Path(document).stem
        document_url = url.format(file_name)

        # Extract the chunk text (source content)
        chunk_text = doc.page_content.replace("\n", "")
        formatted_source_list.append(
            f'''
            <div class="source-block">
              <p><strong>Source:</strong> <a href="{document_url}" target="_blank" rel="noreferrer">
              {file_name}</a>, page {int(page_no) + 1}, chunk {int(chunk_no) + 1}</p>
              <div class="source-text">{chunk_text}</div>
            </div>
            '''
        )
    formatted_sources = "".join(formatted_source_list)
    formatted_response = (
        f"<h2>Question</h2><p>{query_text}</p>"
        f"<h2>Response</h2><p>{response_text}</p>"
        f"<h2>Sources</h2>{formatted_sources}"
    )
    return formatted_response


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Get the input question from the form
        query_text = request.form["query_text"]
        # Get the formatted HTML response
        formatted_response = query_rag(query_text)
        return render_template("index.html", response=formatted_response)
    return render_template("index.html", response="")


if __name__ == "__main__":
    app.run(debug=True, port=5001)
This app will accept a question from the web form, use the query_rag function to search ChromaDB for relevant documents, build a prompt from the retrieved context, and let the Claude model generate the answer, which is displayed together with its sources.
Once you’ve set everything up, run the Flask app using the following command:
bitnami@ip-127-26-2-172:~$
python app.py
You can now access the app at http://127.0.0.1:5001/
. Enter a question, and the app will retrieve relevant documents, generate a response using AWS Bedrock, and display it along with the source documents.
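If you prefer the command line, you can also exercise the endpoint with curl; the form field name query_text matches what the index() view reads from request.form, the question is just a placeholder, and the response comes back as rendered HTML:
bitnami@ip-127-26-2-172:~$
curl -X POST -d "query_text=What does the first document say about treatment?" http://127.0.0.1:5001/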
Congratulations! You have successfully built a RAG system that integrates document retrieval with AI-generated responses.
In this section, we’ll enhance the user interface of the “Question the Lung Sage” web application by adding custom styling and formatting. This will improve the user experience, ensuring that your app looks clean, modern, and responsive across devices.
First, let’s start with the main HTML template. This file will handle the structure of the page, including the form where users will submit their questions, and the area where the response will be displayed.
Create the file index.html
inside the /templates
directory with the following content:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Question the Lung Sage</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='styles.css') }}">
</head>
<body>
    <h1>Question the Lung Sage 👴🏻</h1>

    <form method="POST" id="queryForm">
        <label for="query_text">Enter your question:</label><br>
        <input type="text" id="query_text" name="query_text" required><br><br>
        <button type="submit" id="submitBtn">Submit</button>

        <!-- Loading Spinner (Initially Hidden) -->
        <div id="loading" class="spinner hidden"></div>
    </form>

    {% if response %}
    <div>{{ response | safe }}</div>
    {% endif %}

    <script>
        document.getElementById("queryForm").addEventListener("submit", function() {
            // Change button text and disable it
            let submitBtn = document.getElementById("submitBtn");
            submitBtn.innerText = "Thinking...";
            submitBtn.disabled = true;

            // Show the loading spinner
            document.getElementById("loading").classList.remove("hidden");
        });
    </script>
</body>
</html>
Next, let’s create a custom CSS file to style the form, the response, and the loading spinner.
Create the file styles.css
inside the /static
directory with the following content:
/* General Styles */
body {
font-family: Arial, sans-serif;
text-align: center;
max-width: 800px;
margin: auto auto 40px auto;
}
/* Loading Spinner */
.spinner {
margin-top: 20px;
width: 40px;
height: 40px;
border: 5px solid #f3f3f3;
border-top: 5px solid #007bff;
border-radius: 50%;
animation: spin 1s linear infinite;
}
.hidden {
display: none;
}
/* Spinner Animation */
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
/* Reset some default styling */
* {
margin: 0;
padding: 0;
box-sizing: border-box;
}
body {
font-family: 'Arial', sans-serif;
background-color: #f4f7fa;
color: #333;
line-height: 1.6;
padding: 20px;
}
/* Main Header */
h1 {
font-size: 2.5rem;
color: #2d3e50;
max-width: 600px;
margin: auto auto 20px;
}
/* Section Header */
h2 {
font-size: 1.8rem;
color: #34495e;
margin-bottom: 15px;
}
/* Form Styles */
form {
background-color: #fff;
border-radius: 8px;
box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
padding: 30px;
width: 100%;
max-width: 600px;
margin: 20px auto;
}
/* Label Styles */
label {
font-size: 1.1rem;
color: #34495e;
margin-bottom: 10px;
display: block;
}
/* Input Fields */
input[type="text"] {
width: 100%;
padding: 10px;
font-size: 1rem;
border: 2px solid #ddd;
border-radius: 4px;
margin-bottom: 20px;
}
/* Submit Button Styling */
button[type="submit"], input[type="submit"] {
background-color: #2ecc71;
color: #fff;
padding: 12px 20px;
border: none;
border-radius: 4px;
font-size: 1rem;
cursor: pointer;
transition: background-color 0.3s ease;
margin-top: 10px;
}
button[type="submit"]:hover, input[type="submit"]:hover {
background-color: #27ae60;
}
button[type="submit"]:disabled, input[type="submit"]:disabled {
background-color: #7f8c8d;
cursor: not-allowed;
}
/* Styled Markdown Response */
div {
background-color: #fff;
border-radius: 8px;
box-shadow: 0 2px 10px rgba(0, 0, 0, 0.1);
padding: 30px;
margin-top: 30px;
max-width: 800px;
margin: 30px auto;
}
/* Code Block */
pre {
white-space: pre-wrap;
word-wrap: break-word;
background-color: #f9f9f9;
padding: 15px;
border-radius: 5px;
overflow-x: auto;
}
/* Source Block */
.source-block {
background-color: #f9f9f9;
border-left: 4px solid #2ecc71;
padding: 12px;
margin: 10px 0;
font-family: 'Arial', sans-serif;
font-size: 14px;
text-align: left;
border-radius: 8px;
box-shadow: 0px 2px 8px rgba(0, 0, 0, 0.1);
}
.source-text {
font-style: italic;
color: #616161;
white-space: pre-wrap;
}
/* Responsive Design */
@media (max-width: 600px) {
body {
padding: 10px;
}
h1 {
font-size: 2rem;
}
form {
padding: 20px;
}
input[type="text"],
input[type="submit"],
button[type="submit"] {
font-size: 0.9rem;
}
}
Congratulations! You’ve successfully built a fully functional RAG (Retrieval-Augmented Generation) system using Flask, AWS Bedrock, ChromaDB, and custom styling. In this tutorial, we’ve covered everything from setting up the backend to integrating the AI-driven document retrieval system, and finally enhancing the user interface with clean, responsive design elements.
With your app now able to process user queries and return AI-generated responses based on relevant documents, you have a powerful tool for information retrieval. The integration of Flask with AI models and a well-styled frontend makes it both user-friendly and efficient.
Now that you have a solid foundation, there are plenty of directions in which you could extend the app.
You’ve learned how to integrate powerful AI models with a simple web interface, and this knowledge can be applied to many other applications that require document-based question answering.
Thank you for following along with this tutorial, and best of luck with your future AI projects!