
Commit 403fd5e

add custom prompt

1 parent 69bc345 commit 403fd5e

File tree

1 file changed: +144 -84 lines changed

tutorials/how-to-implement-rag/index.mdx (+144 -84)
@@ -33,12 +33,16 @@ LangChain simplifies the process of enhancing language models with retrieval cap

## Configure your development environment

### Step 1: Install Required Packages

Run the following command to install the required packages:

```sh
pip install langchain psycopg2 python-dotenv
```

### Step 2: Create a .env File

Create a .env file and add the following variables. These will store your API keys, database connection details, and other configuration values.

```sh
# .env file
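# The variables below are the ones referenced later in this tutorial. The
# values are placeholders; replace them with your own credentials, endpoints,
# and database connection details.
SCW_ACCESS_KEY=<your-access-key>
SCW_SECRET_KEY=<your-secret-key>
SCW_API_KEY=<your-api-key>
SCW_BUCKET_NAME=<your-bucket-name>
SCW_BUCKET_ENDPOINT=<your-bucket-endpoint>
SCW_DEFAULT_REGION=<your-region>
SCW_INFERENCE_EMBEDDINGS_ENDPOINT=<your-embeddings-deployment-endpoint>
SCW_INFERENCE_DEPLOYMENT_ENDPOINT=<your-llm-deployment-endpoint>
```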
@@ -67,23 +71,28 @@ LangChain simplifies the process of enhancing language models with retrieval cap

## Setting Up Managed Databases

### Step 1: Connect to Your PostgreSQL Database

To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as psql. The following steps will guide you through setting up your database to handle vector storage and document tracking.

### Step 2: Install the pgvector Extension

pgvector is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
```

### Step 3: Create a Table to Track Processed Documents

To prevent reprocessing documents that have already been loaded and vectorized, create a table to keep track of them. This ensures that new documents added to your object storage bucket are only processed once, avoiding duplicate downloads and redundant vectorization:

```sql
CREATE TABLE IF NOT EXISTS object_loaded (id SERIAL PRIMARY KEY, object_key TEXT);
```

### Step 4: Connect to PostgreSQL Programmatically

You can also connect to your PostgreSQL instance and perform the same tasks programmatically using Python:

```python
# rag.py file
@@ -108,62 +117,30 @@ conn = psycopg2.connect(
cur = conn.cursor()
```

## Embeddings and Vector Store Setup

### Step 1: Import Required Modules

```python
# rag.py

from langchain_openai import OpenAIEmbeddings
from langchain_postgres import PGVector
```

### Step 2: Configure OpenAI Embeddings

We will use the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration.

```python
# rag.py

embeddings = OpenAIEmbeddings(
    openai_api_key=os.getenv("SCW_API_KEY"),
    openai_api_base=os.getenv("SCW_INFERENCE_EMBEDDINGS_ENDPOINT"),
    model="sentence-transformers/sentence-t5-xxl",
    tiktoken_enabled=False,
)
```

#### Key Parameters:
@@ -182,31 +159,49 @@ In the context of using Scaleway’s Managed Inference and the sentence-t5-xxl m
Moreover, leaving tiktoken_enabled as True causes issues when sending data to Scaleway's API because it results in tokenized vectors being sent instead of raw text. Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior.
By setting tiktoken_enabled=False, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure.

### Step 3: Create a PGVector Store

Configure the connection string for your PostgreSQL instance and create a PGVector store to store these embeddings.

```python
# rag.py

connection_string = f"postgresql+psycopg2://{conn.info.user}:{conn.info.password}@{conn.info.host}:{conn.info.port}/{conn.info.dbname}"
vector_store = PGVector(connection=connection_string, embeddings=embeddings)
```

PGVector: This creates the vector store in your PostgreSQL database to store the embeddings.

## Load and Process Documents

Use the S3FileLoader to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database.

### Step 1: Import Required Modules

```python
# rag.py

import boto3

from langchain_community.document_loaders import S3FileLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
```

### Step 2: Load Metadata for Improved Efficiency

By loading the metadata for all objects in your bucket, you can speed up the process significantly: you can quickly check whether a document has already been embedded without loading the entire document.

```python
# rag.py

BUCKET_NAME = os.getenv("SCW_BUCKET_NAME", "")  # the Object Storage bucket holding your documents

endpoint_s3 = f"https://s3.{os.getenv('SCW_DEFAULT_REGION', '')}.scw.cloud"
session = boto3.session.Session()
client_s3 = session.client(service_name='s3', endpoint_url=endpoint_s3,
                           aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""),
                           aws_secret_access_key=os.getenv("SCW_SECRET_KEY", ""))
paginator = client_s3.get_paginator('list_objects_v2')
page_iterator = paginator.paginate(Bucket=BUCKET_NAME)
```

In this code sample we:
@@ -215,34 +210,37 @@ In this code sample we:
- Set Up Pagination for Listing Objects: We prepare pagination to handle potentially large lists of objects efficiently.
- Iterate Through the Bucket: This initiates the pagination process, allowing us to list all objects within the specified Scaleway Object Storage bucket seamlessly.

### Step 3: Iterate Through Metadata

Next, we iterate through the metadata to determine whether each object has already been embedded. If an object hasn't been processed yet, we embed it and load it into the database.

```python
# rag.py
# Note: logger is assumed to be configured earlier in rag.py (for example via the logging module).

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0, add_start_index=True, length_function=len, is_separator_regex=False)
for page in page_iterator:
    for obj in page.get('Contents', []):
        cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (obj['Key'],))
        response = cur.fetchone()
        if response is None:
            file_loader = S3FileLoader(
                bucket=BUCKET_NAME,
                key=obj['Key'],
                endpoint_url=endpoint_s3,
                aws_access_key_id=os.getenv("SCW_ACCESS_KEY", ""),
                aws_secret_access_key=os.getenv("SCW_SECRET_KEY", "")
            )
            file_to_load = file_loader.load()
            chunks = text_splitter.split_text(file_to_load[0].page_content)
            try:
                embeddings_list = [embeddings.embed_query(chunk) for chunk in chunks]
                vector_store.add_embeddings(chunks, embeddings_list)
                # Record the object only once its embeddings have been stored successfully,
                # so that failed objects are retried on the next run.
                cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", (obj['Key'],))
            except Exception as e:
                logger.error(f"An error occurred: {e}")

conn.commit()
```

- S3FileLoader: The S3FileLoader loads each file individually from your ***Scaleway Object Storage bucket*** using the file's object_key (extracted from the file's metadata). It ensures that only the specific file is loaded from the bucket, minimizing the amount of data being retrieved at any given time.
@@ -266,29 +264,44 @@ When a query is made, the RAG system will retrieve the most relevant embeddings,

### Query the RAG System with a pre-defined prompt template

#### Step 1: Import Required Modules

```python
# rag.py

import time

from langchain import hub
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI
```

#### Step 2: Set Up the LLM for Querying

Now, set up the RAG system to handle queries:

```python
# rag.py

llm = ChatOpenAI(
    base_url=os.getenv("SCW_INFERENCE_DEPLOYMENT_ENDPOINT"),
    api_key=os.getenv("SCW_SECRET_KEY"),
    model=deployment.model_name,  # deployment: your Managed Inference deployment, assumed to be defined earlier
)

prompt = hub.pull("rlm/rag-prompt")
retriever = vector_store.as_retriever()

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

for r in rag_chain.stream("Your question"):
    print(r, end="", flush=True)
    time.sleep(0.1)
```
- LLM Initialization: We initialize the ChatOpenAI instance using the endpoint and API key from the environment variables, along with the specified model name.

@@ -302,8 +315,55 @@ llm = ChatOpenAI(

### Query the RAG system with your own prompt template

Personalizing your prompt template allows you to tailor the responses from your RAG (Retrieval-Augmented Generation) system to better fit your specific needs, which can significantly improve the relevance and tone of the answers you receive. Below is a guide to creating a custom prompt for querying the system.

```python
# rag.py

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url=os.getenv("SCW_INFERENCE_DEPLOYMENT_ENDPOINT"),
    api_key=os.getenv("SCW_SECRET_KEY"),
    model=deployment.model_name,
)

prompt = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Always finish your answer by "Thank you for asking". {context} Question: {question} Helpful Answer:"""
custom_rag_prompt = PromptTemplate.from_template(prompt)
retriever = vector_store.as_retriever()
custom_rag_chain = create_stuff_documents_chain(llm, custom_rag_prompt)

context = retriever.invoke("your question")
for r in custom_rag_chain.stream({"question": "your question", "context": context}):
    print(r, end="", flush=True)
    time.sleep(0.1)
```

- Prompt Template: The prompt template directs the model's responses. It instructs the model to rely on the provided context and to say plainly when it lacks the information to answer. To make the responses more engaging, consider adding a light-hearted conclusion or a personalized touch; for example, you might modify the closing line to say, "Thank you for asking! I'm here to help with anything else you need!"
- Retrieving Context: The retriever.invoke() call fetches relevant information from your vector store based on the user's query. It is essential that this step retrieves high-quality context so that the model's responses are accurate and helpful. You can improve the context by fine-tuning your embeddings and ensuring that the documents in your vector store are relevant and well-structured.
- Creating the RAG Chain: The create_stuff_documents_chain function connects the language model with your custom prompt, allowing the model to process the retrieved context and formulate a coherent, context-aware response. Consider experimenting with different chain configurations, as a different chain type may yield different responses.
- Streaming Responses: The loop that streams responses from the custom_rag_chain provides a dynamic user experience: instead of waiting for the entire output, users see the answer as it is generated. You can customize this behavior further, for example with progress indicators or richer UI elements in your applications.

#### Example Use Cases
- Customer Support: Use a custom prompt to answer customer queries effectively, making the interactions feel more personalized and engaging (a sketch follows below).
- Research Assistance: Tailor prompts to provide concise summaries or detailed explanations on specific topics, enhancing your research capabilities.
- Content Generation: Personalize prompts for creative writing, generating responses that align with specific themes or tones.
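
For the customer-support case, here is a minimal sketch of what a more personalized template could look like. The template wording and the sample question are illustrative only; llm, retriever, PromptTemplate, and create_stuff_documents_chain are the objects already defined above.

```python
# rag.py (illustrative customer-support variant of the custom prompt above)

support_prompt = """You are a friendly support assistant. Use the following pieces of context
to answer the customer's question. If the context does not contain the answer, say that you
will escalate the request rather than guessing. {context} Question: {question} Helpful Answer:"""

support_rag_prompt = PromptTemplate.from_template(support_prompt)
support_rag_chain = create_stuff_documents_chain(llm, support_rag_prompt)

# Sample question for illustration; replace it with a real customer query.
question = "How do I rotate my API keys?"
context = retriever.invoke(question)
for chunk in support_rag_chain.stream({"question": question, "context": context}):
    print(chunk, end="", flush=True)
```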

### Conclusion

In this tutorial, we explored essential techniques for efficiently processing and storing large document datasets within a Retrieval-Augmented Generation (RAG) system. By leveraging metadata, we ensured that our system avoids redundant data handling, allowing for smooth and efficient operation. The use of chunking optimizes document processing, maximizing the performance of the language model. Storing embeddings in PostgreSQL via pgvector enables rapid and scalable retrieval, ensuring quick responses to user queries.

Furthermore, you can continually enhance your RAG system by implementing mechanisms to retain chat history. Keeping track of past interactions allows for more contextually aware responses, fostering a more engaging user experience. This historical data can be used to refine your prompts, adapt to user preferences, and improve the overall accuracy of responses.
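
A minimal sketch of this idea, reusing the custom_rag_chain and retriever defined above, is shown below. The ask helper and its plain-list history format are illustrative only; LangChain also provides dedicated chat-history utilities that can be substituted.

```python
# rag.py (sketch: naive chat-history handling; the ask() helper is illustrative only)

chat_history = []  # (question, answer) pairs from previous turns

def ask(question: str) -> str:
    # Prepend earlier turns so that retrieval and generation see the conversation so far.
    history = "\n".join(f"Q: {q}\nA: {a}" for q, a in chat_history)
    enriched_question = f"{history}\nQ: {question}" if history else question
    context = retriever.invoke(enriched_question)
    answer = custom_rag_chain.invoke({"question": enriched_question, "context": context})
    chat_history.append((question, answer))
    return answer

print(ask("Your question"))
print(ask("A follow-up question"))
```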

By integrating Scaleway’s Managed Object Storage, PostgreSQL with pgvector, and LangChain’s embedding tools, you have the foundation to build a powerful RAG system that scales with your data while offering robust information retrieval capabilities. This approach equips you with the tools necessary to handle complex queries and deliver accurate, relevant results efficiently.

With ongoing refinement and adaptation, your RAG system can evolve to meet the changing needs of your users, ensuring that it remains a valuable asset in your AI toolkit.
