Commit 003a3b6

last part of the first version
1 parent 83490a3 commit 003a3b6

File tree: 1 file changed, +77 −7

tutorials/how-to-implement-rag/index.mdx

@@ -4,14 +4,14 @@ meta:
   description: Learn how to implement Retrieval-Augmented Generation (RAG) using Scaleway's managed inference, PostgreSQL, pgvector, and object storage.
 content:
   h1: How to implement RAG with managed inference
-tags: inference, managed, postgresql, pgvector, object storage
+tags: inference, managed, postgresql, pgvector, object storage, RAG
 categories:
   - inference
 ---

-RAG (Retrieval-Augmented Generation) is a powerful approach for enhancing a model's knowledge by leveraging your own dataset.
-Scaleway's robust infrastructure makes it easier than ever to implement RAG, as our products are fully compatible with LangChain, especially the OpenAI integration.
-By utilizing our managed inference services, managed databases, and object storage, you can effortlessly build and deploy a customized model tailored to your specific needs.
+Retrieval-Augmented Generation (RAG) enhances the power of language models by enabling them to retrieve relevant information from external datasets. In this tutorial, we'll implement RAG using Scaleway's Managed Inference, PostgreSQL, pgvector, and Scaleway's Object Storage.
+
+With Scaleway's fully managed services, integrating RAG becomes a streamlined process. You'll use a sentence transformer for embedding text, store embeddings in a PostgreSQL database with pgvector, and leverage object storage for scalable data management.

<Macro id="requirements" />

@@ -56,16 +56,14 @@ By utilizing our managed inference services, managed databases, and object storage

 # Scaleway Inference API configuration (Embeddings)
 SCW_INFERENCE_EMBEDDINGS_ENDPOINT=your_scaleway_inference_embeddings_endpoint # Endpoint for sentence-transformers/sentence-t5-xxl deployment
-SCW_INFERENCE_API_KEY_EMBEDDINGS=your_scaleway_api_key_for_embeddings

 # Scaleway Inference API configuration (LLM deployment)
 SCW_INFERENCE_DEPLOYMENT_ENDPOINT=your_scaleway_inference_endpoint # Endpoint for your LLM deployment
-SCW_INFERENCE_API_KEY=your_scaleway_api_key_for_inference_deployment
```
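
Note that this commit drops the two per-service inference keys; the Python snippets below read a single `SCW_API_KEY` instead. Before running them, load the `.env` file and bring in the imports the rest of the tutorial relies on. A minimal sketch, assuming `python-dotenv` plus the `langchain-community`, `langchain-openai`, and `langchain-postgres` packages:

```python
# Sketch: environment loading and the imports used by the snippets below.
import os

import psycopg2
from dotenv import load_dotenv
from langchain.chains import RetrievalQA
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import S3DirectoryLoader, S3FileLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_postgres import PGVector

load_dotenv()  # reads the .env file described above
```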

### Set Up Managed Database

-1. Connect to your PostgreSQL instance and install the pg_vector extension.
+1. Connect to your PostgreSQL instance and install the pgvector extension, which is used for storing high-dimensional embeddings.

```python
conn = psycopg2.connect(
```

@@ -89,3 +87,75 @@ By utilizing our managed inference services, managed databases, and object storage

```python
conn.commit()
```
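
The diff elides the body of this step between the two hunks. As a rough sketch of what it needs to do, assuming hypothetical `SCW_DB_*` variable names and the `object_loaded` tracking table that the loading loop below queries:

```python
# Hypothetical reconstruction of the elided step: connect with the managed
# database credentials, enable pgvector, and create the tracking table that
# the document-loading loop checks before re-embedding an object.
conn = psycopg2.connect(
    host=os.getenv("SCW_DB_HOST"),
    port=os.getenv("SCW_DB_PORT"),
    dbname=os.getenv("SCW_DB_NAME"),
    user=os.getenv("SCW_DB_USER"),
    password=os.getenv("SCW_DB_PASSWORD"),
)
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("CREATE TABLE IF NOT EXISTS object_loaded (object_key TEXT PRIMARY KEY)")
conn.commit()
```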

### Set Up Document Loaders for Object Storage

Use LangChain's `S3DirectoryLoader` to access the documents stored in your Scaleway Object Storage bucket:

```python
document_loader = S3DirectoryLoader(
    bucket=os.getenv("SCW_BUCKET_NAME"),
    endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT"),
    aws_access_key_id=os.getenv("SCW_ACCESS_KEY"),
    aws_secret_access_key=os.getenv("SCW_SECRET_KEY")
)
```
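
If you want to verify the loader before going further, `lazy_load` streams documents one at a time rather than fetching the whole bucket up front. A quick, hypothetical sanity check:

```python
# Print the object key of each document the loader can see in the bucket.
for doc in document_loader.lazy_load():
    print(doc.metadata["source"])
```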
### Embeddings and Vector Store Setup

We will use the OpenAIEmbeddings class from LangChain, pointed at the Scaleway inference endpoint, and store the embeddings in PostgreSQL using the PGVector integration.

```python
embeddings = OpenAIEmbeddings(
    openai_api_key=os.getenv("SCW_API_KEY"),
    openai_api_base=os.getenv("SCW_INFERENCE_EMBEDDINGS_ENDPOINT"),
    model="sentence-transformers/sentence-t5-xxl",
    tiktoken_enabled=False,  # send raw text rather than tiktoken token arrays to this non-OpenAI endpoint
)

connection_string = f"postgresql+psycopg2://{conn.info.user}:{conn.info.password}@{conn.info.host}:{conn.info.port}/{conn.info.dbname}"
vector_store = PGVector(connection=connection_string, embeddings=embeddings)
```
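
Once documents have been embedded in the next step, the store can be queried directly. A minimal check, assuming at least one document has been ingested (the query string is illustrative):

```python
# Return the three stored chunks closest to the query in embedding space.
results = vector_store.similarity_search("How do I create a bucket?", k=3)
for doc in results:
    print(doc.page_content[:120])
```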
### Load and Process Documents

Use the S3FileLoader to load each new document and split it into chunks. Then, embed the chunks and store them in your PostgreSQL database.

```python
files = document_loader.lazy_load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=20)

for file in files:
    # Skip objects that have already been embedded and stored.
    cur.execute("SELECT object_key FROM object_loaded WHERE object_key = %s", (file.metadata["source"],))
    if cur.fetchone() is None:
        file_loader = S3FileLoader(
            bucket=os.getenv("SCW_BUCKET_NAME"),
            key=file.metadata["source"].split("/")[-1],
            endpoint_url=os.getenv("SCW_BUCKET_ENDPOINT"),
            aws_access_key_id=os.getenv("SCW_ACCESS_KEY"),
            aws_secret_access_key=os.getenv("SCW_SECRET_KEY")
        )
        file_to_load = file_loader.load()
        chunks = text_splitter.split_text(file_to_load[0].page_content)

        embeddings_list = [embeddings.embed_query(chunk) for chunk in chunks]
        for chunk, embedding in zip(chunks, embeddings_list):
            # PGVector.add_embeddings expects the texts first, then their embeddings.
            vector_store.add_embeddings([chunk], [embedding])

        # Record the processed object so it is not re-embedded on the next run.
        cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", (file.metadata["source"],))
        conn.commit()
```
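
One call to `embed_query` per chunk means one request per chunk. If your embeddings endpoint supports batching, `embed_documents` (a standard LangChain embeddings method) processes all the chunks of a file in a single call; a possible variant of the inner loop:

```python
# Variant: embed all chunks of a file in one batched request,
# then insert them into the vector store together.
embeddings_list = embeddings.embed_documents(chunks)
vector_store.add_embeddings(chunks, embeddings_list)
```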
### Query the RAG System

Now, set up the RAG system to handle queries using RetrievalQA and the LLM.

```python
retriever = vector_store.as_retriever(search_kwargs={"k": 3})
llm = ChatOpenAI(
    base_url=os.getenv("SCW_INFERENCE_DEPLOYMENT_ENDPOINT"),
    api_key=os.getenv("SCW_API_KEY"),
    model=deployment.model_name,  # "deployment" is the Managed Inference deployment object retrieved via the Scaleway SDK (not shown in this diff)
)

qa_stuff = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=retriever)

query = "What are the commands to set up a database with the CLI of Scaleway?"
response = qa_stuff.invoke(query)

print(response["result"])
```
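
RetrievalQA can also surface which chunks the answer was grounded in; a small variation on the chain above, using the standard `return_source_documents` flag:

```python
# Build the chain so it returns the retrieved source chunks alongside the answer.
qa_with_sources = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True,
)

response = qa_with_sources.invoke(query)
print(response["result"])
for doc in response["source_documents"]:
    print(doc.metadata.get("source"))
```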
