
Commit 69c9974

example of endpoint

1 parent 403fd5e commit 69c9974

File tree

1 file changed (+15 -17 lines changed)

tutorials/how-to-implement-rag/index.mdx

@@ -60,24 +60,24 @@ Create a .env file and add the following variables. These will store your API ke

# Scaleway S3 bucket configuration
SCW_BUCKET_NAME=your_scaleway_bucket_name
- SCW_BUCKET_ENDPOINT=your_scaleway_bucket_endpoint # S3 endpoint, e.g., https://s3.fr-par.scw.cloud
+ SCW_BUCKET_ENDPOINT="https://{{SCW_BUCKET_NAME}}.s3.{{SCW_REGION}}.scw.cloud" # S3 endpoint, e.g., https://s3.fr-par.scw.cloud

# Scaleway Inference API configuration (Embeddings)
- SCW_INFERENCE_EMBEDDINGS_ENDPOINT=your_scaleway_inference_embeddings_endpoint # Endpoint for sentence-transformers/sentence-t5-xxl deployment
+ SCW_INFERENCE_EMBEDDINGS_ENDPOINT="https://{{SCW_INFERENCE_DEPLOYMENT_ID}}.ifr.fr-par.scw.cloud/v1" # Endpoint for sentence-transformers/sentence-t5-xxl deployment

# Scaleway Inference API configuration (LLM deployment)
- SCW_INFERENCE_DEPLOYMENT_ENDPOINT=your_scaleway_inference_endpoint # Endpoint for your LLM deployment
+ SCW_INFERENCE_DEPLOYMENT_ENDPOINT="https://{{SCW_INFERENCE_DEPLOYMENT_ID}}.ifr.fr-par.scw.cloud/v1" # Endpoint for your LLM deployment
```
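
Once these variables are in place, the application can read them at startup. A minimal sketch, assuming the `python-dotenv` package is installed; the variable names match the `.env` file above:

```python
# Load the .env values before building the RAG pipeline.
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads the .env file from the current working directory

bucket_name = os.environ["SCW_BUCKET_NAME"]
embeddings_endpoint = os.environ["SCW_INFERENCE_EMBEDDINGS_ENDPOINT"]
llm_endpoint = os.environ["SCW_INFERENCE_DEPLOYMENT_ENDPOINT"]
```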

## Setting Up Managed Databases

### Step 1: Connect to Your PostgreSQL Database

- To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as psql. The following steps will guide you through setting up your database to handle vector storage and document tracking.
+ To perform these actions, you'll need to connect to your PostgreSQL database. You can use any PostgreSQL client, such as [psql](https://www.postgresql.org/docs/current/app-psql.html). The following steps will guide you through setting up your database to handle vector storage and document tracking.
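
If you prefer to run the SQL from Python, a `psycopg2` connection sketch like the following also works; the `SCW_DB_*` variable names are assumptions, not values from the `.env` file shown above:

```python
# Minimal connection sketch; yields the conn/cur handles used later on.
import os

import psycopg2  # assumes psycopg2 (or psycopg2-binary) is installed

conn = psycopg2.connect(
    host=os.environ["SCW_DB_HOST"],          # assumed variable name
    port=os.environ["SCW_DB_PORT"],          # assumed variable name
    dbname=os.environ["SCW_DB_NAME"],        # assumed variable name
    user=os.environ["SCW_DB_USER"],          # assumed variable name
    password=os.environ["SCW_DB_PASSWORD"],  # assumed variable name
)
cur = conn.cursor()
```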

### Step 2: Install the pgvector Extension

- pgvector is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command:
+ [pgvector](https://github.com/pgvector/pgvector) is essential for storing and indexing high-dimensional vectors, which are critical for retrieval-augmented generation (RAG) systems. Ensure that it is installed by executing the following SQL command:

```sql
CREATE EXTENSION IF NOT EXISTS vector;
@@ -130,7 +130,7 @@ from langchain_postgres import PGVector

### Step 2: Configure OpenAI Embeddings

- We will utilize the OpenAIEmbeddings class from LangChain and store the embeddings in PostgreSQL using the PGVector integration.
+ We will utilize the [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) class from LangChain and store the embeddings in PostgreSQL using the PGVector integration.

```python
# rag.py
@@ -144,20 +144,20 @@ embeddings = OpenAIEmbeddings(
```

#### Key Parameters:
- - openai_api_key: This is your API key for accessing the OpenAI-powered embeddings service, in this case, deployed via Scaleway’s Managed Inference.
- - openai_api_base: This is the base URL that points to your deployment of the sentence-transformers/sentence-t5-xxl model on Scaleway's Managed Inference. This URL serves as the entry point to make API calls for generating embeddings.
- - model="sentence-transformers/sentence-t5-xxl": This defines the specific model being used for text embeddings. sentence-transformers/sentence-t5-xxl is a powerful model optimized for generating high-quality sentence embeddings, making it ideal for tasks like document retrieval in RAG systems.
- - tiktoken_enabled=False: This is an important parameter, which disables the use of TikToken for tokenization within the embeddings process.
+ - `openai_api_key`: This is your API key for accessing the OpenAI-powered embeddings service, in this case, deployed via Scaleway’s Managed Inference.
+ - `openai_api_base`: This is the base URL that points to your deployment of the sentence-transformers/sentence-t5-xxl model on Scaleway's Managed Inference. This URL serves as the entry point to make API calls for generating embeddings.
+ - `model="sentence-transformers/sentence-t5-xxl"`: This defines the specific model being used for text embeddings. sentence-transformers/sentence-t5-xxl is a powerful model optimized for generating high-quality sentence embeddings, making it ideal for tasks like document retrieval in RAG systems.
+ - `tiktoken_enabled=False`: This parameter disables the use of TikToken for tokenization within the embeddings process.

#### What is tiktoken_enabled?

- tiktoken is a tokenization library developed by OpenAI, which is optimized for working with GPT-based models (like GPT-3.5 or GPT-4). It transforms text into smaller token units that the model can process.
+ [`tiktoken`](https://github.com/openai/tiktoken) is a tokenization library developed by OpenAI, which is optimized for working with GPT-based models (like GPT-3.5 or GPT-4). It transforms text into smaller token units that the model can process.

#### Why set tiktoken_enabled=False?

- In the context of using Scaleway’s Managed Inference and the sentence-t5-xxl model, TikToken tokenization is not necessary because the model you are using (sentence-transformers) works with raw text and handles its own tokenization internally.
- Moreover, leaving tiktoken_enabled as True causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior.
- By setting tiktoken_enabled=False, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure.
+ In the context of using Scaleway’s Managed Inference and the `sentence-t5-xxl` model, TikToken tokenization is not necessary because the model you are using (sentence-transformers) works with raw text and handles its own tokenization internally.
+ Moreover, leaving `tiktoken_enabled` as `True` causes issues when sending data to Scaleway’s API because it results in tokenized vectors being sent instead of raw text. Since Scaleway's endpoint expects text and not pre-tokenized data, this mismatch can lead to errors or incorrect behavior.
+ By setting `tiktoken_enabled=False`, you ensure that raw text is sent to Scaleway's Managed Inference endpoint, which is what the sentence-transformers model expects to process. This guarantees that the embedding generation process works smoothly with Scaleway's infrastructure.
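
Putting the parameters above together, the embeddings client looks roughly like this; `SCW_API_KEY` is an assumed variable name, since the full `.env` file is not reproduced here:

```python
# Sketch of the embeddings client described by the parameter list above.
import os

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    openai_api_key=os.environ["SCW_API_KEY"],  # assumed name for your secret key
    openai_api_base=os.environ["SCW_INFERENCE_EMBEDDINGS_ENDPOINT"],
    model="sentence-transformers/sentence-t5-xxl",
    tiktoken_enabled=False,  # send raw text; the model tokenizes internally
)
```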

### Step 3: Create a PGVector Store

@@ -174,7 +174,7 @@ PGVector: This creates the vector store in your PostgreSQL database to store the

## Load and Process Documents

- Use the S3FileLoader to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database.
+ Use the [`S3FileLoader`](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.s3_file.S3FileLoader.html) to load documents and split them into chunks. Then, embed and store them in your PostgreSQL database.

### Step 1: Import Required Modules

@@ -245,8 +245,6 @@ conn.commit()

- S3FileLoader: The S3FileLoader loads each file individually from your ***Scaleway Object Storage bucket*** using the file's object_key (extracted from the file's metadata). It ensures that only the specific file is loaded from the bucket, minimizing the amount of data being retrieved at any given time.
- RecursiveCharacterTextSplitter: The RecursiveCharacterTextSplitter breaks each document into smaller chunks of text. This is crucial because embedding models, like those used in Retrieval-Augmented Generation (RAG), typically have a limited context window (the number of tokens they can process at once).
- - Chunk Size: Here, the chunk size is set to 480 characters, with an overlap of 20 characters. The choice of 480 characters is based on the context size supported by the embeddings model. Models have a maximum number of tokens they can process in a single pass, often around 512 tokens or fewer, depending on the specific model you are using. To ensure that each chunk fits within this limit, 480 characters provide a buffer, as different models tokenize characters into variable-length tokens.
- - Chunk Overlap: The 20-character overlap ensures continuity between chunks, which helps prevent loss of meaning or context between segments.
- Embedding the Chunks: For each document, the text is split into smaller chunks using the text splitter, and an embedding is generated for each chunk using the embeddings.embed_query(chunk) function. This function transforms each chunk into a vector representation that can later be used for similarity search.
- Embedding Storage: After generating the embeddings for each chunk, they are stored in a vector database (e.g., PostgreSQL with pgvector) using the vector_store.add_embeddings(embedding, chunk) method. Each embedding is stored alongside its corresponding text chunk, enabling retrieval during a query.
- Avoiding Redundant Processing: The script checks the object_loaded table in PostgreSQL to see if a document has already been processed (i.e., the object_key exists in the table). If it has, the file is skipped, avoiding redundant downloads, vectorization, and database inserts. This ensures that only new or modified documents are processed, reducing the system's computational load and saving both time and resources. The sketch after this list ties these steps together.
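
As a rough picture of how these pieces fit, here is a condensed sketch; the `object_keys` listing, the `conn`/`cur` database handles, and the `BUCKET_NAME` constant are assumed from the surrounding tutorial code, and the `add_embeddings` argument order follows the bullet above:

```python
# Condensed ingestion loop; setup objects (conn, cur, embeddings, vector_store,
# object_keys, BUCKET_NAME) are assumed to exist as in the surrounding tutorial.
from langchain_community.document_loaders import S3FileLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=480, chunk_overlap=20)

for object_key in object_keys:  # keys listed from the Object Storage bucket
    # Skip documents that have already been processed.
    cur.execute("SELECT 1 FROM object_loaded WHERE object_key = %s", (object_key,))
    if cur.fetchone():
        continue

    # Load only this file from the bucket, then split it into chunks.
    for doc in S3FileLoader(bucket=BUCKET_NAME, key=object_key).load():
        for chunk in text_splitter.split_text(doc.page_content):
            embedding = embeddings.embed_query(chunk)
            vector_store.add_embeddings(embedding, chunk)

    # Record the object so it is not downloaded and vectorized again.
    cur.execute("INSERT INTO object_loaded (object_key) VALUES (%s)", (object_key,))
    conn.commit()
```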
