feat(community): Support google bigquery vector store #7790

Open: wants to merge 2 commits into base: main
169 changes: 169 additions & 0 deletions docs/core_docs/docs/integrations/vectorstores/google_bigquery.mdx
@@ -0,0 +1,169 @@
---
sidebar_class_name: node-only
---

import CodeBlock from "@theme/CodeBlock";

# Google BigQuery Vector Store

:::tip Compatibility
Only available on Node.js.
:::

Google BigQuery supports vector search over embeddings stored in a BigQuery table.
This guide provides a quick overview for getting started with `GoogleBigQueryVectorSearch`.

## Setup

To use the Google BigQuery vector store, you'll need a running Google BigQuery service on GCP and the `@langchain/google-bigquery` integration package.
You also need to install the [`@google-cloud/bigquery`](https://github.com/googleapis/nodejs-bigquery) package to initialize a `Table` instance via `BigQuery`.

This guide will use [OpenAI embeddings](/docs/integrations/text_embedding/openai), which require you to install the `@langchain/openai` integration package. You can also use [other supported embeddings models](/docs/integrations/text_embedding) if you wish.

```{=mdx}
import IntegrationInstallTooltip from "@mdx_components/integration_install_tooltip.mdx";
import Npm2Yarn from "@theme/Npm2Yarn";

<IntegrationInstallTooltip></IntegrationInstallTooltip>

<Npm2Yarn>
@langchain/google-bigquery @langchain/core @google-cloud/bigquery @langchain/openai
</Npm2Yarn>
```

You can also run a BigQuery service locally with [`bigquery-emulator`](https://github.com/goccy/bigquery-emulator).

If you are using OpenAI embeddings for this guide, you'll need to set your OpenAI key as well:

```typescript
process.env.OPENAI_API_KEY = "YOUR_API_KEY";
```

If you want automated tracing of your model calls, you can also set your [LangSmith](https://docs.smith.langchain.com/) API key by uncommenting the lines below:

```typescript
// process.env.LANGSMITH_TRACING="true"
// process.env.LANGSMITH_API_KEY="your-api-key"
```

## Usage

### Instantiation

The following is the minimal setup for `GoogleBigQueryVectorSearch`:

```typescript
import { BigQuery } from "@google-cloud/bigquery";
import { GoogleBigQueryVectorSearch } from "@langchain/google-bigquery";
import { OpenAIEmbeddings } from "@langchain/openai";

const embeddings = new OpenAIEmbeddings({
  model: "text-embedding-3-small",
});

// Create a BigQuery client that uses Application Default Credentials (ADC).
// See https://cloud.google.com/docs/authentication/production#providing_credentials_to_your_application for all the authentication options.
const bigQueryClient = new BigQuery();
// Replace the placeholder IDs with your own dataset and table.
const table = bigQueryClient.dataset("YOUR_DATASET_ID").table("YOUR_TABLE_ID");

const vectorStore = new GoogleBigQueryVectorSearch(embeddings, {
  table,
});
```

Apart from the `table` parameter, you can customize `GoogleBigQueryVectorSearch` with the following options:

```typescript
const vectorStore = new GoogleBigQueryVectorSearch(embeddings, {
  table,
  textKey: "text_field", // The name of the table field containing the raw content. Defaults to "text"
  embeddingKey: "embedding_field", // The name of the table field containing the embedding vector. Defaults to "embedding"
  documentKey: "id_field", // The name of the table field representing a unique id. Defaults to "_id"
  distanceType: "COSINE", // Only used for vector search: the metric used to compute the distance between two vectors. Defaults to "EUCLIDEAN"
  useBruteForce: true, // Only used for vector search: whether to use brute-force search, skipping the vector index if one is available. Defaults to false
  fractionListsToSearch: 0.005, // Only used for vector search: the fraction of lists to search. No default value
});
```
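
To make the `distanceType` option more concrete, the sketch below computes the two metrics named above in plain TypeScript. It only illustrates the math and is not the integration's code: BigQuery computes distances server-side, and the helper names `euclideanDistance` and `cosineDistance` are hypothetical.

```typescript
// Illustrative helpers (hypothetical names); BigQuery computes distances server-side.

// Euclidean distance: straight-line distance between two vectors.
function euclideanDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let sum = 0;
  for (let i = 0; i < a.length; i += 1) {
    const diff = a[i] - b[i];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Cosine distance: 1 - cosine similarity, so it depends on direction, not magnitude.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for zero vectors");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Two vectors pointing the same way but with different magnitudes.
const unitVec = [1, 0];
const scaledVec = [3, 0];
console.log(euclideanDistance(unitVec, scaledVec)); // 2
console.log(cosineDistance(unitVec, scaledVec)); // 0
```

The two metrics can disagree: the vectors above are far apart by Euclidean distance but identical in direction, so which one fits best depends on how your embeddings were produced.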

### Adding documents

```typescript
import { Document } from "@langchain/core/documents";

const documents = [
  new Document({
    pageContent: "this apple",
    metadata: {
      color: "red",
      category: "edible",
    },
  }),
  new Document({
    pageContent: "this blueberry",
    metadata: {
      color: "blue",
      category: "edible",
    },
  }),
  new Document({
    pageContent: "this firetruck",
    metadata: {
      color: "red",
      category: "machine",
    },
  }),
];

// Add all the documents to the store
await vectorStore.addDocuments(documents);
```

### Querying Documents

A basic similarity search returns the documents most similar to the query, here the top 3:

```typescript
const results = await vectorStore.similaritySearch("this", 3);
for (const doc of results) {
  console.log(` ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`);
}
```

You can also get the corresponding scores with the `similaritySearchWithScore` method:

```typescript
const results = await vectorStore.similaritySearchWithScore("this", 3);
for (const [doc, score] of results) {
  console.log(
    `${score}: ${doc.pageContent} [${JSON.stringify(doc.metadata, null)}]`
  );
}
```
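
Conceptually, a scored similarity search ranks stored vectors by their distance to the query vector and keeps the closest `k`. The sketch below shows that idea in memory, assuming Euclidean distance, toy 2-D vectors, and a hypothetical `topK` helper. It is not how `GoogleBigQueryVectorSearch` is implemented (the search runs inside BigQuery), and the exact meaning of the returned score is best confirmed in the API reference.

```typescript
// A stored row: some text plus its embedding vector (toy 2-D values, not real embeddings).
interface Row {
  text: string;
  embedding: number[];
}

// Hypothetical helper: rank rows by Euclidean distance to the query and keep the k closest.
function topK(rows: Row[], query: number[], k: number): [Row, number][] {
  const scored = rows.map((row): [Row, number] => {
    const squared = row.embedding.reduce(
      (sum, value, i) => sum + (value - query[i]) ** 2,
      0
    );
    return [row, Math.sqrt(squared)];
  });
  // A smaller distance means a closer match, so sort ascending.
  return scored.sort((a, b) => a[1] - b[1]).slice(0, k);
}

const rows: Row[] = [
  { text: "this apple", embedding: [1, 0] },
  { text: "this blueberry", embedding: [0, 1] },
  { text: "this firetruck", embedding: [5, 5] },
];

// Print the two closest rows along with their distances.
for (const [row, distance] of topK(rows, [1, 0.1], 2)) {
  console.log(`${distance.toFixed(3)}: ${row.text}`);
}
```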

### Querying documents with filter

The vector store supports two methods for applying filters to metadata fields when performing document searches:

#### Object-based Filters

You can pass a JSON object whose keys are metadata fields and whose values are the values to match.
This applies an equality filter to each key-value pair. When multiple pairs are provided, they are combined with a logical AND.

```typescript
const filters = {
  color: "red",
  category: "machine",
};
const result = await vectorStore.similaritySearch("this", 2, filters);
```
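
The equality-plus-AND semantics described above can be pictured as a simple predicate over each document's metadata. The sketch below is an in-memory illustration with a hypothetical `matchesFilter` helper; the actual store applies the filter inside BigQuery, and how it does so is not shown here.

```typescript
type Metadata = Record<string, unknown>;

// Hypothetical helper: true only if every filter key equals the corresponding metadata value.
function matchesFilter(metadata: Metadata, filter: Metadata): boolean {
  return Object.entries(filter).every(
    ([key, value]) => metadata[key] === value
  );
}

const metadataList: Metadata[] = [
  { color: "red", category: "edible" },
  { color: "blue", category: "edible" },
  { color: "red", category: "machine" },
];

// Both conditions must hold (logical AND), so only the last entry matches.
const matched = metadataList.filter((metadata) =>
  matchesFilter(metadata, { color: "red", category: "machine" })
);
console.log(matched);
```

A filter key that is absent from a document's metadata never matches under this reading, so a misspelled key silently yields no results.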

#### SQL-based Filters

Coming soon.

## Related

- Vector store [conceptual guide](/docs/concepts/#vectorstores)
- Vector store [how-to guides](/docs/how_to/#vectorstores)

## API reference

For detailed documentation of all `GoogleBigQueryVectorSearch` features and configurations, head to the [API reference](https://api.js.langchain.com/classes/langchain_google-bigquery.GoogleBigQueryVectorSearch.html).
13 changes: 13 additions & 0 deletions docs/core_docs/src/theme/FeatureTables.js
@@ -660,6 +660,19 @@ const FEATURE_TABLES = {
    local: true,
    idsInAddDocuments: false,
  },
  {
    name: "Google BigQuery",
    link: "google_bigquery",
    deleteById: false,
    filtering: true,
    searchByVector: true,
    searchWithScore: true,
    async: true,
    passesStandardTests: true,
    multiTenancy: false,
    local: true,
    idsInAddDocuments: false,
  },
  {
    name: "InMemoryVectorStore",
    link: "in_memory",
74 changes: 74 additions & 0 deletions libs/langchain-google-bigquery/.eslintrc.cjs
@@ -0,0 +1,74 @@
module.exports = {
  extends: [
    "airbnb-base",
    "eslint:recommended",
    "prettier",
    "plugin:@typescript-eslint/recommended",
  ],
  parserOptions: {
    ecmaVersion: 12,
    parser: "@typescript-eslint/parser",
    project: "./tsconfig.json",
    sourceType: "module",
  },
  plugins: ["@typescript-eslint", "no-instanceof"],
  ignorePatterns: [
    ".eslintrc.cjs",
    "scripts",
    "node_modules",
    "dist",
    "dist-cjs",
    "*.js",
    "*.cjs",
    "*.d.ts",
  ],
  rules: {
    "no-process-env": 2,
    "no-instanceof/no-instanceof": 2,
    "@typescript-eslint/explicit-module-boundary-types": 0,
    "@typescript-eslint/no-empty-function": 0,
    "@typescript-eslint/no-shadow": 0,
    "@typescript-eslint/no-empty-interface": 0,
    "@typescript-eslint/no-use-before-define": ["error", "nofunc"],
    "@typescript-eslint/no-unused-vars": ["warn", { args: "none" }],
    "@typescript-eslint/no-floating-promises": "error",
    "@typescript-eslint/no-misused-promises": "error",
    camelcase: 0,
    "class-methods-use-this": 0,
    "import/extensions": [2, "ignorePackages"],
    "import/no-extraneous-dependencies": [
      "error",
      { devDependencies: ["**/*.test.ts"] },
    ],
    "import/no-unresolved": 0,
    "import/prefer-default-export": 0,
    "keyword-spacing": "error",
    "max-classes-per-file": 0,
    "max-len": 0,
    "no-await-in-loop": 0,
    "no-bitwise": 0,
    "no-console": 0,
    "no-restricted-syntax": 0,
    "no-shadow": 0,
    "no-continue": 0,
    "no-void": 0,
    "no-underscore-dangle": 0,
    "no-use-before-define": 0,
    "no-useless-constructor": 0,
    "no-return-await": 0,
    "consistent-return": 0,
    "no-else-return": 0,
    "func-names": 0,
    "no-lonely-if": 0,
    "prefer-rest-params": 0,
    "new-cap": ["error", { properties: false, capIsNew: false }],
  },
  overrides: [
    {
      files: ["**/*.test.ts"],
      rules: {
        "@typescript-eslint/no-unused-vars": "off",
      },
    },
  ],
};
7 changes: 7 additions & 0 deletions libs/langchain-google-bigquery/.gitignore
@@ -0,0 +1,7 @@
index.cjs
index.js
index.d.ts
index.d.cts
node_modules
dist
.yarn
19 changes: 19 additions & 0 deletions libs/langchain-google-bigquery/.prettierrc
@@ -0,0 +1,19 @@
{
  "$schema": "https://json.schemastore.org/prettierrc",
  "printWidth": 80,
  "tabWidth": 2,
  "useTabs": false,
  "semi": true,
  "singleQuote": false,
  "quoteProps": "as-needed",
  "jsxSingleQuote": false,
  "trailingComma": "es5",
  "bracketSpacing": true,
  "arrowParens": "always",
  "requirePragma": false,
  "insertPragma": false,
  "proseWrap": "preserve",
  "htmlWhitespaceSensitivity": "css",
  "vueIndentScriptAndStyle": false,
  "endOfLine": "lf"
}
10 changes: 10 additions & 0 deletions libs/langchain-google-bigquery/.release-it.json
@@ -0,0 +1,10 @@
{
  "github": {
    "release": true,
    "autoGenerate": true,
    "tokenRef": "GITHUB_TOKEN_RELEASE"
  },
  "npm": {
    "versionArgs": ["--workspaces-update=false"]
  }
}
21 changes: 21 additions & 0 deletions libs/langchain-google-bigquery/LICENSE
@@ -0,0 +1,21 @@
The MIT License

Copyright (c) 2025 LangChain

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
59 changes: 59 additions & 0 deletions libs/langchain-google-bigquery/README.md
@@ -0,0 +1,59 @@
# @langchain/google-bigquery

This package contains the LangChain.js integrations for Google BigQuery.

## Installation

```bash npm2yarn
npm install @langchain/google-bigquery @langchain/core
```

## Development

To develop the BigQuery package, you'll need to follow these instructions:

### Install dependencies

```bash
yarn install
```

### Build the package

```bash
yarn build
```

Or from the repo root:

```bash
yarn build --filter=@langchain/google-bigquery
```

### Run tests

Test files should live within a `tests/` folder in the `src/` folder. Unit tests should end in `.test.ts` and integration tests should
end in `.int.test.ts`:

```bash
yarn test
yarn test:int
```

### Lint & Format

Run the linter & formatter to ensure your code is up to standard:

```bash
yarn lint && yarn format
```

### Adding new entrypoints

If you add a new file to be exported, either import & re-export from `src/index.ts`, or add it to the `entrypoints` field in the `config` variable located inside `langchain.config.js` and run `yarn build` to generate the new entrypoint.

### Todo
- [ ] Create the table for the vector store if it does not exist
- [ ] Create the index for the vector store if it does not exist
- [ ] Support SQL-based filters
- [ ] Support deleting documents