-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vecbench: enable benchmarking against CRDB vector index #142480
base: master
Are you sure you want to change the base?
Conversation
Your pull request contains more than 1000 changes. It is strongly encouraged to split big PRs into smaller chunks. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR refactors vecbench to support benchmarking against the CockroachDB vector index by introducing a new VectorProvider interface with two implementations (SQLProvider and MemProvider). Key changes include:
- Adding progress reporting via an updated ProgressWriter.
- Implementing the VectorProvider interface in both SQLProvider and MemProvider.
- Updating vector set functionality with a new Slice method and corresponding tests.
- Minor improvements in transaction and fixup worker logic.
Reviewed Changes
File | Description |
---|---|
pkg/cmd/vecbench/progress_writer.go | Adds progress reporting with a progress writer implementation |
pkg/cmd/vecbench/vector_provider.go | Introduces the VectorProvider interface |
pkg/cmd/vecbench/sql_provider.go | Implements SQLProvider for vector indexing |
pkg/cmd/vecbench/mem_provider.go | Implements MemProvider with in-memory vector store and persistence |
pkg/util/vector/vector_set.go | Adds a documented Slice method for subsetting vector sets |
pkg/util/vector/vector_set_test.go | Adds tests for the new Slice method |
pkg/sql/vecindex/cspann/fixup_worker.go | Improves error handling in deferred transaction finalization |
pkg/sql/vecindex/cspann/index_test.go | Adjusts goroutine wait handling in index insertion tests |
pkg/sql/vecindex/vecstore/store_txn.go | Adds notes to clarify scan behavior during transactional operations |
pkg/sql/vecindex/cspann/memstore/memstore_txn.go | Enhances transaction time advancement handling |
Copilot reviewed 11 out of 11 changed files in this pull request and generated 1 comment.
|
||
var idxCtx cspann.Context | ||
idxCtx.Init(txn) | ||
for i := range vectors.Count { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop iteration 'for i := range vectors.Count' is incorrect since 'vectors.Count' is an integer. Use a traditional for loop such as 'for i := 0; i < vectors.Count; i++' instead.
for i := range vectors.Count { | |
for i := 0; i < vectors.Count; i++ { |
Copilot is powered by AI, so mistakes are possible. Review output carefully before use.
Refactor vecbench to use a new VectorProvider index that abstracts two different vector indexing implementations: 1. A MemProvider that wraps the existing in-memory vector index and persists its state to a file on disk. 2. A new SQLProvider that runs against CockroachDB and uses SQL to create, insert into, and search vector indexes. Epic: CRDB-42943 Release note: None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 11 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @mw5h)
pkg/cmd/vecbench/main.go
line 416 at r2 (raw file):
panic(err) } return provider, errors.Wrap(err, "creating in-memory provider")
Shouldn't we return nil
for the error here?
pkg/cmd/vecbench/sql_provider.go
line 127 at r2 (raw file):
} defer func() { _ = tx.Rollback(ctx)
Should the Rollback
be unconditional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 11 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @andy-kimball and @DrewKimball)
pkg/cmd/vecbench/main.go
line 416 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Shouldn't we return
nil
for the error here?
Also, NewMemProvider can only return nil for the error, so that return value is a bit misleading.
pkg/cmd/vecbench/sql_provider.go
line 127 at r2 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Should the
Rollback
be unconditional?
It's probably fine to rollback after a commit, but a check on the return value would be cleaner.
pkg/cmd/vecbench/sql_provider.go
line 84 at r2 (raw file):
// Load implements the VectorProvider interface. func (s *SQLProvider) Load(ctx context.Context) (bool, error) { // Assume that if the table exists, that it contains the dataset rows.
nit: Maybe in a future version we could write a status row to a table when we get done loading data so that we have more confidence that the loaded table isn't truncated in some way.
pkg/cmd/vecbench/main.go
line 95 at r2 (raw file):
Test vector.Set Neighbors [][]int64 }
nit: I personally find comments on data structures to be very high value compared to almost any other kind.
Code quote:
type searchData struct {
Count int
Test vector.Set
Neighbors [][]int64
}
Refactor vecbench to use a new VectorProvider index that abstracts two different vector indexing implementations:
Epic: CRDB-42943
Release note: None