Conversation

@tfeher
Contributor

@tfeher tfeher commented Jan 20, 2025

After calling build(), the CAGRA index ideally contains both the dataset and the graph. When device memory is insufficient, however, only the graph is returned. In that case, the dataset must be passed explicitly to the serialization routines.

When serializing to HNSW format with a flat hierarchy, the dataset was not passed. This PR fixes the problem by adding an optional dataset argument to cagra::serialize_to_hnswlib.
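The fallback described above can be sketched roughly as follows. This is a self-contained illustration, not the cuVS implementation: `ToyIndex` and `resolve_dataset` are hypothetical names, plain `std::vector` stands in for the real index and matrix types, and the choice to prefer the explicitly passed dataset over the one stored in the index is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for an index that may or may not own its dataset.
struct ToyIndex {
  std::vector<std::uint32_t> graph;
  std::vector<float> dataset;  // left empty when only the graph was returned
};

// Pick the dataset to serialize. Assumed policy: an explicitly passed
// dataset wins; otherwise fall back to the one stored in the index.
const std::vector<float>& resolve_dataset(const ToyIndex& index,
                                          const std::vector<float>* explicit_dataset = nullptr)
{
  if (explicit_dataset != nullptr) { return *explicit_dataset; }
  if (!index.dataset.empty()) { return index.dataset; }
  throw std::invalid_argument("index holds no dataset and none was passed");
}
```

With a graph-only index, the caller has to supply the dataset; calling `resolve_dataset` without one throws rather than silently writing an empty dataset.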

Furthermore, to improve execution time, we now write a full row of the graph and dataset at a time instead of a single element.
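A minimal sketch of the difference between element-wise and row-wise writes, using plain host buffers and `std::ostream`. The `HostMatrices` struct and both function names are hypothetical, and the record layout (one graph row followed by one dataset row per node) is a simplification chosen for illustration, not the actual hnswlib file layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical row-major host copies of the graph and the dataset.
struct HostMatrices {
  std::size_t n_rows;
  std::size_t degree;                // neighbors per node
  std::size_t dim;                   // vector dimension
  std::vector<std::uint32_t> graph;  // n_rows * degree entries
  std::vector<float> dataset;        // n_rows * dim entries
};

// One write() call per element: many small stream operations.
void write_per_element(std::ostream& os, const HostMatrices& m)
{
  assert(m.graph.size() == m.n_rows * m.degree);
  assert(m.dataset.size() == m.n_rows * m.dim);
  for (std::size_t i = 0; i < m.n_rows; ++i) {
    for (std::size_t j = 0; j < m.degree; ++j) {
      os.write(reinterpret_cast<const char*>(&m.graph[i * m.degree + j]), sizeof(std::uint32_t));
    }
    for (std::size_t j = 0; j < m.dim; ++j) {
      os.write(reinterpret_cast<const char*>(&m.dataset[i * m.dim + j]), sizeof(float));
    }
  }
}

// One write() call per row: same bytes, far fewer stream operations.
void write_per_row(std::ostream& os, const HostMatrices& m)
{
  assert(m.graph.size() == m.n_rows * m.degree);
  assert(m.dataset.size() == m.n_rows * m.dim);
  for (std::size_t i = 0; i < m.n_rows; ++i) {
    os.write(reinterpret_cast<const char*>(m.graph.data() + i * m.degree),
             static_cast<std::streamsize>(m.degree * sizeof(std::uint32_t)));
    os.write(reinterpret_cast<const char*>(m.dataset.data() + i * m.dim),
             static_cast<std::streamsize>(m.dim * sizeof(float)));
  }
}
```

Both functions produce identical output; the row-wise version issues two writes per node instead of `degree + dim`, which reduces per-call overhead.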

Additionally, debug messages for tracking data saving time are added.

@tfeher tfeher requested a review from a team as a code owner January 20, 2025 11:48
@tfeher tfeher self-assigned this Jan 20, 2025
@github-actions github-actions bot added the cpp label Jan 20, 2025
@tfeher tfeher added the bug and non-breaking labels and removed the cpp label Jan 20, 2025
@tfeher tfeher requested a review from divyegala January 20, 2025 11:48
@github-actions github-actions bot added the cpp label Jan 20, 2025
@cjnolet
Member

cjnolet commented Jan 24, 2025

@tfeher the changes look good, but there's a docs build failure.

@cjnolet
Member

cjnolet commented Jan 30, 2025

/merge

@rapids-bot rapids-bot bot merged commit 0dd7bde into rapidsai:branch-25.02 Jan 30, 2025
61 checks passed

Labels

bug, cpp, non-breaking

3 participants