Optimize hnsw::from_cagra<GPU> #826
/ok to test |
Example timeline
Hardware: NVIDIA H200 NVL + Intel(R) Xeon(R) Gold 6444Y (16 cores / 32 threads)
Timeline screenshots (images not preserved):
- branch-25.06 / input data on the host
- branch-25.06 / input data on the device (segmentation fault)
- PR-826 / input data on the host
- PR-826 / input data on the device
|
|
@achirkin it looks like we have some Python failures from these changes (this makes me happy that we now have Python tests for the benchmarks). |
|
@cjnolet lol these tests didn't make me happy while I was trying to set up my environment to run them :) and they just complain about a missing, useless column that I removed for the HNSW algorithm (GPU time). |
|
/merge |
Reduce the CAGRA-for-HNSW build times by:
- avoiding unnecessary copies of the data between cagra::build and hnsw::from_cagra in the benchmarks
- avoiding unnecessary temporary data buffers in hnsw::from_cagra<GPU>
- reducing random reads via forcing 1-1 mapping between the internal indices and external labels during HNSW import

As a side-effect, this PR also fixes the bug where hnsw::from_cagra segfaults in benchmarks if the dataset is passed in device memory (and incorrectly wrapped in a host_matrix_view).

In addition, this PR adds a bit more verbose NVTX reporting of different stages during the CAGRA/HNSW index build.

Authors:
- Artem M. Chirkin (https://github.com/achirkin)
- Corey J. Nolet (https://github.com/cjnolet)

Approvers:
- Corey J. Nolet (https://github.com/cjnolet)

URL: rapidsai#826
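
To illustrate why a 1-1 mapping between internal indices and external labels reduces random reads, here is a minimal, self-contained sketch. It is not the code from this PR and does not use the cuVS or hnswlib APIs; `node_id`, `import_links_with_lookup`, `import_links_identity`, and `is_identity` are hypothetical names invented for the example, and the graph is assumed to be a flat array of neighbor labels.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical id type for this sketch (not a cuVS or hnswlib type).
using node_id = std::uint32_t;

// General case: every neighbor entry is an external label that has to be
// translated through a lookup table, i.e. one extra random read per edge.
inline std::vector<node_id> import_links_with_lookup(
    const std::vector<node_id>& graph,             // flat list of neighbor labels
    const std::vector<node_id>& label_to_internal  // assumed translation table
) {
  std::vector<node_id> links(graph.size());
  for (std::size_t i = 0; i < graph.size(); ++i) {
    assert(graph[i] < label_to_internal.size());
    links[i] = label_to_internal[graph[i]];  // data-dependent (random) access
  }
  return links;
}

// 1-1 case: internal index == external label, so the neighbor lists can be
// copied as one contiguous block with no per-edge lookup.
inline std::vector<node_id> import_links_identity(const std::vector<node_id>& graph) {
  return graph;
}

// Checks whether a translation table is the identity mapping.
inline bool is_identity(const std::vector<node_id>& label_to_internal) {
  for (std::size_t i = 0; i < label_to_internal.size(); ++i) {
    if (label_to_internal[i] != i) { return false; }
  }
  return true;
}
```

When the mapping is known to be the identity, both functions produce the same links, so the import can take the sequential-copy path and skip the per-edge table reads.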


