
@cjnolet
Member

@cjnolet cjnolet commented Jul 11, 2024

Closes #135

This PR brings in the random ball cover (RBC) implementation from RAFT. It also reduces the number of template instantiations by turning the following template parameters into runtime parameters:

  1. dims
  2. booleans
  3. distance functor
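The list above trades compile-time specialization for runtime arguments. With `template <int Dims, bool Sqrt, typename DistFunc>`, every combination becomes its own instantiation. With runtime arguments, one compiled function handles all of them. The sketch below shows that shape on a toy distance function. `DistanceType` and `distance` are hypothetical names, not the cuVS code.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical metric selector, passed at runtime instead of as a template functor.
enum class DistanceType { L2, L1 };

// Templated shape (one instantiation per combination):
//   template <int Dims, bool Sqrt, typename DistFunc> float distance(const float*, const float*);
// Runtime shape (a single instantiation): dims, the boolean and the metric are ordinary arguments.
float distance(const float* a, const float* b, std::int64_t dims, DistanceType metric, bool take_sqrt)
{
  assert(dims > 0);
  float acc = 0.0f;
  for (std::int64_t d = 0; d < dims; ++d) {
    float diff = a[d] - b[d];
    switch (metric) {
      case DistanceType::L2: acc += diff * diff; break;      // squared Euclidean term
      case DistanceType::L1: acc += std::fabs(diff); break;  // Manhattan term
    }
  }
  // In this sketch the sqrt flag is only honored for L2.
  if (take_sqrt && metric == DistanceType::L2) { acc = std::sqrt(acc); }
  return acc;
}
```

The cost is a branch per element in this toy version. Real kernels usually hoist such a switch out of the hot loop and dispatch once to a small, fixed set of instantiations.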

Notes for Reviewers

Benefits of these changes

Allows cuML binary sizes to shrink once cuML switches from RAFT's implementation to this one.

See:

rapidsai/cuml#6626 (comment)

@cjnolet cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Jul 11, 2024
@cjnolet cjnolet self-assigned this Jul 11, 2024
@cjnolet cjnolet requested review from a team as code owners July 11, 2024 19:54
@codecov-commenter

codecov-commenter commented Jul 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.49%. Comparing base (05494cb) to head (0e9323f).
Report is 4 commits behind head on branch-25.06.

Additional details and impacted files
@@              Coverage Diff              @@
##           branch-25.06     #218   +/-   ##
=============================================
  Coverage         84.49%   84.49%           
=============================================
  Files                20       20           
  Lines               129      129           
=============================================
  Hits                109      109           
  Misses               20       20           

@@ -0,0 +1,205 @@
/*
Member

nitpick: since we're no longer header-only, I'd prefer to get rid of the -ext.cuh and -inl.cuh header files.

Member Author

Do you mean just consolidating them into one file, or removing the instantiations / ignored extern templates?
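Some context for this thread. In the -ext/-inl split as it is commonly used, the -inl header holds the full template definition. The -ext header declares `extern template` instantiations so that files including it do not compile those instantiations themselves. A source file then provides the explicit instantiations once.

The sketch below collapses the three parts into one file. `sum_first` is a hypothetical stand-in for the real templates, and the comments mark which file each part would normally live in. The `<std::int64_t, float>` ordering mirrors the `index<int64_t, float>` used in this PR.

```cpp
#include <cassert>
#include <cstdint>

// --- would live in a hypothetical "sum_first-inl.cuh": the full template definition ---
template <typename IdxT, typename T>
T sum_first(const T* values, IdxT n)
{
  T acc{};
  for (IdxT i = 0; i < n; ++i) { acc += values[i]; }
  return acc;
}

// --- would live in a hypothetical "sum_first-ext.cuh": includers must NOT instantiate
// this combination themselves, because a compiled source file provides it ---
extern template float sum_first<std::int64_t, float>(const float*, std::int64_t);

// --- would live in a hypothetical "sum_first.cu": the one explicit instantiation ---
template float sum_first<std::int64_t, float>(const float*, std::int64_t);
```

Consolidating into one header would mean merging the first two parts. Removing the instantiations would mean dropping the last two, so every includer compiles its own copy.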

@cjnolet
Member Author

cjnolet commented Jul 30, 2024

Linking #110

@cjnolet cjnolet requested review from a team as code owners August 15, 2024 19:01
@cjnolet cjnolet requested a review from KyleFromNVIDIA August 15, 2024 19:01
@cjnolet cjnolet changed the base branch from branch-24.08 to branch-24.10 August 26, 2024 21:53
@cjnolet
Member Author

cjnolet commented Aug 27, 2024

/ok to test

@github-actions github-actions bot removed the ci label Aug 27, 2024
@divyegala
Member

divyegala commented May 7, 2025

As of commit 854837c, the total size added is now ~19 MB, while the wheel size increases from ~900 MB to ~911 MB.

[Screenshot, 2025-05-07 5:39:41 PM]

* @param[inout] index an empty (and not previously built) instance of
* cuvs::neighbors::ball_cover::index
*/
void build(raft::resources const& handle, index<int64_t, float>& index);
Member Author

Just a note: we should have the factory function here too. We can do it as a follow-on at some point; RBC isn't a heavily used API today.
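The note above asks for a factory-style overload alongside the in-place `build`. The generic sketch below shows both API shapes. It uses a hypothetical `toy_index` type, not the real `cuvs::neighbors::ball_cover::index`, and omits the `raft::resources` handle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an index type.
struct toy_index {
  std::vector<float> data;
  bool trained = false;
};

// In-place shape (like the declaration above): the caller passes an empty index to fill.
inline void build(const std::vector<float>& dataset, toy_index& index)
{
  assert(!index.trained && "expects an empty, not previously built index");
  index.data    = dataset;
  index.trained = true;
}

// Factory shape: construct, build, and return the index in one call.
inline toy_index build(const std::vector<float>& dataset)
{
  toy_index index;
  build(dataset, index);
  return index;
}
```

The factory shape avoids the "empty and not previously built" precondition entirely, because callers never hold an unbuilt index.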

* many datasets can still have great recall even by only
* looking in the closest landmark.
*/
void all_knn_query(raft::resources const& handle,
Member Author

@cjnolet cjnolet May 8, 2025

Eventually we should move this behind our all_neighbors API. cc @jinsolp
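The `all_knn_query` doc comment above notes that many datasets still get good recall when a query looks only in its closest landmark. The toy, CPU-only sketch below illustrates that approximate mode for 2-D 1-NN. Each point is assigned to its nearest landmark at build time. A query then scans only the points owned by its own nearest landmark.

RBC samples landmarks at random. Here they are passed in explicitly so the result is deterministic. `ToyBallCover`, `build_toy`, and `query_closest_landmark_only` are hypothetical names, and none of this is the cuVS implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct Point { float x, y; };

inline float sq_dist(Point a, Point b) {
  float dx = a.x - b.x, dy = a.y - b.y;
  return dx * dx + dy * dy;
}

// Position of the element of `pts` closest to `q` (first one wins on ties).
inline std::size_t closest(const std::vector<Point>& pts, Point q) {
  assert(!pts.empty());
  std::size_t best = 0;
  float best_d = std::numeric_limits<float>::max();
  for (std::size_t i = 0; i < pts.size(); ++i) {
    float d = sq_dist(pts[i], q);
    if (d < best_d) { best_d = d; best = i; }
  }
  return best;
}

// Landmarks plus, for each landmark, the ids of the data points it owns.
struct ToyBallCover {
  std::vector<Point> data;
  std::vector<Point> landmarks;
  std::vector<std::vector<std::size_t>> members;
};

// Build: assign every data point to its closest landmark.
inline ToyBallCover build_toy(const std::vector<Point>& data, const std::vector<Point>& landmarks) {
  ToyBallCover idx{data, landmarks, std::vector<std::vector<std::size_t>>(landmarks.size())};
  for (std::size_t i = 0; i < data.size(); ++i) {
    idx.members[closest(landmarks, data[i])].push_back(i);
  }
  return idx;
}

// Approximate 1-NN: scan only the points owned by the query's closest landmark.
inline std::size_t query_closest_landmark_only(const ToyBallCover& idx, Point q) {
  const std::vector<std::size_t>& owned = idx.members[closest(idx.landmarks, q)];
  assert(!owned.empty());
  std::size_t best = owned[0];
  for (std::size_t id : owned) {
    if (sq_dist(idx.data[id], q) < sq_dist(idx.data[best], q)) { best = id; }
  }
  return best;
}
```

The answer can be wrong when the true neighbor sits just across the boundary of another landmark's ball. This is why restricting the search to the closest landmark trades some recall for speed.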

"This notebook demonstrates how to run approximate nearest neighbor search using cuVS IVF-Flat algorithm.\n",
"It builds and searches an index using a dataset from the ann-benchmarks million-scale datasets, saves/loads the index to disk, and explores important parameters for fine-tuning the search performance and accuracy of the index."
]
},
Member Author

Not sure this change is valid. Do you mind taking a quick look at this notebook to verify it isn't stale?

Member

I just reverted the changes.

@rapidsai rapidsai deleted a comment from cjnolet May 8, 2025
@divyegala divyegala requested a review from a team May 8, 2025 22:57
Member

@jameslamb jameslamb left a comment

Approving, with the understanding that while this increases the binary size for cuVS a bit, it leads to a much larger compensating reduction in cuML binary sizes: rapidsai/cuml#6626 (comment)

@jameslamb jameslamb removed the request for review from KyleFromNVIDIA May 9, 2025 20:14
@divyegala
Copy link
Member

/merge

@rapids-bot rapids-bot bot merged commit 3c303a0 into branch-25.06 May 9, 2025
75 checks passed
rapids-bot bot pushed a commit to rapidsai/cuml that referenced this pull request May 14, 2025
Depends on rapidsai/cuvs#218. This PR reduces the supported type combinations for the `RBC` method in `dbscan.cu` to only `<float, int64_t>`. That is the only type combination cuVS compiles RBC for, since RBC is otherwise very expensive and slow to compile.

### Effects on Binary Size
Tracked here #6626 (comment)

Authors:
  - Divye Gala (https://github.com/divyegala)

Approvers:
  - Simon Adorf (https://github.com/csadorf)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #6644
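The cuML change above restricts RBC in `dbscan.cu` to `<float, int64_t>`. The sketch below shows one way a caller can route a single supported type combination to a precompiled path and reject the rest at runtime. `rbc_float_int64` and `dbscan_rbc_dispatch` are hypothetical names. Neither the cuVS nor the cuML code is reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>

// Stand-in for the one precompiled RBC entry point (hypothetical).
inline std::string rbc_float_int64() { return "rbc<float, int64_t>"; }

// Hypothetical DBSCAN-side dispatcher. Only <float, int64_t> is routed to the
// precompiled RBC path. Every other combination throws, so no additional RBC
// code has to be compiled for it.
template <typename T, typename IdxT>
std::string dbscan_rbc_dispatch()
{
  if (std::is_same<T, float>::value && std::is_same<IdxT, std::int64_t>::value) {
    return rbc_float_int64();
  }
  throw std::invalid_argument("RBC is only supported for <float, int64_t>");
}
```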

Labels

ci CMake cpp improvement Improves an existing functionality non-breaking Introduces a non-breaking change Python


6 participants