tfeher (Contributor) commented Jul 9, 2025

The refine functions that work with GPU data use IVF-Flat under the hood to perform the refinement operation. This PR adds extern template declarations for ivfflat_interleaved_scan and uses these in the refine functions. This way we avoid recompiling the IVF-Flat search kernels, and save binary size.

Before this PR, ivfflat_interleaved_scan was compiled through the ivf_flat::search() function instantiations, but the function symbols were not available due to inlining. This PR also adds explicit instantiations for ivfflat_interleaved_scan, so both ivf_flat::search and refine can now use the same interleaved scan function.
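The mechanism described above can be illustrated with a minimal single-file sketch. All names here (`scan_sketch`, `search_sketch`, `refine_sketch`) are hypothetical stand-ins, and the real `ivfflat_interleaved_scan` has a different signature. In an actual build the `extern template` lines would live in a shared header and the explicit instantiation definitions in one dedicated source file.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an expensive templated function such as
// ivfflat_interleaved_scan; the real signature is different.
template <typename T>
float scan_sketch(const std::vector<T>& data, const std::vector<T>& query)
{
  float dist = 0.0f;
  const std::size_t n = data.size() < query.size() ? data.size() : query.size();
  for (std::size_t i = 0; i < n; ++i) {
    const float d = static_cast<float>(data[i]) - static_cast<float>(query[i]);
    dist += d * d;  // squared L2 distance
  }
  return dist;
}

// "Header" part: explicit instantiation declarations. A translation unit that
// sees these does not implicitly instantiate the function for these types;
// it expects the symbols to be provided elsewhere.
extern template float scan_sketch<float>(const std::vector<float>&,
                                         const std::vector<float>&);
extern template float scan_sketch<std::int8_t>(const std::vector<std::int8_t>&,
                                               const std::vector<std::int8_t>&);

// "Source file" part: explicit instantiation definitions. These emit the
// symbols exactly once so that every caller can link against them.
template float scan_sketch<float>(const std::vector<float>&,
                                  const std::vector<float>&);
template float scan_sketch<std::int8_t>(const std::vector<std::int8_t>&,
                                        const std::vector<std::int8_t>&);

// Two independent callers (think ivf_flat::search and refine) that share the
// instantiations above instead of each compiling its own copy.
float search_sketch(const std::vector<float>& data, const std::vector<float>& query)
{
  return scan_sketch<float>(data, query);
}

float refine_sketch(const std::vector<std::int8_t>& data,
                    const std::vector<std::int8_t>& query)
{
  return scan_sketch<std::int8_t>(data, query);
}
```

Calling `search_sketch` and `refine_sketch` from different object files would then reuse the single compiled copy of each instantiation, which is the source of the binary size saving this PR aims for.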

@tfeher tfeher requested a review from a team as a code owner July 9, 2025 17:29
@tfeher tfeher self-assigned this Jul 9, 2025
@github-actions github-actions bot added the cpp label Jul 9, 2025
@tfeher tfeher added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels and removed the cpp label Jul 9, 2025
@github-actions github-actions bot added the cpp label Jul 9, 2025
tfeher (Contributor, Author) commented Jul 9, 2025

The ivfflat_interleaved_scan function is expected to be the largest contributor to the binary size of refine_device. We still have a few other kernel calls in refine_device; in a separate PR I will check whether we can get rid of those.

tfeher (Contributor, Author) commented Jul 9, 2025

The binary size is nicely reduced, but the tests fail due to undefined symbols. It worked locally; I will look into it.

| filename | compile time | binary size |
|---|---|---|
| refine_device_half_float.cu.o | 112.149 s | 2.185 MB |
| refine_device_float_float.cu.o | 111.532 s | 2.263 MB |
| refine_device_uint8_t_float.cu.o | 111.155 s | 2.183 MB |
| refine_device_int8_t_float.cu.o | 110.015 s | 2.183 MB |

tfeher (Contributor, Author) commented Jul 14, 2025

The error is related to the filter type used for instantiating the search kernels. I am looking into the details.
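The thread does not state the precise cause, so the following is only a general illustration of why a filter type matters for explicit instantiation: when the filter is a template parameter, each filter type produces a distinct specialization with its own symbol, and an instantiation for one filter does not satisfy a call that uses another. The filter types and the function `count_accepted_sketch` are hypothetical and are not the cuVS API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical filter types; the filters used in the actual PR differ.
struct pass_all_filter {
  bool operator()(std::int64_t) const { return true; }
};
struct even_id_filter {
  bool operator()(std::int64_t id) const { return id % 2 == 0; }
};

// Hypothetical scan-like function templated on both data type and filter type.
template <typename T, typename Filter>
std::size_t count_accepted_sketch(const std::vector<T>& data, Filter filter)
{
  std::size_t accepted = 0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    if (filter(static_cast<std::int64_t>(i))) { ++accepted; }
  }
  return accepted;
}

// Explicit instantiation declaration and definition for ONE filter type only.
extern template std::size_t count_accepted_sketch<float, pass_all_filter>(
  const std::vector<float>&, pass_all_filter);
template std::size_t count_accepted_sketch<float, pass_all_filter>(
  const std::vector<float>&, pass_all_filter);

// Uses the explicitly instantiated specialization.
std::size_t count_all(const std::vector<float>& data)
{
  return count_accepted_sketch<float, pass_all_filter>(data, pass_all_filter{});
}

// Uses a DIFFERENT specialization. In this single-file sketch it is implicitly
// instantiated because the template body is visible. In a multi-file build
// where callers only see an extern declaration for this filter type, a
// matching explicit instantiation definition would be needed somewhere,
// otherwise the link step reports an undefined symbol.
std::size_t count_even(const std::vector<float>& data)
{
  return count_accepted_sketch<float, even_id_filter>(data, even_id_filter{});
}
```

The point of the sketch is that the set of declared and the set of defined instantiations have to agree on every template argument, including the filter.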

@tfeher tfeher force-pushed the refine_binary_size branch from 643f3b4 to 028e98b Compare July 28, 2025 23:37
@tfeher tfeher requested a review from a team as a code owner July 28, 2025 23:37
@github-actions github-actions bot added the CMake label Jul 28, 2025

achirkin (Contributor) left a comment

Thanks Tamas, the code looks good to me. I have a couple of comments to improve my understanding of the new indexing types.

tfeher (Contributor, Author) commented Jul 29, 2025

| object file | size after [MB] | size before [MB] |
|---|---|---|
| ivf_flat/ivf_flat_interleaved_scan_uint8_t_int64_t.cu.o | 49.2 | |
| ivf_flat/ivf_flat_interleaved_scan_int8_t_int64_t.cu.o | 48.7 | |
| ivf_flat/ivf_flat_interleaved_scan_half_int64_t.cu.o | 45.1 | |
| ivf_flat/ivf_flat_interleaved_scan_float_int64_t.cu.o | 43.6 | |
| refine/detail/refine_device_uint8_t_float.cu.o | 2.2 | 25.5 |
| refine/detail/refine_device_half_float.cu.o | 2.2 | 23.4 |
| refine/detail/refine_device_float_float.cu.o | 2.2 | 43.4 |
| refine/detail/refine_device_int8_t_float.cu.o | 2.2 | 25.1 |
| ivf_flat/ivf_flat_search_half_int64_t.cu.o | 1.2 | 45.9 |
| ivf_flat/ivf_flat_search_int8_t_int64_t.cu.o | 1.2 | 49.5 |
| ivf_flat/ivf_flat_search_float_int64_t.cu.o | 1.2 | 44.4 |
| ivf_flat/ivf_flat_search_uint8_t_int64_t.cu.o | 1.2 | 50.0 |
| total | 200.0 | 307.2 |
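For reference, the reported totals correspond to a saving of 307.2 - 200.0 = 107.2 MB, or roughly 35% of the original size of these object files. The small sketch below recomputes the sums from the per-file numbers listed above. The helper names are hypothetical, and the per-file "after" values add up to 200.2 rather than the reported 200.0, presumably because the individual entries are rounded.

```cpp
#include <numeric>
#include <vector>

// Per-file sizes in MB, copied from the listing above.
const std::vector<double> kAfterMB  = {49.2, 48.7, 45.1, 43.6,  // interleaved_scan
                                       2.2,  2.2,  2.2,  2.2,   // refine_device
                                       1.2,  1.2,  1.2,  1.2};  // ivf_flat_search
const std::vector<double> kBeforeMB = {25.5, 23.4, 43.4, 25.1,  // refine_device
                                       45.9, 49.5, 44.4, 50.0}; // ivf_flat_search

// Sum of a list of sizes in MB.
double sum_mb(const std::vector<double>& sizes)
{
  return std::accumulate(sizes.begin(), sizes.end(), 0.0);
}

// Absolute saving in MB.
double saved_mb(double before, double after) { return before - after; }

// Saving as a fraction of the original size.
double saved_fraction(double before, double after) { return (before - after) / before; }
```

With the reported totals, `saved_mb(307.2, 200.0)` gives the absolute saving and `saved_fraction(307.2, 200.0)` the relative one.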

tfeher (Contributor, Author) left a comment

Thank you Artem for the review, I have addressed the issues.

tfeher (Contributor, Author) left a comment

Thanks Divye for the review, I have addressed the issues.

tfeher (Contributor, Author) commented Jul 30, 2025

/merge

@rapids-bot rapids-bot bot merged commit 10e0795 into rapidsai:branch-25.08 Jul 30, 2025
53 checks passed
@divyegala divyegala linked an issue Jul 31, 2025 that may be closed by this pull request
lowener pushed a commit to lowener/cuvs that referenced this pull request Aug 11, 2025
The refine functions that work with GPU data use IVF-Flat under the hood to perform the refinement operation. This PR adds extern template declarations for `ivfflat_interleaved_scan` and uses these in the refine functions. This way we avoid recompiling the IVF-Flat search kernels, and save binary size.

Before this PR, `ivfflat_interleaved_scan` was compiled through the `ivf_flat::search()` function instantiations, but the function symbols were not available due to inlining. This PR also adds explicit instantiations for `ivfflat_interleaved_scan`, so both `ivf_flat::search` and `refine` can now use the same interleaved scan function.

Authors:
  - Tamas Bela Feher (https://github.com/tfeher)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Divye Gala (https://github.com/divyegala)

URL: rapidsai#1095
enp1s0 pushed a commit to enp1s0/cuvs that referenced this pull request Aug 22, 2025
Labels

CMake, cpp, improvement (Improves an existing functionality), non-breaking (Introduces a non-breaking change)

Development

Successfully merging this pull request may close these issues.

Explore kernel sizes of refine_*.cu

3 participants