Skip to content

Conversation

@enp1s0
Copy link
Member

@enp1s0 enp1s0 commented Jan 24, 2025

This PR adds the binary Hamming distance support to CAGRA.

dependency: #612

TODO:

  • Add Hamming distance dist_op for CAGRA search
    • Add DistanceType::BinaryHamming (because HammingUnexpanded is not bitwise operation)
  • Support graph build
    • Want to use the iterative graph build method that uses the CAGRA search for graph build (anaruse@5a80659) because otherwise the binary Hamming distance support for IVFPQ or NN Descent is additionally required.
  • Add CI test
    • GT creation
    • Add test cases
  • Test on some benchmark datasets
  • Add preprocessing::quantize::binary

@copy-pr-bot
Copy link

copy-pr-bot bot commented Jan 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@enp1s0 enp1s0 self-assigned this Jan 24, 2025
@enp1s0 enp1s0 added feature request New feature or request non-breaking Introduces a non-breaking change labels Jan 24, 2025
@cjnolet
Copy link
Member

cjnolet commented Jan 25, 2025

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Feb 1, 2025

/ok to test

@enp1s0
Copy link
Member Author

enp1s0 commented Feb 1, 2025

@cjnolet In this PR, I include preprocess::binary::transform, which performs binary quantization. The current algorithm sets a bit to 1 if a corresponding element of the input vector is positive, but is it better to be able to select another algorithm for a future extension?
The current function arguments are:

void transform(raft::resources const& res,
               raft::device_matrix_view<const double, int64_t> dataset,
               raft::device_matrix_view<uint8_t, int64_t> out);

and, for example, is it better to define a param structure and change the function arguments as follows?

enum class binary_quantization_algo_t {set_bit_if_positive};
struct cuvs::cuvs::preprocessing::quantize::binary::params{
    binary_quantization_algo_t algo = binary_quantize_algo_t::set_if_positive;
};
void transform(raft::resources const& res,
               cuvs::cuvs::preprocessing::quantize::binary::params& params,
               raft::device_matrix_view<const double, int64_t> dataset,
               raft::device_matrix_view<uint8_t, int64_t> out);

cjnolet and others added 3 commits February 1, 2025 18:43
This PR is about how CAGRA's search() and optimize() can be used to iteratively create and improve graph index.

Currently, IVFPQ and NND are used to create the initial kNN graph, which is then optimized to create the CAGRA search graph. So, for example, if you want to support a new data type in CAGRA, you need to create an initial kNN graph with that data type, and IVFPQ or NND must also support that new data type. This is a bit of hassle.

This PR is one solution to that problem. With functionality of this PR, once the CAGRA search supports the new data type, it can be used to create a graph index with it.

Authors:
  - Akira Naruse (https://github.com/anaruse)
  - Corey J. Nolet (https://github.com/cjnolet)

Approvers:
  - Artem M. Chirkin (https://github.com/achirkin)
  - Tamas Bela Feher (https://github.com/tfeher)

URL: rapidsai#612
@cjnolet
Copy link
Member

cjnolet commented Feb 2, 2025

/ok to test

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enp1s0 for the PR, it looks great, I just have a few smaller comments.

@enp1s0
Copy link
Member Author

enp1s0 commented Feb 3, 2025

@tfeher Thank you for reviewing the code. I fixed it. Can you review it again?

Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @enp1s0 for the updates, the PR looks good to me!

@cjnolet
Copy link
Member

cjnolet commented Feb 3, 2025

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Feb 4, 2025

/merge

@rapids-bot rapids-bot bot merged commit 058eef2 into rapidsai:branch-25.02 Feb 4, 2025
61 checks passed
@enp1s0 enp1s0 deleted the cagra-hamming-distance branch September 29, 2025 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CMake cpp feature request New feature or request non-breaking Introduces a non-breaking change

Development

Successfully merging this pull request may close these issues.

4 participants