
Conversation

@lowener
Contributor

@lowener lowener commented Apr 30, 2025

Related to #841.
The IVF-PQ build metric is not properly initialized in the InnerProduct case, so this PR adds the correct initialization at the C layer, as well as checks on the C++ side so that the CAGRA metric matches the knn metric.
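The kind of consistency check described can be sketched as follows. This is a minimal illustration in Python; the function and argument names are hypothetical, not the actual cuVS C/C++ API.

```python
# Hypothetical sketch of a metric-consistency check between the CAGRA
# index metric and the metric used to build its knn graph (e.g. IVF-PQ).
# Names are illustrative only, not the real cuVS API.
def check_graph_build_metric(cagra_metric: str, knn_build_metric: str) -> None:
    """Raise if the knn graph build metric disagrees with the CAGRA metric."""
    if cagra_metric != knn_build_metric:
        raise ValueError(
            f"CAGRA metric ({cagra_metric}) must match the "
            f"knn graph build metric ({knn_build_metric})"
        )

# Matching metrics pass silently; a mismatch raises.
check_graph_build_metric("InnerProduct", "InnerProduct")
```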

@copy-pr-bot

copy-pr-bot bot commented Apr 30, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.


@github-actions github-actions bot added the cpp label Apr 30, 2025
@cjnolet cjnolet added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Apr 30, 2025
@cjnolet
Member

cjnolet commented Apr 30, 2025

/ok to test 57aa341

@navneet1v

@lowener does this PR impact the integration of cuVS with the Faiss library?

@tarang-jain
Contributor

@navneet1v Faiss is currently tied to libcuvs=25.04. Whatever goes in after 25.04 is not reflected in Faiss until it upgrades to the latest cuVS version.

@navneet1v

@tarang-jain let me clarify what I am asking. The question is: is this bug present in the 25.04 version of libcuvs that ships with Faiss?

@lowener
Contributor Author

lowener commented May 2, 2025

Yes, this was already present in libcuvs 25.04 @navneet1v

@navneet1v

navneet1v commented May 2, 2025

@lowener should we create a PR in Faiss to upgrade its cuVS version and fix the issue?

@tarang-jain
Contributor

Since Faiss just depends on a pre-installed version of cuVS and does not build cuVS from source, you should be able to simply install 25.06 after this PR is merged (or even build this PR branch from source) in your environment. So long as none of the public API signatures have changed between 25.04 and 25.06, building Faiss from source will work just fine.
For bumping Faiss up to RAPIDS version 25.06: yes, it can be done, since it is a simple version upgrade, and it would change the faiss-gpu-cuvs conda package. However, it might be safer to do it after the (stable) release rather than with the cuVS nightlies, unless there are crucial feature enhancements or bug fixes in the nightlies.

Signed-off-by: Mickael Ide <[email protected]>
@lowener lowener marked this pull request as ready for review May 5, 2025 14:12
@lowener lowener requested a review from a team as a code owner May 5, 2025 14:12
-  search_params.lut_dtype               = CUDA_R_16F;
   search_params.internal_distance_dtype = CUDA_R_16F;
   search_params.coarse_search_dtype     = CUDA_R_16F;
+  search_params.lut_dtype               = CUDA_R_32F;
Contributor Author


When the data is already normalized (for cosine computation, for example), using FP16 as the data type leads to terrible recall due to the loss of precision on small floats. Defaulting to FP32 could prevent those extreme cases, at the expense of index build time. This matters because the CAGRA IVF-PQ build params are not yet exposed through the Python API.
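The precision-loss argument can be illustrated numerically. This is a stdlib-only sketch (using Python's IEEE half-precision `struct` format, not cuVS itself): every entry of a normalized high-dimensional vector is small, around 1/sqrt(d), so fp16's roughly 11-bit mantissa quantizes the entries coarsely.

```python
import math
import random
import struct

def to_fp16(v: float) -> float:
    """Round-trip a float through IEEE half precision (struct format 'e')."""
    return struct.unpack('e', struct.pack('e', v))[0]

random.seed(0)
d = 1024
x = [random.gauss(0.0, 1.0) for _ in range(d)]
norm = math.sqrt(sum(v * v for v in x))
x = [v / norm for v in x]  # normalized: entries are ~1/sqrt(d) ~= 0.03

# Round-trip through fp16, as an internal fp16 distance/LUT dtype would.
x16 = [to_fp16(v) for v in x]
rel_err = max(abs(a - b) / abs(b) for a, b in zip(x16, x))
print(f"max relative error after fp16 round-trip: {rel_err:.2e}")
```

The per-entry relative error is bounded by fp16's unit roundoff for normal-range values, but the smallest entries of a normalized vector can fall near or below fp16's normal range, where the relative error grows, which is consistent with the recall degradation described above.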

Member


  1. I don't think we should be changing defaults. This can lead to massive confusion for users when their perf suffers as a result.
  2. We need to get the IVF-PQ params exposed through CAGRA's Python API. I didn't realize they weren't already exposed.

Contributor

@achirkin achirkin May 5, 2025


@lowener I agree with Corey that we had better keep fp16, at least for now, as it has a significant impact on performance (I saw build time differences of tens of percent, hence the fp16 default).
However, I'm very curious about the cases where setting these three internal types to fp16 rather than fp32 significantly affects recall. Could you please point me to these for a follow-up study?

Contributor Author


#841, with the Flickr dataset. The user also provided some reproducer code.

@cjnolet
Member

cjnolet commented May 7, 2025

/ok to test c620cc1

@navneet1v

@cjnolet when can we expect this change to be merged? We are looking to test this with Faiss to see if the recall issue is fixed.

@cjnolet
Member

cjnolet commented May 7, 2025

@navneet1v it should be merged for the 25.06 release (burndown is in 2 weeks). It'll likely be merged well before burndown, but it needs to pass CI first.

@benfred
Member

benfred commented May 8, 2025

/merge

@rapids-bot rapids-bot bot merged commit 19a1759 into rapidsai:branch-25.06 May 8, 2025
74 checks passed
@lowener lowener deleted the 25.06-fix-cagra-build branch May 8, 2025 17:22