Conversation

@viclafargue
Contributor

@viclafargue viclafargue commented Jul 18, 2024

This PR adds a distributed (single-node, multi-GPU) implementation of ANN indexes. It allows building, extending, and searching an index on multiple GPUs.

Before building the index, the user has to choose between two modes:

Sharding mode: the index dataset is split across GPUs, and each GPU trains its own index on its share of the dataset. This is intended both to increase search throughput and to raise the maximum size of the index.
Index duplication mode: the index is built once on one GPU and then copied to the others. Alternatively, the index dataset is sent to each GPU to be built there. This is intended to increase search throughput.

SNMG indexes can be serialized and deserialized. A local (single-GPU) index can also be deserialized and deployed in index duplication mode.

[benchmark results chart: bench]

Migrated from rapidsai/raft#1993

@viclafargue viclafargue requested review from a team as code owners July 18, 2024 10:46
@copy-pr-bot

copy-pr-bot bot commented Jul 18, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.


@viclafargue viclafargue requested a review from a team as a code owner July 18, 2024 16:08
@viclafargue viclafargue requested a review from jameslamb July 18, 2024 16:08
Contributor

@tfeher tfeher left a comment


Thanks Victor for the PR, it looks really great! I have just a few smaller comments.

@viclafargue viclafargue requested a review from a team as a code owner July 26, 2024 13:17
Contributor

@tfeher tfeher left a comment


A few additional comments

Contributor

@tfeher tfeher left a comment


We are almost there, but the matrix extent type has to be fixed: we should distinguish the neighbor index type (IdxT, usually int64) from the mdspan extent type, which is always int64. In a previous review I marked such changes in ann_mp.cuh, but the public API in ann_mg.hpp also has a few places where this needs to be fixed.

@tfeher tfeher added feature request New feature or request non-breaking Introduces a non-breaking change labels Jul 29, 2024
@cjnolet
Member

cjnolet commented Oct 3, 2024

/ok to test

@dantegd
Member

dantegd commented Oct 3, 2024

The CI error here seems to have been a connection error:

[rapids-conda-retry] Exiting, no retryable mamba errors detected: 'ChecksumMismatchError:', 'ChunkedEncodingError:', 'CondaHTTPError:', 'CondaMultiError:', 'Connection broken:', 'ConnectionError:', 'DependencyNeedsBuildingError:', 'EOFError:', 'JSONDecodeError:', 'Multi-download failed', 'Timeout was reached', segfault exit code 139

@divyegala
Member

/ok to test

@tfeher
Contributor

tfeher commented Oct 3, 2024

/ok to test

@cjnolet
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Member

cjnolet commented Oct 3, 2024

/ok to test

@cjnolet
Member

cjnolet commented Oct 3, 2024

/merge

@rapids-bot rapids-bot bot merged commit 3383f28 into rapidsai:branch-24.10 Oct 3, 2024
const cuvs::neighbors::index_params* index_params,
raft::mdspan<const T, matrix_extent<int64_t>, row_major, Accessor> index_dataset)
{
interface.mutex_->lock();
Member


Hi @viclafargue @achirkin, I'm studying the code for the multi-GPU index feature—great work! I noticed a potential issue: the lock system does not prevent data races on device memory since the API does not explicitly synchronize the stream (at least, I found this to be the case with deserialize). If this behavior is intentional, please feel free to disregard my comments.

Contributor Author

@viclafargue viclafargue Mar 4, 2025


Hi @rhdong, thank you for noticing this! It is indeed probably safer to synchronize before unlocking, to ensure that all cudaMemcpy calls made during deserialization have completed. Will update this.

Contributor Author


Thanks again! Fixed in 4e0f512.


Labels

CMake cpp feature request New feature or request non-breaking Introduces a non-breaking change Python


10 participants