-
Notifications
You must be signed in to change notification settings - Fork 143
Fix missing sync_stream in ScaNN build #1224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| dataset_vec_batches.prefetch_next_batch(); | ||
|
|
||
| // Make sure work on device is finished before swapping buffers | ||
| raft::resource::sync_stream(res); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefetch_next_batch seems to be synchronizing at the very end. Would it make better sense to have the sync_stream before prefetching the batch (above line 175)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
prefetch_next_batch() only syncs on the stream used for copying, so if the copy happens on a separate stream from the work (e.g. if you want to prfetch/overlap) then work might not be done when buffers are swapped (which is what I am trying to fix). Moving sync stream before prefetch will fix the problem, but at the cost of no overlapping of work/copy, reducing performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I had not realized that prefetching is done using a different stream
|
/merge |
As title. For larger number of clusters (or faster PCI-E), the work required to find cluster assignments would take longer than the HtoD copy for each batch. This would lead to overwriting of one of the batch_load_iterator device buffers before kmeans predict completed, and thus lower recall. This adds a raft::resource::sync_stream at the end of the kmeans predict loop to avoid this. Although not technically necessary (as the prior DtoH copy into pageable memory is synchronous), sync_stream is also added to the end of the quantization loop for consistency/correctness. Also removed one unused comment. Authors: - https://github.com/rmaschal Approvers: - Corey J. Nolet (https://github.com/cjnolet) - Tarang Jain (https://github.com/tarang-jain) URL: rapidsai#1224
As title. For larger number of clusters (or faster PCI-E), the work required to find cluster assignments would take longer than the HtoD copy for each batch. This would lead to overwriting of one of the batch_load_iterator device buffers before kmeans predict completed, and thus lower recall.
This adds a raft::resource::sync_stream at the end of the kmeans predict loop to avoid this. Although not technically necessary (as the prior DtoH copy into pageable memory is synchronous), sync_stream is also added to the end of the quantization loop for consistency/correctness.
Also removed one unused comment.