Make duplicate removal in all neighbors robust to distance drift across batches (rapidsai#1185)
This fixes an edge case whose root cause is described in rapidsai#1056: `raft::linalg::gemm` can return slightly different distance results depending on the input sizes.
Right now, when merging two knn graphs from different batches, we sort by distance (i.e. keys), and if the distances are equal we sort by index (i.e. values). We then compare each index with the one right next to it to check for duplicates, under the assumption that the same vector always ends up with the same distance.
However, due to the problem described in issue 1056, the distance for the same index can differ slightly depending on the size of the input matrix to gemm (or on where the vector sits within the matrix).
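The merge step described above can be sketched in a few lines of Python. This is only an illustration of the sort-then-compare-adjacent idea, not the actual GPU implementation; the function name `merge_adjacent_dedup` and the sample batches are hypothetical.

```python
def merge_adjacent_dedup(indices, distances):
    """Sort by (distance, index), then drop an entry whose index
    matches the entry kept immediately before it."""
    # Tuples sort by distance first; ties are broken by index.
    pairs = sorted(zip(distances, indices))
    merged = []
    for dist, idx in pairs:
        if merged and merged[-1][1] == idx:
            continue  # same index as the previous entry: duplicate
        merged.append((dist, idx))
    return merged


# Hypothetical candidates for one query vector from two batches.
# Index 2 appears in both batches with exactly the same distance.
batch_a = ([1, 2], [0.023, 0.0235])
batch_b = ([2, 3], [0.0235, 0.0236])

result = merge_adjacent_dedup(batch_a[0] + batch_b[0],
                              batch_a[1] + batch_b[1])
result_indices = [idx for _, idx in result]
```

When both batches report identical distances for index 2, the two entries sort next to each other and the adjacent comparison removes one of them.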
For example, say we are calculating nearest neighbors for vector 0.
We could end up with:
```
indices = [1, 2, 3, 2, ...]
distances = [0.023, 0.02355981, 0.02355983, 0.02355987]
```
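because the distance between vector 0 and vector 2 is calculated as 0.02355981 in the first batch and 0.02355987 in the second batch. After sorting, the two entries for index 2 are separated by index 3, so the adjacent comparison misses the duplicate.
This PR fixes the issue by checking the 4 neighbors to the left of each entry for duplicates, instead of only the one immediately next to it.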
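As a rough sketch of the windowed check, the Python below runs the example from this description with a window of 1 (the old adjacent comparison) and a window of 4. It is an assumption-laden illustration, not the code in this PR: the name `dedup_with_window` is hypothetical, and it assumes each entry is compared against its left neighbors in the sorted list.

```python
WINDOW = 4  # number of left neighbors inspected, per the description above


def dedup_with_window(indices, distances, window=WINDOW):
    """Sort by (distance, index), then drop an entry if any of the
    `window` entries to its left in sorted order has the same index."""
    pairs = sorted(zip(distances, indices))
    kept = []
    for pos, (dist, idx) in enumerate(pairs):
        left = pairs[max(0, pos - window):pos]
        if any(prev_idx == idx for _, prev_idx in left):
            continue  # duplicate found within the window
        kept.append((dist, idx))
    return kept


# Example from the description: index 2 appears twice with drifted distances.
indices = [1, 2, 3, 2]
distances = [0.023, 0.02355981, 0.02355983, 0.02355987]

adjacent_only = [i for _, i in dedup_with_window(indices, distances, window=1)]
windowed = [i for _, i in dedup_with_window(indices, distances)]
```

With `window=1` the duplicate index 2 survives because index 3 sorts between its two entries; with the window of 4 it is removed.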
Authors:
- Jinsol Park (https://github.com/jinsolp)
- Corey J. Nolet (https://github.com/cjnolet)
Approvers:
- Corey J. Nolet (https://github.com/cjnolet)
URL: rapidsai#1185