[Tracker] Optimizing the performance of the Cagra::merge under Logical Strategy #946

@rhdong

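This tracker is related to PR #713. With logical merge, search latency increases roughly linearly with split_num: in the search results below, per-query time goes from 1.26 ms to 5.09 ms, about 4x, in line with split_num=4. Although each sub-graph is smaller, every query has to search all of them, so the overall search time grows and there is no noticeable speedup.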
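To make the cost model concrete, here is a minimal sketch of what a logical-merge search does conceptually: run one search per sub-index, then merge the per-shard top-k lists. This is not the cuVS implementation. `brute_force_search` is a hypothetical stand-in for a single CAGRA sub-index search, and the shard layout and sizes are made up for illustration.

```python
import heapq
import random


def brute_force_search(shard, query, k):
    """Hypothetical stand-in for one sub-index search.

    Returns the k nearest (squared_l2_distance, global_id) pairs, sorted.
    """
    scored = (
        (sum((a - b) ** 2 for a, b in zip(vec, query)), gid)
        for gid, vec in shard
    )
    return heapq.nsmallest(k, scored)


def logical_merge_search(shards, query, k):
    """Search every shard, then keep the best k of all candidates."""
    candidates = []
    for shard in shards:  # one search per sub-index: work grows with len(shards)
        candidates.extend(brute_force_search(shard, query, k))
    return heapq.nsmallest(k, candidates)


random.seed(0)
dim, n, split_num, k = 8, 1000, 4, 10  # illustrative sizes, not the benchmark's
dataset = [(i, [random.random() for _ in range(dim)]) for i in range(n)]
shards = [dataset[i::split_num] for i in range(split_num)]
query = [random.random() for _ in range(dim)]

merged = logical_merge_search(shards, query, k)
exact = brute_force_search(dataset, query, k)
```

The loop over `shards` is the part that matters here: unless the per-shard searches overlap or each one gets proportionally cheaper, the total latency is roughly the sum of split_num individual searches.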

Setup: 1 x A6000 GPU, dataset: sift-128-euclidean, batch size = 1:

- Build

---------------------------------------------------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations  Split_num    MergeStrategy   UserCounters...
---------------------------------------------------------------------------------------------------------------------------
raft_cagra.dim32/process_time/real_time       7.51 s           212 s             1          1         No merge     GPU=7.51419 graph_degree=32 index_size=1000k intermediate_graph_degree=48 ivf_pq_build_niter=10 ivf_pq_build_nlist=16.384k ivf_pq_build_pq_bits=8 ivf_pq_build_pq_dim=32 ivf_pq_build_ratio=10 ivf_pq_search_nprobe=30 ivf_pq_search_refine_ratio=1 split_num=1 dataset_memory_type="device"#graph_build_algo="NN_DESCENT"#ivf_pq_search_internalDistanceDtype="float"#ivf_pq_search_smemLutDtype="float"#merge_type="LOGICAL"
raft_cagra.dim32/process_time/real_time       7.55 s           211 s             1          4         Physical     GPU=7.54935 graph_degree=32 index_size=1000k intermediate_graph_degree=48 ivf_pq_build_niter=10 ivf_pq_build_nlist=16.384k ivf_pq_build_pq_bits=8 ivf_pq_build_pq_dim=32 ivf_pq_build_ratio=10 ivf_pq_search_nprobe=30 ivf_pq_search_refine_ratio=1 split_num=4 dataset_memory_type="device"#graph_build_algo="NN_DESCENT"#ivf_pq_search_internalDistanceDtype="float"#ivf_pq_search_smemLutDtype="float"#merge_type="PHYSICAL"
raft_cagra.dim32/process_time/real_time       8.18 s           254 s             1          4          Logical     GPU=8.17527 graph_degree=32 index_size=1000k intermediate_graph_degree=48 ivf_pq_build_niter=10 ivf_pq_build_nlist=16.384k ivf_pq_build_pq_bits=8 ivf_pq_build_pq_dim=32 ivf_pq_build_ratio=10 ivf_pq_search_nprobe=30 ivf_pq_search_refine_ratio=1 split_num=4 dataset_memory_type="device"#graph_build_algo="NN_DESCENT"#ivf_pq_search_internalDistanceDtype="float"#ivf_pq_search_smemLutDtype="float"#merge_type="LOGICAL"
                                                                                                               
                                                                                                               
- Search
----------------------------------------------------------------------------------------------------------------------------------
Benchmark                                        Time             CPU   Iterations   Split_num    MergeStrategy    UserCounters...
----------------------------------------------------------------------------------------------------------------------------------
raft_cagra.dim32/process_time/real_time       1.26 ms         1.26 ms        44323 GPU=1.25917m Latency=1.26365m Recall=0.99851 end_to_end=56.009 items_per_second=791.355/s itopk=256 k=10 n_queries=1 total_queries=44.323k algo="single_cta"#dataset_memory_type="device"
raft_cagra.dim32/process_time/real_time       1.26 ms         1.26 ms        44651 GPU=1.25266m Latency=1.25714m Recall=0.99856 end_to_end=56.1326 items_per_second=795.455/s itopk=256 k=10 n_queries=1 total_queries=44.651k algo="single_cta"#dataset_memory_type="device"
raft_cagra.dim32/process_time/real_time       5.09 ms         5.09 ms        10989 GPU=5.08771m Latency=5.0925m Recall=0.99928 end_to_end=55.9615 items_per_second=196.367/s itopk=256 k=10 n_queries=1 total_queries=10.989k algo="single_cta"#dataset_memory_type="device"
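As a quick sanity check on the "roughly linear" observation, the slowdown can be computed directly from the three search rows above. The numbers are copied from the benchmark output in the order listed; the variable names are just for illustration.

```python
# Per-query latency (ms) and throughput (queries/s) from the three search rows,
# in the order they appear in the benchmark output.
latency_ms = [1.26, 1.26, 5.09]
items_per_second = [791.355, 795.455, 196.367]

baseline_latency = latency_ms[0]
baseline_qps = items_per_second[0]

# Slowdown of each row relative to the first row.
latency_ratio = [t / baseline_latency for t in latency_ms]
throughput_ratio = [baseline_qps / q for q in items_per_second]

for lat, qps in zip(latency_ratio, throughput_ratio):
    print(f"latency x{lat:.2f}, throughput /{qps:.2f}")
```

Both ratios for the slowest row come out close to 4, which is what one would expect if each of the 4 sub-indexes costs about as much to search as the single unsplit index.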
