Skip to content

jean-pierreBoth/bigann

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 

Repository files navigation

bigann

This mini crate benches the crate hnsw-rs on sampled u8 vectors from the BIGANN benchmark. See BIGANN and IRISA

Files bigann_base.bvecs, bigann_query.bvecs must be dowloaded and installed in some directory (This amounts to 133Gb). Then depending on the size of the data you want to run on the first (10M, 100M, 1B) of the large file bigann_base.bvecs you download the corresponding ground truth as explained in the BIGANN web page.

To run on the first 10M slices of data you download the corresponding ground truth corresponding to this size and extract the file dis_10M.fvecs and idx_10M.ivecs from the gnd directory to put it in the same directory as bigann_base.bvecs and bigann_query.bvecs check in source.

commandline

  • bigann --dir DataDir --nbdata 10 (or 100 or 1000) to specify the number million data you want to run.

Use --dump to dump the hnsw structure for search variations

  • bigann --dir DataDir --dump --nbdata 10 (or 100 or 1000) and
  • bigann --dir DataDir --hnsw "dumpbigann" --nbdata 10

For more see documentation (cargo doc --no-deps as usual)

Results for the first 10 Million data points.

Results with standard level sampling

Results on Intel E5-2630 v3 @2.4GHz 16 cores 2 thread / core

All parameters are explained in doc of hnsw-rs.

knbn max_nb_conn ef_cons ef_search extend keep pruned recall req/s last ratio
10 64 100 128 no no 0.995 2610 1.0002
100 64 100 128 no no 0.983 1350 1.0006
10 24 100 128 no no 0.970 4845 1.001
100 24 100 128 no no 0.923 2411 1.003
knbn max_nb_conn ef_cons ef_search extend keep pruned recall req/s last ratio
10 24 100 128 no no 0.960 5900 1.001
100 24 100 128 no no 0.907 2800 1.004
10 24 400 128 no no 0.972 4678 1.001
100 24 400 128 no no 0.938 2338 1.003
10 24 800 128 no no 0.975 4313 1.001
100 24 800 128 no no 0.9428 2151 1.0025

Results with Amd Ryzen 9 7950 16 core and 0.5 scale modification factor

With modified level sampling level (as documented in hnsw-rs) we increase recall and have with max_nb_conn 48 better results than cited above with max_nb_conn=64 without scale modification.
This decreases memory consumption

knbn max_nb_conn ef_cons ef_search extend keep pruned recall req/s last ratio
10 48 100 128 no no 0.997 6283 1.0001
100 48 100 128 no no 0.980 3152 1.0007
10 24 100 128 no no 0.989 9825 1.0003
100 24 100 128 no no 0.958 4897 1.0017

About

bigann benchmark for hnsw_rs

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages