Replies: 5 comments 1 reply
-
Commit/Date: main @ da5e70d (Thu Jan 15 09:44:44 2026 -0500)
MSMARCO-1M (1000 queries, Recall@50)
Disk Usage Breakdownstore_vectors_in_graph=False and quantization=INT8du -sh arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/* | sort -h
320K arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/dictionary.0.327680.v0.dict
59M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/VectorData_0_2689535159251959_vecgraph.5.262144.v0.vecgraph
999M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/VectorData_0_2689535159251959.4.262144.v0.lsmvecidx
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/VectorData_0.1.65536.v0.bucketstore_vectors_in_graph=True and quantization=INT8du -sh arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/* | sort -h
320K arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/dictionary.0.327680.v0.dict
999M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/VectorData_0_2689534677234566.4.262144.v0.lsmvecidx
3.9G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/VectorData_0_2689534677234566_vecgraph.5.262144.v0.vecgraph
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/VectorData_0.1.65536.v0.bucketstore_vectors_in_graph=False and quantization=Nonedu -sh arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/* | sort -h
320K arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/dictionary.0.327680.v0.dict
11M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/VectorData_0_2689535353426837.4.262144.v0.lsmvecidx
59M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/VectorData_0_2689535353426837_vecgraph.5.262144.v0.vecgraph
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/VectorData_0.1.65536.v0.bucketstore_vectors_in_graph=True and quantization=Nonedu -sh arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/* | sort -h
320K arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/dictionary.0.327680.v0.dict
11M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/VectorData_0_2689535105029551.4.262144.v0.lsmvecidx
3.9G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/VectorData_0_2689535105029551_vecgraph.5.262144.v0.vecgraph
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/VectorData_0_1.65536.v0.bucket |
Beta Was this translation helpful? Give feedback.
0 replies
-
Commit/Date: main @ da5e70d (Thu Jan 15 09:44:44 2026 -0500)
MSMARCO-1M (1000 queries, Recall@50)
Disk Usage Breakdownstore_vectors_in_graph=False and quantization=INT8du -sh arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/* | sort -h
320K arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/dictionary.0.327680.v0.dict
59M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/VectorData_0_2689535159251959_vecgraph.5.262144.v0.vecgraph
999M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/VectorData_0_2689535159251959.4.262144.v0.lsmvecidx
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=off_hier=on_batch=100000_seed=42/VectorData_0.1.65536.v0.bucketstore_vectors_in_graph=True and quantization=INT8du -sh arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/* | sort -h
320K arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/dictionary.0.327680.v0.dict
999M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/VectorData_0_2689534677234566.4.262144.v0.lsmvecidx
3.9G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/VectorData_0_2689534677234566_vecgraph.5.262144.v0.vecgraph
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=int8_store=on_hier=on_batch=100000_seed=42/VectorData_0.1.65536.v0.bucketstore_vectors_in_graph=False and quantization=Nonedu -sh arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/* | sort -h
320K arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/dictionary.0.327680.v0.dict
11M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/VectorData_0_2689535353426837.4.262144.v0.lsmvecidx
59M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/VectorData_0_2689535353426837_vecgraph.5.262144.v0.vecgraph
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=off_hier=on_batch=100000_seed=42/VectorData_0.1.65536.v0.bucketstore_vectors_in_graph=True and quantization=Nonedu -sh arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/* | sort -h
320K arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/dictionary.0.327680.v0.dict
11M arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/VectorData_0_2689535105029551.4.262144.v0.lsmvecidx
3.9G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/VectorData_0_2689535105029551_vecgraph.5.262144.v0.vecgraph
5.6G arcadedb_runs/dataset=MSMARCO-1M_label=1000000_maxconn=12_beam=64_oq=1_quant=none_store=on_hier=on_batch=100000_seed=42/VectorData_0_1.65536.v0.bucketFindings
|
Beta Was this translation helpful? Give feedback.
0 replies
-
Commit/Date: main @ 91a86e3 (Thu Jan 15 10:32:50 2026 -0500)
MSMARCO-1M (1000 queries, Recall@50)
Findings
|
Beta Was this translation helpful? Give feedback.
0 replies
-
Commit/Date: main @ 6ef8858 (Thu Jan 15 16:40:51 2026 -0500)
MSMARCO-1M (1000 queries, Recall@50)
Disk Usage Breakdown
|
Beta Was this translation helpful? Give feedback.
0 replies
-
17 January 2026 Update
MSMARCO-1M (1000 queries, Recall@50)
|
Beta Was this translation helpful? Give feedback.
1 reply
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
The original document was written here
Benchmarking Arcadedb's Vector Index Build and Search (LSM + JVector) Performance
MSMARCO Dataset
Commit/Date: main @ d8098d7 (Wed Jan 14 15:20:25 2026 -0500)
MSMARCO-1M (1000 queries, Recall@50)
Findings
*.lsmvecidx+ ~5.6GB bucket + ~3.9GB*.vecgraph; vectors are effectively stored twice (bucket + graph) becausestore_vectors_in_graph=Falseis ignored—LSMVectorIndexGraphFile still serializes inline vectors. This doubles disk and keeps RSS high when mapping the graph file.graphState=MUTABLE, and the search path currently rebuilds on the very next query since it only checksmutationsSinceSerialize>0; the configured threshold (GlobalConfiguration default 100) is bypassed. Pure queries do not increment the counter, so 1,000 searches alone never trigger rebuilds.add_hierarchyraises build time modestly (~+9m: 2h00 vs. 1h51) but improves recall (0.9101 vs. 0.8994) and cuts search time materially (6s vs. 13–16s across 1K queries); likely fewer hops during graph search.MAX_CONNECTIONS=12andBEAM_WIDTH=64held constant; higher will improve recall at higher build cost. JVector lacksefSearch, so overquery (>k then rerank) is the lever; overquery factor was 1 here to simplify results.Beta Was this translation helpful? Give feedback.
All reactions