[Question]: HNSW Performance Changes in Nightly Build: Query Speedup vs Recall Drop #2812

@woshichengpeng

Describe your problem

Summary

We observed a large query speedup but also a recall drop when comparing the stable version (v0.6.0-dev3) with the nightly build. Average query latency fell by ~68.8% (9.6ms → 3.0ms), while recall dropped by ~7.7% in relative terms (0.912 → 0.842). We would like to know whether these changes are expected behavior.

Background

During benchmarking of HNSW performance between Infinity versions, we compared:

  • Stable version: v0.6.0-dev3
  • Nightly build: Latest nightly version (docker id: 48932341cd18)

Performance Changes Observed

Query Performance Improvements

  • Query latency: ~68.8% improvement (9.6ms → 3.0ms average)
  • P95 latency: ~68.5% improvement (11.1ms → 3.5ms)
  • P99 latency: ~65.1% improvement (12.6ms → 4.4ms)
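The percentages above are relative reductions computed from the before/after latencies. A minimal sketch of that arithmetic, together with a nearest-rank percentile helper for turning raw per-query timings into P95/P99 figures. The function names are illustrative and are not part of the Infinity benchmark code:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of latency samples, with pct in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def relative_improvement(old, new):
    """Percentage by which `new` is lower than `old`."""
    return (old - new) / old * 100


# Before/after tail latencies reported in this issue (milliseconds).
reported = {"p95": (11.1, 3.5), "p99": (12.6, 4.4)}
for name, (old, new) in reported.items():
    print(f"{name}: {relative_improvement(old, new):.1f}% lower")
```

Applying `percentile` to the same list of per-query timings for both builds keeps the P95/P99 comparison like-for-like.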

Recall Accuracy Changes

  • Recall drop: ~7.7% relative degradation (0.912 → 0.842, i.e. 7.0 points absolute)
  • ID consistency: ~80% of returned IDs remain the same
  • Similarity scores: More accurate in nightly build (old version had systematic calculation errors)
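For reference, a minimal sketch of how the two accuracy figures above could be computed. It assumes recall is measured as recall@K against the dataset's ground-truth neighbours and that ID consistency is the overlap between the two builds' result lists; the issue does not spell out either formula, and the IDs below are toy values, not benchmark data:

```python
def recall_at_k(returned_ids, true_ids, k):
    """Fraction of the true top-k neighbours present in the returned top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(returned_ids[:k]) & set(true_ids[:k])) / k


def id_overlap(ids_a, ids_b):
    """Share of the distinct IDs in ids_a that also appear in ids_b."""
    unique_a = set(ids_a)
    if not unique_a:
        return 0.0
    return len(unique_a & set(ids_b)) / len(unique_a)


# Toy example: ground truth plus the results of two hypothetical builds.
truth = [1, 2, 3, 4, 5]
stable = [1, 2, 3, 4, 9]
nightly = [1, 2, 3, 8, 7]
print(recall_at_k(stable, truth, k=5))   # 0.8
print(recall_at_k(nightly, truth, k=5))  # 0.6
print(id_overlap(stable, nightly))       # 0.6
```

Note that two builds can share most of their returned IDs and still differ in recall, because the IDs that changed may be exactly the ones that were true neighbours.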

Import Performance

  • Import speed: ~40% slower in nightly build
  • Memory usage: Optimized in nightly build

Technical Details

Test Configuration

{
    "name": "infinity_nytimes",
    "app": "infinity",
    "host": "127.0.0.1:23817",
    "data_path": "datasets/NYTimes/nytimes-256-angular.hdf5",
    "insert_batch_size": 8192,
    "topK": 100,
    "mode": "vector", 
    "schema": {
        "embeddings": {"type": "vector, 256, float"}
    }, 
    "vector_size": 256,
    "vector_name": "embeddings",
    "metric_type": "cosine",
    "index": {
        "embeddings": {
            "type": "HNSW",
            "index_params": {
                "M": 16, 
                "ef_construction": 600,
                "metric": "cosine",
                "encode": "lvq"
            },
            "query_params": {
                "ef": 600
            }
        }
    },
    "query_path": "datasets/NYTimes/nytimes-256-angular.hdf5",
    "batch_size": 8192,
    "ground_truth_path": "datasets/NYTimes/nytimes-256-angular.hdf5"
}
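Several values in this config are stated twice (the dimension in `schema` and `vector_size`, the metric in `metric_type` and `index_params`). A small sanity check along the following lines can catch mismatches before a long benchmark run. This is only a sketch: `check_config` is a hypothetical helper, not part of the benchmark code, and the `ef >= topK` rule is the usual HNSW guideline rather than a documented Infinity requirement:

```python
def check_config(cfg):
    """Return a list of inconsistencies found in a benchmark config dict."""
    problems = []
    column = cfg["vector_name"]

    # Schema types look like "vector, 256, float".
    parts = [p.strip() for p in cfg["schema"][column]["type"].split(",")]
    if int(parts[1]) != cfg["vector_size"]:
        problems.append("schema dimension differs from vector_size")

    index_cfg = cfg["index"][column]
    if index_cfg.get("index_params", {}).get("metric") != cfg["metric_type"]:
        problems.append("index metric differs from metric_type")

    # Assumption: ef should be at least topK so that enough candidates are kept.
    ef = index_cfg.get("query_params", {}).get("ef")
    if ef is not None and ef < cfg["topK"]:
        problems.append("ef is smaller than topK")
    return problems


# Trimmed copy of the configuration shown above.
config = {
    "topK": 100,
    "vector_size": 256,
    "vector_name": "embeddings",
    "metric_type": "cosine",
    "schema": {"embeddings": {"type": "vector, 256, float"}},
    "index": {
        "embeddings": {
            "index_params": {"M": 16, "ef_construction": 600,
                             "metric": "cosine", "encode": "lvq"},
            "query_params": {"ef": 600},
        }
    },
}
print(check_config(config))  # []
```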

Client Fix Applied

During our testing, we discovered and fixed an issue in the benchmark client (python/benchmark/clients/infinity_client.py):

Issue: The ef parameter from query_params was not being sent per query in the original client code.

Fix: Applied the following patch to properly handle query parameters:

diff --git a/python/benchmark/clients/infinity_client.py b/python/benchmark/clients/infinity_client.py
index 7d9ffea12..af9a72d15 100644
--- a/python/benchmark/clients/infinity_client.py
+++ b/python/benchmark/clients/infinity_client.py
@@ -37,6 +37,7 @@ class InfinityClient(BaseClient):
         self.data_mode = self.data["mode"]
         self.path_prefix = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
         self.table_objs = list()
+        self.query_options_ = {}
 
     def _parse_index_schema(self, index_schema):
         indexs = []
@@ -44,10 +45,9 @@ class InfinityClient(BaseClient):
             if value["type"] == "text":
                 indexs.append(index.IndexInfo(key, index.IndexType.FullText))
             elif value["type"] == "HNSW":
-                params = {}
-                for param, v in value["params"].items():
-                    params[param] = str(v)
-                indexs.append(index.IndexInfo(key, index.IndexType.Hnsw, params))
+                # Pass only index-time parameters
+                index_params = {str(k): str(v) for k, v in value.get("index_params", {}).items()}
+                indexs.append(index.IndexInfo(key, index.IndexType.Hnsw, index_params))
             elif value["type"] == "BMP":
                 params = {}
                 for param, v in value["params"].items():
@@ -150,6 +150,20 @@ class InfinityClient(BaseClient):
         else:
             raise TypeError("Unsupport file type!")
 
+    def create_index(self):
+        """
+        Create indexes for the table.
+        """
+        db_obj = self.client.get_database("default_db")
+        table_obj = db_obj.get_table(self.table_name)
+        indexs = self._parse_index_schema(self.data["index"])
+        logging.info(f"Creating {len(indexs)} indexes...")
+        for i, idx in enumerate(indexs):
+            logging.info(f"Creating index: index{i} on column {idx.column_name}")
+            table_obj.create_index(f"index{i}", [idx])
+        logging.info("Finished creating indexes.")
+
+
     def setup_clients(self, num_threads=1):
         host, port = self.data["host"].split(":")
         self.clients = list()
@@ -160,12 +174,22 @@ class InfinityClient(BaseClient):
             self.clients.append(client)
             self.table_objs.append(table_obj)
 
+        # One-time processing of query parameters
+        embedding_column = ""
+        for k, v in self.data["schema"].items():
+            if "vector" in v['type']:
+                embedding_column = k
+                break
+        if embedding_column:
+            query_params_config = self.data["index"][embedding_column].get("query_params", {})
+            self.query_options_ = {str(k): str(v) for k, v in query_params_config.items()}
+
     def do_single_query(self, query_id, client_id) -> list[Any]:
         result = None
         query = self.queries[query_id]
         table_obj = self.table_objs[client_id]
         if self.data_mode == "vector":
-            res, _ = (
+            res, _, _ = (
                 table_obj.output(["_row_id"])
                 .match_dense(
                     self.data["vector_name"],
@@ -173,12 +197,13 @@ class InfinityClient(BaseClient):
                     "float",
                     self.data["metric_type"],
                     self.data["topK"],
+                    self.query_options_,
                 )
                 .to_result()
             )
             result = res["ROW_ID"]
         elif self.data_mode == "fulltext":
-            res, _ = (
+            res, _, _ = (
                 table_obj.output(["_row_id", "_score"])
                 .match_text(
                     "",

Key Changes:

  1. Added self.query_options_ = {} in __init__ to initialize query options
  2. Changed HNSW index parameter handling to read index_params instead of params, so only index-time parameters are passed at index creation
  3. Added a create_index method that builds every index defined in the config
  4. Added query parameter processing in setup_clients to extract query_params from the config once
  5. Added self.query_options_ as an argument to match_dense so query parameters are passed with each query
  6. Updated the unpacking of to_result() to expect three return values instead of two

This ensures that the ef parameter (and other query parameters) are properly passed to each query, which is crucial for accurate HNSW performance testing.
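The query-parameter extraction added to setup_clients can be exercised on its own, without an Infinity server. The sketch below mirrors the patch's logic as a standalone function; `extract_query_options` is an illustrative name that does not exist in the benchmark client:

```python
def extract_query_options(config):
    """Return the query-time parameters of the first vector column as strings."""
    for column, spec in config["schema"].items():
        if "vector" in spec["type"]:
            index_cfg = config["index"].get(column, {})
            query_params = index_cfg.get("query_params", {})
            # Keys and values are stringified, as in the patch.
            return {str(k): str(v) for k, v in query_params.items()}
    return {}


config = {
    "schema": {"embeddings": {"type": "vector, 256, float"}},
    "index": {
        "embeddings": {
            "type": "HNSW",
            "index_params": {"M": 16, "ef_construction": 600},
            "query_params": {"ef": 600},
        }
    },
}
print(extract_query_options(config))  # {'ef': '600'}
```

With the unpatched client this dictionary was never built, so match_dense was called without an explicit ef.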

Questions for the Development Team

  1. Expected Behavior: Are these performance/recall trade-offs expected in the nightly build? Is the recall drop a known side effect of the performance optimizations?

  2. Root Cause: What specific changes in the HNSW implementation are causing the recall degradation? Is it related to:

    • Memory layout changes affecting search paths?
    • Concurrency improvements altering insertion order?
    • Search algorithm optimizations?

  3. Search Path Changes: Are the recall differences due to changes in HNSW search algorithms or just different memory layouts and insertion patterns?

  4. Performance vs Accuracy Trade-off: Is this an intentional trade-off where we gain query speed at the cost of some recall accuracy?

  5. Future Plans: Are there plans to improve recall while maintaining the performance gains?

  6. Configuration Impact: Do different HNSW parameters (M, ef_construction, ef) affect this trade-off differently?

  7. Client Fix: Should the query parameter processing fix be incorporated into the main benchmark client?
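To make question 6 concrete, one option is to sweep ef on both builds and compare the resulting recall/latency curves rather than a single operating point. The harness below is a sketch under stated assumptions: `search_fn` is a placeholder for a real client call (for example, a wrapper around the patched do_single_query), and `fake_search` exists only so the example runs without a server:

```python
import time


def sweep_ef(search_fn, queries, ground_truth, k, ef_values):
    """Measure recall@k and mean search latency (ms) for each ef value."""
    rows = []
    for ef in ef_values:
        hits = 0
        search_seconds = 0.0
        for query, truth in zip(queries, ground_truth):
            start = time.perf_counter()
            returned = search_fn(query, k, ef)
            search_seconds += time.perf_counter() - start  # time the search only
            hits += len(set(returned[:k]) & set(truth[:k]))
        rows.append({
            "ef": ef,
            "recall": hits / (k * len(queries)),
            "mean_latency_ms": search_seconds * 1000 / len(queries),
        })
    return rows


def fake_search(query, k, ef):
    """Stand-in search: exact for ef >= 200, off by one ID otherwise."""
    first = query if ef >= 200 else query + 1
    return list(range(first, first + k))


queries = [0, 10]
ground_truth = [[0, 1, 2], [10, 11, 12]]
for row in sweep_ef(fake_search, queries, ground_truth, k=3, ef_values=[100, 200]):
    print(row["ef"], round(row["recall"], 3))
```

Running the same sweep against the stable and nightly builds would show whether the nightly build reaches the stable recall at a higher ef, and at what latency cost.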

Verification Results

We've verified that:

  • All data is properly inserted (no data loss)
  • The IDs missing from the query results do exist in the database
  • Similarity calculations are more accurate in nightly build
  • The recall drop is due to search/index changes, not data integrity issues
  • Query parameters (including ef) are now properly sent per query

Current Impact

  • Positive: Significant query performance improvements
  • Negative: Recall degradation might affect search quality
  • Neutral: More accurate similarity scores
  • Fixed: Query parameters now properly applied

Environment

  • Stable version: v0.6.0-dev3
  • Nightly build: Latest nightly version (docker id: 48932341cd18)
  • Dataset: NYTimes 256-angular (https://ann-benchmarks.com/nytimes-256-angular.hdf5)
  • Index: HNSW with cosine similarity
  • Config: Standard HNSW config (LSG builder not enabled)
  • Client: Fixed benchmark client with proper query parameter handling

Additional Context

The nightly build shows improved similarity calculation precision, suggesting the optimizations are generally beneficial. However, the recall drop raises questions about whether this is an acceptable trade-off for the performance gains.

We're simply seeking clarification on whether these observed changes are normal and expected.


Labels: performance, hnsw, recall-accuracy, nightly-build, client-fix
Priority: Medium
Component: HNSW Index
