[Question]: HNSW Performance Changes in Nightly Build: Query Speedup vs Recall Drop

### Describe your problem

## Summary
We observed significant performance improvements but also recall degradation when comparing the stable version (v0.6.0-dev3) with the nightly build. The query latency improved by ~81.5% but recall dropped by ~7.7%. We're seeking clarification on whether these changes are expected behavior.

## Background
During benchmarking of HNSW performance between Infinity versions, we compared:
- **Stable version**: v0.6.0-dev3
- **Nightly build**: Latest nightly version (docker id: 48932341cd18)

## Performance Changes Observed

### Query Performance Improvements
- **Query latency**: ~81.5% improvement (9.6ms → 3.0ms average)
- **P95 latency**: ~68.5% improvement (11.1ms → 3.5ms)
- **P99 latency**: ~65.1% improvement (12.6ms → 4.4ms)
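
The percentages above can be re-derived from the raw latencies as relative reductions. The sketch below assumes the millisecond values listed are the (rounded) measurements; the helper name `relative_reduction` is ours and is not part of the benchmark. With these rounded inputs the P95 and P99 figures reproduce, while the average works out to roughly 69% rather than ~81.5%, so the average percentage may have been computed from different or unrounded samples.

```python
def relative_reduction(before_ms: float, after_ms: float) -> float:
    """Percentage by which latency dropped going from before_ms to after_ms."""
    return (before_ms - after_ms) / before_ms * 100.0


# Rounded latencies reported above: (stable, nightly) in milliseconds.
latencies = {
    "average": (9.6, 3.0),
    "p95": (11.1, 3.5),
    "p99": (12.6, 4.4),
}

for name, (stable, nightly) in latencies.items():
    print(f"{name}: {relative_reduction(stable, nightly):.1f}% lower")
```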

### Recall Accuracy Changes
- **Recall drop**: ~7.7% relative degradation (0.912 → 0.842, an absolute drop of 0.070)
- **ID consistency**: ~80% of returned IDs remain the same
- **Similarity scores**: More accurate in nightly build (old version had systematic calculation errors)
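
For reference, the two ID-based metrics can be sketched as follows. This assumes recall is measured as recall@k against the ground-truth neighbours for each query (with `topK` = 100 in our config) and that "ID consistency" is the share of IDs returned by one version that the other version also returns. The function names and the toy IDs are illustrative only, not taken from the benchmark code.

```python
def recall_at_k(returned_ids, ground_truth_ids, k):
    """Fraction of the true top-k neighbours found in the returned top-k."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth_ids[:k])
    found = set(returned_ids[:k]) & truth
    return len(found) / k


def id_overlap(ids_a, ids_b):
    """Fraction of distinct IDs in ids_a that also appear in ids_b."""
    distinct_a = set(ids_a)
    if not distinct_a:
        return 0.0
    return len(distinct_a & set(ids_b)) / len(distinct_a)


# Toy example with k = 5 (hypothetical IDs).
truth = [1, 2, 3, 4, 5]
stable = [1, 2, 3, 4, 9]
nightly = [1, 2, 3, 8, 9]

print("stable recall@5: ", recall_at_k(stable, truth, 5))
print("nightly recall@5:", recall_at_k(nightly, truth, 5))
print("ID overlap:      ", id_overlap(stable, nightly))
```

Averaging `recall_at_k` over all queries gives the aggregate recall; averaging `id_overlap` gives the consistency figure.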

### Import Performance
- **Import speed**: ~40% slower in nightly build
- **Memory usage**: Optimized in nightly build

## Technical Details
- **Dataset**: NYTimes 256-dimensional vectors (~290K vectors)
- **Dataset download**: https://ann-benchmarks.com/nytimes-256-angular.hdf5
- **Index Type**: HNSW with M=16, ef_construction=600, ef=600
- **Metric**: Cosine similarity
- **Batch size**: 8192 (same for both versions)

## Test Configuration
```json
{
    "name": "infinity_nytimes",
    "app": "infinity",
    "host": "127.0.0.1:23817",
    "data_path": "datasets/NYTimes/nytimes-256-angular.hdf5",
    "insert_batch_size": 8192,
    "topK": 100,
    "mode": "vector", 
    "schema": {
        "embeddings": {"type": "vector, 256, float"}
    }, 
    "vector_size": 256,
    "vector_name": "embeddings",
    "metric_type": "cosine",
    "index": {
        "embeddings": {
            "type": "HNSW",
            "index_params": {
                "M": 16, 
                "ef_construction": 600,
                "metric": "cosine",
                "encode": "lvq"
            },
            "query_params": {
                "ef": 600
            }
        }
    },
    "query_path": "datasets/NYTimes/nytimes-256-angular.hdf5",
    "batch_size": 8192,
    "ground_truth_path": "datasets/NYTimes/nytimes-256-angular.hdf5"
}
```



## Client Fix Applied
During our testing, we discovered and fixed an issue in the benchmark client (`python/benchmark/clients/infinity_client.py`):

**Issue**: The `ef` parameter from `query_params` was not being sent per query in the original client code.

**Fix**: Applied the following patch to properly handle query parameters:

```diff
diff --git a/python/benchmark/clients/infinity_client.py b/python/benchmark/clients/infinity_client.py
index 7d9ffea12..af9a72d15 100644
--- a/python/benchmark/clients/infinity_client.py
+++ b/python/benchmark/clients/infinity_client.py
@@ -37,6 +37,7 @@ class InfinityClient(BaseClient):
         self.data_mode = self.data["mode"]
         self.path_prefix = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
         self.table_objs = list()
+        self.query_options_ = {}
 
     def _parse_index_schema(self, index_schema):
         indexs = []
@@ -44,10 +45,9 @@ class InfinityClient(BaseClient):
             if value["type"] == "text":
                 indexs.append(index.IndexInfo(key, index.IndexType.FullText))
             elif value["type"] == "HNSW":
-                params = {}
-                for param, v in value["params"].items():
-                    params[param] = str(v)
-                indexs.append(index.IndexInfo(key, index.IndexType.Hnsw, params))
+                # Pass only index-time parameters
+                index_params = {str(k): str(v) for k, v in value.get("index_params", {}).items()}
+                indexs.append(index.IndexInfo(key, index.IndexType.Hnsw, index_params))
             elif value["type"] == "BMP":
                 params = {}
                 for param, v in value["params"].items():
@@ -150,6 +150,20 @@ class InfinityClient(BaseClient):
         else:
             raise TypeError("Unsupport file type!")
 
+    def create_index(self):
+        """
+        Create indexes for the table.
+        """
+        db_obj = self.client.get_database("default_db")
+        table_obj = db_obj.get_table(self.table_name)
+        indexs = self._parse_index_schema(self.data["index"])
+        logging.info(f"Creating {len(indexs)} indexes...")
+        for i, idx in enumerate(indexs):
+            logging.info(f"Creating index: index{i} on column {idx.column_name}")
+            table_obj.create_index(f"index{i}", [idx])
+        logging.info("Finished creating indexes.")
+
+
     def setup_clients(self, num_threads=1):
         host, port = self.data["host"].split(":")
         self.clients = list()
@@ -160,12 +174,22 @@ class InfinityClient(BaseClient):
             self.clients.append(client)
             self.table_objs.append(table_obj)
 
+        # One-time processing of query parameters
+        embedding_column = ""
+        for k, v in self.data["schema"].items():
+            if "vector" in v['type']:
+                embedding_column = k
+                break
+        if embedding_column:
+            query_params_config = self.data["index"][embedding_column].get("query_params", {})
+            self.query_options_ = {str(k): str(v) for k, v in query_params_config.items()}
+
     def do_single_query(self, query_id, client_id) -> list[Any]:
         result = None
         query = self.queries[query_id]
         table_obj = self.table_objs[client_id]
         if self.data_mode == "vector":
-            res, _ = (
+            res, _, _ = (
                 table_obj.output(["_row_id"])
                 .match_dense(
                     self.data["vector_name"],
@@ -173,12 +197,13 @@ class InfinityClient(BaseClient):
                     "float",
                     self.data["metric_type"],
                     self.data["topK"],
+                    self.query_options_,
                 )
                 .to_result()
             )
             result = res["ROW_ID"]
         elif self.data_mode == "fulltext":
-            res, _ = (
+            res, _, _ = (
                 table_obj.output(["_row_id", "_score"])
                 .match_text(
                     "",
```

**Key Changes**:
1. **Added `self.query_options_ = {}`** in `__init__` to initialize query options
2. **Fixed HNSW index parameter handling** in `_parse_index_schema` to read `index_params` instead of `params`
3. **Added a `create_index` method** that creates each index returned by `_parse_index_schema`
4. **Added query parameter processing** in `setup_clients` to extract `query_params` from config
5. **Added `self.query_options_`** as an argument to the `match_dense` call so query parameters are sent with every query
6. **Fixed return value unpacking** from `to_result()` calls, which now unpack three values instead of two

This ensures that the `ef` parameter (and other query parameters) are properly passed to each query, which is crucial for accurate HNSW performance testing.
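
The parameter split performed by the patch can be illustrated without the Infinity SDK. The sketch below mirrors the patch's logic on a trimmed copy of our test configuration: it finds the first vector column in `schema`, then stringifies `index_params` (used at index creation) and `query_params` (sent with each query). The function name `split_hnsw_params` is hypothetical and exists only for this illustration.

```python
# Trimmed copy of the test configuration shown earlier.
CONFIG = {
    "schema": {"embeddings": {"type": "vector, 256, float"}},
    "index": {
        "embeddings": {
            "type": "HNSW",
            "index_params": {
                "M": 16,
                "ef_construction": 600,
                "metric": "cosine",
                "encode": "lvq",
            },
            "query_params": {"ef": 600},
        }
    },
}


def split_hnsw_params(config):
    """Return (index_params, query_options) for the first vector column.

    Both dicts map str -> str, matching how the patched client
    stringifies parameters before handing them on.
    """
    # Same column detection as the patch: first schema entry whose type mentions "vector".
    column = next(
        (name for name, spec in config["schema"].items() if "vector" in spec["type"]),
        None,
    )
    if column is None:
        return {}, {}
    index_cfg = config["index"].get(column, {})
    index_params = {str(k): str(v) for k, v in index_cfg.get("index_params", {}).items()}
    query_options = {str(k): str(v) for k, v in index_cfg.get("query_params", {}).items()}
    return index_params, query_options


index_params, query_options = split_hnsw_params(CONFIG)
print("index-time:", index_params)
print("query-time:", query_options)
```

Keeping the two groups separate makes it easy to confirm that `ef` reaches the query path while `M` and `ef_construction` stay with index creation.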

## Questions for the Development Team

1. **Expected Behavior**: Are these performance/recall trade-offs expected in the nightly build? Is the recall drop a known side effect of the performance optimizations?

2. **Root Cause**: What specific changes in the HNSW implementation are causing the recall degradation? Is it related to:
   - Memory layout changes affecting search paths?
   - Concurrency improvements altering insertion order?
   - Search algorithm optimizations?

3. **Search Path Changes**: Are the recall differences due to changes in HNSW search algorithms or just different memory layouts and insertion patterns?

4. **Performance vs Accuracy Trade-off**: Is this an intentional trade-off where we gain query speed at the cost of some recall accuracy?

5. **Future Plans**: Are there plans to improve recall while maintaining the performance gains?

6. **Configuration Impact**: Do different HNSW parameters (M, ef_construction, ef) affect this trade-off differently?

7. **Client Fix**: Should the query parameter processing fix be incorporated into the main benchmark client?

## Verification Results
We've verified that:
- All data is properly inserted (no data loss)
- IDs missing from the query results exist in the database
- Similarity calculations are more accurate in nightly build
- The recall drop is due to search/index changes, not data integrity issues
- Query parameters (including `ef`) are now properly sent per query


## Current Impact
- **Positive**: Significant query performance improvements
- **Negative**: Recall degradation might affect search quality
- **Neutral**: More accurate similarity scores
- **Fixed**: Query parameters now properly applied

## Environment
- **Stable version**: v0.6.0-dev3
- **Nightly build**: Latest nightly version (docker id: 48932341cd18)
- **Dataset**: NYTimes 256-angular (https://ann-benchmarks.com/nytimes-256-angular.hdf5)
- **Index**: HNSW with cosine similarity
- **Config**: Standard HNSW config (LSG builder not enabled)
- **Client**: Fixed benchmark client with proper query parameter handling

## Additional Context
The nightly build shows improved similarity calculation precision, suggesting the optimizations are generally beneficial. However, the recall drop raises questions about whether this is an acceptable trade-off for the performance gains.

We're simply seeking clarification on whether these observed changes are normal and expected.

---

**Labels**: `performance`, `hnsw`, `recall-accuracy`, `nightly-build`, `client-fix`
**Priority**: Medium
**Component**: HNSW Index 
